yahoo-eng-team team mailing list archive

[Bug 1233838] Re: cms token_id's are not URL safe nor RFC compliant

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Morgan Fainberg <morgan.fainberg@xxxxxxxxx>
Date: Wed, 04 Jun 2014 23:07:31 -0000
Reply-to: Bug 1233838 <1233838@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Changed in: keystone
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Keystone.
https://bugs.launchpad.net/bugs/1233838

Title:
  cms token_id's are not URL safe nor RFC compliant

Status in OpenStack Identity (Keystone):
  Fix Released

Bug description:
  <pre>
  The token id for a cms signed token is generated via
  cms.cms_to_token() which extracts the base64 cms data to use as the
  token id. The '/' character in the base64 text was
  replaced with the '-' character in an attempt to make the
  resulting token id URL safe. The token id is used in both URL's and in
  HTTP header values.

  There are a few problems with this approach.

  1) The result is still not URL safe due to the presence of the '+'
  character and the pad character '='. Both of these characters are
  reserved for the query component and thus would need to be escaped,
  further disrupting the base64 alphabet. See RFC-2396 "Uniform Resource
  Identifiers (URI): Generic Syntax".

  2) RFC-4648 "The Base16, Base32, and Base64 Data Encodings" defines a
  URL safe encoding for base64 data. It maps '+' to '-' and '/' to '_'
  and either strips the padding or demands it be %-encoded as per
  RFC-2396. The result is both URL and file name safe and is referred to
  as base64url.

  3) The Python base64 module has direct support for base64url (we should
  be using it).

  4) The current mapping of '/' to '-' is unfortunate because it
  directly conflicts with the RFC-4648 base64url mapping. The alphabet
  character '-' is supposed to represent index 62 not index 63, thus one
  cannot augment the current mapping to comply with RFC-4648. Plus the
  current mapping still isn't URL safe.
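
  The four points above can be checked with the standard library. This is
  a minimal sketch, not Keystone code: the two input bytes are arbitrary,
  chosen only because they encode to alphabet indices 62 and 63 plus a
  pad, and the names `legacy` and `B64URL_ALPHABET` are illustrative.

```python
import base64
import string
from urllib.parse import quote

raw = b"\xfb\xff"  # arbitrary bytes that hit indices 62 and 63 and need one pad

standard = base64.b64encode(raw).decode("ascii")         # RFC 4648 standard alphabet
legacy = standard.replace("/", "-")                      # the '/' -> '-' mapping described above
urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")  # RFC 4648 base64url alphabet

# Percent-encoding everything outside the unreserved set shows which
# characters would still have to be escaped in a URL.
legacy_escaped = quote(legacy, safe="")
urlsafe_escaped = quote(urlsafe, safe="")

# base64url alphabet: 'A'-'Z', 'a'-'z', '0'-'9', then '-' and '_'.
B64URL_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
)

print(standard, legacy, urlsafe)
print(legacy_escaped, urlsafe_escaped)
print(B64URL_ALPHABET.index("-"), B64URL_ALPHABET.index("_"))
```

  In the legacy form both '+' and '=' still need escaping, whereas in the
  base64url form only the pad does. The last line prints the indices of
  '-' and '_' in the base64url alphabet, which is why '-' cannot also
  stand in for '/'.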

  In OpenStack we should adhere to standards when they exist and not
  invent a non-standard incomplete solution. If we use the RFC-4648
  compliant mechanism we can then also call standard Python libraries to
  perform base64url encode/decode operations.

  Note, base64url is also safe as a value in HTTP headers.

  cms.cms_to_token() and cms.token_to_cms() should be re-implemented
  to produce token id's which can be safely used in HTTP contexts and
  which use an RFC-defined base64 alphabet.
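
  Because the token id is derived from text that is already base64, one
  way to move between the two RFC-4648 alphabets is a plain character
  translation, with no decode step. The sketch below only illustrates
  that idea; the helper names are hypothetical, the payload is stand-in
  data rather than a real CMS blob, and this is not the actual
  implementation of cms.cms_to_token() or cms.token_to_cms().

```python
import base64

# Translation tables between the standard and base64url alphabets.
_STD_TO_URL = str.maketrans("+/", "-_")
_URL_TO_STD = str.maketrans("-_", "+/")


def to_base64url_alphabet(b64_text):
    """Hypothetical helper: replace '+' and '/' with '-' and '_'."""
    return b64_text.translate(_STD_TO_URL)


def to_standard_alphabet(b64url_text):
    """Hypothetical helper: inverse of to_base64url_alphabet()."""
    return b64url_text.translate(_URL_TO_STD)


payload = bytes(range(256))  # stand-in data, not a real CMS blob
std_text = base64.b64encode(payload).decode("ascii")
url_text = to_base64url_alphabet(std_text)

# The translated text matches what the standard library produces directly,
# and the translation is reversible.
assert url_text == base64.urlsafe_b64encode(payload).decode("ascii")
assert to_standard_alphabet(url_text) == std_text
```

  Pad handling is deliberately left out here; it is a separate decision,
  discussed in the note at the end of this description.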

  Since token lifetimes are quite short there shouldn't be backward
  compatibility issues with previously issued tokens. A new token
  utilizing the new token id format will be issued.

  Note:

  base64url continues to use the '=' pad character which is NOT URL
  safe. RFC-4648 suggests two alternate methods to deal with this.

  percent-encode
      percent-encode the pad character (e.g. '=' becomes
      '%3D'). This makes the base64url text fully safe. But
      percent-encoding has the downside of requiring
      percent-decoding prior to feeding the base64url text into a
      base64url decoder since most base64url decoders do not
      recognize %3D as a pad character and most decoders require
      correct padding.

  no-padding
      padding is not strictly necessary to decode base64 or
      base64url text, since the pad can be computed from the input text
      length. However many decoders demand padding and will consider
      non-padded text to be malformed. If one omits the trailing pad
      character(s) for use in URL's, they can be added back before
      decoding.
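
  Both options can be tried against Python's own decoder. This is a small
  sketch using an arbitrary two-byte input; the variable names are
  illustrative only.

```python
import base64
import binascii
from urllib.parse import quote, unquote

raw = b"\xfb\xff"  # arbitrary input whose encoding ends in one pad character
padded = base64.urlsafe_b64encode(raw).decode("ascii")

# Option 1: percent-encode the pad. The text must be percent-decoded
# again before it is handed to the base64url decoder.
escaped = quote(padded, safe="")
decoded_from_escaped = base64.urlsafe_b64decode(unquote(escaped))

# Option 2: drop the pad. The standard library decoder is one of the
# decoders that rejects unpadded text as malformed.
unpadded = padded.rstrip("=")
try:
    base64.urlsafe_b64decode(unpadded)
    rejected = False
except binascii.Error:
    rejected = True

print(escaped, unpadded, rejected)
```

  With either option the caller has one extra step before decoding:
  percent-decoding in the first case, re-padding in the second.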

  For token id use we prefer to strip the padding rather than
  percent-encode it. This makes the token id slightly shorter
  and cleaner.
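
  Stripping the pad and computing it back from the text length might look
  like the sketch below. The helper names are hypothetical and the inputs
  are arbitrary byte strings, not real tokens.

```python
import base64


def strip_padding(b64url_text):
    """Hypothetical helper: remove trailing '=' pad characters."""
    return b64url_text.rstrip("=")


def restore_padding(unpadded_text):
    """Hypothetical helper: re-append the pad implied by the text length."""
    if len(unpadded_text) % 4 == 1:
        # No whole number of bytes encodes to 4n+1 characters.
        raise ValueError("invalid base64url length")
    # Pad up to the next multiple of four characters (0, 1 or 2 pads).
    return unpadded_text + "=" * (-len(unpadded_text) % 4)


# Round trip for every pad length, including the empty input.
for n in range(9):
    raw = bytes(range(n))
    token_id = strip_padding(base64.urlsafe_b64encode(raw).decode("ascii"))
    assert "=" not in token_id
    assert base64.urlsafe_b64decode(restore_padding(token_id)) == raw
```

  Since the pad is fully determined by the length, nothing is lost by
  dropping it, and the resulting id contains only characters that need no
  escaping in a URL.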
  </pre>

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1233838/+subscriptions