OAuth and URL encoding

Differences in URL encoding functions will make implementation of OAuth harder. There is an immediate need for correct parameter encoding implementations.

I maintain an implementation of OAuth. Whilst checking existing code and implementing my own version, I wondered how much the various implementations differ in their encoding.

OAuth uses RFC3986 for the encoding of URI parameters. Other commonly used encoding schemes are RFC1738 and RFC2396 (updated by RFC2732).

The differences? Quite a few.

The unreserved characters in RFC3986 are:

unreserved = ALPHA, DIGIT, '-', '.', '_', '~'

And in RFC1738:

unreserved     = alpha | digit | safe | extra
safe           = "$" | "-" | "_" | "." | "+"
extra          = "!" | "*" | "'" | "(" | ")" | ","

And in RFC2396/2732:

"A" .. "Z", "a" .. "z", "0" .. "9",
"-", "_", ".", "!", "~", "*", "'", "(", ")"

What do some programming languages use?
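Before turning to the individual languages, the three sets listed above can be compared mechanically. The following is a small sketch, not taken from any OAuth library: the object and function names are my own, and only the punctuation is compared because letters and digits are unreserved in all three RFCs.

```javascript
// Punctuation each RFC treats as unreserved, copied from the lists above.
const unreservedPunctuation = {
  rfc3986: "-._~",
  rfc1738: "$-_.+!*'(),",
  rfc2396: "-_.!~*'()",
};

// Characters that an encoder based on `other` leaves alone,
// but that RFC3986 (and therefore OAuth) wants percent-encoded.
function mustEncodeForOAuth(other) {
  return [...unreservedPunctuation[other]]
    .filter((ch) => !unreservedPunctuation.rfc3986.includes(ch))
    .join("");
}

// Characters that RFC3986 leaves alone,
// but that an encoder based on `other` would percent-encode.
function overEncodedBy(other) {
  return [...unreservedPunctuation.rfc3986]
    .filter((ch) => !unreservedPunctuation[other].includes(ch))
    .join("");
}

console.log(mustEncodeForOAuth("rfc1738")); // $+!*'(),
console.log(overEncodedBy("rfc1738"));      // ~
console.log(mustEncodeForOAuth("rfc2396")); // !*'()
```

Any character that shows up in either result will produce a different signature base string when the wrong encoder is used.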

PHP

PHP has two options: urlencode and rawurlencode. The difference between the two is the handling of the space character: urlencode encodes it as “+”, rawurlencode encodes it as “%20”.

rawurlencode's claim to fame is that it is compatible with RFC1738. In fact it is not. It encodes all characters that RFC3986 requires, and then also the “~”. So rawurlencode comes close. It escapes everything except:

A-Za-z0-9\-_.

So the only difference from RFC3986 is the “~” character. The correct function is:

function oauth_urlencode ( $s )
{
    // rawurlencode escapes "~"; RFC3986 treats it as unreserved, so restore it.
    return str_replace('%7E', '~', rawurlencode($s));
}

Perl

CPAN provides URI::Escape. It uses RFC2396/2732 as the basis for its character encoding. The default for this function is to escape everything except:

A-Za-z0-9\-_.!~*'()

Javascript

Yet another encoding method. Most Javascript programmers will use the function escape. It encodes everything except:

A-Za-z0-9\-_.+*/@

Addition from John Kristian: when you want to encode with Javascript you can take a look at OAuth.percentEncode in oauth.googlecode.com/svn/code/javascript/oauth.js
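For illustration, here is a minimal sketch of an RFC3986 encoder in Javascript. It is not the code from oauth.js; the name rfc3986Encode is my own. It relies on the fact that encodeURIComponent leaves only the RFC2396 unreserved set untouched, so the five extra characters have to be escaped afterwards.

```javascript
// Sketch of RFC3986 percent-encoding (hypothetical helper, not oauth.js).
// encodeURIComponent leaves A-Za-z0-9 - _ . ! ~ * ' ( ) unescaped,
// so ! * ' ( ) still need to be percent-encoded for OAuth.
function rfc3986Encode(value) {
  return encodeURIComponent(String(value)).replace(
    /[!*'()]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(rfc3986Encode("a b~c!*'()")); // a%20b~c%21%2A%27%28%29

// For comparison, the old escape function encodes "~" and keeps "+".
console.log(escape("a+b~c"));        // a+b%7Ec
console.log(rfc3986Encode("a+b~c")); // a%2Bb~c
```

Non-ASCII input is encoded as UTF-8 by encodeURIComponent, which is what OAuth expects.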

It is a mess, what to do?

I assume that programmers are lazy and will re-use their good old trusty encoding functions. They should; it is how we train programmers. This will cause some interesting problems with OAuth though, as OAuth insists on encoding, and sometimes double encoding, parameters, and then uses those encoded values to calculate the signature of a message.

This insistence on one very specific encoding implementation is, in my opinion, the Achilles heel of OAuth.
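To make the double encoding concrete, here is a rough sketch of how a signature base string is assembled, following my reading of the OAuth Core 1.0 specification. The helper names are hypothetical, the parameters are passed as a plain object (so duplicate names are not handled), and the URL normalisation that the spec also requires is left out.

```javascript
// Hypothetical RFC3986 encoder, see the character sets earlier in this article.
function rfc3986Encode(value) {
  return encodeURIComponent(String(value)).replace(
    /[!*'()]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Sketch: method, URL and normalised parameters, each encoded, joined by "&".
function signatureBaseString(method, url, params) {
  const normalized = Object.entries(params)
    // First encoding: every name and value on its own.
    .map(([name, value]) => [rfc3986Encode(name), rfc3986Encode(value)])
    // Sort by name, then by value.
    .sort(([n1, v1], [n2, v2]) =>
      n1 < n2 ? -1 : n1 > n2 ? 1 : v1 < v2 ? -1 : v1 > v2 ? 1 : 0
    )
    .map(([name, value]) => `${name}=${value}`)
    .join("&");

  // Second encoding: the whole parameter string is encoded again.
  return [method.toUpperCase(), url, normalized].map(rfc3986Encode).join("&");
}

console.log(
  signatureBaseString("get", "http://example.com/request", {
    oauth_nonce: "x!",
    name: "a b~c",
  })
);
// GET&http%3A%2F%2Fexample.com%2Frequest&name%3Da%2520b~c%26oauth_nonce%3Dx%2521
```

Note how the space becomes %2520 and the “!” becomes %2521. An encoder that leaves “!” alone, or escapes “~”, yields a different string here and therefore a different signature.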

When checking the signature we have basically two options:

Do not decode whatever comes in before recalculating the signature. That way we won't make any false assumptions about parameters being encoded one way or the other.

Recode all incoming parameter names and values. That way we make sure that we use the correct encoding (according to the spec) for our signature calculation.
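The second option could look like the sketch below. The function names are my own, and it assumes the raw value is percent-encoded UTF-8 in which a “+” is a literal plus sign. For form-encoded input, where “+” means a space, that assumption is wrong, which is exactly the kind of guess the first option avoids.

```javascript
// Hypothetical RFC3986 encoder, see the character sets earlier in this article.
function rfc3986Encode(value) {
  return encodeURIComponent(String(value)).replace(
    /[!*'()]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Sketch of "recoding": decode whatever the client sent, then encode it
// again the RFC3986 way. decodeURIComponent throws a URIError on malformed
// input (for example a lone "%E4"), so callers should be ready to reject it.
function recode(rawValue) {
  return rfc3986Encode(decodeURIComponent(rawValue));
}

console.log(recode("a%7eb")); // a~b    (over-encoded "~", lower-case hex)
console.log(recode("x!"));    // x%21   (under-encoded "!")
console.log(recode("a%2fb")); // a%2Fb  (hex digits normalised to upper case)
```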

However, we must assume that the encoding used in calculating the signature base string and the key from the consumer secret and the token secret is correct according to RFC3986.
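The key is affected by the same encoding rules. As a sketch, assuming the HMAC-SHA1 signature method from OAuth Core 1.0 and Node.js's built-in crypto module, the key is the encoded consumer secret and the encoded token secret joined by “&”. The function names and the secrets below are made up for the example.

```javascript
const crypto = require("crypto");

// Hypothetical RFC3986 encoder, see the character sets earlier in this article.
function rfc3986Encode(value) {
  return encodeURIComponent(String(value)).replace(
    /[!*'()]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Both secrets are encoded; the "&" stays even when the token secret is empty.
function signingKey(consumerSecret, tokenSecret) {
  return rfc3986Encode(consumerSecret) + "&" + rfc3986Encode(tokenSecret || "");
}

// HMAC-SHA1 over the signature base string, base64 encoded.
function hmacSha1Signature(baseString, consumerSecret, tokenSecret) {
  return crypto
    .createHmac("sha1", signingKey(consumerSecret, tokenSecret))
    .update(baseString)
    .digest("base64");
}

console.log(signingKey("cs~1!", "")); // cs~1%21&
console.log(hmacSha1Signature("GET&http%3A%2F%2Fexample.com%2F&a%3Db", "cs~1!", ""));
```

A secret containing “!” or “~” gives a different key, and so a different signature, as soon as one side uses another encoder.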

Conclusion

We, as a community using OAuth, need to create test sets that will test all edge cases, otherwise we are in for a rough ride. And, to help other programmers, there is an immediate need for correct implementations of the RFC3986 parameter encoding.

Update

The OAuth community has a nice set of test cases online at wiki.oauth.net/TestCases. Let's all use them and make sure that our implementations are correct!

By Marc Worrell, Saturday, October 11, 2008. Keywords: php, perl, oauth, Software, Protocols, Articles.