Differences in URL encoding functions will make implementation of OAuth harder. There is an immediate need for correct parameter encoding implementations.
I maintain an implementation of OAuth. Whilst checking existing code and implementing my own version I wondered how some implementations will differ.
OAuth uses RFC3986 for the encoding of URI parameters. Other often-used encoding schemes are RFC1738 and RFC2396 (updated in RFC2732).
The differences? Quite a few.
The unreserved characters in RFC3986 are:
unreserved = ALPHA, DIGIT, '-', '.', '_', '~'
And in RFC1738:
unreserved = alpha | digit | safe | extra
safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
And in RFC2396/2732:
"A" .. "Z", "a" .. "z", "0" .. "9", "-", "_", ".", "!", "~", "*", "'", "(", ")"
PHP has two options: urlencode or rawurlencode. The difference between the two is the handling of the space character: urlencode turns it into “+”, whereas rawurlencode turns it into “%20”. Both escape a literal “+” as “%2B”.
rawurlencode's claim to fame is that it is compatible with RFC1738. In fact it is not. It escapes every character that RFC3986 says must be escaped, and then also the “~”. So rawurlencode comes close. It escapes everything except:
A-Za-z0-9\-_.
So the only difference here is the “~” character. The correct function would be:
function oauth_urlencode ( $s ) { return str_replace('%7E', '~', rawurlencode($s)); }
CPAN offers URI::Escape. This one uses RFC2396/2732 as the basis for its character encoding. By default its uri_escape function escapes everything except:
A-Za-z0-9\-_.!~*'()
JavaScript brings yet another encoding method. Most JavaScript programmers will use the function escape. This one encodes everything except:
A-Za-z0-9\-_.+*/@
Addition from John Kristian: when you want to encode with JavaScript you can take a look at OAuth.percentEncode in oauth.googlecode.com/svn/code/javascript/oauth.js
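For readers who just want to see the idea, here is a minimal sketch of RFC3986 parameter encoding in JavaScript. It is not the code from oauth.js; the function name rfc3986Encode is made up for this example. It relies on the built-in encodeURIComponent, which leaves A-Za-z0-9 and -_.!~*'() alone, and then escapes the five characters that RFC3986 does not list as unreserved.

```javascript
// Percent-encode a string so that only the RFC3986 unreserved characters
// (A-Z a-z 0-9 - . _ ~) are left as-is.
// rfc3986Encode is a hypothetical name used for this sketch only.
function rfc3986Encode(value) {
  // encodeURIComponent escapes the UTF-8 bytes of everything except
  // A-Z a-z 0-9 - _ . ! ~ * ' ( ), so only ! * ' ( ) still need work.
  return encodeURIComponent(value).replace(
    /[!*'()]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(rfc3986Encode("a b+c~d"));  // a%20b%2Bc~d
console.log(rfc3986Encode("(it's)*!")); // %28it%27s%29%2A%21
```

Note that escape is not a good starting point for this, since it leaves “+”, “*”, “/” and “@” unescaped.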
I assume that programmers are lazy and will re-use their good old trusty encoding functions. They should; it is how we train programmers. This will give some interesting problems with OAuth though, as OAuth insists on encoding and sometimes double encoding parameters, and uses those encoded values to calculate the signature of a message.
This insistence on using a very specific implementation of encoding is, in my opinion, the Achilles heel of OAuth.
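To make the double encoding concrete, here is a simplified sketch of how a signature base string is put together. The function names, the example.com URL and the parameters are all made up, and the sketch skips details such as URL normalization and duplicate parameter names. It compares an RFC3986 encoder with a stand-in for an older encoder that also escapes the “~”.

```javascript
// RFC3986 encoding: only A-Z a-z 0-9 - . _ ~ stay unescaped.
function rfc3986Encode(value) {
  return encodeURIComponent(value).replace(
    /[!*'()]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Stand-in for an encoder that additionally escapes "~".
function tildeEscapingEncode(value) {
  return rfc3986Encode(value).replace(/~/g, "%7E");
}

// Simplified base string: names and values are encoded once, sorted and
// joined, and the joined string is then encoded a second time.
function signatureBaseString(method, url, params, encode) {
  const normalized = Object.entries(params)
    .map(([name, value]) => [encode(name), encode(value)])
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([name, value]) => name + "=" + value)
    .join("&");
  return [method.toUpperCase(), encode(url), encode(normalized)].join("&");
}

// Hypothetical request, for illustration only.
const url = "http://example.com/request";
const params = { q: "a~b c", oauth_nonce: "abc" };

console.log(signatureBaseString("get", url, params, rfc3986Encode));
// GET&http%3A%2F%2Fexample.com%2Frequest&oauth_nonce%3Dabc%26q%3Da~b%2520c
console.log(signatureBaseString("get", url, params, tildeEscapingEncode));
// GET&http%3A%2F%2Fexample.com%2Frequest&oauth_nonce%3Dabc%26q%3Da%257Eb%2520c
```

The two base strings differ in a single character sequence, which is enough to make the signatures differ and the request fail.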
When checking the signature we have basically two options:
However, we must assume that the encoding used in calculating the signature base string and the key from the consumer secret and the token secret is correct according to RFC3986.
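The same goes for the key. As a rough sketch, assuming the key is the encoded consumer secret and the encoded token secret joined by an “&”, the following shows how two encoders produce two different keys. The secrets and the function names are invented for this example.

```javascript
// RFC3986 encoding: only A-Z a-z 0-9 - . _ ~ stay unescaped.
function rfc3986Encode(value) {
  return encodeURIComponent(value).replace(
    /[!*'()]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// The key is the encoded consumer secret, an "&", and the encoded token
// secret (an empty string when there is no token yet).
function signingKey(consumerSecret, tokenSecret, encode) {
  return encode(consumerSecret) + "&" + encode(tokenSecret || "");
}

// Hypothetical secrets containing characters the encoders disagree on.
const consumerSecret = "kd94~hf93k";
const tokenSecret = "pf!kkhs";

console.log(signingKey(consumerSecret, tokenSecret, rfc3986Encode));
// kd94~hf93k&pf%21kkhs
console.log(signingKey(consumerSecret, tokenSecret, encodeURIComponent));
// kd94~hf93k&pf!kkhs
```

Both sides have to arrive at the first form, otherwise a signature that was calculated correctly will still be rejected.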
We, as a community using OAuth, need to create test sets that will test all edge cases, otherwise we are in for a rough ride. And, to help other programmers, there is an immediate need for correct implementations of the RFC3986 parameter encoding.
The OAuth community has a nice set of test cases online at wiki.oauth.net/TestCases. Let's all use them and make sure that our implementations are correct!