Parsing Unicode Urls in Boto / Amazon S3

Tonight I ran into an old familiar "bug" that I have wrestled with for awhile but never taken the time to figure out a solution. It occurs when accessing/listing keys that exist on Amazon S3 via boto and a few other S3 libraries.


I tracked the problem down to the Python urllib.quote() method.

After a little research I found that I should insure any unicode strings are encoded to UTF-8 before passing as a converted ASCII string to urllib.quote(). So I added the following lines to revision 327 of the boto/s3/bucket.py module:

145,146d144
< if isinstance(v, unicode):
<                 v = v.encode('utf-8')

Now my bucket listings no longer break.

I have shared this with Mitch and would hope to see it integrated into the tree soon.