December 12, 2008
MySQL Default Text Encoding
I'm not sure why, in this day and age of Unicode and UTF-8, MySQL, a database used primarily by web apps and hosts, ships with an out-of-the-box text encoding (latin1) that is not in line with this universal standard.
I ran into a problem tonight with some oddly encoded data. Doing a simple:
Blew up on me with:
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 8-10: invalid data
Then I remembered that even though my database now uses UTF-8 collation, it wasn't always that way. I changed it some months ago, but only after the database had been online for over a year.
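To see why the older rows trip the UTF-8 codec, here is a minimal sketch. It assumes the legacy rows were written as latin1, MySQL's long-standing default character set, although the post doesn't name the old encoding. The sample string is hypothetical. A character like "é" is a single byte (0xE9) in latin1, and that byte on its own is not a valid UTF-8 sequence:

```python
# Hypothetical legacy value, stored as latin1 bytes before the switch to UTF-8.
legacy_bytes = "Café Olé".encode("latin-1")

def try_utf8(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, returning an error description on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Same exception type as in the error message above.
        return f"UnicodeDecodeError: {exc.reason} at position {exc.start}"

print(try_utf8(legacy_bytes))
```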
So I decided to decode the data from its stored encoding and then re-encode it as UTF-8 in Python, so that I could use it with the rest of the libraries I needed (e.g. simplejson).
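Here is a minimal sketch of that decode/re-encode step in modern Python 3. It assumes the stored encoding is latin1; the post doesn't say which encoding the old rows used, so substitute the real one. The helper name `to_utf8` and the sample row are hypothetical, and the standard library `json` module stands in for simplejson:

```python
import json

# Assumption: rows written before the collation change are latin1 bytes.
STORED_ENCODING = "latin-1"

def to_utf8(raw: bytes, stored_encoding: str = STORED_ENCODING) -> bytes:
    """Decode bytes from their stored encoding, then re-encode them as UTF-8."""
    text = raw.decode(stored_encoding)  # bytes -> str using the legacy encoding
    return text.encode("utf-8")         # str -> UTF-8 bytes for downstream libraries

legacy_row = "Café Olé".encode("latin-1")  # hypothetical legacy value
fixed = to_utf8(legacy_row)

# The re-encoded bytes now decode cleanly as UTF-8 and serialize to JSON.
payload = json.dumps({"name": fixed.decode("utf-8")}, ensure_ascii=False)
print(payload)
```

One caveat with this approach: decoding as latin1 never raises, because every byte value maps to some character. If you guess the wrong source encoding, you get garbled text rather than an error, so spot-check a few converted rows.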
This did the trick!