MySQL Default Text Encoding

OK, this is partly a rant, but I feel entitled to it since I've figured out a quick and dirty solution to the problem of working with a database whose default character set is latin1 (with the default collation latin1_swedish_ci).

I'm not sure why, in this day and age of Unicode and UTF-8, MySQL, a database used primarily by web apps and web hosts, ships with an out-of-the-box text encoding that isn't in line with that universal standard.

I ran into a problem tonight with some data that was encoded funky. Doing a simple:

data.decode('utf-8')

Blew up on me with:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 8-10: invalid data
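To see why this happens, here is a minimal reproduction. It is a sketch in Python 3, where a raw column value would be a bytes object (the snippet above is Python 2 style), and the sample text "café" is made up for illustration. In latin1 the "é" is the single byte 0xE9, which UTF-8 treats as the first byte of a three-byte sequence, so the decoder rejects it when the continuation bytes never arrive.

```python
# Hypothetical sample: the text "café" as a latin1 column would store it.
latin1_bytes = "café".encode("latin-1")
print(latin1_bytes)  # b'caf\xe9'

decode_failed = False
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xE9 looks like the lead byte of a 3-byte UTF-8 sequence,
    # but no valid continuation bytes follow, so decoding fails.
    decode_failed = True
    print("decode failed:", exc.reason)
```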

Then I remembered that even though my database is now set to UTF-8, it wasn't always that way. I changed it a few months ago, but only after the database had been online for over a year.

So I figured I should decode from the encoding the data was stored in, then re-encode it as UTF-8 in Python so that I could use it with the rest of the libraries I needed (e.g. simplejson).

data.decode('latin_1').encode('utf-8')
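For reference, the same round trip in Python 3 might look like the sketch below. It assumes the database driver hands back raw bytes for the column (many drivers decode for you, so check yours), the sample value is hypothetical, and the standard library json module stands in for simplejson.

```python
import json

raw = b"caf\xe9"  # hypothetical latin1-encoded value read from an old row

# Decode using the encoding the data was actually stored in...
text = raw.decode("latin_1")
# ...then re-encode as UTF-8 for anything that expects UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

print(utf8_bytes)  # b'caf\xc3\xa9'
print(json.dumps({"name": text}))
```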

This did the trick!
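One caveat: because the database switched to UTF-8 partway through its life, newer rows may already hold UTF-8. Running those through latin_1 won't raise an error (every byte is valid latin1); it silently produces mojibake, with "é" turning into "Ã©". A more defensive sketch tries UTF-8 first and falls back to latin1 only when that fails. The helper name to_text and the sample rows are hypothetical, and this is a heuristic: a latin1 string whose bytes happen to form valid UTF-8 would be misread, though that is uncommon for ordinary text.

```python
def to_text(raw: bytes) -> str:
    """Decode a column value of uncertain encoding (hypothetical helper).

    Tries UTF-8 first, assuming newer rows are UTF-8, then falls back
    to latin1, which can decode any byte sequence without error.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin_1")


old_row = b"caf\xe9"      # hypothetical value written while the table was latin1
new_row = b"caf\xc3\xa9"  # hypothetical value written after the switch to UTF-8

print(to_text(old_row), to_text(new_row))  # café café

# Blindly decoding the UTF-8 row as latin1 gives mojibake, not an error:
print(new_row.decode("latin_1"))  # cafÃ©
```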