Amazon S3 and Filename Magic

When using Amazon S3 as a backing store for your application or website, you'll soon realize that you need a good naming strategy for how you will organize the objects you store. One such strategy is to normalize the name of the original file being uploaded to an MD5 hash of the file contents. Another is to use a randomly generated, globally unique identifier (like a GUID/UUID) that is stored in a database as the key to any additional data you wish to store. Either way, you decouple the physical storage from how you will use the data that you've stored.

You can achieve the best of both worlds by storing the file using a GUID as the key and storing a second object that uses the file's MD5 hash as its key. This secondary object is purely a pointer object, and the only data it contains is the GUID of the file it is linked to.
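To make the two-object layout concrete, here is a minimal sketch. It uses a plain dict as a stand-in for the bucket, so nothing here talks to S3, and the names build_objects and resolve are made up for illustration.

```python
import hashlib
import uuid


def build_objects(file_bytes):
    """Return (guid_key, md5_key, objects) for one uploaded file.

    `objects` is a dict standing in for the bucket (an assumption for
    this sketch): key -> body.
    """
    guid_key = str(uuid.uuid4())                   # random, globally unique key
    md5_key = hashlib.md5(file_bytes).hexdigest()  # content-derived key
    objects = {
        guid_key: file_bytes,               # the real file, stored under the GUID
        md5_key: guid_key.encode("ascii"),  # pointer object: its body is just the GUID
    }
    return guid_key, md5_key, objects


def resolve(objects, md5_key):
    """Follow the MD5 pointer object to the file stored under the GUID."""
    guid_key = objects[md5_key].decode("ascii")
    return objects[guid_key]
```

With this layout, checking whether the MD5 key already exists tells you whether the same contents were uploaded before, while the GUID stays the key you hand out to the rest of your application.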

This works fine and dandy until you go to serve up the file from the S3 URL and the browser 1) doesn't know how to handle the file, and 2) doesn't know anything about the original filename, so it will prompt the user to save the file under the GUID. This can be solved by setting two headers, Content-Type and Content-Disposition, when uploading the file under the GUID key.

Examples:

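    # Python code; for all you .NET readers, the concept is the same.
    header = {}
    # Tells the browser how to handle the file (note: "image", not "images").
    header['content-type'] = 'image/jpeg'
    # Tells the browser what name to offer when saving the file.
    header['content-disposition'] = 'attachment; filename="original.jpg"'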

The original.jpg string value can obviously be replaced by whatever you deem appropriate with a little string concatenation.
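As a rough sketch of that string building, the helper below derives both headers from the original filename. The name build_headers is hypothetical, the Content-Type is guessed from the file extension with the standard mimetypes module, and the fallback to application/octet-stream is my own assumption about what you'd want for unknown types.

```python
import mimetypes


def build_headers(original_name):
    """Build the two upload headers from the original filename."""
    # Guess the MIME type from the extension; None means "unknown".
    content_type, _ = mimetypes.guess_type(original_name)
    if content_type is None:
        content_type = "application/octet-stream"  # assumed fallback

    # Escape backslashes and double quotes so the quoted filename stays valid.
    safe_name = original_name.replace("\\", "\\\\").replace('"', '\\"')

    return {
        "content-type": content_type,
        "content-disposition": 'attachment; filename="%s"' % safe_name,
    }
```

This only handles plain ASCII names. Filenames with other characters would likely need the extended filename* parameter, which is left out here.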

Tags: amazon, s3, aws, python