Hashing Flickr Photos

I used to host my photos with a simple set of CGI scripts that worked well enough for my simple requirements.  Such web applications are easy and fun to write, but in the end I decided that running my own wasn’t worth it because:

  • Hosting large amounts of data on a generic shell account is typically quite expensive.  Flickr’s “pro” account subscription is a very good deal in comparison: as long as each photo is under 20 megabytes in size, you can upload as many as you like for $24.95 a year.
  • The community aspect of sites like Flickr is very encouraging – it’s lovely to have random people say nice things about your photographs, and occasionally have people use them in articles, etc.

(Some people are put off from using Flickr by the appearance of the site, but its API means that there are plenty of alternative front-ends for viewing or presenting your photos, such as flickriver.)

The slight problem with switching to hosting on Flickr was that previously I’d indexed all my photos by the MD5sum of the original image, so several of my pages had links or inline images that pointed to an MD5sum-based URL on the old site.  It occurred to me that it might be useful in general to have “machine tags” on each photo with a hash or checksum of the image, so that, for example:

  • You can simply check which photos have already been uploaded.
  • You can find URLs for all the different image sizes, etc. based on the content of the file.

Unfortunately, I hadn’t done this when uploading the files in the first place, so I had to write a script (flickr-checksum-tags.py) which takes the slightly extraordinary step of downloading the original version of every photo that doesn’t have the checksum tags to a temporary file, hashing each file, adding the tags and deleting the temporary file.  The script adds tags for the MD5sum and the SHA1sum, using a namespace and keys suggested in this discussion, where someone proposes taking the same approach.  These tags are of the form:

  checksum:md5=c629c63f8508cfd1a5e6ba6b4b3253a8
  checksum:sha1=df44fc771660fbe7a2d6b2e284ae61e9ed3e377c
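
As a rough illustration of the hashing step, here is a minimal sketch of how tags in that form could be computed for a local file using Python's standard hashlib module.  The function name checksum_tags and the chunk size are my own choices for this example and aren't taken from flickr-checksum-tags.py:

```python
import hashlib


def checksum_tags(path, chunk_size=1 << 20):
    """Return the two checksum machine tags for the file at `path`.

    Hypothetical helper for illustration; not the actual script's code.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so that large originals needn't fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return [
        "checksum:md5=" + md5.hexdigest(),
        "checksum:sha1=" + sha1.hexdigest(),
    ]
```

Both digests are updated from the same read, so each file only has to be read once.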

The same script can return URLs for a given checksum:

  # ./flickr-checksum-tags.py -m c629c63f8508cfd1a5e6ba6b4b3253a8 --short
  > http://flic.kr/p/7oQxqK
  # ./flickr-checksum-tags.py -m c629c63f8508cfd1a5e6ba6b4b3253a8 -p
  > [... the Flickr photo page URL, which WordPress insists on turning into an image ...]
  # ./flickr-checksum-tags.py -m c629c63f8508cfd1a5e6ba6b4b3253a8 --size=b
  > http://farm3.static.flickr.com/2552/4196574615_491c6387f8_b.jpg
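
Both of those URL styles can be built from a photo's metadata once you've found it.  The sketch below assumes you already have the photo's ID, farm, server and secret (for example from a search on the checksum tag); the function names are made up for this example, and this isn't necessarily how the script itself does it.  The short form is just the photo ID written in Flickr's base58 alphabet:

```python
# Flickr's base58 alphabet leaves out the easily confused 0, O, I and l.
BASE58_ALPHABET = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"


def short_url(photo_id):
    """Build the flic.kr short URL for a numeric photo ID."""
    n = int(photo_id)
    encoded = ""
    while n > 0:
        n, remainder = divmod(n, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    return "http://flic.kr/p/" + encoded


def sized_url(farm, server, photo_id, secret, size):
    """Build the static image URL for a size suffix such as 'b'."""
    return "http://farm%s.static.flickr.com/%s/%s_%s_%s.jpg" % (
        farm, server, photo_id, secret, size)


print(short_url(4196574615))
# prints http://flic.kr/p/7oQxqK
print(sized_url(3, 2552, 4196574615, "491c6387f8", "b"))
# prints http://farm3.static.flickr.com/2552/4196574615_491c6387f8_b.jpg
```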

The repository also has a script to pick out files that haven’t been uploaded, and a simple uploader script which will upload an image and add the checksum tags.  The scripts are based on the very useful Python flickrapi module, and you’ll need to put your Flickr API key and secret in ~/.flickr-api.
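
The idea behind picking out files that haven’t been uploaded is simple enough to sketch.  This assumes you’ve already collected the set of MD5sums from the checksum tags on your Flickr photos (fetching those is left out here), and the function names are hypothetical rather than the ones used in the repository:

```python
import hashlib


def md5_of_file(path):
    """Return the hex MD5 digest of the file's contents."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def not_yet_uploaded(paths, uploaded_md5s):
    """Return the paths whose MD5sum isn't in the set of uploaded ones.

    `uploaded_md5s` is assumed to be a set of hex digests taken from
    the checksum:md5 tags already present on Flickr.
    """
    return [p for p in paths if md5_of_file(p) not in uploaded_md5s]
```

Because the comparison is on file contents rather than names, renamed or moved copies of a photo are still recognised as already uploaded.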

Anyway, these have been useful for me, so they may be of some interest to someone out there…