MongoDB & Stuff

For quite some time I've wanted to write something with a document-based database system again. My last trip into this area was in Feb. 2009, when I needed a small feed-aggregator for a local Barcamp and wanted to provide a live-ticker for all the tweets and photos related to that event. Back then I went with CouchDB with its slick web interface and HTTP API, but it is far from the only attractive system on the block.

A few weeks ago I suddenly wanted to work with such a system again and remembered something that I had wanted to write for my site for ages: a place to keep lists of games and books that I own, so that I can easily link to them from blog posts. Since these two categories have only a small subset of properties in common (title, slug, createdAt-timestamp, review), and since I might want to add even more categories and esp. properties in the future (like a reference to some Memory Alpha page, for example), I wanted to keep the database backend as flexible as possible. So I decided to use a document-centric DB for this section. But which one? It's not like I will update the database multiple times per second, so CouchDB would be a candidate. One thing that I didn't really like about CouchDB, though, was the lack of composite unique keys. All you can do there, as far as I know, is to encode the value of each property that should be part of the unique key into the main key (doc_id) of a document, which ends up duplicating information.
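To make that duplication a bit more concrete, here is a minimal sketch of the doc_id workaround. The helper names, the "type:slug" format and the sample values are all made up for illustration; the only CouchDB-specific part is that the document ID lives in the "_id" field.

```python
def make_doc_id(doc_type, slug):
    """Hypothetical helper: pack the would-be composite key into one string."""
    return f"{doc_type}:{slug}"


def build_document(doc_type, slug, title):
    """Build a document whose _id repeats what "type" and "slug" already say."""
    return {
        "_id": make_doc_id(doc_type, slug),  # uniqueness is enforced on this
        "type": doc_type,                    # ...while the same values are
        "slug": slug,                        # stored again as normal fields
        "title": title,
    }


# Sample values, purely illustrative
book = build_document("book", "dune", "Dune")
```

A book and a game can share a slug this way, but every document carries its type and slug twice.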

So I went looking around a bit and stumbled upon MongoDB. MongoDB is similar to CouchDB in many regards but takes a slightly different approach in some. For one, it doesn't have a generic HTTP interface but relies on native drivers for each language (similar to MySQL, PostgreSQL and friends). It also lets you organize documents not only into databases but also into collections. Within these collections you can then define custom indices that also allow for composite unique keys (or "compound key indices" as they are called in the documentation). This is especially handy if you want a slug to be unique only within a certain content type (e.g. book or game).
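As a rough sketch of such an index with pymongo: the function name below is made up, and the field names "type" and "slug" are simply the ones used in this post. The function expects a pymongo collection and only calls its create_index method.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING, inlined to avoid the import


def ensure_unique_slug_per_type(collection):
    """Create a compound unique index over (type, slug).

    `collection` is assumed to be a pymongo Collection. With this index in
    place, a slug may be reused by a book and a game, but not by two books.
    """
    return collection.create_index(
        [("type", ASCENDING), ("slug", ASCENDING)],
        unique=True,
    )
```

Inserting a second document with the same type and slug should then be rejected by the server with a duplicate key error.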

The integration with Python (and in my case Django) is also pretty straightforward. You need to install the pymongo library and that's mostly it. pymongo's connection object already acts like a connection pool, so you just need to keep such an object as a module-global variable in your Django application.

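from pymongo.connection import Connection
from django.conf import settings

# Module-level connection, shared by all views; it pools sockets internally.
# Note: PyMongo 3.0 removed Connection in favour of pymongo.MongoClient.
connection = Connection(settings.MONGODB_HOST, settings.MONGODB_PORT)
database = connection[settings.MONGODB_NAME]

def index(request):
    # "collection" is the name of the MongoDB collection holding the entries
    games = database.collection.find({'type': 'game'})
    # ...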

For those of you who prefer seeing a complete example, take a look at Mike Dirolf's sample django+mongodb project.

There is one small problem with pymongo's Cursor object, though: you can't use it out of the box with Django's Paginator. The one method missing to make it work is Cursor.__getitem__ with support for slices. Depending on how you use the paginator, you should be able to monkey-patch it into the Cursor with a combination of Cursor.skip and Cursor.limit, though.
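Here is a minimal sketch of the skip/limit idea. Note that it is a small wrapper class rather than an actual monkey patch, and that the class name is made up. It assumes a pymongo collection that offers find() and count_documents() (the latter exists in newer pymongo releases; older ones used Cursor.count() instead).

```python
class SliceableQuery:
    """Hypothetical sequence-like view over a query: len() plus plain slices."""

    def __init__(self, collection, query):
        self.collection = collection
        self.query = query

    def __len__(self):
        # Assumption: a pymongo version that provides count_documents()
        return self.collection.count_documents(self.query)

    def __getitem__(self, key):
        if not isinstance(key, slice) or key.step not in (None, 1):
            raise TypeError("only plain slices are supported")
        start = key.start or 0
        if start < 0 or (key.stop is not None and key.stop < 0):
            raise ValueError("negative indices are not supported")
        cursor = self.collection.find(self.query).skip(start)
        if key.stop is not None:
            count = key.stop - start
            if count <= 0:
                # limit(0) means "no limit" in MongoDB, so bail out early
                return []
            cursor = cursor.limit(count)
        return list(cursor)
```

Since Paginator essentially needs a length and slicing from its object list, something like Paginator(SliceableQuery(database.collection, {'type': 'game'}), 10) should work with this, although I'd treat that as untested.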

The rest was easy: writing some small forms for adding and modifying books and games and presenting them with some images. While writing the presentation code, I noticed another small issue: sorting over multiple fields, with one of them being optional, doesn't really work all that well in version 1.0. This matters if you're working with books that are optionally part of a series and you want them to be ordered by the series first and then by the main title. Luckily this seems to have already been fixed in the development version.
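For illustration, this is roughly the kind of sort I mean, plus a client-side fallback for small result sets. Both function names are made up, and the field names "series" and "title" are assumptions about how the book documents look. Letting books without a series sort as if the series were an empty string is also just a choice made for this sketch.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING, inlined to avoid the import


def find_books_sorted(collection):
    """Ask the server to sort books by series first, then by title."""
    return collection.find({"type": "book"}).sort(
        [("series", ASCENDING), ("title", ASCENDING)]
    )


def sort_books_locally(books):
    """Client-side fallback: a missing or empty series is treated as ''."""
    return sorted(books, key=lambda b: (b.get("series") or "", b["title"]))
```

The local variant only makes sense for small lists like mine, since it pulls every document into memory first.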

In general, I really enjoyed this small trip into the field of document-based databases again. They are definitely not the solution for every storage problem out there, but for small stuff and esp. problems where the data-structure has to be highly flexible I can definitely see myself using them more and more :D