GenericForeignKeys with fewer queries

This post was written on 2008/08/13 at 18:36:40 by Horst Gutmann

When working with generic relations in Django you have to be quite careful not to end up with n+1 queries for a simple fetch of n elements. The reason is that a generic relation is internally not a true foreign key but just an object id combined with a foreign key to a content type. There are some ways around this problem, though, among them a quite simple one: doing the actual content loading yourself.

Inspired by Ryan Berg's tutorial about how to build a small tumblelog with Django, let's adapt this example a little bit. If you're using generic foreign keys you will probably end up with a structure like this:

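from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType
from django.db import models

class Item(models.Model):
    pub_date = models.DateTimeField()
    content_type = models.ForeignKey(ContentType)
    object_id = models.IntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')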

class Post(models.Model):
    title = models.CharField(max_length=255)
    pub_date = models.DateTimeField()
    content = models.TextField()
    item = generic.GenericRelation(Item)

class Link(models.Model):
    title = models.CharField(max_length=255)
    pub_date = models.DateTimeField()
    description = models.TextField(null=True, blank=True)
    url = models.URLField()
    item = generic.GenericRelation(Item)

class Photo(models.Model):
    title = models.CharField(max_length=255)
    pub_date = models.DateTimeField()
    item = generic.GenericRelation(Item)

Combined with a simple signal handler that gets triggered every time you save a Post, Link or Photo instance and that updates the Item of that instance, this gets the job done pretty nicely from the writing point of view.

from django.db.models import signals

def update_item(instance, raw, created, **kwargs):
    # Keep the Item of a Post, Link or Photo in sync with that object.
    if created:
        # New object: create the Item that points at it.
        item = Item()
        item.content_type = ContentType.objects.get_for_model(type(instance))
        item.object_id = instance.id
    else:
        # Existing object: reuse its Item via the GenericRelation.
        item = instance.item.all()[0]
    item.pub_date = instance.pub_date
    item.save()

signals.post_save.connect(update_item, sender=Post)
signals.post_save.connect(update_item, sender=Photo)
signals.post_save.connect(update_item, sender=Link)

When it comes to reading that data, you'd normally not want to use something like this:

Item.objects.select_related().all()

... for the simple reason that select_related() cannot follow the generic foreign key, so every access to content_object would pull the related object in its own query. But since there are probably far fewer content types in your model structure than items, you could lower that count from n+1 queries (where n is the number of items) to something more like 1+m (where m is the number of models referenced through the content_type property of the Item class).
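To make the difference concrete, here is a small back-of-the-envelope sketch in plain Python (no Django involved). The item_refs list is made-up sample data, and both function names are hypothetical helpers that only count queries; they do not run any.

```python
# Hypothetical sample data: one (content type name, object id) pair per Item.
item_refs = [
    ("post", 1), ("link", 1), ("photo", 1),
    ("post", 2), ("link", 2), ("post", 3),
]

def naive_query_count(refs):
    # One query for the Item list, then one per item for its content object.
    return 1 + len(refs)

def batched_query_count(refs):
    # One query for the Item list, then one per distinct content type.
    return 1 + len({content_type for content_type, _ in refs})

print(naive_query_count(item_refs))    # 7  (n + 1 with n = 6)
print(batched_query_count(item_refs))  # 4  (1 + m with m = 3)
```

The gap grows with the length of the list: n keeps rising with every new item, while m stays at the number of models involved.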

The trick is pretty simple: Don't do a bare .select_related(), but be a bit more specific about which related object you actually want. For now, all that is needed is (as the whole m-thing from above indicated) the content type of each item, so a .select_related('content_type') is enough. With this we end up having all the references and content types within one query (at least after the content types have been cached). Now all that is left to do is one query for each content type to get the actual Posts, Links and Photos that were referenced in the Items:

items = Item.objects.select_related('content_type').all()
model_map = {}  # content type -> {object_id: item.id}
item_map = {}   # item.id -> item
# Note: this assumes each object is referenced by at most one Item.
for item in items:
    model_map.setdefault(item.content_type, {}) \
            [item.object_id] = item.id
    item_map[item.id] = item
# One query per content type instead of one per item.
for ct, items_ in model_map.items():
    for o in ct.model_class().objects.select_related() \
            .filter(id__in=items_.keys()).all():
        item_map[items_[o.id]].content_object = o
In order not to repeat myself here, I simply put that snippet into a simple manager and associated it as secondary manager with the Item class.
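
As a rough illustration of the grouping step on its own, here is a plain-Python sketch that runs without a database. FakeItem, attach_content_objects and fetch_by_ids are hypothetical names invented for this example and are not part of the manager mentioned above. The sketch groups the items themselves rather than their ids, which is one possible way to also cover several items pointing at the same object. With Django, fetch_by_ids could presumably wrap something like content_type.model_class().objects.in_bulk(list(ids)).

```python
from collections import defaultdict

class FakeItem:
    """Hypothetical stand-in for the Item model (no database involved)."""
    def __init__(self, id, content_type, object_id):
        self.id = id
        self.content_type = content_type
        self.object_id = object_id
        self.content_object = None

def attach_content_objects(items, fetch_by_ids):
    """Fill item.content_object with one fetch_by_ids call per content type.

    fetch_by_ids(content_type, ids) is an assumed callable that returns
    a dict of the form {object_id: object}.
    """
    grouped = defaultdict(list)
    for item in items:
        grouped[item.content_type].append(item)
    for content_type, group in grouped.items():
        objects = fetch_by_ids(content_type, {item.object_id for item in group})
        for item in group:
            # Items sharing an object all receive it; missing ids stay None.
            item.content_object = objects.get(item.object_id)
    return items

# Usage with a fake fetch function that records how often it is called.
calls = []

def fake_fetch(content_type, ids):
    calls.append(content_type)
    return {i: "%s #%d" % (content_type, i) for i in ids}

items = [
    FakeItem(1, "post", 10),
    FakeItem(2, "link", 10),
    FakeItem(3, "post", 10),  # same object as item 1
    FakeItem(4, "photo", 7),
]
attach_content_objects(items, fake_fetch)
print(len(calls))  # 3 fetches for 4 items
```

The items are modified in place, so the original ordering of the list is untouched.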

With something like this in place, GenericForeignKeys are once again quite high on my list of features I really like about Django. Sometimes, as nice as it is to have, model inheritance simply isn't what you want and for something like a tumblelog where you just want to have a meta-object that helps you basically merge queries, they are IMO simply still the way to go. And with < n+1 queries for a simple page, all the better ... ;-)

Comments:

  • Martin Geber (Guest)

    This is a really great tip! Why don't you add the ready-made manager to djangosnippets.org? I guess some people would love it.

    Cheers.

    PS. Again: Great work.

    Aug. 15, 2008, 10:07 a.m.

  • zerok

    I'm still guessing on what my password there is ;-)

    Aug. 15, 2008, 5:22 p.m.

  • zerok

    http://www.djangosnippets.org/snippets/984/

    Aug. 15, 2008, 10:27 p.m.

  • Martin Geber (Guest)

    Awesome. Great Manager! adding it to delicious

    Aug. 19, 2008, 11:06 a.m.

  • hab (Guest)

    Martin,

    I have been playing with your solution, but it seems that if you use a ContentType object as the key in the .setdefault call, each object is unique, and the resulting model_map therefore has as many elements as there are items (= n). The following 'for' loop then runs n times, not m. At least that is how it works for me in Django 0.96.1.

    The solution could be to use item.content_type.name as the key for model_map and translate it back to a ContentType object later.

    Also, it seems that this procedure changes the ordering of the list, so you basically cannot control it. So far I do not know how to solve that.

    Two last things, just corrections: I think you cannot use filter() and all() together in ct.model_class().objects.select_related().filter(id__in=items_.keys()).all(), and in the snippet you posted you probably want to return item_map.items(), not the initial 'qs'.

    I might be wrong, so please let me know what you think about that.

    Anyway, your code gave me some good ideas. If we are able to fix or clarify the points above, it will be perfect.

    Aug. 24, 2008, 11:08 a.m.

  • hab (Guest)

    sorry, not Martin, but zerok, of course:)

    Aug. 24, 2008, 11:30 a.m.

  • zerok

    hab,

    • len(model_map.keys()) == m at least in trunk. Never tried it with 0.96.x since I don't use it anywhere.
    • No, it doesn't change the ordering: the items in the result set are modified in place, since item_map only holds references to the same objects (pass-by-reference in Python)
    • Regarding filter+all: At least in post-qsrf all() doesn't really do all that much, so I basically put it everywhere ;-) I might be wrong there, but so far it hasn't bitten me :-)
    • qs vs. items: This is exactly what I meant above: If qs is returned, you still operate on the original order of things ;-)

    Aug. 24, 2008, 12:51 p.m.

  • hab (Guest)

    Thanks for your reply. I realized I was wrong about 'ordering' and 'qs'. Using 'all()' and 'filter()' together returns an error in Django 0.96.1 (I do not use the latest SVN version, maybe I should :). The same goes for the 'object as key' issue; it is probably fixed in newer versions. To sum it up, I have put together some code that I believe works as intended in the release I use; maybe it helps some other people as well. BTW, it makes me wonder whether select_related('content_type') is necessary at all, since one could use content_type_id as the key and then get the correct ContentType object using that...

    items = Item.objects.select_related('content_type').all()
    model_map = {}
    item_map = {}
    
    for item in items:
        model_map.setdefault(item.content_type.name, {})[item.object_id] = item.id
        item_map[item.id] = item
    
    for ct_name, items_ in model_map.items():
        for o in ContentType.objects.get(name=ct_name).model_class().objects.select_related().filter(id__in=items_.keys()):
            item_map[items_[o.id]].content_object = o
    
    ## We do not have to do this, because if
    ## we modify item in item_map, the change is reflected in 'items' also (OOP:), so I take back my comment about 'ordering' and also about 'return qs'
    #     for item in items:
    #         item.content_object = item_map[item.id].content_object
    
    return items
    

    Thanks again, keep up good work.

    Aug. 24, 2008, 1:05 p.m.

  • Anonymous (Guest)

    fewer.

    Sept. 24, 2008, 9:24 a.m.

  • Grigoriy Petukhov (Guest)

    Thanks for the snippet. There is one issue: when several items from the original qs share the same content object, the loaded content object is assigned to only one of them.

    My version, which tries to fix this: http://dumpz.org/5302/

    Feb. 6, 2009, 9:02 a.m.

  • zerok

    @Grigoriy good point :-) Never noticed it since that case was impossible in the application I needed that code for :-)

    Feb. 6, 2009, 10:58 a.m.
