You just picked up this framework named Django, played around with its ORM for a while, and you love how easy it is to work with.
You start to notice that your perfect page is taking ages to load. You check your django-debug-toolbar and you find that your view has more than 100 queries!!!
This situation has happened to me before. I will try to show you how to avoid the most common issues you can run into when making queries and how to reduce the overall query count. I will also give a brief explanation of how Django querysets work. Keep in mind that it is better to read the documentation for the details.
Let’s say that our shiny new project has the following models:
from django.db import models


class Person(models.Model):
    name = models.CharField(max_length=50)


class Phone(models.Model):
    number = models.CharField(max_length=50)
    # on_delete is required since Django 2.0
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
We want to list all of our users, so we have something like this in one of our views:
Person.objects.all()
This will, of course, return all of the Person instances in our database. This query will be fast at the beginning, but as more and more Person instances are saved in our database, it will take longer and longer. There is a known technique to handle this called pagination. It returns our Person instances in batches instead of all at once, reducing the time the page takes to load. Have you been on Stack Overflow before? Go to the bottom and you will see the option to switch to a different page.
Django has a paginator object built in. To paginate our Person instances all we need to do is the following:
from django.core.paginator import Paginator

people = Person.objects.order_by('id')  # a stable ordering keeps pages consistent
paginator = Paginator(people, 25)  # 25 people per page
page = paginator.page(1)  # pages are 1-indexed
The page variable will contain the first 25 Person instances returned by our queryset in the object_list attribute. You will need more logic to have a functional view, but that is all included in the docs.
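To make the idea of "batches" concrete, here is a minimal sketch of what pagination boils down to at the database level: a LIMIT and an OFFSET. It uses plain sqlite3 instead of Django so it runs on its own. The person table, the sample rows, and the fetch_page helper are all assumptions made up for illustration, and the SQL Django actually generates will differ in its details.

```python
import sqlite3

# Hypothetical table standing in for the Person model (name is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [(f"Person {i}",) for i in range(1, 61)],  # 60 sample rows
)


def fetch_page(conn, page_number, per_page=25):
    """Return one batch of rows; page_number is 1-based, like paginator.page(1)."""
    offset = (page_number - 1) * per_page
    cursor = conn.execute(
        "SELECT id, name FROM person ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    )
    return cursor.fetchall()


first_page = fetch_page(conn, 1)  # rows 1 to 25
last_page = fetch_page(conn, 3)   # only the 10 remaining rows
print(len(first_page), len(last_page))
```

Only the rows for the requested page leave the database, which is why the page stays fast no matter how many people we have stored.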
Now we want to display a count of the number of Phone instances that every Person has. For this we use something like this in our template:
{% for person in persons %}
    {{ person.phone_set.count }}
{% endfor %}
If we keep our template as it is now, we will generate one extra query for every Person in our queryset (the COUNT query), which means a lot more load on our servers. Fortunately, we can retrieve everything in a constant number of queries by using prefetch_related. In our example, we will load all of the Phone instances for all of our Person instances:
persons = Person.objects.all().prefetch_related('phone_set')
With this, we retrieve in 2 queries what previously took N+1, where N is the number of Person instances. We also load every Phone instance in full, which is wasteful in this case where we only want the count, but it can be useful if you do need the other fields.
We need to update the template to avoid using count, replacing it with the length filter, which counts the objects in Python. We can do this because we have already loaded all the objects:
{% for person in persons %}
    {{ person.phone_set.all|length }}
{% endfor %}
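If you want to see the difference between the N+1 pattern and the two-query pattern without setting up a Django project, the following sketch imitates both with plain sqlite3. The table names, the sample data, and the run helper are assumptions for illustration only. The second strategy approximates what a prefetch does (one query for the people, one for all of their phones, then grouping in Python), not the exact SQL Django emits.

```python
import sqlite3
from collections import defaultdict

# Hypothetical tables standing in for the Person and Phone models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phone (
        id INTEGER PRIMARY KEY,
        number TEXT,
        person_id INTEGER REFERENCES person (id)
    );
""")
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cleo")],
)
conn.executemany(
    "INSERT INTO phone (number, person_id) VALUES (?, ?)",
    [("555-0101", 1), ("555-0102", 1), ("555-0103", 2)],
)

executed = []  # every SELECT we send, so both strategies can be compared


def run(sql, params=()):
    """Execute a query, remember that it happened, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Strategy 1: one query for the people, then one COUNT per person (N+1).
people = run("SELECT id, name FROM person ORDER BY id")
naive_counts = {}
for person_id, _name in people:
    rows = run("SELECT COUNT(*) FROM phone WHERE person_id = ?", (person_id,))
    naive_counts[person_id] = rows[0][0]
naive_query_count = len(executed)

# Strategy 2: fetch every phone for those people at once, group in Python.
executed.clear()
people = run("SELECT id, name FROM person ORDER BY id")
ids = [person_id for person_id, _name in people]
placeholders = ", ".join("?" for _ in ids)
phones = run(
    f"SELECT id, number, person_id FROM phone WHERE person_id IN ({placeholders})",
    ids,
)
phones_by_person = defaultdict(list)
for _phone_id, number, person_id in phones:
    phones_by_person[person_id].append(number)
prefetch_counts = {pid: len(phones_by_person[pid]) for pid in ids}
prefetch_query_count = len(executed)

print(naive_query_count, prefetch_query_count)
```

With three people the naive loop sends four queries while the second strategy sends two, and the second number stays at two no matter how many people are listed.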
I recommend reading more about prefetch_related and select_related in the docs. Those methods are essential when you want to reduce the query count.
But what if you only need the count, as in our example? In that case you can make use of the aggregation and annotation tools. Instead of retrieving the whole instances of a queryset, you can instruct Django to append the COUNTs to your results:
from django.db.models import Count

persons = Person.objects.annotate(phone_count=Count('phone'))
Now you can have access to the number of Phone instances:
{% for person in persons %}
    {{ person.phone_count }}
{% endfor %}
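For the curious, an annotation like this one lets the database do the counting in a single query, roughly a LEFT OUTER JOIN with a GROUP BY. Here is a minimal sketch of that idea using plain sqlite3. The table names and sample data are assumptions for illustration, and the SQL is an approximation of the technique rather than a copy of what Django generates.

```python
import sqlite3

# Hypothetical tables standing in for the Person and Phone models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phone (
        id INTEGER PRIMARY KEY,
        number TEXT,
        person_id INTEGER REFERENCES person (id)
    );
""")
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cleo")],
)
conn.executemany(
    "INSERT INTO phone (number, person_id) VALUES (?, ?)",
    [("555-0101", 1), ("555-0102", 1), ("555-0103", 2)],
)

# One query: the LEFT OUTER JOIN keeps people without phones, and
# COUNT(phone.id) ignores the NULLs that such people produce.
rows = conn.execute("""
    SELECT person.id, person.name, COUNT(phone.id) AS phone_count
    FROM person
    LEFT OUTER JOIN phone ON phone.person_id = person.id
    GROUP BY person.id, person.name
    ORDER BY person.id
""").fetchall()

for person_id, name, phone_count in rows:
    print(person_id, name, phone_count)
```

No Phone rows are transferred to Python at all here, only one number per person, which is the saving over loading the full instances.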
Of course, there is more you can do. Welcome to the world of software development (there is ALWAYS something more). You can add caching at several levels of your application, use asynchronous queues for long-running tasks, and so on. There are too many topics to cover in this short post.
If you are interested in these topics, go to the respective documentation pages. Django’s documentation is amazing!