Profiling & Improving Performance Using Django Debug Toolbar

Pansul Bhatt

07 Jun 2017

This is the seventh post in the Django blog series. In this post we will learn how to optimize a Django application’s performance. The main inspiration for this post has been the performance improvement of the Charcha Discussion Forum. The source code for the Charcha forum is available here.

For this post, like all preceding posts in this series, we are going to use Charcha’s codebase as an example and try to optimize its pages. Also, as the scope for optimization is very wide, we will tackle the domains one by one. The main focus is to get the most out of the backend (as it is a Django application at the end of the day), and the measurement will be done through django debug toolbar.

The way we are going to proceed is to look at the performance of the application and see where we can optimize it, domain by domain. To assess the site we will use a couple of tools, the most important ones being django debug toolbar and Chrome’s network tab. We are going to hit the front-end portion first and move forward from there.

Optimizing the Front-end:

Typically, when we are looking to optimize an application on the front-end, we should look at the entire stack to understand how the app is working. This approach should be followed in all cases as it gives you a better perspective on how things are functioning.

Charcha uses server-side rendering instead of client-side rendering. This on its own solves a lot of problems for us. How? With client-side rendering, your request loads the page markup, its CSS, and its JavaScript, but the content does not arrive with it. Instead, the JavaScript loads, makes another request, gets a response and generates the required HTML. With server-side rendering, your initial request loads the page, layout, CSS, JavaScript, and content. So basically, the server does everything for you.

The moment I got to know that Charcha renders its templates on the server, my focus turned towards looking only at caching and minification of the pages (to be honest, these should be present in all applications).

Caching Approach:

Before we start writing any code I want you to understand how caching works with server-side rendering and how powerful it really is. As you may know, whenever there is an endpoint which returns any data, it can generally be cached. So the main question is: can we cache HTML elements? Yes, by rendering on the server we can easily cache our data.

So what is going to happen is that the first time your client gets a response, that response will be cached. The next time the same request is made, not only will your browser not have to do the rendering, your server will not have to either. Now that we have an approach for caching we can start implementing it. To serve our static files with cache-friendly headers, we are going to use WhiteNoise. To integrate WhiteNoise in our Django app, we just have to follow these steps:

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.10/howto/static-files/
STATIC_ROOT = os.path.join(PROJECT_ROOT, 'staticfiles')
MIDDLEWARE_CLASSES = [
    'django.middleware.security.SecurityMiddleware',
    'whitenoise.middleware.WhiteNoiseMiddleware',
    ...
]

Now whitenoise is going to start serving all your static files but our main purpose for caching is still not done. One thing to note here is that whitenoise also supports compression which we can use.

# Simplified static file serving.
# https://warehouse.python.org/project/whitenoise/
STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'
STATIC_HOST = os.environ.get('DJANGO_STATIC_HOST', '')
STATIC_URL = STATIC_HOST + '/static/'
INSTALLED_APPS = [
    ...
    # Disable Django's own staticfiles handling in favour of WhiteNoise, for
    # greater consistency between gunicorn and `./manage.py runserver`. See:
    # http://whitenoise.evans.io/en/stable/django.html#using-whitenoise-in-development
    'whitenoise.runserver_nostatic',
    'django.contrib.staticfiles',
    ...
]
def cache_images_forever(headers, path, url):
    """Force images to be cached forever"""
    tokens = path.split(".")
    if len(tokens) > 1:
        extension = tokens[-1].lower()
        if extension in ('png', 'jpg', 'jpeg', 'ico', 'gif'):
            headers['Cache-Control'] = 'public, max-age=315360000'
WHITENOISE_ADD_HEADERS_FUNCTION = cache_images_forever
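
To see what the header hook does, here is a small check that can be run outside Django. It repeats the function from the settings above and calls it with plain dictionaries. The file paths and URLs are made up for illustration, and the (headers, path, url) call signature is simply the one used in the settings above.

```python
def cache_images_forever(headers, path, url):
    """Force images to be cached forever (same logic as the settings above)."""
    tokens = path.split(".")
    if len(tokens) > 1:
        extension = tokens[-1].lower()
        if extension in ('png', 'jpg', 'jpeg', 'ico', 'gif'):
            headers['Cache-Control'] = 'public, max-age=315360000'


# Hypothetical paths, used only to exercise the function.
image_headers = {}
cache_images_forever(image_headers, '/app/staticfiles/img/logo.PNG', '/static/img/logo.PNG')

css_headers = {}
cache_images_forever(css_headers, '/app/staticfiles/css/site.css', '/static/css/site.css')

print(image_headers)  # Cache-Control is set for the image
print(css_headers)    # left untouched for the stylesheet
```

The max-age of 315360000 seconds works out to ten years, which is why the function is described as caching images "forever". Non-image files keep whatever headers WhiteNoise sets for them.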

Now our caching for the front-end is complete. Let’s move on to the second phase of front-end optimization and try to minify our files.

Minifying the Files:

For minifying our files we are going to use the spaceless template tag. Although this will not minify our files per se, it still gives us a better result as it helps reduce our page weight. How? Well, the spaceless template tag removes whitespace between HTML tags. This includes tab characters and newlines. This is really easy to implement in Django templates. All we need to do is add spaceless and close it with endspaceless in our base.html. As our base.html is going to be used everywhere as the basic skeleton, we can be sure that it will be applied to all the other templates within it as well.
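
The effect of the tag can be approximated in plain Python. The sketch below is not Django's implementation. It assumes that collapsing the whitespace between a closing `>` and the next `<` is all that is needed, and the function name strip_between_tags is made up for this example.

```python
import re


def strip_between_tags(html):
    """Remove whitespace that sits between two HTML tags.

    Whitespace inside text content is left alone.
    """
    return re.sub(r'>\s+<', '><', html.strip())


page = """
<ul>
    <li>First   post</li>
    <li>Second post</li>
</ul>
"""

compact = strip_between_tags(page)
print(compact)
print(len(page), '->', len(compact))
```

Note that the spaces inside "First   post" survive: only whitespace between tags is removed, which is why this reduces page weight without being full minification.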

Now that we are done with our front-end optimization let’s move to our back-end. This is perhaps the one place where we would be able to achieve the maximum amount of efficiency.

Optimizing the Back-end:

Ideally, the flow for optimizing your code is to move from your queries to how Django itself is functioning. For queries, we need to look at the number of queries, the amount of time each query takes to execute and how the Django ORM is working. Internally Django already does a lot for us but we can optimize its queries even further.

We need to start going through the code and check the total number of query calls per page. There are a few ways to do this. The most obvious one is to log all the queries in sql.log and read them from there. The other way of accomplishing this is to use django debug toolbar.
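
For the logging route, a settings fragment along the following lines should be enough. It is a sketch that assumes DEBUG is True (Django only logs SQL statements in debug mode) and that writing to a file named sql.log in the working directory is acceptable. The handler name sql_file is arbitrary.

```python
# settings.py (fragment): send every SQL statement Django runs to sql.log
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'sql_file': {
            'class': 'logging.FileHandler',
            'filename': 'sql.log',  # assumed location, adjust as needed
            'level': 'DEBUG',
        },
    },
    'loggers': {
        # Django's database layer logs each query on this logger.
        'django.db.backends': {
            'handlers': ['sql_file'],
            'level': 'DEBUG',
            'propagate': False,
        },
    },
}
```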

Now django debug toolbar is a really powerful tool and extremely easy to use. It helps in debugging responses and provides us with a bunch of configurable panels that we could use. We are mainly going to use this to see how quick our queries are and if we are performing any redundant queries.

Django Debug Toolbar

So let’s first integrate the toolbar in Charcha. To install django debug toolbar we are going to follow this doc. It’s pretty straightforward and easy to implement. First we need to install the package using pip.

 $ pip install django-debug-toolbar

One thing to note here is that if you are using a virtualenv, avoid using sudo. Ideally we should not be using sudo anyway if we are in a virtualenv. Now our django debug toolbar is installed. We now need to add it to INSTALLED_APPS in our settings.py by adding debug_toolbar.

INSTALLED_APPS = [
    ...
    'django.contrib.staticfiles',
    ...
    'debug_toolbar',
    ...
]

Now we need to put the debug toolbar in our middleware. The order of the middleware matters here: the toolbar’s documentation recommends placing it as early as possible in the list, but after any middleware that encodes the response’s content, such as GZipMiddleware.

MIDDLEWARE = [
    # ...
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    # ...
]

Now we need to add the URL pattern for the toolbar. We only add it when we are in DEBUG mode; django debug toolbar will take care of the rest on its own.

from django.conf import settings
from django.conf.urls import include, url

if settings.DEBUG:
    import debug_toolbar
    urlpatterns = [
        url(r'^__debug__/', include(debug_toolbar.urls)),
    ] + urlpatterns

There are also a few configuration options that we could use, but for now we are going to leave them at their defaults as they are not required. The one setting we add is the following (with the correct version this should be done automatically).

DEBUG_TOOLBAR_PATCH_SETTINGS = False

Also, since we are running on our local machine, we need to set the following INTERNAL_IPS so that django debug toolbar knows to show itself for our requests.

INTERNAL_IPS = ('127.0.0.1',)

And now we test. Our screen should look something like the image below.

Figure: Django Debug Toolbar in action

We can now start checking the number of queries that are being run and how much time they are taking to execute as well.

This part is actually fairly easy, as all we have to do is set a benchmark on the basis of the amount of data that we have. We also need to consider the effect at scale: if we have a lot of data spread across a lot of foreign keys, we might need to start using indexes there.

To show how we are going to refactor the code, we went ahead and looked at the performance of all the pages. Most of the refactoring is about reducing the number of queries. As an example we are going to look at the discussion page.

Figure: Django Debug Toolbar on the discussion page

Now according to the django debug toolbar we are making 8 queries whenever we make a call to see a post. This would, later on, start giving us problems if we don’t eliminate a few of them. We are going to approach optimizing the queries in subparts, as follows:

Solving N+1 queries:

We can solve the problem of N+1 queries simply by using prefetch_related and select_related in Charcha. With these two functions, we can get a tremendous performance boost as well. But first, we need to understand what exactly they do and how we can use them in Charcha.

select_related should be used when the object that you are selecting is a single object of a model, so something like a ForeignKey or a OneToOneField. Whenever you make such a call, select_related does a join of the two tables and sends back the result to you, thereby saving an extra query to the ForeignKey table.
Let’s see an example to better understand how we can integrate this in Charcha.
We are going to take the example of the Post model which looks something like this:

class Post(Votable):
    ...
    objects = PostsManager()
    title = models.CharField(max_length=120)
    url = models.URLField(blank=True)
    text = models.TextField(blank=True, max_length=8192)
    submission_time = models.DateTimeField(auto_now_add=True)
    num_comments = models.IntegerField(default=0)
    ...

Do note that we have a custom manager defined, so whichever query we need to execute we can define in our manager. At first glance, you can see that our class Post inherits from Votable, so we now need to see what is happening in that class.

class Votable(models.Model):
    ...
    votes = GenericRelation(Vote)
    author = models.ForeignKey(User, on_delete=models.PROTECT)
    ....

Whenever we make a call to check the author of our Post we will be doing an extra call to the database. So now we go to our custom manager and change the way we are fetching the data.

class PostsManager(models.Manager):
    def get_post_with_my_votes(self, post_id, user):
        post = Post.objects\
            .annotate(score=F('upvotes') - F('downvotes'))\
            .get(pk=post_id)
        ...
        return post

If we use the django debug toolbar we can see that whenever we make a call like get_post_with_my_votes().author, we execute an extra query against the User table.

Figure: Before select_related

This is not required and can be rectified easily by using select_related. What do we have to do? Just add select_related to the query.

class PostsManager(models.Manager):
    def get_post_with_my_votes(self, post_id, user):
        post = Post.objects\
            .annotate(score=F('upvotes') - F('downvotes'))\
            .select_related('author').get(pk=post_id)
        ...
        return post

And that’s it. Our redundant query should be removed. Let’s check it using django debug toolbar.

Figure: After select_related

We can use prefetch_related when we are going to get a set of things, so basically something like a ManyToManyField or a reverse ForeignKey. How does this help? Well, prefetch_related runs one additional query per relationship and joins the results in Python, so the related objects are fetched in a single batch rather than with one query per object. This as of now is not really required in Charcha so we are going to let it be.
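
To make the two-query idea concrete, here is a sketch that uses the standard library's sqlite3 module as a stand-in for the ORM. It is not what Django literally executes, and the posts and comments tables are hypothetical rather than Charcha's real schema. It only shows the general pattern: one query for the parent rows, one batched query for all related rows, and the matching done in Python.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT);
    INSERT INTO posts VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO comments VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# Query 1: the posts themselves.
posts = conn.execute('SELECT id, title FROM posts ORDER BY id').fetchall()

# Query 2: every related comment in one go, using an IN clause.
post_ids = [post_id for post_id, _ in posts]
placeholders = ', '.join('?' for _ in post_ids)
comments = conn.execute(
    'SELECT post_id, text FROM comments '
    'WHERE post_id IN (%s) ORDER BY id' % placeholders,
    post_ids,
).fetchall()

# The "join" happens in Python, not in the database.
comments_by_post = defaultdict(list)
for post_id, text in comments:
    comments_by_post[post_id].append(text)

for post_id, title in posts:
    print(title, comments_by_post[post_id])
```

However many posts there are, this stays at two queries, whereas fetching the comments inside the loop would cost one query per post.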

Query in a loop:

Though this is not done anywhere in Charcha, it is a practice which a lot of people follow without realising its impact. Although this seems pretty basic, I still feel that it deserves its own section.

posts = Post.objects.filter(...)
authors = []
for post in posts:
    # Disaster: accessing post.author runs one extra query per post, so
    # with a lot of posts, say around 10,000, we would be making that
    # many calls, basically (n+1) queries.
    authors.append(post.author)

The above example is just a sample of how disastrous a query in a loop can be for a large dataset. It can easily be solved with the techniques from the previous discussion (select_related and prefetch_related).
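
The cost difference is easy to measure outside Django. The sketch below uses sqlite3 with made-up users and posts tables and a hand-rolled query counter (the run helper is hypothetical and plays the role of the toolbar's SQL panel). The JOIN at the end is the kind of query that select_related produces.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
""")
conn.executemany('INSERT INTO users VALUES (?, ?)',
                 [(i, 'user%d' % i) for i in range(1, 6)])
conn.executemany('INSERT INTO posts VALUES (?, ?, ?)',
                 [(i, 'post %d' % i, (i % 5) + 1) for i in range(1, 101)])

counter = {'queries': 0}


def run(sql, params=()):
    """Execute a query and count it."""
    counter['queries'] += 1
    return conn.execute(sql, params).fetchall()


# Query in a loop: 1 query for the posts + 1 query per post for its author.
counter['queries'] = 0
authors_slow = []
for post_id, author_id in run('SELECT id, author_id FROM posts ORDER BY id'):
    row = run('SELECT username FROM users WHERE id = ?', (author_id,))
    authors_slow.append(row[0][0])
loop_queries = counter['queries']

# The same data with a single JOIN.
counter['queries'] = 0
rows = run('SELECT users.username FROM posts '
           'JOIN users ON users.id = posts.author_id ORDER BY posts.id')
authors_fast = [r[0] for r in rows]
join_queries = counter['queries']

print(loop_queries, join_queries)
```

With 100 posts the loop issues 101 queries and the JOIN issues 1, and both return the same list of authors.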

Denormalization:

I recommend using denormalization only if we have actual performance issues, and not optimizing prematurely. We should check the queries before we start denormalizing, as it does not make sense to keep adding complexity if it is not improving our performance.

The best place to explain the denormalization implemented in Charcha is the Votable model, as done in the 2nd blog of this series. Why the Votable table? This is because we want to show the score, i.e. upvotes – downvotes, and the comments on the home page. We could make a join on the votes table and compute it from there, but it would slow down our system. Instead, what we are doing is adding the fields upvotes, downvotes and flags to the Votable model. This in turn reduces our calls.

class Votable(models.Model):
    ...
    upvotes = models.IntegerField(default=0)
    downvotes = models.IntegerField(default=0)
    flags = models.IntegerField(default=0)
    ...

Now we can just inherit these properties in whichever models require them and move forward from there. Although this seems pretty easy to implement, it does have a drawback. Every time a change is made, we have to update these fields. So instead of making a JOIN we end up with a more complex UPDATE.

WBS for Tracking Comments Hierarchy

This is also a form of denormalization. Each comment needs a reference to its parent comment, which in turn references its own parent, and so on, so the comments basically form a tree structure.

class Comment(Votable):
    ...
    wbs = models.CharField(max_length=30)
    ...

The problem here is that self-referential calls are exceedingly slow. So we can refactor this approach and add a new field called wbs which would help us track the comments as a tree. How would it work? Every comment would have a code, which is a dotted path. So the first comment would have the code “.0001” and the second top-level comment would have the code “.0002” and so on. Now if someone responds to the first comment it gets a code of “.0001.0001”. This would help us avoid doing self-referential queries and use wbs instead.

The limitation of this field is that we only allow 9999 comments at each level and the wbs tree can only be 6 levels deep, which is acceptable in our case. But when going through a large database, we would have to index this field as well. We will discuss this in the next section.
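
The dotted-path scheme is easy to sketch in plain Python. The helper names below (child_wbs, is_descendant) are hypothetical and are not Charcha's actual code. The sketch assumes four digits per level and at most six levels, which matches the limits described above and fits the 30-character field (6 levels of 5 characters each).

```python
WBS_WIDTH = 4      # digits per level, so at most 9999 comments per level
WBS_MAX_DEPTH = 6  # 6 levels * 5 characters fits max_length=30


def child_wbs(parent_wbs, position):
    """Return the wbs code for the position-th reply under parent_wbs.

    Use parent_wbs='' for a top-level comment.
    """
    if not 1 <= position <= 10 ** WBS_WIDTH - 1:
        raise ValueError('position out of range for this wbs width')
    depth = parent_wbs.count('.') + 1
    if depth > WBS_MAX_DEPTH:
        raise ValueError('comment tree is too deep')
    return f'{parent_wbs}.{position:0{WBS_WIDTH}d}'


def is_descendant(wbs, ancestor_wbs):
    """True if wbs sits anywhere below ancestor_wbs in the tree."""
    return wbs.startswith(ancestor_wbs + '.')


first = child_wbs('', 1)     # first top-level comment
second = child_wbs('', 2)    # second top-level comment
reply = child_wbs(first, 1)  # first reply to the first comment

# Plain string sorting yields the thread in display order.
thread = sorted([second, reply, first])
print(thread)
```

Because a reply's code starts with its parent's code, a whole subtree can be fetched with a prefix match, and sorting by wbs returns comments in threaded order without any self-referential queries.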

Adding Indexes

Indexes are one of the many standard DB optimization techniques, and Django provides us with a lot of tools to add them. Once we have identified which queries are taking a long time, we can use Field.db_index or Meta.index_together to add indexes from Django.

Before we start adding indexes we need to identify where exactly we should add them, and to do that we will use django debug toolbar to see how fast we get our response. We are going to look at the post which we had before and track its queries. We are going to select a query which we can easily optimize with an index (given below).

Figure: Before indexes
SELECT ••• FROM "votes" WHERE ("votes"."type_of_vote" IN ('1', '2') AND "votes"."voter_id" = '2' AND "votes"."content_type_id" = '7' AND "votes"."object_id" = '1')

Now, this particular query is taking 1.45s to execute. All we have to do is look at the table and decide which fields to index. Since this query belongs to the Vote model, we are going to add the index on content_type_id and object_id. How?

class Vote(models.Model):
    class Meta:
        db_table = "votes"
        index_together = [
            ["content_type", "object_id"],
        ]

And that’s all we have to do. Now we run our migrations and check our performance.

Figure: After indexes

Now, this query is taking only 0.4 seconds to execute and that is how easy it is to implement indexes.
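
Why the index helps can be seen from the database's query plan. The sketch below uses sqlite3 as a stand-in, so the plan output differs from what other databases print, and the index name votes_ct_obj is made up. The table only mirrors the columns that appear in the query above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY,
        type_of_vote INTEGER,
        voter_id INTEGER,
        content_type_id INTEGER,
        object_id INTEGER
    )
""")

QUERY = ('SELECT * FROM votes '
         'WHERE content_type_id = ? AND object_id = ? AND voter_id = ?')


def query_plan():
    """Return SQLite's plan for QUERY as one string."""
    rows = conn.execute('EXPLAIN QUERY PLAN ' + QUERY, (7, 1, 2)).fetchall()
    # The human-readable description is the last column of each row.
    return '; '.join(row[-1] for row in rows)


plan_before = query_plan()
conn.execute('CREATE INDEX votes_ct_obj ON votes (content_type_id, object_id)')
plan_after = query_plan()

print(plan_before)  # a full scan of the votes table
print(plan_after)   # a search that uses votes_ct_obj
```

Before the index exists, the database has to scan every row of votes. Afterwards it can jump straight to the rows for one (content_type_id, object_id) pair and only filter those by voter.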

Django QuerySets are LAZY

One thing to note when using Django is that querysets are lazy. What that means is that creating or chaining a queryset does not involve any database activity; Django only runs the query when the queryset is evaluated. So if we make a call from the Post model like

q = Post.objects.filter(...)
q = q.exclude(...)
q = q.filter(...)

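This looks like three separate queries, but none of these lines touches the database. Django runs a single combined query only when q is finally evaluated, for example when we iterate over it.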
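
The idea of laziness can be illustrated with a toy class. This is not how Django's QuerySet is implemented. LazyQuery is a made-up stand-in that records filters and only "hits the database" (here, a Python list) when it is iterated.

```python
class LazyQuery:
    """A toy stand-in for a queryset: records filters, runs on iteration."""

    def __init__(self, rows, predicates=(), log=None):
        self._rows = rows
        self._predicates = tuple(predicates)
        self.log = log if log is not None else []

    def filter(self, predicate):
        # Return a new object; nothing is executed yet.
        return LazyQuery(self._rows, self._predicates + (predicate,), self.log)

    def exclude(self, predicate):
        return self.filter(lambda row: not predicate(row))

    def __iter__(self):
        # The single "database hit" happens only here.
        self.log.append('query executed')
        return iter([row for row in self._rows
                     if all(p(row) for p in self._predicates)])


posts = [{'id': 1, 'score': 5}, {'id': 2, 'score': -1}, {'id': 3, 'score': 9}]

q = LazyQuery(posts)
q = q.filter(lambda p: p['score'] > 0)
q = q.exclude(lambda p: p['id'] == 3)
q = q.filter(lambda p: p['id'] < 10)

hits_before = len(q.log)  # nothing has run so far
result = list(q)          # evaluation triggers one execution
hits_after = len(q.log)

print(hits_before, hits_after, result)
```

Three chained calls produce one execution, with all three conditions applied together, which is the behaviour to keep in mind when reading the toolbar's query count.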

Caching sessions

One thing to notice using the django debug toolbar is that almost all the pages have a query made to retrieve the Django session.

Figure: Before cookie-based sessions

The query that we are tackling is given below, and since it is being run on almost every page we can deal with it once and reap the benefits everywhere else.

SELECT ••• FROM "django_session" WHERE ("django_session"."session_key" = '''6z1ywga1kveh58ys87lfbjcp06220z47''' AND "django_session"."expire_date" > '''2017-05-03 10:56:18.957983''')

By default, Django stores the sessions in our database and expects us to occasionally prune out old entries which we would not be doing.

So on each request, we are doing a SQL query to get the session data and another to grab the User object information. This is really not required and we can easily avoid the session query to reduce the load on our server. But the question still remains: which store should we use? From an operational point of view, introducing a distributed data store like redis is not really required. Instead, we could simply use cookie-based sessions here.

Do note that we would not be using cookie-based sessions for highly secure sites but Charcha does not need to be highly secure.

How do we implement it?

Using cookie-based sessions is very easy to do. We can simply follow the steps given in this link. All we have to do is add the following in our settings.py (or common.py as per Charcha) and see the magic happen. We will switch to storing our session data in signed cookies and remove a SQL query from every single request to our site.

SESSION_ENGINE = "django.contrib.sessions.backends.signed_cookies"
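
The security caveat comes from how signed cookies work: the data is signed, not encrypted. The sketch below shows the principle with the standard library's hmac module. It is not Django's signing code, the cookie format is invented for this example, and SECRET_KEY here is a placeholder.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b'not-a-real-secret'  # stand-in for the project's SECRET_KEY


def sign(data):
    """Serialize data and append an HMAC signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + ':' + signature


def load(cookie):
    """Verify the signature and return the data, or raise ValueError."""
    payload, signature = cookie.rsplit(':', 1)
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError('cookie was tampered with')
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = sign({'user_id': 2})
session = load(cookie)

# Anyone holding the cookie can read it without the key...
readable = json.loads(base64.urlsafe_b64decode(cookie.split(':')[0]))

# ...but cannot change it without invalidating the signature.
forged_payload = base64.urlsafe_b64encode(
    json.dumps({'user_id': 1}).encode()).decode()
forged = forged_payload + ':' + cookie.split(':')[1]
try:
    load(forged)
    tamper_detected = False
except ValueError:
    tamper_detected = True

print(session, readable, tamper_detected)
```

So the server no longer needs a database lookup to trust the session, but nothing secret should be placed in it, since the client can read the contents.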

Now from the snapshot given below, we can see that our redundant query is removed.

Figure: After cookie-based sessions

Hence, we have accomplished our task of optimizing our site.

Summary

In this post, we have discussed how one should proceed when optimizing their site. We have tackled all the domains, discussed the various techniques to optimize a Django site and talked about tools like django debug toolbar, which can be used for a site’s assessment.

