In this post, we will look at how to assess which cache fits an application and how to use it. The main inspiration for this post was introducing a caching mechanism in the Charcha Discussion Forum. The source code for the Charcha forum is available here.

Why is this required?

Caching is one of the most useful techniques developers use to improve the performance of their applications. The challenge most sites face is that they have to serve multiple clients, and your application might have to show different data or templates to different clients based on your use-case.

Let’s take an example to explain this better. Say you are an admin of Charcha and you are authorized to see all the comments and votes given to all the blogs. Now a new user comes along, and all they are authorized to do is see the blog and the top comment. With this logic, your server has to make separate queries to the database and generate different templates based on each user’s authorization (let’s just say permissions from now onwards).

Now, this is just a simple example using Charcha; on larger sites these sorts of cases are really common. For a high traffic website, you would basically be asking your server to handle X requests, making it perform N queries to generate Y templates. We can all agree that for high traffic sites the overhead can be pretty overwhelming. To counter this we introduce caching.

What caching does is remove the burden of running the same queries again and again; instead, it stores the result and sends it directly to the client. This all sounds like pure upside, but caching does have its drawbacks too.
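To make the idea concrete, here is a minimal sketch of the "store the result and reuse it" pattern in plain Python. Everything in it is hypothetical and for illustration only: TTLCache is a toy in-process cache, and run_expensive_query stands in for the database work behind a page. It is not how Django implements its cache.

```python
import time


class TTLCache:
    """Toy in-process cache where every entry expires after a timeout."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # stale entry: drop it and report a miss
            return default
        return value

    def set(self, key, value, timeout):
        self._store[key] = (time.monotonic() + timeout, value)


query_log = []  # records every time we really hit the "database"


def run_expensive_query(post_id):
    """Hypothetical stand-in for the queries needed to build a page."""
    query_log.append(post_id)
    return {"post_id": post_id, "top_comment": "first!"}


cache = TTLCache()


def get_discussion(post_id, timeout=300):
    key = f"discussion:{post_id}"
    result = cache.get(key)
    if result is None:  # cache miss: do the work once...
        result = run_expensive_query(post_id)
        cache.set(key, result, timeout)  # ...and remember the result
    return result


get_discussion(1)
get_discussion(1)  # served from the cache, no second query
print(len(query_log))
```

The drawback is visible in the sketch as well: until the timeout passes, readers get the stored copy even if the underlying data has changed.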

Django already comes with a cache system that lets you save whole pages, but that’s not all: it also provides different levels of cache granularity. In this blog, we will discuss which cache system is best suited for us, along with the advantages and disadvantages of each. Let’s go through the caches that can be used, one level of granularity at a time.

Do note that Charcha might not really require a cache at all, especially if the maintenance cost turns out to be high. We have already introduced some caching, which was covered in one of our earlier posts.

Setting up Cache in Django

Setting up a cache in Django is exceedingly easy. All we have to do is define, in our settings.py (or common.py in Charcha), what type of caching we want to use, how long cached entries should live and where they should be stored.
Let’s tackle the levels of cache granularity that Django provides us.

1. Caching Database

Django lets us store cached data in our database. This works swimmingly if we have a fast, well-indexed database.
To set it up, we create the cache table with the command below:

$ python manage.py createcachetable

This creates the table in the format Django’s database cache expects. The command reads the table name from the LOCATION of the cache configuration, so the backend and location below need to be in our settings before we run it.

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
        'LOCATION': 'cache_data_table',
    }
}

And we are done. The database cache is not as effective as the other options, since every cache lookup is still a database query, and it would probably require a lot of adjustments once we start using a lot of tables. Since there are better options available, we are going to have a look at those.

2. Using Memcached

One of the most popular and efficient types of cache supported natively by Django is Memcached. As the name suggests, Memcached is a memory-based cache server. It can dramatically reduce the number of database queries a server has to make and, in favourable cases, speed up an application by as much as 10x.

Generally, database calls are fast but not fast enough, since the query takes CPU resources to process and data is (usually) retrieved from disk. On the other hand, an in-memory cache like Memcached takes very little CPU resources and data can be picked up directly from memory. It does not have a query language like SQL; rather, Memcached uses key-value pairs to get the data. A lookup by key is effectively O(1), whereas a query may have to scan or join many rows, which can be O(n) or worse.

There are a few ways to use Memcached. You could install it yourself (if you don’t already have it) by using the commands below:

# Install on Debian and Ubuntu
$ apt-get install memcached
# Install on Mac OS X (with Homebrew)
$ brew install memcached

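Once you have Memcached installed, it is pretty easy to use. All we have to do is point to it in settings.py and we are done. (Django also needs a Python binding for Memcached, such as python-memcached, to be installed.)

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        # Memcached listens on port 11211 unless it is started with another port
        'LOCATION': '127.0.0.1:11211',
    }
}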

One of the most popular features of Memcached is its ability to share a cache over multiple servers. That means we can run Memcached daemons on multiple machines, and the program will treat the group of machines as a single cache. How? By listing every server in LOCATION:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        # One entry per Memcached instance (example addresses)
        'LOCATION': ['172.19.26.240:11211', '172.19.26.242:11211'],
    }
}
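How a key ends up on one particular server is decided by the client library. The sketch below shows the general idea only, assuming a simple hash-modulo scheme; real Memcached clients may use a different hash function or consistent hashing. The server addresses and the server_for helper are made up for illustration.

```python
import zlib

# Example addresses only; each entry is one Memcached daemon.
SERVERS = ["172.19.26.240:11211", "172.19.26.242:11211"]


def server_for(key, servers=SERVERS):
    """Pick the server responsible for a key by hashing the key (simplified)."""
    digest = zlib.crc32(key.encode("utf-8"))  # stable across processes
    return servers[digest % len(servers)]


for key in ("post:1", "post:2", "session:abc"):
    # The same key always maps to the same server, so every web process
    # that shares this server list looks for it in the same place.
    print(key, "->", server_for(key))
```

Because every client computes the same mapping, the group of machines behaves like one large cache without the servers having to talk to each other.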

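We can also check the behavior of our cache from the Django shell using:

from django.core.cache import cache

cache.set('hello', 'world')
cache.get('hello')  # returns 'world' while the entry is still cached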

It feels like we have been going on about how great Memcached is, but it has a huge disadvantage. Because the cache lives purely in memory, if the server crashes you lose all your cached data as well, and that’s where it hits you. You basically go down with your server, and restarting it leaves Memcached with a clean slate.

3. Using Django-redis

It’s worth noting that Redis holds many advantages over Memcached, the only disadvantage being that Redis works at a lower level of granularity than Memcached. Redis offers clustering and, unlike with Memcached, support for it is provided out of the box. Being built-in makes for a more robust solution that is easier to administer. Redis is also exceedingly fast and easy to use. It uses the same key-value model as its opponent, so it’s not going to be difficult to understand. Overall, I feel that neither of these caching systems offers that big a performance improvement over the other, so it boils down to how comfortable you are with each of them.

Personally, I like Redis more as it is easier to set up and gives us a wider range of possibilities. The areas where Redis wins over its opponents are data persistence, restore and high availability. These might not be worth much to you unless your cached data is important.

So let us download Redis and see it in action using this. Alternatively, you can install Redis using the commands below:

$ wget http://download.redis.io/redis-stable.tar.gz
$ tar xvzf redis-stable.tar.gz
$ cd redis-stable
$ make

Let us run the server now using:

$ redis-server

Redis also provides us with an awesome CLI. A quick ping confirms that the server is up, and from the same CLI we can also inspect the keys that are getting stored.

$ redis-cli ping
PONG

One of the ways Django can be integrated with Redis is through django-redis, a cache backend that lets Django store its cache entries in Redis. We can install django-redis using:

$ pip install django-redis

django-redis is also easy to include, as all we have to do is add it in our settings.py. Do note that Redis runs on port 6379 by default, so that is the location we point django-redis to in our settings.py.

CACHES = {
    'default':{
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://127.0.0.1:6379/1',
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
        },
        'KEY_PREFIX': 'example'
    }
}

Charcha does not have a big overhead, so adding django-redis might seem a bit like overkill, and so would Memcached. But for implementation purposes, and for sites with heavier traffic, we want to give you an example of how to use django-redis.

Let’s see how well our application runs when we make a lot of requests at the same time. We can use loadtest here to make concurrent calls and assess the performance before and after adding django-redis.

$ loadtest -n 100 -k http://localhost:9000/discuss/1/

Before django-redis in loadtest

So the total time taken for the run is about 7 seconds. This is pretty commendable but can be optimized a bit further. We need to start analyzing why it takes this long and which views are getting called. For this, we will use django debug toolbar.

When we look at what the code is doing, we can decide what to change to improve the performance. I can see that a lot of the same queries are happening for the same requests. All I have to do is wrap the view in cache_page in urls.py (the group name post_id below is assumed to match the view’s keyword argument):

from django.views.decorators.cache import cache_page
url(r'^discuss/(?P<post_id>\d+)/$', cache_page(5*60)(views.DiscussionView.as_view()), name="discussion"),

and reap all the benefits of django-redis. You can see the output below yourself. Stand back ladies and gentlemen, benchmarking is happening.

After django-redis in loadtest

Yup, that’s right. The total time has come down to about 2 seconds. Charcha’s overhead is not that high to begin with, so whether to use django-redis here is more of a judgment call.
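To see why wrapping a view in a page cache saves so much work, here is a simplified sketch of the idea in plain Python. It is not Django’s cache_page implementation (which, among other things, also takes request headers into account); cache_view and discussion_view are hypothetical names, and the "request" is reduced to just a path string.

```python
import functools
import time


def cache_view(timeout):
    """Cache a view's response per request path for `timeout` seconds."""
    def decorator(view):
        store = {}  # path -> (expires_at, response)

        @functools.wraps(view)
        def wrapper(path):
            now = time.monotonic()
            hit = store.get(path)
            if hit is not None and hit[0] > now:
                return hit[1]  # fresh copy: skip the view entirely
            response = view(path)
            store[path] = (now + timeout, response)
            return response

        return wrapper

    return decorator


calls = []  # every time the "real" view body runs


@cache_view(5 * 60)
def discussion_view(path):
    calls.append(path)  # stands in for the queries the real view runs
    return f"<html>rendered {path}</html>"


discussion_view("/discuss/1/")
discussion_view("/discuss/1/")  # answered from the cache
print(len(calls))
```

In the load test above, 100 requests for the same page would run the view body once and be answered from the cache the other 99 times, for as long as the entry stays fresh.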

4. Using Varnish

Varnish is a caching HTTP reverse proxy. It always sits in front of your web server, be it Apache or Nginx. The way Varnish works is that it keeps copies of the responses your server sends and serves them itself, which makes it great for pages that are effectively static. The problem with Varnish is that it cannot do much for truly dynamic content, such as responses that differ for every user or on every request. If your site is not that dynamic, you can make good use of Varnish, but you should first identify which views can be cached and then cache them. Second, identify how long a delay you can tolerate for someone seeing stale content. The best thing about Django and Varnish is how well they fit together.
Setting up Varnish is also really easy to do:

# Install on Debian and Ubuntu
$ apt-get install varnish
# Install on Mac OS X (with Homebrew)
$ brew install varnish

Let’s take a page and see its performance and how it is working (for this we will take the discussion page).

Before Varnish

How will Varnish help? What does it do? Basically, Varnish is going to sit between your Django application and your users. So, all we have to do now is add the cache_page decorator to our view: besides caching the response in Django, cache_page sets the Cache-Control and Expires headers on the response, and Varnish uses those to decide how long it may keep serving its own copy. Let’s try applying it to upvote_post and see what happens.

from django.views.decorators.cache import cache_page

@cache_page(60)
def upvote_post(request, post_id):
    ...

After Varnish

What just happened? Well, Varnish passed the first request on to upvote_post and, when the view responded, kept a copy of the response. The next time we made the same call, Varnish sent back that copy without the request ever reaching the view.
To be safer, we could make the cached response depend on the request’s Cookie header, so that one user is not served a response that was cached for another.
This entire implementation is what is called the per-view cache. In layman’s terms, we are storing/caching the responses from the views individually.
Varnish also has its own configuration language, VCL. It can be used to normalize requests for endpoints whose responses do not differ based on the user’s authentication. How?

sub vcl_recv {
	// Urls can be stripped of all the cookies
	if(req.url == "/" || req.url ~ "^/comments/") {
		unset req.http.Cookie;
	}
}

We can go ahead with Varnish for now and start caching the views at least. Varnish has extensive documentation and I urge you to read up on it.

Summary

Caching, as previously mentioned, is a way of reducing the server load and improving the performance of your application. Although most caches work assiduously to reduce the load on your server, one must always keep in mind the overhead of the cache itself.

Charcha might not require as many levels of cache as high traffic sites do. If Charcha were to become something like Stack Overflow, we could start adding caches to reduce the server response time, for example using Redis/django-redis as a store for objects and the results of DB queries, and Varnish for serving our static pages. It really depends on your use-case and what you require your cache to do.

For now, in Charcha, the caching implementation described in our previous post will do just fine.

This is the seventh post in the Django blog series. In this post we will learn how to optimize a Django application’s performance. The main inspiration for this post was the performance improvement of the Charcha Discussion Forum. The source code for the Charcha forum is available here.

For this post, like all preceding posts in this series, we are going to use Charcha’s codebase as an example and optimize its pages. As the scope for optimization is exceedingly wide, we will tackle the domains one by one. The main focus is on getting the most out of the back-end (it is a Django application at the end of the day), and the measurement will be done through django debug toolbar.

We will proceed by looking at the performance of the application and seeing where we can optimize it, domain by domain. To assess the site we will use a bunch of tools, the most important being django debug toolbar and Chrome’s network tab. We are going to hit the front-end portion first and move forward from there.

Optimizing the Front-end:

Typically, when you are looking to optimize an application on the front-end, you should look at the entire stack to understand how the app is working. This approach should be followed in all cases, as it gives you a better perspective on how things are functioning.

Charcha uses server-side rendering instead of client-side rendering. This on its own solves a lot of problems for us. How? With client-side rendering, your request loads the page markup, its CSS, and its JavaScript, but the content is not loaded along with them. Instead, the JavaScript loads, makes another request, gets a response and only then generates the required HTML. With server-side rendering, your initial request loads the page, layout, CSS, JavaScript, and content. So basically, the server does everything for you.

The moment I got to know that Charcha renders its templates on the server, my focus turned towards looking only at caching and minification of the pages (to be honest, these should be present in all applications).

Caching Approach:

Before we start writing any code, I want you to understand how caching works with server-side rendering and how powerful it really is. As you may know, whenever there is an endpoint which returns data, it can generally be cached. So the main question is: can we cache HTML? Yes, by rendering on the server we can easily cache our output.

So what’s going to happen is that the first time, your client gets a response and that response is cached. The next time the same request is made, not only does your browser not have to fetch and render everything from scratch, your server does not have to do the work again either. Now that we have an approach for caching we can start implementing it. For serving and caching our static files, we are going to use whitenoise. To integrate whitenoise in our Django app, we just have to follow these steps:

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.10/howto/static-files/
STATIC_ROOT = os.path.join(PROJECT_ROOT, 'staticfiles')
MIDDLEWARE_CLASSES = [
    'django.middleware.security.SecurityMiddleware',
    'whitenoise.middleware.WhiteNoiseMiddleware',
    ...
]

Now whitenoise is going to start serving all your static files but our main purpose for caching is still not done. One thing to note here is that whitenoise also supports compression which we can use.

# Simplified static file serving.
# https://warehouse.python.org/project/whitenoise/
STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'
STATIC_HOST = os.environ.get('DJANGO_STATIC_HOST', '')
STATIC_URL = STATIC_HOST + '/static/'

INSTALLED_APPS = [
    ...
    # Disable Django's own staticfiles handling in favour of WhiteNoise, for
    # greater consistency between gunicorn and `./manage.py runserver`. See:
    # http://whitenoise.evans.io/en/stable/django.html#using-whitenoise-in-development
    'whitenoise.runserver_nostatic',
    'django.contrib.staticfiles',
    ...
]


def cache_images_forever(headers, path, url):
    """Force images to be cached forever"""
    tokens = path.split(".")
    if len(tokens) > 1:
        extension = tokens[-1].lower()
        if extension in ('png', 'jpg', 'jpeg', 'ico', 'gif'):
            headers['Cache-Control'] = 'public, max-age=315360000'


WHITENOISE_ADD_HEADERS_FUNCTION = cache_images_forever

Now our caching for the front-end is complete. Let’s move on to the second phase of front-end optimization and try to minify our files.

Minifying the Files:

For minifying our files we are going to use the spaceless template tag. Although this will not minify our files per se, it still gives us a better result, as it helps us reduce our page weight. How? Well, the spaceless template tag removes whitespace between HTML tags, including tab characters and newlines. This is really easy to implement in Django templates. All we need to do is open a {% spaceless %} tag and close it with {% endspaceless %} in our base.html. As our base.html is used everywhere as the basic skeleton, we can be sure that it will be applied to all the templates that extend it as well.
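To get a feel for what this does to a page, here is a small Python sketch that approximates the effect with a regular expression. The function name and the sample markup are only for illustration; inside a template you would still use the tag itself.

```python
import re


def strip_spaces_between_tags(html):
    """Remove whitespace that sits between a closing '>' and the next '<'."""
    return re.sub(r">\s+<", "><", html.strip())


page = """
<ul>
    <li>First post</li>
    <li>Second   post</li>
</ul>
"""

compact = strip_spaces_between_tags(page)
print(compact)
print(len(page), "->", len(compact), "characters")
```

Note that only the whitespace between tags goes away. Whitespace inside text, such as the extra spaces in "Second   post", is left alone, which is why this reduces page weight without changing what the reader sees.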

Now that we are done with our front-end optimization let’s move to our back-end. This is perhaps the one place where we would be able to achieve the maximum amount of efficiency.

Optimizing the Back-end:

Ideally, the flow for optimizing your code is to move from your queries to how your Django application is functioning. For queries, we need to look at the number of queries, the amount of time each takes to execute and how the Django ORM is generating them. Internally, Django already does a lot for us, but we can optimize its queries even further.
We need to start scavenging through the code and check the total number of queries per page. There are a few ways to do this. The most obvious one is logging all the queries in sql.log and reading them from there. The other way is to use django debug toolbar.

Now, django debug toolbar is a really powerful tool and extremely easy to use. It helps in debugging responses and provides us with a bunch of configurable panels. We are mainly going to use it to see how quick our queries are and whether we are performing any redundant ones.

Django Debug Toolbar

So let’s first integrate the toolbar in Charcha. To install django debug toolbar we are going to follow this doc. It’s pretty straightforward and easy to implement. First we need to install the package using pip.

$ pip install django-debug-toolbar

One thing to note here is that if you are using a virtualenv, avoid using sudo; ideally we should not be using sudo with pip anyway. Now django debug toolbar is installed. Next we need to add debug_toolbar to INSTALLED_APPS in our settings.py.

INSTALLED_APPS = [
    ...
    'django.contrib.staticfiles',
    ...
    'debug_toolbar',
    ...
]

Now we need to put the debug toolbar in our middleware. The order of middleware matters here: the toolbar’s documentation recommends placing it as early as possible in the list, but after any middleware that encodes the response’s content, such as GZipMiddleware.

MIDDLEWARE = [
    # ...
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    # ...
]

Now we need to add the toolbar’s URLs. We only include them when DEBUG is on; django debug toolbar takes care of the rest on its own.

from django.conf import settings
from django.conf.urls import include, url

if settings.DEBUG:
    import debug_toolbar
    urlpatterns = [
        url(r'^__debug__/', include(debug_toolbar.urls)),
    ] + urlpatterns

There are also a few configuration options that we could use, but for now we are going to stick with the defaults. The one setting we add tells the toolbar not to patch our settings automatically, since we have done the setup by hand above (with the correct version this should be done automatically).

DEBUG_TOOLBAR_PATCH_SETTINGS = False

Also, since we are running locally, we need to set INTERNAL_IPS so that django debug toolbar knows to show itself for requests from our machine.

INTERNAL_IPS = ('127.0.0.1',)

And now we test. Our screen should look something like the image below.

Django Debug Toolbar in Action

We can now start checking the number of queries that are being run and how much time they take to execute.

This part is actually fairly easy, as all we have to do is set a benchmark based on the amount of data that we have. We also need to consider the effect at scale: if we have a lot of data spread across a lot of foreign keys, we might need to start using indexes there.

To show how we go about refactoring the code, we went ahead and looked at the performance of all the pages. Most of the refactoring is about reducing the number of queries. As an example we are going to look at the discussion page.

Django Debug Toolbar in the discussion page

Now, according to django debug toolbar, we are making 8 queries whenever we make a call to see the post. This would, later on, start giving us problems if we don’t eliminate a few of them. We are going to approach optimizing the queries in subparts, as follows:

Solving N+1 queries:

We can solve the problem of N+1 queries simply by using prefetch_related and select_related in Charcha. With these two functions we can get a tremendous performance boost. But first, we need to understand what exactly they do and how we can use them in Charcha.

select_related should be used when the thing you are selecting is a single object of a model, so something like a ForeignKey or a OneToOneField. Whenever you make such a call, select_related does a join of the two tables and sends back the result, thereby saving the extra query to the ForeignKey table.
Let’s see an example to better understand how we can integrate this in Charcha.
We are going to take the example of the Post model, which looks something like this:

class Post(Votable):
    ...
    objects = PostsManager()
    title = models.CharField(max_length=120)
    url = models.URLField(blank=True)
    text = models.TextField(blank=True, max_length=8192)
    submission_time = models.DateTimeField(auto_now_add=True)
    num_comments = models.IntegerField(default=0)
    ...

Do note that we have a custom manager defined, so whichever query we need to execute we can define in our manager. At first glance, you can see that our Post class inherits from Votable, so we now need to see what is happening in that class.

class Votable(models.Model):
    ...
    votes = GenericRelation(Vote)
    author = models.ForeignKey(User, on_delete=models.PROTECT)
    ....

Whenever we access the author of our Post we will be making an extra call to the database. So now we go to our custom manager and change the way we are fetching the data.

class PostsManager(models.Manager):
    def get_post_with_my_votes(self, post_id, user):
        post = Post.objects\
            .annotate(score=F('upvotes') - F('downvotes'))\
            .get(pk=post_id)
        ...
        return post

If we use django debug toolbar, we can see that whenever we make a call like get_post_with_my_votes().author, we execute an extra query on the User table.

Before select_related

This is not required and can be rectified easily by using select_related. What do we have to do? Just add select_related to the query.

class PostsManager(models.Manager):
    def get_post_with_my_votes(self, post_id, user):
        post = Post.objects\
            .annotate(score=F('upvotes') - F('downvotes'))\
            .select_related('author').get(pk=post_id)
        ...
        return post

And that’s it. Our redundant query should be removed. Let’s check it using django debug toolbar.

After select_related

We can use prefetch_related when we are going to get a set of things, so basically something like a ManyToManyField or a reverse ForeignKey. How does this help? Well, instead of a join, prefetch_related makes one additional query per relationship and matches the results up in Python, so we avoid both a query per object and the duplicated columns a join would return. This is not really required in Charcha as of now, so we are going to let it be.

Query in a loop:

Though this is not done anywhere in Charcha, it is a practice which a lot of people follow without realising its impact. Although it seems pretty basic, I still feel that it deserves its own section.

author_ids = Post.objects.filter(...).values_list('author_id', flat=True)
authors = []
for author_id in author_ids:
    # Disaster: with a lot of posts, say around 10,000, we would be
    # making that many calls, basically (n+1) queries in total.
    author = User.objects.get(id=author_id)
    authors.append(author)

The above example is just a sample of how disastrous a query in a loop can be for a large dataset, and it can easily be solved using what we discussed previously (select_related and prefetch_related).
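To see the difference in numbers without a Django project at hand, here is a small self-contained sketch using Python’s built-in sqlite3 module. The table names and data are made up for illustration, and the run helper only exists to count the statements we send. The JOIN is roughly the kind of query that select_related asks the database for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO auth_user VALUES (1, 'asha'), (2, 'ravi');
    INSERT INTO posts VALUES (1, 'Hello', 1), (2, 'Caching', 2), (3, 'Indexes', 1);
""")

executed = []  # every SQL statement we send, so we can count them


def run(sql, params=()):
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Query in a loop: 1 query for the posts + 1 query per post for its author.
looped = []
for title, author_id in run("SELECT title, author_id FROM posts ORDER BY id"):
    rows = run("SELECT username FROM auth_user WHERE id = ?", (author_id,))
    looped.append((title, rows[0][0]))
loop_queries = len(executed)

# One JOIN instead: the database pairs each post with its author for us.
executed.clear()
joined = run("""
    SELECT posts.title, auth_user.username
    FROM posts JOIN auth_user ON auth_user.id = posts.author_id
    ORDER BY posts.id
""")
join_queries = len(executed)

print(loop_queries, "queries in a loop vs", join_queries, "query with a join")
```

With three posts the loop already needs four queries against one for the join, and the gap grows by one query for every additional post.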

Denormalization:

I recommend using denormalization only if we have actual performance issues, rather than optimizing prematurely. We should check the queries before we start denormalizing, as it does not make sense to keep adding complexity if it is not improving our performance.

The best place to explain the denormalization implemented in Charcha is the Votable model, as done in the 2nd blog of this series. Why the Votable model? Because we want to show the score (i.e. upvotes – downvotes) and the comments on the home page. We could make a join on the votes table and compute it from there, but that would slow down our system. Instead, we add the fields upvotes, downvotes and flags to the Votable model. This in turn reduces our calls.

class Votable(models.Model):
    ...
    upvotes = models.IntegerField(default=0)
    downvotes = models.IntegerField(default=0)
    flags = models.IntegerField(default=0)
    ...

Now we can just inherit these properties in whichever models require them and move forward from there. Although this seems pretty easy to implement, it does have a drawback. Every time a vote changes, we have to update these fields. So instead of making a JOIN on every read, we have a more complex UPDATE on every write.
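The trade-off can be sketched with Python’s built-in sqlite3 module. The tables and the cast_vote and score helpers below are hypothetical and much simpler than Charcha’s models; the point is only that each write now touches two tables, while reading the score needs no join at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY,
                       upvotes INTEGER DEFAULT 0,
                       downvotes INTEGER DEFAULT 0);
    CREATE TABLE vote (post_id INTEGER, voter_id INTEGER, value INTEGER);
    INSERT INTO post (id) VALUES (1);
""")


def cast_vote(post_id, voter_id, value):
    """Record the vote and keep the denormalized counters in step."""
    column = "upvotes" if value > 0 else "downvotes"
    with conn:  # both statements are committed together, or neither is
        conn.execute("INSERT INTO vote VALUES (?, ?, ?)",
                     (post_id, voter_id, value))
        conn.execute(f"UPDATE post SET {column} = {column} + 1 WHERE id = ?",
                     (post_id,))


def score(post_id):
    """Read the score straight from the post row: no JOIN, no COUNT."""
    up, down = conn.execute(
        "SELECT upvotes, downvotes FROM post WHERE id = ?", (post_id,)
    ).fetchone()
    return up - down


cast_vote(1, voter_id=10, value=1)
cast_vote(1, voter_id=11, value=1)
cast_vote(1, voter_id=12, value=-1)
print(score(1))
```

Wrapping the two writes in one transaction matters: if the counters and the vote rows ever drift apart, the cheap read starts returning wrong scores.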

WBS for Tracking Comments Hierarchy

This is also a form of denormalization. Each comment needs a reference to its parent comment, and so on, so the comments basically form a tree structure.

class Comment(Votable):
    ...
    wbs = models.CharField(max_length=30)
    ...

The problem here is that self-referential queries are exceedingly slow. So we can refactor this approach and add a new field called wbs, which helps us track the comments as a tree. How does it work? Every comment has a code, which is a dotted path. The first comment has the code “.0001”, the second top-level comment has the code “.0002”, and so on. If someone responds to the first comment, the reply gets the code “.0001.0001”. This lets us avoid self-referential queries and use wbs instead.

The limitation of this field is that we only allow 9999 comments at each level and the wbs tree can only be 6 levels deep (6 levels of 5 characters each fill the 30 characters of the field), which is acceptable in our case. For a large database, though, we would have to index this field as well. We will discuss this in the next section.
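Here is a small sketch of how such codes behave, in plain Python. The child_wbs and subtree helpers and the sample comments are hypothetical and not taken from Charcha’s code; they only illustrate why a dotted path makes tree queries cheap.

```python
def child_wbs(parent_wbs, sibling_number):
    """Build the code of the n-th reply under parent_wbs ('' for top level)."""
    if not 1 <= sibling_number <= 9999:
        raise ValueError("only 9999 comments are allowed per level")
    return f"{parent_wbs}.{sibling_number:04d}"


comments = {
    ".0001": "first top-level comment",
    ".0001.0001": "reply to the first comment",
    ".0001.0001.0001": "reply to that reply",
    ".0001.0002": "second reply to the first comment",
    ".0002": "second top-level comment",
}


def subtree(wbs):
    """Return a comment and all of its descendants, in display order."""
    return [code for code in sorted(comments)
            if code == wbs or code.startswith(wbs + ".")]


print(child_wbs("", 3))         # a new top-level comment
print(child_wbs(".0001", 3))    # a new reply to the first comment
print(subtree(".0001"))
```

Because the codes are zero-padded, sorting them as plain strings already gives the threaded display order, and fetching a whole subtree becomes a prefix match (in SQL, something like a LIKE '.0001.%' filter) instead of a recursive walk over parent references.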

Adding Indexes

Indexes are one of the many standard DB optimization techniques and Django provides us with tools to add them. Once we have identified which queries are taking a long time, we can use Field.db_index or Meta.index_together to add indexes from Django.

Before we start adding indexes, we need to identify where we should add them, and to do that we will use django debug toolbar to see how fast we get our response. We are going to look at the post page which we had before and track its queries. We are going to pick a query which we can easily optimize with an index (given below).

Before indexes

SELECT ••• FROM "votes" WHERE ("votes"."type_of_vote" IN ('1', '2') AND "votes"."voter_id" = '2' AND "votes"."content_type_id" = '7' AND "votes"."object_id" = '1')

Now, this particular query is taking 1.45 seconds to execute. All we have to do is look at the table and decide which fields to add the index on. Since this query belongs to the Vote model, we are going to add the index on content_type_id and object_id. How?

class Vote(models.Model):
    class Meta:
        db_table = "votes"
        index_together = [
            ["content_type", "object_id"],
        ]

And that’s all we have to do. Now we run our migrations and check our performance.

After indexes

Now, this query is taking only 0.4 seconds to execute and that is how easy it is to implement indexes.
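If you want to see what an index changes without setting up the whole project, the sketch below uses Python’s built-in sqlite3 module and asks the database for its query plan before and after creating a composite index. The table is a simplified, hypothetical version of the votes table, and the exact wording of the plan varies between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY,
        type_of_vote INTEGER,
        voter_id INTEGER,
        content_type_id INTEGER,
        object_id INTEGER
    )
""")

QUERY = ("SELECT id FROM votes "
         "WHERE content_type_id = ? AND object_id = ? AND voter_id = ?")


def plan(sql, params):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the description


before = plan(QUERY, (7, 1, 2))

# The same pair of columns that index_together names on the model.
conn.execute("CREATE INDEX votes_ct_obj ON votes (content_type_id, object_id)")

after = plan(QUERY, (7, 1, 2))

print("before:", before)  # a scan of the whole table
print("after: ", after)   # a search that uses votes_ct_obj
```

Without the index the database has to look at every row of the table; with it, the database can jump straight to the rows for one content type and object, which is where the drop in query time comes from.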

Django QuerySets are LAZY

One thing to note when using Django is that querysets are lazy. This means that creating a queryset does not involve any database activity; Django only runs the query when the queryset is evaluated. So if we make a call from the Post model like

q = Post.objects.filter(...)
q = q.exclude(...)
q = q.filter(...)

it looks like three separate queries, but in fact nothing has hit the database yet. Django runs a single query, combining all three conditions, only when q is evaluated, for example when we iterate over it.
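A minimal sketch of lazy evaluation in plain Python may help here. LazyQuery is a hypothetical toy class, not Django’s QuerySet: chaining only records the conditions, and the "database" (here just a list of numbers) is touched when the object is iterated.

```python
class LazyQuery:
    """Toy lazy query: chaining records conditions, iteration runs them."""

    def __init__(self, rows, predicates=(), log=None):
        self._rows = rows
        self._predicates = predicates
        self.log = log if log is not None else []  # shared across the chain

    def filter(self, predicate):
        # Nothing is evaluated here; we only remember the condition.
        return LazyQuery(self._rows, self._predicates + (predicate,), self.log)

    def exclude(self, predicate):
        return self.filter(lambda row: not predicate(row))

    def __iter__(self):
        self.log.append("query executed")  # the one place work happens
        return iter([row for row in self._rows
                     if all(p(row) for p in self._predicates)])


q = LazyQuery(list(range(10)))
q = q.filter(lambda n: n % 2 == 0)
q = q.exclude(lambda n: n > 6)
q = q.filter(lambda n: n > 0)
queries_before_iteration = len(q.log)

result = list(q)  # evaluation happens here, once
print(queries_before_iteration, len(q.log), result)
```

The practical upshot is that building a queryset up step by step costs nothing extra, so it is fine to compose filters across several lines or functions.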

Caching sessions

One thing to notice using django debug toolbar is that almost all the pages make a query to retrieve the Django session.

Before cookie based sessions

The query that we are tackling is given below. Since it is being run on almost every page, we can take care of it once and reap the benefits everywhere else.

SELECT ••• FROM "django_session" WHERE ("django_session"."session_key" = '''6z1ywga1kveh58ys87lfbjcp06220z47''' AND "django_session"."expire_date" > '''2017-05-03 10:56:18.957983''')

By default, Django stores sessions in our database and expects us to occasionally prune out old entries, which we would not be doing.

So on each request, we are doing one SQL query to get the session data and another to grab the User object. This is really not required and we can easily cut it down to reduce the load on our server. But the question remains: which store should we use? From an operational point of view, introducing a distributed data store like Redis is not really warranted here. Instead, we can simply use cookie-based sessions.

Do note that we would not use cookie-based sessions for highly secure sites, but Charcha does not need to be highly secure.

How do we implement it?

Using cookie-based sessions is very easy. We can simply follow the steps given in this link. All we have to do is add the following in our settings.py (or common.py as per Charcha) and see the magic happen. We will switch to storing our session data in a signed cookie, which removes a SQL query from every single request to our site.

SESSION_ENGINE = "django.contrib.sessions.backends.signed_cookies"

Now from the snapshot given below, we can see that our redundant query is removed.

After cookie-based sessions
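To get an intuition for why a signed cookie can replace the session table, here is a simplified sketch using only Python’s standard library. It is not Django’s actual signing format; SECRET_KEY, dump_session and load_session are hypothetical names used for illustration.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"not-a-real-secret"  # hypothetical; keep the real one private


def dump_session(session):
    """Serialize the session and append a signature only the server can make."""
    payload = base64.urlsafe_b64encode(
        json.dumps(session, sort_keys=True).encode("utf-8")
    ).decode("ascii")
    mac = hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return payload + ":" + mac


def load_session(cookie):
    """Verify the signature before trusting anything inside the cookie."""
    payload, _, mac = cookie.rpartition(":")
    expected = hmac.new(SECRET_KEY, payload.encode("ascii"),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("session cookie was tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = dump_session({"user_id": 2})
print(load_session(cookie))  # no database lookup needed
```

The server can trust the data because it can check the signature, so no lookup is needed. The data is signed but not encrypted, though, so the client can read it, which is one reason this approach is not a good fit for highly secure sites.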

Hence, we have accomplished our task of optimizing our site.

Summary

In this post, we have discussed how one should proceed when optimizing a site. We have tackled the domains one by one, discussed various techniques to optimize a Django site and talked about tools, like django debug toolbar, which can be used for assessing a site.

AngularJS is a popular JavaScript-based open-source front-end web application framework that has a lot of its own optimizations. It is a huge framework that can meet many of the challenges of developing single-page applications. However, there are cases where incorrect use of Angular methods may backfire and cause your application to start lagging. Below are a few pointers which might improve the performance of your Angular projects:

1. Watchers:

I am starting with this because I feel it is the first place any Angular expert would suggest you look. If your Angular page is slow and laggy, then either you are using too many watchers or you are making them do too much work.

Let me give you a summary of how watchers work. Angular uses dirty checking to keep track of all the changes in the application. This means that it has to go through every watcher to check whether it needs to be updated (this pass is the digest cycle). If one watcher relies on another, Angular has to run the digest loop again to make sure that all of the changes are accounted for, and it will continue to do so until everything is stable.

 

There is always a workaround, but first you need to identify your application's watchers. If only one screen is slow, then most probably that screen has a lot of bad scopes, too many watchers, or handlers that are taking too long. If everything is slow, then I have bad news for you: your digest loop is taking too long and you are most probably using Angular wrong.

 

One useful tool for identifying watchers is Batarang (I highly recommend it). You can also try timing your digest cycle.

console.time('digest');
$rootScope.$digest();
console.timeEnd('digest');

Make sure that your watches are not taking too long. Either split up the functions they call or reduce the number of events that trigger them.

 

Also, try to avoid using deep watches. But what if your deep watches are required because your business logic demands it?

Well, in many cases you can switch from a deep watch to $watchCollection, which only checks the top-level properties of the watched object. How?
You'll have to change your watches to $watchCollection, perhaps something like:

$scope.$watch('something._var1', function () {}, true);
$scope.$watch('something._var2', function () {}, true);
// Use $watchCollection instead
$scope.$watchCollection('something', function () {});
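To make the trade-off concrete, here is a minimal sketch in plain JavaScript of the two comparison strategies. It is a simplified model and not Angular's source code: deepEqual and shallowEqual are helper names made up for this illustration. The point it tries to show is that a shallow check is cheaper but will not notice a change buried inside a nested object.

```javascript
// Simplified model of the two strategies; not Angular's implementation.

// Deep comparison: walks the whole object graph, like a deep $watch has to.
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  var keysA = Object.keys(a), keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(function (key) {
    return Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key]);
  });
}

// Shallow comparison: only compares top-level properties by reference,
// which is the spirit of $watchCollection.
function shallowEqual(a, b) {
  var keysA = Object.keys(a), keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(function (key) {
    return Object.prototype.hasOwnProperty.call(b, key) && a[key] === b[key];
  });
}

var something = { _var1: 'a', _var2: { count: 1 } };

// Snapshots of the "old value" that each strategy would keep around.
var shallowSnapshot = Object.assign({}, something);
var deepSnapshot = JSON.parse(JSON.stringify(something));

// A change inside a nested object.
something._var2.count = 2;
var shallowSeesNested = !shallowEqual(shallowSnapshot, something);
var deepSeesNested = !deepEqual(deepSnapshot, something);

// A change to a top-level property.
something._var1 = 'b';
var shallowSeesTopLevel = !shallowEqual(shallowSnapshot, something);

console.log(shallowSeesNested);   // false
console.log(deepSeesNested);      // true
console.log(shallowSeesTopLevel); // true
```

So before switching, check that the changes you care about happen on the top-level properties of the watched object.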

 

2. ng-repeat:

Everybody uses this, but hardly anybody looks at the edge cases. We usually tend to work with 10-15 items and rarely test with 1000 items. With that many items, your application might start breaking.

 

The simplest workaround is to use infinite scrolling or pagination. If you want an alternative, look at the timeline of your page: there is a possibility that you are using some widget, something like ui-select, which takes too much time to render.

Also, try using track by wherever possible (for example track by $index, or a unique property of the item). Why? Well, let me give a real-world use-case. Let's say you have a refresh button which fetches new items for you, something like:

 $scope.someList = fetchNewItemsFromServer();

Now, whenever we hit the refresh button and fill the $scope.someList variable with some data, ng-repeat removes all the li elements of the list and creates them again. Why does ng-repeat do this? Behind the scenes, ng-repeat adds a $$hashKey property to each item so as to keep track of it. Even if we update $scope.someList with the same data, ng-repeat will still trigger all the DOM manipulations again, because the new objects never carry the same $$hashKey. There are a few hacks, but the best solution to this problem is track by. It allows you to specify your own key for ng-repeat to identify objects by, instead of just generating unique IDs. So if you are getting the same elements, ng-repeat will not recreate all the DOM elements again and again.
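In a template this would look something like ng-repeat="item in someList track by item.id", assuming every item has a unique id property (an assumption made for this example). The sketch below is a rough plain JavaScript model of why the key matters. It is not how ng-repeat is implemented; renderList, byIdentity and byId are hypothetical names, and the fetchNewItemsFromServer here is a stand-in that returns made-up data.

```javascript
// Rough model of keyed element reuse; not Angular's implementation.
function renderList(previousNodes, items, keyOf) {
  var nextNodes = {};
  var created = 0;
  items.forEach(function (item) {
    var key = keyOf(item);
    if (Object.prototype.hasOwnProperty.call(previousNodes, key)) {
      nextNodes[key] = previousNodes[key]; // reuse the existing node
      nextNodes[key].item = item;
    } else {
      nextNodes[key] = { item: item };     // stands in for creating an <li>
      created += 1;
    }
  });
  return { nodes: nextNodes, created: created };
}

// Stand-in for the server call: fresh objects every time, same ids.
function fetchNewItemsFromServer() {
  return [{ id: 1, name: 'first' }, { id: 2, name: 'second' }];
}

// Key by object identity, roughly what an auto-generated $$hashKey amounts to.
var nextIdentity = 0;
var identities = new WeakMap();
function byIdentity(item) {
  if (!identities.has(item)) {
    identities.set(item, 'object:' + (nextIdentity++));
  }
  return identities.get(item);
}

// Key by a stable property, like "track by item.id".
function byId(item) {
  return 'id:' + item.id;
}

var firstByIdentity = renderList({}, fetchNewItemsFromServer(), byIdentity);
var refreshByIdentity = renderList(firstByIdentity.nodes, fetchNewItemsFromServer(), byIdentity);

var firstById = renderList({}, fetchNewItemsFromServer(), byId);
var refreshById = renderList(firstById.nodes, fetchNewItemsFromServer(), byId);

console.log(refreshByIdentity.created); // 2
console.log(refreshById.created);       // 0
```

With identity-based keys every refresh rebuilds both nodes, while the stable key lets the second render reuse them.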

3. ng-show/ng-hide:

This is perhaps the easiest thing to fix to improve your page's performance. What do we do? Use ng-if instead of ng-show or ng-hide. Why would this help? The reason is quite simple: ng-show renders the element (the one you added ng-show to) and uses display:none to hide it. What does ng-if do? ng-if actually removes the element from your DOM and re-creates it when it is required. Do note that you might want to take a hard look at your use-case and see whether ng-if is beneficial. For something that is toggled on and off often, ng-show may still be the better fit.

 

4. DOM access:

If you are fond of accessing the DOM using angular.element('something'), you should know that this is really expensive. Since I am already talking about the DOM: you get extra points for keeping it small and negative points for keeping it huge. It is also worth mentioning that whenever you restructure your DOM, in whichever language, you should never use inline styles. This is because of reflow in the browser. What is reflow? Reflow is the (re)calculation of the positions and sizes of the elements on your page. Just avoid reflow if you can.

5. Objects and Arrays:

I believe I read somewhere that arrays are always faster than objects. Well, that is true if you are iterating over them. But if you are iterating just to get one particular element, then we should probably have another look at that silly thing we learnt in our college days. Remember? Looking up a key in an object (or dictionary) is O(1) on average, which is any day better than the O(n) scan an array needs. So we can store the data as key-value pairs and get a big speed-up on large collections just like that. This whole optimization depends on the developer and how they want their data to be represented.
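As a small sketch of the idea, assuming a list of records that each carry a unique id (the users data and the helper names findInArray and indexById are made up for this example), we can build an index once and then look items up by key:

```javascript
// Hypothetical data: a list of users as it might arrive from an API.
var users = [
  { id: 'u1', name: 'Asha' },
  { id: 'u2', name: 'Ben' },
  { id: 'u3', name: 'Chen' }
];

// O(n): scans the array on every lookup.
function findInArray(list, id) {
  for (var i = 0; i < list.length; i++) {
    if (list[i].id === id) {
      return list[i];
    }
  }
  return undefined;
}

// Build the index once (O(n)); each lookup afterwards is O(1) on average.
function indexById(list) {
  var index = Object.create(null); // no inherited keys to collide with
  list.forEach(function (item) {
    index[item.id] = item;
  });
  return index;
}

var usersById = indexById(users);

console.log(findInArray(users, 'u3').name); // Chen
console.log(usersById['u3'].name);          // Chen
```

The index only pays off when you do many lookups, and it has to be rebuilt or updated whenever the list changes.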

 

6. Filters:

So you remember that ng-repeat which we talked about above? Well, it has a cousin, filter, which might be slowing your application down. How? Filters run twice per digest cycle: once when anything changes, and another time to collect further changes. They do not actually remove any part of the collection from memory either, instead simply masking filtered items with CSS. Even I wasn't too certain about this, but upon further investigation I found that this actually happens and might literally break your application if you are not careful enough.

 

7. Debounce:

This is probably the latest love of mine. When optimizing your Angular application, you should always try to debounce. What is this debounce I speak of? Well, debounce is a function which limits the number of events that get triggered. For example, let's say I have some sections and a sidebar with the list of sections. Every time I scroll to a section, my sidebar should highlight that section and remove the highlight from the previously selected one. Let's say I use $(window).on to trigger an event every time I scroll (bear with me, this is just an example; I know there are better ways, but we want it to be dynamic). So my handler (the one passed to $(window).on) should calculate the section height, un-highlight the previously selected sidebar element and highlight the section the user is at, and it will be triggered every time the user scrolls.

 

This would work for a few sections, but what happens when I have 100 sections and 100 section items in my sidebar? I'll tell you. This very function hits your application in the stomach and makes it super laggy.

 

How do we solve this problem? DEBOUNCE IT!! Basically, debounce makes sure that you are not recalculating the value every time you scroll, but waiting for some interval and then firing the event.

 

How? Well, you can wrap your function in something like this:

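// Returns a wrapper that delays calls to myFunc until `wait` ms have passed
// without a new call. If `immediate` is set, myFunc fires on the first call
// of a burst instead of after the last one.
this.debounce = function(myFunc, wait, immediate) {
    var timeout;
    return function() {
      var context = this, args = arguments;
      var applyMyFunction = function() {
        myFunc.apply(context, args);
      };
      var later = function() {
        timeout = null;
        if (!immediate) {
          applyMyFunction();
        }
      };
      var makeCall = immediate && !timeout;
      // Restart the timer on every call.
      clearTimeout(timeout);
      timeout = setTimeout(later, wait);
      if (makeCall) {
        applyMyFunction();
      }
    };
  };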

What this does is fire the wrapped function at a limited rate rather than on every single event. Want to know the beauty of debounce? You can use it on ng-models or on anything else that is getting fired like crazy. This usually gives a huge performance boost. Psst.. React guys, you can use this logic as well.
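Here is a small usage sketch of the same logic as a standalone function, so that it runs outside Angular. The handler highlightCurrentSection, the 50 ms wait and the loop that simulates a burst of scroll events are all assumptions made for this example.

```javascript
// Standalone copy of the debounce logic from the post.
function debounce(myFunc, wait, immediate) {
  var timeout;
  return function () {
    var context = this, args = arguments;
    var later = function () {
      timeout = null;
      if (!immediate) {
        myFunc.apply(context, args);
      }
    };
    var makeCall = immediate && !timeout;
    clearTimeout(timeout);
    timeout = setTimeout(later, wait);
    if (makeCall) {
      myFunc.apply(context, args);
    }
  };
}

// Hypothetical handler: counts how often the expensive work actually runs.
var highlightCount = 0;
var lastPosition = null;
function highlightCurrentSection(scrollPosition) {
  highlightCount += 1;
  lastPosition = scrollPosition;
}

var onScroll = debounce(highlightCurrentSection, 50);

// Simulate a burst of 100 scroll events arriving back to back.
for (var position = 0; position < 100; position++) {
  onScroll(position);
}

// Only the last call of the burst survives the timer resets.
setTimeout(function () {
  console.log(highlightCount, lastPosition); // 1 99
}, 200);
```

In a browser you would register onScroll as the scroll listener instead of calling it in a loop. For ng-model specifically, AngularJS 1.3 and later also has a built-in option, ng-model-options="{ debounce: 300 }", which may save you from writing the wrapper yourself.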

 

8. Digest cycle:

Sometimes, while using Angular, you have to explicitly tell it to initiate its digest cycle because you need some dirty checking done. This is a pretty common use-case in Angular, and most developers would do it in some service or directive. Most of the time this is done by using $apply. This might not be the best approach here. Why? There might be a case where you trigger an $apply() while a digest cycle is already in progress. This condition leads to the infamous $rootScope:inprog errors. Most developers try to fix this by using intervals, or by checking the phase of the scope to see whether a digest cycle is already in progress. But there is a better and more optimal solution: $scope.$evalAsync. What does it do? Well, it lets you queue operations up for execution at the end of the current digest cycle without marking the scope dirty for another digest cycle. This is much faster than any interval.
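A minimal sketch of the pattern, written so that it runs outside Angular: applyResult is a hypothetical helper, and createFakeScope is a tiny stand-in for a real scope that only models the queueing behaviour. Its flush method is made up for this example, since a real scope drains the queue itself during the digest cycle.

```javascript
// Hypothetical helper: `scope` would be a real AngularJS scope in an app.
function applyResult(scope, result) {
  // Queue the update instead of calling scope.$apply() directly,
  // so it is safe even if a digest is already in progress.
  scope.$evalAsync(function () {
    scope.result = result;
  });
}

// Minimal stand-in for a scope so the sketch runs without Angular.
function createFakeScope() {
  var queue = [];
  return {
    $evalAsync: function (fn) {
      queue.push(fn);
    },
    // Made-up method: a real scope runs the queue as part of its digest.
    flush: function () {
      while (queue.length) {
        queue.shift()();
      }
    }
  };
}

var scope = createFakeScope();
applyResult(scope, 'loaded');
console.log(scope.result); // undefined (the update is only queued)
scope.flush();
console.log(scope.result); // loaded
```

In a real application you would pass in the directive's or controller's scope, and there would be no flush call.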