This is the second post in the Django Blog Series. In this post, we will go through the data modeling of a discussion forum application in Django. This post is inspired from the data model design of Charcha Discussion Forum. You can find the full code for the charcha forum here.
Let us assume we have a requirement to create the django models of a discussion platform like Charcha. In this blog post, we will go through the process of designing such a model which is both conforming to our requirements as well as efficient.
You have been given the following requirements:
To design the data models for a discussion platform on which a user can start a discussion, comment on the discussion, upvote, downvote and flag the comments or posts. The users will have scores, which will be the difference of upvotes and downvotes. The score of a user and the number of comments on a post should be visible on the discussion pages. There should also be a feature for replying on a comment upto a hierarchy of six levels.
The User model – Abstract User
Our application, as per our requirement is unlike most of the interactive applications. It is mostly user oriented. Therefore, we’ll need a user model derived from AbstractUser which is a part of the default Django models.
You can see here that we have overridden the
AbstractUser model and added a field for storing the score of a user. We will use it as our custom user model to store our user information.
Content Type and Generic Relations
Now we have to create a
Vote model, the purpose of this model will be to store the information about the vote, the voter and the content which is voted upon. The content can be a comment or a post.
Let’s do this using the naive method:
Now, this method has an obvious imperfection. The vote will be either on a post or on a comment. So, this will result in either of the posts or comment foreign key to remain null in every case. A big NO.
We can make this better by using the Generic Relations
ContentTypemodel represents and store information about the models installed in your project, and new instances of ContentType are automatically created whenever new models are installed.
Now we do not need to keep Foreign Key to other Django models we want to track. Using the GenericRelations, we can now track those votes to any model we want without having to modify the Vote model.
The reverse relation will be the part of the models we need to track. For example:
Now, If we put a little thought into our existing
Comment models, we would observe that the two models should behave more or less in the same fashion. For instance, both of them can be upvoted, downvoted, flagged, unflagged and so they should provide interfaces to do so.
Hence we can create a base class for them as
Votable and push the common behaviors and attributes to it. The
Comment will then be concrete classes and will inherit from
Till this point, we have a basic scaffold of our Django models up and we can actually visualize the data models coming to life.
[/et_pb_text][et_pb_text admin_label=”Using Denormalization to Improve Query Performance” _builder_version=”3.0.82″ background_layout=”light” border_style=”solid” box_shadow_position=”outer”]
Using Denormalization to Improve Query Performance
Now, a part of our requirement is that we need to show the score of the user on the discussion page. The score is calculated as (upvotes-downvotes). Also, we need to show the number of comments on a post.
In the above snippet, you can see three fields in the Votable class viz. upvotes, downvotes, and flags. The purpose of these fields is to store the counts of the respective types of votes. Had we not defined the fields here, there would have been a necessity to perform a join against the Votable table and run a group by query, each time we had to retrieve these counts. This may have had impacts on the overall performance.
But as you can guess, there is always a tradeoff with denormalization. In our case, we would have to maintain these counts with every method.
Custom Model Managers
Now since we have our Post and Comment models ready, we would like to add functionalities to CRUD our Django models.
Naturally, we would like to code up the functionalities in our
views.py. We can do that, but if we put a little more thought to this approach, we would find a few flaws with it. Code written in views is difficult to test since we would have to mock the request and response objects to write the unit tests. Not to mention, the code in views is intermingled with the request and response handlers and it’ll be difficult to reuse the code as well.
A better and more efficient way to implement the functionalities relating to the models is in the models.py itself, as a Custom Model Manager.
Now, we’ll be able to call these methods from the views file and also it will be easy to write the unit test cases for them. So at some point, you should refactor your code and move logic from views.py to models.py. When you do so, it’s best to create a custom Django model manager.
Using Custom Queries
Sometimes we face scenarios when we have a huge amount of data that needs to be executed based on certain get or filter queries. At such times, the model query APIs are not enough for us and we need the ability to write custom queries. Django gives us this liberty by letting us execute raw queries.
In the above snippet, we can see that in the
best_ones_first method of the
CommentsManager class, we have executed a custom query to get the data, following which we have formatted the data before returning it.
Using WBS in Tracking Comments Hierarchy
According to our requirements, every post can have comments in the hierarchy. It simply means that we can have replied to comments and this comment-reply chain can go on to a depth of six levels. In a naive approach, we could have a structure where every comment has a reference to the parent comment, so it eventually forms a tree-like structure. Technically, we can just use this pointer to reconstruct the tree. But the problem with this approach is that self-referential queries are slow and hence can and will hamper our performance. The better alternative in our case will be to use a WBS.
In the WBS(Work Breakdown Structure) approach, every comment will have a WBS code which will be a dotted path. Hence the first comment will have a WBS code of .0001 while the second top-level comment would have the code .0002. If someone responds to the first comment, then the WBS for that would be .0001.0001. This would allow at max 9999 comments at each level.
The beauty of this approach is that we can simply sort by the WBS column and get the results in the right order. This makes rendering the results very easy.
Even this is a form of denormalization since our purpose is to reduce joins to improve query performance. Therefore, like all denormalization methods, this has its tradeoffs which is to maintain the overhead WBS field everytime a user adds or removes a comment.
Through this post, we underwent the journey of building a Django data model for a discussion forum from scratch and in this process we came across some really interesting and efficient techniques for model design. These are some techniques that can come to our aid which we can make use to design our Django models in a much better and efficient manner.