Django Models – Designing Data Model for Discussion Forum (Charcha)

Reet Shrivastava

02 Jun 2017

https://static.hashedin.com/wp-content/uploads/2017/06/Reet-2-150x150.png

This is the second post in the Django Blog Series. In this post, we will go through the data modeling of a discussion forum application in Django. This post is inspired from the data model design of Charcha Discussion Forum. You can find the full code for the charcha forum here.

Let us assume we have a requirement to create the django models of a discussion platform like Charcha. In this blog post, we will go through the process of designing such a model which is both conforming to our requirements as well as efficient.

You have been given the following requirements:

To design the data models for a discussion platform on which a user can start a discussion, comment on the discussion, upvote, downvote and flag the comments or posts. The users will have scores, which will be the difference of upvotes and downvotes. The score of a user and the number of comments on a post should be visible on the discussion pages. There should also be a feature for replying on a comment upto a hierarchy of six levels.


The User model – Abstract User

Our application, as per our requirement is unlike most of the interactive applications. It is mostly user oriented. Therefore, we’ll need a user model derived from AbstractUser which is a part of the default Django models.

class User(AbstractUser):
    """Our custom user model with a score"""
    class Meta:
        db_table = "users"
    score = models.IntegerField(default=0)

You can see here that we have overridden the AbstractUser model and added a field for storing the score of a user. We will use it as our custom user model to store our user information.

Content Type and Generic Relations

Now we have to create a Vote model, the purpose of this model will be to store the information about the vote, the voter and the content which is voted upon. The content can be a comment or a post.
Let’s do this using the naive method:

class Vote(models.Model):
    class Meta:
        db_table = "votes"
        index_together = [
            ["content_type", "object_id"],
        ]
    post = models.ForeignKey(Post, null=True)
    comment = models.ForeignKey(Comment, null=True)
    voter = models.ForeignKey(User, on_delete=models.PROTECT)
    type_of_vote = models.IntegerField(choices = (
            (UPVOTE, 'Upvote'),
            (DOWNVOTE, 'Downvote'),
            (FLAG, 'Flag'),
        ))
    submission_time = models.DateTimeField(auto_now_add=True)

Now, this method has an obvious imperfection. The vote will be either on a post or on a comment. So, this will result in either of the posts or comment foreign key to remain null in every case. A big NO.
We can make this better by using the Generic Relations

class Vote(models.Model):
    class Meta:
        db_table = "votes"
        index_together = [
            ["content_type", "object_id"],
        ]
    # The following 3 fields represent the Comment or Post
    # on which a vote has been cast
    # See Generic Relations in Django's documentation
    content_type = models.ForeignKey(ContentType, on_delete=models.PROTECT)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
    voter = models.ForeignKey(User, on_delete=models.PROTECT)
    type_of_vote = models.IntegerField(choices = (
            (UPVOTE, 'Upvote'),
            (DOWNVOTE, 'Downvote'),
            (FLAG, 'Flag'),
        ))
    submission_time = models.DateTimeField(auto_now_add=True)

Here, the ContentTypemodel represents and store information about the models installed in your project, and new instances of ContentType are automatically created whenever new models are installed.

Now we do not need to keep Foreign Key to other Django models we want to track. Using the GenericRelations, we can now track those votes to any model we want without having to modify the Vote model.
The reverse relation will be the part of the models we need to track. For example:

class Post:
    ...
    ...
    votes = GenericRelation(Vote)
    ...
    ...
class Comment:
    ...
    ...
    votes = GenericRelation(Vote)
    ...
    ...

Now, If we put a little thought into our existing Post and Comment models, we would observe that the two models should behave more or less in the same fashion. For instance, both of them can be upvoted, downvoted, flagged, unflagged and so they should provide interfaces to do so.

Hence we can create a base class for them as Votable and push the common behaviors and attributes to it. The Post and Comment will then be concrete classes and will inherit from Votable.

class Votable(models.Model):
    """ An object on which people would want to vote
        Post and Comment are concrete classes
    """
    class Meta:
        abstract = True
    votes = GenericRelation(Vote)
    author = models.ForeignKey(User, on_delete=models.PROTECT)
    # denormalization to save database queries
    # flags = count of votes of type "Flag"
    upvotes = models.IntegerField(default=0)
    downvotes = models.IntegerField(default=0)
    flags = models.IntegerField(default=0)
    def upvote(self, user):
        ....
        ....
    def downvote(self, user):
        ....
        ....
    # more common methods below
class Post(Votable):
	# post specific implementation
	...
	...
class Comment(Votable):
	# comment specific implementation
	...
	...

Till this point, we have a basic scaffold of our Django models up and we can actually visualize the data models coming to life.
[/et_pb_text][et_pb_text admin_label=”Using Denormalization to Improve Query Performance” _builder_version=”3.0.82″ background_layout=”light” border_style=”solid” box_shadow_position=”outer”]

Using Denormalization to Improve Query Performance

Now, a part of our requirement is that we need to show the score of the user on the discussion page. The score is calculated as (upvotes-downvotes). Also, we need to show the number of comments on a post.

In the above snippet, you can see three fields in the Votable class viz. upvotes, downvotes, and flags. The purpose of these fields is to store the counts of the respective types of votes. Had we not defined the fields here, there would have been a necessity to perform a join against the Votable table and run a group by query, each time we had to retrieve these counts. This may have had impacts on the overall performance.
But as you can guess, there is always a tradeoff with denormalization. In our case, we would have to maintain these counts with every method.

Custom Model Managers

Now since we have our Post and Comment models ready, we would like to add functionalities to CRUD our Django models.

Naturally, we would like to code up the functionalities in our views.py. We can do that, but if we put a little more thought to this approach, we would find a few flaws with it. Code written in views is difficult to test since we would have to mock the request and response objects to write the unit tests. Not to mention, the code in views is intermingled with the request and response handlers and it’ll be difficult to reuse the code as well.
A better and more efficient way to implement the functionalities relating to the models is in the models.py itself, as a Custom Model Manager.

class PostsManager(models.Manager):
    def get_post_with_my_votes(self, post_id, user):
        # implementation
    def recent_posts_with_my_votes(self, user=None):
        # implementation
    def _append_votes_by_user(self, posts, user):
        # implementation
class CommentsManager(models.Manager):
    def best_ones_first(self, post_id, user_id):
        comment_type = ContentType.objects.get_for_model(Comment)
        from django.db import connection
        with connection.cursor() as cursor:
            cursor.execute("""
                SELECT c.id, c.text, u.id, u.username, c.submission_time,
                c.wbs, length(c.wbs)/5 as indent,
                c.upvotes, c.downvotes, c.flags,
                c.upvotes - c.downvotes as score,
                up.is_upvoted, down.is_downvoted
                FROM comments c
                INNER JOIN users u on c.author_id = u.id
                LEFT OUTER JOIN (
                    SELECT 1 as is_upvoted, v1.object_id as comment_id
                    FROM votes v1
                    WHERE v1.content_type_id = %s
                    AND type_of_vote = 1
                    AND v1.voter_id = %s
                ) up on c.id = up.comment_id
                LEFT OUTER JOIN (
                    SELECT 1 as is_downvoted, v2.object_id as comment_id
                    FROM votes v2
                    WHERE v2.content_type_id = %s
                    AND type_of_vote = 2
                    AND v2.voter_id = %s
                ) down on c.id = down.comment_id
                WHERE c.post_id = %s
                ORDER BY c.wbs
            """, [comment_type.id, user_id,
                    comment_type.id, user_id,
                    post_id])
            comments = []
            for row in cursor.fetchall():
                comment = self.model(
                        id = row[0], text = row[1],
                        submission_time = row[4],
                        wbs = row[5],
                        upvotes = row[7], downvotes=row[8],
                        flags = row[9]
                    )
                author = User(id=row[2], username=row[3])
                comment.author = author
                comment.indent = row[6]
                comment.score = row[10]
                comment.is_upvoted = True if row[11] else False
                comment.is_downvoted = True if row[12] else False
                comments.append(comment)
            return comments

Now, we’ll be able to call these methods from the views file and also it will be easy to write the unit test cases for them. So at some point, you should refactor your code and move logic from views.py to models.py. When you do so, it’s best to create a custom Django model manager.

Using Custom Queries

Sometimes we face scenarios when we have a huge amount of data that needs to be executed based on certain get or filter queries. At such times, the model query APIs are not enough for us and we need the ability to write custom queries. Django gives us this liberty by letting us execute raw queries.
In the above snippet, we can see that in the best_ones_first method of the CommentsManager class, we have executed a custom query to get the data, following which we have formatted the data before returning it.

Using WBS in Tracking Comments Hierarchy

According to our requirements, every post can have comments in the hierarchy. It simply means that we can have replied to comments and this comment-reply chain can go on to a depth of six levels. In a naive approach, we could have a structure where every comment has a reference to the parent comment, so it eventually forms a tree-like structure. Technically, we can just use this pointer to reconstruct the tree. But the problem with this approach is that self-referential queries are slow and hence can and will hamper our performance. The better alternative in our case will be to use a WBS.

In the WBS(Work Breakdown Structure) approach, every comment will have a WBS code which will be a dotted path. Hence the first comment will have a WBS code of .0001 while the second top-level comment would have the code .0002. If someone responds to the first comment, then the WBS for that would be .0001.0001. This would allow at max 9999 comments at each level.

class Comment(Votable):
	...
	...
	other fields
	...
	...
	# wbs helps us to track the comments as a tree
    	# Format is .0000.0000.0000.0000.0000.0000
    	# This means that:
    	# 1. We only allow 9999 comments at each level
    	# 2. We allow threaded comments upto 6 levels
    	wbs = models.CharField(max_length=30)
    	...
    	...

The beauty of this approach is that we can simply sort by the WBS column and get the results in the right order. This makes rendering the results very easy.
Even this is a form of denormalization since our purpose is to reduce joins to improve query performance. Therefore, like all denormalization methods, this has its tradeoffs which is to maintain the overhead WBS field everytime a user adds or removes a comment.

Summary

Through this post, we underwent the journey of building a Django data model for a discussion forum from scratch and in this process we came across some really interesting and efficient techniques for model design. These are some techniques that can come to our aid which we can make use to design our Django models in a much better and efficient manner.


Have a question?

Need Technology advice?

Connect

+1 669 253 9011

contact@hashedin.com

facebook twitter linkedIn youtube