As a popular saying goes, “Having fun is the best way to learn.” In other words, learning is most effective when the learner enjoys the process.

 

HashedIn University, a uniquely curated Bootcamp focused solely on shaping new campus recruits, embodies just that concept.

 

The new recruits, referred to as Linkers, go through a learning process that is quite unique in its approach. The intense learning sessions are complemented by fun elements, which makes them much more impactful.

 

During the entire duration of the Bootcamp, Linkers are divided into multiple teams, which compete against each other in various fun activities. These regular sessions not only provide the much-needed break from the technical sessions but also give ample opportunities for the Linkers to bond.

 

For fresh graduates, probably still reminiscing about college fests, the cultural activities conducted at HU (Mad Ads, Mockumentary, Decorations and Board Games) and sports such as Football and Badminton come as an assurance that professional life can be fun too.

 

This distinguished Bootcamp never misses a chance to instill the values of team building and leadership, and to unveil individual talents, through its blended learning approach.

So What’s Different?

The first thing that comes to mind when you say “University” is endless lectures, memorizing theories, and finally reproducing them in a 3-hour examination. It wouldn’t really matter whether you have understood the concepts, as long as you are able to reproduce them on a piece of paper, would it? But you cannot always “scrape through” life. To climb the ladder of success, you need to not only understand concepts but also put them to use.

 

This is exactly what HashedIn University, an intense Bootcamp, does. It gears you up with skills and empowers you with knowledge. Technical sessions, critical-thinking activities, brainstorming sessions and a hands-on project ensure that Linkers have fun while learning.

 

The project built at the end of the session ensures that the concepts learned are firmly imbibed.

Excerpts From My Experience as a Linker – 2016

During the 2016 HU, we had an agile development session which was just as fun as it was informative. During this session, we had a fun exercise in which we had to form teams and make paper airplanes as client deliverables.

 

This exercise taught us interpersonal communication, teamwork, and the agile development process, while keeping the fun element alive throughout.

 

Of course, what better way to bond than having a scrumptious meal with your teammates? The weekly outings were always something to look forward to.

Learning Continues

The intense HashedIn University program may have ended, but the learnings from this period stay with us for a lifetime. The skills we learn, be they technical or life skills, give us a platform to build upon and form the base for the career paths of future Hashers.

 

As a proud Hasher, I can confidently say that learning new technologies, developing professional skills and imbibing values through fun sessions and activities have definitely helped me come a long way. The message is clear, and the process makes it fun.

 

You can read what HashedIn University is all about here.

In this post, we will go through the data modeling of a discussion forum application in Django. This post is inspired by the data model design of the Charcha Discussion Forum. You can find the full code for the Charcha forum here.

Let us assume we have a requirement to create the Django models of a discussion platform like Charcha. In this blog post, we will go through the process of designing a model that both conforms to our requirements and is efficient.

You have been given the following requirements:

Design the data models for a discussion platform on which a user can start a discussion, comment on the discussion, and upvote, downvote or flag comments and posts. Users will have scores, calculated as the difference between upvotes and downvotes. The score of a user and the number of comments on a post should be visible on the discussion pages. There should also be a feature for replying to a comment, up to a hierarchy of six levels.


The User model – Abstract User

As per our requirements, our application is mostly user oriented. Therefore, we’ll need a custom user model derived from AbstractUser, which ships with Django’s built-in authentication app.

from django.contrib.auth.models import AbstractUser
from django.db import models


class User(AbstractUser):
    """Our custom user model with a score"""
    class Meta:
        db_table = "users"
    score = models.IntegerField(default=0)

You can see here that we have extended the AbstractUser model and added a field for storing the score of a user. We will use it as our custom user model to store our user information.
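For Django to use this model in place of its built-in User, the project settings have to point at it. A minimal sketch of that setting, assuming the model lives in a hypothetical app labeled `discussions` (the app label is not given in this post):

```python
# settings.py (fragment)
# "discussions" is a hypothetical app label; replace it with the label of
# the app whose models.py defines the custom User class.
AUTH_USER_MODEL = "discussions.User"
```

This setting is best put in place before the first migration is created, since switching user models later is considerably harder.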

Content Type and Generic Relations

Now we have to create a Vote model. Its purpose is to store information about the vote, the voter, and the content that is voted upon. The content can be a comment or a post.
Let’s first do this using the naive method:

class Vote(models.Model):
    class Meta:
        db_table = "votes"
    # Only one of post/comment is set for a given vote; the other stays NULL
    post = models.ForeignKey(Post, null=True, on_delete=models.PROTECT)
    comment = models.ForeignKey(Comment, null=True, on_delete=models.PROTECT)
    voter = models.ForeignKey(User, on_delete=models.PROTECT)
    type_of_vote = models.IntegerField(choices=(
            (UPVOTE, 'Upvote'),
            (DOWNVOTE, 'Downvote'),
            (FLAG, 'Flag'),
        ))
    submission_time = models.DateTimeField(auto_now_add=True)

This method has an obvious imperfection. A vote is cast either on a post or on a comment, so one of the two foreign keys (post or comment) will be null in every single row. A big NO.
We can make this better by using generic relations:

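from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models

# Vote types. 1 and 2 match the values used in the raw SQL later in
# this post; the value of FLAG is an assumption.
UPVOTE = 1
DOWNVOTE = 2
FLAG = 3


class Vote(models.Model):
    class Meta:
        db_table = "votes"
        # Composite index to speed up looking up the votes on one object
        indexes = [
            models.Index(fields=["content_type", "object_id"]),
        ]
    # The following 3 fields represent the Comment or Post
    # on which a vote has been cast
    # See Generic Relations in Django's documentation
    content_type = models.ForeignKey(ContentType, on_delete=models.PROTECT)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
    voter = models.ForeignKey(User, on_delete=models.PROTECT)
    type_of_vote = models.IntegerField(choices=(
            (UPVOTE, 'Upvote'),
            (DOWNVOTE, 'Downvote'),
            (FLAG, 'Flag'),
        ))
    submission_time = models.DateTimeField(auto_now_add=True)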

Here, the ContentType model represents and stores information about the models installed in your project. New instances of ContentType are created automatically whenever new models are installed.

Now we do not need to keep a separate foreign key for every Django model we want to track. Using generic relations, we can attach votes to any model we want without having to modify the Vote model.
The reverse relation is declared on the models we need to track. For example:

class Post(models.Model):
    ...
    votes = GenericRelation(Vote)
    ...
class Comment(models.Model):
    ...
    votes = GenericRelation(Vote)
    ...
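The idea behind the three generic fields can be illustrated without Django at all. The sketch below is only a plain-Python analogy, not Django's implementation: the dictionaries, the ids 10 and 11, and the class name are hypothetical stand-ins for the model tables and the content types table.

```python
from dataclasses import dataclass

# Stand-ins for two model tables, keyed by primary key.
POSTS = {1: "How do I model votes?"}
COMMENTS = {1: "Use a generic relation."}

# Stand-in for the content types table: one id per model.
CONTENT_TYPES = {10: POSTS, 11: COMMENTS}


@dataclass
class VoteSketch:
    content_type_id: int  # which model the target belongs to
    object_id: int        # primary key of the target within that model
    voter: str

    @property
    def content_object(self):
        """Resolve the target from the (content type, object id) pair."""
        return CONTENT_TYPES[self.content_type_id][self.object_id]


votes = [VoteSketch(10, 1, "alice"), VoteSketch(11, 1, "bob")]
targets = [vote.content_object for vote in votes]
```

One vote structure can point at either a post or a comment, and supporting a third kind of target only needs a new entry in the content types mapping.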

Now, if we put a little thought into our existing Post and Comment models, we observe that the two models should behave more or less in the same fashion. For instance, both of them can be upvoted, downvoted, flagged and unflagged, so they should provide interfaces to do so.

Hence we can create a base class for them as Votable and push the common behaviors and attributes to it. The Post and Comment will then be concrete classes and will inherit from Votable.

class Votable(models.Model):
    """ An object on which people would want to vote
        Post and Comment are concrete classes
    """
    class Meta:
        abstract = True
    votes = GenericRelation(Vote)
    author = models.ForeignKey(User, on_delete=models.PROTECT)
    # denormalization to save database queries
    # flags = count of votes of type "Flag"
    upvotes = models.IntegerField(default=0)
    downvotes = models.IntegerField(default=0)
    flags = models.IntegerField(default=0)
    def upvote(self, user):
        ...
    def downvote(self, user):
        ...
    # more common methods below
class Post(Votable):
    # post specific implementation
    ...
class Comment(Votable):
    # comment specific implementation
    ...

Till this point, we have a basic scaffold of our Django models up and we can actually visualize the data models coming to life.

Using Denormalization to Improve Query Performance

Now, a part of our requirement is that we need to show the score of the user on the discussion page. The score is calculated as (upvotes-downvotes). Also, we need to show the number of comments on a post.

In the above snippet, you can see three fields in the Votable class, viz. upvotes, downvotes, and flags. The purpose of these fields is to store the counts of the respective types of votes. Had we not defined these fields, we would have to join against the votes table and run a GROUP BY query each time we needed these counts, which could hurt overall performance.
But as you can guess, there is always a tradeoff with denormalization. In our case, we have to keep these counts in sync every time a vote is cast or withdrawn.
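The maintenance cost of denormalized counters can be sketched in plain Python. This is an illustration under assumptions rather than the forum's actual code: the class names and the `apply_vote` method are hypothetical, and the author's score is updated alongside the per-object counts.

```python
from dataclasses import dataclass

UPVOTE, DOWNVOTE = 1, 2


@dataclass
class AuthorSketch:
    score: int = 0  # denormalized: upvotes minus downvotes received


@dataclass
class VotableSketch:
    author: AuthorSketch
    upvotes: int = 0
    downvotes: int = 0

    def apply_vote(self, type_of_vote):
        """Record a vote and update every denormalized counter together."""
        if type_of_vote == UPVOTE:
            self.upvotes += 1
            self.author.score += 1
        elif type_of_vote == DOWNVOTE:
            self.downvotes += 1
            self.author.score -= 1
        else:
            raise ValueError(f"unsupported vote type: {type_of_vote}")


author = AuthorSketch()
post = VotableSketch(author=author)
for vote in (UPVOTE, UPVOTE, DOWNVOTE):
    post.apply_vote(vote)
```

Reading the score is now a plain attribute access, but every code path that changes a vote has to go through `apply_vote`. In a real Django model the same updates would typically run inside a single transaction so that concurrent votes do not lose increments.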

Custom Model Managers

Now that we have our Post and Comment models ready, we would like to add the functionality to create, read, update and delete (CRUD) them.

Naturally, we would be tempted to code this functionality in our views.py. We can do that, but if we put a little more thought into this approach, we find a few flaws with it. Code written in views is difficult to test, since we have to mock the request and response objects to write the unit tests. Moreover, the code in views is intermingled with the request and response handling, which makes it difficult to reuse.
A better way to implement the functionality relating to the models is in models.py itself, as a custom model manager.

from django.contrib.contenttypes.models import ContentType
from django.db import connection, models


class PostsManager(models.Manager):
    def get_post_with_my_votes(self, post_id, user):
        ...  # implementation
    def recent_posts_with_my_votes(self, user=None):
        ...  # implementation
    def _append_votes_by_user(self, posts, user):
        ...  # implementation
class CommentsManager(models.Manager):
    def best_ones_first(self, post_id, user_id):
        comment_type = ContentType.objects.get_for_model(Comment)
        with connection.cursor() as cursor:
            cursor.execute("""
                SELECT c.id, c.text, u.id, u.username, c.submission_time,
                c.wbs, length(c.wbs)/5 as indent,
                c.upvotes, c.downvotes, c.flags,
                c.upvotes - c.downvotes as score,
                up.is_upvoted, down.is_downvoted
                FROM comments c
                INNER JOIN users u on c.author_id = u.id
                LEFT OUTER JOIN (
                    SELECT 1 as is_upvoted, v1.object_id as comment_id
                    FROM votes v1
                    WHERE v1.content_type_id = %s
                    AND type_of_vote = 1
                    AND v1.voter_id = %s
                ) up on c.id = up.comment_id
                LEFT OUTER JOIN (
                    SELECT 1 as is_downvoted, v2.object_id as comment_id
                    FROM votes v2
                    WHERE v2.content_type_id = %s
                    AND type_of_vote = 2
                    AND v2.voter_id = %s
                ) down on c.id = down.comment_id
                WHERE c.post_id = %s
                ORDER BY c.wbs
            """, [comment_type.id, user_id,
                    comment_type.id, user_id,
                    post_id])
            comments = []
            for row in cursor.fetchall():
                comment = self.model(
                        id = row[0], text = row[1],
                        submission_time = row[4],
                        wbs = row[5],
                        upvotes = row[7], downvotes=row[8],
                        flags = row[9]
                    )
                author = User(id=row[2], username=row[3])
                comment.author = author
                comment.indent = row[6]
                comment.score = row[10]
                comment.is_upvoted = bool(row[11])
                comment.is_downvoted = bool(row[12])
                comments.append(comment)
            return comments

Now we can call these methods from the views file, and it is also easy to write unit test cases for them. So at some point, you should refactor your code and move logic from views.py to models.py. When you do so, it’s best to create a custom Django model manager.

Using Custom Queries

Sometimes we need to fetch a large amount of data using conditions that are awkward or inefficient to express through get or filter calls. In such cases the model query API is not enough, and we need the ability to write custom queries. Django gives us this liberty by letting us execute raw SQL.
In the above snippet, the best_ones_first method of the CommentsManager class executes a custom query to get the data and then builds Comment objects from the rows before returning them.
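The LEFT OUTER JOIN trick used in that query, which marks the rows the current user has voted on, can be tried in isolation. The sketch below uses Python's built-in sqlite3 module with a simplified, hypothetical two-table schema instead of Django's connection. Note that sqlite3 uses `?` placeholders, whereas the query above uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, text TEXT, wbs TEXT);
    CREATE TABLE votes (object_id INTEGER, voter_id INTEGER, type_of_vote INTEGER);
    INSERT INTO comments VALUES (1, 'first', '.0001'), (2, 'reply', '.0001.0001');
    INSERT INTO votes VALUES (2, 42, 1);
""")

UPVOTE = 1
user_id = 42  # hypothetical user who upvoted comment 2

# The subquery yields one row per comment this user upvoted; comments
# without a matching row get NULL for is_upvoted.
rows = conn.execute(
    """
    SELECT c.id, c.text, up.is_upvoted
    FROM comments c
    LEFT OUTER JOIN (
        SELECT 1 AS is_upvoted, v.object_id AS comment_id
        FROM votes v
        WHERE v.type_of_vote = ? AND v.voter_id = ?
    ) up ON c.id = up.comment_id
    ORDER BY c.wbs
    """,
    (UPVOTE, user_id),
).fetchall()

comments = [
    {"id": row[0], "text": row[1], "is_upvoted": bool(row[2])}
    for row in rows
]
```

Passing the values as query parameters, rather than formatting them into the SQL string, keeps the raw query safe from SQL injection.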

Using WBS in Tracking Comments Hierarchy

According to our requirements, every post can have a hierarchy of comments. This simply means that comments can have replies, and this comment-reply chain can go on to a depth of six levels. In a naive approach, every comment would hold a reference to its parent comment, eventually forming a tree-like structure. Technically, we can use this pointer to reconstruct the tree. The problem is that rebuilding the tree from parent pointers requires recursive, self-referential queries, which tend to be slow and can hamper performance. The better alternative in our case is to use a WBS.

In the WBS (Work Breakdown Structure) approach, every comment has a WBS code, which is a dotted path. The first top-level comment has the WBS code .0001, while the second top-level comment has the code .0002. If someone replies to the first comment, the reply gets the code .0001.0001. With four digits per level, this allows at most 9999 comments at each level.
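Assigning the code for a new comment could look roughly like the sketch below. The function name and its arguments are hypothetical, and it assumes the codes of the existing siblings have already been fetched.

```python
WIDTH = 4      # digits per level, as in ".0001"
MAX_DEPTH = 6  # maximum nesting allowed by the requirements


def next_wbs(parent_wbs, sibling_wbs_codes):
    """Return the WBS code for a new comment under parent_wbs.

    parent_wbs is "" for a top-level comment. sibling_wbs_codes holds the
    codes of the comments that already exist directly under that parent.
    """
    # Each level occupies WIDTH digits plus the leading dot.
    depth = len(parent_wbs) // (WIDTH + 1) + 1
    if depth > MAX_DEPTH:
        raise ValueError("replies are limited to six levels")
    last = max((int(code[-WIDTH:]) for code in sibling_wbs_codes), default=0)
    if last + 1 > 10 ** WIDTH - 1:
        raise ValueError("at most 9999 comments are allowed per level")
    return f"{parent_wbs}.{last + 1:0{WIDTH}d}"
```

For example, `next_wbs("", [".0001"])` gives `.0002`, and `next_wbs(".0001", [])` gives `.0001.0001`.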

class Comment(Votable):
    ...
    # other fields
    ...
    # wbs helps us to track the comments as a tree
    # Format is .0000.0000.0000.0000.0000.0000
    # This means that:
    # 1. We only allow 9999 comments at each level
    # 2. We allow threaded comments up to 6 levels
    wbs = models.CharField(max_length=30)
    ...

The beauty of this approach is that we can simply sort by the WBS column and get the results in the right order. This makes rendering the results very easy.
This too is a form of denormalization, since our purpose is to reduce joins and improve query performance. Like all denormalization, it has a tradeoff: the WBS field must be maintained every time a user adds or removes a comment.
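A small plain-Python sketch shows why sorting works: because every level is zero-padded to the same width, ordinary string ordering places each reply directly after its parent. The sample codes and the helper name are made up for illustration.

```python
wbs_codes = [".0002", ".0001.0001", ".0001", ".0001.0001.0001", ".0001.0002"]


def indent_level(wbs):
    """Depth of a comment: each level contributes 5 characters (".0001")."""
    return len(wbs) // 5


# Plain string sorting yields the threaded display order.
ordered = sorted(wbs_codes)

# Indent each comment by its depth, as a template would when rendering.
lines = ["    " * (indent_level(wbs) - 1) + wbs for wbs in ordered]
```

The `indent_level` helper mirrors the `length(c.wbs)/5` expression in the raw query shown earlier.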

Summary

In this post, we built a Django data model for a discussion forum from scratch, and along the way we came across some interesting and efficient techniques for model design: a custom user model, generic relations, denormalization, custom model managers, raw queries and WBS codes. These techniques can help us design our Django models in a better and more efficient manner.

Today, web applications have become extremely advanced, with newer technologies coming up every day. But if you dissect any web application, you will find two basic functions it provides to its users: it displays information, and it consumes information. With the second comes the problem of “form spam”.

 

 

Now to consume information, each website makes use of forms. It is a common practice for a user to visit a website and fill out a form. Filling out forms is such an integral part of our web browsing experience that we don’t even realize how often we do it.

 

 

The problem with a basic form is that it is only concerned with the data that is put into it. Of course, there are smart forms that make use of validations to validate the data, but a basic form can never differentiate between the data filled by a real user vs data filled by a bot.

 

 

Exploiting this weakness, spammers create “bots” (automated web crawlers) that seek out pages with forms and automate POST requests to them. This leads to a lot of spam entries and can eventually amount to a DoS attack on the website.

 

 

As you can see, the scenario is pretty common, hence the need for solutions. There are some widely known and implemented solutions to counter this problem.

 

 

Banning IP Addresses

This technique requires finding out the IP address of the spammer and banning all requests from that particular IP address.

 

This method rarely works, because IP addresses can be spoofed or reassigned, so you might actually end up blocking a legitimate user. Spammers tend to use dynamic IPs anyway!
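For completeness, a blocklist check can be sketched with Python's standard ipaddress module. The blocked entries below are hypothetical and come from address ranges reserved for documentation.

```python
import ipaddress

# Hypothetical blocklist; real entries would come from your own logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),   # a single address
    ipaddress.ip_network("198.51.100.0/24"),  # a whole range
]


def is_blocked(remote_addr):
    """Return True if remote_addr falls inside any blocked network."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in BLOCKED_NETWORKS)
```

Blocking a whole range, as in the second entry, is exactly where legitimate users tend to get caught as well.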

 

 

Forcing users to sign up and verify their mail id before posting

This technique can be used and will be useful, but it shifts the burden from the site admin’s shoulders to that of the users. It is pretty obvious here that this method hinders the user experience on the website.

 

 

The HoneyPot Technique

The Honeypot Technique makes use of a hidden input field or checkbox in the form. The secret is that no human will see the checkbox or the input field, whereas most bots fill in forms by inspecting the page markup and manipulating it directly, not through actual vision.

 

Therefore, any submission that arrives with the hidden field filled in, or the hidden checkbox set, tells you it wasn’t filled in by a human. This technique is also called a bot trap.
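On the server side, the check itself is tiny. A minimal sketch, assuming the form data arrives as a dictionary and the hidden input is named `website` (a hypothetical choice; any plausible-looking name works):

```python
HONEYPOT_FIELD = "website"  # hypothetical name of the hidden input


def looks_like_bot(form_data):
    """Return True when the hidden honeypot field came back non-empty."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())


human = {"name": "Asha", "message": "Hello!", "website": ""}
bot = {"name": "x", "message": "buy now", "website": "http://spam.example"}
```

Here `looks_like_bot(human)` is false and `looks_like_bot(bot)` is true, so the second submission can be dropped without bothering real users.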

 

The disadvantage of this technique is that a spammer can quite easily read the HTML of a page, and once they discover the honeypot on the form, bypassing it is a cakewalk. Hence this might only work against unsophisticated, general-purpose spam bots.

 

 

Standard CAPTCHA

Probably the most widely known and used technique. A CAPTCHA fights spam robots by requiring users to enter a code, often displayed as a distorted image. It is an effective way to stop most automated registrations on your site.

 

Simple bots cannot tell what letters and numbers there are in a CAPTCHA, which stops them. The main problem with a CAPTCHA is that it is really, REALLY annoying for the user.

 

If your website is heavily form based, a CAPTCHA will noticeably hurt the user experience and may also bring down the number of actual registrations or form submissions on the site.

 

 

Google’s reCAPTCHA

Google’s reCAPTCHA might have just solved the user experience problems of the standard CAPTCHA. With reCAPTCHA, the user need not enter distorted numbers or letters, nor solve arithmetic. A single click is enough to confirm they are not a robot.

 

So, reCAPTCHA protects your website from spam while offering a better user experience. You can try reCAPTCHA here. Note that some have argued that its simplicity makes reCAPTCHA more vulnerable to spammers.
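The single click only produces a token in the browser; the server still has to ask Google whether that token is valid. A rough sketch of that server-side step, using only the standard library. The helper names are hypothetical, and the request is built and the reply parsed here without actually being sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret, token):
    """Build the POST request that asks Google to validate a user's token."""
    body = urlencode({"secret": secret, "response": token}).encode("ascii")
    return Request(VERIFY_URL, data=body, method="POST")


def is_human(response_body):
    """Interpret the JSON reply; only an explicit success counts."""
    return json.loads(response_body).get("success") is True
```

In a real handler, the request would be sent with `urllib.request.urlopen` (or an HTTP client of your choice) and the form rejected whenever `is_human` returns false.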

 

 

Summary

A programmer or admin might gravitate towards the most technologically sophisticated solution. However, there are many existing ways to counter this problem, and every method has its own advantages and pitfalls. Ultimately, it boils down to the target users of the website and the use case, which should determine the method to be used.