Web applications today are highly sophisticated systems, and new technologies appear almost every day. Yet if you dissect any website or web application, you will find that it provides two basic functions to its users: it displays information, and it consumes information.
To consume information, websites use forms. Visiting a site and filling out a form is such an integral part of our browsing experience that we don’t even realise how often we do it.
The problem with a basic form is that it only cares about the data entered into it. Smarter forms use validation to check that data, but even a validated form cannot tell the difference between data filled in by a real user and data filled in by a bot.
Exploiting this weakness, spammers build “bots” (automated web crawlers) that seek out pages containing forms and automate POST requests against them. The result is a flood of spam entries, and at high enough volume the traffic can amount to a denial-of-service (DoS) attack on the website.
As you can see, this scenario is common, hence the need for solutions. Several well-known solutions are widely used to counter the problem.
Banning IP Addresses
This technique requires finding the spammer’s IP address and banning every request that comes from it. It rarely works: addresses can be spoofed or reassigned, so you may end up blocking a legitimate user, and spammers tend to use dynamic IPs anyway!
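To make the mechanism concrete, here is a minimal Python sketch of the check a server might run before accepting a form post. The blocklist contents and the function name `is_banned` are hypothetical; the standard-library `ipaddress` module lets each entry be either a single address or a whole range.

```python
import ipaddress

# Hypothetical blocklist: entries may be single addresses (/32) or whole ranges.
BANNED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_banned(client_ip: str) -> bool:
    """Return True if the client's address falls inside any banned network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the IP versions differ (IPv6 address vs IPv4 network).
    return any(addr in net for net in BANNED_NETWORKS)
```

Notice how little this protects against: as soon as the spammer posts from an address outside the list, `is_banned` returns `False`, and if a banned address is later reassigned to someone else, that legitimate user is locked out.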
Forcing Users to Sign Up and Verify Their Email Address Before Posting
This technique works and is useful, but it shifts the burden from the site admin’s shoulders to the users’. It clearly hinders the user experience on the website.
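The verification step usually works by emailing the user a link that contains a signed token. The Python sketch below is one way to build such a token with the standard-library `hmac` module, assuming a server-side secret; `SECRET_KEY`, `make_token`, `verify_token` and the 24-hour lifetime are illustrative choices, not part of any particular framework.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; in practice load a long random value from config.
SECRET_KEY = b"replace-with-a-long-random-secret"

def make_token(email: str, now: Optional[float] = None) -> str:
    """Bind an email address to its issue time and sign the pair."""
    issued = int(time.time() if now is None else now)
    payload = f"{email}|{issued}".encode()
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig

def verify_token(token: str, max_age: int = 86400,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the verified email, or None if the token is malformed, forged or expired."""
    try:
        encoded, sig = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:  # missing separator or bad base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison; encoding avoids TypeError on non-ASCII input.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    email, issued = payload.decode().rsplit("|", 1)
    current = time.time() if now is None else now
    if current - int(issued) > max_age:
        return None
    return email
```

Signing the token means the server does not need to store it, and a bot cannot forge a valid link for an address it does not control, because it never sees the email that carries the link.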
The Honeypot Technique
The honeypot technique adds a hidden input field or checkbox to the form. The secret is that no human will see the field or checkbox, but most bots fill forms by reading and manipulating the page’s markup directly rather than by actually looking at it. Any submission that arrives with the hidden field filled in or the checkbox set therefore tells you it was not sent by a human. A field used this way is called a bot trap. The disadvantage is that a spammer can easily read the HTML of a page, and once they notice a honeypot on the form, bypassing it is trivial. The technique therefore tends to stop only generic, untargeted bots, not a spammer who goes after your site specifically.
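On the server, the honeypot check itself is tiny. In this Python sketch the hidden field is assumed to be named `website` (a hypothetical name; pick something a bot would find tempting) and kept out of human view with CSS, as the comment shows. Because a browser omits an unchecked checkbox from the submission entirely, the same test also covers a hidden checkbox.

```python
from typing import Mapping

# Hypothetical name for the hidden field. In the page it might be rendered as
#   <div style="position:absolute; left:-9999px" aria-hidden="true">
#     <input type="text" name="website" tabindex="-1" autocomplete="off">
#   </div>
# so it sits off-screen and a human never sees or fills it.
HONEYPOT_FIELD = "website"

def looks_like_bot(form: Mapping[str, str]) -> bool:
    """Flag the submission if anything at all was put into the hidden field."""
    return form.get(HONEYPOT_FIELD, "").strip() != ""
```

A common design choice is to discard flagged submissions silently while returning a normal-looking response, so the bot gets no signal about which field gave it away.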
CAPTCHA
This is probably the most widely known and used technique. A CAPTCHA is a method of fighting spam bots by requiring users to enter a code, often displayed as a distorted image. It is a very effective way to stop bots from registering on your site, because most bots cannot reliably read the letters and numbers in the image. The main problem with a CAPTCHA is that it is really, REALLY annoying for the user. If your website is heavily form-based, CAPTCHAs will noticeably hurt the user experience and may even reduce the number of genuine registrations or form submissions.
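Drawing the distorted image is usually left to a library, but the server-side bookkeeping is simple. This Python sketch assumes a per-user session stored as a dictionary (a hypothetical stand-in for whatever session mechanism your framework provides); `new_challenge` and `check_answer` are illustrative names, and the image rendering step is not shown.

```python
import hmac
import secrets
import string

# Characters that stay readable once distorted: drop look-alikes 0/O and 1/I/L.
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits if c not in "0O1IL")

def new_challenge(session: dict, length: int = 6) -> str:
    """Create a random code, remember it in the session and return it for rendering."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["captcha"] = code
    return code  # hand this to an image library that draws it distorted (not shown)

def check_answer(session: dict, answer: str) -> bool:
    """Compare the user's answer case-insensitively; each code can be used only once."""
    expected = session.pop("captcha", None)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), answer.strip().upper().encode())
```

Popping the code from the session makes each challenge single-use, so one solved answer cannot be replayed for many submissions.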
Google’s reCAPTCHA may have solved the user experience problems of the standard CAPTCHA. With reCAPTCHA, users do not need to type distorted letters or numbers, nor solve arithmetic; in most cases a single click is enough to confirm they are not a robot. reCAPTCHA therefore aims to protect your website from spam while keeping the user experience smooth. Note, however, that some have argued its simplicity makes it more vulnerable to spammers.
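With reCAPTCHA, the widget in the page adds a token to the form in a field named `g-recaptcha-response`, and your server must confirm that token with Google before trusting the submission. The Python sketch below calls Google’s documented `siteverify` endpoint using only the standard library; the function names are hypothetical, and the `post` parameter exists only so the HTTP call can be swapped out (for example, in tests).

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def _post(url: str, data: bytes) -> bytes:
    """Default transport: a form-encoded HTTPS POST (5-second timeout is arbitrary)."""
    with urllib.request.urlopen(url, data=data, timeout=5) as resp:
        return resp.read()

def verify_recaptcha(token: str, secret: str, remote_ip: Optional[str] = None,
                     post: Callable[[str, bytes], bytes] = _post) -> bool:
    """Ask Google whether the token from the 'g-recaptcha-response' field is valid."""
    if not token:  # the widget was never completed
        return False
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional parameter of the siteverify API
    body = urllib.parse.urlencode(fields).encode()
    result = json.loads(post(VERIFY_URL, body))
    return result.get("success") is True
```

Keep the secret key on the server only; the site key embedded in the page is the public half of the pair.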
A programmer or admin might be tempted to reach for the most technologically sophisticated solution, but as you have seen, there are many existing ways to counter this problem, and more will likely emerge. Every method has its own advantages and pitfalls. The right choice comes down to the website’s target users and use case, and these should ultimately determine which method you use.