Most organizations are trying to make their presence felt in the virtual world by putting up forums and blogs to interact with their users and customers. However, putting up forums and blogs has its own perils: spammers and their spam bots.
It is the admin who suffers the most: database and forum maintenance and, above all, the task of identifying thousands of spam registrations and posts and cleaning them up.
CAPTCHA was introduced to combat spammers. However, spammers have found ways of circumventing this deterrent, which makes it difficult to differentiate between a human and a bot.
Initially, a CAPTCHA was a simple image which the end user was supposed to read and interpret correctly, i.e. type in the words or numbers it showed. Spammers started using OCR to crack these images and automate the entire process, which effectively nullified the CAPTCHA.
Over time, the images being served became more complex, in an attempt to make it difficult for OCR to decipher them. However, it was the human user who eventually suffered, because many of these new-age complex CAPTCHAs are unreadable.
For the past few weeks, we have been combating an attack by spam bots on one of our web-based services. We tried numerous available CAPTCHA modules, which are used by various other entities too. However, whichever CAPTCHA we used, it was broken. The time taken ranged from 1 to 4 minutes per successful attempt.
Note that all the CAPTCHAs we had implemented were static images. Comparing the time the bots took with our own attempts to solve the same CAPTCHAs manually led me to believe that a lot must be happening behind the scenes as far as these spam bots are concerned.
Since all of these CAPTCHAs used static images, we decided to switch to an interactive CAPTCHA. We wanted a CAPTCHA that would force the user to think; a demand for reasoning effort from the end user was what I looked for in the various CAPTCHA solutions.
Some CAPTCHAs provided Flash-based solutions with various effects, such as blurring the images (containing words or numbers) at a particular time rate. A time-based CAPTCHA is an interesting concept.
A time-based CAPTCHA theoretically nullifies the screenshot action of the spam bot. However, the bot's algorithm can be changed to match the timing, take multiple screenshots, and send them off for solving. The solvers in this case can be humans or automated scripts with OCR. Hence, this type of CAPTCHA service was put on the back burner.
Another CAPTCHA service was based on matching words with their associated images: the correct image has to be selected and dragged and dropped into a drop zone. However, a few of the services we analyzed contained a major flaw: the correct image was the only one to contain a hyperlink, which logically makes it the easiest to crack.
Even so, we went ahead with the implementation, and our understanding of this type of CAPTCHA proved correct: the spam bot took 30 seconds to break it. If the hyperlink issue is fixed, we would love to test this CAPTCHA again.
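To make the hyperlink flaw concrete, here is a minimal sketch of a check an admin could run against a challenge's markup before deploying it. It assumes the leak looks like a single image wrapped in an `<a>` tag among several plain images; the actual markup of the services we tested may differ, and the names `LinkedImageFinder` and `leaked_answer` are made up for this illustration.

```python
from html.parser import HTMLParser


class LinkedImageFinder(HTMLParser):
    """Collects every <img> src and notes which images sit inside an <a>."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0      # how many <a> tags are currently open
        self.all_images = []       # src of every image seen
        self.linked_images = []    # src of images found inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            self.all_images.append(src)
            if self.anchor_depth > 0:
                self.linked_images.append(src)

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1


def leaked_answer(markup):
    """Return the src of the only hyperlinked image, or None if nothing leaks.

    Assumption: a challenge gives its answer away when exactly one of
    several candidate images is wrapped in a hyperlink.
    """
    finder = LinkedImageFinder()
    finder.feed(markup)
    if len(finder.linked_images) == 1 and len(finder.all_images) > 1:
        return finder.linked_images[0]
    return None
```

If a check this simple can single out the answer, a bot never needs to look at the images at all, which would be consistent with the short break time we observed.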
Another CAPTCHA we tested was based on a jigsaw puzzle. The end user had to match the missing pieces and place them in their correct positions. This CAPTCHA required the cognitive ability of the human brain as well as interaction from the user. It was clear from the outset that, even though the task of arranging the images was simple from a human perspective, the computing challenge it posed was immense.
Let me explain briefly. To arrange images, a computer program first has to understand the edges and pixel colors. Second, it has to match the edges and pixels of the available pieces against the empty spaces and arrive at a solution.
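The edge-and-pixel matching described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: pieces are square tiles stored as lists of rows of grayscale values, and fit is scored by the sum of squared differences along the touching edge. The function names are hypothetical, and the CAPTCHA we tested almost certainly uses irregular piece shapes and color images, which makes the real problem harder.

```python
def edge_mismatch(edge_a, edge_b):
    """Sum of squared differences between two equal-length pixel edges."""
    return sum((a - b) ** 2 for a, b in zip(edge_a, edge_b))


def right_edge(piece):
    """Last pixel of every row, i.e. the right-hand border of the tile."""
    return [row[-1] for row in piece]


def left_edge(piece):
    """First pixel of every row, i.e. the left-hand border of the tile."""
    return [row[0] for row in piece]


def best_right_neighbour(piece, candidates):
    """Index of the candidate whose left edge best continues `piece`."""
    target = right_edge(piece)
    scores = [edge_mismatch(target, left_edge(c)) for c in candidates]
    # The lowest mismatch score is treated as the best fit.
    return min(range(len(candidates)), key=scores.__getitem__)
```

Even in this toy form, every open side of every empty space has to be scored against every remaining piece, so the work grows quickly with the number of pieces.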
From a spammer's perspective, the intention is to spam using the least amount of resources and processing power. From their business perspective, there are plenty of websites which use simpler CAPTCHAs, so they wouldn't mind switching to a new target. It's not that these CAPTCHAs cannot be cracked, but they do pose a computational problem.
So, what was the end result of this exercise? 0% spam. Until this CAPTCHA gets cracked, the method provided by jigsaw CAPTCHAs is here to stay.