
#1 2007-12-11 4:49 am

Russ
Guest

Catching Spambots, part 1

I'm going to talk about a few of the techniques I've used that have been successful in preventing most, if not all, forum spam from getting through. For the purposes of this documentation it's assumed you have at least a little working knowledge of PHP (or whatever scripting language your forum software is written in) and some HTML. My experience is mostly with phpBB and PunBB (Pun is the forum software used on this site).

First, a word or two about CAPTCHAs, the little squiggly letters you are sometimes required to type in when you submit a form. The idea is simple enough: it's a challenge that, in theory, only a human can solve and a machine can't.

That's an important distinction because, as you may have figured out, most of the spam that spreads across the Internet onto forums, guestbooks, comment forms, and anything else that can be filled out is not posted by human visitors. It's posted by programs designed to emulate a web browser as it requests the page, fills out the form, and submits it.

So the idea behind CAPTCHAs is that these scripts or "bots" as they're usually called can't visually process a distorted image, read it, and fill in what it says, something that humans can do easily.

But I don't like CAPTCHAs, for a few reasons. First, they don't really work. It's already been shown that, with the right technique, 92% of CAPTCHAs can easily be read by a computer. Not only that, but spammers are suspected of having CAPTCHAs solved for them by unsuspecting humans, in the guise of offering something like free pr0n.

Even if they were more effective, though, the main reason I don't like them is that they place the burden of telling humans from bots on the innocent human visitors themselves, which is, for a lot of people, a pain in the ass. Additionally, CAPTCHAs can sometimes be distorted enough that humans have a hard time solving them. I know I've entered a few incorrectly myself, and I've gotten emails from users who say they can't register because they can't make out the letters.

So to me, the ideal would be CAPTCHAs that work the other way around: instead of forcing humans to identify themselves as such, you trick the bots into revealing what they are. That's what I set out to do.

In my experience, the average spambot is quite dumb. It does what it's instructed to do: it sucks down a forum's registration page, for example, parses the fields it finds, fills them in with data, and submits the form. And it will fill in pretty much any field it comes across. So my concept is this: include some "extra" fields in the form that a human would not fill out but a bot would fill in indiscriminately. You can then tell the two apart simply by testing whether your control fields are populated.

How best to do this, so that humans don't inadvertently fill out the fields but the bots are still fooled? We can capitalize on the point I made earlier: spambots are pretty dumb. So we hide these control fields in the form so they aren't visible in a visual browser, while spambots still think they exist. There are a few ways to do that; the easiest I've found is simply to put the control fields inside an HTML comment.

Comments in HTML begin with "<!--" and end with "-->". A visual user agent like a web browser simply ignores anything inside a comment, although of course you can see it if you view the page source. Almost every spambot I've encountered doesn't recognize comments, so it parses the commented-out fields along with the real ones and submits data in them. Then, when the form is submitted, you just test whether the custom fields you created contain anything (a browser never submits them at all). If they do, you've caught a spammer. Halt registration.
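To make the comment trick concrete, here is a minimal sketch in Python (used purely for illustration; the forums discussed here are written in PHP). It compares what a browser-like parser and a naive, regex-scraping bot see in the same form. The form markup, the field names, and the "bot" are all hypothetical stand-ins; real spambots vary.

```python
import re
from html.parser import HTMLParser

# Hypothetical registration form: "url" and "realname" are control fields
# hidden inside an HTML comment, so a browser never renders them.
FORM_HTML = """
<form method="post" action="register.php">
  <input type="text" name="username">
  <input type="text" name="email">
  <!--
  <input type="text" name="url">
  <input type="text" name="realname">
  -->
  <input type="submit" name="register" value="Register">
</form>
"""


class BrowserFieldParser(HTMLParser):
    """Collects input names the way a browser sees them: comment contents
    go to handle_comment and never reach handle_starttag."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.fields.append(name)


def browser_fields(html):
    parser = BrowserFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def naive_bot_fields(html):
    # A dumb bot that pattern-matches <input> tags and never notices comments.
    return re.findall(r'<input[^>]*\bname="([^"]+)"', html)


print(browser_fields(FORM_HTML))    # ['username', 'email', 'register']
print(naive_bot_fields(FORM_HTML))  # ['username', 'email', 'url', 'realname', 'register']
```

The browser-like parser never reports the commented-out inputs because Python's `html.parser` routes everything between `<!--` and `-->` to `handle_comment`, while the regex never notices the comment markers and happily "finds" the control fields.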

This is ideal because the human visitors you want to register are not inconvenienced by CAPTCHAs or other means of proving themselves, and your spammers are all caught silently in the background.

So what kind of names do you give your control fields? I usually name them things that I know spammers can't resist filling out, such as "url", "interests", or "realname". In my experience these get filled out consistently with junk, and are good tests.
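A minimal Python sketch of the server-side test, assuming the submitted form has already been parsed into a dictionary (in PHP this would be `$_POST`). The function name and sample data are hypothetical; only the field names come from the suggestions above.

```python
# Hypothetical control-field names, taken from the suggestions above.
CONTROL_FIELDS = ("url", "interests", "realname")


def is_spambot(form_data, control_fields=CONTROL_FIELDS):
    """Return True if any hidden control field was submitted with a value.

    form_data maps submitted field names to values (the equivalent of PHP's
    $_POST). A browser never renders the commented-out fields, so a genuine
    registration leaves them absent; any non-empty value means a bot.
    """
    return any(form_data.get(name) for name in control_fields)


human = {"username": "alice", "email": "alice@example.com"}
bot = {
    "username": "cheapmeds",
    "email": "bot@spam.example",
    "url": "http://spam.example/",
    "realname": "Buy Now",
}

print(is_spambot(human))  # False
print(is_spambot(bot))    # True
```

Treating an absent field and an empty string the same way keeps the check simple: only a submission that actually put something in a control field is rejected.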

The downside is the implementation. It requires modifying the registration form and the backend code that processes it, which is difficult for someone without programming experience and can make upgrades challenging when new versions come out. My goal is one day to develop mods for common applications like Pun and phpBB that would make installation easier and these techniques more accessible.

In the meantime, this is roughly how most of the entries found on my site are caught. I then take the additional step of having them added to the database automatically.
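The post doesn't say how that database is organized. As a hedged illustration only, this Python sketch logs each caught registration to a local SQLite table (the table, columns and function names are hypothetical) so that a later registration from the same email address or IP can be flagged.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Open (or create) a hypothetical local table of caught spammers."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spammers ("
        " username TEXT, email TEXT, ip TEXT, added_at TEXT)"
    )
    return conn


def record_spammer(conn, username, email, ip):
    """Store a registration that tripped a control field."""
    conn.execute(
        "INSERT INTO spammers (username, email, ip, added_at) VALUES (?, ?, ?, ?)",
        (username, email, ip, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def is_known_spammer(conn, email, ip):
    """True if this email address or IP has been caught before."""
    row = conn.execute(
        "SELECT 1 FROM spammers WHERE email = ? OR ip = ? LIMIT 1", (email, ip)
    ).fetchone()
    return row is not None


conn = open_db()
record_spammer(conn, "cheapmeds", "bot@spam.example", "198.51.100.7")
print(is_known_spammer(conn, "bot@spam.example", None))            # True
print(is_known_spammer(conn, "alice@example.com", "203.0.113.5"))  # False
```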

Next, when I get a chance, I plan to post some code examples.

Also, people have told me that spammers could easily work around these techniques, and that's true, the key word being could. But given the millions of spambots already deployed out there, and knowing that stock, insecure forums will always be the majority, I'd say it will be a long time before this technique becomes obsolete, and by then new techniques can be invented. There is rarely a single answer to fighting spam; several approaches used together are most effective.

Fighting spam has always been an arms race of sorts anyway. There's never a clear winner; each side just gets a bigger arsenal.

#2 2007-12-12 9:22 pm

mj12
Member
Registered: 2007-12-12
Posts: 11

Re: Catching Spambots, part 1

A bit off topic, but I'm currently working on a C# app that will automatically log in to our internet radio station's admin page and navigate to the page listing the users in the validation queue, pending registration. Each member's name, IP address, email address and domain name will be checked against the banned lists, and the member will be deleted if found. The idea is to fully automate the process. When it's ready I'll make it available, including the source code, to anyone who might be interested.
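The C# app itself wasn't posted. As a rough idea of the check it describes, here is a Python sketch that tests a pending member's name, IP address, email address and email domain against ban lists. The class and function names are hypothetical, and the lists are assumed to be already loaded and lower-cased.

```python
from dataclasses import dataclass


@dataclass
class PendingMember:
    name: str
    ip: str
    email: str


@dataclass
class BanLists:
    # All entries are assumed to be stored in lower case.
    names: frozenset
    ips: frozenset
    emails: frozenset
    domains: frozenset


def email_domain(email):
    """Domain part after the last '@', lower-cased ('' if there is no '@')."""
    _, at, domain = email.rpartition("@")
    return domain.lower() if at else ""


def should_delete(member, bans):
    """True if the member's name, IP, email or email domain is banned."""
    return (
        member.name.lower() in bans.names
        or member.ip in bans.ips
        or member.email.lower() in bans.emails
        or email_domain(member.email) in bans.domains
    )


bans = BanLists(
    names=frozenset({"cheapmeds"}),
    ips=frozenset({"198.51.100.7"}),
    emails=frozenset(),
    domains=frozenset({"spam.example"}),
)
queue = [
    PendingMember("alice", "203.0.113.5", "alice@example.com"),
    PendingMember("pillbot", "192.0.2.44", "x@SPAM.example"),
]
print([m.name for m in queue if should_delete(m, bans)])  # ['pillbot']
```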


#3 2007-12-13 1:46 pm

Russ
Guest

Re: Catching Spambots, part 1

That would be great. I can post the code on the site if you like, and attribute the code to you. I don't know C# at all.

I'm also interested in developing some plugins for forum software like phpBB that would modify the registration form to check this site through the API whenever someone registers.

#4 2007-12-13 3:48 pm

mj12
Member
Registered: 2007-12-12
Posts: 11

Re: Catching Spambots, part 1

That'd be great! It's kinda cool to watch this app fire up IE, navigate to the admin page of our site, fill in the login textboxes, click the login button, etc. It then clicks the appropriate links on the admin page to reach the page that lists the members whose registrations are pending validation. It goes through that list, parsing out each IP address, email address and domain name. For each row in the HTML table, it submits queries to your website and checks the results that come back. Depending on what's found in the response, it ticks the checkbox on the admin page, marking that member for deletion. After all rows have been processed, it clicks the "Process" button to actually delete the "blacklisted" members from the validation queue. More later...
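The row-by-row parsing described above could look roughly like this Python sketch. It assumes a hypothetical admin page whose pending-members table has name, IP and email in its first three cells, and it takes the blacklist lookup as a callable standing in for the queries to this site. Instead of driving a browser, it just returns the names that would be ticked for deletion.

```python
from html.parser import HTMLParser


class PendingTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which use <th> rather than <td>
                self.rows.append(self._row)
            self._row = None


def members_to_delete(html, is_blacklisted):
    """Return names of pending members whose IP, email or domain is blacklisted.

    Assumes the first three cells of each row are name, IP and email.
    """
    parser = PendingTableParser()
    parser.feed(html)
    parser.close()
    marked = []
    for row in parser.rows:
        if len(row) < 3:
            continue
        name, ip, email = row[:3]
        domain = email.rpartition("@")[2].lower()
        if is_blacklisted(ip=ip, email=email, domain=domain):
            marked.append(name)
    return marked


PAGE = """
<table>
  <tr><th>Name</th><th>IP</th><th>Email</th></tr>
  <tr><td>alice</td><td>203.0.113.5</td><td>alice@example.com</td></tr>
  <tr><td>cheapmeds</td><td>198.51.100.7</td><td>bot@spam.example</td></tr>
</table>
"""


def lookup(ip, email, domain):
    # Stand-in for querying the site; here just two hard-coded sets.
    return ip in {"198.51.100.7"} or domain in {"spam.example"}


print(members_to_delete(PAGE, lookup))  # ['cheapmeds']
```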


#5 2007-12-14 1:16 am

lv2jm
Member
From: Northern Ontario
Registered: 2007-11-30
Posts: 6

Re: Catching Spambots, part 1

I'm just an assistant admin on my little forum, with limited permissions and zero HTML knowledge. Is that how the Anti-Spam ACP mod works, giving bots several fields to fill in and denying them entry when they do? I've had reports by mail of lots of bots being kicked out in the last year with that mod.

Andre


#6 2007-12-14 4:44 am

Russ
Guest

Re: Catching Spambots, part 1

I'm not sure, I'd have to look at the mod and see what it does.

#7 2007-12-22 4:59 pm

forrie
Member
Registered: 2007-12-22
Posts: 15

Re: Catching Spambots, part 1

How would this be implemented in software like vBulletin, given that all the fields have to be set via the admin control panel, so the PHP/HTML is generated dynamically from the database?


#8 2007-12-22 5:20 pm

mj12
Member
Registered: 2007-12-12
Posts: 11

Re: Catching Spambots, part 1

forrie wrote:

How would this be implemented in software like vBulletin, given that all the fields have to be set via the admin control panel, so the PHP/HTML is generated dynamically from the database?

I presume your question is in response to my post... Using WatiN, which is an interface between .NET and Internet Explorer, it's very easy to fill in form textbox controls, click "Submit" buttons, etc. In its current implementation, the app I'm working on fires up an instance of IE, navigates to www.stopforumspam.com/spamdomains and gets the latest list of all spammer domains. Then it logs on to our forum admin console and scans the page containing the banned domains, looking for ones that haven't been added yet. Any it finds are added. This process takes just a few seconds to complete.
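The heart of that sync step is a set difference. This Python sketch assumes the downloaded list has one domain per line (the actual format of the spamdomains page isn't described here) and leaves out the browser automation that WatiN handles; the function names are hypothetical.

```python
def normalize(domain):
    """Lower-case a domain and drop surrounding whitespace and a trailing dot."""
    return domain.strip().lower().rstrip(".")


def domains_to_add(latest_text, banned_on_forum):
    """Return spammer domains not yet on the forum's ban list, sorted.

    latest_text is assumed to be the downloaded list with one domain per line;
    banned_on_forum is whatever the forum's admin page currently shows.
    """
    latest = {normalize(line) for line in latest_text.splitlines() if line.strip()}
    banned = {normalize(d) for d in banned_on_forum if d.strip()}
    return sorted(latest - banned)


latest_text = "spam.example\nCheapPills.example\n\nbad.example\n"
already_banned = ["spam.example", "other.example"]
print(domains_to_add(latest_text, already_banned))  # ['bad.example', 'cheappills.example']
```

Normalizing both sides before comparing avoids re-adding a domain that is already banned under different capitalization.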

BTW... WatiN can be downloaded from here


#9 2008-01-29 11:03 pm

geordief
Member
Registered: 2008-01-29
Posts: 9

Re: Catching Spambots, part 1

I was looking for tactics against spambots some time back, with limited real success (I only have a site with a contact form).
I like your idea of the comments and am trying it out on my dummy form pages (old, discarded pages that still attract the formbots).

I also came across another idea recently on the Mailwasher forum.
It was designed as a normal filter, but it seems to work for the formbots as well. They are called country relay filters.
Here is the link:
http://www DOT castlecops DOT com/t209159-another_filter_tweaks_from_me.html
I don't know exactly why it works, or even whether it works in all cases.

PS: I have now tried out your comments idea and it does seem to work (but not always; I wonder if that is because some bots are using a cached copy of my page).

Last edited by geordief (2008-01-30 12:11 am)



Powered by FluxBB
