Fighting Back Against Blog Comment Spam

Defensive Steps

Make sure your web site isn’t hosting landing pages for spammers.

A quick and dirty way to do this is with a Google site search. This can be generated with the URL
https://www.google.com/search?q=site%3Ayour.domain
or simply by typing site:your.domain into the search box at Google. This will show all the pages Google has indexed for your domain.
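
If you check more than one domain, the search URL above can be generated from the command line. This is only a small sketch: the function name site_search_url is made up for illustration, and it assumes the domain contains nothing that needs further URL encoding.

```shell
# Print the Google site-search URL for the domain given as $1.
# %%3A in the printf format emits the literal text %3A (an encoded colon).
site_search_url() {
    printf 'https://www.google.com/search?q=site%%3A%s\n' "$1"
}
```

Running site_search_url your.domain prints the URL, which can then be pasted into a browser.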

If the Google results show pages you didn’t create, you have some cleanup to do. Make sure you update your access control passwords and that you have up-to-date versions of any content management system (CMS) or other web application software that spammers may be using as a vehicle for access.

Prevent spammers from successfully posting and linking.

If you use blog or bulletin board software on your web site, check the configuration settings that control the posting of comments. Comment moderation, keyword filters, and CAPTCHA systems can be very helpful. If your software doesn’t already add rel="nofollow" to comment links, see if there is a version that does. That way, Google (and probably other search engines) will ignore the links if any spammer’s posts do make it onto your site.
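
One rough way to see whether links on a published page carry rel="nofollow" is to save or fetch the page and list the anchor tags that lack it. The sketch below makes some assumptions: the function name links_missing_nofollow is hypothetical, each anchor's opening tag sits on a single line (grep works line by line), and a grep that supports -o is available, as it is on typical Linux systems.

```shell
# Read HTML on stdin and print every opening <a ...> tag that does not
# mention nofollow anywhere inside the tag. Matching is case-insensitive.
links_missing_nofollow() {
    grep -oi '<a [^>]*>' | grep -vi 'nofollow'
}
```

For example, links_missing_nofollow < saved-page.html, where saved-page.html is a placeholder for a post page saved from your browser. The output covers every link on the page, not only those in comments, so look for the ones that come from the comment section.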

Going On The Offensive

If you want to take a more aggressive approach, you can notify the owners of hacked web sites that are hosting spammers’ landing pages by harvesting the links the spammers submit in the comments to your site. The techniques for this vary, but here is an approach you can use with comments in a WordPress Spam comment folder and access to a Linux command line:

  1. View the first page of comments in your Spam comment folder.
  2. Using the “view source” facility of your browser, search for
    class="comment"
    (this is the code that allows for quick in-line editing of comments)
  3. Cut and paste from the beginning of the containing table to the end of the containing table (from the <table> tag to the </table> tag) into the following command line sequence (you may want to create a script for this):
    perl -ne 'print "$1\n" while /&quot;(http.*?)&quot;/g;' |
    (touch notified; grep -vf notified) |
    sort -u -o notify

    This will extract the URLs referring to spammers’ landing pages in the comments (except for any already recorded in the file “notified”) and put a sorted, unique list into the file “notify”.

  4. View any additional pages of Spam comments and cut-and-paste the comment source the same way for each additional page. The command sequence will continue to accept input until you enter Ctrl-D by itself on a line (or twice mid-line) to end script processing.
  5. To determine whom to notify, look for a “contact us” or similar page, which most web sites have, or perform a “whois” query on the domain to find administrative or technical contacts.
  6. I use a custom script to generate my email notices, but you can also generate one by hand in your email client and use it over and over again as a template if it has an “edit message as new” facility like the one available in Mozilla’s free Thunderbird client.
  7. Copy the URLs (or, even better, just the domain portions in the form //example.com/) for any sites you notified into the “notified” file, and they won’t appear again in your future “notify” results.

This won’t keep the spammers from spamming (they’ll keep telling us, over and over again, which sites they’ve compromised), but with a few minutes of time here and there, you can make the Internet a better place, undo some of their work, and invalidate some of their spam comment links everywhere.

One thought on “Fighting Back Against Blog Comment Spam”

  1. kcsadm Post author

    As a faster and easier alternative to cutting and pasting the comment HTML, you can also extract the spam comments directly from MySQL. In this case, the quotes around the link hrefs are literal characters rather than &quot; entities, and the link extraction looks something like this:

    mysql -B -e 'select comment_content from wp_comments where comment_approved = "spam" ' database_name |
    perl -ne 'print "$1\n" while /href="(http.*?)"/g;' |
    sort -u

    This assumes that you’re storing user name and password information in .my.cnf in your home directory so that it is unnecessary to supply credentials on the command line.
