MSNbot – Ban the Bot, Or Not?

I am quite happy to have search engine robots visiting Smiffy’s Place, following the links and indexing the content. The majority of my robot visitors seem to do just that, and are no trouble at all.

Today, I had reason to look at the server error log for Smiffy’s Place and found some activity coming from msnbot with which I was less than pleased.

  1. It had tried to access my main images directory, even though robots.txt disallows it.
  2. Although I cannot make out the full request, my blog software had thrown an error on a URI which contained the following string: .get_permalink($post->ID). This – I believe – is actually WordPress code. Just why was a search engine robot trying to force scripting into my software?
  3. It had followed a concealed, commented-out link to a honeypot for spam harvesters.

When I started documenting this, I wondered whether msnbot was really welcome at Smiffy’s Place. Having now got to the end of the post, I am in no doubt that it is not. My robots.txt now includes the following lines:

User-agent: msnbot
Disallow: /
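
For anyone wanting to check rules like these before relying on them, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the RULES string are my own for illustration, and this only shows how Python's parser reads the rules, which may not be how msnbot itself reads them.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: a bare product name and a directive with a colon.
RULES = "User-agent: msnbot\nDisallow: /\n"


def is_allowed(robots_txt, user_agent, path):
    """Return True if this parser would let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Compare the blocked robot with some other robot.
    for agent in ("msnbot/1.0", "Googlebot/2.1"):
        print(agent, is_allowed(RULES, agent, "/images/"))
```

Two details are worth knowing about this particular parser: it skips any line that lacks a colon after the directive name, and it compares only the name part of the visitor's user agent (the text before the slash), so a rule written with a version number may never match.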

I will now watch with interest to see whether or not visits from msnbot actually cease. If not, the netblock will be finding its way into an iptables rule to keep it out of my servers once and for all.
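
Watching for visits to cease can be done by counting msnbot requests per day in the access log. The following is only a sketch: it assumes the Apache "combined" log format, with the user agent as the last quoted field, and the function name msnbot_hits_per_day is made up for this example.

```python
import re
from collections import Counter

# Assumption: Apache "combined" format. The date is the part of the
# bracketed timestamp before the first colon, and the user agent is
# the final quoted field on the line.
LOG_LINE = re.compile(r'\[(?P<day>[^:]+):[^\]]*\].*"(?P<agent>[^"]*)"\s*$')


def msnbot_hits_per_day(lines):
    """Count requests per day whose user agent mentions msnbot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "msnbot" in match.group("agent").lower():
            hits[match.group("day")] += 1
    return hits
```

Feeding it the lines of an access log (for example, an open file object) gives a per-day tally; if the counts fall to zero in the days after the robots.txt change, the robot is respecting the ban.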

I would love to hear the experiences of others in relation to this robot – please leave a comment against this article, or drop an e-mail to the address at the bottom of the page.

Oh, Smiffy’s Place now also has a filter to return a rude message on receipt of anything that looks like an XSS attempt. Try adding a dollar sign to the end of the URI of this post and you will see.
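
A filter of this kind can be very simple. The sketch below is not the code running on this site; it is a hypothetical illustration in Python that flags any URI containing characters which rarely belong in a normal permalink, including the dollar sign. The character set and the name looks_like_injection are assumptions chosen for the example.

```python
from urllib.parse import unquote

# Assumption: none of these characters appear in legitimate URIs on
# the site, so their presence is treated as a probable injection attempt.
SUSPICIOUS = set("$<>\"'`(){}")


def looks_like_injection(uri):
    """Return True if the percent-decoded URI holds a suspicious character."""
    decoded = unquote(uri)
    return any(ch in SUSPICIOUS for ch in decoded)
```

A string such as .get_permalink($post->ID) would trip this check several times over, as would a dollar sign disguised as %24, because the URI is decoded before it is inspected.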