[lug] Blocking "Missigua Locator" from Hoovering my Server

Daniel Webb lists at danielwebb.us
Sun Apr 9 13:45:11 MDT 2006


On Sun, Apr 09, 2006 at 10:22:43AM -0600, Bill Thoen wrote:

> A couple of days ago my FC2/Apache server was visited by a 'bot that
> ignored robots.txt; on every PHP page it could find, it tried to stuff it
> with a very long string made up of /software/software/software/..., etc.
>  
> Is there a simple way to deny access to anyone using the user agent
> 'Missigua Locator 1.9'? Is it possible to filter on the user agent?

Yes, lots of people do it that way, although I don't have a link handy.
Search Google for "bad web robot" or something along those lines.
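Filtering on the user agent is straightforward: on Apache it is usually done with mod_setenvif or mod_rewrite matching the User-Agent request header. The same idea is sketched below in Python as WSGI middleware, purely as an illustration. The names `BLOCKED_AGENTS`, `is_blocked`, `UserAgentBlocker` and `hello_app` are hypothetical, and the case-insensitive substring match is an assumption about how strict you want to be. The obvious weakness of any such filter is that a bot can send whatever User-Agent string it likes.

```python
"""Hypothetical sketch: refuse requests whose User-Agent matches a blocklist."""

from typing import Callable, Iterable

# Illustrative blocklist (assumption): lowercase substrings matched case-insensitively.
BLOCKED_AGENTS = ("missigua locator",)


def is_blocked(user_agent: str, blocked: Iterable[str] = BLOCKED_AGENTS) -> bool:
    """Return True if any blocked substring appears in the User-Agent string."""
    ua = user_agent.lower()
    return any(pattern in ua for pattern in blocked)


class UserAgentBlocker:
    """WSGI middleware that answers 403 Forbidden for blocked user agents."""

    def __init__(self, app: Callable, blocked: Iterable[str] = BLOCKED_AGENTS):
        self.app = app
        self.blocked = tuple(b.lower() for b in blocked)

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_blocked(user_agent, self.blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]
```

Wrapping the site is then just `app = UserAgentBlocker(hello_app)`; requests claiming to be "Missigua Locator 1.9" get a 403, everything else is served normally.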

A better solution is something like my bot trap:

http://danielwebb.us/software/bot-trap/

I'm biased, though. :)

With my solution, you ban every bot that ignores robots.txt, regardless of
its user-agent string.
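The general idea behind a robots.txt trap can be sketched as follows. This is a hypothetical illustration, not the code behind the link above: it assumes a trap path `/bot-trap/` that robots.txt disallows, the names `TRAP_PREFIX`, `ROBOTS_TXT`, `BotTrap` and `hello_app` are made up, and banned addresses are kept in memory, where a real setup would persist them or push them into the firewall.

```python
"""Hypothetical bot-trap sketch: ban any client that fetches a path robots.txt forbids."""

from typing import Callable, Set

# Assumed trap location; robots.txt tells well-behaved crawlers to stay out of it.
TRAP_PREFIX = "/bot-trap/"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PREFIX}\n".encode()


class BotTrap:
    """WSGI middleware: serve robots.txt, ban IPs that enter the trap, refuse banned IPs."""

    def __init__(self, app: Callable):
        self.app = app
        self.banned: Set[str] = set()  # in-memory only (assumption); lost on restart

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        path = environ.get("PATH_INFO", "/")

        # robots.txt is always served, so every client gets fair warning.
        if path == "/robots.txt":
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [ROBOTS_TXT]

        # Any client that ignores the Disallow rule and enters the trap is banned,
        # whatever User-Agent string it sends.
        if path.startswith(TRAP_PREFIX):
            self.banned.add(ip)

        if ip in self.banned:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]

        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]
```

A crawler that honors robots.txt never touches `/bot-trap/`, so only misbehaving clients get banned; one that reads robots.txt and heads straight for the disallowed path is cut off after two requests.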

Some of those jerks will actually load robots.txt and go straight to the
disallowed locations FIRST! I like them; they only cost me two page loads.

One example of this is a particularly nasty netizen called Cyveillance.
Cyveillance is a company with a HUGE network pipe that scans *everything*,
looking for disparaging things said about its corporate clients so it can
send frivolous legal threats in the hope that people won't stand up to them.
They are so bad that many webmasters simply block them by IP address at the
firewall. This email will no doubt come to their attention.
