[lug] Google (and other) ftp crawlers.

Bear Giles bgiles at coyotesong.com
Fri Feb 25 14:40:19 MST 2011


It doesn't /deny/ anything; it's just a hint, and many poorly written
crawlers will ignore it.

BTW the intent of the file is to identify dynamic content that's pointless
to crawl. E.g., a cached copy of "current weather" will be pretty useless in
a week. It's also good to mark large files that are available elsewhere,
e.g., there's no point in downloading and caching my Ubuntu .iso images
since they're widely available elsewhere.
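
To make the "hint" part concrete, here is a rough sketch of what a
well-behaved crawler does with the file, using Python's standard
urllib.robotparser. The robots.txt contents, the /weather/ and /iso/ paths,
the "ExampleBot" agent name, and the allowed() helper are all made up for
illustration; nothing here is taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one dynamic area and one directory of big files
# that are available from mirrors anyway. Paths are invented for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /weather/
Disallow: /iso/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if a polite crawler named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/iso/ubuntu-10.10.iso", "/weather/now", "/pub/readme.txt"):
        print(path, allowed(path))
```

Note that the check happens entirely on the crawler's side. The server never
refuses anything, so a crawler that skips this step downloads whatever it
likes.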

A lot of people think it can be used to protect sensitive information, but
that was never its intent. I will leave it to your imagination whether it
can be productive to troll those files to see if anyone has highlighted 'the
juicy bits' from that misunderstanding.


On Fri, Feb 25, 2011 at 1:49 PM, Stephen Kraus <ub3ratl4sf00 at gmail.com> wrote:

> Yes, there is a config file you can add that denies crawlers the right to
> access and download stuff from your server via Robots.txt
>
> On Fri, Feb 25, 2011 at 1:46 PM, Dave Pitts <dpitts at cozx.com> wrote:
>
>> Hello:
>>
>> Is there a way to get Google (and other) sites to stop crawling through my
>> anon ftp site? They download everything and drop my network access to a
>> crawl.... sometimes causing my applications to timeout and die.
>>
>> Thanks in advance.
>>
>> --
>> Dave Pitts             PULLMAN: Travel and sleep in safety and comfort.
>> dpitts at cozx.com        My other RV IS a Pullman (Colorado Pine).
>> http://www.cozx.com
>>
>> _______________________________________________
>> Web Page:  http://lug.boulder.co.us
>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
>>
>