[lug] Web crawler advice

George Sexton gsexton at mhsoftware.com
Mon May 5 19:34:40 MDT 2008


Go to:

http://www.aww-faq.org/#quickanswers

and read "How can I stop someone from hot-linking to my images?"
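
On the question quoted below about automating leech detection from the logs, here is a minimal sketch in Python, not a tested production tool. It assumes the server writes the Apache "combined" log format, and that the site's own hostnames are listed in OWN_HOSTS (example.com is a placeholder, not a real setting). The function name and the list of image extensions are likewise made up for illustration. It counts image requests whose Referer points at some other host.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: these are the hostnames the site itself is served from.
OWN_HOSTS = {"example.com", "www.example.com"}
# Assumption: only these extensions count as "images" for this check.
IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif")

# Assumption: Apache "combined" log format, e.g.
# 1.2.3.4 - - [date] "GET /a.jpg HTTP/1.1" 200 123 "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hotlink_referers(lines, own_hosts=OWN_HOSTS):
    """Count image requests per foreign referring host."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a combined-format line; skip it
        path = urlsplit(m.group("path")).path.lower()
        if not path.endswith(IMAGE_EXT):
            continue  # only image requests are of interest here
        referer = m.group("referer")
        if referer in ("", "-"):
            continue  # no referer sent: cannot attribute the hit
        try:
            host = urlsplit(referer).hostname
        except ValueError:
            continue  # malformed referer URL
        if host and host not in own_hosts:
            counts[host] += 1
    return counts
```

A cron job could feed the current access log to hotlink_referers() and mail the output of counts.most_common(20). Hosts that show up with large counts are the candidates for a referer-based rewrite rule. Search engine image results and web mail clients will also appear as foreign referers, so the list needs a human look before anything is blocked.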

karl horlen wrote:
> Can you say more about how you detect that people are leeching your site content, and how you prevent it? For instance, what specific rewrite rules or other techniques do you use to defeat this type of behavior?
> 
> Do you automate the leech detection? I'd think it would be pretty tedious to manually inspect the logs for this type of behavior on a regular basis. Do you have a cron script that periodically checks for certain logfile entries? If so, would you mind sharing some of it, or some of the techniques you use to detect the rogue hits?
> 
> Finally, is there any way that one could "inject" "id info" into site content / pages and then later do a Google search for those "id tags" to see whether any other site's pages have been spidered with them? I'm thinking that if you injected a really unique id tag in the HTML code, like an element attribute that wouldn't be displayed, it might actually get flagged by Google. Just a thought.
> 
> Thanks
> 
> 
>> MySpace and people deep-linking to content off-site is really annoying
>> on busy pages on their site too, but that's easily handled with a
>> rewrite rule to send them off to REALLY nasty photos (if I'm in a bad
>> mood) so they'll stop using me as their "image host", by linking to
>> only the images in my content and then loading 100 copies of it every
>> time some moron hits refresh on a MySpace page where some doofus has
>> used my images in their "avatar".
>>
>> Nate
>> _______________________________________________
>> Web Page:  http://lug.boulder.co.us
>> Mailing List:
>> http://lists.lug.boulder.co.us/mailman/listinfo/lug
>> Join us on IRC: lug.boulder.co.us port=6667 channel=#colug
> 
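
On the "id tag" idea in the quoted message, a rough sketch of one way to generate such markers follows. Everything in it is an assumption for illustration: the SECRET value, the "zq" prefix, the function names and the "ref" class are invented. It also places the marker in visible page text instead of a hidden attribute, on the guess that displayed text is more likely to end up in a search index. Whether a search engine indexes either form is not something this sketch can promise.

```python
import hashlib
import hmac

# Assumption: a private value known only to the site owner.
SECRET = b"change-me"


def page_marker(page_path, secret=SECRET, length=16):
    """Derive a stable, hard-to-guess marker string for one page."""
    digest = hmac.new(secret, page_path.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    # The "zq" prefix is arbitrary; it only makes the marker start
    # with letters so it looks like a single odd word.
    return "zq" + digest[:length]


def add_marker(html, page_path, secret=SECRET):
    """Insert the marker as a short paragraph just before </body>."""
    snippet = '<p class="ref">' + page_marker(page_path, secret) + "</p>"
    head, sep, tail = html.rpartition("</body>")
    if not sep:
        return html + snippet  # no closing body tag found: append
    return head + snippet + sep + tail
```

Because the marker is derived from the page path and a secret, it can be recomputed later and pasted into a search box without keeping a table of issued markers. A copier who strips unfamiliar text from the page will defeat it.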

-- 
George Sexton
MH Software, Inc.
Voice: +1 303 438 9585
URL:   http://www.mhsoftware.com/


