[lug] Web crawler advice

Bear Giles bgiles at coyotesong.com
Mon May 5 22:54:21 MDT 2008


If you want to be eviiiil, join us on the j2ee bench. You have full 
control of the network connection.

E.g., I've always wondered what would happen if you replied to an image 
request with a) an image header that claims it's something like 3800x2600 
pixels and b) after that header, only a dozen bytes a second. Just 
remember to flush -- don't let it buffer. Well-behaved browsers know 
that they shouldn't establish more than a few simultaneous connections 
to the same remote address, and progressive rendering will try to allocate 
window space once it has the dimensions. You might keep the connection 
open for 30-60 seconds before closing it anyway, but you'll use almost no 
bandwidth.
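
A rough servlet sketch of the idea -- the pixel dimensions actually live 
in the image file's own header rather than the HTTP headers, so this 
hand-builds a PNG signature and IHDR chunk claiming 3800x2600, then 
dribbles bytes with explicit flushes. The class name, content length and 
timings are just illustrative:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TarpitImageServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("image/png");
        // Claim a large body so the client keeps waiting for the rest.
        resp.setContentLength(8 * 1024 * 1024);

        OutputStream out = resp.getOutputStream();

        // PNG signature plus an IHDR chunk declaring 3800x2600 pixels, so a
        // progressively rendering browser reserves that much window space.
        out.write(new byte[] { (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' });
        out.write(ihdrChunk(3800, 2600));
        out.flush();   // important: push it past any servlet buffering

        // Now dribble a dozen bytes of "image data" per second for ~60 seconds.
        try {
            for (int i = 0; i < 60; i++) {
                out.write(new byte[12]);
                out.flush();
                Thread.sleep(1000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Return without ever sending a complete image; the container
        // closes the connection when the declared length isn't met.
    }

    /** Build a PNG IHDR chunk for the given dimensions (8-bit truecolor). */
    private static byte[] ihdrChunk(int width, int height) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(body);
        data.writeBytes("IHDR");
        data.writeInt(width);
        data.writeInt(height);
        data.write(8);   // bit depth
        data.write(2);   // color type: truecolor
        data.write(0);   // compression
        data.write(0);   // filter
        data.write(0);   // interlace
        byte[] chunk = body.toByteArray();

        CRC32 crc = new CRC32();
        crc.update(chunk);   // CRC covers chunk type + data

        ByteArrayOutputStream full = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(full);
        out.writeInt(chunk.length - 4);   // length field excludes the "IHDR" tag
        out.write(chunk);
        out.writeInt((int) crc.getValue());
        return full.toByteArray();
    }
}

The flush() after each write is the important part; otherwise the 
container happily buffers everything and defeats the point.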

Or create a dynamic image with the information you get back from geoip 
on the remote address. It still freaks out a lot of people to see 
references to their town or a nearby town.
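
Sketch of the dynamic-image bit, assuming you already have some geoip 
database on hand -- lookupCity() here is a placeholder, not a real 
library call:

import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.IOException;

import javax.imageio.ImageIO;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GeoTauntServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Placeholder wrapping whatever geoip lookup you actually use.
        String city = lookupCity(req.getRemoteAddr());

        // Render the town name into a small PNG and send it back.
        BufferedImage img = new BufferedImage(400, 60, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, img.getWidth(), img.getHeight());
        g.setColor(Color.BLACK);
        g.setFont(new Font("SansSerif", Font.BOLD, 18));
        g.drawString("Hello, visitor from " + city + ".", 10, 35);
        g.dispose();

        resp.setContentType("image/png");
        ImageIO.write(img, "png", resp.getOutputStream());
    }

    private String lookupCity(String ip) {
        // Stub: plug in your geoip lookup of choice here.
        return "somewhere nearby";
    }
}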

Or you could just get weird on them. Return a redirect to the referer 
url. Return a redirect to the last referer url you saw. Return a redirect 
to fbi.gov. Set a bunch of deeply disturbing cookies in case they ever 
bother to check them. (See 'geoip' mentioned above.) Don't forget that 
you can set cookies for other sites -- you can't read them, but you can 
still set them on most browsers (iirc), so it won't be obvious where they 
came from.
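
And a sketch of the redirect-to-referer plus cookie weirdness -- the 
cookie name and value are made up, and you could wire in the geoip 
result from above instead:

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class WeirdnessServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Leave an unsettling cookie for anyone who inspects them later.
        Cookie cookie = new Cookie("last_known_location", "your_town_here");
        cookie.setMaxAge(60 * 60 * 24 * 365);   // keep it around for a year
        resp.addCookie(cookie);

        // Bounce the client straight back where it claims it came from,
        // or somewhere alarming if it didn't send a referer at all.
        String referer = req.getHeader("Referer");
        if (referer != null) {
            resp.sendRedirect(referer);
        } else {
            resp.sendRedirect("http://www.fbi.gov/");
        }
    }
}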
