[lug] Web crawler advice
Bear Giles
bgiles at coyotesong.com
Mon May 5 22:54:21 MDT 2008
If you want to be eviiiil, join us on the j2ee bench. You have full
control of the network connection.
E.g., I've always wondered what would happen if you replied to an image
request with a) headers that claim it's something like 3800x2600 pixels
and b) after the header you only fed a dozen bytes a second. Just
remember to flush -- don't let it buffer. Well-behaved browsers know
that they shouldn't establish more than a few simultaneous connections
to the same remote address, and progressive rendering will try to allocate
window space once it has the dimensions. You might keep the connection
up 30-60 seconds before closing it anyway, but you'll use almost no
bandwidth.
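
Very rough sketch of what that could look like as a servlet (names and
numbers are just illustrative, and it assumes the usual javax.servlet
API). The fake dimensions live in the GIF89a header itself, then the
loop drips a dozen junk bytes a second:

import java.io.IOException;
import java.io.OutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SlowImageServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("image/gif");
        OutputStream out = resp.getOutputStream();

        // GIF89a signature plus a logical screen descriptor claiming
        // 3800x2600 pixels (width/height are little-endian 16-bit values).
        out.write(new byte[] {
            'G', 'I', 'F', '8', '9', 'a',
            (byte) 0xD8, 0x0E,        // width  = 3800
            0x28, 0x0A,               // height = 2600
            (byte) 0xF7, 0x00, 0x00   // packed flags, background, aspect
        });
        out.flush();   // push the header out right away -- don't let it buffer

        // Feed roughly a dozen junk bytes per second for up to a minute,
        // then let the connection close anyway.
        try {
            for (int i = 0; i < 60; i++) {
                out.write(new byte[12]);
                out.flush();
                Thread.sleep(1000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}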
Or create a dynamic image with the information you get back from geoip
on the remote address. It still freaks out a lot of people to see
references to their town or a nearby town.
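
Something along these lines would do it -- lookupCity() below is just a
placeholder for whatever geoip library you have wired up, the rest is
plain servlet + ImageIO:

import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GeoTauntServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Placeholder for a real geoip lookup on the remote address.
        String town = lookupCity(req.getRemoteAddr());

        // Render the town name into a small PNG on the fly.
        BufferedImage img = new BufferedImage(400, 60, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 400, 60);
        g.setColor(Color.RED);
        g.drawString("Hello, " + town, 20, 35);
        g.dispose();

        resp.setContentType("image/png");
        ImageIO.write(img, "png", resp.getOutputStream());
    }

    private String lookupCity(String ip) {
        return "somewhere nearby";   // swap in an actual geoip call here
    }
}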
Or you could just get weird on them. Return a redirect to the referer
url. Return a redirect to the last referer url. Return a redirect to
fbi.gov. Set a bunch of deeply disturbing cookies if they ever bother to
check them. (See 'geoip' mentioned above.) Don't forget that you can set
cookies for other sites -- you can't read them, but you can still set
them on most browsers (iirc), so it won't be obvious where they came from.
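
A quick sketch of the redirect-and-cookie games, again with made-up
names (the cookie domain is hypothetical, and whether a browser actually
keeps a cookie scoped to some other site is up to the browser):

import java.io.IOException;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GetWeirdServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // A long-lived, vaguely disturbing cookie whose Domain attribute
        // points at an unrelated site.
        Cookie weird = new Cookie("surveillance_id", "case-" + System.nanoTime());
        weird.setDomain(".example.org");      // hypothetical third-party domain
        weird.setMaxAge(60 * 60 * 24 * 365);  // one year
        resp.addCookie(weird);

        // Bounce them straight back to wherever they claim they came from,
        // or off to fbi.gov if they didn't send a referer at all.
        String referer = req.getHeader("Referer");
        if (referer != null) {
            resp.sendRedirect(referer);
        } else {
            resp.sendRedirect("http://www.fbi.gov/");
        }
    }
}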