[lug] Web crawler advice

Tue May 6 18:30:08 MDT 2008

> > But how does one attach a js to an image if you don't control the page
> > that loads the image?  Since someone is deep linking the image from a
> > page you don't own, if you don't own or control the page you can't
> > insert js.
> 
> He's definitely saying the attacker owns the page the "fake" image tag
> is on, loaded with JavaScript instead of an image file.

I see!  When someone tries to illegally deeplink a valid image from your site onto their own, you identify the behavior and then redirect to something that appears to be an image file but really isn't.  That foreign page (which you still do not control) winds up displaying / running the js contained in your phony img file in the spot where it expected the deeplinked img file to be.  Clever.
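
Just to make the "identify the behavior" part concrete for myself, here is a minimal sketch in Python of how I imagine the server side deciding which file to hand back.  It assumes the check is done on the HTTP Referer header, which nobody in this thread actually specified, and the host names and file paths are made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical host names, for illustration only.
OWN_HOSTS = {"example.com", "www.example.com"}


def is_hotlink(referer, own_hosts=OWN_HOSTS):
    """Return True when the Referer header names a host that is not ours."""
    if not referer:
        # No Referer at all (direct visit, privacy proxy): assume it is fine.
        return False
    host = urlparse(referer).hostname
    if host is None:
        # Referer was present but not a parseable URL: assume it is fine.
        return False
    return host.lower() not in own_hosts


def choose_file(referer, real_path="/img/photo.jpg", decoy_path="/img/decoy.jpg"):
    """Pick the file to serve: the real image, or a substitute for hotlinkers."""
    # Both paths are hypothetical placeholders.
    return decoy_path if is_hotlink(referer) else real_path
```

The Referer header can be missing or forged, so this only catches the casual case where an ordinary browser loads the image from someone else's page.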

However, would that actually work?  Maybe the browser doesn't give a crap and simply runs the js code in the spot where the img source used to be?  But even if it did that, wouldn't it try to decode the response as a jpg or gif mimetype and have trouble running it as js?  I know it's a theoretical question and I'm not sure you know all of the ins and outs.  I'm just wondering how the browser actually interprets this kind of switch from an image to a script so that the script is still usable.  It seems unlikely, but who knows.
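
One way to think about the mimetype half of that question, as a rough sketch and not a claim about how any particular browser is written: the common image formats each start with a fixed signature, so anything decoding the response as an image can tell right away that a blob of js text isn't one.  The function name below is my own invention.

```python
def sniff_image_type(data):
    """Guess an image format from the leading signature bytes, or return None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # Anything else, including plain js source text, is not a known image.
    return None
```

A js payload such as b"alert('hi');" comes back as None here, which is why I'd expect a broken-image icon and not a running script.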

> How hard is it to set up a web page on a server, put up something
> "interesting" enough to the general public to get a few thousand page
> views a day, and then embed evil things in it?  Not very.

Since this assumes you do have control of the pages you're embedding the img files in, this is a bit of a separate case from the original thread.  However, it does show how much trust you put in the pages that you visit every day.

> Now move that webserver off-shore where it's harder to get the attention
> of the authorities and/or the ISP... but keep your ".com" domain name on
> the foreign IP address...

Wouldn't a domain registrar shut down this domain if contacted by "certain" authorities?  I'm not sure who those authorities would be or if they exist, but that seems logical.  It would prevent the site from wreaking havoc regardless of the country where the ISP resides.

> You get the idea.  Evil incarnate.  And more common than people think,
> sadly.  Indiscriminate web browsing and bad browser behavior is right up
> there with some of the worst real "threats" to modern computing as it
> gets.

So how would you really prevent this?  Unless you disable js in the browser, I don't think you'd be that successful.  It's hard to tell if a site is rogue until you actually visit it.

All of the major sites are moving to js (most use some sort of web 2.0 wizardry) these days and they don't degrade gracefully when js is disabled.

I personally don't care for much of the js that's used out there.  Probably because I like to browse in tabs, and when you get too many js-heavy sites running in multiple tabs, they crash browsers (at least mine) or make them very unresponsive.  Not sure if that's bad coding or just the nature of the beast.

> Common techniques today are starting to become things like "contained"
> environments or "sandboxes" where the browser is only used/loaded inside
> a virtualized OS that can be wiped and reloaded, keeping (hopefully) the
> host OS safe from harm.

Maybe that's part of the answer to my earlier question.  It seems like a lot of work to go to simply for browsing.
