[lug] How to tell the last-modified date of third-party website pages?

karl horlen horlenkarl at yahoo.com
Fri Dec 14 20:06:43 MST 2007


Caching really doesn't help me here.  

I'm not looking to do this on a regular basis.  I just
want to know whether it's possible to determine when
a third-party site last updated its content, so I can
tell how old that content is.

A site that hasn't changed in 10 years could still
send headers marking every page as already expired,
forcing a complete download on every page view.
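To illustrate why expiry headers don't answer the question: they only say how long a copy may be reused, not when the content last changed. Here's a minimal Python sketch that computes that lifetime (the function names are hypothetical, and only max-age and Expires are considered, with max-age taking precedence as in HTTP/1.1). A page untouched for ten years can still come back as 0.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _parse_http_date(value):
    """Parse an HTTP date string into an aware UTC datetime, or None."""
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if dt.tzinfo is None:  # a "-0000" zone parses as naive; treat it as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def freshness_lifetime(headers):
    """Seconds a response may be cached, according to its caching headers.

    Cache-Control max-age wins over Expires, and Expires is measured
    against the Date header (or the current time if Date is missing).
    Returns None when neither header is present. The result describes
    the server's caching policy only, not how old the content is.
    """
    h = {k.lower(): v for k, v in headers.items()}

    for directive in h.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            try:
                return max(0, int(value.strip().strip('"')))
            except ValueError:
                return 0  # a malformed max-age is treated as already stale

    if "expires" in h:
        expires = _parse_http_date(h["expires"])
        if expires is None:
            return 0  # invalid values such as "0" or "-1" mean "already expired"
        date = _parse_http_date(h.get("date", "")) or datetime.now(timezone.utc)
        return max(0, int((expires - date).total_seconds()))

    return None
```

For example, `freshness_lifetime({"Cache-Control": "no-cache, max-age=0"})` is 0 no matter how stale the page really is.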

I think the answer to my question is that it depends,
but it's probably not very likely.  Most dynamically
generated pages don't include a Last-Modified header.
And for static HTML pages, I think it may actually be
up to an Apache configuration setting whether that
information is transmitted at all.
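For a page where the server does send Last-Modified, checking it from Python might look like the sketch below. It assumes the server answers HEAD requests (some don't, in which case a plain GET would be needed), and the function names are hypothetical. A date very close to the response's Date header usually just means the page was generated on the fly.

```python
import urllib.request
from datetime import timezone
from email.utils import parsedate_to_datetime


def last_modified_from_headers(headers):
    """Return the Last-Modified header as an aware UTC datetime, or None.

    None means the header was missing or unparseable, which is common
    for dynamically generated pages.
    """
    for name, value in headers.items():
        if name.lower() == "last-modified":
            try:
                dt = parsedate_to_datetime(value)
            except (TypeError, ValueError):
                return None
            if dt.tzinfo is None:  # a "-0000" zone parses as naive; treat it as UTC
                dt = dt.replace(tzinfo=timezone.utc)
            return dt
    return None


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request to url and return its Last-Modified date, if any."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return last_modified_from_headers(resp.headers)
```

Calling `fetch_last_modified("http://lug.boulder.co.us")` returns either a datetime or None, depending on what that server chooses to send.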

> generated.  Most applications give dynamic dates as well, but some
> may use the Expires header to give you an idea how long you can
> cache a page for.  You could store that information.
> 
> For most apps with dynamic pages that you might want to cache, it's
> probably just as easy to download all the pages.  I'm pretty sure
> that's what google does when it scrapes our apps.  It then keeps a
> local database of frequency of change so it can know when to scrape
> again.
> 
> Rob
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List:
> http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667
> channel=#colug
> 
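Rob's idea of downloading everything and keeping a record of how often each page changes could look roughly like this minimal sketch. It's an assumption, not how Google actually does it: hash each downloaded body, note when the hash changes, and use the average gap between changes to pick the next download time. `PageHistory` and its parameters are hypothetical. Hashing the raw body means timestamps, ads, or session tokens embedded in a page will register as changes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PageHistory:
    """Hypothetical per-URL record: last content hash and when it changed."""
    last_hash: str = ""
    last_fetched: float = 0.0
    change_times: list = field(default_factory=list)

    def observe(self, body: bytes, fetched_at: float) -> bool:
        """Record one download; return True if the content differs from last time."""
        digest = hashlib.sha256(body).hexdigest()
        self.last_fetched = fetched_at
        if digest == self.last_hash:
            return False
        # The very first download also lands here, since there is no prior hash.
        self.last_hash = digest
        self.change_times.append(fetched_at)
        return True

    def mean_change_interval(self):
        """Average seconds between recorded changes, or None if fewer than two."""
        t = self.change_times
        if len(t) < 2:
            return None
        return (t[-1] - t[0]) / (len(t) - 1)

    def next_fetch(self, default_interval=86400.0, min_interval=3600.0):
        """Suggest a time to download again, based on the observed change rate."""
        interval = self.mean_change_interval()
        if interval is None:
            interval = default_interval
        return self.last_fetched + max(interval, min_interval)
```

The `min_interval` floor keeps a page that changes constantly from being re-fetched in a tight loop.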





