[lug] How tell last modified date of third party website pages?
karl horlen
horlenkarl at yahoo.com
Fri Dec 14 20:06:43 MST 2007
Caching really doesn't help me here.
I'm not looking to do this on a regular basis. I just
want to know whether it's possible to find out when the
content on a third-party site was last updated, so I
can tell how old it is.
A site that hasn't changed in 10 years could still
send always-expired cache headers and force a complete
download every time a page view is requested.
I think the answer to my question is "it depends, but
probably not." Most dynamically generated pages don't
add a Last-Modified line to their headers. And I think
it might actually be up to an Apache config param
whether this info is transmitted at all, even for
static HTML pages.
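
For a one-off check, something like the following rough Python sketch
might do. It sends a HEAD request (so the page body is not downloaded)
and looks for a Last-Modified header. The function names
parse_last_modified and fetch_last_modified are made up for
illustration, and a result of None only means the server sent no usable
header, not that the page is new.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def parse_last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified time from response headers.

    Returns None when the header is absent or cannot be parsed.
    """
    value = headers.get("Last-Modified")
    if not value:
        return None
    try:
        # HTTP dates use the same format as email dates.
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[datetime]:
    """Send a HEAD request (hypothetical helper) and read the header."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers)
```

Calling fetch_last_modified("http://lug.boulder.co.us") would return a
datetime if the server chooses to send the header. Even then the value
is whatever the server reports, so it is a hint rather than proof of
when the content last changed.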
> generated. Most applications give dynamic dates as
> well, but some may
> use the Expires header to give you an idea how long
> you can cache a page
> for. You could store that information.
>
> For most apps with dynamic pages that you might want
> to cache, it's probably
> just as easy to download all the pages. I'm pretty
> sure that's what google does
> when it scrapes our apps. It then keeps a local
> database of frequency of change
> so it can know when to scrape again.
>
> Rob
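
The download-and-compare approach described above could be sketched
roughly like this in Python. It keeps a hash of each page body and the
time that hash was first seen, so an unchanged hash on a later fetch
means the content is at least as old as the stored time. The name
record_fetch and the in-memory dict are illustrative assumptions, not
how any real crawler is known to work, and pages that embed timestamps
or ads will look "changed" on every fetch.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

# url -> (content hash, time that hash was first observed)
Snapshot = Tuple[str, datetime]


def record_fetch(
    store: Dict[str, Snapshot],
    url: str,
    body: bytes,
    now: Optional[datetime] = None,
) -> bool:
    """Record a fetched page body.

    Returns True if the body is new or differs from the last one seen,
    False if it is identical to the stored snapshot.
    """
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(body).hexdigest()
    previous = store.get(url)
    if previous is not None and previous[0] == digest:
        # Unchanged: keep the original first-seen time.
        return False
    store[url] = (digest, now)
    return True
```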
> _______________________________________________
> Web Page: http://lug.boulder.co.us
> Mailing List:
> http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667
> channel=#colug
>