[lug] How tell last modified date of third party website pages?

Rob Nagler nagler at bivio.biz
Fri Dec 14 16:33:13 MST 2007


On 12/14/07, karl horlen <horlenkarl at yahoo.com> wrote:
> Is this possible from a client browser or programs
> like wget or curl (possibly mirroring site locally)?
>
> I assume this is not possible for dynamically
> generated web pages, only static html pages?

It really depends on the application, and what you mean by dynamically
generated.  Most applications give dynamic dates as well, but some may
use the Expires header to give you an idea of how long you can cache a
page.  You could store that information.
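
As a rough sketch of reading those dates from a client, something like the
following Python could work. The helper names parse_cache_dates and
head_dates are made up for illustration, and it assumes the server answers
HEAD requests and actually sends Last-Modified and/or Expires, which many
dynamic apps do not (or they send "Expires: 0").

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_cache_dates(headers):
    """Return (last_modified, expires) as datetimes; None where absent or unparsable."""
    def parse(name):
        value = headers.get(name)
        if not value:
            return None
        try:
            return parsedate_to_datetime(value)
        except (TypeError, ValueError):
            # Values such as "0" or "-1" are common and are not valid HTTP dates.
            return None

    return parse("Last-Modified"), parse("Expires")


def head_dates(url, timeout=10):
    """Send a HEAD request to url and parse the date headers of the response."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        # response.headers looks names up case-insensitively; a plain dict does not.
        return parse_cache_dates(response.headers)
```

Calling head_dates() on a page's URL returns the two dates, which you could
then store alongside the URL.  From the shell, curl -I shows the same
headers.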

For most apps with dynamic pages that you might want to cache, it's probably
just as easy to download all the pages.  I'm pretty sure that's what Google does
when it scrapes our apps.  It then keeps a local database of how often each page
changes so it can know when to scrape again.
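
A minimal sketch of that kind of local change database, assuming you just
hash each downloaded page and compare it with the previous hash.  This is
only an illustration, not how Google actually does it; the table layout and
the function names (open_db, record_fetch, mean_change_interval) are
invented for the example.

```python
import hashlib
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the snapshot database; the schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "url TEXT, fetched_at REAL, digest TEXT)"
    )
    return db


def record_fetch(db, url, body, fetched_at):
    """Store a digest of body (bytes); return True if the page is new or changed."""
    digest = hashlib.sha256(body).hexdigest()
    row = db.execute(
        "SELECT digest FROM snapshots WHERE url = ? "
        "ORDER BY fetched_at DESC LIMIT 1",
        (url,),
    ).fetchone()
    changed = row is None or row[0] != digest
    db.execute("INSERT INTO snapshots VALUES (?, ?, ?)", (url, fetched_at, digest))
    db.commit()
    return changed


def mean_change_interval(db, url):
    """Average seconds between observed changes; None with fewer than two."""
    rows = db.execute(
        "SELECT fetched_at, digest FROM snapshots WHERE url = ? ORDER BY fetched_at",
        (url,),
    ).fetchall()
    change_times = []
    previous = None
    for fetched_at, digest in rows:
        if digest != previous:
            # The first snapshot counts as the first observed version.
            change_times.append(fetched_at)
            previous = digest
    if len(change_times) < 2:
        return None
    return (change_times[-1] - change_times[0]) / (len(change_times) - 1)
```

The interval gives a crude idea of when to scrape again.  Pages that embed
a timestamp or session token will look changed on every fetch, so you would
probably want to strip those before hashing.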

Rob


