[lug] Web crawler advice
David L. Anselmi
anselmi at anselmi.us
Mon May 5 23:11:08 MDT 2008
gordongoldin at aim.com wrote:
> I'm doing a project to analyze text content on the web:
>
> i need to:
>
> start with a list of URLs
> for each URL in the URL list
[...]
>     extract all the links
>         add just the new links to the URL list (not those already in the list of URLs)
Doesn't that eventually crawl the whole web (as much as it's connected)?
Is that really what you want?
Dave
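
One way to keep the quoted loop from wandering across the whole web is to cap it. Below is a minimal sketch, not anything from the original poster: the names crawl, fetch_links, max_pages and max_depth are all hypothetical, and fetch_links stands in for whatever code downloads a page and returns the URLs it links to. It follows the quoted steps (a URL list, extract links, add only the new ones) and stops at a page count or link depth.

```python
from collections import deque


def crawl(seed_urls, fetch_links, max_pages=100, max_depth=2):
    """Breadth-first crawl that stops at max_pages or max_depth.

    fetch_links is a hypothetical callable: url -> iterable of linked URLs.
    Returns the URLs that were visited, in visit order.
    """
    seen = set()        # every URL ever added to the list
    queue = deque()     # (url, depth) pairs still to visit
    for url in seed_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))

    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue    # deep enough: do not follow this page's links
        for link in fetch_links(url):
            if link not in seen:    # add just the new links
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Without the two limits the loop only ends when no new links turn up, which on the real web is effectively never. Restricting links to a set of allowed hosts would be another way to bound it.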