[lug] Web crawler advice

David L. Anselmi anselmi at anselmi.us
Mon May 5 23:11:08 MDT 2008


gordongoldin at aim.com wrote:
>  I'm doing a project to analyze text content on the web:
> 
> i need to:
> 
> start with a list of URLs
> for each URL in the URL list
[...]
>    extract all the links
>       add just the new links to the URL list (not those already in the list of URLs)

Doesn't that eventually crawl the whole web (as much as it's connected)?
Is that really what you want?
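
If not, one common way to keep the quoted loop finite is to cap the link
depth and the total number of pages. Below is a minimal sketch in Python
(standard library only), not a tested crawler. The names crawl, fetch,
extract_links, max_depth and max_pages are made up for illustration, and
it assumes you only care about <a href> links and http(s) URLs. It ignores
robots.txt and rate limiting, which a real crawler should respect.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs, fragments dropped, found in html."""
    parser = LinkParser()
    parser.feed(html)
    absolute = (urldefrag(urljoin(base_url, href)).url for href in parser.links)
    return [u for u in absolute if u.startswith(("http://", "https://"))]


def fetch(url):
    """Download a page as text (hypothetical helper, no retries)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seeds, fetch=fetch, max_depth=2, max_pages=100):
    """Breadth-first crawl: queue only unseen links, stop at the bounds."""
    seen = set()
    queue = deque()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))
    pages = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html  # the text analysis would happen here
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in extract_links(url, html):
            if link not in seen:  # "add just the new links"
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Calling crawl(["http://example.com/"], max_depth=1) would fetch the seed
page plus the pages it links to directly, and nothing further. The "seen"
set does the job of "not those already in the list of URLs"; the two
limits are what keep it from wandering over the whole web.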

Dave




