[lug] Screen Scraping
Gordon Golding
gordongoldin at aim.com
Fri Sep 21 11:28:31 MDT 2012
I'm doing a project involving screen scraping.
I want to go out to craft-brew-oriented sites and get information off them. I've done two screen-scraping projects before, for a Wall Street firm that was scraping specific, very well-defined websites. I've also done lexical analysis work, pulling an entire website and just filtering out the text. This is a much less defined problem: depending on what is needed at the moment, I will go out to a site and pull a list (beers, beer styles, beers offered, a list of awards, etc.).
Previously I just wrote very specific Java code for the more defined problems.
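A rough Java sketch of the "pull the whole page and filter out the text" approach, using only the standard library. It assumes regular-expression tag stripping is good enough, which it often isn't for messy real-world HTML; `TextFilter` and `toPlainText` are illustrative names, not an existing API.

```java
import java.util.regex.Pattern;

// Illustrative sketch: regex-based tag stripping, NOT a real HTML parser.
// Malformed or unusual markup can defeat it.
final class TextFilter {
    // Drop <script> and <style> blocks entirely, including their contents.
    private static final Pattern SCRIPT_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // HTML comments and any remaining tags are replaced with a space.
    private static final Pattern TAG = Pattern.compile("(?s)<!--.*?-->|<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toPlainText(String html) {
        String text = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode only a handful of common entities; &amp; goes last so that
        // "&amp;lt;" ends up as "&lt;" rather than "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by removed tags.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

For example, `TextFilter.toPlainText("<p>Pale&nbsp;Ale &amp; Stout</p>")` should give `Pale Ale & Stout`.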
I don't expect to build a spider intelligent enough to just go out and find whatever. I'm looking for a "foundation" that I would tailor for each new situation: the better the foundation, the less tailoring ;-)
I'm looking for suggestions on specific environments, tools, and utilities. Got code snippets to pass on? I'd love to beg, borrow, or steal anything already out there. ;-)
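For illustration, a minimal Java sketch (Java 11+ for `java.net.http`) of the kind of foundation described above: fetching and cleanup are shared, and each new site gets only a small `SiteRule`, a URL plus a regular expression whose first group captures one list item. `SiteRule`, `ListScraper`, the example URL, and the one-pattern-per-site design are assumptions made for this sketch; regexes are brittle against real HTML, so a proper HTML parser would make a sturdier base.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical per-site "tailoring": where to fetch and how to find list items.
final class SiteRule {
    final String name;
    final URI url;
    final Pattern itemPattern; // group 1 must capture one item's inner HTML

    SiteRule(String name, String url, String itemRegex) {
        this.name = name;
        this.url = URI.create(url);
        this.itemPattern = Pattern.compile(itemRegex,
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }
}

// Hypothetical reusable "foundation": only the SiteRule changes per site.
final class ListScraper {
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private final HttpClient client = HttpClient.newHttpClient();

    // Download the page body as a string.
    String fetch(SiteRule rule) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(rule.url)
                .header("User-Agent", "craft-brew-scraper-sketch")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Apply the site's pattern; return each match's text with tags stripped.
    static List<String> extract(String html, SiteRule rule) {
        List<String> items = new ArrayList<>();
        Matcher m = rule.itemPattern.matcher(html);
        while (m.find()) {
            String text = TAG.matcher(m.group(1)).replaceAll(" ")
                    .replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                items.add(text);
            }
        }
        return items;
    }

    // Fetch plus extract: the whole per-site job once a rule exists.
    List<String> scrape(SiteRule rule) throws IOException, InterruptedException {
        return extract(fetch(rule), rule);
    }
}
```

A hypothetical rule for a brewery's beer list might be `new SiteRule("example-brewery", "https://brewery.example/beers", "<li class=\"beer\">(.*?)</li>")`, passed to `new ListScraper().scrape(rule)`. Keeping the per-site part down to data like this is what keeps the tailoring small; `extract` can also be tried offline on saved HTML before pointing the scraper at a live site.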
Gordon
More information about the LUG mailing list