[lug] Screen Scraping

Gordon Golding gordongoldin at aim.com
Fri Sep 21 11:28:31 MDT 2012


I'm doing a project involving screen scraping.

I want to go out to craft-brew-oriented sites and get information off them. I did two screen-scraping projects at a Wall Street firm that was scraping specific, very well-defined websites. I've also done work with lexical analysis, pulling an entire website and just filtering out the text. This is a much less defined problem: depending on what is needed at the moment, I will go out to a site and pull a list (beers, beer styles, beers offered, list of awards, etc.).
Previously I just wrote the code for the more defined problems in very specific Java.
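The "pull the whole page and filter out the text" approach can be sketched with nothing but Python's standard-library html.parser. This is a minimal illustration, not the code from either earlier project: the names TextOnly and page_text are made up for the example, and it assumes the page has already been downloaded into a string.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the visible text of a page, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def page_text(html: str) -> list[str]:
    """Return the non-empty text runs of an HTML document, in page order."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Everything structural is thrown away here, which is fine for lexical analysis but loses the list boundaries needed for "pull a list of beers".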

I don't expect to build a spider intelligent enough to just go out and find whatever I need. I'm looking for a "foundation" that I would tailor for each new situation: the better the foundation, the less tailoring ;-)
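One possible shape for such a foundation, sketched in stdlib Python under stated assumptions: the generic part (fetch, parse, collect the text of matching elements) is written once, and the per-site "tailoring" shrinks to a small spec naming the element and class that hold each item. SiteSpec, ListExtractor, extract and scrape are hypothetical names, and the assumption that a site marks its beers or awards with a consistent tag/class pair has to be checked site by site.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.request import urlopen


@dataclass
class SiteSpec:
    """Per-site tailoring (hypothetical): which element/class marks one list item."""
    tag: str        # e.g. "li" or "td"
    css_class: str  # a class the element must carry; "" means any element of that tag


class ListExtractor(HTMLParser):
    """Collects the text of every element matching a SiteSpec."""

    def __init__(self, spec: SiteSpec):
        super().__init__()
        self.spec = spec
        self.items: list[str] = []
        self._depth = 0  # > 0 while inside a matching element (tracks nested same tags)
        self._buf: list[str] = []

    def _matches(self, tag, attrs):
        if tag != self.spec.tag:
            return False
        if not self.spec.css_class:
            return True
        classes = (dict(attrs).get("class") or "").split()
        return self.spec.css_class in classes

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == self.spec.tag:
                self._depth += 1  # nested element of the same tag
        elif self._matches(tag, attrs):
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth and tag == self.spec.tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace so "<b>Oatmeal</b>\n Stout" -> "Oatmeal Stout"
                text = " ".join("".join(self._buf).split())
                if text:
                    self.items.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract(html: str, spec: SiteSpec) -> list[str]:
    """Parse an HTML string and return the text of each matching element."""
    parser = ListExtractor(spec)
    parser.feed(html)
    parser.close()
    return parser.items


def scrape(url: str, spec: SiteSpec, encoding: str = "utf-8") -> list[str]:
    """Fetch a page and extract the list; the encoding is assumed, not detected."""
    with urlopen(url) as resp:
        return extract(resp.read().decode(encoding, errors="replace"), spec)
```

Usage would look like `scrape("http://example.com/beers", SiteSpec("td", "beer-name"))`, where both the URL and the class name are placeholders. This sketch relies on well-formed closing tags, so it will miss items on pages that omit `</li>`; for real-world HTML a tolerant parser with CSS selectors, such as jsoup on the Java side, is probably the sturdier base.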

I'm looking for suggestions on specific environments, tools, and utilities. Got code snippets to pass on? I'd love to beg, borrow, or steal anything already out there. ;-)
Gordon
