[lug] Help on scripting interactions with a web site

Jonathan Briggs zlynx at acm.org
Tue Sep 18 09:25:36 MDT 2001


Yes, Perl is a good tool for this.  If you happen to prefer another 
language, such as Python, I'm sure it has similar tools as well, but 
I'm not familiar with them.

What you want from Perl are the LWP and HTML libraries.
In a testing tool I wrote, I use LWP::UserAgent and HTML::Form.  The 
general method is to fetch the page containing the form with 
LWP::UserAgent->request(), parse the response into a form object with 
HTML::Form->parse(), set the values I want on the form's inputs, then 
submit it with LWP::UserAgent->request(HTML::Form->click()).  The 
click() call returns the HTTP request that pressing the submit button 
would send.
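
For anyone who would rather not use Perl, the same fetch, parse, fill,
submit cycle can be sketched in Python with only the standard library.
This is a rough analogue of the workflow above, not the LWP or
HTML::Form API. The FormParser class, the build_submission() helper,
the sample page, the example.org URL, and the field names are all made
up for illustration, and the sketch only looks at <input> elements of
the first form (no <select> or <textarea> handling).

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormParser(HTMLParser):
    """Collect the action, method, and input defaults of the first form."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Keep the page's default value; the caller can override it.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_submission(page_url, html, values):
    """Return the urllib Request that submitting the filled-in form would send."""
    parser = FormParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields.update(values)
    target = urljoin(page_url, parser.action)
    data = urlencode(fields)
    if parser.method == "POST":
        return Request(target, data=data.encode("ascii"))
    separator = "&" if "?" in target else "?"
    return Request(target + separator + data)


# Hypothetical page; in real use this would be the body of the fetched page.
SAMPLE_PAGE = """
<form action="/cgi-bin/report" method="post">
  <input type="text" name="station" value="">
  <input type="text" name="year" value="2000">
  <input type="submit" name="go" value="Get data">
</form>
"""

request = build_submission(
    "http://data.example.org/query.html",
    SAMPLE_PAGE,
    {"station": "BOU", "year": "1999"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Nothing here touches the network.  To actually send the request you
would pass it to urllib.request.urlopen() and read the response, then
repeat the cycle for each page of the interaction.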

To get your data out of the result, you can use pattern matching, or get 
fancy and use HTML::Parser, which is event-driven and works a lot like a 
SAX-style XML parser.  If the document happens to be well-formed XHTML, 
you could use an actual XML parser instead.
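
To make the two extraction styles concrete, here is a small sketch, 
again in standard-library Python rather than Perl.  It assumes the data 
comes back inside a <pre> block, which is only a guess about how such a 
site might present it.  The sample page and the PreText class are 
invented for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical result page; the real layout is unknown.
RESULT_PAGE = """<html><body>
<h1>Station BOU, 1999</h1>
<pre>
01 12.5
02 13.1
</pre>
</body></html>"""

# Option 1: pattern matching. Quick, but brittle if the markup changes.
match = re.search(r"<pre>(.*?)</pre>", RESULT_PAGE, re.DOTALL | re.IGNORECASE)
by_regex = match.group(1).strip() if match else ""


# Option 2: an event-driven parser that collects text seen inside <pre>.
class PreText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


parser = PreText()
parser.feed(RESULT_PAGE)
parser.close()
by_parser = "".join(parser.chunks).strip()

print(by_regex == by_parser)
print(by_parser)
```

The regular expression is less code, while the parser version copes 
better with attributes on the tag or entities in the text.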

Chip Atkinson wrote:

> I believe that Perl has some web client/browser tools/modules 
> available.  I haven't used them but remember seeing them as I was 
> going through the list of available modules.
> Try looking at the web page to find the names of the variables and see 
> if you can put together a URL using GET.  You might be able to make it 
> something simple and regular if the names for the reports are regular.
>
> Hope that helps a tiny bit.
>
> Chip
>
> Phil Rasch wrote:
>
>> I want to acquire some datasets from a web site.
>> Unfortunately, the web site is designed so that in order to acquire
>> the approximately 500 text files containing the data one must interact
>> with a server at that site repeatedly, and then finally cut and paste
>> the displayed data from a browser window into a file. I suspect the
>> web site is using CGI scripts in the procedure because the final
>> dataset does not show an html address that changes. Things stay the
>> same for the last 4 or so interactive choices.
>>
>> I am frustrated by the whole thing. It is a waste of my (or a
>> support person's) time to have to do this, and the opportunity for
>> mistakes is very high.
>>
>> I have contacted both the webmaster for the site, and the
>> investigators, and my sense is that they don't want to make it easy to
>> acquire the data. They are however contractually constrained to make
>> the data publicly available. They just don't have to make it easy.
>>
>> So I am looking for a way around my problems. I want to script the
>> exchange, so that I just enter the relevant info in the script (e.g.
>> the years, the stations, the destination, etc.) and the whole thing
>> goes on automatically from my end. As far as the web server is
>> concerned somebody is sitting at my end, but in reality a program is
>> handling the transaction.
>>
>> Can anybody make a suggestion on the right tool?
>>
>> Thanks
>>
>> Phil
>
>
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug






