[lug] Help on scripting interactions with a web site

Evelyn Mitchell efm at tummy.com
Tue Sep 18 09:00:20 MDT 2001


On Mon, Sep 17, 2001 at 02:16:44PM -0600, Phil Rasch wrote:
> I want to acquire some datasets from a web site. 
> 
> Unfortunately, the web site is designed so that in order to acquire
> the approximately 500 text files containing the data one must interact
> with a server at that site repeatedly, and then finally cut and paste
> the displayed data from a browser window into a file. I suspect the
> web site is using CGI scripts in the procedure because the final
> dataset does not show an html address that changes. Things stay the
> same for the last 4 or so interactive choices.
> 

Everything that goes from your browser to the server and back
travels over HTTP. So, in theory, you could construct the right
request to send to the server and grab the results automatically.
It gets rather more complicated if the pages the server sends
contain embedded scripts (JavaScript) or Java applets, since those
run in the browser and a simple fetch tool will not execute them.
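
To find out what the browser actually sends, it helps to look at
the <form> tags in the page source. The following is only a rough
Python sketch of that idea, using the standard library's HTML
parser. The names FormLister and list_forms are made up for this
illustration, and the sample form in the comment is hypothetical,
not taken from the site in question.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect the action, method and field names of each form in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method is given.
            self.forms.append({
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            })
        elif tag in ("input", "select", "textarea"):
            # Only named fields inside a form are sent to the server.
            if self.forms and attrs.get("name"):
                self.forms[-1]["fields"].append(attrs["name"])


def list_forms(html_text):
    """Return one dict per <form> found in html_text."""
    parser = FormLister()
    parser.feed(html_text)
    parser.close()
    return parser.forms


# Hypothetical usage, with page source saved from the browser:
#   with open("page.html") as f:
#       print(list_forms(f.read()))
```

The "method" entry shows whether the form is submitted with GET or
POST, and "fields" lists the names the server expects. This does
not handle forms that are built or changed by scripts in the page.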

So, without more information, it is difficult to say how easy
this would be to script. If it is a plain HTTP GET or POST
(the two methods HTML forms use), then you will be able to create
the query programmatically and use something like wget to retrieve
the pages.
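
As a rough sketch of what creating the query programmatically
could look like for a GET or POST form submission, here is a
small Python example using only the standard library. The URL
and the field names ("station", "format") are placeholders I made
up; the real ones would have to come from the site's own forms.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint; substitute the form's real action URL.
BASE_URL = "http://example.com/cgi-bin/getdata"


def build_get_url(base_url, fields):
    """Return a GET URL with the form fields in the query string."""
    return base_url + "?" + urlencode(fields)


def build_post_request(base_url, fields):
    """Return a POST request with the form fields in the body."""
    body = urlencode(fields).encode("ascii")
    return Request(base_url, data=body, method="POST")


def fetch_to_file(request_or_url, path):
    """Send the request and write the response body to path."""
    with urlopen(request_or_url) as response, open(path, "wb") as out:
        out.write(response.read())


def download_all(station_ids):
    """Fetch one file per station id (field names are assumptions)."""
    for station in station_ids:
        fields = {"station": station, "format": "text"}
        url = build_get_url(BASE_URL, fields)
        fetch_to_file(url, "station_%s.txt" % station)
```

Calling download_all(range(1, 501)) would then loop over all 500
or so datasets, assuming they can be selected by a single numbered
field. If the form uses POST, pass build_post_request(...) to
fetch_to_file instead of the URL.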

As for the cut-and-paste operation, you may be able to save
the contents of the window or frame to a file using 'Save As' or
'Save Frame As'. That is still a manual process, but it avoids
cut and paste.

Evelyn Mitchell
efm at tummy.com


