[lug] Help on scripting interactions with a web site

Ken Kinder kkinder at tridog.com
Tue Sep 18 10:10:17 MDT 2001


Plenty of people have already posted about using Python or Perl, and
which libraries to use. I have one more suggestion. When debugging stuff
like this, it's sometimes hard to figure out EXACTLY what the web server
and the browser are sending back and forth. When I can't get my script to
emulate the actions of a browser and I'm not sure what the difference
is, I run tcpdump (see the man page) on port 80, once while using the
browser and once while running the script, and diff the two captures.

Works every time. Sometimes there are obscure differences.
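
Here is a minimal sketch of the diff step in Python, assuming you have
already saved the two requests as text (for example from the ASCII
output of something like "tcpdump -A -s 0 port 80"). The function name
diff_requests, the host, the CGI path, the cookie and the form fields
are all made up for illustration; they are not from the site in
question.

```python
import difflib


def diff_requests(browser_request, script_request):
    """Return unified-diff lines between two captured HTTP requests."""
    return list(difflib.unified_diff(
        browser_request.splitlines(),
        script_request.splitlines(),
        fromfile="browser",
        tofile="script",
        lineterm="",
    ))


# Hypothetical captures: what the browser sent vs. what the script sent.
browser = "\n".join([
    "POST /cgi-bin/data HTTP/1.0",
    "Host: example.org",
    "Cookie: session=abc123",
    "Content-Type: application/x-www-form-urlencoded",
    "",
    "year=1999&station=42",
])
script = "\n".join([
    "POST /cgi-bin/data HTTP/1.0",
    "Host: example.org",
    "Content-Type: application/x-www-form-urlencoded",
    "",
    "year=1999&station=42",
])

# Lines starting with "-" were sent only by the browser,
# lines starting with "+" only by the script.
for line in diff_requests(browser, script):
    print(line)
```

In this made-up case the diff would point at the Cookie header the
script forgot to send. Real captures also contain things that differ on
every run (dates, session ids), so expect some noise.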

Phil Rasch wrote:
> 
> I want to acquire some datasets from a web site.
> 
> Unfortunately, the web site is designed so that in order to acquire
> the approximately 500 text files containing the data one must interact
> with a server at that site repeatedly, and then finally cut and paste
> the displayed data from a browser window into a file. I suspect the
> web site is using CGI scripts in the procedure because the final
> dataset does not show an html address that changes. Things stay the
> same for the last 4 or so interactive choices.
> 
> I am frustrated by the whole thing. It is a waste of my (or a
> support person's) time to have to do this, and the opportunity for
> mistakes is very high.
> 
> I have contacted both the webmaster for the site, and the
> investigators, and my sense is that they don't want to make it easy to
> acquire the data. They are however contractually constrained to make
> the data publicly available. They just don't have to make it easy.
> 
> So I am looking for a way around my problems. I want to script the
> exchange, so that I just enter the relevant info in the script (e.g.
> the years, the stations, the destination, etc.) and the whole thing
> goes on automatically from my end. As far as the web server is
> concerned somebody is sitting at my end, but in reality a program is
> handling the transaction.
> 
> Can anybody make a suggestion on the right tool?
> 
> Thanks
> 
> Phil
> 
> --
> Phil Rasch, Climate Modeling Section, National Center for Atmospheric Research
> Mail     --> P.O. Box 3000, Boulder CO 80307
> Shipping --> 1850 Table Mesa Dr, Boulder, CO 80305
> email: pjr at ucar.edu, Web: http://www.cgd.ucar.edu/cms/pjr Phone: 303-497-1368, FAX: 303-497-1324
> 
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug

-- 
Ken G. Kinder - Engineer
Par Avance, Inc. / Tridog Interactive, Inc.


