[lug] Scripting help, lynx

Chip Atkinson chip at pupman.com
Tue May 3 07:20:46 MDT 2011


To do this recursively you can use the find command:

# Note: $(find ...) is split on whitespace, so this assumes names without spaces.
for file in $(find . -name "*.html"); do
  echo "lynx -nolist -dump $file > $file.txt"
done

I like to put an echo in front of a command until I have it working the
way I want.  This is especially handy if your command does something
potentially destructive, such as deleting files or filling up your disk.
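
The echo trick can also be wrapped in a small function so that switching
from "print" to "run" is a one-variable change.  This is only a sketch:
the names run and DRY_RUN are made up for illustration, and it only suits
simple commands, since a redirect (>) written after the call is handled by
the shell and not passed to the function.

```shell
#!/bin/sh
# run: hypothetical helper. Prints the command while DRY_RUN is 1
# (the default here), and executes it once DRY_RUN is set to 0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}
```

For example, "run rm -- old.txt" only prints the rm command until the
script is started with DRY_RUN=0.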

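Once the output looks correct, remove the echo and the quotes.  The quotes
are needed in this case because of the output redirect (>): without them
the shell would send echo's output into the file instead of printing the
command.  You could also escape the redirect:
  echo lynx -nolist -dump $file \> $file.txt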
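
For the finished version, here is one possible sketch that also keeps the
base name (file1.html becomes file1.txt) and copes with spaces in file and
directory names.  The function names html_to_txt_name and convert_tree are
made up for illustration, and it assumes no file name contains a newline.

```shell
#!/bin/sh
# html_to_txt_name: hypothetical helper. Strips a trailing .html and
# appends .txt, so ./docs/file1.html becomes ./docs/file1.txt.
html_to_txt_name() {
  printf '%s\n' "${1%.html}.txt"
}

# convert_tree: hypothetical helper. Converts every *.html file found
# under the directory given as $1 (default: the current directory).
convert_tree() {
  find "${1:-.}" -type f -name '*.html' | while IFS= read -r file; do
    # </dev/null keeps lynx from consuming the list of names on stdin.
    lynx -nolist -dump "$file" < /dev/null > "$(html_to_txt_name "$file")"
  done
}
```

Calling "convert_tree ." from the top directory writes each .txt file next
to its .html source.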

There is also a utility, html2text, that I've used and that works well if
lynx doesn't fit the bill.

Chip

On Tue, 3 May 2011, Dan Ferris wrote:

> for file in *.html
> do
>      lynx -nolist -dump "$file" > "$file.txt"
> done
> 
> That will write the output for file.html to file.html.txt. I'll leave it
> as an exercise for you to figure out how to change it to file.txt.
> 
> Dan
> 
>   On 5/3/2011 6:58 AM, Paul Nowosielski wrote:
> > Dear All,
> >
> > I'm trying to convert all the html files
> > into text using lynx. The files are in many directories
> > with meaningful names.
> >
> > Can anyone assist me in creating a script
> > that will go through each directory recursively,
> > convert the files to text, and preserve the base name?
> >
> > ex: file1.html file1.txt file2.html file2.txt (or something close to this)
> >
> > I have this so far, which correctly traverses the directories
> > and spits out the text. But I don't understand how
> > to direct the output to a txt file with the same name as the html file.
> >
> > find ./ -name '*.html' | xargs -I '{}' lynx -nolist -dump '{}'
> >
> > Any thoughts?
> >
> > Thank you,
> >
> > Paul
> > _______________________________________________
> > Web Page:  http://lug.boulder.co.us
> > Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> > Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety