[lug] Scripting help, lynx

Jeffrey S. Haemer jeffrey.haemer at gmail.com
Tue May 3 08:06:25 MDT 2011


Paul,

Another way to skin this cat:

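for f in **/*.html; do
  # Strip only the trailing .html; ${f/html/txt} would rewrite the first
  # "html" anywhere in the path, e.g. in a directory name.
  lynx -nolist -dump "$f" > "${f%.html}.txt"
done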

You have to have globstar set (it needs bash 4.0 or later), like this:
shopt -s globstar
Me, I have it in my .bashrc.
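
If globstar isn't available, roughly the same job can be done with find. What follows is only a sketch, assuming lynx is on your PATH; the function name html2txt_tree is made up for illustration and is not something from this thread. It writes each .txt next to its .html, and "${f%.html}.txt" strips only the trailing .html, so directory names containing "html" are left alone.

```shell
# Hypothetical helper (the name is illustrative only).
# Usage: html2txt_tree [ROOT]    ROOT defaults to the current directory.
html2txt_tree() {
  # find hands batches of matching files to a small inline sh script;
  # each file is dumped by lynx into a sibling file ending in .txt.
  find "${1:-.}" -type f -name '*.html' -exec sh -c '
    for f in "$@"; do
      lynx -nolist -dump "$f" > "${f%.html}.txt"
    done
  ' sh {} +
}
```

Running html2txt_tree . from the top of the tree should leave file1.txt beside file1.html, and so on down through the subdirectories. Quoting "$f" keeps file names with spaces intact.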

On Tue, May 3, 2011 at 6:58 AM, Paul Nowosielski
<paulnowosielski at yahoo.com> wrote:

> Dear All,
>
> I'm trying to convert all the html files
> into text using lynx. The files are in many directories
> with meaningful names.
>
> Can anyone assist me in creating a script
> that will go through each directory recursively,
> convert the files to text, and preserve the base name?
>
> ex: file1.html -> file1.txt, file2.html -> file2.txt (or something close to this)
>
> I have this so far, which correctly traverses the directories
> and spits out the text. But I am not understanding how
> to direct the output to a txt file with the same name as the html file.
>
> find ./ -name '*.html' | xargs -I '{}' lynx -nolist -dump '{}'
>
> Any thoughts?
>
> Thank you,
>
> Paul
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
>



-- 
Jeffrey Haemer <jeffrey.haemer at gmail.com>
720-837-8908 [cell], http://seejeffrun.blogspot.com [blog],
http://www.youtube.com/user/goyishekop [vlog]
*και οστις σε αγγαρευσει μιλιον εν υπαγε μετ αυτου δυο*