[lug] Html to plain text

Walter Pienciak walter at frii.com
Wed Jan 17 12:48:23 MST 2001


I'm guessing you want not just the raw text, but rather formatted
text?  I use 'lynx -dump -width=72 $URL' and it does a decent enough
job.  I use this programmatically, and I can massage what I get.
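Where lynx isn't installed, a rough stand-in can be sketched with Python's standard-library `html.parser`. This is a minimal sketch, not what lynx or Netscape actually do. The set of tags that force line breaks is an assumption. Tables, links and `<pre>` get no special treatment. The 72-column wrap only mirrors the `-width=72` flag above. Because the parser reads whole tags, a tag split across source lines is still removed.

```python
import re
import textwrap
from html.parser import HTMLParser

# Assumed (incomplete) set of tags that should start/end a line of text.
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents are never shown as text.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects text content, turning block-level tags into line breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; etc. for us
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Newlines in HTML source are ordinary whitespace, so collapse them.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html, width=72):
    """Strip tags from `html` and wrap each resulting line to `width` columns."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    # Drop blank lines and re-wrap each remaining line.
    return "\n".join(textwrap.fill(line.strip(), width=width)
                     for line in lines if line.strip())
```

For example, `html_to_text(open("page.html").read())` would print the text of a local file; `page.html` is a hypothetical name. Running lynx remains the better choice when it is available, since it also lays out tables and numbers links.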

Walter

On Wed, 17 Jan 2001, Atkinson, Chip wrote:

> Look in the Perl Cookbook. I know there is an example there because I
> saw it just yesterday, though I don't have the book with me right now.
> That recipe strips out tags even when they span multiple lines.
>
> Chip
>
> > -----Original Message-----
> > From: Ken Weinert [mailto:kenw at ihs.com]
> > Sent: Wednesday, January 17, 2001 12:22 PM
> > To: lug at lug.boulder.co.us
> > Subject: Re: [lug] Html to plain text
> >
> >
> > Take a look at this page; I think it will give you exactly what you
> > want: http://home.netscape.com/newsref/std/x-remote.html
> >
> >
> > * Carlos Hernández López (chernanl at banxico.org.mx) [010117 19:04]:
> > > Yes, technically, html files ARE plain text. But what I want to do is
> > > remove all the html tags and get a human-readable plain text file. I
> > > need exactly what Netscape does with the sequence that Wayde has
> > > described.
> > >
> > > The thing is that I need to do it automatically, not by hand.
> > >
> > > With Lynx I can get a plain text file but it is not so easy to read.
> > >
> > > Any ideas?
> > >
> > > "J. Wayde Allen" wrote:
> > >
> > > > On Wed, 17 Jan 2001, Carlos Hernández López wrote:
> > > >
> > > > > Does anybody know an easy way to convert html files to plain
> > > > > text files?
> > > >
> > > > Well ... one way using netscape is to use the click sequence:
> > > >
> > > >    file -> save as -> Format For Saved Document: Text
> > > >
> > > > - Wayde
> > > >   (wallen at lug.boulder.co.us)
> > > >
> > > > _______________________________________________
> > > > Web Page:  http://lug.boulder.co.us
> > > > Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> > >
> > >
> >
> > --
> > Ken Weinert   kenw at ihs.com 303-858-6956 (V) 303-705-4258 (F)
> > GnuPG KeyID: 9274F1CE           GnuPG available at http://www.gnupg.org/
> > GnuPG Key Fingerprint: 1D87 3720 BB77 4489 A928  79D6 F8EC DD76 9274 F1CE
> > Black holes are God's physical manifestation of a floating point exception.
>
>
>