[lug] Python HTMLparser

jafo at tummy.com
Tue May 8 05:29:52 MDT 2001


On Mon, May 07, 2001 at 02:49:18PM -0600, KELLEY SCOTT T wrote:
>Does anyone out there have some Python code using the HTMLparser from the
>htmllib? I've tried the examples in the library reference but can't get
>them to work. I know I'm missing something, but I just can't find a good
>example out there of someone using Python parsers.

Here's "showhrefs" from my Python samples.  You can see some of the other
ones at http://www.tummy.com/python/

Sean
===========================
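import sys
import urllib
import htmllib
import formatter

# Subclass HTMLParser and override the hooks it calls for links and images.
class HREFDisplay(htmllib.HTMLParser):
   # Called for every <a> start tag.
   def anchor_bgn(self, href, name, type):
      print "Anchor:\thref='%s' name='%s' type='%s'" % ( href, name, type )

   # Called for every <img> tag.
   def handle_image(self, source, alt, ismap, align, width, height):
      print "Image:\tsource='%s' alt='%s'" % ( source, alt )

# Expect exactly one argument: the URL to fetch.
if len(sys.argv) != 2:
   print 'usage: %s <url>' % sys.argv[0]
   sys.exit(1)

# HTMLParser needs a formatter; NullFormatter throws the rendered text away,
# so only the print statements above produce output.
parser = HREFDisplay(formatter.NullFormatter())
data = urllib.urlopen(sys.argv[1]).read()
parser.feed(data)
parser.close()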
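
If you're on Python 3, note that htmllib was removed there, so the script
above won't run as written. Here is a rough sketch of the same idea using
html.parser from the standard library. The names HREFCollector and
show_hrefs and the sample HTML string are made up for illustration and are
not part of the original showhrefs. The sketch parses a string rather than
fetching a URL.

```python
from html.parser import HTMLParser


class HREFCollector(HTMLParser):
    """Hypothetical Python 3 stand-in for HREFDisplay: records <a> and <img> tags."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (kind, attribute dict) tuples, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names
        # arrive lowercased.
        attrs = dict(attrs)
        if tag == "a":
            self.found.append(("Anchor", attrs))
        elif tag == "img":
            self.found.append(("Image", attrs))


def show_hrefs(html_text):
    """Return one line per anchor or image, in the same layout showhrefs prints."""
    parser = HREFCollector()
    parser.feed(html_text)
    parser.close()
    lines = []
    for kind, attrs in parser.found:
        if kind == "Anchor":
            lines.append("Anchor:\thref='%s' name='%s' type='%s'" % (
                attrs.get("href") or "",
                attrs.get("name") or "",
                attrs.get("type") or ""))
        else:
            lines.append("Image:\tsource='%s' alt='%s'" % (
                attrs.get("src") or "",
                attrs.get("alt") or ""))
    return lines


if __name__ == "__main__":
    # Made-up sample input; swap in the text of any page you have fetched.
    sample = '<a href="http://www.tummy.com/python/">samples</a><img src="logo.png" alt="logo">'
    for line in show_hrefs(sample):
        print(line)
```

Collecting into a list instead of printing inside the handler is just a
choice that makes the result easier to reuse; printing directly in
handle_starttag would be closer to the original.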
-- 
 Just because something doesn't do what you planned it to do doesn't mean
 it's useless.  -- T. Edison
Sean Reifschneider, Inimitably Superfluous <jafo at tummy.com>
tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python


