[lug] Python HTMLparser
jafo at tummy.com
Tue May 8 05:29:52 MDT 2001
On Mon, May 07, 2001 at 02:49:18PM -0600, KELLEY SCOTT T wrote:
>Does anyone out there have some Python code using the HTMLparser from the
>htmllib? I've tried the examples in the library reference but can't get
>them to work. I know I'm missing something, but I just can't find a good
>example out there of someone using Python parsers.
Here's "showhrefs" from my Python samples. You can see some of the other
ones at http://www.tummy.com/python/
Sean
===========================
# Python 2 only: htmllib was removed in Python 3.
import sys
import urllib
import htmllib
import formatter

class HREFDisplay(htmllib.HTMLParser):
    # Called by the parser for every <a> start tag.
    def anchor_bgn(self, href, name, type):
        print "Anchor:\thref='%s' name='%s' type='%s'" % ( href, name, type )

    # Called by the parser for every <img> tag.
    def handle_image(self, source, alt, ismap, align, width, height):
        print "Image:\tsource='%s' alt='%s'" % ( source, alt )

if len(sys.argv) != 2:
    print 'usage: %s <url>' % sys.argv[0]
    sys.exit(1)

# NullFormatter discards the rendered text; only the callbacks above matter.
parser = HREFDisplay(formatter.NullFormatter())
data = urllib.urlopen(sys.argv[1]).read()
parser.feed(data)
parser.close()
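
For anyone reading this on Python 3, where htmllib no longer exists, here is a rough sketch of the same idea using the standard html.parser module. It is not part of the original samples: the names HREFCollector and collect_hrefs are made up for illustration, and it collects results in lists instead of printing them so it is easier to reuse.

```python
from html.parser import HTMLParser


class HREFCollector(HTMLParser):
    """Hypothetical Python 3 counterpart of HREFDisplay (not from the original post)."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # (href, name) tuples, one per <a> tag
        self.images = []   # (src, alt) tuples, one per <img> tag

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag names are lowercased.
        attrs = dict(attrs)
        if tag == "a":
            self.anchors.append((attrs.get("href"), attrs.get("name")))
        elif tag == "img":
            self.images.append((attrs.get("src"), attrs.get("alt")))


def collect_hrefs(html_text):
    """Parse a string of HTML and return (anchors, images)."""
    parser = HREFCollector()
    parser.feed(html_text)
    parser.close()
    return parser.anchors, parser.images
```

To run it against a live page you would fetch the HTML yourself (for example with urllib.request.urlopen), decode the bytes to a string, and pass that string to collect_hrefs.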
--
Just because something doesn't do what you planned it to do doesn't mean
it's useless. -- T. Edison
Sean Reifschneider, Inimitably Superfluous <jafo at tummy.com>
tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python