[lug] vi wildcards
Tkil
tkil at scrye.com
Fri May 25 10:23:21 MDT 2001
>>>>> "John" == John Starkey <jstarkey at advancecreations.com> writes:
John> I also worked out a way to chop the unwanteds outta all 72 files
John> based on the info in this thread. I've been slowly force-feeding
John> myself regexp and perl. You guys pushed that along quite a bit.
if you didn't already find it, take a look at the '-i' flag for perl
(man perlrun). e.g.:
perl -i.bak -lpwe 's:</?span[^>]*>::gi' *.html
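for the curious, here's roughly what that one-liner does when spelled out as a script. this is only a sketch, not what perl literally runs internally: strip_span and strip_span_in_place are names made up for illustration, and unlike -l it leaves line endings alone instead of chomping and re-adding them.

```perl
use strict;
use warnings;

# hypothetical helper: the same substitution as the one-liner,
# applied to a single line of text.
sub strip_span {
    my ($line) = @_;
    $line =~ s:</?span[^>]*>::gi;
    return $line;
}

# hypothetical helper: approximately what -i.bak plus -p do.
# setting $^I turns on in-place editing, so each file is first
# saved as "file.bak" and print then writes to the new copy.
sub strip_span_in_place {
    my (@files) = @_;
    return unless @files;    # with no files, <> would read STDIN
    local ($^I, @ARGV) = ('.bak', @files);
    while (my $line = <>) {
        print strip_span($line);
    }
    return;
}
```

calling strip_span_in_place(glob '*.html') should then behave much like the command line above, leaving the originals behind as *.html.bak.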
t.
p.s. in keeping with my title of "sick little monkey", here's one of
the regexes i wrote as a partial replacement for the HTML::Parser
module (when i was in a situation where i couldn't install any
CPAN modules...):
if ($t =~ m{\G
            (                          # (whole thing is in $1)
              <([^/]\w*)               # start of tag ($2)
              \s*
              ((?:                     # attribute list ($3)
                (?:[^>\s=]+            # the attribute itself
                  (?:\s*=\s*           # maybe followed by an equals sign and
                    (?:\"[^\"]*\"|     # double-quoted, or
                       \'[^\']*\'|     # single-quoted, or
                       [^>\s]+)        # plain value
                  )?                   # or maybe not.
                  \s*                  # and a bit of whitespace
                ))*)                   # we can have 0 or more attributes
              \s*>)}gcx)
which matches opening tags. after that i might have to parse the
attribute list, which ends up in $3:
my $attr_list = $3;
my %attrs;                        # declared here so the snippet stands alone
while ($attr_list =~
       m{([^>\s=]+)               # the attribute itself ($1)
         (\s*=\s*                 # maybe followed by ($2)
           (?:
             \"([^\"]*)\"|        # double-quoted ($3), or
             \'([^\']*)\'|        # single-quoted ($4), or
             ([^>\s]+)))?         # plain value ($5); or maybe no value at all
         \s*}gcx)
{
    # take the lower-case attribute name.
    my $key = lc($1);

    # assign a value, if we have one...
    my $val = $2 && (defined $3 ? $3 :
                     defined $4 ? $4 :
                     defined $5 ? $5 :
                                  undef);

    # and store it for later.
    $attrs{$key} = $val;
}
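
to show how the two pieces might fit together, here's a small self-contained sketch that walks a string with \G and /gc and collects every opening tag along with its attributes. the names parse_attrs and opening_tags, the returned data layout, and the way text and closing tags are skipped are all assumptions made for illustration; the capture groups are also renumbered slightly compared with the regexes above.

```perl
use strict;
use warnings;

# hypothetical helper: turn an attribute list into a hash ref.
# attributes without a value (like "checked") map to undef.
sub parse_attrs {
    my ($attr_list) = @_;
    my %attrs;
    while ($attr_list =~
           m{([^>\s=]+)            # attribute name (capture 1)
             (?:\s*=\s*            # optionally an equals sign and
               (?:
                 \"([^\"]*)\"|     # double-quoted (capture 2), or
                 \'([^\']*)\'|     # single-quoted (capture 3), or
                 ([^>\s]+)))?      # plain value (capture 4)
             \s*}gcx)
    {
        my $val = defined $2 ? $2 : defined $3 ? $3 : $4;
        $attrs{lc $1} = $val;
    }
    return \%attrs;
}

# hypothetical driver: scan $t from left to right. \G anchors each
# attempt at the current pos(), and /c keeps pos() where it was
# when an attempt fails, so the alternatives can be tried in turn.
sub opening_tags {
    my ($t) = @_;
    my @tags;
    pos($t) = 0;
    while (pos($t) < length $t) {
        if ($t =~ m{\G
                    <([^/]\w*)     # tag name (capture 1)
                    \s*
                    ((?:           # attribute list (capture 2)
                      (?:[^>\s=]+
                        (?:\s*=\s*
                          (?:\"[^\"]*\"|\'[^\']*\'|[^>\s]+)
                        )?
                        \s*
                      ))*)
                    \s*>}gcx)
        {
            my ($name, $attr_list) = ($1, $2);
            push @tags, { name => lc $name, attrs => parse_attrs($attr_list) };
        }
        elsif ($t =~ m{\G[^<]+}gc) {
            # plain text between tags: nothing to record
        }
        else {
            # a "<" that did not start an opening tag (a closing
            # tag, say): step over it and carry on
            pos($t) = pos($t) + 1;
        }
    }
    return @tags;
}
```

each element returned by opening_tags is a hash ref with a lower-cased name and an attrs hash ref, so something like $tags[0]{attrs}{href} gets at an individual value. this is nowhere near a replacement for HTML::Parser (comments, scripts and malformed markup are not handled), just enough to see the \G//gc idiom at work.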