[lug] vi wildcards

Tkil tkil at scrye.com
Thu May 24 12:14:43 MDT 2001


>>>>> "John" == John Starkey <jstarkey at advancecreations.com> writes:

John> I'm trying to delete about 500 SPAN tags in a document I have to
John> code using vi. All of them include a css class. Can someone tell
John> me how to use wildcards in a :s statement?

vi uses regular expressions in s///, not wildcards.  they're related,
but they're not the same.  wildcards are also sometimes called
"globs", after the operation they're most often used for on command
lines...  (also, glob wildcards (* and ?) are similar to sql wildcards
(% and _).)

let me see if i can find a post i wrote about this a while back...

| From nobody Fri Jun 19 12:13:15 1998
| Cc: Laura Morgan <laura.morgan at itron.com>
| Newsgroups: comp.lang.perl.misc
| Subject: Re: Regular Expression Question
| References: <358A9B2D.7B5B at itron.com>
| From: Tkil <tkil at scrye.com>
| Reply-To: Tkil <tkil at scrye.com>
| X-Attribution: Tkil
| Date: 19 Jun 1998 12:13:15 -0600
| Message-ID: <glnqtgszo.fsf at scrye.com>
| Organization: Scrye.com
| Lines: 57
| X-Newsreader: Gnus v5.5/XEmacs 20.4 - "Emerald"
| In-Reply-To: Laura Morgan's message of "Fri, 19 Jun 1998 10:09:01 -0700"
| 
| [posted and cc'd]
| 
| >>>>> "Laura" == Laura Morgan <laura.morgan at itron.com> writes:
| 
| Laura> I have input parameters that can be file.name, *.ext, file.* or
| Laura> *.* I'm parsing through a file trying to match on this input
| Laura> and perform a function (i.e. if the user types in *.c, I want
| Laura> to match all files with a .c extension, etc).
| 
| i'm not sure i understand your structure, but this might help.  shells 
| use wildcards to do "globbing".  while some of the globbing characters 
| come from regular expressions, they usually have different meanings.
| 
| a short translation table:
| 
| 	GLOB		Perl RE
| 	----		-------
| 	*		.*
| 	?		.
| 	.		\.
| 	[a-z]		[a-z]
| 
| thus, a glob pattern of "*.c" would turn into the perl regexp of
| m/.*\.c/, possibly anchored with ^ and $  (e.g. m/^.*\.c$/).  watch
| out for case, if you're hitting windows filesystems.  some more
| translations:
| 
| 	GLOB		Perl RE
| 	----		-------
| 	file.name	^file\.name$
| 	*.ext		\.ext$
| 	file.*		^file\.
| 	*.*		\.
| 
| [notice various shortcuts for leading/trailing "*" in the glob
| pattern.]
| 
| take a look at the perlre man page, the 'glob' function in perlfunc,
| and play around with the shell (to get a better feel for globbing, if
| you need it) and a little script like this to get a feel for the
| equivalences:
| 
|    opendir DIR, "." or die "couldn't open dir: $!";
|    my @matches = grep { m/^.*\.c$/ } readdir DIR;
|    closedir DIR or die "couldn't close dir: $!";
|    print "matches: @matches\n";
| 
| or something like that.
| 
| t.
| 
| p.s. fish:  http://www.scrye.com/~tkil/glob_to_re.pl

that url is no longer functional; i think i moved it to

   http://slinky.scrye.com/~tkil/perl/glob-to-re
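
in case that one has moved too, here's a rough sketch of the translation
table from the old post, written in python rather than perl.  the helper
name glob_to_re and the sample file names are made up for illustration,
and it only handles *, ?, and simple [...] classes (no [!a-z] negation,
no escapes).  python's own fnmatch.translate does a more complete job.

```python
import re

def glob_to_re(glob: str) -> str:
    """translate a simple shell glob into an anchored regular expression.

    hypothetical helper: covers only *, ?, and [...] from the table above.
    """
    out = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "*":
            out.append(".*")          # glob * -> any run of characters
        elif ch == "?":
            out.append(".")           # glob ? -> exactly one character
        elif ch == "[":
            end = glob.find("]", i + 1)
            if end == -1:
                out.append(r"\[")     # unclosed bracket: treat as a literal
            else:
                out.append(glob[i:end + 1])  # copy the class through as-is
                i = end
        else:
            out.append(re.escape(ch)) # everything else is literal (. -> \.)
        i += 1
    return "^" + "".join(out) + "$"

# made-up file names, just to exercise the pattern
names = ["main.c", "main.h", "util.c", "notes.txt", "c"]
matches = [n for n in names if re.search(glob_to_re("*.c"), n)]
print(glob_to_re("*.c"), matches)
```

note that this always anchors with ^ and $ instead of using the
leading/trailing "*" shortcuts from the second table.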

John> :1,$s/<SPAN CLASS=\"*\">//g
John> :1,$a/<SPAN CLASS=\"\*\">//g

even if you replaced the * with .* to get

   s/<SPAN CLASS=".*">//g

you'll still have problems with greedy matching (as others have
already pointed out).  on a line that looks like

   <SPAN CLASS="foo">this <A HREF="blah.html">works</A></SPAN>

your expression would turn it into

   works</A></SPAN>
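
if you want to watch the greedy match happen outside of vi, here's a
small sketch using python's re module (an assumption on my part; any
engine with a greedy ".*" should behave the same way on this line):

```python
import re

line = '<SPAN CLASS="foo">this <A HREF="blah.html">works</A></SPAN>'

# ".*" grabs as much as it can, so the match runs from the opening
# <SPAN CLASS=" all the way to the LAST "> on the line
greedy = re.sub(r'<SPAN CLASS=".*">', "", line)
print(greedy)
```

the last "> on that line belongs to the A tag, which is why the A tag
gets eaten along with the SPAN.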

one solution is to match only non-right-angle-bracket things:

   s/<SPAN CLASS="[^>]*">//g

note that this lets you ignore the form of the attributes altogether,
really:

   s/<SPAN[^>]*>//g
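
a quick sketch of the difference between those two, again in python
purely for illustration (the second sample line is made up):

```python
import re

line1 = '<SPAN CLASS="foo">this <A HREF="blah.html">works</A></SPAN>'
line2 = '<SPAN ID="x" CLASS="foo">hi</SPAN>'

narrow = r'<SPAN CLASS="[^>]*">'   # insists on CLASS being the first attribute
broad = r'<SPAN[^>]*>'             # any attributes at all, up to the first ">"

# both patterns remove only the opening SPAN tag from line1
narrow1 = re.sub(narrow, "", line1)
broad1 = re.sub(broad, "", line1)

# only the broad pattern copes with an extra attribute in front of CLASS
narrow2 = re.sub(narrow, "", line2)
broad2 = re.sub(broad, "", line2)

print(narrow1, broad1, narrow2, broad2, sep="\n")
```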

although this doesn't work if you have an attribute with an angle
bracket in it:

   <span class="<whee>">

which may or may not be valid HTML...  traditional regular expressions
are technically incapable of handling true nesting (finite state
machines vs. pushdown automata, for the geeks in the crowd) but you
can handle most sane quoting with enough pain.  please refer to the
o'reilly book _mastering regular expressions_ for more info on that.
(in fact, it's an excellent book all around, and no unixphile should
be without it!)
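
to give a taste of the "enough pain" part, here's a minimal sketch of a
quote-aware pattern, in python for illustration.  it assumes attribute
values are either unquoted or wrapped in matching single or double
quotes, and it's not taken from the book:

```python
import re

line = '<span class="<whee>">text</span>'

# the simple pattern stops at the first ">", which here sits inside the quotes
simple = re.sub(r'<span[^>]*>', '', line)

# quote-aware: between "<span" and ">", allow either a plain character,
# a whole double-quoted string, or a whole single-quoted string
quote_aware = re.compile(r'''<span(?:[^>"']|"[^"]*"|'[^']*')*>''')
careful = quote_aware.sub('', line)

print(simple)
print(careful)
```

the simple version leaves a stray "> behind; the quote-aware one removes
the whole tag.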

assuming that we don't have to worry about that, you can whack the
closing elements at the same time:

   s:</*SPAN[^>]*>::g

finally, if you don't know what case it's in, you can do:

   s:</*[Ss][Pp][Aa][Nn][^>]*>::g

and, last but not least, you can use perl regexps (and, most likely,
any other equally-powerful regexp engine) to do this:

   s:</?span[^>]*>::gi
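
the same idea sketched in python, where re.IGNORECASE plays the role of
the trailing "i" (the mixed-case sample line is made up).  the
bracketed vi-style pattern is included to show the two agree:

```python
import re

line = '<SPAN CLASS="foo">this <a href="blah.html">works</a></Span>'

# "/?" makes the slash optional, so opening and closing tags both match
with_flag = re.sub(r'</?span[^>]*>', '', line, flags=re.IGNORECASE)

# the spelled-out character classes do the same job without a flag
spelled_out = re.sub(r'</*[Ss][Pp][Aa][Nn][^>]*>', '', line)

print(with_flag)
print(spelled_out)
```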

one other twist to the perl regexp engine is that it offers non-greedy
matching, which lets us use a variant on the original:

   s:<span class=".*?">::gi

notice the ".*?" construct, which is the non-greedy version of ".*".
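
python's re module happens to support the same ".*?" syntax, so here's a
small side-by-side sketch of greedy vs. non-greedy on a made-up line
with two spans:

```python
import re

line = '<span class="a">one</span> <span class="b">two</span>'

# greedy: one big match from the first <span class=" to the last ">
greedy = re.sub(r'<span class=".*">', '', line, flags=re.IGNORECASE)

# non-greedy: each match stops at the nearest ">, so both tags go separately
lazy = re.sub(r'<span class=".*?">', '', line, flags=re.IGNORECASE)

print(greedy)
print(lazy)
```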

this topic is covered in the perl FAQ (perlfaq9), under the question
"how do i remove HTML from a string?":

   http://www.perl.com/pub/doc/manual/html/pod/perlfaq9.html#How_do_I_remove_HTML_from_a_stri

happy geeking,
t.


