[lug] Perl question

Chip Atkinson catkinson at circadence.com
Tue Jun 26 09:09:34 MDT 2001


Thanks for the help.  Splitting the file into tag and non-tag chunks is 
a really cool idea.  I'll take a crack at doing it without peeking too 
much and then use the example below to correct my mistakes.  Thanks 
again.  This idea is exactly what I need.

Chip

Tkil wrote:

> having said all that, i noticed that it was screwing up a few more
> test cases.  the use of \S on an unquoted value was grabbing the
> closing angle bracket as well, which was bad.  the revised code is:
> =============================================================================
> #!/usr/bin/perl -w
> 
> use strict;
> 
> # my $doc = do { local $/; <> };
> my $doc = do { local $/; <DATA> };
> 
> # split up the document into alternating tags and normal text.
> my @chunks = split /(<(?:[^>\'\"]+|"[^\"]+"|'[^\']+')+>)/s, $doc;
> 
> my $pos = 0;
> 
> CHUNK:
> foreach (@chunks)
> {
>     # print STDERR "|$_|\n";
> 
>     # skip everything but image tags.
>     goto DONE unless /^<img/i;
> 
>     my $orig = $_;
> 
>     # split out the SRC= attribute value.
>     my ($pre, $q_src, $qq_src, $naked_src, $post) = 
>       m{ ^(<img.*?src\s*=\s*)
>          (?:'([^\']+)'|"([^\"]+)"|([^\s>]+))
>          (.*)$ }isx;
> 
>     # get rid of quotes.  the condition is necessary, since "0" is a
>     # perfectly valid name for a file.
>     my $src = ( $q_src  && $q_src  ne '' ? $q_src  :
>                 $qq_src && $qq_src ne '' ? $qq_src :
>                 $naked_src );
>     unless ($src) 
>     {
>         print STDERR "no \"src\" attribute at position $pos\n";
>         goto DONE;
>     }
> 
>     # now do whatever you want to $src
>     $src = "MODIFIED(\"$src\")";
> 
>     # make it fancy.
>     $src =~ s/\"/"/g;
>     $src = "\"$src\"";
>     
>     # and reconstruct the tag.
>     $_ = $pre . $src . $post;
> 
>     # print STDERR
>     #   "orig: $orig\n",
>     #   "new:  $_\n";
> 
>   DONE:
>     $pos += length;
>     print;
> }
> 
> exit 0;
> 
> __DATA__
> <!-- some test data -->
> <img src="alpha">
> <img
>   src="beta">
> <img src = "gamma">
> <img src = delta>
> <img src='epsilon'>
> <img align="left">
> <img src='zeta eta'>
> <link src="iota">
> <IMG SRC=kappa.gif>
> 
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug




More information about the LUG mailing list