[lug] Regex Help

Sun Jul 10 22:15:20 MDT 2011

"Jeffrey S. Haemer" <jeffrey.haemer at gmail.com> writes:

>    (1) Perl permits "bookend" delimiters for regexes.  Instead of
>
>    m/regex/ or m!regex!
>
>    I often use
>
>    m(regex) or m[regex] or m{regex}
>
>    For me, these are more readable, and they work around almost all
>    quoting-and-backslashing nonsense.  If a regex is trivial, print if
>    /regex/; if it's not, bookends.

Hm... the problem is that all of those bookends you suggest are also
valid regex chars.  Perl does keep track of nested pairs, but an
unbalanced one still has to be backslashed, so it can lead to surprises.

I'm not dissing the bookends, really; I do use them sometimes,
especially because they allow me to do "parallel" substitutions:

   while ( $line =~ s(foo bar baz)
                     (oof rab zab) ) { ... }

You can't do that with the non-paired variants.  (Well, /x would let
you spread the pattern itself across lines, but then every space you
actually want to match has to be escaped, which is its own pain.)

>    (2) If I'm debugging a regex, I never do it in the program.  I do it
>    on the command line with this idiom:
>
>    $ perl -ne 'print if s(regex)(XXXXX)'
>
>    This sits and waits for me to type at it, then
>      * only prints lines that match the regex
>      * shows me exactly what it matched by substituting 'XXXXX' for what
>        it found.

Agreed; I use the command line for this stuff a lot.  For that matter,
I learned a huge amount of Perl by hanging out on EFNet #perl, doing
one-liners as quickly as I could.  Learned to abuse the -M switch a
lot, and came up with horrors like this:

   http://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0013.html

Getting back to regexes, you might also want to play around with the
various regex debugging tools available in Perl.  Try this in a
terminal window:

   echo "foo bar baz" | \
     perl -Mre=debugcolor -lnwe 'print $1 while /(ba.?)/g;'
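
If the colours get in the way, or you want to keep the trace, plain
"debug" works the same way.  The trace goes to stderr, so it can be
sent to a file while the matches still show up on stdout.  A sketch
(the log file name is arbitrary):

```shell
# The regex trace is written to stderr; the matches go to stdout.
echo "foo bar baz" | \
  perl -Mre=debug -lne 'print $1 while /(ba.?)/g' 2> regex-debug.log

# How many lines of trace did that produce?
wc -l < regex-debug.log
```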

>    (3) Test-driven development (TDD) was made for regexes.
>
>    I make a file with lines that should and shouldn't match, then
>    feed it to the one-liner in #2. Vi lets me cut-and-paste a
>    zillion variants quickly; I bet your favorite text editor will,
>    too.

You can also use the __DATA__ feature to automate this a bit more, if
you are so inclined (see the 'perldata' man page):

  #!/usr/bin/perl

  use warnings;
  use strict;

  # use re qw( debugcolor );

  my $test_re = qr/ba.?/;

  my $errors = 0;
  my $case = 0;

  while ( my $line = <DATA> )
  {
      ++$case;

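      # Strip the newline and any other trailing whitespace.
      $line =~ s!\s+\z!!;

      # Each data line is "<expected match count> <test string>".
      my ( $expected_count, $test_str ) = split ' ', $line, 2;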

      print "case $case: expect $expected_count matches " .
            "of '$test_re' in '$test_str'\n";

      # Count the successive (non-overlapping) matches in the string.
      my $count = 0;
      ++$count while $test_str =~ /$test_re/g;

      if ( $count != $expected_count )
      {
          ++$errors;
          warn "error $errors: got $count matches";
      }
  }

  exit ( $errors > 0 ? 1 : 0 );

  __DATA__
  2  foo bar baz
  0  blah gibber fee

Anyway.  Happy hacking!

t.