[lug] Perl question: how to print embedded metacharacters

Tom Christiansen tchrist at perl.com
Wed Nov 25 15:32:16 MST 2009


In-Reply-To: Message from Chip Atkinson <chip at pupman.com>
   of "Wed, 25 Nov 2009 11:20:05 MST." 
   <Pine.LNX.4.10.10911251110100.9985-100000 at chip1.pupman.com>

I've taken the time and trouble to rethread && retread the agonizingly
borked message I'm replying to, because it was the only way I could
figure out what was going on.  It was super-horrifically jeopardy-quoted
and miswrapped.  I can't believe people read this sort of thing. 

And there was no accumulated References header: !*SIGH*!

Jeopardy quoting is sometimes called bottom-posting.  It puts the new
answer *before* the quoted old question in a confusing violation of cause
and effect.  I've inverted the material back into FIFO order such that the
more-quoted portions precede the less-quoted answers.  That way it can be
read in the same order in which it occurred.  I really can't read it
backwards, and wonder who can.  I am afraid I've messed up the attributions,
but this was the best I could do given what I received.

Besides the jeopardy problem, I had to fix all the ugly miswrappage just so
I could read the message.  The miswrapping was stuff like:

    >> It looks like you are using a newsreader that wraps 
    > lines for you
    >> incorrectly.  This is often a badly configured web browser.  
    > If it's
    >> netscape, I suggest you report the bug to them.  It mangles 
    > your
    >> posting beyond any reasonable ability to read.

That's--well, execrable.  But it doesn't have to be like that.  

Don't accept bad wrapping: Fix it!  

Here's a little front-end to Damian Conway's awesome Text::Autoformat 
module to fix it up to (be) properly read:

    >> It looks like you are using a newsreader that wraps lines for you
    >> incorrectly.  This is often a badly configured web browser.  If
    >> it's netscape, I suggest you report the bug to them.  It mangles
    >> your posting beyond any reasonable ability to read.

Here, you need this:

 |  #!/usr/local/bin/perl
 |  # afmt - autoformat reply text using...
 |  use Text::Autoformat;
 |  
 |  $| = 1;
 |  
 |  $/ = '';
 |  $size = 75;
 |  
 |  if (@ARGV && $ARGV[0] =~ /^-?(\d+)$/) {
 |      shift;
 |      $size = $1;
 |  } 
 |  
 |  our $pid = postprocess();
 |  
 |  while (<>) { 
 |      die unless 3 == (($leading, $text, $ending) = /^\A(\n*)(.*?)(\n*)\Z/s);
 |      print   #$leading, 
 |              autoformat($text => {
 |                              squeeze     => 0, 
 |                              all         => 1, 
 |                              right       => $size, 
 |                              # ignore      => qr/^[ \t]/,
 |                          }),
 |              $ending;
 |  }
 |  close STDOUT;
 |  waitpid($pid, 0);
 |  
 |  # backend filter processing clean up work squeeze multiple newlines between
 |  # paragraphs (with optional whitespace slot in lines) into single blank lines
 |  # only and make sure the last paragraph has just a regular single terminating
 |  # newline, never a blank line
 |  #
 |  sub postprocess { 
 |      my $pid = open(STDOUT,  "|-");
 |      die "$0: forkopen failed: $!" unless defined $pid;
 |      return $pid if $pid;
 |      $/ = '';       # redundant with above, but different proc
 |      while (<>) {
 |          s/\s+\z/\n\n/g;
 |          s/\n\n\z/\n/ if eof;
 |          print;
 |      } 
 |      exit;
 |  }
 |  

Ok, and now for the message, placed in FIFO order the way it was spoken
instead of LIFO order the way it came to me.  I can't read backwards.

>>>>>>>>>> Greetings,
>>>>>>>>>>

This:

>>>>>>>>>> I'm working on a perl script that reads strings from a
>>>>>>>>>> configuration
>>>>>>>> file
>>>>>>>>>> and prints them out.  What I'd like to be able to do is have any
>>>>>>>> embedded
>>>>>>>>>> metacharacters interpreted but I'd like to avoid doing the
>>>>>>>> substitutions
>>>>>>>>>> myself.

is the erroneously quoted version of 

>>>>>>>>>> I'm working on a perl script that reads strings from a
>>>>>>>>>> configuration file and prints them out.  What I'd like to be
>>>>>>>>>> able to do is have any embedded metacharacters interpreted
>>>>>>>>>> but I'd like to avoid doing the substitutions myself.

>>>>>>>>>>
>>>>>>>>>> For example, here is a little script to emulate the problem:
>>>>>>>>>>
>>>>>>>>>> #!/usr/bin/perl
>>>>>>>>>>
>>>>>>>>>> my $kw = 'abc\ndef';
>>>>>>>>>>
>>>>>>>>>> print "$kw";
>>>>>>>>>> ----------------------
>>>>>>>>>> the output is
>>>>>>>>>> abc\ndef
>>>>>>>>>>
>>>>>>>>>> What I want is
>>>>>>>>>> abc
>>>>>>>>>> def
>>>>>>>>>>
>>>>>>>>>> --------------
>>>>>>>>>> I tried to jfgi but no joy.
>>>>>>>>>>
>>>>>>>>>> Thanks in advance.
>>>>>>>>>>
>>>>>>>>>> Chip
>>>>>>>> No, unfortunately.
>>>>>>>>

This:

>>>>>>>> The problem is that the script reads the string in from the command
>>>>>>>> file,
>>>>>>>> so I need to interpret the string before printing it.  Here's another
>>>>>>>> example:

is the erroneously quoted version of 

>>>>>>>> The problem is that the script reads the string in from the
>>>>>>>> command file, so I need to interpret the string before printing
>>>>>>>> it.  Here's another example:

>>>>>>>> ---------------------
>>>>>>>> chip at chip-desktop:~$ cat try2.pl
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #my $kw = 'abc\ndef';
>>>>>>>> #my $nv = "${\($kw)}";
>>>>>>>>
>>>>>>>> while (<>) {
>>>>>>>>  print "$_";
>>>>>>>> }
>>>>>>>> chip at chip-desktop:~$ echo 'abc\ndef' | ./try2.pl
>>>>>>>> abc\ndef
>>>>>>>> chip at chip-desktop:~$
>>>>>>>> ---------------------------------

This:

>>>>>>>> In a sense, I need the opposite of quotemeta -- something that takes
>>>>>>>> in a
>>>>>>>> string and does the metacharacter replacements.

is the erroneously quoted version of 

>>>>>>>> In a sense, I need the opposite of quotemeta -- something that
>>>>>>>> takes in a string and does the metacharacter replacements.

>>>>>>>>
>>>>>>>> I know that something like
>>>>>>>> while (<>) {
>>>>>>>>  s/\\n/\n/g;
>>>>>>>>  # etc. for all metacharacter sequences...
>>>>>>>>  print;
>>>>>>>> }
>>>>>>>>
>>>>>>>> would work but I wanted to avoid the "etc. for all metacharacter
>>>>>>>> sequences"
>>>>>>>>
>>>>>>>> Hopefully that's more understandable.
>>>>>>>>
>>>>>>>> Chip
>>>>>>>>
>>>>>>>> On Wed, 25 Nov 2009, Jeffrey Haemer wrote:
>>>>>>>>
>>>>>>>>> Chip,
>>>>>>>>>
>>>>>>>>> Can you do this instead?
>>>>>>>>>
>>>>>>>>> my $kw = "abc\ndef";
>>>>>>>>>
>>>>>>>>> If not, can you explain in enough detail that I can try again? :-)
>>>>>>>>>
>>>>>>>>> On Wed, Nov 25, 2009 at 8:29 AM, Chip Atkinson <chip at pupman.com>
>>>>>>>>> wrote:

>>>>>>> Chip,
>>>>>>>
>>>>>>> Okay.  Would something like this be good enough?
>>>>>>>
>>>>>>>
>>>>>>> #!/usr/bin/perl
>>>>>>>
>>>>>>> while (<>) {
>>>>>>>   $_=` printf "$_" `;
>>>>>>>   print;
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 25, 2009 at 9:06 AM, Chip Atkinson <chip at pupman.com> wrote:

>>>>>>
>>>>>> Hmmmm... that does work, but isn't it using the bash printf rather than
>>>>>> perl?  My concern is how fast it would be.  I can experiment there
>>>>>> though,
>>>>>> and it does work.
>>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> Chip
>>>>>>
>>>>>> On Wed, 25 Nov 2009, Jeffrey Haemer wrote:

>>>>> Chip,

>>>>> Yep, I'm using the shell to do the dirty work.

>>>>> I don't doubt there are elegant, pure-Perl solutions.  I was only trying
>>>>> to find something that would work for you, immediately, until someone else
>>>>> volunteers a more PC (Perlishly Correct) way. :-)

>>>>> On Wed, Nov 25, 2009 at 9:34 AM, Chip Atkinson <chip at pupman.com> wrote:

>>>> It didn't take much thought to realize I should have written "Perlitically
>>>> Correct."
>>>>
>>>> On Wed, Nov 25, 2009 at 9:41 AM, Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>> wrote:

>>> More Perlically Correct: (security considerations aside, I basically
>>> run the perl interpreter on the input and treat it as code again) Just
>>> make sure you do lots of input checking with this program.
>>>
>>> [09:57:25]            beldyk at chupacabra: ~
>>> $ cat tmp.pl
>>> #!/usr/bin/perl -w
>>>
>>> my $str = <<EOS;
>>> foo\nbar\tfoo\a\t
>>> EOS
>>>
>>> my $nstr;
>>>
>>> eval("\$nstr =  \"$str\";");
>>>
>>> print $nstr;
>>>
>>> [09:57:33]            beldyk at chupacabra: ~
>>> $ ./tmp.pl
>>> foo
>>> bar     foo
>>>

>> So it's more clear what my script is doing (and fixing the error with
>> the <<EOS thing (I initially wrote this with cat file|perl -ne
>> 'blah'))
>> 
>> [10:10:25]            beldyk at chupacabra: ~
>> $ ./tmp.pl
>>  foo\nbar\tfoo\a\t
>> *****************************
>>  foo
>> bar  foo     
>> [10:10:26]            beldyk at chupacabra: ~
>> $ cat tmp.pl
>> #!/usr/bin/perl -w
>> 
>> my $str = ' foo\nbar\tfoo\a\t';
>> 
>> my $nstr;
>> 
>> eval("\$nstr =  \"$str\";");
>> print $str;
>> print "\n*****************************\n";
>> print $nstr;

> Thanks!  Yes, this is pretty much what I was looking for.  
> 
> I think the thing that I must not have been clear about was that the
> metacharacters need to be interpreted at run time rather than compile
> time.  
> 
> Thus someting like 
> 
> $kx = "abc\ndef";
> 
> gets evaluated at compile time so that the \n that you see in the text
> editor is converted to 0x0D and stuffed into the string before it's
> executed by the perl engine(?).
> 
> I was playing with eval but wasn't sure how to go about it.

So it seems that what you want is to force a string through another 
level of interpolation.  So 

    interp('Please ring my \a.');

should return 

    interp("Please ring my \a.");

For this eval is the crux.  A better solution would use a unique heredoc 
terminator that's guaranteed *not* to occur in the string.

  || use charnames q(:full);
  || use strict;
  || 
  || our $EO_HEREDOC_TOKEN = q(<----End of Here Doc 000000);
  || 
  ||     sub interp {
  ||         my $interp_string = shift;
  ||         my $eo_heredoc_token = $EO_HEREDOC_TOKEN;
  ||         while ($interp_string =~ $eo_heredoc_token) {
  ||             $eo_heredoc_token =~ s{ \d+ \z }{ int rand 1e6 }ex;
  ||         }
  ||         my $evalstr = join(qq(\n), (
  ||                                         qq(<<"$eo_heredoc_token"), 
  ||                                         $interp_string, 
  ||                                         $eo_heredoc_token,
  ||                                         q()
  ||                                     )
  ||                       );
  ||         0 and print "\n[DEBUG: eval()ing:\n$evalstr]\n";
  ||         my $newstr = eval $evalstr;
  ||         die if $@;
  ||         chomp $newstr;
  ||         return $newstr;
  ||     }
  || 
  || 
  || our @tests = split(/ \s* \n \s* /x, <<'End of Interpolation Tests');
  || \uthis is \Ucapitalized.
  ||    I am $0.
  || Please ring my \a.
  || <----End of Here Doc 000000
  || tab \t formfeed \f return \r linefeed \n vertab \013 bs \b done.
  || Solidus is \N{SOLIDUS}.
  || Sudilso is \N{REVERSE SOLIDUS}.
  || End of Interpolation Tests
  || 
  || push @tests, $EO_HEREDOC_TOKEN;
  || push @tests, $EO_HEREDOC_TOKEN . "\n";
  || 
  || open(PIPE, "|cat -teunv") || die;
  || select(PIPE);
  || $| = 1;
  || 
  || for my $testno ( 0 .. $#tests ) {
  ||     my $was = $tests[$testno];
  ||     my $is = interp($was);
  ||     printf qq(Test #%d: "%s" => "%s"\n), $testno, $was, $is;
  || } 
  || 
  || close(PIPE) || die;
  || select STDOUT;
  || printf  "\nDone with %d tests.\n", 0+ at tests;
  || exit 0;
  || 

Which when run produces

	 1	Test #0: "\uthis is \Ucapitalized." => "This is CAPITALIZED."$
	 2	Test #1: "I am $0." => "I am interp-tester."$
	 3	Test #2: "Please ring my \a." => "Please ring my ^G."$
	 4	Test #3: "<----End of Here Doc 000000" => "<----End of Here Doc 000000"$
	 5	Test #4: "tab \t formfeed \f return \r linefeed \n vertab \013 bs \b done." => "tab ^I formfeed ^L return ^M linefeed $
	 6	 vertab ^K bs ^H done."$
	 7	Test #5: "Solidus is \N{SOLIDUS}." => "Solidus is /."$
	 8	Test #6: "Sudilso is \N{REVERSE SOLIDUS}." => "Sudilso is \."$
	 9	Test #7: "<----End of Here Doc 000000" => "<----End of Here Doc 000000"$
	10	Test #8: "<----End of Here Doc 000000$
	11	" => "<----End of Here Doc 000000$
	12	"$

    Done with 9 tests.

As you see, you have variables getting expanded--of course.

You may wish to look into String::Interpolate on CPAN; it 
can use Safe compartments &c&c&c.

MEANwhile, there was all this clearly important test in your message:

>>>>>>>>>> _______________________________________________
>>>>>>>>>> Web Page:  http://lug.boulder.co.us
>>>>>>>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>>>>>>>>>> Join us on IRC: lug.boulder.co.us port=6667
>>>>>>>>>> channel=#hackingsociety
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Click to Call Me Now! --
>>>>>>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
>>>>>>>>>
>>>>>>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>>>>>>> 720-837-8908 [cell],  @goyishekop [twitter]
>>>>>>>>> http://www.youtube.com/user/goyishekop [vlog]
>>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Web Page:  http://lug.boulder.co.us
>>>>>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>>>>>>>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Click to Call Me Now! --
>>>>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
>>>>>>>
>>>>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>>>>> 720-837-8908 [cell],  @goyishekop [twitter]
>>>>>>> http://www.youtube.com/user/goyishekop [vlog]
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Web Page:  http://lug.boulder.co.us
>>>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>>>>>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Click to Call Me Now! --
>>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
>>>>>
>>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>>> 720-837-8908 [cell],  @goyishekop [twitter]
>>>>> http://www.youtube.com/user/goyishekop [vlog]
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Click to Call Me Now! --
>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
>>>>
>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>> 720-837-8908 [cell],  @goyishekop [twitter]
>>>> http://www.youtube.com/user/goyishekop [vlog]
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Web Page:  http://lug.boulder.co.us
>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>>>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
>>>>
>>>
>>>
>>>
>>> --
>>> Calvin: Know what I pray for?
>>> Hobbes: What?
>>> Calvin: The strength to change what I can, the inability to accept
>>> what I can't, and the incapacity to tell the difference.
>>>
>> 
>> 
>> 
>> -- 
>> Calvin: Know what I pray for?
>> Hobbes: What?
>> Calvin: The strength to change what I can, the inability to accept
>> what I can't, and the incapacity to tell the difference.
>> _______________________________________________
>> Web Page:  http://lug.boulder.co.us
>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
>> 
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety

Well, or not.  Hm.  This could use some netnannying.

  ! #!/usr/bin/perl -w
  ! # netnanny - check news message for netiquette issues
  ! # tchrist at perl.com
  ! 
  ! use strict;
  ! 
  ! my $Msg = get_message();
  ! 
  ! missing_headers();
  ! bogus_address();
  ! allcap_subject();
  ! annoying_subject();
  ! 
  ! mimes();
  ! 
  ! lines_too_long();
  ! control_characters();
  ! 
  ! miswrapped();
  ! jeopardy_quoted();
  ! overquoted();
  ! 
  ! good_signature();
  ! 
  ! exit(0);
  ! 
  ! #######################
  ! 
  ! sub AUTOLOAD {
  !     use vars '$AUTOLOAD';
  !     my $field;
  !     ($field = uc($AUTOLOAD)) =~ s/.*:://;
  !     if (!defined wantarray) {
  !         require Carp;
  !         Carp::confess("Undefined function call: $AUTOLOAD");
  !     } 
  !     $Msg->$field();
  ! } 
  ! 
  ! sub bogus_address {
  !     my $address = from();
  ! 
  !     if ($address !~ /\@\w.*\.\w/) {
  !         print "From address must contain an at sign, etc.\n";
  !         return;
  !     } 
  ! 
  !     if ($address =~ /(remove|spam)/i) {
  !         print "Munged return address suspected, found `$1' in from.\n";
  !     } 
  ! 
  !     ck822($address);    # inscrutable
  ! 
  !     my($host) = $address =~ /\@([a-zA-Z0-9_.-]+)/;
  !     dns_check($host);   # very slow!
  ! }
  ! 
  ! sub control_characters {
  !     my $lineno = 0;
  !     my $MAX_CONTROL = 5;
  ! 
  !     for (lines()) { 
  !         $lineno++;
  !         if (/(?=[^\s\b])([\000-\037])/) {
  !             printf "Control character (char %#o) appears at line %d of body.\n", 
  !                 ord $1, $lineno;
  !         }
  ! 
  !         if (/([\202-\237])/) {
  !             printf "MS-ASCII character (char %#o) appears at line %d of body.\n", 
  !                 ord $1, $lineno;
  !         } 
  !         last if --$MAX_CONTROL < 0;
  !     }
  ! 
  ! } 
  ! 
  ! sub lines_too_long {
  !     my $MAX_LINE_LEN = 80;
  !     my $line_count = scalar @{ [ lines() ] };
  !     my ($long_lines, $longest_line_data, $longest_line_number) = (0,'',0);
  !     my $lineno = 0;
  !     for (lines()) {
  !         $lineno++;
  !         next if /^[>+-]/;  # skip quotes and patch diffs
  !         if (length() > $MAX_LINE_LEN) {
  !             $long_lines++;
  !             if (length() > length($longest_line_data)) {
  !                 $longest_line_data = $_;
  !                 $longest_line_number = $lineno;
  !             } 
  !         } 
  !     } 
  !     if ($long_lines) {
  !         printf "%d of %d lines exceed maxlen %d, ",
  !            $long_lines, $line_count, $MAX_LINE_LEN;
  !         printf "longest is #%d at %d bytes\n",
  !             $longest_line_number, length($longest_line_data);
  !     } 
  ! }
  ! 
  ! sub missing_headers {
  !     if (subject() !~ /\S/) {
  !         print "Missing required subject header.\n";
  !     } 
  !     if (newsgroups() && subject() =~ /^\s*Re:/i && !references()) {
  !         print "Followup posting missing required references header.\n";
  !     } 
  ! }
  ! 
  ! sub allcap_subject {
  !     my $subject = subject();
  !     $subject =~ s/^(\s*Re:\s*)+//i;
  !     if ($subject !~ /[a-z]/) {
  !         print "No lower-case letters in subject header.\n";
  !     } 
  ! }
  ! 
  ! sub miswrapped {
  !     my($bq1, $bq2);
  !     for (paragraphs()) {
  !         next unless /\A(^\S.*){2,}\Z/ms;  # no indented code
  ! 
  !         if (!$bq1 && /^>\S.*\n\s*[a-zA-Z]/) { 
  !             print "Incorrectly wrapped quoted text.\n";
  !             $bq1++;
  !         }
  ! 
  !         next if $bq2;
  ! 
  !         my $count = 0;
  !         while (/^[^>].{60,}\n[^>].{1,20}\n(?=[^>].{60,}\n)/gm) {
  !             $count++;
  !         } 
  ! 
  !         if ($count > 1) {
  !             print "Incorrectly wrapped regular text.\n";
  !             $bq2++;
  !             ### print "OOPS count = $count:\n$_\n\n";
  !         } 
  !     } 
  ! }
  ! 
  ! sub jeopardy_quoted {
  !     local $_ = body();
  !     $_ = unquote_wrap($_);
  ! 
  !     $_ = strip_signature($_);
  !     $_ = strip_attribution($_);
  ! 
  !     # check quotation at bottom but nowhere else
  !     # XXX: these can go superlong superlong!  i've added
  !     #      some more anchors and constraints to try to avoid this,
  !     #      but I still mistrust it
  ! 
  !     #if (/ ((^\s*>.*){2,}) \s* \Z/mx   !/ (\n>.*?)+ (\n[^>].*?)+ (\n>.*?)+ /x ) 
  !     if ( /(?:\n?>.*)+\s*\Z/ )
  !     {
  !         print "Quote follows response, Jeopardy style #1\n";
  !     } 
  ! 
  !     # completely at bottom 
  !     elsif (/^.* wr(?:ote|ites):\s+(>.*\n)+\s*\Z/m) {
  !         print "Quote follows response, Jeopardy style #2\n";
  !     } 
  ! 
  !     # another way of saying the same
  !     elsif (/^(?:>+\s*)?-[_+]\s*Original Message\s*-[_+]\s.*\Z/ms) {
  !         print "Quote follows response, Jeopardy style #3\n";
  !     }
  ! 
  !     # another way of saying the same
  !     elsif (/^(?:>+\s*)?[_-]+\s*Reply Separator\s*[_-]+\s.*\Z/ms) {
  !         print "Quote follows response, Jeopardy style #4\n";
  !     }
  ! 
  ! }
  ! 
  ! sub overquoted {
  ! 
  !     # cfoq: check fascistly overquoted by tchrist at mox.perl.com
  !     #   (wants perl 5.0 or better; developed under 5.002)
  ! 
  !     my (
  !         $total,         # total number of lines, minus sig and attribution
  !         $quoted_lines,  # how many lines were quoted
  !         $percent,       # what percentage this in
  !         $pcount,        # how many in this paragraph were counted
  !         $match_part,    # holding space for current match
  !         $gotsig,        # is this the sig paragraph?
  !     );
  ! 
  !     $total = $quoted_lines = $pcount = $percent = 0;
  ! 
  !     my $MINLINES  = 20;
  !     my $TOLERANCE = 50; 
  !     my $VERBOSE   = 0;
  ! 
  !     if (body() =~ /^-+\s*Original Message\s*-+$/m) {
  !         my $body = strip_signature(body());
  !         my($text,$quote) = body() =~ /(.*)(^-+\s*Original Message\s*-+.*\Z)/ms;
  !         $total = ((my $x = body()) =~ y/\n//);
  !         $quoted_lines = ($quote =~ y/\n//);
  !     } 
  !     else { 
  !         for (paragraphs()) {
  ! 
  !             # strip sig line, remember we found it
  !             $gotsig = s/^-- \n.*//ms;
  ! 
  !             # strip attribution, possibly multiline
  !             if ($. == 2) { s/\A.*?(<.*?>|\@).*?:\n//s }  
  ! 
  !             # toss trailing blank lines into one single line
  !             s/\n+\Z/\n/;
  ! 
  !             # now reduce miswrapped lines from idiotic broken PC newsreaders
  !             # into what they should have been
  !             s/(>.*)\n\s*([a-zA-Z])/$1 $2/g;
  ! 
  !             # count lines in this paragraph
  !             $total++ while  /^./mg;
  ! 
  !             # is it a single line, quoted in the customary fashion?
  !             if ( /^(>+).*\n\Z/ ) {
  !                 $quoted_lines++;
  !                 print " 1 line quoted with $1\n" if $VERBOSE;
  !                 next;
  !             } 
  ! 
  !             # otherwise, it's a multiline block, which may be quoted
  !             # with any leading repeated string that's neither alphanumeric
  !             # nor string
  !             while (/^(([^\w\s]+).*\n)(\2.*\n)+/mg) {  # YANETUT
  !                 $quoted_lines += $pcount = ($match_part = $&) =~ tr/\n//;
  !                 printf "%2d lines quoted with $2\n", $pcount    if $VERBOSE;
  !             } 
  ! 
  !             last if $gotsig;
  !         } 
  ! 
  !     }
  ! 
  !     $percent = int($quoted_lines / $total * 100);
  ! 
  !     if ($total == $quoted_lines) {
  !         print "All $total lines were quoted lines!\n";
  !         # not ok
  !     } 
  !     elsif ($percent > $TOLERANCE && $total > $MINLINES) {
  !         print "Overquoted: $quoted_lines lines quoted out of $total: $percent%\n";
  !     } 
  ! 
  ! }
  ! 
  ! sub unquote_wrap {
  !     my $chunk = shift;
  !     # reduce miswrapped lines from idiotic broken PC newsreaders
  !     # into what they should have been
  !     $chunk =~ s/(>.*)\n\s*([a-zA-Z])/$1 $2/g;
  !     return $chunk;
  ! } 
  ! 
  ! sub good_signature {
  !     my $MAX_SIGLINES = 4;
  ! 
  !     my $sig = '';
  !     my($is_canon, $separator);
  ! 
  !     my $body = body();
  ! 
  !     # sometimes the ms idiotware quotes at the bottom this way
  !     $body =~ s/^-+\s*Original Message\s*-+\s.*\Z//ms;
  ! 
  !     # first check regular signature
  !     if ($body =~ /\n-- \n(.*)/s) {
  !         $sig = $1;
  !         $is_canon = 1;
  !     } 
  !     elsif ($body =~ /\n([_-]{2,}\s*)\n(.*?)$/s) {
  !         $separator = $1;
  !         $sig = $2;
  !     } 
  ! 
  !     for ($separator, $sig) { s/\n\Z// if defined }
  ! 
  !     my $siglines = $sig =~ tr/\n//;
  ! 
  !     if ($separator && ($siglines && $siglines < 20)) { 
  !         if ($separator eq '--') {
  !             print "Double-dash in signature missing trailing space.\n";
  !         } else { 
  !             print "Non-canonical signature separator: `$separator'\n";
  !         }
  !     }
  ! 
  !     if ($siglines > $MAX_SIGLINES && $siglines < 20) { 
  !         printf "Signature too long: %d lines\n", $siglines;
  !     }
  ! } 
  ! 
  ! sub strip_signature {
  !     local $_ = shift;
  ! 
  !     s/\n-- \n(.*)//s
  !         ||
  !     s/\n([_-]{2,}\s*)\n(.*?)$//s;
  ! 
  !     return $_;
  ! } 
  ! 
  ! sub attribution {
  !     local $_ = body();
  !     s/^\s*\[.*\]\s*//;  # remove [courtesy cc]
  !     if (/\A(.*wr(?:ote|ites):?)\n/) {
  !         return $1;
  !     } elsif (/\A(.*?(<.*?>|\@).*?:)\n/s) {
  !         return $1;
  !     } else {
  !         return '';
  !     } 
  ! } 
  ! 
  ! sub strip_attribution {
  !     local $_ = shift;
  ! 
  !     s/^\s*\[.*\]\s*//;  # remove [courtesy cc]
  ! 
  !     # XXX: duplicate code with previous function
  !     s/\A(.*wr(?:ote|ites):?)\n// 
  !         ||
  !     s/\A(.*?(<.*?>|\@).*?:)\n//s;
  ! 
  !     return $_;
  ! }
  ! 
  ! sub annoying_subject {
  !     local $_ = subject();
  ! 
  !     if ( / ( [?!]{3,} ) /x   ||
  !          / ( HELP     ) /x   ||
  !          / ( PLEASE   ) /x
  !        ) 
  !     {
  !         print "Subject line contains annoying `$1' in it.\n";
  !     } 
  ! }
  ! 
  ! sub mimes {
  ! 
  !     my $mime_crap = 0;
  ! 
  !     for (content_type()) { 
  !         last unless defined;
  !         $mime_crap++;
  !         if (/multipart/i) {
  !             print "Multipart MIME detected.\n";
  !         } 
  !         elsif (/html/i) {
  !             print "HTML encrypting detected.\n";
  !         } 
  !         elsif (! (/^text$/i || m#^text/plain#i)) {
  !             print "Strange content type detected: $_\n";
  !         } 
  !     }
  ! 
  !     for (content_transfer_encoding()) { 
  !         last unless defined;
  !         if (/quoted-printable/i) {
  !             print "Gratuitously quoted-illegible MIMEing detected.\n";
  !         } 
  !     }
  ! 
  !     unless ($mime_crap) {
  !         for (body()) { 
  !             if (/\A\s*This message is in MIME format/i) {
  !                 print "Gratuitous but unadvertised MIME detected.\n";
  !             } 
  !             elsif ( /\A\s*This is a multi-part message in MIME format/i) {
  !                 print "Unadvertised multipart MIME detected.\n";
  !             } 
  !         }
  !     } 
  ! 
  ! 
  ! }
  ! 
  ! sub dns_check {
  !     my $NSLOOKUP = 'nslookup';  # or /usr/ucb?
  ! 
  !     # first try an MX record, then an A rec (for badly configged hosts)
  ! 
  !     my $host = shift;
  !     local $/ = undef;
  !     local *NS;
  !     local $_;
  ! 
  !     # the following is commented out for security reasons:
  !     #   if ( `nslookup -query=mx $host` =~ /mail exchanger/
  !     # otherwise there could be naughty bits in $host
  !     # we'll bypass system() and get right at execvp()
  ! 
  !     my $pid;
  ! 
  !     if ($pid = open(NS, "-|")) {
  !         $_ = <NS>;
  !         kill 'TERM', $pid if $pid;  # just in case
  !         close NS;
  !         return if /mail exchanger/;
  !         # else fall trohugh to next test
  !     } else {
  !         die "cannot fork: $!" unless defined $pid;
  !         open(SE, ">&STDERR");
  !         open(STDERR, ">/dev/null");
  !         { exec $NSLOOKUP, '-query=mx', $host; }  # braces for -w
  !         open(STDERR, ">&SE");
  !         die "can't exec nslookup: $!";
  !     } 
  ! 
  !     if ($pid = open(NS, "-|")) {
  !         $_ = <NS>;
  !         kill 'TERM', $pid if $pid;  # just in case
  !         close NS;
  !         unless (/answer:.*Address/s || /Name:.*$host.*Address:/si) {
  !             print "No DNS for \@$host in return address.\n";
  !         }
  !     } else { 
  !         die "cannot fork: $!" unless defined $pid;
  !         open(SE, ">&STDERR");
  !         open(STDERR, ">/dev/null");
  !         { exec $NSLOOKUP, '-query=a', $host; }  # braces for -w
  !         open(STDERR, ">&SE");
  !         die "can't exec nslookup: $!";
  !     }
  ! 
  ! } 
  ! 
  ! 
  ! sub ck822 { 
  ! 
  !     # ck822 -- check whether address is valid rfc 822 address
  !     # tchrist at perl.com
  !     #
  !     # pattern developed in program by jfriedl; 
  !     # see "Mastering Regular Expressions" from ORA for details
  ! 
  !     # this will error on something like "ftp.perl.com." because
  !     # even though dns wants it, rfc822 hates it.  shucks.
  ! 
  !     my $what = 'address';
  ! 
  !     my $address = shift;
  !     local $_;
  ! 
  !     my $is_a_valid_rfc_822_addr;
  ! 
  !     ($is_a_valid_rfc_822_addr = <<'EOSCARY') =~ s/\n//g;
  ! (?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n
  ! \015()]|\\[^\x80-\xff])*\))*\))*(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\
  ! xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"(?:[^\\\x80-\xff\n\015"
  ! ]|\\[^\x80-\xff])*")(?:(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xf
  ! f]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*\.(?:[\040\t]|\((?:[
  ! ^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\
  ! xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;
  ! :".\\\[\]\000-\037\x80-\xff])|"(?:[^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*"))
  ! *(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\
  ! n\015()]|\\[^\x80-\xff])*\))*\))*@(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\
  ! \[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\04
  ! 0)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-
  ! \xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?
  ! :[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80
  ! -\xff])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\(
  ! (?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]
  ! \000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\
  ! \x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]))*|(?:[^(\040)<>@,;:".\\\[\]\000-\0
  ! 37\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"(?:[^\\\x80-\xf
  ! f\n\015"]|\\[^\x80-\xff])*")(?:[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\03
  ! 7]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\
  ! \[^\x80-\xff])*\))*\)|"(?:[^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*")*<(?:[\04
  ! 0\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]
  ! |\\[^\x80-\xff])*\))*\))*(?:@(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x
  ! 80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@
  ! ,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]
  ! )|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?:[^\\
  ! \x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff
  ! ])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^
  ! \\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-
  ! \037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-
  ! \xff\n\015\[\]]|\\[^\x80-\xff])*\]))*(?:(?:[\040\t]|\((?:[^\\\x80-\xff\n\01
  ! 5()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*,(?
  ! :[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\0
  ! 15()]|\\[^\x80-\xff])*\))*\))*@(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^
  ! \x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<
  ! >@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xf
  ! f])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?:[^
  ! \\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\x
  ! ff])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:
  ! [^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\00
  ! 0-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x8
  ! 0-\xff\n\015\[\]]|\\[^\x80-\xff])*\]))*)*:(?:[\040\t]|\((?:[^\\\x80-\xff\n\
  ! 015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*)
  ! ?(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000
  ! -\037\x80-\xff])|"(?:[^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*")(?:(?:[\040\t]
  ! |\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[
  ! ^\x80-\xff])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xf
  ! f]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\
  ! \\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"(?:
  ! [^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*"))*(?:[\040\t]|\((?:[^\\\x80-\xff\n\
  ! 015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*@
  ! (?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n
  ! \015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff
  ! ]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\
  ! ]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\
  ! xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*\.(?:[\040\t]|\((?
  ! :[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80
  ! -\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@
  ! ,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff
  ! ])*\]))*(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x8
  ! 0-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*>)(?:[\040\t]|\((?:[^\\\x80-\xff\n\
  ! 015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*
  ! EOSCARY
  ! 
  !     if ($address !~ /^${is_a_valid_rfc_822_addr}$/o) { 
  !         print "rfc822 failure on $address"; 
  !     }
  ! }
  ! 
  ! 
  ! ##############################
  ! 
  ! package Mail_Message;
  ! 
  ! use Carp;
  ! 
  ! use vars qw/$AUTOLOAD/;
  ! 
  ! # process <ARGV> for a message header and body
  ! # currently this assumes one message per file!
  ! sub main::get_message {
  !     my $msg = bless {}, __PACKAGE__;
  !     local $/ = '';
  !     $msg->{HEADER_STRING} = <>;
  !     chomp $msg->{HEADER_STRING};
  !     for (split /\n(?!\s)/, $msg->{HEADER_STRING}) {
  !         my($tag, $value) = /^([^\s:]+):\s*(.*)\s*\Z/s;
  !         push @{ $msg->{HEADERS}{$tag} }, $value;
  !         $tag =~ tr/-/_/;
  !         $tag = uc($tag);
  !         push @{ $msg->{$tag} }, $value;
  !     } 
  !     local $/ = undef;
  !     for ($msg->{BODY} = <>) { 
  !         $msg->{PARAGRAPHS} = [ split /\n\n+/ ];
  !         $msg->{LINES}      = [ split /\n/    ];
  !     }
  ! 
  !     return $msg;
  ! } 
  ! 
  ! sub AUTOLOAD {
  !     use vars '$AUTOLOAD';
  !     my $self = shift;
  !     my $field;
  !     ($field = uc($AUTOLOAD)) =~ s/.*:://;
  !     my $xfield = "x_" . $field;
  ! 
  !     if (!exists $self->{$field} && exists $self->{$xfield}) {
  !         $field = $xfield;
  !     } 
  ! 
  !     unless (exists $self->{$field}) {
  !         return undef;
  !         # NOT REACHED
  !         confess "No field $field in message";
  !     } 
  ! 
  !     my $data = $self->{$field};
  !     my @data = ref $data ? @$data : $data;
  ! 
  !     if (wantarray) { 
  !         return @data;
  !     }
  !     else {
  !         return join("\n", @data);
  !     } 
  ! 
  ! } 

YAN^HETUT.

--tom



More information about the LUG mailing list