[lug] Perl question: how to print embedded metacharacters
Chip Atkinson
chip at pupman.com
Wed Nov 25 15:52:41 MST 2009
Can one's head explode and swim at the same time?
On Wed, 25 Nov 2009, Tom Christiansen wrote:
> In-Reply-To: Message from Chip Atkinson <chip at pupman.com>
> of "Wed, 25 Nov 2009 11:20:05 MST."
> <Pine.LNX.4.10.10911251110100.9985-100000 at chip1.pupman.com>
>
> I've taken the time and trouble to rethread && retread the agonizingly
> borked message I'm replying to, because it was the only way I could
> figure out what was going on. It was super-horrifically jeopardy-quoted
> and miswrapped. I can't believe people read this sort of thing.
>
> And there was no accumulated References header: !*SIGH*!
>
> Jeopardy quoting is sometimes called bottom-posting. It puts the new
> answer *before* the quoted old question in a confusing violation of cause
> and effect. I've inverted the material back into FIFO order such that the
> more-quoted portions precede the less-quoted answers. That way it can be
> read in the same order in which it occurred. I really can't read it
> backwards, and wonder who can. I am afraid I've messed up the attributions,
> but this was the best I could do given what I received.
>
> Besides the jeopardy problem, I had to fix all the ugly miswrappage just so
> I could read the message. The miswrapping was stuff like:
>
> >> It looks like you are using a newsreader that wraps
> > lines for you
> >> incorrectly. This is often a badly configured web browser.
> > If it's
> >> netscape, I suggest you report the bug to them. It mangles
> > your
> >> posting beyond any reasonable ability to read.
>
> That's--well, execrable. But it doesn't have to be like that.
>
> Don't accept bad wrapping: Fix it!
>
> Here's a little front-end to Damian Conway's awesome Text::Autoformat
> module to fix it up to (be) properly read:
>
> >> It looks like you are using a newsreader that wraps lines for you
> >> incorrectly. This is often a badly configured web browser. If
> >> it's netscape, I suggest you report the bug to them. It mangles
> >> your posting beyond any reasonable ability to read.
>
> Here, you need this:
>
> | #!/usr/local/bin/perl
> | # afmt - autoformat reply text using...
> | use Text::Autoformat;
> |
> | $| = 1;
> |
> | $/ = '';
> | $size = 75;
> |
> | if (@ARGV && $ARGV[0] =~ /^-?(\d+)$/) {
> | shift;
> | $size = $1;
> | }
> |
> | our $pid = postprocess();
> |
> | while (<>) {
> | die unless 3 == (($leading, $text, $ending) = /^\A(\n*)(.*?)(\n*)\Z/s);
> | print #$leading,
> | autoformat($text => {
> | squeeze => 0,
> | all => 1,
> | right => $size,
> | # ignore => qr/^[ \t]/,
> | }),
> | $ending;
> | }
> | close STDOUT;
> | waitpid($pid, 0);
> |
> | # backend filter processing clean up work squeeze multiple newlines between
> | # paragraphs (with optional whitespace slot in lines) into single blank lines
> | # only and make sure the last paragraph has just a regular single terminating
> | # newline, never a blank line
> | #
> | sub postprocess {
> | my $pid = open(STDOUT, "|-");
> | die "$0: forkopen failed: $!" unless defined $pid;
> | return $pid if $pid;
> | $/ = ''; # redundant with above, but different proc
> | while (<>) {
> | s/\s+\z/\n\n/g;
> | s/\n\n\z/\n/ if eof;
> | print;
> | }
> | exit;
> | }
> |
>
> Ok, and now for the message, placed in FIFO order the way it was spoken
> instead of LIFO order the way it came to me. I can't read backwards.
>
> >>>>>>>>>> Greetings,
> >>>>>>>>>>
>
> This:
>
> >>>>>>>>>> I'm working on a perl script that reads strings from a
> >>>>>>>>>> configuration
> >>>>>>>> file
> >>>>>>>>>> and prints them out. What I'd like to be able to do is have any
> >>>>>>>> embedded
> >>>>>>>>>> metacharacters interpreted but I'd like to avoid doing the
> >>>>>>>> substitutions
> >>>>>>>>>> myself.
>
> is the erroneously quoted version of
>
> >>>>>>>>>> I'm working on a perl script that reads strings from a
> >>>>>>>>>> configuration file and prints them out. What I'd like to be
> >>>>>>>>>> able to do is have any embedded metacharacters interpreted
> >>>>>>>>>> but I'd like to avoid doing the substitutions myself.
>
> >>>>>>>>>>
> >>>>>>>>>> For example, here is a little script to emulate the problem:
> >>>>>>>>>>
> >>>>>>>>>> #!/usr/bin/perl
> >>>>>>>>>>
> >>>>>>>>>> my $kw = 'abc\ndef';
> >>>>>>>>>>
> >>>>>>>>>> print "$kw";
> >>>>>>>>>> ----------------------
> >>>>>>>>>> the output is
> >>>>>>>>>> abc\ndef
> >>>>>>>>>>
> >>>>>>>>>> What I want is
> >>>>>>>>>> abc
> >>>>>>>>>> def
> >>>>>>>>>>
> >>>>>>>>>> --------------
> >>>>>>>>>> I tried to jfgi but no joy.
> >>>>>>>>>>
> >>>>>>>>>> Thanks in advance.
> >>>>>>>>>>
> >>>>>>>>>> Chip
> >>>>>>>> No, unfortunately.
> >>>>>>>>
>
> This:
>
> >>>>>>>> The problem is that the script reads the string in from the command
> >>>>>>>> file,
> >>>>>>>> so I need to interpret the string before printing it. Here's another
> >>>>>>>> example:
>
> is the erroneously quoted version of
>
> >>>>>>>> The problem is that the script reads the string in from the
> >>>>>>>> command file, so I need to interpret the string before printing
> >>>>>>>> it. Here's another example:
>
> >>>>>>>> ---------------------
> >>>>>>>> chip at chip-desktop:~$ cat try2.pl
> >>>>>>>> #!/usr/bin/perl
> >>>>>>>>
> >>>>>>>> #my $kw = 'abc\ndef';
> >>>>>>>> #my $nv = "${\($kw)}";
> >>>>>>>>
> >>>>>>>> while (<>) {
> >>>>>>>> print "$_";
> >>>>>>>> }
> >>>>>>>> chip at chip-desktop:~$ echo 'abc\ndef' | ./try2.pl
> >>>>>>>> abc\ndef
> >>>>>>>> chip at chip-desktop:~$
> >>>>>>>> ---------------------------------
>
> This:
>
> >>>>>>>> In a sense, I need the opposite of quotemeta -- something that takes
> >>>>>>>> in a
> >>>>>>>> string and does the metacharacter replacements.
>
> is the erroneously quoted version of
>
> >>>>>>>> In a sense, I need the opposite of quotemeta -- something that
> >>>>>>>> takes in a string and does the metacharacter replacements.
>
> >>>>>>>>
> >>>>>>>> I know that something like
> >>>>>>>> while (<>) {
> >>>>>>>> s/\\n/\n/g;
> >>>>>>>> # etc. for all metacharacter sequences...
> >>>>>>>> print;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> would work but I wanted to avoid the "etc. for all metacharacter
> >>>>>>>> sequences"
> >>>>>>>>
> >>>>>>>> Hopefully that's more understandable.
> >>>>>>>>
> >>>>>>>> Chip
> >>>>>>>>
> >>>>>>>> On Wed, 25 Nov 2009, Jeffrey Haemer wrote:
> >>>>>>>>
> >>>>>>>>> Chip,
> >>>>>>>>>
> >>>>>>>>> Can you do this instead?
> >>>>>>>>>
> >>>>>>>>> my $kw = "abc\ndef";
> >>>>>>>>>
> >>>>>>>>> If not, can you explain in enough detail that I can try again? :-)
> >>>>>>>>>
> >>>>>>>>> On Wed, Nov 25, 2009 at 8:29 AM, Chip Atkinson <chip at pupman.com>
> >>>>>>>>> wrote:
>
> >>>>>>> Chip,
> >>>>>>>
> >>>>>>> Okay. Would something like this be good enough?
> >>>>>>>
> >>>>>>>
> >>>>>>> #!/usr/bin/perl
> >>>>>>>
> >>>>>>> while (<>) {
> >>>>>>> $_=` printf "$_" `;
> >>>>>>> print;
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Nov 25, 2009 at 9:06 AM, Chip Atkinson <chip at pupman.com> wrote:
>
> >>>>>>
> >>>>>> Hmmmm... that does work, but isn't it using the bash printf rather than
> >>>>>> perl? My concern is how fast it would be. I can experiment there
> >>>>>> though,
> >>>>>> and it does work.
> >>>>>>
> >>>>>> Thanks for your help!
> >>>>>>
> >>>>>> Chip
> >>>>>>
> >>>>>> On Wed, 25 Nov 2009, Jeffrey Haemer wrote:
>
> >>>>> Chip,
>
> >>>>> Yep, I'm using the shell to do the dirty work.
>
> >>>>> I don't doubt there are elegant, pure-Perl solutions. I was only trying
> >>>>> to find something that would work for you, immediately, until someone else
> >>>>> volunteers a more PC (Perlishly Correct) way. :-)
>
> >>>>> On Wed, Nov 25, 2009 at 9:34 AM, Chip Atkinson <chip at pupman.com> wrote:
>
> >>>> It didn't take much thought to realize I should have written "Perlitically
> >>>> Correct."
> >>>>
> >>>> On Wed, Nov 25, 2009 at 9:41 AM, Jeffrey Haemer <jeffrey.haemer at gmail.com>
> >>>> wrote:
>
> >>> More Perlically Correct: (security considerations aside, I basically
> >>> run the perl interpreter on the input and treat it as code again) Just
> >>> make sure you do lots of input checking with this program.
> >>>
> >>> [09:57:25] beldyk at chupacabra: ~
> >>> $ cat tmp.pl
> >>> #!/usr/bin/perl -w
> >>>
> >>> my $str = <<EOS;
> >>> foo\nbar\tfoo\a\t
> >>> EOS
> >>>
> >>> my $nstr;
> >>>
> >>> eval("\$nstr = \"$str\";");
> >>>
> >>> print $nstr;
> >>>
> >>> [09:57:33] beldyk at chupacabra: ~
> >>> $ ./tmp.pl
> >>> foo
> >>> bar foo
> >>>
>
> >> So it's more clear what my script is doing (and fixing the error with
> >> the <<EOS thing (I initially wrote this with cat file|perl -ne
> >> 'blah'))
> >>
> >> [10:10:25] beldyk at chupacabra: ~
> >> $ ./tmp.pl
> >> foo\nbar\tfoo\a\t
> >> *****************************
> >> foo
> >> bar foo
> >> [10:10:26] beldyk at chupacabra: ~
> >> $ cat tmp.pl
> >> #!/usr/bin/perl -w
> >>
> >> my $str = ' foo\nbar\tfoo\a\t';
> >>
> >> my $nstr;
> >>
> >> eval("\$nstr = \"$str\";");
> >> print $str;
> >> print "\n*****************************\n";
> >> print $nstr;
>
> > Thanks! Yes, this is pretty much what I was looking for.
> >
> > I think the thing that I must not have been clear about was that the
> > metacharacters need to be interpreted at run time rather than compile
> > time.
> >
> > Thus someting like
> >
> > $kx = "abc\ndef";
> >
> > gets evaluated at compile time so that the \n that you see in the text
> > editor is converted to 0x0D and stuffed into the string before it's
> > executed by the perl engine(?).
> >
> > I was playing with eval but wasn't sure how to go about it.
>
> So it seems that what you want is to force a string through another
> level of interpolation. So
>
> interp('Please ring my \a.');
>
> should return
>
> interp("Please ring my \a.");
>
> For this eval is the crux. A better solution would use a unique heredoc
> terminator that's guaranteed *not* to occur in the string.
>
> || use charnames q(:full);
> || use strict;
> ||
> || our $EO_HEREDOC_TOKEN = q(<----End of Here Doc 000000);
> ||
> || sub interp {
> || my $interp_string = shift;
> || my $eo_heredoc_token = $EO_HEREDOC_TOKEN;
> || while ($interp_string =~ $eo_heredoc_token) {
> || $eo_heredoc_token =~ s{ \d+ \z }{ int rand 1e6 }ex;
> || }
> || my $evalstr = join(qq(\n), (
> || qq(<<"$eo_heredoc_token"),
> || $interp_string,
> || $eo_heredoc_token,
> || q()
> || )
> || );
> || 0 and print "\n[DEBUG: eval()ing:\n$evalstr]\n";
> || my $newstr = eval $evalstr;
> || die if $@;
> || chomp $newstr;
> || return $newstr;
> || }
> ||
> ||
> || our @tests = split(/ \s* \n \s* /x, <<'End of Interpolation Tests');
> || \uthis is \Ucapitalized.
> || I am $0.
> || Please ring my \a.
> || <----End of Here Doc 000000
> || tab \t formfeed \f return \r linefeed \n vertab \013 bs \b done.
> || Solidus is \N{SOLIDUS}.
> || Sudilso is \N{REVERSE SOLIDUS}.
> || End of Interpolation Tests
> ||
> || push @tests, $EO_HEREDOC_TOKEN;
> || push @tests, $EO_HEREDOC_TOKEN . "\n";
> ||
> || open(PIPE, "|cat -teunv") || die;
> || select(PIPE);
> || $| = 1;
> ||
> || for my $testno ( 0 .. $#tests ) {
> || my $was = $tests[$testno];
> || my $is = interp($was);
> || printf qq(Test #%d: "%s" => "%s"\n), $testno, $was, $is;
> || }
> ||
> || close(PIPE) || die;
> || select STDOUT;
> || printf "\nDone with %d tests.\n", 0+ at tests;
> || exit 0;
> ||
>
> Which when run produces
>
> 1 Test #0: "\uthis is \Ucapitalized." => "This is CAPITALIZED."$
> 2 Test #1: "I am $0." => "I am interp-tester."$
> 3 Test #2: "Please ring my \a." => "Please ring my ^G."$
> 4 Test #3: "<----End of Here Doc 000000" => "<----End of Here Doc 000000"$
> 5 Test #4: "tab \t formfeed \f return \r linefeed \n vertab \013 bs \b done." => "tab ^I formfeed ^L return ^M linefeed $
> 6 vertab ^K bs ^H done."$
> 7 Test #5: "Solidus is \N{SOLIDUS}." => "Solidus is /."$
> 8 Test #6: "Sudilso is \N{REVERSE SOLIDUS}." => "Sudilso is \."$
> 9 Test #7: "<----End of Here Doc 000000" => "<----End of Here Doc 000000"$
> 10 Test #8: "<----End of Here Doc 000000$
> 11 " => "<----End of Here Doc 000000$
> 12 "$
>
> Done with 9 tests.
>
> As you see, you have variables getting expanded--of course.
>
> You may wish to look into String::Interpolate on CPAN; it
> can use Safe compartments &c&c&c.
>
> MEANwhile, there was all this clearly important test in your message:
>
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Web Page: http://lug.boulder.co.us
> >>>>>>>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> >>>>>>>>>> Join us on IRC: lug.boulder.co.us port=6667
> >>>>>>>>>> channel=#hackingsociety
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Click to Call Me Now! --
> >>>>>>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
> >>>>>>>>>
> >>>>>>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
> >>>>>>>>> 720-837-8908 [cell], @goyishekop [twitter]
> >>>>>>>>> http://www.youtube.com/user/goyishekop [vlog]
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Web Page: http://lug.boulder.co.us
> >>>>>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> >>>>>>>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Click to Call Me Now! --
> >>>>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
> >>>>>>>
> >>>>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
> >>>>>>> 720-837-8908 [cell], @goyishekop [twitter]
> >>>>>>> http://www.youtube.com/user/goyishekop [vlog]
> >>>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Web Page: http://lug.boulder.co.us
> >>>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> >>>>>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Click to Call Me Now! --
> >>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
> >>>>>
> >>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
> >>>>> 720-837-8908 [cell], @goyishekop [twitter]
> >>>>> http://www.youtube.com/user/goyishekop [vlog]
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Click to Call Me Now! --
> >>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
> >>>>
> >>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
> >>>> 720-837-8908 [cell], @goyishekop [twitter]
> >>>> http://www.youtube.com/user/goyishekop [vlog]
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Web Page: http://lug.boulder.co.us
> >>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> >>>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Calvin: Know what I pray for?
> >>> Hobbes: What?
> >>> Calvin: The strength to change what I can, the inability to accept
> >>> what I can't, and the incapacity to tell the difference.
> >>>
> >>
> >>
> >>
> >> --
> >> Calvin: Know what I pray for?
> >> Hobbes: What?
> >> Calvin: The strength to change what I can, the inability to accept
> >> what I can't, and the incapacity to tell the difference.
> >> _______________________________________________
> >> Web Page: http://lug.boulder.co.us
> >> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> >> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
> >>
> > _______________________________________________
> > Web Page: http://lug.boulder.co.us
> > Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> > Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
>
> Well, or not. Hm. This could use some netnannying.
>
> ! #!/usr/bin/perl -w
> ! # netnanny - check news message for netiquette issues
> ! # tchrist at perl.com
> !
> ! use strict;
> !
> ! my $Msg = get_message();
> !
> ! missing_headers();
> ! bogus_address();
> ! allcap_subject();
> ! annoying_subject();
> !
> ! mimes();
> !
> ! lines_too_long();
> ! control_characters();
> !
> ! miswrapped();
> ! jeopardy_quoted();
> ! overquoted();
> !
> ! good_signature();
> !
> ! exit(0);
> !
> ! #######################
> !
> ! sub AUTOLOAD {
> ! use vars '$AUTOLOAD';
> ! my $field;
> ! ($field = uc($AUTOLOAD)) =~ s/.*:://;
> ! if (!defined wantarray) {
> ! require Carp;
> ! Carp::confess("Undefined function call: $AUTOLOAD");
> ! }
> ! $Msg->$field();
> ! }
> !
> ! sub bogus_address {
> ! my $address = from();
> !
> ! if ($address !~ /\@\w.*\.\w/) {
> ! print "From address must contain an at sign, etc.\n";
> ! return;
> ! }
> !
> ! if ($address =~ /(remove|spam)/i) {
> ! print "Munged return address suspected, found `$1' in from.\n";
> ! }
> !
> ! ck822($address); # inscrutable
> !
> ! my($host) = $address =~ /\@([a-zA-Z0-9_.-]+)/;
> ! dns_check($host); # very slow!
> ! }
> !
> ! sub control_characters {
> ! my $lineno = 0;
> ! my $MAX_CONTROL = 5;
> !
> ! for (lines()) {
> ! $lineno++;
> ! if (/(?=[^\s\b])([\000-\037])/) {
> ! printf "Control character (char %#o) appears at line %d of body.\n",
> ! ord $1, $lineno;
> ! }
> !
> ! if (/([\202-\237])/) {
> ! printf "MS-ASCII character (char %#o) appears at line %d of body.\n",
> ! ord $1, $lineno;
> ! }
> ! last if --$MAX_CONTROL < 0;
> ! }
> !
> ! }
> !
> ! sub lines_too_long {
> ! my $MAX_LINE_LEN = 80;
> ! my $line_count = scalar @{ [ lines() ] };
> ! my ($long_lines, $longest_line_data, $longest_line_number) = (0,'',0);
> ! my $lineno = 0;
> ! for (lines()) {
> ! $lineno++;
> ! next if /^[>+-]/; # skip quotes and patch diffs
> ! if (length() > $MAX_LINE_LEN) {
> ! $long_lines++;
> ! if (length() > length($longest_line_data)) {
> ! $longest_line_data = $_;
> ! $longest_line_number = $lineno;
> ! }
> ! }
> ! }
> ! if ($long_lines) {
> ! printf "%d of %d lines exceed maxlen %d, ",
> ! $long_lines, $line_count, $MAX_LINE_LEN;
> ! printf "longest is #%d at %d bytes\n",
> ! $longest_line_number, length($longest_line_data);
> ! }
> ! }
> !
> ! sub missing_headers {
> ! if (subject() !~ /\S/) {
> ! print "Missing required subject header.\n";
> ! }
> ! if (newsgroups() && subject() =~ /^\s*Re:/i && !references()) {
> ! print "Followup posting missing required references header.\n";
> ! }
> ! }
> !
> ! sub allcap_subject {
> ! my $subject = subject();
> ! $subject =~ s/^(\s*Re:\s*)+//i;
> ! if ($subject !~ /[a-z]/) {
> ! print "No lower-case letters in subject header.\n";
> ! }
> ! }
> !
> ! sub miswrapped {
> ! my($bq1, $bq2);
> ! for (paragraphs()) {
> ! next unless /\A(^\S.*){2,}\Z/ms; # no indented code
> !
> ! if (!$bq1 && /^>\S.*\n\s*[a-zA-Z]/) {
> ! print "Incorrectly wrapped quoted text.\n";
> ! $bq1++;
> ! }
> !
> ! next if $bq2;
> !
> ! my $count = 0;
> ! while (/^[^>].{60,}\n[^>].{1,20}\n(?=[^>].{60,}\n)/gm) {
> ! $count++;
> ! }
> !
> ! if ($count > 1) {
> ! print "Incorrectly wrapped regular text.\n";
> ! $bq2++;
> ! ### print "OOPS count = $count:\n$_\n\n";
> ! }
> ! }
> ! }
> !
> ! sub jeopardy_quoted {
> ! local $_ = body();
> ! $_ = unquote_wrap($_);
> !
> ! $_ = strip_signature($_);
> ! $_ = strip_attribution($_);
> !
> ! # check quotation at bottom but nowhere else
> ! # XXX: these can go superlong superlong! i've added
> ! # some more anchors and constraints to try to avoid this,
> ! # but I still mistrust it
> !
> ! #if (/ ((^\s*>.*){2,}) \s* \Z/mx !/ (\n>.*?)+ (\n[^>].*?)+ (\n>.*?)+ /x )
> ! if ( /(?:\n?>.*)+\s*\Z/ )
> ! {
> ! print "Quote follows response, Jeopardy style #1\n";
> ! }
> !
> ! # completely at bottom
> ! elsif (/^.* wr(?:ote|ites):\s+(>.*\n)+\s*\Z/m) {
> ! print "Quote follows response, Jeopardy style #2\n";
> ! }
> !
> ! # another way of saying the same
> ! elsif (/^(?:>+\s*)?-[_+]\s*Original Message\s*-[_+]\s.*\Z/ms) {
> ! print "Quote follows response, Jeopardy style #3\n";
> ! }
> !
> ! # another way of saying the same
> ! elsif (/^(?:>+\s*)?[_-]+\s*Reply Separator\s*[_-]+\s.*\Z/ms) {
> ! print "Quote follows response, Jeopardy style #4\n";
> ! }
> !
> ! }
> !
> ! sub overquoted {
> !
> ! # cfoq: check fascistly overquoted by tchrist at mox.perl.com
> ! # (wants perl 5.0 or better; developed under 5.002)
> !
> ! my (
> ! $total, # total number of lines, minus sig and attribution
> ! $quoted_lines, # how many lines were quoted
> ! $percent, # what percentage this in
> ! $pcount, # how many in this paragraph were counted
> ! $match_part, # holding space for current match
> ! $gotsig, # is this the sig paragraph?
> ! );
> !
> ! $total = $quoted_lines = $pcount = $percent = 0;
> !
> ! my $MINLINES = 20;
> ! my $TOLERANCE = 50;
> ! my $VERBOSE = 0;
> !
> ! if (body() =~ /^-+\s*Original Message\s*-+$/m) {
> ! my $body = strip_signature(body());
> ! my($text,$quote) = body() =~ /(.*)(^-+\s*Original Message\s*-+.*\Z)/ms;
> ! $total = ((my $x = body()) =~ y/\n//);
> ! $quoted_lines = ($quote =~ y/\n//);
> ! }
> ! else {
> ! for (paragraphs()) {
> !
> ! # strip sig line, remember we found it
> ! $gotsig = s/^-- \n.*//ms;
> !
> ! # strip attribution, possibly multiline
> ! if ($. == 2) { s/\A.*?(<.*?>|\@).*?:\n//s }
> !
> ! # toss trailing blank lines into one single line
> ! s/\n+\Z/\n/;
> !
> ! # now reduce miswrapped lines from idiotic broken PC newsreaders
> ! # into what they should have been
> ! s/(>.*)\n\s*([a-zA-Z])/$1 $2/g;
> !
> ! # count lines in this paragraph
> ! $total++ while /^./mg;
> !
> ! # is it a single line, quoted in the customary fashion?
> ! if ( /^(>+).*\n\Z/ ) {
> ! $quoted_lines++;
> ! print " 1 line quoted with $1\n" if $VERBOSE;
> ! next;
> ! }
> !
> ! # otherwise, it's a multiline block, which may be quoted
> ! # with any leading repeated string that's neither alphanumeric
> ! # nor string
> ! while (/^(([^\w\s]+).*\n)(\2.*\n)+/mg) { # YANETUT
> ! $quoted_lines += $pcount = ($match_part = $&) =~ tr/\n//;
> ! printf "%2d lines quoted with $2\n", $pcount if $VERBOSE;
> ! }
> !
> ! last if $gotsig;
> ! }
> !
> ! }
> !
> ! $percent = int($quoted_lines / $total * 100);
> !
> ! if ($total == $quoted_lines) {
> ! print "All $total lines were quoted lines!\n";
> ! # not ok
> ! }
> ! elsif ($percent > $TOLERANCE && $total > $MINLINES) {
> ! print "Overquoted: $quoted_lines lines quoted out of $total: $percent%\n";
> ! }
> !
> ! }
> !
> ! sub unquote_wrap {
> ! my $chunk = shift;
> ! # reduce miswrapped lines from idiotic broken PC newsreaders
> ! # into what they should have been
> ! $chunk =~ s/(>.*)\n\s*([a-zA-Z])/$1 $2/g;
> ! return $chunk;
> ! }
> !
> ! sub good_signature {
> ! my $MAX_SIGLINES = 4;
> !
> ! my $sig = '';
> ! my($is_canon, $separator);
> !
> ! my $body = body();
> !
> ! # sometimes the ms idiotware quotes at the bottom this way
> ! $body =~ s/^-+\s*Original Message\s*-+\s.*\Z//ms;
> !
> ! # first check regular signature
> ! if ($body =~ /\n-- \n(.*)/s) {
> ! $sig = $1;
> ! $is_canon = 1;
> ! }
> ! elsif ($body =~ /\n([_-]{2,}\s*)\n(.*?)$/s) {
> ! $separator = $1;
> ! $sig = $2;
> ! }
> !
> ! for ($separator, $sig) { s/\n\Z// if defined }
> !
> ! my $siglines = $sig =~ tr/\n//;
> !
> ! if ($separator && ($siglines && $siglines < 20)) {
> ! if ($separator eq '--') {
> ! print "Double-dash in signature missing trailing space.\n";
> ! } else {
> ! print "Non-canonical signature separator: `$separator'\n";
> ! }
> ! }
> !
> ! if ($siglines > $MAX_SIGLINES && $siglines < 20) {
> ! printf "Signature too long: %d lines\n", $siglines;
> ! }
> ! }
> !
> ! sub strip_signature {
> ! local $_ = shift;
> !
> ! s/\n-- \n(.*)//s
> ! ||
> ! s/\n([_-]{2,}\s*)\n(.*?)$//s;
> !
> ! return $_;
> ! }
> !
> ! sub attribution {
> ! local $_ = body();
> ! s/^\s*\[.*\]\s*//; # remove [courtesy cc]
> ! if (/\A(.*wr(?:ote|ites):?)\n/) {
> ! return $1;
> ! } elsif (/\A(.*?(<.*?>|\@).*?:)\n/s) {
> ! return $1;
> ! } else {
> ! return '';
> ! }
> ! }
> !
> ! sub strip_attribution {
> ! local $_ = shift;
> !
> ! s/^\s*\[.*\]\s*//; # remove [courtesy cc]
> !
> ! # XXX: duplicate code with previous function
> ! s/\A(.*wr(?:ote|ites):?)\n//
> ! ||
> ! s/\A(.*?(<.*?>|\@).*?:)\n//s;
> !
> ! return $_;
> ! }
> !
> ! sub annoying_subject {
> ! local $_ = subject();
> !
> ! if ( / ( [?!]{3,} ) /x ||
> ! / ( HELP ) /x ||
> ! / ( PLEASE ) /x
> ! )
> ! {
> ! print "Subject line contains annoying `$1' in it.\n";
> ! }
> ! }
> !
> ! sub mimes {
> !
> ! my $mime_crap = 0;
> !
> ! for (content_type()) {
> ! last unless defined;
> ! $mime_crap++;
> ! if (/multipart/i) {
> ! print "Multipart MIME detected.\n";
> ! }
> ! elsif (/html/i) {
> ! print "HTML encrypting detected.\n";
> ! }
> ! elsif (! (/^text$/i || m#^text/plain#i)) {
> ! print "Strange content type detected: $_\n";
> ! }
> ! }
> !
> ! for (content_transfer_encoding()) {
> ! last unless defined;
> ! if (/quoted-printable/i) {
> ! print "Gratuitously quoted-illegible MIMEing detected.\n";
> ! }
> ! }
> !
> ! unless ($mime_crap) {
> ! for (body()) {
> ! if (/\A\s*This message is in MIME format/i) {
> ! print "Gratuitous but unadvertised MIME detected.\n";
> ! }
> ! elsif ( /\A\s*This is a multi-part message in MIME format/i) {
> ! print "Unadvertised multipart MIME detected.\n";
> ! }
> ! }
> ! }
> !
> !
> ! }
> !
> ! sub dns_check {
> ! my $NSLOOKUP = 'nslookup'; # or /usr/ucb?
> !
> ! # first try an MX record, then an A rec (for badly configged hosts)
> !
> ! my $host = shift;
> ! local $/ = undef;
> ! local *NS;
> ! local $_;
> !
> ! # the following is commented out for security reasons:
> ! # if ( `nslookup -query=mx $host` =~ /mail exchanger/
> ! # otherwise there could be naughty bits in $host
> ! # we'll bypass system() and get right at execvp()
> !
> ! my $pid;
> !
> ! if ($pid = open(NS, "-|")) {
> ! $_ = <NS>;
> ! kill 'TERM', $pid if $pid; # just in case
> ! close NS;
> ! return if /mail exchanger/;
> ! # else fall trohugh to next test
> ! } else {
> ! die "cannot fork: $!" unless defined $pid;
> ! open(SE, ">&STDERR");
> ! open(STDERR, ">/dev/null");
> ! { exec $NSLOOKUP, '-query=mx', $host; } # braces for -w
> ! open(STDERR, ">&SE");
> ! die "can't exec nslookup: $!";
> ! }
> !
> ! if ($pid = open(NS, "-|")) {
> ! $_ = <NS>;
> ! kill 'TERM', $pid if $pid; # just in case
> ! close NS;
> ! unless (/answer:.*Address/s || /Name:.*$host.*Address:/si) {
> ! print "No DNS for \@$host in return address.\n";
> ! }
> ! } else {
> ! die "cannot fork: $!" unless defined $pid;
> ! open(SE, ">&STDERR");
> ! open(STDERR, ">/dev/null");
> ! { exec $NSLOOKUP, '-query=a', $host; } # braces for -w
> ! open(STDERR, ">&SE");
> ! die "can't exec nslookup: $!";
> ! }
> !
> ! }
> !
> !
> ! sub ck822 {
> !
> ! # ck822 -- check whether address is valid rfc 822 address
> ! # tchrist at perl.com
> ! #
> ! # pattern developed in program by jfriedl;
> ! # see "Mastering Regular Expressions" from ORA for details
> !
> ! # this will error on something like "ftp.perl.com." because
> ! # even though dns wants it, rfc822 hates it. shucks.
> !
> ! my $what = 'address';
> !
> ! my $address = shift;
> ! local $_;
> !
> ! my $is_a_valid_rfc_822_addr;
> !
> ! ($is_a_valid_rfc_822_addr = <<'EOSCARY') =~ s/\n//g;
> ! (?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n
> ! \015()]|\\[^\x80-\xff])*\))*\))*(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\
> ! xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"(?:[^\\\x80-\xff\n\015"
> ! ]|\\[^\x80-\xff])*")(?:(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xf
> ! f]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*\.(?:[\040\t]|\((?:[
> ! ^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\
> ! xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;
> ! :".\\\[\]\000-\037\x80-\xff])|"(?:[^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*"))
> ! *(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\
> ! n\015()]|\\[^\x80-\xff])*\))*\))*@(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\
> ! \[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\04
> ! 0)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-
> ! \xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?
> ! :[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80
> ! -\xff])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\(
> ! (?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]
> ! \000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\
> ! \x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]))*|(?:[^(\040)<>@,;:".\\\[\]\000-\0
> ! 37\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"(?:[^\\\x80-\xf
> ! f\n\015"]|\\[^\x80-\xff])*")(?:[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\03
> ! 7]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\
> ! \[^\x80-\xff])*\))*\)|"(?:[^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*")*<(?:[\04
> ! 0\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]
> ! |\\[^\x80-\xff])*\))*\))*(?:@(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x
> ! 80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@
> ! ,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]
> ! )|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?:[^\\
> ! \x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff
> ! ])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^
> ! \\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-
> ! \037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-
> ! \xff\n\015\[\]]|\\[^\x80-\xff])*\]))*(?:(?:[\040\t]|\((?:[^\\\x80-\xff\n\01
> ! 5()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*,(?
> ! :[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\0
> ! 15()]|\\[^\x80-\xff])*\))*\))*@(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^
> ! \x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<
> ! >@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xf
> ! f])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?:[^
> ! \\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\x
> ! ff])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:
> ! [^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\00
> ! 0-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x8
> ! 0-\xff\n\015\[\]]|\\[^\x80-\xff])*\]))*)*:(?:[\040\t]|\((?:[^\\\x80-\xff\n\
> ! 015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*)
> ! ?(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000
> ! -\037\x80-\xff])|"(?:[^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*")(?:(?:[\040\t]
> ! |\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[
> ! ^\x80-\xff])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xf
> ! f]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\
> ! \\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"(?:
> ! [^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*"))*(?:[\040\t]|\((?:[^\\\x80-\xff\n\
> ! 015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*@
> ! (?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n
> ! \015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff
> ! ]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\
> ! ]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\
> ! xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*\.(?:[\040\t]|\((?
> ! :[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80
> ! -\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@
> ! ,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff
> ! ])*\]))*(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x8
> ! 0-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*>)(?:[\040\t]|\((?:[^\\\x80-\xff\n\
> ! 015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*
> ! EOSCARY
> !
> ! if ($address !~ /^${is_a_valid_rfc_822_addr}$/o) {
> ! print "rfc822 failure on $address";
> ! }
> ! }
> !
> !
> ! ##############################
> !
> ! package Mail_Message;
> !
> ! use Carp;
> !
> ! use vars qw/$AUTOLOAD/;
> !
> ! # process <ARGV> for a message header and body
> ! # currently this assumes one message per file!
> ! sub main::get_message {
> ! my $msg = bless {}, __PACKAGE__;
> ! local $/ = '';
> ! $msg->{HEADER_STRING} = <>;
> ! chomp $msg->{HEADER_STRING};
> ! for (split /\n(?!\s)/, $msg->{HEADER_STRING}) {
> ! my($tag, $value) = /^([^\s:]+):\s*(.*)\s*\Z/s;
> ! push @{ $msg->{HEADERS}{$tag} }, $value;
> ! $tag =~ tr/-/_/;
> ! $tag = uc($tag);
> ! push @{ $msg->{$tag} }, $value;
> ! }
> ! local $/ = undef;
> ! for ($msg->{BODY} = <>) {
> ! $msg->{PARAGRAPHS} = [ split /\n\n+/ ];
> ! $msg->{LINES} = [ split /\n/ ];
> ! }
> !
> ! return $msg;
> ! }
> !
> ! sub AUTOLOAD {
> ! use vars '$AUTOLOAD';
> ! my $self = shift;
> ! my $field;
> ! ($field = uc($AUTOLOAD)) =~ s/.*:://;
> ! my $xfield = "x_" . $field;
> !
> ! if (!exists $self->{$field} && exists $self->{$xfield}) {
> ! $field = $xfield;
> ! }
> !
> ! unless (exists $self->{$field}) {
> ! return undef;
> ! # NOT REACHED
> ! confess "No field $field in message";
> ! }
> !
> ! my $data = $self->{$field};
> ! my @data = ref $data ? @$data : $data;
> !
> ! if (wantarray) {
> ! return @data;
> ! }
> ! else {
> ! return join("\n", @data);
> ! }
> !
> ! }
>
> YAN^HETUT.
>
> --tom
>
More information about the LUG
mailing list