[lug] Perl question: how to print embedded metacharacters
Tom Christiansen
tchrist at perl.com
Wed Nov 25 15:32:16 MST 2009
In-Reply-To: Message from Chip Atkinson <chip at pupman.com>
of "Wed, 25 Nov 2009 11:20:05 MST."
<Pine.LNX.4.10.10911251110100.9985-100000 at chip1.pupman.com>
I've taken the time and trouble to rethread && retread the agonizingly
borked message I'm replying to, because it was the only way I could
figure out what was going on. It was super-horrifically jeopardy-quoted
and miswrapped. I can't believe people read this sort of thing.
And there was no accumulated References header: !*SIGH*!
Jeopardy quoting is sometimes called bottom-posting. It puts the new
answer *before* the quoted old question in a confusing violation of cause
and effect. I've inverted the material back into FIFO order such that the
more-quoted portions precede the less-quoted answers. That way it can be
read in the same order in which it occurred. I really can't read it
backwards, and wonder who can. I am afraid I've messed up the attributions,
but this was the best I could do given what I received.
Besides the jeopardy problem, I had to fix all the ugly miswrappage just so
I could read the message. The miswrapping was stuff like:
>> It looks like you are using a newsreader that wraps
> lines for you
>> incorrectly. This is often a badly configured web browser.
> If it's
>> netscape, I suggest you report the bug to them. It mangles
> your
>> posting beyond any reasonable ability to read.
That's--well, execrable. But it doesn't have to be like that.
Don't accept bad wrapping: Fix it!
Here's a little front-end to Damian Conway's awesome Text::Autoformat
module to fix it up to (be) properly read:
>> It looks like you are using a newsreader that wraps lines for you
>> incorrectly. This is often a badly configured web browser. If
>> it's netscape, I suggest you report the bug to them. It mangles
>> your posting beyond any reasonable ability to read.
Here, you need this:
| #!/usr/local/bin/perl
| # afmt - autoformat reply text using...
| use Text::Autoformat;
|
| $| = 1;
|
| $/ = '';
| $size = 75;
|
| if (@ARGV && $ARGV[0] =~ /^-?(\d+)$/) {
| shift;
| $size = $1;
| }
|
| our $pid = postprocess();
|
| while (<>) {
| die unless 3 == (($leading, $text, $ending) = /^\A(\n*)(.*?)(\n*)\Z/s);
| print #$leading,
| autoformat($text => {
| squeeze => 0,
| all => 1,
| right => $size,
| # ignore => qr/^[ \t]/,
| }),
| $ending;
| }
| close STDOUT;
| waitpid($pid, 0);
|
| # backend filter processing clean up work squeeze multiple newlines between
| # paragraphs (with optional whitespace slot in lines) into single blank lines
| # only and make sure the last paragraph has just a regular single terminating
| # newline, never a blank line
| #
| sub postprocess {
| my $pid = open(STDOUT, "|-");
| die "$0: forkopen failed: $!" unless defined $pid;
| return $pid if $pid;
| $/ = ''; # redundant with above, but different proc
| while (<>) {
| s/\s+\z/\n\n/g;
| s/\n\n\z/\n/ if eof;
| print;
| }
| exit;
| }
|
Ok, and now for the message, placed in FIFO order the way it was spoken
instead of LIFO order the way it came to me. I can't read backwards.
>>>>>>>>>> Greetings,
>>>>>>>>>>
This:
>>>>>>>>>> I'm working on a perl script that reads strings from a
>>>>>>>>>> configuration
>>>>>>>> file
>>>>>>>>>> and prints them out. What I'd like to be able to do is have any
>>>>>>>> embedded
>>>>>>>>>> metacharacters interpreted but I'd like to avoid doing the
>>>>>>>> substitutions
>>>>>>>>>> myself.
is the erroneously quoted version of
>>>>>>>>>> I'm working on a perl script that reads strings from a
>>>>>>>>>> configuration file and prints them out. What I'd like to be
>>>>>>>>>> able to do is have any embedded metacharacters interpreted
>>>>>>>>>> but I'd like to avoid doing the substitutions myself.
>>>>>>>>>>
>>>>>>>>>> For example, here is a little script to emulate the problem:
>>>>>>>>>>
>>>>>>>>>> #!/usr/bin/perl
>>>>>>>>>>
>>>>>>>>>> my $kw = 'abc\ndef';
>>>>>>>>>>
>>>>>>>>>> print "$kw";
>>>>>>>>>> ----------------------
>>>>>>>>>> the output is
>>>>>>>>>> abc\ndef
>>>>>>>>>>
>>>>>>>>>> What I want is
>>>>>>>>>> abc
>>>>>>>>>> def
>>>>>>>>>>
>>>>>>>>>> --------------
>>>>>>>>>> I tried to jfgi but no joy.
>>>>>>>>>>
>>>>>>>>>> Thanks in advance.
>>>>>>>>>>
>>>>>>>>>> Chip
>>>>>>>> No, unfortunately.
>>>>>>>>
This:
>>>>>>>> The problem is that the script reads the string in from the command
>>>>>>>> file,
>>>>>>>> so I need to interpret the string before printing it. Here's another
>>>>>>>> example:
is the erroneously quoted version of
>>>>>>>> The problem is that the script reads the string in from the
>>>>>>>> command file, so I need to interpret the string before printing
>>>>>>>> it. Here's another example:
>>>>>>>> ---------------------
>>>>>>>> chip at chip-desktop:~$ cat try2.pl
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #my $kw = 'abc\ndef';
>>>>>>>> #my $nv = "${\($kw)}";
>>>>>>>>
>>>>>>>> while (<>) {
>>>>>>>> print "$_";
>>>>>>>> }
>>>>>>>> chip at chip-desktop:~$ echo 'abc\ndef' | ./try2.pl
>>>>>>>> abc\ndef
>>>>>>>> chip at chip-desktop:~$
>>>>>>>> ---------------------------------
This:
>>>>>>>> In a sense, I need the opposite of quotemeta -- something that takes
>>>>>>>> in a
>>>>>>>> string and does the metacharacter replacements.
is the erroneously quoted version of
>>>>>>>> In a sense, I need the opposite of quotemeta -- something that
>>>>>>>> takes in a string and does the metacharacter replacements.
>>>>>>>>
>>>>>>>> I know that something like
>>>>>>>> while (<>) {
>>>>>>>> s/\\n/\n/g;
>>>>>>>> # etc. for all metacharacter sequences...
>>>>>>>> print;
>>>>>>>> }
>>>>>>>>
>>>>>>>> would work but I wanted to avoid the "etc. for all metacharacter
>>>>>>>> sequences"
>>>>>>>>
>>>>>>>> Hopefully that's more understandable.
>>>>>>>>
>>>>>>>> Chip
>>>>>>>>
>>>>>>>> On Wed, 25 Nov 2009, Jeffrey Haemer wrote:
>>>>>>>>
>>>>>>>>> Chip,
>>>>>>>>>
>>>>>>>>> Can you do this instead?
>>>>>>>>>
>>>>>>>>> my $kw = "abc\ndef";
>>>>>>>>>
>>>>>>>>> If not, can you explain in enough detail that I can try again? :-)
>>>>>>>>>
>>>>>>>>> On Wed, Nov 25, 2009 at 8:29 AM, Chip Atkinson <chip at pupman.com>
>>>>>>>>> wrote:
>>>>>>> Chip,
>>>>>>>
>>>>>>> Okay. Would something like this be good enough?
>>>>>>>
>>>>>>>
>>>>>>> #!/usr/bin/perl
>>>>>>>
>>>>>>> while (<>) {
>>>>>>> $_=` printf "$_" `;
>>>>>>> print;
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 25, 2009 at 9:06 AM, Chip Atkinson <chip at pupman.com> wrote:
>>>>>>
>>>>>> Hmmmm... that does work, but isn't it using the bash printf rather than
>>>>>> perl? My concern is how fast it would be. I can experiment there
>>>>>> though,
>>>>>> and it does work.
>>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> Chip
>>>>>>
>>>>>> On Wed, 25 Nov 2009, Jeffrey Haemer wrote:
>>>>> Chip,
>>>>> Yep, I'm using the shell to do the dirty work.
>>>>> I don't doubt there are elegant, pure-Perl solutions. I was only trying
>>>>> to find something that would work for you, immediately, until someone else
>>>>> volunteers a more PC (Perlishly Correct) way. :-)
>>>>> On Wed, Nov 25, 2009 at 9:34 AM, Chip Atkinson <chip at pupman.com> wrote:
>>>> It didn't take much thought to realize I should have written "Perlitically
>>>> Correct."
>>>>
>>>> On Wed, Nov 25, 2009 at 9:41 AM, Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>> wrote:
>>> More Perlically Correct: (security considerations aside, I basically
>>> run the perl interpreter on the input and treat it as code again) Just
>>> make sure you do lots of input checking with this program.
>>>
>>> [09:57:25] beldyk at chupacabra: ~
>>> $ cat tmp.pl
>>> #!/usr/bin/perl -w
>>>
>>> my $str = <<EOS;
>>> foo\nbar\tfoo\a\t
>>> EOS
>>>
>>> my $nstr;
>>>
>>> eval("\$nstr = \"$str\";");
>>>
>>> print $nstr;
>>>
>>> [09:57:33] beldyk at chupacabra: ~
>>> $ ./tmp.pl
>>> foo
>>> bar foo
>>>
>> So it's more clear what my script is doing (and fixing the error with
>> the <<EOS thing (I initially wrote this with cat file|perl -ne
>> 'blah'))
>>
>> [10:10:25] beldyk at chupacabra: ~
>> $ ./tmp.pl
>> foo\nbar\tfoo\a\t
>> *****************************
>> foo
>> bar foo
>> [10:10:26] beldyk at chupacabra: ~
>> $ cat tmp.pl
>> #!/usr/bin/perl -w
>>
>> my $str = ' foo\nbar\tfoo\a\t';
>>
>> my $nstr;
>>
>> eval("\$nstr = \"$str\";");
>> print $str;
>> print "\n*****************************\n";
>> print $nstr;
> Thanks! Yes, this is pretty much what I was looking for.
>
> I think the thing that I must not have been clear about was that the
> metacharacters need to be interpreted at run time rather than compile
> time.
>
> Thus someting like
>
> $kx = "abc\ndef";
>
> gets evaluated at compile time so that the \n that you see in the text
> editor is converted to 0x0D and stuffed into the string before it's
> executed by the perl engine(?).
>
> I was playing with eval but wasn't sure how to go about it.
So it seems that what you want is to force a string through another
level of interpolation. So
interp('Please ring my \a.');
should return
interp("Please ring my \a.");
For this eval is the crux. A better solution would use a unique heredoc
terminator that's guaranteed *not* to occur in the string.
|| use charnames q(:full);
|| use strict;
||
|| our $EO_HEREDOC_TOKEN = q(<----End of Here Doc 000000);
||
|| sub interp {
|| my $interp_string = shift;
|| my $eo_heredoc_token = $EO_HEREDOC_TOKEN;
|| while ($interp_string =~ $eo_heredoc_token) {
|| $eo_heredoc_token =~ s{ \d+ \z }{ int rand 1e6 }ex;
|| }
|| my $evalstr = join(qq(\n), (
|| qq(<<"$eo_heredoc_token"),
|| $interp_string,
|| $eo_heredoc_token,
|| q()
|| )
|| );
|| 0 and print "\n[DEBUG: eval()ing:\n$evalstr]\n";
|| my $newstr = eval $evalstr;
|| die if $@;
|| chomp $newstr;
|| return $newstr;
|| }
||
||
|| our @tests = split(/ \s* \n \s* /x, <<'End of Interpolation Tests');
|| \uthis is \Ucapitalized.
|| I am $0.
|| Please ring my \a.
|| <----End of Here Doc 000000
|| tab \t formfeed \f return \r linefeed \n vertab \013 bs \b done.
|| Solidus is \N{SOLIDUS}.
|| Sudilso is \N{REVERSE SOLIDUS}.
|| End of Interpolation Tests
||
|| push @tests, $EO_HEREDOC_TOKEN;
|| push @tests, $EO_HEREDOC_TOKEN . "\n";
||
|| open(PIPE, "|cat -teunv") || die;
|| select(PIPE);
|| $| = 1;
||
|| for my $testno ( 0 .. $#tests ) {
|| my $was = $tests[$testno];
|| my $is = interp($was);
|| printf qq(Test #%d: "%s" => "%s"\n), $testno, $was, $is;
|| }
||
|| close(PIPE) || die;
|| select STDOUT;
|| printf "\nDone with %d tests.\n", 0+ at tests;
|| exit 0;
||
Which when run produces
1 Test #0: "\uthis is \Ucapitalized." => "This is CAPITALIZED."$
2 Test #1: "I am $0." => "I am interp-tester."$
3 Test #2: "Please ring my \a." => "Please ring my ^G."$
4 Test #3: "<----End of Here Doc 000000" => "<----End of Here Doc 000000"$
5 Test #4: "tab \t formfeed \f return \r linefeed \n vertab \013 bs \b done." => "tab ^I formfeed ^L return ^M linefeed $
6 vertab ^K bs ^H done."$
7 Test #5: "Solidus is \N{SOLIDUS}." => "Solidus is /."$
8 Test #6: "Sudilso is \N{REVERSE SOLIDUS}." => "Sudilso is \."$
9 Test #7: "<----End of Here Doc 000000" => "<----End of Here Doc 000000"$
10 Test #8: "<----End of Here Doc 000000$
11 " => "<----End of Here Doc 000000$
12 "$
Done with 9 tests.
As you see, you have variables getting expanded--of course.
You may wish to look into String::Interpolate on CPAN; it
can use Safe compartments &c&c&c.
MEANwhile, there was all this clearly important test in your message:
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Web Page: http://lug.boulder.co.us
>>>>>>>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>>>>>>>>>> Join us on IRC: lug.boulder.co.us port=6667
>>>>>>>>>> channel=#hackingsociety
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Click to Call Me Now! --
>>>>>>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
>>>>>>>>>
>>>>>>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>>>>>>> 720-837-8908 [cell], @goyishekop [twitter]
>>>>>>>>> http://www.youtube.com/user/goyishekop [vlog]
>>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Web Page: http://lug.boulder.co.us
>>>>>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>>>>>>>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Click to Call Me Now! --
>>>>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
>>>>>>>
>>>>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>>>>> 720-837-8908 [cell], @goyishekop [twitter]
>>>>>>> http://www.youtube.com/user/goyishekop [vlog]
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Web Page: http://lug.boulder.co.us
>>>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>>>>>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Click to Call Me Now! --
>>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
>>>>>
>>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>>> 720-837-8908 [cell], @goyishekop [twitter]
>>>>> http://www.youtube.com/user/goyishekop [vlog]
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Click to Call Me Now! --
>>>> http://seejeffrun.blogspot.com/2009/09/call-me-now.html
>>>>
>>>> Jeffrey Haemer <jeffrey.haemer at gmail.com>
>>>> 720-837-8908 [cell], @goyishekop [twitter]
>>>> http://www.youtube.com/user/goyishekop [vlog]
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Web Page: http://lug.boulder.co.us
>>>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>>>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
>>>>
>>>
>>>
>>>
>>> --
>>> Calvin: Know what I pray for?
>>> Hobbes: What?
>>> Calvin: The strength to change what I can, the inability to accept
>>> what I can't, and the incapacity to tell the difference.
>>>
>>
>>
>>
>> --
>> Calvin: Know what I pray for?
>> Hobbes: What?
>> Calvin: The strength to change what I can, the inability to accept
>> what I can't, and the incapacity to tell the difference.
>> _______________________________________________
>> Web Page: http://lug.boulder.co.us
>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
>>
> _______________________________________________
> Web Page: http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667 channel=#hackingsociety
Well, or not. Hm. This could use some netnannying.
! #!/usr/bin/perl -w
! # netnanny - check news message for netiquette issues
! # tchrist at perl.com
!
! use strict;
!
! my $Msg = get_message();
!
! missing_headers();
! bogus_address();
! allcap_subject();
! annoying_subject();
!
! mimes();
!
! lines_too_long();
! control_characters();
!
! miswrapped();
! jeopardy_quoted();
! overquoted();
!
! good_signature();
!
! exit(0);
!
! #######################
!
! sub AUTOLOAD {
! use vars '$AUTOLOAD';
! my $field;
! ($field = uc($AUTOLOAD)) =~ s/.*:://;
! if (!defined wantarray) {
! require Carp;
! Carp::confess("Undefined function call: $AUTOLOAD");
! }
! $Msg->$field();
! }
!
! sub bogus_address {
! my $address = from();
!
! if ($address !~ /\@\w.*\.\w/) {
! print "From address must contain an at sign, etc.\n";
! return;
! }
!
! if ($address =~ /(remove|spam)/i) {
! print "Munged return address suspected, found `$1' in from.\n";
! }
!
! ck822($address); # inscrutable
!
! my($host) = $address =~ /\@([a-zA-Z0-9_.-]+)/;
! dns_check($host); # very slow!
! }
!
! sub control_characters {
! my $lineno = 0;
! my $MAX_CONTROL = 5;
!
! for (lines()) {
! $lineno++;
! if (/(?=[^\s\b])([\000-\037])/) {
! printf "Control character (char %#o) appears at line %d of body.\n",
! ord $1, $lineno;
! }
!
! if (/([\202-\237])/) {
! printf "MS-ASCII character (char %#o) appears at line %d of body.\n",
! ord $1, $lineno;
! }
! last if --$MAX_CONTROL < 0;
! }
!
! }
!
! sub lines_too_long {
! my $MAX_LINE_LEN = 80;
! my $line_count = scalar @{ [ lines() ] };
! my ($long_lines, $longest_line_data, $longest_line_number) = (0,'',0);
! my $lineno = 0;
! for (lines()) {
! $lineno++;
! next if /^[>+-]/; # skip quotes and patch diffs
! if (length() > $MAX_LINE_LEN) {
! $long_lines++;
! if (length() > length($longest_line_data)) {
! $longest_line_data = $_;
! $longest_line_number = $lineno;
! }
! }
! }
! if ($long_lines) {
! printf "%d of %d lines exceed maxlen %d, ",
! $long_lines, $line_count, $MAX_LINE_LEN;
! printf "longest is #%d at %d bytes\n",
! $longest_line_number, length($longest_line_data);
! }
! }
!
! sub missing_headers {
! if (subject() !~ /\S/) {
! print "Missing required subject header.\n";
! }
! if (newsgroups() && subject() =~ /^\s*Re:/i && !references()) {
! print "Followup posting missing required references header.\n";
! }
! }
!
! sub allcap_subject {
! my $subject = subject();
! $subject =~ s/^(\s*Re:\s*)+//i;
! if ($subject !~ /[a-z]/) {
! print "No lower-case letters in subject header.\n";
! }
! }
!
! sub miswrapped {
! my($bq1, $bq2);
! for (paragraphs()) {
! next unless /\A(^\S.*){2,}\Z/ms; # no indented code
!
! if (!$bq1 && /^>\S.*\n\s*[a-zA-Z]/) {
! print "Incorrectly wrapped quoted text.\n";
! $bq1++;
! }
!
! next if $bq2;
!
! my $count = 0;
! while (/^[^>].{60,}\n[^>].{1,20}\n(?=[^>].{60,}\n)/gm) {
! $count++;
! }
!
! if ($count > 1) {
! print "Incorrectly wrapped regular text.\n";
! $bq2++;
! ### print "OOPS count = $count:\n$_\n\n";
! }
! }
! }
!
! sub jeopardy_quoted {
! local $_ = body();
! $_ = unquote_wrap($_);
!
! $_ = strip_signature($_);
! $_ = strip_attribution($_);
!
! # check quotation at bottom but nowhere else
! # XXX: these can go superlong superlong! i've added
! # some more anchors and constraints to try to avoid this,
! # but I still mistrust it
!
! #if (/ ((^\s*>.*){2,}) \s* \Z/mx !/ (\n>.*?)+ (\n[^>].*?)+ (\n>.*?)+ /x )
! if ( /(?:\n?>.*)+\s*\Z/ )
! {
! print "Quote follows response, Jeopardy style #1\n";
! }
!
! # completely at bottom
! elsif (/^.* wr(?:ote|ites):\s+(>.*\n)+\s*\Z/m) {
! print "Quote follows response, Jeopardy style #2\n";
! }
!
! # another way of saying the same
! elsif (/^(?:>+\s*)?-[_+]\s*Original Message\s*-[_+]\s.*\Z/ms) {
! print "Quote follows response, Jeopardy style #3\n";
! }
!
! # another way of saying the same
! elsif (/^(?:>+\s*)?[_-]+\s*Reply Separator\s*[_-]+\s.*\Z/ms) {
! print "Quote follows response, Jeopardy style #4\n";
! }
!
! }
!
! sub overquoted {
!
! # cfoq: check fascistly overquoted by tchrist at mox.perl.com
! # (wants perl 5.0 or better; developed under 5.002)
!
! my (
! $total, # total number of lines, minus sig and attribution
! $quoted_lines, # how many lines were quoted
! $percent, # what percentage this in
! $pcount, # how many in this paragraph were counted
! $match_part, # holding space for current match
! $gotsig, # is this the sig paragraph?
! );
!
! $total = $quoted_lines = $pcount = $percent = 0;
!
! my $MINLINES = 20;
! my $TOLERANCE = 50;
! my $VERBOSE = 0;
!
! if (body() =~ /^-+\s*Original Message\s*-+$/m) {
! my $body = strip_signature(body());
! my($text,$quote) = body() =~ /(.*)(^-+\s*Original Message\s*-+.*\Z)/ms;
! $total = ((my $x = body()) =~ y/\n//);
! $quoted_lines = ($quote =~ y/\n//);
! }
! else {
! for (paragraphs()) {
!
! # strip sig line, remember we found it
! $gotsig = s/^-- \n.*//ms;
!
! # strip attribution, possibly multiline
! if ($. == 2) { s/\A.*?(<.*?>|\@).*?:\n//s }
!
! # toss trailing blank lines into one single line
! s/\n+\Z/\n/;
!
! # now reduce miswrapped lines from idiotic broken PC newsreaders
! # into what they should have been
! s/(>.*)\n\s*([a-zA-Z])/$1 $2/g;
!
! # count lines in this paragraph
! $total++ while /^./mg;
!
! # is it a single line, quoted in the customary fashion?
! if ( /^(>+).*\n\Z/ ) {
! $quoted_lines++;
! print " 1 line quoted with $1\n" if $VERBOSE;
! next;
! }
!
! # otherwise, it's a multiline block, which may be quoted
! # with any leading repeated string that's neither alphanumeric
! # nor string
! while (/^(([^\w\s]+).*\n)(\2.*\n)+/mg) { # YANETUT
! $quoted_lines += $pcount = ($match_part = $&) =~ tr/\n//;
! printf "%2d lines quoted with $2\n", $pcount if $VERBOSE;
! }
!
! last if $gotsig;
! }
!
! }
!
! $percent = int($quoted_lines / $total * 100);
!
! if ($total == $quoted_lines) {
! print "All $total lines were quoted lines!\n";
! # not ok
! }
! elsif ($percent > $TOLERANCE && $total > $MINLINES) {
! print "Overquoted: $quoted_lines lines quoted out of $total: $percent%\n";
! }
!
! }
!
! sub unquote_wrap {
! my $chunk = shift;
! # reduce miswrapped lines from idiotic broken PC newsreaders
! # into what they should have been
! $chunk =~ s/(>.*)\n\s*([a-zA-Z])/$1 $2/g;
! return $chunk;
! }
!
! sub good_signature {
! my $MAX_SIGLINES = 4;
!
! my $sig = '';
! my($is_canon, $separator);
!
! my $body = body();
!
! # sometimes the ms idiotware quotes at the bottom this way
! $body =~ s/^-+\s*Original Message\s*-+\s.*\Z//ms;
!
! # first check regular signature
! if ($body =~ /\n-- \n(.*)/s) {
! $sig = $1;
! $is_canon = 1;
! }
! elsif ($body =~ /\n([_-]{2,}\s*)\n(.*?)$/s) {
! $separator = $1;
! $sig = $2;
! }
!
! for ($separator, $sig) { s/\n\Z// if defined }
!
! my $siglines = $sig =~ tr/\n//;
!
! if ($separator && ($siglines && $siglines < 20)) {
! if ($separator eq '--') {
! print "Double-dash in signature missing trailing space.\n";
! } else {
! print "Non-canonical signature separator: `$separator'\n";
! }
! }
!
! if ($siglines > $MAX_SIGLINES && $siglines < 20) {
! printf "Signature too long: %d lines\n", $siglines;
! }
! }
!
! sub strip_signature {
! local $_ = shift;
!
! s/\n-- \n(.*)//s
! ||
! s/\n([_-]{2,}\s*)\n(.*?)$//s;
!
! return $_;
! }
!
! sub attribution {
! local $_ = body();
! s/^\s*\[.*\]\s*//; # remove [courtesy cc]
! if (/\A(.*wr(?:ote|ites):?)\n/) {
! return $1;
! } elsif (/\A(.*?(<.*?>|\@).*?:)\n/s) {
! return $1;
! } else {
! return '';
! }
! }
!
! sub strip_attribution {
! local $_ = shift;
!
! s/^\s*\[.*\]\s*//; # remove [courtesy cc]
!
! # XXX: duplicate code with previous function
! s/\A(.*wr(?:ote|ites):?)\n//
! ||
! s/\A(.*?(<.*?>|\@).*?:)\n//s;
!
! return $_;
! }
!
! sub annoying_subject {
! local $_ = subject();
!
! if ( / ( [?!]{3,} ) /x ||
! / ( HELP ) /x ||
! / ( PLEASE ) /x
! )
! {
! print "Subject line contains annoying `$1' in it.\n";
! }
! }
!
! sub mimes {
!
! my $mime_crap = 0;
!
! for (content_type()) {
! last unless defined;
! $mime_crap++;
! if (/multipart/i) {
! print "Multipart MIME detected.\n";
! }
! elsif (/html/i) {
! print "HTML encrypting detected.\n";
! }
! elsif (! (/^text$/i || m#^text/plain#i)) {
! print "Strange content type detected: $_\n";
! }
! }
!
! for (content_transfer_encoding()) {
! last unless defined;
! if (/quoted-printable/i) {
! print "Gratuitously quoted-illegible MIMEing detected.\n";
! }
! }
!
! unless ($mime_crap) {
! for (body()) {
! if (/\A\s*This message is in MIME format/i) {
! print "Gratuitous but unadvertised MIME detected.\n";
! }
! elsif ( /\A\s*This is a multi-part message in MIME format/i) {
! print "Unadvertised multipart MIME detected.\n";
! }
! }
! }
!
!
! }
!
! sub dns_check {
! my $NSLOOKUP = 'nslookup'; # or /usr/ucb?
!
! # first try an MX record, then an A rec (for badly configged hosts)
!
! my $host = shift;
! local $/ = undef;
! local *NS;
! local $_;
!
! # the following is commented out for security reasons:
! # if ( `nslookup -query=mx $host` =~ /mail exchanger/
! # otherwise there could be naughty bits in $host
! # we'll bypass system() and get right at execvp()
!
! my $pid;
!
! if ($pid = open(NS, "-|")) {
! $_ = <NS>;
! kill 'TERM', $pid if $pid; # just in case
! close NS;
! return if /mail exchanger/;
! # else fall trohugh to next test
! } else {
! die "cannot fork: $!" unless defined $pid;
! open(SE, ">&STDERR");
! open(STDERR, ">/dev/null");
! { exec $NSLOOKUP, '-query=mx', $host; } # braces for -w
! open(STDERR, ">&SE");
! die "can't exec nslookup: $!";
! }
!
! if ($pid = open(NS, "-|")) {
! $_ = <NS>;
! kill 'TERM', $pid if $pid; # just in case
! close NS;
! unless (/answer:.*Address/s || /Name:.*$host.*Address:/si) {
! print "No DNS for \@$host in return address.\n";
! }
! } else {
! die "cannot fork: $!" unless defined $pid;
! open(SE, ">&STDERR");
! open(STDERR, ">/dev/null");
! { exec $NSLOOKUP, '-query=a', $host; } # braces for -w
! open(STDERR, ">&SE");
! die "can't exec nslookup: $!";
! }
!
! }
!
!
! sub ck822 {
!
! # ck822 -- check whether address is valid rfc 822 address
! # tchrist at perl.com
! #
! # pattern developed in program by jfriedl;
! # see "Mastering Regular Expressions" from ORA for details
!
! # this will error on something like "ftp.perl.com." because
! # even though dns wants it, rfc822 hates it. shucks.
!
! my $what = 'address';
!
! my $address = shift;
! local $_;
!
! my $is_a_valid_rfc_822_addr;
!
! ($is_a_valid_rfc_822_addr = <<'EOSCARY') =~ s/\n//g;
! (?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n
! \015()]|\\[^\x80-\xff])*\))*\))*(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\
! xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"(?:[^\\\x80-\xff\n\015"
! ]|\\[^\x80-\xff])*")(?:(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xf
! f]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*\.(?:[\040\t]|\((?:[
! ^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\
! xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;
! :".\\\[\]\000-\037\x80-\xff])|"(?:[^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*"))
! *(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\
! n\015()]|\\[^\x80-\xff])*\))*\))*@(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\
! \[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\04
! 0)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-
! \xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?
! :[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80
! -\xff])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\(
! (?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]
! \000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\
! \x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]))*|(?:[^(\040)<>@,;:".\\\[\]\000-\0
! 37\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"(?:[^\\\x80-\xf
! f\n\015"]|\\[^\x80-\xff])*")(?:[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\03
! 7]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\
! \[^\x80-\xff])*\))*\)|"(?:[^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*")*<(?:[\04
! 0\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]
! |\\[^\x80-\xff])*\))*\))*(?:@(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x
! 80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@
! ,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]
! )|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?:[^\\
! \x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff
! ])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^
! \\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-
! \037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-
! \xff\n\015\[\]]|\\[^\x80-\xff])*\]))*(?:(?:[\040\t]|\((?:[^\\\x80-\xff\n\01
! 5()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*,(?
! :[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\0
! 15()]|\\[^\x80-\xff])*\))*\))*@(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^
! \x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<
! >@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xf
! f])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?:[^
! \\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\x
! ff])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:
! [^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\00
! 0-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x8
! 0-\xff\n\015\[\]]|\\[^\x80-\xff])*\]))*)*:(?:[\040\t]|\((?:[^\\\x80-\xff\n\
! 015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*)
! ?(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000
! -\037\x80-\xff])|"(?:[^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*")(?:(?:[\040\t]
! |\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[
! ^\x80-\xff])*\))*\))*\.(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xf
! f]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\
! \\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"(?:
! [^\\\x80-\xff\n\015"]|\\[^\x80-\xff])*"))*(?:[\040\t]|\((?:[^\\\x80-\xff\n\
! 015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*@
! (?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n
! \015()]|\\[^\x80-\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff
! ]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\
! ]]|\\[^\x80-\xff])*\])(?:(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\
! xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*\.(?:[\040\t]|\((?
! :[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80
! -\xff])*\))*\))*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@
! ,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff
! ])*\]))*(?:[\040\t]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff]|\((?:[^\\\x8
! 0-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*>)(?:[\040\t]|\((?:[^\\\x80-\xff\n\
! 015()]|\\[^\x80-\xff]|\((?:[^\\\x80-\xff\n\015()]|\\[^\x80-\xff])*\))*\))*
! EOSCARY
!
! if ($address !~ /^${is_a_valid_rfc_822_addr}$/o) {
! print "rfc822 failure on $address";
! }
! }
!
!
! ##############################
!
! package Mail_Message;
!
! use Carp;
!
! use vars qw/$AUTOLOAD/;
!
! # process <ARGV> for a message header and body
! # currently this assumes one message per file!
! sub main::get_message {
! my $msg = bless {}, __PACKAGE__;
! local $/ = '';
! $msg->{HEADER_STRING} = <>;
! chomp $msg->{HEADER_STRING};
! for (split /\n(?!\s)/, $msg->{HEADER_STRING}) {
! my($tag, $value) = /^([^\s:]+):\s*(.*)\s*\Z/s;
! push @{ $msg->{HEADERS}{$tag} }, $value;
! $tag =~ tr/-/_/;
! $tag = uc($tag);
! push @{ $msg->{$tag} }, $value;
! }
! local $/ = undef;
! for ($msg->{BODY} = <>) {
! $msg->{PARAGRAPHS} = [ split /\n\n+/ ];
! $msg->{LINES} = [ split /\n/ ];
! }
!
! return $msg;
! }
!
! sub AUTOLOAD {
! use vars '$AUTOLOAD';
! my $self = shift;
! my $field;
! ($field = uc($AUTOLOAD)) =~ s/.*:://;
! my $xfield = "x_" . $field;
!
! if (!exists $self->{$field} && exists $self->{$xfield}) {
! $field = $xfield;
! }
!
! unless (exists $self->{$field}) {
! return undef;
! # NOT REACHED
! confess "No field $field in message";
! }
!
! my $data = $self->{$field};
! my @data = ref $data ? @$data : $data;
!
! if (wantarray) {
! return @data;
! }
! else {
! return join("\n", @data);
! }
!
! }
YAN^HETUT.
--tom
More information about the LUG
mailing list