[lug] Those Pesky Newlines

Tkil tkil at scrye.com
Fri Dec 27 23:28:52 MST 2002


I was just looking at some of the other replies, and (assuming your
documents aren't huge, where "huge" means "more than a few megabytes")
you can slurp the whole document into one string and unwrap the
paragraphs with a few regex passes, like this:

------------------------------------------------------------------------

#!/usr/bin/perl -w
#
# to-do:
# * handle blockquotes sanely.
# * consider switching to line-at-a-time to avoid huge memory overhead.

use strict;

# read document into single string:
my $text = do { local $/; <DATA> };

# try to handle all sorts of end-of-line (eol) markers.  CRLF has to
# come first, otherwise the bare CR alternative matches half of it.
my $eol = qr/ (?: \015\012 |      # CRLF: dos, windows, vms
                  \015     |      # CR:   macintosh
                  \012     ) /x;  # LF:   unix

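# record the first EOL so we can use it later; fall back to LF if the
# text has no line breaks at all.
my ($found_eol) = ( $text =~ m/($eol)/ );
$found_eol = "\012" unless defined $found_eol;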

# find a string that isn't used in the original text
my $magic = "xyzzy";
++$magic while ( index ( $text, $magic ) > -1 );

# horizontal white space
my $hws = qr/ (?: \t | \040 ) /x;

# replace anything that looks like a paragraph break with the magic string
$text =~ s! $eol (?: $hws* $eol )+ | # 1+ blank lines
            $eol $hws+               # line break + indent
          !$magic!gx;

# then replace all the rest of the line breaks with a single space.
# as an effort to be nice, we watch for end-of-line hyphens and rejoin
# the word (this also eats hyphens that belonged there, like "well-known"
# split across lines).
$text =~ s/-$eol//g;
$text =~ s/$eol/ /g;

# finally, replace the magic markers (heh) with a line break, so each
# paragraph ends up on a single line.
$text =~ s/$magic/$found_eol/g;
$text .= $found_eol;

# and output it.
print $text;

# bye!
exit 0;

__END__

This is a test.  This is only a test.  Were this a real emergency,
you'd all be dead by now.

This is a new paragraph.  The program should have figured that out on
its own.

   Here's a paragraph that starts with some whitespace.

   What should we do with paragraphs that are indented regularly, like
   blockquotes?  This might be interesting...

Oh, and it should know that "xyzzy" is the magic word, and use
something else.
This should still be in the same paragraph, tho.
    
The previous line has some whitespace, make sure that it's ignored.

Here's a new paragraph; let's see if we can't try to end it with a
hyphen at some point.  Like, maybe now.  Or, just in a hap-
hazard way...
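
------------------------------------------------------------------------

A side note on alternations like the $eol pattern: perl takes the first
alternative that matches, not the longest one, so it matters whether CRLF
is listed ahead of bare CR.  Here's a small sketch that counts the line
breaks each ordering sees in a DOS-style string.  The variable names and
the sample text are made up for the demo.

```perl
#!/usr/bin/perl -w
use strict;

# two orderings of the same three alternatives (names are hypothetical)
my $cr_first   = qr/ (?: \015 | \015\012 | \012 ) /x;
my $crlf_first = qr/ (?: \015\012 | \015 | \012 ) /x;

# two lines of text with DOS line endings
my $dos_text = "one\015\012two\015\012";

# count how many separate line breaks each pattern finds
my $n_cr_first   = () = $dos_text =~ /$cr_first/g;
my $n_crlf_first = () = $dos_text =~ /$crlf_first/g;

print "CR first:   $n_cr_first breaks\n";     # CR and LF counted separately
print "CRLF first: $n_crlf_first breaks\n";   # each CRLF counted once
```

With CR first, every CRLF looks like two line breaks in a row, which a
paragraph-break regex would then read as a blank line.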

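------------------------------------------------------------------------

In case the "++$magic" trick looks odd: it leans on perl's string
auto-increment, which bumps the last letter and carries to the left
when it runs past "z".  A tiny sketch, with a made-up sample string
that already contains the first two candidates:

```perl
#!/usr/bin/perl -w
use strict;

# sample text (invented for this demo) that uses up "xyzzy" and "xyzzz"
my $text  = "say xyzzy, then say xyzzz";
my $magic = "xyzzy";

# keep incrementing the string until it no longer occurs in the text
++$magic while ( index( $text, $magic ) > -1 );

print "$magic\n";   # xyzzy -> xyzzz -> xzaaa
```

The marker stays purely alphabetic, so it can be dropped into a regex
later without any quoting.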

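------------------------------------------------------------------------

For the "line-at-a-time" to-do item, something along these lines might
work.  It's only a sketch under a few assumptions: "unwrap_stream" is a
name I made up, it only understands LF and CRLF input (bare-CR files
would need $/ set differently), and it always writes "\n" instead of
remembering the original EOL.  It holds just the current paragraph in
memory rather than the whole document.

```perl
#!/usr/bin/perl -w
use strict;

# unwrap_stream (hypothetical helper): read lines from $in and print one
# line per paragraph to $out.
sub unwrap_stream {
    my ( $in, $out ) = @_;
    my $para = '';

    while ( defined( my $line = <$in> ) ) {
        $line =~ s/\015?\012\z//;               # strip LF or CRLF

        my $blank    = $line =~ /^[\t ]*\z/;    # empty or whitespace only
        my $indented = $line =~ s/^[\t ]+//;    # strip and note any indent

        # a blank or indented line ends the paragraph in progress
        if ( ( $blank or $indented ) and length $para ) {
            print {$out} "$para\n";
            $para = '';
        }
        next if $blank;

        if    ( $para eq '' )      { $para  = $line }
        elsif ( $para =~ s/-\z// ) { $para .= $line }      # rejoin hyphenated word
        else                       { $para .= " $line" }
    }

    # flush whatever is left at end of input
    print {$out} "$para\n" if length $para;
    return;
}
```

To use it as a filter you'd call unwrap_stream( \*STDIN, \*STDOUT ).
Like the slurping version, it treats every indented line as the start
of a new paragraph, so blockquotes still come out one line at a time.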


