[lug] finding text lines in a single file
Wagner, Carl
Carl.Wagner at Level3.com
Tue Apr 27 14:38:36 MDT 2004
Thanks! I will have to store that away for future use as well.
The file could potentially be quite large. So optimization is a good thing.
Thanks again,
Carl.
-----Original Message-----
From: Tkil [mailto:tkil at scrye.com]
Sent: Tuesday, April 27, 2004 2:26 PM
To: Wagner, Carl
Cc: Boulder (Colorado) Linux Users Group -- General Mailing List
Subject: Re: [lug] finding text lines in a single file
>>>>> "Carl" == Carl Wagner <Wagner> writes:
Carl> That should do it. Like I said, about 10 seconds.
If LogFile is really long, though, scanning through it multiple times
will be very slow. A better technique is to build a single regex with
all the candidates to match, then scan the log file once.
Not sure how to do it in just shell, but in perl (at a sh-ish prompt):
perl -we 'my $re = join "|", @ARGV;
while (<>) { print if /$re/o }' $( cat EntryFile ) < LogFile
In pure perl:
| #!/usr/bin/perl
|
| use strict;
| use warnings;
|
| require 5.006;
|
| unless ( @ARGV == 2 )
| {
|     die "usage: $0 EntryFile LogFile";
| }
|
| my ( $entry_file, $log_file ) = @ARGV;
|
| # build one alternation regex from the non-blank lines of EntryFile
| my $entry_re = do
| {
|     open my $entry_fh, '<', $entry_file
|         or die "$0: opening $entry_file: $!";
|     my $re = join '|',
|         grep { /\S/ }                       # anything left?
|         map { s/^\s+//; s/\s+$//; $_ }      # remove whitespace
|         <$entry_fh>;
|     qr/$re/
| };
|
| # scan LogFile once, printing every line that matches any entry
| open my $log_fh, '<', $log_file
|     or die "$0: opening $log_file: $!";
| while ( <$log_fh> )
| {
|     print if m/$entry_re/;
| }
| close $log_fh
|     or die "$0: closing $log_file: $!";
|
| exit 0;
Oh, duh, you can build up a regex almost as easily in the shell:
| #!/bin/bash
|
| entry_file=$1
| log_file=$2
|
| # join the whitespace-separated entries with "|"
| re=""
| sep=""
| for i in $( cat "$entry_file" )
| do
|     re="$re$sep$i"
|     sep="|"
| done
|
| exec grep -E "$re" "$log_file"
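Note that both of these solutions are likely to misbehave if EntryFile
contains regex metacharacters (such as ., *, [ or |), since each entry
is used as a pattern as-is; in the shell case, even whitespace within
an entry will be enough to [possibly] cause spurious matches, because
the for loop splits on it.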
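
If the entries are meant as literal strings rather than patterns, one way around both problems is to let grep read them as fixed strings, one per line. This is only a minimal sketch, not something from the scripts above: the function name match_fixed is made up for illustration, and it assumes a grep that supports -F and -f (POSIX grep does). Each line of EntryFile is taken literally, so "1.5" matches only "1.5" and an entry containing spaces stays in one piece.

```shell
#!/bin/bash
# Sketch: print every line of a log file that contains any line of the
# entry file as a literal substring. match_fixed is a hypothetical name.
match_fixed() {
    local entry_file=$1 log_file=$2
    # -F: treat each pattern as a fixed string, not a regex
    # -f: read the patterns from entry_file, one per line
    grep -F -f "$entry_file" -- "$log_file"
}

# When called with two arguments, behave like the scripts above.
if [ "$#" -eq 2 ]; then
    match_fixed "$1" "$2"
fi
```

One caveat: an empty line in EntryFile is an empty pattern, which matches every line of LogFile, so blank lines would need to be stripped first (the Perl version already skips them).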
Here's what I tested against.
| $ perl -lwe 'for ( 1 .. 100 ) {
|       printf "%03d 0x%04x\n", ( 100+rand(100) ) x 2
|   }' > carl1-log.txt
|
| $ cat carl1-log.txt
| 182 0x00b6
| 126 0x007e
| 128 0x0080
| [...]
| 184 0x00b8
| 140 0x008c
| 127 0x007f
| 165 0x00a5
| 146 0x0092
|
| $ cat carl1-entries.txt
| 123
| 124
| 125
| 126
|
| $ ./carl1.plx carl1-entries.txt carl1-log.txt
| 126 0x007e
| 125 0x007d
|
| $ ./carl1.sh carl1-entries.txt carl1-log.txt
| 126 0x007e
| 125 0x007d
t.