[lug] finding text lines in a single file

Matt Thompson thompsma at colorado.edu
Tue Apr 27 15:41:29 MDT 2004


On Tue, 2004-04-27 at 14:26, Tkil wrote:
> >>>>> "Carl" == Carl Wagner <Wagner> writes:
> 
> Carl> That should do it.  Like I said, about 10 seconds.
> 
> If LogFile is really long, though, scanning through it multiple times
> will be very slow.  A better technique is to build a single regex with
> all the candidates to match, then scan the log file once.
> 
> Not sure how to do it in just shell, but in perl (at a sh-ish prompt):
> 
> perl -we 'my $re = join "|", @ARGV;
>           while (<>) { print if /$re/o }' $( cat EntryFile ) < LogFile
<snip>

I thought I'd weigh in with a sample awk script.  Note that I am not an
awk hacker and am just trying to learn it, so I'm betting this sucker
can be made a lot more efficient and is probably leaking memory or
something.  To wit, using the same test files as Tkil's perl script:

$ awk -f carlmatch.awk EntryFile LogFile

carlmatch.awk:

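BEGIN {
   # Load every line of EntryFile as a pattern, then blank it out of
   # ARGV so awk's main loop only reads LogFile.
   while ((getline line < ARGV[1]) > 0)
      arr[++nm] = line
   close(ARGV[1])
   ARGV[1] = ""
}
{
   # Print a log line once if its first field matches any pattern.
   for (i = 1; i <= nm; i++)
      if ($1 ~ arr[i]) { print; break }
}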

Again, this is quite dependent on Tkil's style of files: each entry is
only tested against the first field ($1) of each log line.
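
For comparison, here is a rough sketch of the single-regex idea from the
quoted message, done with awk inside a shell function.  The function
name match_entries is made up for illustration, and the sketch assumes
every non-empty line of the entry file is a valid regex and that only
the first field of each log line should be tested, as in the script
above.  It joins the entries with "|" and then reads the log once:

```shell
# match_entries ENTRYFILE LOGFILE   (hypothetical helper name)
# Joins every non-empty line of ENTRYFILE into one alternation regex,
# then reads LOGFILE once and prints each line whose first field matches.
match_entries() {
    awk '
        # First file: append each non-empty entry to the regex.
        NR == FNR { if ($0 != "") re = (re == "" ? $0 : re "|" $0); next }
        # Second file: print the line if field 1 matches the combined regex.
        re != "" && $1 ~ re
    ' "$1" "$2"
}
```

Usage would be "match_entries EntryFile LogFile".  Since the entries are
treated as regexes, any metacharacters in them keep their special
meaning; that is an assumption carried over from the scripts above, not
something checked here.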

-- 
Matt Thompson -- http://ucsub.colorado.edu/~thompsma/
440 UCB, Boulder, CO  80309-0440
JILA A510, 303-492-4662