[lug] finding text lines in a single file
Matt Thompson
thompsma at colorado.edu
Tue Apr 27 15:41:29 MDT 2004
On Tue, 2004-04-27 at 14:26, Tkil wrote:
> >>>>> "Carl" == Carl Wagner <Wagner> writes:
>
> Carl> That should do it. Like I said, about 10 seconds.
>
> If LogFile is really long, though, scanning through it multiple times
> will be very slow. A better technique is to build a single regex with
> all the candidates to match, then scan the log file once.
>
> Not sure how to do it in just shell, but in perl (at a sh-ish prompt):
>
> perl -we 'my $re = join "|", @ARGV;
> while (<STDIN>) { print if /$re/o }' $( cat EntryFile ) < LogFile
<snip>
I thought I'd weigh in with a sample awk script. Note that I am not an
awk hacker and am just trying to learn it, so I'm betting this sucker
can be made much more efficient and probably has a memory leak or
something. To wit, using the same test files as Tkil's perl script:
$ awk -f carlmatch.awk EntryFile LogFile
carlmatch.awk:
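BEGIN {
    # Read the patterns from the first file named on the command line.
    while ((getline line < ARGV[1]) > 0)
        arr[++nm] = line
    close(ARGV[1])
    ARGV[1] = ""    # keep awk from reading EntryFile again as normal input
}
# Every remaining input line comes from LogFile, read once by awk itself.
{
    for (i = 1; i <= nm; i++)
        if ($1 ~ arr[i]) {
            print
            break   # print each line once, even if several entries match
        }
}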
Again, this is quite dependent on Tkil's style of files: it only tests
the first field of each LogFile line.
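
For comparison, here is a rough sketch of the single-regex idea from the
quoted message, done in awk instead of perl. It is only a sketch under some
assumptions: the sample file contents and the match_once function name are
made up for illustration (the real EntryFile and LogFile are not shown in
this thread), and like the script above it only tests the first field.

```shell
#!/bin/sh
# Hypothetical sample data, invented for this example.
tmp=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmp/EntryFile"
printf 'alpha one\ngamma two\nbeta three\nalphabet four\n' > "$tmp/LogFile"

# match_once ENTRYFILE LOGFILE
# Joins every non-empty entry into one alternation regex, then scans the
# log a single time and prints lines whose first field matches.
match_once() {
    awk '
        # First file: NR == FNR only holds while reading it.
        NR == FNR {
            if ($0 != "")
                re = (re == "") ? $0 : (re "|" $0)
            next
        }
        # Second file: one regex test per line, unanchored, so an entry
        # also matches as a substring of the first field.
        re != "" && $1 ~ re
    ' "$1" "$2"
}

result=$(match_once "$tmp/EntryFile" "$tmp/LogFile")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data this prints the "alpha one", "beta three" and
"alphabet four" lines. The trade-off is the same as in the perl version:
one regex test per log line instead of one per entry, but every entry is
treated as a regular expression, so entries containing characters like "."
or "[" would need escaping first.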
--
Matt Thompson -- http://ucsub.colorado.edu/~thompsma/
440 UCB, Boulder, CO 80309-0440
JILA A510, 303-492-4662