[lug] parsing between two lists
Tkil
tkil at scrye.com
Thu Mar 28 13:47:30 MST 2002
>>>>> "Rob" == Rob Riggs <Riggs> writes:
Rob> Unique in list A: diff -u listA listB | grep ^- | sed 's/^.//g'
Rob> Unique in list B: diff -u listA listB | grep ^+ | sed 's/^.//g'
Rob> Common to both: diff -u listA listB | grep "^ " | sed 's/^.//g'
hm. now that i think about it, this version of "common to both"
probably won't work: "-u" only prints 3 lines of context [by
default] around each change, so common lines that sit far from any
difference never show up in the output at all. take the two files:
    file1   file2
    a       a
    b       b
    c       c
    d       d
    e       e
            f
    g       g
the "-u" output should only have
     c
     d
     e
    +f
     g
thus dropping "a" and "b". let me see if i got it right...
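i think the least intrusive fix, and a reasonably efficient one
thanks to the mergesort capability of "sort -m", is the following
(it assumes both files are already sorted and that no line repeats
within a single file):
    sort -m file1 file2 | uniq -c | grep '^ *2 ' | sed 's/^ *2 //' > common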
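to see why a count of exactly 2 picks out the common lines, here is a
rough perl sketch of what that pipeline computes. it assumes each list
is already sorted and that no line repeats within one list. the sub
name common_by_count is made up for illustration, and a plain
in-memory sort stands in for the linear-time "sort -m" merge, so only
the result matches the pipeline, not its efficiency.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics "sort -m | uniq -c | keep count 2".
# Takes two sorted array refs and returns the lines common to both,
# assuming no line repeats inside a single list.
sub common_by_count
{
    my ($list1, $list2) = @_;
    my @merged = sort(@$list1, @$list2);   # stands in for "sort -m"
    my @common;
    my $i = 0;
    while ($i < @merged)
    {
        # measure the run of identical lines starting at $i ("uniq -c")
        my $j = $i;
        $j++ while $j < @merged && $merged[$j] eq $merged[$i];
        # a run of exactly 2 means one copy came from each list
        push @common, $merged[$i] if $j - $i == 2;
        $i = $j;
    }
    return @common;
}

my @file1 = qw(a b c d e g);
my @file2 = qw(a b c d e f g);
print join(" ", common_by_count(\@file1, \@file2)), "\n";   # prints "a b c d e g"
```

if a line did repeat inside one file, it would also reach a count of 2
and come out as a false "common" line, which is why the no-duplicates
assumption matters.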
if we want to stick with a perl script, we can take advantage of the
fact that the files are already sorted. instead of allocating a big
hash table for all the entries in one file, we can walk both files in
step, like the merge pass of a mergesort:
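    use strict;
    use warnings;

    # input names from the example above; output names are made up
    open(FIRST,     '<', 'file1')       or die "can't read file1: $!";
    open(SECOND,    '<', 'file2')       or die "can't read file2: $!";
    open(IN_FIRST,  '>', 'only_first')  or die "can't write only_first: $!";
    open(IN_SECOND, '>', 'only_second') or die "can't write only_second: $!";
    open(IN_COMMON, '>', 'common')      or die "can't write common: $!";

    # "lt"/"gt" compare in character-code order, so both files should be
    # sorted the same way (e.g. with LC_ALL=C sort)
    my $first = <FIRST>;
    my $second = <SECOND>;
    while (defined($first) && defined($second))
    {
        if ($first lt $second)
        {
            # $first can't show up later in SECOND: unique to FIRST
            print IN_FIRST $first;
            $first = <FIRST>;
        }
        elsif ($first gt $second)
        {
            # likewise, $second is unique to SECOND
            print IN_SECOND $second;
            $second = <SECOND>;
        }
        else
        {
            print IN_COMMON $first;
            $first = <FIRST>;
            $second = <SECOND>;
        }
    }
    # take care of stragglers: the line already read, plus everything
    # after it, exists in only one file
    while (defined($first))
    {
        print IN_FIRST $first;
        $first = <FIRST>;
    }
    while (defined($second))
    {
        print IN_SECOND $second;
        $second = <SECOND>;
    }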
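the same lockstep walk can be packaged as a sub over two in-memory
arrays, which makes it easy to try out without any files. this is a
sketch: the name merge_split is hypothetical, and the inputs are
assumed to be sorted in perl's default string order.

```perl
use strict;
use warnings;

# Hypothetical helper: split two sorted lists into
# (only in first, only in second, common) with one merge-style pass.
sub merge_split
{
    my ($x, $y) = @_;
    my (@only_x, @only_y, @common);
    my ($i, $j) = (0, 0);
    while ($i < @$x && $j < @$y)
    {
        if    ($x->[$i] lt $y->[$j]) { push @only_x, $x->[$i++]; }
        elsif ($x->[$i] gt $y->[$j]) { push @only_y, $y->[$j++]; }
        else                         { push @common, $x->[$i]; $i++; $j++; }
    }
    # stragglers: whatever is left belongs to one list only
    push @only_x, @$x[$i .. $#$x];
    push @only_y, @$y[$j .. $#$y];
    return (\@only_x, \@only_y, \@common);
}

my ($only1, $only2, $both) = merge_split([qw(a b c d e g)], [qw(a b c d e f g)]);
print "only in file1: @$only1\n";
print "only in file2: @$only2\n";
print "common:        @$both\n";
```

each element is looked at once, so the walk is linear in the combined
size of the two lists; when reading from files, only the current line
of each input needs to be held in memory.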
t.