[lug] Apache log summaries

David L. Anselmi anselmi at anselmi.us
Sat Aug 30 18:01:47 MDT 2008


Chris McDermott wrote:
> But I realized that there was nothing
> summarizing the list of url requests resulting in a 404 error.

logwatch does that.  But for my simple site the 404 list is mostly noise.

> It would be best, of course, to synchronize running the script with
> the log file rotation schedule.

Even if logwatch's reports don't handle your multiple sites the way you 
want, you can write a module for logwatch and then it will handle the 
scheduling for you.

> my $dstEmail = @ARGV[1];
> my $srcEmail = "logwatch\@domain.com";
> my $mailserver = "mailserver";
> my $apacheLog = @ARGV[0];

I'd be inclined to check that you got something sane on the command line 
and print usage if not.  (When you have 50 of these, being able to type 
"foo" and get the argument list is handy.)

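A rough sketch of that kind of check, assuming the script takes the log
file first and the destination address second, as in the quoted code.
The sub name check_args and the usage wording are made up for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (log, email) or dies with a usage message.
sub check_args {
    my @args  = @_;
    my $usage = "usage: $0 <apache-log> <dest-email>\n";
    die $usage unless @args == 2;
    my ($log, $email) = @args;
    die "$0: cannot read $log\n$usage" unless -r $log;
    die "$0: $email does not look like an address\n$usage"
        unless $email =~ /\@/;
    return ($log, $email);
}

# In the script itself this would be:
#   my ($apacheLog, $dstEmail) = check_args(@ARGV);
```

Single elements are normally written $ARGV[0]; @ARGV[0] is a one-element
slice, which works but draws a warning under "use warnings".
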
> open ACCESSLOG, "$apacheLog" or die $!;
> 
> my @accesses = <ACCESSLOG>;
> 
> close ACCESSLOG;
> 
[...]
> 
> foreach my $line (@accesses) {

So you're going to read the whole file into memory and then process it 
line by line.  I'd be inclined to use:

while (<ACCESSLOG>) {
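
Spelled out a little more, a sketch of the line-at-a-time version might
look like this.  The lexical filehandle and three-argument open are my
substitution for the bareword ACCESSLOG, and count_lines is a made-up
name; the real per-line work would replace the comment in the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: walks a log one line at a time and returns the
# number of lines seen.  Only the current line is held in memory.
sub count_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!\n";
    my $n = 0;
    while (my $line = <$fh>) {
        chomp $line;
        # per-line parsing and counting would go here
        $n++;
    }
    close($fh);
    return $n;
}
```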

>         # Count the number of requests from each IP address
>         if (exists $ips{$ip}) {
>                 $ips{$ip}++;
>         }
>         else {
>                 $ips{$ip} = 1;
>         }

Perl treats an undefined value as 0 when you increment it, so 
$ips{$ip}++ works on a new key and you don't need the conditional.
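
For instance, with some made-up addresses standing in for what the
parser would produce:

```perl
use strict;
use warnings;

my %ips;
for my $ip (qw(10.0.0.1 10.0.0.2 10.0.0.1)) {
    $ips{$ip}++;    # a missing key starts as undef, which ++ treats as 0
}
```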


>         # Save each url that resulted in a 404 error
>         if (($code == 404) && !(exists $urls{$url})) {
>                 $urls{$url} = " ";
>         }

This is where you can count the occurrences: increment $urls{$url} 
rather than storing a blank.
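
A sketch of what counting the 404s could look like, assuming the code
and url have already been pulled out of each line.  Both sub names and
the report layout (count, tab, url) are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [code, url] pairs and returns a hash ref
# mapping each 404 url to the number of times it was requested.
sub count_404s {
    my @requests = @_;
    my %urls;
    for my $req (@requests) {
        my ($code, $url) = @$req;
        $urls{$url}++ if $code == 404;
    }
    return \%urls;
}

# Hypothetical helper: report lines, most-requested url first, with
# ties broken alphabetically.
sub report_404s {
    my ($urls) = @_;
    my @lines;
    for my $url (sort { $urls->{$b} <=> $urls->{$a} || $a cmp $b } keys %$urls) {
        push @lines, "$urls->{$url}\t$url\n";
    }
    return @lines;
}
```

Sorting by count puts the urls that matter at the top and leaves the
one-off noise at the bottom.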

> my @sorted = reverse sort {$ips{$a} <=> $ips{$b}} keys %ips;
> for (my $c=0; $c<10; $c++) {
>         if ($c > (@sorted - 1)) {
>                 next;
>         }
>         push (@body, "$sorted[$c]\t\t\t$ips{$sorted[$c]}\n");
> }

That's a nice bit of code.  Perl's break is spelled "last" (it ends the 
loop, where next only skips to the next iteration), so you could use 
that instead of next.  Or you could do:

if ( $#sorted > 9 ) { $#sorted = 9 }   # keep only the top ten
while (@sorted) {
   my $ip = shift(@sorted);
   push(@body, "$ip\t\t\t$ips{$ip}\n");
}
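
Here is the same idea as something you can run on its own.  The sub name
top_ips is hypothetical, and swapping $a and $b in the sort block is my
stand-in for "reverse sort"; the extra "|| $a cmp $b" only makes the
order of ties predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: returns up to $n report lines, busiest address first.
sub top_ips {
    my ($ips, $n) = @_;
    my @sorted = sort { $ips->{$b} <=> $ips->{$a} || $a cmp $b } keys %$ips;
    $#sorted = $n - 1 if $#sorted > $n - 1;    # truncate long lists only
    my @lines;
    for my $ip (@sorted) {
        push @lines, "$ip\t\t\t$ips->{$ip}\n";
    }
    return @lines;
}

# Usage in the script would be something like:
#   push(@body, top_ips(\%ips, 10));
```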

> push(@head, "To: $dstEmail\n");
> push(@head, "From: $srcEmail\n");
> push(@head, "Subject: Web log results from $server\n");
> push (@head, "Content-Type: text/plain; charset=UTF-8\n");
> push(@head, "Content-Transfer-Encoding: 8bit\n");
> push(@head, "\n");
> 
> # Send the message out;
> my $mailer = Net::SMTP->new($mailserver, Timeout => 60, Debug   =>
> $mailerDebugLevel) or (print "$0: cannot open '$mailserver' for
> writing: $!\n");
> 
> $mailer->mail($srcEmail);
> $mailer->to($dstEmail);
> $mailer->data();
> 
> $mailer->datasend(@head);
> $mailer->datasend(@body);
> $mailer->dataend();
> $mailer->quit;

Wow, that's a lot of lines to send an email.  Surely Perl must have 
something as easy as mailx, though I don't know it offhand.
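
One possibility, offered as a sketch rather than a recommendation, is to
pipe the message to mailx itself.  This assumes a working mailx on the
machine, which lets the local mail setup deal with delivery instead of
talking SMTP directly; pipe_mail is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: writes a message to the standard input of a mail
# command.  $cmd is an array ref such as ['mailx', '-s', $subject, $to];
# that mailx is installed and configured is an assumption.
sub pipe_mail {
    my ($cmd, @body) = @_;
    open(my $mail, '|-', @$cmd) or die "cannot run $cmd->[0]: $!\n";
    print {$mail} @body;
    close($mail) or die "$cmd->[0] failed: status $?\n";
    return 1;
}

# Usage, not run here:
#   pipe_mail(['mailx', '-s', "Web log results from $server", $dstEmail], @body);
```

The list form of open avoids passing the subject and address through a
shell, so odd characters in either are not a problem.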

> I'm pretty new to perl, so I'd love feedback if you see areas that
> could use improvement.

HTH.  I didn't think too hard about the regexes you used.  But that's 
frequently a place where assumptions get made that are true for your 
test data but don't hold in general.
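
To illustrate the kind of assumption I mean (this is not your regex,
which I haven't studied): a pattern for the usual Apache common or
combined log layout might expect the request to be "METHOD url
PROTOCOL", yet a request can also show up as just "-".  The parser
below is a hypothetical sketch that copes with that case.

```perl
use strict;
use warnings;

# Hypothetical parser for Apache common/combined log lines.  Returns
# (ip, url, code), or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    my ($ip, $request, $code) = $line =~ m{
        ^(\S+) \s+ \S+ \s+ \S+ \s+     # client address, identd, user
        \[ [^\]]* \] \s+               # timestamp in square brackets
        "( (?: [^"\\] | \\. )* )" \s+  # request line, allowing escaped quotes
        (\d{3}) \s                     # status code
    }x or return;

    # The request is normally "METHOD url PROTOCOL", but not always,
    # so fall back to the raw request when there is no second field.
    my (undef, $url) = split ' ', $request;
    $url = $request unless defined $url;
    return ($ip, $url, $code);
}
```

Lines that don't match come back as an empty list, so the caller can
count or print them instead of silently dropping them.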

Dave



