[lug] Apache log summaries

Chris McDermott csmcdermott at gmail.com
Fri Aug 29 11:30:19 MDT 2008


Hi all,

I was recently put in charge of a webserver which hosts several
websites.  I'm using Google Analytics to produce pretty graphs and
charts, and Google Webmaster Tools to monitor the status of our
sitemap and so on.  But I realized that nothing was summarizing the
URL requests that resulted in a 404 error.  Google will tell me when
its own crawler hits a 404, but that's only part of the picture.  A
brief search didn't turn up any pre-made tools that would do that, so
I wrote a Perl script to do it for me.  I set up
weekly cron jobs to run the script against different log files (one
per website) and email the results to the interested parties.  Usage
is like so:

30 8 * * 5 /path/to/logWatch.pl "/path/to/www.website.com-access_log" csmcdermott at gmail.com

It would be best, of course, to synchronize running the script with
the log file rotation schedule.

Here's the script:

-----------------------------------------------------------------------------------------------------------------------------------------------

#!/usr/bin/perl

## This script examines the apache logs and parses them for information.

use strict;
use warnings;
use Net::SMTP;

##########################################################
################ CONFIGURATION ###########################
##########################################################

my $dstEmail = $ARGV[1];
my $srcEmail = "logwatch\@domain.com";
my $mailserver = "mailserver";
my $apacheLog = $ARGV[0];

my $mailerDebugLevel = 0;

##########################################################
################ END CONFIGURATION #######################
##########################################################



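# Three-argument open with a lexical filehandle, so that odd characters
# in the file name can't be mistaken for an open mode.
open(my $accessLog, '<', $apacheLog)
        or die "$0: cannot open '$apacheLog': $!\n";

my @accesses = <$accessLog>;

close $accessLog;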

my %ips;
my %urls;
my %codes;

foreach my $line (@accesses) {
        # Skip anything that isn't a GET request in the expected format.
        # Without this, a non-matching line would be counted again under
        # the previous line's IP address, url and return code.  The size
        # field is "-" when no body is sent, so allow that as well.
        next unless $line =~
                /^(\d+\.\d+\.\d+\.\d+).*GET\s(\S+)\s+HTTP.*\"\s+(\d+)\s+(?:\d+|-)$/;
        my ($ip, $url, $code) = ($1, $2, $3);

        # Count the number of requests from each IP address
        $ips{$ip}++;

        # Count the number of each return code
        $codes{$code}++;

        # Save each url that resulted in a 404 error
        $urls{$url} = 1 if $code == 404;
}


my @head;
my @body;

# Fall back to the log file name if it doesn't look like host-access_log
my $server = $apacheLog;
if ($apacheLog =~ /(\w+\.\w+\.\w+)-access_log/) {
        $server = $1;
}

push (@body, "Web log results for $server\n\n");
push (@body, "Top Ten Visitors:\t\tNo of Requests:\n");

# Sort IPs by request count, busiest first, and keep at most ten
my @sorted = sort { $ips{$b} <=> $ips{$a} } keys %ips;
my $last = $#sorted < 9 ? $#sorted : 9;
foreach my $visitor (@sorted[0 .. $last]) {
        push (@body, "$visitor\t\t\t$ips{$visitor}\n");
}

push (@body, "\nReturn Codes:\t\t\tNo:\n");
foreach (sort keys %codes) {
        push (@body, "$_\t\t\t\t$codes{$_}\n");
}

my $uniques = keys (%urls);
push (@body, "\nThe following urls returned 404 error codes:\t (No of unique urls: $uniques)\n");
foreach (keys %urls) {
        push (@body, "$_\n");
}


push(@head, "To: $dstEmail\n");
push(@head, "From: $srcEmail\n");
push(@head, "Subject: Web log results from $server\n");
push (@head, "Content-Type: text/plain; charset=UTF-8\n");
push(@head, "Content-Transfer-Encoding: 8bit\n");
push(@head, "\n");


# Send the message out;
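# Stop here if the connection fails; carrying on would only crash on
# the method calls below.  Net::SMTP leaves the reason in $@.
my $mailer = Net::SMTP->new($mailserver, Timeout => 60,
        Debug => $mailerDebugLevel)
        or die "$0: cannot connect to '$mailserver': $@\n";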

$mailer->mail($srcEmail);
$mailer->to($dstEmail);
$mailer->data();

$mailer->datasend(@head);
$mailer->datasend(@body);
$mailer->dataend();
$mailer->quit;

exit 0;

-----------------------------------------------------------------------------------------------------------------------------------------------
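
For anyone who wants to check the pattern against their own log format
before scheduling the script, here is a minimal sketch that runs a
slightly tightened version of the regex against a single line.  The
parse_line helper and the sample line are made up for illustration and
are not part of the script above; the sketch assumes Common Log Format,
so logs written in the combined format (with referer and user agent at
the end) would need a looser pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one access-log line in Common Log Format.
# Returns (ip, url, status) for a GET request, or an empty list otherwise.
# parse_line is a hypothetical helper, not part of the original script.
sub parse_line {
    my ($line) = @_;
    return unless $line =~
        /^(\d+\.\d+\.\d+\.\d+).*"GET\s(\S+)\s+HTTP[^"]*"\s+(\d+)\s+(?:\d+|-)\s*$/;
    return ($1, $2, $3);
}

# Made-up sample line; the address is from a documentation-only range.
my $sample = '192.0.2.10 - - [29/Aug/2008:11:30:19 -0600] '
           . '"GET /favicon.ico HTTP/1.1" 404 209';

my ($ip, $url, $code) = parse_line($sample);
print "$ip requested $url and got $code\n";
```

Lines that come back as an empty list (POST requests, malformed
entries) are the ones the main loop has to skip rather than count.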

And here's sample output:

-----------------------------------------------------------------------------------------------------------------------------------------------

Web log results for staging.domain.com

Top Ten Visitors:		No of Requests:
xx.xxx.xxx.xxx			1059
xx.xxx.xxx.xxx			1032

Return Codes:			No:
200				1999
301				11
401				31
404				50

The following urls returned 404 error codes:	 (No of unique urls: 4)
/jobs/internship
/favicon.ico
/welcome-to-my-site
/templates/default/css/template_css.css

-----------------------------------------------------------------------------------------------------------------------------------------------



I'm pretty new to Perl, so I'd love feedback if you see areas that
could use improvement.  I think when I have time I'm going to do
reverse DNS lookups on the top ten IP addresses and include the
results in the summary, since the IPs themselves are sort of useless.
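
A rough sketch of how that reverse lookup might go, using only the core
Socket module and Perl's built-in gethostbyaddr.  The reverse_lookup
name and the two addresses are placeholders, and the fallback to the
bare address is just one way of handling hosts without a PTR record.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Return the hostname for an IPv4 address, or the address itself
# when it can't be packed or has no reverse (PTR) record.
# reverse_lookup is a hypothetical helper name.
sub reverse_lookup {
    my ($ip) = @_;
    my $packed = inet_aton($ip);
    return $ip unless defined $packed;
    my $name = gethostbyaddr($packed, AF_INET);
    return defined $name ? $name : $ip;
}

# Placeholder addresses standing in for the top-ten list.
foreach my $ip ('127.0.0.1', '192.0.2.10') {
    printf "%s\t%s\n", $ip, reverse_lookup($ip);
}
```

Each lookup can block for a while when the resolver is slow, so it
probably makes sense to resolve only the ten addresses that end up in
the report rather than every key in %ips.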

Chris


