[lug] Apache log summaries
Chris McDermott
csmcdermott at gmail.com
Fri Aug 29 11:30:19 MDT 2008
Hi all,
I was recently put in charge of a webserver which hosts several
websites. I'm using Google Analytics to produce pretty graphs and
charts, and Google Webmaster Tools to monitor the status of our
sitemap and stuff like that. But I realized that nothing was
summarizing the URL requests that resulted in a 404 error. Google
would tell me if IT ran into a 404, but that's only part of the
picture. A brief search didn't turn up any pre-made tools that
would do that, so I wrote a Perl script to do it for me. I set up
weekly cron jobs to run the script against different log files (one
per website) and email the results to the interested parties. Usage
is like so:
30 8 * * 5 /path/to/logWatch.pl "/path/to/www.website.com-access_log" csmcdermott at gmail.com
It would be best, of course, to synchronize running the script with
the log file rotation schedule.
Here's the script:
-----------------------------------------------------------------------------------------------------------------------------------------------
#!/usr/bin/perl
## This script examines the apache logs and parses them for information.
use strict;
use warnings;
use Net::SMTP;
##########################################################
################ CONFIGURATION ###########################
##########################################################
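# Usage: logWatch.pl <access_log> <destination_email>
die "Usage: $0 <access_log> <destination_email>\n" unless @ARGV == 2;
my $dstEmail = $ARGV[1];
my $srcEmail = "logwatch\@domain.com";
my $mailserver = "mailserver";
my $apacheLog = $ARGV[0];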
my $mailerDebugLevel = 0;
##########################################################
################ END CONFIGURATION #######################
##########################################################
# Read the whole log file into memory
open(my $accessLog, '<', $apacheLog) or die "$0: cannot open '$apacheLog': $!\n";
my @accesses = <$accessLog>;
close $accessLog;
my %ips;
my %urls;
my %codes;
foreach my $line (@accesses) {
    # Skip anything that is not a GET request in common log format, so a
    # non-matching line is never counted with the previous line's values.
    # The final field is the response size, which Apache logs as "-"
    # when no body was sent.
    next unless $line =~
        /^(\d+\.\d+\.\d+\.\d+).*GET\s+(\S+)\s+HTTP.*\"\s+(\d+)\s+(?:\d+|-)\s*$/;
    my ($ip, $url, $code) = ($1, $2, $3);
    # Count the number of requests from each IP address
    $ips{$ip}++;
    # Count the number of each return code
    $codes{$code}++;
    # Save each url that resulted in a 404 error
    $urls{$url} = " " if $code == 404;
}
my @head;
my @body;
# Fall back to the log path if the file name has no recognizable host name
my $server = $apacheLog;
if ($apacheLog =~ /(\w+\.\w+\.\w+)-access_log/) {
    $server = $1;
}
push (@body, "Web log results for $server\n\n");
push (@body, "Top Ten Visitors:\t\tNo of Requests:\n");
# Sort addresses by request count, busiest first, and keep at most ten
my @sorted = sort { $ips{$b} <=> $ips{$a} } keys %ips;
splice(@sorted, 10) if @sorted > 10;
foreach my $visitor (@sorted) {
    push (@body, "$visitor\t\t\t$ips{$visitor}\n");
}
push (@body, "\nReturn Codes:\t\t\tNo:\n");
foreach (sort keys %codes) {
    push (@body, "$_\t\t\t\t$codes{$_}\n");
}
my $uniques = keys (%urls);
push (@body, "\nThe following urls returned 404 error codes:\t (No of unique urls: $uniques)\n");
foreach (keys %urls) {
    push (@body, "$_\n");
}
push(@head, "To: $dstEmail\n");
push(@head, "From: $srcEmail\n");
push(@head, "Subject: Web log results from $server\n");
push (@head, "Content-Type: text/plain; charset=UTF-8\n");
push(@head, "Content-Transfer-Encoding: 8bit\n");
push(@head, "\n");
# Send the message out; stop here if the mail server cannot be reached
my $mailer = Net::SMTP->new($mailserver, Timeout => 60, Debug => $mailerDebugLevel)
    or die "$0: cannot connect to mail server '$mailserver': $@\n";
$mailer->mail($srcEmail);
$mailer->to($dstEmail);
$mailer->data();
$mailer->datasend(@head);
$mailer->datasend(@body);
$mailer->dataend();
$mailer->quit;
exit 0;
-----------------------------------------------------------------------------------------------------------------------------------------------
And here's sample output:
-----------------------------------------------------------------------------------------------------------------------------------------------
Web log results for staging.domain.com
Top Ten Visitors: No of Requests:
xx.xxx.xxx.xxx 1059
xx.xxx.xxx.xxx 1032
Return Codes: No:
200 1999
301 11
401 31
404 50
The following urls returned 404 error codes: (No of unique urls: 4)
/jobs/internship
/favicon.ico
/welcome-to-my-site
/templates/default/css/template_css.css
-----------------------------------------------------------------------------------------------------------------------------------------------
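To check the pattern against your own logs before scheduling the cron job, the parsing step can be tried on its own. This is only a sketch: parse_line is a hypothetical helper name, the sample lines are made up (using documentation-range addresses), and the pattern assumes common log format with GET requests, like the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return (ip, url, status) for a GET request in
# common log format, or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    if ($line =~ /^(\d+\.\d+\.\d+\.\d+).*GET\s+(\S+)\s+HTTP.*"\s+(\d+)\s+(?:\d+|-)\s*$/) {
        return ($1, $2, $3);
    }
    return;
}

# Made-up sample lines, not taken from a real log.
my @sample = (
    '192.0.2.10 - - [29/Aug/2008:11:30:19 -0600] "GET /jobs/internship HTTP/1.1" 404 209',
    '192.0.2.10 - - [29/Aug/2008:11:30:20 -0600] "POST /login HTTP/1.1" 200 512',
    'not a log line',
);

for my $line (@sample) {
    if (my ($ip, $url, $code) = parse_line($line)) {
        print "matched: ip=$ip url=$url code=$code\n";
    }
    else {
        print "skipped: $line\n";
    }
}
```

Lines that are skipped here (POST requests, or logs in combined format with referer and user-agent fields at the end) would also be left out of the summary, so the pattern may need adjusting for your LogFormat.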
I'm pretty new to Perl, so I'd love feedback if you see areas that
could use improvement. When I have time, I think I'll add
reverse DNS lookups for the top ten IP addresses and include the
results in the summary, since the IPs themselves are sort of useless.
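A rough sketch of what that reverse lookup might look like, using Perl's built-in gethostbyaddr and the core Socket module. The reverse_lookup name and the sample %ips data are hypothetical, and it falls back to the bare address when no name is found:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Hypothetical helper: return the host name for a dotted-quad IPv4
# address, or the input unchanged if it is not an IPv4 address or the
# lookup finds no name.
sub reverse_lookup {
    my ($ip) = @_;
    return $ip unless $ip =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;
    my $packed = inet_aton($ip);
    return $ip unless defined $packed;
    my $name = gethostbyaddr($packed, AF_INET);
    return defined $name ? $name : $ip;
}

# Made-up data standing in for the script's %ips hash.
my %ips = ('127.0.0.1' => 1059);

for my $ip (sort { $ips{$b} <=> $ips{$a} } keys %ips) {
    printf "%-40s %d\n", reverse_lookup($ip), $ips{$ip};
}
```

Each lookup can block while DNS answers, so limiting it to the top ten addresses (rather than every key in %ips) keeps the cron job quick.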
Chris