[lug] Apache log summaries

Walter Pienciak wpiencia at thunderdome.ieee.org
Fri Aug 29 12:07:08 MDT 2008


Nice.

You may find that sorting the "404" URLs in descending order of
request count helps you decide which ones to fix first.
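
A minimal sketch of that idea, not part of the script quoted below. The
sub names count_404s and sort_by_hits, the sample log lines, and the
regex (which assumes Common Log Format) are all illustrative assumptions:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many times each URL returned a 404.
# Takes a list of log lines, returns a hash reference of url => count.
sub count_404s {
    my @lines = @_;
    my %hits;
    foreach my $line (@lines) {
        # Assumed pattern: "METHOD /url PROTOCOL" followed by status 404
        if ($line =~ /"\S+\s+(\S+)[^"]*"\s+404\s/) {
            $hits{$1}++;
        }
    }
    return \%hits;
}

# Return the URLs ordered by descending hit count (ties broken by name).
sub sort_by_hits {
    my ($hits) = @_;
    return sort { $hits->{$b} <=> $hits->{$a} || $a cmp $b } keys %$hits;
}

# Made-up sample lines, for illustration only
my @sample = (
    '10.0.0.1 - - [29/Aug/2008:11:00:00 -0600] "GET /favicon.ico HTTP/1.1" 404 209',
    '10.0.0.2 - - [29/Aug/2008:11:00:01 -0600] "GET /jobs/internship HTTP/1.1" 404 215',
    '10.0.0.1 - - [29/Aug/2008:11:00:02 -0600] "GET /favicon.ico HTTP/1.1" 404 209',
    '10.0.0.3 - - [29/Aug/2008:11:00:03 -0600] "GET /index.html HTTP/1.1" 200 1024',
);

my $hits = count_404s(@sample);
foreach my $url (sort_by_hits($hits)) {
    printf "%6d  %s\n", $hits->{$url}, $url;
}
```

Keeping a count per URL instead of a placeholder value is the only data
change needed; the sort then puts the most frequently requested missing
pages at the top of the report.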

Walter

On Fri, Aug 29, 2008 at 11:30:19AM -0600, Chris McDermott wrote:
> Hi all,
> 
> I was recently put in charge of a webserver which hosts several
> websites.  I'm using Google Analytics to produce pretty graphs and
> charts, and Google Webmaster Tools to monitor the status of our
> sitemap and stuff like that.  But I realized that there was nothing
> summarizing the list of URL requests resulting in a 404 error.  Google
> would tell me if IT ran into a 404, but that's only part of the
> picture.  A brief search didn't come up with any pre-made tools that
> would do that, so I wrote a Perl script to do it for me.  I set up
> weekly cron jobs to run the script against different log files (one
> per website) and email the results to the interested parties.  Usage
> is like so:
> 
> 30 8 * * 5 /path/to/logWatch.pl "/path/to/www.website.com-access_log" csmcdermott at gmail.com
> 
> It would be best, of course, to synchronize running the script with
> the log file rotation schedule.
> 
> Here's the script:
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------
> 
> #!/usr/bin/perl
> 
> ## This script examines the apache logs and parses them for information.
> 
> use strict;
> use Net::SMTP;
> 
> ##########################################################
> ################ CONFIGURATION ###########################
> ##########################################################
> 
> my $dstEmail = $ARGV[1];
> my $srcEmail = "logwatch\@domain.com";
> my $mailserver = "mailserver";
> my $apacheLog = $ARGV[0];
> 
> my $mailerDebugLevel = 0;
> 
> ##########################################################
> ################ END CONFIGURATION #######################
> ##########################################################
> 
> 
> 
> open(my $accessLog, '<', $apacheLog) or die "Cannot open $apacheLog: $!";
> 
> my @accesses = <$accessLog>;
> 
> close $accessLog;
> 
> my %ips;
> my %urls;
> my %codes;
> 
> my $ip;
> my $url;
> my $code;
> 
> foreach my $line (@accesses) {
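>         # Skip lines that don't match; otherwise the values captured from
>         # the previous line would be counted a second time.  The final
>         # field may be "-" when Apache sends no response body.
>         next unless $line =~
>                 /^(\d+\.\d+\.\d+\.\d+).*GET\s(\S+)\s+HTTP.*\"\s+(\d+)\s+(?:\d+|-)$/;
>         $ip = $1;
>         $url = $2;
>         $code = $3;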
> 
> 
>         # Count the number of requests from each IP address
>         $ips{$ip}++;
> 
>         # Count the number of each return code
>         $codes{$code}++;
> 
>         # Save each url that resulted in a 404 error
>         if (($code == 404) && !(exists $urls{$url})) {
>                 $urls{$url} = " ";
>         }
> 
> }
> 
> 
> my @head;
> my @body;
> 
> my $server;
> if ($apacheLog =~ /(\w+\.\w+\.\w+)-access_log/) {
>         $server = $1;
> }
> 
> push (@body, "Web log results for $server\n\n");
> push (@body, "Top Ten Visitors:\t\tNo of Requests:\n");
> 
> my @sorted = reverse sort {$ips{$a} <=> $ips{$b}} keys %ips;
> for (my $c=0; $c<10 && $c<@sorted; $c++) {
>         push (@body, "$sorted[$c]\t\t\t$ips{$sorted[$c]}\n");
> }
> 
> push (@body, "\nReturn Codes:\t\t\tNo:\n");
> foreach (sort keys %codes) {
>         push (@body, "$_\t\t\t\t$codes{$_}\n");
> }
> 
> my $uniques = keys (%urls);
> push (@body, "\nThe following urls returned 404 error codes:\t " .
>         "(No of unique urls: $uniques)\n");
> foreach (keys %urls) {
>         push (@body, "$_\n");
> }
> 
> 
> push(@head, "To: $dstEmail\n");
> push(@head, "From: $srcEmail\n");
> push(@head, "Subject: Web log results from $server\n");
> push (@head, "Content-Type: text/plain; charset=UTF-8\n");
> push(@head, "Content-Transfer-Encoding: 8bit\n");
> push(@head, "\n");
> 
> 
> # Send the message out;
> my $mailer = Net::SMTP->new($mailserver, Timeout => 60,
>         Debug => $mailerDebugLevel)
>         or die "$0: cannot connect to '$mailserver': $@\n";
> 
> $mailer->mail($srcEmail);
> $mailer->to($dstEmail);
> $mailer->data();
> 
> $mailer->datasend(@head);
> $mailer->datasend(@body);
> $mailer->dataend();
> $mailer->quit;
> 
> exit 0;
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------
> 
> And here's sample output:
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------
> 
> Web log results for staging.domain.com
> 
> Top Ten Visitors:		No of Requests:
> xx.xxx.xxx.xxx			1059
> xx.xxx.xxx.xxx			1032
> 
> Return Codes:			No:
> 200				1999
> 301				11
> 401				31
> 404				50
> 
> The following urls returned 404 error codes:	 (No of unique urls: 4)
> /jobs/internship
> /favicon.ico
> /welcome-to-my-site
> /templates/default/css/template_css.css
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> 
> I'm pretty new to Perl, so I'd love feedback if you see areas that
> could use improvement.  I think when I have time I'm going to do
> reverse DNS lookups on the top ten IP addresses and include the
> results in the summary, since the IPs themselves are sort of useless.
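
One possible shape for that reverse lookup, as a sketch only. The sub
name reverse_lookup and the @top list are hypothetical; it assumes IPv4
addresses and uses the core Socket module with Perl's built-in
gethostbyaddr:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Resolve a dotted-quad IPv4 address to a hostname.
# Falls back to the address itself when no name can be found.
sub reverse_lookup {
    my ($ip) = @_;
    my $packed = inet_aton($ip);   # undef if it cannot be parsed or resolved
    return $ip unless defined $packed;
    my $name = gethostbyaddr($packed, AF_INET);
    return defined $name ? $name : $ip;
}

# Hypothetical stand-in for the first entries of @sorted in the script
my @top = ('127.0.0.1');
foreach my $ip (@top) {
    printf "%s\t%s\n", $ip, reverse_lookup($ip);
}
```

Lookups can be slow when a resolver does not answer, so it is probably
worth resolving only the ten addresses that end up in the report rather
than every key in %ips.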
> 
> Chris
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667 channel=#colug
