[lug] Apache log summaries
Walter Pienciak
wpiencia at thunderdome.ieee.org
Fri Aug 29 12:07:08 MDT 2008
Nice.
You may find that sorting the "404" URLs in descending order by
number of requests helps you decide which ones to deal with
first.
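
A minimal sketch of that sort, not a patch to the script below. It
assumes common log format lines; the function name rank_404s and the
sample log lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count 404 responses per URL and return [url, count] pairs,
# most-requested first. The regex assumes Apache common log format.
sub rank_404s {
    my @lines = @_;
    my %hits;
    foreach my $line (@lines) {
        # Capture the request path and the three-digit status code.
        next unless $line =~ /"(?:GET|HEAD|POST)\s+(\S+)[^"]*"\s+(\d{3})\s/;
        $hits{$1}++ if $2 == 404;
    }
    # Descending by count; ties broken alphabetically so output is stable.
    return map { [ $_, $hits{$_} ] }
           sort { $hits{$b} <=> $hits{$a} or $a cmp $b } keys %hits;
}

# Hypothetical sample lines, for illustration only.
my @sample = (
    '10.0.0.1 - - [29/Aug/2008:11:00:00 -0600] "GET /favicon.ico HTTP/1.1" 404 209',
    '10.0.0.2 - - [29/Aug/2008:11:00:01 -0600] "GET /jobs/internship HTTP/1.1" 404 215',
    '10.0.0.1 - - [29/Aug/2008:11:00:02 -0600] "GET /favicon.ico HTTP/1.1" 404 209',
    '10.0.0.3 - - [29/Aug/2008:11:00:03 -0600] "GET /index.html HTTP/1.1" 200 5120',
);

printf "%-30s %d\n", @$_ for rank_404s(@sample);
```

The URLs that come out on top are the ones real visitors hit most
often, so they are probably the ones worth a redirect or a fix first.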
Walter
On Fri, Aug 29, 2008 at 11:30:19AM -0600, Chris McDermott wrote:
> Hi all,
>
> I was recently put in charge of a webserver which hosts several
> websites. I'm using Google Analytics to produce pretty graphs and
> charts, and Google Webmaster Tools to monitor the status of our
> sitemap and stuff like that. But I realized that there was nothing
> summarizing the list of url requests resulting in a 404 error. Google
> would tell me if IT ran into a 404, but that's only part of the
> picture. A brief search didn't come up with any pre-made tools that
> would do that, so I wrote a perl script to do it for me. I set up
> weekly cron jobs to run the script against different log files (one
> per website) and email the results to the interested parties. Usage
> is like so:
>
> 30 8 * * 5 /path/to/logWatch.pl "/path/to/www.website.com-access_log" csmcdermott at gmail.com
>
> It would be best, of course, to synchronize running the script with
> the log file rotation schedule.
>
> Here's the script:
>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> #!/usr/bin/perl
>
> ## This script examines the apache logs and parses them for information.
>
> use strict;
> use warnings;
> use Net::SMTP;
>
> ##########################################################
> ################ CONFIGURATION ###########################
> ##########################################################
>
> my $dstEmail = $ARGV[1];
> my $srcEmail = "logwatch\@domain.com";
> my $mailserver = "mailserver";
> my $apacheLog = $ARGV[0];
>
> my $mailerDebugLevel = 0;
>
> ##########################################################
> ################ END CONFIGURATION #######################
> ##########################################################
>
>
>
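> # Three-argument open with a lexical filehandle; report which file failed.
> open(my $accesslog, '<', $apacheLog)
> or die "$0: cannot open '$apacheLog': $!\n";
>
> my @accesses = <$accesslog>;
>
> close $accesslog;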
>
> my %ips;
> my %urls;
> my %codes;
>
> my $ip;
> my $url;
> my $code;
>
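> foreach my $line (@accesses) {
> # Skip lines that do not match; otherwise the values captured from the
> # previous line would be counted again. \S+ keeps the trailing space
> # out of the url, and (?:\d+|-) accepts a "-" byte count.
> next unless $line =~
> /^(\d+\.\d+\.\d+\.\d+).*GET\s(\S+)\s+HTTP.*\"\s+(\d+)\s+(?:\d+|-)$/;
> $ip = $1;
> $url = $2;
> $code = $3;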
>
>
> # Count the number of requests from each IP address
> # (++ creates a missing hash entry, so no exists check is needed)
> $ips{$ip}++;
>
> # Count the number of each return code
> $codes{$code}++;
>
> # Save each url that resulted in a 404 error
> if (($code == 404) && !(exists $urls{$url})) {
> $urls{$url} = " ";
> }
>
> }
>
>
> my @head;
> my @body;
>
> my $server;
> if ($apacheLog =~ /(\w+\.\w+\.\w+)-access_log/) {
> $server = $1;
> }
>
> push (@body, "Web log results for $server\n\n");
> push (@body, "Top Ten Visitors:\t\tNo of Requests:\n");
>
> # Sort IP addresses by request count, highest first
> my @sorted = sort {$ips{$b} <=> $ips{$a}} keys %ips;
> for (my $c=0; $c<10 && $c<@sorted; $c++) {
> push (@body, "$sorted[$c]\t\t\t$ips{$sorted[$c]}\n");
> }
>
> push (@body, "\nReturn Codes:\t\t\tNo:\n");
> foreach (sort keys %codes) {
> push (@body, "$_\t\t\t\t$codes{$_}\n");
> }
>
> my $uniques = keys (%urls);
> push (@body, "\nThe following urls returned 404 error codes:\t (No of unique urls: $uniques)\n");
> foreach (keys %urls) {
> push (@body, "$_\n");
> }
>
>
> push(@head, "To: $dstEmail\n");
> push(@head, "From: $srcEmail\n");
> push(@head, "Subject: Web log results from $server\n");
> push (@head, "Content-Type: text/plain; charset=UTF-8\n");
> push(@head, "Content-Transfer-Encoding: 8bit\n");
> push(@head, "\n");
>
>
> # Send the message out;
> # Stop here if the connection fails; Net::SMTP leaves the reason in $@.
> my $mailer = Net::SMTP->new($mailserver, Timeout => 60, Debug =>
> $mailerDebugLevel) or die "$0: cannot connect to '$mailserver': $@\n";
>
> $mailer->mail($srcEmail);
> $mailer->to($dstEmail);
> $mailer->data();
>
> $mailer->datasend(@head);
> $mailer->datasend(@body);
> $mailer->dataend();
> $mailer->quit;
>
> exit 0;
>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> And here's sample output:
>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> Web log results for staging.domain.com
>
> Top Ten Visitors: No of Requests:
> xx.xxx.xxx.xxx 1059
> xx.xxx.xxx.xxx 1032
>
> Return Codes: No:
> 200 1999
> 301 11
> 401 31
> 404 50
>
> The following urls returned 404 error codes: (No of unique urls: 4)
> /jobs/internship
> /favicon.ico
> /welcome-to-my-site
> /templates/default/css/template_css.css
>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>
>
> I'm pretty new to Perl, so I'd love feedback if you see areas that
> could use improvement. I think when I have time I'm going to do
> reverse DNS lookups on the top ten IP addresses and include the
> results in the summary, since the IPs themselves are sort of useless.
>
> Chris
> _______________________________________________
> Web Page: http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667 channel=#colug
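
The reverse DNS idea Chris mentions for the top ten addresses could
look roughly like this. It is only a sketch using the core Socket
module; the helper name label_ip is made up, and it handles IPv4
addresses only. Addresses with no PTR record come back unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET);

# Return "ip (hostname)" when a reverse lookup succeeds, or the input
# unchanged when it is not a valid IPv4 address or has no PTR record.
sub label_ip {
    my ($ip) = @_;
    # inet_pton only parses the string; it performs no DNS query and
    # returns undef for malformed input.
    my $packed = inet_pton(AF_INET, $ip);
    return $ip unless defined $packed;
    # In scalar context gethostbyaddr returns just the host name.
    my $name = gethostbyaddr($packed, AF_INET);
    return (defined $name && length $name) ? "$ip ($name)" : $ip;
}

# Label each address given on the command line.
print label_ip($_), "\n" for @ARGV;
```

In the script above, the line that pushes each of the top ten
addresses onto @body could call such a helper on $sorted[$c]. Each
lookup can block for a while on a slow resolver, which matters little
for ten addresses in a weekly cron job.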