[lug] Open source tools to monitor distributed services

Bamm Visscher bamm.visscher at gmail.com
Thu Nov 30 19:30:37 MST 2006


Sounds like you are describing Nagios.

Bamm

On 11/30/06, Vince Dean <vdean at ucar.edu> wrote:
> I am managing a distributed system that depends
> on services running on several Unix hosts. I've found that one
> of the best ways to monitor the health of the system is to
> run independent tests:  to check that a port is open on a given
> host, a file has been modified in the last two hours, an
> HTTP URL can be retrieved, and so on.  There are a few dozen
> conditions, distributed among eight machines, that I want to
> check every few minutes.
>
> I'm using ad-hoc scripts and cron jobs but I feel the
> need for a more general, configurable solution.  I suspect that
> this is a well-studied problem.  Are there any solutions
> that you can recommend?
>
> My ideal solution:
> - is open source
> - runs on Linux, but preferably is portable to other
>      Unix systems and Windows
>      (i.e. written in Java, Python, Ruby, or Perl)
> - is easily configured for some standard types of tests:
>    - FTP server is running
>    - HTTP URL can be retrieved
>    - a given port is open on a given host
>    - a given file exists and has been recently modified
>    - a process is running with a given name
>    - etc.
> - is easily extended by custom code to check for
>      application-specific conditions
> - notifies by email and/or writes messages to a log file when a test fails
> - checks at a configurable interval and suppresses redundant messages
>      (doesn't tell me the same service is down every minute)
> - notifies me when a service is back up
>
> I'll be grateful for any suggestions.
>
> Vince
> --
> Vince Dean
> University of Colorado
> Center for Lower Atmospheric Studies
> 3450 Mitchell Lane, Rm FL0-2514
> Boulder, CO 80301
> Phone: (303) 497-8077
>
>
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667 channel=#colug
>
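
For the "easily extended by custom code" item: Nagios runs each check as an
external program and reads its exit status (0 = OK, 1 = WARNING, 2 = CRITICAL,
3 = UNKNOWN) plus a line of output. Below is a rough Python sketch of two of
the checks from the list, written to that convention. The function names, the
two-hour default, and the StateTracker class are made up for illustration;
Nagios does its own state tracking and notification, so the tracker is only
there to show the "report changes, not every failure" idea.

```python
import os
import socket
import time

# Exit-status convention used by Nagios plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_port(host, port, timeout=5.0):
    """Return (status, message) for 'a given port is open on a given host'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


def check_file_fresh(path, max_age_seconds=2 * 60 * 60):
    """Return (status, message) for 'file exists and was recently modified'."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError as exc:
        return CRITICAL, f"CRITICAL - cannot stat {path} ({exc})"
    if age > max_age_seconds:
        return CRITICAL, f"CRITICAL - {path} last modified {age:.0f}s ago"
    return OK, f"OK - {path} modified {age:.0f}s ago"


class StateTracker:
    """Hypothetical helper: report only up/down transitions per check name."""

    def __init__(self):
        self._last = {}  # check name -> last seen status

    def update(self, name, status):
        """Return 'PROBLEM', 'RECOVERY', or None if the check stayed up or stayed down."""
        previous = self._last.get(name, OK)
        self._last[name] = status
        if status != OK and previous == OK:
            return "PROBLEM"
        if status == OK and previous != OK:
            return "RECOVERY"
        return None
```

To use one of these as a standalone plugin, a script would print the message
and call sys.exit(status) with the returned status.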


-- 
sguil - The Analyst Console for NSM
http://sguil.sf.net


