[lug] monitoring jobs on linux

Will Sterling will.sterling at gmail.com
Fri Mar 16 12:37:48 MDT 2012


Install a job scheduler, then have your users submit their jobs through
it.  You will then be able to run canned reports for all
kinds of info you never knew you were missing.

On Mar 16, 2012, at 12:31 PM, Davide Del Vento
<davide.del.vento at gmail.com> wrote:

> Hi,
> we have a server where users have shell access, and they usually
> submit nohupped background jobs (or cron jobs). I would like to
> monitor what users are doing. At the bare minimum how long the jobs
> last on average and what the distribution looks like. Better yet if I
> can get more details, such as when those jobs run (e.g. is the
> distribution changing during the weekends? is there any particular
> user doing something much off the others? etc.) I am particularly
> interested in long-running stuff, so a sampling would work fine, even
> at low frequency (e.g. every 1-10 minutes).
>
> None of this is rocket science; filtering the output of ps run from
> cron every 5 minutes or so would do the trick. However I don't want to
> do this myself, since there are many small details that would make
> this a serious project and not a quick test to collect some data to
> slap on a manager's desk. For example: what if PID rolls over? What
> about spawned processes? I care only about the "top level" jobs
> submitted by the user, so if in the system there is only a single
> 10-hour bash script calling 10 1-hour things, I want an easy way to
> be able to find the information I want which is "the average running
> time is 10 hours", and not the quick answer "the average running time
> is 1.8 hours" (since one 10-hour and ten 1-hour processes have run).
> Again, since ps can do some parent-child stuff this is possible....
>
> But instead of reinventing the wheel, I'm wondering if such a tool
> exists (maybe within Nagios and/or Ganglia which are already running
> on the system - I can just go to the system administrators and ask for
> what I need). I didn't find anything on Google, but that's probably
> because I am not a system administrator so I asked the "wrong"
> question (and Google is not smart enough to accept very elaborate
> queries like this by email :-)
>
> Thanks,
> Davide
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
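
For anyone who does end up sampling ps from cron as described in the
quoted message, here is a rough sketch of the "top level jobs only" idea.
It is not a finished tool. It assumes a procps-ng ps that supports the
etimes field (elapsed seconds), and it uses a simple heuristic that is my
own assumption: a process counts as top level when its parent is missing
from the sample or is owned by a different user (as happens with nohupped
jobs reparented to init, or jobs started by cron). The names Proc,
parse_ps, top_level_jobs and mean are made up for illustration.

```python
from collections import namedtuple

# One row of: ps -eo pid,ppid,user,etimes,comm --no-headers
# (field list is an assumption; etimes needs a reasonably recent procps-ng)
Proc = namedtuple("Proc", "pid ppid user etimes comm")


def parse_ps(text):
    """Parse ps output captured with the field list above."""
    procs = []
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue  # skip blank or truncated lines
        pid, ppid, user, etimes, comm = parts
        procs.append(Proc(int(pid), int(ppid), user, int(etimes), comm))
    return procs


def top_level_jobs(procs, users):
    """Keep processes of the given users whose parent is not theirs.

    Heuristic only: children spawned by a user's script have a parent
    owned by the same user, so they are dropped.
    """
    by_pid = {p.pid: p for p in procs}
    jobs = []
    for p in procs:
        if p.user not in users:
            continue
        parent = by_pid.get(p.ppid)
        if parent is None or parent.user != p.user:
            jobs.append(p)
    return jobs


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


# A hand-written sample standing in for real ps output.
SAMPLE = """\
    1     0 root     900000 init
  100     1 alice      7200 bash
  101   100 alice       600 step3
  200     1 root         50 cron
"""

if __name__ == "__main__":
    jobs = top_level_jobs(parse_ps(SAMPLE), {"alice"})
    for job in jobs:
        print(job.pid, job.user, job.etimes, job.comm)
    # The averaging pitfall from the quoted message, in hours:
    print(round(mean([10]), 2), round(mean([10] + [1] * 10), 2))
```

In the sample only pid 100 (the bash script) is reported for alice; its
child step3 is dropped because its parent belongs to the same user. To
cope with PID rollover between samples, one option is to identify a job
by pid plus start time rather than by pid alone.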
