[lug] strange kernel happenings...
Nate Duehr
nate at natetech.com
Tue Aug 13 18:08:25 MDT 2002
On Mon, 2002-08-12 at 22:05, D. Stimits wrote:
> A general comment on such strange behaviors, that might or might not
> have anything to do with your situation. It is not unusual for heat
> buildup over time to do strange things. It is not unusual for marginal
> power supplies or marginal power line voltage (brown-out) to do this
> (have a voltmeter? if you have a UPS, hopefully it will output close to
> 120 VAC, and never dip down to 110). Marginal memory often does such
> strange things as well; run memtest86 on it for half a day:
> http://www.memtest86.com/
Yeah, had thought about this, but don't want that server down for long
enough to run a long-running memtest86 test on it... hmmm... might have
to.
> In terms of software, I think a triple fault in the kernel will
> cause an instant reboot, but not defunct processes.
Interesting. I'm not a kernel hack at all, so this is interesting info.
> To find out more about what is going on, you can compile with the kernel
> debugger option, or kdb. I do not know if all kernels have that, but I
> think the redhat kernels do, and all of the SGI kernels I use do
> (naturally, kdb was written mainly by an SGI employee). kdb can give you
> a list of processes, and allow you to get a stack dump of any process.
> It will even work most of the time if the kernel is locked up hard. The
> problem with kdb is that it does not play nicely with X11, you mostly
> need either a real console or a serial console to another machine. [I
> have not looked, but I would bet that the kdb docs in the kernel source
> Documentation/ directory name oss.sgi.com; if not, likely oss.sgi.com
> has tons of docs on kdb]
I think I'll avoid debugging kernels... it might be detrimental to my
health. :) (LOL...)
> Also, I'd run tail -f on /var/log/messages and always keep it visible,
> preferably via ssh from another machine. Although you said it does not
> show anything in the log, it might matter what the last message was,
> especially if the same message is always the last message. Maybe
> experiment with manually switching it to init 2, and then back to init 3
> or init 5, and see if that does anything (if it gives an oops, you are in
> luck...if you can copy it down accurately).
Good idea... I'll set that up on the console.
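Something like the rough Python sketch below might help with keeping track of
the last message, in case watching tail -f by eye isn't practical. It is only
an illustration: the names last_line and follow, and the poll_seconds and
max_polls parameters, are made up here, and it assumes the log is a plain text
file (for example /var/log/messages) that is not rotated while being followed.

```python
import os
import time


def last_line(path):
    """Return the last non-empty line of a text file, or None if there is none."""
    last = None
    with open(path, "r", errors="replace") as f:
        for line in f:
            if line.strip():
                last = line.rstrip("\n")
    return last


def follow(path, poll_seconds=1.0, max_polls=None):
    """Yield lines appended to `path` after we start, roughly like `tail -f`.

    `max_polls` is a hypothetical knob: stop after that many empty polls
    in a row. None means keep going until interrupted.
    """
    with open(path, "r", errors="replace") as f:
        f.seek(0, os.SEEK_END)  # skip what is already in the file
        idle = 0
        while max_polls is None or idle < max_polls:
            line = f.readline()
            if line:
                idle = 0
                yield line.rstrip("\n")
            else:
                idle += 1
                time.sleep(poll_seconds)


# Example use (runs until interrupted, so it is left commented out):
# for msg in follow("/var/log/messages"):
#     print(msg)
```

Comparing the last line seen before each hang would show whether the same
message always comes last. This sketch does not handle log rotation or lines
that are only partly written when it polls.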
Nate, nate at natetech.com