[lug] strange kernel happenings...

Nate Duehr nate at natetech.com
Tue Aug 13 18:08:25 MDT 2002


On Mon, 2002-08-12 at 22:05, D. Stimits wrote:
> A general comment on such strange behaviors, that might or might not 
> have anything to do with your situation. It is not unusual for heat 
> buildup over time to do strange things. It is not unusual for marginal 
> power supplies or marginal power line voltage (brown-out) to do this 
> (have a voltmeter? if you have a UPS, hopefully it will output close to 
> 120 VAC, and never dip down to 110). Marginal memory often does such 
> strange things as well; run memtest86 on it for half a day:
>   http://www.memtest86.com/

Yeah, I had thought about this, but I don't want that server down long
enough for an extended memtest86 run... hmmm... I might have to.

> In terms of software, I think a triple fault (a "triple exception") in 
> the kernel will cause an instant reboot, but not defunct processes.

Interesting.  I'm not a kernel hack at all, so this is interesting info.

> To find out more about what is going on, you can compile with the kernel 
> debugger option, or kdb. I do not know if all kernels have that, but I 
> think the redhat kernels do, and all of the SGI kernels I use do 
> (naturally, kdb was written mainly by an SGI employee). kdb can give you 
> a list of processes, and allow you to get a stack dump of any process. 
> It will even work most of the time if the kernel is locked up hard. The 
> problem with kdb is that it does not play nicely with X11, you mostly 
> need either a real console or a serial console to another machine. [I 
> have not looked, but I would bet that the kdb docs in the kernel source 
> Documentation/ directory name oss.sgi.com; if not, likely oss.sgi.com 
> has tons of docs on kdb]

I think I'll avoid debugging kernels... it might be detrimental to my
health. :)  (LOL...)

> Also, I'd run tail -f on /var/log/messages and always keep it visible, 
> preferably via ssh from another machine. Although you said it does not 
> show anything in the log, it might matter what the last message was, 
> especially if the same message is always the last message. Maybe 
> experiment with manually switching it to runlevel 2 (init 2), and then 
> back to init 3 or init 5, and see if that does anything (if it gives an 
> oops, you are in luck...if you can copy it down accurately).

Good idea... I'll set that up on the console.
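
For anyone who would rather script the log watching than leave tail -f
open, here is a rough Python sketch of the same idea. It is only an
illustration: the follow() function and its parameters are made-up names,
it assumes the log is at /var/log/messages and readable by the user running
it, and it does not handle log rotation or half-written lines.

```python
import os
import sys
import time
from typing import Iterator, Optional


def follow(path: str,
           poll_interval: float = 1.0,
           max_idle_polls: Optional[int] = None,
           from_start: bool = False) -> Iterator[str]:
    """Yield lines appended to `path`, roughly like `tail -f`.

    Hypothetical helper: stops after `max_idle_polls` consecutive empty
    reads, or runs forever when that is None.
    """
    with open(path, "r", errors="replace") as log:
        if not from_start:
            log.seek(0, os.SEEK_END)  # skip what is already in the file
        idle = 0
        while max_idle_polls is None or idle < max_idle_polls:
            line = log.readline()
            if line:
                idle = 0
                yield line.rstrip("\n")
            else:
                idle += 1
                time.sleep(poll_interval)  # nothing new yet, wait and retry


if __name__ == "__main__":
    # Assumed default path; pass another file as the first argument.
    log_path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    for entry in follow(log_path):
        # Flush every line so the last message before a lockup is visible.
        print(entry, flush=True)
```

Run over ssh from another machine, the last line printed before the box
hangs stays on the remote terminal, which is the point of the suggestion
above.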

Nate, nate at natetech.com



