[lug] HW Puzzle
Lee Woodworth
blug-mail at duboulder.com
Fri Oct 19 15:17:36 MDT 2018
Hi All,
Here is a puzzle to ponder:
An old workstation is randomly hard locking when idle.
The PS and case fans keep running, but there are no ping responses, SysRq keys
are ignored, and the video output is black.
A remote monitoring loop dumping /sys/class/hwmon/hwmon0/device/temp*_input
shows CPU/case temps of around 40C about 120s or less before the lockup. A
similar dump for smartctl temps on the ssd shows 31C.
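For anyone wanting to reproduce the monitoring, here is a rough sketch of such a loop in Python. It is not the script actually used; the function names and the 10 s interval are made up for illustration. It relies only on the fact that hwmon temp*_input files hold millidegrees Celsius.

```python
import glob
import os
import time


def read_temps(hwmon_dir):
    """Return {file_name: degrees_C} for every temp*_input file in hwmon_dir."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(hwmon_dir, "temp*_input"))):
        try:
            with open(path) as f:
                # hwmon reports temperatures in millidegrees Celsius
                temps[os.path.basename(path)] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            # a sensor can fail to read; skip it rather than stop the loop
            continue
    return temps


def monitor(hwmon_dir, interval_s=10):
    """Print one timestamped line per sample until interrupted (hypothetical helper)."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        readings = " ".join(
            f"{name}={temp:.1f}C" for name, temp in read_temps(hwmon_dir).items()
        )
        # flush so the last sample before a lockup reaches the remote end
        print(stamp, readings, flush=True)
        time.sleep(interval_s)
```

Calling monitor("/sys/class/hwmon/hwmon0/device") over an ssh session from another machine leaves the last printed line as the final reading before the lockup.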
This happens for multiple versions of the 4.18 kernel, up to 4.18.13.
No lockups when the system is under a continuous heavy load (e.g. building a
tool chain). No hibernate/suspend/frequency-adjust configured for the kernel
and no userland agents either.
An oddity is that the kernel and gcc disagree on whether the cpu
(athlon64, k8 arch) supports sse3. No sse3 flag in /proc/cpuinfo, but the output of
gcc -E -v -march=native -</dev/null 2>&1 | grep cc1
includes -msse3. The cpuid2cpuflags command also includes sse3 as a cpu feature.
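One thing worth ruling out before reading too much into the mismatch: the kernel lists SSE3 in /proc/cpuinfo under the name pni, not sse3, so a search for a literal sse3 flag will miss it. The sketch below (function names are made up for illustration) parses the flags line and checks for either spelling. It compares whole flag names, so ssse3, a different extension, does not count as a match.

```python
def cpuinfo_flags(cpuinfo_text):
    """Collect the feature names from every 'flags' line of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def kernel_reports_sse3(flags):
    """True if the kernel's flag set includes SSE3 (spelled 'pni' in /proc/cpuinfo)."""
    # 'sse3' is accepted as well purely as a precaution
    return "pni" in flags or "sse3" in flags
```

On the workstation this would be run as kernel_reports_sse3(cpuinfo_flags(open("/proc/cpuinfo").read())). If it returns False, the disagreement with gcc and cpuid2cpuflags is real.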
-- ps issues would be the first guess, but the low temps and no crashes under
load don't suggest it; small form-factor case with non-standard PS so swapping
the ps is a pain; past web reports for similar issues have been about cpu
power states or watch dog timers
-- a failing component could be the cause; but how heavy loads could fail to
stress a failing component isn't obvious to me
-- cracked mb trace/solder joint is a possibility since failures happen when idle;
don't have an easy way to check that
-- WAG: maybe there is processor state related to sse3 the kernel isn't saving
because it doesn't think it needs to, but one of the screen savers uses sse3;
will be rebuilding the tool chains and software with sse3 disabled and shutting
off the screen saver to see what happens.
-- gpu issue -- web searches find reports attributing lockups to gpu overload;
desktop is xfce4, we'll see if turning off the screen saver makes a difference;
no crashes when multiple instances of firefox are actively in use w/ hw
acceleration though