[lug] debugging workstation issue

Davide Del Vento davide.del.vento at gmail.com
Fri Apr 10 15:24:00 MDT 2020


Good point about memory; in fact I do have some spare DIMMs around. I don't
think it's the issue, though, because IIRC it was already happening before I
replaced the DIMMs to increase the total amount. Plus, if it were memory,
I'd expect the hangs to happen at random times, not at two very specific
moments.

Regarding the keyboard, I did try that (quite a lot before I replaced the
keyboard), and the system appears to be quite dead.

In a few tests, glxgears appears to work just fine, but then the problem
itself is intermittent (it happens maybe 10% of the time).

I'll explore the serial console option.

Thanks again,
Dav

On Fri, Apr 10, 2020 at 2:09 PM D. Stimits <stimits at comcast.net> wrote:

> +1 to serial console, although it can be somewhat obscure on desktop PCs.
> Serial console is nearly immune to many failures which ssh and other
> mechanisms might have.
>
> Just a thought... if this is an OpenGL screen saver issue, then just run an
> OpenGL app and see if it dies. You can get info about what is present from
> "glxinfo" (part of "glx-utils" on many systems, "mesa-utils" on Debian and
> Ubuntu). You can run "glxgears" (from the same package) to actually render
> something with OpenGL. If it crashes when running glxgears, then you have
> found the culprit. I'd recommend running "sync" twice before running
> "glxgears", and leaving all other apps closed.
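Since the hang only shows up occasionally, a single glxgears run proves little. Below is a minimal sketch of repeating the test, assuming a POSIX shell and the coreutils `timeout` command; RUNS, DURATION and LOG are hypothetical names chosen for this example, not settings from the thread. The "started" line is synced to disk before each run, so after a hard hang the last unfinished entry in the log shows which run was active.

```shell
#!/bin/sh
# Repeat the glxgears test several times, syncing first as suggested.
# RUNS, DURATION (seconds per run) and LOG are hypothetical settings.
RUNS=${RUNS:-10}
DURATION=${DURATION:-30}
LOG=${LOG:-glxgears-test.log}

i=1
while [ "$i" -le "$RUNS" ]; do
    echo "run $i started $(date)" >> "$LOG"
    # Flush dirty buffers twice so the "started" line reaches the disk
    # even if the machine locks up during this run.
    sync
    sync
    # timeout stops glxgears after DURATION seconds. Its exit status is
    # 124 on timeout, 127 if glxgears is not installed, otherwise
    # glxgears' own status if it exits early.
    timeout "$DURATION" glxgears >> "$LOG" 2>&1
    echo "run $i finished with status $?" >> "$LOG"
    i=$((i + 1))
done
```

A healthy run ends with status 124 (stopped by timeout as intended); a log whose last entry is a "started" line with no matching "finished" line points at the run during which the machine hung.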
>
> On April 10, 2020 at 1:55 PM Maxwell Spangler <lists at maxwellspangler.com>
> wrote:
>
> At this point, if I were in your circumstances and I were me, I would:
>
> * Set up a serial port on this system and try to capture some console
> output during the crash via that.
>
> * If it has a serial port, use it. If it doesn't, add a USB-to-serial
> adapter.
>
> * Configure the kernel to use the serial port as a console so that all
> critical console messages go to the serial port.
>
> * Set up a second computer, such as a laptop, to capture the serial output.
>
> * Use the desktop as you normally would. When it crashes, it should still
> output something to the serial port, even if the storage system is
> read-only or broken, or if the problem is a system fault rather than just
> a graphical one, so that nothing gets logged and you can't SSH in.
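For the kernel side of the serial console steps above, here is a sketch assuming GRUB 2, the first on-board serial port (ttyS0) and 115200 baud 8N1; the port name and speed are assumptions to adjust to the actual hardware. Keep whatever GRUB_CMDLINE_LINUX already contains and append these options to it.

```shell
# /etc/default/grub (excerpt): send kernel messages to the serial port.
# Assumes the first on-board port, ttyS0, at 115200 baud, 8N1.
# With several console= options the kernel writes to all of them; the
# last one listed becomes /dev/console. tty0 keeps messages on screen.
# ignore_loglevel prints every kernel message, not just high-priority ones.
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 ignore_loglevel"

# Optional: let GRUB itself use the serial line as well.
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"
```

After editing, regenerate the GRUB configuration (typically `update-grub` on Debian/Ubuntu, `grub2-mkconfig -o /boot/grub2/grub.cfg` on Fedora-like systems) and reboot. With a USB-to-serial adapter the device is usually ttyUSB0 instead; it only works as a console if the kernel was built with CONFIG_USB_SERIAL_CONSOLE, and anything printed before USB is initialized will not appear. On the capturing laptop, a terminal program such as `screen /dev/ttyUSB0 115200` or minicom, connected through a null-modem cable, shows the output.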
>
> If this were a server with an out-of-band management system such as iLO,
> iDRAC or ILOM, I would SSH in from another system, connect to the virtual
> serial console and do the same capture process.
>
> You might also try swapping out components if you have spare parts
> available. Bad memory can cause a system to lock up hard like this, and so
> can PCIe bus issues with video and other cards. I'd replace the memory
> with some spares and try using the system. A desktop is not likely to
> tolerate memory errors well, and even on servers I've seen bad ECC memory
> cause a machine to just hang. In those cases, when the hung server was
> rebooted, the BIOS often flagged the memory as bad and the mystery was
> solved. But on a non-ECC PC with few firmware features to support this
> kind of debugging, you're forced to do much of it yourself.
>
> Also, I wonder if you could plug in a standard (non-Apple) USB keyboard as
> a second keyboard and press CONTROL-ALT-F2 on it to get to a text console?
> If you can't do that on your primary keyboard because it's an Apple one, a
> second keyboard plugged in at the same time might let you.
>
> I hope this helps. Keep us posted if you try more things.
>
>
> On Fri, 2020-04-10 at 07:46 -0600, Davide Del Vento wrote:
>
> Thanks Maxwell and Michael,
>
> The video card in this machine is older than the rest of the box. It's
> an ATI dual-monitor DVI card (details below), and 3D is not enabled. One
> of the first things I tried when this started happening was indeed
> CONTROL-ALT-Fn, and that did not work. I say "did" because I recently
> switched to a more ergonomic Apple keyboard (otherwise I'd have had my
> hands chopped off by the inordinate amount of time I now have to spend
> there...), and on that keyboard the F-keys are not recognized, so this
> test is now moot.
> Still, I think the key point here is that the box does not respond to
> ssh attempts, so it must go into some weird state, not just a "broken
> display mode" one.
>
> Maxwell, I am sorry to hear that even you have to suffer these things.
> Sigh.
>
> Thanks,
> Davide
>
>
> VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV370 GL
> [FireMV 2200] (rev 80)
> Display controller: Advanced Micro Devices, Inc. [AMD/ATI] RV370 GL
> [FireMV 2200] (Secondary) (rev 80)
>
> On Thu, Apr 9, 2020 at 9:17 PM Maxwell Spangler <
> lists at maxwellspangler.com> wrote:
>
> Sometimes on my Intel-based laptop with integrated Intel GMA graphics, the
> GUI display will lock up.
>
> Can you CONTROL-ALT-F2 (or f3, f4, etc) to get to an alternate console?
>
> If so, you can then login on a text console and use 'dmesg' and
> 'journalctl -f' to see what's going on.
>
> Due to bugs in X/GNOME/Intel drivers/whatever, this happens to me a lot,
> and for all my love of Linux I have to reboot often to get back to a
> normal GUI display.
>
> M
>
> On Thu, 2020-04-09 at 16:45 -0600, Davide Del Vento wrote:
>
> Thanks to both of you.
>
> Zan: it is indeed a systemd system and "journalctl -b -1" provides what I
> was looking for. I don't see anything suspicious other than perhaps
>
> Apr 09 13:23:43 buzzicone systemd[1]: Starting Message of the Day...
> Apr 09 13:23:43 buzzicone systemd[1]: Started Message of the Day.
>
> (and then the log ends). The timestamp is around when the problem
> occurred, and I tried to trigger it live by, e.g., opening a new shell,
> starting bash with the -l option, ssh'ing to localhost, and triggering the
> screensaver. Nothing caused it to happen... so that is weird.
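To find the messages from just before a hang, here is a sketch that saves the relevant parts of the previous boot's journal to a file. It assumes a systemd system whose journal is persistent (kept under /var/log/journal); the output file name previous-boot.txt is an arbitrary choice for this example. `journalctl -k` limits the output to kernel messages, i.e. what dmesg would have shown before the reboot.

```shell
#!/bin/sh
# Save the end of the previous boot's journal for later reading.
# Assumes systemd with a persistent journal; run as root (or as a member
# of the systemd-journal group) to see all system messages.
OUT=${OUT:-previous-boot.txt}

if command -v journalctl >/dev/null 2>&1; then
    {
        echo "== boots known to the journal (-1 is the previous one) =="
        journalctl --list-boots --no-pager
        echo "== last 100 lines of the previous boot =="
        journalctl -b -1 -n 100 --no-pager
        echo "== kernel messages (what dmesg showed) from the previous boot =="
        journalctl -k -b -1 --no-pager
    } > "$OUT" 2>&1
else
    echo "journalctl not found; this is probably not a systemd system" > "$OUT"
fi
```

If the journal is not persistent, the `-b -1` queries report that no previous boot is available, and that error ends up in the file too.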
>
> D Stimits: ssh'ing in before the failure assumes that I have a spare
> system to do that from, which unfortunately I don't at the moment (plus,
> as I said, it sometimes happens at boot before I get a chance to ssh into
> it). Thanks for the tip about unplugging and replugging USB devices; I
> will try that next time!
>
> Cheers,
> Davide
>
> On Thu, Apr 9, 2020 at 3:10 PM D. Stimits <stimits at comcast.net> wrote:
>
>
>
> On April 9, 2020 at 2:14 PM Davide Del Vento <davide.del.vento at gmail.com>
> wrote:
> Folks,
>
> My workstation, a desktop-sized server computer used as a desktop, is
> having a very annoying problem, which is even more serious these days now
> that I have to rely on it for basically everything (so far the only thing
> I don't use it for is going to the restroom, but that may change soon)...
>
> Anyway, the problem is that sometimes at boot, and sometimes after the
> screensaver engages, the machine goes into a "dead" state. I suspect it
> may be a hibernation mode or something like that, from which it does not
> wake when I hit the keyboard, mouse or power button. A long press on the
> power button does trigger a reboot, with all the usual consequences of
> such a thing (files possibly left unclosed, filesystem checks, loss of
> unsaved data, etc.). When it is in this state, trying to ssh into it from
> another machine hangs with no response.
>
> It's been a long time since I debugged something like this, and dmesg
> shows only messages since the last reboot, which are clearly useless. Any
> clues on how to look at the logs from immediately BEFORE that? Bonus points
> if you have any ideas on what might be going on or specifically what to
> look for.
>
>
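> Ssh in from another computer and run "dmesg --follow" before the failure
> happens. That other computer, which is displaying the output, will still
> be running when the workstation hangs, so the last messages stay on its
> screen.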
>
> Tip: Often USB devices do not correctly handle or respond to low power
> mode events. If this is the case, then when mouse/keyboard fails to wake up
> the system, you might try to unplug and replug the mouse/keyboard and test
> again if you can now resume.
>
> PS: If I were more motivated I'd edit my window manager and display
> manager code and remove the option to manually sleep/suspend. I hate
> accidentally clicking that instead of log out or shut down or reboot.
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
>
> --
>
>
> Maxwell Spangler
> ===================================================================
> Denver, Colorado, USA
> maxwellspangler.com <http://www.maxwellspangler.com>