[lug] advice on a problem

David L. Anselmi anselmi at anselmi.us
Fri Jul 30 22:10:21 MDT 2010


Steve A Hart wrote:
> Out of the 52 clients, I have 1 system that is frequently locking up for
> no apparent reason at all.  Roughly 1-3 times per week this system just
> goes belly up and locks like it lost the NFS connection to the server.
> The kicker to this is that this system sits next to an exactly identical
> system (hardware and software setup) which acts completely normal and
> does not lock up.  All logs on the problem system show no errors of any
> kind.  Also, both systems are plugged into the same switch so it's not a
> network issue.

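If you swap the network cables where they plug into the 2 machines, you'll learn where the
problem is.  If the lockups stay with the same machine, the problem is in the box.  If they
follow the cable to the other machine, it's in the cable or switch port.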

> My only commonality/trend I see is that when the system locks, the user
> is running a heavy matlab script that displays 20+ plots one right after
> the other so that all 20+ plots are visible in 20+ different windows.
> This script would run successfully multiple times in a week and then at
> some apparently random point, it locks.  It should be noted that if this
> same user runs the same code on the identical system, it runs fine
> every time.

Does it run on the identical system for weeks on end?  Perhaps the lockup only happens when
this particular workload coincides with an inopportune burst of network traffic.

If an identical system works consistently you could at least swap them and solve this user's problem.

There's probably some debugging you could turn on, but I haven't done that myself.  I'd be
curious what's happening on the network when it locks.
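
One rough way to see what the network was doing before a lockup is to sample the
interface byte counters on the problem machine and keep a running log.  The following is
only a minimal sketch, assuming a Linux client that exposes /proc/net/dev.  The function
names, the log path, and the 5 second interval are all made up for illustration, and the
log should live on a local disk rather than on the NFS mount.

```python
import time


def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a full counter line, skip it
        # field 0 is received bytes, field 8 is transmitted bytes
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def byte_rates(prev, curr, seconds):
    """Return {interface: (rx_bytes_per_sec, tx_bytes_per_sec)} between two samples."""
    rates = {}
    for name, (rx, tx) in curr.items():
        if name in prev:
            prev_rx, prev_tx = prev[name]
            rates[name] = ((rx - prev_rx) / seconds, (tx - prev_tx) / seconds)
    return rates


def log_forever(log_path="/var/tmp/netrate.log", interval=5):
    """Append per-interface byte rates to log_path every `interval` seconds.

    log_path is a hypothetical location; pick one on a local disk, not NFS.
    """
    with open("/proc/net/dev") as f:
        prev = parse_proc_net_dev(f.read())
    while True:
        time.sleep(interval)
        with open("/proc/net/dev") as f:
            curr = parse_proc_net_dev(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        # reopen the log each time so every sample is written out promptly
        with open(log_path, "a") as log:
            for name, (rx, tx) in sorted(byte_rates(prev, curr, interval).items()):
                log.write("%s %s rx=%.0f B/s tx=%.0f B/s\n" % (stamp, name, rx, tx))
        prev = curr
```

Calling log_forever() on both the problem machine and its twin, then comparing the last few
entries before a lockup, would at least show whether a traffic burst lines up with it.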

Dave


