[lug] advice on a problem

Dan Ferris dan at usrsbin.com
Fri Jul 30 21:55:08 MDT 2010


Hey Steve,

Have you looked at the network card to see if there are errors on the
interface, or checked it with ethtool to see if you have a duplex mismatch?
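
A minimal sketch of those two checks, assuming Python 3 on a Linux box with
ethtool installed (a stock RHEL 5 Python is too old for this, so there it is
easier to read `ethtool eth0` and `ifconfig eth0` by hand). The function names
and the interface name eth0 are illustrative, not something from Steve's setup.
The counters come from the kernel's /sys/class/net/<iface>/statistics files,
and the link settings are parsed from the text that `ethtool <iface>` prints.

```python
import re
import subprocess
from pathlib import Path

# Kernel counters that tend to climb on a bad cable, port, or duplex mismatch.
ERROR_COUNTERS = ("rx_errors", "tx_errors", "rx_crc_errors",
                  "rx_dropped", "tx_dropped", "collisions")


def read_error_counters(iface, sysfs_root="/sys/class/net"):
    """Return {counter_name: value} for the counters that exist for iface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    counters = {}
    for name in ERROR_COUNTERS:
        path = stats_dir / name
        if path.exists():
            counters[name] = int(path.read_text().strip())
    return counters


def parse_link_settings(ethtool_output):
    """Pull Speed, Duplex and Auto-negotiation out of `ethtool <iface>` text."""
    settings = {}
    for key in ("Speed", "Duplex", "Auto-negotiation"):
        match = re.search(r"^\s*" + re.escape(key) + r":\s*(.+)$",
                          ethtool_output, re.MULTILINE)
        if match:
            settings[key] = match.group(1).strip()
    return settings


def mismatch_hints(counters, settings):
    """Return human-readable hints; an empty list means nothing stood out."""
    hints = []
    if settings.get("Duplex", "").lower().startswith("half"):
        hints.append("link is running half duplex; compare with the switch port")
    nonzero = {name: value for name, value in counters.items() if value > 0}
    if nonzero:
        hints.append("nonzero error counters: %s" % nonzero)
    return hints


def check_interface(iface="eth0"):
    """Run ethtool on iface (may need root) and combine both checks."""
    output = subprocess.run(["ethtool", iface], capture_output=True,
                            text=True, check=True).stdout
    return mismatch_hints(read_error_counters(iface),
                          parse_link_settings(output))
```

Calling check_interface("eth0") on both the problem box and its healthy twin
and comparing the results is the point: counters that only grow on the problem
box, or a half-duplex link on just one of them, would point at the NIC, cable,
or switch port rather than at Matlab.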

Dan

On 07/30/2010 09:13 PM, Steve A Hart wrote:
> Here's the setup:  I've got 52 RHEL 5 32-bit clients localized to a RHEL 
> 5 32-bit server which shares out home directories and /usr/local to the 
> clients.  /usr/local on the server is where programs like Matlab are 
> installed.  One side effect of this setup is that if NFS is interrupted 
> in any way, the clients lock up.  That is not my problem, just the setup 
> to my situation.
>
> Out of the 52 clients, I have 1 system that is frequently locking up for 
> no apparent reason at all.  Roughly 1-3 times per week this system just 
> goes belly up and locks like it lost the NFS connection to the server. 
> The kicker to this is that this system sits next to an exactly identical 
> system (hardware and software setup) which acts completely normal and 
> does not lock up.  All logs on the problem system show no errors of any 
> kind.  Also, both systems are plugged into the same switch so it's not a 
> network issue.
>
> I'm trying to figure out if this system has a hardware issue or if 
> something else is going on.  I've replaced and tested the Crucial memory 
> installed and it tests fine with Memtest86.  I have two internal SATA 
> hard drives and they both test fine using the WD drive tester.
>
> The only commonality/trend I see is that when the system locks, the user 
> is running a heavy Matlab script that displays 20+ plots one right after 
> the other so that all 20+ plots are visible in 20+ different windows. 
> This script would run successfully multiple times in a week and then at 
> some apparently random point, it locks.  It should be noted that if this 
> same user runs the same code on the identical system, it runs fine 
> every time.
>
> Also, I've reloaded this system a couple times now and this last time I 
> ran a multi day test where I worked the system hard by doing the 
> following all at the same time:
>
> 1.  Ran multiple glxgears instances
> 2.  Ran multiple flash-heavy websites (thought it might be a flash issue)
> 3.  Had Matlab open but not running code (I didn't have any usable 
> Matlab code to run)
>
> I ran the above for three days straight and not a single hiccup from the 
> system.  Gave it back to the user to use and within one week it locked up.
>
> I've got the latest NVIDIA driver loaded and running and the system is 
> fully updated.  Here's some of the system info:
>
> * GIGABYTE GA-EP45T-UD3LR motherboard
> * Intel Core 2 Quad Q9550 Yorkfield 2.83GHz
> * 8GB Crucial 240-Pin DDR3 SDRAM DDR3 1333 (PC3 10600)
> * GIGABYTE GV-N95TOC-1GI GeForce 9500 GT video card
> * 3Com Corporation 3c905C-TX/TX-M PCI card (wanted to make sure the 
> onboard NIC was not the culprit)
> * 750W power supply
> * 16GB of swap
>
> I'm open to any and all ideas on this.  I'm frankly out of ideas and the 
> system owners are getting frustrated.  My only thought now is to replace 
> all hardware and see if that does the trick but that seems to be an 
> extreme measure on this unknown.  I'd kill for any error message that 
> would give me a clue as to what's happening.
>
> Any thoughts would be appreciated.
>
> cheers
>
> Steve Hart
>
>   
