[lug] Seeking thoughts on this crash

Gary Hodges Gary.Hodges at noaa.gov
Mon Jan 5 11:41:19 MST 2004


Gary Hodges wrote:

> Peter Hutnick wrote:
>
>> Chuck Morrison wrote:
>>
>>> While my first inclination would be hardware, testing with the 
>>> original - worked once at least - system would point you in the 
>>> right direction, I think. Of course if it's software it may be 
>>> something other than the kernel settings. There could be libraries 
>>> that could be different and cause issues too.
>>
>>
>> Have you eliminated heat (chipset and CPU) as the culprit?
>
>
> After the first crash I started monitoring the CPU temp.  If you 
> believe the accuracy of lmsensors it was at 66.9 deg C once and 66.7 
> deg C another time.  These are within a degree of what the original 1.4 
> Athlon ran at, and the same temp as during my first performance test 
> after installing the new CPU.  The HSF is from PC Power and Cooling 
> and is rated for up to something like 3200+ CPUs.  I did replace a 
> seized chipset fan at the time I replaced the CPU.  I didn't know it 
> had seized before I observed all the fans with the case open.  Maybe 
> the replacement has too.  I've had problems with other computers in 
> the past due to heat, so it is something I'm always worried about.  I 
> wish these CPUs ran cooler, but according to specs even 66 deg C is 
> well under max operating temps.

It looks like heat was playing a part in the crashes.  I changed the 
physical location of the computer and the CPU runs ~4 deg C cooler while 
processing large amounts of data.  I have reprocessed data several times 
now without crashing, so it must be that the CPU was getting too hot.  
Before the break it also locked up while in screensaver mode, which had 
never happened before.  It seems to me that the CPU has become more 
sensitive to heat.  I should probably RMA the bugger.
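
For anyone chasing a similar heat-related lockup, here is a minimal sketch of 
logging the hottest sensor reading while a long job runs, so the last 
temperature before a crash is on record.  It assumes the lm_sensors 
"sensors" command is installed and prints lines shaped like 
"label: +66.9°C (...)"; the actual layout varies by chip and driver, so the 
regular expression is a guess to adapt.  The function names and the 70 deg C 
warning level are hypothetical, not taken from any spec.

```python
import re
import subprocess
import time

# Assumed reading format: "+66.9°C" or "+66.9 C". Adjust for your chip's output.
TEMP_RE = re.compile(r"([+-]?\d+(?:\.\d+)?)\s*°?\s*C\b")


def parse_temps(text):
    """Map each 'label: value' line that carries a Celsius reading to a float.

    Only the first reading on a line is kept, so limits listed in
    parentheses after the current value are ignored.
    """
    temps = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: value" line
        match = TEMP_RE.search(rest)
        if match:
            temps[label.strip()] = float(match.group(1))
    return temps


def hottest(temps):
    """Return (label, degrees C) for the highest reading, or None if empty."""
    if not temps:
        return None
    label = max(temps, key=temps.get)
    return label, temps[label]


def log_temps(interval_s=10, warn_at_c=70.0):
    """Print the hottest reading every interval_s seconds until interrupted.

    warn_at_c is an arbitrary illustrative level, not a vendor limit.
    """
    while True:
        out = subprocess.run(["sensors"], capture_output=True, text=True).stdout
        top = hottest(parse_temps(out))
        if top:
            flag = "  <-- at or above warn_at_c" if top[1] >= warn_at_c else ""
            stamp = time.strftime("%H:%M:%S")
            # flush so the last line survives if the machine locks up
            print(f"{stamp} {top[0]}: {top[1]:.1f} C{flag}", flush=True)
        time.sleep(interval_s)
```

Calling log_temps() in a second terminal with the output redirected to a file 
on disk, then starting the data processing, leaves a timestamped trail to 
compare between the old and new locations of the machine.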

Thanks for all the comments on this.

Gary



