[lug] Seeking thoughts on this crash

Gary Hodges Gary.Hodges at noaa.gov
Wed Dec 31 10:25:58 MST 2003


Chuck Morrison wrote:

> Did you run your test at any time other than the two times mentioned here? 

In a sense, yes.  This morning I was trying to get that data processed 
in a piecemeal fashion and it locked again.

> To be sure, install RH 7.2 again (on a different hard drive, if you 
> have one, to avoid having to reinstall SuSE) and run your test again. 
> If it works fine there, then some kernel tuning might be in 
> order. If it fails under RH 7.2 now, it's hardware. 

As often happens when I send an email, I see a typo the moment I hit 
send.  It was actually a KRUD9 install the first time, though I don't 
think this changes the point of your suggestion.  Regarding your first 
question, I only ran that performance test once while I had KRUD9 
installed, but it ran for 58 minutes if I remember correctly.  Now it locks
fairly quickly.

> You didn't have anything else to do this weekend, did you? 

Let's see...  New Year's....  Drink, recover, drink, recover....

I actually have a spare drive sitting next to me that I could install 
KRUD9 on.  Is there an option during the install to make a boot floppy?  
I don't remember.  I'd like to be able to just do the install, run the 
test, pull that drive and have everything back to normal.

What about a kernel recompile first?  Maybe try an older kernel, or 
whichever one KRUD9 uses?

Gary

>
>
>
> Gary Hodges wrote:
>
>> With RH 7.2:
>>
>> Two months ago I replaced a plain Athlon 1.4 with an OEM Athlon XP 
>> 2600+.  I first flashed the BIOS to a beta version released by 
>> Gigabyte so the CPU would be recognized correctly.  Before and after 
>> installing the new CPU I ran a large process to see what improvement 
>> I would get.  The new CPU ran at near 100% for ~60 minutes during 
>> this test without trouble.
>>
>> Installed SuSE v9.0:
>>
>> Today I was processing data using the same program as above and the 
>> machine locked hard with the following messages displayed in the 
>> xterms:
>>
>> -----------------
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> space kernel: CPU 0: Machine Check Exception: 000000000000004
>>
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> space kernel: Bank 0: c410400000000136 at 0000000009b37000
>>
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> space kernel Bank 1: d400400000000 at 000000001fd97140
>>
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> space kernel Bank 2: f60020000000017a at 000000001ad96080
>>
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> kernel panic: CPU context corrupt
>> ------------------
>>
>> I've done some google searches on "Machine Check Exception" and some 
>> results indicate that "It could be a hardware related problem."  
>> There were very few hits on "CPU context corrupt."  I was able to 
>> repeat the lock-up in three out of four tries.  It typically takes 3-5 
>> minutes for the machine to lock up.  A complete run that I was trying 
>> takes about 15 minutes.  I guess I'll RMA the CPU unless I get some 
>> other suggestions.  Thankfully I paid $2.99 for a 90-day warranty. 
>
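One way to read the hex values in the quoted log is to decode their flag bits. This is a minimal sketch, assuming the "Machine Check Exception" value is the raw IA32_MCG_STATUS register and each "Bank N" value is the raw IA32_MCi_STATUS register, with the architectural bit layout from the Intel/AMD x86 manuals. The names MCI_STATUS_BITS, MCG_STATUS_BITS and decode are hypothetical. Bank 1 is left out because its value, as quoted, has only 13 hex digits and looks truncated.

```python
# Hypothetical helper: names the architectural flag bits set in x86
# machine-check status registers, assuming raw register values as input.

# Flag bits of IA32_MCi_STATUS (one per error-reporting bank).
MCI_STATUS_BITS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context may be corrupt
}

# Flag bits of IA32_MCG_STATUS (global machine-check status).
MCG_STATUS_BITS = {
    0: "RIPV",  # restart IP valid
    1: "EIPV",  # error IP valid
    2: "MCIP",  # machine check in progress
}


def decode(value: int, bits: dict[int, str]) -> list[str]:
    """Return the names of the flags set in value, highest bit first."""
    return [name for bit, name in sorted(bits.items(), reverse=True)
            if (value >> bit) & 1]


if __name__ == "__main__":
    print("MCG_STATUS:", decode(0x4, MCG_STATUS_BITS))  # ['MCIP']
    for bank, status in [(0, 0xC410400000000136), (2, 0xF60020000000017A)]:
        print(f"Bank {bank}:", decode(status, MCI_STATUS_BITS))
    # Bank 0: ['VAL', 'OVER', 'ADDRV']
    # Bank 2: ['VAL', 'OVER', 'UC', 'EN', 'ADDRV', 'PCC']
```

Read this way, bank 2 reports an uncorrected error with the PCC flag set, which would fit the "CPU context corrupt" panic, and RIPV is clear in the global status. This only names the flags; it does not by itself pinpoint which component failed.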
