[lug] Seeking thoughts on this crash
Gary Hodges
Gary.Hodges at noaa.gov
Wed Dec 31 10:25:58 MST 2003
Chuck Morrison wrote:
> Did you run your test any time other than the two times mentioned here?
In a sense, yes. This morning I was trying to get that data processed
in a piecemeal fashion and it locked again.
> To be sure, install RH 7.2 again (on a different hard drive if you
> have one to avoid re-installing SuSE again) and run your test again.
> If it works fine there, then maybe some kernel tuning might be in
> order. If it fails under RH 7.2 now, it's hardware.
As often happens when I send an email, I see a typo the moment I hit
send. It was actually a KRUD9 install the first time, though I don't
think this changes the point of your suggestion. Regarding your first
question, I only ran that performance test once while I had KRUD9
installed, but it ran for 58 minutes if I remember correctly. Now it locks
fairly quickly.
> You didn't have anything else to do this weekend, did you?
Let's see... New Year's.... Drink, recover, drink, recover....
I actually have a spare drive sitting next to me that I could install
KRUD9 on. Is there an option during the install to make a boot floppy?
I don't remember. I'd like to be able to just do the install, run the
test, pull that drive and have everything back to normal.
What about a kernel recompile first? Maybe try an older kernel, or
whatever one KRUD9 uses?
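
For anyone who wants to read the bank values in the quoted messages below by hand, here is a rough sketch of a decoder. It assumes the status words follow the usual x86 machine-check layout (flag bits 63 down to 57, error code in the low 16 bits); I have not checked that against the documentation for this particular CPU, and the function name decode_bank_status is just something made up for illustration. Bank 1 is left out because the value in the message looks truncated.

```python
# Flag bits in the upper part of a machine-check bank status word.
# ASSUMPTION: standard x86 MCi_STATUS layout; not verified for this CPU.
FLAG_BITS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the MISC register holds extra information
    "ADDRV": 58,  # the address printed after "at" is valid
    "PCC": 57,    # processor context corrupt
}


def decode_bank_status(status):
    """Split a 64-bit bank status word into flags and error codes."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    fields["mca_error_code"] = status & 0xFFFF
    return fields


if __name__ == "__main__":
    # Values copied from the syslog messages; Bank 1 is skipped because
    # the value shown there is shorter than 16 hex digits.
    banks = {0: 0xC410400000000136, 2: 0xF60020000000017A}
    for bank, status in banks.items():
        decoded = decode_bank_status(status)
        flags = [name for name in FLAG_BITS if decoded[name]]
        print("Bank %d: flags=%s code=0x%04x"
              % (bank, ",".join(flags), decoded["mca_error_code"]))
```

If that layout is right, Bank 2 is the one with the PCC and UC bits set, which would line up with the "CPU context corrupt" panic line, while Bank 0 shows neither.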
Gary
>
>
>
> Gary Hodges wrote:
>
>> With RH 7.2:
>>
>> Two months ago I replaced a plain Athlon 1.4 with an OEM Athlon XP
>> 2600+. I first flashed the BIOS to a beta version released by
>> Gigabyte so the CPU would be recognized correctly. Before and after
>> installing the new CPU I ran a large process to see what improvement
>> I would get. The new CPU ran at near 100% for ~60 minutes during
>> this test without trouble.
>>
>> Installed SuSE v9.0:
>>
>> Today I was processing data using the same program as above and the
>> machine locked hard with the following messages displayed in the
>> xterms:
>>
>> -----------------
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> space kernel: CPU 0: Machine Check Exception: 000000000000004
>>
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> space kernel: Bank 0: c410400000000136 at 0000000009b37000
>>
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> space kernel Bank 1: d400400000000 at 000000001fd97140
>>
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> space kernel Bank 2: f60020000000017a at 000000001ad96080
>>
>> Message from syslogd at space at Tue Dec 30 11:43:43 2003 ...
>> kernel panic: CPU context corrupt
>> ------------------
>>
>> I've done some google searches on "Machine Check Exception" and some
>> results indicate that "It could be a hardware related problem."
>> There were very few hits on "CPU context corrupt." I was able to
>> repeat the lock-up three out of four tries. It typically takes 3-5
>> minutes for the machine to lock up. A complete run that I was trying
>> takes about 15 minutes. I guess I'll RMA the CPU unless I get some
>> other suggestions. Thankfully I paid $2.99 for a 90-day warranty.
>