[lug] Hard Drive Failure or Software Issue?

Ben bluey at iguanaworks.net
Tue Feb 12 11:52:31 MST 2008


I'm getting the weirdest hard drive problem. Sometimes (only sometimes!) when
I reboot my server (Debian Etch, kernel 2.6.18), the kernel boots fine and
sees my RAID (mirroring on /, /usr, /boot and /home), but during the init
process it says:

hda: dma_intr: status 0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: set_drive_speed_status: status=0x58 {DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
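For anyone decoding the messages above: the kernel prints the set bits of the ATA status register by name, so status 0x58 is DriveReady (0x40) + SeekComplete (0x10) + DataRequest (0x08), and the Error bit (0x01) is clear. Here is a minimal Python sketch of that decoding, assuming the standard ATA status-register bit layout and the flag names the Linux IDE driver uses; `decode_status` is a hypothetical helper, not a kernel or smartctl function.

```python
# ATA status register bits, highest to lowest, labeled with the names the
# Linux IDE driver prints (assumed standard ATA layout).
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status: int) -> list:
    """Return the names of the flags set in an 8-bit ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    # The value from the boot messages: no Error bit, drive still wants data.
    print(decode_status(0x58))  # ['DriveReady', 'SeekComplete', 'DataRequest']
```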

I get the status=0x58 message twice, and from that point on the boot
process runs to completion, except that every executable fails with:

/etc/init.d/rc: line 78: /etc/rcS.d/S05bootlogd: cannot execute binary file

After that, the system is unusable. If I try to log in, the login fails
before I can even enter a password because it can't load its libraries.
The same happens when I hit Ctrl-Alt-Delete to reboot, so I have to
power-cycle the box by hand. When it comes back up, I might get the same
error, but at a slightly different point in the boot sequence (say, before
the step that sets hdparm parameters, instead of during or after it). And
sometimes it boots fine with no errors, and my RAIDs all come up fine and
synced. smartctl reports no errors (although I'm running a long test now).

I'm baffled. If this is hardware, why do I only have problems at boot? If
it's software, why isn't it consistent between boots? And everything is on
RAID, so even if it were a hardware fault, the mirror should be able to
cope, and when it boots fine the arrays don't even need to resync. And why
is the error "cannot execute binary file"? Doesn't that mean it can still
see the file and directory structure? Any ideas?

Thanks,

Ben