[lug] RAID drive failure.
David L. Anselmi
anselmi at anselmi.us
Fri May 25 00:13:20 MDT 2007
I finally have my first drive failure in a RAID. Up to now, in my
limited experience, all RAID failures have been controllers, backplanes,
or personnel. Recovery didn't go quite as smoothly as some in a recent
thread thought it would.
This is in a Dell SC420 with 2 SATA drives in RAID 1.
One of the drives is having trouble and it's hanging the machine. The
last console messages were something about the good drive redirecting
blocks to a different mirror.
The box was hung this morning. A reboot got it up for the day, but it
hung again this afternoon.
Rebooting the machine caused the array to start syncing. I failed the
bad drive with mdadm and I expect that the box will work now until I get
a replacement.
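
For anyone who wants to script the "fail the bad drive" step, here is a
minimal Python sketch. The array name /dev/md0 and the member /dev/sdb1
are hypothetical placeholders (the real device names on this box aren't
given above), and the follow-up --remove step is an addition that is
only needed before physically swapping the disk. It defaults to a dry
run that prints the commands instead of executing them.

```python
import subprocess


def mdadm_fail_commands(array, member):
    """Build the mdadm invocations that mark `member` faulty in `array`
    and then remove it from the array. Nothing is executed here."""
    return [
        ["mdadm", "--manage", array, "--fail", member],
        ["mdadm", "--manage", array, "--remove", member],
    ]


def fail_member(array, member, dry_run=True):
    """Print the commands (dry run) or execute them (requires root)."""
    commands = mdadm_fail_commands(array, member)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if mdadm reports a failure
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical device names; substitute the ones from /proc/mdstat.
    fail_member("/dev/md0", "/dev/sdb1")
```

Double-check which member is the bad one before running this with
dry_run=False, since failing the good half of a two-disk mirror leaves
nothing to run from.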
The Dell diagnostics said the SMART test passed but the read test and
others failed (media problems). Unfortunately Seagate may not provide a
warranty for this drive beyond what Dell provides (the drive is 2 years
old).
Since a bad drive took down the box, I'm guessing Dell hardware
isn't as robust as it could be (the box was $300 and it's pretty nice
for that, so I'm not expecting much more).
mdadm didn't complain about the disk failures or about the syncing on
reboot. Once I failed the drive it emailed me, and now it complains on
boot that the array is degraded. So if anyone has advice on how to make
it more apparent that a drive is having problems, I'd appreciate it
(smartctl shows the errors in the log, so maybe that's a possibility
despite the Dell SMART tests passing).
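
One possible way to act on the smartctl idea is a small Python check
run from cron. This is only a sketch under some assumptions: it expects
"smartctl -l error" to print either "ATA Error Count: N" or "No Errors
Logged", which is what smartmontools normally shows for ATA drives but
may differ by version or drive type. The device name /dev/sda and the
sample text in the demo are hypothetical.

```python
import re
import subprocess

# Assumed output format of `smartctl -l error` for ATA drives.
_ERROR_COUNT = re.compile(r"ATA Error Count:\s*(\d+)")


def ata_error_count(smartctl_output):
    """Return the logged ATA error count, 0 if the log is reported empty,
    or None if the output is not in a recognised format."""
    match = _ERROR_COUNT.search(smartctl_output)
    if match:
        return int(match.group(1))
    if "No Errors Logged" in smartctl_output:
        return 0
    return None


def check_drive(device):
    """Run smartctl against `device` (usually requires root) and parse it.
    smartctl uses non-zero exit codes to flag logged errors, so the
    return code is deliberately not treated as a failure here."""
    result = subprocess.run(
        ["smartctl", "-l", "error", device],
        capture_output=True,
        text=True,
    )
    return ata_error_count(result.stdout)


if __name__ == "__main__":
    # Made-up sample text for illustration, not output from the failed drive.
    sample = "SMART Error Log Version: 1\nATA Error Count: 12\n"
    print(ata_error_count(sample))
    # On a real system: print(check_drive("/dev/sda"))
```

A cron job could mail the result whenever the count is non-zero or
grows. The smartd daemon that ships with smartmontools can also watch
drives and send mail itself, which may be less work than a custom
script.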
Dave