[lug] RAID drive failure.

David L. Anselmi anselmi at anselmi.us
Fri May 25 00:13:20 MDT 2007


I finally have my first drive failure in a RAID.  Up to now, in my 
limited experience, all RAID failures have been controllers, backplanes, 
or personnel.  It didn't go quite as smoothly as some in a recent thread 
thought it would.

This is in a Dell SC420 with 2 SATA drives in RAID 1.

One of the drives is having trouble and it's hanging the machine.  The 
last console messages were something about the good drive redirecting 
blocks to a different mirror.

The box was hung this morning.  A reboot got it up for the day, but it 
hung again this afternoon.

Rebooting the machine caused the array to start syncing.  I failed the 
bad drive with mdadm and I expect that the box will work now until I get 
a replacement.
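
For anyone who hasn't done it, the fail/remove/re-add sequence can be 
sketched roughly as below.  This is only an illustration: the device 
names /dev/md0 and /dev/sdb1 and the helpers swap_commands and run_all 
are hypothetical, not taken from this machine, and by default the 
commands are only printed, not run.

```python
import subprocess

def swap_commands(array, bad_dev, new_dev):
    """Build the mdadm calls to fail, remove, and later replace a member."""
    return [
        ["mdadm", array, "--fail", bad_dev],    # mark the member faulty
        ["mdadm", array, "--remove", bad_dev],  # detach it from the array
        ["mdadm", array, "--add", new_dev],     # add the replacement; resync starts
    ]

def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

if __name__ == "__main__":
    # Hypothetical names: md0 is the mirror, sdb1 the failing partition.
    run_all(swap_commands("/dev/md0", "/dev/sdb1", "/dev/sdb1"))
```

The --add step only makes sense once the replacement disk is installed 
and partitioned to match the surviving drive.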

The Dell diagnostics said the SMART test passed but the read test and 
others failed (media problems).  Unfortunately Seagate may not provide a 
warranty for this drive beyond what Dell provides (the drive is 2 years 
old).

I'm guessing that, since a bad drive took down the box, Dell hardware 
isn't as robust as it could be (the box was $300 and it's pretty nice 
for that, so I'm not expecting much more).

mdadm didn't complain about the disk failures or the syncing on reboot. 
Once I failed the drive it emailed me, and now it complains on boot that 
the array is degraded.  So if anyone has advice on how to make it more 
apparent that a drive is having problems, I'd appreciate it (smartctl 
shows the errors in the log, so maybe that's a possibility despite the 
Dell SMART tests passing).
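
One way the smartctl idea might look is a small check that could run 
from cron.  This is a sketch under assumptions, not a tested setup: it 
assumes that "smartctl -l error" prints either an "ATA Error Count: N" 
line or "No Errors Logged", and the function names and /dev/sda are 
made up for illustration.

```python
import re
import subprocess

def ata_error_count(log_text):
    """Return the logged error count, 0 if none, or None if unrecognized."""
    m = re.search(r"ATA Error Count:\s*(\d+)", log_text)
    if m:
        return int(m.group(1))
    if "No Errors Logged" in log_text:
        return 0
    return None  # output not understood (e.g. smartctl failed to run)

def check_drive(device):
    """Run smartctl on one device and return its logged error count."""
    # smartctl can exit non-zero when it finds problems, so no check=True.
    result = subprocess.run(["smartctl", "-l", "error", device],
                            capture_output=True, text=True)
    return ata_error_count(result.stdout)

if __name__ == "__main__":
    count = check_drive("/dev/sda")  # hypothetical device name; needs root
    if count is None:
        print("could not read the SMART error log")
    elif count > 0:
        print("drive has logged %d errors" % count)
```

Cron mails any output to the job's owner, so printing only when 
something is wrong would make a drive problem show up as an email.  The 
smartd daemon from the same smartmontools package does similar 
monitoring and may be the simpler route.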

Dave


