[lug] RAID drive failure.

steve at badcheese.com
Fri May 25 13:41:35 MDT 2007


Ahaa!  That's what I used to get when I used software RAID too (but Sean 
insisted that the IDE controller tells Linux that it's OK - so there, nya 
nya)!  Linux doesn't know that the drive has failed, keeps trying to use 
it, and basically hangs the system.  Manually taking the drive out of the 
array brings the machine back up, unlike hardware RAID.
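
For anyone who hasn't done the "take the drive out of the array" step by 
hand, here is a minimal sketch of it in Python.  It only builds the two 
mdadm manage-mode invocations (--fail, then --remove) and prints them 
unless told otherwise.  The device names /dev/md0 and /dev/sdb1 and the 
helper names are placeholders made up for illustration, not anything from 
David's box.

```python
import subprocess


def fail_and_remove_cmds(array, member):
    """Build the mdadm commands that mark a member faulty, then detach it."""
    return [
        ["mdadm", "--manage", array, "--fail", member],
        ["mdadm", "--manage", array, "--remove", member],
    ]


def run(cmds, dry_run=True):
    """Print each command, or actually execute it when dry_run is False."""
    for argv in cmds:
        if dry_run:
            print(" ".join(argv))
        else:
            # Needs root; check=True stops at the first command that fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder device names (assumptions); substitute your own.
    run(fail_and_remove_cmds("/dev/md0", "/dev/sdb1"))
```

The dry run is the default on purpose, since failing the wrong member of a 
two-disk mirror is not something you want to do by accident.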

Sean - I actually respect your work and your experience, so don't take my 
opposing viewpoint as anything negative, but I couldn't resist validating 
my experience, which is the same as David's.  :)

- Steve

On Fri, 25 May 2007, David L. Anselmi wrote:

> Date: Fri, 25 May 2007 00:13:20 -0600
> From: David L. Anselmi <anselmi at anselmi.us>
> Reply-To: "Boulder (Colorado) Linux Users Group -- General Mailing List"
>     <lug at lug.boulder.co.us>
> To: "Boulder (Colorado) Linux Users Group -- General Mailing List"
>     <lug at lug.boulder.co.us>
> Subject: [lug] RAID drive failure.
> 
> I finally have my first drive failure in a RAID.  Up to now in my limited 
> experience all RAID failures have been controllers, backplanes, or personnel. 
> It didn't quite work as smoothly as some thought it would in a recent thread.
>
> This is in a Dell SC420 with 2 SATA drives in RAID 1.
>
> One of the drives is having trouble and it's hanging the machine.  The last 
> console messages were something about the good drive redirecting blocks to a 
> different mirror.
>
> The box was hung this morning and a reboot got it up for the day but it hung 
> again this afternoon.
>
> Rebooting the machine caused the array to start syncing.  I failed the bad 
> drive with mdadm and I expect that the box will work now until I get a 
> replacement.
>
> The Dell diagnostics said the SMART test passed but the read test and others 
> failed (media problems).  Unfortunately Seagate may not provide a warranty 
> for this drive beyond what Dell provides (the drive is 2 years old).
>
> I'm guessing that since a bad drive took down the box Dell hardware isn't as 
> robust as it could be (the box was $300 and it's pretty nice for that, so I'm 
> not expecting much more).
>
> mdadm didn't complain about the disk failures or syncing on reboot. Once I 
> failed the drive it emailed me and now it complains on boot that the array is 
> degraded.  So if anyone has advice on how to make it more apparent that a 
> drive is having problems I'd appreciate it (smartctl shows the errors in the 
> log, so maybe that's a possibility despite the Dell SMART tests passing).
>
> Dave
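
On making a degraded array more apparent: one possibility is to watch 
/proc/mdstat, where a missing member shows up as an underscore in the 
status brackets (for example [U_] instead of [UU]).  Below is a rough 
sketch, not a tested monitoring tool.  The sample text and the function 
name are made up for illustration, and the parsing assumes the usual 
mdstat layout.

```python
import re

# Illustrative /proc/mdstat contents (made up, not from the machine above).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      244195904 blocks [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      1951808 blocks [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: status} for arrays whose status has a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status looks like [UU] or [U_] at the end of the "blocks" line.
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded[current] = status.group(1)
    return degraded


if __name__ == "__main__":
    # On a real system, read the text from /proc/mdstat instead of SAMPLE.
    for name, status in degraded_arrays(SAMPLE).items():
        print(f"WARNING: {name} is degraded [{status}]")
```

Run from cron and mailed on any output, something like this would flag a 
degraded array without waiting for the next reboot.  It would not catch a 
drive that is failing reads but has not yet been kicked out of the array, 
so checking the smartctl error log as well still seems worthwhile.
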

-- 
EMAIL: (h) steve at badcheese.com  WEB: http://badcheese.com/~steve



