[lug] RAID drive failure.

Fri May 25 17:59:35 MDT 2007

On Fri, May 25, 2007 at 03:41:35PM -0400, steve at badcheese.com wrote:
>Sean - I actually respect your work and your experience, so don't take my 
>opposing viewpoint as anything negative, but I couldn't resist validating 

Indeed, it's good to hear some other experience.  As I said, I rarely have
hard drive failures on our systems, so I don't get much of a chance to test
real failures.  I've maybe had 20 IBM/Hitachi drives fail over the last 10
years, out of probably 400+ drives.

That said, we have had a couple of instances where a drive reported errors,
md detected them, and md dropped the drive out of the array.  So over the
last couple of years we definitely have seen software RAID correctly remove
a mirror without locking up the system.
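
For anyone who wants to notice when md has dropped a member, here is a
rough sketch (not what we actually run) that scans the text of
/proc/mdstat for arrays whose status field shows a missing member, such as
[U_].  The function name and the sample text are made up for illustration,
and it assumes arrays are named mdN.

```python
import re

def degraded_arrays(mdstat_text):
    """Return {array_name: status} for arrays with a missing member.

    Expects text in the format of /proc/mdstat.  A status such as
    "U_" means one member of a two-member array is gone.
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Start of a new array stanza, e.g. "md1 : active raid1 ..."
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded[current] = status.group(1)
    return degraded

# Made-up sample in /proc/mdstat format: md1 has lost one mirror half.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      78043648 blocks [2/1] [U_]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a real system you would pass in open("/proc/mdstat").read() instead of
SAMPLE, probably from cron, and mail yourself when the result isn't empty.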

The funny thing is that about a month ago we got a batch of around a dozen
hard drives that were basically DOA.  Some were completely dead and
wouldn't even be detected by the controller.  Others would work, but ran at
roughly a tenth of the speed they should have (6MB/sec instead of 55MB/sec
when running badblocks).  We had more failures in this one batch of drives
than we have had in production over the last several years.  Of course,
these drives didn't make it into production; our testing found them before
they even got close.
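
To give an idea of the kind of check that catches a slow drive, here is a
minimal sketch.  This is not our actual burn-in test (the numbers above
came from badblocks); the function names, the 64MB read size, and the 50%
cutoff are all assumptions picked for illustration.

```python
import time

def read_throughput_mb_s(path, total_bytes=64 * 1024 * 1024,
                         chunk=1024 * 1024):
    """Sequentially read up to total_bytes from path; return MB/sec."""
    done = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while done < total_bytes:
            data = f.read(min(chunk, total_bytes - done))
            if not data:
                break  # hit end of file or device
            done += len(data)
    # Guard against a zero interval on very small reads.
    elapsed = max(time.monotonic() - start, 1e-9)
    return done / (1024 * 1024) / elapsed

def is_suspect(measured_mb_s, expected_mb_s, min_fraction=0.5):
    """True if the drive is far slower than expected (assumed cutoff)."""
    return measured_mb_s < expected_mb_s * min_fraction

# The slow drives from the bad batch: 6MB/sec where 55MB/sec was expected.
print(is_suspect(6, 55))
```

Pointing read_throughput_mb_s at a raw device (for example /dev/sdb, as
root) gives a rough number.  Reading a regular file mostly measures the
page cache, so don't trust that for drive testing.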

I don't think this was the manufacturer's fault, though; I think it was the
place we got them from.  We've switched to buying case quantities of drives
from a huge vendor, which means we're getting them in the original sealed
Hitachi boxes.  No re-packaging means less of an opportunity for someone to
play hockey with the drives before packing them.

Sean
-- 
 eth0.666: VLAN of The Beast.
Sean Reifschneider, Member of Technical Staff <jafo at tummy.com>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability
      Back off man. I'm a scientist.   http://HackingSociety.org/