[lug] Backup

Daniel Webb lists at danielwebb.us
Thu Dec 22 20:08:11 MST 2005


On Thu, Dec 22, 2005 at 04:39:59AM -0700, Sean Reifschneider wrote:

> Perhaps it's time to look at a different drive vendor, or making sure that
> your drives are properly cooled?  One of the few failures of a hard drive I
> had was in my laptop, a few weeks after I dropped it...  I can't really
> complain about that one.  I have been using Hitachi/IBM drives, and they
> seem to be quite solid.  As I said, something around 100 of them in service
> with few problems.
> 
> I *HAVE* found cooling to be a problem in some cases.  If you don't keep
> the drives cool to within their specs, they *WILL* fail and fail quickly.
> Though the IBM drive I last had that problem with gave me several years of
> additional service after I let it cool down and then kept it cool.
> 
> On that machine I had a couple of open drive bays because I lost the
> plastic covers.  I ended up covering those bays with cardboard once I
> noticed how hot the drive was getting, and after that it was just fine.
> That was one of those IBM hard drives that everyone seemed to be having
> problems with, so I wondered if cooling might have been the common problem
> there.
> 
> I recently bought a cheap case and found that it didn't have reasonable
> ventilation for any of the drive bays it had internally.  I was planning to
> install a SATA hard drive bay system anyway, and that keeps them cool.

You know, I'll bet that has a lot to do with it.  Until a couple of years ago
I never paid much attention to the case or case fans, and my drives did run
fairly hot.  My current case seems to do better: I've opened it up and the
drives were merely warm to the touch.  I also haven't had any failures for a
while.  Hmmm...

> >I buy the cheapest drives, though, so maybe that makes it more likely?
> 
> Cheapest how?  ;-/

In the past, I had a lot more time than money, so it wasn't a big deal.
Nowadays, I would be willing to pay more if I knew the failure rate was
significantly lower.  Has anyone done a study on that, or is all the
real-world MTBF data anecdotal?

> >Like I said before, I've
> >seen about a dozen drives go belly up in 19 years, probably 1/3 of the drives
> 
> Well, 19 years ago drives were incredibly different.  In fact, Tandem has a
> paper about hardware changes just from 1985 to 1990 and how they
> dramatically reduced overall system problems to the point that operator
> error was now dominating the outage causes, and hardware failures were
> greatly reduced.  And that was just 5 years, 20 years ago.
> 
> Today's drives are a whole different breed.  Just 10 years ago I was
> managing a cluster with 64 4GB drives in it (taking up the space of a
> refrigerator), and we could pretty much expect one drive to fail every 2 to
> 3 months...  Current drives over the last 5 years have been much less of an
> issue for me.  Maybe I'm just extremely lucky.

Yeah, I think I've seen fewer failures as time goes on too, now that I think
about it.  It used to be just about a yearly thing.

> >What *are* the best RAID monitoring solutions or techniques for Linux software
> >RAID?
> 
> mdadm works just fine for software RAID.

When setting up my RAID, I just installed the mdadm Debian package, followed
the howto, and I can see in the ps output that mdadm is running.  Is that all
there is to it?  I suppose it will email root if anything goes wrong (it has
"-m root")?  I guess I'm just treating it as a novelty; I won't really believe
it works until a drive fails and it recovers as advertised.  I have much more
faith in my backup strategy to save me should anything bad happen.
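One sanity check I've seen suggested is to make the monitor send a test alert
so the mail path gets exercised before a real failure.  Something along these
lines, assuming the Debian package keeps its config in /etc/mdadm/mdadm.conf
and that mail to root is actually delivered:

  # In /etc/mdadm/mdadm.conf -- where alert mail goes
  MAILADDR root

  # Send a test alert for each array found, then exit
  mdadm --monitor --scan --test --oneshot

  # Check array status at any time
  cat /proc/mdstat
  mdadm --detail /dev/md0

If the test message never shows up in root's mailbox, the real DegradedArray
alert probably won't either.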



