[lug] RAID installation on Fedora 6 Zod

Nate Duehr nate at natetech.com
Fri May 11 15:24:08 MDT 2007


> Although "me too" posts are typically silly, I concur with Kevin (and
> Sean) here.  I've been using software RAID for many years-- across many
> kernel upgrades-- and never once had a problem.  The tools are awesome,
> and in the few instances where I had a drive go bad, I was able to drop
> in a new drive, reboot, and watch as the machine automatically realized
> there was a new drive and started its resynchronization.
>
> Given my experience with Linux software RAID, I've never had a need to
> even look into hardware RAID solutions.
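
(Footnote for the archive, since the subject line mentions Fedora: on a
plain mdadm setup, the rebuild Kevin describes looks roughly like the
commands below.  This is only a sketch; /dev/md0 and /dev/sdb1 are
placeholder names, substitute whatever your box actually uses.)

    # see which arrays are degraded and how far along any resync is
    cat /proc/mdstat
    mdadm --detail /dev/md0

    # after swapping the bad disk and partitioning it to match its mirror,
    # add it back and the kernel kicks off the resync on its own
    mdadm --manage /dev/md0 --add /dev/sdb1

    # watch the rebuild percentage climb
    watch cat /proc/mdstat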

If you have the luxury of being allowed to reboot, you're fine with most
software RAID implementations.

Perfectly working hardware RAID is important in "no downtime"
environments, like telecom, banking, and a few others.

I have customers' machines that haven't been rebooted in years.

(When the box is capable of billing $50k-$100k an hour during peak traffic
time... it doesn't ever get turned off.  Sure wish they'd send me the
checks from that! Ha.)

Hot-swap that works flawlessly is obviously a requirement in that
environment.  These machines go a step further: many are in dual-machine
hot-standby (cluster) configurations.  If a single box has a fault, over
to the "good" box you go... until you can figure out what's wrong with box
#1.

(The ultimate no-no in telecommunications is failing to answer a phone
line that has an incoming call... uptime is paramount, so minor bugs are
regularly overlooked or the patch for the bug isn't applied until the call
traffic can be moved to another platform.  Engineers fix bugs anywhere
from six months to two years before the patches can actually be applied to
the production environment.  Seriously.  It's strange... you get used to
it.)

It's really common to see a platform and a whole technology/version of the
product get retired before the machine has been booted ten times... not
including the test boots with remote power switches during hardware
testing at initial turn-up.

Booting one of these Sun boxes we work on here typically requires
Director-level approval or higher and is a highly documented process for a
very specific purpose.  Booting it because of an unscheduled hardware
failure is typically escalated to someone at VP level or higher.

Even loading a shell script to do something useful... clean up files...
rotate a log someone forgot about... etc... anything that touches cron...
must be dry-run in a lab environment on both our side and the customer's,
and the full written procedure for every command that will be typed must
be signed off by at least three people, before the work can be scheduled. 
No one has root, and root is doled out by the customer's techs to us at
the beginning of the maintenance window, and the "temporary root access"
is turned off/re-locked-down as soon as the maintenance is completed.

Just sharing... there are environments where Linux software RAID is a
kiddie toy... (GRIN)...

But for home/small biz users... it's gotten very good.  I'd use it to
mirror a home desktop machine or a small server without any qualms... but
having hardware RAID and hot-swappable drives for years will have a
tendency to make you wish for that, even at home... (BIG GRIN).
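
(If anyone reading this in the archive wants to try it on that Fedora box:
a basic two-disk mirror with mdadm is roughly the following.  Sketch only;
the device names are placeholders, adjust them for your own disks and back
everything up first.)

    # build a RAID1 set out of two equal-sized partitions
    # (partition type fd, "Linux raid autodetect")
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2

    # put a filesystem on it, and record the array so it assembles at boot
    mkfs.ext3 /dev/md0
    mdadm --detail --scan >> /etc/mdadm.conf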

Realistically, money is the problem -- and I have a server at home on
Debian that is backed up (heavily and often) and the backups have been
tested, at least a few times... and I just hate knowing that a drive
will fail in it "someday"... I keep meaning to upgrade it, but...
well, it's a money vs. time vs. benefit analysis I can't justify yet.

-- 
Nate Duehr, nate at natetech.com



