[lug] RAID redundancy with boot sectors?

Nate Duehr nate at natetech.com
Sun Nov 26 02:09:40 MST 2006


Dan Ferris wrote:
> With Red Hat based distros, if you decide to use RAID, the /boot 
> partition can only go on RAID 1; the installer won't accept any other 
> RAID level for it.  When Red Hat goes to format the partitions, it will 
> install grub on both partitions that are going to be the /boot RAID 1 
> mirror.
> I've had it screw up as well.  When that happens, you just use dd like 
> this:
> 
> dd if=/dev/sda of=/dev/sdb bs=512 count=1
> 
> If the boot drive dies, you just go into the BIOS and tell the PC to 
> boot off of the second drive and you should be good to go.  It works 
> especially great with SATA drives.  I've done this procedure twice: 
> once where Fedora didn't install grub on both drives and I had to 
> manually install it, and once where it worked flawlessly.  Since your 
> /boot is mirrored, the kernel images will all be where you expect, you 
> just have to make sure to install grub.
> 
> This should work even if you are using RAID 5, since you can only mirror 
> /boot.
> 
> Incidentally, this is a good reason to always have a separate /boot 
> partition.
> 
> Another thing is that if you have a RAID 1 mirror and a drive dies, you 
> can swap the bad one out, dd copy the partition table from the good 
> drive to the new drive and you should be able to rebuild the mirror very 
> rapidly.
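
A minimal sketch of what the quoted dd command copies, run against scratch 
image files rather than real drives.  The file names good.img and new.img 
are stand-ins for /dev/sda and /dev/sdb, and the BOOTCODE and PTABLE 
markers are made up for illustration.  On a classic MBR disk, sector 0 
holds 446 bytes of boot code, a 64-byte partition table starting at byte 
446, and a 2-byte signature, so bs=512 count=1 carries over the boot code 
and the partition table together.

```shell
#!/bin/sh
# Sketch only: works on scratch image files, never on real devices.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
good="$work/good.img"   # hypothetical stand-in for /dev/sda
new="$work/new.img"     # hypothetical stand-in for /dev/sdb

# Two blank 1 MiB "disks".
dd if=/dev/zero of="$good" bs=1024 count=1024 2>/dev/null
dd if=/dev/zero of="$new"  bs=1024 count=1024 2>/dev/null

# Made-up markers: one where boot code starts (byte 0) and one where
# the MBR partition table starts (byte 446).
printf 'BOOTCODE' | dd of="$good" bs=1 conv=notrunc 2>/dev/null
printf 'PTABLE'   | dd of="$good" bs=1 seek=446 conv=notrunc 2>/dev/null

# Same shape as the quoted command: copy only the first 512-byte sector.
# conv=notrunc is needed here because the target is a regular file;
# without it dd would truncate the image to 512 bytes.
dd if="$good" of="$new" bs=512 count=1 conv=notrunc 2>/dev/null

# Compare sector 0 of both images by checksum.
sum_good=$(dd if="$good" bs=512 count=1 2>/dev/null | cksum)
sum_new=$(dd if="$new" bs=512 count=1 2>/dev/null | cksum)
if [ "$sum_good" = "$sum_new" ]; then
    echo "sector 0 matches"
else
    echo "sector 0 differs"
fi
```

Because the partition table rides along in the same sector, a copy like 
this only makes sense when the new drive is meant to have the same layout 
as the good one, which is the case for a mirror.  On real drives, 
double-check which device is if= and which is of= before running it, since 
swapping them overwrites the good disk's boot sector.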

Hi Dan,

I messed around with this for a while (software RAID 1 mirroring on 
Linux) and ultimately came to the conclusion that it was utterly 
useless.  But here are my criteria...

In the case of IDE or SATA, a disk failure tends to lock the machine up 
anyway if you're using bog-standard (non-hardware-RAID) interfaces.  
Even if it doesn't lock up, the box is usually so degraded that it 
isn't working right.

"Real" RAID's whole purpose in life was to add uptime.  A disk fails, 
it's removed and replaced seamlessly.  There's almost no way to do this 
with the "Simple" software RAID.  Either you're swapping cables, or 
pulling a disk out and rebooting (possibly with an fsck pass if the box 
went down in a particularly nasty way), etc.  So you don't gain the 
"uptime" factor.

An equally simple setup, with almost the same amount of downtime, is 
just keeping good backups.

What I've found in practice is that the MTBF (mean time between 
failures) of most consumer-grade drives is on the order of years.  
When they do finally fail, drive technology is so far ahead of where 
it was that an upgrade is almost always driven by the price point of 
current hardware.

So for non-critical machines where "Real" RAID can't be purchased, you 
end up no worse off buying a pair of cheap bigger drives, reloading, and 
restoring backups... and now you have three drives (one smaller than the 
other two) and you usually didn't take that much more downtime than 
you would have messing around rebuilding the software RAID.

So I've kinda set the rule for my personal systems that:
1. Backups of some sort are a requirement... of everything.
2. Software RAID doesn't buy me much, other than a lot of wasted time 
setting it up and fixing it when a drive finally munches itself.
3. Hardware RAID is required for machines that shouldn't ever go down... 
and preferably it should be with hot-swappable disks.

Finding the funds for #3 will determine if the machine is really "that 
critical".  :-)

So instead of tinkering with Software RAID (which *is* fun, I'll admit), 
nowadays I'd rather use the second big disk for a nice backup scheme.

An even bigger time-waster appears to be "fakeraid" chipsets like the 
HighPoint ones and others.  I spent a couple hours researching that whole 
mess tonight for a friend who just happens to have one of those (crappy) 
cards.  I realized that in order to recover from a failure on the box 
he's building, since he's not a Linux guru type, we'd have to come up 
with a huge document that described how to rebuild the machine and get 
the "fakeraid" card working again, just to see his data if he ever had 
major problems.  Ick.

I think he'd be better off buying a good 3Ware card and having two days 
of his life back to do better things with.  :-)

Nate


