[lug] Robust storage

Nick Golder nrg at nirgo.net
Mon May 2 14:04:52 MDT 2005


On 2005-05-02 01:58 -0600, Sean Reifschneider wrote:
> On Mon, May 02, 2005 at 12:25:09AM -0600, Dean Brissinger wrote:
> Yeah, livejournal had such good luck with their systems during their recent
> power-failure...
> 

Fair enough.

> To be honest, I've actually been moving away from hardware RAID these days.
> Even with 3ware, where the Linux driver support is pretty good, we've had
> problems where the utilities aren't up to the job, and we end up having to
> boot into the BIOS or the like.
> 

What kind of issues were you seeing?

> I've had extremely good luck with Linux software RAID over the same time
> period.  Add to that the fact that software RAID is well documented, if
> only through the code.  A local guy lost his 3ware RAID array in his home
> box last summer; it just got confused and he couldn't recover it.  The data
> was basically intact, but to bring the array back up he needed to poke the
> appropriate bits onto a new drive to tell the controller about it, and
> 3ware wouldn't document those parts of the drive.
> 

What kind of disks are you using with Linux software RAID?  Do you pass
any special hdparm settings?

> Really, the battery backed up cache on the controller is for performance,
> not safety.  It allows the controller to confirm data written to the disc
> when it gets into the cache, instead of the normal case where it waits for
> it to get on the disc.  The problem is that the added complexity can lead
> to other subtle problems, as we saw with livejournal.
> 

The write cache is most definitely for performance.  However, the
battery backup is for safety, in case power is lost before the cache
has been written to disk.  In livejournal's case, they had write
caching enabled on both the RAID card and the drives; I can't recall a
situation in which everyone lying to each other was a good thing. ;-)
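To make the "everyone lying to each other" point concrete, here is a toy model of write-back caching, assuming a RAID card with an optional battery-backed cache sitting in front of a drive with an optional volatile write cache. The classes `Drive` and `Controller` and the method `power_cycle` are hypothetical illustrations, not the behavior of 3ware or any real firmware.

```python
class Drive:
    """Hypothetical disk with an optional volatile (no battery) write cache."""

    def __init__(self, write_cache: bool) -> None:
        self.write_cache = write_cache
        self.cache: dict[int, bytes] = {}
        self.platter: dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        if self.write_cache:
            self.cache[block] = data    # reported done while still in RAM
        else:
            self.platter[block] = data  # write-through: done means on disk

    def power_loss(self) -> None:
        self.cache.clear()  # no battery on the drive: cached blocks vanish


class Controller:
    """Hypothetical RAID card with a write-back cache, optionally battery backed."""

    def __init__(self, drive: Drive, battery_backed: bool) -> None:
        self.drive = drive
        self.battery_backed = battery_backed
        self.cache: dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> str:
        self.cache[block] = data
        return "ack"  # confirmed to the OS before it ever reaches the drive

    def flush(self) -> None:
        for block, data in self.cache.items():
            self.drive.write(block, data)
        self.cache.clear()  # the controller now considers these blocks safe

    def power_cycle(self) -> None:
        """Lose power, come back up, and replay whatever survived."""
        self.drive.power_loss()
        if not self.battery_backed:
            self.cache.clear()  # acknowledged writes are simply gone
        self.flush()  # battery-preserved blocks are handed to the drive again
```

In this sketch, a battery-backed controller with a write-through drive keeps every acknowledged write across a power cycle. Turning on the drive's write cache breaks that: once the controller flushes a block to the drive, it drops its own copy, and the drive's unprotected cache loses it on power failure even though the controller's battery was fine.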

> I need to try reiser again.  I steered clear of it for quite a while after
> a year where all of the tummy.com folks ran into serious file-system
> corruption on our laptops running Reiser.  That was around 4 years ago now
> though.  It's time to try again.
> 

IMHO, reiserfs has matured well over the last couple of years.  What
other file systems can use "atomic" and "dancing trees" as their
selling points?

> The real trick is to run regular validations against the backups.  Files
> whose contents have changed but whose mtime hasn't probably indicate a problem.
> 
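A minimal sketch of that validation, assuming the backup is a plain directory tree mirroring the live one: walk the live tree, and flag any file whose contents differ from the backup copy while its mtime is identical. The function names are hypothetical; SHA-256 is an arbitrary choice of content hash, and comparing mtimes at whole-second resolution is an assumption, since some backup tools don't preserve sub-second timestamps.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def suspicious_files(live_root: Path, backup_root: Path) -> list[str]:
    """List relative paths whose contents differ from the backup copy even
    though the mtime (compared to the whole second) is unchanged.

    A normal edit bumps mtime, so a content change with the same mtime
    hints at silent corruption on one side or the other.
    """
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(live_root):
        for name in filenames:
            live = Path(dirpath) / name
            rel = live.relative_to(live_root)
            backup = backup_root / rel
            if not backup.is_file():
                continue  # created since the backup; nothing to compare
            same_mtime = int(live.stat().st_mtime) == int(backup.stat().st_mtime)
            if same_mtime and file_digest(live) != file_digest(backup):
                flagged.append(rel.as_posix())
    return sorted(flagged)
```

Files that were legitimately edited get a new mtime and are skipped, so anything this reports deserves a closer look rather than an automatic restore.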

Time for a round table discussion on filesystems?

-- 
-Nick Golder