[lug] Robust storage

D. Stimits stimits at comcast.net
Sun May 1 15:54:04 MDT 2005


Daniel Webb wrote:
> I just noticed I have file corruption in an old mailing list archive (gzip
> fails about 1/4 of the way through).  It's not one I care about much,
> but it's got me thinking about the issue in general.  I have no idea
> when this corruption happened, sometime in the last two years.  Here are
> some questions I have for the experts out there:
> 
>  * How can I know if files have been corrupted through hardware errors?
>    Would Linux software RAID have prevented this?

I doubt RAID would help. If anything it would add complications, since 
your corruption was at the file level and not the filesystem level. RAID 
keeps redundant copies and so can detect a mismatch between mirrors, 
which might help if, say, a disk defect caused the error. But most 
defects don't show up as a simple bit alteration; they tend to show up 
as fatal filesystem errors (which might be fixed by bad block 
relocation).
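
As a rough sketch of what "detecting a mirror mismatch" amounts to, here 
is an illustration in Python. It is not how the Linux RAID driver works 
internally; the function name and the 4096-byte block size are made up 
for the example, and it compares two ordinary files standing in for the 
two halves of a mirror.

```python
BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def mismatched_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the indexes of blocks whose contents differ between two copies."""
    bad = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break  # both copies are exhausted
            if block_a != block_b:
                bad.append(index)  # the copies disagree on this block
            index += 1
    return bad
```

Note that a mismatch only says the copies differ. With just two copies 
there is nothing here to say which one is right.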

>  * How can I know if files have been corrupted by bugs in the low-level
>    block drivers (the filesystem drivers or in my case drbd)?
>    Would Linux software RAID have prevented this?  What happens if the
>    corruption is caused by the RAID driver?

I'm guessing what you need is closer to filesystem journaling, and that 
RAID itself will only help with overt failures. You can use ECC RAM to 
avoid random bit failures in RAM. Believe it or not, cosmic rays 
(radiation reaching the ground from space) can apparently cause a rare 
random bit alteration, and do so more often at higher altitudes like 
the Boulder area than at sea level.

>  * What are some inexpensive solutions to this problem?

Journaling filesystems, and a UPS that guarantees full power during 
brownouts and low-voltage situations. Brownouts and low voltage in 
general are a big danger to data because the hardware doesn't 
necessarily know the voltage is too low, and data can be altered without 
any actual failure. Not every UPS provides power during low voltage; 
cheaper ones provide power only during an outright outage. Never go 
without a UPS, and never use a UPS that doesn't handle brownouts. Along 
similar lines, if you have a marginal power supply that can under some 
circumstances produce its own undervoltage (such as under the surge 
current drawn by a CD drive or HD starting up), a UPS won't help. Don't 
use an underrated power supply.

Among journaling filesystems, beware that almost all of them journal 
metadata only, not full data (full data journaling is a huge 
performance hit). After a failure the filesystem can be restored to a 
consistent state as of a particular time, but data that was being 
written at the moment of failure won't necessarily all be there. You 
might lose a few seconds, but the filesystem will be undamaged. And 
if you write garbage, you'll just get a good copy of the garbage anyway.
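
Since the journal won't vouch for file contents, a content-level check 
has to come from the file format itself. For the gzip archive in 
question that is possible: gzip stores a CRC-32 of the uncompressed 
data in its trailer, so decompressing a file to the end and throwing 
away the output is enough to test it (the same kind of check as 
`gzip -t`). A minimal Python sketch, where the helper name 
gzip_is_intact is hypothetical:

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file, discarding output; False if it is damaged."""
    try:
        with gzip.open(path, "rb") as f:
            # Reading to the end forces the CRC-32 in the trailer to be checked.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError: bad header or CRC mismatch; EOFError: truncated file;
        # zlib.error: damaged compressed stream.
        return False
```

This only tells you that a file is damaged now, not when it happened, 
and it only works for formats that carry their own checksum.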

D. Stimits, stimits AT comcast DOT net


