[lug] Linux data corruption

D. Stimits stimits at comcast.net
Thu Jul 28 04:11:17 MDT 2005


Daniel Webb wrote:
> Today I have seen two instances already where the hard disk is being corrupted
> on a soft RAID system, yet nothing is reported in the syslog.  I'm using
> 2.6.11, lvm, ext3, the current soft RAID with Debian sarge.  
> 
> In one instance, I untarred a 10MB tar.bz2 file and noticed a file had what
> seemed to be a few bytes of corruption in one file in the new tree.  I
> untarred the same file again in a different directory and this time there was
> no corruption.  The corruption has to be happening somewhere between tar
> passing the data to the OS to write to disk and the time it hits the platter,
> but where?  If there are not errors about this in the syslog, should I assume
> this is happening before the data gets to the RAID layer?  Is ext3 still
> buggy?  Maybe my motherboard or memory are going bad?  There are so many
> things that could cause this it's very frustrating.

I've used RAID0 exclusively, with ext3, on 2.6.9 through 2.6.12 for the 
last year or two, with no corruption. That said, I've seen things on 
other machines that looked like corruption: marginal RAM in one case, a 
hard drive failure in another.

It turns out that compression and decompression are particularly 
vulnerable to bad RAM. Drives running at too high a temperature can 
cause trouble as well. You might want to make sure the RAM is good (run 
memtest86 for 6 or 8 hours), and make sure the drives are all cool 
enough that you can touch them with your fingers without any 
discomfort.
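
Since the corruption showed up on one untar and not the next, one rough 
way to look for an intermittent fault is to decompress the same archive 
several times and compare per-file checksums between passes. The sketch 
below is only an illustration: the function names member_digests and 
compare_passes are made up for this example, and it reads the archive 
members in memory rather than writing them to disk, so it mostly 
exercises RAM and CPU rather than the RAID/LVM/ext3 write path.

```python
import hashlib
import tarfile


def member_digests(archive_path):
    """Return {member name: SHA-256 hex digest} for each regular file."""
    digests = {}
    # "r:*" lets tarfile detect the compression (bz2, gzip, ...) itself.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            h = hashlib.sha256()
            f = tar.extractfile(member)
            # Hash in 1 MiB chunks so large members are not held in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
            digests[member.name] = h.hexdigest()
    return digests


def compare_passes(archive_path, passes=5):
    """Decompress repeatedly; return names whose digest changed between passes."""
    baseline = member_digests(archive_path)
    mismatched = set()
    for _ in range(passes - 1):
        current = member_digests(archive_path)
        for name in baseline.keys() | current.keys():
            if baseline.get(name) != current.get(name):
                mismatched.add(name)
    return sorted(mismatched)
```

An empty list from compare_passes("some-archive.tar.bz2") means every 
pass agreed; it does not prove the hardware is good. Any name in the 
list, or a decompression error raised part way through, would point 
toward RAM or CPU rather than the filesystem. Note that repeated reads 
of the archive will usually be served from the page cache, so this does 
not test the disks themselves.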

I'm sure there are other possible causes; bad hardware isn't the only 
one, but I haven't seen any ext3 or RAID0 issues for quite some time. If 
you're using new SATA drivers, maybe it is a driver problem. I don't 
use LVM, so maybe that's another wildcard.

D. Stimits, stimits AT comcast DOT net


