[lug] Linux data corruption
D. Stimits
stimits at comcast.net
Thu Jul 28 04:11:17 MDT 2005
Daniel Webb wrote:
> Today I have seen two instances already where the hard disk is being corrupted
> on a soft RAID system, yet nothing is reported in the syslog. I'm using
> 2.6.11, lvm, ext3, the current soft RAID with Debian sarge.
>
> In one instance, I untarred a 10MB tar.bz2 file and noticed what seemed to
> be a few bytes of corruption in one file in the new tree. I untarred the
> same archive again in a different directory, and this time there was no
> corruption. The corruption has to be happening somewhere between tar
> passing the data to the OS to write to disk and the time it hits the platter,
> but where? If there are no errors about this in the syslog, should I assume
> this is happening before the data gets to the RAID layer? Is ext3 still
> buggy? Maybe my motherboard or memory is going bad? There are so many
> things that could cause this that it's very frustrating.
I've used RAID0 exclusively on 2.6.9 through 2.6.12 for the last year or
two, with ext3, and have seen no corruption. That said, I have seen problems
on other machines that looked like corruption: marginal RAM in one case, a
hard drive failure in another.
It turns out that compression and decompression are particularly
vulnerable to bad RAM, and drives running too hot can cause similar
symptoms. You might want to make sure the RAM is good (run memtest86 for
6 to 8 hours) and that the drives are all cool enough that you can touch
them with your fingers without any discomfort.
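A rough user-space check, in addition to memtest86 and not a replacement
for it, is to decompress the same archive several times and compare
hashes of the output. The sketch below assumes Python is available; the
function names and the archive path are made up for illustration. Since
the file will normally be served from the page cache after the first
read, disagreement between runs points more toward RAM or CPU than
toward the disk. Bad data may also show up as a decompression error
rather than a differing hash.

```python
import bz2
import hashlib
import sys


def digest_of_decompressed(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the decompressed contents of a .bz2 file."""
    h = hashlib.sha256()
    with bz2.open(path, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, runs=5):
    """Decompress the same archive `runs` times and return the set of digests seen."""
    return {digest_of_decompressed(path) for _ in range(runs)}


if __name__ == "__main__" and len(sys.argv) > 1:
    archive = sys.argv[1]  # hypothetical: path to any .tar.bz2 file
    seen = repeated_digests(archive)
    if len(seen) == 1:
        print("all runs agree:", seen.pop())
    else:
        print("MISMATCH between runs, suspect hardware:", sorted(seen))
```

On healthy hardware every run should produce the same digest. A single
clean pass proves little, so it is worth raising the run count or
looping over several archives.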
I'm sure there are other possible causes, since bad hardware isn't the
only one, but I haven't seen any ext3 or RAID0 issues for quite some time.
If you're using new SATA drivers, it may be a driver problem. I don't use
LVM, so that may be another wildcard.
D. Stimits, stimits AT comcast DOT net