[lug] Linux data corruption

Alan Robertson alanr at unix.sh
Tue Aug 9 22:51:26 MDT 2005


D. Stimits wrote:
> Daniel Webb wrote:
> 
>> Today I have seen two instances already where the hard disk is being
>> corrupted on a soft RAID system, yet nothing is reported in the syslog.
>> I'm using 2.6.11, lvm, ext3, the current soft RAID with Debian sarge.
>> In one instance, I untarred a 10MB tar.bz2 file and noticed a file had
>> what seemed to be a few bytes of corruption in one file in the new tree.
>> I untarred the same file again in a different directory and this time
>> there was no corruption.  The corruption has to be happening somewhere
>> between tar passing the data to the OS to write to disk and the time it
>> hits the platter, but where?  If there are not errors about this in the
>> syslog, should I assume this is happening before the data gets to the
>> RAID layer?  Is ext3 still buggy?  Maybe my motherboard or memory are
>> going bad?  There are so many things that could cause this it's very
>> frustrating.
> 
> 
> I've used RAID0 exclusively on 2.6.9 through 2.6.12 for the last year or
> two, with ext3. No corruption. That said, I've seen things on other
> machines that look like corruption: marginal RAM in one case, hard drive
> failure in another.
> 
> It turns out that compression and decompression are particularly
> vulnerable to bad RAM. Drives running at too high a temperature can
> cause trouble as well. You might want to be sure that the RAM is good
> (run memtest86 for 6 or 8 hours), and make sure the drives are all cool
> enough that you can at least touch them with your fingers without any
> discomfort.
> 
> I'm sure there are other possible causes, bad hardware isn't the only 
> one, but I haven't seen any ext3 or RAID0 issues for quite some time. If 
> you're using new drivers for some SATA maybe it is a driver problem; I 
> don't use LVM, so maybe that's another wildcard.

Late reply on this...

I've had two or three motherboards whose IDE controllers destroyed data.
I just replaced one of them recently.  Temps were all good, and the disk
tested fine (zero SMART errors).

Linux is much harder on data paths than Windows is.  It tries to get
more performance out of the motherboard, and problems show up with it
that never show up with Windows.  RAID just makes this "more so".

From what I know, I'd assume it was a hardware problem first.
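
One rough way to see whether the corruption is repeatable is to extend
the untar-twice test Daniel described: extract the same archive into two
directories and compare per-file checksums.  The following is only a
sketch in Python; the function names hash_tree and diff_trees are my own
and not from any tool mentioned in this thread.

```python
import hashlib
import os


def hash_tree(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def diff_trees(root_a, root_b):
    """Return sorted relative paths that are missing from one tree or whose contents differ."""
    a = hash_tree(root_a)
    b = hash_tree(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Calling diff_trees on the two extraction directories returns the files
that differ; an empty list means the two copies matched.  Keep in mind
that a file read back shortly after being written may be served from the
page cache, so a clean comparison does not by itself prove the on-disk
copy is good, and a mismatch does not say which layer is at fault.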

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce


