[lug] Linux data corruption

Daniel Webb lists at danielwebb.us
Wed Jul 27 23:03:24 MDT 2005


Today I have already seen two instances of data on disk being corrupted on a
software RAID system, yet nothing is reported in the syslog.  I'm running
kernel 2.6.11 with LVM, ext3, and the current software RAID on Debian sarge.

In one instance, I untarred a 10MB tar.bz2 archive and noticed that one file
in the new tree had what seemed to be a few bytes of corruption.  I untarred
the same archive again in a different directory, and this time there was no
corruption.  The corruption has to be happening somewhere between tar passing
the data to the OS to write to disk and the time it hits the platter, but
where?  If there are no errors about this in the syslog, should I assume it
is happening before the data gets to the RAID layer?  Is ext3 still buggy?
Maybe my motherboard or memory is going bad?  So many things could cause
this that it's very frustrating.
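
One way to make the problem reproducible is to extract the archive several
times and compare the resulting trees by checksum instead of spotting damage
by eye.  Below is a minimal Python sketch of that idea, not a tested
diagnostic tool.  The function names (hash_file, hash_tree, diff_trees) and
the choice of SHA-256 are my own assumptions for illustration.  Note that
reading a file back right after writing it may be served from the page cache,
so matching checksums do not prove the data on the platters is good.

```python
import hashlib
import os
import sys


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents matter here.
            if os.path.isfile(full) and not os.path.islink(full):
                digests[os.path.relpath(full, root)] = hash_file(full)
    return digests


def diff_trees(root_a, root_b):
    """Return sorted relative paths that differ or exist in only one tree."""
    a = hash_tree(root_a)
    b = hash_tree(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python difftrees.py extract1/ extract2/
    for path in diff_trees(sys.argv[1], sys.argv[2]):
        print("MISMATCH:", path)
```

If repeated extractions of the same archive produce mismatches in different
files each time, that would point more toward hardware (memory, controller,
cabling) than toward a deterministic filesystem bug, though it would not
settle the question on its own.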

I read on Slashdot recently that HP is going to start using Linux for their
"NonStop" server line.  If they run into the kind of problems I've seen over
the last six years of using Linux, this is going to be a disaster for them.  I
put up with it because it's a Free OS and the benefits outweigh the flaws, but
people spending $100k on a server aren't going to tolerate the kinds of things
I have seen today, assuming it's a software problem.

Daniel


