[lug] Implications of SMART warning

Daniel Webb lists at danielwebb.us
Sun May 6 18:23:32 MDT 2007


I've recently started getting this pair of messages in my logs:

May  6 06:59:07 nerd smartd[3676]: Device: /dev/hdd, 1 Currently unreadable (pending) sectors
May  6 06:59:07 nerd smartd[3676]: Device: /dev/hdd, 1 Offline uncorrectable sectors

If this were a normal non-RAID partition, I would use fsck to figure out
what's going on, then act appropriately if it turned out to be a permanent
error (by either chucking the drive or using it for temporary storage).  In
the case of RAID, though, what happens when you fsck an array where one of the
drives has some bad sectors?  Also, supposing this is an isolated incident,
will Linux RAID work around the bad sector and retrieve that data only from
the good drive?  (It's Linux software RAID-1.)

I do realize that it's often a sign of impending failure to get unreadable
sectors, but I've also seen unreadable sectors in the past on non-RAID drives
that were isolated incidents.
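
One way to tell an isolated incident from a growing problem is to watch
whether the pending count climbs over time.  A minimal sketch, assuming
smartd writes syslog-style lines in exactly the format shown above; the
function name pending_history is made up for illustration, and the log path
in the usage comment is an assumption that varies by distribution:

```shell
# Hypothetical helper: read a syslog-style log on stdin and print
# "month day time device count" for each pending-sector line from smartd.
# Assumes the whitespace-separated field layout of the log lines quoted above.
pending_history() {
    awk '/Currently unreadable \(pending\) sectors/ {
        sub(/,$/, "", $7)          # strip the trailing comma from "/dev/hdd,"
        print $1, $2, $3, $7, $8   # date, time, device, sector count
    }'
}

# Usage (log path is an assumption):
#   pending_history < /var/log/syslog
```

If the last column stays at 1 for weeks, that points toward an isolated
incident; a steadily rising number is the more worrying pattern.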

P.S. A nice thing you can do with Linux on an old machine that is sort of
hair-of-your-chinny-chin-chin is to reformat a failing drive with

$ mke2fs -c -c <device>

Giving -c twice makes mke2fs run the slower read-write bad-block test instead
of the quick read-only one.  It marks the bad blocks at the filesystem level,
which helps in case the HD controller has used up all its reserve sectors.  I
know, normally a controller that has run out of reserve sectors means
impending doom, yet I have one drive that's been running for years like that
with dozens of bad sectors.  It's just a VNC station, so it's no big loss if
there is data corruption.  It just keeps going somehow (it's 10 years old now
and it has had all those bad sectors for 5 years).



