[lug] SMART test failure - retire disk?

Robert George Mayer mayer at acm.org
Tue Apr 6 16:20:31 MDT 2004


One drive of my three-drive RAID5 array fails a SMART test. It looks like the
Seek_Error_Rate is very bad: a value of 1 against a threshold of 23, where
smaller numbers are worse on a scale that, I believe, runs from 255 down to 0.
The log files from the last four days show no change in Seek_Error_Rate for
this drive, while the other drives do show changes, with raw values in the range
255-253. Here is the output (long, so feel free to snip when replying; the table
is easier to read in a monospaced font):

[root@blue root]# smartctl -a /dev/hda
smartctl version 5.21 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     MAXTOR 6L080L4
Serial Number:    664135619957
Firmware Version: A93.0500
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:    Tue Apr  6 15:54:14 2004 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity was
                                        completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  64) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline
data collection:                 (  37) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  40) minutes.

SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029   100   253   020    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   069   066   020    Pre-fail  Always       -       3992
  4 Start_Stop_Count        0x0032   100   100   008    Old_age   Always       -       201
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   001   001   023    Pre-fail  Always   FAILING_NOW 13
  9 Power_On_Hours          0x0012   093   093   001    Old_age   Always       -       5102
 10 Spin_Retry_Count        0x0026   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   008    Old_age   Always       -       201
 13 Read_Soft_Error_Rate    0x000b   100   100   023    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   078   074   042    Old_age   Always       -       57
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       78044
196 Reallocated_Event_Count 0x0010   100   100   020    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x001a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure    00%      5101         -
# 2  Short offline       Completed without error       00%      5101         -
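For anyone wanting to watch for this across all three drives, here is a minimal sketch, assuming Python 3.9+ and the column layout smartctl 5.21 prints above (other versions may lay the table out differently). It reads the attribute table and lists every attribute whose normalized VALUE has dropped to or below a nonzero THRESH, which is the condition smartctl reports as FAILING_NOW. The helpers `parse_attributes` and `failing` are hypothetical names for illustration, not part of smartmontools, and treating a THRESH of 0 as "never fails" is an assumption of this sketch.

```python
import re
from typing import NamedTuple


class Attribute(NamedTuple):
    id: int
    name: str
    value: int   # normalized current value (VALUE column)
    worst: int   # lowest normalized value seen (WORST column)
    thresh: int  # vendor failure threshold (THRESH column)
    raw: str     # RAW_VALUE, kept as text since its format is vendor-specific


# One row of the attribute table as printed by smartctl 5.x:
# ID, name, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(.*\S)\s*$"
)


def parse_attributes(text: str) -> list[Attribute]:
    """Collect attribute rows from smartctl output, skipping all other lines."""
    attrs = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            ident, name, value, worst, thresh, raw = m.groups()
            attrs.append(
                Attribute(int(ident), name, int(value), int(worst), int(thresh), raw)
            )
    return attrs


def failing(attrs: list[Attribute]) -> list[Attribute]:
    """Attributes whose normalized value is at or below a nonzero threshold."""
    # Assumption: a threshold of 0 is treated as "this attribute never fails".
    return [a for a in attrs if a.thresh > 0 and a.value <= a.thresh]
```

Fed the table above (for example, the saved output of `smartctl -A /dev/hda`), `failing(parse_attributes(text))` should single out only Seek_Error_Rate (VALUE 1, THRESH 23). Keeping RAW_VALUE as a string avoids guessing at vendor-specific raw encodings.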



Thanks.

- BOB


