[lug] Interface CRC error on USB connected SATA drive
Jed S. Baer
blug at jbaer.cotse.net
Fri Sep 9 20:22:10 MDT 2016
Hi Folks.
I'm in the middle of some disk migration, owing to upgrading external
storage from 1TB to 2TB. Configuration is a new Toshiba 2TB SATA drive in
an external enclosure, connected via USB-2.
I copied a large number of files, approx. 167GB, using my favorite method:
cd /path/source
tar cf - . | (cd /path/dest; tar xf -)
I'm trying to be on the lookout for any problems with the new drive,
before I fully commit to the rest of the process, so I periodically fire
up gsmartcontrol and see if anything's amiss. Now I have two instances of
an "interface CRC error, command aborted", with further logging indicating
this was during a DMA WRITE.
I see nothing informative in /var/log/syslog.
(I do see a gripe from smartd about /usr/bin/mail not being there, but
that's a separate irritation. Possibly it would've mailed something
useful.)
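For a command-line cross-check of what gsmartcontrol shows, the drive's running CRC counter can be read with smartctl. This is only a minimal sketch, assuming the drive reports the usual attribute 199 (UDMA_CRC_Error_Count) and that the USB bridge passes SMART commands through with -d sat. crc_error_count is a made-up helper name, and /dev/sdX is a placeholder for the real device.

```shell
# Hypothetical helper: print the raw value of SMART attribute 199
# (UDMA_CRC_Error_Count) from `smartctl -A` output read on stdin.
crc_error_count() {
    # In the attribute table the ID is field 1 and the raw value is field 10.
    awk '$1 == 199 { print $10 }'
}
```

Usage would be something like:
    smartctl -A -d sat /dev/sdX | crc_error_count
If the number keeps climbing during further copies, that would point at the link (cable, enclosure bridge) rather than the media.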
The other fun part of this is that while tar was running in the
background, I fired up bluefish to tinker with some HTML. I launch most
of the things I use from the command line. bluefish has a nasty habit, I
discovered, of generating huge amounts of mindless bitspew to stdout (or
stderr) while it's running. Thus, when I checked to see if the tar was
finished, any error messages it might have given were no longer available
in the terminal scrollback.
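One way to avoid losing tar's messages to scrollback is to send each tar's stderr to its own file. A rough sketch follows. copy_tree and the log-file naming are invented for illustration, not anything tar provides.

```shell
# Hypothetical helper: copy a tree with the tar pipe, keeping each tar's
# stderr in its own log file so terminal scrollback doesn't matter.
# Arguments: source dir, destination dir, log-file prefix.
copy_tree() {
    src=$1 dest=$2 log=$3
    # The redirections sit outside the subshells, so a relative log
    # prefix is resolved against the directory the function was called from.
    (cd "$src" && tar cf - .) 2>"$log.create.err" |
        (cd "$dest" && tar xf -) 2>"$log.extract.err"
    status=$?
    # The pipeline status only reflects the extracting side, so also
    # require both logs to be empty before reporting success.
    [ "$status" -eq 0 ] && [ ! -s "$log.create.err" ] && [ ! -s "$log.extract.err" ]
}
```

Usage would be something like:
    copy_tree /path/source /path/dest /tmp/migrate
Anything either tar complained about is then left in /tmp/migrate.create.err and /tmp/migrate.extract.err.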
File count and size (according to du -cs) are correct.
A web search indicates this is bad communication between the drive and
the controller - unsurprising, given the USB2 in the middle.
Finally, here's what I'm wondering: which of the following is more likely?
1) Down in the kernel, the ATA driver noticed the error, retried, and
succeeded
2) I have corruption in a file or files
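Whichever of the two it is, the copy can be checked directly by checksumming the source and verifying the destination against it. This is a sketch only, assuming GNU coreutils (sha256sum) and findutils are available. verify_copy is a made-up name.

```shell
# Hypothetical helper: checksum every regular file under the source tree,
# then verify the same relative paths under the destination tree.
# Assumes GNU sha256sum, find and xargs.
verify_copy() {
    src=$1 dest=$2
    sums=$(mktemp) || return 1
    (cd "$src" && find . -type f -print0 | xargs -0 -r sha256sum) >"$sums"
    # --quiet prints only the files that fail to verify.
    (cd "$dest" && sha256sum --quiet -c "$sums")
    status=$?
    rm -f "$sums"
    return "$status"
}
```

Usage would be something like:
    verify_copy /path/source /path/dest && echo "copy verified"
Reads made right after the copy may be served partly from the page cache, so unmounting and remounting the new drive first makes the check more meaningful.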
Here's one of the 2 instances of this error, from SMART:
SMART Error Log Version: 1
ATA Error Count: 2
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 2 occurred at disk power-on lifetime: 70 hours (2 days + 22 hours)
When the command that caused the error occurred, the device was active
or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 40 a0 58 44 0d Error: ICRC, ABRT 64 sectors at LBA = 0x0d4458a0 = 222582944
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 d5 f0 f0 57 44 e0 00 2d+19:22:18.897 WRITE DMA EXT
35 d5 f0 00 57 44 e0 00 2d+19:22:18.893 WRITE DMA EXT
35 d5 f0 10 56 44 e0 00 2d+19:22:18.889 WRITE DMA EXT
35 d5 f0 20 55 44 e0 00 2d+19:22:18.885 WRITE DMA EXT
35 d5 f0 30 54 44 e0 00 2d+19:22:18.881 WRITE DMA EXT
Error 1 occurred at disk power-on lifetime: 70 hours (2 days + 22 hours)
When the command that caused the error occurred, the device was active
or idle.
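For reference, the LBA and sector count printed on the Error 2 line can be reproduced from the register bytes. The sketch below assumes the 28-bit register layout, where the low nibble of DH carries the top four LBA bits. The byte offset at the end additionally assumes 512-byte logical sectors, which the log does not state.

```shell
# Register bytes from the "Error 2" line above, as hex.
SC=40 SN=a0 CL=58 CH=44 DH=0d

# 28-bit layout: DH low nibble = LBA bits 27..24, then CH, CL, SN.
lba=$(( ((0x$DH & 0x0f) << 24) | (0x$CH << 16) | (0x$CL << 8) | 0x$SN ))
sectors=$(( 0x$SC ))

echo "LBA $lba, $sectors sectors"   # prints: LBA 222582944, 64 sectors

# Byte offset on the device, ASSUMING 512-byte logical sectors.
echo "byte offset $(( lba * 512 ))"
```

This only locates the failed transfer on the device. It does not say which file, if any, occupies those sectors.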