[lug] Interface CRC error on USB connected SATA drive

Jed S. Baer blug at jbaer.cotse.net
Fri Sep 9 20:22:10 MDT 2016


Hi Folks.

I'm in the middle of a disk migration, upgrading my external storage
from 1TB to 2TB. The configuration is a new Toshiba 2TB SATA drive in an
external enclosure, connected via USB 2.0.

I copied a large number of files, approx. 167GB, using my favorite method:
  cd /path/source
  tar cf - . | (cd /path/dest; tar xf -)

I'm trying to be on the lookout for any problems with the new drive,
before I fully commit to the rest of the process, so I periodically fire
up gsmartcontrol and see if anything's amiss. Now I have two instances of
an "interface CRC error, command aborted", with further logging indicating
that each occurred during a DMA write.

I see nothing informative in /var/log/syslog.

(I do see a gripe from smartd about /usr/bin/mail not being there, but
that's a separate irritation. Possibly it would've mailed something
useful.)

The other fun part of this is that while tar was running in the
background, I fired up bluefish to tinker with some HTML. I launch most
of the things I use from the command line, and bluefish, I discovered,
has a nasty habit of generating huge amounts of mindless bitspew on
stdout (or stderr) while it's running. So when I checked to see whether
the tar had finished, any error messages it might have given had already
scrolled out of the terminal's scrollback.
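
To keep tar's complaints from getting buried like this, the copy could
send stderr to a file instead of the terminal. A minimal sketch, assuming
a POSIX shell; the function name copy_tree and the log path are made up
for illustration:

```shell
#!/bin/sh
# copy_tree SRC DEST LOG
# Same tar pipe as above, but stderr from both tar processes is appended
# to LOG, so other programs writing to the terminal cannot bury it.
# (copy_tree is a hypothetical helper name, not an existing tool.)
copy_tree() {
    src=$1 dest=$2 log=$3
    # The redirections are set up before each subshell changes directory,
    # so a relative LOG path still refers to the caller's directory.
    # The function's exit status is that of the extracting tar only.
    (cd "$src" && tar cf - .) 2>>"$log" | (cd "$dest" && tar xf -) 2>>"$log"
}
```

Called as copy_tree /path/source /path/dest /tmp/copy.log, an empty log
afterwards would suggest that neither tar reported a problem.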

File count and total size (according to du -cs) match the source.

A web search indicates this error means bad communication between the
drive and the controller - unsurprising, given the USB 2.0 link in the
middle.

Finally, here's what I'm wondering: which of the following is more likely?
1) Down in the kernel, the ATA driver noticed the error, retried, and
succeeded
2) I have corruption in a file or files
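
Whichever is more likely, the two cases could be told apart by comparing
checksums of the source and destination trees. A rough sketch, assuming
GNU coreutils' sha256sum and filenames without embedded newlines; the
function name verify_copy and the manifest path are hypothetical:

```shell
#!/bin/sh
# verify_copy SRC DEST SUMS
# Writes SHA-256 checksums of every regular file under SRC to SUMS, then
# re-checks them against the files under DEST. SUMS must be an absolute
# path, because the check runs after changing into DEST.
# (verify_copy is a hypothetical helper name, not an existing tool.)
verify_copy() {
    src=$1 dest=$2 sums=$3
    (cd "$src" && find . -type f -exec sha256sum {} +) > "$sums" || return 1
    # --quiet prints only the files that fail; exit status is non-zero
    # if any file is missing or its contents differ.
    (cd "$dest" && sha256sum --quiet -c "$sums")
}
```

One caveat: files read back shortly after the copy may be served from the
page cache rather than from the disk, so unmounting and remounting the
new drive before checking would give a more honest answer.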

Here's one of the two instances of this error, from the SMART error log:

SMART Error Log Version: 1
ATA Error Count: 2
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 70 hours (2 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 40 a0 58 44 0d  Error: ICRC, ABRT 64 sectors at LBA = 0x0d4458a0 = 222582944

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 d5 f0 f0 57 44 e0 00   2d+19:22:18.897  WRITE DMA EXT
  35 d5 f0 00 57 44 e0 00   2d+19:22:18.893  WRITE DMA EXT
  35 d5 f0 10 56 44 e0 00   2d+19:22:18.889  WRITE DMA EXT
  35 d5 f0 20 55 44 e0 00   2d+19:22:18.885  WRITE DMA EXT
  35 d5 f0 30 54 44 e0 00   2d+19:22:18.881  WRITE DMA EXT

Error 1 occurred at disk power-on lifetime: 70 hours (2 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.
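
For anyone reading the register dump for Error 2, the LBA and sector
count on the "Error:" line can be rebuilt from the raw bytes. A small
sketch, assuming the 28-bit layout that the summary error log appears to
use (low nibble of DH, then CH, CL, SN); the function name lba28 is made
up for illustration:

```shell
#!/bin/sh
# lba28 DH CH CL SN
# Assembles a 28-bit LBA from the hex register bytes shown in the log.
# (lba28 is a hypothetical helper name, not part of smartmontools.)
lba28() {
    echo $(( ((0x$1 & 0x0f) << 24) | (0x$2 << 16) | (0x$3 << 8) | 0x$4 ))
}

lba28 0d 44 58 a0     # prints 222582944, i.e. 0x0d4458a0
echo $((0x40))        # SC register: prints 64 (sectors)
```

Both values agree with what the log itself reports for Error 2.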

