[lug] Interface CRC error on USB connected SATA drive

Jed S. Baer blug at jbaer.cotse.net
Sat Sep 10 11:49:45 MDT 2016


On Fri, 9 Sep 2016 22:27:40 -0600
Lee Woodworth wrote:

> I would vote for a driver retry of a command that didn't
> complete within the driver's deadline. That said, I would
> still want to know why the command failed or was slow. Dmesg
> is where I would start looking for driver error messages.

I'm not seeing anything in dmesg, but maybe too much time elapsed. I will
keep an eye out for that.
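
For next time, a rough sketch of how the kernel log could be filtered for
likely-relevant lines right after an error shows up. The keyword list is my
own guess, not something taken from the driver sources, and the helper names
are made up for illustration. On some systems dmesg needs root.

```python
import subprocess

# Assumed keywords; adjust for the drivers and devices actually in use.
KEYWORDS = ("usb", "reset", "timeout", "i/o error", "crc", "failed")


def suspicious_lines(log_text):
    """Return the log lines that contain any of the assumed keywords."""
    matches = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            matches.append(line)
    return matches


def read_kernel_log():
    """Return the current kernel ring buffer as text by running dmesg."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling suspicious_lines(read_kernel_log()) soon after a failure would give
a short list to read through before the ring buffer wraps.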

> A full diff after a complete cache flush would test for corruption
> and somewhat exercise the drives. Completely powering off the external
> drive would be a start for the cache flush. Then a smart short
> test might show something abnormal.

I did do a smart long test. No problems indicated. The md5sum operation
suggested by Davide should provide some good drive exercise as well.
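
A minimal sketch of that kind of checksum comparison, assuming the goal is
to compare a file on the external drive against a known-good copy. The
function names are hypothetical, and the read-back only exercises the drive
if the data is not served from cache.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

A mismatch between the two digests would point at corruption somewhere
between the source and the copy, though not at which side is wrong.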

I think the cache auto-flushes after a certain amount of time.

> Possible things going on with the drive/driver:
> 
> 1) Not sure that I would necessarily blame the USB controller or
>    protocol**.
> 
>    If your disk enclosure is USB port powered then it might be the port
>    isn't providing enough/stable power. RPI users have reported seeing
>    intermittent disk errors with port-powered drives go away when they
>    switched to a beefier power brick for the RPI.

This old Thermaltake enclosure has its own power supply.

> 2) If you think multiple I/O streams are an issue, upgrading/changing
>    drivers might help. We have a thermaltake external USB3-SATA dock
>    that intermittently times out using the uas driver (USB attached SCSI)
>    but works fine using just usb-storage. The uas driver has issues with
>    multiple I/O streams for this dock.

I don't believe it's multiple I/O streams. Wasn't much else going on at
the time. Certainly nothing else I/O intensive. I'd sure like to upgrade
to USB3, but that's a ways in the future.

> 3) I would look at the drive temp from smart. For reference one of
>    our external USB backup drives has: recorded Min/Max 19/40 (C).
>    It's possible lots of seeks could increase the temp, but unless the
>    drive temp is near the max allowed it's not obvious that would produce
>    your errors. I have seen high temps cause retries, but they were not
>    intermittent.

SMART is giving me no indication of a drive temp problem. I have thought
about maybe drilling some ventilation holes in it, a time or two.

> 4) If you keep getting errors for the same LBA range, it would suggest
>    there may be a bad-block issue. You would probably also see other
>    smart errors in that case.

In that case, I think I would see a high reallocation count. At the
moment, the count is zero.
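
A rough sketch of pulling those counters out of the attribute table that
"smartctl -A" prints, assuming the usual ten-column layout with the raw
value last. The function name is hypothetical, and attribute names such as
Reallocated_Sector_Ct, Temperature_Celsius and UDMA_CRC_Error_Count vary
by drive vendor, so treat them as assumptions too.

```python
def parse_smart_attributes(text):
    """Map attribute name to integer raw value from smartctl -A style text."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and other lines are skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            # Some raw values carry extra text, e.g. "35 (Min/Max 19/40)";
            # only the leading number is kept.
            if raw.isdigit():
                attributes[fields[1]] = int(raw)
    return attributes
```

Comparing the parsed values before and after a heavy copy would show
whether the reallocation or CRC counters moved during the run.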

