[lug] Reliable SATA server?
Lee Woodworth
blug-mail at duboulder.com
Sat Apr 21 14:57:48 MDT 2012
On 04/21/12 09:03, Rob Nagler wrote:
> Over the years we've struggled to find a reliable SATA server for our
> backups. I have tried numerous versions of SATA computers, and they
> all seem to fail under our loads, always. The failure is usually in
> the form of the computer hanging in such a bad state that it requires
> a power cycle.
My heavy load case is of course very different than yours, but it can
reliably make disks run hot, especially in laptops: a complete source
rebuild of a Gentoo install (1100 packages, 24hrs). When the disks get
hot enough they lock the I/O system by hanging. This is a multi-core,
multi-process parallel make so there are often many processes doing
sequential reads and writes to separate files/directories and also
doing rm -r temp dir clean ups.
Doing this rebuild on SATA-based DELL rack-mount servers has never been
a problem even though the servers have more cores and processes running
at the same time (there can be 7 compiles running at once). I do however
use the higher end lines of the SATA disks for these systems: WD Caviar Black
(5yr warranty vs. minimal) and some more 'enterprise-level' Seagates.
The cases are also probably better at transferring heat away from the
drives than typical towers.
Another bit of evidence that drive temp may matter: in ancient times I ran a
production data-warehouse loader on a whitebox, dual-cpu, pentium-based MB
with 4 consumer IDE disks. This process absolutely bashed the drives
continuously for 24+ hours with lots of seeks. Never had a temperature
or lockup problem with the drives or system -- it was a big case with a
fan in front of each drive. Those drives took a year of abuse and lasted
3 more years of normal use after that. Maybe it was luck, but dual cpu
and IDE weren't considered so reliable in those days.
I would compare the internal disk temps (maximums if recorded) between the
less stable and more stable systems if you can.
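A rough sketch of how that comparison could be scripted, assuming
smartmontools is installed and the drive reports SMART attribute 194
(Temperature_Celsius). Not every drive does; some report attribute 190
instead. The function names and the device path are made up for
illustration, and smartctl usually needs root.

```python
import subprocess


def parse_temperature(smartctl_output):
    """Return the raw value of SMART attribute 194 in degrees C, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the
        # tenth column of the `smartctl -A` attribute table.
        if len(fields) >= 10 and fields[0] == "194":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def drive_temperature(device):
    """Run `smartctl -A` on one device and parse the temperature.

    smartctl's exit status is a bitmask and can be non-zero even when
    the attribute table was printed, so the status is not checked here.
    """
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_temperature(result.stdout)
```

Calling drive_temperature("/dev/sda") (the device name is an assumption)
on each machine during a heavy run would give numbers to set side by side.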
>
> We do have a fairly stable server these days, but now I'm finding the
> disk slots are failing -- along with the disks. The server isn't that
> old, but it has been relatively reliable (only the occasional hang).
>
> I've attached a vmstat when the server is busy. I can't do much on
> the server at this time, e.g. tab completion of a command takes a few
> seconds. This is very different from a benchmark load so the numbers
> are deceptive.
>
> At this stage, I'm tempted to bite the bullet and go with SAS. We
> have SCSI3 machines running with heavy (but dissimilar) loads without
> a problem.
As for SAS vs SATA, I think the case and drive can make as much difference.
Your SCSI3 disks are probably designed for enterprise use (continuous use).
The consumer disk drives that I have looked up the duty cycle ratings for
have always been less than 50%, some only 20%.
The DELL servers we use have the MPT fusion controller with a SATA backplane.
The drive temps seem to be running around 32C. No problems even though the
server is doing the rebuild 3 times as fast as an HP consumer tower.
On the other hand, laptop drives are often 40C idle. On our least stable laptop,
the temp is 48C and can go up to 61C during the Gentoo rebuild. Higher than
that is where failures happen on this machine.
>
> I've asked this question on this list before, and not gotten a
> satisfactory answer. Nobody seems to run the same type of loads that
> we do, and all their servers run just fine. However, if you do have
> experience with non-sequential writing of large amounts of data
> on a SATA server, please let me know.
>
> Thanks,
> Rob
>
> # vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  0  2    144  15672 328852 826992    0    0   300   184    9    3  2  1 91  6  0
>  0  2    144  16136 328436 827052    0    0  3684     0 1544 1361  3  2 55 39  0
>  0  2    144  19764 326164 826964    0    0 10948     0 1270 1253 10  2 52 37  0
>  2  1    144  26640 321224 826268    0    0 27828  2948 1295 2364 21  3 50 26  0
>  0  2    144  26504 320472 827268    0    0  5832     0 1375 1150 10  1 54 35  0
>  0  3    144  29356 319048 826360    0    0  4688   784 1371  984 13  2 52 33  0
>  0  3    144  29232 319048 826528    0    0    32  5624 1114   90  0  0 62 37  0
>  0  2    144  29232 319048 826508    0    0     8   748 1119   72  0  0 54 46  0
>  0  3    144  29232 319052 826532    0    0     8    36 1094   56  0  0 26 74  0
>  0  2    144  29608 318708 826432    0    0  3244     4 1446 1138  3  2 52 43  0
>  0  2    144  29884 318204 826952    0    0  2344  1000 1294  656  2  1 55 42  0
>  1  2    144  32428 316516 826648    0    0  6388     0 1167  732 10  2 49 40  0
>  1  2    144  35916 314220 825756    0    0  7476     0 1205  837 11  2 49 38  0
>  0  2    144  37940 312616 825584    0    0  6484     0 1228  835 10  2 49 40  0
>  0  2    144  39108 311712 825816    0    0  4184 14048 1235  553  5  1 50 44  0
>  0  3    144  39428 310848 826544    0    0  4528  4956 1333  939  6  2 53 40  0
>  0  3    144  39428 310852 826676    0    0     8   784 1131   80  0  0 51 49  0
>  0  2    144  39428 310852 826644    0    0     8   468 1099   78  0  0 50 50  0
>  0  2    144  40328 310620 825876    0    0   912     0 1338  694  0  0 52 48  0
>  0  2    144  39600 310476 826804    0    0  3092   104 1492 1285  1  2 60 37  0
>  0  2    144  40972 309896 826492    0    0  3636  1052 1312  806  3  1 54 42  0
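The vmstat listing above can be boiled down to a couple of numbers. A
minimal Python sketch, assuming the 17-column layout shown in the header
line; summarize_vmstat and SAMPLE are names made up for illustration, and
SAMPLE is just the first four rows of the listing. The first row is
dropped because vmstat's first report is an average since boot, not a
one-second sample.

```python
SAMPLE = """\
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 0 2 144 15672 328852 826992 0 0 300 184 9 3 2 1 91 6 0
> 0 2 144 16136 328436 827052 0 0 3684 0 1544 1361 3 2 55 39 0
> 0 2 144 19764 326164 826964 0 0 10948 0 1270 1253 10 2 52 37 0
> 2 1 144 26640 321224 826268 0 0 27828 2948 1295 2364 21 3 50 26 0
"""


def summarize_vmstat(text):
    """Return (mean I/O wait %, mean blocked processes) per interval."""
    rows = []
    for line in text.splitlines():
        # Drop any mail quoting ("> ") before splitting into columns.
        fields = line.lstrip("> ").split()
        # Data rows are 17 integers; header lines fail this check.
        if len(fields) == 17 and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
    rows = rows[1:]  # first report is the since-boot average
    if not rows:
        raise ValueError("no vmstat samples found")
    mean_wa = sum(r[15] for r in rows) / len(rows)      # "wa" column
    mean_blocked = sum(r[1] for r in rows) / len(rows)  # "b" column
    return mean_wa, mean_blocked


mean_wa, mean_blocked = summarize_vmstat(SAMPLE)
```

For those three intervals the mean I/O wait comes out at 34% with roughly
1.7 processes blocked on average, while about half the CPU sits idle. That
would be consistent with the box waiting on the disks rather than on CPU,
though three samples are not much to go on.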
> _______________________________________________
> Web Page: http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety