[lug] Reliable SATA server?

Sean Reifschneider jafo at tummy.com
Tue May 8 00:21:26 MDT 2012


On 05/06/2012 10:17 AM, Rob Nagler wrote:
> I just bought a Dell 2950 with 6 x wd2003fyys.  I've had very good

We had a customer with a few Dell 29xx boxes, and they seemed to be pretty
good.  Their $300 rack-mount kits, as I've mentioned here before, are
fantastic.

> A system shouldn't thrash for doing simple operations like this.

See, that's what I'm trying to say.  "cp -l" is *NOT* a simple operation.
It is lots of small random I/Os, generally considered one of the heaviest
loads you can put on a server.
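
To make the "lots of small I/Os" point concrete, here is a rough Python
sketch of the work a hard-link copy has to do.  It is not how cp is
implemented: the link_tree name is made up for illustration, and it ignores
ownership, timestamps, symlinks and special files, all of which "cp -al"
handles.  It only shows that every directory and every file costs at least
one separate metadata update.

```python
import os


def link_tree(src, dst):
    """Recreate the directory tree under src at dst, hard-linking each file.

    Hypothetical helper for illustration only. Returns a tuple
    (dirs_created, files_linked) so the amount of metadata work is visible.
    dst must not exist yet.
    """
    dirs_created = 0
    files_linked = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = dst if rel == "." else os.path.join(dst, rel)
        os.mkdir(target)  # one metadata write per directory
        dirs_created += 1
        for name in files:
            # One new directory entry plus a link-count update per file.
            os.link(os.path.join(root, name), os.path.join(target, name))
            files_linked += 1
    return dirs_created, files_linked
```

No file data is copied at all, but the number of mkdir() and link() calls
grows with the number of entries in the tree, and each one touches a
different spot on disk.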

For example, copying my home directory with "cp -al" takes about 76 seconds:

   chats:/home$ time sudo cp -al jafo/ jafo-cp/
   cp: cannot stat `jafo/.gvfs': Permission denied
   sudo cp -al jafo/ jafo-cp/  0.98s user 21.83s system 30% cpu 1:15.98 total

And then removing it takes 63 seconds.

And that's on an SSD, though admittedly an older one, still fairly fast at
random I/Os...

It was running at around 3K I/Os per second, so the copy took around a
quarter million operations.  On a spinning disc with 11ms average access
time, that could have been more like 45 minutes, just doing the straight math.
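
The straight math can be written out as a short Python sketch.  The
function name is made up for illustration, and the model is deliberately
crude: it assumes every operation costs one full average access, with no
caching, merging or reordering of requests.

```python
def estimate_io_time(elapsed_s, observed_iops, access_time_s):
    """Estimate the I/O count of a run and its duration on a slower device.

    elapsed_s      wall-clock time of the measured run, in seconds
    observed_iops  I/O rate seen during that run
    access_time_s  assumed average access time of the slower device
    Returns (total_ios, seconds_on_slower_device).
    """
    total_ios = elapsed_s * observed_iops
    return total_ios, total_ios * access_time_s


# Figures from the run above: 1:15.98 elapsed, about 3K I/Os per second,
# and an 11 ms average access time for a spinning disc.
total_ios, disk_seconds = estimate_io_time(76, 3000, 0.011)
print(f"{total_ios} I/Os, about {disk_seconds / 60:.0f} minutes on disc")
```

With these inputs the count comes to 228,000 I/Os and a bit over 40
minutes; rounding up to a quarter million I/Os gives about 46 minutes.
Either way it is most of an hour for something that took 76 seconds on the
SSD.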

My home directory has over 300K files in 30K directories underneath it.

Comparatively, a snapshot takes well less than a second to create *AND*
delete:

   chats:/home$ time sudo btrfs subvolume snapshot . .snapshots/20120508
   Create a snapshot of '.' in '.snapshots/20120508'
   sudo btrfs subvolume snapshot . .snapshots/20120508  0.00s user 0.05s system 29% cpu 0.160 total
   chats:/home$ time sudo btrfs subvolume delete .snapshots/20120508
   Delete subvolume '/home/.snapshots/20120508'
   sudo btrfs subvolume delete .snapshots/20120508  0.00s user 0.00s system 5% cpu 0.135 total

> Thrashing is running out of resources.

*EXACTLY*.  You only have so many I/Os you can do in a second.  If you have
millions of I/Os queued up, anything else that generates I/Os, like logging
into a system, will see dramatically increased latency on each one of its
own I/Os.

Many operations need to do a few I/Os, then a few more, then a few more
(like loading the binary, loading the shared libraries, reading a config
file).  If each one of these operations now takes several seconds, a simple
operation like running "ls" can start taking minutes to complete.
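
A back-of-the-envelope Python sketch of that effect follows.  The function
names and the numbers in the example are hypothetical, and it assumes a
strict first-come-first-served queue, which real I/O schedulers do not
implement exactly.  It is meant only to show how per-request waits add up
when a command's I/Os depend on each other.

```python
def per_io_wait_s(queued_ios, device_iops):
    """Seconds a new request waits with queued_ios requests ahead of it."""
    return queued_ios / device_iops


def command_time_s(dependent_ios, queued_ios, device_iops):
    """Total seconds for a command whose I/Os must run one after another.

    Each I/O is only issued once the previous one completes, and each
    time it joins the back of a queue that the bulk job keeps full.
    """
    return dependent_ios * per_io_wait_s(queued_ios, device_iops)


# Hypothetical numbers: a disc with 11 ms access time (about 91 IOPS),
# 200 requests from the bulk copy kept in the queue, and a command that
# needs 40 dependent reads (binary, libraries, config, directory).
disk_iops = 1 / 0.011
print(f"wait per I/O: {per_io_wait_s(200, disk_iops):.1f} s")
print(f"whole command: {command_time_s(40, 200, disk_iops):.0f} s")
```

Under those assumptions each read waits a little over two seconds and the
whole command takes around a minute and a half, even though the command
itself needs only a few dozen I/Os.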

I'm not having some theoretical discussion here; I have personally observed
that "cp -l" can bring a system to its knees and make it unresponsive.

Sean


