[lug] Reliable SATA server?

Dan Ferris dan at usrsbin.com
Tue May 8 08:44:30 MDT 2012


ZFS is not new.  Being in the FreeBSD community, I know many people 
who use it in production.  I also use it, and it works fine.  ZFS has 
been in FreeBSD since 7, which is over three years old now, and in 
Solaris since 10 came out, which was in 2005 or so.

The only downside is that it performs much better if you have at 
least 8 GB of RAM, and more than that if you want deduplication.
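
Dedup is enabled per dataset, and the dedup table wants to live in 
RAM to stay fast, which is where the extra memory goes.  A minimal 
sketch, with made-up pool and dataset names:

    zfs set dedup=on tank/backup   # enable dedup for new writes
    zpool list tank                # the DEDUP column reports the ratio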

On 5/8/2012 8:36 AM, Rob Nagler wrote:
> Hi Sean,
>
> I hope we're not boring the rest of BLUG, but this is something I don't
> see discussed often...  I like learning about ZFS from someone who actually
> uses it in production. :)
>
>> We had a customer with a few Dell 29xx boxes, they seemed to be pretty
>> good.  Their $300 rack-mount kits, as I've mentioned here before, are
>> fantastic.
> Yes, the rails are much better than whitebox rails I've had.  They
> just snap in and roll.
>
> The refurb box with rails is about $900 delivered.  It has 2 x
> quad-core 3 GHz Xeons, 8 GB of RAM, a redundant power supply, dual
> Ethernet, and a remote access card.  It's really quite a beast, way
> overkill for my problem, and quite a deal IMHO.
>
>> For example, copying my home directory with "cp -al" takes 75 seconds:
> On 7.5M files and 0.5M directories, it takes about 2:45 on my slow
> server.  On the faster server (same disks), it takes about 2:30.
>
>> And then removing it takes 63 seconds.
> Removes take approx 3:00 and 2:20, respectively.
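
The hard-link copy and remove being timed are presumably along these
lines; the paths are invented:

    time cp -al /backup/current /backup/snap-$(date +%F)  # hard-link every file
    time rm -rf /backup/snap-2012-05-01                   # unlink them again

Every file still costs a metadata operation, which is why millions of
files take minutes.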
>
>> Comparatively, a snapshot takes well less than a second to create *AND*
>> delete:
> That makes sense.  ZFS is surely well-designed for high performance
> and reliability.  My contention is that it is "new", and I'll let
> others work out the bugs.  (There was one data-loss bug as recently
> as 2010, I believe.)  That's certainly selfish, but I do enough for
> the common good in other areas. :)
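
For comparison, the snapshot create and destroy look like this;
"tank/home" is a placeholder dataset:

    time zfs snapshot tank/home@pre-backup   # copy-on-write, no per-file work
    time zfs destroy tank/home@pre-backup    # likewise near-instant

Both finish in well under a second no matter how many files the
dataset holds.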
>
>>> Thrashing is running out of resources.
>> *EXACTLY*.  You only have so many I/Os you can do in a second.  If you have
>> millions of I/Os that you are trying to do, doing anything else that
>> generates I/Os, like logging into a system, will cause each one to have a
>> dramatically increased latency.
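
You can watch this happen with iostat from the sysstat package:

    iostat -x 1   # per-device utilization and latency, once per second

When %util pins at 100 and await climbs, every additional I/O is
queueing behind the backup job.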
> I think we are talking apples and oranges here.  My server is
> dedicated to backup.  It is designed for "peak load": it consumes
> all the resources on the machine that are available to do its job,
> and no more.  This is similar to graphics cards, number-crunching
> boxes, etc.  These are all "batch" load machines, and logging in to
> them and poking around is going to be slow when they are under peak
> load.
>
> All I require is that the machine finish its daily batch job in 24
> hours, and it does that in plenty of time.  The biggest job, in
> fact, is not the cp --link but the weekly tars, which take well over
> 24 hours on both machines.  The tar runs between different disks,
> and the writes are very sequential (the typical file size is 100MB).
> The compression is probably a big cost on the slow boxes.  I'm
> curious how the Dell will fare.
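
I'd guess the weekly tar is roughly this shape (paths invented):

    time tar -czf /vault/weekly-$(date +%F).tar.gz -C /backup current

If the gzip step turns out to be the bottleneck, GNU tar's
--use-compress-program=pigz would spread the compression across cores.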
>
> Why do I do the tar?  So I can put disks in a vault, and so I can
> also store multiple complete backups online on independent disks.
> I've been fortunate that the compressed version of my backup data
> has kept pace with the largest 2.5" drive available for the last
> couple of years.  That makes it possible to store a lot of data in
> one vault.
>
> You may say that I could make it all much faster with ZFS.  And,
> someday, I may do that.  However, these backups are the lifeblood of
> my business and my customers' businesses.  I have seen too many
> scary stories about backup software going awry.  If someone attacks
> my systems and destroys everything, I can get them back up and
> running with the data in my vault(s).
>
> As it is, the systems have plenty of capacity (as long as they stay running).
> I don't care if the disks are rattling 7x24 as long as the complete process
> including weeklies finishes in a week.
>
> The only thing that needs to finish asap is pulling the data off the disks
> of the other systems.  That's a network problem for the most part.  That's
> why I'm also building standby servers, which will update and be validated
> in real-time.
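
Something like plain rsync over ssh covers that pull; the hostname
and paths are placeholders:

    rsync -a --delete webhost:/var/data/ /backup/current/webhost/

so each nightly transfer only moves files that changed since the last
run.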
>
>> file).  If each one of these operations now takes several seconds, a simple
>> operation like running "ls" can start taking minutes to complete.
> I created a red herring with the login thing.  I expect logins to be
> slow on a busy system.  The problem is not slow logins but system
> failures.  For some reason, under these loads, the whitebox servers
> I've owned don't cut the mustard -- to be fair, I was able to
> assemble one that has worked pretty well, but it can only take one
> CPU.  Hopefully, the Dell will work better.
>
>> I'm not doing some theoretical discussion here; I have personally
>> observed that "cp -l" can bring a system to its knees and make it
>> unresponsive.
> Agreed.
>
> Rob
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety


