[lug] [Slightly OT] File Management?

Rob Nagler nagler at bivio.biz
Mon Mar 23 08:01:18 MDT 2009


Matt James writes:
> My brain hurts....  Help?

I'm sure we can help.  Just relax.  Take a deep breath....

We are having similar problems.  I reported back here a while ago
about problems I was having with Promise boxes.  Well, I have now
been through the wringer dealing with terabytes of data, including a
failed backplane on a generic box.  I've been meaning to collect our experience
so here goes... Remember, the following is provided free of
charge. :-)

Buy reliable SCSI boxes for primary storage.  We use 2650s with RAID5.
SCSI3 disks are very reliable.

Primary data is expensive.  Make sure your customers know this, and
you are pricing your services accordingly.

Backups are expensive.  Make sure your customers know this, too.

Archives are expensive. :-)

We have been through a few legal cases lately.  Make sure your data
retention policies are clear.  If you say you don't retain anything,
that's OK.  However, it may be to your and your customers' disadvantage
if you get sued.  I'm happy I save all our email, files, etc.  If you
run an honest business, having records to prove it is a very valuable
thing.

For primary systems, I highly recommend cheap, reliable SCSI3 servers.
We haven't gone to SAS yet, and I'm not excited about it.  One of the
biggest problems I have seen is that although the raw speed of
SAS/SATA is great, file systems don't do so well when they have to
manage millions of files.  I have a few friends at Seagate, and they
always say, "we get 6Gb/s in the lab".  I tell them that we are lucky
to see 3MB/s with caching off.  "Oh, you have to turn caching on."
Then we get to 30MB/s, and they say, "Is it a fresh file system?"
And I answer, "You're joking, aren't you?" :-(
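
If you want to reproduce the pain yourself, raw sequential numbers
will look great; it's the metadata traffic on piles of small files
that kills you.  A crude test (mount point and file count are made
up, any busy ext3 volume will do):

  mkdir -p /mnt/test/many
  time sh -c 'i=0; while [ $i -lt 100000 ]; do : > /mnt/test/many/f$i; i=$((i+1)); done'
  time ls -f /mnt/test/many > /dev/null   # -f skips sorting: pure readdir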

If you get advice, make sure it is from somebody who has to manage
data in real situations, and for more than a month.

The SCSI3 servers are managing 0.5TB each.  We have two file systems: /
and /boot.  / is configured for 70M inodes.  Here's what we use to
build the file systems:

mkfs.ext3 -j -m 0 -i 8192 -b 4096 -O dir_index,sparse_super /dev/sda2
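
A quick sanity check that the inode budget is holding up (standard
e2fsprogs/coreutils commands; the device name is just our layout):

  df -i /                                # inodes used vs. free
  tune2fs -l /dev/sda2 | grep -i inode   # what the file system was built with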

Oh yeah, don't use ext4.  Apparently it loses data.  I'm a big fan of
old technology.  It got old because it works. :-)

We use refurbished Dell 2650s.  They cost us about $1K, fully loaded.
All our 2650s are configured identically so that each can act as a hot
spare in the event of a system failure.  I don't think we've lost a
single 2650 that was put in production.  (We did have some bad
out-of-the-box 2650s that we bought on eBay.  We now buy from a reliable
source.)  We use Seagate ST373453LC and onboard PERC3, btw.

Make sure you have a 1Gbps backnet, which is lightly loaded.  When you
are slinging TBs, you will want some cheap and readily available
bandwidth.  If your backnet is loaded, buy another Ethernet card.  They
are cheap, and it will make backups, panics, etc. a lot easier.
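
An easy way to verify the backnet isn't the bottleneck before you
blame the disks (assuming iperf is installed; the host name is made
up):

  iperf -s                  # on the backup host
  iperf -c backup1 -t 30    # on the primary; should report near 1Gbps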

We have been through numerous SATA backup solutions.  We have wasted a
tremendous amount of money ($10Ks) and time (100s of hours) on getting
this "right".  I'm satisfied with the solution given the sunk cost,
but I'm not happy.  If I had it to do all over again, I would have
stuck with Dell SCSI systems, and paid up front in dollars what I
ended up losing in time.  My biggest problem is time, not money.  I'm
just a natural cheapskate when it comes to hardware, and it ends up
biting me hard.

The primary problem with SATA is the controllers depend on caching and
clean file systems (did I mention this already?).  When you want a
reliable backup, you can't afford to resend all the data, all the
time, unless you have unlimited bandwidth.  I might use SATA for what
I will call "tape" (explained later), but I wouldn't use it for our
main backups.

We keep three backup mirrors in three different locations.  We use "cp
--link" to make low cost images for the last week or so.  Since we
have reliable primaries, we only use our backups for "whoops, I lost
that file" and e-discovery.  The latter is a real pain if you don't
have a good archival system.  The former is trivial, because we use "cp
--link".  We tarball up weeklies and monthlies as full copies.  The
algorithm we use (see Bivio::Util::Backup) avoids super large files,
but stores data efficiently.
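
The core of the "cp --link" trick is simple enough to sketch in a few
lines (this is not Bivio::Util::Backup itself; paths and the host
name are made up):

  # Snapshot the current mirror first; hard links cost almost nothing.
  cp -al /backup/mirror /backup/daily.$(date +%Y%m%d)   # -l is --link
  # Then refresh the mirror.  rsync writes changed files to new inodes,
  # so yesterday's snapshot keeps the old contents.
  rsync -a --delete primary:/home/ /backup/mirror/

Unchanged files are shared across every snapshot, so a week of dailies
costs little more than one full copy.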

Our backup machines are Chenbro/Intel whiteboxes; again, I don't
recommend these.  We have tried 3ware 9550s and 9650s.  The 3ware
controllers are reliable.  We lost a Chenbro backplane, which caused
us to lose the entire backup.  Good thing we have two others. :-) Each
backup host has 6TB.  There's no way to copy this amount of data over
the net.  Make sure that you can easily recreate your backup machine
through copying from one of its siblings.  Turns out we can do this.
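
To put numbers on "no way to copy this over the net": 6TB is about
48,000Gb, so even a dedicated 1Gbps backnet needs 48,000 seconds,
roughly 13 hours; over a typical WAN uplink you are talking weeks.
The sibling copy is the only practical rebuild, and it has to
preserve the hard links or the snapshots balloon to full size.
Something like (host and path made up):

  rsync -aH backup2:/backup/ /backup/   # -H keeps "cp --link" trees as links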

We no longer back up to tape.  It's simply too expensive, and in a
virtual organization like ours, someone can't always be in the office
to fix the tape drive.  I'm not going to go into how we secure our
data.  It's not hard to figure out a system where you would take disks
offline and swap them occasionally.

I wouldn't go larger than 6TB per machine.  There just isn't enough
bandwidth in the system buses to manage this data.  People *think*
they are managing their data, but I suspect if I were to walk in and
start pulling the right cables, their systems would fall apart, and
it would be very hard for them to recover.  Just remember that disks
are cheap and data are expensive.  We've seen a number of high-profile
"whoops, the entire system is gone" stories on Slashdot recently.  It could
happen to you.

I'm not a big fan of "point in time" backup systems provided by NetApp
and others.  The biggest problem I find with these systems is knowing what
you have.  If you can't define it (measure it, whatever), you don't
know what it is.  That goes doubly for data.  We export our databases,
run standbies, etc. We know what data is long-lived (essentially
frozen in time), and what isn't.  For example, my personal picture and
music collections are very large.  They aren't backed up via our
normal mechanisms.  Once a file (a jpg or mp3) is stored, it doesn't
change, so I mirror the archive nightly on multiple machines in
multiple locations; the mirrors are the archive, not backups.  The files
themselves are read-only, so I "know" they won't be modified.  I archive the files on
offline media every few months.  Know what your data is and how
frequently it is modified.
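
For the write-once archive, the nightly mirror can be as dumb as this
(a sketch; paths and the mirror host are made up):

  chmod -R a-w /archive                 # enforce the read-only promise
  rsync -a /archive/ mirror1:/archive/  # only new files actually move

Because nothing in the tree ever changes, there is no versioning to
get wrong: a mirror either has a file or it doesn't.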

With NetApp, you are trusting that your dbms is going to write files
in such a way that you can recover.  More importantly, you are
trusting that you can recreate the system which was used to create the
point-in-time backup.  It's not an archive, because you would need to
keep the original computers around to recover the point-in-time
backup.  Since an archival system is as important as (or more important
than?) most backup systems, point-in-time is really a band-aid, not a
general solution to data management.  Finally, to recover from
point-in-time, you have to be sure NetApp will do the right thing.
Given that such recoveries are extremely rare, it's not clear your
particular version of the software won't have a weird bug that
switches disk blocks around.  When I had that backplane problem, the
3ware controller could not recover the systems.  They were hosed.
Fortunately, we didn't trust any particular piece of software/hardware
too much.  We didn't lose any data as a result of that failure.

I hope this helps.

Rob




