[lug] [Slightly OT] File Management?

Nate Duehr nate at natetech.com
Mon Mar 23 17:44:37 MDT 2009


Snipped a bunch out below... sorry, I'm on the machine (winblows) that does a
bad job of quoting text... so I'll leave the few comments of yours that I
want to respond to below, and reply up here by top-posting, which is sure
to drive everyone crazy!  Wheee! 

First generic thought:  Computers are just bloody expensive.  THERE ARE
TIMES WHEN COMPUTER PEOPLE SHOULD TELL BUSINESS OWNERS THEY CAN'T AFFORD IT,
and should keep things in a filing cabinet with paper and pen.  Seriously.
The industry would be a hell of a lot better off if the only people using
an RDBMS-based [insert technology here] system were those who UNDERSTAND
the edge it gives them over their competitors -- and who decide the
business CAN afford to pay for it to be done right, because it will still
lead to a profit.

Amen to the thought that people with experience running real-world data
systems should be the ones giving advice!  Caveat emptor still applies, no
matter how many years some doofus has worked as a "storage consultant".
Real sysadmins in the trenches know what works, and what lets them sleep at
night...

I'll take a sysadmin's advice over the storage sales guy's any day of the
week, but business people still get wowed by dinner on the town and free
golf games, and make decisions that way instead of asking their folks in the
trenches.  

The way of the world, I suppose.  Business people:  You hired your system
admins to do a professional job, so you might want to learn to listen to
them... not to their MBA managers (unless those managers were promoted up
from BEING sysadmins)... really!

I will be the first to take exception to the comments about NetApp --
while I agree with you that it's a proprietary system and also
WHOLEHEARTEDLY agree with you that "a backup that hasn't been recovered
somewhere to see if it's RIGHT, isn't a backup"... 

The NetApp systems I've had the pleasure of being involved with have been
FLAWLESS.  Did we ONLY trust the NetApp to "do the right thing"?  Hell no...
NEVER.  But did they?  Yes.

On the systems I had pointed at a NetApp Filer, the RDBMS was KNOWN to have
a solid working "quiescent" mode (stops all writes to the filesystem as long
as you have the RAM to stay stopped -- it was Oracle), and we recovered a
number of times, just fine... from those "quiet the DB (start transaction
logs over), NetApp snapshot, unquiet the DB" type backups.  

Then we'd roll forward the RDBMS logs from the time of the last snapshot.
If snapshots were done often enough, it was by FAR the FASTEST recovery
mechanism I ever saw for human "screwups" without getting into hot-copy type
database systems, on- or off-site.
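For anyone who hasn't run this kind of setup, here's a minimal Python sketch of that quiet/snapshot/unquiet cycle and the roll-forward on recovery.  Everything in it is hypothetical: ToyDB, snapshot_backup and recover are made-up stand-ins, not Oracle or NetApp commands, and it tracks log position with a sequence number instead of starting the logs over.  The real quiesce and snapshot steps depend on your RDBMS and filer.  The part worth copying is the try/finally: whatever happens to the snapshot, writes get turned back on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LogRecord:
    seq: int    # monotonically increasing log sequence number
    key: str
    value: str


@dataclass
class Snapshot:
    last_seq: int          # last log record already reflected in the snapshot
    data: Dict[str, str]   # frozen copy of the data files


def snapshot_backup(quiesce: Callable[[], None],
                    take_snapshot: Callable[[], Snapshot],
                    unquiesce: Callable[[], None]) -> Snapshot:
    """Quiet the DB, snapshot the volume, then always resume writes."""
    quiesce()
    try:
        return take_snapshot()
    finally:
        # Runs even if the snapshot fails, so the DB is never left frozen.
        unquiesce()


def recover(snap: Snapshot, log: List[LogRecord]) -> Dict[str, str]:
    """Restore the snapshot, then roll forward only the log written after it."""
    state = dict(snap.data)
    for rec in sorted(log, key=lambda r: r.seq):
        if rec.seq > snap.last_seq:
            state[rec.key] = rec.value
    return state


class ToyDB:
    """Hypothetical in-memory stand-in for a database and its redo log."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[LogRecord] = []
        self.frozen = False

    def write(self, key: str, value: str) -> None:
        if self.frozen:
            raise RuntimeError("writes are suspended while quiesced")
        self.log.append(LogRecord(len(self.log) + 1, key, value))
        self.data[key] = value

    def freeze(self) -> None:
        self.frozen = True

    def thaw(self) -> None:
        self.frozen = False

    def snapshot(self) -> Snapshot:
        return Snapshot(len(self.log), dict(self.data))


if __name__ == "__main__":
    db = ToyDB()
    db.write("acct", "100")
    snap = snapshot_backup(db.freeze, db.snapshot, db.thaw)
    db.write("acct", "250")   # changes after the snapshot live only in the log
    db.write("order", "42")
    print(recover(snap, db.log) == db.data)
```

Snapshot plus log replay gets you back to the last committed write; the snapshot alone only gets you back to the moment you quiesced.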

At that time, tape WAS still economically viable, and tape backups of the
snapshots were easy, with the AIT-3 carousel attached directly to the NetApp.
In the worst-case scenario (main NetApp down), those tapes could also be read
by hooking the AIT-3 drive to another (spare, but smaller) NetApp.  That was
only tested once, but it WAS tested...
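Since "tested once" beats "never tested", here's a minimal Python sketch of the boring half of a restore test: restore somewhere else, then compare every file against the source by SHA-256.  The function names are hypothetical, and it assumes the source you compare against is the snapshot itself (or data that hasn't changed since), because a live database will have moved on by the time the restore finishes.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so big data files don't eat all the RAM.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def verify_restore(source: Path, restored: Path) -> list:
    """Return a sorted list of problems; an empty list means the restore matched."""
    src, dst = tree_digests(source), tree_digests(restored)
    problems = []
    for name in sorted(src.keys() | dst.keys()):
        if name not in dst:
            problems.append(f"missing: {name}")
        elif name not in src:
            problems.append(f"unexpected: {name}")
        elif src[name] != dst[name]:
            problems.append(f"content differs: {name}")
    return problems
```

Matching checksums only prove you got the same bytes back.  You still have to bring the database up on the restored copy to know it's RIGHT.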

Additionally, NetApp's customer service and automated repair/recovery were
incredible back then.  I can't speak for them now.  But back then, if the box
so much as hiccupped, as long as we were paying our service contract,
there was a guy standing at our front door with replacement parts in hand,
sometimes before we even realized the box had a hardware problem... because
it was still up and running, and the alert had been missed in the busy
sysadmin environment we had there.  (Like all businesses, the place had
other "issues": sysadmins didn't have enough time to review alerts properly,
and alerts were distributed badly, via pagers instead of e-mail or other
"smarter" means.  But they made up for it with good hardware -- always
something they paid properly for -- and high-level, not cheap
bottom-of-the-barrel, service contracts.  Every place has its ups and
downs.  We just needed to work on the monitoring stuff.)

Today... the systems I work on use Sun hardware, with solid SCSI-3 JBODs,
and Veritas handling the RAID setup and the High-Availability clustering
environment, similar to the configuration you've found to be ultra-reliable
on your Dell boxes.  If you take away planned maintenance, they run
continuously.  I can't even put a "six nines" figure or whatever on them,
because I've never seen this setup drop dead in production -- ever. 

(In fact, I laugh to myself when I watch Linux fans fighting problems that
Veritas figured out at least a decade ago, like the recent ext3 to ext4
timing debacles.  I think Veritas went through that at version 2 to 3 or
even earlier in VxFS.  Journaling filesystem technology isn't new to anyone
but the open-source world, and they're copying the commercial products that
already existed in the late '80s and early '90s.)

Many of these venerable Sun systems are still running Veritas 3.5 -- it's
rock solid, and performs better than any Linux HA scheme I've seen to date,
other than complete Virtualization of the machines, and backups of the
virtual machine images.  

Virtualization seems to be the only way to get Linux to be as quick to
recover and as well off when a box fails.  (Well, technically Veritas
products are available for Linux too, but I RARELY see them on anything but
Sun boxes.  Sad, really... they work well on both!)  

Not saying that virtualization is in ANY way BAD... it's just a different
management technique for uptime.  And it only really became an option,
stability- and performance-wise, a few years ago.  VMWare has done great
things, but Unix was "capable of doing these things" at least 5 years (if
not more) before the cheap Linux environments were.  

It just wasn't CHEAP!  Which is an overriding theme in your comments, too.

I've never seen a hardware failure take a dual-SunFire 440 with shared dual
external JBODs down.  Ever.  (Outside of planned hardware maintenance gone
bad... bad SCSI cables in particular can be a serious PITA!)

I'm sure it will happen eventually, and I'm also concerned that Sun has
moved FAR FAR away from this level of hardware quality in their modern 1RU
and 2RU servers, not to mention the recent IBM buy-out talks... sigh...
but... I digress.

As a side note, I still want to see more production use around here (or
elsewhere, if we can't find a reason to use it here) of ZFS.  But we're
already doing most of what ZFS does in Veritas, too... it's ALMOST another
"reinvent the wheel", also.  Maybe it's really not needed, but it's there
and "free" so why not try it out?  Many published documents are now out
where people ARE trying it, and it has ups and downs too... but it's good to
see a body of work forming around it.  Again, Veritas has a decade of this
sort of effort behind it... so is it worth switching?  Perhaps... especially
if starting off fresh...

The only complaint with Veritas these days is that Symantec bought them, and
they try their DAMNEDEST to keep you AWAY from their support people.  For the
amount of money we spend on contracts to support these boxes that rarely
fail, I really would think it should be the other way around.  Symantec
tries to treat their customer service for large-scale enterprise
infrastructure technology the same as their support for $80 worth of backup
software, or virus software.  It's pitiful.  For this kind of money, I avoid
their website like the plague and still call the 800 number to get a live
body RIGHT FRAKKIN NOW when something is wrong... 

That's what you want in ANY proprietary solution... a real name, a real
phone number, and real support.  We always had that from NetApp, and so
far... have it from Veritas... but it's shaky compared to a few years ago.  

Watch out for company buy-outs or changes of "strategic direction"... they
change the support "landscape" at many companies, and can leave you high and
dry.  But before anyone says, "That never happens with open-source and
Linux", I'll just point out that it sure does... it just happens
differently.  The die-hard devs on a project move on, and the new folks
create "yet another new thing" to replace it... leaving you gasping for air
wondering why the thing that worked well is all of a sudden "deprecated".
It happens in both worlds.  The trick is to get ready if you see signs that
it's coming... and that means being as plugged in as possible to the
"community" that creates anything you use that's mission critical to your
environment.  Of course, surprises do happen... Sun buying MySQL comes to
mind... not that I'd use MySQL in any of my production environments other
than hobby/small organization RDBMS use anyway.

One buyout that hasn't hurt a thing:  IBM's buyout of Informix.  The rumors
of Informix's death are greatly exaggerated, as they say.

It's a HELL of a lot cheaper than Oracle, and works just as well or better
in some circumstances.  It's another commercial name I'd highly recommend...
and it's available as a free demo (last I checked) for Linux, and free for
personal/non-commercial use, and I think they'll make special deals for
non-profit use also.  Even at "normal" commercial prices, it wallops
Oracle's ass... in many of the ways that count, including price, for a
well-supported commercial RDBMS.  

(Who would use MySQL for anything other than a hobby site when great
commercial RDBMSes originally developed in the '80s and WELL debugged,
like Informix, are available?  Of course, there were some really awful RDBMSes
back then, too... let's all cheer for the death of Sybase SQL Anywhere,
while we're talking databases!!!)  

Grin... just some thoughts.  Take or leave, as you like....

Nate 

-----Original Message-----

Primary data is expensive.  Make sure your customers know this, and
you are pricing your services accordingly.

Backups are expensive.  Make sure your customers know this.

Archives are expensive. :-)

[snip]

If you get advice, make sure it is from somebody who has to manage
data in real situations, and for more than a month.

[snip]

I'm not a big fan of "point in time" backup systems provided by NetApp
and others.  The biggest problem I find with systems is knowing what
you have.  If you can't define it (measure it, whatever), you don't
know what it is.

[snip]
 
With NetApp, you are trusting that your dbms is going to write files
in such a way that you can recover.  




