[lug] More Server Problems

Mon Mar 6 12:11:16 MST 2006

On Mon, Mar 06, 2006 at 08:17:38AM -0700, George Sexton wrote:
>The problem then is that I have 16 times more machines to update and manage.

Managing 2 boxes is nowhere near twice as time-consuming as managing one.
Management and monitoring scale extremely well.  I'm sure we have some
economies of scale that you wouldn't, since we manage hundreds of client
machines, but even so the time required doesn't grow linearly with the
number of boxes.

>Additionally, the uptime formula is (if I remember right) the (Number of
>Boxes * SquareRoot(Probability of Failure))^2.

It's a bit more complicated than that because, as I said, a failure of the
single large box has a 100% impact, whereas losing any one box in a cluster
of 16 has an impact of 6.25%.  Plus, your big box is probably going to have
lots of discs in it in order to keep up with the load, unless you're not
I/O bound.
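
To put rough numbers on that, here's a minimal Python sketch.  It assumes
boxes fail independently and uses a made-up per-box failure probability of
5% per year; neither assumption comes from this thread, and the function
names are just for illustration.

```python
# Rough sketch only: assumes independent failures and a hypothetical
# per-box yearly failure probability. Not measured data.

def impact_per_failure(boxes: int) -> float:
    """Fraction of total capacity lost when exactly one box goes down."""
    return 1.0 / boxes

def prob_any_failure(p: float, boxes: int) -> float:
    """Probability that at least one of `boxes` machines fails."""
    return 1.0 - (1.0 - p) ** boxes

def prob_total_outage(p: float, boxes: int) -> float:
    """Probability that every box fails, i.e. a 100% outage."""
    return p ** boxes

if __name__ == "__main__":
    p = 0.05  # hypothetical yearly failure probability per box
    for boxes in (1, 16):
        print(
            f"{boxes:2d} box(es): "
            f"impact per failure {impact_per_failure(boxes):.2%}, "
            f"P(any failure) {prob_any_failure(p, boxes):.2%}, "
            f"P(total outage) {prob_total_outage(p, boxes):.2e}"
        )
```

Under those assumptions the cluster sees some failure more often than the
single box does, but each failure costs 1/16 of the capacity rather than
all of it.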

So, yes, you'll have more discs with the cluster, but it probably won't be
16x as many.  You also have the benefit of adding disc capacity as you add
more computing capacity.

>My own experience tells me
>that if I have 16 machines with drive mirroring, 2-4 of those mirrored
>drives are going to die in the first year.

Then maybe you'd better look more closely at your disc vendor choice and
burn-in procedures.  I've deployed well over 16 machines in the last year,
and I don't recall a single disc failure on those machines.
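
For reference, the 2-4 failures quoted above can be turned into an implied
yearly failure rate per drive.  This sketch assumes each mirrored machine
has exactly two drives, which nobody in the thread actually stated, so
treat the result as a ballpark.

```python
# Ballpark only: the two-drives-per-machine figure is an assumption.

def implied_failure_rate(failures: int, machines: int,
                         drives_per_machine: int = 2) -> float:
    """Failed drives divided by total drives over the period."""
    total_drives = machines * drives_per_machine
    return failures / total_drives

if __name__ == "__main__":
    machines = 16
    for failures in (2, 4):  # the range quoted above
        rate = implied_failure_rate(failures, machines)
        print(f"{failures} failures across {machines} machines: {rate:.2%} per drive")
```

With 32 drives, 2 failures works out to 6.25% and 4 failures to 12.5% in
the first year.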

Thanks,
Sean
-- 
 /home is where your .heart is.  -- Sean Reifschneider, 1999
Sean Reifschneider, Member of Technical Staff <jafo at tummy.com>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability
      Back off man. I'm a scientist.   http://HackingSociety.org/