[lug] Setting up failover in Linux?

Simos blug at chinesetearoom.com
Mon Apr 30 16:56:42 MDT 2012


On Monday 30 April 2012 16:12 Rob Nagler wrote:
> >> out the network partition problem for 30 years, and dammit, I still
> >
> > Many systems do it via STONITH or Fencing.
> 
> This is voodoo at best.  If the network is partitioned, you can't send
> a message to the server to shutdown.  The two nodes may be visible to
> clients, but they may not be able to talk to each other.  Or, the
> primary may be isolated for a time, the secondary takes over, and the
> primary comes back up.  There have been some interesting AWS failures
> due exactly to this behavior, except the primary came up as an
> entirely different system. :)

Most HA systems have more than one channel of (heartbeat) communication
to avoid network partitioning (and thus "split brain"). I've seen everything from
old-fashioned serial connections to crossover cables directly connecting network
cards on the systems to little independent hubs (off the main IP network) to SAN
heartbeats over shared LUNs. I would not call this "voodoo"; in my experience it
works quite well. It's extremely rare for two independent communication channels
to fail concurrently (well, if your datacenter suffers a total loss of power or a
meteorite strike, I suppose it could, but then all your cluster nodes would be gone
as well...)
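To make the multi-channel idea concrete, here is a rough sketch (hypothetical,
not taken from any particular HA package) of the decision rule: a node only
concludes its peer is dead, and takes over, when *every* heartbeat channel has
gone silent. The class name, the channel names ("eth-ip", "serial",
"crossover") and the 5-second timeout are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks heartbeats from one peer over several independent channels.

    Hypothetical sketch: names and timeout are illustrative assumptions.
    """
    timeout: float  # seconds of silence before a channel counts as down
    last_seen: dict = field(default_factory=dict)  # channel -> last heartbeat time

    def record(self, channel: str, now: float) -> None:
        """Note that a heartbeat arrived on `channel` at time `now`."""
        self.last_seen[channel] = now

    def channels_up(self, now: float) -> list:
        """Channels that delivered a heartbeat within the timeout."""
        return [ch for ch, t in self.last_seen.items() if now - t <= self.timeout]

    def peer_alive(self, now: float) -> bool:
        # The peer is presumed alive while ANY channel still hears it, so
        # losing a single link (e.g. the main IP network) does not trigger takeover.
        return bool(self.channels_up(now))

    def should_take_over(self, now: float) -> bool:
        # Only when every channel has gone quiet do we assume the peer is dead.
        return not self.peer_alive(now)


if __name__ == "__main__":
    mon = HeartbeatMonitor(timeout=5.0)
    for ch in ("eth-ip", "serial", "crossover"):
        mon.record(ch, now=0.0)
    mon.record("serial", now=8.0)          # IP and crossover links went quiet
    print(mon.channels_up(now=9.0))        # ['serial']
    print(mon.should_take_over(now=9.0))   # False: serial link still alive
    print(mon.should_take_over(now=20.0))  # True: all channels silent
```

This only makes a false takeover much less likely; if all channels really do
fail at once, both nodes can still decide they are primary, which is the case
fencing is meant to cover.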

Simos


