[lug] Convert swap from raid0 to raid1

Nate Duehr nate at natetech.com
Thu Jul 26 02:43:20 MDT 2007


On Jul 25, 2007, at 11:35 PM, Sean Reifschneider wrote:

> Besides, if your system is swapping you're already trading off 3 orders of
> magnitude of performance, so who really cares even if swap were 2000x
> slower than RAM rather than just 1000x.

Amen on that... swap blows when it comes to performance.

>> thoughts.  I worried about the performance hit when I decided to go that
>> route, but everything is a tradeoff.  Availability versus performance in
>
> If you are worried about performance, disable swap or make it very small.
> Unless your application really just *HAS* to complete, no matter what,
> you're probably better off having the system kill processes than letting
> it start thrashing and become unresponsive.  A system that is unresponsive
> because of thrashing for 15 minutes might as well be down...

Been there, done that.  Booted the box when the boss said it couldn't  
wait.  Didn't matter that it was about to recover from the idiot  
developer's mistake.  :-)  Waited for fsck (hey, it was ext2, okay?  
heh...) and then re-queued all the "stuff" with some carefully placed  
"nice" commands ahead of the hogs... and monitored to see if it was  
going to freak out again.  Made a note to myself to ask the boss for  
more RAM for everything... got approval to significantly increase RAM  
in 8 machines, out of over 100.  (GRIN)

I have always wished there was a way to "tag" things that the kernel
could feel free to blow away first, though: some way to assign levels
of criticality to the super-important processes on the box.  I've
always hated that the kernel seems to decide that MySQL and Apache are
more interesting to kill off than the background crud that I don't care
about until tomorrow.  SpamAssassin also has a knack for getting itself
killed in low-RAM situations, which can lead to interesting problems,
depending on how your MTA handles such outages...

>> Which brings up a question.  As it stands now, swap is RAID 0, which
>> means that if one drive goes south when the system is booted, I believe
>> that would prevent the system from coming up.  I'll pull my drive and
>> check that in a second.
>
> It depends on the failure mode of the drive.  If it is just some blocks
> that are bad, then it will probably come up unless the swap signature
> block is one of the bad ones.  In that case it may or may not come up.
> If the drive is entirely dead, it may or may not come up, depending on
> whether the kernel decides it's bad, gives up, and continues booting, or
> keeps trying to read it...  The same old discussion...

Failed hardware is, by its nature, always unpredictable, no matter how
hard we try to cushion the blow.  I've seen multi-hundred-thousand-dollar
Sun setups with oodles of redundancy, gadgetry, and add-on software meant
to make downtime near-impossible, taken down by one bad fiber-channel
card that just happened to behave in the most inappropriate way and
crash the entire multi-server system.  It's impressive to me to watch
such things happen, and downright terrifying to non-technical managers
who thought their "investment" was fail-safe.  (And fail-safe is truly a
different engineering design spec from fool-proof; nothing's safe from
fools!)

Inexperienced bit-jockeys try to gather knowledge to keep their large
gaps in knowledge (unknowns) from biting them in the ass; middle-level
bit-jockeys feel confident that they've learned enough that most things
won't bite them in the ass, but still get bit occasionally; and "Senior"
bit-jockeys easily sidestep the common mistakes and now KNOW something's
still coming each year to bite them in the ass.  No one gets away from
hardware trouble.

"No one expects the Spanish Inquisition!" (you got screwed by the  
hardware) slowly leads to "Never go up against a Sicilian when death  
is on the line!"... (you know you forgot something right as the  
hardware once again screws you).

BIG HUGE GRIN,

--
Nate Duehr
nate at natetech.com