[lug] server memory woes (3rd attempt!)

Thu Aug 26 18:52:01 MDT 2004

Michael Belanger wrote:

> Thanks for the responses!
>
> John Hernandez wrote:
>
>> Depending what tools you're using to monitor available memory, this 
>> is probably just normal behavior.  The kernel will generally suck up 
>> physical memory for things like file caching, etc.
>
>
> [mrb at porbeagle ~]$ free
>              total       used       free     shared    buffers     cached
> Mem:       3058568    2152228     906340          0     159476    1732016
> -/+ buffers/cache:     260736    2797832
> Swap:      6289320          0    6289320

One comment here... this is a common mistake on large-RAM boxes...

6GB of swap will pretty much be useless to you and is wasted space.  If 
you calculate how long it would take to swap 6GB of data in and out 
under the kind of load that would require that level of swapping, you'll 
see that the box will likely die or become completely unresponsive long 
before you can ever "get there".
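For a rough feel of that calculation, here's a back-of-the-envelope Python sketch. The 20 MB/s sustained paging throughput is purely an assumed number for illustration. Real paging is scattered small I/O and is often slower than that, so plug in whatever your disks actually do.

```python
# Back-of-the-envelope: how long would it take just to page the whole
# swap area out to disk and back in once?
# ASSUMPTION: the throughput figure below is hypothetical, not measured.
SWAP_KB = 6289320            # "Swap:" total from the free output above
ASSUMED_MB_PER_SEC = 20.0    # hypothetical sustained paging throughput


def swap_round_trip_seconds(swap_kb: float, mb_per_sec: float) -> float:
    """Seconds to write swap_kb out to disk and read it back in once."""
    swap_mb = swap_kb / 1024.0
    return 2 * swap_mb / mb_per_sec


if __name__ == "__main__":
    secs = swap_round_trip_seconds(SWAP_KB, ASSUMED_MB_PER_SEC)
    print(f"~{secs / 60:.1f} minutes of pure disk I/O")
```

With those assumed numbers it works out to roughly ten minutes of nothing but disk traffic. A box thrashing that hard is effectively unresponsive the whole time.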

This looks like a case where the RedHat installer "did the wrong 
thing".  It sized your swap at double your physical RAM, which is a good 
rule of thumb on smaller-RAM machines, but a machine with 3GB of RAM is 
a different story.  You may want to discuss this with RedHat if you're 
opening a ticket with them anyway, and have them walk you through 
lowering the amount of swap on this machine.

You can see there that about 1.7GB of the RAM in use on your 3GB box is 
"cached" data, which the kernel can *usually* drop at any time when 
applications need physical RAM, so the box is definitely not hurting for 
physical RAM.  (Want to loan me some?  GRIN... I have a 512MB Athlon 2500 
"do everything" box that is eating into about 100MB of swap at all 
times... performance on it has gone straight into the toilet.  Heh.)
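To see where the "-/+ buffers/cache" line in that free output comes from, here's a small Python sketch that recomputes it from the "Mem:" line. It assumes the old-style free column layout shown above (total, used, free, shared, buffers, cached, all in KB). Newer versions of free print different columns, so this won't parse their output as-is.

```python
def app_memory(mem_line: str) -> tuple[int, int]:
    """Return (used_by_apps_kb, available_kb) from an old-style free 'Mem:' line."""
    fields = mem_line.split()
    # fields[0] is the "Mem:" label; the next six are KB values.
    total, used, free, shared, buffers, cached = (int(x) for x in fields[1:7])
    used_by_apps = used - buffers - cached   # RAM applications really hold
    available = free + buffers + cached      # free RAM plus reclaimable cache
    return used_by_apps, available


if __name__ == "__main__":
    line = "Mem:       3058568    2152228     906340          0     159476    1732016"
    print(app_memory(line))
```

That gives back the same 260736 / 2797832 figures free printed on its "-/+ buffers/cache" line. In other words, applications are only holding about 255MB, and nearly everything else is free or reclaimable cache.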

What kind of box is it?  Most of the "fell off the network and the local 
console won't respond" lockups I've seen over the years were caused by 
either flaky hardware (usually a bad RAM SIMM/DIMM) or a bug in the 
kernel driver for a particular piece of on-board hardware, usually NICs 
that don't play nicely.  I've also seen a few broken DMA implementations 
that caused wicked lockups with nasty data corruption, but that's 
super-rare and was many years ago.

When the machine stops responding on the network, is the local console 
still alive?

--
Nate Duehr, nate at natetech.com