[lug] More Server Problems

George Sexton gsexton at mhsoftware.com
Thu Feb 16 10:23:43 MST 2006


> -----Original Message-----
> From: lug-bounces at lug.boulder.co.us 
> [mailto:lug-bounces at lug.boulder.co.us] On Behalf Of D. Stimits
> Sent: Wednesday, February 15, 2006 7:57 PM
> To: Boulder (Colorado) Linux Users Group -- General Mailing List
> Subject: Re: [lug] More Server Problems
> 
> George Sexton wrote:
> > My server tanked again (reference):
> > 
> > http://archive.lug.boulder.co.us/Week-of-Mon-20060123/031529.html
> > 
> > Since that thread I upgraded to kernel version 2.6.14.6 and I'm still
> > having the same issue. After the server crashes (message unknown but
> > I'm guessing it's ReiserFS file system corruption related) I have to
> > run reiserfsck and use the --rebuild-tree option.
> > 
> 
> I'm curious if you are using default journal options, or if you've
> experimented with things like placing journals on other disks? If not,
> I was just thinking that a bad block right at the journal might be
> difficult for the system to recover from...maybe if you can test
> different journal options.

I only have two drives, and they are configured as a software RAID pair. It
seems like the Linux kernel software RAID should be handling the odd bad
block.
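
One rough way to see whether md has already dropped a member of the pair
is to read /proc/mdstat. The following is only a sketch: the function
name degraded_arrays is made up for illustration, and it assumes the
usual "[2/2] [UU]" style summary line that the kernel md driver prints.
It says nothing about file-system-level corruption, only about array
membership.

```python
import re

# Matches an array header such as "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\S+)")
# Matches the member summary such as "[2/2] [UU]" or "[2/1] [U_]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return names of md arrays with fewer active members than expected.

    Hypothetical helper; pass it the contents of /proc/mdstat.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            # Remember which array the following summary line belongs to.
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            # "_" marks a missing or failed member in the summary.
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded
```

On the server itself this would be called as
degraded_arrays(open("/proc/mdstat").read()); an empty list means every
array reports full membership.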

> 
> > I ran memtest86 version 3.2 through 4 complete cycles and found no
> > memory issues. I also checked the hardware monitoring from the BIOS.
> > It looks like the temperature is well within reason for CPU and
> > motherboard (31-37 C).
> 
> memtest86 4 times is nice, but probably not definitive. The 

Agreed. 24 hours would be nice but I can't be down that long.

> part that's really misleading is the CPU and motherboard
> temperature...what you really need is the hard drive temperature.
> Granted, it won't work if the CPU is overheating. But hard drive
> failure rates go up so dramatically with just a 10 degree increase
> it's unbelievable.

I remember some 4GB Micropolis A/V SCSI drives that I used to have that
would almost burn because they were so hot.

If and when I pull the box out of production, I'll get some cheap
thermometers and mount them to the drives. I've actually got a Hobo data
logger with a thermocouple that I could tape to a drive. Since it is a 1U
rackmount (Supermicro), the ventilation system is pretty well designed and
pretty intense. There's a lot of airflow through it.
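
Short of taping a sensor to the drive, many drives also report their own
temperature through SMART, which can be read without taking the box
down. The following is a sketch under assumptions: it expects the
attribute table printed by "smartctl -A" from smartmontools, it assumes
the drive exposes attribute 194 (not all do), and the function name
drive_temperature_c is made up for illustration.

```python
import re


def drive_temperature_c(smartctl_output):
    """Return the raw value of SMART attribute 194, or None if absent.

    Hypothetical helper; pass it the text printed by "smartctl -A".
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is
        # the tenth whitespace-separated column.
        if len(fields) >= 10 and fields[0] == "194":
            # Some drives append extras such as "(Min/Max 20/45)";
            # only the leading integer is taken.
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None
```

The input would come from something like "smartctl -A /dev/sda", where
the device name is a guess for this machine. Logging the value every few
minutes would show how the drives behave under load.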

George Sexton
MH Software, Inc.
http://www.mhsoftware.com/
Voice: 303 438 9585
  



