[lug] sad hardware announcement :(

D. Stimits stimits at idcomm.com
Thu Jul 20 00:25:06 MDT 2000


"D. Stimits" wrote:
> 
> George Sexton wrote:
> >
> > I have been running SMP under NT for about 4 years. My old machine (a Tyan
> > Dual Pentium 133)  rarely crashed. The same BX boards that don't work with
> > Linux are running NT 4.0 great. I have had maybe one blue screen in 3
> > months.
> >
> > With your SMP machine based on a BX chipset, what do your up-times look
> > like?
> 
> About 15 minutes to a few hours.

Let me clarify: 15 minutes to a few hours under NT 4 or Win 2K. Under
Linux, it died once when I stepped on the power cord and briefly cut the
power. Otherwise, it has never crashed in several years. It does,
however, have to be shut down when changing kernels or hardware. :P

I did have a print job die once, and the log showed the equivalent of a
kernel panic on cpu #1, but cpu #0 continued and apparently reset it. The
only way I knew there was a problem was that the print job stopped and
there was a log message. I didn't even have to reboot to begin printing
again. Having run it from the old 2.0.36 smp (pretty bad, I think that
required patching) through the current 2.2.x, I can't think of anything
that will bring it down other than the power switch.
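
For anyone who wants to check their own logs for the IO-APIC messages
quoted further down in this thread, here is a minimal sketch in Python.
It relies only on the two message strings from my original post. The
function name, the regular expression, and the sample lines are made up
for illustration, and the exact log format on your system may differ.

```python
import re

# Patterns based on the two messages quoted in the original post.
# The surrounding syslog format is an assumption.
IRQ_VECTOR_RE = re.compile(r"unexpected IRQ vector (\d+) on CPU#(\d+)")
IOAPIC_WARNING = "WARNING: unexpected IO-APIC"


def scan_log_lines(lines):
    """Scan an iterable of log lines for the two IO-APIC symptoms.

    Returns (irq_hits, ioapic_warnings), where irq_hits is a list of
    (vector, cpu) integer pairs and ioapic_warnings is the number of
    lines containing the IO-APIC warning text.
    """
    irq_hits = []
    ioapic_warnings = 0
    for line in lines:
        match = IRQ_VECTOR_RE.search(line)
        if match:
            irq_hits.append((int(match.group(1)), int(match.group(2))))
        if IOAPIC_WARNING in line:
            ioapic_warnings += 1
    return irq_hits, ioapic_warnings


if __name__ == "__main__":
    # Hypothetical sample lines; a real run would read /var/log/messages.
    sample = [
        "Jul 18 20:01:02 host kernel: unexpected IRQ vector 217 on CPU#0!",
        "Jul 18 20:01:03 host kernel: WARNING: unexpected IO-APIC, please mail",
        "Jul 18 20:01:04 host kernel: eth0: link up",
    ]
    hits, warnings = scan_log_lines(sample)
    print(hits, warnings)
```

To run it against a real log, pass an open file instead of the sample
list, e.g. scan_log_lines(open("/var/log/messages")).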

> 
> >
> > > -----Original Message-----
> > > From: lug-admin at lug.boulder.co.us [mailto:lug-admin at lug.boulder.co.us]On
> > > Behalf Of D. Stimits
> > > Sent: Wednesday, July 19, 2000 5:36 PM
> > > To: lug at lug.boulder.co.us
> > > Subject: Re: [lug] sad hardware announcement :(
> > >
> > >
> > > George Sexton wrote:
> > > >
> > > > > The other truth is that SMP under 2.2.x just plain sucks. It doesn't
> > > > > work. I have several machines with an Intel 440BX chipset, which
> > > > > Alan Cox describes as "as stable as it gets", that don't work. Here
> > > > > are some typical uptimes:
> > >
> > > I have another SMP machine with the BX chipset, and it *never* crashes
> > > under linux. This same machine dies several times a day under NT 4,
> > > Win2K, and 98. Every imaginable driver, the sound card, and multiple
> > > > video cards have been used. It simply won't die under linux SMP 2.2.x,
> > > > nor will it stay up in any Windows environment (I'm lucky if it stays up
> > > > long enough to shut down...which tends to blue screen). So I guess I'd
> > > say I'm a fan of linux SMP relative to anything MS runs. Oh well, c'est
> > > la vie.
> > >
> > > >
> > > > > 2.2.14          5-70 days (70 was observed only once; the mean is around 15)
> > > > > 2.2.16          < 2 hours
> > > > > 2.2.17pre1      15 days
> > > > > 2.2.17pre9      4-6 days
> > > > > 2.2.17pre13     < 24 hours
> > > >
> > > > > These boards work correctly under NT 4.0. It's not a board problem,
> > > > > it's a kernel problem. Andrea Arcangeli has at least 4 unapplied
> > > > > patches for 2.2.17 that correct SMP-related issues. There are a lot
> > > > > more left, I think.
> > > >
> > > > > The real killer is that when an SMP machine locks up, it doesn't
> > > > > generate an oops under 2.2.x. I tried using a serial console with
> > > > > Ingo Molnar's NMI oopser and it just doesn't work at all on current
> > > > > kernel versions. I guess that I could try buying a hardware watchdog
> > > > > card...
> > > >
> > > > > Andrea A. has said that he would backport the 2.4.x oopser to 2.2.17,
> > > > > but it hasn't happened yet.
> > > >
> > > > > Your report documents high I/O as a trigger condition. In my case, the
> > > > > machines just flake when they are doing nothing. They run great all
> > > > > day long under very high CPU and I/O load. At 12:00 AM, when they are
> > > > > doing nothing, <boink>.
> > > >
> > > > > My advice to you is that if you want to run Linux, do not buy an SMP
> > > > > board. Spend your money on the fastest processor you can buy.
> > > >
> > > > George Sexton
> > > > MH Software, Inc.
> > > > Voice: 303 438 9585
> > > > http://www.mhsoftware.com
> > > >
> > > > > -----Original Message-----
> > > > > > From: lug-admin at lug.boulder.co.us [mailto:lug-admin at lug.boulder.co.us]On
> > > > > > Behalf Of D. Stimits
> > > > > Sent: Tuesday, July 18, 2000 9:09 PM
> > > > > To: BLUG
> > > > > Subject: [lug] sad hardware announcement :(
> > > > >
> > > > >
> > > > > > I have used SuperMicro motherboards for years now, and recently picked
> > > > > > up a PIIIDM3, which at first seemed stable. I'm not sure how many of
> > > > > > you have noted sporadic reports of some server boxes locking up under
> > > > > > high i/o, but Redhat and others have made notes on this. I've found
> > > > > > out that this is a problem with all of the SuperMicro i840 chipset
> > > > > > boards as well.
> > > > >
> > > > > > The problem isn't entirely high i/o, but this tends to generate the
> > > > > > conditions that trigger it. The underlying cause is an IO-APIC that
> > > > > > the kernel does not recognize; the IO-APIC is the device responsible
> > > > > > for reprogrammable IRQ steering between multiple CPUs. When the
> > > > > > system manages to log the failure before locking up, a note is found
> > > > > > as the last entry of /var/log/messages, "kernel: unexpected IRQ
> > > > > > vector 217 on CPU#0!" (or on CPU#1). In other locations of the log,
> > > > > > you'll likely see the entry "WARNING: unexpected IO-APIC, please
> > > > > > mail".
> > > > >
> > > > > > I spoke with SuperMicro; they simply state "it runs fine on NT", and
> > > > > > they won't help. In the past they were interested in Linux, but
> > > > > > SuperMicro has apparently changed its mind and is not interested
> > > > > > anymore. I've contacted Alan Cox to see what else can be done, but
> > > > > > for now, you should consider all i840 SuperMicro boards incompatible
> > > > > > with Linux (I also saw very similar reports on FreeBSD and other open
> > > > > > source o/s's).
> > > > >
> > > > > > The temporary workaround is to boot with the kernel option "noapic".
> > > > > > This removes irq redirection to the 2nd cpu, meaning all device
> > > > > > interrupts are serviced entirely by the first cpu. Additionally, some
> > > > > > PCI devices may end up at a different irq, or at an unreachable one.
> > > > > > There is some explanation of this sort of problem in the kernel source
> > > > > > Documentation directory: "IO-APIC.txt".
> > > > >
> > > > > > At this point, I am looking for a new motherboard, dual cpu, with 4x
> > > > > > AGP-pro (I'm looking at high end OpenGL graphics cards) and 64 bit, 66
> > > > > > MHz PCI slots (required for ultra 160, which I plan to continue
> > > > > > using). Iwill has a dual slot 2 board, the DCA200, which unfortunately
> > > > > > requires rdram (expensive and increased latency, with no ability to
> > > > > > reuse my current pc133 ram), which might be the route to go if nothing
> > > > > > else appears. Anyone know if this board really is stable under linux?
> > > > > > The Intel OR840 would be a candidate, but it lacks 64 bit PCI. Does
> > > > > > anyone know if the Via Apollo Pro 133A chipset is a solution? Do any
> > > > > > of the 133A boards have 64 bit PCI slots?
> > > > >
> > > > > And is there anyone who is interested in buying a good non-linux
> > > > > motherboard, a PIIIDM3 SuperMicro?
> > > > >
> > > > > Thanks,
> > > > > D. Stimits, stimits at idcomm.com
> > > > >
> > > > > _______________________________________________
> > > > > Web Page:  http://lug.boulder.co.us
> > > > > Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> > > >



