[lug] sad hardware announcement :(

D. Stimits stimits at idcomm.com
Wed Jul 19 17:35:49 MDT 2000


George Sexton wrote:
> 
> The other truth is that SMP under 2.2.x just plain sucks. It doesn't work. I
> have several machines with an Intel 440BX chipset which Alan Cox describes as
> "as stable as it gets" that don't work. Here are some typical uptimes:

I have another SMP machine with the BX chipset, and it *never* crashes
under linux. This same machine dies several times a day under NT 4,
Win2K, and 98. Every imaginable driver, the sound card, and multiple
video cards have been used. It simply won't die under linux SMP 2.2.x,
nor will it stay up in any Windows environment (I'm lucky if it stays up
long enough to shut down, which tends to blue screen). So I guess I'd
say I'm a fan of linux SMP relative to anything MS runs. Oh well, c'est
la vie.

> 
> 2.2.14          5-70 days (70 was observed only once; the mean is around 15)
> 2.2.16          < 2 hours
> 2.2.17pre1      15 days
> 2.2.17pre9      4-6 days
> 2.2.17pre13     < 24 hours
> 
> These boards work correctly under NT 4.0. It's not a board problem, it's a
> kernel problem. Andrea Arcangeli has at least 4 unapplied patches for 2.2.17
> that correct SMP-related issues. I think there are a lot more left.
> 
> The real killer is that when an SMP machine locks up it doesn't generate an
> oops under 2.2.x. I tried using a serial console with Ingo Molnar's NMI
> oopser and it just doesn't work at all on current kernel versions. I guess
> that I could try buying a hardware watchdog card...
> 
> Andrea A. has said that he would backport the 2.4.x oopser to 2.2.17 but it
> hasn't happened yet.
> 
> Your report documents high I/O as a trigger condition. In my case, the
> machines just flake when they are doing nothing. They run great all day long
> under very high CPU and I/O load. At 12:00 AM when they are doing nothing
> <boink>.
> 
> My advice to you is that if you want to run Linux, do not buy an SMP board.
> Spend your money on the fastest processor you can buy.
> 
> George Sexton
> MH Software, Inc.
> Voice: 303 438 9585
> http://www.mhsoftware.com
> 
> > -----Original Message-----
> > From: lug-admin at lug.boulder.co.us [mailto:lug-admin at lug.boulder.co.us]On
> > Behalf Of D. Stimits
> > Sent: Tuesday, July 18, 2000 9:09 PM
> > To: BLUG
> > Subject: [lug] sad hardware announcement :(
> >
> >
> > I have used SuperMicro motherboards for years now, and recently picked
> > up a PIIIDM3, which at first seemed stable. I'm not sure how many of you
> > have noted sporadic reports of some server boxes locking up under high
> > i/o, but Red Hat and others have made notes on this. I've found out that
> > this is a problem with all of the SuperMicro i840 chipset boards as
> > well.
> >
> > The problem isn't entirely high i/o, but high i/o tends to create the
> > conditions that trigger it. The culprit is an IO-APIC the kernel does
> > not recognize; the IO-APIC is the device responsible for reprogrammable
> > IRQ steering between multiple cpu's. When the system doesn't lock up
> > before the failure can be logged, the last entry in /var/log/messages
> > reads "kernel: unexpected IRQ vector 217 on CPU#0!" (or on CPU#1).
> > Elsewhere in the log you'll likely see the entry
> > "WARNING: unexpected IO-APIC, please mail".
> >
> > After speaking with SuperMicro, they simply state "it runs fine on NT",
> > and they won't help. In the past they were interested in Linux, but
> > SuperMicro has apparently changed its mind and is not interested
> > anymore. I've contacted Alan Cox to see what else can be done, but for
> > now, you should consider all i840 SuperMicro boards incompatible with
> > Linux (I also saw very similar reports on FreeBSD and other open source
> > o/s's).
> >
> > The temporary workaround is to boot with the kernel option "noapic".
> > This removes irq redirection to the 2nd cpu, meaning all device i/o is
> > handled entirely by the first cpu. Additionally, some PCI devices may
> > have their irq reassigned, or may end up on an unreachable irq.
> > There is some explanation of this sort of problem in the kernel source
> > Documentation directory: "IO-APIC.txt".
> >
> > At this point, I am looking for a new motherboard, dual cpu, with 4x
> > AGP-pro (I'm looking at high end OpenGL graphics cards) and 64 bit, 66
> > MHz PCI slots (required for ultra 160, which I plan to continue using).
> > Iwill has a dual slot 2 board, the DCA200, which unfortunately requires
> > rdram (expensive and increased latency, with no ability to reuse my
> > current pc133 ram), which might be the route to go if nothing else
> > appears. Anyone know if this board really is stable under linux? The
> > Intel OR840 would be a candidate, but it lacks 64 bit PCI. Does anyone
> > know if the Via Apollo Pro 133A chipset is a solution? Do any of the
> > 133A boards have 64 bit PCI slots?
> >
> > And is there anyone who is interested in buying a good non-linux
> > motherboard, a PIIIDM3 SuperMicro?
> >
> > Thanks,
> > D. Stimits, stimits at idcomm.com
> >
> > _______________________________________________
> > Web Page:  http://lug.boulder.co.us
> > Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> 
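For checking whether a given box shows the symptoms described in the quoted report, here is a minimal Python sketch. It is not part of the original thread, and it assumes a Python 3 interpreter, which these 2.2.x systems would not have had. It also assumes that the syslog lines contain the exact text quoted in the report ("unexpected IRQ vector N on CPU#N" and "unexpected IO-APIC"), that syslog writes to /var/log/messages, and that /proc/interrupts has a header row of CPU columns followed by one row per IRQ giving per-CPU counts and then the controller type (for example IO-APIC-edge, or XT-PIC when booted with "noapic"). The helper names scan_log, parse_interrupts, uses_io_apic and all_on_cpu0 are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Patterns taken from the messages quoted in the report; other kernel
# versions may word these differently (assumption).
IRQ_VECTOR_RE = re.compile(r"unexpected IRQ vector (\d+) on CPU#(\d+)")
IOAPIC_WARNING = "unexpected IO-APIC"

Table = Dict[int, Tuple[List[int], str]]


def scan_log(lines: List[str]) -> Tuple[List[Tuple[int, int]], int]:
    """Return (vector, cpu) pairs for 'unexpected IRQ vector' lines and
    the number of 'unexpected IO-APIC' warnings seen."""
    vectors: List[Tuple[int, int]] = []
    warnings = 0
    for line in lines:
        match = IRQ_VECTOR_RE.search(line)
        if match:
            vectors.append((int(match.group(1)), int(match.group(2))))
        if IOAPIC_WARNING in line:
            warnings += 1
    return vectors, warnings


def parse_interrupts(text: str) -> Table:
    """Map IRQ number -> (per-CPU counts, controller type) from
    /proc/interrupts-style text (assumed layout, see lead-in)."""
    lines = text.strip("\n").splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table: Table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue  # skip NMI, ERR and other non-IRQ rows
        fields = rest.split()
        if len(fields) <= ncpus:
            continue  # no controller column present
        try:
            counts = [int(n) for n in fields[:ncpus]]
        except ValueError:
            continue  # row does not match the assumed layout
        table[int(label)] = (counts, fields[ncpus])
    return table


def uses_io_apic(table: Table) -> bool:
    """True if any IRQ is routed through an IO-APIC (IO-APIC-edge, etc.)."""
    return any("IO-APIC" in ctrl for _, ctrl in table.values())


def all_on_cpu0(table: Table) -> bool:
    """True if every IRQ has been serviced only by CPU0."""
    return all(sum(counts[1:]) == 0 for counts, _ in table.values())


if __name__ == "__main__":
    try:
        with open("/var/log/messages", errors="replace") as log:
            vectors, warnings = scan_log(log.readlines())
        print(f"unexpected IRQ vector lines: {len(vectors)}, "
              f"IO-APIC warnings: {warnings}")
    except OSError as err:
        print(f"could not read log: {err}")
    try:
        with open("/proc/interrupts") as proc:
            table = parse_interrupts(proc.read())
        print("IO-APIC in use:", uses_io_apic(table))
        print("all device IRQs on CPU0:", all_on_cpu0(table))
    except OSError as err:
        print(f"could not read /proc/interrupts: {err}")
```

Under the "noapic" workaround one would expect uses_io_apic() to report False and the IRQ counts to accumulate on CPU0 only. The /proc/interrupts layout varies between kernel versions, so the parser is a starting point rather than a definitive tool.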



