[lug] Linux boxes drop off the net? Router problem?

Gary Frerking (TurboPower) garyf at turbopower.com
Mon Feb 5 10:52:29 MST 2001


I've got a problem that's been driving me nuts for some time now.

I'm afraid a reasonably full explanation will make this a long message, so
I'll summarize the problem first, then go into more detail.

In a nutshell, we have a mixed network consisting primarily of Linux boxes
and Win2k boxes. On a seemingly random basis, our Linux boxes cannot be
"seen" from the Internet. In other words, there are periods of time when a
given Linux box cannot be pinged and/or connected to via any of its open
ports. Checking the affected box locally reveals that it's fine (not
overloaded, not rebooting, responsive locally, etc). But no one can connect
from the outside.

We do not see this phenomenon with any of the Win2k boxes (which obviously
doesn't help my cause any).  :o(

Our SysAdmin (an MS-friendly kinda guy) simply tells me "gee -- it must be
Linux, none of our Win2k boxes do that".

This seems to affect every Linux box on our network. Not every box is set up
the same and not every box was set up by the same person, so I would tend to
think it's not a simple configuration problem.

Not every box uses the same distro either (some are RedHat 7, some are SuSE
7).

All boxes have the applicable patches/updates applied. All kernels are
fairly recent 2.2.x builds.

Some of the boxes have decent hardware (P450/128 Meg), so I think we can
rule out the boxes being "underpowered".

There are no "hosts.deny" or firewall issues getting in the way.

It's not a DNS problem.

There is enough mixing in our network that I think we can rule out a simple
hardware problem (like a bad hub, or bad wiring, or bad NICs found only on
the Linux boxes etc). In other words, our Linux boxes don't all connect to
the same hub or have identical hardware or anything.

-----

I'm suspecting it's more of a routing problem of some sort, but I don't know
enough about routing/routers to know exactly *what* the problem is or *how*
to troubleshoot further (and the SysAdmin isn't really willing to put time
into the problem unless I come up with a specific plan of attack).

Is it possible, for instance, that the Linux boxes go into a "dormant" mode
after a while and the router thinks they're off the net or something??

Our network topology is as follows:

A T-1 feeding a Cisco 2516 router.

The router dumps into a 3Com LinkSwitch 1000.

The switch feeds 3Com 10 meg hubs.

All of the Linux boxes are normally connected to the hubs (there was a
period of time where we connected a Linux box directly into the router for
testing, and we still had the same problem).

Does this ring a bell with anyone? I've posted questions (and looked for
answers) in other places with no luck.  :o(

Any pointers on how to fix (or at least how to troubleshoot) would be much
appreciated. We've been getting customer complaints for some time about this
-- and last week I experienced it first-hand while trying to connect to
various boxes from LinuxWorld.
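
One thing that might help turn this into a specific plan of attack is a
timestamped record of exactly when each box is and isn't reachable from
outside. Below is a minimal sketch of such a logger, assuming Python 3 is
available on some host outside our network. The host names and ports in
TARGETS are placeholders (not our real machines), and the function names are
just made up for illustration. It only tests whether a TCP connection
succeeds; it does not send ICMP pings.

```python
import socket
import time
from datetime import datetime

# Hypothetical targets: replace with real public addresses and open ports.
# Including a Win2k box gives a baseline to compare the Linux boxes against.
TARGETS = [("linuxbox1.example.com", 22), ("win2kbox1.example.com", 80)]


def probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def format_result(when, host, port, ok):
    """Build one log line: timestamp, target, and UP or DOWN."""
    status = "UP" if ok else "DOWN"
    return f"{when:%Y-%m-%d %H:%M:%S} {host}:{port} {status}"


def run(targets, interval=60, rounds=None):
    """Probe every target once per interval; rounds=None loops forever."""
    count = 0
    while rounds is None or count < rounds:
        for host, port in targets:
            ok = probe(host, port)
            print(format_result(datetime.now(), host, port, ok), flush=True)
        count += 1
        if rounds is None or count < rounds:
            time.sleep(interval)


if __name__ == "__main__":
    run(TARGETS)
```

Redirecting the output to a file and leaving it running for a few days should
show whether the DOWN periods follow a pattern (for example, whether they
start after a box has been idle for a while, or hit several boxes at once).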

-- Gary


