[lug] Linux boxes drop off the net? Router problem?

Sebastian Sobolewski spsobole at mindless.com
Mon Feb 5 11:07:36 MST 2001


Couple of questions:


1. If you go to a Linux box that has "disappeared" and ping a remote site 
like Yahoo from it, can you then ping that Linux box externally?
         If so, that would indicate the switch or router is losing track of
the machine. In that case you may want to set up your Linux boxes to ping a
remote host once a minute, then watch whether they still drop off the network.
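
The once-a-minute ping from question 1 could be scripted roughly as follows. This is only a sketch: the function names (build_ping_command, run_quiet, keepalive), the 60-second interval, and the 60-round default are my own assumptions, and it relies on the Linux `ping -c` option.

```python
import subprocess
import time


def build_ping_command(host, count=1):
    """Build the argv for an outbound ping (assumes Linux `ping -c COUNT`)."""
    return ["ping", "-c", str(count), host]


def run_quiet(cmd):
    """Run a command, discard its output, and return its exit status."""
    return subprocess.call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def keepalive(host, interval=60, rounds=60, run=run_quiet, sleep=time.sleep):
    """Ping `host` once per `interval` seconds, `rounds` times in total.

    Returns a list of booleans, one per round: True if ping exited with 0.
    """
    results = []
    for i in range(rounds):
        ok = run(build_ping_command(host)) == 0
        results.append(ok)
        print(time.strftime("%H:%M:%S"), host, "reachable" if ok else "NO REPLY")
        if i < rounds - 1:
            sleep(interval)  # wait before the next round
    return results
```

Calling keepalive("yahoo.com") on an affected box would generate outbound traffic for about an hour. If the box stays reachable from outside while this runs, that supports the idea that the switch or router is forgetting the machine.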

2.  From my understanding the Win2k boxes and Linux boxes are intermingled 
on the same hubs, but are all the Linux boxes on the same subnet, and is that 
subnet different from the one the Win2k boxes are on?
         A different subnet would point to a routing problem.
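
If it is not obvious whether two boxes share a subnet, the comparison in question 2 can be done mechanically. A minimal sketch using Python's standard ipaddress module; the function name same_subnet is my own, and each argument is assumed to be an "address/prefix" string taken from the box's interface configuration.

```python
import ipaddress


def same_subnet(a, b):
    """Return True if two 'address/prefix' strings fall in the same network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network
```

For example, same_subnet("192.0.2.10/24", "192.0.2.200/24") is True, while the same two addresses with a /25 prefix land in different networks. Those are documentation-only placeholder addresses; substitute the real address and netmask of a Linux box and a Win2k box.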

3.  Are you using the VLAN features of the LinkSwitch 1000?
         Your VLANs may be misconfigured. (If you have VLAN support off, 
this should not be an issue.)

4. Does your network run a DHCP Server?

    I have seen cases where a misconfigured DHCP server decides to give 
out IP addresses that are already in use.  This causes an IP conflict, and a 
router may stop sending packets to the Linux box. (I would also check to 
make sure there are no other IP conflicts.)
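
One way to look for the IP conflict described in question 4 is to watch, from another machine on the same segment, whether a Linux box's IP address shows up with more than one MAC address over time. Here is a hedged sketch, assuming the Linux /proc/net/arp layout (a header line, then rows with the IP address in the first column and the hardware address in the fourth). The function names parse_arp_table and find_conflicts are hypothetical.

```python
from collections import defaultdict


def parse_arp_table(text):
    """Map IP -> MAC from the contents of /proc/net/arp (header line skipped)."""
    entries = {}
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 6:
            continue  # not a full ARP row
        ip, mac = fields[0], fields[3].lower()
        if mac != "00:00:00:00:00:00":  # skip incomplete entries
            entries[ip] = mac
    return entries


def find_conflicts(snapshots):
    """Given several IP -> MAC snapshots, return IPs seen with more than one MAC."""
    seen = defaultdict(set)
    for snapshot in snapshots:
        for ip, mac in snapshot.items():
            seen[ip].add(mac)
    return {ip: macs for ip, macs in seen.items() if len(macs) > 1}
```

The idea would be to read /proc/net/arp every few minutes, keep each parsed snapshot in a list, and pass the list to find_conflicts. An address that turns up with two MACs is worth a closer look, although a replaced NIC will also show up this way.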

5.  Are you running any "strange" services on your Linux boxes?  Check 
/var/log/messages for any entries saying that your Ethernet interface is in 
promiscuous mode.  If you find any, it most likely means your boxes have been 
compromised.
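
The log check in question 5 is easy to automate. A rough sketch follows; the function names are my own, and it simply matches the word "promiscuous" case-insensitively without assuming any exact kernel message format. Keep in mind that running a sniffer such as tcpdump yourself also puts the interface into promiscuous mode, so compare any hits against what you know you ran.

```python
import re

PROMISC = re.compile(r"promiscuous", re.IGNORECASE)


def find_promiscuous_entries(lines):
    """Return the log lines that mention promiscuous mode, without newlines."""
    return [line.rstrip("\n") for line in lines if PROMISC.search(line)]


def scan_log(path="/var/log/messages"):
    """Scan a syslog file; reading it normally requires root."""
    with open(path, errors="replace") as fh:
        return find_promiscuous_entries(fh)
```

Running scan_log() as root on each Linux box and printing the result gives a quick list of entries to review.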

6.  I'm not familiar with the 2516, but you may want to check whether it's 
been getting hit by any denial-of-service attacks.  The 2516 may be blocking 
port access after sensing a DoS (via router filter rules).  However, that 
should ALSO cause your Win2k boxes to disappear.


-Sebastian


At 10:52 AM 2/5/2001 -0700, you wrote:
>I've got a problem that's been driving me nuts for some time now.
>
>I'm afraid a reasonably full explanation will make this a long message, so
>I'll summarize the problem first, then go into more detail.
>
>In a nutshell, we have a mixed network consisting primarily of Linux boxes
>and Win2k boxes. On a seemingly random basis, our Linux boxes cannot be
>"seen" from the Internet. In other words, there are periods of time when a
>given Linux box cannot be pinged and/or connected to via any of its open
>ports. Checking the affected box locally reveals that it's fine (not
>overloaded, not rebooting, responsive locally, etc). But no one can connect
>from the outside.
>
>We do not see this phenomenon with any of the Win2k boxes (which obviously
>doesn't help my cause any).  :o(
>
>Our SysAdmin (a MS-friendly kinda guy) simply tells me "gee -- it must be
>Linux, none of our Win2k boxes do that".
>
>This seems to affect every Linux box on our network. Not every box is set up
>the same and not every box was set up by the same person, so I would tend to
>think it's not a simple configuration problem.
>
>Not every box uses the same distro either (some are RedHat 7, some are SuSE
>7).
>
>All boxes have the applicable patches/updates applied. All kernels are
>fairly recent 2.2.x builds.
>
>Some of the boxes have decent hardware (P450/128 Meg), so I think we can
>rule out the boxes being "underpowered".
>
>There are no "hosts.deny" or firewall issues getting in the way.
>
>It's not a DNS problem
>
>There is enough mixing in our network that I think we can rule out a simple
>hardware problem (like a bad hub, or bad wiring, or bad NICs found only on
>the Linux boxes etc). In other words, our Linux boxes don't all connect to
>the same hub or have identical hardware or anything.
>
>-----
>
>I'm suspecting it's more of a routing problem of some sort, but I don't know
>enough about routing/routers to know exactly *what* the problem is or *how*
>to troubleshoot further (and the SysAdmin isn't really willing to put time
>into the problem unless I come up with a specific plan of attack).
>
>Is it possible, for instance, that the Linux boxes go into a "dormant" mode
>after a while and the router thinks they're off the net or something??
>
>Our network topology is as follows:
>
>A T-1 feeding a Cisco 2516 router.
>
>The router dumps into a 3Com LinkSwitch 1000.
>
>The switch feeds 3Com 10 meg hubs.
>
>All of the Linux boxes are normally connected to the hubs (there was a
>period of time where we connected a Linux box directly into the router for
>testing, and we still had the same problem).
>
>Does this ring a bell with anyone? I've posted questions (and looked for
>answers) in other places with no luck.  :o(
>
>Any pointers on how to fix (or at least how to troubleshoot) would be much
>appreciated. We've been getting customer complaints for some time about this
>-- and last week I experienced it first-hand while trying to connect to
>various boxes from LinuxWorld.
>
>-- Gary
>_______________________________________________
>Web Page:  http://lug.boulder.co.us
>Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug

Sebastian Sobolewski 