[lug] Server clock losing serious time
Jeff Schroeder
jeff at neobox.net
Sat Aug 14 10:34:25 MDT 2004
Hey all--
I just built two new web servers (Dell PowerEdge 750s) and I'm having
some serious issues with the system clock losing time. In the span of
seven hours, the clock lost an hour and a half of time. That's a
frightening rate of about *13 seconds per minute*.
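For anyone who wants to check the rate, here is a minimal Python sketch that works it out from the two figures quoted above (an hour and a half lost over seven hours). The function name is made up for illustration, and it assumes the seven hours are real elapsed time rather than time as shown by the drifting clock.

```python
def drift_seconds_per_minute(lost_seconds: float, elapsed_seconds: float) -> float:
    """Seconds of clock time lost for every real minute that passes."""
    return lost_seconds / elapsed_seconds * 60.0


# Figures from the post: 1.5 hours lost over a span of 7 hours.
# Assumption: the 7 hours are measured against a correct reference clock.
lost = 1.5 * 3600
elapsed = 7 * 3600

rate = drift_seconds_per_minute(lost, elapsed)
print(f"{rate:.1f} seconds lost per minute")
```

The same function can be fed the offset reported by ntpdate and the time since the last reset to track the drift between resets.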
What's more, I'm running NTP on both servers. I can reset the time just
fine via ntpdate:
# ntpdate -d utcnist.colorado.edu
14 Aug 08:43:56 ntpdate[1500]: ntpdate 4.2.0@1.1161-r Sat Jul 17 15:12:33 MDT 2004 (1)
Looking for host utcnist.colorado.edu and service ntp
host found : india.Colorado.EDU
<lots of debugging messages snipped>
14 Aug 08:43:56 ntpdate[1500]: step time server 128.138.140.44 offset 6007.325720 sec
Wow, off by 6007 seconds (1hr 40min). Ouch.
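As a quick check of that conversion, this small Python sketch splits the offset reported by ntpdate into hours, minutes, and seconds. The helper name split_offset is hypothetical, not part of any NTP tool.

```python
from typing import Tuple


def split_offset(offset_seconds: float) -> Tuple[int, int, float]:
    """Split a clock offset in seconds into (hours, minutes, seconds)."""
    minutes, seconds = divmod(offset_seconds, 60)
    hours, minutes = divmod(int(minutes), 60)
    return hours, minutes, seconds


# Offset taken from the ntpdate "step time server" line above.
hours, minutes, seconds = split_offset(6007.325720)
print(f"{hours} hr {minutes} min {seconds:.1f} sec")
```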
I start the NTP daemon (ntpd), but then the clock drifts alarmingly.
I've verified that NTP is running (netstat -nl | grep 123), but if I
trace the time servers I get this odd result:
# ntptrace
localhost: stratum 16, offset 0.000000, root distance 0.000000
I believe that indicates ntpd can't synchronize with a time server, so
it's essentially not doing what it should. I compared configuration
files with another (working) server, and they're identical. The ntpd
executable is also identical.
In any case, I figured I'd just shut down NTP and let the clock run on
its own, perhaps resetting it every few days via 'ntpdate'. I kill
ntpd and the clock continues to wind down at the same rate as above.
So it seems that NTP doesn't help (or hinder) the problem.
Out of curiosity, I checked the hardware clock. It appears to be just
fine; even after the system (software) clock drifted an hour and a
half, the hardware clock was still dead-on:
# hwclock ; date
Sat Aug 14 10:28:06 2004 -0.052861 seconds
Sat Aug 14 08:43:56 MDT 2004
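The gap between those two readings can be computed with a short Python sketch. The timestamps are copied by hand from the output above. It assumes both are local time (MDT) and ignores the sub-second correction that hwclock prints.

```python
from datetime import datetime

# Readings copied from the hwclock and date output above.
# Assumption: both are in the same (local) time zone.
hardware_clock = datetime(2004, 8, 14, 10, 28, 6)
system_clock = datetime(2004, 8, 14, 8, 43, 56)

gap = hardware_clock - system_clock
gap_seconds = gap.total_seconds()
print(f"system clock is behind the hardware clock by {gap} ({gap_seconds:.0f} s)")
```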
I have a hard time believing it's hardware-related, but I've never seen
this kind of clock problem before. NTP has always been rock-solid for
me, so the 'ntptrace' problem above is a concern. But even without NTP
running, I shouldn't see this sort of thing.
Oh, and as a final note the kernel is 2.6.7 and the software is
absolutely identical to an installation on another PE750 server on the
same LAN that works fine. The only thing that's special about these
two servers is that their second ethernet ports (eth1) are connected to
one another via a crossover cable.
Any ideas?
TIA,
Jeff