Leap second tonight

Date: Mon, 16 Mar 2009 23:07:42 -0700

> We use CDMA clocks, and after the last leap second it took weeks for all
> of the cell sites to adjust. As a result, I have set all of our clocks
> for manual leap second and set them to adjust tonight at midnight (UTC).
> I'll take a look in about 35 minutes and see how it worked.

Chiming in a little late here ...

Over at the NTP Pool, about 9% of the servers did not handle the leap
second accurately, starting at midnight UTC. After an hour (so between
01:00 and 02:00) it was down to about 3%; a couple of hours later it was
down to about 1% of our servers (a few dozen)[1]. Most of those got
in order within 24-48 hours. Interestingly, the few that didn't get
corrected within a few days were, tada: CDMA clocks.

To stay vaguely NANOG on-topic: I believe at least some of our ~1700
NTP servers are routers; so I'm guessing they handled the leap second
alright.

Routers as NTP servers. Yuck! Routers route well, but they treat time as
a low-priority job, and jitter on Cisco routers is simply terrible.
Junipers do better, but are still poor time servers.

Sounds like a "RISKS" lesson: Don't use side-effects of a tool for
something critical. (If I understand it right then CDMA uses accurate
time because it needs accurate frequency; not because it cares what
time it is).

As I understand it, they need both time and frequency to do cell
hand-offs cleanly, but, as long as all towers in a carrier's market are
showing the same time, it really does not matter if they do the leap
second.

Endrun Technologies, who make our clocks, ship them configured for
manual leap seconds because so many cell operators are pretty casual
about the leap second thing, but that means that the people using the
clocks need to be aware that they need to be told when a leap second is
coming and that, in turn, means that they must know a bit about leap
seconds and must have read the manual. No surprise that a lot of CDMA
clocks missed the leap second.
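
For anyone wondering how a pending leap second reaches downstream NTP clients once the reference clock has been told about it: the announcement travels in the two-bit leap indicator at the top of the first header byte of each NTP packet. A minimal sketch of decoding that byte follows. The function name and the sample byte 0x64 are made up for illustration and are not taken from any clock discussed in this thread.

```python
# Illustrative only: split the first byte of an NTP packet header into its
# three fields. The top two bits are the leap indicator (LI), the next three
# are the protocol version (VN), and the low three are the mode.
LEAP_MEANINGS = {
    0: "no warning",
    1: "last minute of the day has 61 seconds (leap second inserted)",
    2: "last minute of the day has 59 seconds (leap second deleted)",
    3: "unknown (clock not synchronized)",
}


def decode_first_byte(b):
    """Return (leap_indicator, version, mode) for the first NTP header byte."""
    li = (b >> 6) & 0x3
    vn = (b >> 3) & 0x7
    mode = b & 0x7
    return li, vn, mode


# Hypothetical sample byte: LI=1, version 4, mode 4 (server).
li, vn, mode = decode_first_byte(0x64)
print(li, vn, mode, LEAP_MEANINGS[li])
```

A clock left in manual mode and never told about the leap second would presumably keep sending LI=0, so nothing downstream would get a warning.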

They may suck for being a Stratum-1/2 server, but even the most jittery
Cisco is still far and away good enough to serve up a ntpdate so that an
end-user PC-class machine is in the right minute.

As long as the end-user is made aware that the accuracy of said NTP clock
  is +/- 30.000 seconds (or whatever jitter might exist). Seems kind of
  ridiculous to use an NTP source that is, for many purposes, wildly
  inaccurate. For my purposes, wildly is more than +/- 0.1 seconds. Trying
  to troubleshoot a problem, network or server, where the timestamps on each
  server/router/device vary inconsistently, is like walking on broken
  fluorescent bulbs -- painful and dangerous to one's health.

Not being a time geek, since Ciscos were called out for being wild
jitter-mongers... how much jitter are we talking about?

Clock is synchronized, stratum 2, xxxxxxxxxxxxxxxxxxxxxxxx
nominal freq is 250.0000 Hz, actual freq is 249.9989 Hz, precision is 2**18
reference time is CD6A7CD4.45A9BB00 (19:47:32.272 UTC Tue Mar 17 2009)
clock offset is 2.0581 msec, root delay is 29.62 msec
root dispersion is 6.81 msec, peer dispersion is 3.30 msec

Are we talking about +/- 30 seconds, or a problem bounded by +/- 30 msec?

Deepak Jain
AiNET
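
To put the numbers in that status output in perspective, here is a rough sketch that does the arithmetic on them. The helper names are invented for illustration. The "root distance" line uses the usual rule of thumb of half the root delay plus the root dispersion, which is only a loose worst-case bound on the error relative to the primary reference, not a measurement of jitter.

```python
from datetime import datetime, timedelta, timezone

# NTP timestamps count seconds from 1900-01-01 00:00:00 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def decode_ntp_timestamp(hex_ts):
    """Convert 'SSSSSSSS.FFFFFFFF' (32-bit hex seconds and fraction) to UTC."""
    sec_hex, frac_hex = hex_ts.split(".")
    seconds = int(sec_hex, 16)
    fraction = int(frac_hex, 16) / 2**32
    return NTP_EPOCH + timedelta(seconds=seconds,
                                 microseconds=round(fraction * 1e6))


def freq_error_ppm(nominal_hz, actual_hz):
    """Frequency error of the local clock in parts per million."""
    return (actual_hz - nominal_hz) / nominal_hz * 1e6


def root_distance_ms(root_delay_ms, root_dispersion_ms):
    """Rule-of-thumb error bound: half the root delay plus root dispersion."""
    return root_delay_ms / 2 + root_dispersion_ms


# Values copied from the status output quoted above.
print(decode_ntp_timestamp("CD6A7CD4.45A9BB00"))
print(round(freq_error_ppm(250.0000, 249.9989), 2), "ppm")
print(round(root_distance_ms(29.62, 6.81), 2), "msec")
```

With those inputs the hex reference time decodes to the same 19:47:32.272 UTC shown in the output, the oscillator is about 4.4 ppm slow, and the distance bound comes out at 21.62 msec. So on these figures the scale is milliseconds, not seconds.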

I've actually been gathering some statistics on this using Munin
(http://munin.projects.linpro.no/) on my Linux server. There are
currently 10 NTP servers being monitored, and one of them is a
7600-series Cisco, which is handling quite a bit of traffic (CPU load
around 20%). Here are the Munin graphs for it:
http://dx.fi/alt/ntp/7600.png (times in Finnish time, UTC+2).

In comparison, here are the same graphs for time1.mikes.fi (a stratum-2
clock provided by the Finnish Centre for Metrology and Accreditation):
http://dx.fi/alt/ntp/time1.mikes.fi.png and for Netnod's stratum-1 clock
in Stockholm: http://dx.fi/alt/ntp/ntp1.sth.netnod.se.png

Best regards,

In an NTP scenario, where each device is keeping its own time, and being "disciplined" by
several others... don't these spikes of jitter get wiped away -- especially when
multiple NTP sources are used?

Perhaps I'm mistaken, but I thought the point of trying to keep precise time
via an imprecise network [the jitter could easily be congestion in the case of long
haul links] was that this can be mathematically worked out to a level of precision.

Is a Cisco device lying when it says it has 2^18 precision?

Are we just comparing and stating that between each sample from any one NTP device we might
see wildly different levels of accuracy/precision and the truly diligent time keeper will
discipline his clocks with multiple readings over time?

Thanks,

Deepak
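
On the "mathematically worked out" part, a simplified sketch may help. An NTP client derives an offset and a round-trip delay from the four timestamps of each exchange, then prefers the sample with the lowest delay out of its most recent few, since that sample was least affected by queueing. The function names and the sample timestamps below are made up for illustration. A real client also does selection and clustering across several servers, which this sketch leaves out.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def min_delay_offset(samples, window=8):
    """Return the offset of the lowest-delay sample among the last `window`."""
    recent = samples[-window:]
    return min(recent, key=lambda s: s[1])[0]


# Hypothetical exchanges, in seconds. The server is 2 msec ahead of the client.
quiet = offset_and_delay(0.000, 0.012, 0.013, 0.021)      # 10 msec each way
congested = offset_and_delay(0.000, 0.062, 0.063, 0.071)  # 60 msec outbound

print(quiet)      # offset near 0.002, delay near 0.020
print(congested)  # offset near 0.027, delay near 0.070
print(min_delay_offset([congested, quiet, congested]))
```

The congested exchange reports an offset that is wrong by 25 msec, but its larger delay gives it away, so the filter picks the quiet sample. That is roughly why occasional jitter spikes on the path tend to wash out, while a server whose own timestamps are noisy is harder to compensate for.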