NSI bulletin 097-004 | Root Server Problems

Date: Thu, 17 Jul 1997 22:52:18 +0500 (GMT)
From: David Holtzman <dholtz@internic.net>
To: nanog@merit.edu
Subject: NSI bulletin 097-004 | Root Server Problems
Resent-Date: Thu, 17 Jul 1997 14:42:42 -0400 (EDT)

On Wednesday night, July 16, during the computer-generation of the
Internet top-level domain zone files, an Ingres database failure resulted
in corrupt .COM and .NET zone files. Despite alarms raised by Network
Solutions' quality assurance schemes, at approximately 2:30 a.m. (Eastern
Time), a system administrator released the zone file without regenerating the
file and verifying its integrity. Network Solutions corrected the
problem and reissued the zone file by 6:30 a.m. (Eastern Time).

Thank you.
David H. Holtzman
Sr VP Engineering, Network Solutions

So, if the new zone files were reissued at 06:30 Eastern Time, and they
take about an hour to download, why were some root servers still handing
out bad data many hours later (at least one until about 14:00 Eastern)?
The particular server I'm thinking of, though not located in the Eastern
time zone, appears to have a 24x7 NOC nearby, and in theory could have
been prepared to reload as quickly as anyone.
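
For anyone wanting to spot a lagging server themselves, here is a rough
sketch (mine, not anything NSI or the root operators are known to use).
It assumes you have already asked each server for the SOA record of the
zone and noted the serial numbers. The server names and serials below are
made up for illustration, and the comparison ignores serial wraparound.

```python
from typing import Dict, List


def find_stale_servers(serials: Dict[str, int]) -> List[str]:
    """Return the servers whose SOA serial is older than the newest one seen.

    Assumption: serials only ever increase, so a plain integer comparison
    is enough. Real serial arithmetic (wraparound) is not handled here.
    """
    if not serials:
        return []
    newest = max(serials.values())
    return sorted(name for name, serial in serials.items() if serial < newest)


# Hypothetical observations: names and serial values are invented.
observed = {
    "ns1.example": 1997071701,
    "ns2.example": 1997071701,
    "ns3.example": 1997071600,
}

print(find_stale_servers(observed))
```

A server that shows up in that list long after a reissue is one worth a
phone call.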

This may be just a coincidence, but it was about an hour after I
e-mailed and telephoned them that they finally had the right data in
place. Unfortunately, finding the right contact was not trivial: the
listed contact person had a full voice mailbox, his operator had no idea
who else I could speak to, and the only numbers listed for the NOC are a
FAX and a 1-800 number, which doesn't work from outside the USA. The NOC
person I finally reached on the telephone didn't even seem fully aware
that they ran a root nameserver for the Internet. He did know that
e-mail was bouncing, and indeed I didn't expect they could answer my
e-mail if they were using their own root server....

Worst of all, though, they left the errant server on-line, handing out
NXDOMAIN replies to any and all who asked, while they were downloading
the corrected zone files. Hopefully this is not standard operating
procedure for a root server, or at least won't be from now on.

What annoys me most is that I received no notification of the problem
from any of the internic.net mailing lists. I probably should subscribe
to nanog, but I'd have thought namedroppers, or maybe even rs-info,
would have had the above announcement posted just as soon as the mailers
had enough trustworthy DNS data to deliver it. There was nothing in
http://rs.internic.net/announcements/ either, except for drivel about
"maintaining high customer service levels," and there still isn't
(though I suppose this event wasn't exactly "good PR").

What are the current procedures for announcing such problems to more
than just the root operators themselves?