was i asleep when the gtld servers had the worse problem today?

Date: Fri, 13 Nov 1998 03:51:09 -0800 (PST)
From: Randy Bush <randy@psg.com>

if i understand, f.root-servers.net was having problems doing an axfr
from a.root-servers.net. has anyone determined a technical reason why?

for four days, tcpdump on my side shows behaviour consistent with lost ACKs;
pathchar from my side shows that A's first mile is a lossy 3Mb/s bottleneck.

i've switched F from axfr to ftp for now, and rz.internic.net is showing
the same lossage. i'm sending a lot of duplicate ACKs. transfer is slow.

08:33:01.038694 198.41.0.19.20 > 204.152.184.251.3022: . 58236:59696(1460)
  ack 1 win 8760 (DF) [tos 0x8]
08:33:01.038984 204.152.184.251.3022 > 198.41.0.19.20: . ack 58236
  win 33580 (DF)
08:33:01.039100 204.152.184.251.3022 > 198.41.0.19.20: . ack 62616
  win 29200 (DF)
08:33:01.039176 204.152.184.251.3022 > 198.41.0.19.20: . ack 62616
  win 33580 (DF)
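
A rough way to tally the repeated ACKs in a trace like the one above is sketched below. It is only an illustration: the regular expression and the `repeated_acks` helper are made up for this note, and they assume the wrapped tcpdump layout shown here, where the window is printed on the continuation line. The sketch counts repeats by ack number alone; a stricter definition of a duplicate ACK also requires the advertised window to be unchanged, which this ignores.

```python
import re
from collections import Counter

# Matches the pure-ACK lines in the trace above, e.g.
# "08:33:01.039100 204.152.184.251.3022 > 198.41.0.19.20: . ack 62616"
# (hypothetical pattern, written for this wrapped layout only).
ACK_LINE = re.compile(r"^\S+ (\S+) > (\S+): \. ack (\d+)\s*$")


def repeated_acks(lines):
    """Return {(src, dst, ack): extra_count} for ack numbers seen more than once."""
    seen = Counter()
    for line in lines:
        m = ACK_LINE.match(line.strip())
        if m:
            seen[m.groups()] += 1
    return {key: n - 1 for key, n in seen.items() if n > 1}


trace = """\
08:33:01.038694 198.41.0.19.20 > 204.152.184.251.3022: . 58236:59696(1460)
  ack 1 win 8760 (DF) [tos 0x8]
08:33:01.038984 204.152.184.251.3022 > 198.41.0.19.20: . ack 58236
  win 33580 (DF)
08:33:01.039100 204.152.184.251.3022 > 198.41.0.19.20: . ack 62616
  win 29200 (DF)
08:33:01.039176 204.152.184.251.3022 > 198.41.0.19.20: . ack 62616
  win 33580 (DF)
""".splitlines()

print(repeated_acks(trace))
```

On the four packets quoted above this reports one repeat of ack 62616, and that repeat carries a different window, so it may be a window update rather than a loss signal. A longer capture would be needed to say anything about the loss rate.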

Date: Fri, 13 Nov 1998 13:21:36 +0100
From: Ray Davis <ray@carpe.net>

Aren't NSI's nameservers running bind? If not, then that's scary.
If so, then why wouldn't they also become lame rather than insane?

they were having different problems. their operators had to restart named,
but the zone file itself had transferred cleanly. they weren't lame.

in other words, NSI's servers' problems were because of me (or: my code)
and my server's problems were because of NSI (or: their transit pipe.)

right now i'm looking at the code, and they're working on their link.

[ On Fri, November 13, 1998 at 08:43:34 (-0800), Paul A Vixie wrote: ]

Subject: Re: was i asleep when the gtld servers had the worse problem today?

for four days, tcpdump on my side shows behaviour consistent with lost ACKs;
pathchar from my side shows that A's first mile is a lossy 3Mb/s bottleneck.

Just to add a little fuel to the fire:

I've been experiencing TCP related connection failures from most/all
hosts at internic.net, with at least whois and http (though not SMTP).

For example when I attempt to do a whois I'll normally only get back the
first two lines of output (i.e. "\nRegistrant:\n"). Then the connection
hangs and times out. From my point of view it seems that the connection
has indeed been cut off. The weird part is that this only happens for
NetBSD-1.3.x hosts on my network. BSDI BSD/OS 1.1, Ultrix, and at one
point SunOS-4.1 hosts were all receiving complete output from whois
queries. It gets even weirder when I look at the tcpdump traces on the
NetBSD-1.3.x gateway that connects my network to the next one upstream:
they show that the packets are actually coming from the remote
internic.net host, but they're not getting through the NetBSD routing
code (i.e. I see the ACK come in on the ethernet interface, but not go
out the PPP interface).

I suspect it's got something to do with the firewall and traffic
director stuff they're using for some services at internic.net. The
only apparent difference between the two TCP/IP connections (i.e. the
ones from NetBSD-1.3 that don't work, and the ones from other systems
that do work), are the initial window size negotiations.

So far internic.net hosts are the only ones I've ever encountered that
trigger this failure in the NetBSD networking code.... I've meant to do
some more extensive analysis and bring this up with the NetBSD
networking gurus, but so far haven't had time.

From: Greg A. Woods

Just to add a little fuel to the fire:

I've been experiencing TCP related connection failures from most/all
hosts at internic.net, with at least whois and http (though not SMTP).

For example when I attempt to do a whois I'll normally only get back the
first two lines of output (i.e. "\nRegistrant:\n"). Then the connection
hangs and times out. From my point of view it seems that the connection
has indeed been cut off. The weird part is that this only happens for
NetBSD-1.3.x hosts on my network. BSDI BSD/OS 1.1, Ultrix, and at one
point SunOS-4.1 hosts were all receiving complete output from whois
queries. [...]

I've noticed the same thing, but on a FreeBSD server. My personal
Linux box, and the Solaris box here both work fine.

Hmmm.

I've been seeing the same thing on our linux boxen here. It's been going
on for quite some time. The sad thing is that the connection hangs there for
an extended period of time, but nothing ever comes through. I find that 1 out
of 20 whois attempts actually goes completely through. Someone on the
inet-access list mentioned that you can telnet into rs.internic.net and
get them that way; I haven't tried that yet. I have found that anything in
the internic.net domain is just horribly slow to begin with anyway.

John Gonzalez/Net.Tech
MDC Computers/netMDC!
(505)437-7600/fax-437-3052
http://www.netmdc.com

This same thing is happening to me, but in my case it is a bug in the
InterNIC's whois server that must have just been introduced recently.

AFAIK, the whois protocol is supposed to consist of a request terminated
by a '\n'. However, the InterNIC's server is sending a response as soon
as it gets the first packet, even if there is no \n termination.

There is at least one common whois replacement client that sends the \n in
a separate packet. What happens is the first packet with the request but
without the \n arrives at the InterNIC, then they send a response and
close the socket. In the meantime, the second packet from the client with
the \n in it arrives after the socket is closed for reading, prompting a RST
from the server. Even though the full response is either in socket
buffers at the InterNIC or on the wire or even in socket buffers on the
client, once the RST arrives that response will be (correctly) thrown away
if it hasn't been actually read() by the client.

The reason you see part of it is that the whois server is sending separate
packets for the "\nRegistrant:\n" part and the rest of the response.

The fix is for the InterNIC to fix their whois server to conform to the
"protocol" and/or do a lingering close so it doesn't send a RST. Simply
waiting for the end of the line should be enough and should be correct,
although if they really want to do a lingering close then see the
lingering_close() function in Apache for an example.
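
To make the two suggested fixes concrete, here is a minimal server-side sketch in Python. It is not the InterNIC's code and not Apache's lingering_close(); the names `read_request_line`, `lingering_close`, `handle` and `demo_split_request`, and the buffer sizes and timeout, are all assumptions made for illustration. The idea is to wait for the newline before answering, then half-close and drain anything the client sends late so that close() is less likely to provoke a RST.

```python
import socket
import threading


def read_request_line(conn, limit=1024):
    """Read until '\\n' arrives (or EOF/limit), however the client split its writes."""
    buf = b""
    while b"\n" not in buf and len(buf) < limit:
        chunk = conn.recv(256)
        if not chunk:
            break
        buf += chunk
    return buf.split(b"\n", 1)[0].rstrip(b"\r")


def lingering_close(conn, timeout=2.0):
    """Half-close, then drain late client data before the final close()."""
    conn.shutdown(socket.SHUT_WR)      # we are done sending; the peer sees EOF
    conn.settimeout(timeout)           # do not linger forever
    try:
        while conn.recv(512):          # discard whatever still arrives
            pass
    except OSError:                    # timeout or reset: give up draining
        pass
    finally:
        conn.close()


def handle(conn, lookup):
    """Serve one query: full line in, response out, gentle close."""
    query = read_request_line(conn)
    conn.sendall(lookup(query))
    lingering_close(conn)


def demo_split_request():
    """Client sends the name and the CRLF as two writes; the reply still arrives whole."""
    client, server = socket.socketpair()
    worker = threading.Thread(
        target=handle, args=(server, lambda q: b"reply for " + q + b"\n"))
    worker.start()
    client.sendall(b"example.com")     # first write: no terminator yet
    client.sendall(b"\r\n")            # second write: the terminator
    reply = b""
    while True:
        chunk = client.recv(512)
        if not chunk:
            break
        reply += chunk
    client.close()
    worker.join()
    return reply
```

The demo uses a local socket pair rather than a real TCP connection, so it only shows the framing logic, not the RST behaviour on the wire.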

Note that I'm not sending this to the InterNIC, because I don't have time
to wade through and try to find a contact address that isn't ignored and
which is appropriate. I would hope someone from the InterNIC is reading
this list and will fix it.

[ On Fri, November 13, 1998 at 12:13:38 (-0800), Marc Slemko wrote: ]

Subject: Re: other network problems with hosts at internic.net

This same thing is happening to me, but in my case it is a bug in the
InterNIC's whois server that must have just been introduced recently.

I have the same problem contacting their web server, and sometimes even
with telnet and ftp to their servers.

AFAIK, the whois protocol is supposed to consist of a request terminated
by a '\n'. However, the InterNIC's server is sending a response as soon
as it gets the first packet, even if there is no \n termination.

The NetBSD whois client uses stdio, and sends the last argument with a
single fprintf() call followed by an fflush().

BTW, the NetBSD client sends "\r\n" on the end of the data sent to the
server.
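
For comparison, a client that hands the query and its terminator to the stack in one write might look like the sketch below. This is a Python illustration, not the NetBSD whois.c; `build_query` and `whois` are hypothetical names, and the server name is left as a parameter. A single sendall() does not strictly guarantee a single TCP segment, but for a request this short it normally ends up in one.

```python
import socket

WHOIS_PORT = 43  # the well-known whois/nicname port


def build_query(name):
    """Build the whole request, terminator included, as one byte string."""
    return name.encode("ascii") + b"\r\n"


def whois(server, name, timeout=30.0):
    """Send the query in a single write and read until the server closes."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as s:
        s.sendall(build_query(name))   # name and "\r\n" go out together
        chunks = []
        while True:
            chunk = s.recv(4096)
            if not chunk:              # server closed: response complete
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("latin-1")
```

Since a client written this way on NetBSD-1.3.x still fails, the split-packet explanation probably does not cover every case reported in this thread.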

There is at least one common whois replacement client that sends the \n in
a separate packet. What happens is the first packet with the request but
without the \n arrives at the InterNIC, then they send a response and
close the socket. In the meantime, the second packet from the client with
the \n in it arrives after the socket is closed for reading, prompting a RST
from the server. Even though the full response is either in socket
buffers at the InterNIC or on the wire or even in socket buffers on the
client, once the RST arrives that response will be (correctly) thrown away
if it hasn't been actually read() by the client.

The exact same whois.c client code running on my NetBSD-1.3.x boxes
fails to retrieve the complete response, yet when run on at least the
BSDI 1.1 box it works fine.

The reason you see part of it is that the whois server is sending separate
packets for the "\nRegistrant:\n" part and the rest of the response.

The fix is for the InterNIC to fix their whois server to conform to the
"protocol" and/or do a lingering close so it doesn't send a RST. Simply
waiting for the end of the line should be enough and should be correct,
although if they really want to do a lingering close then see the
lingering_close() function in Apache for an example.

That might indeed help, but I'm not going to put the blame on them
immediately since I know that even when I make the connection from my
NetBSD boxes the packets are making it back as far as the machine on the
far end of my PPP link.

The only apparent difference between connections that work, and
connections that don't, for me at least, is the initial window size.