We had a very strange problem today. Two of our hosts could not reach
a particular server, while all of our other hosts could reach it
fine. (OK, I didn't try ALL of our IPs, but the half dozen I did try
worked fine.)
I checked all of our firewalls and routers, and everywhere I looked
the traffic was exiting our network just fine. On our edge routers I
could see the traffic going out, but no traffic coming back to the two
hosts in question. (We had good bidirectional traffic to all of our
other hosts.) And the two hosts in question were only having problems
connecting to ftp.agnewsonline.com.
Let's start with a traceroute from a working host. The originating
host is 12.192.92.14:
[~]% traceroute -I ftp.agnewsonline.com
traceroute to agnewsonline.com (64.46.45.226), 64 hops max, 60 byte packets
1 12.192.92.3 (12.192.92.3) 0.257 ms 0.171 ms 0.163 ms
2 pluto-0 (12.192.93.13) 0.401 ms 0.296 ms 0.294 ms
3 ixion-att (12.192.93.244) 1.260 ms 0.463 ms 1.116 ms
4 12.87.125.249 (12.87.125.249) 14.838 ms 9.314 ms 9.755 ms
5 tbr2.cgcil.ip.att.net (12.122.99.122) 24.528 ms 24.788 ms 23.009 ms
6 ggr2.cgcil.ip.att.net (12.123.6.69) 22.362 ms 23.410 ms 22.335 ms
7 192.205.33.186 (192.205.33.186) 23.448 ms 24.074 ms 29.405 ms
8 ae-31-53.ebr1.Chicago1.Level3.net (4.68.101.94) 22.800 ms 32.598 ms 36.093 ms
9 ae-68.ebr3.Chicago1.Level3.net (4.69.134.58) 23.446 ms 21.599 ms 34.060 ms
10 ae-3.ebr2.Denver1.Level3.net (4.69.132.61) 61.517 ms 57.482 ms 56.606 ms
11 ae-2.ebr2.Seattle1.Level3.net (4.69.132.53) 96.484 ms 114.264 ms 96.984 ms
12 ae-23-52.car3.Seattle1.Level3.net (4.68.105.36) 91.295 ms 88.700 ms 89.705 ms
13 BIG-PIPE-IN.car3.Seattle1.Level3.net (4.71.152.26) 90.053 ms 90.511 ms 92.072 ms
14 rc1wh-pos14-0.vc.shawcable.net (66.163.76.1) 90.062 ms 93.489 ms 90.757 ms
15 rc2wh-pos0-15-2-0.vc.shawcable.net (66.163.69.181) 96.527 ms 91.743 ms 97.254 ms
16 rd1ht-tge1-1-1.ok.shawcable.net (66.163.77.18) 101.412 ms 114.160 ms 100.530 ms
17 ra1ht-ge3-1.ok.shawcable.net (66.163.72.134) 105.651 ms 101.336 ms 101.628 ms
18 rx0ht-rack-force-2.ok.bigpipeinc.com (64.251.64.50) 111.960 ms 101.535 ms 116.136 ms
19 rf1.01.rackforce.net (69.10.128.198) 583.192 ms 491.170 ms 598.406 ms
20 64.46.45.226 (64.46.45.226) 110.207 ms 108.718 ms 107.279 ms
A traceroute from one of the hosts that doesn't work would reach
ae-3.ebr2.Denver1.Level3.net but go no further. I then tried pinging
the routers I couldn't reach. I could not ping:
ae-3.ebr2.Denver1.Level3.net (4.69.132.61)
ae-2.ebr2.Seattle1.Level3.net (4.69.132.53)
ae-23-52.car3.Seattle1.Level3.net (4.68.105.36)
BIG-PIPE-IN.car3.Seattle1.Level3.net (4.71.152.26)
but when I started pinging rc1wh-pos14-0.vc.shawcable.net (66.163.76.1)
not only did I start getting responses, but everything started working
to ftp.agnewsonline.com too, but just from that host. It really seemed
that pinging that router somehow fixed my problem.
Well, I'm not sure I really believed that, but I still had another
host that couldn't reach ftp.agnewsonline.com, so on that host I
started a ping. My comments, inside /* */, describe what I was doing
in another window:
[~]% ping ftp.agnewsonline.com
PING agnewsonline.com (64.46.45.226): 56 data bytes
/* At this point, in another window, I started another ping: */
/* ping 66.163.76.1 and immediately this ping started working ... */
64 bytes from 64.46.45.226: icmp_seq=18 ttl=108 time=104.617 ms
64 bytes from 64.46.45.226: icmp_seq=19 ttl=108 time=105.775 ms
64 bytes from 64.46.45.226: icmp_seq=20 ttl=108 time=101.569 ms
--- agnewsonline.com ping statistics ---
22 packets transmitted, 3 packets received, 86% packet loss
round-trip min/avg/max/stddev = 101.569/103.987/105.775/1.774 ms
It was like I threw a switch. The single outbound ICMP packet to
rc1wh-pos14-0.vc.shawcable.net (66.163.76.1) fixed everything for that
host.
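For anyone who wants to repeat the test on an affected host, here is a
rough Python sketch of the sequence described above: ping the target,
send one ping to the Shaw router, then ping the target again and
compare the loss. The function names (parse_summary, loss_percent,
ping, reproduce) are made up for illustration, and the sketch assumes
a Unix-style ping that accepts -c COUNT and prints a
"N packets transmitted, M ... received" summary line.

```python
import re
import subprocess

# Matches summary lines such as
#   "22 packets transmitted, 3 packets received, 86% packet loss"  (BSD style)
#   "5 packets transmitted, 5 received, 0% packet loss, time 4005ms" (Linux style)
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_summary(output):
    """Return (transmitted, received) from ping's summary line, or None."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def loss_percent(transmitted, received):
    """Packet loss as a percentage of the packets sent."""
    if transmitted == 0:
        return 0.0
    return 100.0 * (transmitted - received) / transmitted


def ping(host, count=5):
    """Run the system ping (assumption: it supports -c COUNT) and parse it."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_summary(result.stdout)


def reproduce(target="ftp.agnewsonline.com", router="66.163.76.1"):
    """Ping target, send a single ping to router, then ping target again."""
    before = ping(target)
    ping(router, count=1)  # the single outbound packet that seemed to matter
    after = ping(target)
    return before, after
```

Calling reproduce() on one of the broken hosts returns the
(transmitted, received) counts before and after the ping to the
router. If the behavior above repeats, the first pair should show no
replies and the second should show replies.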
I was wondering if anybody has any clue what might be going on. I've
never experienced a problem like this before.