Severe latency at both San Jose and Los Angeles Level3/AT&T peering

Hi Nanog,

I have a ticket open with Level 3, with whom I have 1gig pipes in Oakland,
CA and Las Vegas, NV.

One of our users noticed very slow file transfer/media delivery from the
Bay Area to L.A., and on investigating, it appeared that the peering
point between Level3 and AT&T in SF was saturated, with 300ms average
latency.

90 minutes later, having received no call from Level3, I escalated to a P1
ticket, as the latency is now > 1000ms and we're seeing 20% packet loss.

I decided to statically route to the destination via our DR cluster in
Vegas, and interestingly I found the same situation where AT&T and Level3
peer in Tustin.

mtr traceroutes, for those curious:

Via Oakland:

                                My traceroute [v0.71]
hivemind (0.0.0.0)                                   Fri Apr 11 15:36:08 2014
Keys: Help Display mode Restart statistics Order of fields quit
                                        Packets               Pings
    Host                                  Loss%   Last    Avg   Best   Wrst  StDev
 1. 138.72.xxx.xxx                         0.0%    0.3    0.2    0.1    0.3    0.1
 2. pan5060-ae1-401.routerland.pixar.com   0.0%    0.4    0.4    0.4    0.5    0.0
 3. verge-vlan66.pixar.com                 0.0%    0.6    0.7    0.6    0.9    0.1
 4. ge-6-24.car1.Oakland1.Level3.net       0.0%    0.7  105.5    0.7  307.3  110.9
 5. ae-5-5.ebr2.SanJose1.Level3.net        0.0%    1.7    1.7    1.6    2.8    0.4
 6. ae-92-92.csw4.SanJose1.Level3.net      0.0%    1.6    1.7    1.6    3.0    0.4
 7. ae-4-90.edge2.SanJose1.Level3.net      0.0%    1.6    4.6    1.6   37.1   10.2
 8. 192.205.32.209                        41.7%  1042.  1048.  1038.  1059.    9.1
 9. cr1.sffca.ip.att.net                  25.0%  1052.  1059.  1046.  1072.   10.0
10. cr1.la2ca.ip.att.net                  27.3%  1043.  1060.  1043.  1071.   10.7
11. cr83.la2ca.ip.att.net                 16.7%  1058.  1060.  1045.  1073.    8.8
12. gar7.la2ca.ip.att.net                 16.7%  1059.  1061.  1044.  1087.   13.3
13. 12.249.143.98                         33.3%  1059.  1057.  1048.  1071.    7.8
14. ???

Via Las Vegas:

                                My traceroute [v0.71]
hivemind (0.0.0.0)                                   Fri Apr 11 15:36:43 2014
Resolver: Received error response 2. (server failure)
                                        Packets               Pings
    Host                                  Loss%   Last    Avg   Best   Wrst  StDev
 1. 138.72.xxx.xxx                         0.0%    0.2    0.1    0.1    0.2    0.0
 2. pan5060-ae1-401.routerland.pixar.com   0.0%    0.4    0.4    0.3    0.6    0.1
 3. cat-vegas-01-vlan66.pixar.com          0.0%   22.0   21.8   21.7   22.3    0.2
 4. 205.129.21.101                         0.0%   19.4   19.5   19.3   19.9    0.2
 5. ae-2-5.bar1.LasVegas1.Level3.net       0.0%   19.3   21.8   19.3   40.7    5.9
 6. ae-4-4.ebr1.LosAngeles1.Level3.net     0.0%   22.0   22.4   21.9   26.8    1.3
 7. ae-6-6.ebr1.Tustin1.Level3.net         0.0%   20.0   20.2   19.9   21.8    0.5
 8. ae-107-3507.bar2.Tustin1.Level3.net    0.0%   22.0   22.0   21.9   22.1    0.0
 9. 192.205.37.145                        30.8%  1052.  1063.  1048.  1072.    8.1
10. cr1.la2ca.ip.att.net                  35.7%  1050.  1060.  1050.  1070.    7.3
11. cr83.la2ca.ip.att.net                 28.6%  1049.  1064.  1049.  1072.    7.8
12. gar7.la2ca.ip.att.net                 21.4%  1048.  1061.  1048.  1072.    6.7
13. 12.249.143.98                         28.6%  1050.  1061.  1050.  1072.    7.9
14. ???
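
For anyone who wants to pick the congested hop out of a trace like this
without eyeballing it, here is a rough Python sketch. It is not part of mtr.
The names parse_rows and first_sustained_jump and the 100 ms threshold are my
own assumptions, and it expects each hop on a single line (wrapped output
would need rejoining first). It looks for the first hop after which the
average latency stays high all the way to the end, so an isolated spike on
one router (such as hop 4 via Oakland) is ignored.

```python
import re

# One mtr row: hop number, host, Loss%, Last, Avg, Best, Wrst, StDev.
# Truncated values such as "1042." are accepted and read as 1042.0.
ROW = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_rows(text):
    """Return (hop, host, loss_pct, avg_ms) for each row that has statistics."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # rows such as "14. ???" carry no numbers and are skipped
            rows.append(
                (int(m.group(1)), m.group(2), float(m.group(3)), float(m.group(5)))
            )
    return rows


def first_sustained_jump(rows, threshold_ms=100.0):
    """Return the first hop from which every later Avg stays above threshold_ms.

    Returns None when the last hop with statistics is below the threshold.
    """
    start = None
    for row in rows:
        if row[3] > threshold_ms:
            if start is None:
                start = row
        else:
            start = None  # latency recovered, so the earlier spike was local
    return start


# A few rows copied from the Oakland trace above.
SAMPLE = """\
 4. ge-6-24.car1.Oakland1.Level3.net   0.0%    0.7  105.5    0.7  307.3  110.9
 7. ae-4-90.edge2.SanJose1.Level3.net  0.0%    1.6    4.6    1.6   37.1   10.2
 8. 192.205.32.209                    41.7%  1042.  1048.  1038.  1059.    9.1
 9. cr1.sffca.ip.att.net              25.0%  1052.  1059.  1046.  1072.   10.0
14. ???
"""

if __name__ == "__main__":
    print(first_sustained_jump(parse_rows(SAMPLE)))
    # → (8, '192.205.32.209', 41.7, 1048.0)
```

On the sample rows it reports hop 8, the first address past the Level3 edge
router, which matches where the latency and loss begin in both traces.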

Just wanted to share in case anyone else is running into similar issues. I
know, I should be on the outages list. I will add myself now. :)

Regards,
Dave Sotnick

This should provide some background:

http://apps.fcc.gov/ecfs/document/view?id=7022026095

Drive Slow,
Paul