Packet loss on XO's network

Since ~11:08 am Central our monitoring system has been reporting
intermittent HTTP-over-IPv6 timeouts to www.sprint.net, www.qwest.com,
enterprise.com, and enterprise.ca. It goes bad for a while and then clears.

According to mtr, all of these paths have XO's network in common, and packet
loss to the endpoint stays above 40%. The loss seems to start between
2610:18:18b:c000::11 and mcr1.minneapolis-mn.us.xo.net [2610:18::30b8].
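
For anyone who wants to apply the same reading of the mtr output
mechanically, here is a minimal Python sketch. It walks back from the
destination and finds where loss that persists to the endpoint first
appears; loss at an intermediate hop that does not carry through to later
hops is ignored. The loss figures are copied from the www.sprint.net trace
in this thread. The function name `locate_loss_onset` and the 5% threshold
are assumptions made for illustration, not anything mtr provides.

```python
from typing import List, Optional, Tuple

Hop = Tuple[int, str, float]  # (hop number, host, loss percent)

# Loss% values copied from the www.sprint.net trace in this thread.
SPRINT_TRACE: List[Hop] = [
    (1, "router-core.mtcnet.net", 0.0),
    (2, "sxct.sxcy.mtcnet.net", 0.0),
    (3, "v6-premier.sxcy-mlx.fbnt.netins.net", 0.0),
    (4, "v6-ins-db4-te-0-6-0-4-219.desm.netins.net", 55.4),
    (5, "v6-ins-dc1-te-7-4.desm.netins.net", 54.3),
    (6, "2610:18:18b:c000::11", 0.0),
    (7, "mcr1.minneapolis-mn.us.xo.net", 49.8),
    (8, "2610:18::5802", 47.4),
    (9, "2610:18::5202", 45.0),
    (10, "2610:18::2070", 50.6),
    (11, "sl-st31-ash-te0-15-2-0.v6.sprintlink.net", 43.4),
    (12, "sl-crs3-dc-be13.v6.sprintlink.net", 46.7),
    (13, "sl-crs1-ffx-be2.v6.sprintlink.net", 44.1),
    (14, "sl-crs1-orl-bu-1.v6.sprintlink.net", 41.7),
    (15, "sl-lkdstr2-p1-0.v6.sprintlink.net", 41.0),
    (16, "www.sprint.net", 43.1),
]


def locate_loss_onset(
    hops: List[Hop], threshold: float = 5.0
) -> Optional[Tuple[Optional[Hop], Hop]]:
    """Return (last clean hop, first lossy hop) for end-to-end loss.

    Returns None when the final hop itself is below the threshold, i.e.
    there is no loss that persists to the destination.
    """
    if not hops or hops[-1][2] <= threshold:
        return None
    # Walk backwards from the destination while loss stays above threshold.
    first_lossy = len(hops) - 1
    while first_lossy > 0 and hops[first_lossy - 1][2] > threshold:
        first_lossy -= 1
    last_clean = hops[first_lossy - 1] if first_lossy > 0 else None
    return last_clean, hops[first_lossy]


if __name__ == "__main__":
    onset = locate_loss_onset(SPRINT_TRACE)
    if onset is not None and onset[0] is not None:
        clean, lossy = onset
        print(f"loss begins between hop {clean[0]} ({clean[1]}) "
              f"and hop {lossy[0]} ({lossy[1]})")
```

On this trace the sketch skips the loss reported at hops 4 and 5, because
hop 6 is clean, and places the onset between hop 6 and hop 7.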

Frank

To enterprise.com:

Host                                           Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1. router-core.mtcnet.net                      0.0%   929    0.3    0.6    0.2   37.6    2.5
 2. sxct.sxcy.mtcnet.net                        0.0%   929    0.3    0.3    0.2   11.2    0.6
 3. v6-premier.sxcy-mlx.fbnt.netins.net         0.0%   929    1.6    1.5    1.5   12.6    0.9
 4. v6-ins-db4-te-0-6-0-4-219.desm.netins.net  52.4%   929    8.1    8.0    7.8   10.5    0.3
 5. v6-ins-dc1-te-7-4.desm.netins.net          52.4%   929    7.9    8.0    7.6   10.9    0.5
 6. 2610:18:18b:c000::11                        0.0%   929   13.9   13.2   12.8   46.8    3.0
 7. mcr1.minneapolis-mn.us.xo.net              50.3%   929   54.7   71.2   54.6   86.7    7.7
 8. 2610:18::5802                              50.8%   929   55.7   72.8   54.6   84.3    7.8
 9. 2610:18::5202                              51.4%   929   54.4   72.9   54.4   85.0    8.0
    2610:18::5804
10. 2610:18::2070                              52.3%   928   54.4   71.5   54.4   90.7    7.9
    2610:18::5304
11. 2600:803:2::9                              41.6%   928   54.5   56.8   47.8   96.7    3.4
    2610:18::5303
12. 2610:18::5200                              96.0%   928   54.5   55.8   54.4   75.6    3.7
13. 2610:18::5202                              96.0%   928   55.5   56.6   54.6   58.5    1.2
14. 2610:18::2070                              96.0%   928   61.8   55.1   54.6   61.8    1.2
15. 2600:803:2::9                              95.9%   928   45.0   45.1   45.0   48.0    0.5
16. ???

To www.sprint.net:

                                                  Packets               Pings
Host                                           Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1. router-core.mtcnet.net                      0.0%   846    0.4    0.8    0.2   91.5    3.9
 2. sxct.sxcy.mtcnet.net                        0.0%   846    0.9    0.3    0.2   10.3    0.4
 3. v6-premier.sxcy-mlx.fbnt.netins.net         0.0%   846    1.5    1.6    1.5   12.0    0.8
 4. v6-ins-db4-te-0-6-0-4-219.desm.netins.net  55.4%   846    7.9    8.1    7.8   14.8    0.4
 5. v6-ins-dc1-te-7-4.desm.netins.net          54.3%   846    7.8    8.2    7.6   19.2    1.0
 6. 2610:18:18b:c000::11                        0.0%   846   12.9   14.0   12.8   73.0    5.4
 7. mcr1.minneapolis-mn.us.xo.net              49.8%   846   54.7   70.0   54.6  100.1    8.1
 8. 2610:18::5802                              47.4%   846   58.3   71.9   54.6   82.9    8.2
 9. 2610:18::5202                              45.0%   846   54.4   72.1   54.4   83.3    8.4
    2610:18::5804
10. 2610:18::2070                              50.6%   845   54.4   70.4   54.3   88.8    8.1
    2610:18::5304
11. sl-st31-ash-te0-15-2-0.v6.sprintlink.net   43.4%   845   57.5   59.1   53.1   80.5    2.3
    2610:18::5303
12. sl-crs3-dc-be13.v6.sprintlink.net          46.7%   845   54.6   64.7   54.4   69.1    3.7
    2610:18::5200
13. sl-crs1-ffx-be2.v6.sprintlink.net          44.1%   844   55.0   71.8   55.0   79.7    5.3
    2610:18::5202
14. sl-crs1-orl-bu-1.v6.sprintlink.net         41.7%   844   54.9   75.7   54.7   96.3    7.4
    2610:18::2070
15. sl-lkdstr2-p1-0.v6.sprintlink.net          41.0%   844   46.4   73.5   45.8   81.7    7.0
    sl-st31-ash-te0-15-2-0.v6.sprintlink.net
16. www.sprint.net                             43.1%   844   54.5   74.6   53.5   81.6    5.0
    sl-crs3-dc-be13.v6.sprintlink.net

XO contacted me offline. Things have been stable since ~12:15 pm Central.

Frank

Terri,

Except the initial traces show that hop 6 is clean. If the problem really
started at hop 4, the loss would likely show up at hop 6 as well (unless
hashing on the return path, keyed on the far-end router's source IP, caused
only some of the traffic to be thrown away).
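
To illustrate the hashing caveat, here is a toy Python sketch of
source-address hashing across parallel links where one member link is
dropping traffic. Everything in it is hypothetical: the function names, the
link count, the choice of bad link, and the use of SHA-256 as the hash (real
routers use vendor-specific hash functions and usually more fields than the
source address). It only shows why replies from some router addresses could
be lost while others on the same path look clean.

```python
import hashlib
import ipaddress
from typing import List

# Router addresses that appear in the traces in this thread.
ROUTERS: List[str] = [
    "2610:18:18b:c000::11",
    "2610:18::30b8",
    "2610:18::5802",
    "2610:18::5202",
    "2610:18::2070",
    "2600:803:2::9",
]


def pick_link(src: str, n_links: int) -> int:
    """Map a source address to one of n_links parallel links.

    Toy stand-in for a router's load-balancing hash; SHA-256 is an
    assumption chosen for determinism, not what any vendor uses.
    """
    packed = ipaddress.IPv6Address(src).packed
    digest = hashlib.sha256(packed).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def affected_sources(sources: List[str], n_links: int, bad_link: int) -> List[str]:
    """Return the sources whose return traffic would hash onto the bad link."""
    return [s for s in sources if pick_link(s, n_links) == bad_link]


if __name__ == "__main__":
    # Hypothetical scenario: 4 parallel links, link 0 is dropping packets.
    for src in ROUTERS:
        print(f"{src:<24} -> link {pick_link(src, 4)}")
    print("would show loss:", affected_sources(ROUTERS, n_links=4, bad_link=0))
```

Because the mapping depends only on the source address, a given router's
replies always take the same link, so in this scenario a hop would look
either consistently lossy or consistently clean.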

Hops 4 and 5 always show packet loss - I believe it's due to ICMPv6 rate
limiters protecting their CPUs:

Frank