Cogent NOC

Randy · December 14, 2016, 7:16pm

Hi all,

Is anyone beyond front-line support at cogentco on the list?

NANOG is the last place I'd look for assistance, but it seems support over at cogentco is not nearly what it used to be.

Example MTR to Cogent's own website (support apparently doesn't use or understand MTR at all):

Host                                    Loss%  Snt   Last    Avg   Best   Wrst  StDev
  1. x.x.x.x                             0.0%  196    0.5   11.7    0.3  186.8   35.2
  2. x.x.x.x                             0.0%  196    0.6   10.2    0.4  226.3   36.2
  3. 38.88.249.209                       0.0%  196    0.9    1.1    0.7   17.7    1.2
  4. te0-0-2-3.nr13.b023801-0.iad01.atl  0.0%  196    1.0    1.0    0.8    2.0    0.1
  5. te0-0-0-1.rcr22.iad01.atlas.cogent  2.0%  196    2.1    1.9    1.0    3.3    0.4
  6. be2961.ccr41.iad02.atlas.cogentco.  2.6%  196    1.8    2.1    1.1    3.8    0.5
  7. be2954.rcr21.iad03.atlas.cogentco.  2.6%  196    2.0    2.3    1.2    9.4    0.7
  8. be2952.agr11.iad03.atlas.cogentco.  0.5%  196    2.7    2.6    1.5    6.8    0.6
  9. cogentco.com                        4.1%  196    2.1    2.0    1.0   16.8    1.1

It's pretty much the same to any destination. Packet loss begins at rcr22.iad01 and carries all the way down the line. It is worse during peak hours and gone late at night.
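A common rule of thumb for reading a trace like the one above: loss that appears at one hop but vanishes at later hops is usually that router rate-limiting its own ICMP replies, while loss that carries all the way to the destination is more likely real. The sketch below applies that rule to the mtr numbers above to find the earliest hop from which loss persists to the end. The 0.25% threshold and the function name are hypothetical choices for illustration, and the rule is a heuristic, not proof of where the fault is.

```python
"""Sketch: find the earliest hop from which loss persists to the destination.

Assumption (heuristic only): loss that shows up at one hop but disappears at
later hops is usually ICMP rate limiting on that router; loss that carries
through to the final hop is more likely real forwarding loss.
"""

from typing import List, Optional, Tuple

# (hop label, loss %) copied from the mtr report above; labels are the
# truncated names mtr printed.
MTR_HOPS: List[Tuple[str, float]] = [
    ("x.x.x.x", 0.0),
    ("x.x.x.x", 0.0),
    ("38.88.249.209", 0.0),
    ("te0-0-2-3.nr13.b023801-0.iad01.atl", 0.0),
    ("te0-0-0-1.rcr22.iad01.atlas.cogent", 2.0),
    ("be2961.ccr41.iad02.atlas.cogentco.", 2.6),
    ("be2954.rcr21.iad03.atlas.cogentco.", 2.6),
    ("be2952.agr11.iad03.atlas.cogentco.", 0.5),
    ("cogentco.com", 4.1),
]


def first_persistent_loss_hop(
    hops: List[Tuple[str, float]], threshold: float = 0.25
) -> Optional[int]:
    """Return the 1-based index of the earliest hop such that it and every
    later hop (including the destination) show loss above `threshold`
    percent. Return None if the destination itself shows no such loss.
    `threshold` is a hypothetical noise floor, not a standard value."""
    if not hops or hops[-1][1] <= threshold:
        return None  # no loss at the destination: nothing end to end to chase
    first = len(hops)
    # Walk backwards from the destination while loss stays above threshold.
    for i in range(len(hops) - 1, -1, -1):
        if hops[i][1] > threshold:
            first = i + 1
        else:
            break
    return first
```

On the data above this returns hop 5 (rcr22.iad01), which agrees with where the loss was reported to begin. Note that hop 8 dips to 0.5% yet still counts as lossy under this threshold; a stricter threshold would move the answer, which is one reason to confirm with end-to-end tests.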

After three days with no email response on my ticket, I called. After an hour of my life I want back, front-line support could not reproduce the loss. Their final conclusion: "Your host is dropping packets."

Kurt_Kraut · December 14, 2016, 7:53pm

Hello,

The mtr packet loss column has no scientific precision and should not be
considered. It is not mtr's fault: forwarding routers give low priority
to answering ICMP requests. The only way you can prove there is a problem
is an end-to-end ping, the regular ping command, not mtr.

Best regards,

Kurt Kraut

Andrew_Paolucci · December 14, 2016, 7:59pm

If the loss is seen towards a terminating web server, maybe spool curl response times over the course of a day and correlate them with the loss observed by a SmokePing service.

Support may be more responsive with some hard statistics.
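One way to produce those hard statistics is sketched below, under stated assumptions: `timed_fetch` times a full HTTP fetch with Python's `urllib` as a rough stand-in for curl's total time (it is not curl itself), and `pearson` computes the correlation between two equal-length series, such as per-interval mean response times and the per-interval loss percentages exported from SmokePing for the same intervals. Both function names, and the idea of bucketing both series on the same intervals, are hypothetical illustrations.

```python
import math
import time
import urllib.request
from typing import Sequence


def timed_fetch(url: str, timeout: float = 10.0) -> float:
    """Return wall-clock seconds to fetch `url` in full (roughly comparable
    to curl's total time). Raises on HTTP or network errors."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # include body transfer time, not just headers
    return time.monotonic() - start


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)
```

A hypothetical schedule would call `timed_fetch("https://www.cogentco.com/")` every few minutes, average the results per interval, and pass those averages plus the matching loss series to `pearson`. A value near 1 means slow responses and loss rise together; it shows correlation, not cause.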

Regards,
Andrew Paolucci

Randy · December 14, 2016, 8:04pm

(Understood; however, FYI:)

I also reproduced the results with pings, walking them down the line up to
and including the actual host. The MTR example is simply the clearest
representation of the ping results, which show the same thing.

~Randy

Ken_Chase1 · December 14, 2016, 8:08pm

I was going to reply and repeat Job Snijders's advice of Thu, 7 Jul 2016:

Please review the excellent presentation from RA{T,S}:
https://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf
https://www.youtube.com/watch?v=a1IaRAVGPEE

Especially the PDF there. But in this case Randy's mtr does ping the last
hop: he did see 4.1% packet loss to the endpoint for his specific setup and
current gear/route/etc.

However, I go through the same hostname'd router he does (he didn't provide an
IP, but paris-traceroute doesn't show me any load balancing, at least not visibly),
and I only get 0.3% packet loss. (Though my immediate upstream DSL provider's router
is giving me 0.2% packet loss, so who knows what that 0.3% at the far end really means.)
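Part of why small loss figures are hard to read is sample size. The mtr report at the top of the thread sent 196 probes per hop, so 0.5% is a single lost probe (1/196) and 4.1% is 8 of 196. The sketch below computes a Wilson score interval for a loss proportion to show how wide the plausible range is. It assumes each probe is an independent trial, which real bursty loss violates, and it says nothing about routers deprioritizing ICMP; the function name is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(lost: int, sent: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a loss proportion.

    Returns (low, high) as fractions in [0, 1]; z = 1.96 gives roughly a
    95% interval. Assumes probes are independent trials, which is only an
    approximation: real loss tends to come in bursts.
    """
    if sent <= 0 or not 0 <= lost <= sent:
        raise ValueError("need 0 <= lost <= sent and sent > 0")
    p = lost / sent
    z2 = z * z
    denom = 1.0 + z2 / sent
    center = (p + z2 / (2 * sent)) / denom
    half = z * math.sqrt(p * (1 - p) / sent + z2 / (4 * sent * sent)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

Under these assumptions, 1 lost probe out of 196 gives very roughly 0.1% to 2.8%, and 8 out of 196 roughly 2.1% to 7.8%. Differences of a few tenths of a percent between short runs, like 0.3% versus 0.2%, are well inside that noise.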

Without bidirectional concurrent mtrs (one from Cogent back to him at the
same time), it's quite hard to say what's going on, and even then that's no
guarantee of a diagnosis.

Here's just the most recent thread with some depth on how to read traces and packet loss:

http://seclists.org/nanog/2016/Jul/155

The whole thread is useful, but it is only one of the dozens of times this has come
up on NANOG. (Again: read that PDF!)

/kc

Randy · December 14, 2016, 8:15pm

Walking the line, so to speak, starting with our directly connected Cogent peer. Loss begins at the same hop and carries through to the end host. I'm only using cogentco.com as an example; the results are the same anywhere.

[root@mon ~]# ping -c 1000 -f 38.88.249.209
PING 38.88.249.209 (38.88.249.209) 56(84) bytes of data.

--- 38.88.249.209 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 765ms
rtt min/avg/max/mdev = 0.422/0.703/3.064/0.273 ms, ipg/ewma 0.765/0.648 ms
[root@mon ~]# ping -c 1000 -f 154.24.36.5
PING 154.24.36.5 (154.24.36.5) 56(84) bytes of data.

--- 154.24.36.5 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 754ms
rtt min/avg/max/mdev = 0.523/0.699/2.958/0.190 ms, ipg/ewma 0.755/0.663 ms
[root@mon ~]# ping -c 1000 -f 154.24.36.21
PING 154.24.36.21 (154.24.36.21) 56(84) bytes of data.
.....
--- 154.24.36.21 ping statistics ---
1000 packets transmitted, 995 received, 0% packet loss, time 1473ms
rtt min/avg/max/mdev = 0.718/1.330/3.471/0.461 ms, ipg/ewma 1.475/1.611 ms
[root@mon ~]# ping -c 1000 -f 154.54.42.105
PING 154.54.42.105 (154.54.42.105) 56(84) bytes of data.
........
--- 154.54.42.105 ping statistics ---
1000 packets transmitted, 992 received, 0% packet loss, time 1996ms
rtt min/avg/max/mdev = 0.884/1.794/6.813/0.626 ms, ipg/ewma 1.998/1.983 ms
[root@mon ~]# ping -c 1000 -f 154.54.7.54
PING 154.54.7.54 (154.54.7.54) 56(84) bytes of data.
.....
--- 154.54.7.54 ping statistics ---
1000 packets transmitted, 995 received, 0% packet loss, time 2376ms
rtt min/avg/max/mdev = 1.202/2.227/5.847/0.901 ms, ipg/ewma 2.378/1.465 ms
[root@mon ~]# ping -c 1000 -f 154.54.0.82
PING 154.54.0.82 (154.54.0.82) 56(84) bytes of data.
.........
--- 154.54.0.82 ping statistics ---
1000 packets transmitted, 991 received, 0% packet loss, time 2766ms
rtt min/avg/max/mdev = 1.178/2.530/6.219/0.831 ms, ipg/ewma 2.769/2.583 ms
[root@mon ~]# ping -c 1000 -f 38.100.128.10
PING 38.100.128.10 (38.100.128.10) 56(84) bytes of data.
.....
--- 38.100.128.10 ping statistics ---
1000 packets transmitted, 995 received, 0% packet loss, time 1730ms
rtt min/avg/max/mdev = 0.835/1.548/19.553/0.717 ms, ipg/ewma 1.732/1.077 ms
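Note that every summary above reads "0% packet loss" even where replies are missing; this ping build apparently truncates the figure to a whole percent. In flood mode (-f), ping prints a dot per request and erases one per reply, so the dots left on screen (5, 8, 5, 9, 5) are the missing replies. The sketch below recomputes the exact loss from the transmitted/received counts copied from the runs above; the names `RUNS` and `loss_table` are hypothetical.

```python
import re
from typing import List, Tuple

# Matches e.g. "1000 packets transmitted, 995 received".
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")

# (hop address, summary line) copied from the flood-ping runs above, in order.
RUNS: List[Tuple[str, str]] = [
    ("38.88.249.209", "1000 packets transmitted, 1000 received, 0% packet loss"),
    ("154.24.36.5", "1000 packets transmitted, 1000 received, 0% packet loss"),
    ("154.24.36.21", "1000 packets transmitted, 995 received, 0% packet loss"),
    ("154.54.42.105", "1000 packets transmitted, 992 received, 0% packet loss"),
    ("154.54.7.54", "1000 packets transmitted, 995 received, 0% packet loss"),
    ("154.54.0.82", "1000 packets transmitted, 991 received, 0% packet loss"),
    ("38.100.128.10", "1000 packets transmitted, 995 received, 0% packet loss"),
]


def loss_percent(summary: str) -> float:
    """Exact loss percentage from a ping summary line (not truncated)."""
    m = SUMMARY_RE.search(summary)
    if m is None:
        raise ValueError(f"no ping summary in: {summary!r}")
    sent, received = int(m.group(1)), int(m.group(2))
    return 100.0 * (sent - received) / sent


def loss_table(runs: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Pair each hop address with its exact loss percentage."""
    return [(addr, loss_percent(line)) for addr, line in runs]
```

Run over the seven hops, this gives 0%, 0%, 0.5%, 0.8%, 0.5%, 0.9% and 0.5%: no loss to the first two addresses, then loss from 154.24.36.21 onward, consistent with the claim that it begins at one hop and carries through to the end host.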

Mike_Hammett · December 14, 2016, 8:18pm

I think people are just going to see a traceroute being used to determine packet loss and not read the rest of what happened. They'll just shortcut to an answer.

Bryan_Holloway1 · December 14, 2016, 11:42pm

Odd, though, that they didn't respond for three days. I've typically had good luck with e-mail tickets, although admittedly it's been months since I've opened one with their NOC. Spam filter?

Randy · December 14, 2016, 11:56pm

No, I got a confirmation and ticket ID immediately. Three days later I called it in and they pulled it up; sure enough, nobody had taken ownership. The same thing happened about a month ago. It seems email ticket priority is zero: you HAVE to call to get any kind of action. It also seems the front-line techs are not as skilled as before, with basic pinging and a questionable understanding of traceroutes and asymmetric routing.

It didn't use to be this way.

~Randy

Randy · December 15, 2016, 9:08pm

Hi All,

Final update from Cogent. I'm glad they have finally acknowledged it, but there's no ETA. Just great:

After further investigation, we have identified an issue of congestion on our core device. At this time we are scheduling a maintenance to alleviate the congestion, which in turn will fix the packet loss seen across the Ashburn area.

The maintenance has not yet been scheduled but we will inform you once we have a set date.