One of my NetOps engineers lives in Lima, Ohio, and is experiencing terrible bufferbloat, packet loss, and random disconnects.
I have been pinging 220.127.116.11 (Lima, OH Spectrum/Charter node) and it’s dropping a ton of packets. This has been going on for weeks.
Node having problems: lag-1.limaohid01h.netops.charter.com
The NOC doesn’t seem to care, and neither does OSP in the field.
There is no reason why this hop (#13) should have up to 613ms ping times.
If you run MTRs or traceroutes through the node, is there any additional packet loss further along the path and at the destination? What does the reverse MTR or traceroute look like? The attached image was stripped out by the mailing list system.
Bufferbloat is controlled at the firewall level, and it is a different problem from packet loss and disconnects.
ICMP response time from a router/device is not a great way to judge whether there is an issue. These devices generally have control plane policing, and responding to ICMP is not prioritized at all.
I would suggest your engineer set up something on their end of the connection that you can ping, and start there.
Leverage something like SmokePing pointed at something on their LAN, or even at the public IP on their RG/modem.
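To make the end-to-end measurement concrete, here is a minimal Python sketch of the kind of per-round summary worth tracking for that target: loss percentage and median round-trip time over a batch of probes. The function name `summarize_round` and the input format (a list of RTTs in milliseconds, with `None` for a probe that got no reply) are assumptions for illustration; this is not SmokePing's code or API, and collecting the samples is left to whatever ping tool you already use.

```python
from statistics import median


def summarize_round(rtts_ms):
    """Summarize one round of probes to a single end-to-end target.

    rtts_ms: list of round-trip times in milliseconds, with None for
             any probe that received no reply (hypothetical input format).
    """
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    # Loss is the share of probes with no reply; guard against an empty round.
    loss_pct = 100.0 * lost / sent if sent else 0.0
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        # Median is less sensitive to a single slow reply than the mean.
        "median_ms": median(replies) if replies else None,
    }


if __name__ == "__main__":
    # Four probes, one lost.
    print(summarize_round([20.0, 22.0, None, 21.0]))
```

Run against the engineer's own LAN host or modem IP over several days, numbers like these describe what the connection is actually doing, which a transit router's ICMP replies do not.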
For those who haven’t seen it (i.e. Austin), here is “the guide” on how to troubleshoot correctly with traceroute: https://archive.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf
ICMP is deprioritized by any normal router. Non-cascading loss does not indicate a problem of any kind. The NOC doesn’t care because nothing is wrong, and the OSP team definitely doesn’t care because ICMP is several layers above OSP and is therefore not their problem.
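The "non-cascading loss" test can be written down as a rough heuristic. The Python sketch below assumes you have already copied the per-hop loss percentages out of an MTR report into a list ordered from first hop to destination. The function name `loss_cascades` and the 1% `tolerance` default are assumptions for illustration, not part of MTR or any other tool.

```python
def loss_cascades(hop_loss_pct, hop_index, tolerance=1.0):
    """Return True if loss at hop_index carries through to the destination.

    hop_loss_pct: per-hop loss percentages, ordered from the first hop
                  to the destination (hypothetical input, e.g. copied
                  by hand from an MTR report).
    hop_index:    zero-based index of the hop under suspicion.
    tolerance:    loss at or below this percentage is treated as noise.
    """
    if hop_loss_pct[hop_index] <= tolerance:
        return False  # no meaningful loss at this hop to begin with
    later_hops = hop_loss_pct[hop_index + 1:]
    # Real forwarding loss affects everything behind the hop, so every
    # later hop (and the destination) should show loss as well.
    return all(loss > tolerance for loss in later_hops)


if __name__ == "__main__":
    # Loss only at the third hop: likely ICMP rate limiting on that router.
    print(loss_cascades([0.0, 0.0, 40.0, 0.0, 0.0], 2))
    # Loss from the third hop all the way to the destination: worth escalating.
    print(loss_cascades([0.0, 0.0, 40.0, 38.0, 41.0], 2))
```

If the function returns False for the suspect hop, the loss is most likely that router deprioritizing its own ICMP replies rather than dropping forwarded traffic.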