Do network diagnostic tools need an upgrade?

Hello NANOG list members,

I have a question for you: are you happy with the current network
diagnostic tools, like ping, traceroute ... etc.? Don't you think it's time
to have an upgraded version of the ICMP protocol? From my side, there is a
lot that I cannot do with the current tools and protocols. Here are a few
scenarios, and I would like to hear yours:

First scenario:

I want to be able to troubleshoot advanced networks with complex QoS and
policy-based routing configurations, where ping, traceroute and other
network diagnostic tools do not provide accurate readings. For example, you
are troubleshooting a web server with ping, and it looks like it is
functioning quite well (packet loss and round-trip time are both good), but
web services are still significantly slow. The fact is that ICMP and TCP
port 80 might have different priorities and bandwidth limits on each router
along the path between the client and the server. In this case, network
admins usually use telnet-like applications such as PaPing, which may help
if the forward and return paths of all packets are exactly the same.
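
To make the ping-versus-port-80 point concrete, here is a minimal sketch of a PaPing-style probe: it times the TCP handshake to the service port, so the measurement is subject to whatever QoS class the real traffic gets rather than the ICMP class. The function names (tcp_connect_rtt, probe) and the default count and interval are made up for illustration, and this is not how PaPing itself is implemented.

```python
import socket
import time


def tcp_connect_rtt(host, port, timeout=2.0):
    """Return the seconds taken to complete a TCP handshake, or None on failure."""
    start = time.perf_counter()
    try:
        # create_connection returns once the three-way handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Refused, unreachable, or timed out: count it as a lost probe.
        return None


def probe(host, port, count=5, interval=0.2):
    """Run several connect probes and summarise loss and latency."""
    samples = []
    for i in range(count):
        samples.append(tcp_connect_rtt(host, port))
        if i < count - 1:
            time.sleep(interval)
    ok = [s for s in samples if s is not None]
    return {
        "sent": count,
        "received": len(ok),
        "loss_pct": 100.0 * (count - len(ok)) / count,
        "min_ms": min(ok) * 1000 if ok else None,
        "avg_ms": sum(ok) / len(ok) * 1000 if ok else None,
        "max_ms": max(ok) * 1000 if ok else None,
    }
```

Calling probe("www.example.com", 80) and comparing the result with ping output for the same host shows whether the two traffic classes are treated differently. Like ping, it only measures the round trip, so it says nothing about which direction is slow.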

Second scenario:

Another possible scenario is that you need to determine readings for both
the forward and return paths. Traceroute, for example, gives you the forward
path only, using ICMP. But what if you need to troubleshoot a VoIP server,
for example, where the packets' return path might not be the same as the
forward path?

Third scenario:

One of the most common problems in networking is that you don't have
access to all the equipment between client and server, but you have to
troubleshoot the path between them and understand exactly where the problem
is in order to contact the right person, without having the privilege to
check the configuration on each router.

Fourth scenario:

Also, with traceroute you can't determine the actual path. For example,
a router may direct HTTP traffic to a proxy server while leaving other
traffic going through a different hop.

> I have a question for you, are you happy with the current network
> diagnostic tools, like ping, trace route .. etc, don't you think it's time
> to have an upgraded version of icmp protocol? from my side there is a lot
> that I can NOT do with current tools and protocols, here are few scenarios,
> and I would like to hear yours:

Upgrading the ICMP protocol is... challenging. I wouldn't even bother
trying to do anything with IPv4. There might be some options for IPv6, *IF*
you can provide a *specific* proposal that looks worth the added code and
router complexity....

Also, remember that most routers will do packet forwarding in hardware if
they can just suck in bits on one interface and toss them out another - but if
they have to do stuff like create and send an ICMP TTL Exceeded packet,
you end up on the control plane and probably rate-limited.
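
As a rough illustration of why control-plane rate limiting distorts traceroute readings, here is a small token-bucket simulation. The function name and the rate and burst values are invented for the example; real routers implement and tune their ICMP limiters in vendor-specific ways.

```python
def icmp_replies(probe_times, rate_per_sec, burst):
    """Token-bucket sketch: return one boolean per probe, True if it is answered.

    probe_times  -- arrival times of TTL-expiring packets, in seconds, ascending
    rate_per_sec -- hypothetical ICMP generation rate allowed by the router
    burst        -- hypothetical bucket depth (replies allowed back to back)
    """
    tokens = float(burst)
    last = None
    answered = []
    for t in probe_times:
        if last is not None:
            # Refill for the time elapsed since the previous probe, capped at burst.
            tokens = min(float(burst), tokens + (t - last) * rate_per_sec)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0
            answered.append(True)
        else:
            # No token left: the router stays silent, which looks like loss.
            answered.append(False)
    return answered


# Ten probes arriving at the same instant against a 1/s limiter with burst 2:
# only the first two get a reply, although nothing was dropped in forwarding.
print(icmp_replies([0.0] * 10, rate_per_sec=1, burst=2))
```

The apparent loss here is produced entirely by the limiter on the replying router, not by the data path that transit traffic takes through it.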

> applications like (Paping), well it may help if the forward and return path
> of all packets are exactly the same.

That's a routing problem, not an ICMP problem. So are the remainder of your
examples.

Oldies, but goodies (the scenario each one addresses is in parentheses):
shaperprobe (1st), pchar (3rd), tcptrace.org, lft (4th), iftop, nsping
(2nd), iperf, sjitter, pathneck (3rd)

These are newer:
http://www.internet2.edu/products-services/performance-monitoring/performance-tools/
(OWAMP, 2nd)
http://paris-traceroute.net (4th)
http://packetdrill.googlecode.com

dre

> Hello NANOG list members,
>
> I have a question for you, are you happy with the current network
> diagnostic tools, like ping, trace route .. etc,

What tools are you referring to by "..."? There are many others. I like
tcptraceroute (there are two variants of it) and mtr.

> don't you think it's time
> to have an upgraded version of icmp protocol?

What is it that you are thinking?

ICMP is for signaling between machines. Increasing signaling for human
diagnostics can lead to reconnaissance attacks. We don't want yet
another option for some to incorrectly block ICMP [1], which in turn
leads to other problems.

[1] ... when they want to just block ICMP echo and reply, which is also
bad enough and must be done really selectively.

> First scenario:
>
> To be able to troubleshoot advanced networks with complex QoS and
> policy-based routing configuration, where ping, traceroute and other
> network diagnostic tools do not provide accurate readings, for example, you
> are troubleshooting a web server with ping, and it looks functioning quite
> well (packet loss and round trip time is all good), but web services are
> still significantly slow, the fact is icmp and tcp:80 might have different
> priorities and bandwidth limits on each router along the path between the
> client and the server, in this case, network admins usually use telnet
> applications like (Paping), well it may help if the forward and return path
> of all packets are exactly the same.

tcptraceroute.

> Second scenario:
>
> So another possible scenario is that you need to determine readings for
> forward and return paths, TraceRoute for example gives you forward path
> only using icmp. But what if you need to troubleshoot a VoIP server for
> example, assuming that packets return path might not be the same as forward
> path.

It depends. Asymmetric routing is not necessarily bad unless it causes a
problem like packet loss, high latency, etc. For example, if the return
path has packet loss, you should 'hopefully' (yeah I know...) notice
it in the traceroute if you increase the probe count or run it twice.
Or try mtr, a periodic traceroute with a different statistics presentation
that significantly reduces the 'hopefully' problem.
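
The idea of running traceroute repeatedly and looking at per-hop statistics can be sketched as below. This is only an illustration of the aggregation, not mtr's actual code; the input format (one list of RTTs per pass, None for a missing reply) and the function names are assumptions made for the example.

```python
def summarize_hops(runs):
    """Aggregate repeated traceroute passes into per-hop loss and latency.

    runs -- list of passes; each pass is a list with one RTT in ms per hop,
            or None where that hop did not reply.
    """
    hop_count = max(len(r) for r in runs)
    stats = []
    for hop in range(hop_count):
        samples = [r[hop] for r in runs if hop < len(r)]
        replies = [s for s in samples if s is not None]
        stats.append({
            "hop": hop + 1,
            "sent": len(samples),
            "loss_pct": 100.0 * (len(samples) - len(replies)) / len(samples),
            "avg_ms": sum(replies) / len(replies) if replies else None,
        })
    return stats


def loss_persists(stats, hop_index):
    """True if loss seen at hop_index (0-based) is also seen at every later hop.

    Loss that shows up at one hop but not at the hops behind it is usually
    that router rate-limiting its own replies, not loss of forwarded traffic.
    """
    return all(s["loss_pct"] > 0 for s in stats[hop_index:])


passes = [
    [1.0, None, 10.0],
    [1.0, 5.0, 10.0],
    [1.0, None, 12.0],
    [1.0, 5.0, 12.0],
]
for row in summarize_hops(passes):
    print(row)
```

In this made-up data, hop 2 shows 50% loss while hop 3 shows none, so loss_persists reports False for hop 2 and the loss there is probably not affecting traffic to the destination.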

> Third scenario:
>
> One of the most common problems in networking, is that you don't have
> access to all equipment between client and server, but you have to
> troubleshoot the path between them and to understand where the problem
> exactly is in order to contact the right person without having the
> privilege to check the configuration on each router.

This one's more difficult but also "it depends". State a specific
problem case.

> Fourth scenario:
>
> Also, with trace route you can't determine the actual path, for example,
> the router may direct http traffic to proxy server while leaving other
> traffic going through a different hop.

tcptraceroute.

There are lesser-known options that are used by folks, e.g. ping record-route.

One could certainly use those available tools, but most folks have a hard
enough time interpreting traceroute output. I've seen customers complain
about performance, only to have us show them that the problem is in their
own network, or their firewall modules, etc.

Having statistics on network usage/errors/drops is incredibly useful in
isolating the performance limitations. Knowing that a firewall maxes out at
350Mb/s is just as useful as having protocol extensions to collect the data.

One of my early experiences was with a sysadmin who only cared about the
application/OS, for whom "the router is a black box that gets my packets there".
Knowing the behavior beyond that is also important (how latency/loss impacts
TCP/UDP/application performance, for example).
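
As one rough way to see how latency and loss feed into TCP performance, the sketch below uses the well-known Mathis et al. approximation for steady-state, Reno-style TCP throughput: rate <= (MSS / RTT) * (C / sqrt(p)). Treat it as an order-of-magnitude estimate under that model's assumptions; modern congestion control algorithms behave differently, and the sample numbers are made up.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate upper bound on TCP throughput in bits per second.

    mss_bytes -- maximum segment size in bytes
    rtt_s     -- round-trip time in seconds
    loss_rate -- packet loss probability, 0 < loss_rate <= 1
    c         -- model constant; sqrt(3/2) is the value for periodic loss
    """
    if loss_rate <= 0 or rtt_s <= 0:
        raise ValueError("rtt_s and loss_rate must be positive")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Example: 1460-byte MSS, 100 ms RTT, 1% loss gives roughly 1.4 Mb/s,
# regardless of how fast the links on the path are.
print(round(mathis_throughput_bps(1460, 0.100, 0.01) / 1e6, 2), "Mb/s")
```

Under this model, halving the RTT doubles the bound, while the loss rate has to fall by a factor of four to do the same.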

Most importantly, keeping an open mind when troubleshooting is helpful. Sometimes
you find something unexpected (e.g., uRPF drops when the responding IP is
mapped-v4-in-v6 from within a 6PE network).

- Jared

I like Observium for monitoring gear: tons of information, and a great way
to find erroring fiber across thousands of devices. It also caught some
memory leaks before they impacted anything. This is in addition to flow
data, of course.

Bryan
DigitalOcean