barak-online.net icmp performance vs. traceroute/tcptraceroute, ssh, ipsec

I was wondering if someone could shed some light on this little curiosity.

US pings (sourced from different networks, including a cable customer in the NE) to the consumer-grade residential Israeli DSL CPE (currently a Cisco 871) look really nice and sweet, GoToMyPC works alright, and the consumer is enjoying the overall internet experience.

VNC from the customer to the US is a non-starter. SSH from the US almost never works. IPsec performance is horrid.

traceroute/tcptraceroute show packet loss and MUCH higher rtt than the corresponding direct pings on the reported hop entries.

Is this some sort of massaging, or plain just "faking it"? Or are such things merely net-urban myth?

Here is a traceroute snippet

  8 dcr3-ge-0-2-1.newyork.savvis.net (204.70.193.98) 31.008 ms 31.539 ms 31.248 ms
  9 208.173.129.14 (208.173.129.14) 62.847 ms 31.095 ms 30.690 ms
10 barak-01814-nyk-b2.c.telia.net (213.248.83.2) 30.529 ms 30.820 ms 30.495 ms
11 * * po1-3.bk3-bb.013bk.net (212.150.232.214) 277.722 ms
12 gi2-1.bk6-gw.013bk.net (212.150.234.94) 223.398 ms 235.616 ms 214.551 ms
13 * * gi11-24.bk6-acc3.013bk.net (212.29.206.37) 227.259 ms
14 212.29.206.60 (212.29.206.60) 244.369 ms * 246.271 ms
15 89.1.148.230.dynamic.barak-online.net (89.1.148.230) 251.923 ms 256.817 ms *

Compared to ICMP echo

root@jml03:~# ping 89.1.148.230
PING 89.1.148.230 (89.1.148.230) 56(84) bytes of data.
64 bytes from 89.1.148.230: icmp_seq=1 ttl=240 time=190 ms

--- 89.1.148.230 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 190.479/190.479/190.479/0.000 ms
root@jml03:~# ping 89.1.148.230
PING 89.1.148.230 (89.1.148.230) 56(84) bytes of data.
64 bytes from 89.1.148.230: icmp_seq=1 ttl=240 time=186 ms
64 bytes from 89.1.148.230: icmp_seq=2 ttl=240 time=196 ms
64 bytes from 89.1.148.230: icmp_seq=3 ttl=240 time=187 ms
64 bytes from 89.1.148.230: icmp_seq=4 ttl=240 time=181 ms
64 bytes from 89.1.148.230: icmp_seq=5 ttl=240 time=184 ms
64 bytes from 89.1.148.230: icmp_seq=6 ttl=240 time=190 ms

--- 89.1.148.230 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5001ms
rtt min/avg/max/mdev = 181.572/187.756/196.277/4.685 ms
root@jml03:~# ping 212.29.206.60
PING 212.29.206.60 (212.29.206.60) 56(84) bytes of data.
64 bytes from 212.29.206.60: icmp_seq=1 ttl=241 time=179 ms
64 bytes from 212.29.206.60: icmp_seq=2 ttl=241 time=171 ms
64 bytes from 212.29.206.60: icmp_seq=3 ttl=241 time=171 ms

--- 212.29.206.60 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 171.388/174.375/179.968/3.972 ms

root@jml03:~# ping 212.29.206.37
PING 212.29.206.37 (212.29.206.37) 56(84) bytes of data.
64 bytes from 212.29.206.37: icmp_seq=1 ttl=242 time=177 ms
64 bytes from 212.29.206.37: icmp_seq=2 ttl=242 time=176 ms
64 bytes from 212.29.206.37: icmp_seq=3 ttl=242 time=175 ms

--- 212.29.206.37 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 175.412/176.516/177.187/0.858 ms

Joe

> traceroute/tcptraceroute show packet loss and MUCH higher rtt than the
> corresponding direct pings on the reported hop entries.
>
> Is this some sort of massaging, or plain just "faking it"? Or are such
> things merely net-urban myth?

the vast majority of routers on the internet respond very differently to
traffic 'directed at them' as opposed to traffic 'routed through them'.

many routers will punt traffic "at them" (such as icmp echo) to a low-priority
control-plane (software) stack to respond to. this is vastly different to what
may well be a hardware (ASIC) based forwarding path.

many routers will also typically rate-limit the number of such queries they
respond to per second. this may even be a tunable setting (e.g. CoPP on some
Cisco products).
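
(as an aside, a linux box does the same thing in miniature; this is purely an illustration of the idea, not the config on any router in this path -- the kernel rate-limits the error icmp it generates, which is exactly the icmp that traceroute relies on:)

  sysctl net.ipv4.icmp_ratelimit   # minimum gap, in ms, between rate-limited icmp replies
  sysctl net.ipv4.icmp_ratemask    # bitmask of icmp types the limit applies to (time-exceeded, unreachables, ...)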

i'd suggest that you don't try to read ANYTHING into comparing 'traceroute'
with end-to-end icmp echo. note, too, that traceroute only shows one direction
of traffic.

if you have IPSec/SSH and/or TCP in general which simply "doesn't work right",
i suggest you first verify that the end-to-end MTU is appropriate. my bet is
that it isn't, and that PMTUD isn't working as expected because of some
filtering and/or broken devices/configuration in the path.

try sending 1500-byte pings with DF set & see if they get through.
my money is on they don't.
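
something along these lines, assuming a linux source host with iputils ping (-M do sets DF; 1472 bytes of payload + 8 icmp + 20 ip header = 1500 on the wire):

  ping -c 4 -M do -s 1472 89.1.148.230   # full 1500-byte packets, DF set
  ping -c 4 -M do -s 1464 89.1.148.230   # 1492 total, roughly what pppoe leaves you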

cheers,

lincoln.

Lincoln Dale wrote:

> traceroute/tcptraceroute show packet loss and MUCH higher rtt than the
> corresponding direct pings on the reported hop entries.
>
> Is this some sort of massaging, or plain just "faking it"? Or are such
> things merely net-urban myth?
>
> the vast majority of routers on the internet respond very differently to
> traffic 'directed at them' as opposed to traffic 'routed through them'.

Thanks for your reply.

I did include icmp echo directly to each hop as a comparison.

> the vast majority of routers on the internet respond very differently to
> traffic 'directed at them' as opposed to traffic 'routed through them'.

> Thanks for your reply.
>
> I did include icmp echo directly to each hop as a comparison.

i guess what i'm saying is that you can't use the backscatter of either:
- a ping of each hop, or
- eliciting a response from each hop (as traceroute does)
as the basis for determining much.

you can perhaps derive SOME meaning from it, but that meaning rapidly
diminishes when there are multiple intermediate networks involved, some of
which you have no direct connectivity to verify problems with easily, likely
different return path for traffic (asymmetric routing) etc.

as i said before, if you have such terrible ssh/IPSec type performance, far
less than you think is reasonable, then my money is on an MTU issue, and
probably related to your DSL-based final hops.

cheers,

lincoln.

Right, but from what you posted you didn't send 1500-byte packets. My
reaction was the same as Lincoln's -- it smells like a Path MTU
problem. To repeat -- ping and traceroute RTT from intermediate nodes
is at best advisory, especially on timing.

I should add -- DSL lines often use PPPoE, which in turn cuts the
effective MTU available for user packets. If the PMTUD ICMP packets
don't get through -- and they often don't, because of misconfigured
firewalls -- you're likely to see problems like this.
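
A quick way to see where the path MTU actually drops, assuming a Linux source host with iputils, is tracepath; the target here is just the CPE address from the earlier trace:

  tracepath -n 89.1.148.230    # reports "pmtu NNNN" as it walks the path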

    --Steve Bellovin, http://www.cs.columbia.edu/~smb

> I did include icmp echo directly to each hop as a comparison.
>
> Right, but from what you posted you didn't send 1500-byte packets. My
> reaction was the same as Lincoln's -- it smells like a Path MTU
> problem. To repeat -- ping and traceroute RTT from intermediate nodes
> is at best advisory, especially on timing.
>
> I should add -- DSL lines often use PPPoE, which in turn cuts the
> effective MTU available for user packets. If the PMTUD ICMP packets
> don't get through -- and they often don't, because of misconfigured
> firewalls -- you're likely to see problems like this.

Of course, and that's why I have cut down ip mtu and tcp adjust-mss and all the rest.

Not making much of a difference.

Furthermore, IPsec performance with normal-sized icmp pings is what I was referring to, and those are nowhere near full-sized.

Lincoln Dale wrote:

> I did include icmp echo directly to each hop as a comparison.
>
> i guess what i'm saying is that you can't use the backscatter of either:
> - a ping of each hop, or
> - eliciting a response from each hop (as traceroute does)
> as the basis for determining much.
>
> you can perhaps derive SOME meaning from it, but that meaning rapidly
> diminishes when there are multiple intermediate networks involved, some of
> which you have no direct connectivity to verify problems with easily, likely
> different return path for traffic (asymmetric routing) etc.

When the cards consistently fall in certain patterns, you can actually read them quite easily.

The standard control plane arguments don't apply when the pattern holds all the way through to equipment under your {remote-}control.

In this specific instance, I find the disparity of results between per-hop ICMP echo and traceroute time-exceeded processing interesting, all the way up to the final hop.

I wouldn't care if the application protocols rode well, but they don't seem to.

> When the cards consistently fall in certain patterns, you can actually
> read them quite easily.

Not if the cardplayer is lying..

> The standard control plane arguments don't apply when the pattern holds
> all the way through to equipment under your {remote-}control.
>
> In this specific instance, I find the disparity of results between per-hop
> ICMP echo and traceroute time-exceeded processing interesting, all the way
> up to the final hop.
>
> I wouldn't care if the application protocols rode well, but they don't
> seem to.

Have you fired up ethereal/wireshark at either end and sniffed the packet flow
to see exactly what's going on under these circumstances? Is there a difference
between IPSEC and normal TCP traffic? What's handling your IPSEC at either
end? etc, etc.
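
Even plain tcpdump at each end is enough for a first pass. A sketch only; the interface name is a placeholder and the peer is just the CPE address from the earlier trace:

  tcpdump -ni eth0 -w us-side.pcap 'host 89.1.148.230 and (ip proto 50 or udp port 500 or udp port 4500 or tcp)'
  # ip proto 50 = ESP, udp 500/4500 = IKE and NAT-T; compare those flows against the clear TCP sessions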

I've got plenty of graphs available which show modern Cisco equipment holding
-horrible- ping variance compared to forwarding variance. Eg - Cat 4500 acting
as LAN router and switch having ping RTT between <1ms and 15ms, but forwarding
ping RTT (ie, to a PC at the other end doing 100% bugger all) is flat sub-1ms.
(Makes for some -very- interesting VoIP statistics if you're not careful.)

I say "You need more information before jumping to conclusions" and "the
information you have, whilst probably quite valid when correlated with other
information, isn't going to be very helpful by itself."

Adrian

> i guess what i'm saying is that you can't use the backscatter of either:
> - a ping of each hop, or
> - eliciting a response from each hop (as traceroute does)
> as the basis for determining much.
>
> you can perhaps derive SOME meaning from it, but that meaning rapidly
> diminishes when there are multiple intermediate networks involved, some of
> which you have no direct connectivity to verify problems with easily, likely
> different return path for traffic (asymmetric routing) etc.

> When the cards consistently fall in certain patterns, you can actually
> read them quite easily.
>
> The standard control plane arguments don't apply when the pattern holds
> all the way through to equipment under your {remote-}control.

they most certainly do. let's use an example network of:

        F
        |
A---B---C---D---E
        |
        G

you are looking at ICMP/traceroute responses through sending traffic to/from A
& E.

for all you know, there may be an ICMP DDoS attack going on from F-C or from
G-C. the router 'C' is perfectly entitled to rate-limit the # of icmp
responses it sends per second, and due to said traffic from F & G may be doing
so.

this would render your reading of the tea leaves of what A and E are seeing of
C meaningless.

this diagram is incredibly simplistic. for the "greater internet", we could
add perhaps 50x connections at each of B, C & D, not to mention the path you
posted showed upwards of a dozen hops - so more realistically there could be 4
or 5 orders of magnitude more devices causing traffic in the path.

> In this specific instance, I find the disparity of results between per-hop
> ICMP echo and traceroute time-exceeded processing interesting, all the way
> up to the final hop.
>
> I wouldn't care if the application protocols rode well, but they don't
> seem to.

while you can paint a partial picture from elicited icmp responses, it
certainly doesn't give you the full canvas.

you've only tested traffic from A to E. what about A to F where those are
ENDPOINTS and not ROUTERS? e.g. try a long-lived HTTP/FTP stream & see what
you get.
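
something like this is enough to see whether bulk TCP ramps up or stalls; the URL is purely a placeholder, point it at any large file near the far end:

  curl -o /dev/null -w 'average: %{speed_download} bytes/sec\n' http://example.com/some-large-file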

cheers,

lincoln.

I agree with Dale. The problem is probably with e2e TCP
performance.

Maybe there is a misconfigured firewall which blocks SYN
or ACK packets. Or packets larger than 128 bytes are being dropped.

As you can see in your own data, ping and traceroute show
different response times.

Maybe you could try a layer-4 traceroute, and try a packet
size bigger than 1000 bytes. It will show you where the
problem may exist.
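
For example (option spellings vary between traceroute flavours; the 1400-byte size and port 22 are only samples, use a port the far end actually listens on):

  traceroute 89.1.148.230 1400              # UDP probes, 1400-byte packets
  tcptraceroute -q 3 89.1.148.230 22 1400   # TCP probes, same size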

Joe

ICMP and traceroute probes usually use small packets.

Um.. sorry if you mean more than you said, but where did you cut down the TCP MTU? If you did it on your routers, then you are creating or at least compounding the problem.

The only way to make smaller MTUs work is to alter the MTU on both the origin and destination systems. Altering the MTU anywhere along the path only breaks things.
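
On the end hosts themselves that looks something like this (Linux shown; the interface name and 1400 are placeholders):

  ip link set dev eth0 mtu 1400
  # or on older systems: ifconfig eth0 mtu 1400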

Jo Rhett wrote:

> Of course, and that's why I have cut down ip mtu and tcp adjust-mss and all the rest.
> Not making much of a difference.

> Um.. sorry if you mean more than you said, but where did you cut down the TCP MTU? If you did it on your routers, then you are creating or at least compounding the problem.

On the CPE dialer interface.

On the ezvpn dvti virtual-template

> The only way to make smaller MTUs work is to alter the MTU on both the origin and destination systems. Altering the MTU anywhere along the path only breaks things.

Lower than 1500 mtu always requires some kind of hack in real life.

That would be the adjust-mss, which is the hack-of-choice.
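
(For the archive: on the 871 that's 'ip tcp adjust-mss' on the dialer/tunnel interface; the equivalent hack on a Linux router, shown here purely as an illustration, is the iptables TCPMSS clamp.)

  iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
  # or pin it explicitly (1360 is just an example value for an ipsec-over-pppoe path):
  iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360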

Joe Maimon wrote:

> Jo Rhett wrote:
>
> Of course, and that's why I have cut down ip mtu and tcp adjust-mss and all the rest.
> Not making much of a difference.
>
> Um.. sorry if you mean more than you said, but where did you cut down the TCP MTU? If you did it on your routers, then you are creating or at least compounding the problem.
>
> On the CPE dialer interface.
>
> On the ezvpn dvti virtual-template
>
> The only way to make smaller MTUs work is to alter the MTU on both the origin and destination systems. Altering the MTU anywhere along the path only breaks things.
>
> Lower than 1500 mtu always requires some kind of hack in real life.
>
> That would be the adjust-mss, which is the hack-of-choice.

I remember from my early DSL days, it was recommended to configure
mtu=1480 on all interfaces connected to the internet or to the NAT-router.

I remember that at least the Grandstream ATA and the DSL NAT-router were
brain-dead enough (lobotomized ICMP) simply to break connections when packets
exceeded 1480 bytes.

Practically all German internet users are on DSL lines. Some smaller hosts
with ftp or http servers are on DSL or tunnels, maybe with an even smaller MTU.

So mtu < 1500 is practically the norm.

Kind regards
Peter and Karin Dambier

Yes, I remember that too. Back when I was a consultant I came out to a lot of sites and undid that change, because it just breaks things if you do that on your router and not on your hosts.

And note, hosts on *both* sides of every connection.

Which in short means: doesn't work in Real Life.

> Lower than 1500 mtu always requires some kind of hack in real life.
>
> That would be the adjust-mss, which is the hack-of-choice.

note that using 'adjust-mss' only adjusts the MSS for TCP.
it won't do much good for already-encapsulated IPSec traffic (ESP, IP protocol 50)
or for traffic tunneled over UDP...

cheers,

lincoln.

After all the discussion, the difference at the last hop of the trace
(from the original email)

15 89.1.148.230.dynamic.barak-online.net (89.1.148.230) 251.923 ms 256.817 ms *
And the ping result

64 bytes from 89.1.148.230: icmp_seq=6 ttl=240 time=190 ms

is still quite interesting. I assume the last hop is the Cisco 871
(IP=89.1.148.230).
It would be good to know what causes the difference if you have full
control of the 871.
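
One cheap cross-check, assuming a traceroute flavour that supports ICMP probes (-I on modern Linux traceroute): send the probes as ICMP echo, so the 871 answers the final hop with an echo reply instead of generating a port-unreachable. If the last-hop RTT then lines up with plain ping, the gap is just control-plane handling of the ICMP the 871 has to generate:

  traceroute -I 89.1.148.230   # ICMP echo probes (typically needs root)
  traceroute    89.1.148.230   # default UDP probes; the final hop replies with port-unreachable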

Min

Lincoln Dale wrote:

> Lower than 1500 mtu always requires some kind of hack in real life.
>
> That would be the adjust-mss, which is the hack-of-choice.
>
> note that using 'adjust-mss' only adjusts the MSS for TCP.
> it won't do much good for already-encapsulated IPSec traffic (ESP, IP protocol 50)
> or for traffic tunneled over UDP...

Which is why it's configured on the ipsec tunnel. And if there isn't one, on the ingress interface. Which brings forth the observation that adjust-mss should rather be usable in route-map/PBR style.

I know we had that whole discussion right here, back when I was younger and dumber, such as here:

Anyways, initial reports are that, as per my advice, the customer called the vendor and said "voip not working"; the vendor said "i changed something, won't tell you what, reboot everything in 30", and now things seem to work perfectly, strangely enough EVEN the traceroutes.

This is obviously not best effort. Best guess would be "managed bandwidth" differentiated by ip ranges and that the "change" was a different pool assignment.

I suspect the stellar icmp echo performance is also intentional.

Compare:

tcptraceroute lsvomonline.dnsalias.com -q 5 -w 1 80 -f 7
Selected device eth0, address 192.168.0.3, port 33204 for outgoing packets
Tracing the path to lsvomonline.dnsalias.com (82.166.56.247) on TCP port 80 (www), 30 hops max
  7 kar2-so-7-0-0.newyork.savvis.net (204.70.150.253) 45.008 ms 52.978 ms 32.404 ms 50.676 ms 33.657 ms
  8 dcr3-ge-0-2-1.newyork.savvis.net (204.70.193.98) 49.037 ms 33.145 ms 48.029 ms 34.355 ms 48.453 ms
  9 208.173.129.14 32.841 ms 32.669 ms 33.274 ms 31.861 ms 32.570 ms
10 barak-01814-nyk-b2.c.telia.net (213.248.83.2) 37.181 ms 32.600 ms 33.442 ms 32.696 ms 32.882 ms
11 po1-3.bk3-bb.013bk.net (212.150.232.214) 177.165 ms 175.852 ms 178.104 ms 179.217 ms 175.214 ms
12 gi2-1.bk6-gw.013bk.net (212.150.234.94) 180.923 ms 182.761 ms 179.170 ms 203.878 ms 178.905 ms
13 gi8-1.bk6-acc3.013bk.net (212.29.206.41) 174.266 ms 177.854 ms 177.198 ms 177.439 ms 176.400 ms
14 bk6-lns-3.013bk.net (212.29.206.55) 181.717 ms 176.460 ms 228.843 ms 174.942 ms 176.706 ms
15 82-166-56-247.barak-online.net (82.166.56.247) [open] 190.395 ms 188.043 ms 189.961 ms 200.064 ms 192.943 ms

Joe Maimon wrote:

> This is obviously not best effort. Best guess would be "managed bandwidth" differentiated by ip ranges and that the "change" was a different pool assignment.
>
> I suspect the stellar icmp echo performance is also intentional.

Or it could just be some QOS policing/shaping.

How asymmetric is the link? I've noticed quite dramatic differences when
configuring even basic policy maps with WRED on DSL TX-side on CPE (ie,
the small sized pipe upstream from client to ISP.)

I can't (normally) control what the ISP is sending to me*, but I can
try to make the best of the situation. And it can allow pipes to be
almost fully utilised without massive performance drop-offs at the
top end.
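
(On a Linux CPE the rough equivalent is a simple shaper on the upstream interface; nothing like a full WRED policy map, just the same idea of keeping the queue on your side of the thin pipe. Interface and rate below are placeholders:)

  tc qdisc add dev ppp0 root tbf rate 700kbit burst 10kb latency 50ms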

Adrian

* except in instances where I also run the ISP network..

Adrian Chadd wrote: