Hurricane Electric packet loss

Hi,

We’ve been customers of Hurricane Electric for a number of years now and have always been happy with their service.

In recent months, packet loss on some of their major routes has become a very common occurrence (every few days). Without knowledge of their network I am unsure what the cause is, but we’ve seen it on the Tokyo - US routes as well as the London - US routes. It reminds me of the Cogent expansion, which was achieved through unsustainable oversubscription and eventually resulted in unusable service for a number of years. Having seen some of the rates that HE has been selling for, I can’t help but wonder if they have made the same mistake ...

Here is an example of what’s going on again atm.
HOST: prolocation01.ring.nlnog.ne  Loss% Snt   Last    Avg   Best   Wrst StDev
  1.|-- 2a00:d00:ff:136::253        0.0%  11    0.3    0.3    0.3    0.4   0.0
  2.|-- 2a00:d00:1:12::1            0.0%  10    0.7    0.8    0.7    1.1   0.1
  3.|-- hurricane-electric.nikhef   0.0%  10    0.7    3.1    0.7    8.3   2.9
  4.|-- 100ge9-1.core1.lon2.he.ne   0.0%  10    9.8   12.6    8.0   19.2   4.1
  5.|-- 100ge1-1.core1.nyc4.he.ne  10.0%  10   74.7   74.6   73.7   80.8   2.3
  6.|-- 10ge10-3.core1.lax1.he.ne  30.0%  10  133.4  138.0  133.4  145.1   4.8
  7.|-- 10ge1-3.core1.lax2.he.net  20.0%  10  135.7  139.1  133.4  145.1   4.5
  8.|-- 2001:504:13::3b            40.0%  10  143.2  143.1  142.1  144.4   0.8
  9.|-- 2402:7800:100:1::55        50.0%  10  144.4  144.1  143.8  144.4   0.2
 10.|-- 2402:7800:0:1::f6          60.0%  10  298.7  298.4  298.2  298.7   0.2
 11.|-- ge-0-1-4.cor02.syd03.nsw.  10.0%  10  299.3  298.9  298.3  299.5   0.5
 12.|-- 2402:7800:0:2::18a         20.0%  10  299.7  299.4  298.9  300.1   0.4
 13.|-- 2001:dcd:12::10            30.0%  10  299.8  299.5  298.8  300.0   0.5

Is anybody else observing this as well?

Cheers,
Wolfgang

Why do you think this indicates a problem? Are you seeing *end to end*
packet loss?

And have you read this Nanog presentation?

https://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf

Steinar Haug, Nethelp consulting, sthaug@nethelp.no
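
For anyone reading along, the point behind the end-to-end question can be put as a rough rule of thumb: loss shown at an intermediate hop only matters if it carries through to every later hop, because routers often rate-limit or deprioritize the ICMP replies they generate themselves while still forwarding transit traffic normally. Below is a minimal Python sketch of that rule applied to `mtr --report` style text. The function names (`parse_hops`, `summarize`) and the result keys are made up for illustration and are not part of mtr or of the presentation linked above. With only 10 probes per hop, each lost packet is worth 10 points, so treat the output as a hint rather than a diagnosis.

```python
import re

# Matches one hop line of an `mtr --report` style listing, for example:
#   "  5.|-- 100ge1-1.core1.nyc4.he.ne 10.0% 10 74.7 74.6 73.7 80.8 2.3"
# Captures hop number, host, loss percentage and packets sent.
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%\s+(\d+)")


def parse_hops(report):
    """Return a list of (hop_number, host, loss_percent) tuples."""
    hops = []
    for line in report.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hops


def summarize(report):
    """Separate loss that carries through to later hops from loss that does not.

    Illustrative heuristic only (hypothetical helper, not an mtr feature):
    the loss "carried" at a hop is the smallest loss seen at that hop or any
    later hop. Loss above that level vanished further down the path, so it
    was most likely caused by how that router answers probes.
    """
    hops = parse_hops(report)
    if not hops:
        raise ValueError("no hop lines found in report")

    # Walk backwards to compute the running minimum from the destination.
    carried = []
    floor = float("inf")
    for _, _, loss in reversed(hops):
        floor = min(floor, loss)
        carried.append(floor)
    carried.reverse()

    # Hops that report more loss than any later hop confirms.
    artifact_hops = [n for (n, _, loss), c in zip(hops, carried) if loss > c]
    # First hop from which some loss persists all the way to the end.
    first_persistent = next(
        (n for (n, _, _), c in zip(hops, carried) if c > 0), None
    )
    return {
        "end_to_end_loss": hops[-1][2],
        "first_persistent_hop": first_persistent,
        "artifact_hops": artifact_hops,
    }
```

Fed the hop lines from the trace at the top of this thread, `summarize` reports 30% loss at the final hop, hop 5 as the first hop whose loss carries through, and hops 6 to 10 as showing more loss than later hops confirm. That is consistent with real loss starting somewhere around the transatlantic segment, but it says nothing about the cause, and a longer sample would be needed to trust the percentages.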

Hi,

Yes - I am not posting because of a bad-looking traceroute. :wink: We are continuously monitoring our systems from locations around the globe and we actually see packet loss across services - HTTP, etc.

I have had a bunch of off-list replies that indicate that others are seeing the same issues. So we are not alone with this.

Cheers,
Wolfgang

And what did HE say when you asked them?

Mehmet

Hi,

Two different things. Once they were affected by the cross-Atlantic cable fault.

All the other times the problem was acknowledged and then disappeared. No further detail given. :frowning:

Cheers,
Wolfgang

I have always found the HE.net guys very responsive and open. I am sure
they're reading this and will ping you directly.

Hello,

Did you notice the average RTT from the 5th hop onward? I'm not sure you can trust the values given the high packet loss, but it may indicate an additional problem (or MAY confirm your assumption about congestion).

Forget about it. The delays are correct, my mistake. I should have looked more carefully.

I have also been a regular customer of HE, and have always been satisfied with their service, especially regarding the IPv6 transit.

Wolfgang-
   Our NOC is always ready and willing to help you with any problems you
may have. I do not see any tickets opened recently for you. We have more
than enough capacity transatlantically and transpacifically so I do not
know what problem you may be seeing. The next time you see it please open a
ticket with our NOC, and we will be happy to try to debug it. Posting to
Nanog without first opening a ticket isn't a great method of debugging.

Reid Fishler

Hi Reid,

How recent is recent? The last one we opened was in June and there are several others dating back over the past 6 months. I will send you the references off-list. Without getting into a fight about this - having tried to debug this with you guys several times, there are better things I can have my network team do than run into walls.

I brought this onto NANOG to gauge if we are the only ones that have such issues. So far 5+ replies off-list suggest not.

Cheers,
Wolfgang

Hey Wolfgang,

I believe I may be seeing similar behavior but it's hard for me to
confirm. My network configuration is one that mtr doesn't support, so
I can't get a report when we're having issues. I don't have my transit
provided directly from HE, but rather through a provider who colocates
out of one of their facilities. So I'm not sure I could even directly
reach out to the Hurricane Electric NOC to get help.

We've been seeing the odd connectivity issues between HE FMT2 (Linode)
and AWS US-WEST-1 and US-WEST-2. It's a mixed combination of loss and
increased latency, both of which cause some hiccups in some of our
WAN-based clusters. There have been times where the issues we've seen
have been attributed to a DoS attack directed toward a Linode
customer, but there have been quite a few networking events that seem
to have no relation to a known attack.

Thanks for reaching out to NANOG with this issue, it may have shed
some light on some of the issues we are seeing.

Cheers!
-Tim

On our HE uplink, I'm seeing no packet loss until your hop #9;
at that point I see a lot.

HOST: ********                     Loss% Snt   Last    Avg   Best   Wrst StDev
  1.|-- *******                     0.0%  10    0.2    0.2    0.2    0.3   0.0
  2.|-- *******                     0.0%  10    0.3    0.3    0.3    0.3   0.0
  3.|-- *******                     0.0%  10    4.4    4.5    4.3    4.6   0.0
  4.|-- gigabitethernet3-5.core1.   0.0%  10   15.6   15.1   14.1   17.2   1.0
  5.|-- 10ge5-7.core1.mci3.he.net   0.0%  10   26.2   26.7   26.2   30.1   1.0
  6.|-- 10ge5-1.core1.den1.he.net   0.0%  10   39.7   39.7   39.5   40.2   0.0
  7.|-- 10ge14-5.core1.lax2.he.ne   0.0%  10   66.8   65.9   63.5   68.4   1.6
  8.|-- 2001:504:13::3b             0.0%  10   73.0   73.0   72.8   73.1   0.0
  9.|-- 2402:7800:100:1::55        80.0%  10   71.8   71.8   71.8   71.9   0.0
 10.|-- ten-0-5-0-0.cor01.syd04.n   0.0%  10  228.0  228.1  227.9  228.3   0.0
 11.|-- ge-0-1-4.cor02.syd03.nsw.   0.0%  10  228.4  228.4  228.3  228.6   0.0
 12.|-- 2402:7800:0:2::18a         10.0%  10  228.2  228.1  228.0  228.3   0.0
 13.|-- 2001:dcd:12::10            10.0%  10  229.2  229.3  229.2  229.5   0.0

We haven't had time to diagnose with them, but we ended up having to shut down our BGP sessions with HE last night due to horribly slow speeds out of TOROONNX.

There's something going on, just don't know what it is yet.