Google peering pains in Dallas

So it has been three weeks of major ICMP packet loss to any Google service over the Dallas Equinix IX. It is not affecting service performance, but it is hitting us with customer complaints and service calls, because some software uses ICMP for monitoring and some people use it for benchmark testing. I have been told by them that they now know the cause: a large ISP on the IX is causing the issue (hmm, wonder who that is…). So why do they not shut down the peering with that ISP and force them to fix the issue? This issue is affecting everyone on the IX, not just us. Very, very frustrating. Hopefully this will reach someone over there who can do something about it…


I suppose it's time for a more public:
  "Hey, when you want to test a service, please take the time to test
that service on it's service port/protocol"

Testing "is the internet up?"
by pinging a DNS server is ... not great ;(
I get that telling 'joe/jane random user' this is hard/painful/ugh...
:frowning: (haha, also look at Cisco Meraki devices!! "can't ping Google DNS,
internet is down")

Sorry :frowning:

-chris

Just as an anecdote: once upon a time I had a television that began
reporting it couldn't work anymore, because the Internet was down.

After resorting to packet tracing, I discovered that it was pinging
(IIRC) speedtest.napster.com to decide. Napster had gone belly-up.

Fortunately, it had a two-year warranty, so I took it back to Best Buy
with about a month to go.

Now think about the hundreds of thousands of customers who didn't
know how to diagnose the issue, or whose warranty had expired, and
had to buy a new smart TV.

Tried to get the FTC interested; no joy. Congress made noises
about passing a law requiring software updates (especially for
security issues), but still nothing on that either.

Besides, what are we going to do after Google goes belly-up? :wink:

This is not practical or reasonable. Companies may not exist anymore,
and the wider market may not want to pay the premium that proper
software requires. What might be more reasonable is regulation where
you either continue providing the software (local and cloud) needed to
operate the devices you sell, or you provide all the source code for
them, entirely your choice. Regulation would put the software in
escrow, and should the company stop fulfilling its obligation, the
escrow would be opened to the public domain.

This would create a new industry in which some companies would
specialise in continuing to develop software for long-dead companies,
as well as, of course, open-source versions.

Issues with the IXP ecosystem aren’t new in the US; this is why some providers don’t appear at them. The original case of one member being able to hurt everyone was the GIGAswitch head-of-line-blocking (HOLB) issue, which was triggered by congested ports.

(Waits for others to crawl out of the woodwork who were more involved in this :slight_smile:

This is why the majority of traffic volume for interconnection has generally been over private peering links (paid, SFI, otherwise).

If you tried to force it through an IXP ecosystem, the tens of Tbps wouldn’t fit, even per city. Things like CDNs, Netflix OpenConnect, and the rest have shifted demand off the interconnection points as much as feasible. Sometimes an organization can’t handle it or tries to cling to its old ways. Sometimes it takes organizational change or people change to improve the situation.

I know it can sound like a broken record, but upgrading to match capacity demands really can make a difference in offloading paths. It may also expose other weak points. My personal goal is to stop thinking about things in the 95/5 model and more in a peak model. 95/5 only gets you so far; the peaks are really where networks shine or show their age.
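To make the 95/5-vs-peak distinction concrete, here is a toy comparison (the sample values are invented) using the nearest-rank 95th percentile common in transit billing:

```python
import math

def p95(samples):
    """95th percentile via the nearest-rank method, as used in 95/5 billing."""
    s = sorted(samples)
    rank = math.ceil(0.95 * len(s))  # 1-based rank; the top 5% of samples are ignored
    return s[rank - 1]

# Twenty 5-minute rate averages in Mbps (invented), with one short spike.
mbps = [400, 410, 405, 420, 415, 400, 410, 425, 430, 405,
        415, 420, 400, 410, 405, 415, 420, 410, 405, 950]

print(p95(mbps))  # 430 -- the billing view never sees the spike
print(max(mbps))  # 950 -- the peak the network actually had to carry
```

The 95/5 number suggests the port is barely half full, while the peak is what determines whether the link congests.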

I understand it’s not always possible to upgrade links, or sometimes one party holds out on the other. That’s certainly not the case at $dayjob, and I try to ensure the process works as well as it can here.

Sometimes it’s best to just de-peer a network. You may find it works out better for all involved.

At $nightJob I want to peer as much traffic off as possible, but if the network paths aren’t there or low-speed it may not make sense.

Evaluate your peers periodically to ensure you are getting what you expect.

- jared

IXPs have always been a mid-market phenomenon. They don't deal with the high-volume data flows because it never made financial sense to do so. At the lower end, they have a cut-off point which broadly aligns with smaller wholesale requirements. For the bits in between, they can provide good value.

Nick

Why isn’t there a well-known anycast ping address similar to CloudFlare/Google/Level 3 DNS, or sorta like the NTP project?
Get someone to carve out some well-known IP and allow every ISP on the planet to add that IP to a router or BSD box somewhere on their network? Allow product manufacturers to test connectivity by sending pings to it. It would survive IoT manufacturers going out of business.
Maybe even a second well-known IP that is just a very small webserver that responds with {‘status’: ‘ok’} for testing if there’s HTTP/HTTPS connectivity.
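A sketch of what that second well-known endpoint could look like, using nothing but Python's standard-library http.server (the bind address and port are placeholders, since no such well-known IP exists today):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Answer every GET with a tiny JSON body, for HTTP(S) reachability tests."""

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # probes would be constant; keep the endpoint quiet

# On the hypothetical well-known address, an operator would bind and run:
# HTTPServer(("", 80), HealthHandler).serve_forever()
```

Anycasting such a trivially stateless responder is exactly what makes the "every ISP hosts one" model plausible.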

-A

Why isn't there a well-known anycast ping address similar to CloudFlare/Google/Level 3 DNS, or sorta like the NTP project?
Get someone to carve out some well-known IP and allow every ISP on the planet to add that IP to a router or BSD box somewhere on their network? Allow product manufacturers to test connectivity by sending pings to it. It would survive IoT manufacturers going out of business.
Maybe even a second well-known IP that is just a very small webserver that responds with {'status': 'ok'} for testing if there's HTTP/HTTPS connectivity.

It sounds like, to me anyway, you'd like to copy/paste/sed the AS112
project's goals, no?

I just use this page:

http://hasthelargehadroncolliderdestroyedtheworldyet.com/

- jared

Maybe run a "ping prisoner.iana.org" on ATLAS and see how universally it responds? It's possible some of the operators block ICMP (I don't).

Or at least expand on it, to define specific IPs within
192.175.48.0/24

and
2620:4f:8000::/48

as ICMP/ICMPv6 probe destinations

If every manufacturer knew that, say 2620:4f:8000::58
was going to respond to ICMPv6 ping requests (::58 chosen
purely because it matches the IPV6-ICMP protocol number),
it would surely make it easier for them to do “aliveness”
probing without worries that a single company might go out
of business shortly after releasing their product.
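A sketch of such an aliveness probe, shelling out to the system ping utility (iputils/Linux flags assumed; since 2620:4f:8000::58 is only a proposal in this thread, the example targets prisoner.iana.org, which answers today):

```python
import subprocess

def icmp_alive(target: str, count: int = 1, timeout_s: int = 2) -> bool:
    """Return True if the target answers an ICMP echo request.

    Uses the system ping binary with iputils-style flags (-c count,
    -W per-reply timeout in seconds); other platforms differ.
    """
    try:
        proc = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout_s), target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # no ping binary available
    return proc.returncode == 0

print(icmp_alive("prisoner.iana.org"))
```

A device vendor would bake the well-known probe address in rather than a hostname, which is the whole point of the proposal.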

Certainly worthy of proposing to the AS112 operators,
I would think. :slight_smile:

Matt

* jared@puck.nether.net (Jared Mauch) [Thu 30 Apr 2020, 20:10 CEST]:

(Waits for others to crawl out of the woodwork who were more involved in this :slight_smile:

Half duplex 10baseT ports, man. The collision LEDs never calmed down.

  -- Niels.