Elephant in the room - Akamai

Kaiser_Erich · December 5, 2019, 3:03am

Let's talk Akamai

They have shifted 90% of their traffic off IXs and onto our full-route DIA. Is anyone else seeing this issue, or does anyone have insight into what is going on over there? We have been asking for help with a resolution for weeks, and all we got was "we are working on it"; now we get no response. We were even sent an LOA, and when the DC went to put in the x-connect, their patch panel was full. How do they not know whether they have ports open? I have even reached out to an engineer who is on this list, and he does not even respond.

The last two nights the traffic levels to them have skyrocketed as well.

Any insight?

craig_washington · December 5, 2019, 5:35am

I don’t have any insight but can confirm I am seeing the same thing. (Traffic shift back onto transit links)

They did tell me they were having some bandwidth issues and are working on it.
I am currently awaiting a direct PNI with them but haven’t heard from them in some time.

Matthew_Petach2 · December 5, 2019, 7:48am

> Let's talk Akamai
>
> […]
>
> The last two nights the traffic levels to them have skyrocketed as well.
>
> Any insight?
>
> Erich Kaiser
> The Fusion Network

Since Akamai is a CDN, I would usually expect traffic from Akamai to be the larger direction.

If you’re seeing your traffic to them skyrocketing, are you sure you aren’t carrying DDoS attack traffic at them?

CDNs aren’t known for being large traffic sinks. ^_^;;

Matt

Bryan_Holloway1 · December 5, 2019, 9:15am

I think he meant inbound (from). We also saw the same thing.

Jared_Mauch · December 5, 2019, 10:36am

Good morning!

If you are having Akamai issues you can reach out to me and I will help you out.

Jared

Kaiser_Erich · December 5, 2019, 2:28pm

Yes, inbound. The patterns are not typical; we are talking gigs of traffic moved off of the IX side and onto our DIA side. They have reached out to me already. Will see what happens. Will post follow-up.

Aaron · December 5, 2019, 2:39pm

I see my Akamai AANP cache utilization at all-time highs the last 2 nights as well. Curious what it is.

Jared, you can reply to me off-list if you wish, or on-list if it would benefit the community.

Thanks,

Aaron

Tarko_Tikan · December 5, 2019, 2:54pm

hey,

> I see my Akamai aanp cache utilization at all-time highs the last 2 nights as well. Curious what it is.

Halo Reach release.

Clayton1 · December 5, 2019, 3:05pm

Our AANP cache seems to have done the same in the past 2 nights. Lots of traffic that has never been there before. It has however not reduced the amount of traffic we’re getting from AS20940 directly - still hit a new record level last night.

We’ve got a request in with them for a PNI. If things keep growing at this rate, we might need two!

Over the years, I’ve questioned how much the AANP boxes really did for us, as their in:out ratio seemed almost balanced. If the last two nights are an indication, then they’re worth keeping.
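
The in:out comparison above can be put into numbers. This is a minimal sketch, not an Akamai metric: the function name and the sample figures are illustrative assumptions. It treats "in" as what the cache pulls from upstream (fill) and "out" as what it serves to subscribers.

```python
def offload_fraction(fill_in: float, served_out: float) -> float:
    """Fraction of served traffic that did not have to cross an upstream link.

    fill_in and served_out must be in the same unit (e.g. Gb/s or bytes).
    A balanced in:out ratio (fill_in == served_out) gives 0.0, i.e. no saving.
    """
    if served_out <= 0:
        return 0.0
    # Clamp at 0 so a cache that fills more than it serves reports no saving.
    return max(0.0, 1.0 - fill_in / served_out)


# Hypothetical figures: an almost balanced cache versus a busy night.
print(offload_fraction(fill_in=9.0, served_out=10.0))
print(offload_fraction(fill_in=2.5, served_out=10.0))
```

A cache whose in:out is close to 1:1 saves almost nothing, while one that serves four times what it pulls in keeps three quarters of that traffic off the upstream links.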

Aaron · December 5, 2019, 8:41pm

Tarko… wow, gaming again! It's not going away… gaming traffic is growing in a big way, it seems.

Clayton… My thoughts exactly! I too have wondered how valuable these AANPs were, but lately I'm seeing good efficiency.

Thanks y’all

-Aaron

Valdis_Kletnieks · December 5, 2019, 9:44pm

And it's only going to get worse. Sony has already announced that the
PlayStation 5 will have a (probably) 1-2 terabyte SSD. And even with that, the
game packaging is set up to support only downloading the single-player or
multi-player portions of a game because images are going to be pushing 100
gigabytes RSN (some are already well over 40gig).

So even with the download restructuring, we're probably going to be seeing a
lot of people downloading lots of gigabytes on Day 1 (or a few days before, for
games that support it), and re-downloading smaller (but still large) amounts
when they want to re-play the game...

Michael_Thomas · December 5, 2019, 10:18pm

I suspect that it's going to be even worse on the home side. A while ago a friend was here and unbeknownst to me, he was downloading a big game. The rest of the home network was rendered unusable, and it took me over an hour to figure out what was going on. I knew what to look for -- and even then, given the awful tools that routers support, it was hard -- but just about anybody else would have been on the phone to their provider saying that "INTERTOOBS ARE SLOW!".

My suspicion is that the root problem was buffer bloat -- I flashed a new router with openwrt and was a little dismayed that the bufferbloat code is a plugin you have to enable. The buffer bloat got a lot better after that, but I forgot to retest the downloading after so I'm not 100% positive. But if it was the problem, we're probably in for a world of hurt as I doubt that many home routers implement it.

Mike

Chris_Adams2 · December 6, 2019, 12:19am

Once upon a time, Valdis Klētnieks <valdis.kletnieks@vt.edu> said:

> And it's only going to get worse. Sony has already announced that the
> PlayStation 5 will have a (probably) 1-2 terabyte SSD. And even with that, the
> game packaging is set up to support only downloading the single-player or
> multi-player portions of a game because images are going to be pushing 100
> gigabytes RSN (some are already well over 40gig).

Xbox One X games are already there... I'm a pretty casual gamer, and I
have multiple games over 90GB (one is 117GB).
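
To put those image sizes in perspective, here is a back-of-the-envelope sketch of how long one such download keeps a subscriber line busy. The sizes (40, 100 and 117 GB) come from the posts above; the line rates are assumptions chosen for illustration, and protocol overhead is ignored.

```python
def download_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb (decimal gigabytes) at link_mbps.

    Ignores protocol overhead and assumes the line is fully used.
    """
    bits = size_gb * 8e9
    return bits / (link_mbps * 1e6) / 3600


# Sizes mentioned in the thread; line rates are illustrative assumptions.
for size_gb in (40, 100, 117):
    for rate_mbps in (25, 100, 1000):
        hours = download_hours(size_gb, rate_mbps)
        print(f"{size_gb:>4} GB at {rate_mbps:>4} Mb/s: {hours:6.2f} h")
```

At 100 Mb/s a 100 GB image takes a little over two hours of sustained full-rate traffic, which is why a release night shows up so clearly on the graphs.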

Valdis_Kletnieks · December 6, 2019, 2:02am

Friends don't let friends run factory firmware.

Hopefully sometime soon the SQM stuff will be added to the default openwrt
configs for most of the supported routers, if it hasn't been already. It's been
in my config since before the Luci support for SQM got created....

The big problem is that a lot of eyeball networks have a lot of CPE boxes that
were created before the bufferbloat work was done, and often have no real
motivation to push software updates to the CPE (if they even have the ability),
and a lot of customers have routers that they bought at Best Buy or Walmart
that will *never* get a software update.

(I also admit having no idea what percentage of the intermediate routers in the
ISPs' networks have gotten de-bloating code.)

Stephen_Satchell · December 6, 2019, 5:02am

For SP-grade routers, there isn't "code" that needs to be added to combat buffer bloat. All an admin has to do is cut back on the number of packet buffers on each interface -- an interface setting, you see.

The reason that consumer-grade devices can contribute to buffer bloat is that the vendor doesn't expose a knob to adjust buffering. At least in most instances with Best Buy and Office Depot routers.
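
The relationship between buffer depth and added latency described above can be sketched with simple arithmetic. This is an illustration under stated assumptions (a plain FIFO buffer counted in full-size 1500-byte packets, drained at the link rate); the function names and figures are hypothetical and not taken from any vendor's configuration.

```python
def queue_delay_ms(buffer_packets: int, link_mbps: float,
                   packet_bytes: int = 1500) -> float:
    """Worst-case delay added by a full FIFO buffer, in milliseconds."""
    buffered_bits = buffer_packets * packet_bytes * 8
    return buffered_bits / (link_mbps * 1e6) * 1000


def packets_for_delay(target_ms: float, link_mbps: float,
                      packet_bytes: int = 1500) -> int:
    """Largest buffer (in packets) that stays within target_ms when full."""
    packets = target_ms / 1000 * link_mbps * 1e6 / (packet_bytes * 8)
    return max(1, int(packets))


# Assumed example: a 1000-packet buffer in front of a 10 Mb/s uplink.
print(queue_delay_ms(1000, 10))
# Buffer depth that keeps the same uplink under roughly 20 ms of queueing.
print(packets_for_delay(20, 10))
```

Under these assumptions a 1000-packet buffer on a 10 Mb/s link can hold 1.2 seconds of traffic, while about 16 packets is enough to stay near 20 ms. The same buffer is harmless on a much faster link, which is why a fixed default depth suits some interfaces and not others.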

Fawcett_Nick · December 6, 2019, 2:46pm

We had three onsite Akamai caches a few months ago. They called us up and said they are removing that service and sent us boxes to pack up the hardware and ship back. We’ve had quite the increase in DIA traffic as a result of it.

~Nick

Chris_Adams2 · December 6, 2019, 2:59pm

Once upon a time, Fawcett, Nick <nfawcett@corp.mtco.com> said:

> We had three onsite Akamai caches a few months ago. They called us up and said they are removing that service and sent us boxes to pack up the hardware and ship back. We’ve had quite the increase in DIA traffic as a result of it.

Same here. We'd had Akamai servers for many years, replaced as needed
(including one failed server replaced right before they turned them
off). Now about 50% of our Akamai traffic comes across transit links,
not peering. This seems like it would be rather inefficient for them
too...

Jared_Mauch · December 6, 2019, 4:13pm

There’s an element of scale when it comes to certain content that makes it not viable: if the majority of traffic is VOD with variable bitrates, it requires a lot more capital.

Things like downloads of software updates (e.g. Patch Tuesday) lend themselves to different optimizations. The hardware has a cost, as does the bandwidth.

I’ll say that most places that have a few servers may only see a minor improvement in their in:out. If you’re not peering with us, or you are and still see significant traffic via transit, please do reach out.

I’m happy to discuss in private or at any NANOG/IETF meeting people are at. We generally have someone at most of the other NOG meetings as well, including RIPE, APRICOT and even GPF etc.

I am personally always looking for better ways to serve the medium (or small) size providers better.

- Jared

Michael_Thomas · December 6, 2019, 6:33pm

So I tested this out again after I sent out my message and it does indeed seem to be just fine: it wasn't an identical test since my friend was over wifi, but that really shouldn't affect things, I'd think.

The thing I don't get is that buffer bloat is a creature of the upstream, right? I wouldn't think that the stream of acks sent from downloading the file would put much pressure on the upstream. Which makes me wonder if it's just that the old router itself was saturated and couldn't keep up. Or something.

In any case, there are probably zillions of 10 year old routers out in the world, and no matter what exactly caused this for me it will probably happen for zillions of other people too. Hope support desks are ready for the deluge.

Mike
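
The question about ACK pressure on the upstream can be estimated roughly. This sketch assumes full-size 1500-byte segments, one 40-byte ACK (IPv4 + TCP headers, no options) for every two segments, and no link-layer overhead; all of these are simplifying assumptions, and real stacks and access technologies will differ.

```python
def ack_upstream_mbps(download_mbps: float,
                      segment_bytes: int = 1500,
                      ack_bytes: int = 40,
                      segments_per_ack: int = 2) -> float:
    """Approximate upstream rate consumed by TCP ACKs for a bulk download."""
    segments_per_sec = download_mbps * 1e6 / (segment_bytes * 8)
    acks_per_sec = segments_per_sec / segments_per_ack
    return acks_per_sec * ack_bytes * 8 / 1e6


# Assumed download rates; compare the result with the upstream you have.
for down_mbps in (25, 100, 500):
    up_mbps = ack_upstream_mbps(down_mbps)
    print(f"{down_mbps:>4} Mb/s down -> about {up_mbps:.2f} Mb/s of ACKs up")
```

With these numbers the ACK stream is about 1/75 of the download rate, so a 100 Mb/s download needs roughly 1.3 Mb/s upstream. That is small on a symmetric line but can be a noticeable share of a narrow upstream on an asymmetric one.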

Keenan_Tims · December 6, 2019, 7:29pm

Speaking as a (very) small operator, we've also been seeing less and less of our Akamai traffic coming to us over peering over the last couple years. I've reached out to Akamai NOC as well as Jared directly on a few occasions and while they've been helpful and their changes usually have some short-term impact, the balance has always shifted back some weeks/months later. I've more or less resigned myself to this being how Akamai wants things, and as we so often have to as small fish, just dealing with it.

We're currently seeing about 80% of our AS20940 origin traffic coming from transit, and I'm certain there's a significant additional amount which is difficult to identify coming from on-net caches at our upstream providers (though it appears from the thread that may be reducing as well). Only about 20% is coming from peering where we have significantly more capacity and lower costs. Whatever the algorithm is doing, from my perspective it doesn't make a lot of sense and is pretty frustrating, and I'm somewhat concerned about busting commits and possibly running into congestion for the next big event that does hit us, which would not be a problem if it were delivered over peering.

Luckily we're business focussed, so we're not getting hit by these gaming events.

Keenan Tims
Stargate Connections Inc (AS19171)
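
For anyone wanting to track the same transit-versus-peering split over time, here is a minimal sketch of the calculation. It assumes you already have per-path byte counts for AS20940-origin traffic (for example from flow data); the function name, path labels, and sample numbers are hypothetical.

```python
def ingress_shares(bytes_by_path: dict[str, int]) -> dict[str, float]:
    """Percentage of traffic arriving over each ingress path."""
    total = sum(bytes_by_path.values())
    if total == 0:
        return {path: 0.0 for path in bytes_by_path}
    return {path: 100 * count / total for path, count in bytes_by_path.items()}


# Hypothetical byte counts that reproduce the 80/20 split described above.
sample = {"transit": 8_000, "peering": 2_000}
for path, pct in ingress_shares(sample).items():
    print(f"{path:>8}: {pct:5.1f}%")
```

Recording these shares per day makes it easier to show when the balance shifts back after a change, which is the pattern described in the post above.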