Elephant in the room - Akamai

Seth_Mattinen · December 7, 2019, 5:06pm

Same here, removed last month, and no more Akamai traffic over peering since.

Jared_Mauch · December 7, 2019, 7:05pm

This last part doesn’t sound right.

Can you send me details in private?

Thanks,

- Jared

swlaemmr · December 7, 2019, 7:20pm

Same – we had an Akamai cache for 15+ years. Then we were notified that it was done and were sent boxes to pack our stuff up and send it back.

Eric_Kuhnke · December 7, 2019, 8:01pm

I think this thread might be a perfect example that when an organization reaches a sufficiently large size, one part of its engineering/operations team may no longer be fully aware of what other work groups are doing. Definitely a structural challenge for ISPs that span very large geographical areas and services/roles.

Jared_Mauch · December 7, 2019, 8:10pm

We are a decent sized (public) company. You can look at the # of employees if you are curious. I am but one person who can try to influence things. I’ll say that if you have a few servers from us, it’s not going to serve the entire content set that our customers have.

I do remain open to look at your individual cases and see what can be done to improve though. The answer may be nothing, but an e-mail also costs you little. I’ve got several people I’m corresponding with and will continue to do so at least until the paychecks dry up. Given the number of questions I still field about my prior employer, it may even last beyond that point.

- Jared

Rod_Beck2 · December 7, 2019, 10:34pm

Has there been any fundamental change in their network architecture that might explain pulling these caches?

Mark_Tinka1 · December 7, 2019, 10:36pm

We've had 2 or 3 customers, in the last 3 months, complain about the
same thing - where they are seeing Akamai traffic drop over peering and
instead arrive via their transit service with us. We run a number of
Akamai AANP caches across our backbone.

We are working very closely with Akamai - and the customers - to resolve
this, I'll add.

Mark.

Mark_Delany · December 7, 2019, 11:09pm

Has there been any fundamental change in their network architecture
that might explain pulling these caches?

Maybe not network architecture, but what if the cache-to-content ratio
is dropping dramatically due to changes in consumer behavior and/or a
huge increase in the underlying content (such as adoption of higher
and multiple-resolution videos)?

There has to be a tipping point at which a proportionally small cache
becomes almost worthless from a traffic saving perspective.

If you run a cluster one presumes you can see what your in/out ratio
looks like and where the trend-line is headed.
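As a rough illustration of that in/out check, here is a minimal Python sketch. It assumes you already collect per-month byte counters for a cache cluster: bytes served to end users and bytes the cluster pulled in to fill itself. The function names, the sample numbers, and the simple least-squares trend are all hypothetical and are not anything Akamai publishes.

```python
def offload_ratio(served_bytes, filled_bytes):
    """Fraction of served bytes that did not have to be fetched upstream."""
    if served_bytes <= 0:
        return 0.0
    return max(0.0, 1.0 - filled_bytes / served_bytes)


def trend_slope(values):
    """Least-squares slope of values against their index (change per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical monthly counters as (TB served, TB filled); not real data.
samples = [(100, 10), (110, 22), (120, 36), (130, 52)]
ratios = [offload_ratio(served, filled) for served, filled in samples]
slope = trend_slope(ratios)

print("offload per month:", [round(r, 2) for r in ratios])
print("trend per month:", round(slope, 3))
```

A persistently negative slope would suggest the cluster is saving proportionally less traffic each month, which is the tipping-point concern raised above.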

Another possibility might be security. It may be that they need
additional security credentials for newer services which they are
reluctant to load into remote cache clusters they don't physically
control.

Mark.

Jared_Mauch · December 8, 2019, 12:19am

Please see my email on Friday where I outlined a few of the dynamics at play. Akamai isn’t just one thing, it’s an entire basket of products that all have their own resulting behaviors. This is why even though you may peer with us directly you may not see 100% of the traffic from that interconnection. (Take SSL for example, it’s often not served via the clusters in an ISP due to the security requirements we place on those racks, and this is something we treat very seriously!)

This is why I’m encouraging people to ping me off-list, because the dynamics at play for one provider don’t match across the board. I know we have thousands of distinct sites that each have their own attributes and composition at play.

I’ve been working hard to provide value to our AANP partners as well. I’ll try to stop responding to the list at this point but don’t hesitate to contact me here or via other means if you’re seeing something weird. I know I resolved a problem a few days ago for someone quickly as there was a misconfiguration left around. We all make mistakes and can all do better.

- jared

https://www.peeringdb.com/asn/20940

Mark_Tinka1 · December 8, 2019, 6:07am

Problems are part of the gig - otherwise we'd have no reason to get up
in the morning.

What matters is that there is someone you can find to help you fix them.
That's what makes all the difference.

So kudos to you, Jared, and the entire team out there at Akamai.

Mark.

Rod_Beck2 · December 8, 2019, 2:39pm

Taking boxes out of a network does not sound like ‘emergent behavior’ or unintended consequences. Sounds like a policy change. Perhaps they are being redeployed for better performance or perhaps shut down to lower costs. Or maybe the cost of transit for Akamai at the margin is less than the cost of peering with 50 billion peers.

Disclaimer: Not picking a fight. Better things to do.

Regards,

Roderick.

6x7_Networks_Lady_Be · December 8, 2019, 3:15pm

+100, and thanks to Jared.

-Ben

Brandon_Martin · December 8, 2019, 3:32pm

Does this mean that, if you peer with Akamai at some location, only content locally available at that location will come over that peering session with the rest coming via other means? Does Akamai not have private connectivity to their public peering points?

Jared_Mauch · December 8, 2019, 3:58pm

Not all content is suitable in all locations based on the physical security or market situation. We have some content that cannot be served; an example is locations where there are licensing requirements (e.g., ICP for China).

You will see a different mix from our 20940 vs 16625 as well. Those have different requirements on the security side. If you treat your PKI data seriously you will appreciate what is done here.

In Marquette Michigan there will be different opportunities compared with Amsterdam or Ashburn as well.

Our customers and traffic mix make it challenging to serve from a platform where you do capital planning for a several-year depreciation cycle. We have thousands of unique sites, and that scale is quite different from serving on a few distinct IXPs and transit providers.

So yes you will see a difference and there are things we can do to improve it when there is a variance in the behavior.
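For operators who want to measure that difference on their own side, one option is to aggregate flow records by origin ASN and by how the traffic entered the network. The sketch below is a minimal Python example under stated assumptions: flow data has already been reduced to `(src_asn, ingress, bytes)` tuples, and the ingress labels (`peering`, `transit`, `cache`) and byte counts are invented for illustration.

```python
from collections import defaultdict


def share_by_path(flows):
    """Return {asn: {ingress: fraction of that ASN's bytes}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for src_asn, ingress, nbytes in flows:
        totals[src_asn][ingress] += nbytes

    shares = {}
    for asn, per_path in totals.items():
        asn_total = sum(per_path.values())
        if asn_total == 0:
            shares[asn] = {}
            continue
        shares[asn] = {path: b / asn_total for path, b in per_path.items()}
    return shares


# Hypothetical flow summaries; the ingress labels are this sketch's own.
flows = [
    (20940, "peering", 600),
    (20940, "transit", 200),
    (20940, "cache", 200),
    (16625, "peering", 100),
    (16625, "transit", 300),
]
shares = share_by_path(flows)

for asn, per_path in sorted(shares.items()):
    print(asn, {path: round(frac, 2) for path, frac in per_path.items()})
```

Tracking these fractions over time gives you concrete numbers to include when you raise a case off-list.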

- Jared

Mehmet_Akcin · December 8, 2019, 4:41pm

Let’s take a minute and thank Jared for taking the time and responding.

thank you, Jared.

Brandon_Martin · December 8, 2019, 4:48pm

I guess what I'm getting at is that it sounds like, if you cannot source the content locally to the peering link, there's not likely to be an internal connection to the same site from somewhere else within the Akamai network to deliver that content and, instead, the target network should expect it to come in over the "public Internet" via some other connection. Is that accurate?

Thanks for the clarifications.

Owen_DeLong · December 8, 2019, 5:07pm

My guess (and it’s just this since I haven’t been inside Akamai for a couple of years now) is that they are culling the less effective AANPs (from Akamai’s perspective) in favor of redeploying the hardware to more effective locations and/or to eliminate the cost of supporting/refreshing said hardware.

I would guess that the traffic level required to justify the expense of maintaining an AANP (from Akamai’s perspective) probably depends on a great many factors not all of which would be obvious as viewed from the outside. I would guess that the density of AANPs and ISP interconnection in a given geography would be among the factors that would influence that number. I would also guess that the number would tend to rise over time.

Again, just external speculation on my part.

Owen

Jared_Mauch · December 8, 2019, 5:10pm

I was hired at Akamai to design the network architecture for a global backbone. This is proving to be an interesting challenge: taking a diverse set of products with various requirements and interconnecting them in a way that saves costs and improves performance while my employer’s traffic continues to grow.

Akamai is built to use the paths available to deliver traffic and meet our customers’ and our business goals. Not all our sites are interconnected and it’s extremely unlikely (read: possibly never, but who knows) you will see all your traffic come over a direct link or cache. With any sufficiently complex system, plus the acquisitions we have made over my short tenure, it’s almost impractical to integrate them all quickly or possibly at all.

I personally want to make sure that we deliver the traffic in a way that makes sense, and a few people have seen those efforts, but there are also many things in progress that are not yet complete or ready for public consumption. I believe there’s room here to improve, and each time we can turn a switch or dial a knob to better serve our customers and the end-users that we are paid to serve, everyone wins.
<work hat off>

Enterprises vs consumer ISPs have very different traffic profiles, and I think the genesis of this thread was a direct result of a very consumer-oriented traffic profile that was unexpected. People have wondered why I would spend so much time watching things like Apple rumor websites in the past; it’s because that would lead to high-traffic events. You go to where the data is. The same can be said for other large download events or OTT launches. Everyone knows a live event can be big, but it is generally bounded by the target audience size.

As software is attacked within minutes or hours after security patches are released, I don’t find it surprising these days that systems automatically download whatever they can the moment it’s released, from gaming consoles to IoT to server and OS patches.

If the traffic is causing you pain, I encourage you to reach out so we can look at what might be improved.

- Jared
(I swear I’ll stop responding... off to make lunch)

Rod_Beck2 · December 8, 2019, 5:15pm

Yep. Real estate must be one of their largest expenses and, unlike bandwidth, it is not going down in price.

Rod_Beck2 · December 8, 2019, 5:17pm

Last time I spoke with an Akamai engineer many years ago the network was purely transit. Is that evolving?