The IPv6 Travesty that is Cogent's refusal to peer with Hurricane Electric - and how to solve it

Hi everyone,

I know the long and storied history of Cogent and HE failing to peer for IPv6, and failing (on either side) to provide IPv6 transit between their two networks, has been covered on this list before, but I am rather surprised it has not garnered more attention.

Until recently, that is. I notice an increasing number of people tweeting at both HE and Cogent about the problem.

From HE’s public statements on the matter, it’s pretty clear that they would gladly peer with Cogent for IPv6 but that Cogent declines to do so. I simply cannot understand Cogent’s logic here. Cogent is the one losing out, to my way of thinking. They have far less IPv6 coverage than HE.

I myself, on behalf of my employers, am a direct customer of IP transit services from both Cogent and HE.

I don’t know about others similarly positioned, but my Cogent rep tries to call me at least twice a month. I’m going to start taking (more of) his calls and letting him know his account with us is in jeopardy come renewal time if Cogent can’t get a full IPv6 route table to happen.

Today, with both Cogent and HE as upstreams, I am reachable worldwide via IPv6. If either link went down, however, part of the internet couldn't reach me via IPv6, because either HE or Cogent wouldn't have a route to me. That's ridiculous.

Since Cogent is clearly the bad actor here (the burden being Cogent's to prove otherwise, because HE is publicly on record as saying that they’d love to peer with Cogent), I’m giving serious consideration to dropping Cogent come renewal time and using NTT or Zayo instead.

While that would not immediately solve the problem that, if the NTT or Zayo link went down, single-homed Cogent customers would lose access to me via IPv6, I’m actually OK with that. It at least ensures that when there is a problem, the problem affects only single-homed Cogent clients. Thus, the problem is borne exclusively by the people who pay the bad actor causing it. That tends to get uncomfortable for the payee (i.e. Cogent).

I intend to email my Cogent sales guy regarding this matter and to make it a sticking point in every phone conversation I have with him. I call on others similarly situated to consider following suit. I’ve come to believe that it’s best for my interests, and I also believe it’s best for the internet community at large, as ubiquitous worldwide routing of IPv6 becomes more essential with each passing day.

In closing, it’s a mystery to me why Cogent wouldn’t desire an IPv6 peering with HE. Let’s face it: if any of us had to choose a single-homed IPv6 internet experience between HE and Cogent, we’d all choose HE. If those were the two options, HE is the “real” IPv6 internet and Cogent is a tiny sliver of it. I have actually wondered whether HE is holding IPv6 peering with Cogent hostage, contingent on peering all protocols (IPv4 and IPv6) with Cogent. There, I could see why Cogent might hesitate. To my knowledge, however, this is not the case, and I have heard no public accusation that HE is imposing such a constraint. I would love to hear anyone from HE tell as much of the story as they are able.

PS - As an aside, has anyone noticed HE’s been growing their network by leaps and bounds this past year? Direct peerings with AT&T and CenturyLink, more domestic US and Canadian POPs, and I believe the number of pathways across the North American continent has grown substantially, too.

Thanks,

Matt Hardeman
IPiFony Systems, Inc.
AS6082

Take two transit providers, neither of which is HE or Cogent. Cogent is probably banking on this being the response, figuring that they have the financial resources to outlast HE if they’re both shedding customers.

If you really wanted to stick it to Cogent, take 3 transit providers: HE and two of any other providers besides Cogent.

Cogent clearly aren’t going to cave to their own customers asking them to peer with HE. Otherwise it would have happened by now.

Cogent sucks for lots of reasons and this one isn’t even in the top 5 IMHO.

I’m inclined to agree with you, subject to some caveats:

1. I think more Cogent customers need to be more vocal about it. There hasn’t been an impetus to do so until recently. Now real people (not network engineer sorts) are starting to use IPv6 for real.

2. I agree with you in principle. In an ideal world, take HE and two others. I would, however, still say that if you could only take two, take HE and something other than Cogent. It's a win-win if the experience of single-homed Cogent customers gets worse as a result. Perhaps having things occasionally break — only for single-homed Cogent customers — is a benefit.

Let's hear the top 5. Peering disputes are up there, but what else?

We've had them as one of our providers going on 8 years, and we can only complain about the occasional peering disputes.

-Robert

Honestly, don't take HE or Cogent if you can help it. Neither deserves to be rewarded in this dispute. That being said, there are plenty of small customers that are single homed to both. Unfortunately, I doubt their voices matter.

Jack

I don’t really have 5 reasons to hate Cogent, but I’ve got 3 big ones. If you’ve had static transit with Cogent for 8 years at one or just a handful of locations, none of these will apply. But:

1) They charge per IPv4 BGP session per month
2) They constantly screw up our orders.
3) It then takes days for them to fix their own screw ups in their order system.

welcome to the commercial internet. get over it.

randy

Crazy multihop BGP setups because they don’t do BGP on many (most?) of their customer-facing routers?
Frequent outages in many locations (maybe not where you are, seems to be certain problem areas on their network and not others)
Spamtastic sales force?
Overly aggressive sales calls?

I’m sure there are more, but as I’ve never been a Cogent customer (thankfully) due to their history of bad peering policies, peering disputes, generally obnoxious conduct as a company, etc. it is difficult for me to know much about the customer experience beyond what I hear from others, most of whom are former Cogent customers.

Interestingly, when I worked for HE, I wasn’t allowed to speak my mind about Cogent lest it “reflect badly” on HE.

Owen

Owen DeLong wrote:

Crazy multihop BGP setups

I like that setup. And it never struck me as crazy. In fact, their implementation avoids all multihop setup shortcuts and is quite purist from a routing standpoint.

The multihop approach gives you the option of where to slice and dice your full table, direct from eBGP.

In essence, that setup enables you, as a customer, to run things exactly the way Cogent does as a vendor. If that's what you want.

because they don't do BGP on many (most?) of their customer-facing routers?

I have a pending request to get that multi-hop setup. I was told that it is now a special request which they would "try" to get done, that these days all their routers have full-table capacity, and that they no longer use the multi-hop.

First time I've heard that...

Mark.

Mark Tinka wrote:

I like that setup. And it never struck me as crazy. In fact, their
implementation avoids all multihop setup shortcuts and is quite purist
from a routing standpoint.

First time I've heard that...

Mark.

No static routes, dedicated BGP routed loopbacks on each side from an allocated /31, strict definitions of which routes belong to which session. It's gone about very properly.

In my opinion, that setup is a very good example of how and when to properly take advantage of a BGP feature that has been with us from the start.
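
For anyone who hasn't seen it, here is a rough sketch of what such a two-session design can look like from the customer side. This is IOS-style syntax with invented addresses and a made-up customer ASN (174 is Cogent's well-known ASN); a reconstruction for illustration, not anyone's literal config:

ip prefix-list LOOPBACKS-ONLY permit 192.0.2.0/31 ge 32
!
interface Loopback1
 description eBGP loopback (customer half of the provider-assigned /31)
 ip address 192.0.2.1 255.255.255.255
!
router bgp 64500
 ! Session 1: directly connected, carries only the two loopback /32s,
 ! so no static route is needed to reach the multihop peer.
 neighbor 198.51.100.1 remote-as 174
 neighbor 198.51.100.1 prefix-list LOOPBACKS-ONLY in
 ! Session 2: loopback to loopback, one TTL hop beyond the connected
 ! link, carries the full table (outbound policy omitted for brevity).
 neighbor 192.0.2.0 remote-as 174
 neighbor 192.0.2.0 ebgp-multihop 2
 neighbor 192.0.2.0 update-source Loopback1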

And really, what's wrong with the ability, on your side, to decide when and where on your network you will take a full feed of ever-expanding internet routes? On your edge? On a purpose-built route server?

Or do you think the only path forward for everyone's edges is continuous forklifting and/or selective filtering?

I suspect that people are as wary of the flexibility made available to them as they are of the "complexity" imposed via this approach.

Joe

No static routes, dedicated BGP routed loopbacks on each side from an
allocated /31, strict definitions of which routes belong to which
session. It's gone about very properly.

And all of this is simpler than having a native BGP session that runs
across a point-to-point link?

In my opinion, that setup is a very good example of how and when to
properly take advantage of a BGP feature that has been with us from
the start.

My philosophy: if I could run a router with only one command in its
configuration, I would.

I realize some commands make a router more secure than it would be
without them (and vice versa), while some commands make it perform
better than it would without them (and vice versa).

My point - just because a feature is there, does not mean you have to
use it.

And really, what's wrong with the ability, on your side, to decide when
and where on your network you will take a full feed of ever-expanding
internet routes? On your edge? On a purpose-built route server?

Personally, I abhor tunnels (and things that resemble them) as well as
centralized networking. But that's just me.

Or do you think the only path forward for everyone's edges is
continuous forklifting and/or selective filtering?

Can't speak for others, just myself.

Mark.

Mark Tinka wrote:

No static routes, dedicated BGP routed loopbacks on each side from an
allocated /31, strict definitions of which routes belong to which
session. It's gone about very properly.

And all of this is simpler than having a native BGP session that runs
across a point-to-point link?

Maybe not for some people, but I have a hard time understanding why one extra ebgp session is such a novel concept for all you networking folk.

My philosophy: if I could run a router with only one command in its
configuration, I would.

They sell those routers at your nearest Staples; they require zero commands.

Personally, I abhor tunnels (and things that resemble them) as well as
centralized networking. But that's just me.

I know you know better. What does this have to do with tunnels? Or how centralized your network is built or not?

Joe

Joe Maimon wrote:

Maybe not for some people, but I have a hard time understanding why one
extra ebgp session is such a novel concept for all you networking folk.

multihop bgp means that you don't have synchronised ethernet carrier
status between the provider and customer routers. This in turn means
that if there's an intermediate connectivity problem, bgp will need to
time out before it notices and reroutes. During this period, traffic
will be black-holed. This is a crock.
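
To put rough numbers on it (an IOS-style sketch; the neighbor address and ASNs are invented): the defaults are a 60-second keepalive and a 180-second hold time, so the window can run to three minutes. You can tighten the timers per neighbor, at the cost of more sensitivity to control-plane churn:

router bgp 64500
 ! IOS defaults are keepalive 60s / hold time 180s. Tightening them
 ! shrinks the worst-case black-hole window from ~3 minutes to ~30s.
 neighbor 192.0.2.0 remote-as 174
 neighbor 192.0.2.0 timers 10 30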

Nick

My understanding is this was mostly legacy from devices that did not carry a full RIB and FIB. There were tricks to avoid ending up on these skinny devices if you wanted.

Life in the core has changed a lot in recent years, from 6500/7600 and Foundry/Brocade-class devices to a more interesting set in the pipeline or released.

There are some boxes with limited RIB-to-FIB download that can slice traffic in cost-effective ways, and the price-conscious consumer will likely push the market toward them.

Jared Mauch

Maybe not for some people, but I have a hard time understanding why
one extra ebgp session is such a novel concept for all you networking
folk.

It's not that novel - I share my view of the Internet with various
industry initiatives this way.

But for a commercial service, the decoupling between the state of the
physical link and the control plane in this case creates an opportunity
for various forwarding issues that are avoidable. The BFD argument could
be made, but it is not yet a basic feature one can expect with one's
customers.
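
Where both ends do support it, the knobs are small. An IOS-style sketch (interface name, values and addresses invented); note that BFD on multihop sessions is even less widely supported than the single-hop case shown here, which rather reinforces the point:

interface TenGigabitEthernet0/0/0
 ! 300ms tx/rx intervals with a 3x multiplier: sub-second detection,
 ! versus minutes on default BGP hold timers.
 bfd interval 300 min_rx 300 multiplier 3
!
router bgp 64500
 ! Tear the session down as soon as BFD declares the neighbor dead.
 neighbor 198.51.100.1 fall-over bfd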

They sell those routers at your nearest staples, they require zero
commands.

No Staples this side of the world...

I know you know better. What does this have to do with tunnels? Or how
centralized your network is built or not?

Not everyone has the luxury of carrying a full table at the edge, for
various reasons, and I get that (even though, in 2016, selective BGP
FIB download is a reality).
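
For the record, here's roughly what that looks like on IOS ("BGP Selective Route Download"; the names and the deliberately crude filtering criterion are invented for illustration):

ip prefix-list FIB-WORTHY permit 0.0.0.0/0 le 16
!
route-map FIB-INSTALL permit 10
 match ip address prefix-list FIB-WORTHY
!
router bgp 64500
 address-family ipv4
  ! The "filter" keyword keeps denied routes in BGP for best-path
  ! and re-advertisement, but leaves them out of the RIB and FIB.
  table-map FIB-INSTALL filter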

But if you can avoid it, do. Designating one or two boxes in your core
as your full BGP table reference puts a great deal of burden on those
devices to run and maintain routability for and within your network. If
I had the ability not to do this, I would, despite how sexy eBGP
Multi-Hop might be.

Mark.

Mark Tinka wrote:

Maybe not for some people, but I have a hard time understanding why
one extra ebgp session is such a novel concept for all you networking
folk.

It's not that novel - I share my view of the Internet with various
industry initiatives this way.

It appears that routing on the edge with multihop is viewed as novel.

And going further, multihop is quite novel to BGP engineers in many a location, in my personal experience.

But for a commercial service, the decoupling between the state of the
physical link and the control plane in this case creates an opportunity
for various forwarding issues that are avoidable. The BFD argument could
be made, but it is not yet a basic feature one can expect with one's
customers.

Before BFD, we had keepalives right in BGP. What's wrong with that?

I suppose you also advocate that each provider use a phy port directly on the edge, no switches in between, so that the full table can be yanked out as quickly as possible and flooded back in as soon as possible, as many times as possible...

I know you know better. What does this have to do with tunnels? Or how
centralized your network is built or not?

Not everyone has the luxury of carrying a full table at the edge, for
various reasons, and I get that (even though, in 2016, selective BGP
FIB download is a reality).

The question is whether it is a reality for gear that already cannot support full tables (likely EoS), or that is projected not to support them in the future. And which is practical to obtain and operate.

Further, FIB is one part. Collecting multiple full tables can also impose a DRAM burden on an edge router.

And churn on its CPU. Crypto, policy, etc.

Let's face it: an edge device's control processor and memory are not the ideal location for all this. They do not compare with the general-purpose hardware available for that task, and they never will.

But if you can avoid it, do. Designating one or two boxes in your core
as your full BGP table reference puts a great deal of burden on those
devices to run and maintain routability for and within your network. If
I had the ability not to do this, I would, despite how sexy eBGP
Multi-Hop might be.

Mark.

Who says it must be that way? You could go to the other extreme: it is quite feasible to have multiple RRs per POP (if that's what you want), and you can even segregate each eBGP feed into its own BGP router process, using a fraction of the hardware resources available to you in today's 1U server, available at a fraction of the cost of yesterday's edge.

It is not too hard to see that this approach offers a degree of design freedom that coupling your eBGP directly to your edge does not.
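
The client side of that is mundane. An IOS-style sketch (ASN, names and addresses invented); each edge box simply points at however many RRs you stand up per POP, while the per-feed process segregation happens on the server side and is specific to whatever routing stack runs there:

router bgp 64500
 ! Two route reflectors in this POP, each one a virtual router
 ! instance on commodity 1U server hardware.
 neighbor 10.255.0.10 remote-as 64500
 neighbor 10.255.0.10 description rr1.pop1
 neighbor 10.255.0.10 update-source Loopback0
 neighbor 10.255.0.11 remote-as 64500
 neighbor 10.255.0.11 description rr2.pop1
 neighbor 10.255.0.11 update-source Loopback0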

Joe

Before BFD, we had keepalives right in BGP. What's wrong with that?

You may want to signal failure more quickly than BGP's own timers can
handle.

I suppose you also advocate that each provider use a phy port directly
on the edge, no switches in between, so that the full table can be
yanked out as quickly as possible and flooded back in as soon as
possible, as many times as possible...

Not how I run my network. I aggregate customer ports to a Layer 2
switch, which upstreams to the edge router for service. Router ports are
expensive.

The only time I'll terminate customer links to a router is if they are
buying 100Gbps native services.

The question is whether it is a reality for gear that already cannot
support full tables (likely EoS), or that is projected not to support
them in the future. And which is practical to obtain and operate.

If your gear does not have the latest capabilities, then using what it
has to achieve the best possible outcome is a well understood strategy.

What we are talking about here are current, state-of-the-art options
that you would not want to pass over in favour of older ones if you
have the opportunity. But, your network, your rules.

Further, FIB is one part. Collecting multiple full tables can also
impose a DRAM burden on an edge router.

And churn on its CPU. Crypto, policy, etc.

Let's face it: an edge device's control processor and memory are not
the ideal location for all this. They do not compare with the
general-purpose hardware available for that task, and they never will.

Not from what I see in my network.

I have virtual routers running on x86_64 servers chugging along just as
well as the routing engines on my Juniper and Cisco edge routers.
Admittedly, the control planes in those routers are high-end, and I
can't expect that everyone can afford them, but to say the brains in
modern routers are not up to the task is simply not true. In fact, the
control plane on some of these boxes is not yet being fully exploited
because code is still slowly evolving to take advantage of multi-core
architecture, and 64-bit memory, particularly for routing processes. The
headroom and performance on these has been phenomenal, and I can take
that to the bank.

Who says it must be that way? You could go to the other extreme: it is
quite feasible to have multiple RRs per POP (if that's what you want),
and you can even segregate each eBGP feed into its own BGP router
process, using a fraction of the hardware resources available to you
in today's 1U server, available at a fraction of the cost of
yesterday's edge.

It is not too hard to see that this approach offers a degree of design
freedom that coupling your eBGP directly to your edge does not.

Not the way I'd do it, but like I said, your network, your rules.

Mark.

Mark Tinka wrote:

You may want to signal failure more quickly than BGP's own timers can
handle.

I don't want to churn a full table any quicker than BGP timers. And if you choose to run that eBGP loopback multihop on the same router, you can track routes and interfaces in real time, to the extent your control-plane software supports it. The choice is yours.

Not how I run my network. I aggregate customer ports to a Layer 2
switch, which upstreams to the edge router for service. Router ports are
expensive.

That was my point. Phy signalling is easily and often sacrificed for density and flexibility.

If your gear does not have the latest capabilities, then using what it
has to achieve the best possible outcome is a well understood strategy.

And when you can use a design that offers advantages either way, so much the better.

To return to the topic at hand, Cogent seemed to do quite well in the transit wars with this approach. So perhaps there is something to it.

Maybe they were not constrained by the pricing for the gear with the latest capabilities and capacities as their competitors were? Perhaps this approach enabled them to more rapidly build out and light up their network to catch up to their competitors, to the point that they now sound more like them than they do their previous selves?

I have virtual routers running on x86_64 servers chugging along just as
well as the routing engines on my Juniper and Cisco edge routers.

Or better? And how do those routers get their full tables to munch on?

Admittedly, the control planes in those routers are high-end, and I
can't expect that everyone can afford them, but to say the brains in
modern routers are not up to the task is simply not true.

What I said is that they do not compare. Or are the control-plane hardware specs in the latest and greatest C/J box identical to what you would get in the latest and greatest x86_64 server? My, times have changed.

In fact, the
control plane on some of these boxes is not yet being fully exploited
because code is still slowly evolving to take advantage of multi-core
architecture, and 64-bit memory, particularly for routing processes. The
headroom and performance on these has been phenomenal, and I can take
that to the bank.

Are you saying that the control plane experience lags behind general purpose computing?

Simply because you can afford the inflated pricing of the latest and greatest gear does not mean you should pay it, and it also does not mean the techniques available and in use to avoid paying it are in and of themselves suspect. No matter the temptation to think so.

To a certain extent, the market for the hardware probably accounts for, and takes advantage of, any such unwillingness to engineer around cost, whether that unwillingness is due to pure design concerns or tinged with psychological suggestion.

Joe

I don't want to churn a full table any quicker than BGP timers.

You don't have to churn the whole table, you just have to churn the
(indirect) next-hop.

And if you choose to run that eBGP loopback multihop on the same
router, you can track routes and interfaces in real time, to the extent
your control-plane software supports it. The choice is yours.

This feature is not unique to eBGP Multi-Hop.

Search for Next-Hop Address Tracking and/or Indirect Next-Hop.
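
On IOS, for instance, the relevant knobs look roughly like this (a sketch; the ASN and delay value are just examples, and the feature is on by default in modern code):

router bgp 64500
 address-family ipv4
  ! Re-run best-path as soon as the RIB reports a change to any BGP
  ! next-hop, after a short dampening delay, rather than waiting for
  ! BGP's own timers or the periodic scanner.
  bgp nexthop trigger enable
  bgp nexthop trigger delay 1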

That was my point. Phy signalling is easily and often sacrificed for
density and flexibility.

We have not had to sacrifice performance with our customers in these
types of topologies.

In the Metro, BGP sessions instantiate directly on the Ethernet switch,
so we don't lose performance there either.

To return to the topic at hand, Cogent seemed to do quite well in the
transit wars with this approach. So perhaps there is something to it.

It allowed them to use cheap switches in the Access. That makes a lot of
difference when you're undercutting the competition.

In 2016, you can still use cheap switches to keep your Access costs
down, but you don't have to sacrifice edge-based BGP routing if it's
your thing.

Maybe they were not constrained by the pricing for the gear with the
latest capabilities and capacities as their competitors were? Perhaps
this approach enabled them to more rapidly build out and light up
their network to catch up to their competitors, to the point that they
now sound more like them than they do their previous selves?

Yes, and yes.

Cheap switches that you can deploy rapidly make for a good business case.

Or better?

Not necessarily.

I can hold more tables because the servers have 512GB of RAM, but won't,
because the code can only address 16GB max today (some of which goes to
the code itself at boot). It's a work in progress; the code could only
address 4GB as recently as last year, so we'll get there.

CPU performance also still needs to get better: 12 cores in the
chassis, but because of code limitations, they aren't yet fully utilized.

Overall, still better than using a dedicated router for RR functions.

And how do those routers get their full tables to munch on?

From a bunch of purpose-built edge, peering and border routers.

What I said is that they do not compare. Or are the control-plane
hardware specs in the latest and greatest C/J box identical to what
you would get in the latest and greatest x86_64 server? My, times
have changed.

My Juniper routers are running x86_64-based 1.8GHz quad-core CPUs with
16GB of RAM. 32GB RAM options are now available.

Not cheap, but with several full IPv4/IPv6 views, dozens of customers
taking full feeds, I am not struggling for grunt.

As Junos gets cleverer, those additional cores will come to life
(fingers crossed).

Are you saying that the control plane experience lags behind general
purpose computing?

Nope - I'm saying if you have some cash to burn, you're now in a
position where one option is not automatically better than the other.

I use servers with virtual routers for my RRs because the prospect of
sticking 1TB of RAM in a router is not yet feasible. At the same time,
I'm comfortable running BGP natively in the edge because the control
planes on the routers I have are nowhere near saturation, running tech
that is two years old now.

Simply because you can afford the inflated pricing of the latest and
greatest gear does not mean you should pay it, and it also does not
mean the techniques available and in use to avoid paying it are in and
of themselves suspect. No matter the temptation to think so.

Agree, but BGP routing is not the only reason we need the control planes.

There are other elements to our business that drive that spec.

To a certain extent, the market for the hardware probably accounts
for, and takes advantage of, any such unwillingness to engineer around
cost, whether that unwillingness is due to pure design concerns or
tinged with psychological suggestion.

We spend if we have to, and don't if we don't have to.

For our RR deployment, for example, it was either dedicated routers for
the task, or a long-term view on servers + a hypervisor. We chose the
latter.

Mark.