Cogent Layer 2

Are any legitimate beefs with Cogent limited to their IP policies, BGP session charges, and peering disputes? Meaning, would using them for layer 2 be reasonable?

I had a discussion with them about a point to point circuit last year and ran into some weirdness around how burstable it would be for specific IP to IP streams as our use case was cheap circuit / high speed data replication between given endpoints. The sales rep was suggesting to me that I’d see specific source/destination IP pairs capped at 2gbps regardless of circuit speed, which suggested to me it was not actually a point to point wave but some type of encapsulated service. We didn’t get into whether it was usable for non-IP, etc.

Thus spake Mike Hammett (nanog@ics-il.net) on Wed, Oct 14, 2020 at 12:36:39PM -0500:

Are any legitimate beefs with Cogent limited to their IP policies, BGP session charges, and peering disputes? Meaning, would using them for layer 2 be reasonable?

Be sure to ask if your circuit will face a 2G/flow cap, and examine if such
a limitation would affect your expected traffic mix.

https://www.reddit.com/r/networking/comments/iv0job/2gb_traffic_flow_cap_on_single_sourcedestination/

Dale

Mike,

Layer 2 is fine once it works.

  • You will have to put up with whatever VLAN tags they pick, if you plan on having multiple virtual circuits on a 10G hub.

  • They do like to see into the traffic flows, as they only allow up to 2 Gbit/s per flow, due to their legacy infrastructure.

  • If the circuit doesn’t work on turn up (which is more than likely), you’ll have to be abrasive with their NOC and demand escalations.

IMO, if it’s 1 Gbit/s or less per circuit and you can deal with the above, you’re fine; otherwise look for another carrier.

I had a discussion with them about a point to point circuit last year and ran into some weirdness around how burstable it would be for specific IP to IP streams as our use case was cheap circuit / high speed data replication between given endpoints. The sales rep was suggesting to me that I’d see specific source/destination IP pairs capped at 2gbps regardless of circuit speed, which suggested to me it was not actually a point to point wave but some type of encapsulated service. We didn’t get into whether it was usable for non-IP, etc.

I always heard this service was really Layer 3 disguised as Layer 2.

All carrier Ethernet services are tunnels provided by VPLS pseudowire or VXLAN services. Did you really expect a VLAN to be layer 2 switched everywhere?

Ryan

Hibernia was offering Switched Ethernet ‘everywhere’ long before it had a Layer 3 network. So I am a bit skeptical. In fact, in the ‘old days’ (2006-2011) we had a nice packet-over-SDH service that had all the performance of SDH with all the functionality of Ethernet. Very popular service. Unfortunately, management replaced it with Switched Ethernet, which many customers distrusted because of potential overbooking issues.

Hibernia’s implementation must have made scaling very difficult to manage, in terms of VLAN allocations and programming all the equipment in the path (with possibly no redundancy). Any link can be saturated, whether it is layer 2 or layer 3. If you want dedicated bandwidth with an SLA, you have to pay for it.

Ryan

Look, you are looking for a fight, in which I have no interest. And no, a provider can’t overbook a packet-over-SDH circuit. It is SDH performance: pure dedicated bandwidth. You are correct that if you have to carve it up into lots of VLANs, it would be a nightmare. But Hibernia was a true wholesale carrier providing backbone to clients, not links distributing traffic to lots of user end points.

This does raise an interesting point regarding the ASR9K platform. IIRC, older Typhoon/NP4c 100GE cards (e.g. the Juggernaut A9K-2X100GE-TR) had a problem where, on a 100GE port, you can't sustain more than 10-12 Gbps per individual flow. You had to hash across flows to achieve aggregate bandwidth, which became a significant issue if you were trying to transport 10G L2 pseudowires with limited flow visibility.

I'm not aware that it is still an issue on the newer Tomahawk/NP5c cards, is it?

James

I haven’t heard any concerns with reliability, on-net performance (aside from 2 gig flow limit) or other such things. Do they generally deliver well in those regards?

The fact that there was a "switched Ethernet" commercial service doesn't mean that the underlying transport was really "switched ethernet" end-to-end. Ethernet over MPLS is a VERY old concept (VLL, VPWS, VPLS, lately EVPN), and these days Ethernet over VXLAN is becoming more and more popular (mostly EVPN).

A carrier using pure, unencapsulated, end-to-end Ethernet for transport over 1000s of km is (and was for at least 15 years) a disaster waiting to happen. Almost all Ethernet services (switched or otherwise) use some form of encapsulation (IP or MPLS, see above) these days.
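
For what it's worth, a point-to-point "layer 2" service like this is often just an LDP-signalled Martini pseudowire between two PE routers. A minimal Junos-style sketch of one end, assuming MPLS/LDP is already running in the core (the interface name, neighbor loopback, and VC ID here are made up):

(under interfaces)
ge-0/1/0 {
    encapsulation ethernet-ccc;      # hand the customer port off as layer 2
    unit 0 {
        family ccc;
    }
}

(under protocols)
l2circuit {
    neighbor 192.0.2.2 {             # remote PE loopback (example address)
        interface ge-0/1/0.0 {
            virtual-circuit-id 100;  # must match on both ends
        }
    }
}

The customer sees an Ethernet handoff, but everything in between is MPLS encapsulation, which is where per-flow limits and hashing behaviour can creep in.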

Yep. Make sure you run BFD with your peering protocols, to catch outages very quickly.
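
For example, in Junos-speak BFD is attached to the BGP session itself; a minimal sketch (the group name, neighbor address, and timers are assumptions):

(under protocols bgp)
group transit {
    neighbor 198.51.100.1 {
        bfd-liveness-detection {
            minimum-interval 300;   # milliseconds between BFD packets
            multiplier 3;           # ~900 ms detection time
        }
    }
}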

Make sure you actually get higher availability with BFD than without it; it is easy to get this wrong and end up losing availability.

The first issue is that BFD has quite a lot of bug surface, because unlike most of your control-plane protocols, BFD is implemented in your NPU ucode when done right.
We’ve had an entire linecard go down on the ASR9k due to BFD, and there is a BFD-of-death packet you can send over the internet to crash a JNPR FPC.
When done in the control plane, poor scheduling can cause false positives more often than it protects from actual outages (Cisco 7600).

Even in a world where BFD is perfect, you still need to consider what you are protecting yourself from. Say you bought a Martini pseudowire from someone and run your backbone over it. What is an outage? Is your provider's IGP rerouting due to a backbone failure an outage to you? Or would you rather the provider converges their network while you don’t converge, and you take the outage?
If provider rerouting is not an outage, you need to know what their SLA is for rerouting time and make BFD less aggressive than that. If provider rerouting is an outage, you can of course run timers as aggressive as you want, but you will probably have lower availability than without BFD.
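
As a sketch of that first case: if the provider commits to rerouting within, say, one second, keep the BFD detection time (minimum-interval times multiplier) above that commitment so your session rides out their reroute. The group, neighbor address, and numbers here are assumptions:

(under protocols bgp group backbone neighbor 203.0.113.9)
bfd-liveness-detection {
    minimum-interval 1000;   # 1000 ms x 3 = 3 s detection time,
    multiplier 3;            # deliberately slower than the assumed 1 s reroute SLA
}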

Also, don’t add complexity to solve problems you don’t have. If you don’t know if BFD improved your availability, you didn’t need it.
Networking is full of belief-based practices: we do things because we believe they help, and faux data is often used to dress the beliefs up as science. The problem space tends to be complex and good-quality data is hard to come by, so we necessarily fly a lot by the seat of our pants, whether we admit it or not.
My belief is that the majority of real-life BFD deployments on average reduce availability; my belief is that you need a frequently failing link which does not propagate link-down in order to reliably improve availability by deploying BFD.

Saku,

My experience with multiple carriers is that reroutes happen in under a minute, but rarely happen at all. I also have redundant backup circuits to another datacenter, so no traffic is truly lost. If an outage lasts longer than 5 minutes, or it’s flapping very frequently, then I call the carrier. Last-mile carriers install CPE equipment at the sites, which makes BFD a requirement to account for the CPE's fiber uplink going down, or an issue upstream.

As for security vulnerabilities, none can be leveraged if they are using internal IPs, and if not, a quick ACL can drop BFD traffic from unknown sources the same way BGP sessions are filtered.

In Juniper speak, the ACL would look like:

(under policy-options)
prefix-list bgp_hosts {
    apply-path "protocols bgp group <> neighbor <>";
}

(under firewall family inet(6) filter mgmt_acl)
term allow_bfd {
    from {
        protocol udp;
        destination-port [ 3784 3785 4784 ];
        source-prefix-list {
            bgp_hosts;
        }
    }
    then accept;
}
term deny_bfd {
    from {
        protocol udp;
        destination-port [ 3784 3785 4784 ];
    }
    then discard;
}

Ryan

My experience with multiple carriers is that reroutes happen in under a minute, but rarely happen at all. I also have redundant backup circuits to another datacenter, so no traffic is truly lost. If an outage lasts longer than 5 minutes, or it's flapping very frequently, then I call the carrier. Last-mile carriers install CPE equipment at the sites, which makes BFD a requirement to account for the CPE's fiber uplink going down, or an issue upstream.

I think I may have spoken ambiguously and confusingly, based on that statement. Rerouting inside the operator's network, such as their LSR-LSR link dropping, is ostensibly invisible to the customer; it can be a tens-of-milliseconds outage or it can be a 10-second outage.
Do you want your Martini-emulated backbone link to fail when the operator reroutes around their own LSR-LSR link failure?

As for security vulnerabilities, none can be leveraged if they are using internal IPs, and if not, a quick ACL can drop BFD traffic from unknown sources the same way BGP sessions are filtered.

In Juniper speak, the ACL would look like:

term deny_bfd {
    from {
        protocol udp;
        destination-port [ 3784 3785 4784 ];
    }
    then discard;

So you're dropping, at every edge, all UDP packets towards these three ports? Your customers may not appreciate that.

Do you want your Martini-emulated backbone link to fail when the operator reroutes around their own LSR-LSR link failure?
As I said, it’s an acceptable loss for my employer’s network, as we have a BGP failover mechanism in place that works perfectly.

So you’re dropping, at every edge, all UDP packets towards these three ports? Your customers may not appreciate that.
You must not be familiar with JUNOS’ ACL handling. This would be applied to interface lo0, which is specifically for control-plane traffic. No data-plane traffic to customers would be hit.
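
For completeness, a sketch of how a filter like the mgmt_acl above would typically be attached to lo0 in Junos (the unit and address family are assumptions):

(under interfaces)
lo0 {
    unit 0 {
        family inet {
            filter {
                input mgmt_acl;   # filters traffic punted to the routing engine
            }
        }
    }
}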

Ryan

I'm sure there are some gaps in knowledge at play here.

There are many reasons why packets can hit the control plane and not be subject to the lo0 filter, for example TTL expiry. Also, as I tried to communicate with little success, BFD is implemented in NPU ucode and you are subject to NPU ucode bugs.
The bug I'm talking about does not require you to use or configure BFD; it just needs the NPU to parse the packet, and your FPC is gone. Same deal with the Cisco issue I'm talking about.

I've not yet seen a single non-broken Junos control-plane protection setup; everyone has terribly poorly written lo0 filters, and no one has any idea how to configure ddos-protection. If you use the canonical sources for this, like Cymru or Juniper's MX book, you'll get it all wrong, as they both contain trivial and naive errors.

But if you do manage to configure lo0 and ddos-protection correctly, you're still exposed to a wide array of packet-of-death style vectors. Just yesterday, on the Junos SIRT day, there was a bug where your KRT will become wedged if you sample (IPFIX) a specifically crafted packet, and this can be a transit packet.

Problems become increasingly simple the less you understand them.

Not sure why IP addresses are even an issue on a "Layer 2" service. But then again, this is Cogent we're talking about here.