Hi all, curious if anyone who has used Cogent as a point to point provider has gone through packet loss issues with them and were able to successfully resolve? I’ve got a non-rate-limited 10gig circuit between two geographic locations that have about 52ms of latency. Mine is set up to support both jumbo frames and vlan tagging. I do know Cogent packetizes these circuits, so they’re not like waves, and that the expected single session TCP performance may be limited to a few gbit/sec, but I should otherwise be able to fully utilize the circuit given enough flows.
Circuit went live earlier this year and we had zero issues with it. Testing with common tools like iperf would allow several gbit/sec of TCP traffic using single flows, even without an optimized TCP stack. Using parallel flows or UDP we could easily get close to wire speed. Starting about ten weeks ago we saw a significant slowdown, and sometimes complete failure, of bursty data replication tasks between equipment using this circuit. Rounds of testing show that new flows often experience significant initial packet loss of several thousand packets, followed by lesser ongoing loss every five to ten seconds. At times we can't do better than 50 Mbit/sec, and we rarely reach a gigabit unless we run a bunch of streams with a lot of tuning. With UDP we also see the loss, but we can still push many gigabits through with one sender, or wire speed with several nodes.
For equipment which doesn't have a tunable TCP stack, such as storage arrays or VMware, the retransmits completely ruin performance or result in ongoing failures we can't overcome.
Cogent support has been about as bad as it gets: everything is great, clean your fiber, iperf isn't a good test, install a physical loop (oh wait, we don't want that, go pull it back off), new updates at three-to-seven-day intervals, etc. If the performance had never been good to begin with, I'd have just attributed this to their circuits, but since it worked until late June, I know something has changed. I'm hoping someone else has run into this and maybe knows of some hints I could give them to investigate. To me it sounds like there's a rate limiter/policer defined somewhere in the circuit, or an overloaded interface/device we're forced to traverse, but they assure me this is not the case and claim to have torn down and rebuilt the logical circuit.
Thanks!
Cogent has asked many people NOT to purchase their ethernet private circuit point-to-point service unless they can guarantee they won't move any single flow greater than 2 Gbps. This works fine as long as the service is used mostly for mixed IP traffic, like a bunch of randomly mixed customers together.
What you are trying to do is probably against the guidelines their engineering group has given them for what they can sell now.
This is a known weird limitation with Cogent’s private circuit service.
The best working theory that several people I know in the neteng community have come up with is that Cogent does not want to adversely impact all the other customers on a router at sites where the upstreams and links to neighboring POPs are implemented as something like 4 x 10 Gbps, i.e. sites where they have not upgraded that specific router to a full 100 Gbps upstream. Moving large flows of >2 Gbps could result in flat-topping the traffic chart on just one of those 10 Gbps circuits.
That’s not what I’m trying to do, that’s just what I’m using during testing to demonstrate the loss to them. It’s intended to bridge a number of networks with hundreds of flows, including inbound internet sources, but any new TCP flow is subject to numerous dropped packets at establishment and then ongoing loss every five to ten seconds. The initial loss and ongoing bursts of loss cause the TCP window to shrink so much that any single flow, between systems that can’t be optimized, ends up varying from 50 Mbit/sec to something far short of a gigabit. It was also fine for six months before this miserable behavior began in late June.
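For context on why even light loss is so punishing at this latency, the usual back-of-the-envelope model is the Mathis et al. bound; the MSS and loss rates below are assumptions for illustration, not measurements from this circuit:

\[
  \text{rate} \lesssim \frac{MSS}{RTT}\cdot\frac{1.22}{\sqrt{p}}
\]

With MSS = 1460 bytes and RTT = 52 ms, a loss probability of p = 10^{-4} caps a single flow at roughly

\[
  \frac{1460 \cdot 8\ \text{bit}}{0.052\ \text{s}}\cdot\frac{1.22}{\sqrt{10^{-4}}} \approx 27\ \text{Mbit/s},
\]

while sustaining ~2.7 Gbit/s per flow would need p on the order of 10^{-8}. A few thousand drops at flow start plus a loss burst every few seconds is more than enough to keep the window collapsed at this RTT.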
It is a very plausible theory, and everyone has this problem to a
lesser or greater degree. There was a time when edge interfaces were
much lower capacity than backbone interfaces, but I don't think that
time will ever come back. So this problem is systemic.
Luckily there is quite a reasonable solution to the problem, called
'adaptive load balancing', where software monitors balancing, and
biases the hash_result => egress_interface tables to improve balancing
when dealing with elephant flows.
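On Juniper MX, for example, the knob usually meant by this is adaptive load balancing on the aggregated Ethernet bundle; a minimal sketch, assuming a hypothetical bundle ae0, and not a claim about what Cogent actually runs:

# hypothetical bundle; rebalances member-link hashing based on measured load
set interfaces ae0 aggregated-ether-options load-balance adaptive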
We didn't have much success with FAT when the PE was an MX480 and the P a CRS-X (FP40 + FP140 line cards). This was regardless of whether the core links were native IP/MPLS or 802.1AX.
When we switched our P devices to PTX1000 and PTX10001, we've had surprisingly good performance of all manner of traffic across native IP/MPLS and 802.1AX links, even without explicitly configuring FAT for EoMPLS traffic.
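For anyone wanting to enable FAT (RFC 6391) flow labels explicitly on a Junos l2circuit, a minimal sketch; the neighbor address and attachment interface are placeholders, and both ends need to agree on the flow label:

# placeholders: 192.0.2.1 is the remote PE loopback, ge-0/0/1.100 the attachment circuit
set protocols l2circuit neighbor 192.0.2.1 interface ge-0/0/1.100 flow-label-transmit
set protocols l2circuit neighbor 192.0.2.1 interface ge-0/0/1.100 flow-label-receive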
Of course, our policy is to never transport EoMPLS services in excess of 40Gbps. Once a customer requires 41Gbps of EoMPLS service or more, we move them to EoDWDM. Cheaper and more scalable that way. It does help that we operate both a Transport and IP/MPLS network, but I understand this may not be the case for most networks.
Mark.
PTX and MX as LSR look inside the pseudowire to see if it's IP (a
dangerous guess for an LSR to make); CRS/ASR9k does not. So a PTX or MX
LSR will balance your pseudowire even without FAT. I've had no problem
having an ASR9k LSR balance FAT PWs.
However this is a bit of a sidebar, because the original problem is
about elephant flows, which FAT does not help with. But adaptive
balancing does.
and I would say the OP wasn’t even about elephant flows, just about a network that can’t deliver anything acceptable.
Yes, this was our conclusion as well after moving our core to PTX1000/10001.
Mark.
Personally I would recommend turning off LSR payload heuristics,
because there is no accurate way for an LSR to tell what the label is
carrying, and a wrong guess, while rare, will be extremely hard to root
cause, because you will never hear about it: the person suffering from
it is too many hops away for the problem to be on your horizon.
I strongly believe the edge imposing entropy or FAT is the right way
to give the LSR hashing hints.
I wouldn’t call 50 megabit/s an elephant flow
Unless Cogent are not trying to accept (and by extension, may not be able to guarantee) large Ethernet flows because they can't balance them across their various core links end-to-end… Pure conjecture…
Mark.
The initial and recurring packet loss occurs on any flow of more than ~140 Mbit/sec. The fact that it's loss-free under that rate is what furthers my opinion that it's config-based somewhere, even though they say it isn't.
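If it really is a policer, the symptom pattern fits a token bucket; a rough sketch, where the committed rate and burst size are pure guesses for illustration, not values from Cogent. A flow bursting at input rate R_in into a policer with committed rate C and bucket depth B starts dropping after roughly

\[
  t \approx \frac{B}{R_{\text{in}} - C}
\]

e.g. B = 2 MB (16 Mbit) and C = 150 Mbit/s against a 10 Gbit/s burst gives t ≈ 16 Mbit / 9.85 Gbit/s ≈ 1.6 ms, so drops would begin almost immediately on flow start and recur each time the sender ramps back up after recovery.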
If you need to load-balance labelled IP traffic though, all your edge
devices would have to impose entropy/fat.
On the other hand, a workaround at the edge, at least for EoMPLS,
would be to enable control-word.
Lukas
Juniper LSR can actually do heuristics on pseudowires with CW.
Yes, adaptive load balancing very much helps, but the weakness is that it is normally only fully supported on in-house vendor silicon, not merchant silicon. Much of the transport edge is merchant silicon, due to the per-packet cost being far lower and the general requirement to just pass, not manipulate, packets. Using Nokia kit as an example, the 7750 does a great job of "adaptive-load-balancing", but the 7250 is lacklustre at best.
PTX1000/10001 (Express) offers no real configurable options for load balancing the same way MX (Trio) does. This is what took us by surprise.
This is all we have on our PTX:
tinka@router# show forwarding-options
family inet6 {
    route-accounting;
}
load-balance-label-capability;
[edit]
tinka@router#
Mark.
Large IP/MPLS operators insist on optical transport for their own backbone, but are more than willing to sell packet for transport. I find this amusing :-).

I submit that customers who can't afford large links (1Gbps or below) are forced into EoMPLS transport due to cost. Other customers are also forced into EoMPLS transport because there is no other option for long haul transport in their city other than a provider who can only offer EoMPLS.

There is a struggling trend from some medium sized operators looking to turn an optical network into a packet network, i.e., they will ask for a 100Gbps EoDWDM port, but only seek to pay for a 25Gbps service. The large port is to allow them to scale in the future without too much hassle, but they want to pay for the bandwidth they use, which is hard to limit anyway if it's a proper EoDWDM channel. I am swatting such requests away because you tie up a full 100Gbps channel on the line side for the majority of hardware that does pure EoDWDM, which is a contradiction to the reason a packet network makes sense for sub-rate services.

Mark.
What in particular are you missing?
As I explained, PTX and MX both allow, for example, speculating on
transit pseudowires having a CW on them, which is non-default and
requires 'zero-control-word'. You should be looking at 'hash-key' on
PTX and 'enhanced-hash-key' on MX. You don't appear to have a single
stanza configured, but I do wonder what you wanted to configure when
you noticed the missing ability to do so.