MX204 tunnel services BW

Encountered an issue with an MX204 using all 4x100G ports and a logical
tunnel to hairpin a VRF. The tunnel started dropping packets around 8Gbps.
I bumped up tunnel-services BW from 10G to 100G which made the problem
worse; the tunnel was now limited to around 1.3Gbps. To my knowledge with
Trio PFE you shouldn't have to disable a physical port to allocate bandwidth
for tunnel-services. Any helpful info is appreciated.
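
For context, a VRF hairpin over an lt- interface typically looks something like the sketch below. This is not the OP's actual configuration (which isn't shown); the instance name, units, and addresses are placeholders:

```
set chassis fpc 0 pic 0 tunnel-services bandwidth 100g
set interfaces lt-0/0/0 unit 1 encapsulation ethernet
set interfaces lt-0/0/0 unit 1 peer-unit 2
set interfaces lt-0/0/0 unit 1 family inet address 10.255.0.1/30
set interfaces lt-0/0/0 unit 2 encapsulation ethernet
set interfaces lt-0/0/0 unit 2 peer-unit 1
set interfaces lt-0/0/0 unit 2 family inet address 10.255.0.2/30
set routing-instances EXAMPLE-VRF interface lt-0/0/0.2
```

Every packet crossing the hairpin consumes tunnel bandwidth on the PFE the lt- interface is anchored to.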

AIUI, with Trio, you don’t have to disable a physical port, but that comes at the cost of “Tunnel gets whatever bandwidth is left after physical port packets are processed” and likely some additional overhead for managing the sharing.

Could that be what’s happening to you?

Owen

You might have more luck in j-nsp.

But yes, you don't need any physical interface in Trio to do tunneling.
I can't explain your problem, and you probably need JTAC help. I would
appreciate it if you'd circle back and tell us what the problem was.

How it works is that when the PPE decides it needs to tunnel the packet,
it sends the packet back to the MQ via SERDES, which then sends it
again to some PPE (not the same one). I think what that bandwidth
command does is change the stream allocation; you should see it in
'show <MQ/XM...> <#> stream'.
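
A hedged sketch of what checking that stream allocation could look like from the PFE shell. The chip name, instance number, and argument syntax vary by platform and Junos release, so treat this as an assumption rather than captured output (the MQ-based form is shown):

```
> start shell pfe network fpc0
NPC0(vty)# show mqchip 0 stream
```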

In theory, because a PPE can process a packet forever (well, until the
watchdog kills the PPE for appearing stuck), you could very cheaply do
outer+inner at the local PPE. But I think that would mean certain
features like QoS would not work on the inner interface, so all this
expensive recirculation and SERDES consumption exists to satisfy a
fairly limited need. It should be possible to implement some
'performance mode' for tunneling, where these MQ/XM-provided features
are not available but the cost of losing them is in most cases
negligible.

In parallel to opening the JTAC case, you might want to experiment
with which FPC/PIC you set the tunneling bandwidth on. I don't
understand how the tunneling would work if the MQ/XM is remote; would
you then also steal fabric capacity every time you tunnel, i.e. not
just MQ>LU>MQ>LU SERDES, but MQ>LU>MQ>FAB>MQ>LU? So intuitively I
would recommend ensuring you have the bandwidth configured at the
local PFE; if you don't know which PFE is local, just configure it
everywhere.
You could also consult various counters to see whether some stream or
the fabric is congested, and whether these tunneled packets are being
sent over a congested fabric every time at a lower fabric QoS.
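
A minimal sketch of the "configure it everywhere" approach on a multi-PFE box (slot and bandwidth values are placeholders, not from the original post):

```
set chassis fpc 0 pic 0 tunnel-services bandwidth 10g
set chassis fpc 1 pic 0 tunnel-services bandwidth 10g
set chassis fpc 2 pic 0 tunnel-services bandwidth 10g
```

Each statement creates tunnel capacity on that PIC's PFE; the tunnel interfaces you then create (gr-0/0/0, gr-1/0/0, gr-2/0/0, ...) are each anchored to the corresponding PFE.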

I don't understand why the bandwidth command is a thing, and why you
can configure where it applies. To me it seems obvious tunneling
should always be handled strictly locally, never over the fabric,
because you always end up stealing more capacity if you send packets
to a remote MQ. That is, implicitly it should be on for every MQ, and
every PPE should tunnel via its local MQ.

Yeah, doesn’t quite work that way…

The tunnel is assigned to one particular PFE.

What was the aggregate throughput on that PFE? Depending on the card, it may well top out at 40Gbps or even 10Gbps, though that's not likely
on most Trio-based cards; it's more of a DPC-era issue, and those cards did require you to sacrifice a port for tunnel bandwidth.

Owen

You can configure tunnel bandwidth everywhere, but you can’t configure
a given tunnel everywhere, you have to assign it to a particular FPC/PIC/0.

For example, with:
set chassis fpc 2 pic 3 tunnel-services bandwidth 10g

You need to create gr-2/3/0 interfaces for tunnels to use that PFE.

You can create multiple tunnel-services bandwidth entries on multiple
PICs, but you can only put a given tunnel on one gr-x/y/0 interface.
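
To make that concrete, a hedged sketch (the addresses are documentation placeholders, not from the thread):

```
set chassis fpc 2 pic 3 tunnel-services bandwidth 10g
set interfaces gr-2/3/0 unit 0 tunnel source 192.0.2.1
set interfaces gr-2/3/0 unit 0 tunnel destination 192.0.2.2
set interfaces gr-2/3/0 unit 0 family inet address 10.0.0.1/30
```

The gr-2/3/0 name follows directly from the FPC/PIC where tunnel-services was enabled; you can't land that tunnel on a different PFE without moving the chassis configuration.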

Owen

AIUI, with Trio, you don’t have to disable a physical port, but that comes at the cost of “Tunnel gets whatever bandwidth is left after physical port packets are processed” and likely some additional overhead for managing the sharing.

This was pretty much my understanding as well, last time I dealt with this. On MPC/Trio, you just enabled tunnel-services on a given PIC and landed your tunnel there. The tunnel capacity was simply part of the PFE capacity.

It was only on pre-Trio that the bandwidth keyword was required, and that actually reserved that much capacity strictly for the tunnel.

Aggregate throughput for the box was less than 100Gbps while the tunnel was being starved.

JTAC says we must disable a physical port to allocate BW for tunnel-services. Also, leaving tunnel-services bandwidth unspecified is not possible on the 204. I haven't independently tested/validated this in the lab yet, but this is what they have told me. I advised JTAC to update the MX204 "port-checker" tool with a tunnel-services knob to make this caveat more apparent.

Looks like the MX204 is a bit of an odd duck in the MX series. It probably shares some hardware characteristics under the hood (even the MX80 mostly had MIC slots, though there was a variant that shipped with pre-installed interfaces).

The MX204 appears to be an entirely fixed-configuration chassis, and from the literature it looks like it is based on pre-Trio chipset technology. Interesting that 100GbE interfaces are implemented with this seemingly older technology, but yes, it looks like the PFE on the MX204 has all the same restrictions as a DPC-based line card in other MX-series routers.

Owen

According to: https://www.juniper.net/documentation/us/en/software/junos/interfaces-encryption/topics/topic-map/configuring-tunnel-interfaces.html#id-configuring-tunnel-interfaces-on-mx-204-routers

"The MX204 router supports two inline tunnels - one per PIC. To configure the tunnel interfaces, include the tunnel-services statement and an optional bandwidth of 1 Gbps through 200 Gbps at the [edit chassis fpc fpc-slot pic number] hierarchy level. If you do not specify the tunnel bandwidth then, the tunnel interface can have a maximum bandwidth of up to 200 Gbps."

If JTAC is saying it's no longer optional they need to update their docs.

AFAIK, tunnel services doesn't directly take bandwidth from physical ports, but it does take from the total available PFE bandwidth. Disabling a port may be required as the MX204 has a maximum PFE bandwidth of 400G and you can oversubscribe that with the fixed physical ports.

I just checked a production config as an example; note how et-0/0/3 is not configured, so the total bandwidth adds up to 400g:

set chassis fpc 0 pic 0 tunnel-services bandwidth 20g
set chassis fpc 0 pic 0 port 0 speed 100g
set chassis fpc 0 pic 0 port 1 speed 100g
set chassis fpc 0 pic 0 port 2 speed 100g
set chassis fpc 0 pic 1 port 0 speed 10g
set chassis fpc 0 pic 1 port 1 speed 10g
set chassis fpc 0 pic 1 port 2 speed 10g
set chassis fpc 0 pic 1 port 3 speed 10g
set chassis fpc 0 pic 1 port 4 speed 10g
set chassis fpc 0 pic 1 port 5 speed 10g
set chassis fpc 0 pic 1 port 6 speed 10g
set chassis fpc 0 pic 1 port 7 speed 10g

Regards,

Ryan


This is true of other MX platforms as well, unless I misunderstand.

Mark.

We can commit "tunnel-services" on an MX204 without caveat.

Mark.

It is 100% normal Trio EA.

Did they explain why you need to disable the physical port? I'd love
to hear that explanation.

The MX204 is a single Trio EA, so you can't even waste SERDES sending
the packet to a remote PFE after first lookup; it would only bounce
between the local XM/MQ and LU/XL, wasting that SERDES.