Constant FEC errors: Juniper MPC10E 400G

We recently added MPC10E-15C-MRATE cards to our MX960s to upgrade our core to 400G.  During initial testing of the 400G interface (400GBASE-FR4), I see constant FEC errors.  FEC is new to me.  Anyone know why this is occurring?  Shown below is an interface with no traffic but seeing constant FEC errors.  This is two MX960s cabled directly, no DWDM or anything between them... just a fiber patch cable.

{master}
me@mx960> clear interfaces statistics et-7/1/4

{master}
me@mx960> show interfaces et-7/1/4 | grep rror | refresh 2
---(refreshed at 2024-04-17 14:18:53 CDT)---
  Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 400Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled,
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC statistics              Errors
    FEC Corrected Errors                    0
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate               0
    FEC Uncorrected Errors Rate             0
---(refreshed at 2024-04-17 14:18:55 CDT)---
  Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 400Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled,
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC statistics              Errors
    FEC Corrected Errors                 4302
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate               8
    FEC Uncorrected Errors Rate             0
---(refreshed at 2024-04-17 14:18:57 CDT)---
  Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 400Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled,
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC statistics              Errors
    FEC Corrected Errors                 8796
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate             146
    FEC Uncorrected Errors Rate             0
---(refreshed at 2024-04-17 14:18:59 CDT)---
  Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 400Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled,
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC statistics              Errors
    FEC Corrected Errors                15582
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate             111
    FEC Uncorrected Errors Rate             0
---(refreshed at 2024-04-17 14:19:01 CDT)---
  Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 400Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled,
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC statistics              Errors
    FEC Corrected Errors                20342
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate             256
    FEC Uncorrected Errors Rate             0

{master}
me@mx960> show interfaces et-7/1/4 | grep "put rate"
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)

{master}
me@mx960> show interfaces et-7/1/4
Physical interface: et-7/1/4, Enabled, Physical link is Up
  Interface index: 226, SNMP ifIndex: 800
  Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 400Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Enabled
  Pad to minimum frame size: Disabled
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 8 supported, 8 maximum usable queues
  Schedulers     : 0
  Last flapped   : 2024-04-17 13:55:28 CDT (00:36:19 ago)
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : None
  Active defects : None
  PCS statistics                      Seconds
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC Mode  :                 FEC119
  Ethernet FEC statistics              Errors
    FEC Corrected Errors               801787
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate            2054
    FEC Uncorrected Errors Rate             0
  Link Degrade :
    Link Monitoring                   :  Disable
  Interface transmit statistics: Disabled

  Logical interface et-7/1/4.0 (Index 420) (SNMP ifIndex 815)
    Flags: Up SNMP-Traps 0x4004000 Encapsulation: ENET2
    Input packets : 1
    Output packets: 1
    Protocol inet, MTU: 1500
    Max nh cache: 75000, New hold nh limit: 75000, Curr nh cnt: 1, Curr new hold cnt: 0, NH drop cnt: 0
      Flags: Sendbcast-pkt-to-re
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 10.10.10.76/30, Local: 10.10.10.77, Broadcast: 10.10.10.79

Open a JTAC case,
that looks like a job for them.

Kind Regards,
Dominik

I did. Usually the NANOG and J-NSP email lists get me a quicker solution than JTAC.

-Aaron

I’m no TAC engineer, but the purpose of FEC is to detect and correct errors when the port is going so fast that errors are simply inevitable. Working as Intended.

Easier (read: cheaper) to build in some error correction than make the bits wiggle more reliably.

No idea if that rate of increment is alarming or not, but you’ve not yet hit your FEC cliff so you appear to be fine.
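
To put a rough number on it, here is a back-of-the-napkin sketch in Python; it assumes the "FEC Corrected Errors Rate" counter is roughly corrected bits per second, which is worth confirming against the Junos docs:

# Rough scale check: corrected-error rate vs. the number of bits flying by.
# ASSUMPTION: the counter approximates corrected bits per second.
LINE_RATE_BPS = 400e9                      # nominal 400G payload rate

def implied_ber(corrected_per_second):
    """Very rough pre-FEC bit error ratio implied by a corrected-error rate."""
    return corrected_per_second / LINE_RATE_BPS

for rate in (8, 146, 2054):                # rates seen in the output above
    print(f"{rate:>5}/s  ->  ~{implied_ber(rate):.1e}")

# Even ~2000 corrected/s against 4e11 bits/s works out to roughly 5e-9,
# which is nowhere near the point where the FEC stops keeping up.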

-Matt

Isn’t FEC required by the 400G spec?

FEC cliff? Is there a level of FEC errors that I should be worried about then? Not sure what you mean.

-Aaron

Thanks Joe and Schylar, that’s reassuring. Tom, yes, I believe FEC is required for 400G, as you can see FEC119 listed in that output… and I understand you can’t (or perhaps shouldn’t) change it.

-Aaron

Hi.

Looks like normal behavior:

https://supportportal.juniper.net/s/article/PTX-FEC-corrected-errors-increasing-on-link-between-QSFP-100GBASE-SR4-740-058734-and-QSFP-100G-SR4-T2-740-061405?language=en_US

"An incrementing FEC Corrected Errors counter is normal for a link that is running FEC. It just indicates that the errored bits have been corrected by FEC. "

"Therefore, the incrementing FEC Corrected Errors counter might only be indicating an interoperability issue between the optics from ......"

Notes I took from smart optical people:

“PAM4 runs at much lower SNRs than NRZ, because you’re trying to read 4 distinct voltage levels instead of 2. Even the cleanest system will have some of that, so the only way to make it usable is to have FEC in place.”

At some point, an error rate would exceed the ability of the forward error correction (FEC) overhead to compensate, resulting in CRC errors. You’re not seeing those, so all is technically well.
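
Here is a sketch of where that limit sits, assuming the Clause 119 RS(544,514) code that the FEC119 mode in your output refers to (10-bit symbols; treat the numbers as my reading of the spec, not gospel):

# RS(544,514) codeword arithmetic for 400GBASE-R (Clause 119 "KP4" FEC).
N_SYMBOLS = 544                                # symbols per codeword
K_SYMBOLS = 514                                # message symbols per codeword
T_CORRECTABLE = (N_SYMBOLS - K_SYMBOLS) // 2   # = 15 correctable symbol errors

overhead = N_SYMBOLS / K_SYMBOLS - 1           # ~5.8% extra bits on the wire
print(f"corrects up to {T_CORRECTABLE} errored symbols per codeword, "
      f"{overhead:.1%} overhead")

# A codeword with 15 or fewer bad symbols gets repaired and bumps the
# "FEC Corrected Errors" counter; more than that and it goes up the stack
# uncorrected, which is when the uncorrected/CRC counters start moving.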

It’s not so much how many packets come in with errors that causes a problem, but what percentage of each packet is corrupted. The former is usually indicative of the latter though.

Just as Tom said, we’re talking about a whole different animal from the NRZ we’re used to inside the building. Long-haul and DCI folks deal with this stuff pretty regularly. The secret is to keep everything clean and mind your bend radii. We won’t get away with some of what we used to get away with.

-Matt

Interesting, thanks all. The JTAC rep got back to me and also pretty much said it’s not an issue and is expected… also, the JTAC rep cited 2 KBs, shown here, both using 100G as an example… question please: should I understand that this is also true for 400G, even though his KBs speak about 100G?

KB77305
KB35145

https://supportportal.juniper.net/s/article/What-is-the-acceptable-rate-of-FEC-corrected-errors-for-100G-interface
https://supportportal.juniper.net/s/article/PTX-FEC-corrected-errors-increasing-on-link-between-QSFP-100GBASE-SR4-740-058734-and-QSFP-100G-SR4-T2-740-061405?language=en_US

-Aaron

Well, JTAC just said that it seems OK, and that 400G is going to show 4x more than 100G: “This is due to having to synchronize much more to support higher data.”

-Aaron

FEC on 400G is required and expected. As long as it is “corrected”, you have nothing to worry about. We had the same realisation recently when upgrading to 400G.

-Schylar

Corrected FEC errors are pretty normal for 400G FR4

We've seen the same between Juniper and Arista boxes in the same rack running at 100G, despite cleaning fibres, swapping optics, moving ports, moving line cards, etc. TAC said it's a non-issue and to be expected, and shared the same KBs.

It's a bit disconcerting when you plot the data on your NMS, but it's not material.

Mark.

In my reading, the 400GBASE-R Physical Coding Sublayer (PCS) always includes the FEC. This is defined in Clause 119 of IEEE Std 802.3-2022, and is most easily seen in "Figure 119–2—Functional block diagram" if you don't want to get buried in the prose. Nothing there seems to imply that the FEC is optional.

I'd be happy to be corrected, though. It may well be that there is a method to reading these tomes that I have not discovered yet. This is the first time I've dived deep into any IEEE standard.

Best regards
Joel

Standard deviation is now your friend. We learned to alert on FEC and CRC counters going outside a standard deviation, although the latter should already be alerting.
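
Something like this on the collector side is usually enough (a sketch; it assumes you're already polling the corrected-error counter into per-interval deltas):

# Flag FEC samples that land outside k standard deviations of the history.
from statistics import mean, stdev

def outliers(samples, k=2.0):
    """Return (index, value) pairs more than k sigma from the sample mean."""
    if len(samples) < 2:
        return []
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return []
    return [(i, v) for i, v in enumerate(samples) if abs(v - mu) > k * sigma]

# e.g. per-interval corrected-error deltas pulled from the NMS
history = [8, 146, 111, 256, 29, 2378, 95, 120]
print(outliers(history))                       # -> [(5, 2378)]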

> We’ve seen the same between Juniper and Arista boxes in the same rack
> running at 100G, despite cleaning fibres, swapping optics, moving ports,
> moving line cards, etc. TAC said it’s a non-issue and to be expected,
> and shared the same KBs.

Just for extra clarity on those KBs: this probably has nothing to do with vendor interop, as implied in at least one of them.

You will see some volume of FEC corrected errors on 400G FR4 even with the same router hardware and transceiver vendor on both ends and a 3 m patch. Short of duct-taping the transceivers together, you’re not going to get much more optimal than that.

As far as I can suss out from my reading and what Smart People have told me, certain combinations of modulation and lambda are just more susceptible to transmission noise, so for those FEC is required by the standard. PAM4 modulation does seem to be a common thread, but there are some PAM2/NRZ variants that FEC is also required for (100GBASE-CWDM4, for example).
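
The PAM4 part is easy to put numbers on (an idealized sketch; it ignores equalization and everything else real transceivers do):

# Why PAM4 leans on FEC: same swing, four levels instead of two, so each
# eye is a third of the height, costing roughly 20*log10(3) ~ 9.5 dB of SNR.
from math import log10

SWING = 1.0                          # normalized peak-to-peak amplitude
nrz_eye  = SWING / (2 - 1)           # 1 gap between 2 levels
pam4_eye = SWING / (4 - 1)           # 3 gaps between 4 levels

penalty_db = 20 * log10(nrz_eye / pam4_eye)
print(f"PAM4 eye is {pam4_eye:.2f} of the swing, ~{penalty_db:.1f} dB worse than NRZ")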

Yes, correct.

Mark.

Not to belabor this, but it's so interesting... I need a FEC-for-Dummies or FEC-for-IP/Ethernet-Engineers...

Shown below is my 400G interface with NO config at all... The interface has no traffic at all, no packets at all... BUT lots of FEC hits.  Interesting, this FEC thing.  I'd love to have a fiber splitter and see if Wireshark could read it and show me what FEC looks like... but something tells me I would need a 400G sniffer to read it, lol.

It's like FEC (FEC119 in this case) is this automatic thing running between the interfaces (in hardware, I guess), with no protocols and nothing else needed at all in order to function.

-Aaron

{master}
me@mx960> show configuration interfaces et-7/1/4 | display set

{master}
me@mx960>

{master}
me@mx960> clear interfaces statistics et-7/1/4

{master}
me@mx960> show interfaces et-7/1/4 | grep packet
    Input packets : 0
    Output packets: 0

{master}
me@mx960> show interfaces et-7/1/4 | grep "put rate"
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)

{master}
me@mx960> show interfaces et-7/1/4 | grep rror
  Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 400Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled,
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC statistics              Errors
    FEC Corrected Errors                28209
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate            2347
    FEC Uncorrected Errors Rate             0

{master}
me@mx960> show interfaces et-7/1/4 | grep packet
    Input packets : 0
    Output packets: 0

{master}
me@mx960> show interfaces et-7/1/4 | grep "put rate"
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)

{master}
me@mx960> show interfaces et-7/1/4 | grep rror
  Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 400Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled,
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC statistics              Errors
    FEC Corrected Errors                45153
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate              29
    FEC Uncorrected Errors Rate             0

{master}
me@mx960> show interfaces et-7/1/4 | grep packet
    Input packets : 0
    Output packets: 0

{master}
me@mx960> show interfaces et-7/1/4 | grep "put rate"
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)

{master}
me@mx960> show interfaces et-7/1/4 | grep rror
  Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 400Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled,
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC statistics              Errors
    FEC Corrected Errors                57339
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate            2378
    FEC Uncorrected Errors Rate             0

{master}
me@mx960>