VZ FIOS and Intel TCP IPv6 Checksum Offload problems

Sean_Donelan · August 27, 2022, 7:00pm

Hopefully, my pain will help someone else.

I've had sporadic Internet slowdowns and stuck networking since IPv6 was enabled on my FIOS ONT a few months ago.

After too much troubleshooting, I found out that some older Intel GbE Ethernet cards have an IPv6 Checksum Offload incompatibility with certain fiber ONTs. As Verizon enables IPv6 on its FIOS network, you might see intermittent network problems.

Intermittent problems are the worst kind.

In some situations where a client machine is connected via some specific Optical Network Terminals (ONTs), and data is appended after the packet checksum, the network adapter can drop receive packets when using TCP-IPv6 Checksum Offload for receive traffic.

Intel published an alert in 2017, but I didn't have IPv6 on FIOS then.

TL;DR: turn off TCP IPv6 Checksum Offload.

Affects all operating systems (Windows, BSD, Linux, etc) using the affected wired Intel ethernet controllers. Not a problem with Intel WiFi.

Michael_Thomas · August 27, 2022, 10:06pm

My reaction is "offload from what"? Isn't this all done in silicon?

Mike

Mel_Beckman · August 27, 2022, 10:36pm

No. In fact, a lot of low-end Ethernet interfaces are completely implemented in interrupt-driven driver software that runs in the host OS (such as Windows). The only thing the hardware provides is the magnetics to transduce binary bit streams.

Even MAC-address decode is in software, and as a result, broadcast storms can slow these hosts to a crawl as the CPU has to check and discard every broadcast packet as “not mine”.

When these tasks are offloaded from the CPU to the Ethernet hardware, the CPU no longer has to perform them, which reduces its workload. The offload hardware also computes and validates checksums in parallel, which is otherwise computationally expensive.
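
To make concrete what the host CPU has to do per packet when checksum offload is turned off, here is a minimal Python sketch of the ones'-complement Internet checksum described in RFC 1071, which TCP uses. The function name `internet_checksum` is only illustrative; real network stacks use optimized routines rather than a byte loop like this.

```python
def internet_checksum(data: bytes) -> int:
    """Return the RFC 1071 ones'-complement checksum of data as a 16-bit int."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


# Sample bytes taken from the numerical example in RFC 1071.
sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)

# A receiver checks integrity by summing the data together with the
# transmitted checksum; an undamaged message yields zero.
verified = internet_checksum(sample + checksum.to_bytes(2, "big")) == 0
```

Every byte of every segment passes through that loop, which is why doing it in the NIC is attractive when the NIC gets it right.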

I don’t know how this particular ONT bug works, but I’m guessing that it results in checksum failures under certain conditions, leading to retransmissions.

-mel via cell

Michael_Thomas · August 27, 2022, 10:40pm

Yeah, sorry brain fart. I'd be surprised if that were a big issue on home networks, but who knows.

Mike

William_Herrin · August 28, 2022, 2:16am

Hi Sean,

Do you happen to have any details on the bug? I note that the IPv6
header DOES NOT HAVE a checksum; it relies on the checksum in the
layer-2 frame. I'm not clear how you "append" bytes to the ethernet
frame "after" calculating the checksum, have the hardware checksum
fail but have the software checksum succeed.
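
For reference on the header layout in question: the fixed IPv6 header defined in RFC 8200 is 40 bytes and has no checksum field of its own, while TCP and UDP still carry their own checksum, computed over a pseudo-header plus the segment. The Python sketch below only unpacks the fixed header; the addresses are documentation-range examples and the helper name `parse_ipv6_header` is made up for illustration.

```python
import ipaddress
import struct

# Fixed IPv6 header: version/traffic class/flow label (32 bits),
# payload length (16), next header (8), hop limit (8), source and
# destination addresses (128 bits each). There is no checksum field.
IPV6_HEADER = struct.Struct("!IHBB16s16s")


def parse_ipv6_header(raw: bytes) -> dict:
    """Unpack the first 40 bytes of an IPv6 packet into named fields."""
    first_word, payload_len, next_header, hop_limit, src, dst = IPV6_HEADER.unpack(
        raw[: IPV6_HEADER.size]
    )
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_len,  # bytes that follow the fixed header
        "next_header": next_header,     # 6 means TCP
        "hop_limit": hop_limit,
        "src": ipaddress.IPv6Address(src),
        "dst": ipaddress.IPv6Address(dst),
    }


# Example header announcing a 20-byte TCP payload (made-up values).
header = IPV6_HEADER.pack(
    6 << 28,
    20,
    6,
    64,
    ipaddress.IPv6Address("2001:db8::1").packed,
    ipaddress.IPv6Address("2001:db8::2").packed,
)
fields = parse_ipv6_header(header)
```

The `payload_length` field is what tells a receiver where the IP packet ends inside the Ethernet frame.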

Regards,
Bill Herrin

Sean_Donelan · August 29, 2022, 3:59pm

Because the interoperability flaw is in silicon, it can't be easily fixed in either the legacy wired Intel Ethernet controller or the fiber ONT. You would need to replace the hardware to fix the silicon.

You need to disable the hardware IPv6 TCP checksum offload, so it's not mangled or dropped at the silicon layers anymore.

It's annoyingly intermittent and not visible with client-based Wireshark because the corruption occurs in the hardware controller.

Christopher_Morrow · August 29, 2022, 4:59pm

Uhm, this includes various versions of the Intel Pro 1000 card... so that's a TON of gear, to include like Lenovo laptops, for instance. I'd wager that this is super common in the field.
The PDF in the download says:
"Products Affected: All 1gbe and 10gbe intel ethernet controllers...."

One wonders if this is a case of the 'mac addresses that start with 4 or 6 fail' problem?
(the PDF has zero words about what the actual problem is)

Brian_Bruns · August 29, 2022, 5:36pm

So I keep seeing this being pushed as a problem with clients... but isn't it the ONT that is bugged out and appending the extra data after the checksum is already there (as Bill Herrin points out)?

I know it's asking a lot to expect networking equipment vendors to fix their gear, but...

Unless I'm totally not understanding the bug, which is entirely possible (and likely).

Niels_Bakker · August 29, 2022, 10:28pm

Here's my speculation on what was happening at Casa Sean.

The Ethernet frame has a length header. The IP frame has a length header. Ideally, the IP frame fits completely into the Ethernet frame, leaving room for the other required Ethernet bits but nothing more.

I vaguely recall there being some equipment that interpreted the minimum MTU requirement in IPv6 as meaning that there was a minimum packet size, not a minimum for the *maximum* packet size. Perhaps the fiber NTU padded the Ethernet frame up to the minimum MTU, sending along a bunch of junk bytes, without otherwise touching the IP packet.

The NIC would then perhaps forget that there were a bunch of junk bytes attached to the end of the frame beyond where the IP packet would end, and calculate the Ethernet checksum based on the IP packet length header, discarding otherwise valid frames as a result.
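
That speculation can be sketched in Python. This is only an illustration of the hypothesis, not a description of what the Intel silicon actually does: it builds a TCP-over-IPv6 segment with a valid checksum, appends junk bytes as a padding device might, and compares a check bounded by the IPv6 payload length with one that runs over everything received. The addresses, ports, payload, and trailer bytes are all made up.

```python
import ipaddress
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum as a 16-bit int."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries
    return ~total & 0xFFFF


def tcp6_checksum(src: bytes, dst: bytes, segment: bytes) -> int:
    """Checksum over the IPv6 pseudo-header (RFC 8200, section 8.1) plus segment."""
    # Pseudo-header: addresses, 32-bit length, 3 zero bytes, next header 6 (TCP).
    pseudo = src + dst + struct.pack("!I3xB", len(segment), 6)
    return internet_checksum(pseudo + segment)


src = ipaddress.IPv6Address("2001:db8::1").packed
dst = ipaddress.IPv6Address("2001:db8::2").packed
payload = b"hello"

# 20-byte TCP header with the checksum field (bytes 16-17) left at zero.
header = struct.pack("!HHIIBBHHH", 443, 50000, 1, 1, 5 << 4, 0x18, 65535, 0, 0)
csum = tcp6_checksum(src, dst, header + payload)
segment = header[:16] + struct.pack("!H", csum) + header[18:] + payload
ipv6_payload_length = len(segment)  # what the IPv6 header would announce

# What arrives if something appends junk after the end of the IP packet.
received = segment + b"\xde\xad\xbe\xef\x01"

# Software path: stop at the length the IPv6 header announces.
ok_bounded = tcp6_checksum(src, dst, received[:ipv6_payload_length]) == 0
# Hypothetical buggy path: run the checksum over every byte received.
ok_whole = tcp6_checksum(src, dst, received) == 0
```

Here `ok_bounded` comes out true and `ok_whole` false, which would match the symptom of hardware offload dropping packets that a software checksum accepts.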

I've had my own run-ins over the years with supposed checksum offloading absolutely not happening on other brands, so implementation errors appear to be relatively common.

-- Niels.