Internet Exchanges supporting jumbo frames?

Hi,

I'm trying to convince my local Internet Exchange (and it is not small, it
exceeds 1 terabit per second on a daily basis) to adopt jumbo frames. For
IPv6 it is hassle free: Path MTU Discovery arranges the max MTU per
connection/destination.

For IPv4, it requires more planning. For instance, two datacenters tend to
exchange significant traffic because of customers with disaster recovery in
mind (saving the same content in two different datacenters, with two
different suppliers). In most cases, these datacenters are quite far from
each other, even in different countries. In this context, jumbo frames would
allow max speed even when the latency is that of a typical international link.

Could anyone share with me Internet Exchanges you know that allow jumbo
frames (like https://www.gr-ix.gr/specs/ does) and how you notice benefit
from it?

Best regards,

Kurt Kraut

Netnod does it in separate VLANs.

Hi Kurt,

this has been tried before at many IXPs. No matter how good an idea it
sounds, most organisations are welded hard to the idea of a 1500 byte
MTU. Even for those who use larger MTUs on their networks, you're
likely to find that there is no agreement on the MTU that should be
used. Some will want 9000, some 9200, others 4470, and some people
will complain that they have some old device somewhere that doesn't
support anything more than 1522, and could everyone kindly agree to that
instead.

Meanwhile, if anyone gets the larger MTU wrong anywhere on their
network, packets will be blackholed and customers will end up unhappy.
Management will demand that the IXP jumbo service be disconnected until
the root cause is fixed, or worse still, will blame the IXP for some
mumble relating to how things worked better before enabling jumbo MTUs.

Nick

Hi Nick,

Thank you for replying so quickly. I don't see why consensus on an MTU
must be reached. IPv6 Path MTU Discovery would handle it by itself,
wouldn't it? If one participant supports 9k and another 4k, the traffic
between them would run at 4k with no manual intervention. If two
participants adopt 9k, hooray, it will be 9k thanks to PMTUD.
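The expectation above can be sketched as a toy model (an illustration only, assuming the ICMPv6 Packet Too Big messages actually reach the sender; real PMTUD probes iteratively per the spec):

```python
# Toy model of IPv6 PMTUD convergence: the effective path MTU ends up
# being the minimum MTU of every link along the path, discovered via
# Packet Too Big messages.
def path_mtu(*link_mtus: int) -> int:
    return min(link_mtus)

# 9k participant talking to a 4k participant across a 9180-byte IXP fabric:
assert path_mtu(9000, 9180, 4470) == 4470
# Two 9k participants across the same fabric:
assert path_mtu(9000, 9180, 9000) == 9000
```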

Am I missing something?

Best regards,

Kurt Kraut

Maybe breaking v4 in the process?

Hi Mike,

The adoption of jumbo frames at an IXP doesn't break IPv4. For an ISP, their
corporate and residential users would still use 1.5k. For datacenters,
their local switches and servers would still be set to a 1.5k MTU. Nothing
will break. When needed, if needed and where supported, from a specific
server, through a specific switch, to a specific router, the MTU can be
raised up to the max MTU supported by the IXP if the operator knows the
destination also supports it, like in the disaster recovery example I gave.
For IPv6, the best MTU will be detected and used with no operational effort.

For those who don't care about it, an IXP adopting jumbo frames wouldn't
demand any kind of change to their network. They just set their interfaces
to 1500 bytes and rest easy. Those who care, like me, can benefit from it,
and for that reason I see no reason not to adopt it.
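The opt-in model above can be sketched as a per-destination MTU table (hypothetical names; on Linux this is roughly what a per-route `mtu` option on `ip route` achieves):

```python
# Hypothetical per-destination MTU table: only traffic toward peers known
# to support jumbos gets the big MTU; everyone else stays at 1500, untouched.
ROUTE_MTU = {
    "dr-partner-dc": 9000,  # disaster-recovery partner reachable via the IXP
}
DEFAULT_MTU = 1500

def egress_mtu(destination: str) -> int:
    # Fall back to the universal 1500 for anyone who never opted in.
    return ROUTE_MTU.get(destination, DEFAULT_MTU)

assert egress_mtu("dr-partner-dc") == 9000
assert egress_mtu("some-residential-isp") == 1500
```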

Best regards,

Kurt Kraut

There is no way to avoid breaking MTU for IPv4 other than relying on PMTUD
for IPv6 only, is there? Meaning we would stick to 1500 for IPv4 and use
something larger for IPv6?

Kind regards,
Stefan

Kurt Kraut wrote:

Thank you for replying so quickly. I don't see why consensus on an MTU
must be reached. IPv6 Path MTU Discovery would handle it by itself,
wouldn't it? If one participant supports 9k and another 4k, the traffic
between them would run at 4k with no manual intervention. If two
participants adopt 9k, hooray, it will be 9k thanks to PMTUD.

Am I missing something?

for starters, if you send a 9001 byte packet to a router which has its
interface MTU configured to be 9000 bytes, the packet will be
blackholed, not rejected with an ICMP Packet Too Big (PTB) message.

Even if it weren't, how many icmp PTB packets per second would a router
be happy to generate before rate limiters kicked in? Once someone
malicious works that out, they can send that number of crafted packets
per second through the IXP, thereby creating a denial of service situation.
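The denial of service described above can be illustrated with a toy token-bucket limiter (the numbers are made up for illustration):

```python
# Toy per-second ICMP PTB budget on a router. Once an attacker saturates
# the budget with crafted oversized packets, legitimate senders get no PTB
# back, and their PMTUD silently blackholes.
class PtbBudget:
    def __init__(self, per_second: int):
        self.per_second = per_second
        self.remaining = per_second

    def new_second(self) -> None:
        # Budget refills at each one-second boundary.
        self.remaining = self.per_second

    def emit_ptb(self) -> bool:
        # Returns True if the router is still willing to send a PTB.
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False

router = PtbBudget(per_second=100)                    # made-up limit
burned = sum(router.emit_ptb() for _ in range(100))   # attacker's packets
assert burned == 100
assert router.emit_ptb() is False   # legitimate PTB suppressed this second
```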

There are many other problems, such as pmtud not working properly in the
general case.

Nick

I have a strong opinion for jumboframes=9180bytes (IPv4/IPv6 MTU), partly because there are two standards referencing this size (RFC 1209 and 1626), and also because all major core router vendors support this size now that Juniper has decided (after some pushing) to start supporting it in more recent software on all their major platforms (before that they had too low L2 MTU to be able to support 9180 L3 MTU).

In order to deploy this to end systems, however, I think we're going to need something like https://tools.ietf.org/html/draft-van-beijnum-multi-mtu-04 to make this work on mixed-MTU LANs. The whole business of PMTUD blackhole detection is also going to be needed, so hosts try a lower PMTU in case larger packets are dropped because of L2 misconfiguration in networks.

With IPv6 we have the chance to make PMTUD work properly and also have PMTU blackhole detection implemented in all hosts. IPv4 is a lost cause in my opinion (although it's strange how many hosts seem to get away with 1492 (or is it 1496) MTU because they're using PPPoE).

That is only true if the router/host sets MRU=MTU. That is definitely not always the case.

Could you do the same with a 1501 byte packet?

I have many times pinged with 10000 byte packets from a device that has "ip mtu 9000" configured on it, so it sends out two fragments, one being 9000 bytes, the other around 1100 bytes, only to get back a stream of fragments, none of them larger than 1500 bytes.

MTU and MRU are two different things.

Regarding jumbo usage, the biggest immediate benefit I can see would be two ISPs wanting to exchange tunneled traffic with each other. Even if the customer access is 1500, there is definite benefit in being able to slap an L3 tunnel header on that packet, send it as ~1550 bytes to the other ISP, and have them take the header off again, without having to handle tunnel packet fragments (which tend to be quite resource intensive).
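The arithmetic behind that "~1550 bytes" figure (the exact number depends on the encapsulation; the header sizes below are the common base cases, without options):

```python
# Outer packet size when tunneling a full 1500-byte customer packet.
INNER = 1500
IPV4_HDR = 20   # outer IPv4 header
IPV6_HDR = 40   # outer IPv6 header
GRE_HDR = 4     # base GRE header, no key/checksum options

assert INNER + IPV4_HDR + GRE_HDR == 1524   # GRE over IPv4
assert INNER + IPV6_HDR + GRE_HDR == 1544   # GRE over IPv6
# Either way, a ~1550+ byte core/IXP MTU carries the tunneled packet whole,
# while a 1500-byte IXP MTU forces every such packet to fragment.
```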

The IXP can verify whether an MTU is too large or too small with an active poller.

The poller in the IXP has a larger-than-spec MTU. It tries to send ping
packets of max_size+1: if they work, the customer's MTU is too large. It
also tries to send pings of max_size: if those do not work, the customer's
MTU is too small. As icing on top, it tries to send max_size+1 fragmented
into max_size and 1, and sees what comes back.
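The classification logic above can be sketched as follows (max_size being the IXP's spec MTU; in practice the probes would be pings with DF set, which this sketch abstracts away):

```python
# Classify a peer's MTU configuration from two probe outcomes:
#   reply_at_max:    did a ping of exactly max_size get answered?
#   reply_above_max: did a ping of max_size + 1 get answered?
def classify(reply_at_max: bool, reply_above_max: bool) -> str:
    if reply_above_max:
        return "too large"   # peer accepts frames beyond the spec
    if not reply_at_max:
        return "too small"   # peer drops frames the spec requires
    return "ok"

assert classify(reply_at_max=True, reply_above_max=False) == "ok"
assert classify(reply_at_max=True, reply_above_max=True) == "too large"
assert classify(reply_at_max=False, reply_above_max=False) == "too small"
```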

The IXP is the only interface in the whole of the Internet which collapses
the MTU to 1500B; private peers regularly have a higher MTU, and nearly
everyone runs their core at a higher MTU.

I think it's crucial that we stop thinking of MTU as a single thing; we
should separate edge MTU and core MTU, which is how we already think
and provision within our own networks. The question then becomes: is the
IXP edge or core? I would say run core MTU in the IXP, so that edge MTU
can be tunneled over it without fragmentation.
The IXP can offer edgeMTU and coreMTU VLANs, so that people who are
religiously against it can peer only in the edgeMTU VLAN.

Nick wrote:

Some will want 9000, some 9200, others 4470 and some people ...

Mikael Abrahamsson wrote:

I have a strong opinion for jumboframes=9180bytes (IPv4/IPv6 MTU),
partly because there are two standards referencing this size (RFC 1209
and 1626), and also because all major core router vendors support this
size now that Juniper has decided (after some pushing) to start
supporting it in more recent software on all their major platforms
(before that they had too low L2 MTU to be able to support 9180 L3 MTU).

In order to deploy this to end systems, however, I think we're going to
need something like
draft-van-beijnum-multi-mtu-04 to make this
work on mixed-MTU LANs. The whole business of PMTUD blackhole detection
is also going to be needed, so hosts try a lower PMTU in case larger
packets are dropped because of L2 misconfiguration in networks.

With IPv6 we have the chance to make PMTUD work properly and also have

The prospects for that seem relatively dire. Of course, what's being
discussed here is the mixed L2 case, where the device will probably not
send an ICMPv6 PTB anyway, but rather simply discard the packet as a giant.

PMTU blackhole detection implemented in all hosts. IPv4 is a lost cause in
my opinion (although it's strange how many hosts seem to get away
with 1492 (or is it 1496) MTU because they're using PPPoE).

If your adv_mss is set accordingly, you can get away with a lot.

Same for the SIX.

Saku Ytti wrote:

The poller in the IXP has a larger-than-spec MTU. It tries to send ping
packets of max_size+1: if they work, the customer's MTU is too large. It
also tries to send pings of max_size: if those do not work, the customer's
MTU is too small. As icing on top, it tries to send max_size+1 fragmented
into max_size and 1, and sees what comes back.

you're recommending that routers at IXPs do inflight fragmentation?

Nick

Mikael Abrahamsson wrote:

I have many times pinged with 10000 byte packets from a device that has
"ip mtu 9000" configured on it, so it sends out two fragments, one being
9000 bytes, the other around 1100 bytes, only to get back a stream of
fragments, none of them larger than 1500 bytes.

here's some data on INEX from a server interface with 9000 mtu. fping
has 40 bytes overhead:

# wc -l ixp-router-addresses.txt
      85 ixp-router-addresses.txt
# fping -b 1460 < ixp-router-addresses.txt | grep -c unreachable
0
# fping -b 1500 < ixp-router-addresses.txt | grep -c unreachable
10
# fping -b 5000 < ixp-router-addresses.txt | grep -c unreachable
11
# fping -b 8960 < ixp-router-addresses.txt | grep -c unreachable
12

Out of interest, there were 5 different vendors in the output, according
to the MAC addresses returned. Some of this may be caused by
inappropriate ICMP filtering on the routers, but the point is that it
would be unwise to depend on routers doing the right thing here. If
you're going to have a jumbo MTU VLAN at an IXP, the VLAN needs to be a
hard specification, not an aspiration with any variance.

Nick

I'm suggesting the IXP run an active poller which detects customer MTU misconfigs.

Saku Ytti wrote:

I'm suggesting the IXP run an active poller which detects customer MTU misconfigs.

any IXP configuration which requires active polling to ensure correct
configuration is doomed to failure. You are completely overestimating
human nature if you believe that the IXP operator can make this work by
harassing people into configuring the correct MTU, even when the data is
clear that their systems are misconfigured.

Nick