Smallest Transit MTU

[root@bofh sabri]# host -t ns verisign.com 192.5.6.30
Using domain server 192.5.6.30:

verisign.com name server bay-w1-inf5.verisign.net

[root@bofh root]# tcpdump -Xn host 192.5.6.30
tcpdump: listening on fxp0
17:37:53.124955 217.69.153.39.55058 > 192.5.6.30.53: 58565+ NS? verisign.com. (30)
0x0000 4500 003a 5f10 0000 4011 e312 d945 9927 E..:_...@....E.'
0x0010 c005 061e d712 0035 0026 8ac9 e4c5 0100 .......5.&......
0x0020 0001 0000 0000 0000 0876 6572 6973 6967 .........verisig
0x0030 6e03 636f 6d00 0002 0001 n.com.....
17:37:53.216656 192.5.6.30.53 > 217.69.153.39.55058: 58565- 3/0/3 NS[|domain] (DF)
0x0000 4500 00ca 0000 4000 3111 1093 c005 061e E.....@.1.......
0x0010 d945 9927 0035 d712 00b6 1b79 e4c5 8100 .E.'.5.....y....
0x0020 0001 0003 0000 0003 0876 6572 6973 6967 .........verisig
0x0030 6e03 636f 6d00 0002 0001 c00c 0002 0001 n.com...........
0x0040 0002 a300 001a 0b62 6179 2d77 312d 696e .......bay-w1-in
0x0050 6635 f5

Here you go. A root-nameserver setting the DF-bit on its replies :)

Are there any common examples of the DF bit being set on non-TCP
packets?

[...]

Here you go. A root-nameserver setting the DF-bit on its replies :)

This is very bad.

With a 296 byte MTU I don't get answers from (a|b|h|j).root-servers.net, *.gtld-servers.net, tld2.ultradns.net and some lesser-known ccTLD servers.

I would have thought this impossible, but seeing is believing...

Fortunately, this problem won't present itself with regular smaller MTUs: the MTU has to be smaller than around 500 bytes. I haven't tested whether these servers also suffer from the "regular" PMTUD problem where the ICMP messages are ignored, but I'm assuming they don't, so doing all of this over TCP should still work.
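The "around 500 bytes" threshold follows from the header overhead: a plain DNS answer is capped at 512 octets, and carrying it over UDP/IPv4 adds 28 bytes. A small sketch of the arithmetic (header sizes assume IPv4 with no options):

```python
# Largest DNS/UDP payload that fits an unfragmented IP packet of a
# given MTU, assuming a 20-byte IPv4 header and an 8-byte UDP header.
IP_HDR, UDP_HDR = 20, 8

def max_dns_payload(mtu: int) -> int:
    return mtu - IP_HDR - UDP_HDR

for mtu in (296, 576, 1500):
    print(mtu, max_dns_payload(mtu))
```

Any MTU of at least 512 + 28 = 540 bytes carries every non-EDNS response unfragmented; a 296-byte MTU leaves room for only 268 octets of DNS data.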

In article <A7B004D6-6288-11D9-BA2A-000A95CD987A@muada.com> you write:

[...]

  Well DNS (not EDNS) is limited to 512 octets, so unless there
  are real links (not ones artificially constrained to demonstrate
  an issue) this should not be an issue in practice. The default link
  MTUs for slip/ppp/ethernet are all large enough for a DNS/UDP
  response to get through without needing fragmentation.

  For EDNS, which will send up to 4k UDP datagrams (the current
  recommended size), this could be an issue in that the clients would
  have to fall back to DNS after timing out on the EDNS query.

  e.g.
    EDNS query
    EDNS response (dropped due to DF)
    timeout
    DNS query
    DNS response gets through.
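The fallback sequence above can be sketched as resolver retry logic. This is an illustrative sketch, not real resolver code: `query_fn` is a hypothetical stand-in for the actual send/receive routine.

```python
# Sketch of the fallback: try EDNS first; if the (possibly DF-dropped)
# large reply never arrives, retry as plain DNS.
import socket

def resolve_with_fallback(query_fn, timeout=3.0):
    try:
        # EDNS query: advertises a large (e.g. 4096-byte) UDP buffer,
        # so the reply may exceed the path MTU and be lost if DF is set.
        return query_fn(edns=True, timeout=timeout)
    except socket.timeout:
        # Plain DNS retry: the reply is limited to 512 octets and
        # fits any ordinary link MTU.
        return query_fn(edns=False, timeout=timeout)

# Simulated path on which large EDNS replies are dropped:
def fake_query(edns, timeout):
    if edns:
        raise socket.timeout()   # DF'd oversize reply never arrives
    return "512-octet answer"

print(resolve_with_fallback(fake_query))
```

The cost of the fallback is the full timeout on the first query, which is exactly the latency penalty Mark describes.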

  Note for IPv6 one sets IPV6_USE_MIN_MTU on the UDP socket so this
  should be a non-issue there.

  Mark

  Well DNS (not EDNS) is limited to 512 octets, so unless there
  are real links (not ones artificially constrained to demonstrate
  an issue) this should not be an issue in practice.

No, fortunately not. But it's still VERY wrong and it must be fixed.

  Note for IPv6 one sets IPV6_USE_MIN_MTU on the UDP socket so this
  should be a non-issue there.

In IPv6, PMTUD is handled at the IP layer, so the next packet that's too large will be fragmented at the source; there is no issue.

I receive DNS responses > 500 bytes every day (reported by a PIX firewall). So it is an issue, no matter what is recommended in the RFC.