Microsoft amplification?

> echo). This probably makes PMTUD work a lot better, but it sucks for


Or totally horques it up entirely if the actual data path used has a
different PMTU. No way this will work if 9 paths are clean and one
requires a frag. :wink:

I won't discuss what to do if you get back 10 FRAG NEEDED packets, with
differing frag sizes :wink:

That's not the way it works. You've got a load-balancing-system (LBS)
front-ending (using a single IP address) a cluster of 10 web servers. A
client on the other side of the LBS initiates an 80/tcp connection. The
LBS directs it to (let's say) server 6. Once data starts flowing on the
connection, nothing interesting happens until server 6 sends a large
packet to the client with (as on all of its packets) the Don't-frag flag
on. That packet reaches a link with a smaller MTU. The router on that
link returns (to the server complex) an ICMP unreachable, fragmentation
needed packet (type 3, code 4).
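
For readers who want to see what that message looks like on the wire, here is a minimal sketch in Python. It assumes the RFC 1191 layout, where the low 16 bits of the second header word carry the next-hop MTU. The function names are made up for illustration, and the checksum is left at zero for brevity, so this is not something to put on a real network as-is.

```python
import struct

ICMP_DEST_UNREACH = 3   # ICMP type: destination unreachable
ICMP_FRAG_NEEDED = 4    # ICMP code: fragmentation needed and DF set


def build_frag_needed(next_hop_mtu: int, original: bytes) -> bytes:
    """Build a type 3 / code 4 message (hypothetical helper, checksum not computed)."""
    # Fields: type, code, checksum, unused (16 bits), next-hop MTU (16 bits).
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                         0, 0, next_hop_mtu)
    # The router appends the start of the packet it could not forward.
    return header + original


def parse_frag_needed(icmp: bytes):
    """Return the advertised next-hop MTU, or None if this is not type 3 / code 4."""
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type != ICMP_DEST_UNREACH or code != ICMP_FRAG_NEEDED:
        return None
    return mtu
```

Routers that predate RFC 1191 leave the MTU field at zero, in which case the sender has to guess a smaller size on its own.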

That ICMP reaches the LBS; it has to decide what to do with it. Some LBSs
will just discard any ICMP packets addressed to the cluster. The one used
by MS instead forwards it to all the back-end servers. The servers that
don't have a session with the client may just discard the ICMP packet (or
they may simply update info in their routing table (I know that AIX does
that)). The server that does have a session with the client will
repackage its data packet (per the newly learned MTU) and send the smaller
packet. The path from the chosen server to the client is no more
ambiguous than any other PMTUD situation. Which is to say that, yes, the
path could change from packet to packet, but that isn't brought on by the
presence of the LBS, it's just a shortcoming of the PMTUD mechanism. In
fact outbound traffic from the clustered servers often doesn't even go
through the LBS.
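
The forward-to-everyone behavior described above can be modeled with a toy sketch. This is only an illustration under simplifying assumptions: the `Backend` class and `flood` function are hypothetical, sessions are keyed by client address and port only, and a server with no matching session simply discards the message (a real stack might instead update its routing table, as noted above).

```python
class Backend:
    """Toy model of one server behind the load balancer (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # (client_ip, client_port) -> current path MTU

    def open_session(self, client, mtu=1500):
        self.sessions[client] = mtu

    def on_frag_needed(self, client, next_hop_mtu):
        """Return True if this server owns the session and recorded the MTU."""
        if client not in self.sessions:
            return False  # no session with this client: discard the ICMP
        # Never raise the path MTU in response to an ICMP error.
        self.sessions[client] = min(self.sessions[client], next_hop_mtu)
        return True


def flood(backends, client, next_hop_mtu):
    """Deliver the ICMP to every backend; return the names of those that acted."""
    return [b.name for b in backends if b.on_frag_needed(client, next_hop_mtu)]
```

With ten backends and a session open only on server 6, flooding a frag-needed message changes state on server 6 alone; the other nine copies are wasted work but cause no ambiguity.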

(Note that if the client is also using PMTUD and happens to send a large
enough packet to trigger it, the only ICMP unreachable sent would be
towards the client. Even if the link MTU causing the unreachable is on
the server side of the LBS, there will be only one unreachable sent - no
ambiguity at all.)

(Also note that it isn't necessary for an LBS to forward all ICMPs to make
PMTUD work. It just has to forward the unreachable, fragmentation needed
packets. And it doesn't have to forward those to all the back-end
servers. There is enough info in the unreachable message to determine
which connection this ICMP message relates to - the 80/tcp connection
between the client and server 6. So the LBS could know that this ICMP
only has to be forwarded to server 6. I don't know of any LBS that is
smart enough to handle it this way.)
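
As a rough sketch of how such a targeted forward could work: the ICMP error quotes the IP header of the offending packet plus at least the first 8 bytes of its payload, which for TCP includes both port numbers. The Python below pulls out that flow and looks it up in a connection table. The function names and the shape of `conn_table` are assumptions for illustration, not any real product's API.

```python
import socket
import struct

IPPROTO_TCP = 6


def original_flow(icmp: bytes):
    """Return (src_ip, src_port, dst_ip, dst_port) of the packet quoted in an ICMP error."""
    inner = icmp[8:]  # skip the 8-byte ICMP header
    if len(inner) < 20:
        return None
    ihl = (inner[0] & 0x0F) * 4  # quoted IP header length in bytes
    if ihl < 20 or inner[9] != IPPROTO_TCP or len(inner) < ihl + 4:
        return None
    src_ip = socket.inet_ntoa(inner[12:16])
    dst_ip = socket.inet_ntoa(inner[16:20])
    # The first four bytes of the TCP header are the source and destination ports.
    src_port, dst_port = struct.unpack("!HH", inner[ihl:ihl + 4])
    return src_ip, src_port, dst_ip, dst_port


def pick_backend(icmp: bytes, conn_table: dict):
    """Map an ICMP error to the one backend that owns the quoted connection."""
    flow = original_flow(icmp)
    if flow is None:
        return None
    # The quoted packet went from the cluster address to the client,
    # so its destination side identifies the client end of the connection.
    _vip, _vport, client_ip, client_port = flow
    return conn_table.get((client_ip, client_port))
```

Here `conn_table` is assumed to map `(client_ip, client_port)` to a backend name, which is state the LBS must already keep in order to send each client's packets to the same server.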

Tony Rall

The Windows Load Balancing Service doesn't use a front-end/back-end approach, since that has obvious scale limitations. Last I knew, each of those addresses actually pointed at ~32 machines, though that may have doubled a couple of times by now. It is supposed to be smart enough to prevent dups, so the fact that you are seeing them indicates either brokenness or, more likely, a cluster in state transition.