How many bits of entropy do we need for (ECMP) load balancing in the core?
This question has kept coming up regularly in many discussions and drafts at the IETF.
The IPv6 flow label is 20 bits but hardware implementations do their balancing only on a subset of that, e.g. 12 or 16 bits.
There are drafts for MPLS, BIER etc… that provide their own entropy bit fields of various sizes.
I traced to a 6MAN discussion at IETF 78 a claim that 10 or 11 bits were enough.
Did someone do the actual exercise? It would be neat to align the IETF specs in the making to whatever truth may be established in the core.
So I’d argue that the pedantic answer is “you need only as many bits of entropy as your largest fan out” — meaning that 10 bits would allow 1024-way ECMP. But I don’t think that’s what you were actually after…
Most of the challenges I’ve seen are not around how many bits you end up with, but rather how you get to those bits. There are lots of different ways to compute the hash values, but if you want to be “fast” you’re unlikely to also get “good” and “cheap”… generally to select a path, we run a hash function against some set of packet fields, then map that hash to one of the member links. A “perfect” balancing algorithm would be crypto grade hash generation with a large output, and a true modulo operation to select which member we use. The reality is that both crypto hash functions and modulo operations are more expensive than lots of other ways to compute it, so vendors (disclaimer, I work for Cisco) have lots and lots of combinations for how it’s actually done.
And then you still have the flow issue: since the vast majority of implementation are hashing flows regardless of their actual bandwidth, if you hash even a few ‘elephants’ onto the same link, you’re not going to get good distribution no matter how good your hashing/selection mechanism is. With respect to your comment about standardization, I doubt you’ll ever be able to get a broad consensus on the combination of “how many bits we need given the others constraints for a spec” and “how much we want to assume about the goodness of the hash generator” and “how much I’m willing to just throw bits at the problem” …
Why? I don't think it's fundamentally true. Even if we imagine your
instruction set reading only 12 or 16, you can always read too much
and shift. And I certainly don't think it's a meaningful contributor
in HW design to read 20 bits or any other random bit size of entropy.
Your question itself is interesting and I don't have a good answer. I
guess the question should be 'to cover all practical networks', which
means, we need to allow for potentially hundreds of interfaces, maybe
millions of flows and potentially we need to allow highly biased
hash_result => interface mapping, to deal with elephant flows.
I think it's an answerable question, but not necessarily easy to answer.
There is also the question what SHOULD be used as a flow key, and
should the sender be able to affect that decision?
There are 3 kind of hashing algorithm
I'm sure there are a lot more. Like 'cryptographic' purposes are
ambiguous. Proving that content hasn't been changed requires hash to
be fast and HW friendly, using hash to protect password requires hash
to be slow and HW unfriendly (i.e. SHAs are, by design, not good PW
hash, but confusingly we use them as such, since we think hashes as a
metric of 'good' and 'bad'.).
If you control the domain you can probably choose a non-compromise
solution, which is good specifically for that application but may not
be good for many others.
The first one is used to check the sanity of input, against bit-swapping
error for instance
See CRC for instance
Those algorithm are deadly fast, but also dumb as hell
Yeah CRC is quite good for the application it has, verify that the
1500B ethernet frame has not been changed, I believe it detects all
single and double bit flips
CRC is a canonical choice for router/switches to hash traffic, even
though CRC has really poor diffusion quality, which is the main metric
you're interested in hash for this application, I suspect CRC is a
canonical choice not because it's a good choice but because it's
already there. I know JNPR Trio for example runs two CRC functions
which makes the diffusion much much better, but no where near as good
as choosing a good algo for domain to begin with.
Of course no matter how great hash algo you have, we also have a
problem of elephant flows, which you cannot fix by improving the hash,
as you cannot reasonably expect to feed that info to the algo. And if
you do solve this problem, of course you solve it by biasing the
hash_result => egress_int table, and if you have the mechanism to bias
the mapping to ensure fairness of traffic spread then the quality of
the algo becomes less important.
Sorry to have sent unedited mail.
So I’d argue that the pedantic answer is “you need only as many bits
of entropy as your largest fan out” — meaning that 10 bits would
allow 1024-way ECMP. But I don’t think that’s what you were actually
But, that is the proper answer for backbones where fair load
balancing is really required with many flows from many sources
As the required number of bits for the entropy is, as you pointed
out, very small, at the backbones, even entropy by source and
destination addresses should be, in practice, enough.
There are 3 kind of hashing algorithm
Four if you count the trails followed by runners drinking beer.
See on-on for instance.