Balanced Allocations and Double Hashing

Michael Mitzenmacher

With double hashing, for an item x, one generates two hash values f(x) and g(x),
and then uses combinations (f(x) + ig(x)) mod n for i = 0, 1, 2, … to generate
multiple hash values from the initial two. We show that the performance difference between double hashing and fully random hashing appears negligible in the standard balanced allocation paradigm, where each item is placed in the least loaded of
d choices, as well as several related variants. We perform an empirical study,
and consider multiple theoretical approaches. While several techniques can be used to
show asymptotic results for the maximum load, we demonstrate how fluid limit methods
explain why the behavior of double hashing and fully random hashing are essentially
indistinguishable in this context.

Guest: Michael Mitzenmacher
Host: Stefan Schmid