Balanced Allocations and Double Hashing

Michael Mitzenmacher

Abstract: With double hashing, for an item x one generates two hash values f(x) and g(x), and then uses the combinations (f(x) + ig(x)) mod n for i = 0, 1, 2, … to derive multiple hash values from the initial two. We show that the performance difference between double hashing and fully random hashing appears negligible in the standard balanced allocation paradigm, where each item is placed in the least loaded of d choices, as well as in several related variants. We perform an empirical study and consider multiple theoretical approaches. While several techniques can be used to show asymptotic results for the maximum load, we demonstrate how fluid limit methods explain why the behaviors of double hashing and fully random hashing are essentially indistinguishable in this context.
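The scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the two hash values f(x) and g(x) are simulated here with a PRNG seeded by the item, standing in for two independent hash functions, and the table size is taken prime so any nonzero step g yields d distinct bins.

```python
import random

def double_hash_choices(x, d, n):
    """Derive d bin indices for item x from only two hash values f, g,
    via double hashing: h_i(x) = (f + i*g) mod n.
    f and g are simulated with a PRNG seeded by x (an assumption for
    illustration, standing in for two independent hash functions)."""
    rnd = random.Random(x)
    f = rnd.randrange(n)
    g = rnd.randrange(1, n)  # nonzero; with n prime, all d indices are distinct
    return [(f + i * g) % n for i in range(d)]

def balanced_allocation(items, n, d):
    """Balanced allocation: place each item into the least loaded
    of its d double-hashed bins."""
    load = [0] * n
    for x in items:
        best = min(double_hash_choices(x, d, n), key=lambda b: load[b])
        load[best] += 1
    return load

# n = 1009 is prime, so the d probe positions per item never collide.
load = balanced_allocation(range(10000), n=1009, d=2)
print(sum(load), max(load))
```

Even in this toy run the two-choice effect is visible: with 10000 items in 1009 bins (average load about 10), the maximum load stays close to the average, matching the abstract's claim that double hashing behaves like fully random hashing in this paradigm.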

Guest: Michael Mitzenmacher
Host: Stefan Schmid

A Polynomial-time Approximation Scheme for Fault-tolerant Distributed Storage

Constantinos Daskalakis, Anindya De, Ilias Diakonikolas, Ankur Moitra, and Rocco Servedio

Abstract: We consider a problem which has received considerable attention in the systems literature because of its applications to routing in delay-tolerant networks and replica placement in distributed storage systems. We provide an additive EPTAS for this problem. Our approach is inspired by, and extends, recent structural results from the complexity-theoretic study of linear threshold functions. Furthermore, in spite of the objective function being non-smooth, we give a unicriterion PTAS, while previous work for such objective functions has typically led to a bicriterion PTAS. We believe our techniques may be applicable to obtain unicriterion PTASs for other non-smooth objective functions.

Guest: Anindya De, Berkeley, USA
Host: Stefan Schmid

Randomized Distributed Decision

Pierre Fraigniaud, Amos Korman, Merav Parter, David Peleg

Abstract: The paper tackles the power of randomization in the context of locality by analyzing the ability to 'boost' the success probability of deciding a distributed language. The main outcome of this analysis is that the distributed computing setting contrasts significantly with the sequential one as far as randomization is concerned. Indeed, we prove that in some cases, the ability to increase the success probability for deciding distributed languages is rather limited. Informally, a (p,q)-decider for a language L is a distributed randomized algorithm which accepts instances in L with probability at least p and rejects instances outside of L with probability at least q. It is known that every hereditary language that can be decided in t rounds by a (p,q)-decider, where p^2 + q > 1, can actually be decided deterministically in O(t) rounds. In one of our results we give evidence supporting the conjecture that the above statement holds for all distributed languages; this is achieved by considering the restricted case of path topologies. We then turn our attention to the range below the aforementioned threshold, namely, the case where p^2 + q ≤ 1. We define B_k(t) to be the set of all languages decidable in at most t rounds by a (p,q)-decider, where p^{1+1/k} + q > 1. It is easy to see that every language is decidable (in zero rounds) by a (p,q)-decider satisfying p + q = 1. Hence, the hierarchy B_k provides a spectrum of complexity classes between determinism and complete randomization. We prove that all these classes are separated: for every integer k ≥ 1, there exists a language L satisfying L ∈ B_{k+1}(0) but L ∉ B_k(t) for any t = o(n). In addition, we show that B_∞(t) does not contain all languages, for any t = o(n). Finally, we show that if the inputs can be restricted in certain ways, then the ability to boost the success probability becomes almost null.

Guest: Merav Parter
Host: Yvonne-Anne Pignolet