Lower Bounds for Distinct Elements in the Message Passing Model

David Woodruff and Qin Zhang

This interview introduces the results of two papers, one published at SODA 2014, one at DISC 2013.

——– SODA 2014 ————
Title: An Optimal Lower Bound for Distinct Elements in the Message Passing Model

Abstract: In the message-passing model of communication, there are k players each with their own private input, who try to compute or approximate a function of their inputs by sending messages to one another over private channels. We consider the setting in which each player holds a subset S_i of elements of a universe of size n, and their goal is to output a (1+\eps)-approximation to the total number of distinct elements in the union of the sets S_i with constant probability, which can be amplified by independent repetition. This problem has applications in data mining, sensor networks, and network monitoring. We resolve the communication complexity of this problem up to a constant factor, for all settings of n, k and \eps, by showing a lower bound of \Omega(k \cdot \min(n, 1/\eps^2) + k \log n) bits. This improves upon previous results, which either had non-trivial restrictions on the relationships between the values of n, k and \eps, or were suboptimal by logarithmic factors, or both.

———- DISC 2013 —————–
Title: When Distributed Computation is Communication Expensive

Abstract: We consider a number of fundamental statistical and graph problems in the message-passing model, where we have k machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the k data sets. The communication is point-to-point, and the goal is to minimize the total communication among the k machines. This model captures all point-to-point distributed computational models with respect to minimizing communication costs. Our analysis shows that exact computation of many statistical and graph problems in this distributed setting requires a prohibitively large amount of communication, and often one cannot improve upon the communication of the simple protocol in which all machines send their data to a centralized server. Thus, in order to obtain protocols that are communication-efficient, one has to allow approximation, or investigate the distribution or layout of the data sets.
Guest: Qin Zhang
Host: Yvonne-Anne Pignolet

Stone Age Distributed Computing

Yuval Emek and Roger Wattenhofer

Abstract: A new model that depicts a network of randomized finite state machines operating in an asynchronous environment is introduced. This model, that can be viewed as a hybrid of the message passing model and cellular automata is suitable for applying the distributed computing lens to the study of networks of sub-microprocessor devices, e.g., biological cellular networks and man-made nano-networks. Although the computation and communication capabilities of each individual device in the new model are, by design, much weaker than those of an abstract computer, we show that some of the most important and extensively studied distributed computing problems can still be solved efficiently.

Guest: Yuval Emek
Host: Yvonne-Anne Pignolet

Distributed Deterministic Edge Coloring using Bounded Neighborhood Independence

Leonid Barenboim and Michael Elkin

We study the edge-coloring problem in the message-passing model of distributed computing. This is one of the most fundamental problems in this area. Currently, the best-known deterministic algorithms for (2∆− 1)-edge-coloring requires O(∆) + log∗ n time [23], where ∆ is the maximum degree of the input graph. Also, recent results of [5] for vertex-coloring imply that one can get an O(∆)-edge-coloring in O(∆^ϵ ·log n) time, and an O(∆^{1+ϵ})-edge-coloring in O(log ∆ log n) time, for an arbitrarily small constant ϵ > 0. In this paper we devise a significantly faster deterministic edge-coloring algorithm. Specifically, our algorithm computes an O(∆)-edge-coloring in O(∆^ϵ ) + log∗ n time, and an O(∆^{1+ϵ} )-edge-coloring in O(log ∆) + log∗ n time. This result improves the state-of-the-art running time for deterministic edge-coloring with this number of colors in almost the entire range of maximum degree ∆. Moreover, it improves it exponentially in a wide range of ∆, specifically, for 2^Ω(log* n) ≤ ∆ ≤ polylog(n). In addition, for small values of ∆ (up to log^{1−δ} n, for some fixed δ > 0) our deterministic algorithm outperforms all the existing randomized algorithms for this problem.

Guest: Leonid Barenboim
Host: Chen Avin

Performing Dynamically Injected Tasks on Processes Prone to Crashes and Restarts

Chryssis Georgiou and Dariusz Kowalski

Abstract: To identify the tradeoffs between efficiency  and fault-tolerance in dynamic cooperative computing, we initiate the study of a task performing problem under dynamic processes’ crashes/restarts and task injections. The system consists of $n$ message-passing processes which, subject to dynamic crashes and restarts, cooperate in performing independent tasks that are continuously and dynamically injected to the system. The task specifications are not known a priori to the processes. This problem abstracts todays Internet-based computations, such as Grid computing and cloud services, where tasks are generated dynamically and different tasks may be known to different processes. We measure performance in terms of the number of pending tasks, and as such it can be directly compared with the optimum number obtained under the same crash-restart-injection pattern by the best off-line algorithm. We propose several deterministic algorithmic solutions to the considered problem under different information models and correctness criteria, and we argue that their performance is close to the best possible offline solutions.

Guest: Chryssis Georgiou
Host: Yvonne-Anne Pignolet