1 Introduction

1.1 The process and its motivations

We study the following repeated balls-into-bins process. Given any \(n \geqslant 2\), we initially assign n balls to n bins in an arbitrary way. Then, at every round, from each non-empty bin one ball is chosen according to some strategy (random, FIFO, etc) and re-assigned to one of the n bins uniformly at random. Every ball thus performs a sort of delayed random walk over the bins and the delays of such random walks depend on the size of the bin queues encountered during their paths. It thus follows that these random walks are correlated. We study the impact of such correlation on the maximum load.

Inspired by previous notions of (load) stability [1, 2], we study the maximum load \(M^{(t)}\), i.e., the maximum number of balls inside one bin at round t, and we are interested in the largest \(M^{(t)}\) achieved by the process over a period of any polynomial length. We say that a configuration is legitimate if its maximum load is \(\mathcal {O}(\log n)\) and a process is stable if, starting from any legitimate configuration, it only takes on legitimate configurations over a period of \({\mathrm {poly}}(n)\) length, w.h.p. (Footnote 1). We also investigate a probabilistic version of self-stabilization [3, 4]: we say that a process is self-stabilizing if it is stable and if, moreover, starting from any configuration, it converges to a legitimate configuration, w.h.p. The convergence time of a self-stabilizing process is the maximum number of rounds required to reach a legitimate configuration starting from any configuration. This natural notion of (probabilistic) self-stabilization has also been inspired by that in [5] for other distributed processes.

The stability property, i.e. visiting only legitimate configurations for a relatively-long period, has consequences for other important aspects of this process. For instance, if the process is stable, every ball can be delayed for at most \(\mathcal {O}(\log n)\) rounds before leaving a node. Hence, we can get good bounds on the progress of a ball, namely the number of rounds the ball is selected from its current bin queue, along a relatively-long period. Furthermore, we can eventually bound the parallel cover time, i.e., the time required for every ball to visit all bins.

The process we study models a natural randomized solution to the fundamental task known as traversal [6, 7], which itself abstracts a number of crucial primitives in distributed computing, such as (parallel) resource (or task) assignment. In the basic case, the goal is to assign one resource in mutual exclusion to all processors (i.e. nodes) of a distributed system. This is typically described as a traversal process performed by a token (representing the resource or task) over the network. The process terminates when the token has visited all nodes of the system. Randomized protocols for this problem [8] are efficient approaches when, for instance, the network is prone to faults/changes and/or when there is no global labeling of the nodes, e.g., the network is anonymous.

A simple randomized protocol is the one based on random walks [5, 8, 9]: starting from any node, the token performs a random walk over the network until all nodes are visited, w.h.p. The first round in which all nodes have been visited by the token is called the cover time of the random walk [8, 10]. The expected cover time for general graphs is \(\mathcal {O}(|V| \cdot |E|)\) (see, e.g., [11]). In particular, random walks are at the basis of the first uniform self-stabilizing mutual exclusion protocol achieving graph traversal in general graphs [5].

In distributed systems, we are often in the presence of several resources or tasks that must be processed by every node in parallel. This naturally leads us to consider the multi-token traversal problem. Here, n different tokens (resources) are initially distributed over the set of nodes and every token must visit all nodes of the network.

Similarly to the basic case, we propose a randomized solution based on (parallel) random walks. In order to visit nodes, every token performs a random walk under the constraint that every node can process and release at most one token per round. Note that this constraint causes random walks to become correlated. In particular, it is easy to see that, when the graph is complete, the above protocol—based on parallel random walks—is in fact equivalent to the repeated balls-into-bins process analyzed in this paper. In such a scenario, maximum load becomes a critical measure: not only does it determine required buffer size at every node, it also affects the progress of the tokens and thus the parallel cover time.

1.2 Our results

We provide a new, almost-tight analysis of the repeated balls-into-bins process that significantly departs from previous ones and show that the system is self-stabilizing. In Theorem 1, we show that, for any arbitrarily-large constant c, if the process starts from a legitimate configuration, then the maximum load \(M^{(t)}\) is \(\mathcal {O}(\log n)\) for all \(t = \mathcal {O}(n^c)\), w.h.p. Moreover, starting from any configuration, the system reaches a legitimate configuration within \(\mathcal {O}(n)\) rounds, w.h.p.

The above result strongly improves over the best previous bounds [12,13,14] and it is almost tight, since the classical lower bound \(\varOmega (\log n /\log \log n)\) on the maximum load (see, e.g., [11]) clearly applies also in our repeated setting. Our result further implies that, under the FIFO queueing policy, any ball performs \(\varOmega (t/\log n)\) steps of its individual random walk over any sequence of \(t = {\mathrm {poly}}(n)\) rounds w.h.p., so the parallel cover time is \(\mathcal {O}\left( n \log ^2n\right) \) w.h.p. This is only a \(\log n\) factor away from the lower bound following from the single-ball process.

As observed in Sect. 1.1, when the graph is complete, the repeated balls-into-bins process is equivalent to the protocol for the multi-token traversal task based on parallel random walks. In this setting, our results imply that, starting from any configuration, this task is completed w.h.p. with at most a logarithmic slowdown w.r.t. the case of a single token.

We can also consider the adversarial model in which, in some faulty rounds, an adversary can re-assign the tokens to the nodes in an arbitrary way. Then, the self-stabilization and the linear convergence time shown in Theorem 1 imply that the \(\mathcal {O}\left( n \log ^2n\right) \) bound on the cover time still holds, provided that faulty rounds occur with a frequency no higher than cn, for a sufficiently large constant c. Thus, the random walk-based approach offers a simple solution to multi-token traversal that has minimal overhead, works in anonymous networks and is resilient to (possibly adversarial) faults. Moreover, its performance (in terms of congestion and cover time) is provably close to optimum.

To analyze the original process, we need to consider another repeated balls-into-bins process, which we call Tetris and which is of independent interest (see the overview of our analysis in Sect. 3).

1.3 Related work

The repeated balls-into-bins process was first considered in [13] (see there Lemma 2 in Subsection 3.1), [12] (see there Theorem 4.1 in Sect. 4), and [14] (see there Lemma 6 in Subsection 3.1), since it describes the process of performing parallel random walks in the (uniform) gossip model (also known as random phone-call model [15, 16]) when every message can contain at most one token. Maximum load (i.e., node congestion), token delays, mixing and cover times are here the most crucial aspects. We remark that the flavor of these studies is different from ours: indeed, their main goal is to keep maximum load and token delays logarithmic over some polylogarithmic period. Their aim is to achieve a fast mixing time for every random walk in the case of good expander graphs. In particular, in [13], a logarithmic bound is shown for the complete graph when \(m=\mathcal {O}(n/\log n)\) random walks are performed over a logarithmic time interval, while a similar bound is also given for some families of almost-regular random graphs in [14]. A new analysis is given in [12] for regular graphs and time intervals of arbitrary length, yielding the bound \(\mathcal {O}\left( \sqrt{t}\right) \). Finally, after the conference version of this paper [17], a probabilistic version of the Tetris process, where the number of new balls arriving at each round is a random variable with expectation \(\lambda n\), for some \(\lambda = \lambda (n) \in [0,1]\), has been studied in [18].

Balls-into-bins processes have been extensively studied in the area of parallel and distributed computing, mainly to address balanced-allocation problems [19,20,21], PRAM simulation [22] and hashing [23]. In order to optimize the total number of random bin choices used for the allocation, further allocation strategies have been proposed and analyzed (see, e.g., [24,25,26,27,28]). As previously mentioned, our notion of stability is inspired by those studied in [1, 2] where load balancing algorithms are analyzed in scenarios in which new tasks arrive during the run of the system, and existing jobs are executed by the processors and leave the system. An adversarial model for a sequential balls-into-bins process has been studied in [29]. We remark that, in the above previous works, the goal is different from ours: each ball/task must be allocated to one, arbitrary bin/processor (it is not a token-traversal process).

To the best of our knowledge, the closest model to our setting in classical queuing theory is the closed Jackson network [30]. In this model, time is continuous and each node processes a single token among those in its queue; processing each token takes an exponentially distributed interval of time. As soon as its processing is completed, each token leaves the current node and enters the queue of a neighbor chosen uniformly at random. Notice that, since time is continuous, the process’ events are sequential, so that the associated Markov chain is much simpler than the one describing our parallel process. In particular, the stationary distribution of a closed Jackson network can be expressed as a product-form distribution. It is noted in [31] that “[...] virtually all of the models that have been successfully analyzed in classical queuing network theory are models having a so-called product form stationary distribution”. Because of the above considerations regarding the difficulty of our process (especially the non-reversibility of its Markov chain), the stationary distribution is instead very likely not to exhibit a product form, thus lying outside the domain where the techniques of classical queuing theory seem effective. We finally cite the seminal work [32] on adversarial queuing systems: here, new tokens (having specified source and destination nodes) are inserted in the nodes according to some adversarial strategy and a notion of edge-congestion stability is investigated.

Other closely related lines of work have investigated similar problems in the setting with an infinite number of bins [33, 34]. Finally, [35] provided a general framework to study the stabilization time of asynchronous balls-into-bins processes, and [36] investigated the generalization of the 2-choices balls-into-bins process in which at each round a ball is re-launched by selecting d bins u.a.r. and placing the ball in the one with the smallest load. The latter process has also been applied to cover ball deletions [37].

2 Preliminaries

We recall that the repeated balls-into-bins process is described as follows:

We are given any \(n \geqslant 2\) balls and bins. Initially, balls are assigned to bins in an arbitrary way. Then, in every subsequent round, one ball is chosen from each non-empty bin according to some strategy (Footnote 2) and re-assigned to one of the n bins uniformly at random.
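The process described above can be sketched as a short simulation. This is a minimal illustration, not code from the paper: it tracks only the load vector (which suffices for the maximum load, whatever the selection strategy), and the names `rbb_round` and `simulate` are hypothetical.

```python
import random

def rbb_round(loads, rng):
    """One round: every non-empty bin releases one ball to a bin chosen u.a.r."""
    n = len(loads)
    # Bins that are non-empty at the start of the round each release one ball.
    senders = [u for u in range(n) if loads[u] > 0]
    new_loads = [q - 1 if q > 0 else 0 for q in loads]
    # Each released ball is re-assigned independently and uniformly at random.
    for _ in senders:
        new_loads[rng.randrange(n)] += 1
    return new_loads

def simulate(loads, rounds, seed=0):
    """Run the process; return the final loads and the largest load observed."""
    rng = random.Random(seed)
    worst = max(loads)
    for _ in range(rounds):
        loads = rbb_round(loads, rng)
        worst = max(worst, max(loads))
    return loads, worst

if __name__ == "__main__":
    n = 1000
    start = [n] + [0] * (n - 1)  # all balls start in a single bin
    final, worst = simulate(start, 5 * n)
    print("max load after 5n rounds:", max(final))
```

The total number of balls is preserved in every round, since each removed ball is immediately re-assigned.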

We are interested in the maximum load \(M^{(t)}\), i.e., the maximum number of balls enqueued at the same bin in round t, and in how \(M^{(t)}\) evolves over time. For the purpose of studying the maximum load of the repeated balls-into-bins process, the state of the system is completely characterized by the load of every bin. Formally, for each bin \(u \in [n]\) let \({\mathscr {\mathcal {Q}}}_u^{(t)}\) be the r.v. (Footnote 3) indicating the number of balls, i.e. the load, in u at round t. We write \(\mathbf {Q}^{(t)}\) for the vector of these random variables, i.e., \(\mathbf {Q}^{(t)} = \left( {\mathscr {\mathcal {Q}}}_u^{(t)}: u \in [n]\right) \). We write \(\mathbf {q}= (q_1, \dots , q_n)\) for a (load) configuration, i.e., \(q_u \in \{0, 1, \dots , n\}\) for every \(u \in [n]\) and \(\sum _{u = 1}^n q_u = n\). We define the maximum load of a configuration \(\mathbf {q}= (q_1, \dots , q_n)\) as

$$\begin{aligned} M(\mathbf {q}) = \max \{ q_u : u \in [n] \} , \end{aligned}$$

and, for brevity’s sake, given any round t of the process, we define

$$\begin{aligned} M^{(t)} = M\left( \mathbf {Q}^{(t)}\right) . \end{aligned}$$

According to the above definition, we say that a configuration \(\mathbf {q}\) is legitimate if \(M(\mathbf {q}) \leqslant \beta \cdot \log n\), for some absolute constant \(\beta > 0\).

3 Self-stabilization of repeated balls into bins

In this section, we prove the main result of this paper.

Theorem 1

Let c be an arbitrarily-large constant and let \(\mathbf {q}\) be any legitimate configuration. Let the repeated balls-into-bins process start from \(\mathbf {Q}^{(0)} = \mathbf {q}\). Then, over any period of length \(\mathcal {O}(n^c)\), the process visits only legitimate configurations, w.h.p., i.e., \(M^{(t)} = \mathcal {O}(\log n)\) for all \(t = \mathcal {O}(n^c)\) w.h.p. Moreover, starting from any configuration, the system reaches a legitimate configuration within \(\mathcal {O}(n)\) rounds, w.h.p.

The proof relies on the analysis of the behaviour of some essential random variables describing the repeated balls-into-bins process. In the next paragraph, we informally describe the main steps of this analysis. Then, in Sects. 3.2–3.4, we prove the technical results required by such steps and, finally, in Sect. 3.5 these technical results are combined in order to prove Theorem 1.

3.1 Overview of the analysis

In the repeated balls-into-bins process, every bin can release at most one ball per round. As a consequence, the random walks performed by the balls delay each other and are thus correlated in a way that can make bin queues larger than in the independent case. Indeed, intuitively speaking, a large load observed at a bin in some round makes “any” ball more likely to spend several future rounds in that bin, because if the ball ends up in that bin in one of the next few rounds, it will undergo a large delay. This is essentially the major technical issue to cope with.

The previous approach in [12] relies on the fact that, in every round, the expected balance between the number of incoming and outgoing balls is always non-positive for every non-empty bin (notice that the expected number of incoming balls is always at most one). This may suggest viewing the process as a sort of parallel birth-death process [10]. Using this approach and with some further arguments, one can (only) get the “standard-deviation” bound \(\mathcal {O}(\sqrt{t})\) in [12]. Our new analysis proving Theorem 1 proceeds along three main steps.

(i) We first show that, after the first round, the aforementioned expected balance is always negative, namely, not larger than \(-1/4\). Indeed, the number of empty bins remains at least n / 4 with (very) high probability, which is extremely useful since a bin can only receive tokens from non-empty bins. This fact is shown to hold starting from any configuration and over any period of polynomial length.

(ii) In order to exploit the above negative balance to bound the load of the bins, we need some strong concentration bound on the number of balls entering a specific bin u along any period of polynomial size. However, it is easy to see that, for any fixed u, the random variables \(\left\{ Z^{(t)}_u\right\} _{t \geqslant 0}\) counting the number of balls entering bin u are not mutually independent, neither are they negatively associated, so that we cannot apply standard tools to prove concentration (see “Appendix B” for a counterexample).

To address this issue, we define a simpler repeated balls-into-bins process as follows.

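Tetris process. In every round, one ball is removed from each non-empty bin and discarded, and \((3/4)n\) new balls are thrown into the n bins independently and uniformly at random.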

Using a coupling argument and our previous upper bound on the number of empty bins, we prove that the maximum number of balls accumulating in a bin in the original process is not larger than the maximum number of balls accumulating in a bin in the Tetris process, w.h.p.
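As used in Sects. 3.3 and 3.4, one round of the Tetris process removes one ball from every non-empty bin and then throws \((3/4)n\) new balls independently and u.a.r. A minimal sketch of such a round follows; the function names are hypothetical, and rounding \((3/4)n\) down when n is not a multiple of 4 is an assumption made only for this illustration.

```python
import random

def tetris_round(loads, rng):
    """One Tetris round: non-empty bins lose a ball, then (3/4)n new balls arrive."""
    n = len(loads)
    new_loads = [q - 1 if q > 0 else 0 for q in loads]
    # The number of new balls is fixed; it does not depend on the current state.
    for _ in range((3 * n) // 4):
        new_loads[rng.randrange(n)] += 1
    return new_loads

def tetris_max_load(loads, rounds, seed=0):
    """Largest load observed in rounds 1..rounds of the Tetris process."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(rounds):
        loads = tetris_round(loads, rng)
        worst = max(worst, max(loads))
    return worst
```

Unlike in the original process, the total number of balls is not conserved here: it changes by \((3/4)n\) minus the number of non-empty bins in each round.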

(iii) The Tetris process is simpler than the original one since, at every round, the number of balls assigned to the bins does not depend on the system’s state in the previous round. Hence, random variables \(\left\{ \hat{Z}^{(t)}_u\right\} _{t \geqslant 0}\) counting the number of balls arriving at bin u in the Tetris process are mutually independent. We can thus apply standard concentration bounds. On the other hand, differently from the approximating process considered in [12], the negative balance of incoming and outgoing balls proved in Step (i) still holds, thus yielding a much smaller bound on the maximum load than that in [12].

In the remainder of this section, we formally describe the above three steps, thus proving Theorem 1.

3.2 On the number of empty bins

We next show that the number of empty bins is at least a constant fraction of n over a very large time-window, w.h.p. This fact could be proved by standard concentration arguments if, at every round, all balls were thrown independently and uniformly at random. A little care is instead required in our process to properly handle, at any round, “congested” bins whose load exceeds 1. These bins will be surely non-empty at the next round too. So, the number of empty bins at a given round also depends on the number of congested bins in the previous round.

Lemma 1

Let \(\mathbf {q}= (q_1, \dots , q_n)\) be a configuration in a given round and let X be the random variable indicating the number of empty bins in the next round. For any large enough n, it holds that

$$\begin{aligned} \mathbf {P}_{}{\left( X \leqslant \frac{n}{4} \right) } \leqslant e^{- \alpha n} , \end{aligned}$$

where \(\alpha \) is a suitable positive constant.

Proof

Let \(a = a(\mathbf {q})\) and \(b = b(\mathbf {q})\) respectively denote the number of empty bins and the number of bins with exactly one token in configuration \(\mathbf {q}\). For each bin u of the \(a+b\) bins with at most one token, let \(Y_u\) be the random variable indicating whether or not bin u is empty in the next round, so that

$$\begin{aligned} X = \sum _{u = 1}^{a+b} Y_u \quad \text{ and }\quad \mathbf {P}_{}{\left( Y_u = 1 \right) } = \left( 1 - \frac{1}{n} \right) ^{n - a} \geqslant e^{-\frac{n-a}{n-1}} , \end{aligned}$$

where in the last inequality we used the fact that \(1 - x \geqslant e^{- \frac{x}{1-x}}\). Hence we have that

$$\begin{aligned} \mathbf {E}_{} \left[ X \right] \geqslant (a+b) e^{-\frac{n-a}{n-1}} . \end{aligned}$$
(1)

The crucial fact is that the number of bins with two or more tokens cannot exceed the number of empty bins, i.e. \(n - (a+b) \leqslant a\). Thus, we can bound the number of empty bins from below (Footnote 4), \(a \geqslant (n-b)/2\), and by using that bound in (1) we get

$$\begin{aligned} \mathbf {E}_{} \left[ X \right] \geqslant \frac{n+b}{2} e^{ -\frac{n+b}{2(n-1)} } . \end{aligned}$$

Now observe that, for large enough n, there exists a positive constant \(\varepsilon \) such that

$$\begin{aligned} \frac{n+b}{2} e^{ -\frac{n+b}{2(n-1)} } \geqslant (1+\varepsilon )\frac{n}{4} ,\quad \text{ for } \text{ every } 0 \leqslant b \leqslant n . \end{aligned}$$

It is easy to show that random variables \(Y_1, \dots , Y_{a+b}\) are negatively associated (e.g., see Theorem 13 in [38]). Thus we can apply (see Lemma 7 in [38]) the Chernoff bound (6) with \(\delta = \varepsilon / (1+\varepsilon )\) to X to obtain

$$\begin{aligned} \mathbf {P}_{}{\left( X \leqslant \frac{n}{4} \right) } \leqslant \exp \left( - \frac{\varepsilon ^2}{4 (1+\varepsilon )} n \right) . \end{aligned}$$

\(\square \)
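The existence of the constant \(\varepsilon \) used in the proof of Lemma 1 can be checked numerically. The sketch below (an illustration only; the function names are hypothetical) evaluates the lower bound \(\frac{n+b}{2} e^{-\frac{n+b}{2(n-1)}}\) divided by n / 4 and minimizes it over all \(0 \leqslant b \leqslant n\); a value above 1 means that a positive \(\varepsilon \) exists for that n.

```python
import math

def lower_bound_ratio(n, b):
    """((n+b)/2) * exp(-(n+b)/(2(n-1))) divided by n/4."""
    x = (n + b) / 2
    return x * math.exp(-x / (n - 1)) / (n / 4)

def worst_ratio(n):
    """Minimum of the ratio over all admissible b in {0, ..., n}."""
    return min(lower_bound_ratio(n, b) for b in range(n + 1))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, worst_ratio(n))
```

For large n the minimum is attained at b = 0, where the ratio approaches \(2 e^{-1/2} \approx 1.21\), which is consistent with the requirement "for large enough n" in the lemma.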

From the above lemma it easily follows that, if we look at our process over a time-window \(T = T(n)\) of polynomial size, after the first round we always see at least n / 4 empty bins, w.h.p. More formally, for every \(t \in \{1, \dots , T\}\), let \(\mathcal {E}_t\) be the event “The number of empty bins at round t is at least n / 4”. From Lemma 1 and the union bound we get the following lemma.

Lemma 2

Let \(\mathbf {q}_0\) denote the initial configuration, let \(T = T(n) = n^c\) for an arbitrarily large constant c. For any large enough n it holds that

$$\begin{aligned} \mathbf {P}_{}{\left( \bigcap _{t = 1}^T \mathcal {E}_t | \mathbf {Q}^{(0)} = \mathbf {q}_0 \right) } \geqslant 1 - e^{-\gamma n} , \end{aligned}$$

where \(\gamma \) is a suitable positive constant.

Proof

By using the union bound we obtain

$$\begin{aligned} \mathbf {P}_{}{\left( \bigcap _{t = 1}^T \mathcal {E}_t | \mathbf {Q}^{(0)} = \mathbf {q}_0 \right) }&= 1 - \mathbf {P}_{}{\left( \bigcup _{t = 1}^T \overline{\mathcal {E}_t} | \mathbf {Q}^{(0)} = \mathbf {q}_0 \right) } \\&\geqslant 1 - \sum _{t = 1}^T \mathbf {P}_{}{\left( \overline{\mathcal {E}_t} | \mathbf {Q}^{(0)} = \mathbf {q}_0 \right) } . \end{aligned}$$

By conditioning on the configuration at round \(t-1\), from the Markov property and Lemma 1 it then follows that

$$\begin{aligned}&\mathbf {P}_{}{\left( \overline{\mathcal {E}_t} | \mathbf {Q}^{(0)} = \mathbf {q}_0 \right) } \\&\quad = \sum _{\mathbf {q}} \mathbf {P}_{}{\left( \overline{\mathcal {E}_t} | \mathbf {Q}^{(t-1)} = \mathbf {q} \right) } \mathbf {P}_{}{\left( \mathbf {Q}^{(t-1)} = \mathbf {q}| \mathbf {Q}^{(0)} = \mathbf {q}_0 \right) } \\&\quad \leqslant e^{-\alpha n} . \end{aligned}$$

Hence,

$$\begin{aligned} \mathbf {P}_{}{\left( \bigcap _{t = 1}^T \mathcal {E}_t | \mathbf {Q}^{(0)} = \mathbf {q}_0 \right) } \geqslant 1 - T e^{-\alpha n} \geqslant 1 - e^{-\gamma n} , \end{aligned}$$

for a suitable positive constant \(\gamma \). \(\square \)
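One way to make the last step explicit is the following sketch, in which the specific choice \(\gamma = \alpha /2\) is an illustrative assumption and not taken from the proof. Since \(T = n^c\),

$$\begin{aligned} T e^{-\alpha n} = e^{c \ln n - \alpha n} \leqslant e^{-(\alpha /2) n} \quad \text{ whenever } \quad c \ln n \leqslant \frac{\alpha }{2} n , \end{aligned}$$

and the condition on the right holds for every large enough n because \(\ln n / n \rightarrow 0\).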

3.3 Coupling with Tetris

Using a coupling argument and Lemma 2 we now prove that the maximum load in the original process is stochastically not larger than the maximum load in the Tetris process w.h.p.

In what follows we denote by \(W^{(t)}\) the set of non-empty bins at round t in the original process. Recall that, in the latter, at every round a ball is selected from every non-empty bin u and it is moved to a bin chosen u.a.r. Accordingly we define, for every round t, the random variables

$$\begin{aligned} \left\{ X_u^{(t+1)} : u \in W^{(t)} \right\} , \end{aligned}$$
(2)

where \(X_u^{(t+1)}\) indicates the new position reached in round \(t+1\) by the ball selected in round t from bin u. Notice that for every non-empty bin \(u \in W^{(t)}\) we have that \(\mathbf {P}_{}{\left( X_u^{(t+1)} = v \right) } = 1/n\) for every bin \(v \in [n]\). The random process \(\left\{ \mathbf {Q}^{(t)} : t \in \mathbb {N} \right\} \) is completely defined by the random variables \(X_u^{(t)}\); indeed, we can write

$$\begin{aligned} \mathcal {Q}_v^{(t+1)} = \max \left\{ \mathcal {Q}_v^{(t)}- 1,0\right\} + \left| \left\{ u \in W^{(t)} : X_u^{(t+1)} = v \right\} \right| \end{aligned}$$

and

$$\begin{aligned} W^{(t+1)} = \left\{ u \in [n] : \mathcal {Q}_u^{(t+1)} \geqslant 1 \right\} . \end{aligned}$$

Analogously, for each bin \(u \in [n]\) in the Tetris process, let \(\hat{\mathcal {Q}}_u^{(t)}\) be the random variable indicating the number of balls in bin u in round t. We next prove that, over any polynomially-large time window, the maximum load of any bin in our process is stochastically smaller than the maximum number of balls in a bin of the Tetris process w.h.p. More formally, we prove the following lemma.

Lemma 3

Let us start both the original process and the Tetris process from the same configuration \(\mathbf {q}= (q_1, \dots , q_n)\) such that \(\sum _{u = 1}^n q_u = n\) and containing at least n / 4 empty bins. Let \(T = T(n)\) be an arbitrary round and let \(M_T\) and \(\hat{M}_T\) be respectively the random variables indicating the maximum loads in our original process and in the Tetris process, up to round T. Formally

$$\begin{aligned} M_T = \max \left\{ \mathcal {Q}_u^{(t)} : u \in [n], t = 1, 2, \dots , T \right\} \end{aligned}$$

and

$$\begin{aligned} \hat{M}_T = \max \left\{ \hat{\mathcal {Q}}_u^{(t)} : u \in [n], t = 1, 2, \dots , T \right\} . \end{aligned}$$

Then, for every \(k \geqslant 0\) it holds that

$$\begin{aligned} \mathbf {P}_{}{\left( M_T \geqslant k \right) } \leqslant \mathbf {P}_{}{\left( \hat{M}_T \geqslant k \right) } + T \cdot e^{-\gamma n} , \end{aligned}$$

for a suitable positive constant \(\gamma \).

Proof

We proceed by coupling the Tetris process with the original one round by round. Intuitively speaking the coupling proceeds as follows:

  • Case (i): the number of non-empty bins in the original process is \(h \leqslant \frac{3}{4} n\). For each non-empty bin u, let \(i_u\) be the ball picked from u. We throw one of the \(\frac{3}{4} n\) new balls of the Tetris process in the same bin in which \(i_u\) ends up. Then, we throw all the remaining \(\frac{3}{4} n - h\) balls independently u.a.r.

  • Case (ii): the number of non-empty bins is \(h >\frac{3}{4} n\). We run one round of the Tetris process independently from the original one.

By construction, if the number of non-empty bins in the original process is not larger than \(\frac{3}{4} n\) at any round, case (ii) never applies and the Tetris process “dominates” the original one, meaning that every bin in the Tetris process contains at least as many balls as the corresponding bin in the original one. Since from Lemma 2 we know that the number of non-empty bins in the original process is not larger than \(\frac{3}{4} n\) for any time-window of polynomial size w.h.p., we thus have that the Tetris process dominates the original process for the whole time window w.h.p.

Formally, for \(t \in \{1, \dots , T\}\), denote by \(B^{(t)}\) the set of new balls in the Tetris process at round t (recall that the size of \(B^{(t)}\) is (3 / 4)n for every \(t \in \{1, \dots , T\}\)). For any round t and any ball \(i \in B^{(t)}\), let \(\hat{X}_i^{(t)}\) be the random variable indicating the bin where the ball ends up. Finally, let \(\left\{ U_i^{(t)} : t = 1, \dots , T, i \in B^{(t)} \right\} \) be a family of i.i.d. random variables uniform over [n].

At any round \(t \in \{ 1, \dots , T\}\), the following two cases may arise.

\(\text { If }|W^{(t-1)}| \leqslant (3/4)n\): Let \(B^{(t)}_W\) be an arbitrary subset of \(B^{(t)}\) with size exactly \(|W^{(t-1)}|\), let \(f^{(t)} : B^{(t)}_W \rightarrow W^{(t-1)}\) be an arbitrary bijection and set

$$\begin{aligned} \hat{X}_i^{(t)} =\left\{ \begin{array}{ll} X_{f^{(t)}(i)}^{(t)} &{} \quad \text{ if } i \in B^{(t)}_W \\ U_i^{(t)} &{} \quad \text{ if } i \in B^{(t)} {\setminus } B^{(t)}_W \end{array}\right. \end{aligned}$$
(3)

\(\text { If }|W^{(t-1)}| > (3/4)n\): Set \(\hat{X}_i^{(t)} = U_i^{(t)}\) for all \(i \in B^{(t)}\).

By construction we have that random variables

$$\begin{aligned} \left\{ \hat{X}_i^{(t)} : t \in \{ 1,2, \dots , T\}, i \in B^{(t)} \right\} \end{aligned}$$

are mutually independent and uniformly distributed over [n]. Moreover, in the joint probability space for any k we have that

$$\begin{aligned} \mathbf {P}_{}{\left( M_T \geqslant k \right) }&= \mathbf {P}_{}{\left( M_T \geqslant k, \hat{M}_T \geqslant M_T \right) } \\&\quad + \mathbf {P}_{}{\left( M_T \geqslant k, \hat{M}_T < M_T \right) } \\&\leqslant \mathbf {P}_{}{\left( \hat{M}_T \geqslant k \right) } + \mathbf {P}_{}{\left( \hat{M}_T < M_T \right) } . \end{aligned}$$

Finally, let \(\mathcal {E}_T\) be the event “There are at least n / 4 empty bins at all rounds \(t \in \{ 1, \dots , T\}\)” and observe that, from the coupling we have defined, the event \(\mathcal {E}_T\) implies event “\(\hat{M}_T \geqslant M_T\)”. Hence \(\mathbf {P}_{}{\left( \hat{M}_T < M_T \right) } \leqslant \mathbf {P}_{}{\left( \overline{\mathcal {E}_T} \right) }\) and the thesis follows from Lemma 2. \(\square \)
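The round-by-round coupling in the proof of Lemma 3 can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, \((3/4)n\) is rounded down, and the arbitrary bijection between Tetris balls and non-empty bins is realized simply by list order.

```python
import random

def coupled_round(q, q_hat, rng):
    """One coupled round; q holds the original loads, q_hat the Tetris loads."""
    n = len(q)
    new_balls = (3 * n) // 4
    senders = [u for u in range(n) if q[u] > 0]
    # Destinations of the balls released by the original process.
    dests = [rng.randrange(n) for _ in senders]
    case_i = len(senders) <= new_balls
    if case_i:
        # Case (i): the first |W| Tetris balls copy the original destinations,
        # the remaining ones are thrown independently u.a.r.
        extra = new_balls - len(senders)
        tetris_dests = dests + [rng.randrange(n) for _ in range(extra)]
    else:
        # Case (ii): the Tetris round is run independently.
        tetris_dests = [rng.randrange(n) for _ in range(new_balls)]
    q_next = [x - 1 if x > 0 else 0 for x in q]
    q_hat_next = [x - 1 if x > 0 else 0 for x in q_hat]
    for v in dests:
        q_next[v] += 1
    for v in tetris_dests:
        q_hat_next[v] += 1
    return q_next, q_hat_next, case_i

def run_coupling(start, rounds, seed=0):
    """Return (dominated, all_case_i); domination is checked while case (i) holds."""
    rng = random.Random(seed)
    q, q_hat = list(start), list(start)
    all_case_i, dominated = True, True
    for _ in range(rounds):
        q, q_hat, case_i = coupled_round(q, q_hat, rng)
        all_case_i = all_case_i and case_i
        if all_case_i and any(a > b for a, b in zip(q, q_hat)):
            dominated = False
    return dominated, all_case_i
```

As long as case (ii) has never occurred, every Tetris bin holds at least as many balls as the corresponding original bin, so `dominated` stays true; this is the deterministic part of the argument, while Lemma 2 bounds the probability that case (ii) ever occurs.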

3.4 Analysis of the Tetris process

We begin by observing that in the Tetris process, the random variables indicating the number of balls ending up in a bin in different rounds are i.i.d. binomial. This fact is extremely useful to give upper bounds on the load of the bins, as we do in the next simple lemma, that will be used to prove self-stabilization of the original process.

Lemma 4

From any initial configuration, in the Tetris process every bin will be empty at least once within 5n rounds, w.h.p.

Proof

Let \(u \in [n]\) be a bin with \(k \leqslant n\) balls in the initial configuration. For \(t \in \{ 1, \dots , 5n\}\) let \(Y_t\) be the random variable indicating the number of new balls ending up in bin u at round t. Notice that in the Tetris process \(Y_1, \dots , Y_{5n}\) are i.i.d. \(B\left( (3/4)n, 1/n \right) \) hence

$$\begin{aligned} \mathbf {E}_{} \left[ Y_1 + \cdots + Y_{5n} \right] = \frac{15}{4} n \end{aligned}$$

and by applying Chernoff bound (7) with \(\delta = 1/15\) we get

$$\begin{aligned} \mathbf {P}_{}{\left( Y_1 + \cdots + Y_{5n} \geqslant 4n \right) } \leqslant e^{- \alpha n} , \text{ where } \alpha = 1/180 . \end{aligned}$$

Now let \(\mathcal {E}_u\) be the event “Bin u will be non-empty for all the 5n rounds”. Since when a bin is non-empty it loses a ball at every round, event \(\mathcal {E}_u\) implies, in particular, that

$$\begin{aligned} k - 5n + Y_1 + \cdots + Y_{5n} \geqslant 0 . \end{aligned}$$

That is

$$\begin{aligned} Y_1 + \cdots + Y_{5n} \geqslant 5n - k \geqslant 4n . \end{aligned}$$

Thus

$$\begin{aligned} \mathbf {P}_{}{\left( \mathcal {E}_u \right) } \leqslant \mathbf {P}_{}{\left( Y_1 + \cdots + Y_{5n} \geqslant 4n \right) } \leqslant e^{-\alpha n} . \end{aligned}$$

The thesis then follows from the union bound over all bins \(u \in [n]\). \(\square \)
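The constant \(\alpha = 1/180\) can be checked by direct arithmetic. The computation below is a sketch that assumes Chernoff bound (7), which is stated outside this section, has the standard multiplicative form \(\mathbf {P}_{}{\left( X \geqslant (1+\delta )\mu \right) } \leqslant e^{-\delta ^2 \mu /3}\) for \(0 < \delta \leqslant 1\). With \(\mu = (15/4)n\) and \(\delta = 1/15\),

$$\begin{aligned} (1+\delta )\mu&= \frac{16}{15} \cdot \frac{15}{4} n = 4n , \\ \frac{\delta ^2 \mu }{3}&= \frac{1}{225} \cdot \frac{15}{4} n \cdot \frac{1}{3} = \frac{n}{180} . \end{aligned}$$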

We next focus on the maximum load that can be observed in the Tetris process at any given bin within a finite interval of time. We note that this result could be proved using tools from drift analysis (e.g., see [39]). We provide here an elementary and direct proof, that explicitly relies on the Markovian structure of the Tetris process.

Let \(\{X_t\}_t\) be a sequence of i.i.d. \(B\left( (3/4)n,1/n \right) \) random variables and let \(Z_t\) be the Markov chain with state space \(\{0,1,2, \dots \}\) defined as follows

$$\begin{aligned} Z_{t} =\left\{ \begin{array}{ll} 0 &{} \quad \text{ if } Z_{t-1} = 0 \\ Z_{t-1} - 1 + X_t &{}\quad \text{ if } Z_{t-1} \geqslant 1 \end{array}\right. \end{aligned}$$
(4)

Observe that 0 is an absorbing state for \(Z_t\) and let \(\tau \) be the absorption time \(\tau = \inf \{t \in \mathbb {N} : Z_t = 0\}\). We first prove the following lemma.

Lemma 5

For any initial starting state \(k \in \mathbb {N}\) and any \(t \geqslant 8 k\), it holds that

$$\begin{aligned} \mathbf {P}_{k}{\left( \tau > t \right) } \leqslant e^{- t / 144} . \end{aligned}$$

Proof

Observe that

$$\begin{aligned} \mathbf {P}_{k}{\left( \tau > t \right) }&= \mathbf {P}_{k}{\left( Z_t > 0 \right) } \\&\leqslant \mathbf {P}_{}{\left( k + \sum _{i = 1}^t X_i - t > 0 \right) } \\&= \mathbf {P}_{}{\left( \sum _{i = 1}^{t} X_i > t - k \right) } \\&\leqslant \mathbf {P}_{}{\left( \sum _{i = 1}^{t} X_i > \frac{7}{8} t \right) } , \end{aligned}$$

where the first inequality holds because, if the chain has not been absorbed by time t, then \(Z_t = k + \sum _{i = 1}^t X_i - t\), and in the last inequality we used the hypothesis \(k \leqslant (1/8)t\). Since the \(X_i\)s are i.i.d. binomial B((3 / 4)n, 1 / n), it follows that \(\sum _{i = 1}^{t} X_i\) is binomial B((3 / 4)nt, 1 / n) and from Chernoff bound we get that

$$\begin{aligned} \mathbf {P}_{}{\left( \sum _{i = 1}^t X_i> \frac{7}{8} t \right) }&= \mathbf {P}_{}{\left( \sum _{i = 1}^t X_i > \left( 1 + \frac{1}{6}\right) \frac{3}{4}t \right) } \\&\leqslant e^{- \frac{(1/6)^2}{3} \frac{3}{4}t} \\&= e^{- t/144} . \end{aligned}$$

\(\square \)
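The constants in the proof above can be checked with exact rational arithmetic. The sketch below assumes the standard multiplicative form of the Chernoff bound, \(\mathbf {P}(S > (1+\delta )\mu ) \leqslant e^{-\delta ^2 \mu / 3}\) for \(0 < \delta \leqslant 1\), with \(\mu = (3/4)t\) and \(\delta = 1/6\); the helper name `chernoff_parameters` is ours.

```python
from fractions import Fraction


def chernoff_parameters(delta, mean_per_round):
    """Return (threshold, exponent), both per unit of t, for the bound
    P(S > (1 + delta) * mu) <= exp(-delta**2 * mu / 3),
    where mu = mean_per_round * t."""
    threshold = (1 + delta) * mean_per_round
    exponent = delta ** 2 / 3 * mean_per_round
    return threshold, exponent


threshold, exponent = chernoff_parameters(Fraction(1, 6), Fraction(3, 4))
print(threshold, exponent)  # 7/8 1/144
```

The threshold 7/8 matches the deviation \((7/8)t\) used in the proof, and the exponent 1/144 matches the bound \(e^{-t/144}\) in Lemma 5.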

Now we can easily prove the following statement on the Tetris process.

Lemma 6

Let c be an arbitrarily-large constant, and let the Tetris process start from any legitimate configuration. The maximum load \(\hat{M}^{(t)}\) is \(\mathcal {O}(\log n)\) for all \(t = \mathcal {O}(n^c)\), w.h.p.

Proof

Consider an arbitrary bin u that is non-empty in the initial legitimate configuration. Let \({\hat{\mathcal {Q}}}^{(0)} = \mathcal {O}(\log n)\) be its initial load (see Footnote 5) and let \(\tau = \inf \left\{ t : \hat{\mathcal {Q}}^{(t)} = 0 \right\} \) be the first round the bin becomes empty. Observe that, for any \(t \leqslant \tau \), \({\hat{\mathcal {Q}}}^{(t)}\) behaves exactly as the Markov chain defined in (4). Hence, from Lemma 5 it follows that for every constant \(\hat{c}\) such that \(\hat{c} \log n \geqslant 8 {\hat{\mathcal {Q}}}^{(0)}\) we have

$$\begin{aligned} \mathbf {P}_{\hat{\mathcal {Q}}^{(0)}}{\left( \tau > \hat{c} \log n \right) } \leqslant n^{-\hat{c}/144} . \end{aligned}$$
(5)

Thus, within \(\mathcal {O}(\log n)\) rounds the bin will be empty w.h.p., and since the load of the bin decreases by at most one unit per round, the load of the bin is \(\mathcal {O}(\log n)\) for all such rounds w.h.p.

Next, define a phase as any sequence of rounds that starts when the bin becomes non-empty and ends when it becomes empty again. Notice that, by using a standard balls-into-bins argument, in the first round of each phase the load of the bin will be \(\mathcal {O}(\log n / \log \log n)\) w.h.p. Moreover, in any phase the load of the bin can be coupled with the Markov chain in (4). Hence, for any arbitrarily large constant c we can choose the constant \(\hat{c}\) in (5) large enough so that, by taking the union bound over all phases up to round \(n^c\), the load of the bin is \(\mathcal {O}(\log n)\) in all rounds \(t \leqslant n^c\) w.h.p.

Finally, observe that for any bin that is initially empty the same argument applies with the only difference that the first phase for the bin does not start at round 0 but at the first round the bin becomes non-empty. The thesis thus follows from a union bound over all the bins. \(\square \)
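The statement of Lemma 6 can be explored empirically with a short simulation. The sketch below assumes, consistently with the chain in (4), that in each round of the Tetris process every non-empty bin discards one ball and then (3/4)n new balls are thrown independently and uniformly at random; the formal definition is given earlier in the paper, and the function names and the assumption that n is a multiple of 4 are ours.

```python
import random


def tetris_round(loads, rng):
    """One round of the (assumed) Tetris process on a list of bin loads."""
    n = len(loads)
    # Every non-empty bin discards exactly one ball.
    new_loads = [q - 1 if q > 0 else 0 for q in loads]
    # Then (3/4)n fresh balls are thrown uniformly at random.
    for _ in range(3 * n // 4):
        new_loads[rng.randrange(n)] += 1
    return new_loads


def max_load_over_time(loads, rounds, rng):
    """Largest load seen in any bin during the given number of rounds."""
    worst = max(loads)
    for _ in range(rounds):
        loads = tetris_round(loads, rng)
        worst = max(worst, max(loads))
    return worst


if __name__ == "__main__":
    rng = random.Random(1)
    n = 1024
    # Start from the legitimate configuration with one ball per bin.
    print("max load observed:", max_load_over_time([1] * n, 10 * n, rng))
```

Such a run only illustrates the behaviour for one value of n and one time horizon; it is not a substitute for the w.h.p. bound proved above.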

3.5 Back to the original process: proof of Theorem 1

From a standard balls-into-bins argument (see, e.g., [11]), starting from any legitimate configuration, after one round the process still lies in a legitimate configuration w.h.p. and, thanks to Lemma 1, there are at least n / 4 empty bins w.h.p. From Lemma 3 with \(T = \mathcal {O}\left( n^c \right) \), we have that the maximum load of the repeated balls-into-bins process does not exceed the maximum load of the Tetris process in all rounds \(1,\dots ,T\), w.h.p. Finally, the upper bound on the maximum load of the Tetris process in Lemma 6 completes the proof of the first statement of Theorem 1.

As for self-stabilization, given an arbitrary initial configuration, Lemma 4 implies that within \(\mathcal {O}(n)\) rounds, all bins have been emptied at least once, w.h.p. When a bin becomes empty, Lemma 5 ensures that its load will be \(\mathcal {O}(\log n)\) over a polynomial number of rounds. Hence, within \(\mathcal {O}(n)\) rounds, the system will reach a legitimate configuration, w.h.p. \(\square \)
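The self-stabilization argument can be illustrated by simulating the repeated balls-into-bins process from a worst-case configuration and recording when every bin has been observed empty at least once. This is a minimal sketch with anonymous balls (only loads are tracked); the function names and the cap `max_rounds` are ours.

```python
import random


def rbb_round(loads, rng):
    """One round of repeated balls-into-bins: each non-empty bin releases
    one ball, and every released ball lands in a uniformly random bin."""
    n = len(loads)
    movers = sum(1 for q in loads if q > 0)
    new_loads = [q - 1 if q > 0 else 0 for q in loads]
    for _ in range(movers):
        new_loads[rng.randrange(n)] += 1
    return new_loads


def rounds_until_all_emptied(loads, rng, max_rounds=10**6):
    """Return (t, loads), where t is the first round by which every bin has
    been observed empty at least once (bins empty at round 0 count).
    If the illustrative cap max_rounds is reached first, t == max_rounds."""
    seen_empty = [q == 0 for q in loads]
    t = 0
    while not all(seen_empty) and t < max_rounds:
        loads = rbb_round(loads, rng)
        t += 1
        for u, q in enumerate(loads):
            if q == 0:
                seen_empty[u] = True
    return t, loads


if __name__ == "__main__":
    rng = random.Random(7)
    n = 256
    start = [n] + [0] * (n - 1)  # all balls in a single bin
    t, final = rounds_until_all_emptied(start, rng)
    print("rounds:", t, "in units of n:", t / n, "max load:", max(final))
```

Starting with all n balls in one bin, that bin needs at least n rounds to empty, which is consistent with a convergence time that is linear in n.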

4 On the multi-token traversal problem

As mentioned in the introduction, the repeated balls-into-bins process can also be seen as running parallel random walks of n distinct tokens (i.e. balls), each of them starting from a node (i.e. a bin) of the complete graph of size n. This is a randomized protocol for the multi-token traversal problem where tokens represent different resources/tasks that must be assigned to all nodes in mutual exclusion [8]. In this scenario, a critical complexity measure is the (global) cover time, i.e., the time required for every token to visit all nodes.

It is important to observe that our analysis of the maximum load works for anonymous tokens and nodes and, hence, for any particular queuing strategy. Under the FIFO strategy, no token spends more rounds in a bin than the load of that bin at the time the token entered it. Theorem 1 then implies that, after an initial stabilizing phase of \(\mathcal {O}(n)\) rounds, every token will spend at most a logarithmic number of rounds in any bin queue it traverses, over any period of polynomial length, w.h.p. We also know that the cover time of the single random-walk process is w.h.p. \(\mathcal {O}(n \log n)\) (see, e.g., [11]). Combining the above two facts, we easily get the following, almost tight result on the multi-token traversal problem.

Corollary 1

The random-walk protocol for the multi-token traversal problem on the clique has cover time \(\mathcal {O}\left( n \log ^2n\right) \), w.h.p.
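The random-walk protocol under the FIFO strategy can be sketched as follows. The simulation keeps one FIFO queue per node, moves the head token of every non-empty queue to a uniformly random node in each round, and returns the first round at which every token has visited all nodes. The function name `cover_time`, the one-token-per-node starting configuration, the order in which simultaneous arrivals are appended, and the cap `max_rounds` are illustrative assumptions.

```python
import random
from collections import deque


def cover_time(n, rng, max_rounds=10**6):
    """First round at which every token has visited every node
    (or max_rounds if the illustrative cap is reached first)."""
    queues = [deque([v]) for v in range(n)]  # token v starts at node v
    visited = [{v} for v in range(n)]        # nodes seen by each token
    remaining = sum(1 for seen in visited if len(seen) < n)
    t = 0
    while remaining > 0 and t < max_rounds:
        # Extract the head of every non-empty queue before any re-insertion,
        # so that a token moves at most once per round.
        movers = [q.popleft() for q in queues if q]
        for token in movers:
            dest = rng.randrange(n)
            queues[dest].append(token)
            if dest not in visited[token]:
                visited[token].add(dest)
                if len(visited[token]) == n:
                    remaining -= 1
        t += 1
    return t


if __name__ == "__main__":
    rng = random.Random(3)
    for n in (64, 128, 256):
        print(n, cover_time(n, rng))
```

Each token moves at most once per round, so the cover time is at least n - 1 rounds for any n; the \(\mathcal {O}\left( n \log ^2n\right) \) bound of Corollary 1 accounts for the \(\mathcal {O}(n \log n)\) moves of a single walk, each delayed by at most \(\mathcal {O}(\log n)\) rounds.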

4.1 Adversarial model

The self-stabilization property shown in Theorem 1 makes the random walk protocol robust to transient faults. We can consider an adversarial model in which, in some faulty rounds, an adversary can reassign the tokens to the nodes in an arbitrary way. Then, the linear convergence time shown in Theorem 1 implies that the \(\mathcal {O}\left( n \log ^2n\right) \) bound on the cover time still holds provided that faulty rounds occur at most once every \(\gamma n\) rounds, for any constant \(\gamma \geqslant 6\). Indeed, thanks to Lemma 4, the action of an adversary manipulating the system configuration once every \(\gamma n\) rounds can affect only the successive 5n rounds, while our analysis in the non-adversarial model does hold for the remaining \((\gamma -5) n\) rounds. It follows that the overall slowdown on the cover time produced by such an adversary is at most a constant factor on the previous \(\mathcal {O}\left( n\log ^2 n\right) \) upper bound, w.h.p.
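A rough accounting of the constant factor mentioned above: if only \((\gamma - 5)n\) out of every \(\gamma n\) rounds are covered by the non-adversarial analysis, one natural (and deliberately simplified) reading is a slowdown of at most \(\gamma / (\gamma - 5)\). The sketch below computes this ratio for integer \(\gamma \); the helper names are ours, and the ratio is an illustrative estimate rather than a bound stated in the paper.

```python
from fractions import Fraction


def usable_fraction(gamma, affected=5):
    """Fraction of each window of gamma*n rounds that is not affected by
    the adversary, assuming the `affected`*n rounds after a fault are lost."""
    if gamma <= affected:
        raise ValueError("gamma must be larger than the affected multiple of n")
    return Fraction(gamma - affected, gamma)


def slowdown_factor(gamma, affected=5):
    """Reciprocal of the usable fraction: a simplified slowdown estimate."""
    return 1 / usable_fraction(gamma, affected)


for gamma in (6, 10, 20):
    print(gamma, usable_fraction(gamma), slowdown_factor(gamma))
```

Under this reading the worst case is \(\gamma = 6\), where one sixth of the rounds remain usable and the estimate of the slowdown is a factor 6.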

5 Conclusions and open questions

In this paper, we showed that repeated balls-into-bins is self-stabilizing when the number m of balls equals the number n of bins (obviously, this is still the case whenever \(m < n\)). An interesting open question is whether this result extends to larger values of m, e.g., to any \(m = \mathcal {O}(n \log n)\).

A more general interesting question is the study of this process over more general graph classes. This line of research is also motivated by several recent applications of parallel random walks in the (uniform) gossip model [8, 13, 14, 40]. As mentioned in the introduction, previous analysis of this process provides a bound \(\mathcal {O}\left( \sqrt{t}\right) \) on the maximum load after t rounds on regular graphs [12]. We believe this previous bound for regular graphs is far from tight and it leads to rough bounds on parallel cover times on these networks. We conjecture that the maximum load remains logarithmic for a long period in any regular graph. A possible reason for this phenomenon (if true) might be that the expected difference between (token) arrivals and departures is always non-positive at every node in regular graphs. As highlighted in our analysis of the complete graph, this fact alone is not enough but it could be combined with a suitable bound on the number of empty bins, in order to prove our conjecture in this more general case. Unfortunately, non-complete graphs present a further technical issue: in order to apply any argument based on the presence of empty bins, not only do we need to argue about their number, but also about their distribution across the network. This technical issue seems to be far from trivial even on simple topologies such as rings.

Finally, a technical question concerns the tightness of our bound on the maximum load. In the classical (one-shot) balls-into-bins problem, it is well-known that the maximum load of the bins is \(\varTheta \left( \log n / \log \log n \right) \) w.h.p. One may wonder whether our \(\mathcal {O}\left( \log n \right) \) upper bound on the maximum load of the repeated process for a polynomial number of rounds is tight, or whether it can be improved to \(\mathcal {O}\left( \log n / \log \log n \right) \). We conjecture that, within any polynomial time window, the probability that the maximum load asymptotically exceeds \(\log n / \log \log n\) is non-negligible.