The supermarket model with bounded queue lengths in equilibrium

In the supermarket model, there are n queues, each with a single server. Customers arrive in a Poisson process with arrival rate λn, where λ = λ(n) ∈ (0, 1). Upon arrival, a customer selects d = d(n) servers uniformly at random, and joins the queue of a least-loaded server amongst those chosen. Service times are independent exponentially distributed random variables with mean 1. In this paper, we analyse the behaviour of the supermarket model in the regime where λ(n) = 1 − n^{−α} and d(n) = ⌊n^β⌋, where α and β are fixed numbers in (0, 1]. For suitable pairs (α, β), our results imply that, in equilibrium, with probability tending to 1 as n → ∞, the proportion of queues with length equal to k = ⌈α/β⌉ is at least 1 − 2n^{−α+(k−1)β}, and there are no longer queues. We further show that the process is rapidly mixing when started in a good state, and give bounds on the speed of mixing for more general initial conditions.


Introduction
The supermarket model is a well-studied Markov chain model for a dynamic load-balancing process. There are n servers, and customers arrive according to a Poisson process with rate λ = λ(n) < 1. On arrival, a customer inspects d = d(n) queues, chosen uniformly at random with replacement, and joins a shortest queue among those inspected (in case of a tie, the first shortest queue in the list is joined). Each server serves one customer at a time, and service times are iid random variables, with an exponential distribution of mean 1.
A number of authors [17,18,23,7,8,13,11,12,9,6,5,21] have studied the supermarket model, as well as various extensions, e.g., to the setting of a Jackson network [15] and to a version with one queue saved in memory [19,14]. There are related ideas in other queueing models, for instance one where one server inspects d queues and serves the longest [1].
Early papers on the supermarket model concentrated on the case where λ and d are held fixed as n tends to infinity. As with other related models (see, e.g., [10,20]), there is a dramatic change when d is increased from 1 to 2: if d = 1, the maximum queue length in equilibrium is of order log n, while if d is a constant at least 2, then the maximum queue length in equilibrium is of order log log n/ log d.

Key words and phrases: supermarket model; Markov chains; rapid mixing; concentration of measure; load balancing.

The research of Malwina Luczak was supported by an EPSRC Leadership Fellowship, grant references EP/J004022/1 and EP/J004022/2.
Luczak and McDiarmid [11] prove that, for fixed λ and d, the sequence of Markov chains indexed by n is rapidly mixing: as n → ∞, the time for the system to converge to equilibrium is of order log n, provided the initial state has not too many customers and no very long queue. Also, they show that, for d ≥ 2, with probability tending to 1 as n → ∞, in the equilibrium distribution the maximum queue length takes one of at most 2 values, and that these values are log log n/ log d + O(1).
More recently, there has been interest in regimes where the parameters of the model may vary as n tends to infinity. Fairthorne [6] and Mukherjee et al [21] treat the case where λ < 1 is fixed and d = d(n) tends to infinity with n. Eschenfeldt and Gamarnik [5] consider the "heavy traffic regime", where λ = λ(n) tends to 1 from below as n → ∞, and d is held fixed.
In this paper, we study a different regime. We focus on the case where λ = λ(n) = 1 − n^{−α} and d = d(n) = n^β, where α and β are fixed constants in (0, 1] with k − 1 < α/β < k for some positive integer k. We also require that 2α < 1 + β(k − 1), for reasons that we shall explain after the statement of Theorem 1.1 (see Remark (4)). Our results imply that, in equilibrium, with high probability (i.e., with probability tending to 1 as n → ∞), the proportion of queues of length exactly equal to k is at least 1 − 2n^{−α+(k−1)β}, and there are no longer queues. Our methods actually cover a much broader range of parameter values, but we focus on this case for ease of exposition.
We offer two reasons why such a regime might be of interest: for one, this is a range of parameter values where near-perfect load balancing is achieved, with bounded maximum queue length, even when the system is running at nearly full capacity, and the values of d we obtain thus represent a sufficient amount of resource (in terms of inspection of queue-lengths) required to achieve this load-balancing. From a more theoretical viewpoint, we see our regimes, for the different values of α/β , as possessing a scaling limit as n → ∞, and varying the parameters so that α/β passes through an integer is an example of a phase transition.
To motivate our results, we first give heuristics to indicate what behaviour we might expect. Consider the infinite system of differential equations

(1.1)  dv_j/dt = λ(v_{j−1}(t)^d − v_j(t)^d) − (v_j(t) − v_{j+1}(t)),  j ≥ 1,

where v_0(t) = 1 for all t. For an initial condition v(0) such that 1 ≥ v_1(0) ≥ v_2(0) ≥ … ≥ 0 and v_j(0) → 0 as j → ∞, there is a unique solution v(t) (t ≥ 0), with v(t) = (v_j(t))_{j≥1}, which is such that 1 ≥ v_1(t) ≥ v_2(t) ≥ … ≥ 0 and v_j(t) → 0 as j → ∞, for each t ≥ 0. It follows from earlier work [23,7,8,13,12] that, with high probability, for each j, the proportion of queues of length at least j at time t stays "close to" v_j(t) over a bounded time interval (or an interval whose length tends to infinity at most polynomially with n), assuming this is the case at time 0. The system (1.1) has a unique, attractive, fixed point π = (π_j)_{j≥1}, such that π_j → 0 as j → ∞, given by

(1.2)  π_j = λ^{(d^j−1)/(d−1)},  j ≥ 1.

If λ and d are fixed constants, then, in equilibrium, with high probability, the proportion of queues of length at least j is close to π_j for each j ≥ 1; see [7,8,11,12]. For λ and d functions of n, there is no single limiting differential equation (1.1), but rather a sequence of approximating differential equations, each with its own solutions and fixed point. In this paper, we do not address the question of whether such approximations to the evolution of the process are valid in generality, focussing solely on equilibrium behaviour and the time to reach equilibrium. If λ = 1 − n^{−α} and d = n^β, and k is an integer with k − 1 < α/β < k, then π_j is close to 1 for j ≤ k, while π_j is much smaller than 1/n for j > k. We will indeed show that, in equilibrium, with high probability, there are no queues of length greater than k, while the proportion of queues with length exactly k tends to 1 as n → ∞. Moreover we show that, for 0 ≤ j < k, the number of queues of length exactly j is very close to n(π_j − π_{j+1}) ≈ n^{1−α+jβ}. We also prove results on mixing time to equilibrium.
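The fixed point has the explicit form π_j = λ^{(d^j−1)/(d−1)}; see [11] and the references therein. As a quick numerical illustration, with hypothetical parameter choices n = 10^6, α = 0.5, β = 0.3 (so that k = ⌈α/β⌉ = 2), π_k is already close to 1 while π_{k+1} is small:

```python
import math

def pi_fixed_point(lam, d, j):
    """Fixed point (1.2) of the drift equations: the predicted
    equilibrium proportion of queues of length at least j."""
    return lam ** ((d ** j - 1) / (d - 1))

# Hypothetical parameter choice, for illustration only:
# n = 10^6, alpha = 0.5, beta = 0.3, so k = ceil(alpha/beta) = 2.
n = 10 ** 6
alpha, beta = 0.5, 0.3
lam, d = 1 - n ** (-alpha), n ** beta
k = math.ceil(alpha / beta)
```

For the full asymptotic picture (π_{k+1} far below 1/n) one needs much larger n; this n merely shows the separation between π_k and π_{k+1} emerging.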
We show that, if we start in a "good" initial state (one without any very long queue, and without too many customers in the system in total), then the mixing time is of order n^{1+(k−1)β} log n, which is best possible up to the logarithmic term. We also prove general bounds on the mixing time, in terms of the initial number of customers and the initial maximum queue length, and show that these bounds are also roughly best possible.
We will shortly state our main results precisely, but first we describe the supermarket model more carefully. In fact, we describe a natural discrete-time version of the process, which we shall work with throughout; as is standard, one may convert results about the discrete-time version to the continuous model, with the understanding that one unit of time in the continuous model corresponds to about (1 + λ)n steps of the discrete model.
A queue-lengths vector is an n-tuple (x(1), …, x(n)) whose entries are non-negative integers. If x(j) = i, we say that queue j has length i, or that there are i customers in queue j; we think of these customers as occupying positions 1, …, i in the queue. We use similar terminology throughout; for instance, to say that a customer arrives and joins queue j means that x(j) increases by 1, and to say that a customer in queue j departs or is served means that x(j) decreases by 1. Given a queue-lengths vector x, we write ‖x‖_1 = Σ_{j=1}^n x(j) to denote the total number of customers in state x, and ‖x‖_∞ = max_j x(j) to denote the maximum queue length in state x.
For each i ≥ 0, and each x ∈ Z_+^n, we define u_i(x) to be the proportion of queues in x with length at least i. So u_0(x) = 1 for all x, and, for each fixed x, the u_i(x) form a non-increasing sequence of multiples of 1/n, such that u_i(x) = 0 eventually. The sequence (u_i(x))_{i≥0} captures the "profile" of a queue-lengths vector x, and we shall describe various sets of queue-lengths vectors, and functions of the queue-lengths vector, in terms of the u_i(x).
For positive integers n and d, and λ ∈ (0, 1), we now define the (n, d, λ)-supermarket process. This process is a discrete-time Markov chain (X_t), whose state space is the set Z_+^n of queue-lengths vectors, and where transitions occur at non-negative integer times. Each transition is either a customer arrival, with probability λ/(1 + λ), or a potential departure, with probability 1/(1 + λ). If there is a potential departure, then a queue K is selected uniformly at random from {1, …, n}: if there is a customer in queue K, then they are served and depart the system. If there is an arrival, then d queues are selected uniformly at random, with replacement, from {1, …, n}, and the arriving customer joins a shortest queue among those selected. To be precise, a d-tuple (K_1, …, K_d) is selected, and the customer joins queue K_m, where m is the least index such that x(K_m) ≤ x(K_i) for all i ∈ {1, …, d}. For x ∈ Z_+^n, (X_t^x) denotes a copy of the (n, d, λ)-supermarket process (X_t) where X_0 = x a.s. Throughout, we let (Y_t) denote a copy of the process in equilibrium. The processes depend on the parameters (n, d, λ), but we suppress this dependence in the notation. Throughout, we use (F_t) to denote the natural filtration of the process (X_t). We use the notation P(·) freely to denote probability in whatever space we work in.
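A single transition of the discrete-time chain just described can be sketched as follows (a minimal simulation; the helper names are ours, and Python's min conveniently returns the first minimiser, matching the tie-breaking rule):

```python
import random

def join_index(x, choices):
    """Index of the first shortest queue among the inspected d-tuple.
    Python's min returns the first minimiser in list order, which
    matches the tie-breaking rule described in the text."""
    return min(choices, key=lambda j: x[j])

def supermarket_step(x, lam, d, rng):
    """One transition of the discrete-time (n, d, lam)-supermarket
    process: an arrival with probability lam/(1+lam), inspecting d
    queues chosen uniformly with replacement; otherwise a potential
    departure from a uniformly chosen queue."""
    n = len(x)
    if rng.random() < lam / (1 + lam):
        choices = [rng.randrange(n) for _ in range(d)]  # with replacement
        x[join_index(x, choices)] += 1
    else:
        j = rng.randrange(n)
        if x[j] > 0:  # a potential departure from an empty queue does nothing
            x[j] -= 1
    return x
```

Repeated application of `supermarket_step` to a queue-lengths vector then simulates the chain.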
We now state our main results. First, we define sets of queue-lengths vectors N(n, α, β): our aim is to prove that, for suitable values of α and β, with d = n^β, λ = 1 − n^{−α} and n sufficiently large, an equilibrium copy of the (n, d, λ)-supermarket process is concentrated in the set N(n, α, β). For k = ⌈α/β⌉, let N(n, α, β) be the set of queue-lengths vectors x with the following properties:

(a) There are no queues of length k + 1 or greater.
(b) For 1 ≤ j ≤ k, the number of queues of length less than j, namely n(1 − u_j(x)), lies between (1 − 1/log n) n^{1−α+(j−1)β} and (1 + 1/log n) n^{1−α+(j−1)β}.
(c) In particular, the number of queues of length less than k is at most (1 + 1/log n) n^{1−α+(k−1)β} = o(n), and so, for states in N(n, α, β), the proportion of queues of length exactly k tends to 1 as n → ∞.

Theorem 1.1. Suppose that λ(n) = 1 − n^{−α} and d(n) = n^β, where α and β are fixed constants in (0, 1] with k − 1 < α/β < k for some positive integer k, and 2α < 1 + β(k − 1). Let (Y_t) be a copy of the (n, d, λ)-supermarket process in equilibrium. Then P(Y_t ∈ N(n, α, β)) → 1 as n → ∞.
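Conditions (a) and (b) can be phrased as a simple membership check (a sketch, with a hypothetical helper name, using the (1 ± 1/log n) tolerance described above):

```python
import math

def in_N(x, alpha, beta):
    """Sketch of membership in the set N(n, alpha, beta) described
    above: condition (a), no queue longer than k = ceil(alpha/beta),
    and condition (b), the count of queues shorter than j within a
    (1 +/- 1/log n) factor of n^(1 - alpha + (j-1)*beta)."""
    n = len(x)
    k = math.ceil(alpha / beta)
    if max(x) > k:                          # condition (a)
        return False
    tol = 1 / math.log(n)
    for j in range(1, k + 1):               # condition (b)
        short = sum(1 for q in x if q < j)  # this is n(1 - u_j(x))
        target = n ** (1 - alpha + (j - 1) * beta)
        if not (1 - tol) * target <= short <= (1 + tol) * target:
            return False
    return True
```

For instance, with n = 10000, α = 0.5 and β = 0.3 (so k = 2), a vector with about n^{0.5} = 100 empty queues, about n^{0.8} − n^{0.5} queues of length 1, and the rest of length 2 passes the check.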

Remarks
(1) In fact, our proofs go through essentially unchanged if we demand only that 1 − λ(n) = n^{−α+δ_1(n)} and d(n) = n^{β+δ_2(n)}, where δ_1(n) and δ_2(n) tend to zero as n → ∞, and we replace instances of n^{−α+(j−1)β} in the definition of N(n, α, β) by (1 − λ)d^{j−1}. For ease of exposition, we prefer to stick to definite values of λ and d; however, from now on we allow ourselves to write simply d = n^β, even though this need not be an integer.
(2) The conclusion of the theorem implies that it is rare for there to be queues of length greater than k in equilibrium, and so in particular it is rare for the last arriving customer to have joined a queue containing k other customers. Theorem 1.1 can thus be used to make statements about the performance of the system in equilibrium in terms of the total waiting time for each customer; we leave the details to the interested reader.
(3) In the case where α ≤ β, Theorem 1.1 tells us that, in equilibrium, the maximum queue length is 1 with high probability, and therefore that it will be extremely rare for an arriving customer to join a non-empty queue. In this case, some of the complexity of our proof can be avoided. This range is also covered by Fairthorne [6], with essentially the same proof and some sharper results, e.g., giving conditions for the maximum queue length to remain equal to 1 over a time period n^K, for fixed K.
(4) We now indicate why the condition 2α < 1 + β(k − 1) in Theorem 1.1 is necessary. For a state in N(n, α, β), the total number of customers in the system is at least kn − 2n^{1−α+(k−1)β}. If we consider the next n^{2α} steps, the number of arrivals minus the number of potential departures is asymptotically a normal random variable with mean and standard deviation both of order n^α. So the probability that the number of arrivals minus the number of departures is at least 3n^α is bounded away from zero as n → ∞. If α ≥ 1 − α + (k − 1)β, then this many excess arrivals would drive the total number of customers in the system over kn, which certainly implies that some queue of length k + 1 would be created.

(5) If α ≥ 1 and β is arbitrary, a similar argument shows that, in equilibrium, for each k, the probability that there is a queue of length at least k is bounded away from zero. Indeed, starting from any state, for any k ∈ N, there is a positive probability that, over the next n^2 transitions, the number of arrivals exceeds the number of departures by at least kn.
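The back-of-envelope count in Remark (4) can be reproduced directly: over n^{2α} steps, each step is +1 (arrival) with probability λ/(1+λ) and −1 (potential departure) otherwise, and both the mean and the standard deviation of the net change are of order n^α. A sketch (`drift_stats` is our hypothetical helper):

```python
def drift_stats(n, alpha):
    """Mean and standard deviation of (arrivals - potential
    departures) over n^(2*alpha) steps of the discrete chain, where
    each step is +1 with probability lam/(1+lam) and -1 otherwise."""
    lam = 1 - n ** (-alpha)
    p = lam / (1 + lam)                 # P(step is an arrival)
    steps = n ** (2 * alpha)
    mean = steps * (2 * p - 1)          # E[net change], about -n^alpha / 2
    sd = (steps * (1 - (2 * p - 1) ** 2)) ** 0.5   # about n^alpha
    return mean, sd
```

With n = 10^6 and α = 1/2, both quantities are indeed of order n^α = 1000, so an upward fluctuation of 3n^α has probability bounded away from zero.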
(6) For λ < λ′, there is a coupling of the (n, d, λ)- and (n, d, λ′)-supermarket processes, so that at each time, each queue in the (n, d, λ)-supermarket process is no longer than in the (n, d, λ′)-supermarket process, provided this is true at time 0. So, for instance, if at a given time there are at least m queues with length k in the (n, d, λ)-supermarket process, then there are also at least m queues with length at least k in the (n, d, λ′)-supermarket process. If α/β is equal to a positive integer k, and α < k/(k + 1) (so that the condition 2α < 1 + (k − 1)β is satisfied), then we can couple with the process for slightly lower, and slightly higher, values of α, to see that the maximum queue length in equilibrium is, with high probability, either k or k + 1, and that most queues have length either k or k + 1. Similarly, for d < d′, there is a coupling of the (n, d′, λ)-supermarket process and the (n, d, λ)-supermarket process such that, for all times t ≥ 0, and for each j, the number of customers in position at least j in their queue is no higher in the first process than in the second (see [22,7]). Combining these arguments actually gives an essentially complete picture of the maximum queue length in equilibrium for any parameters α ∈ (0, 1), β > 0. The regions of the (α, β)-plane not covered by Theorem 1.1 are of the form E_k. For a model with parameters in E_k, coupling in d shows that, with high probability, the maximum queue length in equilibrium is at most k + 1; coupling in λ shows that, with high probability, the maximum queue length in equilibrium is at least k. Moreover, the argument in Remark (4) shows that the value k + 1 occurs with probability bounded away from zero as n → ∞.

(7) We define the model so that d queues are chosen with replacement, so it makes sense to ask what happens if β > 1.
In this case, most arriving customers inspect every queue, and the situation is essentially the same as when β = 1 (when most arriving customers inspect at least half of the queues), or as when every arriving customer inspects every queue (the "join the shortest queue" protocol). Our result in this case says that, for α < 1/2, the maximum queue length is 1 with high probability in equilibrium. For α ≥ 1/2, we are in the region E_1 defined in the previous remark: the maximum queue length is either 1 or 2 with high probability in equilibrium, and the value 2 occurs with probability bounded away from 0. For the join the shortest queue protocol and λ = 1 − cn^{−1/2}, this situation is explored in detail by Eschenfeldt and Gamarnik [4].

(8) The case α = 1/2 has been studied in queueing theory under the name of the Halfin-Whitt heavy traffic regime. In this case, Theorem 1.1 applies whenever β < 1/2 and 1/(2β) is not an integer, and the result implies that, in equilibrium, the proportion of queues of length ⌈1/(2β)⌉ tends to 1 as n → ∞, and with high probability there are no longer queues. For β > 1/2, the maximum queue length in equilibrium is either 1 or 2 with high probability, and the value 2 occurs with probability bounded away from 0, as in Remark (4). This is an explicit example of a model where we have a type of scaling limit: as we increase n with λ = 1 − n^{−α} and d = n^β, we retain the property that almost all queues have length k = ⌈α/β⌉ in equilibrium, with high probability, and the number of shorter queues is of order n^{1−α+(k−1)β} = o(n). As we adjust the parameters so that α/β passes through an integer value, we have a phase transition to a different equilibrium regime.
As mentioned earlier, and explained in more detail in Section 2, our results are in line with a more general hypothesis: for a very wide range of parameter values, the maximum queue length of the (n, d, λ)-supermarket model in equilibrium is within 1 of the largest k such that nπ_k ≥ 1. (Recall that π_k is the "predicted" proportion of queues of length at least k; see (1.2).) This general hypothesis holds when λ and d are constants: see [11]. It is also valid for the range where λ is fixed and d → ∞: see [6], and at least approximately when λ → 1 and d is fixed: see [5].

We now state our results concerning "rapid mixing", i.e., rapid convergence to equilibrium. For x ∈ Z_+^n, let L(X_t^x) denote the law at time t of the (n, d, λ)-supermarket process (X_t^x) started in state x. Also let Π denote the stationary distribution of the (n, d, λ)-supermarket process.

Theorem 1.2. Suppose that λ(n) = 1 − n^{−α} and d(n) = n^β, where α, β and k = ⌈α/β⌉ satisfy the conditions of Theorem 1.1. Let x be a queue-lengths vector in N(n, α, β). Then, for all sufficiently large n, the total variation distance d_TV(L(X_t^x), Π) tends to 0 as n → ∞, uniformly over t ≥ Cn^{1+(k−1)β} log n, for a suitable constant C.

In other words, for a copy of the process started in a state in N(n, α, β), the mixing time is at most of order n^{1+(k−1)β} log n = o(n^{1+α}) = o(n^2). In fact, this upper bound on the mixing time is best possible up to the logarithmic factor: we show that mixing, starting from states in N(n, α, β), requires order at least n^{1+(k−1)β} steps.

Theorem 1.3. Suppose that λ(n) = 1 − n^{−α} and d(n) = n^β, where α, β and k = ⌈α/β⌉ satisfy the conditions of Theorem 1.1. For all sufficiently large n, there is a state z ∈ N(n, α, β) such that, for t ≤ (1/8) n^{1+(k−1)β}, the total variation distance d_TV(L(X_t^z), Π) is bounded away from 0.

From states not in N(n, α, β), we cannot expect to have rapid mixing in general. For instance, suppose we start from a state x with number of customers ‖x‖_1 ≥ kn. The expected decrease in the number of customers at each step of the chain is at most (1−λ)/(1+λ), so mixing takes at least of order (‖x‖_1 − kn)(1 − λ)^{−1} = (‖x‖_1 − kn)n^α steps.
Similarly, if we start with one long queue, of length ‖x‖_∞ > k, then mixing takes at least of order (‖x‖_∞ − k)n steps, to allow time for enough departures from the long queue. This shows, for instance, that if either ‖x‖_1 ≥ 2kn or ‖x‖_∞ > 2k, and t is below the corresponding bound, then the total variation distance d_TV(L(X_t^x), Π) is near to 1. The next result gives an upper bound on the mixing time for (X_t^x) in terms of ‖x‖_1 and ‖x‖_∞, and shows that (1.3) is best possible up to the constant factor.

Theorem 1.4. Suppose that α and β satisfy the hypotheses of Theorem 1.1, and let x be any queue-lengths vector with ‖x‖_∞ ≤ e^{(1/4) log² n}. Then, for n sufficiently large and t ≥ 7200 (kn^{1+α} + ‖x‖_1 n^α + ‖x‖_∞ n), the law L(X_t^x) is close to Π in total variation.

In the case where the dominant term in the expression above is kn^{1+α}, this result is not as sharp as that in Theorem 1.2, since α > (k − 1)β.
The supermarket model is an instance of a model whose behaviour has been comprehensively analysed even though there is an unbounded number of variables to be tracked, namely the proportions u_i(X_t). While what we achieve in this paper is similar to what is achieved by Luczak and McDiarmid in [11] for the case where λ and d are fixed as n → ∞, only some of the techniques of that paper can be used here, as we now explain.
The proofs in [11] rely on a coupling of copies of the supermarket process where the distance between coupled copies does not increase in time. This coupling is, in particular, used to establish concentration of measure, over a long time period, for Lipschitz functions of the queue-lengths vector; this result is valid for any values of (n, d, λ), and in particular in our setting. Fast coalescence of coupled copies, and hence rapid mixing, is shown by comparing the behaviour of the (n, d, λ)-process (d ≥ 2) with the (n, 1, λ)-process, which is easy to analyse. This then also implies concentration of measure for Lipschitz functions in equilibrium, and that the profile of the equilibrium process is well concentrated around the fixed point π of the equations (1.1).
The coupling from [11] also underlies the proofs in the present paper. However, in our regime, comparisons with the (n, 1, λ)-process are too crude. Thus we cannot show that the coupled copies coalesce quickly enough, until we know something about the profiles of the copies, in particular that their maximum queue lengths are small. Our approach is to investigate the equilibrium distribution first, as well as the time for a copy of the process from a fairly general starting state to reach a "good" set of states in which the equilibrium copy spends most of its time. Having done this, we then prove rapid mixing in a very similar way to the proof in [11].
To show anything about the equilibrium distribution, we would like to examine the trajectory of the vector u(X_t), whose components are the u_i(X_t) for i ≥ 1. This seems difficult to do directly, but we perform a change of variables and analyse instead a collection of just k functions Q_1(X_t), …, Q_k(X_t). These are linear functions of u_1(X_t), …, u_k(X_t), with the property that the drift of each Q_j(X_t) can be written, approximately, in terms of Q_j(X_t) and Q_{j+1}(X_t) only. Exceptionally, the drift of Q_k(X_t) is written in terms of Q_k(X_t) and u_{k+1}(X_t) (which in fact is usually zero in equilibrium). The particular forms of the Q_j are chosen by considering the Perron-Frobenius eigenvalues of certain matrices M_k derived from the drifts of the u_j(x). Making this change of variables allows us to consider one function Q_j(X_t) at a time, and show that each in turn drifts towards its equilibrium mean (which is derived from the fixed point π of (1.1)), and we are thus able to prove enough about the trajectory of the Q_j(X_t) to show that, starting from any reasonable state, with high probability the chain soon enters a good set of states where, in particular, u_{k+1}(X_t) = 0, and so the maximum queue length is at most k. We also show that, with high probability, the chain remains in this good set of states for a long time, which implies that the equilibrium copy spends the vast majority of its time in this set. The argument from [11] about coalescence of coupled copies can be used to show rapid mixing from this good set of states. The drift of the function Q_k towards its equilibrium is slower than that of any other Q_j, and its drift rate is approximately n^{−1−(k−1)β}, which is close to the spectral gap of the Markov chain (X_t), and hence determines the speed of mixing in Theorem 1.2.
The structure of the paper is as follows. In Section 2, we expand on the discussion above, and motivate the definitions of the functions Q j : Z n + → R, which are fundamental to the proof. In Section 3, we give a number of results about the long-term behaviour of random walks with drifts, including several variants on results from [11]. In Section 4, we describe the key coupling from [11], and use it to prove some results about the maximum queue length and number of customers. In Section 5, we discuss in detail the drifts of the functions Q j . The proof of Theorem 1.1 starts in Section 6, where we show how to derive a slightly stronger result from a sequence of lemmas. These lemmas are proved in Sections 7-9. We prove our results on mixing times in Section 10.
Note: this paper is heavily based on a manuscript [3] by the first and third named authors, placed on the arXiv in 2012, but not published in any other outlet. The present paper also incorporates results from the second author's PhD thesis [6]. The results proved in the present paper are in some sense weaker than those in [3] and [6], as, purely for the sake of exposition, we only treat the case where 1 − λ(n) and d(n) are powers of n, and state our results only in asymptotic form. In a more important sense, our results here are stronger, as they cover essentially best possible ranges of exponents; the key improvement in our methodology compared to [3] is that here we state and use Lemma 3.2 in a form where we get a stronger bound when a function on the state space stays the same with high probability at any step, allowing us to take proper account of the fact that the Q j for j < k rarely change value. Our intention is to update [3] to incorporate these improvements in our more general setting.

Heuristics
In this section, we set out the intuition behind our results and proofs. As before, let (Y_t) be an equilibrium copy of the (n, d, λ)-supermarket process. Guided by the results in [6,11], we start by supposing that, for each i ≥ 1, u_i(Y_t) is well-concentrated around its expectation ū_i, and seeing what that implies about the ū_i. For a function F defined on the state space, and a state x, we define the drift of F at x to be ∆F(x) = E[F(X_{t+1}) − F(X_t) | X_t = x]. For i ≥ 1, conditioned on Y_t, the probability that the event at time t + 1 is an arrival to a queue of length exactly i − 1 is (λ/(1+λ))(u_{i−1}(Y_t)^d − u_i(Y_t)^d), while the probability that the event is a departure from a queue of length exactly i is (1/(1+λ))(u_i(Y_t) − u_{i+1}(Y_t)). Hence

(2.1)  ∆u_i(Y_t) = (1/(n(1+λ))) [λ(u_{i−1}(Y_t)^d − u_i(Y_t)^d) − (u_i(Y_t) − u_{i+1}(Y_t))].

Note that u_0 is identically equal to 1. Taking expectations on both sides, and setting them to 0, we see that, in equilibrium,

(2.2)  λ(ū_{i−1}^d − ū_i^d) ≈ ū_i − ū_{i+1},  i ≥ 1,

where the approximations (replacing the expectation of a dth power by the dth power of the expectation) are justified because of our assumption that u_i(Y_t) and u_{i−1}(Y_t) are well-concentrated around their respective means ū_i and ū_{i−1}.
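The transition probabilities used here can be confirmed by brute force for small systems: the arriving customer joins a queue of length exactly ℓ precisely when the minimum length among the d inspected queues is ℓ, an event of probability u_ℓ(x)^d − u_{ℓ+1}(x)^d. A small enumeration sketch (helper names are ours):

```python
from itertools import product

def u(x, i):
    """Proportion of queues in x with length at least i."""
    return sum(1 for q in x if q >= i) / len(x)

def p_join_length(x, d, ell):
    """Exact probability, by enumeration over all n^d inspection
    d-tuples, that an arriving customer joins a queue of current
    length exactly ell. (The tie goes to the first-listed shortest
    queue, which does not affect the length of the queue joined.)"""
    n = len(x)
    hits = sum(1 for tup in product(range(n), repeat=d)
               if x[min(tup, key=lambda j: x[j])] == ell)
    return hits / n ** d
```

For any small state x and any d, `p_join_length(x, d, ell)` agrees with `u(x, ell) ** d - u(x, ell + 1) ** d` exactly.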
The system of equations

(2.3)  λ(π_{i−1}^d − π_i^d) = π_i − π_{i+1},  i ≥ 1,

with π_0 = 1, has a unique solution with π_i → 0 as i → ∞, namely π_i = λ^{(d^i−1)/(d−1)}, as in (1.2). See [11] and the references therein for details. By analogy with [11], and motivated by (2.2), if the u_i(Y_t) are well concentrated, we expect that ū_i ≈ π_i, for each i, and moreover that the values of u_i(Y_t) will be close to the corresponding π_i with high probability. In the regime of Theorem 1.1,

π_i = λ^{(d^i−1)/(d−1)} ≈ exp(−n^{−α+(i−1)β}),

for each i ≥ 1. As we are assuming that (k − 1)β < α < kβ, this means that π_i is close to 1 for i ≤ k, and very close to 0 for i > k. In particular, π_{k+1} (which we expect to be the approximate proportion of queues of length greater than k) is much smaller than 1/n, suggesting that, in equilibrium, the probability that there is a queue of length greater than k is very small.
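It is easy to verify numerically that π_i = λ^{(d^i − 1)/(d − 1)} satisfies these equations; indeed the relation λπ_{i−1}^d = π_i holds exactly, as one sees by telescoping (2.3). A quick check with illustrative, non-asymptotic values of λ and d:

```python
def pi(lam, d, i):
    """pi_i = lam^((d^i - 1)/(d - 1)), the fixed point (1.2);
    note pi_0 = 1."""
    return lam ** ((d ** i - 1) / (d - 1))
```

With λ = 0.7 and d = 3, both the balance equations λ(π_{i−1}^d − π_i^d) = π_i − π_{i+1} and the exact relation λπ_{i−1}^d = π_i hold to machine precision.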
On the other hand, the fact that π_k is close to 1 suggests that, in equilibrium, most queues have length exactly k. Moreover, since π_{k+1} is so small, we may set ũ_{k+1} = 0 and work with the quantities 1 − ũ_1, …, 1 − ũ_k, which we expect to be small. Using the relation λπ_{i−1}^d = π_i, and the approximation 1 − u^d ≈ d(1 − u) for u close to 1, we then obtain the following linear approximation to the equations (2.3), written in terms of variables 1 − ũ_1, …, 1 − ũ_k:

1 − ũ_i = (1 − λ) + λd (1 − ũ_{i−1}),  i = 1, …, k,

with ũ_0 = 1. These linear equations have solution ũ given by

1 − ũ_i = (1 − λ) ((λd)^i − 1)/(λd − 1),  i = 1, …, k.

We then have the further approximation 1 − ũ_i ≈ (1 − λ)(λd)^{i−1} ≈ n^{−α+(i−1)β}, and we aim to show that indeed each u_i(x) is close to the corresponding ũ_i with high probability in equilibrium.
Ideally, we would seek a single "Lyapunov" function of the u_i(x), which is small when u_i(x) ≈ ũ_i for each i, and larger otherwise, and which has a downward drift outside of a small neighbourhood of ũ: we could then analyse the trajectory of this function to show that (u_1(x), …, u_k(x)) stays close to ũ for a long period. We have been unable to find such a function, and indeed analysing the evolution of the u_i(X_t) directly appears to be challenging. Instead, we work with a sequence of functions Q_j(x), j = 1, …, k, each of the form

Q_j(x) = Σ_{i=1}^j γ_{j,i} n (1 − u_i(x)),

where the γ_{j,i} are positive real coefficients. This sequence of functions has the property that the drift of each Q_j(x) can be written (approximately) in terms of Q_j(x) itself and Q_{j+1}(x).
Let us see how these coefficients should be chosen, starting with the special case j = k, where we write γ_i for γ_{k,i}. Consider a function of the form Q_k(x) = Σ_{i=1}^k γ_i n(1 − u_i(x)). As in the argument leading to (2.1), we can compute the drift of this function; collecting the coefficient of each 1 − u_i(x) for i = 1, …, k − 1, and rearranging, we arrive at an expression for the drift as a linear combination of the quantities 1 − u_i(x), plus a constant term. We set γ_0 = 0 for convenience of writing this expression. This calculation is done carefully, with precise inequalities, in Lemma 5.1 below. We would like to choose the γ_i so that the vector of coefficients of (1 − u_1(x), …, 1 − u_k(x)) in the drift is a negative multiple −µ of (γ_1, …, γ_k): then, whenever Q_k is above its equilibrium value it drifts down, whereas if Q_k is below then it drifts up. What we need is for (γ_1, …, γ_k) to be a left eigenvector, with eigenvalue −µ, of the k × k matrix arising from the drifts of the u_i(x), or, equivalently, a left eigenvector of the matrix M_k obtained by adding λd + 1 to each diagonal entry. The non-negative matrix M_k has a unique largest "Perron-Frobenius" eigenvalue, with a positive left eigenvector. By inspection, we see that, for k ≥ 2, this left eigenvector is close to the all-1 vector, with an eigenvalue close to λd + 1, so that the drift matrix has largest eigenvalue −µ very close to 0. Recursion shows that a better approximation to the Perron-Frobenius left eigenvector of M_k is (γ_1, …, γ_k), where γ_i = 1 − (λd)^{−i} for i = 1, …, k, and that µ is very close to 1/(λd)^{k−1}. We shall see in Lemma 5.1 that this approximation is close enough for our purposes, enabling us to show that, with these choices of the γ_i, Q_k(x) drifts towards a value close to (1 − λ)n(λd)^{k−1}. A further consequence is that, in order for Q_k(x) to move from (1 ± 2ε)(1 − λ)n(λd)^{k−1} to (1 ± ε)(1 − λ)n(λd)^{k−1}, it has to travel a distance of ε(1 − λ)n(λd)^{k−1} while drifting at rate no greater than 2ε(1 − λ), and so time of order n(λd)^{k−1} is required. This is then a lower bound on the mixing time from a "good" state to equilibrium, nearly matching that in Theorem 1.2. We make this argument precise at the very end of the paper.
For j < k, a similar calculation expresses the drift of Q_j(x) in terms of Q_j(x) and 1 − u_{j+1}(x). (See the proof of Lemma 5.2.) We think of 1 − u_{j+1}(x) as an "external" term (which in practice will be very close to Q_{j+1}(x)/n), which will determine the value towards which Q_j drifts. We would like the rest of the expression to be a negative multiple of Q_j(x). For this we need (γ_{j,1}, …, γ_{j,j}) to be a left eigenvector of the j × j drift matrix with eigenvalue −µ < 0 or, equivalently, of the matrix M_j with eigenvalue λd + 1 − µ. These matrices are tridiagonal Toeplitz matrices, and there is an exact formula for the eigenvalues and eigenvectors. (See, for instance, Example 7.2.5 in [16].) The Perron-Frobenius eigenvalue of M_j is 2√(λd) cos(π/(j+1)), with left eigenvector (γ_{j,1}, …, γ_{j,j}) given by γ_{j,i} = (λd)^{−i/2} sin(iπ/(j+1)). This means that the largest eigenvalue −µ of the drift matrix is −λd + O(√(λd)), so that Q_j(X_t) drifts towards its target value at rate of order λd. This means that, if Q_{j+1}(X_t) remains in an interval around Q̄_{j+1} := n(1 − λ)(λd)^j for a long time, then Q_j(X_t) will enter some interval around Q̄_j within a short time, and stay there for a long time. We can then conduct the analysis for each Q_j in turn, starting with j = k, to show that indeed all the Q_j(X_t) quickly become close to Q̄_j, and stay close to Q̄_j for a long time. This will then imply that the u_j(X_t) all become and remain close to ũ_j.
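The quoted eigenvalue formula can be checked directly. In the sketch below we assume, hypothetically (the paper's exact convention for M_j is not reproduced here), that M_j is the j × j tridiagonal Toeplitz matrix with sub-diagonal entries λd, zero diagonal and super-diagonal entries 1, a convention consistent with the stated Perron-Frobenius eigenvalue 2√(λd) cos(π/(j+1)); under it, a left eigenvector has components (λd)^{−i/2} sin(iπ/(j+1)):

```python
import math

def check_pf_pair(ld, j, tol=1e-9):
    """Verify, component by component, that mu = 2*sqrt(ld)*cos(pi/(j+1))
    and gamma_i = ld^(-i/2) * sin(i*pi/(j+1)) form a left
    eigenvalue/eigenvector pair for the j x j tridiagonal Toeplitz
    matrix with sub-diagonal ld, zero diagonal, super-diagonal 1
    (a hypothetical convention; ld plays the role of lambda*d)."""
    theta = math.pi / (j + 1)
    mu = 2 * math.sqrt(ld) * math.cos(theta)
    gamma = [ld ** (-i / 2) * math.sin(i * theta) for i in range(1, j + 1)]
    for m in range(j):
        # (gamma M)_m picks up gamma[m-1] from the super-diagonal entry
        # M[m-1][m] = 1 and gamma[m+1] from the sub-diagonal entry
        # M[m+1][m] = ld; indices outside the range contribute 0.
        left = gamma[m - 1] if m >= 1 else 0.0
        if m + 1 < j:
            left += ld * gamma[m + 1]
        if abs(left - mu * gamma[m]) > tol:
            return False
    return True
```

The identity reduces to sin((i−1)θ) + sin((i+1)θ) = 2 cos θ sin(iθ) with θ = π/(j+1), which is why the check passes exactly (up to floating-point error) for any ld > 0 and j ≥ 1.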
A subsidiary application of this same technique forms another important step in the proofs (see the proof of Lemma 6.5(1)). If we do not assume that u_{k+1}(x) is zero, but instead build this term into our calculations, we obtain an analogous approximation for the drift of Q_k, now including a term in u_{k+1}(x) that lowers the value towards which Q_k drifts. If u_{k+1}(X_t) remains above ε(1−λ), for some ε > 0, for a long time, this drift equation tells us that Q_k drifts down into an interval whose upper end is below the value Q̄_k, and then each of the Q_j in turn drifts down into an interval whose upper end is below the corresponding Q̄_j, and remains there. For j = 1, this means that the number of empty queues is at most (1−δ)(1−λ)n, for some positive δ, for a long period of time; this results in a persistent drift down in the total number of customers (since the departure rate is bounded below by n − (1 − δ)(1 − λ)n = λn + δ(1 − λ)n while the arrival rate is λn), and this is not possible.

Random Walks with Drifts
In this section, we state some general results about the long-term behaviour of real-valued functions of a Markov chain with bounds on the drift. These are variants of results of Luczak and McDiarmid [11] and Brightwell and Luczak [2], and we do not give the proofs in full detail.
We start with a lemma concerning random walks with a drift, adapted from a result of Luczak and McDiarmid [11]. We have a sequence (R_t) of real-valued random variables; on some "good" event, the jumps Z_t = R_t − R_{t−1} have magnitude at most 1, and expectation at most −v < 0. The lemma shows that, on the good event, with high probability, such a random walk, started at some value r_0, hits a lower value r_1 after not too many more than (r_0 − r_1)/v steps.
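As an illustration of the behaviour the lemma describes (not of its proof), the sketch below simulates such a walk with jumps of magnitude at most 1 and mean exactly −v; the shifted-uniform jump distribution is an arbitrary choice made here for the example.

```python
import random

def hitting_time(r0, r1, v, seed=0, max_steps=10**6):
    """First time a walk started at r0, with jumps of magnitude at most 1 and
    mean exactly -v (a shifted uniform jump), reaches a value <= r1.
    Returns None if this does not happen within max_steps."""
    rng = random.Random(seed)
    r, t = r0, 0
    while r > r1:
        # jump = -v + noise uniform on [-(1-v), 1-v]: magnitude <= 1, mean -v
        r += -v + rng.uniform(-(1.0 - v), 1.0 - v)
        t += 1
        if t >= max_steps:
            return None
    return t

t = hitting_time(50.0, 0.0, 0.2)
# The walk needs at least r0 - r1 = 50 steps (jumps are at most 1), and the
# lemma predicts it whp needs not many more than (r0 - r1)/v = 250.
assert t is not None and 50 <= t <= 500
```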
We omit the proof, which is similar to one in [11]. For a discrete-time Markov process (X_t) with state space X, a real-valued function F defined on X, and an element x of X, we define ∆F(x) := E(F(X_{t+1}) − F(X_t) | X_t = x), and call this the drift of F (at x). Similarly, we shall also use the notation ∆F(X_t) to denote the random variable E(F(X_{t+1}) | F_t) − F(X_t). The next lemma says that, if the function F has a negative drift of magnitude at least v > 0 on a good set U, and makes jumps of size at most 1, then it is unlikely to increase by a large positive value before leaving U.
Proof. (Sketch) We use Theorem 2.5 of [2], applied to the function F. Translated into our setting, that result gives a tail bound valid for all t ≥ 0 and all ω > 0. It is easy to verify that max(ω(t)pt, ω(t)) < vt + a for each t (note that the hypotheses imply that v ≤ p), and the result follows. We now use the two lemmas above to prove a result about real-valued functions of a Markov chain, which we shall use repeatedly in our proofs.
Let T* be any stopping time, and suppose that F(X_{T*}) ≤ c a.s. Let T_0 = inf{t ≥ 0 : X_t ∉ S}, T_1 = inf{t ≥ T* : F(X_t) ≤ h} and T_2 = inf{t > T_1 : F(X_t) ≥ h + ρ}. When we use the lemma, m will be much smaller than s, with high probability T* will be much smaller than s, and also P(T_0 ≤ s) will be small. In these circumstances, the lemma allows us to conclude that P(T_1 > T* + m) and P(T_2 ≤ s) are small. This means that, with high probability, F(X_t) decreases from its value at T* (at most c) to below h in at most a further m steps, and does not increase back above h + ρ before time s. We shall sometimes use the conclusion of (ii) in the weaker form P(T_2 ≤ s < T_0) ≤ (100s/v²) exp(−ρv/8p). For most uses of part (ii), we shall simply set p = 1, but on occasion we need the stronger result in cases where the function F rarely changes value.
Proof. We start by proving the lemma in the special case where the stopping time T * is equal to 0.
For (i), we apply Lemma 3.1. The filtration ϕ_0 ⊆ ϕ_1 ⊆ ⋯ ⊆ ϕ_m will be the initial segment of the natural filtration of the process. We set r_0 = F(X_0) ≤ c, and r_1 = h. We may assume that r_0 > r_1; otherwise T_1 = 0 and there is nothing to prove.
On the event in question, the jumps of F have magnitude at most 1 and drift at most −v. Thus, noting that vm ≥ 2(r_0 − r_1) by our assumption on m, we see that the conditions of Lemma 3.1 are satisfied. The event that R_t > r_1 for all t = 1, …, m is the event that T_1 > m, and the bound in (i) follows. We move on to (ii). For each time r ∈ {0, …, s−1}, we define a notion of a departure point, via a condition on the trajectory after time r. On the event that T_2 ≤ s ∧ T_0, the function F crosses from its value, at most h, at time T_1, up to a value at least h + ρ, taking steps of size at most 1, by time s ∧ T_0. This is equivalent to saying that there is at least one departure point r ∈ [0, s). Therefore P(T_2 ≤ s ∧ T_0) is at most the sum, over r, of the probabilities that r is a departure point. Fix any r ∈ [0, s). We claim that, for any h_0 ∈ [h, h+1), on the ϕ_r-measurable event that F(X_r) = h_0, the conditional probability that r is a departure point is at most 100 v^{−2} e^{−ρv/8p}. This will imply that each term of the sum above is at most 100 v^{−2} e^{−ρv/8p}, and so that P(T_2 ≤ s ∧ T_0) ≤ (100s/v²) exp(−ρv/8p), as required.
To prove the claim, we use Lemma 3.2. We consider the re-indexed process (X′_t) = (X_{r+t}); by the Markov property, this is a Markov chain with the same transition probabilities as (X_t), and initial state X′_0 = X_r with F(X′_0) = h_0. We set ϕ′_i = ϕ_{r+i} for each i, so that (X′_i) is adapted to the filtration (ϕ′_i).
For i ≤ T_U, we have X_{r+i−1} ∈ S and F(X_{r+i−1}) ≥ h, and therefore ∆F(X_{r+i−1}) ≤ −v; Lemma 3.2 then bounds the relevant conditional probability by 100 v^{−2} e^{−ρv/8p}, as required. This completes the proof in the special case where T* = 0. We now proceed to the general case. Suppose then that the hypotheses of the lemma are satisfied, with stopping time T*. We apply the result we have just proved to the process (X′_t) = (X_{T*+t}). By the strong Markov property, (X′_t) is also a Markov process, adapted to the filtration (ϕ′_t)_{t≥0} = (ϕ_{T*+t})_{t≥0}. The condition that F(X_{T*}) ≤ c is equivalent to F(X′_0) ≤ c. Define the corresponding stopping times T′_0, T′_1 and T′_2 for the shifted process, and note that these are all stopping times with respect to the filtration (ϕ′_t). The special case of the result (with T* = 0) now yields the stated bounds, which in both cases are the desired results.
We also use a "reversed" version of Lemma 3.3 where ∆F (x) ≥ v for all x in some "good" set S with F (x) ≤ h. The result and proof are practically identical to Lemma 3.3, changing the directions of inequalities where necessary, and using "reversed" versions of Lemmas 3.1 and 3.2.
The next lemma is a more precise version of Lemma 2.2 in [11]. We omit the proof, which is exactly as in [11], except that we track more carefully the values of the various constants appearing in that proof, and separate out the effects of the two occurrences of δ in that theorem. We will use this result in our proof of Lemma 10.1, showing rapid mixing.
Lemma 3.4. Let (ϕ_t)_{t≥0} be a filtration. Let Z_1, Z_2, … be {0, ±1}-valued random variables, where each Z_i is ϕ_i-measurable. Let S_0 ≥ 0 a.s., and for each positive integer j let S_j = S_{j−1} + Z_j. Suppose that there is a positive integer k_0 and a constant δ with 0 < δ < 1/2 such that the conditional drift conditions of Lemma 2.2 of [11] hold.

Several times we shall use the fact that, if Z is a binomial or Poisson random variable with mean µ, then for each 0 ≤ ε ≤ 1 we have

P(Z ≤ (1−ε)µ) ≤ e^{−ε²µ/2}.    (3.1)
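The Chernoff-type bound (3.1) — read here in the lower-tail form P(Z ≤ (1−ε)µ) ≤ e^{−ε²µ/2}, an assumed reading consistent with its use with ε = 1/4 in the proof of Lemma 6.7 — can be checked directly in the Poisson case:

```python
import math

def poisson_lower_tail(mu, m):
    """Exact P(Z <= m) for Z ~ Poisson(mu), by summing the pmf."""
    return sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(m + 1))

def chernoff_bound(mu, eps):
    """Bound of the shape (3.1): P(Z <= (1-eps)*mu) <= exp(-eps^2 * mu / 2)."""
    return math.exp(-eps * eps * mu / 2.0)

# eps = 1/4, as used in the proof of Lemma 6.7, gives exp(-mu/32).
for mu in (10.0, 20.0, 50.0):
    for eps in (0.25, 0.5):
        m = int((1.0 - eps) * mu)  # largest integer below the threshold
        assert poisson_lower_tail(mu, m) <= chernoff_bound(mu, eps)
```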

Coupling
We now introduce a natural coupling of copies of the (n, d, λ)-supermarket process (X x t ) with different initial states x. The coupling is a natural adaptation to discrete time of that in [11]. In this section, we make no assumptions about the values of the parameters n, λ and d.
We describe the coupling in terms of three independent sequences of random variables: an iid sequence V = (V_1, V_2, …) determining whether each event is an arrival or a potential departure, and sequences D = (D_1, D_2, …) and D̃ = (D̃_1, D̃_2, …) of choices of queues. At time i, D_i will be used if V_i = 1, and there will be an arrival to the first shortest queue in D_i; otherwise, there will be a departure from the queue with index D̃_i, if that queue is currently non-empty. Suppose that we are given a realisation (v, d, d̃) of (V, D, D̃). For each possible initial queue-lengths vector x ∈ Z^n_+, this realisation yields a deterministic process (x_t) with x_0 = x: let us write x_t = s_t(x; v, d, d̃). Then, for each x ∈ Z^n_+, the process s_t(x; V, D, D̃) has the distribution of the (n, d, λ)-supermarket process X^x_t with initial state x. In this way, we construct copies (X^x_t) of the (n, d, λ)-supermarket process for each possible starting state x on a single probability space. When we treat more than one such copy at the same time, we always work in this probability space, and we let P(·) denote the corresponding coupling measure.
We shall use the following lemma, which is a discrete-time analogue of Lemma 2.3 in [11] and is proved in exactly the same way.
Lemma 4.1. Fix any triple (v, d, d̃) as above, and for each queue-lengths vector x, write s_t(x) for s_t(x; v, d, d̃). Then, for each x, y ∈ Z^n_+, both ‖s_t(x) − s_t(y)‖_1 and ‖s_t(x) − s_t(y)‖_∞ are nonincreasing in t; and further, if 0 ≤ t < t′ and s_t(x) ≤ s_t(y), then s_{t′}(x) ≤ s_{t′}(y).
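The monotonicity claims of Lemma 4.1 can be illustrated by implementing the shared-randomness coupling directly. The sketch below is an assumed reading of the construction (with probability λ/(1+λ) the event is an arrival joining the first shortest queue in a sampled list; otherwise it is a potential departure from a uniformly chosen queue), and checks that the ℓ1 distance between two adjacent copies never increases along a sampled trajectory.

```python
import random

def apply_event(x, event):
    """Apply one shared event (the coupling of this section) to queue-lengths x.
    event is ('arr', D): an arrival joining the first shortest queue in the
    list D, or ('dep', j): a potential departure from queue j (no-op if empty)."""
    x = list(x)
    kind, data = event
    if kind == 'arr':
        j = min(data, key=lambda i: x[i])  # ties: first queue in the list wins
        x[j] += 1
    elif x[data] > 0:
        x[data] -= 1
    return x

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

rng = random.Random(1)
n, d, lam, T = 20, 3, 0.7, 2000
x = [rng.randrange(4) for _ in range(n)]
y = list(x)
y[0] += 1  # adjacent initial states: one extra customer in queue 0

dists = []
for _ in range(T):
    if rng.random() < lam / (1 + lam):
        event = ('arr', [rng.randrange(n) for _ in range(d)])
    else:
        event = ('dep', rng.randrange(n))
    x, y = apply_event(x, event), apply_event(y, event)
    dists.append(l1(x, y))

# Lemma 4.1: under shared randomness the l1 distance never increases.
assert all(b <= a for a, b in zip([1] + dists, dists))
```

The same driving sequence is applied to both copies, which is exactly what makes the distance a nonincreasing function of time.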
Given positive real numbers ℓ and b, we set A_0(ℓ, b) = {x ∈ Z^n_+ : ‖x‖_1 ≤ bn, ‖x‖_∞ ≤ ℓ} and A_1(ℓ, b) = {x ∈ Z^n_+ : ‖x‖_1 ≤ 3bn, ‖x‖_∞ ≤ 3ℓ}. We also set ℓ* = n^α log² n and b* = 2n^α. Thus a state x is in A_0(ℓ*, b*) if there are at most 2n(1−λ)^{−1} customers in total, and no more than (1−λ)^{−1} log² n in any queue. These requirements are relaxed by a factor of 3 in A_1(ℓ*, b*).
The next result tells us that the (n, d, λ)-supermarket process (Y t ), in equilibrium, is very unlikely to be outside the set A 0 , for any d. This is accomplished by proving the result for d = 1, when the process is easy to analyse explicitly, and then using coupling in d to deduce the result for all d.
Of course, the result is actually extremely weak for all d > 1, and later we shall show a much stronger result whenever the various parameters of the model satisfy the conditions of Theorem 1.1; the importance of the lemma below is that it gets us started and enables us to say something about where the equilibrium of the process lives.
Proof. Let Ỹ denote a stationary copy of the (n, 1, λ)-supermarket process, in which each arriving customer joins a uniformly random queue. Then the queue lengths Ỹ_t(j) are independent geometric random variables with mean λ/(1−λ), where P(Ỹ_t(j) = r) = (1−λ)λ^r for r = 0, 1, 2, …. Therefore, P(‖Ỹ_t‖_∞ ≥ r) ≤ nλ^r, and also it can easily be checked that P(‖Ỹ_t‖_1 ≥ 2n(1−λ)^{−1}) ≤ e^{−n/4}. As mentioned in the remarks after Theorem 1.1, there is a coupling between supermarket processes with different values of d, which can be used to show that the equilibrium copy (Y_t) of the (n, d, λ)-supermarket process, for any d, satisfies the same bounds.

Next we prove a very crude concentration of measure result: if the process (Y_t) in equilibrium is concentrated inside some set A_0(ℓ, b), and we start a copy (X^x_t) of the process at a state x ∈ A_0(ℓ, b), then the process (X^x_t) is unlikely to leave the larger set A_1(ℓ, b) over a long period of time.

Lemma 4.3. Let x ∈ A_0(ℓ, b), let (Y_t) be a copy of the (n, d, λ)-supermarket process in equilibrium, and let (X^x_t) be a copy started in state x. Then, for any natural number s, P(∃ t ≤ s : X^x_t ∉ A_1) ≤ (s+1) P(Y_0 ∉ A_0).

Proof. By Lemma 4.1, we can couple (X^x_t) and (Y_t) in such a way that ‖X^x_t − Y_t‖_1 and ‖X^x_t − Y_t‖_∞ are both non-increasing, and hence that, for each t ≥ 0, ‖X^x_t‖_1 ≤ ‖Y_t‖_1 + ‖X^x_0 − Y_0‖_1, and similarly for the ∞-norm. We deduce that, for each t ≥ 0, if Y_0 ∈ A_0 and Y_t ∈ A_0, then X^x_t ∈ A_1. The result now follows immediately, by stationarity and a union bound.
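For d = 1 the equilibrium is a product of geometric distributions, so the union bound P(‖Ỹ_t‖_∞ ≥ r) ≤ nλ^r can be compared with the exact value 1 − (1 − λ^r)^n:

```python
def max_queue_tail(n, lam, r):
    """For n iid geometric queue lengths with P(length >= r) = lam**r (the
    d = 1 equilibrium), return the exact P(max >= r) and the union bound
    n * lam**r used in the proof of Lemma 4.2."""
    exact = 1.0 - (1.0 - lam ** r) ** n
    return exact, n * lam ** r

for n, lam in ((100, 0.9), (1000, 0.99)):
    for r in (10, 50, 200):
        exact, bound = max_queue_tail(n, lam, r)
        # 1 - (1-p)^n <= n*p always, so the union bound is valid (if crude).
        assert 0.0 <= exact <= min(1.0, bound)
```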
We shall use Lemma 4.3 later for general values of ℓ and b, but for now we note the following immediate consequence of the previous two lemmas. Let T†_A = T†_A(x) = inf{t : X^x_t ∉ A_1}: this will be an instance of a more general notation we introduce later: when we have a pair of sets S_0 ⊆ S_1, we will use T_S to denote the first time we enter the inner set, and T†_S to denote the first time after T_S that we leave the outer one.

Lemma 4.4. For any x ∈ A_0(ℓ*, b*), we have P(T†_A < e^{(1/3) log² n}) ≤ e^{−(1/2) log² n}.

Proof. The probability in question is P(∃ t ∈ [0, e^{(1/3) log² n}] : X^x_t ∉ A_1), which, by Lemma 4.3 and Lemma 4.2, is at most (e^{(1/3) log² n} + 1) n e^{−log² n}, which, for n sufficiently large, is at most e^{−(1/2) log² n}, as required.

Functions and Drifts
We now start the detailed proofs of our main results. As explained in Section 2, we will consider a sequence of functions Q k , Q k−1 , . . . , Q 1 defined on the set Z n + of queue-lengths vectors. We now give precise definitions of these functions, along with another function P k−1 , and derive some of their properties.
The results in this section will be used in the course of the proof of Theorem 1.1, and we could assume that we are in the regime covered by our theorem; however, for this section all that is necessary is that λd ≥ 16. In the special case k = 1, we need only consider the function Q k = Q 1 and its drift; otherwise we assume that k ≥ 2.
Lemma 5.1. If k ≥ 2, then, for any state x ∈ Z^n_+, the drift ∆Q_k(x) satisfies two-sided bounds; for k = 1, a corresponding pair of bounds holds. Proof. As in (2.1), we have a formula for the drift of u_i, for i = 1, …, k, and u_0 is identically equal to 1. We deduce a formula for (1+λ)∆Q_k(x), which we then rearrange; here we have used the facts that γ_0 = 0 and 1 − u_0(x) = 0.
For k ≥ 2, in order to estimate the terms constituting the two sums, we note some elementary inequalities. To obtain our upper bound on ∆Q_k(x), we apply one such inequality for each i = 1, …, k−1. Using also (5.1), we establish the required upper bound on (1+λ)∆Q_k(x). The calculation works because the γ_i are the entries of a good approximation to the Perron–Frobenius eigenvector of the matrix M_k defined in Section 2.
For the lower bound, we use the previous calculation together with a lower bound on the terms 1 − u_i(x). Here we used the fact that 1 − 1/(λd) ≤ γ_i for each i. It remains to verify one further inequality, which follows from a direct observation.
In the special case k = 1, the equation for the drift reduces to a simple form, and both the required bounds follow immediately.
We prove a similar result for the functions Q_j(x), 1 ≤ j ≤ k−1. Ideally, the drift bounds would be expressed in terms of Q_j(x) itself and Q_{j+1}(x); however, there is a complication. In the upper bound, there appears a term which can be bounded above by a multiple of λd, and we would like to show that this term is small compared with the main one. This is true if 1 − u_j(x) ≪ 1/d, but in general we cannot assume this. We bound this term above, very crudely, in terms of the function P_{k−1}; we use the function P_{k−1} here because its drifts are relatively easy to handle.
As before, we proceed by approximating the drift, and the lower bound follows as claimed. In the last line above, we used (5.4), as well as the inequality 2√(λd) cos(π/(j+1)) ≥ √(2λd) ≥ 1, valid since λd ≥ 16. For the upper bound, we use the corresponding upper estimates on the individual terms. This is the result we require, since Σ_{i=1}^{j} γ_{j,i}(1 − u_i(x)) = Q_j(x)/n. We have a similar result for the function P_{k−1}. For this function, we need only a fairly crude upper bound on the drift, and we omit the simple proof. Lemma 5.3. For any state x ∈ Z^n_+, we have an upper bound on ∆P_{k−1}(x).
We define a sequence of pairs of subsets of Z^n_+. Each pair consists of a set S_0 in which some inequality holds, and a set S_1 in which a looser version of the inequality holds; we also demand that S_0 and S_1 be subsets of the previous set R_1 in the sequence. Associated with each pair (S_0, S_1) in the sequence is a hitting time T_S = inf{t ≥ T_R : X_t ∈ S_0}, where (R_0, R_1) is the previous pair in the sequence, and an exit time T†_S = inf{t > T_S : X_t ∉ S_1}. Our aim in each case is to prove that, with high probability, unless the previous exit time T†_R occurs early, T_S is unlikely to be larger than some quantity m_S whose order is polynomial in n. To be precise, if we start in a state in A_0(ℓ, b), then the sum of all the m_S is of order at most the maximum of bn^{1+α} and ℓn, so if ℓ and b are bounded by a polynomial in n, then so are all the m_S.
Throughout the proof, we set s_0 = e^{(1/3) log² n}. We shall also prove that, again with high probability, each exit time T†_S is at least s_0, which is larger than the sum of all the terms m_S. For convenience, we shall not be too precise about our error probabilities, and simply declare them all to be at most 1/s_0 = e^{−(1/3) log² n}, or some small multiple of 1/s_0. We will thus prove that, with high probability, we enter each of the sets S_0 in turn, while remaining inside all the earlier sets S_1.
We fix, for the moment, a pair of positive real numbers ℓ and b with ℓ ≥ b ≥ k. We define a quantity q(ℓ, b), an upper bound for the total of the hitting times m_S below, and we make the (mild) assumption that ℓ ≤ e^{(1/4) log² n}, so that q(ℓ, b) ≤ s_0/2. The first pair of sets in our sequence will be (A_0(ℓ, b), A_1(ℓ, b)), as defined earlier, and we adopt the hypothesis that X_0 = x_0 almost surely, where x_0 is a fixed state in A_0 = A_0(ℓ, b), so that T_A := min{t ≥ 0 : X_t ∈ A_0} = 0. For ℓ = ℓ* = n^α log² n and b = b* = 2n^α, Lemma 4.4 tells us that indeed the exit time T†_A = inf{t > 0 : X_t ∉ A_1} is unlikely to be less than s_0. For smaller values of ℓ and b, we do not know this a priori.
The sets we define are dependent on the chosen values of n, α, β and ε, as well as on ℓ and b. For the most part, we drop reference to this dependence from the notation. When we need to vary ε while keeping all other parameters fixed, we shall use the notation (e.g.) B^ε_0 to emphasise the dependence. We first define the pairs (B_0, B_1), (C_0, C_1), (D_0, D_1) and (E_0, E_1). Next we have a sequence of pairs of sets (G^j_0, G^j_1), indexed by j = k−1, …, 1, where we declare G^k_1 to be equal to E_1. Finally, departing slightly from our pattern, we define a single set H. In the special case k = 1, only the pairs (B_0, B_1), (E_0, E_1) and H are defined.
The hitting times and exit times are all defined in accordance with the pattern given. For instance, T_B = inf{t ≥ T_A : X_t ∈ B_0}. Initially, the sets above all depend on the values of ℓ and b defining the initial pair of sets (A_0, A_1), since all the sets are intersected with A_1. However, since states in H have no queue of length k+1 or greater, we have H ⊆ A_0(k, k) ⊆ A_1(ℓ, b) for all ℓ, b ≥ k, and so the set H does not depend on ℓ and b, provided these parameters are each at least k.
We now state a sequence of lemmas. Throughout, we assume that X_0 = x_0 a.s., where x_0 is an arbitrary state in A_0 = A_0(ℓ, b).
We shall postpone the proofs of these lemmas to later sections. For the remainder of this section, we show how the lemmas imply Theorem 6.1. To start with, combining the lemmas gives the following result.

Proposition 6.8. For any x_0 ∈ A_0 = A_0(ℓ, b), and a copy (X_t) of the process with X_0 = x_0 a.s., we have the bound stated below.

Proof. The idea is that, with high probability, either the chain (X_t) exits A_1(ℓ, b) before time s_0, or the chain enters each of the sets B_0, …, H_0 in turn, within time q(ℓ, b), and does not exit any of the sets A_1, …, H_1 before time s_0, which is what we need. We assume that k ≥ 2; if k = 1, the proof is very similar and shorter. Consider a list of events E_1, …, E_{2k+9} concerning the various stopping times we have defined. If E_{2k+8} holds, then T_H ≤ q(ℓ, b) for sufficiently large n. Therefore, if E = ⋂_{j=1}^{2k+9} E_j holds, then in particular E_{2k+8} and E_{2k+9} hold, which implies that X_t ∈ H for q(ℓ, b) ≤ t ≤ s_0. Thus E is contained in the event {X_t ∈ H for all t ∈ [q(ℓ, b), s_0]}, and it suffices to show that P(E^c) ≤ (2k+8)/s_0 + P(E_1^c). We write E^c as the disjoint union of the events E_j^c ∩ ⋂_{i=1}^{j−1} E_i, and now we see that it suffices to prove that each of the terms P(E_j^c ∩ ⋂_{i=1}^{j−1} E_i), for j ≥ 2, is at most 1/s_0. We show how to derive the first few of these inequalities from Lemmas 6.2–6.7: the first is at most 1/s_0 by Lemma 6.2(1); the next is at most 1/s_0 by Lemma 6.2(2); the next follows, using the fact that m_B + m_C ≤ s_0, by Lemma 6.3(1). For j = 5, …, 2k+9, the upper bound on P(E_j^c ∩ ⋂_{i=1}^{j−1} E_i) follows either as for j = 3 or as for j = 4; it is important here that each m_S is much smaller than s_0.

We now have the following consequence for an equilibrium copy (Y_t) of the (n, d, λ)-supermarket process. Corollary 6.9. P(Y_t ∈ H for all t ∈ [0, s_0]) ≥ 1 − (4k+20)/s_0 ≥ 1 − e^{−(1/4) log² n}, for n sufficiently large.
Proof. Recall the definitions of ℓ* and b* in Section 4. Set also q* = q(ℓ*, b*), and note that q* ≤ s_0/2, with plenty to spare. From Lemma 4.2, we have that P(Y_0 ∉ A_0) ≤ ne^{−log² n} ≤ e^{−(1/3) log² n} = 1/s_0, for n ≥ 5. Also, from Lemma 4.4, for a copy (X^x_t) of the process starting in a state x ∈ A_0, we have that P(T†_A < s_0) ≤ 1/s_0. Combining these bounds with Proposition 6.8 and stationarity gives the result. The first part of Theorem 6.1 now follows, since we have already noted that H^ε ⊆ N^ε.
We can also use Corollary 6.9 to prove the following more explicit version of Proposition 6.8.

Theorem 6.10. Suppose that ℓ and b are at least k, and that q(ℓ, b) ≤ s_0/2. Let x_0 be any queue-lengths vector in A_0(ℓ, b), and suppose that X_0 = x_0 a.s. Then, for n sufficiently large, the bound below holds.

Proof. We apply, successively, Proposition 6.8, Lemma 4.3 and Corollary 6.9, to obtain a lower bound on P(X_t ∈ H for all t ∈ [q(ℓ, b), s_0]).

To see the final assertion of Theorem 6.1, suppose that X_0 = x_0 a.s., where x_0 is in the set I = I^ε of states lying in each of the sets B_0, C_0, D_0, E_0, G^j_0 (j = k−1, …, 1) and H. Then all the hitting times T_B, T_C, T_D, T_E, T^j_G and T_H are equal to 0. In the notation of the proof of Proposition 6.8, this implies that the events E_j for j even occur with probability 1. Also, by Lemma 4.4, P(E_1^c) ≤ 1/s_0. So following the proof of Proposition 6.8 yields the corresponding bound for X_0 = x_0 ∈ I. It can easily be seen that N^{ε/6} ⊆ I^ε, and hence this result completes the proof of Theorem 6.1.
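The conclusion of Theorem 6.1 — in equilibrium almost all queues have length exactly k = ⌈α/β⌉, and none are longer — is visible already at modest n. The sketch below simulates the jump chain with illustrative parameters α = 1/2, β = 3/4 (so k = 1); the asserted thresholds are loose empirical choices, not the theorem's bounds, and the parameter values are assumptions made for the example.

```python
import random

def simulate_supermarket(n, d, lam, steps, seed=0):
    """Jump chain of the (n, d, lam)-supermarket model: with probability
    lam/(1+lam) a customer arrives and joins the first shortest of d queues
    sampled uniformly with replacement; otherwise a uniformly chosen queue
    loses a customer if it is non-empty."""
    rng = random.Random(seed)
    x = [0] * n
    p_arrival = lam / (1.0 + lam)
    for _ in range(steps):
        if rng.random() < p_arrival:
            best = rng.randrange(n)
            for _ in range(d - 1):
                j = rng.randrange(n)
                if x[j] < x[best]:  # keep the first sampled queue on ties
                    best = j
            x[best] += 1
        else:
            j = rng.randrange(n)
            if x[j] > 0:
                x[j] -= 1
    return x

# Illustrative regime: n = 256, alpha = 1/2, beta = 3/4, so that
# 1 - lam = n**(-1/2) = 1/16, d = n**(3/4) = 64 and k = ceil(alpha/beta) = 1.
x = simulate_supermarket(256, 64, 1.0 - 1.0 / 16, 80_000, seed=2)
frac_k = sum(1 for q in x if q == 1) / 256.0
assert max(x) <= 2    # no queues of length much more than k = 1
assert frac_k > 0.8   # the bulk of queues have length exactly k
```

In this regime one expects a fraction about 1 − λ = 1/16 of empty queues, with almost all remaining queues at length exactly 1 and a vanishing fraction at length 2.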

7. Proofs of Lemmas 6.2, 6.3 and 6.4

In this section, we prove the first three of the sequence of lemmas stated in the previous section, and also derive tighter inequalities on the drifts of the functions Q_j(x) for x ∈ D_1. The proofs of the three lemmas are all straightforward applications of Lemma 3.3, and all similar to one another.

Proof of Lemma 6.2
Proof. We apply Lemma 3.3. We set (ϕ_t) = (F_t), the natural filtration of the process, and also s = s_0 = e^{(1/3) log² n} and T* = 0. It is clear that ρ ≥ 2 and that Q_k(x) ≤ c := kn for any x ∈ Z^n_+. We note also that Q_k takes jumps of size at most 1.
Suppose now that Q_k(x) ≥ h. The final inequality above holds comfortably, as (1−λ)d^k = n^{−α+kβ} = n^δ for some δ > 0. Hence, by Lemma 5.1, for x with Q_k(x) ≥ h, we obtain the required drift bound.

We have now verified that the conditions of Lemma 3.3 are satisfied, for the given values of the parameters. As in the lemma, we have T_0 = T†_A. It need not be the case that T_1 = T_B, since X_{T_1} need not be in A_1. However, we do have {T_1 < T†_A} ⊆ {T_B ≤ T_1}. Also the events {T_2 ≤ s_0 < T†_A} and {T†_B ≤ s_0 < T†_A} coincide, so we have the required bounds. Here we used that 1 − 2α + (k−1)β > 0.
Proof of Lemma 6.3

Proof. Again we apply Lemma 3.3 to the Markov process (X_t) with its natural filtration. Set F = P_{k−1}, S = B_1 and p = 1. It is again clear that ρ ≥ 2, that P_{k−1} takes jumps of size at most 1, and that P_{k−1}(x) ≤ c := kn for all x ∈ Z^n_+. Here T_0 = T†_B, T_1 = inf{t ≥ T_B : P_{k−1}(X_t) ≤ h}, and T_2 = inf{t > T_1 : P_{k−1}(X_t) ≥ h + ρ}. For x ∈ B_1 with P_{k−1}(x) ≥ h, we have Q_k(x) ≤ (1+2ε)n(1−λ)(λd)^{k−1}, and so, by Lemma 5.3, we conclude that, for such x, the drift condition holds. As in the previous lemma, it need not be the case that T_1 = T_C, since X_{T_1} need not be in B_1, so we may have T_C > T_1. However, we do have {T_1 < T†_B} ⊆ {T_C ≤ T_1}. Similarly, the events {T_2 ≤ s_0 < T†_B} and {T†_C ≤ s_0 < T†_B} coincide, and so the result follows for k ≥ 2.

Sketch of proof of Lemma 6.4

Proof. The basic plan for this proof is the same as for the previous two lemmas, but here we have to take account of the fact that Q_{k−1} can take jumps of size up to (λd)^{(k−2)/2}, and accordingly we apply Lemma 3.3 to the "scaled" function Q̄_{k−1}(x) = Q_{k−1}(x)/(λd)^{(k−2)/2}.
Apart from this, the proof is identical in structure to that of Lemma 6.3, and we give only the key calculation. For x in the relevant set, by Lemma 5.2 with j = k−1, we obtain a drift bound for Q_{k−1}. Thus, for such x, the drift of the scaled function satisfies ∆Q̄_{k−1}(x) ≤ −(1/2)ε(1−λ)(λd)^{k/2} := −v. Now Q̄_{k−1}(x) ≤ c := 2n for all x by (5.3), and m_D v = 2c.
It is now straightforward to derive the result.
A queue-lengths vector x ∈ D_1 satisfies three inequalities; in fact the second of these is redundant, as it follows from the other two. Substituting these bounds into the bounds of Lemmas 5.1 and 5.2, we obtain the following.

8. Proof of Lemma 6.5
This section is devoted to the rather more complex proof of Lemma 6.5. First, we prove a statement stronger than part (1) of the lemma. We define a set K with K ⊆ E_0, so to prove Lemma 6.5(1) it suffices to prove that the process enters K quickly, with high probability. We prove this result on the assumption that T_D = 0 (i.e., that x_0 ∈ A_0 ∩ B_0 ∩ C_0 ∩ D_0). The general case follows immediately by applying the result for T_D = 0 to the shifted process (X′_t) = (X_{T_D+t}), using the strong Markov property. We define the following further sets, hitting times and exit times. We first define the pair (L^{k+1}_0, L^{k+1}_1); also, for j = k, …, 1, we define pairs (L^j_0, L^j_1), with associated hitting times W_{L^j} and exit times W†_{L^j}. Our goal is to show that P(W†_{L^{k+1}} < m_E) ≥ 1 − 1/s_0. If x_0 ∈ K, then W†_{L^{k+1}} = 0 and we are done, so we may assume that x_0 ∉ K, and hence that x_0 ∈ L^{k+1}_1. Thus Lemma 6.5(1) follows from the proposition below.
Proposition 8.1. Let x_0 be any queue-lengths vector in L^{k+1}_1. For a copy (X_t) of the (n, d, λ)-supermarket process with X_0 = x_0 a.s., we have P(W†_{L^{k+1}} < m_E) ≥ 1 − 1/s_0. For the proof of Proposition 8.1, we fix a state x_0 ∈ L^{k+1}_1, and work with a copy (X_t) of the (n, d, λ)-supermarket process where X_0 = x_0 a.s.
Our general plan for proving Proposition 8.1 is as follows. We suppose that the process (X_t) stays inside L^{k+1}_1, with the aim of showing that this event has low probability. Observe that, while the process remains in L^{k+1}_1, u_{k+1}(X_t) remains above ε(1−λ). This "excess" in u_{k+1} would result in a downward drift in Q_k(X_t), so if the process does not exit L^{k+1}_1 quickly, then it enters L^k_0 quickly, and stays in L^k_1 throughout the interval [0, m_E): i.e., W_{L^k} is small and W†_{L^k} is large, with high probability. This means that Q_k(X_t) maintains a "deficit" compared to Q̃_k := n(1−λ)(λd)^{k−1} until time m_E. A deficit in Q_k(X_t) would lead to a deficit in each Q_j(X_t) in turn, compared to Q̃_j := n(1−λ)(λd)^{j−1}, for j = k−1, k−2, …, 1: each W_{L^j} is small, and each W†_{L^j} is large, with high probability. Finally, a deficit in Q_1(X_t) compared to Q̃_1 = n(1−λ) is unsustainable, as this would lead to a downward drift in the total number of customers over a long enough time interval to empty the entire system of customers. This would entail exiting the set B_1 ⊇ L^{k+1}_1, a contradiction.
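The reason the cascade terminates at level k is a matter of exponents: assuming 1 − λ = n^{−α} and d = n^{β} as in the regime of the paper, Q̃_j/n is n^{−α+(j−1)β} up to constants, which is o(1) for j ≤ k = ⌈α/β⌉ but at least of order 1 for j = k+1. A small worked check:

```python
import math

def profile_exponents(alpha, beta):
    """Exponents e_j with Q~_j / n = n**(e_j) up to constants, where
    Q~_j = n(1-lam)(lam*d)**(j-1), 1 - lam = n**(-alpha) and d = n**beta."""
    k = math.ceil(alpha / beta)
    exponents = [-alpha + (j - 1) * beta for j in range(1, k + 2)]
    return k, exponents

k, e = profile_exponents(0.5, 0.2)  # k = ceil(0.5 / 0.2) = 3
assert k == 3
assert all(ej < 0 for ej in e[:k])  # Q~_1, ..., Q~_k are all o(n)
assert e[k] >= 0                    # Q~_{k+1} would be at least of order n
```

This matches the exponent −α + (k−1)β appearing in the statement of the main theorem, and the observation in Section 7 that (1−λ)d^k = n^{−α+kβ}.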
As in earlier lemmas, corresponding relations hold between these hitting and exit times. The next lemma states that, if the process stays in some set L^{j+1}_1 for a long time, then it quickly enters the "next" set L^j_0, and stays in L^j_1 for a long time.
This proof is very similar to those of earlier lemmas, and we mention only a few points. As in Lemma 6.4, we apply Lemma 3.3 to the scaled function Q̄_j(x) = Q_j(x)/(λd)^{(j−1)/2}. The key step is to show that, for x ∈ L^{j+1}_1, the drift of Q_j is suitably negative. The proof now proceeds as earlier ones.
For part (2) of the lemma, we set ρ = (ε/24k) n(1−λ)(λd)^{(j−1)/2}. We make use of the fact that the value of Q_j only changes if either (i) the event is an arrival, and some queue of length at most j−1 is inspected, or (ii) the event is a departure from some queue of length at most j. From any state x ∈ L^j_1, the probability of (i) is at most d(1 − u_j(x)) ≤ dQ_j(x)/n ≤ (1−λ)d^j, and the probability of (ii) is at most (1−λ)d^j as well. Hence we may apply Lemma 3.3(ii), with p = 2(1−λ)d^j.
We now prove a hitting time lemma for ‖X_t‖_1, the total number of customers in the system at time t. Let W_M = min{t ≥ W_{L^1} : ‖X_t‖_1 = 0}.
We now combine Lemmas 8.2, 8.3 and 8.4 to prove Proposition 8.1.
Observe that, for a copy (X_t) of the (n, d, λ)-supermarket process starting in a state x_0 ∈ L^{k+1}_1, exactly one of the following occurs:
(a) W†_{L^{k+1}} < m_E;
(b) not (a), and one of W†_{L^k}, W†_{L^{k−1}}, …, W†_{L^1} is less than m_E;
(c) neither of the above, and W_{L^k} > 12kε^{−1}n(1−λ)^{−1};
(d) none of the above, and W_{L^j} > W_{L^{j+1}} + ε^{−1}n(1−λ)^{−1} for some j = k−1, …, 1;
(e) none of the above, and W_M > W_{L^1} + 72bε^{−1}n(1−λ)^{−1};
(f) none of the above, and W_M < m_E ≤ W†_{L^{k+1}}.
Indeed, if none of (a)–(e) occurs, then W†_{L^{k+1}} ≥ m_E since (a) fails, and also W_M ≤ W_{L^k} + (k−1)ε^{−1}n(1−λ)^{−1} + 72bε^{−1}n(1−λ)^{−1} < m_E, so that (f) occurs. We now show that the probability of each of (b)–(f) is small. For (b), Lemmas 8.2(2) and 8.3(2) give that the probability of (b) is at most 1/2s_0. The probability of (c) is at most 1/12s_0 by Lemma 8.2(1). The probability of (d) is at most (k−1)·(1/3ks_0) ≤ 1/3s_0 by Lemma 8.3(1). The probability of (e) is at most 1/12s_0 by Lemma 8.4. Finally, (f) is not possible, since at time W_M there are no customers in the system, so Q_k(X_{W_M}) > n, and thus W_M ≥ T†_B; but also T†_B ≥ W†_{L^{k+1}}, since L^{k+1}_1 ⊆ D_1 ⊆ B_1 by definition. Thus the probability of (a), for a copy of the process starting in a state in L^{k+1}_1, is at least 1 − 1/2s_0 − 1/12s_0 − 1/3s_0 − 1/12s_0 = 1 − 1/s_0, which is what we need to prove Proposition 8.1, and thus also Lemma 6.5(1).

Now we move to the proof of Lemma 6.5(2), stating that the exit time T†_E is large with high probability. There are two things to prove here. The first is that, if X_t ∈ E_1, then it is very unlikely that, at time t+1, a customer arrives and creates a queue of length k+1. The second is that Q_k(X_t) is unlikely to fall too far below its value on entering E_0. For t ≥ 0, let L_t denote the event that, at time t, a customer arrives and joins a queue of length at least k (equivalently, the event that the step is an arrival and all the selected queues have length at least k). So L_t is the event that u_j(X_t) > u_j(X_{t−1}) for some j ≥ k+1.
Lemma 8.5. On the event that X_t ∈ E_1, we have P(L_{t+1} | F_t) < e^{−log² n}.
Proof. From the definition of L_t, we can bound P(L_{t+1} | F_t) in terms of u_k(X_t)^d ≤ exp(−d(1 − u_k(X_t))). Hence, on the event that X_t ∈ E_1, where 1 − u_k(X_t) is suitably bounded below, we obtain the stated bound, as required.
We claim that each of these last two probabilities is at most 1/2s_0. For the first, we may apply Lemma 8.5. Observe that, if U† = t+1, then the event L_{t+1} occurs. By Lemma 8.5, each term in the resulting sum is at most e^{−log² n}, and so we have the first bound. To obtain the other required inequality, we apply the reversed version of Lemma 3.3(ii). We consider the process (X_t), with its natural filtration, the function F = Q_k, and the set S = {x : u_{k+1}(x) ≤ ε(1−λ)} ∩ D_1. We set h = (1−3ε)n(1−λ)(λd)^{k−1} and ρ = εn(1−λ)(λd)^{k−1} ≥ 2. We also set s = s_0 and T* = T_E. Take x ∈ S with Q_k(x) ≤ h. As x ∈ D_1, we apply Lemma 7.1 to obtain a suitable drift bound for such x. The reversed version of Lemma 3.3(ii) then gives the required inequality. This completes the proof of Lemma 6.5.
9. Proofs of Lemmas 6.6 and 6.7

In this section, we prove the final two of our sequence of lemmas.
Proof of Lemma 6.6

Proof. Fix j with 1 ≤ j ≤ k−1, and consider the state of the process at the hitting time T_{G^{j+1}}. The hitting time T_{G^j} is the first time t ≥ T_{G^{j+1}} that Q_j(X_t) lies in the interval between (1 − (4 + (k−j−1/2)/k)ε) n(1−λ)(λd)^{j−1} and (1 + (4 + (k−j−1/2)/k)ε) n(1−λ)(λd)^{j−1}. Let B_h be the event that Q_j(X_{T_{G^{j+1}}}) > (1 + (4 + (k−j−1/2)/k)ε) n(1−λ)(λd)^{j−1}, and B_ℓ be the event that Q_j(X_{T_{G^{j+1}}}) < (1 − (4 + (k−j−1/2)/k)ε) n(1−λ)(λd)^{j−1}. For part (1) of the lemma, we have to show that, on the event B_h, with high probability Q_j(X_t) enters the interval from above within time m_G, and also that, on the event B_ℓ, with high probability Q_j(X_t) enters the interval from below within time m_G. These two results are essentially the same, and we give details only for the first. Of course, we have nothing to prove on the event that Q_j(X_{T_{G^{j+1}}}) is already in the interval.
To prove part (2) of the lemma, we need to show that, once X_t has reached G^j_0, and while it remains in G^{j+1}_1, the process is unlikely to leave the set G^j_1 quickly. There are two separate things to prove: that Q_j(X_t) is unlikely to cross against the drift from (1 + (4 + (k−j−1/2)/k)ε) n(1−λ)(λd)^{j−1} to (1 + (4 + (k−j)/k)ε) n(1−λ)(λd)^{j−1} before time s_0, and also that Q_j(X_t) is unlikely to cross against the drift from (1 − (4 + (k−j−1/2)/k)ε) n(1−λ)(λd)^{j−1} to (1 − (4 + (k−j)/k)ε) n(1−λ)(λd)^{j−1} before time s_0. Again, the two calculations required here are essentially identical, and we shall concentrate on the first.
Proof of Lemma 6.7

Proof. We first prove part (1). For i = 1, …, n, let N_i be the number of potential departures from queue i over the time period between T_{G^1} and T_{G^1} + m_H, so N_i is a binomial random variable with parameters (m_H, 1/(n(1+λ))). Recall that L_t is the event that, at time t, a customer arrives and joins a queue of length k or longer, and observe the following. At time T_{G^1}, the process is in A_1(ℓ, b), and so there is no queue with more than 3ℓ customers in it at that time. If there are at least 3ℓ potential departures from each queue over the time interval, and none of the events L_t, for t = T_{G^1}+1, …, T_{G^1}+m_H, occurs, then by time T_{G^1} + m_H, every queue is reduced to length at most k, and no new queue of length k+1 is created before T_{G^1} + m_H. Now let (X′_t) = (X_{T_{G^1}+t}), (F′_t) = (F_{T_{G^1}+t}) and L′_t = L_{T_{G^1}+t}. We have P(⋃_{t=1}^{m_H} L′_t) ≤ m_H e^{−log² n} ≤ 1/2s_0, where we used the strong Markov property, and Lemma 8.5.
Recall that m_H = n(8 + 32 log² n), so that the mean µ of each N_i is m_H/(n(1+λ)) ≥ 4 + 16 log² n. By (3.1), with ε = 1/4, we have P(N_i ≤ 3µ/4) ≤ e^{−µ/32} ≤ e^{−(1/2) log² n} for each i. Thus the probability that there are fewer than 3ℓ departures from some queue over the interval from T_{G^1} to T_{G^1} + m_H is at most ne^{−(1/2) log² n} < 1/2s_0, and part (1) follows.
Thus P(T†_H ≤ s_0 < T†_{G^1}) is at most the probability that X_t exits the set H before time T†_{G^1} ∧ s_0, necessarily by the creation of a new queue of length k+1; this probability is at most 1/s_0, as required.
We say that two queue-lengths vectors are adjacent if they differ by one customer in one queue, and we first consider two copies of the process starting in adjacent states in A 0 ( , b), coupled according to the coupling referred to in Lemma 4.1. The proof partly follows along the lines of the proof of Lemma 2.6 in [11].
Lemma 10.1. Let x, y be a pair of adjacent states in A_0(ℓ, b), with x(j_0) = y(j_0) − 1 for some queue j_0, and x(j) = y(j) for j ≠ j_0. Consider coupled copies (X^x_t) and (X^y_t) of the process. While the two copies differ, they differ in a single queue, the unbalanced queue; let W_t denote the length of the longer of the two versions of this queue at time t, let T_1 < T_2 < ⋯ be the times of events at the unbalanced queue, let T be the time at which the copies coalesce, and let N be the number of such events before time T; for j ≤ N, let S_j = W_{T_j}. The copies can only coalesce via a departure at the unbalanced queue. Also, if N ≥ j, let Z_j be the ±1-valued random variable S_j − S_{j−1}. For each non-negative integer j, let ϕ_j be the σ-field F_{T_{j+1}−1} of all events before time T_{j+1}. Let also A_j be the ϕ_j-measurable event B_{T_{j+1}}, that is, the event that X^y_s, X^x_s ∈ H for each s with T*_H ≤ s ≤ T_{j+1} − 1. We shall use Lemma 3.4. We take the sequences (ϕ_j)_{j≥0}, (Z_j)_{j≥0}, (S_j)_{j≥0} and (A_j)_{j≥0} as defined above, and we set k_0 = k and δ = 1/(λd+1). Note first that, at any time t < T, the probability, conditioned on F_t, of an arrival to the longer of the unbalanced queues is at most dλ/(n(1+λ)), while the conditional probability of a departure from that queue is 1/(n(1+λ)). Therefore, on the event that N ≥ j, the probability, conditioned on ϕ_{j−1}, that the event at time T_j is a departure from the longer unbalanced queue is at least

(1/(n(1+λ))) / (1/(n(1+λ)) + dλ/(n(1+λ))) = 1/(1+dλ) = δ.
In other words, on the event N ≥ j we have P(Z j = −1 | ϕ j−1 ) ≥ δ.
We now show that, on the event {N ≥ j} ∩ A_{j−1} ∩ {S_{j−1} ≥ k}, we have P(Z_j = −1 | ϕ_{j−1}) ≥ 3/4. To see this, consider a time t ≥ T*_H. On the event B_t, we have X_t ∈ H ⊆ E_1, and so, by Lemma 8.5, the conditional probability P(L_{t+1} | F_t) that the event at time t+1 is an arrival to a queue of length k or greater is at most e^{−log² n}. In particular, on the event B_t ∩ {W_{t−1} ≥ k}, the conditional probability that the event at time t+1 is an arrival joining the longer unbalanced queue is at most e^{−log² n}, while the conditional probability that the event at time t+1 is a departure from the longer unbalanced queue is 1/(n(1+λ)). Therefore, on the event {N ≥ j} ∩ A_{j−1} ∩ {S_{j−1} ≥ k}, we have

P(Z_j = −1 | ϕ_{j−1}) ≥ (1/(n(λ+1))) / (1/(n(λ+1)) + e^{−log² n}) ≥ 3/4.
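The final inequality is comfortable already for modest n, since e^{−log² n} (taking log to be the natural logarithm — an assumption about the paper's convention) is tiny compared with 1/(n(1+λ)):

```python
import math

def down_prob_lower_bound(n, lam):
    """Probability that the next relevant event is the departure from the longer
    unbalanced queue (prob 1/(n(1+lam))) rather than the rare competing arrival
    (prob at most exp(-log^2 n) on the good event, by Lemma 8.5)."""
    dep = 1.0 / (n * (1.0 + lam))
    arr = math.exp(-math.log(n) ** 2)
    return dep / (dep + arr)

# Already for modest n the competing arrival term is negligible,
# so the bound 3/4 holds with a great deal of room.
for n in (50, 1000, 10**6):
    assert down_prob_lower_bound(n, 0.99) >= 0.75
```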
We have now shown that S_m − S_0 can be written as a sum Σ_{i=1}^m Z_i of {0, ±1}-valued random variables Z_i that satisfy the conditions of Lemma 3.4, with k_0 = k and δ = 1/(λd+1). (The argument above establishes this for m ≤ N; for m > N, we have set Z_m = S_m = 0, which also meets the requirements of the lemma.) Note that δ^{−(k−1)} = (λd+1)^{k−1} ≤ 2d^{k−1}. Hence, for m ≥ 16k, we obtain the bound given by Lemma 3.4. Here P(·) refers to the coupling measure in the probability space of Section 4, with coupled copies of the process for each possible starting state.