1 Introduction

In recent years, the fork–join model has attracted strong interest. This model is a theoretical abstraction of the popular MapReduce framework [8]. MapReduce is a programming model for processing and generating big data sets with parallel algorithms on clusters. In MapReduce, every job is divided into tasks which can be processed in parallel in any order. For completion of the job, the completed tasks need to be joined together.

1.1 Fork–join model

In the fork–join (\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model, tasks (also referred to as replicas) of a job are assigned to \(n_{\mathrm {F}}\) servers selected uniformly at random. Redundant tasks are abandoned as soon as \(n_{\mathrm {J}}\) of the \(n_{\mathrm {F}}\) tasks either enter service (‘cancel-on-start,’ c.o.s.) or finish service (‘cancel-on-completion,’ c.o.c.). The job is completed when all these \(n_{\mathrm {J}}\) tasks complete service.

Note that in the c.o.s. variant of the fork–join (\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model the dependency structure between the replicas does not play a role, since at all times there is only one replica of the job in service. In contrast, in the c.o.c. variant several replicas of the same job may be in service at the same time, and hence the dependency structure does matter. Special cases of the dependency structure are: (1) perfect dependency between the variables, so-called identical replicas, where the job size is preserved for all replicas, and (2) no dependency at all, so-called i.i.d. replicas.

Analytical results for the fork–join model are unfortunately scarce. Tight characterizations of the response time are only known in the special case of \(n_{\mathrm {F}}=n_{\mathrm {J}}=2\), see [10]. For a survey on results in other special cases, we refer to [29]. Results for the expectation of the response time are established when \(n_{\mathrm {F}}=n_{\mathrm {J}} \rightarrow \infty \), see for example [3, 22]. For a more detailed overview of the results and applications, we refer to [17].

1.2 Redundancy scheduling

Redundancy-d scheduling is a special case within the fork–join model. In redundancy-d scheduling, replicas of a job are assigned to d servers selected uniformly at random. Redundant replicas are abandoned as soon as one of the d replicas either enters service (c.o.s.) or finishes service (c.o.c.). Thus, redundancy-d scheduling is equivalent to the fork–join model with \(n_{\mathrm {F}}=d\) and \(n_{\mathrm {J}}=1\). Observe that the c.o.s. variant of redundancy-d is equivalent to the Join-the-Smallest-Workload-d (JSW-d) policy, which assigns an arriving job to the server with the smallest workload among d servers selected uniformly at random, see [2]. The c.o.c. variant of redundancy-d shares similarities with a strategy that assigns the job to the server that provides the minimum response time among d servers selected uniformly at random, but involves possibly concurrent service of multiple replicas.

It has been empirically shown that redundancy scheduling can improve performance in parallel-server systems [31], especially in case of highly variable job sizes. More specifically, for large-scale applications such as Google search, the ability of redundancy scheduling to reduce the expectation and the tail of the response time has been demonstrated [7]. Our understanding of redundancy scheduling is growing, and in particular the stability condition for c.o.c. redundancy policies has received considerable attention; however, expressions for performance metrics such as the expectation or the distribution of the response time remain scarce. In [16], analytical expressions for the expected response time are obtained for exponential job sizes and independent and identically distributed (i.i.d.) replicas. Under the assumption of asymptotic independence, a fixed-point equation characterizing the response time distribution for identical and i.i.d. replicas is derived in [19].

In this paper, we examine the tail behavior of the response time when job sizes are heavy-tailed, which is one of the most relevant scenarios in redundancy scheduling and the fork–join model. Indeed, heavy tails in parallel processing are encountered in conjunction with the MapReduce framework developed at Google and its Hadoop open source implementation [9]. Moreover, measurement studies show that workload characteristics such as file sizes, CPU times, and session lengths tend to be heavy-tailed, see [17, 23, 32] and the references therein. The tail behavior of the waiting time distribution in the single-server queue is well known, see for example [30] or [32, Chapter 2]. Let \(W_{\mathrm {FCFS}}\) denote the waiting time in the single-server queue with the FCFS discipline. For subexponential (see Definition 2 in Appendix A) residual job sizes \(B^{\mathrm {res}}\),

$$\begin{aligned} \mathbb {P}(W_{\mathrm {FCFS}}> x) \sim \frac{\tilde{\rho }}{1-\tilde{\rho }} \mathbb {P}(B^{\mathrm {res}}> x) ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$
(1)

where \(\tilde{\rho } := \frac{\mathbb {E}[B]}{\mathbb {E}[A]}\) denotes the load with A the interarrival time and B the job size, and

$$\begin{aligned} \mathbb {P}(B^{\mathrm {res}}> x) = \frac{1}{\mathbb {E}[B]} \int _{y=x}^{\infty } \mathbb {P}(B > y) \mathrm {d}y. \end{aligned}$$

In particular, for regularly varying (see Definition 6 in Appendix A) job size distributions with index \(-\nu \), i.e., \(\mathbb {P}( B > x) =x^{-\nu } L(x)\) with \(L(\cdot )\) a slowly varying function at infinity,

$$\begin{aligned} \mathbb {P}(W_{\mathrm {FCFS}} > x) \sim \frac{\tilde{\rho }}{1-\tilde{\rho }} \frac{1}{(\nu - 1)\mathbb {E}[B]} L(x) x^{1-\nu } ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$
(2)

One way to understand the tail index \(1-\nu \) is the following. The workload (and waiting time) in an M/G/1 queue is distributed as a geometric(\(\tilde{\rho }\)) sum of residual job sizes \(B^{\mathrm {res}}\). By the theory of regular variation [4], loosely speaking, regular variation is preserved under integration, and asymptotically one may integrate as if the slowly varying factor L(y) were constant, taking it outside the integral; so

$$\begin{aligned} \mathbb {P}(B^{\mathrm {res}} > x) = \frac{1}{\mathbb {E}[B]} \int _{y=x}^{\infty } L(y) y^{-\nu } \mathrm {d}y \sim \frac{1}{(\nu - 1)\mathbb {E}[B]} L(x) x^{1-\nu } ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$
(3)

which implies that if B is regularly varying with index \(-\nu \), then \(B^{\mathrm {res}}\) is regularly varying with index \(1-\nu \).
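
As a sanity check, relation (3) can be verified numerically. The sketch below is purely illustrative and not part of the analysis: it takes a Pareto job size with \(\mathbb {P}(B > x) = x^{-\nu }\) for \(x \ge 1\) (so \(L(x) \equiv 1\)) and compares a numerical evaluation of the residual tail with the right-hand side of (3).

```python
import math

def pareto_tail(y, nu):
    """P(B > y) for a Pareto job size on [1, inf) with index nu (L(x) = 1)."""
    return 1.0 if y < 1.0 else y ** (-nu)

def residual_tail_numeric(x, nu, factor=1e6, n=20000):
    """P(B_res > x) = (1/E[B]) * int_x^inf P(B > y) dy, evaluated by the
    trapezoid rule on a log-spaced grid (substituting y = e^t, dy = e^t dt)."""
    mean_b = nu / (nu - 1.0)            # E[B] for this Pareto distribution
    lo, hi = math.log(x), math.log(x * factor)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * pareto_tail(math.exp(t), nu) * math.exp(t)
    return total * h / mean_b

def residual_tail_asymptotic(x, nu):
    """Right-hand side of (3) with L(x) = 1."""
    mean_b = nu / (nu - 1.0)
    return x ** (1.0 - nu) / ((nu - 1.0) * mean_b)

nu, x = 1.5, 50.0
num = residual_tail_numeric(x, nu)
asy = residual_tail_asymptotic(x, nu)
print(num, asy)  # the two values agree up to the truncation error
```

For this particular choice of L the asymptotic relation is in fact exact for \(x \ge 1\); the small remaining discrepancy comes from truncating the integral.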

The tail behavior in the single-server queue has also been studied for other service disciplines. For regularly varying job sizes, the random order of service (ROS) discipline has the same tail index as the FCFS discipline, but with a smaller pre-factor [6]. For the last-come first-served with preemptive resume (LCFS-PR) discipline and the processor-sharing (PS) discipline, the tail index of the response time for regularly varying job sizes is the same as the tail index of the job size, see [33, 34], respectively. Thus, from a tail perspective, these service disciplines perform better than the FCFS discipline.

More closely related to the c.o.s. variant of redundancy scheduling are the results on the tail behavior of the waiting time for the Join-the-Smallest-Workload (JSW) policy, or equivalently the GI/G/N queue, see [12, 13]. The key idea in [12, 13], namely to first consider deterministic interarrival times, makes the derivation of the tail behavior substantially more tractable. In [13], it is shown that for long-tailed residual job sizes and \(\tilde{\rho }>k\), where \(k := \left\lfloor \tilde{\rho } \right\rfloor \) is the integer part of the load,

$$\begin{aligned} \mathbb {P}(W_{\mathrm {JSW}}> x) \ge \frac{\tilde{\rho }^{N-k}+ o(1)}{(N-k)!} \mathbb {P} \left( B^{\mathrm {res}} > \frac{\tilde{\rho }+\delta }{\tilde{\rho }-k} x \right) ^{N-k} ~~~ \text {as } x\rightarrow \infty , \end{aligned}$$
(4)

for any \(\delta >0\). For subexponential residual job sizes and \(\tilde{\rho } < k+1\), it is shown that

$$\begin{aligned} \mathbb {P}(W_{\mathrm {JSW}}> x) \le {N \atopwithdelims ()k} \left( \frac{(k+1)\tilde{\rho }}{(k+1)-\tilde{\rho }}+o(1)\right) ^{N-k} \mathbb {P} \left( B^{\mathrm {res}}> x (1-\delta ) \right) ^{N-k} ~~~ \text {as } x\rightarrow \infty . \end{aligned}$$
(5)

A heuristic explanation for the exponent \(N-k\) in Eq. (4) is as follows. After the arrival of \(N-k\) big jobs, \(N-k\) servers will be working on these big jobs for a very long time. The other k servers form an unstable GI/G/k system, which implies that the workload drifts linearly to infinity. Thus, eventually the workload at all N servers will exceed level x, causing the waiting time of an arriving job to be larger than x.
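
The linear drift in this heuristic can be illustrated with a minimal workload recursion. The sketch below, with exponential interarrival times and job sizes as an illustrative choice (not prescribed by the model), shows that in an overloaded single queue with load \(\tilde{\rho } > 1\) the workload grows roughly linearly at rate \(\tilde{\rho } - 1\).

```python
import random

def workload_drift(rho, n_jobs=200_000, seed=1):
    """Workload recursion V <- max(V + B - A, 0) for a single queue with
    exponential interarrival times (E[A] = 1) and exponential job sizes
    (E[B] = rho). For rho > 1 the queue is unstable and the final workload
    divided by elapsed time approaches rho - 1."""
    rng = random.Random(seed)
    v = t = 0.0
    for _ in range(n_jobs):
        a = rng.expovariate(1.0)        # interarrival time
        b = rng.expovariate(1.0 / rho)  # job size
        v = max(v + b - a, 0.0)
        t += a
    return v / t

drift = workload_drift(1.5)
print(drift)  # close to rho - 1 = 0.5
```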

In this paper, we investigate the tail behavior of the response time for both the c.o.s. and c.o.c. variants of redundancy scheduling and the fork–join model when job sizes are heavy-tailed. Throughout the paper, we assume that the system under consideration is in steady state. For regularly varying job sizes with tail index \(-\nu \) and the FCFS discipline, it is shown that the response time for the c.o.s. variant of redundancy-d has tail index \(-\min \{d_{\mathrm {cap}}(\nu -1),\nu \}\), where \(d_{\mathrm {cap}} = \min \{d,N-k\}\) and \(k = \lfloor \tilde{\rho } \rfloor \). For small loads, this result indicates that for \(d < \frac{\nu }{\nu -1}\) the waiting time component is dominant, whereas for \(d > \frac{\nu }{\nu -1}\) the job size component is dominant. Thus, having \(d = \lceil \frac{\nu }{\nu -1} \rceil \) replicas already achieves the optimal asymptotic tail behavior of the response time, and creating even more replicas yields no improvement in terms of response time tail asymptotics. For high loads, the results indicate that creating many replicas yields no benefits for the tail index of the response time. For the c.o.c. variant of the more general fork–join (\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with identical and i.i.d. replicas, the tail index of the response time is \(1-\nu \) and \(1-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \), respectively, and the waiting time component is always dominant. Note that in this case the tail index is independent of the load of the system, and for identical replicas even independent of the number of replicas. In the special case of redundancy-d scheduling with identical and i.i.d. replicas, it follows that the tail index of the response time is \(1-\nu \) and \(1-d\nu \), respectively. All these results for the c.o.c. variant rely on the fact that the upper bound system, which is used in the proof, is stable.
The stability condition of this system does not necessarily coincide with the stability condition of the original fork–join model.

For the LCFS-PR discipline in the fork–join model, we show that the response time tail is just as heavy as the job size tail, implying that for the c.o.c. variant this discipline achieves better tail asymptotics than the FCFS discipline. For the c.o.s. variant, the LCFS-PR discipline has better tail asymptotics than the FCFS discipline for scenarios with low load and a small number of replicas; in all other scenarios, both service disciplines have similar tail asymptotics. In [27], it is shown that for the c.o.c. variant of redundancy-d scheduling with the PS discipline the tail index of the response time is \(-\nu \) for identical replicas and \(-d\nu \) for i.i.d. replicas. Table 1 provides an overview of the tail index for the various models and service disciplines.

Table 1 Overview of the tail index for the c.o.s. and c.o.c. variant of redundancy scheduling with various service disciplines where the job size is regularly varying with tail index \(-\nu \)

The remainder of the paper is organized as follows. In Sect. 2, we provide a model description and state preliminary results. In Sect. 3, we characterize the tail behavior of the response time for the c.o.s. variant of redundancy scheduling and the c.o.c. variant of the more general fork–join model with the FCFS discipline, with some proofs deferred to Appendix B. In Sect. 4, we discuss the tail behavior in the fork–join model with the LCFS-PR discipline. Section 5 provides numerical results on the tail behavior of the response time in redundancy scheduling with Pareto distributed job sizes. Section 6 contains conclusions and some suggestions for further research. The paper ends with two appendices. Appendix A collects various definitions and results for heavy-tailed random variables, which will be used in the paper. Appendix B provides the proof of part of Theorem 1.

2 Model description and preliminaries

Consider a system of N parallel unit-speed servers. Jobs arrive at the epochs of a renewal process, with successive interarrival times \(A_{i}\), \(i \ge 1\), each distributed as a generic random variable A. When a job arrives, a dispatcher assigns replicas of the job to \(n_{\mathrm {F}}\) servers chosen uniformly at random (without replacement), where \(1 \le n_{\mathrm {F}} \le N\). We consider two possible variants where redundant replicas are abandoned as soon as \(n_{\mathrm {J}}\) of the \(n_{\mathrm {F}}\) replicas either have entered service (c.o.s.) or have finished service (c.o.c.). If in the c.o.s. variant multiple replicas enter service at exactly the same time, then one of these replicas is chosen uniformly at random and starts service. A special case of the fork–join model is redundancy-d scheduling, where \(n_{\mathrm {F}}=d\) and \(n_{\mathrm {J}}=1\). As observed in the introduction, in the c.o.s. variant of redundancy-d the dependency structure between the replicas does not play a role, but in the c.o.c. variant of redundancy-d, and also in the fork–join model, the dependency structure does matter. We thus allow the replica sizes \(B_{1},\dots ,B_{n_{\mathrm {F}}}\) of a job to be governed by some joint distribution function \(F_{B}(b_{1},\dots ,b_{n_{\mathrm {F}}})\), where \(B_{i}\), \(i=1,\dots ,n_{\mathrm {F}}\), are each distributed as some random variable B, but not necessarily independent. Special cases of the dependency structure are: (1) perfect dependency between the variables, so-called identical replicas, where the job size is preserved for all replicas, i.e., \(B_{i}=B\), for all \(i=1,\dots ,n_{\mathrm {F}}\), (2) no dependency at all, so-called i.i.d. replicas.
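
To make the two special dependency structures concrete, the following sketch draws the replica sizes of a single job under each of them; the Pareto distribution and the function name are illustrative assumptions, not part of the model.

```python
import random

def replica_sizes(n_f, dependency="identical", nu=2.5):
    """Draw the n_F replica sizes of one job, each marginally Pareto(nu) on
    [1, inf). 'identical' copies one draw to all replicas (perfect
    dependency); 'iid' draws each replica independently. The Pareto marginal
    is an illustrative choice, not prescribed by the model."""
    def pareto():
        return random.random() ** (-1.0 / nu)  # inverse-CDF sampling
    if dependency == "identical":
        b = pareto()
        return [b] * n_f
    return [pareto() for _ in range(n_f)]

random.seed(0)
print(replica_sizes(3, "identical"))  # all three entries are equal
print(replica_sizes(3, "iid"))        # three independent draws
```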

Finally, let us denote the steady-state waiting times of the replicas at their \(n_{\mathrm {F}}\) servers (the time until their service starts, if they are still in the system) by \(W_1,\dots ,W_{n_{\mathrm {F}}}\), and the steady-state response time by R. Let \(X_{(n_{\mathrm {J}})}\) denote the \(n_{\mathrm {J}}\)th order statistic of a set of random variables \(X_{1},\dots ,X_{N}\). Two real random variables \(Y_{1}\) and \(Y_{2}\) are said to be equal in distribution (denoted by \(Y_{1} {\mathop {=}\limits ^{d}} Y_{2}\)) if \(\mathbb {P}(Y_{1}> x) = \mathbb {P}(Y_{2} > x)\) for all \(x \in (-\infty , \infty ) \).

3 FCFS discipline

In this section, we analyze the tail asymptotics of the response time with the FCFS discipline. For the c.o.s. variant (Sect. 3.1), we restrict ourselves to redundancy-d scheduling, whereas for the c.o.c. variant (Sect. 3.2) we allow for the more general fork–join model.

3.1 Cancel-on-start

Observe that the steady-state response time in the c.o.s. variant of redundancy-d is given by

$$\begin{aligned} R {\mathop {=}\limits ^{d}} \min \{W_{1},\dots ,W_{d}\} + B. \end{aligned}$$
(6)

We refer to the time between the arrival of a job and the moment the first replica goes into service as the waiting time \(W_{\mathrm {min}} = \min \{W_{1},\dots ,W_{d}\}\) of a job. As mentioned earlier, the c.o.s. variant of redundancy-d is equivalent to the Join-the-Smallest-Workload-d (JSW-d) policy, which assigns each job to the server with the smallest workload among d servers selected uniformly at random.

For general interarrival times and job sizes, the stability condition for the system with the JSW-d policy and FCFS is given by \(\tilde{\rho } = \frac{\mathbb {E}[B]}{\mathbb {E}[A]} < N\), see [11].
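
The JSW-d dynamics (and hence, by the equivalence above, the c.o.s. variant of redundancy-d) can be sketched as a simple workload-vector recursion. Poisson arrivals and Pareto job sizes are illustrative assumptions here; the code estimates the waiting times \(W_{\mathrm {min}}\).

```python
import random

def simulate_jsw_d(n_servers, d, lam, nu, n_jobs=50_000, seed=42):
    """Workload recursion for JSW-d (equivalently, c.o.s. redundancy-d):
    each job samples d servers uniformly at random without replacement and
    joins the one with the smallest workload. Poisson arrivals of rate lam
    and Pareto(nu) job sizes on [1, inf) are illustrative choices. Returns
    the per-job waiting times W_min."""
    rng = random.Random(seed)
    v = [0.0] * n_servers
    waits = []
    for _ in range(n_jobs):
        a = rng.expovariate(lam)
        v = [max(w - a, 0.0) for w in v]          # work drains between arrivals
        chosen = rng.sample(range(n_servers), d)  # d servers, without replacement
        target = min(chosen, key=lambda i: v[i])  # smallest workload among them
        waits.append(v[target])
        v[target] += rng.random() ** (-1.0 / nu)  # Pareto(nu) job size
    return waits

# load rho = E[B]/E[A] = (nu/(nu-1))/1 ~ 1.67 < N = 5, so the system is stable
waits = simulate_jsw_d(n_servers=5, d=2, lam=1.0, nu=2.5)
print(sum(waits) / len(waits))
```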

In [13, Theorem 1.6], lower and upper bounds are derived for the tail probability of the waiting time for the JSW policy. The same methodology can be used to find lower and upper bounds for JSW-d, and hence for the c.o.s. variant of redundancy scheduling with \(1 \le d \le N\) replicas, resulting in Theorem 1. The two derived lower bounds in this theorem hold for every value of \(\tilde{\rho }\), but they are asymptotically dominant for different regions of \(\tilde{\rho }\), as explained after the theorem. Note that for \(d=N\) Theorem 1 recovers the results of [13] as captured in (4) and (5), whereas for \(d=1\) the system is equivalent to a GI/G/1 queue for which the tail behavior is given by (2).

Theorem 1

Consider the c.o.s. variant of redundancy-d scheduling with the FCFS discipline. Let \(k = \lfloor \tilde{\rho } \rfloor \in \{0,1,\dots ,N-1\}\) be the integer part of the load and \(\delta >0\).

i) If the residual job size \(B^{\mathrm {res}}\) is long-tailed, then

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}} > x) \ge \frac{1}{{N \atopwithdelims ()d}} \frac{\tilde{\rho }^{d}+ o(1)}{d!} \left( \bar{B}^{\mathrm {res}} \left( \left( 1 + \delta \right) x \right) \right) ^{d}. \end{aligned}$$
(7)

ii) If \(\tilde{\rho } < N - d\) and the residual job size \(B^{\mathrm {res}}\) is subexponential, then

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}} > x) \le {N \atopwithdelims ()d} \left( \frac{(k+1)\tilde{\rho }}{k+1-\tilde{\rho }}+o(1)\right) ^{d} \left( \bar{B}^{\mathrm {res}} \left( \frac{x (1-\delta )}{k+1} \right) \right) ^{d}. \end{aligned}$$
(8)

iii) If the residual job size \(B^{\mathrm {res}}\) is long-tailed, then

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}} > x) \ge \frac{\tilde{\rho }^{N-k}+ o(1)}{(N-k)!} \left( \bar{B}^{\mathrm {res}}\left( \frac{\tilde{\rho }+\delta }{\tilde{\rho }- k} x\right) \right) ^{N-k}. \end{aligned}$$
(9)

iv) If \(\tilde{\rho } > N - d\) and the residual job size \(B^{\mathrm {res}}\) is subexponential, then

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}} > x) \le {N \atopwithdelims ()k} \left( \frac{(k+1)\tilde{\rho }}{k+1-\tilde{\rho }}+o(1)\right) ^{N-k} \left( \bar{B}^{\mathrm {res}} \left( \frac{(k+1-N+d)x (1-\delta )}{k+1} \right) \right) ^{N-k}. \end{aligned}$$
(10)

Proof

Let \(\varvec{V}=(V_{1},\dots ,V_{N})\) denote the vector of residual workloads of the servers. Recall that \(V_{(i)}\) denotes the ith-order statistic of the set \(V_{1},\dots ,V_{N}\). The proof of i) follows from the inequality

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}}> x) \ge \frac{1}{{N \atopwithdelims ()d}} \mathbb {P}(V_{(1)}> x,\dots ,V_{(d)} > x), \end{aligned}$$

with \(\frac{1}{{N \atopwithdelims ()d}}\) corresponding to the probability that the replicas of an arbitrary job are assigned to the servers with the d largest workloads, and where

$$\begin{aligned} \mathbb {P}(V_{(1)}> x,\dots ,V_{(d)} > x) \ge \frac{\tilde{\rho }^{d}+ o(1)}{d!} \left( \bar{B}^{\mathrm {res}} \left( \left( 1 + \delta \right) x \right) \right) ^{d}, \end{aligned}$$

by similar arguments as in the proof of Lemma 3.1 in [13]. The proof of iii) follows from the inequality

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}}> x) \ge \mathbb {P}(V_{1}> x,\dots ,V_{N} > x), \end{aligned}$$

where

$$\begin{aligned} \mathbb {P}(V_{1}> x,\dots ,V_{N} > x) \ge \frac{\tilde{\rho }^{N-k}+ o(1)}{(N-k)!} \left( \bar{B}^{\mathrm {res}}\left( \frac{\tilde{\rho }+\delta }{\tilde{\rho }- k} x\right) \right) ^{N-k}, \end{aligned}$$

by similar arguments as in the proof of Theorem 5.1 in [13]. The proof of ii) and iv) can be found in Appendix B.

As reflected in the proof sketches, the asymptotic lower bounds in (7) and (9) correspond to two different scenarios for a large value of \(W_{\mathrm {min}}\) to occur.

Scenario 1 involves the arrival of d jobs of size x or larger ‘overlapping in time.’ In the JSW-d system, these jobs will be assigned to d different servers with overwhelming probability for large x, and thus the workload at these d servers will exceed x. A newly arriving job that is so unfortunate as to sample exactly these d servers (which happens with probability \(1/{N \atopwithdelims ()d}\)) will experience a waiting time larger than x. Scenario 2 involves the arrival of \(N-k\) sufficiently large jobs ‘overlapping in time,’ which instantaneously causes the workloads at \(N-k\) servers to become large as described above, assuming \(N - k \le d\). This will also result in subsequent jobs all being assigned to one of the other k servers and hence create overload, so that the workloads at these servers will gradually start growing. Thus, eventually the workloads at all servers will be large, and every arriving job will experience a large waiting time. Observe that this scenario corresponds to that in the GI/G/N queue discussed in [13], as illustrated by the match with Eq. (4).

Scenarios 1 and 2 are asymptotically dominant in case \(d \le N-k\) and \(d \ge N-k\), respectively, reflecting that a large waiting time is most likely due to a minimum number of \(d_{\mathrm {cap}} = \min \{d, N - k\}\) large jobs. Note that in Scenario 1 the workloads at all servers will in fact grow large as well when \(d \ge N - k\), but that Scenario 2 dominates in that case.

Scenarios with large workloads at l servers, with \(d< l < N\), do not asymptotically contribute to the probability of a large waiting time. This may be intuitively explained by observing the following. (1) If such scenarios involve strictly more than d large workloads without resulting in overload of all servers (so \(d< l < N-k\)) then they are asymptotically much less likely than Scenario 1. (2) If such scenarios involve \(l \ge N-k\) large workloads, this will quickly result in overload of all servers, just like in Scenario 2.

Extending these arguments and the results in [13] to the c.o.s. variant of the fork–join (\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is complicated since in that model multiple replicas of the same job may be in service simultaneously.

Corollary 1

(Analogous to Corollary 1.1 in [13]) Let the residual job size \(B^{\mathrm {res}}\) be long-tailed and dominated varying, and let \(k<\tilde{\rho }<k+1\), i.e., \(\tilde{\rho }\) is not an integer. Then, there exist constants \(c_{1}\) and \(c_{2}\) such that, for all x,

$$\begin{aligned} c_{1} \left( \bar{B}^{\mathrm {res}}(x ) \right) ^{d_{\mathrm {cap}}} \le \mathbb {P}(W_{\mathrm {min}} > x) \le c_{2} \left( \bar{B}^{\mathrm {res}}(x ) \right) ^{d_{\mathrm {cap}}}, \end{aligned}$$

where \(d_{\mathrm {cap}} = \min \{d,N-k\}\).

Proof

The result follows directly from Theorem 1, the last inclusion in (A.3) and the definition of dominated variation (Definition 4 in Appendix A).

Remark 1

Note that in Corollary 1 we exclude integer values for the load. Most of the results in the literature for heavy-tailed queueing systems focus on the case where the load is not an integer, since the integer case is significantly more delicate to analyze. For a detailed study on the integer case in the GI/G/2 queueing system we refer to [5].

Corollary 2

For the c.o.s. variant of redundancy-d scheduling with the FCFS discipline:

i) if \(B \in RV(-\nu )\), then \(W_{\mathrm {min}} \in ORV(d_{\mathrm {cap}}(1-\nu ))\);

ii) if \(B \in RV(-\nu )\), then \(R \in ORV(-\min \{d_{\mathrm {cap}}(\nu -1),\nu \})\).

Proof

It is well known that if \(B \in RV(-\nu )\), then \(B^{\mathrm {res}} \in RV(1-\nu )\), see (3). The proof of i) follows by applying this result to Corollary 1 together with the inclusion \(RV \subset \mathcal {L} \cap {\mathcal {D}}\) from (A.3) and Lemma 6 in Appendix A (see Definition 5 in Appendix A for the definition of \(\mathcal {O}\)-regularly varying; ORV). The proof of ii) follows by i), Eq. (6) and Lemma 5 in Appendix A.

From Corollary 2, we conclude that the waiting time component is dominant in the response time tail as long as \(d_{\mathrm {cap}} \le \frac{\nu }{\nu -1}\), and otherwise the job size component is dominant. A lighter tail than \(x^{-\nu }\) is obviously not possible for the response time. In other words, having more than \(\frac{\nu }{\nu -1}\) replicas will not provide any improvement in the tail behavior. For example, consider a system with a sufficiently small load. If \(\nu =4/3\), then \(d=4\) already yields \(R \in ORV(-\nu )\), and from a tail perspective choosing \(d>4\) yields no benefits. If \(\nu =3/2\), then it does not pay to take d larger than 3. If \(\nu \ge 2\) (so B has a finite second moment), then it does not pay to take d larger than 2. For high loads, the results indicate that creating many replicas yields no benefits for the tail index of the response time.
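
The smallest number of replicas that attains the optimal tail at small loads, \(\lceil \frac{\nu }{\nu -1} \rceil \), is easy to tabulate. The sketch below uses exact fractions to avoid floating-point round-off at the boundary cases; the function name is ours, for illustration only.

```python
from fractions import Fraction
import math

def optimal_replicas(nu):
    """Smallest d such that d >= nu/(nu-1), i.e., the number of replicas
    beyond which the job size component dominates the response-time tail at
    small loads (cf. Corollary 2). Exact rational arithmetic guards against
    round-off when nu/(nu-1) is an integer."""
    nu = Fraction(nu).limit_denominator(10**6)
    return math.ceil(nu / (nu - 1))

# examples from the text: nu = 4/3 -> 4, nu = 3/2 -> 3, nu >= 2 -> 2
print([(float(nu), optimal_replicas(nu)) for nu in (Fraction(4, 3), Fraction(3, 2), 2, 3)])
```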

3.2 Cancel-on-completion

In this section, we analyze the tail behavior for the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model.

The steady-state response time may be represented as

$$\begin{aligned} R {\mathop {=}\limits ^{d}} (W+B)_{(n_{\mathrm {J}})}, \end{aligned}$$
(11)

where \((W+B)_{(n_{\mathrm {J}})}\) denotes the \(n_{\mathrm {J}}\)th order statistic of the random variables \(W_{1}+B_{1},\dots ,W_{n_{\mathrm {F}}}+B_{n_{\mathrm {F}}}\).

Our analysis is based on an upper and lower bound for the waiting time and response time via the workload in carefully chosen upper and lower bound systems.

We first introduce the upper bound system, which is the same as the original system except for two differences: first, all jobs are assigned to the same \(n_{\mathrm {F}}\) servers; second, the sizes of the \(n_{\mathrm {J}}\) smallest replicas are increased to \(B_{(n_{\mathrm {J}})}\). This upper bound system is similar to the system defined in [25, Lemma 1].

Let us define the workload as the amount of work a server needs to complete to become idle in the absence of any further arrivals. Consider the scenario where all \(n_{\mathrm {F}}\) servers have the same amount of workload. For the FCFS discipline, it follows from the cancel-on-completion property that the \(n_{\mathrm {J}}\)th smallest replica will always be the \(n_{\mathrm {J}}\)th to complete, after which the other remaining replicas are abandoned. Hence, the workload at these \(n_{\mathrm {F}}\) servers stays equal at all times, and it follows that the upper bound system with multiple servers is equivalent to a GI/G/1/FCFS queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\).
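
This equivalence suggests a direct way to sample waiting times in the upper bound system: a Lindley recursion with job size \(B_{(n_{\mathrm {J}})}\). The sketch below assumes i.i.d. replicas, Poisson arrivals, and Pareto job sizes purely for illustration, with the arrival rate chosen so that \(\rho _{\mathrm {U}} < 1\).

```python
import random

def upper_bound_waits(n_f, n_j, lam=0.5, nu=3.0, n_jobs=100_000, seed=7):
    """Waiting times in the upper bound system: a GI/G/1/FCFS queue with
    interarrival time A and job size B_(n_J), the n_J-th smallest of the n_F
    replica sizes. I.i.d. replicas, Poisson arrivals of rate lam, and
    Pareto(nu) sizes on [1, inf) are illustrative assumptions; lam = 0.5
    keeps rho_U = E[B_(n_J)] / E[A] < 1 for the defaults below.
    Lindley recursion: W <- max(W + B - A, 0)."""
    rng = random.Random(seed)
    w, waits = 0.0, []
    for _ in range(n_jobs):
        replicas = sorted(rng.random() ** (-1.0 / nu) for _ in range(n_f))
        b = replicas[n_j - 1]              # B_(n_J)
        a = rng.expovariate(lam)           # interarrival time to the next job
        waits.append(w)                    # waiting time of the current job
        w = max(w + b - a, 0.0)
    return waits

waits = upper_bound_waits(n_f=3, n_j=1)
print(sum(waits) / len(waits))
```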

Lemma 1

The maximum workload in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is sample-pathwise bounded from above by the workload in the upper bound system.

Proof

Let \(\omega _{i}\) be the workload at server i in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model, and let the maximum workload be defined as \(\max _{j \in \{1,\dots ,N \}}\omega _{j} = \omega _{(N)}\). Let \(s_{l}\) and \(b_{l}\) denote the sampled server and the realized job size of the l-th replica, respectively, for \(l=1,\dots ,n_{\mathrm {F}}\). By induction, it can be shown that \(\omega _{i}\) is bounded from above by the workload \(\omega _{\mathrm {U}}\) in the upper bound system at all times. Assume that \(\omega _{(N)} \le \omega _{\mathrm {U}}\) after the m-th arrival. Then, after the \((m+1)\)-th arrival the new workload, denoted by \(\omega _{\mathrm {new},s_{l}}\), is

$$\begin{aligned} \omega _{\mathrm {new},s_{l}} = \max \{(\omega _{s_{l}} +b_{l})_{(n_{\mathrm {J}})}, \omega _{s_{l}}\} \le \max \{ (\omega _{(N)}+b_{l})_{(n_{\mathrm {J}})}, \omega _{(N)}\} = \omega _{(N)} + b_{(n_{\mathrm {J}})}, \end{aligned}$$

for \(l=1,\dots ,n_{\mathrm {F}}\), since \(\omega _{i} \le \omega _{(N)}\) for all \(i=1,\dots ,N\). Thus, the increase in maximum workload is bounded by \(b_{(n_{\mathrm {J}})}\), which is exactly the increase in workload in a GI/G/1/FCFS queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\).

Corollary 3

The waiting time \(W_{(n_{\mathrm {J}})}\) and the response time R in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the FCFS discipline are stochastically bounded from above by the waiting time \(W_{\mathrm {U}}\) and response time \(R_{\mathrm {U}}\), respectively, in the upper bound system.

Proof

By Lemma 1, the maximum workload in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is bounded from above by the workload \(W_{\mathrm {U}}^{0}\) in the upper bound system. This bound implies \(W_{i} \le W_{\mathrm {U}}^{0}\) for all \(i=1,\dots ,N\), from which it follows that \(W_{(n_{\mathrm {J}})} \le W_{\mathrm {U}}^{0} {\mathop {=}\limits ^{d}} W_{\mathrm {U}}\) and

$$\begin{aligned} R {\mathop {=}\limits ^{d}} (W+B)_{(n_{\mathrm {J}})} \le W_{\mathrm {U}}^{0} + B_{(n_{\mathrm {J}})} {\mathop {=}\limits ^{d}} W_{\mathrm {U}} + B_{(n_{\mathrm {J}})} {\mathop {=}\limits ^{d}} R_{\mathrm {U}}. \end{aligned}$$

We now introduce a lower bound system. In the lower bound system, we only admit jobs for which the \(n_{\mathrm {F}}\) replicas are assigned to servers \(1,\dots ,n_{\mathrm {F}}\), and in addition the ith smallest replica is assigned to server i, \(i=1,\dots ,n_{\mathrm {J}}\). Hence, we only admit a fraction 1/K of the jobs, where \(K= {N \atopwithdelims ()n_{\mathrm {F}}} \frac{n_{\mathrm {F}}!}{(n_{\mathrm {F}}-n_{\mathrm {J}}+1)!}\) and we do not alter the assignment of the replicas.

Lemma 2

The workload at each server in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is sample-pathwise bounded from below by the workload at the corresponding server in the lower bound system.

Proof

Since in the lower bound system we only allow arrivals to the first \(n_{\mathrm {F}}\) servers for which in addition the ith smallest replica is assigned to server i, \(i=1,\dots ,n_{\mathrm {J}}\) and since we delete the other arrivals (which are not deleted in the original system), the amount of work at each server in the lower bound system cannot be larger than the amount of work at the corresponding server in the original system.

For the FCFS discipline, the \(n_{\mathrm {J}}\)th smallest replica will always be the \(n_{\mathrm {J}}\)th to complete in the lower bound system. Moreover, this replica is always assigned to server \(n_{\mathrm {J}}\). Hence, this server acts as the bottleneck server since it dictates the waiting time and the response time of all the admitted jobs, and can be viewed as the server of a GI/G/1/FCFS queue with a random selection of the arrivals based on Bernoulli experiments with probability 1/K, i.e., mean interarrival time \(K \mathbb {E}[A]\), and with a job size \(B_{(n_{\mathrm {J}})}\).
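
The bottleneck queue of the lower bound system can be sketched likewise: thin the arrival stream with probability 1/K and feed the admitted jobs, with size \(B_{(n_{\mathrm {J}})}\), through a workload recursion. I.i.d. replicas, Poisson arrivals, and Pareto sizes are again illustrative assumptions.

```python
import math
import random

def lower_bound_waits(n_servers, n_f, n_j, lam=1.0, nu=3.0, n_jobs=200_000, seed=11):
    """Workload at the bottleneck server of the lower bound system: jobs are
    admitted with probability 1/K (Bernoulli thinning) and served FCFS with
    job size B_(n_J), where K = C(N, n_F) * n_F! / (n_F - n_J + 1)!.
    I.i.d. replicas, Poisson arrivals, and Pareto(nu) sizes are illustrative
    assumptions. Returns K and the waiting times of the admitted jobs."""
    K = math.comb(n_servers, n_f) * math.factorial(n_f) // math.factorial(n_f - n_j + 1)
    rng = random.Random(seed)
    v, waits = 0.0, []
    for _ in range(n_jobs):
        v = max(v - rng.expovariate(lam), 0.0)  # work drains between arrivals
        if rng.random() < 1.0 / K:              # admit a fraction 1/K of the jobs
            waits.append(v)                     # admitted job waits for workload v
            v += sorted(rng.random() ** (-1.0 / nu) for _ in range(n_f))[n_j - 1]
    return K, waits

K, waits = lower_bound_waits(n_servers=4, n_f=2, n_j=1)
print(K, sum(waits) / len(waits))
```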

Corollary 4

The waiting time \(W_{(n_{\mathrm {J}})}\) and the response time R in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the FCFS discipline are stochastically bounded from below by the waiting time \(W_{\mathrm {L}}\) and response time \(R_{\mathrm {L}}\), respectively, in the above-mentioned GI/G/1/FCFS queue.

Proof

By Lemma 2, the workload at each server in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is bounded from below by the workload at the corresponding server in the lower bound system. Also, in the lower bound system, the workload \(W_{\mathrm {L}}^{0}\) at the bottleneck server \(n_{\mathrm {J}}\) is no smaller than the workload at servers \(1,\dots ,n_{\mathrm {J}}-1\), and no larger than the workload at servers \(n_{\mathrm {J}}+1,\dots ,n_{\mathrm {F}}\). Thus, \(W_{i} \ge W_{\mathrm {L}}^{0}\) for all \(i=n_{\mathrm {J}},\dots ,n_{\mathrm {F}}\), which implies \(W_{(n_{\mathrm {J}})} \ge W_{\mathrm {L}}^{0} {\mathop {=}\limits ^{d}} W_{\mathrm {L}}\). Since the replica sizes at servers \(n_{\mathrm {J}}+1,\dots ,n_{\mathrm {F}}\) are no smaller than at server \(n_{\mathrm {J}}\), it further follows that

$$\begin{aligned} R {\mathop {=}\limits ^{d}} (W+B)_{(n_{\mathrm {J}})} \ge W_{\mathrm {L}}^{0} + B_{(n_{\mathrm {J}})} {\mathop {=}\limits ^{d}} W_{\mathrm {L}} + B_{(n_{\mathrm {J}})} {\mathop {=}\limits ^{d}} R_{\mathrm {L}}. \end{aligned}$$

A sufficient stability condition for general interarrival times and job sizes is \(\rho _{\mathrm {U}} := \frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{\mathbb {E}[A]} < 1\), which can be proved via the upper bound system given in Corollary 3. The exact stability condition for the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model, and also redundancy-d scheduling, with the FCFS discipline in such a general setting is still unknown. Observe that it is hard to improve upon this sufficient stability condition resulting from the upper bound system. Indeed, finding an upper bound system that copes with multiple replicas, which may be in service concurrently and have different starting times, while being analytically tractable is difficult, as is also reflected in the scarcity of analytical results for the fork–join model in the literature.

Theorem 2

If \(\rho _{\mathrm {U}} < 1\) and the residual job size \(B_{(n_{\mathrm {J}})}^{\mathrm {res}}\) is subexponential, then for the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the FCFS scheduling discipline:

$$\begin{aligned} \frac{\rho _{\mathrm {L}}}{1-\rho _{\mathrm {L}}} \bar{B}_{(n_{\mathrm {J}})}^{\mathrm {res}}(x) \le \mathbb {P}(W_{(n_{\mathrm {J}})} > x) \le \frac{\rho _{\mathrm {U}}}{1-\rho _{\mathrm {U}}} \bar{B}_{(n_{\mathrm {J}})}^{\mathrm {res}}(x) ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

where \(\rho _{\mathrm {L}} =\frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{K \mathbb {E}[A]}\) with \(K={N \atopwithdelims ()d}\frac{n_{\mathrm {F}}!}{(n_{\mathrm {F}}-n_{\mathrm {J}}+1)!}\) and \(\rho _{\mathrm {U}} =\frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{\mathbb {E}[A]}\).

Proof

Upper bound: By Corollary 3, the waiting time of a job is bounded from above by the waiting time \(W_{\mathrm {U}}\) in a GI/G/1/FCFS queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\). Thus, by the subexponentiality of \(B_{(n_{\mathrm {J}})}^{\mathrm {res}}\), we can apply known results for the single-server queue, see (1), and obtain

$$\begin{aligned} \mathbb {P}(W_{\mathrm {U}} > x) \sim \frac{\rho _{\mathrm {U}}}{1-\rho _{\mathrm {U}}} \bar{B}_{(n_{\mathrm {J}})}^{\mathrm {res}}(x) ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$
(12)

Lower bound: By Corollary 4, the waiting time of a job is bounded from below by the waiting time \(W_{\mathrm {L}}\) in a GI/G/1/FCFS queue with a random selection of the arrivals based on Bernoulli experiments with probability 1/K, i.e., mean interarrival time \(K \mathbb {E}[A]\), and job size \(B_{(n_{\mathrm {J}})}\). Again, by the subexponentiality of \(B_{(n_{\mathrm {J}})}^{\mathrm {res}}\), by applying known results for the single-server queue we obtain

$$\begin{aligned} \mathbb {P}(W_{\mathrm {L}} > x) \sim \frac{\rho _{\mathrm {L}}}{1-\rho _{\mathrm {L}}} \bar{B}_{(n_{\mathrm {J}})}^{\mathrm {res}}(x) ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$
(13)

By combining (12) and (13), we get the desired statement.

Note that the lower bound in Theorem 2 is valid even if \(\rho _{\mathrm {U}} > 1\), since the auxiliary system in Corollary 4 is stable if the original system is stable.
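As a numeric companion to Theorem 2, the constant K and the two loads can be computed directly. A sketch, assuming the binomial coefficient is taken over the \(n_{\mathrm {F}}\) servers chosen out of N (i.e., reading d as \(n_{\mathrm {F}}\), which is an assumption about the notation):

```python
from math import comb, factorial

def theorem2_loads(N, n_F, n_J, EB_nJ, EA):
    """rho_L, rho_U and K from Theorem 2. The binomial coefficient
    is read here with d = n_F (an assumption about the notation);
    EB_nJ = E[B_(n_J)] and EA = E[A]."""
    K = comb(N, n_F) * factorial(n_F) // factorial(n_F - n_J + 1)
    rho_L = EB_nJ / (K * EA)
    rho_U = EB_nJ / EA
    return rho_L, rho_U, K
```

For instance, with N=3, \(n_{\mathrm {F}}=2\), \(n_{\mathrm {J}}=1\), this gives K=3, so the lower bound system sees only one in three arrivals.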

The next corollary provides insight into the tail behavior when the distribution of the \(n_{\mathrm {J}}\)th order statistic of the job size is regularly varying, i.e., \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\). Observe that, in the special case of identical replicas, \(B \in RV(-\nu )\) implies \(B_{(n_{\mathrm {J}})} \in RV(-\nu )\), so that \(\tilde{\nu } = \nu \), whereas for i.i.d. replicas, \(B \in RV(-\nu )\) implies \(B_{(n_{\mathrm {J}})} \in RV(-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu )\) (see [20]), so that \(\tilde{\nu } = (n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \).

Corollary 5

For the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the FCFS discipline and \(\rho _{\mathrm {U}} < 1\):

  i) if \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\), then \(W_{(n_{\mathrm {J}})} \in ORV(1-\tilde{\nu })\),

  ii) if \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\), then \(R \in ORV(1-\tilde{\nu })\).

Proof

For regularly varying residual job sizes, we know that

$$\begin{aligned} \mathbb {P}(B_{(n_{\mathrm {J}})}^{\mathrm {res}} > x) \sim \frac{1}{(\tilde{\nu } - 1)\mathbb {E}[B_{(n_{\mathrm {J}})}]} L(x) x^{1-\tilde{\nu }} ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

see (3). The proof of i) follows by Theorem 2 and Lemma 6. For the response time, we can again use Corollaries 3 and 4 as in Theorem 2. Using the known result for the tail behavior in the single-server queue, see (2), together with Lemma 4 we obtain that

$$\begin{aligned} \mathbb {P}(R> x) \ge \mathbb {P}(R_{\mathrm {L}}> x) = \mathbb {P}(W_{\mathrm {L}} + B_{(n_{\mathrm {J}})} > x) \sim \frac{\rho _{\mathrm {L}}}{1-\rho _{\mathrm {L}}} \frac{L(x) x^{1-\tilde{\nu }}}{(\tilde{\nu } - 1)\mathbb {E}[B_{(n_{\mathrm {J}})}]}~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

and

$$\begin{aligned} \mathbb {P}(R> x) \le \mathbb {P}(R_{\mathrm {U}}> x) = \mathbb {P}(W_{\mathrm {U}} + B_{(n_{\mathrm {J}})} > x) \sim \frac{\rho _{\mathrm {U}}}{1-\rho _{\mathrm {U}}} \frac{L(x) x^{1-\tilde{\nu }}}{(\tilde{\nu } - 1)\mathbb {E}[B_{(n_{\mathrm {J}})}]} ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$

Now we can apply Lemma 6 in Appendix A and obtain the desired result.

Remark 2

For identical replicas, we can find an even better upper bound than the one in Theorem 2. Indeed, consider the system in which all replicas are completely served. This system is equivalent to a GI/G/1/FCFS queue with a random selection of the arrivals based on Bernoulli experiments with probability \(n_{\mathrm {F}}/N\), i.e., mean interarrival time \(\frac{N \mathbb {E}[A]}{n_{\mathrm {F}}}\), and job size B, which is equal to \(B_{(n_{\mathrm {J}})}\) in the case of identical replicas.

Observe that all the results for the tail index rely on the fact that the upper bound system is stable. The stability condition of this system does not necessarily coincide with the stability condition of the original fork–join model. We conjecture that these tail index results are valid whenever the original fork–join model is stable. However, note that constructing a tractable upper bound system with the same stability condition as the original fork–join model is hard, because this stability condition is unknown.

Interestingly, in contrast to the c.o.s. variant of redundancy, we observe that the tail index in the c.o.c. variant of the fork–join model does not depend on the load of the system. The main difference between the two variants is that for the c.o.s. variant we need multiple big jobs for a large value of \(W_{\mathrm {min}}\) to occur, whereas for the c.o.c. variant we only need one big job. Moreover, note that a big job means that at least \(n_{\mathrm {F}}+1-n_{\mathrm {J}}\) replica sizes should be big since we cancel the redundant replicas as soon as the first \(n_{\mathrm {J}}\) replicas complete service. This is the reason why for i.i.d. replicas we get the tail index \(1-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \) and for identical replicas \(1-\nu \).

In the remainder of this subsection, we focus on two special cases of the dependency structure, namely identical and i.i.d. replicas.

For the special case of identical replicas in the c.o.c. variant of the fork–join model with the FCFS discipline, we have concluded: if \(B \in RV(-\nu )\), then \(R \in ORV(1-\nu )\), independently of the number of replicas. We may conclude that the tail index is the same as for the single-server queue, see (2). Moreover, comparing the tail index of the c.o.s. and c.o.c. variants of redundancy scheduling with identical replicas, it follows that the c.o.s. variant always performs better from a tail perspective.

For the special case of i.i.d. replicas in the c.o.c. variant of the fork–join model with the FCFS discipline, we have concluded: if \(B \in RV(-\nu )\), then \(R \in ORV(1-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu )\). If \(n_{\mathrm {F}}=n_{\mathrm {J}}=1\), then \(R \in ORV(1-\nu )\), which is consistent with the case of identical replicas. Moreover, comparing the tail index of the c.o.s. and c.o.c. variants of redundancy scheduling with i.i.d. replicas, it follows that the c.o.c. variant always performs better from a tail perspective. Observe that this contrasts with the case of identical replicas.
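The two tail indices for \(B_{(n_{\mathrm {J}})}\) can also be checked empirically. A sketch (our own illustration, assuming Pareto replica sizes, which are regularly varying) estimates the tail index with the Hill estimator:

```python
import math
import random

def hill_estimator(samples, k):
    """Hill estimator of the tail index, based on the k largest samples."""
    s = sorted(samples, reverse=True)
    logs = [math.log(x) for x in s[:k + 1]]
    return k / sum(logs[i] - logs[k] for i in range(k))

def order_stat_iid_pareto(n_F, n_J, nu, x_m, rng):
    """n_J-th smallest of n_F i.i.d. Pareto(nu, x_m) replica sizes,
    sampled by inversion: B = x_m * U**(-1/nu) with U uniform on (0,1]."""
    reps = sorted(x_m * (1.0 - rng.random()) ** (-1.0 / nu) for _ in range(n_F))
    return reps[n_J - 1]
```

For i.i.d. replicas with \(n_{\mathrm {F}}=3\), \(n_{\mathrm {J}}=1\) and \(\nu =1.5\), the estimate should be near \((n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu = 4.5\); for identical replicas it stays near \(\nu \).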

We studied two special structures for the dependency between replicas. The general case, with a vector \((B_1,\dots ,B_{n_{\mathrm {F}}})\) of possibly dependent and multivariate regularly varying job sizes, will be more involved. For further information on multivariate regular variation, we refer to [28] or [4, Appendix A1.5] and the references therein.

We determined the tail behavior for the c.o.s. variant of redundancy scheduling and the c.o.c. variant of the more general fork–join model. It can be concluded that the analysis of the c.o.s. variant is much more challenging than that of the c.o.c. variant. One of the reasons is that for the c.o.s. variant multiple big jobs might be needed for a large waiting time to occur, while for the c.o.c. variant only one big job is needed. In some sense, this is remarkable, since for the stability condition it is the other way around: the stability condition for the c.o.s. variant of redundancy scheduling is known, whereas for the c.o.c. variant of the fork–join model, and also redundancy-d scheduling, it is still an open problem for non-exponential job size distributions.

4 LCFS-PR discipline

In this section, we study the tail behavior of the response time in the fork–join model with the LCFS-PR discipline. First, we discuss known results for the single-server queue; the tail behavior for the c.o.s. and c.o.c. variants of the fork–join model is then discussed in Sects. 4.1 and 4.2, respectively.

For the GI/G/1 queue with regularly varying job sizes, the tail behavior of the response time distribution is known:

$$\begin{aligned} \mathbb {P}(R_{\text {LCFS-PR}} > x) \sim \mathbb {E}[N_{\text {bp}}] (1-\tilde{\rho })^{-\nu } L(x) x^{-\nu } ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$
(14)

where \(N_{\text {bp}}\) denotes the number of jobs completed during a busy period, see [33]. One way to understand (14) is the following. First, observe that for the LCFS-PR discipline

$$\begin{aligned} R_{\text {LCFS-PR}} {\mathop {=}\limits ^{d}} P, \end{aligned}$$

where P is the busy period of a GI/G/1 queue. Let V(t) be the amount of work in the system at time t and assume that the first job arrives in an empty system at time 0. The busy period P is then defined as

$$\begin{aligned} P := \inf \{t>0 : V(t) = 0\}. \end{aligned}$$

Let the cycle maximum \(C_{\text {max}}\) be defined by

$$\begin{aligned} C_{\text {max}} := \sup \{V(t), 0 \le t \le P\}. \end{aligned}$$

It is shown, see for example [18, Corollary 2.2], that subexponentiality of B implies that \(\mathbb {P}(C_{\text {max}}> x) \sim \mathbb {P}(W_{\text {max}} > x)\), where \(W_{\text {max}}\) is the maximum waiting time during a busy period, and from [1] we know that,

$$\begin{aligned} \mathbb {P}(W_{\text {max}}> x) \sim \mathbb {E}[N_{\mathrm {bp}}] \mathbb {P}(B > x) ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$

Combining both relations gives

$$\begin{aligned} \mathbb {P}(C_{\text {max}}> x) \sim \mathbb {E}[N_{\mathrm {bp}}] \mathbb {P}(B > x) ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$

A large maximum waiting time is most likely due to one large job. After this large job, the system behaves normally and the workload goes to zero with negative drift \(-(1-\tilde{\rho })\). Hence, if \(C_{\text {max}}\) is large, then one would expect that

$$\begin{aligned} P \approx \frac{C_{\text {max}}}{1-\tilde{\rho }}, \end{aligned}$$

from which it follows that

$$\begin{aligned} \mathbb {P}(P > x) \sim \mathbb {E}[N_{\mathrm {bp}}] (1-\tilde{\rho })^{-\nu } L(x) x^{-\nu } ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$

Observing that the busy period coincides with the response time of a job for the LCFS-PR discipline gives the desired result in (14).
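This heuristic can be explored numerically. A minimal sketch (our own illustration, assuming Poisson arrivals so that the residual interarrival time at the start of a busy period needs no special treatment) simulates one busy period by tracking the workload:

```python
import random

def busy_period(lam, draw_job, rng):
    """Length of one busy period, started by a single arriving job.
    The workload v drains at unit rate; the busy period ends the
    first time v would hit zero before the next (Poisson) arrival."""
    v = draw_job(rng)                # job that opens the busy period
    t = 0.0
    while True:
        a = rng.expovariate(lam)     # time until the next arrival
        if a >= v:                   # workload drains first: busy period ends
            return t + v
        t += a
        v += draw_job(rng) - a       # serve a units of work, add the new job
```

With \(\mathbb {E}[B]=1\) and \(\lambda = 0.5\) (so \(\tilde{\rho } = 0.5\)), averaging many busy periods should give \(\mathbb {E}[P] = \mathbb {E}[B]/(1-\tilde{\rho }) = 2\); heavy-tailed `draw_job` choices then let one inspect the tail of P, and hence of \(R_{\text {LCFS-PR}}\).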

4.1 Cancel-on-start

Note that for the LCFS-PR discipline the c.o.s. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is equivalent to the system where replicas of each job are assigned to \(n_{\mathrm {J}}\) servers chosen uniformly at random (without replacement), since all replicas immediately go into service. Thus, each queue is equivalent to a \(GI/G/1/LCFS \text {-} PR\) queue with a random selection of the arrivals based on Bernoulli experiments with probability \(n_{\mathrm {J}}/N\), i.e., mean interarrival time \(N \mathbb {E}[A]/n_{\mathrm {J}}\) and mean job size \(\mathbb {E}[B]\). Hence, the stability condition is \(\tilde{\rho } < \frac{N}{n_{\mathrm {J}}}\). For regularly varying job sizes, the tail behavior of the response time is given by

$$\begin{aligned} \mathbb {P}(R> x) = \mathbb {P}(\max _{i=1,\dots ,n_{\mathrm {J}}}R_{\mathrm {LCFS-PR}}> x) \sim n_{\mathrm {J}} \mathbb {P}(R_{\mathrm {LCFS-PR}} > x) ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

see for example [21].
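The asymptotic factor \(n_{\mathrm {J}}\) can be understood through an elementary property of the maximum. A toy calculation (assuming independent copies, which is an idealization; the dependence between servers is handled in [21]) compares the exact tail of the maximum with the approximation \(n_{\mathrm {J}}\,\mathbb {P}(R_{\mathrm {LCFS-PR}} > x)\):

```python
def max_tail(pbar, n):
    """Tail of the max of n independent copies when each copy exceeds
    the level with probability pbar, next to the approximation n*pbar."""
    exact = 1.0 - (1.0 - pbar) ** n
    return exact, n * pbar
```

The ratio of the two tends to 1 as `pbar` tends to 0, which is exactly the regime \(x \rightarrow \infty \) for heavy-tailed response times.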

Observe that similar reasoning applies to any service discipline in which all replicas immediately go into service. Another example is the processor-sharing (PS) discipline, for which the tail behavior of the response time for the single-server queue with regularly varying job sizes with index \(-\nu \) is given by

$$\begin{aligned} \mathbb {P}(R_{\text {PS}} > x) \sim (1-\tilde{\rho })^{-\nu } L(x) x^{-\nu } ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

see for example [32, Chapter 3] or [34].

4.2 Cancel-on-completion

In this section, we analyze the tail asymptotics for the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the LCFS-PR discipline.

We use the same upper bound system as for the FCFS discipline, see Sect. 3.2. Note that under the LCFS-PR discipline the servers do not start serving a new job until all \(n_{\mathrm {J}}\) replicas are finished or the job is pre-empted. Hence, all \(n_{\mathrm {F}}\) replicas of a job (if present) receive service simultaneously at all times. From this, it follows that for the LCFS-PR discipline the upper bound system with multiple servers is equivalent to a \(GI/G/1/LCFS \text {-} PR\) queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\). However, for the LCFS-PR discipline it is not sufficient that the upper bound system provides an upper bound in terms of the workload, since the response time under the LCFS-PR discipline does not depend on the workload upon arrival of a job. In Lemma 3, we prove that the upper bound system also provides an upper bound in terms of the residual size of each replica.

Lemma 3

At any time, the residual size of each replica (possibly zero) is larger in the upper bound system than in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model.

Proof

The proof follows by contradiction. Let \(t_{0}\) be the first time such that the stated inequality is about to be violated, and distinguish three cases depending on whether this is caused by the arrival of a job, the departure of a replica, or by some other reason.

  • In case of an arrival, the \(n_{\mathrm {F}}\) replicas of the arriving job in the upper bound system are no smaller than in the original system by definition.

  • In case of a departure in the upper bound system, which occurs because an \(n_{\mathrm {J}}\)th replica of a job has residual size zero, the remaining \(n_{\mathrm {F}} - n_{\mathrm {J}}\) replicas are abandoned. However, according to the hypothesis, this replica should also have residual size zero in the original system. Now, in the upper bound system the \(n_{\mathrm {F}}\) replicas of the most recently arrived job resume service (if present), but in the original system these replicas also resume service or are already receiving service, since they arrived last at their corresponding server.

  • In the absence of any arrival or departure, the inequality can only be violated if the replica in question receives service in the upper bound system but not in the original system, and has the same residual size in both systems at time \(t_{0}\). Recall that in the upper bound system all \(n_{\mathrm {F}}\) replicas of the same job always receive service simultaneously. Thus, replicas of jobs that arrived later than this job already fully completed service because of the LCFS-PR discipline. However, according to the hypothesis these replicas also fully completed service in the original system. Hence, it follows that the replica of interest also receives service in the original system (if present).

Thus, the statement is still true at time \(t_{0}\) in all cases.

Corollary 6

The response time R in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the LCFS-PR discipline is stochastically bounded from above by the response time \(R_{\mathrm {U}}\) in a \(GI/G/1/LCFS \text {-} PR\) queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\).

Proof

By Lemma 3, at any time the residual size of each replica (possibly zero) is larger in the upper bound system than in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model. The departure time of a job is the moment at which the \(n_{\mathrm {J}}\)th replica of a job has residual size zero. Hence, it follows that \(R \le R_{\mathrm {U}}\).

A sufficient stability condition for general interarrival times and job sizes is \(\rho _{\mathrm {U}} = \frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{\mathbb {E}[A]} < 1\), which can be proved via the upper bound system given in Corollary 6. The exact stability condition of the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model, and also redundancy-d scheduling, with the LCFS-PR discipline in such a general setting is still unknown.

The next theorem provides insight into the tail behavior when the distribution of the \(n_{\mathrm {J}}\)th order statistic of the job size is regularly varying, i.e., \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\). Similarly to the FCFS discipline, it includes the special cases of identical replicas (\(\tilde{\nu } = \nu \)) and i.i.d. replicas (\(\tilde{\nu } = (n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \)).

Theorem 3

For the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the LCFS-PR discipline and \(\rho _{\mathrm {U}} < 1\): if \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\), then \(R \in ORV(-\tilde{\nu })\).

Proof

Upper bound: By Corollary 6, the response time is bounded from above by the response time in a \(GI/G/1/LCFS \text {-} PR\) queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\). Let \(R_{\mathrm {U}}\) denote the response time in this upper bound system. Since \(B_{(n_{\mathrm {J}})}\) is regularly varying, we can apply known results for the single-server queue, see (14), and obtain

$$\begin{aligned} \mathbb {P}(R_{\mathrm {U}} > x) \sim \mathbb {E}[N_{\mathrm {bp}}] (1-\rho _{\mathrm {U}})^{-\tilde{\nu }} L(x) x^{-\tilde{\nu }} ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

where \(\rho _{\mathrm {U}}=\frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{\mathbb {E}[A]}\).

Lower bound: The upper bound shows that R cannot have a heavier tail than \(R_{\mathrm {U}}\); it also cannot have a lighter tail, since

$$\begin{aligned} \mathbb {P}(R> x) \ge \mathbb {P}(B_{(n_{\mathrm {J}})}> x) = L(x) x^{-\tilde{\nu }}, ~~~x>0. \end{aligned}$$

The proof follows by Lemma 6 in Appendix A.

Remark 3

For identical replicas, we can find an even better upper bound than the one in Theorem 3. Indeed, consider the system in which all replicas are completely served. This system is equivalent to a \(GI/G/1/LCFS \text {-} PR\) queue with a random selection of the arrivals based on Bernoulli experiments with probability \(n_{\mathrm {F}}/N\), i.e., mean interarrival time \(\frac{N \mathbb {E}[A]}{n_{\mathrm {F}}}\), and job size B, which is equal to \(B_{(n_{\mathrm {J}})}\) in the case of identical replicas.

Theorem 3 indicates that for the LCFS-PR discipline the tail of the response time is just as heavy as the tail of the job size. Comparing the tail behavior in redundancy-d scheduling under the LCFS-PR and FCFS disciplines, we can conclude that, for the c.o.s. variant, the LCFS-PR discipline has tail behavior that is better than (or as good as) that of the FCFS discipline. Loosely speaking, the tail behavior of the LCFS-PR discipline is better in scenarios with small load and a small number of replicas d, and the tail behavior of the two service disciplines is similar in all other scenarios. For the c.o.c. variant of the fork–join model, the LCFS-PR discipline has better tail behavior than the FCFS discipline for all dependency structures between the replicas.

Fig. 1
figure 1

Tail behavior for the response time in the c.o.s. variant of redundancy-d scheduling with Pareto\((\nu =1.5,x_{m}=1/3)\) job sizes, \(\mathbb {E}[B]=1\), \(N=3\), \(\tilde{\rho }=2.5\) and the FCFS discipline. The dashed line depicts the function \(y=x^{-0.5}\)

Fig. 2
figure 2

Tail behavior for the response time in the c.o.c. variant of redundancy-d scheduling with identical Pareto\((\nu =1.5,x_{m}=1/3)\) job sizes, \(\mathbb {E}[B]=1\), \(N=3\), \(\tilde{\rho }=0.5\) and the FCFS discipline. The dashed lines depict the tail behavior for the response time in the lower bound (\(\mathbb {P}(R_{\mathrm {L}} > x)\)) and in the upper bound (\(\mathbb {P}(R_{\mathrm {U}} > x)\)) given in Corollary 5. Note that the system with \(d=1\) and \(d=3=N\) is equivalent to the lower and upper bound system, respectively

5 Numerical results

In the previous sections, we determined the tail behavior of the response time for heavy-tailed job sizes. In this section, we provide simulation results for redundancy-d scheduling that illustrate this tail behavior in various scenarios. All simulation experiments are conducted with \(10^{9}\) jobs. The figures are on log–log scale, and we consider Pareto distributed job sizes with shape parameter \(\nu =1.5\), which means that \(B \in RV(-1.5)\). Note that in the simulation \(\mathbb {P}(R > x) = 0\) for sufficiently large x, which explains the steep drop in all the figures.
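For reference, the Pareto(\(\nu =1.5, x_{m}=1/3\)) parameters are chosen so that \(\mathbb {E}[B] = \nu x_{m}/(\nu -1) = 1\). A short sketch of such a sampler (our own illustration, not the authors' simulation code):

```python
import random

def pareto_sample(nu, x_m, rng):
    """Pareto(nu, x_m) via inversion: P(B > x) = (x_m / x)**nu for x >= x_m.
    Uses 1 - U so the uniform draw lies in (0, 1]."""
    return x_m * (1.0 - rng.random()) ** (-1.0 / nu)
```

With \(\nu =1.5\) the variance is infinite, so the empirical mean converges slowly; the survival probability \(\mathbb {P}(B > 1) = (1/3)^{1.5} \approx 0.192\) is a more stable sanity check.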

In Fig. 1, the tail behavior of the response time for the c.o.s. variant of redundancy is depicted; see Corollary 2 for the corresponding asymptotic bound. The lines for \(d=2\) and \(d=N=3\) in particular follow the line representing tail index \(-0.5\) quite closely. The line for \(d=1\) initially deviates, but for \(x>10\) it also runs parallel to the line representing tail index \(-0.5\).

Figure 2 shows the tail behavior of the response time in the c.o.c. variant of redundancy with identical Pareto job sizes; see Corollary 5 for the asymptotic bound. For every number of replicas, the tail index matches the value identified in Corollary 5. Interestingly, this figure shows that for \(d=2\) the asymptotic lower bound captures the exact tail behavior better than the upper bound.

Figure 3 depicts the tail behavior of the response time in the c.o.c. variant of redundancy with i.i.d. Pareto job sizes. Note that according to Corollary 5 the tail index is given by \(1-d\nu \). To obtain the same tail behavior for all numbers of replicas in Fig. 3, we scaled the shape parameter of the job size distribution with d.
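The effect of this scaling can be verified directly: for d i.i.d. Pareto(\(\nu ', x_{m}\)) replicas the survival function of the minimum factorizes,

$$\begin{aligned} \mathbb {P}(B_{\mathrm {min}} > x) = \prod _{i=1}^{d} \mathbb {P}(B_i > x) = \left( \frac{x_{m}}{x}\right) ^{d\nu '}, \quad x \ge x_{m}, \end{aligned}$$

so choosing \(\nu ' = 1.5/d\) gives \(B_{\mathrm {min}} \in RV(-1.5)\) for every d, and \(\mathbb {E}[B_{\mathrm {min}}] = \frac{d\nu ' x_{m}}{d\nu ' - 1} = 1\) for \(x_{m} = 1/3\), matching the parameters in the caption of Fig. 3.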

Fig. 3
figure 3

Tail behavior for the c.o.c. variant of redundancy-d scheduling with i.i.d. Pareto\((\nu =1.5/d,x_{m}=1/3)\) job sizes, \(\mathbb {E}[B_{\mathrm {min}}]=1\), \(N=3\), \(\tilde{\rho }=0.5\) and the FCFS discipline. The dashed lines depict the tail behavior for the response time in the lower bound (\(\mathbb {P}(R_{\mathrm {L}} > x)\)) and in the upper bound (\(\mathbb {P}(R_{\mathrm {U}} > x)\)) given in Corollary 5. Note that the system with \(d=1\) and \(d=3=N\) is equivalent to the lower and upper bound system, respectively

Fig. 4
figure 4

Tail behavior for the response time in the c.o.c. variant of redundancy-d scheduling with identical Pareto\((\nu =1.5,x_{m}=1/3)\) job sizes, \(\mathbb {E}[B]=1\), \(N=3\), \(\tilde{\rho }=0.5\) and the LCFS discipline. The dashed lines depict the tail behavior for the response time in the lower bound (\(\mathbb {P}(R_{\mathrm {L}} > x)\)) and in the upper bound (\(\mathbb {P}(R_{\mathrm {U}} > x)\)) given in Theorem 3. Note that the system with \(d=1\) and \(d=3=N\) is equivalent to the lower and upper bound system, respectively

So far, we only considered the FCFS discipline. Figure 4 shows the tail behavior of the response time for the c.o.c. variant of redundancy with the LCFS-PR discipline.

6 Conclusion and suggestions for further research

In this paper, we studied the tail behavior of the response time in redundancy-d scheduling and the fork–join model for heavy-tailed job sizes. In particular, for the c.o.s. variant of redundancy-d with the FCFS discipline and subexponential job sizes, we determined the tail behavior of the response time and showed that it depends on the load of the system. For the c.o.c. variant of the fork–join model, we observed that the tail behavior of the response time depends on the dependency structure between the replicas. For job sizes \(B \in RV(-\nu )\), our results indicate that for the c.o.s. variant of redundancy scheduling, in the scenario of sufficiently small load, having \(d= \lceil \frac{\nu }{\nu -1} \rceil \) replicas already achieves the optimal asymptotic tail behavior of the response time. For high loads, the results indicate that creating many replicas yields no benefits for the tail index of the response time. For the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with identical and i.i.d. replicas, the tail index of the response time is \(1-\nu \) and \(1-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \), respectively. Thus, the tail index is independent of the load of the system, and for identical replicas it is even independent of the number of replicas.
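As a small illustration of the replication rule stated above (our own helper function, not from the paper):

```python
import math

def replicas_for_optimal_tail(nu):
    """d = ceil(nu / (nu - 1)): the number of replicas that already
    achieves the optimal asymptotic tail behavior in the low-load
    c.o.s. regime. Requires nu > 1 (finite mean job size)."""
    if nu <= 1:
        raise ValueError("nu must exceed 1 for a finite mean job size")
    return math.ceil(nu / (nu - 1))
```

For \(\nu =1.5\) this gives d=3; heavier tails (\(\nu \) closer to 1) call for more replicas, lighter tails for fewer.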

Observe that all the results on the tail index for the c.o.c. variant of the fork–join model rely on the fact that the upper bound system is stable. The stability condition of this system does not necessarily coincide with the stability condition of the original fork–join model. It would also be interesting to study the tail index for values of the load for which the original fork–join model is stable but the upper bound system is unstable.

A natural topic for further research would be to extend our analysis to heterogeneous servers or even more generally to job types that can have different speeds at the various servers, see for example the model in [24].

Another extension would be to analyze the tail behavior of the response time for the ROS service discipline. As mentioned in the introduction, for the single-server queue this discipline has the same tail index as the FCFS discipline. Simulation experiments (not included in this paper) suggest that this statement extends to redundancy-d scheduling.