1 Introduction

In recent years, the fork–join model has attracted strong interest. This model is a theoretical abstraction of the popular MapReduce framework [8]. MapReduce is a programming model for processing and generating big data sets with parallel algorithms on clusters. In MapReduce, every job is divided into tasks which can be processed in parallel in any order. For completion of the job, the completed tasks need to be joined together.

1.1 Fork–join model

In the fork–join (\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model, tasks (also referred to as replicas) of a job are assigned to \(n_{\mathrm {F}}\) servers selected uniformly at random. Redundant tasks are abandoned as soon as \(n_{\mathrm {J}}\) of the \(n_{\mathrm {F}}\) tasks either enter service (‘cancel-on-start,’ c.o.s.) or finish service (‘cancel-on-completion,’ c.o.c.). The job is completed when all these \(n_{\mathrm {J}}\) tasks complete service.

Note that in the c.o.s. variant of the fork–join (\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model the dependency structure between the replicas does not play a role, since at all times there is only one replica of the job in service. In contrast, in the c.o.c. variant several replicas of the same job may be in service at the same time, and hence the dependency structure does matter. Special cases of the dependency structure are: (1) perfect dependency between the variables, so-called identical replicas, where the job size is preserved for all replicas, and (2) no dependency at all, so-called i.i.d. replicas.

Analytical results for the fork–join model are unfortunately scarce. Tight characterizations of the response time are only known in the special case of \(n_{\mathrm {F}}=n_{\mathrm {J}}=2\), see [10]. For a survey on results in other special cases, we refer to [29]. Results for the expectation of the response time are established when \(n_{\mathrm {F}}=n_{\mathrm {J}} \rightarrow \infty \), see for example [3, 22]. For a more detailed overview of the results and applications, we refer to [17].

1.2 Redundancy scheduling

Redundancy-d scheduling is a special case within the fork–join model. In redundancy-d scheduling, replicas of a job are assigned to d servers selected uniformly at random. Redundant replicas are abandoned as soon as one of the d replicas either enters service (c.o.s.) or finishes service (c.o.c.). Thus, redundancy-d scheduling is equivalent to the fork–join model with \(n_{\mathrm {F}}=d\) and \(n_{\mathrm {J}}=1\). Observe that the c.o.s. variant of redundancy-d is equivalent to the Join-the-Smallest-Workload-d (JSW-d) policy, which assigns an arriving job to the server with the smallest workload among d servers selected uniformly at random, see [2]. The c.o.c. variant of redundancy-d shares similarities with a strategy that assigns the job to the server that provides the minimum response time among d servers selected uniformly at random, but involves possibly concurrent service of multiple replicas.

It has been empirically shown that redundancy scheduling can improve performance in parallel-server systems [31], especially in case of highly variable job sizes. More specifically, for large-scale applications such as Google search, the ability of redundancy scheduling to reduce the expectation and the tail of the response time has been demonstrated [7]. Our understanding of redundancy scheduling is growing, and in particular the stability condition for c.o.c. redundancy policies has received considerable attention; however, expressions for performance metrics such as the expectation or the distribution of the response time remain scarce. In [16], analytical expressions for the expected response time are obtained for exponential job sizes and independent and identically distributed (i.i.d.) replicas. Under the assumption of asymptotic independence, a fixed-point equation characterizing the response time distribution for identical and i.i.d. replicas is derived in [19].

In this paper, we examine the tail behavior of the response time when job sizes are heavy-tailed, which is one of the most relevant scenarios in redundancy scheduling and the fork–join model. Indeed, heavy tails in parallel processing are encountered in conjunction with the MapReduce framework developed at Google and its Hadoop open source implementation [9]. Moreover, measurement studies show that workload characteristics such as file sizes, CPU times, and session lengths tend to be heavy-tailed, see [17, 23, 32] and the references therein. The tail behavior of the waiting time distribution in the single-server queue is well known, see for example [30] or [32, Chapter 2]. Let \(W_{\mathrm {FCFS}}\) denote the waiting time in the single-server queue with the FCFS discipline. For subexponential (see Definition 2 in Appendix A) residual job sizes \(B^{\mathrm {res}}\),

$$\begin{aligned} \mathbb {P}(W_{\mathrm {FCFS}}> x) \sim \frac{\tilde{\rho }}{1-\tilde{\rho }} \mathbb {P}(B^{\mathrm {res}}> x) ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$
(1)

where \(\tilde{\rho } := \frac{\mathbb {E}[B]}{\mathbb {E}[A]}\) denotes the load with A the interarrival time and B the job size, and

$$\begin{aligned} \mathbb {P}(B^{\mathrm {res}}> x) = \frac{1}{\mathbb {E}[B]} \int _{y=x}^{\infty } \mathbb {P}(B > y) \mathrm {d}y. \end{aligned}$$

In particular, for regularly varying (see Definition 6 in Appendix A) job size distributions with index \(-\nu \), i.e., \(\mathbb {P}( B > x) =x^{-\nu } L(x)\) with \(L(\cdot )\) a slowly varying function at infinity,

$$\begin{aligned} \mathbb {P}(W_{\mathrm {FCFS}} > x) \sim \frac{\tilde{\rho }}{1-\tilde{\rho }} \frac{1}{(\nu - 1)\mathbb {E}[B]} L(x) x^{1-\nu } ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$
(2)

One way to understand the tail index \(1-\nu \) is the following. The workload (and waiting time) in an M/G/1 queue is distributed as a geometric(\(\tilde{\rho }\)) sum of residual job sizes \(B^{\mathrm {res}}\). By the theory of regular variation [4], loosely speaking, regular variation is preserved under integration, and asymptotically one may integrate as if the slowly varying factor L(y) were constant, taking it outside the integral; so

$$\begin{aligned} \mathbb {P}(B^{\mathrm {res}} > x) = \frac{1}{\mathbb {E}[B]} \int _{y=x}^{\infty } L(y) y^{-\nu } \mathrm {d}y \sim \frac{1}{(\nu - 1)\mathbb {E}[B]} L(x) x^{1-\nu } ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$
(3)

which implies that if B is regularly varying with index \(-\nu \), then \(B^{\mathrm {res}}\) is regularly varying with index \(1-\nu \).
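
As a sanity check, relation (3) can be verified numerically. The sketch below is purely illustrative and not part of the analysis: it takes a Pareto job size with \(\mathbb {P}(B > x) = x^{-\nu }\) for \(x \ge 1\) (so \(L(x) \equiv 1\)) and compares a numerical evaluation of the residual tail with the right-hand side of (3).

```python
import math

def pareto_tail(y, nu):
    """P(B > y) for a Pareto job size on [1, inf) with index nu (L(x) = 1)."""
    return 1.0 if y < 1.0 else y ** (-nu)

def residual_tail_numeric(x, nu, factor=1e6, n=20000):
    """P(B_res > x) = (1/E[B]) * int_x^inf P(B > y) dy, evaluated by the
    trapezoid rule on a log-spaced grid (substituting y = e^t, dy = e^t dt)."""
    mean_b = nu / (nu - 1.0)            # E[B] for this Pareto distribution
    lo, hi = math.log(x), math.log(x * factor)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * pareto_tail(math.exp(t), nu) * math.exp(t)
    return total * h / mean_b

def residual_tail_asymptotic(x, nu):
    """Right-hand side of (3) with L(x) = 1."""
    mean_b = nu / (nu - 1.0)
    return x ** (1.0 - nu) / ((nu - 1.0) * mean_b)

nu, x = 1.5, 50.0
num = residual_tail_numeric(x, nu)
asy = residual_tail_asymptotic(x, nu)
print(num, asy)  # the two values agree up to the truncation error
```

For this particular choice of L the asymptotic relation is in fact exact for \(x \ge 1\); the small remaining discrepancy comes from truncating the integral.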

The tail behavior in the single-server queue has also been studied for other service disciplines. For regularly varying job sizes, the random order of service (ROS) discipline has the same tail index as the FCFS discipline, but with a smaller pre-factor [6]. For the last-come first-served with preemptive resume (LCFS-PR) discipline and the processor-sharing (PS) discipline, the tail index of the response time for regularly varying job sizes is the same as the tail index of the job size, see [33, 34], respectively. Thus, from a tail perspective, these service disciplines perform better than the FCFS discipline.

More closely related to the c.o.s. variant of redundancy scheduling are the results on the tail behavior of the waiting time for the Join-the-Smallest-Workload (JSW) policy, or equivalently the GI/G/N queue, see [12, 13]. The key idea in [12, 13], namely to first consider deterministic interarrival times, makes the derivation of the tail behavior substantially more tractable. In [13], it is shown that for long-tailed residual job sizes and \(\tilde{\rho }>k\), where \(k := \left\lfloor \tilde{\rho } \right\rfloor \) is the integer part of the load,

$$\begin{aligned} \mathbb {P}(W_{\mathrm {JSW}}> x) \ge \frac{\tilde{\rho }^{N-k}+ o(1)}{(N-k)!} \mathbb {P} \left( B^{\mathrm {res}} > \frac{\tilde{\rho }+\delta }{\tilde{\rho }-k} x \right) ^{N-k} ~~~ \text {as } x\rightarrow \infty , \end{aligned}$$
(4)

for any \(\delta >0\). For subexponential residual job sizes and \(\tilde{\rho } < k+1\), it is shown that

$$\begin{aligned} \mathbb {P}(W_{\mathrm {JSW}}> x) \le {N \atopwithdelims ()k} \left( \frac{(k+1)\tilde{\rho }}{(k+1)-\tilde{\rho }}+o(1)\right) ^{N-k} \mathbb {P} \left( B^{\mathrm {res}}> x (1-\delta ) \right) ^{N-k} ~~~ \text {as } x\rightarrow \infty . \end{aligned}$$
(5)

A heuristic explanation for the exponent \(N-k\) in Eq. (4) is as follows. After the arrival of \(N-k\) big jobs, \(N-k\) servers will be working on these big jobs for a very long time. The other k servers form an unstable GI/G/k system, which implies that the workload drifts linearly to infinity. Thus, eventually the workload at all N servers will exceed level x, causing the waiting time of an arriving job to be larger than x.
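
The linear drift in this heuristic can be illustrated with a minimal workload recursion. The sketch below, with exponential interarrival times and job sizes as an illustrative choice (not prescribed by the model), shows that in an overloaded single queue with load \(\tilde{\rho } > 1\) the workload grows roughly linearly at rate \(\tilde{\rho } - 1\).

```python
import random

def workload_drift(rho, n_jobs=200_000, seed=1):
    """Workload recursion V <- max(V + B - A, 0) for a single queue with
    exponential interarrival times (E[A] = 1) and exponential job sizes
    (E[B] = rho). For rho > 1 the queue is unstable and the final workload
    divided by elapsed time approaches rho - 1."""
    rng = random.Random(seed)
    v = t = 0.0
    for _ in range(n_jobs):
        a = rng.expovariate(1.0)        # interarrival time
        b = rng.expovariate(1.0 / rho)  # job size
        v = max(v + b - a, 0.0)
        t += a
    return v / t

drift = workload_drift(1.5)
print(drift)  # close to rho - 1 = 0.5
```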

In this paper, we investigate the tail behavior of the response time for both the c.o.s. and c.o.c. variants of redundancy scheduling and the fork–join model when job sizes are heavy-tailed. Throughout the paper, we assume that the system under consideration is in steady state. For regularly varying job sizes with tail index \(-\nu \) and the FCFS discipline, it is shown that the response time for the c.o.s. variant of redundancy-d has tail index \(-\min \{d_{\mathrm {cap}}(\nu -1),\nu \}\), where \(d_{\mathrm {cap}} = \min \{d,N-k\}\) and \(k = \lfloor \tilde{\rho } \rfloor \). For small loads, this result indicates that for \(d < \frac{\nu }{\nu -1}\) the waiting time component is dominant, whereas for \(d > \frac{\nu }{\nu -1}\) the job size component is dominant. Thus, having \(d = \lceil \frac{\nu }{\nu -1} \rceil \) replicas already achieves the optimal asymptotic tail behavior of the response time, and creating even more replicas yields no improvement in terms of response time tail asymptotics. For high loads, the results indicate that creating many replicas yields no benefits for the tail index of the response time. For the c.o.c. variant of the more general fork–join (\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with identical and i.i.d. replicas, the tail index of the response time is \(1-\nu \) and \(1-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \), respectively, and the waiting time component is always dominant. Note that in this case the tail index is independent of the load of the system, and for identical replicas even independent of the number of replicas. In the special case of redundancy-d scheduling with identical and i.i.d. replicas, it follows that the tail index of the response time is \(1-\nu \) and \(1-d\nu \), respectively. All these results for the c.o.c. variant rely on the fact that the upper bound system, which is used in the proof, is stable.
The stability condition of this system does not necessarily coincide with the stability condition of the original fork–join model.

For the LCFS-PR discipline in the fork–join model, we show that the response time tail is just as heavy as the job size tail, implying that for the c.o.c. variant this discipline achieves better tail asymptotics than the FCFS discipline. For the c.o.s. variant, the LCFS-PR discipline has better tail asymptotics than the FCFS discipline for scenarios with low load and a small number of replicas; in all other scenarios, both service disciplines have similar tail asymptotics. In [27], it is shown that for the c.o.c. variant of redundancy-d scheduling with the PS discipline the tail index of the response time is \(-\nu \) for identical replicas and \(-d\nu \) for i.i.d. replicas. Table 1 provides an overview of the tail index for the various models and service disciplines.

Table 1 Overview of the tail index for the c.o.s. and c.o.c. variant of redundancy scheduling with various service disciplines where the job size is regularly varying with tail index \(-\nu \)

The remainder of the paper is organized as follows. In Sect. 2, we provide a model description and state preliminary results. In Sect. 3, we characterize the tail behavior of the response time for the c.o.s. variant of redundancy scheduling and the c.o.c. variant of the more general fork–join model with the FCFS discipline, with some proofs deferred to Appendix B. In Sect. 4, we discuss the tail behavior in the fork–join model with the LCFS-PR discipline. Section 5 provides numerical results on the tail behavior of the response time in redundancy scheduling with Pareto distributed job sizes. Section 6 contains conclusions and some suggestions for further research. The paper ends with two appendices. Appendix A collects various definitions and results for heavy-tailed random variables, which will be used in the paper. Appendix B provides the proof of part of Theorem 1.

2 Model description and preliminaries

Consider a system of N parallel unit-speed servers. Jobs arrive at the epochs of a renewal process, with successive interarrival times \(A_{i}\), \(i \ge 1\), each distributed as a generic random variable A. When a job arrives, a dispatcher assigns replicas of the job to \(n_{\mathrm {F}}\) servers chosen uniformly at random (without replacement), where \(1 \le n_{\mathrm {F}} \le N\). We consider two possible variants where redundant replicas are abandoned as soon as \(n_{\mathrm {J}}\) of the \(n_{\mathrm {F}}\) replicas either have entered service (c.o.s.) or have finished service (c.o.c.). If in the c.o.s. variant multiple replicas enter service at exactly the same time, then one of these replicas is chosen uniformly at random and starts service. A special case of the fork–join model is redundancy-d scheduling, where \(n_{\mathrm {F}}=d\) and \(n_{\mathrm {J}}=1\). As observed in the introduction, in the c.o.s. variant of redundancy-d the dependency structure between the replicas does not play a role, but in the c.o.c. variant of redundancy-d, and also in the fork–join model, the dependency structure does matter. We thus allow the replica sizes \(B_{1},\dots ,B_{n_{\mathrm {F}}}\) of a job to be governed by some joint distribution function \(F_{B}(b_{1},\dots ,b_{n_{\mathrm {F}}})\), where \(B_{i}\), \(i=1,\dots ,n_{\mathrm {F}}\), are each distributed as some random variable B, but not necessarily independent. Special cases of the dependency structure are: (1) perfect dependency between the variables, so-called identical replicas, where the job size is preserved for all replicas, i.e., \(B_{i}=B\), for all \(i=1,\dots ,n_{\mathrm {F}}\), (2) no dependency at all, so-called i.i.d. replicas.
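
To make the two special dependency structures concrete, the following sketch draws the replica sizes of a single job under each of them; the Pareto distribution and the function name are illustrative assumptions, not part of the model.

```python
import random

def replica_sizes(n_f, dependency="identical", nu=2.5):
    """Draw the n_F replica sizes of one job, each marginally Pareto(nu) on
    [1, inf). 'identical' copies one draw to all replicas (perfect
    dependency); 'iid' draws each replica independently. The Pareto marginal
    is an illustrative choice, not prescribed by the model."""
    def pareto():
        return random.random() ** (-1.0 / nu)  # inverse-CDF sampling
    if dependency == "identical":
        b = pareto()
        return [b] * n_f
    return [pareto() for _ in range(n_f)]

random.seed(0)
print(replica_sizes(3, "identical"))  # all three entries are equal
print(replica_sizes(3, "iid"))        # three independent draws
```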

Finally, let us denote the steady-state waiting times of the replicas at their \(n_{\mathrm {F}}\) servers (the time until their service starts, if they are still in the system) by \(W_1,\dots ,W_{n_{\mathrm {F}}}\), and the steady-state response time by R. Let \(X_{(n_{\mathrm {J}})}\) denote the \(n_{\mathrm {J}}\)th order statistic of a set of random variables \(X_{1},\dots ,X_{N}\). Two real random variables \(Y_{1}\) and \(Y_{2}\) are said to be equal in distribution (denoted by \(Y_{1} {\mathop {=}\limits ^{d}} Y_{2}\)) if \(\mathbb {P}(Y_{1}> x) = \mathbb {P}(Y_{2} > x)\) for all \(x \in (-\infty , \infty ) \).

3 FCFS discipline

In this section, we analyze the tail asymptotics of the response time with the FCFS discipline. For the c.o.s. variant (Sect. 3.1), we restrict ourselves to redundancy-d scheduling, whereas for the c.o.c. variant (Sect. 3.2) we allow for the more general fork–join model.

3.1 Cancel-on-start

Observe that the steady-state response time in the c.o.s. variant of redundancy-d is given by

$$\begin{aligned} R {\mathop {=}\limits ^{d}} \min \{W_{1},\dots ,W_{d}\} + B. \end{aligned}$$
(6)

We refer to the time between the arrival of a job and the moment the first replica goes into service as the waiting time \(W_{\mathrm {min}} = \min \{W_{1},\dots ,W_{d}\}\) of a job. As mentioned earlier, the c.o.s. variant of redundancy-d is equivalent to the Join-the-Smallest-Workload-d (JSW-d) policy, which assigns each job to the server with the smallest workload among d servers selected uniformly at random.

For general interarrival times and job sizes, the stability condition for the system with the JSW-d policy and FCFS is given by \(\tilde{\rho } = \frac{\mathbb {E}[B]}{\mathbb {E}[A]} < N\), see [11].
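
The JSW-d dynamics (and hence, by the equivalence above, the c.o.s. variant of redundancy-d) can be sketched as a simple workload-vector recursion. Poisson arrivals and Pareto job sizes are illustrative assumptions here; the code estimates the waiting times \(W_{\mathrm {min}}\).

```python
import random

def simulate_jsw_d(n_servers, d, lam, nu, n_jobs=50_000, seed=42):
    """Workload recursion for JSW-d (equivalently, c.o.s. redundancy-d):
    each job samples d servers uniformly at random without replacement and
    joins the one with the smallest workload. Poisson arrivals of rate lam
    and Pareto(nu) job sizes on [1, inf) are illustrative choices. Returns
    the per-job waiting times W_min."""
    rng = random.Random(seed)
    v = [0.0] * n_servers
    waits = []
    for _ in range(n_jobs):
        a = rng.expovariate(lam)
        v = [max(w - a, 0.0) for w in v]          # work drains between arrivals
        chosen = rng.sample(range(n_servers), d)  # d servers, without replacement
        target = min(chosen, key=lambda i: v[i])  # smallest workload among them
        waits.append(v[target])
        v[target] += rng.random() ** (-1.0 / nu)  # Pareto(nu) job size
    return waits

# load rho = E[B]/E[A] = (nu/(nu-1))/1 ~ 1.67 < N = 5, so the system is stable
waits = simulate_jsw_d(n_servers=5, d=2, lam=1.0, nu=2.5)
print(sum(waits) / len(waits))
```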

In [13, Theorem 1.6], lower and upper bounds are derived for the tail probability of the waiting time for the JSW policy. The same methodology can be used to find lower and upper bounds for JSW-d, and hence for the c.o.s. variant of redundancy scheduling with \(1 \le d \le N\) replicas, resulting in Theorem 1. The two derived lower bounds in this theorem hold for every value of \(\tilde{\rho }\), but they are asymptotically dominant for different regions of \(\tilde{\rho }\), as explained after the theorem. Note that for \(d=N\) Theorem 1 recovers the results of [13] as captured in (4) and (5), whereas for \(d=1\) the system is equivalent to a GI/G/1 queue for which the tail behavior is given by (2).

Theorem 1

Consider the c.o.s. variant of redundancy-d scheduling with the FCFS discipline. Let \(k = \lfloor \tilde{\rho } \rfloor \in \{0,1,\dots ,N-1\}\) be the integer part of the load and \(\delta >0\).

i) If the residual job size \(B^{\mathrm {res}}\) is long-tailed, then

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}} > x) \ge \frac{1}{{N \atopwithdelims ()d}} \frac{\tilde{\rho }^{d}+ o(1)}{d!} \left( \bar{B}^{\mathrm {res}} \left( \left( 1 + \delta \right) x \right) \right) ^{d}. \end{aligned}$$
(7)

ii) If \(\tilde{\rho } < N - d\) and the residual job size \(B^{\mathrm {res}}\) is subexponential, then

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}} > x) \le {N \atopwithdelims ()d} \left( \frac{(k+1)\tilde{\rho }}{k+1-\tilde{\rho }}+o(1)\right) ^{d} \left( \bar{B}^{\mathrm {res}} \left( \frac{x (1-\delta )}{k+1} \right) \right) ^{d}. \end{aligned}$$
(8)

iii) If the residual job size \(B^{\mathrm {res}}\) is long-tailed, then

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}} > x) \ge \frac{\tilde{\rho }^{N-k}+ o(1)}{(N-k)!} \left( \bar{B}^{\mathrm {res}}\left( \frac{\tilde{\rho }+\delta }{\tilde{\rho }- k} x\right) \right) ^{N-k}. \end{aligned}$$
(9)

iv) If \(\tilde{\rho } > N - d\) and the residual job size \(B^{\mathrm {res}}\) is subexponential, then

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}} > x) \le {N \atopwithdelims ()k} \left( \frac{(k+1)\tilde{\rho }}{k+1-\tilde{\rho }}+o(1)\right) ^{N-k} \left( \bar{B}^{\mathrm {res}} \left( \frac{(k+1-N+d)x (1-\delta )}{k+1} \right) \right) ^{N-k}. \end{aligned}$$
(10)

Proof

Let \(\varvec{V}=(V_{1},\dots ,V_{N})\) denote the vector of residual workloads of the servers. Recall that \(V_{(i)}\) denotes the ith-order statistic of the set \(V_{1},\dots ,V_{N}\). The proof of i) follows from the inequality

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}}> x) \ge \frac{1}{{N \atopwithdelims ()d}} \mathbb {P}(V_{(1)}> x,\dots ,V_{(d)} > x), \end{aligned}$$

with \(\frac{1}{{N \atopwithdelims ()d}}\) corresponding to the probability that the replicas of an arbitrary job are assigned to the servers with the d largest workloads, and where

$$\begin{aligned} \mathbb {P}(V_{(1)}> x,\dots ,V_{(d)} > x) \ge \frac{\tilde{\rho }^{d}+ o(1)}{d!} \left( \bar{B}^{\mathrm {res}} \left( \left( 1 + \delta \right) x \right) \right) ^{d}, \end{aligned}$$

by similar arguments as in the proof of Lemma 3.1 in [13]. The proof of iii) follows from the inequality

$$\begin{aligned} \mathbb {P}(W_{\mathrm {min}}> x) \ge \mathbb {P}(V_{1}> x,\dots ,V_{N} > x), \end{aligned}$$

where

$$\begin{aligned} \mathbb {P}(V_{1}> x,\dots ,V_{N} > x) \ge \frac{\tilde{\rho }^{N-k}+ o(1)}{(N-k)!} \left( \bar{B}^{\mathrm {res}}\left( \frac{\tilde{\rho }+\delta }{\tilde{\rho }- k} x\right) \right) ^{N-k}, \end{aligned}$$

by similar arguments as in the proof of Theorem 5.1 in [13]. The proof of ii) and iv) can be found in Appendix B.

As reflected in the proof sketches, the asymptotic lower bounds in (7) and (9) correspond to two different scenarios for a large value of \(W_{\mathrm {min}}\) to occur.

Scenario 1 involves the arrival of d jobs of size x or larger ‘overlapping in time.’ In the JSW-d system, these jobs will be assigned to d different servers with overwhelming probability for large x, and thus the workload at these d servers will exceed x. A newly arriving job that is so unfortunate as to sample exactly these d servers (which happens with probability \(1/{N \atopwithdelims ()d}\)) will experience a waiting time larger than x. Scenario 2 involves the arrival of \(N-k\) sufficiently large jobs ‘overlapping in time,’ which instantaneously causes the workloads at \(N-k\) servers to become large as described above, assuming \(N - k \le d\). This will also result in subsequent jobs all being assigned to one of the other k servers and hence create overload, so that the workloads at these servers will gradually start growing. Thus, eventually the workloads at all servers will be large, and every arriving job will experience a large waiting time. Observe that this scenario corresponds to that in the GI/G/N queue discussed in [13], as illustrated by the match with Eq. (4).

Scenarios 1 and 2 are asymptotically dominant in case \(d \le N-k\) and \(d \ge N-k\), respectively, reflecting that a large waiting time is most likely due to a minimum number of \(d_{\mathrm {cap}} = \min \{d, N - k\}\) large jobs. Note that in Scenario 1 the workloads at all servers will in fact grow large as well when \(d \ge N - k\), but that Scenario 2 dominates in that case.

Scenarios with large workloads at l servers, with \(d< l < N\), do not asymptotically contribute to the probability of a large waiting time. This may be intuitively explained by observing the following. (1) If such scenarios involve strictly more than d large workloads without resulting in overload of all servers (so \(d< l < N-k\)) then they are asymptotically much less likely than Scenario 1. (2) If such scenarios involve \(l \ge N-k\) large workloads, this will quickly result in overload of all servers, just like in Scenario 2.

Extending these arguments and the results in [13] to the c.o.s. variant of the fork–join (\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is complicated since in that model multiple replicas of the same job may be in service simultaneously.

Corollary 1

(Analogous to Corollary 1.1 in [13]) Let the residual job size \(B^{\mathrm {res}}\) be long-tailed and dominated varying, and let \(k<\tilde{\rho }<k+1\), i.e., \(\tilde{\rho }\) is not an integer. Then, there exist constants \(c_{1}\) and \(c_{2}\) such that, for all x,

$$\begin{aligned} c_{1} \left( \bar{B}^{\mathrm {res}}(x ) \right) ^{d_{\mathrm {cap}}} \le \mathbb {P}(W_{\mathrm {min}} > x) \le c_{2} \left( \bar{B}^{\mathrm {res}}(x ) \right) ^{d_{\mathrm {cap}}}, \end{aligned}$$

where \(d_{\mathrm {cap}} = \min \{d,N-k\}\).

Proof

The result follows directly from Theorem 1, the last inclusion in (A.3) and the definition of dominated variation (Definition 4 in Appendix A).

Remark 1

Note that in Corollary 1 we exclude integer values for the load. Most of the results in the literature for heavy-tailed queueing systems focus on the case where the load is not an integer, since the integer case is significantly more delicate to analyze. For a detailed study on the integer case in the GI/G/2 queueing system we refer to [5].

Corollary 2

For the c.o.s. variant of redundancy-d scheduling with the FCFS discipline:

i) if \(B \in RV(-\nu )\), then \(W_{\mathrm {min}} \in ORV(d_{\mathrm {cap}}(1-\nu ))\);

ii) if \(B \in RV(-\nu )\), then \(R \in ORV(-\min \{d_{\mathrm {cap}}(\nu -1),\nu \})\).

Proof

It is well known that if \(B \in RV(-\nu )\), then \(B^{\mathrm {res}} \in RV(1-\nu )\), see (3). The proof of i) follows by applying this result to Corollary 1 together with the inclusion \(RV \subset \mathcal {L} \cap {\mathcal {D}}\) from (A.3) and Lemma 6 in Appendix A (see Definition 5 in Appendix A for the definition of \(\mathcal {O}\)-regularly varying; ORV). The proof of ii) follows by i), Eq. (6) and Lemma 5 in Appendix A.

From Corollary 2, we conclude that the waiting time component is dominant in the response time tail as long as \(d_{\mathrm {cap}} \le \frac{\nu }{\nu -1}\), and otherwise the job size component is dominant. A lighter tail than \(x^{-\nu }\) is obviously not possible for the response time. In other words, having more than \(\frac{\nu }{\nu -1}\) replicas will not provide any improvement in the tail behavior. For example, consider a system with a sufficiently small load. If \(\nu =4/3\), then \(d=4\) already yields \(R \in ORV(-\nu )\), and from a tail perspective choosing \(d>4\) yields no benefits. If \(\nu =3/2\), then it does not pay to take d larger than 3. If \(\nu \ge 2\) (so B has a finite second moment), then it does not pay to take d larger than 2. For high loads, the results indicate that creating many replicas yields no benefits for the tail index of the response time.
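
The smallest number of replicas that attains the optimal tail at small loads, \(\lceil \frac{\nu }{\nu -1} \rceil \), is easy to tabulate. The sketch below uses exact fractions to avoid floating-point round-off at the boundary cases; the function name is ours, for illustration only.

```python
from fractions import Fraction
import math

def optimal_replicas(nu):
    """Smallest d such that d >= nu/(nu-1), i.e., the number of replicas
    beyond which the job size component dominates the response-time tail at
    small loads (cf. Corollary 2). Exact rational arithmetic guards against
    round-off when nu/(nu-1) is an integer."""
    nu = Fraction(nu).limit_denominator(10**6)
    return math.ceil(nu / (nu - 1))

# examples from the text: nu = 4/3 -> 4, nu = 3/2 -> 3, nu >= 2 -> 2
print([(float(nu), optimal_replicas(nu)) for nu in (Fraction(4, 3), Fraction(3, 2), 2, 3)])
```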

3.2 Cancel-on-completion

In this section, we analyze the tail behavior for the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model.

The steady-state response time may be represented as

$$\begin{aligned} R {\mathop {=}\limits ^{d}} (W+B)_{(n_{\mathrm {J}})}, \end{aligned}$$
(11)

where \((W+B)_{(n_{\mathrm {J}})}\) denotes the \(n_{\mathrm {J}}\)th order statistic of the random variables \(W_{1}+B_{1},\dots ,W_{n_{\mathrm {F}}}+B_{n_{\mathrm {F}}}\).

Our analysis is based on an upper and lower bound for the waiting time and response time via the workload in carefully chosen upper and lower bound systems.

We first introduce the upper bound system, which is the same as the original system except for two differences: first, all jobs are assigned to the same \(n_{\mathrm {F}}\) servers; second, the sizes of the \(n_{\mathrm {J}}\) smallest replicas are increased to \(B_{(n_{\mathrm {J}})}\). This upper bound system is similar to the system defined in [25, Lemma 1].

Let us define the workload as the amount of work a server needs to complete to become idle in the absence of any further arrivals. Consider the scenario where all \(n_{\mathrm {F}}\) servers have the same amount of workload. For the FCFS discipline, it follows from the cancel-on-completion property that the \(n_{\mathrm {J}}\)th smallest replica will always be the \(n_{\mathrm {J}}\)th to complete, after which the other remaining replicas are abandoned. Hence, the workload at these \(n_{\mathrm {F}}\) servers stays equal at all times, and it follows that the upper bound system with multiple servers is equivalent to a GI/G/1/FCFS queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\).
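
This equivalence suggests a direct way to sample waiting times in the upper bound system: a Lindley recursion with job size \(B_{(n_{\mathrm {J}})}\). The sketch below assumes i.i.d. replicas, Poisson arrivals, and Pareto job sizes purely for illustration, with the arrival rate chosen so that \(\rho _{\mathrm {U}} < 1\).

```python
import random

def upper_bound_waits(n_f, n_j, lam=0.5, nu=3.0, n_jobs=100_000, seed=7):
    """Waiting times in the upper bound system: a GI/G/1/FCFS queue with
    interarrival time A and job size B_(n_J), the n_J-th smallest of the n_F
    replica sizes. I.i.d. replicas, Poisson arrivals of rate lam, and
    Pareto(nu) sizes on [1, inf) are illustrative assumptions; lam = 0.5
    keeps rho_U = E[B_(n_J)] / E[A] < 1 for the defaults below.
    Lindley recursion: W <- max(W + B - A, 0)."""
    rng = random.Random(seed)
    w, waits = 0.0, []
    for _ in range(n_jobs):
        replicas = sorted(rng.random() ** (-1.0 / nu) for _ in range(n_f))
        b = replicas[n_j - 1]              # B_(n_J)
        a = rng.expovariate(lam)           # interarrival time to the next job
        waits.append(w)                    # waiting time of the current job
        w = max(w + b - a, 0.0)
    return waits

waits = upper_bound_waits(n_f=3, n_j=1)
print(sum(waits) / len(waits))
```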

Lemma 1

The maximum workload in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is sample-pathwise bounded from above by the workload in the upper bound system.

Proof

Let \(\omega _{i}\) be the workload at server i in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model, and let the maximum workload be defined as \(\max _{j \in \{1,\dots ,N \}}\omega _{j} = \omega _{(N)}\). Let \(s_{l}\) and \(b_{l}\) denote the sampled server and the realized job size of the l-th replica, respectively, for \(l=1,\dots ,n_{\mathrm {F}}\). By induction, it can be shown that \(\omega _{i}\) is bounded from above by the workload \(\omega _{\mathrm {U}}\) in the upper bound system at all times. Assume that \(\omega _{(N)} \le \omega _{\mathrm {U}}\) after the m-th arrival. Then, after the \((m+1)\)-th arrival the new workload, denoted by \(\omega _{\mathrm {new},s_{l}}\), is

$$\begin{aligned} \omega _{\mathrm {new},s_{l}} = \max \{(\omega _{s_{l}} +b_{l})_{(n_{\mathrm {J}})}, \omega _{s_{l}}\} \le \max \{ (\omega _{(N)}+b_{l})_{(n_{\mathrm {J}})}, \omega _{(N)}\} = \omega _{(N)} + b_{(n_{\mathrm {J}})}, \end{aligned}$$

for \(l=1,\dots ,n_{\mathrm {F}}\), since \(\omega _{i} \le \omega _{(N)}\) for all \(i=1,\dots ,N\). Thus, the increase in maximum workload is bounded by \(b_{(n_{\mathrm {J}})}\), which is exactly the increase in workload in a GI/G/1/FCFS queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\).

Corollary 3

The waiting time \(W_{(n_{\mathrm {J}})}\) and the response time R in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the FCFS discipline are stochastically bounded from above by the waiting time \(W_{\mathrm {U}}\) and response time \(R_{\mathrm {U}}\), respectively, in the upper bound system.

Proof

By Lemma 1, the maximum workload in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is bounded from above by the workload \(W_{\mathrm {U}}^{0}\) in the upper bound system. This bound implies \(W_{i} \le W_{\mathrm {U}}^{0}\) for all \(i=1,\dots ,N\), from which it follows that \(W_{(n_{\mathrm {J}})} \le W_{\mathrm {U}}^{0} {\mathop {=}\limits ^{d}} W_{\mathrm {U}}\) and

$$\begin{aligned} R {\mathop {=}\limits ^{d}} (W+B)_{(n_{\mathrm {J}})} \le W_{\mathrm {U}}^{0} + B_{(n_{\mathrm {J}})} {\mathop {=}\limits ^{d}} W_{\mathrm {U}} + B_{(n_{\mathrm {J}})} {\mathop {=}\limits ^{d}} R_{\mathrm {U}}. \end{aligned}$$

We now introduce a lower bound system. In the lower bound system, we only admit jobs for which the \(n_{\mathrm {F}}\) replicas are assigned to servers \(1,\dots ,n_{\mathrm {F}}\), and in addition the ith smallest replica is assigned to server i, \(i=1,\dots ,n_{\mathrm {J}}\). Hence, we only admit a fraction 1/K of the jobs, where \(K= {N \atopwithdelims ()n_{\mathrm {F}}} \frac{n_{\mathrm {F}}!}{(n_{\mathrm {F}}-n_{\mathrm {J}}+1)!}\) and we do not alter the assignment of the replicas.

Lemma 2

The workload at each server in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is sample-pathwise bounded from below by the workload at the corresponding server in the lower bound system.

Proof

Since in the lower bound system we only allow arrivals to the first \(n_{\mathrm {F}}\) servers for which in addition the ith smallest replica is assigned to server i, \(i=1,\dots ,n_{\mathrm {J}}\) and since we delete the other arrivals (which are not deleted in the original system), the amount of work at each server in the lower bound system cannot be larger than the amount of work at the corresponding server in the original system.

For the FCFS discipline, the \(n_{\mathrm {J}}\)th smallest replica will always be the \(n_{\mathrm {J}}\)th to complete in the lower bound system. Moreover, this replica is always assigned to server \(n_{\mathrm {J}}\). Hence, this server acts as the bottleneck server since it dictates the waiting time and the response time of all the admitted jobs, and can be viewed as the server of a GI/G/1/FCFS queue with a random selection of the arrivals based on Bernoulli experiments with probability 1/K, i.e., mean interarrival time \(K \mathbb {E}[A]\), and with a job size \(B_{(n_{\mathrm {J}})}\).
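
The bottleneck queue of the lower bound system can be sketched likewise: thin the arrival stream with probability 1/K and feed the admitted jobs, with size \(B_{(n_{\mathrm {J}})}\), through a workload recursion. I.i.d. replicas, Poisson arrivals, and Pareto sizes are again illustrative assumptions.

```python
import math
import random

def lower_bound_waits(n_servers, n_f, n_j, lam=1.0, nu=3.0, n_jobs=200_000, seed=11):
    """Workload at the bottleneck server of the lower bound system: jobs are
    admitted with probability 1/K (Bernoulli thinning) and served FCFS with
    job size B_(n_J), where K = C(N, n_F) * n_F! / (n_F - n_J + 1)!.
    I.i.d. replicas, Poisson arrivals, and Pareto(nu) sizes are illustrative
    assumptions. Returns K and the waiting times of the admitted jobs."""
    K = math.comb(n_servers, n_f) * math.factorial(n_f) // math.factorial(n_f - n_j + 1)
    rng = random.Random(seed)
    v, waits = 0.0, []
    for _ in range(n_jobs):
        v = max(v - rng.expovariate(lam), 0.0)  # work drains between arrivals
        if rng.random() < 1.0 / K:              # admit a fraction 1/K of the jobs
            waits.append(v)                     # admitted job waits for workload v
            v += sorted(rng.random() ** (-1.0 / nu) for _ in range(n_f))[n_j - 1]
    return K, waits

K, waits = lower_bound_waits(n_servers=4, n_f=2, n_j=1)
print(K, sum(waits) / len(waits))
```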

Corollary 4

The waiting time \(W_{(n_{\mathrm {J}})}\) and the response time R in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the FCFS discipline are stochastically bounded from below by the waiting time \(W_{\mathrm {L}}\) and response time \(R_{\mathrm {L}}\), respectively, in the above-mentioned GI/G/1/FCFS queue.

Proof

By Lemma 2, the workload at each server in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is bounded from below by the workload at the corresponding server in the lower bound system. Also, in the lower bound system, the workload \(W_{\mathrm {L}}^{0}\) at the bottleneck server \(n_{\mathrm {J}}\) is no smaller than the workload at servers \(1,\dots ,n_{\mathrm {J}}-1\), and no larger than the workload at servers \(n_{\mathrm {J}}+1,\dots ,n_{\mathrm {F}}\). Thus, \(W_{i} \ge W_{\mathrm {L}}^{0}\) for all \(i=n_{\mathrm {J}},\dots ,n_{\mathrm {F}}\), which implies \(W_{(n_{\mathrm {J}})} \ge W_{\mathrm {L}}^{0} {\mathop {=}\limits ^{d}} W_{\mathrm {L}}\). Since the replica sizes at servers \(n_{\mathrm {J}}+1,\dots ,n_{\mathrm {F}}\) are no smaller than at server \(n_{\mathrm {J}}\), it further follows that

$$\begin{aligned} R {\mathop {=}\limits ^{d}} (W+B)_{(n_{\mathrm {J}})} \ge W_{\mathrm {L}}^{0} + B_{(n_{\mathrm {J}})} {\mathop {=}\limits ^{d}} W_{\mathrm {L}} + B_{(n_{\mathrm {J}})} {\mathop {=}\limits ^{d}} R_{\mathrm {L}}. \end{aligned}$$

A sufficient stability condition for general interarrival times and job sizes is \(\rho _{\mathrm {U}} := \frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{\mathbb {E}[A]} < 1\), which can be proved via the upper bound system given in Corollary 3. The exact stability condition for the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model, and also redundancy-d scheduling, with the FCFS discipline in such a general setting is still unknown. Observe that it is hard to improve upon this sufficient stability condition resulting from the upper bound system. Indeed, finding an upper bound system that copes with multiple replicas, which may be in service concurrently and have different starting times, while being analytically tractable is difficult, as is also reflected in the scarcity of analytical results for the fork–join model in the literature.

Theorem 2

If \(\rho _{\mathrm {U}} < 1\) and the residual job size \(B_{(n_{\mathrm {J}})}^{\mathrm {res}}\) is subexponential, then for the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the FCFS scheduling discipline:

$$\begin{aligned} \frac{\rho _{\mathrm {L}}}{1-\rho _{\mathrm {L}}} \bar{B}_{(n_{\mathrm {J}})}^{\mathrm {res}}(x) \le \mathbb {P}(W_{(n_{\mathrm {J}})} > x) \le \frac{\rho _{\mathrm {U}}}{1-\rho _{\mathrm {U}}} \bar{B}_{(n_{\mathrm {J}})}^{\mathrm {res}}(x) ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

where \(\rho _{\mathrm {L}} =\frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{K \mathbb {E}[A]}\) with \(K={N \atopwithdelims ()d}\frac{n_{\mathrm {F}}!}{(n_{\mathrm {F}}-n_{\mathrm {J}}+1)!}\) and \(\rho _{\mathrm {U}} =\frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{\mathbb {E}[A]}\).

Proof

Upper bound: By Corollary 3, the waiting time of a job is bounded from above by the waiting time \(W_{\mathrm {U}}\) in a GI/G/1/FCFS queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\). Thus, by the subexponentiality of \(B_{(n_{\mathrm {J}})}^{\mathrm {res}}\), we can apply known results for the single-server queue, see (1), and obtain

$$\begin{aligned} \mathbb {P}(W_{\mathrm {U}} > x) \sim \frac{\rho _{\mathrm {U}}}{1-\rho _{\mathrm {U}}} \bar{B}_{(n_{\mathrm {J}})}^{\mathrm {res}}(x) ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$
(12)

Lower bound: By Corollary 4, the waiting time of a job is bounded from below by the waiting time \(W_{\mathrm {L}}\) in a GI/G/1/FCFS queue with a random selection of the arrivals based on Bernoulli experiments with probability 1/K, i.e., mean interarrival time \(K \mathbb {E}[A]\), and job size \(B_{(n_{\mathrm {J}})}\). Again, by the subexponentiality of \(B_{(n_{\mathrm {J}})}^{\mathrm {res}}\), by applying known results for the single-server queue we obtain

$$\begin{aligned} \mathbb {P}(W_{\mathrm {L}} > x) \sim \frac{\rho _{\mathrm {L}}}{1-\rho _{\mathrm {L}}} \bar{B}_{(n_{\mathrm {J}})}^{\mathrm {res}}(x) ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$
(13)

By combining (12) and (13), we get the desired statement.

Note that the lower bound in Theorem 2 is valid even if \(\rho _{\mathrm {U}} > 1\), since the auxiliary system in Corollary 4 is stable if the original system is stable.
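As a numeric companion to Theorem 2, the constant K and the two loads can be computed directly. A sketch, assuming the binomial coefficient is taken over the \(n_{\mathrm {F}}\) servers chosen out of N (i.e., reading d as \(n_{\mathrm {F}}\), which is an assumption about the notation):

```python
from math import comb, factorial

def theorem2_loads(N, n_F, n_J, EB_nJ, EA):
    """rho_L, rho_U and K from Theorem 2. The binomial coefficient
    is read here with d = n_F (an assumption about the notation);
    EB_nJ = E[B_(n_J)] and EA = E[A]."""
    K = comb(N, n_F) * factorial(n_F) // factorial(n_F - n_J + 1)
    rho_L = EB_nJ / (K * EA)
    rho_U = EB_nJ / EA
    return rho_L, rho_U, K
```

For instance, with N=3, \(n_{\mathrm {F}}=2\), \(n_{\mathrm {J}}=1\), this gives K=3, so the lower bound system sees only one in three arrivals.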

The next corollary provides insight into the tail behavior when the distribution of the \(n_{\mathrm {J}}\)th order statistic of the job size is regularly varying, i.e., \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\). Observe that, in the special case of identical replicas, \(B \in RV(-\nu )\) implies \(B_{(n_{\mathrm {J}})} \in RV(-\nu )\), so that \(\tilde{\nu } = \nu \), whereas for i.i.d. replicas, \(B \in RV(-\nu )\) implies \(B_{(n_{\mathrm {J}})} \in RV(-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu )\) (see [20]), so that \(\tilde{\nu } = (n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \).

Corollary 5

For the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the FCFS discipline and \(\rho _{\mathrm {U}} < 1\):

  i) if \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\), then \(W_{(n_{\mathrm {J}})} \in ORV(1-\tilde{\nu })\),

  ii) if \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\), then \(R \in ORV(1-\tilde{\nu })\).

Proof

For regularly varying residual job sizes, we know that

$$\begin{aligned} \mathbb {P}(B_{(n_{\mathrm {J}})}^{\mathrm {res}} > x) \sim \frac{1}{(\tilde{\nu } - 1)\mathbb {E}[B_{(n_{\mathrm {J}})}]} L(x) x^{1-\tilde{\nu }} ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

see (3). The proof of i) follows by Theorem 2 and Lemma 6. For the response time, we can again use Corollaries 3 and 4 as in Theorem 2. Using the known result for the tail behavior in the single-server queue, see (2), together with Lemma 4 we obtain that

$$\begin{aligned} \mathbb {P}(R> x) \ge \mathbb {P}(R_{\mathrm {L}}> x) = \mathbb {P}(W_{\mathrm {L}} + B_{(n_{\mathrm {J}})} > x) \sim \frac{\rho _{\mathrm {L}}}{1-\rho _{\mathrm {L}}} \frac{L(x) x^{1-\tilde{\nu }}}{(\tilde{\nu } - 1)\mathbb {E}[B_{(n_{\mathrm {J}})}]}~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

and

$$\begin{aligned} \mathbb {P}(R> x) \le \mathbb {P}(R_{\mathrm {U}}> x) = \mathbb {P}(W_{\mathrm {U}} + B_{(n_{\mathrm {J}})} > x) \sim \frac{\rho _{\mathrm {U}}}{1-\rho _{\mathrm {U}}} \frac{L(x) x^{1-\tilde{\nu }}}{(\tilde{\nu } - 1)\mathbb {E}[B_{(n_{\mathrm {J}})}]} ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$

Now we can apply Lemma 6 in Appendix A and obtain the desired result.

Remark 2

For identical replicas, we can find an even better upper bound than the one in Theorem 2. Indeed, consider the system in which all replicas are completely served. This system is equivalent to a GI/G/1/FCFS queue with a random selection of the arrivals based on Bernoulli experiments with probability \(n_{\mathrm {F}}/N\), i.e., mean interarrival time \(\frac{N \mathbb {E}[A]}{n_{\mathrm {F}}}\), and job size B, which is equal to \(B_{(n_{\mathrm {J}})}\) in the case of identical replicas.

Observe that all the results for the tail index rely on the fact that the upper bound system is stable. The stability condition of this system does not necessarily coincide with the stability condition of the original fork–join model. We conjecture that these tail index results are valid whenever the original fork–join model is stable. However, note that constructing a tractable upper bound system with the same stability condition as the original fork–join model is hard, because this stability condition is unknown.

Interestingly, in contrast to the c.o.s. variant of redundancy, we observe that the tail index in the c.o.c. variant of the fork–join model does not depend on the load of the system. The main difference between the two variants is that for the c.o.s. variant we need multiple big jobs for a large value of \(W_{\mathrm {min}}\) to occur, whereas for the c.o.c. variant we only need one big job. Moreover, note that a big job means that at least \(n_{\mathrm {F}}+1-n_{\mathrm {J}}\) replica sizes should be big since we cancel the redundant replicas as soon as the first \(n_{\mathrm {J}}\) replicas complete service. This is the reason why for i.i.d. replicas we get the tail index \(1-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \) and for identical replicas \(1-\nu \).

In the remainder of this subsection, we focus on two special cases of the dependency structure, namely identical and i.i.d. replicas.

For the special case of identical replicas in the c.o.c. variant of the fork–join model with the FCFS discipline, we have concluded: if \(B \in RV(-\nu )\), then \(R \in ORV(1-\nu )\), independently of the number of replicas. We may conclude that the tail index is the same as for the single-server queue, see (2). Moreover, comparing the tail index of the c.o.s. and c.o.c. variants of redundancy scheduling with identical replicas, it follows that the c.o.s. variant always performs better from a tail perspective.

For the special case of i.i.d. replicas in the c.o.c. variant of the fork–join model with the FCFS discipline, we have concluded: if \(B \in RV(-\nu )\), then \(R \in ORV(1-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu )\). If \(n_{\mathrm {F}}=n_{\mathrm {J}}=1\), then \(R \in ORV(1-\nu )\), which is consistent with the case of identical replicas. Moreover, comparing the tail index of the c.o.s. and c.o.c. variants of redundancy scheduling with i.i.d. replicas, it follows that the c.o.c. variant always performs better from a tail perspective. Observe that this contrasts with the case of identical replicas.
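The two tail indices for \(B_{(n_{\mathrm {J}})}\) can also be checked empirically. A sketch (our own illustration, assuming Pareto replica sizes, which are regularly varying) estimates the tail index with the Hill estimator:

```python
import math
import random

def hill_estimator(samples, k):
    """Hill estimator of the tail index, based on the k largest samples."""
    s = sorted(samples, reverse=True)
    logs = [math.log(x) for x in s[:k + 1]]
    return k / sum(logs[i] - logs[k] for i in range(k))

def order_stat_iid_pareto(n_F, n_J, nu, x_m, rng):
    """n_J-th smallest of n_F i.i.d. Pareto(nu, x_m) replica sizes,
    sampled by inversion: B = x_m * U**(-1/nu) with U uniform on (0,1]."""
    reps = sorted(x_m * (1.0 - rng.random()) ** (-1.0 / nu) for _ in range(n_F))
    return reps[n_J - 1]
```

For i.i.d. replicas with \(n_{\mathrm {F}}=3\), \(n_{\mathrm {J}}=1\) and \(\nu =1.5\), the estimate should be near \((n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu = 4.5\); for identical replicas it stays near \(\nu \).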

We studied two special structures for the dependency between replicas. The general case, with a vector \((B_1,\dots ,B_{n_{\mathrm {F}}})\) of possibly dependent and multivariate regularly varying job sizes, will be more involved. For further information on multivariate regular variation, we refer to [28] or [4, Appendix A1.5] and the references therein.

We determined the tail behavior for the c.o.s. variant of redundancy scheduling and the c.o.c. variant of the more general fork–join model. It can be concluded that the analysis of the c.o.s. variant is much more challenging than that of the c.o.c. variant. One of the reasons is that for the c.o.s. variant multiple big jobs might be needed for a large waiting time to occur, while for the c.o.c. variant only one big job is needed. In some sense, this is remarkable, since for the stability condition it is the other way around: the stability condition for the c.o.s. variant of redundancy scheduling is known, whereas for the c.o.c. variant of the fork–join model, and also redundancy-d scheduling, it is still an open problem for non-exponential job size distributions.

4 LCFS-PR discipline

In this section, we study the tail behavior of the response time in the fork–join model with the LCFS-PR discipline. First, we discuss known results for the single-server queue; the tail behavior for the c.o.s. and c.o.c. variants of the fork–join model is then discussed in Sects. 4.1 and 4.2, respectively.

For the GI/G/1 queue with regularly varying job sizes, the tail behavior of the response time distribution is known:

$$\begin{aligned} \mathbb {P}(R_{\text {LCFS-PR}} > x) \sim \mathbb {E}[N_{\text {bp}}] (1-\tilde{\rho })^{-\nu } L(x) x^{-\nu } ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$
(14)

where \(N_{\text {bp}}\) denotes the number of jobs completed during a busy period, see [33]. One way to understand (14) is the following. First, observe that for the LCFS-PR discipline

$$\begin{aligned} R_{\text {LCFS-PR}} {\mathop {=}\limits ^{d}} P, \end{aligned}$$

where P is the busy period of a GI/G/1 queue. Let V(t) be the amount of work in the system at time t and assume that the first job arrives in an empty system at time 0. The busy period P is then defined as

$$\begin{aligned} P := \inf \{t>0 : V(t) = 0\}. \end{aligned}$$

Let the cycle maximum \(C_{\text {max}}\) be defined by

$$\begin{aligned} C_{\text {max}} := \sup \{V(t), 0 \le t \le P\}. \end{aligned}$$

It is shown, see for example [18, Corollary 2.2], that subexponentiality of B implies that \(\mathbb {P}(C_{\text {max}}> x) \sim \mathbb {P}(W_{\text {max}} > x)\), where \(W_{\text {max}}\) is the maximum waiting time during a busy period, and from [1] we know that,

$$\begin{aligned} \mathbb {P}(W_{\text {max}}> x) \sim \mathbb {E}[N_{\mathrm {bp}}] \mathbb {P}(B > x) ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$

Combining both relations gives

$$\begin{aligned} \mathbb {P}(C_{\text {max}}> x) \sim \mathbb {E}[N_{\mathrm {bp}}] \mathbb {P}(B > x) ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$

A large maximum waiting time is most likely due to one large job. After this large job, the system behaves normally and the workload goes to zero with negative drift \(-(1-\tilde{\rho })\). Hence, if \(C_{\text {max}}\) is large, then one would expect that

$$\begin{aligned} P \approx \frac{C_{\text {max}}}{1-\tilde{\rho }}, \end{aligned}$$

from which it follows that

$$\begin{aligned} \mathbb {P}(P > x) \sim \mathbb {E}[N_{\mathrm {bp}}] (1-\tilde{\rho })^{-\nu } L(x) x^{-\nu } ~~~ \text {as } x \rightarrow \infty . \end{aligned}$$

Observing that the busy period coincides with the response time of a job for the LCFS-PR discipline gives the desired result in (14).
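This heuristic can be explored numerically. A minimal sketch (our own illustration, assuming Poisson arrivals so that the residual interarrival time at the start of a busy period needs no special treatment) simulates one busy period by tracking the workload:

```python
import random

def busy_period(lam, draw_job, rng):
    """Length of one busy period, started by a single arriving job.
    The workload v drains at unit rate; the busy period ends the
    first time v would hit zero before the next (Poisson) arrival."""
    v = draw_job(rng)                # job that opens the busy period
    t = 0.0
    while True:
        a = rng.expovariate(lam)     # time until the next arrival
        if a >= v:                   # workload drains first: busy period ends
            return t + v
        t += a
        v += draw_job(rng) - a       # serve a units of work, add the new job
```

With \(\mathbb {E}[B]=1\) and \(\lambda = 0.5\) (so \(\tilde{\rho } = 0.5\)), averaging many busy periods should give \(\mathbb {E}[P] = \mathbb {E}[B]/(1-\tilde{\rho }) = 2\); heavy-tailed `draw_job` choices then let one inspect the tail of P, and hence of \(R_{\text {LCFS-PR}}\).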

4.1 Cancel-on-start

Note that for the LCFS-PR discipline the c.o.s. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model is equivalent to the system where replicas of each job are assigned to \(n_{\mathrm {J}}\) servers chosen uniformly at random (without replacement), since all replicas immediately go into service. Thus, each queue is equivalent to a \(GI/G/1/LCFS \text {-} PR\) queue with a random selection of the arrivals based on Bernoulli experiments with probability \(n_{\mathrm {J}}/N\), i.e., mean interarrival time \(N \mathbb {E}[A]/n_{\mathrm {J}}\) and mean job size \(\mathbb {E}[B]\). Hence, the stability condition is \(\tilde{\rho } < \frac{N}{n_{\mathrm {J}}}\). For regularly varying job sizes, the tail behavior of the response time is given by

$$\begin{aligned} \mathbb {P}(R> x) = \mathbb {P}(\max _{i=1,\dots ,n_{\mathrm {J}}}R_{\mathrm {LCFS-PR}}> x) \sim n_{\mathrm {J}} \mathbb {P}(R_{\mathrm {LCFS-PR}} > x) ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

see for example [21].
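The asymptotic factor \(n_{\mathrm {J}}\) can be understood through an elementary property of the maximum. A toy calculation (assuming independent copies, which is an idealization; the dependence between servers is handled in [21]) compares the exact tail of the maximum with the approximation \(n_{\mathrm {J}}\,\mathbb {P}(R_{\mathrm {LCFS-PR}} > x)\):

```python
def max_tail(pbar, n):
    """Tail of the max of n independent copies when each copy exceeds
    the level with probability pbar, next to the approximation n*pbar."""
    exact = 1.0 - (1.0 - pbar) ** n
    return exact, n * pbar
```

The ratio of the two tends to 1 as `pbar` tends to 0, which is exactly the regime \(x \rightarrow \infty \) for heavy-tailed response times.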

Observe that similar reasoning applies to any service discipline in which all replicas immediately go into service. Another example is the processor-sharing (PS) discipline, for which the tail behavior of the response time for the single-server queue with regularly varying job sizes with index \(-\nu \) is given by

$$\begin{aligned} \mathbb {P}(R_{\text {PS}} > x) \sim (1-\tilde{\rho })^{-\nu } L(x) x^{-\nu } ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

see for example [32, Chapter 3] or [34].

4.2 Cancel-on-completion

In this section, we analyze the tail asymptotics for the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the LCFS-PR discipline.

We use the same upper bound system as for the FCFS discipline, see Sect. 3.2. Note that under the LCFS-PR discipline the servers do not start serving a new job until all \(n_{\mathrm {J}}\) replicas are finished or the job is pre-empted. Hence, all \(n_{\mathrm {F}}\) replicas of a job (if present) receive service simultaneously at all times. From this, it follows that for the LCFS-PR discipline the upper bound system with multiple servers is equivalent to a \(GI/G/1/LCFS \text {-} PR\) queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\). However, for the LCFS-PR discipline it is not sufficient that the upper bound system provides an upper bound in terms of the workload, since the response time under the LCFS-PR discipline does not depend on the workload upon arrival of a job. In Lemma 3, we prove that the upper bound system also provides an upper bound in terms of the residual size of each replica.

Lemma 3

At any time, the residual size of each replica (possibly zero) is larger in the upper bound system than in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model.

Proof

The proof follows by contradiction. Let \(t_{0}\) be the first time such that the stated inequality is about to be violated, and distinguish three cases depending on whether this is caused by the arrival of a job, the departure of a replica, or by some other reason.

  • In case of an arrival, the \(n_{\mathrm {F}}\) replicas of the arriving job in the upper bound system are no smaller than in the original system by definition.

  • In case of a departure in the upper bound system, which occurs because an \(n_{\mathrm {J}}\)th replica of a job has residual size zero, the remaining \(n_{\mathrm {F}} - n_{\mathrm {J}}\) replicas are abandoned. However, according to the hypothesis, this replica should also have residual size zero in the original system. Now, in the upper bound system the \(n_{\mathrm {F}}\) replicas of the most recently arrived job resume service (if present), but in the original system these replicas also resume service or are already receiving service, since they arrived last at their corresponding server.

  • In the absence of any arrival or departure, the inequality can only be violated if the replica in question receives service in the upper bound system but not in the original system, and has the same residual size in both systems at time \(t_{0}\). Recall that in the upper bound system all \(n_{\mathrm {F}}\) replicas of the same job always receive service simultaneously. Thus, replicas of jobs that arrived later than this job already fully completed service because of the LCFS-PR discipline. However, according to the hypothesis these replicas also fully completed service in the original system. Hence, it follows that the replica of interest also receives service in the original system (if present).

Thus, the statement is still true at time \(t_{0}\) in all cases.

Corollary 6

The response time R in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the LCFS-PR discipline is stochastically bounded from above by the response time \(R_{\mathrm {U}}\) in a \(GI/G/1/LCFS \text {-} PR\) queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\).

Proof

By Lemma 3, at any time the residual size of each replica (possibly zero) is larger in the upper bound system than in the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model. The departure time of a job is the moment at which the \(n_{\mathrm {J}}\)th replica of a job has residual size zero. Hence, it follows that \(R \le R_{\mathrm {U}}\).

A sufficient stability condition for general interarrival times and job sizes is \(\rho _{\mathrm {U}} = \frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{\mathbb {E}[A]} < 1\), which can be proved via the upper bound system given in Corollary 6. The exact stability condition of the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model, and also redundancy-d scheduling, with the LCFS-PR discipline in such a general setting is still unknown.

The next theorem provides insight into the tail behavior when the distribution of the \(n_{\mathrm {J}}\)th order statistic of the job size is regularly varying, i.e., \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\). Similarly to the FCFS discipline, it includes the special cases of identical replicas (\(\tilde{\nu } = \nu \)) and i.i.d. replicas (\(\tilde{\nu } = (n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \)).

Theorem 3

For the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with the LCFS-PR discipline and \(\rho _{\mathrm {U}} < 1\): if \(B_{(n_{\mathrm {J}})} \in RV(-\tilde{\nu })\), then \(R \in ORV(-\tilde{\nu })\).

Proof

Upper bound: By Corollary 6, the response time is bounded from above by the response time in a \(GI/G/1/LCFS \text {-} PR\) queue with interarrival time A and job size \(B_{(n_{\mathrm {J}})}\). Let \(R_{\mathrm {U}}\) denote the response time in this upper bound system. Since \(B_{(n_{\mathrm {J}})}\) is regularly varying, we can apply known results for the single-server queue, see (14), and obtain

$$\begin{aligned} \mathbb {P}(R_{\mathrm {U}} > x) \sim \mathbb {E}[N_{\mathrm {bp}}] (1-\rho _{\mathrm {U}})^{-\tilde{\nu }} L(x) x^{-\tilde{\nu }} ~~~ \text {as } x \rightarrow \infty , \end{aligned}$$

where \(\rho _{\mathrm {U}}=\frac{\mathbb {E}[B_{(n_{\mathrm {J}})}]}{\mathbb {E}[A]}\).

Lower bound: The upper bound shows that R cannot have a heavier tail than \(R_{\mathrm {U}}\); it also cannot have a lighter tail, since

$$\begin{aligned} \mathbb {P}(R> x) \ge \mathbb {P}(B_{(n_{\mathrm {J}})}> x) = L(x) x^{-\tilde{\nu }}, ~~~x>0. \end{aligned}$$

The proof follows by Lemma 6 in Appendix A.

Remark 3

For identical replicas, we can find an even better upper bound than the one in Theorem 3. Indeed, consider the system in which all replicas are completely served. This system is equivalent to a \(GI/G/1/LCFS \text {-} PR\) queue with a random selection of the arrivals based on Bernoulli experiments with probability \(n_{\mathrm {F}}/N\), i.e., mean interarrival time \(\frac{N \mathbb {E}[A]}{n_{\mathrm {F}}}\), and job size B, which is equal to \(B_{(n_{\mathrm {J}})}\) in the case of identical replicas.

Theorem 3 indicates that for the LCFS-PR discipline the tail of the response time is just as heavy as the tail of the job size. Comparing the tail behavior in redundancy-d scheduling under the LCFS-PR and FCFS disciplines, we can conclude that, for the c.o.s. variant, the LCFS-PR discipline has tail behavior that is better than (or as good as) that of the FCFS discipline. Loosely speaking, the tail behavior of the LCFS-PR discipline is better in scenarios with small load and a small number of replicas d, and the tail behavior of the two service disciplines is similar in all other scenarios. For the c.o.c. variant of the fork–join model, the LCFS-PR discipline has better tail behavior than the FCFS discipline for all dependency structures between the replicas.

Fig. 1
figure 1

Tail behavior for the response time in the c.o.s. variant of redundancy-d scheduling with Pareto\((\nu =1.5,x_{m}=1/3)\) job sizes, \(\mathbb {E}[B]=1\), \(N=3\), \(\tilde{\rho }=2.5\) and the FCFS discipline. The dashed line depicts the function \(y=x^{-0.5}\)

Fig. 2
figure 2

Tail behavior for the response time in the c.o.c. variant of redundancy-d scheduling with identical Pareto\((\nu =1.5,x_{m}=1/3)\) job sizes, \(\mathbb {E}[B]=1\), \(N=3\), \(\tilde{\rho }=0.5\) and the FCFS discipline. The dashed lines depict the tail behavior for the response time in the lower bound (\(\mathbb {P}(R_{\mathrm {L}} > x)\)) and in the upper bound (\(\mathbb {P}(R_{\mathrm {U}} > x)\)) given in Corollary 5. Note that the system with \(d=1\) and \(d=3=N\) is equivalent to the lower and upper bound system, respectively

5 Numerical results

In the previous sections, we determined the tail behavior of the response time for heavy-tailed job sizes. In this section, we provide simulation results for redundancy-d scheduling that illustrate this tail behavior in various scenarios. All simulation experiments are conducted with \(10^{9}\) jobs. The figures are on log–log scale, and we consider Pareto distributed job sizes with shape parameter \(\nu =1.5\), which means that \(B \in RV(-1.5)\). Note that in the simulation \(\mathbb {P}(R > x) = 0\) for sufficiently large x, which explains the steep drop in all the figures.
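For reference, the Pareto(\(\nu =1.5, x_{m}=1/3\)) parameters are chosen so that \(\mathbb {E}[B] = \nu x_{m}/(\nu -1) = 1\). A short sketch of such a sampler (our own illustration, not the authors' simulation code):

```python
import random

def pareto_sample(nu, x_m, rng):
    """Pareto(nu, x_m) via inversion: P(B > x) = (x_m / x)**nu for x >= x_m.
    Uses 1 - U so the uniform draw lies in (0, 1]."""
    return x_m * (1.0 - rng.random()) ** (-1.0 / nu)
```

With \(\nu =1.5\) the variance is infinite, so the empirical mean converges slowly; the survival probability \(\mathbb {P}(B > 1) = (1/3)^{1.5} \approx 0.192\) is a more stable sanity check.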

In Fig. 1, the tail behavior of the response time for the c.o.s. variant of redundancy is depicted; see Corollary 2 for the corresponding asymptotic bound. The lines for \(d=2\) and \(d=N=3\) in particular follow the line representing tail index \(-0.5\) quite closely. The line for \(d=1\) initially deviates, but for \(x>10\) it also runs parallel to the line representing tail index \(-0.5\).

Figure 2 shows the tail behavior of the response time in the c.o.c. variant of redundancy with identical Pareto job sizes; see Corollary 5 for the asymptotic bound. For every number of replicas, the tail index matches the value identified in Corollary 5. Interestingly, this figure shows that for \(d=2\) the asymptotic lower bound captures the exact tail behavior better than the upper bound.

Figure 3 depicts the tail behavior of the response time in the c.o.c. variant of redundancy with i.i.d. Pareto job sizes. Note that according to Corollary 5 the tail index is given by \(1-d\nu \). To obtain the same tail behavior for all numbers of replicas in Fig. 3, we scaled the shape parameter of the job size distribution with d.
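The effect of this scaling can be verified directly: for d i.i.d. Pareto(\(\nu ', x_{m}\)) replicas the survival function of the minimum factorizes,

$$\begin{aligned} \mathbb {P}(B_{\mathrm {min}} > x) = \prod _{i=1}^{d} \mathbb {P}(B_i > x) = \left( \frac{x_{m}}{x}\right) ^{d\nu '}, \quad x \ge x_{m}, \end{aligned}$$

so choosing \(\nu ' = 1.5/d\) gives \(B_{\mathrm {min}} \in RV(-1.5)\) for every d, and \(\mathbb {E}[B_{\mathrm {min}}] = \frac{d\nu ' x_{m}}{d\nu ' - 1} = 1\) for \(x_{m} = 1/3\), matching the parameters in the caption of Fig. 3.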

Fig. 3
figure 3

Tail behavior for the c.o.c. variant of redundancy-d scheduling with i.i.d. Pareto\((\nu =1.5/d,x_{m}=1/3)\) job sizes, \(\mathbb {E}[B_{\mathrm {min}}]=1\), \(N=3\), \(\tilde{\rho }=0.5\) and the FCFS discipline. The dashed lines depict the tail behavior for the response time in the lower bound (\(\mathbb {P}(R_{\mathrm {L}} > x)\)) and in the upper bound (\(\mathbb {P}(R_{\mathrm {U}} > x)\)) given in Corollary 5. Note that the system with \(d=1\) and \(d=3=N\) is equivalent to the lower and upper bound system, respectively

Fig. 4
figure 4

Tail behavior for the response time in the c.o.c. variant of redundancy-d scheduling with identical Pareto\((\nu =1.5,x_{m}=1/3)\) job sizes, \(\mathbb {E}[B]=1\), \(N=3\), \(\tilde{\rho }=0.5\) and the LCFS discipline. The dashed lines depict the tail behavior for the response time in the lower bound (\(\mathbb {P}(R_{\mathrm {L}} > x)\)) and in the upper bound (\(\mathbb {P}(R_{\mathrm {U}} > x)\)) given in Theorem 3. Note that the system with \(d=1\) and \(d=3=N\) is equivalent to the lower and upper bound system, respectively

So far, we only considered the FCFS discipline. Figure 4 shows the tail behavior of the response time for the c.o.c. variant of redundancy with the LCFS-PR discipline.

6 Conclusion and suggestions for further research

In this paper, we studied the tail behavior of the response time in redundancy-d scheduling and the fork–join model for heavy-tailed job sizes. In particular, for the c.o.s. variant of redundancy-d with the FCFS discipline and subexponential job sizes, we determined the tail behavior of the response time and showed that it depends on the load of the system. For the c.o.c. variant of the fork–join model, we observed that the tail behavior of the response time depends on the dependency structure between the replicas. For job sizes \(B \in RV(-\nu )\), our results indicate that for the c.o.s. variant of redundancy scheduling, in the scenario of sufficiently small load, having \(d= \lceil \frac{\nu }{\nu -1} \rceil \) replicas already achieves the optimal asymptotic tail behavior of the response time. For high loads, the results indicate that creating many replicas yields no benefits for the tail index of the response time. For the c.o.c. variant of the fork–join(\(n_{\mathrm {F}},n_{\mathrm {J}}\)) model with identical and i.i.d. replicas, the tail index of the response time is \(1-\nu \) and \(1-(n_{\mathrm {F}}+1-n_{\mathrm {J}})\nu \), respectively. Thus, the tail index is independent of the load of the system, and for identical replicas it is even independent of the number of replicas.
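As a small illustration of the replication rule stated above (our own helper function, not from the paper):

```python
import math

def replicas_for_optimal_tail(nu):
    """d = ceil(nu / (nu - 1)): the number of replicas that already
    achieves the optimal asymptotic tail behavior in the low-load
    c.o.s. regime. Requires nu > 1 (finite mean job size)."""
    if nu <= 1:
        raise ValueError("nu must exceed 1 for a finite mean job size")
    return math.ceil(nu / (nu - 1))
```

For \(\nu =1.5\) this gives d=3; heavier tails (\(\nu \) closer to 1) call for more replicas, lighter tails for fewer.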

Observe that all the results on the tail index for the c.o.c. variant of the fork–join model rely on the fact that the upper bound system is stable. The stability condition of this system does not necessarily coincide with the stability condition of the original fork–join model. It would also be interesting to study the tail index for values of the load for which the original fork–join model is stable but the upper bound system is unstable.

A natural topic for further research would be to extend our analysis to heterogeneous servers or even more generally to job types that can have different speeds at the various servers, see for example the model in [24].

Another extension would be to analyze the tail behavior of the response time for the ROS service discipline. As mentioned in the introduction, for the single-server queue this discipline has the same tail index as the FCFS discipline. Simulation experiments (not included in this paper) suggest that this statement extends to redundancy-d scheduling.