1 Introduction

This paper is concerned with gradient estimation for waiting times in the G/G/1 queue, where we assume that the distribution of the service times and that of the interarrival times may depend on some real-valued parameter θ. More specifically, let { S θ ( n ) : n ≥ 1 } and { A θ ( n ) : n ≥ 1 } denote the sequences of (i.i.d.) service times and (i.i.d.) interarrival times, respectively. We will assume that these sequences are independent of each other. Denoting by {W θ ( n ) :n ≥ 1} the sequence of waiting times, Lindley’s recursion can be written as:

$$\label{eq:w} \forall n\geq 1:\: W_{\theta} (n+1)=\max\big\{W_\theta (n)+D_\theta ( n+1 ) ,0\big\}, $$
(1)

where { D θ ( n ) : n ≥ 1 } is an i.i.d. sequence modeling the drift with D θ ( 1 ) = 0. For this model,

$$\label{eq:d} D_\theta ( n +1) = S_\theta ( n ) - A_\theta ( n+1 ) , \quad n \geq 1 $$
(2)

In what follows, we model a queue that starts empty, that is, W θ (1) = 0, and consequently W θ (n ) is the waiting time of the nth customer arriving to the queue.
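A minimal simulation of Lindley’s recursion in Eqs. 1–2 may help fix the setup; the M/M/1 rates below are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def lindley_waiting_times(S, A):
    """Waiting times W(1),...,W(N) of a G/G/1 queue via Lindley's recursion,
    starting empty: W(1) = 0 and W(n+1) = max(W(n) + S(n) - A(n+1), 0)."""
    N = len(A)
    W = np.zeros(N)
    for n in range(1, N):
        drift = S[n - 1] - A[n]           # D(n+1) = S(n) - A(n+1), Eq. 2
        W[n] = max(W[n - 1] + drift, 0.0)
    return W

# Illustrative M/M/1 instance: arrival rate 1.0, service rate 2.0.
A = rng.exponential(1.0, size=1000)       # interarrival times A(1..N)
S = rng.exponential(0.5, size=1000)       # service times S(1..N)
W = lindley_waiting_times(S, A)
```

Since the queue starts empty, `W[0]` is zero and all subsequent waiting times are nonnegative by construction.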

We are interested in estimating

$$\label{eq:fixed} \frac{d}{d \theta } \frac{1}{N} \mathbb{E} \left [\,\sum\limits_{n=1}^N W_\theta ( n ) \right ] $$
(3)

for fixed time horizon N . Provided that the queue is stable, we are also interested in the derivative of the steady-state waiting time, i.e.,

$$\label{eq:stst} \frac{d}{d \theta } \left( \lim\limits_{ N \rightarrow \infty } \frac{1}{N} \mathbb{E} \left [\,\sum\limits_{n=1}^N W_\theta ( n ) \right ] \right) = \frac{d}{d \theta } \left(\frac{ \mathbb{E} \left [\,\sum_{n=1}^{\tau_\theta} W_\theta ( n ) \right ] }{ \mathbb{E}[\tau_\theta]} \right) , $$
(4)

where \(\tau_\theta = \min ( n>1 \colon W_\theta(n) =0) -1\) is the number of customers in the first busy cycle of the queue. The equivalence of the formulas follows from renewal theory for regenerative systems (Ross 2002).

The gradient estimation problem posed in Eqs. 3 and 4 has been studied extensively in the literature (Pflug 1996; Fu 2006; Fu and Hu 1997; Glasserman 1994, 1991; Ho and Cao 1991; Cao 2007). There are two seemingly distinct philosophies to the implementation and coding of gradient estimators:

  • single-run computation: the sample-path derivatives are computed directly alongside the simulation of the nominal sample path.

  • parallel systems computation: the derivative information is obtained by introducing a finite perturbation and analyzing the effect of this perturbation on the system performance.

In the following we discuss both philosophies in detail.

Single-run estimators  Much effort has been dedicated to the problem of estimating the derivatives in Eqs. 3 and 4 using the so-called single-run estimators. Although no rigorous definition exists, a single-run estimator is one that is built from the observed trajectory of the system. The name on-line estimator is also used for this concept, and it denotes an implementation of the estimation procedure that runs along the trajectory, as the system is observed or simulated. The advantage of such constructions is practical: if the estimation is implemented in a real time system for control or supervisory purposes, then it is not desirable to wait until the end of the horizon N (or the end of a cycle) to evaluate the gradient. Instead, a single-run estimator provides a formula that can be coded in terms of partial computations that use only the past history of the observations of the queue occupancy and/or waiting times. For any n ≤ N it is then possible to retrieve the current “best” gradient estimate. Mathematically, one seeks a derivative process {d θ (n) ; n > 1} that is adapted to the natural filtration of the process, for example, the one generated by {W θ (n); n ≥ 1}.

The canonical and best-known examples of such estimation methods are Infinitesimal Perturbation Analysis (IPA) and the Score Function (SF) method, where the derivative estimate can be built trajectory by trajectory. For IPA, the derivative estimator is the stochastic derivative of the cost function and uses the derivatives of W θ (n,ω) for each fixed ω. For the SF method, one uses the score function statistic \( \frac{d}{d\theta} \ln f_\theta (a,s) \) of the joint density of inter-arrival and service times, evaluated at the given observations along the trajectory.

Parallel systems  Historically, the development of gradient estimation for queues is closely linked with the theory of perturbation analysis, see Glasserman (1991, 1994) and Ho and Cao (1991). The “nominal” system is the queueing system that evolves according to Eq. 1, without perturbations. Finite perturbations give rise to parallel systems in which one of the customers has a different service or inter-arrival time, while the rest of the inter-arrival and service times remain unchanged. Commonly, the parallel systems that yield information about the gradient can be interpreted as “what-if” scenarios. The names “phantom” and “marked” customers were introduced in Suri and Cao (1983) to denote parallel systems where the given customer is either added to the nominal system or removed from it. In Baccelli and Brémaud (1993) the name “virtual” customer is used for a “marked” customer. The terminology “phantom estimators” quickly took hold, notably in association with Rare Perturbation Analysis (RPA). Gong, Ho and Fu introduced and developed the Smoothed Perturbation Analysis (SPA) methodology. When dealing with certain jumps or discontinuities due to threshold-like control variables, SPA formulas also require computation on parallel systems, which is performed via “off-line” computation, see Fu and Hu (1997). The methodology of Measure-Valued Derivatives (MVD) also produces estimators that require parallel systems (called the “plus” and the “minus” systems) that evolve in parallel with the nominal system. Such estimators were introduced in Brémaud and Vázquez-Abad (1992) and Pflug (1996). Gradient estimation for countable-state systems has been studied in Cao (1994) and Cao (2007) using the so-called perturbation realization factors, which again require computation of parallel systems representing “what-if” scenarios where a particular attribute of the nominal system is perturbed. 
In this paper, we will generically use the name “phantom” system for a parallel system used to obtain information on the derivative.

There is a trade-off between these two families of estimators. As proved theoretically in Heidergott et al. (2008) for the special case of the normal distribution, phantom estimators may have a lower variance than single-run estimators, including IPA. However, single-run estimators are computationally more efficient than phantom estimators. This raises the question of whether it is possible to combine the virtues of both types of estimators.

The distinction between these types of estimators is less strict than it seems: studying the effect of a finite perturbation does not necessarily lead to an off-line parallel computation. To see this, note that often parallel “what-if” scenarios cannot be constructed using only information of a single trajectory; mathematically, the natural filtration of the nominal process cannot provide enough information to build the derivative, and it must be augmented. However, we may seek a minimal augmentation of the natural filtration of the process to build a version of the parallel processes. This version of the derivative process should be as close as possible to a single-run adapted process. We will show in this paper how the implementation of such parallel systems can be computed efficiently for the G/G/1 queueing model. To summarize, we seek an implementation that will (i) run as the system is observed, and (ii) will need a minimal amount of added information in order to build a derivative process.

For discrete-state processes, a favorable approach is to construct the perturbed path (read “phantom”) from the nominal path by a “cut-and-paste” approach. An early reference on this is Ho and Li (1988); for a thorough treatment we refer to the seminal work Cao (2007). Another approach of reading the phantom information from the nominal sample path is by applying an importance sampling approach, see Heidergott and Hordijk (2004) for the general state-space and Li et al. (2008) for an efficient algorithm for constructing a good dominating measure for discrete-state space models. For continuous state-space processes, much research has been done on constructing coupling schemes in terms of merging phantom and nominal systems. Mostly, this approach exploits regeneration of derivative processes. The relation between coupling schemes and sample-path perturbation analysis has been explained in Brémaud (1993). A detailed analysis of coupling techniques useful in gradient estimation can be found in Dai (2000). For Markov processes with general state-space, we refer to Vázquez-Abad (1999) and Heidergott and Vázquez-Abad (2006) for a discussion on couplings of phantoms.

In this paper, we will present single-run versions of the phantom estimator for waiting times in the G/G/1 queue. The key idea is to find clever implementation schemes so that the standard phantom estimator can be implemented in a single-run version. In particular, we will present two ways of obtaining a single-run phantom estimator: (i) via change of measure and (ii) through elaborating on the perturbation propagation rules as known from IPA. The second estimator will combine the best of both worlds: low variance due to the fact that single finite perturbations are analyzed and numerical efficiency as the “phantoms” can be computed alongside the nominal waiting times.

This paper is an empirical study rather than a theoretical work. We hope that the research presented in this paper will stimulate the synthesis of knowledge accumulated in gradient estimation in order to come up with highly efficient gradient estimators.

The paper is organized as follows. IPA and MVD are introduced in Section 2. Section 3 establishes the single-run version of the MVD phantom estimator. The numerical efficiency of the new estimator for the finite horizon problem is illustrated in Section 4 and the steady-state estimation problem is discussed in Section 5.

2 Background on gradient estimation for waiting times

2.1 Infinitesimal perturbation analysis

IPA is based on sample-path derivatives. The key assumptions for achieving unbiasedness of IPA are the existence of the sample-path derivatives together with Lipschitz continuity of the random variable, where the Lipschitz constant is assumed to be integrable. For general background on IPA we refer to Glasserman (1991) and Ho and Cao (1991). The main technical conditions for unbiasedness of IPA for waiting times in the single server queue are the following.

(I1):

For all k it holds that D θ ( k ) is a.s. differentiable with respect to θ.

(I2):

There exists an integrable random variable K such that for Δ sufficiently small it holds that

$$ | D_{\theta + \Delta } ( k ) - D_\theta ( k ) | \leq | \Delta | K \qquad \text{a.s.} $$

for all k.

Under conditions (I1) and (I2) it holds that the waiting times are a.s. differentiable with

$$ \label{eq:dw} \frac{d}{ d \theta } W_\theta ( n +1) = \begin{cases} \dfrac{d}{ d \theta } W_\theta ( n ) + \dfrac{d}{d\theta }D_\theta ( n +1) & W_\theta ( n+1 ) > 0 \\[8pt] 0 & \text{else}. \end{cases} $$
(5)

Moreover, it holds for the finite horizon problem in Eq. 3 that

$$\label{eq:fixed2} \frac{d}{d \theta } \frac{1}{N} \mathbb{E} \left [\,\sum\limits_{n=1}^N W_\theta ( n ) \right ] = \frac{1}{N}\mathbb{E} \left [\,\sum\limits_{n=1}^N \frac{d}{d\theta }W_\theta ( n ) \right ] $$
(6)

for fixed time horizon N , see Glasserman (1994) for details.

Example 1

Let θ be the service rate and suppose that the service distribution is such that \(S_{\theta}( n) {\,\buildrel \cal L\over = }\, S_1( n) /{\theta}\) (equality is in distribution). Via Skorokhod representation, there is a probability space for which S θ ( n) = S 1( n) /θ with probability one. Then

$$ \frac{d}{d \theta } D_\theta ( n +1) = \frac{d}{d \theta } S_\theta ( n ) = \frac{d}{d \theta } \frac{1}{\theta} S_1 ( n ) = - \frac{1}{\theta } S_\theta ( n ) , $$

for n ≥ 1 .

Suppose now that θ is a scale parameter of the service time distribution with \(\mathbb{E}[S_{\theta}( n) ]={\theta}\), so that \(S_{\theta}(n) {\,\buildrel \cal L\over = }\, {\theta} S_1(n)\) with \(\mathbb{E}[S_1(n)]=1\); then \( \frac{d}{d \theta } S_\theta ( n ) = \frac{1}{\theta } S_\theta ( n ) \). Important cases of this result are the models with exponential, Erlang, uniform and Pareto service time distributions, among others (see Section 4).
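Combining the IPA recursion in Eq. 5 with the scaling of Example 1 (θ the service rate, so \( \frac{d}{d\theta} D_\theta(n+1) = - S_\theta(n)/\theta \)) yields a single-pass estimator of the finite-horizon derivative in Eq. 6. The sketch below assumes an M/M/1 queue with illustrative rates.

```python
import numpy as np

rng = np.random.default_rng(1)

def ipa_estimate(theta, lam, N):
    """Single-run IPA estimate of (d/dtheta)(1/N) E[sum_n W_theta(n)] for an
    M/M/1 queue with service rate theta, using Eq. 5 and dS/dtheta = -S/theta."""
    A = rng.exponential(1.0 / lam, size=N)    # interarrival times
    S = rng.exponential(1.0 / theta, size=N)  # service times
    W, dW, total = 0.0, 0.0, 0.0              # W_theta(1) = 0, derivative 0
    for n in range(1, N):
        W_next = W + S[n - 1] - A[n]          # candidate W_theta(n+1)
        if W_next > 0.0:                      # busy: perturbation accumulates (Eq. 5)
            W = W_next
            dW += -S[n - 1] / theta           # d D_theta(n+1)/dtheta = -S(n)/theta
        else:                                 # idle period: derivative resets to 0
            W, dW = 0.0, 0.0
        total += dW
    return total / N

est = np.mean([ipa_estimate(theta=2.0, lam=1.0, N=1000) for _ in range(100)])
```

For θ = 2 and λ = 1 the estimate is negative, in line with the steady-state value \( \frac{d}{d\theta}\, \lambda/(\theta(\theta-\lambda)) = -3/4 \).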

Remark 1

It is worth noting that the above representation of \(\frac{d}{d \theta } S_\theta (n)\) implies that the IPA estimator is independent of the actual distribution provided that θ is a scaling parameter. This robustness property of IPA with respect to the underlying distribution holds also in the case that θ is a location parameter; see Cassandras et al. (1991) for details.

2.2 The phantom estimator based on measure-valued differentiation

Let X θ be an integrable real-valued random variable with distribution μ θ . Denote by \( {\cal C}_1 \) the set of continuous mappings g from S to ℝ such that | g ( s ) | ≤ c s for all s ∈ S and some constant c. We call X θ first-moment weakly differentiable if for any mapping \( g \in {\cal C}_1 \) there exist probability measures \( \mu_\theta^+ \) and \( \mu_\theta^- \) and a constant c θ such that

$$\label{eq:l} \frac{d}{d \theta } \mathbb{E} \big[ g ( X_\theta ) \big] = c_\theta \left( \mathbb{E} \left[ g \left( X_\theta^+ \right) \right] - \mathbb{E} \left[ g \left( X_\theta^- \right) \right] \right) , $$
(7)

where \( X^\pm_\theta \) is distributed according to \( \mu_\theta^\pm \). We call \( \big( c_\theta , \mu_\theta^+ , \mu_\theta^- \big) \) a first-moment weak derivative of μ θ . In terms of random variables, we call \( \big( c_\theta , X_\theta^+ , X_\theta^- \big) \) a first-moment weak derivative of X θ , with \( X^\pm_\theta \) defined as above, i.e., \( X_\theta^\pm\) have distribution \( \mu_\theta^\pm \). The extension to pth-moment weak differentiability follows in the same vein by assuming that g in the above equation satisfies \( | g ( s ) | \leq c\, s^p \) for all s ∈ S . In case the left-hand side in Eq. 7 is zero for all \( g \in {\cal C}_1\), we say that the first-moment weak derivative of μ θ , respectively X θ , is not significant and we take the weak derivative as ( c θ , μ θ , μ θ ) , respectively ( c θ , X θ , X θ ) . Examples of first-moment weak derivatives are provided in Section 4. An instance of a weak derivative is always available via the so-called Hahn-Jordan decomposition of the derivative measure. However, weak derivatives are not unique, and for distributions of practical importance more convenient representations are provided in the literature. For a concise treatment of weak differentiation we refer to Heidergott and Leahu (unpublished manuscript).
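As a quick numerical illustration of Eq. 7 (the concrete triple below is one standard weak derivative of the exponential distribution, used here as an example): for \( X_\theta \sim \text{Exponential}(\theta) \) one may take \( c_\theta = 1/\theta \), \( X_\theta^+ \sim \text{Exponential}(\theta) \) and \( X_\theta^- \sim \text{Erlang}(2,\theta) \), which can be checked by Monte Carlo for g(x) = x.

```python
import numpy as np

rng = np.random.default_rng(2)

theta, n = 2.0, 10**6
# Weak derivative of Exponential(theta): c = 1/theta,
# X+ ~ Exponential(theta) (the nominal law), X- ~ Erlang(2, theta).
X_plus = rng.exponential(1.0 / theta, size=n)
X_minus = rng.exponential(1.0 / theta, size=n) + rng.exponential(1.0 / theta, size=n)

# For g(x) = x:  (d/dtheta) E[X_theta] = d(1/theta)/dtheta = -1/theta**2.
lhs = -1.0 / theta**2
rhs = (1.0 / theta) * (X_plus.mean() - X_minus.mean())
print(lhs, rhs)   # both close to -0.25 for theta = 2
```

The analytic check is exact here: \( c_\theta (\mathbb{E}[X_\theta^+] - \mathbb{E}[X_\theta^-]) = \frac{1}{\theta}(\frac{1}{\theta} - \frac{2}{\theta}) = -\frac{1}{\theta^2} \).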

Denote the distribution of S θ ( n ) by \( \mu^S_\theta \) and the distribution of A θ ( n ) by \( \mu^A_\theta \). Moreover, denote the distribution of D θ ( n + 1) = S θ ( n ) − A θ ( n + 1 ), n ≥ 1 , by \( \mu^D_\theta \). The main technical conditions for unbiasedness are the following.

(M1):

The distribution of the service times \( \mu^S_\theta \) is first-moment weakly differentiable with weak derivative \( \big( c^S_\theta , \mu_\theta^{S +} , \mu_\theta^{S - }\big) \). Equivalently, \( S_\theta ( n ) \) is first-moment weakly differentiable with weak derivative \( \left( c^S_\theta , S_\theta^{ +} (n ) , S_\theta^{ - } (n)\right) \), for n ∈ ℕ.

(M2):

The distribution of interarrival times \( \mu^A_\theta \) is first-moment weakly differentiable with weak derivative \( \big( c_\theta^A , \mu_\theta^{A +} , \mu_\theta^{A -} \big) \). Equivalently, A θ ( n ) is first-moment weakly differentiable with weak derivative \( \big( c^A_\theta , A_\theta^{ +} (n ) , A_\theta^{ - } (n)\big) \), for n ∈ ℕ.

As is shown in the following lemma, if S θ ( n ) and A θ ( n ) are first-moment weakly differentiable, so is the drift D θ ( n ).

Lemma 1

Let (M1) and (M2) hold. Then \(\mu_\theta^D \) is first-moment weakly differentiable with weak derivative \( \big( c^D_\theta , \mu_\theta^{D + } , \mu_\theta^{ D - } \big) \) , where \( c_\theta^D = c^S_\theta + c^A_\theta \) ,

$$ \mu_\theta^{D+} = \frac{c^S_\theta }{c^S_\theta + c^A_\theta} \mu_\theta^{S+} \mu_\theta^{A} + \frac{c^A_\theta }{c^S_\theta + c^A_\theta} \mu_\theta^{S} \mu_\theta^{A+} $$

and

$$ \mu_\theta^{D-} = \frac{c^S_\theta }{c^S_\theta + c^A_\theta} \mu_\theta^{S-} \mu_\theta^{A} + \frac{c^A_\theta }{c^S_\theta + c^A_\theta} \mu_\theta^{S} \mu_\theta^{A-} . $$

Proof

For \(g \in {\cal C}_1\), applying the product rule of weak differentiation (see Heidergott and Leahu, unpublished manuscript) yields

$$ \begin{array}{lll} \dfrac{d}{d \theta} \int g ( u ) \mu^D_\theta ( d u ) &=& \frac{d}{d \theta} \int \!\!\! \int g(s-a)\mu_\theta^{S}(ds) \mu_\theta^{A} (da) \\ &=& \int \!\!\! \int g(s-a)\left(\mu_\theta^{S}\mu_\theta^{A} \right)' (ds , da)\\ &=& \int \!\!\! \int g(s-a)\left(\mu_\theta^{S'}\mu_\theta^{A} + \mu_\theta^{S}\mu_\theta^{A'} \right)(ds , da). \end{array} $$

Inserting the weak derivatives for \( \mu_\theta^S \) and \( \mu_\theta^A \) yields

$$ \begin{array}{lll} &&\dfrac{d}{d \theta} \int g ( u ) \mu^D_\theta ( d u ) \\ &&{\kern1pc} = \int \!\!\! \int g(s-a)\left( c^S_\theta \left(\mu_\theta^{S+} - \mu_\theta^{S-}\right)\mu_\theta^{A} + c^A_\theta \mu_\theta^{S} \left(\mu_\theta^{A+} - \mu_\theta^{A-}\right) \right) (ds , da)\\ &&{\kern1pc} = \left(c^S_\theta + c^A_\theta\right)\left[\int \!\!\! \int g(s-a) \left(\frac{c^S_\theta }{c^S_\theta + c^A_\theta}\mu_\theta^{S+} \mu_\theta^{A} + \frac{c^A_\theta }{c^S_\theta + c^A_\theta}\mu_\theta^{S} \mu_\theta^{A+} \right)(ds , da) \right.\\ && \qquad \qquad\qquad{\kern15pt} - \left. \int \!\!\! \int g(s-a) \left(\frac{c^S_\theta }{c^S_\theta + c^A_\theta}\mu_\theta^{S-} \mu_\theta^{A} + \frac{c^A_\theta }{c^S_\theta + c^A_\theta}\mu_\theta^{S} \mu_\theta^{A-} \right)(ds , da)\right] , \end{array} $$

which proves the claim. □

Remark 2

The result put forward in Lemma 1 can be phrased in terms of random variables as follows. Let \( S_\theta^\pm ( n ) \) be distributed according to \( \mu_\theta^{S \pm }\) and let \( A_\theta^\pm ( n ) \) be distributed according to \( \mu_\theta^{A \pm }\). Then a sample of \( \mu_\theta^{ D + }\), denoted by \( D_\theta^+ ( n ) \), is obtained from

$$\begin{array}{lll} D_\theta^+(n) &=& \left\{ \begin{array}{ll} S_\theta^+(n-1) - A_\theta(n) & \textrm{with probability } \dfrac{c^S_\theta }{c^S_\theta + c^A_\theta} , \\ S_\theta(n-1) - A^+_\theta(n) & \textrm{with probability } \dfrac{c^A_\theta }{c^S_\theta + c^A_\theta} , \end{array} \right. \end{array}$$

and a sample of \( \mu_\theta^{ D - }\), denoted by \( D_\theta^- ( n ) \), is obtained from

$$\begin{array}{lll} D_\theta^-(n) &=& \left\{ \begin{array}{ll} S_\theta^-(n-1) - A_\theta(n) & \textrm{with probability } \dfrac{c^S_\theta }{c^S_\theta + c^A_\theta} , \\[13pt] S_\theta(n-1) - A^-_\theta(n) & \textrm{with probability } \dfrac{c^A_\theta }{c^S_\theta + c^A_\theta} . \end{array} \right. \end{array}$$

Then under the conditions put forward in Lemma 1 it holds for \( g \in {\cal C}_1\) that

$$ \frac{d}{d \theta } \mathbb{E} \big[ g ( D_\theta ( n )) \big] = c^D_\theta \left ( \mathbb{E} \left[ g \big( D_\theta^+ ( n )\big) \right] - \mathbb{E} \big[ g \big( D_\theta^- ( n )\big) \big ] \right ) . $$
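The sampling scheme of Remark 2 can be sketched as follows. The exponential/Erlang samplers below are illustrative stand-ins for concrete weak derivatives of the service-time and interarrival-time laws, and sharing the mixture coin and the nominal draw between the plus and minus samples is a coupling choice for variance reduction, not something prescribed by the remark.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_drift_pair(c_S, sample_S, sample_S_plus, sample_S_minus,
                      c_A, sample_A, sample_A_plus, sample_A_minus):
    """One sample of (D+, D-) following the mixtures of Remark 2: with
    probability c_S/(c_S + c_A) perturb the service time, otherwise the
    interarrival time. The same coin is used for D+ and D- (a coupling choice)."""
    if rng.random() < c_S / (c_S + c_A):
        a = sample_A()                   # common nominal interarrival time
        return sample_S_plus() - a, sample_S_minus() - a
    s = sample_S()                       # common nominal service time
    return s - sample_A_plus(), s - sample_A_minus()

# Illustrative samplers: exponential service (rate theta) and interarrival
# (rate lam) times, each with an exponential/Erlang weak derivative.
theta, lam = 2.0, 1.0
d_plus, d_minus = sample_drift_pair(
    1.0 / theta,
    lambda: rng.exponential(1.0 / theta),               # S  (nominal)
    lambda: rng.exponential(1.0 / theta),               # S+ (nominal law)
    lambda: sum(rng.exponential(1.0 / theta, size=2)),  # S-: Erlang(2, theta)
    1.0 / lam,
    lambda: rng.exponential(1.0 / lam),                 # A  (nominal)
    lambda: rng.exponential(1.0 / lam),                 # A+ (nominal law)
    lambda: sum(rng.exponential(1.0 / lam, size=2)),    # A-: Erlang(2, lam)
)
```

If only the service times depend on θ, one has \( c^A_\theta = 0 \) and the service branch is always taken.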

In the next lemma we show that first-moment weak differentiability of the drift implies that of the waiting times.

Lemma 2

If D θ (n) is first-moment weakly differentiable with weak derivative \( \big( c_\theta^D , D_\theta^+ (n) , D_\theta^- (n) \big) \), then W θ (n) is also first-moment weakly differentiable. In particular, letting \( c_\theta^W = c_\theta^D \) and, provided that W θ ( n − 1 ) = w, it holds that

$$ W_\theta^+ ( n) = \max \big( D^+_\theta ( n ) + w , 0 \big) \quad \text{ and } \quad W_\theta^- ( n) = \max \big( D^-_\theta ( n ) + w , 0 \big) , $$

for w ≥ 0, it holds for any g such that | g ( s ) | ≤ c |s| for all s and some constant c, that

$$\begin{array}{lll} &&\frac{d}{d \theta } \mathbb{E} \left[ g ( W_\theta ( n )) \,|\, W_\theta ( n-1 ) = w \right] \\ &&{\kern1pc} = c_\theta^W \Big( \mathbb{E} \left[ g \left( \max\left( D_\theta^+ ( n )+w,0\right)\right) \right] - \mathbb{E} \left[ g \left( \max\left( D_\theta^- ( n )+w,0\right)\right) \right] \Big). \end{array}$$

Proof

Note that \( g\in \mathcal{C}_1\) implies \(g_w(d) = g(\max(w+d, 0))\in \mathcal{C}_1\). Applying Lemma 1 to g w thus proves the claim. □

The nth waiting time depends on θ through the first n drift variables. First-moment weak differentiability of W θ ( n ) follows from that of D θ ( n ) by the product rule of weak differentiation, see Heidergott and Leahu (unpublished manuscript) and Heidergott and Vázquez-Abad (2006). In the following, we describe in detail how instances of the weak derivatives of W θ ( n ) can be obtained from weak derivatives of the drift variables. For k ≥ 1 we define auxiliary sequences \( \big\{ W_\theta^+ ( n , k ): n \geq 1 \big\} \) and \( \big\{ W_\theta^- ( n , k ): n \geq 1 \big\} \) as follows. For n < k , set

$$ W_\theta^+ ( n , k ) = W_\theta^- ( n , k ) = W_\theta( n ) . $$

For k = n , replace the nominal drift D θ ( n ) by its weak derivative. More specifically, set

$$ W_{\theta}^+ ( k , k )=\max\big\{W_\theta (k-1)+D_\theta^+ ( k ) ,0\big\} $$

and

$$ W_{\theta}^- ( k , k )=\max\big\{W_\theta (k-1)+D_\theta^- ( k ) ,0\big\} . $$

For n > k , we set

$$ W_{\theta}^+ ( n+1 , k )=\max\big\{W_\theta^+ (n,k)+D_\theta ( n +1) ,0\big\} $$

and

$$ W_{\theta}^- ( n+1 , k )=\max\big\{W_\theta^- (n,k)+D_\theta ( n +1 ) ,0\big\} . $$

For any \( g \in {\cal C}_1 \) it then holds that

$$ \label{eq:ph} \frac{d}{d \theta } \mathbb{E} \left [ \frac{1}{N}\sum\limits_{n=1}^N g \left( W_\theta ( n )\right) \right ] = \mathbb{E} \left [\frac{c_\theta}{N}\sum\limits_{k=1}^N \sum\limits_{n=k}^N \left( g\big( W_\theta^+ ( n , k)\big) - g \big( W_\theta^-( n , k) \big) \right)\right ]. $$
(8)

The processes \( \{W_\theta^\pm( n , k): n\geq 1\}\), k ≥ 1 , are called phantom processes in the literature. The derivative estimator in Eq. 8 is called the phantom estimator (PhE).
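A direct implementation of Eq. 8 may clarify its cost. The sketch below uses the M/M/1 case with the exponential/Erlang weak derivative of the service-time law as an illustrative choice; the nested loops over k and n make this parallel-systems implementation O(N²) in the worst case, which is precisely what the single-run versions of Section 3 avoid.

```python
import numpy as np

rng = np.random.default_rng(4)

def phantom_estimate(theta, lam, N):
    """Phantom estimator (Eq. 8) of (d/dtheta)(1/N) E[sum_n W_theta(n)] for an
    M/M/1 queue with service rate theta. Weak derivative of the service-time
    law: c = 1/theta, S+ ~ Exponential(theta), S- ~ Erlang(2, theta)."""
    A = rng.exponential(1.0 / lam, size=N)     # interarrival times
    S = rng.exponential(1.0 / theta, size=N)   # service times

    W = np.zeros(N)                            # nominal path, W(1) = 0
    for n in range(1, N):
        W[n] = max(W[n - 1] + S[n - 1] - A[n], 0.0)

    total = 0.0
    for k in range(1, N):                      # replace the service of customer k
        s_plus = rng.exponential(1.0 / theta)
        s_minus = rng.exponential(1.0 / theta) + rng.exponential(1.0 / theta)
        Wp = max(W[k - 1] + s_plus - A[k], 0.0)   # "plus" phantom
        Wm = max(W[k - 1] + s_minus - A[k], 0.0)  # "minus" phantom
        total += Wp - Wm
        for n in range(k + 1, N):              # phantoms reuse the nominal drifts
            Wp = max(Wp + S[n - 1] - A[n], 0.0)
            Wm = max(Wm + S[n - 1] - A[n], 0.0)
            total += Wp - Wm
            if Wp == Wm:                       # phantoms coupled: all later
                break                          # differences are zero
    return total / (theta * N)                 # c_theta = 1/theta

est = np.mean([phantom_estimate(theta=2.0, lam=1.0, N=500) for _ in range(50)])
```

Since both phantoms follow the same recursion after customer k, they coincide forever once they are equal (e.g., both idle), which justifies the early break.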

3 The single-run phantom estimators

The phantom estimator in Eq. 8 requires, in principle, simulating the processes \(\{W_\theta^\pm( n , k): n\geq 1\}\) in order to obtain the derivative information. Fortunately, for the waiting times, the standard phantom estimator can be implemented in a single-run version. In the following we will present two ways of obtaining a single-run phantom estimator: (i) via a change of measure and (ii) through elaborating on the perturbation propagation rules known from IPA.

3.1 Change of measure

For fixed N , we can write

$$ g_N \big( D_\theta ( 1 ) , \ldots , D_\theta ( N ) \big) = \frac{1}{N} \sum\limits_{n=1}^N g \big( W_\theta ( n )\big) . $$

With this notation, the phantom estimator in Eq. 8 reads

$$ \label{eq:ph2}\begin{array}{lll} & & \frac{d}{d \theta } \mathbb{E} \left [ g_N \big( D_\theta ( 1 ) , \ldots , D_\theta ( N ) \big) \right ] \\[3pt] &&{\kern1pc} = {c_\theta} \sum\limits_{k=1}^N \Big ( \mathbb{E} \big [ g_N \big( D_\theta ( 1 ) , \ldots , D_\theta ( k-1 ) , D_\theta^+ ( k ) ,D_\theta ( k\!+\!1 ) , \ldots , D_\theta ( N ) \big) \big ] \\[3pt] &&{\kern55pt} -\, \mathbb{E} \big [ g_N \big( D_\theta ( 1 ) , \ldots , D_\theta ( k \!-\! 1 ) , D_\theta^- ( k ) ,D_\theta ( k \!+\! 1 ) , \ldots , D_\theta ( N ) \big) \big ] \Big ) .\\ \end{array}$$
(9)

Provided that \( \mu_\theta^{ D \pm }\) are absolutely continuous with respect to μ θ , applying a simple change of measure to the right hand side of Eq. 9 yields

$$\label{eq:ph3}\begin{array}{lll} && \frac{d}{d \theta } \mathbb{E} \left [ g_N \big( D_\theta ( 1 ) , \ldots , D_\theta ( N ) \big) \right ] \\[3pt] &&{\kern1pc} = {c_\theta} \sum\limits_{k=1}^N \left ( \mathbb{E} \left [ g_N \big( D_\theta ( 1 ) , \ldots , D_\theta ( N ) \big) \frac{d \mu_\theta^{ D + } }{ d \mu_\theta } ( D_\theta ( k ) ) \right . \right . \\[3pt] && \qquad \qquad {\kern30pt} \left . \left . -\, g_N \big( D_\theta ( 1 ) , \ldots , D_\theta ( N ) \big) \frac{d \mu_\theta^{ D - } }{ d \mu_\theta } ( D_\theta ( k ) ) \right ] \right ). \end{array} $$
(10)

The above representation simplifies provided that \( \mu_\theta^D \) and \( \mu_\theta^{ D \pm }\) have densities. For example, assume that the drift depends on θ only through the service times, and denote the density of the service time distribution by \( f_\theta^S\). For fixed N , let

$$ g^N \big( S_\theta ( 1 ) , \ldots , S_\theta ( N ) \big) = \frac{1}{N} \sum\limits_{n=1}^N g ( W_\theta ( n )) . $$

Let \( \mu_\theta^S \) be first-moment weakly differentiable with weak derivative \(\big( c_\theta^S , \mu_\theta^{S +}, \mu_\theta^{S -}\big)\) and denote the densities of \( \mu_\theta^{S+} \) and \( \mu_\theta^{S-} \) by \( f_\theta^{S + } \) and \( f_\theta^{S - } \), respectively. Then, the phantom estimator in Eq. 8 reads

$$\begin{array}{lll} \frac{d }{d \theta} \mathbb{E}\left[\frac{1}{N}\sum\limits_{i=1}^N W_\theta(i{\kern.6pt})\right] &=& c_\theta^S \mathbb{E}\left[ g^N\big(S_\theta (1), \ldots, S_\theta (N{\kern.6pt}) \big) \sum\limits_{i=1}^N \frac{f^{S +}_\theta \big(S_\theta (i{\kern.6pt}) \big) - f^{S -}_\theta \big(S_\theta (i{\kern.6pt}) \big)}{f_\theta^S \big(S_\theta (i{\kern.6pt}) \big)} \right]\\ &=& \frac{c_\theta^S}{N} \mathbb{E}\left[ \left(\sum\limits_{i=1}^N W_\theta(i{\kern.6pt})\right) \cdot \left( \sum\limits_{i=1}^N \frac{f^{S+}_\theta \big(S_\theta (i{\kern.6pt})\big) - f^{S -}_\theta \big(S_\theta (i{\kern.6pt}) \big)}{f_\theta^S (S_\theta (i{\kern.6pt}))} \right) \right]. \end{array}$$

Example 2

Consider the M/M/1 queue where θ denotes the service rate. A weak derivative of \( \mu_\theta^S \) is obtained from:

$$ c_\theta^S =\frac{1}{ \theta}, \quad\mu^{S + }_\theta = \text{Exponential}( \theta), \quad \mu^{S -}_\theta = \text{Erlang}(2, \theta), $$

where the densities are given by \( \theta e^{- \theta x} \) and \( \theta^2 x e^{- \theta x} \), respectively, see Heidergott et al. (2009). The estimator then reads

$$\begin{array}{ll} \frac{d }{d \theta} \mathbb{E}\left[\frac{1}{N}\sum\limits_{i=1}^N W_\theta(i{\kern.6pt})\right] &= \frac{1}{\theta N}\mathbb{E}\left[\left(\sum\limits_{i=1}^N W_\theta(i{\kern.6pt})\right) \cdot \left( \sum\limits_{i=1}^N \frac{f_\theta \big(S_\theta (i{\kern.6pt})\big) -f^{S-}_\theta \big(S_\theta (i{\kern.6pt})\big)}{ f_\theta \big(S_\theta (i{\kern.6pt}) \big)} \right)\right] \\ &=\frac{1}{\theta N}\mathbb{E}\left[\left(\sum\limits_{i=1}^N W_\theta(i{\kern.6pt})\right) \cdot \left( N - \sum\limits_{i=1}^N \frac{ \theta^2 S_\theta (i{\kern.6pt}) e^{- \theta S_\theta (i{\kern.6pt}) }}{\theta e^{- \theta S_\theta (i{\kern.6pt}) } } \right)\right] \\ & =\frac{1}{\theta N}\mathbb{E}\left[\left(\sum\limits_{i=1}^N W_\theta(i{\kern.6pt})\right) \cdot \left( N - \theta \sum\limits_{i=1}^N S_\theta (i{\kern.6pt}) \right)\right]. \label{eq:sf} \end{array} $$
(11)

Note that the resulting estimator in Eq. 11 is also known as the score function estimator (SF), see, for example, Rubinstein and Shapiro (1993). In order to reduce the variance of the estimator, we rewrite the summation as follows:

$$\begin{array}{lll} &&\frac{d }{d \theta} \mathbb{E}\left[\frac{1}{N}\sum\limits_{i=1}^N W_\theta(i{\kern.6pt})\right] \\ &&{\kern1pc} = \frac{1}{\theta N}\mathbb{E}\left[\sum\limits_{i=1}^N W_\theta(i{\kern.6pt}) \cdot \left( i - \theta \sum\limits_{j=1}^i S_\theta (j\,) + (N-i{\kern1pt}) - \theta \sum\limits_{j=i+1}^N S_\theta (j\,) \right)\right] \\ &&{\kern1pc} =\frac{1}{\theta N}\mathbb{E}\left[\sum\limits_{i=1}^N W_\theta(i{\kern.6pt}) \cdot \left( i - \theta \sum\limits_{j=1}^i S_\theta (j\,) \right)\right]\\ &&{\kern2pc}+\, \frac{1}{\theta N}\mathbb{E} \left[\sum\limits_{i=1}^N W_\theta(i{\kern.6pt})\cdot \left( (N-i{\kern.6pt}) - \theta \sum\limits_{j=i+1}^N S_\theta (j\,)\right)\right] . \end{array}$$

By

$$ \mathbb{E} \left[ \theta \sum\limits_{j=i+1}^N S_\theta (j\,)\right] = N-i $$

together with the independence of S θ ( j ) of W θ ( i ) for j > i , it follows that

$$\label{eq:sfr} \frac{d }{d \theta} \mathbb{E}\left[\frac{1}{N}\sum\limits_{i=1}^N W_\theta(i{\kern.6pt})\right] = \frac{1}{\theta N}\mathbb{E}\left[\sum\limits_{i=1}^N W_\theta(i{\kern.6pt}) \cdot \left( i - \theta \sum\limits_{j=1}^i S_\theta (j\,) \right)\right]. $$
(12)

We call the estimator in Eq. 12 the score function estimator with variance reduction (SFR). The simulation overhead introduced by the score function estimator is very small: we only need to maintain the sum of all service times in the nominal sample path. We have carried out experiments with λ = 1.0, N = 1000, and 8000 replications. The traffic intensity is given by ρ = λ/ θ = 1 / θ. The variance of the score function estimators is quite high compared to IPA, see the first two rows in Table 1. In the following, we compare the “work-normalized variance” (WNV), which balances the computational effort and the variance of an estimator and is given by the product of the variance and the expected work per run, see Glynn and Whitt (1992). Although the computational burden of the SF phantom estimator is low, due to the high variance the work-normalized variance of SF is higher than that of IPA, see the third and fourth rows in Table 1.

Table 1 Comparison of IPA, SF and SFR when θ is the rate of the exponentially distributed service times
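The SFR estimator in Eq. 12 needs only the running sum of past service times and can therefore be computed in the same pass as the waiting times. A sketch for the M/M/1 setting (the rates below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)

def sfr_estimate(theta, lam, N):
    """Score function estimator with variance reduction (Eq. 12) for an M/M/1
    queue with service rate theta, computed in a single run."""
    A = rng.exponential(1.0 / lam, size=N)    # interarrival times
    S = rng.exponential(1.0 / theta, size=N)  # service times
    W = 0.0          # W_theta(1) = 0
    cum_S = 0.0      # running sum S(1) + ... + S(i)
    total = 0.0
    for i in range(1, N + 1):                       # customer i = 1, ..., N
        if i > 1:
            W = max(W + S[i - 2] - A[i - 1], 0.0)   # Lindley step to W(i)
        cum_S += S[i - 1]
        total += W * (i - theta * cum_S)            # score term of Eq. 12
    return total / (theta * N)

est = np.mean([sfr_estimate(theta=2.0, lam=1.0, N=1000) for _ in range(100)])
```

Per customer, the update costs O(1) beyond the Lindley step itself, which is the "very small overhead" noted above.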

3.2 Perturbation analysis approach to phantom estimator

In this section, we show how to obtain the performance function for the phantom processes (i.e., \(W^\pm_\theta(n,k)\)) directly from the nominal process (i.e., W θ (n)). For illustration purposes, we assume that only the service times depend on θ and that the inter-arrival time distribution is independent of θ. Consequently, we drop in the following the index θ for the interarrival times, i.e., A θ ( n ) = A ( n ) for all n. Then D θ (n) has the following weak derivative:

$$ c_\theta = c_\theta^S,\; D_{\theta}^+(n) = S_\theta^+(n-1) - A(n),\; \text{ and }D_\theta^-(n) = S_\theta^-(n-1) - A(n). $$

Now, suppose that we have observed a sample path of the original G/G/1 system, called the nominal sample path. We then ask: what would have happened if, in the nominal sample path, one service sample S θ ( k) were replaced by \(S^\pm_{\theta}( k)\)?

Fix the index k where the perturbation of the service time starts, that is, the service time of the phantom system for the k th customer is either \(S^+_{\theta} (k)\) or \(S^-_{\theta}(k)\). As presented in Section 2.2, the phantom estimator needs the difference between the “plus” and “minus” phantom processes, i.e., \(W_\theta^+ ( n , k) - W_\theta^-( n , k) .\) Define the “plus” and “minus” difference processes as follows

$$ \Delta_\theta^+ ( n, k ) = W_\theta^+ ( n,k ) - W_\theta ( n ), \quad \Delta_\theta^- ( n, k ) = W_\theta^- ( n,k ) - W_\theta ( n ). $$

Notice that the phantom estimator is completely determined by these difference processes. The phantom estimator in Eq. 8 can be rewritten as:

$$ \label{eq:paph} \frac{d}{d \theta } \mathbb{E} \left [\frac{1}{N} \sum\limits_{n=1}^N W_\theta ( n ) \right ] = \mathbb{E} \left [\frac{c_\theta}{N} \sum\limits_{k=1}^N \sum\limits_{n=k}^N \left( \Delta_\theta^+ ( n , k) - \Delta_\theta^-( n , k) \right)\right ]. $$
(13)

Without loss of generality, we only analyze the “minus” difference process \(\{\Delta_\theta^- ( n,k ): n\geq 1\}\) in the following illustration. Since \(W_\theta^- ( n,k ) = W_\theta ( n ) \) for 1 ≤ n ≤ k, we have \(\Delta_\theta^- ( n, k ) = 0\) for 1 ≤ n ≤ k. For n = k + 1, assuming both waiting times are positive, we have

$$\begin{array}{rll} W_\theta (k+1) &=& W_\theta(k) + S_\theta(k) - A(k+1) \\[4pt] W_\theta^- ( k+1,k ) &=& W_\theta(k) + S^-_\theta(k) - A(k+1). \end{array}$$

Therefore \( \Delta_\theta^- ( k+1, k ) = S_\theta^- ( k ) - S_\theta ( k )\). Let \(\Delta_k =S_\theta^- ( k ) - S_\theta ( k ) \) denote the perturbation introduced at the kth customer. Depending on the relative values of \(S_\theta^- ( k )\) and S θ ( k ), Δ k can be either positive or negative. We now examine how this finite perturbation propagates. There are three cases to be considered.

Case 1 Small positive perturbation

In this case, the perturbation Δ k is positive and smaller than the length of the following idle period. To illustrate this case, consider the sample path shown in Fig. 1, in which a service time perturbation \(\Delta_1 = S_\theta^- ( 1 ) - S_\theta ( 1 )\) is introduced.

Fig. 1 Sample path with a small positive perturbation

We then see that:

  (i) \(\Delta_\theta^-(1,1) = W_\theta^-(1,1) - W_\theta(1) = 0\);

  (ii) \(\Delta_\theta^-(2,1) = W_\theta^-(2,1) - W_\theta(2) = \Delta_1\);

  (iii) \(\Delta_\theta^-(3,1) = W_\theta^-(3,1) - W_\theta(3) = 0\), since Δ1 is smaller than the idle period length (A(3) − W θ (2) − S θ (2));

  (iv) \(\Delta_\theta^-(4,1) = W_\theta^-(4,1) - W_\theta(4) = 0\), since Δ1 is smaller than the idle period length of the busy cycle the 4th customer is in.

To summarize, the accumulated difference of the waiting times between the phantom process and the nominal path is \(\sum_{i=1}^4 \Delta_\theta^-(i,1)= \Delta_1 \). Since the initial perturbation Δ1 is smaller than the idle period, it does not propagate to the next busy cycle.

Case 2 Large positive perturbation

Now consider the case in which the perturbation Δ k is positive and larger than the length of the following idle period. To illustrate this case, consider the same example as in Case 1: in the sample path shown in Fig. 2, a service time perturbation \(\Delta_1 = S_\theta^- ( 1 ) - S_\theta ( 1 )\) is introduced.

Fig. 2 Sample path with a large positive perturbation

We then see that:

  (i) \(\Delta_\theta^-(1,1) = W_\theta^-(1,1) - W_\theta(1) = 0\);

  (ii) \(\Delta_\theta^-(2,1) = W_\theta^-(2,1) - W_\theta(2) = \Delta_1\);

  (iii) \(\Delta_\theta^-(3,1) = W_\theta^-(3,1) - W_\theta(3) = \Delta_1 - ( A(3) - W_\theta(2) - S_\theta(2))\), since Δ1 is larger than the following idle period length A(3) − W θ (2) − S θ (2);

  (iv) \(\Delta_\theta^- (4,1) = W_\theta^-(4,1) - W_\theta(4) = \Delta_\theta^-(3,1)\), the part of the perturbation that propagates to the second busy cycle.

In general, for n > k, the perturbation \( \Delta_\theta^- (n,k)\) propagates as follows

$$\label{eq:dl} \Delta_\theta^- (n+1, k ) = \begin{cases} \Delta_\theta^- ( n, k), & W_\theta ( n+1 ) > 0, \\[4pt] \max\big(\Delta_\theta^- ( n, k ) - ( A ( n+1 ) - W_\theta ( n) - S_\theta ( n )), 0\big), & W_\theta ( n+1 ) = 0 . \end{cases} $$
(14)

The difference processes for the negative phantom are obtained from the nominal path by simple addition during busy cycles; at the end of a busy cycle, the amount of perturbation that reaches the next busy cycle has to be computed. As this construction shows, the extra computations required by the phantom estimator are essentially determined by the number of busy cycles among the first N waiting times. Moreover, the construction of \(\{\Delta_\theta^- (n, k ): n\geq 1\}\) can be terminated as soon as the perturbation has died out, i.e., as soon as \(\Delta_\theta^- (n,k )=0\) for the first time.
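The propagation rule (14) can be sketched as follows; the sample-path values are hypothetical and mimic the large-perturbation example of Fig. 2 (nominal path W = (0, 1, 0, 1), perturbation Δ = 1.5 at k = 0):

```python
def propagate_plus(W, A, S, k, delta_k):
    """Propagate a positive perturbation delta_k = S^-(k) - S(k) > 0 along the
    nominal path W according to rule (14).  Assumes the nominal path is busy
    at customer k + 1, so that Delta^-(k+1, k) = delta_k."""
    d = [0.0] * len(W)
    if k + 1 < len(W):
        d[k + 1] = delta_k
    for n in range(k + 1, len(W) - 1):
        if W[n + 1] > 0.0:
            d[n + 1] = d[n]                    # busy cycle continues: carry over
        else:                                  # idle period absorbs the perturbation
            d[n + 1] = max(d[n] - (A[n + 1] - W[n] - S[n]), 0.0)
    return d

# Fig. 2-like example: an idle period of length 1 before the 3rd customer
A = [0.0, 1.0, 3.0, 1.0]
S = [2.0, 1.0, 2.0, 1.0]
W = [0.0, 1.0, 0.0, 1.0]                       # nominal Lindley path for (A, S)
d = propagate_plus(W, A, S, 0, 1.5)            # d == [0.0, 1.5, 0.5, 0.5]
```

The same values are obtained by brute-force re-simulation of the phantom path with S(0) replaced by S(0) + 1.5.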

Case 3 Negative perturbation

In this case, the perturbation Δ k is negative, that is, the waiting times of the phantom process will be smaller than those of the nominal path. To illustrate this case, consider the sample path shown in Fig. 3, in which a service time perturbation \(\Delta_1 = S_\theta^- ( 1 ) - S_\theta ( 1 )\) is introduced.

Fig. 3 Sample path with a negative perturbation

We then see that:

  (i) \(\Delta_\theta^-(1,1) = W_\theta^-(1,1) - W_\theta(1) = 0\);

  (ii) \(\Delta_\theta^-(2,1) = W_\theta^-(2,1) - W_\theta(2) = \Delta_1\);

  (iii) \(\Delta_\theta^-(3,1) = W_\theta^-(3,1) - W_\theta(3) = - W_\theta(3) \). Since |Δ1| is larger than the following waiting time W θ (3), the phantom waiting time hits zero, \(W_\theta^-(3,1) = 0\), and the difference variable is set to −W θ (3);

  (iv) \(\Delta_\theta^-(4,1) = W_\theta^-(4,1) - W_\theta(4) = \Delta_\theta^-(3,1)\), i.e., the new perturbation propagates.

In general, for n > k, the perturbation \( \Delta_\theta^- (n,k)\) propagates as follows

$$\label{eq:dm} \Delta_\theta^- (n+1, k ) = \begin{cases} 0, & W_\theta ( n + 1 ) = 0,\\[3pt] \Delta_\theta^- ( n, k), & W_\theta ( n+1 )> 0 \text{ and } W_\theta ( n+1 )+ \Delta_\theta^- ( n, k) > 0, \\[3pt] - W_\theta ( n + 1 ), & W_\theta ( n + 1 )> 0 \text{ and } W_\theta ( n+1 ) + \Delta_\theta^- ( n, k) \leq 0 . \end{cases} $$
(15)

In this case, the difference processes are easily obtained from the nominal path by simple addition during busy cycles, and at the end of a busy cycle the perturbation dies out. As this construction shows, the extra computations required by the phantom estimator are essentially determined by the number of customers in a busy cycle. Moreover, computing \(\{\Delta_\theta^- (n, k ): n\geq 1\}\) can be terminated as soon as the busy cycle finishes.
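The propagation rule (15) can be sketched analogously; the nominal path below is hypothetical and chosen so that the phantom path empties while the nominal path stays busy:

```python
def propagate_minus(W, k, delta_k):
    """Propagate a negative perturbation delta_k = S^-(k) - S(k) < 0 along the
    nominal path W according to rule (15).  Assumes both paths are busy at
    customer k + 1, so that Delta^-(k+1, k) = delta_k."""
    d = [0.0] * len(W)
    if k + 1 < len(W):
        d[k + 1] = delta_k
    for n in range(k + 1, len(W) - 1):
        if W[n + 1] == 0.0:
            d[n + 1] = 0.0                     # busy cycle ends: perturbation dies
        elif W[n + 1] + d[n] > 0.0:
            d[n + 1] = d[n]                    # phantom still busy: carry over
        else:
            d[n + 1] = -W[n + 1]               # phantom path hits zero in the cycle
    return d

# Nominal Lindley path for A = (-, 1, 1.75, 1), S = (2, 1, 2, -)
W = [0.0, 1.0, 0.25, 1.25]
d = propagate_minus(W, 0, -0.5)                # d == [0.0, -0.5, -0.25, -0.25]
```

Again, brute-force re-simulation of the phantom path (with S(0) replaced by 1.5) yields the same difference process.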

3.3 Implementation of the on-line phantom estimator

In the following we discuss how the phantom estimator based on the perturbation analysis approach, Eq. 13, can be implemented on line. We introduce an "accumulator" D, which accumulates all the differences \(\Delta_\theta^+ ( n , k) - \Delta_\theta^-( n , k)\). In addition, we introduce a linked list \(\cal P\), which represents all active phantom processes. Each element of the phantom list \(\cal P\) carries two variables Δ1 and Δ2, where \(\Delta_1 = \Delta_\theta^+ ( n , k)\) and \(\Delta_2= \Delta_\theta^- ( n , k)\). Whenever a customer enters the server, one phantom process is generated, that is, one element is added to the phantom list \(\cal P\). We assume that the queue is initially empty. The algorithm is as follows:

Upon the nth arrival, if the server is idle (i.e., A(n) > W θ (n − 1) + S θ (n − 1)), the arriving customer enters the server immediately and we proceed as follows (if the server is busy, the customer joins the queue and no action is taken):

1. Generate a service time S θ (n).

2. Update all current elements in the phantom list \(\cal P\):

   a. Update the difference variables according to formula (14), that is, for i = 1, 2, if Δ i  > 0, set Δ i  := max(Δ i  − (A(n) − W θ (n − 1) − S θ (n − 1)), 0).

   b. If Δ1 = Δ2 = 0, this phantom process has terminated and we remove this element from the list \(\cal P\); otherwise, update the accumulator D := D + (Δ1 − Δ2).

3. Generate a new element in the phantom list \(\cal P\):

   a. Sample \(S^+_\theta (n)\) from distribution \(\mu_\theta^{S +}\) and \(S^-_\theta (n)\) from distribution \(\mu_\theta^{S -}\).

   b. Set the difference variables of this new element as \(\Delta_1 =S^+_\theta (n) - S_\theta (n)\) and \(\Delta_2 =S^-_\theta (n) - S_\theta (n)\).

Upon the (n − 1)st departure:

1. If the queue is empty (i.e., a busy cycle has terminated), then for each element of \(\cal P\):

   a. Negative perturbations die out, i.e., for i = 1, 2, if Δ i  < 0, set Δ i  := 0.

   b. If Δ1 = Δ2 = 0, remove this element from the list \(\cal P\).

2. If the queue is not empty:

   a. The next customer enters the server; its service time S θ (n) is generated and its waiting time W θ (n) is computed.

   b. Update all current elements in the phantom list \(\cal P\):

      i. Update the negative difference variables according to formula (15), i.e., for i = 1, 2, if Δ i  < 0 and W θ (n) + Δ i  ≤ 0, set Δ i  := −W θ (n).

      ii. Update the accumulator D := D + (Δ1 − Δ2).

   c. Generate a new element in the phantom list \(\cal P\):

      i. Sample \(S^+_\theta (n)\) from distribution \(\mu_\theta^{S +}\) and \(S^-_\theta (n)\) from distribution \(\mu_\theta^{S -}\).

      ii. Set the difference variables of this new element as \(\Delta_1 =S^+_\theta (n) - S_\theta (n)\) and \(\Delta_2 =S^-_\theta (n) - S_\theta (n)\).

The above generation of the difference processes stops once the given number N of served customers is reached. The derivative estimator on the right-hand side of Eq. 13 is then given by c θ D/N. The resulting estimator is called the single-run phantom estimator (SRPhE).
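Off line, the event-driven bookkeeping above can be condensed into a per-phantom loop. The following sketch (hypothetical helper names) uses a single exact difference update that subsumes rules (14) and (15), and terminates each phantom as soon as its perturbation has died out:

```python
import numpy as np

def lindley(A, S):
    """Nominal waiting times via Lindley's recursion, W[0] = 0."""
    W = np.zeros(len(S))
    for n in range(1, len(S)):
        W[n] = max(W[n - 1] + S[n - 1] - A[n], 0.0)
    return W

def phantom_diff_sum(W, A, S, rep, k):
    """Sum over n of Delta(n, k) when S[k] is replaced by rep, combining
    the propagation rules (14) and (15) into one update."""
    N, total = len(S), 0.0
    if k + 1 >= N:
        return 0.0
    d = max(W[k] + rep - A[k + 1], 0.0) - W[k + 1]   # seed: Delta(k+1, k)
    for n in range(k + 1, N):
        total += d
        if d == 0.0:
            break                              # perturbation has died out
        if n + 1 < N:
            if W[n + 1] == 0.0:                # end of a busy cycle: rule (14)
                d = max(d - (A[n + 1] - W[n] - S[n]), 0.0)
            else:                              # cycle continues: rules (14)/(15)
                d = max(d, -W[n + 1])
    return total

def srphe(A, S, Splus, Sminus, c):
    """Single-run phantom estimator (13): (c/N) sum_k sum_n (Dp - Dm)."""
    W = lindley(A, S)
    D = sum(phantom_diff_sum(W, A, S, Splus[k], k)
            - phantom_diff_sum(W, A, S, Sminus[k], k) for k in range(len(S)))
    return c * D / len(S)
```

The incremental computation agrees with brute-force re-simulation of every phantom path; its cost is governed by how quickly the perturbations die out, as discussed above.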

In many cases, it is possible to choose the distribution of one of the phantom processes \(W_\theta^+ ( n , k )\) or \(W_\theta^- ( n , k )\) to be the same as that of the nominal process W θ ( k ). In other words, to decrease the computational burden, we look for choices of the "plus" and "minus" variables such that one of them equals the nominal variable; in formulas,

$$ W_{\theta}^+ ( k,k) = W_{\theta}( k) \qquad \text{or} \qquad W_{\theta}^- ( k,k) = W_{\theta}( k). $$

In Heidergott et al. (2009) a guide is provided to computing phantom estimators using a version of the phantom processes that re-uses the nominal process.

4 Finite horizon experiments

For the finite horizon case, we can use the single-run phantom estimator as presented in Eq. 13, where the waiting times of the phantom processes are obtained directly from the nominal path. In the numerical examples, we consider an M/G/1 queue, in which the service time distribution depends on θ and the arrival time distribution is independent of θ.

Example 3

(the M/M/1 queue) The interarrival times {A(k)} form an i.i.d. sequence of exponentially distributed random variables with rate λ, and the service times {S θ (k)} form an i.i.d. sequence of exponentially distributed random variables with rate θ. As explained in Section 3.2, it is preferable to choose one of the phantom processes so that it coincides with the nominal process, and we take the weak derivative as provided in Example 2. By doing so, we have \(W_\theta^+(n,k) = W_\theta(n),\) that is, the plus phantom is the same as the nominal path. The perturbation introduced by the phantom estimator is then an exponentially distributed random variable with rate θ.
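Example 2 is not restated here, but the decomposition is easy to verify for the exponential density: \(\frac{d}{d\theta}\,\theta e^{-\theta x} = \frac{1}{\theta}\big(\theta e^{-\theta x} - \theta^2 x e^{-\theta x}\big)\), so c θ = 1/θ, μ⁺ is the nominal Exp(θ) distribution, and μ⁻ is the Erlang(2, θ) distribution; coupling \(S^-_\theta(k) = S_\theta(k) + \mathrm{Exp}(\theta)\) yields exactly the exponential perturbation mentioned above. A sketch (hypothetical function name):

```python
import numpy as np

def exp_weak_derivative(theta, size, rng):
    """Weak derivative of the Exp(theta) service distribution:
    c = 1/theta, S_plus = S (the nominal sample re-used), and
    S_minus = S + Exp(theta), an Erlang(2, theta) sample coupled to S."""
    s = rng.exponential(1.0 / theta, size)      # nominal service times
    extra = rng.exponential(1.0 / theta, size)  # the introduced perturbation
    return 1.0 / theta, s, s, s + extra         # c, S, S_plus, S_minus

rng = np.random.default_rng(0)
c, s, sp, sm = exp_weak_derivative(2.0, 400_000, rng)
# Weak-derivative identity on the mean:
# d/dtheta E[S] = -1/theta**2 = c * (E[S_plus] - E[S_minus])  (= -0.25 at theta = 2)
```

This coupling makes every introduced perturbation Δ k positive, so only rule (14) is exercised by the minus phantoms in this example.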

For our numerical experiments, we let λ = 1.0 and choose N = 1000. All experiments presented in this section are based on 800 samples. Note that for this setting the traffic rate ρ equals 1/θ. In Table 2 we compare the variance and the computation time of the phantom estimators with those of IPA. Note that SRPhE has the same variance as PhE, which is given in the second row of the upper part of the table. The results show that the phantom estimators have lower variance than IPA. As the workload increases, the variance of the phantom estimators increases faster than that of IPA. The computation times in Table 2 show that the phantom estimators require higher computational effort; in particular, the computation time of PhE grows quickly as the load of the system increases.

Table 2 The variance and computation time comparison of dE[W]/dθ for the M/M/1 queue

For a better comparison, we depict in Fig. 4 the work-normalized variance (WNV) of the phantom estimators relative to the WNV of IPA (more precisely, the figure plots (WNV phantom)/(WNV IPA)). As can be seen from this figure, the WNV of PhE is sensitive with respect to ρ, whereas the WNV of IPA and SRPhE depends only mildly on ρ. Furthermore, it is worth noting that SRPhE has even smaller WNV than IPA.

Fig. 4 The relative work-normalized variance of the PhE and of the SRPhE compared to that of IPA for the M/M/1 queue

Example 4

(the M/G/1 queue) For this model, {A(k)} is an i.i.d. sequence of exponentially distributed random variables with rate λ, and {S θ (k)} is an i.i.d. sequence of μ θ -distributed random variables, where μ θ is any of the distributions listed in Table 3. The table also provides the corresponding weak derivatives, where Dirac(θ) denotes the Dirac measure, i.e., the point mass at θ.

Table 3 The weak-derivative for a few common distributions

Table 3 contains two versions of the Pareto distribution. The Pareto(θ, ∞)(α,θ) distribution, supported on (θ, ∞), has density \( \alpha \theta^\alpha x^{-(\alpha +1)}\mathbb{1}_{(\theta, \infty)}(x)\), whereas the Pareto(α,θ) distribution has density

$$ f_{\theta , \alpha } ( x ) = \frac{\alpha}{\theta +x} \left ( \frac{\theta}{\theta +x }\right )^\alpha. $$

The distribution \(\mu^+_\theta\) for Pareto(α,θ) shown in this table has density

$$ f^+_{ \theta , \alpha } ( x ) = \frac{ \theta ( \alpha +1) }{\alpha }\frac{\alpha x}{\theta (\theta +x )} f_{\theta , \alpha } ( x ). $$

Note that \( f^+_{ \theta , \alpha } ( x ) \) is a twisted version of f θ, α ( x ) and can be sampled via the acceptance-rejection method, choosing the nominal Pareto(α,θ) distribution as the instrumental distribution with rejection bound α + 1. The minus measure \(\mu^-_\theta\) has density

$$ f^-_{\theta , \alpha } ( x ) = \frac{ \theta ( \alpha +1) }{\alpha }\frac{1}{\theta +x} f_{\theta , \alpha } ( x). $$

The cumulative distribution function corresponding to \(f^-_{\theta , \alpha } ( x )\) is \(1 - (1 + x/\theta)^{-(\alpha + 1)}\), from which one can easily sample by the inverse transform method.
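Both phantom measures for the Pareto(α,θ) case can thus be sampled as follows (a sketch with hypothetical function names; the inverse of the nominal distribution function F(x) = 1 − (θ/(θ+x))^α is used for the proposals):

```python
import numpy as np

def sample_minus(alpha, theta, size, rng):
    """Inverse-transform sampling from f^-: solving
    u = 1 - (1 + x/theta)**(-(alpha+1)) gives x = theta*((1-u)**(-1/(alpha+1)) - 1)."""
    u = rng.random(size)
    return theta * ((1.0 - u) ** (-1.0 / (alpha + 1)) - 1.0)

def sample_plus(alpha, theta, size, rng):
    """Acceptance-rejection sampling from f^+, with the nominal Pareto(alpha,
    theta) as instrumental distribution.  Since f^+/f = (alpha+1)*x/(theta+x)
    <= alpha + 1, a proposal x is accepted with probability x/(theta + x)."""
    out = np.empty(0)
    while len(out) < size:
        u = rng.random(size)
        x = theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)   # nominal proposals
        out = np.concatenate([out, x[rng.random(size) < x / (theta + x)]])
    return out[:size]
```

The overall acceptance probability for `sample_plus` is 1/(α + 1), matching the rejection bound α + 1.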

For our numerical experiments, we let λ = 1.0 and choose N = 1000. In order to compare the different service time distributions, we choose θ values such that all distributions have the same mean. We depict in Fig. 5 the WNV of IPA relative to the WNV of SRPhE as a function of the system load ρ. As can be seen from the figure, in most cases SRPhE has smaller WNV than IPA. However, IPA outperforms SRPhE for the Pareto(θ, ∞)(α,θ) distribution. It is worth noting that for the Pareto(α,θ) distribution SRPhE has the better WNV.

Fig. 5 The relative work-normalized variance of the SRPhE compared to that of IPA for the M/G/1 queue

5 Steady-state experiments

In this section we deal with sensitivity analysis of stationary waiting times. Throughout this section we assume that the queue is stable, i.e., we assume that

$$ \frac{ \mathbb{E}[ S_\theta ( 1 ) ] }{ \mathbb{E}[ A_\theta ( 1 ) ]} =: \rho < 1 , $$

for all θ. Under the above stability assumption, the stationary waiting time, denoted by W θ , is absolutely integrable and satisfies

$$\label{eq:r} \lim_{N \rightarrow \infty } \frac{1}{N} \sum\limits_{n=1}^N \mathbb{E} [ W_\theta ( n ) ] = \mathbb{E} [ W_\theta ] $$
(16)

for all θ. In the following we discuss the approaches available in the literature for estimating \( d \mathbb{E} [ W_\theta ] / d \theta \).

5.1 Transient approximation

Elaborating on the limit in Eq. 16, one can search for N large enough such that

$$ \frac{1}{N} \mathbb{E} \left [\,\sum\limits_{n=1}^N W_\theta ( n )\right] \approx \mathbb{E} [ W_\theta ] $$

in a neighborhood of θ. Then, the gradient estimators described in Section 4 can be applied and

$$ \frac{1}{N} \frac{d}{d \theta } \mathbb{E} \left [\,\sum\limits_{n=1}^N W_\theta ( n )\right] $$

can then be used for optimizing \(\mathbb{E} [ W_\theta ] \).

5.2 Cycle analysis

Under stability of the queue, W θ ( n ) hits the set { 0 } with positive probability. Denote by τ θ ( k ) the successive hitting times of W θ ( n ) on { 0 }. By renewal theory it holds that

$$\label{eq:reg} \mathbb{E}[ W_\theta] = \frac{\mathbb{E}\left[ \sum_{i=\tau_\theta (k) }^{\tau_\theta (k+1) -1}W_\theta(i{\kern.6pt})\right]}{\mathbb{E}\left[\tau_\theta (k+1) - \tau_\theta (k) \right]}, $$
(17)

for any k ≥ 1. Assuming that the derivatives exist, direct differentiation of Eq. 17 yields a strongly consistent and asymptotically unbiased estimator for the derivative of the steady-state average waiting time:

$$\label{eq:rf}\begin{array}{lll} \frac{d}{d\theta}\mathbb{E}[ W_\theta ] & = & \frac{\frac{d}{d\theta}\mathbb{E}\left[ \sum_{i=\tau_\theta (k)}^{\tau_\theta (k+1)-1}W_\theta (i)\right] \mathbb{E}\left[\tau_\theta (k+1) - \tau_\theta (k) \right] }{\big(\mathbb{E}[\tau_\theta (k+1) - \tau_\theta ( k)]\big)^2} \\ & & \quad - \frac{\mathbb{E}\left[ \sum_{i=\tau_\theta (k )}^{\tau_\theta (k+1)-1}W_\theta (i)\right] \frac{d}{d\theta}\mathbb{E}\left[\tau_\theta (k+1) - \tau_\theta (k) \right]} {\big(\mathbb{E}[\tau_\theta (k+1) - \tau_\theta ( k)]\big)^2 } . \end{array} $$
(18)
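The ratio representation (17) itself is straightforward to estimate by simulating complete busy cycles. A sketch for the M/M/1 queue (hypothetical function name), for which \(\mathbb{E}[W_\theta]\) is available in closed form (cf. Eq. 21 below):

```python
import numpy as np

def regenerative_mean_wait(lam, theta, n_cycles, rng):
    """Ratio estimator (17): total waiting time over complete busy cycles
    divided by the total number of customers in those cycles."""
    total_w, total_n = 0.0, 0
    for _ in range(n_cycles):
        w = 0.0                                # first customer of a cycle waits 0
        while True:
            total_w += w
            total_n += 1
            # Lindley step to the next customer
            w = max(w + rng.exponential(1.0 / theta) - rng.exponential(1.0 / lam), 0.0)
            if w == 0.0:                       # next customer opens a new cycle
                break
    return total_w / total_n
```

For λ = 1 and θ = 2 (i.e., ρ = 0.5), the estimate should be close to the exact value \(\mathbb{E}[W_\theta] = (\rho/\theta)/(1-\rho) = 0.5\).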

5.3 Phantom estimators

Denote by \(\tau^\pm_\theta (k)\) the first entrance times of the phantom processes \(W_\theta^\pm(n,k)\) into the set { 0 }; formally, let

$$ \tau^+_\theta (k) \triangleq \inf \big\{ n\geq 1: W_\theta^+(n,k) = 0 \big\}, $$

and

$$ \tau^-_\theta (k) \triangleq \inf \big\{ n\geq 1: W_\theta^-(n,k) = 0 \big\}. $$

Then the derivatives of the cumulative waiting time per cycle and of the cycle length are estimated as

$$ \frac{d}{d\theta}\mathbb{E}\left[\sum\limits_{i=\tau_\theta (k)}^{\tau_\theta (k+1)-1}W_\theta(i)\right]= c_\theta \mathbb{E} \left[\sum\limits_{m=\tau_\theta (k)}^{\tau_\theta (k+1)-1} \left ( \sum\limits_{n=m}^{\tau^+_\theta (m)-1} W_\theta^+ ( n , m) - \sum\limits_{n=m}^{\tau^-_\theta (m)-1} W_\theta^-( n , m) \right ) \right] $$

and

$$ \frac{d}{d\theta}\mathbb{E}\left[\tau_\theta (k+1) - \tau_\theta (k)\right]= c_\theta \mathbb{E} \left[\sum\limits_{m=\tau_\theta (k)}^{\tau_\theta (k+1)-1} \left ( \tau^+_\theta (m) - \tau^-_\theta (m) \right ) \right] . $$

Inserting the above derivative expressions into Eq. 18 yields the renewal approach phantom estimator (RPhE).

Based on an operator approach, an alternative estimator is developed in Heidergott et al. (2006) that avoids the use of renewal theory. More specifically, denote by γ(k) the first time at which \(W^+_\theta (n,k)\) and \(W^-_\theta(n,k)\) simultaneously hit the set { 0 }; for the single-run version of the phantom estimator this means that \(\Delta^+_\theta(n,k) = \Delta^-_\theta(n,k) = 0\). More formally, set

$$ \gamma(k) = \inf \big\{ n \geq k: W^+_\theta (n,k) = 0 \text{ and } W^-_\theta (n,k) = 0 \big\}. $$

The derivative is obtained as follows:

$$\label{opf} \frac{d}{ d\theta}\mathbb{E}[W_\theta ] = \frac{c_\theta}{\mathbb{E}[\tau_\theta ( k+1) - \tau_\theta (k) ]}\mathbb{E}\left[ \sum\limits_{m=\tau_\theta ( k)}^{\tau_\theta (k+1)-1} \sum\limits_{n=m}^{\gamma(m)} \left( W^+_\theta (n,m) - W^-_\theta (n, m) \right) \right]. $$
(19)

The above estimator is called the simultaneous phantom estimator (SPhE).

5.3.1 IPA

The IPA estimator for stationary waiting times in the G/G/1 queue was developed in Glasserman (1992), to which we refer for details. The IPA estimator reads:

$$\label{stipa} \frac{d}{ d\theta}\mathbb{E}[ W_\theta ] = \frac{ \mathbb{E}\left[ \sum_{i=\tau_\theta (k)}^{\tau_\theta (k+1)-1} \frac{d}{ d\theta} W_\theta (i{\kern.6pt})\right]} {\mathbb{E}\left[ \tau_\theta (k+1) - \tau_\theta (k) \right]} , $$
(20)

where d W θ ( i ) / d θ is given by Eq. 5.

Remark 3

Note the resemblance of the derivative representations in Eqs. 19 and 20.

5.4 Performance comparison

5.4.1 Transient approximation

For the M/M/1 queue with arrival rate λ and service rate θ, the exact steady-state waiting time is given by

$$\label{eq:lw} \mathbb{E}[W_\theta] = \frac{\rho / \theta}{1-\rho}, $$
(21)

where ρ = λ/θ. By performing a series of experiments, we search for N sufficiently large that the relative error of the transient approximation, given by

$$\frac{\frac{1}{N} \mathbb{E} \left [\,\sum_{n=1}^N W_\theta ( n )\right] - \mathbb{E}[W_\theta]}{\mathbb{E}[W_\theta]}, $$

is sufficiently small. For the numerical results below we have taken N = 10000, which results in a relative error of less than ±3% with probability 95%.

By differentiating Eq. 21 with respect to θ, we obtain the exact derivative:

$$\frac{d}{d\theta}\mathbb{E}[W_\theta] = \frac{ \lambda (\lambda - 2 \theta)}{\theta^2(\lambda - \theta)^2 }. $$

The transient IPA derivative estimator is given by Eq. 6 and the SRPhE estimator by Eq. 13. For our numerical experiments, we let λ = 1.0 and choose N = 10000. For this experimental setting, the relative error of both estimators is less than ±3% with probability 95%. The relative work-normalized variance (WNV IPA / WNV SRPhE) is shown in Fig. 6. Since the relative WNV is larger than one, SRPhE outperforms IPA in terms of WNV.

Fig. 6 The relative work-normalized variance comparison between the SRPhE and IPA for the transient approximation

5.4.2 Cycle analysis

To approximate the derivative of the steady state via cycle analysis, the single-run (on-line) phantom estimator can be implemented in two different ways: one uses the RPhE as in Eq. 18, the other the SPhE as in Eq. 19. The IPA estimator for cycle analysis is given in Eq. 20. For our numerical experiments, we let λ = 1.0 and set the total number of busy cycles to 400000. As the estimators are only asymptotically unbiased, we plot in Fig. 7a the mean relative error of the derivative estimators with respect to the theoretical value as a function of the system load ρ. The relative errors of SPhE and IPA lie within a margin of 3% and are smaller than that of RPhE. The relative work-normalized variance (WNV IPA / WNV phantom) is shown in Fig. 7b. SPhE has lower WNV than IPA for small and medium workload, whereas IPA performs better for ρ > 0.6, as can be seen in Fig. 7b. This is caused by large cycle lengths under heavy load: as illustrated in Fig. 2, a large positive perturbation in a cycle is propagated to the next cycle (and the cycles merge), so at the end of the nominal simulation we may need to simulate extra cycles to obtain the required information, which introduces an extra computational burden; under heavy load the cycle lengths are long, so this extra burden is high as well. Furthermore, Fig. 7b shows that the work-normalized variance of RPhE is larger than that of SPhE. This is because RPhE involves the derivative of a ratio, which introduces additional variance.

Fig. 7 The performance comparison between the on-line phantom estimators and the IPA estimator for cycle analysis (a, b)

6 Conclusion

Our paper proposed a synthesis of the knowledge accumulated in gradient estimation in order to arrive at highly efficient gradient estimators. By combining perturbation propagation rules from perturbation analysis with perturbation generation rules from weak differentiation, we constructed a single-run phantom estimator that has smaller work-normalized variance than IPA. Topics for further research are the extension of these findings to more general types of gradient estimation problems and to queueing networks.