Importance sampling for non-Markovian tandem queues using subsolutions

In this paper, we use importance sampling simulation to estimate the probability that the number of customers in a d-node GI|GI|1 tandem queue reaches some high level N in a busy cycle of the system. We present a state-dependent change of measure for a d-node GI|GI|1 tandem queue based on the subsolution approach, and we prove, under a mild conjecture, that this state-dependent change of measure gives an asymptotically efficient estimator for the probability of interest when all supports are bounded.


Introduction
In this paper, we will develop a simulation scheme to estimate the probability that the total number of customers in a d-node G I |G I |1 tandem queue exceeds some high level N in a busy cycle of the system. Two of the main methods used for rare event simulation are importance sampling and splitting: in this paper, we consider importance sampling. In importance sampling, we make the event of interest less rare by changing the underlying probability distribution.
Importance sampling for queueing networks has been studied since 1989, when a state-independent change of measure for general queueing networks was proposed by Parekh and Walrand [11]. Sadowsky [12] showed that for the single queue this change of measure gives a so-called asymptotically efficient estimator. Afterwards, in [10], it has been shown that the change of measure that was proposed by Parekh and Walrand does not necessarily give an asymptotically efficient estimator for the two-node Markovian tandem queue. In [3,10], necessary and sufficient conditions for asymptotic efficiency have been provided; thus, an extension to a state-dependent change of measure is required in order to get an asymptotically efficient estimator for all input parameters.
The subsolution approach is a method, developed by Dupuis and Wang [7], which can be used to construct a state-dependent change of measure. For the two-node Markovian tandem queue [6] and Jackson networks [8], this method has been applied to develop a state-dependent change of measure that gives an asymptotically efficient estimator for the probability of interest. Note that the subsolution approach is not limited in its use to importance sampling. It can also be used for splitting, see, for example, [5], but for this paper we choose not to address splitting and focus on importance sampling instead.
In [2], it has been shown that also for non-Markovian tandem queues an extension to a state-dependent change of measure is required in order to get an asymptotically efficient estimator. In that paper, necessary conditions for asymptotic efficiency have been provided. In this paper, we extend the existing work on d-node Markovian tandem queues to d-node non-Markovian tandem queues. This extension is important, because in many practical applications the Markov assumption is not realistic.
To construct a state-dependent change of measure for non-Markovian tandem queues using the subsolution approach, we extend the state description for Markovian tandem queues-consisting of the number of customers in each queue-with the residual inter-arrival and service times of all queues. By doing so, we have complete knowledge on the state of the system and it turns out that this is sufficient information to construct a state-dependent change of measure.
Firstly, we will analyse how the subsolution approach works for the single G I |G I |1 queue. Using this approach, we find the same change of measure as in [12], with a much shorter proof of asymptotic efficiency than the proof in [12]. Secondly, we consider the two-node G I |G I |1 tandem queue. In that case, the state description is relatively small and therefore the proofs are still quite clean. We end with statements for the d-node G I |G I |1 tandem queue, where we present the results but omit the proofs (which are natural extensions of the proofs for the two-node tandem queue).
The contribution of this paper is threefold. Firstly, we extend the subsolution approach from Markovian tandem queues to non-Markovian tandem queues and we present a state-(in)dependent change of measure that may give an asymptotically efficient estimator. Secondly, we prove asymptotic efficiency of the estimator for the single queue when the change of measure is based on this subsolution. For the d-node tandem queue, we prove that-when all supports are bounded and some conjecture holds-this state-dependent change of measure indeed gives an asymptotically efficient estimator. Lastly, we provide numerical examples to support our results. This paper is structured as follows: In Sect. 2, we introduce the model and notation and we provide some background knowledge about the subsolution approach and importance sampling. Then, in Sect. 3 we present a state-dependent change of measure and we prove that this change of measure gives an asymptotically efficient estimator for the probability of interest. Section 4 concludes the paper with some numerical results.

The model
In this paper, we consider a d-node G I |G I |1 tandem queue; see Fig. 1 for a graphical illustration. The inter-arrival times are i.i.d. and distributed according to some distribution A and the service times of queue j are i.i.d. and distributed according to some distribution B ( j) . All processes are assumed to be independent, and services at all queues follow a first-come first-served (FCFS) discipline. For notational convenience, we will denote the cumulative distribution function of any random variable X by F X (x), its moment generating function by M X (t) and the log-moment generating function by Λ X (t).
We are interested in the probability of the event that the total number of customers in the system reaches some high level N in a busy cycle of the system. Throughout this paper, we make the following assumptions with respect to the distributions of the inter-arrival time and service times. In addition, when considering the d-node tandem queue for d > 1, we make the following assumption.

Assumption 2
The supports of all distributions are bounded, i.e., there exist constants When a Markovian system was studied in [6], the state description consisted of the number of customers in each queue. For a non-Markovian queue, we extend this state description by adding the residual inter-arrival time and the residual service times. Therefore, for the d-node tandem queue the state description is a vector with 2d + 1 A B (1) . . . B (d) Fig. 1 The d-node tandem queue components. As in [6], we define all processes embedded at a transition for the number of customers in a queue. For any vector y with 2d + 1 components y 1 , . . . , y 2d+1 , we introduce shorthand notationȳ j = y d+1+ j , j = 0, . . . , d. Using this, we let denote the state of the system after i transitions. Here, Z j,i , j = 1, . . . , d, is the number of customers in queue j after i transitions,Z 0,i is the residual inter-arrival time after i transitions andZ j,i , j = 1, . . . , d, is the residual service time at queue j after i transitions. If Z j,i = 0 for some j, then we setZ j,i = 0. Throughout this paper, we let Z 0 = (1, 0, . . . , 0), i.e., we start with one customer in queue 1. We  (1), are an arrival at queue 1, a customer going from queue j to queue j + 1, j ∈ {1, . . . , d − 1} and a departure from queue d, which is a departure from the system. As the process starts at (1, 0, . . . , 0), we need a different transition for Z 0 . It will become clear in Remarks 6 and 7 why this technicality is needed.
Let e j denote the jth unit vector. Then, for all i > 0 we have and where A i is the inter-arrival time if the ith transition is an arrival. If the ith transition is not an arrival, then A i = 0. Similarly, we have that B ( j) i is the service time of a customer at queue j if a service is starting at queue j at the ith transition. If the ith transition is not a service at queue j, then B ( j) i = 0. This means that, depending on the current state of the system, it is known which type of transition to take and each of them can have infinitely many possibilities in terms of the residuals (depending on the distribution).
We note that in (1) we consider 1{Z j,i > 1} and not 1{Z j,i > 0}, since we do not need a new service time for queue j when queue j is empty after the transition. We do need a new service time for queue j when a customer departs from queue j − 1 and queue j was empty before, i.e., Z j,i = 0.
is not uniquely defined, it is not clear in which order the transitions should happen. However, in most cases it is not important for our probability of interest which of the transitions happens first. Only if Z(Z i ) =Z 0,i =Z d,i , i.e., an arrival happens at the same time as a departure from the system, the order does matter, and we use the convention that the departure occurs first, i.e., Z(Z i ) =Z d,i .
As for the Markovian system in [6,8], we will work with the scaled process. Therefore, we define and we have is the (i + 1)th transition when the state of the scaled system is X i . The advantage of the scaled system is that the first d elements of the state description are always in [0, 1], which does not increase as N increases. Note that for the scaled system we will use a similar convention as in Remark 1. Similarly to the unscaled system, we define X ( and . Now that we have defined the full state description, we define the goal set δ e and taboo set δ 0 in the following way: where the goal set is reached if there are N − 1 customers in the system and the next event is an arrival to the system, and the taboo set is reached when there are no customers in the system. Using the definitions above, we define the time to reach δ e before δ 0 as and we set τ N = ∞ when δ 0 is reached before δ e . Now we can write our probability of interest, p N , in terms of τ N as Remark 2 Note that we could define δ e and δ 0 differently, i.e., there are more states for which we know that the total number of customers will reach N or that the system will be empty. However, the current definitions are easy to use and it turns out that these definitions are sufficient for our proofs; see Remarks 8 and 12.
From [1,12], we know that the decay rate of p N equals where θ * = min j {θ j } with or equivalently θ j = sup{θ : Note that as a result of the stability and non-triviality assumption, 0 < θ * < ∞; see [2] for a formal proof of P(B ( j) > A) ⇔ θ * = ∞. This gives rise to the following definition.

Definition 1
Queue j is a bottleneck queue when θ j = θ * .
Which queue is the bottleneck also determines the form of the so-called most likely path to overflow: if the overflow level is reached, this is most likely done along a specific path (that dominates the probability of reaching the overflow level and hence determines the decay rate). When queue j is the bottleneck queue, it is therefore expected that along the most likely path x j > 0.

Preliminaries
In order to estimate the probability of interest using simulation, we use importance sampling to make our event of interest less rare by changing the underlying probability distribution. In the current work, we make the change of measure state-dependent by using the subsolution approach.

Subsolution approach
The subsolution approach for importance sampling has been introduced in [7]. Later, in [6,8], it has been used to find a state-dependent change of measure that results in an asymptotically efficient estimator for p N in the context of a Markovian tandem queue [6] and Jackson networks [8]. The definition of a classical subsolution is as follows.
In addition to this definition, in order to use the subsolution as a basis for a change of measure we require that 4. W (( 1 N , 0, . . . , 0)) = γ , so that the change of measure can be useful (i.e., asymptotically efficient; see Definition 3). This means that along the most likely path to reach the overflow level, subsolution properties 2 and 3 are satisfied with equality; see also [8].
In [6], there is an additional condition in the definition of a classical subsolution for the boundaries, when at least one of the queues is empty, and in [8] there are dedicated boundary functions for H(x, DW(x)), which are used when one of the queues is empty. We will include the boundaries in H(x, DW(x)), as the boundaries are included in V X (x) by means of indicator functions.

Remark 3
The function W (x) that we will construct in the sequel consists of (a combination of) affine functions so that for all these affine functions its derivative DW(x) is a constant. Whenever we consider an affine function, we will denote the derivative by α.

Importance sampling simulation
In this paper, we will propose a particular change of measure and prove that it results in an asymptotically efficient estimator for p N ; see Definition 3. In order to define the change of measure, we first introduce some notation for the probability measure. With some abuse of notation, we let dF V X (x) (v) be the probability measure for some random vector V X (x). Typically, the vector V X (x) contains one random variable, see (2), but it may result in zero or two random variables as well, for example, when a customer moves from queue j to j + 1 and queue j + 1 is empty, there are two random variables in V X (x). For example, we can interpret dF V X (x) (v) as dF A (a) if X (x) =x 0 and x 1 > 0.

Remark 4
The probability measure of V X (x) can be written in a more formal way, which we illustrate by considering the arrival transition in a single queue. Thus, consider a state x = (x 1 ,x 0 ,x 1 ) for which x 1 > 0 and X (x) =x 0 . Then, we have and thus the density We remark that two of the three components of the transition V X (x) are deterministic, while the remaining component consists of a random sample from the inter-arrival time distribution F A , added to the deterministic value −x 0 N . Also, in larger models, most components of the transition are deterministic (cf. (2)); therefore, we prefer the shorter but less precise notation in which only the random component(s) are written down.
Let W (x) be a classical subsolution; see Sect. 2.2.1. Then, we define the change of measure as whereF V X (x) (v|x) is the distribution function under the new measure and H(x, DW(x)) is defined in (6). Note that the latter quantity can be interpreted as a normalization constant. In fact, (7) means that we apply an exponential tilt to a random variable (that depends on the random vector V X (x)), for example, when X (x) =x 0 and x 1 > 0 we are tilting the inter-arrival times exponentially with some parameter. While performing a simulation run and changing the underlying probability distributions at every step, we keep track of the likelihood ratio. The likelihood ratio for a successful path P = (X i , i = 0, . . . , τ N ) is where the second equality follows by using (7). We now define the estimator of p N to be L(P)I (P), where I (P) indicates whether we have reached level N or not, i.e., I (P) = 1{τ N < ∞}. This estimator is unbiased under the new measure, denoted by Q, since We use the following definition for asymptotic efficiency.

Definition 3 The estimator for p N is asymptotically efficient if and only if lim inf
Using the decay rate of p N , see (4), we find that Definition 3 is equivalent to the following definition.

Asymptotically efficient change of measure
In this section, we present our main result. For readability, we present it for three cases separately. In Sect. 3.1, we start with d = 1, i.e., the single queue, where the subsolution approach results in a state-independent change of measure. Although this change of measure has been proven to be asymptotically efficient in [12], we present an alternative proof using the method that will be extended to a state-dependent change of measure for the d-node tandem queue. Secondly, we consider d = 2 in Sect. 3.2 in detail, as in this case the state vector consists of five dimensions only. Lastly, in Sect. 3.3, we present the result for the d-node tandem queue, but we omit the proofs since they are very similar to the two-node case.
The approach of the subsolution method, as developed in [7], is the same in all cases defined above and is as follows: 1. For all possible x, we find solutions α to H(x, α) ≥ 0. 2. We construct a function W (x) that both is continuously differentiable and satisfies properties 3 and 4 in and below Definition 2, as N → ∞. The function W (x) will be as indicated in Remark 3; more precisely, for each x we will have DW(x) ≈ α, where α is a solution corresponding to x as found in Step 1. 3. We then use the function W (x) to define a change of measure as in (7).
After we have proposed the change of measure, we will prove asymptotic efficiency of this change of measure.

The single GI|GI|1 queue
For the single queue, the model, as presented in Sect. 2.1, simplifies significantly. We will highlight the most important simplifications. To start with, the state description reduces to x = (x 1 ,x 0 ,x 1 ). Furthermore, queue 1 is never empty during a busy cycle of the system and so X (x) = min{x 0 ,x 1 }. Due to the same reason, we always have 1{x 1 = 0} = 0 and, when there is a departure from the system, 1{x 1 > 1 N } = 1 (otherwise, x ∈ δ 0 , i.e., the system will become empty after the transition). In view of Remark 6, we do not yet substitute 1{x 1 = 0} = 0 in the remainder. Thus, (2) in the case d = 1 becomes In the next sections, we will follow the subsolution approach step by step and conclude with a proof of asymptotic efficiency of the change of measure based on the developed subsolution.

Solution to H(x,˛) ≥ 0 for all x
We have, for all possible x = ( 1 N , 0, 0), from (6) and the above, that Then, using that 1{x 1 = 0} = 0, a solution to H(x, α) ≥ 0 for all x during a busy cycle of the system is where θ * equals θ 1 in (5).

Remark 5
Another solution to H(x, α) ≥ 0 is α = (0, 0, 0). As this is equivalent to no change of measure, we will not use this solution.

Construction of W(x)
To use the subsolution approach, we want to choose the function W (x) such that W (x) ≤ 0 for x ∈ δ e and W (( 1 N , 0, 0)) = γ ; see properties 3 and 4 in and below Definition 2. For the single queue, we have δ e = {x : satisfying all the requirements when N → ∞: indeed, N for all x ∈ δ e , and W (( 1 N , 0, 0)) = γ 1 − 1 N , and so when N → ∞ the boundary conditions for W (x) are satisfied. In Fig. 2, the function W (x) is displayed as a function of x 1 andx 0 −x 1 .
N , we note that this is equivalent to starting with 1 customer in the system, and hence this is a natural choice.

The change of measure
We now find that, using (7), the change of measure is, for and

Proposition 1
The change of measure provided in (9) and (10) is the same change of measure as the state-independent change of measure from [12].
Proof The proof is straightforward by noting that the inter-arrival times are exponentially tilted with parameter −θ * and the service times are exponentially tilted with parameter θ * .

Asymptotic efficiency
In [12], it has been shown that the change of measure as mentioned above is asymptotically efficient under the condition that P(B This restriction to bounded service times is said to be a technicality, but the paper is not clear about how to remove it. Instead, we will give a different proof without the need for this condition. Theorem 1 Under Assumption 1, the state-independent change of measure based on DW(x) = (−γ, θ * , −θ * ), see (9) and (10), is asymptotically efficient for the single G I |G I |1 queue.
Proof We start with the log likelihood, log L(P), of any path P = ( where we note that where the inequality follows from (11) and by the definition of p N , and the second equality follows by using (4).

Remark 8
In the proof of Theorem 1, we see why we defined δ e as in (3), and not simply δ e = {x : x 1 = 1}; this allows for the second (strict) inequality to hold.

The two-node GI|GI|1 tandem queue
Although the subsolution approach for the single queue results in a state-independent and asymptotically efficient change of measure, it is known from both [2] (general distributions) and [3,10] (Markovian distribution) that a state-independent change of measure for the G I |G I |1 tandem queue cannot be asymptotically efficient for all input parameters. We will see that the subsolution method, as it did for the Markovian case in [6,8], results in a state-dependent change of measure.
In this section, we will use the same approach as in Sect. 3.1 and we conclude by proving asymptotic efficiency of the constructed change of measure.

Solutions to H H H(x,˛) ≥ 0 for all possible x
We start the method by finding a solution to H(x, α) ≥ 0 for all possible x. As before, we let DW(x) = (α 1 , α 2 ,ᾱ 0 ,ᾱ 1 ,ᾱ 2 ) and recall that X (x) = min k∈{0}∪{ j:x j >0} {x k }. Then, we have from (6), for all possible and . The solutions to H(x, α) ≥ 0 that we describe depend on the number of customers in queue 1 and the number of customers in queue 2. Note that there may be more solutions possible, but in order to find a state-dependent change of measure the current solutions turn out to be sufficient. Clearly, α = 0 is always a solution to H(x, α) ≥ 0 and is equivalent to no change of measure. However, using DW(x) = α = 0 does not give a subsolution, since in that case it is impossible to satisfy properties 3 and 4 in Definition 2.
In Table 1,   (2) , −θ * ] (2) , −θ * ] (1) and θ (2) when queue 1 is the bottleneck queue and θ (2) . Note that for all solutions proposed in Table 1 we have Remark 9 All solutions presented in Table 1 satisfy subsolution property 2 along the most likely path (as defined below Definition 1) with equality. Recall that, along the most likely path, x j > 0 when queue j is the bottleneck queue. Note that the only solutions that do not necessarily satisfy H(x, α) = 0 are those along the boundary x j = 0 when queue j is the bottleneck queue (except for α = 0, which is always possible).

Remark 10
We see from Table 1 that in some cases there is a possibility to chooseᾱ j > 0. It is not obvious to do so, because then the service times of queue j are tilted in the 'wrong' way, since it would imply that on average the service times of queue j are even shorter than under the original measure. Nevertheless, it is a solution that satisfies H(x, α) ≥ 0.

Construction of W(x)
Since there is no non-trivial solution for H(x, α) ≥ 0 that holds for all x, that is, there is not a single affine function W (x) such that for its derivative α this condition holds for all x, it turns out that a different change of measure needs to be applied in different 'regions' of the state space that are specified precisely later. From Table 1, we can extract two non-trivial solutions that are dependent on the bottleneck queue: one which we can use for x 1 > 0 (and any x 2 ) and one which we can use for x 2 > 0 (and any x 1 ). These solutions, along with the trivial solution, can be found in Table 2 and are sufficient to obtain an asymptotically efficient estimator (as we will prove in Sect. 3.2.4; see Theorem 2). In the construction that is explained below, we will also see that cases in which both x j > 0 can be handled using these subsolutions; see Eq. (14). As it turns out, when both x j > 0, either the solution for x 1 > 0 or the solution for x 2 > 0 will be used.

Remark 11
The other solutions that are described in Table 1 can also be used to find a state-dependent change of measure that is proven to be asymptotically efficient using the same method that is described below; see Theorem 3.
Remember that γ = −Λ A (−θ * ), and so we define which are used to determine the different changes of measure in each of the 'regions' of the state space. Thus, 'region' 1-corresponding to α 1 -has to cover x 1 = 0, 'region' 2-corresponding to α 2 -has to cover x 2 = 0 and, finally, 'region' 3corresponding to α 3 -has to cover x 1 = x 2 = 0. Note that α k is the notation for a solution to H(x, α) ≥ 0, whereas α k is the notation for a component of some vector α. From these solutions α k , we define three functions for some δ > 0, and their minimum The idea of this construction, as in [6][7][8], and in particular subtracting kδ, is that the minimum function W δ (x) has gradient α 1 near x 1 = 0, gradient α 2 near x 2 = 0 and gradient α 3 near x 1 = x 2 = 0, as desired. In Figs. 5 and 6, we give a rough illustration of the behaviour of the function W δ (x), considering the dependence on the scaled queue lengths x 1 and x 2 while neglecting the dependence on the scaled residualsx 0 ,x 1 andx 2 . For large N , these can indeed be neglected due to scaling, in particular in the case of bounded supports of the inter-arrival time and service times. However, for unbounded supports there can be exceptions to this idea due to a very large residual inter-arrival time or a very large residual service time. For example, at is the minimum function even when x 2 is not near 0. Indeed, in this case it is very likely that the system empties before the next arriving customer, and hence it makes sense that W δ (x) = W δ 3 (x) (and no change of measure will be applied).
Note that W δ (x) satisfies properties 2-4 in and below Definition 2 by construction. As we have different functions for different regions, W δ (x) is not a continuously differentiable function, which is the first property of a classical subsolution. Therefore, we apply a similar mollification procedure as in previous work on Markovian systems;  Fig. 6 Illustration of the dependence of W δ (x) on (x 1 , x 2 ), neglecting the scaled residuals as in Fig. 5. The figure here shows how for x in region k we have W δ (x) = W δ k (x), and hence DW δ (x) = α k see [6,8]. This mollification ensures that a derivative exists throughout the whole parameter space. We let When ε → 0, W ε,δ (x) converges to W δ (x). Another result of this choice of W ε,δ (x) is that with Throughout this paper, we make the following assumptions on ε and δ, as in [6]. We remark that we let ε and δ depend on N , though for brevity we do not explicitly write this dependence.

Assumption 3
We choose ε and δ dependent on N , such that As a result, W ε,δ (x) satisfies properties 3 and 4 in and below Definition 2 as N → ∞, which follows immediately from the following lemma.
Lemma 1 Under Assumption 1, we have, for the two-node G I |G I |1 tandem queue, Proof We have, for x ∈ δ e , by (15), where the final inequality follows as

Remark 12
The proof of Lemma 1 again explains our choice of δ e : it is chosen such that W ε,δ (x) can be upper bounded for x ∈ δ e .
Note that, by construction, we expect that also the second property of a classical subsolution is satisfied for W ε,δ (x), i.e., H(x, DW ε,δ (x)) ≥ 0 when N → ∞, which we will prove in Lemma 2.

The change of measure
In this section, we will give examples of the change of measure based on W δ k (x), see (13), for some parts of the state description. This gives some insight into the change of measure that is applied in the mollified function W ε,δ (x) and how the change of measure that we use in the current paper relates to previous work. When choosing DW δ k (x) in (7) as a constant vector α = (α 1 , α 2 ,ᾱ 0 ,ᾱ 1 ,ᾱ 2 ), the change of measure can be written in terms ofᾱ 0 ,ᾱ 1 andᾱ 2 as for x = X 0 and In each of the examples below, we consider the cases where our change of measure based on W ε,δ (x) gets close, as N → ∞, to an 'affine' change of measure as above, that is, based on some W δ k (x) with constant gradient α. It is two of these latter 'limiting' change of measures that we consider in the following.

Examples of the change of measure
When queue 1 is the bottleneck queue, this corresponds to the state-independent change of measure that has been studied in [2,9,11]. On the other hand, when queue 2 is the bottleneck queue, the inter-arrival times are still exponentially tilted with parameter −θ * = −θ 2 and so it also corresponds to the state-independent change of measure studied in those papers. However, it is not the service times of queue 2 that are exponentially tilted, but the service times of queue 1 that are exponentially tilted with parameter θ * = θ 2 .

Remark 13
We note that the change of measure used here along the horizontal boundary is different from the one in [6] for a two-node Markovian tandem queue. Let λ, μ 1 and μ 2 be the exponential rates for the inter-arrival times, and service times of queue 1 and queue 2 under the original measure. If we would consider DW δ 2 (x) ≡ α 2 = (−γ, 0, θ * , − θ (1) , 0), which is a solution to H(x, α) ≥ 0 according to Table 1, then we find that under the change of measure the service times of queue 1 have an exponential distribution with rate μ 1 λ μ 2 . This does correspond with the results from [6], although they consider the embedded discrete-time Markov chain, while we consider the continuous-time Markov chain.
When queue 1 is the bottleneck queue, this change of measure corresponds to the stateindependent change of measure studied in [2,9,11] in the sense that the inter-arrival times are exponentially tilted with parameter −θ * = −θ 1 . In contrast to an exponential tilt for the service times of queue 1, which happens for the state-independent change of measure, here we exponentially tilt the service times of queue 2 with parameter θ * = θ 1 . When queue 2 is the bottleneck queue, the change of measure described above corresponds to the state-independent change of measure that has been studied in [2,9,11].

Asymptotic efficiency
Using (16), we can find a lower bound on H(x, DW ε,δ (x)) which goes to 0 when N → ∞.

Remark 14
Since we assume bounded supports of the various service time distributions, we note that equality is achieved in (5) for all queues j. That is, we have Λ A (−θ ( j) ) + Λ B ( j) (θ ( j) ) = 0 for all queues j. In particular, equality holds for the bottleneck queue. Similarly, equality in the definition of θ (1) and θ (2) also holds.

Proof
We substitute (16) in (12) to find H(x, DW ε,δ (x)) for x = X 0 equal to where the inequality follows from convexity of the log-moment generating functions, ρ k (x) ∈ [0, 1] and by the definition of γ = −Λ A (−θ * ) ≥ 0. By using 1{x k > 1 N } ≤ 1 for k = 1, 2, Λ A (−θ * ) + Λ B ( j) (θ * ) ≤ 0 for all queues j by the definition of θ * , γ and the bounded supports (and hence bounded X (x)), we find that (20) is greater than or equal to − max{Q (0) , Q (1) , For any x with x 2 =x 2 = 0, we have, from (17), and, for any x with x 1 =x 1 = 0, Substituting (22) and (23) in (21), we find, for x = X 0 , that where the second and the third inequalities follow as at most one of the indicators equals 1 at any time during a busy cycle of the system and the final equality follows by the definition of θ * . To conclude the proof, we write for the initial state that Next we show that and provide an upper bound for the error term, similar to Lemma 2 in [4].

Lemma 3
Consider a two-node G I |G I |1 tandem queue satisfying Assumptions 1 and 2. Then, for any successful path X i , i = 0, . . . , τ N , it holds that , and R k (x + ηy) = R k (x)e − α k ,y η/ε , which implies that DW ε,δ (x + ηy) = where the inequality follows by the definition of DW ε,δ (x), see (16), ρ(x) and R j (x + ηy), and by bounding α k , y by the minimum or maximum over k (whichever is appropriate). We use the crude bounds (1) , where the second inequality follows by considering all possible transitions. For example, in the case of an arrival the number of customers in the system changes by 1, the residual inter-arrival time and the residual service time at queue 2 change by at most Q (0) and the residual service time at queue 1 changes by at most max{Q (0) , Q (1) }.
, we find | α k , y | ≤ C 1 for all k = 1, 2, 3, and so we can upper bound (24) by where the inequality holds for sufficiently large εN , and hence we have found

Remark 15
The bound in Lemma 3 is very crude and better bounds can be obtained by considering all possible transitions separately. However, in order to show asymptotic efficiency the current bound is sufficient. Table 1 use different values rather than θ * , for example, θ (1) . It is easy to verify that for these different values, similar bounds as in Lemmas 2 and 3 can be obtained by using their definition. The only additional requirement is that the value replacing θ * is finite. Note that θ * is finite by Assumption 1.

Remark 16 Some solutions to H(x, α) ≥ 0 in
In order to show asymptotic efficiency, we need an asymptotic result involving τ N , the total number of steps to reach level N (given that level N is reached before the system is empty). This is the subject of the following conjecture.

Conjecture 1 If σ N is a sequence of real numbers such that
Intuition We note that we can upper bound τ N by 3K N , where K N is the index of the first customer who reaches the overflow level N . Along the major part of the most likely path to reach the overflow level, the change of measure is very close to the state-independent change of measure as discussed in [2] (see also Examples 1 and 2). In Lemmas 3.2 and 4.1 of that same paper, it is shown that under this state-independent change of measure the system is unstable and K N N → C < ∞ with probability 1. This means that the left-hand side of (25) would be upper bounded by lim sup N →∞ This means that the random variable τ N , and also the random variable K N , would grow much faster than N as N → ∞, which does not seem plausible.
For a mathematical proof of the conjecture, two problems remain. The first one is to show that the difference between the actual change of measure and the stateindependent change of measure (which is small along the most likely path) does not influence the validity of (25). The second problem is to show that for the stateindependent change of measure we indeed have lim sup N →∞ Even if the change of measure were equal to the state-independent change of measure along the whole state space, we could not show this relation. Of course, there may be other possibilities to show that (25) holds.
We can now prove the main theorem of this section. Proof For a successful path P, we have where the second step follows from Lemmas 2 and 3. Define σ N = Since X τ N ∈ δ e for the successful path P, we find, by using Lemma 1, (3), and so we have where the equality follows by Assumption 3 and the last step follows by noting that P Q (I (P) = 1) ≤ 1. Using Conjecture 1 concludes the proof.

Remark 17
In the proof of Theorem 2, we bounded N τ N −1 i=0 DW ε,δ (X i ), X i+1 −X i and τ N −1 i=0 H(X i , DW ε,δ (X i )) separately, resulting in bounds involving the maximum support of the various distributions. Looking at Eqs. (18) and (19), we see, for example, that the new probability density functions themselves do not depend on X (X i )N , and thus the likelihood ratio in (8) does not depend on X (X i )N , even though X (X i ) is needed in order to determine which probability density function is used. However, to use the mean value theorem in Lemma 3 we do need to keep this quantity throughout the proofs.
We can extend Theorem 2 to the following set of changes of measure; see also Remarks 11 and 16. We restrict ourselves to three regions only, since this is the easiest from an implementation perspective. Let (2) , −θ * ], Note that when queue j is the bottleneck queue, θ ( j) = θ * and hence along the most likely path still the state-independent change of measure studied in [2,9,11] is used.
We leave the proof of the following theorem to the reader, as this only requires some small modifications in the proofs of Lemmas 2, 3 and Theorem 2.

The d-node GI|GI|1 tandem queue
In this section, we present the steps to follow in order to find a state-dependent change of measure for the d-node G I |G I |1 tandem queue. As we needed a conjecture for the case d = 2, we also need a conjecture for the more general case. We will formulate similar lemmas as in Sect. 3.2 but omit the proofs, as these are just extensions of the proofs for the two-node tandem queue and therefore do not require any additional techniques.

Solutions to H H H(x,˛) ≥ 0 for all possible x
For the d-node tandem queue we have, for x = X 0 , Some possible solutions to H(x, α) ≥ 0, similar to the solutions presented in Table 2, can be found in Table 3 (including the trivial solution α = 0). Similarly to Table 2, the solutions presented consider cases where some x j > 0. Due to the construction that Table 3 Some solutions to H(x, α) ≥ 0 for all x

Region of validity
Nonzero α i 's Nonzeroᾱ i 's . .
All α i 's andᾱ i 's that are not explicitly mentioned in this table are equal to 0 in that particular case is used, using the minimum of all subsolutions, these solutions also cover the interior of the state space. Clearly, solutions similar to the other solutions presented in Table 1 also exist. For example, define θ (1)

Construction of W(x)
As for the two-node tandem queue, there is no solution for H(x, α) ≥ 0 that holds for all x. Therefore, a different change of measure needs to be applied in different 'regions'. In this case, the following vectors are used to specify the different changes of measure in each of the 'regions': . . .
From this, we find d + 1 functions W δ k (x) = α k , x + γ − kδ, for k = 1, . . . , d + 1, for δ > 0, and their minimum which again is a piecewise affine function that equals W δ k (x) in 'region' k. The constant γ in W δ k (x) is included to satisfy the properties of the subsolution, and the subtraction of kδ is needed so that the minimum function W δ (x) is uniquely attained at each of the boundaries. To make the minimum function a continuously differentiable function, we apply a similar mollification procedure as for the two-node G I |G I |1 tandem queue. This mollification 'removes' the regions and ensures that a derivative exists throughout the whole parameter space. We let When The assumptions on ε and δ can be found in Assumption 3. Similarly to Lemma 1, we see that W ε,δ (x) satisfies properties 3 and 4 in and below Definition 2.

Lemma 4 Under Assumption 1, we have, for the d-node G I |G
The change of measure will be based on the gradient of W ε,δ (x). From (28), it follows that where ρ k (x) is defined similarly to in (17).

The change of measure
As for the 2-node tandem queue, we show that the change of measure based on W δ k (x) results in the state-independent change of measure studied in [2,9,11] in some particular cases. This gives some insight into the change of measure that is applied in the mollified function W ε,δ (x).
Choosing DW(x) in (7) as a constant vector α = (α 1 , . . . , α d ,ᾱ 0 , . . . ,ᾱ d ), the change of measure can be written in terms ofᾱ 0 , . . . ,ᾱ d as for x = X 0 and We now consider the cases where our change of measure based on W ε,δ (x) gets close, as N → ∞, to an 'affine' change of measure as above, that is, based on some W δ (27), it follows that indeed the resulting change of measure equals the state-independent change of measure studied in [2,9,11] if queue j is the bottleneck queue.

Asymptotic efficiency
To show that asymptotic efficiency also holds for the d-node G I |G I |1 tandem queue, assuming that Conjecture 1 is correct, we need the following two lemmas.

Lemma 6
Suppose we have a d-node G I |G I |1 tandem queue satisfying Assumptions 1 and 2. Then, for any successful path X i , i = 0, . . . , τ N , it holds for all X τ N that Using Lemmas 4-6, we can prove the following result.

Remark 18
We could also extend Theorem 4, as we did for the two-node tandem queue in Theorem 3, and we claim that asymptotic efficiency also holds in that case under the same conditions as in Theorem 4.
for the change of measure. We recall that the estimator for p N is L(P)I (P) and hence we will define the following estimator, which is obtained via simulation, where S is the total number of simulation runs, L(i) is the likelihood ratio in simulation i and I (i) indicates whether level N has been reached in simulation i or not. In all tables given below, RE is the relative error, i.e., the standard deviation of the estimator divided by its mean, and AE denotes Thus, if the change of measure is asymptotically efficient, AE should converge to 2 when N tends to infinity; see Definition 3. Furthermore, in our tables we will include the number of times the overflow level has been reached (out of a total of S = 10 6 simulation runs) and the (rounded) simulation time in seconds. Note that the latter quantity is only there for reference and does not indicate if the estimator is asymptotically efficient or not.
To use the change of measure based on (16), we need to choose ε and δ so that they satisfy Assumption 3. It is not trivial to find suitable ε and δ that satisfy all requirements; we only explored several possibilities, while many more exist. In all the examples below, we set ε proportional to 1 √ N and δ = −ε log ε, unless the condition in Remark 19 is not satisfied.

Remark 19
For the choice δ = −ε log ε, it may happen for small values of N that W δ (x) does not attain W δ k (x) for some k in the 'region' where it is supposed to be attained. For example, consider W δ 2 (x) < W δ 1 (x), which is equivalent to Taking into account that W δ 2 (x) is designed for the case x 2 = 0, the right-hand side of this equation should be positive. Thus, we need δ > Q (2) N θ * (recall that Q (2) is an upper bound on the support of the service times at queue 2). When we also consider W δ 1 (x) < W δ 3 (x) and W δ 2 (x) < W δ 3 (x), it turns out that we actually need δ > max{Q (1) , Q (2)  Therefore, when δ = −ε log ε does not satisfy (32), in our numerical experiments, we actually set δ = max{Q (1) ,Q (2) }θ * N 0.95 . We choose the denominator to be N 0.95 , but of course other choices are possible to obtain the strict inequality in (32). Unless mentioned otherwise, only for N = 20 this different value for δ is used. (Note that for a d-node tandem queue, a similar condition exists; δ > max{Q (1) ,...,Q (d) }θ * N .) We consider two types of tandem queues, a D|U |1 − ·|U |1 tandem queue and a M|U |1−·|U |1 tandem queue, and in both cases we vary the bottleneck queue. Starting with a D|U |1 − ·|U |1 tandem queue, we first consider the case where the θ -bottleneck queue is not unique, i.e., the service times at both queues have the same distribution with the same parameters. In our case, both service times have a uniform distribution on the interval [0, 2]. Similar cases are known to fail when using a state-independent change of measure, see [2,3], and thus this is an interesting case to consider. The results can be found in Table 4. In this table, we see results that clearly support the theoretical results of the estimator being asymptotically efficient; at first the relative error is decreasing, after which it slowly increases. We also see that the number of times the overflow level N is reached is decreasing with N . This behaviour may seem strange, but it can be seen in all of the results in this section. Given our assumptions on ε and δ, we note that δ N → ∞ and hence 'region' 3-where no change of measure is applied-is increasing in size. Therefore, we can expect that as N increases it becomes increasingly more difficult to reach the overflow level, even though we have asymptotic efficiency. In particular, we observed that when a system has small server utilizations, it may be hard to escape the 'region' of the state space where (almost) no change of measure is applied.
Next, we let one of the queues be the bottleneck queue. In Tables 5 and 6, queue 1 and queue 2 are the bottleneck queue, respectively. Here, we changed one of the service  time distributions of the previous example from U [0, 2] to U [0.5, 1.5] so that there is a unique θ -bottleneck queue, but the server utilizations of both queues remain the same. When queue 1 is the bottleneck queue, we see that the relative error is still very large for N = 20, but for larger values of N the theoretical results are supported. Note that in this case δ = −ε log ε for all values of N . We also see from this table that the simulation time is increasing, even though the number of times the overflow level is reached is decreasing. This is due to the fact that we have to reach a higher overflow level N . When queue 2 is the bottleneck queue, we have considered a much smaller value for ε, and thus δ differs from being −ε log ε for N = 20, . . . , 260. This can also be seen from the table, by noting that both the number of times the overflow is reached and the simulation time increase at N = 300. Besides this difference, the results in the table clearly support the theoretical results. Lastly, we consider a M|U |1 − ·|U |1 tandem queue in Tables 7, 8 and 9. We emphasize that the inter-arrival times are exponentially distributed, which means an unbounded support, and, hence, this is not covered by our theoretical results. Again, we start with the case where there is no unique bottleneck queue in Table 7. We see from this table that the relative error remains roughly constant, suggesting asymptotic efficiency. In Table 8, when queue 1 is the bottleneck queue, and in Table 9, when queue 2 is the bottleneck queue, we see that the relative error is slightly increasing, but still suggesting asymptotic efficiency. We remark that in Table 9 δ does not equal −ε log ε for N = 20, . . . , 100, which can be seen by an increasing number of times level N is reached for N = 140.
Even though we do not have a proof of asymptotic efficiency for unbounded supports, which is the case for the inter-arrival times in Tables 7, 8 and 9, all these tables show good results that suggest asymptotic efficiency of the estimator. We also did some experiments with exponentially distributed service times, where, regardless of (32) in Remark 19, we set δ = −ε log ε, but we did not obtain good results there. This is most likely due to the fact that for Markovian service times (and unbounded service times in general), the condition on δ, see (32), cannot hold. As a result, it could happen that, for example, W δ 2 (x) > W δ 1 (x) when x 2 = 0, see also Remark 19, leading for x 2 = 0   to H(x, DW ε,δ (x)) ≈ H(x, α 1 ), which is negative for x 2 = 0 (and hence violates property 2 in Definition 2).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.