Queues on a Dynamically Evolving Graph

This paper considers a population process on a dynamically evolving graph, which can be alternatively interpreted as a queueing network. The queues are of infinite-server type, entailing that at each node all customers present are served in parallel. The links that connect the queues have the special feature that they are unreliable, in the sense that their status alternates between ‘up’ and ‘down’. If a link between two nodes is down, with a fixed probability each of the clients attempting to use that link is lost; otherwise the client remains at the origin node and reattempts using the link (and jumps to the destination node when it finds the link restored). For these networks we present the following results: (a) a system of coupled partial differential equations that describes the joint probability generating function corresponding to the queues’ time-dependent behavior (and a system of ordinary differential equations for its stationary counterpart), (b) an algorithm to evaluate the (time-dependent and stationary) moments, and procedures to compute user-perceived performance measures which facilitate the quantification of the impact of the links’ outages, (c) a diffusion limit for the joint queue length process. We include explicit results for a series relevant special cases, such as tandem networks and symmetric fully connected networks.

1 Introduction assumed to be constant over time. In many real-life situations, however, links may be temporarily inactive, entailing that the underlying structure should instead be considered as dynamic. At a conceptual level, such a system can be seen as a network of queues, where the links' availability fluctuates in time. The main objective of this paper is to study the performance of such a queueing network, with links alternating between being 'up' and 'down '. Leading examples in which our model can be used include communication networks, road traffic networks, various physics-motivated networks, and chemical reaction networks. At a somewhat more detailed level, the network can be described as follows. The network is a graph with nodes and links, along which clients travel. At any node, external arrivals occur according to a Poisson process with a node-specific rate. Service times at the nodes are exponentially distributed (with a node-specific parameter); when a customer has been served at a node, he selects a next node through some routing mechanism (where it is also an option to leave the network). Suppose the client resides at node i and he wants to be routed to node j; assume that each link's up-and down-times are exponentially distributed. Then, depending on the situation at hand, the following two options arise. If the link from i to j is up, then he jumps from node i to node j. If, on the contrary, the link from i to j is down, then either the client is lost (which happens with a node-specific probability), or he waits an exponentially distributed amount of time at node i and tries again.
The queueing mechanism studied in this paper is infinite-server, making our analysis particularly useful for situations in which there is no (or hardly any) interference between the clients at each individual queue, in the sense that they can be served essentially in parallel. It is noted that in this paper we use queueing-theoretic terminology, but infinite-server queues are frequently used in other domains as well. As a model in which particles move on a dynamically evolving graph, it can be seen as an object relevant to statistical physics (cf., for instance, the model considered in [11]), but there are applications in chemical reaction networks [4], (cell) biology [27], and population dynamics [22] as well. In operations research, the infinite-server model we have defined can be used to study e.g. the numbers of clients simultaneously using (somewhat larger) segments in a road traffic network, or numbers of clients simultaneously visiting connected websites.
For this class of model, we are interested in various performance measures. The most important one is the joint distribution of the (time-dependent and stationary) queue lengths at all nodes, together with the number of lost clients (i.e., clients who leave the network because of link failures).
• In the first part of the paper we derive a series of exact results. (i) Our first class of results is in terms of a system of coupled partial differential equations for the probability generating function pertaining to the joint queue length distribution. (ii) In the second place, this system of differential equations can be used to recursively determine all (timedependent and stationary) moments. (iii) Thirdly, we assess the impact of the network's down-times on the service quality that is perceived by its users. • Then we consider scaling limits: by scaling the external arrival rates and the up-and downtimes, we present a diffusion limit. This result entails that the joint queue-length process weakly converges to a mean-reverting Gaussian process (viz. a multivariate Ornstein-Uhlenbeck process). An important feature of the scaling chosen is that the speed at which the external arrival rates are scaled may differ from the speed at which the upand down-times are scaled. This creates the flexibility to cover networks in which the alternation between up-and down times is relatively slow (think of road networks) or relatively fast (think of the channel conditions in a wireless network); also time-scale separation ideas (as often relied on in chemical reaction networks) can thus be modeled.
The model in this paper can be seen as an instance of a stochastic process (viz. a queueing process) on a dynamically evolving graph. The literature on such models is still at its infancy. Where static random graphs form a classical topic in probability theory, dating back to the pioneering work of Erdős and Rényi [12] and Gilbert [15], only recently the behavior of randomly evolving graphs has received substantial attention; see e.g. [16,17,23,30] for a few examples. Examples of papers on random processes on (dynamic) random graphs are [3,6,7]. The systematic study of queueing processes on such a randomly evolving graph has hardly been looked at, a notable exception being the recent study [13]. The model considered in [13] complements the one studied in the present work. Most notably, the framework of [13] in particular facilitates modelling the effect of nodes going down every now and then, where the present paper has a focus on links going down. The immediate consequence of this difference in modelling, is that in the framework of [13] diffusion limits do not apply, due to the instantaneous downward jumps of the network population vector at epochs that a node fails.
There is also a relation with the classical work [25], where the Poisson-arrival-location model (PALM) is introduced. In this model customers arrive according to an inhomogeneous Poisson process and move independently through the network according to some random location process (with a fixed routing matrix). A consequence of the way the model is constructed is that, for instance, the number of customers at each node follows a Poisson distribution. The major difference with our model, is that in our setup the topology of the network is determined by a modulating process (meaning that the routing matrix is random); consequently the positions of different clients (during their path through the network) are in our model no longer independent, thereby also destroying the 'Poisson properties'. Observe that it is this dependence structure that considerably complicates the analysis. It also explains why we pursue scaling limits for obtaining insight in the network population distribution (which is obviously not needed in the setup of [25], as there closed-form expressions are available).
Our analysis will be based on casting our model as a network of infinite-server queues under Markov modulated arrival and service rates. Explicit results on (single-node) Markov modulated infinite-server queues (primarily in terms of differential equations for the probability generating function, and the corresponding moments) can be found in e.g. [5,10,14,19,26]. Diffusion limits for such single-node systems have been derived in e.g. [2,9]; we also refer to [18] for a recent contribution with such diffusion results for a broad class of networks of Markov modulated infinite-server queues. For general background on queueing networks, we refer to [20,21,28]. This paper is organized as follows. In Sect. 2 we describe our model. Section 3 presents our analysis, in terms of results exact results for the probability generating function and moments; we restrict ourselves to the case that clients who wish to jump but the corresponding link is down, are lost with probability 1. Section 4 concerns the weak convergence to a Gaussian process, for the same model. In Sect. 5 we consider a number of extensions, including the one in which blocked customers are not necessarily lost but retry. In Sect. 6 we discuss a number of special cases for which the calculations can be done explicitly. Concluding remarks are found in Sect. 7.
The network that we consider consists of n nodes that are connected throughn := n 2 links. Let λ i be the rate of the Poissonian arrival process at node i. The time spent at node i is exponentially distributed with parameter μ i (where we discuss in Sect. 5 how our setup extends to the case of phase-type service times). After having been served at node i, the probability that the served customer wishes to jump to node j (where j = i) is p i j , where p i0 is the probability of leaving the network. We obviously assume that j =i p i j = 1, and we write μ i j = μ i p i j . It is noted that this setup does not necessarily mean that we assume that the network be a complete graph; if a node pair (i, j) is not connected, we are to set the corresponding μ i j equal to 0. Observe that the dynamics as described above entail that the number of clients evolves as an infinite-server queue: the clients are served in parallel, and hence do not interact. We assume that the routing mechanism gives rise to an irreducible structure, entailing that the μ i j are such that for a client residing at a specific node with positive probability it visits any other node before leaving the network. In addition, for at least one node i it holds that μ i0 is strictly positive, thus guaranteeing that the network is stable. The arrival processes and service/routing processes are assumed independent.
We now describe how the links alternate between being 'up' and 'down'. To this end, we let the underlying graph dynamics be determined by a K -dimensional background process (X(t)) t 0 , assumed to be independent of the arrival processes and the serving/routing processes, that is defined as follows. Then links are partitioned into K mutually disjoint sets, which are denoted by A 1 , . . . , A K , which we refer to as blocks. All links that lie in a specific block, say A k , alternate between 'up' and 'down' simultaneously; X k (t) = 1 means that at time t the links in block k are 'up', and 0 otherwise. We define ; the down-time (up-time, respectively) of block k is exponentially distributed with parameter q (k) 0 (q (k) 1 ). The 'graph process' is given through which attains values in {0, 1} K . The two extreme scenarios are on one hand the case that we have just one block consisting of alln links, or on the other hand the case that we havē n independently evolving blocks that consist of one link each. The transition rate matrix of X(·) is of dimensionK ×K withK := 2 K , and given by where I n denotes the n-dimensional identity matrix and B 1 ⊗ B 2 denotes the Kronecker product of the two matrices B 1 and B 2 . We let q k be the (k, )-th entry of Q.
We now explain what happens to a client who wants to jump from i to j when the link is not present. As long as the link is down, at any attempt the client is lost with probability f i j ∈ [0, 1], and he remains at the node with probability 1 − f i j . While being at the node the mechanism that we defined above is in place: after an exponentially distributed amount of time with mean μ −1 i j (with j = 0, . . . , n) he wishes to jump to node j. To keep the notation compact, in Sects. 3 and 4 we assume that f i j = 1 for all i, j = 1, . . . , n (i.e., all clients are lost who wish to jump from i to j when the link between i and j is absent); in Sect. 5 we point out how to adapt the results to include situations with f i j ∈ [0, 1).
In this paper a key role is played by the n-dimensional queue length process where M i (t) ∈ N 0 represents the number of clients in the queue at node i at time t. Our objective is to characterize the distribution of M(t); as we will see below, this is possible, albeit in implicit terms, viz. in terms of a partial differential equation for the corresponding joint probability generating function. Observe that by itself (M(t)) t 0 is not a Markov process, but the joint process (M(t), X(t)) t 0 is.
As we want to keep track of M(t) as well as the number of lost clients, we work with the probability generating function with L(t) defined as the number of lost clients due to a link being 'down' during the interval [0, t] and k be an element in {0, 1} K .
Remark 1 In the model described all links are bidirectional: if the link between i and j is 'down', then clients can jump neither form i to j nor from j to i. The unidirectional variant of our model works in the precise same way; then there are n(n − 1) (instead of 1 2 n(n − 1)) possible links.

Prelimit Results
In this section we first set up a system of coupled partial differential equations for ϕ(w, z, t) (i.e., the 2 K -dimensional vector with elements ϕ k (w, z, t)). We then point out how these can be used to determine moments. The next subsection presents ways to quantify the effect of the graph dynamics on the performance as perceived by the network's users.

Partial Differential Equations
The main idea is to express ϕ(w, z, t + t), for t small, in terms of ϕ(w, z, t). We follow the precise same procedure as in e.g. [24]: we first set up the Kolmogorov equations for the state (L(t), M(t)) = (m 0 , . . . , m n ) at time t 0, then multiply with w m 0 z m 1 1 · · · z m n n , and sum over m 0 , . . . , m n . We recognize probability generating functions and their derivatives. More specifically, with I(i, j, k) being 1 if the link (i, j) is 'up' when X(·) is in state k and 0 otherwise, we thus obtain, The next step is to subtract ϕ k (w, z, t) from both sides, divide by t, and send t ↓ 0. In matrix-vector form, the resulting system of coupled partial differential equations reads, with IK (i, j) := diag{I(i, j, 1), . . . , I(i, j,K )} and JK (i, j) := IK − IK (i, j), as follows. The function ϕ(z) denotes the probability generating function of the stationary counterpart M of (M(t)) t 0 .

Proposition 1
The joint probability generating function ϕ(w, z, t) satisfies The probability generating function of the stationary counterpart ϕ(z) satisfies

First Moment
In this section we exploit Proposition 1 to determine the first moments; we first point out how this procedure works for the stationary queue length M, but later indicate how the corresponding transient moments can be found. We let X denote the stationary version of the background process.
Let π be the invariant probability measure of Q, i.e., theK -dimensional row-vector such that π Q = 0 and whose entries sum to 1. By differentiating the differential equation featuring in Proposition 1 with respect to z i and letting z ↑ 1, we obtain, for i = 1, . . . , n, We now explain how to set up a computational procedure with which the v i can be found. The n sets ofK -dimensional systems of linear equations can be cast into a single set of nK linear equations (in equally many unknowns). Let v ≡ (v 1 , . . . , v n ). Also, let λ the row-vector (λ 1 , . . . , λ n ), and In addition, we define the matrices We thus arrive at the linear system The transient first moment follows immediately by solving the corresponding system of linear differential equations. Let (M(0), X (0)) = (m, k 0 ), and let e k the k-th unit vector. Then, in self-evident notation, and with This approach can be extended in a straightforward fashion to also include the mean of the number of clients lost.

Remark 2
The relation (1) can be regarded as the expected-value version of the distributional identity cf. the relation used for our model's single-queue counterpart in [8]. The above distributional equality has an insightful interpretation: the first term on the right-hand side corresponds to the contribution to M(t) of clients that were already present at time 0, whereas the second term represents the contribution of arrivals in [0, t] which are then appropriately 'thinned'. As pointed out in [8], this representation in principle also provides a method to evaluate the covariance matrix of M(t), using the law of total variance; in Sect. 3.3 we will rely on an alternative approach, though.

Remark 3
In the fully symmetric situation, the formulas simplify considerably. Let λ be the arrival rate at each of the n nodes. The service rate is σ := ν + μ 0 , where the client leaves the network with probability μ 0 /σ and wants to move to another node (which is then picked uniformly at random) with probability ν/σ. Let all blocks alternate independently between being 'up' and 'down'. The up-and down rates are denoted by q 0 and q 1 , respectively (i.e., the down-time is exponentially distributed with mean q −1 0 , and the up-time exponentially distributed with mean q −1 1 ). Assume that the queues start empty at time 0, while all links are in stationary state (i.e., each of them is 'up' with probability π := q 0 /(q 0 + q 1 )). Then, for each of the nodes the mean number of clients present satisfies We thus find that

Higher Moments
We now point out how (mixed) higher moments can be evaluated. We work here, for obvious reasons, with the factorial moments, from which the regular moments can be recovered in an evident manner. We recall the standard notation (m) r := m!/(m − r )! for m, r ∈ N with m r ; this notation is the so-called Pochhammer symbol. The objective here is to compute, for r ≡ (r 1 , . . . , r n ) with r i ∈ N 0 , We will show that the ψ k (r, t) can be recursively evaluated. To this end, we first introduce the following differential operator: for f : Now the idea is to impose the operator D r [·] on both sides of the partial differential equation given in Proposition 1. In addition considering the limit w ↑ 1, z ↑ 1, we thus obtain, with e i the i-th n-dimensional unit vector, and μ + i jk : these computations rely on the evident relation Observe that in the above differential equation the indicator functions 1 {r i =0} (first term on the right-hand side) and 1 {r j =0} (second term on the right-hand side) can be left out: if the indicator function is 0, the corresponding term equals 0 anyway. Consequently, the transient mixed reduced moments can be alternatively expressed in compact notation as follows. Here it is used that To obtain the stationary mixed reduced moments, one has to set the left-hand side equal to 0. The reduced moments can be determined recursively, by solving a non-homogeneous system of linear differential equations. To verify this claim, define ξ(r) as the sum of the entries of r, i.e., r 1 + · · · + r n . Let S r be all vectors r such that ξ(r) = r. Observe that The idea now is to use (2) to evaluate (ψ 1 (r, t), . . . , ψK (r, t)) recursively: for r such that ξ(r) = r this vector is computed using the corresponding expressions for r such that ξ(r) = r − 1. In more detail, this procedure works as follows.
. . , n) by appealing to Eq. (2), using the ψ k (0, t) that we found for r = 0. This amounts to solvingK n coupled linear differential equations (where it can be checked that this system is equivalent to the one set up in Sect. 3.2). • We now consider r = 2. It is readily checked that #{S 2 } = n + 1 2 n(n − 1) = 1 2 n(n + 1), and that there are equally many equations of the type (2). As a consequence, using the ψ k (e i , t) that we found for r = 1, Eq. (2) reveals that we have to solve a non-homogeneous system ofK · 1 2 n(n + 1) linear differential equations. • One can proceed in a similar way with r 3; e.g. for ξ(r) = 3 we have that #{S 3 } = n + 2 · 1 2 n(n − 1) + 1 6 n(n − 1)(n − 2) = 1 6 n(n + 1)(n + 2). The cases with ξ(r) 3 are solved very similarly to the case ξ(r) = 2. Using (2) it can be inductively verified that in the r -th step we have a system of K r :=K · #{S r } non-homogeneous linear differential equations, where For the stationary reduced means a similar recursive procedure can be set up; in the r -th step a K r -dimensional system of linear equations needs to be solved. The above procedure can be extended in an evident way to include also the number of customers lost, focusing on the object with r ≡ (r 0 , . . . , r n ).

User-Perceived Performance
In this subsection we study the impact of the links' outages on the performance as perceived by the users. As it turns out, this can be done relying on classical arguments. We start by analyzing the fraction of clients that are lost (i.e., clients who leave the network because of a link being down, and not because of a service completion), denoted by ω. To this end, let ω ik be the probability that a client who enters the network at node i while the background process is in state k, is lost. Observe that μ + i jk /σ ik is the probability of a client jumping from node i to node j when the background process is in k; q k /σ ik and μ − i jk /σ ik can be interpreted analogously. Then, with q k := −q kk and σ ik := ν i + μ i0 + q k , by conditioning on the first jump, or, in a more convenient form, These equations constitute an nK -dimensional diagonally dominant system of linear equations (actually even strictly diagonally dominant as there is at least one i such that μ i0 > 0), which is known to yield a unique solution. Let, as before, π denote the stationary probability that the background process X(·) is in state (i.e., π solves π Q = 0). Then, with λ := n i=1 λ i , the loss probability equals As an aside we mention that the loss probability ω can alternative be evaluated, using the methodology of the previous subsections, as Along the same lines, we can determine the mean time (to be denoted by τ ) the job remains in the network, jointly with the client being eventually lost (i.e., leaving the network because of a link failure). Define τ ik to be this quantity for a client who enters the network at node i while the background process is in state k. Then, again conditioning on the first jump, Again this system of linear equations is diagonally dominant. As above,

Scaling Limit: Functional Central Limit Theorem
In this section we study the system under a particular scaling, under which there is convergence to a Gaussian process (viz. a multivariate Ornstein-Uhlenbeck process). We consider the following scaling: λ → N λ and q for some α > 0 (for k = 1, . . . , K and i = 0, 1). The model is thus parametrized by N ; to stress the dependence on N , we throughout write M (N ) (t) and X (N ) (t). We focus on a functional central limit theorem for a centered and appropriately scaled version of M (N ) (t). For the moment we concentrate on the (more involved) case α = 1; in Remark 5 we point out how this provides us with the limit result for α ∈ (0, ∞) \ {1} as well.
A full proof is (far) beyond the scope of this paper. For a rigorous derivation of the functional central limit theorem, based on martingale arguments in combination with the continuous mapping theorem, for a class of network models that is substantially broader than the one studied in this paper, we refer to [18]. Below we by and large follow the structure that we used in [9] for a single Markov modulated infinite-server queue (with a crucial step stemming from [18]), and therefore we restrict ourselves to highlighting the main steps.
• Deviation matrix. We now introduce some notions that we need across this subsection. For ease, we define them here for the unscaled system, but for the scaled system they can be adapted in a straightforward manner.
Define by p k (t) := P(X(t) = | X(0) = k) = (e Qt ) k the transition probabilities of X(·). The equilibrium probability that block m is 'up' is given by π (m) = q (m) for m = 1, . . . , K . An important role in the analysis is played by the deviation matrix D (of dimensionK ×K ), whose (k, )-th entry is given by with π, as before, the solution to π Q = 0 with entries summing to 1.
Let U k be the set of blocks that is 'up' when X(t) = k, and D k : and From the explicit expressions for the p (m) i j (t), we conclude that p where S is a non-empty set. When subtracting (4) from (3), this entails that, for non-empty sets S m (k, ) that depend on m , k, and , for coefficients α m (k, ) that are straightforward to evaluate but that do not allow an explicit expression. As a consequence, As an example, we work out the D matrix for the case of one block (i.e., K = 1, or, equivalently,K = 2). Put q := q (1) , q i := q (1) i (for i = 0, 1), and π := π 1 . Then • Time-changed Poisson process representation. In this section we repeatedly use the following representation. We throughout use the definition μ i jk := μ i j I(i, j, k) if j = 1, . . . , n and With all P i j (·) (for i, j = 0, . . . , n) independent unit rate Poisson processes, it is directly verified that with the above definition of the rates μ i jk the numbers of customers in the respective queues satisfy with Z (N ) . This type of Poisson processes with random time-change, and their applications in obtaining scaling-limits, have been described in great detail in e.g. [1].

• SDE for centered and normalized system. The idea is to first set up an SDE for M (N ) (t), which is then translated into an SDE for its centered and normalized versionM (N ) (t).
For i = 1, . . . , n, by (5), for some n-dimensional martingale κ (N ) (·). In the functional central limit theorem, fluctuations around an average are considered; this n-dimensional average vector (t) solves the following system of linear differential equations: for i = 1, . . . , n. As shown in [18,Sect. 3], this (·) can be considered as the fluid limit [29] corresponding to M (N ) (·).
Keeping in mind we aim at deriving results for the central-limit regime, we consider a centered and normalized process whose i-th component is defined bỹ The idea now is to plug in the differential equation that is obeyed by (t); the resulting stochastic differential equation resembles the one featuring in [9]: omitting a few elementary steps, using the compact notationZ • A simplification. The next step is to verify that as N → ∞,M in the previous display, but now with the Z (N ) k (t) in the first two terms in the right-hand side replaced by π k . To this end, observe that [18,Thm. 5 using the law-of-large-numbers property that M We have thus arrived at, withμ i j := K k=1 μ i jk π k , the following system of SDE's: • Functional central limit theorem. The following steps echo those in [9,Sect. 4]. They rely on the transformation with for i = j the (i, j)-th entry of the (n × n)-dimensional matrix M being given byμ ji , whereas the (i, i)-th entry is − n j=0, j =iμ i j . It thus follows that where the (i, k)-th entry of the (n ×K )-dimensional matrix M • (t) is given by In the next step we analyze the terms √ N M • (t)Z (N ) (t) and N −1/2 dκ (N ) (t) separately. As , withB(·) ā K -dimensional zero-mean Brownian motion with covariance matrix .

Now recall the relation between Y (N ) (t) andM (N ) (t), and the fact thatM (N ) (·)−M (N ) (·)
converges to the zero process as N → ∞. Based on the weak convergence results established above, we thus obtain the following functional central limit theorem. It states that the process under study converges to a (multivariate) process of Ornstein-Uhlenbeck type.

Remark 4
The distribution ofM(t) (for a given value of t 0, that is) can be explicitly found from known results for multivariate Ornstein-Uhlenbeck processes. IfM(0) is constant, then it is an n-dimensional Normal distribution with mean 0 and covariance matrix Observe that, as¯ ii (s) = i (s) by definition, t 0¯ ii (s)ds = i (t). With standard theory on multivariate Ornstein-Uhlenbeck processes also the Cov(M(t),M(t+u)), i.e., the covariance matrix pertaining to the system's time-dependent behavior, can be determined.

Remark 5
As we mentioned, the above result corresponds to the case α = 1. Precisely following the line of reasoning of [9,18], for arbitrary α > 0, we have to defineM Then the recipe is to go through precisely the same steps as in the proof for α = 1, but it will turn out that for α > 1 a specific part of the resulting SDE cancels, whereas for α < 1 another part cancels.
More specifically, it can be argued that for α > 1 the limiting system of differential equations reduces, for i = 1, . . . , n, to Observe that this entails that for α > 1 the limiting system depends on the service rates μ i jk only through their time averaged counterpartsμ i j ; this reflects the relatively fast alternating link state process. The system essentially behaves as a network of non-modulated infiniteserver queues; in particular, Remark 4 indicates that the centered and normalized versions of the individual queues, i.e., the processesM For α ∈ (0, 1) the limiting system of differential equations becomes, for i = 1, . . . , n, In this case, the link state process is relatively slow, such that the scaling limit contains detailed information on the transition rates. In this regime, the individual queues are not asymptotically independent.
Remark 6 At the expense of some additional notation and administration, the loss process L (N ) (t) can be added to vector M (N ) (t), in that a functional central limit theorem for the centered and normalized version of (L (N ) (t), M (N ) (t)) can be established using the same techniques.

Extensions, Ramifications
In this section we discuss two extensions. In the first subsection we describe how to adapt the model to incorporate phase-type distributed up-and down-times and phase-type service times. In the second subsection we point out how to adapt the model so as to cover the situation in which blocked customers (i.e., customers wishing to jump from i to j when the link between i and j is down) potentially retry.

Phase-Type Distributions
In case the up-and down-times are of phase-type, this is easily incorporated in that transition rate matrix Q. The background process now keeps tracks of each of the links being up or down, but in addition, it gives the phase of the current up-or down-time. If the up-time (down-time, respectively) of the link between i and j is phase type of degree δ Likewise, the service times can be made phase-type, by keeping track of an infinite-server queue of all clients at any specific node being in a specific phase of the phase-type service time.

Model in Which Blocked Customers Retry
The previous two section considered the case that all f i j are equal to 1. It is not hard to generalize the results to the situation in which f i j ∈ [0, 1) are allowed as well, but it comes at the price of having to introduce a substantial amount of additional notation. For this reason, we restrict ourselves in the subsection of just pointing out how the results have to be adapted to accommodate f i j ∈ [0, 1).
It is readily verified that now the joint probability generating function ϕ(w, z, t) satisfies The probability generating function of the stationary counterpart M follows as before, i.e., by plugging in w = 1 and equating the right-hand side to 0.
The time-dependent moments can be evaluated from the next result; again, the stationary counterpart follows by equating the right-hand side to 0. Proposition 4 For r ∈ N n 0 , k ∈ {1, . . . ,K } and t 0, An interesting special is case is f i j = 0 for all i, j, i.e., the case in which there are no clients lost. If in addition full symmetry is assumed, cf. Remark 3, the mean allows an explicit expression. It is directly seen that for each of the links the mean number of clients present at time t, denoted as before by v(t), satisfies It thus follows that which converges to v := λ/μ 0 as t → ∞. Observe that v(t) is in this case not affected by π, due to the fact that parts of the in-flux and out-flux cancel, as observed in (7). Regarding the functional central limit theorem, the result for f i j = 1 carries over to that for f i j ∈ [0, 1), but with I(i, j, k)).
Recall that in this model with a probability 1 − f i j a customer wishing to jump from node i to node j retries when the link is not available (and hence stays at node i).

Examples
In this section we work out a couple of relevant examples, starting with a tandem network in which the link between the nodes is subject to failure. In the first subsection we consider the case that all blocked customers are lost (i.e., f 12 = 1), whereas in the second subsection all blocked customers retry (i.e., f i j = 0). Section 6.3 presents the FCLT for the two-node tandem. In Sect. 6.4 we consider the FCLT for the case of a symmetric fully connected n-node network in which blocked customers are lost (where it is noted that the case in which they retry is dealt with fully analogously); Sect. 6.5 deals with its ring-shaped counterpart.

Two-Node Tandem, with Blocked Customers Being Lost
We consider a two-node tandem, where traffic arrives at the first node, is sent to the second node after having been served at the first node, and leaves the network after having been served at the second node. Jobs arrive at the first node according to a Poisson process with rate λ and have exponentially distributed service times with mean μ i at node i (i = 1, 2). The link between node 1 and 2 is up (down, respectively) during an exponentially distributed time with mean q −1 1 (q −1 0 , respectively). In this subsection we consider the case that f 12 = 1: clients who wish to jump from node 1 to node 2 when the link is down, are lost. In this case, Define by v i j (t) the mean number of clients at node i (i = 1, 2) when the background process is in state j ( j = 0, 1). Using our expression for the transient first moment, with p 0 (t) ( p 1 (t), respectively) the probability the link is down (up, respectively), This system can be solved in closed form, realizing that the first two differential equations can be solved in isolation first (leading to explicit expressions for v 10 (t) and v 11 (t)), and then the last two differential equations (using the found expression for v 11 (t)). As these are standard computations involving systems of non-homogeneous linear differential equations, we do not include the expressions here.
The stationary expectations can be found along the same lines. Let a be a (twodimensional) diagonal matrix, whose diagonal elements are all equal to a ∈ R. Then the steady-state means are, with π 0 = 1 − π 1 = q 1 /(q 0 + q 1 ) and π = (π 0 , π 1 ), The number of clients lost per unit of time is μ 1 v 10 .

Two-Node Tandem, with Blocked Customer Retrying
The model in which f 12 = 0 has more intricate interactions, as the link between the nodes being down has impact on the number of clients at node 1. It means that in the set of differential equations that we set up for f 12 = 1, the first one has to be replaced by v 10 (t) = λp 0 (t)+q 1 v 11 (t)−q 0 v 10 (t). Again the time-dependent means can be found in closed form by solving two 2-dimensional systems of linear differential equations, and the stationary means by solving two pairs of linear equations. As it turns out, however, we can also explicitly find the distribution of the stationary number of clients residing at node 1, as follows; the resulting formulae reveal the effect of the link failures on the performance experienced at this node.
Let ϕ 0 (z) be the probability generating function of the stationary number of customers at node 1 jointly with the event that the link is down, and ϕ 1 (z) its counterpart jointly with the link being up. The following (differential) equations apply: Inserting ϕ 0 (z) = q 1 ϕ 1 (z)/(q 0 + λ(1 − z)) into the second equation, we obtain or, equivalently, .
Up to an additive constant, we thus obtain that and using that ϕ 1 (1) = π 1 ,

Fig. 1 Stationary probability density function
Using the relation between ϕ 0 (z) and ϕ 1 (z), we find that the transform of the stationary number at the first node equals This expression has the following nice interpretation. Let A be Poisson with mean λ/μ 1 , and let B with probability π 0 be a negative binomial random variable with parameters r := q 1 /μ 1 + 1 and p := q 0 /(q 0 + λ) and with probability π 1 a negative binomial random variable with parameters r − 1 and p. Then the stationary number of customers at the first node is distributed as the sum of two independent random variables A and B (which are both non-negative and integer-valued). Note that if the link would never be down (i.e., π 1 = 1 and q 1 = ∞), the number of customers at node 1 is just Poisson with mean λ/μ 1 (like in the ordinary M/M/∞ queue); the additional random variable B (which is a mixture of two negative binomial random variable) thus represents the effect of the link failures.
A numerical illustration is shown in Fig. 1, where we have estimated the stationary random variable M by simulation. We have fixed the parameters λ = 20, μ 1 = 3, μ 2 = 2, q 0 = 1 and f = 0, and have varied the parameter q 1 . This experiment visualizes the impact of the link failures on the random variable M; the left graph corresponds with the upstream queue, and the right graph to the downstream queue. We choose q 1 = 0.01, 0.5, 1, and 3. The simulated numbers in the left panel align with the distribution identified above.

Functional Central Limit Theorem for Two-Node Tandem
In this example we derive the FCLT for the two-node tandem. Clients wishing to jump from node 1 to node 2 while the link is down are lost with probability f := f 12 , and stay at node 1 with probability 1− f. First we determine the fluid limit (which is to be used as the 'centering function' in our FCLT). We use the same notation as in the above examples, and in addition we define κ := μ 1 (π + (1 − π) f ) with π := π 1 = q 0 /(q 0 + q 1 ). Then Assuming the system starts empty, we thus obtain It can be checked that In addition, with q := q 0 + q 1 , With the matrix Cov(M(t),M(t)) can be evaluated using the expressions presented in Remark 4.
We now present a number of figures that illustrate the applicability of the diffusion limit as an approximation to the original population process. First we consider the scaling parameter N = 100 and the parameters λ = 25, μ 1 = 10, μ 2 = 20, and f = 0. In the two simulations we varied the transition rates: they are q 0 = 0.2, q 1 = 0.6 in the left sample path, and q 0 = 30, q 1 = 20 in the right sample path. We pick α = 1, so that the arrival rate λ (N ) is N λ, whereas transition rates are set to q = Nq 1 . The red curves appearing in Fig. 2 above correspond to the functions N 1 (·) and N 2 (·), where 1 (·) and 2 (·) are the two 'centering functions' that were computed above. The blue and green curves are the corresponding sample paths.
In Fig. 3 histograms are presented for the centered and scaled population process in each queue. The parameters λ, μ 1 , μ 2 and f are chosen as above; the transition rates of the background process are q 0 = 30 and q 1 = 20. We consider t = 2 and N = 60. The graphs show that the Gaussian limiting distribution provides an accurate fit; the dotted curves correspond to the zero-mean Gaussian distribution with the variance in line with the diffusion limit.

Functional Central Limit Theorem for Symmetric Fully Connected One-Block Network
In this subsection we consider the functional central limit theorem for a network with just a single block (i.e., all links alternate between being 'up' and 'down' simultaneously), and all parameters chosen symmetrically; blocked customers are assumed lost (but the case in which they retry works analogously). More concretely, the situation considered is the following. The arrival rate at each node is N λ. The down-time of all links is exponentially distributed with mean (Nq 0 ) −1 , whereas the up-time is exponentially distributed with mean (Nq 1 ) −1 . As in Remark 3, the service rate is σ := ν + μ 0 ; after service completion a client leaves the network with probability μ 0 /σ and wants to move to another node (picked uniformly at random) with probability ν/σ. Define π := q 0 /(q 0 + q 1 ).

Functional Central Limit Theorem for Symmetric Ring-Shaped One-Block Network
The setting considered is the same as in the previous subsection, with the only exception that a job served at queue m moves to queue m + 1 (where n + 1 is to be understood as 1). More specifically, the service rate is σ := ν + μ 0 ; after service completion a client leaves the network with probability μ 0 /σ and wants to move to the next node with probability ν/σ. We concentrate on the case that f i,i+1 = f n,1 = 1 (i.e., during outages jobs that wish to jump to the next queue are lost), but we remark that the case of retry can be handled analogously.
The matrix M • (s) is, analogously to what we found in Sect. 6.4, an (n × 2)-dimensional matrix whose entries in the first column are all m 0 (s) := − (s)σ , and whose entries in the second column are all m 1 (s) := − (s)μ 0 . The matrix is as defined in Sect. 6.3.
Observe that F n 1 n = 1 n (as F n is a permutation matrix), and therefore F k n 1 n = 1 n , so that e ω 1 F n t 1 n = ∞ k=0 F k n 1 n k! (ω 1 t) k = e ω 1 t 1 n .
This allows us to conclude that e M t 1 n = e −σ t 1 n . The remaining computations are as in the previous subsection.

Concluding Remarks
In this paper we have considered networks of infinite-server queues with faulty links. Clients that wish to jump from one queue to another while the required link is down are with a given probability lost (and otherwise stay at the origin node to retry after an exponentially distributed amount of time). For this model we derived prelimit results (in terms of differential equations uniquely characterizing the probability generating function, as well as a recursion by which all moments can be determined) as well as a functional central limit theorem (after appropriately scaling the arrival rates and the links' failure and repair rates). This work is among the first papers on queueing processes on dynamically evolving random graphs. Several alternative models can be considered; we mention a few here. (i) In our work all queues were of infinite-server type. In many applications, one would rather be interested in the queueing discipline being single-server of many-server. (ii) Our probabilistic analysis covers means and diffusion limits, but extreme behavior ('far away from the mean') is not included. Such a rare-event analysis sheds light on the probability that the queueing process attains values in remote sets. (iii) In dynamically evolving networks, typically measures are taken when links fail; think of rerouting mechanisms. This makes the systematic study of the efficacy of such rerouting protocols a relevant topic for further study.