K competing queues with customer abandonment: optimality of a generalised cμ-rule by the Smoothed Rate Truncation method

We consider a K-competing queues system with the additional feature of customer abandonment. Without abandonment, it is optimal to allocate the server to a queue according to the cμ-rule. To derive a similar rule for the system with abandonment, we model the system as a continuous-time Markov decision process. Due to impatience, the Markov decision process has unbounded jump rates as a function of the state. Hence it is not uniformisable, and so far there has been no systematic direct way to analyse this. The Smoothed Rate Truncation principle is a technique designed to make an unbounded rate process uniformisable, while preserving the properties of interest. Together with theory securing continuity in the limit, this provides a framework to analyse unbounded rate Markov decision processes. With this approach, we have been able to find close-fitting conditions guaranteeing optimality of a strict priority rule.


Introduction
In this paper, we consider a server assignment problem. There are K customer classes, and each customer class 1 ≤ i ≤ K has holding costs c_i per unit time, per customer. There is a single server that can serve class i at rate μ_i. Arrivals occur according to independent Poisson streams, independently of the service process. Each class i customer abandons the system at rate β_i, 1 ≤ i ≤ K, independently of whether the customer is being served or waiting in the queue. The question we address in this paper is: which service policy minimises the expected total discounted cost and the expected average cost?
In the K-competing queues model without abandonments, it is well known that the cμ-rule is optimal. The cμ-rule gives full priority to the queue with the highest index c_i μ_i, that is, the queue that gives the highest cost reduction per unit time. This rule was shown to be optimal in 1985 simultaneously by Baras et al. (1985) and by Buyukkoc et al. (1985).
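As a small illustration of the cμ-rule just described, the following Python sketch ranks the classes by the index c_i μ_i and picks the non-empty queue with the highest index. The function names `c_mu_priority` and `c_mu_action`, the 0-based class numbering and the parameter values in the usage note are assumptions made for this example only.

```python
def c_mu_priority(c, mu):
    """Return the class indices (0-based) sorted by decreasing c_i * mu_i."""
    return sorted(range(len(c)), key=lambda i: -c[i] * mu[i])


def c_mu_action(x, c, mu):
    """Serve the non-empty queue with the highest c_i * mu_i.

    x is the vector of queue lengths; None is returned when all queues
    are empty (the server idles).
    """
    for i in c_mu_priority(c, mu):
        if x[i] > 0:
            return i
    return None
```

For instance, with hypothetical costs c = (1, 3, 2) and service rates μ = (2, 1, 2), the indices are 2, 3 and 4, so the third class has the highest priority.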
Recently, there has been a revived interest in the K -competing queues model, with the additional feature of customer abandonment due to impatience. In this case the cμ-rule is not always optimal. When we model this problem as a continuous-time Markov decision process (MDP) the abandonments induce unboundedness of the transition rates as a function of the state. Hence, uniformisation is not possible and the standard (discrete time) techniques are not available. In the literature several approaches have been tried to deal with this difficulty. We may categorise them in three main approaches.
1. Study of a relaxation or approximate version of the original problem (see e.g. Atar et al. 2010; Ayesta et al. 2011; Larrañaga et al. 2013, 2015). The obtained policies may serve as heuristics.
2. Application of specific coupling techniques to obtain an optimal policy. Typically these papers (see e.g. Salch et al. 2013; Down et al. 2011, see also Ertiningsih et al. 2015) are limited to special cases, as the coupling gets more tedious in a more general setting. On the other hand, non-Markovian service time distributions and/or a non-Markovian arrival process may be handled.
3. Truncation of the process to make it uniformisable, followed by the use of discrete-time techniques to derive properties of the optimal policy (see e.g. Down et al. 2011; Bhulai et al. 2014; Blok and Spieksma 2015). This is the solution method that we will follow in this paper.
The first approach is most prominent in the literature. Atar et al. (2010) consider a K-competing queues problem with many servers. In their paper, the cμ/β-rule is introduced, which prioritises the queue with the highest index c_i μ_i/β_i. It is shown that it is asymptotically optimal to follow the cμ/β-rule in the overloaded regime, as the number of servers tends to infinity. Ayesta et al. (2011) studied the problem as well. They derive priority rules similar to the cμ/β-rule by analytically solving the case with one or two customers initially present and without arrivals. Larrañaga et al. (2013) have studied a fluid approximation of the multi-server variant of the competing queues problem. In this fluid approximation, optimality of the cμ/β-rule in the overloaded regime is shown, and it is shown that for K = 2 a switching curve policy is optimal in the underloaded regime. In Larrañaga et al. (2015) the same authors study asymptotic optimality for the multi-server competing queues problem under the average expected cost criterion. The authors consider the problem as a restless multi-armed bandit problem, compute the Whittle index, and show that the resulting index policy is asymptotically optimal for convex holding cost. The asymptotics concern large states, and light and heavy traffic regimes. The paper also connects the cμ/β-rule to the Whittle index for fluid approximations.
Other papers do not focus on heuristics, but try to find a subset of the input parameters for which a strict priority rule can be proven to be optimal. Salch et al. (2013) study the competing queues system with a restriction to a maximum of K arrivals. Customers may be impatient, but do not leave the system when they become impatient. Thus, the model is, in fact, a scheduling problem, and the criterion is to minimise the expected weighted number of impatient customers. With the use of a coupling and an interchange argument optimality of a priority policy is proved, provided a set of three conditions on the service, impatience and cost rates holds.
The paper of Down et al. (2011) considers a two-competing queues reward system, where the two classes have equal service rates. A coupling argument is employed to show that if type 1 customers have the largest abandonment rate and reward per unit time, then prioritising these customers is optimal.
The approach that we will carry out is the following. First, we model the problem as a continuous-time MDP. To make the MDP uniformisable, a truncation is necessary. After uniformisation, the truncated processes can be analysed by value iteration. To justify appropriate convergence of the truncated processes to the original model, a limit theorem is required. To our knowledge, so far such a theorem is available only for the discounted cost criterion, see Blok and Spieksma (2015). Via a vanishing discount approach, the results are transferred to the average cost criterion (see Blok and Spieksma 2017 for the justification). Therefore, we will first show for the discounted cost criterion that prioritising type i customers is optimal, if type i has maximum index with respect to c, cμ and cμ/β. These conditions are similar to those of Salch et al. (2013); however, the conditions of Salch et al. (2013) are implied by our conditions. Since the resulting index policy is optimal for all small discount factors, even strong Blackwell optimality of this policy follows.
In the paper of Down et al. (2011) a similar approach is used. The limit argument relies on specific properties of the model and a special truncation that does not affect optimality of the aforementioned priority policy. Due to the involved nature of the truncation, it seems unlikely that Down et al. (2011) can be extended to more dimensions or to heterogeneous service rates. The results of our paper can therefore be viewed as an extension of Down et al. (2011). In this paper, we use a different truncation technique called Smoothed Rate Truncation (SRT). This technique has been introduced by Bhulai et al. (2014) and can be utilised to make a process uniformisable while keeping the structural properties intact.
The paper is organised as follows. In Sect. 2 we give a complete description of the model, and we present the main results. Section 3 contains the core of our analysis. First, it describes the Smoothed Rate Truncation in more detail, then the structural properties of the value function are derived. In Sect. 4 we prove the main theorem. This can be done by invoking the limit theorems of Blok and Spieksma (2017) and Blok and Spieksma (2015). Section 5 presents some numerical examples that show that none of the used conditions are redundant. In the "Appendix", we provide the proofs of the propositions in Sect. 3.

Problem formulation
We consider K stations that are served by a single server. Customers arrive to the stations according to independent Poisson processes with rates λ i > 0 for i = 1, . . . , K , respectively. The service requirements of class i customers are exponentially distributed with parameter μ i > 0. Customers have limited patience: they are willing to wait an exponential time with parameter β i > 0 for class i. We allow abandonment during service as well, resulting in an abandonment rate in station i of β i x i if there are x i customers present at station i. In Sect. 2.2 we will also discuss alternative modelling choices.
The service requirements, abandonments and arrivals are all stochastically independent of each other. Class i customers carry holding costs c i ≥ 0 per unit time, i = 1, . . . , K . The service regime is pre-emptive.
We will study this problem in the framework of Markov decision theory. To this end, let the state space be S = ℕ_0^K. The action space is A = {1, . . . , K}, where action a = i corresponds to assigning the server to station i. Thus, idling is only possible if one or more queues are empty. By Π = {π : S → A} denote the collection of stationary deterministic policies. For π ∈ Π, a rate matrix Q(π) = (q_{xy}(π))_{x,y∈S} and cost rate c(π) are given by

\[
q_{xy}(\pi) = \begin{cases}
\lambda_i, & y = x + e_i,\\
\beta_i x_i + \mu_i \mathbf{1}_{\{\pi(x) = i\}}, & y = x - e_i,\ x_i > 0,\\
-\sum_{i=1}^{K} (\lambda_i + \beta_i x_i) - \mu_{\pi(x)} \mathbf{1}_{\{x_{\pi(x)} > 0\}}, & y = x,\\
0, & \text{otherwise},
\end{cases}
\qquad
c_x(\pi) = \sum_{i=1}^{K} c_i x_i,
\]

where e_i stands for the i-th unit vector. Let X = (X_t)_{t≥0} denote the state process. One can then define a measurable space (Ω, F) to which X is adapted, and a probability distribution P^π_ν on (Ω, F), such that X is the minimal Markov process with q-matrix Q(π), for each initial distribution ν on S, and each policy π ∈ Π. By P_t(π) = (p_{t,xy}(π))_{x,y∈S}, t ≥ 0, we denote the corresponding minimal transition function and by E^π_ν the expectation operator corresponding to P^π_ν. We will write P^π_x, E^π_x, when ν = δ_x is the Dirac measure at state x.

The problem of interest is finding the policy π ∈ Π that minimises the total expected discounted cost and the expected average cost. To this end, let α > 0 and define

\[
V^{\pi}_{\alpha}(x) = E^{\pi}_{x}\left[ \int_0^{\infty} e^{-\alpha t}\, c_{X_t}(\pi)\, dt \right]
\]

to be the total expected α-discounted cost under policy π, given that the system is in state x initially, x ∈ S. Then the minimum total expected cost value function V_α is defined by

\[
V_{\alpha}(x) = \inf_{\pi \in \Pi} V^{\pi}_{\alpha}(x), \qquad x \in S.
\]

If V^π_α = V_α, then π is said to be an α-discount optimal policy. If there exists α_0 > 0, such that π is α-discount optimal for α ∈ (0, α_0), then π is called a strongly Blackwell optimal policy. By g^π given by

\[
g^{\pi}(x) = \limsup_{T \to \infty} \frac{1}{T}\, E^{\pi}_{x}\left[ \int_0^{T} c_{X_t}(\pi)\, dt \right]
\]

we denote the expected average cost under policy π, given the initial state x, x ∈ S. Again, the minimum expected average cost is defined by

\[
g(x) = \inf_{\pi \in \Pi} g^{\pi}(x), \qquad x \in S,
\]

and if g^π = g, then π is an optimal policy.
It is not to be expected that the optimal policy has a simple description in general. In this paper, we will restrict to providing sufficient conditions for optimality of an index policy.
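To make the model concrete, the following Python sketch lists the transition rates out of a state x when a given station is served, following the verbal description above: arrivals at rate λ_i, abandonments at rate β_i x_i, and a service completion at rate μ_i at the served station if it is non-empty. The function names, the 0-based station numbering and the numerical values in the usage note are assumptions for illustration.

```python
def transition_rates(x, action, lam, mu, beta):
    """Off-diagonal transition rates out of state x.

    x      : tuple of queue lengths
    action : index (0-based) of the station the server is assigned to
    Returns a dict mapping each reachable state y != x to its rate.
    """
    K = len(x)
    rates = {}
    for i in range(K):
        # Arrival of a class i customer.
        up = tuple(x[j] + (1 if j == i else 0) for j in range(K))
        rates[up] = lam[i]
        # Departure: abandonment of any present customer, plus a service
        # completion if station i is served and non-empty.
        down_rate = beta[i] * x[i]
        if action == i and x[i] > 0:
            down_rate += mu[i]
        if down_rate > 0:
            down = tuple(x[j] - (1 if j == i else 0) for j in range(K))
            rates[down] = down_rate
    return rates


def holding_cost_rate(x, c):
    """Cost per unit time in state x: sum of c_i * x_i."""
    return sum(ci * xi for ci, xi in zip(c, x))
```

With the hypothetical parameters λ = (0.5, 0.5), μ = (1, 2), β = (0.1, 0.2), the total rate out of state x grows linearly in x_1 + x_2, which is the unboundedness that prevents uniformisation.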

Main result
The two main results of our paper are Theorems 1 and 2, providing sufficient conditions for optimality of the Smallest Index Policy.

Definition 1
The Smallest Index Policy assigns the server to the non-empty station with the smallest index. The policy only idles, if no customers are present.
Theorem 1 Suppose that the stations can be ordered such that, for 1 ≤ i ≤ j ≤ K, the following three conditions hold

\[
c_i \ge c_j, \qquad c_i \mu_i \ge c_j \mu_j, \qquad \frac{c_i \mu_i}{\beta_i} \ge \frac{c_j \mu_j}{\beta_j}, \tag{1}
\]

then the Smallest Index Policy is α-discount optimal for any α > 0, and hence also strongly Blackwell optimal.
Theorem 2 Under the conditions of Theorem 1, the Smallest Index Policy is average cost optimal.
The proofs are postponed until Sect. 4. In Sect. 5 we give examples showing that if any of the three conditions of (1) is omitted, the Smallest Index Policy can fail to be optimal.
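The ordering requirement of Theorem 1 can be checked mechanically. The sketch below assumes that the three conditions state that c_i, c_i μ_i and c_i μ_i/β_i are all non-increasing along the ordering, as suggested by the discussion of the indices c, cμ and cμ/β in the introduction. It searches all orderings by brute force, which is only sensible for small K; the function names and the first set of parameter values are hypothetical.

```python
from itertools import permutations


def satisfies_conditions(order, c, mu, beta):
    """Check that c, c*mu and c*mu/beta are non-increasing along `order`."""
    indices = [(c[i], c[i] * mu[i], c[i] * mu[i] / beta[i]) for i in order]
    # Checking adjacent pairs suffices, since >= is transitive.
    return all(a[k] >= b[k]
               for a, b in zip(indices, indices[1:])
               for k in range(3))


def find_priority_order(c, mu, beta):
    """Return an ordering (tuple of 0-based classes) satisfying the three
    conditions, or None if no such ordering exists."""
    for order in permutations(range(len(c))):
        if satisfies_conditions(order, c, mu, beta):
            return order
    return None
```

For example, `find_priority_order((2, 1), (2, 1), (0.5, 0.5))` returns `(0, 1)`, whereas for c = (1.2, 1), μ = (1, 1), β = (0.5, 0.4) no ordering exists, because the cμ/β index is ordered in the opposite direction to c and cμ.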
Alternative modelling choices
In our model the cost function is a holding cost Σ_i c_i x_i per unit time, when the system is in state x. In many applications a penalty (say P_i for class i) is charged if a customer abandons the system due to impatience. Then the cost per unit time is given by Σ_i P_i β_i x_i. Substitution of c_i = P_i β_i, i = 1, . . . , K, implies equivalence of these cost structures. We modelled the system such that customers can leave the system while being in service. In some models, it may be more realistic that abandonment does not take place after service has started. However, if the abandonment rates are smaller than the service rates, i.e., β_i < μ_i for all i, then our analysis is still valid after an appropriate parameter change. That is, we consider the system with service rates μ̃_i = μ_i − β_i > 0. Abandonments during service or service completions in the revised model correspond to a service completion in the original one.
If, for one or more classes, the abandonment rates are greater than or equal to the associated service rates, then this substitution is clearly not possible. However, serving that customer class delays the process of emptying the system. It follows directly that in this case, it can never be optimal to serve these classes of customers. Hence, when only customers of such a type are present, the server should idle in order to minimise the expected average cost. Therefore, the optimal policy never serves class i if μ_i ≤ β_i. For the remaining customer classes with μ_i > β_i, the Smallest Index Policy is optimal whenever these classes can be ordered such that c, cμ and cμ/β are non-increasing. Finally, it is possible to allow idling at all times. However, it can easily be shown that it is not optimal to have unforced idling. Therefore, we ignore this option for the sake of notational convenience.
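The two parameter changes discussed above are simple enough to write down directly. The following Python sketch is an illustration only: the function names are hypothetical, and classes with μ_i ≤ β_i are marked with `None` to express that, by the argument above, they are never served.

```python
def penalty_to_holding_cost(P, beta):
    """Translate abandonment penalties P_i into holding cost rates c_i = P_i * beta_i."""
    return [p * b for p, b in zip(P, beta)]


def effective_service_rates(mu, beta):
    """Service rates mu_i - beta_i for the variant without abandonment
    during service; None marks a class with mu_i <= beta_i."""
    return [m - b if m > b else None for m, b in zip(mu, beta)]
```

For instance, penalties P = (10, 4) with β = (0.5, 0.25) correspond to holding cost rates c = (5, 1).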

Structural properties
As mentioned in the introduction, Sect. 1, we will first study the α-discounted cost problem. Crucial in establishing optimality of the Smallest Index Policy are certain properties of the value function. If V_α is non-decreasing (I) and weighted Upstream Increasing (wUI), then optimality of the Smallest Index Policy can be directly deduced from the α-discounted cost optimality equation under certain conditions on the Markov decision problem (cf. Blok and Spieksma 2015, 2017) that we will not discuss explicitly in this paper. We will next define the structural properties (I) and (wUI).
Definition 2 The function f : S → R is called weighted Upstream Increasing (wUI) if f ∈ wUI, with wUI defined by

The following lemma makes the connection between the structural properties of the α-discounted cost value function and optimality of the Smallest Index Policy.
Lemma 1 Let the discount factor α > 0. Then, the α-discounted cost value function V_α is well-defined and finite. Suppose V_α ∈ wUI ∩ I, then the Smallest Index Policy is α-discount optimal.
Proof One can view the MDP as a negative dynamic programming problem (cf. Strauch 1966), for which simple conditions allow us to draw the conclusions that we aim for. Since later on we will have to include perturbations, we will use (Blok and Spieksma 2015, Theorem 4.2). The conditions in that theorem are all easily verified, except for the following two conditions:
P1 There exist a function F : S → (0, ∞), and a constant γ < α with the properties that

If F satisfies the first property, then F is called a γ-drift function for the MDP.
P2 There exist a function G : S → (0, ∞) and a constant ξ, such that the following properties are satisfied.
- G is a ξ-drift function for the MDP.
- G is an F-moment function, i.e. there exists an increasing sequence {K_n}_n, K_n ⊂ S,

We check property P1. Take F_x = e^{ε(x_1+···+x_K)}, with ε to be determined. Then,

Clearly, one can choose ε > 0 sufficiently small, so that

Now by virtue of (Blok and Spieksma 2015, Theorem 4
The DCOE yields that if class j_1 and j_2 customers are both present, then it is optimal to serve class j_1 rather than class j_2.
Further, since V_α is non-decreasing we have for 1 ≤ j ≤ K, and x with x_j > 0 that

with 0 corresponding to the cost if an empty queue is served. Hence idling is never optimal; it is optimal to serve a customer whenever possible. We conclude that the Smallest Index Policy is optimal.
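The drift check for an exponential function F_x = e^{ε(x_1+···+x_K)} can be illustrated numerically. The sketch below assumes the usual drift inequality Σ_y q_{xy} F_y ≤ γ F_x and uses the fact that every departure lowers F while every arrival multiplies it by e^ε, so that Σ_y q_{xy} F_y / F_x ≤ Σ_i λ_i (e^ε − 1) in every state and under every action. The function names and the choice of taking half of α as the target are assumptions made for this example.

```python
import math


def drift_bound(eps, lam):
    """Upper bound sum_i lam_i * (e^eps - 1) on the drift of F."""
    return sum(lam) * (math.exp(eps) - 1.0)


def choose_epsilon(alpha, lam, fraction=0.5):
    """Return eps > 0 such that drift_bound(eps, lam) = fraction * alpha < alpha."""
    return math.log(1.0 + fraction * alpha / sum(lam))


def relative_drift(x, action, eps, lam, mu, beta):
    """Compute sum_y q_{xy} F_y / F_x for F_x = exp(eps * sum(x)),
    when station `action` (0-based) is served in state x."""
    up = sum(lam) * (math.exp(eps) - 1.0)
    down_rate = sum(b * xi for b, xi in zip(beta, x))
    if x[action] > 0:
        down_rate += mu[action]
    return up + down_rate * (math.exp(-eps) - 1.0)
```

With α = 0.1 and λ = (0.5, 0.5), `choose_epsilon` gives a value ε for which the bound equals 0.05, so that F is a γ-drift function with γ = 0.05 < α under the stated assumption.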

Smoothed Rate Truncation
The abandonment rates increase linearly in the number of waiting customers. Hence the transition rates are unbounded as a function of the state. Thus, the system is not uniformisable and so there is no discrete-time equivalent to the continuous-time problem. To make discrete-time theory available, we approximate the MDP with a sequence of (essentially) finite state MDPs. Unfortunately, standard state space truncations generally destroy the structural properties of interest due to boundary effects.
To overcome this, we have developed the Smoothed Rate Truncation (SRT). This perturbation technique was first introduced in Bhulai et al. (2014). In that paper, SRT is applied to a Markov cost process, and properties of the value function are proven. The distinguishing feature of SRT is that the transition rates are decreased in all states, also close to the origin. This makes the jump rates highly state dependent and complicates the analysis, but it is the key feature of SRT that ensures that the properties are preserved.
The idea of SRT is as follows. Every transition that moves the system into a higher state in one or more dimensions is linearly decreased as a function of these coordinates. This naturally generates a finite subset of the space, that cannot be left with positive probability under any policy. As a consequence, recurrent classes under any policy are always finite. As we get closer to the boundary of the finite set, the rates are smoothly truncated to 0. On the finite state space, the transition rates are bounded. Outside the finite set, the rates can be arbitrarily chosen, since these states are inessential. In particular, they can be chosen such that the jump rates are uniformly bounded.
In our model, a truncation parameter N = (N_1, . . . , N_K) ∈ 𝒩 = (ℕ ∪ {∞})^K defines the size of the state space. Since the empty state can always be reached, and there is a positive probability of an arrival in any queue within the finite set (clearly not on the boundary), the set of essential states is given by

\[
S_N = \{ x \in S : x_i \le N_i,\ 1 \le i \le K \}.
\]

SRT prescribes a truncation of all transitions that move the system into a 'larger' state. In this model only arrivals move the system to a larger state, hence for all i the arrival rates λ_i are replaced by new rates λ^N_i(x) in state x. The smoothed arrival rates are given by

\[
\lambda^{N}_{i}(x) = \lambda_i \left( 1 - \frac{x_i}{N_i} \right)^{+}.
\]

The result is a uniformisable MDP for each N ∈ 𝒩, which leads to a collection of MDPs parametrised by N. As has been mentioned already, outside S_N it is possible to choose the rates as we like, for these states are inessential. In particular, we can choose the new abandonment rates of class i to be bounded by N_i β_i. Furthermore, the perturbed MDP is easily checked to satisfy the conditions of (Blok and Spieksma 2015, Theorems 4.2 and 5.1). The main ingredients of its verification are analogous to the proof of Lemma 1. The results in Blok and Spieksma (2015) guarantee that the value function of the N-perturbation converges to V_α, and that any limit point of α-discount optimal policies for the N-perturbation, N → ∞^K, is α-discount optimal for the original MDP.
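The linear smoothing of the arrival rates can be sketched in a few lines of Python. The form λ_i (1 − x_i/N_i)^+ used here is an assumption in line with the description of SRT above (rates decreasing linearly in the coordinate and vanishing at the boundary); the function names are hypothetical.

```python
def smoothed_arrival_rate(lam_i, x_i, N_i):
    """Arrival rate of class i in a state with x_i customers, under a
    linear smoothed rate truncation with parameter N_i."""
    if N_i == float("inf"):
        return lam_i  # no truncation in this coordinate
    return lam_i * max(0.0, 1.0 - x_i / N_i)


def jump_rate_bound(lam, mu, beta, N):
    """A uniform bound on the total jump rate on the truncated state space:
    all arrivals, at most N_i abandoning customers per class, one service."""
    return sum(lam) + sum(b * n for b, n in zip(beta, N)) + max(mu)
```

With N_i = 10, the rate is reduced already in small states (for example to half of λ_i at x_i = 5) and reaches 0 at x_i = 10, so no state with x_i > 10 can be entered from within the truncated set.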

Dynamic programming
Apart from the parameter space 𝒩, we will need to introduce a special subset 𝒩(λ), given by

\[
\mathcal{N}(\lambda) = \left\{ N \in \mathbb{N}^K : \frac{\lambda_i}{N_i} \le \frac{\lambda_{i+1}}{N_{i+1}},\ 1 \le i < K \right\}.
\]

Throughout the rest of this section, we fix the truncation parameter N ∈ 𝒩 and discount factor α > 0. Our goal is to show that V^{(N)}_α ∈ wUI ∩ I for all α > 0 and N ∈ 𝒩(λ). We use the following short-hand notation

Without loss of generality, we may assume that λ + β N + μ = 1. The discrete-time uniformised MDP is defined by

Let V^{(N,d)}_ᾱ denote the expected discrete-time ᾱ-discounted optimal cost:

(cf. Serfozo 1979). Moreover, we can approximate V^{(N,d)}_ᾱ by using the value iteration algorithm. Indeed, the uniformised N-perturbed MDP in discrete time satisfies the conditions from Wessels (1977) for value iteration to converge. This is easily deduced from the fact that the N-perturbed MDP in continuous time satisfies the conditions of (Blok and Spieksma 2015, Theorems 4.2 and 5.1), which are the continuous-time versions of the conditions developed by Wessels in discrete time. Let

We will prove by induction that v^{(N,d)}_{n,ᾱ} ∈ wUI ∩ I on S_N, for all n ≥ 0. To employ the induction argument, we need three additional structural properties: convexity, supermodularity and bounded increasingness. We will specify these hereafter. The initial function v^{(N,d)}_{0,ᾱ} ≡ 0 trivially satisfies all these properties. For the induction step, we will use Event Based Dynamic Programming (EBDP). This method uses event operators (representing arrivals, departures or cost) as building blocks to construct the iteration step of the value iteration algorithm.
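The value iteration scheme for the truncated and uniformised model can be sketched for K = 2 as follows. This is a minimal illustration under stated assumptions, not the event-based construction used in the proofs: the smoothed arrival rates are taken to be λ_i (1 − x_i/N_i)^+, the uniformisation constant is Σ_i λ_i + Σ_i β_i N_i + max_i μ_i rather than being normalised to 1, and ties between actions are broken in favour of the smaller index. The function name and all parameter values are hypothetical.

```python
def srt_value_iteration(lam, mu, beta, c, N, alpha, n_iter=2000, tol=1e-9):
    """Value iteration for the SRT-truncated two-queue model.

    Returns (v, policy): dicts over states (x1, x2) with x_i <= N_i.
    policy[x] is the 0-based index of the station to serve.
    """
    K = 2
    # Uniformisation constant bounding the total jump rate on the truncated set.
    Lam = sum(lam) + sum(b * n for b, n in zip(beta, N)) + max(mu)
    states = [(x1, x2) for x1 in range(N[0] + 1) for x2 in range(N[1] + 1)]
    unit = [(1, 0), (0, 1)]
    v = {x: 0.0 for x in states}
    policy = {}
    for _ in range(n_iter):
        new_v = {}
        for x in states:
            q_values = []
            for a in range(K):
                total = sum(ci * xi for ci, xi in zip(c, x))
                used = 0.0
                for i in range(K):
                    e = unit[i]
                    # Smoothed arrival rate; zero on the boundary x_i = N_i.
                    arr = lam[i] * max(0.0, 1.0 - x[i] / N[i])
                    if arr > 0:
                        total += arr * v[(x[0] + e[0], x[1] + e[1])]
                        used += arr
                    # Abandonments plus service at the served, non-empty station.
                    dep = beta[i] * x[i] + (mu[i] if a == i and x[i] > 0 else 0.0)
                    if dep > 0:
                        total += dep * v[(x[0] - e[0], x[1] - e[1])]
                        used += dep
                # Remaining uniformisation mass is a dummy self-transition.
                total += (Lam - used) * v[x]
                q_values.append(total / (alpha + Lam))
            policy[x] = 0 if q_values[0] <= q_values[1] + tol else 1
            new_v[x] = min(q_values)
        v = new_v
    return v, policy
```

As a usage example, take λ = (0.5, 0.5), μ = (2, 1), β = (0.5, 0.5), c = (2, 1), N = (6, 6) and α = 0.1. These values are ordered in c, cμ and cμ/β, and λ_1/N_1 ≤ λ_2/N_2, so one would expect the computed policy to serve station 1 whenever it is non-empty.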
Definition 3 Let f : S → R, then define
1. The total smoothed arrivals operator T^N_{SA}
2. The total increasing departures operator T^N_{ID}
3. The cost operator T_C
4. The cost + increasing departures operator T^N_{CID}
5. The movable server operator T_{MS}
6. The uniformisation operator T^N_{UNIF}
7. The discount operator T^ᾱ_{DISC}, given by T^ᾱ_{DISC} f = (1 − ᾱ) f

As has been mentioned, it is sufficient to verify that v^{(N,d)}_{n,ᾱ} has the desired structural properties on the finite set S_N. Therefore, we define the following collections of functions restricted to S_N.
Definition 4 (Properties on S N )

Bounded increasing functions on
The following propositions are sufficient for the desired structural properties to propagate through the induction step.

Proposition 1
The smoothed arrivals operator has the following propagation properties

Proposition 2 The increasing departure operator has the following propagation properties

Proposition 4 The cost + increasing departures operator has the following propagation properties
Proposition 5 The movable server operator has the following propagation properties

Proposition 6
The uniformisation operator has the following propagation properties:

Proposition 7
The discount operator has the following propagation properties:

The proofs of the propositions are provided in the "Appendix".
Further, under the above conditions we have that

Assertion (i) follows by induction. Assertion (ii) immediately follows from (i) due to convergence of value iteration (see Wessels 1977; Blok and Spieksma 2017, Theorem 5.2).
Notice that the model satisfies the assumptions of (Blok and Spieksma 2017, Theorem 5.7). This theorem implies the existence of a sequence (α_m) with lim_{m→∞} α_m = 0, such that the limit lim_{m→∞} π_{α_m} is average optimal. Since π_α is the Smallest Index Policy for all α > 0, so is the limit policy. Hence the Smallest Index Policy is average optimal.

Numerical results
The triple inequality on the input parameters that guarantees optimality of the Smallest Index Policy leaves many parameter configurations outside the scope of the theorems. This naturally gives rise to the question whether all three inequalities are necessary.
From numerical calculations, it follows that we cannot omit any of the three conditions. If one of these three inequalities is violated, then the examples below show that the Smallest Index Policy need not be optimal. We carried out the calculations for K = 2.
1. In the first parameter setting, the first condition is violated, c_1 < c_2, while the other conditions are satisfied. The optimal policy is a switching curve policy: for 'small' states action 2 is optimal and for 'large' states action 1 is optimal, see Fig. 1. Note that colour green corresponds to action 1, i.e. serving queue 1, and colour red to action 2, i.e. serving queue 2.
2. The next parameter setting is given by

cμ ↑     c_i    c_i μ_i    c_i μ_i/β_i    λ_i    μ_i    β_i
i = 1    1      1          100            0.5    1      0.01
i = 2    1      2          20             0.5    2      0.01

Observe that the first and the third condition hold, but the second condition is violated. In Fig. 2 the optimal policy is displayed. We see that the Smallest Index Policy need not be optimal. There is a small region (with only few customers) where it is optimal to take action 2. In the larger states action 1 is optimal, that is, the Smallest Index Policy is optimal.
3. The final parameter setting is given by

         c_i    c_i μ_i    c_i μ_i/β_i    λ_i    μ_i    β_i
i = 1    1.2    1.2        2.4            2      1      0.5
i = 2    1      1          2.5            2      1      0.4

Here only the first and second condition are satisfied. Figure 3 shows that it can be optimal to serve the station with the highest index instead of the smallest index.

Fig. 3 Optimality of highest index if third condition is violated
Another observation can be made, based on these examples. In all cases a switching curve policy is optimal. Since an index policy can be viewed as a degenerate switching curve policy, we conjecture that a switching curve policy is always optimal.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix
In the appendix we will provide the proofs of Propositions 1-7. In this section we use the following notation.
for all x, x + e i + e i+1 ∈ S N }.

For 1
for all x, x + e i + e j ∈ S N }.

For 1
It is straightforward that Bd N (i).

Proof of Proposition 1 First, we show that T N S A
The inequality is due to f ∈ I_N(i). Hence we obtain T^N_{SA(i)} f ∈ I_N(i). Notice that T^N_{SA(i)} f ∈ I_N(j) for j ≠ i is trivial. Hence we conclude that T^N_{SA(i)} : I_N → I_N. This yields T^N_{SA} : I_N → I_N.

We use f ∈ Super_N(i, j) on the terms between square brackets to get the inequality.
Fourth, we prove T N S A : Bd N → Bd N . Suppose, that f ∈ Bd N . Let 1 ≤ i ≤ K arbitrary, then for all x with x, x + e i ∈ S N , we have We use f ∈ Bd N (i) on the terms in square brackets, to obtain the first inequality. Hence, We will prove, that T N S A : I N ∩ wU I N → wU I N . The property wU I N does not propagate through one individual smoothed arrivals operator, so it is necessary to look at the combined smoothed arrivals operator T N S A . Supposing that f ∈ I N ∩ wU I N , it suffices to show that T N S A f ∈ wU I N (i), for an arbitrary 1 ≤ i < K . First we look at the T N S A(i) operator, for all x with x, x + e i + e i+1 ∈ S N . Then, we have The terms between square brackets are greater or equal to zero because f ∈ wU I N (i). Notice, that the resulting term is smaller than or equal to zero, since f ∈ I N . Therefore we combine this with the T N S A(i+1) operator, similar as above we get ( 3) Then, we obtain for the total smoothed arrivals operator ≥ 0.
The first inequality is due to the fact that T^N_{SA(j)} for j ≠ i, i+1 trivially propagates wUI_N(i). The second inequality follows from Inequalities (2) and (3). The third inequality follows from f ∈ I_N ∩ wUI_N and λ_{i+1}/N_{i+1} ≥ λ_i/N_i. Therefore, we have T^N_{SA} f ∈ wUI_N(i) for every 1 ≤ i < K, hence T^N_{SA} : I_N ∩ wUI_N → wUI_N.
Proof of Proposition 2 We start with the proof of T^N_{ID} : I_N → I_N. Suppose that f ∈ I_N. Let 1 ≤ i ≤ K be arbitrary, then for all x with x, x + e_i ∈ S_N, we have

Here the inequality follows from f ∈ I_N(i). Hence, T^N_{ID(i)} : I_N(i) → I_N(i). Moreover, for j ≠ i we trivially have T^N_{ID(i)} : I_N(j) → I_N(j) as well. This implies that T^N_{ID(i)} f ∈ I_N, and since 1 ≤ i ≤ K was chosen arbitrarily, we have T^N_{ID} : I_N → I_N. We continue with the proof of T^N_{ID} :

The inequality follows from f ∈ Cx_N(i). We may conclude that

The inequality follows from f ∈ Super_N(i, j). Thus, T^N_{ID(i)} f ∈ Super_N(i, j). It easily follows that T^N_{ID(i)} : Super_N → Super_N, so that also T^N_{ID} : Super_N → Super_N.
Proof of Proposition 3 First, we prove T C : I N → I N . Let 1 ≤ i ≤ K , then for x with x, x + e i ∈ S N it holds that The inequality follows from the assumption that f ∈ I N (i) and c i ≥ 0. Hence, T C f ∈ I N (i), and so T C : I N → I N .
Next, we prove the propagation of convexity. Let 1 ≤ i ≤ K. For x with x, x + 2e_i ∈ S_N, it holds that

The inequality follows directly from the assumption that f ∈ Cx_N(i). We conclude that T_C : Cx_N → Cx_N.
For the propagation of supermodularity T_C : Super_N → Super_N, let 1 ≤ i ≠ j ≤ K. Then for x with x, x + e_i + e_j ∈ S_N, it holds that

The inequality follows from f ∈ Super_N(i, j). Hence, T_C f ∈ Super_N(i, j) for any 1 ≤ i ≠ j ≤ K, and so T_C : Super_N → Super_N.

Proof of Proposition 4, part (i) First we prove T^N_{CID} : Bd_N → Bd_N. It is necessary to take the combination of several operators, because T_C alone does not propagate Bd_N. First, we derive the following inequalities for the increasing departure operators. Let f ∈ Bd_N, and let 1 ≤ i ≤ K. For x with x, x + e_i ∈ S_N, we have

The inequality follows from f ∈ Bd_N(i). Further, for j ≠ i trivially

Hence, for the operator T^N_{CID} we obtain

Here the inequality follows from Inequalities (4) and (5). This yields T^N_{CID} f ∈ Bd_N(i), for all i. From this the propagation of Bd_N through T^N_{CID} follows.

Proof of Proposition 4, part (ii) Suppose that

We will prove that T^N_{CID} :

Further, we get the following inequality for T^N_{ID(i)}

The inequality is due to f ∈ wUI_N. For T^N_{ID(i+1)} it holds that

The inequality follows from wUI_N. Combining the above three Inequalities (6), (7) and (8) gives

Hence,

We wish to argue that Eq. (9) is non-negative. To this end, we distinguish four cases with respect to the parameters.
1. Suppose that μ_i ≤ μ_{i+1} and β_i ≤ β_{i+1}, then

The first inequality is due to c_i μ_i ≥ c_{i+1} μ_{i+1}. The second inequality follows from μ_i ≤ μ_{i+1} and f ∈ Super_N(i, j), together with β_i ≤ β_{i+1} and f ∈ I_N(i).
2. Suppose that μ_i ≤ μ_{i+1} and β_i ≥ β_{i+1}. Then,

The first inequality is due to f ∈ Super_N(i, j) together with μ_i ≤ μ_{i+1}. The second inequality comes from β_i ≥ β_{i+1} and f ∈ Bd_N(i). The last inequality is due to

3. Suppose that μ_i ≥ μ_{i+1} and μ_i/β_i ≥ μ_{i+1}/β_{i+1}. Then,

The first inequality follows from μ_i/β_i ≥ μ_{i+1}/β_{i+1} and f ∈ I_N(i), together with μ_i ≥ μ_{i+1} and f ∈ Bd_N(i). The second inequality follows from c_i ≥ c_{i+1}.
4. Finally, assume that μ_i ≥ μ_{i+1} and μ_i/β_i ≤ μ_{i+1}/β_{i+1}, then

The first inequality follows from μ_i/β_i ≤ μ_{i+1}/β_{i+1} and f ∈ Super_N(i, j), together with β_i ≥ β_{i+1} and f ∈ Bd_N(i). The third inequality is due to

So, for any 1 ≤ i < K, we have T^N_{CID} f ∈ wUI_N(i). Hence, we conclude that T^N_{CID} :

Proof of Proposition 5, part (i) Before starting with the proofs, the following remark is due. By Lemma 1, the Smallest Index Policy is optimal if V_α ∈ wUI ∩ I. The same holds true if f ∈ wUI_N ∩ I_N. Then for x ∈ S_N the action that chooses the smallest index minimises the T_{MS} operator. We will use this several times below. First, we prove that T_{MS} : I_N ∩ wUI_N → I_N. Assume that f ∈ I_N ∩ wUI_N. Let 1 ≤ i ≤ K be arbitrary. It suffices to show that T_{MS} f ∈ I_N(i). First, suppose x = 0, then

The optimal policy is non-idling, because f ∈ I_N. Hence, in state e_i the system is minimised by serving station i, the only non-empty queue. The second term corresponds to the system being empty, so nobody is served, and the inequality follows from f ∈ I_N(i).
Next, suppose x ≠ 0 with x, x + e_i ∈ S_N. Let j* be the optimal action in state x. If j* ≤ i, then wUI_N implies that this is also the optimal action in state x + e_i, and so the inequality follows straightforwardly. If j* > i, then wUI_N implies that serving station i is the optimal action in state x + e_i. We obtain,

The last inequality follows from f ∈ I_N. We conclude that T_{MS} f ∈ I_N(i).
We continue with the proof of T_{MS} : I_N ∩ wUI_N → wUI_N. Suppose that f ∈ I_N ∩ wUI_N, so that the Smallest Index Policy is optimal. Let 1 ≤ i < K be arbitrary. Let x be such that x, x + e_i + e_{i+1} ∈ S_N, and let j* be the optimal action in state x + e_{i+1}. Suppose j* ≤ i. Then the Smallest Index Policy implies that j* is optimal in states x + e_i, x + e_{i+1} and x + e_i + e_{i+1} as well. The propagation of wUI_N is trivial. Supposing that j* > i, action i is optimal in state x + e_i and x + e_i + e_{i+1}, while in state x + e_{i+1} action i + 1 is optimal. We get,

The inequality follows from the assumption f ∈ wUI_N(i). We conclude that T_{MS} : I_N ∩ wUI_N → wUI_N(i), thus implying T_{MS} : I_N ∩ wUI_N → wUI_N.
We prove T_{MS} : I_N ∩ wUI_N ∩ Cx_N ∩ Super_N → Cx_N ∩ Super_N. To this end, assume f ∈ I_N ∩ wUI_N ∩ Cx_N ∩ Super_N. Let 1 ≤ i ≠ j ≤ K be arbitrary, w.l.o.g. assume that i < j. First, suppose that x = 0. Then, in state x + e_i and x + e_i + e_j it is optimal to serve station i, while in state x + e_j it is optimal to do action j. Therefore,

The inequality follows from f ∈ I_N(j) ∩ Super_N(i, j).
Next, suppose that x ≠ 0, with x, x + e_i + e_j ∈ S_N. Let the optimal action in state x be j*. We distinguish three cases. Suppose that j* ≤ i. Then f ∈ I_N ∩ wUI_N implies that j* is also optimal in x + e_i, x + e_j and x + e_i + e_j. The same action is optimal in every state, which implies that Super_N(i, j) is propagated trivially.
If i < j * ≤ j, then I N ∩ wU I N implies that j * is also optimal in x + e j , and action i is optimal in x + e i and x + e i + e j . We obtain The inequality follows from f ∈ Super N ∩ C x N .
If j * > j, then by I N ∩ wU I N , action i is optimal in x + e i and x + e i + e j . Serving station j is optimal in state x + e j , but if we choose suboptimal action j * instead, this makes the expression only smaller. Then we are in the same situation as above, for which we have already derived non-negativity. Hence, T M S f ∈ Super N (i, j). We conclude, that T M S f ∈ Super N as well.
This only leaves to prove that T M S f ∈ C x N . First, consider the case that x = 0. Then, in states x + e i and x + 2e i action i is optimal. Hence, The inequality follows from f ∈ I N (i) ∩ C x N (i).
Now, consider the case that x ≠ 0, such that x, x + 2e_i ∈ S_N. Let j* be the optimal action in state x. If j* ≤ i, then in states x + e_i and x + 2e_i the optimal actions are equal to j* as well. The propagation is trivial in that case. If j* > i, then the optimal action in states x + e_i and x + 2e_i is action i. We obtain the following inequality

The first inequality is due to f ∈ wUI_N, the second comes from f ∈ Super_N ∩ Cx_N. Hence, T_{MS} f ∈ Cx_N(i). We conclude that T_{MS} : I_N ∩ wUI_N ∩ Cx_N ∩ Super_N → Cx_N ∩ Super_N.
Proof of Proposition 5, part (ii) Assume, that c i μ i /β i ≥ c i+1 μ i+1 /β i+1 , for all 1 ≤ i < K . We will prove, that T M S : I N ∩ wU I N ∩ Bd N → Bd N . Let f ∈ I N ∩ wU I N ∩ Bd N , and let 1 ≤ i ≤ K be arbitrary. Again we make two case distinctions. First, suppose that x = 0. Then, For the second inequality we use that f ∈ Bd N (i).
Next, suppose that x ≠ 0. Let j* be the optimal action in state x. If j* ≤ i, then as a result of f ∈ I_N ∩ wUI_N, the optimal actions in states x and x + e_i are equal, namely j*. As a consequence, the propagation is trivial. If j* > i then the optimal actions are not equal, because wUI_N implies that in state x + e_i it is optimal to serve station i. We obtain

The first inequality follows from f ∈ Bd_N, the second follows from c_i μ_i/β_i ≥ c_{j*} μ_{j*}/β_{j*} for i < j*. We conclude that T_{MS} f ∈ Bd_N(i) and thus T_{MS} : I_N ∩ wUI_N ∩ Bd_N → Bd_N.
Proof of Proposition 6 It follows directly that T N U N I F : I 3 N → I N , wU I 3 N → wU I N , C x 3 N → C x N , Super 3 N → Super N , since convex combinations of nonnegative terms are nonnegative.

Proof of Proposition 7 Recalling that
Tᾱ DI SC f = (1 −ᾱ) f , clearly Tᾱ DI SC : wU I N → wU I N , Further, suppose that f ∈ Bd N . Let 1 ≤ i ≤ K be arbitrary, and let x be such that x, x + e i ∈ S N . Then, The first inequality is due to f ∈ Bd N (i). This implies Tᾱ DI SC f ∈ Bd N (i). Hence Tᾱ DI SC : Bd N → Bd N .