Abstract
We study the dynamic server allocation problem for tandem queueing systems with an equal number of stations and servers. The servers are flexible, yet non-collaborative, so that at most one server can work at a station at any time. The objective is to maximize the long-run average throughput. We show that if each server is the fastest at one station, then a dedicated server assignment policy is optimal for systems of arbitrary size and with general service requirement distributions. Otherwise, the optimal policy is more complex as servers must divide their time between stations. For Markovian systems with two stations and two servers, we characterize the optimal policy completely. For larger Markovian systems, we use our results for two-station systems to propose four heuristic server assignment policies and provide computational results that show that our heuristics are near-optimal. We also compare collaborative and non-collaborative settings to evaluate the benefits of dynamic server allocation, as opposed to collaboration, in systems with flexible servers. We conclude that the loss in the long-run average throughput due to lack of collaboration is mitigated by the similarity of the tasks in the system, and cross-training can still be beneficial in non-collaborative systems.
References
Ahn, H., Duenyas, I., Lewis, M.: Optimal control of a two-stage tandem queuing system with flexible servers. Probab. Eng. Inf. Sci. 16(4), 453–469 (2002)
Ahn, H., Duenyas, I., Zhang, R.Q.: Optimal stochastic scheduling of a two-stage tandem queue with parallel servers. Adv. Appl. Probab. 31(4), 1095–1117 (1999)
Ahn, H., Duenyas, I., Zhang, R.Q.: Optimal control of a flexible server. Adv. Appl. Probab. 36(1), 139–170 (2004)
Ahn, S., Righter, R.: Dynamic load balancing with flexible workers. Adv. Appl. Probab. 38(3), 621–642 (2006)
Andradóttir, S., Ayhan, H.: Throughput maximization for tandem lines with two stations and flexible servers. Oper. Res. 53(3), 516–531 (2005)
Andradóttir, S., Ayhan, H., Down, D.G.: Server assignment policies for maximizing the steady-state throughput of finite queueing systems. Manag. Sci. 47(10), 1421–1439 (2001)
Andradóttir, S., Ayhan, H., Down, D.G.: Dynamic server allocation for queueing networks with flexible servers. Oper. Res. 51(6), 952–968 (2003)
Andradóttir, S., Ayhan, H., Down, D.G.: Dynamic assignment of dedicated and flexible servers in tandem lines. Probab. Eng. Inf. Sci. 21(4), 497–538 (2007)
Andradóttir, S., Ayhan, H., Down, D.G.: Queueing systems with synergistic servers. Oper. Res. 59(3), 772–780 (2011)
Andradóttir, S., Ayhan, H., Down, D.G.: Optimal assignment of servers to tasks when collaboration is inefficient. Queueing Syst. 75(1), 79–110 (2013)
Argon, N.T., Andradóttir, S.: Partial pooling in tandem lines with cooperation and blocking. Queueing Syst. 52(1), 5–30 (2006)
Arumugam, R., Mayorga, M.E., Taaffe, K.M.: Inventory based allocation policies for flexible servers in serial systems. Ann. Oper. Res. 172(1), 1–23 (2009)
Bartholdi, J., Eisenstein, D.: A production line that balances itself. Oper. Res. 44(1), 21–34 (1996)
Brown, G.G., Geoffrion, A.M., Bradley, G.H.: Production and sales planning with limited shared tooling at the key operation. Manag. Sci. 27(3), 247–259 (1981)
Gargeya, V.B., Deane, R.H.: Scheduling in the dynamic job shop under auxiliary resource constraints: a simulation study. Int. J. Prod. Res. 37(12), 2817–2834 (1999)
Hasenbein, J.J., Kim, B.: Throughput maximization for two station tandem systems: a proof of the Andradóttir-Ayhan conjecture. Queueing Syst. 67(4), 365–386 (2011)
Hillier, F.S., Boling, R.W.: The effect of some design factors on the efficiency of production lines with variable operation times. J. Ind. Eng. 17(1), 651–657 (1966)
Hopp, W.J., Tekin, E., Van Oyen, M.P.: Benefits of skill chaining in serial production lines with cross-trained workers. Manag. Sci. 50(1), 83–98 (2004)
Hopp, W.J., Van Oyen, M.P.: Agile workforce evaluation: a framework for cross-training and coordination. IIE Trans. 36(10), 919–940 (2004)
Kim, H.W., Yu, J.M., Kim, J.S., Doh, H.H., Lee, D.H., Nam, S.H.: Loading algorithms for flexible manufacturing systems with partially grouped unrelated machines and additional tooling constraints. Int. J. Adv. Manuf. Technol. 58(5), 683–691 (2012)
Mandelbaum, A., Stolyar, A.L.: Scheduling flexible servers with convex delay costs: heavy-traffic optimality of the generalized \(c\mu \)-rule. Oper. Res. 52(6), 836–855 (2004)
Mayorga, M.E., Taaffe, K.M., Arumugam, R.: Allocating flexible servers in serial systems with switching costs. Ann. Oper. Res. 172(1), 231–242 (2009)
Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)
Schiefermayr, K., Weichbold, J.: A complete solution for the optimal stochastic scheduling of a two-stage tandem queue with two flexible servers. J. Appl. Probab. 42(3), 778–796 (2005)
Sennott, L.I., Van Oyen, M.P., Iravani, S.: Optimal dynamic assignment of a flexible worker on an open production line with specialists. Eur. J. Oper. Res. 170(2), 541–566 (2006)
Tsai, Y.C., Argon, N.T.: Dynamic server assignment policies for assembly-type queues with flexible servers. Nav. Res. Logist. 55(3), 234–251 (2008)
Van Oyen, M.P., Gel, E., Hopp, W.J.: Performance opportunity for workforce agility in collaborative work systems. IIE Trans. 33(9), 761–777 (2001)
Acknowledgments
This work was supported by the National Science Foundation under Grant CMMI-0856600. The research of the third author was also supported by the National Science Foundation under Grant CMMI-0969747. The authors thank the associate editor and two anonymous referees for their helpful comments and suggestions.
Appendix
Proof of Proposition 3
Under the optimal policy identified in Theorem 1, all states are recurrent and the optimal throughput is positive given that \(\mu _{11}>\mu _{21}, \mu _{22}>\mu _{12}\), since we then must have \(\mu _{11}>0,\mu _{22}>0\). To prove uniqueness, let us first eliminate the idling actions. If \(s=0\), then only the second station is starved, and assigning a server to the second station is the same as idling that server. Therefore actions \(a_{10}\) and \(a_{01}\) are equivalent to actions \(a_{12}\) and \(a_{21}\), respectively. Moreover, actions that idle the server at the first station (i.e., \(a_{20},a_{02},a_{00}\)) result in zero throughput. Similarly if \(s=B_1+2\), then only the first station is blocked and assigning a server to the first station is the same as idling that server. Therefore actions \(a_{02}\) and \(a_{20}\) are equivalent to actions \(a_{12}\) and \(a_{21}\), respectively. Moreover, actions that idle the server at the second station (i.e., \(a_{10},a_{01},a_{00}\)) result in zero throughput. Thus, we do not have to consider idling actions in these states. Furthermore, if a policy uses one of the actions \(\{a_{00},a_{10}, a_{01}, a_{20},a_{02}\}\) in some state \(s\in \{1,\dots ,B_1+1\}\), then the states \(s-1\) and \(s+1\) do not communicate, and the recurrent classes correspond to systems with a smaller buffer space. Let us denote the optimal policy given in Proposition 1 by \(\pi ^*\). Under policy \(\pi ^*\), the resulting Markov chain is a birth-death process with birth rate of \(\mu _{11}\), death rate of \(\mu _{22}\), and the stationary distribution \(\rho \), where \(\rho (s)=\frac{\mu _{11}^s\mu _{22}^{B_1+2-s}}{\sum _{i=0}^{B_1+2}\mu _{11}^i\mu _{22}^{B_1+2-i}}\) for \(s\in \{0,\dots ,B_1+2\}\). Thus, one can compute the long-run average throughput under the policy \(\pi ^*\) as
\(T^{(\pi ^*)^\infty }(B_1)=\mu _{22}(1-\rho (0)),\) and show that it is strictly increasing in the buffer size \(B_1\), since \(\rho (0)=\left( \sum _{i=0}^{B_1+2}(\mu _{11}/\mu _{22})^{i}\right) ^{-1}\) is strictly decreasing in \(B_1\).
Therefore, any policy that results in smaller recurrent classes cannot be optimal, the idling actions are strictly suboptimal, and we can reduce our action space to \(\{a_{12}, a_{21}\}\).
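As a numerical illustration of this argument, the stationary distribution \(\rho \) and the resulting throughput \(\mu _{22}(1-\rho (0))\) of the birth-death chain can be computed directly. The following sketch (with arbitrarily chosen rates and exact rational arithmetic) confirms that the throughput is strictly increasing in \(B_1\):

```python
from fractions import Fraction

def dedicated_throughput(mu11, mu22, B1):
    """Long-run average throughput of the birth-death chain induced by the
    dedicated policy: birth rate mu11, death rate mu22, states 0..B1+2."""
    n = B1 + 2
    # Unnormalized stationary weights rho(s) proportional to mu11^s * mu22^(n-s).
    w = [Fraction(mu11) ** s * Fraction(mu22) ** (n - s) for s in range(n + 1)]
    rho0 = w[0] / sum(w)
    # Departures occur at rate mu22 whenever the chain is not in state 0.
    return mu22 * (1 - rho0)

# Throughput is strictly increasing in the buffer size B1 (here mu11=2, mu22=3).
tps = [dedicated_throughput(2, 3, b) for b in range(5)]
assert all(x < y for x, y in zip(tps, tps[1:]))
```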
Given that all states must be recurrent under an optimal policy, it is easy to see that the action \(a_{12}\) is the unique optimal action in the end states when \(\mu _{11}>\mu _{21}\) and \(\mu _{22}>\mu _{12}\), because the transition probabilities of the embedded discrete-time Markov chain out of states 0 and \(B_1+2\) do not depend on the action we choose, and the sojourn times in these states are strictly smaller under action \(a_{12}\).
Next, let us assume we have a policy \(\pi \) that uses action \(a_{21}\) at some state \(s\in \{1,\dots ,B_1+1\}\). Note that if \(\mu _{12}=0\) or \(\mu _{21}=0\), policy \(\pi \) results in a recurrent class that corresponds to a system with smaller buffer size, and hence policy \(\pi \) is strictly suboptimal given that \(\mu _{11}> \mu _{21}, \mu _{22}> \mu _{12}\). Hence we can assume both \(\mu _{12},\mu _{21}\) are strictly positive for the rest of the proof. We will construct a randomized policy \(\pi '\) that is exactly the same as policy \(\pi \) except in state s, that has the same embedded chain as policy \(\pi \), and has a strictly smaller expected sojourn time at state s.
First assume we have \(\mu _{11}\mu _{12} \le \mu _{21}\mu _{22}\), and policy \(\pi '\) uses action \(a_{12}\) with probability p and action \(a_{10}\) with probability \(1-p\) in state s, where \(p=\frac{\mu _{12}(\mu _{11}+\mu _{22})}{\mu _{22}(\mu _{12}+\mu _{21})}.\)
Note that \(p\in (0,1]\) due to the assumptions that \(\mu _{11}\mu _{12} \le \mu _{21}\mu _{22}\) and \(\mu _{12}>0,\mu _{21}>0\). Then, under policy \(\pi '\), the transition probabilities out of state s will be the same as under policy \(\pi \) (\(P_{s,s-1}=\frac{\mu _{12}}{\mu _{12}+\mu _{21}}\), \(P_{s,s+1}=\frac{\mu _{21}}{\mu _{12}+\mu _{21}}\)) and the expected sojourn time at state s will become \(\eta '=\frac{p}{\mu _{11}+\mu _{22}}+\frac{1-p}{\mu _{11}}=\frac{\mu _{21}}{\mu _{11}(\mu _{12}+\mu _{21})},\)
whereas the expected sojourn time under policy \(\pi \) is \(\eta =\frac{1}{\mu _{12}+\mu _{21}}\), and \(\eta '< \eta \) for \(\mu _{11}> \mu _{21}\). Hence the long-run average throughput is strictly larger under policy \(\pi '\).
When \(\mu _{11}\mu _{12}>\mu _{21}\mu _{22}\), a similar randomized policy can be constructed using actions \(a_{12}\), \(a_{02}\), and \(p=\frac{\mu _{21}(\mu _{11}+\mu _{22})}{\mu _{11}(\mu _{12}+\mu _{21})}\) at state s. Again the resulting transition probabilities will be unchanged and the mean sojourn time for state s will be strictly smaller due to our assumption \(\mu _{22} > \mu _{12}\). Thus, there exists a randomized policy \(\pi '\) that has a larger long-run average throughput than policy \(\pi \). Theorem 9.1.8 of Puterman [23] implies that there must also exist a deterministic server assignment policy that performs at least as well as the randomized policy \(\pi '\). Hence we conclude that if \(\mu _{11}\ge \mu _{21}, \mu _{22}\ge \mu _{12}\) (\(\mu _{11}> \mu _{21}, \mu _{22}> \mu _{12}\)), then the action \(a_{21}\) is suboptimal (strictly suboptimal) at any state s. Therefore the policy given in Proposition 2 is the unique optimal policy when \(\mu _{11}> \mu _{21}, \mu _{22}> \mu _{12}\). \(\square \)
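The construction in the first case can be checked numerically. Matching the downward transition probability of the embedded chain, \(p\,\mu _{22}/(\mu _{11}+\mu _{22})=\mu _{12}/(\mu _{12}+\mu _{21})\), yields \(p=\mu _{12}(\mu _{11}+\mu _{22})/(\mu _{22}(\mu _{12}+\mu _{21}))\); the sketch below (with illustrative rates satisfying the case conditions) verifies that the embedded chain is unchanged while the expected sojourn time strictly decreases:

```python
# Illustrative rates with mu11*mu12 <= mu21*mu22, mu11 > mu21, and mu22 > mu12.
mu11, mu12, mu21, mu22 = 2.0, 1.0, 1.0, 3.0

# Randomization probability for mixing action a_12 (prob p) with a_10 (prob 1-p).
p = mu12 * (mu11 + mu22) / (mu22 * (mu12 + mu21))
assert 0.0 < p <= 1.0

# Under a_12 a departure wins the exponential race with probability
# mu22/(mu11+mu22); under a_10 the only possible transition is a birth.
P_down = p * mu22 / (mu11 + mu22)
assert abs(P_down - mu12 / (mu12 + mu21)) < 1e-12  # embedded chain unchanged

# Expected sojourn times: the mixture beats action a_21 whenever mu11 > mu21.
eta_prime = p / (mu11 + mu22) + (1.0 - p) / mu11
eta = 1.0 / (mu12 + mu21)
assert eta_prime < eta
```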
Proof of Theorem 2
We use Policy Iteration to show that the policy defined in the theorem is optimal. Let us choose the initial decision rule \(\delta _{0}=\delta ^{s^*}\) as in Theorem 2, and let \(\pi _0\) denote the corresponding policy. We can assume that \(\mu _{11}>0\) and \(\mu _{12}>0\) because otherwise there is a station at which neither server can work and the throughput of all policies is zero. Note that if \(\mu _{11}=0\) or \(\mu _{12}=0\), then \(f(i)=0\) for all \(i\in \{1,\dots , B_1+3\}\), and thus the uniqueness condition given in Theorem 2 does not hold. Also, we assume at least one of \(\mu _{21},\mu _{22}\) is nonzero, because if \(\mu _{21}=\mu _{22}=0\), only one server is available to work at the two stations, and any Markovian deterministic policy with non-zero long-run average throughput, including the one given in Theorem 2, is optimal. In this case, \(f(1)=\mu _{11}\mu _{12}^{B_1+2}>0\), \(f(B_1+3)=-\mu _{11}^{B_1+2}\mu _{12}<0\), and \(f(i)=0\) for \(1<i<B_1+3\), and the uniqueness condition given in Theorem 2 does not hold.
We start the Policy Iteration algorithm for a communicating model. Let \(r_{\delta _{0}}\) and \(P_{\delta _{0}}\) denote the corresponding reward vector and probability transition matrix for the decision rule \(\delta _{0}\), respectively. Without loss of generality, the uniformization constant can be taken as 1. We have
Note that \(\mu _{11}>0\) and \(\mu _{12}>0\) imply that the decision rule \(\delta _{0}\) yields a unichain structure. We can solve the following equation to find \(g_{0}\) and \(h_{0}\):
where e is the vector of ones and \(h_{0}(0)=0\). In particular, \(g_{0}=T^{(\delta ^{s^*})^\infty }(B_1)\), where the function T is defined in Eqs. (2) and (3).
For \(s\le s^{*}\):
For \(s>s^{*}\):
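Although the closed-form expressions for \(h_0\) are omitted here, the evaluation step itself can be reproduced numerically for any concrete decision rule by solving the linear system \(r_{\delta _0}-g_0 e+P_{\delta _0}h_0=h_0\) with \(h_0(0)=0\). A minimal sketch (the dedicated rule \(a_{12}\) in every state, illustrative rates, uniformization by the total rate) cross-checks the resulting gain against the closed-form birth-death throughput:

```python
import numpy as np

# Illustrative rates; decision rule: action a_12 (birth mu11, death mu22) in every state.
mu11, mu22 = 2.0, 3.0
B1 = 1
N = B1 + 2                       # states 0, 1, ..., N
Lam = mu11 + mu22                # uniformization constant

# Uniformized transition matrix P and one-step reward r (departure probability).
P = np.zeros((N + 1, N + 1))
r = np.zeros(N + 1)
for s in range(N + 1):
    up = mu11 / Lam if s < N else 0.0
    down = mu22 / Lam if s > 0 else 0.0
    if s < N:
        P[s, s + 1] = up
    if s > 0:
        P[s, s - 1] = down
    P[s, s] = 1.0 - up - down
    r[s] = down                  # one departure occurs w.p. `down` per step

# Solve r - g*e + P*h = h together with h(0) = 0; unknowns x = (g, h(0..N)).
A = np.zeros((N + 2, N + 2))
b = np.zeros(N + 2)
A[: N + 1, 0] = 1.0              # coefficient of g
A[: N + 1, 1:] = np.eye(N + 1) - P
b[: N + 1] = r
A[N + 1, 1] = 1.0                # enforce h(0) = 0
g0 = np.linalg.solve(A, b)[0] * Lam  # gain, back in original time units

# Cross-check against the closed-form throughput mu22 * (1 - rho(0)).
w = [mu11 ** s * mu22 ** (N - s) for s in range(N + 1)]
assert abs(g0 - mu22 * (1.0 - w[0] / sum(w))) < 1e-9
```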
The Policy Iteration algorithm terminates, proving that \(\pi _{0}\) is optimal, if the following is true for all states \(s\in \{0,1,\ \dots ,\ B_{1}+2\}\) and for all actions \(a\in A_s\) other than \(\delta _{0}(s)\):
We will examine states \(s\in S\) separately for \(s<s^{*}\) and \(s\ge s^{*}\) as our decision rule \(\delta _{0}\) takes different actions for states \(s<s^{*}\) and \(s\ge s^{*}\). We first show that the inequality (7) holds for non-idling actions (i.e., \(a_{21}\) when \(0\le s<s^*\), and \(a_{12}\) when \(s^*\le s\le B_1+2\)).
Define
Note that \({\varGamma }>0\) under our assumptions that \(\mu _{11}>0,\mu _{12}>0\). If \(0\le s<s^*\), the right-hand side of (7) with \(a=a_{21}\) becomes \({\varDelta }(s,a_{21})=\frac{{\varGamma }_1(s)}{{\varGamma }}\), where
It follows with some algebra that \({\varGamma }_1(s^{*}-1)\) is a negative multiple of \(f(s^{*})\), in particular, \({\varGamma }_1(s^{*}-1)=-f(s^*)\mu _{12}^{s^*}\). Since \(f(s^{*})\ge 0\), \({\varGamma }_1(s^{*}-1)\) is nonpositive and \({\varDelta }(s^*-1,a_{21})\le 0\), proving our claim at state \(s^{*}-1\). Next we prove that \({\varGamma }_1(s)\) is nondecreasing in s by showing \({\varGamma }_1(s-1)-{\varGamma }_1(s)\le 0\) for \(1<s<s^{*}\). We have
It follows from our assumptions \(\mu _{11}\ge \mu _{21}\) and \(\mu _{12}\ge \mu _{22}\) that the above expression is nonpositive. Hence \({\varGamma }_1(s)\) is nondecreasing in s, and we have \({\varDelta }(s,a_{21})\le 0\) for all \({0\le s<s^{*}}\). Furthermore, the inequality (7) is strict for \(0\le s<s^*\) unless \(f(s^*)= 0\) (because \({{\varGamma }_1(s^*-1)<0}\)).
On the other hand, if \(s^*\le s\le B_1+2\), the right-hand side of expression (7) with \(a=a_{12}\) becomes \({\varDelta }(s,a_{12})=\frac{{\varGamma }_2(s)}{{\varGamma }}\), where
Evaluating \({\varGamma }_{2}(s)\) at \(s=s^*\), one can show that \({\varGamma }_{2}(s^*)\) is a positive multiple of \(f(s^*+1)\), namely \({\varGamma }_{2}(s^*)=f(s^*+1)\mu _{12}^{s^*}\). By the definition of \(s^{*}\), we know that \(f(s^{*}+1)\le 0\). Therefore we have \({\varGamma }_{2}(s^*)\le 0\) and \({\varDelta }(s^*,a_{12})\le 0\), proving our claim for \(s=s^{*}\). Next we prove that \({\varGamma }_{2}(s)\) is nonincreasing in s by showing that \({\varGamma }_{2}(s)-{\varGamma }_{2}(s+1)\ge 0\) for \(s^*\le s<B_1+1\). We have
Due to our assumptions \(\mu _{11}\ge \mu _{21}\) and \(\mu _{12}\ge \mu _{22}\), it follows that \({\varGamma }_{2}(s)-{\varGamma }_{2}(s+1)\ge 0\). Therefore, \({\varDelta }(s^{*},a_{12})\le 0\) implies that \({\varDelta }(s,a_{12})\le 0\) for all states s such that \(s^{*}\le s \le B_1+2\). Also, inequality (7) is strict for all states \(s^*\le s\le B_1+2\) unless \(f(s^*+1)=0\) (because \({\varGamma }_2(s^*)<0\)).
Since inequality (7) holds for all states \(s\in \{0,\dots ,B_1+2\}\) and all non-idling actions, we have shown that the policy \((\delta _0)^{\infty }\) is optimal among non-idling policies.
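Optimality among non-idling policies can also be verified by brute force on small instances: every non-idling Markovian deterministic policy induces a birth-death chain whose throughput is straightforward to compute. The sketch below (illustrative rates with \(\mu _{11}\ge \mu _{21}\) and \(\mu _{12}\ge \mu _{22}\)) checks that a threshold rule of the form \(\delta ^{s^*}\) attains the maximum over all non-idling policies:

```python
from itertools import product

# Illustrative rates with server 1 at least as fast at both stations.
mu11, mu12, mu21, mu22 = 2.0, 1.5, 1.0, 1.0
B1 = 2
N = B1 + 2                       # states 0, 1, ..., N

def throughput(policy):
    """policy[s] in {'a12', 'a21'}; long-run throughput of the induced chain."""
    birth = {'a12': mu11, 'a21': mu21}   # service completions at station 1
    death = {'a12': mu22, 'a21': mu12}   # service completions at station 2
    # Unnormalized birth-death stationary weights: w(s+1) = w(s)*b(s)/d(s+1).
    w = [1.0]
    for s in range(N):
        w.append(w[-1] * birth[policy[s]] / death[policy[s + 1]])
    # Departures occur at the state's death rate whenever s > 0.
    return sum(w[s] * death[policy[s]] for s in range(1, N + 1)) / sum(w)

best = max(throughput(pol) for pol in product(('a12', 'a21'), repeat=N + 1))
# Some threshold rule (a_12 below s*, a_21 from s* on) attains the maximum.
best_threshold = max(
    throughput(('a12',) * k + ('a21',) * (N + 1 - k)) for k in range(N + 2)
)
assert abs(best - best_threshold) < 1e-12
```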
From the arguments given in the proof of Proposition 3, it follows that we do not need to consider idling actions for \(s=0\) and \(s=B_1+2\) (because idling actions are either equivalent to non-idling actions or they are strictly suboptimal). Here we use induction on \(B_1\) to show that a policy that uses an idling action in any of the states \(s\in \{1,\dots ,B_1+1\}\) cannot be optimal.
If \(B_1=0\), the state space becomes \(S=\{0,1,2\}\) and it is enough to prove that idling actions are strictly suboptimal in state \(s=1\). Note that the transition probabilities of the embedded discrete-time Markov chain out of states \(s=0\) and \(s=2\) do not depend on the action we choose, and the sojourn times in these states are smaller under actions \(a_{12}, a_{21}\), respectively. Thus, these actions are optimal in \(s=0\) and \(s=2\). Let \(\delta _a\) denote the decision rule that uses an idling action \(a\in \{a_{00},a_{10}, a_{01}, a_{20}, a_{02}\}\) in \(s=1\), and non-idling actions \(a_{12},a_{21}\) in \(s=0\), \(s=2\), respectively. It is easy to see that decision rule \(\delta _{a_{00}}\) is strictly suboptimal since it results in zero throughput. Note that \(\delta _0\in \{\delta ^1,\delta ^2\}\) and define \(\kappa _{a,i}=T^{(\delta ^i)^\infty }(0)-T^{(\delta _a)^\infty }(0)\) for \(i\in S\backslash \{0\}=\{1,2\}\). We have
We consider three cases. First, if \(\mu _{21}>0,\mu _{22}>0\), we have \(\kappa _{a,i}>0\) for all \(i\in \{1,2\}\) and \({a\in \{a_{10}, a_{01}, a_{20}, a_{02}\}}\). Similarly, if \(\mu _{21}>0,\mu _{22}=0\), then \(\kappa _{a,1}>0\) for all \(a\in \{a_{10}, a_{01}, a_{20}, a_{02}\}\). Finally, if \(\mu _{21}=0,\mu _{22}>0\), we have \(\kappa _{a,2}>0\) for all \(a\in \{a_{10}, a_{01}, a_{20}, a_{02}\}\). Therefore, for any idling policy \((\delta _a)^\infty \), there exists a non-idling policy \((\delta ^i)^\infty \) that performs strictly better, and thus idling actions are strictly suboptimal for \(B_1=0\) in all three cases.
Assume now that the non-idling decision rule \(\delta _0\) is optimal among all possible decision rules for all buffer sizes \(B_1\le B'_1\). (Note that \(\delta _0\) depends on the buffer size \(B_1\), but we suppress this in our notation.) For buffer size \(B_1'+1\), assume there exists an optimal decision rule \(\delta '\) that uses an idling action at some state \(s\in \{1,\dots ,B_1'+2\}\). Under decision rule \(\delta '\), states \(s-1\) and \(s+1\) do not communicate and the resulting recurrent classes correspond to systems with buffer size strictly smaller than \(B_1'+1\). Let \(B''<B_1'+1\) denote the buffer size for any one of the resulting systems. By our assumption, \(\delta _0\) is optimal for this system, hence the (constant) long-run average throughput achieved by \(\delta '\) must be equal to that of \(\delta _0\). We now show that this leads to a contradiction.
We have \(T^{\delta ^i}(B_1+1)-T^{\delta ^i}(B_1)=\frac{\nu _1(i,B_1)}{\nu _2(i,B_1)\nu _2(i,B_1+1)}\) for all \(i\in \{1,\dots ,B_1+2\}\), where
If \(\mu _{21}>0,\mu _{22}>0\), then \(T^{\delta ^i}(B_1+1)-T^{\delta ^i}(B_1)>0\) for all \(i\in \{1,\dots , B_1+2\}\), implying that the throughput achieved by the decision rule \(\delta _0\) is strictly increasing in the buffer size. Similarly, if \(\mu _{21}>0,\mu _{22}=0\), we have \(f(1)\ge 0\), \(f(i)<0\) for \(i\in \{2,\dots ,B_1+3\}\) and \({s^*=1}\), \(\delta _0=\delta ^1\) for all buffer sizes, and the throughput achieved by the decision rule \(\delta _0\) is strictly increasing in buffer size since \({T^{(\delta ^1)^\infty }(B_1+1)-T^{(\delta ^1)^\infty }(B_1)>0}\). Finally, if \(\mu _{21}=0,\mu _{22}>0\), we have \(f(i)>0\) for \(i\in \{1,\dots ,B_1+2\}\), \(f(B_1+3)\le 0\), and \(s^*=B_1+2\). Thus, \(\delta _0=\delta ^{B_1+2}\) for buffer size \(B_1\) and \(\delta _0=\delta ^{B_1+3}\) for buffer size \(B_1+1\). We have \({T^{(\delta ^{B_1+3})^\infty }(B_1+1)-T^{(\delta ^{B_1+2})^\infty }(B_1)=\frac{\nu _3(B_1)}{\nu _4(B_1)\nu _4(B_1+1)}>0}\), where
In all three cases, the throughput achieved by the decision rule \(\delta _0\) is strictly increasing in the buffer size, and we must have \({T^{(\delta ')^\infty }(B_1'+1)=T^{(\delta _0)^\infty }(B'')<T^{(\delta _0)^\infty }(B_1'+1)}\), contradicting the assumption that the decision rule \(\delta '\) is optimal. We conclude that no policy that uses an idling action in a state \(s\in \{1,\dots ,B'_1+2\}\) can be optimal for buffer size \(B'_1+1\). By induction, it follows that idling actions are strictly suboptimal and can be eliminated for all buffer sizes. Therefore, the decision rule \(\delta _0\) is optimal in the class of Markovian stationary deterministic policies.
We have shown that policies with idling actions are strictly suboptimal. To prove uniqueness among non-idling policies, we use an approach similar to that of Andradóttir and Ayhan [5] and consider a non-idling decision rule \(\delta '\) that differs from \(\delta _0\) in at least one state \(s\in S\). Let us define
where we have used Eq. (6). It is shown in the proof of optimality above that \(v(s)\le 0\) for all \(s\in S\) and that if \(f(s^*)>0, f(s^*+1)<0\), we must have
It is easy to see that if both \(\mu _{21},\mu _{22}\) are positive, then \(P_{\delta _0}\) must be irreducible since \(\delta _0\) is non-idling. Moreover, \(\delta _0=\delta ^{B_1+2}\) and \(\delta _0=\delta ^{1}\) also result in irreducible transition matrices when \(\mu _{21}=0,\mu _{22}>0\) and \(\mu _{21}>0,\mu _{22}=0\), respectively. Hence, given that at least one of \(\mu _{21},\mu _{22}\) is nonzero, \(P_{\delta _0}\) is irreducible.
Since \(P_{\delta _0}\) is irreducible and \(\delta '\) differs from \(\delta _0\) in at least one state, it must differ from \(\delta _0\) in at least one state \(s_0\in S\) that is recurrent under \(\delta '\). Let \(g'\) denote the (possibly state-dependent) throughput of the stationary policy \((\delta ')^{\infty }\) and define \({\varDelta } g=g'-g_0 e\). Also let \(P^*_{\delta '}\) denote the limiting matrix of \(P_{\delta '}\). Suppose \(P_{\delta '}\) has n recurrent classes, and partition \(P_{\delta '}\) such that \(P_i\) for \(i\in \{1,2,\dots ,n\}\) corresponds to transitions within recurrent class i. Also partition \(g'\), \({\varDelta } g\), and \(P^*_{\delta '}\) in a manner consistent with the partition of \(P_{\delta '}\). Lemma 9.2.5 of Puterman [23] states that \({\varDelta } g_i=P^*_i v_i\). Using this lemma and Eq. (8), we conclude that \(g'(s_0)-g_0<0\). Thus the decision rule \(\delta '\) cannot be optimal, proving that \((\delta _0)^{\infty }\) is the unique optimal policy if \(f(s^*)>0,f(s^*+1)<0\). \(\square \)
Işık, T., Andradóttir, S. & Ayhan, H. Optimal control of queueing systems with non-collaborating servers. Queueing Syst 84, 79–110 (2016). https://doi.org/10.1007/s11134-016-9481-2
Keywords
- Tandem queueing networks
- Flexible servers
- Non-collaborative systems
- Finite buffers
- Throughput optimality
- Markov decision processes