Optimal control of queueing systems with non-collaborating servers

Abstract

We study the dynamic server allocation problem for tandem queueing systems with an equal number of stations and servers. The servers are flexible, yet non-collaborative, so that at most one server can work at a station at any time. The objective is to maximize the long-run average throughput. We show that if each server is the fastest at one station, then a dedicated server assignment policy is optimal for systems of arbitrary size and with general service requirement distributions. Otherwise, the optimal policy is more complex as servers must divide their time between stations. For Markovian systems with two stations and two servers, we characterize the optimal policy completely. For larger Markovian systems, we use our results for two-station systems to propose four heuristic server assignment policies and provide computational results that show that our heuristics are near-optimal. We also compare collaborative and non-collaborative settings to evaluate the benefits of dynamic server allocation, as opposed to collaboration, in systems with flexible servers. We conclude that the loss in the long-run average throughput due to lack of collaboration is mitigated by the similarity of the tasks in the system, and cross-training can still be beneficial in non-collaborative systems.

References

  1. Ahn, H., Duenyas, I., Lewis, M.: Optimal control of a two-stage tandem queuing system with flexible servers. Probab. Eng. Inf. Sci. 16(4), 453–469 (2002)

  2. Ahn, H., Duenyas, I., Zhang, R.Q.: Optimal stochastic scheduling of a two-stage tandem queue with parallel servers. Adv. Appl. Probab. 31(4), 1095–1117 (1999)

  3. Ahn, H., Duenyas, I., Zhang, R.Q.: Optimal control of a flexible server. Adv. Appl. Probab. 36(1), 139–170 (2004)

  4. Ahn, S., Righter, R.: Dynamic load balancing with flexible workers. Adv. Appl. Probab. 38(3), 621–642 (2006)

  5. Andradóttir, S., Ayhan, H.: Throughput maximization for tandem lines with two stations and flexible servers. Oper. Res. 53(3), 516–531 (2005)

  6. Andradóttir, S., Ayhan, H., Down, D.G.: Server assignment policies for maximizing the steady-state throughput of finite queueing systems. Manag. Sci. 47(10), 1421–1439 (2001)

  7. Andradóttir, S., Ayhan, H., Down, D.G.: Dynamic server allocation for queueing networks with flexible servers. Oper. Res. 51(6), 952–968 (2003)

  8. Andradóttir, S., Ayhan, H., Down, D.G.: Dynamic assignment of dedicated and flexible servers in tandem lines. Probab. Eng. Inf. Sci. 21(4), 497–538 (2007)

  9. Andradóttir, S., Ayhan, H., Down, D.G.: Queueing systems with synergistic servers. Oper. Res. 59(3), 772–780 (2011)

  10. Andradóttir, S., Ayhan, H., Down, D.G.: Optimal assignment of servers to tasks when collaboration is inefficient. Queueing Syst. 75(1), 79–110 (2013)

  11. Argon, N.T., Andradóttir, S.: Partial pooling in tandem lines with cooperation and blocking. Queueing Syst. 52(1), 5–30 (2006)

  12. Arumugam, R., Mayorga, M.E., Taaffe, K.M.: Inventory based allocation policies for flexible servers in serial systems. Ann. Oper. Res. 172(1), 1–23 (2009)

  13. Bartholdi, J., Eisenstein, D.: A production line that balances itself. Oper. Res. 44(1), 21–34 (1996)

  14. Brown, G.G., Geoffrion, A.M., Bradley, G.H.: Production and sales planning with limited shared tooling at the key operation. Manag. Sci. 27(3), 247–259 (1981)

  15. Gargeya, V.B., Deane, R.H.: Scheduling in the dynamic job shop under auxiliary resource constraints: a simulation study. Int. J. Prod. Res. 37(12), 2817–2834 (1999)

  16. Hasenbein, J.J., Kim, B.: Throughput maximization for two station tandem systems: a proof of the Andradóttir-Ayhan conjecture. Queueing Syst. 67(4), 365–386 (2011)

  17. Hillier, F.S., Boling, R.W.: The effect of some design factors on the efficiency of production lines with variable operation times. J. Ind. Eng. 17(1), 651–657 (1966)

  18. Hopp, W.J., Tekin, E., Van Oyen, M.P.: Benefits of skill chaining in serial production lines with cross-trained workers. Manag. Sci. 50(1), 83–98 (2004)

  19. Hopp, W.J., Van Oyen, M.P.: Agile workforce evaluation: a framework for cross-training and coordination. IIE Trans. 36(10), 919–940 (2004)

  20. Kim, H.W., Yu, J.M., Kim, J.S., Doh, H.H., Lee, D.H., Nam, S.H.: Loading algorithms for flexible manufacturing systems with partially grouped unrelated machines and additional tooling constraints. Int. J. Adv. Manuf. Technol. 58(5), 683–691 (2012)

  21. Mandelbaum, A., Stolyar, A.L.: Scheduling flexible servers with convex delay costs: heavy-traffic optimality of the generalized \(c\mu \)-rule. Oper. Res. 52(6), 836–855 (2004)

  22. Mayorga, M.E., Taaffe, K.M., Arumugam, R.: Allocating flexible servers in serial systems with switching costs. Ann. Oper. Res. 172(1), 231–242 (2009)

  23. Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)

  24. Schiefermayr, K., Weichbold, J.: A complete solution for the optimal stochastic scheduling of a two-stage tandem queue with two flexible servers. J. Appl. Probab. 42(3), 778–796 (2005)

  25. Sennott, L.I., Van Oyen, M.P., Iravani, S.: Optimal dynamic assignment of a flexible worker on an open production line with specialists. Eur. J. Oper. Res. 170(2), 541–566 (2006)

  26. Tsai, Y.C., Argon, N.T.: Dynamic server assignment policies for assembly-type queues with flexible servers. Nav. Res. Logist. 55(3), 234–251 (2008)

  27. Van Oyen, M.P., Gel, E., Hopp, W.J.: Performance opportunity for workforce agility in collaborative work systems. IIE Trans. 33(9), 761–777 (2001)

Acknowledgments

This work was supported by the National Science Foundation under Grant CMMI-0856600. The research of the third author was also supported by the National Science Foundation under Grant CMMI-0969747. The authors thank the associate editor and two anonymous referees for their helpful comments and suggestions.

Correspondence to Tuğçe Işık.

Appendix

Proof of Proposition 3

Under the optimal policy identified in Theorem 1, all states are recurrent and the optimal throughput is positive given that \(\mu _{11}>\mu _{21}\) and \(\mu _{22}>\mu _{12}\), since we then must have \(\mu _{11}>0\) and \(\mu _{22}>0\). To prove uniqueness, let us first eliminate the idling actions. If \(s=0\), then only the second station is starved, and assigning a server to the second station is the same as idling that server. Therefore, actions \(a_{10}\) and \(a_{01}\) are equivalent to actions \(a_{12}\) and \(a_{21}\), respectively. Moreover, actions under which no server works at the first station (i.e., \(a_{20},a_{02},a_{00}\)) result in zero throughput. Similarly, if \(s=B_1+2\), then only the first station is blocked and assigning a server to the first station is the same as idling that server. Therefore, actions \(a_{02}\) and \(a_{20}\) are equivalent to actions \(a_{12}\) and \(a_{21}\), respectively, and actions under which no server works at the second station (i.e., \(a_{10},a_{01},a_{00}\)) result in zero throughput. Thus, we do not have to consider idling actions in these states. Furthermore, if a policy uses one of the actions \(\{a_{00},a_{10}, a_{01}, a_{20},a_{02}\}\) in some state \(s\in \{1,\dots ,B_1+1\}\), then the states \(s-1\) and \(s+1\) do not communicate, and the recurrent classes correspond to systems with a smaller buffer space. Let us denote the optimal policy given in Proposition 1 by \(\pi ^*\). Under policy \(\pi ^*\), the resulting Markov chain is a birth-death process with birth rate \(\mu _{11}\), death rate \(\mu _{22}\), and stationary distribution \(\rho \), where \(\rho (s)=\frac{\mu _{11}^s\mu _{22}^{B_1+2-s}}{\sum _{i=0}^{B_1+2}\mu _{11}^i\mu _{22}^{B_1+2-i}}\) for \(s\in \{0,\dots ,B_1+2\}\). Thus, one can compute the long-run average throughput under the policy \(\pi ^*\) as

$$\begin{aligned} {T^{\pi ^*}(B_1)=\frac{\sum _{i=0}^{B_1+1}\mu _{11}^{i+1}\mu _{22}^{B_1+2-i}}{\sum _{i=0}^{B_1+2}\mu _{11}^i \mu _{22}^{B_1+2-i}},} \end{aligned}$$

and show that it is strictly increasing in the buffer size \(B_1\), since

$$\begin{aligned} {T^{\pi ^*}(B_1)-T^{\pi ^*}(B_1-1)}&{=\frac{\mu _{11}^{B_1+2}\mu _{22}^{B_1+2}}{\Big (\sum _{i=0}^{B_1+2}\mu _{11}^i \mu _{22}^{B_1+2-i}\Big )\Big (\sum _{i=0}^{B_1+1}\mu _{11}^i \mu _{22}^{B_1+1-i}\Big )}>0}. \end{aligned}$$

Therefore, any policy that results in smaller recurrent classes cannot be optimal, the idling actions are strictly suboptimal, and we can reduce our action space to \(\{a_{12}, a_{21}\}\).
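As a numerical sanity check (not part of the original proof), the following Python sketch computes \(T^{\pi ^*}(B_1)\) both from the closed form above and from the stationary distribution \(\rho \), and confirms that the throughput is strictly increasing in \(B_1\); the rates \(\mu _{11},\mu _{22}\) are arbitrary illustrative values.

```python
# Sanity check of T^{pi*}(B_1): closed form vs. birth-death stationary
# distribution, and strict monotonicity in the buffer size B_1.
from fractions import Fraction

mu11, mu22 = Fraction(3), Fraction(2)  # illustrative rates, mu11, mu22 > 0

def throughput_closed_form(B1):
    # T^{pi*}(B_1) as displayed above
    num = sum(mu11 ** (i + 1) * mu22 ** (B1 + 2 - i) for i in range(B1 + 2))
    den = sum(mu11 ** i * mu22 ** (B1 + 2 - i) for i in range(B1 + 3))
    return num / den

def throughput_birth_death(B1):
    # Stationary distribution rho(s) proportional to mu11^s mu22^{B1+2-s};
    # departures from station 2 occur at rate mu22 in every state s >= 1.
    rho = [mu11 ** s * mu22 ** (B1 + 2 - s) for s in range(B1 + 3)]
    return mu22 * sum(rho[1:]) / sum(rho)

for B1 in range(1, 6):
    assert throughput_closed_form(B1) == throughput_birth_death(B1)
    assert throughput_closed_form(B1) > throughput_closed_form(B1 - 1)
```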

Given that all states must be recurrent under an optimal policy, it is easy to see that the action \(a_{12}\) is uniquely optimal in the end states when \(\mu _{11}>\mu _{21}\) and \(\mu _{22}>\mu _{12}\) because the transition probabilities of the embedded discrete-time Markov chain out of states 0 and \(B_1+2\) do not depend on the action we choose, and the sojourn times in these states are strictly smaller under action \(a_{12}\).

Next, let us assume we have a policy \(\pi \) that uses action \(a_{21}\) at some state \(s\in \{1,\dots ,B_1+1\}\). Note that if \(\mu _{12}=0\) or \(\mu _{21}=0\), policy \(\pi \) results in a recurrent class that corresponds to a system with a smaller buffer size, and hence policy \(\pi \) is strictly suboptimal given that \(\mu _{11}> \mu _{21}, \mu _{22}> \mu _{12}\). Hence we can assume that both \(\mu _{12}\) and \(\mu _{21}\) are strictly positive for the rest of the proof. We will construct a randomized policy \(\pi '\) that is exactly the same as policy \(\pi \) except in state s, has the same embedded chain as policy \(\pi \), and has a strictly smaller expected sojourn time at state s.

First, assume that \(\mu _{11}\mu _{12} \le \mu _{21}\mu _{22}\), and let policy \(\pi '\) use action \(a_{12}\) with probability p and action \(a_{10}\) with probability \(1-p\) in state s, where

$$\begin{aligned} {p=\frac{\mu _{12}(\mu _{11}+\mu _{22})}{\mu _{22}(\mu _{12}+\mu _{21})}.} \end{aligned}$$

Note that \(p\in (0,1]\) due to the assumptions that \(\mu _{11}\mu _{12} \le \mu _{21}\mu _{22}\) and \(\mu _{12}>0,\mu _{21}>0\). Then, under policy \(\pi '\), the transition probabilities out of state s will be the same as under policy \(\pi \) (\(P_{s,s-1}=\frac{\mu _{12}}{\mu _{12}+\mu _{21}}\), \(P_{s,s+1}=\frac{\mu _{21}}{\mu _{12}+\mu _{21}}\)) and the expected sojourn time at state s will become

$$\begin{aligned} {\eta '=\frac{\mu _{21}}{\mu _{11}(\mu _{12}+\mu _{21})},} \end{aligned}$$

whereas the expected sojourn time under policy \(\pi \) is \(\eta =\frac{1}{\mu _{12}+\mu _{21}}\), and \(\eta '< \eta \) for \(\mu _{11}> \mu _{21}\). Hence the long-run average throughput is strictly larger under policy \(\pi '\).

When \(\mu _{11}\mu _{12}>\mu _{21}\mu _{22}\), a similar randomized policy can be constructed using actions \(a_{12}\), \(a_{02}\), and \(p=\frac{\mu _{21}(\mu _{11}+\mu _{22})}{\mu _{11}(\mu _{12}+\mu _{21})}\) at state s. Again the resulting transition probabilities will be unchanged and the mean sojourn time for state s will be strictly smaller due to our assumption \(\mu _{22} > \mu _{12}\). Thus, there exists a randomized policy \(\pi '\) that has a larger long-run average throughput than policy \(\pi \). Theorem 9.1.8 of Puterman [23] implies that there must also exist a deterministic server assignment policy that performs at least as well as the randomized policy \(\pi '\). Hence we conclude that if \(\mu _{11}\ge \mu _{21}, \mu _{22}\ge \mu _{12}\) (\(\mu _{11}> \mu _{21}, \mu _{22}> \mu _{12}\)), then the action \(a_{21}\) is suboptimal (strictly suboptimal) at any state s. Therefore the policy given in Proposition 2 is the unique optimal policy when \(\mu _{11}> \mu _{21}, \mu _{22}> \mu _{12}\). \(\square \)
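The randomized-policy construction above is easy to verify numerically. A minimal sketch follows, with illustrative rates (not from the paper) satisfying \(\mu _{11}>\mu _{21}\), \(\mu _{22}>\mu _{12}\), and \(\mu _{11}\mu _{12}\le \mu _{21}\mu _{22}\): with the stated p, policy \(\pi '\) leaves the embedded transition probabilities out of state s unchanged while strictly reducing the expected sojourn time.

```python
# Check the randomized policy pi' from the proof of Proposition 3.
from fractions import Fraction

mu11, mu12, mu21, mu22 = map(Fraction, (4, 1, 2, 3))  # illustrative rates
assert mu11 > mu21 and mu22 > mu12 and mu11 * mu12 <= mu21 * mu22

# Under pi (action a21 in state s): birth rate mu21, death rate mu12.
P_down = mu12 / (mu12 + mu21)
eta = 1 / (mu12 + mu21)                  # mean sojourn time under pi

# Under pi': a12 w.p. p (birth mu11, death mu22), a10 w.p. 1-p (birth mu11).
p = mu12 * (mu11 + mu22) / (mu22 * (mu12 + mu21))
assert 0 < p <= 1
P_down_prime = p * mu22 / (mu11 + mu22)  # a10 can only move the chain up
eta_prime = p / (mu11 + mu22) + (1 - p) / mu11

assert P_down_prime == P_down            # same embedded chain
assert eta_prime == mu21 / (mu11 * (mu12 + mu21))
assert eta_prime < eta                   # strictly smaller sojourn time
```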

Proof of Theorem 2

We use Policy Iteration to show that the policy defined in the theorem is optimal. Let us choose the initial decision rule \(\delta _{0}=\delta ^{s^*}\) as in Theorem 2, and let \(\pi _0\) denote the corresponding policy. We can assume that \(\mu _{11}>0\) and \(\mu _{12}>0\) because otherwise (since \(\mu _{11}\ge \mu _{21}\) and \(\mu _{12}\ge \mu _{22}\)) there is a station at which neither server can work and the throughput of all policies is zero. Note that if \(\mu _{11}=0\) or \(\mu _{12}=0\), then \(f(i)=0\) for all \(i\in \{1,\dots , B_1+3\}\), and thus the uniqueness condition given in Theorem 2 does not hold. Also, we assume that at least one of \(\mu _{21},\mu _{22}\) is nonzero, because if \(\mu _{21}=\mu _{22}=0\), then only one server can work at the two stations and any Markovian deterministic policy with nonzero long-run average throughput, including the one given in Theorem 2, is optimal. In this case, \(f(1)=\mu _{11}\mu _{12}^{B_1+2}>0\), \(f(B_1+3)=-\mu _{11}^{B_1+2}\mu _{12}<0\), and \(f(i)=0\) for \(1<i<B_1+3\), and the uniqueness condition given in Theorem 2 does not hold.

We start the Policy Iteration algorithm for a communicating model. Let \(r_{\delta _{0}}\) and \(P_{\delta _{0}}\) denote the corresponding reward vector and probability transition matrix for the decision rule \(\delta _{0}\), respectively. Without loss of generality, the uniformization constant can be taken as 1. We have

$$\begin{aligned} P_{\delta _0}(s,s')= & {} {\left\{ \begin{array}{ll} {1-\mu _{11}} &{} {\text {for }\ s=0, s'=0,}\\ {\mu _{11}} &{} {\text {for }\ s=0, s'=1,}\\ {0} &{} {\text {for }\ s=0, s'\ge 2,} \\ {\mu _{22}} &{} {\text {for } 1\le s \le s^*-1, s'=s-1,}\\ {1-\mu _{22}-\mu _{11}} &{} {\text {for } 1\le s \le s^*-1, s'=s,}\\ {\mu _{11}} &{} {\text {for } 1\le s \le s^*-1, s'=s+1,}\\ {0} &{} {\text {for }1\le s \le s^*-1, s'> s+1 \text { or } s'< s-1,}\\ {\mu _{12}} &{} {\text {for } s^*\le s \le B_1+1, s'=s-1,}\\ {1-\mu _{12}-\mu _{21}} &{} {\text {for } s^*\le s \le B_1+1, s'=s,}\\ {\mu _{21}} &{} {\text {for } s^*\le s \le B_1+1, s'=s+1,}\\ {0} &{} {\text {for } s^*\le s \le B_1+1, s'> s+1 \text { or } s' < s-1,}\\ {\mu _{12}} &{} {\text {for } s=B_1+2, s'=B_1+1,}\\ {1-\mu _{12}} &{} {\text {for } s=B_1+2, s'=B_1+2,}\\ {0} &{} {\text {for } s=B_1+2, s'\le B_1}; \end{array}\right. }\\ r_{\delta _0}(s)= & {} {\left\{ \begin{array}{ll} {0} &{} {\text {for}\ s=0,}\\ {\mu _{22}} &{} {\text {for}\ 1\le s\le s^*-1,}\\ {\mu _{12}} &{} {\text {for}\ s^*\le s\le B_{1}+2.} \end{array}\right. } \end{aligned}$$

Note that \(\mu _{11}>0\) and \(\mu _{12}>0\) imply that the decision rule \(\delta _{0}\) yields a unichain structure. We can solve the following equation to find \(g_{0}\) and \(h_{0}\):

$$\begin{aligned} {r_{\delta _{0}}-g_{0}e+(P_{\delta _{0}}-I)h_{0}=0,} \end{aligned}$$
(6)

where e is the vector of ones and \(h_{0}(0)=0\). In particular, \(g_{0}=T^{(\delta ^{s^*})^\infty }(B_1)\), where the function T is defined in Eqs. (2) and (3).

For \(s\le s^{*}\):

$$\begin{aligned} {h_0(s)}&{=g_{0}\sum _{i=0}^{s-1}(s-i)\frac{\mu _{22}^{i}}{\mu _{11}^{i+1}}-\sum _{i=0}^{s-2}(s-1-i)\frac{\mu _{22}^{i+1}}{\mu _{11}^{i+1}}\ .} \end{aligned}$$

For \(s>s^{*}\):

$$\begin{aligned} {h_0(s)}&=g_{0}\left( \sum _{i=0}^{s^{*}-1}(s^{*}-i)\frac{\mu _{22}^{i}}{\mu _{11}^{i+1}}\frac{\mu _{12}^{s-s^{*}}}{\mu _{21}^{s-s^{*}}}\right. \\&\qquad \left. +\frac{(\mu _{22}-\mu _{12})}{\mu _{21}}\sum _{i=0}^{s-s^{*}-1}\frac{\mu _{12}^{i}}{\mu _{21}^{i}}\sum _{j=0}^{s^{*}-2}(s^{*}-1-j)\frac{\mu _{22}^{j}}{\mu _{11}^{j+1}}\right. \\&\left. \qquad +\sum _{i=0}^{s-s^{*}-1}(s-i)\frac{\mu _{12}^{i}}{\mu _{21}^{i+1}}+\frac{(\mu _{21}-\mu _{11})}{\mu _{21}}\sum _{i=0}^{s-s^{*}-1}\frac{\mu _{12}^{i}}{\mu _{21}^{i}}\sum _{j=1}^{s^{*}-1}(s^{*}-j)\frac{\mu _{22}^{j}}{\mu _{11}^{j+1}}\right) \\&\qquad -\sum _{i=0}^{s-s^{*}-1}(s-s^{*}-i)\frac{\mu _{12}^{i+1}}{\mu _{21}^{i+1}}\\&\qquad +\frac{(\mu _{22}-\mu _{12})}{\mu _{21}}\sum _{i=0}^{s-s^{*}-1}\frac{\mu _{12}^{i}}{\mu _{21}^{i}}\sum _{j=0}^{s^{*}-3}(s^{*}-2-j)\frac{\mu _{22}^{j+1}}{\mu _{11}^{j+1}}\\&{\qquad +\sum _{i=0}^{s^{*}-2}(s^{*}-1-i)\frac{\mu _{22}^{i+1}}{\mu _{11}^{i+1}}\frac{\mu _{12}^{s-s^{*}}}{\mu _{21}^{s-s^{*}}}+(s^{*}-1)\frac{\mu _{22}}{\mu _{21}}\sum _{i=0}^{s-s^{*}-1}\frac{\mu _{12}^{i}}{\mu _{21}^{i}} }\\&{\qquad +\frac{(\mu _{21}-\mu _{11})}{\mu _{21}}\sum _{i=0}^{s-s^{*}-1}\frac{\mu _{12}^{i}}{\mu _{21}^{i}}\sum _{j=0}^{s^{*}-2}(s^{*}-1-j)\frac{\mu _{22}^{j+1}}{\mu _{11}^{j+1}}.} \end{aligned}$$

The Policy Iteration algorithm terminates, proving that \(\pi _{0}\) is optimal, if the following is true for all states \(s\in \{0,1,\ \dots ,\ B_{1}+2\}\) and for all actions \(a\in A_s\) other than \(\delta _{0}(s)\):

$$\begin{aligned} {{\varDelta }(s,a)=r(s,a)+\sum _{j\in S}p(j|s,a)h_{0}(j)-r(s,\delta _{0}(s))-\sum _{j\in S}p(j|s,\delta _{0}(s))h_{0}(j)\le 0.} \end{aligned}$$
(7)

We will examine the states \(s<s^{*}\) and \(s\ge s^{*}\) separately, since our decision rule \(\delta _{0}\) prescribes different actions on these two sets. We first show that the inequality (7) holds for the non-idling actions (i.e., \(a_{21}\) when \(0\le s<s^*\), and \(a_{12}\) when \(s^*\le s\le B_1+2\)).
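As an illustration of this policy-iteration step (a numerical sketch, not from the paper), the following code solves the evaluation equation (6) for \(g_0\) and \(h_0\) under a cutoff rule \(\delta ^{s^*}\) and then evaluates the improvement test (7) for the competing non-idling action in every state. The function f defining \(s^*\) is not reproduced in this appendix, so the sketch simply searches over all cutoffs for one at which (7) holds; the rates are illustrative values with \(\mu _{11}\ge \mu _{21}\) and \(\mu _{12}\ge \mu _{22}\), scaled so that the uniformization constant is 1.

```python
# One policy-iteration step for the cutoff rule delta^{s*} on S = {0,...,B1+2}:
# solve the evaluation equation (6) with h0(0) = 0, then check test (7).
import numpy as np

mu11, mu12, mu21, mu22 = 0.35, 0.25, 0.2, 0.15  # mu11 >= mu21, mu12 >= mu22
B1 = 3
S = B1 + 3  # number of states

def row_and_reward(s, action):
    # Uniformized transition row and reward (station-2 departure rate).
    up, down = (mu11, mu22) if action == "a12" else (mu21, mu12)
    if s == 0:
        down = 0.0          # station 2 starved: its server idles
    if s == S - 1:
        up = 0.0            # station 1 blocked: its server idles
    row = np.zeros(S)
    if up > 0:
        row[s + 1] = up
    if down > 0:
        row[s - 1] = down
    row[s] = 1.0 - up - down
    return row, down

def evaluate(s_star):
    # Build P and r for delta^{s*} (a12 below the cutoff, a21 from it on)
    # and solve r - g e + (P - I) h = 0 with h(0) = 0, cf. Eq. (6).
    P, r = np.zeros((S, S)), np.zeros(S)
    for s in range(S):
        P[s], r[s] = row_and_reward(s, "a12" if s < s_star else "a21")
    A = np.hstack([np.ones((S, 1)), (np.eye(S) - P)[:, 1:]])
    sol = np.linalg.solve(A, r)
    return sol[0], np.concatenate([[0.0], sol[1:]]), P, r

for s_star in range(1, S):
    g0, h0, P, r = evaluate(s_star)
    Delta = []
    for s in range(S):
        a = "a21" if s < s_star else "a12"  # the competing non-idling action
        row, rew = row_and_reward(s, a)
        Delta.append(rew + row @ h0 - r[s] - P[s] @ h0)
    if max(Delta) <= 1e-12:  # (7) holds: policy iteration terminates
        print(f"s* = {s_star} passes (7); optimal throughput g0 = {g0:.6f}")
```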

Define

$$\begin{aligned} {{\varGamma }=\mu _{12}^{B_1+3}\sum _{i=0}^{s^{*}-1}\mu _{11}^{i}\mu _{22}^{s^{*}-1-i}+\mu _{11}^{s^{*}}\sum _{i=0}^{B_1+2-s^{*}}\mu _{12}^{s^*+i}\mu _{21}^{B_1+2-s^{*}-i}.} \end{aligned}$$

Note that \({\varGamma }>0\) under our assumptions that \(\mu _{11}>0,\mu _{12}>0\). If \(0\le s<s^*\), the left-hand side of (7) with \(a=a_{21}\) becomes \({\varDelta }(s,a_{21})=\frac{{\varGamma }_1(s)}{{\varGamma }}\), where

$$\begin{aligned} {{\varGamma }_{1}(s)}&{ =(\mu _{21}-\mu _{11})\mu _{11}^{s^{*}-s-1}\mu _{22}^{s}\sum _{i=0}^{B_1+1-s^{*}}\mu _{12}^{s^{*}+1+i}\mu _{21}^{B_1+2-s^*-i}\nonumber }\\&\quad {+\,(\mu _{12}-\mu _{22})\mu _{12}^{s^{*}}\mu _{21}^{B_1+3-s^*}\sum _{i=0}^{s-1}\mu _{11}^{s^*-s+i}\mu _{22}^{s-1-i}\nonumber }\\&\quad {+\,(\mu _{21}-\mu _{11})\mu _{12}^{B_1+3}\sum _{i=0}^{s^*-s-1}\mu _{11}^{i}\mu _{22}^{s^*-1-i}.} \end{aligned}$$

It follows with some algebra that \({\varGamma }_1(s^{*}-1)\) is a negative multiple of \(f(s^{*})\), in particular, \({\varGamma }_1(s^{*}-1)=-f(s^*)\mu _{12}^{s^*}\). Since \(f(s^{*})\ge 0\), \({\varGamma }_1(s^{*}-1)\) is nonpositive and \({\varDelta }(s^*-1,a_{21})\le 0\), proving our claim at state \(s^{*}-1\). Next we prove that \({\varGamma }_1(s)\) is nondecreasing in s by showing \({\varGamma }_1(s-1)-{\varGamma }_1(s)\le 0\) for \(1<s<s^{*}\). We have

$$\begin{aligned} {{\varGamma }_1(s-1)-{\varGamma }_1(s) =}&-\mu _{11}^{s^{*}-s-1}\mu _{22}^{s-1}(\mu _{11}\mu _{12}-\mu _{21}\mu _{22})\\&\left( (\mu _{11}-\mu _{21})\mu _{12}^{s^{*}}\sum _{i=0}^{B_1+1-s^{*}}\mu _{12}^{B_1+2-s^{*}-i}\mu _{21}^{i}+\mu _{11}\mu _{12}^{B_1+2}\right) . \end{aligned}$$

It follows from our assumptions \(\mu _{11}\ge \mu _{21}\) and \(\mu _{12}\ge \mu _{22}\) that the above expression is nonpositive. Hence \({\varGamma }_1(s)\) is nondecreasing in s, and we have \({\varDelta }(s,a_{21})\le 0\) for all \({0\le s<s^{*}}\). Furthermore, the inequality (7) is strict for \(0\le s<s^*\) unless \(f(s^*)= 0\) (since \(f(s^*)>0\) implies \({{\varGamma }_1(s^*-1)<0}\)).

On the other hand, if \(s^*\le s\le B_1+2\), the left-hand side of (7) with \(a=a_{12}\) becomes \({\varDelta }(s,a_{12})=\frac{{\varGamma }_2(s)}{{\varGamma }}\), where

$$\begin{aligned} {{\varGamma }_{2}(s)}&{= \mu _{11}^{s^{*}}(\mu _{22}-\mu _{12})\sum _{i=0}^{s-s^{*}-1}\mu _{12}^{s^*+i}\mu _{21}^{B_1-s^{*}-i+2}}\\&\quad {+\,(\mu _{11}-\mu _{21})\mu _{22}^{s^{*}}\sum _{i=0}^{B_1+1-s}\mu _{21}^{i}\mu _{12}^{B_1+2-i}\nonumber }\\&\quad {+\,(\mu _{22}-\mu _{12})\mu _{12}^{s}\mu _{21}^{B_1+2-s}\sum _{i=0}^{s^{*}-1}\mu _{11}^{i+1}\mu _{22}^{s^{*}-1-i}.} \end{aligned}$$

Evaluating \({\varGamma }_{2}(s)\) at \(s=s^*\), one can show that \({\varGamma }_{2}(s^*)\) is a positive multiple of \(f(s^*+1)\), namely \({\varGamma }_{2}(s^*)=f(s^*+1)\mu _{12}^{s^*}\). By the definition of \(s^{*}\), we know that \(f(s^{*}+1)\le 0\). Therefore we have \({\varGamma }_{2}(s^*)\le 0\) and \({\varDelta }(s^*,a_{12})\le 0\), proving our claim for \(s=s^{*}\). We prove that \({\varGamma }_{2}(s)\) is nonincreasing in s by showing that \({\varGamma }_{2}(s)-{\varGamma }_{2}(s+1)\ge 0\) for \(s^*\le s<B_1+1\). We have

$$\begin{aligned} {{\varGamma }_{2}(s)-{\varGamma }_{2}(s+1)}&=-\mu _{12}^s\mu _{21}^{B_1+1-s}(\mu _{21}\mu _{22}\\&\quad -\mu _{11}\mu _{12})\left( (\mu _{12}-\mu _{22})\sum _{i=0}^{s^*-2}\mu _{22}^i\mu _{11}^{s^*-1-i}+\mu _{12}\mu _{22}^{s^*-1}\right) . \end{aligned}$$

Due to our assumptions \(\mu _{11}\ge \mu _{21}\) and \(\mu _{12}\ge \mu _{22}\), it follows that \({\varGamma }_{2}(s)-{\varGamma }_{2}(s+1)\ge 0\). Therefore, \({\varDelta }(s^{*},a_{12})\le 0\) implies that \({\varDelta }(s,a_{12})\le 0\) for all states s such that \(s^{*}\le s \le B_1+2\). Also, inequality (7) is strict for all states \(s^*\le s\le B_1+2\) unless \(f(s^*+1)=0\) (since \(f(s^*+1)<0\) implies \({\varGamma }_2(s^*)<0\)).

Since inequality (7) holds for all states \(s\in \{0,\dots ,B_1+2\}\) and all non-idling actions, we have shown that the policy \((\delta _0)^{\infty }\) is optimal among non-idling policies.

From the arguments given in the proof of Proposition 3, it follows that we do not need to consider idling actions for \(s=0\) and \(s=B_1+2\) (because idling actions are either equivalent to non-idling actions or they are strictly suboptimal). Here we use induction on \(B_1\) to show that a policy that uses an idling action in any of the states \(s\in \{1,\dots ,B_1+1\}\) cannot be optimal.

If \(B_1=0\), the state space becomes \(S=\{0,1,2\}\) and it is enough to prove that idling actions are strictly suboptimal in state \(s=1\). Note that the transition probabilities of the embedded discrete-time Markov chain out of states \(s=0\) and \(s=2\) do not depend on the action we choose, and the sojourn times in these states are smaller under actions \(a_{12}, a_{21}\), respectively. Thus, these actions are optimal in \(s=0\) and \(s=2\). Let \(\delta _a\) denote the decision rule that uses an idling action \(a\in \{a_{00},a_{10}, a_{01}, a_{20}, a_{02}\}\) in \(s=1\), and non-idling actions \(a_{12},a_{21}\) in \(s=0\), \(s=2\), respectively. It is easy to see that decision rule \(\delta _{a_{00}}\) is strictly suboptimal since it results in zero throughput. Note that \(\delta _0\in \{\delta ^1,\delta ^2\}\) and define \(\kappa _{a,i}=T^{(\delta ^i)^\infty }(0)-T^{(\delta _a)^\infty }(0)\) for \(i\in S\backslash \{0\}=\{1,2\}\). We have

$$\begin{aligned} {\kappa _{a_{10},1}}&={\frac{\mu _{11}\mu _{12}^2\mu _{21}}{(\mu _{11}+\mu _{12})(\mu _{12}^2+\mu _{11}\mu _{12}+\mu _{11}\mu _{21})},}\\ {\kappa _{a_{02},1}}&{=\frac{(\mu _{12}-\mu _{22})\mu _{11}^2\mu _{12}+(\mu _{12}-\mu _{22})\mu _{11}^2\mu _{21}+\mu _{11}\mu _{12}\mu _{21}\mu _{22}}{(\mu _{11}+\mu _{22})(\mu _{12}^2+\mu _{11}\mu _{12}+\mu _{11}\mu _{21})},}\\ {\kappa _{a_{01},1}}&{=\frac{(\mu _{11}-\mu _{21})\mu _{12}^3+\mu _{11}\mu _{12}^2\mu _{21}}{(\mu _{12}+\mu _{21})(\mu _{12}^2+\mu _{11}\mu _{12}+\mu _{11}\mu _{21})},}\\ {\kappa _{a_{20},1}}&{=\frac{\mu _{11}\mu _{12}^2\mu _{21}}{(\mu _{11}+\mu _{12})(\mu _{12}^2+\mu _{11}\mu _{12}+\mu _{11}\mu _{21})},}\\ {\kappa _{a_{10},2}}&{=\frac{\mu _{11}^2\mu _{12}\mu _{22}}{(\mu _{11}+\mu _{12})(\mu _{11}^2+\mu _{11}\mu _{12}+\mu _{12}\mu _{22})},}\\ {\kappa _{a_{02},2}}&{=\frac{\mu _{11}^3(\mu _{12}-\mu _{22})+\mu _{11}^2\mu _{12}\mu _{22}}{(\mu _{11}+\mu _{22})(\mu _{11}^2+\mu _{11}\mu _{12}+\mu _{12}\mu _{22})},}\\ {\kappa _{a_{01},2}}&{=\frac{(\mu _{11}-\mu _{21})\mu _{12}^2\mu _{22}+(\mu _{11}-\mu _{21})\mu _{11}\mu _{12}^2+\mu _{11}\mu _{12}\mu _{21}\mu _{22}}{(\mu _{12}+\mu _{21})(\mu _{11}^2+\mu _{11}\mu _{12}+\mu _{12}\mu _{22})},}\\ {\kappa _{a_{20},2}}&{=\frac{\mu _{11}^2\mu _{12}\mu _{22}}{(\mu _{11}+\mu _{12})(\mu _{11}^2+\mu _{11}\mu _{12}+\mu _{12}\mu _{22})}.} \end{aligned}$$

We consider three cases. First, if \(\mu _{21}>0,\mu _{22}>0\), we have \(\kappa _{a,i}>0\) for all \(i\in \{1,2\}\) and \({a\in \{a_{10}, a_{01}, a_{20}, a_{02}\}}\). Similarly, if \(\mu _{21}>0,\mu _{22}=0\), then \(\kappa _{a,1}>0\) for all \(a\in \{a_{10}, a_{01}, a_{20}, a_{02}\}\). Finally, if \(\mu _{21}=0,\mu _{22}>0\), we have \(\kappa _{a,2}>0\) for all \(a\in \{a_{10}, a_{01}, a_{20}, a_{02}\}\). Therefore, for any idling policy \((\delta _a)^\infty \), there exists a non-idling policy \((\delta ^i)^\infty \) that performs strictly better, and thus idling actions are strictly suboptimal for \(B_1=0\) in all three cases.
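These positivity claims can be verified numerically. A small sketch (illustrative rates with \(\mu _{21},\mu _{22}>0\), not from the paper) computes the throughput of each decision rule for \(B_1=0\) from the stationary distribution of the corresponding continuous-time chain; here \(\delta ^1\) uses \(a_{21}\) in state 1 and \(\delta ^2\) uses \(a_{12}\). It also reproduces the closed form of \(\kappa _{a_{20},1}\) given above.

```python
# Throughput for B1 = 0 under each decision rule: a12 in state 0, a21 in
# state 2, and the indicated action in state 1. kappa_{a,i} > 0 confirms
# that every idling rule is strictly worse than delta^1 and delta^2.
import numpy as np

mu11, mu12, mu21, mu22 = 0.4, 0.3, 0.25, 0.2  # illustrative, all positive

# (birth, death) rates in state s = 1 for each action:
rates = {"a12": (mu11, mu22), "a21": (mu21, mu12), "a10": (mu11, 0.0),
         "a01": (mu21, 0.0), "a20": (0.0, mu12), "a02": (0.0, mu22)}

def throughput(action):
    up1, down1 = rates[action]
    Q = np.array([[-mu11, mu11, 0.0],
                  [down1, -(up1 + down1), up1],
                  [0.0, mu12, -mu12]])
    # Stationary distribution: pi Q = 0 with sum(pi) = 1 (unichain).
    A = np.vstack([Q.T, np.ones(3)])
    pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
    return pi[1] * down1 + pi[2] * mu12  # departure rate from station 2

T = {a: throughput(a) for a in rates}
for a in ("a10", "a01", "a20", "a02"):
    assert T["a21"] > T[a] and T["a12"] > T[a]  # kappa_{a,1}, kappa_{a,2} > 0

# Closed form of kappa_{a20,1}, with denominator factor (mu11 + mu12):
kappa = mu11 * mu12**2 * mu21 / (
    (mu11 + mu12) * (mu12**2 + mu11 * mu12 + mu11 * mu21))
assert abs(T["a21"] - T["a20"] - kappa) < 1e-12
```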

Assume now that the non-idling decision rule \(\delta _0\) is optimal among all possible decision rules for all buffer sizes \(B_1\le B'_1\). (Note that \(\delta _0\) depends on the buffer size \(B_1\), but we suppress this in our notation.) For buffer size \(B_1'+1\), assume there exists an optimal decision rule \(\delta '\) that uses an idling action at some state \(s\in \{1,\dots ,B_1'+2\}\). Under decision rule \(\delta '\), states \(s-1\) and \(s+1\) do not communicate and the resulting recurrent classes correspond to systems with buffer size strictly smaller than \(B_1'+1\). Let \(B''<B_1'+1\) denote the buffer size for any one of the resulting systems. By our assumption, \(\delta _0\) is optimal for this system, hence the (constant) long-run average throughput achieved by \(\delta '\) must be equal to that of \(\delta _0\). We now show that this leads to a contradiction.

We have \(T^{(\delta ^i)^\infty }(B_1+1)-T^{(\delta ^i)^\infty }(B_1)=\frac{\nu _1(i,B_1)}{\nu _2(i,B_1)\nu _2(i,B_1+1)}\) for all \(i\in \{1,\dots ,B_1+2\}\), where

$$\begin{aligned} {\nu _1(i,k)}&{=\mu _{11}^{i}\mu _{12}^{k+3+i}\mu _{21}^{k+3-i}\left( (\mu _{12}-\mu _{22})\sum _{j=0}^{i-2}\mu _{22}^j\mu _{11}^{i-1-j}+\mu _{12}\mu _{22}^{i-1}\right) ,}\\ {\nu _2(i,k)}&{=\mu _{11}^{i}\sum _{j=0}^{k+2-i}\mu _{12}^{i+j}\mu _{21}^{k+2-i-j}+\mu _{12}^{k+3}\sum _{j=0}^{i-1}\mu _{11}^j\mu _{22}^{i-1-j}.} \end{aligned}$$

If \(\mu _{21}>0,\mu _{22}>0\), then \(T^{\delta ^i}(B_1+1)-T^{\delta ^i}(B_1)>0\) for all \(i\in \{1,\dots , B_1+2\}\), implying that the throughput achieved by the decision rule \(\delta _0\) is strictly increasing in the buffer size. Similarly, if \(\mu _{21}>0,\mu _{22}=0\), we have \(f(1)\ge 0\), \(f(i)<0\) for \(i\in \{2,\dots ,B_1+3\}\) and \({s^*=1}\), \(\delta _0=\delta ^1\) for all buffer sizes, and the throughput achieved by the decision rule \(\delta _0\) is strictly increasing in buffer size since \({T^{(\delta ^1)^\infty }(B_1+1)-T^{(\delta ^1)^\infty }(B_1)>0}\). Finally, if \(\mu _{21}=0,\mu _{22}>0\), we have \(f(i)>0\) for \(i\in \{1,\dots ,B_1+2\}\), \(f(B_1+3)\le 0\), and \(s^*=B_1+2\). Thus, \(\delta _0=\delta ^{B_1+2}\) for buffer size \(B_1\) and \(\delta _0=\delta ^{B_1+3}\) for buffer size \(B_1+1\). We have \({T^{(\delta ^{B_1+3})^\infty }(B_1+1)-T^{(\delta ^{B_1+2})^\infty }(B_1)=\frac{\nu _3(B_1)}{\nu _4(B_1)\nu _4(B_1+1)}>0}\), where

$$\begin{aligned} {\nu _3(k)}&{=\mu _{11}^{k+3}\mu _{12}\mu _{22}^{k+2},}\\ {\nu _4(k)}&{=\mu _{11}^{k+2}+\mu _{12}\sum _{i=0}^{k+1}\mu _{11}^{i}\mu _{22}^{k+1-i}.} \end{aligned}$$

In all three cases, the throughput achieved by the decision rule \(\delta _0\) is strictly increasing in the buffer size, and we must have \({T^{(\delta ')^\infty }(B_1'+1)=T^{(\delta _0)^\infty }(B'')<T^{(\delta _0)^\infty }(B_1'+1)}\), contradicting the assumption that the decision rule \(\delta '\) is optimal. We conclude that no policy that uses an idling action in a state \(s\in \{1,\dots ,B'_1+2\}\) can be optimal for buffer size \(B'_1+1\). By induction, it follows that idling actions are strictly suboptimal and can be eliminated for all buffer sizes. Therefore, the decision rule \(\delta _0\) is optimal in the class of Markovian stationary deterministic policies.
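A quick numerical check of the buffer-size monotonicity used above (again with illustrative rates, here all strictly positive with \(\mu _{11}\ge \mu _{21}\) and \(\mu _{12}\ge \mu _{22}\)): the throughput of each cutoff rule \(\delta ^i\) strictly increases with \(B_1\).

```python
# Check that the throughput of the cutoff rule delta^i on {0,...,B1+2}
# (a12 below the cutoff i, a21 from the cutoff on) strictly increases in B1.
from fractions import Fraction

mu11, mu12, mu21, mu22 = map(Fraction, (4, 3, 2, 1))  # illustrative rates

def throughput(i, B1):
    # Detailed balance pi(s-1) * birth = pi(s) * death, cf. P_{delta_0}.
    pi = [Fraction(1)]
    for s in range(1, B1 + 3):
        birth = mu11 if s - 1 < i else mu21  # rate of s-1 -> s
        death = mu22 if s < i else mu12      # rate of s -> s-1
        pi.append(pi[-1] * birth / death)
    # Reward: station-2 departure rate, mu22 below the cutoff, mu12 from it on.
    reward = [Fraction(0)] + [mu22 if s < i else mu12 for s in range(1, B1 + 3)]
    return sum(r * p for r, p in zip(reward, pi)) / sum(pi)

for B1 in range(5):
    for i in range(1, B1 + 3):  # cutoffs valid for both buffer sizes
        assert throughput(i, B1 + 1) > throughput(i, B1)
```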

We have shown that policies with idling actions are strictly suboptimal. To prove uniqueness among non-idling policies, we use an approach similar to that of Andradóttir and Ayhan [5] and consider a non-idling decision rule \(\delta '\) that differs from \(\delta _0\) in at least one state \(s\in S\). Let us define

$$\begin{aligned} {u}&{ = P_{\delta ^{'}}g_{0}e-g_{0}e=0,}\\ {v}&{ = r_{\delta ^{'}}+(P_{\delta ^{'}}-I)h_0 - g_0e= r_{\delta ^{'}}+P_{\delta ^{'}}h_0-(r_{\delta _0}+P_{\delta _0}h_0),} \end{aligned}$$

where we have used Eq. (6). It is shown in the proof of optimality above that \(v(s)\le 0\) for all \(s\in S\) and that if \(f(s^*)>0, f(s^*+1)<0\), we must have

$$\begin{aligned} v(s)<0 \text { for all }s\in S\text { with }\delta '(s)\ne \delta _0(s). \end{aligned}$$
(8)

It is easy to see that if both \(\mu _{21},\mu _{22}\) are positive, then \(P_{\delta _0}\) must be irreducible since \(\delta _0\) is non-idling. Moreover, \(\delta _0=\delta ^{B_1+2}\) and \(\delta _0=\delta ^{1}\) also result in irreducible transition matrices when \(\mu _{21}=0,\mu _{22}>0\) and \(\mu _{21}>0,\mu _{22}=0\), respectively. Hence, given that at least one of \(\mu _{21},\mu _{22}\) is nonzero, \(P_{\delta _0}\) is irreducible.

Since \(P_{\delta _0}\) is irreducible and \(\delta '\) differs from \(\delta _0\) in at least one state, \(\delta '\) must differ from \(\delta _0\) in at least one state \(s_0\in S\) that is recurrent under \(\delta '\). Let \(g'\) denote the (possibly state-dependent) throughput of the stationary policy \((\delta ')^{\infty }\) and define \({\varDelta } g=g'-g_0 e\). Also let \(P^*_{\delta '}\) denote the limiting matrix of \(P_{\delta '}\). Suppose \(P_{\delta '}\) has n recurrent classes, and partition \(P_{\delta '}\) such that \(P_i\) for \(i\in \{1,2,\dots ,n\}\) corresponds to transitions within recurrent class i. Also partition \(g'\), \({\varDelta } g\), and \(P^*_{\delta '}\) in a manner consistent with the partition of \(P_{\delta '}\). Lemma 9.2.5 of Puterman [23] states that \({\varDelta } g_i=P^*_i v_i\). Using this lemma and Eq. (8), we conclude that \(g'(s_0)-g_0<0\). Thus the decision rule \(\delta '\) cannot be optimal, proving that \((\delta _0)^{\infty }\) is the unique optimal policy if \(f(s^*)>0,f(s^*+1)<0\). \(\square \)

Cite this article

Işık, T., Andradóttir, S. & Ayhan, H. Optimal control of queueing systems with non-collaborating servers. Queueing Syst 84, 79–110 (2016). https://doi.org/10.1007/s11134-016-9481-2
