
Admit or preserve? Addressing server failures in cloud computing task management


Abstract

Cloud computing task management plays a critical role in the efficient operation of cloud resources, i.e., the servers. Task management handles critical and complicated decisions, overcoming the inherent dynamic nature of cloud computing systems and the additional complexity due to the large magnitude of resources in such systems (tens of thousands of servers). Because servers may fail, task management must make both task admission and task preservation decisions. Moreover, both decisions require considering future system trajectories and the interplay between preservation and admission. In this paper we study the combined problem of task admission and preservation in the dynamic environment of cloud computing systems through analysis of a queueing system based on a Markov decision process (MDP). We show that the optimal operational policy is of a double switching-curve type. At face value, extracting the optimal policy is rather complicated, yet our analysis reveals that the optimal policy can be reduced to a single rule, since the rules can effectively be decoupled. Based on this result, we propose two heuristic approaches that approximate the optimal rule for the most relevant system settings in cloud computing. Our results provide a simple policy scheme for the combined admission and preservation problem that can be applied in complex cloud computing environments and eliminates the need for sophisticated real-time control mechanisms.




Notes

  1. There are multiple methods to investigate a finite state-space MDP. [18, 39] formulate an additional representation of the value function for states on the system’s boundaries, which then needs to be considered when investigating the value function properties. A scheme described in [16] and used in [1, 37, 38] is based on a state-dependent operator, in our system \(K(M-m_a-M_f)^{+}\), which is added to the value function; by choosing K sufficiently large, the system is prevented from exiting the original state space. The scheme that we use defines the value function in states beyond the boundary as infinity, an approach that is similar to the previous one with \(K=\infty \).

  2. In [16], \(T_{disc}f(x_1,x_2)=C(x_1,x_2)+\delta f(x_1,x_2)\), where \(C(x_1,x_2)\in \mathbb {R}\) is convex and represents the direct state cost.

  3. We use \(1/\mu _f=30,120\) to emphasize the impact of the properties under investigation.

  4. We note that \(\delta \) is set to this very high value as our system includes virtual events (due to the uniformization) that do not impact the value function yet force us to give more weight to actual events.

  5. We stop the value iteration once the relative change of V (for all elements in the state space) between consecutive iterations drops below the convergence criterion.
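For concreteness, the following is a minimal sketch of a value-iteration loop using one reading of this relative-change stopping rule; it is our illustration, not the authors' implementation, and the Bellman operator shown is a hypothetical two-action placeholder rather than the paper's operator.

import numpy as np

def value_iteration(bellman, shape, eps=1e-6, max_iter=1_000_000):
    # Iterate V <- T(V) until the largest relative change over all states drops below eps.
    V = np.zeros(shape)
    for _ in range(max_iter):
        V_new = bellman(V)
        rel_change = np.max(np.abs(V_new - V) / np.maximum(np.abs(V), 1.0))
        V = V_new
        if rel_change < eps:
            break
    return V

def toy_bellman(V, delta=0.999, c_stay=1.0, c_move=0.5):
    # Hypothetical two-action operator with made-up dynamics, only to make the sketch runnable.
    move = np.roll(V, -1, axis=0)   # action 1: shift along the first state coordinate
    stay = V                        # action 2: remain in place
    return np.minimum(c_move + delta * move, c_stay + delta * stay)

V_star = value_iteration(toy_bellman, shape=(50, 50))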

References

  1. Altman, E., Jimenez, T., Koole, G.: On optimal call admission control in resource-sharing system. IEEE Trans. Commun. 49(9), 1659–1668 (2001). https://doi.org/10.1109/26.950352


  2. Altman, E., Koole, G.: On submodular value functions and complex dynamic programming. Commun. Stat. Stoch. Models 14(5), 1051–1072 (1998). https://doi.org/10.1080/15326349808807514


  3. Bin, E., Biran, O., Boni, O., Hadad, E., Kolodner, E., Moatti, Y., Lorenz, D.: Guaranteeing high availability goals for virtual machine placement. In: Distributed Computing Systems (ICDCS), 2011 31st International Conference on, pp. 700–709 (2011). https://doi.org/10.1109/ICDCS.2011.72

  4. Bonomi, F., Milito, R., Zhu, J., Addepalli, S.: Fog computing and its role in the internet of things. In: Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, MCC ’12, pp. 13–16. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2342509.2342513

  5. Chandra, A., Weissman, J., Heintz, B.: Decentralized edge clouds. IEEE Internet Comput. 17(5), 70–73 (2013). https://doi.org/10.1109/MIC.2013.93


  6. Chen, X., Lu, C.D., Pattabiraman, K.: Failure prediction of jobs in compute clouds: a google cluster case study. In: Software Reliability Engineering Workshops (ISSREW), 2014 IEEE International Symposium on, pp. 341–346 (2014). https://doi.org/10.1109/ISSREW.2014.105

  7. Dean, J.: Designs, lessons and advice from building large distributed systems (2009). https://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf. Keynote Speech at the 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware

  8. Efrosinin, D.: Queueing model of a hybrid channel with faster link subject to partial and complete failures. Ann. Oper. Res. 202(1), 75–102 (2013). https://doi.org/10.1007/s10479-011-0939-7


  9. Feldman, Z., Masin, M., Tantawi, A., Arroyo, D., Steinder, M.: Using approximate dynamic programming to optimize admission control in cloud computing environment. In: Simulation Conference (WSC), Proceedings of the 2011 Winter, pp. 3153–3164 (2011). https://doi.org/10.1109/WSC.2011.6148014

  10. Garraghan, P., Townend, P., Xu, J.: An empirical failure-analysis of a large-scale cloud computing environment. In: High-Assurance Systems Engineering (HASE), 2014 IEEE 15th International Symposium on, pp. 113–120 (2014). https://doi.org/10.1109/HASE.2014.24

  11. Ghoneim, H.A., Stidham Jr., S.: Control of arrivals to two queues in series. Eur. J. Oper. Res. 21(3), 399–409 (1985). https://doi.org/10.1016/0377-2217(85)90160-2


  12. Hu, Z., Wu, K., Huang, J.: An utility-based job scheduling algorithm for current computing cloud considering reliability factor. In: Software Engineering and Service Science (ICSESS), 2012 IEEE 3rd International Conference on, pp. 296–299 (2012). https://doi.org/10.1109/ICSESS.2012.6269464

  13. Javadi, B., Abawajy, J., Buyya, R.: Failure-aware resource provisioning for hybrid cloud infrastructure. J. Parallel Distrib. Comput. 72(10), 1318–1331 (2012). https://doi.org/10.1016/j.jpdc.2012.06.012


  14. Jimenez, T.: Optimal admission control for high speed networks: a dynamic programming approach. In: Decision and Control, 2000. Proceedings of the 39th IEEE Conference on, Vol. 2, pp. 1846–1851 (2000). https://doi.org/10.1109/CDC.2000.912131

  15. Koole, G.: Structural results for the control of queueing systems using event-based dynamic programming. Queueing Syst. 30(3–4), 323–339 (1998). https://doi.org/10.1023/A:1019177307418


  16. Koole, G.: Monotonicity in Markov reward and decision chains: theory and applications. Found. Trends® Stoch. Syst. 1(1), 1–76 (2006). https://doi.org/10.1561/0900000002


  17. Lavi, N., Scheim, J.: Vehicular relay nodes for cellular deployments: uplink channel modeling and analysis. In: Vehicular Technology Conference (VTC Spring), 2013 IEEE 77th, pp. 1–5 (2013). https://doi.org/10.1109/VTCSpring.2013.6692808

  18. Lerzan Örmeci, E., Burnetas, A., van der Wal, J.: Admission policies for a two class loss system. Stoch. Models 17(4), 513–539 (2001). https://doi.org/10.1081/STM-120001221


  19. Lewis, M.E., Ayhan, H., Foley, R.D.: Bias optimality in a queue with admission control. Probab. Eng. Inf. Sci. 13, 309–327 (1999)


  20. Limrungsi, N., Zhao, J., Xiang, Y., Lan, T., Huang, H., Subramaniam, S.: Providing reliability as an elastic service in cloud computing. In: Communications (ICC), 2012 IEEE International Conference on, pp. 2912–2917 (2012). https://doi.org/10.1109/ICC.2012.6364649

  21. Machida, F., Kawato, M., Maeno, Y.: Redundant virtual machine placement for fault-tolerant consolidated server clusters. In: Network Operations and Management Symposium (NOMS), 2010 IEEE, pp. 32–39 (2010). https://doi.org/10.1109/NOMS.2010.5488431

  22. Malik, S., Huet, F.: Adaptive fault tolerance in real time cloud computing. In: Services (SERVICES), 2011 IEEE World Congress on, pp. 280–287 (2011). https://doi.org/10.1109/SERVICES.2011.108

  23. Malik, S., Huet, F., Caromel, D.: Reliability aware scheduling in cloud computing. In: Internet Technology and Secured Transactions, 2012 International Conference for, pp. 194–200 (2012)

  24. Milito, R., Levy, H.: Modeling and dynamic scheduling of a queueing system with blocking and starvation. IEEE Trans. Commun. 37(12), 1318–1329 (1989). https://doi.org/10.1109/26.44203


  25. Miller, B.L.: A queueing reward system with several customer classes. Manage. Sci. 16(3), 234–245 (1969). https://doi.org/10.1287/mnsc.16.3.234


  26. Mitrany, I.L., Avi-Itzhak, B.: A many-server queue with service interruptions. Oper. Res. 16(3), 628–638 (1968). https://doi.org/10.1287/opre.16.3.628


  27. Mondal, S., Muppala, J., Machida, F., Trivedi, K.: Computing defects per million in cloud caused by virtual machine failures with replication. In: Dependable Computing (PRDC), 2014 IEEE 20th Pacific Rim International Symposium on, pp. 161–168 (2014). https://doi.org/10.1109/PRDC.2014.29

  28. Neuts, M.F., Lucantoni, D.M.: A Markovian queue with n servers subject to breakdowns and repairs. Manage. Sci. 25(9), 849–861 (1979). https://doi.org/10.1287/mnsc.25.9.849


  29. Ni, J., Tsang, D., Tatikonda, S., Bensaou, B.: Threshold and reservation based call admission control policies for multiservice resource-sharing systems. In: INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, Vol. 2, pp. 773–783 (2005). https://doi.org/10.1109/INFCOM.2005.1498309

  30. Örmeci, E.L.: Dynamic admission control in a call center with one shared and two dedicated service facilities. IEEE Trans. Autom. Control 49(7), 1157–1161 (2004). https://doi.org/10.1109/TAC.2004.831133


  31. Özkan, E., Kharoufeh, J.: Incompleteness of results for the slow-server problem with an unreliable fast server. Ann. Oper. Res. (2014). https://doi.org/10.1007/s10479-014-1615-5


  32. Özkan, E., Kharoufeh, J.P.: Optimal control of a two-server queueing system with failures. Probab. Eng. Inf. Sci. 28, 489–527 (2014). https://doi.org/10.1017/S0269964814000114


  33. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Mathematical Statistics. Wiley, New York (1994). http://opac.inria.fr/record=b1084090. A Wiley-Interscience publication

  34. Ramjee, R., Nagarajan, R., Towsley, D.: On optimal call admission control in cellular networks. In: INFOCOM ’96. Fifteenth Annual Joint Conference of the IEEE Computer Societies. Networking the Next Generation. Proceedings IEEE, Vol. 1, pp. 43–50 (1996). https://doi.org/10.1109/INFCOM.1996.497876

  35. Satyanarayanan, M., Bahl, P., Caceres, R., Davies, N.: The case for vm-based cloudlets in mobile computing. IEEE Pervasive Comput. 8(4), 14–23 (2009). https://doi.org/10.1109/MPRV.2009.82


  36. Scheim, J., Lavi, N.: Vehicular relay nodes for cellular deployment: downlink channel modeling and analysis. In: Microwaves, Communications, Antennas and Electronics Systems (COMCAS), 2013 IEEE International Conference on, pp. 1–5 (2013). https://doi.org/10.1109/COMCAS.2013.6685270

  37. Shifrin, M.: Admission and scheduling control in cloud computing—Markov decision processes and diffusion approximations. Ph.D. thesis, Technion—Israel Institute of Technology (2013)

  38. Shifrin, M., Atar, R., Cidon, I.: Optimal scheduling in the hybrid-cloud. In: Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on, pp. 51–59 (2013)

  39. Ulukus, M.Y., Güllü, R., Örmeci, L.: Admission and termination control of a two class loss system. Stoch. Models 27(1), 2–25 (2011)


  40. Unuvar, M., Doganata, Y., Tantawi, A.: Configuring cloud admission policies under dynamic demand. In: Modeling, Analysis Simulation of Computer and Telecommunication Systems (MASCOTS), 2013 IEEE 21st International Symposium on, pp. 313–317 (2013). https://doi.org/10.1109/MASCOTS.2013.42

  41. Vishwanath, K.V., Nagappan, N.: Characterizing cloud computing hardware reliability. In: Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC ’10, pp. 193–204. ACM, New York, NY, USA (2010). https://doi.org/10.1145/1807128.1807161


Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful comments throughout the reviewing process.

Author information


Corresponding author

Correspondence to Nadav Lavi.


Appendices

Properties of the preservation operator

In this section we establish the fundamental properties of the preservation operator, \(T_{CP}\), defined in (10). We continue the analysis, started in Theorem 1, of the properties essential to our value function investigation (non-decreasing, convex and supermodular). Furthermore, in order to provide a more comprehensive picture of the new preservation operator and its potential for the investigation of other systems, we also prove additional properties that it maintains and that can be useful for other types of systems.

We note that while our original problem is defined over a finite state space, we extend our investigation of this new operator to an infinite state space, i.e., \((x_1,x_2)\in \mathbb {N}_0^2\). Not having to account for the state-space boundaries makes the investigation less complex. We emphasize that, as shown in Sect. 6, the results also apply to our finite state space, as will be demonstrated below.

1.1 Essential properties maintained by the preservation operator

We focus on the essential properties necessary to show a switching-curve policy, which are used in our proof in Sect. 6; we continue their investigation from Theorem 1. We use the same definitions of \(T:A\rightarrow {B}\), I, Cx and Super as in Sect. 6.
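For readability in the derivations below, we record the explicit form in which the preservation operator appears throughout this appendix (the formal definition is (10) in the main text; the expression here is simply read off the inequalities that follow):

$$\begin{aligned} T_{CP}f(x_1,x_2)=\min \{c+f(x_1,x_2+1),\; c'+f(x_1-1,x_2+1)\}, \end{aligned}$$

where the first term corresponds to the decision to preserve the task and the second to the decision to drop it, with c and \(c'\) the respective one-step costs.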

Proof of Theorem 1

We show that the operator maintains the non-decreasing, convexity in \(x_1\) and supermodularity properties.

Non-decreasing We assume that \(f\in {\mathbf{I }}\). We need to show that the property is maintained for each of the decisions taken by the preservation operator on the RHS of the definition, i.e., preserve or drop in states \((x_1+1,x_2)\) and \((x_1,x_2+1)\). We first prove this for \(x_1\) with the preservation decision:

$$\begin{aligned}&\min \{c+f(x_1,x_2+1), c'+f(x_1-1,x_2+1)\} \\&\quad \le c+f(x_1,x_2+1) \\&\quad \le c+f(x_1+1,x_2+1). \end{aligned}$$

Similarly, we prove this for \(x_1\) with the drop decision:

$$\begin{aligned}&\min \{c+f(x_1,x_2+1), c'+f(x_1-1,x_2+1)\} \\&\quad \le c'+f(x_1-1,x_2+1) \\&\quad \le c'+f(x_1,x_2+1). \end{aligned}$$

The proof for \(x_2\) follows similar steps and is therefore omitted from the paper.

Convexity in \({\varvec{x}}_\mathbf{1}\) It is important to note that a simpler approach with weaker properties, such as \({\mathbf{I }}\cap \mathbf{Cx(1) }\rightarrow \mathbf{Cx(1) }\), could have been used to address our specific needs. However, we wish to prove the complete properties of the new preservation operator \(T_{CP}\) introduced in this paper.

Similarly to [16], we denote by \(a_1\) and \(a_2\) the preservation decisions of the last two arguments in the convexity definition after applying the operator, i.e., for \(T_{CP}f(X)\) and \(T_{CP}f(X+2e_{i})\), respectively, where \(a_i=1\) indicates a decision to preserve a task and \(a_i=0\) indicates a decision to drop a task. We now examine the four cases:
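Written with the explicit operator form recalled above, the Cx(1) inequality to be established is

$$\begin{aligned} 2\,T_{CP}f(x_1+1,x_2)\le T_{CP}f(x_1,x_2)+T_{CP}f(x_1+2,x_2), \end{aligned}$$

and its LHS, \(2\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\}\), is the quantity bounded in each of the four cases.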

\(\mathbf {a_1=a_2=1}\)

We apply convexity once and get

$$\begin{aligned}&2\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\quad \le 2(c + f(x_1+1,x_2+1)) \\&\quad \le 2c + f(x_1,x_2+1) + f(x_1+2,x_2+1) \\&\quad = c + f(x_1,x_2+1) + c + f(x_1+2,x_2+1). \end{aligned}$$

\(\mathbf {a_1=a_2=0}\)

Similarly to the previous case, we apply convexity once:

$$\begin{aligned}&2\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\quad \le 2(c' + f(x_1,x_2+1)) \\&\quad \le 2c' + f(x_1-1,x_2+1) + f(x_1+1,x_2+1) \\&\quad = c' + f(x_1-1,x_2+1) + c' + f(x_1+1,x_2+1). \end{aligned}$$

\(\mathbf {a_1=1, a_2=0}\)

This scenario is straightforward:

$$\begin{aligned}&2\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\quad \le c + f(x_1+1,x_2+1) + c' + f(x_1,x_2+1) \\&\quad = c + f(x_1,x_2+1) + c' + f(x_1+1,x_2+1). \end{aligned}$$

\(\mathbf {a_1=0, a_2=1}\)

In this scenario we apply convexity twice:

$$\begin{aligned}&2\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\quad \le c + f(x_1+1,x_2+1) + c' + f(x_1,x_2+1) \\&\quad \le c + c' + 2f(x_1+1,x_2+1) + f(x_1-1,x_2+1)- f(x_1,x_2+1) \\&\quad \le c + c' + f(x_1+2,x_2+1)+f(x_1-1,x_2+1) \\&\quad =(c'+f(x_1-1,x_2+1))+(c+f(x_1+2,x_2+1)). \end{aligned}$$

Supermodularity For clarity we note that in the two-dimensional state space Super(1,2) = Super. Similarly to the convexity proof, we apply the definition of [16] and denote the preservation decisions of the arguments on the RHS of the supermodularity definition by \(a_1\) and \(a_2\), i.e., \(a_1\) for the decision in \(T_{CP}f(X)\) and \(a_2\) for the decision in \(T_{CP}f(X+e_{i}+e_{j})\), where \(a_i=1\) indicates a decision to preserve a task and \(a_i=0\) indicates a decision to drop a task. We now prove supermodularity by examining the remaining three scenarios for \(a_1\) and \(a_2\) (recall that \(a_1=0, a_2=1\) was examined in Sect. 6).
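In the same explicit form, the supermodularity inequality to be established is

$$\begin{aligned} T_{CP}f(x_1+1,x_2)+T_{CP}f(x_1,x_2+1)\le T_{CP}f(x_1,x_2)+T_{CP}f(x_1+1,x_2+1), \end{aligned}$$

and its LHS is the sum of the two min-terms that opens each of the derivations below.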

\(\mathbf {a_1=a_2=1}\)

We apply supermodularity once and get

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1,x_2+2), c'+f(x_1-1,x_2+2)\} \\&\quad \le c+f(x_1+1,x_2+1)+c+f(x_1,x_2+2) \\&\quad \le c+f(x_1,x_2+1)+c+f(x_1+1,x_2+2). \end{aligned}$$

\(\mathbf {a_1=a_2=0}\)

We apply supermodularity once and get

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1,x_2+2), c'+f(x_1-1,x_2+2)\} \\&\quad \le c'+f(x_1,x_2+1)+c'+f(x_1-1,x_2+2) \\&\quad \le c'+f(x_1-1,x_2+1)+c'+f(x_1,x_2+2). \end{aligned}$$

\(\mathbf {a_1=1, a_2=0}\)

This case is straightforward:

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1,x_2+2), c'+f(x_1-1,x_2+2)\} \\&\quad \le c'+f(x_1,x_2+1)+c+f(x_1,x_2+2) \\&\quad =c+f(x_1,x_2+1)+c'+f(x_1,x_2+2). \end{aligned}$$

\(\square \)

Remark 2

For the non-decreasing property over a finite state space, it is necessary to investigate the state space defined by \(x_1+x_2<M\). The only difference is the investigation of the boundaries, where one shows that the inequality holds with the RHS equal to the drop decision; this case is already covered by the infinite state-space proof above.

Similarly, for the convexity and supermodularity properties over a finite state space, it is necessary to investigate the state space defined by \(x_1+x_2<M-1\). The only difference is the investigation of the boundaries, where on the RHS \(a_1\in \{0,1\}\) and \(a_2=0\); this case is also covered by the infinite state-space proof above.

1.2 Additional properties maintained by the preservation operator

As stated above, the following properties are not needed for proving the value function properties that we require in our proofs in Sects. 6 and 7. However, they may be useful for other problems. Similarly to the definitions in Sect. 6, here we follow the property definitions from [16]:

  • f(X) is upstream-increasing in \(x_i\) if \(f(X+e_{i+1})\le {f(X+e_{i})}\), \(1\le i<m,\)

  • f(X) is submodular in \(x_i, x_j\) (\(1\le {i}<{j}\le {m}\)) if

    \(f(X)+f(X+e_i+e_j)\le f(X+e_i)+f(X+e_j),\)

  • f(X) is superconvex in \(x_i, x_j\) (\(1\le {i},{j}\le {m}\), \(i\ne j\)) if

    \(f(X+e_i+e_j)+f(X+e_i)\le {f(X+2e_i)+f(X+e_j),}\)

  • f(X) is subconvex in \(x_i, x_j\) (\(1\le {i},{j}\le {m}\), \(i\ne j\)) if

    \(f(X+e_i+e_j)+f(X+e_i)\le f(X+2e_i+e_j)+f(X),\)

where \(e_i\) is a vector of the same dimension as X with all zeros and a 1 at the ith location.

Similarly to [16], we denote upstream-increasing as UI(i); if the property holds in all dimensions we simply use the notation UI. In addition, we denote submodular, superconvex and subconvex in \(x_i\) and \(x_j\) as Sub(i,j), SuperC(i,j) and SubC(i,j), respectively. If the properties hold in all of the potential dimension combinations (under the conditions stated above per property), we use the notation Sub, SuperC and SubC.
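Since the remainder of this appendix works with the two-dimensional state \((x_1,x_2)\), it is convenient to spell out the definitions above for \(i=1\), \(j=2\) (a direct specialization, not an additional assumption):

$$\begin{aligned}&\mathbf{UI }: f(x_1,x_2+1)\le f(x_1+1,x_2),\\&\mathbf{Sub }: f(x_1,x_2)+f(x_1+1,x_2+1)\le f(x_1+1,x_2)+f(x_1,x_2+1),\\&\mathbf{SuperC(1,2) }: f(x_1+1,x_2+1)+f(x_1+1,x_2)\le f(x_1+2,x_2)+f(x_1,x_2+1),\\&\mathbf{SubC(1,2) }: f(x_1+1,x_2+1)+f(x_1+1,x_2)\le f(x_1+2,x_2+1)+f(x_1,x_2). \end{aligned}$$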

Theorem 8

\(T_{CP}f(x_1,x_2): \mathbf{UI }\longrightarrow \mathbf{UI }\ \)(preserving the upstream increase property).

Proof

The upstream increase property provides insight into the relations between various dimensions of X. The proof follows the same technique as in the theorems in Sect. A.1. As we are focused on a two-dimensional state space, we only need to prove that

$$\begin{aligned} T_{CP}f(x_1,x_2+1)\le T_{CP}f(x_1+1,x_2). \end{aligned}$$
(31)

We examine the two possible decisions on the RHS of (31). We first examine preservation:

$$\begin{aligned}&\min \{c+f(x_1,x_2+2), c'+f(x_1-1,x_2+2)\} \\&\quad \le c+f(x_1,x_2+2) \\&\quad \le c+f(x_1+1,x_2+1). \end{aligned}$$

We now examine the drop decision:

$$\begin{aligned}&\min \{c+f(x_1,x_2+2), c'+f(x_1-1,x_2+2)\} \\&\quad \le c'+f(x_1-1,x_2+2) \\&\quad \le c'+f(x_1,x_2+1). \end{aligned}$$

\(\square \)

Theorem 9

\(T_{CP}f(x_1,x_2): \mathbf{Sub }\cap \mathbf{SubC(1,2) }\longrightarrow \mathbf{Sub }\ (\)preserving the submodularity property)

Proof

The submodularity property is beneficial in various complex systems and stochastic control problems, as detailed in [2]. An example of the use of submodularity in admission control can be found in [1]. We note that, based on [16] (Equation 6.3), combining submodularity and subconvexity yields component-wise convexity, i.e., \({\varvec{Sub}}(i,j)\cap {\varvec{SubC}}(i,j)\subset {\varvec{Cx}}(i)\).

Similarly to the supermodularity proof in Theorem 1, in the two-dimensional case Sub(1,2) = Sub. We follow the same scheme used in Theorem 1 (for the convexity and supermodularity investigation) and examine the preservation decisions on the RHS of the submodularity definition. We denote the decisions by \(a_1\) and \(a_2\) for \(T_{CP}f(x_1+1,x_2)\) and \(T_{CP}f(x_1,x_2+1)\), respectively, where \(a_i=1\) indicates preservation, and \(a_i=0\) indicates dropping. We examine the four scenarios for \(a_1\), \(a_2\).

\(\mathbf {a_1=a_2=1}\)

We apply submodularity once and get

$$\begin{aligned}&\min \{c+f(x_1,x_2+1), c'+f(x_1-1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c+f(x_1,x_2+1)+c+f(x_1+1,x_2+2) \\&\quad \le c+f(x_1+1,x_2+1)+c+f(x_1,x_2+2). \end{aligned}$$

\(\mathbf {a_1=a_2=0}\)

We apply submodularity once and get

$$\begin{aligned}&\min \{c+f(x_1,x_2+1), c'+f(x_1-1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c'+f(x_1-1,x_2+1)+c'+f(x_1,x_2+2) \\&\quad \le c'+f(x_1,x_2+1)+c'+f(x_1-1,x_2+2). \end{aligned}$$

\(\mathbf {a_1=0, a_2=1}\)

The proof is straightforward:

$$\begin{aligned}&\min \{c+f(x_1,x_2+1), c'+f(x_1-1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c+f(x_1,x_2+1)+c'+f(x_1,x_2+2) \\&\quad =c'+f(x_1,x_2+1)+c+f(x_1,x_2+2). \end{aligned}$$

\(\mathbf {a_1=1, a_2=0}\)

The proof is more complex and includes applying submodularity twice and then subconvexity:

$$\begin{aligned}&\min \{c+f(x_1,x_2+1), c'+f(x_1-1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c+f(x_1,x_2+1)+c'+f(x_1,x_2+2) \\&\quad \le c+c'+f(x_1,x_2+1)+f(x_1,x_2+1)+f(x_1-1,x_2+2)\\&\qquad -f(x_1-1,x_2+1). \end{aligned}$$

We can now apply \(f\in \mathbf{SubC(1,2) }\) to one of the \(f(x_1,x_2+1)\) terms and \(f\in \mathbf{Sub }\) to the second \(f(x_1,x_2+1)\) term, and get

$$\begin{aligned}&\min \{c+f(x_1,x_2+1), c'+f(x_1-1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c+f(x_1+1,x_2+1)+c'+f(x_1-1,x_2+2). \end{aligned}$$

\(\square \)

Theorem 10

\(T_{CP}f(x_1,x_2): \mathbf{SubC(1,2) }\longrightarrow \mathbf{SubC(1,2) }\ (\)preserving the subconvexity property)

Proof

The subconvexity terminology is from [16], with its definition based on [11], in which it was used in the analysis of arrivals to two queues in series. Similarly to our previous note in Theorem 9, the combination of subconvexity and submodularity results in convexity (\({\varvec{Sub}}(i,j)\cap {\varvec{SubC}}(i,j)\subset {\varvec{Cx}}(i)\)).

The proof follows the same technique as in the theorems in Sect. A.1. We examine the preservation decisions on the RHS of the subconvexity definition. We define the decisions as \(a_1\) and \(a_2\) for \(T_{CP}f(x_1,x_2)\) and \(T_{CP}f(x_1+2,x_2+1)\), respectively, where \(a_i=1\) indicates preservation, and \(a_i=0\) indicates dropping. We examine the four scenarios for \(a_1\), \(a_2\).

\(\mathbf {a_1=a_2=1}\)

We apply subconvexity once and get

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c+f(x_1+1,x_2+1)+c+f(x_1+1,x_2+2) \\&\quad \le c+f(x_1,x_2+1)+c+f(x_1+2,x_2+2). \end{aligned}$$

\(\mathbf {a_1=a_2=0}\)

We apply subconvexity once and get

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c'+f(x_1,x_2+1)+c'+f(x_1,x_2+2) \\&\quad \le c'+f(x_1-1,x_2+1)+c'+f(x_1+1,x_2+2). \end{aligned}$$

\(\mathbf {a_1=1, a_2=0}\)

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c'+f(x_1,x_2+1)+c+f(x_1+1,x_2+2) \\&\quad = c+f(x_1,x_2+1)+c'+f(x_1+1,x_2+2). \end{aligned}$$

\(\mathbf {a_1=0, a_2=1}\)

We apply subconvexity twice and get

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c+f(x_1+1,x_2+1)+c'+f(x_1,x_2+2) \\&\quad \le c+f(x_1,x_2+1)+f(x_1+2,x_2+2)-f(x_1+1,x_2+2) \\&\qquad +c'+f(x_1-1,x_2+1)+f(x_1+1,x_2+2)-f(x_1,x_2+1) \\&\quad \le c'+f(x_1-1,x_2+1)+c+f(x_1+2,x_2+2). \end{aligned}$$

\(\square \)

Theorem 11

\(T_{CP}f(x_1,x_2): \mathbf{SuperC(1,2) }\longrightarrow \mathbf{SuperC(1,2) }\ (\)preserving the superconvexity property)

Proof

Similarly to subconvexity, the superconvexity terminology was introduced in [16], with the definition based on [11]. We further note that, based on [16] (Equation 6.2), combining supermodularity and superconvexity yields component-wise convexity, i.e., \({\varvec{Super}}(i,j)\cap {\varvec{SuperC}}(i,j)\subset {\varvec{Cx}}(i)\).

The proof follows the same technique as in the theorems in Sect. A.1. We examine the preservation decisions on the RHS of the superconvexity definition. We define the decisions as \(a_1\) and \(a_2\) for \(T_{CP}f(x_1,x_2+1)\) and \(T_{CP}f(x_1+2,x_2)\), respectively, where \(a_i=1\) indicates preservation, and \(a_i=0\) indicates dropping. We examine the four scenarios for \(a_1\), \(a_2\).

\(\mathbf {a_1=a_2=1}\)

We apply superconvexity once and get

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c+f(x_1+1,x_2+1)+c+f(x_1+1,x_2+2) \\&\quad \le c+f(x_1,x_2+2)+c+f(x_1+2,x_2+1). \end{aligned}$$

\(\mathbf {a_1=a_2=0}\)

We apply superconvexity once and get

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c'+f(x_1,x_2+1)+c'+f(x_1,x_2+2) \\&\quad \le c'+f(x_1-1,x_2+2)+c'+f(x_1+1,x_2+1). \end{aligned}$$

\(\mathbf {a_1=1, a_2=0}\)

This case is straightforward:

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c+f(x_1+1,x_2+1)+c'+f(x_1,x_2+2) \\&\quad = c+f(x_1,x_2+2)+c'+f(x_1+1,x_2+1). \end{aligned}$$

\(\mathbf {a_1=0, a_2=1}\)

We apply superconvexity twice and get

$$\begin{aligned}&\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\} \\&\qquad +\min \{c+f(x_1+1,x_2+2), c'+f(x_1,x_2+2)\} \\&\quad \le c'+f(x_1,x_2+1)+c+f(x_1+1,x_2+2) \\&\quad \le c'+f(x_1-1,x_2+2)+f(x_1+1,x_2+1)-f(x_1,x_2+2) \\&\qquad +c+f(x_1+1,x_2+2) \\&\quad \le c'+f(x_1-1,x_2+2)+c+f(x_1+2,x_2+1). \end{aligned}$$

\(\square \)

To conclude, we proved that \(T_{CP}\) over a two-dimensional state space maintains the following properties:

  • Non-decreasing: \({\mathbf{I }}\longrightarrow {\mathbf{I }}\)

  • Upstream increase: \(\mathbf{UI }\longrightarrow \mathbf{UI }\)

  • Convexity in one dimension (active servers): \(\mathbf{Cx(1) }\longrightarrow \mathbf{Cx(1) }\)

  • Supermodular: \(\mathbf{Super }\longrightarrow \mathbf{Super }\)

  • Submodular: \(\mathbf{Sub }\cap \mathbf{SubC(1,2) }\longrightarrow \mathbf{Sub }\)

  • Subconvexity: \(\mathbf{SubC(1,2) }\longrightarrow \mathbf{SubC(1,2) }\)

  • Superconvexity: \(\mathbf{SuperC(1,2) }\longrightarrow \mathbf{SuperC(1,2) }\)

Properties of the combined server failure operators

Proof of Theorem 2

Convexity in \({\varvec{x}}_\mathbf{1}\) We investigate the remaining three cases of the preservation decisions on the RHS of the convexity definition, denoted by \(a_{1}\) and \(a_{2}\) for \(T_{SF}f(x_1,x_2)\) and \(T_{SF}f(x_1+2,x_2)\), respectively (\(a_i=1\) indicates a decision to preserve a task, and \(a_i=0\) indicates a decision to drop a task). As indicated above, we investigate our finite state space, i.e., \(0\le x_1+x_2\le M-2\), \(0\le x_1\le M-2\). We decompose \(T_{SF}f(x_1+1,x_2)\) into two parts: one with the \(T_{CP}\) component, and the other with the remaining components.
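To keep the bookkeeping transparent, we also record the decomposition of the combined failure operator as it is used in this appendix; the form below is inferred from the terms in (32)–(34) and (36)–(39) (the formal definition appears in the main text, possibly up to a uniformization constant that does not affect the structural properties):

$$\begin{aligned} T_{SF}f(x_1,x_2)=x_1\,T_{CP}f(x_1,x_2)+(M-x_1-x_2)f(x_1,x_2+1)+x_2f(x_1,x_2), \end{aligned}$$

which we read as: each of the \(x_1\) servers running a task may fail and trigger a preservation decision, each of the \(M-x_1-x_2\) idle servers may fail without requiring a decision, and the \(x_2\) already failed servers leave the state unchanged.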

\(\mathbf {a_1=a_2=1}\)

We first evaluate the \(T_{CP}\) components of \(T_{SF}f(x_1+1,x_2)\) using the same methods as in Theorem 1:

$$\begin{aligned}&2(x_1+1)\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\}\nonumber \\&\quad \le 2(x_1+1)(c+f(x_1+1,x_2+1))\nonumber \\&\quad \le 2(x_1+1)c+x_1(f(x_1,x_2+1)+f(x_1+2,x_2+1))+2f(x_1+1,x_2+1). \end{aligned}$$
(32)

We combine (32) and (17) and get

$$\begin{aligned} 2T_{SF}f(x_1+1,x_2)&\le 2(x_1+1)c+x_1(f(x_1,x_2+1)+f(x_1+2,x_2+1)) \\&\quad +(M-x_1-x_2)(f(x_1,x_2+1)+f(x_1+2,x_2+1)) \\&\quad +x_2(f(x_1,x_2)+f(x_1+2,x_2)) \\&=x_1(c+f(x_1,x_2+1))+(M-x_1-x_2)f(x_1,x_2+1) \\&\quad +x_2f(x_1,x_2)+(x_1+2)(c+f(x_1+2,x_2+1)) \\&\quad +(M-x_1-2-x_2)f(x_1+2,x_2+1) \\&\quad +x_2f(x_1+2,x_2). \end{aligned}$$

We note that the last step (the equality) includes the addition and subtraction of \(2f(x_1+2,x_2+1)\).

\(\mathbf {a_1=a_2=0}\)

Similarly to the case \(a_1=a_2=1\), we first use the known results of \(T_{CP}\):

$$\begin{aligned}&2(x_1+1)\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\}\nonumber \\&\quad \le 2(x_1+1)(c'+f(x_1,x_2+1))\nonumber \\&\quad \le 2(x_1+1)c'+x_1(f(x_1-1,x_2+1)+f(x_1+1,x_2+1))+2f(x_1,x_2+1). \end{aligned}$$
(33)

We now apply convexity to the rest of the components in \(T_{SF}\):

$$\begin{aligned}&2(M-x_1-1-x_2)f(x_1+1,x_2+1)+2x_2f(x_1+1,x_2)\nonumber \\&\quad =2(M-x_1-2-x_2)f(x_1+1,x_2+1)+2x_2f(x_1+1,x_2)\nonumber \\&\qquad +2f(x_1+1,x_2+1)\nonumber \\&\quad \le (M-x_1-2-x_2)(f(x_1,x_2+1)+f(x_1+2,x_2+1))\nonumber \\&\qquad +x_2(f(x_1,x_2)+f(x_1+2,x_2))+2f(x_1+1,x_2+1). \end{aligned}$$
(34)

By combining (33) and (34), we get

$$\begin{aligned} 2T_{SF}f(x_1+1,x_2)&\le x_1(c'+f(x_1-1,x_2+1))+(M-x_1-x_2)f(x_1,x_2+1) \\&\quad +x_2f(x_1,x_2)+(x_1+2)(c'+f(x_1+1,x_2+1)) \\&\quad +(M-x_1-2-x_2)f(x_1+2,x_2+1) \\&\quad +x_2f(x_1+2,x_2). \end{aligned}$$

\(\mathbf {a_1=1, a_2=0}\)

Similarly, we first investigate the \(T_{CP}\) component:

$$\begin{aligned}&2(x_1+1)\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\}\nonumber \\&\quad \le (x_1+2)(c'+f(x_1,x_2+1))+x_1(c+f(x_1+1,x_2+1)). \end{aligned}$$
(35)

We note that for the rest of the components in \(T_{SF}\) we can use (34) from the case \(a_1=a_2=0\) to combine with (35) and achieve the required result.

Supermodularity We prove in detail the four cases of the two preservation decisions made by \(T_{CP}\) on the RHS of the supermodularity definition. We recall that the decisions are denoted by \(a_{1}\) and \(a_{2}\) for \(T_{SF}f(x_1,x_2)\) and \(T_{SF}f(x_1+1,x_2+1)\), respectively, where \(a_i=1\) indicates a decision to preserve a task, and \(a_i=0\) indicates a decision to drop a task. We focus on the valid states, i.e., \(0\le x_1+x_2\le M-2\).

\(\mathbf {a_1=a_2=1}\)

We first use the results from Appendix A and get the following bound for the \(T_{CP}\) components of \(T_{SF}f(x_1+1,x_2)+T_{SF}f(x_1,x_2+1)\):

$$\begin{aligned}&(x_1+1)\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\}\nonumber \\&\qquad +x_1\min \{c+f(x_1,x_2+2), c'+f(x_1-1,x_2+2)\}\nonumber \\&\quad \le (2x_1+1)c+(x_1+1)f(x_1+1,x_2+1)+x_1f(x_1,x_2+2)\nonumber \\&\quad \le (2x_1+1)c+x_1(f(x_1,x_2+1)+f(x_1+1,x_2+2))+f(x_1+1,x_2+1). \end{aligned}$$
(36)

We now evaluate the rest of the components in \(T_{SF}f(x_1+1,x_2)+T_{SF}f(x_1,x_2+1)\):

$$\begin{aligned}&(M-x_1-1-x_2)f(x_1+1,x_2+1)+x_2f(x_1+1,x_2)\nonumber \\&\qquad +(M-x_1-x_2-1)f(x_1,x_2+2)+(x_2+1)f(x_1,x_2+1)\nonumber \\&\quad \le (M-x_1-1-x_2)(f(x_1,x_2+1)+f(x_1+1,x_2+2))\nonumber \\&\qquad +x_2(f(x_1,x_2)+f(x_1+1,x_2+1))+f(x_1,x_2+1)\nonumber \\&\quad =(M-x_1-x_2)f(x_1,x_2+1)+(M-x_1-1-x_2)f(x_1+1,x_2+2)\nonumber \\&\qquad +x_2(f(x_1,x_2)+f(x_1+1,x_2+1)). \end{aligned}$$
(37)

We combine (36) and (37) and get

$$\begin{aligned}&T_{SF}f(x_1+1,x_2)+T_{SF}f(x_1,x_2+1) \\&\quad \le x_1(c+f(x_1,x_2+1))+(M-x_1-x_2)f(x_1,x_2+1) \\&\qquad +x_2f(x_1,x_2)+((x_1+1)c+x_1f(x_1+1,x_2+2)) \\&\qquad +(M-x_1-x_2-1)f(x_1+1,x_2+2) \\&\qquad +(x_2+1)f(x_1+1,x_2+1) \\&\quad =x_1(c+f(x_1,x_2+1))+(M-x_1-x_2)f(x_1,x_2+1) \\&\qquad +x_2f(x_1,x_2)+(x_1+1)(c+f(x_1+1,x_2+2)) \\&\qquad +(M-x_1-x_2-2)f(x_1+1,x_2+2) \\&\qquad +(x_2+1)f(x_1+1,x_2+1). \end{aligned}$$

\(\mathbf {a_1=a_2=0}\)

Similarly to the case above, we first examine the \(T_{CP}\) components of \(T_{SF}f(x_1+1,x_2)+T_{SF}f(x_1,x_2+1)\):

$$\begin{aligned}&(x_1+1)\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\}\nonumber \\&\qquad +x_1\min \{c+f(x_1,x_2+2), c'+f(x_1-1,x_2+2)\}\nonumber \\&\quad \le (2x_1+1)c'+(x_1+1)f(x_1,x_2+1)+x_1f(x_1-1,x_2+2)\nonumber \\&\quad \le (2x_1+1)c'+x_1(f(x_1,x_2+2)+f(x_1-1,x_2+1))+f(x_1,x_2+1). \end{aligned}$$
(38)

We now investigate the rest of the components in \(T_{SF}f(x_1+1,x_2)+T_{SF}f(x_1,x_2+1)\):

$$\begin{aligned}&(M-x_1-1-x_2)f(x_1+1,x_2+1)+x_2f(x_1+1,x_2)\nonumber \\&\qquad +(M-x_1-x_2-1)f(x_1,x_2+2)+(x_2+1)f(x_1,x_2+1)\nonumber \\&\quad =(M-x_1-x_2-2)(f(x_1+1,x_2+1)+f(x_1,x_2+2))\nonumber \\&\qquad +x_2(f(x_1+1,x_2)+f(x_1,x_2+1))\nonumber \\&\qquad +f(x_1,x_2+1)+f(x_1+1,x_2+1)+f(x_1,x_2+2)\nonumber \\&\quad \le (M-x_1-x_2-2)(f(x_1+1,x_2+2)+f(x_1,x_2+1))\nonumber \\&\qquad +x_2(f(x_1+1,x_2+1)+f(x_1,x_2))\nonumber \\&\qquad +f(x_1,x_2+1)+f(x_1+1,x_2+1)+f(x_1,x_2+2). \end{aligned}$$
(39)

In the last inequality we applied supermodularity twice.

We combine (38) and (39), add \(c'\ge 0\) and get

$$\begin{aligned}&T_{SF}f(x_1+1,x_2)+T_{SF}f(x_1,x_2+1) \\&\quad \le x_1(c'+f(x_1-1,x_2+1))+(M-x_1-x_2)f(x_1,x_2+1) \\&\qquad +x_2f(x_1,x_2)+(x_1+1)(c'+f(x_1+1,x_2+2)) \\&\qquad +(M-x_1-x_2-2)f(x_1+1,x_2+2) +(x_2+1)f(x_1+1,x_2+1). \end{aligned}$$

\(\mathbf {a_1=1, a_2=0}\)

We examine the \(T_{CP}\) components of \(T_{SF}f(x_1+1,x_2)+T_{SF}f(x_1,x_2+1)\):

$$\begin{aligned}&(x_1+1)\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\}\nonumber \\&\qquad +x_1\min \{c+f(x_1,x_2+2), c'+f(x_1-1,x_2+2)\}\nonumber \\&\quad \le (x_1+1)(c'+f(x_1,x_2+1))+x_1(c+f(x_1,x_2+2))\nonumber \\&\quad \le x_1(c+f(x_1,x_2+1))+(x_1+1)c'+x_1f(x_1,x_2+2)+f(x_1,x_2+1). \end{aligned}$$
(40)

For the rest of the components on the LHS of the supermodularity definition, we use the results of the case \(a_1=a_2=0\) as described in (39). We combine (39) and (40) and get

$$\begin{aligned}&T_{SF}f(x_1+1,x_2)+T_{SF}f(x_1,x_2+1)\nonumber \\&\quad \le x_1(c+f(x_1,x_2+1))+(M-x_1-x_2)f(x_1,x_2+1)\nonumber \\&\qquad +x_2f(x_1,x_2)+(x_1+1)(c'+f(x_1,x_2+2))\nonumber \\&\qquad +(M-x_1-x_2-2)f(x_1+1,x_2+2)\nonumber \\&\qquad +(x_2+1)f(x_1+1,x_2+1). \end{aligned}$$
(41)

\(\mathbf {a_1=0, a_2=1}\)

Similarly to the case \(a_1=1\), \(a_2=0\), we focus on the \(T_{CP}\) components and afterward combine them with (37):

$$\begin{aligned}&(x_1+1)\min \{c+f(x_1+1,x_2+1), c'+f(x_1,x_2+1)\}\nonumber \\&\qquad +x_1\min \{c+f(x_1,x_2+2), c'+f(x_1-1,x_2+2)\}\nonumber \\&\quad \le (x_1+1)(c+f(x_1+1,x_2+1))+x_1(c'+f(x_1-1,x_2+2))\nonumber \\&\quad \le x_1(f(x_1-1,x_2+1)+f(x_1+1,x_2+2))+(x_1+1)c\nonumber \\&\qquad +x_1c'+f(x_1+1,x_2+1). \end{aligned}$$
(42)

We now combine (37) and (42) and get

$$\begin{aligned}&T_{SF}f(x_1+1,x_2)+T_{SF}f(x_1,x_2+1)\nonumber \\&\quad \le x_1(c'+f(x_1-1,x_2+1))+(M-x_1-x_2)f(x_1,x_2+1)\nonumber \\&\qquad +x_2f(x_1,x_2)+(x_1+1)(c+f(x_1+1,x_2+2))\nonumber \\&\qquad +(M-x_1-x_2-2)f(x_1+1,x_2+2)\nonumber \\&\qquad +(x_2+1)f(x_1+1,x_2+1). \end{aligned}$$
(43)

\(\square \)

Always admit when \(C_r>C_d\) and \(C_{d}\ge C_a/(1-p_{f})\)

Proof of Theorem 7

Similarly to Theorem 5, we prove the theorem by way of contradiction. Assume, for the sake of contradiction, that the optimal policy is a policy B that in some cases does not obey the admission rule of the theorem. Consider a system applying policy B (system B) and the first event at which system B violates the rule, that is, it does not admit a new task even though it has an idle server. Assume this happens at \(t_0\).

We now construct a policy A that performs better than policy B. System A, applying policy A, is identical to system B until \(t_0\), at which time it admits the new task that system B rejected. We denote this task as the extra task, and the rest of the tasks (if any exist) that are in both systems as the mutual tasks. Following the sample-path technique, we couple the two systems via the failure, service and arrival times, i.e., all server failures, task departures and arrivals are the same in both systems. Once system B rejects the task, the difference of the value functions between the systems is \(C_r-C_a\). From \(t_0\) system A follows system B until, at some point in time, say \(t_1\), one of the following events occurs:

  1. A new task arrives and system A must reject it (lack of idle servers), while system B can accept it (thanks to its additional idle server). In this case the overall cost difference between the systems is 0. From \(t_1\) onwards system A imitates system B (both systems behave identically).

  2. An active server of one of the mutual tasks fails; system A must drop the task, while system B can switch it to another server (thanks to its extra idle server). In this case system A is penalized with \(C_d\) and system B incurs a server activation cost \(C_{pres}\). Hence, the overall cost difference between the systems is \(C_r+C_{pres}-(C_d+C_a)\); as \(C_{pres}=C_a\), this equals \(C_r-C_d\). Here, too, from \(t_1\) onwards both systems behave identically.

  3. The extra task in system A is completed. In this case the difference between the systems remains \(C_r-C_a\), and from this point (\(t_1\)) onwards both systems behave identically.

  4. The server of the extra task in system A fails. Since this server is idle in system B, the event does not affect system B and it incurs no additional cost. In system A, we consider the case in which the task is dropped and the system is penalized with \(C_d\); due to the condition of the theorem, i.e., \(C_{d}\ge C_a/(1-p_{f})\), this is the worst-case scenario, and therefore the argument holds for all potential system trajectories. From this point (\(t_1\)) onwards both systems behave identically.

Similarly to Theorem 5, we again emphasize that the two systems are completely coupled before \(t_0\), between \(t_0\) and \(t_1\), and after \(t_1\). Therefore, we cover all trajectories on which the systems differ from each other.

We note that each of the aforementioned scenarios occurs with a certain probability. Due to the cost structure, it is obvious that \(C_r-C_a>0\) and \(C_r-C_d>0\). We focus on scenarios 3 and 4, which depend on the extra task and hence occur with probabilities \(1- p_f\) and \(p_f\), respectively. Therefore, the overall expected cost of system A in these scenarios is \(C_A=C_a+p_fC_d\), and the overall expected cost of system B is \(C_B=C_r\). We emphasize again that dropping the extra task is the worst-case scenario, i.e., it incurs the highest cost. We compare the cost difference between the two systems and show that \(C_B-C_A>0\):

$$\begin{aligned} C_B-C_A=C_r-(C_a+p_fC_d). \end{aligned}$$
(44)

We apply \(C_d<C_r\) and afterward \((1-p_f)C_d\ge C_a\) to (44), which yields

$$\begin{aligned} C_B-C_A>C_d-(C_a+p_fC_d)=(1-p_f)C_d-C_a\ge 0. \end{aligned}$$
(45)

Based on the conditions of the theorem, in particular \(C_r>C_d\ge C_a/(1-p_f)\), it is clear that \(C_A<C_B\). Thus, system A achieves a lower overall expected system cost than system B, considering all scenarios and their occurrence probabilities. We note that as our investigation is of an arbitrary task at an arbitrary point in time (and hence an arbitrary system state), we can conclude that it is always beneficial to admit a new task as long as there is an idle server. \(\square \)
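As a quick numerical illustration (with values chosen by us for illustration, not taken from the paper), let \(C_a=1\), \(C_d=3\), \(C_r=4\) and \(p_f=0.3\). Then \(C_r>C_d\) and \(C_d=3\ge C_a/(1-p_f)\approx 1.43\), so the conditions of the theorem hold, and indeed

$$\begin{aligned} C_A=C_a+p_fC_d=1+0.3\cdot 3=1.9<4=C_r=C_B. \end{aligned}$$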

Systems with unequal admission and preservation costs

We now detail the changes required in Theorems 5–7 for cost structures in which \(C_{pres}>C_a\) or \(C_{pres}<C_a\).

1.1 Preservation is more expensive than admission (\(C_{pres}>C_a\))

We first address the case where \(C_r<C_d\), i.e., the rejection cost is lower than the drop cost. Due to the cost structure and the need for both Lemmas 3 and 1, Theorem 5 now requires three conditions instead of the two in the original theorem.

Theorem 5’ Consider a system with a finite number of servers and \(C_r\ge C_a+C_{pres}p_f/(1-p_{f})\), \(C_d\ge C_{pres}/(1-p_{f})\) and \(C_d>C_r\). Then for all \((x_1,x_2), x_1\ge 1,x_2\ge 0,\) such that \(x_1+x_2<M\), it is optimal to preserve an existing task.

The proof follows easily from the new conditions. We note that for the proof of case 1 in the sample path the condition \(C_d>C_r\) is used (similarly to the original proof), and for cases 3 and 4 the new condition \(C_d\ge C_{pres}/(1-p_{f})\) is now required.

As indicated in Sect. 8.1, Theorem 6’ and Lemma 1’ show the properties of the new region, Region VI.

We now continue and investigate the case where \(C_r>C_d\), i.e., the drop cost is lower than the rejection cost. We note that while the shapes and sizes of the regions within \(C_r>C_d\) change slightly (as can be seen in Fig. 5b), the conditions on the relevant regions change only to include the preservation cost.

Theorem 6’ If \(C_r>C_d\), \(C_r\ge C_a+p_fC_d\) and \(C_{pres}/(1-p_{f})>C_{d}\), then for all \((x_1,x_2), x_1\ge 1,x_2\ge 0\), such that \(x_1+x_2<M\), it is optimal to admit a new task.

Theorem 7’ If \(C_r>C_d\) and \(C_{d}\ge C_{pres}/(1-p_{f})\), then for all \((x_1,x_2), x_1\ge 1,x_2\ge 0\), such that \(x_1+x_2<M\), it is optimal to admit a new task.

The proofs of Theorems 6’ and 7’ follow easily along the outlines of the original proofs. We note that in Theorem 7’ we use the fact that \(C_r>C_d\) and \(C_{pres}>C_a\) to prove case 2 in the sample path. For the proofs of cases 3 and 4, we use the two conditions of the theorem and the fact that \(C_{pres}>C_a\).

1.2 Admission is more expensive than preservation (\(C_{pres}<C_a\))

We first address the changes required when \(C_d>C_r\). We note that in this specific cost structure (\(C_{pres}<C_a\) and \(C_d>C_r\)), the conditions of Theorem 5 change only to include the preservation cost. The reason is that the preservation condition (needed for the case \(C_{pres}>C_a\)) can now be derived from the two remaining conditions.

Theorem 5* Consider a system with a finite number of servers, and \(C_r\ge C_a+C_{pres}p_f/(1-p_{f})\) and \(C_d>C_r\). Then for all \((x_1,x_2), x_1\ge 1,x_2\ge 0,\) such that \(x_1+x_2<M\), it is optimal to preserve an existing task.

The proof follows easily and similarly to the original proof.

For the case where the rejection cost is higher than the drop cost (\(C_r>C_d\)), we note that Regions III and IV are slightly changed (as can be seen in Fig. 5c); thus, the conditions on the regions change to include the preservation cost.

Theorem 6* If \(C_r>C_d\), \(C_r\ge C_a+p_fC_d\) and \(C_{pres}/(1-p_{f})>C_{d}\), then for all \((x_1,x_2), x_1\ge 1,x_2\ge 0\), such that \(x_1+x_2<M\), it is optimal to admit a new task.

The proof follows easily.

For Theorem 7 we note that an extra condition is required in addition to the adaptation to the preservation cost.

Theorem 7* If \(C_r>C_d\) and \(C_{d}\ge C_{pres}/(1-p_{f}),\) then for all \((x_1,x_2), x_1\ge 1,x_2\ge 0\), such that \(x_1+x_2<M\), it is optimal to admit a new task.

Here the condition on the cost structure, i.e., \(C_a>C_{pres},\) is used in the sample-path proof (cases 3 and 4), in which the extra task in system A is either completed or dropped (due to server failure). Combining the two cases, we find that the overall expected cost of system A is \(C_a+p_{f}C_{d}\), while the cost of system B is \(C_r\). The condition \(C_a>C_{pres}\) ensures that admission is preferable. The impact on Region II can be observed by comparison with Fig. 5a (the new Region VII).


About this article


Cite this article

Lavi, N., Levy, H. Admit or preserve? Addressing server failures in cloud computing task management. Queueing Syst 94, 279–325 (2020). https://doi.org/10.1007/s11134-019-09624-z

