Abstract
One of the goals of cloud service providers is to satisfy service-level agreements without significant over-provisioning in data center clusters. Efforts to meet these requirements have been mainly based on resource over-provisioning rather than identifying performance bottlenecks. While increasing parallelism tends to reduce the average and tail latency, the joint impact of concurrent job scheduling and parallel task processing is a challenging problem to analytically model, particularly when compared to the models developed without the notion of concurrency. This article presents an analytical model for distributed schedulers in data center cluster networks. The model can be used to investigate how latency can affect a data center network design and how many resources should be allocated to meet service-level agreements. To get better insight, we build upon ideas from queuing networks, which provide a framework to measure expected latency versus resource provisioning. The model is based on tandem queuing networks and fork–join systems to compute expected latency in closed forms at various stages of data center cluster networks. Theoretical analysis and simulations have been conducted to demonstrate the effectiveness of the proposed model and to strike a balance between expected latency and resource utilization. Results obtained from various simulation scenarios on different data center traffic traces confirm the soundness of the model.
Notes
\(Pr[X_{1}, X_{2}, \ldots X_{n}] = c \displaystyle \prod \nolimits _{i=1}^{n} Pr[X_{i}]\), where c is a constant.
Queue length distribution is insensitive to the service time distribution.
Appendices
Appendix
Order statistics
Let \(X_{1}, X_{2}, \ldots , X_{n}\) be mutually independent and identically distributed (iid) random variables, and let \(X_{(k)}\) denote the kth smallest of them (called the kth order statistic). Then \(X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}\), where \(X_{(1)}\) is the minimum and \(X_{(n)}\) is the maximum.
The expected values of the maximum and the minimum of these n random variables can be found once the cumulative distribution functions (cdfs) of \(X_{(n)}\) and \(X_{(1)}\) are known.
1.1 Density and cumulative functions of the maximum
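For reference, the standard form of this result can be stated as follows, writing \(F\) and \(f\) for the common cdf and density of the \(X_{i}\) (notation assumed here for illustration). The maximum is at most x exactly when every \(X_{i}\) is at most x, so by independence

\[ F_{X_{(n)}}(x) = \Pr [X_{(n)} \le x] = \prod _{i=1}^{n} \Pr [X_{i} \le x] = [F(x)]^{n}, \qquad f_{X_{(n)}}(x) = \frac{d}{dx} F_{X_{(n)}}(x) = n\,[F(x)]^{n-1} f(x). \]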
1.2 Density and cumulative functions of the minimum
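For reference, the standard form of this result can be stated as follows, writing \(F\) and \(f\) for the common cdf and density of the \(X_{i}\) (notation assumed here for illustration). The minimum exceeds x exactly when every \(X_{i}\) exceeds x, so by independence

\[ F_{X_{(1)}}(x) = 1 - \Pr [X_{(1)} > x] = 1 - \prod _{i=1}^{n} \Pr [X_{i} > x] = 1 - [1 - F(x)]^{n}, \qquad f_{X_{(1)}}(x) = \frac{d}{dx} F_{X_{(1)}}(x) = n\,[1 - F(x)]^{n-1} f(x). \]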
1.3 Density of maximum and minimum for exponential distribution
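Let \(X_{1}, X_{2}, \ldots , X_{n} \overset{iid}{\sim } Exp(\mu )\), where \(\mu \) is taken to be the rate, so that each \(X_{i}\) has cdf \(1 - e^{-\mu x}\) and density \(\mu e^{-\mu x}\) for \(x \ge 0\). Then the density of the maximum is

\[ f_{X_{(n)}}(x) = n \mu e^{-\mu x} \left( 1 - e^{-\mu x} \right) ^{n-1}, \]

and the density of the minimum is

\[ f_{X_{(1)}}(x) = n \mu e^{-n \mu x}, \]

so the minimum is itself exponentially distributed with rate \(n\mu \).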
1.4 Expected values of the maximum and the minimum for exponential distribution
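The mean of the maximum of n iid random variables with common cdf \(F\) and density \(f\) is

\[ E[X_{(n)}] = \int _{-\infty }^{\infty } x\, n\,[F(x)]^{n-1} f(x)\, dx. \]

If \(X_{i} \overset{iid}{\sim } Exp (\mu )\) with rate \(\mu \) for \(i=1, 2, \ldots , n\), then

\[ E[X_{(n)}] = \int _{0}^{\infty } x\, n \mu e^{-\mu x} \left( 1 - e^{-\mu x} \right) ^{n-1} dx = \frac{1}{\mu } \sum _{k=1}^{n} \binom{n}{k} \frac{(-1)^{k+1}}{k} = \frac{1}{\mu } \sum _{k=1}^{n} \frac{1}{k}, \]

where the evaluation uses \(\int xe^{ax}\, dx = \left( \frac{x}{a} - \frac{1}{a^{2}} \right) e^{ax} + C\), the binomial theorem \((a+b)^{n} = \sum _{k=0}^{n} \binom{n}{k} a^{n-k} b^{k}\), and Pascal's rule \(\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}\). Expanding \((1 - e^{-\mu x})^{n-1}\) with the binomial theorem and integrating term by term gives the alternating sum, and induction on n with Pascal's rule reduces it to the harmonic sum. Since the minimum of n iid \(Exp(\mu )\) random variables is exponentially distributed with rate \(n\mu \), its mean is

\[ E[X_{(1)}] = \frac{1}{n \mu }. \]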
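A quick numerical check can make the closed forms for the exponential case concrete. The sketch below is illustrative only and is not taken from the article: it assumes \(Exp(\mu )\) is parameterized by the rate \(\mu \), uses the standard results \(E[X_{(n)}] = \frac{1}{\mu }\sum _{k=1}^{n} \frac{1}{k}\) and \(E[X_{(1)}] = \frac{1}{n\mu }\), and every function name in it is hypothetical.

```python
import math
import random


def harmonic(n: int) -> float:
    """Return the nth harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_max_exp(n: int, mu: float) -> float:
    """Closed form E[X_(n)] = H_n / mu for n iid Exp(mu) variables (rate mu)."""
    return harmonic(n) / mu


def expected_min_exp(n: int, mu: float) -> float:
    """Closed form E[X_(1)] = 1 / (n * mu); the minimum is Exp(n * mu)."""
    return 1.0 / (n * mu)


def alternating_binomial_sum(n: int) -> float:
    """Sum over k = 1..n of (-1)^(k+1) * C(n, k) / k, which equals H_n."""
    return sum((-1) ** (k + 1) * math.comb(n, k) / k for k in range(1, n + 1))


def simulate_max_min(n: int, mu: float, trials: int, seed: int = 0):
    """Monte Carlo estimates of (E[max], E[min]) from `trials` samples of size n."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    total_max = 0.0
    total_min = 0.0
    for _ in range(trials):
        sample = [rng.expovariate(mu) for _ in range(n)]  # expovariate takes a rate
        total_max += max(sample)
        total_min += min(sample)
    return total_max / trials, total_min / trials


if __name__ == "__main__":
    n, mu = 8, 2.0  # illustrative values, not from the article
    sim_max, sim_min = simulate_max_min(n, mu, trials=100_000)
    print(f"E[max]: closed form {expected_max_exp(n, mu):.4f}, simulated {sim_max:.4f}")
    print(f"E[min]: closed form {expected_min_exp(n, mu):.4f}, simulated {sim_min:.4f}")
```

The closed form grows like \(\ln n / \mu \), so the expected maximum rises slowly but without bound as n increases, while the expected minimum shrinks as \(1/n\). In a fork-join reading, where a job that splits into n parallel tasks completes only when its slowest task does, the maximum is the relevant quantity; that interpretation is offered here as context rather than as a statement of the article's exact model.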
Cite this article
Alshahrani, R., Peyravi, H. Modeling and analysis of distributed schedulers in data center cluster networks. Cluster Comput 24, 3351–3366 (2021). https://doi.org/10.1007/s10586-021-03343-y
DOI: https://doi.org/10.1007/s10586-021-03343-y