Modeling and analysis of distributed schedulers in data center cluster networks

Alshahrani, Reem; Peyravi, Hassan

doi:10.1007/s10586-021-03343-y

Published: 17 June 2021

Cluster Computing, volume 24, pages 3351–3366 (2021)

Abstract

One of the goals of cloud service providers is to satisfy service-level agreements without significant over-provisioning in data center clusters. Efforts to meet these requirements have been mainly based on resource over-provisioning rather than identifying performance bottlenecks. While increasing parallelism tends to reduce the average and tail latency, the joint impact of concurrent job scheduling and parallel task processing is a challenging problem to analytically model, particularly when compared to the models developed without the notion of concurrency. This article presents an analytical model for distributed schedulers in data center cluster networks. The model can be used to investigate how latency can affect a data center network design and how many resources should be allocated to meet service-level agreements. To get better insight, we build upon ideas from queuing networks, which provide a framework to measure expected latency versus resource provisioning. The model is based on tandem queuing networks and fork–join systems to compute expected latency in closed forms at various stages of data center cluster networks. Theoretical analysis and simulations have been conducted to demonstrate the effectiveness of the proposed model and to strike a balance between expected latency and resource utilization. Results obtained from various simulation scenarios on different data center traffic traces confirm the soundness of the model.

Notes

$\Pr[X_{1}, X_{2}, \ldots , X_{n}] = c \prod _{i=1}^{n} \Pr[X_{i}]$, where $c$ is a constant.
Queue length distribution is insensitive to the service time distribution.

Author information

Authors and Affiliations

Department of Computer Science, Kent State University, Kent, Ohio, 44242, USA
Reem Alshahrani & Hassan Peyravi

Corresponding author

Correspondence to Hassan Peyravi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Order statistics

Let $X_{1}, X_{2}, \ldots , X_{n}$ be mutually independent and identically distributed (iid) random variables with common cdf $F(x)$ and density $f(x)$, and let $X_{(k)}$ denote the kth smallest of them (the kth order statistic). Then

$$ \begin{array}{c} X_{(1)} = \min \left\{ X_{1}, X_{2}, \ldots , X_{n} \right\} , \\ X_{(n)} = \max \left\{ X_{1}, X_{2}, \ldots , X_{n} \right\} , \\ X_{(1)} \le X_{(2)} \le \cdots \le X_{(n-1)} \le X_{(n)}. \end{array} $$

(29)

The expected values of the maximum and the minimum of these n random variables follow once the cumulative distribution functions (cdfs) of $X_{(n)}$ and $X_{(1)}$ are known.

1.1 Density and cumulative functions of the maximum

$$\begin{aligned} F_{max}(x)= & {} F_{X_{(n)}}(x) = P \left( X_{(n)} \le x\right) \\= & {} P \left( X_{(1)} \le x, X_{(2)} \le x, \ldots , X_{(n)} \le x \right) \\= & {} P \left( X_{1} \le x, X_{2} \le x, \ldots , X_{n} \le x \right) \\= & {} F_{1}(x) F_{2}(x) \cdots F_{n}(x) = F^{n}(x), \end{aligned}$$

(30)

$$\begin{aligned} f_{max}(x)= & {} \frac{d}{dx} F^{n}(x) = n f(x) F^{n-1}(x). \end{aligned}$$

(31)

1.2 Density and cumulative functions of the minimum

$$\begin{aligned} F_{min}(x)= & {} F_{X_{(1)}}(x) = 1- P \left( X_{(1)}> x\right) = 1-P \left( X_{1}> x, X_{2}> x, \ldots , X_{n} > x \right) \\= & {} 1-\left( 1-F_{1}(x)\right) \left( 1-F_{2}(x)\right) \cdots \left( 1-F_{n}(x)\right) = 1-(1-F(x))^{n}, \end{aligned}$$

(32)

$$\begin{aligned} f_{min}(x)= & {} - \frac{d}{dx} (1-F(x))^{n} = n f(x) (1-F(x))^{n-1}. \end{aligned}$$

(33)
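
The cdfs in Eqs. (30) and (32) can be sanity-checked by simulation. The sketch below is illustrative only and is not part of the article: it assumes Uniform(0, 1) variables, so that $F(x) = x$ on $[0, 1]$, and the function names, trial count, and seed are arbitrary choices.

```python
import random


def empirical_cdfs(n, x, trials=20000, seed=0):
    """Estimate P(max <= x) and P(min <= x) for n iid Uniform(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits_max = 0
    hits_min = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        hits_max += max(sample) <= x
        hits_min += min(sample) <= x
    return hits_max / trials, hits_min / trials


def theoretical_cdfs(n, x):
    """Closed forms F^n(x) and 1 - (1 - F(x))^n, with F(x) = x (assumption)."""
    F = x
    return F ** n, 1.0 - (1.0 - F) ** n


if __name__ == "__main__":
    n, x = 5, 0.7
    print("empirical  :", empirical_cdfs(n, x))
    print("theoretical:", theoretical_cdfs(n, x))
```

With a few thousand trials the two pairs should agree to roughly two decimal places; any other distribution with a known cdf could be substituted for the uniform one.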

1.3 Density of maximum and minimum for exponential distribution

Let $X_{1}, X_{2}, \ldots , X_{n} \overset{iid}{\sim } Exp(\mu )$, then

$$ f_{min}(x) = n f(x) (1-F(x))^{n-1} = n \left( \mu e^{-\mu x}\right) \left[ 1- \left( 1-e^{-\mu x}\right) \right] ^{n-1} = n \mu e^{-n \mu x} $$

(34)

and

$$ f_{max}(x) = n f(x) (F(x))^{n-1} = n \left( \mu e^{-\mu x}\right) \left[ 1-e^{-\mu x} \right] ^{n-1}. $$

(35)
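
As a quick numerical check, both densities in Eqs. (34) and (35) should integrate to 1 over $[0, \infty)$. The following is a minimal sketch under stated assumptions, not the authors' code: the integration is a plain trapezoid rule, the upper limit is truncated at $50/\mu$ (where the tail is negligible), and the helper names are hypothetical.

```python
import math


def f_min(x, n, mu):
    """Density of the minimum of n iid Exp(mu) variables: n*mu*exp(-n*mu*x)."""
    return n * mu * math.exp(-n * mu * x)


def f_max(x, n, mu):
    """Density of the maximum: n*mu*exp(-mu*x) * (1 - exp(-mu*x))^(n-1)."""
    return n * mu * math.exp(-mu * x) * (1.0 - math.exp(-mu * x)) ** (n - 1)


def trapezoid(func, a, b, steps=100000):
    """Composite trapezoid rule for func on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h


def total_mass(density, n, mu):
    """Integrate a density over [0, 50/mu]; the cut-off is an assumption."""
    upper = 50.0 / mu
    return trapezoid(lambda x: density(x, n, mu), 0.0, upper)


if __name__ == "__main__":
    print(total_mass(f_min, 4, 2.0), total_mass(f_max, 4, 2.0))
```

Note that $f_{min}$ is itself an exponential density with rate $n\mu$, which is why the minimum of n exponential variables is again exponential.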

1.4 Expected values of the maximum and the minimum for exponential distribution

The means of the maximum and the minimum of n iid random variables with density $f$ and cdf $F$ are

$$\begin{aligned} E\left[ X_{(n)}\right]= & {} n \int _{-\infty }^{\infty} x f(x) F^{n-1} (x) dx, \end{aligned}$$

(36)

$$\begin{aligned} E\left[ X_{(1)}\right]= & {} n \int _{-\infty }^{\infty} x f(x) \left( 1 -F(x) \right) ^{n-1} dx. \end{aligned}$$

(37)

If $X_{i} \overset{iid}{\sim } Exp (\mu )$ for $i=1, 2, \ldots , n$, then

$$\begin{aligned} E\left[ X_{(n)}\right]= & {} n \int _{0}^{\infty} x f(x) F^{n-1} (x) dx \\= & {} n \int _{0}^{\infty} x \mu e^{-\mu x} \left( 1- e^{-\mu x} \right) ^{n-1} dx \\= & {} n \mu \int _{0}^{\infty} x e^{-\mu x} \sum _{k=0}^{n-1} \binom{n-1}{k} \left( -e^{-\mu x}\right) ^{k} dx \\= & {} n \mu \sum _{k=0}^{n-1} \binom{n-1}{k} (-1)^{k} \int _{0}^{\infty} x e^{-(k+1)\mu x} dx = n \mu \sum _{k=0}^{n-1} \binom{n-1}{k} (-1)^{k} \frac{1}{((k+1)\mu )^{2}} \\= & {} \frac{n}{\mu } \sum _{k=0}^{n-1} \binom{n-1}{k} (-1)^{k} \frac{1}{(k+1)^{2}} = J(n) \\= & {} \frac{1}{\mu } \sum _{k=1}^{n} \binom{n}{k} (-1)^{k+1} \frac{1}{k} = \frac{1}{\mu } \left( 1+\frac{1}{2}+\frac{1}{3}+ \cdots + \frac{1}{n}\right) = H_{n}/\mu , \end{aligned}$$

(38)

where we used $\int x e^{ax} dx = \left( \frac{x}{a} - \frac{1}{a^{2}} \right) e^{ax}$ with $a = -(k+1)\mu $, the binomial theorem $(a+b)^{n} = \sum _{k=0}^{n} \binom{n}{k} a^{n-k} b^{k}$, and Pascal's rule $\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}$.
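
The result $H_{n}/\mu $ can be cross-checked without the binomial expansion. The following is a standard memoryless argument, offered as a sketch and not taken from the article. Set $X_{(0)} \equiv 0$. After the $(k-1)$th smallest value has occurred, the remaining $n-k+1$ variables are, by memorylessness, fresh $Exp(\mu )$ variables, so by Eq. (34) the gap to the next order statistic is exponential with rate $(n-k+1)\mu $. Summing the mean gaps gives

$$\begin{aligned} E\left[ X_{(n)}\right]= & {} \sum _{k=1}^{n} E\left[ X_{(k)} - X_{(k-1)}\right] = \sum _{k=1}^{n} \frac{1}{(n-k+1)\mu } = \frac{1}{\mu } \sum _{j=1}^{n} \frac{1}{j} = \frac{H_{n}}{\mu }. \end{aligned}$$

For example, with $n=2$ this gives $\frac{1}{2\mu } + \frac{1}{\mu } = \frac{3}{2\mu }$. The $k=1$ term alone, $\frac{1}{n\mu }$, is the mean of the minimum.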

$$\begin{aligned} E\left[ X_{(1)}\right]= & {} n \int _{-\infty }^{\infty} x f(x) \left( 1 -F(x) \right) ^{n-1} dx \\= & {} n \int _{0}^{\infty} x \mu e^{-\mu x} \left( e^{-\mu x}\right) ^{n-1} dx \\= & {} n \int _{0}^{\infty} x \mu e^{-n \mu x} dx \\= & {} \frac{1}{n\mu }. \end{aligned}$$

(39)
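
The closed forms for the exponential case are easy to verify in exact arithmetic. The sketch below is illustrative rather than the authors' code; the function names are hypothetical. It checks that the sum with squared denominators from the derivation of Eq. (38), and the standard alternating-sum identity $\sum _{k=1}^{n} \binom{n}{k} (-1)^{k+1} / k = H_{n}$, both reduce to the harmonic number, and then evaluates the two means.

```python
from fractions import Fraction
from math import comb


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def squared_sum(n):
    """n * sum_{k=0}^{n-1} C(n-1, k) * (-1)^k / (k+1)^2, computed exactly."""
    return n * sum(
        Fraction(comb(n - 1, k) * (-1) ** k, (k + 1) ** 2) for k in range(n)
    )


def alternating_sum(n):
    """sum_{k=1}^{n} C(n, k) * (-1)^(k+1) / k, computed exactly."""
    return sum(
        Fraction(comb(n, k) * (-1) ** (k + 1), k) for k in range(1, n + 1)
    )


def expected_max(n, mu):
    """Mean of the maximum of n iid Exp(mu) variables: H_n / mu."""
    return float(harmonic(n)) / mu


def expected_min(n, mu):
    """Mean of the minimum of n iid Exp(mu) variables: 1 / (n * mu)."""
    return 1.0 / (n * mu)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        assert squared_sum(n) == alternating_sum(n) == harmonic(n)
        print(n, expected_max(n, 1.0), expected_min(n, 1.0))
```

Read in the fork–join setting of the abstract, and assuming n parallel tasks with iid $Exp(\mu )$ service times and no queueing delay, the mean time until all tasks finish grows like $H_{n}/\mu $ (about $2.93/\mu $ for $n=10$), while the mean time until the first task finishes shrinks like $1/(n\mu )$.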

About this article

Cite this article

Alshahrani, R., Peyravi, H. Modeling and analysis of distributed schedulers in data center cluster networks. Cluster Comput 24, 3351–3366 (2021). https://doi.org/10.1007/s10586-021-03343-y

Received: 28 September 2020
Revised: 09 June 2021
Accepted: 11 June 2021
Published: 17 June 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10586-021-03343-y
