
Mean-field Macro Computation in Large-scale Cloud Service Systems with Resource Management and Job Scheduling

Journal of Systems Science and Systems Engineering

Abstract

Service computing is an emerging distributed computing mode in cloud service systems and has become an active research direction in both academia and industry. Cloud service systems display characteristics such as stochasticity, large scale, loose coupling, concurrency, non-homogeneity and heterogeneity, which make the study of their load balancing both interesting and challenging. Using resource management and job scheduling, this paper proposes an integrated, real-time and dynamic control mechanism for large-scale cloud service systems and their load balancing, combining supermarket models with both work stealing models and the scheduling of public reserved resources. To this end, the paper provides a novel stochastic model with weak interactions by means of nonlinear Markov processes. To overcome the theoretical difficulties arising from state explosion in high-dimensional stochastic systems, the paper applies mean-field theory to develop a macro computational technique in terms of an infinite-dimensional system of mean-field equations. Furthermore, the paper proves the asymptotic independence of the large-scale cloud service system and shows how to compute the fixed point by means of an infinite-dimensional system of nonlinear equations. Based on the fixed point, the paper provides effective numerical computation for the performance analysis of this system with high approximation precision. We hope that the methodology and results given in this paper can be applied to the study of more general large-scale cloud service systems.
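
As a rough illustration of what a mean-field fixed-point computation can look like, the sketch below uses the classical supermarket model (each arriving job samples d servers and joins the shortest queue), not the paper's own model with work stealing and public reserved resources, whose equations are not reproduced on this page. The assumptions are: arrival rate lam < 1 per server, unit service rate, d >= 2, and a truncation of the infinite-dimensional system at a finite number of levels. All function names and parameters are hypothetical. Here s[k] denotes the fraction of servers holding at least k jobs.

```python
def closed_form_fixed_point(lam, d, levels):
    """Known fixed point of the classical supermarket model (d >= 2):
    s_k = lam ** ((d**k - 1) / (d - 1)), with s_0 = 1."""
    return [lam ** ((d ** k - 1) / (d - 1)) for k in range(levels + 1)]


def mean_field_step(s, lam, d, dt):
    """One forward-Euler step of the truncated mean-field equations
    ds_k/dt = lam * (s_{k-1}**d - s_k**d) - (s_k - s_{k+1}),  k >= 1."""
    levels = len(s) - 1
    new = s[:]  # s[0] is fixed at 1 and is never updated
    for k in range(1, levels + 1):
        # Truncation assumption: no server holds more than `levels` jobs.
        s_above = s[k + 1] if k < levels else 0.0
        drift = lam * (s[k - 1] ** d - s[k] ** d) - (s[k] - s_above)
        new[k] = s[k] + dt * drift
    return new


def numerical_fixed_point(lam, d, levels=10, dt=0.05, tol=1e-12,
                          max_steps=1_000_000):
    """Integrate from the empty system until successive iterates differ
    by less than `tol` in every coordinate (or `max_steps` is reached)."""
    s = [1.0] + [0.0] * levels
    for _ in range(max_steps):
        new = mean_field_step(s, lam, d, dt)
        if max(abs(a - b) for a, b in zip(new, s)) < tol:
            return new
        s = new
    return s


def mean_queue_length(s):
    """Expected number of jobs at a typical server: sum of s_k for k >= 1."""
    return sum(s[1:])


if __name__ == "__main__":
    numeric = numerical_fixed_point(lam=0.7, d=2)
    exact = closed_form_fixed_point(lam=0.7, d=2, levels=10)
    for k, (a, b) in enumerate(zip(numeric, exact)):
        print(k, a, b)
    print("mean queue length:", mean_queue_length(numeric))
```

For this simple model the numerical iterate can be checked against the closed form; a richer model of the kind described in the abstract would generally have no closed form, and the numerical route would be the only one available.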



Acknowledgments

The authors are grateful to the editor and two anonymous referees for their constructive comments and suggestions, which greatly helped the authors to improve the presentation of this manuscript. In addition, Yanping Jiang was supported by the National Natural Science Foundation of China under grant Nos. 71871048 and 71571040, and Quanlin Li was supported by the National Natural Science Foundation of China under grant Nos. 71671158 and 71471160, and by the Natural Science Foundation of Hebei province under grant No. G2017203277.

Author information

Corresponding authors

Correspondence to Yanping Jiang or Quanlin Li.

Additional information

Feifei Yang is a doctoral candidate at the Department of Information Management and Decision Sciences, School of Business Administration, Northeastern University, Shenyang, China. She received her master's degree from the School of Economics and Management, Yanshan University, Qinhuangdao, China. Her research interests include queueing networks, mean-field theory, service systems, resource management of big networks, and health care systems.

Yanping Jiang is a full professor at the Department of Information Management and Decision Sciences, School of Business Administration, Northeastern University, Shenyang, China. She received her Ph.D. degree in management science and engineering from Northeastern University. Her research interests include decision analysis, service systems, health care systems, and other topics in operations research. She has published two Chinese monographs and over 80 papers in various academic journals, including European Journal of Operational Research, Computers & Industrial Engineering, and Soft Computing.

Quanlin Li is a full professor at the School of Economics and Management, Beijing University of Technology, Beijing, China. He received his Ph.D. degree from the Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing, China. He has published an English monograph (Constructive Computation in Stochastic Models with Applications: The RG-Factorizations, Springer, 2010) and over 60 research papers in a variety of international journals, such as Advances in Applied Probability, Queueing Systems, Stochastic Models, European Journal of Operational Research, Computer Networks, Performance Evaluation, Discrete Event Dynamic Systems, Computers & Operations Research, Computers & Mathematics with Applications, Annals of Operations Research, and International Journal of Production Economics. His research interests include stochastic models, stochastic processes, mean-field theory, stochastic process algebra, game theory, queueing networks, computer networks, resource management in big networks, and health care systems.

About this article

Cite this article

Yang, F., Jiang, Y. & Li, Q. Mean-field Macro Computation in Large-scale Cloud Service Systems with Resource Management and Job Scheduling. J. Syst. Sci. Syst. Eng. 28, 238–261 (2019). https://doi.org/10.1007/s11518-018-5399-z


  • DOI: https://doi.org/10.1007/s11518-018-5399-z
