Abstract
Distributed systems use randomized work stealing to improve performance and resource utilization. In most prior analytical studies of randomized work stealing, jobs are considered to be sequential and are executed as a whole on a single server. In this paper we consider a homogeneous system of servers where parent jobs spawn child jobs that can feasibly be executed in parallel. When an idle server probes a busy server in an attempt to steal work, it may either steal a parent job or multiple child jobs.
To approximate the performance of this system we introduce a Quasi-Birth-Death Markov chain and express the performance measures of interest via its unique steady state. We perform simulation experiments that suggest that the approximation error tends to zero as the number of servers in the system becomes large. Using numerical experiments we compare the performance of various simple stealing strategies as well as optimized strategies.
References
Bini, D., Meini, B., Steffé, S., Van Houdt, B.: Structured Markov chains solver: software tools. In: Proceeding From the 2006 Workshop on Tools for Solving Structured Markov Chains, pp. 1–14 (2006)
Blumofe, R., Leiserson, C.: Scheduling multithreaded computations by work stealing. J. ACM (JACM) 46(5), 720–748 (1999)
Blumofe, R., Joerg, C., Kuszmaul, B., Leiserson, C., Randall, K., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996)
Bramson, M., Lu, Y., Prabhakar, B.: Randomized load balancing with general service time distributions. In: ACM SIGMETRICS 2010, pp. 275–286 (2010). https://doi.org/10.1145/1811039.1811071
Eager, D., Lazowska, E., Zahorjan, J.: A comparison of receiver-initiated and sender-initiated adaptive load sharing. Perform. Eval. 6(1), 53–68 (1986)
Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: Proceedings of the SIGPLAN 1998 Conference on Programming Language Design and Implementation, pp. 212–223 (1998)
Gast, N.: Expected values estimated via mean-field approximation are 1/n-accurate. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 1, no. 1, p. 17 (2017)
Gast, N., Gaujal, B.: A mean field model of work stealing in large-scale systems. ACM SIGMETRICS Perform. Eval. Rev. 38(1), 13–24 (2010)
Gautier, T., Besseron, X., Pigeon, L.: Kaapi: A thread scheduling runtime system for data flow computations on cluster of multi-processors. In: Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, pp. 15–23 (2007)
Horváth, G., Van Houdt, B., Telek, M.: Commuting matrices in the queue length and sojourn time analysis of MAP/MAP/1 queues. Stoch. Model. 30(4), 554–575 (2014)
Latouche, G., Ramaswami, V.: Introduction to matrix analytic methods in stochastic modeling, vol. 5. SIAM (1999)
Lea, D.: A Java fork/join framework. In: Proceedings of the ACM 2000 Conference on Java Grande, JAVA 2000, New York, NY, USA, pp. 36–43. Association for Computing Machinery (2000). https://doi.org/10.1145/337449.337465
Leijen, D., Schulte, W., Burckhardt, S.: The design of a task parallel library. In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications. OOPSLA 2009, New York, NY, USA, pp. 227–242. Association for Computing Machinery (2009). https://doi.org/10.1145/1640089.1640106
Minnebo, W., Hellemans, T., Van Houdt, B.: On a class of push and pull strategies with single migrations and limited probe rate. Perform. Eval. 113, 42–67 (2017)
Minnebo, W., Van Houdt, B.: A fair comparison of pull and push strategies in large distributed networks. IEEE/ACM Trans. Networking (TON) 22(3), 996–1006 (2014)
Mirchandaney, R., Towsley, D., Stankovic, J.: Adaptive load sharing in heterogeneous distributed systems. J. Parallel Distrib. Comput. 9(4), 331–346 (1990)
Neuts, M.: Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Johns Hopkins University Press, Baltimore (1981)
Ozawa, T.: Sojourn time distributions in the queue defined by a general QBD process. Queue. Syst. Appl. 53(4), 203–211 (2006)
Robison, A., Voss, M., Kukanov, A.: Optimization via reflection on work stealing in TBB. In: 2008 IEEE International Symposium on Parallel and Distributed Processing, pp. 1–8. IEEE (2008)
Sonenberg, B., Kielanski, G., Van Houdt, B.: Performance analysis of work stealing in large scale multithreaded computing. ACM ToMPECS (2021, to appear)
Van Spilbeeck, I., Van Houdt, B.: Performance of rate-based pull and push strategies in heterogeneous networks. Perform. Eval. 91, 2–15 (2015)
Squillante, M., Nelson, R.: Analysis of task migration in shared-memory multiprocessor scheduling. SIGMETRICS Perform. Eval. Rev. 19(1), 143–155 (1991). http://doi.acm.org/10.1145/107972.107987
Van Houdt, B.: Randomized work stealing versus sharing in large-scale systems with non-exponential job sizes. IEEE/ACM Trans. Networking 27, 2137–2149 (2019)
Wirth, N.: Tasks versus threads: an alternative multiprocessing paradigm. Software Concepts Tools 17, 6–12 (1996)
A Model Validation
Based on the numerical experiments in Sect. 5, stealing all or half of the children are both good stealing policies: stealing all works best for low values of r, while stealing half of the children works well for higher values. Therefore, we validate the mean field model for these two policies. We always start the simulations from an empty system and simulate the behaviour for a duration of \(T = 10^5\), with a warm-up period of 33% of T.
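As an illustration of the warm-up convention, the following minimal Python sketch computes a time average over a simulated trajectory while discarding the first 33% of the horizon. The function name `time_average_after_warmup`, the representation of the trajectory as a list of `(time, value)` pairs, and the choice of a time-averaged statistic are all assumptions made for this sketch; the text above does not specify how the simulator collects its statistics.

```python
def time_average_after_warmup(trajectory, horizon, warmup_fraction=0.33):
    """Time average of a piecewise-constant trajectory on [warmup, horizon].

    trajectory: list of (time, value) pairs sorted by time (hypothetical format);
    each value holds from its time until the next pair's time (or the horizon).
    """
    start = warmup_fraction * horizon  # observations before this time are discarded
    total = 0.0
    for i, (t, value) in enumerate(trajectory):
        t_next = trajectory[i + 1][0] if i + 1 < len(trajectory) else horizon
        lo = max(t, start)          # clip the segment to the measurement window
        hi = min(t_next, horizon)
        if hi > lo:
            total += value * (hi - lo)
    return total / (horizon - start)


# Toy trajectory (not simulation output): value 0 on [0, 50), value 2 on [50, 100).
toy = [(0.0, 0.0), (50.0, 2.0)]
avg_half_warmup = time_average_after_warmup(toy, horizon=100.0, warmup_fraction=0.5)
avg_quarter_warmup = time_average_after_warmup(toy, horizon=100.0, warmup_fraction=0.25)
```

Starting from an empty system biases early observations towards low load, which is why the initial part of each run is excluded from the average.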
In Fig. 4 we focus on the case where all children are stolen. The 95% confidence intervals were computed based on 5 runs with \(N=500\) servers, \(m=4\), \(\mu _1 = 1, \mu _2 = 2, \rho = 0.75\), \(\mathbf {p} = (1,1,1,1,1)/5\) and \(r\in \{1,5\}\). We see that there is an excellent match between the simulated waiting and service times and those of the QBD model (computed as described in Sect. 4).
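A 95% confidence interval from a small number of independent runs can be obtained, for instance, with a Student-t interval on the per-run means. The sketch below assumes this construction; the text does not state which interval was actually used, and the function name `confidence_interval_95` and the sample values are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom
# (4 for 5 runs, 19 for 20 runs).
T_CRIT_95 = {4: 2.776, 19: 2.093}


def confidence_interval_95(run_means):
    """Student-t 95% confidence interval for the mean of independent run results."""
    n = len(run_means)
    mean = statistics.mean(run_means)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(run_means) / math.sqrt(n)
    half_width = T_CRIT_95[n - 1] * sem
    return mean - half_width, mean + half_width


# Hypothetical per-run estimates (not results from the paper).
low, high = confidence_interval_95([1.0, 2.0, 3.0, 4.0, 5.0])
```

With only 5 runs the t critical value (2.776) is noticeably larger than the normal value 1.96, so using the normal approximation would understate the interval width.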
In Table 2 we report the relative error of the simulated mean response time, based on 20 runs, with respect to the one obtained from Sect. 4. We do this for \(\mu _1 = 1, \mu _2 = 2\), \(\mathbf {p} = (1,1,1,1,1)/5\), \(\rho \in \{0.75,0.85\}\), \(r \in \{1,10\}\) and \(N \in \{250,500,1000,2000,4000\}\).
The relative error is below 1.5% in all cases and tends to increase with the steal rate r. Further, the relative error roughly halves when N is doubled, which is in agreement with the results in [7].
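An error that halves whenever N doubles behaves like c/N for some constant c, so on a log-log scale the error plotted against N has slope close to -1. The following sketch shows one way to check this; the helper names `relative_error` and `loglog_slope` are hypothetical, and the error values are synthetic numbers of the exact form c/N, not the entries of Table 2.

```python
import math


def relative_error(simulated, model):
    """Relative error of a simulated value with respect to the model value."""
    return abs(simulated - model) / abs(model)


def loglog_slope(ns, errors):
    """Least-squares slope of log(error) against log(N)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic errors of the form c / N (assumed for illustration only).
ns = [250, 500, 1000, 2000, 4000]
synthetic_errors = [3.0 / n for n in ns]
slope = loglog_slope(ns, synthetic_errors)  # close to -1 for a 1/N decay
```

Applied to measured errors, a fitted slope near -1 would be consistent with the 1/N accuracy discussed in [7], although simulation noise makes the estimate less clean than in this synthetic case.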
Next we validate the model for the strategy of stealing half of the children, using the same simulation settings. In Fig. 5, we see that there is an excellent match between the simulated waiting and service times and those of the QBD model. As in Table 2, we see in Table 3 that the relative error is below 1.5% in all cases, tends to increase with the steal rate r and roughly halves when N is doubled.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Kielanski, G., Van Houdt, B. (2021). Performance Analysis of Work Stealing Strategies in Large Scale Multi-threaded Computing. In: Abate, A., Marin, A. (eds) Quantitative Evaluation of Systems. QEST 2021. Lecture Notes in Computer Science, vol 12846. Springer, Cham. https://doi.org/10.1007/978-3-030-85172-9_18
DOI: https://doi.org/10.1007/978-3-030-85172-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85171-2
Online ISBN: 978-3-030-85172-9
eBook Packages: Computer Science (R0)