Performance Analysis of Work Stealing Strategies in Large Scale Multi-threaded Computing

Kielanski, Grzegorz; Van Houdt, Benny

doi:10.1007/978-3-030-85172-9_18

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12846)

Included in the following conference series:

International Conference on Quantitative Evaluation of Systems

Abstract

Distributed systems use randomized work stealing to improve performance and resource utilization. In most prior analytical studies of randomized work stealing, jobs are considered to be sequential and are executed as a whole on a single server. In this paper we consider a homogeneous system of servers where parent jobs spawn child jobs that can feasibly be executed in parallel. When an idle server probes a busy server in an attempt to steal work, it may either steal a parent job or multiple child jobs.

To approximate the performance of this system we introduce a Quasi-Birth-Death Markov chain and express the performance measures of interest via its unique steady state. We perform simulation experiments that suggest that the approximation error tends to zero as the number of servers in the system becomes large. Using numerical experiments we compare the performance of various simple stealing strategies as well as optimized strategies.

References

1. Bini, D., Meini, B., Steffé, S., Van Houdt, B.: Structured Markov chains solver: software tools. In: Proceeding From the 2006 Workshop on Tools for Solving Structured Markov Chains, pp. 1–14 (2006)
2. Blumofe, R., Leiserson, C.: Scheduling multithreaded computations by work stealing. J. ACM (JACM) 46(5), 720–748 (1999)
3. Blumofe, R., Joerg, C., Kuszmaul, B., Leiserson, C., Randall, K., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996)
4. Bramson, M., Lu, Y., Prabhakar, B.: Randomized load balancing with general service time distributions. In: ACM SIGMETRICS 2010, pp. 275–286 (2010). https://doi.org/10.1145/1811039.1811071
5. Eager, D., Lazowska, E., Zahorjan, J.: A comparison of receiver-initiated and sender-initiated adaptive load sharing. Perform. Eval. 6(1), 53–68 (1986)
6. Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: Proceedings of the SIGPLAN 1998 Conference on Programming Language Design and Implementation, pp. 212–223 (1998)
7. Gast, N.: Expected values estimated via mean-field approximation are 1/n-accurate. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 1, no. 1, p. 17 (2017)
8. Gast, N., Gaujal, B.: A mean field model of work stealing in large-scale systems. ACM SIGMETRICS Perform. Eval. Rev. 38(1), 13–24 (2010)
9. Gautier, T., Besseron, X., Pigeon, L.: Kaapi: A thread scheduling runtime system for data flow computations on cluster of multi-processors. In: Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, pp. 15–23 (2007)
10. Horváth, G., Van Houdt, B., Telek, M.: Commuting matrices in the queue length and sojourn time analysis of MAP/MAP/1 queues. Stoch. Model. 30(4), 554–575 (2014)
11. Latouche, G., Ramaswami, V.: Introduction to matrix analytic methods in stochastic modeling, vol. 5. SIAM (1999)
12. Lea, D.: A Java fork/join framework. In: Proceedings of the ACM 2000 Conference on Java Grande, JAVA 2000, New York, NY, USA, pp. 36–43. Association for Computing Machinery (2000). https://doi.org/10.1145/337449.337465
13. Leijen, D., Schulte, W., Burckhardt, S.: The design of a task parallel library. In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications. OOPSLA 2009, New York, NY, USA, pp. 227–242. Association for Computing Machinery (2009). https://doi.org/10.1145/1640089.1640106
14. Minnebo, W., Hellemans, T., Van Houdt, B.: On a class of push and pull strategies with single migrations and limited probe rate. Perform. Eval. 113, 42–67 (2017)
15. Minnebo, W., Van Houdt, B.: A fair comparison of pull and push strategies in large distributed networks. IEEE/ACM Trans. Networking (TON) 22(3), 996–1006 (2014)
16. Mirchandaney, R., Towsley, D., Stankovic, J.: Adaptive load sharing in heterogeneous distributed systems. J. Parallel Distrib. Comput. 9(4), 331–346 (1990)
17. Neuts, M.: Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Johns Hopkins University Press, Baltimore (1981)
18. Ozawa, T.: Sojourn time distributions in the queue defined by a general QBD process. Queue. Syst. Appl. 53(4), 203–211 (2006)
19. Robison, A., Voss, M., Kukanov, A.: Optimization via reflection on work stealing in TBB. In: 2008 IEEE International Symposium on Parallel and Distributed Processing, pp. 1–8. IEEE (2008)
20. Sonenberg, B., Kielanski, G., Van Houdt, B.: Performance analysis of work stealing in large scale multithreaded computing. ACM ToMPECS (2021, to appear)
21. Van Spilbeeck, I., Van Houdt, B.: Performance of rate-based pull and push strategies in heterogeneous networks. Perform. Eval. 91, 2–15 (2015)
22. Squillante, M., Nelson, R.: Analysis of task migration in shared-memory multiprocessor scheduling. SIGMETRICS Perform. Eval. Rev. 19(1), 143–155 (1991). http://doi.acm.org/10.1145/107972.107987
23. Van Houdt, B.: Randomized work stealing versus sharing in large-scale systems with non-exponential job sizes. IEEE/ACM Trans. Networking 27, 2137–2149 (2019)
24. Wirth, N.: Tasks versus threads: an alternative multiprocessing paradigm. Software Concepts Tools 17, 6–12 (1996)

Author information

Authors and Affiliations

University of Antwerp, Middelheimlaan 1, Antwerp, 2020, Belgium
Grzegorz Kielanski & Benny Van Houdt

Corresponding author

Correspondence to Benny Van Houdt.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Alessandro Abate
Ca’ Foscari University of Venice, Venice, Italy
Andrea Marin

A Model Validation

Based on the numerical experiments in Sect. 5, stealing all of the children and stealing half of them are both good stealing policies: stealing all works best for low values of r, while stealing half works well for higher values. We therefore validate the mean field model for these two policies. Every simulation starts from an empty system and runs up to time \(T = 10^5\), with a warm-up period of 33% of T.
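The warm-up convention can be illustrated with a minimal Python sketch. The function name, the input format and the rule for which jobs are counted are assumptions made for illustration; the paper only states the length of the run and the warm-up fraction.

```python
def mean_response_time(jobs, horizon, warmup_fraction=0.33):
    """Average response time over jobs observed after the warm-up period.

    jobs: iterable of (arrival_time, completion_time) pairs.
    Assumption: a job counts only if it arrives at or after the warm-up
    cutoff and completes by the end of the run.
    """
    cutoff = warmup_fraction * horizon
    kept = [
        done - arrived
        for arrived, done in jobs
        if arrived >= cutoff and done <= horizon
    ]
    if not kept:
        raise ValueError("no jobs completed after the warm-up period")
    return sum(kept) / len(kept)


# Toy trace with horizon 100, so the warm-up cutoff is at time 33.
trace = [(10.0, 12.0), (40.0, 43.0), (50.0, 51.0), (99.0, 105.0)]
estimate = mean_response_time(trace, horizon=100.0)
```

In the toy trace only the second and third jobs are counted: the first arrives during the warm-up and the last finishes after the horizon. Discarding the warm-up reduces the bias caused by starting from an empty system.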

In Fig. 4 we focus on the case where all children are stolen. The 95% confidence intervals were computed based on 5 runs with \(N=500\) servers, \(m=4\), \(\mu _1 = 1, \mu _2 = 2, \rho = 0.75\), \(\mathbf {p} = (1,1,1,1,1)/5\) and \(r\in \{1,5\}\). We see that there is an excellent match between the simulated waiting and service times and those of the QBD model (computed as described in Sect. 4).
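The paper does not say how the confidence intervals were constructed. A common choice for a small number of independent runs is a Student-t interval on the per-run estimates, sketched below in Python. The function name and the small lookup table of critical values are illustrative assumptions.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
# Only the two run counts used in this appendix (5 and 20 runs) are listed.
T_CRIT_95 = {4: 2.776, 19: 2.093}


def confidence_interval_95(samples):
    """Return (low, high) of a 95% Student-t interval for the mean.

    samples: one estimate per independent simulation run.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    half_width = T_CRIT_95[n - 1] * sem
    return mean - half_width, mean + half_width


# Five made-up per-run estimates, purely to exercise the function.
low, high = confidence_interval_95([1.0, 2.0, 3.0, 4.0, 5.0])
```

With only 5 runs the t critical value (2.776) is noticeably larger than the normal value 1.96, so using the normal approximation would understate the interval width.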

In Table 2 we report the relative error between the simulated mean response time, based on 20 runs, and the one obtained from Sect. 4. We do this for \(\mu _1 = 1, \mu _2 = 2\), \(\mathbf {p} = (1,1,1,1,1)/5\), \(\rho \in \{0.75,0.85\}\), \(r \in \{1,10\}\) and \(N \in \{250,500,1000,2000,4000\}\).

The relative error is below 1.5% in all cases and tends to increase with the steal rate r. Moreover, the relative error roughly halves each time N is doubled, which agrees with the results in [7].
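The halving observation corresponds to an error that decays like c/N. A minimal Python sketch of how this could be checked from two table entries follows. The helper names are hypothetical, and taking the model value as the denominator of the relative error is an assumption.

```python
import math


def relative_error(simulated, model):
    """Relative deviation of the simulated mean from the model prediction."""
    return abs(simulated - model) / model


def empirical_order(err_n, err_2n):
    """Estimate k in error ~ c / N**k from the errors at N and at 2N.

    A value close to 1 means the error halves when N doubles.
    """
    return math.log2(err_n / err_2n)


# An error that exactly halves gives order 1.
order = empirical_order(0.012, 0.006)
```

Applying `empirical_order` to consecutive values of N in a table of relative errors gives a quick check of the 1/N behaviour; values scattered around 1 are consistent with it, up to simulation noise.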

Table 2. Relative error of simulation results for E[T(r)], based on 20 runs

Next we validate the model for the strategy of stealing half of the children, using the same simulation settings. In Fig. 5 we again see an excellent match between the simulated waiting and service times and those of the QBD model. As in Table 2, Table 3 shows that the relative error is below 1.5% in all cases, tends to increase with the steal rate r, and roughly halves when N is doubled.

Table 3. Relative error of simulation results for E[T(r)], based on 20 runs

Cite this paper

Kielanski, G., Van Houdt, B. (2021). Performance Analysis of Work Stealing Strategies in Large Scale Multi-threaded Computing. In: Abate, A., Marin, A. (eds) Quantitative Evaluation of Systems. QEST 2021. Lecture Notes in Computer Science, vol 12846. Springer, Cham. https://doi.org/10.1007/978-3-030-85172-9_18

DOI: https://doi.org/10.1007/978-3-030-85172-9_18
Published: 19 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85171-2
Online ISBN: 978-3-030-85172-9
eBook Packages: Computer Science, Computer Science (R0)
