Performance Analysis of Work Stealing Strategies in Large Scale Multi-threaded Computing

  • Conference paper
Quantitative Evaluation of Systems (QEST 2021)

Abstract

Distributed systems use randomized work stealing to improve performance and resource utilization. In most prior analytical studies of randomized work stealing, jobs are considered to be sequential and are executed as a whole on a single server. In this paper we consider a homogeneous system of servers where parent jobs spawn child jobs that can feasibly be executed in parallel. When an idle server probes a busy server in an attempt to steal work, it may either steal a parent job or multiple child jobs.

To approximate the performance of this system we introduce a Quasi-Birth-Death Markov chain and express the performance measures of interest via its unique steady state. We perform simulation experiments that suggest that the approximation error tends to zero as the number of servers in the system becomes large. Using numerical experiments we compare the performance of various simple stealing strategies as well as optimized strategies.

References

  1. Bini, D., Meini, B., Steffé, S., Van Houdt, B.: Structured Markov chains solver: software tools. In: Proceeding From the 2006 Workshop on Tools for Solving Structured Markov Chains, pp. 1–14 (2006)

  2. Blumofe, R., Leiserson, C.: Scheduling multithreaded computations by work stealing. J. ACM (JACM) 46(5), 720–748 (1999)

  3. Blumofe, R., Joerg, C., Kuszmaul, B., Leiserson, C., Randall, K., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996)

  4. Bramson, M., Lu, Y., Prabhakar, B.: Randomized load balancing with general service time distributions. In: ACM SIGMETRICS 2010, pp. 275–286 (2010). https://doi.org/10.1145/1811039.1811071, http://doi.acm.org/10.1145/1811039.1811071

  5. Eager, D., Lazowska, E., Zahorjan, J.: A comparison of receiver-initiated and sender-initiated adaptive load sharing. Perform. Eval. 6(1), 53–68 (1986)

  6. Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: Proceedings of the SIGPLAN 1998 Conference on Programming Language Design and Implementation, pp. 212–223 (1998)

  7. Gast, N.: Expected values estimated via mean-field approximation are 1/n-accurate. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 1, no. 1, p. 17 (2017)

  8. Gast, N., Gaujal, B.: A mean field model of work stealing in large-scale systems. ACM SIGMETRICS Perform. Eval. Rev. 38(1), 13–24 (2010)

  9. Gautier, T., Besseron, X., Pigeon, L.: Kaapi: A thread scheduling runtime system for data flow computations on cluster of multi-processors. In: Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, pp. 15–23 (2007)

  10. Horváth, G., Van Houdt, B., Telek, M.: Commuting matrices in the queue length and sojourn time analysis of MAP/MAP/1 queues. Stoch. Model. 30(4), 554–575 (2014)

  11. Latouche, G., Ramaswami, V.: Introduction to matrix analytic methods in stochastic modeling, vol. 5. SIAM (1999)

  12. Lea, D.: A Java fork/join framework. In: Proceedings of the ACM 2000 Conference on Java Grande, JAVA 2000, New York, NY, USA, pp. 36–43. Association for Computing Machinery (2000). https://doi.org/10.1145/337449.337465

  13. Leijen, D., Schulte, W., Burckhardt, S.: The design of a task parallel library. In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications. OOPSLA 2009, New York, NY, USA, pp. 227–242. Association for Computing Machinery (2009). https://doi.org/10.1145/1640089.1640106

  14. Minnebo, W., Hellemans, T., Van Houdt, B.: On a class of push and pull strategies with single migrations and limited probe rate. Perform. Eval. 113, 42–67 (2017)

  15. Minnebo, W., Van Houdt, B.: A fair comparison of pull and push strategies in large distributed networks. IEEE/ACM Trans. Networking (TON) 22(3), 996–1006 (2014)

  16. Mirchandaney, R., Towsley, D., Stankovic, J.: Adaptive load sharing in heterogeneous distributed systems. J. Parallel Distrib. Comput. 9(4), 331–346 (1990)

  17. Neuts, M.: Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Johns Hopkins University Press, Baltimore (1981)

  18. Ozawa, T.: Sojourn time distributions in the queue defined by a general QBD process. Queue. Syst. Appl. 53(4), 203–211 (2006)

  19. Robison, A., Voss, M., Kukanov, A.: Optimization via reflection on work stealing in TBB. In: 2008 IEEE International Symposium on Parallel and Distributed Processing, pp. 1–8. IEEE (2008)

  20. Sonenberg, B., Kielanski, G., Van Houdt, B.: Performance analysis of work stealing in large scale multithreaded computing. ACM ToMPECS (2021, to appear)

  21. Van Spilbeeck, I., Van Houdt, B.: Performance of rate-based pull and push strategies in heterogeneous networks. Perform. Eval. 91, 2–15 (2015)

  22. Squillante, M., Nelson, R.: Analysis of task migration in shared-memory multiprocessor scheduling. SIGMETRICS Perform. Eval. Rev. 19(1), 143–155 (1991). http://doi.acm.org/10.1145/107972.107987

  23. Van Houdt, B.: Randomized work stealing versus sharing in large-scale systems with non-exponential job sizes. IEEE/ACM Trans. Networking 27, 2137–2149 (2019)

  24. Wirth, N.: Tasks versus threads: an alternative multiprocessing paradigm. Software Concepts Tools 17, 6–12 (1996)

Corresponding author

Correspondence to Benny Van Houdt.

A Model Validation

Based on the numerical experiments in Sect. 5, stealing all of the children and stealing half of the children are both good stealing policies: stealing all works best for low values of r, while stealing half works well for higher values. We therefore validate the mean field model for these two policies. Every simulation starts from an empty system and runs up to time \(T = 10^5\), with a warm-up period of 33% of T.
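The warm-up step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `mean_after_warmup`, the choice to attribute each job to its completion time, and the toy trace are all hypothetical and are not taken from the paper's simulator.

```python
from statistics import fmean


def mean_after_warmup(completion_times, response_times, horizon, warmup_fraction=0.33):
    """Average the response times of jobs that complete after the warm-up period.

    Assumption: a job is counted if its completion time lies in
    (warmup_fraction * horizon, horizon]; the paper may attribute jobs differently.
    """
    cutoff = warmup_fraction * horizon
    kept = [
        r
        for t, r in zip(completion_times, response_times)
        if cutoff < t <= horizon
    ]
    if not kept:
        raise ValueError("no observations after the warm-up period")
    return fmean(kept)


# Toy trace with made-up numbers (not data from the paper).
completion_times = [10.0, 20.0, 40.0, 60.0, 90.0]
response_times = [5.0, 4.0, 2.0, 3.0, 4.0]

# With horizon 100 the cutoff is 33, so only the last three jobs are kept.
estimate = mean_after_warmup(completion_times, response_times, horizon=100.0)
print(estimate)
```

Discarding the initial transient matters here because the simulation starts from an empty system, which would otherwise bias the estimated waiting and response times downwards.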

In Fig. 4 we focus on the case where all children are stolen. The 95% confidence intervals were computed based on 5 runs with \(N=500\) servers, \(m=4\), \(\mu _1 = 1, \mu _2 = 2, \rho = 0.75\), \(\mathbf {p} = (1,1,1,1,1)/5\) and \(r\in \{1,5\}\). We see that there is an excellent match between the simulated waiting and service times and those of the QBD model (calculated using Sect. 4).
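A 95% confidence interval from a handful of independent runs could be computed along the following lines. The text does not say which interval construction was used, so the Student-t interval below is an assumption; the helper name `confidence_interval_95` and the per-run values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

# Two-sided 95% Student-t critical values (0.975 quantile) for a few
# degrees of freedom; 4 corresponds to 5 runs, 19 to 20 runs.
T_975 = {4: 2.776, 9: 2.262, 19: 2.093}


def confidence_interval_95(run_means):
    """Student-t 95% confidence interval for the mean of independent run averages."""
    n = len(run_means)
    df = n - 1
    if df not in T_975:
        raise ValueError(f"no tabulated t-quantile for {df} degrees of freedom")
    center = fmean(run_means)
    half_width = T_975[df] * stdev(run_means) / sqrt(n)
    return center - half_width, center + half_width


# Illustrative per-run mean response times (made up, not from the paper).
runs = [2.00, 2.10, 1.90, 2.05, 1.95]
low, high = confidence_interval_95(runs)
print(low, high)
```

With only 5 runs the t-quantile (2.776) is noticeably larger than the normal quantile (1.96), so a normal approximation would make the intervals look tighter than they are.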

Fig. 4. Waiting and response times from the QBD (blue dots) and simulations (red dashed line) with confidence intervals for 5 runs. (Color figure online)

Table 2 reports the relative error of the simulated mean response time, based on 20 runs, with respect to the value obtained in Sect. 4. We do this for \(\mu _1 = 1, \mu _2 = 2\), \(\mathbf {p} = (1,1,1,1,1)/5\), \(\rho \in \{0.75,0.85\}\), \(r \in \{1,10\}\) and \(N \in \{250,500,1000,2000,4000\}\).

The relative error is below 1.5% in all cases and tends to increase with the steal rate r. Moreover, the relative error appears to roughly halve each time N is doubled, which is in agreement with the results in [7].
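The halving pattern corresponds to an error that decays like c/N, which shows up as a slope of about -1 when log(error) is plotted against log(N). The sketch below checks this on synthetic values only: the error sequence is generated as 2.5/N for illustration and is not the content of Table 2, and normalising the relative error by the simulated value is an assumption.

```python
from math import log


def relative_error(simulated, model):
    """Relative error of the model value with respect to the simulated value.

    Assumption: the error is normalised by the simulated value.
    """
    return abs(simulated - model) / abs(simulated)


def loglog_slope(ns, errors):
    """Least-squares slope of log(error) against log(N); -1 means error ~ c/N."""
    xs = [log(n) for n in ns]
    ys = [log(e) for e in errors]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


ns = [250, 500, 1000, 2000, 4000]
# Synthetic errors of the form c/N with c = 2.5 (NOT values from the paper).
errors = [2.5 / n for n in ns]
slope = loglog_slope(ns, errors)
print(slope)
```

On measured errors the slope would only be approximately -1, since each simulated mean carries its own sampling noise.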

Table 2. Relative error of simulation results for E[T(r)], based on 20 runs

Fig. 5. Waiting and response times from the QBD (blue dots) and simulations (red dashed line) with confidence intervals for 5 runs. (Color figure online)

Next we validate the model for the strategy of stealing half of the children, using the same simulation settings. In Fig. 5 we see that there is an excellent match between the simulated waiting and service times and those of the QBD model. As in Table 2, we see in Table 3 that the relative error is below 1.5% in all cases, tends to increase with the steal rate r, and appears to roughly halve each time N is doubled.

Table 3. Relative error of simulation results for E[T(r)], based on 20 runs

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kielanski, G., Van Houdt, B. (2021). Performance Analysis of Work Stealing Strategies in Large Scale Multi-threaded Computing. In: Abate, A., Marin, A. (eds) Quantitative Evaluation of Systems. QEST 2021. Lecture Notes in Computer Science, vol 12846. Springer, Cham. https://doi.org/10.1007/978-3-030-85172-9_18

  • DOI: https://doi.org/10.1007/978-3-030-85172-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85171-2

  • Online ISBN: 978-3-030-85172-9

  • eBook Packages: Computer Science (R0)
