Distributed Computing, Volume 18, Issue 3, pp 189–207

A dynamic-sized nonblocking work stealing deque

Special Issue Disc 04

Abstract

The non-blocking work-stealing algorithm of Arora, Blumofe, and Plaxton (henceforth ABP work-stealing) is on its way to becoming the multiprocessor load balancing technology of choice in both industry and academia. This highly efficient scheme is based on a collection of array-based double-ended queues (deques) with low-cost synchronization among local and stealing processes. Unfortunately, the algorithm's synchronization protocol is strongly based on the use of fixed-size arrays, which are prone to overflows, especially in the multiprogrammed environments for which they are designed. This is a significant drawback since, apart from memory inefficiency, it means that the size of the deque must be tailored to accommodate the effects of the hard-to-predict level of multiprogramming, and the implementation must include an expensive and application-specific overflow mechanism.

This paper presents the first dynamic memory work-stealing algorithm. It is based on a novel way of building non-blocking dynamic-sized work stealing deques by detecting synchronization conflicts based on “pointer-crossing” rather than “gaps between indexes” as in the original ABP algorithm. As we show, the new algorithm dramatically increases robustness and memory efficiency, while causing applications no observable performance penalty. We therefore believe it can replace array-based ABP work stealing deques, eliminating the need for application-specific overflow mechanisms.

Keywords

Concurrent programming, Load balancing, Work stealing, Lock-free, Data structures

References

  1. Lev, Y.: A Dynamic-Sized Nonblocking Work Stealing Deque. MS thesis, Tel-Aviv University, Tel-Aviv, Israel (2004)
  2. Rudolph, L., Slivkin-Allalouf, M., Upfal, E.: A simple load balancing scheme for task allocation in parallel machines. In: Proceedings of the 3rd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 237–245. ACM Press (1991)
  3. Arora, N.S., Blumofe, R.D., Plaxton, C.G.: Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems 34, 115–144 (2001)
  4. Acar, U.A., Blelloch, G.E., Blumofe, R.D.: The data locality of work stealing. In: ACM Symposium on Parallel Algorithms and Architectures, pp. 1–12 (2000)
  5. Flood, C., Detlefs, D., Shavit, N., Zhang, C.: Parallel garbage collection for shared memory multiprocessors. In: Usenix Java Virtual Machine Research and Technology Symposium (JVM '01), Monterey, CA (2001)
  6. Leiserson, P.: Programming parallel applications in Cilk. SINEWS: SIAM News 31 (1998)
  7. Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. Journal of the ACM 46, 720–748 (1999)
  8. Knuth, D.: The Art of Computer Programming: Fundamental Algorithms. 2nd edn. Addison-Wesley (1968)
  9. Hendler, D., Shavit, N.: Non-blocking steal-half work queues. In: Proceedings of the 21st Annual ACM Symposium on Principles of Distributed Computing (2002)
  10. Detlefs, D., Flood, C., Heller, S., Printezis, T.: Garbage-first garbage collection. Technical report, Sun Microsystems – Sun Laboratories (2004). To appear.
  11. Agesen, O., Detlefs, D., Flood, C., Garthwaite, A., Martin, P., Moir, M., Shavit, N., Steele, G.: DCAS-based concurrent deques. Theory of Computing Systems 35, 349–386 (2002)
  12. Martin, P., Moir, M., Steele, G.: DCAS-based concurrent deques supporting bulk allocation. Technical Report TR-2002-111, Sun Microsystems Laboratories (2002)
  13. Greenwald, M.B., Cheriton, D.R.: The synergy between non-blocking synchronization and operating system structure. In: 2nd Symposium on Operating Systems Design and Implementation, pp. 123–136. Seattle, WA (1996)
  14. Blumofe, R.D., Papadopoulos, D.: The performance of work stealing in multiprogrammed environments (extended abstract). In: Measurement and Modeling of Computer Systems, pp. 266–267 (1998)
  15. Arnold, J.M., Buell, D.A., Davis, E.G.: Splash 2. In: Proceedings of the Fourth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 316–322. ACM Press (1992)
  16. Papadopoulos, D.: Hood: A user-level thread library for multiprogrammed multiprocessors. Master's thesis, Department of Computer Sciences, University of Texas at Austin (1998)
  17. Prakash, S., Lee, Y., Johnson, T.: A non-blocking algorithm for shared queues using compare-and-swap. IEEE Transactions on Computers 43, 548–559 (1994)
  18. Scott, M.L.: Personal communication: Code for a lock-free memory management pool (2003)
  19. Hendler, D., Lev, Y., Moir, M., Shavit, N.: A dynamic-sized nonblocking work stealing deque. Technical Report TR-2005-144, Sun Microsystems Laboratories (2005)

Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  • Danny Hendler (1)
  • Yossi Lev (2)
  • Mark Moir (3)
  • Nir Shavit (4)

  1. Tel-Aviv University, Israel
  2. Brown University & Sun Microsystems Laboratories, UK
  3. Sun Microsystems Laboratories, USA
  4. Sun Microsystems Laboratories & Tel-Aviv University, Israel
