Dynamic Memory ABP Work-Stealing

  • Danny Hendler
  • Yossi Lev
  • Nir Shavit
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3274)

Abstract

The non-blocking work-stealing algorithm of Arora, Blumofe, and Plaxton (henceforth ABP work-stealing) is on its way to becoming the multiprocessor load-balancing technology of choice in both industry and academia. This highly efficient scheme is based on a collection of array-based deques with low-cost synchronization between local and stealing processes. Unfortunately, the algorithm's synchronization protocol relies heavily on fixed-size arrays, which are prone to overflow, especially in the multiprogrammed environments for which they are designed. This is a significant drawback: apart from the memory inefficiency, it means users must tailor the deque size to the effects of a hard-to-predict level of multiprogramming, and must add expensive blocking overflow-management mechanisms.
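
To make the overflow problem concrete, the following is a minimal sketch of an array-based ABP-style deque, assuming Java 16+ (for records). The operation names pushBottom, popTop, and popBottom follow the usual ABP terminology. The class name AbpDeque, the Age record, and the choice to throw IllegalStateException on overflow are illustrative assumptions, not taken from this paper. The owner pushes and pops at the bottom without a CAS on the fast path; thieves steal from the top with a CAS on the combined (tag, top) "age". Emptiness is detected by the gap between the bottom and top indexes. This is the original fixed-array scheme, not the dynamic-memory algorithm the paper proposes.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of an ABP-style work-stealing deque over a FIXED-SIZE array.
// Assumed, illustrative names: AbpDeque, Age; overflow -> IllegalStateException.
final class AbpDeque<T> {
    // Mirrors ABP's "age" word: the top index plus a tag bumped on every reset.
    // (In Java, CAS on a freshly allocated object already rules out ABA;
    // the tag is kept only to mirror the original protocol.)
    private record Age(int tag, int top) {}

    private final Object[] deq;        // fixed capacity: the source of overflow
    private volatile int bot = 0;      // written only by the owning thread
    private final AtomicReference<Age> age = new AtomicReference<>(new Age(0, 0));

    AbpDeque(int capacity) { deq = new Object[capacity]; }

    // Owner only.
    void pushBottom(T task) {
        int b = bot;
        if (b == deq.length) {
            // The array-based scheme has nowhere to put the task.
            throw new IllegalStateException("deque overflow");
        }
        deq[b] = task;
        bot = b + 1;                   // volatile write publishes deq[b] to thieves
    }

    // Any thief. Returns null if the deque looks empty or the CAS lost a race.
    @SuppressWarnings("unchecked")
    T popTop() {
        Age oldAge = age.get();
        int b = bot;
        if (b <= oldAge.top()) return null;      // no gap between indexes: empty
        T task = (T) deq[oldAge.top()];
        Age newAge = new Age(oldAge.tag(), oldAge.top() + 1);
        return age.compareAndSet(oldAge, newAge) ? task : null;
    }

    // Owner only.
    @SuppressWarnings("unchecked")
    T popBottom() {
        int b = bot;
        if (b == 0) return null;
        b--;
        bot = b;                                 // announce the claim before reading age
        T task = (T) deq[b];
        Age oldAge = age.get();
        if (b > oldAge.top()) return task;       // fast path: no conflict with thieves
        // At most one task was left: reset the deque and race thieves for it.
        bot = 0;
        Age newAge = new Age(oldAge.tag() + 1, 0);
        if (b == oldAge.top() && age.compareAndSet(oldAge, newAge)) return task;
        age.set(newAge);                         // a thief won; deque is now empty
        return null;
    }
}
```

With a capacity of 2, a third pushBottom fails even though the program may have plenty of free memory. This is the situation that, per the abstract, forces users to size deques for the worst case or bolt on blocking overflow handling.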

This paper presents the first dynamic-memory work-stealing algorithm. It is based on a novel way of building non-blocking dynamic-memory ABP deques that detects synchronization conflicts by "pointer-crossing" rather than by "gaps between indexes" as in the original ABP algorithm. As we show, the new algorithm dramatically increases robustness and memory efficiency while imposing no observable performance penalty on applications. We therefore believe it can replace array-based ABP work queues, eliminating the need for application-specific overflow mechanisms.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Danny Hendler (1)
  • Yossi Lev (2)
  • Nir Shavit (2)
  1. Tel-Aviv University
  2. Sun Microsystems Laboratories & Tel-Aviv University
