Dynamic Memory ABP Work-Stealing
The non-blocking work-stealing algorithm of Arora, Blumofe, and Plaxton (hencheforth ABP work-stealing) is on its way to becoming the multiprocessor load balancing technology of choice in both Industry and Academia. This highly efficient scheme is based on a collection of array-based deques with low cost synchronization among local and stealing processes. Unfortunately, the algorithm’s synchronization protocol is strongly based on the use of fixed size arrays, which are prone to overflows, especially in the multiprogrammed environments which they are designed for. This is a significant drawback since, apart from memory inefficiency, it means users must tailor the deque size to accommodate the effects of the hard-to-predict level of multiprogramming, and add expensive blocking overflow-management mechanisms.
This paper presents the first dynamic memory work-stealing algorithm. It is based on a novel way of building non-blocking dynamic memory ABP deques by detecting synchronization conflicts based on “pointer-crossing” rather than “gaps between indexes” as in the original ABP algorithm. As we show, the new algorithm dramatically increases robustness and memory efficiency, while causing applications no observable performance penalty. We therefore believe it can replace array-based ABP work-queues, eliminating the need to add application specific overflow mechanisms.
Unable to display preview. Download preview PDF.
- 2.Acar, U.A., Blelloch, G.E., Blumofe, R.D.: The data locality of work stealing. In: ACM Symposium on Parallel Algorithms and Architectures, pp. 1–12 (2000)Google Scholar
- 3.Flood, C., Detlefs, D., Shavit, N., Zhang, C.: Parallel garbage collection for shared memory multiprocessors. In: Usenix Java Virtual Machine Research and Technology Symposium (JVM 2001), Monterey, CA (2001)Google Scholar
- 4.Leiserson, Plaat: Programming parallel applications in cilk. SINEWS: SIAM News 31 (1998)Google Scholar
- 6.Hendler, D., Shavit, N.: Non-blocking steal-half work queues. In: Proceedings of the 21st Annual ACM Symposium on Principles of Distributed Computing (2002)Google Scholar
- 7.Detlefs, D., Flood, C., Heller, S., Printezis, T.: Garbage-first garbage collection. Technical report, Sun Microsystems – Sun Laboratories (2004) (to appear)Google Scholar
- 9.Martin, P., Moir, M., Steele, G.: Dcas-based concurrent deques supporting bulk allocation. Technical Report TR-2002-111, Sun Microsystems Laboratories (2002)Google Scholar
- 10.Greenwald, M.B., Cheriton, D.R.: The synergy between non-blocking synchronization and operating system structure. In: 2nd Symposium on Operating Systems Design and Implementation, Seattle, WA, pp. 123–136 (1996)Google Scholar
- 11.Blumofe, R.D., Papadopoulos, D.: The performance of work stealing in multiprogrammed environments (extended abstract). In: Measurement and Modeling of Computer Systems, pp. 266–267 (1998)Google Scholar
- 13.Papadopoulos, D.: Hood: A user-level thread library for multiprogrammed multiprocessors. In: Master’s thesis, Department of Computer Sciences, University of Texas at Austin (1998)Google Scholar
- 14.Scott, M.L.: Personal communication: Code for a lock-free memory management pool (2003)Google Scholar
- 16.Blumofe, R.D., Plaxton, C.G., Ray, S.: Verification of a concurrent deque implementation. Technical Report CS-TR-99-11, University of Texas at Austin (1999)Google Scholar