The Adaptive Priority Queue with Elimination and Combining
Priority queues are fundamental abstract data structures, often used to manage limited resources in parallel programming. Several proposed parallel priority queue implementations are based on skiplists, harnessing the potential for parallelism of the add() operations. In addition, methods such as Flat Combining have been proposed to reduce contention, batching together multiple operations to be executed by a single thread. While this technique can decrease lock-switching overhead and the number of pointer changes required by the removeMin() operations in the priority queue, it can also create a sequential bottleneck and limit parallelism, especially for non-conflicting add() operations.
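To make the trade-off concrete, the following is a minimal sketch of the flat-combining idea described above, not the implementation evaluated in the paper. It assumes a binary heap as the sequential structure and a plain list as the publication list; the names `Request`, `FlatCombiningPQ`, `_submit`, and `_combine` are hypothetical. Each thread publishes its operation, and whichever thread obtains the combiner lock applies the whole pending batch on behalf of the others.

```python
import heapq
import threading


class Request:
    """One published operation: "add" with a value, or "removeMin"."""

    def __init__(self, op, value=None):
        self.op = op
        self.value = value
        self.result = None
        self.done = threading.Event()


class FlatCombiningPQ:
    """Illustrative flat-combining priority queue (assumed design)."""

    def __init__(self):
        self._heap = []                         # touched only by the current combiner
        self._combiner_lock = threading.Lock()  # held by the thread acting as combiner
        self._pending = []                      # publication list of unserved requests
        self._pending_lock = threading.Lock()   # protects the publication list

    def _submit(self, req):
        # Publish the request, then either become the combiner or wait.
        with self._pending_lock:
            self._pending.append(req)
        while not req.done.is_set():
            if self._combiner_lock.acquire(blocking=False):
                try:
                    self._combine()
                finally:
                    self._combiner_lock.release()
            else:
                # Another thread is combining; it may serve this request.
                req.done.wait(0.001)
        return req.result

    def _combine(self):
        # Take the whole batch and apply it sequentially.
        with self._pending_lock:
            batch, self._pending = self._pending, []
        for req in batch:
            if req.op == "add":
                heapq.heappush(self._heap, req.value)
            else:
                req.result = heapq.heappop(self._heap) if self._heap else None
            req.done.set()

    def add(self, value):
        self._submit(Request("add", value))

    def remove_min(self):
        # Returns None when the queue is empty.
        return self._submit(Request("removeMin"))
```

In this sketch every operation, including add() calls that would not conflict with each other, passes through the single combiner. That is the sequential bottleneck the paragraph above refers to.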
In this paper, we describe a novel priority queue design that combines the scalability of parallel insertions with the efficiency of batched removals. Moreover, we present a new elimination algorithm suitable for a priority queue, which further increases concurrency on balanced workloads with similar numbers of add() and removeMin() operations. We implement and evaluate our design using a variety of techniques, including locking, atomic operations, and hardware transactional memory, together with adaptive heuristics that respond to the workload.
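As an illustration of how elimination might apply to a priority queue, the sketch below pairs pending add() values with pending removeMin() calls before either touches the queue. The abstract does not state the matching rule, so the condition used here is an assumption: an add(v) is eliminated only when v is no larger than the current queue minimum (or the queue is empty), because returning v is then a legal removeMin() result. The function `apply_batch` and its parameters are hypothetical names, and the batch is processed sequentially for clarity.

```python
import heapq


def apply_batch(heap, adds, num_removes):
    """Serve a batch of pending operations against a heap-ordered list.

    heap        -- list kept in heapq order (the shared queue)
    adds        -- values of pending add() calls
    num_removes -- number of pending removeMin() calls

    Returns (results, eliminated): the value handed to each removeMin()
    (None if nothing was available) and how many adds were eliminated.
    """
    adds = sorted(adds)
    results = []
    eliminated = 0  # adds[:eliminated] were handed straight to removers
    for _ in range(num_removes):
        has_add = eliminated < len(adds)
        # Assumed rule: eliminate only if the added value would itself
        # be the minimum of the queue after insertion.
        if has_add and (not heap or adds[eliminated] <= heap[0]):
            results.append(adds[eliminated])
            eliminated += 1
        elif heap:
            results.append(heapq.heappop(heap))
        else:
            results.append(None)
    # Adds that found no partner are inserted into the queue as usual.
    for value in adds[eliminated:]:
        heapq.heappush(heap, value)
    return results, eliminated


queue = [5, 8, 9]
heapq.heapify(queue)
results, eliminated = apply_batch(queue, adds=[3, 7, 10], num_removes=2)
# add(3) is eliminated against the first removeMin(); the second
# removeMin() takes 5 from the queue; 7 and 10 are inserted normally.
```

Each eliminated pair never touches the queue, which is why a workload with similar numbers of add() and removeMin() operations would be expected to benefit most under this assumption.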
Keywords: Sequential Part, Priority Queue, Transactional Memory, Parallel Part, Elimination Algorithm