Fast and scalable rendezvousing

Distributed Computing 26, 243–269 (2013)

Abstract

In an asymmetric rendezvous system, such as an unfair synchronous queue or an elimination array, threads of two types, consumers and producers, show up and are matched each with a unique thread of the other type. Here we present new highly scalable, high throughput asymmetric rendezvous systems that outperform prior synchronous queue and elimination array implementations under both symmetric and asymmetric workloads (more operations of one type than the other). Based on this rendezvous system, we also construct a highly scalable and competitive stack implementation.
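To make the rendezvous notion concrete, here is a minimal, hypothetical single-slot exchange in Java: a producer installs its item with a compare-and-swap (CAS) and spins until some consumer claims it; a consumer claims a pending item with a CAS. This is only a sketch of the basic handshake described by the abstract, not the paper's algorithm (which scales by spreading many such handshakes out); the class and method names (RendezvousSlot, put, take) are illustrative.

    import java.util.concurrent.atomic.AtomicReference;

    // Hypothetical single-slot rendezvous: a producer and a consumer meet here
    // and exchange one item. A sketch of the concept only, not the paper's code.
    final class RendezvousSlot<T> {
        private final AtomicReference<T> slot = new AtomicReference<>();

        // Producer: install the item, then wait until some consumer removes it.
        void put(T item) {
            while (!slot.compareAndSet(null, item)) {
                Thread.yield(); // slot occupied by another producer; retry
            }
            while (slot.get() == item) {
                Thread.yield(); // wait for a consumer to claim the item
            }
        }

        // Consumer: wait for an item to appear and atomically claim it.
        T take() {
            while (true) {
                T item = slot.get();
                if (item != null && slot.compareAndSet(item, null)) {
                    return item;
                }
                Thread.yield(); // nothing pending, or another consumer won the race
            }
        }
    }

A symmetric workload drives put and take from equal numbers of producer and consumer threads; the asymmetric workloads mentioned in the abstract correspond to more callers of one operation than of the other.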

Notes

  1. This reflects Java semantics, where arrays are of references to objects and not of objects themselves.

  2. This is standard array semantics in Java, but not in C++; a minimal illustration appears after these notes.

  3. GC is part of modern environments such as C# and Java, in which most prior synchronous queue algorithms were implemented [1, 8, 17].

  4. Java benchmarks were run with the HotSpot Server JVM, build 1.7.0_05-b05. C++ benchmarks were compiled with Sun C++ 5.9 on the SPARC machine and with gcc 4.3.3 (with the -O3 optimization setting) on the Intel machine. In the C++ experiments we used the Hoard 3.8 [3] memory allocator.

  5. We removed all statistics counting from the code and used the latest JVM; thus, the results we report are usually slightly better than those reported in the original papers. On the other hand, we fixed a bug in the benchmark of [8] that miscounted timed-out operations of the Java channel as successful operations, so the results we report for it are sometimes lower.

  6. We reduced the overhead due to memory allocation in the original implementations [7] by caching objects popped from the stack and using them in future push operations.
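A minimal illustration of the array semantics in notes 1 and 2, using a hypothetical Node class (not taken from the paper's code): in Java, allocating the array creates no Node objects, and copying a cell copies only a reference to the shared object.

    // Hypothetical Node class, for illustration only.
    final class Node {
        volatile Object item;
    }

    class ArraySemanticsDemo {
        public static void main(String[] args) {
            Node[] ring = new Node[4];        // four null references; no Node objects yet
            ring[0] = new Node();             // cell 0 now refers to a Node on the heap
            Node alias = ring[0];             // copies the reference, not the object
            alias.item = "x";                 // update is visible through ring[0] too
            System.out.println(ring[0].item); // prints "x": both names reach the same Node
        }
    }

In C++, by contrast, an array declared as Node ring[4] holds four Node objects by value, which is the distinction note 2 draws.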

References

  1. Afek, Y., Korland, G., Natanzon, M., Shavit, N.: Scalable producer-consumer pools based on elimination-diffraction trees. In: Euro-Par 2010—Parallel Processing, vol. 6272 of LNCS, pp. 151–162. Springer, Berlin, Heidelberg (2010)

  2. Andrews, G.R.: Concurrent Programming: Principles and Practice. Benjamin-Cummings Publishing Co, Redwood City (1991)

  3. Berger, E.D., McKinley, K.S., Blumofe, R.D., Wilson, P.R.: Hoard: a scalable memory allocator for multithreaded applications. SIGARCH Comput. Archit. News 28(5), 117–128 (2000)

  4. Fatourou, P., Kallimanis, N.D.: A highly-efficient wait-free universal construction. In: Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2011, pp. 325–334. ACM, New York, NY, USA (2011)

  5. Fatourou, P., Kallimanis, N.D.: Revisiting the combining synchronization technique. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’12, pp. 257–266. ACM, New York, NY, USA, (2012)

  6. Hanson, D.R.: C Interfaces and Implementations: Techniques for Creating Reusable Software. Addison-Wesley Longman Publishing, Boston (1996)

  7. Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Flat combining and the synchronization-parallelism tradeoff. In: Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2010, pp. 355–364. ACM, New York, NY, USA (2010)

  8. Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Scalable flat-combining based synchronous queues. In: Proceedings of the 24th International Symposium on Distributed Computing (DISC 2010), vol. 6343 of LNCS, pp. 79–93. Springer, Berlin, Heidelberg (2010)

  9. Hendler, D., Shavit, N., Yerushalmi, L.: A scalable lock-free stack algorithm. J. Parallel Distrib. Comput. 70(1), 1–12 (2010). doi: 10.1016/j.jpdc.2009.08.011

  10. Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. (TOPLAS) 13, 124–149 (1991)

  11. Herlihy, M.P., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. (TOPLAS) 12, 463–492 (1990)

  12. Lea, D., Scherer, W.N. III, Scott, M.L.: java.util.concurrent.Exchanger source code. http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/Exchanger.java (2011)

  13. Merritt, M., Taubenfeld, G.: Computing with infinitely many processes. In: Proceedings of the 14th International Symposium on Distributed Computing (DISC 2000), vol. 1914 of LNCS, pp. 164–178. Springer, Berlin, Heidelberg (2000)

  14. Michael, M.M.: Hazard pointers: safe memory reclamation for lock-free objects. IEEE Trans. Parallel Distrib. Syst. 15(6), 491–504 (2004)

  15. Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In: Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing, PODC ’96, pp. 267–275. ACM, New York, NY, USA (1996)

  16. Moir, M., Nussbaum, D., Shalev, O., Shavit, N.: Using elimination to implement scalable and lock-free FIFO queues. In: Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2005, pp. 253–262. ACM, New York, NY, USA (2005)

  17. Scherer, W.N. III, Lea, D., Scott, M.L.: Scalable synchronous queues. In: Proceedings of the 11th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2006, pp. 147–156. ACM, New York, NY, USA (2006)

  18. Scherer, W.N. III, Lea, D., Scott, M.L.: A scalable elimination-based exchange channel. In: Workshop on Synchronization and Concurrency in Object-Oriented Languages (SCOOL 2005), October (2005)

  19. Scherer, W.N. III, Scott, M.L.: Nonblocking concurrent data structures with condition synchronization. In: Proceedings of the 18th International Symposium on Distributed Computing (DISC 2004), vol. 3274 of LNCS, pp. 174–187. Springer, Berlin, Heidelberg (2004)

  20. Shavit, N., Touitou, D.: Elimination trees and the construction of pools and stacks. Theory Comput. Syst. 30(6), 645–670 (1997). doi: 10.1007/s002240000072

  21. Shavit, N., Zemach, A.: Diffracting trees. ACM Trans. Comput. Syst. (TOCS) 14, 385–428 (1996)

  22. Shavit, N., Zemach, A.: Combining funnels: a dynamic approach to software combining. J. Parallel Distrib. Comput. 60(11), 1355–1387 (2000)

  23. Tang, L., Mars, J., Vachharajani, N., Hundt, R., Soffa, M.L.: The impact of memory subsystem resource sharing on datacenter applications. In: Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’11, ACM, New York, NY, USA (2011)

  24. Treiber, R.K.: Systems programming: coping with parallelism. Technical Report RJ5118, IBM Almaden Research Center (1986)

Acknowledgments

We are grateful to Hillel Avni, Nir Shavit and the anonymous reviewers, whose comments and suggestions helped to considerably improve this paper.

Author information

Corresponding author

Correspondence to Yehuda Afek.

Additional information

This work was supported by the Israel Science Foundation under grant 1386/11 and by machine donations from Intel and Oracle. Adam Morrison is supported by an IBM PhD Fellowship.

Cite this article

Afek, Y., Hakimi, M. & Morrison, A. Fast and scalable rendezvousing. Distrib. Comput. 26, 243–269 (2013). https://doi.org/10.1007/s00446-013-0185-0
