Skip to main content

Fast and scalable rendezvousing


In an asymmetric rendezvous system, such as an unfair synchronous queue or an elimination array, threads of two types, consumers and producers, show up and are matched each with a unique thread of the other type. Here we present new highly scalable, high throughput asymmetric rendezvous systems that outperform prior synchronous queue and elimination array implementations under both symmetric and asymmetric workloads (more operations of one type than the other). Based on this rendezvous system, we also construct a highly scalable and competitive stack implementation.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23


  1. 1.

    This reflects Java semantics, where arrays are of references to objects and not of objects themselves.

  2. 2.

    This is standard array semantics in Java, but not in C++.

  3. 3.

    GC is part of modern environments such as C# and Java, in which most prior synchronous queue algorithms were implemented [1, 8, 17].

  4. 4.

    Java benchmarks were ran with HotSpot Server JVM, build 1.7.0_05-b05. C++ benchmarks were compiled with Sun C++ 5.9 on the SPARC machine and with gcc 4.3.3 (-O3 optimization setting) on the Intel machine. In the C++ experiments we used the Hoard 3.8 [3] memory allocator.

  5. 5.

    We remove all statistics counting from the code and use the latest JVM. Thus, the results we report are usually slightly better than those reported in the original papers. On the other hand, we fixed a bug in the benchmark of [8] that miscounted timed-out operations of the Java channel as successful operations; thus the results we report for it are sometimes lower.

  6. 6.

    We reduced the overhead due to memory allocation in the original implementations [7] by caching objects popped from the stack and using them in future push operations.


  1. 1.

    Afek, Y., Korland, G., Natanzon, M., Shavit, N.: Scalable producer-consumer pools based on elimination-diffraction trees. In: Euro-Par 2010—Parallel Processing, vol. 6272 of LNCS, pp. 151–162. Springer, Berlin, Heidelberg (2010)

  2. 2.

    Andrews, G.R.: Concurrent Programming: Principles and Practice. Benjamin-Cummings Publishing Co, Redwood City (1991)

    Google Scholar 

  3. 3.

    Berger, E.D., McKinley, K.S., Blumofe, R.D., Wilson, P.R.: Hoard: a scalable memory allocator for multithreaded applications. SIGARCH Comput. Archit. News 28(5), 117–128 (2000)

    Article  Google Scholar 

  4. 4.

    Fatourou, P., Kallimanis, N.D.: A highly-efficient wait-free universal construction. In: Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2011, pp. 325–334. ACM, New York, NY, USA (2011)

  5. 5.

    Fatourou, P., Kallimanis, N.D.: Revisiting the combining synchronization technique. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’12, pp. 257–266. ACM, New York, NY, USA, (2012)

  6. 6.

    Hanson, D.R.: C Interfaces and Implementations: Techniques for Creating Reusable Software. Addison-Wesley Longman Publishing, Boston (1996)

  7. 7.

    Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Flat combining and the synchronization-parallelism tradeoff. In: Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2010, pp. 355–364. ACM, New York, NY, USA (2010)

  8. 8.

    Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Scalable flat-combining based synchronous queues. In: Proceedings of the 24th International Symposium on Distributed Computing (DISC 2010), vol. 6343 of LNCS, pp. 79–93. Springer, Berlin, Heidelberg (2010)

  9. 9.

    Hendler, D., Shavit, N., Yerushalmi, L.: A scalable lock-free stack algorithm. J. Parallel Distrib. Comput. 70(1), 1–12 (2010). doi: 10.1016/j.jpdc.2009.08.011

    Google Scholar 

  10. 10.

    Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. (TOPLAS) 13, 124–149 (1991)

    Article  Google Scholar 

  11. 11.

    Herlihy, M.P., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. (TOPLAS) 12, 463–492 (1990)

    Article  Google Scholar 

  12. 12.

    Lea, D., Scherer, W.N. III, Scott, M.L.: java.util.concurrent. Exchanger source code. (2011)

  13. 13.

    Merritt, M., Taubenfeld, G.: Computing with infinitely many processes. In: Proceedings of the 14th International Symposium on Distributed Computing (DISC 2000), vol. 1914 of LNCS, pp. 164–178. Springer, Berlin, Heidelberg (2000)

  14. 14.

    Michael, M.M.: Hazard pointers: safe memory reclamation for lock-free objects. IEEE Trans. Parallel Distrib. Syst. 15(6), 491–504 (2004)

    Google Scholar 

  15. 15.

    Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In: Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing, PODC ’96, pp. 267–275. ACM, New York, NY, USA (1996)

  16. 16.

    Moir, M., Nussbaum, D., Shalev, O., Shavit, N.: Using elimination to implement scalable and lock-free fifo queues. In Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2005, pp. 253–262. ACM, New York, NY, USA (2005)

  17. 17.

    Scherer, W.N., III, Lea, D., Scott, M.L.: Scalable synchronous queues. In Proceedings of the 11th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2006, pp. 147–156. ACM, New York, NY, USA (2006)

  18. 18.

    Scherer, W.N. III, Lea, D., Scott, M.L.: A scalable elimination-based exchange channel. In: Workshop on Synchronization and Concurrency in Object-Oriented Languages (SCOOL 2005) October (2005)

  19. 19.

    Scherer, W.N. III, Scott, M.L.: Nonblocking concurrent data structures with condition synchronization. In: Proceedings of the 18th International Symposium on Distributed Computing (DISC 2004), vol. 3274 of LNCS, pp. 174–187. Springer, Berlin/Heidelberg (2004)

  20. 20.

    Shavit, N., Touitou, D.: Elimination trees and the construction of pools and stacks. Theory Comput. Syst. 30(6), 645–670 (1997). doi: 10.1007/s002240000072

  21. 21.

    Shavit, N., Zemach, A.: Diffracting trees. ACM Trans. Comput. Syst. (TOCS) 14, 385–428 (1996)

    Article  Google Scholar 

  22. 22.

    Shavit, N., Zemach, A.: Combining funnels: a dynamic approach to software combining. J. Parallel Distrib. Comput. 60(11), 1355–1387 (2000)

    Google Scholar 

  23. 23.

    Tang, L., Mars, J., Vachharajani, N., Hundt, R., Soffa, M.L.: The impact of memory subsystem resource sharing on datacenter applications. In: Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’11, ACM, New York, NY, USA (2011)

  24. 24.

    Treiber, R.K.: Systems programming: coping with parallelism. Technical Report RJ5118, IBM Almaden Research Center (2006)

Download references


We are grateful to Hillel Avni, Nir Shavit and the anonymous reviewers, whose comments and suggestions helped to considerably improve this paper.

Author information



Corresponding author

Correspondence to Yehuda Afek.

Additional information

This work was supported by the Israel Science Foundation under grant 1386/11 and by machine donations from Intel and Oracle. Adam Morrison is supported by an IBM PhD Fellowship.

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Afek, Y., Hakimi, M. & Morrison, A. Fast and scalable rendezvousing. Distrib. Comput. 26, 243–269 (2013).

Download citation


  • Ring Size
  • Private Location
  • False Match
  • Free Node
  • Thread Count