A Portable Lock-Free Bounded Queue

  • Peter Pirkelbauer
  • Reed Milewicz
  • Juan Felipe Gonzalez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10048)


Building efficient and portable lock-free containers is challenging because almost every CPU family implements a slightly different memory model and a different set of atomic read-modify-write operations. C++11 offers a memory model and operation abstractions that enable portable implementations of non-blocking algorithms. In this paper, we present a first scalable and portable lock-free bounded queue supporting multiple readers and multiple writers. Our design uses unique empty values to decouple writing an element from incrementing the tail during enqueue. Dequeue employs a helping scheme that delays helping in the regular case, thereby reducing contention on shared memory. We evaluate our implementation on architectures featuring weak and strong memory consistency models. Our comparison with known blocking and lock-free designs shows that the presented implementation scales well on architectures that implement a weak memory consistency model.
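
The enqueue idea summarized above (unique empty values that separate writing an element from advancing the tail) can be illustrated with a small C++11 sketch. This is not the authors' implementation. The class name EmptyMarkerQueue, the methods try_enqueue and try_dequeue, and the encoding of payloads and empty markers in a single 64-bit word are all assumptions made for illustration. To stay short, the sketch supports multiple producers but only a single consumer, so the paper's dequeue helping scheme is not shown. It also uses the default sequentially consistent ordering on every atomic operation and does not attempt to reproduce any weaker memory orderings.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only; not the data structure from the paper.
// Multiple producers, ONE consumer. Payloads are limited to 63 bits and
// position counters are assumed never to overflow.
template <std::size_t N>
class EmptyMarkerQueue {
  static_assert(N > 0, "capacity must be positive");

  // A slot holds either a payload (low bit 0) or an empty marker (low bit 1).
  // The marker encodes the global position it is reserved for, so every
  // empty value that ever appears in the buffer is unique.
  static std::uint64_t empty_marker(std::uint64_t pos) { return (pos << 1) | 1u; }
  static std::uint64_t encode(std::uint64_t v) { return v << 1; }
  static std::uint64_t decode(std::uint64_t raw) { return raw >> 1; }

  std::atomic<std::uint64_t> buf_[N];
  std::atomic<std::uint64_t> head_{0};
  std::atomic<std::uint64_t> tail_{0};

public:
  EmptyMarkerQueue() {
    for (std::size_t i = 0; i < N; ++i) buf_[i].store(empty_marker(i));
  }

  // Safe to call from any number of producer threads.
  bool try_enqueue(std::uint64_t v) {
    assert((v >> 63) == 0 && "payload must fit in 63 bits");
    for (;;) {
      // Read head before tail so that h <= t always holds.
      std::uint64_t h = head_.load();
      std::uint64_t t = tail_.load();
      if (t - h >= N) return false;  // queue appeared full

      // Step 1: write the element. This succeeds only if the slot still
      // holds the marker reserved for position t.
      std::uint64_t expected = empty_marker(t);
      bool written = buf_[t % N].compare_exchange_strong(expected, encode(v));

      // Step 2: advance the tail. If the write failed, another producer
      // already filled position t, so this CAS helps it along. A failure
      // here only means some other thread advanced the tail first.
      std::uint64_t cur = t;
      tail_.compare_exchange_strong(cur, t + 1);

      if (written) return true;
    }
  }

  // Must be called from a single consumer thread only.
  bool try_dequeue(std::uint64_t& out) {
    std::uint64_t h = head_.load();
    if (h == tail_.load()) return false;  // nothing published yet
    out = decode(buf_[h % N].load());
    // Recycle the slot with the marker for the position that maps here next.
    buf_[h % N].store(empty_marker(h + N));
    head_.store(h + 1);
    return true;
  }
};
```

Because each empty marker is tied to exactly one queue position, a producer holding a stale tail value cannot overwrite a slot that has since been filled or recycled: its compare-and-swap on the slot simply fails. For example, with an EmptyMarkerQueue<4>, four calls to try_enqueue succeed, a fifth reports the queue as full, and try_dequeue then returns the elements in insertion order.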


Keywords: Memory Location, Memory Model, Data Race, Tail Pointer, Memory Consistency Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Peter Pirkelbauer (1)
  • Reed Milewicz (1)
  • Juan Felipe Gonzalez (2)

  1. University of Alabama at Birmingham, Birmingham, USA
  2. Motorola Solutions, Birmingham, USA
