International Journal of Parallel Programming, Volume 43, Issue 4, pp 572–596

A Wait-Free Multi-Word Compare-and-Swap Operation



The number of cores in future multi-core systems is expected to increase 100-fold over the next decade. The fine-grained synchronization methods found in wait-free algorithm designs make them desirable for these future systems. Unfortunately, such designs are often inhibited by the limitations of portable atomic hardware primitives. Typically, these primitives can operate on only a single address at a time, while concurrent algorithms often need to operate on multiple addresses. To support such algorithms, we present a practical wait-free multi-word compare-and-swap (MCAS). The wait-free property ensures that each thread completes its operation in a finite number of steps, even if it is continuously interrupted. Our approach uses a progress assurance scheme that allows a blocked thread to announce that it is unable to make progress, so that other threads can help it complete. This differs from traditional lock-free helping techniques, where a thread will only help complete an operation that is in conflict with its own. Our design is practical in three respects: it is built from only portable atomic operations; it uses memory efficiently (it reserves only a single bit from each word, requires no explicit memory barriers, and needs only four words per address in the operation); and it has a wait-free progress guarantee. When tested in a high-contention scenario with 64 threads executing updates on a single multi-word object, our wait-free design performs on average 77.1% more operations than other practical approaches. Over all tested scenarios, our design performs on average 8.3% more operations.
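
To make the semantics of the operation concrete, the following is a minimal sketch of what a multi-word compare-and-swap must do: given a set of (address, expected value, new value) triples, it installs every new value only if every address still holds its expected value, and otherwise changes nothing. This is a blocking reference model that uses a global lock, not the wait-free algorithm presented in the paper; the names `McasEntry`, `mcas_model`, and `mcas_model_lock` are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <vector>

// One entry of a multi-word compare-and-swap: the word to update,
// the value it must currently hold, and the value to install.
// (Hypothetical type, not taken from the paper.)
struct McasEntry {
    std::atomic<std::uintptr_t>* address;
    std::uintptr_t expected;
    std::uintptr_t desired;
};

// Assumption: a single global lock stands in for the descriptor and
// helping machinery of a real non-blocking MCAS. This makes the model
// blocking, so it is NOT wait-free or even lock-free.
inline std::mutex& mcas_model_lock() {
    static std::mutex m;
    return m;
}

// Returns true and installs every desired value only if every address
// currently holds its expected value; otherwise leaves all words untouched.
// The all-or-nothing effect holds only with respect to other callers of
// mcas_model, because they all take the same lock.
bool mcas_model(const std::vector<McasEntry>& entries) {
    std::lock_guard<std::mutex> guard(mcas_model_lock());
    // Phase 1: compare every word against its expected value.
    for (const McasEntry& e : entries) {
        if (e.address->load() != e.expected) {
            return false;
        }
    }
    // Phase 2: all comparisons succeeded, so swap in every new value.
    for (const McasEntry& e : entries) {
        e.address->store(e.desired);
    }
    return true;
}
```

A caller would build a vector of entries over its shared words and retry with freshly read expected values whenever `mcas_model` returns false. A single-word primitive such as `std::atomic::compare_exchange_strong` provides the same compare-then-swap guarantee for one address only, which is the limitation the abstract refers to.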


Wait-free · Lock-free · Non-blocking · Concurrent · Multi-word compare-and-swap · MCAS · CAS



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of Central Florida, Orlando, USA
