NBmalloc: Allocating Memory in a Lock-Free Manner


Efficient, scalable memory allocation for multithreaded applications on multiprocessors is a significant goal of recent research. In the distributed computing literature it has been emphasized that lock-based synchronization and concurrency-control may limit the parallelism in multiprocessor systems. Thus, system services that employ such methods can hinder reaching the full potential of these systems. A natural research question is the pertinence and the impact of lock-free concurrency control in key services for multiprocessors, such as in the memory allocation service, which is the theme of this work. We show the design and implementation of NBmalloc, a lock-free memory allocator designed to enhance the parallelism in the system. The architecture of NBmalloc is inspired by Hoard, a well-known concurrent memory allocator, with modular design that preserves scalability and helps avoiding false-sharing and heap-blowup. Within our effort to design appropriate lock-free algorithms for NBmalloc, we propose and show a lock-free implementation of a new data structure, flat-set, supporting conventional “internal” set operations as well as “inter-object” operations, for moving items between flat-sets. The design of NBmalloc also involved a series of other algorithmic problems, which are discussed in the paper. Further, we present the implementation of NBmalloc and a study of its behaviour in a set of multiprocessor systems. The results show that the good properties of Hoard w.r.t. false-sharing and heap-blowup are preserved.


  1. 1.

    Gidenstam, A., Papatriantafilou, M., Tsigas, P.: Allocating memory in a lock-free manner. In: Proc. of the 13th Annual European Symp. on Algorithms (ESA’05). LNCS, vol. 3669, pp. 329–342. Springer, Berlin (2005)

    Google Scholar 

  2. 2.

    Berger, E.D.: Memory management for high-performance applications. Ph.D. Thesis, The University of Texas at Austin, Department of Computer Sciences (2002)

  3. 3.

    Berger, E., McKinley, K., Blumofe, R., Wilson, P.: Hoard: A scalable memory allocator for multithreaded applications. In: ASPLOS-IX: 9th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 117–128 (2000)

  4. 4.

    Michael, M.: Scalable lock-free dynamic memory allocation. In: Proc. of SIGPLAN 2004 Conf. on Programming Languages Design and Implementation. ACM SIGPLAN Notices. ACM, New York (2004)

    Google Scholar 

  5. 5.

    Dice, D., Garthwaite, A.: Mostly lock-free malloc. In: ISMM’02 Proc. of the 3rd Int. Symp. on Memory Management. ACM SIGPLAN Notices, pp. 163–174. ACM, New York (2002)

    Google Scholar 

  6. 6.

    Michael, M.M.: Safe memory reclamation for dynamic lock-free objects using atomic reads and writes. In: Proceedings of the 21st ACM Symposium on Principles of Distributed Computing, pp. 21–30. ACM, New York (2002)

    Google Scholar 

  7. 7.

    Michael, M.M.: Hazard pointers: Safe memory reclamation for lock-free objects. IEEE Trans. Parallel Distrib. Syst. 15, 491–504 (2004)

    Article  Google Scholar 

  8. 8.

    Herlihy, M., Luchangco, V., Moir, M.: The repeat offender problem: A mechanism for supporting dynamic-sized, lock-free data structures. In: Proceedings of the 16th International Symposium on Distributed Computing (DISC’02). LNCS, vol. 2508, pp. 339–353. Springer, Berlin (2002)

    Google Scholar 

  9. 9.

    Valois, J.D.: Lock-free data structures. Ph.D. Thesis, Rensselaer Polytechnic Institute, Department of Computer Science, Troy, New York (1995)

  10. 10.

    Michael, M.M., Scott, M.L.: Correction of a memory management method for lock-free data structures. Technical Report TR599, University of Rochester, Computer Science Department (1995)

  11. 11.

    Detlefs, D.L., Martin, P.A., Moir, M., Guy L. Steele, J.: Lock-free reference counting. In: Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing, pp. 190–199. ACM, New York (2001)

  12. 12.

    Herlihy, M., Luchangco, V., Martin, P., Moir, M.: Nonblocking memory management support for dynamic-sized data structures. ACM Trans. Comput. Syst. 23, 146–196 (2005)

    Article  Google Scholar 

  13. 13.

    Gidenstam, A., Papatriantafilou, M., Sundell, H., Tsigas, P.: Practical and efficient lock-free garbage collection based on reference counting. In: Proc. of the 8th International Symp. on Parallel Architectures, Algorithms, and Networks (I-SPAN), pp. 202–207. IEEE Comput. Soc., Los Alamitos (2005)

    Google Scholar 

  14. 14.

    Schneider, S., Antonopoulos, C., Nikolopoulos, D.: Scalabel locality-conscious multithreaded memory allocation. In: Proceedings of the 2006 International Symposium on Memory Management (ISMM’06), pp. 84–94. ACM, New York (2006)

    Google Scholar 

  15. 15.

    Massalin, H., Pu, C.: A lock-free multiprocessor OS kernel. Technical Report CUCS-005-91 (1991)

  16. 16.

    Massalin, H.: Synthesis: an efficient implementation of fundamental operating system services. Ph.D. Thesis, Columbia University (1992)

  17. 17.

    Greenwald, M., Cheriton, D.R.: The synergy between non-blocking synchronization and operating system structure. In: Operating Systems Design and Implementation, pp. 123–136 (1996)

  18. 18.

    Greenwald, M.B.: Non-blocking synchronization and system design. Ph.D. Thesis, Stanford University (1999)

  19. 19.

    SGI: The standard template library for C++ (2003). http://www.sgi.com/tech/stl/Allocators.html

  20. 20.

    Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. Assoc. Comput. Mach. 46, 720–748 (1999)

    MATH  MathSciNet  Google Scholar 

  21. 21.

    Gloger, W.: Wolfram Gloger’s malloc homepage (2003). http://www.malloc.de/en/

  22. 22.

    Steensgaard, B.: Thread-specific heaps for multi-threaded programs. In: ISMM 2000 Proc. of the Second Int. Symp. on Memory Management. ACM SIGPLAN Notices, vol. 36(1). ACM, New York (2000)

    Google Scholar 

  23. 23.

    Herlihy, M.P., Wing, J.M.: Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12, 463–492 (1990)

    Article  Google Scholar 

  24. 24.

    Barnes, G.: A method for implementing lock-free shared data structures. In: Proc. of the 5th Annual ACM Symp. on Parallel Algorithms and Architectures, SIGACT and SIGARCH, pp. 261–270 (1993). Extended abstract

  25. 25.

    Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Syst. 11, 124–149 (1991)

    Article  Google Scholar 

  26. 26.

    Rinard, M.C.: Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives. ACM Trans. Comput. Syst. 17, 337–371 (1999)

    Article  Google Scholar 

  27. 27.

    Valois, J.D.: Lock-free linked lists using compare-and-swap. In: Proc. of the 14th Annual ACM Symp. on Principles of Distributed Computing (PODC ’95), pp. 214–222. ACM, New York (1995)

    Google Scholar 

  28. 28.

    Tsigas, P., Zhang, Y.: A simple, fast and scalable non-blocking concurrent fifo queue for shared memory multiprocessor systems. In: Proc. of the 13th Annual ACM Symp. on Parallel Algorithms and Architectures, pp. 134–143. ACM, New York (2001)

    Google Scholar 

  29. 29.

    Harris, T.L.: A pragmatic implementation of non-blocking linked lists. In: Proc. of the 15th Int. Conf. on Distributed Computing, pp. 300–314. Springer, Berlin (2001)

    Google Scholar 

  30. 30.

    Hoepman, J.H., Papatriantafilou, M., Tsigas, P.: Self-stabilization of wait-free shared memory objects. J. Parallel Distrib. Comput. 62, 766–791 (2002)

    Article  Google Scholar 

  31. 31.

    Moir, M.: Practical implementations of non-blocking synchronization primitives. In: Proc. of the 16th Annual ACM Symp. on Principles of Distributed Computing, pp. 219–228 (1997)

  32. 32.

    Papatriantafilou, M., Tsigas, P.: Wait-free consensus in “in-phase” multiprocessor systems. In: Symp. on Parallel and Distributed Processing (SPDP ’95), pp. 312–319. IEEE Comput. Soc., Los Alamitos (1995)

    Google Scholar 

  33. 33.

    Papatriantafilou, M., Tsigas, P.: On self-stabilizing wait-free clock synchronization. Parallel Process. Lett. 7, 321–328 (1997)

    Article  MathSciNet  Google Scholar 

  34. 34.

    Sundell, H., Tsigas, P.: Fast and lock-free concurrent priority queues for multi-thread systems. In: Proc. of the 17th IEEE/ACM Int. Parallel and Distributed Processing Symp. (IPDPS 03). IEEE Press, New York (2003)

    Google Scholar 

  35. 35.

    Valois, J.D.: Implementing lock-free queues. In: Proc. of the Seventh Int. Conf. on Parallel and Distributed Computing Systems, pp. 64–69 (1994)

  36. 36.

    Michael, M.M.: High performance dynamic lock-free hash tables and list-based sets. In: Proc. of the 14th Annual ACM Symp. on Parallel Algorithms and Architectures (SPAA-02), pp. 73–82. ACM, New York (2002)

    Google Scholar 

  37. 37.

    Jayanti, P.: A complete and constant time wait-free implementation of CAS from LL/SC and vice versa. In: Proceedings of the 12th International Symposium on Distributed Computing (DISC ’98). Lecture Notes in Computer Science, vol. 1499, pp. 216–230. Springer, Berlin (1998)

    Google Scholar 

  38. 38.

    IBM: IBM System/370 Extended Architecture, Principles of Operation (1983). Publication No. SA22-7085

  39. 39.

    Larson, P.P.Å., Krishnan, M.: Memory allocation for long-running server applications. In: ISMM’98 Proc. of the 1st Int. Symp. on Memory Management. ACM SIGPLAN Notices, pp. 176–185. ACM, New York (1998)

    Google Scholar 

  40. 40.

    Dechev, D., Pirkelbauer, P., Stroustrup, B.: Lock-free dynamically resizable arrays. In: Proc. of the 10th Int. Conf. on Principles of Distributed Systems OPODIS’06. LNCS, vol. 4305, pp. 142–156. Springer, Berlin (2006)

    Google Scholar 

Download references

Author information



Corresponding author

Correspondence to Anders Gidenstam.

Additional information

This work was supported by computational resources provided by the Swedish National Supercomputer Centre (NSC). This paper is an extended version of [1].

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Reprints and Permissions

About this article

Cite this article

Gidenstam, A., Papatriantafilou, M. & Tsigas, P. NBmalloc: Allocating Memory in a Lock-Free Manner. Algorithmica 58, 304–338 (2010). https://doi.org/10.1007/s00453-008-9268-x

Download citation


  • Memory allocation
  • Lock-free synchronization
  • Non-blocking synchronization
  • Multithreaded applications