MedusaVM: Decentralizing Virtual Memory System for Multithreaded Applications on Many-core

  • Miao Cai
  • Shenming Liu
  • Weiyong Yang
  • Hao Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11334)


The virtual memory (VM) system multiplexes a single physical memory among multiple running processes through two centralized resources: the virtual memory space and the page table hierarchy. For multithreaded applications running in a single address space, however, this centralized VM design encounters severe scalability bottlenecks and significantly limits application speedup on many-core systems. This paper proposes MedusaVM, a novel VM system designed to scale to many cores. To this end, MedusaVM partitions the global virtual memory space and the page table tree in a memory-efficient way, eliminating performance interference and lock contention between cores, while still providing the traditional shared-memory interface to multithreaded applications.

Our prototype is implemented on Linux kernel 4.4.0 and glibc 2.23. Experimental results on a 32-core machine demonstrate that MedusaVM scales much better than the Linux kernel and uses 22× less memory than the state-of-the-art approach. On microbenchmarks, MedusaVM achieves nearly linear speedup. On the multithreaded applications Metis and Psearchy, it also outperforms the Linux kernel by up to 2.5×.


Keywords: Virtual memory system · Multithreaded application · Many-core system



This work was supported by the Ministry of Industry and Information Technology under the 2017 project on key technology improvement of industrial control systems.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. Department of Computer Science and Technology, Nanjing University, Nanjing, China
