Abstract
Emerging persistent memory technologies, such as PCM and 3D XPoint, offer numerous advantages over DRAM, including higher density, larger capacity, and better energy efficiency. However, they also have drawbacks, e.g., slower access speed, limited write endurance, and asymmetric read/write latency. Persistent memory therefore presents both great opportunities and challenges for operating systems, and a large number of solutions have been proposed. With the increasing number and complexity of problems and approaches, we believe this is the right moment to investigate and analyze these works systematically.
To this end, we perform a comprehensive and in-depth study of operating system support for persistent memory in three steps. First, we present an overview of how to build an operating system on persistent memory from three perspectives: system abstraction, crash consistency, and system reliability. Then, we classify existing research into three categories: storage stack, memory manager, and OS-bypassing library. For each category, we summarize the major research topics and discuss them in depth. Specifically, we present the challenges and opportunities in each topic, describe the contributions and limitations of the proposed approaches, and compare these solutions along different dimensions. Finally, we envision the future operating system based on this study.
Acknowledgements
The authors would like to thank the anonymous reviewers for their helpful comments and suggestions. This work was supported by the Key Technology Improvement of Industrial Control System granted by the Ministry of Industry and Information.
Author information
Miao Cai received the BS degree from China University of Mining and Technology, China in 2014. He received the MS degree from Nanjing University, China in 2016. He is currently working towards the PhD degree in the Department of Computer Science and Technology at Nanjing University, China. His research interests include operating systems and memory/storage systems.
Hao Huang received the BS degree from Xiamen University, China in 1982 and the PhD degree from Nanjing University, China in 1999. He is now a professor in the Department of Computer Science and Technology at Nanjing University, China. His research interests include operating systems and system security.
Cite this article
Cai, M., Huang, H. A survey of operating system support for persistent memory. Front. Comput. Sci. 15, 154207 (2021). https://doi.org/10.1007/s11704-020-9395-3
DOI: https://doi.org/10.1007/s11704-020-9395-3