Abstract
Non-volatile random-access memory (NVRAM) technology is maturing rapidly, and its byte-level persistence makes it possible to design new and efficient fault tolerance mechanisms. In this paper we propose the versionized process (VerP), a new NVRAM-based process model that is natively non-volatile and fault tolerant. We introduce an intermediate software layer that lets a process run directly on NVRAM with all of its state stored in NVRAM, and then propose a mechanism that versionizes all process data. Each piece of process data carries a version number that is incremented whenever that piece is modified. These version numbers let us trace modifications to any data and restore it to a consistent state after a system crash. Compared with traditional checkpointing methods, VerP achieves fine-grained fault tolerance at very low cost.
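The abstract does not specify how version numbers are assigned or how recovery uses them. The Python sketch below illustrates one plausible scheme under explicit assumptions, not the authors' implementation: a hypothetical global epoch counter tags every write, each piece of data keeps its two newest versions, and a commit advances the last consistent epoch. After a crash, recovery discards any version newer than that epoch. The names `VersionedCell`, `VersionedProcess`, `store`, `load`, `commit`, and `recover` are hypothetical, and ordinary Python objects stand in for NVRAM.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedCell:
    """One piece of process data with a short history of (epoch, value) pairs.

    Hypothetical layout: the newest entry is last. In a real system these
    entries would live in NVRAM; here they are plain Python lists.
    """
    history: list = field(default_factory=list)

    def write(self, epoch, value):
        if self.history and self.history[-1][0] == epoch:
            # Repeated writes within one epoch overwrite the newest version.
            self.history[-1] = (epoch, value)
        else:
            # First write in a new epoch creates a new version.
            self.history.append((epoch, value))
        # Keep only the two newest versions: enough to roll back one epoch.
        del self.history[:-2]

    def read(self):
        return self.history[-1][1] if self.history else None

    def rollback(self, committed_epoch):
        # Discard every version written after the last consistent point.
        while self.history and self.history[-1][0] > committed_epoch:
            self.history.pop()


class VersionedProcess:
    """Toy model of process data versioned by a global epoch (assumption)."""

    def __init__(self):
        self.cells = {}
        self.committed_epoch = 0  # last epoch known to be consistent
        self.current_epoch = 1    # epoch that in-flight writes belong to

    def store(self, name, value):
        self.cells.setdefault(name, VersionedCell()).write(self.current_epoch, value)

    def load(self, name):
        cell = self.cells.get(name)
        return cell.read() if cell else None

    def commit(self):
        # Advancing the committed epoch marks all writes so far as consistent.
        # On real NVRAM this update would itself have to be persisted
        # atomically; that detail is not modeled here.
        self.committed_epoch = self.current_epoch
        self.current_epoch += 1

    def recover(self):
        # After a crash, drop every version newer than the committed epoch.
        for name in list(self.cells):
            cell = self.cells[name]
            cell.rollback(self.committed_epoch)
            if not cell.history:
                # The item was created after the last commit, so remove it.
                del self.cells[name]
        self.current_epoch = self.committed_epoch + 1


if __name__ == "__main__":
    p = VersionedProcess()
    p.store("x", 1)
    p.store("y", 2)
    p.commit()
    p.store("x", 10)  # modified after the commit...
    p.store("z", 3)   # ...and created after the commit
    p.recover()       # simulate restart after a crash before the next commit
    print(p.load("x"), p.load("y"), p.load("z"))  # prints: 1 2 None
```

Keeping only two versions per item bounds the space overhead. That is sufficient in this sketch because recovery never has to roll back more than one epoch. A finer-grained design could keep more history at the cost of additional NVRAM.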
Additional information
Project supported by the National High-Tech R&D Program (863) of China (Nos. 2012AA01A301, 2012AA010901, 2012AA010303, and 2015AA01A301), the Program for New Century Excellent Talents in University, the National Natural Science Foundation of China (Nos. 61272142, 61402492, 61402486, 61379146, and 61272483), the Laboratory Pre-research Fund (No. 9140C810106150C81001), and the Open Project of the State Key Laboratory of High-End Server & Storage Technology (No. 2014HSSA01)
Cite this article
Zhang, Wz., Lu, K. & Wang, Xp. Versionized process based on non-volatile random-access memory for fine-grained fault tolerance. Frontiers Inf Technol Electronic Eng 19, 192–205 (2018). https://doi.org/10.1631/FITEE.1601477