Versionized process based on non-volatile random-access memory for fine-grained fault tolerance

Frontiers of Information Technology & Electronic Engineering

Abstract

Non-volatile random-access memory (NVRAM) technology is maturing rapidly and its byte-persistence feature allows the design of new and efficient fault tolerance mechanisms. In this paper we propose the versionized process (VerP), a new process model based on NVRAM that is natively non-volatile and fault tolerant. We introduce an intermediate software layer that allows us to run a process directly on NVRAM and to put all the process states into NVRAM, and then propose a mechanism to versionize all the process data. Each piece of the process data is given a special version number, which increases with the modification of that piece of data. The version number can effectively help us trace the modification of any data and recover it to a consistent state after a system crash. Compared with traditional checkpoint methods, our work can achieve fine-grained fault tolerance at very little cost.
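
The version-number idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the VerP implementation: the class `VersionedStore`, its methods, the single global counter, and the explicit `commit` step are all hypothetical, and an ordinary Python dictionary stands in for NVRAM. It shows how a version number that grows with every modification lets a recovery routine identify and discard changes made after the last consistent point.

```python
class VersionedStore:
    """Toy stand-in for versioned process data (hypothetical, not real NVRAM)."""

    def __init__(self):
        self.clock = 0       # assumed global counter that supplies version numbers
        self.committed = 0   # highest version known to belong to a consistent state
        self.cells = {}      # name -> list of (version, value), oldest first

    def write(self, name, value):
        # Every modification gets a strictly larger version number.
        self.clock += 1
        self.cells.setdefault(name, []).append((self.clock, value))
        return self.clock

    def read(self, name):
        # The newest entry holds the current value.
        return self.cells[name][-1][1]

    def version(self, name):
        return self.cells[name][-1][0]

    def commit(self):
        # Mark everything written so far as consistent.
        self.committed = self.clock

    def recover(self):
        # After a simulated crash, drop every entry newer than the last commit.
        for name in list(self.cells):
            kept = [(v, x) for v, x in self.cells[name] if v <= self.committed]
            if kept:
                self.cells[name] = kept
            else:
                del self.cells[name]
        self.clock = self.committed


store = VersionedStore()
store.write("balance", 100)
store.write("count", 1)
store.commit()
store.write("balance", 40)   # modification made after the last consistent point
store.recover()              # simulated restart after a crash
print(store.read("balance"), store.read("count"))  # prints "100 1"
```

Keeping older versions next to the newest one is what makes the rollback possible in this sketch; a real design would also need to bound how much history is retained.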

Author information

Corresponding author

Correspondence to Kai Lu.

Additional information

Project supported by the National High-Tech R&D Program (863) of China (Nos. 2012AA01A301, 2012AA010901, 2012AA010303, and 2015AA01A301), the Program for New Century Excellent Talents in University, the National Natural Science Foundation of China (Nos. 61272142, 61402492, 61402486, 61379146, and 61272483), the Laboratory Pre-research Fund (No. 9140C810106150C81001), and the Open Project of the State Key Laboratory of High-End Server & Storage Technology (No. 2014HSSA01).

About this article

Cite this article

Zhang, Wz., Lu, K. & Wang, Xp. Versionized process based on non-volatile random-access memory for fine-grained fault tolerance. Frontiers Inf Technol Electronic Eng 19, 192–205 (2018). https://doi.org/10.1631/FITEE.1601477

  • DOI: https://doi.org/10.1631/FITEE.1601477
