
SCR algorithm: Saving/restoring states of file systems

Journal of Computer Science and Technology

Abstract

Fault tolerance is important in cluster computing and has been implemented in many well-known cluster-computing systems using checkpoint/restart mechanisms. However, existing checkpointing algorithms cannot restore the state of a file system when a program is rolled back, so existing fault-tolerant systems place many restrictions on file access. This paper presents the SCR algorithm, which is based on atomic operations and a consistent schedule and can restore the state of a file system. In the SCR algorithm, file system calls are classified into idempotent and non-idempotent operations: a non-idempotent operation modifies the file system's state, while an idempotent operation does not. The SCR algorithm tracks changes to the file system state. It logs to disk each non-idempotent operation issued by a user program, together with the information needed to undo that operation. When a program is rolled back to a checkpoint, the SCR algorithm reverts the file system to its state at the time of the last checkpoint. With the SCR algorithm, users may use any file operation in their programs.
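The logging-and-rollback idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `UndoLog`, its method names, and the record layout are hypothetical, the log is held in memory rather than on disk, and the atomic-operation and scheduling aspects of SCR are not modeled. It only shows how recording undo information for each state-modifying operation lets the file system be reverted to its state at the last checkpoint.

```python
import os
import tempfile


class UndoLog:
    """Hypothetical undo log for state-modifying file operations."""

    def __init__(self):
        # Undo records collected since the last checkpoint.
        self.entries = []

    def checkpoint(self):
        # After a checkpoint, earlier undo records are no longer needed.
        self.entries.clear()

    def write_file(self, path, data):
        # Record the previous contents (None if the file did not exist).
        old = None
        if os.path.exists(path):
            with open(path, "rb") as f:
                old = f.read()
        self.entries.append(("restore", path, old))
        with open(path, "wb") as f:
            f.write(data)

    def rename(self, src, dst):
        # Record the inverse rename (this sketch assumes dst does not exist).
        self.entries.append(("rename", dst, src))
        os.rename(src, dst)

    def rollback(self):
        # Undo the logged operations in reverse order.
        while self.entries:
            kind, a, b = self.entries.pop()
            if kind == "restore":
                if b is None:
                    if os.path.exists(a):
                        os.remove(a)
                else:
                    with open(a, "wb") as f:
                        f.write(b)
            elif kind == "rename":
                os.rename(a, b)


def demo():
    with tempfile.TemporaryDirectory() as d:
        log = UndoLog()
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        log.write_file(a, b"v1")
        log.checkpoint()           # file system state to return to
        log.write_file(a, b"v2")   # changes made after the checkpoint
        log.rename(a, b)
        log.write_file(a, b"new")
        log.rollback()             # revert to the checkpointed state
        with open(a, "rb") as f:
            contents = f.read()
        return contents, os.path.exists(b)
```

Calling `demo()` writes a file, takes a checkpoint, applies three further changes, and then rolls back, after which `a.txt` again holds its checkpointed contents and `b.txt` no longer exists. Read-only calls need no log entry in this sketch, since they leave the file system state unchanged.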




Additional information

Project supported by NNSFC under grant No. 69673012.


About this article

Cite this article

Wei, X., Ju, J. SCR algorithm: Saving/restoring states of file systems. J. Comput. Sci. & Technol. 15, 393–400 (2000). https://doi.org/10.1007/BF02948877


  • DOI: https://doi.org/10.1007/BF02948877
