SCR algorithm: Saving/restoring states of file systems

Wei, Xiaohui; Ju, Jiubin

doi:10.1007/BF02948877

Published: July 2000

Volume 15, pages 393–400, (2000)

Journal of Computer Science and Technology

Abstract

Fault tolerance is very important in cluster computing and has been implemented in many well-known cluster-computing systems using checkpoint/restart mechanisms. However, existing checkpointing algorithms cannot restore the state of a file system when a program is rolled back, so existing fault-tolerant systems place many restrictions on file access. This paper presents the SCR algorithm, which is based on atomic operations and consistent scheduling and can restore the state of a file system. In the SCR algorithm, system calls on file systems are classified as idempotent or non-idempotent operations: a non-idempotent operation modifies the file system's state, while an idempotent operation does not. The SCR algorithm tracks changes to the file system state by logging to disk each non-idempotent operation issued by a user program, together with the information needed to undo that operation. When the program is rolled back to a checkpoint, the SCR algorithm reverts the file system to its state at the time of the last checkpoint. With the SCR algorithm, users may use any file operation in their programs.
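
The logging idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, whose details are not given here. The `UndoLog` class, its method names, and the record formats are all hypothetical, and for brevity the log is kept in memory rather than on disk as the abstract describes. Operations that change file system state (called non-idempotent in the abstract) record enough information to be undone, while reads record nothing.

```python
import os
import shutil
import tempfile


class UndoLog:
    """Hypothetical undo log: records how to reverse each state-changing file operation."""

    def __init__(self, backup_dir):
        self.backup_dir = backup_dir  # where pre-images of modified files are kept
        self.entries = []             # undo records since the last checkpoint (in memory here)
        self._counter = 0

    def _save_preimage(self, path):
        """Copy the current contents of path aside and return the backup path."""
        self._counter += 1
        backup = os.path.join(self.backup_dir, f"pre_{self._counter}")
        shutil.copy2(path, backup)
        return backup

    def checkpoint(self):
        """After a new checkpoint, older undo records are no longer needed."""
        for kind, *args in self.entries:
            if kind == "restore":
                os.remove(args[1])  # discard the saved pre-image
        self.entries.clear()

    def write(self, path, data):
        """Overwrite path with data, first logging how to undo the change."""
        if os.path.exists(path):
            self.entries.append(("restore", path, self._save_preimage(path)))
        else:
            self.entries.append(("delete", path))
        with open(path, "w") as f:
            f.write(data)

    def remove(self, path):
        """Delete path, keeping a pre-image so it can be brought back."""
        self.entries.append(("restore", path, self._save_preimage(path)))
        os.remove(path)

    def rename(self, src, dst):
        """Rename src to dst (this sketch assumes dst does not already exist)."""
        self.entries.append(("rename", dst, src))
        os.rename(src, dst)

    def read(self, path):
        """Reading does not change file system state, so nothing is logged."""
        with open(path) as f:
            return f.read()

    def rollback(self):
        """Undo the logged operations in reverse order, back to the last checkpoint."""
        while self.entries:
            kind, *args = self.entries.pop()
            if kind == "restore":
                path, backup = args
                shutil.copy2(backup, path)
                os.remove(backup)
            elif kind == "delete":
                os.remove(args[0])
            elif kind == "rename":
                os.rename(args[0], args[1])


def demo():
    """Modify files after a checkpoint, roll back, and report the resulting state."""
    with tempfile.TemporaryDirectory() as root:
        backups = os.path.join(root, "backups")
        os.mkdir(backups)
        log = UndoLog(backups)
        a = os.path.join(root, "a.txt")
        b = os.path.join(root, "b.txt")
        log.write(a, "v1")
        log.checkpoint()                            # state to return to: a.txt == "v1"
        log.write(a, "v2")
        log.write(b, "new")
        log.rename(b, os.path.join(root, "c.txt"))
        log.remove(a)
        log.rollback()
        files = sorted(n for n in os.listdir(root) if n != "backups")
        return log.read(a), files
```

Calling `demo()` returns the contents of `a.txt` and the directory listing after rollback, which match the state at the checkpoint. A real system would also have to persist the log so that it survives a crash and intercept the program's system calls, neither of which this sketch attempts.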



Author information

Authors and Affiliations

Department of Computer Science, Jilin University, 130023, Changchun, P.R. China
Wei Xiaohui & Ju Jiubin


Additional information

Project supported by NNSFC under grant No.69673012.


About this article

Cite this article

Wei, X., Ju, J. SCR algorithm: Saving/restoring states of file systems. J. Comput. Sci. & Technol. 15, 393–400 (2000). https://doi.org/10.1007/BF02948877


Received: 12 October 1998
Revised: 01 September 1999
Issue Date: July 2000
DOI: https://doi.org/10.1007/BF02948877
