Abstract
Jobs in Grid workflows are exposed to different types of failure, so fault-tolerance mechanisms are needed to ensure a good level of reliability during the execution of Grid jobs. While checkpointing is the most common method of achieving fault tolerance, much work remains to improve its efficiency. This paper gives an overview of a solution for checkpointing parallel applications executed on multiple sites in the Grid environment. The mechanism improves on the PGRADE checkpointing solution.
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Sajadah, K., Terstyansky, G., Winter, S.C., Kacsuk, P. (2008). Checkpointing of Parallel Applications in a Grid Environment. In: Kacsuk, P., Lovas, R., Németh, Z. (eds) Distributed and Parallel Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79448-8_16
DOI: https://doi.org/10.1007/978-0-387-79448-8_16
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-79447-1
Online ISBN: 978-0-387-79448-8
eBook Packages: Computer Science, Computer Science (R0)