Checkpointing of Parallel Applications in a Grid Environment

  • Chapter
Distributed and Parallel Systems

Abstract

Jobs in Grid workflows are exposed to different types of failure, so fault-tolerant mechanisms are needed to ensure a good level of reliability during the execution of Grid jobs. While checkpointing is the most common method of achieving fault tolerance, much work remains to improve its efficiency. This paper gives an overview of a solution for checkpointing parallel applications executed on multiple sites in the Grid environment. The mechanism is an improvement of the PGRADE checkpointing solution.

Copyright information

© 2008 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Sajadah, K., Terstyansky, G., Winter, S.C., Kacsuk, P. (2008). Checkpointing of Parallel Applications in a Grid Environment. In: Kacsuk, P., Lovas, R., Németh, Z. (eds) Distributed and Parallel Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79448-8_16

  • DOI: https://doi.org/10.1007/978-0-387-79448-8_16

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-79447-1

  • Online ISBN: 978-0-387-79448-8

  • eBook Packages: Computer Science (R0)
