Using Cliques Of Nodes To Store Desktop Grid Checkpoints

Checkpoints that store intermediate results of computation have a fundamental impact on the computing throughput of Desktop Grid systems such as BOINC. Currently, BOINC workers store their checkpoints locally. A major limitation of this approach is that whenever a worker leaves a computation unfinished, no other worker can proceed from the last stable checkpoint. This forces tasks to be restarted from scratch when the original machine is no longer available.

To overcome this limitation, we propose to share checkpoints between nodes. To organize this mechanism, we arrange nodes to form complete graphs (cliques), where nodes share all the checkpoints they compute. Cliques function as survivable units, where checkpoints and tasks are not lost as long as one of the nodes of the clique remains alive. To simplify construction and maintenance of the cliques, we take advantage of the central supervisor of BOINC. To evaluate our solution, we combine simulation with some real data to answer the most fundamental question: what do we need to pay for increased throughput?
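The survivability property described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the `Node` and `Clique` classes, the integer "progress" value, and the helper functions are hypothetical names introduced here, and the loss estimate assumes that nodes fail independently with the same probability, which the abstract does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker that keeps a copy of every checkpoint in its clique."""
    name: str
    alive: bool = True
    checkpoints: dict = field(default_factory=dict)  # task_id -> latest progress


class Clique:
    """Hypothetical complete graph of nodes that share all their checkpoints."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def save_checkpoint(self, task_id, progress):
        # Replicate the checkpoint to every member that is still alive.
        for node in self.nodes:
            if node.alive:
                node.checkpoints[task_id] = progress

    def fail(self, name):
        # Mark a member as having left the system.
        for node in self.nodes:
            if node.name == name:
                node.alive = False

    def recover(self, task_id):
        # Any surviving member holding the checkpoint can resume the task.
        for node in self.nodes:
            if node.alive and task_id in node.checkpoints:
                return node.name, node.checkpoints[task_id]
        return None  # every member is gone: the checkpoint is lost


def loss_probability(p_fail, clique_size):
    # Assumption: independent failures, so all members must fail together.
    return p_fail ** clique_size


def transfers_per_checkpoint(clique_size):
    # Assumption: the computing node sends one copy to each other member.
    return clique_size - 1


clique = Clique(["a", "b", "c"])
clique.save_checkpoint("task-1", 40)
clique.fail("a")
clique.fail("b")
print(clique.recover("task-1"))  # the last surviving node can still resume
```

Under these assumptions, a larger clique lowers the chance of losing a checkpoint but raises the number of copies sent per checkpoint, which is one way to read the cost question the chapter poses.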

Copyright information

© 2008 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Araujo, F., Domingues, P., Kondo, D., Silva, L.M. (2008). Using Cliques Of Nodes To Store Desktop Grid Checkpoints. In: Gorlatch, S., Fragopoulou, P., Priol, T. (eds) Grid Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09457-1_3

  • DOI: https://doi.org/10.1007/978-0-387-09457-1_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-09456-4

  • Online ISBN: 978-0-387-09457-1

  • eBook Packages: Computer Science (R0)
