Validating Desktop Grid Results By Comparing Intermediate Checkpoints

  • Filipe Araujo
  • Patricio Domingues
  • Derrick Kondo
  • Luis Moura Silva

Abstract

We present a scheme, based on the comparison of intermediate checkpoints, that accelerates the detection of computing errors in bag-of-tasks applications executed on volunteer desktop grids. The current state of the art validates results through replicated task execution. Our method also uses replication, but instead of comparing results only at the end of the replicated computations, we validate ongoing executions by comparing checkpoints taken at intermediate execution points. This scheme significantly reduces the time to detect a computational error, which we show with both theoretical analysis and simulation results. In particular, we develop a model that gives the benefit of intermediate checkpointing as a function of checkpoint frequency and error rate, and we confirm this model with simulation experiments. We find that with an error rate of 5% and a checkpoint frequency of 20 checkpoints per task, the gain is as high as 35% compared to the case where error detection is done only at the end of task execution; for higher checkpoint frequencies or higher error rates, the benefits are even greater. In addition, when an erroneous computation is detected at an intermediate execution point, we propose immediately replacing that computation with a correct replica from another worker. In this way, useful execution and further validation can continue from that point onward instead of being delayed.
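
The core idea of comparing intermediate checkpoints can be illustrated with a minimal Python sketch. It is not the authors' implementation or their analytical model: the function names (`digest`, `first_divergence`, `detection_saving`), the use of SHA-256 digests, and the assumptions of exactly two replicas, equally spaced checkpoints, and an error that corrupts all later checkpoints are hypothetical choices made only for illustration.

```python
import hashlib
from typing import Optional, Sequence


def digest(checkpoint: bytes) -> str:
    """Return a short fingerprint of a checkpoint (assumption: SHA-256)."""
    return hashlib.sha256(checkpoint).hexdigest()


def first_divergence(replica_a: Sequence[bytes],
                     replica_b: Sequence[bytes]) -> Optional[int]:
    """Index of the first checkpoint whose digests differ, or None if all match."""
    for index, (a, b) in enumerate(zip(replica_a, replica_b)):
        if digest(a) != digest(b):
            return index
    return None


def detection_saving(num_checkpoints: int, divergence: Optional[int]) -> float:
    """Fraction of task time saved relative to comparing only final results.

    Assumes equally spaced checkpoints, with the last checkpoint at task end.
    """
    if divergence is None:
        return 0.0
    return 1.0 - (divergence + 1) / num_checkpoints


# Hypothetical task with 20 checkpoints; one replica goes wrong at index 7.
good = [f"state-{i}".encode() for i in range(20)]
bad = good[:7] + [b"corrupt-" + state for state in good[7:]]

where = first_divergence(good, bad)
saving = detection_saving(len(good), where)
print(where, round(saving, 2))
```

With only two replicas, a mismatch reveals that an error occurred but not which replica is faulty; deciding that would require a further replica or some other form of arbitration, which this sketch leaves out.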

Keywords

Desktop grid, error detection, checkpointing, redundancy

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Filipe Araujo (1)
  • Patricio Domingues (2)
  • Derrick Kondo (3)
  • Luis Moura Silva (1)

  1. CISUC, Department of Informatics Engineering, University of Coimbra, Portugal
  2. School of Technology and Management, Polytechnic Institute of Leiria, Portugal
  3. Laboratoire de Recherche en Informatique/INRIA Futurs, France