Static Worksharing Strategies for Heterogeneous Computers with Unrecoverable Failures

  • Anne Benoit
  • Yves Robert
  • Arnold Rosenberg
  • Frédéric Vivien
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6043)

Abstract

One has a large workload that is “divisible” (its constituent work’s granularity can be adjusted arbitrarily) and one has access to p remote computers that can assist in computing the workload. How can one best utilize the computers toward this end? Two features complicate this question. First, the remote computers may differ from one another in speed. Second, each remote computer is subject to interruptions of known likelihood that kill all work in progress on it. One wishes to orchestrate sharing the workload with the remote computers in a way that maximizes the expected amount of work completed, given the risk of interruptions. We consider three versions of the preceding problem. Two versions envision heterogeneous computing resources: the remote computers may differ from one another in speed; one version envisions homogeneous computing resources: the remote computers are identical. One of the heterogeneous versions ignores communication costs (i.e., assumes that they are negligible); the other two versions account explicitly for communication costs. We provide exact expressions for the optimal work expectation for all three versions of the problem. For the most general version (heterogeneous resources, with communication costs), we provide a recurrence for computing this expectation; for the other two versions, we provide closed-form expressions.
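The objective described above, the expected amount of work completed, can be illustrated with a minimal sketch. The abstract does not state the failure model, so everything below is an assumption made for illustration: a linear risk model in which a computer survives until time t with probability max(0, 1 - kappa * t), one chunk per computer, and no communication costs. The names `survival_probability`, `expected_work`, `speeds`, and `kappas` are hypothetical and the numbers are made up; this is not the authors' recurrence or closed-form result.

```python
def survival_probability(t, kappa):
    """Assumed linear risk model: chance the computer is still up at time t."""
    return max(0.0, 1.0 - kappa * t)


def expected_work(chunks, speeds, kappas):
    """Expected work completed when computer i gets one chunk of size chunks[i].

    A chunk counts only if its computer survives until the chunk finishes,
    since an interruption kills all work in progress on that computer.
    Communication costs are ignored in this sketch.
    """
    total = 0.0
    for w, s, kappa in zip(chunks, speeds, kappas):
        finish_time = w / s  # time needed to compute w units at speed s
        total += w * survival_probability(finish_time, kappa)
    return total


if __name__ == "__main__":
    speeds = [1.0, 2.0]   # hypothetical: second computer is twice as fast
    kappas = [0.1, 0.1]   # hypothetical: same risk rate on both computers
    even_split = expected_work([5.0, 5.0], speeds, kappas)
    skewed_split = expected_work([3.0, 7.0], speeds, kappas)
    print(even_split, skewed_split)
```

With these made-up numbers, sending more of the workload to the faster computer gives a larger expected value than an even split, which is the kind of trade-off between speed and risk that the problem asks one to optimize.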

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Anne Benoit (1, 4)
  • Yves Robert (1, 4)
  • Arnold Rosenberg (2)
  • Frédéric Vivien (3, 4)

  1. Ecole Normale Supérieure de Lyon, France
  2. Colorado State University, Fort Collins, USA
  3. INRIA, France
  4. LIP, UMR 5668 ENS-CNRS-INRIA-UCBL, Lyon, France
