Static Worksharing Strategies for Heterogeneous Computers with Unrecoverable Failures
One has a large workload that is “divisible” (its constituent work’s granularity can be adjusted arbitrarily) and one has access to p remote computers that can assist in computing the workload. How can one best utilize the computers toward this end? Two features complicate this question. First, the remote computers may differ from one another in speed. Second, each remote computer is subject to interruptions of known likelihood that kill all work in progress on it. One wishes to orchestrate sharing the workload with the remote computers in a way that maximizes the expected amount of work completed, given the risk of interruptions. We consider three versions of the preceding problem. Two versions envision heterogeneous computing resources: the remote computers may differ from one another in speed; one version envisions homogeneous computing resources: the remote computers are identical. One of the heterogeneous versions ignores communication costs (i.e., assumes that they are negligible); the other two versions account explicitly for communication costs. We provide exact expressions for the optimal work expectation for all three versions of the problem. For the most general version (heterogeneous resources, with communication costs), we provide a recurrence for computing this expectation; for the other two versions, we provide closed-form expressions.
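To make the objective concrete, here is a minimal sketch of the expected-work calculation in the simplest setting: no communication costs and one chunk per computer. The abstract does not specify the interruption model, so the linear-risk assumption below (the probability of an interruption before time t is min(1, risk * t)), along with the names `expected_work`, `best_single_chunk`, `speeds`, and `risk`, is hypothetical and chosen only for illustration. It also assumes the workload is large enough that every computer can receive its preferred chunk.

```python
def expected_work(chunks, speeds, risk):
    """Expected completed work when computer i receives the single chunk chunks[i].

    Illustrative assumptions (not taken from the abstract):
      - computer i needs chunks[i] / speeds[i] time units to finish its chunk;
      - the probability that it is interrupted before time t is min(1, risk * t);
      - an interrupted computer loses all of its work in progress;
      - communication costs are ignored.
    """
    total = 0.0
    for w, s in zip(chunks, speeds):
        finish_time = w / s
        p_survive = max(0.0, 1.0 - risk * finish_time)
        total += w * p_survive  # the chunk counts only if no interruption occurs
    return total


def best_single_chunk(speed, risk):
    """Chunk size maximizing w * (1 - risk * w / speed) for one computer.

    Setting the derivative 1 - 2 * risk * w / speed to zero gives
    w = speed / (2 * risk).
    """
    return speed / (2.0 * risk)


if __name__ == "__main__":
    speeds = [1.0, 2.0]   # hypothetical heterogeneous speeds
    risk = 0.1            # hypothetical per-time-unit risk
    chunks = [best_single_chunk(s, risk) for s in speeds]
    print(chunks, expected_work(chunks, speeds, risk))
```

Under these assumptions a faster computer receives a proportionally larger chunk, because it finishes more work before the risk of interruption accumulates. Splitting each computer's share into several chunks, or charging for communication, changes the calculation and is not covered by this sketch.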