Evaluation of Task Assignment Policies for Supercomputing Servers: The Case for Load Unbalancing and Fairness
Although the massively parallel processor (MPP) is still the most common architecture in supercomputer centers today, a simpler and cheaper machine configuration is appearing at many supercomputing sites. This alternative setup can be described as a collection of multiprocessors, or a distributed server system, fed by a single common stream of jobs; each job is dispatched to exactly one of the multiprocessor machines for processing.
The biggest question that arises in such distributed server systems is what constitutes a good rule for assigning jobs to host machines, i.e., a good task assignment policy. Many task assignment policies have been proposed, but they have not been systematically evaluated under supercomputing workloads.
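To make the setting concrete, here is a minimal Python sketch of a single job stream dispatched to several hosts by a pluggable policy and replayed from a trace. It works under stated assumptions. The trace format `(arrival_time, size)`, the three example policies (Random, Round-Robin, Least-Work-Remaining) and all function names are illustrative assumptions, not the paper's code. Least-Work-Remaining in particular assumes the dispatcher knows each job's size. Each host is assumed to run one job at a time in first-come-first-served (FCFS) order, and slowdown is taken to be response time divided by job size.

```python
import random
from typing import Callable, List, Sequence, Tuple

Job = Tuple[float, float]  # hypothetical trace record: (arrival_time, size)

# A policy maps (job index, arrival time, per-host "free at" times) to a host index.
Policy = Callable[[int, float, List[float]], int]


def random_policy(rng: random.Random) -> Policy:
    """Send each job to a host chosen uniformly at random."""
    return lambda job_index, arrival, free_at: rng.randrange(len(free_at))


def round_robin_policy(job_index: int, arrival: float, free_at: List[float]) -> int:
    """Send job i to host i mod K."""
    return job_index % len(free_at)


def least_work_remaining_policy(job_index: int, arrival: float, free_at: List[float]) -> int:
    """Send the job to the host with the least unfinished work at its arrival (sizes assumed known)."""
    return min(range(len(free_at)), key=lambda h: max(0.0, free_at[h] - arrival))


def simulate(trace: Sequence[Job], num_hosts: int, policy: Policy) -> float:
    """Replay a trace through FCFS hosts and return the mean slowdown."""
    free_at = [0.0] * num_hosts  # time at which each host finishes all work queued so far
    total_slowdown = 0.0
    for i, (arrival, size) in enumerate(trace):
        h = policy(i, arrival, free_at)
        start = max(free_at[h], arrival)  # FCFS: wait behind earlier jobs on host h
        free_at[h] = start + size
        response = free_at[h] - arrival   # waiting time plus service time
        total_slowdown += response / size
    return total_slowdown / len(trace)
```

With the four-job toy trace `[(0.0, 4.0), (1.0, 1.0), (2.0, 1.0), (3.0, 2.0)]` on two hosts, Round-Robin gives a mean slowdown of 1.5 while Least-Work-Remaining gives 1.0, because the latter steers the short jobs away from the host busy with the long one.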
In this paper we begin by comparing existing task assignment policies using trace-driven simulation under supercomputing workloads. We validate our experiments with analytical proofs of the performance of each of these policies; these proofs also provide considerable intuition. We find that although the performance of supercomputing servers varies widely with the task assignment policy, none of these policies performs as well as we would like.
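As an illustration of the kind of analysis involved, here is a sketch rather than the paper's derivation. Assume Poisson arrivals at rate λ with job sizes independent of arrivals. Random assignment then splits the stream into K Poisson streams of rate λ_h = λ/K, so each host behaves as an M/G/1 FCFS queue. Its mean waiting time follows the Pollaczek-Khinchine formula E[W] = λ_h E[X^2] / (2(1 - ρ_h)), with ρ_h = λ_h E[X]. Under FCFS a job's wait is independent of its own size, so the mean slowdown is 1 + E[W]·E[1/X]. The function name and the use of an empirical sample of job sizes are assumptions.

```python
from statistics import fmean
from typing import Sequence


def random_assignment_slowdown(sizes: Sequence[float], arrival_rate: float, num_hosts: int) -> float:
    """Mean slowdown under Random assignment, assuming Poisson arrivals.

    Each host sees a Poisson stream of rate arrival_rate / num_hosts and is
    modelled as an M/G/1 FCFS queue whose job-size distribution is the
    empirical distribution of `sizes`.
    """
    lam = arrival_rate / num_hosts          # per-host arrival rate
    ex = fmean(sizes)                       # E[X]
    ex2 = fmean(x * x for x in sizes)       # E[X^2]
    e_inv = fmean(1.0 / x for x in sizes)   # E[1/X]
    rho = lam * ex                          # per-host load
    if rho >= 1.0:
        raise ValueError("per-host load must be below 1 for a stable queue")
    mean_wait = lam * ex2 / (2.0 * (1.0 - rho))  # Pollaczek-Khinchine mean wait
    # Wait is independent of the job's own size under FCFS, so
    # E[(W + X) / X] = 1 + E[W] * E[1/X].
    return 1.0 + mean_wait * e_inv
```

For deterministic unit-size jobs at total rate 1.0 on two hosts (per-host load 0.5), this returns a mean slowdown of 1.5, matching the M/D/1 result.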
We observe that all policies proposed thus far aim to balance load among the hosts. We propose a policy that purposely unbalances load among the hosts yet, counterintuitively, is also fair: it achieves the same expected slowdown for all jobs, so no jobs are biased against. We again evaluate this policy using both trace-driven simulation and analysis, and find that the load-unbalancing policy performs significantly better than the best of the load-balancing policies.
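The abstract does not say how load is unbalanced. One possible realization is assumed here for illustration: size-interval assignment on two hosts. Jobs no larger than a cutoff go to host 0 and larger jobs to host 1. The cutoff is chosen so that the two hosts' mean slowdowns are as close as possible, an assumed interpretation of "the same expected slowdown for all jobs". The sketch further assumes that job sizes are known at dispatch and that arrivals are Poisson. Each FCFS host is analysed with the Pollaczek-Khinchine formula. `host_slowdown` and `fair_cutoff` are hypothetical names.

```python
from statistics import fmean
from typing import Optional, Sequence, Tuple


def host_slowdown(sizes: Sequence[float], rate: float) -> Optional[float]:
    """Mean slowdown of an M/G/1 FCFS host (Pollaczek-Khinchine), or None if unstable."""
    ex = fmean(sizes)
    ex2 = fmean(x * x for x in sizes)
    rho = rate * ex
    if rho >= 1.0:
        return None
    mean_wait = rate * ex2 / (2.0 * (1.0 - rho))
    return 1.0 + mean_wait * fmean(1.0 / x for x in sizes)


def fair_cutoff(sizes: Sequence[float], arrival_rate: float) -> Tuple[float, float, float]:
    """Choose a two-host size cutoff that makes the hosts' mean slowdowns as equal as possible.

    Jobs of size <= cutoff go to host 0, larger jobs to host 1.
    Returns (cutoff, load on host 0, load on host 1).
    """
    ordered = sorted(sizes)
    n = len(ordered)
    best = None
    for k in range(1, n):  # host 0 gets ordered[:k], host 1 gets ordered[k:]
        if ordered[k - 1] == ordered[k]:
            continue  # a size cutoff cannot split jobs of equal size
        small, large = ordered[:k], ordered[k:]
        # Splitting a Poisson stream by i.i.d. job size yields Poisson substreams.
        rate_small = arrival_rate * k / n
        rate_large = arrival_rate * (n - k) / n
        s_small = host_slowdown(small, rate_small)
        s_large = host_slowdown(large, rate_large)
        if s_small is None or s_large is None:
            continue  # skip cutoffs that overload either host
        gap = abs(s_small - s_large)
        if best is None or gap < best[0]:
            loads = (rate_small * fmean(small), rate_large * fmean(large))
            best = (gap, ordered[k - 1], loads)
    if best is None:
        raise ValueError("no cutoff keeps both hosts stable")
    _, cutoff, (load0, load1) = best
    return cutoff, load0, load1
```

On a toy trace of 80 jobs of size 1, 15 of size 10 and 5 of size 100 at total rate 0.1, `fair_cutoff` returns a cutoff of 10.0. That places a load of about 0.23 on host 0 and about 0.5 on host 1. The loads are deliberately unequal, while the hosts' mean slowdowns (about 1.88 and 1.5) are the closest this trace allows.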