Cluster Computing

Volume 7, Issue 2, pp 151–161

Evaluation of Task Assignment Policies for Supercomputing Servers: The Case for Load Unbalancing and Fairness

  • Bianca Schroeder
  • Mor Harchol-Balter

Abstract

While the MPP is still the most common architecture in supercomputer centers today, a simpler and cheaper machine configuration is appearing at many supercomputing sites. This alternative setup may be described simply as a collection of multiprocessors or a distributed server system. This collection of multiprocessors is fed by a single common stream of jobs, where each job is dispatched to exactly one of the multiprocessor machines for processing.

The central question in such distributed server systems is how to assign jobs to host machines, i.e., what makes a good task assignment policy. Many task assignment policies have been proposed, but they have not been systematically evaluated under supercomputing workloads.

In this paper we start by comparing existing task assignment policies using a trace-driven simulation under supercomputing workloads. We validate our experiments by providing analytical proofs of the performance of each of these policies. These proofs also provide intuition. We find that while the performance of supercomputing servers varies widely with the task assignment policy, none of these task assignment policies performs as well as we would like.

We observe that all policies proposed thus far aim to balance load among the hosts. We propose a policy which purposely unbalances load among the hosts yet, counterintuitively, is also fair in that it achieves the same expected slowdown for all jobs; thus no jobs are biased against. We again evaluate this policy using both trace-driven simulation and analysis. We find that the load unbalancing policy performs significantly better than the best of the policies that balance load.
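The abstract does not spell out the policies themselves, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes two first-come-first-served hosts, measures slowdown as response time divided by job size, and compares a load-balancing baseline (least work left) against a hypothetical size-based cutoff rule that sends short jobs to one host and long jobs to the other. The synthetic Poisson/Pareto workload, the function names, and the cutoff values are all assumptions made for illustration; the paper itself uses supercomputing traces.

```python
import random

def simulate(jobs, assign, n_hosts=2):
    """Run jobs through FCFS hosts and return each job's slowdown.

    jobs   -- list of (arrival_time, size), sorted by arrival time
    assign -- function (size, backlog) -> host index, where backlog[i]
              is the unfinished work at host i when the job arrives
    """
    free_at = [0.0] * n_hosts  # time at which each host drains its queue
    slowdowns = []
    for arrival, size in jobs:
        backlog = [max(0.0, t - arrival) for t in free_at]
        host = assign(size, backlog)
        start = max(arrival, free_at[host])
        free_at[host] = start + size
        slowdowns.append((free_at[host] - arrival) / size)  # response / size
    return slowdowns

def least_work_left(size, backlog):
    """Load-balancing baseline: pick the host with the least unfinished work."""
    return backlog.index(min(backlog))

def size_cutoff(cutoff):
    """Hypothetical size-based rule: short jobs to host 0, long jobs to host 1.
    Moving the cutoff changes how the total load is split between hosts."""
    def assign(size, backlog):
        return 0 if size <= cutoff else 1
    return assign

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs) if xs else float("nan")

def make_jobs(n, rate, rng):
    """Synthetic workload (an assumption): Poisson arrivals, Pareto(1.5) sizes."""
    t, jobs = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        jobs.append((t, rng.paretovariate(1.5)))
    return jobs

def demo():
    rng = random.Random(0)
    jobs = make_jobs(20000, 0.4, rng)
    print("least work left:", mean(simulate(jobs, least_work_left)))
    # For Pareto(1.5) sizes with minimum 1, a cutoff of 4.0 splits the
    # expected work evenly; 2.0 deliberately underloads the short-job host.
    for cutoff in (2.0, 4.0, 8.0):
        sd = simulate(jobs, size_cutoff(cutoff))
        short = mean(s for (_, size), s in zip(jobs, sd) if size <= cutoff)
        long_ = mean(s for (_, size), s in zip(jobs, sd) if size > cutoff)
        print(f"cutoff {cutoff}: overall {mean(sd):.2f}, "
              f"short {short:.2f}, long {long_:.2f}")

if __name__ == "__main__":
    demo()
```

Printing the mean slowdown separately for short and long jobs is one way to see the fairness notion described above: a cutoff is fair in that sense when the two class means roughly coincide. Which cutoff achieves that depends on the workload, and this sketch makes no claim about the values the paper derives.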

load balancing, task scheduling, performance evaluation, fairness

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Bianca Schroeder (1)
  • Mor Harchol-Balter (1)
  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
