Metrics for Parallel Job Scheduling and Their Convergence

  • Dror G. Feitelson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2221)

Abstract

The arrival process of jobs submitted to a parallel system is bursty, leading to fluctuations in the load at many time scales. In particular, rare events of extreme load may occur. Such events increase the standard deviation of performance metrics and thus delay the convergence of simulations used to evaluate scheduling policies. Different performance metrics have been proposed in an effort to reduce this variability, and they do indeed display different rates of convergence. However, no single metric outperforms the others under all conditions; rather, the convergence of each metric depends on the system being studied.
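
The abstract's argument can be explored with a small experiment. The sketch below is not the paper's simulator. It assumes a single-server FCFS queue (via the Lindley recursion) in place of a parallel scheduler, hyperexponential interarrival times as a crude stand-in for bursty arrivals, and lognormal runtimes. The metrics it compares (response time, slowdown, and bounded slowdown with a hypothetical 10-second threshold `tau`) are common in the job-scheduling literature. Convergence is gauged by the relative half-width of a batch-means confidence interval. All function names and parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def bounded_slowdown(wait, run, tau=10.0):
    # Bounded slowdown: runtimes shorter than tau (a hypothetical 10 s
    # threshold) are rounded up to tau, and the result is floored at 1.
    return max((wait + run) / max(run, tau), 1.0)


def fcfs_waits(interarrivals, runtimes):
    # Lindley recursion for a single FCFS server (a simplifying assumption,
    # not a parallel scheduler): W[0] = 0, W[i+1] = max(0, W[i] + S[i] - A[i+1]).
    waits, w = [], 0.0
    for i, s in enumerate(runtimes):
        waits.append(w)
        if i + 1 < len(interarrivals):
            w = max(0.0, w + s - interarrivals[i + 1])
    return waits


def synthetic_jobs(n, seed=1):
    # Hypothetical workload: hyperexponential interarrivals (mostly 40 s gaps,
    # occasionally 640 s lulls, mean 100 s) as a crude model of burstiness,
    # and lognormal runtimes (median about 20 s, mean about 62 s).
    rng = random.Random(seed)
    interarrivals, runtimes = [], []
    for _ in range(n):
        mean_gap = 40.0 if rng.random() < 0.9 else 640.0
        interarrivals.append(rng.expovariate(1.0 / mean_gap))
        runtimes.append(rng.lognormvariate(3.0, 1.5))
    return interarrivals, runtimes


def batch_means_ci(values, n_batches=10, z=1.96):
    # Batch means: average consecutive batches and use the spread of the
    # batch averages for an approximate 95% confidence-interval half-width.
    size = len(values) // n_batches
    batches = [statistics.fmean(values[k * size:(k + 1) * size])
               for k in range(n_batches)]
    mean = statistics.fmean(batches)
    half_width = z * statistics.stdev(batches) / math.sqrt(n_batches)
    return mean, half_width


if __name__ == "__main__":
    inter, run = synthetic_jobs(20_000)
    waits = fcfs_waits(inter, run)
    metrics = {
        "response time": [w + r for w, r in zip(waits, run)],
        "slowdown": [(w + r) / r for w, r in zip(waits, run)],
        "bounded slowdown": [bounded_slowdown(w, r) for w, r in zip(waits, run)],
    }
    for name, values in metrics.items():
        mean, half = batch_means_ci(values)
        # A smaller relative half-width means a more precise estimate.
        print(f"{name:16s} mean={mean:12.2f}  relative CI half-width={half / mean:.3f}")
```

Rerunning with different seeds, run lengths, or workload parameters is a cheap way to check whether the ranking of metrics by precision stays stable, which is the kind of question the abstract raises. No particular ranking is implied by this sketch.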

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Dror G. Feitelson, School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel