Scheduling Restartable Jobs with Short Test Runs

  • Ojaswirajanya Thebe
  • David P. Bunde
  • Vitus J. Leung
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5798)

Abstract

In this paper, we examine the concept of giving every job a trial run before committing it to run until completion. Trial runs allow immediate job failures to be detected shortly after job submission and benefit short jobs by letting them run and finish early. This occurs without inflicting a significant penalty on longer jobs, whose average and maximum waiting times are actually improved in some cases. The strategy does not require preemption and instead uses the ability to kill a job and restart it from the beginning, which it does at most once for each job. While others have proposed similar strategies, our algorithm is distinguished by its determination to give each job a fixed-length trial run as soon as possible. Our study is also more focused, including a detailed description of the algorithm and an examination of the effect of varying the length of a trial run.
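
The trial-run idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single processor, no backfilling, and a simple rule that pending trial runs always go before full runs. The names `Job`, `simulate`, and `trial_len` are hypothetical and chosen only for this illustration. A job whose runtime fits inside the trial (including one that crashes immediately) finishes during the trial; any other job is killed once and later restarted from the beginning.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: float   # submission time
    runtime: float   # time the job needs when started from scratch


def simulate(jobs, trial_len):
    """Toy model: return {job name: completion time} on one processor.

    Assumption (not from the paper): trial runs have strict priority
    over full runs, and nothing is ever preempted mid-run.
    """
    pending = deque(sorted(jobs, key=lambda j: j.arrival))
    trial_queue = deque()   # jobs waiting for their trial run
    full_queue = deque()    # jobs killed after the trial, awaiting a full run
    clock = 0.0
    finished = {}

    while pending or trial_queue or full_queue:
        # Admit every job submitted by the current time.
        while pending and pending[0].arrival <= clock:
            trial_queue.append(pending.popleft())

        if trial_queue:
            job = trial_queue.popleft()
            if job.runtime <= trial_len:
                # Short or immediately failing job: done within the trial.
                clock += job.runtime
                finished[job.name] = clock
            else:
                # Kill at the end of the trial; restart later from the start.
                clock += trial_len
                full_queue.append(job)
        elif full_queue:
            # Second and final start: run to completion.
            job = full_queue.popleft()
            clock += job.runtime
            finished[job.name] = clock
        else:
            # Idle until the next submission.
            clock = pending[0].arrival
    return finished


jobs = [Job("long", 0, 10), Job("short", 1, 2), Job("crash", 1, 0.5)]
print(simulate(jobs, trial_len=3))
# {'short': 5.0, 'crash': 5.5, 'long': 15.5}
```

In this toy case, running the same three jobs to completion in submission order would finish them at times 10, 12, and 12.5, so the short and failing jobs finish earlier while the long job pays for its discarded trial and for the trials scheduled ahead of its full run.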

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ojaswirajanya Thebe (1)
  • David P. Bunde (1)
  • Vitus J. Leung (2)

  1. Knox College
  2. Sandia National Laboratories
