Modeling User Runtime Estimates

  • Dan Tsafrir
  • Yoav Etsion
  • Dror G. Feitelson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3834)

Abstract

User estimates of job runtimes have emerged as an important component of the workload on parallel machines. They can significantly affect how a scheduler treats different jobs, and thus overall performance. It is therefore highly desirable to have a good model of the relationship between parallel jobs and their associated estimates. We construct such a model based on a detailed analysis of several workload traces. The model incorporates the features that are consistent across all of the logs, most notably the inherently modal nature of estimates (e.g., only 20 distinct values are used as estimates for about 90% of the jobs). We find that user behavior, as manifested in the estimate distributions, is remarkably similar across the different workload traces. Indeed, providing our model with only the maximal allowed estimate value, along with the percentage of jobs that used it, yields results that are very similar to the original estimates. The remaining difference (if any) is largely eliminated by providing information on one or two additional popular estimates. Consequently, compared with previous models, simulations that use our model are better at reproducing the scheduling behavior observed when using real estimates.
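The modality observation above can be made concrete with a small sketch. The Python below is an illustration only, not the model from the paper. `top_k_coverage` computes the kind of statistic quoted above (the share of jobs whose estimate is one of the k most popular values). `sample_modal_estimate` is a deliberately simplified, hypothetical generator that, like the model described, is parameterized by the maximal allowed estimate and the fraction of jobs that use it. The list of popular values, the 18-hour limit, the 20% share of maximal estimates, the uniform choice among the remaining candidates, and the rule that an estimate never falls below the job's runtime are all assumptions made for illustration.

```python
import random
from collections import Counter


def top_k_coverage(estimates, k):
    """Fraction of jobs whose estimate is one of the k most frequently used values."""
    if not estimates:
        return 0.0
    counts = Counter(estimates)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(estimates)


def sample_modal_estimate(runtime, max_estimate, p_max, popular_values, rng):
    """Draw one hypothetical user estimate (in seconds) for a job.

    Illustrative assumption, not the paper's model: with probability p_max
    the user requests the maximal allowed value; otherwise the user picks
    uniformly among the popular values that are at least as long as the
    runtime and no longer than the maximum.
    """
    if rng.random() < p_max:
        return max_estimate
    candidates = [v for v in popular_values if runtime <= v <= max_estimate]
    if not candidates:
        return max_estimate
    return rng.choice(candidates)


# Usage with made-up parameters (all values below are assumptions).
rng = random.Random(42)
MAX_EST = 18 * 3600  # hypothetical 18-hour limit on estimates
POPULAR = [15 * 60, 30 * 60, 3600, 2 * 3600, 4 * 3600, 8 * 3600, 12 * 3600, MAX_EST]
runtimes = [rng.randint(60, MAX_EST) for _ in range(1000)]
estimates = [sample_modal_estimate(r, MAX_EST, 0.2, POPULAR, rng) for r in runtimes]
print(f"top-5 coverage: {top_k_coverage(estimates, 5):.2f}")
```

Because every generated estimate is drawn from a handful of "round" values, a few values cover most jobs, which is the modal shape the abstract describes. A faithful reproduction would instead fit the popular values and their probabilities to a real workload trace.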

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson: School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel