Penalty Scheduling Policy Applying User Estimates and Aging for Supercomputing Centers

  • Nestor Rocchetti
  • Miguel Da Silva
  • Sergio Nesmachnow
  • Andrei Tchernykh
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 697)

Abstract

In this article, we address the problem of scheduling on realistic high performance computing facilities using incomplete information about task execution times. We introduce a variation of our previous Penalty Scheduling Policy that includes an aging scheme, which increases the priority of jobs over time. User-provided runtime estimates are applied as in the original Penalty Scheduling Policy, but a realistic priority scheme is proposed to avoid starvation. The experimental evaluation of the proposed scheduler is performed using real workload logs and validated using a job scheduler simulator. We study different realistic workload scenarios to evaluate the performance of the Penalty Scheduling Policy with aging. The main results suggest that the proposed scheduler with the aging scheme significantly reduces the waiting time of jobs in the high performance computing facility (by up to 50% on average).
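The abstract does not state the priority formula, so the following is only a minimal sketch of a generic linear aging scheme under our own assumptions. It is not the authors' Penalty Scheduling Policy. Each queued job gets a hypothetical base score derived from its user runtime estimate (shorter estimates rank higher), and its effective priority grows linearly with waiting time, scaled by a hypothetical `aging_rate` parameter. The names `Job`, `base_priority`, `effective_priority` and `pick_next` are illustrative only. The point is to show how aging keeps a job that the base ordering would otherwise postpone indefinitely from starving.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    submit_time: float    # seconds since the start of the simulation
    user_estimate: float  # user-provided runtime estimate, in seconds


def base_priority(job: Job) -> float:
    # Hypothetical base score: shorter user estimates get higher priority.
    # This is an assumption for illustration, not the paper's penalty formula.
    return 1.0 / job.user_estimate


def effective_priority(job: Job, now: float, aging_rate: float) -> float:
    # Linear aging: priority grows with time spent waiting in the queue,
    # so a long-waiting job eventually outranks newer, favored jobs.
    waited = max(0.0, now - job.submit_time)
    return base_priority(job) + aging_rate * waited


def pick_next(queue: list[Job], now: float, aging_rate: float) -> Job:
    # Choose the job with the highest aged priority; ties go to the
    # earliest submission (larger -submit_time means earlier).
    return max(
        queue,
        key=lambda j: (effective_priority(j, now, aging_rate), -j.submit_time),
    )


if __name__ == "__main__":
    queue = [Job("long", 0.0, 36000.0), Job("short", 3500.0, 60.0)]
    print(pick_next(queue, now=3600.0, aging_rate=0.0).job_id)   # short
    print(pick_next(queue, now=3600.0, aging_rate=1e-5).job_id)  # long
```

Without aging (`aging_rate=0.0`), the short job always wins, so a stream of short jobs could starve the long one. With a small positive rate, the hour the long job has already waited outweighs the short job's estimate-based advantage. A linear term was chosen here only for simplicity; the actual scheme in the paper may differ.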

Keywords

High performance computing; Scheduling; Execution time estimation; Aging scheme; Penalty scheduling policy


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Nestor Rocchetti (1)
  • Miguel Da Silva (1)
  • Sergio Nesmachnow (1)
  • Andrei Tchernykh (2)
  1. Universidad de la República, Montevideo, Uruguay
  2. CICESE Research Center, Ensenada, Mexico
