A Job Scheduling Approach for Multi-core Clusters Based on Virtual Malleability

  • Gladys Utrera
  • Siham Tabik
  • Julita Corbalan
  • Jesús Labarta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7484)


Many commercial job scheduling strategies for multiprocessing systems aim to minimize the waiting times of short jobs. Long jobs cannot be neglected, however, because they also have a decisive impact on system performance. In this work we propose a job scheduling strategy that maximizes resource utilization and improves overall performance by allowing jobs to adapt to variations in the load. The experimental evaluation includes both simulations and executions of real workloads. The results show that our strategy provides significant improvements over the traditional EASY backfilling policy, especially at medium to high machine loads.


Keywords: job scheduling, MPI, malleability





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Gladys Utrera (1)
  • Siham Tabik (2)
  • Julita Corbalan (1)
  • Jesús Labarta (3)
  1. Technical University of Catalonia (UPC), Barcelona, Spain
  2. University of Malaga, Malaga, Spain
  3. Barcelona Supercomputing Center (BSC), Barcelona, Spain
