The Journal of Supercomputing, Volume 71, Issue 3, pp 966–994

Locality-aware policies to improve job scheduling on 3D tori

  • Jose A. Pascual
  • Jose Miguel-Alonso
  • Jose A. Lozano


This paper studies the influence that contiguous job placement has on the performance of schedulers for large-scale computing systems. In contrast with non-contiguous strategies, contiguous partitioning enables the exploitation of communication locality in applications and reduces inter-application interference. However, it also increases scheduling times and system fragmentation, degrading system utilization. We propose and evaluate several strategies to select contiguous partitions for incoming jobs. These strategies are combined with different mapping mechanisms that perform the task-to-node assignment in order to further reduce application run times. A simulation-based study has been carried out using a collection of synthetic applications performing common communication patterns. Results show that exploiting communication locality through a correct partitioning–mapping combination effectively reduces application run times, and the gains achieved more than compensate for the scheduling inefficiency, resulting in better overall system performance.
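The paper's own partition-selection strategies are not reproduced here; as a minimal illustration of the contiguous partitioning idea described above, the sketch below searches a 3D mesh for the first free contiguous sub-mesh that fits a job request (a toy first-fit policy). For simplicity it ignores torus wraparound, so candidate partitions never cross the mesh boundary; all names are illustrative.

```python
import itertools

def first_fit_contiguous(free, dims, shape):
    """Return the base corner of the first free contiguous sub-mesh
    of the given shape inside a 3D mesh, or None if none fits.

    free  -- set of free node coordinates (x, y, z)
    dims  -- mesh dimensions (X, Y, Z)
    shape -- requested partition shape (a, b, c)
    """
    X, Y, Z = dims
    a, b, c = shape
    # Scan candidate base corners in lexicographic order (first fit).
    for x, y, z in itertools.product(range(X - a + 1),
                                     range(Y - b + 1),
                                     range(Z - c + 1)):
        block = [(x + i, y + j, z + k)
                 for i in range(a) for j in range(b) for k in range(c)]
        if all(node in free for node in block):
            return (x, y, z)
    return None

# Example: a 4x4x4 mesh in which one node is already allocated.
dims = (4, 4, 4)
free = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
free.discard((0, 0, 0))                       # node (0,0,0) is busy
print(first_fit_contiguous(free, dims, (2, 2, 2)))   # prints (0, 0, 1)
```

Scanning base corners in a fixed order makes the policy deterministic and cheap, but, as the abstract notes, such contiguous searches can leave free nodes scattered in shapes no request fits, which is the fragmentation cost the paper weighs against the locality gains.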


Keywords: Scheduling · Contiguous partitioning · Task mapping · Locality-aware policies



This work has been supported by the programs Saiotek and Research Groups 2013-2018 (IT-609-13) from the Basque Government, project TIN2013-41272P from the Spanish Ministry of Science and Innovation, the COMBIOMED network in computational biomedicine (Carlos III Health Institute), and the NICaiA Project PIRSES-GA-2009-247619 (European Commission). Dr. Pascual is supported by a postdoctoral grant from the University of the Basque Country. Prof. Miguel-Alonso is a member of the HiPEAC European Network of Excellence.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Jose A. Pascual (1)
  • Jose Miguel-Alonso (1)
  • Jose A. Lozano (1)

  1. Intelligent Systems Group, School of Computer Science, University of the Basque Country UPV/EHU, San Sebastian, Spain
