Automatic Co-scheduling Based on Main Memory Bandwidth Usage

  • Jens Breitbart
  • Josef Weidendorfer
  • Carsten Trinitis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10353)


Most applications running on supercomputers achieve only a fraction of a system's peak performance. It has been demonstrated that co-scheduling applications can improve overall system utilization. In that case, however, the applications being co-scheduled must fulfill certain criteria so that mutual slowdown is kept to a minimum. In this paper we present a set of libraries and a first HPC scheduler prototype that automatically detects an application's main memory bandwidth utilization and prevents the co-scheduling of multiple main-memory-bandwidth-limited applications. We demonstrate that our prototype achieves almost the same performance as the manually tuned co-schedules of our previous work.
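The core policy described above, measuring each application's main memory bandwidth and refusing to co-schedule two bandwidth-limited applications, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the threshold, the `AppProfile` fields, and the function names are all hypothetical, and real bandwidth figures would come from hardware performance counters.

```python
from dataclasses import dataclass

# Illustrative assumption: treat an application as bandwidth-limited once it
# uses at least this fraction of the node's peak main memory bandwidth.
BANDWIDTH_LIMITED_FRACTION = 0.5


@dataclass
class AppProfile:
    """Hypothetical per-application profile gathered at runtime."""
    name: str
    measured_bw_gbs: float  # observed main memory bandwidth in GB/s


def is_bandwidth_limited(app: AppProfile, peak_bw_gbs: float) -> bool:
    """Classify an application by its share of the node's peak bandwidth."""
    return app.measured_bw_gbs >= BANDWIDTH_LIMITED_FRACTION * peak_bw_gbs


def may_coschedule(a: AppProfile, b: AppProfile, peak_bw_gbs: float) -> bool:
    """Allow co-scheduling unless both applications are bandwidth-limited,
    in which case they would contend for main memory bandwidth."""
    return not (is_bandwidth_limited(a, peak_bw_gbs)
                and is_bandwidth_limited(b, peak_bw_gbs))


# Illustrative profiles on a node with an assumed 60 GB/s peak bandwidth.
peak = 60.0
stream_like = AppProfile("stream-like", 45.0)    # memory-bound workload
compute_bound = AppProfile("dgemm-like", 8.0)    # compute-bound workload

print(may_coschedule(stream_like, compute_bound, peak))  # True: mixed pair
print(may_coschedule(stream_like, stream_like, peak))    # False: both limited
```

A real scheduler would additionally react to phase changes, since an application's bandwidth demand can vary over its runtime; the sketch only captures the pairing decision itself.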



We want to thank MEGWARE, who provided us with a Clustsafe to measure energy consumption. The work presented in this paper was funded by the German Ministry of Education and Science as part of the FAST project (funding code 01IH11007A).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jens Breitbart
  • Josef Weidendorfer
  • Carsten Trinitis
  1. Department of Informatics, Chair for Computer Architecture, Technical University of Munich, Munich, Germany
