Addressing characterization methods for memory contention aware co-scheduling

The Journal of Supercomputing

Abstract

The ability to precisely predict how memory contention degrades performance when co-scheduling programs is critical for reaching high performance levels in cluster, grid and cloud environments. In this paper we present an overview of state-of-the-art characterization methods for memory-aware (co-)scheduling and compare their performance. We evaluate the prediction accuracy and co-scheduling performance of four methods: one slowdown-based, two cache-contention-based and one based on memory bandwidth usage. Both our regression analysis and scheduling simulations find that the slowdown-based method, represented by Memgen, performs better than the other methods. The linear correlation coefficient \(R^2\) of Memgen’s prediction is 0.890. Memgen’s preferred schedules reached 99.53 % of the obtainable performance on average. The memory bandwidth usage method performed almost as well as the slowdown-based method. Furthermore, while most prior work promotes characterization based on cache miss rate, we found it to be on par with random scheduling of programs and highly unreliable.

Author information

Corresponding author

Correspondence to Andreas de Blanche.

About this article

Cite this article

de Blanche, A., Lundqvist, T. Addressing characterization methods for memory contention aware co-scheduling. J Supercomput 71, 1451–1483 (2015). https://doi.org/10.1007/s11227-014-1374-8

  • DOI: https://doi.org/10.1007/s11227-014-1374-8
