Towards optimal scheduling policy for heterogeneous memory architecture in many-core system

Cluster Computing

Abstract

With the advent of Intel's second-generation many-core processor (Knights Landing: KNL), high-bandwidth memory (HBM) with potentially five times the bandwidth of conventional dynamic random-access memory (DRAM) has become available as a valuable computing resource for high-performance computing (HPC) applications. Resource management schemes should therefore consider central processing unit (CPU) cores, conventional main memory, and the newly available HBM together in order to improve overall system throughput and user response time. In this paper, we present a profiling mechanism that analyzes the resource usage patterns of various HPC workloads, together with a scheduling policy based on those profiles. By carefully allocating memory-intensive workloads to HBM in KNL, we show that the overall performance of multiple Message Passing Interface (MPI) workloads can be improved in terms of execution time and system utilization. We evaluate and verify the effectiveness of our scheme for optimizing the use of HBM using the NAS Parallel Benchmarks.
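
The idea of steering memory-intensive workloads toward HBM can be illustrated with a small sketch. This is not the policy from the paper, whose details are not given in the abstract; it is a simple greedy heuristic under stated assumptions. The `Workload` fields, the job names, and all profile numbers are hypothetical, and the 16 GB default reflects the on-package memory capacity of KNL.

```python
from dataclasses import dataclass

# Assumed HBM capacity in GB (16 GB matches KNL on-package memory).
HBM_CAPACITY_GB = 16.0


@dataclass
class Workload:
    name: str
    footprint_gb: float   # hypothetical profiled memory footprint, must be > 0
    bandwidth_gbs: float  # hypothetical profiled memory bandwidth demand


def place_workloads(workloads, hbm_capacity_gb=HBM_CAPACITY_GB):
    """Greedy sketch: give HBM to the workloads with the highest
    bandwidth demand per GB of footprint until HBM is full."""
    placement = {}
    free_gb = hbm_capacity_gb
    ranked = sorted(
        workloads,
        key=lambda w: w.bandwidth_gbs / w.footprint_gb,
        reverse=True,
    )
    for w in ranked:
        if w.footprint_gb <= free_gb:
            placement[w.name] = "HBM"
            free_gb -= w.footprint_gb
        else:
            # Does not fit in the remaining HBM: fall back to main memory.
            placement[w.name] = "DDR"
    return placement


# Made-up profiles, for illustration only.
jobs = [
    Workload("job_a", footprint_gb=10.0, bandwidth_gbs=80.0),
    Workload("job_b", footprint_gb=8.0, bandwidth_gbs=20.0),
    Workload("job_c", footprint_gb=4.0, bandwidth_gbs=60.0),
]
print(place_workloads(jobs))
```

With these made-up profiles, `job_c` and `job_a` are placed in HBM and `job_b` falls back to conventional memory. On a KNL node configured so that HBM appears as a separate NUMA node, such a placement would typically be realized by binding a job's memory to that node, for example with `numactl`.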

Acknowledgements

This research was supported by the Korea Institute of Science and Technology Information (KISTI) (Grant No: K-18-L12-C07-S01) and by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (Grant No: NRF-2017S1A3A2066319).

Author information

Corresponding author

Correspondence to Dukyun Nam.

Cite this article

Park, G., Rho, S., Kim, JS. et al. Towards optimal scheduling policy for heterogeneous memory architecture in many-core system. Cluster Comput 22, 121–133 (2019). https://doi.org/10.1007/s10586-018-2825-4

  • DOI: https://doi.org/10.1007/s10586-018-2825-4
