Towards optimal scheduling policy for heterogeneous memory architecture in many-core system

Park, Geunchul; Rho, Seungwoo; Kim, Jik-Soo; Nam, Dukyun

doi:10.1007/s10586-018-2825-4

Published: 25 July 2018

Cluster Computing, Volume 22, pages 121–133 (2019)

Geunchul Park¹, Seungwoo Rho¹, Jik-Soo Kim² & Dukyun Nam¹ (ORCID: orcid.org/0000-0003-1023-7311)

Abstract

With the advent of Intel's second-generation many-core processor (Knights Landing: KNL), high-bandwidth memory (HBM) with potentially five times more bandwidth than existing dynamic random-access memory has become available as a valuable computing resource for high-performance computing (HPC) applications. Resource management schemes should therefore consider the existing central processing unit (CPU) cores, the conventional main memory, and this newly available HBM together to improve overall system throughput and user response time. In this paper, we present a profiling mechanism that analyzes the resource usage patterns of various HPC workloads, and a related scheduling policy. By carefully allocating memory-intensive workloads to HBM in KNL, we show that the overall performance of multiple Message Passing Interface (MPI) workloads can be improved in terms of execution time and system utilization. We evaluate and verify the effectiveness of our scheme for optimizing the use of HBM by using the NAS Parallel Benchmarks.
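The idea of steering memory-intensive workloads to HBM can be illustrated with a minimal sketch. It is not the paper's policy, which is not described in this abstract; it is a simple greedy placement under stated assumptions. The `Workload` fields, the `place_workloads` function, the job names, and all numbers are hypothetical; the only hardware figure used is the 16 GB capacity of the on-package memory on KNL.

```python
from dataclasses import dataclass

# KNL ships with 16 GB of on-package high-bandwidth memory (MCDRAM).
HBM_CAPACITY_GB = 16.0


@dataclass
class Workload:
    name: str
    footprint_gb: float    # memory footprint, assumed to come from profiling
    bandwidth_gbps: float  # memory bandwidth demand, assumed to come from profiling


def place_workloads(workloads, hbm_capacity_gb=HBM_CAPACITY_GB):
    """Greedy placement (illustrative assumption, not the authors' algorithm).

    Workloads are visited from highest to lowest bandwidth demand; each one
    goes to HBM if its footprint still fits, otherwise to conventional DDR.
    """
    placement = {}
    free_gb = hbm_capacity_gb
    for w in sorted(workloads, key=lambda w: w.bandwidth_gbps, reverse=True):
        if w.footprint_gb <= free_gb:
            placement[w.name] = "HBM"
            free_gb -= w.footprint_gb
        else:
            placement[w.name] = "DDR"
    return placement


# Made-up profile values, for illustration only.
jobs = [
    Workload("job_a", footprint_gb=10.0, bandwidth_gbps=80.0),
    Workload("job_b", footprint_gb=8.0, bandwidth_gbps=60.0),
    Workload("job_c", footprint_gb=4.0, bandwidth_gbps=40.0),
    Workload("job_d", footprint_gb=2.0, bandwidth_gbps=5.0),
]
placement = place_workloads(jobs)
print(placement)
```

On a KNL node configured in flat memory mode, where the on-package memory appears as a separate NUMA node, a decision of `"HBM"` could plausibly be enforced by launching the job under `numactl --membind` with that node; the node number depends on the system configuration and is left open here.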

Acknowledgements

This research was supported by the Korea Institute of Science and Technology Information (KISTI) (Grant No: K-18-L12-C07-S01), and by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (Grant No: NRF-2017S1A3A2066319).

Author information

Authors and Affiliations

National Institute of Supercomputing and Networking, KISTI, 245 Daehak-ro, Yuseong-gu, Daejeon, 34141, Korea
Geunchul Park, Seungwoo Rho & Dukyun Nam
Department of Computer Engineering, Myongji University, 116 Myongji-ro, Cheoin-gu, Yongin, Gyeonggi-do, Korea
Jik-Soo Kim

Corresponding author

Correspondence to Dukyun Nam.

About this article

Cite this article

Park, G., Rho, S., Kim, JS. et al. Towards optimal scheduling policy for heterogeneous memory architecture in many-core system. Cluster Comput 22, 121–133 (2019). https://doi.org/10.1007/s10586-018-2825-4

Received: 26 January 2018
Revised: 19 June 2018
Accepted: 17 July 2018
Published: 25 July 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10586-018-2825-4
