High-Level Approaches for Leveraging Deep-Memory Hierarchies on Modern Supercomputers

  • Antonio Gómez-Iglesias
  • Ritu Arora
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 964)

Abstract

There is a growing demand for supercomputers that can support memory-intensive applications for solving large-scale problems from various domains. Novel supercomputers with fast and complex memory subsystems are being provisioned to meet this demand. While deep and complex memory hierarchies offer increased memory bandwidth, they can also introduce additional latency. Optimizing an application's memory usage is therefore required to improve its performance, but doing so entirely manually can be an effort-intensive and time-consuming activity. Hence, high-level approaches for supporting memory management and memory optimization on modern supercomputers are needed. Such scalable approaches can support the users of open-science data centers, who are mostly domain scientists and students, in their code modernization efforts. In this paper, we present a memory management and optimization workflow based on high-level tools. While the workflow can be generalized to supercomputers with different architectures, we demonstrate its usage on the Stampede2 system at the Texas Advanced Computing Center, which contains both Intel Knights Landing and Intel Xeon processors; each Knights Landing node offers both DDR4 and MCDRAM.
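
As a minimal sketch (not the paper's actual tooling) of what explicit data placement on a Knights Landing node might look like, consider the memkind library [7], which exposes MCDRAM through its hbwmalloc interface. The file name, array name, and array size below are hypothetical; the workflow described in this paper relies on high-level tools such as ICAT [1] to automate transformations of this kind.

    /* hbw_demo.c -- hypothetical example: place a bandwidth-sensitive
     * array in MCDRAM via the memkind library's hbwmalloc interface,
     * falling back to DDR4 when no high-bandwidth memory is available.
     * Build (assuming memkind is installed): cc hbw_demo.c -lmemkind
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <hbwmalloc.h>

    int main(void) {
        size_t n = 1u << 20;  /* hypothetical array size: 2^20 doubles */

        /* hbw_check_available() returns 0 when MCDRAM can be allocated */
        int use_hbw = (hbw_check_available() == 0);

        double *a = use_hbw ? (double *)hbw_malloc(n * sizeof(double))
                            : (double *)malloc(n * sizeof(double));
        if (a == NULL) {
            fprintf(stderr, "allocation failed\n");
            return 1;
        }

        memset(a, 0, n * sizeof(double));  /* touch the data */

        /* release with the allocator that produced the pointer */
        if (use_hbw)
            hbw_free(a);
        else
            free(a);
        return 0;
    }

On a flat-mode KNL node, where MCDRAM typically appears as a separate NUMA node, a similar placement can also be obtained without code changes by binding an unmodified executable to that node (for example, numactl --membind=1 ./app), at the cost of per-allocation control; choosing between such options is exactly the kind of decision the proposed workflow is meant to guide.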

Notes

Acknowledgment

We are very grateful to the National Science Foundation (NSF) for grant #1642396, the ICERT REU program (NSF grant #1359304), XSEDE (NSF grant #ACI-1053575), and the Texas Advanced Computing Center (TACC) for providing the resources required for this project. We are grateful to Tiffany Connors and Lars Koesterke for their contributions to the ICAT codebase. Stampede2 is generously funded by the NSF through award ACI-1540931.

References

  1. Arora, R., Koesterke, L.: Interactive code adaptation tool for modernizing applications for Intel Knights Landing processors. In: Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, PEARC 2017, pp. 28:1–28:8. ACM, New York (2017). https://doi.org/10.1145/3093338.3093352
  2. Chandrasekar, K., Ni, X., Kale, L.V.: A memory heterogeneity-aware runtime system for bandwidth-sensitive HPC applications. In: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1293–1300 (2017). https://doi.org/10.1109/IPDPSW.2017.168
  3. Harrod, W.: A journey to exascale computing. In: 2012 SC Companion on High Performance Computing, Networking, Storage and Analysis (SCC), pp. 1702–1730. IEEE (2012)
  4. Hartmann, C., Fey, D.: An extended analysis of memory hierarchies for efficient implementations of image processing applications. J. Real-Time Image Process. (2017). https://doi.org/10.1007/s11554-017-0723-2
  5. Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach, 5th edn. Morgan Kaufmann Publishers Inc., San Francisco (2011)
  6. Heroux, M.A., et al.: Improving performance via mini-applications. Technical report SAND2009-5574, Sandia National Laboratories, Albuquerque, New Mexico 87185 and Livermore, California 94550 (2009)
  7. Intel: Memkind (2017). http://memkind.github.io/memkind/. Accessed 25 June 2018
  8. Jun, H., et al.: HBM (High Bandwidth Memory) DRAM technology and architecture. In: 2017 IEEE International Memory Workshop (IMW), pp. 1–4 (2017). https://doi.org/10.1109/IMW.2017.7939084
  9. Karlin, I., et al.: Exploring traditional and emerging parallel programming models using a proxy application. In: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, IPDPS 2013, pp. 919–932. IEEE Computer Society, Washington, DC (2013). https://doi.org/10.1109/IPDPS.2013.115
  10. Karlin, I., Keasler, J., Neely, J.: LULESH 2.0 updates and changes. Technical report, Lawrence Livermore National Laboratory (LLNL), Livermore, CA (2013)
  11. Khaldi, D., Chapman, B.: Towards automatic HBM allocation using LLVM: a case study with Knights Landing. In: Proceedings of the Third Workshop on LLVM Compiler Infrastructure in HPC, LLVM-HPC 2016, pp. 12–20. IEEE Press, Piscataway (2016). https://doi.org/10.1109/LLVM-HPC.2016.7
  12. Sandia National Laboratories: Mantevo Project Homepage (2018). http://mantevo.org. Accessed 25 June 2018
  13. Peng, I.B., Gioiosa, R., Kestor, G., Cicotti, P., Laure, E., Markidis, S.: RTHMS: a tool for data placement on hybrid memory system. In: Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, ISMM 2017, pp. 82–91. ACM, New York (2017). https://doi.org/10.1145/3092255.3092273
  14. Plimpton, S., et al.: Crossing the mesoscale no-man’s land via parallel kinetic Monte Carlo (2009)
  15. PRACE: Partnership for Advanced Computing in Europe (2018). http://www.prace-ri.eu/. Accessed 25 June 2018
  16. Reinders, J.: VTune Performance Analyzer Essentials. Intel Press (2005)
  17. Rosales, C., et al.: A comparative study of application performance and scalability on the Intel Knights Landing processor. In: Taufer, M., Mohr, B., Kunkel, J.M. (eds.) ISC High Performance 2016. LNCS, vol. 9945, pp. 307–318. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46079-6_22
  18. Rosales, C., Gómez-Iglesias, A., Predoehl, A.: REMORA: a resource monitoring tool for everyone. In: Proceedings of the Second International Workshop on HPC User Support Tools, HUST 2015, pp. 3:1–3:8. ACM, New York (2015). https://doi.org/10.1145/2834996.2834999
  19. Rosales, C., et al.: KNL utilization guidelines. Technical report TR-16-03, Texas Advanced Computing Center, Austin, Texas (2016)
  20. Sodani, A.: Knights Landing (KNL): 2nd generation Intel Xeon Phi processor. In: 2015 IEEE Hot Chips 27 Symposium (HCS), pp. 1–24 (2015). https://doi.org/10.1109/HOTCHIPS.2015.7477467
  21. Stanzione, D., et al.: Stampede 2: the evolution of an XSEDE supercomputer. In: Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, PEARC 2017, pp. 15:1–15:8. ACM, New York (2017). https://doi.org/10.1145/3093338.3093385
  22. Towns, J., et al.: XSEDE: accelerating scientific discovery. Comput. Sci. Eng. 16(5), 62–74 (2014). https://doi.org/10.1109/MCSE.2014.80

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Intel Corporation, Hillsboro, USA
  2. Texas Advanced Computing Center, The University of Texas at Austin, Austin, USA
