Skip to main content

Software Technology That Deals with Deeper Memory Hierarchy in Post-petascale Era

  • Chapter
  • First Online:
Advanced Software Technologies for Post-Peta Scale Computing

Abstract

There is an urgent need to develop technology that realizes larger, finer, and faster simulations in meteorology, bioinformatics, disaster measures, and so on, toward post-petascale era. However, the “memory wall” problem will be the one of largest obstacles; the growth of memory bandwidth and capacity will be even slower than that of processor throughput. For this purpose, we suppose system architecture with memory hierarchy including hybrid memory devices, including nonvolatile RAM (NVRAM), and develop new software technology that efficiently utilizes the hybrid memory hierarchy. The area of our research includes new compiler technology, memory management, and application algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

eBook
USD 16.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Available at https://github.com/toshioendo/hhrt

  2. 2.

    In the actual implementation, there are two transient states, “swapping-in” and “swapping-out.”

  3. 3.

    https://github.com/YukinoriSato/ExanaPkg

References

  1. Ammons, G., Ball, T., Larus, J.R.: Exploiting hardware performance counters with flow and context sensitive profiling. In: Proceedings of the ACM SIGPLAN 1997 conference on programming language design and implementation, pp. 85–96 (1997)

    Google Scholar 

  2. Bernaschi, M., Bisson, M., Endo, T., Fatica, M., Matsuoka, S., Melchionna, S., Succi, S.: Petaflop biofluidics simulations on a two million-core system. In: IEEE/ACM SC’11, 12p. (2011)

    Google Scholar 

  3. Endo, T.: Realizing out-of-core stencil computations using multi-Tier memory hierarchy on GPGPU clusters. In: IEEE Cluster Computing (CLUSTER2016), pp. 21–29 (2016)

    Google Scholar 

  4. Endo, T., Jin, G.: Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations. In: IEEE Cluster Computing (CLUSTER2014), pp. 132–139 (2014)

    Google Scholar 

  5. Endo, T., Nukada, A., Matsuoka, S.: TSUBAME-KFC: a modern liquid submersion cooling prototype towards exascale becoming the greenest supercomputer in the world. In: IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014), pp. 360–367 (2014)

    Google Scholar 

  6. Endo, T., Takasaki, Y., Matsuoka, S.: Realizing extremely large-scale stencil applications on GPU supercomputers. In: IEEE International Conference on Parallel and Distributed Systems (ICPADS 2015), pp. 625–632 (2015)

    Google Scholar 

  7. Grosser, T., Groesslinger, A., Lengauer, C.: Polly – performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. 22(04), 1–28 (2012)

    Article  MathSciNet  Google Scholar 

  8. Hong, C., et al.: Effective padding of multidimensional arrays to avoid cache conflict misses. In: Proceedings of the 37th ACM Conference on Programming Language Design and Implementation, PLDI ’16, pp. 129–144 (2016)

    Google Scholar 

  9. Lucas, R., et al.: Top ten exascale research challenges, DOE ASCAC Subcommittee Report (2014)

    Google Scholar 

  10. Luk, C.K., et al.: Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 190–200 (2005)

    Google Scholar 

  11. Matsubara, Y., Sato, Y.: Online memory access pattern analysis on an application profiling tool. In: International Workshop on Advances in Networking and Computing, 2014 (WANC2014), pp. 602–604 (2014)

    Book  Google Scholar 

  12. Matsuoka, S., Endo, T., Nukada, A., Miura, S., Nomura, A., Sato, H., Jitsumoto, H., Sandr Drozd, A.: Overview of TSUBAME3.0, green cloud supercomputer for convergence of HPC, AI and big-data, GSIC, Tokyo Institute of Technology. e-Sci. J. 16, 2–9 (2017)

    Google Scholar 

  13. Midorikawa, H.: The performance analysis of portable parallel programming interface MpC for SDSM and pthread. In: Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid CCGrid2005. Fifth International Workshop on Distributed Shared Memory (DSM2005), vol. 2, pp. 889–896 (2005). https://doi.org/10.1109/CCGRID.2005.155865

  14. Midorikawa, H.: Blk-Tune: blocking parameter auto-tuning to minimize input-output traffic for flash-based out-of-core stencil computations. In: Proceedings of IEEE International Parallel and Distributed Processing Symposium 2016 Workshop, IPDPSW2016, pp. 1516–1526 (2016). https://doi.org/10.1109/IPDPSW.2016.48

  15. Midorikawa, H., Tan, H.: Locality-aware stencil computations using flash SSDs as main memory extension. In: Proceedings of IEEE/ACM International Symposium on Cluster, Cloud and the Grid Computing CCGrid2015, pp. 1163–1168 (2015). https://doi.org/10.1109/CCGrid.2015.126

    Google Scholar 

  16. Midorikawa, H., Tan, H.: Evaluation of flash-based out-of-core stencil computation algorithms for SSD-equipped clusters. In: The 22nd IEEE International Conference on Parallel and Distributed Systems ICPADS2016, pp. 1031–1040 (2016). https://doi.org/10.1109/ICPADS.2016.0137

  17. Midorikawa, H., Tan, H.: A highly efficient I/O-based out-of-core stencil algorithm with globally optimized temporal blocking. In: Proceedings of 2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pp. 1–6 (2017). https://doi.org/10.1109/PACRIM.2017.8121909

  18. Midorikawa, H., Saito, K., et al.: Using a cluster as a memory resource: a fast and large virtual memory on MPI. In: Proceedings of IEEE International Conference on Cluster Computing Cluster2009, pp. 1–10 (2009). https://doi.org/10.1109/CLUSTR.2009.5289180

    Google Scholar 

  19. Midorikawa, H., Kitagawa, K., Ohura, H.: Efficient swap protocol for remote memory paging in out-of-core multi-thread applications. In: Proceedings of 2017 IEEE International Conference on Cluster Computing Cluster2017, pp. 637–638 (2017). https://doi.org/10.1109/CLUSTER.2017.55

  20. Nguyen, A., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: IEEE/ACM SC’10, 13p. (2010)

    Google Scholar 

  21. Onodera, N., Aoki, T., Shimokawabe, T., Miyashita, T., Kobayashi, H.: Large-Eddy simulation of fluid-structure interaction using lattice Boltzmann method on multi-GPU clusters. In: 5th Asia Pacific Congress on Computational Mechanics and 4th International Symposium on Computational Mechanics (2013).

    Google Scholar 

  22. Phillips, E.H., Fatica, M.: Implementing the Himeno benchmark with CUDA on GPU clusters. In: IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 1–10 (2010)

    Google Scholar 

  23. Satish, N., Kim, C., Chhugani, J., Saito, H., Krishnaiyer, R., Smelyanskiy, M., Girkar, M., Dubey, P.: Can traditional programming bridge the ninja performance gap for parallel computing applications? Commun. ACM 58(5), 77–86 (2015)

    Article  Google Scholar 

  24. Sato, Y., Endo, T.: An accurate simulator of cache-line conflicts to exploit the underlying cache performance. In: Proceedings of 23rd International European Conference on Parallel and Distributed Computing (Euro-Par 2017), pp. 119–133 (2017)

    Google Scholar 

  25. Sato, Y., Inoguchi, Y., Nakamura, T.: On-the-fly detection of precise loop nests across procedures on a dynamic binary translation system. In: Proceedings of the 8th ACM International Conference on Computing Frontiers, pp. 25:0–25:10 (2011)

    Google Scholar 

  26. Sato, Y., Inoguchi, Y., Nakamura, T.: Whole program data dependence profiling to unveil parallel regions in the dynamic execution. In: Proceedings of 2012 IEEE International Symposium on Workload Characterization (IISWC2012), pp. 69–80 (2012)

    Google Scholar 

  27. Sato, Y., Inoguchi, Y., Nakamura, T.: Identifying program loop nesting structures during execution of machine code. IEICE Trans. Inf. Syst. E97-D(9), 2371–2385 (2014)

    Article  Google Scholar 

  28. Sato, Y., Sato, S., Endo, T.: Exana: an execution-driven application analysis tool for assisting productive performance tuning. In: Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems, SEPS 2015, pp. 1–10 (2015)

    Google Scholar 

  29. Sato, Y., Yuki, T., Endo, T.: ExanaDBT: a dynamic compilation system for transparent polyhedral optimizations at runtime. In: ACM International Conference on Computing Frontiers 2017 (CF’17), p. 10 (2017)

    Book  Google Scholar 

  30. Shimokawabe, T., Aoki, T., Takaki, T., Yamanaka, A., Nukada, A., Endo, T., Maruyama, N., Matsuoka, S.: Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer. In: IEEE/ACM SC’11, 11p. (2011)

    Google Scholar 

  31. TSUBAME3.0: The super computer in Global Scientific Information and Computing Center, Tokyo Institute of Technology. http://www.gsic.titech.ac.jp/en. Online: 26 Mar 2018

  32. Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. ACM PLDI 91, 30–44 (1991)

    Google Scholar 

  33. Yuki, T., Sato, Y., Endo, T.: Evaluating autotuning heuristics for loop tiling. In: International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2018), p. 2 (2018)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Toshio Endo .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Endo, T., Midorikawa, H., Sato, Y. (2019). Software Technology That Deals with Deeper Memory Hierarchy in Post-petascale Era. In: Sato, M. (eds) Advanced Software Technologies for Post-Peta Scale Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-1924-2_12

Download citation

  • DOI: https://doi.org/10.1007/978-981-13-1924-2_12

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1923-5

  • Online ISBN: 978-981-13-1924-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics