Software Technology That Deals with Deeper Memory Hierarchy in Post-petascale Era

Endo, Toshio; Midorikawa, Hiroko; Sato, Yukinori

doi:10.1007/978-981-13-1924-2_12

Abstract

There is an urgent need to develop technologies that enable larger, finer, and faster simulations in meteorology, bioinformatics, disaster countermeasures, and other fields toward the post-petascale era. However, the “memory wall” problem will be one of the largest obstacles: memory bandwidth and capacity are expected to grow even more slowly than processor throughput. To address this, we assume a system architecture whose memory hierarchy includes hybrid memory devices such as nonvolatile RAM (NVRAM), and we develop new software technologies that efficiently utilize this hybrid memory hierarchy. Our research spans new compiler technology, memory management, and application algorithms.

Notes

1. Available at https://github.com/toshioendo/hhrt
2. In the actual implementation, there are two transient states, “swapping-in” and “swapping-out.”
3. Available at https://github.com/YukinoriSato/ExanaPkg
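Note 2 suggests that, between its stable states, swappable data passes through a transient state while a copy between memory tiers is still in flight. The following is a minimal Python sketch of such a residency state machine. It assumes a two-tier hierarchy made up of a small, fast upper tier and a large, slow lower tier. The stable state names (`IN_UPPER`, `IN_LOWER`), the `Buffer` class, and its method names are hypothetical illustrations and do not come from the chapter's implementation. No data is actually copied.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical residency states of one buffer in a two-tier memory."""
    IN_UPPER = auto()      # stable: data is in the small, fast tier (assumed, e.g. GPU memory)
    SWAPPING_OUT = auto()  # transient: copy from upper to lower tier in flight
    IN_LOWER = auto()      # stable: data is only in the large, slow tier (assumed, e.g. host memory)
    SWAPPING_IN = auto()   # transient: copy from lower to upper tier in flight


# Each state has exactly one successor, so a swap can never be
# restarted or reversed while its copy is still in flight.
NEXT = {
    State.IN_UPPER: State.SWAPPING_OUT,
    State.SWAPPING_OUT: State.IN_LOWER,
    State.IN_LOWER: State.SWAPPING_IN,
    State.SWAPPING_IN: State.IN_UPPER,
}


class Buffer:
    """Hypothetical swappable buffer; tracks state only, copies nothing."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.IN_LOWER  # assumption: data starts in the large tier

    def _advance(self, expected: State) -> None:
        # Only the transition from `expected` to its successor is legal here.
        if self.state is not expected:
            raise RuntimeError(
                f"{self.name}: expected {expected.name}, currently {self.state.name}")
        self.state = NEXT[expected]

    # Starting and finishing a copy are separate calls, modelling an
    # asynchronous transfer that completes some time later.
    def begin_swap_out(self) -> None:
        self._advance(State.IN_UPPER)

    def finish_swap_out(self) -> None:
        self._advance(State.SWAPPING_OUT)

    def begin_swap_in(self) -> None:
        self._advance(State.IN_LOWER)

    def finish_swap_in(self) -> None:
        self._advance(State.SWAPPING_IN)

    def usable(self) -> bool:
        # Computation may touch the buffer only when it is fully in the upper tier.
        return self.state is State.IN_UPPER


if __name__ == "__main__":
    buf = Buffer("rank0")
    buf.begin_swap_in()
    print(buf.state.name, buf.usable())  # SWAPPING_IN False
    buf.finish_swap_in()
    print(buf.state.name, buf.usable())  # IN_UPPER True
```

Keeping `begin_*` and `finish_*` as separate calls is what makes the transient states observable. While a copy is in flight, the buffer can neither be used for computation nor swapped again. The check in `_advance` raises `RuntimeError` on any attempt to break that rule.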

Author information

Authors and Affiliations

Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo, Japan
Toshio Endo
Seikei University, Tokyo, Japan
Hiroko Midorikawa
Toyohashi University of Technology, Aichi, Japan
Yukinori Sato

Corresponding author

Correspondence to Toshio Endo.

Editor information

Editors and Affiliations

RIKEN Center for Computational Science, Kobe, Japan
Mitsuhisa Sato

About this chapter

Cite this chapter

Endo, T., Midorikawa, H., Sato, Y. (2019). Software Technology That Deals with Deeper Memory Hierarchy in Post-petascale Era. In: Sato, M. (ed.) Advanced Software Technologies for Post-Peta Scale Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-1924-2_12

DOI: https://doi.org/10.1007/978-981-13-1924-2_12
Published: 07 December 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1923-5
Online ISBN: 978-981-13-1924-2
eBook Packages: Computer Science, Computer Science (R0)