Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning

  • Conference paper
Parallel and Distributed Computing, Applications and Technologies (PDCAT 2022)

Abstract

Deep learning has become a vital technology in daily life. Both training datasets and neural networks are growing in size to tackle more challenging problems with deep learning. Distributed deep neural network (DDNN) training is therefore necessary to train models on such large datasets and networks, and HPC clusters are excellent computing environments for large-scale DDNN training. I/O performance is critical for large-scale DDNN training on HPC clusters because it is becoming a bottleneck. Most flagship-class HPC clusters have hierarchical storage systems, and designing future HPC storage systems requires quantifying how much the hierarchical storage system improves the performance of these workloads. This study presents a quantitative performance analysis of the hierarchical storage system for a DDNN workload on a flagship-class supercomputer. Our analysis shows how much performance improvement and storage-volume increase will be required to achieve a given performance goal.
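Although the abstract stays at a high level, the bottleneck argument can be illustrated with a back-of-envelope calculation. The Python sketch below uses entirely hypothetical node counts, batch sizes, sample sizes, and step times that do not come from the paper; it only illustrates why the aggregate read demand of a distributed training job grows with scale and why a faster storage tier may be needed, not the authors' analysis method.

```python
# Illustrative (hypothetical) estimate of the aggregate read bandwidth that a
# distributed training job demands from shared storage. None of these numbers
# are measurements from the paper; they only show how I/O demand scales.

num_nodes = 1024                 # compute nodes participating in training (assumed)
local_batch_size = 64            # samples processed per node per step (assumed)
sample_size_bytes = 150 * 1024   # average size of one training sample (assumed)
step_time_sec = 0.25             # time per training step when compute-bound (assumed)

# Each step, every node must read its local batch from storage
# (ignoring any caching or staging on faster tiers).
bytes_per_step = num_nodes * local_batch_size * sample_size_bytes

# Sustained aggregate read bandwidth required so that I/O does not stall compute.
required_bw_gib_s = bytes_per_step / step_time_sec / 2**30

print(f"Aggregate read demand: {required_bw_gib_s:.1f} GiB/s")
# With these assumed values the job needs roughly 37.5 GiB/s of sustained reads,
# which is the kind of demand that motivates staging data onto faster storage tiers.
```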

Acknowledgements

This research used computational resources of the supercomputer Fugaku provided by the RIKEN Center for Computational Science. The authors would like to thank Enago (www.enago.jp) for the English language review.

Author information

Corresponding author

Correspondence to Takaaki Fukai.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fukai, T., Sato, K., Hirofuchi, T. (2023). Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning. In: Takizawa, H., Shen, H., Hanawa, T., Hyuk Park, J., Tian, H., Egawa, R. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2022. Lecture Notes in Computer Science, vol 13798. Springer, Cham. https://doi.org/10.1007/978-3-031-29927-8_7

  • DOI: https://doi.org/10.1007/978-3-031-29927-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29926-1

  • Online ISBN: 978-3-031-29927-8

  • eBook Packages: Computer Science, Computer Science (R0)
