Abstract
Deep learning has become a vital technology in daily life. Both training datasets and neural networks keep growing to tackle more challenging problems, so distributed deep neural network (DDNN) training is necessary to train models on large datasets and networks. HPC clusters are excellent computation environments for large-scale DDNN training, but I/O performance is critical there because it is becoming a bottleneck. Most flagship-class HPC clusters provide hierarchical storage systems, and quantifying how much such a hierarchy improves DDNN workloads is necessary for designing future HPC storage systems. This study presents a quantitative performance analysis of the hierarchical storage system of a flagship-class supercomputer under a DDNN workload. Our analysis shows how much performance improvement and how much additional storage capacity are required to achieve a given performance goal.
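The kind of quantitative estimate the abstract refers to can be illustrated with a simple back-of-the-envelope calculation: the aggregate read bandwidth a distributed training job needs so that I/O does not stall computation, and the capacity a fast (node-local or cache) storage tier needs to stage the dataset. The sketch below is purely illustrative; every parameter (node count, per-node throughput, average sample size) is an assumption for demonstration, not a value measured in the paper.

```python
# Illustrative sketch: estimating I/O requirements for a DDNN training workload.
# All numbers are assumptions chosen for the example, not the paper's results.

def required_read_bandwidth(samples_per_sec_per_node: float, nodes: int,
                            bytes_per_sample: float) -> float:
    """Aggregate read bandwidth (bytes/s) needed to keep every node fed with data."""
    return samples_per_sec_per_node * nodes * bytes_per_sample


def required_cache_capacity(num_samples: int, bytes_per_sample: float,
                            replication: float = 1.0) -> float:
    """Capacity (bytes) the fast storage tier needs to stage the whole dataset."""
    return num_samples * bytes_per_sample * replication


if __name__ == "__main__":
    nodes = 1024                            # assumed number of compute nodes
    samples_per_sec_per_node = 800.0        # assumed training throughput per node
    bytes_per_sample = 110 * 1024           # assumed ~110 KiB average sample size
    num_samples = 1_281_167                 # ImageNet-1k training set size

    bw = required_read_bandwidth(samples_per_sec_per_node, nodes, bytes_per_sample)
    cap = required_cache_capacity(num_samples, bytes_per_sample)

    print(f"Aggregate read bandwidth needed: {bw / 1e9:.1f} GB/s")
    print(f"Fast-tier capacity to stage the dataset: {cap / 1e9:.1f} GB")
```

Comparing such a demand estimate against the measured bandwidth of each storage tier is one way to reason about how much a hierarchical storage system must improve, and how much capacity it must add, to meet a performance goal.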
Acknowledgements
This research used computational resources of the supercomputer Fugaku provided by the RIKEN Center for Computational Science. The authors would like to thank Enago (www.enago.jp) for the English language review.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Fukai, T., Sato, K., Hirofuchi, T. (2023). Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning. In: Takizawa, H., Shen, H., Hanawa, T., Hyuk Park, J., Tian, H., Egawa, R. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2022. Lecture Notes in Computer Science, vol 13798. Springer, Cham. https://doi.org/10.1007/978-3-031-29927-8_7