Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning

  • Conference paper
Parallel and Distributed Computing, Applications and Technologies (PDCAT 2022)

Abstract

Deep learning has become a vital technology in daily life. Both training datasets and neural networks are growing in size to tackle more challenging problems with deep learning. Distributed deep neural network (DDNN) training is therefore necessary to train models on such large datasets and networks, and HPC clusters are excellent computing environments for large-scale DDNN training. I/O performance is critical for large-scale DDNN training on HPC clusters because it is becoming a bottleneck. Most flagship-class HPC clusters have hierarchical storage systems, and designing future HPC storage systems requires quantifying how much the hierarchical storage system improves the performance of these workloads. This study presents a quantitative performance analysis of the hierarchical storage system for a DDNN workload on a flagship-class supercomputer. Our analysis shows how much performance improvement and storage-volume increase will be required to achieve a given performance goal.
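Although the abstract stays at a high level, the bottleneck argument can be illustrated with a back-of-envelope calculation. The Python sketch below uses entirely hypothetical node counts, batch sizes, sample sizes, and step times that do not come from the paper; it only illustrates why the aggregate read demand of a distributed training job grows with scale and why a faster storage tier may be needed, not the authors' analysis method.

```python
# Illustrative (hypothetical) estimate of the aggregate read bandwidth that a
# distributed training job demands from shared storage. None of these numbers
# are measurements from the paper; they only show how I/O demand scales.

num_nodes = 1024                 # compute nodes participating in training (assumed)
local_batch_size = 64            # samples processed per node per step (assumed)
sample_size_bytes = 150 * 1024   # average size of one training sample (assumed)
step_time_sec = 0.25             # time per training step when compute-bound (assumed)

# Each step, every node must read its local batch from storage
# (ignoring any caching or staging on faster tiers).
bytes_per_step = num_nodes * local_batch_size * sample_size_bytes

# Sustained aggregate read bandwidth required so that I/O does not stall compute.
required_bw_gib_s = bytes_per_step / step_time_sec / 2**30

print(f"Aggregate read demand: {required_bw_gib_s:.1f} GiB/s")
# With these assumed values the job needs roughly 37.5 GiB/s of sustained reads,
# which is the kind of demand that motivates staging data onto faster storage tiers.
```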

Acknowledgements

This research used computational resources of the supercomputer Fugaku provided by the RIKEN Center for Computational Science. The authors would like to thank Enago (www.enago.jp) for the English language review.

Author information

Corresponding author

Correspondence to Takaaki Fukai.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fukai, T., Sato, K., Hirofuchi, T. (2023). Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning. In: Takizawa, H., Shen, H., Hanawa, T., Hyuk Park, J., Tian, H., Egawa, R. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2022. Lecture Notes in Computer Science, vol 13798. Springer, Cham. https://doi.org/10.1007/978-3-031-29927-8_7

  • DOI: https://doi.org/10.1007/978-3-031-29927-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29926-1

  • Online ISBN: 978-3-031-29927-8

  • eBook Packages: Computer Science, Computer Science (R0)
