Characterizing I/O Optimization Effect Through Holistic Log Data Analysis of Parallel File Systems and Interconnects

Tsujita, Yuichi; Furutani, Yoshitaka; Hida, Hajime; Yamamoto, Keiji; Uno, Atsuya

doi:10.1007/978-3-030-59851-8_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12321)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

Recent HPC systems use parallel file systems such as GPFS and Lustre to meet the heavy I/O demands of data-intensive applications. Although most HPC systems provide performance tuning tools on compute nodes, there are few opportunities to tune I/O activity on the parallel file systems themselves, including the high-speed interconnects between compute nodes and file systems. We propose an I/O performance optimization framework that holistically analyzes log data from parallel file systems and interconnects to improve the performance of HPC systems, including I/O nodes and parallel file systems. We demonstrate the framework on the K computer using two I/O benchmarks, comparing the original and an enhanced MPI-IO implementation. The analysis reveals that the I/O performance improvements of the enhanced MPI-IO implementation stem from more effective utilization of the parallel file systems and of the interconnects among I/O nodes than in the original implementation.

Acknowledgment

This research used computational resources of the K computer provided by the RIKEN Center for Computational Science.

Author information

Authors and Affiliations

RIKEN Center for Computational Science, Kobe, Japan
Yuichi Tsujita, Keiji Yamamoto & Atsuya Uno
Fujitsu Limited, Tokyo, Japan
Yoshitaka Furutani
Fujitsu Social Science Laboratory Limited, Kawasaki, Japan
Hajime Hida

Corresponding author

Correspondence to Yuichi Tsujita.

Editor information

Editors and Affiliations

University of Tennessee at Knoxville, Knoxville, TN, USA
Heike Jagode
Department of Mathematics, KIT für Technologie Karlsruhe, Karlsruhe, Baden-Württemberg, Germany
Hartwig Anzt
Computational Science, Helmholtz-Zentrum Dresden Rossendorf, Dresden, Sachsen, Germany
Guido Juckeland
Extreme Computing Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Hatem Ltaief

Cite this paper

Tsujita, Y., Furutani, Y., Hida, H., Yamamoto, K., Uno, A. (2020). Characterizing I/O Optimization Effect Through Holistic Log Data Analysis of Parallel File Systems and Interconnects. In: Jagode, H., Anzt, H., Juckeland, G., Ltaief, H. (eds) High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12321. Springer, Cham. https://doi.org/10.1007/978-3-030-59851-8_11

DOI: https://doi.org/10.1007/978-3-030-59851-8_11
Published: 20 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59850-1
Online ISBN: 978-3-030-59851-8
eBook Packages: Computer Science, Computer Science (R0)
