Illuminating the I/O Optimization Path of Scientific Applications

Ather, Hammad; Bez, Jean Luca; Norris, Boyana; Byna, Suren

doi:10.1007/978-3-031-32041-5_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13948)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

The existing parallel I/O stack is complex and difficult to tune due to the interdependencies among multiple factors that impact the performance of data movement between storage and compute systems. When performance is slower than expected, end-users, developers, and system administrators rely on I/O profiling and tracing information to pinpoint the root causes of inefficiencies. Although numerous tools collect I/O metrics on production systems, it is not obvious (unless one is an I/O expert) where the I/O bottlenecks are, what their root causes are, or how to resolve them. Hence, there is a gap between the currently available metrics, the issues they represent, and the application of optimizations that would mitigate performance slowdowns. An I/O specialist often checks for common problems before diving into the specifics of each application and workload. Streamlining such analysis, investigation, and recommendations could close this gap without requiring a specialist to intervene in every case. In this paper, we propose a novel, interactive, user-oriented visualization and analysis framework called Drishti. This framework helps users pinpoint various root causes of I/O performance problems and provides a set of actionable recommendations for improving performance based on the observed characteristics of an application. We evaluate the applicability and correctness of Drishti using four use cases from distinct science domains and demonstrate its value to end-users, developers, and system administrators when seeking to improve an application's I/O performance.

Acknowledgment

This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research was also supported by The Ohio State University under a subcontract (GR130303), which was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (ASCR) under contract number DE-AC02-05CH11231 with LBNL. This research used resources of the National Energy Research Scientific Computing Center under Contract No. DE-AC02-05CH11231.

Author information

Authors and Affiliations

Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Hammad Ather, Jean Luca Bez & Suren Byna
University of Oregon, Eugene, OR, 97403, USA
Hammad Ather & Boyana Norris
The Ohio State University, Columbus, OH, 43210, USA
Suren Byna

Authors

Hammad Ather
Jean Luca Bez
Boyana Norris
Suren Byna

Corresponding author

Correspondence to Jean Luca Bez.

Editor information

Editors and Affiliations

University of Maryland, College Park, MD, USA
Abhinav Bhatele
NVIDIA, Helsinki, Finland
Jeff Hammond
Université Paris-Saclay, Gif-sur-Yvette, France
Marc Baboulin
CERFACS, Toulouse, France
Carola Kruse

About this paper

Cite this paper

Ather, H., Bez, J.L., Norris, B., Byna, S. (2023). Illuminating the I/O Optimization Path of Scientific Applications. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_2

DOI: https://doi.org/10.1007/978-3-031-32041-5_2
Published: 10 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32040-8
Online ISBN: 978-3-031-32041-5
eBook Packages: Computer Science, Computer Science (R0)
