Abstract
The existing parallel I/O stack is complex and difficult to tune due to the interdependencies among the multiple factors that affect the performance of data movement between storage and compute systems. When performance is slower than expected, end-users, developers, and system administrators rely on I/O profiling and tracing information to pinpoint the root causes of inefficiencies. Despite the numerous tools that collect I/O metrics on production systems, it is not obvious (unless one is an I/O expert) where the I/O bottlenecks are, what their root causes are, and what to do to solve them. Hence, there is a gap between the currently available metrics, the issues they represent, and the application of optimizations that would mitigate performance slowdowns. An I/O specialist often checks for common problems before diving into the specifics of each application and workload. Streamlining such analysis, investigation, and recommendations could close this gap without requiring a specialist to intervene in every case. In this paper, we propose a novel interactive, user-oriented visualization and analysis framework called Drishti. This framework helps users pinpoint various root causes of I/O performance problems and provides a set of actionable recommendations for improving performance based on the observed characteristics of an application. We evaluate the applicability and correctness of Drishti using four use cases from distinct science domains and demonstrate its value to end-users, developers, and system administrators when seeking to improve an application’s I/O performance.
Acknowledgment
This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research was also supported by The Ohio State University under a subcontract (GR130303), which was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (ASCR) under contract number DE-AC02-05CH11231 with LBNL. This research used resources of the National Energy Research Scientific Computing Center under Contract No. DE-AC02-05CH11231.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ather, H., Bez, J.L., Norris, B., Byna, S. (2023). Illuminating the I/O Optimization Path of Scientific Applications. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_2
DOI: https://doi.org/10.1007/978-3-031-32041-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32040-8
Online ISBN: 978-3-031-32041-5
eBook Packages: Computer Science (R0)