The SIOX Architecture – Coupling Automatic Monitoring and Optimization of Parallel I/O

  • Julian M. Kunkel
  • Michaela Zimmer
  • Nathanael Hübbe
  • Alvaro Aguilera
  • Holger Mickler
  • Xuan Wang
  • Andriy Chut
  • Thomas Bönisch
  • Jakob Lüttgau
  • Roman Michel
  • Johann Weging
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8488)

Abstract

Performance analysis and optimization of high-performance I/O systems is a daunting task. Mainly, this is due to the overwhelmingly complex interplay of the involved hardware and software layers. The Scalable I/O for Extreme Performance (SIOX) project provides a versatile environment for monitoring I/O activities and learning from this information. The goal of SIOX is to automatically suggest and apply performance optimizations, and to assist in locating and diagnosing performance problems.

In this paper, we present the current status of SIOX. Our modular architecture covers instrumentation of POSIX, MPI and other high-level I/O libraries; the monitoring data is recorded asynchronously into a global database, and recorded traces can be visualized. Furthermore, we offer a set of primitive plug-ins with additional features to demonstrate the flexibility of our architecture: a surveyor plug-in that keeps track of the observed spatial access patterns; an fadvise plug-in that injects hints to achieve read-ahead for strided access patterns; and an optimizer plug-in that monitors the performance achieved with different MPI-IO hints and automatically supplies the best known hint-set when no hints are explicitly set. The presentation of the technical status is accompanied by a demonstration of some of these features on our 20-node cluster. In additional experiments, we analyze the overhead for concurrent access, for MPI-IO’s four levels of access, and for an instrumented climate application.
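The idea behind the surveyor and fadvise plug-ins can be illustrated with a minimal sketch. It is not the SIOX implementation: the function names `detect_stride`, `strided_read`, and `demo`, the three-access history window, and the block sizes are assumptions chosen for illustration. The sketch watches the offsets of recent reads and, once they are a constant distance apart, uses the standard `posix_fadvise` call with `POSIX_FADV_WILLNEED` to hint the block that the next read is expected to touch.

```python
import os
import tempfile


def detect_stride(offsets):
    """Return the constant positive distance between offsets, or None."""
    if len(offsets) < 3:
        return None
    stride = offsets[1] - offsets[0]
    if stride <= 0:
        return None
    for prev, cur in zip(offsets[1:], offsets[2:]):
        if cur - prev != stride:
            return None
    return stride


def strided_read(fd, start, stride, size, count, history=3):
    """Read `count` blocks of `size` bytes that lie `stride` bytes apart.

    After each read, the last `history` offsets are checked for a constant
    stride. If one is found, the kernel is advised that the next block in
    the pattern will be needed soon. Returns the blocks and the number of
    hints that were issued successfully.
    """
    offsets, blocks, hints = [], [], 0
    for i in range(count):
        offset = start + i * stride
        blocks.append(os.pread(fd, size, offset))
        offsets.append(offset)
        detected = detect_stride(offsets[-history:])
        # posix_fadvise is not available on every platform; skip it if absent.
        if detected is not None and hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(fd, offset + detected, size,
                                 os.POSIX_FADV_WILLNEED)
                hints += 1
            except OSError:
                pass  # the hint is advisory; a failure must not break the read
    return blocks, hints


def demo():
    """Write a scratch file and read every fourth 4 KiB block from it."""
    block, stride, count = 4096, 16384, 8
    payload = os.urandom(stride * count)
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        blocks, hints = strided_read(f.fileno(), 0, stride, block, count)
    expected = [payload[i * stride:i * stride + block] for i in range(count)]
    return blocks, expected, hints
```

Because the hint is purely advisory, the data returned is the same whether or not the kernel acts on it; only the read latency may change.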

While our prototype is not yet full-featured, it demonstrates the potential and feasibility of our approach.
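The behaviour described for the optimizer plug-in, supplying the best known hint-set when the application sets none, can be sketched as follows. This is an illustration under stated assumptions, not the SIOX code: the class `HintOptimizer`, its methods, and the choice of mean throughput as the ranking criterion are hypothetical, and the abstract does not say how "best known" is determined.

```python
from collections import defaultdict


class HintOptimizer:
    """Remember the throughput seen per hint-set and suggest the best one."""

    def __init__(self):
        # Maps a canonical hint-set key to the list of observed throughputs.
        self._samples = defaultdict(list)

    @staticmethod
    def _key(hints):
        # Dicts are not hashable, so use a sorted tuple of (name, value) pairs.
        return tuple(sorted(hints.items()))

    def record(self, hints, throughput_mib_s):
        """Store one throughput observation for the given hint-set."""
        self._samples[self._key(hints)].append(throughput_mib_s)

    def best_known(self):
        """Return the hint-set with the highest mean throughput (assumed policy)."""
        if not self._samples:
            return {}

        def mean(key):
            values = self._samples[key]
            return sum(values) / len(values)

        return dict(max(self._samples, key=mean))

    def choose(self, user_hints=None):
        """Respect explicitly set hints; otherwise fall back to the best known."""
        if user_hints:
            return dict(user_hints)
        return self.best_known()
```

In this sketch a caller would record one observation after each file is closed and call `choose()` before the next open. The hint names used in any such call, for example `cb_buffer_size` or `striping_factor`, are passed through unchanged, so the class itself makes no assumption about which hints an MPI-IO implementation supports.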

Keywords

Parallel I/O, Machine Learning, Performance Optimization

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Julian M. Kunkel (1)
  • Michaela Zimmer (1)
  • Nathanael Hübbe (1)
  • Alvaro Aguilera (2)
  • Holger Mickler (2)
  • Xuan Wang (3)
  • Andriy Chut (3)
  • Thomas Bönisch (3)
  • Jakob Lüttgau (1)
  • Roman Michel (1)
  • Johann Weging (1)

  1. University of Hamburg, Germany
  2. ZIH Dresden, Germany
  3. HLRS Stuttgart, Germany
