Bottleneck Detection in Parallel File Systems with Trace-Based Performance Monitoring

  • Julian M. Kunkel
  • Thomas Ludwig
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5168)


Today we recognize a high demand for powerful storage. In industry this issue is tackled either with large storage area networks or by deploying parallel file systems on top of RAID systems or smaller storage networks. The larger the system gets, the more important it becomes to analyze its performance and to identify bottlenecks in both the architecture and the applications.

We extended the performance monitor available in the parallel file system PVFS2 to include statistics of the server process and information about the system. Performance monitor data is available at runtime, and the server process was modified to store this data in off-line traces suitable for post-mortem analysis. These values can be used to detect bottlenecks in the system. Measured results demonstrate how they help to identify bottlenecks and may assist in ranking the servers according to their capabilities.
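The kind of system information such a monitor samples (CPU utilization, idle and I/O-wait time) can be illustrated with a minimal standalone sketch. The following is not the paper's PVFS2-internal implementation; it merely assumes a Linux `/proc/stat` interface and computes the fraction of time the CPU was idle between two samples, the sort of raw value a bottleneck analysis would inspect:

```python
import time

def cpu_idle_fraction(interval=0.5):
    """Sample the aggregate CPU counters in /proc/stat twice and
    return the fraction of time spent idle (including I/O wait)
    during the interval. Illustrative sketch only."""
    def read_counters():
        with open("/proc/stat") as f:
            # First line is the aggregate "cpu" row: user nice system idle iowait ...
            values = [int(v) for v in f.readline().split()[1:]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        return idle, sum(values)

    idle1, total1 = read_counters()
    time.sleep(interval)
    idle2, total2 = read_counters()
    delta = total2 - total1
    return (idle2 - idle1) / delta if delta else 0.0
```

A server whose idle fraction stays near zero while its peers are mostly idle is a candidate bottleneck; correlating such samples with off-line traces is what enables the post-mortem analysis described above.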







Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Julian M. Kunkel¹
  • Thomas Ludwig¹
  1. Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany
