Abstract
This paper presents a systemwide monitoring and analysis tool for high performance computers, with several features aimed at minimizing the transport of performance data along a network of agents. The aim of the tool is to perform a preliminary detection of performance bottlenecks in user applications running on HPC systems, with a negligible impact on production runs. Continuous systemwide monitoring can produce large volumes of data if the data must be stored permanently so that it remains available for queries. At the system monitoring level, the monitoring data must be stored synchronously. We retain the descriptive qualities of the data by using quantiles, an aggregation with respect to the number of cores used by the application at every measuring interval. Optimizing the transport route for the performance data enables us to calculate quantiles exactly rather than estimate them.
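The quantile aggregation described above can be illustrated with a minimal sketch. It assumes that all per-core values of one application for one measuring interval are available in a single place, which is what makes an exact calculation possible. The function name `exact_quantiles`, the nearest-rank definition, and the choice of `k` are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def exact_quantiles(values: Sequence[float], k: int = 10) -> List[float]:
    """Return k+1 exact cut points (minimum, k-quantiles, maximum).

    Hypothetical helper: uses the nearest-rank definition on the fully
    sorted per-core values, so no estimation is involved.
    """
    if not values:
        raise ValueError("need at least one per-core measurement")
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[0]]  # 0th quantile is the minimum
    for i in range(1, k + 1):
        rank = (i * n + k - 1) // k  # ceil(i * n / k), 1-based rank
        cuts.append(ordered[rank - 1])
    return cuts


# One measuring interval: a made-up metric value for each of 8 cores.
per_core = [0.9, 1.1, 1.0, 3.5, 1.2, 0.8, 1.0, 1.1]
quartiles = exact_quantiles(per_core, k=4)
print(quartiles)  # [0.8, 0.9, 1.0, 1.1, 3.5]
```

Whatever the number of cores, each interval is reduced to `k + 1` stored values, while an outlier such as the core reporting 3.5 stays visible in the upper quantile.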
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Guillen, C., Hesse, W., Brehm, M. (2014). The PerSyst Monitoring Tool. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8806. Springer, Cham. https://doi.org/10.1007/978-3-319-14313-2_31
DOI: https://doi.org/10.1007/978-3-319-14313-2_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14312-5
Online ISBN: 978-3-319-14313-2
eBook Packages: Computer Science (R0)