Abstract
Tools are essential for application developers and system support personnel during tasks such as performance optimization and debugging of massively parallel applications. An important class is event-based tools, which analyze relevant events during the runtime of an application, e.g., function invocations or communication operations. We develop a parallel tools infrastructure that supports both the observation and the analysis of application events at runtime. Some analyses, e.g., deadlock detection algorithms, require complex processing and apply to many types of frequently occurring events. When an application generates new events faster than the analysis can process them, the tool becomes unstable or even fails, e.g., due to memory exhaustion. Tool infrastructures must provide means to avoid or mitigate such situations. This paper explores two such techniques: first, a heuristic that selects which events to receive and process next; second, a pause mechanism that temporarily suspends the execution of an application. A study with applications from the SPEC MPI2007 benchmark suite and the NAS parallel benchmarks evaluates these techniques at up to \(16{,}384\) processes and illustrates how they avoid the memory exhaustion problems that limited the applicability of a runtime correctness tool in the past.
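The interplay of the two techniques can be illustrated with a toy model. The following is a minimal sketch under stated assumptions, not the infrastructure's implementation: the class `EventBuffer`, the high and low watermarks, and the "longest backlog first" selection policy are all hypothetical, since the abstract does not specify the actual heuristic or pause conditions.

```python
from collections import deque


class EventBuffer:
    """Toy model of a tool-side event buffer with a pause mechanism.

    All names and thresholds are hypothetical; the abstract does not
    specify them.
    """

    def __init__(self, high_watermark, low_watermark):
        if not 0 <= low_watermark < high_watermark:
            raise ValueError("need 0 <= low_watermark < high_watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.channels = {}   # source id -> deque of pending events
        self.paused = False  # True while the application is asked to wait

    def pending(self):
        """Total number of buffered, unprocessed events."""
        return sum(len(q) for q in self.channels.values())

    def receive(self, source, event):
        """Queue an event; request a pause once the buffer is too full."""
        self.channels.setdefault(source, deque()).append(event)
        if self.pending() >= self.high:
            self.paused = True

    def select_next(self):
        """Assumed heuristic: serve the channel with the longest backlog."""
        candidates = [s for s, q in self.channels.items() if q]
        if not candidates:
            return None
        return max(candidates, key=lambda s: len(self.channels[s]))

    def process_one(self):
        """Process one event; resume the application once the backlog drains."""
        source = self.select_next()
        if source is None:
            return None
        event = self.channels[source].popleft()
        if self.paused and self.pending() <= self.low:
            self.paused = False
        return source, event
```

In this sketch, separate high and low watermarks keep the pause flag from toggling on every event: the application is paused when the backlog reaches the high mark and resumed only after the analysis has drained it to the low mark, which bounds the memory held by buffered events.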
The rights of this work are transferred to the extent transferable according to title 17 §105 U.S.C.
Notes
- 1. Uses numbers of processes that are a multiple of three.
- 2. The lref data set operates with up to \(2{,}048\) processes (http://www.spec.org/mpi/docs/faq.html#DataSetL).
Acknowledgments
We thank the ASC Tri-Labs and the Los Alamos National Laboratory for their friendly support. Part of this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-652119). This work has been supported by the CRESTA project, which has received funding from the European Community’s Seventh Framework Programme (ICT-2011.9.13) under Grant Agreement no. 287703.
Copyright information
© 2015 Springer International Publishing Switzerland (outside the US)
About this paper
Cite this paper
Hilbrich, T. et al. (2015). Memory Usage Optimizations for Online Event Analysis. In: Markidis, S., Laure, E. (eds) Solving Software Challenges for Exascale. EASC 2014. Lecture Notes in Computer Science, vol 8759. Springer, Cham. https://doi.org/10.1007/978-3-319-15976-8_8
DOI: https://doi.org/10.1007/978-3-319-15976-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15975-1
Online ISBN: 978-3-319-15976-8
eBook Packages: Computer Science, Computer Science (R0)