Skip to main content

Memory Usage Optimizations for Online Event Analysis

  • Conference paper
  • First Online:
Solving Software Challenges for Exascale (EASC 2014)

Abstract

Tools are essential for application developers and system support personnel during tasks such as performance optimization and debugging of massively parallel applications. An important class are event-based tools that analyze relevant events during the runtime of an application, e.g., function invocations or communication operations. We develop a parallel tools infrastructure that supports both the observation and analysis of application events at runtime. Some analyses—e.g., deadlock detection algorithms—require complex processing and apply to many types of frequently occurring events. For situations where the rate at which an application generates new events exceeds the processing rate of the analysis, we experience tool instability or even failures, e.g., memory exhaustion. Tool infrastructures must provide means to avoid or mitigate such situations. This paper explores two such techniques: first, a heuristic that selects events to receive and process next; second, a pause mechanism that temporarily suspends the execution of an application. An application study with applications from the SPEC MPI2007 benchmark suite and the NAS parallel benchmarks evaluates these techniques at up to \(16{,}384\) processes and illustrates how they avoid memory exhaustion problems that limited the applicability of a runtime correctness tool in the past.

The rights of this work are transferred to the extent transferable according to title 17 §105 U.S.C.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 34.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 44.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Uses numbers of processes that are a multiple of three.

  2. 2.

    The lref data set operates with up to \(2{,}048\) processes (http://www.spec.org/mpi/docs/faq.html#DataSetL).

References

  1. Arnold, D.C., Ahn, D.H., de Supinski, B.R., Lee, G.L., Miller, B.P., Schulz, M.: Stack trace analysis for large scale debugging. In: International Parallel and Distributed Processing Symposium (2007)

    Google Scholar 

  2. Bailey, D.H., Dagum, L., Barszcz, E., Simon, H.D.: NAS parallel benchmark results. Technical report, IEEE Parallel and Distributed Technology (1992)

    Google Scholar 

  3. Besnard, J.-B., Pérache, M., Jalby, W.: Event streaming for online performance measurements reduction. In: 42nd International Conference on Parallel Processing, ICPP 2013, pp. 985–994 (2013)

    Google Scholar 

  4. Buntinas, D., Bosilca, G., Graham, R.L., Vallée, G., Watson, G.R.: A scalable tools communications infrastructure. In: Proceedings of the 2008 22nd International Symposium on High Performance Computing Systems and Applications, HPCS 2008, pp. 33–39. IEEE Computer Society, Washington (2008)

    Google Scholar 

  5. Geimer, M., Wolf, F., Wylie, B.J.N., Ábrahám, E., Becker, D., Mohr, B.: The Scalasca performance toolset architecture. Concurrency Comput. Pract. Exp. 22(6), 702–719 (2010)

    Google Scholar 

  6. Gerndt, M., Fürlinger, K., Kereku, E.: Periscope: advanced techniques for performance analysis. In: Parallel Computing: Current and Future Issues of High-End Computing, Proceedings of the International Conference ParCo 2005, John von Neumann Institute for Computing Series, vol. 33. Central Institute for Applied Mathematics, Jülich (2005)

    Google Scholar 

  7. Hilbrich, T., de Supinski, B.R., Nagel, W.E., Protze, J., Baier, C., Müller, M.S.: Distributed wait state tracking for runtime MPI deadlock detection. In: Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013, pp. 16:1–16:12. ACM, New York (2013)

    Google Scholar 

  8. Hilbrich, T., Müller, M.S., de Supinski, B.R., Schulz, M., Nagel, W.E.: GTI: a generic tools infrastructure for event-based tools in parallel systems. In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012, pp. 1364–1375. IEEE Computer Society, Washington (2012)

    Google Scholar 

  9. Hilbrich, T., Müller, M.S., Schulz, M., de Supinski, B.R.: Order preserving event aggregation in TBONs. In: Cotronis, Y., Danalis, A., Nikolopoulos, D.S., Dongarra, J. (eds.) EuroMPI 2011. LNCS, vol. 6960, pp. 19–28. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  10. Hilbrich, T., Protze, J., de Supinski, B.R., Schulz, M., Müller, M.S., Nagel, W.E.: Intralayer communication for tree-based overlay networks. In: 42nd International Conference on Parallel Processing (ICPP), Fourth International Workshop on Parallel Software Tools and Tool Infrastructures, pp. 995–1003. IEEE Computer Society Press, Los Alamitos (2013)

    Google Scholar 

  11. Ilsche, T., Schuchart, J., Cope, J., Kimpe, D., Jones, T., Knüpfer, A., Iskra, K., Ross, R., Nagel, W.E., Poole, S.: Enabling event tracing at leadership-class scale through I/O forwarding middleware. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2012, pp. 49–60. ACM, New York (2012)

    Google Scholar 

  12. Jun, T.H., Watson, G.R.: Scalable Communication Infrastructure (2013). http://wiki.eclipse.org/PTP/designs/SCI Accessed 30 April 2013

  13. Krell Institute. The Component Based Tool Infrastructure (2014). http://sourceforge.net/projects/cbtf/ Accessed 19 January 2014

  14. Lee, G.L., Ahn, D.H., Arnold, D.C., de Supinski, B.R., Legendre, M., Miller, B.P., Schulz, M., Liblit, B.: Lessons learned at 208K: towards debugging millions of cores. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC 2008, pp. 26:1–26:9. IEEE Press, Piscataway (2008)

    Google Scholar 

  15. Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, Version 3.0 (2012). http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf Accessed 27 November 2013

  16. Müller, M.S., van Waveren, M., Lieberman, R., Whitney, B., Saito, H., Kumaran, K., Baron, J., Brantley, W.C., Parrott, C., Elken, T., Feng, H., Ponder, C.: SPEC MPI2007 - an application benchmark suite for parallel systems using MPI. Concurrency Comput. Pract. Exp. 22(2), 191–205 (2010)

    Google Scholar 

  17. Nagel, W.E., Arnold, A., Weber, M., Hoppe, H.C., Solchenbach, K.: VAMPIR: visualization and analysis of MPI resources. Supercomputer 12(1), 69–80 (1996)

    Google Scholar 

  18. Nataraj, A., Malony, A.D., Morris, A., Arnold, D.C., Miller, B.P.: A framework for scalable, parallel performance monitoring. Concurrency Comput. Pract. Exp. 22(6), 720–735 (2010)

    Google Scholar 

  19. Noeth, M., Mueller, F., Schulz, M., de Supinski, B.R.: Scalable compression and replay of communication traces in massively parallel environments. In: IEEE International Parallel and Distributed Processing Symposium, IPDPS 2007, pp. 69–70 (2007)

    Google Scholar 

  20. Roth, P.C., Arnold, D.C., Miller, B.P.: MRNet: a software-based multicast/reduction network for scalable tools. In: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003. ACM, New York (2003)

    Google Scholar 

  21. Wagner, M., Knüpfer, A., Nagel, W.E.: Hierarchical memory buffering techniques for an in-memory event tracing extension to the open trace format 2. In: 42nd International Conference on Parallel Processing, ICPP 2013, pp. 970–976 (2013)

    Google Scholar 

  22. Wylie, B.J.N., Geimer, M., Mohr, B., Böhme, D., Szebenyi, Z., Wolf, F.: Large-scale performance analysis of Sweep3D with the Scalasca toolset. Parallel Process. Lett. 20(04), 397–414 (2010)

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgments

We thank the ASC Tri-Labs and the Los Alamos National Laboratory for their friendly support. Part of this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. (LLNL-CONF-652119). This work has been supported by the CRESTA project that has received funding from the European Community’s Seventh Framework Programme (ICT-2011.9.13) under Grant Agreement no. 287703.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Martin Schulz .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland 2015 (outside the US)

About this paper

Cite this paper

Hilbrich, T. et al. (2015). Memory Usage Optimizations for Online Event Analysis. In: Markidis, S., Laure, E. (eds) Solving Software Challenges for Exascale. EASC 2014. Lecture Notes in Computer Science(), vol 8759. Springer, Cham. https://doi.org/10.1007/978-3-319-15976-8_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-15976-8_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15975-1

  • Online ISBN: 978-3-319-15976-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics