Ant: A Debugging Framework for MPI Parallel Programs

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7760)

Abstract

This paper describes Ant, a debugging framework targeting MPI parallel programs. The Ant framework statically analyzes programs, marking each code region as executed by all processes or by only some of them. The analyzed program is then instrumented with calls to an invariant violation monitoring and detection library. The analysis allows regions to be instrumented based on whether all, or fewer than all, processes execute the region. Ant’s instrumentation strategy allows sampled monitoring across processes in regions executed by all processes. We present a case study using Ant with C-DIDUCE (a variant of DIDUCE for C) to find violations of value invariants in parallel C/MPI programs. Ant’s instrumentation strategy reduces the overhead of monitoring by a factor of more than 14, with less impact on accuracy than a scheme that simply distributes monitoring over all processes executing the program.
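
To make the idea of sampled monitoring concrete, the following is a minimal sketch and not Ant's or C-DIDUCE's actual implementation. It assumes a DIDUCE-style bit-mask value invariant and a round-robin sampling policy in which, for a region executed by all processes, rank r checks only every nprocs-th execution of a program point. It also assumes that regions executed by only some processes are checked on every execution. The names BitInvariant, should_monitor, and Monitor are hypothetical, and ranks are simulated as plain integers rather than obtained from MPI.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BitInvariant:
    """DIDUCE-style value invariant: tracks which bits have never changed."""
    first: Optional[int] = None   # first value observed at this program point
    stable_mask: int = ~0         # bits still believed to be invariant

    def observe(self, value: int) -> bool:
        """Record a value; return True if it violates the current invariant."""
        if self.first is None:
            self.first = value
            return False
        changed = (value ^ self.first) & self.stable_mask
        self.stable_mask &= ~changed  # relax the invariant for changed bits
        return changed != 0


def should_monitor(rank: int, nprocs: int, execution_index: int,
                   all_processes_region: bool) -> bool:
    """Assumed policy: sample round-robin only where every process runs."""
    if not all_processes_region:
        return True  # assumption: partial regions are always checked
    return execution_index % nprocs == rank


class Monitor:
    """Per-process monitor; one instance stands in for one MPI rank."""

    def __init__(self, rank: int, nprocs: int) -> None:
        self.rank = rank
        self.nprocs = nprocs
        self.invariants: Dict[str, BitInvariant] = {}
        self.counts: Dict[str, int] = {}
        self.checks = 0  # number of invariant checks actually performed

    def check(self, site: str, value: int, all_processes_region: bool) -> bool:
        """Called at an instrumented program point; True means a violation."""
        k = self.counts.get(site, 0)
        self.counts[site] = k + 1
        if not should_monitor(self.rank, self.nprocs, k, all_processes_region):
            return False  # skipped: another rank covers this execution
        self.checks += 1
        invariant = self.invariants.setdefault(site, BitInvariant())
        return invariant.observe(value)
```

Under this assumed policy, a rank in a 4-process run that executes an all-process program point 8 times performs only 2 invariant checks, which is where the reduction in monitoring overhead would come from. The trade-off is that each rank sees only a fraction of the values at that point.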

This material is based upon work supported by the National Science Foundation under Grant No. CCF-0916901.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, JW., Bachega, L.R., Midkiff, S.P., Hu, Y.C. (2013). Ant: A Debugging Framework for MPI Parallel Programs. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_15

  • DOI: https://doi.org/10.1007/978-3-642-37658-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37657-3

  • Online ISBN: 978-3-642-37658-0

  • eBook Packages: Computer Science, Computer Science (R0)
