Abstract
This paper describes Ant, a debugging framework targeting MPI parallel programs. The Ant framework statically analyzes programs, marking code regions as being executed by all processes or executed by only some of the processes. The analyzed program is then instrumented with calls to an invariant violation monitoring and detection library. The analysis allows regions to be instrumented based on whether all, or less than all, processes execute the region. Ant’s instrumentation strategy allows sampled monitoring across processes in regions executed by all processes. We present a case study using Ant with C-DIDUCE (a variant of DIDUCE for C) to find violations of value invariants in parallel C/MPI programs. Ant’s instrumentation strategy reduces the overhead of monitoring by over 14 times with less impact on accuracy than a scheme that simply distributes monitoring over all processes executing the program.
This material is based upon work supported by the National Science Foundation under Grant No. CCF-0916901.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Software errors cost U.S. economy $59.5 billion annually, NIST News Release 2002-10 (2002)
Hangal, S., Lam, M.S.: Tracking down software bugs using automatic anomaly detection. In: Proceedings of the 24th International Conference on Software Engineering, pp. 291–301 (2002)
Fei, L., Midkiff, S.P.: Artemis: practical runtime monitoring of applications for execution anomalies. In: PLDI 2006: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 84–95. ACM Press, New York (2006)
Zhou, P., Liu, W., Fei, L., Lu, S., Qin, F., Zhou, Y., Midkiff, S., Torrellas, J.: AccMon: Automatically detecting memory-related bugs via program counter-based invariants. In: Proceedings of the 37th Annual IEEE/ACM International Symposium on Micro-architecture, MICRO 2004 (2004)
Liblit, B., Naik, M., Zheng, A.X., Aiken, A., Jordan, M.I.: Scalable statistical bug isolation. In: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation (2005)
Liblit, B., Aiken, A., Zheng, A.X., Jordan, M.I.: Bug isolation via remote program sampling. In: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pp. 141–154 (2003)
Liu, C., Yan, X., Fei, L., Han, J., Midkiff, S.P.: Sober: statistical model-based bug localization. In: ESEC/FSE-13: Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ACM Press (2005)
Ernst, M.D., Czeisler, A., Griswold, W.G., Notkin, D.: Quickly detecting relevant program invariants. In: Proceedings of the 22nd International Conference on Software Engineering, pp. 449–458 (2000)
The Cetus Project, http://cetus.ecn.purdue.edu
NAS Parallel Benchmarks, http://www.nas.nasa.gov/publications/npb.html
The ASCI Purple Benchmark, https://asc.llnl.gov/computing_resources/purple/archive/benchmarks/
SPEC MPI2007, http://www.spec.org/mpi2007/
Hutchins, M., Foster, H., Goradia, T., Ostrand, T.: Experiments of the effectiveness of dataflow- and controlflow-based test adequacy criteria. In: Proceedings of the 16th International Conference on Software Engineering, ICSE 1994, pp. 191–200. IEEE Computer Society Press, Los Alamitos (1994)
Alexander, V.: Mirgorodskiy, Naoya Maruyama, and Barton P. Miller. Problem diagnosis in large-scale computing environments. In: SC 2006: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, p. 88. ACM (2006)
Gao, Q., Qin, F., Panda, D.K.: DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements. In: SC 2007: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing. ACM (2007)
Lumetta, S.S., Culler, D.E.: The Mantis parallel debugger. In: SPDT 1996: Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, pp. 118–126. ACM Press, New York (1996)
Sistare, S., Dorenkamp, E., Nevin, N., Loh, E.: MPI support in the Prism programming environment. In: Supercomputing 1999: Proceedings of the 1999 ACM/IEEE Conference on Supercomputing (CDROM), p. 22. ACM Press (1999)
Stringhini, D., Navaux, P., de Kergommeaux, J.C.: A selection mechanism to group processes in a parallel debugger. In: In Proceedings 2000 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2000) (June 2000)
Cheng, D., Hood, R.: A portable debugger for parallel and distributed programs. In: Proceedings of Supercomputing 1994, pp. 723–732 (November 1994)
Wismuller, R., Oberhubera, M., Krammera, J., Hansenb, O.: Interactive debugging and performance analysis of massively parallel applications. Parallel Computing 22(3), 415–442 (1996)
Arnold, D.C., Ahn, D.H., de Supinski, B.R., Lee, G.L., Miller, B.P., Schulz, M.: Stack trace analysis for large scale debugging. In: International Parallel and Distributed Processing Symposium, p. 64 (2007)
Lee, G.L., Ahn, D.H., Arnold, D.C., de Supinski, B.R., Legendre, M., Miller, B.P., Schulz, M., Liblit, B.: Lessons learned at 208k: towards debugging millions of cores. In: SC 2008: Proceedings of the, ACM/IEEE Conference on Supercomputing, pp. 1–9. IEEE Press, Piscataway (2008)
Strom, R.E., Bacon, D.F., Goldberg, A.P., Lowry, A., Yellin, D.M., Yemini, S.A.: Hermes: a Language for Distributed Computing. Prentice-Hall, Inc., Upper Saddle River (1991)
Kamil, A., Yelick, K.: Concurrency Analysis for Parallel Programs with Textually Aligned Barriers. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds.) LCPC 2005. LNCS, vol. 4339, pp. 185–199. Springer, Heidelberg (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, JW., Bachega, L.R., Midkiff, S.P., Hu, Y.C. (2013). Ant: A Debugging Framework for MPI Parallel Programs. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_15
Download citation
DOI: https://doi.org/10.1007/978-3-642-37658-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37657-3
Online ISBN: 978-3-642-37658-0
eBook Packages: Computer ScienceComputer Science (R0)