Debugging Distributed Shared Memory Applications

  • Jeffrey Olivier
  • Chih-Ping Chen
  • Jay Hoeflinger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4330)

Abstract

A debugger is a crucial part of any programming system, especially one that supports a parallel programming paradigm such as OpenMP. A parallel, relaxed-consistency, distributed shared memory (DSM) system presents unique challenges to a debugger for several reasons: 1) the local copies of a given variable are not always consistent across distributed machines, so a debugger that reads the variable directly from local memory may see a stale value; 2) if the DSM and the debugger both modify page protections, they will likely interfere with each other; and 3) since a large number of SIGSEGVs occur as part of the normal operation of a DSM program, a SIGSEGV caused by a genuine program error may be missed. In this paper, we discuss these problems and propose solutions.
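
To make the first and third problems concrete, the following is a minimal toy model and not the mechanism described in the paper. It assumes a single "home" copy per variable that is updated only at synchronization points, and it treats any fault on a DSM-managed address as protocol traffic. The class ToyDSM and all of its method names are hypothetical and exist only for this illustration.

```python
class ToyDSM:
    """Toy relaxed-consistency DSM. Every name here is hypothetical."""

    def __init__(self, num_nodes, managed_addresses):
        # The "home" copy holds the most recently released value per address.
        self.home = {addr: 0 for addr in managed_addresses}
        # Each node keeps its own local copy, which may be stale.
        self.local = [dict(self.home) for _ in range(num_nodes)]
        # Addresses a node has written but not yet released.
        self.dirty = [set() for _ in range(num_nodes)]

    def write(self, node, addr, value):
        # A write changes only the writer's local copy until it is released.
        self.local[node][addr] = value
        self.dirty[node].add(addr)

    def release(self, node):
        # At a synchronization point the writer publishes its dirty values.
        for addr in self.dirty[node]:
            self.home[addr] = self.local[node][addr]
        self.dirty[node].clear()

    def acquire(self, node):
        # At a synchronization point a node refreshes its clean addresses.
        for addr, value in self.home.items():
            if addr not in self.dirty[node]:
                self.local[node][addr] = value

    def debugger_read_local(self, node, addr):
        # What a DSM-unaware debugger sees: raw local memory, possibly stale.
        return self.local[node][addr]

    def debugger_read_coherent(self, addr):
        # What a DSM-aware debugger could ask for: the released value.
        return self.home[addr]

    def classify_fault(self, addr):
        # Faults on managed addresses are assumed to be normal DSM traffic;
        # a fault anywhere else is a candidate program error.
        return "dsm-protocol" if addr in self.home else "program-error"


dsm = ToyDSM(num_nodes=2, managed_addresses=["x"])
dsm.write(0, "x", 42)
dsm.release(0)
stale = dsm.debugger_read_local(1, "x")    # node 1 has not synchronized yet
fresh = dsm.debugger_read_coherent("x")    # value released by node 0
```

In this sketch, node 1's local copy of x still holds its initial value after node 0 releases its write, while the coherent read returns the released value. A real DSM would detect accesses through page protections and signal handlers, which is where the second problem, interference between the DSM and the debugger, would arise. The toy model does not attempt to show that.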

Keywords

Message Passing Interface; Safe State; Distribute Shared Memory; Helper Thread; Runtime Library
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jeffrey Olivier (1)
  • Chih-Ping Chen (1)
  • Jay Hoeflinger (1)
  1. Intel Corporation, Santa Clara, USA
