Cluster Computing, Volume 16, Issue 1, pp 171–189

Extending the scope of the controlled logical clock

  • Daniel Becker
  • Markus Geimer
  • Rolf Rabenseifner
  • Felix Wolf

Abstract

Event traces are helpful in understanding the performance behavior of parallel applications because they allow an in-depth analysis of communication and synchronization patterns. However, most cluster systems lack synchronized clocks, which may render such analysis ineffective: inaccurate relative event timings can misrepresent the logical order of events, lead to errors when quantifying the impact of certain behaviors, or confuse users of time-line visualization tools by showing messages that flow backward in time. In earlier work, we developed a scalable algorithm, the controlled logical clock, that eliminates inconsistent inter-process timings postmortem in traces of pure MPI applications, potentially running on large processor configurations. In this paper, we first demonstrate that our algorithm is also beneficial in computational grids, where a single application is executed using the combined computational power of several geographically dispersed clusters. Second, we present an extended version of the algorithm that preserves and restores shared-memory event semantics in addition to message-passing event semantics, enabling the correction of traces from hybrid applications.
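The inconsistency described above, a message that appears to arrive before it was sent, can be made concrete with a small sketch. This is a simplified illustration, not the authors' controlled logical clock. It assumes a hypothetical in-memory event list that is already in causal order, a known minimum message latency `l_min`, and a hypothetical smoothing parameter `gamma` that preserves most of each original local interval. The sketch flags receives that violate `t(recv) >= t(send) + l_min`, shifts timestamps forward in a single pass, and adds an illustrative shared-memory analogue for barriers (no thread leaves before the last one has entered).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    proc: int                       # rank (or thread) that recorded the event
    time: float                     # original local timestamp, possibly skewed
    send_of: Optional[int] = None   # hypothetical: for a receive, index of its matching send


def clock_violations(events: List[Event], l_min: float) -> List[int]:
    """Indices of receives that break t(recv) >= t(send) + l_min."""
    return [i for i, e in enumerate(events)
            if e.send_of is not None
            and e.time < events[e.send_of].time + l_min]


def forward_correct(events: List[Event], l_min: float,
                    gamma: float = 0.99) -> List[float]:
    """Single forward pass (illustrative only).

    Assumes `events` is already in a causal order: each process's events
    appear in local order and every send appears before its receive.
    Timestamps are only ever moved forward, never backward.
    """
    corrected = [0.0] * len(events)
    prev = {}  # proc -> index of that process's previous event
    for i, e in enumerate(events):
        t = e.time
        if e.proc in prev:
            j = prev[e.proc]
            # keep at least a fraction gamma of the original local interval
            t = max(t, corrected[j] + gamma * (e.time - events[j].time))
        if e.send_of is not None:
            # restore the clock condition for this receive
            t = max(t, corrected[e.send_of] + l_min)
        corrected[i] = t
        prev[e.proc] = i
    return corrected


def barrier_violations(enters: List[float], exits: List[float]) -> List[int]:
    """Illustrative shared-memory check: indices of threads whose barrier
    exit precedes the latest barrier entry."""
    last_enter = max(enters)
    return [k for k, t in enumerate(exits) if t < last_enter]


# Usage: rank 1's clock lags, so its receive appears before the send.
events = [Event(0, 10.0), Event(1, 9.5, send_of=0), Event(1, 12.0)]
print(clock_violations(events, l_min=1.0))   # the receive at index 1 is flagged
fixed = forward_correct(events, l_min=1.0)   # receive moves from 9.5 to 11.0
print(fixed)
```

With `gamma` below 1, a forward shift shrinks on each later event of the same process until the original timestamps take over again, so a single correction does not offset the rest of the trace indefinitely. The controlled logical clock presented in the paper goes further than this sketch, which should be read only as an illustration of the constraints being restored.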

Keywords

Event tracing · Timestamp synchronization

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Daniel Becker (1)
  • Markus Geimer (2)
  • Rolf Rabenseifner (3)
  • Felix Wolf (1, 2, 4)
  1. German Research School for Simulation Sciences, Aachen, Germany
  2. Jülich Supercomputing Centre, Jülich, Germany
  3. University of Stuttgart, Stuttgart, Germany
  4. RWTH Aachen University, Aachen, Germany