Timestamping messages and events in a distributed system using synchronous communication

Distributed Computing

Abstract

Determining the order relationship between events of a distributed computation is a fundamental problem in distributed systems, with applications in many areas including debugging, visualization, checkpointing and recovery. Fidge/Mattern’s vector-clock mechanism captures the order relationship using a vector of size N in a system consisting of N processes. As a result, it incurs a message and space overhead of N integers. Many distributed applications use synchronous messages for communication. It is therefore natural to ask whether the timestamping overhead can be reduced for such applications. In this paper, we present a new approach for timestamping messages and events of a synchronously ordered computation, that is, a computation in which processes communicate using synchronous messages. Our approach depends on decomposing the edges of the communication topology into mutually disjoint edge groups such that each edge group forms either a star or a triangle. We show that, to accurately capture the order relationship between synchronous messages, it is sufficient to use one component per edge group in the vector instead of one component per process. Timestamps for events are only slightly bigger than timestamps for messages. Many common communication topologies such as ring, grid and hypercube can be decomposed into \({\lceil N/2 \rceil}\) edge groups, resulting in an almost 50% improvement in both space and communication overheads. We prove that the problem of computing an optimal edge decomposition of a communication topology is NP-complete in general. We also present a heuristic algorithm for computing an edge decomposition whose size is within a factor of two of the optimal. We prove that, in the worst case, it is not possible to timestamp messages of a synchronously ordered computation using a vector containing fewer than \({2 \lfloor N/6 \rfloor}\) components when N ≥ 2. Finally, we show that messages in a synchronously ordered computation can always be timestamped in an offline manner using a vector of size at most \({\lfloor N/2 \rfloor}\).
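As a rough illustration of the edge-decomposition idea for the ring topology, the following Python sketch partitions the edges of an N-process ring into stars centered at the even-numbered processes and checks that this yields \({\lceil N/2 \rceil}\) edge groups. It is a minimal sketch under stated assumptions, not the heuristic algorithm from the paper: the function names (`ring_edges`, `decompose_ring`, `is_star`), the numbering of processes from 0 to N-1, and the choice of even-numbered centers are all hypothetical and introduced only for this example.

```python
from math import ceil


def ring_edges(n):
    """Edges of a ring on processes 0..n-1 (assumed numbering)."""
    if n < 3:
        raise ValueError("a simple ring needs at least 3 processes")
    return [(i, (i + 1) % n) for i in range(n)]


def decompose_ring(n):
    """Partition the ring's edges into stars, one per even-numbered process.

    Hypothetical helper: returns a dict mapping each star's center to the
    list of edges assigned to it. Every ring edge has at least one
    even-numbered endpoint, so every edge lands in exactly one group.
    """
    groups = {}
    for u, v in ring_edges(n):
        center = u if u % 2 == 0 else v
        groups.setdefault(center, []).append((u, v))
    return groups


def is_star(edges):
    """True if all edges in the group share at least one common endpoint."""
    common = set(edges[0])
    for edge in edges[1:]:
        common &= set(edge)
    return len(common) >= 1


if __name__ == "__main__":
    for n in (4, 6, 7, 16):
        groups = decompose_ring(n)
        assert all(is_star(g) for g in groups.values())
        print(n, "processes:", len(groups), "edge groups; ceil(N/2) =", ceil(n / 2))
```

Under this decomposition a message timestamp would need one vector component per edge group rather than one per process, which is where the roughly 50% saving for rings comes from. The sketch makes no claim of optimality: for N = 3 the ring is itself a triangle, so a single edge group would suffice.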



Author information

Corresponding author

Correspondence to Vijay K. Garg.

Additional information

An earlier version of this paper appeared in the 2002 Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS).

The author V. K. Garg was supported in part by the NSF Grants ECS-9907213 and CCR-9988225, and by an Engineering Foundation Fellowship.

This work was done while the author C. Skawratananond was a Ph.D. student at the University of Texas at Austin.

About this article

Cite this article

Garg, V.K., Skawratananond, C. & Mittal, N. Timestamping messages and events in a distributed system using synchronous communication. Distrib. Comput. 19, 387–402 (2007). https://doi.org/10.1007/s00446-006-0018-5

  • DOI: https://doi.org/10.1007/s00446-006-0018-5
