Living with Nondeterminism in Replicated Middleware Applications

  • Joseph Slember
  • Priya Narasimhan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4290)

Abstract

Application-level nondeterminism can lead to inconsistent state that defeats the purpose of replication as a fault-tolerance strategy. We present Midas, a new approach for living with nondeterminism in distributed, replicated, middleware applications. Midas exploits (i) the static program analysis of the application’s source code prior to replica deployment and (ii) the online compensation of replica divergence even as replicas execute. We identify the sources of nondeterminism within the application, discriminate between actual and superficial nondeterminism, and track the propagation of actual nondeterminism. We evaluate our techniques for the active replication of servers using micro-benchmarks that contain various sources (multi-threading, system calls and propagation) of nondeterminism.
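The distinction between actual and superficial nondeterminism, and the idea of compensating for divergence after the fact, can be illustrated with a small sketch. This is not the Midas implementation, which the abstract does not detail. The `Replica` class, the simulated local clock, the `tainted` set, and the leader-based `compensate` function are all hypothetical names and choices made only for this illustration.

```python
# Illustrative sketch only: all names here are assumptions, not Midas APIs.

class Replica:
    def __init__(self, clock_start):
        # A per-replica local clock stands in for a nondeterministic
        # system call (for example, reading the time of day).
        self.clock = clock_start
        self.state = {"count": 0, "token": None}
        self.tainted = set()   # state keys that depend on nondeterministic values
        self.log = []          # local-only diagnostics, never replicated

    def read_clock(self):
        self.clock += 1
        return self.clock

    def handle(self, request):
        nd = self.read_clock()
        # Superficial nondeterminism: the value only reaches a local log.
        self.log.append((nd, request))
        self.state["count"] += 1
        if request == "issue_token":
            # Actual nondeterminism: the value propagates into replica state.
            self.state["token"] = nd
            self.tainted.add("token")


def compensate(replicas):
    """Overwrite tainted state on every replica with the first replica's values."""
    leader = replicas[0]
    for key in leader.tainted:
        for follower in replicas[1:]:
            follower.state[key] = leader.state[key]


# Two actively replicated servers process the same requests in the same order.
replicas = [Replica(clock_start=100), Replica(clock_start=250)]
for request in ["ping", "issue_token"]:
    for replica in replicas:
        replica.handle(request)

diverged = replicas[0].state != replicas[1].state
compensate(replicas)
consistent = replicas[0].state == replicas[1].state
```

In this sketch the logs still differ after compensation, which is harmless because they never influence replicated state. Only the `token` field, which the taint set marks as affected, needs to be reconciled.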

Keywords

Communication Overhead, Shared State, Active Replication, Transparent Approach, Server Replica
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© IFIP International Federation for Information Processing 2006

Authors and Affiliations

  • Joseph Slember (1)
  • Priya Narasimhan (1)
  1. Carnegie Mellon University, Pittsburgh, USA
