Abstract
We introduce the recovery-oriented programming paradigm. Programs designed according to this paradigm include, as an integral part, the key safety and liveness properties that the program must respect, together with the recovery actions to be executed when any of these properties is violated. We design a pre-compiler that compiles the properties and recovery actions into code that monitors the properties at run time and enforces the recovery actions upon a violation. Assuming that the given program is restartable and that a self-stabilizing software platform is available, the compiled program recovers from safety and liveness violations. We provide a generic correctness proof scheme for recovery-oriented programs, proving that the code, as transformed by the pre-compiler, converges to a legal execution within a finite number of steps after experiencing an arbitrary failure.
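To make the monitor-and-recover idea concrete, here is a minimal Python sketch of the kind of code a pre-compiler might inject after each step of a program. It is not the authors' pre-compiler or its output format. All names (`SafetyProperty`, `LivenessProperty`, `Monitor`, `restart`, `step`, `make_monitor`) are hypothetical. The sketch makes two assumptions. First, recovery means restarting to a known initial state, which is the restartability assumption. Second, liveness ("the counter eventually wraps to 0") is approximated by a step bound, because a runtime monitor cannot observe "eventually" in finite time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]

@dataclass
class SafetyProperty:
    """A predicate that must hold in every monitored state (hypothetical API)."""
    name: str
    holds: Callable[[State], bool]
    recover: Callable[[State], None]

@dataclass
class LivenessProperty:
    """A goal that must be reached within `bound` steps.

    Assumption: liveness is approximated by a step-count deadline."""
    name: str
    achieved: Callable[[State], bool]
    bound: int
    recover: Callable[[State], None]
    steps_waiting: int = 0

@dataclass
class Monitor:
    safety: List[SafetyProperty] = field(default_factory=list)
    liveness: List[LivenessProperty] = field(default_factory=list)
    violations: List[str] = field(default_factory=list)

    def check(self, state: State) -> None:
        # Safety: a single bad state triggers the recovery action at once.
        for p in self.safety:
            if not p.holds(state):
                self.violations.append(p.name)
                p.recover(state)
        # Liveness: count steps since the goal was last reached.
        for p in self.liveness:
            if p.achieved(state):
                p.steps_waiting = 0
            else:
                p.steps_waiting += 1
                if p.steps_waiting > p.bound:
                    self.violations.append(p.name)
                    p.recover(state)
                    p.steps_waiting = 0

INITIAL: State = {"counter": 0, "stuck": 0}

def restart(state: State) -> None:
    # Restartability assumption: re-initializing yields a legal state.
    state.clear()
    state.update(INITIAL)

def step(state: State) -> None:
    # One iteration of a toy application loop; a set "stuck" flag
    # (e.g. from a soft error) turns the step into a no-op.
    if not state["stuck"]:
        state["counter"] += 1
        if state["counter"] == 10:
            state["counter"] = 0

def make_monitor() -> Monitor:
    return Monitor(
        safety=[SafetyProperty("counter in range",
                               lambda s: 0 <= s["counter"] < 10, restart)],
        liveness=[LivenessProperty("counter wraps to 0",
                                   lambda s: s["counter"] == 0, 10, restart)],
    )

def run(state: State, monitor: Monitor, steps: int) -> None:
    for _ in range(steps):
        step(state)
        monitor.check(state)  # the check a pre-compiler would inject

if __name__ == "__main__":
    m = make_monitor()
    s = {"counter": 3, "stuck": 1}  # arbitrary transient fault
    run(s, m, 30)
    print(m.violations, s)
```

Starting from the corrupted state `{"counter": 42, "stuck": 0}`, the safety check fires after the first step and restarts the program. Starting from `{"counter": 3, "stuck": 1}`, no safety property is violated, but the liveness deadline expires after 11 steps and triggers the same restart. In both cases, execution afterwards follows the legal 0 to 9 cycle. This mirrors, in miniature, the convergence claim stated above.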
Additional information
An extended abstract of this work was presented at the 8th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS’06), Austin, Texas, USA, 2006, and the 20th ACM Symposium on Operating Systems Principles (SOSP’05), Brighton, England, 2005.
This work was partially supported by the Lynne and William Frankel Center for Computer Sciences and the Rita Altura Trust Chair in Computer Sciences.
About this article
Cite this article
Brukman, O., Dolev, S. Recovery oriented programming: runtime monitoring of safety and liveness. Int J Softw Tools Technol Transfer 13, 377–395 (2011). https://doi.org/10.1007/s10009-011-0200-3