Recovery oriented programming: runtime monitoring of safety and liveness

Brukman, Olga; Dolev, Shlomi

doi:10.1007/s10009-011-0200-3

Regular Paper
Published: 15 May 2011

Volume 13, pages 377–395 (2011)

International Journal on Software Tools for Technology Transfer

Olga Brukman¹ &
Shlomi Dolev¹

Abstract

We introduce the recovery-oriented programming paradigm. Programs that are designed according to the recovery-oriented programming paradigm include, as an integral part, the important safety and liveness properties that the program should respect and the recovery actions that should be executed upon a violation of these properties. We design a pre-compiler that compiles the properties and recovery actions into a code snippet for monitoring properties and enforcing recovery actions upon property violation. Assuming the restartability property of a given program and the existence of a self-stabilizing software platform, the compiled program is able to recover from safety and liveness violations. We provide a generic correctness proof scheme for recovery-oriented programs, proving that the code, as transformed by the pre-compiler, converges to a legal execution in a finite number of steps after experiencing an arbitrary failure.
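
The idea of attaching properties and recovery actions to a program can be illustrated with a minimal Python sketch. It is not the authors' pre-compiler or its output. It only shows a hand-written monitor of the kind such a tool might generate. All names here (SafetyProperty, LivenessProperty, Monitor, restart, program_step, run) are hypothetical. The sketch also assumes that liveness is checked as bounded progress (a measure must change within a fixed number of steps) and that recovery is a full restart to a known initial state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]


@dataclass
class SafetyProperty:
    name: str
    holds: Callable[[State], bool]    # must be true in every observed state
    recover: Callable[[State], None]  # recovery action run on violation


@dataclass
class LivenessProperty:
    name: str
    measure: Callable[[State], int]   # value that must keep changing
    bound: int                        # max consecutive checks with no change
    recover: Callable[[State], None]
    last_value: Optional[int] = None
    idle_steps: int = 0


@dataclass
class Monitor:
    safety: List[SafetyProperty]
    liveness: List[LivenessProperty]
    log: List[str] = field(default_factory=list)

    def check(self, state: State) -> None:
        # Safety: a bad state was observed, so recover immediately.
        for p in self.safety:
            if not p.holds(state):
                self.log.append(f"safety:{p.name}")
                p.recover(state)
        # Liveness (bounded form, an assumption): recover if the measure
        # has not changed for `bound` consecutive checks.
        for p in self.liveness:
            value = p.measure(state)
            if value != p.last_value:
                p.last_value, p.idle_steps = value, 0
            else:
                p.idle_steps += 1
                if p.idle_steps >= p.bound:
                    self.log.append(f"liveness:{p.name}")
                    p.recover(state)
                    p.last_value, p.idle_steps = p.measure(state), 0


def restart(state: State) -> None:
    # Restartability assumption: the program can always be reset to this state.
    state.clear()
    state.update({"counter": 0, "served": 0, "stuck": 0})


def program_step(state: State) -> None:
    # Toy program: serves one request per step unless it is stuck.
    if state["stuck"]:
        return
    state["counter"] = (state["counter"] + 1) % 10
    state["served"] += 1


def run(steps: int, faults: Dict[int, Tuple[str, int]]) -> Tuple[State, List[str]]:
    state: State = {}
    restart(state)
    monitor = Monitor(
        safety=[SafetyProperty("counter_in_range",
                               lambda s: 0 <= s["counter"] < 10, restart)],
        liveness=[LivenessProperty("served_advances",
                                   lambda s: s["served"], 3, restart)],
    )
    for step in range(steps):
        program_step(state)
        if step in faults:            # simulated transient fault
            key, value = faults[step]
            state[key] = value
        monitor.check(state)
    return state, monitor.log


if __name__ == "__main__":
    final_state, violations = run(12, {3: ("counter", 999), 6: ("stuck", 1)})
    print(violations)
    print(final_state)
```

In this run, the corrupted counter at step 3 triggers the safety recovery at once, while the stuck flag set at step 6 is caught only after three checks without progress. This reflects the usual distinction that a safety violation is visible in a single state, whereas a liveness violation can only be inferred from a lack of progress over time.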

Author information

Authors and Affiliations

Ben-Gurion University of the Negev, 84105, Beer-Sheva, Israel
Olga Brukman & Shlomi Dolev

Corresponding author

Correspondence to Olga Brukman.

Additional information

An extended abstract of this work was presented at the 8th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS’06), Dallas, Texas, USA, 2006, and the 20th ACM Symposium on Operating Systems Principles (SOSP’05), Brighton, England, 2005.

This work was partially supported by the Lynne and William Frankel Center for Computer Sciences and the Rita Altura Trust Chair in Computer Sciences.

About this article

Cite this article

Brukman, O., Dolev, S. Recovery oriented programming: runtime monitoring of safety and liveness. Int J Softw Tools Technol Transfer 13, 377–395 (2011). https://doi.org/10.1007/s10009-011-0200-3

Issue Date: August 2011
