REASSURE: A Self-contained Mechanism for Healing Software Using Rescue Points

  • Georgios Portokalidis
  • Angelos D. Keromytis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7038)


Software errors are frequently responsible for the limited availability of Internet services, loss of data, and many security compromises. Self-healing using rescue points (RPs) is a mechanism that can be used to recover software from unforeseen errors until a more permanent remedy, like a patch or update, is available. We present REASSURE, a self-contained mechanism for recovering from such errors using RPs. Essentially, RPs are existing code locations that handle certain anticipated errors in the target application, usually by returning an error code. REASSURE enables the use of these locations to also handle unexpected faults. This is achieved by rolling back execution to an RP when a fault occurs, returning a valid error code, and enabling the application to gracefully handle the unexpected error itself. REASSURE can be applied to already running applications, and it can be disabled and removed just as easily. We tested REASSURE with various applications, including the MySQL and Apache servers, and show that it allows them to successfully recover from errors, while incurring moderate overhead between 1% and 115%. We also show that even under very adverse conditions, such as continuous bombardment with errors, REASSURE-protected applications remain operational.
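
The rollback-and-return idea described above can be illustrated with a minimal sketch. This is not REASSURE's implementation: it assumes a toy Python setting in which a Python exception stands in for an unexpected fault and a deep copy of a dictionary stands in for a checkpoint. The names `RescuePoint`, `store_record`, and `safe_store` are hypothetical and introduced only for illustration.

```python
import copy


class RescuePoint:
    """Hypothetical wrapper that treats an existing function as a rescue point."""

    def __init__(self, func, error_code, state):
        self.func = func              # existing function that already returns error codes
        self.error_code = error_code  # a value its callers already know how to handle
        self.state = state            # mutable program state the function may modify

    def __call__(self, *args):
        snapshot = copy.deepcopy(self.state)  # checkpoint on entry to the rescue point
        try:
            return self.func(self.state, *args)
        except Exception:                     # stand-in for an unexpected fault
            self.state.clear()                # roll back every change made since entry
            self.state.update(snapshot)
            return self.error_code            # report the fault as an anticipated error


def store_record(state, key, value):
    """Toy routine: returns 0 on success, -1 on an anticipated error."""
    if key is None:
        return -1                             # anticipated error, handled by callers
    state["count"] += 1                       # partial update made before the fault
    state["records"][key] = 100 // value      # unexpected fault when value == 0
    return 0


state = {"count": 0, "records": {}}
safe_store = RescuePoint(store_record, -1, state)

ok = safe_store("a", 4)      # normal execution, state is updated
failed = safe_store("b", 0)  # fault: state is rolled back, -1 is returned
```

In this sketch the caller of `safe_store` sees the same `-1` it would see for an anticipated error, and the partial increment of `count` made before the fault is undone, which mirrors the behavior the abstract describes at a much smaller scale.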


Keywords: Virtual Machine, Rescue Point, Healing Software, Unexpected Error, Software Rejuvenation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Georgios Portokalidis (1)
  • Angelos D. Keromytis (1)
  1. Network Security Lab, Department of Computer Science, Columbia University, New York, USA
