Compiler Support for Fine-Grain Software-Only Checkpointing

  • Chuck (Chengyan) Zhao
  • J. Gregory Steffan
  • Cristiana Amza
  • Allan Kielstra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7210)

Abstract

Checkpointing support allows program execution to roll back to an earlier program point, discarding any modifications made since that point. Existing software-based checkpointing methods are mainly libraries that snapshot all of working memory, and hence have prohibitive overhead for many potential applications. In this paper we present a lightweight, fine-grain checkpointing framework implemented entirely in software through compiler transformations and optimizations. A programmer can specify arbitrary checkpoint regions via a simple API, and the compiler automatically transforms the code to implement the checkpoint at the granularity of individual stores, optimizing to remove redundancy. We explore two application areas for this support. First, we investigate its application to debugging, in particular by providing the ability to rewind to an arbitrarily placed point in a buggy program's execution. A study using BugBench applications shows that our compiler-based approach incurs more than 100x lower overhead than full-process checkpointing. Second, we demonstrate that compiler-based checkpointing support can free the programmer from manually implementing and maintaining software rollback mechanisms when coding a back-tracking algorithm, with a runtime overhead of only 15% compared to the manual implementation.
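To make the idea of checkpointing at the granularity of individual stores concrete, the following is a minimal sketch of an undo log in C. It is not the paper's API or implementation: the names `ckpt_begin`, `ckpt_store`, `ckpt_rollback`, and `checkpoint_demo`, the fixed log capacity, and the restriction to `int` locations are all assumptions made for illustration. In a compiler-based scheme, calls like `ckpt_store` would be inserted automatically for stores inside a checkpoint region rather than written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical undo log: one entry per instrumented store. */
#define LOG_CAPACITY 1024

typedef struct {
    int *addr;      /* location that is about to be overwritten */
    int  old_value; /* value it held before the store */
} LogEntry;

static LogEntry undo_log[LOG_CAPACITY];
static size_t   log_len = 0;
static int      log_overflow = 0;

/* Start a checkpoint region by discarding any earlier log (assumed name). */
void ckpt_begin(void) {
    log_len = 0;
    log_overflow = 0;
}

/* Instrumented store: record the old value, then perform the write. */
void ckpt_store(int *addr, int value) {
    if (log_len < LOG_CAPACITY) {
        undo_log[log_len].addr = addr;
        undo_log[log_len].old_value = *addr;
        log_len++;
    } else {
        log_overflow = 1; /* rollback can no longer be complete */
    }
    *addr = value;
}

/* Undo logged stores in reverse order.
 * Returns 0 on success, -1 if the log overflowed during the region. */
int ckpt_rollback(void) {
    while (log_len > 0) {
        log_len--;
        *undo_log[log_len].addr = undo_log[log_len].old_value;
    }
    return log_overflow ? -1 : 0;
}

/* Small usage example: modify two variables, then rewind. */
void checkpoint_demo(void) {
    int x = 1, y = 2;
    ckpt_begin();
    ckpt_store(&x, 10);
    ckpt_store(&y, 20);
    ckpt_store(&x, 30); /* second store to x is also logged */
    printf("before rollback: x=%d y=%d\n", x, y);
    assert(ckpt_rollback() == 0);
    printf("after rollback:  x=%d y=%d\n", x, y);
}
```

Calling `checkpoint_demo()` from a `main` function leaves `x` and `y` at their original values of 1 and 2 after the rollback. The second logged store to `x` is redundant, since the first entry already holds the value needed to restore it; removing that kind of redundant logging is the sort of optimization the abstract attributes to the compiler.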

Keywords

Transactional Memory; Compiler Optimization; Checkpoint Region; Software Transactional Memory; Redundancy Rate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Chuck (Chengyan) Zhao (1)
  • J. Gregory Steffan (1)
  • Cristiana Amza (1)
  • Allan Kielstra (2)

  1. Department of Electrical and Computer Engineering, University of Toronto, Canada
  2. IBM Canada Toronto Laboratory, Canada