Techniques for Efficient Software Checking

  • Jing Yu
  • María Jesús Garzarán
  • Marc Snir
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5234)

Abstract

Dramatic increases in the number of transistors that can be integrated on a chip make processors more susceptible to radiation-induced transient errors. For commodity chips which are cost- and energy-constrained, we need a flexible and inexpensive technology for fault detection. Software approaches can play a major role for this sector of the market because they need little hardware modifications and can be tailored to fit different requirements of reliability and performance. However, software approaches add a significant overhead.

In this paper we propose two novel techniques that reduce the overhead of software error checking approaches. The first technique uses boolean logic to identify code patterns that correspond to outcome tolerant branches. We develop a compiler algorithm that finds those patterns and removes the unnecessary replicas. In the second technique we evaluate the performance benefit obtained by removing address checks before load and stores. In addition, we evaluate the overheads that can be removed when the register file is protected in hardware.

Our experimental results show that the first technique improves performance by an average 7% for three of the SPEC benchmarks. The second technique can reduce overhead by up-to 50% when the most aggressive optimization is applied.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Constantinescu, C.: Impact of Deep Submicron Technology on Dependability of VLSI Circuits. In: Proc. of the International Conf. on Dependable Systems and Networks, pp. 205–209 (2002)Google Scholar
  2. 2.
    Hazucha, P., Karnik, T., Walstra, S., Bloechel, B., Tschanz, J.W., Maiz, J., Soumyanath, K., Dermer, G., Narendra, S., De, V., Borkar, S.: Measurements and Analysis of SER-tolerant Latch in a 90-nm dual-V/sub T/ CMOS Process. IEEE Journal of Solid-State Circuits 39(9), 1536–1543 (2004)CrossRefGoogle Scholar
  3. 3.
    Karnik, T., Hazucha, P.: Characterization of Soft Errors Caused by Single Event Upsets in CMOS Processes. IEEE Transactions on Dependable and Secure Computing 1(2), 128–143 (2004)CrossRefGoogle Scholar
  4. 4.
    Shivakumar, P., Kistler, M., Keckler, S., Burger, D., Alvisi, L.: Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic. In: Proc. of the International Conf. on Dependable Systems and Networks, pp. 289–398 (2002)Google Scholar
  5. 5.
    Slegel, T., Averill, R., Check, M., Giamei, B., Krumm, B., Krygowski, C., Li, W., Liptay, J., MacDougall, J., McPherson, T., Navarro, J., Schwarz, E., Shum, K., Webb, C.: IBM’s S/390 G5 Microprocessor Design. IEEE Micro 19(2), 12–23 (1999)CrossRefGoogle Scholar
  6. 6.
    McEvoy, D.: The architecture of tandem’s nonstop system. In: ACM 1981: Proceedings of the ACM 1981 conference, p. 245. ACM Press, New York (1981)CrossRefGoogle Scholar
  7. 7.
    Yeh, Y.: Triple-triple Redundant 777 Primary Flight Computer. In: Proc. of the IEEE Aerospace Applications Conference, pp. 293–307 (1996)Google Scholar
  8. 8.
    Mukherjee, S., Emer, J., Fossum, T., Reinhardt, S.: Cache Scrubbing in Microprocessors: Myth or Necessity? In: Proc. of the Pacific RIM International Symposium on Dependable Computing, pp. 37–42 (2004)Google Scholar
  9. 9.
    Prvulovic, M., Zhang, Z., Torrellas, J.: ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors. In: Proc. of the International Symposium on Computer Architecture (ISCA) (2002)Google Scholar
  10. 10.
    Sorin, D., Martin, M., Hill, M., Wood, D.: SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery. In: Proc. of the International Symposium on Computer Architecture (ISCA) (2002)Google Scholar
  11. 11.
    Wang, N.J., Patel, S.J.: ReStore: Symptom Based Soft Error Detection in Microprocessors. In: Proc. of the International Conference on Dependable Systems and Network (DSN), pp. 30–39 (2005)Google Scholar
  12. 12.
    McNairy, C., Bhatia, R.: Montecito: A Dual-core, Dual-thread Itanium Processor. IEEE Micro 25(2), 10–20 (2005)CrossRefGoogle Scholar
  13. 13.
    Kongetira, P., Aingaran, K., Olukotun, K.: Niagara: A 32-way multithreaded sparc processor. IEEE Micro 25(2), 21–29 (2005)CrossRefGoogle Scholar
  14. 14.
    Bossen, D., Tendler, J., Reick, K.: Power4 system design for high reliability. IEEE Micro 22(2), 16–24 (2002)CrossRefGoogle Scholar
  15. 15.
    Lattner, C., Adve, V.: The LLVM Compiler Framework and Infrastructure Tutorial. In: Eigenmann, R., Li, Z., Midkiff, S.P. (eds.) LCPC 2004. LNCS, vol. 3602. Springer, Heidelberg (2005)Google Scholar
  16. 16.
    Reis, G.A., Chang, J., Vachharajani, N., Rangan, R., August, D.I.: SWIFT: Software Implemented Fault Tolerance. In: Proc. of the International Symposium on Code Generation and Optimization (CGO) (2005)Google Scholar
  17. 17.
    Mukherjee, S.S., Kontz, M., Reinhardt, S.K.: Detailed Design and Evaluation of Redundant Multithreading Alternatives. In: Proc. of International Symposium on Computer Architecture, Washington, DC, USA, pp. 99–110. IEEE Computer Society, Los Alamitos (2002)Google Scholar
  18. 18.
    Reinhardt, S.K., Mukherjee, S.S.: Transient Fault Detection via Simultaneous Multithreading. In: Proc. of International Symposium on Computer Architecture, pp. 25–36. ACM Press, New York (2000)Google Scholar
  19. 19.
    Wang, N., Fertig, M., Patel, S.: Y-Branches: When You Come to a Fork in the Road, Take It. In: Proc. of the International Conference on Parallel Architectures and Compilation Techniques (PACT) (2003)Google Scholar
  20. 20.
    Chang, J., Reis, G.A., Vachharajani, N., Rangan, R., August, D.: Non-uniform fault tolerance. In: Proceedings of the 2nd Workshop on Architectural Reliability (WAR) (2006)Google Scholar
  21. 21.
    Gaisler, J.: Evaluation of a 32-bit microprocessor with built-in concurrent error detection. In: FTCS 1997: Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS 1997), Washington, DC, USA, p. 42. IEEE Computer Society, Los Alamitos (1997)CrossRefGoogle Scholar
  22. 22.
    Montesinos, P., Liu, W., Torrellas, J.: Shield: Cost-Effective Soft-Error Protection for Register Files. In: Third IBM TJ Watson Conference on Interaction between Architecture, Circuits and Compilers (PAC 2006) (2006)Google Scholar
  23. 23.
    Hu, J., Wang, S., Ziavras, S.G.: In-register duplication: Exploiting narrow-width value for improving register file reliability. In: DSN 2006: Proceedings of the International Conference on Dependable Systems and Networks (DSN 2006), Washington, DC, USA, pp. 281–290. IEEE Computer Society, Los Alamitos (2006)Google Scholar
  24. 24.
    Reis, G.A., Chang, J., Vachharajani, N., Rangan, R., August, D.I., Mukherjee, S.S.: Design and Evaluation of Hybrid Fault-Detection Systems. In: Proc. of the International International Symposium on Computer Architecture (ISCA) (2005)Google Scholar
  25. 25.
    Reis, G.A., Chang, J., August, D.I., Cohn, R., Mukherjee, S.S.: Configurable Transient Fault Detection via Dynamic Binary Translation. In: Proceedings of the 2nd Workshop on Architectural Reliability (WAR) (2006)Google Scholar
  26. 26.
    Luk, C., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.: Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In: Proc. of the Intenational Conference on Programming Language Design and Implementation (PLDI) (2005)Google Scholar
  27. 27.
    Mukherjee, S.S., Weaver, C., Emer, J., Reinhardt, S.K., Austin, T.: A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor. In: MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, p. 29. IEEE Computer Society, Los Alamitos (2003)Google Scholar
  28. 28.
    Oh, N., McCluskey, E.J.: Low Energy Error Detection Technique Using Procedure Call Duplication. In: Proc. of the International Conference on Dependable Systems and Network (DSN) (2001)Google Scholar
  29. 29.
    Reis, G.A., Chang, J., Vachharajani, N., Rangan, R., August, D.I., Mukherjee, S.S.: Software-controlled fault tolerance. ACM Trans. Archit. Code Optim. 2(4), 366–396 (2005)CrossRefGoogle Scholar
  30. 30.
    Rebaudengo, M., Reorda, M.S., Violante, M., Torchiano, M.: A Source-to-Source Compiler for Generating Dependable Software. In: IEEE International Workshop on Source Code Analysis and Manipulation (SCAM), pp. 35–44 (2001)Google Scholar
  31. 31.
    Oh, N., Shirvani, P., McCluskey, E.J.: Error Detection by Duplicated Instructions in Super-scalar Processors. IEEE Transactions on Reliability 51(1), 63–75 (2002)CrossRefGoogle Scholar
  32. 32.
    Chang, J., Reis, G.A., August, D.I.: Automatic Instruction-Level Software-Only Recovery. In: DSN 2006: Proceedings of the International Conference on Dependable Systems and Networks (DSN 2006), Washington, DC, USA, pp. 83–92. IEEE Computer Society, Los Alamitos (2006)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jing Yu
    • 1
  • María Jesús Garzarán
    • 1
  • Marc Snir
    • 1
  1. 1.University of Illinois at Urbana-Champaign 

Personalised recommendations