Software Encoded Processing: Building Dependable Systems with Commodity Hardware

  • Ute Wappler
  • Christof Fetzer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4680)

Abstract

In the future, decreasing feature sizes and reduced supply power will make it much more difficult to build reliable microprocessors. Economic pressure will most likely result in the reliability of microprocessors being tuned for the commodity market. In the dependability domain, we expect mixed-mode computing systems, i.e., systems that execute both critical and non-critical functionality, to continue to spread. To permit the efficient execution of non-critical applications and the correct execution of critical applications, we introduce the concept of Software Encoded Processing (SEP). SEP enforces crash-failure semantics for the underlying CPU. It does not require the source code of encoded programs, and its guarantees are probabilistic. To achieve this, arithmetic codes and signatures are used to detect corrupted data and faulty executions of programs.
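The abstract does not say which arithmetic code SEP uses. As a rough illustration of the general idea only, the sketch below uses a plain AN code, one well-known arithmetic code: a value is stored as a multiple of a constant `A`, addition preserves that property, and any word that is not a multiple of `A` is treated as corrupted. The constant `A`, the function names, and the use of an exception to mimic crash-failure behaviour are all assumptions made for this example, not details taken from the paper.

```python
# Minimal AN-code sketch (illustrative assumption, not the paper's implementation).
A = 58659  # hypothetical odd code constant; valid code words are multiples of A


def encode(x: int) -> int:
    """Map a plain integer to its AN code word."""
    return x * A


def is_valid(code_word: int) -> bool:
    """A code word is valid only if it is a multiple of A."""
    return code_word % A == 0


def decode(code_word: int) -> int:
    """Check the code word and return the plain value.

    Raising here stands in for stopping the program when corruption is
    detected, instead of continuing with wrong data.
    """
    if not is_valid(code_word):
        raise ValueError("corrupted code word detected")
    return code_word // A


def add_encoded(a: int, b: int) -> int:
    """Add two code words; x*A + y*A = (x + y)*A, so the sum stays valid."""
    return a + b


def flip_bit(code_word: int, bit: int) -> int:
    """Simulate a soft error by flipping a single bit."""
    return code_word ^ (1 << bit)


if __name__ == "__main__":
    total = add_encoded(encode(20), encode(22))
    print(decode(total))  # plain result of the encoded addition

    faulty = flip_bit(total, 3)
    print(is_valid(faulty))  # the injected bit flip breaks divisibility by A
```

A single bit flip changes a code word by plus or minus a power of two, which is never a multiple of an odd `A` greater than one, so this sketch detects every single-bit flip. Multi-bit errors can still slip through when the corrupted word happens to be another multiple of `A`, which is why the detection guarantee of such codes is probabilistic.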

Keywords

Code Word, Fault Injection, Soft Error, Arithmetic Code, Code Check
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ute Wappler (1)
  • Christof Fetzer (1)
  1. Technische Universität Dresden, Department of Computer Science, Dresden, Germany
