The Evolution of Fault Tolerant Computing at the University of Illinois

  • Conference paper
The Evolution of Fault-Tolerant Computing

Part of the book series: Dependable Computing and Fault-Tolerant Systems ((DEPENDABLECOMP,volume 1))

Abstract

The University of Illinois has been active in research in the fault-tolerant computing field for over 25 years. Fundamental ideas have been proposed and major contributions made by researchers at the University of Illinois in the areas of testing and diagnosis, concurrent error detection, and fault tolerance. This paper traces the origins of these ideas and their development within the University of Illinois, as well as their influence upon research at other institutions, and outlines current directions of research.

Copyright information

© 1987 Springer-Verlag/Wien

About this paper

Cite this paper

Abraham, J.A., Metze, G., Iyer, R.K., Patel, J.H. (1987). The Evolution of Fault Tolerant Computing at the University of Illinois. In: Avižienis, A., Kopetz, H., Laprie, JC. (eds) The Evolution of Fault-Tolerant Computing. Dependable Computing and Fault-Tolerant Systems, vol 1. Springer, Vienna. https://doi.org/10.1007/978-3-7091-8871-2_11

  • DOI: https://doi.org/10.1007/978-3-7091-8871-2_11

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-7091-8873-6

  • Online ISBN: 978-3-7091-8871-2

  • eBook Packages: Springer Book Archive
