Reconfiguration of massively parallel systems

  • Johan Vounckx
  • G. Deconinck
  • R. Lauwereins
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 919)

Abstract

The reconfiguration approach presented in this paper addresses the need for fault tolerance in large systems. All of the developed techniques have a data complexity and an execution-time complexity that grow less than proportionally to the number of nodes in the system, which makes the approach well suited to massively parallel systems. The reconfiguration strategy consists of four subtasks: repartitioning (each application must have sufficient working processors), loading of injured networks, remapping (replacing faulty processors with working ones), and deadlock-free, fault-tolerant compact routing.
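As an illustration of the remapping subtask only, the following is a minimal sketch and not the authors' algorithm, which the abstract does not describe. It assumes processors are identified by integers and that a pool of spare processors exists; the names `build_remap` and `physical`, and the first-free-spare policy, are hypothetical. The table stores entries for faulty processors only, so its size grows with the number of faults rather than with the number of nodes.

```python
from typing import Dict, Iterable, List


def build_remap(faulty: Iterable[int], spares: List[int]) -> Dict[int, int]:
    """Map each faulty processor to a working spare.

    Hypothetical policy: take the first free spare in list order.
    Only faulty processors get an entry, so the table size is
    proportional to the number of faults, not to the system size.
    """
    faulty_set = set(faulty)
    spare_set = set(spares)
    # Spares that are themselves faulty cannot be used as replacements.
    free = [s for s in spares if s not in faulty_set]
    remap: Dict[int, int] = {}
    for node in sorted(faulty_set):
        if node in spare_set:
            continue  # a faulty spare hosts no work, so nothing moves
        if not free:
            # Assumption: the caller would fall back to repartitioning here.
            raise RuntimeError("not enough working spares")
        remap[node] = free.pop(0)
    return remap


def physical(logical: int, remap: Dict[int, int]) -> int:
    """Resolve a logical processor id to the processor that now hosts it."""
    return remap.get(logical, logical)


if __name__ == "__main__":
    table = build_remap(faulty=[2, 5, 9], spares=[8, 9, 10])
    print(table)
    print(physical(2, table), physical(3, table))
```

Healthy processors resolve to themselves without a table entry, which is what keeps the stored data small in this sketch.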

Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Johan Vounckx (1)
  • G. Deconinck (1)
  • R. Lauwereins (1, 2)

  1. K.U.Leuven-ESAT, Heverlee, Belgium
  2. Belgian National Science Foundation, Belgium
