Stochastic Models for Fault Tolerance

Restart, Rejuvenation and Checkpointing

  • Katinka Wolter

Table of contents

  1. Front Matter
    Pages i-xvi
  2. Introduction

    1. Front Matter
      Pages 1-1
    2. Katinka Wolter
      Pages 3-12
    3. Katinka Wolter
      Pages 13-31
  3. Restart

    1. Front Matter
      Pages 33-33
    2. Katinka Wolter
      Pages 35-50
    3. Katinka Wolter
      Pages 51-93
    4. Katinka Wolter
      Pages 95-115
  4. Software Rejuvenation

  5. Checkpointing

    1. Front Matter
      Pages 167-168
    2. Katinka Wolter
      Pages 171-176
    3. Katinka Wolter
      Pages 177-236
    4. Katinka Wolter
      Pages 237-240
  6. Back Matter
    Pages 241-269

About this book

Introduction

As modern society relies on the fault-free operation of complex computing systems, fault tolerance has become an indispensable system requirement. We therefore need mechanisms that guarantee correct service even when system components, whether software or hardware, fail. Redundancy is the usual remedy, in the form of either redundancy in space or redundancy in time.

Wolter’s book details methods of redundancy in time that must be triggered at the right moment. In particular, she addresses the so-called "timeout selection problem", i.e., the question of when to invoke fault-tolerance mechanisms such as restart, rejuvenation and checkpointing. Restart denotes simply restarting the system, rejuvenation denotes restarting the operating environment of a task, and checkpointing means periodically saving the system state and, upon a failure, reinitializing the system from the most recent checkpoint. The presentation includes a brief introduction to each method, its detailed stochastic description, and aspects of its efficient implementation in real-world systems.
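The timeout selection problem for restart can be made concrete with a small numerical sketch. It assumes a task whose completion time X follows a hypothetical two-phase hyperexponential distribution (most runs finish quickly, a few get stuck in a slow mode). It also assumes a fixed restart cost c and that every restart draws a fresh, independent completion time. Under these assumptions a renewal argument gives E[T] = (E[min(X, τ)] + c·P(X > τ)) / P(X ≤ τ) for a restart timeout τ, and the code searches a grid of timeouts for the minimum. The distribution, the parameter values and all function names are illustrative choices, not taken from the book.

```python
import math

# Hypothetical completion-time model: with probability p the task runs in a
# fast mode (rate lam1), otherwise in a slow mode (rate lam2).


def survival(t, p=0.9, lam1=1.0, lam2=0.01):
    """P(X > t) for the hypothetical hyperexponential completion time X."""
    return p * math.exp(-lam1 * t) + (1 - p) * math.exp(-lam2 * t)


def truncated_mean(tau, p=0.9, lam1=1.0, lam2=0.01):
    """E[min(X, tau)] = integral of P(X > t) over [0, tau], in closed form."""
    return (p * (1 - math.exp(-lam1 * tau)) / lam1
            + (1 - p) * (1 - math.exp(-lam2 * tau)) / lam2)


def expected_completion_time(tau, c=0.5, **dist):
    """Expected completion time when the task is restarted after timeout tau.

    Renewal argument: E[T] = E[min(X, tau)] + P(X > tau) * (c + E[T]),
    hence E[T] = (E[min(X, tau)] + c * P(X > tau)) / P(X <= tau).
    The restart cost c is an assumed constant.
    """
    s = survival(tau, **dist)
    return (truncated_mean(tau, **dist) + c * s) / (1 - s)


def best_timeout(c=0.5, step=0.01, tau_max=50.0, **dist):
    """Grid search for the timeout that minimises the expected completion time."""
    n = int(tau_max / step)
    taus = (step * i for i in range(1, n + 1))
    return min(((tau, expected_completion_time(tau, c, **dist)) for tau in taus),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    tau, e_t = best_timeout()
    no_restart = 0.9 / 1.0 + 0.1 / 0.01  # E[X] = p/lam1 + (1-p)/lam2
    print(f"best timeout {tau:.2f}, E[T] = {e_t:.3f}")
    print(f"no restart:  E[T] = {no_restart:.1f}")
```

With these parameters the expected completion time under the best timeout is far below the no-restart mean of 10.9, because waiting out the rare slow mode is much more expensive than paying the restart cost. The closed-form integral is used only because the hyperexponential survival function integrates easily; other distributions would need numerical integration.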

The book is aimed at researchers and graduate students in system dependability, stochastic modeling and software reliability. It offers an up-to-date overview of the key theoretical results and is the only comprehensive text on stochastic models for restart-related problems.

Keywords

Analysis, Error prediction, Fault-tolerance mechanisms, System behavior, System modeling, System optimization, modeling, system

Authors and affiliations

  • Katinka Wolter
    • Inst. Informatik, Humboldt-Universität Berlin, Berlin, Germany

Bibliographic information

  • DOI https://doi.org/10.1007/978-3-642-11257-7
  • Copyright Information Springer-Verlag Berlin Heidelberg 2010
  • Publisher Name Springer, Berlin, Heidelberg
  • eBook Packages Computer Science
  • Print ISBN 978-3-642-11256-0
  • Online ISBN 978-3-642-11257-7