Algorithmica, Volume 12, Issue 2, pp 182–208

Coding techniques for handling failures in large disk arrays

Authors

  • L. Hellerstein
    • Department of Electrical Engineering and Computer Science, Northwestern University
  • G. A. Gibson
    • School of Computer Science, Carnegie Mellon University
  • R. M. Karp
    • Computer Science Division, Electrical Engineering and Computer Sciences, University of California
  • R. H. Katz
    • Computer Science Division, Electrical Engineering and Computer Sciences, University of California
  • D. A. Patterson
    • Computer Science Division, Electrical Engineering and Computer Sciences, University of California

DOI: 10.1007/BF01185210

Cite this article as:
Hellerstein, L., Gibson, G.A., Karp, R.M. et al. Algorithmica (1994) 12: 182. doi:10.1007/BF01185210

Abstract

A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.
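
To make the idea of an erasure-correcting binary linear code concrete, here is a minimal Python sketch. It is not taken from the paper: the eight-disk layout, the names CHECKS, encode, and reconstruct, and the iterative recovery loop are all assumptions chosen for illustration. Each entry of CHECKS plays the role of one row of a parity check matrix, listing the disks whose contents must XOR to zero, and a failed disk is rebuilt from any check in which it is the only unknown.

```python
from functools import reduce
from operator import xor

# Hypothetical layout (not from the paper): data disks 0-3 arranged in a
# 2x2 grid, parity disks 4-7. Each set is one row of a parity check
# matrix: the contents of the listed disks must XOR to zero.
CHECKS = [
    {0, 1, 4},  # parity over grid row 0
    {2, 3, 5},  # parity over grid row 1
    {0, 2, 6},  # parity over grid column 0
    {1, 3, 7},  # parity over grid column 1
]


def encode(data):
    """Return the four data blocks followed by their four parity blocks.

    Blocks are modeled as Python ints used as bit strings.
    """
    d = list(data)
    return d + [d[0] ^ d[1], d[2] ^ d[3], d[0] ^ d[2], d[1] ^ d[3]]


def reconstruct(disks):
    """Fill in erased disks (marked None) where the checks allow it.

    Repeatedly looks for a check with exactly one erased disk and
    rebuilds that disk as the XOR of the other disks in the check.
    Disks that cannot be recovered this way are left as None.
    """
    disks = list(disks)
    progress = True
    while progress:
        progress = False
        for check in CHECKS:
            missing = [i for i in check if disks[i] is None]
            if len(missing) == 1:
                lost = missing[0]
                survivors = (disks[i] for i in check if i != lost)
                disks[lost] = reduce(xor, survivors, 0)
                progress = True
    return disks


stored = encode([0b1010, 0b0110, 0b1111, 0b0001])
damaged = list(stored)
damaged[0] = None  # simulate the failure of disk 0
damaged[1] = None  # simulate the failure of disk 1
recovered = reconstruct(damaged)
```

In this example the row check covering disks 0 and 1 has two unknowns and cannot be used at first, but each column check has only one, so both disks are rebuilt and recovered equals stored. Which failure patterns a code survives, and how many disks must be read or written per operation, depend on the choice of checks; that is the link between code properties and parity check matrices that the abstract describes.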

Key words

Input/output architecture, Redundant disk arrays, RAID, Error-correcting codes, Reliability, Availability

Copyright information

© Springer-Verlag New York Inc. 1994