Coding techniques for handling failures in large disk arrays
 L. Hellerstein,
 G. A. Gibson,
 R. M. Karp,
 R. H. Katz,
 D. A. Patterson
A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.
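As a rough illustration of the idea of a binary linear erasure code on a disk array, the sketch below stores one XOR check block per parity group and recovers erased blocks by repeatedly solving any parity equation with a single unknown. It is only a minimal sketch under stated assumptions: the function names (`xor_blocks`, `encode`, `reconstruct`) are hypothetical, and the 2x2 row-and-column parity layout is chosen for illustration; it is not claimed to be one of the paper's codes or its reconstruction method.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def encode(data, groups):
    """Append one check block per parity group (XOR of that group's data blocks)."""
    return list(data) + [xor_blocks([data[i] for i in g]) for g in groups]

def reconstruct(disks, groups, n_data):
    """Recover erased blocks (marked None) by repeatedly solving any
    parity equation that has exactly one unknown block."""
    disks = list(disks)
    # Each equation: the group's data disks plus its check disk XOR to zero.
    equations = [list(g) + [n_data + j] for j, g in enumerate(groups)]
    progress = True
    while progress and any(d is None for d in disks):
        progress = False
        for eq in equations:
            missing = [i for i in eq if disks[i] is None]
            if len(missing) == 1:
                known = [disks[i] for i in eq if i != missing[0]]
                disks[missing[0]] = xor_blocks(known)
                progress = True
    return disks  # blocks still None could not be recovered this way

# Assumed layout: 4 data disks in a 2x2 grid, one check disk per row and column.
groups = [[0, 1], [2, 3], [0, 2], [1, 3]]
data = [b"ab", b"cd", b"ef", b"gh"]
stored = encode(data, groups)          # disks 0-3 hold data, 4-7 hold parity

damaged = list(stored)
damaged[0] = damaged[1] = None         # two simultaneous disk failures
recovered = reconstruct(damaged, groups, n_data=4)
```

In this assumed layout each data disk belongs to two parity groups, so the two failed disks in the same row are each recovered through their column equations.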
 Journal

Algorithmica
Volume 12, Issue 2-3, pp 182-208
 Cover Date
 1994-09-01
 DOI
 10.1007/BF01185210
 Print ISSN
 0178-4617
 Online ISSN
 1432-0541
 Publisher
 Springer-Verlag
 Keywords

 Input/output architecture
 Redundant disk arrays
 RAID
 Error-correcting codes
 Reliability
 Availability
 Authors

 L. Hellerstein ^{(1)}
 G. A. Gibson ^{(2)}
 R. M. Karp ^{(3)}
 R. H. Katz ^{(3)}
 D. A. Patterson ^{(3)}
 Author Affiliations

 1. Department of Electrical Engineering and Computer Science, Northwestern University, 60208, Evanston, IL, USA
 2. School of Computer Science, Carnegie Mellon University, 15213-3890, Pittsburgh, PA, USA
 3. Computer Science Division, Electrical Engineering and Computer Sciences, University of California, 94720, Berkeley, CA, USA