Coding techniques for handling failures in large disk arrays
 L. Hellerstein,
 G. A. Gibson,
 R. M. Karp,
 R. H. Katz,
 D. A. Patterson
Abstract
A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.
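To make the idea of an erasure-correcting binary linear code concrete, here is a minimal sketch of XOR-based encoding and reconstruction. It is an illustration only and is not taken from the paper: the eight-disk layout, the `PARITY_GROUPS` table, and the function names are all hypothetical. Each parity group plays the role of one row of a parity check matrix, listing disks whose contents XOR to zero.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical layout (an assumption for illustration): data disks 0..3
# arranged in a 2x2 grid, check disks 4..7. The last entry of each group
# is the check disk; all disks in a group XOR to zero.
PARITY_GROUPS = [
    [0, 1, 4],  # row 0 parity
    [2, 3, 5],  # row 1 parity
    [0, 2, 6],  # column 0 parity
    [1, 3, 7],  # column 1 parity
]

def encode(data):
    """Return the 8-disk array for four equal-length data blocks."""
    disks = list(data) + [None] * 4
    for group in PARITY_GROUPS:
        *members, check = group
        disks[check] = xor_blocks([disks[m] for m in members])
    return disks

def reconstruct(disks):
    """Fill in erased disks (marked None) in place by repeatedly solving
    any parity group that has exactly one missing member.
    Returns True if every disk was recovered."""
    progress = True
    while progress and any(d is None for d in disks):
        progress = False
        for group in PARITY_GROUPS:
            missing = [i for i in group if disks[i] is None]
            if len(missing) == 1:
                survivors = [disks[i] for i in group if i != missing[0]]
                disks[missing[0]] = xor_blocks(survivors)
                progress = True
    return all(d is not None for d in disks)
```

In this particular layout, erasing any two disks leaves at least one parity group with a single missing member, so the loop recovers both. Erasing a data disk together with both of its check disks (for example disks 0, 4, and 6) leaves no such group, and `reconstruct` returns False.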
 Journal

Algorithmica
Volume 12, Issue 2-3, pp 182-208
 Cover Date
 1994-09-01
 DOI
 10.1007/BF01185210
 Print ISSN
 0178-4617
 Online ISSN
 1432-0541
 Publisher
 Springer-Verlag
 Keywords

 Input/output architecture
 Redundant disk arrays
 RAID
 Error-correcting codes
 Reliability
 Availability
 Authors

 L. Hellerstein ^{(1)}
 G. A. Gibson ^{(2)}
 R. M. Karp ^{(3)}
 R. H. Katz ^{(3)}
 D. A. Patterson ^{(3)}
 Author Affiliations

 1. Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA
 2. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
 3. Computer Science Division, Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA