Coding techniques for handling failures in large disk arrays

Hellerstein, L.; Gibson, G. A.; Karp, R. M.; Katz, R. H.; Patterson, D. A.

doi:10.1007/BF01185210

Published: September 1994

Algorithmica, Volume 12, pages 182–208 (1994)

Abstract

A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.
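
To make the idea of erasure recovery with a binary linear code concrete, the following is a minimal sketch, not the paper's own construction. It assumes each check disk stores the XOR of one group of data disks (one row of a parity check matrix), and it rebuilds failed disks by repeatedly finding a group with exactly one missing member. The names `xor_blocks`, `encode`, `reconstruct`, and the `groups` layout are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data, groups):
    """Compute one check block per parity group.

    data   -- list of equal-length byte strings, one per data disk
    groups -- list of lists of data-disk indices (assumed layout)
    """
    return [xor_blocks([data[i] for i in group]) for group in groups]


def reconstruct(data, checks, groups):
    """Fill erased blocks (marked None) where possible.

    Repeatedly looks for a parity group in which exactly one block
    (data or check) is missing and recovers it as the XOR of the rest.
    Blocks that cannot be recovered this way stay None.
    """
    data, checks = list(data), list(checks)
    progress = True
    while progress:
        progress = False
        for j, group in enumerate(groups):
            members = [data[i] for i in group] + [checks[j]]
            missing = [k for k, block in enumerate(members) if block is None]
            if len(missing) != 1:
                continue
            value = xor_blocks([b for b in members if b is not None])
            k = missing[0]
            if k < len(group):
                data[group[k]] = value   # a data disk was erased
            else:
                checks[j] = value        # the check disk was erased
            progress = True
    return data, checks


# Four data disks arranged as a 2x2 grid with row and column parity
# (an assumed example layout).
data = [b"ab", b"cd", b"ef", b"gh"]
groups = [[0, 1], [2, 3], [0, 2], [1, 3]]
checks = encode(data, groups)

# Disks 0 and 1 fail together; the column groups recover them.
recovered, _ = reconstruct([None, None, b"ef", b"gh"], checks, groups)
```

In this layout every data disk belongs to two groups, so any two simultaneous disk failures leave at least one group with a single missing block. Which group structures are best under given reliability and performance constraints is the combinatorial question the abstract describes.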

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA
L. Hellerstein
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
G. A. Gibson
Computer Science Division, Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
R. M. Karp, R. H. Katz & D. A. Patterson

Additional information

Communicated by Jeffrey Scott Vitter.

This paper is a revised and expanded version of material that appeared at the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), Boston, MA, March 1989. The work here was supported in part by the National Science Foundation under Grant Numbers MIP-8715235 and CCR-8411954, as well as an AT&T Bell Labs GRPW grant, a Siemens Corporation grant, and an IBM graduate fellowship.

About this article

Cite this article

Hellerstein, L., Gibson, G.A., Karp, R.M. et al. Coding techniques for handling failures in large disk arrays. Algorithmica 12, 182–208 (1994). https://doi.org/10.1007/BF01185210

Received: 21 February 1991
Revised: 11 November 1992
Issue Date: September 1994
DOI: https://doi.org/10.1007/BF01185210
