Abstract
This chapter analyzes the error behavior of a 3.2 TB disk storage system. We report reliability data for 18 months of the prototype’s operation and analyze 6 months of error logs from its nodes. The disk drives were among the most reliable components in the system. We divided the errors into eleven categories, comprising disk errors, network errors, and SCSI errors that appeared repeatedly across all nodes. The logs also gave insight into the types of error messages that devices report under various conditions and into the effects of these events on the operating system. Finally, we present data from four cases of disk drive failure. These results and insights should be useful to any designer of a fault-tolerant storage system.
Copyright information
© 2000 Springer Science+Business Media New York
About this chapter
Cite this chapter
Talagala, N., Patterson, D. (2000). Failure Characteristics and Soft Error Behavior in a Large Storage System. In: Avresky, D.R. (eds) Dependable Network Computing. The Springer International Series in Engineering and Computer Science, vol 538. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-4549-1_2
DOI: https://doi.org/10.1007/978-1-4615-4549-1_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7053-6
Online ISBN: 978-1-4615-4549-1
eBook Packages: Springer Book Archive