To Fail or Not to Fail: Predicting Hard Disk Drive Failure Time Windows

Züfle, Marwin; Krupitzer, Christian; Erhard, Florian; Grohmann, Johannes; Kounev, Samuel

doi:10.1007/978-3-030-43024-5_2

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12040)

Included in the following conference series:

International Conference on Measurement, Modelling and Evaluation of Computing Systems

Abstract

Due to the increasing size of today’s data centers and the expectation of 24/7 availability, the complexity of hardware administration continuously increases. Techniques such as the Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) support hardware monitoring. However, those techniques often lack algorithms for intelligent data analytics. In particular, integrating machine learning to identify potential failures in advance seems promising for reducing administration overhead. In this work, we present three machine learning approaches to (i) identify imminent failures, (ii) predict time windows for failures, and (iii) predict the exact time-to-failure. In a case study with real data from 369 hard disks, we achieve an F1-score of up to 98.0% and 97.6% for predicting potential failures with two or multiple time windows, respectively, and a hit rate of 84.9% (with a mean absolute error of 4.5 h) for predicting the time-to-failure.
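As a reading aid for the metrics named in the abstract, the following minimal Python sketch computes an F1-score from confusion counts and a mean absolute error from time-to-failure values. It uses the standard textbook definitions only. The function names and the example numbers are hypothetical and are not taken from the paper, and the sketch does not reproduce the authors' evaluation procedure or their definition of the hit rate.

```python
def f1_from_counts(tp, fp, fn):
    """F1-score: harmonic mean of precision and recall.

    tp, fp, fn are the counts of true positives, false positives,
    and false negatives for the class of interest (e.g. "will fail").
    """
    if tp == 0:
        return 0.0  # no correct positive prediction: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(actual_hours, predicted_hours):
    """Average absolute difference between actual and predicted time-to-failure."""
    if len(actual_hours) != len(predicted_hours) or not actual_hours:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) for a, p in zip(actual_hours, predicted_hours))
    return total / len(actual_hours)


# Hypothetical illustration values, not results from the paper.
example_f1 = f1_from_counts(tp=8, fp=2, fn=2)
example_mae = mean_absolute_error([10.0, 20.0, 30.0], [12.0, 17.0, 30.0])
```

With these made-up inputs, precision and recall are both 8/10, and the absolute errors are 2, 3, and 0 hours.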

Notes

1. Weak learners are classification methods that correlate only weakly with the true classification, while strong learners correlate very well with it.
2. Sampling with replacement means that an instance can be selected multiple times in the same sample.
3. Handling multiple classes is not strictly required for comparing data preparation methods for binary classification. However, it is necessary for the multi-class classification in Sect. 4.2 and to maintain comparability between the approaches.
4. This does not apply if the data set is large enough to remain sufficiently large after undersampling.
5. HDD data set: http://dsp.ucsd.edu/~jfmurray/software.htm.
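Notes 2 and 4 refer to sampling with replacement and to undersampling. The following Python sketch illustrates both ideas with the standard library only. It is a generic illustration under stated assumptions, not the authors' data preparation: the function names, the class labels "good" and "failed", and the toy data are all hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(instances, rng):
    """Draw len(instances) items with replacement; duplicates are allowed."""
    return [rng.choice(instances) for _ in instances]


def random_undersample(instances, labels, rng):
    """Randomly drop instances so that every class keeps as many
    instances as the rarest class. Returns (instance, label) pairs."""
    by_class = {}
    for x, y in zip(instances, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    kept = []
    for y, xs in by_class.items():
        # rng.sample draws without replacement within one class
        kept.extend((x, y) for x in rng.sample(xs, n_min))
    return kept


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy data: 10 instances of a majority class, 2 of a minority class.
toy_instances = list(range(12))
toy_labels = ["good"] * 10 + ["failed"] * 2

boot = bootstrap_sample(toy_instances, rng)
balanced = random_undersample(toy_instances, toy_labels, rng)
class_counts = Counter(label for _, label in balanced)
```

In the toy data, undersampling shrinks twelve instances to four, which shows why undersampling is a concern for small data sets, as note 4 points out.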

Acknowledgements

This work was co-funded by the German Research Foundation (DFG) under grant No. KO 3445/11-1 and the IHK (Industrie- und Handelskammer) Würzburg-Schweinfurt.

Author information

Authors and Affiliations

Software Engineering Group, University of Würzburg, Würzburg, Germany
Marwin Züfle, Christian Krupitzer, Florian Erhard, Johannes Grohmann & Samuel Kounev

Corresponding author

Correspondence to Marwin Züfle.

Editor information

Editors and Affiliations

Saarland University, Saarbrücken, Germany
Holger Hermanns

About this paper

Cite this paper

Züfle, M., Krupitzer, C., Erhard, F., Grohmann, J., Kounev, S. (2020). To Fail or Not to Fail: Predicting Hard Disk Drive Failure Time Windows. In: Hermanns, H. (ed.) Measurement, Modelling and Evaluation of Computing Systems. MMB 2020. Lecture Notes in Computer Science, vol 12040. Springer, Cham. https://doi.org/10.1007/978-3-030-43024-5_2

DOI: https://doi.org/10.1007/978-3-030-43024-5_2
Published: 09 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43023-8
Online ISBN: 978-3-030-43024-5
eBook Packages: Computer Science, Computer Science (R0)
