Abstract
With the rapid expansion of online media and cloud-based storage, hard disk drive failure prediction has become an increasingly important problem with significant industry impact. Over the last 20 years, much effort has gone into using machine learning methods to enhance the S.M.A.R.T monitoring system. Success has been achieved to varying degrees, but state-of-the-art methods still fall considerably short of the performance required for industry operations. In this paper, we demonstrate that a strategic ensemble of models covering both short-range and long-range temporal dependencies in S.M.A.R.T data can achieve higher overall failure prediction accuracy and robustness. Our proposed model, named SHARP, achieves an F1 score of 56% in one of the holdout blind tests on an industry-scale data set. On the online competition test set, the F1 score was 38%.
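To make the reported metric concrete, the following is a minimal sketch of how an F1 score can be computed for a blended prediction from two models. It is not the SHARP implementation: the function names, the equal-weight averaging, the 0.5 threshold, and the toy scores are all assumptions made for illustration. The two score lists merely stand in for the outputs of a hypothetical short-range and a hypothetical long-range model.

```python
from typing import List, Sequence


def ensemble_predict(short_scores: Sequence[float],
                     long_scores: Sequence[float],
                     weight: float = 0.5,
                     threshold: float = 0.5) -> List[int]:
    """Blend two failure-probability lists and threshold the result.

    The weighted average is an illustrative assumption, not the
    ensembling rule described in the paper.
    """
    return [int(weight * s + (1.0 - weight) * l >= threshold)
            for s, l in zip(short_scores, long_scores)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 1 = drive failed, 0 = drive healthy.
y_true = [1, 1, 0, 0, 1]
short_scores = [0.9, 0.2, 0.1, 0.6, 0.4]  # assumed short-range model output
long_scores = [0.8, 0.7, 0.2, 0.3, 0.3]   # assumed long-range model output

y_pred = ensemble_predict(short_scores, long_scores)
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because failed drives are rare compared with healthy ones, F1 is a more informative summary than plain accuracy: it ignores true negatives and penalizes both missed failures and false alarms.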
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, W., Xue, Y., Liu, P. (2020). SHARP: SMART HDD Anomaly Risk Prediction. In: He, C., Feng, M., Lee, P., Wang, P., Han, S., Liu, Y. (eds) Large-Scale Disk Failure Prediction. AI Ops 2020. Communications in Computer and Information Science, vol 1261. Springer, Singapore. https://doi.org/10.1007/978-981-15-7749-9_8
DOI: https://doi.org/10.1007/978-981-15-7749-9_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7748-2
Online ISBN: 978-981-15-7749-9
eBook Packages: Computer Science (R0)