Abstract
Failure prediction for storage systems plays an increasingly important role given the explosive growth of data in data centers in recent years. In this paper, existing techniques employed for failure prediction are reviewed. In particular, techniques such as data imbalance alleviation and temporal feature construction, which are also adopted in our solution, are reviewed in more detail. Our solution to the prediction problem, which is built mainly upon LightGBM, is then presented. The solution ranks 38th with an \(F_1\)-score of 34.28% in the PAKDD2020 Alibaba AI Ops Competition.
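To make the two techniques named in the abstract concrete, a minimal Python sketch is given here. It is not the solution described in this chapter: the function names (`window_features`, `undersample`, `f1_score`), the window length, and the sampling ratio are hypothetical choices for illustration, and the \(F_1\) computed is the standard definition, which may differ from the competition's scoring rule.

```python
import random


def window_features(series, window=3):
    """Temporal features for one attribute of one disk (hypothetical design).

    For each day t, returns (current value, change since the start of the
    trailing window, mean over the trailing window).
    """
    feats = []
    for t, value in enumerate(series):
        start = max(0, t - window + 1)
        recent = series[start:t + 1]
        feats.append((value, value - recent[0], sum(recent) / len(recent)))
    return feats


def undersample(samples, labels, ratio=1, seed=0):
    """Keep every positive sample and at most `ratio` negatives per positive."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), ratio * len(pos)))
    keep = sorted(pos + keep_neg)
    return [samples[i] for i in keep], [labels[i] for i in keep]


def f1_score(y_true, y_pred):
    """Standard F1: harmonic mean of precision and recall on the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, the windowed differences expose how quickly an error counter is growing rather than only its current level, and undersampling reduces the dominance of healthy-disk samples before a classifier such as LightGBM is trained on the result.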
Acknowledgment
This work is supported by the National Natural Science Foundation of China under grants 61572408 and 61972326, and by Xiamen University under grant 20720180074.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chen, RQ. (2020). PAKDD2020 Alibaba AI Ops Competition: Large-Scale Disk Failure Prediction. In: He, C., Feng, M., Lee, P., Wang, P., Han, S., Liu, Y. (eds) Large-Scale Disk Failure Prediction. AI Ops 2020. Communications in Computer and Information Science, vol 1261. Springer, Singapore. https://doi.org/10.1007/978-981-15-7749-9_7
DOI: https://doi.org/10.1007/978-981-15-7749-9_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7748-2
Online ISBN: 978-981-15-7749-9
eBook Packages: Computer Science, Computer Science (R0)