Abstract
As the volume of data in data centers grows, hard disk drives are deployed at ever larger scale. In practice, however, hard disk failures occur frequently. As operating time increases, the stability and accuracy of a hard disk steadily decline, which can disrupt the normal operation of the system. Despite this, there has been no industry-wide research on estimating hard disk quality. In this article, we use Generative Adversarial Networks (GANs) for data augmentation and a CatBoost model to predict disk failure. This approach achieved tenth place in the PAKDD2020 Alibaba Intelligent Operation and Maintenance Algorithm Competition on large-scale hard disk failure prediction [1].
References
Botezatu, M.M., Giurgiu, I., Bogojeska, J., et al.: Predicting disk replacement towards reliable data centers. In: the 22nd ACM SIGKDD International Conference. ACM (2016)
Vishwanath, K.V., Nagappan, N.: Characterizing cloud computing hardware reliability. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 193–204 (2010)
Shen, J., Wan, J., Lim, S.J., et al.: Random-forest-based failure prediction for hard disk drives. Int. J. Distrib. Sensor Netw. 14(11) (2018)
Jiang, T., Zeng, J., Zhou, K., et al.: Lifelong disk failure prediction via GAN-based anomaly detection. In: 2019 IEEE 37th International Conference on Computer Design (ICCD), pp. 199–207. IEEE (2019)
Kamarthi, S., Zeid, A., Bagul, Y.: Assessment of current health of hard disk drives. In: Proceedings of the Fifth Annual IEEE International Conference on Automation Science and Engineering. IEEE (2009)
Murray, J.F., Hughes, G.F., Kreutz-Delgado, K.: Machine learning methods for predicting failures in hard drives: a multiple-instance application. J. Mach. Learn. Res. 6(1), 783–816 (2005)
Xiao, J., Xiong, Z., Wu, S., Yi, Y., Jin, H., Hu, K.: Disk failure prediction in data centers via online learning. In: ICPP 2018: Proceedings of the 47th International Conference on Parallel Processing, pp. 1–10 (2018)
Li, J., Ji, X., Jia, Y., et al.: Hard drive failure prediction using classification and regression trees. In: 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 383–394. IEEE (2014)
Yang, W., Hu, D., Liu, Y., et al.: Hard drive failure prediction using big data. In: IEEE Symposium on Reliable Distributed Systems Workshop. IEEE (2015)
Xu, Y., Sui, K., Yao, R., et al.: Improving service availability of cloud systems by predicting disk error. In: 2018 USENIX Annual Technical Conference. USENIX (2018)
Pinheiro, E., Weber, W.-D., Barroso, L.A.: Failure trends in a large disk drive population. In: USENIX Conference on File and Storage Technologies, San Jose. USENIX (2007)
Gunawi, H.S., Hao, M., Suminto, R.O., et al.: Why does the cloud stop computing?: lessons from hundreds of service outages. In: Seventh ACM Symposium. ACM (2016)
Han, S., Wu, J., Xu, E., et al.: Robust data preprocessing for machine-learning-based disk failure prediction in cloud production environments. arXiv preprint arXiv:1912.09722 (2019)
Stoer, J., Bulirsch, R.: Introduction to Numerical Analysis, vol. 12, 3rd edn. Springer, New York (2002). https://doi.org/10.1007/978-0-387-21738-3
Zhu, B., Wang, G., Liu, X., et al.: Proactive drive failure prediction for large scale storage systems. In: 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–5. IEEE (2013)
He, H., Ma, Y.: Imbalanced Learning. Foundations, Algorithms, and Applications. Wiley, Hoboken (2013)
Hughes, G.F., Murray, J.F., Kreutz-Delgado, K., Elkan, C.: Improved disk-drive failure warnings. IEEE Trans. Reliab. 51(3), 350–357 (2002)
Murray, J.F., Hughes, G.F., Kreutz-Delgado, K.: Hard drive failure prediction using non-parametric statistical methods. In: ICANN/ICONIP 2003 Proceedings of Joint International Conference on Artificial Neural Networks and Neural Information Processing. Springer, Heidelberg (2003)
Hamerly, G., Elkan, C.: Bayesian approaches to failure prediction for disk drives. In: ICML 2001 Proceedings of the Eighteenth International Conference on Machine Learning, pp. 202–209 (2001)
Murray, J.F., Hughes, G.F., Kreutz-Delgado, K.L.: Machine learning methods for predicting failures in hard drives: a multiple-instance application. J. Mach. Learn. Res. 6, 783–816 (2005)
Ma, A., Douglis, F., Lu, G., et al.: RAIDShield: characterizing, monitoring, and proactively protecting against disk failures. In: Proceedings of the 13th USENIX Conference on File and Storage Technologies, pp. 241–256. USENIX Association (2015)
Qian, J., Skelton, S., Moore, J., et al.: P3: priority based proactive prediction for soon-to-fail disks. In: 2015 IEEE International Conference on Networking, Architecture and Storage (NAS), pp. 81–86. IEEE (2015)
Wang, Y., Miao, Q., Ma, E.W., et al.: Online anomaly detection for hard disk drives based on Mahalanobis distance. IEEE Trans. Reliab. 62(1), 136–145 (2013)
Wang, Y., Ma, E.W., Chow, T.W., et al.: A two-step parametric method for failure prediction in hard disk drives. IEEE Trans. Industr. Inf. 10(1), 419–430 (2014)
Queiroz, L.P., Rodrigues, F.C.M., Gomes, J.P.P., et al.: A fault detection method for hard disk drives based on mixture of Gaussians and nonparametric statistics. IEEE Trans. Industr. Inf. 13(2), 542–550 (2017)
Xu, C., Wang, G., Liu, X.G., et al.: Health status assessment and failure prediction for hard drives with recurrent neural networks. IEEE Trans. Comput. 13(9), 1–8 (2015)
Lu, S., Luo, B., Patel, T., et al.: Making disk failure predictions SMARTer! In: 18th USENIX Conference on File and Storage Technologies, pp. 151–167. USENIX (2020)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
Prokhorenkova, L., Gusev, G., Vorobev, A., et al.: CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems, pp. 6638–6648 (2018)
Acknowledgments
This work was supported by the National Key R&D Program of China (2018YFC0832103, 2018YFC0831000, 2018YFC0832101) and the National Social Science Foundation of China (15BGL035).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, Q. et al. (2020). Tree-Based Model with Advanced Data Preprocessing for Large Scale Hard Disk Failure Prediction. In: He, C., Feng, M., Lee, P., Wang, P., Han, S., Liu, Y. (eds) Large-Scale Disk Failure Prediction. AI Ops 2020. Communications in Computer and Information Science, vol 1261. Springer, Singapore. https://doi.org/10.1007/978-981-15-7749-9_9
DOI: https://doi.org/10.1007/978-981-15-7749-9_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7748-2
Online ISBN: 978-981-15-7749-9
eBook Packages: Computer Science, Computer Science (R0)