Abstract
This paper describes our solution to the PAKDD2020 Alibaba AI Ops Competition: Large-Scale Disk Failure Prediction. Our approach is based on a Gradient Boosting Machine and deep feature engineering, and its most important part is our research on feature selection. For this competition, we propose a new feature selection method, built on the existing Null Importance method, which we name Noise Feature Selection (NFS). Using target permutation, NFS tests the actual significance of each feature against the whole distribution of feature importance obtained when the model is fitted to noise. Experiments demonstrate the effectiveness of NFS. In the competition task, we scored 0.2509 in the Qualification round and 0.1946 in the Semi-Finals.
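The abstract does not give the details of NFS, so the following is only a minimal sketch of the general target-permutation (Null Importance) idea it builds on, not the authors' method. To keep the example dependency-free, the absolute Pearson correlation stands in for a Gradient Boosting Machine's feature importance. All function names, parameters, and the synthetic data are assumptions made for illustration.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Stand-in importance score (assumption): absolute Pearson correlation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def null_importance_select(features, target, n_permutations=200, alpha=0.05, seed=0):
    """Keep features whose actual importance stands out from the null distribution.

    features: dict mapping a feature name to a list of values.
    target:   list of labels aligned with the feature columns.
    Returns a dict mapping each kept feature name to its empirical p-value.
    """
    rng = random.Random(seed)
    shuffled = list(target)
    null_scores = {name: [] for name in features}

    # Build the null distribution: importance measured against a shuffled target.
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        for name, column in features.items():
            null_scores[name].append(abs_correlation(column, shuffled))

    selected = {}
    for name, column in features.items():
        actual = abs_correlation(column, target)
        # Empirical p-value: share of null scores at least as large as the actual one.
        exceed = sum(score >= actual for score in null_scores[name])
        p_value = (exceed + 1) / (n_permutations + 1)
        if p_value <= alpha:
            selected[name] = p_value
    return selected


def make_demo_data(n_rows=400, seed=1):
    """Synthetic data (assumption): one informative column and one pure-noise column."""
    rng = random.Random(seed)
    target = [rng.randint(0, 1) for _ in range(n_rows)]
    signal = [t + rng.gauss(0, 0.5) for t in target]
    noise = [rng.gauss(0, 1) for _ in range(n_rows)]
    return {"signal": signal, "noise": noise}, target


features, target = make_demo_data()
selected = null_importance_select(features, target)
print(sorted(selected))
```

In a realistic setting the stand-in score would be replaced by the importance reported by a trained gradient boosting model, refitted once per permutation. The threshold `alpha` and the number of permutations are tuning choices, not values taken from the paper.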
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Peng, Y., Xu, J., Zhao, N. (2020). Noise Feature Selection Method in PAKDD 2020 Alibaba AI Ops Competition: Large-Scale Disk Failure Prediction. In: He, C., Feng, M., Lee, P., Wang, P., Han, S., Liu, Y. (eds) Large-Scale Disk Failure Prediction. AI Ops 2020. Communications in Computer and Information Science, vol 1261. Springer, Singapore. https://doi.org/10.1007/978-981-15-7749-9_11
DOI: https://doi.org/10.1007/978-981-15-7749-9_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7748-2
Online ISBN: 978-981-15-7749-9
eBook Packages: Computer Science, Computer Science (R0)