
Noise Feature Selection Method in PAKDD 2020 Alibaba AI Ops Competition: Large-Scale Disk Failure Prediction

  • Conference paper, Large-Scale Disk Failure Prediction (AI Ops 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1261)


Abstract

This paper describes our method for the PAKDD 2020 Alibaba AI Ops Competition: Large-Scale Disk Failure Prediction. Our approach is based on a Gradient Boosting Machine and deep feature engineering; its most important component is a feature selection method. For this competition we proposed a new feature selection method, Noise Feature Selection (NFS), which builds on the existing Null Importance method. To avoid fitting to noise, NFS uses target permutation to test the significance of each feature's actual importance against the whole distribution of feature importances. Experiments demonstrate the effectiveness of NFS. In the competition task, we scored 0.2509 in the Qualification round and 0.1946 in the Semi-Finals.
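The target-permutation test that Null Importance (and therefore NFS) relies on can be sketched in a few lines. This is a minimal, dependency-free sketch under stated assumptions, not the authors' NFS implementation: the helper names `importance`, `null_importance_pvalues` and `select_features` are hypothetical, absolute Pearson correlation stands in for Gradient Boosting Machine feature importance, and the empirical p-value with add-one smoothing and the 0.05 threshold are illustrative choices. Each feature's actual importance is compared with the importances obtained after repeatedly shuffling the target, which destroys any real feature/target relationship.

```python
import math
import random


def importance(feature, target):
    """Stand-in importance score: absolute Pearson correlation.

    Assumption: in practice a Gradient Boosting Machine importance
    (e.g. split gain) would be used; correlation keeps this sketch
    dependency-free.
    """
    n = len(feature)
    mx = sum(feature) / n
    my = sum(target) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(feature, target))
    vx = sum((a - mx) ** 2 for a in feature)
    vy = sum((b - my) ** 2 for b in target)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov / math.sqrt(vx * vy))


def null_importance_pvalues(columns, target, n_permutations=200, seed=0):
    """Empirical p-value of each feature's actual importance against the
    null distribution built by permuting the target."""
    rng = random.Random(seed)
    actual = {name: importance(col, target) for name, col in columns.items()}
    # Count permutations whose null importance reaches the actual importance.
    exceed = {name: 0 for name in columns}
    shuffled = list(target)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real feature/target relationship
        for name, col in columns.items():
            if importance(col, shuffled) >= actual[name]:
                exceed[name] += 1
    # Add-one smoothing keeps every p-value strictly positive.
    return {name: (exceed[name] + 1) / (n_permutations + 1) for name in columns}


def select_features(columns, target, alpha=0.05, **kwargs):
    """Hypothetical selection rule: keep features significant at level alpha."""
    pvalues = null_importance_pvalues(columns, target, **kwargs)
    return [name for name, p in pvalues.items() if p < alpha]


if __name__ == "__main__":
    # Toy data (not competition data): one informative and one pure-noise feature.
    rng = random.Random(42)
    y = [rng.randint(0, 1) for _ in range(300)]
    columns = {
        "informative": [label + rng.gauss(0, 0.3) for label in y],
        "noise": [rng.gauss(0, 1) for _ in y],
    }
    print(null_importance_pvalues(columns, y))
    print(select_features(columns, y))
```

Swapping `importance` for a model's feature importances means retraining the model once per permutation, so the number of permutations trades statistical resolution against compute: with 200 permutations the smallest attainable p-value is 1/201.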

Notes

  1. https://github.com/alibaba-edu/dcbrain/tree/master/diskdata

Author information

Correspondence to Yifan Peng.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Peng, Y., Xu, J., Zhao, N. (2020). Noise Feature Selection Method in PAKDD 2020 Alibaba AI Ops Competition: Large-Scale Disk Failure Prediction. In: He, C., Feng, M., Lee, P., Wang, P., Han, S., Liu, Y. (eds) Large-Scale Disk Failure Prediction. AI Ops 2020. Communications in Computer and Information Science, vol 1261. Springer, Singapore. https://doi.org/10.1007/978-981-15-7749-9_11

  • DOI: https://doi.org/10.1007/978-981-15-7749-9_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-7748-2

  • Online ISBN: 978-981-15-7749-9

  • eBook Packages: Computer Science, Computer Science (R0)
