PAKDD2020 Alibaba AI Ops Competition: Large-Scale Disk Failure Prediction

  • Conference paper
Large-Scale Disk Failure Prediction (AI Ops 2020)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1261))

Abstract

Failure prediction for storage systems plays an increasingly important role given the explosive growth of data in data centers in recent years. In this paper, the existing technologies employed for this prediction task are reviewed. In particular, techniques such as imbalanced-data alleviation and temporal feature construction, which are also adopted in our solution, are reviewed in more detail. Our solution to the prediction problem, which is mainly built upon LightGBM, is then presented. The solution ranks 38th with an \(F_1\)-score of 34.28% in the PAKDD2020 Alibaba AI Ops Competition.

Acknowledgment

This work is supported by the National Natural Science Foundation of China under grants 61572408 and 61972326, and by Xiamen University under grant 20720180074.

Author information

Corresponding author

Correspondence to Run-Qing Chen.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, RQ. (2020). PAKDD2020 Alibaba AI Ops Competition: Large-Scale Disk Failure Prediction. In: He, C., Feng, M., Lee, P., Wang, P., Han, S., Liu, Y. (eds) Large-Scale Disk Failure Prediction. AI Ops 2020. Communications in Computer and Information Science, vol 1261. Springer, Singapore. https://doi.org/10.1007/978-981-15-7749-9_7

  • DOI: https://doi.org/10.1007/978-981-15-7749-9_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-7748-2

  • Online ISBN: 978-981-15-7749-9

  • eBook Packages: Computer Science, Computer Science (R0)
