SHARP: SMART HDD Anomaly Risk Prediction

  • Conference paper
Large-Scale Disk Failure Prediction (AI Ops 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1261)

Abstract

With the fast expansion of online media and cloud-based storage, hard disk drive failure prediction has become an increasingly important problem with great industry impact. In the last 20 years, much effort has been put into using machine learning methods to enhance the S.M.A.R.T. monitoring system. Success has been achieved to varying degrees, but state-of-the-art methods still fall considerably short of the level of performance required by industry operations. In this paper, we demonstrate that, with a strategic ensemble of models that cover both short-range and long-range temporal dependencies of S.M.A.R.T. data, it is possible to achieve higher overall failure prediction accuracy and robustness. Our proposed model, named SHARP, achieves a 56% F1 score in one of the holdout blind tests using an industry-scale data set. On the online competition test set, the F1 score was 38%.
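
As an illustration of the two ideas the abstract relies on, the sketch below blends per-disk risk scores from two models and evaluates the result with the F1 score (the harmonic mean of precision and recall on the failure class). It is not the SHARP implementation: the function names, the equal blending weight, the 0.5 threshold, and the toy scores are all assumptions made for the example, and the abstract does not state how the ensemble members are actually combined.

```python
from typing import List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (failure) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ensemble_predict(
    short_scores: Sequence[float],
    long_scores: Sequence[float],
    weight: float = 0.5,      # assumed blending weight, not from the paper
    threshold: float = 0.5,   # assumed decision threshold, not from the paper
) -> List[int]:
    """Blend risk scores from a hypothetical short-range model and a
    hypothetical long-range model, then threshold to a 0/1 failure label."""
    return [
        int(weight * s + (1.0 - weight) * l >= threshold)
        for s, l in zip(short_scores, long_scores)
    ]


# Toy data (made up for illustration): 1 = disk failed, 0 = healthy.
y_true = [1, 1, 0, 0, 1]
short_scores = [0.9, 0.2, 0.1, 0.7, 0.4]
long_scores = [0.8, 0.9, 0.2, 0.5, 0.3]

y_pred = ensemble_predict(short_scores, long_scores)
score = f1_score(y_true, y_pred)
```

In this toy case the second disk is flagged only because the long-range score is high, which is the kind of complementary coverage an ensemble is meant to provide; because failures are rare in practice, F1 is more informative than plain accuracy for this task.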

Corresponding author

Correspondence to Yang Xue.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, W., Xue, Y., Liu, P. (2020). SHARP: SMART HDD Anomaly Risk Prediction. In: He, C., Feng, M., Lee, P., Wang, P., Han, S., Liu, Y. (eds) Large-Scale Disk Failure Prediction. AI Ops 2020. Communications in Computer and Information Science, vol 1261. Springer, Singapore. https://doi.org/10.1007/978-981-15-7749-9_8

  • DOI: https://doi.org/10.1007/978-981-15-7749-9_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-7748-2

  • Online ISBN: 978-981-15-7749-9

  • eBook Packages: Computer Science, Computer Science (R0)
