Recurrent neural network based real-time failure detection of storage devices

  • Chuan-Jun Su
  • Yi Li
Technical Paper


Studies have revealed that the failure rates of storage devices can often be as high as fourteen percent. To make matters worse, storage devices frequently show no warning signs before a catastrophic failure occurs. This motivates a real-time predictive maintenance system that automatically predicts when maintenance should be performed, with the goal of eliminating unexpected breakdowns. Unlike traditional regression-based predictive modeling, failure detection for storage devices is a time series prediction problem, which adds the complexity of sequence dependence among the input variables. The LSTM (Long Short-Term Memory) network is a type of RNN (Recurrent Neural Network) used in deep learning; it allows very large architectures to be trained successfully. LSTM is good at extracting patterns from an input feature space in which the data span long sequences. Its gated architecture lets it learn the context required to make predictions in time series forecasting, which makes it well suited to generating responses that depend on a time-evolving state, such as the condition of a storage device over time. This paper describes our development of an LSTM-based real-time predictive maintenance system (RPMS), built on top of Apache Spark, for detecting storage device failure. By streaming real-time data into the RPMS directly from the device itself, issues can be revealed and addressed early, before they cause costly downtime.
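To make the idea of a gated, state-carrying network concrete, the following is a minimal sketch of a standard LSTM cell applied to a window of device readings. It is not the implementation described in this paper: the function names (`lstm_step`, `failure_score`, `init_params`), the three-feature input, the hidden size, and the randomly initialised, untrained weights are all assumptions chosen for illustration, and the example uses only the Python standard library rather than Apache Spark or a deep learning framework.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(W, x, U, h, b):
    """Compute W @ x + U @ h + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b_i
        for w_row, u_row, b_i in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a standard LSTM cell (input, forget, output gates)."""
    H = len(h_prev)
    z = affine(params["W"], x_t, params["U"], h_prev, params["b"])
    i = [sigmoid(v) for v in z[0:H]]             # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]         # forget gate
    o = [sigmoid(v) for v in z[2 * H:3 * H]]     # output gate
    g = [math.tanh(v) for v in z[3 * H:4 * H]]   # candidate cell update
    # The cell state keeps part of the old memory and adds gated new content.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


def failure_score(window, params):
    """Run the cell over a window of readings and map the final state to (0, 1)."""
    H = len(params["w_out"])
    h, c = [0.0] * H, [0.0] * H
    for x_t in window:  # order matters: state is carried from step to step
        h, c = lstm_step(x_t, h, c, params)
    logit = sum(w * hk for w, hk in zip(params["w_out"], h)) + params["b_out"]
    return sigmoid(logit)


def init_params(n_features, hidden, seed=0):
    """Hypothetical random (untrained) weights, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W": mat(4 * hidden, n_features),
        "U": mat(4 * hidden, hidden),
        "b": [0.0] * (4 * hidden),
        "w_out": [rng.gauss(0.0, 0.1) for _ in range(hidden)],
        "b_out": 0.0,
    }


if __name__ == "__main__":
    params = init_params(n_features=3, hidden=4)
    # Five consecutive, already-normalised readings from one hypothetical device.
    window = [[0.1, 0.2, 0.3]] * 5
    print(failure_score(window, params))
```

Because the weights are untrained, the printed score carries no meaning; in a real system they would be fitted to labelled device histories, and each newly streamed reading would advance the hidden and cell state by one `lstm_step` call.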




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Industrial Engineering and Management, Yuan Ze University, 135, Chung-Li, Taiwan, ROC
