Reinforcement learning for predictive maintenance: a systematic technical review

Artificial Intelligence Review

Abstract

The manufacturing world is subject to ever-increasing cost-optimization pressures. Maintenance adds to cost and disrupts production; optimized maintenance is therefore of utmost interest. As an autonomous learning mechanism, reinforcement learning (RL) is increasingly used to solve complex tasks. While designing an optimal, model-free RL solution for predictive maintenance (PdM) is an attractive proposition, several key steps and design elements must be considered, from modeling the degradation of the physical equipment to creating RL formulations. In this article, we survey how researchers have applied RL to optimally predict maintenance in diverse forms, from early diagnosis to computing a “health index” to directly suggesting a maintenance action. Contributions of this article include a taxonomy for PdM techniques in general and one specifically for RL applied to PdM. We discovered and studied unique techniques and applications by applying tf-idf (term frequency-inverse document frequency, a text-mining technique). Furthermore, we systematically studied how researchers have mathematically formulated RL concepts and included detailed case studies that demonstrate the complete flow of applying RL to PdM. Finally, in Sect. 14, we summarize the insights for researchers and, for the industrial practitioner, lay out a simple approach for implementing RL for PdM.
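The abstract names tf-idf as the text-mining step used to surface distinctive techniques and applications. The review's exact tokenization and weighting are not stated here, so the following is only a minimal sketch of the textbook, unsmoothed variant (tf = term count / document length, idf = log(N / df)). The function names `tf_idf` and `top_terms` and the sample titles are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: tf-idf weight} dict per document (unsmoothed variant)."""
    # Naive whitespace tokenization; a real pipeline would also strip punctuation and stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        if not tokens:
            weights.append({})  # empty document: no terms to weight
            continue
        counts = Counter(tokens)
        weights.append({
            # tf = relative frequency in this document; idf = log(N / df)
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weights: list[dict[str, float]], k: int = 3) -> list[list[str]]:
    """Highest-weighted terms per document; ties are broken alphabetically."""
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in weights]


if __name__ == "__main__":
    # Hypothetical paper titles standing in for a literature corpus.
    corpus = [
        "q-learning maintenance scheduling for production lines",
        "deep q-network fault diagnosis for bearings",
        "ppo maintenance for wind farm crews",
    ]
    for title, terms in zip(corpus, top_terms(tf_idf(corpus))):
        print(f"{title!r}: {terms}")
```

A term that occurs in every document (here "for") gets idf = log(1) = 0, so it never ranks as distinctive, while terms unique to one document get the largest idf. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed idf and L2 normalization by default, so their weights differ from this sketch.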

Figs. 1–20 (images not reproduced; Fig. 15 enhanced from source: Hui (2021))

Notes

  1. Sinha (2021) reports 12.3 billion connected devices as of 2021 and estimates that 27.0 billion will be connected by 2025.

  2. As of July 2022, combined over Scopus, Web of Science, and IEEE Xplore.

  3. The R Project for Statistical Computing: https://www.r-project.org/.

  4. A variant of PM is “opportunistic” maintenance: when a machine fails before \(\tau\), it receives CM and all other machines receive PM.

  5. For certain RL problems, for example when the reward is stochastic, this definition may not strictly apply. For the problem to be solvable, it is sufficient that the expected value of the reward is a function of s, a, and \(s^{\prime}\).

  6. Not all RL algorithms are value-based; some estimate the return over long trajectories governed by policies (Swazinna et al. 2022; Schaefer et al. 2007). In general, however, they all seek to satisfy the Bellman optimality condition.

  7. The Bellman equation is designed to consider all state-action pairs. With partial, sampled data, using it as a surrogate objective for value prediction is a poor choice (Fujimoto et al. 2022).

  8. As of 09-Feb-2023, the search “‘reinforcement learning’ AND ‘predictive maintenance’ AND (PID OR MPC)” did not return any results on the Scopus and Web of Science databases.

  9. This statistical model is different from the MDP “model” we refer to in the model-based/model-free RL.

  10. This assists in stabilizing learning in neural networks.
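Notes 5 and 6 refer to a reward defined over \((s, a, s^{\prime})\) and to the Bellman optimality condition that RL methods try to satisfy. As a minimal, model-based sketch (unlike the model-free RL the review focuses on, it assumes the transition probabilities are known), the value iteration below repeatedly applies the Bellman optimality backup \(V(s) \leftarrow \max_a \sum_{s^{\prime}} P(s^{\prime} \mid s, a)\,[r(s, a, s^{\prime}) + \gamma V(s^{\prime})]\) to a hypothetical three-state degradation model. The states, transition probabilities, PM/CM costs, and discount factor are invented for illustration and are not taken from the review.

```python
# Hypothetical toy degradation model; every number below is an illustrative assumption.
STATES = (0, 1, 2)            # 0 = good, 1 = worn, 2 = failed
ACTIONS = ("run", "maintain")
GAMMA = 0.9                   # discount factor


def transitions(s: int, a: str) -> list[tuple[int, float]]:
    """Return (next_state, probability) pairs for P(s' | s, a)."""
    if a == "maintain":
        return [(0, 1.0)]                    # maintenance restores the machine to 'good'
    if s == 0:
        return [(0, 0.9), (1, 0.1)]          # a good machine may wear
    if s == 1:
        return [(1, 0.7), (2, 0.3)]          # a worn machine may fail
    return [(2, 1.0)]                        # a failed machine stays failed if left running


def reward(s: int, a: str, s_next: int) -> float:
    """Deterministic reward r(s, a, s')."""
    if a == "maintain":
        return -60.0 if s == 2 else -20.0    # corrective (CM) vs preventive (PM) cost
    return -100.0 if s_next == 2 else 10.0   # failure penalty vs production revenue


def q_value(V: list[float], s: int, a: str) -> float:
    """Bellman backup: sum over s' of P(s'|s,a) * (r(s,a,s') + GAMMA * V(s'))."""
    return sum(p * (reward(s, a, s2) + GAMMA * V[s2]) for s2, p in transitions(s, a))


def value_iteration(tol: float = 1e-10) -> tuple[list[float], list[str]]:
    """Iterate V(s) <- max_a Q(s, a) until the largest update falls below tol."""
    V = [0.0 for _ in STATES]
    while True:
        new_V = [max(q_value(V, s, a) for a in ACTIONS) for s in STATES]
        delta = max(abs(new - old) for new, old in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES]
    return V, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for s in STATES:
        print(f"state {s}: V = {values[s]:.3f}, action = {policy[s]}")
```

With these assumed numbers the greedy policy is run, maintain, maintain: a good machine keeps producing, while a worn machine receives PM before it can fail, because the expected failure penalty outweighs the PM cost. Other cost or transition assumptions can yield a different policy.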

References

  • Abernethy RB (2018) Dr. E. H. Waloddi Weibull. http://km.fgg.uni-lj.si/PREDMETI/sei/Ljudje/weibull.htm

  • Abudali M, Siegel D (2021) A pressing case for predictive analytics at Maclean–Fogg. https://www.plantengineering.com/articles/a-pressing-case-for-predictive-analytics-at-maclean-fogg/

  • Achiam J (2018a) Deep deterministic policy gradient—the q-learning side of DDPG. https://spinningup.openai.com/en/latest/algorithms/ddpg.html#the-q-learning-side-of-ddpg

  • Achiam J (2018b) Part 1: key concepts in RL—spinning up documentation. OpenAI. https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#key-concepts-and-terminology

  • Adams S, Meekins R, Beling P et al (2019) Hierarchical fault classification for resource constrained systems. Mech Syst Signal Process. https://doi.org/10.1016/j.ymssp.2019.106266

  • Adsule A, Kulkarni M, Tewari A (2020) Reinforcement learning for optimal policy learning in condition-based maintenance. IET Collabor Intell Manuf 2(4):182–188. https://doi.org/10.1049/IET-CIM.2020.0022

  • Afshari H, Al-Ani D, Habibi S (2014) Fault prognosis of roller bearings using the adaptive auto-step reinforcement learning technique. In: ASME 2014 dynamic systems and control conference (DSCC 2014), p 1. https://doi.org/10.1115/dscc2014-5928

  • Ahmed I, Khorasgani H, Biswas G (2018) Comparison of model predictive and reinforcement learning methods for fault tolerant control. IFAC-PapersOnLine 51(24):233–240

  • Aissani N, Beldjilali B, Trentesaux D (2009) Dynamic scheduling of maintenance tasks in the petroleum industry: a reinforcement approach. Eng Appl Artif Intell 22(7):1089–1103. https://doi.org/10.1016/j.engappai.2009.01.014

  • Alimi M, Rhif A, Rebai A et al (2021) Optimal adaptive backstepping control for chaos synchronization of nonlinear dynamical systems. In: Backstepping control of nonlinear dynamical systems, pp 291–345

  • Andriotis C, Papakonstantinou K (2019) Managing engineering systems with large state and action spaces through deep reinforcement learning. Reliab Eng Syst Saf 191:106483. https://doi.org/10.1016/j.ress.2019.04.036

  • Andriotis C, Papakonstantinou K (2021) Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints. Reliab Eng Syst Saf 212:107551

  • Bala R, Govinda R, Murthy CS (2018) Reliability analysis and failure rate evaluation of load haul dump machines using Weibull distribution analysis. Math Model 5(2):116–122. https://doi.org/10.18280/mmep.050209

  • Barde S, Yacout S, Shin H (2019) Optimal preventive maintenance policy based on reinforcement learning of a fleet of military trucks. J Intell Manuf 30(1):147–161. https://doi.org/10.1007/s10845-016-1237-7

  • Barja-Martinez S, Aragüés-Peñalba M, Munné-Collado Í et al (2021) Artificial intelligence techniques for enabling big data services in distribution networks: a review. Renew Sustain Energy Rev 150:111459. https://doi.org/10.1016/j.rser.2021.111459

  • Baykal-Gürsoy M (2010) Semi-Markov decision processes. In: Wiley encyclopedia of operations research and management science. Wiley, Hoboken

  • Bellani L, Compare M, Baraldi P et al (2019) Towards developing a novel framework for practical PHM: a sequential decision problem solved by reinforcement learning and artificial neural networks. Int J Progn Health Manag 10(4). https://doi.org/10.36001/ijphm.2019.v10i4.2616

  • Ben-Daya M, Duffuaa SO, Raouf A (2012) Maintenance, modeling and optimization. Springer, Berlin

  • Burke R, Mussomeli A, Laaper S et al (2017) The smart factory. Deloitte Insights. https://www2.deloitte.com/us/en/insights/focus/industry-4-0/smart-factory-connected-manufacturing.html

  • Busoniu L, Babuska R, De Schutter B et al (2017) Reinforcement learning and dynamic programming using function approximators. CRC Press, Boca Raton

  • Caliński T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat Theory Methods 3(1):1–27

  • Chen H, Li X (2011) Distributed active learning with application to battery health management. In: 14th International conference on information fusion 2011

  • Chen Z, Wu M, Zhao R et al (2020) Machine remaining useful life prediction via an attention-based deep learning approach. IEEE Trans Ind Electron 68(3):2521–2531

  • Chen G, Liu M, Kong Z (2021) Temporal-logic-based semantic fault diagnosis with time-series data from industrial Internet of Things. IEEE Trans Ind Electron 68(5):4393–4403. https://doi.org/10.1109/TIE.2020.2984976

  • Chen Y, Liu Y, Xiahou T (2022) A deep reinforcement learning approach to dynamic loading strategy of repairable multistate systems. IEEE Trans Reliab 71(1):484–499. https://doi.org/10.1109/TR.2020.3044596

  • Cheng M, Frangopol D (2021) A decision-making framework for load rating planning of aging bridges using deep reinforcement learning. J Comput Civ Eng. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000991

  • Cheng Y, Peng J, Gu X et al (2018) RLCP: a reinforcement learning method for health stage division using change points. In: 2018 IEEE international conference on prognostics and health management (ICPHM 2018). https://doi.org/10.1109/ICPHM.2018.8448499

  • Coleman C, Damodaran S, Deuel E (2017) Predictive maintenance and the smart factory. https://www2.deloitte.com/content/dam/Deloitte/us/Documents/process-and-operations/us-cons-predictive-maintenance.pdf

  • Compare M, Bellani L, Cobelli E et al (2020) A reinforcement learning approach to optimal part flow management for gas turbine maintenance. Proc Inst Mech Eng Part O J Risk Reliab 234(1):52–62. https://doi.org/10.1177/1748006X19869750

  • Correa JCAJ, Guzman AAL (2020) Guidelines for the implementation of a predictive maintenance program. Mech Vib Condit Monit. https://doi.org/10.1016/B978-0-12-819796-7.00007-X

  • Correa-Jullian C, Droguett EL, Cardemil JM (2020) Operation scheduling in a solar thermal system: a reinforcement learning-based framework. Appl Energy. https://doi.org/10.1016/j.apenergy.2020.114943

  • Cui P, Wang J, Zhang W et al (2021) Predictive maintenance decision-making for serial production lines based on deep reinforcement learning. Comput Integrated Manuf Syst (CIMS) 27(12):3416–3428. https://doi.org/10.13196/j.cims.2021.12.004

  • Cui PH, Wang JQ, Li Y (2022) Data-driven modelling, analysis and improvement of multistage production systems with predictive maintenance and product quality. Int J Prod Res 60(22):6848–6865

  • Dahlqvist F, Patel M, Rajko A et al (2019) Growing opportunities in the Internet of Things. https://www.mckinsey.com/industries/private-equity-and-principal-investors/our-insights/growing-opportunities-in-the-internet-of-things

  • Dai W, Mo Z, Luo C et al (2020) Fault diagnosis of rotating machinery based on deep reinforcement learning and reciprocal of smoothness index. IEEE Sensors J 20(15):8307–8315. https://doi.org/10.1109/JSEN.2020.2970747

  • Dai Z, Jiang M, Li X et al (2021) Reinforcement lion swarm optimization algorithm for tool wear prediction. In: 2021 Global reliability and prognostics and health management (PHM)—Nanjing 2021. https://doi.org/10.1109/PHM-Nanjing52125.2021.9613134

  • Dangut M, Jennions I, King S et al (2022) Application of deep reinforcement learning for extremely rare failure prediction in aircraft maintenance. Mech Syst Signal Process. https://doi.org/10.1016/j.ymssp.2022.108873

  • Das T, Gosavi A, Mahadevan S et al (1999) Solving semi-Markov decision problems using average reward reinforcement learning. Manag Sci 45(4):560–574. https://doi.org/10.1287/mnsc.45.4.560

  • Dau HA, Bagnall A, Kamgar K et al (2019) The UCR time series archive. Mach Learn. arXiv:1810.07758

  • Deloitte (2020) Industry 4.0. Deloitte Insights. https://www2.deloitte.com/us/en/insights/focus/industry-4-0.html

  • Ding F, He Z, Zi Y et al (2008) Application of support vector machine for equipment reliability forecasting. In: 2008 6th IEEE international conference on industrial informatics, pp 526–530

  • Ding Y, Ma L, Ma J et al (2019) Intelligent fault diagnosis for rotating machinery using deep Q-network based health state classification: a deep reinforcement learning approach. Adv Eng Inf. https://doi.org/10.1016/j.aei.2019.100977

  • Dogru O, Velswamy K, Ibrahim F et al (2022) Reinforcement learning approach to autonomous PID tuning. Comput Chem Eng 161:107760. https://doi.org/10.1016/j.compchemeng.2022.107760

  • Dong S, Wen G, Lei Z et al (2021a) Transfer learning for bearing performance degradation assessment based on deep hierarchical features. ISA Trans 108:343–355. https://doi.org/10.1016/j.isatra.2020.09.004

  • Dong W, Zhao T, Wu Y (2021b) Deep reinforcement learning based preventive maintenance for wind turbines. In: 2021 IEEE 5th conference on energy internet and energy system integration (EI2), pp 2860–2865

  • Duan Y, Chen X, Houthooft R et al (2016) Benchmarking deep reinforcement learning for continuous control. arXiv:1604.06778

  • Dulac-Arnold G, Levine N, Mankowitz DJ et al (2021) Challenges of real-world reinforcement learning: definitions, benchmarks and analysis. Mach Learn. https://doi.org/10.1007/s10994-021-05961-4

  • Eke S, Aka-Ngnui T, Clerc G et al (2017) Characterization of the operating periods of a power transformer by clustering the dissolved gas data. In: 2017 IEEE 11th International symposium on diagnostics for electrical machines, power electronics and drives (SDEMPED), pp 298–303

  • Eltotongy A, Awad M, Maged S et al (2021) Fault detection and classification of machinery bearing under variable operating conditions based on wavelet transform and CNN. In: 2021 International mobile, intelligent, and ubiquitous computing conference (MIUCC 2021), pp 117–123. https://doi.org/10.1109/MIUCC52538.2021.9447673

  • Encapera A, Gosavi A (2017) A new reinforcement learning algorithm with fixed exploration for semi-Markov control in preventive maintenance. In: ASME 2017 12th international manufacturing science and engineering conference (MSEC 2017) collocated with the JSME/ASME 2017 6th international conference on materials and processing 3. https://doi.org/10.1115/MSEC2017-2880

  • Epureanu B, Li X, Nassehi A et al (2020) Self-repair of smart manufacturing systems by deep reinforcement learning. CIRP Ann 69:421–424. https://doi.org/10.1016/j.cirp.2020.04.008

  • Erhan L, Ndubuaku M, Di Mauro M et al (2021) Smart anomaly detection in sensor systems: a multi-perspective review. Inf Fusion 67:64–79. https://doi.org/10.1016/j.inffus.2020.10.001

  • Ericsson (2021) IoT connections outlook. https://www.ericsson.com/en/reports-and-papers/mobility-report/dataforecasts/iot-connections-outlook

  • Fei Y, Yang Z, Wang Z (2021) Risk-sensitive reinforcement learning with function approximation: a debiasing approach. In: International conference on machine learning (PMLR), pp 3198–3207

  • Feng M, Li Y (2022) Predictive maintenance decision making based on reinforcement learning in multistage production systems. IEEE Access 10:18910–18921. https://doi.org/10.1109/ACCESS.2022.3151170

  • Fink O, Wang Q, Svensén M et al (2020) Potential, challenges and future directions for deep learning in prognostics and health management applications. Eng Appl Artif Intell. https://doi.org/10.1016/j.engappai.2020.103678

  • Fons E, Dawson P, Zeng X et al (2021) Adaptive weighting scheme for automatic time-series data augmentation. arXiv preprint. arXiv:2102.08310

  • Frangopol DM, Lin KY, Estes AC (1997) Life-cycle cost design of deteriorating structures. J Struct Eng 123(10):1390–1401

  • Fujimoto S, Meger D, Precup D et al (2022) Why should I trust you, Bellman? The Bellman error is a poor replacement for value error. arXiv preprint. arXiv:2201.12417

  • Gosavi A (2004a) A reinforcement learning algorithm based on policy iteration for average reward: empirical results with yield management and convergence analysis. Mach Learn 55(1):5–29

  • Gosavi A (2004b) Reinforcement learning for long-run average cost. Eur J Oper Res 155(3):654–674. https://doi.org/10.1016/S0377-2217(02)00874-3

  • Gosavi A, Parulekar A (2016) Solving Markov decision processes with downside risk adjustment. Int J Automat Comput 13(3):235–245. https://doi.org/10.1007/s11633-016-1005-3

  • Grzes M (2017) Reward shaping in episodic reinforcement learning. In: Proceedings of the international joint conference on autonomous agents and multiagent systems (AAMAS) 1

  • Hardt M, Recht B, Singer Y (2015) Train faster, generalize better: stability of stochastic gradient descent. In: Proceedings of the 33rd international conference on machine learning

  • Hasselt HV, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: 30th AAAI conference on artificial intelligence (AAAI 2016)

  • Henderson P, Islam R, Bachman P et al (2018) Deep reinforcement learning that matters. In: Proceedings of the AAAI conference on artificial intelligence, vol 32(1)

  • Hofmann P, Tashman Z (2020) Hidden Markov models and their application for predicting failure events. Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics), vol 12139. LNCS, pp 464–477. https://doi.org/10.1007/978-3-030-50420-5_35

  • Hoffmann C, Altenüller T, May MC et al (2021) Simulative dispatching optimization of maintenance resources in a semiconductor use-case using reinforcement learning. In: Simulation in Produktion und Logistik 2021, Erlangen, 15–17 September 2021, p 357

  • Hoong Ong K, Niyato D, Yuen C (2020) Predictive maintenance for edge-based sensor networks: a deep reinforcement learning approach. In: IEEE world forum on Internet of Things (WF-IoT 2020)—symposium proceedings. https://doi.org/10.1109/WF-IoT48130.2020.9221098

  • Hosseinloo A, Dahleh M (2021) Deterministic policy gradient algorithms for semi-Markov decision processes. Int J Intell Syst. https://doi.org/10.1002/int.22709

  • Hu Q, Yue W (2003) Optimal replacement of a system according to a semi-Markov decision process in a semi-Markov environment. Optim Methods Softw 18(2):181–196

  • Hu Y, Miao X, Zhang J et al (2021a) Reinforcement learning-driven maintenance strategy: a novel solution for long-term aircraft maintenance decision optimization. Comput Ind Eng. https://doi.org/10.1016/j.cie.2020.107056

  • Hua Y, Wang X, Jin B et al (2021b) HMRL: hyper-meta learning for sparse reward reinforcement learning problem. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pp 637–645

  • Huang J, Chang Q, Chakraborty N (2019) Machine preventive replacement policy for serial production lines based on reinforcement learning. In: IEEE international conference on automation science and engineering 2019, August, pp 523–528. https://doi.org/10.1109/COASE.2019.8843338

  • Huang J, Chang Q, Arinez J (2020) Deep reinforcement learning based preventive maintenance policy for serial production lines. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2020.113701

  • Hui J (2021) Reinforcement learning algorithms comparison. https://jonathan-hui.medium.com/rl-reinforcement-learning-algorithms-comparison-76df90f180cf

  • Hutsebaut-Buysse M, Mets K, Latré S (2022) Hierarchical reinforcement learning: a survey and open research challenges. Mach Learn Knowl Extr 4(1):172–221

  • Icarte RT, Klassen TQ, Valenzano R et al (2022) Reward machines: exploiting reward function structure in reinforcement learning. J Artif Intell Res 73:173–208

  • Imagawa T, Hiraoka T, Tsuruoka Y (2022) Off-policy meta-reinforcement learning with belief-based task inference. IEEE Access 10:49494–49507

  • Jaakkola T, Singh S, Jordan M (1994) Reinforcement learning algorithm for partially observable Markov decision problems. Adv Neural Inf Process Syst. https://proceedings.neurips.cc/paper/1994/file/1c1d4df596d01da60385f0bb17a4a9e0-Paper.pdf

  • Jha M, Theilliol D, Biswas G et al (2019a) Approximate q-learning approach for health aware control design. In: Conference on control and fault-tolerant systems (SysTol), pp 418–423. https://doi.org/10.1109/SYSTOL.2019.8864756

  • Jha M, Weber P, Theilliol D et al (2019b) A reinforcement learning approach to health aware control strategy. In: 27th Mediterranean conference on control and automation (MED 2019)—proceedings, pp 171–176. https://doi.org/10.1109/MED.2019.8798548

  • Kabir F, Foggo B, Yu N (2018) Data driven predictive maintenance of distribution transformers. In: 2018 China international conference on electricity distribution (CICED), pp 312–316. https://doi.org/10.1109/CICED.2018.8592417

  • Khan S, Farnsworth M, McWilliam R et al (2020) On the requirements of digital twin-driven autonomous maintenance. Annu Rev Control 50:13–28. https://doi.org/10.1016/j.arcontrol.2020.08.003

  • Knowles M, Baglee D, Wermter S (2011) Reinforcement learning for scheduling of maintenance. In: Research and development in intelligent systems XXVII: incorporating applications and innovations in Intelligent systems XVIII—AI 2010, 30th SGAI international conference on innovative techniques and applications of artificial intelligence, pp 409–422. https://doi.org/10.1007/978-0-85729-130-1_31

  • Kofinas P, Dounis AI (2019) Online tuning of a PID controller with a fuzzy reinforcement learning MAS for flow rate control of a desalination unit. Electronics 8(2):231

  • Kuhnle A, Jakubik J, Lanza G (2019) Reinforcement learning for opportunistic maintenance optimization. Prod Eng 13(1):33–41

  • Laaper S, Dollar B, Cotteleer M et al (2020) Implementing the smart factory. Deloitte Insights. https://www2.deloitte.com/us/en/insights/topics/digital-transformation/smart-factory-2-0-technology-initiatives.html

  • Lange S, Gabel T, Riedmiller M (2012) Batch reinforcement learning. In: Wiering M, van Otterlo M (eds) Reinforcement learning. Adaptation, learning, and optimization. Springer, Berlin, pp 45–73

  • Lee J, Wu F, Zhao W et al (2014) Prognostics and health management design for rotary machinery systems—reviews, methodology and applications. Mech Syst Signal Process 42(1–2):314–334

  • Lepenioti K, Pertselakis M, Bousdekis A et al (2020) Machine learning for predictive and prescriptive analytics of operational data in smart manufacturing. Lecture notes in business information processing, vol 382 LNBIP, pp 5–16. https://doi.org/10.1007/978-3-030-49165-9_1

  • Lewis F, Vrabie D, Vamvoudakis K (2012) Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers. https://ieeexplore.ieee.org/document/6315769

  • Li Z (2019) CWRU bearing dataset and Gearbox dataset of IEEE PHM challenge competition in 2009. https://doi.org/10.21227/g8ts-zd15

  • Li Z, Guo J, Zhou R (2016) Maintenance scheduling optimization based on reliability and prognostics information. In: 2016 Annual reliability and maintainability symposium (RAMS), pp 1–5. https://doi.org/10.1109/RAMS.2016.7448069

  • Li B, Zhou Y (2020) Multi-component maintenance optimization: an approach combining genetic algorithm and multiagent reinforcement learning. In: 2020 global reliability and prognostics and health management (PHM—Shanghai), pp 1–7

  • Li J, Blumenfeld DE, Huang N et al (2009) Throughput analysis of production systems: recent advances and future topics. Int J Prod Res 47(14):3823–3851. https://doi.org/10.1080/00207540701829752

  • Li X, Qian J, Wang GG (2013) Fault prognostic based on hybrid method of state judgment and regression. Adv Mech Eng 5:149562

  • Li Z, Zhong S, Lin L (2019) An aero-engine life-cycle maintenance policy optimization algorithm: reinforcement learning approach. Chin J Aeronaut 32(9):2133–2150. https://doi.org/10.1016/j.cja.2019.07.003

  • Li L, Liu J, Wei S et al (2021) Smart robot-enabled remaining useful life prediction and maintenance optimization for complex structures using artificial intelligence and machine learning. Proc SPIE. https://doi.org/10.1117/12.2589045

  • Lillicrap TP, Hunt JJ, Pritzel A et al (2015) Continuous control with deep reinforcement learning. arXiv e-prints. arXiv:1509.02971

  • Ling Z, Wang X, Qu F (2018) Reinforcement learning-based maintenance scheduling for resource constrained flow line system. In: 2018 IEEE 4th international conference on control science and systems engineering (ICCSSE 2018), pp 364–369. https://doi.org/10.1109/CCSSE.2018.8724807

  • Liu K, Gebraeel NZ, Shi J (2013) A data-level fusion model for developing composite health indices for degradation modeling and prognostic analysis. IEEE Trans Automat Sci Eng 10(3):652–664

  • Liu L, Wang Z, Zhang H (2017) Adaptive fault-tolerant tracking control for MIMO discrete-time systems via reinforcement learning algorithm with less learning parameters. IEEE Trans Automat Sci Eng 14(1):299–313. https://doi.org/10.1109/TASE.2016.2517155

  • Liu Y, Chen Y, Jiang T (2020) Dynamic selective maintenance optimization for multi-state systems over a finite horizon: a deep reinforcement learning approach. Eur J Oper Res 283(1):166–181. https://doi.org/10.1016/j.ejor.2019.10.049

  • Luo Y (2021) Application of reinforcement learning algorithm model in gas path fault intelligent diagnosis of gas turbine. Comput Intell Neurosci. https://doi.org/10.1155/2021/3897077

  • Ma Z, Guo J, Mao S et al (2020) An interpretability research of the XGBoost algorithm in remaining useful life prediction. In: 2020 International conference on big data & artificial intelligence & software engineering (ICBASE), pp 433–438

  • Macek K, Endel P, Cauchi N et al (2017) Long-term predictive maintenance: a study of optimal cleaning of biomass boilers. Energy Build 150:111–117

  • Mahadevan S, Marchalleck N, Das TK et al (1997) Self-improving factory simulation using continuous-time average-reward reinforcement learning. In: Machine learning international workshop. Morgan Kaufmann Publishers, Los Angeles

  • Mahmood AR, Sutton RS, Degris T et al (2012) Tuning-free step-size adaptation. In: 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 2121–2124

  • Mann L, Saxena A, Knapp GM (1995) Statistical-based or condition-based preventive maintenance? J Qual Maintenance Eng 6(5):519–541

  • Mao H, Liu Z, Qiu C (2021) Adaptive disassembly sequence planning for VR maintenance training via deep reinforcement learning. Int J Adv Manuf Technol. https://doi.org/10.1007/s00170-021-08290-x

  • Martinez C, Perrin G, Ramasso E et al (2018) A deep reinforcement learning approach for early classification of time series. In: European signal processing conference 2018, September, pp 2030–2034. https://doi.org/10.23919/EUSIPCO.2018.8553544

  • Mattioli J, Perico P, Robic PO (2020) Improve total production maintenance with artificial intelligence. In: Proceedings—2020 3rd international conference on artificial intelligence for industries (AI4I 2020), pp 56–59. https://doi.org/10.1109/AI4I49448.2020.00019

  • Mehndiratta M, Camci E, Kayacan E (2018) Automated tuning of nonlinear model predictive controller by reinforcement learning. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3016–3021

  • Meng H, Ludema K (1995) Wear models and predictive equations: their form and content. Wear 181:443–457

  • Mikhail M, Yacout S, Ouali M (2019) Optimal preventive maintenance strategy using reinforcement learning. In: Proceedings of the international conference on industrial engineering and operations management, pp 133–141

  • Min W, Chao Q (2012) Reinforcement learning based maintenance scheduling for a two-machine flow line with deteriorating quality states. In: Proceedings—2012 3rd global congress on intelligent systems (GCIS 2012), pp 176–179. https://doi.org/10.1109/GCIS.2012.82

  • Moos J, Hansel K, Abdulsamad H et al (2022) Robust reinforcement learning: a review of foundations and recent advances. Mach Learn Knowl Extr 4(1):276–315

  • Morimoto J, Doya K (2005) Robust reinforcement learning. Neural Comput 17(2):335–359

  • Nair A, Gupta A, Dalal M et al (2020) AWAC: accelerating online reinforcement learning with offline datasets. arXiv preprint. arXiv:2006.09359

  • Narvekar S, Peng B, Leonetti M et al (2020) Curriculum learning for reinforcement learning domains: a framework and survey. CoRR. arXiv:2003.04960

  • Nectoux P, Gouriveau R, Medjaher K et al (2012) Pronostia: an experimental platform for bearings accelerated degradation tests. In: IEEE international conference on prognostics and health management (PHM’12), pp 1–8

  • Ng AY, Coates A, Diel M et al (2006) Autonomous inverted helicopter flight via reinforcement learning. In: Experimental robotics IX, pp 363–372

  • Ong K, Wenbo W, Friedrichs T et al (2021a) Augmented human intelligence for decision making in maintenance risk taking tasks using reinforcement learning. In: Conference proceedings—IEEE international conference on systems, man and cybernetics, pp 3114–3120. https://doi.org/10.1109/SMC52423.2021.9658936

  • Ong K, Wenbo W, Niyato D et al (2021b) Deep reinforcement learning based predictive maintenance model for effective resource management in industrial IoT. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2021.3109955

  • Ozturk S, Fthenakis V, Faulstich S (2018) Failure modes, effects and criticality analysis for wind turbines considering climatic regions and comparing geared and direct drive wind turbines. Energies 11(9):2317

  • Panzer M, Bender B (2021) Deep reinforcement learning in production systems: a systematic literature review. Int J Prod Res 60(3):1–26

  • Paraschos P, Koulinas G, Koulouriotis D (2020) Reinforcement learning for combined production-maintenance and quality control of a manufacturing system with deterioration failures. J Manuf Syst 56:470–483. https://doi.org/10.1016/j.jmsy.2020.07.004

  • Patil S, Abbeel P (2013) Partially observable Markov decision processes (POMDPs). Guest Lecture: CS287 advanced robotics

  • Pinciroli L, Baraldi P, Compare M et al (2020) Agent-based modeling and reinforcement learning for optimizing energy systems operation and maintenance: the pathmind solution. In: Proceedings of the 30th European safety and reliability conference and the 15th probabilistic safety assessment and management conference, pp 1476–1480. https://doi.org/10.3850/978-981-14-8593-0_5863-cd

  • Pinciroli L, Baraldi P, Ballabio G et al (2021) Deep reinforcement learning based on proximal policy optimization for the maintenance of a wind farm with multiple crews. Energies. https://doi.org/10.3390/en14206743

  • Pinciroli L, Baraldi P, Ballabio G et al (2022) Optimization of the operation and maintenance of renewable energy systems by deep reinforcement learning. Renew Energy 183:752–763. https://doi.org/10.1016/j.renene.2021.11.052

  • Pinto L, Davidson J, Sukthankar R et al (2017a) Robust adversarial reinforcement learning. In: International conference on machine learning (PMLR), pp 2817–2826

  • Plappert M, Houthooft R, Dhariwal P et al (2017b) Parameter space noise for exploration. arXiv preprint. arXiv:1706.01905

  • Powell WB (2009) What you should know about approximate dynamic programming. Naval Res Logist (NRL) 56(3):239–249

  • Prashanth L, Fu MC et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693

  • Prognostics HM Society (2010) 2010 PHM society conference data challenge. https://phmsociety.org/phm_competition/2010-phm-society-conference-data-challenge/

  • Ramasso E (2014) Investigating computational geometry for failure prognostics in presence of imprecise health indicator: results and comparisons on C-MAPSS datasets. In: PHM society European conference 2(1)

  • Ren Y (2021) Optimizing predictive maintenance with machine learning for reliability improvement. ASCE ASME J Risk Uncertain Eng Syst Part B Mech Eng. https://doi.org/10.1115/1.4049525

  • Rocchetta R, Bellani L, Compare M et al (2019) A reinforcement learning framework for optimal operation and maintenance of power grids. Appl Energy 241:291–301. https://doi.org/10.1016/j.apenergy.2019.03.027

  • Russenschuck S (1999) Mathematical optimization techniques. Tech. rep., CERN

  • Sateesh Babu G, Zhao P, Li XL (2016) Deep convolutional neural network based regression approach for estimation of remaining useful life. In: International conference on database systems for advanced applications, pp 214–228

  • Saxena A, Goebel K (2008) Turbofan engine degradation simulation data set. http://ti.arc.nasa.gov/project/prognostic-data-repository

  • Saxena A, Goebel K, Simon D et al (2008) Damage propagation modeling for aircraft engine run-to-failure simulation. In: 2008 international conference on prognostics and health management, pp 1–9

  • Saxena A, Celaya J, Saha B et al (2010a) Evaluating prognostics performance for algorithms incorporating uncertainty estimates. In: 2010 IEEE aerospace conference, pp 1–11

  • Saxena A, Celaya J, Saha B et al (2010b) Metrics for offline evaluation of prognostic performance. Int J Prognost Health Manag 1(1):4–23

  • Saydam D, Frangopol DM (2015) Risk-based maintenance optimization of deteriorating bridges. J Struct Eng 141(4):04014120. https://doi.org/10.1061/(ASCE)ST.1943-541X.0001038

  • Sayyad S, Kumar S, Bongale A et al (2022) Tool wear prediction using long short-term memory variants and hybrid feature selection techniques. Int J Adv Manuf Technol 121(9):6611–6633

  • Schaefer AM, Udluft S, Zimmermann HG (2007) A recurrent control neural network for data efficient reinforcement learning. In: 2007 IEEE international symposium on approximate dynamic programming and reinforcement learning (IEEE), pp 151–157

  • Scheibelhofer P, Gleispach D, Hayderer G et al (2012) A methodology for predictive maintenance in semiconductor manufacturing. Aust J Stat 41(3):161–173

  • Senthil C, Pandian R (2022) Proactive maintenance model using reinforcement learning algorithm in rubber industry. Processes. https://doi.org/10.3390/pr10020371

  • Shen Y, Tobia MJ, Sommer T et al (2014) Risk-sensitive reinforcement learning. Neural Comput 26(7):1298–1328

  • Shi Y, Xiang Y, Jin T (2019) Structured maintenance policies for deteriorating transportation infrastructures: combination of maintenance types. In: Proceedings of annual reliability and maintainability symposium 2019, January. https://doi.org/10.1109/RAMS.2019.8769227

  • Shi Q, Lam HK, Xuan C et al (2020) Adaptive neuro-fuzzy PID controller based on twin delayed deep deterministic policy gradient algorithm. Neurocomputing 402:183–194. https://doi.org/10.1016/j.neucom.2020.03.063

  • Shuvo S, Yilmaz Y (2020) Predictive maintenance for increasing EV charging load in distribution power system. In: 2020 IEEE international conference on communications, control, and computing technologies for smart grids, SmartGridComm 2020 https://doi.org/10.1109/SmartGridComm47815.2020.9303021

  • Singh SP, Jaakkola T, Jordan MI (1994) Learning without state-estimation in partially observable Markovian decision processes. Mach Learn Proc 1994:284–292

  • Sinha S (2021) State of IoT 2021. https://iot-analytics.com/number-connected-iot-devices/

  • Skordilis E, Moghaddass R (2020) A deep reinforcement learning approach for real-time sensor-driven decision making and predictive analytics. Comput Ind Eng. https://doi.org/10.1016/j.cie.2020.106600

  • Skydt MR, Bang M, Shaker HR (2021) A probabilistic sequence classification approach for early fault prediction in distribution grids using long short-term memory neural networks. Measurement 170:108691

  • Song X, Jiang Y, Tu S et al (2019) Observational overfitting in reinforcement learning. arXiv preprint. arXiv:1912.02975

  • Su J, Huang J, Adams S et al (2022) Deep multi-agent reinforcement learning for multi-level preventive maintenance in manufacturing systems. Expert Syst Appl 192:116323. https://doi.org/10.1016/j.eswa.2021.116323

  • Susto GA, Schirru A, Pampuri S et al (2013) A predictive maintenance system for integral type faults based on support vector machines: an application to ion implantation. In: 2013 IEEE international conference on automation science and engineering (CASE), pp 195–200

  • Susto GA, Wan J, Pampuri S et al (2014) An adaptive machine learning decision system for flexible predictive maintenance. In: 2014 IEEE international conference on automation science and engineering (CASE), pp 806–811

  • Sutton R, Barto A (2018) Reinforcement learning: an introduction, 2nd edn. MIT, Cambridge

  • Sutton RS, Precup D, Singh S (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112(1–2):181–211

  • Swazinna P, Udluft S, Hein D et al (2022) Comparing model-free and model-based algorithms for offline reinforcement learning. arXiv preprint. arXiv:2201.05433

  • Tanimoto A (2021) Combinatorial Q-learning for condition-based infrastructure maintenance. IEEE Access 9:46788–46799. https://doi.org/10.1109/ACCESS.2021.3059244

  • Templier M, Paré G (2015) A framework for guiding and evaluating literature reviews. Commun Assoc Inf Syst 37(1):6

  • Thomas D (2020) Manufacturing machinery maintenance—NIST. National Institute of Standards and Technology (NIST), Gaithersburg. https://www.nist.gov/el/applied-economics-office/manufacturing/topics-manufacturing/manufacturing-machinery-maintenance

  • Thomas DS, Weiss BA (2020) Economics of manufacturing machinery maintenance. National Institute of Standards and Technology (NIST), Gaithersburg. https://doi.org/10.6028/NIST.AMS.100-34, https://nvlpubs.nist.gov/nistpubs/ams/NIST.AMS.100-34.pdf

  • Valet A, Altenmüller T, Waschneck B et al (2022) Opportunistic maintenance scheduling with deep reinforcement learning. J Manuf Syst 64:518–534

  • Vogl GW, Qiao H (2021) Monitoring, diagnostics and prognostics for manufacturing operations (NIST). National Institute of Standards and Technology (NIST), Gaithersburg. https://www.nist.gov/programs-projects/monitoring-diagnostics-and-prognostics-manufacturing-operations

  • Walsh C (2022) Paris-Erdogan equation. https://www.maths.tcd.ie/~chas/node24.html#SECTION00841000000000000000

  • Wang X, Wang H, Qi C et al (2014) Reinforcement learning based predictive maintenance for a machine with multiple deteriorating yield levels. J Comput Inf Syst 10(1):9–19. https://doi.org/10.12733/jcis8124

  • Wang X, Qi C, Wang H et al (2015) Resilience-driven maintenance scheduling methodology for multi-agent production line system. In: Proceedings of the 2015 27th Chinese control and decision conference (CCDC 2015), pp 614–619. https://doi.org/10.1109/CCDC.2015.7161844

  • Wang X, Wang H, Qi C (2016) Multi-agent reinforcement learning based maintenance policy for a resource constrained flow line system. J Intell Manuf 27(2):325–333. https://doi.org/10.1007/s10845-013-0864-5

  • Wang H, Yan Q, Zhang S (2021a) Integrated scheduling and flexible maintenance in deteriorating multi-state single machine system using a reinforcement learning approach. Adv Eng Inf. https://doi.org/10.1016/j.aei.2021.101339

  • Wang X, Wang Y, Dai H (2021b) Fault diagnosis based on data-driven dynamic model. In: ICSMD 2021—2nd international conference on sensing, measurement and data analytics in the era of artificial intelligence. https://doi.org/10.1109/ICSMD53520.2021.9670767

  • Wang X, Xu D, Qu N et al (2021c) Predictive maintenance and sensitivity analysis for equipment with multiple quality states. Math Probl Eng. https://doi.org/10.1155/2021/4914372

  • Wang X, Zhang G, Li Y et al (2022) A heuristically accelerated reinforcement learning method for maintenance policy of an assembly line. J Ind Manag Optim 19(4):2381–2395

  • Weibull W (1951) A statistical distribution function of wide applicability. J Appl Mech 18:293–297

  • Weiss BA, Helu M, Vogl G et al (2016) Use case development to advance monitoring, diagnostics, and prognostics in manufacturing operations. IFAC-PapersOnLine 49:13–18. https://doi.org/10.1016/J.IFACOL.2016.12.154

  • Weiss BA, Alonzo D, Weinman SD (2017) NIST advanced manufacturing series 100-13: summary report on a workshop on advanced monitoring, diagnostics, and prognostics for manufacturing operations. National Institute of Standards and Technology, Gaithersburg. https://doi.org/10.6028/NIST.AMS.100-13

  • Wu Q, Feng Q, Ren Y et al (2021) An intelligent preventive maintenance method based on reinforcement learning for battery energy storage systems. IEEE Trans Ind Inf. https://doi.org/10.1109/TII.2021.3066257

  • Xanthopoulos A, Kiatipis A, Koulouriotis D et al (2017) Reinforcement learning-based and parametric production-maintenance control policies for a deteriorating manufacturing system. IEEE Access 6:576–588. https://doi.org/10.1109/ACCESS.2017.2771827

  • Yan S, Ma B, Zheng C et al (2019) An optimal lubrication oil replacement method based on selected oil field data. IEEE Access 7:92110–92118. https://doi.org/10.1109/ACCESS.2019.2927426

  • Yang D (2022) Adaptive risk-based life-cycle management for large-scale structures using deep reinforcement learning and surrogate modeling. J Eng Mech. https://doi.org/10.1061/(ASCE)EM.1943-7889.0002028

  • Yang Z, Qi C (2013) Preventive maintenance of a multi-yield deteriorating machine: using reinforcement learning. Syst Eng Theory Pract 33(7):1647–1653

  • Yang H, Shen L, Cheng M et al (2018) Integrated optimization of scheduling and maintenance in multi-state production systems with deterioration effects. Comput Integr Manuf Syst (CIMS) 24(1):80–88. https://doi.org/10.13196/j.cims.2018.01.008

  • Yang H, Li W, Wang B (2021) Joint optimization of preventive maintenance and production scheduling for multi-state production systems based on reinforcement learning. Reliab Eng Syst Saf. https://doi.org/10.1016/j.ress.2021.107713

  • Zhang N, Si W (2020) Deep reinforcement learning for condition-based maintenance planning of multi-component systems under dependent competing risks. Reliab Eng Syst Saf. https://doi.org/10.1016/j.ress.2020.107094

  • Zhang Z, Tang Q (2022) Integrating preventive maintenance to two-stage assembly flow shop scheduling: MILP model, constructive heuristics and meta-heuristics. Flexible Serv Manuf J 34(1):156–203. https://doi.org/10.1007/s10696-021-09403-0

  • Zhang C, Vinyals O, Munos R et al (2018) A study on overfitting in deep reinforcement learning. arXiv:1804.06893

  • Zhang C, Gupta C, Farahat A et al (2019) Equipment health indicator learning using deep reinforcement learning. Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics), vol 11053. LNAI, pp 488–504. https://doi.org/10.1007/978-3-030-10997-4_30

  • Zhang P, Zhu X, Xie M (2021) A model-based reinforcement learning approach for maintenance optimization of degrading systems in a large state space. Comput Ind Eng. https://doi.org/10.1016/j.cie.2021.107622

  • Zheng S, Ristovski K, Farahat A et al (2017a) Long short-term memory network for remaining useful life estimation. In: 2017 IEEE international conference on prognostics and health management (ICPHM), pp 88–95

  • Zheng S, Ristovski K, Farahat A et al (2017b) Long short-term memory network for remaining useful life estimation. In: 2017 IEEE international conference on prognostics and health management (ICPHM), pp 88–95

  • Zheng W, Lei Y, Chang Q (2017c) Reinforcement learning based real-time control policy for two-machine-one-buffer production system. In: ASME 2017 12th international manufacturing science and engineering conference, MSEC 2017 collocated with the JSME/ASME 2017 6th international conference on materials and processing 3. https://doi.org/10.1115/MSEC2017-2771

  • Zonta T, da Costa C, da Rosa Righi R et al (2020) Predictive maintenance in the industry 4.0: A systematic literature review. Comput Ind Eng. https://doi.org/10.1016/j.cie.2020.106889

Acknowledgements

We would like to sincerely thank the Reviewers. Their valuable comments and suggestions helped us improve the technical quality of the manuscript.

Funding

This work was supported by Symbiosis Institute of Technology.

Author information

Corresponding author

Correspondence to Satish Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Acronyms and notations

See Table 11.

Table 11 Acronyms and notations

Appendix 2: Tables—applications, algorithms and evaluation data-sets

See Tables 12, 13, 14, 15 and 16.

Table 12 Industrial applications (RQ-3)
Table 13 Algorithms applied by researchers (standard and their variants)
Table 14 Specialized and novel algorithms applied by researchers
Table 15 MDP model formulation
Table 16 Environments and data-set used for evaluation

Appendix 3: \({tf-idf}\) weighting

The \({tf-idf}\) scheme assigns each term in a document a weight, computed with Eq. (30), that is low when the term occurs rarely in that document or appears in many documents, and high when it occurs many times within a small subset of documents.

$$\begin{aligned} \text{tf-idf}(t, d) = \text{tf}(t, d) \times \text{idf}(t), \end{aligned}$$
(30)

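where t is a term and d is a document in the document set; tf(t, d) is the frequency of t in d; and the inverse document frequency, idf, is computed using Eq. (31), in which n is the total number of documents in the set and df(t) is the number of documents that contain t: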

$$\begin{aligned} \text{idf}(t) = \log \left[ \frac{n}{\text{df}(t)} \right] + 1 \end{aligned}$$
(31)
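The weighting in Eqs. (30) and (31) can be illustrated with a minimal Python sketch. It assumes that tf(t, d) is the raw count of t in d and that log is the natural logarithm; the appendix states neither explicitly. The toy corpus and the helper names `idf`, `tf_idf` and `top_terms` are hypothetical illustrations, not the authors' code. The sketch reproduces the behaviour described above: a term present in every document receives the minimum idf of 1, while a term repeated within a single document receives the highest weight.

```python
import math
from collections import Counter


def idf(term: str, docs: list[list[str]]) -> float:
    """Eq. (31): log(n / df(t)) + 1, using the natural log (an assumption)."""
    n = len(docs)                                  # total number of documents
    df = sum(1 for doc in docs if term in doc)     # documents containing the term
    if df == 0:
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(n / df) + 1.0


def tf_idf(term: str, doc: list[str], docs: list[list[str]]) -> float:
    """Eq. (30): raw count of `term` in `doc` multiplied by its idf over `docs`."""
    tf = Counter(doc)[term]
    return tf * idf(term, docs)


def top_terms(doc: list[str], docs: list[list[str]], k: int = 3) -> list[tuple[str, float]]:
    """Rank the distinct terms of one document by tf-idf (ties broken alphabetically)."""
    scores = {t: tf_idf(t, doc, docs) for t in set(doc)}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy corpus: each "document" is a pre-tokenized keyword list.
corpus = [
    ["rl", "maintenance", "policy", "engine"],
    ["rl", "maintenance", "scheduling", "production"],
    ["lstm", "bearing", "bearing", "rul", "engine", "maintenance"],
]

print(round(tf_idf("maintenance", corpus[2], corpus), 4))  # in every document -> 1.0
print(round(tf_idf("rl", corpus[0], corpus), 4))           # in 2 of 3 documents -> 1.4055
print(round(tf_idf("bearing", corpus[2], corpus), 4))      # twice, in 1 document -> 4.1972
print(top_terms(corpus[2], corpus, k=2))                   # "bearing" ranks first
```

Eq. (31) coincides with the idf that scikit-learn's `TfidfVectorizer` uses when `smooth_idf=False`; with its default settings that class instead applies a smoothed idf and L2-normalizes each document vector, so its scores will differ from this raw sketch.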

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Siraskar, R., Kumar, S., Patil, S. et al. Reinforcement learning for predictive maintenance: a systematic technical review. Artif Intell Rev 56, 12885–12947 (2023). https://doi.org/10.1007/s10462-023-10468-6

  • DOI: https://doi.org/10.1007/s10462-023-10468-6
