Journal of Intelligent Manufacturing, Volume 30, Issue 1, pp 147–161

Optimal preventive maintenance policy based on reinforcement learning of a fleet of military trucks

  • Stephane R. A. Barde
  • Soumaya Yacout
  • Hayong Shin


In this paper, we model preventive maintenance strategies for equipment composed of multiple non-identical components, each with a different time-to-failure probability distribution, using a Markov decision process (MDP). The originality of this paper lies in the use of a Monte Carlo reinforcement learning (MCRL) approach to find the optimal policy for each strategy. The approach is applied to a previously published application that deals with a fleet of military trucks. The fleet consists of a group of similar trucks, each composed of non-identical components. The problem is formulated as an MDP and solved by an MCRL technique. The advantage of this modeling technique over the published one is that the main parameters of the model, for example the transition probabilities, do not need to be estimated. These parameters are treated as variables and are found by the modeling technique while it searches for the optimal solution. Moreover, the technique is not bound by any explicit mathematical formula, and it converges to the optimal solution, whereas the previous model optimizes the replacement policy of each component separately, which leads to a local optimum. The results show that the reinforcement learning approach yields a solution that is 36.44% better, that is, one with less downtime.
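
To make the idea of solving a maintenance MDP by Monte Carlo reinforcement learning concrete, the following is a minimal sketch and not the authors' model. It assumes a hypothetical two-component system: the age cap, the failure probabilities in `fail_prob`, and the downtime values in `step` are all invented for illustration. The learner never reads the transition probabilities; it only observes sampled transitions and downtime from the simulator, and estimates action values by averaging the returns of simulated episodes.

```python
import random
from collections import defaultdict

MAX_AGE = 5                                   # hypothetical cap on component age
ACTIONS = [(0, 0), (1, 0), (0, 1), (1, 1)]    # replace flag for each component

def fail_prob(i, age):
    """Hypothetical increasing failure probability, different per component."""
    base = (0.05, 0.10)[i]
    return min(1.0, base * (age + 1))

def step(state, action, rng):
    """Simulate one period; return (next_state, downtime incurred)."""
    ages = list(state)
    downtime = 0.0
    if any(action):
        downtime += 1.0                       # shared stop for planned work
    for i, replace in enumerate(action):
        if replace:
            ages[i] = 0
            downtime += 0.5                   # planned replacement
    for i in range(2):
        if rng.random() < fail_prob(i, ages[i]):
            downtime += 4.0                   # failure forces corrective replacement
            ages[i] = 0
        else:
            ages[i] = min(MAX_AGE, ages[i] + 1)
    return tuple(ages), downtime

def mc_control(episodes=5000, horizon=30, gamma=0.95, eps=0.1, seed=0):
    """Every-visit Monte Carlo control with an epsilon-greedy policy.

    Episodes are truncated at `horizon`, so returns near the end of an
    episode are underestimated; acceptable for a sketch.
    """
    rng = random.Random(seed)
    q = defaultdict(float)                    # estimated discounted downtime
    visits = defaultdict(int)
    for _ in range(episodes):
        state = (rng.randint(0, MAX_AGE), rng.randint(0, MAX_AGE))
        trajectory = []
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = min(ACTIONS, key=lambda a: q[(state, a)])
            next_state, cost = step(state, action, rng)
            trajectory.append((state, action, cost))
            state = next_state
        g = 0.0
        for s, a, cost in reversed(trajectory):
            g = cost + gamma * g              # discounted return from (s, a)
            visits[(s, a)] += 1
            q[(s, a)] += (g - q[(s, a)]) / visits[(s, a)]  # running mean
    return q

def greedy_policy(q):
    """Pick the action with the lowest estimated downtime in every state."""
    states = [(i, j) for i in range(MAX_AGE + 1) for j in range(MAX_AGE + 1)]
    return {s: min(ACTIONS, key=lambda a: q[(s, a)]) for s in states}
```

Calling `greedy_policy(mc_control())` returns a mapping from each pair of component ages to a replacement decision. Because the action `(1, 1)` shares a single planned stop, the learned policy can trade off replacing both components together against replacing each one separately, which is the kind of joint decision that a per-component optimization cannot capture.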


  • Preventive maintenance
  • Opportunistic maintenance
  • Markov decision process
  • Reinforcement learning



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Stephane R. A. Barde (1)
  • Soumaya Yacout (2)
  • Hayong Shin (1)
  1. Korea Advanced Institute of Science and Technology (KAIST), Yuseong-gu, Daejeon, Republic of Korea
  2. Ecole Polytechnique de Montreal, Montreal, Canada
