Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning

  • Takumi Akazaki
  • Shuang Liu
  • Yoriyuki Yamagata
  • Yihai Duan
  • Jianye Hao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10951)

Abstract

With the rapid development of software and distributed computing, Cyber-Physical Systems (CPS) are widely adopted in many application areas, e.g., smart grids and autonomous automobiles. Defects in CPS models are difficult to detect because of the complexity of both the software and the physical systems involved. Robustness-guided falsification of CPS was introduced to find such defects efficiently. Existing methods use several optimization techniques to generate counterexamples, which falsify the given properties of a CPS. However, those methods may require a large number of simulation runs to find a counterexample and are far from practical. In this work, we explore state-of-the-art Deep Reinforcement Learning (DRL) techniques to reduce the number of simulation runs required to find such counterexamples. We report our method and preliminary evaluation results.
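As a rough illustration of what robustness-guided falsification means, the sketch below searches for an input signal that drives a toy model to violate a safety property, using the robustness value (the margin by which the property holds) as the quantity to minimize. Everything in it is an assumption made for illustration: the first-order plant in `simulate`, the property "x always stays below 0.8", and the random hill-climbing search. It is not the DRL method of the paper, only a baseline-style search that counts simulation runs.

```python
import random


def simulate(inputs, dt=0.1):
    """Hypothetical toy plant: x' = -x + u, integrated with forward Euler."""
    x, trace = 0.0, []
    for u in inputs:
        x += dt * (-x + u)
        trace.append(x)
    return trace


def robustness(trace, bound=0.8):
    """Robustness of the property 'always x < bound'.

    It is the smallest margin (bound - x) over the trace; a negative
    value means the trace violates the property.
    """
    return min(bound - x for x in trace)


def falsify(n_steps=50, max_runs=1000, seed=0):
    """Random hill-climbing over input signals in [0, 1] (illustrative only).

    Returns (best_inputs, best_robustness, simulation_runs_used).
    """
    rng = random.Random(seed)
    best_inputs, best_rob = None, float("inf")
    for run in range(1, max_runs + 1):
        if best_inputs is None or rng.random() < 0.2:
            # Occasionally restart from a fresh random input signal.
            cand = [rng.uniform(0.0, 1.0) for _ in range(n_steps)]
        else:
            # Otherwise perturb the best signal found so far, clipped to [0, 1].
            cand = [min(1.0, max(0.0, u + rng.gauss(0.0, 0.1)))
                    for u in best_inputs]
        rob = robustness(simulate(cand))  # one simulation run
        if rob < best_rob:
            best_inputs, best_rob = cand, rob
        if best_rob < 0:
            # Counterexample found: the property is falsified.
            return best_inputs, best_rob, run
    return best_inputs, best_rob, max_runs


if __name__ == "__main__":
    inputs, rob, runs = falsify()
    print(f"best robustness {rob:.4f} after {runs} simulation runs")
```

The number of runs returned by `falsify` corresponds to the cost the abstract refers to: each candidate input requires one full simulation, so a search strategy that reaches negative robustness in fewer runs is the more practical one.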

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Takumi Akazaki (1, 2)
  • Shuang Liu (3)
  • Yoriyuki Yamagata (4)
  • Yihai Duan (3)
  • Jianye Hao (3)

  1. The University of Tokyo, Tokyo, Japan
  2. Japan Society for the Promotion of Science, Tokyo, Japan
  3. School of Software, Tianjin University, Tianjin, China
  4. National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan