
Learning Provably Stabilizing Neural Controllers for Discrete-Time Stochastic Systems

  • Conference paper
  • In: Automated Technology for Verification and Analysis (ATVA 2023)

Abstract

We consider the problem of learning control policies for discrete-time stochastic systems that guarantee, with probability 1, that the system stabilizes within some specified stabilization region. Our approach is based on the novel notion of stabilizing ranking supermartingales (sRSMs) that we introduce in this work. Our sRSMs overcome a limitation of methods proposed in previous works, whose applicability is restricted to systems in which the stabilizing region cannot be left once entered, under any control policy. We present a learning procedure that learns a control policy together with an sRSM that formally certifies probability-1 stability, both represented as neural networks. We show that this procedure can also be adapted to formally verify that, under a given Lipschitz continuous control policy, the stochastic system stabilizes within some stabilizing region with probability 1. Our experimental evaluation shows that the learning procedure can successfully learn provably stabilizing policies in practice.
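
To make the learner side of this idea concrete, below is a minimal PyTorch sketch of how a neural policy and a candidate certificate network can be trained jointly by penalizing sampled violations of a supermartingale-style expected-decrease condition. This is an illustrative sketch only, not the paper's procedure: the dynamics `step`, the margin `EPS`, the target-region radius, and the network sizes are hypothetical placeholders, and the paper's precise sRSM conditions are not reproduced here.

```python
# Illustrative sketch (not the paper's algorithm): jointly train a neural
# policy and a candidate certificate V by penalizing sampled violations of an
# expected-decrease condition E[V(x')] <= V(x) - EPS outside a target region.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 2, 1

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))
# Softplus output keeps the candidate certificate nonnegative.
certificate = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())
opt = torch.optim.Adam(list(policy.parameters()) + list(certificate.parameters()), lr=1e-3)

EPS = 0.1            # required expected decrease outside the stabilization region (placeholder)
TARGET_RADIUS = 0.2  # placeholder stabilization region: the ball of this radius

def step(x, u):
    # Placeholder stochastic dynamics: contracting linear drift, control input,
    # and additive Gaussian noise.
    return 0.9 * x + 0.1 * u.repeat(1, STATE_DIM) + 0.01 * torch.randn_like(x)

for it in range(1000):
    x = torch.empty(256, STATE_DIM).uniform_(-1.0, 1.0)  # sampled training states
    u = policy(x)
    # Monte Carlo estimate of E[V(x')] under the current policy.
    exp_next_v = torch.stack([certificate(step(x, u)) for _ in range(8)]).mean(dim=0)
    # Penalize violations of the expected-decrease condition outside the target region.
    outside = (x.norm(dim=1, keepdim=True) > TARGET_RADIUS).float()
    loss = (outside * torch.relu(exp_next_v - certificate(x) + EPS)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

A complete method, as described in the paper, additionally checks the learned certificate formally over the whole state space (for instance with interval-bound-propagation-style bounds) and feeds counterexamples back into training; the sketch above shows only the sampling-based learner step.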


Notes

  1. Our implementation is available at https://github.com/mlech26l/neural_martingales/tree/ATVA2023.


Acknowledgement

This work was supported in part by the ERC-2020-AdG 101020093, ERC CoG 863818 (FoRM-SMArt) and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 665385.

Author information

Correspondence to Đorđe Žikelić.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ansaripour, M., Chatterjee, K., Henzinger, T.A., Lechner, M., Žikelić, Đ. (2023). Learning Provably Stabilizing Neural Controllers for Discrete-Time Stochastic Systems. In: André, É., Sun, J. (eds) Automated Technology for Verification and Analysis. ATVA 2023. Lecture Notes in Computer Science, vol 14215. Springer, Cham. https://doi.org/10.1007/978-3-031-45329-8_17


  • DOI: https://doi.org/10.1007/978-3-031-45329-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45328-1

  • Online ISBN: 978-3-031-45329-8

  • eBook Packages: Computer Science, Computer Science (R0)
