
Controller synthesis for linear temporal logic and steady-state specifications


Abstract

The problem of deriving decision-making policies subject to a formal specification of behavior has been well studied in the control synthesis, reinforcement learning, and planning communities. Such problems are typically framed in the context of a non-deterministic decision process whose non-determinism is optimally resolved by the computed policy. In this paper, we explore the derivation of such policies in Markov decision processes (MDPs) subject to two types of formal specifications. First, we consider steady-state specifications, which reason about the infinite-frequency behavior of the resulting agent, that is, the frequency with which the agent visits each state as it follows its decision-making policy indefinitely. Second, we constrain the infinite-trace behavior of the agent by imposing Linear Temporal Logic (LTL) requirements on the behavior induced by the resulting policy. We present an algorithm that finds a deterministic policy satisfying both LTL and steady-state constraints by characterizing the solutions as an integer linear program (ILP), and we evaluate the proposed ILP experimentally on MDPs with stochastic and deterministic transitions.
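To make the steady-state half of the problem concrete, the sketch below solves the classical occupancy-measure linear program for a toy two-state MDP with scipy. This is our own illustration of the underlying formulation, not the authors' implementation (their code is linked in the notes); in particular, it omits the integrality constraints and the automaton product that the paper's ILP uses to obtain deterministic, LTL-satisfying policies, and all numbers are a made-up toy instance.

```python
# A minimal sketch of the classical occupancy-measure LP underlying
# steady-state planning; an illustration, not the authors' ILP.
# Variables x[s, a] are long-run state-action visitation frequencies of a
# stationary policy; "balance" constraints make them invariant under the
# transition kernel, and a steady-state specification becomes a linear
# bound on a state's frequency.
import numpy as np
from scipy.optimize import linprog

S, A = 2, 2                       # states and actions
P = np.zeros((S, A, S))           # P[s, a, s'] = transition probability
P[0, 0] = [0.9, 0.1]
P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.5, 0.5]
P[1, 1] = [0.1, 0.9]
r = np.array([[1.0, 0.0],         # r[s, a] = reward earned at steady state
              [0.0, 2.0]])

n = S * A
idx = lambda s, a: s * A + a      # flatten (s, a) into a variable index

# Balance: for every state s', inflow equals outflow:
#   sum_a x[s', a] - sum_{s, a} P[s, a, s'] * x[s, a] = 0,
# plus normalization sum_{s, a} x[s, a] = 1.
A_eq = np.zeros((S + 1, n))
for sp in range(S):
    for a in range(A):
        A_eq[sp, idx(sp, a)] += 1.0
    for s in range(S):
        for a in range(A):
            A_eq[sp, idx(s, a)] -= P[s, a, sp]
A_eq[S, :] = 1.0
b_eq = np.zeros(S + 1)
b_eq[S] = 1.0

# Steady-state specification: visit state 0 with frequency >= 0.4,
# written as -(x[0,0] + x[0,1]) <= -0.4 for linprog's <= convention.
A_ub = np.zeros((1, n))
A_ub[0, idx(0, 0)] = A_ub[0, idx(0, 1)] = -1.0
b_ub = np.array([-0.4])

# Maximize expected steady-state reward (linprog minimizes, hence -r).
res = linprog(-r.flatten(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n)
x = res.x.reshape(S, A)
# pi(a|s) is randomized in general; both states have positive frequency
# in this instance, so the normalization below is well defined.
policy = x / x.sum(axis=1, keepdims=True)
print("state frequencies:", x.sum(axis=1))
print("policy:\n", policy)
```

Note that the LP's optimizer is in general a randomized policy \(\pi(a \mid s) \propto x(s,a)\); forcing determinism is precisely where the paper's integer variables enter, and handling multichain MDPs and the product with an LTL automaton requires the further machinery developed in the paper.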



Notes

  1. LTL is strictly less expressive than the \(\omega\)-regular languages (see the example following these notes). It is worth noting that our approach applies to logics that are strictly more expressive than LTL, such as Linear Dynamic Logic (LDL), which was introduced by Vardi [4] and has the same expressive power as the \(\omega\)-regular languages. We simply adopt LTL for ease of presentation.

  2. https://github.com/ialkhouri/SSLTL.
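A standard illustration of this expressiveness gap (a textbook example, not one drawn from this paper): the property "\(p\) holds at every even position" is \(\omega\)-regular but not expressible in LTL, whereas LDL captures it directly with a regular-expression modality, \([(\mathsf{true};\mathsf{true})^{*}]\,p\), read as "\(p\) holds after every finite prefix matching \((\mathsf{true};\mathsf{true})^{*}\)".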

References

  1. Velasquez, A., Alkhouri, I., Beckus, A., Trivedi, A., & Atia, G. (2022). Controller synthesis for omega-regular and steady-state specifications. In Proceedings of the 21st international conference on autonomous agents and multiagent systems (pp. 1310–1318).

  2. Thomas, W. (1990). Automata on infinite objects. In Handbook of theoretical computer science (pp. 133–191).

  3. Perrin, D., & Pin, J. (2004). Infinite words: Automata, semigroups, logic and games. Academic Press.

  4. Vardi, M. (2011). The rise and fall of linear time logic. In 2nd international symposium on games, automata, logics and formal verification.

  5. de Alfaro, L. (1998). Formal verification of probabilistic systems (Ph.D. thesis, Stanford University).

  6. Baier, C., & Katoen, J.-P. (2008). Principles of model checking. MIT Press.

  7. Akshay, S., Bertrand, N., Haddad, S., & Hélouët, L. (2013). The steady-state control problem for Markov decision processes. In International conference on quantitative evaluation of systems (pp. 290–304). Springer.

  8. Velasquez, A. (2019). Steady-state policy synthesis for verifiable control. In Proceedings of the 28th international joint conference on artificial intelligence, IJCAI-19 (pp. 5653–5661).

  9. Atia, G., Beckus, A., Alkhouri, I., & Velasquez, A. (2020). Steady-state policy synthesis in multichain markov decision processes. In Proceedings of the 29th international joint conference on artificial intelligence, IJCAI-20 (pp. 4069–4075).

  10. Atia, G. K., Beckus, A., Alkhouri, I., & Velasquez, A. (2021). Steady-state planning in expected reward multichain MDPs. Journal of Artificial Intelligence Research, 72, 1029–1082.


  11. Křetínský, J. (2021). LTL-constrained steady-state policy synthesis. arXiv preprint arXiv:2105.14894.

  12. Kallenberg, L. (2002). Classification problems in MDPs. In Markov processes and controlled Markov chains (pp. 151–165).

  13. Sarathy, V., Kasenberg, D., Goel, S., Sinapov, J., & Scheutz, M. (2021). SPOTTER: Extending symbolic planning operators through targeted reinforcement learning. In Proceedings of the 20th international conference on autonomous agents and multiagent systems (pp. 1118–1126).

  14. Ding, X. C. D., Smith, S. L., Belta, C., & Rus, D. (2011). LTL control in uncertain environments with probabilistic satisfaction guarantees. IFAC Proceedings Volumes, 44(1), 3515–3520.


  15. Lacerda, B., Parker, D., & Hawes, N. (2014). Optimal and dynamic planning for Markov decision processes with co-safe LTL specifications. In 2014 IEEE/RSJ international conference on intelligent robots and systems (pp. 1511–1516). IEEE.

  16. Norris, J. R. (1998). Markov chains, vol. 2. Cambridge University Press.

  17. Pnueli, A., & Rosner, R. (1989). On the synthesis of a reactive module. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on principles of programming languages (pp. 179–190).

  18. Church, A. (1963). Application of recursive arithmetic to the problem of circuit synthesis. Journal of Symbolic Logic, 28(4).

  19. Etessami, K., Kwiatkowska, M., Vardi, M. Y., & Yannakakis, M. (2007). Multi-objective model checking of Markov decision processes. In International conference on tools and algorithms for the construction and analysis of systems (pp. 50–65). Springer.

  20. Etessami, K., Kwiatkowska, M., Vardi, M. Y., & Yannakakis, M. (2008). Multi-objective model checking of Markov decision processes. Logical Methods in Computer Science.

  21. Forejt, V., Kwiatkowska, M., Norman, G., & Parker, D. (2011). Automated verification techniques for probabilistic systems. In International school on formal methods for the design of computer, communication and software systems (pp. 53–113). Springer.

  22. Chatterjee, K., & Henzinger, M. (2011). Faster and dynamic algorithms for maximal end-component decomposition and related graph problems in probabilistic verification. In Symposium on discrete algorithms (SODA) (pp. 1318–1336).

  23. Kallenberg, L. C. M. (1983). Linear programming and finite Markovian control problems. Mathematisch Centrum.

  24. Puterman, M. L. (1994). Markov decision processes. Wiley.

  25. Altman, E. (1998). Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program. Mathematical Methods of Operations Research, 48(3), 387–417.


  26. Feinberg, E. A. (2009). Adaptive computation of optimal nonrandomized policies in constrained average-reward MDPs. In IEEE symposium on adaptive dynamic programming and reinforcement learning (pp. 96–100). https://doi.org/10.1109/ADPRL.2009.4927531

  27. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.

  28. Bouyer, P., Markey, N., & Matteplackel, R. M. (2014). Averaging in LTL. In International conference on concurrency theory (pp. 266–280). Springer.

  29. Almagor, S., Boker, U., & Kupferman, O. (2014). Discounting in LTL. In International conference on tools and algorithms for the construction and analysis of systems (pp. 424–439). Springer.

  30. Boker, U., Chatterjee, K., Henzinger, T. A., & Kupferman, O. (2014). Temporal specifications with accumulative values. ACM Transactions on Computational Logic (TOCL), 15(4), 1–25.


  31. Bollig, B., Decker, N., & Leucker, M. (2012). Frequency linear-time temporal logic. In 2012 6th international symposium on theoretical aspects of software engineering (pp. 85–92). IEEE.

  32. Svoreňová, M., Černá, I., & Belta, C. (2013). Optimal control of MDPs with temporal logic constraints. In 52nd IEEE conference on decision and control (pp. 3938–3943). IEEE.

  33. Altman, E., Boularouk, S., & Josselin, D. (2019). Constrained Markov decision processes with total expected cost criteria. In Proceedings of the 12th EAI international conference on performance evaluation methodologies and tools (pp. 191–192). ACM.

  34. Krass, D., & Vrieze, O. J. (2002). Achieving target state-action frequencies in multichain average-reward Markov decision processes. Mathematics of Operations Research, 27(3), 545–566.


  35. Esparza, J., & Křetínský, J. (2014). From LTL to deterministic automata: A Safraless compositional approach. In Computer aided verification: 26th international conference, CAV 2014 (pp. 192–208). Springer.

  36. Trevizan, F. W., Thiébaux, S., & Haslum, P. (2017). Occupation measure heuristics for probabilistic planning. In ICAPS (pp. 306–315).

  37. Buchholz, P. (1994). Exact and ordinary lumpability in finite Markov chains. Journal of Applied Probability, 59–75.

  38. Sumita, U., & Rieders, M. (1989). Lumpability and time reversibility in the aggregation-disaggregation method for large Markov chains. Stochastic Models, 5(1), 63–81.


  39. ILOG, Inc. (2006). ILOG CPLEX: High-performance software for mathematical programming and optimization. See http://www.ilog.com/products/cplex/.


Acknowledgements

This research was supported in part by the Air Force Research Laboratory through the Information Directorate’s Information Institute\(^{\circledR }\) Contract Numbers FA8750-20-3-1003 and FA8750-20-3-1004, the Air Force Office of Scientific Research through Award 20RICOR012, and the National Science Foundation through CAREER Award CCF-1552497 and Award CCF-2106339.

Author information


Contributions

A.V. proposed the idea, wrote the paper, and developed the theoretical work. I.A. and A.B. worked on the experiments, code, and figures. G.A. and A.T. helped with the writing and the development of the theoretical work.

Corresponding authors

Correspondence to Alvaro Velasquez or Ismail Alkhouri.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A conference version of this work (with mathematical proofs omitted) appeared in AAMAS 2022 [1]. The work was done while the authors were at the University of Central Florida.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Velasquez, A., Alkhouri, I., Beckus, A. et al. Controller synthesis for linear temporal logic and steady-state specifications. Auton Agent Multi-Agent Syst 38, 17 (2024). https://doi.org/10.1007/s10458-024-09648-7

