Abstract
The problem of deriving decision-making policies subject to a formal specification of behavior has been well-studied in the control synthesis, reinforcement learning, and planning communities. Such problems are typically framed in the context of a non-deterministic decision process whose non-determinism is optimally resolved by the computed policy. In this paper, we explore the derivation of such policies in Markov decision processes (MDPs) subject to two types of formal specifications. First, we consider steady-state specifications that reason about the infinite-frequency behavior of the resulting agent, that is, the frequency with which the agent visits each state as it follows its decision-making policy indefinitely. Second, we constrain the infinite-trace behavior of the agent by imposing Linear Temporal Logic (LTL) specifications on the behavior induced by the resulting policy. We present an algorithm that finds a deterministic policy satisfying both LTL and steady-state constraints by characterizing the solutions as an integer linear program (ILP), and we experimentally evaluate the proposed ILP on MDPs with stochastic and deterministic transitions.
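To make the occupation-measure view in the abstract concrete, the sketch below sets up only the steady-state part of the problem as a linear program in Python (NumPy and SciPy are our illustrative choices here, not anything prescribed by the paper). The 3-state MDP, the 0.2 frequency bound, and all variable names are assumptions made for the example; the paper's full formulation is an integer linear program that additionally forces the policy to be deterministic and encodes the LTL constraint via a product with an automaton.

```python
# A minimal sketch, NOT the paper's implementation: it solves only the LP
# relaxation over state-action occupation measures x[s, a] for a small,
# hypothetical 3-state MDP with one steady-state (visitation-frequency)
# bound. The binary variables for determinism and the product-automaton
# constraints for LTL used in the paper's ILP are omitted here.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 3, 2

# P[s, a, s'] = probability of moving to s' when taking action a in state s.
P = np.zeros((n_states, n_actions, n_states))
P[0, 0] = [0.1, 0.9, 0.0]; P[0, 1] = [0.8, 0.0, 0.2]
P[1, 0] = [0.0, 0.5, 0.5]; P[1, 1] = [1.0, 0.0, 0.0]
P[2, 0] = [0.3, 0.3, 0.4]; P[2, 1] = [0.0, 1.0, 0.0]

nv = n_states * n_actions
def idx(s, a):  # flatten (state, action) into a variable index
    return s * n_actions + a

# Flow balance: for every state s, sum_a x[s,a] = sum_{s',a'} x[s',a'] * P[s',a',s],
# plus the normalization sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((n_states + 1, nv))
b_eq = np.zeros(n_states + 1)
for s in range(n_states):
    for a in range(n_actions):
        A_eq[s, idx(s, a)] += 1.0
    for sp in range(n_states):
        for ap in range(n_actions):
            A_eq[s, idx(sp, ap)] -= P[sp, ap, s]
A_eq[n_states, :] = 1.0
b_eq[n_states] = 1.0

# Steady-state specification (an assumed example): visit state 2 with
# long-run frequency at least 0.2, written as -sum_a x[2,a] <= -0.2.
A_ub = np.zeros((1, nv))
A_ub[0, idx(2, 0)] = A_ub[0, idx(2, 1)] = -1.0
b_ub = np.array([-0.2])

# Feasibility problem (zero objective); an ILP solver would be used instead
# once binary variables enforcing a deterministic policy are added.
res = linprog(c=np.zeros(nv), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * nv, method="highs")
print("feasible:", res.success)
if res.success:
    x = res.x.reshape(n_states, n_actions)
    freq = x.sum(axis=1)
    print("long-run visitation frequency per state:", freq)
    # Recover a (randomized) policy from the occupation measure where defined.
    policy = np.divide(x, freq[:, None], out=np.zeros_like(x),
                       where=freq[:, None] > 0)
    print("induced policy (rows of action probabilities):", policy, sep="\n")
```

The LP relaxation above generally yields a randomized policy; it is the integrality constraints of the ILP that restrict the search to deterministic policies, as stated in the abstract.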
Notes
LTL is strictly less expressive than the \(\omega\)-regular languages. Our approach also applies to logics that are strictly more expressive than LTL, such as Linear Dynamic Logic (LDL), introduced by Vardi [4], which has the same expressive power as the \(\omega\)-regular languages; we adopt LTL for ease of presentation.
References
Velasquez, A., Alkhouri, I., Beckus, A., Trivedi, A., & Atia, G. (2022). Controller synthesis for omega-regular and steady-state specifications. In Proceedings of the 21st international conference on autonomous agents and multiagent systems (pp. 1310–1318).
Thomas, W. (1990). Automata on infinite objects. In Handbook of theoretical computer science (pp. 133–191).
Perrin, D., & Pin, J. (2004). Infinite words: Automata, semigroups, logic and games. Academic Press.
Vardi, M. (2011). The rise and fall of linear time logic. In 2nd International Symposium on Games, Automata, Logics and Formal Verification.
de Alfaro, L. (1998). Formal verification of probabilistic systems. (Ph.D. Thesis, Stanford University).
Baier, C., & Katoen, J.-P. (2008). Principles of model checking. MIT Press.
Akshay, S., Bertrand, N., Haddad, S., & Helouet, L. (2013). The steady-state control problem for Markov decision processes. In International conference on quantitative evaluation of systems (pp. 290–304). Springer.
Velasquez, A. (2019). Steady-state policy synthesis for verifiable control. In Proceedings of the 28th international joint conference on artificial intelligence, IJCAI-19 (pp. 5653–5661).
Atia, G., Beckus, A., Alkhouri, I., & Velasquez, A. (2020). Steady-state policy synthesis in multichain Markov decision processes. In Proceedings of the 29th international joint conference on artificial intelligence, IJCAI-20 (pp. 4069–4075).
Atia, G. K., Beckus, A., Alkhouri, I., & Velasquez, A. (2021). Steady-state planning in expected reward multichain MDPs. Journal of Artificial Intelligence Research, 72, 1029–1082.
Křetínskỳ, J. (2021). LTL-constrained steady-state policy synthesis. arXiv preprint arXiv:2105.14894.
Kallenberg, L. (2002). Classification problems in MDPs. In Markov Processes and Controlled Markov Chains (pp. 151–165).
Sarathy, V., Kasenberg, D., Goel, S., Sinapov, J., & Scheutz, M. (2021). Spotter: Extending symbolic planning operators through targeted reinforcement learning. In Proceedings of the 20th international conference on autonomous agents and multiAgent systems (pp. 1118–1126).
Ding, X. C. D., Smith, S. L., Belta, C., & Rus, D. (2011). LTL control in uncertain environments with probabilistic satisfaction guarantees. IFAC Proceedings Volumes, 44(1), 3515–3520.
Lacerda, B., Parker, D., & Hawes, N. (2014). Optimal and dynamic planning for Markov decision processes with co-safe LTL specifications. In 2014 IEEE/RSJ international conference on intelligent robots and systems (pp. 1511–1516). IEEE.
Norris, J. R. (1998). Markov chains (Vol. 2). Cambridge University Press.
Pnueli, A., & Rosner, R. (1989). On the synthesis of a reactive module. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on principles of programming languages (pp. 179–190).
Church, A. (1963). Application of recursive arithmetic to the problem of circuit synthesis. Journal of Symbolic Logic, 28(4).
Etessami, K., Kwiatkowska, M., Vardi, M. Y., & Yannakakis, M. (2007). Multi-objective model checking of Markov decision processes. In International conference on tools and algorithms for the construction and analysis of systems (pp. 50–65). Springer.
Yannakakis, M., Vardi, M. Y., Kwiatkowska, M., & Etessami, K. (2008). Multi-objective model checking of Markov decision processes. Logical Methods in Computer Science.
Forejt, V., Kwiatkowska, M., Norman, G., & Parker, D. (2011). Automated verification techniques for probabilistic systems. In International school on formal methods for the design of computer, communication and software systems (pp. 53–113). Springer.
Chatterjee, K., & Henzinger, M. (2011). Faster and dynamic algorithms for maximal end-component decomposition and related graph problems in probabilistic verification. In Symposium on discrete algorithms (SODA) (pp. 1318–1336).
Kallenberg, L. C. M. (1983). Linear programming and finite Markovian control problems. Mathematisch Centrum.
Puterman, M. L. (1994). Markov decision processes. Wiley.
Altman, E. (1998). Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program. Mathematical Methods of Operations Research, 48(3), 387–417.
Feinberg, E. A. (2009). Adaptive computation of optimal nonrandomized policies in constrained average-reward MDPs. In IEEE symposium on adaptive dynamic programming and reinforcement learning (pp. 96–100). https://doi.org/10.1109/ADPRL.2009.4927531
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Bouyer, P., Markey, N., & Matteplackel, R. M. (2014). Averaging in LTL. In International conference on concurrency theory (pp. 266–280). Springer.
Almagor, S., Boker, U., & Kupferman, O. (2014). Discounting in LTL. In International conference on tools and algorithms for the construction and analysis of systems (pp. 424–439). Springer.
Boker, U., Chatterjee, K., Henzinger, T. A., & Kupferman, O. (2014). Temporal specifications with accumulative values. ACM Transactions on Computational Logic (TOCL), 15(4), 1–25.
Bollig, B., Decker, N., & Leucker, M. (2012). Frequency linear-time temporal logic. In 2012 6th international symposium on theoretical aspects of software engineering (pp. 85–92). IEEE.
Svoreňová, M., Černá, I., & Belta, C. (2013). Optimal control of MDPs with temporal logic constraints. In 52nd IEEE conference on decision and control (pp. 3938–3943). IEEE.
Altman, E., Boularouk, S., & Josselin, D. (2019). Constrained Markov decision processes with total expected cost criteria. In Proceedings of the 12th EAI international conference on performance evaluation methodologies and tools (pp. 191–192). ACM.
Krass, D., & Vrieze, O. J. (2002). Achieving target state-action frequencies in multichain average-reward Markov decision processes. Mathematics of Operations Research, 27(3), 545–566.
Esparza, J., & Křetínskỳ, J. (2014). From LTL to deterministic automata: A Safraless compositional approach. In Computer aided verification: 26th international conference, CAV 2014, Vienna, Austria, July 18–22, 2014 (pp. 192–208). Springer.
Trevizan, F. W., Thiébaux, S., & Haslum, P. (2017). Occupation measure heuristics for probabilistic planning. In ICAPS (pp. 306–315).
Buchholz, P. (1994). Exact and ordinary lumpability in finite Markov chains. Journal of Applied Probability, 59–75.
Sumita, U., & Rieders, M. (1989). Lumpability and time reversibility in the aggregation-disaggregation method for large Markov chains. Stochastic Models, 5(1), 63–81.
ILOG, Inc. (2006). ILOG CPLEX: High-performance software for mathematical programming and optimization. See http://www.ilog.com/products/cplex/.
Acknowledgements
This research was supported in part by the Air Force Research Laboratory through the Information Directorate’s Information Institute\(^{\circledR }\) Contract Numbers FA8750-20-3-1003 and FA8750-20-3-1004, the Air Force Office of Scientific Research through Award 20RICOR012, and the National Science Foundation through CAREER Award CCF-1552497 and Award CCF-2106339.
Author information
Authors and Affiliations
Contributions
A.V. proposed the idea, wrote the paper, and developed the theoretical work. I.A. and A.B. worked on the experiments, code, and figures. G.A. and A.T. helped with the writing and the development of the theoretical work.
Corresponding authors
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A conference version of this work (with mathematical proofs omitted) appeared in AAMAS 2022 [1]. The work was done while at the University of Central Florida.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Velasquez, A., Alkhouri, I., Beckus, A. et al. Controller synthesis for linear temporal logic and steady-state specifications. Auton Agent Multi-Agent Syst 38, 17 (2024). https://doi.org/10.1007/s10458-024-09648-7