Abstract
The problem of deriving decision-making policies subject to a formal specification of behavior has been well-studied in the control synthesis, reinforcement learning, and planning communities. Such problems are typically framed in the context of a non-deterministic decision process whose non-determinism is optimally resolved by the computed policy. In this paper, we explore the derivation of such policies in Markov decision processes (MDPs) subject to two types of formal specifications. First, we consider steady-state specifications that reason about the infinite-frequency behavior of the resulting agent, that is, the frequency with which the agent visits each state as it follows its decision-making policy indefinitely. Second, we constrain the infinite-trace behavior of the agent by imposing Linear Temporal Logic (LTL) specifications on the behavior induced by the resulting policy. We present an algorithm that finds a deterministic policy satisfying both LTL and steady-state constraints by characterizing the solutions as an integer linear program (ILP), and we experimentally evaluate the proposed ILP on MDPs with stochastic and deterministic transitions.
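To make the occupation-measure view in the abstract concrete, the sketch below sets up only the steady-state part of the problem as a linear program in Python (NumPy and SciPy are our illustrative choices here, not anything prescribed by the paper). The 3-state MDP, the 0.2 frequency bound, and all variable names are assumptions made for the example; the paper's full formulation is an integer linear program that additionally forces the policy to be deterministic and encodes the LTL constraint via a product with an automaton.

```python
# A minimal sketch, NOT the paper's implementation: it solves only the LP
# relaxation over state-action occupation measures x[s, a] for a small,
# hypothetical 3-state MDP with one steady-state (visitation-frequency)
# bound. The binary variables for determinism and the product-automaton
# constraints for LTL used in the paper's ILP are omitted here.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 3, 2

# P[s, a, s'] = probability of moving to s' when taking action a in state s.
P = np.zeros((n_states, n_actions, n_states))
P[0, 0] = [0.1, 0.9, 0.0]; P[0, 1] = [0.8, 0.0, 0.2]
P[1, 0] = [0.0, 0.5, 0.5]; P[1, 1] = [1.0, 0.0, 0.0]
P[2, 0] = [0.3, 0.3, 0.4]; P[2, 1] = [0.0, 1.0, 0.0]

nv = n_states * n_actions
def idx(s, a):  # flatten (state, action) into a variable index
    return s * n_actions + a

# Flow balance: for every state s, sum_a x[s,a] = sum_{s',a'} x[s',a'] * P[s',a',s],
# plus the normalization sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((n_states + 1, nv))
b_eq = np.zeros(n_states + 1)
for s in range(n_states):
    for a in range(n_actions):
        A_eq[s, idx(s, a)] += 1.0
    for sp in range(n_states):
        for ap in range(n_actions):
            A_eq[s, idx(sp, ap)] -= P[sp, ap, s]
A_eq[n_states, :] = 1.0
b_eq[n_states] = 1.0

# Steady-state specification (an assumed example): visit state 2 with
# long-run frequency at least 0.2, written as -sum_a x[2,a] <= -0.2.
A_ub = np.zeros((1, nv))
A_ub[0, idx(2, 0)] = A_ub[0, idx(2, 1)] = -1.0
b_ub = np.array([-0.2])

# Feasibility problem (zero objective); an ILP solver would be used instead
# once binary variables enforcing a deterministic policy are added.
res = linprog(c=np.zeros(nv), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * nv, method="highs")
print("feasible:", res.success)
if res.success:
    x = res.x.reshape(n_states, n_actions)
    freq = x.sum(axis=1)
    print("long-run visitation frequency per state:", freq)
    # Recover a (randomized) policy from the occupation measure where defined.
    policy = np.divide(x, freq[:, None], out=np.zeros_like(x),
                       where=freq[:, None] > 0)
    print("induced policy (rows of action probabilities):", policy, sep="\n")
```

The LP relaxation above generally yields a randomized policy; it is the integrality constraints of the ILP that restrict the search to deterministic policies, as stated in the abstract.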
Notes
LTL is strictly less expressive than the \(\omega\)-regular languages. Our approach also applies to logics that are strictly more expressive than LTL, such as Linear Dynamic Logic (LDL), introduced by Vardi [4], which has the same expressive power as the \(\omega\)-regular languages; we adopt LTL for ease of presentation.
References
Velasquez, A., Alkhouri, I., Beckus, A., Trivedi, A., & Atia, G. (2022). Controller synthesis for omega-regular and steady-state specifications. In Proceedings of the 21st international conference on autonomous agents and multiagent systems (pp. 1310–1318).
Thomas, W. (1990). Automata on infinite objects. In Handbook of theoretical computer science (pp. 133–191).
Perrin, D., & Pin, J. (2004). Infinite words: Automata, semigroups, logic and games. Academic Press.
Vardi, M. (2011). The rise and fall of linear time logic. In 2nd International Symposium on Games, Automata, Logics and Formal Verification.
de Alfaro, L. (1998). Formal verification of probabilistic systems. (Ph.D. Thesis, Stanford University).
Baier, C., & Katoen, J.-P. (2008). Principles of model checking. MIT Press.
Akshay, S., Bertrand, N., Haddad, S., & Helouet, L. (2013). The steady-state control problem for Markov decision processes. In International conference on quantitative evaluation of systems (pp. 290–304). Springer.
Velasquez, A. (2019). Steady-state policy synthesis for verifiable control. In Proceedings of the 28th international joint conference on artificial intelligence, IJCAI-19 (pp. 5653–5661).
Atia, G., Beckus, A., Alkhouri, I., & Velasquez, A. (2020). Steady-state policy synthesis in multichain Markov decision processes. In Proceedings of the 29th international joint conference on artificial intelligence, IJCAI-20 (pp. 4069–4075).
Atia, G. K., Beckus, A., Alkhouri, I., & Velasquez, A. (2021). Steady-state planning in expected reward multichain MDPs. Journal of Artificial Intelligence Research, 72, 1029–1082.
Křetínskỳ, J. (2021). LTL-constrained steady-state policy synthesis. arXiv preprint arXiv:2105.14894.
Kallenberg, L. (2002). Classification problems in MDPs. In Markov Processes and Controlled Markov Chains (pp. 151–165).
Sarathy, V., Kasenberg, D., Goel, S., Sinapov, J., & Scheutz, M. (2021). Spotter: Extending symbolic planning operators through targeted reinforcement learning. In Proceedings of the 20th international conference on autonomous agents and multiAgent systems (pp. 1118–1126).
Ding, X. C. D., Smith, S. L., Belta, C., & Rus, D. (2011). LTL control in uncertain environments with probabilistic satisfaction guarantees. IFAC Proceedings Volumes, 44(1), 3515–3520.
Lacerda, B., Parker, D., & Hawes, N. (2014). Optimal and dynamic planning for Markov decision processes with co-safe LTL specifications. In 2014 IEEE/RSJ international conference on intelligent robots and systems (pp. 1511–1516). IEEE.
Norris, J. R. (1998). Markov chains (Vol. 2). Cambridge University Press.
Pnueli, A., & Rosner, R. (1989). On the synthesis of a reactive module. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on principles of programming languages (pp. 179–190).
Church, A. (1963). Application of recursive arithmetic to the problem of circuit synthesis. Journal of Symbolic Logic, 28(4).
Etessami, K., Kwiatkowska, M., Vardi, M. Y., & Yannakakis, M. (2007). Multi-objective model checking of Markov decision processes. In International conference on tools and algorithms for the construction and analysis of systems (pp. 50–65). Springer.
Yannakakis, M., Vardi, M. Y., Kwiatkowska, M., & Etessami, K. (2008). Multi-objective model checking of Markov decision processes. Logical Methods in Computer Science.
Forejt, V., Kwiatkowska, M., Norman, G., & Parker, D. (2011). Automated verification techniques for probabilistic systems. In International school on formal methods for the design of computer, communication and software systems (pp. 53–113). Springer.
Chatterjee, K., & Henzinger, M. (2011). Faster and dynamic algorithms for maximal end-component decomposition and related graph problems in probabilistic verification. In Symposium on discrete algorithms (SODA) (pp. 1318–1336).
Kallenberg, L. C. M. (1983). Linear programming and finite Markovian control problems. Mathematisch Centrum.
Puterman, M. L. (1994). Markov decision processes. Wiley.
Altman, E. (1998). Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program. Mathematical Methods of Operations Research, 48(3), 387–417.
Feinberg, E. A. (2009). Adaptive computation of optimal nonrandomized policies in constrained average-reward MDPs. In IEEE symposium on adaptive dynamic programming and reinforcement learning (pp. 96–100). https://doi.org/10.1109/ADPRL.2009.4927531
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Bouyer, P., Markey, N., & Matteplackel, R. M. (2014). Averaging in LTL. In International conference on concurrency theory (pp. 266–280). Springer.
Almagor, S., Boker, U., & Kupferman, O. (2014). Discounting in LTL. In International conference on tools and algorithms for the construction and analysis of systems (pp. 424–439). Springer.
Boker, U., Chatterjee, K., Henzinger, T. A., & Kupferman, O. (2014). Temporal specifications with accumulative values. ACM Transactions on Computational Logic (TOCL), 15(4), 1–25.
Bollig, B., Decker, N., & Leucker, M. (2012). Frequency linear-time temporal logic. In 2012 6th international symposium on theoretical aspects of software engineering (pp. 85–92). IEEE.
Svoreňová, M., Černá, I., & Belta, C. (2013). Optimal control of MDPs with temporal logic constraints. In 52nd IEEE conference on decision and control (pp. 3938–3943). IEEE.
Altman, E., Boularouk, S., & Josselin, D. (2019). Constrained Markov decision processes with total expected cost criteria. In Proceedings of the 12th EAI international conference on performance evaluation methodologies and tools (pp. 191–192). ACM.
Krass, D., & Vrieze, O. J. (2002). Achieving target state-action frequencies in multichain average-reward Markov decision processes. Mathematics of Operations Research, 27(3), 545–566.
Esparza, J., & Křetínskỳ, J. (2014). From LTL to deterministic automata: A Safraless compositional approach. In Computer aided verification: 26th international conference, CAV 2014, Vienna, Austria, July 18–22, 2014 (pp. 192–208). Springer.
Trevizan, F. W., Thiébaux, S., & Haslum, P. (2017). Occupation measure heuristics for probabilistic planning. In ICAPS (pp. 306–315).
Buchholz, P. (1994). Exact and ordinary lumpability in finite Markov chains. Journal of Applied Probability, 59–75.
Sumita, U., & Rieders, M. (1989). Lumpability and time reversibility in the aggregation-disaggregation method for large Markov chains. Stochastic Models, 5(1), 63–81.
ILOG, Inc. (2006). ILOG CPLEX: High-performance software for mathematical programming and optimization. See http://www.ilog.com/products/cplex/.
Acknowledgements
This research was supported in part by the Air Force Research Laboratory through the Information Directorate’s Information Institute\(^{\circledR }\) Contract Numbers FA8750-20-3-1003 and FA8750-20-3-1004, the Air Force Office of Scientific Research through Award 20RICOR012, and the National Science Foundation through CAREER Award CCF-1552497 and Award CCF-2106339.
Author information
Authors and Affiliations
Contributions
A.V. proposed the idea, wrote the paper, and developed the theoretical work. I.A. and A.B. worked on the experiments, code, and figures. G.A. and A.T. helped with the writing and the development of the theoretical work.
Corresponding authors
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A conference version of this work (with mathematical proofs omitted) appeared in AAMAS 2022 [1]. The work was done while at the University of Central Florida.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Velasquez, A., Alkhouri, I., Beckus, A. et al. Controller synthesis for linear temporal logic and steady-state specifications. Auton Agent Multi-Agent Syst 38, 17 (2024). https://doi.org/10.1007/s10458-024-09648-7