Abstract
This paper explores the problem of formalizing the development of autonomous artificial intelligence systems (AAISs) whose mathematical models may be complex or non-identifiable. Using the value iteration method for Q-functions of rewards, we develop a methodology for constructing ε-optimal strategies with a prescribed accuracy. The results make it possible to outline classes of AAISs (including dual-use systems) for which the construction of optimal and ε-optimal strategies can be rigorously justified even when the models are identifiable but the computational complexity of standard dynamic programming algorithms is not strongly polynomial.
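The value iteration scheme for Q-functions mentioned above can be illustrated on a toy example. The following sketch (a minimal illustration, not the paper's construction; the MDP, discount factor, and stopping threshold are assumptions chosen for demonstration) iterates the Bellman update Q_{k+1}(s,a) = R(s,a) + γ Σ_{s'} P(s'|s,a) max_{a'} Q_k(s',a') and stops once the sup-norm change is below ε(1−γ)/(2γ), a standard criterion guaranteeing that the greedy policy with respect to the final Q is ε-optimal:

```python
# Toy MDP with 2 states and 2 actions (illustrative values only).
# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.5}}
GAMMA = 0.9  # discount factor

def q_value_iteration(eps):
    """Run value iteration on Q-functions until the greedy policy is eps-optimal."""
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    # Standard stopping rule: ||Q_{k+1} - Q_k||_sup < eps * (1 - gamma) / (2 * gamma)
    threshold = eps * (1 - GAMMA) / (2 * GAMMA)
    while True:
        # Bellman update: Q(s,a) = R(s,a) + gamma * E[max_a' Q(s', a')]
        Q_new = {
            s: {
                a: R[s][a] + GAMMA * sum(p * max(Q[s2].values()) for s2, p in P[s][a])
                for a in P[s]
            }
            for s in P
        }
        delta = max(abs(Q_new[s][a] - Q[s][a]) for s in P for a in P[s])
        Q = Q_new
        if delta < threshold:
            break
    # Greedy (eps-optimal) stationary strategy: pick the argmax action in each state.
    policy = {s: max(Q[s], key=Q[s].get) for s in P}
    return Q, policy

Q, policy = q_value_iteration(eps=1e-3)
print(policy)  # maps each state to its eps-optimal action
```

Note that the number of iterations needed grows as γ approaches 1, which is the computational-complexity caveat the abstract refers to: value iteration is not strongly polynomial in general.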
Additional information
Translated from Kibernetyka ta Systemnyi Analiz, No. 5, September–October, 2023, pp. 89–99.
About this article
Cite this article
Zgurovsky, M.Z., Kasyanov, P.O. & Levenchuk, L.B. Formalization of Methods for the Development of Autonomous Artificial Intelligence Systems. Cybern Syst Anal 59, 763–771 (2023). https://doi.org/10.1007/s10559-023-00612-z