
Formalization of Methods for the Development of Autonomous Artificial Intelligence Systems

Published in: Cybernetics and Systems Analysis

Abstract

This paper explores the problem of formalizing the development of autonomous artificial intelligence systems (AAISs) whose mathematical models may be complex or non-identifiable. Using the value iteration method for Q-functions of rewards, we develop a methodology for constructing ε-optimal strategies with a given accuracy. The results make it possible to outline classes of problems (including dual-use problems) for which the construction of optimal and ε-optimal strategies can be rigorously justified, even in cases where the models are identifiable but the computational complexity of standard dynamic programming algorithms is not strongly polynomial.
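The value iteration scheme for Q-functions mentioned in the abstract can be sketched in a heavily simplified setting. The snippet below is an illustrative assumption, not the paper's construction: it uses a small randomly generated finite MDP (transition tensor `P`, reward matrix `R`, discount `gamma`), and stops once successive Q-iterates differ by at most ε(1 − γ)/(2γ) in sup-norm, a standard criterion under which the greedy policy is ε-optimal.

```python
import numpy as np

# Illustrative finite MDP, invented for demonstration; the paper treats far
# more general (possibly complex or non-identifiable) models.
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # R[s, a]
gamma = 0.9   # discount factor
eps = 1e-3    # target accuracy of the resulting strategy

# Value iteration on the Q-function of rewards.  Stopping when successive
# iterates are within eps * (1 - gamma) / (2 * gamma) in sup-norm makes the
# greedy policy eps-optimal (standard bound for discounted value iteration).
Q = np.zeros((n_states, n_actions))
threshold = eps * (1.0 - gamma) / (2.0 * gamma)
while True:
    V = Q.max(axis=1)            # current state-value estimate
    Q_new = R + gamma * P @ V    # Bellman operator applied to Q
    delta = np.abs(Q_new - Q).max()
    Q = Q_new
    if delta <= threshold:
        break

policy = Q.argmax(axis=1)        # eps-optimal greedy strategy
```

Because the Bellman operator is a γ-contraction in sup-norm, the loop terminates; the number of iterations grows with 1/(1 − γ), which is one way the non-strongly-polynomial behavior discussed in the abstract manifests.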



Author information

Corresponding author

Correspondence to M. Z. Zgurovsky.

Additional information

Translated from Kibernetyka ta Systemnyi Analiz, No. 5, September–October, 2023, pp. 89–99.


About this article


Cite this article

Zgurovsky, M.Z., Kasyanov, P.O. & Levenchuk, L.B. Formalization of Methods for the Development of Autonomous Artificial Intelligence Systems. Cybern Syst Anal 59, 763–771 (2023). https://doi.org/10.1007/s10559-023-00612-z
