Abstract
A Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely quantum MDPs (qMDPs), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies of qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for applying reinforcement learning techniques to the quantum world.
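For orientation, the dynamic programming mentioned above generalises the classical finite-horizon backward induction for ordinary MDPs. The following is a minimal Python sketch of that classical recursion only; it is not the paper's qMDP algorithm and does not model quantum states or operations. The dictionary encoding, the function names `q_value`, `evaluate_policy` and `optimal_policy`, and the two-state example `TOY` are hypothetical illustrations.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: P[state][action] is a list of (probability, next_state, reward).
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]
Policy = List[Dict[str, str]]  # policy[t][state] -> action chosen at step t


def q_value(P: MDP, s: str, a: str, V_next: Dict[str, float]) -> float:
    """Expected one-step reward plus value-to-go for taking action a in state s."""
    return sum(p * (r + V_next[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P: MDP, policy: Policy, horizon: int) -> Dict[str, float]:
    """Policy evaluation: V_t(s) = Q_t(s, policy[t][s]), computed backwards from V_H = 0."""
    V = {s: 0.0 for s in P}
    for t in reversed(range(horizon)):
        V = {s: q_value(P, s, policy[t][s], V) for s in P}
    return V


def optimal_policy(P: MDP, horizon: int) -> Tuple[Dict[str, float], Policy]:
    """Backward induction: at each step choose the action maximising Q_t(s, a)."""
    V = {s: 0.0 for s in P}
    policy: Policy = [{} for _ in range(horizon)]
    for t in reversed(range(horizon)):
        V_new = {}
        for s in P:
            best = max(P[s], key=lambda a: q_value(P, s, a, V))
            policy[t][s] = best
            V_new[s] = q_value(P, s, best, V)
        V = V_new
    return V, policy


# Hypothetical two-state example.
TOY: MDP = {
    "low": {"wait": [(1.0, "low", 0.0)],
            "work": [(0.6, "high", 1.0), (0.4, "low", 1.0)]},
    "high": {"wait": [(1.0, "high", 2.0)],
             "work": [(0.5, "high", 2.5), (0.5, "low", 2.5)]},
}

V0, pi = optimal_policy(TOY, horizon=2)
print(V0)  # approximately {'low': 2.9, 'high': 4.5}
print(pi)  # [{'low': 'work', 'high': 'wait'}, {'low': 'work', 'high': 'work'}]
```

Because the horizon is finite, the optimal policy may be non-stationary: in this toy example the best action in state `high` differs between the two steps. Running `evaluate_policy(TOY, pi, 2)` on the returned policy reproduces `V0`, which is a quick consistency check between the two routines.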
Acknowledgements
This work has been partly supported by the National Key R&D Program of China (No. 2018YFA0306701), the Australian Research Council (Nos. DP160101652 and DP180100691), the National Natural Science Foundation of China (No. 61832015) and the Key Research Program of Frontier Sciences, Chinese Academy of Sciences.
Author information
Additional information
Recommended by Associate Editor Jyh-Horng Chou
Colored figures are available in the online version at https://link.springer.com/journal/11633
Ming-Sheng Ying is a Distinguished Professor and Research Director of the Centre for Quantum Software and Information at the University of Technology Sydney, Australia. He is also Deputy Director for Research (adjunct position) at the Institute of Software, Chinese Academy of Sciences, and holds the Cheung Kong Chair Professorship at Tsinghua University, China. His books include Model Checking Quantum Systems: Principles and Algorithms (2021, with Yuan Feng), Foundations of Quantum Programming (2016) and Topology in Process Calculus: Approximate Correctness and Infinite Evolution of Concurrent Programs (2001). He received a China National Science Award in Natural Science (2008). He has served on the editorial boards of several journals, including Artificial Intelligence, and is currently Editor-in-Chief of ACM Transactions on Quantum Computing.
His research interests include quantum computation, theory of programming languages, and logics in AI.
Yuan Feng received the B.Sc. degree in mathematics from the Department of Applied Mathematics, Tsinghua University, China in 1999, and the Ph.D. degree in computer science from the Department of Computer Science and Technology, Tsinghua University, China in 2004. He is currently a professor at the Centre for Quantum Software and Information (QSI), University of Technology Sydney (UTS), Australia.
His research interests include quantum programming theory, quantum information and quantum computation, and probabilistic systems.
Sheng-Gang Ying received the B.Sc. degree in physics from the Department of Physics, Tsinghua University, China in 2010, and the Ph.D. degree in computer science from the Department of Computer Science and Technology, Tsinghua University, China in 2015. He is currently an associate researcher at the State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China.
His research interests include quantum programming theory and quantum Markov systems.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ying, MS., Feng, Y. & Ying, SG. Optimal Policies for Quantum Markov Decision Processes. Int. J. Autom. Comput. 18, 410–421 (2021). https://doi.org/10.1007/s11633-021-1278-z
DOI: https://doi.org/10.1007/s11633-021-1278-z