Taking Gradients Through Experiments: LSTMs and Memory Proximal Policy Optimization for Black-Box Quantum Control

  • Moritz August
  • José Miguel Hernández-Lobato
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11203)


In this work we introduce a general method for solving quantum control tasks, which we present as an interesting reinforcement learning problem not yet discussed in the machine learning community. We analyze the structure of the reinforcement learning problems that typically arise in quantum physics and argue that agents parameterized by long short-term memory (LSTM) networks and trained via stochastic policy gradients yield a versatile method for solving them. In this context we introduce a variant of the proximal policy optimization (PPO) algorithm, called memory proximal policy optimization (MPPO), which is based on this analysis. We argue that our method can, by design, be easily combined with numerical simulations as well as with real experiments, either of which provides the reward signal. We demonstrate how the method can incorporate physical domain knowledge and present results of numerical experiments showing that it achieves state-of-the-art performance on several learning tasks in quantum control with discrete and continuous control parameters.
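The idea of training a stochastic policy from a reward that is only available as a black-box number can be illustrated with a small sketch. This is not the MPPO algorithm and it does not use an LSTM: it is a plain score-function (REINFORCE-style) estimator with one independent Bernoulli logit per time step, and the function `black_box_reward`, its hidden target sequence, and all hyperparameters are hypothetical choices made only for illustration.

```python
import math
import random


def black_box_reward(actions):
    """Hypothetical stand-in for an experiment or a simulation.

    Returns the fraction of positions that match a hidden target sequence.
    The learner only ever sees this scalar, never the target itself.
    """
    target = [1, 0, 1, 1, 0, 1]
    return sum(a == t for a, t in zip(actions, target)) / len(target)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(steps=2000, batch=16, lr=0.5, seed=0):
    """Score-function policy gradient on a discrete control sequence."""
    rng = random.Random(seed)
    logits = [0.0] * 6  # assumption: memoryless policy, one logit per step
    baseline = 0.0      # running mean reward, used to reduce variance
    for _ in range(steps):
        probs = [sigmoid(l) for l in logits]
        grads = [0.0] * len(logits)
        rewards = []
        for _ in range(batch):
            # Sample a control sequence and query the black box once.
            actions = [1 if rng.random() < p else 0 for p in probs]
            r = black_box_reward(actions)
            rewards.append(r)
            # d/d(logit) of log Bernoulli(a | sigmoid(logit)) equals a - p,
            # so no gradient of the reward itself is ever required.
            for i, (a, p) in enumerate(zip(actions, probs)):
                grads[i] += (r - baseline) * (a - p) / batch
        logits = [l + lr * g for l, g in zip(logits, grads)]
        baseline = 0.9 * baseline + 0.1 * (sum(rewards) / batch)
    return [sigmoid(l) for l in logits]


if __name__ == "__main__":
    print([round(p, 3) for p in train()])
```

Because the update only needs sampled control sequences and their scalar rewards, the call to `black_box_reward` could in principle be replaced by a numerical simulation or a measurement on a real device without changing the learning loop.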


Reinforcement learning, Quantum control, Numerical simulation



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Moritz August (1)
  • José Miguel Hernández-Lobato (2)
  1. Department of Informatics, Technical University of Munich, Garching, Germany
  2. Computational and Biological Learning Lab, University of Cambridge, Cambridge, UK
