Abstract
One of the main concepts in quantum physics is a density matrix, which is a symmetric positive definite matrix of trace one. Finite probability distributions can be seen as a special case when the density matrix is restricted to be diagonal.
We develop a probability calculus based on these more general distributions that includes definitions of joints, conditionals and formulas that relate these, including analogs of the Theorem of Total Probability and various Bayes rules for the calculation of posterior density matrices. The resulting calculus parallels the familiar “conventional” probability calculus and always retains the latter as a special case when all matrices are diagonal. We motivate both the conventional and the generalized Bayes rule with a minimum relative entropy principle, where the Kullback-Leibler version gives the conventional Bayes rule and Umegaki’s quantum relative entropy gives the new Bayes rule for density matrices.
Whereas conventional Bayesian methods maintain uncertainty about which model has the highest data likelihood, the generalization maintains uncertainty about which unit direction has the largest variance. Surprisingly, the bounds also generalize: as in the conventional setting, we upper bound the negative log likelihood of the data by the negative log likelihood of the MAP estimator.
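The generalized Bayes rule described in the abstract can be sketched numerically. The sketch below assumes the update takes the form posterior ∝ exp(log prior + log likelihood-matrix), normalized to trace one — the form the matrix rule takes in the authors' earlier work "Bayes rule for density matrices" (NIPS'05). The function name `bayes_update_density` and the example matrices are illustrative, not from the paper; when both matrices are diagonal, the update reduces to the conventional Bayes rule, which the example checks.

```python
import numpy as np
from scipy.linalg import expm, logm

def bayes_update_density(prior, likelihood):
    """Sketch of the generalized Bayes rule for density matrices:
    posterior proportional to exp(log prior + log likelihood),
    normalized to trace one. Both arguments are assumed to be
    real symmetric positive definite matrices."""
    M = expm(logm(prior) + logm(likelihood))
    return M / np.trace(M)

# Diagonal case: the matrix rule recovers the conventional Bayes rule.
prior = np.diag([0.5, 0.3, 0.2])   # prior probabilities on three models
lik = np.diag([0.9, 0.5, 0.1])     # per-model data likelihoods
post = bayes_update_density(prior, lik)

conventional = prior.diagonal() * lik.diagonal()
conventional = conventional / conventional.sum()
assert np.allclose(post.diagonal(), conventional)
```

When the prior and the likelihood matrix do not commute, the matrix logarithms no longer add eigenvalue-wise, and the posterior genuinely differs from any per-coordinate update; the diagonal case above is exactly the sense in which the calculus "retains the conventional one as a special case."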
Editor: Nicolo Cesa-Bianchi.
Supported by NSF grant IIS 0325363. Some of this work was done while visiting National ICT Australia in Canberra.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Warmuth, M.K., Kuzmin, D. Bayesian generalized probability calculus for density matrices. Mach Learn 78, 63–101 (2010). https://doi.org/10.1007/s10994-009-5133-7