Semidefinite Approximations of the Matrix Logarithm

Abstract

The matrix logarithm, when applied to Hermitian positive definite matrices, is concave with respect to the positive semidefinite order. This operator concavity property leads to numerous concavity and convexity results for other matrix functions, many of which are of importance in quantum information theory. In this paper we show how to approximate the matrix logarithm with functions that preserve operator concavity and can be described using the feasible regions of semidefinite optimization problems of fairly small size. Such approximations allow us to use off-the-shelf semidefinite optimization solvers for convex optimization problems involving the matrix logarithm and related functions, such as the quantum relative entropy. The basic ingredients of our approach apply, beyond the matrix logarithm, to functions that are operator concave and operator monotone. As such, we introduce strategies for constructing semidefinite approximations that we expect will be useful, more generally, for studying the approximation power of functions with small semidefinite representations.

This is a preview of subscription content, log in to check access.

Fig. 1

Notes

  1. 1.

    We define \(D_{\text {op}}\) as \(D_{\text {op}}(X\Vert Y) = -P_{\log }(Y,X)\) to match the conventional order of arguments in information theory.

  2. 2.

    This last requirement is to ensure uniqueness.

References

  1. 1.

    Al-Mohy, A.H., Higham, N.J.: Improved inverse scaling and squaring algorithms for the matrix logarithm. SIAM J. Sci. Comput. 34(4), C153–C169 (2012)

    MathSciNet  Article  MATH  Google Scholar 

  2. 2.

    ApS, M.: The MOSEK optimization toolbox for MATLAB manual. Version 7.1 (Revision 28). (2015). http://docs.mosek.com/7.1/toolbox/index.html

  3. 3.

    Ben-Tal, A., Nemirovski, A.: Lectures on modern convex optimization: analysis, algorithms, and engineering applications. SIAM (2001)

  4. 4.

    Ben-Tal, A., Nemirovski, A.: On polyhedral approximations of the second-order cone. Math. Oper. Res. 26(2), 193–205 (2001)

    MathSciNet  Article  MATH  Google Scholar 

  5. 5.

    Besenyei, A., Petz, D.: Successive iterations and logarithmic means. Oper. and Matrices 7(1), 205–218 (2013). https://doi.org/10.7153/oam-07-12

  6. 6.

    Bhatia, R.: Positive definite matrices. Princeton University Press (2009)

  7. 7.

    Bhatia, R.: Matrix analysis, vol. 169. Springer Science & Business Media (2013)

  8. 8.

    Blekherman, G., Parrilo, P.A., Thomas, R.R.: Semidefinite optimization and convex algebraic geometry. SIAM (2013)

  9. 9.

    Boyd, S., Kim, S.J., Vandenberghe, L., Hassibi, A.: A tutorial on geometric programming. Optim. Eng. 8(1), 67–127 (2007)

    MathSciNet  Article  MATH  Google Scholar 

  10. 10.

    Bushell, P.J.: Hilbert’s metric and positive contraction mappings in a Banach space. Arch. Ration. Mech. Anal. 52(4), 330–338 (1973)

    MathSciNet  Article  MATH  Google Scholar 

  11. 11.

    Carlen, E.A.: Trace inequalities and quantum entropy. An introductory course. In: Entropy and the quantum, vol. 529, pp. 73–140. AMS (2010)

  12. 12.

    Carlson, B.C.: An algorithm for computing logarithms and arctangents. Math. Comp. 26(118), 543–549 (1972)

    MathSciNet  Article  MATH  Google Scholar 

  13. 13.

    Cox, D.A.: The arithmetic-geometric mean of Gauss. In: Pi: A source book, pp. 481–536. Springer (2004)

  14. 14.

    Dieci, L., Morini, B., Papini, A.: Computational techniques for real logarithms of matrices. SIAM J. Matrix Anal. Appl. 17(3), 570–593 (1996)

    MathSciNet  Article  MATH  Google Scholar 

  15. 15.

    Domahidi, A., Chu, E., Boyd, S.: ECOS: An SOCP solver for embedded systems. In: European Control Conference (ECC), pp. 3071–3076 (2013)

  16. 16.

    Ebadian, A., Nikoufar, I., Gordji, M.E.: Perspectives of matrix convex functions. Proc. Natl. Acad. Sci. USA 108(18), 7313–7314 (2011)

    MathSciNet  Article  MATH  Google Scholar 

  17. 17.

    Effros, E., Hansen, F.: Non-commutative perspectives. Ann. Funct. Anal 5(2), 74–79 (2014)

    MathSciNet  Article  MATH  Google Scholar 

  18. 18.

    Effros, E.G.: A matrix convexity approach to some celebrated quantum inequalities. Proc. Natl. Acad. Sci. USA 106(4), 1006–1008 (2009)

    MathSciNet  Article  MATH  Google Scholar 

  19. 19.

    Fawzi, H., Fawzi, O.: Relative entropy optimization in quantum information theory via semidefinite programming approximations. arXiv preprint arXiv:1705.06671 (2017)

  20. 20.

    Fawzi, H., Saunderson, J.: Lieb’s concavity theorem, matrix geometric means, and semidefinite optimization. Linear Algebra Appl. 513, 240–263 (2017)

    MathSciNet  Article  MATH  Google Scholar 

  21. 21.

    Fujii, J., Kamei, E.: Relative operator entropy in noncommutative information theory. Math. Japon 34, 341–348 (1989)

    MathSciNet  MATH  Google Scholar 

  22. 22.

    Glineur, F.: Quadratic approximation of some convex optimization problems using the arithmetic-geometric mean iteration. Talk at the “Workshop GeoLMI on the geometry and algebra of linear matrix inequalities”. http://homepages.laas.fr/henrion/geolmi/geolmi-glineur.pdf (retrieved November 2, 2016) (2009)

  23. 23.

    Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (2014)

  24. 24.

    Hansen, F., Pedersen, G.K.: Jensen’s inequality for operators and Löwner’s theorem. Math. Ann. 258(3), 229–241 (1982)

    MathSciNet  Article  MATH  Google Scholar 

  25. 25.

    Helton, J.W., Klep, I., McCullough, S.: The tracial Hahn–Banach theorem, polar duals, matrix convex sets, and projections of free spectrahedra. J. Eur. Math. Soc. (JEMS) 19(6), 1845–1897 (2017)

    MathSciNet  Article  MATH  Google Scholar 

  26. 26.

    Higham, N.J.: Functions of matrices: theory and computation. SIAM (2008)

  27. 27.

    Kenney, C., Laub, A.J.: Condition estimates for matrix functions. SIAM J. Matrix Anal. Appl. 10(2), 191–209 (1989)

    MathSciNet  Article  MATH  Google Scholar 

  28. 28.

    Lieb, E.H.: Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv. Math. 11(3), 267–288 (1973)

    MathSciNet  Article  MATH  Google Scholar 

  29. 29.

    Lieb, E.H., Ruskai, M.B.: Proof of the strong subadditivity of quantum mechanical entropy. J. Math. Phys. 14(12), 1938–1941 (1973)

    MathSciNet  Article  Google Scholar 

  30. 30.

    Meurant, G., Sommariva, A.: Fast variants of the Golub and Welsch algorithm for symmetric weight functions in Matlab. Numerical Algorithms 67(3), 491–506 (2014)

    MathSciNet  Article  MATH  Google Scholar 

  31. 31.

    Nesterov, Y.E.: Constructing self-concordant barriers for convex cones. CORE Discussion Paper (2006/30) (2006)

  32. 32.

    Sagnol, G.: On the semidefinite representation of real functions applied to symmetric matrices. Linear Algebra Appl. 439(10), 2829–2843 (2013)

    MathSciNet  Article  MATH  Google Scholar 

  33. 33.

    Serrano, S.A.: Algorithms for unsymmetric cone optimization and an implementation for problems with the exponential cone. Ph.D. thesis, Stanford University (2015)

  34. 34.

    Skajaa, A., Ye, Y.: A homogeneous interior-point algorithm for nonsymmetric convex conic optimization. Math. Program. 150(2), 391–422 (2015)

    MathSciNet  Article  MATH  Google Scholar 

  35. 35.

    Stoer, J., Bulirsch, R.: Introduction to numerical analysis, vol. 12, 3 edn. Springer-Verlag, New York (2002)

    Google Scholar 

  36. 36.

    Trefethen, L.N.: Is Gauss quadrature better than Clenshaw-Curtis? SIAM Rev. 50(1), 67–87 (2008)

    MathSciNet  Article  MATH  Google Scholar 

  37. 37.

    Trefethen, L.N.: Approximation theory and approximation practice. SIAM (2013)

  38. 38.

    Tropp, J.A.: From joint convexity of quantum relative entropy to a concavity theorem of Lieb. Proc. Amer. Math. Soc. 140(5), 1757–1760 (2012)

    MathSciNet  Article  MATH  Google Scholar 

  39. 39.

    Tropp, J.A.: An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8(1-2), 1–230 (2015)

    Article  MATH  Google Scholar 

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to James Saunderson.

Additional information

This work was done in part while Pablo Parrilo was visiting the Simons Institute for the Theory of Computing. It was partially supported by the DIMACS/Simons Collaboration on Bridging Continuous and Discrete Optimization through NSF Grant #CCF-1740425. This work was also supported in part by the Air Force Office of Scientific Research through AFOSR Grants FA9550-11-1-0305 and FA9550-12-1-0287, and in part by the National Science Foundation through NSF Grant #CCF-1565235.

Communicated by James Renegar.

Appendices

Background on Approximation Theory

Gaussian quadrature A quadrature rule is a method of approximating an integral with a weighted sum of evaluations of the integrand. A quadrature rule is determined by the evaluation points, called nodes, and the weights of the weighted sum. Given a measure \(\nu \) supported on \([-1,1]\), a quadrature rule gives an approximation of the form

$$\begin{aligned} \int _{-1}^{1}h(t)\;\mathrm{d} \nu (t) \approx \sum _{j=1}^{m} w_jh(t_j) \end{aligned}$$
(37)

where the \(t_j\in [-1,1]\) are the nodes and the \(w_j\) are the weights. A Gaussian quadrature is a choice of nodes \(t_1,\ldots ,t_m\in (-1,1)\) and positive weights \(w_1,\ldots ,w_m\) that integrates all polynomials of degree at most \(2m-1\)exactly. For example, when \(\nu \) is the uniform measure on \([-1,1]\), such a quadrature rule is known as Gauss–Legendre quadrature, and the nodes and weights can be computed for example by an eigenvalue decomposition of the associated Jacobi matrix, see e.g., [36, Sect. 2].

Padé approximants Padé approximants are approximations of a given univariate function, analytic at a point \(x_0\), by rational functions. More precisely, the (mn)-Padé approximant of h at \(x_0\) is the rational function p(x) / q(x) such that p is a polynomial of degree mq is a polynomial of degree n, and the Taylor series expansion of the error at \(x_0\) is of the form

$$\begin{aligned} h(x) - \frac{p(x)}{q(x)} = (x-x_0)^{s}\sum _{k\ge 0}a_k(x-x_0)^k\end{aligned}$$

for real numbers \(a_k\) and the largest possible positive integerFootnote 2s. Expressed differently, p and q are chosen so that the Taylor series of p(x) / q(x) at \(x_0\) matches as many Taylor series coefficients of h at \(x_0\) as possible (and at least the first \(m+n+1\) coefficients).

Properties and Error Bounds of Gaussian Quadrature-Based Approximations

Assume \(g:\mathbb {R}_{++}\rightarrow \mathbb {R}\) is a function with an integral representation

$$\begin{aligned} g(x) = g(1) + g'(1)\int _{0}^1 f_t(x) \mathrm{d}\nu (t) \end{aligned}$$
(38)

where \(\nu \) is a probability measure on [0, 1] and \(f_t(x) = \frac{x-1}{1+t(x-1)}\). The case \(g=\log \) corresponds to \(\nu \) being the Lebesgue measure on [0, 1]. In this appendix we show that the rational approximation obtained by applying Gaussian quadrature on (38) coincides with the Padé approximant of g at \(x=1\). We also derive error bounds on the quality of this rational approximation. Note that functions of the form (38) are precisely operator monotone functions, by Löwner’s theorem (see Sect. 4.1).

Let \(r_m\) be the rational approximant obtained by using Gaussian quadrature on (38):

$$\begin{aligned} r_m(x) = g(1) + g'(1)\sum _{i=1}^m w_i f_{t_i}(x) \end{aligned}$$
(39)

where \(w_i > 0, t_i \in [0,1]\) are the Gaussian quadrature weights and nodes for the measure \(\nu \).

Connection with Padé Approximant

We first show that the function \(r_m\) coincides with the Padé approximant of g at \(x=1\). The special case \(g =\log \) was established in [14, Theorem 4.3].

Proposition 5

Assume \(g:\mathbb {R}_{++}\rightarrow \mathbb {R}\) has the form (38) and let \(r_m\) be the rational approximation obtained via Gaussian quadrature as in (39). Then \(r_m\) is the (mm) Padé approximant of g at \(x=1\).

Proof

First we note that \(f_t(x)\) admits the following series expansion, valid for \(|x-1|<\frac{1}{|t|}\):

$$\begin{aligned} f_t(x) =&\frac{x-1}{t(x-1)+1} = (x-1)\sum _{k=0}^{\infty } (-1)^kt^k(x-1)^k. \end{aligned}$$

Let \(\nu _m = \sum _{i=1}^m w_i \delta _{t_i}\) be the atomic measure on [0, 1] corresponding to Gaussian quadrature applied to \(\nu \). By definition of Gaussian quadrature, \(\nu _m\) matches all moments of \(\nu \) up to degree \(2m-1\), i.e., \(\int _{0}^{1}p(t)\;\mathrm{d}\nu (t) = \int _{0}^{1}p(t)\;\mathrm{d}\nu _m(t)\) for all polynomials p of degree at most \(2m-1\). It thus follows that

$$\begin{aligned} g(x) - r_m(x)= g'(1)(x-1) \sum _{k=2m}^{\infty } (-1)^k(x-1)^{k}\left[ \int _{0}^{1}t^k\; d\nu (t) - \int _{0}^{1}t^k\; d\nu _m(t)\right] ,\end{aligned}$$

establishing that \(r_m\) matches the first 2m Taylor series coefficients of g at \(x=1\). Since \(r_m\) has numerator and denominator degree m, it is the (mm)-Padé approximant of g at \(x=1\). \(\square \)

Error Bounds

In this section we derive an error bound on the approximation quality \(|g(x) - r_m(x)|\). To do this we use standard methods as described, e.g., in [37]. This error is essentially controlled by the decay of the Chebyshev coefficients of the integrand. For the rational functions \(f_t\) one can compute these coefficients exactly.

Quadrature Error Bounds for Operator Monotone Functions

To appeal to standard arguments, it is easiest to rewrite the integrals of interest over the interval \([-1,1]\) by the transformation \(t\mapsto 1-2t\) mapping [0, 1] to \([-1,1]\). To this end, let

$$\begin{aligned} \tilde{f}_t(x) := f_{\frac{1-t}{2}}(x)= \frac{2}{\frac{x+1}{x-1}-t}.\end{aligned}$$

Let \(T_k(t)\) denote the kth Chebyshev polynomial. We start by explicitly computing the Chebyshev expansion of \(\tilde{f}_t(x)\) for fixed x, i.e., we find the coefficients \(a_k(x)\) of \(\tilde{f}_t(x) = \sum _{k=0}^{\infty }a_k(x)T_k(t)\). To do this, we first define \(h_\rho (t) = \frac{2}{(\rho +\rho ^{-1})/2 - t}\) and observe that with the substitution \(\rho = \frac{\sqrt{x}-1}{\sqrt{x}+1}\) we have that \(\tilde{f}_t(x) = h_\rho (t)\) and that \(x>0\) if and only if \(-1<\rho <1\). We can compute the Chebyshev expansion of \(h_\rho (t)\) by observing that the generating function of Chebyshev polynomials is (see e.g., [37, Exercise 3.14])

$$\begin{aligned} \sum _{k=0}^{\infty }\rho ^kT_k(t) = \frac{1-\rho t}{1-2\rho t + \rho ^2} = \frac{1}{2} + \frac{\rho ^{-1}-\rho }{8}h_\rho (t).\end{aligned}$$

It then follows that the Chebyshev expansion of \(h_\rho (t)\) is

$$\begin{aligned} h_\rho (t) = \frac{2}{(\rho +\rho ^{-1})/2 - t} = \frac{8}{\rho ^{-1}-\rho }\left[ \frac{1}{2} + \sum _{k=1}^{\infty } \rho ^k T_k(t)\right] . \end{aligned}$$
(40)

Since \(\frac{8}{\rho ^{-1}-\rho } = 2(\sqrt{x} - 1/\sqrt{x})\), the Chebyshev expansion of \(\tilde{f}_t(x)\) is

$$\begin{aligned} \tilde{f}_t(x) = \frac{2}{\frac{x+1}{x-1}-t} = 2\left( \sqrt{x} - 1/\sqrt{x}\right) \left[ \frac{1}{2} + \sum _{k=1}^{\infty } \left( \frac{\sqrt{x}-1}{\sqrt{x}+1}\right) ^k T_k(t)\right] . \end{aligned}$$
(41)

We are now ready to state an error bound on the approximation quality \(|g(x) - r_m(x)|\). Our arguments are standard and follow closely the ideas described in [37].

Proposition 6

Let \(g:\mathbb {R}_{++}\rightarrow \mathbb {R}\) be a function with an integral representation (38) and let \(r_m\) be the rational approximation obtained by applying Gaussian quadrature as in (39). If \(m\ge 1\) and \(x>0\) then

$$\begin{aligned} \left| g(x) - r_m(x)\right| \le 4g'(1)|\sqrt{x}-1/\sqrt{x}|\frac{\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m}}{1-\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| }. \end{aligned}$$
(42)

If \(\nu \) is invariant under the map \(t\mapsto 1-t\) (i.e., \(g(x^{-1}) = -g(x)\)) then this can be improved to

$$\begin{aligned} \left| g(x) - r_m(x)\right| \le g'(1)\left| \sqrt{x} - 1/\sqrt{x}\right| ^2\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m-1}. \end{aligned}$$
(43)

Finally, \(r_m(x) \ge g(x)\) for all \(0<x\le 1\) and \(r_m(x) \le g(x)\) for all \(x\ge 1\).

Proof

Let \(\tilde{\nu }\) be the measure on \([-1,1]\) obtained from \(\nu \) by changing variables \(t \in [0,1] \mapsto 1-2t \in [-1,1]\) so that \(g(x) = g(0) + g'(1) \int _{-1}^1 \tilde{f_t}(x) \mathrm{d}\tilde{\nu }(t)\). Let \(\tilde{\nu }_m\) be the atomic measure supported on m points obtained by applying Gaussian quadrature on \(\nu \). Finally let the Chebyshev expansion of \(\tilde{f}_t(x)\) be \(\sum _{k=0}^\infty a_k(x)T_k(t)\). Since \(\int _{-1}^{1}T_k(t)\;\mathrm{d}\tilde{\nu }(t) = \int _{-1}^{1}T_k(t)\;\mathrm{d}\tilde{\nu }_m(t)\) for \(k\le 2m-1\),

$$\begin{aligned} \left| g(x) - r_m(x)\right| = g'(1)\left| \sum _{k=2m}^{\infty }a_k(x)\left[ \int _{-1}^{1}T_k(t)\;\mathrm{d}\tilde{\nu }(t) - \int _{-1}^{1}T_k(t)\;\mathrm{d}\tilde{\nu }_m(t)\right] \right| .\end{aligned}$$

For \(k\ge 2\), we have that \(a_k(x) = 2(\sqrt{x}-1/\sqrt{x})\left( \frac{\sqrt{x}-1}{\sqrt{x}+1}\right) ^k\) [see (41)]. So using the fact that \(\tilde{\nu }\) and \(\tilde{\nu }_m\) are probability measures (when \(m\ge 1\)), together with the fact that \(|T_k(t)|\le 1\) for \(t\in [-1,1]\), the triangle inequality gives

$$\begin{aligned} \left| g(x) - r_m(x)\right|\le & {} 4g'(1)|\sqrt{x}-1/\sqrt{x}|\sum _{k=2m}^{\infty }\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^k\\= & {} 4g'(1)|\sqrt{x}-1/\sqrt{x}|\frac{\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m}}{1-\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| }. \end{aligned}$$

If the measure \(\nu \) is invariant under the map \(t\mapsto -t\) then the same is true of \(\nu _m\) (see, e.g., [30]). Since \(\tilde{f}_t(x^{-1}) = -\tilde{f}_{-t}(x)\) it follows that \(r_m(x^{-1}) = -r_m(x)\). Furthermore,

$$\begin{aligned} \int _{-1}^{1}T_{2k+1}(t)\;\mathrm{d}\tilde{\nu }(t) = \int _{-1}^{1}T_{2k+1}(t)\;\mathrm{d}\tilde{\nu }_m(t) = 0\qquad \text {for all nonnegative integers}\,k\end{aligned}$$

because Chebyshev polynomials of odd degree are odd functions. In this case only the even Chebyshev coefficients contribute to the error bound so

$$\begin{aligned} \left| g(x) - r_m(x)\right|\le & {} 4g'(1)|\sqrt{x}-1/\sqrt{x}|\sum _{k=m}^{\infty }\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2k}\\= & {} g'(1)|\sqrt{x}-1/\sqrt{x}|^2\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m-1}. \end{aligned}$$

To establish inequalities between \(r_m(x)\) and g(x), we use an alternative formula for the error obtained by approximating an integral via Gaussian quadrature. Since \(t\mapsto \tilde{f}_t(x)\) has derivatives of all orders, one can show (see, e.g., [35, Theorem 3.6.24]) that there exists \(\tau \in [-1,1]\) and \(\kappa \ge 0\) such that

$$\begin{aligned} g(x) - r_m(x) = \frac{\kappa }{(2m){!}} \frac{\partial ^{2m}}{\partial t^{2m}}\tilde{f}_\tau (x) = \frac{\kappa }{\left( \frac{x+1}{x-1}-\tau \right) ^{2m+1}}.\end{aligned}$$

If \(x\in (0,1)\) then \(\frac{1+x}{1-x}-\tau <0\) for all \(\tau \in [-1,1]\) and so \(g(x)-r_m(x) < 0\). If \(x\in (1,\infty )\) then \(\frac{1+x}{1-x}-\tau > 0\) for all \(\tau \in [-1,1]\) and so \(g(x) - r_m(x) > 0\). If \(x=1\) then \(g(x) = r_m(x)\). \(\square \)

Very similar bounds hold for the error between \(g:\mathbb {R}_{++}\rightarrow \mathbb {R}_{++}\), a positive operator monotone function, and \(r_{m}^+\), the rational approximation obtained by applying Gaussian quadrature to the integral representation in (25). Indeed, if \(m\ge 1\) and \(x> 0\),

$$\begin{aligned} \left| g(x) - r_m^+(x)\right| \le 4(g(1)-g(0))\sqrt{x}\frac{\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m}}{1-\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| }. \end{aligned}$$
(44)

We omit the proof, since it follows the same basic argument as the proof of (42), together with the observation that \(f_t^+(x) = \frac{x}{x-1}f_t(x)\).

The Special Case of Log: Proofs of Proposition 1 and Theorem 1

Proof (of Proposition 1)

The function \(g(x) = \log (x)\) has an integral representation (38) where the measure \(\nu \) is the uniform measure on [0, 1], which is invariant under the map \(t\mapsto 1-t\). Proposition 6 tells us that, for any \(x>0\),

$$\begin{aligned} |\log (x) - r_m(x)| \le |\sqrt{x}-1/\sqrt{x}|^2\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m-1}. \end{aligned}$$
(45)

The error between \(\log (x) = 2^k\log (x^{1/2^k})\) and \(r_{m,k}(x) = 2^kr_m(x^{1/2^k})\) can be obtained by evaluating at \(x^{1/2^k}\) and scaling by \(2^k\) to obtain

$$\begin{aligned} |\log (x) - r_{m,k}(x)| \le 2^k|\sqrt{\kappa }-1/\sqrt{\kappa }|^2\left| \frac{\sqrt{\kappa }-1}{\sqrt{\kappa }+1}\right| ^{2m-1}. \end{aligned}$$
(46)

where \(\kappa = x^{1/2^k}\). By using the fact that

$$\begin{aligned} \left| \frac{\sqrt{\kappa } - 1}{\sqrt{\kappa }+1}\right| = \left| \tanh \left( \frac{1}{4}\log (\kappa )\right) \right| \le \frac{1}{2^{k+2}}|\log (x)|\end{aligned}$$

we can write this as a bound on relative error as

$$\begin{aligned} |\log (x) - r_{m,k}(x)| \le |\log (x)|\left| \frac{\sqrt{\kappa } - 1/\sqrt{\kappa }}{2}\right| ^2\left| \frac{\sqrt{\kappa }-1}{\sqrt{\kappa }+1}\right| ^{2(m-1)}.\end{aligned}$$

Asymptotic behavior of (46): Since \(\kappa = x^{1/2^k} = e^{2^{-k}\log (x)}\), we can rewrite the right-hand side of (46) as

$$\begin{aligned} 2^k \left[ 2\sinh (2^{-(k+1)}\log (x))\right] ^2 \left[ \tanh (2^{-(k+2)}\log (x))\right] ^{2m-1}.\end{aligned}$$

Since \(\sinh ^2(2x)\tanh ^{2m-1}(x) = 4x^{2m+1} + O(x^{2m+3})\), we have that

$$\begin{aligned} 2^k \left[ 2\sinh (2^{-(k+1)}\log (x))\right] ^2 \left[ \tanh (2^{-(k+2)}\log (x))\right] ^{2m-1}\asymp 4\cdot 4^{-m(k+2)}\log (x)^{2m+1}\end{aligned}$$

as \(k\rightarrow \infty \). \(\square \)

Proof (of Theorem 1)

The function r is chosen to be of the form \(r_{m,k}\) for certain m and k. In particular we can choose \(k = k_1+k_2\), with \(k_1 = \lceil \log _2\log _e(a)\rceil +1\), \(k_2\) being the smallest even integer larger than \(\sqrt{\log _2(32\log _e(a)/\epsilon )}\), and with \(m=k_2/2\). The function \(r_{m,k}\) has a semidefinite representation of size \(m+k\) (as a special case of Theorem 3 in Sect. 3), which is \(O(\sqrt{\log _e(1/\epsilon )})\) for fixed a. It remains to establish the error bound. To do so, we first note that \(x^{1/2^{k_1}} < 1\) for all \(x\in [1/a,a]\). Then, for all \(x\in [1/a,a]\),

$$\begin{aligned} \begin{aligned} |r_{m,k}(x) - \log (x)|&\le 2^k|x^{1/2^{k+1}} - x^{-1/2^{k+1}}|^2\left| \frac{x^{1/2^{k+1}} - 1}{x^{1/2^{k+1}}+1}\right| ^{2m-1}\\ {}&= 2^{k+2}\sinh ^{2}\left( \frac{1}{2^{k_2+1}}\log _e(x^{1/2^{k_1}})\right) \\ {}&\times \tanh ^{2m-1}\left( \frac{1}{2^{k_2+2}}\log _e(x^{1/2^{k_1}})\right) \\ {}&\le 8\cdot 2^{k_1-1}2^{k_2}\sinh ^2(1/2^{k_2+1}) \tanh ^{2m-1}(1/2^{k_2+2})\\ {}&\le 8 \log _e(a) 2^{k_2} 2^{-(k_2+2)(2m-1)}\\ {}&\le \epsilon . \end{aligned} \end{aligned}$$

Here, the second last equality holds because \(\sinh (1/2)^2 \le 1,\tanh (x)\le x\) for all \(x\ge 0\), and \(2^{k_1-1} \le \log _e(a)\) (by our choice of \(k_1\)). The last inequality holds by our choice of m and \(k_2\). \(\square \)

Proof of Theorem 7

Proof (of Theorem 7)

The function r is of the form \(r_{m,k}\) defined by (29) for particular values of the parameters m and k. Throughout the proof, for convenience of notation, let \((x_k,y_k) = \varPhi ^{(k)}(x,y)\) for \(k\ge 0\). The error bound (44) of “Appendix B.2” shows that for any \(x,y > 0\):

$$\begin{aligned} |yr_{m,k}(x/y) - yg(x/y)|&= |y_kr_m^+(x_k/y_k) - y_kg(x_k/y_k)|\nonumber \\&\le 4(g(1)-g(0))\sqrt{x_ky_k}\frac{\left| \frac{\sqrt{x_k}-\sqrt{y_k}}{\sqrt{x_k}+\sqrt{y_k}}\right| ^{2m}}{1-\left| \frac{\sqrt{x_k}-\sqrt{y_k}}{\sqrt{x_k}+\sqrt{y_k}}\right| } \nonumber \\&= 4(g(1)-g(0)) \sqrt{x_k y_k} \frac{\left| \tanh \left( \frac{1}{4}\log (x_k/y_k)\right) \right| ^{2m}}{1-\left| \tanh \left( \frac{1}{4}\log (x_k/y_k)\right) \right| }. \end{aligned}$$
(47)

We will show that if \(\varPhi \) has the linear contraction property (31) then the bound (47) decays like \(O(c^{-k^2})\) for the choice of \(m \approx k\) (that we make precise later). To establish this, we need to bound two terms: first, if (31) holds then \(\log (x_k / y_k) = O(c^{-k/2})\) and so the numerator in (47) converges like \(O(c^{-km})\) as we want. The second term that we need to control is \(\sqrt{x_k y_k}\) and one can show that this term grows at most linearly. This is proved in the following lemma:

Lemma 1

There is a constant \(b > 0\) such that for any \(x,y > 0\) satisfying \(a^{-1} \le x/y \le a\) we have

$$\begin{aligned} \sqrt{x_k y_k} \le yb^k (1+a)/2 \end{aligned}$$
(48)

where \((x_k,y_k) = \varPhi ^{(k)}(x,y)\).

Proof

Since \(h_1\) and \(h_2\) are concave, they are each bounded above by their linear approximation at \(x=1\). As such, \(P_{h_i}(x,y) \le h_i'(1)(x-y) + h_i(1)y\) for all \(x,y\in \mathbb {R}_{++}\) and \(i=1,2\). Summing these two inequalities we see that

$$\begin{aligned} P_{h_1}(x,y)+P_{h_2}(x,y)\le & {} \left[ h_1'(1)+h_2'(1)\right] x \\&+ \left[ h_1(1)+h_2(1)-(h_1'(1) +h_2'(1))\right] y.\end{aligned}$$

Because \(h_1\) and \(h_2\) take positive values, \(h_1(1) \ge h_1(1)-h_1'(1)\ge 0\) and \(h_2(1) \ge h_2(1) - h_2'(1)\ge 0\). As such, if \(b = \max \{h_1'(1)+h_2'(1),h_1(1)+h_2(1) - (h_1'(1)+h_2'(1))\}\), then \(P_{h_1}(x,y)+P_{h_2}(x,y) \le b(x+y)\) for all \(x,y\in \mathbb {R}_{++}\). It then follows that \(x_k + y_k \le b^k(x+y)\) for all \(x,y\in \mathbb {R}_{++}\) and so that

$$\begin{aligned} \sqrt{x_ky_k} \le (x_k+y_k)/2 \le y b^k(1+a)/2 \end{aligned}$$

as desired. \(\square \)

Plugging (48) in (47) gives us, for any \(a^{-1} \le x/y \le a\):

$$\begin{aligned} |r_{m,k}(x/y) - g(x/y)|\le & {} 2(g(1) - g(0)) (1+a) b^k\frac{\left| \tanh \left( \frac{1}{4}\log (x_k/y_k)\right) \right| }{1-\left| \tanh \left( \frac{1}{4}\log (x_k/y_k)\right) \right| }.\nonumber \\ \end{aligned}$$
(49)

Choose k to be the smallest even integer satisfying

$$\begin{aligned} k \ge \max \left\{ 2\log _c\log (a), \sqrt{\log _c\left( \frac{8(g(1)-g(0))(1+a)}{3\epsilon }\right) }\right\} \end{aligned}$$

and m to be the smallest integer satisfying \(m \ge k\max \{1,\frac{\log (b)}{\log (16)}\}\). Note that both m and k are \(O(\sqrt{\log _c(1/\epsilon )})\) when we treat a and b as constants. With these choices, and assumption (31), we have that \(b^k16^{-m} \le 1\) and

$$\begin{aligned}&|\log (x_{k/2}/y_{k/2})|\le {} c^{-k/2}\log (a) \le 1\quad \text{ and }\quad \\&|\log (x_k/y_k)| \le {} c^{-k/2}|\log (x_{k/2}/y_{k/2})| \le c^{-k/2}.\end{aligned}$$

Using the inequality \(|\tanh (z)| \le |z|\) for all z, and setting \(y=1\) in the error bound (49), we have that

$$\begin{aligned} |r_{m,k}(x) - g(x)|\le & {} 2(g(1)-g(0))(1+a)\frac{b^kc^{-km}16^{-m}}{1-1/4} \\= & {} \frac{8(g(1)-g(0))(1+a)}{3} c^{-k^2} \le \epsilon .\end{aligned}$$

The size of the semidefinite representation of \(r = r_{m,k}\) is \(O(m+k)\), if we view the size of the semidefinite representations of \(h_1\) and \(h_2\) as being constant. Since \(m,k\in O(\sqrt{\log _c(1/\epsilon )})\) it follows that the size of the semidefinite representation of r is also \(O(\sqrt{\log _c(1/\epsilon )})\).

In the case where assumption (32) also holds, we choose m (respectively, k) to be the smallest integer (respectively, even integer) satisfying

$$\begin{aligned} k \ge \max \left\{ 2\log _c\log (a),2\log _2\log _{c_0}\left( \frac{8(g(1)-g(0))(1+a)}{ 3\epsilon }\right) \right\} \end{aligned}$$

and \(m\ge \max \left\{ 1,\frac{k\log (b)}{\log (16/c_0)}\right\} \). Note that both m and k are \(O(\log _{2}\log _{c_0}(1/\epsilon ))\). With these choices, and assumptions (31) and (32), we have that \(c_0^mb^k16^{-m} \le 1\) and

$$\begin{aligned} |\log (x_{k/2}/y_{k/2})|&\le c^{-k/2}\log (a)\le 1\quad \text {and}\\ |\log (x_k/y_k)|&\le c_0^{-(2^{k/2}-1)}|\log (x_{k/2}/y_{k/2})|^{2^{k/2}} \le c_0^{-(2^{k/2}-1)}. \end{aligned}$$

Using the inequality \(|\tanh (z)|\le |z|\) for all z, and putting \(y=1\) in the error bound (49), we obtain

$$\begin{aligned} |r_{m,k}(x) - g(x)|&\le 2(g(1)-g(0))(1+a)\frac{b^k c_0^{-(2^{k/2}-1)m}16^{-m}}{1-1/4}\\&= \frac{8(g(1)-g(0))(1+a)}{3} c_0^{-2^{k/2}} \le \epsilon . \end{aligned}$$

\(\square \)

Semidefinite Description of \(f_t\)

In this section we establish the linear matrix inequality characterization of \(f_t\) given in Proposition 2. We use the fact that if \(t\in (0,1]\) then

$$\begin{aligned} f_t(X)= & {} (X-I)\left[ t(X-I)+I\right] ^{-1} \nonumber \\= & {} (I/t) - (I/t)\left[ (X-I) + (I/t)\right] ^{-1}(I/t). \end{aligned}$$
(50)

The characterization will follow from the following easy observation.

Proposition 7

If \(A+B \succ 0\) then

$$\begin{aligned} B - B(A+B)^{-1}B \succeq T \quad \iff \quad \begin{bmatrix} A&0\\0&B\end{bmatrix} - \begin{bmatrix}T&T\\T&T \end{bmatrix} \succeq 0. \end{aligned}$$
(51)

Proof

The proof follows by expressing the left-hand side of (51) using Schur complements, followed by a congruence transformation:

$$\begin{aligned} \begin{aligned} B - B(A+B)^{-1}B \succeq T&\quad \iff \quad \begin{bmatrix} A+B&B\\ B&B-T \end{bmatrix} \succeq 0\\&\quad \iff \quad \begin{bmatrix} I&-I\\ 0&-I \end{bmatrix} \begin{bmatrix} A+B&B\\ B&B-T \end{bmatrix} \begin{bmatrix} I&-I\\ 0&-I \end{bmatrix}^T \succeq 0\\&\quad \iff \quad \begin{bmatrix} A-T&-T\\ -T&B-T \end{bmatrix} \succeq 0. \end{aligned} \end{aligned}$$

\(\square \)

Proof (of Proposition 7)

We need to show that

$$\begin{aligned} f_t(X) \succeq T\quad \iff \quad \begin{bmatrix} X-I&0\\0&I\end{bmatrix} \succeq \begin{bmatrix} T&\sqrt{t}T\\\sqrt{t}T&tT\end{bmatrix}. \end{aligned}$$
(52)

The case \(t=0\) can be easily verified to hold. We thus assume \(0 < t \le 1\). Given the expression of \(f_t\) in Eq. (50) we simply apply (51) with \(B=(I/t)\) and \(A= X-I\). This shows that

$$\begin{aligned} f_t(X) \succeq T \quad \iff \quad \begin{bmatrix} X-I&0\\ 0&I/t\end{bmatrix} - \begin{bmatrix} T&T\\ T&T\end{bmatrix} \succeq 0. \end{aligned}$$

Applying a congruence transformation with the diagonal matrix \({{\mathrm{diag}}}(I,\sqrt{t}I)\) yields the desired linear matrix representation (52). \(\square \)

We can also directly get, from Proposition 7, a semidefinite representation of the noncommutative perspective of \(f_t\) defined by \(P_{f_t}(X,Y) {=} Y^{1/2} f_t\left( Y^{-1/2} X Y^{-1/2}\right) Y^{1/2}\).

Proposition 8

If \(t\in [0,1]\) then the perspective \(P_{f_t}\) of \(f_t\) is jointly matrix concave since

$$\begin{aligned}&P_{f_t}(X,Y)\succeq {} T\;\;\ {and}\;\; X,Y \succ 0\nonumber \\&\;\;\iff \;\; \begin{bmatrix} X-Y&0\\0&Y\end{bmatrix} - \begin{bmatrix} T&\sqrt{t}T\\\sqrt{t}T&t T\end{bmatrix} \succeq 0 \;\;\ {and}\;\; X,Y \succ 0. \end{aligned}$$
(53)

Proof

From the definition of \(P_{f_t}\) and the expression (50) for \(f_t\) it is easy to see that we have:

$$\begin{aligned} P_{f_t}(X,Y) = (Y/t) - (Y/t) [(X-Y) + (Y/t)]^{-1} (Y/t). \end{aligned}$$

The semidefinite representation (53) then follows easily by applying (51) with \(B=Y/t\) and \(A=X-Y\), followed by applying a congruence transformation with the diagonal matrix \({{\mathrm{diag}}}(1,\sqrt{t})\). \(\square \)

Integral Representations of Operator Monotone Functions

In this section we show how to obtain the integral representations (24) and (25) as a fairly easy reworking of the following result.

Theorem 8

([24, Theorem 4.4]) If \(h:(-1,1)\rightarrow \mathbb {R}\) is non-constant and operator monotone then there is a unique probability measure \(\tilde{\nu }\) supported on \([-1,1]\) such that

$$\begin{aligned} h(z) = h(0) + h'(0)\int _{-1}^{1}\frac{z}{1-tz}\;\mathrm{d}\tilde{\nu }(t). \end{aligned}$$
(54)

Suppose \(g:\mathbb {R}_{++}\rightarrow \mathbb {R}\) is operator monotone. Then it is straightforward to check that \(h:(-1,1)\rightarrow \mathbb {R}\) defined by \(h(z) = g\left( \frac{1+z}{1-z}\right) \) is operator monotone and that \(g(x) = h\left( \frac{x-1}{x+1}\right) \). By applying Theorem 8 to h(z) and then evaluating at \(z = \frac{x-1}{x+1}\), we obtain the integral representation

$$\begin{aligned} g(x) = h(0) + h'(0)\int _{-1}^{1}\frac{(x-1)}{(x+1)-t(x-1)}\;\mathrm{d}\tilde{\nu }(t).\end{aligned}$$

Using the fact that \(h(0) = g(1)\) and \(h'(0) = 2g'(1)\), and applying a linear change of coordinates to rewrite the integral over [0, 1], we see that there is a probability measure \(\nu \) on [0, 1] such that

$$\begin{aligned} g(x) = g(1) + g'(1)\int _{0}^{1}f_t(x)\;\mathrm{d}\nu (t). \end{aligned}$$
(55)

This establishes (24). If, in addition, g takes positive values, then \(g(0) := \lim _{x\rightarrow 0}g(x) \ge 0\). Hence,

$$\begin{aligned} \begin{aligned} g(0)&= {} g(1) + g'(1)\int _{0}^{1}f_t(0)\;\mathrm {d}\nu (t)\ \\ {}&= g(1)+g'(1)\int _{0}^{1}\frac{-1}{1-t}\; \mathrm {d}\nu (t)\ge 0,\end{aligned}\end{aligned}$$

so we can define a probability measure supported on [0, 1] by \(\mathrm{d}\mu (t) = \frac{g'(1)}{g(1)-g(0)}\left( \frac{1}{1-t}\right) \mathrm{d}\nu (t)\). Then using the fact that \(f_t(x) = \frac{1}{1-t}\left[ f^+_t(x)-1\right] \) we immediately obtain, from (55), the representation

$$\begin{aligned} g(x) = g(0) + (g(1)-g(0))\int _{0}^{1}f^+_t(x)\;\mathrm{d}\mu (t).\end{aligned}$$

This establishes (25).

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Fawzi, H., Saunderson, J. & Parrilo, P.A. Semidefinite Approximations of the Matrix Logarithm. Found Comput Math 19, 259–296 (2019). https://doi.org/10.1007/s10208-018-9385-0

Download citation

Keywords

  • Convex optimization
  • Matrix concavity
  • Quantum relative entropy

Mathematics Subject Classification

  • 90C22
  • 52A41
  • 47A63