Semidefinite Approximations of the Matrix Logarithm

Fawzi, Hamza; Saunderson, James; Parrilo, Pablo A.

doi:10.1007/s10208-018-9385-0

Semidefinite Approximations of the Matrix Logarithm

Published: 21 March 2018

Volume 19, pages 259–296, (2019)
Cite this article

Foundations of Computational Mathematics Aims and scope Submit manuscript

Hamza Fawzi¹,
James Saunderson² &
Pablo A. Parrilo³

1692 Accesses
46 Citations
2 Altmetric
Explore all metrics

Abstract

The matrix logarithm, when applied to Hermitian positive definite matrices, is concave with respect to the positive semidefinite order. This operator concavity property leads to numerous concavity and convexity results for other matrix functions, many of which are of importance in quantum information theory. In this paper we show how to approximate the matrix logarithm with functions that preserve operator concavity and can be described using the feasible regions of semidefinite optimization problems of fairly small size. Such approximations allow us to use off-the-shelf semidefinite optimization solvers for convex optimization problems involving the matrix logarithm and related functions, such as the quantum relative entropy. The basic ingredients of our approach apply, beyond the matrix logarithm, to functions that are operator concave and operator monotone. As such, we introduce strategies for constructing semidefinite approximations that we expect will be useful, more generally, for studying the approximation power of functions with small semidefinite representations.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Logarithmically sparse symmetric matrices

Article Open access 31 May 2024

Sum-of-Squares Relaxations for Information Theory and Variational Inference

Article 05 April 2024

Some properties of the Lozinskii logarithmic norm

Article 01 June 2017

Notes

We define $D_{\text {op}}$ as $D_{\text {op}}(X\Vert Y) = -P_{\log }(Y,X)$ to match the conventional order of arguments in information theory.
This last requirement is to ensure uniqueness.

References

Al-Mohy, A.H., Higham, N.J.: Improved inverse scaling and squaring algorithms for the matrix logarithm. SIAM J. Sci. Comput. 34(4), C153–C169 (2012)
Article MathSciNet MATH Google Scholar
ApS, M.: The MOSEK optimization toolbox for MATLAB manual. Version 7.1 (Revision 28). (2015). http://docs.mosek.com/7.1/toolbox/index.html
Ben-Tal, A., Nemirovski, A.: Lectures on modern convex optimization: analysis, algorithms, and engineering applications. SIAM (2001)
Ben-Tal, A., Nemirovski, A.: On polyhedral approximations of the second-order cone. Math. Oper. Res. 26(2), 193–205 (2001)
Article MathSciNet MATH Google Scholar
Besenyei, A., Petz, D.: Successive iterations and logarithmic means. Oper. and Matrices 7(1), 205–218 (2013). https://doi.org/10.7153/oam-07-12
Bhatia, R.: Positive definite matrices. Princeton University Press (2009)
Bhatia, R.: Matrix analysis, vol. 169. Springer Science & Business Media (2013)
Blekherman, G., Parrilo, P.A., Thomas, R.R.: Semidefinite optimization and convex algebraic geometry. SIAM (2013)
Boyd, S., Kim, S.J., Vandenberghe, L., Hassibi, A.: A tutorial on geometric programming. Optim. Eng. 8(1), 67–127 (2007)
Article MathSciNet MATH Google Scholar
Bushell, P.J.: Hilbert’s metric and positive contraction mappings in a Banach space. Arch. Ration. Mech. Anal. 52(4), 330–338 (1973)
Article MathSciNet MATH Google Scholar
Carlen, E.A.: Trace inequalities and quantum entropy. An introductory course. In: Entropy and the quantum, vol. 529, pp. 73–140. AMS (2010)
Carlson, B.C.: An algorithm for computing logarithms and arctangents. Math. Comp. 26(118), 543–549 (1972)
Article MathSciNet MATH Google Scholar
Cox, D.A.: The arithmetic-geometric mean of Gauss. In: Pi: A source book, pp. 481–536. Springer (2004)
Dieci, L., Morini, B., Papini, A.: Computational techniques for real logarithms of matrices. SIAM J. Matrix Anal. Appl. 17(3), 570–593 (1996)
Article MathSciNet MATH Google Scholar
Domahidi, A., Chu, E., Boyd, S.: ECOS: An SOCP solver for embedded systems. In: European Control Conference (ECC), pp. 3071–3076 (2013)
Ebadian, A., Nikoufar, I., Gordji, M.E.: Perspectives of matrix convex functions. Proc. Natl. Acad. Sci. USA 108(18), 7313–7314 (2011)
Article MathSciNet MATH Google Scholar
Effros, E., Hansen, F.: Non-commutative perspectives. Ann. Funct. Anal 5(2), 74–79 (2014)
Article MathSciNet MATH Google Scholar
Effros, E.G.: A matrix convexity approach to some celebrated quantum inequalities. Proc. Natl. Acad. Sci. USA 106(4), 1006–1008 (2009)
Article MathSciNet MATH Google Scholar
Fawzi, H., Fawzi, O.: Relative entropy optimization in quantum information theory via semidefinite programming approximations. arXiv preprint arXiv:1705.06671 (2017)
Fawzi, H., Saunderson, J.: Lieb’s concavity theorem, matrix geometric means, and semidefinite optimization. Linear Algebra Appl. 513, 240–263 (2017)
Article MathSciNet MATH Google Scholar
Fujii, J., Kamei, E.: Relative operator entropy in noncommutative information theory. Math. Japon 34, 341–348 (1989)
MathSciNet MATH Google Scholar
Glineur, F.: Quadratic approximation of some convex optimization problems using the arithmetic-geometric mean iteration. Talk at the “Workshop GeoLMI on the geometry and algebra of linear matrix inequalities”. http://homepages.laas.fr/henrion/geolmi/geolmi-glineur.pdf (retrieved November 2, 2016) (2009)
Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (2014)
Hansen, F., Pedersen, G.K.: Jensen’s inequality for operators and Löwner’s theorem. Math. Ann. 258(3), 229–241 (1982)
Article MathSciNet MATH Google Scholar
Helton, J.W., Klep, I., McCullough, S.: The tracial Hahn–Banach theorem, polar duals, matrix convex sets, and projections of free spectrahedra. J. Eur. Math. Soc. (JEMS) 19(6), 1845–1897 (2017)
Article MathSciNet MATH Google Scholar
Higham, N.J.: Functions of matrices: theory and computation. SIAM (2008)
Kenney, C., Laub, A.J.: Condition estimates for matrix functions. SIAM J. Matrix Anal. Appl. 10(2), 191–209 (1989)
Article MathSciNet MATH Google Scholar
Lieb, E.H.: Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv. Math. 11(3), 267–288 (1973)
Article MathSciNet MATH Google Scholar
Lieb, E.H., Ruskai, M.B.: Proof of the strong subadditivity of quantum mechanical entropy. J. Math. Phys. 14(12), 1938–1941 (1973)
Article MathSciNet Google Scholar
Meurant, G., Sommariva, A.: Fast variants of the Golub and Welsch algorithm for symmetric weight functions in Matlab. Numerical Algorithms 67(3), 491–506 (2014)
Article MathSciNet MATH Google Scholar
Nesterov, Y.E.: Constructing self-concordant barriers for convex cones. CORE Discussion Paper (2006/30) (2006)
Sagnol, G.: On the semidefinite representation of real functions applied to symmetric matrices. Linear Algebra Appl. 439(10), 2829–2843 (2013)
Article MathSciNet MATH Google Scholar
Serrano, S.A.: Algorithms for unsymmetric cone optimization and an implementation for problems with the exponential cone. Ph.D. thesis, Stanford University (2015)
Skajaa, A., Ye, Y.: A homogeneous interior-point algorithm for nonsymmetric convex conic optimization. Math. Program. 150(2), 391–422 (2015)
Article MathSciNet MATH Google Scholar
Stoer, J., Bulirsch, R.: Introduction to numerical analysis, vol. 12, 3 edn. Springer-Verlag, New York (2002)
Book MATH Google Scholar
Trefethen, L.N.: Is Gauss quadrature better than Clenshaw-Curtis? SIAM Rev. 50(1), 67–87 (2008)
Article MathSciNet MATH Google Scholar
Trefethen, L.N.: Approximation theory and approximation practice. SIAM (2013)
Tropp, J.A.: From joint convexity of quantum relative entropy to a concavity theorem of Lieb. Proc. Amer. Math. Soc. 140(5), 1757–1760 (2012)
Article MathSciNet MATH Google Scholar
Tropp, J.A.: An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8(1-2), 1–230 (2015)
Article MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, CB3 0WA, UK
Hamza Fawzi
Department of Electrical and Computer Systems Engineering, Monash University, Victoria, 3800, Australia
James Saunderson
Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Pablo A. Parrilo

Authors

Hamza Fawzi
View author publications
You can also search for this author in PubMed Google Scholar
James Saunderson
View author publications
You can also search for this author in PubMed Google Scholar
Pablo A. Parrilo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to James Saunderson.

Additional information

Communicated by James Renegar.

This work was done in part while Pablo Parrilo was visiting the Simons Institute for the Theory of Computing. It was partially supported by the DIMACS/Simons Collaboration on Bridging Continuous and Discrete Optimization through NSF Grant #CCF-1740425. This work was also supported in part by the Air Force Office of Scientific Research through AFOSR Grants FA9550-11-1-0305 and FA9550-12-1-0287, and in part by the National Science Foundation through NSF Grant #CCF-1565235.

Appendices

Background on Approximation Theory

Gaussian quadrature A quadrature rule is a method of approximating an integral with a weighted sum of evaluations of the integrand. A quadrature rule is determined by the evaluation points, called nodes, and the weights of the weighted sum. Given a measure $\nu $ supported on $[-1,1]$, a quadrature rule gives an approximation of the form

$$\begin{aligned} \int _{-1}^{1}h(t)\;\mathrm{d} \nu (t) \approx \sum _{j=1}^{m} w_jh(t_j) \end{aligned}$$

(37)

where the $t_j\in [-1,1]$ are the nodes and the $w_j$ are the weights. A Gaussian quadrature is a choice of nodes $t_1,\ldots ,t_m\in (-1,1)$ and positive weights $w_1,\ldots ,w_m$ that integrates all polynomials of degree at most $2m-1$exactly. For example, when $\nu $ is the uniform measure on $[-1,1]$, such a quadrature rule is known as Gauss–Legendre quadrature, and the nodes and weights can be computed for example by an eigenvalue decomposition of the associated Jacobi matrix, see e.g., [36, Sect. 2].

Padé approximants Padé approximants are approximations of a given univariate function, analytic at a point $x_0$, by rational functions. More precisely, the (m, n)-Padé approximant of h at $x_0$ is the rational function p(x) / q(x) such that p is a polynomial of degree m, q is a polynomial of degree n, and the Taylor series expansion of the error at $x_0$ is of the form

$$\begin{aligned} h(x) - \frac{p(x)}{q(x)} = (x-x_0)^{s}\sum _{k\ge 0}a_k(x-x_0)^k\end{aligned}$$

for real numbers $a_k$ and the largest possible positive integer^{Footnote 2}s. Expressed differently, p and q are chosen so that the Taylor series of p(x) / q(x) at $x_0$ matches as many Taylor series coefficients of h at $x_0$ as possible (and at least the first $m+n+1$ coefficients).

Properties and Error Bounds of Gaussian Quadrature-Based Approximations

Assume $g:\mathbb {R}_{++}\rightarrow \mathbb {R}$ is a function with an integral representation

$$\begin{aligned} g(x) = g(1) + g'(1)\int _{0}^1 f_t(x) \mathrm{d}\nu (t) \end{aligned}$$

(38)

where $\nu $ is a probability measure on [0, 1] and $f_t(x) = \frac{x-1}{1+t(x-1)}$. The case $g=\log $ corresponds to $\nu $ being the Lebesgue measure on [0, 1]. In this appendix we show that the rational approximation obtained by applying Gaussian quadrature on (38) coincides with the Padé approximant of g at $x=1$. We also derive error bounds on the quality of this rational approximation. Note that functions of the form (38) are precisely operator monotone functions, by Löwner’s theorem (see Sect. 4.1).

Let $r_m$ be the rational approximant obtained by using Gaussian quadrature on (38):

$$\begin{aligned} r_m(x) = g(1) + g'(1)\sum _{i=1}^m w_i f_{t_i}(x) \end{aligned}$$

(39)

where $w_i > 0, t_i \in [0,1]$ are the Gaussian quadrature weights and nodes for the measure $\nu $.

1.1 Connection with Padé Approximant

We first show that the function $r_m$ coincides with the Padé approximant of g at $x=1$. The special case $g =\log $ was established in [14, Theorem 4.3].

Proposition 5

Assume $g:\mathbb {R}_{++}\rightarrow \mathbb {R}$ has the form (38) and let $r_m$ be the rational approximation obtained via Gaussian quadrature as in (39). Then $r_m$ is the (m, m) Padé approximant of g at $x=1$.

Proof

First we note that $f_t(x)$ admits the following series expansion, valid for $|x-1|<\frac{1}{|t|}$:

$$\begin{aligned} f_t(x) =&\frac{x-1}{t(x-1)+1} = (x-1)\sum _{k=0}^{\infty } (-1)^kt^k(x-1)^k. \end{aligned}$$

Let $\nu _m = \sum _{i=1}^m w_i \delta _{t_i}$ be the atomic measure on [0, 1] corresponding to Gaussian quadrature applied to $\nu $. By definition of Gaussian quadrature, $\nu _m$ matches all moments of $\nu $ up to degree $2m-1$, i.e., $\int _{0}^{1}p(t)\;\mathrm{d}\nu (t) = \int _{0}^{1}p(t)\;\mathrm{d}\nu _m(t)$ for all polynomials p of degree at most $2m-1$. It thus follows that

$$\begin{aligned} g(x) - r_m(x)= g'(1)(x-1) \sum _{k=2m}^{\infty } (-1)^k(x-1)^{k}\left[ \int _{0}^{1}t^k\; d\nu (t) - \int _{0}^{1}t^k\; d\nu _m(t)\right] ,\end{aligned}$$

establishing that $r_m$ matches the first 2m Taylor series coefficients of g at $x=1$. Since $r_m$ has numerator and denominator degree m, it is the (m, m)-Padé approximant of g at $x=1$. $\square $

1.2 Error Bounds

In this section we derive an error bound on the approximation quality $|g(x) - r_m(x)|$. To do this we use standard methods as described, e.g., in [37]. This error is essentially controlled by the decay of the Chebyshev coefficients of the integrand. For the rational functions $f_t$ one can compute these coefficients exactly.

1.2.1 Quadrature Error Bounds for Operator Monotone Functions

To appeal to standard arguments, it is easiest to rewrite the integrals of interest over the interval $[-1,1]$ by the transformation $t\mapsto 1-2t$ mapping [0, 1] to $[-1,1]$. To this end, let

$$\begin{aligned} \tilde{f}_t(x) := f_{\frac{1-t}{2}}(x)= \frac{2}{\frac{x+1}{x-1}-t}.\end{aligned}$$

Let $T_k(t)$ denote the kth Chebyshev polynomial. We start by explicitly computing the Chebyshev expansion of $\tilde{f}_t(x)$ for fixed x, i.e., we find the coefficients $a_k(x)$ of $\tilde{f}_t(x) = \sum _{k=0}^{\infty }a_k(x)T_k(t)$. To do this, we first define $h_\rho (t) = \frac{2}{(\rho +\rho ^{-1})/2 - t}$ and observe that with the substitution $\rho = \frac{\sqrt{x}-1}{\sqrt{x}+1}$ we have that $\tilde{f}_t(x) = h_\rho (t)$ and that $x>0$ if and only if $-1<\rho <1$. We can compute the Chebyshev expansion of $h_\rho (t)$ by observing that the generating function of Chebyshev polynomials is (see e.g., [37, Exercise 3.14])

$$\begin{aligned} \sum _{k=0}^{\infty }\rho ^kT_k(t) = \frac{1-\rho t}{1-2\rho t + \rho ^2} = \frac{1}{2} + \frac{\rho ^{-1}-\rho }{8}h_\rho (t).\end{aligned}$$

It then follows that the Chebyshev expansion of $h_\rho (t)$ is

$$\begin{aligned} h_\rho (t) = \frac{2}{(\rho +\rho ^{-1})/2 - t} = \frac{8}{\rho ^{-1}-\rho }\left[ \frac{1}{2} + \sum _{k=1}^{\infty } \rho ^k T_k(t)\right] . \end{aligned}$$

(40)

Since $\frac{8}{\rho ^{-1}-\rho } = 2(\sqrt{x} - 1/\sqrt{x})$, the Chebyshev expansion of $\tilde{f}_t(x)$ is

$$\begin{aligned} \tilde{f}_t(x) = \frac{2}{\frac{x+1}{x-1}-t} = 2\left( \sqrt{x} - 1/\sqrt{x}\right) \left[ \frac{1}{2} + \sum _{k=1}^{\infty } \left( \frac{\sqrt{x}-1}{\sqrt{x}+1}\right) ^k T_k(t)\right] . \end{aligned}$$

(41)

We are now ready to state an error bound on the approximation quality $|g(x) - r_m(x)|$. Our arguments are standard and follow closely the ideas described in [37].

Proposition 6

Let $g:\mathbb {R}_{++}\rightarrow \mathbb {R}$ be a function with an integral representation (38) and let $r_m$ be the rational approximation obtained by applying Gaussian quadrature as in (39). If $m\ge 1$ and $x>0$ then

$$\begin{aligned} \left| g(x) - r_m(x)\right| \le 4g'(1)|\sqrt{x}-1/\sqrt{x}|\frac{\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m}}{1-\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| }. \end{aligned}$$

(42)

If $\nu $ is invariant under the map $t\mapsto 1-t$ (i.e., $g(x^{-1}) = -g(x)$) then this can be improved to

$$\begin{aligned} \left| g(x) - r_m(x)\right| \le g'(1)\left| \sqrt{x} - 1/\sqrt{x}\right| ^2\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m-1}. \end{aligned}$$

(43)

Finally, $r_m(x) \ge g(x)$ for all $0<x\le 1$ and $r_m(x) \le g(x)$ for all $x\ge 1$.

Proof

Let $\tilde{\nu }$ be the measure on $[-1,1]$ obtained from $\nu $ by changing variables $t \in [0,1] \mapsto 1-2t \in [-1,1]$ so that $g(x) = g(0) + g'(1) \int _{-1}^1 \tilde{f_t}(x) \mathrm{d}\tilde{\nu }(t)$. Let $\tilde{\nu }_m$ be the atomic measure supported on m points obtained by applying Gaussian quadrature on $\nu $. Finally let the Chebyshev expansion of $\tilde{f}_t(x)$ be $\sum _{k=0}^\infty a_k(x)T_k(t)$. Since $\int _{-1}^{1}T_k(t)\;\mathrm{d}\tilde{\nu }(t) = \int _{-1}^{1}T_k(t)\;\mathrm{d}\tilde{\nu }_m(t)$ for $k\le 2m-1$,

$$\begin{aligned} \left| g(x) - r_m(x)\right| = g'(1)\left| \sum _{k=2m}^{\infty }a_k(x)\left[ \int _{-1}^{1}T_k(t)\;\mathrm{d}\tilde{\nu }(t) - \int _{-1}^{1}T_k(t)\;\mathrm{d}\tilde{\nu }_m(t)\right] \right| .\end{aligned}$$

For $k\ge 2$, we have that $a_k(x) = 2(\sqrt{x}-1/\sqrt{x})\left( \frac{\sqrt{x}-1}{\sqrt{x}+1}\right) ^k$ [see (41)]. So using the fact that $\tilde{\nu }$ and $\tilde{\nu }_m$ are probability measures (when $m\ge 1$), together with the fact that $|T_k(t)|\le 1$ for $t\in [-1,1]$, the triangle inequality gives

$$\begin{aligned} \left| g(x) - r_m(x)\right|\le & {} 4g'(1)|\sqrt{x}-1/\sqrt{x}|\sum _{k=2m}^{\infty }\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^k\\= & {} 4g'(1)|\sqrt{x}-1/\sqrt{x}|\frac{\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m}}{1-\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| }. \end{aligned}$$

If the measure $\nu $ is invariant under the map $t\mapsto -t$ then the same is true of $\nu _m$ (see, e.g., [30]). Since $\tilde{f}_t(x^{-1}) = -\tilde{f}_{-t}(x)$ it follows that $r_m(x^{-1}) = -r_m(x)$. Furthermore,

$$\begin{aligned} \int _{-1}^{1}T_{2k+1}(t)\;\mathrm{d}\tilde{\nu }(t) = \int _{-1}^{1}T_{2k+1}(t)\;\mathrm{d}\tilde{\nu }_m(t) = 0\qquad \text {for all nonnegative integers}\,k\end{aligned}$$

because Chebyshev polynomials of odd degree are odd functions. In this case only the even Chebyshev coefficients contribute to the error bound so

$$\begin{aligned} \left| g(x) - r_m(x)\right|\le & {} 4g'(1)|\sqrt{x}-1/\sqrt{x}|\sum _{k=m}^{\infty }\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2k}\\= & {} g'(1)|\sqrt{x}-1/\sqrt{x}|^2\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m-1}. \end{aligned}$$

To establish inequalities between $r_m(x)$ and g(x), we use an alternative formula for the error obtained by approximating an integral via Gaussian quadrature. Since $t\mapsto \tilde{f}_t(x)$ has derivatives of all orders, one can show (see, e.g., [35, Theorem 3.6.24]) that there exists $\tau \in [-1,1]$ and $\kappa \ge 0$ such that

$$\begin{aligned} g(x) - r_m(x) = \frac{\kappa }{(2m){!}} \frac{\partial ^{2m}}{\partial t^{2m}}\tilde{f}_\tau (x) = \frac{\kappa }{\left( \frac{x+1}{x-1}-\tau \right) ^{2m+1}}.\end{aligned}$$

If $x\in (0,1)$ then $\frac{1+x}{1-x}-\tau <0$ for all $\tau \in [-1,1]$ and so $g(x)-r_m(x) < 0$. If $x\in (1,\infty )$ then $\frac{1+x}{1-x}-\tau > 0$ for all $\tau \in [-1,1]$ and so $g(x) - r_m(x) > 0$. If $x=1$ then $g(x) = r_m(x)$. $\square $

Very similar bounds hold for the error between $g:\mathbb {R}_{++}\rightarrow \mathbb {R}_{++}$, a positive operator monotone function, and $r_{m}^+$, the rational approximation obtained by applying Gaussian quadrature to the integral representation in (25). Indeed, if $m\ge 1$ and $x> 0$,

$$\begin{aligned} \left| g(x) - r_m^+(x)\right| \le 4(g(1)-g(0))\sqrt{x}\frac{\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m}}{1-\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| }. \end{aligned}$$

(44)

We omit the proof, since it follows the same basic argument as the proof of (42), together with the observation that $f_t^+(x) = \frac{x}{x-1}f_t(x)$.

1.2.2 The Special Case of Log: Proofs of Proposition 1 and Theorem 1

Proof (of Proposition 1)

The function $g(x) = \log (x)$ has an integral representation (38) where the measure $\nu $ is the uniform measure on [0, 1], which is invariant under the map $t\mapsto 1-t$. Proposition 6 tells us that, for any $x>0$,

$$\begin{aligned} |\log (x) - r_m(x)| \le |\sqrt{x}-1/\sqrt{x}|^2\left| \frac{\sqrt{x}-1}{\sqrt{x}+1}\right| ^{2m-1}. \end{aligned}$$

(45)

The error between $\log (x) = 2^k\log (x^{1/2^k})$ and $r_{m,k}(x) = 2^kr_m(x^{1/2^k})$ can be obtained by evaluating at $x^{1/2^k}$ and scaling by $2^k$ to obtain

$$\begin{aligned} |\log (x) - r_{m,k}(x)| \le 2^k|\sqrt{\kappa }-1/\sqrt{\kappa }|^2\left| \frac{\sqrt{\kappa }-1}{\sqrt{\kappa }+1}\right| ^{2m-1}. \end{aligned}$$

(46)

where $\kappa = x^{1/2^k}$. By using the fact that

$$\begin{aligned} \left| \frac{\sqrt{\kappa } - 1}{\sqrt{\kappa }+1}\right| = \left| \tanh \left( \frac{1}{4}\log (\kappa )\right) \right| \le \frac{1}{2^{k+2}}|\log (x)|\end{aligned}$$

we can write this as a bound on relative error as

$$\begin{aligned} |\log (x) - r_{m,k}(x)| \le |\log (x)|\left| \frac{\sqrt{\kappa } - 1/\sqrt{\kappa }}{2}\right| ^2\left| \frac{\sqrt{\kappa }-1}{\sqrt{\kappa }+1}\right| ^{2(m-1)}.\end{aligned}$$

Asymptotic behavior of (46): Since $\kappa = x^{1/2^k} = e^{2^{-k}\log (x)}$, we can rewrite the right-hand side of (46) as

$$\begin{aligned} 2^k \left[ 2\sinh (2^{-(k+1)}\log (x))\right] ^2 \left[ \tanh (2^{-(k+2)}\log (x))\right] ^{2m-1}.\end{aligned}$$

Since $\sinh ^2(2x)\tanh ^{2m-1}(x) = 4x^{2m+1} + O(x^{2m+3})$, we have that

$$\begin{aligned} 2^k \left[ 2\sinh (2^{-(k+1)}\log (x))\right] ^2 \left[ \tanh (2^{-(k+2)}\log (x))\right] ^{2m-1}\asymp 4\cdot 4^{-m(k+2)}\log (x)^{2m+1}\end{aligned}$$

as $k\rightarrow \infty $. $\square $

Proof (of Theorem 1)

The function r is chosen to be of the form $r_{m,k}$ for certain m and k. In particular we can choose $k = k_1+k_2$, with $k_1 = \lceil \log _2\log _e(a)\rceil +1$, $k_2$ being the smallest even integer larger than $\sqrt{\log _2(32\log _e(a)/\epsilon )}$, and with $m=k_2/2$. The function $r_{m,k}$ has a semidefinite representation of size $m+k$ (as a special case of Theorem 3 in Sect. 3), which is $O(\sqrt{\log _e(1/\epsilon )})$ for fixed a. It remains to establish the error bound. To do so, we first note that $x^{1/2^{k_1}} < 1$ for all $x\in [1/a,a]$. Then, for all $x\in [1/a,a]$,

$$\begin{aligned} \begin{aligned} |r_{m,k}(x) - \log (x)|&\le 2^k|x^{1/2^{k+1}} - x^{-1/2^{k+1}}|^2\left| \frac{x^{1/2^{k+1}} - 1}{x^{1/2^{k+1}}+1}\right| ^{2m-1}\\ {}&= 2^{k+2}\sinh ^{2}\left( \frac{1}{2^{k_2+1}}\log _e(x^{1/2^{k_1}})\right) \\ {}&\times \tanh ^{2m-1}\left( \frac{1}{2^{k_2+2}}\log _e(x^{1/2^{k_1}})\right) \\ {}&\le 8\cdot 2^{k_1-1}2^{k_2}\sinh ^2(1/2^{k_2+1}) \tanh ^{2m-1}(1/2^{k_2+2})\\ {}&\le 8 \log _e(a) 2^{k_2} 2^{-(k_2+2)(2m-1)}\\ {}&\le \epsilon . \end{aligned} \end{aligned}$$

Here, the second last equality holds because $\sinh (1/2)^2 \le 1,\tanh (x)\le x$ for all $x\ge 0$, and $2^{k_1-1} \le \log _e(a)$ (by our choice of $k_1$). The last inequality holds by our choice of m and $k_2$. $\square $

1.2.3 Proof of Theorem 7

Proof (of Theorem 7)

The function r is of the form $r_{m,k}$ defined by (29) for particular values of the parameters m and k. Throughout the proof, for convenience of notation, let $(x_k,y_k) = \varPhi ^{(k)}(x,y)$ for $k\ge 0$. The error bound (44) of “Appendix B.2” shows that for any $x,y > 0$:

$$\begin{aligned} |yr_{m,k}(x/y) - yg(x/y)|&= |y_kr_m^+(x_k/y_k) - y_kg(x_k/y_k)|\nonumber \\&\le 4(g(1)-g(0))\sqrt{x_ky_k}\frac{\left| \frac{\sqrt{x_k}-\sqrt{y_k}}{\sqrt{x_k}+\sqrt{y_k}}\right| ^{2m}}{1-\left| \frac{\sqrt{x_k}-\sqrt{y_k}}{\sqrt{x_k}+\sqrt{y_k}}\right| } \nonumber \\&= 4(g(1)-g(0)) \sqrt{x_k y_k} \frac{\left| \tanh \left( \frac{1}{4}\log (x_k/y_k)\right) \right| ^{2m}}{1-\left| \tanh \left( \frac{1}{4}\log (x_k/y_k)\right) \right| }. \end{aligned}$$

(47)

We will show that if $\varPhi $ has the linear contraction property (31) then the bound (47) decays like $O(c^{-k^2})$ for the choice of $m \approx k$ (that we make precise later). To establish this, we need to bound two terms: first, if (31) holds then $\log (x_k / y_k) = O(c^{-k/2})$ and so the numerator in (47) converges like $O(c^{-km})$ as we want. The second term that we need to control is $\sqrt{x_k y_k}$ and one can show that this term grows at most linearly. This is proved in the following lemma:

Lemma 1

There is a constant $b > 0$ such that for any $x,y > 0$ satisfying $a^{-1} \le x/y \le a$ we have

$$\begin{aligned} \sqrt{x_k y_k} \le yb^k (1+a)/2 \end{aligned}$$

(48)

where $(x_k,y_k) = \varPhi ^{(k)}(x,y)$.

Proof

Since $h_1$ and $h_2$ are concave, they are each bounded above by their linear approximation at $x=1$. As such, $P_{h_i}(x,y) \le h_i'(1)(x-y) + h_i(1)y$ for all $x,y\in \mathbb {R}_{++}$ and $i=1,2$. Summing these two inequalities we see that

$$\begin{aligned} P_{h_1}(x,y)+P_{h_2}(x,y)\le & {} \left[ h_1'(1)+h_2'(1)\right] x \\&+ \left[ h_1(1)+h_2(1)-(h_1'(1) +h_2'(1))\right] y.\end{aligned}$$

Because $h_1$ and $h_2$ take positive values, $h_1(1) \ge h_1(1)-h_1'(1)\ge 0$ and $h_2(1) \ge h_2(1) - h_2'(1)\ge 0$. As such, if $b = \max \{h_1'(1)+h_2'(1),h_1(1)+h_2(1) - (h_1'(1)+h_2'(1))\}$, then $P_{h_1}(x,y)+P_{h_2}(x,y) \le b(x+y)$ for all $x,y\in \mathbb {R}_{++}$. It then follows that $x_k + y_k \le b^k(x+y)$ for all $x,y\in \mathbb {R}_{++}$ and so that

$$\begin{aligned} \sqrt{x_ky_k} \le (x_k+y_k)/2 \le y b^k(1+a)/2 \end{aligned}$$

as desired. $\square $

Plugging (48) in (47) gives us, for any $a^{-1} \le x/y \le a$:

$$\begin{aligned} |r_{m,k}(x/y) - g(x/y)|\le & {} 2(g(1) - g(0)) (1+a) b^k\frac{\left| \tanh \left( \frac{1}{4}\log (x_k/y_k)\right) \right| }{1-\left| \tanh \left( \frac{1}{4}\log (x_k/y_k)\right) \right| }.\nonumber \\ \end{aligned}$$

(49)

Choose k to be the smallest even integer satisfying

$$\begin{aligned} k \ge \max \left\{ 2\log _c\log (a), \sqrt{\log _c\left( \frac{8(g(1)-g(0))(1+a)}{3\epsilon }\right) }\right\} \end{aligned}$$

and m to be the smallest integer satisfying $m \ge k\max \{1,\frac{\log (b)}{\log (16)}\}$. Note that both m and k are $O(\sqrt{\log _c(1/\epsilon )})$ when we treat a and b as constants. With these choices, and assumption (31), we have that $b^k16^{-m} \le 1$ and

$$\begin{aligned}&|\log (x_{k/2}/y_{k/2})|\le {} c^{-k/2}\log (a) \le 1\quad \text{ and }\quad \\&|\log (x_k/y_k)| \le {} c^{-k/2}|\log (x_{k/2}/y_{k/2})| \le c^{-k/2}.\end{aligned}$$

Using the inequality $|\tanh (z)| \le |z|$ for all z, and setting $y=1$ in the error bound (49), we have that

$$\begin{aligned} |r_{m,k}(x) - g(x)|\le & {} 2(g(1)-g(0))(1+a)\frac{b^kc^{-km}16^{-m}}{1-1/4} \\= & {} \frac{8(g(1)-g(0))(1+a)}{3} c^{-k^2} \le \epsilon .\end{aligned}$$

The size of the semidefinite representation of $r = r_{m,k}$ is $O(m+k)$, if we view the size of the semidefinite representations of $h_1$ and $h_2$ as being constant. Since $m,k\in O(\sqrt{\log _c(1/\epsilon )})$ it follows that the size of the semidefinite representation of r is also $O(\sqrt{\log _c(1/\epsilon )})$.

In the case where assumption (32) also holds, we choose m (respectively, k) to be the smallest integer (respectively, even integer) satisfying

$$\begin{aligned} k \ge \max \left\{ 2\log _c\log (a),2\log _2\log _{c_0}\left( \frac{8(g(1)-g(0))(1+a)}{ 3\epsilon }\right) \right\} \end{aligned}$$

and $m\ge \max \left\{ 1,\frac{k\log (b)}{\log (16/c_0)}\right\} $. Note that both m and k are $O(\log _{2}\log _{c_0}(1/\epsilon ))$. With these choices, and assumptions (31) and (32), we have that $c_0^mb^k16^{-m} \le 1$ and

$$\begin{aligned} |\log (x_{k/2}/y_{k/2})|&\le c^{-k/2}\log (a)\le 1\quad \text {and}\\ |\log (x_k/y_k)|&\le c_0^{-(2^{k/2}-1)}|\log (x_{k/2}/y_{k/2})|^{2^{k/2}} \le c_0^{-(2^{k/2}-1)}. \end{aligned}$$

Using the inequality $|\tanh (z)|\le |z|$ for all z, and putting $y=1$ in the error bound (49), we obtain

$$\begin{aligned} |r_{m,k}(x) - g(x)|&\le 2(g(1)-g(0))(1+a)\frac{b^k c_0^{-(2^{k/2}-1)m}16^{-m}}{1-1/4}\\&= \frac{8(g(1)-g(0))(1+a)}{3} c_0^{-2^{k/2}} \le \epsilon . \end{aligned}$$

$\square $

Semidefinite Description of $f_t$

In this section we establish the linear matrix inequality characterization of $f_t$ given in Proposition 2. We use the fact that if $t\in (0,1]$ then

$$\begin{aligned} f_t(X)= & {} (X-I)\left[ t(X-I)+I\right] ^{-1} \nonumber \\= & {} (I/t) - (I/t)\left[ (X-I) + (I/t)\right] ^{-1}(I/t). \end{aligned}$$

(50)

The characterization will follow from the following easy observation.

Proposition 7

If $A+B \succ 0$ then

$$\begin{aligned} B - B(A+B)^{-1}B \succeq T \quad \iff \quad \begin{bmatrix} A&0\\0&B\end{bmatrix} - \begin{bmatrix}T&T\\T&T \end{bmatrix} \succeq 0. \end{aligned}$$

(51)

Proof

The proof follows by expressing the left-hand side of (51) using Schur complements, followed by a congruence transformation:

$$\begin{aligned} \begin{aligned} B - B(A+B)^{-1}B \succeq T&\quad \iff \quad \begin{bmatrix} A+B&B\\ B&B-T \end{bmatrix} \succeq 0\\&\quad \iff \quad \begin{bmatrix} I&-I\\ 0&-I \end{bmatrix} \begin{bmatrix} A+B&B\\ B&B-T \end{bmatrix} \begin{bmatrix} I&-I\\ 0&-I \end{bmatrix}^T \succeq 0\\&\quad \iff \quad \begin{bmatrix} A-T&-T\\ -T&B-T \end{bmatrix} \succeq 0. \end{aligned} \end{aligned}$$

$\square $

Proof (of Proposition 7)

We need to show that

$$\begin{aligned} f_t(X) \succeq T\quad \iff \quad \begin{bmatrix} X-I&0\\0&I\end{bmatrix} \succeq \begin{bmatrix} T&\sqrt{t}T\\\sqrt{t}T&tT\end{bmatrix}. \end{aligned}$$

(52)

The case $t=0$ can be easily verified to hold. We thus assume $0 < t \le 1$. Given the expression of $f_t$ in Eq. (50) we simply apply (51) with $B=(I/t)$ and $A= X-I$. This shows that

$$\begin{aligned} f_t(X) \succeq T \quad \iff \quad \begin{bmatrix} X-I&0\\ 0&I/t\end{bmatrix} - \begin{bmatrix} T&T\\ T&T\end{bmatrix} \succeq 0. \end{aligned}$$

Applying a congruence transformation with the diagonal matrix ${{\mathrm{diag}}}(I,\sqrt{t}I)$ yields the desired linear matrix representation (52). $\square $

We can also directly get, from Proposition 7, a semidefinite representation of the noncommutative perspective of $f_t$ defined by $P_{f_t}(X,Y) {=} Y^{1/2} f_t\left( Y^{-1/2} X Y^{-1/2}\right) Y^{1/2}$.

Proposition 8

If $t\in [0,1]$ then the perspective $P_{f_t}$ of $f_t$ is jointly matrix concave since

$$\begin{aligned}&P_{f_t}(X,Y)\succeq {} T\;\;\ {and}\;\; X,Y \succ 0\nonumber \\&\;\;\iff \;\; \begin{bmatrix} X-Y&0\\0&Y\end{bmatrix} - \begin{bmatrix} T&\sqrt{t}T\\\sqrt{t}T&t T\end{bmatrix} \succeq 0 \;\;\ {and}\;\; X,Y \succ 0. \end{aligned}$$

(53)

Proof

From the definition of $P_{f_t}$ and the expression (50) for $f_t$ it is easy to see that we have:

$$\begin{aligned} P_{f_t}(X,Y) = (Y/t) - (Y/t) [(X-Y) + (Y/t)]^{-1} (Y/t). \end{aligned}$$

The semidefinite representation (53) then follows easily by applying (51) with $B=Y/t$ and $A=X-Y$, followed by applying a congruence transformation with the diagonal matrix ${{\mathrm{diag}}}(1,\sqrt{t})$. $\square $

Integral Representations of Operator Monotone Functions

In this section we show how to obtain the integral representations (24) and (25) as a fairly easy reworking of the following result.

Theorem 8

([24, Theorem 4.4]) If $h:(-1,1)\rightarrow \mathbb {R}$ is non-constant and operator monotone then there is a unique probability measure $\tilde{\nu }$ supported on $[-1,1]$ such that

$$\begin{aligned} h(z) = h(0) + h'(0)\int _{-1}^{1}\frac{z}{1-tz}\;\mathrm{d}\tilde{\nu }(t). \end{aligned}$$

(54)

Suppose $g:\mathbb {R}_{++}\rightarrow \mathbb {R}$ is operator monotone. Then it is straightforward to check that $h:(-1,1)\rightarrow \mathbb {R}$ defined by $h(z) = g\left( \frac{1+z}{1-z}\right) $ is operator monotone and that $g(x) = h\left( \frac{x-1}{x+1}\right) $. By applying Theorem 8 to h(z) and then evaluating at $z = \frac{x-1}{x+1}$, we obtain the integral representation

$$\begin{aligned} g(x) = h(0) + h'(0)\int _{-1}^{1}\frac{(x-1)}{(x+1)-t(x-1)}\;\mathrm{d}\tilde{\nu }(t).\end{aligned}$$

Using the fact that $h(0) = g(1)$ and $h'(0) = 2g'(1)$, and applying a linear change of coordinates to rewrite the integral over [0, 1], we see that there is a probability measure $\nu $ on [0, 1] such that

$$\begin{aligned} g(x) = g(1) + g'(1)\int _{0}^{1}f_t(x)\;\mathrm{d}\nu (t). \end{aligned}$$

(55)

This establishes (24). If, in addition, g takes positive values, then $g(0) := \lim _{x\rightarrow 0}g(x) \ge 0$. Hence,

$$\begin{aligned} \begin{aligned} g(0)&= {} g(1) + g'(1)\int _{0}^{1}f_t(0)\;\mathrm {d}\nu (t)\ \\ {}&= g(1)+g'(1)\int _{0}^{1}\frac{-1}{1-t}\; \mathrm {d}\nu (t)\ge 0,\end{aligned}\end{aligned}$$

so we can define a probability measure supported on [0, 1] by $\mathrm{d}\mu (t) = \frac{g'(1)}{g(1)-g(0)}\left( \frac{1}{1-t}\right) \mathrm{d}\nu (t)$. Then using the fact that $f_t(x) = \frac{1}{1-t}\left[ f^+_t(x)-1\right] $ we immediately obtain, from (55), the representation

$$\begin{aligned} g(x) = g(0) + (g(1)-g(0))\int _{0}^{1}f^+_t(x)\;\mathrm{d}\mu (t).\end{aligned}$$

This establishes (25).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Fawzi, H., Saunderson, J. & Parrilo, P.A. Semidefinite Approximations of the Matrix Logarithm. Found Comput Math 19, 259–296 (2019). https://doi.org/10.1007/s10208-018-9385-0

Download citation

Received: 02 May 2017
Revised: 12 February 2018
Accepted: 09 March 2018
Published: 21 March 2018
Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s10208-018-9385-0

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Semidefinite Approximations of the Matrix Logarithm

Abstract

Access this article

Similar content being viewed by others

Logarithmically sparse symmetric matrices

Sum-of-Squares Relaxations for Information Theory and Variational Inference

Some properties of the Lozinskii logarithmic norm

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendices

Background on Approximation Theory

Properties and Error Bounds of Gaussian Quadrature-Based Approximations

1.1 Connection with Padé Approximant

Proposition 5

Proof

1.2 Error Bounds

1.2.1 Quadrature Error Bounds for Operator Monotone Functions

Proposition 6

Proof

1.2.2 The Special Case of Log: Proofs of Proposition 1 and Theorem 1

Proof (of Proposition 1)

Proof (of Theorem 1)

1.2.3 Proof of Theorem 7

Proof (of Theorem 7)

Lemma 1

Proof

Semidefinite Description of \(f_t\)

Proposition 7

Proof

Proof (of Proposition 7)

Proposition 8

Proof

Integral Representations of Operator Monotone Functions

Theorem 8

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation