Randomized matrix-free trace and log-determinant estimators

Abstract

We present randomized algorithms for estimating the trace and determinant of Hermitian positive semi-definite matrices. The algorithms are based on subspace iteration, and access the matrix only through matrix-vector products. We analyse the error due to randomization, for starting guesses whose elements are Gaussian or Rademacher random variables. The analysis is cleanly separated into a structural (deterministic) part followed by a probabilistic part. Our absolute bounds for the expectation and concentration of the estimators are non-asymptotic and informative even for matrices of low dimension. For the trace estimators, we also present asymptotic bounds on the number of samples (columns of the starting guess) required to achieve a user-specified relative error. Numerical experiments illustrate the performance of the estimators and the tightness of the bounds on low-dimensional matrices, and on a challenging application in uncertainty quantification arising from Bayesian optimal experimental design.
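
To make the access model concrete, the following NumPy sketch (ours, for illustration only; it is not the paper's exact algorithm or notation) estimates the trace of a Hermitian positive semi-definite matrix from \(q\) steps of subspace iteration on a Gaussian or Rademacher starting guess, touching the matrix only through matrix-vector products:

```python
import numpy as np

def trace_estimate(matvec, n, ell, q=1, dist="gaussian", seed=None):
    """Sketch of a subspace-iteration trace estimator.

    matvec : callable mapping an (n, m) block X to A @ X, A Hermitian PSD
    ell    : number of samples (columns of the starting guess)
    q      : number of subspace-iteration steps
    """
    rng = np.random.default_rng(seed)
    if dist == "gaussian":
        Omega = rng.standard_normal((n, ell))
    else:  # Rademacher starting guess
        Omega = rng.choice([-1.0, 1.0], size=(n, ell))
    Y = Omega
    for _ in range(q):            # subspace iteration, Y = A^q @ Omega
        Y = matvec(Y)             # (practical codes re-orthonormalize here)
    Q, _ = np.linalg.qr(Y)        # orthonormal basis for range(Y)
    # trace of the Rayleigh quotient Q^T A Q; for PSD A this never
    # overestimates trace(A), and is accurate when the spectrum decays
    return np.trace(Q.T @ matvec(Q))

# usage on a small PSD test matrix
rng = np.random.default_rng(0)
B = rng.standard_normal((100, 100))
A = B @ B.T
print(trace_estimate(lambda X: A @ X, 100, ell=40, q=2, seed=1), np.trace(A))
```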

Notes

  1. The square matrix \(\mathbf {I}\) denotes the identity, with ones on the diagonal and zeros everywhere else.

  2. The superscript \(\dagger \) denotes the Moore–Penrose inverse.

  3. independent and identically distributed.

References

  1. Akçelik, V., Biros, G., Draganescu, A., Ghattas, O., Hill, J., Van Bloemen Waanders, B.: Dynamic data-driven inversion for terascale simulations: real-time identification of airborne contaminants. In: Supercomputing, 2005. Proceedings of the ACM/IEEE SC 2005 Conference, pp. 43–43 (2005)

  2. Alexanderian, A., Gloor, P.J., Ghattas, O.: On Bayesian \(A\)- and \(D\)-optimal experimental designs in infinite dimensions. Bayesian Anal. 11(3), 671–695 (2016)

  3. Alexanderian, A., Petra, N., Stadler, G., Ghattas, O.: A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized \(\ell _0\)-sparsification. SIAM J. Sci. Comput. 36(5), A2122–A2148 (2014)

  4. Anitescu, M., Chen, J., Wang, L.: A matrix-free approach for solving the parametric Gaussian process maximum likelihood problem. SIAM J. Sci. Comput. 34(1), A240–A262 (2012)

  5. Atkinson, A.C., Donev, A.N.: Optimum Experimental Designs. Oxford University Press, Oxford (1992)

  6. Avron, H., Toledo, S.: Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. J. ACM 58(2), Art. 8, 17 (2011)

  7. Bai, Z., Fahey, M., Golub, G.: Some large-scale matrix computation problems. J. Comput. Appl. Math. 74(1), 71–89 (1996)

  8. Bai, Z., Golub, G.H.: Bounds for the trace of the inverse and the determinant of symmetric positive definite matrices. Ann. Numer. Math. 4(1-4), 29–38 (1997). The heritage of P.L. Chebyshev: a Festschrift in honor of the 70th birthday of T.J. Rivlin

  9. Barry, R.P., Pace, R.K.: Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra Appl. 289(1-3), 41–54 (1999). Linear algebra and statistics (Istanbul, 1997)

  10. Boutsidis, C., Drineas, P., Kambadur, P., Zouzias, A.: A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. arXiv:1503.00374 (2015)

  11. Chaloner, K., Verdinelli, I.: Bayesian experimental design: a review. Stat. Sci. 10(3), 273–304 (1995)

  12. Chen, J., Anitescu, M., Saad, Y.: Computing \(f({A})b\) via least squares polynomial approximations. SIAM J. Sci. Comput. 33(1), 195–222 (2011)

  13. Flath, P.H., Wilcox, L.C., Akçelik, V., Hill, J., van Bloemen Waanders, B., Ghattas, O.: Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations. SIAM J. Sci. Comput. 33(1), 407–432 (2011)

  14. Gittens, A., Mahoney, M.W.: Revisiting the Nystrom method for improved large-scale machine learning. arXiv:1303.1849 (2013)

  15. Golub, G.H., Von Matt, U.: Generalized cross-validation for large-scale problems. J. Comput. Graph. Stat. 6(1), 1–34 (1997)

  16. Gu, M.: Subspace iteration randomization and singular value problems. SIAM J. Sci. Comput. 37(3), A1139–A1173 (2015)

  17. Haber, E., Horesh, L., Tenorio, L.: Numerical methods for experimental design of large-scale linear ill-posed inverse problems. Inverse Probl. 24(5), 055012–055017 (2008)

  18. Haber, E., Magnant, Z., Lucero, C., Tenorio, L.: Numerical methods for \(A\)-optimal designs with a sparsity constraint for ill-posed inverse problems. Comput. Optim. Appl. 52, 293–314 (2012)

  19. Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)

  20. Han, I., Malioutov, D., Shin, J.: Large-scale log-determinant computation through stochastic Chebyshev expansions. arXiv:1503.06394 (2015)

  21. Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2002)

  22. Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)

  23. Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2013)

  24. Hutchinson, M.F.: A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Commun. Stat. Simul. Comput. 18(3), 1059–1076 (1989)

  25. Ledoux, M.: On Talagrand’s deviation inequalities for product measures. ESAIM Probab. Statist. 1, 63–87 (1995/1997)

  26. Liberty, E., Woolfe, F., Martinsson, P.G., Rokhlin, V., Tygert, M.: Randomized algorithms for the low-rank approximation of matrices. Proc. Natl. Acad. Sci. USA 104(51), 20167–20172 (2007)

  27. Lin, L.: Randomized estimation of spectral densities of large matrices made accurate. Numer. Math. 1–31 (2016). doi:10.1007/s00211-016-0837-7

  28. Mahoney, M.W.: Randomized Algorithms for Matrices and Data. Now Publishers Inc, Hanover (2011)

  29. Martinsson, P.G., Rokhlin, V., Tygert, M.: A randomized algorithm for the decomposition of matrices. Appl. Comput. Harmon. Anal. 30(1), 47–68 (2011)

  30. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, Cambridge (2005)

  31. Ouellette, D.V.: Schur complements and statistics. Linear Algebra Appl. 36, 187–295 (1981)

  32. Pace, R.K., LeSage, J.P.: Chebyshev approximation of log-determinants of spatial weight matrices. Comput. Stat. Data Anal. 45(2), 179–196 (2004)

  33. Parlett, B.N.: The Symmetric Eigenvalue Problem. Prentice Hall Inc, Englewood Cliffs (1980)

  34. Petra, N., Stadler, G.: Model variational inverse problems governed by partial differential equations. Tech. Rep. 11-05, The Institute for Computational Engineering and Sciences, The University of Texas at Austin (2011)

  35. Roosta-Khorasani, F., Ascher, U.: Improved bounds on sample size for implicit matrix trace estimators. Found. Comput. Math. 15(5), 1187–1212 (2015)

  36. Rudelson, M., Vershynin, R.: Smallest singular value of a random rectangular matrix. Commun. Pure Appl. Math. 62(12), 1707–1739 (2009)

  37. Saad, Y.: Numerical methods for large eigenvalue problems. In: Classics in Applied Mathematics, vol. 66. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2011). Revised edition of the 1992 original

  38. Saibaba, A.K., Kitanidis, P.K.: Fast computation of uncertainty quantification measures in the geostatistical approach to solve inverse problems. Adv. Water Resour. 82, 124–138 (2015)

  39. Sorensen, D.C., Embree, M.: A DEIM induced CUR factorization. SIAM J. Sci. Comput. 38(3), A1454–A1482 (2016)

  40. Stirling, D.S.G.: Mathematical Analysis and Proof, 2nd edn. Horwood Publishing Limited, Chichester (2009)

  41. Tang, J.M., Saad, Y.: A probing method for computing the diagonal of a matrix inverse. Numer. Linear Algebra Appl. 19(3), 485–501 (2012)

  42. Tropp, J.A.: Improved analysis of the subsampled randomized Hadamard transform. Adv. Adapt. Data Anal. 3(1–2), 115–126 (2011)

  43. Tropp, J.A.: An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8(1–2), 1–230 (2015)

  44. Uciński, D.: Optimal Measurement Methods for Distributed Parameter System Identification. CRC Press, Boca Raton (2005)

  45. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Eldar, Y.C., Kutyniok, G. (eds.) Compressed Sensing, pp. 210–268. Cambridge University Press, Cambridge (2012)

  46. Wahba, G.: Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 14(4), 651–667 (1977)

  47. Wahba, G.: Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1990)

  48. Zhang, Y., Leithead, W.E., Leith, D.J., Walshe, L.: Log-det approximation based on uniformly distributed seeds and its application to Gaussian process regression. J. Comput. Appl. Math. 220(1–2), 198–214 (2008)

Author information

Corresponding author

Correspondence to Arvind K. Saibaba.

Additional information

The third author acknowledges support from the XDATA Program of the Defense Advanced Research Projects Agency (DARPA), administered through Air Force Research Laboratory contract FA8750-12-C-0323.

Appendices

Appendix 1: Gaussian random matrices

In this section, we state a lemma on the pseudo-inverse of a rectangular Gaussian random matrix, and use this result to prove both parts of Lemma 4.

1.1 Pseudo-inverse of a Gaussian random matrix

We state a large-deviation bound for the pseudo-inverse of a Gaussian random matrix [19, Proposition 10.4].

Lemma 6

Let \(\mathbf {G}\in \mathbb {R}^{k\times (k+p)}\) be a random Gaussian matrix and let \(p\ge 2\). For all \(t \ge 1\),

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \mathbf {G}^\dagger \Vert _2 \ge \frac{e\sqrt{k+p}}{p+1}\cdot t \right] \le t^{-(p+1)}. \end{aligned}$$
(23)
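
As an aside, the tail bound (23) is easy to probe numerically. The following Monte Carlo sketch (ours; the values of \(k\), \(p\) and the sample size are arbitrary) compares the empirical tail of \(\Vert \mathbf {G}^\dagger \Vert _2 = 1/\sigma _{\min }(\mathbf {G})\) with the right-hand side of (23):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, trials = 10, 5, 2000
c = np.e * np.sqrt(k + p) / (p + 1)        # the scale e*sqrt(k+p)/(p+1)
norms = np.empty(trials)
for i in range(trials):
    G = rng.standard_normal((k, k + p))
    # ||G^+||_2 = 1/sigma_min(G); G has full row rank almost surely
    norms[i] = 1.0 / np.linalg.svd(G, compute_uv=False).min()
for t in [1.0, 1.5, 2.0]:
    print(f"t={t}: empirical {np.mean(norms >= c * t):.4f}"
          f" <= bound {t ** -(p + 1):.4f}")
```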

1.2 Proof of Lemma 4

Proof

From [45, Corollary 5.35] we have

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \mathbf {G}_2\Vert _2 > \sqrt{n-k} + \sqrt{k+p} + t \right] \le \exp (-t^2/2). \end{aligned}$$

Recall from (3) that \(\mu = \sqrt{n-k} + \sqrt{k+p}\). From the law of the unconscious statistician [16, Proposition S4.2],

$$\begin{aligned} \mathbb {E}\left[ \, \Vert \mathbf {G}_2\Vert _2^2 \right] =&\> \int _0^\infty 2t \mathbb {P}\left[ \, \Vert \mathbf {G}_2\Vert _2> t \right] dt\\ \le&\> \int _0^{\mu } 2t dt + \int _{\mu }^\infty 2t \mathbb {P}\left[ \, \Vert \mathbf {G}_2\Vert _2> t \right] dt \\ \le&\> \mu ^2 + \int _{0}^\infty 2(u+\mu ) \exp (-u^2/2)du = \> \mu ^2 + 2\left( 1+ \mu \sqrt{\frac{\pi }{2}}\right) . \end{aligned}$$

where the last inequality follows from the substitution \(t = \mu + u\) together with the concentration bound above. This concludes the proof of (16).
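
The tail-integral identity invoked above can be sanity-checked numerically. A minimal sketch (ours), with \(X = |Z|\) for a standard normal \(Z\), so that \(\mathbb {E}[X^2] = 1\):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(np.abs(rng.standard_normal(10**6)))   # samples of X = |Z|
t = np.linspace(0.0, 8.0, 8001)
sf = 1.0 - np.searchsorted(x, t) / x.size         # empirical P[X > t]
integral = np.sum(2.0 * t * sf) * (t[1] - t[0])   # Riemann sum of 2t P[X > t]
print(integral, "~ 1 = E[X^2]")
```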

Next consider (17). Using Lemma 6, we have for \(t > 0\)

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \mathbf {G}_1^\dagger \Vert _2 \ge t \right] \le \> D t^{-(p+1)}, \qquad D \equiv \> \frac{1}{\sqrt{2\pi (p+1)}}\left( \frac{e\sqrt{k+p}}{p+1}\right) ^{p+1}. \end{aligned}$$
(24)

As before, we have

$$\begin{aligned} \mathbb {E}\left[ \, \Vert \mathbf {G}_1^{\dagger }\Vert _2^2 \right]&= \int _0^\infty 2t \mathbb {P}\left[ \, \Vert \mathbf {G}_1^\dagger \Vert _2> t \right] dt\\&\le \int _0^{\beta } 2t dt + \int _{\beta }^\infty 2t \mathbb {P}\left[ \, \Vert \mathbf {G}_1^\dagger \Vert _2 > t \right] dt\\ & \le \beta ^2 + \int _{\beta }^\infty 2t D t^{-(p+1)} dt = \beta ^2 + 2D\frac{\beta ^{1-p}}{p-1}. \end{aligned}$$

Minimizing with respect to \(\beta \) (setting the derivative \(2\beta - 2D\beta ^{-p}\) to zero) gives \(\beta = D^{1/(p+1)}\). Substituting this value yields \(\mathbb {E}\left[ \, \Vert \mathbf {G}_1^{\dagger }\Vert _2^2 \right] \le \left( 1 + \frac{2}{p-1}\right) D^{2/(p+1)} = \frac{p+1}{p-1}\,D^{2/(p+1)}\), from which (17) follows.

\(\square \)
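
As a check on this last step, the following sketch (ours; \(k\) and \(p\) are arbitrary) verifies numerically that \(\varphi (\beta ) = \beta ^2 + 2D\beta ^{1-p}/(p-1)\) is minimized at \(\beta = D^{1/(p+1)}\), with minimum value \(\frac{p+1}{p-1}D^{2/(p+1)}\):

```python
import numpy as np

p, k = 5, 10
D = (np.e * np.sqrt(k + p) / (p + 1)) ** (p + 1) / np.sqrt(2 * np.pi * (p + 1))
phi = lambda b: b**2 + 2 * D * b ** (1 - p) / (p - 1)
beta_star = D ** (1.0 / (p + 1))
grid = np.linspace(0.5 * beta_star, 2.0 * beta_star, 10001)
print(np.isclose(grid[np.argmin(phi(grid))], beta_star, rtol=1e-3))        # True
print(np.isclose(phi(beta_star), (p + 1) / (p - 1) * D ** (2 / (p + 1))))  # True
```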

Appendix 2: Rademacher random matrices

In this section, we state the matrix Chernoff inequalities [43] together with other useful concentration inequalities, and use these results to prove Theorem 10.

1.1 Useful concentration inequalities

The proof of Theorem 10 relies on the matrix concentration inequalities developed in [43]. We will need the following result [43, Theorem 5.1.1] in what follows.

Theorem 11

(Matrix Chernoff) Let \(\{\mathbf {X}_k\}\) be a finite sequence of independent, random, \(d\times d\) Hermitian matrices. Assume that \(0 \le \lambda _{\min }(\mathbf {X}_k)\) and \(\lambda _{\max }(\mathbf {X}_k) \le L\) for each index k. Let us define

$$\begin{aligned} \mu _{\min } \equiv \lambda _{\min }\left( \sum _k\mathbb {E}\left[ \, \mathbf {X}_k \right] \right) \qquad \mu _{\max } \equiv \lambda _{\max }\left( \sum _k\mathbb {E}\left[ \, \mathbf {X}_k \right] \right) , \end{aligned}$$

and let \(g(x) \equiv e^x(1+x)^{-(1+x)}.\) Then for any \(\epsilon > 0\)

$$\begin{aligned} \mathbb {P}\left[ \, \lambda _{\max }\left( \sum _k \mathbf {X}_k \right) \ge (1+\epsilon )\mu _{\max } \right] \le d g(\epsilon )^{\mu _{\max }/L}, \end{aligned}$$

and for any \(0 \le \epsilon <1 \)

$$\begin{aligned} \mathbb {P}\left[ \, \lambda _{\min }\left( \sum _k \mathbf {X}_k \right) {\le } (1-\epsilon )\mu _{\min } \right] \le d g(-\epsilon )^{\mu _{\min }/L}. \end{aligned}$$
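
For intuition, the following Monte Carlo sketch (ours; all dimensions and constants are arbitrary) illustrates the upper bound with the rank-1 Rademacher summands \(\mathbf {X}_i = \mathbf {V}^*\varvec{\omega }_i\varvec{\omega }_i^*\mathbf {V}\) that appear later in the proof of Theorem 10:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, ell, eps, trials = 8, 3, 400, 0.5, 500
V = np.linalg.qr(rng.standard_normal((n, d)))[0]  # orthonormal columns
L = float(n)                    # lambda_max(X_i) = ||V^T w_i||^2 <= n
g = lambda x: np.exp(x) * (1 + x) ** (-(1 + x))
exceed = 0
for _ in range(trials):
    W = rng.choice([-1.0, 1.0], size=(n, ell))    # Rademacher columns w_i
    S = V.T @ W @ W.T @ V                         # sum_i X_i, E[S] = ell * I_d
    exceed += np.linalg.eigvalsh(S).max() >= (1 + eps) * ell
print("empirical:", exceed / trials,
      " Chernoff bound:", d * g(eps) ** (ell / L))
```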

The following result was first proved by Ledoux [25] but we reproduce the statement from [42, Proposition 2.1].

Lemma 7

Suppose \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) is a convex function that satisfies the following Lipschitz bound

$$\begin{aligned} |f(\mathbf {x})-f(\mathbf {y})| \le L \Vert \mathbf {x}-\mathbf {y}\Vert _2 \qquad \text {for all }\>\mathbf {x},\mathbf {y}\in \mathbb {R}^n. \end{aligned}$$

Let \(\mathbf {z}\in \mathbb {R}^n\) be a random vector with entries drawn from an i.i.d. Rademacher distribution. Then, for all \(t \ge 0\),

$$\begin{aligned} \mathbb {P}\left[ \, f(\mathbf {z}) \ge \mathbb {E}\left[ \, f(\mathbf {z}) \right] + Lt \right] \le e^{-t^2/8}. \end{aligned}$$

Lemma 8

Let \(\mathbf {V}\) be an \(n\times r\) matrix with orthonormal columns, where \(n\ge r\). Let \(\mathbf {z}\) be an \(n\times 1\) vector with entries drawn from an i.i.d. Rademacher distribution. Then, for \(0< \delta < 1\),

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \mathbf {V}^*\mathbf {z}\Vert _2 \ge \sqrt{r} + \sqrt{8\log \left( \frac{1}{\delta }\right) } \right] \le \delta . \end{aligned}$$

Proof

Our proof follows the strategy in [42, Lemma 3.3]. Define the function \(f(\mathbf {x}) = \Vert \mathbf {V}^*\mathbf {x}\Vert _2\). We observe that f satisfies the assumptions of Lemma 7, with Lipschitz constant \(L=1\); the latter follows from

$$\begin{aligned} |\Vert \mathbf {V}^*\mathbf {x}\Vert _2-\Vert \mathbf {V}^*\mathbf {y}\Vert _2| \le \Vert \mathbf {V}^*(\mathbf {x}-\mathbf {y})\Vert _2 \le \Vert \mathbf {x}-\mathbf {y}\Vert _2. \end{aligned}$$

Furthermore, using Hölder’s inequality

$$\begin{aligned} \mathbb {E}\left[ \, f(\mathbf {z}) \right] \le [\mathbb {E}\left[ \, f(\mathbf {z})^2 \right] ]^{1/2} = \Vert \mathbf {V}\Vert _F = \sqrt{r}. \end{aligned}$$

Using Lemma 7 with \(t_\delta = \sqrt{8\log \left( 1/\delta \right) }\) we have

$$\begin{aligned} \mathbb {P}\left[ \, f(\mathbf {z}) \ge \sqrt{r} + t_\delta \right] \le \mathbb {P}\left[ \, f(\mathbf {z}) \ge \mathbb {E}\left[ \, f(\mathbf {z}) \right] + t_\delta \right] \le e^{-t_\delta ^2/8} = \delta . \end{aligned}$$

\(\square \)
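
Lemma 8 can be checked empirically; the following minimal sketch (ours) typically reports a failure frequency far below \(\delta \), reflecting that the bound is not tight:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, delta, trials = 50, 5, 0.1, 20000
V = np.linalg.qr(rng.standard_normal((n, r)))[0]  # orthonormal columns
thresh = np.sqrt(r) + np.sqrt(8 * np.log(1 / delta))
hits = sum(np.linalg.norm(V.T @ rng.choice([-1.0, 1.0], size=n)) >= thresh
           for _ in range(trials))
print(hits / trials, "<=", delta)
```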

Lemma 9

Let \(X_i\) for \(i=1,\ldots ,n\) be a sequence of i.i.d. random variables. If for each \(i=1,\ldots ,n\), \(\mathbb {P}\left[ \, X_i \ge a \right] \le \xi \) holds, where \(\xi \in (0,1]\), then

$$\begin{aligned} \mathbb {P}\left[ \, \max _{i=1,\cdots ,n} X_i \ge a \right] \le n\xi . \end{aligned}$$

Proof

Since \( \mathbb {P}\left[ \, X_i \ge a \right] \le \xi \), we have \(\mathbb {P}\left[ \, X_i < a \right] \ge 1 - \xi \). We can bound

$$\begin{aligned} \mathbb {P}\left[ \, \max _{i=1,\cdots ,n} X_i \ge a \right] =&\left( 1-\mathbb {P}\left[ \, \max _{i=1,\cdots ,n} X_i< a \right] \right) \\ =&\left( 1- \prod _{i=1}^n\mathbb {P}\left[ \, X_i <a \right] \right) \le 1 - (1-\xi )^n. \end{aligned}$$

The proof follows from Bernoulli’s inequality [40, Theorem 5.1] which states \((1-\xi )^n \ge 1 - n\xi \) for \(\xi \in [0,1]\) and \(n\ge 1\). \(\square \)

1.2 Proof of Theorem 10

Proof

Recall that \(\varvec{\Omega }_1 = \mathbf {U}^*_1 \varvec{\Omega }\) and \(\varvec{\Omega }_2 = \mathbf {U}_2^*\varvec{\Omega }\), where \(\varvec{\Omega }\) is a random matrix whose entries are i.i.d. Rademacher random variables. The proof proceeds in three steps.

1. Bound for \(\Vert \varvec{\Omega }_2\Vert _2^2\) The proof uses the matrix Chernoff concentration inequality. Let \(\varvec{\omega }_i \in \mathbb {R}^{n\times 1}\) be the i-th column of \(\varvec{\Omega }\). Note \( \varvec{\Omega }_2\varvec{\Omega }_2^* \in \mathbb {C}^{(n-k)\times (n-k)}\) and

$$\begin{aligned} \mathbb {E}\left[ \, \varvec{\Omega }_2\varvec{\Omega }_2^* \right] = \sum _{i=1}^\ell \mathbf {U}_2^*\mathbb {E}\left[ \, \varvec{\omega }_i\varvec{\omega }_i^{*} \right] \mathbf {U}_2 = \ell \mathbf {I}_{n-k}. \end{aligned}$$

Furthermore, define \(\mu _\text {min} ( \varvec{\Omega }_2\varvec{\Omega }_2^*) \equiv \lambda _\text {min} (\mathbb {E}\left[ \, \varvec{\Omega }_2\varvec{\Omega }_2^* \right] )\) and \(\mu _\text {max} (\varvec{\Omega }_2\varvec{\Omega }_2^*) \equiv \lambda _\text {max} (\mathbb {E}\left[ \, \varvec{\Omega }_2\varvec{\Omega }_2^* \right] )\); clearly \(\mu _\text {min} = \mu _\text {max} = \ell \). Here we have expressed \( \varvec{\Omega }_2\varvec{\Omega }_2^*\) as a finite sum of \(\ell \) rank-1 matrices, each with a single nonzero eigenvalue \(\varvec{\omega }_i^* \mathbf {U}_2\mathbf {U}_2^*\varvec{\omega }_i \). We want a probabilistic bound for the largest of these eigenvalues, i.e., for \(L_2 = \max _{i=1,\cdots ,\ell } \Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2^2\). By Lemma 8, each index i satisfies, with probability at most \(e^{-t^2/8}\),

$$\begin{aligned} \left( \sqrt{n-k} + t\right) ^2 \le \Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2^2 = \varvec{\omega }_i^* \mathbf {U}_2\mathbf {U}_2^*\varvec{\omega }_i . \end{aligned}$$

Since \(\Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2^2\) are i.i.d., applying Lemma 9 gives

$$\begin{aligned} \mathbb {P}\left[ \, \max _{i=1,\cdots ,\ell } \Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2 \ge \sqrt{n-k} + t \right] \le \ell e^{-t^2/8}. \end{aligned}$$

Take \(t = \sqrt{ 8\log (4\ell /\delta )}\) to obtain

$$\begin{aligned} \mathbb {P}\left[ \, L_2 \ge C_u^2 \right] \le \delta /4, \qquad C_u \equiv \sqrt{n-k} + \sqrt{ 8\log \left( \frac{4\ell }{\delta }\right) }. \end{aligned}$$
(25)

The matrix \(\varvec{\Omega }_2\) satisfies the conditions of the matrix Chernoff bound (Theorem 11); for \(\eta \ge 0\) we have

$$\begin{aligned} \mathbb {P}\left[ \, \lambda _\text {max}(\varvec{\Omega }_2\varvec{\Omega }_2^*) \ge (1+\eta ) \ell \right] \le (n-k) g(\eta )^\frac{\ell }{L_2}, \end{aligned}$$

where the function \(g(\eta )\) is defined in Theorem 11. For \(\eta > 1\) the Chernoff bound can be simplified [30, Section 4.3], since \(g(\eta ) \le e^{-\eta /3}\), to obtain

$$\begin{aligned} \mathbb {P}\left[ \, \lambda _\text {max}(\varvec{\Omega }_2\varvec{\Omega }_2^*) \ge (1+\eta ) \ell \right] \le (n-k) \exp \left( -\frac{\eta \ell }{3L_2}\right) . \end{aligned}$$

Choose the parameter

$$\begin{aligned} \eta _\delta = C_{\ell ,\delta }C_u^2 = 3\ell ^{-1}C_u^2\log \left( \frac{4(n-k)}{\delta }\right) , \end{aligned}$$

so that

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_2\Vert _2^2 \ge (1+\eta _\delta )\ell \right] \le&\> (n-k)\exp \left( -\frac{C_u^2}{L_2}\log \frac{4(n-k)}{\delta }\right) \\ =&\> (n-k)\left( \frac{\delta }{4(n-k)}\right) ^{C_u^2/L_2}. \end{aligned}$$

Finally, we remove the conditioning on \(L_2\) to obtain an unconditional bound for \(\Vert \varvec{\Omega }_2\Vert _2^2\). Define the events

$$\begin{aligned} A = \>\left\{ \varvec{\Omega }_2 \mid L_2 < C_u^2 \right\} , \qquad B = \>\left\{ \varvec{\Omega }_2 \mid \Vert \varvec{\Omega }_2\Vert _2^2 \ge (1+\eta _\delta )\ell \right\} . \end{aligned}$$

Note that \(\mathbb {P}\left[ \, A^c \right] \le \delta /4\) and under event A we have \(C_u^2 >L_2\) so that

$$\begin{aligned} \mathbb {P}\left[ \, B\mid A \right] \le (n-k)\left( \frac{\delta }{4(n-k)}\right) ^{C_u^2/L_2} \le \delta /4. \end{aligned}$$

Using the law of total probability

$$\begin{aligned} \mathbb {P}\left[ \, B \right] =&\> \mathbb {P}\left[ \, B\mid A \right] \mathbb {P}\left[ \, A \right] + \mathbb {P}\left[ \, B\mid A^c \right] \mathbb {P}\left[ \, A^c \right] \\ \le&\> \mathbb {P}\left[ \, B\mid A \right] + \mathbb {P}\left[ \, A^c \right] , \end{aligned}$$

we can obtain a bound for \(\mathbb {P}\left[ \, B \right] \) as

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_2\Vert _2^2 \ge \ell \left( 1 + C_u^2 C_{\ell ,\delta }\right) \right] \le \delta /2. \end{aligned}$$

2. Bound for \(\Vert \varvec{\Omega }_1^\dagger \Vert _2^2\) The steps are similar and we again use the matrix Chernoff concentration inequality. Consider \(\varvec{\Omega }_1\varvec{\Omega }_1^*\in \mathbb {C}^{k\times k}\), and as before, write this matrix as the sum of rank-1 matrices to obtain

$$\begin{aligned} \mathbb {E}\left[ \, \varvec{\Omega }_1\varvec{\Omega }_1^* \right] = \sum _{i=1}^\ell \mathbf {U}_1^*\mathbb {E}\left[ \, \varvec{\omega }_i\varvec{\omega }_i^{*} \right] \mathbf {U}_1 = \ell \mathbf {I}_{k}, \end{aligned}$$

and \(\mu _\text {min} (\varvec{\Omega }_1\varvec{\Omega }_1^*) = \ell \). Each summand in the above decomposition of \(\varvec{\Omega }_1\varvec{\Omega }_1^*\) has one nonzero eigenvalue \(\varvec{\omega }_i^* \mathbf {U}_1\mathbf {U}_1^*\varvec{\omega }_i \). Following the same strategy as in Step 1, we define \(L_1 \equiv \max _{i=1,\ldots ,\ell }\Vert \mathbf {U}_1^*\varvec{\omega }_i\Vert _2^2 \) and apply Lemma 8 to obtain

$$\begin{aligned} \mathbb {P}\left[ \, \max _{i=1,\cdots ,\ell } \Vert \mathbf {U}_1^*\varvec{\omega }_i\Vert _2 \ge \sqrt{k} + t \right] \le \ell e^{-t^2/8} \le ne^{-t^2/8}. \end{aligned}$$

Take \(t = \sqrt{ 8\log (4n/\delta )}\) to obtain

$$\begin{aligned} \mathbb {P}\left[ \, L_1 \ge C_l^2 \right] \le \delta /4, \qquad C_l \equiv \sqrt{k} + \sqrt{ 8\log \left( \frac{4n}{\delta }\right) }. \end{aligned}$$
(26)

A straightforward application of the Chernoff bound in Theorem 11 gives us

$$\begin{aligned} {\mathbb {P}\left[ \, \lambda _\text {min}(\varvec{\Omega }_1\varvec{\Omega }_1^*) \le (1-\rho ) \ell \right] \le k g(-\rho )^\frac{\ell }{L_1}. } \end{aligned}$$

Next, observe that \(-\log g(-\rho )\) has the following Taylor series expansion for \(0< \rho < 1\):

$$\begin{aligned} -\log g(-\rho ) = {\rho } + (1-\rho )\log (1-\rho ) = \frac{\rho ^2}{2} + \frac{\rho ^3}{6} + \frac{\rho ^4}{12} + \cdots \end{aligned}$$

so that \(-\log g(-\rho ) \ge \rho ^2/2\) for \(0< \rho < 1\) or \(g(-\rho ) \le e^{-\rho ^2/2}\). This gives us

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{ (1-\rho ) \ell } \right] \> \le \> k \exp \left( -\frac{\rho ^2\ell }{2L_1}\right) , \end{aligned}$$
(27)

where we have used \(\lambda _\text {min}(\varvec{\Omega }_1\varvec{\Omega }_1^*) = 1/\Vert \varvec{\Omega }_1^\dagger \Vert _2^2\) assuming \(\mathsf {rank}(\varvec{\Omega }_1) = k\).

With the number of samples as defined in Theorem 3

$$\begin{aligned} \ell \ge 2\rho ^{-2}C_l^2 \log \left( \frac{4k}{\delta }\right) , \end{aligned}$$

the Chernoff bound (27) becomes

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{ (1-\rho ) \ell } \right] \> \le k\left( \frac{\delta }{4k}\right) ^{C_l^2/L_1}. \end{aligned}$$

Define the events

$$\begin{aligned} C = \left\{ \varvec{\Omega }_1 \mid \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{ (1-\rho ) \ell }\right\} , \qquad D = \{ \varvec{\Omega }_1 \mid L_1 < C_l^2\}. \end{aligned}$$

Note that \(\mathbb {P}\left[ \, D^c \right] \le \delta /4\) from (26). Then, since under the event D the exponent \(C_l^2/L_1\) is strictly greater than 1, we have

$$\begin{aligned} \mathbb {P}\left[ \, C \mid D \right] \> \le k\left( \frac{\delta }{4k}\right) ^{C_l^2/L_1} \le \delta /4. \end{aligned}$$

Using the conditioning argument as before gives \(\mathbb {P}\left[ \, C \right] \le \delta /2\).

3. Combining bounds Define the events

$$\begin{aligned} E = \left\{ \varvec{\Omega }\mid \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{(1-\rho )\ell }\right\} , \qquad F = \left\{ \varvec{\Omega }\mid \Vert \varvec{\Omega }_2\Vert _2^2 \ge (1 + C_{\ell ,\delta }C_u^2)\ell \right\} , \end{aligned}$$

where \(C_{\ell ,\delta } \) is defined in Step 1. From Step 2, \(\mathbb {P}\left[ \, E \right] \le \delta /2\), and from Step 1, \(\mathbb {P}\left[ \, F \right] \le \delta /2\). It can be verified that

$$\begin{aligned} \left\{ \varvec{\Omega }\mid \Vert \varvec{\Omega }_2\Vert _2^2 \Vert \varvec{\Omega }_1^\dagger \Vert _2^2\ge \frac{1}{1-\rho }(1 + C_{\ell ,\delta }C_u^2) \right\} \subseteq E \cup F, \end{aligned}$$

and therefore, we can use the union bound

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_2\Vert _2^2 \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{1-\rho }(1 + C_{\ell ,\delta }C_u^2) \right] \le \mathbb {P}\left[ \, E \right] + \mathbb {P}\left[ \, F \right] \le \delta . \end{aligned}$$

Plugging in the value of \(C_{\ell ,\delta }\) and \(C_u^2\) from Step 1 gives the desired result. \(\square \)
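
The bound just proved can be probed end to end. The following Monte Carlo sketch (ours; the choices of \(n\), \(k\), \(\delta \), \(\rho \) are illustrative) draws a Rademacher \(\varvec{\Omega }\), forms \(\varvec{\Omega }_1 = \mathbf {U}_1^*\varvec{\Omega }\) and \(\varvec{\Omega }_2 = \mathbf {U}_2^*\varvec{\Omega }\) from an arbitrary orthogonal splitting, and compares the empirical failure frequency with \(\delta \):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, delta, rho, trials = 100, 2, 0.5, 0.9, 200
U = np.linalg.qr(rng.standard_normal((n, n)))[0]
U1, U2 = U[:, :k], U[:, k:]
C_l = np.sqrt(k) + np.sqrt(8 * np.log(4 * n / delta))
ell = int(np.ceil(2 * rho**-2 * C_l**2 * np.log(4 * k / delta)))  # sample size
C_u = np.sqrt(n - k) + np.sqrt(8 * np.log(4 * ell / delta))
C_ld = 3.0 / ell * np.log(4 * (n - k) / delta)
thresh = (1 + C_ld * C_u**2) / (1 - rho)
exceed = 0
for _ in range(trials):
    Om = rng.choice([-1.0, 1.0], size=(n, ell))
    s1 = np.linalg.svd(U1.T @ Om, compute_uv=False)  # singular values of Omega_1
    s2 = np.linalg.svd(U2.T @ Om, compute_uv=False)  # singular values of Omega_2
    exceed += s2.max() ** 2 / s1.min() ** 2 >= thresh  # ||Om_2||^2 ||Om_1^+||^2
print("empirical:", exceed / trials, " <= delta =", delta)
```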

Cite this article

Saibaba, A.K., Alexanderian, A. & Ipsen, I.C.F. Randomized matrix-free trace and log-determinant estimators. Numer. Math. 137, 353–395 (2017). https://doi.org/10.1007/s00211-017-0880-z
