Variance Reduction for Matrix Computations with Applications to Gaussian Processes

Conference paper · Performance Evaluation Methodologies and Tools (VALUETOOLS 2021)

Abstract

In addition to recent developments in computing speed and memory, methodological advances have contributed to significant gains in the performance of stochastic simulation. In this paper we focus on variance reduction for matrix computations via matrix factorization. We provide insights into existing variance reduction methods for estimating the entries of large matrices. Popular methods do not exploit the reduction in variance that is possible when the matrix is factorized. We show how computing the square-root factorization of the matrix can, in some important cases, achieve arbitrarily better stochastic performance. In addition, we detail a factorized estimator for the trace of a product of matrices and numerically demonstrate that it can be up to 1,000 times more efficient on certain problems of estimating the log-likelihood of a Gaussian process. We also provide a new estimator of the log-determinant of a positive semi-definite matrix, in which the log-determinant is treated as a normalizing constant of a probability density.


References

  1. Adams, R.P., et al.: Estimating the spectral density of large implicit matrices. arXiv preprint arXiv:1802.03451 (2018)

  2. Avron, H., Toledo, S.: Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. J. ACM (JACM) 58(2), 1–34 (2011)

  3. Bekas, C., Kokiopoulou, E., Saad, Y.: An estimator for the diagonal of a matrix. Appl. Numer. Math. 57(11–12), 1214–1229 (2007)

  4. Casella, G., Berger, R.L.: Statistical inference. Cengage Learning (2021)

  5. Chow, E., Saad, Y.: Preconditioned Krylov subspace methods for sampling multivariate Gaussian distributions. SIAM J. Sci. Comput. 36(2), A588–A608 (2014)

  6. Dauphin, Y.N., De Vries, H., Bengio, Y.: Equilibrated adaptive learning rates for non-convex optimization. arXiv preprint arXiv:1502.04390 (2015)

  7. Drineas, P., Magdon-Ismail, M., Mahoney, M., Woodruff, D.P.: Fast approximation of matrix coherence and statistical leverage. J. Mach. Learn. Res. 13(1), 3475–3506 (2012)

  8. Fitzsimons, J.K., Osborne, M.A., Roberts, S.J., Fitzsimons, J.F.: Improved stochastic trace estimation using mutually unbiased bases. arXiv preprint arXiv:1608.00117 (2016)

  9. Gardner, J.R., Pleiss, G., Bindel, D., Weinberger, K.Q., Wilson, A.G.: GPyTorch: blackbox matrix-matrix Gaussian process inference with GPU acceleration. arXiv preprint arXiv:1809.11165 (2018)

  10. Geoga, C.J., Anitescu, M., Stein, M.L.: Scalable Gaussian process computations using hierarchical matrices. J. Comput. Graph. Stat. 29(2), 227–237 (2020)

  11. Golub, G.H., Van Loan, C.F.: Matrix Computations, vol. 3. JHU Press (2013)

  12. Hutchinson, M.F.: A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Commun. Stat. Simul. Comput. 18(3), 1059–1076 (1989)

  13. Kaperick, B.J.: Diagonal Estimation with Probing Methods. Ph.D. thesis, Virginia Tech (2019)

  14. Lin, L., Saad, Y., Yang, C.: Approximating spectral densities of large matrices. SIAM Rev. 58(1), 34–65 (2016)

  15. Martens, J., Sutskever, I., Swersky, K.: Estimating the Hessian by back-propagating curvature. arXiv preprint arXiv:1206.6464 (2012)

  16. Meyer, R.A., Musco, C., Musco, C., Woodruff, D.P.: Hutch++: optimal stochastic trace estimation. In: Symposium on Simplicity in Algorithms (SOSA), pp. 142–155. SIAM (2021)

  17. Pleiss, G., Jankowiak, M., Eriksson, D., Damle, A., Gardner, J.R.: Fast matrix square roots with applications to Gaussian processes and Bayesian optimization. arXiv preprint arXiv:2006.11267 (2020)

  18. Stathopoulos, A., Laeuchli, J., Orginos, K.: Hierarchical probing for estimating the trace of the matrix inverse on toroidal lattices. SIAM J. Sci. Comput. 35(5), S299–S322 (2013)

  19. Stein, M.L., Chen, J., Anitescu, M.: Stochastic approximation of score functions for Gaussian processes. Ann. Appl. Stat., 1162–1191 (2013)

  20. Tropp, J.A.: Randomized algorithms for matrix computations (2020)

  21. Ubaru, S., Chen, J., Saad, Y.: Fast estimation of tr(f(A)) via stochastic Lanczos quadrature. SIAM J. Matrix Anal. Appl. 38(4), 1075–1099 (2017)

Author information

Correspondence to Anant Mathur.


Appendices

A Proof of Theorem 1

The following proof is not explicitly stated in [15] and so we include it here for completeness.

Proof

To see this, first observe that when \(\mathbf {C} = \mathbf {B}\), \(\sum _{j}b^2_{i,j}=a_{i,i}\). Therefore, from (3) we can deduce \(\mathbb {V}\mathrm {ar}_G[a_{i,i}]= 2a^2_{i,i}\). Then, applying the Cauchy-Schwarz inequality to (3), we obtain the inequality,

$$ \mathbb {V}\mathrm {ar}_G[a_{i,i}] \ge (\boldsymbol{b}_{i,:}^\top \boldsymbol{c}_{i,:})^2 + a_{i,i}^2. $$

Since \(a_{i,i}^2\) remains the same for every decomposition of the form \(\mathbf {A} = \mathbf {B} \mathbf {C}^\top \), and the Cauchy-Schwarz inequality holds with equality if and only if \(\boldsymbol{b}_{i,:}\) and \(\boldsymbol{c}_{i, :}\) are linearly dependent, we conclude that \(\mathbb {V}\mathrm {ar}_G[a_{i,i}]\) is minimized when \(\mathbf {C} = \mathbf {B}\).
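
The argument can be checked numerically. The following Python sketch (ours, not code from the paper; the test matrix, entry index, and sample size are arbitrary choices) draws the Gaussian estimator \(\hat{a}_{i,i}=[\mathbf {B}\boldsymbol{Z}]_i[\mathbf {C}\boldsymbol{Z}]_i\) with \(\boldsymbol{Z}\sim \mathcal {N}(\mathbf {0},\mathbf {I})\) under two factorizations \(\mathbf {A}=\mathbf {B}\mathbf {C}^\top \): the square-root choice \(\mathbf {C}=\mathbf {B}\), which attains the lower bound \(2a_{i,i}^2\), and the trivial choice \(\mathbf {B}=\mathbf {A}\), \(\mathbf {C}=\mathbf {I}\).

import numpy as np

rng = np.random.default_rng(0)
n = 50
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)          # positive definite test matrix

B_sqrt = np.linalg.cholesky(A)       # square-root factor, C = B
B_triv, C_triv = A, np.eye(n)        # trivial factorization, B = A, C = I

def entry_estimates(B, C, i, m=20000):
    # Monte Carlo draws of a_hat_{ii} = (B z)_i (C z)_i with z ~ N(0, I)
    Z = rng.standard_normal((C.shape[1], m))
    return (B @ Z)[i] * (C @ Z)[i]

i = 0
for label, (B, C) in {"C = B (square root)": (B_sqrt, B_sqrt),
                      "B = A, C = I       ": (B_triv, C_triv)}.items():
    est = entry_estimates(B, C, i)
    print(label, "mean:", round(est.mean(), 2),
          "variance:", round(est.var(), 2),
          "lower bound 2*a_ii^2:", round(2 * A[i, i] ** 2, 2))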

B Uniform Spherical Estimator

B.1 Proof of Theorem 2

Proof

Suppose \(\boldsymbol{Z} = (Z_1,\dots ,Z_n)^{\top }\) is uniformly distributed on the surface of the unit radius n-dimensional sphere centered at the origin. Using spherical coordinates, it is not difficult to show that the vector \(\boldsymbol{X}=\boldsymbol{Z}\odot \boldsymbol{Z}\) follows a Dirichlet distribution over the simplex:

$$ f(\boldsymbol{x})=\frac{\varGamma (n/2)}{\pi ^{n/2}}\prod _{i=1}^n x_i^{-1/2},\quad x_i\in [0,1],\quad \sum _{i=1}^nx_i=1. $$

Therefore, using the well-known mean and variance formula for the Dirichlet distribution we have,

$$\begin{aligned} \mathbb {E}[\boldsymbol{X}]=\boldsymbol{1}/n,\quad \mathbb {V}\mathrm {ar}[\boldsymbol{X}]=\frac{1}{n(n/2+1)}\left[ \mathbf {I}_n-\frac{1}{n}\boldsymbol{1}\boldsymbol{1}^\top \right] . \end{aligned}$$
(26)

Using spherical coordinates, we also know that \(\mathbb {E}[Z_iZ_j]=0\) for \(i\ne j\). Therefore,

$$\begin{aligned} \mathbb {E}[\boldsymbol{Z}\boldsymbol{Z}^{\top }]=\frac{1}{n}\mathbf {I}_n, \end{aligned}$$

and,

$$\begin{aligned} \mathbb {E}\left[ n\left( \mathbf {B} \boldsymbol{Z}\right) \left( \mathbf {C} \boldsymbol{Z}\right) ^{\top }\right] =n\mathbb {E}\left[ \mathbf {B}\boldsymbol{Z} \boldsymbol{Z}^{\top }\mathbf {C}^\top \right] = \mathbf {A}. \end{aligned}$$

To derive the variance formula, we first note that,

$$ \mathbb {E}[(\hat{a}_{i,j}/n)^2]=\mathbb {E}[(\left[ \mathbf {B} \boldsymbol{Z}\right] _i\times \left[ \mathbf {C} \boldsymbol{Z}\right] _j)^2] = \sum _{k,l,m,n}b_{i,k}c_{j,l}b_{i,m}c_{j,n}\mathbb {E}[Z_{k} Z_{l} Z_{m} Z_{n}]. $$

Representing \(\boldsymbol{Z}\) in spherical coordinates, we obtain the formula,

$$\begin{aligned} \mathbb {E}[Z_{k} Z_{l} Z_{m} Z_{n}] = c_1[\delta _{kl}\delta _{mn}+\delta _{km}\delta _{ln}+\delta _{kn}\delta _{lm}]+(c_2-3c_1)[\delta _{kl}\delta _{lm}\delta _{mn}]. \end{aligned}$$

The constants \(c_1\) and \(c_2\) are given by,

$$\begin{aligned} c_1&= \mathbb {V}\mathrm {ar}[\boldsymbol{X}]_{p,q}+ \mathbb {E}[\boldsymbol{X}]_p^2= \frac{1}{n(n+2)},\qquad p \ne q,\qquad \text {and}\\ c_2&=\mathbb {V}\mathrm {ar}[\boldsymbol{X}]_{p,p}+\mathbb {E}[\boldsymbol{X}]_p^2= \frac{3}{n(n+2)}. \end{aligned}$$

Thus \(c_2-3c_1=0\), and,

$$\begin{aligned} \mathbb {E}[(\hat{a}_{i,j}/n)^2]&= c_1\sum _{k,l,m,n}b_{ik}c_{jl}b_{im}c_{jn}[\delta _{kl}\delta _{mn}+\delta _{km}\delta _{ln}+\delta _{kn}\delta _{lm}]\\&= c_1\left[ (\boldsymbol{b}_{i,:}^{\top }\boldsymbol{c}_{j,:})^2 +(\boldsymbol{b}_{i,:}^{\top }\boldsymbol{b}_{i,:})(\boldsymbol{c}_{j,:}^{\top }\boldsymbol{c}_{j,:})+(\boldsymbol{b}_{i,:}^{\top }\boldsymbol{c}_{j,:})^2\right] \\&= c_1\left[ 2a_{i,j}^2 +\Vert \boldsymbol{b}_{i,:}\Vert ^2\Vert \boldsymbol{c}_{j,:}\Vert ^2\right] . \end{aligned}$$

Therefore,

$$\begin{aligned} \mathbb {V}\mathrm {ar}_{S}[(\hat{a}_{i,j}/n)]&= \mathbb {E}\left[ (\hat{a}_{i,j}/n)^2\right] -a^2_{i,j}/n^2\\&= (2c_1-1/n^2)a_{i,j}^2+c_{1}\Vert \boldsymbol{b}_{i,:}\Vert ^2\Vert \boldsymbol{c}_{j,:}\Vert ^2.\\ \end{aligned}$$

Hence, the variance for \(\hat{a}_{i,j}\) is,

$$\begin{aligned} \mathbb {V}\mathrm {ar}_{S}[(\hat{a}_{i,j})]&= n^2\mathbb {V}\mathrm {ar}_{S}[(\hat{a}_{i,j}/n)]\\&= \frac{n-2}{n+2}a_{i,j}^2+\frac{n}{n+2}\Vert \boldsymbol{b}_{i,:}\Vert ^2\Vert \boldsymbol{c}_{j,:}\Vert ^2. \end{aligned}$$
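
As a sanity check, the following Python sketch (ours, not the authors' code; the matrices, indices, and sample size are arbitrary) simulates the uniform spherical estimator \(\hat{a}_{i,j}=n[\mathbf {B}\boldsymbol{Z}]_i[\mathbf {C}\boldsymbol{Z}]_j\) by normalizing standard normal vectors, and compares the empirical mean and variance with the expressions above.

import numpy as np

rng = np.random.default_rng(1)
n = 40
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
A = B @ C.T
i, j, m = 2, 5, 200000

Z = rng.standard_normal((n, m))
Z /= np.linalg.norm(Z, axis=0)       # columns are uniform on the unit sphere
est = n * (B @ Z)[i] * (C @ Z)[j]    # draws of a_hat_{ij} = n (B Z)_i (C Z)_j

var_theory = ((n - 2) / (n + 2)) * A[i, j] ** 2 \
             + (n / (n + 2)) * np.dot(B[i], B[i]) * np.dot(C[j], C[j])
print("mean    :", est.mean(), " true a_ij:", A[i, j])
print("variance:", est.var(), " theory   :", var_theory)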

B.2 Proof of Theorem 4

Proof

Suppose \(\mathbf {\Sigma }'\) has eigenvalues \(\{\lambda _i\}\) and orthogonal eigendecomposition \(\mathbf {Q}\mathbf {\Lambda } \mathbf {Q}^{\top }\). Then, since the distribution of a random vector that is uniform on the surface of the unit-radius n-dimensional sphere centered at the origin is invariant under orthogonal rotations, we obtain,

$$ \mathbb {V}\mathrm {ar}[\boldsymbol{Z}^\top \mathbf {\Sigma }'\boldsymbol{Z}]=\mathbb {V}\mathrm {ar}[\boldsymbol{Z}^\top \mathbf {\Lambda }\boldsymbol{Z}]=\mathbb {V}\mathrm {ar}[\boldsymbol{\lambda }^\top \boldsymbol{X}]=\boldsymbol{\lambda }^\top \mathbb {V}\mathrm {ar}[\boldsymbol{X}]\boldsymbol{\lambda }. $$

We notice that \({\text {tr}}(\mathbf {\Sigma }'^\top \mathbf {\Sigma }')=\boldsymbol{\lambda }^\top \boldsymbol{\lambda }\). Therefore,

$$\begin{aligned} \mathbb {V}\mathrm {ar}[n\boldsymbol{Z}^\top \mathbf {\Sigma }'\boldsymbol{Z}]=\frac{n}{n+2}\times 2\left( {\text {tr}}(\mathbf {\Sigma }'^\top \mathbf {\Sigma }')-\frac{(\sum _i\mathbf {\Sigma }'_{i,i})^2}{n}\right) . \end{aligned}$$

To see that this gives the variance for \(n\boldsymbol{Z}^{\top }\mathbf {\Sigma }\boldsymbol{Z}\), we note,

$$\begin{aligned} \boldsymbol{Z}^{\top }\mathbf {\Sigma }'\boldsymbol{Z} = \boldsymbol{Z}^{\top }\left[ (\mathbf {\Sigma } + \mathbf {\Sigma }^{\top })/2\right] \boldsymbol{Z}=\boldsymbol{Z}^{\top }\mathbf {\Sigma }\boldsymbol{Z}. \end{aligned}$$

We note that when \(\boldsymbol{Z}\) is distributed uniformly on the unit radius complex sphere instead, the variance formula is given in Tropp [20].
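
The following Python sketch (an illustration with an arbitrary random test matrix, not code from the paper) simulates the trace estimator \(n\boldsymbol{Z}^\top \mathbf {\Sigma }\boldsymbol{Z}\) with \(\boldsymbol{Z}\) uniform on the unit sphere and compares its empirical variance with the formula of Theorem 4.

import numpy as np

rng = np.random.default_rng(2)
n, m = 60, 200000
Sigma = rng.standard_normal((n, n))          # a general (non-symmetric) matrix
Sym = (Sigma + Sigma.T) / 2                  # Sigma' in the proof

Z = rng.standard_normal((n, m))
Z /= np.linalg.norm(Z, axis=0)               # columns uniform on the unit sphere
est = n * np.sum(Z * (Sigma @ Z), axis=0)    # draws of n Z^T Sigma Z

var_theory = (2 * n / (n + 2)) * (np.trace(Sym.T @ Sym) - np.trace(Sym) ** 2 / n)
print("mean    :", est.mean(), " true trace:", np.trace(Sigma))
print("variance:", est.var(), " theory    :", var_theory)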

C Log-Determinant

Suppose \(\mathbf {\Sigma }\in \mathbb {R}^{n\times n}\) is a positive definite matrix (so that \(\mathbf {\Sigma }^{-1}\) exists). Let \(\boldsymbol{Z}\sim \mathcal {N}(\mathbf {0},\mathbf {I}_n)\). Then \(\boldsymbol{Z}\) can be represented as \(\boldsymbol{Z}=R\boldsymbol{\varTheta }\), where \(\boldsymbol{\varTheta }\) is uniformly distributed on the surface of the unit-radius n-dimensional sphere centered at the origin and \(R\sim \chi _n\), independently. Then, using the standard normal density as a change of measure, we obtain the following,

$$ \begin{aligned} |\mathbf {\Sigma }|^{1/2}&=\frac{1}{(2\pi )^{n/2}}\int \exp (-\boldsymbol{z}^\top \mathbf {\Sigma }^{-1}\boldsymbol{z}/2)\mathbf {d} \boldsymbol{z}\\&=\mathbb {E} \left[ \exp (-\boldsymbol{Z}^\top \mathbf {\Sigma }^{-1}\boldsymbol{Z}/2+\Vert \boldsymbol{Z}\Vert ^2/2)\right] \\&=\mathbb {E} \left[ \exp (-R^2\boldsymbol{\varTheta }^\top \mathbf {\Sigma }^{-1}\boldsymbol{\varTheta }/2+R^2/2)\right] \\&=\frac{1}{2^{n/2}\varGamma (n/2)}\mathbb {E} \left[ \int _0^\infty r^{n/2-1}\exp (-r\boldsymbol{\varTheta }^\top \mathbf {\Sigma }^{-1}\boldsymbol{\varTheta }/2)\mathbf {d} r\right] \\&=\mathbb {E} \left[ (\boldsymbol{\varTheta }^\top \mathbf {\Sigma }^{-1}\boldsymbol{\varTheta })^{-n/2} \right] \\&\approx \frac{1}{M}\sum _{i=1}^M(\boldsymbol{\varTheta }_i^\top \mathbf {\Sigma }^{-1}\boldsymbol{\varTheta }_i)^{-n/2}. \end{aligned} $$

We use the following Gamma integral formula to evaluate the inner integral over \(r\) above,

$$ \int _0^\infty r^{n/2-1}\exp (-r\alpha /2)\mathbf {d} r=\alpha ^{-n/2}2^{n/2}\varGamma (n/2). $$

Since the left-hand side equals \(|\mathbf {\Sigma }|^{1/2}\), a conditional Monte Carlo estimator of the log-determinant is,

$$\begin{aligned} \widehat{\ln |\mathbf {\Sigma }|}=2\ln \left[ \frac{1}{M}\sum _{i=1}^M(\boldsymbol{\varTheta }_i^\top \mathbf {\Sigma }^{-1}\boldsymbol{\varTheta }_i)^{-n/2}\right] . \end{aligned}$$
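
For illustration, the following Python sketch (assuming a small, well-conditioned positive definite test matrix; it forms \(\mathbf {\Sigma }^{-1}\) explicitly, which a practical implementation would replace with linear solves) computes \(\widehat{\ln |\mathbf {\Sigma }|}\) and compares it with the exact log-determinant.

import numpy as np

rng = np.random.default_rng(3)
n, M = 30, 100000
W = rng.standard_normal((n, n)) / np.sqrt(n)
Sigma = np.eye(n) + 0.5 * (W @ W.T)          # well-conditioned positive definite matrix
Sigma_inv = np.linalg.inv(Sigma)             # for large n, use linear solves instead

Theta = rng.standard_normal((n, M))
Theta /= np.linalg.norm(Theta, axis=0)                # columns uniform on the unit sphere
quad = np.sum(Theta * (Sigma_inv @ Theta), axis=0)    # Theta_i^T Sigma^{-1} Theta_i
logdet_hat = 2 * np.log(np.mean(quad ** (-n / 2)))

print("estimate:", logdet_hat)
print("exact   :", np.linalg.slogdet(Sigma)[1])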


Copyright information

© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Mathur, A., Moka, S., Botev, Z. (2021). Variance Reduction for Matrix Computations with Applications to Gaussian Processes. In: Zhao, Q., Xia, L. (eds) Performance Evaluation Methodologies and Tools. VALUETOOLS 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 404. Springer, Cham. https://doi.org/10.1007/978-3-030-92511-6_16

  • DOI: https://doi.org/10.1007/978-3-030-92511-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92510-9

  • Online ISBN: 978-3-030-92511-6

  • eBook Packages: Computer Science (R0)
