Abstract
In many engineering applications, the estimation of covariance and precision matrices is of great importance, as it helps researchers understand the dependency and conditional dependency between variables of interest. Among various matrix estimation methods, the modified Cholesky decomposition is a commonly used technique. It has the advantage of transforming the matrix estimation task into solving a sequence of regression models. Moreover, sparsity in the regression coefficients implies a certain sparse structure in the covariance and precision matrices. In this chapter, we first give an overview of Cholesky-based estimation of covariance and precision matrices. It is known that Cholesky-based matrix estimation depends on a prespecified ordering of variables, which is often not available in practice. To address this issue, we then introduce several techniques that enhance the Cholesky-based estimation of covariance and precision matrices. These approaches ensure the positive definiteness of the matrix estimate and are applicable in general situations without specifying the ordering of variables. The advantages of Cholesky-based estimation are illustrated by numerical studies and several real-case applications.
Acknowledgements
The authors would like to thank the editor and reviewers for the constructive and insightful comments, which have significantly enhanced the quality of this article.
Proofs of Remarks 1 and 2
Since the conclusions of Remarks 1 and 2 are very similar, we only provide the proof of Remark 2 here. Assume that there are n independent and identically distributed observations x1, …, xn, which are centered. Let \(\boldsymbol S = \frac {1}{n} \sum _{i=1}^{n} \boldsymbol x_{i} \boldsymbol x^{\prime }_{i}\) be the sample covariance matrix, and assume that S is non-singular since n > p. We denote by \(\hat {\boldsymbol \Sigma }_{0}\) the estimated covariance matrix from (43.12) with the tuning parameters in (43.11) set to zero. Then Remark 2 states that \(\hat {\boldsymbol \Sigma }_{0} = \boldsymbol S\) regardless of the ordering of the variables in x1, …, xn. Below is the proof.
Based on the sequential regressions of (43.11), the first step is X1 = 𝜖1. It means that

$$\hat{\epsilon}_{i1} = x_{i1}, \quad i = 1, \ldots, n, \qquad \hat{d}_{1} = \frac{1}{n} \sum_{i=1}^{n} \hat{\epsilon}_{i1}^{2} = \frac{1}{n} \sum_{i=1}^{n} x_{i1}^{2}.$$
Then the second step is to consider X2 = l21𝜖1 + 𝜖2, which provides

$$\hat{l}_{21} = \frac{\sum_{i=1}^{n} x_{i2} \hat{\epsilon}_{i1}}{\sum_{i=1}^{n} \hat{\epsilon}_{i1}^{2}}, \qquad \hat{\epsilon}_{i2} = x_{i2} - \hat{l}_{21} \hat{\epsilon}_{i1}, \qquad \hat{d}_{2} = \frac{1}{n} \sum_{i=1}^{n} \hat{\epsilon}_{i2}^{2}.$$
In general, the jth step is to consider the regression problem as

$$X_{j} = \sum_{k=1}^{j-1} l_{jk} \epsilon_{k} + \epsilon_{j},$$

and we can obtain

$$(\hat{l}_{j1}, \ldots, \hat{l}_{j,j-1}) = \mathop{\mathrm{arg\,min}}_{l_{j1}, \ldots, l_{j,j-1}} \sum_{i=1}^{n} \Big( x_{ij} - \sum_{k=1}^{j-1} l_{jk} \hat{\epsilon}_{ik} \Big)^{2}, \qquad \hat{\epsilon}_{ij} = x_{ij} - \sum_{k=1}^{j-1} \hat{l}_{jk} \hat{\epsilon}_{ik}, \qquad \hat{d}_{j} = \frac{1}{n} \sum_{i=1}^{n} \hat{\epsilon}_{ij}^{2}.$$
Therefore, writing \(\hat{\boldsymbol \Sigma}_{0} = \hat{\boldsymbol L} \hat{\boldsymbol D} \hat{\boldsymbol L}^{\prime}\) with \(\hat{l}_{jj} = 1\), we can express the (s, t) entry of the covariance matrix estimate using the regression coefficients as

$$\hat{\sigma}_{st} = \sum_{k=1}^{\min(s,t)} \hat{l}_{sk} \hat{d}_{k} \hat{l}_{tk}.$$
Note that

$$x_{ij} = \sum_{k=1}^{j} \hat{l}_{jk} \hat{\epsilon}_{ik}, \qquad \hat{l}_{jj} = 1,$$

since each observation is exactly recovered from the fitted residuals, and the (s, t) entry of the sample covariance matrix is

$$s_{st} = \frac{1}{n} \sum_{i=1}^{n} x_{is} x_{it} = \frac{1}{n} \sum_{i=1}^{n} \Big( \sum_{k=1}^{s} \hat{l}_{sk} \hat{\epsilon}_{ik} \Big) \Big( \sum_{m=1}^{t} \hat{l}_{tm} \hat{\epsilon}_{im} \Big) = \sum_{k=1}^{\min(s,t)} \hat{l}_{sk} \hat{d}_{k} \hat{l}_{tk}.$$

The last equality holds because

$$\frac{1}{n} \sum_{i=1}^{n} \hat{\epsilon}_{ik} \hat{\epsilon}_{im} = 0 \ \text{ for } k \neq m, \qquad \frac{1}{n} \sum_{i=1}^{n} \hat{\epsilon}_{ik}^{2} = \hat{d}_{k},$$

as the least squares residuals of each step are orthogonal in sample to all residuals from earlier steps.
Thus, we can establish the result

$$\hat{\boldsymbol \Sigma}_{0} = \boldsymbol S.$$

Since the derivation above does not depend on how the variables are ordered, the equality holds under any permutation of the variables.
□
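As a quick numerical check of Remark 2, the following sketch (our own illustration, not code from the chapter; the simulated data, dimensions, and variable names are hypothetical) runs the unpenalized sequential regressions above and verifies that \(\hat{\boldsymbol L} \hat{\boldsymbol D} \hat{\boldsymbol L}^{\prime}\) reproduces the sample covariance matrix:

```python
import numpy as np

# Unpenalized sequential regressions of (43.11): regress each centered
# variable X_j on the residuals from the previous steps, collect the
# coefficients in a unit lower-triangular L and the residual variances
# in a diagonal D, and check that L D L' recovers S (Remark 2).
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                     # centered observations

S = X.T @ X / n                         # sample covariance matrix

L = np.eye(p)                           # unit lower-triangular factor
d = np.zeros(p)                         # diagonal entries of D
eps = np.zeros((n, p))                  # residuals from each step

eps[:, 0] = X[:, 0]                     # first step: X_1 = eps_1
d[0] = np.mean(eps[:, 0] ** 2)
for j in range(1, p):
    Z = eps[:, :j]                      # regressors: earlier residuals
    coef, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)  # OLS, no penalty
    L[j, :j] = coef
    eps[:, j] = X[:, j] - Z @ coef      # step-j residuals
    d[j] = np.mean(eps[:, j] ** 2)

Sigma_hat = L @ np.diag(d) @ L.T
print(np.allclose(Sigma_hat, S))        # True: Sigma_hat_0 equals S
```

Because the residuals of each step are orthogonal in sample to all earlier residuals, the check passes up to floating-point error, and it continues to pass after any permutation of the columns of X.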
Conditional Misclassification Error of LDA
Without loss of generality, we consider a two-class classification problem here. Suppose the binary classifier function for LDA is \(g(\boldsymbol x) = \log [ P(Y = 1|\boldsymbol X = \boldsymbol x) / P(Y = 2|\boldsymbol X = \boldsymbol x) ]\). Then

$$g(\boldsymbol x) = \boldsymbol a^{T} \boldsymbol x + b, \qquad \boldsymbol a = \boldsymbol \Sigma^{-1} (\boldsymbol \mu_{1} - \boldsymbol \mu_{2}), \qquad b = -\frac{1}{2} (\boldsymbol \mu_{1} + \boldsymbol \mu_{2})^{T} \boldsymbol \Sigma^{-1} (\boldsymbol \mu_{1} - \boldsymbol \mu_{2}) + \log(\pi_{1}/\pi_{2}),$$
where π1 and π2 are the prior probabilities for classes 1 and 2, respectively, i.e., π1 = P(Y = 1) and π2 = P(Y = 2). For a new observation x, we predict its class as Y = 1 if g(x) > 0, and Y = 2 otherwise. Then the conditional misclassification error is

$$\gamma(\boldsymbol \Sigma, \boldsymbol \mu_{1}, \boldsymbol \mu_{2}) = \pi_{1} P(g(\boldsymbol x) \leq 0 \mid Y = 1) + \pi_{2} P(g(\boldsymbol x) > 0 \mid Y = 2).$$
Since x|Y = 1 ∼ N(μ1, Σ) and x|Y = 2 ∼ N(μ2, Σ), obviously aTx|Y = 1 ∼ N(aTμ1, aT Σa) and aTx|Y = 2 ∼ N(aTμ2, aT Σa). Therefore,

$$P(g(\boldsymbol x) \leq 0 \mid Y = 1) = \Phi\left( -\frac{\boldsymbol a^{T} \boldsymbol \mu_{1} + b}{\sqrt{\boldsymbol a^{T} \boldsymbol \Sigma \boldsymbol a}} \right), \qquad P(g(\boldsymbol x) > 0 \mid Y = 2) = \Phi\left( \frac{\boldsymbol a^{T} \boldsymbol \mu_{2} + b}{\sqrt{\boldsymbol a^{T} \boldsymbol \Sigma \boldsymbol a}} \right),$$

where Φ(⋅) is the cumulative distribution function of the standard normal random variable. As a result, the conditional misclassification error is

$$\gamma(\boldsymbol \Sigma, \boldsymbol \mu_{1}, \boldsymbol \mu_{2}) = \pi_{1} \Phi\left( -\frac{\boldsymbol a^{T} \boldsymbol \mu_{1} + b}{\sqrt{\boldsymbol a^{T} \boldsymbol \Sigma \boldsymbol a}} \right) + \pi_{2} \Phi\left( \frac{\boldsymbol a^{T} \boldsymbol \mu_{2} + b}{\sqrt{\boldsymbol a^{T} \boldsymbol \Sigma \boldsymbol a}} \right).$$
Assume π1 = π2 = 1∕2. Then with the estimates of a and b obtained through \(\hat {\boldsymbol \mu }_{1}, \hat {\boldsymbol \mu }_{2}, \hat {\boldsymbol \Sigma }\), i.e., \(\hat{\boldsymbol a} = \hat{\boldsymbol \Sigma}^{-1} (\hat{\boldsymbol \mu}_{1} - \hat{\boldsymbol \mu}_{2})\) and \(\hat{b} = -\frac{1}{2} (\hat{\boldsymbol \mu}_{1} + \hat{\boldsymbol \mu}_{2})^{T} \hat{\boldsymbol \Sigma}^{-1} (\hat{\boldsymbol \mu}_{1} - \hat{\boldsymbol \mu}_{2})\), the conditional misclassification error \(\gamma (\hat {\boldsymbol \Sigma }, \hat {\boldsymbol \mu }_{1}, \hat {\boldsymbol \mu }_{2})\) is

$$\gamma(\hat{\boldsymbol \Sigma}, \hat{\boldsymbol \mu}_{1}, \hat{\boldsymbol \mu}_{2}) = \frac{1}{2} \Phi\left( -\frac{\hat{\boldsymbol a}^{T} \boldsymbol \mu_{1} + \hat{b}}{\sqrt{\hat{\boldsymbol a}^{T} \boldsymbol \Sigma \hat{\boldsymbol a}}} \right) + \frac{1}{2} \Phi\left( \frac{\hat{\boldsymbol a}^{T} \boldsymbol \mu_{2} + \hat{b}}{\sqrt{\hat{\boldsymbol a}^{T} \boldsymbol \Sigma \hat{\boldsymbol a}}} \right),$$

where the probabilities are evaluated under the true class distributions N(μ1, Σ) and N(μ2, Σ).
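To make the formula concrete, here is a minimal sketch (our own illustration, not code from the chapter; the function name and test values are hypothetical) that evaluates \(\gamma (\hat {\boldsymbol \Sigma }, \hat {\boldsymbol \mu }_{1}, \hat {\boldsymbol \mu }_{2})\) with numpy and scipy:

```python
import numpy as np
from scipy.stats import norm

def lda_conditional_error(Sigma_hat, mu1_hat, mu2_hat, Sigma, mu1, mu2):
    """Conditional misclassification error of LDA with pi_1 = pi_2 = 1/2.

    The classifier uses the estimates (Sigma_hat, mu1_hat, mu2_hat); the
    error is evaluated under the true class distributions N(mu_k, Sigma).
    """
    a_hat = np.linalg.solve(Sigma_hat, mu1_hat - mu2_hat)  # a = Sigma^{-1}(mu1 - mu2)
    b_hat = -0.5 * (mu1_hat + mu2_hat) @ a_hat             # log(pi1/pi2) = 0
    s = np.sqrt(a_hat @ Sigma @ a_hat)                     # sd of a'x within each class
    return 0.5 * norm.cdf(-(a_hat @ mu1 + b_hat) / s) \
         + 0.5 * norm.cdf((a_hat @ mu2 + b_hat) / s)

# Sanity check: with the true parameters plugged in, this recovers the
# Bayes error Phi(-Delta/2), Delta being the Mahalanobis distance.
mu1, mu2, Sigma = np.zeros(3), np.ones(3), np.eye(3)
print(lda_conditional_error(Sigma, mu1, mu2, Sigma, mu1, mu2))  # ~0.1932
```

With estimated parameters the value is larger than the Bayes error, which makes this quantity a natural yardstick for comparing covariance matrix estimators.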