Covariance Estimation via the Modified Cholesky Decomposition

Chapter in the Springer Handbook of Engineering Statistics

Part of the book series: Springer Handbooks (SHB)

Abstract

In many engineering applications, the estimation of covariance and precision matrices is of great importance, helping researchers understand the dependency and conditional dependency among variables of interest. Among various matrix estimation methods, the modified Cholesky decomposition is a commonly used technique. It has the advantage of transforming the matrix estimation task into solving a sequence of regression models. Moreover, sparsity in the regression coefficients implies a certain sparse structure in the covariance and precision matrices. In this chapter, we first provide an overview of Cholesky-based estimation of covariance and precision matrices. It is known that Cholesky-based matrix estimation depends on a prespecified ordering of variables, which is often not available in practice. To address this issue, we then introduce several techniques that enhance the Cholesky-based estimation of covariance and precision matrices. These approaches ensure the positive definiteness of the matrix estimate and are applicable in general situations without specifying the ordering of variables. The advantages of Cholesky-based estimation are illustrated by numerical studies and several real-case applications.
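
To make the regression-based construction concrete, the following minimal Python sketch (our illustration, not the chapter's specific estimator) regresses each variable on the residuals of its predecessors under a given variable ordering and assembles the covariance estimate as L D Lᵀ; the function name cholesky_cov_estimate and the use of scikit-learn's Lasso for the optional sparsity penalty are our own choices.

import numpy as np
from sklearn.linear_model import Lasso

def cholesky_cov_estimate(X, alpha=0.0):
    # Covariance estimation via the modified Cholesky decomposition:
    # each variable is regressed on the residuals of the preceding ones
    # (a prespecified variable ordering is assumed); alpha > 0 adds a
    # lasso penalty, which yields a sparse Cholesky factor L.
    X = X - X.mean(axis=0)
    n, p = X.shape
    L = np.eye(p)                       # unit lower-triangular factor
    E = np.empty_like(X)                # residuals (innovations)
    E[:, 0] = X[:, 0]
    for j in range(1, p):
        Z = E[:, :j]                    # residuals from the previous steps
        if alpha > 0:
            coef = Lasso(alpha=alpha, fit_intercept=False).fit(Z, X[:, j]).coef_
        else:
            coef, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        L[j, :j] = coef
        E[:, j] = X[:, j] - Z @ coef
    D = np.diag((E ** 2).mean(axis=0))  # innovation variances
    return L @ D @ L.T                  # positive definite by construction

With alpha = 0 the returned matrix coincides with the sample covariance matrix, as shown in the appendix.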

Acknowledgements

The authors would like to thank the editor and reviewers for the constructive and insightful comments, which have significantly enhanced the quality of this article.

Proof of Remarks 1 and 2

Since the conclusions of Remarks 1 and 2 are quite similar, we only provide the proof of Remark 2 here. Assume that there are n independent and identically distributed observations x1, …, xn, which are centered. Let \(\boldsymbol S = \frac {1}{n} \sum _{i=1}^{n} \boldsymbol x_{i} \boldsymbol x^{\prime }_{i}\) be the sample covariance matrix, and assume that S is non-singular since n > p. We denote by \(\hat {\boldsymbol \Sigma }_{0}\) the estimated covariance matrix from (43.12) with the tuning parameters in (43.11) set to zero. Then Remark 2 states that \(\hat {\boldsymbol \Sigma }_{0} = \boldsymbol S\) regardless of any permutation of x1, …, xn. Below is the proof.

Based on the sequential regressions in (43.11), the first step is X1 = 𝜖1, which means that

$$\displaystyle \begin{aligned} e_{i1} = x_{i1}, 1\leq i\leq n, \mbox{ and } \hat{\sigma}^2_1 = \frac{1}{n} \sum_{i=1}^n e^2_{i1} \end{aligned} $$

Then the second step is to consider X2 = l21𝜖1 + 𝜖2, which yields

$$\displaystyle \begin{aligned} \hat{l}_{21} &= \frac{\sum_{i=1}^n x_{i2}e_{i1}}{\sum_{i=1}^n e^2_{i1}}, \; e_{i2} = x_{i2} - \hat{l}_{21} e_{i1},1\leq i\leq n \\ \hat{\sigma}^2_2 &= \frac{1}{n} \sum_{i=1}^n e^2_{i2}, \; \sum_{i=1}^n e_{i2} e_{i1} = 0 \end{aligned} $$

In general, the jth step is to consider the regression problem as

$$\displaystyle \begin{aligned} X_j = \sum_{k<j} l_{jk} \epsilon_{k} + \epsilon_j \end{aligned} $$

and we can obtain

$$\displaystyle \begin{aligned} & \hat{l}_{j1} = \frac{\sum_{i=1}^n x_{ij}e_{i1}}{\sum_{i=1}^n e^2_{i1}},\ldots, \hat{l}_{jk} = \frac{\sum_{i=1}^n x_{ij}e_{ik}}{\sum_{i=1}^n e^2_{ik}},\ldots, \hat{l}_{j,j-1} = \frac{\sum_{i=1}^n x_{ij}e_{i,j-1}}{\sum_{i=1}^n e^2_{i,j-1}} \\ & e_{ij} = x_{ij} - \sum_{k<j} \hat{l}_{jk} e_{ik}, 1\leq i\leq n \\ & \hat{\sigma}^2_j = \frac{1}{n} \sum_{i=1}^n e^2_{ij}, \; \sum_{i=1}^n e_{ij} e_{i1} = 0, \; \ldots , \;\sum_{i=1}^n e_{ij} e_{i,j-1} = 0 \end{aligned} $$

Therefore, we can express the (s, t) entry of the covariance matrix estimate using the regression coefficients as

$$\displaystyle \begin{aligned} (\hat{\boldsymbol \Sigma})_{st} = (\hat{\boldsymbol L} \hat{\boldsymbol D} \hat{\boldsymbol L}^{T})_{st} = \sum_{u=1}^{\text{min}(s,t)} \hat{l}_{su} \hat{l}_{tu} \hat{\sigma}^2_u \quad (\hat{l}_{uu} = 1). \end{aligned}$$

Note that

$$\displaystyle \begin{aligned} x_{is} & = \sum_{u=1}^{s}\hat{l}_{su} {e}_{iu} \quad (\hat{l}_{uu} = 1), \quad 1\leq i \leq n, \\ x_{it} & = \sum_{v=1}^{t}\hat{l}_{tv} {e}_{iv} \quad (\hat{l}_{vv} = 1), \quad 1\leq i \leq n, \end{aligned} $$

and the (s, t) entry of the sample covariance matrix is

$$\displaystyle \begin{aligned} (\boldsymbol S)_{st} & = \frac{1}{n} \sum_{i=1}^{n} x_{is}x_{it} = \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{u=1}^{s} \hat{l}_{su} {e}_{iu} \right) \left( \sum_{v=1}^{t}\hat{l}_{tv} {e}_{iv} \right) \\ & = \frac{1}{n} \sum_{u=1}^{s} \sum_{v=1}^{t} \hat{l}_{su} \hat{l}_{tv} \left( \sum_{i=1}^{n} {e}_{iu} {e}_{iv} \right) \\ & = \sum_{u=1}^{\text{min}(s,t)} \hat{l}_{su} \hat{l}_{tu} \hat{\sigma}^2_u \quad (\hat{l}_{uu} = 1). \end{aligned} $$

The last equality holds because

$$\displaystyle \begin{aligned} \sum_{i=1}^{n} {e}_{iu} {e}_{iv} = \left\{ \begin{aligned} & n \hat{\sigma}^2_u & u=v; \\ & 0 & u\neq v. \end{aligned} \right. \end{aligned}$$

Thus, we can establish the result

$$\displaystyle \begin{aligned} \boldsymbol S = \hat{\boldsymbol L} \; \text{diag}\left(\hat{\sigma}^2_1,\ldots,\hat{\sigma}^2_p\right) \; \hat{\boldsymbol L}^{T} = \hat{\boldsymbol \Sigma}_{0}, \end{aligned}$$

which completes the proof of Remark 2.
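
The identity above is easy to check numerically. The following short sketch (our illustration, with simulated data) carries out the unpenalized sequential regressions and verifies that \(\hat{\boldsymbol L} \hat{\boldsymbol D} \hat{\boldsymbol L}^{T}\) reproduces the sample covariance matrix S.

import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X = X - X.mean(axis=0)                # centered observations x_1, ..., x_n
S = X.T @ X / n                       # sample covariance with the 1/n convention

L = np.eye(p)                         # unit lower-triangular factor
E = np.empty_like(X)                  # residuals e_{ij}
E[:, 0] = X[:, 0]                     # first step: X_1 = epsilon_1
for j in range(1, p):
    Z = E[:, :j]                      # residuals from steps 1, ..., j-1
    coef, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)   # l_hat_{j1}, ..., l_hat_{j,j-1}
    L[j, :j] = coef
    E[:, j] = X[:, j] - Z @ coef
D = np.diag((E ** 2).mean(axis=0))    # diag(sigma_hat_1^2, ..., sigma_hat_p^2)

assert np.allclose(L @ D @ L.T, S)    # S = L_hat D_hat L_hat^T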

Conditional Misclassification Error of LDA

Without loss of generality, we consider a two-class classification problem here. Suppose the binary classifier function for LDA is \(g(\boldsymbol x) = \log [ P(Y = 1|\boldsymbol X = \boldsymbol x) / P(Y = 2|\boldsymbol X = \boldsymbol x) ]\). Then

$$\displaystyle \begin{aligned} g(\boldsymbol x) \triangleq \boldsymbol a^{T} \boldsymbol x - b &= (\boldsymbol \mu_{1} - \boldsymbol \mu_{2})^{T}\boldsymbol \Sigma^{-1} \boldsymbol x \\ &\quad - \left[ \frac{1}{2} (\boldsymbol \mu_{1} + \boldsymbol \mu_{2})^{T}\boldsymbol \Sigma^{-1}(\boldsymbol \mu_{1} - \boldsymbol \mu_{2}) - \log\frac{\pi_{1}}{\pi_{2}}\right], \end{aligned} $$

where π1 and π2 are the prior probabilities for classes 1 and 2, respectively, i.e., π1 = P(Y = 1) and π2 = P(Y = 2). For a new observation x, we predict its class as \(\hat{Y} = 1\) if g(x) > 0, and \(\hat{Y} = 2\) otherwise. Then the conditional misclassification error is

$$\displaystyle \begin{aligned} P(\hat{Y} &= 1 \mid Y = 2)P(Y = 2) + P(\hat{Y} = 2 \mid Y = 1)P(Y = 1)\\ &= P(\boldsymbol a^{T}\boldsymbol x - b > 0 \mid Y = 2)\pi_{2} + P(\boldsymbol a^{T}\boldsymbol x - b \leq 0 \mid Y = 1)\pi_{1}. \end{aligned} $$

Since x|Y = 1 ∼ N(μ1, Σ) and x|Y = 2 ∼ N(μ2, Σ), it follows that aTx|Y = 1 ∼ N(aTμ1, aT Σa) and aTx|Y = 2 ∼ N(aTμ2, aT Σa). Therefore,

$$\displaystyle \begin{aligned} P(\boldsymbol a^{T}\boldsymbol x - b >0 | Y = 2) &= \Phi\left ( \frac{\boldsymbol a^{T}\boldsymbol \mu_{2} - b}{\sqrt{\boldsymbol a^{T}\boldsymbol \Sigma \boldsymbol a}}\right ),\\ P(\boldsymbol a^{T}\boldsymbol x - b \leq 0 | Y = 1) &= \Phi\left (- \frac{\boldsymbol a^{T}\boldsymbol \mu_{1} - b}{\sqrt{\boldsymbol a^{T}\boldsymbol \Sigma \boldsymbol a}}\right ), \end{aligned} $$

where Φ(⋅) is the cumulative distribution function of the standard normal random variable. As a result, the conditional misclassification error is

$$\displaystyle \begin{aligned} \pi_{2} \Phi\left ( \frac{\boldsymbol a^{T}\boldsymbol \mu_{2} - b}{\sqrt{\boldsymbol a^{T}\boldsymbol \Sigma \boldsymbol a}}\right ) + \pi_{1}\Phi\left (- \frac{\boldsymbol a^{T}\boldsymbol \mu_{1} - b}{\sqrt{\boldsymbol a^{T}\boldsymbol \Sigma \boldsymbol a}}\right ). \end{aligned} $$

Assume π1 = π2 = 1∕2. Then, plugging the estimates \(\hat {\boldsymbol \mu }_{1}, \hat {\boldsymbol \mu }_{2}, \hat {\boldsymbol \Sigma }\) into a and b, the conditional misclassification error \(\gamma (\hat {\boldsymbol \Sigma }, \hat {\boldsymbol \mu }_{1}, \hat {\boldsymbol \mu }_{2})\) is

$$\displaystyle \begin{aligned} &\gamma(\hat{\boldsymbol \Sigma}, \hat{\boldsymbol \mu}_{1}, \hat{\boldsymbol \mu}_{2}) \\ &\quad = \frac{1}{2}\Phi\left ( \frac{(\hat{\boldsymbol \mu}_{1} - \hat{\boldsymbol \mu}_{2})^{T} \hat{\boldsymbol \Sigma}^{-1} \boldsymbol \mu_{2} - \frac{1}{2}(\hat{\boldsymbol \mu}_{1} + \hat{\boldsymbol \mu}_{2})^{T}\hat{\boldsymbol \Sigma}^{-1} (\hat{\boldsymbol \mu}_{1} - \hat{\boldsymbol \mu}_{2})}{\sqrt{(\hat{\boldsymbol \mu}_{1} - \hat{\boldsymbol \mu}_{2})^{T} \hat{\boldsymbol \Sigma}^{-1}\boldsymbol \Sigma \hat{\boldsymbol \Sigma}^{-1} (\hat{\boldsymbol \mu}_{1} - \hat{\boldsymbol \mu}_{2})}}\right ) \\ & \qquad + \frac{1}{2}\Phi\left (- \frac{(\hat{\boldsymbol \mu}_{1}\! -\! \hat{\boldsymbol \mu}_{2})^{T} \hat{\boldsymbol \Sigma}^{-1} \boldsymbol \mu_{1}\! -\! \frac{1}{2}(\hat{\boldsymbol \mu}_{1} \!+\! \hat{\boldsymbol \mu}_{2})^{T}\hat{\boldsymbol \Sigma}^{-1} (\hat{\boldsymbol \mu}_{1}\! -\! \hat{\boldsymbol \mu}_{2})}{\sqrt{(\hat{\boldsymbol \mu}_{1} - \hat{\boldsymbol \mu}_{2})^{T} \hat{\boldsymbol \Sigma}^{-1}\boldsymbol \Sigma \hat{\boldsymbol \Sigma}^{-1}(\hat{\boldsymbol \mu}_{1} - \hat{\boldsymbol \mu}_{2}) }}\right ). \end{aligned} $$
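
For completeness, the following brief sketch (our own helper with hypothetical names) evaluates this plug-in conditional misclassification error under equal priors, using scipy's standard normal CDF for Φ; mu1, mu2, and Sigma denote the true parameters, while the hatted arguments are the estimates.

import numpy as np
from scipy.stats import norm

def lda_plugin_error(Sigma_hat, mu1_hat, mu2_hat, Sigma, mu1, mu2):
    # Conditional misclassification error of plug-in LDA with pi_1 = pi_2 = 1/2.
    a = np.linalg.solve(Sigma_hat, mu1_hat - mu2_hat)  # a = Sigma_hat^{-1}(mu1_hat - mu2_hat)
    b = 0.5 * a @ (mu1_hat + mu2_hat)                  # threshold; log(pi_1/pi_2) = 0
    s = np.sqrt(a @ Sigma @ a)                         # std. dev. of a^T x under the true Sigma
    return 0.5 * norm.cdf((a @ mu2 - b) / s) + 0.5 * norm.cdf(-(a @ mu1 - b) / s)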

Corresponding author

Correspondence to Xinwei Deng.

Copyright information

© 2023 Springer-Verlag London Ltd., part of Springer Nature

About this chapter

Cite this chapter

Kang, X., Zhang, Z., Deng, X. (2023). Covariance Estimation via the Modified Cholesky Decomposition. In: Pham, H. (eds) Springer Handbook of Engineering Statistics. Springer Handbooks. Springer, London. https://doi.org/10.1007/978-1-4471-7503-2_43
