Machine Learning, Volume 108, Issue 12, pp 2061–2086

A scalable sparse Cholesky based approach for learning high-dimensional covariance matrices in ordered data

  • Kshitij Khare
  • Sang-Yun Oh
  • Syed Rahman
  • Bala Rajaratnam


Abstract

Covariance estimation for high-dimensional datasets is a fundamental problem in machine learning, and has numerous applications. In these high-dimensional settings the number of features or variables p is typically larger than the sample size n. A popular way of tackling this challenge is to induce sparsity in the covariance matrix, its inverse, or a relevant transformation. In many applications, the data come with a natural ordering. In such settings, methods inducing sparsity in the Cholesky parameter of the inverse covariance matrix can be quite useful. Such methods are also better positioned to yield a positive definite estimate of the covariance matrix, a critical requirement for several downstream applications. Despite some important advances in this area, a principled approach to general sparse-Cholesky-based covariance estimation with both statistical and algorithmic convergence safeguards has been elusive. In particular, the two popular likelihood-based methods proposed in the literature either do not lead to a well-defined estimator in high-dimensional settings, or consider only a restrictive class of models. In this paper, we propose a principled and general method for sparse-Cholesky-based covariance estimation that aims to overcome some of the shortcomings of current methods, while retaining their respective strengths. We obtain a jointly convex formulation for our objective function, and show that it leads to rigorous convergence guarantees and well-defined estimators, even when \(p > n\). Very importantly, the approach always leads to a positive definite and symmetric estimator of the covariance matrix. We establish both high-dimensional estimation and selection consistency, and also demonstrate excellent finite sample performance on simulated and real data.
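To make the sparse-Cholesky idea concrete, the following sketch illustrates the classical regression interpretation that this line of work builds on (not the paper's exact estimator): under a variable ordering, each variable is regressed on its predecessors with an \(\ell_1\) penalty, the penalized coefficients fill the rows of a unit lower-triangular factor T, and the precision matrix is assembled as \(\Omega = T^\top D^{-1} T\), which is positive definite by construction. The function names and the simple coordinate-descent lasso below are illustrative choices, not part of the paper.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimal coordinate-descent lasso: 0.5*||y - X b||^2 + lam*||b||_1."""
    beta = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for k in range(X.shape[1]):
            # partial residual with coordinate k removed
            r = y - X @ beta + X[:, k] * beta[k]
            z = X[:, k] @ r
            # soft-thresholding update
            beta[k] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[k]
    return beta

def sparse_cholesky_precision(X, lam):
    """Regression-based sparse modified-Cholesky precision estimate.

    Returns Omega = T' D^{-1} T with T unit lower triangular, so the
    estimate is symmetric and positive definite whenever the residual
    variances in D are positive.
    """
    n, p = X.shape
    T = np.eye(p)
    d = np.empty(p)
    d[0] = X[:, 0].var()
    for j in range(1, p):
        # lasso regression of variable j on its predecessors 0..j-1
        phi = lasso_cd(X[:, :j], X[:, j], lam)
        T[j, :j] = -phi
        d[j] = (X[:, j] - X[:, :j] @ phi).var()
    return T.T @ np.diag(1.0 / d) @ T
```

Because sparsity is imposed on the triangular factor rather than directly on \(\Omega\), positive definiteness comes for free; the paper's contribution is, among other things, a jointly convex objective for this class of estimators with convergence guarantees when \(p > n\).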


Keywords: Covariance estimation · High-dimensional data · Sparse Cholesky · Penalized likelihood



Acknowledgements

The work of Kshitij Khare was partially supported by NSF Grant DMS-1106084. Sang-Yun Oh was supported in part by Laboratory Directed Research and Development (LDRD) funding from Berkeley Lab, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The work of Bala Rajaratnam was partially supported by US Air Force Office of Scientific Research grant award number FA9550-13-1-0043, US National Science Foundation Grant Nos. DMS-CMG 1025465, AGS-1003823, DMS-1106642, and DMS-CAREER-1352656, Defense Advanced Research Projects Agency grant DARPA-YFAN66001-111-4131, and SMC-DBNKY.

Supplementary material

Supplementary material 1 (pdf 218 KB)



Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  • Kshitij Khare (1)
  • Sang-Yun Oh (2)
  • Syed Rahman (1)
  • Bala Rajaratnam (3)

  1. University of Florida, Gainesville, USA
  2. University of California, Santa Barbara, USA
  3. University of California, Davis, USA
