Abstract
Estimating high-dimensional dependence structures in models of multivariate datasets is an ongoing challenge. Copulas provide a powerful and intuitive way to model dependence structure in the joint distribution of disparate types of variables. Here, we propose an estimation method for Gaussian copula parameters based on the maximum likelihood estimate of a covariance matrix that includes shrinkage and where all of the diagonal elements are restricted to be equal to 1. We show that this estimation problem can be solved numerically by block coordinate descent. Using a simulation study, we illustrate the advantage of the proposed scheme in providing an efficient estimate of sparse Gaussian copula covariance parameters. The sparse estimate is obtained by regularizing the constrained problem using either the least absolute shrinkage and selection operator (LASSO) or the adaptive LASSO penalty, applied to either the covariance matrix or the inverse covariance (precision) matrix. Simulation results indicate that our method outperforms conventional estimates of sparse Gaussian copula covariance parameters. We demonstrate the proposed method for modelling dependence structures through an analysis of multivariate groundfish abundance data obtained from annual bottom trawl surveys in the northeast Pacific from 2014 to 2018. Supplementary materials accompanying this paper appear online.
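To make the setting concrete, the following is a minimal sketch of the kind of object being estimated: a sparse dependence matrix with unit diagonal, computed on Gaussian scores. It is not the constrained maximum likelihood procedure proposed in the paper. It swaps in a much cruder technique, soft-thresholding the off-diagonal sample correlations, which mimics LASSO-style shrinkage but does not use block coordinate descent and does not guarantee a positive definite result. The function names, the toy data and the threshold `lam` are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(x):
    """Map each column of x to Gaussian scores via its ranks (assumes no ties)."""
    n = x.shape[0]
    # Double argsort gives 0-based ranks per column; shift them to 1..n.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    # Rescale to the open interval (0, 1) so the inverse normal CDF stays finite.
    u = ranks / (n + 1)
    return np.vectorize(NormalDist().inv_cdf)(u)


def soft_threshold_correlation(z, lam):
    """Shrink off-diagonal sample correlations toward zero; keep the diagonal at 1."""
    r = np.corrcoef(z, rowvar=False)
    shrunk = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
    # Enforce the unit-diagonal restriction explicitly after shrinkage.
    np.fill_diagonal(shrunk, 1.0)
    return shrunk


rng = np.random.default_rng(0)
# Hypothetical toy data: two dependent skewed columns and one independent column.
latent = rng.normal(size=500)
x = np.column_stack([
    np.exp(latent + 0.3 * rng.normal(size=500)),
    np.exp(latent + 0.3 * rng.normal(size=500)),
    rng.exponential(size=500),
])
sigma_hat = soft_threshold_correlation(normal_scores(x), lam=0.15)
print(np.round(sigma_hat, 2))
```

In this toy example the strong dependence between the first two columns survives the shrinkage, while the weak sample correlations involving the independent third column are pushed to, or close to, zero. A likelihood-based estimator with an explicit unit-diagonal constraint, as described in the abstract, addresses the same goal in a more principled way.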
Acknowledgements
This research was supported by funding from PRIMER-e (Quest Research Limited). MJA was also supported by PRIMER-e (Quest Research Limited), a Royal Society of New Zealand Marsden Grant (19-MAU-145), and the Strategic Science Investment Fund, administered by the Ministry of Business, Innovation and Employment, Aotearoa/New Zealand.
Cite this article
Adegoke, N.A., Punnett, A. & Anderson, M.J. Estimation of Multivariate Dependence Structures via Constrained Maximum Likelihood. JABES 27, 240–260 (2022). https://doi.org/10.1007/s13253-021-00475-x
DOI: https://doi.org/10.1007/s13253-021-00475-x