Fast covariance estimation for high-dimensional functional data

Abstract

We propose two fast covariance smoothing methods and associated software that scale linearly with the number of observations per function. Most available methods and software cannot smooth covariance matrices of dimension \(J>500\); a recently introduced sandwich smoother is an exception but is not adapted to covariance matrices of very large dimension, such as \(J= 10{,}000\). We introduce two new methods that circumvent these problems: (1) a fast implementation of the sandwich smoother for covariance smoothing; and (2) a two-step procedure that first obtains the singular value decomposition of the data matrix and then smooths the eigenvectors. These new approaches are at least an order of magnitude faster in high dimensions and drastically reduce computer memory requirements. They provide nearly instantaneous (a few seconds) smoothing for matrices of dimension \(J=10{,}000\) and very fast (\(<10\) min) smoothing for \(J=100{,}000\). R functions, simulations, and data analysis provide ready-to-use, reproducible, and scalable tools for practical analysis of noisy high-dimensional functional data.
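
The methods described in the abstract are distributed through the refund R package (see the reference list). As a quick orientation, the following minimal sketch simulates noisy functional data and calls the FACE implementation; the function name fpca.face and the argument and output names (knots, pve, efunctions, evalues) reflect our recollection of one version of the package and should be checked against the current refund documentation.

    library(refund)                               # assumed to provide fpca.face

    set.seed(1)
    I <- 50; J <- 3000                            # I curves observed on a grid of J points
    tt <- seq(0, 1, length.out = J)
    Y <- outer(rnorm(I), sin(2 * pi * tt)) +      # rank-one smooth signal
         matrix(rnorm(I * J, sd = 0.5), I, J)     # plus white noise

    fit <- fpca.face(Y, knots = 100, pve = 0.99)  # fast covariance smoothing / FPCA
    dim(fit$efunctions)                           # smoothed eigenvectors (J x number of components)
    fit$evalues                                   # estimated eigenvalues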

References

  • Besse, P., Cardot, H., Ferraty, F.: Simultaneous nonparametric regressions of unbalanced longitudinal data. Comput. Stat. Data Anal. 24, 255–270 (1997)

  • Besse, P., Ramsay, J.O.: Principal components analysis of sampled functions. Psychometrika 51, 285–311 (1986)

  • Bunea, F., Xiao L.: On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. To appear in Bernoulli. http://arxiv.org/abs/1212.5321 (2013)

  • Capra, W., Müller, H.: An accelerated-time model for response curves. J. Am. Stat. Assoc. 92, 72–83 (1997)

  • Cardot, H.: Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. J. Nonparametr. Stat. 12, 503–538 (2000)

  • Crainiceanu, C., Reiss, P., Goldsmith, J., Huang, L., Huo, L., Scheipl, F., Swihart, B., Greven, S., Harezlak, J., Kundu, M., Zhao, Y., McLean, M., Xiao, L.: R package refund: methodology for regression with functional data (version 0.1-9). http://cran.r-project.org/web/packages/refund/index.html (2013)

  • Crainiceanu, C., Staicu, A., Di, C.: Generalized multilevel functional regression. J. Am. Stat. Assoc. 104, 1550–1561 (2009)

  • Crainiceanu, C., Staicu, A., Ray, S., Punjabi, N.: Bootstrap-based inference on the difference in the means of two correlated functional processes. Stat. Med. 31, 3223–3240 (2012)

  • Craven, P., Wahba, G.: Smoothing noisy data with spline functions. Numer. Math. 31, 377–403 (1979)

  • Cummins, D., Filloon, T., Nychka, D.: Confidence intervals for nonparametric curve estimates: toward more uniform pointwise coverage. J. Am. Stat. Assoc. 96, 233–246 (2001)

  • Dauxois, J., Pousse, A., Romain, Y.: Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J. Multivar. Anal. 12, 136–154 (1982)

  • Di, C., Crainiceanu, C.M., Caffo, B.S., Punjabi, N.: Multilevel functional principal component analysis. Ann. Appl. Stat. 3, 458–488 (2009)

  • Eilers, P., Marx, B.: Flexible smoothing with B-splines and penalties (with Discussion). Stat. Sci. 11, 89–121 (1996)

  • Eilers, P., Marx, B.: Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometr. Intell. Lab. Syst. 66, 159–174 (2003)

  • Goldsmith, J., Bobb, J., Crainiceanu, C., Caffo, B., Reich, D.: Penalized functional regression. J. Comput. Graph. Stat. 20, 830–851 (2011)

  • Greven, S., Crainiceanu, C., Caffo, B., Reich, D.: Longitudinal functional principal component analysis. Electron. J. Stat. 4, 1022–1054 (2010)

  • Karhunen, K.: Über lineare Methoden in der Wahrscheinlichkeitsrechnung. Annales Academiae Scientiarum Fennicae 37, 1–79 (1947)

  • Kim, Y.J., Gu, C.: Smoothing spline Gaussian regression: more scalable computation via efficient approximation. J. R. Stat. Soc. B 66, 337–356 (2004)

  • Kneip, A.: Nonparametric estimation of common regressors for similar curve data. Ann. Stat. 22, 1386–1427 (1994)

  • Marx, B., Eilers, P.: Multidimensional penalized signal regression. Technometrics 47, 13–22 (2005)

  • Ramsay, J., Dalzell, C.J.: Some tools for functional data analysis (with Discussion). J. R. Stat. Soc. B 53, 539–572 (1991)

  • Ramsay, J., Silverman, B.: Functional Data Analysis. Springer, New York (2005)

  • Ramsay, J., Silverman, B.W.: Applied Functional Data Analysis: Methods and Case Studies. Springer, New York (2002)

  • Rice, J., Silverman, B.: Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Stat. Soc. B 53, 233–243 (1991)

  • Ruppert, D.: Selecting the number of knots for penalized splines. J. Comput. Graph. Stat. 11, 735–757 (2002)

  • Ruppert, D., Wand, M., Carroll, R.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)

  • Seber, G.: A Matrix Handbook for Statisticians. Wiley-Interscience, New Jersey (2007)

  • Shinohara, R., Crainiceanu, C., Caffo, B., Reich, D.: Longitudinal analysis of spatio-temporal processes: a case study of dynamic contrast-enhanced magnetic resonance imaging in multiple sclerosis. http://biostats.bepress.com/jhubiostat/paper231/ (2014)

  • Shou, H., Zipunnikov, V., Crainiceanu, C., Greven, S.: Structured functional principal component analysis. http://arxiv.org/pdf/1304.6783.pdf (2013)

  • Staniswalis, J., Lee, J.: Nonparametric regression analysis of longitudinal data. J. Am. Stat. Assoc. 93, 1403–1418 (1998)

  • Swihart, B., Caffo, B., Crainiceanu, C., Punjabi, N.: Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data. Stat. Med. 31, 855–870 (2012)

  • Wang, X., Shen, J., Ruppert, D.: On the asymptotics of penalized spline smoothing. Electron. J. Stat. 5, 1–17 (2011)

  • Wood, S.: Thin plate regression splines. J. R. Stat. Soc. B 65, 95–114 (2003)

  • Wood, S.: R package mgcv: mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation (version 1.7-24). http://cran.r-project.org/web/packages/mgcv/index.html (2013)

  • Xiao, L., Li, Y., Apanasovich, T., Ruppert, D.: Local asymptotics of P-splines. http://arxiv.org/abs/1201.0708v3 (2012)

  • Xiao, L., Li, Y., Ruppert, D.: Fast bivariate P-splines: the sandwich smoother. J. R. Stat. Soc. B 75, 577–599 (2013)

  • Yao, F., Müller, H., Clifford, A., Dueker, S., Follett, J., Lin, Y., Buchholz, B., Vogel, J.: Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics 59, 676–685 (2003)

  • Yao, F., Müller, H., Wang, J.: Functional data analysis for sparse longitudinal data. J. Am. Stat. Assoc. 100, 577–590 (2005)

  • Zhang, J., Chen, J.: Statistical inferences for functional data. Ann. Stat. 35, 1052–1079 (2007)

  • Zipunnikov, V., Caffo, B.S., Crainiceanu, C.M., Yousem, D., Davatzikos, C., Schwartz, B.: Multilevel functional principal component analysis for high-dimensional data. J. Comput. Graph. Stat. 20, 852–873 (2011)

  • Zipunnikov, V., Greven, S., Shou, H., Caffo, B.S., Reich, D.S., Crainiceanu, C.: Longitudinal high-dimensional principal components analysis with application to diffusion tensor imaging of multiple sclerosis. Ann. Appl. Stat. http://biostats.bepress.com/jhubiostat/paper234/ (2012)

Acknowledgments

This work was supported by Grant Number R01EB012547 from the National Institute of Biomedical Imaging and Bioengineering and Grant Number R01NS060910 from the National Institute of Neurological Disorders and Stroke. This work represents the opinions of the researchers and not necessarily those of the granting organizations.

Author information

Corresponding author

Correspondence to Luo Xiao.

Appendices

Appendix 1: Proofs

Proof of Proposition 1

The design matrix \(\mathbf{B}\) is of full rank (Xiao et al. 2012). Hence \(\mathbf{B}^T\mathbf{B}\) is invertible and \(\mathbf{A}_S\) is of rank \(c\). \({\varvec{\Sigma }}_S\) is a diagonal matrix with all diagonal elements positive, and \({\widetilde{\mathbf{Y}}}\) is of rank at most \(\min (c,I)\). Hence \({\widetilde{\mathbf{K}}}= \mathbf{A}_S\left( I^{-1}{\varvec{\Sigma }}_S{\widetilde{\mathbf{Y}}}{\widetilde{\mathbf{Y}}}^T{\varvec{\Sigma }}_S\right) \mathbf{A}_S^T\) has rank at most \(\min (c,I)\), and the proposition follows. \(\square \)
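
As a sanity check, the rank bound can be verified numerically; the sketch below (R) uses arbitrary, hypothetical dimensions and an arbitrary \(\mathbf{A}_S\) with orthonormal columns.

    set.seed(2)
    J <- 200; c <- 30; I <- 10
    A_S   <- qr.Q(qr(matrix(rnorm(J * c), J, c)))   # J x c with orthonormal columns
    Sigma <- diag(runif(c, 0.1, 1))                 # c x c diagonal, positive entries
    Ytil  <- matrix(rnorm(c * I), c, I)             # c x I, hence rank at most min(c, I)

    Ktil <- A_S %*% (Sigma %*% Ytil %*% t(Ytil) %*% Sigma / I) %*% t(A_S)
    qr(Ktil)$rank                                   # at most min(c, I) = 10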

Proof of Proposition 2

First, \(\text{ tr }(\mathbf {S}) = \text{ tr }({\varvec{\Sigma }}_S)\), which is easy to calculate. We now compute \(\sum _{i=1}^I \Vert \mathbf{Y}_i - \mathbf{S}\mathbf{Y}_i\Vert ^2\). Because \( \Vert \mathbf{Y}_i - \mathbf{S}\mathbf{Y}_i\Vert ^2 = \mathbf{Y}_i^T(\mathbf{S}-\mathbf{I}_J)^2\mathbf{Y}_i = \text{ tr }\{(\mathbf{S}-\mathbf{I}_J)^2\mathbf{Y}_i\mathbf{Y}_i^T\}\),

$$\begin{aligned} \sum _{i=1}^I \Vert \mathbf{Y}_i - \mathbf{S}\mathbf{Y}_i\Vert ^2&= \text{ tr }\left\{ (\mathbf{S}-\mathbf{I}_J)^2\sum _{i=1}^I \mathbf{Y}_i\mathbf{Y}_i^T\right\} \\&= \text{ tr }\left\{ (\mathbf{S}-\mathbf{I}_J)^2\mathbf{Y}\mathbf{Y}^T\right\} . \end{aligned}$$

It can be shown that \(\mathbf{S}^2 = \mathbf{A}_S {\varvec{\Sigma }}_S^2\mathbf{A}_S^T\). Hence \(\text{ tr }(\mathbf{S}^2\mathbf{Y}\mathbf{Y}^T) = \text{ tr }(\mathbf{Y}^T\mathbf{S}^2\mathbf{Y}) = \text{ tr }({\widetilde{\mathbf{Y}}}^T{\varvec{\Sigma }}_S^2{\widetilde{\mathbf{Y}}})=\text{ tr }({\varvec{\Sigma }}_S^2{\widetilde{\mathbf{Y}}}{\widetilde{\mathbf{Y}}}^T)\). Similarly, we derive \(\text{ tr }(\mathbf{S}\mathbf{Y}\mathbf{Y}^T) = \text{ tr }({\varvec{\Sigma }}_S{\widetilde{\mathbf{Y}}}{\widetilde{\mathbf{Y}}}^T)\). We have \(\text{ tr }(\mathbf{Y}\mathbf{Y}^T) = \Vert \mathbf{Y}\Vert _F^2\). It follows that

$$\begin{aligned} \sum _{i=1}^I \Vert \mathbf{Y}_i - \mathbf{S}\mathbf{Y}_i\Vert ^2 = \text{ tr }\left\{ ({\varvec{\Sigma }}_S-\mathbf{I}_c)^2{\widetilde{\mathbf{Y}}}{\widetilde{\mathbf{Y}}}^T\right\} -\Vert {\widetilde{\mathbf{Y}}}\Vert _F^2+ \Vert \mathbf{Y}\Vert _F^2. \end{aligned}$$

\(\square \)
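
The practical consequence of Proposition 2 is that both quantities entering GCV are computed from the \(c\times I\) matrix \({\widetilde{\mathbf{Y}}}\) alone, without forming the \(J\times J\) smoother \(\mathbf{S}\). A small simulated check of the identity (R; hypothetical dimensions):

    set.seed(3)
    J <- 400; c <- 25; I <- 15
    A_S  <- qr.Q(qr(matrix(rnorm(J * c), J, c)))    # orthonormal columns: t(A_S) %*% A_S = I_c
    sig  <- runif(c)                                # diagonal of Sigma_S
    Y    <- matrix(rnorm(J * I), J, I)              # columns are Y_i
    Ytil <- t(A_S) %*% Y                            # c x I

    ## direct (expensive) computation, forming the J x J smoother
    S <- A_S %*% diag(sig) %*% t(A_S)
    rss_direct <- sum((Y - S %*% Y)^2)

    ## fast computation from Proposition 2
    rss_fast <- sum((sig - 1)^2 * rowSums(Ytil^2)) - sum(Ytil^2) + sum(Y^2)

    all.equal(rss_direct, rss_fast)                 # TRUE
    sum(sig)                                        # tr(S) = tr(Sigma_S)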

Proposition 3

The computation time of FACE is \(O(IJc +Jc^2+c^3 + ck_0)\), where \(k_0\) is the number of iterations needed for selecting the smoothing parameter (see Sect. 3.2), and the total required computer memory is \(O(JI+I^2+Jc+c^2+k_0)\).

Proof of Proposition 3

We need to compute or store the following quantities: \(\mathbf{X}\), \(\mathbf{B}\), \(\mathbf{B}^T\mathbf{B}\), \((\mathbf{B}^T\mathbf{B})^{-1/2}, \mathbf{P}, (\mathbf{B}^T\mathbf{B})^{-1/2}\mathbf{P}(\mathbf{B}^T\mathbf{B})^{-1/2}, \mathbf{A}_S, {\widetilde{\mathbf{Y}}}, \mathbf{A}, \mathbf{U}\), and \(\mathbf{A}_S \mathbf{A}\). For the computational complexity, \(\mathbf{B}^T\mathbf{B}\), \(\mathbf{A}_S = \mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1/2}\mathbf{U}\), and \(\mathbf{A}_S\mathbf{A}\) require \(O(Jc^2)\) computations; \((\mathbf{B}^T\mathbf{B})^{-1/2}\), \(\mathbf{P}\), \((\mathbf{B}^T\mathbf{B})^{-1/2}\mathbf{P}(\mathbf{B}^T\mathbf{B})^{-1/2}\), \(\mathbf{A}\), and \(\mathbf{U}\) require \(O(c^3)\) computations; \({\widetilde{\mathbf{Y}}}= \mathbf{A}_S^T\mathbf{Y}\) requires \(O(JIc)\) computations. So in total, \(O(JIc+Jc^2+c^3)\) computations are required. As for memory, loading \(\mathbf{Y}\) requires \(O(JI)\) memory units, computing \(\mathbf{B}\) and \(\mathbf{A}_S\mathbf{A}\) requires \(O(Jc)\) memory units, and the remaining objects require \(O(c^2)\) memory units. \(\square \)

Proof of Theorem 1

We have \({\widehat{{\varvec{\xi }}}}_i = J^{-1/2}(\mathbf{A}_S{\widehat{\mathbf{A}}}_N)^T\mathbf{Y}_i = J^{-1/2}{\widehat{\mathbf{A}}}_N^T(\mathbf{A}_S^T\mathbf{Y}_i)=J^{-1/2}{\widehat{\mathbf{A}}}_N^T{\widetilde{\mathbf{Y}}}_i\). \(\square \)

Proof of Theorem 2

Let \({\widetilde{\mathbf{A}}}_N\) denote the first \(N\) columns of \(\mathbf{A}_S\mathbf{A}\); then \({\widetilde{\mathbf{A}}}_N = \mathbf{A}_S{\widehat{\mathbf{A}}}_N\). The estimated BLUP for \({\varvec{\xi }}_i\) (Ruppert et al. 2003) is

$$\begin{aligned} {\widehat{{\varvec{\xi }}}}_i = J^{-1/2}{\widehat{{\varvec{\Sigma }}}}_N{\widetilde{\mathbf{A}}}_N^T \left( {\widetilde{\mathbf{A}}}_N{\widehat{{\varvec{\Sigma }}}}_N{\widetilde{\mathbf{A}}}_N^T +J^{-1}{\widehat{\sigma }}^2\mathbf{I}_J\right) ^{-1}\mathbf{Y}_i. \end{aligned}$$

The inverse matrix in the above equality can be replaced by the following [Seber (2007), page 309, equality b(i)],

$$\begin{aligned}&\left( {\widetilde{\mathbf{A}}}_N{\widehat{{\varvec{\Sigma }}}}_N{\widetilde{\mathbf{A}}}_N^T + J^{-1}{\widehat{\sigma }}^2\mathbf{I}_J\right) ^{-1}\\&\quad = \frac{J}{{\widehat{\sigma }}^2}\left\{ \mathbf{I}_J- \frac{J}{{\widehat{\sigma }}^2} {\widetilde{\mathbf{A}}}_N\left( {\widehat{{\varvec{\Sigma }}}}_N^{-1} +\frac{J}{{\widehat{\sigma }}^2}\mathbf{I}_N\right) ^{-1} {\widetilde{\mathbf{A}}}_N^T\right\} . \end{aligned}$$

It follows that

$$\begin{aligned} {\widehat{{\varvec{\xi }}}}_i&= J^{-1/2}\frac{J}{{\widehat{\sigma }}^2}{\widehat{{\varvec{\Sigma }}}}_N\left\{ \mathbf{I}_N - \frac{J}{{\widehat{\sigma }}^2}\left( {\widehat{{\varvec{\Sigma }}}}_N^{-1} +\frac{J}{{\widehat{\sigma }}^2}\mathbf{I}_N\right) ^{-1}\right\} {\widehat{\mathbf{A}}}_N^T{\widetilde{\mathbf{Y}}}_i\\&= J^{-1/2} {\widehat{{\varvec{\Sigma }}}}_N\left( {\widehat{{\varvec{\Sigma }}}}_N + J^{-1}{\widehat{\sigma }}^2\mathbf{I}_N\right) ^{-1}{\widehat{\mathbf{A}}}_N^T{\widetilde{\mathbf{Y}}}_i. \end{aligned}$$

\(\square \)
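
Because \({\widehat{{\varvec{\Sigma }}}}_N\) is diagonal, the final expression reduces to elementwise shrinkage of the projected data, with no \(J\times J\) inversion. A minimal sketch (R) with hypothetical simulated quantities:

    set.seed(4)
    J <- 400; c <- 25; I <- 15; N <- 5
    A_hat_N <- qr.Q(qr(matrix(rnorm(c * N), c, N)))      # first N columns of A (c x N)
    lam     <- sort(runif(N, 0.5, 2), decreasing = TRUE) # diagonal of Sigma_hat_N
    sigma2  <- 0.3                                       # estimated noise variance
    Ytil    <- matrix(rnorm(c * I), c, I)                # projected data A_S^T Y (c x I)

    shrink <- lam / (lam + sigma2 / J)                   # Sigma_N (Sigma_N + J^{-1} sigma^2 I_N)^{-1}
    xi_hat <- (shrink * (t(A_hat_N) %*% Ytil)) / sqrt(J) # N x I matrix of estimated scores
    xi_hat[, 1]                                          # scores for subject 1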

Appendix 2: Empirical covariance operators for \(K_X\) and \(K_U\)

Let \(I\) denote the number of pairs of cases and controls. For simplicity, we assume that estimates of \(\mu _A(t)\) and \(\mu _C(t)\) have been subtracted from \(Y_{iA}\) and \(Y_{iC}\), respectively. Let \(\mathbf{Y}_{iA} = (Y_{iA}(t_1),\ldots , Y_{iA}(t_J))^T\) and \(\mathbf{Y}_{iC} = (Y_{iC}(t_1),\ldots , Y_{iC}(t_J))^T\). Following Zipunnikov et al. (2011), we have estimates of the covariance operators,

$$\begin{aligned} {\widehat{\mathbf{K}}}_X = \frac{1}{2I}\sum _{i=1}^I\left( \mathbf{Y}_{iA}\mathbf{Y}_{iC}^T + \mathbf{Y}_{iC}\mathbf{Y}_{iA}^T\right) , \end{aligned}$$

and

$$\begin{aligned} {\widehat{\mathbf{K}}}_U = \frac{1}{2I}\sum _{i=1}^I\left( \mathbf{Y}_{iA}-\mathbf{Y}_{iC}\right) \left( \mathbf{Y}_{iA}-\mathbf{Y}_{iC}\right) ^T. \end{aligned}$$

Let \(\mathbf{Y}_A = [\mathbf{Y}_{1A},\ldots , \mathbf{Y}_{IA}]\), \(\mathbf{Y}_C = [\mathbf{Y}_{1C}, \ldots , \mathbf{Y}_{IC}]\), and \(\mathbf{Y}= [\mathbf{Y}_A,\mathbf{Y}_C]\). Then \(\mathbf{Y}\) is of dimension \(J\times 2I\). It can be shown that \({\widehat{\mathbf{K}}}_X = \mathbf{Y}\mathbf{H}_X\mathbf{Y}^T\) and \({\widehat{\mathbf{K}}}_U = \mathbf{Y}\mathbf{H}_U\mathbf{Y}^T\), where

$$\begin{aligned} \mathbf{H}_X = \frac{1}{2I}\left( \begin{array}{cc} \mathbf {0}_{I}&{}\quad \mathbf{I}_{I}\\ \mathbf{I}_{I}&{}\quad \mathbf {0}_{I} \end{array}\right) ,\,\, \mathbf{H}_U = \frac{1}{2I}\left( \begin{array}{cc} \mathbf {I}_{I}&{}\quad -\mathbf{I}_{I}\\ -\mathbf{I}_{I}&{}\quad \mathbf {I}_{I} \end{array}\right) . \end{aligned}$$
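
A short simulated check (R, hypothetical dimensions) that the matrix forms \(\mathbf{Y}\mathbf{H}_X\mathbf{Y}^T\) and \(\mathbf{Y}\mathbf{H}_U\mathbf{Y}^T\) reproduce the pairwise sums above:

    set.seed(5)
    I <- 8; J <- 50
    YA <- matrix(rnorm(J * I), J, I)                   # columns Y_{iA}
    YC <- matrix(rnorm(J * I), J, I)                   # columns Y_{iC}
    Y  <- cbind(YA, YC)                                # J x 2I

    H_X <- rbind(cbind(matrix(0, I, I), diag(I)),
                 cbind(diag(I), matrix(0, I, I))) / (2 * I)
    H_U <- rbind(cbind(diag(I), -diag(I)),
                 cbind(-diag(I), diag(I))) / (2 * I)

    K_X_direct <- (YA %*% t(YC) + YC %*% t(YA)) / (2 * I)
    K_U_direct <- (YA - YC) %*% t(YA - YC) / (2 * I)

    all.equal(Y %*% H_X %*% t(Y), K_X_direct)          # TRUE
    all.equal(Y %*% H_U %*% t(Y), K_U_direct)          # TRUE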

Cite this article

Xiao, L., Zipunnikov, V., Ruppert, D. et al. Fast covariance estimation for high-dimensional functional data. Stat Comput 26, 409–421 (2016). https://doi.org/10.1007/s11222-014-9485-x
