Fast covariance estimation for high-dimensional functional data

Abstract

We propose two fast covariance smoothing methods and associated software that scale linearly with the number of observations per function. Most available methods and software cannot smooth covariance matrices of dimension \(J>500\); a recently introduced sandwich smoother is an exception but is not adapted to covariance matrices of very large dimension, such as \(J= 10{,}000\). We introduce two new methods that circumvent these problems: (1) a fast implementation of the sandwich smoother for covariance smoothing; and (2) a two-step procedure that first obtains the singular value decomposition of the data matrix and then smooths the eigenvectors. These new approaches are at least an order of magnitude faster in high dimensions and drastically reduce computer memory requirements. They provide nearly instantaneous (a few seconds) smoothing for matrices of dimension \(J=10{,}000\) and very fast (\(<10\) min) smoothing for \(J=100{,}000\). R functions, simulations, and data analysis provide ready-to-use, reproducible, and scalable tools for practical analysis of noisy high-dimensional functional data.
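
The methods described in the abstract are distributed through the refund R package (see the reference list). As a quick orientation, the following minimal sketch simulates noisy functional data and calls the FACE implementation; the function name fpca.face and the argument and output names (knots, pve, efunctions, evalues) reflect our recollection of one version of the package and should be checked against the current refund documentation.

    library(refund)                               # assumed to provide fpca.face

    set.seed(1)
    I <- 50; J <- 3000                            # I curves observed on a grid of J points
    tt <- seq(0, 1, length.out = J)
    Y <- outer(rnorm(I), sin(2 * pi * tt)) +      # rank-one smooth signal
         matrix(rnorm(I * J, sd = 0.5), I, J)     # plus white noise

    fit <- fpca.face(Y, knots = 100, pve = 0.99)  # fast covariance smoothing / FPCA
    dim(fit$efunctions)                           # smoothed eigenvectors (J x number of components)
    fit$evalues                                   # estimated eigenvalues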

References

  • Besse, P., Cardot, H., Ferraty, F.: Simultaneous nonparametric regressions of unbalanced longitudinal data. Comput. Stat. Data Anal. 24, 255–270 (1997)

  • Besse, P., Ramsay, J.O.: Principal components analysis of sampled functions. Psychometrika 51, 285–311 (1986)

  • Bunea, F., Xiao L.: On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. To appear in Bernoulli. http://arxiv.org/abs/1212.5321 (2013)

  • Capra, W., Müller, H.: An accelerated-time model for response curves. J. Am. Stat. Assoc. 92, 72–83 (1997)

  • Cardot, H.: Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. J. Nonparametr. Stat. 12, 503–538 (2000)

  • Crainiceanu, C., Reiss, P., Goldsmith, J., Huang, L., Huo, L., Scheipl, F., Swihart, B., Greven, S., Harezlak, J., Kundu, M., Zhao, Y., McLean, M., Xiao, L.: R package refund: methodology for regression with functional data (version 0.1-9). http://cran.r-project.org/web/packages/refund/index.html (2013)

  • Crainiceanu, C., Staicu, A., Di, C.: Generalized multilevel functional regression. J. Am. Stat. Assoc. 104, 1550–1561 (2009)

  • Crainiceanu, C., Staicu, A., Ray, S., Punjabi, N.: Bootstrap-based inference on the difference in the means of two correlated functional processes. Stat. Med. 31, 3223–3240 (2012)

  • Craven, P., Wahba, G.: Smoothing noisy data with spline functions. Numer. Math. 31, 377–403 (1979)

  • Cummins, D., Filloon, T., Nychka, D.: Confidence intervals for nonparametric curve estimates: toward more uniform pointwise coverage. J. Am. Stat. Assoc. 96, 233–246 (2001)

  • Dauxois, J., Pousse, A., Romain, Y.: Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J. Multivar. Anal. 12, 136–154 (1982)

  • Di, C., Crainiceanu, C.M., Caffo, B.S., Punjabi, N.: Multilevel functional principal component analysis. Ann. Appl. Stat. 3, 458–488 (2009)

  • Eilers, P., Marx, B.: Flexible smoothing with B-splines and penalties (with Discussion). Stat. Sci. 11, 89–121 (1996)

  • Eilers, P., Marx, B.: Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometr. Intell. Lab. Syst. 66, 159–174 (2003)

  • Goldsmith, J., Bobb, J., Crainiceanu, C., Caffo, B., Reich, D.: Penalized functional regression. J. Comput. Graph. Stat. 20, 830–851 (2011)

  • Greven, S., Crainiceanu, C., Caffo, B., Reich, D.: Longitudinal functional principal component analysis. Electron. J. Stat. 4, 1022–1054 (2010)

  • Karhunen, K.: Über lineare Methoden in der Wahrscheinlichkeitsrechnung. Annales Academiae Scientiarum Fennicae 37, 1–79 (1947)

  • Kim, Y.J., Gu, C.: Smoothing spline Gaussian regression: more scalable computation via efficient approximation. J. R. Stat. Soc. B 66, 337–356 (2004)

  • Kneip, A.: Nonparametric estimation of common regressors for similar curve data. Ann. Stat. 22, 1386–1427 (1994)

  • Marx, B., Eilers, P.: Multidimensional penalized signal regression. Technometrics 47, 13–22 (2005)

  • Ramsay, J., Dalzell, C.J.: Some tools for functional data analysis (with Discussion). J. R. Stat. Soc. B 53, 539–572 (1991)

  • Ramsay, J., Silverman, B.: Functional Data Analysis. Springer, New York (2005)

  • Ramsay, J., Silverman, B.W.: Applied Functional Data Analysis: Methods and Case Studies. Springer, New York (2002)

  • Rice, J., Silverman, B.: Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Stat. Soc. B 53, 233–243 (1991)

  • Ruppert, D.: Selecting the number of knots for penalized splines. J. Comput. Graph. Stat. 11, 735–757 (2002)

  • Ruppert, D., Wand, M., Carroll, R.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)

  • Seber, G.: A Matrix Handbook for Statisticians. Wiley-Interscience, New Jersey (2007)

  • Shinohara, R., Crainiceanu, C., Caffo, B., Reich, D.: Longitudinal analysis of spatio-temporal processes: a case study of dynamic contrast-enhanced magnetic resonance imaging in multiple sclerosis. http://biostats.bepress.com/jhubiostat/paper231/ (2014)

  • Shou, H., Zipunnikov, V., Crainiceanu, C., Greven, S.: Structured functional principal component analysis. http://arxiv.org/pdf/1304.6783.pdf (2013)

  • Staniswalis, J., Lee, J.: Nonparametric regression analysis of longitudinal data. J. Am. Stat. Assoc. 93, 1403–1418 (1998)

  • Swihart, B., Caffo, B., Crainiceanu, C., Punjabi, N.: Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data. Stat. Med. 31, 855–870 (2012)

  • Wang, X., Shen, J., Ruppert, D.: On the asymptotics of penalized spline smoothing. Electron. J. Stat. 5, 1–17 (2011)

  • Wood, S.: Thin plate regression splines. J. R. Stat. Soc. B 65, 95–114 (2003)

  • Wood, S.: R package mgcv: mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation (version 1.7-24). http://cran.r-project.org/web/packages/mgcv/index.html (2013)

  • Xiao, L., Li, Y., Apanasovich, T., Ruppert, D.: Local asymptotics of P-splines. http://arxiv.org/abs/1201.0708v3 (2012)

  • Xiao, L., Li, Y., Ruppert, D.: Fast bivariate P-splines: the sandwich smoother. J. R. Stat. Soc. B 75, 577–599 (2013)

  • Yao, F., Müller, H., Clifford, A., Dueker, S., Follett, J., Lin, Y., Buchholz, B., Vogel, J.: Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics 59, 676–685 (2003)

  • Yao, F., Müller, H., Wang, J.: Functional data analysis for sparse longitudinal data. J. Am. Stat. Assoc. 100, 577–590 (2005)

  • Zhang, J., Chen, J.: Statistical inferences for functional data. Ann. Stat. 35, 1052–1079 (2007)

  • Zipunnikov, V., Caffo, B.S., Crainiceanu, C.M., Yousem, D., Davatzikos, C., Schwartz, B.: Multilevel functional principal component analysis for high-dimensional data. J. Comput. Graph. Stat. 20, 852–873 (2011)

  • Zipunnikov, V., Greven, S., Shou, H., Caffo, B.S., Reich, D.S., Crainiceanu, C.: Longitudinal high-dimensional principal components analysis with application to diffusion tensor imaging of multiple sclerosis. Ann. Appl. Stat. http://biostats.bepress.com/jhubiostat/paper234/ (2012)

Acknowledgments

This work was supported by Grant Number R01EB012547 from the National Institute of Biomedical Imaging and Bioengineering and Grant Number R01NS060910 from the National Institute of Neurological Disorders and Stroke. This work represents the opinions of the researchers and not necessarily those of the granting organizations.

Author information

Corresponding author

Correspondence to Luo Xiao.

Appendices

Appendix 1: Proofs

Proof of Proposition 1

The design matrix \(\mathbf{B}\) is of full rank (Xiao et al. 2012). Hence \(\mathbf{B}^T\mathbf{B}\) is invertible and \(\mathbf{A}_S\) is of rank \(c\). \({\varvec{\Sigma }}_S\) is a diagonal matrix with all diagonal elements positive, and \({\widetilde{\mathbf{Y}}}\) is of rank at most \(\min (c,I)\). Hence \({\widetilde{\mathbf{K}}}= \mathbf{A}_S\left( I^{-1}{\varvec{\Sigma }}_S{\widetilde{\mathbf{Y}}}{\widetilde{\mathbf{Y}}}^T{\varvec{\Sigma }}_S\right) \mathbf{A}_S^T\) has rank at most \(\min (c,I)\), and the proposition follows. \(\square \)
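
As a sanity check, the rank bound can be verified numerically; the sketch below (R) uses arbitrary, hypothetical dimensions and an arbitrary \(\mathbf{A}_S\) with orthonormal columns.

    set.seed(2)
    J <- 200; c <- 30; I <- 10
    A_S   <- qr.Q(qr(matrix(rnorm(J * c), J, c)))   # J x c with orthonormal columns
    Sigma <- diag(runif(c, 0.1, 1))                 # c x c diagonal, positive entries
    Ytil  <- matrix(rnorm(c * I), c, I)             # c x I, hence rank at most min(c, I)

    Ktil <- A_S %*% (Sigma %*% Ytil %*% t(Ytil) %*% Sigma / I) %*% t(A_S)
    qr(Ktil)$rank                                   # at most min(c, I) = 10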

Proof of Proposition 2

First, \(\text{ tr }(\mathbf {S}) = \text{ tr }({\varvec{\Sigma }}_S)\), which is easy to calculate. We now compute \(\sum _{i=1}^I \Vert \mathbf{Y}_i - \mathbf{S}\mathbf{Y}_i\Vert ^2\). Because \( \Vert \mathbf{Y}_i - \mathbf{S}\mathbf{Y}_i\Vert ^2 = \mathbf{Y}_i^T(\mathbf{S}-\mathbf{I}_J)^2\mathbf{Y}_i = \text{ tr }\{(\mathbf{S}-\mathbf{I}_J)^2\mathbf{Y}_i\mathbf{Y}_i^T\}\),

$$\begin{aligned} \sum _{i=1}^I \Vert \mathbf{Y}_i - \mathbf{S}\mathbf{Y}_i\Vert ^2&= \text{ tr }\left\{ (\mathbf{S}-\mathbf{I}_J)^2\sum _{i=1}^I \mathbf{Y}_i\mathbf{Y}_i^T\right\} \\&= \text{ tr }\left\{ (\mathbf{S}-\mathbf{I}_J)^2\mathbf{Y}\mathbf{Y}^T\right\} . \end{aligned}$$

It can be shown that \(\mathbf{S}^2 = \mathbf{A}_S {\varvec{\Sigma }}_S^2\mathbf{A}_S^T\). Hence \(\text{ tr }(\mathbf{S}^2\mathbf{Y}\mathbf{Y}^T) = \text{ tr }(\mathbf{Y}^T\mathbf{S}^2\mathbf{Y}) = \text{ tr }({\widetilde{\mathbf{Y}}}^T{\varvec{\Sigma }}_S^2{\widetilde{\mathbf{Y}}})=\text{ tr }({\varvec{\Sigma }}_S^2{\widetilde{\mathbf{Y}}}{\widetilde{\mathbf{Y}}}^T)\). Similarly, we derive \(\text{ tr }(\mathbf{S}\mathbf{Y}\mathbf{Y}^T) = \text{ tr }({\varvec{\Sigma }}_S{\widetilde{\mathbf{Y}}}{\widetilde{\mathbf{Y}}}^T)\). We have \(\text{ tr }(\mathbf{Y}\mathbf{Y}^T) = \Vert \mathbf{Y}\Vert _F^2\). It follows that

$$\begin{aligned} \sum _{i=1}^I \Vert \mathbf{Y}_i - \mathbf{S}\mathbf{Y}_i\Vert ^2 = \text{ tr }\left\{ ({\varvec{\Sigma }}_S-\mathbf{I}_c)^2{\widetilde{\mathbf{Y}}}{\widetilde{\mathbf{Y}}}^T\right\} -\Vert {\widetilde{\mathbf{Y}}}\Vert _F^2+ \Vert \mathbf{Y}\Vert _F^2. \end{aligned}$$

\(\square \)
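
The practical consequence of Proposition 2 is that both quantities entering GCV are computed from the \(c\times I\) matrix \({\widetilde{\mathbf{Y}}}\) alone, without forming the \(J\times J\) smoother \(\mathbf{S}\). A small simulated check of the identity (R; hypothetical dimensions):

    set.seed(3)
    J <- 400; c <- 25; I <- 15
    A_S  <- qr.Q(qr(matrix(rnorm(J * c), J, c)))    # orthonormal columns: t(A_S) %*% A_S = I_c
    sig  <- runif(c)                                # diagonal of Sigma_S
    Y    <- matrix(rnorm(J * I), J, I)              # columns are Y_i
    Ytil <- t(A_S) %*% Y                            # c x I

    ## direct (expensive) computation, forming the J x J smoother
    S <- A_S %*% diag(sig) %*% t(A_S)
    rss_direct <- sum((Y - S %*% Y)^2)

    ## fast computation from Proposition 2
    rss_fast <- sum((sig - 1)^2 * rowSums(Ytil^2)) - sum(Ytil^2) + sum(Y^2)

    all.equal(rss_direct, rss_fast)                 # TRUE
    sum(sig)                                        # tr(S) = tr(Sigma_S)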

Proposition 3

The computation time of FACE is \(O(IJc +Jc^2+c^3 + ck_0)\), where \(k_0\) is the number of iterations needed for selecting the smoothing parameter (see Sect. 3.2), and the total required computer memory is \(O(JI+I^2+Jc+c^2+k_0)\).

Proof of Proposition 3

We need to compute or store the following quantities: \(\mathbf{X}\), \(\mathbf{B}\), \(\mathbf{B}^T\mathbf{B}\), \((\mathbf{B}^T\mathbf{B})^{-1/2}, \mathbf{P}, (\mathbf{B}^T\mathbf{B})^{-1/2}\mathbf{P}(\mathbf{B}^T\mathbf{B})^{-1/2}, \mathbf{A}_S, {\widetilde{\mathbf{Y}}}, \mathbf{A}, \mathbf{U}\), and \(\mathbf{A}_S \mathbf{A}\). For the computational complexity, \(\mathbf{B}^T\mathbf{B}\), \(\mathbf{A}_S = \mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1/2}\mathbf{U}\), and \(\mathbf{A}_S\mathbf{A}\) require \(O(Jc^2)\) computations; \((\mathbf{B}^T\mathbf{B})^{-1/2}\), \(\mathbf{P}\), \((\mathbf{B}^T\mathbf{B})^{-1/2}\mathbf{P}(\mathbf{B}^T\mathbf{B})^{-1/2}\), \(\mathbf{A}\), and \(\mathbf{U}\) require \(O(c^3)\) computations; \({\widetilde{\mathbf{Y}}}= \mathbf{A}_S^T\mathbf{Y}\) requires \(O(JIc)\) computations. So in total, \(O(JIc+Jc^2+c^3)\) computations are required. As for memory, loading \(\mathbf{Y}\) requires \(O(JI)\) memory units, computing \(\mathbf{B}\) and \(\mathbf{A}_S\mathbf{A}\) requires \(O(Jc)\) memory units, and the remaining objects require \(O(c^2)\) memory units. \(\square \)

Proof of Theorem 1

We have \({\widehat{{\varvec{\xi }}}}_i = J^{-1/2}(\mathbf{A}_S{\widehat{\mathbf{A}}}_N)^T\mathbf{Y}_i = J^{-1/2}{\widehat{\mathbf{A}}}_N^T(\mathbf{A}_S^T\mathbf{Y}_i)=J^{-1/2}{\widehat{\mathbf{A}}}_N^T{\widetilde{\mathbf{Y}}}_i\). \(\square \)

Proof of Theorem 2

Let \({\widetilde{\mathbf{A}}}_N\) denote the first \(N\) columns of \(\mathbf{A}_S\mathbf{A}\); then \({\widetilde{\mathbf{A}}}_N = \mathbf{A}_S{\widehat{\mathbf{A}}}_N\). The estimated BLUP for \({\varvec{\xi }}_i\) (Ruppert et al. 2003) is

$$\begin{aligned} {\widehat{{\varvec{\xi }}}}_i = J^{-1/2}{\widehat{{\varvec{\Sigma }}}}_N{\widetilde{\mathbf{A}}}_N^T \left( {\widetilde{\mathbf{A}}}_N{\widehat{{\varvec{\Sigma }}}}_N{\widetilde{\mathbf{A}}}_N^T +J^{-1}{\widehat{\sigma }}^2\mathbf{I}_J\right) ^{-1}\mathbf{Y}_i. \end{aligned}$$

The inverse matrix in the above equality can be replaced by the following [Seber (2007), page 309, equality b(i)],

$$\begin{aligned}&\left( {\widetilde{\mathbf{A}}}_N{\widehat{{\varvec{\Sigma }}}}_N{\widetilde{\mathbf{A}}}_N^T + J^{-1}{\widehat{\sigma }}^2\mathbf{I}_J\right) ^{-1}\\&\quad = \frac{J}{{\widehat{\sigma }}^2}\left\{ \mathbf{I}_J- \frac{J}{{\widehat{\sigma }}^2} {\widetilde{\mathbf{A}}}_N\left( {\widehat{{\varvec{\Sigma }}}}_N^{-1} +\frac{J}{{\widehat{\sigma }}^2}\mathbf{I}_N\right) ^{-1} {\widetilde{\mathbf{A}}}_N^T\right\} . \end{aligned}$$

It follows that

$$\begin{aligned} {\widehat{{\varvec{\xi }}}}_i&= J^{-1/2}\frac{J}{{\widehat{\sigma }}^2}{\widehat{{\varvec{\Sigma }}}}_N\left\{ \mathbf{I}_N - \frac{J}{{\widehat{\sigma }}^2}\left( {\widehat{{\varvec{\Sigma }}}}_N^{-1} +\frac{J}{{\widehat{\sigma }}^2}\mathbf{I}_N\right) ^{-1}\right\} {\widehat{\mathbf{A}}}_N^T{\widetilde{\mathbf{Y}}}_i\\&= J^{-1/2} {\widehat{{\varvec{\Sigma }}}}_N\left( {\widehat{{\varvec{\Sigma }}}}_N + J^{-1}{\widehat{\sigma }}^2\mathbf{I}_N\right) ^{-1}{\widehat{\mathbf{A}}}_N^T{\widetilde{\mathbf{Y}}}_i. \end{aligned}$$

\(\square \)
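
Because \({\widehat{{\varvec{\Sigma }}}}_N\) is diagonal, the final expression reduces to elementwise shrinkage of the projected data, with no \(J\times J\) inversion. A minimal sketch (R) with hypothetical simulated quantities:

    set.seed(4)
    J <- 400; c <- 25; I <- 15; N <- 5
    A_hat_N <- qr.Q(qr(matrix(rnorm(c * N), c, N)))      # first N columns of A (c x N)
    lam     <- sort(runif(N, 0.5, 2), decreasing = TRUE) # diagonal of Sigma_hat_N
    sigma2  <- 0.3                                       # estimated noise variance
    Ytil    <- matrix(rnorm(c * I), c, I)                # projected data A_S^T Y (c x I)

    shrink <- lam / (lam + sigma2 / J)                   # Sigma_N (Sigma_N + J^{-1} sigma^2 I_N)^{-1}
    xi_hat <- (shrink * (t(A_hat_N) %*% Ytil)) / sqrt(J) # N x I matrix of estimated scores
    xi_hat[, 1]                                          # scores for subject 1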

Appendix 2: Empirical covariance operators for \(K_X\) and \(K_U\)

Let \(I\) denote the number of pairs of cases and controls. For simplicity, we assume that estimates of \(\mu _A(t)\) and \(\mu _C(t)\) have been subtracted from \(Y_{iA}\) and \(Y_{iC}\), respectively. Let \(\mathbf{Y}_{iA} = (Y_{iA}(t_1),\ldots , Y_{iA}(t_J))^T\) and \(\mathbf{Y}_{iC} = (Y_{iC}(t_1),\ldots , Y_{iC}(t_J))^T\). Following Zipunnikov et al. (2011), we have estimates of the covariance operators,

$$\begin{aligned} {\widehat{\mathbf{K}}}_X = \frac{1}{2I}\sum _{i=1}^I\left( \mathbf{Y}_{iA}\mathbf{Y}_{iC}^T + \mathbf{Y}_{iC}\mathbf{Y}_{iA}^T\right) , \end{aligned}$$

and

$$\begin{aligned} {\widehat{\mathbf{K}}}_U = \frac{1}{2I}\sum _{i=1}^I\left( \mathbf{Y}_{iA}-\mathbf{Y}_{iC}\right) \left( \mathbf{Y}_{iA}-\mathbf{Y}_{iC}\right) ^T. \end{aligned}$$

Let \(\mathbf{Y}_A = [\mathbf{Y}_{1A},\ldots , \mathbf{Y}_{IA}]\), \(\mathbf{Y}_C = [\mathbf{Y}_{1C}, \ldots , \mathbf{Y}_{IC}]\), and \(\mathbf{Y}= [\mathbf{Y}_A,\mathbf{Y}_C]\). Then \(\mathbf{Y}\) is of dimension \(J\times 2I\). It can be shown that \({\widehat{\mathbf{K}}}_X = \mathbf{Y}\mathbf{H}_X\mathbf{Y}^T\) and \({\widehat{\mathbf{K}}}_U = \mathbf{Y}\mathbf{H}_U\mathbf{Y}^T\), where

$$\begin{aligned} \mathbf{H}_X = \frac{1}{2I}\left( \begin{array}{cc} \mathbf {0}_{I}&{}\quad \mathbf{I}_{I}\\ \mathbf{I}_{I}&{}\quad \mathbf {0}_{I} \end{array}\right) ,\,\, \mathbf{H}_U = \frac{1}{2I}\left( \begin{array}{cc} \mathbf {I}_{I}&{}\quad -\mathbf{I}_{I}\\ -\mathbf{I}_{I}&{}\quad \mathbf {I}_{I} \end{array}\right) . \end{aligned}$$
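
A short simulated check (R, hypothetical dimensions) that the matrix forms \(\mathbf{Y}\mathbf{H}_X\mathbf{Y}^T\) and \(\mathbf{Y}\mathbf{H}_U\mathbf{Y}^T\) reproduce the pairwise sums above:

    set.seed(5)
    I <- 8; J <- 50
    YA <- matrix(rnorm(J * I), J, I)                   # columns Y_{iA}
    YC <- matrix(rnorm(J * I), J, I)                   # columns Y_{iC}
    Y  <- cbind(YA, YC)                                # J x 2I

    H_X <- rbind(cbind(matrix(0, I, I), diag(I)),
                 cbind(diag(I), matrix(0, I, I))) / (2 * I)
    H_U <- rbind(cbind(diag(I), -diag(I)),
                 cbind(-diag(I), diag(I))) / (2 * I)

    K_X_direct <- (YA %*% t(YC) + YC %*% t(YA)) / (2 * I)
    K_U_direct <- (YA - YC) %*% t(YA - YC) / (2 * I)

    all.equal(Y %*% H_X %*% t(Y), K_X_direct)          # TRUE
    all.equal(Y %*% H_U %*% t(Y), K_U_direct)          # TRUE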

Cite this article

Xiao, L., Zipunnikov, V., Ruppert, D. et al. Fast covariance estimation for high-dimensional functional data. Stat Comput 26, 409–421 (2016). https://doi.org/10.1007/s11222-014-9485-x
