Abstract
We give an introduction to functional data analysis, with examples, and provide a brief review of the literature. We explain how principal component analysis (PCA) can be used to transform curves into finite-dimensional data. An application of PCA is developed to test for the equality of the means of several populations (functional analysis of variance). Asymptotics are derived under the null hypothesis that the populations have the same mean curves. The selection of the basis for the projections and the power of the test are discussed for simple random samples and for stationary time series samples of curves. We review the part of the literature that is needed to establish the validity of the PCA method. Two data sets, magnetogram records and stock returns, are used to illustrate the applicability of our limit results.
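The idea of reducing curves to finitely many principal component scores and then comparing group means can be sketched as follows. This is a minimal illustration under stated assumptions: the curves are independent, observed on a common equispaced grid of [0, 1], and share a common covariance, and the basis is taken from the pooled within-group covariance. The function name `projected_anova` and the Wald-type statistic, which is approximately \(\chi^2(d(k-1))\) under the null hypothesis in this simple i.i.d. setting, are illustrative choices and not necessarily the exact procedure of the paper. For stationary time series of curves, the score covariance would have to be replaced by a long-run covariance estimate.

```python
import numpy as np
from scipy.stats import chi2

def projected_anova(groups, d, h):
    """Wald-type test of equal mean curves after projection onto d principal components.

    groups: list of (n_i, T) arrays; each row is one curve sampled on a common
            equispaced grid of [0, 1] with spacing h (an assumption of this sketch).
    Returns the statistic and its chi-square(d * (k - 1)) p-value.
    """
    k = len(groups)
    ns = np.array([g.shape[0] for g in groups])
    # Pooled within-group covariance kernel C(t, s) on the grid.
    resid = np.vstack([g - g.mean(axis=0) for g in groups])
    cov = resid.T @ resid / (ns.sum() - k)
    # Eigenpairs of the integral operator (kernel weighted by h), largest first.
    vals, vecs = np.linalg.eigh(cov * h)
    top = np.argsort(vals)[::-1][:d]
    lam = vals[top]                      # variances of the projected scores
    phis = vecs[:, top] / np.sqrt(h)     # L2-orthonormal eigenfunctions
    # Projected group means <mean_i, phi_j>, approximated by Riemann sums.
    means = np.array([g.mean(axis=0) @ phis * h for g in groups])
    grand = np.average(means, axis=0, weights=ns)
    # Sum over groups of n_i * (mean_i - grand)^T diag(lam)^{-1} (mean_i - grand).
    stat = float(np.sum(ns[:, None] * (means - grand) ** 2 / lam))
    return stat, float(chi2.sf(stat, d * (k - 1)))

rng = np.random.default_rng(0)
T = 100
h = 1.0 / T

def brownian(n):
    # Hypothetical data: n discretized Brownian motions on [0, 1].
    return np.cumsum(rng.normal(scale=np.sqrt(h), size=(n, T)), axis=1)

# Third group has a constant mean shift, so the null hypothesis is false.
print(projected_anova([brownian(50), brownian(50), brownian(50) + 1.0], d=3, h=h))
```

The number of components \(d\) trades off the two concerns mentioned above: a small \(d\) keeps the \(\chi^2\) approximation accurate, while a mean difference that is nearly orthogonal to the leading components is detected only if \(d\) is large enough.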
References
Abramovich, F., Angelini, C.: Testing in mixed-effects FANOVA models. J. Stat. Plan. Inference 136, 4326–4348 (2006)
Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 3rd edn. Wiley, New York (2003)
Andrews, D.: Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858 (1991)
Antoniadis, A., Sapatinas, T.: Estimation and inference in functional mixed-effect models. Comput. Stat. Data Anal. 51, 4793–4813 (2007)
Arcones, M.A., Giné, E.: On the bootstrap of \(U\) and \(V\) statistics. Ann. Stat. 20, 655–674 (1992)
Aston, J., Kirch, C.: Detecting and estimating changes in dependent functional data. J. Multivar. Anal. 109, 204–220 (2012)
Aue, A., Hörmann, S., Horváth, L., Hušková, M., Steinebach, J.: Sequential testing for the stability of portfolio betas. Econom. Theory 28, 804–837 (2012)
Aue, A., Horváth, L.: Structural breaks in time series. J. Time Ser. Anal. 23, 1–16 (2013)
Bartlett, M.S.: Further aspects of the theory of multiple regression. In: Proceedings of the Cambridge Philosophical Society, vol. 34, pp. 33–40 (1938)
Berkes, I., Horváth, L., Rice, G.: Weak invariance principles for sums of dependent random functions. Stoch. Process. Appl. 123, 385–403 (2013)
Berkes, I., Horváth, L., Rice, G.: On the asymptotic normality of kernel estimators of the long run covariance of functional time series, (2015, preprint)
Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1968)
Bollerslev, T.: Modeling the coherence in short run nominal exchange rates: a multivariate generalized ARCH model. Rev. Econ. Stat. 72, 498–505 (1990)
Bosq, D.: Linear Processes in Function Spaces. Springer, New York (2000)
Brown, M.B., Forsythe, A.B.: Robust tests for equality of variances. J. Am. Stat. Assoc. 69, 364–367 (1974)
Bühlmann, P.: Blockwise bootstrapped empirical processes for stationary sequences. Ann. Stat. 22, 995–1012 (1994)
Cardot, H., Ferraty, F., Mas, A., Sarda, P.: Testing hypothesis in the functional linear model. Scand. J. Stat. 30, 241–255 (2003)
Cuesta-Albertos, J., Febrero, M.: A simple multiway ANOVA for functional data. Test 19, 537–557 (2010)
Cuevas, A.: A partial overview of the theory of statistics with functional data. J. Stat. Plan. Inference 147, 1–23 (2014)
Cuevas, A., Febrero, M., Fraiman, R.: An anova test for functional data. Comput. Stat. Data Anal. 47, 111–122 (2004)
Dauxois, J., Pousse, A., Romain, Y.: Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J. Multiv. Anal. 12, 136–154 (1982)
Debnath, L., Mikusiński, P.: Hilbert Spaces with Applications, 3rd edn. Elsevier, New York (2005)
Dehling, H., Sharipov, O., Wendler, M.: Bootstrap for dependent Hilbert space-valued random variables with application to von Mises statistics. J. Multiv. Anal. 133, 200–215 (2015)
Delicado, P., Giraldo, R., Comas, C., Mateu, J.: Statistics for spatial functional data: Some recent contributions. Environmetrics 21, 224–239 (2010)
Dunford, N., Schwartz, J.T.: Linear Operators: General Theory (Part 1). Springer, New York (1988)
Doukhan, P., Lang, G., Leucht, A., Neumann, M.: Dependent wild bootstrap for empirical processes. J. Time Ser. Anal. (to appear, 2015)
Ferraty, F., Romain, Y. (eds.): The Oxford Handbook of Functional Data Analysis. Oxford University Press, Oxford (2011)
Ferraty, F., Vieu, P.: Nonparametric Functional Data Analysis: Theory and Practice. Springer, New York (2006)
Fremdt, S., Horváth, L., Kokoszka, P., Steinebach, J.G.: Functional data analysis with increasing number of projections. J. Multiv. Anal. 124, 313–332 (2014)
Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
Gabrys, R., Kokoszka, P.: Portmanteau test of independence for functional observations. J. Am. Stat. Assoc. 102, 1338–1348 (2007)
González Montoro, A.M., Cao, R., Espinosa, N., Cudeiro, J., Mariño, J.: Functional two-way analysis of variance and bootstrap methods for neural synchrony analysis. BMC Neurosci. 15, 96 (2014)
Górecki, T., Smaga, L.: A comparison of tests for the one-way ANOVA problem for functional data. Comput. Stat. (to appear, 2015)
Grenander, U., Rosenblatt, M.: Statistical spectral analysis of time series arising from stationary stochastic processes. Ann. Math. Stat. 24, 537–558 (1953)
Grenander, U., Rosenblatt, M.: Statistical Analysis of Stationary Time Series. Wiley, New York (1957)
Gromenko, O., Kokoszka, P.: Testing the equality of mean functions of ionospheric critical frequency curves. J. R. Stat. Soc. Ser. C 61, 715–731 (2012)
Gromenko, O., Kokoszka, P., Zhu, L., Sojka, J.: Estimation and testing for spatially distributed curves with application to ionospheric and magnetic field trends. Ann. Appl. Stat. 6, 669–696 (2012)
Hall, P., Hosseini-Nasab, M.: On properties of functional principal components. J. R. Stat. Soc. Ser. B 68, 109–126 (2006)
Hall, P., Van Keilegom, I.: Two-sample tests in functional data analysis starting from discrete data. Statistica Sinica 17, 1511–1531 (2007)
Hannan, E.J.: Multiple Time Series. Wiley, New York (1970)
Hörmann, S., Horváth, L., Reeder, R.: A functional version of the ARCH model. Econom. Theory 29, 138–152 (2013)
Hörmann, S., Kidziński, L., Hallin, M.: Dynamic functional principal components. J. R. Stat. Soc. Ser. B (in press, 2015)
Hörmann, S., Kokoszka, P.: Weakly dependent functional data. Ann. Stat. 38, 1845–1884 (2010)
Hörmann, S., Kokoszka, P.: Consistency of the mean and the principal components of spatially distributed functional data. Bernoulli 19, 1535–1558 (2013)
Horváth, L., Hušková, M., Kokoszka, P.: Testing the stability of the functional autoregressive processes. J. Multiv. Anal. 101, 352–367 (2010)
Horváth, L., Hušková, M., Rice, G.: Test of independence for functional data. J. Multiv. Anal. 117, 100–119 (2013)
Horváth, L., Kokoszka, P.: Inference for Functional Data with Applications. Springer, New York (2012)
Horváth, L., Kokoszka, P., Reeder, R.: Estimation of the mean of functional time series and a two sample problem. J. R. Stat. Soc. Ser. B 75, 103–122 (2013)
Horváth, L., Kokoszka, P., Reimherr, M.: Two sample inference in functional linear models. Can. J. Stat. 37, 571–591 (2009)
Horváth, L., Kokoszka, P., Rice, G.: Stationarity of functional time series. J. Econ. 179, 66–82 (2014)
Horváth, L., Rice, G.: Extensions of some classical methods in change point analysis (with discussions). Test 23, 219–290 (2014)
Horváth, L., Rice, G.: Testing equality of means when the observations are from functional time series. J. Time Ser. Anal. 36, 84–108 (2015)
Horváth, L., Rice, G., Whipple, S.: Adaptive bandwidth selection in the long run covariance estimator of functional time series. Comput. Stat. Data Anal. (in press, 2015)
Ibragimov, I.A.: Some limit theorems for stationary processes. Theory Probab. Appl. 7, 349–382 (1962)
Jirak, M.: On weak invariance principles for sums of dependent random functionals. Stat. Probab. Lett. 83, 2291–2296 (2013)
Kokoszka, P., Miao, H., Zhang, X.: Functional dynamic factor model for intraday price curves. J. Financ. Econ. nbu004 (unpublished, 2014)
Kokoszka, P., Reimherr, M.: Determining the order of the functional autoregressive model. J. Time Ser. Anal. 34, 116–129 (2013a)
Kokoszka, P., Reimherr, M.: Asymptotic normality of the principal components of functional time series. Stoch. Process. Appl. 123, 1546–1562 (2013b)
James, G.S.: The comparison of several groups of observations when the ratios of population variances are unknown. Biometrika 38, 324–329 (1951)
Krishnamoorthy, K., Lu, F.: A parametric bootstrap solution to the MANOVA under heteroscedasticity. J. Stat. Comput. Simul. 80, 873–887 (2009)
Krishnamoorthy, K., Lu, F., Mathew, T.: A parametric bootstrap approach for ANOVA with unequal variances: fixed and random models. Comput. Stat. Data Anal. 51, 5731–5742 (2007)
Laukaitis, A.: Functional analysis for cash flow and transaction intensity continuous-time prediction using Hilbert-valued autoregressive processes. Eur. J. Op. Res. 185, 1607–1614 (2008)
Laukaitis, A., Račkauskas, A.: Functional data analysis for clients segmentation task. Eur. J. Op. Res. 163, 210–216 (2005)
Love, J.L.: Magnetic monitoring of Earth and space. In: Proceedings of Physics Today, pp. 31–37 (2008)
Mas, A.: Weak convergence for the covariance operators of a Hilbertian linear process. Stoch. Process. Appl. 99, 117–135 (2002)
Maslova, I., Kokoszka, P., Sojka, J., Zhu, L.: Removal of nonconstant daily variation by means of wavelet and functional data analysis. J. Geophys. Res. 114, A03202 (2009)
Maslova, I., Kokoszka, P., Sojka, J., Zhu, L.: Statistical significance testing for the association of magnetometer records at high-, mid- and low latitudes during substorm days. Planet. Space Sci. 58, 437–445 (2010a)
Maslova, I., Kokoszka, P., Sojka, J., Zhu, L.: Estimation of Sq variation by means of multiresolution and principal component analyses. J. Atmos. Solar-Terr. Phys. 72, 625–632 (2010b)
Maslova, I., Kokoszka, P., Sojka, J., Zhu, L.: Statistical significance testing for the association of magnetometer records at high-, mid- and low latitudes during substorm days. Planet. Space Sci. 58, 437–445 (2010c)
Müller, H.G., Sen, R., Stadtmüller, U.: Functional data analysis for volatility. J. Econ. 165, 233–245 (2011)
Onatski, A., Kargin, V.: Curve forecasting by functional autoregression. J. Multiv. Anal. 99, 2508–2526 (2008)
Parzen, E.: On choosing an estimate of the spectral density function of a stationary time series. Ann. Math. Stat. 28, 921–932 (1957)
Pillai, K.C.S.: Upper percentage points of the largest root of a matrix in multivariate analysis. Biometrika 54, 189–193 (1967)
Politis, D.N.: Adaptive bandwidth choice. J. Nonparametr. Stat. 25, 517–533 (2003)
Politis, D.N., Romano, J.: Limit theorem for weakly dependent Hilbert space valued random variables with application to the stationary bootstrap. Statistica Sinica 4, 461–476 (1994)
Priestley, M.: Spectral Analysis of Time Series, vol. 1. Academic Press, New York (1981)
Rady, E.A., Kilany, N.M., Eliwa, S.A.: Estimation in mixed-effects functional ANOVA models. J. Multivar. Anal. 133, 346–355 (2015)
Sharipov, O., Tewes, J., Wendler, M.: Sequential bootstrap in a Hilbert space with application to change point analysis, (preprint, 2014)
Ramsay, J.O., Silverman, B.W.: Applied Functional Data Analysis. Methods and Case Studies. Springer, New York (2002)
Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer, New York (2005)
Rice, W.R., Gaines, S.D.: One-way analysis of variance with unequal variances. Proc. Natl. Acad. Sci. 86, 8183–8184 (1989)
Roy, S.N.: Some Aspects of Multivariate Analysis. Wiley, New York (1957)
Scheffé, H.: The Analysis of Variance. Wiley, New York (1959)
Taniguchi, M., Kakizawa, Y.: Asymptotic Theory of Statistical Inference for Time Series. Springer, New York (2000)
Ullah, S., Finch, C.F.: Applications of functional data analysis: a systematic review. BMC Med. Res. Methodol. 13, 43 (2013)
Xu, W.-Y., Kamide, Y.: Decomposition of daily geomagnetic variations by using method of natural orthogonal component. J. Geophys. Res. 109, A05218 (2004)
Weerahandi, S.: ANOVA under unequal variances. Biometrics 51, 589–599 (1995)
Welch, B.L.: The generalization of Student’s problem when several different population variances are involved. Biometrika 34, 28–35 (1947)
Welch, B.L.: On the comparison of several mean values: an alternative approach. Biometrika 38, 330–336 (1951)
Wilks, S.S.: Certain generalizations of the analysis of variance. Biometrika 24, 471–494 (1932)
Zhang, J.-T.: Analysis of Variance for Functional Data. Chapman & Hall/CRC, New York (2013)
Zhang, J.-T., Liang, X.: One-way ANOVA for functional data via globalizing the pointwise \(F\)-test. Scand. J. Stat. 41, 51–71 (2014)
Additional information
Research supported by NSF Grant DMS 0905400.
Appendix: Technical lemmas
The results in this section are taken from [52], where their proofs are also given.
Suppose in this section that \(\mathbf{Z}_1,\ldots ,\mathbf{Z}_k\) are independent normal random vectors in \(\mathbb {R}^d\) such that \(E \mathbf{Z}_i = \mathbf{0}\) for all \(1\le i\le k\), and
Define
where \(c_i, 1\le i\le k\) satisfy
We recall that \(\chi ^2(r)\) stands for a \(\chi ^2\) random variable with \(r\) degrees of freedom.
Lemma 9.1
If
then
Lemma 9.2
Let \(Y(t)\in L^2\) be a random function with \(EY(t)=0\) and \(E\Vert Y\Vert ^2<\infty \), let its covariance function \(H(t,s)=EY(t)Y(s)\) be strictly positive definite, and let \(\{\psi _i, 1\le i <\infty \}\) be orthonormal functions. Then for any \(1\le d <\infty \) the \(d\times d\) matrix \(\mathbf{C}=\{E\langle Y,\psi _i\rangle \langle Y,\psi _j\rangle , 1\le i,j\le d\}\) is nonsingular.
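Lemma 9.2 can be checked numerically on a discretized example. The reason it holds: for \(\mathbf{a}\in\mathbb{R}^d\), \(\mathbf{a}^\top\mathbf{C}\mathbf{a}=E\langle Y,f\rangle^2=\int\!\!\int H(t,s)f(t)f(s)\,dt\,ds\) with \(f=\sum_{i=1}^d a_i\psi_i\), and \(f\neq 0\) whenever \(\mathbf{a}\neq\mathbf{0}\), so the quadratic form is positive when \(H\) is strictly positive definite. The sketch below assumes a Brownian-motion covariance \(H(t,s)=\min(t,s)\) and a cosine basis, both illustrative choices not taken from the paper, and approximates the integrals by Riemann sums on a midpoint grid.

```python
import numpy as np

T = 1000
h = 1.0 / T
t = (np.arange(T) + 0.5) * h   # midpoint grid on [0, 1]

# Illustrative covariance (an assumption, not from the paper): Brownian motion,
# H(t, s) = min(t, s), which is strictly positive definite on L^2[0, 1].
H = np.minimum.outer(t, t)

# Orthonormal functions psi_1 = 1, psi_i = sqrt(2) cos((i - 1) pi t) for i >= 2.
d = 5
psis = np.column_stack([np.ones(T)] + [np.sqrt(2) * np.cos(j * np.pi * t) for j in range(1, d)])

# C_ij = E<Y, psi_i><Y, psi_j> = double integral of H(t, s) psi_i(t) psi_j(s),
# approximated by a Riemann sum with weight h in each variable.
C = h * h * psis.T @ H @ psis

eigs = np.linalg.eigvalsh(C)
print(eigs.min() > 0, np.linalg.matrix_rank(C) == d)  # prints: True True
```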
Lemma 9.3
We assume that \(m\ge 1\), \(g_1, g_2, \ldots ,g_m\in L^2\), \(b_1, b_2, \ldots ,b_m\) are non-negative numbers, and \(U(t,s)\) is a symmetric positive definite function with eigenvalues \(\gamma _1>\gamma _2> \cdots >\gamma _\ell >\gamma _{\ell +1}\ge \cdots \ge 0\) and corresponding orthonormal eigenfunctions \(\psi _1, \psi _2, \ldots \). Let
with eigenvalues \(\gamma _1^*\ge \gamma _2^*\ge \ldots \ge 0\) and corresponding orthonormal eigenfunctions \(\psi _1^*, \psi _2^*, \ldots .\) If
then for some \(j\in \{1,2, \ldots , \ell \}\) and \(i\in \{1,2,\ldots , m\}\) we have
We assume that
and
Let \({\mathcal {A}}_0=\hbox {span}(\psi _1, \psi _2, \ldots ,\psi _m)\). We recall that \(\bar{\mathcal {B}}\) denotes the orthogonal complement of the set \({\mathcal {B}}\). Assume that
We say that \(D\) has regular maxima of order \(n\) with respect to \({\mathcal {A}}_0\) if there are \(r_1>r_2>\cdots >r_n\) and orthonormal functions \(g_1, g_2, \ldots , g_n\) such that
with \({\mathcal {A}}_{i}=\hbox {span}(\psi _1, \ldots , \psi _m, g_1, \ldots ,g_{i}),1\le i \le n-1.\) The functions \(g_1, \ldots , g_n\) are unique up to signs. Let
Since \(D_M\) is symmetric and non-negative definite, there are \({\uplambda }_{1,M}\ge {\uplambda }_{2,M}\ge \dots \ge 0\) and orthonormal functions \(f_{1,M}, f_{2,M}, \ldots \) such that
Lemma 9.4
If (9.2)–(9.4) hold and \(D\) has regular maxima of order \(n\) with respect to \({\mathcal {A}}_0\), then, as \(M\rightarrow \infty \), we have
and
where each of \(c_1=c_{1,M}, c_2=c_{2,M}, \ldots ,c_{m+n}=c_{m+n,M}\) is either \(1\) or \(-1\).
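The signs \(c_{i,M}=\pm 1\) in Lemma 9.4 reflect the fact that eigenfunctions of a symmetric non-negative definite kernel are determined only up to sign. Numerically, such eigenpairs can be obtained by discretizing the integral operator, and one common convention, assumed here rather than taken from the paper, fixes each sign by requiring a non-negative inner product with a chosen reference function. The helpers `kernel_eigen` and `align_signs`, the kernel \(\min(t,s)\), and the reference functions are hypothetical illustrations; for this kernel the exact eigenvalues \(1/((k-1/2)^2\pi^2)\) and eigenfunctions \(\sqrt{2}\sin((k-1/2)\pi t)\) are known, so the discretization can be checked against them.

```python
import numpy as np

def kernel_eigen(K, h):
    """Eigenvalues (largest first) and L2-orthonormal eigenfunctions of the
    integral operator whose symmetric kernel K is sampled on a grid of spacing h."""
    vals, vecs = np.linalg.eigh(K * h)      # real eigenpairs in ascending order
    order = np.argsort(vals)[::-1]          # lambda_1 >= lambda_2 >= ...
    vals = np.clip(vals[order], 0.0, None)  # drop tiny negative round-off
    funcs = vecs[:, order] / np.sqrt(h)     # so that h * sum(f**2) = 1
    return vals, funcs

def align_signs(funcs, refs, h):
    """Pick c_i in {1, -1} so that <c_i f_i, ref_i> >= 0; return c and c_i f_i."""
    inner = h * np.sum(funcs * refs, axis=0)  # Riemann sums for <f_i, ref_i>
    c = np.where(inner >= 0, 1.0, -1.0)
    return c, funcs * c

T = 400
h = 1.0 / T
t = (np.arange(T) + 0.5) * h
# Illustrative kernel: min(t, s), with known eigenvalues 1 / ((k - 1/2)^2 pi^2)
# and eigenfunctions sqrt(2) sin((k - 1/2) pi t), k = 1, 2, ...
K = np.minimum.outer(t, t)
lam, f = kernel_eigen(K, h)
# The expansion sum_i lam_i f_i(t) f_i(s) reproduces K on the grid.
recon = (f * lam) @ f.T
ks = np.arange(1, 4)
exact = 1.0 / ((ks - 0.5) ** 2 * np.pi ** 2)
refs = np.column_stack([np.sqrt(2) * np.sin((k - 0.5) * np.pi * t) for k in ks])
c, aligned = align_signs(f[:, :3], refs, h)
print(lam[:3], exact)
print(c)
```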
About this article
Horváth, L., Rice, G. An introduction to functional data analysis and a principal component approach for testing the equality of mean curves. Rev Mat Complut 28, 505–548 (2015). https://doi.org/10.1007/s13163-015-0169-7