
Quadratic forms of the empirical processes for the two-sample problem for functional data


Abstract

The use of quadratic forms of the empirical process for the two-sample problem in the context of functional data is considered. The proposed family of statistics is shown to converge to a chi-squared limit under metric entropy conditions for smooth functional data. The applicability of the proposed methodology is evaluated in simulations and real data examples.




References

  • Alvarez-Esteban P, Euán C, Ortega J (2016a) Time series clustering using the total variation distance with applications in oceanography. Environmetrics 27:355–369

  • Alvarez-Esteban P, Euán C, Ortega J (2016b) Statistical analysis of stationary intervals for random waves. In: Proceedings of 26th international offshore and polar engineering conference (ISOPE), vol 3, pp 305–311

  • Benko M, Härdle W, Kneip A (2009) Common functional principal components. Ann Stat 37:1–34


  • Borgman LE (1972) Statistical models for ocean waves and wave forces. In: Chow VT (ed) Advances in hydroscience, vol 8. Academic Press, New York


  • Bosq D (2000) Linear processes in function spaces. Lecture Notes in Statistics, vol 149. Springer, New York


  • Brodtkorb PA, Johannesson P, Lindgren G, Rychlik I, Rydén E, Sjö E (2000) WAFO—a Matlab toolbox for analysis of random waves and loads. In: Proceedings of 10th international offshore and polar engineering conference (ISOPE), vol III, Seattle, USA, pp 343–350

  • Cuevas A (2013) A partial overview of the theory of statistics with functional data. J Stat Plann Inference 147:1–23


  • Dudley RM (1987) Universal Donsker classes and metric entropy. Ann Probab 15:1306–1326


  • Ermakov MS (1998) Asymptotic minimaxity of chi-square tests. Theory Probab Appl 42(4):589–610


  • Ferraty F (ed) (2011) Recent advances in functional data analysis and related topics. Physica Verlag, Berlin


  • Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, New York


  • Fremdt S, Horváth L, Kokoszka P, Steinebach JG (2014) Functional data analysis with increasing number of projections. J Multivar Anal 124:313–332


  • Fremdt S, Steinebach JG, Horváth L, Kokoszka P (2013) Testing the equality of covariance operators in functional samples. Scand J Stat 40:138–152


  • Good P (2005) Permutation, parametric and bootstrap tests of hypothesis. Springer, New York


  • Gorrostieta C, Ortega J, Quiroz AJ, Smith GH (2014) Characterization of storm wave asymmetries with functional data analysis. Environ Ecol Stat 21(2):263–283


  • Götze F, Tikhomirov A (2002) Asymptotic distribution of quadratic forms and applications. J Theor Probab 15(2):423–475


  • Hall P, Van Keilegom I (2007) Two sample tests in functional data analysis starting from discrete data. Stat Sin 17:1511–1531


  • Horváth L, Kokoszka P (2009) Two sample inference in functional linear models. Can J Stat 37:571–591


  • Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York


  • Horváth L, Kokoszka P, Reeder R (2013) Estimation of the mean of functional time series and a two-sample problem. J R Stat Soc Ser B 75:103–122


  • Horváth L, Rice G (2015a) Testing equality of means when the observations are from functional time series. J Time Ser Anal 36:84–108


  • Horváth L, Rice G (2015b) An introduction to functional data analysis and a principal component approach for testing the equality of mean curves. Revista Matemática Complutense 28:505–548


  • Longuet-Higgins M (1956) Statistical properties of a moving wave form. Proc Camb Philos Soc 52:234–245 Part 2


  • Longuet-Higgins M (1957) The statistical analysis of a random moving surface. Philos Trans R Soc Lond Ser A 249(966):321–387


  • Mas A (2007) Testing for the mean of random curves: a penalization approach. Stat Inference Stoch Process 10:147–163


  • Mikosch T (1991) Functional limit theorems for random quadratic forms. Stoch Process Appl 37:81–98


  • Muñoz Maldonado Y, Staniswalis JG, Irwin LN, Byers D (2002) A similarity analysis of curves. Can J Stat 30:373–381


  • Ochi MK (1998) Ocean waves: the stochastic approach. Cambridge ocean technology series. Cambridge University Press, Cambridge


  • Paparoditis E, Sapatinas Th (2014) Bootstrap-based testing for functional data. arXiv:1409.4317v1 [math.ST]

  • Peña J (2012) Propuestas para el problema de dos muestras con datos funcionales. Tesis de maestría. Universidad de Los Andes, Colombia


  • Pierson WJ Jr (1955) Wind-generated gravity waves. Adv Geophys 2:93–178


  • Pollard D (1982) A central limit theorem for empirical processes. J Aust Math Soc Ser A 33:235–248


  • Pollard D (1984) Convergence of stochastic processes. Springer, New York


  • Pomann G-M, Staicu A-M, Ghosh S (2016) A two-sample distribution-free test for functional data with application to a diffusion tensor imaging study of multiple sclerosis. J R Stat Soc Ser C 65:395–414


  • Ramsay JO, Silverman BW (2002) Applied functional data analysis. Springer, New York


  • Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York


  • Torsethaugen K (1993) A two-peak wave spectrum model. In: Proceedings of the 18th international conference on ocean, offshore and arctic engineering (OMAE), vol II, pp 175–180

  • Torsethaugen K, Haver S (2004) Simplified double peak spectral model for ocean waves. In: Proceedings of the 14th international offshore and polar engineering conference, pp 23–28

  • van der Vaart AW (1996) New Donsker classes. Ann Probab 24:2128–2140


  • van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge


  • van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer series in statistics. Springer, New York


  • Zhang X, Shao X (2015) Two sample inference for the second-order property of temporally dependent functional data. Bernoulli 21:909–929



Acknowledgements

The software WAFO (Brodtkorb et al. 2000), developed by the WAFO group at Lund University of Technology, Sweden, and available at http://www.maths.lth.se/matstat/wafo, was used to calculate all Fourier spectra and associated spectral characteristics and to simulate Gaussian random waves. The data for station 106 were furnished by the Coastal Data Information Program (CDIP), Integrative Oceanographic Division, operated by the Scripps Institution of Oceanography, under the sponsorship of the US Army Corps of Engineers and the California Department of Boating and Waterways (http://cdip.ucsd.edu/). This work was partially supported by CONACYT, Mexico, through Proyectos 169175 Análisis Estadístico de Olas Marinas, Fase II, and 234057 Análisis Espectral, Datos Funcionales y Aplicaciones. It was finished while J.O. was visiting the Departamento de Estadística e I.O., Universidad de Valladolid, on sabbatical leave from CIMAT and with support from CONACYT, Mexico. The department's hospitality and support are gratefully acknowledged.

Author information

Corresponding author

Correspondence to J. Ortega.

Appendix: Proof of results

Proof of Proposition 1:

For each random function X on J, and \(L_g\in \mathcal{H}\), by the Cauchy–Schwarz inequality, we have

$$\begin{aligned} |L_g(X)|\le \sqrt{\int X^2(t)\,\hbox {d}t\int g^2(t)\,\hbox {d}t}\le \sqrt{M^2\int F^2(t)\,\hbox {d}t}, \end{aligned}\tag{20}$$

where the second inequality holds by hypothesis. Next, let \(g^*_1,g^*_2,\dots ,g^*_l\) be a minimal set of functions such that, for every \(g\in \mathscr {G}\), there exists \(j\le l\) for which \(\Vert g-g^*_j\Vert _{2,J}\le \epsilon \). Let \(Q^*\) be a probability measure on \(\mathcal{X}\). Then,

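$$\begin{aligned} {Q}^*(L_g-L_{g^*_j})^2={Q}^*\left( \int X(t)(g-g^*_j)(t)\,\hbox {d}t \right) ^2\le M^2\epsilon ^2, \end{aligned}\tag{21}$$

again by the Cauchy–Schwarz inequality, and independently of the particular \({Q}^*\). Hence every \(\epsilon \)-net for \(\mathscr {G}\) in \(\Vert \cdot \Vert _{2,J}\) induces an \(M\epsilon \)-net for \(\mathcal{H}\) in \(L^2(Q^*)\), whatever the choice of \(Q^*\). It follows that, for an appropriate choice of a positive constant \(\gamma \),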

$$\begin{aligned} N_2(C\epsilon ,\mathcal{H})\le N_2(\epsilon \gamma ,\mathscr {G},\lambda ) \end{aligned}$$

and the result follows. \(\square \)
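For readers who want the intermediate steps behind (21), the chain below spells them out. It is only a sketch: it assumes, as the bound in (20) suggests, that the sample paths satisfy \(\int _J X^2(t)\,\hbox {d}t\le M^2\) almost surely under every \(Q^*\) considered; this reading of the hypothesis is an assumption made here, not a quotation from the main text.

$$\begin{aligned} {Q}^*(L_g-L_{g^*_j})^2&={Q}^*\left( \int _J X(t)\,(g-g^*_j)(t)\,\hbox {d}t \right) ^2\\ &\le {Q}^*\left( \int _J X^2(t)\,\hbox {d}t \int _J (g-g^*_j)^2(t)\,\hbox {d}t\right) \\ &=\Vert g-g^*_j\Vert _{2,J}^2\; {Q}^*\left( \int _J X^2(t)\,\hbox {d}t\right) \le M^2\epsilon ^2. \end{aligned}$$

The second line is the Cauchy–Schwarz inequality applied path by path, and the last step uses \(\Vert g-g^*_j\Vert _{2,J}\le \epsilon \) together with the assumed bound on \(\int _J X^2\). Since no property of \(Q^*\) beyond that bound is used, the estimate is uniform in \(Q^*\).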

Proof of Proposition 2:

Under the null hypothesis of equality of distributions, the covariance matrices at the limiting vector of functions, \(C({X,\mathbf {g}_\infty })\) and \(C({Y,\mathbf {g}_\infty })\), coincide. Here \(\mathbf {g}_\infty \) denotes, as in the statement of Proposition 1, the vector of the \(g_{j,\infty }\), \(j\le k\). Since Pollard’s Uniform Entropy Condition holds for \(\mathcal{H}\), the inner product class \(\mathcal{H}\) has the Donsker property. This means that the empirical processes \(\nu _X(L_g)\) and \(\nu _Y(L_g)\), both indexed by \(\mathscr {G}\), converge weakly, as processes indexed by \(\mathscr {G}\), to a limiting Gaussian process and, by Dudley’s asymptotic equicontinuity condition and the assumed convergence of the functions in the vector \(\tilde{\mathbf {g}}\),

$$\begin{aligned} \nu _X(L_{\tilde{\mathbf {g}}})\mathop {\rightarrow }\limits ^{\mathrm{(p)}}\nu _X(L_{\mathbf {g}_\infty }) \text{ and, likewise, } \nu _Y(L_{\tilde{\mathbf {g}}})\mathop {\rightarrow }\limits ^{\mathrm{(p)}}\nu _Y(L_{\mathbf {g}_\infty }). \end{aligned}\tag{22}$$

Let \(\tilde{C}({X,\mathbf {g}_\infty })\) be the sample covariance of the vectors \((L_{g_{1,\infty }}(X_i),\dots ,L_{g_{k,\infty }}(X_i))\), \(i\le m\), and define similarly \(\tilde{C}({Y,\mathbf {g}_\infty })\) for the Y sample. It is clear that, under the null hypothesis, both \(\tilde{C}({X,\mathbf {g}_\infty })\) and \(\tilde{C}({Y,\mathbf {g}_\infty })\) are consistent estimators of \(C({X,\mathbf {g}_\infty })\). Using the independence of the processes \(\nu _X(L_g)\) and \(\nu _Y(L_g)\), it follows that

$$\begin{aligned} \tilde{C}({\mathbf {g}_\infty })=\frac{\alpha ^2+\beta ^2}{m+n-2}\big ((m-1) \tilde{C}({X,\mathbf {g}_\infty })+(n-1)\tilde{C}({Y,\mathbf {g}_\infty })\big ) \end{aligned}\tag{23}$$

is a consistent estimator of the covariance matrix of the vector

$$\begin{aligned} \varphi =\alpha \, \nu _X(L_{\mathbf {g}_\infty })-\beta \,\nu _Y(L_{\mathbf {g}_\infty }). \end{aligned}$$
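The factor \(\alpha ^2+\beta ^2\) in (23) can be traced through a short computation. The sketch below assumes the usual normalisation of the empirical processes, under which the covariance matrix of \(\nu _X(L_{\mathbf {g}_\infty })\) is \(C(X,\mathbf {g}_\infty )\) and that of \(\nu _Y(L_{\mathbf {g}_\infty })\) is \(C(Y,\mathbf {g}_\infty )\); the normalisation is fixed in the main text and is taken as an assumption here.

$$\begin{aligned} \mathrm{Cov}(\varphi )&=\alpha ^2\,\mathrm{Cov}\big (\nu _X(L_{\mathbf {g}_\infty })\big )+\beta ^2\,\mathrm{Cov}\big (\nu _Y(L_{\mathbf {g}_\infty })\big )\\ &=\alpha ^2\,C(X,\mathbf {g}_\infty )+\beta ^2\,C(Y,\mathbf {g}_\infty )=(\alpha ^2+\beta ^2)\,C(X,\mathbf {g}_\infty ). \end{aligned}$$

The cross terms vanish because the two samples are independent, and the last equality uses the null hypothesis. The pooled sample covariance in (23) estimates \(C(X,\mathbf {g}_\infty )\) consistently, which yields the stated estimator.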

Since \(L_{\mathbf {g}_\infty }\) is a fixed set of functionals, from the usual k-dimensional Central Limit Theorem and Slutsky’s theorem, it follows that

$$\begin{aligned} \varphi ^t \tilde{C}({\mathbf {g}_\infty })^{-1}\varphi \mathop {\rightarrow }\limits ^{\mathrm{(d)}}\chi ^2_k. \end{aligned}\tag{24}$$

In view of (22), the same limit is obtained if we replace \(\varphi \) by \(\gamma \) in (24). Thus, by Slutsky’s theorem again, it only remains to show that \(\tilde{C}({X,\tilde{\mathbf {g}}})\) converges pointwise, in probability, to \(C({X,\mathbf {g}_\infty })\). But using inequalities (20) and (21), it is easy to see that the covariance matrix \(C(X,\tilde{\mathbf {g}})\) is a continuous function of the vector \(\tilde{\mathbf {g}}\), with respect to the norm of \(\mathcal{L}^2(J)\). Thus, by the triangle inequality, it suffices to have a uniform law of large numbers for the class

$$\begin{aligned} \mathcal{H}^{(2)}=\{L_g\,L_f:\> g,f\in \mathscr {G}\} \end{aligned}\tag{25}$$

and for the class \(\mathcal{H}\) as well. Now, let \(Q^*\) be a probability law on \(\mathcal{X}\) and \(g,g',f,f'\) functions in \(\mathscr {G}\). Then, using Proposition 1, we get

$$\begin{aligned} Q^*|L_g\,L_f-L_{g'}\,L_{f'}|&\le Q^*(|L_g-L_{g'}||L_{f'}|)+Q^*(|L_f-L_{f'}||L_{g}|)\\ &\le C(Q^*(|L_g-L_{g'}|)+Q^*(|L_f-L_{f'}|)), \end{aligned}$$

for the constant C in that proposition. It follows that

$$\begin{aligned} N_1(\epsilon ,\mathcal{H}^{(2)},Q^*)\le N^2_1 \left( \frac{\epsilon }{2C},\mathcal{H},Q^*\right) \le N^2_2 \left( \frac{\epsilon }{2C},\mathcal{H},Q^*\right) , \end{aligned}$$

and since the covering number \(N_2({\epsilon }/{2C},\mathcal{H})\) satisfies Pollard’s uniform entropy condition (16), the same will hold for \(\sup _{Q^*}N_1(\epsilon ,\mathcal{H}^{(2)},Q^*)\) (squaring the covering number does not affect the entropy condition), and this is more than enough for a Uniform Law of Large Numbers for \(\mathcal{H}^{(2)}\). The argument for \(\mathcal{H}\) is simpler and omitted, and the proof of Proposition 2 is complete. \(\square \)
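The quadratic form in (24) can be illustrated numerically. The following is a minimal sketch and not the authors' implementation. It assumes curves observed on a uniform grid, a fixed (not data-driven) vector of functions \(g_j\), trapezoidal integration for \(L_g\), and the weights \(\alpha =\sqrt{n/(m+n)}\), \(\beta =\sqrt{m/(m+n)}\), so that \(\alpha ^2+\beta ^2=1\) and \(\varphi \) reduces to \(\sqrt{mn/(m+n)}\) times the difference of the sample means of the projections. All function names and the simulated data are hypothetical.

```python
import math
import random


def trapezoid(values, dt):
    """Trapezoidal rule on a uniform grid with spacing dt."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def project(curves, gs, dt):
    """Row i, column j holds L_{g_j}(X_i), the integral of X_i(t) g_j(t) dt."""
    return [[trapezoid([x * g for x, g in zip(curve, gvals)], dt) for gvals in gs]
            for curve in curves]


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of the centred rows, i.e. (size - 1) times the sample covariance."""
    k = len(mean)
    s = [[0.0] * k for _ in range(k)]
    for row in rows:
        c = [a - b for a, b in zip(row, mean)]
        for i in range(k):
            for j in range(k):
                s[i][j] += c[i] * c[j]
    return s


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def quadratic_form_statistic(x_curves, y_curves, gs, dt):
    """phi^t C^{-1} phi with the pooled covariance of (23), assuming alpha^2 + beta^2 = 1."""
    m, n, k = len(x_curves), len(y_curves), len(gs)
    px, py = project(x_curves, gs, dt), project(y_curves, gs, dt)
    mx, my = mean_vector(px), mean_vector(py)
    sx, sy = scatter_matrix(px, mx), scatter_matrix(py, my)
    pooled = [[(sx[i][j] + sy[i][j]) / (m + n - 2) for j in range(k)] for i in range(k)]
    scale = math.sqrt(m * n / (m + n))
    phi = [scale * (a - b) for a, b in zip(mx, my)]
    return sum(p * q for p, q in zip(phi, solve(pooled, phi)))


# Hypothetical simulated data: random combinations of one sine and one cosine.
random.seed(1)
num = 101
dt = 1.0 / (num - 1)
ts = [i * dt for i in range(num)]
gs = [[math.sin(2 * math.pi * t) for t in ts],
      [math.cos(2 * math.pi * t) for t in ts]]


def simulate(size, shift=0.0):
    curves = []
    for _ in range(size):
        a, b = random.gauss(shift, 1.0), random.gauss(0.0, 1.0)
        curves.append([a * math.sin(2 * math.pi * t) + b * math.cos(2 * math.pi * t)
                       for t in ts])
    return curves


x_curves = simulate(60)
y_curves = simulate(80)
stat = quadratic_form_statistic(x_curves, y_curves, gs, dt)
# For k = 2 the chi-squared survival function is exp(-t / 2).
p_value = math.exp(-stat / 2.0)
print(stat, p_value)
```

With a fixed vector of functions the statistic is a Hotelling-type quadratic form in the projected sample means. Proposition 2 is what justifies replacing the fixed functions by data-driven ones \(\tilde{\mathbf {g}}\) that converge to \(\mathbf {g}_\infty \); the sketch does not attempt that step.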


Cite this article

Bárcenas, R., Ortega, J. & Quiroz, A.J. Quadratic forms of the empirical processes for the two-sample problem for functional data. TEST 26, 503–526 (2017). https://doi.org/10.1007/s11749-017-0522-x


  • DOI: https://doi.org/10.1007/s11749-017-0522-x
