Quadratic forms of the empirical processes for the two-sample problem for functional data

Bárcenas, R.; Ortega, J.; Quiroz, A. J.

doi:10.1007/s11749-017-0522-x

Original Paper
Published: 24 January 2017

TEST, Volume 26, pages 503–526 (2017)

R. Bárcenas¹, J. Ortega¹ & A. J. Quiroz²

Abstract

The use of quadratic forms of the empirical process for the two-sample problem in the context of functional data is considered. The convergence of the family of statistics proposed to a Chi-squared limit is established under metric entropy conditions for smooth functional data. The applicability of the proposed methodology is evaluated in simulations and real data examples.

References

Alvarez-Esteban P, Euán C, Ortega J (2016a) Time series clustering using the total variation distance with applications in oceanography. Environmetrics 27:355–369
Alvarez-Esteban P, Euán C, Ortega J (2016b) Statistical analysis of stationary intervals for random waves. In: Proceedings of 26th international offshore and polar engineering conference (ISOPE), vol 3, pp 305–311
Benko M, Härdle W, Kneip A (2009) Common functional principal components. Ann Stat 37:1–34
Borgman LE (1972) Statistical models for ocean waves and wave forces. In: Chow VT (ed) Advances in hydroscience, vol 8. Academic Press, New York
Bosq D (2000) Linear processes in function spaces. Lecture Notes in Statistics, vol 149. Springer, New York
Brodtkorb PA, Johannesson P, Lindgren G, Rychlik I, Rydén E, Sjö E (2000) WAFO—a Matlab toolbox for analysis of random waves and loads. In: Proceedings of 10th international offshore and polar engineering conference (ISOPE), vol III, Seattle, USA, pp 343–350
Cuevas A (2013) A partial overview of the theory of statistics with functional data. J Stat Plann Inference 147:1–23
Dudley RM (1987) Universal Donsker classes and metric entropy. Ann Probab 15:1306–1326
Ermakov MS (1998) Asymptotic minimaxity of chi-square tests. Theory Probab Appl 42(4):589–610
Ferraty F (ed) (2011) Recent advances in functional data analysis and related topics. Physica Verlag, Berlin
Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, New York
Fremdt S, Horváth L, Kokoszka P, Steinebach JG (2014) Functional data analysis with increasing number of projections. J Multivar Anal 124:313–332
Fremdt S, Steinebach JG, Horváth L, Kokoszka P (2013) Testing the equality of covariance operators in functional samples. Scand J Stat 40:138–152
Good P (2005) Permutation, parametric and bootstrap tests of hypothesis. Springer, New York
Gorrostieta C, Ortega J, Quiroz AJ, Smith GH (2014) Characterization of storm wave asymmetries with functional data analysis. Environ Ecol Stat 21(2):263–283
Götze F, Tikhomirov A (2002) Asymptotic distribution of quadratic forms and applications. J Theor Probab 15(2):423–475
Hall P, Van Keilegom I (2007) Two sample tests in functional data analysis starting from discrete data. Stat Sin 17:1511–1531
Horváth L, Kokoszka P (2009) Two sample inference in functional linear models. Can J Stat 37:571–591
Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
Horváth L, Kokoszka P, Reeder R (2013) Estimation of the mean of functional time series and a two-sample problem. J R Stat Soc Ser B 75:103–122
Horváth L, Rice G (2015a) Testing equality of means when the observations are from functional times series. J Time Ser Anal 36:84–108
Horváth L, Rice G (2015b) An introduction to functional data analysis and a principal component approach for testing the equality of mean curves. Revista Matemática Complutense 28:505–548
Longuet-Higgins M (1956) Statistical properties of a moving wave form. Proc Camb Philos Soc 52(2):234–245
Longuet-Higgins M (1957) The statistical analysis of a random moving surface. Philos Trans R Soc Lond Ser A 249(966):321–387
Mas A (2007) Testing for the mean of random curves: a penalization approach. Stat Inference Stoch Process 10:147–163
Mikosch T (1991) Functional limit theorems for random quadratic forms. Stoch Process Appl 37:81–98
Muñoz Maldonado Y, Staniswalis JG, Irwin LN, Byers D (2002) A similarity analysis of curves. Can J Stat 30:373–381
Ochi MK (1998) Ocean waves: the stochastic approach. Cambridge ocean technology series. Cambridge University Press, Cambridge
Paparoditis E, Sapatinas Th (2014) Bootstrap-based testing for functional data. arXiv:1409.4317v1 [math.ST]
Peña J (2012) Propuestas para el problema de dos muestras con datos funcionales. Tesis de maestría. Universidad de Los Andes, Colombia
Pierson WJ Jr (1955) Wind-generated gravity waves. Adv Geophys 2:93–178
Pollard D (1982) A central limit theorem for empirical processes. J Aust Math Soc Ser A 33:235–248
Pollard D (1984) Convergence of stochastic processes. Springer, New York
Pomann G-M, Staicu A-M, Ghosh S (2016) A two-sample distribution-free test for functional data with application to a diffusion tensor imaging study of multiple sclerosis. J R Stat Soc Ser C 65:395–414
Ramsay JO, Silverman BW (2002) Applied functional data analysis. Springer, New York
Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York
Torsethaugen K (1993) A two-peak wave spectrum model. In: Proceedings of the 18th international conference on ocean, offshore and arctic engineering (OMAE), vol II, pp 175–180
Torsethaugen K, Haver S (2004) Simplified double peak spectral model for ocean waves. In: Proceedings of the 14th international offshore and polar engineering conference, pp 23–28
van der Vaart A (1996) New Donsker classes. Ann Probab 24:2128–2140
van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, Cambridge
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer series in statistics. Springer, New York
Zhang X, Shao X (2015) Two sample inference for the second-order property of temporally dependent functional data. Bernoulli 21:909–929

Acknowledgements

The software WAFO (Brodtkorb et al. 2000) developed by the Wafo group at Lund University of Technology, Sweden, available at http://www.maths.lth.se/matstat/wafo was used for the calculation of all Fourier spectra and associated spectral characteristics as well as for the simulation of Gaussian random waves. The data for station 106 were furnished by the Coastal Data Information Program (CDIP), Integrative Oceanographic Division, operated by the Scripps Institution of Oceanography, under the sponsorship of the US Army Corps of Engineers and the California Department of Boating and Waterways (http://cdip.ucsd.edu/). This work was partially supported by CONACYT, Mexico, Proyectos 169175 (Análisis Estadístico de Olas Marinas, Fase II) and 234057 (Análisis Espectral, Datos Funcionales y Aplicaciones). It was finished while J.O. was visiting the Departamento de Estadística e I.O., Universidad de Valladolid, on sabbatical leave from CIMAT and with support from CONACYT, México. The hospitality and support of the department are gratefully acknowledged.

Author information

Authors and Affiliations

Dpto. de Probabilidad y Estadística, CIMAT, Guanajuato, Mexico
R. Bárcenas & J. Ortega
Dpto. de Matemáticas, Universidad de Los Andes, Carrera 1, Nro. 18A-10, edificio H, Bogotá, Colombia
A. J. Quiroz

Corresponding author

Correspondence to J. Ortega.

Appendix: Proof of results

Proof of Proposition 1:

For each random function X on J, and $L_g\in \mathcal{H}$, by the Cauchy–Schwarz inequality, we have

$$\begin{aligned} |L_g(X)|\le \sqrt{\int X^2(t)\hbox {d}t\int g^2(t)\hbox {d}t}\le \sqrt{M^2\int F^2(t)\hbox {d}t}, \end{aligned}$$

(20)

by hypothesis. Next, let $g^*_1,g^*_2,\dots ,g^*_l$ be a minimal set of functions such that, for every $g\in \mathscr {G}$, there exists $j\le l$ for which $\Vert g-g^*_j\Vert _{2,J}\le \epsilon $. Let $Q^*$ be a probability measure on $\mathcal{X}$. Then,

$$\begin{aligned} {Q}^*(L_g-L_{g^*_j})^2={Q}^*\left( \int X(t)(g-g^*_j)(t)\hbox {d}t \right) ^2\le M^2\epsilon ^2, \end{aligned}$$

(21)

again by the Cauchy–Schwarz inequality, and independently of the particular ${Q}^*$. It follows that, for an appropriate choice of a positive constant $\gamma $,

$$\begin{aligned} N_2(C\epsilon ,\mathcal{H})\le N_2(\epsilon \gamma ,\mathscr {G},\lambda ) \end{aligned}$$

and the result follows. $\square $
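To spell out the step behind (21), here is a short sketch. It assumes, as the bounds in (20) and (21) suggest, that the hypothesis gives $\int _J X^2(t)\hbox {d}t\le M^2$ almost surely; this condition is not restated in the appendix. For any probability measure $Q^*$ on $\mathcal{X}$ and any $g$ with $\Vert g-g^*_j\Vert _{2,J}\le \epsilon $,

$$\begin{aligned} Q^*(L_g-L_{g^*_j})^2\le Q^*\left( \int X^2(t)\hbox {d}t\int (g-g^*_j)^2(t)\hbox {d}t\right) \le M^2\Vert g-g^*_j\Vert ^2_{2,J}\le M^2\epsilon ^2. \end{aligned}$$

Under that assumption, the functionals $L_{g^*_1},\dots ,L_{g^*_l}$ form an $M\epsilon $-net of $\mathcal{H}$ in $\mathcal{L}^2(Q^*)$ whenever $g^*_1,\dots ,g^*_l$ form an $\epsilon $-net of $\mathscr {G}$ in $\mathcal{L}^2(J)$. The number of centers $l$ does not depend on $Q^*$, which is what a uniform covering-number comparison requires.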

Proof of Proposition 2:

Under the null hypothesis of equality of distributions, the covariance matrices at the limiting vector of functions, $C({X,\mathbf {g}_\infty })$ and $C({Y,\mathbf {g}_\infty })$, are the same. Here $\mathbf {g}_\infty $ denotes, as in the statement of Proposition 1, the vector of the $g_{j,\infty }$, $j\le k$. Since Pollard’s uniform entropy condition holds for $\mathcal{H}$, the inner product class $\mathcal{H}$ has the Donsker property. This means that the empirical processes $\nu _X(L_g)$ and $\nu _Y(L_g)$, both indexed by $\mathscr {G}$, converge uniformly to a limiting Gaussian process and, by Dudley’s asymptotic equicontinuity condition and the assumed convergence of the functions in the vector $\tilde{\mathbf {g}}$,

$$\begin{aligned} \nu _X(L_{\tilde{\mathbf {g}}})\mathop {\rightarrow }\limits ^{\mathrm{(p)}}\nu _X(L_{\mathbf {g}_\infty }) \text{ and, } \text{ likewise, } \nu _Y(L_{\tilde{\mathbf {g}}})\mathop {\rightarrow }\limits ^{\mathrm{(p)}}\nu _Y(L_{\mathbf {g}_\infty }). \end{aligned}$$

(22)

Let $\tilde{C}({X,\mathbf {g}_\infty })$ be the sample covariance of the vectors $(L_{g_{1,\infty }}(X_i),\dots ,L_{g_{k,\infty }}(X_i))$, $i\le m$, and define similarly $\tilde{C}({Y,\mathbf {g}_\infty })$ for the Y sample. It is clear that, under the null hypothesis, both $\tilde{C}({X,\mathbf {g}_\infty })$ and $\tilde{C}({Y,\mathbf {g}_\infty })$ are consistent estimators of $C({X,\mathbf {g}_\infty })$. Using the independence of the processes $\nu _X(L_g)$ and $\nu _Y(L_g)$, it follows that

$$\begin{aligned} \tilde{C}({\mathbf {g}_\infty })=\frac{\alpha ^2+\beta ^2}{m+n-2}\big ((m-1) \tilde{C}({X,\mathbf {g}_\infty })+(n-1)\tilde{C}({Y,\mathbf {g}_\infty })\big ), \end{aligned}$$

(23)

is a consistent estimator of the covariance matrix of the vector

$$\begin{aligned} \varphi =\alpha \, \nu _X(L_{\mathbf {g}_\infty })-\beta \,\nu _Y(L_{\mathbf {g}_\infty }). \end{aligned}$$

Since $L_{\mathbf {g}_\infty }$ is a fixed set of functionals, from the usual k-dimensional Central Limit Theorem and Slutsky’s theorem, it follows that

$$\begin{aligned} \varphi ^t \tilde{C}({\mathbf {g}_\infty })^{-1}\varphi \mathop {\rightarrow }\limits ^{\mathrm{(d)}}\chi ^2_k \end{aligned}$$

(24)
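The factor $\alpha ^2+\beta ^2$ in (23) can be checked directly. The following is a sketch that assumes the usual normalisation $\nu _X=\sqrt{m}(P_m-P)$, and similarly for $\nu _Y$ with $n$; the appendix does not restate these definitions. Under that assumption the vectors $\nu _X(L_{\mathbf {g}_\infty })$ and $\nu _Y(L_{\mathbf {g}_\infty })$ have covariance matrices $C({X,\mathbf {g}_\infty })$ and $C({Y,\mathbf {g}_\infty })$, so by independence and the null hypothesis,

$$\begin{aligned} \mathrm{Cov}(\varphi )=\alpha ^2\,C({X,\mathbf {g}_\infty })+\beta ^2\,C({Y,\mathbf {g}_\infty })=(\alpha ^2+\beta ^2)\,C({X,\mathbf {g}_\infty }). \end{aligned}$$

The matrix in (23) is this same factor times a weighted average of two consistent estimators of $C({X,\mathbf {g}_\infty })$. If $\Sigma =\mathrm{Cov}(\varphi )$ is nonsingular and $\varphi $ is asymptotically $N_k(0,\Sigma )$, then $\varphi ^t\Sigma ^{-1}\varphi $ has a $\chi ^2_k$ limit, and replacing $\Sigma $ by a consistent estimator does not change that limit.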

In view of (22), the same limit is obtained if we replace $\varphi $ by $\gamma $ in (24). Thus, by Slutsky’s theorem again, it only remains to show that $\tilde{C}({X,\tilde{\mathbf {g}}})$ converges pointwise, in probability, to $C({X,\mathbf {g}_\infty })$. But using inequalities (20) and (21), it is easy to see that the covariance matrix $C(X,\tilde{\mathbf {g}})$ is a continuous function of the vector $\tilde{\mathbf {g}}$, with respect to the norm of $\mathcal{L}^2(J)$. Thus, by the triangle inequality, it suffices to have a uniform law of large numbers for the class

$$\begin{aligned} \mathcal{H}^{(2)}=\{L_g\,L_f:\> g,f\in \mathscr {G}\} \end{aligned}$$

(25)

and for the class $\mathcal{H}$ as well. Now, let $Q^*$ be a probability law on $\mathcal{X}$ and $g,g',f,f'$ functions in $\mathscr {G}$. Then, using Proposition 1, we get

$$\begin{aligned} Q^*|L_g\,L_f-L_{g'}\,L_{f'}|\le & {} Q^*(|L_g-L_{g'}||L_{f'}|)+Q^*(|L_f-L_{f'}||L_{g}|)\nonumber \\\le & {} C(Q^*(|L_g-L_{g'}|)+Q^*(|L_f-L_{f'}|)), \end{aligned}$$

for the constant C in that Proposition. It follows that,

$$\begin{aligned} N_1(\epsilon ,\mathcal{H}^{(2)},Q^*)\le N^2_1 \left( \frac{\epsilon }{2C},\mathcal{H},Q^*\right) \le N^2_2 \left( \frac{\epsilon }{2C},\mathcal{H},Q^*\right) , \end{aligned}$$

and since the covering number $N_2({\epsilon }/{2C},\mathcal{H})$ satisfies Pollard’s uniform entropy condition (16), the same will hold for $\sup _{Q^*}N_1(\epsilon ,\mathcal{H}^{(2)},Q^*)$ (squaring the covering number does not affect the entropy condition), and this is more than enough for a Uniform Law of Large Numbers for $\mathcal{H}^{(2)}$. The argument for $\mathcal{H}$ is simpler and omitted, and the proof of Proposition 2 is complete. $\square $
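The quadratic form in (23) and (24) can be illustrated numerically. The sketch below is not the authors' implementation. It makes three assumptions: the projection functions $g_1,\dots ,g_k$ are fixed rather than data driven, the integrals $L_g(X)$ are approximated by Riemann sums on a uniform grid, and the weights are $\alpha =\sqrt{n/(m+n)}$ and $\beta =\sqrt{m/(m+n)}$. With these weights $\alpha ^2+\beta ^2=1$ and the unknown common mean cancels in $\alpha \,\nu _X-\beta \,\nu _Y$. The function names `projections`, `quadratic_form_statistic` and `simulate` are hypothetical.

```python
import math
import numpy as np


def projections(curves, weights, dt):
    """Riemann-sum approximation of L_g(X) = int X(t) g(t) dt.

    curves: (n_curves, n_grid), weights: (k, n_grid); returns (n_curves, k).
    """
    return curves @ weights.T * dt


def quadratic_form_statistic(x_curves, y_curves, weights, dt):
    """Two-sample quadratic form with pooled covariance, in the spirit of (23)-(24)."""
    px = projections(x_curves, weights, dt)
    py = projections(y_curves, weights, dt)
    m, n = px.shape[0], py.shape[0]
    # Assumed normalisation: alpha*sqrt(m) = beta*sqrt(n) = sqrt(mn/(m+n)),
    # so alpha*nu_X - beta*nu_Y reduces to a scaled difference of sample means.
    scale = math.sqrt(m * n / (m + n))
    diff = scale * (px.mean(axis=0) - py.mean(axis=0))
    cov_x = np.atleast_2d(np.cov(px, rowvar=False))
    cov_y = np.atleast_2d(np.cov(py, rowvar=False))
    # Pooled estimator as in (23), with alpha^2 + beta^2 = 1.
    pooled = ((m - 1) * cov_x + (n - 1) * cov_y) / (m + n - 2)
    return float(diff @ np.linalg.solve(pooled, diff))


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200, endpoint=False)
dt = grid[1] - grid[0]
# Two fixed orthonormal projection directions (an illustrative choice).
weights = np.vstack([
    math.sqrt(2) * np.sin(2 * np.pi * grid),
    math.sqrt(2) * np.cos(2 * np.pi * grid),
])


def simulate(n_curves, shift=0.0):
    """Smooth random curves: Gaussian scores on g_1, g_2, mean shift along g_1."""
    scores = rng.normal(size=(n_curves, 2))
    scores[:, 0] += shift
    return scores @ weights


x = simulate(60)
y_null = simulate(50)
y_alt = simulate(50, shift=1.0)

for label, y in [("null", y_null), ("shifted", y_alt)]:
    stat = quadratic_form_statistic(x, y, weights, dt)
    # For k = 2 the chi-square survival function is exp(-s/2).
    print(label, round(stat, 3), "approx p-value", round(math.exp(-stat / 2), 4))
```

For general $k$ the statistic would be compared with a $\chi ^2_k$ quantile. The closed form $e^{-s/2}$ used here is specific to $k=2$ and keeps the example free of extra dependencies.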

About this article

Cite this article

Bárcenas, R., Ortega, J. & Quiroz, A.J. Quadratic forms of the empirical processes for the two-sample problem for functional data. TEST 26, 503–526 (2017). https://doi.org/10.1007/s11749-017-0522-x

Received: 02 July 2015
Accepted: 09 January 2017
Published: 24 January 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s11749-017-0522-x
