Abstract
A conventional regression model for functional data expresses a response variable in terms of a predictor function. Two assumptions are usually added to the model: (i) the predictor function and the error are independent, and (ii) the relationship between the response variable and the predictor function follows a functional linear model. Checking the validity of these two assumptions is fundamental to statistical inference and practical applications. We develop a test procedure that checks both assumptions simultaneously based on generalized distance covariance. We establish the asymptotic theory for the proposed test under the null and alternative hypotheses, and provide a bootstrap procedure to obtain the critical value of the test. The proposed test is consistent against all alternatives provided that the semimetrics underlying the generalized distance are of strong negative type, and it can be readily extended to other functional regression models. We explore the finite-sample performance of the proposed test through both simulations and real data examples. The results illustrate that the proposed method performs favorably compared with the competing method.
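To fix ideas, the following minimal sketch outlines the overall shape of such a residual-based independence test. It is not the authors' implementation: the function names (dcov_stat, residual_dcov_test), the ridge-type slope estimate, the ordinary Euclidean distance covariance, and the permutation calibration are illustrative stand-ins for the paper's generalized distance covariance and bootstrap procedure.

import numpy as np

def dcov_stat(D_x, D_e):
    # V-statistic form of (squared) distance covariance from two pairwise
    # distance matrices: double-center each matrix, then average the product.
    J = lambda D: D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    return (J(D_x) * J(D_e)).mean()

def residual_dcov_test(X, y, n_perm=499, seed=0):
    # X: n x m matrix of predictor curves on an equally spaced grid of [0, 1];
    # y: length-n vector of scalar responses.
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = X @ X.T / m                                   # Gram matrix of <X_i, X_j> in L^2([0, 1])
    alpha = np.linalg.solve(G + 1e-2 * np.eye(n), y - y.mean())
    resid = (y - y.mean()) - G @ alpha                # residuals of a crude ridge-type linear fit
    D_x = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).mean(-1))  # L^2 distances between curves
    D_e = np.abs(resid[:, None] - resid[None, :])     # distances between residuals
    stat = dcov_stat(D_x, D_e)
    # permutation calibration: permuting the residuals breaks any dependence on X
    null = [dcov_stat(D_x, D_e[np.ix_(p, p)])
            for p in (rng.permutation(n) for _ in range(n_perm))]
    pval = (1 + sum(s >= stat for s in null)) / (n_perm + 1)
    return stat, pval

Large values of the statistic, relative to the resampling distribution, indicate either dependence between the predictor and the error or a departure from the linear form, which is exactly the joint null hypothesis targeted by the test.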
References
Aneiros-Pérez, G., & Vieu, P. (2006). Semi-functional partial linear regression. Statistics and Probability Letters, 76(11), 1102–1110.
Cai, T. T., & Hall, P. (2006). Prediction in functional linear regression. The Annals of Statistics, 34(5), 2159–2179.
Cai, T. T., & Yuan, M. (2012). Minimax and adaptive prediction for functional linear regression. Journal of the American Statistical Association, 107(499), 1201–1216.
Cardot, H., Ferraty, F., Mas, A., & Sarda, P. (2003). Testing hypotheses in the functional linear model. Scandinavian Journal of Statistics, 30(1), 241–255.
Cardot, H., Mas, A., & Sarda, P. (2007). CLT in functional linear regression models. Probability Theory and Related Fields, 138(3–4), 325–361.
Crambes, C., Kneip, A., Sarda, P., et al. (2009). Smoothing splines estimators for functional linear regression. The Annals of Statistics, 37(1), 35–72.
Cuesta-Albertos, J. A., García-Portugués, E., Febrero-Bande, M., González-Manteiga, W., et al. (2019). Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes. The Annals of Statistics, 47(1), 439–467.
Delsol, L., Ferraty, F., & Vieu, P. (2011). Structural test in regression on functional variables. Journal of Multivariate Analysis, 102(3), 422–447.
Fukumizu, K., Gretton, A., Schölkopf, B., & Sriperumbudur, B. K. (2009). Characteristic kernels on groups and semigroups. Advances in Neural Information Processing Systems, 473–480.
García-Portugués, E., González-Manteiga, W., & Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3), 761–778.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2008). A kernel statistical test of independence. Advances in Neural Information Processing Systems, 585–592.
Hall, P., & Horowitz, J. L. (2007). Methodology and convergence rates for functional linear regression. Annals of Statistics, 35(1), 70–91.
Hall, P., & Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 109–126.
Hilgert, N., Mas, A., Verzelen, N., et al. (2013). Minimax adaptive tests for the functional linear model. The Annals of Statistics, 41(2), 838–869.
Horváth, L., & Reeder, R. (2013). A test of significance in functional quadratic regression. Bernoulli, 19(5A), 225–232.
Huang, L., Wang, H., & Zheng, A. (2014). The M-estimator for functional linear regression model. Statistics and Probability Letters, 88, 165–173.
Kokoszka, P., Maslova, I., Sojka, J., & Zhu, L. (2008). Testing for lack of dependence in the functional linear model. Canadian Journal of Statistics, 36(2), 207–222.
Koroljuk, V., & Borovskich, Y. (1994). Theory of U-statistics. Dordrecht: Springer.
Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41(5), 3284–3305.
Pan, W., Wang, X., Zhang, H., Zhu, H., & Zhu, J. (2019). Ball covariance: A generic measure of dependence in Banach space. Journal of the American Statistical Association, 1–24.
Patilea, V., Sanchez-Sellero, C., & Saumard, M. (2012). Projection-based nonparametric goodness-of-fit testing with functional covariates. arXiv preprint arXiv:1205.5578.
Patilea, V., Sánchez-Sellero, C., & Saumard, M. (2016). Testing the predictor effect on a functional response. Journal of the American Statistical Association, 111(516), 1684–1695.
Ramsay, J., Hooker, G., & Graves, S. (2009). Functional Data Analysis with R and MATLAB. New York: Springer.
Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis. New York: Springer.
Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K., et al. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5), 2263–2291.
Sen, A., & Sen, B. (2014). Testing independence and goodness-of-fit in linear models. Biometrika, 101(4), 927–942.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Hoboken: Wiley.
Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Lanckriet, G., & Schölkopf, B. (2008). Injective Hilbert space embeddings of probability measures. In 21st Annual Conference on Learning Theory (COLT 2008), pages 111–122. Omnipress.
Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Schölkopf, B., & Lanckriet, G. R. (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11(Apr), 1517–1561.
Székely, G. J., & Rizzo, M. L. (2009). Brownian distance covariance. The Annals of Applied Statistics, 3(4), 1236–1265.
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. B. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794.
Vaart, A. W. V. D., & Wellner, J. A. (1996). Weak Convergence and Empirical Processes. New York: Springer.
van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge: Cambridge University Press.
Yuan, M., Cai, T. T., et al. (2010). A reproducing kernel Hilbert space approach to functional linear regression. The Annals of Statistics, 38(6), 3412–3444.
Acknowledgements
We would like to thank the editor and the anonymous reviewers for thoughtful comments that led to a substantial improvement of the paper. This work was supported by the NSFC (Grant No. 11771032).
Appendix: technical proofs
We prove the theorems in the Appendix using the asymptotic theory of V-statistics, which can be found in Koroljuk and Borovskich (1994).
Proof of Theorem 1
We divide the proof into three steps as follows.
Step 1: Decomposition of \(T_n\)
Observe that
Based on this and by a Taylor expansion, it holds that
where
for some point \((\varsigma _{ij}, \tau _{ij})\) on the straight line connecting the two points \(({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j)\) and \((\varepsilon _i, \varepsilon _j)\) in \({\mathbb {R}}^2\), where \(L^{2*}([0, 1])\) denotes the space of linear operators from \(L^2([0, 1])\) to \(L^2([0, 1])\). By (8), \(T_n\) can be decomposed as follows
where
and \(R_n\) is the remainder term.
We can express \(T_{n}^{(p)}\), \(p\in \{0, 1, 2\}\), as V-statistics of the form
for the symmetric kernel \(h^{(p)}\) defined as
where \(Z_i=(X_i, \varepsilon _i)\) and the sum is taken over all 4! permutations of (i, j, q, r).
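To make the symmetrization explicit, the sketch below (with a generic four-argument kernel standing in for \(h^{(p)}\) and illustrative names symmetrized and v_statistic) averages a kernel over all 4! orderings of its arguments and evaluates the corresponding fourth-order V-statistic by brute force. This is only a didactic \(O(n^4)\) implementation; in practice the statistic is computed from pairwise quantities as in the sketch after the abstract.

from itertools import permutations, product

def symmetrized(kernel):
    # Average of `kernel` over all 4! orderings of its four arguments.
    def h_sym(z1, z2, z3, z4):
        args = (z1, z2, z3, z4)
        return sum(kernel(*[args[p] for p in perm])
                   for perm in permutations(range(4))) / 24.0
    return h_sym

def v_statistic(h, sample):
    # V_n = n^{-4} * sum over all index tuples (i, j, q, r) of h(Z_i, Z_j, Z_q, Z_r).
    n = len(sample)
    return sum(h(sample[i], sample[j], sample[q], sample[r])
               for i, j, q, r in product(range(n), repeat=4)) / n ** 4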
Step 2: Negligibility of the remainder term \(R_n\)
In this part we will show that
Denote
then
Since \(\Vert \sqrt{n}({\hat{\beta }}-\beta )\Vert =O_p(1)\), we only need to show that \(\Vert Q_n\Vert =o_p(1)\). Note that \(Q_n\) is the sum of three terms, each of which can be shown to converge to zero in probability. We show this only for the first term; the other two can be handled in a similar way. Using the Lipschitz continuity of \(l_{xx}\), \(l_{yy}\) and \(l_{xy}\), we have
Therefore, \(\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(2)}_{ij})\) is bounded by
By condition C2(b) and the weak law of large numbers for V-statistics,
hence \(\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(2)}_{ij})=o_\text {p}(1)\). Applying similar techniques to the other two terms, we obtain \(\Vert Q_n\Vert =o_p(1)\).
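For completeness, the elementary bound that Lipschitz continuity delivers at this point has the following generic form (a sketch with an unspecified common Lipschitz constant \(L\), assuming the residuals are \({\hat{\varepsilon }}_i=Y_i-\langle X_i, {\hat{\beta }}\rangle\); the exact display above may differ): for \(f\in \{l_{xx}, l_{yy}, l_{xy}\}\),
\[
|f(\varsigma _{ij}, \tau _{ij})-f(\varepsilon _i, \varepsilon _j)| \le L\bigl(|\varsigma _{ij}-\varepsilon _i|+|\tau _{ij}-\varepsilon _j|\bigr) \le L\bigl(|{\hat{\varepsilon }}_i-\varepsilon _i|+|{\hat{\varepsilon }}_j-\varepsilon _j|\bigr) \le L\Vert {\hat{\beta }}-\beta \Vert \bigl(\Vert X_i\Vert +\Vert X_j\Vert \bigr),
\]
where the second inequality uses that \((\varsigma _{ij}, \tau _{ij})\) lies on the segment joining \(({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j)\) and \((\varepsilon _i, \varepsilon _j)\), and the last uses \(|{\hat{\varepsilon }}_i-\varepsilon _i|=|\langle X_i, \beta -{\hat{\beta }}\rangle |\le \Vert X_i\Vert \Vert {\hat{\beta }}-\beta \Vert\) by the Cauchy–Schwarz inequality.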
Step 3: Finding the limiting distribution
By (9) and (12), it is enough to show that the following term
converges in distribution. By conditions C2(c)-(e), \(\text {E}\{|h^{(p)}(Z_i, Z_j, Z_q, Z_r)|^2\}<\infty\) for \(1\le i, j, q, r \le 4\), \(p=0, 1\). A direct calculation shows that \(\text {E}\{h^{(0)}(z_1, Z_2, Z_3, Z_4)\}=0\) almost surely, so the kernel \(h^{(0)}\) is degenerate. Denote
and define the V-statistic \({\mathcal {S}}^{(0)}_n\) with kernel \(h_2^{(0)}\), that is,
By the standard results of V-statistics, we have
Define the linear operator \((Af)(s)=\int h^{(0)}_2(s, t)f(t)dP_{X\varepsilon }(t)\) for \(f\in L^2(L^2([0, 1])\times {\mathbb {R}}, P_{X\varepsilon })\), where \(L^2(L^2([0, 1])\times {\mathbb {R}}, P_{X\varepsilon })\) denotes the space consisting of all square integrable functions defined on \(L^2([0, 1])\times {\mathbb {R}}\), and \(P_{X\varepsilon }\) is the joint probability measure of X and \(\varepsilon\). Then the symmetric function \(h^{(0)}_2\) admits an eigenvalue decomposition
where \(\{\gamma _r\}_{r=1}^{\infty }\) and \(\{\phi _r\}_{r=1}^{\infty }\) are the eigenvalues and eigenfunctions of A, respectively, satisfying \(\mathrm {E}[\phi _i(Z)\phi _j(Z)]=\delta _{ij}\). Clearly, we have \(\text {E}[h_2^{(0)}(Z_1, Z_1)]=\sum _{r=1}^{\infty }\gamma _r\) and \(\text {E}[h_2^{(0)}(Z_1, Z_2)^2]=\sum _{r=1}^{\infty }\gamma _r^2\). Since \(\text {E}\{|h^{(0)}(Z_1, Z_2, Z_3, Z_4)|^2\}<\infty\), by the results on page 182 of Serfling (1980), \(\text {E}[h^{(0)}_2(Z_1, Z_2)^2]<\infty\). Similarly, we also have \(\text {E}|h^{(0)}_2(Z_1, Z_1)|<\infty\). Hence, \(|\sum _{r=1}^{\infty }\gamma _r| < \infty\) and \(\sum _{r=1}^{\infty }\gamma _r^2 < \infty\). Note that
In view of this, \(n{\mathcal {S}}_n^{(0)}\) can be expressed as
Now let us turn to the terms \(T_n^{(1)}\) and \(T_n^{(2)}\) in (13). It can be shown that \(\text {E}\{h^{(1)}(Z_1, Z_2, Z_3, Z_4)\} =0\). Define
then, by the standard theory of V-statistics,
Meanwhile, by the weak law of large numbers for V-statistics,
Recall that \(\sqrt{n}({\hat{\beta }}-\beta )=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\psi (Z_i) + o_p(1)\). According to the multivariate central limit theorem and Theorem 1.4.8 in Vaart and Wellner (1996), the countable random sequence
converges in distribution to the joint Gaussian random sequence
where \({\mathcal {Z}}_r\) are i.i.d. N(0, 1) random variables, \({\mathcal {N}}\), \({\mathcal {G}}\) are Gaussian random functions in \(L^2([0, 1])\) with mean zero and covariance functions \(\text {cov}({\mathcal {N}}(s), {\mathcal {N}}(t))=\text {E}\{16h^{(1)}_{1}(X(s), \varepsilon ) h^{(1)}_{1}(X(t), \varepsilon )\}\) and \(\text {cov}({\mathcal {G}}(s), {\mathcal {G}}(t))=\text {E}\{\psi (X(s), \varepsilon ) \psi (X(t), \varepsilon ) \}\), respectively. Then, by the continuous mapping theorem, we have
\(\square\)
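As a reference point for Step 3, the degenerate part admits the standard spectral representation (a sketch based on the eigendecomposition introduced above, assuming \({\mathcal {S}}^{(0)}_n=n^{-2}\sum _{i,j=1}^{n}h^{(0)}_2(Z_i, Z_j)\); the exact displays in the proof may differ):
\[
n{\mathcal {S}}^{(0)}_n=\frac{1}{n}\sum _{i,j=1}^{n}h^{(0)}_2(Z_i, Z_j)=\sum _{r=1}^{\infty }\gamma _r\Bigl(\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\phi _r(Z_i)\Bigr)^{2},
\]
which, by the standard limit theory for degenerate V-statistics, converges in distribution to \(\sum _{r=1}^{\infty }\gamma _r{\mathcal {Z}}_r^{2}\) with \({\mathcal {Z}}_r\) i.i.d. N(0, 1), since degeneracy gives \(\mathrm {E}[\phi _r(Z)]=0\) whenever \(\gamma _r\ne 0\). The limit of \(T_n\) itself also involves the contributions of \(T_n^{(1)}\) and \(T_n^{(2)}\) through the joint Gaussian sequence displayed above.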
Proof of Theorem 2
Recall that
Using a first-order Taylor expansion of \(l_{ij}\), we have, almost surely,
where \(l^{(0)}_{ij}=l(\varepsilon _i, \varepsilon _j)\) and \({\mathcal {V}}_{ij}=-\{l_{x}(\varsigma _{ij}, \tau _{ij})X_i+l_{y}(\varsigma _{ij}, \tau _{ij})X_j\}\) for some point \((\varsigma _{ij}, \tau _{ij})\) on the straight line connecting the two points \(({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j)\) and \((\varepsilon _i, \varepsilon _j)\) in \({\mathbb {R}}^2\). By (14), \(T_n\) can be decomposed into three terms
where \(T_n^{(p)}\), \(p=0, 1\) are defined the same as in the proof of Theorem 1 and
Under \(H_0\), the predictor X and the error \(\varepsilon\) are independent; therefore, the generalized distance covariance \(\theta (X, \varepsilon )\) between X and \(\varepsilon\) is zero. By conditions C4(c)-(e), \(\text {E}\{|h^{(p)}(Z_i, Z_j, Z_q, Z_r)|\}<\infty\) for \(1\le i, j, q, r \le 4\), \(p=0, 1\). By the law of large numbers for V-statistics,
Therefore \(T_n^{(1)}=O_p(1)\). Observe that
By condition C3, \(\Vert {\hat{\beta }}-\beta \Vert = o_p(1)\). We only need to show that \(\Vert R_n\Vert =O_p(1)\). Note that \(R_n\) is a sum of three terms, each of which can be shown to be bounded in probability. We show this only for the first term; the other two can be handled in a similar way. Using the Lipschitz continuity of \(l_{x}\) and \(l_{y}\), we have
Therefore, \(\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(1)}_{ij})\) is bounded by
By condition C4(b) and the weak law of large numbers for V-statistics,
hence \(\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(1)}_{ij})=O_\text {p}(1)\). With similar techniques for the other two terms, we obtain \(\Vert R_n\Vert =O_p(1)\). \(\square\)
Proof of Theorem 3
Let \(\epsilon _i=m(X_i)-\langle X_i, {\tilde{\beta }}\rangle + \varepsilon _i\). Even though m(x) might not be linear, \(\langle X, {\tilde{\beta }}\rangle\) is the closest function to m(x) in \({\mathcal {M}}_{L^2([0, 1])}\) in the sense of squared loss. By the consistency of the M-estimator (see Corollary 3.2.3 in Vaart and Wellner 1996), the estimator \({\hat{\beta }}\) converges in probability to \({\tilde{\beta }}\), that is, \(\Vert {\hat{\beta }}-{\tilde{\beta }}\Vert =o_{\text {p}}(1)\). Using a Taylor expansion, we have, almost surely,
where \((\varsigma _{ij}, \tau _{ij})\) is some point on the line connecting the two points \(({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j)\) and \((\epsilon _{i}, \epsilon _{j})\). Note that
we can decompose
where
and
We will show that \(T_n^{(0)}{\mathop {\rightarrow }\limits ^{\text {p}}}\tau\) for some \(\tau > 0\), \(\langle {\hat{\beta }}-{\tilde{\beta }}, T_n^{(1)}\rangle =o_\text {p}(1)\) and \(R_n=o_\text {p}(1)\). By the results in the proof of Theorem 2, \(\langle {\hat{\beta }}-{\tilde{\beta }}, T_n^{(1)}\rangle =o_\text {p}(1)\) and \(R_n=o_\text {p}(1)\).
Now we show that \(T_n^{(0)}{\mathop {\rightarrow }\limits ^{\text {p}}}\tau\) with \(\tau > 0\). Using the same arguments as in the proof of Theorem 1, \(T_n^{(0)}\) is a V-statistic. By the weak law of large numbers for V-statistics, \(T_n^{(0)}\) converges in probability to the generalized distance covariance of X and \(\epsilon\), \(\theta (X, \epsilon )\). Under \(H_{1, 1}\), \(\epsilon =\varepsilon\), hence X and \(\epsilon\) are dependent. Under scenarios \(H_{1, 2}\) or \(H_{1, 3}\), \(m(X)\not = \langle X, {\tilde{\beta }}\rangle\) with positive probability, and the conditional mean of \(\epsilon\) given X is
With the condition that \(m(X)-\langle X, {\tilde{\beta }}\rangle\) is a non-constant function of X, \(\text {E}(\epsilon |X)\) depends on X, and hence X and \(\epsilon\) are dependent. Since k and l are of strong negative type, \(\tau = \theta (X, \epsilon ) >0\). \(\square\)
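For reference, the conditional-mean identity invoked above follows directly from the definition of \(\epsilon\) (a one-line sketch, assuming the model error satisfies \(\mathrm {E}(\varepsilon |X)=0\)):
\[
\mathrm {E}(\epsilon |X)=\mathrm {E}\{m(X)-\langle X, {\tilde{\beta }}\rangle +\varepsilon |X\}=m(X)-\langle X, {\tilde{\beta }}\rangle ,
\]
which is a non-constant function of X under \(H_{1, 2}\) or \(H_{1, 3}\), so X and \(\epsilon\) cannot be independent.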