
Testing independence and goodness-of-fit jointly for functional linear models

  • Research Article
  • Journal of the Korean Statistical Society

Abstract

A conventional regression model for functional data expresses a response variable in terms of a predictor function. Two assumptions are usually added to the model: (i) the predictor function and the error are independent, and (ii) the relationship between the response variable and the predictor function takes the form of a functional linear model. Checking the validity of these two assumptions is fundamental to statistical inference and practical applications. We develop a test procedure, based on generalized distance covariance, that checks both assumptions simultaneously. We establish the asymptotic theory for the proposed test under the null and alternative hypotheses, and provide a bootstrap procedure to obtain the critical value of the test. The proposed test is consistent against all alternatives provided that the semimetrics defining the generalized distance are of strong negative type, and it can be readily extended to other functional regression models. We explore the finite-sample performance of the proposed test through both simulations and real data examples. The results show that the proposed method performs favorably compared with the competing method.
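
As a concrete illustration of the kind of procedure described above, the sketch below shows how a residual-based dependence check for a functional linear model might be assembled in practice. It is a minimal sketch, not the authors' implementation: it assumes the curves are observed on a common grid, estimates the slope by a truncated functional principal components regression (a hypothetical choice of four components), measures dependence between the predictor curves and the residuals with the ordinary distance covariance built from the L2 metric on curves, and calibrates the test by a naive permutation of residuals in place of the bootstrap studied in the paper.

```python
# Minimal sketch (not the authors' implementation) of a joint
# independence / goodness-of-fit check for Y_i = <X_i, beta> + eps_i.
import numpy as np

def l2_distance_matrix(curves, grid):
    """Pairwise L2([0, 1]) distances between discretized curves (n x m array)."""
    diffs = curves[:, None, :] - curves[None, :, :]          # n x n x m
    return np.sqrt(np.trapz(diffs ** 2, grid, axis=2))

def distance_covariance(dx, dy):
    """Squared sample distance covariance from two pairwise distance matrices."""
    def center(d):
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    return (center(dx) * center(dy)).mean()

def fpc_regression_residuals(X, y, grid, n_components=4):
    """Residuals from a crude truncated-FPCA estimate of the functional slope."""
    Xc, yc = X - X.mean(0), y - y.mean()
    cov = (Xc.T @ Xc) / len(y)
    vals, vecs = np.linalg.eigh(cov)
    vecs = vecs[:, ::-1][:, :n_components]                    # leading eigenvectors
    scores = Xc @ vecs * (grid[1] - grid[0])                  # Riemann approx. of <X_i, phi_k>
    coef, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    return yc - scores @ coef

def joint_test(X, y, grid, n_perm=499, seed=0):
    """Statistic n * dCov^2(X, residuals) with a naive permutation p-value."""
    rng = np.random.default_rng(seed)
    eps = fpc_regression_residuals(X, y, grid)
    dx = l2_distance_matrix(X, grid)
    dy = np.abs(eps[:, None] - eps[None, :])
    stat = len(y) * distance_covariance(dx, dy)
    perm_stats = []
    for _ in range(n_perm):
        p = rng.permutation(len(y))
        perm_stats.append(len(y) * distance_covariance(dx, dy[np.ix_(p, p)]))
    pval = (1 + sum(s >= stat for s in perm_stats)) / (n_perm + 1)
    return stat, pval
```

The permutation calibration treats the residuals as if they were the true errors; the paper's bootstrap accounts for the estimation of the slope, so this sketch should be read only as a rough stand-in.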


References

  • Aneiros-Pérez, G., & Vieu, P. (2006). Semi-functional partial linear regression. Statistics and Probability Letters, 76(11), 1102–1110.

  • Cai, T. T., & Hall, P. (2006). Prediction in functional linear regression. The Annals of Statistics, 34(5), 2159–2179.

  • Cai, T. T., & Yuan, M. (2012). Minimax and adaptive prediction for functional linear regression. Journal of the American Statistical Association, 107(499), 1201–1216.

  • Cardot, H., Ferraty, F., Mas, A., & Sarda, P. (2003). Testing hypotheses in the functional linear model. Scandinavian Journal of Statistics, 30(1), 241–255.

  • Cardot, H., Mas, A., & Sarda, P. (2007). CLT in functional linear regression models. Probability Theory and Related Fields, 138(3–4), 325–361.

  • Crambes, C., Kneip, A., & Sarda, P. (2009). Smoothing splines estimators for functional linear regression. The Annals of Statistics, 37(1), 35–72.

  • Cuesta-Albertos, J. A., García-Portugués, E., Febrero-Bande, M., & González-Manteiga, W. (2019). Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes. The Annals of Statistics, 47(1), 439–467.

  • Delsol, L., Ferraty, F., & Vieu, P. (2011). Structural test in regression on functional variables. Journal of Multivariate Analysis, 102(3), 422–447.

  • Fukumizu, K., Gretton, A., Schölkopf, B., & Sriperumbudur, B. K. (2009). Characteristic kernels on groups and semigroups. Advances in Neural Information Processing Systems, 473–480.

  • García-Portugués, E., González-Manteiga, W., & Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3), 761–778.

  • Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2008). A kernel statistical test of independence. Advances in Neural Information Processing Systems, 585–592.

  • Hall, P., & Horowitz, J. L. (2007). Methodology and convergence rates for functional linear regression. The Annals of Statistics, 35(1), 70–91.

  • Hall, P., & Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 109–126.

  • Hilgert, N., Mas, A., & Verzelen, N. (2013). Minimax adaptive tests for the functional linear model. The Annals of Statistics, 41(2), 838–869.

  • Horváth, L., & Reeder, R. (2013). A test of significance in functional quadratic regression. Bernoulli, 19(5A), 225–232.

  • Huang, L., Wang, H., & Zheng, A. (2014). The M-estimator for functional linear regression model. Statistics and Probability Letters, 88, 165–173.

  • Kokoszka, P., Maslova, I., Sojka, J., & Zhu, L. (2008). Testing for lack of dependence in the functional linear model. Canadian Journal of Statistics, 36(2), 207–222.

  • Koroljuk, V., & Borovskich, Y. (1994). Theory of U-statistics. Dordrecht: Springer.

  • Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41(5), 3284–3305.

  • Pan, W., Wang, X., Zhang, H., Zhu, H., & Zhu, J. (2019). Ball covariance: A generic measure of dependence in Banach space. Journal of the American Statistical Association, 1–24.

  • Patilea, V., Sánchez-Sellero, C., & Saumard, M. (2012). Projection-based nonparametric goodness-of-fit testing with functional covariates. arXiv preprint arXiv:1205.5578.

  • Patilea, V., Sánchez-Sellero, C., & Saumard, M. (2016). Testing the predictor effect on a functional response. Journal of the American Statistical Association, 111(516), 1684–1695.

  • Ramsay, J. O., Hooker, G., & Graves, S. (2009). Functional data analysis with R and MATLAB. New York: Springer.

  • Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis. New York: Springer.

  • Sejdinovic, D., Sriperumbudur, B., Gretton, A., & Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5), 2263–2291.

  • Sen, A., & Sen, B. (2014). Testing independence and goodness-of-fit in linear models. Biometrika, 101(4), 927–942.

  • Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Hoboken: Wiley.

  • Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Lanckriet, G., & Schölkopf, B. (2008). Injective Hilbert space embeddings of probability measures. In 21st Annual Conference on Learning Theory (COLT 2008), 111–122. Omnipress.

  • Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Schölkopf, B., & Lanckriet, G. R. (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11, 1517–1561.

  • Székely, G. J., & Rizzo, M. L. (2009). Brownian distance covariance. The Annals of Applied Statistics, 3(4), 1236–1265.

  • Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794.

  • van der Vaart, A. W., & Wellner, J. A. (1996). Weak convergence and empirical processes. New York: Springer.

  • van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge: Cambridge University Press.

  • Yuan, M., & Cai, T. T. (2010). A reproducing kernel Hilbert space approach to functional linear regression. The Annals of Statistics, 38(6), 3412–3444.


Acknowledgements

We would like to thank the editor and the anonymous reviewers for thoughtful comments that led to a substantial improvement of the paper. This work was supported by the NSFC (Grant No. 11771032).

Author information

Corresponding author

Correspondence to Zhongzhan Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: technical proofs

We prove the theorems in this Appendix using the asymptotic theory of V-statistics, which can be found in Koroljuk and Borovskich (1994).

Proof of Theorem 1

We divide the proof into three parts as follows.

Step 1: Decomposition of \(T_n\)

Observe that

$$\begin{aligned} {\hat{\varepsilon }}_{i }&=\varepsilon _{i}-\langle X_{i}, {\hat{\beta }}-\beta \rangle \end{aligned}$$

Based on this and a Taylor expansion, we have

$$\begin{aligned} l_{ij}=l^{(0)}_{ij}+\langle {\hat{\beta }}-\beta , l^{(1)}_{ij}\rangle + \frac{1}{2}\langle {\mathcal {V}}_{ij} ({\hat{\beta }}-\beta ), {\hat{\beta }}-\beta \rangle \end{aligned}$$
(8)

where

$$\begin{aligned}&l^{(0)}_{ij}=l(\varepsilon _i, \varepsilon _j), \quad l^{(1)}_{ij}=-\{l_x(\varepsilon _i, \varepsilon _j)X_i+l_y(\varepsilon _i, \varepsilon _j)X_j\}\in L^2([0, 1]),\\&{\mathcal {V}}_{ij}=\{l_{xx}(\varsigma _{ij}, \tau _{ij})X_i\otimes X_i+l_{yy}(\varsigma _{ij}, \tau _{ij})X_j\otimes X_j+2l_{xy}(\varsigma _{ij},\tau _{ij})X_i\otimes X_j\}\in L^{2*}([0, 1]), \end{aligned}$$

for some point \((\varsigma _{ij}, \tau _{ij})\) on the straight line connecting the two points \(({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j)\) and \((\varepsilon _i, \varepsilon _j)\) on \({\mathbb {R}}^2\), where \(L^{2*}([0, 1])\) denotes the space of linear operators from \(L^2([0, 1])\) to \(L^2([0, 1])\). By (8), \(T_n\) can be decomposed in the following way

$$\begin{aligned} T_n=T_n^{(0)}+\langle {\hat{\beta }} - \beta , T_n^{(1)}\rangle + \frac{1}{2}\langle T_n^{(2)} ({\hat{\beta }}-\beta ), {\hat{\beta }}-\beta \rangle + R_n, \end{aligned}$$
(9)

where

$$\begin{aligned} T_n^{(p)}&=\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}l_{ij}^{(p)} + \frac{1}{n^4}\sum _{i, j, q, r}^{n}k_{ij}l^{(p)}_{qr}-\frac{2}{n^3}\sum _{i,j,q}^{n}k_{ij}l^{(p)}_{iq} \quad (p=0, 1, 2), \\ l^{(2)}_{ij}&=\{l_{xx}(\varepsilon _i, \varepsilon _j)X_i\otimes X_i+l_{yy}(\varepsilon _i, \varepsilon _j)X_j\otimes X_j+2l_{xy}(\varepsilon _i,\varepsilon _j)X_i\otimes X_j\}, \end{aligned}$$

and \(R_n\) is the remainder term.

We can express \(T_{n}^{(p)}\), \(p\in \{0, 1, 2\}\), as V-statistics of the form

$$\begin{aligned} T_n^{(p)}=\frac{1}{n^4}\sum _{i,j,q,r}^{n}h^{(p)}(Z_i, Z_j, Z_q,Z_r) , \end{aligned}$$
(10)

for the symmetric kernel \(h^{(p)}\) defined as

$$\begin{aligned} h^{(p)}(Z_i, Z_j, Z_q, Z_r)=\frac{1}{4!}\sum _{(t, u, v, w)}^{(i, j, q, r)}(k_{tu}l^{(p)}_{tu}+k_{tu}l^{(p)}_{vw}-2k_{tu}l^{(p)}_{tv}), \end{aligned}$$
(11)

where \(Z_i=(X_i, \varepsilon _i)\) and the sum is taken over all 4! permutations \((t, u, v, w)\) of \((i, j, q, r)\).
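
As a sanity check on this representation (not part of the paper), the short script below verifies numerically, on toy Euclidean data standing in for \((X_i, \varepsilon_i)\), that the three double-centered sums defining \(T_n^{(p)}\) coincide with the fourth-order V-statistic average of the summand \(k_{tu}l_{tu}+k_{tu}l_{vw}-2k_{tu}l_{tv}\); averaging over all quadruples \((i, j, q, r)\) already performs the symmetrization in (11).

```python
# Illustrative check (toy data, not from the paper): the double-centered form
# of T_n^(0) equals the fourth-order V-statistic built from the summand
# k_ij l_ij + k_ij l_qr - 2 k_ij l_iq, averaged over all index quadruples.
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n = 8
x = rng.normal(size=(n, 3))                            # stand-in for the predictors
e = rng.normal(size=n)                                 # stand-in for the errors
k = np.linalg.norm(x[:, None] - x[None, :], axis=2)    # k_ij
l = np.abs(e[:, None] - e[None, :])                    # l_ij

# (1/n^2) sum k_ij l_ij + (1/n^4) sum k_ij l_qr - (2/n^3) sum k_ij l_iq
t_centered = (k * l).mean() + k.mean() * l.mean() \
    - 2.0 * (k.sum(axis=1) * l.sum(axis=1)).sum() / n ** 3

# Brute-force V-statistic over all n^4 quadruples (i, j, q, r).
t_vstat = np.mean([k[i, j] * l[i, j] + k[i, j] * l[q, r] - 2.0 * k[i, j] * l[i, q]
                   for i, j, q, r in product(range(n), repeat=4)])

assert np.isclose(t_centered, t_vstat)
```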

Step 2: Negligibility of the remainder term \(R_n\)

In this part we will show that

$$\begin{aligned} nR_n{\mathop {\rightarrow }\limits ^{\text {p}}}0. \end{aligned}$$
(12)

Denote

$$\begin{aligned} Q_n=&\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(2)}_{ij})+\frac{1}{n^4}\sum _{i,j,q, r}^{n}k_{ij}({\mathcal {V}}_{qr}-l^{(2)}_{qr})\\&-\frac{2}{n^3}\sum _{i,j,q}^{n}k_{ij}({\mathcal {V}}_{iq}-l^{(2)}_{iq}) \in L^{2*}([0, 1]), \end{aligned}$$

then

$$\begin{aligned} |nR_n|&=\frac{1}{2}|\langle Q_n(\sqrt{n}({\hat{\beta }}-\beta )), \sqrt{n}({\hat{\beta }} - \beta ) \rangle |\\&\le \frac{1}{2}\Vert Q_n\Vert \Vert \sqrt{n}({\hat{\beta }}-\beta )\Vert ^2. \end{aligned}$$

Since \(\Vert \sqrt{n}({\hat{\beta }}-\beta )\Vert =O_p(1)\), it suffices to show that \(\Vert Q_n\Vert =o_p(1)\). Note that \(Q_n\) is the sum of three terms, each of which can be shown to converge to zero in probability. We treat only the first term; the other two can be handled in a similar way. Using the Lipschitz continuity of \(l_{xx}\), \(l_{yy}\) and \(l_{xy}\), we have

$$\begin{aligned} \Vert {\mathcal {V}}_{ij}-l_{ij}^{(2)}\Vert \le&L |({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j) - (\varepsilon _i, \varepsilon _j)|_{\infty }(\Vert X_i\Vert + \Vert X_j\Vert )^2\\ \le&L\Vert {\hat{\beta }}-\beta \Vert (\Vert X_i\Vert + \Vert X_j\Vert )^3. \end{aligned}$$

Therefore, \(\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(2)}_{ij})\) is bounded by

$$\begin{aligned} L\Vert {\hat{\beta }}-\beta \Vert n^{-2}\sum _{i,j=1}^{n}|k_{ij}|(\Vert X_i\Vert + \Vert X_j\Vert )^3. \end{aligned}$$

By condition C2(b) and the weak law of large numbers for V-statistics,

$$\begin{aligned} n^{-2}\sum _{i,j=1}^{n}|k_{ij}|(\Vert X_i\Vert + \Vert X_j\Vert )^3 = O_{\text {p}}(1), \end{aligned}$$

hence \(\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(2)}_{ij})=o_\text {p}(1)\). With similar techniques for the other two terms, we obtain \(\Vert Q_n\Vert =o_p(1)\).

Step 3: Finding the limiting distribution

By (9) and (12), it is enough to show that the following term

$$\begin{aligned} nT_n^{(0)}+n\langle {\hat{\beta }} - \beta , T_n^{(1)}\rangle + \frac{1}{2}n \langle T_n^{(2)} ({\hat{\beta }}-\beta ), {\hat{\beta }}-\beta \rangle \end{aligned}$$
(13)

converges in distribution. By conditions C2(c)-(e), \(\text {E}\{|h^{(p)}(Z_i, Z_j, Z_q, Z_r)|^2\}<\infty\) for \(1\le i, j, q, r \le 4\) and \(p=0, 1\). A direct calculation shows that \(\text {E}\{h^{(0)}(z_1, Z_2, Z_3, Z_4)\}=0\) almost surely, so the kernel \(h^{(0)}\) is degenerate. Denote

$$\begin{aligned} h^{(0)}_2(z_1, z_2)=\text {E}\{ h^{(0)}(z_1, z_2, Z_3, Z_4)\} \end{aligned}$$

and define the V-statistic \({\mathcal {S}}^{(0)}_n\) with kernel \(h_2^{(0)}\), that is,

$$\begin{aligned} {\mathcal {S}}_n^{(0)}=\frac{6}{n^2}\sum _{i,j=1}^{n}h_2^{(0)}(Z_i, Z_j). \end{aligned}$$

By the standard results of V-statistics, we have

$$\begin{aligned} n(T_n^{(0)}-{\mathcal {S}}_n^{(0)}){\mathop {\rightarrow }\limits ^{\text {p}}}0. \end{aligned}$$

Define the linear operator \((Af)(s)=\int h^{(0)}_2(s, t)f(t)dP_{X\varepsilon }(t)\) for \(f\in L^2(L^2([0, 1])\times {\mathbb {R}}, P_{X\varepsilon })\), where \(L^2(L^2([0, 1])\times {\mathbb {R}}, P_{X\varepsilon })\) denotes the space consisting of all square integrable functions defined on \(L^2([0, 1])\times {\mathbb {R}}\), and \(P_{X\varepsilon }\) is the joint probability measure of X and \(\varepsilon\). Then the symmetric function \(h^{(0)}_2\) admits an eigenvalue decomposition

$$\begin{aligned} h^{(0)}_2(z_1, z_2)=\sum _{r=1}^{\infty }\gamma _r\phi _r(z_1)\phi _r(z_2), \end{aligned}$$

where \(\{\gamma _r\}_{r=1}^{\infty }\) and \(\{\phi _r\}_{r=1}^{\infty }\) are the eigenvalues and eigenfunctions of A, respectively, satisfying \(\mathrm {E}[\phi _i(Z)\phi _j(Z)]=\delta _{ij}\). Clearly, we have \(\text {E}[h_2^{(0)}(Z_1, Z_1)]=\sum _{r=1}^{\infty }\gamma _r\) and \(\text {E}[h_2^{(0)}(Z_1, Z_2)]^2=\sum _{r=1}^{\infty }\gamma _r^2\). Since \(\text {E}\{|h^{(0)}(Z_1, Z_2, Z_3, Z_4)|^2\}<\infty\), by the results on page 182 of Serfling (1980), \(\text {E}[h^{(0)}_2(Z_1, Z_2)]^2<\infty\). Similarly, we also have \(\text {E}|h^{(0)}_2(Z_1, Z_1)|<\infty\). Hence, \(|\sum _{r=1}^{\infty }\gamma _r| < \infty\) and \(\sum _{r=1}^{\infty }\gamma _r^2 < \infty\). Note that

$$\begin{aligned} \{n^{-\frac{1}{2}} \sum _{i=1}^{n}\phi _r(Z_i)\}^2=n^{-1}\sum _{i=1}^{n}\sum _{j=1}^{n}\phi _r(Z_i)\phi _r(Z_j). \end{aligned}$$

In view of this, \(n{\mathcal {S}}_n^{(0)}\) can be expressed as

$$\begin{aligned} n{\mathcal {S}}^{(0)}_n=6\sum _{r=1}^{\infty }\gamma _r\{n^{-\frac{1}{2}}\sum _{i=1}^{n}\phi _r(Z_i) \}^2. \end{aligned}$$
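
For intuition (and not as part of the proof), once estimates of the leading eigenvalues \(\gamma_r\) are available, the weighted chi-square law \(6\sum_{r}\gamma_r{\mathcal {Z}}_r^2\) implied by this representation can be simulated directly. The sketch below does so for a hypothetical, geometrically decaying eigenvalue sequence; in practice the eigenvalues would come from a spectral decomposition of an empirical version of the degenerate kernel \(h_2^{(0)}\), which is not computed here.

```python
# Illustrative sketch: Monte Carlo simulation of the weighted chi-square limit
# 6 * sum_r gamma_r * Z_r^2, given (hypothetical) eigenvalue estimates.
import numpy as np

def simulate_weighted_chi2(gammas, n_draws=100_000, alpha=0.05, seed=0):
    """Draws from 6 * sum_r gamma_r * Z_r^2 and the corresponding 1 - alpha quantile."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_draws, len(gammas)))
    draws = 6.0 * (z ** 2) @ np.asarray(gammas)
    return draws, np.quantile(draws, 1.0 - alpha)

# Example with a hypothetical, rapidly decaying eigenvalue sequence.
gammas = 0.5 ** np.arange(1, 21)
draws, critical_value = simulate_weighted_chi2(gammas)
```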

Now let us turn to the terms \(T_n^{(1)}\) and \(T_n^{(2)}\) in (13). It can be shown that \(\text {E}\{h^{(1)}(Z_1, Z_2, Z_3, Z_4)\} =0\). Define

$$\begin{aligned} h^{(1)}_1(z_1)=\text {E}\{h^{(1)}(z_1, Z_2, Z_3, Z_4)\}, \end{aligned}$$

then, by the standard theory of V-statistics,

$$\begin{aligned} n^{1/2}T^{(1)}_n-4n^{-1/2}\sum _{i=1}^{n}h^{(1)}_1(Z_i){\mathop {\rightarrow }\limits ^{\text {p}}}0. \end{aligned}$$

Meanwhile, by the weak law of large numbers for V-statistics,

$$\begin{aligned} T^{(2)}_n{\mathop {\rightarrow }\limits ^{\text {p}}}\text {E}\{h^{(2)}(Z_1, Z_2, Z_3, Z_4)\} :=\Lambda . \end{aligned}$$

Recall that \(\sqrt{n}({\hat{\beta }}-\beta )=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\psi (Z_i) + o_p(1)\). According to the multivariate central limit theorem and Theorem 1.4.8 in van der Vaart and Wellner (1996), the random elements

$$\begin{aligned} \left\{ n^{-1/2}\sum _{i=1}^{n}\phi _r(Z_i)\right\} _{r\ge 1}, \quad \left\{ n^{-1/2}\sum _{i=1}^{n}4h^{(1)}_{1}(Z_i)\right\} ,\quad \left\{ \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\psi (Z_i)\right\} \end{aligned}$$

converge jointly in distribution to the Gaussian limits

$$\begin{aligned} {\mathcal {Z}}=({\mathcal {Z}}_r)_{r\ge 1}, \quad {\mathcal {N}}, \quad {\mathcal {G}}, \end{aligned}$$

where the \({\mathcal {Z}}_r\) are i.i.d. N(0, 1) random variables, and \({\mathcal {N}}\) and \({\mathcal {G}}\) are mean-zero Gaussian random functions in \(L^2([0, 1])\) with covariance functions \(\text {cov}({\mathcal {N}}(s), {\mathcal {N}}(t))=\text {E}\{16h^{(1)}_{1}(X(s), \varepsilon ) h^{(1)}_{1}(X(t), \varepsilon )\}\) and \(\text {cov}({\mathcal {G}}(s), {\mathcal {G}}(t))=\text {E}\{\psi (X(s), \varepsilon ) \psi (X(t), \varepsilon ) \}\), respectively. Then, by the continuous mapping theorem, we have

$$\begin{aligned} nT_n=&nT_n^{(0)}+n\langle {\hat{\beta }} - \beta , T_n^{(1)}\rangle + \frac{1}{2}n \langle T_n^{(2)} ({\hat{\beta }}-\beta ), {\hat{\beta }}-\beta \rangle + o_{\text {p}}(1)\\&{\mathop {\rightarrow }\limits ^{d}}6\sum _{j=1}^{\infty }\gamma _j {\mathcal {Z}}_j^2 + \langle {\mathcal {G}}, {\mathcal {N}}\rangle + \langle \Lambda ({\mathcal {G}}), {\mathcal {G}}\rangle . \end{aligned}$$

\(\square\)

Proof of Theorem 2

Recall that

$$\begin{aligned} {\hat{\varepsilon }}_{i }&=\varepsilon _{i}-\langle X_{i}, {\hat{\beta }}-\beta \rangle . \end{aligned}$$

Using a first-order Taylor expansion of \(l_{ij}\), we have, almost surely,

$$\begin{aligned} l_{ij}=l^{(0)}_{ij}+\langle {\hat{\beta }}-\beta , {\mathcal {V}}_{ij}\rangle \end{aligned}$$
(14)

where \(l^{(0)}_{ij}=l(\varepsilon _i, \varepsilon _j)\) and \({\mathcal {V}}_{ij}=-\{l_{x}(\varsigma _{ij}, \tau _{ij})X_i+l_{y}(\varsigma _{ij}, \tau _{ij})X_j\}\) for some point \((\varsigma _{ij}, \tau _{ij})\) on the straight line connecting the two points \(({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j)\) and \((\varepsilon _i, \varepsilon _j)\) on \({\mathbb {R}}^2\). By (14), \(T_n\) can be decomposed into three terms

$$\begin{aligned} T_n=T_n^{(0)}+\langle {\hat{\beta }} - \beta , T_n^{(1)}\rangle + \langle {\hat{\beta }} - \beta , R_n\rangle \end{aligned}$$
(15)

where \(T_n^{(p)}\), \(p=0, 1\) are defined the same as in the proof of Theorem 1 and

$$\begin{aligned} R_n=\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(1)}_{ij}) + \frac{1}{n^4}\sum _{i, j, q, r}^{n}k_{ij}({\mathcal {V}}_{qr}-l^{(1)}_{qr})-\frac{2}{n^3}\sum _{i,j,q}^{n}k_{ij}({\mathcal {V}}_{iq}-l^{(1)}_{iq}). \end{aligned}$$

Under \(H_0\), the predictor X and the error \(\varepsilon\) are independent, and therefore the generalized distance covariance \(\theta (X, \varepsilon )\) between X and \(\varepsilon\) is zero. By conditions C4(c)-(e), \(\text {E}\{|h^{(p)}(Z_i, Z_j, Z_q, Z_r)|\}<\infty\) for \(1\le i, j, q, r \le 4\) and \(p=0, 1\). By the law of large numbers for V-statistics,

$$\begin{aligned}&T_n^{(0)}{\mathop {\rightarrow }\limits ^{\mathrm {p}}}\theta (X, \varepsilon )=0,\\&T_n^{(1)}{\mathop {\rightarrow }\limits ^{\mathrm {p}}}\text {E}\{h^{(1)}(Z_1, Z_2, Z_3, Z_4)\}. \end{aligned}$$

Therefore \(\Vert T_n^{(1)}\Vert =O_p(1)\). Observe that

$$\begin{aligned}&\langle {\hat{\beta }} - \beta , T_n^{(1)}\rangle \le \Vert {\hat{\beta }}-\beta \Vert \Vert T_n^{(1)}\Vert ,\\&\langle {\hat{\beta }} - \beta , R_n\rangle \le \Vert {\hat{\beta }}-\beta \Vert \Vert R_n\Vert . \end{aligned}$$

By condition C3, \(\Vert {\hat{\beta }}-\beta \Vert = o_p(1)\), so it suffices to show that \(\Vert R_n\Vert =O_p(1)\). Note that \(R_n\) is a sum of three terms, each of which can be shown to be bounded in probability. We treat only the first term; the other two can be handled in a similar way. Using the Lipschitz continuity of \(l_{x}\) and \(l_{y}\), we have

$$\begin{aligned} \Vert {\mathcal {V}}_{ij}-l_{ij}^{(1)}\Vert \le&L |({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j) - (\varepsilon _i, \varepsilon _j)|_{\infty }(\Vert X_i\Vert + \Vert X_j\Vert )\\ \le&L\Vert {\hat{\beta }}-\beta \Vert (\Vert X_i\Vert + \Vert X_j\Vert )^2. \end{aligned}$$

Therefore, \(\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(1)}_{ij})\) is bounded by

$$\begin{aligned} L\Vert {\hat{\beta }}-\beta \Vert n^{-2}\sum _{i,j=1}^{n}|k_{ij}|(\Vert X_i\Vert + \Vert X_j\Vert )^2. \end{aligned}$$

By condition C4(b) and the weak law of large numbers for V-statistics,

$$\begin{aligned} n^{-2}\sum _{i,j=1}^{n}|k_{ij}|(\Vert X_i\Vert + \Vert X_j\Vert )^2 = O_{\text {p}}(1), \end{aligned}$$

hence \(\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}({\mathcal {V}}_{ij}-l^{(1)}_{ij})=O_\text {p}(1)\). With similar techniques for the other two terms, we obtain \(\Vert R_n\Vert =O_p(1)\). \(\square\)

Proof of Theorem 3

Let \(\epsilon _i=m(X_i)-\langle X_i, {\tilde{\beta }}\rangle + \varepsilon _i\). Even though m(x) might not be linear, \(\langle X, {\tilde{\beta }}\rangle\) is the closest function to m(x) in \({\mathcal {M}}_{L^2([0, 1])}\) in the sense of squared loss. By the consistency of M-estimators (see Corollary 3.2.3 in van der Vaart and Wellner 1996), the estimator \({\hat{\beta }}\) converges in probability to \({\tilde{\beta }}\), that is, \(\Vert {\hat{\beta }}-{\tilde{\beta }}\Vert =o_{\text {p}}(1)\). Using a first-order Taylor expansion, we have, almost surely,

$$\begin{aligned} l({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j)= l(\epsilon _{i}, \epsilon _{j})+\{({\hat{\varepsilon }}_i-\epsilon _{i})l_x(\varsigma _{ij}, \tau _{ij})+({\hat{\varepsilon }}_j-\epsilon _{j})l_y(\varsigma _{ij}, \tau _{ij})\}, \end{aligned}$$

where \((\varsigma _{ij}, \tau _{ij})\) is some point on the line connecting the two points \(({\hat{\varepsilon }}_i, {\hat{\varepsilon }}_j)\) and \((\epsilon _{i}, \epsilon _{j})\). Note that

$$\begin{aligned} {\hat{\varepsilon }}_i-\epsilon _{i}=-\langle X_i, {\hat{\beta }}-{\tilde{\beta }}\rangle , \end{aligned}$$

we can decompose

$$\begin{aligned} T_n=T_n^{(0)} + \langle {\hat{\beta }}-{\tilde{\beta }}, T_n^{(1)}\rangle + R_n, \end{aligned}$$
(16)

where

$$\begin{aligned} T_n^{(p)}&=\frac{1}{n^2}\sum _{i,j}^{n}k_{ij}l_{ij}^{(p)} + \frac{1}{n^4}\sum _{i, j, q, r}^{n}k_{ij}l^{(p)}_{qr}-\frac{2}{n^3}\sum _{i,j,q}^{n}k_{ij}l^{(p)}_{i,q} \quad (p=0, 1), \\ l^{(0)}_{ij}&=l_{ij}=l(\epsilon _i, \epsilon _j), \quad l^{(1)}_{ij}=-\{l_x(\epsilon _i, \epsilon _j)X_i+l_y(\epsilon _i, \epsilon _j)X_j\}\in {\mathcal {H}},\\ R_n&=\langle {\hat{\beta }}-{\tilde{\beta }}, \frac{1}{n^2}\sum _{i,j}^{n}k_{ij}l_{ij}^{*} + \frac{1}{n^4}\sum _{i, j, q, r}^{n}k_{ij}l^{*}_{qr}-\frac{2}{n^3}\sum _{i,j,q}^{n}k_{ij}l^{*}_{i,q}\rangle , \end{aligned}$$

and

$$\begin{aligned} l_{ij}^{*}=-\{(l_x(\varsigma _{ij}, \tau _{ij})-l_x(\epsilon _{i}, \epsilon _{j}))X_i + (l_y(\varsigma _{ij}, \tau _{ij})-l_y(\epsilon _{i}, \epsilon _{j}))X_j\}. \end{aligned}$$

We will show that \(T_n^{(0)}{\mathop {\rightarrow }\limits ^{\text {p}}}\tau\) for some \(\tau > 0\), \(\langle {\hat{\beta }}-{\tilde{\beta }}, T_n^{(1)}\rangle =o_\text {p}(1)\), and \(R_n=o_\text {p}(1)\). The latter two claims follow from the same arguments as in the proof of Theorem 2.

Now we show that \(T_n^{(0)}{\mathop {\rightarrow }\limits ^{\text {p}}}\tau\) for some \(\tau > 0\). Using the same arguments as in the proof of Theorem 1, \(T_n^{(0)}\) is a V-statistic. By the weak law of large numbers for V-statistics, \(T_n^{(0)}\) converges in probability to the generalized distance covariance of X and \(\epsilon\), \(\theta (X, \epsilon )\). Under \(H_{1, 1}\), \(\epsilon =\varepsilon\), hence X and \(\epsilon\) are dependent. Under scenarios \(H_{1, 2}\) or \(H_{1, 3}\), \(m(X)\not = \langle X, {\tilde{\beta }}\rangle\) with positive probability, and the conditional mean of \(\epsilon\) given X is

$$\begin{aligned} \text {E}(\epsilon | X)=m(X)-\langle X, {\tilde{\beta }}\rangle +\text {E}(\varepsilon | X) = m(X)-\langle X, {\tilde{\beta }}\rangle . \end{aligned}$$

Under the condition that \(m(X)-\langle X, {\tilde{\beta }}\rangle\) is a non-constant function of X, \(\text {E}(\epsilon |X)\) depends on X, and hence X and \(\epsilon\) are dependent. Since k and l are of strong negative type, \(\tau = \theta (X, \epsilon ) >0\). \(\square\)
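
The consistency argument above can be illustrated numerically. The sketch below reuses the hypothetical joint_test helper given after the abstract (so it is not self-contained): it generates one sample from a functional linear model and one from a quadratic regression function. In the latter case the residuals from the best linear approximation remain dependent on X, so the statistic should stay away from zero and the permutation p-value should typically be small.

```python
# Toy illustration of the consistency claim (assumes joint_test from the
# earlier sketch): linear model vs. a quadratic alternative.
import numpy as np

rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 50)
n = 200
# Rough Brownian-motion-like predictor curves on the grid.
X = np.cumsum(rng.normal(scale=np.sqrt(grid[1]), size=(n, grid.size)), axis=1)
eps = rng.normal(scale=0.1, size=n)

y_lin = np.trapz(X * np.sin(np.pi * grid), grid, axis=1) + eps   # functional linear model
y_quad = np.trapz(X, grid, axis=1) ** 2 + eps                    # nonlinear alternative

stat_lin, p_lin = joint_test(X, y_lin, grid)     # p-value typically large
stat_quad, p_quad = joint_test(X, y_quad, grid)  # p-value typically small
```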

About this article

Cite this article

Lai, T., Zhang, Z. & Wang, Y. Testing independence and goodness-of-fit jointly for functional linear models. J. Korean Stat. Soc. 50, 380–402 (2021). https://doi.org/10.1007/s42952-020-00083-4

