Abstract
In this paper, a specification test for functional regression with scalar response that exploits semi-parametric principles is illustrated. Once the test statistic is defined, its asymptotic null distribution is derived under suitable conditions. The finite sample performance of the test is analyzed through a simulation study using both the asymptotic p-value and some bootstrap approaches. To appreciate the potential of the method, an application to a real spectrometric dataset is performed.
1 Introduction
Specifying the link function that models the dependence structure between two random elements is an important task in many statistical regression analyses. The problem can be hard to treat when a functional predictor X, that is, a random element taking values in a functional space, is used to explain the variability of a real random variable (r.v.) Y. In this case, in fact, the link is described through a real-valued operator r acting on a functional space; it is difficult to visualize and, consequently, to select a coherent specification for it. This kind of model is known as the functional regression model with scalar response:
where \({\mathcal {E}}\) is a centered random error uncorrelated with X. It has appeared in many scientific domains and falls within the methodologies of so-called functional statistics. For a review of this relatively recent branch of statistics, see for instance the monographs (Ferraty and Vieu 2006; Horvath and Kokoszka 2012) or (Ramsay and Silverman 2005), or the papers that appeared in recent special issues (see, for instance, Aneiros et al. 2022, 2019a, b).
The interest in checking a specification of the regression operator for model (1) has produced a rich line of research on structural testing procedures. To cite some examples: in Aneiros and Vieu (2013) the authors deal with testing linearity in semi-functional partially linear regression models; in Bücher et al. (2011) a test for the hypothesis of a specific parametric functional regression model is introduced; in Cuesta-Albertos et al. (2019) a goodness-of-fit test for the functional linear model is illustrated; the paper (Delsol et al. 2011) tackles an omnibus goodness-of-fit test in a fully nonparametric framework; finally, the work (Patilea et al. 2016) provides a test for the case of a functional response in the spirit of the smoothing test statistic considered by Sheng (1996).
Useful help in the context of model specification can come from semi-parametric regression approaches. Thanks to a projective strategy, they allow one to visualize a link function, combining flexibility and interpretability while avoiding some dimensionality problems that can occur in the fully nonparametric context. To get a partial idea of the variety of techniques developed, one can see a general presentation in Härdle et al. (2004) and, as examples concerning functional statistics, the papers Ferraty et al. (2013), Goia and Vieu (2015), Ling and Vieu (2021) and Novo et al. (2019) and references therein.
The present paper explores the possibility of building a specification test by exploiting the potential of the Single Functional Index Model (SFIM). This defines the relationship between X and Y through an unknown real link function g acting on a projection of the functional regressor along an unknown direction \(\theta\), constrained for identifiability. Formally, let X be a random element valued in a Hilbert space \({\mathcal {H}}\) of real functions defined over a compact interval \({\mathcal {T}}\), equipped with an inner product \(\left\langle \cdot ,\cdot \right\rangle\) and associated norm \(\left\| \cdot \right\|\); then \(r[X]=g\left( \left\langle \theta ,X\right\rangle \right)\), where \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(\theta \in {\mathcal {H}}\) with \(\left\| \theta \right\| ^{2}=1\) and \(\theta \left( t\right) >0\) for a fixed \(t\in {\mathcal {T}}\), for identifiability (see e.g. Ait-Saïdi et al. 2008). So far, various techniques have been introduced to estimate g and \(\theta\) from samples drawn from \(\left( X,Y\right)\), and rates of convergence have been derived (see e.g. Jiang et al. 2020; Novo et al. 2019 or Shang 2020 for recent contributions).
The main interest of the SFIM is that it makes it possible to bring an infinite-dimensional problem back to a one-dimensional framework and to visualize an estimate \(\widehat{g}\) of g, obtained from an observed dataset, that can suggest the nature of the link between X and Y. This allows one to postulate a target specification \(g_{0}\) for g, depending on some real parameter, and then to check whether it is compatible with the observed dataset at a given significance level. For instance, if the plot of \(\widehat{g}\) exhibits a straight shape, the linearity of the regression model should be investigated. As a consequence, once the link function is specified, the resulting model depends only on the functional parameter \(\theta\) and hence is fully parametric, with some practical and theoretical benefits in the estimation step (for instance, a faster rate of convergence than in the semi-parametric case, full interpretability of the link, good estimates for rather small sample sizes, and no smoothing parameters to be introduced). By way of example, one can consider the prediction of the moisture value for 80 samples of corn by using the first derivatives of the corresponding near-infrared (NIR) spectra measured over the wavelength range 1100–2498 nanometers (see the original spectra in the top panel of Fig. 1): an estimate of the link function g is plotted in the bottom panel of Fig. 1, and one may wonder whether the hypothesis of a linear link is compatible with the empirical evidence.
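To fix ideas on this projection-and-smooth strategy, the following minimal sketch (not the authors' code: the curves, the direction \(\theta\) and the link are made-up stand-ins) computes the scores \(\langle \theta ,X_{i}\rangle\) by discretized integration and smooths Y against them with a one-dimensional Nadaraya–Watson estimator:

```python
import numpy as np

def nw_link_estimate(u, y, u_grid, h):
    """Nadaraya-Watson estimate of the link g over u_grid (Gaussian kernel)."""
    w = np.exp(-0.5 * ((u_grid[:, None] - u[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)              # design points on [0, 1]
dt = t[1] - t[0]
n = 200
# made-up functional covariates X_i(t) = a_i + b_i t + c_i t^2 (stand-ins)
coef = rng.uniform(-1, 1, (n, 3))
X = coef @ np.vstack([np.ones_like(t), t, t ** 2])
theta = np.sin(np.pi * t)                   # made-up direction
theta /= np.sqrt((theta ** 2).sum() * dt)   # enforce ||theta|| = 1

u = (X * theta).sum(axis=1) * dt            # <theta, X_i> by summation
y = u ** 2 + 0.1 * rng.standard_normal(n)   # illustrative quadratic link
g_hat = nw_link_estimate(u, y, np.linspace(u.min(), u.max(), 50), h=0.2)
# plotting g_hat against its grid would reveal the (here quadratic) shape
```

Plotting `g_hat` against its grid is precisely the visual inspection that suggests a target specification \(g_0\) to be tested.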
The aim of this work is to define and operationalize a suitable specification test procedure and then analyze its performances: given the SFIM framework, one wants to test the null hypothesis that the link function g belongs to a family of possible parametric functions:
where the parameter \(\beta\) can be entirely specified or not, against the alternative that g is not an element of \({{{\mathcal {G}}}}_0\). The main idea is to exploit the so-called conditional moment test approach (see e.g. Newey 1985), based on the fact that, under the null hypothesis, the quantity \({\mathbb {E}}\left[ {\mathcal {E}}{\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] w\left( X\right) \right]\), where \(w\left( X\right)\) is a suitable weight, is null, whereas it is strictly positive under the alternative. Therefore, using a kernel regression approach, a test statistic belonging to the family of U-statistics is derived and appropriately standardized, and, under suitable assumptions on the distribution of X (in particular, expressed in terms of small-ball probability), on the kernel involved, on the nature of the error and on the behaviour of the estimator of the SFIM used, it is proved that its asymptotic null distribution is Gaussian. Thanks to the latter statement, the p-value of the test can be computed directly, and the power of the test, based on the asymptotic result, can be evaluated by means of Monte Carlo experiments carried out on samples of finite size and under various experimental conditions, that is, the nature of the link, the sample size and the variability of the error model. In order to appreciate the robustness of the introduced test methodology, a comparison with the results obtained when some standard bootstrap procedures are used is made. Finally, to show how the test can be used for practical purposes, an application to prediction via the SFIM for the spectrometric dataset is performed and discussed.
The outline of the paper is as follows: the theoretical background, the basic principle of the test and the test statistic are defined in Sect. 2. Theoretical properties of the test statistic are discussed in Sect. 3: in particular, in Sects. 3.1 and 3.1.1 the null distribution of the test statistic is derived, whereas some remarks on consistency are provided in Sect. 3.1.2. The performances of the test are analyzed in Sect. 4 and the bootstrap approaches are described in Sect. 5. Finally, the real-world analysis is carried out in Sect. 6. For the sake of readability, all detailed proofs of the theoretical results are postponed to the technical Appendix.
2 Notations and test definition
Consider the random element \(\left( X,Y\right)\) defined on a probability space and taking values in \({\mathcal {H}}\times {\mathbb {R}}\), where \({\mathcal {H}}\) is a Hilbert space of real functions defined over a compact interval \({\mathcal {T}}\). From now on, one takes \({\mathcal {H}}={\mathcal {L}}_{\left[ 0,1\right] }^{2}\), the space of square integrable real functions defined over \(\left[ 0,1\right]\), equipped with the natural inner product \(\left\langle g,h\right\rangle =\int _{0}^{1}g\left( s\right) h\left( s\right) ds\) and associated norm \(\left\| g\right\| ^{2}=\left\langle g,g\right\rangle\).
Assume that the relation between Y and X is defined by the following SFIM:
where \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is an unknown link function, \(\theta \in {\mathcal {H}}\) is an unknown direction such that \(\left\| \theta \right\| ^{2}=1\) and \(\theta \left( t\right) >0\) for a fixed t, for identifiability, and \({\mathcal {E}}\) is a r.v. satisfying \({\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] =0\) and \({\mathbb {E}}\left[ {\mathcal {E}}^{2}\vert X\right] =\sigma ^{2}\) (see e.g. Ait-Saïdi et al. 2008).
2.1 The test principle
Let \({{{\mathcal {G}}}}_0 = \{g_{0}^\beta :{\mathbb {R}} \rightarrow {\mathbb {R}},\beta \in {\mathbb {R}}^{d+1} \}\), with \(d\ge 1\) an integer, be a family of known functions, each measurable w.r.t. the \(\sigma\)-algebra generated by X and depending on the parameter \(\beta =\left( \beta _{0},\beta _{1},\dots ,\beta _{d}\right) \in {\mathbb {R}} ^{d+1}\). Consider then the following hypothesis:
where \({{{\mathcal {G}}}}_1\) is a set of real functions \(g_1^\beta\) such that \({{{\mathcal {G}}}}_1 \cap {{{\mathcal {G}}}}_0 = \emptyset\). If \(\theta\) and \(\beta\) are fixed, the hypotheses are simple; otherwise, they are complex. To fix ideas, the above setting includes the possibility of testing the linearity of the regression by specifying \({\mathcal {G}}_0\) as the set of affine functions \(g_{0}^\beta (u)=\beta _{0}+\beta _{1}u,\) with \(\beta _{0},\beta _{1}\in {\mathbb {R}}\), \(\beta _{1}\ne 0\). In practice, the specification of the model under the null hypothesis can be done by direct inspection of the scatterplot between the observed values \(\langle X,\theta \rangle\) and those of Y or by imposing an a priori model.
Consider:
where \(g_0 \in {{{\mathcal {G}}}}_0\), and w is a positive weight function. By the SFIM assumption (2),\(\ {\mathbb {E}}\left[ Y\vert X\right] =g(\left\langle X,\theta \right\rangle )\) and this implies that
where \({\mathcal {E}} = Y-g_{0}^\beta \left( \left\langle X,\theta \right\rangle \right) .\) Hence, (3) can be rewritten in the equivalent form:
Under the null hypothesis the latter is null, because of \(g(\langle X,\theta \rangle )=g_0(\langle X,\theta \rangle )\) a.s., whereas under the alternative it is strictly positive, because of \({\mathbb {P}}\left( g_0 (\langle X,\theta \rangle ) = g(\langle X,\theta \rangle ) \right) < 1\).
Thanks to the above principle, it is possible to implement a test procedure starting from an empirical version of \({\mathbb {E}}\left[ {\mathcal {E}} {\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] w\left( X\right) \right]\): the null hypothesis is rejected if it is significantly far from zero.
2.2 The test statistic
Consider a sample \(\left( X_{i},Y_{i}\right) ,i=1,\dots ,n\), of i.i.d. replications of \(\left( X,Y\right)\), and suppose that \(\theta\) and \(\beta\) are completely specified and equal to \(\theta _\star\) and \(\beta _\star\) respectively. Take a Nadaraya–Watson-type nonparametric kernel estimate of \({\mathbb {E}}\left[ {\mathcal {E}}\vert X\right]\) at the point \(X_{i}\). An empirical version of \({\mathbb {E}}\left[ {\mathcal {E}}{\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] w\left( X\right) \right]\) can then be written as follows
where \({\mathcal {E}}_{i}=Y_i - g_0^{\beta _\star }(\langle X_i, \theta _\star \rangle )\), K is a kernel function, h is a bandwidth, which depends on n, and \(\delta\) is a semi-metric.
In order to derive a convenient expression for the test statistic in the SFIM framework, some simplifications can be introduced. Firstly, since w can be chosen arbitrarily, if one assumes that the projection r.v. \(\left\langle X,\theta _\star \right\rangle\) admits a strictly positive probability density function \(f_{\theta _\star }\), then a possible choice is \(w=f_{\theta _\star }\). Since it is unknown, consider the following cross-validated kernel estimate of \(f_{\theta _\star }\) based on the same kernel, semi-metric and bandwidth as above:
Secondly, one can select the projection semi-metric:
Plugging these choices into (4), the following simplified expression of the test statistic follows:
where \(K_{ij}^{\theta _\star }=K\left( \left| \left\langle X_{i}-X_{j},\theta _\star \right\rangle \right| /h\right)\).
Invoking similar arguments as in Sheng (1996) one can derive the following estimate for the variance of \(n \sqrt{h} Q_{n}\left( \theta _\star \right)\):
This allows one to obtain the standardized test statistic \(n\sqrt{h}Q_{n}\left( \theta _\star \right) /\nu _{n}\left( \theta _\star \right)\), which can be used to test a simple null hypothesis.
When one deals with a composite null hypothesis, \(\theta\) and/or \(\beta\) are not specified and estimates of them have to be introduced. In particular, letting \(\widehat{\theta }\) be an estimator of \(\theta\), one defines \(\widehat{{\mathcal {E}}}_{i}=Y_{i}-g_{0}^{\beta _\star } ( \langle X_{i},\widehat{\theta }\rangle )\), and the resulting test statistic can be written as follows:
where \(K_{ij}^{\widehat{\theta }}=K\left( \vert \langle \widehat{\theta },X_{i}-X_{j}\rangle \vert /h\right)\). To obtain a suitable estimator \(\widehat{\theta }\) for \(\theta\), it is possible to use, for instance, the first step of the approach proposed in Ferraty et al. (2013), which combines a spline approximation of the functional coefficient \(\theta\) with the one-dimensional Nadaraya–Watson approach to estimate the link function in the SFIM (2).
When the parameter \(\beta\) is also not specified and an estimate \(\widehat{\theta }\) is available, one has to consider an estimator \(\widehat{\beta }\) of \(\beta\), which can be obtained through a least squares approach by minimizing \(\sum _{i=1}^n \{Y_i - g_{0}^\beta (\langle X_{i},\widehat{\theta }\rangle )\}^2\). In particular, if \(g_{0}\) is linear affine, one directly regresses the observations \(Y_{i}\)s against the projections \(\left\langle X_{i},\widehat{\theta }\right\rangle\)s. The studentized versions of these test statistics are obtained by plugging the estimates \(\widehat{\theta }\) and \(\widehat{\beta }\) into (6).
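For the linear affine case, the least squares step amounts to an ordinary regression of the \(Y_i\)s on the projections; a minimal sketch with synthetic data (the projection values and the true coefficients below are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
proj = rng.uniform(-1.0, 1.0, 100)        # synthetic <X_i, theta_hat> values
y = 0.5 + 2.0 * proj + 0.1 * rng.standard_normal(100)

# minimize sum_i {y_i - (b0 + b1 * proj_i)}^2 via ordinary least squares
A = np.column_stack([np.ones_like(proj), proj])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ beta_hat              # the eps_hat_i entering the statistic
```

The resulting `residuals` are then plugged into the studentized test statistic in place of the unobservable errors.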
3 Asymptotic behaviour of the test statistics
To define the critical region of the test, one needs to derive the null distribution of the test statistic. To do this, one considers two scenarios, namely the simple null hypothesis, where both parameters \(\beta\) and \(\theta\) are completely specified, and the complex one: they are discussed in Sects. 3.1 and 3.2 respectively. The study of the behaviour under some alternative hypotheses concludes this section (see Sect. 3.3).
3.1 Simple null hypothesis
Suppose that \(g(\langle \theta , X \rangle ) = g_{0}^{\beta_\star} (\langle \theta , X \rangle )\) and that the direction \(\theta\) is specified and equal to \(\theta _\star\). Consider the following sets of assumptions; in what follows, \({\mathcal {F}}[\cdot ]\) denotes the Fourier transform.
Assumptions on the sample
- S-i. \(\left( X_{i},Y_{i}\right) ,i=1,\dots ,n\), are i.i.d. replications of \(\left( X,Y\right)\).
Assumptions on the model
- M-i. The moment generating function of the error \({\mathcal {E}}\) exists;
- M-ii. There exist \(\underline{\sigma }^{2}\) and \(\overline{\sigma }^{2}\) such that \(0<\underline{\sigma }^{2}\le Var({\mathcal {E}}\vert X)\le \overline{\sigma }^{2}<\infty\) almost surely;
- M-iii.1. There exists a constant \(C_1>0\) such that
$$\begin{aligned} \frac{1}{C_1}\le {\mathbb {E}}\left[ f_{\theta _\star } \left( \langle \theta _\star , X \rangle \right) \right] \le C_1. \end{aligned}$$
- M-iii.2. There exist \(C_2>0\) and \(\epsilon >0\) such that \(\int _{\vert x\vert \le \epsilon }\vert {\mathcal {F}}[f_{\theta _\star }]\vert ^{2}(x)dx\ge C_2\).
- M-iv. The function \(g^{\beta _\star }_0\) is Lipschitz.
Assumptions on the Kernel function
- K-i. The kernel K is a continuous density of bounded variation with strictly positive Fourier transform on the real line.
- K-ii. As n diverges, \(h\rightarrow 0\) and \(\dfrac{\ln n}{(nh^{2})^{\lambda }}\rightarrow 0\), for some \(\lambda \in (0,1)\).
The main results are collected in the following theorem; its detailed proof is postponed to the Appendix for the sake of readability. The role of the assumptions in proving these results is discussed at the end of this section.
Theorem 1
Under the assumptions S, M and K and when the null hypothesis \(H_{0}\) holds true, one has:
- (i) \(\left| Q_{n}(\theta _\star )\right| =O_{{\mathbb {P}}}\left( \dfrac{\ln n}{nh^{1/2}}\right) ,\)
- (ii) \(\dfrac{1}{\nu _{n}^{2}(\theta _\star )}=O_{{\mathbb {P}}}(1),\)
- (iii) as n goes to infinity, \(n\sqrt{h}Q_{n}\left( \theta _\star \right) /\nu _{n}\left( \theta _\star \right) \sim {\mathcal {N}}\left( 0,1\right) .\)
According to statement (iii) in Theorem 1, if the null hypothesis \(H_{0}\) holds true, the test statistic \(T_{n}=n\sqrt{h}Q_{n}(\theta _\star )/\nu _{n}(\theta _\star )\) converges in law to a standard normal. Consequently, the test given by \({\mathbb {I}}_{\{T_{n}\ge z_{1-\alpha }\}}\), with \(z_{1-\alpha }\) the \((1-\alpha )\)-th quantile of the standard normal distribution, has asymptotic level \(\alpha\).
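In practice, once \(T_n\) has been computed, the asymptotic p-value is \(1-\Phi (T_n)\) and the decision follows; a small sketch (the realized value of \(T_n\) below is illustrative):

```python
from math import erf, sqrt

def asymptotic_p_value(t_n):
    """One-sided p-value P(N(0,1) >= t_n), i.e. 1 - Phi(t_n)."""
    return 1.0 - 0.5 * (1.0 + erf(t_n / sqrt(2.0)))

t_n = 2.1                        # illustrative realized test statistic
p = asymptotic_p_value(t_n)      # about 0.018
reject = p < 0.05                # one-sided decision at level alpha = 0.05
```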
To conclude the section, some remarks about the assumptions stated above are added. Firstly, one can point out that M-i., M-iv. and K-i. are standard hypotheses in the framework of nonparametric functional regression models (see for instance Ferraty and Vieu 2006). In particular, the bounded variation condition K-i. is satisfied by many well-known kernels, such as the Gaussian or Epanechnikov ones, and thanks to this condition the class of kernel functions subsequently considered is Euclidean, a notion crucial for achieving the proofs, which is defined in the Appendix. In fact, the latter notion, together with the regularity condition M-iv. imposed on the link function \(g_{0}\), allows one to utilise an existing exponential inequality due to Major (2006). To do this, the conditional expectation of the squared errors and the squared kernel need to be investigated. The boundedness condition on the conditional variance (see M-ii.) allows one to bound the conditional errors from below and above. The behaviour of the squared kernel is studied through the properties of its Fourier transform. The technical assumptions, namely the positive Fourier transform on the real line (K-i.), the boundedness from below and above of the pdf \(f_{\theta _\star}\) (see M-iii.1) and the boundedness from below of the restricted Fourier transform of the pdf \(f_{\theta _\star}\) (see M-iii.2), allow one to establish upper and lower bounds for the kernel, and this verifies a condition required for Major's exponential inequality. Note that condition M-iii.2 is satisfied by many random variables, such as Gaussian or exponential ones. The existence of the moment generating function of the error \({\mathcal {E}}\) (see M-i.) and the trade-off between the sample size n and the bandwidth h (see K-ii.) permit one to establish the rates of convergence in statements (i) and (ii) of Theorem 1.
Finally, the collection of hypotheses further allows one to operate in a similar setting as in Lavergne and Patilea (2008) to establish the asymptotic normality in statement (iii).
3.2 Complex null hypothesis
Suppose that \(\beta\) is fixed and \(\theta\) is estimated by \(\widehat{\theta }\). Then one has to investigate the behaviour of \(Q_{n}(\widehat{\theta })\): to achieve the convergence results, the following set of extra assumptions is needed.
Extra assumptions on the estimate
- E-i. Take \(\widehat{\theta }\) belonging to a set \(\Theta _{n}\) of possible directions of interest such that
$$\begin{aligned} \#\Theta _{n}=n^{p},p>0\text { and }\left\| \theta -\theta _{\star }\right\| \le C\left( \frac{\ln n}{n}\right) ^{r} \text { for any } \theta \in \Theta _{n} \end{aligned}$$
where \(\#\) is the cardinality of a set, \(p=p_{n}\) depends on n, and \(r>0\) depends on some regularity assumption on the regression operator in (1) (see e.g. Ferraty et al. 2013 or Novo et al. 2019).
- E-ii. The bandwidth h satisfies: \(n^{1-2r}h^{1/2}\rightarrow 0.\)
Extra assumptions on the sample
- S-ii. \(\Vert X_{i}\Vert\) is bounded.
Extra assumptions on the model
- M-iii.1.bis There exists a constant \(C_1>0\) such that for any \(\theta \in \Theta _{n}\)
$$\begin{aligned} \frac{1}{C_1}\le {\mathbb {E}}\left[ f_{\theta }( \langle \theta , X \rangle ) \right] \le C_1. \end{aligned}$$
- M-iii.2.bis There exist \(C_2>0\) and \(\epsilon >0\) such that for any \(\theta \in \Theta _{n}\), \(\int _{\vert x\vert \le \epsilon }\vert {\mathcal {F}}[f_{\theta }]\vert ^{2}(x)dx\ge C_2\).
Extra assumptions on the Kernel function
- K-iii. \(p\ge 1\) increases to infinity with n, and \(p^{3/2}(\ln n)^{-\lambda }\) is bounded for some constant \(\lambda >0\).
Return to the expression of \(Q_{n}\left( \widehat{\theta }\right)\) and define the following decomposition
The following result provides the asymptotic behaviour of each term in the previous decomposition and it demonstrates that the leading term is \(\widehat{Q}_{n}^{A}\left( \widehat{\theta }\right)\) whilst the other two terms are negligible with respect to the first one.
Theorem 2
Under assumptions S, M, K and E, and when the null hypothesis \(H_{0}\) holds true, one has:
- (i) \(\left| \widehat{Q}_{n}^{A}\left( \widehat{\theta }\right) \right| =O_{{\mathbb {P}}}\left( \dfrac{p^{3/2}\ln n}{nh^{1/2}}\right) ,\)
- (ii) \(\left| \widehat{Q}_{n}^{B}\left( \widehat{\theta }\right) \right| =o_{{\mathbb {P}}}\left( \dfrac{p^{3/2}\ln n}{nh^{1/2}}\right) \quad \text {and} \quad \left| \widehat{Q}_{n}^{C}\left( \widehat{\theta }\right) \right| =o_{{\mathbb {P}}}\left( \dfrac{p^{3/2}\ln n}{nh^{1/2}}\right) ,\)
- (iii) as n goes to infinity, \(n\sqrt{h}Q_{n}\left( \widehat{\theta }\right) \Big / \nu _{n}\left( \widehat{\theta }\right) \sim {\mathcal {N}}\left( 0,1\right) .\)
In what follows, a short discussion of the assumptions is reported. Firstly, note that since \(\widehat{Q}_{n}^{A}(\widehat{\theta })\) is the version of \(Q_{n}(\theta )\) arising under a complex null hypothesis, the argument to deduce the rate of convergence in Theorem 2, statement (i), proceeds very similarly to the simple null hypothesis situation. Hence, the sets of conditions M and K, on the model and on the kernel function respectively, are still necessary, but with certain modifications or additions with respect to the simple null hypothesis case, such as M-iii.1.bis and M-iii.2.bis. The extra assumption on the kernel, namely the trade-off between the dimension p and the sample size n, allows one to establish the required result. In more detail, the fact that the kernel is a continuous density of bounded variation and the regularity condition on the link function \(g_{0}\) (assumptions K-i. and M-iv. respectively) allow one to invoke an inequality of Sherman (see Sherman 1994). Secondly, assumption E-i. is rather standard for deriving a uniform rate of convergence for the estimator of the link function in the SFIM (see, for instance, Novo et al. 2019). It imposes that the space of possible directions is finite, but it can grow to infinity as the sample size n increases, while the directions of interest \(\theta\) become closer to \(\theta _{\star }\), and consequently so does \(\widehat{\theta }\), as it is an element of \(\Theta _{n}\). The latter assumption, the boundedness of X (assumption S-ii.) and the extra condition on the trade-off between the sample size n and the bandwidth h (assumption E-ii.), combined with Sherman's inequality, allow one to deduce the rate of convergence in statement (ii) of Theorem 2.
When \(\beta\) is also not specified, a least squares estimate \(\widehat{\beta }\) can be used as mentioned previously. In that case, if \(\beta\) belongs to a compact subset of \({\mathbb {R}}^{d+1}\), if the estimator used achieves the rate \(n^{-r}\), \(0<r \le 1/2\), and if there exists a positive constant C such that for any fixed argument u, \(\vert g_0^\beta (u) - g_0^{\beta '}(u) \vert \le C \Vert \beta -\beta ' \Vert\), then the result in Theorem 2 remains valid (see the proof in the Appendix). This is verified, for instance, when \(g_0^\beta\) is a polynomial model.
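As a one-line check of the last claim, for a polynomial specification \(g_0^\beta (u)=\sum _{k=0}^{d}\beta _k u^k\) on a bounded projection range \(\vert u\vert \le M\) (boundedness holding here by S-ii.), the Cauchy–Schwarz inequality gives the required Lipschitz condition in \(\beta\):

```latex
\left| g_0^{\beta}(u) - g_0^{\beta'}(u) \right|
  = \Big| \sum_{k=0}^{d} (\beta_k - \beta'_k)\, u^{k} \Big|
  \le \Big( \sum_{k=0}^{d} u^{2k} \Big)^{1/2} \Vert \beta - \beta' \Vert
  \le C \Vert \beta - \beta' \Vert ,
  \qquad C = \Big( \sum_{k=0}^{d} M^{2k} \Big)^{1/2}.
```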
3.3 Some remarks on consistency
Consider the following alternative hypothesis:
where G is a smooth function and \(\gamma _{n}\) is a positive sequence tending to zero as \(n\rightarrow +\infty\).
The following result provides the asymptotic behaviour of the test statistic under the considered alternative hypothesis. For the sake of simplicity, the case in which the parameter \(\beta\) and the direction \(\theta\) in the SFIM are completely specified (and equal to \(\beta _{\star }\) and \(\theta _{\star }\) respectively) is dealt with.
Theorem 3
Under the alternative hypothesis (8) and the assumptions S, M and K of Theorem 1, if G is of bounded variation, \(f_{\theta _\star }\) is continuous and bounded, \(Gf_{\theta _\star } \ne 0\) a.s., and \(\gamma _{n}^{2}h^{-1/2}\) diverges as n goes to infinity, then the following statement holds:
The proof of this result is deferred to the Appendix.
4 Simulation study
In this section the finite sample properties of the proposed test are explored by evaluating the empirical level and power under different experimental conditions. For each setting, the empirical power is computed as the proportion of times the test rejects the null hypothesis at the nominal level \(\alpha\) (here \(\alpha =5\%\)) over 1000 Monte Carlo replications. The critical region of the test is based on the Gaussian approximation of the null distribution provided in Theorem 2: one rejects the null hypothesis whenever the value of the studentized test statistic is greater than the quantile of order \(1-\alpha\) of the standard normal distribution. All the experiments are conducted using the software R.
The data used in all the simulations are generated according to the following SFIM model:
with \(n=50,100,200\), corresponding to small and medium sample sizes.
For any \(i=1,\dots ,n\), the functional covariate obeys to:
where \(a_{i}\), \(b_{i}\), \(c_{i}\) are i.i.d. uniform r.v.s over \(\left( -1,1\right)\), so that the random curves are centered and bounded, and assumption S-ii. is satisfied; every trajectory is discretized over a grid of 100 equispaced design points. A sample of 30 such curves is plotted in Fig. 2.
For what concerns the functional coefficient, one uses the normalized direction:
Due to the nature of the involved objects, the r.v.s \(\left\langle \theta ,X_{i}\right\rangle = \sqrt{2}(b_i/2-a_i/\pi )\) are centered, symmetric, bounded, and with a strictly positive density \(f_\theta\) over \(\left( -(\pi +2)/(\sqrt{2}\pi ),(\pi +2)/(\sqrt{2}\pi )\right)\) exhibiting a trapezoidal behavior over that interval, satisfying in this way assumption M-iii.1. In practice, since one works with discretized curves, all the integrals \(\left\langle \theta ,X_{i}\right\rangle\) are approximated by summations.
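The stated support can be checked numerically from the closed form of the projections; a minimal sketch (the Monte Carlo size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.uniform(-1, 1, 100_000)
b = rng.uniform(-1, 1, 100_000)

proj = np.sqrt(2.0) * (b / 2.0 - a / np.pi)     # <theta, X_i> in closed form
bound = (np.pi + 2.0) / (np.sqrt(2.0) * np.pi)  # stated support endpoint

# |proj| = sqrt(2)|b/2 - a/pi| <= sqrt(2)(1/2 + 1/pi) = bound
assert np.max(np.abs(proj)) < bound
# the projections are centered and symmetric around zero
assert abs(proj.mean()) < 1e-2
```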
As for the error in the model, the \({\mathcal {E}}_{i}\) are i.i.d. standard Gaussian r.v.s and, to control the signal-to-noise ratio, the variability coefficient \(\sigma\) is defined by \(\sigma ^{2}=\rho ^{2}Var\left( g\left( \left\langle \theta ,X\right\rangle \right) \right)\), where the latter variance is estimated for each sample using the data. Here \(\rho ^{2}=0.2\) and 0.5, corresponding to a theoretical coefficient of determination \(R^{2}\) of about 0.83 and 0.67, respectively. These choices guarantee that assumptions M-i. and M-ii. hold.
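Since the error is independent of the signal, this calibration yields \(R^{2}= Var(g)/(Var(g)+\sigma ^{2}) = 1/(1+\rho ^{2})\), i.e. about 0.83 and 0.67 for \(\rho ^{2}=0.2, 0.5\). A quick numerical check (the cubic signal below is an arbitrary stand-in for \(g(\langle \theta ,X\rangle )\)):

```python
import numpy as np

rng = np.random.default_rng(4)
signal = rng.uniform(-1, 1, 50_000) ** 3    # stand-in for g(<theta, X>)

for rho2 in (0.2, 0.5):
    sigma2 = rho2 * signal.var()            # sigma^2 = rho^2 Var(g(...))
    y = signal + np.sqrt(sigma2) * rng.standard_normal(signal.size)
    r2 = signal.var() / y.var()             # empirical R^2
    # theoretical value: R^2 = 1 / (1 + rho^2)
    assert abs(r2 - 1.0 / (1.0 + rho2)) < 0.02
```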
In all the experiments, some composite hypotheses, with both \(\beta\) and \(\theta\) unknown, are tested. In particular, tests for linear and cubic links are analyzed in detail (all the tested functions satisfy assumption M-iv.). Hence, to operationalize the test procedure, estimates of \(\beta\) and \(\theta\) and the evaluation of the bandwidth h are necessary. As far as \(\beta\) is concerned, the standard OLS approach is used, whereas to estimate \(\theta\) one adopts the first step of the approach proposed in Ferraty et al. (2013). Here one uses cubic splines with 10 internal knots and the Epanechnikov kernel
The same kernel is also used in evaluating the test statistic: since the bandwidth h is related to the estimate of \(f_{\theta }\), it is selected by the unbiased cross-validation approach used in estimating that density. Due to the nonparametric nature of the test statistic, the selection of the bandwidth could have an impact on the results of the simulation study. On the other hand, a systematic analysis of the effect of the bandwidth on the performance of the test goes beyond the scope of this work and is therefore not performed.
4.1 Testing a linear affine link
Let \(\beta _{0},\beta _{1}\in {\mathbb {R}}\), \(\beta _{1}\ne 0\), \(u=\left\langle \theta ,x\right\rangle\), \(\theta\) unknown, \(\left| u\right| < \left( \pi +2\right) /\left( \sqrt{2}\pi \right)\) and consider the complex null hypothesis of a linear affine link:
and the following alternatives:
where \(\gamma >0\) controls the departure from the null hypothesis. To generate the models under the null and the alternative hypotheses, one uses \(\beta _{0}=0\), \(\beta _{1}=1\), so that \({\mathbb {E}}\left[ g\left( U\right) \right] =0\), and \(\gamma =0.2,0.3,0.4,0.5,0.6,0.7,0.8,1.2,1.4\).
In order to appreciate the differences between the considered models, the shapes of the functions g under the null and the alternative hypotheses with \(\gamma =0.5\) are drawn in Fig. 3: one can note that the quadratic perturbation involved in \(H_{1}^{\left( 1\right) }\) affects mainly the tails of the distribution of the projected data \(\left\langle \theta ,x\right\rangle\), the sine in \(H_{1}^{\left( 2\right) }\) the central parts of the two halves of the interval, whereas the cosine in \(H_{1}^{\left( 3\right) }\) acts both on the central part and on the tails of the interval. Hence one expects the test to perform rather well in the last case for any sample size n and \(\rho\), even with \(\gamma\) close to zero. On the other hand, the performance is expected to be not so good, at least for small samples and large \(\rho\), for the first two alternatives \(H_{1}^{\left( 1\right) }\) and \(H_{1}^{\left( 2\right) }\), where a rather large sample size is necessary to detect small deviations from linearity.
The estimated powers, varying \(\gamma\), for the different scenarios considered are represented in Figs. 4, 5 and 6. First, note that, although an asymptotic approximation of the null distribution is used, the empirical level is rather close to the theoretical one, even for a relatively small sample size. Second, as expected, for any sample size n and given \(\rho ^{2}\), the further one moves away from linearity by increasing \(\gamma\), the greater the estimated power. In any case, the performance is very good when \(n=200\) and \(\rho ^{2}=0.2\), even with a relatively modest departure from the null hypothesis. Looking in more detail, the graphs support the previous comments relating the nature of the link functions to the power behaviour. In particular, the alternatives in the family indexed by \(H_{1}^{\left( 3\right) }\) produce the best results even for \(\gamma\) rather small, whereas it appears more difficult to detect \(H_{1}^{\left( 1\right) }\) and \(H_{1}^{\left( 2\right) }\) correctly, at least for small \(\gamma\), small samples and a rather low signal-to-noise ratio.
4.2 Testing a cubic link
Let \(\beta _{0},\beta _{1}\in {\mathbb {R}}\), \(\beta _{1}\ne 0\), \(u=\left\langle \theta ,x\right\rangle\), \(\theta\) unknown, \(\left| u\right| < \left( \pi +2\right) /\left( \sqrt{2}\pi \right)\). The second experiment considers the following null hypothesis:
against the following alternatives
where \({\mathbb {I}}_{A}\) is the indicator function of the set A, and \(\gamma >0\) measures the departure from the null hypothesis. To generate the models one uses \(\beta _{0}=0\), \(\beta _{1}=1\) and \(\gamma =1,1.5,2,2.5,3.5,4,4.5,5\) (the behaviour of the link functions for some selected values of \(\gamma\) is drawn in Fig. 7).
Also in this second experiment, the results, visualized in Fig. 8, are generally rather good. One can note that the estimated level is slightly higher than the nominal one, yielding a liberal test, in particular for relatively small sample sizes. For any fixed n and \(\rho ^{2}\), the estimated power increases coherently with the departure from the null hypothesis. In particular, it emerges that situations with \(\gamma <3\) are better discriminated than those with \(\gamma >3\), where the test is slightly less efficient. Nevertheless, for rather large samples the results are good even for values of \(\gamma\) close to zero and \(\rho ^{2}=0.5\). In general, the results corroborate the fact that the Gaussian approximation of the null distribution works reasonably well.
5 Some bootstrap procedures
Although the use of the asymptotic null distribution to define the critical region of the test has proven capable of producing good results, one can explore the possibility of estimating the threshold of the critical region by using quantiles calculated through some bootstrap algorithms. Since methods based on bootstrapping the pairs \(\left( X_{i},Y_{i}\right)\) are not well suited here, approaches based on bootstrapping residuals are adopted.
The general procedure is described in the steps below:

1. Estimate \(\beta\) and \(\theta\), then compute the errors under the null hypothesis \(\widehat{{\mathcal {E}}}_{i}\) and the value of the test statistic \(T_{n}\);
2. Compute the bootstrap version \(T_{n}^{\star }\) of the test statistic by using the bootstrapped errors \(\widehat{{\mathcal {E}}}_{i}^{\star }\) (see below for details);
3. Repeat step 2 a large number B of times and compute the \(\left( 1-\alpha \right)\)-quantile \(\tau _{\alpha }^{\star }\) of the distribution of \(T_{n}^{\star }\);
4. Compare \(T_{n}\) with \(\tau _{\alpha }^{\star }\): if \(T_{n}\) is larger than \(\tau _{\alpha }^{\star }\), reject the null hypothesis.
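The steps above can be sketched compactly as follows, using naive resampling in step 2 and a hypothetical placeholder statistic (the paper's \(T_{n}\) is not reproduced here):

```python
import numpy as np

def bootstrap_critical_value(residuals, statistic, B=1000, alpha=0.05, seed=0):
    """Steps 2-3: approximate the (1 - alpha)-quantile tau*_alpha of the null
    distribution of a statistic by the naive residual bootstrap.

    `statistic` is a hypothetical stand-in mapping a vector of errors to a
    scalar test statistic.
    """
    rng = np.random.default_rng(seed)
    n = len(residuals)
    t_star = np.empty(B)
    for b in range(B):
        # Naive bootstrap: resample the estimated errors with replacement.
        eps_star = rng.choice(residuals, size=n, replace=True)
        t_star[b] = statistic(eps_star)
    return np.quantile(t_star, 1 - alpha)

# Step 1 (errors under H0) is mimicked by toy residuals; step 4 compares the
# observed statistic with the bootstrap threshold.
rng = np.random.default_rng(1)
eps_hat = rng.standard_normal(100)      # stand-in for the hat-E_i
t_obs = abs(eps_hat.mean())             # stand-in for the observed T_n
tau = bootstrap_critical_value(eps_hat, statistic=lambda e: abs(e.mean()))
reject = t_obs > tau                    # reject H0 when T_n exceeds tau*_alpha
```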
Concerning step 2, both naive and wild bootstrap approaches are adopted:

(i) the naive bootstrap is based on a direct resampling with replacement of the estimated errors \(\widehat{{\mathcal {E}}}_{i}\), \(i=1,\dots ,n\);

(ii) the wild bootstrap errors are calculated as \(\widehat{{\mathcal {E}} }_{i}^{\star }=\widehat{{\mathcal {E}}}_{i}\xi _{i}\), where the \(\xi _{i}\) are i.i.d. and independent of \(\left( X_{i},Y_{i}\right)\). In the experiment, three different distributions for \(\xi _{i}\) are used:

(a) the Rademacher distribution with equiprobable values \(\left\{ -1,1\right\}\);

(b) the distribution suggested by Mammen (1993), with values \(\left( 1-\sqrt{5}\right) /2\) and \(\left( 1+\sqrt{5}\right) /2\) taken with probabilities \(\left( \sqrt{5}+1\right) /\left( 2\sqrt{5}\right)\) and \(\left( \sqrt{5}-1\right) /\left( 2\sqrt{5}\right)\) respectively;

(c) the standard Gaussian distribution.
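All three multiplier distributions have zero mean and unit variance (Mammen's two-point law also matches the third moment). A sketch of generating the wild bootstrap errors, assuming a vector of estimated residuals `eps_hat`:

```python
import numpy as np

def wild_multipliers(n, rng, kind="rademacher"):
    """Draw i.i.d. multipliers xi_i for the wild bootstrap errors
    hat-E*_i = hat-E_i * xi_i; every choice has mean 0 and variance 1."""
    if kind == "rademacher":
        # (a) equiprobable values in {-1, 1}
        return rng.choice([-1.0, 1.0], size=n)
    if kind == "mammen":
        # (b) two-point distribution of Mammen (1993)
        a = (1 - np.sqrt(5)) / 2
        b = (1 + np.sqrt(5)) / 2
        p_a = (np.sqrt(5) + 1) / (2 * np.sqrt(5))  # P(xi = a)
        return rng.choice([a, b], size=n, p=[p_a, 1 - p_a])
    if kind == "gaussian":
        # (c) standard Gaussian
        return rng.standard_normal(n)
    raise ValueError(f"unknown multiplier distribution: {kind}")

# Wild bootstrap errors from (hypothetical) estimated residuals eps_hat:
rng = np.random.default_rng(0)
eps_hat = rng.standard_normal(100)
eps_star = eps_hat * wild_multipliers(100, rng, kind="mammen")
```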
To assess the performances of the test when the asymptotic null distribution and the bootstrap approaches are employed, an experiment similar to the one presented in Sect. 4 is carried out, choosing \(B=1000\) in step 3 of the algorithm.
In particular let \(\beta _{0},\beta _{1}\in {\mathbb {R}}\), \(\beta _{1}\ne 0\), \(u=\left\langle \theta ,x\right\rangle\), \(\theta\) unknown, \(\left| u\right| < \left( \pi +2\right) /\left( \sqrt{2}\pi \right)\). Consider the null hypothesis of a quadratic link:
against the alternatives
where \(\gamma =1,1.5,2.5,3\). Regarding the data generation process, one fixes \(n=100\), \(\beta _{0}=0\), \(\beta _{1}=1\), and \(\rho ^{2}=0.2\).
The estimated powers as \(\gamma\) varies (when \(\gamma =2\) one deals with the null hypothesis), based on 1000 MC replications, are collected in Table 1. The results obtained in the different cases are very similar: the test based on the asymptotic distribution is slightly more liberal than all the tests based on the bootstrap approaches, while the estimated powers are very similar in all the analyzed cases. In conclusion, the test that uses the quantiles of the standard Gaussian performs quite well even when rather small samples are available.
6 Application to spectrometric data
An important task in domains like chemistry, medicine or the food industry is to determine the composition of a given substance. Since chemical analysis is rather expensive and time-consuming, it is often preferred to estimate that composition by using spectrometric curves, which can be easily obtained as the absorbance of reflected light at various wavelengths.
In this section one considers an example of such a modelling task from the food industry. The dataset (available at https://www.eigenvector.com/data) consists of 80 samples of corn; for each of them, the values of moisture, oil, protein and starch contents obtained by chemical analysis are available, together with the spectrometric curve measured by NIR spectrometers on the wavelength range 1100–2498 nm and discretized over an equispaced mesh of 700 points (the set of these curves is reproduced in Fig. 1). The aim is to model the chemical composition of the corn by using the spectrometric curves or, better, as is done in similar contexts, their derivatives.
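Since the covariates are derivatives of discretized curves, they must be computed numerically. A sketch on synthetic curves (the real corn spectra are not reproduced here; the 700-point mesh on 1100–2498 nm and the 2010–2220 nm sub-range match the description above, and a smoothing-spline derivative would typically replace plain finite differences for noisy spectra):

```python
import numpy as np

# Toy stand-in for the 80 corn spectra: each row is one curve discretized over
# an equispaced mesh of 700 wavelengths on 1100-2498 nm (as in the dataset).
rng = np.random.default_rng(0)
wavelengths = np.linspace(1100.0, 2498.0, 700)          # step of 2 nm
spectra = np.cumsum(0.01 * rng.standard_normal((80, 700)), axis=1)

# First derivative of every curve by central finite differences; for noisy
# spectra a smoothing-spline derivative is usually preferred in practice.
d_spectra = np.gradient(spectra, wavelengths, axis=1)

# Sub-range 2010-2220 nm, used below as covariate for the protein model.
mask = (wavelengths >= 2010) & (wavelengths <= 2220)
d_spectra_sub = d_spectra[:, mask]
```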
A systematic study of this dataset has been carried out in Delsol (2013), where no-effect tests in functional regression were performed taking as explanatory variables the original spectrometric curves and their successive derivatives, from the first to the fourth order. According to the results in the cited paper, the first derivative has a significant effect on the moisture content, no significant effects are detected for the oil and starch contents, whereas the effect of the fourth derivative on the protein content is not evident. On the other hand, the first derivative on the wavelength range 2010–2220 nm exhibits a significant effect on the protein content.
Starting from this evidence, one concentrates on modelling by a SFIM the moisture content, by using as covariate the first derivative on the whole wavelength range, and the protein content, with covariate the first derivative on the range 2010–2220 nm. The estimated links g and directions \(\theta\), obtained by using the same procedure adopted in Sect. 4 with cubic splines with 10 and 2 knots respectively, are depicted in Fig. 9. Both graphs suggest that the link functions g in the SFIMs could be specified linearly: therefore the test for linearity illustrated in Sect. 4.1 is performed.
In particular one wants to test the following model specification:
Since all the parameters involved (\(\beta _0\), \(\beta _1\) and \(\theta\)) are unknown, they must be estimated from the data: the \(\widehat{\theta }\)s are the formerly estimated directions plotted in Fig. 9, whereas \(\widehat{\beta }_0\) and \(\widehat{\beta }_1\) are the OLS estimates for the model under the null hypothesis with covariate \(\langle X,\widehat{\theta }\rangle\). For the sake of completeness, the estimates obtained are as follows: \(\widehat{\beta }_0 = 18.57\) and \(\widehat{\beta }_1= -37.80\) when the response is the moisture content, and \(\widehat{\beta }_0 = 13.74\) and \(\widehat{\beta }_1= 20.91\) when the response is the protein content. The behaviour of the residuals of such models, from which the test statistics are calculated, is shown in Fig. 10.
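Once \(\widehat{\theta }\) is fixed, the OLS step under the null reduces to a simple linear regression on the projected covariate. A sketch on toy data (all quantities here are synthetic stand-ins; the reported moisture estimates are used only to seed the toy truth):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 80, 700
X = rng.standard_normal((n, p))           # toy stand-in for the derivative curves
theta_hat = rng.standard_normal(p)
theta_hat /= np.linalg.norm(theta_hat)    # estimated direction, normalized

u = X @ theta_hat                          # discretized <X_i, theta_hat> (quadrature weights omitted)
Y = 18.57 - 37.80 * u + 0.05 * rng.standard_normal(n)  # toy responses around the reported fit

design = np.column_stack([np.ones(n), u])  # columns: intercept, projected covariate
beta_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)  # (beta0_hat, beta1_hat)
residuals = Y - design @ beta_hat          # these residuals feed the test statistic
```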
By using the asymptotic null distribution, the following results are achieved: the p-value for the model having as response the moisture content is 0.695 whereas the one when the response is the protein content equals 0.719. Hence, one can conclude that in both cases the hypothesis of linearity of the link function appears compatible with the empirical evidence.
Since one tests the linearity of the link functions, to complete the analysis a comparison with the linearity test proposed by Garcia-Portugues et al. (2014), available in the R package fda.usc, is performed. To be coherent with what is done above, one uses the estimation method based on B-splines with the same number of basis elements considered in the SFIM. For the model with response the moisture content the p-value is 0.217, whereas for the model with response the protein content the p-value equals 0.224. In both cases there is no reason to reject the hypothesis of linearity of the model: the latter results are thus coherent with what emerged from the proposed test.
References
Ait-Saïdi A, Ferraty F, Kassa R, Vieu P (2008) Cross-validated estimations in the single-functional index model. Statistics 42:475–494
Aneiros G, Vieu P (2013) Testing linearity in semi-parametric functional data analysis. Comput Stat 28:413–434
Aneiros G, Cao R, Fraiman R, Vieu P (2019a) Editorial for the special issue on functional data analysis and related topics. J Multivar Anal 170:1–2
Aneiros G, Cao R, Vieu P (2019b) Editorial on the special issue on functional data analysis and related topics. Comput Stat 34:447–450
Aneiros G, Horová I, Hušková M, Vieu P (2022) On functional data analysis and related topics. J Multivar Anal 189:104861
Bücher A, Dette H, Wieczorek G (2011) Testing model assumptions in functional regression models. J Multivar Anal 102:1472–1488
Cuesta-Albertos JA, García-Portugués E, Febrero-Bande M, González-Manteiga W (2019) Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes. Ann Stat 47:439–467
Delsol L (2013) No effect tests in regression on functional variable and some applications to spectrometric studies. Comput Stat 28:1775–1811
Delsol L, Ferraty F, Vieu P (2011) Structural test in regression on functional variables. J Multivar Anal 102:422–447
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Ferraty F, Goia A, Salinelli E, Vieu P (2013) Functional projection pursuit regression. TEST 22:293–320
García-Portugués E, González-Manteiga W, Febrero-Bande M (2014) A goodness-of-fit test for the functional linear model with scalar response. J Comput Graph Stat 23:761–778
Goia A, Vieu P (2015) A partitioned single functional index model. Comput Stat 30:673–692
Guerre E, Lavergne P (2005) Data-driven rate-optimal specification testing in regression models. Ann Stat 33:840–870
Härdle W, Müller N, Sperlich S, Werwatz A (2004) Nonparametric and semiparametric models. Springer, New York
Horvath L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
Jiang F, Baek S, Cao J, Ma Y (2020) A functional single-index model. Stat Sin 30:303–324
Lavergne P, Patilea V (2008) Breaking the curse of dimensionality in nonparametric testing. J Econ 143:103–122
Ling N, Vieu P (2021) On semiparametric regression in functional data analysis. WIREs Comput Stat 13:e1538
Major P (2006) An estimate on the supremum of a nice class of stochastic integrals and U-statistics. Probab Theory Relat Fields 134:489–537
Mammen E (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann Stat 21:255–285
Newey WK (1985) Maximum likelihood specification testing and conditional moment tests. Econometrica 53:1047–1070
Nolan D, Pollard D (1987) U-processes: rates of convergence. Ann Stat 15:780–799
Novo S, Aneiros-Pérez G, Vieu P (2019) Automatic and location-adaptive estimation in functional single-index regression. J Nonparametr Stat 31:364–392
Pakes A, Pollard D (1989) Simulation and the asymptotics of optimization estimators. Econometrica 57:1027–1057
Patilea V, Sánchez-Sellero C, Saumard M (2016) Testing the predictor effect on a functional response. J Am Stat Assoc 111:1684–1695
Patilea V, Sánchez-Sellero C, Saumard M (2018) Projection-based nonparametric goodness-of-fit testing with functional covariates (Preprint)
Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York
Shang HL (2020) Estimation of a functional single index model with dependent errors and unknown error density. Commun Stat Simul Comput 49:3111–3133
Sherman RP (1994) Maximal inequalities for degenerate U-processes with applications to optimization estimators. Ann Stat 22:439–459
Zheng X (1996) A consistent test of functional form via nonparametric estimation technique. J Econ 75:263–289
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Acknowledgements
The authors thank an Associate Editor and two anonymous reviewers for their helpful comments and suggestions. Lax Chan holds a career grant supported by the European Commission - FSE REACT-EU, PON Ricerca e Innovazione 2014-2020. A. Goia and L. Chan are members of the Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM).
Funding
Open access funding provided by Università degli Studi del Piemonte Orientale Amedeo Avogadro within the CRUI-CARE Agreement.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Proofs
In this appendix the detailed proofs of Theorems 1, 2 and 3 are provided. The results are inspired by those in Patilea et al. (2019) and are reported here to make the work as self-contained as possible.
To derive the main results, some properties of the Euclidean class of functions are considered; for more details, see Definition 2.7 and lemmas 2.13, 2.14 and 2.15 in Pakes and Pollard (1989).
Let us introduce the following notation, where x denotes a fixed element in \({{\mathcal {H}}}\):
It is possible to verify that the class of functions \({\mathcal {K}}_{\theta }\) is Euclidean for a constant envelope, thanks to the bounded variation assumption K-i. together with the fact that \(\left\langle X,\theta \right\rangle\) is a real-valued measurable map: combining Lemma 22(ii) in Nolan and Pollard (1987) and Lemma 2.15 of Pakes and Pollard (1989) allows one to reach this conclusion. Similarly, from Lemma 22(ii) in Nolan and Pollard (1987), the class \({\mathcal {K}}_{\Theta _{n}}\) is Euclidean for a constant envelope.
1.1 A.1: Proof of Theorem 1
Statement (i)
Define \({\mathcal {E}}_{M,i}={\mathcal {E}}_{i}{\mathbb {I}}_{\{\vert {{\mathcal {E}}_{i}}\vert \le M\}}-{\mathbb {E}}[{\mathcal {E}}_{i}{\mathbb {I}}_{\{\vert {{\mathcal {E}}_{i}}\vert \le M\}}\vert X_{i}]\), where M depends on n and will be specified below, and take \(\theta =\theta _\star .\) Let
and \(Q_{M,n}(\theta _\star )=U_{M,n}(\theta _\star )/h\). The class of functions \({\mathcal {K}}_{\theta _\star }\) is Euclidean and, by assumption M-iv. together with Lemma 2.13 of Pakes and Pollard (1989), the class associated with \({\mathcal {E}}_{M,i}\) is Euclidean, and similarly for \({\mathcal {E}}_{M,j}\). Since the product of Euclidean classes of functions is Euclidean (see Lemma 2.14(ii) of Pakes and Pollard 1989), so is the kernel of the U-statistic in (A1).
Therefore, Theorem 2 in Major (2006) can be invoked and for any \(t>0\) one has
provided that
where \(C_{1},\ldots ,C_{5}\) are some positive constants and
One has to verify now condition (A3). To do this, apply the tower law to the definition of \(\sigma _{M}^{2}\) to obtain:
The behaviour of \({\mathbb {E}}\left[ \left( {\mathcal {E}}_{M,i}\right) ^{2}\vert X_{i}\right]\), \(i=1,2\), is investigated first: thanks to assumption M-ii., it is bounded above and below by positive constants. It remains to study \({\mathbb {E}}\left[ (K_{1,2}^{\theta _\star })^{2} \right]\).
Apply the Fourier inversion theorem to the kernel \((K_{1,2}^{\theta _\star })^{2}\); since \(X_{1},X_{2}\) are independent, then
By Fubini's theorem, and then applying Plancherel's theorem together with assumption K-i., one gets
where C is a constant and the latter is bounded by a constant by assumption M-iii.1. On the other hand, (A4) is bounded from below: assumptions K-i. and M-iii.2. allow one to write
where \(C_{3},C_{4}\) are some constants.
Combining the above arguments, there exists a constant \(C>0\) such that \(1/C\le \sigma _{M}^{2}M^{4}/h\le C\). Let
with \(\delta >0\) arbitrarily small; then \(\sigma _{M}^{2}\le C{(\ln n)^{1+\delta }}/n\), and therefore \(\sigma _{M}^{2}\) is of order \({(\ln n)^{1+\delta }/n}\rightarrow 0\). For any \(t>0\), using the fact that \(1/C\le \sigma _{M}^{2}M^{4}/h\), the left-hand side inequality in (A3) holds provided n is large enough.
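For intuition on how such a truncation level arises: the bound \(1/C\le \sigma _{M}^{2}M^{4}/h\le C\) means \(\sigma _{M}^{2}\asymp h/M^{4}\), so requiring \(\sigma _{M}^{2}\) to be of order \((\ln n)^{1+\delta }/n\) amounts, up to constants, to

```latex
\frac{h}{M^{4}} \asymp \frac{(\ln n)^{1+\delta}}{n}
\quad\Longleftrightarrow\quad
M \asymp \left( \frac{nh}{(\ln n)^{1+\delta}} \right)^{1/4},
```

which is consistent with both displayed bounds.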
Since \(\sigma _{M}^{2}M^{4}/h\le C\) there exists a positive constant \(C'\) such that
which goes to infinity for sufficiently large t. Noting that \(\ln \left( 2/\sigma _{M}\right) /\ln n\) tends to a non-negative constant as \(n\rightarrow \infty\), the right hand side inequality in (A3) holds.
Thanks to the Major inequality (A2), one deduces that \(\vert Q_{M,n}(\theta _\star )\vert =O_{{\mathbb {P}}}( {\ln n}/(nh^{1/2}))\), and the proof of Statement (i) is complete once one proves that \(\vert Q_{n}(\theta _\star )-Q_{M,n}(\theta _\star )\vert =o_{{\mathbb {P}}}({\ln n}/(nh^{1/2}))\).
Note that \(Q_{n}(\theta _\star )-Q_{M,n}(\theta _\star )=2R_{1n}+R_{2n}\), where
and
where \(\zeta _{i}={\mathcal {E}}_{i}{\mathbb {I}}_{\{\vert {{\mathcal {E}}_{i}}\vert \ge M\}}-{\mathbb {E}}[{\mathcal {E}}_{i}{\mathbb {I}}_{\{\vert {{\mathcal {E}}_{i}}\vert \ge M\}}\vert X_{i}]\). By assumptions S-i. and K-i., one deduces that
Applying the Hölder inequality and Markov inequality, one obtains
Using assumption K-ii. and with the choice of \(m>11\) as in assumption M-i., one has \(M^{1-m}=o(h^{1/2}\ln n/n)\). Moreover, \(\vert R_{2n}\vert\) is of smaller order than \(\vert R_{1n}\vert\). Invoking the Markov inequality concludes the proof.
Statement (ii)
In what follows, the dependence on \(\theta _\star\) is dropped if there is not any confusion. Define
and its theoretical counterpart
Define
and
The centered Hoeffding decomposition allows to write:
By the same argument used to prove that the kernel of (A1) is Euclidean, one can deduce that \(U_{1,n}\) and \(U_{2,n}\) are Euclidean. Consequently, one can use (Major, 2006, Theorem 2) to derive \(\left| U_{2,n}\right| =O_{{\mathbb {P}}}\left( h^{1/2}\ln n/n\right)\) and (van der Vaart and Wellner 1996, Theorem 2.14.9) to obtain \(\left| U_{1,n}\right| =O_{{\mathbb {P}}}\left( 1/n^{1/2}\right)\). Gathering these rates of convergence, by assumption K-ii. it follows that the difference between the empirical version \(\nu _{n}^{2}\) and its theoretical counterpart \(\nu ^{2}\) is negligible, that is \(\nu _{n}^{2}-\nu ^{2}=o_{{\mathbb {P}}}(1)\); the behaviour of \(\nu ^{2}\) is then investigated.
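For reference, for a symmetric kernel \(\varphi\) of a second-order U-statistic the centered Hoeffding decomposition takes the generic form (the notation here is generic, not the paper's):

```latex
\frac{1}{n(n-1)} \sum_{i \neq j} \varphi(Z_i, Z_j)
  = \mathbb{E}\!\left[\varphi(Z_1, Z_2)\right]
  + \frac{2}{n} \sum_{i=1}^{n} \varphi_1(Z_i)
  + \frac{1}{n(n-1)} \sum_{i \neq j} \varphi_2(Z_i, Z_j),
```

where \(\varphi_1(z)=\mathbb{E}[\varphi(z,Z_2)]-\mathbb{E}[\varphi]\) and \(\varphi_2(z_1,z_2)=\varphi(z_1,z_2)-\varphi_1(z_1)-\varphi_1(z_2)-\mathbb{E}[\varphi]\). The non-degenerate first-order term is what drives the \(O_{{\mathbb {P}}}(n^{-1/2})\) rate of \(U_{1,n}\), while the degenerate second-order term yields the faster rate of \(U_{2,n}\).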
By using the tower law and the independence of \(X_{1},X_{2}\), deduce that
and thanks to assumption M-ii. one has:
Exploiting the same arguments used to study the behaviour of \({\mathbb {E}}\left[ (K^{\theta _\star }_{1,2})^2 \right]\) in Statement (i), one gets that (A5) is bounded from above and below by positive constants. Gathering these results completes the proof.
Statement (iii)
Given the symmetric matrix \({\mathcal {W}}\) with entries
Denote by \(Sp({\mathcal {W}})\) the spectral radius and \(\Vert {\mathcal {W}}\Vert\) the corresponding matrix norm. Given the random variables \(A_{n},B_{n}\), the notation \(A_{n}\asymp B_{n}\) in probability means that there exists a constant \(C>0\) such that \({\mathbb {P}}\left( {1}/{C}\le {A_{n}}/{B_{n}}\le C\right)\) goes to 1 as n tends to infinity.
By Lemma 2(i) in Guerre and Lavergne (2005), \(n\sqrt{h}Q_{n}/v_{n}\) converges to a standard normal conditionally on the \(X_{i}\) if \(Sp({\mathcal {W}})/\left\| {\mathcal {W}}\right\|\) goes to zero in probability. In particular, it is enough to show that \(Sp({\mathcal {W}})=O_{{\mathbb {P}}}(1/n)\) and that \(nh^{1/2} \left\| {\mathcal {W}}\right\| \asymp 1\) in probability. The proof of these two results can be found in Lemma 6.2 in Lavergne and Patilea (2008), using assumptions K-i. and K-ii.
1.2 A.2 Proof of Theorem 2
Consider first the term \(\widehat{Q}_{n}^{A}(\widehat{\theta })\). Once the following result is proved:
the same rate of convergence for \(\widehat{Q}_{n}^{A}(\widehat{\theta })\) is achieved, since \(\widehat{\theta }\) is assumed to be an element of \(\Theta _{n}\); recall that for \(\widehat{Q}_{n}^{A}(\widehat{\theta })\), \({\mathcal {E}}_{i}=Y_i - g_0^{\beta }(\langle X_i, \theta \rangle )\). The argument to deduce that result proceeds almost identically to the simple hypothesis case, by using assumptions M-iii.1.bis and M-iii.2.bis instead of M-iii.1 and M-iii.2 respectively, and assumption K-iii.
Define \({\mathcal {E}}_{M,i}\) as in the proof of Statement (i) of Theorem 1 and
and \(Q_{M,n}(\theta )=U_{M,n}(\theta )/h\). The class of functions \({\mathcal {K}}_{\Theta _{n}}\) is Euclidean and, by the same arguments used to prove that the kernel of (A1) is Euclidean, so is the kernel of the U-statistic in (A6). Therefore, Theorem 2 in Major (2006) can be invoked, now also taking into consideration the properties of the space of directions \(\Theta _{n}\), to which the estimate \(\widehat{\theta }\) is assumed to belong; for any \(t>0\) one has
provided that
where \(C_{1},\ldots ,C_{4}\) are some positive constants and
To verify condition (A8), one can invoke the same arguments used to prove Statement (i) of Theorem 1, with
and \(\delta >0\) arbitrarily small. Thanks to the Major inequality (A7), one concludes that \(\sup _{\theta \in \Theta _{n}} \vert Q_{M,n}(\theta )\vert =O_{{\mathbb {P}}}(p^{3/2}{\ln n}/(nh^{1/2}))\).
It remains to prove that \(\sup _{\theta \in \Theta _{n}}\vert Q_{n}(\theta )-Q_{M,n}(\theta )\vert =o_{{\mathbb {P}}}(p^{3/2}{\ln n}/(nh^{1/2}))\). The steps are similar to those in the simple hypothesis case and are therefore omitted.
Turning the attention to the variance, by using assumptions M.iii.1.bis, M.iii.2.bis and K.iii., the following result is deduced
Since the proof proceeds almost identically to the simple hypothesis case, it is omitted. The main differences are in the Hoeffding decomposition, where the rate of the second-order U-statistic is now \(\sup _{\theta \in \Theta _{n} }\left| U_{2,n}\right| =O_{{\mathbb {P}}}\left( p^{3/2}h^{1/2}\ln n/n\right)\) and that of the first-order U-statistic is now \(\sup _{\theta \in \Theta _{n}}\left| U_{1,n}\right| =O_{{\mathbb {P}}}\left( p^{3/2} /n^{1/2}\right)\).
The asymptotic normality is established by invoking Lemma 2(i) of Guerre and Lavergne (2005): using the same technicalities as in Statement (iii) of Theorem 1, the conditions in the aforementioned Lemma are satisfied.
In order to treat the terms \(\widehat{Q}_{n}^{B}(\widehat{\theta })\) and \(\widehat{Q}_{n}^{C}(\widehat{\theta })\) it is enough to investigate the behaviour of \(\sup _{\theta \in \Theta _{n}}\widehat{Q}_{n}^{B}(\theta )\) and \(\sup _{\theta \in \Theta _{n}}\widehat{Q}_{n}^{C}(\theta )\). Without loss of generality, in what follows C denotes any positive constant. Note preliminarily that by assumption E-i.,
Now consider the term \(\widehat{Q}_{n}^{B}(\theta )\). Denote \(W=({\mathcal {E}},X^{\prime })^{\prime }\) and \(\kappa _{\theta ,h}(W_{i},W_{j})={\mathcal {E}} _{i}\left( g_{0}\left( \left\langle X_{j},\theta _{\star }\right\rangle \right) -g_{0}\left( \left\langle X_{j},\theta \right\rangle \right) \right) K_{ij}^{\theta }\). As a consequence of assumption M-iv. the class of functions \(g_{0}\) is Euclidean (see Lemma 2.13 of Pakes and Pollard 1989). By assumption K-i., the kernel K is of bounded variation and by Lemma 22(ii) in Nolan and Pollard (1987) the class of all functions K is Euclidean. Since the product of two classes of Euclidean functions is Euclidean (see Lemma 2.14(ii) of Pakes and Pollard 1989) then \(\kappa _{\theta ,h}\) is Euclidean.
Apply the centered Hoeffding decomposition to \(h\widehat{Q}_{n}^{B}(\theta )\) to obtain:
where \(U_{n}\widetilde{\kappa }_{1}\) is the first order U-process associated with a kernel \(\widetilde{\kappa }_{1}(W_{i})={\mathbb {E}}[\kappa _{\theta ,h}(W_{i},W_{j})\vert W_{i}]\), whereas \(U_{n}\widetilde{\kappa }_{\theta ,h}\) is the second order U-process associated with the kernel \(\widetilde{\kappa }_{\theta ,h}=\kappa _{\theta ,h}-\widetilde{\kappa }_{1}(W_{i})\). Moreover, note that \(\widetilde{\kappa }_{1}(W_{j})={\mathbb {E}}[\kappa _{\theta ,h}(W_{i},W_{j})\vert W_{j}]=0\).
One starts by investigating \(U_{n}\widetilde{\kappa }_{\theta ,h}\): from the argument above, \(\kappa _{\theta ,h}\) is Euclidean and, by Lemma 5 in Sherman (1994), the corresponding class of functions obtained by taking the conditional expectation is also Euclidean; one concludes that \(\widetilde{\kappa }_{\theta ,h}\) is Euclidean as a consequence of Lemma 2.14(i) in Pakes and Pollard (1989). By the triangle inequality, \(\left| \widetilde{\kappa }_{\theta ,h}\right| \le \vert \kappa _{\theta ,h}\vert +\vert \widetilde{\kappa }_{1}\vert\). First, \(\vert \kappa _{\theta ,h}\vert\) is investigated: by its definition, assumption M-iv. and the boundedness of the kernel:
Similarly, for what concerns the second addend one has
Hence, by putting the results together, \(\left| \widetilde{\kappa } _{\theta ,h}\right| \le C\vert {\mathcal {E}}_{i}\vert \left( {\ln n }/{n}\right) ^{r}\).
The Main Corollary in Sherman (1994) states that
where \(0<\alpha <1\). Plugging in the right hand side of the inequality the bound obtained for \(\vert \widetilde{\kappa }_{\theta ,h}\vert\), it can be controlled by \(C \left( {\ln n}/{n}\right) ^{r\alpha }{\mathbb {E}}\left[ \left( \vert {\mathcal {E}} _{i}\vert ^{2}\right) ^{\alpha }\right] ^{1/2}\). By concavity and Jensen’s inequality, \({\mathbb {E}}\left[ \left( \vert {\mathcal {E}}_{i}\vert ^{2}\right) ^{\alpha }\right] \le ({\mathbb {E}}\left[ \vert {\mathcal {E}}_{i}\vert ^{2}\right] )^{\alpha }\), where \({\mathbb {E}}[\vert {\mathcal {E}}_{i}\vert ^{2}]\) is bounded. Gathering the previous results one obtains
Finally, by invoking the Markov inequality,
The behaviour of \(U_{n}\widetilde{\kappa }_{1}\) is now investigated. First, since \(\kappa _{\theta ,h}\) is Euclidean, the corresponding class of functions obtained by taking the conditional expectation is also Euclidean by (Sherman 1994, Lemma 5), and hence so is \(\widetilde{\kappa }_{1}\). Then, using the bound (A9) and the Main Corollary in Sherman (1994), one achieves
Using similar arguments as above, one gets
By putting together (A10) and (A11), deduce that
The behaviour of the term \(\widehat{Q}_{n}^{C}(\theta )\) is now investigated. Denote its kernel by
Arguing as above, using the boundedness of X (assumption S-ii.) and the Lipschitz condition M-iv., one deduces that
Recalling that \(g_{0}\) is Euclidean, as well as the kernel K (see Nolan and Pollard 1987, Lemma 22(ii)), \(\kappa _{C}\) is Euclidean (see Pakes and Pollard 1989, Lemma 2.14(ii)). Therefore, it is possible to appeal to the Main Corollary of Sherman (1994) and, as a consequence of the bound for \(\vert \kappa _{C}\vert\),
By Markov inequality
Finally
Combining all the rates of convergence, the asymptotic normality follows directly.
In the case when \(\beta\) must be estimated by \(\widehat{\beta }\), one can follow similar steps as developed before. In particular, \(Q_n(\widehat{\theta })\) can be developed as a sum of six terms; besides \(\widehat{Q}_{n}^{A}\), \(\widehat{Q}_{n}^{B}\) and \(\widehat{Q}_{n}^{C}\), one has the following three extra terms:
- \(\widehat{Q}_{n}^{D} = \sum \limits _{i=1}^{n}\sum \limits _{j=1,j\ne i}^{n}{\mathcal {E}}_{i}\left\{ g_{0}^{\beta }\left( \left\langle \theta ,X_{j}\right\rangle \right) -g_{0}^{\widehat{\beta }}\left( \left\langle \theta ,X_{j}\right\rangle \right) \right\} K_{ij}^{\widehat{\theta }}\)
- \(\widehat{Q}_{n}^{E}= \sum \limits _{i=1}^{n}\sum \limits _{j=1,j\ne i}^{n}{\mathcal {E}}_{i}\left\{ g_{0}^{ \widehat{\beta }}\left( \left\langle \theta ,X_{j}\right\rangle \right) -g_{0}^{\widehat{\beta }}\left( \left\langle \widehat{\theta },X_{j}\right\rangle \right) \right\} K_{ij}^{\widehat{\theta }}\)
- \(\widehat{Q}_{n}^{F} =\sum \limits _{i=1}^{n}\sum \limits _{j=1,j\ne i}^{n}\left\{ g_{0}^{\beta }\left( \left\langle \theta ,X_{i}\right\rangle \right) -g_{0}^{\widehat{\beta } }\left( \left\langle \theta ,X_{i}\right\rangle \right) \right\} \left\{ g_{0}^{\widehat{\beta }}\left( \left\langle \theta ,X_{j}\right\rangle \right) -g_{0}^{\widehat{\beta }}\left( \left\langle \widehat{\theta },X_{j}\right\rangle \right) \right\} K_{ij}^{\widehat{\theta }}.\)
Assumption S-ii., the Lipschitzianity of \(g_0\), the rate of convergence of \(\widehat{\beta }\) and the fact that \(\vert g_0^\beta (u) - g_0^{\beta '}(u) \vert \le C \Vert \beta -\beta ' \Vert\) allow one to use the Euclidean property on the extra terms to establish the claim.
1.3 A.3: Proof of Theorem 3
First, note that assumption M-ii. guarantees that the variance \(\nu _n^2(\theta _\star )\) is bounded above and below. Then consider the following decomposition for \(Q_{n}\left( \theta _\star \right)\):
In what follows, if it is not strictly necessary, the dependency on \(\theta _\star\) is dropped, since \(\theta _\star\) is fixed.
Since \(Q_{n}^{a}\) is identical to \(Q_{n}\) and the assumptions are the same as in Theorem 1, one can invoke the Major inequality and proceed with steps similar to those in the proof of Statement (i) of Theorem 1. This leads to the asymptotic normality of \(n\sqrt{h}Q_{n}^{a}/\nu _n\), and hence \(n\sqrt{h}Q_{n}^{a}\) is bounded in probability.
Concerning the term \(Q_{n}^{b}\), first define a new kernel \(\kappa _{ij}^b={\mathcal {E}}_{i}G\left( \left\langle X_{j},\theta _\star \right\rangle \right) K_{ij}^{\theta _\star }\). By the same argument used for the kernel \(\kappa _{\theta _\star ,h}\) in the proof of Theorem 2, Statement (ii), the kernel \(\kappa _{ij}^b\) is Euclidean and, by the bounded variation condition on the function G and the kernel K, it follows that \(\left| \kappa _{ij}^b \right| \le C\left| {\mathcal {E}}_{i}\right|\). Hence, the Main Corollary of Sherman (1994) can be applied to the U-statistic \(\widetilde{Q}_{n}^{b}=hQ_{n}^{b}/\gamma _{n}\) and, together with concavity and Jensen's inequality applied to the error \({\mathcal {E}}_{i}\) and assumption M-i., one obtains \({\mathbb {E}}\left[ \left| n\widetilde{Q}_{n}^{b}\right| \right] \le C\). Finally, by the Markov inequality, \(nh^{1/2}\left| Q_{n}^{b}\right| =O_{{\mathbb {P}}}\left( \gamma _{n}h^{-1/2}\right)\).
To study the behaviour of the last term \(Q_{n}^{c}(\theta )\), define the new kernel \(\kappa ^{c}_{ij}=G\left( \left\langle X_{i},\theta \right\rangle \right) G\left( \left\langle X_{j},\theta \right\rangle \right) K_{ij}^{\theta }\). By the bounded variation condition on the functions G and K, it follows that \(\left| \kappa ^{c}_{ij} \right| \le C\). Since the kernel \(\kappa ^{c}_{ij}\) is Euclidean (see Pakes and Pollard 1989, Lemma 2.14(ii)), it is possible to apply the Main Corollary of Sherman (1994) to the U-statistic \(\widetilde{Q}_{n}^{c}=hQ_{n}^{c}/\gamma _{n}^{2}\) to derive \({\mathbb {E}}\left[ \left| n\widetilde{Q}_{n}^{c}\right| \right] \le C\). By the Markov inequality it holds that \(n h^{1/2}\left| Q_{n}^{c}-{\mathbb {E}}\left[ Q_{n}^{c}\right] \right| =O_{{\mathbb {P}}}\left( \gamma _{n}^{2} h^{-1/2}\right)\).
A lower bound for \({\mathbb {E}}[\vert hQ_{n}^{c} \vert ]\) is now studied. By definition
Let \(u_{1}=\left( t_{1}-t_{2}\right) /h\) and \(u_{2}=t_{2}\), and set \(M_{\theta _\star }=Gf_{\theta _\star }\). Since the kernel K has unit integral and G is of bounded variation, by the dominated convergence theorem one can treat (A12) as follows
where \(C=\int _{{\mathbb {R}}}M_{\theta _\star }^{2}(x_{2})dx_{2}>0\). One can conclude that for n large enough,
Consider now the following decomposition:
Thanks to the rates derived before one gets
Since the lower bound for the dominant term diverges as \(n\rightarrow +\infty\), the proof is concluded.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Chan, L., Delsol, L. & Goia, A. A link function specification test in the single functional index model. Adv Data Anal Classif (2023). https://doi.org/10.1007/s11634-023-00545-7