1 Introduction

Specifying the link function that models the dependence structure between two random elements is an important task in many statistical regression analyses. The problem can be hard to treat when a functional predictor X, that is, a random element taking values in a functional space, is used to explain the variability of a real random variable (r.v.) Y. In this case, in fact, the link is described through a real-valued operator r acting on a functional space: it is difficult to visualize and, consequently, to select a coherent specification for it. This kind of model is known as the functional regression model with scalar response:

$$\begin{aligned} Y=r[X]+{\mathcal {E}} \end{aligned}$$
(1)

where \({\mathcal {E}}\) is a centered random error uncorrelated with X. This model has appeared in many scientific domains and falls within the methodologies of so-called functional statistics. For a review of this relatively recent branch of statistics, see for instance the monographs (Ferraty and Vieu 2006; Horváth and Kokoszka 2012) or (Ramsay and Silverman 2005), or the papers that appeared in recent special issues (see, for instance, Aneiros et al. 2022, 2019a, b).

The interest in checking a specification of the regression operator for the model (1) has produced a rich line of research on structural testing procedures. To cite only some examples: in Aneiros and Vieu (2013) the authors deal with testing linearity in semi-functional partially linear regression models; in Bücher et al. (2011) a test for the hypothesis of a specific parametric functional regression model is introduced; in Cuesta-Albertos et al. (2019) a goodness-of-fit test for the functional linear model is illustrated; the paper (Delsol et al. 2011) tackles omnibus goodness-of-fit tests in a fully nonparametric framework; finally, the work (Patilea et al. 2016) provides a test for the case of a functional response in the spirit of the smoothing test statistic considered by Sheng (1996).

A useful aid in the context of model specification can come from semi-parametric regression approaches. Thanks to a projective strategy, they allow one to visualize the link function, combining flexibility and interpretability while avoiding some of the dimensionality problems that can occur in the fully nonparametric context. To get just a partial idea of the variety of techniques developed, one can see a general presentation in Härdle et al. (2004) and, as examples concerning functional statistics, the papers Ferraty et al. (2013), Goia and Vieu (2015), Ling and Vieu (2021) and Novo et al. (2019) and references therein.

The present paper explores the possibility of building a specification test by exploiting the potential of the Single Functional Index Model (SFIM). This defines the relationship between X and Y through an unknown real link function g acting on a projection of the functional regressor along an unknown direction \(\theta\), constrained for identifiability. Formally, let X be a random element valued in a Hilbert space \({\mathcal {H}}\) of real functions defined over a compact interval \({\mathcal {T}}\), equipped with an inner product \(\left\langle \cdot ,\cdot \right\rangle\) and associated norm \(\left\| \cdot \right\|\); then \(r[X]=g\left( \left\langle \theta ,X\right\rangle \right)\), where \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) and \(\theta \in {\mathcal {H}}\) with \(\left\| \theta \right\| ^{2}=1\) and \(\theta \left( t\right) >0\) for a fixed \(t\in {\mathcal {T}}\), for identifiability (see e.g. Ait-Saïdi et al. 2008). So far, various techniques have been introduced to estimate g and \(\theta\) from samples drawn from \(\left( X,Y\right)\), and the corresponding rates of convergence have been derived (see e.g. Jiang et al. 2020; Novo et al. 2019 or Shang 2020 for recent contributions).

The main interest of the SFIM is that it makes it possible to bring an infinite-dimensional problem back to a one-dimensional framework and to visualize an estimate \(\widehat{g}\) of g, obtained from an observed dataset, that can suggest the nature of the link between X and Y. This allows one to postulate a target specification \(g_{0}\) for g, depending on some real parameters, and then to check whether it is compatible with the observed dataset at a given significance level. For instance, if the plot of \(\widehat{g}\) exhibits a straight-line shape, the linearity of the regression model should be investigated. Once the link function is specified, the resulting model depends only on the functional parameter \(\theta\) and hence is fully parametric, with some practical and theoretical benefits in the estimation step (for instance, a faster rate of convergence than in the semi-parametric case, full interpretability of the link, good estimates for rather small sample sizes, and no smoothing parameters to be introduced). By way of example, one can consider the prediction of the moisture value for 80 samples of corn by using the first derivatives of the corresponding near-infrared (NIR) spectra measured over the wavelength range 1100–2498 nanometers (see the original spectra in the top panel of Fig. 1): an estimate of the link function g is plotted in the bottom panel of Fig. 1, and one may wonder whether the hypothesis of a linear link is compatible with the empirical evidence.

Fig. 1

Spectrometric curves (top) and estimate of the link function (bottom)

The aim of this work is to define and operationalize a suitable specification test procedure and then analyze its performance: given the SFIM framework, one wants to test the null hypothesis that the link function g belongs to a family of possible parametric functions:

$$\begin{aligned} {\mathcal {G}}_0 = \left\{ g_{0}^\beta :{\mathbb {R}}\rightarrow {\mathbb {R}},\beta =\left( \beta _{0},\beta _{1},\dots ,\beta _{d}\right) \in {\mathbb {R}}^{d+1} \right\} \ \ \ \ \ d\ge 1\text { integer} \end{aligned}$$

where the parameter \(\beta\) may or may not be entirely specified, against the alternative that g is not an element of \({{{\mathcal {G}}}}_0\). The main idea is to exploit the so-called conditional moment test approach (see e.g. Newey 1985), based on the fact that, under the null hypothesis, the quantity \({\mathbb {E}}\left[ {\mathcal {E}}{\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] w\left( X\right) \right]\), where \(w\left( X\right)\) is a suitable weight, is null, whereas it is strictly positive under the alternative. Therefore, by using a kernel regression approach, a test statistic belonging to the family of U-statistics is derived and appropriately standardized, and, under suitable assumptions on the distribution of X (in particular, expressed in terms of small-ball probabilities), on the involved kernel, on the nature of the error, and on the behaviour of the estimator of the SFIM used, it is proved that its asymptotic null distribution is Gaussian. Thanks to the latter statement, the p value of the test can be computed directly, and the power of the test, based on the asymptotic result, can be evaluated by means of Monte Carlo experiments carried out on samples of finite size and under various experimental conditions, that is, the nature of the link, the sample size, and the variability of the error model. In order to appreciate the robustness of the introduced test methodology, a comparison with the results obtained with some standard bootstrap procedures is made. Finally, to show how the test can be used for practical purposes, an application to prediction via the SFIM for the spectrometric dataset is performed and discussed.

The outline of the paper is as follows: the theoretical background, the basic principle of the test and the test statistic are defined in Sect. 2. Theoretical properties of the test statistic are discussed in Sect. 3: in particular, in Sects. 3.1 and 3.2 the null distribution of the test statistic is derived, whereas some remarks on consistency are provided in Sect. 3.3. The performance of the test is analyzed in Sect. 4 and the bootstrap approaches are described in Sect. 5. Finally, the real-world analysis is presented in Sect. 6. For the sake of readability, all the detailed proofs of the theoretical results are postponed to the technical Appendix.

2 Notations and test definition

Consider the random element \(\left( X,Y\right)\) defined on a probability space and mapping on \({\mathcal {H}}\times {\mathbb {R}}\), where \({\mathcal {H}}\) is a Hilbert space of real functions defined over a compact interval \({\mathcal {T}}\). From now on, one takes \({\mathcal {H}}={\mathcal {L}}_{\left[ 0,1\right] }^{2}\) the space of square integrable real functions defined over \(\left[ 0,1\right]\), equipped with the natural inner product \(\left\langle g,h\right\rangle =\int _{0}^{1}g\left( s\right) h\left( s\right) ds\) and associated norm \(\left\| g\right\| ^{2}=\left\langle g,g\right\rangle\).

Assume that the relation between Y and X is defined by the following SFIM:

$$\begin{aligned} Y=g(\left\langle X,\theta \right\rangle )+{\mathcal {E}} \end{aligned}$$
(2)

where \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is an unknown link function, \(\theta \in {\mathcal {H}}\) is an unknown direction such that \(\left\| \theta \right\| ^{2}=1\) and \(\theta \left( t\right) >0\) for a fixed t, for identifiability, and \({\mathcal {E}}\) is a r.v. satisfying \({\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] =0\) and \({\mathbb {E}}\left[ {\mathcal {E}}^{2}\vert X\right] =\sigma ^{2}\) (see e.g. Ait-Saïdi et al. 2008).

2.1 The test principle

Let \(g_{0}^\beta :{\mathbb {R}} \rightarrow {\mathbb {R}}\) be a known function, measurable w.r.t. the \(\sigma\)-algebra generated by X and depending on the parameter \(\beta =\left( \beta _{0},\beta _{1},\dots ,\beta _{d}\right) \in {\mathbb {R}} ^{d+1}\), and define \({{{\mathcal {G}}}}_0 = \{g_{0}^\beta :{\mathbb {R}} \rightarrow {\mathbb {R}},\beta \in {\mathbb {R}}^{d+1} \}\), with \(d\ge 1\) an integer. Consider then the following hypothesis:

$$\begin{aligned} H_{0}:g \in {{{\mathcal {G}}}}_0 \qquad \text {vs.\qquad } H_{1}: g \in {{{\mathcal {G}}}}_1 \end{aligned}$$

where \({{{\mathcal {G}}}}_1\) is a set of real functions \(g_1^\beta\) such that \({{{\mathcal {G}}}}_1 \cap {{{\mathcal {G}}}}_0 = \emptyset\). If \(\theta\) and \(\beta\) are fixed, the hypotheses are simple; otherwise they are composite. To fix ideas, the above setting includes the possibility of testing the linearity of the regression by specifying \({\mathcal {G}}_0\) as the set of affine functions \(g_{0}^\beta (u)=\beta _{0}+\beta _{1}u,\) with \(\beta _{0},\beta _{1}\in {\mathbb {R}}\), \(\beta _{1}\ne 0\). In practice, the specification of the model under the null hypothesis can be done by direct inspection of the scatterplot between the observed values of \(\langle X,\theta \rangle\) and those of Y, or by imposing an a priori model.

Consider:

$$\begin{aligned} {\mathbb {E}}\left[ \left( g(\left\langle X,\theta \right\rangle )-g_{0}^\beta \left( \left\langle X,\theta \right\rangle \right) \right) ^{2}w\left( X\right) \right] \end{aligned}$$
(3)

where \(g_0^\beta \in {{{\mathcal {G}}}}_0\), and w is a positive weight function. By the SFIM assumption (2), \({\mathbb {E}}\left[ Y\vert X\right] =g(\left\langle X,\theta \right\rangle )\), and this implies that

$$\begin{aligned} g(\left\langle X,\theta \right\rangle )-g_{0}^\beta \left( \left\langle X,\theta \right\rangle \right) = {\mathbb {E}} \left[ {\mathcal {E}}\vert X\right] \end{aligned}$$

where \({\mathcal {E}} = Y-g_{0}^\beta \left( \left\langle X,\theta \right\rangle \right) .\) Hence, (3) can be rewritten in the equivalent form:

$$\begin{aligned} {\mathbb {E}}\left[ {\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] ^{2}w\left( X\right) \right] ={\mathbb {E}}\left[ {\mathcal {E}}{\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] w\left( X\right) \right] . \end{aligned}$$

Under the null hypothesis the latter is null, because \(g(\langle X,\theta \rangle )=g_0^\beta (\langle X,\theta \rangle )\) a.s., whereas under the alternative it is strictly positive, because \({\mathbb {P}}\left( g_0^\beta (\langle X,\theta \rangle ) = g(\langle X,\theta \rangle ) \right) < 1\).

Thanks to the above principle, it is possible to implement a test procedure starting from an empirical version of \({\mathbb {E}}\left[ {\mathcal {E}} {\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] w\left( X\right) \right]\): the null hypothesis is rejected if the latter is significantly far from zero.

2.2 The test statistic

Consider a sample \(\left( X_{i},Y_{i}\right) ,i=1,\dots ,n\), of i.i.d. replications of \(\left( X,Y\right)\) and suppose that \(\theta\) and \(\beta\) are completely specified and equal to \(\theta _\star\) and \(\beta _\star\) respectively. Take a Nadaraya–Watson type nonparametric kernel estimate of \({\mathbb {E}}\left[ {\mathcal {E}}\vert X\right]\) at the point \(X_{i}\). Subsequently, an empirical version of \({\mathbb {E}}\left[ {\mathcal {E}}{\mathbb {E}}\left[ {\mathcal {E}}\vert X\right] w\left( X\right) \right]\) can be written as follows

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}{\mathcal {E}}_{i}\sum _{j=1,j\ne i}^{n}{\mathcal {E}}_{j}\frac{K\left( \delta \left( X_{i},X_{j}\right) /h\right) }{\sum _{j\ne i}K\left( \delta \left( X_{i},X_{j}\right) /h\right) }w\left( X_{i}\right) \end{aligned}$$
(4)

where \({\mathcal {E}}_{i}=Y_i - g_0^{\beta _\star }(\langle X_i, \theta _\star \rangle )\), K is a kernel function, h is a bandwidth, which depends on n, and \(\delta\) is a semi-metric.

In order to derive a convenient expression for the test statistic in the SFIM framework, some simplifications can be introduced. Firstly, since w can be chosen arbitrarily, if one assumes that the projected r.v. \(\left\langle X,\theta _\star \right\rangle\) admits a strictly positive probability density function \(f_{\theta _\star }\), then a possible choice is \(w=f_{\theta _\star }\). Since the latter is unknown, consider the following cross-validated kernel estimate of \(f_{\theta _\star }\), based on the same kernel, semi-metric and bandwidth as above:

$$\begin{aligned} f_{\theta _\star ,n}\left( X_{i}\right) =\frac{1}{\left( n-1\right) h} \sum _{j=1,j\ne i}^{n}K\left( \frac{\left| \left\langle X_i-X_j,\theta _\star \right\rangle \right| }{h}\right) . \end{aligned}$$

Secondly, one can select the projection semi-metric:

$$\begin{aligned} \delta \left( X_{1},X_{2}\right) =\left| \left\langle X_{1}-X_{2},\theta _\star \right\rangle \right| . \end{aligned}$$

By plugging these choices into (4), the following simplified expression of the test statistic follows:

$$\begin{aligned} Q_{n}\left( \theta _\star \right) =\frac{1}{n\left( n-1\right) h}\sum _{i=1}^{n}\sum _{j=1,j\ne i}^{n}{\mathcal {E}}_{i}{\mathcal {E}}_{j}K_{ij}^{\theta _\star } \end{aligned}$$
(5)

where \(K_{ij}^{\theta _\star }=K\left( \left| \left\langle X_{i}-X_{j},\theta _\star \right\rangle \right| /h\right)\).

Invoking arguments similar to those in Sheng (1996), one can derive the following estimate for the variance of \(n \sqrt{h} Q_{n}\left( \theta _\star \right)\):

$$\begin{aligned} \nu _{n}^{2}\left( \theta _\star \right) =\frac{2}{n\left( n-1\right) h}\sum _{i=1}^{n}\sum _{j=1,j\ne i}^{n}{\mathcal {E}}_{i}^{2}{\mathcal {E}}_{j}^{2}\left( K_{ij}^{\theta _\star }\right) ^{2}. \end{aligned}$$
(6)

This allows one to obtain the standardized version of the test statistic, \(n\sqrt{h}Q_{n}\left( \theta _\star \right) /\nu _{n}\left( \theta _\star \right)\), which can be used to test a simple null hypothesis.
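To fix ideas, the studentized statistic can be computed directly from the residuals and the projections. The following minimal NumPy sketch (an illustration under a fully specified null, not the authors' code; it assumes the Epanechnikov kernel that is used later in the simulations) implements (5) and (6):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 3/4 (1 - u^2) for |u| <= 1, as in Sect. 4."""
    u = np.asarray(u, dtype=float)
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def studentized_statistic(residuals, projections, h, kernel=epanechnikov):
    """T_n = n sqrt(h) Q_n / nu_n from (5) and (6).

    residuals   -- E_i = Y_i - g_0^{beta*}(<X_i, theta*>)
    projections -- the real numbers <X_i, theta*>
    """
    e = np.asarray(residuals, dtype=float)
    p = np.asarray(projections, dtype=float)
    n = len(e)
    # K_ij = K(|<X_i - X_j, theta*>| / h), with the diagonal removed (j != i)
    K = kernel(np.abs(p[:, None] - p[None, :]) / h)
    np.fill_diagonal(K, 0.0)
    Qn = (e @ K @ e) / (n * (n - 1) * h)                        # eq. (5)
    nu2 = 2.0 * ((e**2) @ (K**2) @ (e**2)) / (n * (n - 1) * h)  # eq. (6)
    return n * np.sqrt(h) * Qn / np.sqrt(nu2)
```

Note that flipping the sign of all residuals leaves the statistic unchanged, since both (5) and (6) are quadratic in the \({\mathcal {E}}_{i}\).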

When one deals with a composite null hypothesis, \(\theta\) and/or \(\beta\) are not specified and estimates of them have to be introduced. In particular, letting \(\widehat{\theta }\) be an estimator of \(\theta\), one defines \(\widehat{{\mathcal {E}}}_{i}=Y_{i}-g_{0}^{\beta _\star } ( \langle X_{i},\widehat{\theta }\rangle )\) and the resulting test statistic can be written as follows:

$$\begin{aligned} Q_{n}\left( \widehat{\theta }\right)\, =\frac{1}{n\left( n-1\right) h} \sum _{i=1}^{n}\sum _{j=1,j\ne i}^{n}\widehat{{\mathcal {E}}}_{i}\widehat{{\mathcal {E}}}_{j}K_{ij}^{\widehat{\theta }} \end{aligned}$$
(7)

where \(K_{ij}^{\widehat{\theta }}=K\left( \vert \langle \widehat{\theta },X_{i}-X_{j}\rangle \vert /h\right)\). To obtain a suitable estimator \(\widehat{\theta }\) for \(\theta\), it is possible to use, for instance, the first step of the approach proposed in Ferraty et al. (2013), which combines a spline approximation of the functional coefficient \(\theta\) with the one-dimensional Nadaraya–Watson approach to estimate the link function in the SFIM (2).

When the parameter \(\beta\) is also not specified and an estimate \(\widehat{\theta }\) is available, one has to consider an estimator \(\widehat{\beta }\) of \(\beta\), which can be obtained through a least squares approach by minimizing \(\sum _{i=1}^n \{Y_i - g_{0}^\beta (\langle X_{i},\widehat{\theta }\rangle )\}^2\). In particular, if \(g_{0}\) is linear affine, one directly regresses the observations \(Y_{i}\) against the projections \(\left\langle X_{i},\widehat{\theta }\right\rangle\). The studentized versions of these test statistics are obtained by plugging the estimates \(\widehat{\theta }\) and \(\widehat{\beta }\) into (6).
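In the linear affine case, the least squares step reduces to an ordinary regression of the responses on the estimated projections. A minimal sketch (illustrative only; the projections \(\langle X_{i},\widehat{\theta }\rangle\) are assumed to be already computed):

```python
import numpy as np

def fit_beta_affine(Y, projections):
    """Least squares estimate of (beta_0, beta_1) in g_0^beta(u) = beta_0 + beta_1 u,
    obtained by regressing the Y_i on the projections <X_i, theta_hat>."""
    u = np.asarray(projections, dtype=float)
    design = np.column_stack([np.ones_like(u), u])   # intercept + projection
    beta, *_ = np.linalg.lstsq(design, np.asarray(Y, dtype=float), rcond=None)
    return beta
```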

3 Asymptotic behaviour of the test statistics

To define the critical region of the test, one needs to derive the null distribution of the test statistics. To do this, one considers two scenarios, namely the simple null hypothesis, where both parameters \(\beta\) and \(\theta\) are completely specified, and the complex one: they are discussed in Sects. 3.1 and 3.2 respectively. The study of the behaviour under some alternative hypotheses concludes this section (see Sect. 3.3).

3.1 Simple null hypothesis

Suppose that \(g(\langle \theta , X \rangle ) = g_{0}^{\beta _\star } (\langle \theta , X \rangle )\) and that the direction \(\theta\) is specified and equal to \(\theta _\star\). Consider the following sets of assumptions; in what follows, \({\mathcal {F}}[\cdot ]\) denotes the Fourier transform.

Assumptions on the sample

S-i. :

\(\left( X_{i},Y_{i}\right) ,i=1,\dots ,n\), are i.i.d. replications of \(\left( X,Y\right)\);

Assumptions on the model

M-i. :

The moment generating function of the error \({\mathcal {E}}\) exists;

M-ii. :

There exist \(\underline{\sigma }^{2}\) and \(\overline{\sigma }^{2}\) such that \(0<\underline{\sigma }^{2}\le Var({\mathcal {E}}\vert X)\le \overline{\sigma }^{2}<\infty\) almost surely;

M-iii.1. :

There exists a constant \(C_1>0\) such that

$$\begin{aligned} \frac{1}{C_1}\le {\mathbb {E}}\left[ f_{\theta _\star } \left( \langle \theta _\star , X \rangle \right) \right] \le C_1. \end{aligned}$$
M-iii.2. :

There exist \(C_2>0\) and \(\epsilon >0\) such that \(\int _{\vert x\vert \le \epsilon }\vert {\mathcal {F}}[f_{\theta _\star }]\vert ^{2}(x)dx\ge C_2\).

M-iv. :

The function \(g^{\beta _\star }_0\) is Lipschitz.

Assumptions on the Kernel function

K-i. :

The kernel K is a continuous density of bounded variation and with strictly positive Fourier transform on the real line.

K-ii. :

As n diverges, \(h\rightarrow 0\) and \(\dfrac{\ln n}{(nh^{2})^{\lambda }}\rightarrow 0\), for some \(\lambda \in (0,1)\).

The main results are collected in the following theorem; its detailed proof is postponed to the Appendix for the sake of readability. The role of the assumptions in proving these results is discussed at the end of this section.

Theorem 1

Under the assumptions S, M and K and when the null hypothesis \(H_{0}\) holds true, one has:

  1. (i)

    \(\left| Q_{n}(\theta _\star )\right| =O_{{\mathbb {P}}}\left( \dfrac{\ln n}{nh^{1/2}}\right) ,\)

  2. (ii)

    \(\dfrac{1}{\nu _{n}^{2}(\theta _\star )}=O_{{\mathbb {P}}}(1),\)

  3. (iii)

    as n goes to infinity, \(n\sqrt{h}Q_{n}\left( \theta _\star \right) /\nu _{n}\left( \theta _\star \right) \sim {\mathcal {N}}\left( 0,1\right) .\)

According to statement (iii) in Theorem 1, if the null hypothesis \(H_{0}\) holds true, the test statistic \(T_{n}=n\sqrt{h}Q_{n}(\theta _\star )/\nu _{n}(\theta _\star )\) converges in law to a standard normal. Consequently, the test given by \({\mathbb {I}}_{\{T_{n}\ge z_{1-\alpha }\}}\), with \(z_{1-\alpha }\) the \((1-\alpha )\)-th quantile of the standard normal distribution, has asymptotic level \(\alpha\).
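Operationally, the one-sided decision rule amounts to comparing \(T_{n}\) with a standard normal quantile. A small sketch using only the Python standard library (illustrative; `one_sided_decision` is a hypothetical helper name):

```python
from statistics import NormalDist

def one_sided_decision(T_n, alpha=0.05):
    """Asymptotic-level-alpha test: reject H0 iff T_n >= z_{1-alpha}."""
    std_normal = NormalDist()            # standard normal distribution
    z = std_normal.inv_cdf(1.0 - alpha)  # (1 - alpha)-th quantile
    p_value = 1.0 - std_normal.cdf(T_n)  # one-sided p value
    return T_n >= z, p_value
```

For instance, an observed value \(T_n = 2.0\) is rejected at the 5% level, since \(z_{0.95}\approx 1.645\), whereas \(T_n = 1.0\) is not.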

To conclude the section, some remarks about the assumptions stated above are added. Firstly, one can point out that M-i., M-iv. and K-i. are standard hypotheses in the framework of nonparametric functional regression models (see for instance Ferraty and Vieu 2006). In particular, the bounded variation condition K-i. is satisfied by many well-known kernels, such as the Gaussian or Epanechnikov ones, and thanks to this condition the class of kernel functions considered subsequently is Euclidean, a notion crucial for achieving the proofs and defined in the Appendix. The latter notion, together with the regularity condition M-iv. imposed on the link function \(g_{0}\), allows one to utilise an existing exponential inequality due to Major (2006). To do this, the conditional expectations of the squared errors and of the squared kernel need to be investigated. The boundedness condition on the conditional variance (see M-ii.) allows one to bound the conditional errors from below and above. The behaviour of the squared kernel is studied through the properties of its Fourier transform. The technical assumptions, namely the positive Fourier transform on the real line (K-i.), the boundedness below and above of the pdf \(f_{\theta _\star }\) (see M-iii.1), and the boundedness below of the restricted Fourier transform of the pdf \(f_{\theta _\star }\) (see M-iii.2), allow one to establish upper and lower bounds for the kernel, and this verifies a condition required for Major's exponential inequality. Note that condition M-iii.2 is satisfied by many random variables, such as Gaussian or exponential ones. The existence of the moment generating function of the error \({\mathcal {E}}\) (see M-i.) and the trade-off between the sample size n and the bandwidth h (see K-ii.) permit establishing the rates of convergence in statements (i) and (ii) of Theorem 1. 
Finally, the collection of hypotheses further allows one to operate in a similar setting as in Lavergne and Patilea (2008) to establish the asymptotic normality in statement (iii).

3.2 Complex null hypothesis

Suppose that \(\beta\) is fixed and \(\theta\) is estimated by \(\widehat{\theta }\). Then one has to investigate the behaviour of \(Q_{n}(\widehat{\theta })\): to achieve the convergence results, the following set of extra assumptions is needed.

Extra assumptions on the estimate

E-i. :

Take \(\widehat{\theta }\) belonging to a set \(\Theta _{n}\) of possible directions of interest which is such that

$$\begin{aligned} \#\Theta _{n}=n^{p},p>0\text { and }\left\| \theta -\theta _{\star }\right\| \le C\left( \frac{\ln n}{n}\right) ^{r} \text { for any } \theta \in \Theta _{n} \end{aligned}$$

where \(\#\) is the cardinality of a set, \(p=p_{n}\) depends on n, and \(r>0\) depends on some regularity assumption on the regression operator in (1) (see e.g. Ferraty et al. 2013 or Novo et al. 2019).

E-ii. :

The bandwidth h satisfies: \(n^{1-2r}h^{1/2}\rightarrow 0.\)

Extra assumptions on the sample

S-ii. :

\(\Vert X_{i}\Vert\) is bounded.

Extra assumptions on the model

M-iii.1.bis :

There exists a constant \(C_1>0\) such that for any \(\theta \in \Theta _{n}\)

$$\begin{aligned} \frac{1}{C_1}\le {\mathbb {E}}\left[ f_{\theta }( \langle \theta , X \rangle ) \right] \le C_1. \end{aligned}$$
M-iii.2.bis :

There exist \(C_2>0\) and \(\epsilon >0\) such that for any \(\theta \in \Theta _{n}\), \(\int _{\vert x\vert \le \epsilon }\vert {\mathcal {F}}[f_{\theta }]\vert ^{2}(x)dx\ge C_2\).

Extra assumptions on the Kernel function

K-iii. :

\(p\ge 1\) increases to infinity with n and \(p^{3/2}(\ln n)^{-\lambda }\) is bounded, for some constant \(\lambda >0\).

Returning to the expression of \(Q_{n}\left( \widehat{\theta }\right)\), consider the following decomposition:

$$\begin{aligned} Q_{n}\left( \widehat{\theta }\right)&\,=\frac{1}{n\left( n-1\right) h}\sum _{i=1}^{n}\sum _{j=1,j\ne i}^{n}\left( Y_{i}-g_{0}\left( \left\langle X_{i},\widehat{\theta }\right\rangle \right) \right) \left( Y_{j}-g_{0}\left( \left\langle X_{j},\widehat{\theta }\right\rangle \right) \right) K_{ij}^{\widehat{\theta }}\\&=\frac{1}{n\left( n-1\right) h}\sum _{i=1}^{n}\sum _{j=1,j\ne i}^{n}{\mathcal {E}}_{i}{\mathcal {E}}_{j}K_{ij}^{\widehat{\theta }}+\\&\quad +2\frac{1}{n\left( n-1\right) h}\sum _{i=1}^{n}\sum _{j=1,j\ne i} ^{n}{\mathcal {E}}_{i}\left( g_{0}\left( \left\langle X_{j},\theta _{\star }\right\rangle \right) -g_{0}\left( \left\langle X_{j},\widehat{\theta }\right\rangle \right) \right) K_{ij}^{\widehat{\theta }}+\\&\quad +\frac{1}{n\left( n-1\right) h}\sum _{i=1}^{n}\sum _{j=1,j\ne i} ^{n}\left( g_{0}\left( \left\langle X_{i},\theta _{\star }\right\rangle \right) -g_{0}\left( \left\langle X_{i},\widehat{\theta }\right\rangle \right) \right) \times \\&\quad \times \left( g_{0}\left( \left\langle X_{j},\theta _{\star }\right\rangle \right) -g_{0}\left( \left\langle X_{j},\widehat{\theta }\right\rangle \right) \right) K_{ij}^{\widehat{\theta }}\\&=\widehat{Q}_{n}^{A}\left( \widehat{\theta }\right) +2\widehat{Q}_{n} ^{B}\left( \widehat{\theta }\right) +\widehat{Q}_{n}^{C}\left( \widehat{\theta }\right) \end{aligned}$$

The following result provides the asymptotic behaviour of each term in the previous decomposition and it demonstrates that the leading term is \(\widehat{Q}_{n}^{A}\left( \widehat{\theta }\right)\) whilst the other two terms are negligible with respect to the first one.

Theorem 2

Under Assumptions S, M, K and E, and when the null hypothesis \(H_{0}\) holds true, one has:

  1. (i)

    \(\left| \widehat{Q}_{n}^{A}\left( \widehat{\theta }\right) \right| =O_{{\mathbb {P}}}\left( \dfrac{p^{3/2}\ln n}{nh^{1/2}}\right) ,\)

  2. (ii)

    \(\left| \widehat{Q}_{n}^{B}\left( \widehat{\theta }\right) \right| =o_{{\mathbb {P}}}\left( \dfrac{p^{3/2}\ln n}{nh^{1/2}}\right) \quad \text {and} \quad \left| \widehat{Q}_{n}^{C}\left( \widehat{\theta }\right) \right| =o_{{\mathbb {P}}}\left( \dfrac{p^{3/2}\ln n}{nh^{1/2}}\right) ,\)

  3. (iii)

    as n goes to infinity, \(n\sqrt{h}Q_{n}\left( \widehat{\theta }\right) \Big / \nu _{n}\left( \widehat{\theta }\right) \sim {\mathcal {N}}\left( 0,1\right) .\)

In what follows, a short discussion of the assumptions is reported. Firstly, note that since \(\widehat{Q}_{n}^{A}(\widehat{\theta })\) is the version of \(Q_{n}(\theta )\) arising under a complex null hypothesis, the argument to deduce the rate of convergence in Theorem 2, statement (i), proceeds very similarly to the simple null hypothesis situation. Hence, the sets of conditions M and K, on the model and on the kernel function respectively, are still necessary, but with certain modifications or additions with respect to the simple null hypothesis case, such as M-iii.1.bis and M-iii.2.bis. An extra assumption on the kernel, namely the trade-off between the dimension p and the sample size n (K-iii.), allows one to establish the required result. In more detail, the fact that the kernel is a continuous density of bounded variation and the regularity condition on the link function \(g_{0}\) (assumptions K-i. and M-iv. respectively) allow one to invoke an inequality of Sherman (see Sherman 1994). Secondly, assumption E-i. is rather standard for deriving a uniform rate of convergence for the estimator of the link function in the SFIM (see, for instance, Novo et al. 2019). It imposes that the set of possible directions is finite but may grow to infinity as the sample size n increases, while every direction \(\theta \in \Theta _{n}\) becomes closer to \(\theta _{\star }\); the same therefore holds for \(\widehat{\theta }\), as it is an element of \(\Theta _{n}\). The latter assumption, the boundedness of X (assumption S-ii.) and the extra condition on the trade-off between the sample size n and the bandwidth h (assumption E-ii.), combined with Sherman's inequality, allow us to deduce the rate of convergence in statement (ii) of Theorem 2.

When \(\beta\) is also not specified, a least squares estimate \(\widehat{\beta }\) can be used, as mentioned previously. In that case, if \(\beta\) belongs to a compact subset of \({\mathbb {R}}^{d+1}\), if the estimate used achieves the rate \(n^{-r}\), \(0<r \le 1/2\), and if there exists a positive constant C such that, for any fixed argument u, \(\vert g_0^\beta (u) - g_0^{\beta '}(u) \vert \le C \Vert \beta -\beta ' \Vert\), then the result in Theorem 2 remains valid (see the proof in the Appendix). This is verified, for instance, when \(g_0^\beta\) is a polynomial model.

3.3 Some remarks on consistency

Consider the following alternative hypothesis:

$$\begin{aligned} H_1: g \in {{{\mathcal {G}}}}_1 = \left\{ g_{1}=g_{0}^{\beta }+\gamma _n G, \ G: {\mathbb {R}} \rightarrow {\mathbb {R}}, \ G \notin {{{\mathcal {G}}}}_0 \right\} \end{aligned}$$
(8)

where G is a smooth function and \(\gamma _{n}\) is a positive sequence tending to zero as \(n\rightarrow +\infty\).

The following result provides the asymptotic behaviour of the test statistic under the considered alternative hypothesis. For the sake of simplicity the case in which the parameter \(\beta\) and the direction \(\theta\) in the SFIM are completely specified (and equals to \(\beta _{\star }\) and \(\theta _{\star }\) respectively) is dealt with.

Theorem 3

Under the alternative hypothesis (8) and the assumptions S, M and K of Theorem 1, if G is of bounded variation, \(f_{\theta _\star }\) is continuous and bounded, \(Gf_{\theta _\star } \ne 0\) a.s., and \(\gamma _{n}^{2}h^{-1/2}\) diverges as n goes to infinity, then the following statement holds:

$$\begin{aligned} n\sqrt{h}Q_{n}\left( \theta _\star \right) /v_{n}(\theta _\star ) \longrightarrow +\infty \quad \text {in probability as } n \rightarrow +\infty . \end{aligned}$$
(9)

The proof of this result is deferred to the Appendix.

4 Simulation study

In this section the finite sample properties of the proposed test are explored by evaluating the empirical level and power under different experimental conditions. For each setting, the empirical rejection rate is computed as the proportion of times the test rejects the null hypothesis at the nominal level \(\alpha\) (here \(\alpha =5\%\)) over 1000 Monte Carlo replications. The critical region of the test is based on the Gaussian approximation of the null distribution provided in Theorem 2: one rejects the null hypothesis whenever the value of the studentized test statistic exceeds the quantile of order \(1-\alpha\) of the standard normal distribution. All the experiments are conducted using the software R.

The data used in all the simulations are generated according to the following SFIM model:

$$\begin{aligned} Y_{i}=g\left( \left\langle \theta ,X_{i}\right\rangle \right) +\sigma {\mathcal {E}}_{i}\qquad i=1,\dots ,n \end{aligned}$$

with \(n=50,100,200\), corresponding to small and medium sample sizes.

For any \(i=1,\dots ,n\), the functional covariate obeys to:

$$\begin{aligned} X_{i}\left( t\right) =2a_{i}t^{2}+b_{i}\sin (2\pi t)+c_{i}\cos (4\pi t)\qquad t\in \left[ 0,1\right] \end{aligned}$$

where \(a_{i}\), \(b_{i}\), \(c_{i}\) are i.i.d. uniform r.v.s over \(\left( -1,1\right)\), so that the random curves are centered and bounded and assumption S-ii. is satisfied; every trajectory is discretized over a grid of 100 equispaced design points. A sample of 30 such functional data is plotted in Fig. 2.

Fig. 2

A sample of 30 curves \(X\left( t\right)\) randomly selected

For what concerns the functional coefficient, one uses the normalized direction:

$$\begin{aligned} \theta \left( t\right) =\sqrt{2}\sin (2\pi t)\qquad t\in \left[ 0,1\right] . \end{aligned}$$

Due to the nature of the involved objects, the r.v.s \(\left\langle \theta ,X_{i}\right\rangle = \sqrt{2}(b_i/2-a_i/\pi )\) are centered, symmetric and bounded, with a strictly positive density \(f_\theta\) over \(\left( -(\pi +2)/(\sqrt{2}\pi ),(\pi +2)/(\sqrt{2}\pi )\right)\) exhibiting a trapezoidal behaviour over that interval; in this way assumption M-iii.1. is satisfied. In practice, since one works with discretized curves, all the inner products \(\left\langle \theta ,X_{i}\right\rangle\) are approximated by summations.
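The design above can be reproduced in a few lines. The sketch below (illustrative, and written in NumPy rather than in R, which the paper actually uses) generates the discretized curves, approximates the inner products by a trapezoidal sum, and checks them against the closed form \(\sqrt{2}(b_i/2-a_i/\pi )\):

```python
import numpy as np

rng = np.random.default_rng(123)
t = np.linspace(0.0, 1.0, 100)        # 100 equispaced design points on [0, 1]

def trapz(y, x):
    """Trapezoidal approximation of the integral, applied along the last axis."""
    return ((y[..., 1:] + y[..., :-1]) * 0.5 * np.diff(x)).sum(axis=-1)

# functional covariates X_i(t) = 2 a_i t^2 + b_i sin(2 pi t) + c_i cos(4 pi t)
n = 30
a, b, c = rng.uniform(-1.0, 1.0, size=(3, n, 1))
X = 2 * a * t**2 + b * np.sin(2 * np.pi * t) + c * np.cos(4 * np.pi * t)

# normalized direction theta(t) = sqrt(2) sin(2 pi t)
theta = np.sqrt(2.0) * np.sin(2 * np.pi * t)

# discretized inner products <theta, X_i> and their closed form
proj = trapz(X * theta, t)
closed = np.sqrt(2.0) * (b.ravel() / 2 - a.ravel() / np.pi)
```

On this grid the summation agrees with the closed form to within the discretization error, and the numerical norm of \(\theta\) is close to 1.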

As for the error in the model, the \({\mathcal {E}}_{i}\) are i.i.d. standard Gaussian r.v.s and, to control the signal-to-noise ratio, the variability coefficient \(\sigma\) is defined by \(\sigma ^{2}=\rho ^{2}Var\left( g\left( \left\langle \theta ,X\right\rangle \right) \right)\), where the latter variance is estimated for each sample from the data. Here \(\rho ^{2}=0.2\) and 0.5, corresponding to a theoretical coefficient of determination \(R^{2}=1/(1+\rho ^{2})\) of about 0.83 and 0.67, respectively. These choices guarantee that assumptions M.i. and M.ii. hold.
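The calibration \(\sigma ^{2}=\rho ^{2}Var\left( g\left( \left\langle \theta ,X\right\rangle \right) \right)\) implies a theoretical coefficient of determination \(R^{2}=Var(g(U))/(Var(g(U))+\sigma ^2)=1/(1+\rho ^2)\), which yields the two values quoted above. A short illustrative Python sketch (the original code is in R; the linear link used here is just an example):

```python
import numpy as np

rng = np.random.default_rng(1)

# projected scores U = <theta, X> = sqrt(2) * (b/2 - a/pi), with a, b ~ U(-1, 1)
a = rng.uniform(-1.0, 1.0, 10000)
b = rng.uniform(-1.0, 1.0, 10000)
U = np.sqrt(2.0) * (b / 2.0 - a / np.pi)

def g(u):               # linear link, as under H0 of Sect. 4.1 (beta0=0, beta1=1)
    return u

rho2 = 0.2              # signal-to-noise control

# sigma^2 = rho^2 * Var(g(<theta, X>)), the variance estimated from the sample
sigma = np.sqrt(rho2 * np.var(g(U)))
Y = g(U) + sigma * rng.standard_normal(U.size)

# implied theoretical R^2 = Var(g(U)) / (Var(g(U)) + sigma^2) = 1 / (1 + rho2)
r2_theory = 1.0 / (1.0 + rho2)
print(round(r2_theory, 3))                   # 0.833
print(round(1.0 / (1.0 + 0.5), 3))           # 0.667
```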

In all the experiments, composite hypotheses with both \(\beta\) and \(\theta\) unknown are tested. In particular, tests for linear and cubic links are analyzed in detail (all the tested functions satisfy assumption M.iv.). Hence, to operationalize the test procedure, estimates of \(\beta\) and \(\theta\) and a choice of the bandwidth h are necessary. For \(\beta\), the standard OLS approach is used, whereas to estimate \(\theta\) one adopts the first step of the approach proposed in Ferraty et al. (2013). Here one uses cubic splines with 10 internal knots and the Epanechnikov kernel

$$\begin{aligned} K\left( u\right) =\frac{3}{4}\left( 1-u^{2}\right) \qquad \left| u\right| \le 1. \end{aligned}$$

The same kernel is also used in evaluating the test statistic: since the bandwidth h is related to the estimation of \(f_{\theta }\), it is selected by the unbiased cross-validation approach used in estimating that density. Due to the nonparametric nature of the test statistic, the bandwidth selection could have an impact on the results of the simulation study. On the other hand, a systematic analysis of how the choice of bandwidth affects the performance of the test goes beyond the scope of this work, and is therefore not performed.
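The kernel above is a proper density (nonnegative and integrating to one), and the estimator of \(f_{\theta }\) it induces is the standard kernel density estimator evaluated on the projected scores. A minimal Python sketch (illustrative only; the paper works in R, and the helper names `epanechnikov` and `kde` are ours):

```python
import numpy as np

def epanechnikov(u):
    """K(u) = 3/4 (1 - u^2) for |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

# K is a proper density: nonnegative and integrating to one
grid = np.linspace(-1.0, 1.0, 100001)
h_grid = grid[1] - grid[0]
w = np.full(grid.size, h_grid)
w[0] = w[-1] = h_grid / 2.0                  # trapezoidal weights
mass = np.sum(w * epanechnikov(grid))
print(round(mass, 6))                        # 1.0

# kernel density estimate of f_theta from projected scores u_1, ..., u_n
def kde(x, sample, h):
    return epanechnikov((x - sample[:, None]) / h).mean(axis=0) / h

sample = np.random.default_rng(2).uniform(-1.0, 1.0, 50)
grid2 = np.linspace(-1.5, 1.5, 3001)
w2 = np.full(grid2.size, grid2[1] - grid2[0])
w2[0] = w2[-1] = (grid2[1] - grid2[0]) / 2.0
fhat_mass = np.sum(w2 * kde(grid2, sample, h=0.3))  # the KDE integrates to ~1
```

In the actual procedure the bandwidth `h` would be the one selected by unbiased cross-validation rather than a fixed value.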

4.1 Testing a linear affine link

Let \(\beta _{0},\beta _{1}\in {\mathbb {R}}\), \(\beta _{1}\ne 0\), \(u=\left\langle \theta ,x\right\rangle\), \(\theta\) unknown, \(\left| u\right| < \left( \pi +2\right) /\left( \sqrt{2}\pi \right)\), and consider the composite null hypothesis of a linear affine link:

$$\begin{aligned} H_{0}:g\left( u\right) =\beta _{0}+\beta _{1}u \end{aligned}$$

and the following alternatives:

$$\begin{aligned} H_{1}^{\left( 1\right) }\left( \gamma \right)&:g\left( u\right) =\beta _{0}+\beta _{1}u+\gamma \left( u\sqrt{2}\pi /\left( \pi +2\right) \right) ^{2}\\ H_{1}^{\left( 2\right) }\left( \gamma \right)&:g\left( u\right) =\beta _{0}+\beta _{1}u+\gamma \sin \left( u\sqrt{2}\pi ^{2}/\left( \pi +2\right) \right) \\ H_{1}^{\left( 3\right) }\left( \gamma \right)&:g\left( u\right) =\beta _{0}+\beta _{1}u+\gamma \cos \left( u\sqrt{2}\pi ^{2}/\left( \pi +2\right) \right) \end{aligned}$$

where \(\gamma >0\) controls the departure from the null hypothesis. To generate the models under the null and the alternative hypotheses, one uses \(\beta _{0}=0\), \(\beta _{1}=1\), so that \({\mathbb {E}}\left[ g\left( U\right) \right] =0\) under the null, and \(\gamma =0.2,0.3,0.4,0.5,0.6,0.7,0.8,1.2,1.4\).

To appreciate the differences between the considered models, the shapes of the functions g under the null and the alternative hypotheses with \(\gamma =0.5\) are drawn in Fig. 3: one can note that the quadratic perturbation involved in \(H_{1}^{\left( 1\right) }\) affects mainly the tails of the distribution of the projected data \(\left\langle \theta ,x\right\rangle\), the sine in \(H_{1}^{\left( 2\right) }\) the central parts of the two halves of the interval, whereas the cosine in \(H_{1}^{\left( 3\right) }\) acts on both the central part and the tails of the interval. Hence one expects the test to perform rather well in the last case for any sample size n and \(\rho\), even with \(\gamma\) close to zero. On the other hand, the performances are expected to be less good, at least for small samples and large \(\rho\), for the first two alternatives \(H_{1}^{\left( 1\right) }\) and \(H_{1}^{\left( 2\right) }\), where a rather large sample size is necessary to detect small deviations from linearity.
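The scaling factors appearing in the alternatives have a simple reading: with \(u_{\max }=(\pi +2)/(\sqrt{2}\pi )\), the map \(v(u)=u\sqrt{2}\pi /(\pi +2)\) sends the support \((-u_{\max },u_{\max })\) of the projected data onto \((-1,1)\), so that the quadratic bump \(v^{2}\) peaks at the tails, \(\sin (\pi v)\) vanishes at the centre and the endpoints and peaks in the middle of each half, and \(\cos (\pi v)\) is largest in absolute value at the centre and the tails. A quick numerical check (illustrative Python):

```python
import numpy as np

u_max = (np.pi + 2.0) / (np.sqrt(2.0) * np.pi)   # edge of the support of U

def v(u):
    """Rescaling used inside the alternatives: maps (-u_max, u_max) to (-1, 1)."""
    return u * np.sqrt(2.0) * np.pi / (np.pi + 2.0)

print(round(v(u_max), 12))              # 1.0: the support maps onto (-1, 1)
print(abs(np.sin(np.pi * v(u_max))))    # ~0: the sine bump vanishes at the edge
print(np.cos(np.pi * v(0.0)))           # 1.0: the cosine bump peaks at the centre
```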

Fig. 3

Shapes of the link functions g under \(H_{0}\) (solid), \(H_{1}^{\left( 1\right) }\left( 0.5\right)\) (dashed), \(H_{1}^{\left( 2\right) }\left( 0.5\right)\) (dotted) and \(H_{1}^{\left( 3\right) }\left( 0.5\right)\) (dotdash)

The estimated powers varying \(\gamma\) for the different scenarios considered are represented in Figs. 4, 5 and 6. First, note that, although an asymptotic approximation of the null distribution is used, the empirical level is rather close to the theoretical one, even for a relatively small sample size. Second, as expected, for any sample size n and given \(\rho ^{2}\), the further one moves away from linearity by increasing \(\gamma\), the greater the estimated power. In any case, the performances are very good when \(n=200\) and \(\rho ^{2}=0.2\), even with a relatively modest departure from the null hypothesis. Looking in more detail, the graphs support the previous comments relating the nature of the link functions to the power behaviour. In particular, the alternatives in the family \(H_{1}^{\left( 3\right) }\) produce the best results even for rather small \(\gamma\), whereas it appears harder to detect \(H_{1}^{\left( 1\right) }\) and \(H_{1}^{\left( 2\right) }\) correctly, at least for small \(\gamma\), small samples and a rather low signal-to-noise ratio.

Fig. 4

Power curves for the alternatives \(H_{1}^{\left( 1\right) }\left( \gamma \right)\) varying \(\gamma\) and for \(\rho ^{2}=0.2\) (left panel) and \(\rho ^{2}=0.5\) (right panel). The grey line represents the nominal level, the dotted line \(n=50\), the dashed line \(n=100\), and the solid line \(n=200\)

Fig. 5

Power curves for the alternatives \(H_{1}^{\left( 2\right) }\left( \gamma \right)\) varying \(\gamma\) and for \(\rho ^{2}=0.2\) (left panel) and \(\rho ^{2}=0.5\) (right panel). The grey line represents the nominal level, the dotted line \(n=50\), the dashed line \(n=100\), and the solid line \(n=200\)

Fig. 6

Power curves for the alternatives \(H_{1}^{\left( 3\right) }\left( \gamma \right)\) varying \(\gamma\) and for \(\rho ^{2}=0.2\) (left panel) and \(\rho ^{2}=0.5\) (right panel). The grey line represents the nominal level, the dotted line \(n=50\), the dashed line \(n=100\), and the solid line \(n=200\)

4.2 Testing a cubic link

Let \(\beta _{0},\beta _{1}\in {\mathbb {R}}\), \(\beta _{1}\ne 0\), \(u=\left\langle \theta ,x\right\rangle\), \(\theta\) unknown, \(\left| u\right| < \left( \pi +2\right) /\left( \sqrt{2}\pi \right)\). The second experiment considers the following null hypothesis:

$$\begin{aligned} H_{0}:g\left( u\right) =\beta _{0}+\beta _{1}u^{3} \end{aligned}$$

against the following alternatives

$$\begin{aligned} H_{1}\left( \gamma \right) :g\left( u\right) =\beta _{0}+\beta _{1}\left( u^{\gamma }{\mathbb {I}}_{u>0}-\left| u\right| ^{\gamma }{\mathbb {I}} _{u\le 0}\right) \qquad \gamma \ne 3 \end{aligned}$$

where \({\mathbb {I}}_{A}\) is the indicator function of the set A, and \(\gamma >0\) measures the departure from the null hypothesis. To generate the models one uses \(\beta _{0}=0\), \(\beta _{1}=1\) and \(\gamma =1,1.5,2,2.5,3.5,4,4.5,5\) (the link functions for some selected values of \(\gamma\) are drawn in Fig. 7).
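The alternative family is the signed power \(u\mapsto \mathrm {sign}(u)\left| u\right| ^{\gamma }\), which coincides with the cubic null link exactly at \(\gamma =3\). A small illustrative Python sketch (the helper name `g_alt` is ours):

```python
import numpy as np

def g_alt(u, gamma, beta0=0.0, beta1=1.0):
    """Signed-power link beta0 + beta1 * (u^gamma 1_{u>0} - |u|^gamma 1_{u<=0})."""
    u = np.asarray(u, dtype=float)
    return beta0 + beta1 * np.sign(u) * np.abs(u)**gamma

u = np.linspace(-1.0, 1.0, 5)
# at gamma = 3 the alternative coincides with the cubic null link
print(np.allclose(g_alt(u, 3.0), u**3))   # True
# at gamma = 1 it reduces to the identity
print(np.allclose(g_alt(u, 1.0), u))      # True
```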

Fig. 7

Shapes of the link functions g under \(H_{0}\) (solid), \(H_{1}\left( 1.5\right)\) (dashed), \(H_{1}\left( 2\right)\) (dotted), \(H_{1}\left( 4\right)\) (dotdash) and \(H_{1}\left( 5\right)\) (longdash)

Also in this second experiment, the results obtained, visualized in Fig. 8, are generally rather good. One can note that the estimated level is slightly higher than the nominal one, yielding a liberal test, in particular for relatively small sample sizes. For any fixed n and \(\rho ^{2}\), the estimated power increases coherently with the departure from the null hypothesis. In particular, it emerges that situations with \(\gamma <3\) are better discriminated than those with \(\gamma >3\), where the test is slightly less efficient. Anyway, for rather large samples the results are good even for small values of \(\gamma\) and \(\rho ^{2}=0.5\). In general, the results corroborate the fact that the Gaussian approximation of the null distribution works reasonably well.

Fig. 8
figure 8

Power curves for the alternatives \(H_{1}\left( \gamma \right)\) varying \(\gamma\) and for \(\rho ^{2}=0.2\) (left panel) and \(\rho ^{2}=0.5\) (right panel). The grey line represents the nominal level, the dotted line \(n=50\), the dashed line \(n=100\), and the solid line \(n=200\)

5 Some bootstrap procedures

Although the use of the asymptotic null distribution in defining the critical region of the test has proven capable of producing good results, one can explore the possibility of estimating the threshold of the critical region by using quantiles calculated through some bootstrap algorithms. Since methods based on bootstrapping the pairs \(\left( X_{i},Y_{i}\right)\) are not well suited here, approaches based on bootstrapping residuals are adopted.

The general procedure is described in the steps below:

  1. Estimate \(\beta\) and \(\theta\), and then compute the errors under the null hypothesis \(\widehat{{\mathcal {E}}}_{i}\) and the value of the test statistic \(T_{n}\);

  2. Compute the bootstrap version \(T_{n}^{\star }\) of the test statistic by using the bootstrapped errors \(\widehat{{\mathcal {E}}}_{i}^{\star }\) (see below for details);

  3. Repeat step 2 a large number B of times and compute the \(\left( 1-\alpha \right)\)-quantile \(\tau _{\alpha }^{\star }\) of the distribution of \(T_{n}^{\star }\);

  4. Compare \(T_{n}\) with \(\tau _{\alpha }^{\star }\): if the value of \(T_{n}\) is larger than \(\tau _{\alpha }^{\star }\), reject the null hypothesis.
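The steps above can be sketched, for the naive variant of step 2, roughly as follows (illustrative Python with a toy statistic; the paper recomputes the actual studentized statistic \(T_{n}^{\star }\) from the resampled errors, and its experiments are run in R):

```python
import numpy as np

def bootstrap_threshold(residuals, stat_fn, B=1000, alpha=0.05, seed=None):
    """Steps 2-3 for the naive bootstrap: resample the estimated errors with
    replacement, recompute the statistic B times, return the (1-alpha)-quantile."""
    rng = np.random.default_rng(seed)
    n = residuals.shape[0]
    stats = np.empty(B)
    for i in range(B):
        e_star = rng.choice(residuals, size=n, replace=True)   # step 2
        stats[i] = stat_fn(e_star)
    return np.quantile(stats, 1.0 - alpha)                     # step 3

# toy illustration: a studentized mean plays the role of T_n
resid = np.random.default_rng(3).standard_normal(200)
stat = lambda e: np.sqrt(e.size) * e.mean() / e.std()
tau = bootstrap_threshold(resid, stat, B=500, alpha=0.05, seed=4)
# step 4: reject H0 whenever the observed statistic exceeds tau
```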

For what concerns step 2., both naive and wild bootstrap approaches are adopted:

  (i) the naive bootstrap is based on a direct resampling with replacement of the estimated errors \(\widehat{{\mathcal {E}}}_{i}\), \(i=1,\dots ,n\);

  (ii) the wild bootstrap errors are calculated as \(\widehat{{\mathcal {E}}}_{i}^{\star }=\widehat{{\mathcal {E}}}_{i}\xi _{i}\), where the \(\xi _{i}\) are i.i.d. and independent of \(\left( X_{i},Y_{i}\right)\). In the experiment, three different distributions for \(\xi _{i}\) are used:

    (a) the Rademacher distribution with equiprobable values \(\left\{ -1,1\right\}\);

    (b) the distribution suggested by Mammen (1993), with values \(\left( 1-\sqrt{5}\right) /2\) and \(\left( 1+\sqrt{5}\right) /2\) taken with probabilities \(\left( \sqrt{5}+1\right) /\left( 2\sqrt{5}\right)\) and \(\left( \sqrt{5}-1\right) /\left( 2\sqrt{5}\right)\), respectively;

    (c) the standard Gaussian distribution.
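All three multiplier distributions are centred with unit variance; Mammen's two-point law additionally has third moment equal to one, which is the usual motivation for it in wild bootstrap schemes. These moment properties are easy to verify numerically (illustrative Python; the helper `moment` is ours):

```python
import numpy as np

sqrt5 = np.sqrt(5.0)

# (a) Rademacher: values -1 and 1, probability 1/2 each
rad_vals = np.array([-1.0, 1.0])
rad_probs = np.array([0.5, 0.5])

# (b) Mammen (1993): values (1 - sqrt(5))/2 and (1 + sqrt(5))/2 with
#     probabilities (sqrt(5) + 1)/(2 sqrt(5)) and (sqrt(5) - 1)/(2 sqrt(5))
mam_vals = np.array([(1.0 - sqrt5) / 2.0, (1.0 + sqrt5) / 2.0])
mam_probs = np.array([(sqrt5 + 1.0) / (2.0 * sqrt5), (sqrt5 - 1.0) / (2.0 * sqrt5)])

def moment(vals, probs, k):
    """k-th moment of a two-point distribution."""
    return float(np.sum(probs * vals**k))

# both two-point laws are centred with unit variance;
# Mammen's additionally has third moment one (numerically 0, 1 and 1 below)
print(moment(mam_vals, mam_probs, 1))
print(moment(mam_vals, mam_probs, 2))
print(moment(mam_vals, mam_probs, 3))

# wild bootstrap errors: e_star_i = e_hat_i * xi_i
rng = np.random.default_rng(5)
e_hat = rng.standard_normal(100)
xi = rng.choice(mam_vals, size=e_hat.size, p=mam_probs)
e_star = e_hat * xi
```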

To assess the performances of the test when the asymptotic null distribution and the bootstrap approaches are employed, an experiment similar to the one presented in Sect. 4 is carried out, choosing \(B=1000\) in step 3 of the algorithm.

In particular let \(\beta _{0},\beta _{1}\in {\mathbb {R}}\), \(\beta _{1}\ne 0\), \(u=\left\langle \theta ,x\right\rangle\), \(\theta\) unknown, \(\left| u\right| < \left( \pi +2\right) /\left( \sqrt{2}\pi \right)\). Consider the null hypothesis of a quadratic link:

$$\begin{aligned} H_{0}:g\left( u\right) =\beta _{0}+\beta _{1}u^{2} \end{aligned}$$

against the alternatives

$$\begin{aligned} H_{1}\left( \gamma \right) :g\left( u\right) =\beta _{0}+\beta _{1}\left| u\right| ^{\gamma }\qquad \gamma \ne 2 \end{aligned}$$

where \(\gamma =1,1.5,2.5,3\). For what concerns the data generation process, one fixes \(n=100\), \(\beta _{0}=0\), \(\beta _{1}=1\), and \(\rho ^{2}=0.2\).

The estimated powers varying \(\gamma\) (the case \(\gamma =2\) corresponds to the null hypothesis), based on 1000 Monte Carlo replications, are collected in Table 1. The results obtained in the different cases are very similar: the test based on the asymptotic distribution is slightly more liberal than all the tests based on the bootstrap approaches, whereas the estimated powers are very close in all the analyzed cases. In conclusion, the test that uses the quantiles of the standard Gaussian performs quite well even when rather small samples are available.

Table 1 Estimated level and powers for testing the quadratic specification under different experimental conditions when one uses the asymptotic null distribution (A), the naive bootstrap (N), or the wild bootstrap with Rademacher (R), Mammen (M) or Gaussian (G) distributions

6 Application to spectrometric data

An important task in domains like chemistry, medicine or the food industry is to determine the composition of a given substance. Since chemical analysis is rather expensive and requires time, it is often preferred to estimate that composition by using spectrometric curves, which can be easily obtained as the absorbance of reflected light at various wavelengths.

In this section one considers an example of such modelling from the food industry. The dataset (available at https://www.eigenvector.com/data) consists of 80 samples of corn, to each of which correspond both the values of moisture, oil, protein and starch contents obtained by chemical analysis, and the spectrometric curve measured by NIR spectrometers on the wavelength range 1100–2498 nm and discretized over an equispaced mesh of 700 points (the set of these curves is reproduced in Fig. 1). The aim is to model the chemical composition of the corn by using the spectrometric curves or, better, as is done in similar contexts, their derivatives.

A systematic study on this dataset was carried out in Delsol (2013), where no-effect tests in functional regression were performed taking as explanatory variables the original spectrometric curves and their successive derivatives, from the first to the fourth order. According to the results in the cited paper, the first derivative has a significant effect on the moisture content, no significant effects are detected for the oil and starch contents, whereas the effect of the fourth derivative on the protein content is not evident. On the other hand, the first derivative on the wavelength range 2010–2220 exhibits a significant effect on the protein content.

Starting from the latter evidence, attention is concentrated on modelling, through a SFIM, the moisture content using as covariate the first derivative on the whole wavelength range, and the protein content using as covariate the first derivative on the range 2010–2220. The estimated links g and directions \(\theta\), obtained by using the same procedure adopted in Sect. 4 with cubic splines with 10 and 2 knots respectively, are depicted in Fig. 9. Both graphs suggest that the link functions g in the SFIMs could be specified linearly: therefore, the test for linearity illustrated in Sect. 4.1 is performed.

In particular one wants to test the following model specification:

$$\begin{aligned} H_0: Y = \beta _0 + \beta _1 \langle X,\theta \rangle +{\mathcal {E}} \end{aligned}$$

Since all the parameters involved (\(\beta _0\), \(\beta _1\) and \(\theta\)) are unknown, they must be estimated from the data: the \(\widehat{\theta }\)s are the previously estimated directions plotted in Fig. 9, whereas \(\widehat{\beta }_0\) and \(\widehat{\beta }_1\) are the OLS estimates for the model under the null hypothesis with covariate \(\langle X,\widehat{\theta }\rangle\). For the sake of completeness, the estimates obtained are as follows: \(\widehat{\beta }_0 = 18.57\) and \(\widehat{\beta }_1= -37.80\) when the response is the moisture content, and \(\widehat{\beta }_0 = 13.74\) and \(\widehat{\beta }_1= 20.91\) when the response is the protein content. The behaviour of the residuals of these models, from which the test statistics are calculated, is shown in Fig. 10.
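The OLS step under the null reduces to a simple regression of Y on the estimated scores \(\langle X,\widehat{\theta }\rangle\). A minimal illustrative Python sketch (the paper uses R; the noiseless data below merely reuse the reported moisture coefficients as a numerical check, not the actual corn data):

```python
import numpy as np

def ols_fit(u, y):
    """OLS estimates (beta0, beta1) of the null model Y = beta0 + beta1 u + error,
    where u holds the projected scores <X, theta_hat>."""
    A = np.column_stack([np.ones_like(u), u])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# synthetic check: on noiseless linear data the fit recovers the coefficients
# (the numbers mimic the reported moisture estimates; this is not the corn data)
u = np.linspace(-1.0, 1.0, 80)
y = 18.57 - 37.80 * u
b0, b1 = ols_fit(u, y)
print(round(b0, 2), round(b1, 2))   # 18.57 -37.8
```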

By using the asymptotic null distribution, the following results are obtained: the p-value for the model having the moisture content as response is 0.695, whereas the one for the protein content is 0.719. Hence, one can conclude that in both cases the hypothesis of linearity of the link function appears compatible with the empirical evidence.

Fig. 9

Estimated link functions g and directions \(\theta\) when the response is the moisture content (top) or the protein content (bottom)

Fig. 10

Estimated errors under the null hypothesis when the response is the moisture content (left) or the protein content (right)

Since the linearity of the link functions is being tested, to complete the analysis a comparison with the linearity test proposed by Garcia-Portugues et al. (2014), available in the R package fda.usc, is performed. To be coherent with what is done above, one uses the estimation method based on B-splines with the same number of basis elements as considered in the SFIM. For the model with the moisture content as response, the p-value is 0.217, whereas for the model with the protein content as response, the p-value equals 0.224. In both cases, there is no reason to reject the hypothesis of linearity of the model: the latter results are thus coherent with what emerged from the proposed test.