1 Introduction

Semiparametric regression models have a long history in statistics (see, e.g., the textbooks Green and Silverman 1994; Bickel et al. 1998; Ruppert et al. 2003, and references therein). Because of their flexibility and versatility, they have been the object of an extensive and still very active literature. In this work, we propose an efficient (conditional) resampling-based test (Pesarin 2001; Hemerik and Goeman 2018b; Chung and Romano 2013) for the linear component in partially linear and semiparametric regression models with roughness penalties. The test can be applied to a vast class of extensively used models, with nonparametric components defined over one-dimensional, as well as over multi-dimensional domains, including manifold domains. This embraces, for instance, the highly popular semiparametric regression models based on splines (see, e.g., Heckman 1986; Yu and Ruppert 2002; Wand and Ormerod 2008; Wang 2019, and references therein), on thin-plate splines (see, e.g., Wood 2003), and on spherical splines (Wahba 1981), as well as semiparametric models based on recent smoothing techniques over two-dimensional (possibly irregularly shaped or curved) domains, such as soap film smoothing (Wood et al. 2008), bivariate-splines over triangulations (Lai and Schumaker 2007; Baramidze et al. 2006; Lai et al. 2009; Guillas and Lai 2010; Lai and Wang 2013; Wang et al. 2020), and Spatial Regression with Partial Differential Equation regularization (SR-PDE) (see, e.g., Sangalli et al. 2013; Azzimonti et al. 2015; Ettinger et al. 2016; Wilhelm et al. 2016; Sangalli 2021).

Various classical approaches are available to make inference in the context of semiparametric regression, and different strategies have been proposed to cope with the bias induced by the roughness penalty. Some possibilities include undersmoothing approaches developed for nonparametric models [see, e.g., the review in Hall and Horowitz (2013)], Bayesian approaches (Wahba 1983; Nychka 1988; Marra and Wood 2012) and various corrections of Wald-type test statistics, such as the sandwich estimators in Gray (1994) and Yu and Ruppert (2002), and the Speckman version in Speckman (1988) and Holland (2017). These approaches might nonetheless have poor performance in finite samples, due to the effects of the roughness penalty (see, e.g., Maas and Hox 2004; Freedman 2006). In particular, as also evidenced by the simulation studies reported in this work, such tests offer poor control of the Type I error.

Here we propose an innovative test for the linear part of semiparametric regression models, based on conditional resampling of a transformation of the residuals. Unlike other proposals, this test overcomes the problem of dependence in the residuals, which is particularly strong in semiparametric models. Some approaches proposed in the context of classical regression models, such as those in Huh and Jhun (2001) and Kherad-Pajouh and Renaud (2010), derive transformed residuals from the spectral decomposition of the residualizing matrix that projects onto the residual space. In the setting considered by these authors, the conditional distribution of the test statistic can be defined on the basis of permutations (see, e.g., Pesarin 2001; Chung and Romano 2013; Pauly et al. 2015; Winkler et al. 2014), rotations (Solari et al. 2014) or sign-flips (Hemerik et al. 2020) of such transformed residuals. These approaches are nonetheless not valid in the present context, since the residualizing matrix is not idempotent in the case of penalized regression models. Because of this, the transformed residuals are not spherical (i.e., they are not homoscedastic and independent), and the standard permutation, rotation or sign-flip procedures become invalid in our context. To overcome this problem, we here study a conditional sign-flip procedure, named the eigen sign-flip test, that preserves the finite sample covariance structure of the residuals, hence ensuring the asymptotic exactness of the derived test. This idea has been explored in Ferraccioli et al. (2021), restricted to a specific case of the SR-PDE model. The current work addresses instead the broad spectrum of highly popular semiparametric regression models mentioned above. Moreover, we study in detail the asymptotic properties of the test. In particular, we prove the asymptotic exactness of the test and derive similar results for interval hypotheses and confidence intervals. Some of the obtained results leverage the asymptotic properties of the estimator of the nonparametric part of the model. Such properties in turn depend on model-specific conditions, involving, for instance, the dimension and geometry of the domain over which the nonparametric term is defined, the roughness term being considered, and the type of basis. In the present work, we hence define assumptions that are general enough to cover a variety of semiparametric regression models, and refer the reader to other works for the appropriate specifications of such assumptions for the specific model being considered (e.g., to Claeskens et al. (2009) for univariate penalized splines estimators, to Holland (2017) for multivariate penalized splines estimators, to Xiao (2019) for general penalized splines, and to Arnone et al. (2021) for SR-PDE).

The paper is organized as follows. In Sect. 2 we briefly review the semiparametric penalized regression framework, outlining the forms of the associated discrete estimators. In Sect. 3 we recall some classical parametric approaches for inference on the linear part of a semiparametric regression model and summarize the properties of the score test statistic in this context. In Sect. 4 we present the eigen sign-flip test and describe its theoretical and asymptotic properties. In Sect. 5 we compare our proposal to more classical parametric approaches in extensive simulation studies. In Sect. 6 we present an application to the study of human development in Nigeria. Finally, some discussion and possible directions for future research are outlined in Sect. 7.

2 Semiparametric regression

Let \(y_{i} \in {\mathbb {R}}\) be the value of the variable of interest observed in correspondence with the covariates \({\textbf{x}}_{i} \in {\mathbb {R}}^{q}\) and the location \({\textbf{p}}_{i}\in \Omega \subseteq {\mathbb {R}}^{d},\) \(d\ge 1\). We consider the semiparametric model

$$\begin{aligned} y_{i} = {\textbf{x}}_{i}^{\top }\varvec{\beta } + f({\textbf{p}}_{i}) + \epsilon _{i}, \qquad i = 1, \dots , n, \end{aligned}$$
(1)

where \(\varvec{\beta } \in {\mathbb {R}}^{q}\) is the vector of regression parameters, f is a real-valued smooth function on \(\Omega \), and \(\epsilon _{i}\) are i.i.d. random errors with \({\mathbb {E}}(\epsilon _{i}) = 0\) and \({\mathbb {E}}(\epsilon _{i}^2) = \sigma ^{2}\).

The interest lies in estimating both the linear coefficients \(\varvec{\beta }\) and the nonparametric component f. However, the estimation of \((\varvec{\beta },f)\) in model (1) via maximum likelihood is usually inappropriate or infeasible, due to the infinite dimensionality of the nonparametric component f. To avoid this problem, some type of roughness penalty can be imposed, in order to reduce the space of possible solutions. In general, the resulting penalized likelihood estimators for \(\varvec{\beta }\) and f are the solution of the minimization problem

$$\begin{aligned} \underset{\varvec{\beta }, f}{\text {argmin}}\sum _{i = 1}^{n}(y_{i} - f({\textbf{p}}_{i}) - {\textbf{x}}_{i}^{\top }\varvec{\beta })^{2} + \lambda _{n}{\mathcal {P}}(f), \end{aligned}$$
(2)

where \({\mathcal {P}}(\cdot )\) is some type of roughness penalty. Depending on the assumptions on the domain \(\Omega \subseteq {\mathbb {R}}^{d},\) on the dimension d, and on the required smoothness of the function f, various proposals for \({\mathcal {P}}(f)\) have been considered in the literature, and different discretization procedures have been adopted to reduce the infinite-dimensional estimation problem (2) to a finite-dimensional one. For instance, for \(d=1\) and \(\Omega \) an interval of the real line, model (1)-(2) can involve the classical and extensively used O’Sullivan splines (O’Sullivan 1986; Heckman 1986; Yu and Ruppert 2002; Wand and Ormerod 2008), whose penalty is the integrated squared derivative of some order, and can, for instance, rely on B-spline bases. When \(\Omega \) is the real plane, it is possible to use thin-plate splines (see, e.g., Duchon 1977; Wahba 1990; Wood 2003), which involve as penalty the so-called thin-plate energy. Moreover, various recent techniques target two-dimensional bounded planar domains \(\Omega \subset {\mathbb {R}}^2\), including: soap film smoothing (Wood et al. 2008), which considers a penalty involving the Laplacian of f; bivariate splines over triangulations (Lai and Schumaker 2007; Guillas and Lai 2010; Lai and Wang 2013), whose regularizing term may include high-order derivatives; SR-PDE (Sangalli et al. 2013; Azzimonti et al. 2015), where the regularizing term can involve general second-order partial differential equations, and the estimation problem is discretized via finite element bases (Sangalli et al. 2013; Azzimonti et al. 2015) or advanced spline bases (Wilhelm et al. 2016). Some of these techniques also permit the construction of semiparametric models over spherical domains (Wahba 1981; Baramidze et al. 2006; Lai et al. 2009) and general surface domains (Ettinger et al. 2016; Wilhelm et al. 2016).

2.1 Discrete estimators

The estimation of model (1) usually involves the representation of the nonparametric component f through some type of basis expansion, depending on the penalization being considered. Let \(\Psi \in {\mathbb {R}}^{n\times K}\) be the matrix of the evaluations of the K basis functions \(\psi _{1}, \dots , \psi _{K}\) at the n data locations \({\textbf{p}}_{1}, \dots , {\textbf{p}}_{n}\), that is,

$$\begin{aligned} \Psi = \begin{bmatrix} \psi _{1}({\textbf{p}}_{1}) &{}\quad \dots &{}\quad \psi _{K}({\textbf{p}}_{1})\\ \vdots &{} \ddots &{} \vdots \\ \psi _{1}({\textbf{p}}_{n}) &{} \quad \dots &{}\quad \psi _{K}({\textbf{p}}_{n}) \end{bmatrix}. \end{aligned}$$

Then, we write \((f({\textbf{p}}_{1}), \dots , f({\textbf{p}}_{n}))^{\top }=\Psi \varvec{\gamma }\) for some vector of coefficients \(\varvec{\gamma }\in {\mathbb {R}}^K.\) Moreover, let P denote the \(K\times K\) positive semidefinite matrix representing the discretization of the penalty \({\mathcal {P}}(\cdot )\). Finally, set \({\textbf{y}}=(y_1,\ldots ,y_n)^{\top }\) and denote by \(X\in {\mathbb {R}}^{n\times q}\) the design matrix, whose i-th row is given by \({\textbf{x}}_{i}^{\top }\). The estimation problem (2) is therefore discretized as

$$\begin{aligned} \underset{\varvec{\beta }, \varvec{\gamma }}{\text {argmin}} \big \{ ({\textbf{y}} - X\varvec{\beta }- \Psi \varvec{\gamma })^{\top } ({\textbf{y}} - X\varvec{\beta }- \Psi \varvec{\gamma })+ \lambda _{n}\varvec{\gamma }^{\top }P\varvec{\gamma } \big \}. \end{aligned}$$
(3)

The solution to (3) is uniquely determined by the normal equations

$$\begin{aligned} X^\top ({\textbf{y}} - X\varvec{\beta } - \Psi \varvec{\gamma })&= 0 , \end{aligned}$$
(4)
$$\begin{aligned} (\Psi ^{\top }\Psi + \lambda P)\varvec{\gamma } - \Psi ^\top ({\textbf{y}} - X\varvec{\beta })&= 0. \end{aligned}$$
(5)

Setting

$$\begin{aligned} Q&= I_{n} - X(X^{\top }X)^{-1}X^{\top },\nonumber \\ \Lambda&= I_{n} - \Psi (\Psi ^{\top }\Psi + \lambda P)^{-1}\Psi ^{\top }, \end{aligned}$$
(6)

the explicit form of the estimators for \(\varvec{\beta }\) and \(\varvec{\gamma }\) is, respectively,

$$\begin{aligned} \hat{\varvec{\beta }}&= (X^{\top }\Lambda X)^{-1}X^{\top }\Lambda {\textbf{y}},\nonumber \\ \hat{\varvec{\gamma }}&= (\Psi ^{\top }Q\Psi + \lambda P)^{-1}\Psi ^{\top }Q{\textbf{y}}, \end{aligned}$$
(7)

or equivalently

$$\begin{aligned} \hat{\varvec{\beta }}&= (X^{\top }X)^{-1}X^{\top }({\textbf{y}}-\Psi \hat{\varvec{\gamma }}),\nonumber \\ \hat{\varvec{\gamma }}&= (\Psi ^{\top }\Psi + \lambda P)^{-1}\Psi ^{\top }({\textbf{y}} - X\hat{\varvec{\beta }}). \end{aligned}$$
(8)
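For concreteness, the estimators in (7) can be sketched in a few lines of R. The following minimal implementation assumes generic inputs y, X, Psi, P and a fixed smoothing parameter lambda; all function and variable names are illustrative, and no specific package is implied.

```r
# Minimal sketch of the closed-form estimators (7); all inputs assumed given.
fit_penalized <- function(y, X, Psi, P, lambda) {
  n <- length(y)
  # Lambda = I_n - Psi (Psi'Psi + lambda P)^{-1} Psi'  as in (6)
  Lambda <- diag(n) - Psi %*% solve(crossprod(Psi) + lambda * P, t(Psi))
  # Q = I_n - X (X'X)^{-1} X'  as in (6)
  Q <- diag(n) - X %*% solve(crossprod(X), t(X))
  beta_hat  <- solve(t(X) %*% Lambda %*% X, t(X) %*% Lambda %*% y)
  gamma_hat <- solve(t(Psi) %*% Q %*% Psi + lambda * P, t(Psi) %*% Q %*% y)
  list(beta = drop(beta_hat), gamma = drop(gamma_hat), Lambda = Lambda)
}
```

For large n, forming the dense n-by-n matrices Q and \(\Lambda \) explicitly is of course wasteful; practical implementations solve the normal equations (4)-(5) directly.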

3 Inference on \(\varvec{\beta }\)

In semiparametric regression, a natural question is whether the covariates X have an effect on the variable of interest. We are thus interested in the system of hypotheses

$$\begin{aligned} \text {H}_{0}: \varvec{\beta } = \varvec{\beta }_{0} \qquad \text {versus} \qquad \text {H}_{1}: \varvec{\beta } \ne \varvec{\beta }_{0}. \end{aligned}$$
(9)

A standard approach to verify (9) is to use a Wald-type test (see, e.g., Schervish 2012), based on the asymptotic distribution of \(\hat{\varvec{\beta }}\). The study of the asymptotic distribution of \(\hat{\varvec{\beta }},\) in semiparametric regression models, has been tackled by a number of works. See, for instance, Heckman (1986); Yu and Ruppert (2002); Li and Ruppert (2008); Holland (2017); Xiao (2019); Yu et al. (2019); Wang et al. (2020) for semiparametric models based on univariate and bivariate splines.

The parametric Wald-type test may nonetheless have poor performances in small sample scenarios, due to the overestimation of the variance of the test statistic, induced by the penalization. A number of corrections to Wald-type test have been proposed to avoid this issue, such as the sandwich estimators in Gray (1994) and Yu and Ruppert (2002) and the Speckman’s version in Speckman (1988) and Holland (2017). Nonetheless, these approaches can only partially solve the problem, and may lead to a poor control of the Type I error, especially when a strong temporal/spatial structure in the covariates is present, as indicated by the simulations carried on in Sect. 5.

In the Sect. 4 we introduce an innovative nonparametric alternative for testing on \(\varvec{\beta }\). Such proposal is based on the score statistic. For this reason, in the remainder of this section we review the properties of the score statistic in the context of penalized semiparametric regression. The proposed method does not rely on the estimation of the Fisher information matrix to define the null distribution, which is implicitly recovered by an appropriate nonparametric resampling procedure, as described in Sect. 4.

3.1 Properties of the score statistic in penalized semiparametric regression

We first study the distributional properties of the score statistic, which constitute the base of the nonparametric test defined in Sect. 4. Using the normal equation (4), we can define the classical score test statistic

$$\begin{aligned} X^{\top }({\textbf{y}} - X{\varvec{\beta }_{0}}- \Psi \varvec{\gamma }). \end{aligned}$$
(10)

Since \(\varvec{\gamma }\) is unknown, we can use the plug-in \(\hat{\varvec{\gamma }}\). Substituting \(\hat{\varvec{\gamma }}\) in expression (10), we define the test statistic T as

$$\begin{aligned} T&= n^{-1/2}X^{\top }({\textbf{y}} - \Psi \hat{\varvec{\gamma }} - X{\varvec{\beta }_{0}}) \nonumber \\&= n^{-1/2}X^{\top }({\textbf{y}} - \Psi (\Psi ^{\top }\Psi + \lambda P)^{-1}\Psi ^{\top }({\textbf{y}} - X{\varvec{\beta }_{0}}) - X{\varvec{\beta }_{0}}) \nonumber \\&= n^{-1/2}X^{\top }((I - \Psi (\Psi ^{\top }\Psi + \lambda P)^{-1}\Psi ^{\top }){\textbf{y}} - (I - \Psi (\Psi ^{\top }\Psi + \lambda P)^{-1}\Psi ^{\top })X{\varvec{\beta }_{0}}) \nonumber \\&= n^{-1/2}X^{\top }\Lambda ({\textbf{y}} - X{\varvec{\beta }_{0}}) = n^{-1/2}X^{\top }\Lambda {\textbf{r}}, \end{aligned}$$
(11)

with \({\textbf{r}} = {\textbf{y}} - X{\varvec{\beta }_{0}}\). We make the following assumption:

  (A1) For n large enough, the matrix \(\Psi ^{\top }\Psi \) is positive definite.

Assumption (A1) is quite general; its specification depends on the basis considered. In particular, this specification usually involves conditions on the nodes of the basis and on their position with respect to the design points \({\textbf{p}}_{1}, \dots , {\textbf{p}}_{n}.\) More specifically, it involves the type of basis, the rate at which the number of basis functions K grows with n, the minimum distance between the nodes, and the density of the design points inside the domain. For instance, in the case of univariate penalized splines estimators, (A1) follows from Assumptions 1-3 in Claeskens et al. (2009). In the case of multivariate penalized splines estimators, it follows from Assumptions 1-2 in Holland (2017). In the case of SR-PDE, it follows from Assumptions 3-5 in Arnone et al. (2021).

Here we consider the case of fixed designs, thus implicitly conditioning on the sample points and the covariates. Similar results can be obtained in the random design scenario, by introducing further assumptions on the distribution of the design points and covariates (e.g., that the covariates are realizations of continuous processes on \(\Omega \)).

Under (A1), we can consider the Demmler and Reinsch (1975) decomposition

$$\begin{aligned} (\Psi ^{\top }\Psi )^{-1/2}P(\Psi ^{\top }\Psi )^{-1/2} = U \text {diag}(\rho )U^{\top }, \end{aligned}$$

where U is the matrix of eigenvectors, and \(\rho \) is the corresponding vector of eigenvalues \(\{\rho _{k}\}_{k = 1}^{K}\) (see Eubank 1999, for details). Let us also denote \(A = \Psi (\Psi ^{\top }\Psi )^{-1/2}U\). Note that this matrix is semi-orthogonal, i.e., \(A^{\top }A = I_{K}\) and \(AA^{\top } = \Psi (\Psi ^{\top }\Psi )^{-1}\Psi ^{\top }\). Following Demmler and Reinsch (1975), we can rewrite the matrix \(\Lambda \) in (6) as

$$\begin{aligned} \Lambda = I_n - A (I_{K} + \lambda \text {diag}(\rho ))^{-1}A^{\top }. \end{aligned}$$
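Assuming, as in the sketch of Sect. 2.1, that a basis matrix Psi and a penalty matrix P are available, the Demmler-Reinsch construction can be reproduced numerically as follows (all names are illustrative):

```r
# Sketch of the Demmler-Reinsch construction; Psi, P, lambda assumed available.
M  <- crossprod(Psi)                              # Psi'Psi, p.d. under (A1)
eM <- eigen(M, symmetric = TRUE)
M_inv_half <- eM$vectors %*% diag(1 / sqrt(eM$values)) %*% t(eM$vectors)
dr  <- eigen(M_inv_half %*% P %*% M_inv_half, symmetric = TRUE)
U   <- dr$vectors; rho <- dr$values               # U diag(rho) U'
A   <- Psi %*% M_inv_half %*% U                   # semi-orthogonal: t(A) %*% A = I_K
Lambda_dr <- diag(nrow(Psi)) - A %*% diag(1 / (1 + lambda * rho)) %*% t(A)
# Lambda_dr coincides, up to numerical error, with Lambda as defined in (6)
```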

Using this decomposition, we can now study the behavior of the bias of the test statistic T,  in terms of the eigenvalues \(\rho _{k}\).

Lemma 3.1

Assume (A1) and let \({\tilde{X}} = A^{\top }X\) and \(\tilde{\varvec{\gamma }} = A^{\top }\Psi \varvec{\gamma }\). Let also \(\tilde{{\textbf{x}}}_{k}\), \(k = 1, \dots , K\), be the q-dimensional vectors corresponding to the rows of \({\tilde{X}}\), and \({\tilde{\gamma }}_{k}\) be the elements of the vector \(\tilde{\varvec{\gamma }}\). Under the null hypothesis (9), the bias \({\textbf{b}}_{\lambda }\) of T is

$$\begin{aligned} {\textbf{b}}_{\lambda } = n^{-1/2}\sum _{k = 1}^{K}\frac{\lambda \rho _{k}}{1 + \lambda \rho _{k}}\,\tilde{{\textbf{x}}}_{k}{\tilde{\gamma }}_{k} \le n^{-1/2}\lambda \sum _{k = 1}^{K}\rho _{k}\,\tilde{{\textbf{x}}}_{k}{\tilde{\gamma }}_{k}, \end{aligned}$$
(12)

where the inequality is considered element-wise.

Proof

Denote by \(\varvec{\epsilon }\) the n-dimensional vector of i.i.d. random errors. Under the null hypothesis, we have

$$\begin{aligned} {\mathbb {E}}(T)&= n^{-1/2}{\mathbb {E}}(X^{\top }\Lambda {\textbf{r}}) \\&= n^{-1/2}{\mathbb {E}}(X^{\top }\Lambda ({\textbf{y}} - X{\varvec{\beta }_{0}})) \\&= n^{-1/2}X^{\top }\Lambda {\mathbb {E}}(\Psi \varvec{\gamma } + X{\varvec{\beta }_{0}} - X{\varvec{\beta }_{0}}+ \varvec{\epsilon }) \\&= n^{-1/2}X^{\top }\Lambda \Psi \varvec{\gamma } + n^{-1/2} X^{\top }\Lambda {\mathbb {E}}(\varvec{\epsilon }) \\&= n^{-1/2}X^{\top }\Lambda \Psi \varvec{\gamma } = {\textbf{b}}_{\lambda }, \end{aligned}$$

since the term \({\mathbb {E}}(\varvec{\epsilon })\) is zero by assumption. Using the Demmler-Reinsch decomposition of \(\Lambda \) introduced in Sect. 3.1, it follows that

$$\begin{aligned} \Lambda&= I_{n} - A (I_{K} + \lambda \text {diag}(\rho ))^{-1}A^{\top } \nonumber \\&= AA^{\top } - AA^{\top } + I_{n} - A (I_{K} + \lambda \text {diag}(\rho ))^{-1}A^{\top } \nonumber \\&= A(I_{K} - (I_{K} + \lambda \text {diag}(\rho ))^{-1})A^{\top } - AA^{\top } + I_{n}. \end{aligned}$$
(13)

Substituting (13) in \({\textbf{b}}_{\lambda }\), we obtain

$$\begin{aligned} {\textbf{b}}_{\lambda }&= n^{-1/2}[X^{\top }(A(I_{K} - (I_{K} + \lambda \text {diag}(\rho ))^{-1})A^{\top } - AA^{\top } + I_{n})\Psi \varvec{\gamma }] \\&= n^{-1/2}[X^{\top }A(I_{K} - (I_{K} + \lambda \text {diag}(\rho ))^{-1})A^{\top }\Psi \varvec{\gamma } - X^{\top }AA^{\top }\Psi \varvec{\gamma } + X^{\top }\Psi \varvec{\gamma }] \\&= n^{-1/2}[X^{\top }A(I_{K} - (I_{K} + \lambda \text {diag}(\rho ))^{-1})A^{\top }\Psi \varvec{\gamma } - X^{\top }\Psi (\Psi ^{\top }\Psi )^{-1}\Psi ^{\top }\Psi \varvec{\gamma } + X^{\top }\Psi \varvec{\gamma }] \\&= n^{-1/2}X^{\top }A(I_{K} - (I_{K} + \lambda \text {diag}(\rho ))^{-1})A^{\top }\Psi \varvec{\gamma }. \end{aligned}$$

Using the notation \({\tilde{X}} = A^{\top }X\) and \(\tilde{\varvec{\gamma }} = A^{\top }\Psi \varvec{\gamma }\), the bias can therefore be rewritten as

$$\begin{aligned} {\textbf{b}}_{\lambda } = n^{-1/2}{\tilde{X}}^{\top }(I_{K} - (I_{K} + \lambda \text {diag}(\rho ))^{-1})\tilde{\varvec{\gamma }} = n^{-1/2}\sum _{k = 1}^{K}\frac{\lambda \rho _{k}}{1 + \lambda \rho _{k}}\,\tilde{{\textbf{x}}}_{k}{\tilde{\gamma }}_{k}, \end{aligned}$$
(14)

where \(\tilde{{\textbf{x}}}_{k}\) are the q-dimensional vectors corresponding to the rows of \({\tilde{X}}\), and \({\tilde{\gamma }}_{k}\) the elements of the vector \(\tilde{\varvec{\gamma }}\). Equation (14) highlights that the bias is a sum of K contributions, weighted by the eigenvalues \(\rho _{k}\), and moderated by \(\lambda \). Since \(x/(1+x) < x\), for \(x > 0\), we can bound the bias as follows:

$$\begin{aligned} {\textbf{b}}_{\lambda } \le n^{-1/2}\lambda \sum _{k = 1}^{K}\rho _{k}\,\tilde{{\textbf{x}}}_{k}{\tilde{\gamma }}_{k}. \end{aligned}$$

\(\square \)

The expression (14) highlights how the bias depends on the chosen penalization through the eigenvalues \(\rho _k\). We finally make the following assumption.

  (A2) The smoothing parameter \(\lambda = \lambda _{n}\) is chosen so that \(\lambda \sum _{i = 1}^{K}\rho _{i} = o(1)\).

Thanks to Lemma 3.1, assumption (A2) implies the asymptotic unbiasedness of the score statistic T, since f is a continuous function on the bounded domain \(\Omega \) and the covariates are realizations of a continuous process on \(\Omega \). This is a standard assumption when studying the asymptotic properties of semiparametric and nonparametric penalized regression models. As for Assumption (A1), Assumption (A2) also needs to be specified depending on the penalty and basis considered. Indeed, Assumptions (A1)-(A2) are intentionally left quite general, to embrace various semiparametric models; moreover, the precise rates of convergence are not of direct interest in this work. Theorem 1 in Claeskens et al. (2009) gives, for instance, the appropriate rates for \(\lambda \) in the case of univariate penalized splines estimators, Theorem 3 in Holland (2017) gives them for multivariate penalized splines estimators, while Lemma 3 in Arnone et al. (2021) gives them for SR-PDE estimators.

We can now state the main result for the asymptotic distribution of the test statistic T.

Theorem 3.2

Let \(\nu = \sigma ^{2}X^{\top }\Lambda ^{2}X\). Under the assumptions (A1)–(A2), the test statistic T in (11) is asymptotically normal under the null hypothesis (9), with

$$\begin{aligned} \sqrt{n} \nu ^{-1/2} T \overset{\cdot }{\sim } {\mathcal {N}}({\varvec{0}}, {\mathbb {I}}_{q}). \end{aligned}$$

Proof

We know that

$$\begin{aligned} {\mathbb {E}}(T) = n^{-1/2}{\mathbb {E}}(X^{\top }\Lambda {\textbf{r}})&= n^{-1/2}X^{\top }\Lambda \Psi \varvec{\gamma } + n^{-1/2}X^{\top }\Lambda {\mathbb {E}}(\varvec{\epsilon }) \\&= {\textbf{b}}_{\lambda } + n^{-1/2}\sum _{i = 1}^{n}[X^{\top }\Lambda ]_{i}{\mathbb {E}}(\epsilon _{i}) \end{aligned}$$

where the notation \([X^{\top }\Lambda ]_{i}\) is used to indicate the i-th column of the \(q\times n\) matrix \(X^{\top }\Lambda \). Under assumption (A2), it follows from (12) that the bias \({\textbf{b}}_{\lambda }\) is asymptotically zero. The expected value \({\mathbb {E}}(T)\) is therefore asymptotically zero. For the variance, under the null hypothesis we have

$$\begin{aligned} \text {Var}(T) = \text {Var}(n^{-1/2}X^{\top }\Lambda {\textbf{r}})&= n^{-1}X^{\top }\Lambda \text {Var}(\Psi \varvec{\gamma } + X\varvec{\beta }_{0} + \varvec{\epsilon } - X\varvec{\beta }_{0})\Lambda X\\&= n^{-1}X^{\top }\Lambda \text {Var}( \varvec{\epsilon })\Lambda X = n^{-1}\sigma ^{2}X^{\top }\Lambda ^{2} X. \end{aligned}$$

Substituting the expression of \(\Lambda \) from equation (13) in the previous expression, and recalling that \(A^{\top }A = I_{K}\), we obtain

$$\begin{aligned} \text {Var}(T) = n^{-1}\sigma ^{2}X^{\top }(I_{n} - AA^{\top })X + n^{-1}\sigma ^{2}X^{\top }A(I_{K} - (I_{K} + \lambda \text {diag}(\rho ))^{-1})^{2}A^{\top }X. \end{aligned}$$

Using the notation \({\tilde{X}} = A^{\top }X\) and completing the square in the second term, we hence get

$$\begin{aligned} \text {Var}(T) = n^{-1}\sigma ^{2}X^{\top }(I_{n} - AA^{\top })X + n^{-1}\sigma ^{2}\sum _{k = 1}^{K}\left( \frac{\lambda \rho _{k}}{1 + \lambda \rho _{k}}\right) ^{2}\tilde{{\textbf{x}}}_{k}\tilde{{\textbf{x}}}_{k}^{\top }, \end{aligned}$$
(15)

where \(\tilde{{\textbf{x}}}_{k}\) are the q-dimensional vectors corresponding to the rows of \({\tilde{X}}\). Note that the first term does not depend on \(\lambda \). As for the second term, since \(x^{2}/(1+x)^{2} < x^{2}\), for \(x > 0\), we have

$$\begin{aligned} \sum _{k = 1}^{K}\left( \frac{\lambda \rho _{k}}{1 + \lambda \rho _{k}}\right) ^{2}\tilde{{\textbf{x}}}_{k}\tilde{{\textbf{x}}}_{k}^{\top } \le \lambda ^{2}\sum _{k = 1}^{K}\rho _{k}^{2}\,\tilde{{\textbf{x}}}_{k}\tilde{{\textbf{x}}}_{k}^{\top } \le \Big (\lambda \sum _{k = 1}^{K}\rho _{k}\Big )^{2}\max _{k}\,\tilde{{\textbf{x}}}_{k}\tilde{{\textbf{x}}}_{k}^{\top }, \end{aligned}$$

where the maximum is taken element-wise. Therefore, for n large enough (since the covariates are realizations of a continuous process on \(\Omega \)), assumption (A2) implies that the second term in (15) vanishes faster than the first term. Concerning the first term in (15), it is easy to check that the matrix \(AA^{\top }\) is idempotent with rank K. Thus, it admits the spectral decomposition \(AA^{\top } = U \text {diag}(1, \dots , 1, 0, \dots , 0)U^{\top }\), with the first K non-null eigenvalues equal to 1. The term \(X^{\top }(I_{n} - AA^{\top })X\) is therefore the sum of \(n-K\) components with bounded variance, since the covariates are realizations of a continuous process on \(\Omega \), thus the Feller condition is satisfied. It follows from the central limit theorem [see, e.g., Van der Vaart (2000)] that the test statistic T is also asymptotically normal. \(\square \)

4 Eigen sign-flip test for the linear component in penalized semiparametric regression models

In the classical linear regression case, under the standard assumption of i.i.d. random noise, the score statistic can also be viewed as a sum of n contributions that have asymptotically zero mean, under the null hypothesis \(H_0\) (9). This information can be used to derive the null distribution of the test statistic, without the need for direct estimation of the Fisher information. In the context of semiparametric regression, instead, a first naive attempt to derive the distribution of the test statistic can be made by random permutations (or sign-flips) of the contributions of the score (Winkler et al. 2014; Hemerik et al. 2020). This approach, attempted in Ferraccioli (2020) for a simple type of SR-PDE model (Sangalli et al. 2013), might nonetheless not be optimal in the semiparametric regression setting. The reason for this lies in the fact that naive permutation accounts neither for the correlation between residuals, nor for the bias of the estimates, which is inherent to semiparametric models. To solve this issue, again in the special case of an SR-PDE model, Ferraccioli et al. (2021) define a new test statistic, which leverages the spectral decomposition of the matrix \(\Lambda \), leading to the definition of the eigen sign-flip test.

We here define the eigen sign-flip test on \(\varvec{\beta }\) for general forms of penalized semiparametric regression models. We study the properties of the test, deriving its asymptotic distribution. A thorough discussion of the nature of the proposed test is given in Sect. 4.2.

Definition 1

(Eigen sign-flip test) Let us consider the singular value decomposition \(\Lambda = VDV^{\top }\). Set \(\Pi = \textrm{diag}(\pi _{1}, \dots , \pi _{n})\), where \(\varvec{\pi } = (\pi _{1}, \dots , \pi _{n})\) is a random vector uniformly distributed in \(\{-1, 1\}^{n}\). Let us also define the n-dimensional vectors \({\tilde{X}} = D^{1/2}V^{\top }X\) and \(\tilde{{\textbf{r}}} = D^{1/2}V^{\top }{\textbf{r}} = D^{1/2}V^{\top }({\textbf{y}} - X{\varvec{\beta }_{0}})\). The eigen sign-flip statistic is defined as

$$\begin{aligned} T_{\Pi } = n^{-1/2}X^{\top }V\Pi V^{\top }\Lambda ({\textbf{y}} - X{\varvec{\beta }_{0}}) = n^{-1/2}{\tilde{X}}^{\top }\Pi \tilde{{\textbf{r}}}. \end{aligned}$$

Note that the observed statistic \(T = T_{I}\) corresponds to the case where \(\pi _{i} = 1,\ i=1,\ldots ,n\). As standard in permutational approaches, the component-wise p-values are thus computed as the rank of \(T_{I}\) with respect to a sample of M sign-flips \(\varvec{\pi }\), divided by M (see, e.g., Pesarin 2001).
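For a single covariate of interest, the procedure of Definition 1, together with the p-value computation just described, can be sketched in a few lines of R. This is a minimal illustration under assumed inputs (a response y, a covariate vector x, and the matrix \(\Lambda \) of (6)), not a reference implementation; here a two-sided p-value is returned.

```r
# Minimal sketch of the eigen sign-flip test (Definition 1), single covariate.
eigen_sign_flip_test <- function(y, x, Lambda, beta0 = 0, M = 1000) {
  n  <- length(y)
  ed <- eigen(Lambda, symmetric = TRUE)                     # Lambda = V D V'
  x_tilde <- sqrt(ed$values) * drop(t(ed$vectors) %*% x)    # D^{1/2} V' x
  r_tilde <- sqrt(ed$values) * drop(t(ed$vectors) %*% (y - x * beta0))
  contrib <- x_tilde * r_tilde            # the n contributions to be sign-flipped
  T_obs   <- sum(contrib) / sqrt(n)
  T_flip  <- replicate(M, sum(sample(c(-1, 1), n, replace = TRUE) * contrib)) / sqrt(n)
  # two-sided p-value: relative rank of |T_obs| among observed and flipped values
  mean(abs(c(T_obs, T_flip)) >= abs(T_obs))
}
```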

4.1 Asymptotic properties of the eigen sign-flip test

We now study the asymptotic properties of the test statistic \(T_{\Pi }\) in Definition 1. We first show that the asymptotic distribution of the test statistic \(T_{\Pi }\) is the same as \(T_{I}\). We then show that the eigen sign-flip test is asymptotically exact.

Theorem 4.1

Let \(\nu = \sigma ^{2}X^{\top }\Lambda ^{2}X\). Under the assumptions (A1)–(A2), for any given \(\Pi \), the distribution of \(T_{\Pi }\) is asymptotically normal, with

$$\begin{aligned} \sqrt{n} \nu ^{-1/2} T_{\Pi } \overset{\cdot }{\sim } {\mathcal {N}}({\varvec{0}}, {\mathbb {I}}_{q}). \end{aligned}$$

Proof

For the expected value, under the null hypothesis we have

$$\begin{aligned} {\mathbb {E}}(T_{\Pi })&= n^{-1/2}{\mathbb {E}}({\tilde{X}}^{\top }\Pi \tilde{{\textbf{r}}}) \\&= n^{-1/2}{\mathbb {E}}(X^{\top }V\Pi V^{\top } \Lambda ({\textbf{y}} - X{\varvec{\beta }_{0}})) \\&= n^{-1/2}X^{\top }V\Pi V^{\top } \Lambda {\mathbb {E}}({\textbf{y}} - X{\varvec{\beta }_{0}}) \\&= n^{-1/2}X^{\top }V\Pi V^{\top } \Lambda \Psi \varvec{\gamma }. \end{aligned}$$

Following the same reasoning as in the proof of Theorem 3.2, but with the quantity \(V \Pi V^{\top }X\) in place of X, we can show that the expected value of \(T_{\Pi }\) is asymptotically zero.

As for the variance, under the null hypothesis we have

$$\begin{aligned} \text {Var}(T_{\Pi })&= \text {Var}(n^{-1/2}{\tilde{X}}^{\top }\Pi \tilde{{\textbf{r}}}) \\&= \text {Var}(n^{-1/2}X^{\top }V\Pi D V^{\top }({\textbf{y}} - X{\varvec{\beta }_{0}})) \\&= n^{-1}X^{\top } V D\Pi V^{\top } \text {Var}(\varvec{\epsilon }) V\Pi D V^{\top } X \\&= n^{-1}\sigma ^{2} X^{\top } \Lambda ^2 X=n^{-1}\nu . \end{aligned}$$

It follows from the central limit theorem (Van der Vaart 2000), that the test statistic \(T_{\Pi }\) is also asymptotically normal. \(\square \)

Remark 1

Note that the bias in the mean of the test statistic is intrinsic in the regularization approach, and cannot be avoided in the finite sample scenario. Because of this bias, we are only able to reach asymptotically exact results.

Remark 2

Note also that the matrix \(\Pi \) is defined so that it commutes with D. This is necessary to ensure that the variance of the test statistic is invariant under the action of \(\Pi \).

We now introduce some notation before establishing the main result, which constitutes the pivotal step in proving the asymptotic control of the probability of Type I error. For the sake of simplicity of exposition, we consider the results for the single covariate case in the remainder of this section and in Sect. 4.3. In Sect. 4.4, we outline the procedure for the general multivariate case. Let \(\alpha \in [0, 1)\). For any \(a \in {\mathbb {R}}\), let \(\lceil a \rceil \) be the smallest integer which is larger than or equal to a, and let \(\lfloor a \rfloor \) be the largest integer which is at most a. We consider all the possible \(w = 2^{n}\) sign-flips \(\Pi _1,\ldots ,\Pi _w,\) where \(\Pi _1=I\). For a given value of the test statistic \(T_{I}^n\), we hence consider all the associated sign-flipped values \(T_{I}^n,T_{\Pi _2}^n, \ldots , T_{\Pi _w}^n\), where we use the superscript n to highlight the sample size. We denote by \(T_{(1)}^n \le \ldots \le T_{(w)}^n\) the corresponding sorted values. Finally, we write \(T_{[1-\alpha ]}^{n} = T_{(\lceil (1-\alpha ) w \rceil )}^{n}\).

Theorem 4.2

Consider the test that rejects \(H_{0}\) if and only if \(T_{I}^{n} > T_{[1-\alpha ]}^{n}\). Then, under the null hypothesis, the test is asymptotically exact and the rejection probability \({\mathbb {P}}(T_{I}^{n} > T_{[1-\alpha ]}^{n})\) is at most \(\alpha \).

Proof

We need to show that the asymptotic distribution of the \(2^n\)-dimensional vector of test statistics \({\textbf{T}}=(T_{I}^n, \ldots , T_{\Pi _w}^n)^{\top }\) is invariant under sign-flip transformations \(\Pi \), that is, \({\textbf{T}}{\mathop {=}\limits ^{d}}\Pi \circ {\textbf{T}}\), where \({\mathop {=}\limits ^{d}}\) denotes equality in distribution and the composition stands for \(\Pi \circ {\textbf{T}}=\Pi \circ (T_{I}^n,T_{\Pi _2}^n, \ldots , T_{\Pi _w}^n)^{\top }=(T_{\Pi I}^n,T_{\Pi \Pi _2}^n, \ldots , T_{\Pi \Pi _w}^n)^{\top }\). This will prove the asymptotic control of the Type I error through Theorem 15.2.1 in Lehmann and Romano (2008) and Theorem 1 in Hemerik and Goeman (2018a).

Let \(\tilde{{\textbf{X}}}\) be the diagonal \(n \times n\) matrix with elements \(({\tilde{X}}_{1}, \dots , {\tilde{X}}_{n})\). The test statistic in Definition 1 can be rewritten as

$$\begin{aligned} T_{\Pi } = n^{-1/2}{\tilde{X}}^{\top }\Pi \tilde{{\textbf{r}}} = n^{-1/2}\mathbb {1}_{n}^{\top }\Pi \tilde{{\textbf{X}}}\tilde{{\textbf{r}}}, \end{aligned}$$

where \(\mathbb {1}_{n}\) is the n-dimensional unit vector. The test statistic \(T_{\Pi }\) can hence be viewed as a sum of n contributions, where each element of \(\tilde{{\textbf{X}}}\tilde{{\textbf{r}}}\) is sign-flipped through \(\Pi \). Similarly, the variance of \(T_\Pi \) can be written as

$$\begin{aligned} \text {Var}(T_{\Pi }) = n^{-1}\sigma ^{2}\mathbb {1}_{n}^{\top }\tilde{{\textbf{X}}}D\tilde{{\textbf{X}}}\mathbb {1}_{n}; \end{aligned}$$

see also Theorem 4.1.

To evaluate the joint distribution of the test statistics \({\textbf{T}}\), let us now define \(\varvec{\Pi }\) as the \(2^n\times n\) matrix collecting all the \(w=2^n\) sign-flip vectors row-wise. Therefore, we can write \({\textbf{T}} = n^{-1/2}\varvec{\Pi } \tilde{{\textbf{X}}}\tilde{{\textbf{r}}}\), and \(\Pi \circ {\textbf{T}}=n^{-1/2}\varvec{\Pi } \Pi \tilde{{\textbf{X}}}\tilde{{\textbf{r}}}\). The joint distribution of \({\textbf{T}}\) is asymptotically multivariate normal with variance \(\text {Var}({\textbf{T}})=n^{-1}\sigma ^2\varvec{\Pi } \tilde{{\textbf{X}}}D\tilde{{\textbf{X}}} \varvec{\Pi }^{\top }\) and asymptotically zero mean. We now have to show that \(\Pi \circ {\textbf{T}}\) follows the same asymptotic multivariate normal distribution. First note that the transformation \(\Pi \) does not affect the expected value, which remains asymptotically zero. Furthermore, for the variance we have

$$\begin{aligned} \text {Var}(\Pi \circ {\textbf{T}})&=n^{-1}\sigma ^2\varvec{\Pi }\Pi \tilde{{\textbf{X}}}D\tilde{{\textbf{X}}}\Pi \varvec{\Pi }^{\top }\\&=n^{-1}\sigma ^2\varvec{\Pi }\Pi \Pi \tilde{{\textbf{X}}}D\tilde{{\textbf{X}}} \varvec{\Pi }^{\top }\\&=n^{-1}\sigma ^2\varvec{\Pi } \tilde{{\textbf{X}}}D\tilde{{\textbf{X}}} \varvec{\Pi }^{\top }. \end{aligned}$$

This yields the asymptotic null invariance \({\textbf{T}}{\mathop {=}\limits ^{d}}\Pi \circ {\textbf{T}}\). Thanks to Theorem 15.2.1 in Lehmann and Romano (2008) and Theorem 1 in Hemerik and Goeman (2018a), it follows that, under \(H_0\), \({\mathbb {P}}(T_{I}^{n} > T_{[1-\alpha ]}^{n})\le \alpha \). \(\square \)

Remark 3

The previous result is still valid in the case when \(w \ne 2^{n}\), i.e., when not every element of \(\{1, -1\}^{n}\) is used once (see, e.g., Hemerik and Goeman 2018a). For computational reasons, it is in fact common practice to sample uniformly from \(\{-1, 1\}^{n}\), with or without replacement. The same holds also for the results in the next section.

4.2 On the nature of the eigen sign-flip test

A few comments on this approach may be useful to understand its nature.

In the simpler context of linear regression models, it is possible to define a residualizing matrix that projects onto the residual space. This is an orthogonal projection matrix; as such, it is idempotent, and all its eigenvalues are zero or one. Thanks to this, for classical linear regression models, Kherad-Pajouh and Renaud (2010) propose to pre-multiply the residuals by the semi-orthogonal matrix defined by the eigenvectors corresponding to the non-null eigenvalues. This pre-multiplication transforms the n residuals into pseudo-residuals, reducing their cardinality. The number of these new pseudo-residuals is equal to the rank of the residualizing projection matrix (i.e., the number of non-zero eigenvalues, usually n minus the number of covariates). Since the non-null eigenvalues of the residualizing matrix are equal to one, the resulting pseudo-residuals are independent and homoscedastic (i.e., spherical). In particular, Kherad-Pajouh and Renaud (2010) suggest the use of a permutation approach, while Solari et al. (2014) extend it to the more general framework of rotation matrices.

Unfortunately, within the semiparametric regression framework, the residualizing matrix \(\Lambda \) is not a projection matrix and is not idempotent; therefore, its eigenvalues do not take values in \(\{0, 1\}\). A multiplication by these eigenvalues would act as a scaling factor for the residuals, making them independent, but not homoscedastic. For this reason, defining \(\Pi \) as a permutation or a rotation matrix would not be a valid solution. Defining instead \(\Pi \) as a sign-flipping matrix ensures the commutative property \(\Pi D=D \Pi ,\) as highlighted in Remark 2. This property is indeed crucial, as it guarantees that the variance of the test statistic is constant over \(\Pi ,\) as proved in Theorem 4.1. It is also worth emphasizing that the test guarantees only asymptotic exactness, since the penalization induces a bias in the estimation of the mean (a bias that vanishes with increasing n), while, for fixed n, the variance remains constant for all the test statistics defining the null distribution. On the contrary, the standard parametric Wald test needs asymptotic results for both the mean and the variance, such as those obtained in Sect. 3.1. Similar considerations can be drawn for a naive sign-flip score test that does not decompose the matrix \(\Lambda \). In this case, for finite samples, the sign-flipped test statistics would not have variance equal to that of the observed test statistic. This would lead to performance comparable to that of the parametric Wald test, as shown in Ferraccioli (2020). This difference between the eigen sign-flip test and its competitors is crucial in providing an adequate control of the Type I error, as shown by the simulations in Sect. 5.
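A tiny numerical check makes the role of the commutation property of Remark 2 concrete: diagonal sign-flip matrices commute with the diagonal matrix D, whereas generic permutation matrices do not (the values below are arbitrary illustrations).

```r
# Tiny illustration of Remark 2: sign-flips commute with D, permutations do not.
D       <- diag(c(0.2, 0.5, 1.0))         # distinct eigenvalues of Lambda
Pi_flip <- diag(c(1, -1, 1))              # a sign-flip matrix
Pi_perm <- diag(3)[c(2, 1, 3), ]          # a permutation matrix
isTRUE(all.equal(Pi_flip %*% D, D %*% Pi_flip))   # TRUE
isTRUE(all.equal(Pi_perm %*% D, D %*% Pi_perm))   # FALSE
```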

4.3 Interval hypotheses, two-sided tests and confidence intervals

So far we have defined the eigen sign-flip test for point-wise null hypotheses. The most common situation in practice is nonetheless to define interval null hypotheses. As for standard approaches, we need to prove that the p-value computed under any point-wise null hypothesis \(H_0: \beta = \beta _0-\epsilon \) (\(\forall \ \epsilon >0\)) also has Type I error probability bounded by \(\alpha \). For convenience, let us define the test statistic as a function of the tested coefficient, that is, \(T_{\Pi }(\beta _0) = n^{-1/2}X^{\top }V\Pi V^{\top } \Lambda ({\textbf{y}} - X\beta _0) = n^{-1/2}X^{\top }V\Pi D V^{\top } ({\textbf{y}} - X\beta _0)\). We now give two results for interval hypotheses and two-sided tests, and a third result on confidence intervals.

Corollary 4.2.1

(Interval hypotheses) Consider the hypotheses \(H_0^{'}:\ \beta =\beta _0-\epsilon \) and \(H_1^{'}:\ \beta >\beta _0-\epsilon \), with \(\epsilon >0\). Then for every \(\epsilon > 0\), \({\mathbb {P}}(T_{\Pi }(\beta _0-\epsilon ) \ge T_{I}(\beta _0-\epsilon ))\le {\mathbb {P}}(T_{\Pi }(\beta _0) \ge T_{I}(\beta _0))\). The same is true for the opposite hypothesis: that is, \(\epsilon <0\) and \(H_1':\ \beta <\beta _0+\epsilon \).

Proof

We have

$$\begin{aligned} {\mathbb {P}}(T_{\Pi }(\beta _0-\epsilon ) \ge T_{I}(\beta _0-\epsilon ))&= {\mathbb {P}}(T_{\Pi }(\beta _0) + n^{-1/2}X^{\top }V\Pi D V^{\top } X{\epsilon } \ge T_{I}(\beta _0) + n^{-1/2}X^{\top }V D V^{\top } X\epsilon )\\&= {\mathbb {P}}(T_{\Pi }(\beta _0) - n^{-1/2}X^{\top }V(I-\Pi ) D V^{\top } X{\epsilon } \ge T_{I}(\beta _0) )\\&\le {\mathbb {P}}(T_{\Pi }(\beta _0) \ge T_{I}(\beta _0)). \end{aligned}$$

Note that the last inequality holds since

$$\begin{aligned} n^{-1/2}X^{\top }V(I-\Pi ) D V^{\top } X\epsilon = n^{-1/2}\mathbb {1}_{n}^{\top }\tilde{{\textbf{X}}}(I-\Pi )\tilde{{\textbf{X}}} \mathbb {1}_{n}\epsilon \ge 0 \quad \forall \Pi . \end{aligned}$$

Note that \((I-\Pi )\) is a diagonal matrix with non-negative diagonal entries, thus it is positive semi-definite for all \(\Pi \). Therefore, any quadratic form of it is nonnegative. \(\square \)

Corollary 4.2.2

(Two-sided test) Consider \(\alpha _{1}, \alpha _{2} \in (0,1]\) such that \(\alpha _{1}+\alpha _{2}<1\). Then, under \(H_{0}: \beta =\beta _0\), as \(n \rightarrow \infty \),

$$\begin{aligned} {\mathbb {P}}\left[ (T_{I}^{n} < T_{[\alpha _{1}]}^{n}) \cup (T_{I}^{n} > T_{[1-\alpha _{2}]}^{n})\right] \rightarrow \alpha _{1} + \alpha _{2}. \end{aligned}$$

That is, the eigen sign-flip test asymptotically controls the Type I error when testing \(H_0:\ \beta = \beta _0\) against the two-sided alternative \(H_1:\ \beta \ne \beta _0\).

Proof

Theorem 4.2 proves that \({\mathbb {P}}\left[ T_{I}^{n} < T_{[\alpha _{1}]}^{n} \right] \rightarrow \alpha _{1}\) and \({\mathbb {P}}\left[ T_{I}^{n} > T_{[1-\alpha _{2}]}^{n}\right] \rightarrow \alpha _{2}\). This, together with the fact that \({\mathbb {P}}\left[ (T_{I}^{n} < T_{[\alpha _{1}]}^{n}) \cap (T_{I}^{n} > T_{[1-\alpha _{2}]}^{n})\right] \rightarrow 0\), proves the corollary. \(\square \)

As a consequence of the two corollaries above, we can also derive confidence intervals for the parameter \(\beta \).

Corollary 4.2.3

(Confidence Interval) Let \(\alpha \in (0,1]\). Then the set

$$\begin{aligned} \Theta (\beta ) = \{\beta : {\mathbb {P}}(T_{\Pi }(\beta ) \ge (\le )\ T_{I}(\beta ))\ge \alpha \} \end{aligned}$$

is a one-sided confidence interval for parameter \(\beta \) with asymptotic coverage \(1-\alpha \). Let also \(\alpha _{1}, \alpha _{2} \in (0,1]\) such that \(\alpha _{1}+\alpha _{2}<1\). Similarly, the set

$$\begin{aligned} \Theta (\beta ) = \{\beta : \left( {\mathbb {P}}(T_{\Pi }(\beta ) \ge T_{I}(\beta ))\ge \alpha _1\right) \cap \left( {\mathbb {P}}(T_{\Pi }(\beta ) \le T_{I}(\beta ))\ge \alpha _{2}\right) \} \end{aligned}$$

is a two-sided confidence interval with asymptotic coverage \(1 - (\alpha _{1} + \alpha _{2})\).

Proof

The proof follows directly from Corollaries 4.2.1 and 4.2.2. \(\square \)
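Operationally, these confidence intervals can be obtained by test inversion over a grid of candidate values of \(\beta \). The following sketch assumes the illustrative eigen_sign_flip_test function given after Definition 1; reusing the same set of sign-flips for all grid points would additionally remove Monte Carlo jitter at the interval endpoints.

```r
# Sketch: two-sided confidence interval by inversion of the eigen sign-flip test.
ci_sign_flip <- function(y, x, Lambda, grid, alpha = 0.05, M = 1000) {
  pvals <- vapply(grid, function(b0)
    eigen_sign_flip_test(y, x, Lambda, beta0 = b0, M = M), numeric(1))
  range(grid[pvals >= alpha])   # smallest and largest non-rejected values
}
```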

4.4 Testing a subset of the covariates

We now deal with the case where we have multiple covariates, and we are interested in testing a subset of the covariates. Specifically, assume \(X\in {\mathbb {R}}^{n\times q}\) represents the set of covariates of interest, with associated vector of coefficients \(\varvec{\beta },\) and \(Z\in {\mathbb {R}}^{n\times p}\) the set of covariates associated with the vector of nuisance coefficients \(\varvec{\zeta }\). The minimization problem in (2) then becomes

$$\begin{aligned} \underset{\varvec{\beta }, \varvec{\zeta }, f}{\text {argmin}}\sum _{i = 1}^{n}(y_{i} - f({\textbf{p}}_{i}) - {\textbf{x}}_{i}^{\top }\varvec{\beta } - {\textbf{z}}_{i}^{\top }\varvec{\zeta })^{2} + \lambda _{n}{\mathcal {P}}(f). \end{aligned}$$
(16)

We might be interested in testing

$$\begin{aligned} \text {H}_{0}: \varvec{\beta } = \varvec{\beta }_{0} \qquad \text {versus} \qquad \text {H}_{1}: \varvec{\beta } \ne \varvec{\beta }_{0}, \end{aligned}$$

for any value of \(\varvec{\zeta }\) and \(\varvec{\gamma }\). Let us define \(\Psi ^{*} = [Z|\Psi ]\) as the \(n\times (p + K)\) matrix composed of the covariates associated with the nuisance parameters and the bases for the nonparametric part of the model, with coefficients \(\varvec{\theta } = (\varvec{\zeta }, \varvec{\gamma })\). We can then rewrite equation (5) as

$$\begin{aligned} (\Psi ^{*\top }\Psi ^{*} + \lambda P^{*})\varvec{\theta } - \Psi ^{*\top }({\textbf{y}} - X\varvec{\beta }) = 0, \quad P^{*} = \begin{bmatrix} {\mathbb {O}}_{p \times p} &{} {\mathbb {O}}_{p \times K} \\ {\mathbb {O}}_{K \times p} &{} P_{K \times K} \end{bmatrix}, \end{aligned}$$

where \({\mathbb {O}}\) is a matrix of zeros.

Definition 1 of the eigen sign-flip test remains valid also in this case, with the only modification that, in the matrix \(\Lambda \) in (6), \(\Psi \) is replaced by \(\Psi ^{*}\) and P by \(P^{*}\). Moreover, the following corollary provides the extension of Theorem 4.2 to the case where \(\varvec{\beta }\) is a vector.

Corollary 4.2.4

Consider the test that rejects \(\text {H}_{0}: \varvec{\beta } = \varvec{\beta }_{0}\) if and only if \(\varphi (T_{I}^{n}) > \varphi (T_{[1-\alpha ]}^{n})\), where \(\varphi (\cdot )\) is any nonparametric combining function (Section 6.2 of Pesarin 2001). Then, under the null hypothesis, the test is asymptotically exact and the rejection probability \({\mathbb {P}}(\varphi (T_{I}^{n}) > \varphi (T_{[1-\alpha ]}^{n}))\) is at most \(\alpha \).

Proof

In order to extend the proof of Theorem 4.2 to the multivariate framework, we need to rely on the Nonparametric Combination of dependent test statistics, as defined, e.g., in Section 6.2 of Pesarin (2001). First of all, recall that the test statistic T is itself a vector. Moreover, Theorem 3.2 proves the asymptotic multivariate normality of T, and Theorem 4.1 shows that the sign-flipped vectors of test statistics \(T_{I}^n, \ldots , T_{\Pi _w}^n\) share the same distribution. Therefore, the matrix \({\textbf{T}}=(T_{I}^n, \ldots , T_{\Pi _w}^n)^{\top }\) is equal in distribution to \(\Pi \circ {\textbf{T}}\); each of its rows is a sample from the multivariate test statistic T. We can therefore use any nonparametric combining function (Pesarin 2001) to obtain a p-value. \(\square \)

Among the most commonly used nonparametric combining functions, defined, e.g., in Pesarin (2001), are the max-T, sum-T or Mahalanobis distance. As an illustrative example, a p-value based on the \(\min \)-p combining function (Westfall and Young 1993) rejects the multivariate null hypothesis if the maximum value of T is larger than the \(1-\alpha \) quantile of the distribution of the maxima computed over the w elements of \((T_{I}^n, \ldots , T_{\Pi _w}^n)\).
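As an illustration, a max-T combination takes a few lines of R; here Tmat is assumed to be an (M+1)-by-q matrix whose first row contains the observed statistics \(T_I\) and whose remaining rows contain the corresponding sign-flipped statistics (names are illustrative).

```r
# Sketch of a max-T nonparametric combination for a q-dimensional test.
maxT_pvalue <- function(Tmat) {
  m <- apply(abs(Tmat), 1, max)   # combine components by maximum absolute value
  mean(m >= m[1])                 # relative rank of the observed combined statistic
}
```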

5 Simulation studies

In this section we present two simulation studies, to investigate the finite sample performance of the proposed test. Simulation 1, in Sect. 5.1, considers a semiparametric model based on classical univariate splines (as, for instance, in Heckman 1986; Wand and Ormerod 2008). Simulation 2, in Sect. 5.2, considers instead a semiparametric model based on SR-PDE (Sangalli et al. 2013). In these different settings, we compare the performance of three different tests:

  • Wald: a classical Wald-type test based on the asymptotic distribution of \(\hat{\varvec{\beta }}\);

  • Speck: a similar Wald-type test based on the asymptotic distribution of the Speckman version of the estimator (Speckman 1988), as derived in Holland (2017);

  • ESF: the Eigen sign-flip score test introduced in Definition 1.

The results show the performance of the tests over 1000 simulation repetitions.

5.1 Simulation 1

In Simulation 1, we simulate from model (1), with \(\Omega =[0,1]\) and \(p_1,\ldots ,p_n\) randomly sampled from a uniform distribution on \(\Omega ,\) with \(n=200.\) For the nonparametric component of the model, we consider the test function 1 from the function gamSim in the R package mgcv (Wood 2015, 2017), defined as \(0.2p^{11}(10(1 - p))^6 + 10(10p)^3(1 - p)^{10}.\) We consider \(q=1\) covariate, and we generate \(x_1,\ldots ,x_n\) according to four different stochastic processes:

  (a) an i.i.d. random sample from \({\mathcal {N}}(0, 0.1^2)\);

  (b) a Gaussian random field on [0, 1] with mean zero and scale 0.01;

  (c) the function \(\sqrt{p + 2}\) on [0, 1], plus an i.i.d. random sample from \({\mathcal {N}}(0, 0.1^2);\)

  (d) the function \(\sqrt{p + 2}\) on [0, 1], plus a Gaussian random field with mean zero and scale 0.01.

The covariates and the true f are standardized, before computing the response variable y, so that their relative contributions to the response are comparable. We consider both \({\beta }_0=0\) and 10 other values of \({\beta }_0,\) from 0.01 to 0.1, to check both the Type I error and the power of the test. Finally, we add i.i.d. normal random errors \(\epsilon _1,\ldots ,\epsilon _n,\) with zero mean and standard deviation 0.1. For each test case, the generation of the covariates and noise is repeated 1000 times; a sketch of this data-generating process is given below.
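For concreteness, the data generation for scenario (a) can be sketched as follows; the test function is the one quoted above from mgcv, and the specific value of \(\beta \) is an arbitrary point of the grid described in the text.

```r
# Sketch of the Simulation 1 data generation, covariate scenario (a).
set.seed(1)
n <- 200; beta <- 0.05; sigma <- 0.1
p <- runif(n)                                        # locations on Omega = [0, 1]
f <- 0.2 * p^11 * (10 * (1 - p))^6 + 10 * (10 * p)^3 * (1 - p)^10
x <- rnorm(n, sd = 0.1)                              # scenario (a)
x <- drop(scale(x)); f <- drop(scale(f))             # standardize, as in the text
y <- x * beta + f + rnorm(n, sd = sigma)
```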

The model is estimated using cubic B-spline bases, with 200 equispaced internal nodes on \(\Omega \), using the implementation in Wand and Ormerod (2008). The smoothing parameter is chosen via cross-validation. The tests are performed with nominal value 0.05. For the proposed eigen sign-flip test, we consider 1000 random sign-flips.

Table 1 Simulation 1

Fig. 1 Simulation 1. Power of the Wald test (green dotted), of its Speckman variant (cyan dashed) and of the proposed eigen sign-flip test (red solid)

Table 1 shows the control of the Type I error, and Fig. 1 shows the power functions for the three competing tests. The table and figure immediately highlight that the most challenging scenarios are cases (b) and (d), where the covariates have been generated with a dependence structure, sampling from a Gaussian process. The classic parametric test (Wald) shows an extremely poor control of the Type I error in these two cases, with an observed proportion of Type I errors of over \(26\%\), when the nominal value of the test is \(5\%\). This behavior is possibly due to the poor estimation of the variance induced by the regularized estimates. The Speckman variant appears more robust, partly correcting for the misspecified variance. Nonetheless, this test is significantly underconservative in cases (b) and (d), with a proportion of Type I errors of almost 9%, while it is over-conservative in cases (a) and (c), where it returns a proportion of Type I errors of about 2.3%. The proposed eigen sign-flip score test, on the contrary, maintains an extremely good control of the Type I error under all scenarios, and it is never underconservative. Also in the challenging cases (b) and (d), at the cost of a slight loss of power, it manages to keep a proportion of Type I errors very close to (and just slightly below) the nominal value of the test.

We also considered the case of multiple covariates, following the simulation scheme detailed above, but including simultaneously all four covariates (a), (b), (c) and (d) in the data generation, and testing one parameter at a time, considering the other parameters as nuisance, as detailed in Sect. 4.4. The same considerations as those detailed for the simulation in Fig. 1 can be drawn (results not included for the sake of space).

5.2 Simulation 2

In Simulation 2, we simulate from model (1), with \(\Omega =[0,1]\times [0,1]\) and \({\textbf{p}}_1,\ldots ,{\textbf{p}}_n\) randomly sampled from a uniform distribution on \(\Omega ,\) with \(n=225.\) For the nonparametric component of the model, we consider the test function 2 from the function gamSim in the R package mgcv (Wood 2015, 2017), defined as

$$\begin{aligned} 0.4\pi ^{0.3}&\left( 1.2\exp \left( -\frac{(p_{1} - 0.2)^2}{0.3^2} - \frac{(p_{2} - 0.3)^2}{0.4^2}\right) \right. \\&\quad \left. + 0.8\exp \left( -\frac{(p_{1} - 0.7)^2}{0.3^2} - \frac{(p_{2} - 0.8)^2}{0.4^2}\right) \right) . \end{aligned}$$

We consider \(q=1\) covariate, and we generate \(x_1,\ldots ,x_n\) according to four different stochastic processes:

  (a) a Gaussian random field with zero mean and scale 0.05;

  (b) a Matérn random field with \(\nu = 1\), \(\sigma = 2\) and scale 0.1;

  (c) the function \(\cos (5(p_{1} + p_{2})) + (2p_{1} - p_{1}p_{2}^2)^2\), plus a Gaussian random field with scale 0.05;

  (d) the function \(\cos (5(p_{1} + p_{2})) + (2p_{1} - p_{1}p_{2}^2)^2\), plus a Matérn random field with \(\nu = 1\), \(\sigma = 2\) and scale 0.1.

The covariates and the true f are standardized, before computing the response variable y, so that their relative contributions to the response are comparable. We consider both \({\beta }_0=0\) and 10 other values of \({\beta }_0,\) from 0.01 to 0.1, to check both the Type I error and the power of the test. Finally, we add i.i.d. normal random errors \(\epsilon _1,\ldots ,\epsilon _n,\) with zero mean and standard deviation 0.1. For each test case, the generation of the covariates and noise is repeated 1000 times.

The model is estimated using SR-PDE, with linear finite elements on a mesh having 225 nodes on a regular lattice over \(\Omega ,\) implemented in the R package fdaPDE. The smoothing parameter is chosen via cross-validation. The tests are performed with nominal value 0.05. For the proposed eigen sign-flip test, we consider 1000 random sign-flips.

Table 2 Simulation 2

Fig. 2 Simulation 2. Power of the Wald test (green dotted), of its Speckman variant (cyan dashed) and of the proposed eigen sign-flip test (red solid)

The results are presented in Table 2 and Fig. 2. The classic parametric test (Wald) has poor performance and very poor control of the Type I error in all the scenarios, with proportions of Type I errors of about 15% and higher. The Speckman variant is always more robust than the Wald test, but it is often severely underconservative, with observed proportions of Type I errors of about 10%. The proposed eigen sign-flip test, on the contrary, at the cost of some power, permits an extremely good control of the Type I error, even in the more challenging scenarios, where the covariate has a strong spatial structure.

6 Study of human development in Nigeria

In this section we apply the proposed methodology to the analysis of human development in Nigeria. In particular, we are interested in better understanding the difference in socioeconomic and health conditions in the various states of the country. Unfortunately, data at the national and subnational level are often poor or not publicly available. This lack of information and of public domain surveys hampers the efforts to identify and develop targeted interventions in troubled areas (Jerven 2013). An alternative to traditional data consists of using other sources of openly accessible data, such as data from social media, mobile phone networks, or satellites. In particular, a popular recent approach leverages satellite images of luminosity at night to estimate economic activity (Chen and Nordhaus 2011; Jean et al. 2016). These images highlight urban areas, which typically offer better provisions of basic services such as electricity, water and public health, as well as more job opportunities, with respect to rural areas.

Here we use open satellite data (NASA Worldview Snapshots), together with demographic data, to predict human development. Specifically, as a response variable, we consider the Human Development Index (HDI) (available at https://globaldatalab.org/shdi), an aggregated index that takes into account multiple dimensions at the household and individual level in health, education and standard of living. This index is available at the state level, for the 36 states of Nigeria and for the Federal Capital Territory. The values of this index are shown in panel e of Fig. 3. As covariates, in the parametric part of the model, we use the population density, \({\textbf{x}}_{Pop}\), of each state (data from the National Bureau of Statistics, Nigeria), shown in panel d of Fig. 3, and the three satellite images shown in the top panels of the same figure, which are

  • Nightlight luminosity, \({\textbf{x}}_{Night}\), obtained via the VIIRS Nighttime Imagery, that captures low-light emission sources, under varying illumination conditions (panel a);

  • Short-Wave Infrared, \({\textbf{x}}_{SWIR}\), that highlights bare soils, such as deserts (panel b);

  • Near Infrared, \({\textbf{x}}_{NIR}\), that highlights vegetation (panel c).

We are interested in identifying significant effects of these covariates on human development, considering the model

$$\begin{aligned} {\textbf{y}} = {\textbf{f}} + {\textbf{x}}_{Pop}\beta _{Pop} + {\textbf{x}}_{Night}\beta _{Night} + {\textbf{x}}_{SWIR}\beta _{SWIR} + {\textbf{x}}_{NIR}\beta _{NIR} + \mathbf {\epsilon }. \end{aligned}$$

Since the HDI, the response variable, and one of the covariates, the population density, are available at the state level, we also aggregate the other three covariates at the state level, considering their areal means. We then apply SR-PDE, considering the data located at the capitals of each state. We use a mesh with 320 nodes and select the smoothing parameter through generalized cross-validation (\(\lambda _n = 0.1\)). We hence perform significance tests on each covariate, one at a time, considering the other parameters as nuisance, as described in Sect. 4.4, using the eigen sign-flip procedure with 5000 random sign-flips.

Fig. 3 Panels a–d show the covariates used: nightlight luminosity (a), short-wave infrared (b) and near infrared (c) satellite images, and population density (d). Panel e shows the observed HDI. Panel f shows the HDI predicted by the SR-PDE model. Imagery from the Worldview Snapshots application (https://wvs.earthdata.nasa.gov)

Nightlight is significant (\(p < 0.005\)), with a positive impact on human development (the estimated coefficient is 0.29). The finding on nightlight is in line with other recent research studies (Chen and Nordhaus 2011; Jean et al. 2016). The presence of urban areas, in fact, plays a huge role in the overall wealth of the population. This of course does not imply a causal effect, since increased wealth itself has an impact on the development of urban areas. Nightlight is nonetheless a good indicator of the socioeconomic status at the local level, which does not require any official statistics, as previously discussed. Short-wave infrared appears borderline significant (\(0.05< p < 0.1\)), with a negative impact on human development (the estimated coefficient is \(-0.016\)). This result might suggest that the presence of desert areas with large amounts of bare soil leads to a decrease in human development. The more advanced states are indeed close to the ocean, in the southern part of the country, while the northern part, which is mostly desert, is sparsely populated. It is also worth noting that the aggregation at the state level averages out localized features, such as the presence of rivers, lakes or small vegetation, possibly discarding important information. The third satellite covariate, near infrared, does not appear significant (\(p > 0.1\)). The same applies to population density (\(p > 0.1\)). This is possibly due to the fact that its distribution is highly skewed, with most of the population residing in the state of Lagos, in the southwest of the country (see panel d in Fig. 3). Panel f in Fig. 3 shows the predicted HDI values, highlighting the very high explanatory power of the model.

7 Discussion

This paper describes a strongly innovative and highly promising inferential approach in the context of semiparametric regression with roughness penalties. The paper focuses on tests for the linear part of the models; similar ideas can nonetheless be used to develop tests and confidence bands for the nonlinear part of the models. Moreover, the described approach could be extended to deal with semiparametric regression with spatiotemporal components [see, e.g., Ugarte et al. (2009, 2010); Aguilera-Morillo et al. (2017); Marra et al. (2012); Augustin et al. (2013); Bernardi et al. (2018)], further broadening the spectrum of potential models that could benefit from our proposal. These developments will be the object of dedicated future studies. We are confident that this inferential approach will become popular and will prove highly valuable in the varied contexts where semiparametric regression is used.