1 Introduction

Single index models are widely used in statistics because they balance the interpretability of the index coefficients in the parametric part with the flexibility of regression modeling in the nonparametric part (see ch. 8 of Li and Racine, 2007, for a review). Many estimation methods have been proposed for single index models, such as the semiparametric least squares estimator (Härdle et al., 1993; Ichimura, 1993), the M-estimator (Klein and Spady, 1993), and the average derivative estimator (Powell et al., 1989). Although these estimation methods have desirable theoretical properties under certain regularity conditions, they typically require some nonparametric smoothing method to evaluate the unknown link function. Such methods involve tuning parameters, such as bandwidths and series length parameters, whose optimal choice poses substantial theoretical and practical problems.

The monotone single index model, in which monotonicity is imposed on the link function, has been studied intensively in recent years. Balabdaoui et al. (2016) showed that the least squares estimator of a monotone single index model generally converges at the cube root rate, but its asymptotic distribution is still unknown. The main difficulty in deriving the asymptotic distribution of the least squares estimator arises from the non-differentiability of the objective function: in a monotone single index model, the link function, which is an infinite-dimensional nuisance parameter, is generally estimated by a nonparametric approach such as isotonic regression, while the index part is parametrically modeled as a linear combination of the covariates. The derivative of the objective function with respect to the index coefficients is then intractable due to the non-smoothness of the estimated nuisance parameter.

To overcome this issue, Groeneboom and Hendrickx (2018) developed a score-type estimator for the current status model, which is a special case of monotone single index models. Their approach is based on an estimating equation that coincides with the first-order condition of the least squares estimator except that it ignores the derivative of the estimated link function. They proved \(\sqrt{n}\)-consistency and asymptotic normality of their estimator without any tuning parameter. Their result was extended to general monotone single index models by Balabdaoui et al. (2019), who derived \(\sqrt{n}\)-consistency and asymptotic normality for the parametric component and an \(n^{1/3}/\log n\) convergence rate for the nonparametric estimator of the link function.

Although the score estimation approach is remarkable, its main drawback is that it requires smoothing parameters to estimate the asymptotic variance for hypothesis testing and interval estimation. Because the estimating function in the score-type approach depends on the estimated link function, a conditional expectation appears in the asymptotic variance. Moreover, the derivative of the link function also enters the asymptotic variance even though the estimated link function is not smooth. Therefore, smoothing methods, such as kernel smoothing, are employed to estimate these quantities, which requires the selection of multiple smoothing parameters and makes statistical inference cumbersome.

To address this problem, we propose an empirical likelihood inference method based on the score-type approach for monotone index models. We show that the empirical likelihood statistic based on the estimating equation of Balabdaoui et al. (2019) converges in distribution to a weighted chi-squared distribution. Even in our empirical likelihood approach, the conditional expectation mentioned above appears in the asymptotic distribution. To circumvent the selection of smoothing parameters, we adapt the bootstrap calibration method proposed by Hjort et al. (2009) to our context. Because the estimating equation involves the plugged-in estimated nuisance parameter, a classical naive bootstrap method is not asymptotically valid. Hjort et al. (2009) provided a modified bootstrap method based on recentering and reweighting to deal with such a situation. Combining the empirical likelihood and modified bootstrap methods, our approach provides a simple and theoretically justified method for statistical inference in monotone single index models.

The remainder of this paper is organized as follows. Section 2 presents our basic setup, methodology, and theoretical results. Section 3 reports a small simulation study that illustrates the proposed method. All proofs are contained in the appendix.

2 Main result

We closely follow the setup and notation of Balabdaoui et al. (2019) (hereafter BGH). Consider the monotone index model

$$\begin{aligned} Y=\psi _{0}(X^{\prime }\alpha _{0})+\epsilon ,\qquad E[\epsilon |X]=0, \end{aligned}$$
(1)

where Y is a scalar response variable, X is a d-dimensional vector of covariates, \(\epsilon \) is an error term, \(\alpha _{0}\) is a d-dimensional vector of parameters, and \(\psi _{0}:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is an unknown monotone increasing function. For identification, we assume that \(\alpha _{0}\) belongs to the unit sphere \({\mathcal {S}}_{d-1}=\{\alpha \in {\mathbb {R}}^{d}:||\alpha ||=1\}\) in \({\mathbb {R}}^{d}\). We are interested in conducting statistical inference (i.e., interval estimation and hypothesis testing) on \(\alpha _{0}\) based on the empirical likelihood approach.

Let \({\mathbb {S}}:{\mathbb {R}}^{d-1}\rightarrow {\mathcal {S}}_{d-1}\) be a parameterization such that for each \(\alpha \) in a neighborhood of \(\alpha _{0}\) on \({\mathcal {S}}_{d-1}\), there exists a unique \(\beta \in {\mathbb {R}}^{d-1}\) satisfying \(\alpha ={\mathbb {S}}(\beta )\). To motivate the score-type approach of BGH, we tentatively assume that \(\psi _{0}\) is known. The population score equation for the least squares estimation of \(\beta _{0}\) is

$$\begin{aligned} E\left[ {\mathbb {J}}(\beta _{0})^{\prime }X\psi _{0}^{(1)}(X^{\prime }{\mathbb {S}}(\beta _{0}))\{Y-\psi _{0}(X^{\prime }{\mathbb {S}}(\beta _{0}))\}\right] =0, \end{aligned}$$
(2)

where \(\psi _{0}^{(1)}\) is the derivative of \(\psi _{0}\) and \({\mathbb {J}}(\beta )\) is the Jacobian of \({\mathbb {S}}(\beta )\). Thus, it is natural to construct an estimator of \(\beta _{0}\) by taking an empirical counterpart of (2) and inserting estimators for \(\psi _{0}^{(1)}\) and \(\psi _{0}\). However, when we estimate \(\psi _{0}\) by the isotonic regression method, the resulting estimator of \(\psi _{0}\) is typically discontinuous, and it is not clear how to evaluate the derivative \(\psi _{0}^{(1)}\) without introducing smoothing parameters. To address this issue, BGH and Groeneboom and Hendrickx (2018) considered the modified population score equation

$$\begin{aligned} E\left[ {\mathbb {J}}(\beta _{0})^{\prime }X\{Y-\psi _{0}(X^{\prime }{\mathbb {S}}(\beta _{0}))\}\right] =0. \end{aligned}$$
(3)

In particular, for point estimation of \(\alpha _{0}\), BGH proposed to solve the following score-type equation:

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}{\mathbb {J}}({\hat{\beta }})^{\prime }X_{i}\{Y_{i}-{\hat{\psi }}_{{\hat{\beta }}}(X_{i}^{\prime }{\mathbb {S}}({\hat{\beta }}))\}=0, \end{aligned}$$
(4)

for \({\hat{\beta }}\), and estimate \(\alpha _{0}\) by \({\hat{\alpha }}={\mathbb {S}}({\hat{\beta }})\), where, for given \(\beta \), \({\hat{\psi }}_{\beta }\) is obtained by the isotonic regression

$$\begin{aligned} {\hat{\psi }}_{\beta }=\arg \min _{\psi \in {\mathcal {M}}}\sum _{i=1}^{n}\{Y_{i}-\psi (X_{i}^{\prime }{\mathbb {S}}(\beta ))\}^{2}, \end{aligned}$$
(5)

and \({\mathcal {M}}\) is the set of monotone increasing functions defined on \({\mathbb {R}}\).
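To fix ideas, the following Python sketch implements the per-observation moment function \({\hat{g}}_{i}(\beta )\) and the sample score in (4). It assumes, purely for illustration, the lower-hemisphere parameterization \({\mathbb {S}}(\beta )=(\beta ^{\prime },\sqrt{1-\Vert \beta \Vert ^{2}})^{\prime }\) (valid when the last coordinate of \(\alpha _{0}\) is positive); the helper names are ours, not BGH's.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def S(beta):
    # Illustrative lower-hemisphere parameterization of the unit sphere;
    # assumes the d-th coordinate of alpha_0 is positive.
    return np.append(beta, np.sqrt(1.0 - beta @ beta))

def jacobian_S(beta):
    # Jacobian of S at beta: a d x (d-1) matrix.
    return np.vstack([np.eye(len(beta)), -beta / np.sqrt(1.0 - beta @ beta)])

def g_hat(beta, X, Y):
    # Rows are g_i(beta); the isotonic fit solves the least squares problem (5).
    index = X @ S(beta)
    psi_hat = IsotonicRegression(increasing=True).fit(index, Y).predict(index)
    return (X * (Y - psi_hat)[:, None]) @ jacobian_S(beta)

def score(beta, X, Y):
    # Sample analogue of the score-type equation (4).
    return g_hat(beta, X, Y).mean(axis=0)
```

Because \({\hat{\psi }}_{\beta }\) is a step function, the score is piecewise constant in \(\beta \); in practice one therefore solves (4) approximately, for example by minimizing the Euclidean norm of `score` over \(\beta \).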

In this paper, we employ the score-type equation (3) as the moment condition and propose the following empirical likelihood statistic:

$$\begin{aligned} \ell (\beta _{0})=-2\max _{\{p_{i}\}_{i=1}^{n}}\sum _{i=1}^{n}\log (np_{i})\qquad \text {s.t. }\sum _{i=1}^{n}p_{i}=1,\quad \sum _{i=1}^{n}p_{i}{\hat{g}}_{i}(\beta _{0})=0, \end{aligned}$$
(6)

where

$$\begin{aligned} {\hat{g}}_{i}(\beta )={\mathbb {J}}(\beta )^{\prime }X_{i}\{Y_{i}-{\hat{\psi }}_{\beta }(X_{i}^{\prime }{\mathbb {S}}(\beta ))\}. \end{aligned}$$

By a standard Lagrange multiplier argument, its dual form is obtained as

$$\begin{aligned} \ell (\beta _{0})=2\sum _{i=1}^{n}\log (1+{\hat{\lambda }}^{\prime }{\hat{g}}_{i}(\beta _{0})), \end{aligned}$$
(7)

where the Lagrange multiplier \({\hat{\lambda }}\) solves

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\frac{{\hat{g}}_{i}(\beta _{0})}{1+{\hat{\lambda }}^{\prime }{\hat{g}}_{i}(\beta _{0})}=0. \end{aligned}$$
(8)
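Computationally, (7)-(8) amount to maximizing the concave function \(\lambda \mapsto \sum _{i=1}^{n}\log (1+\lambda ^{\prime }{\hat{g}}_{i}(\beta _{0}))\), for which (8) is the first-order condition. A minimal sketch of this step, using a damped Newton iteration (the implementation details are our own, not part of the formal results), is:

```python
import numpy as np

def el_statistic(g, max_iter=100, tol=1e-10):
    # g: n x (d-1) matrix with rows g_i(beta_0); returns ell(beta_0) via (7).
    n, k = g.shape
    lam = np.zeros(k)
    for _ in range(max_iter):
        w = 1.0 + g @ lam                       # 1 + lam'g_i, kept positive below
        grad = (g / w[:, None]).sum(axis=0)     # n times the left side of (8)
        if np.linalg.norm(grad) < tol:
            break
        hess = -(g / w[:, None] ** 2).T @ g     # Hessian of the concave dual
        step = np.linalg.solve(hess, -grad)     # Newton direction
        t = 1.0
        while np.any(1.0 + g @ (lam + t * step) <= 0.0):
            t *= 0.5                            # halve to stay in the domain
        lam += t * step
    return 2.0 * np.sum(np.log(1.0 + g @ lam))
```

Combined with the `g_hat` helper above, \(\ell (\beta _{0})\) is computed as `el_statistic(g_hat(beta0, X, Y))`.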

In practice, we use the dual representation in (7) to implement statistical inference. To study the asymptotic properties of the empirical likelihood statistic \(\ell (\beta _{0})\), we impose the following assumptions. Let \(\left\| \cdot \right\| \) be the Euclidean norm and \({\mathcal {B}}(a_{0},A)=\{a:\left\| a-a_{0}\right\| \le A\}\) be a ball around \(a_{0}\) of radius A.

Assumption

A1. \(\{Y_{i},X_{i}\}_{i=1}^{n}\) is an iid sample generated by (1). The support \({\mathcal {X}}\) of X is convex with a nonempty interior, and \({\mathcal {X}}\subset {\mathcal {B}}(0,R)\) for some \(R>0\). The Lebesgue density of X has a bounded derivative on \({\mathcal {X}}\). There exist positive constants c and C such that \(E[|Y|^{m}|X=x]\le cm!C^{m-2}\) for all integers \(m\ge 2\) and almost every \(x\in {\mathcal {X}}\).

A2. \(\psi _{0}\) is monotone increasing and there exists \(K_{0}>0\) such that \(|\psi _{0}(u)|\le K_{0}\) for all \(u\in \{x^{\prime }\alpha _{0}:x\in {\mathcal {X}}\}\).

These assumptions are adaptations of Assumptions A1-A6 in BGH. Compared to BGH, our assumptions are simpler because we do not need to control the behavior of the score function outside the true parameter \(\alpha _{0}={\mathbb {S}}(\beta _{0})\). Assumption A1 concerns the distribution of the data. The support condition in A1 may be relaxed by assuming X to follow a sub-Gaussian distribution. The moment condition in A1, which is analogous to BGH's A6, is required to guarantee \(\max _{1\le i\le n}|Y_{i}|=O_{p}(\log n)\), which is used to control the entropy of a class of score functions. Assumption A2 concerns the true link function \(\psi _{0}\). Compared to BGH, who consider point estimation, we only need to impose boundedness, which is a mild requirement.

Under these assumptions, our main result is presented as follows.

Theorem 1

Under Assumptions A1-A2, it holds that

$$\begin{aligned} \ell (\beta _{0})\overset{d}{\rightarrow }Z^{\prime }V^{-1}Z, \end{aligned}$$

where \(Z\sim N(0,\Sigma )\) with \(\Sigma ={\mathbb {J}}(\beta _{0})^{\prime }E[\epsilon ^{2}(X-E[X|X^{\prime }{\mathbb {S}}(\beta _{0})])(X-E[X|X^{\prime }{\mathbb {S}}(\beta _{0})])^{\prime }]{\mathbb {J}}(\beta _{0})\) and \(V={\mathbb {J}}(\beta _{0})^{\prime }E[\epsilon ^{2}XX^{\prime }]{\mathbb {J}}(\beta _{0})\).

Remark 1

This theorem says that the empirical likelihood statistic \(\ell (\beta _{0})\) is not asymptotically pivotal and converges to a weighted chi-squared distribution \(w_{1}\chi _{1,1}^{2}+\cdots +w_{d-1}\chi _{1,d-1}^{2}\), where \(w_{1},\ldots ,w_{d-1}\) are the eigenvalues of \(V^{-1}\Sigma \) and \(\chi _{1,1}^{2},\ldots ,\chi _{1,d-1}^{2}\) are independent \(\chi _{1}^{2}\) random variables. This lack of asymptotic pivotalness is caused by the mismatch between the asymptotic variance \(\Sigma \) of the score function \(\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{g}}_{i}(\beta _{0})\) and the limit V of the sample variance \({\hat{V}}=\frac{1}{n}\sum _{i=1}^{n}{\hat{g}}_{i}(\beta _{0}){\hat{g}}_{i}(\beta _{0})^{\prime }\). In the empirical likelihood literature, weighted chi-squared limiting distributions often emerge when the score (or moment) functions involve estimated nuisance parameters (e.g., Qin and Jing, 2001; Xue and Zhu, 2006; Hjort et al., 2009).
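Given consistent estimates of \(\Sigma \) and V (for instance, those discussed in Remark 2 below), the quantiles of this limit are straightforward to simulate. A short sketch (the function name and defaults are our own):

```python
import numpy as np

def weighted_chi2_quantile(Sigma_hat, V_hat, level=0.95, draws=100_000, seed=0):
    # Monte Carlo (1 - a) quantile of w_1*chi2_1 + ... + w_{d-1}*chi2_1,
    # where the weights are the eigenvalues of V^{-1} Sigma (Remark 1).
    w = np.linalg.eigvals(np.linalg.solve(V_hat, Sigma_hat)).real
    rng = np.random.default_rng(seed)
    sims = rng.chisquare(df=1, size=(draws, len(w))) @ w
    return np.quantile(sims, level)
```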

Remark 2

One way to conduct statistical inference based on \(\ell (\beta _{0})\) is to estimate the critical values of \(w_{1}\chi _{1,1}^{2}+\cdots +w_{d-1}\chi _{1,d-1}^{2}\) using estimators of \(\Sigma \) and V. By (13), V is consistently estimated by \({\hat{V}}\). On the other hand, \(\Sigma \) can be estimated by

$$\begin{aligned} {\hat{\Sigma }}={\mathbb {J}}(\beta _{0})^{\prime }\frac{1}{n}\sum _{i=1}^{n}{\hat{\epsilon }}_{i}^{2}\{X_{i}-{\hat{m}}(X_{i}^{\prime }{\mathbb {S}}(\beta _{0}))\}\{X_{i}-{\hat{m}}(X_{i}^{\prime }{\mathbb {S}}(\beta _{0}))\}^{\prime }{\mathbb {J}}(\beta _{0}), \end{aligned}$$

where \({\hat{\epsilon }}_{i}=Y_{i}-{\hat{\psi }}_{\beta _{0}}(X_{i}^{\prime }{\mathbb {S}}(\beta _{0}))\) and \({\hat{m}}(\cdot )\) is a nonparametric estimator of \(m(\cdot )=E[X|X^{\prime }{\mathbb {S}}(\beta _{0})=\cdot ]\). An alternative way to conduct statistical inference is to adjust the empirical likelihood statistic \(\ell (\beta _{0})\) so as to recover asymptotic pivotalness. Based on Rao and Scott (1981) (see also Xue and Zhu, 2006), the above theorem implies

$$\begin{aligned} \ell _{A}(\beta _{0})=\frac{d-1}{\textrm{trace}({\hat{V}}^{-1}{\hat{\Sigma }})}\ell (\beta _{0})\overset{d}{\rightarrow }\chi _{d-1}^{2}. \end{aligned}$$
(9)

Then a confidence region for \(\alpha _{0}={\mathbb {S}}(\beta _{0})\) can be obtained as \(\{{\mathbb {S}}(\beta ):\ell _{A}(\beta )\le q_{a}\}\), where \(q_{a}\) is the \((1-a)\)-th quantile of the \(\chi _{d-1}^{2}\) distribution.
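As an illustration, the following sketch implements the adjusted statistic (9) with a Gaussian-kernel Nadaraya-Watson estimate of \(m(\cdot )\). It reuses `S`, `jacobian_S`, `g_hat`, and `el_statistic` from the earlier sketches, and the bandwidth `h` must be supplied by the user, which is precisely the tuning burden discussed in Remark 3.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.isotonic import IsotonicRegression

def adjusted_el_test(beta0, X, Y, h, level=0.95):
    # Adjusted EL test of Remark 2: returns (ell_A, reject flag).
    n, d = X.shape
    J = jacobian_S(beta0)
    u = X @ S(beta0)
    resid = Y - IsotonicRegression(increasing=True).fit(u, Y).predict(u)
    # Nadaraya-Watson estimate of m(u_i) = E[X | X'S(beta0) = u_i].
    K = np.exp(-0.5 * ((u[:, None] - u[None, :]) / h) ** 2)
    m_hat = (K @ X) / K.sum(axis=1, keepdims=True)
    Xc = X - m_hat
    e2 = resid ** 2
    Sigma_hat = J.T @ ((Xc * e2[:, None]).T @ Xc / n) @ J
    V_hat = J.T @ ((X * e2[:, None]).T @ X / n) @ J   # sample variance of g_i
    ell = el_statistic(g_hat(beta0, X, Y))
    ell_A = (d - 1) * ell / np.trace(np.linalg.solve(V_hat, Sigma_hat))
    return ell_A, ell_A > chi2.ppf(level, d - 1)
```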

Remark 3

A drawback of the asymptotic inference method presented in the previous remark is that it requires the selection of a tuning parameter to implement the nonparametric estimator \({\hat{m}}(\cdot )\). In order to obtain an inference procedure that is free from tuning parameters, we adapt the bootstrap method of Hjort et al. (2009) as follows (a code sketch of these steps is given after the list).

(1) Based on the original sample \(\{Y_{i},X_{i}\}_{i=1}^{n}\), compute \({\hat{\beta }}\) as in (4), and then compute

$$\begin{aligned} M_{n}({\hat{\beta }})=\frac{1}{n}\sum _{i=1}^{n}{\hat{g}}_{i}({\hat{\beta }}),\qquad {\bar{V}}=\frac{1}{n}\sum _{i=1}^{n}{\hat{g}}_{i}({\hat{\beta }}){\hat{g}}_{i}({\hat{\beta }})^{\prime }. \end{aligned}$$

(2) Draw \(\{Y_{i}^{*},X_{i}^{*}\}_{i=1}^{n}\) from the original sample \(\{Y_{i},X_{i}\}_{i=1}^{n}\) with equal weights. Then compute

$$\begin{aligned} M_{n}^{*}({\hat{\beta }})=\frac{1}{n}\sum _{i=1}^{n}{\mathbb {J}}({\hat{\beta }})^{\prime }X_{i}^{*}\{Y_{i}^{*}-{\hat{\psi }}_{{\hat{\beta }}}^{*}(X_{i}^{*\prime }{\mathbb {S}}({\hat{\beta }}))\}, \end{aligned}$$

where \({\hat{\psi }}_{{\hat{\beta }}}^{*}=\arg \min _{\psi \in {\mathcal {M}}}\sum _{i=1}^{n}\{Y_{i}^{*}-\psi (X_{i}^{*\prime }{\mathbb {S}}({\hat{\beta }}))\}^{2}\).

(3) The bootstrap counterpart of \(\ell (\beta _{0})\) is given by

$$\begin{aligned} \ell ^{*}=n\{M_{n}^{*}({\hat{\beta }})-M_{n}({\hat{\beta }})\}^{\prime }{\bar{V}}^{-1}\{M_{n}^{*}({\hat{\beta }})-M_{n}({\hat{\beta }})\}. \end{aligned}$$
(10)
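A compact sketch of this calibration, reusing `g_hat` from above (the resampling loop is our illustration of steps (1)-(3), not BGH's code):

```python
import numpy as np

def bootstrap_critical_value(beta_hat, X, Y, B=499, level=0.95, seed=0):
    # Simulate the bootstrap law of ell* in (10) and return its (1 - a)
    # quantile, which calibrates the EL statistic ell(beta_0).
    n = len(Y)
    g = g_hat(beta_hat, X, Y)                    # step (1)
    M_n = g.mean(axis=0)
    V_bar_inv = np.linalg.inv(g.T @ g / n)
    rng = np.random.default_rng(seed)
    ell_star = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)         # step (2): resample pairs
        M_star = g_hat(beta_hat, X[idx], Y[idx]).mean(axis=0)
        diff = M_star - M_n                      # step (3): recentering
        ell_star[b] = n * diff @ V_bar_inv @ diff
    return np.quantile(ell_star, level)
```

The null hypothesis is then rejected when \(\ell (\beta _{0})\) exceeds this bootstrap critical value.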

Under the additional Assumptions A3-A5 in the appendix, the validity of this bootstrap approximation is established as follows.

Theorem 2

Under Assumptions A1-A5, it holds that

$$\begin{aligned} \sup _{t\ge 0}|P^{*}\{\ell ^{*}\le t\}-P_{0}\{\ell (\beta _{0})\le t\}|\overset{p}{\rightarrow }0, \end{aligned}$$

where \(P^{*}\) is the bootstrap distribution conditional on the data.

3 Simulation

We conduct a simulation study to investigate the finite sample performance of the proposed inference methods. We consider the following data generating process:

$$\begin{aligned} Y&=\psi _{0}(X^{\prime }\alpha _{0})+\epsilon ,\qquad \psi _{0}(u)=u^{3},\qquad \alpha _{0}=(1,1,1)^{\prime }/\sqrt{3},\\ \epsilon&\sim N(0,1),\qquad X\sim N(0,I_{3}), \end{aligned}$$

where \(I_{3}\) is the \(3\times 3\) identity matrix. We consider sample sizes \(n=100,500,1000\). The number of Monte Carlo replications is 1000. We consider the two testing methods discussed in Remarks 2 and 3. For the adjusted statistic in (9), we estimate \(m(\cdot )=E[X|X^{\prime }{\mathbb {S}}(\beta _{0})=\cdot ]\) by the Nadaraya–Watson estimator and choose the bandwidths by expected Kullback–Leibler cross-validation (Hurvich et al., 1998). To test the null hypothesis \(H_{0}:\alpha _{0}=(1,1,1)^{\prime }/\sqrt{3}\), we calculate the test statistic in (9) and compare it with the 95th percentile of the \(\chi _{d-1}^{2}\) distribution. For the bootstrap-calibrated test based on (10), we compute \({\hat{\beta }}\) as in BGH (the computer code is available at Groeneboom's website), generate 499 bootstrap samples, and calculate the bootstrap counterpart \(\ell ^{*}\) in (10).
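For reference, a minimal sketch of this design, reusing the helpers from Section 2 (under the illustrative lower-hemisphere parameterization, the true \(\beta _{0}\) consists of the first two coordinates of \(\alpha _{0}\)):

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 500, 3
alpha0 = np.ones(d) / np.sqrt(d)                 # (1,1,1)'/sqrt(3)
X = rng.standard_normal((n, d))                  # X ~ N(0, I_3)
Y = (X @ alpha0) ** 3 + rng.standard_normal(n)   # psi_0(u) = u^3, eps ~ N(0,1)

beta0 = alpha0[:d - 1]                           # S(beta0) = alpha0 here
ell = el_statistic(g_hat(beta0, X, Y))           # EL statistic under the null
```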

Table 1 presents the rejection frequencies of the above empirical likelihood tests for the null \(H_{0}:\alpha _{0}=(1,1,1)^{\prime }/\sqrt{3}\) when the true values of \(\alpha _{0}\) are (N) \(\alpha _{0}=(1,1,1)^{\prime }/\sqrt{3}\), (A1) \(\alpha _{0}=(1.03,1,1)^{\prime }/\sqrt{1.03^{2}+2}\), (A2) \(\alpha _{0}=(1.05,1,1)^{\prime }/\sqrt{1.05^{2}+2}\), and (A3) \(\alpha _{0}=(1.10,1,1)^{\prime }/\sqrt{1.10^{2}+2}\). Design (N) evaluates size properties, and (A1)-(A3) evaluate power properties.

The column “\({\hat{\alpha }}_{1}\)” reports the Monte Carlo averages and standard deviations of the first element of the BGH estimator \({\hat{\alpha }}\). The mean is close to the truth, \(\alpha _{01}=1/\sqrt{3}\simeq 0.577\), and the standard deviation decreases with the sample size. The columns for (N) show that both the adjusted and bootstrap empirical likelihood tests have reasonable size properties. Both tests become more powerful as the sample size increases and as the true value of \(\alpha _{0}\) moves further from the null value (i.e., from (A1) to (A3)). We also find that, overall, the bootstrap test rejects slightly more often than the adjusted test.

Overall, our simulation results are encouraging.

Table 1 Rejection frequencies (%)