1 Introduction

Binary multivariate regression models are used, for example, to analyze longitudinal data. Such data arise in clinical studies that evaluate the effect of interventions over time, where information is collected for each individual at several assessment times. To deal with incomplete data in a longitudinal setup, inverse probability weighted generalized estimating equations (WGEE) (Robins et al., 1994) are used. The resulting WGEE provide consistent estimators only if the underlying (binary) missing-data process is properly modeled, which should be verified in advance.

This paper addresses, in a more general context than the one just described, the question of how to test the model assumptions of a binary generalized linear regression model.

Mathematically, we describe the data with a sequence of independent and identically distributed (iid) random variables

$$\begin{aligned} (\delta _1, X_1), ..., (\delta _n, X_n), \end{aligned}$$

where \(\delta\) is a binary or \(0 - 1\) response variable and \(X \in {\mathbb {R}}^d\) a d-dimensional input with continuous distribution function (df) H. For the binary regression model,

$$\begin{aligned} {\hat{m}} : {\mathbb {R}}^d \ni x \rightarrow {\hat{m}}(x) = {\mathbb {E}}(\delta |X = x) \equiv {\mathbb {P}}(\delta = 1|X = x) \in [0, 1] \end{aligned}$$

denotes the conditional expectation of \(\delta\) given \(X = x\). Under the generalized linear model (GLM), one assumes that there exists a link function g, that is, an invertible function with measurable inverse, such that

$$\begin{aligned} g({\mathbb {E}}(\delta |X=x))=\beta _0^{\top }x, \end{aligned}$$

for H-almost all \(x \in {\mathbb {R}}^d\) and an appropriate \(\beta _0 \in {\mathbb {R}}^d\). The function g is assumed to be known. Based on this, we set

$$\begin{aligned} m:{\mathbb {R}}\ni t \rightarrow m(t)=g^{-1}(t)\in [0,1]. \end{aligned}$$

Assuming that the data \((\delta ,X)\) comes from a GLM with link function g now means that \({\hat{m}} \in M:=\{m(\beta ^{\top }\cdot )| \beta \in {\mathbb {R}}^d \}\).

If one assumes a GLM to analyze a sample \((\delta _1, X_1),... , (\delta _n, X_n)\) of iid data, one has to guarantee that the linear part and the assumed link function are correct or, at least, that the data show no obvious departure from the model. Thus, we need a goodness-of-fit test to validate the model, i.e., we need a universal test to check the null hypothesis

$$\begin{aligned} H_0: {\hat{m}} \in M \text { versus } H_1: {\hat{m}} \notin M. \end{aligned}$$

A general approach for model checking in a regression setup was introduced by Stute (1997). Stute & Zhu (2002) specialized this approach to GLMs, where the response variable is not necessarily binary. In the binary GLM setup, the underlying probabilistic background is a functional limit result for the marked empirical process with estimated parameters:

$$\begin{aligned} R_n^1(t)=n^{-1/2}\sum _{i=1}^{n}(\delta _i-m(\beta _n^{\top }X_i))I(\beta _n^{\top }X_i\le t), \text { } t\in {\mathbb {R}}, \end{aligned}$$

where \(\beta _n\) is a proper estimator of \(\beta _0\) and I denotes the indicator function, see Stute (1997). With \(R_n^1(-\infty )=0\) and \(R_n^1(\infty )=n^{-1/2}\sum _{i=1}^{n}(\delta _i-m(\beta _n^{\top }X_i))\), this process is a random element in the Skorokhod space \(D([-\infty ,\infty ])\). Under appropriate conditions, \(R_n^1\) converges in distribution to a centered Gaussian process \(R_{\infty }^1\), which, however, has a rather complicated, model-dependent covariance structure, cf. Theorem 1 in Stute & Zhu (2002). To make this result usable for statistical applications, Stute and Zhu introduced a model-based transformation. Applying this transformation, or rather its estimated version, to \(R_n^1\) yields a composition that converges in distribution to a time-transformed Brownian motion, cf. Theorem 2 in Stute & Zhu (2002). This framework is then used to obtain asymptotically distribution-free statistics.
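For illustration, the process can be evaluated directly at its jump points \(\beta _n^{\top }X_i\). The following minimal sketch (Python with numpy; the function name and arguments are illustrative and not taken from the original implementation) assumes the responses, the covariates, a fitted \(\beta _n\), and the inverse link m are given.

```python
import numpy as np

def marked_empirical_process(X, delta, beta_n, m):
    """Evaluate R_n^1 at the jump points t = beta_n^T X_i (sorted)."""
    eta = X @ beta_n                      # beta_n^T X_i
    resid = delta - m(eta)                # delta_i - m(beta_n^T X_i)
    order = np.argsort(eta)
    # R_n^1 jumps at the ordered eta values; its value there is the scaled partial sum
    return eta[order], np.cumsum(resid[order]) / np.sqrt(len(delta))
```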

The approach works well but has two weak points. For the transformation, one needs an estimate of the conditional expectation of X given \(\beta _0^{\top } X = v\), \({\mathbb {E}}(X\,|\,\beta _0^{\top } X = v)\), for all \(v \in {\mathbb {R}}\). Under general conditions, this quantity must be estimated by a non-parametric procedure. Such a method requires a smoothing parameter, whose selection is not unproblematic, and since the model as a whole is parametric, the question inevitably arises whether this non-parametric step is really necessary. Moreover, a user who wants to check a chosen GLM with this method must implement the model-dependent transformation in each case. This is feasible, but it entails considerable effort, because the transformation is quite complex, especially for non-statisticians. Parts of this procedure could of course be automated and implemented as software, but it would still hardly be applicable without appropriate knowledge of the transformation. It would be desirable if all this could be avoided.

To estimate \(\beta _0\), we use the maximum likelihood estimator (MLE) given by

$$\begin{aligned} \beta _n=\arg \max \limits _{\beta \in {\mathbb {R}}^d} l_n(\beta ), \end{aligned}$$

where

$$\begin{aligned} l_n(\beta )=\frac{1}{n}\sum _{i=1}^n(\delta _i \ln (m(\beta ^{\top }X_i))+(1-\delta _i)\ln (1-m(\beta ^{\top }X_i))) \end{aligned}$$

is the normalized log-likelihood function.
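For a concrete link, \(\beta _n\) can be computed by numerically maximizing \(l_n\). A minimal sketch, assuming a logistic link and a generic numerical optimizer (this is only an illustration, not the estimation routine used in the paper):

```python
import numpy as np
from scipy.optimize import minimize

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_mle(X, delta, m=logistic):
    """Maximize the normalized log-likelihood l_n(beta) numerically."""
    eps = 1e-10  # guard against log(0)

    def neg_loglik(beta):
        p = np.clip(m(X @ beta), eps, 1 - eps)
        return -np.mean(delta * np.log(p) + (1 - delta) * np.log(1 - p))

    return minimize(neg_loglik, x0=np.zeros(X.shape[1]), method="BFGS").x
```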

For the bootstrap data, we propose the following model-based (MB) resampling scheme, similar to the resampling scheme in Dikta et al. (2006). The resulting model-based bootstrap (MBB) guarantees that the bootstrap data are always generated according to the null hypothesis.

Definition 1

Let \((\delta _1, X_1), ..., (\delta _n, X_n)\) be iid observations, where the \(\delta _i\) are binary and the \(X_i\) have a continuous distribution function H. Let \(\beta _n\) be the corresponding MLE. The model-based resampling scheme is then defined as follows:

  1.

    Set \(X_i^*=X_i\) for \(1\le i\le n\).

  2.

    Generate a sample \(\delta _1^*, ...,\delta _n^*\) of independent Bernoulli random variables where \(\delta _i^*\) has the probability of success given by \(m(\beta _n^{\top }X_i)\), for \(1\le i \le n\), where \(m(\beta ^{\top }x)={\mathbb {P}}_{\beta }(\delta =1|X=x)\).

Under this resampling scheme, only the \(\delta _i\) are resampled; the corresponding \(X_i\) are taken from the original sample.
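A minimal sketch of this scheme (numpy assumed; names are illustrative):

```python
import numpy as np

def mb_resample(X, beta_n, m, rng=np.random.default_rng()):
    """Model-based resampling: keep X_i and draw delta_i^* ~ Bernoulli(m(beta_n^T X_i))."""
    p = m(X @ beta_n)              # success probabilities under the fitted null model
    return rng.binomial(1, p)      # bootstrap responses delta^*; X^* = X
```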

To define \(R_n^{1*}\), the bootstrap analog of \(R_n^1\), we assume a bootstrap sample

$$\begin{aligned} (\delta _1^{*}, X_1^{*}), ..., (\delta _n^{*}, X_n^{*}) \end{aligned}$$

and set

$$\begin{aligned} R_n^{1*}(t)=n^{-1/2}\sum _{i=1}^n(\delta _i^{*}-m(\beta _n^{*\top }X_i^{*}))I(\beta _n^{\top }X_i^{*}\le t), \text { } t\in {\mathbb {R}}, \end{aligned}$$

where \(\beta _n^{*}\) is the MLE corresponding to the log-likelihood function based on the bootstrap sample. Usually, the \(\beta _n\) in the indicator is also replaced by \(\beta _n^{*}\). We do not replace it here, since both processes can be shown to be asymptotically equivalent. Moreover, simulations that use \(\beta _n\) instead of \(\beta _n^{*}\) run faster, since \(\beta _n\) is the same for every bootstrap sample.
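Given \(\delta ^{*}\), the original \(\beta _n\), and the bootstrap MLE \(\beta _n^{*}\), the bootstrap process can be evaluated analogously to \(R_n^1\); a sketch (illustrative names, numpy assumed):

```python
import numpy as np

def bootstrap_process(X, delta_star, beta_n, beta_n_star, m):
    """Evaluate R_n^{1*}: residuals use beta_n^*, the indicator keeps beta_n."""
    eta_n = X @ beta_n                         # ordering / indicator variable
    resid = delta_star - m(X @ beta_n_star)    # delta_i^* - m(beta_n^{*T} X_i)
    order = np.argsort(eta_n)
    return eta_n[order], np.cumsum(resid[order]) / np.sqrt(len(delta_star))
```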

We will prove that the cumulative residual process \(R_n^{1*}(t)\) corresponding to the MB bootstrap data behaves asymptotically like \(R_n^1\) if the original data satisfy the null hypothesis. Thus, the distribution of any statistic that depends continuously on \(R_n^1\) can be approximated by the corresponding distribution based on \(R_n^{1*}\). This provides the basic asymptotic justification of our method. In addition, an approximation based on \(R_n^{1*}\) has the advantage that it reflects the actual sample size. Even if the original data come from the alternative, the bootstrap data are always generated under the null hypothesis. Thus, a statistic based on \(R_n^{1*}\) fits the null hypothesis. This is crucial because p-values are based on the distribution under the null hypothesis. Overall, this should lead to a more accurate approximation of the p-values under finite sample sizes compared to the purely asymptotic one and, hence, to an improvement of the power. Indeed, we observe some improvement in the simulation study.

As in Stute & Zhu (2002), we consider the Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) test statistics \(D_n\) and \(W_n\) based on \(R_n^1\), given by

$$\begin{aligned} D_n=\sup \limits _{t\in {\mathbb {R}}}|R_n^1(t)| \end{aligned}$$

and

$$\begin{aligned} W_n=\int {(R_n^1(t))^2H_n(dt)}. \end{aligned}$$

Here \(H_n\) is the empirical distribution function (edf) of the \(\beta _n^{\top }X\) sample. Since, under \(H_0\), \(R^1_n\rightarrow R^1_{\infty }\) in distribution, as \(n\rightarrow \infty\), the continuous mapping theorem implies that

$$\begin{aligned} D_n\longrightarrow D_{\infty }\equiv \sup \limits _{t\in {\mathbb {R}}}|R_{\infty }^1(t)|, \end{aligned}$$

and \(W_n\rightarrow W_{\infty }\) in distribution, as \(n\rightarrow \infty\).

If, under \(H_0\), the process \(R_n^{1*}\) tends in distribution to the same limiting process \(R_{\infty }^{1}\) as \(R_n^{1}\), the p-values corresponding to the KS and CvM tests can be approximated by the typical Monte-Carlo approach (used in bootstrap applications) based on the distribution of

$$\begin{aligned} D_n^{*}=\sup \limits _{t\in {\mathbb {R}}}|R_n^{1 *}(t)| \end{aligned}$$

and

$$\begin{aligned} W_n^{*}=\int {(R_n^{1 *}(t))^2H_n^{*}(dt)}, \end{aligned}$$

where \(H_n^{*}\) denotes the edf based on the \(\beta _n^{T}X^{*}\) sample.
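The Monte-Carlo approximation can be sketched as follows (Python with numpy; fit_mle stands for any routine returning the MLE of \(\beta\) for the assumed GLM, e.g. as sketched above; the code is illustrative only):

```python
import numpy as np

def bootstrap_p_values(X, delta, m, fit_mle, n_boot=200,
                       rng=np.random.default_rng()):
    """Approximate the KS and CvM p-values by the MB bootstrap."""
    n = len(delta)
    beta_n = fit_mle(X, delta)
    eta = X @ beta_n
    order = np.argsort(eta)

    def ks_cvm(resid_sorted):
        process = np.cumsum(resid_sorted) / np.sqrt(n)
        return np.max(np.abs(process)), np.mean(process ** 2)  # D_n, W_n

    D_n, W_n = ks_cvm((delta - m(eta))[order])

    D_star, W_star = np.empty(n_boot), np.empty(n_boot)
    for b in range(n_boot):
        delta_star = rng.binomial(1, m(eta))         # MB resampling under H_0
        beta_star = fit_mle(X, delta_star)           # bootstrap MLE beta_n^*
        resid = delta_star - m(X @ beta_star)
        D_star[b], W_star[b] = ks_cvm(resid[order])  # beta_n kept in the indicator
    return np.mean(D_star >= D_n), np.mean(W_star >= W_n)
```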

This article is organized as follows: In Section 2 we state the main results, which guarantee that the MBB can be used to test our null hypothesis. In Section 3 the approach is applied in a simulation study and a real data application. There, our approach is also compared to the approach by Stute & Zhu (2002). The results of Section 3 are discussed in Section 4. The proofs of our main results are provided in Section 5. Additionally, some results used in Section 5 are presented in the Appendix.

2 Main results

In this section, our main result is given in Theorem 2.

To prove Theorem 2, we first show that in the space \(D[-\infty , \infty ]\), the process \(R_n^{*}(u)=n^{-1/2}\sum _{i=1}^n\left( \delta _i^{*}-m(\beta _n^TX_i)\right) I(\beta _n^TX_i\le u) \rightarrow R_\infty\) in distribution, where \(R_\infty\) is a centered Gaussian process, see Theorem 1. This process is similar to \(R_n^1(u)\), but \(\delta\) is replaced with \(\delta ^{*}\). Theorem 1 is a stepping stone for proving Theorem 2, in which we also replace \(\beta _n\) with \(\beta _n^{*}\). To prove both theorems, we show that the finite-dimensional distributions (fidis) of both processes converge and that the processes are tight, see Theorem 13.5 of Billingsley (1999). Lemma 1 (iii) provides a result which is required to prove the convergence of the fidis of the process \(R_n^{*}\). Lemma 1 (i) and Lemma 1 (ii) are required to prove Lemma 1 (iii).

Since we finally replace \(\beta _n\) with \(\beta _n^{*}\) in Theorem 2, we need to ensure that \(\beta _n^{*}\) converges to \(\beta _n\), which is done in Lemma 2. The proof of Theorem 2 uses a decomposition of the process \(R_n^{1*}(u)\) into \(R_n^{*}(u)\) and a difference term. To simplify the representation, Lemma 3 is used. With the final decomposition we then prove the tightness and the convergence of the fidis of the process \(R_n^{1*}(u)\).

For Theorem 1 we need the following assumptions:

  1. (A1)

    \(\beta _n\rightarrow \beta _0\), as \(n\rightarrow \infty\), w.p. 1.

  2. (B1)

    Define

    $$\begin{aligned} H(u, \beta )=\int m(\beta ^TX){\bar{m}}(\beta ^TX)I(\beta ^TX\le u)d{\mathbb {P}}, \end{aligned}$$

where \({\bar{m}}=1-m\). H is uniformly continuous in u at \(\beta _0\).

  3. (C1)

    \(m(\beta ^Tx)\) is continuous in \(\beta ^Tx\).

  4. (D1)

\(m(\beta ^Tx)\) is continuously differentiable in \(\beta ^Tx\) with

    \(m'(\beta ^Tx)=\partial m(\beta ^Tx)/\partial (\beta ^Tx)\) and \(m'\) is bounded.

Assumptions (C1), (D1) and (B1) are similar to assumptions (B) and (C) in Stute & Zhu (2002), but specified to the binary setup. Furthermore, (A1) ensures the strong consistency of \(\beta _n\), i.e., \(\beta _n\rightarrow \beta _0\), as \(n\rightarrow \infty\), w.p. 1.

As mentioned before, the following Lemma is used to prove the convergence of the fidis of the process \(R_n^{*}(u)\), which is defined in Theorem 1.

Lemma 1

(i) If assumption (D1) is fulfilled,

$$\begin{aligned} \begin{aligned} \sup \limits _{\beta \in {\mathbb {R}}^d, u\in {\mathbb {R}}}\left| \frac{1}{n}\right.&\sum _{i=1}^n I(\beta ^TX_i\le u)m(\beta ^TX_i){\bar{m}}(\beta ^TX_i)\\&\left. -{\mathbb {E}}\left( I(\beta ^TX\le u)m(\beta ^TX){\bar{m}}(\beta ^TX)\right) \right| \rightarrow 0, \end{aligned} \end{aligned}$$

as \(n\rightarrow \infty\), w.p. 1.

(ii) If assumptions (B1) and (C1) are fulfilled,

$$\begin{aligned} \begin{aligned} \sup \limits _{|\beta -\beta _0|\le \varepsilon , u\in {\mathbb {R}}}\left| {\mathbb {E}}\right.&\left( I(\beta ^TX\le u)m(\beta ^TX){\bar{m}}(\beta ^TX)\right) \\&\left. -{\mathbb {E}}\left( I(\beta _0^TX\le u)m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right) \right| \rightarrow 0, \end{aligned} \end{aligned}$$

as \(\varepsilon \rightarrow 0\).

(iii) If assumptions (A1), (B1), (C1) and (D1) are fulfilled and \({\mathbb {E}}(|X|)<\infty\), then

$$\begin{aligned} \begin{aligned} \sup \limits _{u\in {\mathbb {R}}}\left| \frac{1}{n}\right.&\sum _{i=1}^n I(\beta _n^TX_i\le u)m(\beta _n^TX_i){\bar{m}}(\beta _n^TX_i)\\&\left. -{\mathbb {E}}\left( I(\beta _0^TX\le u)m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right) \right| \rightarrow 0, \end{aligned} \end{aligned}$$

as \(n\rightarrow \infty\), w.p. 1.

Now, in the process \(R_n^1(u)\), we replace \(\delta\) with \(\delta ^{*}\), where \(\delta ^{*}\) is generated by using the MB scheme. As stated in the following Theorem, \(R_n^{*}(u)\) converges.

Theorem 1

Assume that \({\mathbb {E}}(|X|)<\infty\), that assumptions (A1), (B1), (C1) and (D1) are satisfied, and that the MB resampling scheme is used to generate the bootstrap data. Then, w.p. 1, under the null hypothesis, the process

$$\begin{aligned} R_n^{*}(u)=n^{-1/2}\sum _{i=1}^n\left( \delta _i^{*}-m(\beta _n^TX_i)\right) I(\beta _n^TX_i\le u) \rightarrow R_\infty \end{aligned}$$

in distribution in the space \(D[-\infty , \infty ]\), where \(R_\infty\) is a centered Gaussian process with covariance function

$$\begin{aligned} K(s,t)={\mathbb {E}}\left( R_{\infty }(s)R_{\infty }(t)\right) =\int m(\beta _0^TX){\bar{m}}(\beta _0^TX)I(\beta _0^TX\le s\wedge t)d{\mathbb {P}}. \end{aligned}$$

After replacing \(\delta\) with \(\delta ^{*}\), we need to replace \(\beta _n\) with \(\beta _n^{*}\).

For this, we define

$$\begin{aligned} \begin{aligned} l(\beta ^TX,\delta ^{*})&=\frac{\partial }{\partial \beta }\left( \delta ^{*}\ln \left( m(\beta ^TX)\right) +\left( 1-\delta ^{*}\right) \ln (1-m(\beta ^TX))\right) \\&=\delta ^{*}\frac{w(X, \beta )}{m(\beta ^TX)}-\left( 1-\delta ^{*}\right) \frac{w(X, \beta )}{1-m(\beta ^TX)} \end{aligned} \end{aligned}$$

which is the derivative of the summands of the log-likelihood function, where \(w(x,\beta )=\partial m(\beta ^Tx)/\partial \beta =\left( w_1(x,\beta ), ..., w_{d}(x,\beta )\right) ^T\).

Check that \({\mathbb {E}}_n^{*}\left( l(\beta _n^TX, \delta ^{*})\right) =0\).
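Indeed, since the MB scheme gives \({\mathbb {E}}_n^{*}(\delta ^{*})=m(\beta _n^TX)\),

$$\begin{aligned} {\mathbb {E}}_n^{*}\left( l(\beta _n^TX, \delta ^{*})\right) =m(\beta _n^TX)\frac{w(X, \beta _n)}{m(\beta _n^TX)}-\left( 1-m(\beta _n^TX)\right) \frac{w(X, \beta _n)}{1-m(\beta _n^TX)}=0. \end{aligned}$$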

For the following Lemmas and Theorem 2 we need some additional assumptions:

  1. (A2)

    \(L(\beta _0)={\mathbb {E}}\left( l(\beta _0^TX,\delta )l^T(\beta _0^TX,\delta )\right)\) exists and is positive definite.

  2. (B2)

    \(n^{1/2}(\beta _n^{*}-\beta _n)=n^{-1/2}\sum _{i=1}^n l(\beta _n^TX_i,\delta ^{*}_{i})+o_{{\mathbb {P}}_n^{*}}(1)\), w.p. 1.

  3. (C2)

    \(L_n^{*}(\beta _n)=\frac{1}{n}\sum _{i=1}^n{\mathbb {E}}_n^{*}\left( l(\beta _n^TX_i,\delta ^{*}_{i})l^T(\beta _n^TX_i,\delta ^{*}_{i})\right) \rightarrow L(\beta _0)\), w.p. 1.

  4. (D2)

    For every \(x\in {\mathbb {R}}^d\), \(w(x,\beta )=\partial m(\beta ^Tx)/\partial \beta =\left( w_1(x,\beta ), ..., w_{d}(x,\beta )\right) ^T\) exists and is continuous with respect to \(\beta\) for every \(\beta\) in a neighborhood of \(\beta _0\) (not depending on x).

  5. (E2)

    There exists a square-integrable function M(x) such that for every x

    \(\max \left( \frac{w_i(x,\beta )}{m(\beta ^Tx)}, \frac{w_i(x,\beta )}{1-m(\beta ^Tx)}\right) \le M(x)\) for every \(\beta\) in a neighborhood of \(\beta _0\) and \(1\le i\le d\).

  6. (F2)

    The function

$$\begin{aligned} W: {\mathbb {R}}\times V_{\beta }\ni (u,\beta )\rightarrow W(u,\beta )={\mathbb {E}}\left( w(X,\beta _0)I(\beta ^TX\le u)\right) \in {\mathbb {R}}^{d} \end{aligned}$$

    is uniformly continuous in u at \(\beta _0\), where \(V_{\beta }=\{\beta :\beta \in V\}\) and V is given under (D2).

Assumptions (D2) and (E2) are again similar to assumption (B) in Stute & Zhu (2002), but specified to the binary setup. Furthermore, assumptions (A2) and (B2) are similar to assumption (A).

Lemma 2 is necessary to ensure that \(\beta _n^{*}\) converges to \(\beta _n\).

Lemma 2

Assume that assumptions (A1), (A2), (B2), (C2) and (E2) hold. Then, w.p. 1,

$$\begin{aligned} n^{1/2}(\beta _n^{*}-\beta _n)\rightarrow Z, ~as~n\rightarrow \infty , \end{aligned}$$

where the convergence is in distribution and Z follows a multivariate normal distribution with zero mean and covariance matrix \(L(\beta _0)\).

In addition, we need some results for \(w(x,\beta )\) and \(W(x,\beta )\).

Lemma 3

Let \({\hat{\beta }}_n^{*}: {\mathbb {R}}^d\rightarrow V\) be a measurable function such that \({\hat{\beta }}_n^{*}(x)\) lies in the line segment that connects \(\beta _n^{*}\) and \(\beta _n\) for each \(x\in {\mathbb {R}}^d\) and assume (A1), (A2), (D2), (E2) and (F2) hold. Then, w.p. 1, for \(1\le j\le d\),

  1. (i)

    \(\sup \limits _{u\in {\mathbb {R}}}\left| n^{-1}\sum _{i=1}^nw_j(X_i, \beta _0)I(\beta _n^TX_i\le u)-W_j(u,\beta _0)\right| \rightarrow 0,~as~n\rightarrow \infty ,\)

  2. (ii)

    \(\sup \limits _{u\in {\mathbb {R}}}\left| n^{-1}\sum _{i=1}^n\left( w_j\left( X_i,{\hat{\beta }}_n^{*}(X_i)\right) -w_j(X_i, \beta _0)\right) I(\beta _n^TX_i\le u)\right| =o_{{\mathbb {P}}_n^{*}}(1).\)

Finally, the process \(R_n^{1*}(u)\) converges in distribution.

Theorem 2

Assume that \({\mathbb {E}}(|X|)<\infty\), that assumptions (A1), (B1), (C1), (D1), (A2), (B2), (C2), (D2), (E2) and (F2) are satisfied, and that the MB resampling scheme is used to generate the bootstrap data. Then, w.p. 1, under the null hypothesis, the process

$$\begin{aligned} R_n^{1*}(u)=n^{-1/2}\sum _{i=1}^n\left( \delta _i^{*}-m(\beta _n^{*T}X_i)\right) I(\beta _n^TX_i\le u) \rightarrow R^1_\infty \end{aligned}$$

in distribution in the space \(D[-\infty , \infty ]\), where \(R^1_\infty\) is a centered Gaussian process with covariance function

$$\begin{aligned} \begin{aligned} {\hat{K}}(s,t)&=K(s,t)+W^T(s, \beta _0)L(\beta _0)W(t, \beta _0)\\&\quad -2W^T(s, \beta _0)W(t, \beta _0). \end{aligned} \end{aligned}$$

3 Simulations and real data application

3.1 Simulations

To put the results into perspective, the bootstrap approach is compared to the approach introduced by Stute & Zhu (2002). For the application of their method, we make use of an additional assumption to avoid the non-parametric estimation of \({\mathbb {E}}(X|\beta _0^TX=v)\). As stated in Stute & Zhu (2002), page 541, we assume that X belongs to a family of elliptically contoured distributions. Note that we do not need this assumption for our bootstrap approach. To calculate the p-values for the approach by Stute and Zhu, we use the Karhunen-Loève expansion of a Brownian motion (Bass, 2011, formula (6.2)) to approximate the distribution of the integrated squared Brownian motion over the unit interval.
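For reference, this approximation can be sketched as follows: the Karhunen-Loève expansion yields \(\int _0^1 B(t)^2\,dt \overset{d}{=}\sum _{k\ge 1} Z_k^2/((k-1/2)^2\pi ^2)\) with iid standard normal \(Z_k\), and truncating the series gives a simple Monte-Carlo approximation (Python with numpy; illustrative only, not the exact implementation used for the study).

```python
import numpy as np

def integrated_squared_bm(n_terms=1000, n_samples=100_000,
                          rng=np.random.default_rng()):
    """Sample int_0^1 B(t)^2 dt via the truncated Karhunen-Loeve expansion."""
    k = np.arange(1, n_terms + 1)
    weights = 1.0 / ((k - 0.5) ** 2 * np.pi ** 2)   # reciprocal eigenvalues
    Z = rng.standard_normal((n_samples, n_terms))   # iid N(0,1) coefficients
    return (Z ** 2) @ weights

# p-value of an observed CvM-type statistic w under the limiting law:
# p = np.mean(integrated_squared_bm() >= w)
```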

In all simulations, the empirical powers and ecdfs of the p-values based on the CvM statistic are calculated from 1000 replications. The sample sizes are set to \(n=50\) and \(n=100\). For the bootstrap approach, each p-value is based on 200 bootstrap samples. The ecdfs of the 1000 p-values per simulation and approach are displayed in a graph together with the uniform distribution function (red: bootstrap approach, blue: approach by Stute and Zhu, gray: uniform distribution function). In addition, the percentages of rejecting the null hypothesis (at levels \(\alpha =0.05\) and \(\alpha =0.01\)) are given explicitly.

In the first simulation, we generate uncorrelated \(X_i\) from a 3-dimensional normal distribution with mean vector 0 and unit variances. Based on a chosen \(\beta\) (\(\beta =(1,1,2)^T\)), we calculate the probability \(P(\delta =1|X = x)\), assuming a logistic regression model, and generate \(\delta\) accordingly. In our test, we assume that the generated data belong to a GLM with a logistic regression function where \(\beta\) is 3-dimensional, which is true. Table 1 shows the results. The two ecdfs of the p-values based on the CvM statistic are very similar to the distribution function of a uniform distribution. Thus, both tests hold the level.
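The data-generating step of this first scenario can be sketched as follows (numpy assumed; illustrative, not the exact simulation code):

```python
import numpy as np

def generate_null_data(n, beta=np.array([1.0, 1.0, 2.0]),
                       rng=np.random.default_rng()):
    """X ~ N(0, I_3), delta | X = x ~ Bernoulli(logistic(beta^T x))."""
    X = rng.standard_normal((n, 3))
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # logistic regression function
    delta = rng.binomial(1, p)
    return X, delta
```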

Table 1 \(H_0\) (logistic regression function, \(\beta\) is 3-dimensional) is true

In the second simulation, the data are generated in the same way as in the first simulation, but now the third covariate is squared. We assume that the data belong to a GLM with a logistic regression function where the third component is not squared, which is false. Table 2 shows that both approaches yield similar results. Furthermore, in both cases the power increases with the sample size.

Table 2 \(H_0\) (logistic regression function, \(\beta\) is 3-dimensional, third component not squared) is false

In the third simulation, we generate data using a nonparametric mixing of logistic regressions, see Agresti (2002), Section 13.2.2. The \(X_i\) are again generated from a 3-dimensional normal distribution with mean vector 0 and unit variances, and \(\beta =(1,1,2)^T\). Furthermore, a Bernoulli variable with \(p=0.2\) is generated. If this variable is 0, we add 1 to \(\beta ^T X\). Again we calculate the probability \(P(\delta =1|X = x)\) assuming a logistic regression model. In our test, we assume that the generated data belong to a GLM with a logistic regression function where \(\beta\) is 3-dimensional, which is false. Table 3 shows similar results as in the second simulation.

Table 3 \(H_0\) (logistic regression function, \(\beta\) is 3-dimensional) is false

In the last simulation, we generate the data in the same way as in the first simulation. This time, we assume a probit regression model where \(\beta\) is 3-dimensional, which is false. Table 4 shows that all ecdfs of the p-values based on the CvM statistic are very similar to the distribution function of a uniform distribution. Thus, neither test detects this departure from the null hypothesis.

Table 4 \(H_0\) (probit regression function, \(\beta\) is 3-dimensional) is false

3.2 Real data application

We applied the introduced test to the data set reported on by Härdle and Stoker (1989). This data set consists of 58 measurements on simulated side impact collisions. The fatality (a binary \(0-1\) random variable, where 1 means the crash resulted in a fatality) and three covariates (age of the driver, velocity of the automobile, maximal acceleration measured on the subject's abdomen) were recorded. Härdle and Stoker estimated \(\beta _0\), fitted m in a non-parametric way, and concluded that the link function is of "distribution type", i.e., non-decreasing in \(\beta _0^Tx\), as in the logit or probit case. They did not check whether a GLM would fit the data at all. We tested whether (after a standardization) a GLM with a logit or probit link function is appropriate for the data set. Based on the bootstrap approach, the p-value is 0.047 for a logit link function and 0.049 for a probit link function. Thus, in both cases, the model is rejected at the 5% level. Stute and Zhu (2002) also applied their approach to this data set and came to the same result.

4 Discussion

Our small simulation study indicates that the bootstrap approach has slightly better empirical power than the Stute & Zhu (2002) approach. This is noteworthy because the Stute and Zhu approach was conducted here under an additional assumption (elliptically contoured distributions) that is unnecessary for the bootstrap approach. If this additional assumption is not fulfilled, non-parametric regression estimation has to be applied in the Stute and Zhu procedure, which entails further problems (choice of a smoothing parameter) and can have negative effects on the power of the test. None of these problems arise for the bootstrap method.

The resampling procedure guarantees that the bootstrap data are always generated under the null hypothesis, regardless of whether the original data satisfy the null hypothesis or not. Consequently, the distribution of a test statistic based on the bootstrap data fits the null hypothesis. If the test statistic based on the original data lies in the tail of this bootstrap-based distribution, this indicates a violation of the null hypothesis. It is important to note that the bootstrap approximation also takes the actual sample size into account, which is not fully the case for the approximation by the asymptotic distribution. We assume that the slight improvement in empirical power is due to this. Singh (1981) proved for the classical bootstrap and the standardized mean that taking the sample size into account in the approximating distribution can be advantageous compared to an approximation by the limiting distribution. However, this is not studied further in our paper and should be addressed theoretically in future work.

The bootstrap method is easier to implement because it is not as technically demanding as the method of Stute and Zhu. However, it is more demanding in terms of computing time. The latter is always of great importance if the method is to be used on a large scale.

5 Proofs

Proof of Lemma 1

Define \({\mathcal {F}}=\{I(\beta ^T\cdot \le u)m(\beta ^T\cdot ){\bar{m}}(\beta ^T\cdot ), \beta \in {\mathbb {R}}^d, u \in {\mathbb {R}}\}\). Following Lemma 7, \({\mathcal {F}}\) is a Glivenko-Cantelli (GC) class. Thus (i) is true.

For (ii) check that

$$\begin{aligned}&\sup \limits _{|\beta -\beta _0|\le \varepsilon , u\in {\mathbb {R}}}\left| {\mathbb {E}}\left( I(\beta ^TX\le u)m(\beta ^TX){\bar{m}}(\beta ^TX)\right) -{\mathbb {E}}\left( I(\beta _0^TX\le u)m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right) \right| \\&\quad \le \sup \limits _{|\beta -\beta _0|\le \varepsilon , u\in {\mathbb {R}}}{\mathbb {E}}\left( \left| m(\beta ^TX){\bar{m}}(\beta ^TX)-m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right| \right) \\&\qquad +\sup \limits _{|\beta -\beta _0|\le \varepsilon , u\in {\mathbb {R}}}\left| {\mathbb {E}}\left( I(\beta ^TX\le u)m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right) -{\mathbb {E}}\left( I(\beta _0^TX\le u)m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right) \right| . \end{aligned}$$

Due to assumption (C1) the dominated convergence theorem yields that the first term converges to 0 as \(\varepsilon \rightarrow 0\).

Denote the second term by \(\sup \limits _{|\beta -\beta _0|\le \varepsilon , u\in {\mathbb {R}}}A(\beta , u)\), choose \(K>0\), and check that

$$\begin{aligned} \begin{aligned}&A(\beta , u)\\&\quad \le {\mathbb {E}}\left( m(\beta _0^TX){\bar{m}}(\beta _0^TX)\left| I(\beta ^TX\le u)-I(\beta _0^TX\le u)\right| I(|X|\le K)\right) \\&\qquad +{\mathbb {E}}\left( m(\beta _0^TX){\bar{m}}(\beta _0^TX)\left| I(\beta ^TX\le u)-I(\beta _0^TX\le u)\right| I(|X|>K)\right) \\&\quad = A_1(\beta , \beta _0, u, K)+A_2(\beta , \beta _0, u, K) \end{aligned} \end{aligned}$$

Select \(\gamma >0\) to get

$$\begin{aligned} \begin{aligned}&A_1(\beta , \beta _0,u, K)\\&\quad \le {\mathbb {E}}\left( m(\beta _0^TX){\bar{m}}(\beta _0^TX)\left| I(\beta ^TX\le u)-I(\beta _0^TX\le u)\right| \right. \\&\quad \left. \quad \cdot I(|X|\le K)I\left( |\beta ^TX-\beta _0^TX|\le \gamma \right) \right) \\&\qquad +I(|\beta -\beta _0|>\gamma /K){\mathbb {E}}\left( m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right) \\&\quad \le {\mathbb {E}}\left( m(\beta _0^TX){\bar{m}}(\beta _0^TX)\left| I(\beta _0^TX\le u+\gamma )-I(\beta _0^TX\le u)\right| \right) \\&\qquad +I(|\beta -\beta _0|>\gamma /K){\mathbb {E}}\left( m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right) \\&\quad =H(u+\gamma ,\beta _0)-H(u, \beta _0)\\&\qquad +I(|\beta -\beta _0|>\gamma /K){\mathbb {E}}\left( m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right) \\&\quad =A_{1,1}(\beta _0, u, \gamma )+A_{1,2}(\beta ,\beta _0,\gamma , K) \end{aligned} \end{aligned}$$

Fix \(\delta >0\). Since \(m{\bar{m}}\le 1\), we have \(A_2(\beta , \beta _0, u, K)\le {\mathbb {P}}(|X|>K)\), so we can find a \(K>0\) such that \(A_2(\beta , \beta _0, u, K)\le \delta\). Due to assumption (B1), \(H(\cdot , \beta _0)\) is uniformly continuous and therefore we can find a \(\gamma >0\) such that \(A_{1,1}<\delta\) uniformly in u. Furthermore, we can choose \(\varepsilon\) such that \(\varepsilon <\min (\gamma ,\gamma /K)\), which yields \(A_{1,2}(\beta ,\beta _0,\gamma , K)=0\) and, therefore, \(A(\beta , u)<2\delta\). This proves part (ii).

Since \(\beta _n\rightarrow \beta _0\), w.p. 1, (iii) follows directly from (i) and (ii). \(\square\)

Proof of Theorem 1

To prove the Theorem, we will use Theorem 13.5 of Billingsley (1999). We first show that the fidis of \(R_n^{*}\) converge in distribution to the fidis of \(R_{\infty }\). Obviously, \(R_n^{*}\) has independent zero-mean summands, since

$$\begin{aligned} \begin{aligned} {\mathbb {E}}_n^{*}&\left( I(\beta _n^TX_i^{*}\le u)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) \right) \\&={\mathbb {E}}_n^{*}\left( I(\beta _n^TX_i\le u)\left( \delta _i^{*}-m(\beta _n^TX_i)\right) \right) \\&=I(\beta _n^TX_i\le u)\left( {\mathbb {E}}_n^{*}(\delta _i^{*})-m(\beta _n^TX_i)\right) \\&=I(\beta _n^TX_i\le u)\left( m(\beta _n^TX_i)-m(\beta _n^TX_i)\right) \\&=0. \end{aligned} \end{aligned}$$

For the covariance of \(R_n^{*}\) we get for \(u_1, u_2 \in {\mathbb {R}}\)

$$\begin{aligned} \begin{aligned}&Cov_n^{*}(R_n^{*}(u_1), R_n^{*}(u_2))={\mathbb {E}}_n^{*}(R_n^{*}(u_1)R_n^{*}(u_2))\\&\quad ={\mathbb {E}}_n^{*}\left( \frac{1}{n}\sum _{i=1}^n I(\beta _n^TX_i\le u_1)\left( \delta _i^{*}-m(\beta _n^TX_i)\right) \right. \\&\quad \quad \left. \cdot \sum _{j=1}^n I(\beta _n^TX_j\le u_2)\left( \delta _j^{*}-m(\beta _n^TX_j)\right) \right) \\&\quad =\frac{1}{n}\sum _{1\le i,j\le n} I(\beta _n^TX_i\le u_1)I(\beta _n^TX_j\le u_2)\\&\quad \quad \cdot {\mathbb {E}}_n^{*}\left( \left( \delta _i^{*}-m(\beta _n^TX_i)\right) \left( \delta _j^{*}-m(\beta _n^TX_j)\right) \right) , \end{aligned} \end{aligned}$$

where \(\delta _i^{*}\) and \(\delta _j^{*}\) are independent. Thus, if \(i\ne j\), the expectation in the last equation is 0. Therefore, the last equation equals

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n I(\beta _n^TX_i\le u_1 \wedge u_2){\mathbb {E}}_n^{*}\left( (\delta _i^{*}-m(\beta _n^TX_i))^2\right) . \end{aligned}$$

Here, the expectation equals the variance of a Bernoulli distribution with success probability \(m(\beta _n^TX_i)\). Thus, for the last equation we get

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n I(\beta _n^TX_i\le u_1 \wedge u_2)m(\beta _n^TX_i){\bar{m}}(\beta _n^TX_i). \end{aligned}$$

Due to Lemma 1 (iii), this converges to \({\mathbb {E}}\left( I(\beta _0^TX\le u_1 \wedge u_2)m(\beta _0^TX){\bar{m}}(\beta _0^TX)\right)\) uniformly in u as \(n\rightarrow \infty\), w.p. 1. Thus, w.p. 1, the covariance function of the process \(R_n^{*}(u)\) converges to

$$\begin{aligned} K(u_s,u_t)=Cov\left( R_{\infty }(u_s), R_{\infty }(u_t)\right) =\int m(\beta _0^TX){\bar{m}}(\beta _0^TX)I(\beta _0^TX\le u_s\wedge u_t)\,d{\mathbb {P}}. \end{aligned}$$

Now let \(k\in {\mathbb {N}}\) and choose \(-\infty \le u_1<...<u_k\le \infty\). Following Cramér-Wold, see Theorem 7.7 of Billingsley (1999), we have to show that, w.p. 1, for every \(a\in {\mathbb {R}}^k\), \(a\ne 0\),

$$\begin{aligned} \sum _{j=1}^ka_jR_n^{*}(u_j)\rightarrow {\mathcal {N}}(0, a^T\Sigma a),~for~n\rightarrow \infty \end{aligned}$$

in distribution, with \(\Sigma =(\sigma _{s,t})_{1\le s,t\le k}\) and \(\sigma _{s,t}=Cov(R_{\infty }(u_s), R_{\infty }(u_t))={\mathbb {E}}(R_{\infty }(u_s)R_{\infty }(u_t))=K(u_s, u_t)\).

Set

$$\begin{aligned} \begin{aligned} Z_n^{*}&=\sum _{j=1}^ka_jR_n^{*}(u_j)=n^{-1/2}\sum _{i=1}^n\left( \left( \delta _i^{*}-m(\beta _n^TX_i)\right) \sum _{j=1}^ka_jI(\beta _n^TX_i\le u_j)\right) \\&=\sum _{i=1}^n\xi _{i,n}^{*}A_{i,n}, \end{aligned} \end{aligned}$$

where \(\xi _{i,n}^{*}=n^{-1/2}\left( \delta _i^{*}-m(\beta _n^TX_i)\right)\) and \(A_{i,n}=\sum _{j=1}^ka_jI(\beta _n^TX_i\le u_j)\). Here, \(\xi _{1,n}^{*},...,\xi _{n,n}^{*}\) are independent and centered, and \(A_{1,n}, ..., A_{n,n}\) are deterministic in the bootstrap setup. To show the asymptotic normality of \(Z_n^{*}\), we apply Theorem 1.9.3 of Serfling (1980) and prove that the Lindeberg condition,

$$\begin{aligned} \frac{1}{Var_n^{*}(Z_n^{*})}\sum _{i=1}^n\int (\xi _{i,n}^{*}A_{i,n})^2 I\left( |\xi _{i,n}^{*}A_{i,n}|>\varepsilon \sqrt{Var_n^{*}(Z_n^{*})}\right) d{\mathbb {P}}^{*}\rightarrow 0, \end{aligned}$$

as \(n\rightarrow \infty\), holds w.p. 1 for each \(\varepsilon >0\).

First, check that

$$\begin{aligned} \begin{aligned} Var_n^{*}(Z_n^{*})&=Var_n^{*}\left( \sum _{j=1}^ka_jR_n^{*}(u_j)\right) \\&=\sum _{1\le s,t\le k}a_sCov_n^{*}\left( R_n^{*}(u_s), R_n^{*}(u_t)\right) a_t\\&\rightarrow \sum _{1\le s,t\le k}a_sCov\left( R_{\infty }(u_s), R_{\infty }(u_t)\right) a_t\\&=\sum _{1\le s,t\le k}a_sK(u_s,u_t)a_t\\&=a^T\Sigma a, ~as~n\rightarrow \infty ,~w.p.~1. \end{aligned} \end{aligned}$$

Since \(\Sigma\) is positive semi-definite, \(a^T\Sigma a\ge 0\). If \(a^T\Sigma a=0\), Chebyshev's inequality guarantees that \(Z_n^{*}=o_{{\mathbb {P}}_n^{*}}(1)\) and thus, as \(n\rightarrow \infty\),

$$\begin{aligned} Z_n^{*}\rightarrow {\mathcal {N}}(0, a^T\Sigma a),~w.p.~1. \end{aligned}$$

Now, assume that \(a^T\Sigma a>0\). Obviously \(|A_{i,n}|\le ||a||k\). Hence, for each \(\varepsilon >0\),

$$\begin{aligned} \begin{aligned} I&\left( \left| \xi _{i,n}^{*}A_{i,n}\right|>\varepsilon \sqrt{Var_n^{*}(Z_n^{*})}\right) \\&\le I\left( \left| \xi _{i,n}^{*}\right| ||a||k>\varepsilon \sqrt{Var_n^{*}(Z_n^{*})}\right) \\&=I\left( \left| \delta _i^{*}-m(\beta _n^TX_i)\right| ||a||k>\sqrt{n}\varepsilon \sqrt{Var_n^{*}(Z_n^{*})}\right) \\&\le I\left( ||a||k>\sqrt{n}\varepsilon \sqrt{Var_n^{*}(Z_n^{*})}\right) \\&=0, ~as~n\rightarrow \infty , ~w.p.~1.\\ \end{aligned} \end{aligned}$$

Thus, the indicator in the Lindeberg condition vanishes as \(n\rightarrow \infty\), w.p. 1, so the Lindeberg condition is fulfilled and the finite-dimensional distributions converge to \({\mathcal {N}}(0, a^T\Sigma a)\). This establishes part (i) of Theorem 13.5 of Billingsley (1999).

For the tightness we use a modification of this Theorem, see Corollary 1, where F also depends on n. For this we assume that our process is only defined on the interval [0, 1]. If this is not the case, we can use a transformation to obtain such a process.

Check that for \(0\le u_1\le u\le u_2\le 1\)

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_n^{*}\left( \left[ R_n^{*}(u)\right. \right. -R_n^{*}(u_1)\left. \right] ^2\left[ R_n^{*}(u_2)-R_n^{*}(u)\right] ^2\left. \right) \\&\quad =\frac{1}{n^2}{\mathbb {E}}_n^{*}\left( \left[ \sum _{i=1}^n I(\beta _n^TX_i\le u)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) \right. \right. \\&\qquad \quad -\left. \sum _{i=1}^n I(\beta _n^TX_i\le u_1)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) \right] ^2\\&\qquad \cdot \left[ \sum _{i=1}^n I(\beta _n^TX_i\le u_2)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) \right. \\&\qquad \quad -\left. \left. \sum _{i=1}^n I(\beta _n^TX_i\le u)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) \right] ^2 \right) \\&\quad =\frac{1}{n^2}{\mathbb {E}}_n^{*}\left( \left[ \sum _{i=1}^n I(u_1<\beta _n^TX_i\le u)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) \right] ^2\right. \\&\qquad \cdot \left. \left[ \sum _{i=1}^n I(u<\beta _n^TX_i\le u_2)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) \right] ^2\right) .\\ \end{aligned} \end{aligned}$$

Now set

$$\begin{aligned} \alpha _i=I(u_1<\beta _n^TX_i\le u)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) \end{aligned}$$

and

$$\begin{aligned} \beta _i=I(u<\beta _n^TX_i\le u_2)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) . \end{aligned}$$

Use this and check that

$$\begin{aligned} \begin{aligned} \frac{1}{n^2}&{\mathbb {E}}_n^{*}\left( \left( \sum _{i=1}^n\alpha _i\right) ^2\left( \sum _{j=1}^n\beta _j\right) ^2\right) \\&=\frac{1}{n^2}\sum _{1\le i,j,k,l\le n}{\mathbb {E}}_n^{*}\left( \alpha _i\alpha _j\beta _k\beta _l\right) \\&=\frac{1}{n^2}\left( \sum _{1\le i\ne j\le n}\left( {\mathbb {E}}_n^{*}(\alpha _i\alpha _i\beta _j\beta _j)+2{\mathbb {E}}_n^{*}(\alpha _i\alpha _j\beta _i\beta _j)\right) +\sum _{1\le i\le n}{\mathbb {E}}_n^{*}(\alpha _i^2\beta _i^2)\right) \\&=\frac{1}{n^2}\sum _{1\le i\ne j\le n}{\mathbb {E}}_n^{*}(\alpha _i^2){\mathbb {E}}_n^{*}(\beta _j^2), \end{aligned} \end{aligned}$$

where the last equality follows since \(\alpha _i\) and \(\beta _j\) are independent for \(i\ne j\) and since, for each i, either \(I(u_1<\beta _n^TX_i\le u)\) or \(I(u<\beta _n^TX_i\le u_2)\) equals 0. Now, recall the definition of \(\alpha _i\) and \(\beta _i\) to get

$$\begin{aligned} \begin{aligned} \frac{1}{n^2}&\sum _{1\le i\ne j\le n}{\mathbb {E}}_n^{*}(\alpha _i^2){\mathbb {E}}_n^{*}(\beta _j^2)\\&=\frac{1}{n^2}\sum _{1\le i\ne j\le n}{\mathbb {E}}_n^{*}\left( I(u_1<\beta _n^TX_i\le u)\left( \delta _i^{*}-m(\beta _n^TX_i^{*})\right) ^2\right) \\&\quad \quad \cdot {\mathbb {E}}_n^{*}\left( I(u<\beta _n^TX_j\le u_2)\left( \delta _j^{*}-m(\beta _n^TX_j^{*})\right) ^2\right) \\&\le \frac{1}{n^2}\sum _{1\le i\ne j\le n}m(\beta _n^TX_i){\bar{m}}(\beta _n^TX_i)I(u_1<\beta _n^TX_i\le u_2)\\&\quad \quad \cdot m(\beta _n^TX_j){\bar{m}}(\beta _n^TX_j)I(u_1<\beta _n^TX_j\le u_2)\\&\le \frac{1}{n^2}\sum _{1\le \, i,\, j\le n}m(\beta _n^TX_i){\bar{m}}(\beta _n^TX_i)I(u_1<\beta _n^TX_i\le u_2)\\&\quad \quad \cdot m(\beta _n^TX_j){\bar{m}}(\beta _n^TX_j)I(u_1<\beta _n^TX_j\le u_2)\\&=\left( \frac{1}{n}\sum _{i=1}^n m(\beta _n^TX_i){\bar{m}}(\beta _n^TX_i)I(u_1<\beta _n^TX_i\le u_2)\right) ^2\\&=\left( \frac{1}{n}\sum _{i=1}^n m(\beta _n^TX_i){\bar{m}}(\beta _n^TX_i)I(\beta _n^TX_i\le u_2)\right. \\&\quad \qquad \left. -\frac{1}{n}\sum _{i=1}^nm(\beta _n^TX_i){\bar{m}}(\beta _n^TX_i)I(\beta _n^TX_i\le u_1)\right) ^2\\&=:\left( H_n(u_2)-H_n(u_1)\right) ^2. \end{aligned} \end{aligned}$$

Since \(H_n(u)\rightarrow K(u,u)\), w.p. 1, due to assumption (B1), a continuous, non-decreasing function H with \(\sup \limits _{u\in {\mathbb {R}}}\left| H_n(u)-H(u)\right| \rightarrow 0\) exists. Therefore, following Corollary 1 the process \(R_n^{*}\) is tight. \(\square\)

Proof of Lemma 2

Following Cramér-Wold, see Theorem 7.7 of Billingsley (1999), due to (B2) we have to show that, w.p. 1, for every \(a\in {\mathbb {R}}^d\), \(a\ne 0\),

$$\begin{aligned} Z_n^{*}=n^{-1/2}\sum _{i=1}^na^Tl(\beta _n^TX_i, \delta _i^{*})\rightarrow {\mathcal {N}}(0, a^TL(\beta _0) a), \end{aligned}$$

in distribution for \(n\rightarrow \infty\). According to Serfling (1980), Theorem 1.9.3, this follows from the Lindeberg condition,

$$\begin{aligned} \begin{aligned} \frac{1}{Var_n^{*}(Z_n^{*})}\frac{1}{n}\sum _{i=1}^n\int&a^Tl(\beta _n^TX_i, \delta _i^{*})l^T(\beta _n^TX_i, \delta _i^{*})a\\&I\left( |n^{-1/2}a^Tl(\beta _n^TX_i, \delta _i^{*})|>\varepsilon \sqrt{Var_n^{*}(Z_n^{*})}\right) d{\mathbb {P}}^{*}\\&\rightarrow 0, ~ as~ n\rightarrow \infty . \end{aligned} \end{aligned}$$

Use (C2) to get

$$\begin{aligned} \begin{aligned} Var_n^{*}(Z_n^{*})&=\frac{1}{n}\sum _{i=1}^na^T{\mathbb {E}}_n^{*}\left( l(\beta _n^TX_i, \delta _i^{*})l^T(\beta _n^TX_i, \delta _i^{*})\right) a=a^TL_n^{*}(\beta _n)a\\&\rightarrow a^TL(\beta _0)a, \end{aligned} \end{aligned}$$

as \(n\rightarrow \infty\), w.p.1. Furthermore, \(L(\beta _0)\) is positive definite and \(a\ne 0\), thus, \(a^TL(\beta _0)a>0\) and it suffices to show that

$$\begin{aligned} \begin{aligned} \frac{1}{n}\sum _{i=1}^n\int&a^Tl(\beta _n^TX_i, \delta _i^{*})l^T(\beta _n^TX_i, \delta _i^{*})a\\&\cdot I\left( |n^{-1/2}a^Tl(\beta _n^TX_i, \delta _i^{*})|>\varepsilon \sqrt{Var_n^{*}(Z_n^{*})}\right) d{\mathbb {P}}^{*}\rightarrow 0, ~ as~ n\rightarrow \infty . \end{aligned} \end{aligned}$$

The integral equals

$$\begin{aligned} n^{-1}\sum _{i=1}^n a^Tl(\beta _n^TX_i, \delta _i)l^T(\beta _n^TX_i, \delta _i)a\,I\left( |n^{-1/2}a^Tl(\beta _n^TX_i, \delta _i)|>\varepsilon \sqrt{Var_n(Z_n)}\right) \end{aligned}$$

Since \(Var_n^{*}(Z_n^{*})\rightarrow a^TL(\beta _0)a\), we get for the indicator

$$\begin{aligned} \begin{aligned} I&\left( |n^{-1/2}a^Tl(\beta _n^TX_i, \delta _i)|>\varepsilon \sqrt{Var_n(Z_n)}\right) \\&\quad =I\left( |a^Tl(\beta _n^TX_i, \delta _i)|>\varepsilon n^{1/2} \sqrt{Var_n(Z_n)}\right) \\&\quad <I\left( |a^Tl(\beta _n^TX_i, \delta _i)|>\varepsilon i^{1/2} \sqrt{a^TL(\beta _0)a}/2\right) , \end{aligned} \end{aligned}$$

for \(1\le i\le n\) and n sufficiently large.

Due to assumption (E2),

$$\begin{aligned} \begin{aligned} \sum _{i=1}^n&{\mathbb {P}}\left( |a^Tl(\beta _n^TX_i, \delta _i)|^2>\varepsilon i\right) \\&=\sum _{i=1}^n{\mathbb {P}}\left( |a^Tl(\beta _n^TX, \delta _i)|^2>\varepsilon i\right) \\&=\sum _{i=1}^n\varepsilon ^{-1}\int _{[\varepsilon (i-1),\varepsilon i]}{\mathbb {P}}\left( |a^Tl(\beta _n^TX, \delta _i)|^2>\varepsilon i\right) dx\\&\le \sum _{i=1}^n\varepsilon ^{-1}\int _{[\varepsilon (i-1),\varepsilon i]}{\mathbb {P}}\left( |a^Tl(\beta _n^TX, \delta _i)|^2>x\right) dx\\&=\varepsilon ^{-1}\int _{0}^{\infty }{\mathbb {P}}\left( |a^Tl(\beta _n^TX, \delta _i)|^2>x\right) dx\\&=\varepsilon ^{-1}{\mathbb {E}}\left( |a^Tl(\beta _n^TX, \delta _i)|^2\right) \\&<\infty . \end{aligned} \end{aligned}$$

Thus, Borel-Cantelli yields

$$\begin{aligned} \limsup \limits _{i\rightarrow \infty }\frac{|a^Tl(\beta _n^TX_i, \delta _i^{*})|}{\sqrt{i}}=0, \end{aligned}$$

w.p. 1. Therefore, the indicator equals 0 as \(n\rightarrow \infty\), and the Lindeberg condition is fulfilled. \(\square\)

Proof of Lemma 3

Since the half-spaces in \({\mathbb {R}}^d\) are a GC-class and \(w_j(X,\beta )\) is integrable, see assumption (F2), Corollary 9.27 of Kosorok (2008) yields

$$\begin{aligned} \sup \limits _{\beta \in {\mathbb {R}}^d, u\in {\mathbb {R}}}\left| n^{-1}\sum _{i=1}^nw_j(X_i, \beta _0)I(\beta ^TX_i\le u)-W_j(u,\beta )\right| \rightarrow 0, ~as~n\rightarrow \infty ,~w.p.~1. \end{aligned}$$

Due to assumption (A1), for every \(\varepsilon >0\) we get

$$\begin{aligned} \begin{aligned}&\limsup \limits _{n\rightarrow \infty }\sup \limits _{u\in {\mathbb {R}}}\left| n^{-1}\sum _{i=1}^nw_j(X_i, \beta _0)I(\beta _n^TX_i\le u)-W_j(u,\beta _0)\right| \\&\quad \le \limsup \limits _{n\rightarrow \infty }\sup \limits _{\beta \in {\mathbb {R}}^d, u\in {\mathbb {R}}}\left| n^{-1}\sum _{i=1}^nw_j(X_i, \beta _0)I(\beta ^TX_i\le u)-W_j(u,\beta )\right| \\&\qquad +\sup \limits _{|\beta -\beta _0|<\varepsilon , u\in {\mathbb {R}}}\left| W_j(u,\beta )-W_j(u,\beta _0)\right| \\&\quad =\sup \limits _{|\beta -\beta _0|<\varepsilon , u\in {\mathbb {R}}}\left| W_j(u,\beta )-W_j(u,\beta _0)\right| , \end{aligned} \end{aligned}$$

w.p. 1. Furthermore, due to assumption (F2), the last term on the right-hand side tends to 0 as \(\varepsilon \rightarrow 0\). This proves part (i).

For part (ii) we get

$$\begin{aligned} \begin{aligned} \sup \limits _{u\in {\mathbb {R}}}&\left| n^{-1}\sum _{i=1}^n\left( w_j\left( X_i,{\hat{\beta }}_n^{*}(X_i)\right) -w_j(X_i, \beta _0)\right) I(\beta _n^TX_i\le u)\right| \\&\le n^{-1}\sum _{i=1}^n \sup \limits _{|\beta -\beta _0|<\varepsilon }\left| w_j(X_i,\beta )-w_j(X_i,\beta _0)\right| +o_{{\mathbb {P}}_n^{*}}(1), \end{aligned} \end{aligned}$$

since, due to (A1) and Lemma 2, for \(\varepsilon >0\), \({\mathbb {P}}_n^{*}(|\beta _n^{*}-\beta _0|>\varepsilon )\rightarrow 0\) as \(n\rightarrow \infty\), w.p. 1.

Furthermore, as \(n\rightarrow \infty\)

$$\begin{aligned} \begin{aligned} n^{-1}\sum _{i=1}^n&\sup \limits _{|\beta -\beta _0|<\varepsilon }\left| w_j(X_i,\beta )-w_j(X_i,\beta _0)\right| \\&\rightarrow {\mathbb {E}}\left( \sup \limits _{|\beta -\beta _0|<\varepsilon }\left| w_j(X,\beta )-w_j(X,\beta _0)\right| \right) . \end{aligned} \end{aligned}$$

Due to assumptions (D2) and (E2), applying the dominated convergence theorem yields that the expectation on the right-hand side tends to 0 as \(\varepsilon \rightarrow 0\). \(\square\)

Proof of Theorem 2

Check that

$$\begin{aligned} \begin{aligned} R_n^{1*}(u)&=n^{-1/2}\sum _{i=1}^n\left( \delta _i^{*}-m(\beta _n^{*T}X_i)\right) I(\beta _n^TX_i\le u)\\&=n^{-1/2}\sum _ {i=1}^n\left( \delta _i^{*}-m(\beta _n^TX_i)\right) I(\beta _n^TX_i\le u)\\&\quad -n^{-1/2}\sum _{i=1}^n\left( m(\beta _n^{*T}X_i)-m(\beta _n^TX_i)\right) I(\beta _n^TX_i\le u)\\&=R_n^{*}(u)-S_n^{*}(u). \end{aligned} \end{aligned}$$

Since we already dealt with \(R_n^{*}(u)\) in Theorem 1, we now have to handle \(S_n^{*}(u)\). It follows from assumption (A1) and Lemma 2 that

$$\begin{aligned} {\mathbb {P}}_n^{*}(|\beta _n^{*}-\beta _0|>\varepsilon )\rightarrow 0, ~as~n\rightarrow \infty ,~w.p.~1, \end{aligned}$$

for \(\varepsilon >0\). Thus, we can assume that \(\beta _n^{*}\) and \(\beta _n\) are in a neighborhood of \(\beta _0\). Following assumption (D2), we can apply a Taylor expansion to get

$$\begin{aligned} m(\beta _n^{*T}x)=m(\beta _n^{T}x)+(\beta _n^{*}-\beta _n)^Tw\left( x,{\hat{\beta }}_n^{*}(x)\right) , \end{aligned}$$

where \({\hat{\beta }}_n^{*}(x)\) is in the line segment connecting \(\beta _n^{*}\) and \(\beta _n\). Thus we can write \(S_n^{*}(u)\) as follows:

$$\begin{aligned} \begin{aligned} S_n^{*}(u)&=n^{1/2}(\beta _n^{*}-\beta _n)^Tn^{-1}\sum _{i=1}^nw\left( X_i, {\hat{\beta }}_n^{*}(X_i)\right) I(\beta _n^TX_i\le u)+o_{{\mathbb {P}}_n^{*}}(1)\\&=n^{1/2}(\beta _n^{*}-\beta _n)^TW(u,\beta _0)\\&\quad +n^{1/2}(\beta _n^{*}-\beta _n)^Tn^{-1}\sum _{i=1}^n\left( w\left( X_i, {\hat{\beta }}_n^{*}(X_i)\right) -w(X_i, \beta _0)\right) I(\beta _n^TX_i\le u)\\&\quad +n^{1/2}(\beta _n^{*}-\beta _n)^T\left( n^{-1}\sum _{i=1}^nw(X_i, \beta _0)I(\beta _n^TX_i\le u)-W(u,\beta _0)\right) \\&\quad +o_{{\mathbb {P}}_n^{*}}(1). \end{aligned} \end{aligned}$$

Lemma 3 now yields that

$$\begin{aligned} S_n^{*}(u)=n^{1/2}(\beta _n^{*}-\beta _n)^TW(u,\beta _0)+o_{{\mathbb {P}}_n^{*}}(1), ~w.p.~1, \end{aligned}$$

and with (B2) and (E2)

$$\begin{aligned} S_n^{*}(u)=n^{-1/2}\sum _{i=1}^n l^T(\beta _n^TX_i,\delta ^{*}_{i})W(u,\beta _0)+o_{{\mathbb {P}}_n^{*}}(1), ~w.p.~1, \end{aligned}$$

uniformly in u.

Now define

$$\begin{aligned} \begin{aligned} {\hat{R}}_n^{1*}(u)&=n^{-1/2}\sum _{i=1}^n \left( (\delta _i^{*}-m(\beta _n^TX_i))I(\beta _n^TX_i\le u)-l^T(\beta _n^TX_i,\delta ^{*}_{i})W(u,\beta _0)\right) \\&=R_n^{*}(u)-n^{-1/2}\sum _{i=1}^n l^T(\beta _n^TX_i,\delta ^{*}_{i})W(u,\beta _0), \end{aligned} \end{aligned}$$

which is asymptotically equivalent to \(R_n^{1*}\), see Theorem 4.1 of Billingsley (1999). Furthermore, following the proof of Theorem 1, \(R_n^{*}(u)\) is tight in \(D[-\infty , \infty ]\) and, due to Lemma 2, \(n^{-1/2}\sum _{i=1}^n l^T(\beta _n^TX_i,\delta ^{*}_{i})\) converges in distribution to a zero-mean multivariate normal distribution with covariance matrix \(L(\beta _0)\), w.p. 1.

Furthermore, assumption (F2) yields that \(W(\cdot , \beta _0)\) is continuous.

Thus, \(n^{-1/2}\sum _{i=1}^n l^T(\beta _n^TX_i,\delta ^{*}_{i})W(u,\beta _0)\) is tight in \(C[-\infty ,\infty ]\) and therefore also tight in \(D[-\infty ,\infty ]\). Finally, w.p. 1, \({\hat{R}}_n^{1*}(u)\) is tight in \(D[-\infty ,\infty ]\).

Now let \(k\in {\mathbb {N}}\) and choose \(-\infty \le u_1<...<u_k\le \infty\). Following Cramér-Wold, see Theorem 7.7 of Billingsley (1999), we have to show that, w.p. 1, for every \(a\in {\mathbb {R}}^k\), \(a\ne 0\),

$$\begin{aligned} Z_n^{1*}=\sum _{j=1}^ka_j{\hat{R}}_n^{1*}(u_j)\rightarrow {\mathcal {N}}(0, a^T\Sigma a),~for~n\rightarrow \infty , \end{aligned}$$

in distribution, with \(\Sigma =(\sigma _{s,t})_{1\le s,t\le k}\) and \(\sigma _{s,t}=Cov\left( R^1_{\infty }(u_s), R^1_{\infty }(u_t)\right) ={\mathbb {E}}\left( R^1_{\infty }(u_s)R^1_{\infty }(u_t)\right) ={\hat{K}}(u_s, u_t)\).

We can rearrange the terms to

$$\begin{aligned} \begin{aligned} Z_n^{1*}&=\sum _{i=1}^n\left( \frac{\delta _i^{*}-m(\beta _n^TX_i)}{\sqrt{n}}\sum _{j=1}^ka_jI(\beta _n^TX_i<u_j)\right. \\&\qquad \left. -\frac{l^T(\beta _n^TX_i, \delta _i^{*})}{\sqrt{n}}\sum _{j=1}^ka_jW(u_j, \beta _0)\right) \\&=\sum _{i=1}^n\xi _{i,n}^{*}A_{i,n}-\eta _{i,n}^{*T}B, \end{aligned} \end{aligned}$$

with \(\xi _{i,n}^{*}=n^{-1/2}\left( \delta _i^{*}-m(\beta _n^TX_i)\right)\), \(\eta _{i,n}^{*}=n^{-1/2}l(\beta _n^TX_i, \delta _i^{*})\), \(A_{i,n}\) as before, and \(B=\sum _{j=1}^ka_jW(u_j, \beta _0)\). Obviously, those variables are centered and \((\xi _{1,n}^{*},\eta _{1,n}^{*}),...,(\xi _{n,n}^{*},\eta _{n,n}^{*})\) are independent. Additionally, \(A_{i,n}\) and B are deterministic with respect to \({\mathbb {P}}_n^{*}\). Thus, we get for the variance of \(Z_n^{1*}\)

$$\begin{aligned} \begin{aligned} Var_n^{*}(Z_n^{1*})&=\sum _{i=1}^nA^2_{i,n}Var_n^{*}(\xi _{i,n}^{*})+\sum _{i=1}^nB^T{\mathbb {E}}_n^{*}(\eta _{i,n}^{*}\eta _{i,n}^{*T})B\\&\quad -2B^T\sum _{i=1}^n{\mathbb {E}}_n^{*}(\xi _{i,n}^{*}\eta _{i,n}^{*})A_{i,n}. \end{aligned} \end{aligned}$$

In the proof of Theorem 1 we have shown that

$$\begin{aligned} \sum _{i=1}^nA^2_{i,n}Var_n^{*}(\xi _{i,n}^{*})\rightarrow \sum _{1\le s,t\le k}a_sK(u_s, u_t)a_t, ~as~n\rightarrow \infty ,~w.p.~1. \end{aligned}$$

Furthermore, due to assumption (C2)

$$\begin{aligned} \sum _{i=1}^nB^T{\mathbb {E}}_n^{*}(\eta _{i,n}^{*}\eta _{i,n}^{*T})B\rightarrow B^TL(\beta _0)B=\sum _{1\le s,t\le k}a_sW^T(u_s,\beta _0)L(\beta _0)W(u_t,\beta _0)a_t, \end{aligned}$$

as \(n\rightarrow \infty\), w.p. 1.

Now check that

$$\begin{aligned} \begin{aligned} {\mathbb {E}}_n^{*}&(\xi _{i,n}^{*}\eta _{i,n}^{*})\\&=\frac{1}{n}{\mathbb {E}}_n^{*}\left( \left( \delta _i^{*}-m(\beta _n^TX_i)\right) l(\beta _n^TX_i, \delta _i^{*})\right) \\&=\frac{1}{n}{\mathbb {E}}_n^{*}\left( \left( \delta _i^{*}-m(\beta _n^TX_i)\right) \left( \delta _i^{*}\frac{w(X_i, \beta _n)}{m(\beta _n^TX_i)}-(1-\delta _i^{*})\frac{w(X_i, \beta _n)}{1-m(\beta _n^TX_i)}\right) \right) \\&=\frac{1}{n}{\mathbb {E}}_n^{*}\left( \left( \delta _i^{*}-m(\beta _n^TX_i)\right) \frac{\left( \delta _i^{*}-m(\beta _n^TX_i)\right) w(X_i, \beta _n)}{m(\beta _n^TX_i)(1-m(\beta _n^TX_i))}\right) \\&=\frac{1}{n}w(X_i,\beta _n). \end{aligned} \end{aligned}$$

Thus, for the last term, due to Lemma 3, we get as \(n\rightarrow \infty\)

$$\begin{aligned} \begin{aligned} B^T\sum _{i=1}^n&{\mathbb {E}}_n^{*}(\xi _{i,n}^{*}\eta _{i,n}^{*})A_{i,n}\\&=\left( \sum _{l=1}^ka_lW(u_l,\beta _0)\right) ^T\frac{1}{n}\sum _{i=1}^n \left( w(X_i,\beta _n)\sum _{j=1}^ka_jI(\beta _n^TX_i<u_j)\right) \\&=\sum _{1\le s,t\le k}\left[ a_sW(u_s,\beta _0)^T\frac{1}{n}\sum _{i=1}^n \left( w(X_i,\beta _n)I(\beta _n^TX_i<u_t)\right) a_t\right] \\&\rightarrow \sum _{1\le s,t\le k}a_sW(u_s,\beta _0)^TW(u_t,\beta _0)a_t. \end{aligned} \end{aligned}$$

And finally, as \(n\rightarrow \infty\), w.p. 1,

$$\begin{aligned} Var_n^{*}(Z_n^{1*})\rightarrow \sum _{1\le s,t\le k}a_s{\hat{K}}(u_s,u_t)a_t=a^T\Sigma a. \end{aligned}$$

Assume that \(a^T\Sigma a>0\); the case \(a^T\Sigma a=0\) can be treated as in the proof of Theorem 1. Then we have to prove the Lindeberg condition

$$\begin{aligned} \begin{aligned} \frac{1}{Var_n^{*}(Z_n^{1*})}\sum _{i=1}^n\int&(\xi _{i,n}^{*}A_{i,n}-\eta _{i,n}^{*T}B)^2\\&\cdot I\left( \left| \xi _{i,n}^{*}A_{i,n}-\eta _{i,n}^{*T}B\right| >\varepsilon \sqrt{Var_n^{*}(Z_n^{1*})}\right) d{\mathbb {P}}^{*}\rightarrow 0, \end{aligned} \end{aligned}$$

as \(n\rightarrow \infty\), w.p. 1.

The integral equals

$$\begin{aligned} n^{-1}\sum _{i=1}^n(\xi _{i,n}A_{i,n}-\eta _{i,n}^TB)^2 I\left( \left| \xi _{i,n}A_{i,n}-\eta _{i,n}^TB\right| >\varepsilon \sqrt{Var_n(Z_n^1)}\right) . \end{aligned}$$

Since \(Var_n^{*}(Z_n^{1*})\rightarrow a^T\Sigma a\), we get for the indicator

$$\begin{aligned} \begin{aligned} I&\left( \left| \xi _{i,n}A_{i,n}-\eta _{i,n}^{T}B\right|>\varepsilon \sqrt{Var_n(Z_n^1)}\right) \\&=I\left( \left| (\delta _i-m(\beta _n^TX_i))A_{i,n}-n^{-1/2}l^T(\beta _n^TX_i, \delta _i)B\right|>\varepsilon n^{1/2} \sqrt{Var_n(Z_n^{1})}\right) \\&<I\left( \left| (\delta _i-m(\beta _n^TX_i))A_{i,n}-n^{-1/2}l^T(\beta _n^TX_i, \delta _i)B\right| >\varepsilon i^{1/2} \sqrt{a^T\Sigma a}/2\right) , \end{aligned} \end{aligned}$$

for \(1\le i\le n\) and n sufficiently large.

Since \(|(\delta _i-m(\beta _n^TX_i))A_{i,n}|\) and B are bounded, assumption (E2) yields

$$\begin{aligned} \begin{aligned} \sum _{i\ge 1}&{\mathbb {P}}\left( \left| \left( \delta _i-m(\beta _n^TX_i)\right) A_{i,n}-n^{-1/2}l^T(\beta _n^TX_i, \delta _i)B\right| ^2>\varepsilon i\right) \\&=\sum _{i\ge 1}{\mathbb {P}}\left( \left| \left( \delta _i-m(\beta _n^TX)\right) A_{i,n}-n^{-1/2}l^T(\beta _n^TX, \delta _i)B\right| ^2>\varepsilon i\right) \\&=\sum _{i\ge 1}\varepsilon ^{-1}\int _{[\varepsilon (i-1),\varepsilon i]}{\mathbb {P}}\left( \left| \left( \delta _i-m(\beta _n^TX)\right) A_{i,n}-n^{-1/2}l^T(\beta _n^TX, \delta _i)B\right| ^2>\varepsilon i\right) dx\\&\le \sum _{i\ge 1}\varepsilon ^{-1}\int _{[\varepsilon (i-1),\varepsilon i]}{\mathbb {P}}\left( \left| \left( \delta _i-m(\beta _n^TX)\right) A_{i,n}-n^{-1/2}l^T(\beta _n^TX, \delta _i)B\right| ^2>x\right) dx\\&=\varepsilon ^{-1}\int _{0}^{\infty }{\mathbb {P}}\left( \left| \left( \delta _i-m(\beta _n^TX)\right) A_{i,n}-n^{-1/2}l^T(\beta _n^TX, \delta _i)B\right| ^2>x\right) dx\\&=\varepsilon ^{-1}{\mathbb {E}}\left( \left| \left( \delta _i-m(\beta _n^TX)\right) A_{i,n}-n^{-1/2}l^T(\beta _n^TX, \delta _i)B\right| ^2\right) \\&\le \varepsilon ^{-1}\left( {\mathbb {E}}\left( \left| \left( \delta _i-m(\beta _n^TX)\right) A_{i,n}\right| ^2\right) +2{\mathbb {E}}\left( \left| \left( \delta _i-m(\beta _n^TX)\right) A_{i,n}\right| \left| n^{-1/2}l^T(\beta _n^TX, \delta _i)B\right| \right) +{\mathbb {E}}\left( \left| n^{-1/2}l^T(\beta _n^TX, \delta _i)B\right| ^2\right) \right) \\&<\infty . \end{aligned} \end{aligned}$$

Thus, Borel-Cantelli yields

$$\begin{aligned} \limsup \limits _{i\rightarrow \infty }\frac{\left| \left( \delta _i-m(\beta _n^TX_i)\right) A_{i,n}-n^{-1/2}l^T(\beta _n^TX_i, \delta _i)B\right| }{\sqrt{i}}=0, \end{aligned}$$

w.p. 1. Therefore, the indicator equals 0 as \(n\rightarrow \infty\), the Lindeberg condition is fulfilled, and the finite-dimensional distributions converge in distribution to a centered normal distribution with variance \(a^T\Sigma a\) as \(n\rightarrow \infty\). \(\square\)