Appendix
Proof of Theorem 1
Define
$$\begin{aligned} A_n^*(\mathbf{u })= & {} \sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i} A^*_{ik}(\mathbf{u }), \end{aligned}$$
where \(A^*_{ik}(\mathbf{u }) = \rho _{\tau _k}( \varepsilon _i^*-b_{0k}- \mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}^* /\sqrt{n})-\rho _{\tau _k} ( \varepsilon _i^*-b_{0k})\), \(\tilde{\mathbf{x }}_{ik}^*=(\mathbf{x }_i^{*\textsf {T}}, \mathbf{e }_k^\textsf {T})^\textsf {T}\), and \(\varepsilon _i^*=y_i^*-{\varvec{\beta }}^\textsf {T}_0\mathbf{x }_i^*\), \(i=1,\ldots ,n\). As a function of \(\mathbf{u }\), \(A_n^*(\mathbf{u })\) is convex and its minimizer is \( \sqrt{n}( \tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\). Thus, we can focus on \(A_n^*(\mathbf{u })\) when assessing the properties of \( \sqrt{n}( \tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\).
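For concreteness, the following minimal Python sketch (an illustration only, with hypothetical variable names) evaluates \(A_n^*(\mathbf{u })\) for a given subsample of size \(n\); minimizing it over \(\mathbf{u }\) reproduces \( \sqrt{n}( \tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\) numerically.

import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - I(u < 0))
    return u * (tau - (u < 0))

def A_n_star(u, x_sub, y_sub, pi_sub, beta0, b0, taus, N):
    # A_n^*(u) = sum_k sum_i {rho_tau_k(eps_i^* - b_0k - u' x_tilde_ik^*/sqrt(n))
    #                         - rho_tau_k(eps_i^* - b_0k)} / (N * pi_i^*)
    # x_sub: (n, p) subsampled covariates, y_sub: (n,) responses,
    # pi_sub: (n,) sampling probabilities of the drawn units,
    # beta0: (p,) slope, b0: (K,) intercepts, taus: (K,) levels, u: (p + K,).
    n, p = x_sub.shape
    eps = y_sub - x_sub @ beta0                      # eps_i^*
    total = 0.0
    for k, (tau, b) in enumerate(zip(taus, b0)):
        shift = (x_sub @ u[:p] + u[p + k]) / np.sqrt(n)   # u' x_tilde_ik^* / sqrt(n)
        total += np.sum((check_loss(eps - b - shift, tau)
                         - check_loss(eps - b, tau)) / (N * pi_sub))
    return total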
Let \(\psi _\tau (u)=\tau -I(u<0)\). By the identity (Knight 1998),
$$\begin{aligned} \rho _{\tau }(u-v)-\rho _{\tau }(u)= & {} -v\psi _\tau (u)+\int _0^v \{I(u\le s)-I(u\le 0)\}ds, \end{aligned}$$
we obtain
$$\begin{aligned} A^*_{ik}(\mathbf{u })= & {} \rho _{\tau _k} ( \varepsilon _i^*-b_{0k}- \mathbf{u }^\textsf {T}\tilde{\mathbf{x }}_{ik}^* /\sqrt{n})-\rho _{\tau _k}( \varepsilon _i^*-b_{0k})\\= & {} - \frac{1}{\sqrt{n}}\mathbf{u }^\textsf {T}\tilde{\mathbf{x }}_{ik}^* \{\tau _k- I(\varepsilon _i^*-b_{0k}<0 )\}\\&+ \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^*/\sqrt{n}} \{I(\varepsilon _i^*- b_{0k}\le s)-I(\varepsilon _i^*-b_{0k}\le 0)\}ds. \end{aligned}$$
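As a quick numerical sanity check of this identity (an illustration only, with hypothetical names), the two sides can be compared on random inputs, approximating the integral by a signed Riemann sum.

import numpy as np

rng = np.random.default_rng(0)

def rho(u, tau):
    # check loss rho_tau(u) = u * (tau - I(u < 0))
    return u * (tau - (u < 0))

def knight_rhs(u, v, tau, grid=200000):
    # -v * psi_tau(u) + int_0^v {I(u <= s) - I(u <= 0)} ds
    psi = tau - (u < 0)
    s = np.linspace(0.0, v, grid)
    integrand = (u <= s).astype(float) - float(u <= 0)
    return -v * psi + integrand.mean() * v           # signed Riemann sum of the integral

for _ in range(5):
    u, v, tau = rng.normal(), rng.normal(), rng.uniform(0.05, 0.95)
    lhs = rho(u - v, tau) - rho(u, tau)
    assert abs(lhs - knight_rhs(u, v, tau)) < 1e-3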
Thus, we write
$$\begin{aligned} A_n^*(\mathbf{u })= & {} -\mathbf{u }^\textsf {T}\frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i} \{\tau _k- I(\varepsilon _i^*-b_{0k}<0 )\}\tilde{\mathbf{x }}_{ik}^* \nonumber \\&+\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i}\int _0^{ {\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^* /\sqrt{n}} \{I(\varepsilon _i^*-b_{0k}\le s)-I(\varepsilon _i^*-b_{0k}\le 0)\}ds\nonumber \\= & {} \mathbf{u }^\textsf {T} \mathbf{Z }_{n}^* +A^*_{2n}(\mathbf{u }), \end{aligned}$$
(15)
where
$$\begin{aligned}&\mathbf{Z }_{n}^*= - \frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i} \{\tau _k- I(\varepsilon _i^*-b_{0k}<0 )\}\tilde{\mathbf{x }}_{ik}^*,\\&A^*_{2n}(\mathbf{u })= \sum _{i=1}^n\frac{1}{N\pi ^*_i} A^*_{2n,i}(\mathbf{u }), \\&A^*_{2n,i}(\mathbf{u })= \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^* /\sqrt{n}} \{I(\varepsilon _i^*-b_{0k}\le s)-I(\varepsilon _i^*-b_{0k}\le 0)\}ds. \end{aligned}$$
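In the same hypothetical setup as the sketch of \(A_n^*(\mathbf{u })\) above, the linear term \(\mathbf{Z }_{n}^*\) can be evaluated directly from the subsample.

import numpy as np

def Z_n_star(x_sub, y_sub, pi_sub, beta0, b0, taus, N):
    # Z_n^* = -(1/sqrt(n)) sum_k sum_i {tau_k - I(eps_i^* - b_0k < 0)} x_tilde_ik^* / (N pi_i^*)
    n, p = x_sub.shape
    K = len(taus)
    eps = y_sub - x_sub @ beta0                      # eps_i^*
    z = np.zeros(p + K)
    for k, (tau, b) in enumerate(zip(taus, b0)):
        w = (tau - (eps - b < 0)) / (N * pi_sub)     # weighted score for level tau_k
        z[:p] += w @ x_sub                           # slope block of x_tilde_ik^*
        z[p + k] = w.sum()                           # intercept block e_k
    return -z / np.sqrt(n)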
We first prove the asymptotic normality of \(\mathbf{Z }_{n}^* \). Denote
$$\begin{aligned} {\varvec{\eta }}_i^*= & {} -\frac{1}{N\pi ^*_i} \sum _{k=1}^K \{\tau _k- I(\varepsilon _i^*-b_{0k}<0)\}\tilde{\mathbf{x }}_{ik}^*, \end{aligned}$$
then we can write \(\mathbf{Z }_{n}^*= \frac{1}{\sqrt{n}} \sum _{i=1}^n {\varvec{\eta }}_i^*\). Direct calculation yields
$$\begin{aligned} E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)= & {} -\frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K \{\tau _k-I(\varepsilon _i-b_{0k}<0 )\}\tilde{\mathbf{x }}_{ik}=O_p(N^{-1/2}), \\ \text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N)= & {} E\{({\varvec{\eta }}_i^*)^{\otimes 2} |{\mathbb {D}}_N\}-\{E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\}^{\otimes 2} \\= & {} \sum _{i=1}^N \frac{1}{N^2\pi _i} \biggr \{\sum _{k=1}^K [\tau _k-I(\varepsilon _i-b_{0k}<0)] \tilde{\mathbf{x }}_{ik}\biggr \} ^{\otimes 2}- \{E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\}^{\otimes 2} \\= & {} \sum _{i=1}^N \frac{1}{N^2\pi _i} \biggr \{\sum _{k=1}^K [\tau _k-I(\varepsilon _i-b_{0k}<0)] \tilde{\mathbf{x }}_{ik}\biggr \} ^{\otimes 2} - o_p(1). \end{aligned}$$
It follows that
$$\begin{aligned}&E\{E({\varvec{\eta }}_{i}^*|{\mathbb {D}}_N)\}= 0, \\&\text{ cov }\{E({\varvec{\eta }}_{i}^*|{\mathbb {D}}_N)\}= \frac{1}{N^2} \sum _{i=1}^N \text{ cov }\left\{ \sum _{k=1}^K\left[ \tau _k-I(\varepsilon _i<b_{0k})\right] \tilde{\mathbf{x }}_{ik}\right\} . \end{aligned}$$
Consider the (s, t)th element of \( \text{ cov }\{E({\varvec{\eta }}_{i}^*|{\mathbb {D}}_N)\}\), denoted by \(\sigma _{st}\). By the Cauchy-Schwarz and \(c_r\) inequalities, we have \(|\sigma _{st}| \le \sqrt{\sigma _{ss}}\sqrt{\sigma _{tt}}\le \frac{1}{N^2}\sum _{i=1}^NK(\Vert \mathbf{x }_i\Vert ^2+1)=O(N^{-1})\) under Assumption 1(b). By Chebyshev’s inequality, \( E({\varvec{\eta }}_i^*|{\mathbb {D}}_N) =O_p(N^{-1/2})\).
We now check Lindeberg’s conditions (Theorem 2.27 of van der Vaart 1998) under the conditional distribution given \({\mathbb {D}}_N\). Specifically, we want to show that for \(\epsilon >0\),
$$\begin{aligned}&\sum _{i=1}^n E\{\Vert n^{-1/2}{\varvec{\eta }}_i^* \Vert ^2 I(\Vert {\varvec{\eta }}_i^*\Vert>\sqrt{n}\epsilon ) |{\mathbb {D}}_N\} \nonumber \\&\quad = \sum _{i=1}^n E\biggr \{\biggr \Vert \frac{1}{\sqrt{n} N \pi _i^*} \sum _{k=1}^K \tilde{\mathbf{x }}_{ik}^*\{\tau _k-I(\varepsilon _i^*-b_{0k}<0 )\} \biggr \Vert ^2 \nonumber \\&\qquad \times I\biggr (\biggr \Vert \frac{1}{\sqrt{n} N \pi _i^* \epsilon } \sum _{k=1}^K \tilde{\mathbf{x }}_{ik}^* \{\tau _k-I(\varepsilon _i^*-b_{0k}<0)\} \biggr \Vert>1 \biggr ) \biggr | {\mathbb {D}}_N \biggr \} \nonumber \\&\quad = \sum _{i=1}^N\frac{1}{ N^2 \pi _i} \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i-b_{0k}<0)\}\tilde{\mathbf{x }}_{ik} \biggr \Vert ^2 \nonumber \\&\qquad \times I\biggr (\frac{1}{\sqrt{n} N \pi _i \epsilon } \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i-b_{0k}<0)\} \tilde{\mathbf{x }}_{ik} \biggr \Vert >1 \biggr ) \end{aligned}$$
(16)
goes to zero in probability. If condition (7) holds, then the right-hand side of (16) satisfies
$$\begin{aligned}&\sum _{i=1}^N\frac{1}{ N^2 \pi _i} \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik} \biggr \Vert ^2 \nonumber \\&\quad \times I\biggr (\frac{1}{\sqrt{n} N \pi _i \epsilon } \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} \biggr \Vert>1 \biggr ) \nonumber \\&\qquad \le K^2 \sum _{i=1}^N\frac{1}{ N^2 \pi _i} (1+ \Vert \mathbf{x }_{i}\Vert )^2 I\biggr (\frac{K(1+ \Vert \mathbf{x }_{i}\Vert )}{\sqrt{n} N \pi _i \epsilon }>1 \biggr ) \nonumber \\&\qquad \le I\biggr (\max _{1\le i\le N} \frac{\Vert \mathbf{x }_{i}\Vert +1 }{\pi _i}>\frac{\sqrt{n} N\epsilon }{ K} \biggr )\biggr ( K^2\sum _{i=1}^N\frac{(1+ \Vert \mathbf{x }_{i}\Vert )^2}{N^2\pi _i}\biggr ). \end{aligned}$$
By Assumption 2(a), \(\max _{1\le i\le N}\frac{\Vert {\mathbf{x }}_i\Vert +1}{\pi _i}=o_p(\sqrt{n}N)\). By Assumption 2(b), \(K^2\sum _{i=1}^N\frac{(1+ \Vert {\mathbf{x }}_{i}\Vert )^2}{N^2\pi _i}=O_p(1)\). It follows that
$$\begin{aligned} \sum _{i=1}^n E\{\Vert n^{-1/2}{\varvec{\eta }}_i^* \Vert ^2 I(\Vert {\varvec{\eta }}_i^*\Vert >\sqrt{n}\epsilon ) |{\mathbb {D}}_N\}=o_p(1), \end{aligned}$$
which shows that Lindeberg’s conditions hold with probability approaching one.
Given \({\mathbb {D}}_N\), \({\varvec{\eta }}^*_i\), \(i=1,\ldots ,n\), are independent and identically distributed with mean \(E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)=O_p(N^{-1/2})\) and covariance matrix \(\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\). Thus, conditional on \({\mathbb {D}}_N\), as \(n,N \rightarrow +\infty \), with probability approaching one,
$$\begin{aligned} \{\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N) \}^{-1/2}\{ \mathbf{Z }_{n}^*- \sqrt{n} E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\}&{\mathop {\longrightarrow }\limits ^{d}}&N(\mathbf{0 },\mathbf{I }). \end{aligned}$$
Since \(\sqrt{n} E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)=O_p(\sqrt{n}/\sqrt{N})=o_p(1)\), it follows that
$$\begin{aligned} \{\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N) \}^{-1/2} \mathbf{Z }_{n}^*&{\mathop {\longrightarrow }\limits ^{d}}&N(\mathbf{0 },\mathbf{I }). \end{aligned}$$
(17)
Next, we prove that
$$\begin{aligned} A^*_{2n}(\mathbf{u })= & {} \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_{p}(1). \end{aligned}$$
Let \(A_{2n,i}(\mathbf{u })= \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik} /\sqrt{n}} \{I(\varepsilon _i\le b_{0k}+s)-I(\varepsilon _i\le b_{0k})\}ds\) denote the full-data analogue of \(A^*_{2n,i}(\mathbf{u })\). Since, given \({\mathbb {D}}_N\), the subsampled indices are independent and each equals \(i\) with probability \(\pi _i\), we have \(E\{A^*_{2n}(\mathbf{u })|{\mathbb {D}}_N\}=\frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u })\), which can be decomposed as
$$\begin{aligned}&E\{A^*_{2n}(\mathbf{u })|{\mathbb {D}}_N\} \nonumber \\&\qquad = \frac{n}{N}\sum _{i=1}^N E\{A_{2n,i}(\mathbf{u })\}+\frac{n}{N}\sum _{i=1}^N [ A_{2n,i}(\mathbf{u })-E\{A_{2n,i}(\mathbf{u })\}]. \end{aligned}$$
(18)
By Assumption 1,
$$\begin{aligned}&\frac{n}{N}\sum _{i=1}^N E(A_{2n,i}(\mathbf{u })) \nonumber \\&\quad = \frac{n}{N}\sum _{i=1}^N \sum _{k=1}^K \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}/\sqrt{n}} \{F(b_{0k}+s)-F(b_{0k})\}ds \nonumber \\&\quad =\frac{\sqrt{n}}{N}\sum _{i=1}^N \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}}\{F(b_{0k}+t/\sqrt{n})-F(b_{0k})\}dt \nonumber \\&\quad =\frac{1}{2}\mathbf{u }^\textsf {T}\left( \frac{1}{N}\sum _{i=1}^N \sum _{k=1}^K f(b_{0k})\tilde{\mathbf{x }}_{ik}\tilde{\mathbf{x }}_{ik}^\textsf {T}\right) \mathbf{u }+o(1) \nonumber \\&\quad = \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o(1). \end{aligned}$$
(19)
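The third equality in (19) rests on the first-order expansion \(\sqrt{n}\{F(b_{0k}+t/\sqrt{n})-F(b_{0k})\}=f(b_{0k})t+o(1)\), which Assumption 1 provides: for each \(i\) and \(k\),
$$\begin{aligned} \sqrt{n}\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}}\{F(b_{0k}+t/\sqrt{n})-F(b_{0k})\}dt= & {} \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}} f(b_{0k})t\,dt+o(1)\\= & {} \frac{1}{2} f(b_{0k})({\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik})^2+o(1), \end{aligned}$$
so that averaging over \(i\) and summing over \(k\) gives the quadratic form \(\frac{1}{2}\mathbf{u }^\textsf {T}\bigl (\frac{1}{N}\sum _{i=1}^N \sum _{k=1}^K f(b_{0k})\tilde{\mathbf{x }}_{ik}\tilde{\mathbf{x }}_{ik}^\textsf {T}\bigr )\mathbf{u }+o(1)\).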
The second term of (18) has mean 0 and its variance satisfies
$$\begin{aligned} \text{ var }\biggr (\frac{n}{N}\sum _{i=1}^N [ A_{2n,i}(\mathbf{u })-E\{A_{2n,i}(\mathbf{u })\}]\biggr ) \le \frac{n^2}{N^2} \sum _{i=1}^N E\{A_{2n,i}^2(\mathbf{u })\}. \end{aligned}$$
(20)
From the fact that \(A_{2n,i}(\mathbf{u })\) is nonnegative, we obtain
$$\begin{aligned} A_{2n,i}(\mathbf{u })\le & {} \left| \sum _{k=1}^K \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}/\sqrt{n}} \{I(\varepsilon _i\le b_{0k}+s)-I(\varepsilon _i\le b_{0k})\}ds \right| \nonumber \\\le & {} \sum _{k=1}^K \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}/\sqrt{n}} \left| \{I(\varepsilon _i\le b_{0k}+s)-I(\varepsilon _i\le b_{0k})\} \right| ds \nonumber \\\le & {} \frac{1}{\sqrt{n}}\sum _{k=1}^K |\mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}|. \end{aligned}$$
(21)
By Assumption 1(b), \(\max _{1\le i \le N}\Vert \mathbf{x }_i\Vert =o(\sqrt{N})\). Combining this fact, (20), and (21), we have
$$\begin{aligned}&\text{ var }\biggr (\frac{n}{N}\sum _{i=1}^N [ A_{2n,i}(\mathbf{u })-E\{A_{2n,i}(\mathbf{u })\}]\biggr ) \nonumber \\&\quad \le \left\{ K\frac{\Vert \mathbf{u }\Vert }{\sqrt{N}}(1+\max _{1\le i \le N}\Vert \mathbf{x }_i\Vert ) \right\} \frac{\sqrt{n}}{\sqrt{N}} \frac{n}{N} \sum _{i=1}^N E\{ A_{2n,i}(\mathbf{u })\}= o(1). \end{aligned}$$
(22)
From (18), (19), (22), and Chebyshev’s inequality, it follows that
$$\begin{aligned} E\left\{ A^*_{2n}(\mathbf{u })| {\mathbb {D}}_N \right\} =\frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u })=\frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1). \end{aligned}$$
(23)
Next, we examine the conditional variance of \(A^*_{2n}(\mathbf{u })\), i.e., \(\text{ var }\left\{ A^*_{2n}(\mathbf{u })| {\mathbb {D}}_N \right\} \). Since, conditional on \({\mathbb {D}}_N\), \(A^*_{2n,i}(\mathbf{u })\), \(i=1,\ldots , n\), are independent and identically distributed, we have
$$\begin{aligned} \text{ var }\left\{ A^*_{2n}(\mathbf{u }) |{\mathbb {D}}_N\right\}= & {} \frac{1}{N^2} \sum _{i=1}^n \text{ var }\biggr \{ \frac{A^*_{2n,i}(\mathbf{u })}{\pi ^*_i} \biggr |{\mathbb {D}}_N\biggr \} \nonumber \\\le & {} \frac{n}{N^2}E\biggr [\biggr \{\frac{A^*_{2n,i}(\mathbf{u })}{\pi ^*_i}\biggr \}^2 \biggr |{\mathbb {D}}_N\biggr ]. \end{aligned}$$
(24)
By (21), the right-hand side of (24) satisfies
$$\begin{aligned} \frac{n}{N^2}\sum _{i=1}^N \frac{A^2_{2n,i}(\mathbf{u })}{\pi _i}\le & {} \frac{n}{N^2}\sum _{i=1}^N \frac{A_{2n,i}(\mathbf{u })}{\pi _i}\biggr (\frac{1}{\sqrt{n}}\sum _{k=1}^K |\mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}| \biggr ) \nonumber \\\le & {} \frac{1}{\sqrt{n}N} \biggr (K\Vert \mathbf{u }\Vert \max _{1\le i \le N}\frac{\Vert \mathbf{x }_i\Vert +1}{\pi _i}\biggr )\frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u }). \end{aligned}$$
(25)
From (23), (25), and Assumption 2(a), we have
$$\begin{aligned} \text{ var }\biggr \{ A^*_{2n}(\mathbf{u }) |{\mathbb {D}}_N\biggr \}=o_p(1). \end{aligned}$$
(26)
From (23), (26), and Chebyshev’s inequality,
$$\begin{aligned} A^*_{2n}(\mathbf{u })= & {} \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_{p|{\mathbb {D}}_N}(1). \end{aligned}$$
(27)
Here \(a=o_{p|{\mathbb {D}}_N}(1)\) means that \(a\) converges to 0 in conditional probability given \({\mathbb {D}}_N\), in probability; namely, for any \(\delta >0\), \(P(|a|>\delta |{\mathbb {D}}_N) {\mathop {\longrightarrow }\limits ^{p}}0\) as \(N\rightarrow +\infty \). Since \(0\le P(|a|>\delta |{\mathbb {D}}_N) \le 1\), this conditional probability converges to 0 in probability if and only if \(P(|a|>\delta )= E\{P(|a|>\delta |{\mathbb {D}}_N)\} \rightarrow 0\). Thus, \(a=o_{p|{\mathbb {D}}_N}(1)\) is equivalent to \(a=o_{p}(1)\), and we use the notation \(o_p\) only in what follows.
From (15) and (27), we have
$$\begin{aligned} A_n^*(\mathbf{u })= & {} \mathbf{u }^\textsf {T} \mathbf{Z }_{n}^*+ \frac{1}{2} \mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_{p}(1). \end{aligned}$$
Since \(A_n^*(\mathbf{u })\) is a convex function, the corollary on page 2 of Hjort and Pollard (2011) implies that its minimizer, \(\sqrt{n}(\tilde{{\varvec{\theta }}}_S- {\varvec{\theta }}_0)\), satisfies
$$\begin{aligned} \sqrt{n} (\tilde{{\varvec{\theta }}}_S- {\varvec{\theta }}_0) = - \mathbf{E }_N^{-1} \mathbf{Z }_{n}^*+o_p(1). \end{aligned}$$
Thus, we have
$$\begin{aligned} \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)= & {} -\{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \mathbf{E }^{-1}_N \mathbf{Z }_{n}^*+o_p(1). \end{aligned}$$
Note that \(\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N)=\mathbf{V }_\pi +o_p(1)\). Combining (17) and Slutsky’s theorem, we have that, for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\),
$$\begin{aligned} P[ \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\le \mathbf{a }|{\mathbb {D}}_N]{\mathop {\longrightarrow }\limits ^{p}}\varPhi _{p+K}(\mathbf{a }), \end{aligned}$$
(28)
where \(\varPhi _{p+K}(\mathbf{a })\) denotes the distribution function of the \((p+K)\)-dimensional standard multivariate normal distribution. Note that the conditional probability in (28) is a bounded random variable; thus, convergence in probability to a constant implies convergence in mean. Therefore, for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\),
$$\begin{aligned}&P[ \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\le \mathbf{a }]\\&\quad =E(P[ \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\le \mathbf{a }|{\mathbb {D}}_N])\\&\quad \rightarrow \varPhi _{p+K}(\mathbf{a }). \end{aligned}$$
This finishes the proof of Theorem 1.
Proof of Theorem 2
Note that
$$\begin{aligned} \text{ tr }(\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1})= & {} \frac{1}{N^2} \sum _{i=1}^N \text{ tr }\biggr ( \frac{1}{\pi _i}\mathbf{E }_N^{-1} \biggr [\sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\}\tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2}\mathbf{E }_N^{-1}\biggr ) \nonumber \\= & {} \frac{1}{N^2}\sum _{i=1}^N \biggr [\frac{1}{\pi _i}\biggr \Vert \sum _{k=1}^K [I(\varepsilon _i<b_{0k})-\tau _k]\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2\biggr ] \nonumber \\= & {} \frac{1}{N^2}\biggr (\sum _{i=1}^N \pi _i \biggr ) \biggr (\sum _{i=1}^N \frac{1}{\pi _i}\biggr \Vert \sum _{k=1}^K [I(\varepsilon _i<b_{0k})-\tau _k]\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2 \biggr ) \nonumber \\\ge & {} \frac{1}{N^2} \biggr [ \sum _{i=1}^N \biggr \Vert \sum _{k=1}^K\{I(\varepsilon _i<b_{0k})-\tau _k\}\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\biggr \Vert \biggr ]^2 \end{aligned}$$
where the last equality uses \(\sum _{i=1}^N \pi _i=1\) and the final step follows from the Cauchy-Schwarz inequality, in which equality holds if and only if \(\pi _i \propto \Vert \sum _{k=1}^K [I(\varepsilon _i<b_{0k})-\tau _k]\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\Vert \). This proves Theorem 2.
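As an illustration of this characterization (a minimal sketch with hypothetical names, not code from the paper), the minimizing probabilities can be computed from the full data as follows.

import numpy as np

def a_optimal_probs(X, eps, b0, taus, E_N_inv):
    # pi_i proportional to || E_N^{-1} sum_k {I(eps_i < b_0k) - tau_k} x_tilde_ik ||
    # X: (N, p) covariates, eps: (N,) residuals y_i - beta_0' x_i,
    # b0: (K,) intercepts, taus: (K,) quantile levels, E_N_inv: (p + K, p + K).
    N, p = X.shape
    K = len(taus)
    scores = np.zeros((N, p + K))
    for k, (tau, b) in enumerate(zip(taus, b0)):
        w = (eps < b).astype(float) - tau            # I(eps_i < b_0k) - tau_k
        scores[:, :p] += w[:, None] * X              # slope block of x_tilde_ik
        scores[:, p + k] = w                         # intercept block e_k
    norms = np.linalg.norm(scores @ E_N_inv.T, axis=1)
    return norms / norms.sum()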
Proof of Theorem 3
Note that
$$\begin{aligned} \text{ tr }(\mathbf{V }_\pi )= & {} \text{ tr }\biggr (\sum _{i=1}^N \frac{1}{N^2\pi _i} \biggr [\sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik}\biggr ]^{\otimes 2}\biggr ) \nonumber \\= & {} \frac{1}{N^2} \sum _{i=1}^N \text{ tr }\biggr ( \frac{1}{\pi _i}\biggr [\sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik}\biggr ]^{\otimes 2}\biggr ) \nonumber \\= & {} \frac{1}{N^2}\sum _{i=1}^N \biggr [ \frac{1}{\pi _i} \biggr \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \biggr \Vert ^2\biggr ] \nonumber \\= & {} \frac{1}{N^2}\biggr (\sum _{i=1}^N \pi _i \biggr ) \biggr [\sum _{i=1}^N \frac{1}{\pi _i}\biggr \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \biggr \Vert ^2 \biggr ] \nonumber \\\ge & {} \frac{1}{N^2} \biggr [\sum _{i=1}^N \biggr \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \biggr \Vert \biggr ]^2 \end{aligned}$$
where the last equality uses \(\sum _{i=1}^N \pi _i=1\) and the final step follows from the Cauchy-Schwarz inequality, in which equality holds if and only if \(\pi _i \propto \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \Vert \). This proves Theorem 3.
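In terms of the hypothetical a_optimal_probs sketch given after Theorem 2, these probabilities are obtained by simply replacing \(\mathbf{E }_N^{-1}\) with the identity matrix, for example:

# L-optimal probabilities: same score vectors, without the E_N^{-1} rotation
pi_Lopt = a_optimal_probs(X, eps, b0, taus, np.eye(X.shape[1] + len(taus)))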
Proof of Theorem 4
Recall that \(\varepsilon _i^*=y_i^*-{\varvec{\beta }}^\textsf {T}_0\mathbf{x }_i^*\), \(i= 1,\ldots ,n\). Denote
$$\begin{aligned} {\tilde{A}}_n^*(\mathbf{u })= & {} \sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}} A^*_{ik}(\mathbf{u }), \end{aligned}$$
where \(A^*_{ik}(\mathbf{u }) = \rho _{\tau _k} ( \varepsilon _i^*-b_{0k}- \mathbf{u }^\textsf {T}\tilde{\mathbf{x }}_{ik}^* /\sqrt{n})-\rho _{\tau _k} (\varepsilon _i^*-b_{0k})\).
As a function of \(\mathbf{u }\), \({\tilde{A}}_n^*(\mathbf{u })\) is convex and its minimizer is \( \sqrt{n}(\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\). Thus we can focus on \({\tilde{A}}_n^*(\mathbf{u })\) when assessing the properties of \( \sqrt{n}(\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\). We can write
$$\begin{aligned} {\tilde{A}}_n^*(\mathbf{u })= & {} -\mathbf{u }^\textsf {T}\frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}} \{\tau _k- I(\varepsilon _i^*-b_{0k} <0 )\}\tilde{\mathbf{x }}_{ik}^* \nonumber \\&+\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}}\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^*/\sqrt{n}} \{I(\varepsilon _i^*-b_{0k}\le s)-I(\varepsilon _i^*- b_{0k}\le 0 )\}ds\nonumber \\= & {} \mathbf{u }^\textsf {T} \tilde{\mathbf{Z }}_{n}^*+ {\tilde{A}}^*_{2n}(\mathbf{u }), \end{aligned}$$
(29)
where
$$\begin{aligned}&\tilde{\mathbf{Z }}_{n}^*=- \frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}} \{\tau _k- I(\varepsilon _i^*<b_{0k} )\}\tilde{\mathbf{x }}_{ik}^*,\\&{\tilde{A}}^*_{2n}(\mathbf{u })=\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}}{\tilde{A}}^*_{2n,i}(\mathbf{u }),\\&{\tilde{A}}^*_{2n,i}(\mathbf{u })= \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^*/\sqrt{n}} \{I(\varepsilon _i^*\le b_{0k}+s)-I(\varepsilon _i^*\le b_{0k})\}ds. \end{aligned}$$
Conditioning on \(({\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U)\), we first prove the asymptotic normality of \(\tilde{\mathbf{Z }}_{n}^* \). Denote
$$\begin{aligned} \tilde{{\varvec{\eta }}}_i^*= & {} -\frac{1}{N\tilde{\pi }_{i}^{*Lopt}} \sum _{k=1}^K \{\tau _k- I(\varepsilon _i^*<b_{0k} )\}\tilde{\mathbf{x }}_{ik}^*, \end{aligned}$$
then we can write \(\tilde{\mathbf{Z }}_{n}^*= \frac{1}{\sqrt{n}} \sum _{i=1}^n \tilde{{\varvec{\eta }}}_i^*\). Direct calculation yields
$$\begin{aligned} E(\tilde{{\varvec{\eta }}}_i^*|{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U)= & {} -\frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik} =O_p(N^{-1/2}),\nonumber \\ \text{ cov }(\tilde{{\varvec{\eta }}}_i^*|{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U)= & {} \sum _{i=1}^N \frac{1}{N^2\tilde{\pi }_{i}^{Lopt}} \biggr [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2}+o_p(1).\nonumber \\ \end{aligned}$$
(30)
Let \(\tilde{\varepsilon }_i= y_i-\tilde{{\varvec{\beta }}}^\textsf {T}_U\mathbf{x }_i\), \(i=1,\ldots ,N\), then we can write
$$\begin{aligned} \tilde{\pi }_{i}^{Lopt}= & {} \frac{\Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{ik}\Vert }{\sum _{j=1}^N\Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_j<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{jk}\Vert },\ \ i=1,\ldots ,N. \end{aligned}$$
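A minimal sketch of how these weights could be computed in practice (hypothetical names; it reuses the a_optimal_probs sketch from the proof of Theorem 2 and assumes the pilot estimates \(\tilde{{\varvec{\beta }}}_U\) and \({\tilde{b}}_{U,k}\) are available as beta_pilot and b_pilot):

eps_pilot = y - X @ beta_pilot          # pilot residuals tilde_eps_i = y_i - beta_tilde_U' x_i
pi_Lopt_pilot = a_optimal_probs(X, eps_pilot, b_pilot, taus,
                                np.eye(X.shape[1] + len(taus)))
# the norm is unchanged by the sign convention {I(.) - tau_k} versus {tau_k - I(.)}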
Thus we have
$$\begin{aligned}&\sum _{i=1}^N \frac{1}{N^2\tilde{\pi }_{i}^{Lopt}} \biggr [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2} \nonumber \\&\quad =\frac{1}{N}\sum _{i=1}^N \frac{ [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} ]^{\otimes 2} }{ \Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{ik}\Vert } \times \frac{1}{N} \sum _{i=1}^N\biggr \Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{ik}\biggr \Vert \nonumber \\&\quad = \tilde{\varDelta }_1 \times \tilde{\varDelta }_2. \end{aligned}$$
(31)
Let
$$\begin{aligned} \varDelta _1= & {} \frac{1}{N}\sum _{i=1}^N \frac{ [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} ]^{\otimes 2} }{ \Vert \sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik}\Vert }, \\ \varDelta _2= & {} \frac{1}{N} \sum _{i=1}^N\biggr \Vert \sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik}\biggr \Vert . \end{aligned}$$
Next, we show that \(\tilde{\varDelta }_1=\varDelta _1+o_p(1)\) and \(\tilde{\varDelta }_2=\varDelta _2+o_p(1)\). Note that the \((j_1,j_2)\)th element of \(\tilde{\varDelta }_1-\varDelta _1\) \((j_1, j_2=1,\ldots ,p+K)\) is bounded by
$$\begin{aligned}&|\tilde{\varDelta }_1-\varDelta _1|_{(j_1,j_2)}\le \frac{1}{N} \sum _{i=1}^N \biggr [\sum _{k=1}^K |\tau _k-I(\varepsilon _i<b_{0k})| \Vert \tilde{\mathbf{x }}_{ik}\Vert \biggr ]^{2} \\&\qquad \times \left| \frac{1}{\Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\tilde{\mathbf{x }}_{ik}\Vert }-\frac{1}{\Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})]\tilde{\mathbf{x }}_{ik}\Vert } \right| \\&\quad \le \frac{1}{N} \sum _{i=1}^N K^{2}(\Vert \mathbf{x }_{i}\Vert +1 )^2 \\&\qquad \times \frac{\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)}{\Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\tilde{\mathbf{x }}_{ik}\Vert \Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})]\tilde{\mathbf{x }}_{ik}\Vert }. \end{aligned}$$
Observing that
$$\begin{aligned} \biggr |K^{-1}\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\biggr |^2\le K^{-1}\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}^2, \end{aligned}$$
we have
$$\begin{aligned}&\biggr \Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2\\&\quad =\biggr \Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\mathbf{x }_{i}\biggr \Vert ^2+\biggr \Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\mathbf{e }_{k}\biggr \Vert ^2\\&\quad =\biggr |\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\biggr |^2\Vert \mathbf{x }_{i}\Vert ^2+\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}^2\\&\quad \ge \biggr |\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\biggr |^2(\Vert \mathbf{x }_{i}\Vert ^2+K^{-1}). \end{aligned}$$
Similarly,
$$\begin{aligned} \biggr \Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})]\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2\ge \biggr |\sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\biggr |^2(\Vert \mathbf{x }_{i}\Vert ^2+K^{-1}).\nonumber \\ \end{aligned}$$
(32)
Using the inequality
$$\begin{aligned} \frac{(\Vert {\mathbf{x }}_{i}\Vert +1)^2}{\Vert {\mathbf{x }}_i\Vert ^2+K^{-1}}\le 2\frac{\Vert {\mathbf{x }}_i\Vert ^2+1}{\Vert {\mathbf{x }}_i\Vert ^2+K^{-1}}\le 2K, \end{aligned}$$
(33)
we have
$$\begin{aligned}&|\tilde{\varDelta }_1-\varDelta _1|_{(j_1,j_2)}\nonumber \\&\quad \le \frac{2 K^{3}}{N} \sum _{i=1}^N\frac{\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)}{|\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}||\sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}|}\nonumber \\&\quad \le \frac{2 K^{3}}{N} \sum _{i=1}^N\frac{\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)}{(\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|)^2}, \end{aligned}$$
(34)
where \(\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|>0\). Note that for each i, \(| I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k}) - I(\varepsilon _i<b_{0k})|\) is bounded and converges in probability to 0, as \(n_0\rightarrow \infty \). Thus, for \(k=1,\ldots ,K\), \(E\{| I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k}) - I(\varepsilon _i<b_{0k})|\}\rightarrow 0\). For any \(\epsilon >0\),
$$\begin{aligned}&P\biggr \{\frac{1}{N} \sum _{i=1}^N\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)>\epsilon \biggr \}\\&\qquad \le \frac{1}{\epsilon N} \sum _{i=1}^N\sum _{k=1}^KE\{|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|\}(\Vert \mathbf{x }_{i}\Vert +1)\rightarrow 0, \end{aligned}$$
which implies that the term in (34) converges in probability to 0. Thus, \(|\tilde{\varDelta }_1-\varDelta _1|_{(j_1,j_2)}{\mathop {\longrightarrow }\limits ^{p}}0\). Similarly, it is easy to verify that \(|\tilde{\varDelta }_2-\varDelta _2|{\mathop {\longrightarrow }\limits ^{p}}0\). These facts, together with (30) and (31), show that
$$\begin{aligned} \text{ cov }(\tilde{{\varvec{\eta }}}_i^*|{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U )= & {} \sum _{i=1}^N \frac{1}{N^2 \pi _{i}^{Lopt}} \biggr [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2}+ o_p(1)\\= & {} \mathbf{V }_{Lopt}+o_p(1). \end{aligned}$$
We now check Lindeberg’s conditions (Theorem 2.27 of van der Vaart 1998) given \({\mathbb {D}}_N\) and \(\tilde{{\varvec{\theta }}}_U\). For \(\epsilon >0\),
$$\begin{aligned}&\sum _{i=1}^n E\{\Vert n^{-1/2}\tilde{{\varvec{\eta }}}_i^* \Vert ^2 I(\Vert \tilde{{\varvec{\eta }}}_i^*\Vert>\sqrt{n}\epsilon ) |{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U\} \nonumber \\&\quad = \sum _{i=1}^n E\biggr \{\biggr \Vert \frac{1}{\sqrt{n} N \tilde{\pi }_{i}^{*Lopt}} \sum _{k=1}^K \{\tau _k-I(\varepsilon _i^*<b_{0k})\} \tilde{\mathbf{x }}_{ik}^* \biggr \Vert ^2 \nonumber \\&\qquad \times I\biggr (\biggr \Vert \frac{1}{\sqrt{n} N \tilde{\pi }_{i}^{*Lopt} \epsilon } \sum _{k=1}^K \{\tau _k-I(\varepsilon _i^*<b_{0k})\} \tilde{\mathbf{x }}_{ik}^*\biggr \Vert>1 \biggr ) \biggr | {\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U \biggr \} \nonumber \\&\quad \le I\left( \max _{1\le i\le N} \frac{ \Vert \mathbf{x }_{i}\Vert +1 }{\tilde{\pi }_{i}^{Lopt}} >\frac{\sqrt{n} N \epsilon }{K} \right) \left( K^2 \sum _{i=1}^N\frac{(\Vert \mathbf{x }_{i}\Vert +1)^2}{ N^2\tilde{\pi }_{i}^{Lopt}}\right) . \end{aligned}$$
(35)
Using the inequalities (32) and (33), it is easy to verify that
$$\begin{aligned} \frac{\Vert \mathbf{x }_i\Vert +1}{\tilde{\pi }_{i}^{Lopt}}= & {} \frac{\Vert \mathbf{x }_i\Vert +1 }{\Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})] \tilde{\mathbf{x }}_{ik} \Vert } \sum _{j=1}^N\left\| \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_j<{\tilde{b}}_{U,k})] \tilde{\mathbf{x }}_{jk}\right\| \nonumber \\\le & {} \frac{\Vert \mathbf{x }_i\Vert +1}{|\sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}|\sqrt{\Vert \mathbf{x }_{i}\Vert ^2+K^{-1}}} K \sum _{j=1}^N (1+\Vert \mathbf{x }_j\Vert )\nonumber \\\le & {} \frac{\sqrt{2K}}{\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|} K \sum _{j=1}^N (1+\Vert \mathbf{x }_j\Vert )=O(N). \end{aligned}$$
(36)
Thus, \(\max _{1\le i \le N }\frac{\Vert {\mathbf{x }}_i\Vert +1}{\tilde{\pi }_{i}^{Lopt}}=o_p(\sqrt{n} N)\) and the right-hand side of (35) is \(o_p(1)\), which shows that
$$\begin{aligned} \sum _{i=1}^n E\{\Vert n^{-1/2}\tilde{{\varvec{\eta }}}_i^* \Vert ^2 I(\Vert \tilde{{\varvec{\eta }}}_i^*\Vert >\sqrt{n}\epsilon ) |{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U \}=o_p(1). \end{aligned}$$
Given \({\mathbb {D}}_N\) and \(\tilde{{\varvec{\theta }}}_U\), \(\tilde{{\varvec{\eta }}}_i^*\), \(i=1,\ldots ,n\), are independent and identically distributed with mean \(O_p(N^{-1/2})\) and covariance \(\mathbf{V }_{Lopt}+o_p(1)\). Note that if \((NK)^{-1}\sum _{i=1}^N(1+\Vert \mathbf{x }_i\Vert )^{-1}[\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik}]^{\otimes 2}\) is asymptotically positive definite, then \(\mathbf{V }_{Lopt}\) is asymptotically positive definite. Thus, conditional on \({\mathbb {D}}_N\) and \(\tilde{{\varvec{\theta }}}_U\), with probability approaching one, if \(n_0\rightarrow \infty \), \(n\rightarrow \infty \), \(N\rightarrow \infty \), \(n/N =o(1)\), and \(\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|>0\), then
$$\begin{aligned} \mathbf{V }_{Lopt} ^{-1/2} \tilde{\mathbf{Z }}_{n}^*&{\mathop {\longrightarrow }\limits ^{d}}&N(\mathbf{0 },\mathbf{I }). \end{aligned}$$
(37)
For \({\tilde{A}}^*_{2n}(\mathbf{u })\) in (29), we have
$$\begin{aligned} E\{{\tilde{A}}^*_{2n}(\mathbf{u })|{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U\}= & {} \frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u })=\frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1), \end{aligned}$$
(38)
where the last equality is from (23).
We now examine its conditional variance. By (21) and (36), we have
$$\begin{aligned}&\text{ var }\{{\tilde{A}}^*_{2n}(\mathbf{u })|{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U\}\le \frac{n}{N^2}E\biggr \{ \frac{\{{\tilde{A}}^*_{2n,i}(\mathbf{u })\}^2}{(\tilde{\pi }_{i}^{*Lopt})^2}\biggr |{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U\biggr \}\nonumber \\&\qquad =\frac{n}{N^2}\sum _{i=1}^N \frac{A_{2n,i}^2(\mathbf{u })}{\tilde{\pi }_{i}^{Lopt}}\le \frac{n}{N^2}\sum _{i=1}^N\frac{1}{\sqrt{n}}\sum _{k=1}^K |\mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}|\frac{A_{2n,i}(\mathbf{u })}{\tilde{\pi }_{i}^{Lopt}}\nonumber \\&\qquad \le K \Vert \mathbf{u }\Vert \frac{\sqrt{n}}{N^2}\sum _{i=1}^N\frac{\Vert \mathbf{x }_{i}\Vert +1}{\tilde{\pi }_{i}^{Lopt}}A_{2n,i}(\mathbf{u })\nonumber \\&\qquad \le K \Vert \mathbf{u }\Vert \max _{1\le i \le N }\frac{\Vert \mathbf{x }_i\Vert +1}{\tilde{\pi }_{i}^{Lopt}}\frac{1}{\sqrt{n}N}\frac{n}{N}\sum _{i=1}^NA_{2n,i}(\mathbf{u })=O_p(n^{-1/2}). \end{aligned}$$
(39)
From (38), (39), and Chebyshev’s inequality,
$$\begin{aligned} {\tilde{A}}^*_{2n}(\mathbf{u })=\frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1). \end{aligned}$$
(40)
From (29) and (40),
$$\begin{aligned} {\tilde{A}}_n^*(\mathbf{u })= & {} \mathbf{u }^\textsf {T} \tilde{\mathbf{Z }}_{n}^*+ \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1). \end{aligned}$$
Since \({\tilde{A}}_n^*(\mathbf{u })\) is a convex function, the corollary on page 2 of Hjort and Pollard (2011) implies that its minimizer, \(\sqrt{n}(\tilde{{\varvec{\theta }}}_{Lopt}- {\varvec{\theta }}_0)\), satisfies
$$\begin{aligned} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}- {\varvec{\theta }}_0) = & {} - \mathbf{E }^{-1}_N \tilde{\mathbf{Z }}_{n}^*+o_p(1). \end{aligned}$$
Thus, we have
$$\begin{aligned} \{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)= & {} -\{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \mathbf{E }^{-1}_N \tilde{\mathbf{Z }}_{n}^*+o_p(1). \end{aligned}$$
This asymptotic expression, together with (37) and Slutsky’s theorem, shows that, for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\),
$$\begin{aligned} P[\{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\le \mathbf{a }|{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U]{\mathop {\longrightarrow }\limits ^{p}}\varPhi _{p+K}(\mathbf{a }). \end{aligned}$$
Here, \(\varPhi _{p+K}(\mathbf{a })\) denotes the distribution function of the \((p+K)\)-dimensional standard multivariate normal distribution. Since the conditional probability is a bounded random variable, convergence in probability to a constant implies convergence in mean. Therefore, \(P[\{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\le \mathbf{a }]\rightarrow \varPhi _{p+K}(\mathbf{a })\) for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\), and this finishes the proof of Theorem 4.