Semiparametric M-estimation with non-smooth criterion functions

Annals of the Institute of Statistical Mathematics

Abstract

We are interested in the estimation of a parameter \(\theta \) that maximizes a certain criterion function depending on an unknown, possibly infinite-dimensional nuisance parameter h. A common estimation procedure consists in maximizing the corresponding empirical criterion, in which the nuisance parameter is replaced by a nonparametric estimator. In the literature, this research topic, commonly referred to as semiparametric M-estimation, has received a lot of attention in the case where the criterion function satisfies certain smoothness properties. In certain applications, these smoothness conditions are, however, not satisfied. The aim of this paper is therefore to extend the existing theory on semiparametric M-estimators, in order to cover non-smooth M-estimators as well. In particular, we develop ‘high-level’ conditions under which the proposed M-estimator is consistent and has an asymptotic limit. We also check these conditions for a specific example of a semiparametric M-estimator coming from the area of classification with missing data.
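The two-step procedure described in the abstract (estimate the nuisance parameter nonparametrically, then maximize the empirical criterion with the estimate plugged in) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's classification example: the nuisance function is a regression function estimated by a Nadaraya-Watson smoother, the criterion is \(M_n(\theta ,h)=-n^{-1}\sum _i|h(X_i)-\theta |\) (non-differentiable in \(\theta \)), and the function names are hypothetical.

```python
import math

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys around x0 (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def empirical_criterion(theta, h_values):
    """Toy M_n(theta, h) = -(1/n) * sum_i |h(X_i) - theta|."""
    return -sum(abs(h - theta) for h in h_values) / len(h_values)

def plug_in_m_estimator(xs, ys, bandwidth, theta_grid):
    """Step 1: estimate the nuisance h. Step 2: maximize M_n(theta, h_hat) on a grid."""
    h_hat = [nadaraya_watson(x, xs, ys, bandwidth) for x in xs]
    return max(theta_grid, key=lambda t: empirical_criterion(t, h_hat))

# Noise-free toy data (an assumption): h0(x) = 2x on an equispaced design,
# so the maximizer of the population criterion is the median of h0(X), i.e. 1.
n = 51
xs = [i / (n - 1) for i in range(n)]
ys = [2.0 * x for x in xs]
theta_grid = [k / 100 for k in range(0, 201)]
theta_hat = plug_in_m_estimator(xs, ys, bandwidth=0.1, theta_grid=theta_grid)
print(theta_hat)
```

A grid search is used here because the criterion is not differentiable in \(\theta \); any derivative-free maximizer would serve the same purpose in this sketch.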


References

  • Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In J. J. Heckman & E. E. Leamer (Eds.), Handbook of econometrics, 6B, Chapter 76. North Holland: Elsevier.

  • Chen, X., Fan, Y. (2006). Estimation of copula-based semiparametric time series models. Journal of Econometrics, 130, 307–335.

  • Chen, X., Liao, Z. (2014). Sieve M-inference on irregular parameters. Journal of Econometrics, 182, 70–86.

  • Chen, X., Pouzo, D. (2009). Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals. Journal of Econometrics, 152, 46–60.

  • Chen, X., Linton, O., Van Keilegom, I. (2003). Estimation of semiparametric models when the criterion function is not smooth. Econometrica, 71, 1591–1608.

  • Cheng, G., Shang, Z. (2015). Joint asymptotics for semi-nonparametric regression models under partially linear structure. Annals of Statistics, 43, 1351–1390.

  • De Backer, M., El Ghouch, A., Van Keilegom, I. (2018). An adapted loss function for censored quantile regression. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2018.1469996.

  • Ding, Y., Nan, B. (2011). A sieve \(M\)-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data. Annals of Statistics, 39, 3032–3061.

  • Escanciano, J., Jacho-Chavez, D., Lewbel, A. (2014). Uniform convergence of weighted sums of non- and semi-parametric residuals for estimation and testing. Journal of Econometrics, 178, 426–443.

  • Escanciano, J., Jacho-Chavez, D., Lewbel, A. (2016). Identification and estimation of semiparametric two step models. Quantitative Economics, 7, 561–589.

  • Goldenshluger, A., Zeevi, A. (2004). The Hough transform estimator. Annals of Statistics, 32, 1908–1932.

  • Groeneboom, P., Wellner, J. A. (1992). Information bounds and nonparametric maximum likelihood estimation. Basel: Birkhäuser.

  • Groeneboom, P., Jongbloed, G., Wellner, J. A. (2001). Estimation of a convex function: Characterizations and asymptotic theory. Annals of Statistics, 29, 1653–1698.

  • Horowitz, J. (2009). Semiparametric and nonparametric methods in econometrics. New York: Springer.

  • Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single index models. Journal of Econometrics, 58, 71–120.

  • Ichimura, H., Lee, S. (2010). Characterization of the asymptotic distribution of semiparametric \(M\)-estimators. Journal of Econometrics, 159, 252–266.

  • Kim, J., Pollard, D. (1990). Cube root asymptotics. Annals of Statistics, 18, 191–219.

  • Kosorok, M. R. (2008). Introduction to empirical processes and semiparametric inference. New York: Springer.

  • Koul, H. L., Müller, U. U., Schick, A. (2012). The transfer principle: A tool for complete case analysis. Annals of Statistics, 40, 3031–3049.

  • Kristensen, D., Salanié, B. (2017). Higher-order properties of approximate estimators. Journal of Econometrics, 198, 189–208.

  • Ma, S., Kosorok, M. R. (2005). Robust semiparametric M-estimation and the weighted bootstrap. Journal of Multivariate Analysis, 96, 190–217.

  • Mammen, E., Rothe, C., Schienle, M. (2016). Semiparametric estimation with generated covariates. Econometric Theory, 32, 1140–1177.

  • Mohammadi, L., Van de Geer, S. (2005). Asymptotics in empirical risk minimization. Journal of Machine Learning Research, 6, 2027–2047.

  • Müller, U. U. (2009). Estimating linear functionals in nonlinear regression with responses missing at random. Annals of Statistics, 37, 2245–2277.

  • Pérez-González, A., Vilar-Fernández, J. M., González-Manteiga, W. (2009). Asymptotic properties of local polynomial regression with missing data and correlated errors. Annals of the Institute of Statistical Mathematics, 61, 85–110.

  • Polonik, W., Yao, Q. (2000). Conditional minimum volume predictive regions for stochastic processes. Journal of the American Statistical Association, 95, 509–519.

  • Radchenko, P. (2008). Mixed-rates asymptotics. Annals of Statistics, 36, 287–309.

  • Van de Geer, S. A. (2000). Empirical processes in M-estimation. New York: Cambridge University Press.

  • Van der Vaart, A. W., Wellner, J. A. (1996). Weak convergence and empirical processes: With applications in statistics. New York: Springer.

  • Van der Vaart, A. W., Wellner, J. A. (2007). Empirical processes indexed by estimated functions. IMS Lecture Notes-Monograph Series, 55, 234–252.


Acknowledgements

The authors would like to thank Xiaohong Chen, Guang Cheng, Oliver Linton, Michael Kosorok, Bin Nan, Bodhi Sen and Jon Wellner for stimulating discussions and helpful comments that improved the quality of the paper.

Author information


Corresponding author

Correspondence to Ingrid Van Keilegom.

Additional information

Research supported by the European Research Council (2016–2021, Horizon 2020/ERC Grant Agreement No. 694409), and by IAP research network Grant No. P7/06 of the Belgian government (Belgian Science Policy).

Appendix: Proofs


In this Appendix, we give the proofs of the asymptotic results, namely we prove the consistency, the rate of convergence and the asymptotic distribution of our M-estimator \(\widehat{\theta }\).

Proof of Theorem 1

Our aim is to show that

$$\begin{aligned} M(\theta _0,h_0)-M(\widehat{\theta },h_0)=o_{P^*}(1). \end{aligned}$$
(11)

Indeed, the result we want to obtain is a direct consequence of (11) and assumption (A2). It is easy to show that assumptions (A3) and (A4) imply that

$$\begin{aligned} \frac{|M_n(\widehat{\theta },\widehat{h})-M_n(\theta _0,\widehat{h})-M(\widehat{\theta },\widehat{h})+M(\theta _0,\widehat{h})|}{1+|M_n(\widehat{\theta },\widehat{h})-M_n(\theta _0,\widehat{h})|+|M(\widehat{\theta },\widehat{h})-M(\theta _0,\widehat{h})|}=o_{P^*}(1), \end{aligned}$$
(12)

since \(\widehat{\theta }\) belongs by construction to \(\Theta \). Consider the following decomposition:

$$\begin{aligned}&M(\theta _0,h_0)-M(\widehat{\theta },h_0)\\&\quad = M(\widehat{\theta },\widehat{h})-M(\widehat{\theta },h_0)+M(\theta _0,h_0)-M(\theta _0,\widehat{h})+M(\theta _0,\widehat{h})-M(\widehat{\theta },\widehat{h})\\&\quad \le M_n(\theta _0,\widehat{h})-M_n(\widehat{\theta },\widehat{h})+2\sup _{\theta \in \Theta }|M(\theta ,h_0)-M(\theta ,\widehat{h})|\\&\qquad +\, |M_n(\widehat{\theta },\widehat{h})-M(\widehat{\theta },\widehat{h})-M_n(\theta _0,\widehat{h})+M(\theta _0,\widehat{h})|. \end{aligned}$$

This together with (12) leads to the following inequality:

$$\begin{aligned}&(M(\theta _0,h_0)-M(\widehat{\theta },h_0))(1+o_{P^*}(1))\\&\quad \le (M_n(\theta _0,\widehat{h})-M_n(\widehat{\theta },\widehat{h}))(1+o_{P^*}(1))+4\sup _{\theta \in \Theta }|M(\theta ,h_0)-M(\theta ,\widehat{h})|+o_{P^*}(1). \end{aligned}$$

Now, the quantity \((1+o_{P^*}(1))\) on the left-hand side in the above inequality is positive on a set \(A_n\) whose outer probability tends to one when n tends to infinity. On \(A_n\), a reformulation of the previous inequality gives:

$$\begin{aligned}&M(\theta _0,h_0)-M(\widehat{\theta },h_0) \le (M_n(\theta _0,\widehat{h})-M_n(\widehat{\theta },\widehat{h}))(1+o_{P^*}(1))\nonumber \\&\qquad +\,4\sup _{\theta \in \Theta }|M(\theta ,h_0)-M(\theta ,\widehat{h})|(1+o_{P^*}(1))+o_{P^*}(1). \end{aligned}$$
(13)

Assumptions (A3) and (A5) imply that

$$\begin{aligned} \sup _{\theta \in \Theta }|M(\theta ,h_0)-M(\theta ,\widehat{h})|=o_{P^*}(1), \end{aligned}$$
(14)

and assumption (A1) gives that

$$\begin{aligned} M_n(\theta _0,\widehat{h})-M_n(\widehat{\theta },\widehat{h})\le o_{P^*}(1). \end{aligned}$$
(15)

It now follows directly from (13)–(15) that

$$\begin{aligned}&0\le M(\theta _0,h_0)-M(\widehat{\theta },h_0)\le o_{P^*}(1). \end{aligned}$$

\(\square \)

Proof of Theorem 2

Let \(\xi _n\) be the \(O_{P^*}(r_n^{-2})\)-quantity involved in assumption (B4). We introduce the sets

$$\begin{aligned} S_{j,n}=\Big \{\theta \in \Theta : 2^{j-1}< r_nd(\theta ,\theta _0)\le 2^j\Big \}, \end{aligned}$$

and observe that \(\Theta \backslash \{\theta _0\}=\cup _{j=-\infty }^{+\infty }S_{j,n}\). Our aim is to prove that for any \(\epsilon >0\) there exists \(\tau _\epsilon >0\) such that

$$\begin{aligned} \mathbb P^*\big (r_nd(\widehat{\theta },\theta _0)>\tau _\epsilon \big )<\epsilon \end{aligned}$$
(16)

for n sufficiently large. From now on, we work with an arbitrary fixed positive value of \(\epsilon \). For any \(\delta ,\,\delta _1,\,M,\,K,\,K'>0\), we obtain the following bound using assumption (B4):

$$\begin{aligned}&\mathbb P^*\Big (r_nd(\widehat{\theta },\theta _0)>2^M\Big )\nonumber \\&\quad \le \sum _{j\ge M,\,2^j\le \delta r_n}\mathbb P^* \left( \sup _{\theta \in S_{j,n}}[ M_n(\theta ,\widehat{h})-M_n(\theta _0,\widehat{h})]\ge -Kr_n^{-2},\,A_n \right) \nonumber \\&\qquad +\,\mathbb P^*\Big (2d(\widehat{\theta },\theta _0)\ge \delta \Big )+\mathbb P^*\Big (r_n^2|\xi _n|>K\Big )+\mathbb P^*\Big ( r_n|W_n|>K'\Big )\nonumber \\&\qquad +\,\mathbb P^*\Big ( |\beta _n|>\frac{C}{2}\Big )+\mathbb P^*\left( d_\mathcal {H}(\widehat{h},h_0)>\frac{\delta _1}{v_n}\right) , \end{aligned}$$
(17)

where \(A_n= \{r_n|W_n| \le K',\,|\beta _n|\le \frac{C}{2},\,d_\mathcal {H}(\widehat{h},h_0)\le \frac{\delta _1}{v_n}\}\). Indeed, we can write

$$\begin{aligned}&\mathbb P^*\Big (r_nd(\widehat{\theta },\theta _0)>2^M, \, 2d(\widehat{\theta },\theta _0) < \delta ,\,r_n^2|\xi _n| \le K,\, A_n\Big ) \\&\quad \le \sum _{j \ge M, 2^j \le \delta r_n} \mathbb P^*\Big (\widehat{\theta }\in S_{j,n}, \,r_n^2|\xi _n| \le K, \, A_n \Big ) \\&\quad \le \sum _{j \ge M, 2^j \le \delta r_n} \mathbb P^*\left( \sup _{\theta \in S_{j,n}} [M_n(\theta ,\widehat{h})-M_n(\theta _0,\widehat{h})] \ge \xi _n, \,r_n^2|\xi _n| \le K, \, A_n \right) \\&\quad \le \sum _{j \ge M, 2^j \le \delta r_n} \mathbb P^*\left( \sup _{\theta \in S_{j,n}} [M_n(\theta ,\widehat{h})-M_n(\theta _0,\widehat{h})] \ge -Kr_n^{-2}, \, A_n \right) . \end{aligned}$$

Assumption (B1) implies that for all \(\delta >0\) there exists \(n_\epsilon \) such that

$$\begin{aligned} \mathbb P^*(2d(\widehat{\theta },\theta _0)\ge \delta )<\frac{\epsilon }{6} \end{aligned}$$
(18)

for n larger than \(n_\epsilon \). Then, by definition of \(\xi _n\) and \(W_n\) and because of (B1), there exist three positive constants \(\delta _1\), \(K_\epsilon \) and \(K'_\epsilon \) such that

$$\begin{aligned}&\mathbb P^* \Big (r_n^2|\xi _n|>K_\epsilon \Big )<\frac{\epsilon }{6}, \quad \mathbb P^* \Big (r_n|W_n|>K'_\epsilon \Big )<\frac{\epsilon }{6},\,\nonumber \\&\mathbb P^* \left( |\beta _n|>\frac{C}{2}\right)<\frac{\epsilon }{6},\quad \text { and } \quad \mathbb P^*\left( d_\mathcal {H}(\widehat{h},h_0)>\frac{\delta _1}{v_n}\right) <\frac{\epsilon }{6} \end{aligned}$$
(19)

for n larger than some \(n_1 \in \mathbb {N}\). We fix \(\delta <\delta _0\) and suppose that \(n\ge \max (n_0,n_1,n_\epsilon )\) to get that assumptions (B2) and (B3) are fulfilled on all \(S_{j,n}\) such that \(2^j\le \delta r_n\).

Now, it follows directly from assumption (B3) that for each fixed j such that \(2^j\le \delta r_n\) one has for all \(\theta \in S_{j,n}\):

$$\begin{aligned}&M_n(\theta ,\widehat{h})-M_n(\theta _0,\widehat{h})\\&\quad \le M(\theta ,\widehat{h})-M(\theta _0,\widehat{h})+\sup _{d(\theta ,\theta _0)\le \frac{2^j}{r_n}} |M_n(\theta ,\widehat{h})-M_n(\theta _0,\widehat{h})-M(\theta ,\widehat{h})+M(\theta _0,\widehat{h})| \\&\quad \le |W_n|\frac{2^j}{r_n}-(C-\beta _n)\frac{2^{2j-2}}{r_n^2}+\sup _{d(\theta ,\theta _0)\le \frac{2^j}{r_n}} |M_n(\theta ,\widehat{h})-M_n(\theta _0,\widehat{h})-M(\theta ,\widehat{h})+M(\theta _0,\widehat{h})|. \end{aligned}$$

Consequently, we obtain the following inequality:

$$\begin{aligned}&\mathbb P^* \Big (\sup _{\theta \in S_{j,n}}[ M_n(\theta ,\widehat{h})-M_n(\theta _0,\widehat{h})]\ge -K_\epsilon r_n^{-2},\,A_n \Big )\\&\quad \le \mathbb P^*\Bigg (\sup _{d(\theta ,\theta _0)\le \frac{2^j}{r_n},\,d_\mathcal {H}(h,h_0)\le \frac{\delta _1}{v_n}} |M_n(\theta ,h)-M_n(\theta _0,h)-M(\theta ,h)+M(\theta _0,h)| \\& \ge \frac{2^{2j-2}}{r_n^2}\left( \frac{C}{2}-K'_\epsilon 2^{2-j}-K_\epsilon 2^{2-2j}\right) \Bigg ). \end{aligned}$$

Now, there exists \(M_\epsilon \) such that for all \(j\ge M_\epsilon \) one gets

$$\begin{aligned} \frac{C}{2}-K'_\epsilon 2^{2-j}-K_\epsilon 2^{2-2j}\ge \frac{C}{4}. \end{aligned}$$
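One explicit choice of \(M_\epsilon \), given only as a sketch of how such an index can be found, splits the available slack \(C/4\) equally between the two negative terms:

$$\begin{aligned} K'_\epsilon 2^{2-j}\le \frac{C}{8} \iff j\ge \log _2\frac{32K'_\epsilon }{C}, \qquad K_\epsilon 2^{2-2j}\le \frac{C}{8} \iff j\ge \frac{1}{2}\log _2\frac{32K_\epsilon }{C}. \end{aligned}$$

Any integer \(M_\epsilon \) at least as large as both thresholds therefore gives \(\frac{C}{2}-\frac{C}{8}-\frac{C}{8}=\frac{C}{4}\) as a lower bound for all \(j\ge M_\epsilon \).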

Consequently, if \(M\ge M_\epsilon \), using assumption (B2) and Chebyshev’s inequality we have that

$$\begin{aligned}&\sum _{j\ge M,\,2^j\le \delta r_n}\mathbb P^*\Bigg (\Bigg \{\sup _{\theta \in S_{j,n}}[ M_n(\theta ,\widehat{h})-M_n(\theta _0,\widehat{h})]\ge -K_\epsilon r_n^{-2}\Bigg \} \cap A_n \Bigg )\\&\quad \le \sum _{j\ge M,\,2^j\le \delta r_n}\mathbb P^*\Bigg (\sup _{d(\theta ,\theta _0)\le \frac{2^j}{r_n},\,d_\mathcal {H}(h,h_0)\le \frac{\delta _1}{v_n}} |M_n(\theta ,h)-M_n(\theta _0,h)-M(\theta ,h)\\&\qquad \qquad \qquad \qquad \qquad \qquad +M(\theta _0,h)|\ge \frac{C2^{2j-2}}{4r_n^2} \Bigg )\\&\quad \le \frac{4Kr_n^2}{C\sqrt{n}} \sum _{j\ge M,\,2^j\le \delta r_n}\frac{\Phi _n(\frac{2^j}{r_n})}{2^{2j-2}}\\&\quad \le \frac{4Kr_n^2}{C\sqrt{n}} \sum _{j\ge M,\,2^j\le \delta r_n}\frac{2^{j\alpha }\Phi _n(\frac{1}{r_n})}{2^{2j-2}}\\&\quad \le \frac{16K}{C}\sum _{j\ge M}2^{j(\alpha -2)}. \end{aligned}$$

Finally, since \(\alpha <2\), the series \(\sum _{j \ge M} 2^{j(\alpha -2)}\) converges and hence there exists \(M'_\epsilon \ge M_\epsilon \) such that

$$\begin{aligned} \frac{16K}{C}\sum _{j\ge M'_\epsilon }2^{j(\alpha -2)}\le \frac{\epsilon }{6}. \end{aligned}$$

This finishes the proof showing (16) with \(\tau _\epsilon =2^{M'_\epsilon }\). \(\square \)
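The peeling argument above can be made concrete with a small numerical sketch. The helper names below are hypothetical; the code only locates the shell \(S_{j,n}\) containing a given value of \(r_nd(\theta ,\theta _0)\) and searches for a cutoff \(M'_\epsilon \) using the closed form of the geometric tail \(\sum _{j\ge M}2^{j(\alpha -2)}\), under the assumption that the constants \(K\), \(C\) and \(\alpha <2\) are known.

```python
import math

def shell_index(scaled_distance):
    """Return the integer j with 2**(j-1) < scaled_distance <= 2**j."""
    if scaled_distance <= 0:
        raise ValueError("the shells only cover points with positive distance")
    # frexp writes x = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(scaled_distance)
    return exponent - 1 if mantissa == 0.5 else exponent

def geometric_tail(M, alpha):
    """Closed form of sum_{j >= M} 2**(j*(alpha-2)), valid for alpha < 2."""
    ratio = 2.0 ** (alpha - 2.0)
    return ratio ** M / (1.0 - ratio)

def smallest_shell_cutoff(K, C, alpha, eps, start=0):
    """Smallest integer M >= start with (16*K/C) * geometric_tail(M) <= eps/6."""
    if not alpha < 2:
        raise ValueError("the series only converges for alpha < 2")
    M = start
    while 16.0 * K / C * geometric_tail(M, alpha) > eps / 6.0:
        M += 1
    return M
```

The `start` argument plays the role of \(M_\epsilon \) in the proof, so that the returned cutoff satisfies \(M'_\epsilon \ge M_\epsilon \).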

Proof of Theorem 3

The first step of the proof consists in showing the weak convergence of the process \(\gamma \mapsto r_n^2B_n(\theta _0+\frac{\gamma }{r_n},\widehat{h})\). This is shown in Lemma 1 (given below).

The remainder of the proof is based on somewhat similar arguments as those used to state the Argmax theorem in Van der Vaart and Wellner (1996). First note that E is a \(\sigma \)-compact metric space since \(E=\cup _{i=1}^\infty \mathcal {K}_i\) with \(\mathcal {K}_i=\{\gamma \in E : \Vert \gamma \Vert \le a_i\}\) for any positive sequence \((a_i)_{i\in \mathbb {N}^*}\) tending to infinity.

Next, we deduce from assumption (C9) together with Lemmas 2 and 3 that almost all paths of the limiting process \(\gamma \mapsto \Lambda (\gamma )+\mathbb {G}(\gamma )\) attain their supremum at a unique point \(\gamma _0\), following similar ideas to what is done in the parametric case (see Theorem 3.2.10 in Van der Vaart and Wellner 1996). Assume now that \(\gamma _0\) is measurable. The weak convergence of \(r_n(\widehat{\theta }-\theta _0)\) to \(\gamma _0\) is equivalent to the next statement (Portmanteau’s theorem):

$$\begin{aligned} \text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (r_n(\widehat{\theta }-\theta _0)\in C\Big )\le \mathbb P\Big (\gamma _0\in C\Big ),\quad \text { for every closed set}\, C. \end{aligned}$$

Let C be an arbitrary closed subset of E and fix \(\epsilon >0\). The random variable \(\gamma _0\) is tight because it takes values in E, which is \(\sigma \)-compact. Combining this tightness and the first part of (C1), it is possible to find \(K_\epsilon >0\) and hence a compact set \(\mathcal {K}_\epsilon :=\{\gamma : \Vert \gamma \Vert \le K_\epsilon \}\) such that

$$\begin{aligned} \mathbb P^{*}\Big (\gamma _0\notin \mathcal {K}_\epsilon \Big )\le \frac{\epsilon }{2},\text { and }\,\mathbb P^*\Big (r_n(\widehat{\theta }-\theta _0)\notin \mathcal {K}_\epsilon \Big )\le \frac{\epsilon }{2}. \end{aligned}$$
(20)

It follows easily from (20) that

$$\begin{aligned}&\text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (r_n(\widehat{\theta }-\theta _0)\in C\Big ) \nonumber \\&\quad \le \text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (r_n(\widehat{\theta }-\theta _0) \in C \cap \mathcal {K}_\epsilon ,\,\gamma _0\in \mathcal {K}_\epsilon \Big )+\text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (\{r_n(\widehat{\theta }-\theta _0)\notin \mathcal {K}_\epsilon \}\cup \{\gamma _0\notin \mathcal {K}_\epsilon \}\Big ) \nonumber \\&\quad \le \text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (r_n(\widehat{\theta }-\theta _0) \in C \cap \mathcal {K}_\epsilon ,\,\gamma _0\in \mathcal {K}_\epsilon \Big )+\epsilon . \end{aligned}$$
(21)

Now, using Lemma 1 and assumption (C8) we obtain

$$\begin{aligned}&\text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (r_n(\widehat{\theta }-\theta _0)\in C\cap \mathcal {K}_\epsilon ,\,\gamma _0\in \mathcal {K}_\epsilon \Big )\nonumber \\&\le \text{ limsup }_{n\rightarrow \infty }\mathbb P^*\left( \sup _{\gamma \in C\cap \mathcal {K}_\epsilon }r_n^2B_n\Big (\theta _0+\frac{\gamma }{r_n},\widehat{h}\Big )\ge \sup _{\gamma \in \mathcal {K}_\epsilon }r_n^2B_n\Big (\theta _0+\frac{\gamma }{r_n},\widehat{h}\Big )+o_{P^*}(1),\,\gamma _0\in \mathcal {K}_\epsilon \right) \nonumber \\&\le \mathbb P^*\left( \sup _{\gamma \in C\cap \mathcal {K}_\epsilon }(\Lambda +\mathbb {G})(\gamma )\ge \sup _{\gamma \in \mathcal {K}_\epsilon }(\Lambda +\mathbb {G})(\gamma ),\,\gamma _0\in \mathcal {K}_\epsilon \right) , \end{aligned}$$
(22)

by Slutsky’s lemma and Portmanteau’s theorem. On the other hand, for every open set G containing \(\gamma _0\), we have:

$$\begin{aligned} (\Lambda +\mathbb {G})(\gamma _0)>\sup _{\gamma \in G^c\cap \mathcal {K}_\epsilon }(\Lambda +\mathbb {G})(\gamma ). \end{aligned}$$

This together with (22) leads to

$$\begin{aligned} \text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (r_n(\widehat{\theta }-\theta _0)\in C\cap \mathcal {K}_\epsilon ,\,\gamma _0\in \mathcal {K}_\epsilon \Big )\le \mathbb P^*\Big (\gamma _0\in C\Big ). \end{aligned}$$
(23)
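The passage from (22) to (23) can be spelled out as follows; this is a sketch of one way to fill in the step, not a quotation of the authors' argument. On the event \(\{\gamma _0\in \mathcal {K}_\epsilon \}\), the supremum of \(\Lambda +\mathbb {G}\) over \(\mathcal {K}_\epsilon \) is attained at \(\gamma _0\). If in addition \(\gamma _0\notin C\), then \(G=C^c\) is an open set containing \(\gamma _0\) with \(G^c=C\), and the strict inequality above yields, for almost all paths,

$$\begin{aligned} \sup _{\gamma \in C\cap \mathcal {K}_\epsilon }(\Lambda +\mathbb {G})(\gamma )<(\Lambda +\mathbb {G})(\gamma _0)=\sup _{\gamma \in \mathcal {K}_\epsilon }(\Lambda +\mathbb {G})(\gamma ). \end{aligned}$$

Hence the event in the last line of (22) can only occur, up to a null set, when \(\gamma _0\in C\), which bounds its probability by \(\mathbb P^*(\gamma _0\in C)\).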

Consequently, it follows from (21) that for all \(\epsilon >0\),

$$\begin{aligned} \text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (r_n(\widehat{\theta }-\theta _0)\in C\Big ) \le \mathbb P^*\Big (\gamma _0\in C\Big )+\epsilon . \end{aligned}$$
(24)

Since (24) holds for all \(\epsilon >0\), it also holds for \(\epsilon =0\). The result now follows from Portmanteau’s theorem. \(\square \)

We end this section with three lemmas that were needed in the proof of Theorem 3.

Lemma 1

For all \(K>0\), let \(\mathcal {K}=\{\gamma \in E : \Vert \gamma \Vert \le K\}\) be a compact subset of E. Then, under the assumptions of Theorem 3, for any such \(\mathcal {K}\), the process \(\gamma \mapsto r_n^2B_n(\theta _0+\frac{\gamma }{r_n},\widehat{h})\) converges weakly to the process \(\gamma \mapsto \Lambda (\gamma )+\mathbb {G}(\gamma )\) in \(\ell ^\infty (\mathcal {K})\). Moreover, almost all paths of the limiting process are continuous (uniformly on every compact \(\mathcal {K}\)) with respect to \(\Vert \cdot \Vert \).

Proof

The weak convergence of the process \(\gamma \mapsto r_n^2B_n(\theta _0+\frac{\gamma }{r_n},\widehat{h})\) in \(\ell ^\infty (\mathcal {K})\) follows directly from Slutsky’s theorem and Lemmas 2 and 3. On the other hand, \(\Vert \cdot \Vert \) makes \(\mathcal {K}\) totally bounded (since it is compact) and \(\gamma \mapsto r_n^2B_n(\theta _0+\frac{\gamma }{r_n},h_0)+r_nW_n(\gamma )\) is asymptotically uniformly \(\Vert \cdot \Vert \)-equicontinuous in probability, asymptotically tight, and it converges weakly to \(\gamma \mapsto \Lambda (\gamma )+\mathbb {G}(\gamma )\) in \(\ell ^\infty (\mathcal {K})\) (see proof of Lemma 3). Thus, almost all paths of the limiting process are uniformly \(\Vert \cdot \Vert \)-continuous on \(\mathcal {K}\) (see Theorem 1.5.7 in Van der Vaart and Wellner 1996). Moreover, because E may be covered by a countable sequence of such compact sets, almost all paths of the limiting process are \(\Vert \cdot \Vert \)-continuous on E. \(\square \)

Lemma 2

Let \(\mathcal {K}=\{\gamma \in E : \Vert \gamma \Vert \le K\}\). Then, under the assumptions of Theorem 3, for all \(\gamma \in \mathcal {K}\), there exist \(\xi _{0,n}(\gamma ),\xi _{1,n}(\gamma ),\xi _{2,n}(\gamma )\) such that \(\sup _{\gamma \in \mathcal {K}}|\xi _{j,n}(\gamma )|=o_{P^*}(1)\), \(j=0,1,2\), and

$$\begin{aligned} r_n^2B_n \left( \theta _0+\frac{\gamma }{r_n},\widehat{h}\right) (1+\xi _{0,n}(\gamma )) = \left[ r^2_nB_n \left( \theta _0+\frac{\gamma }{r_n},h_0 \right) +r_nW_n(\gamma ) \right] (1+\xi _{1,n}(\gamma ))+\xi _{2,n}(\gamma ). \end{aligned}$$

Proof

Let us introduce the following notations:

$$\begin{aligned}&\alpha _{0,n}(\gamma )=\frac{B_n(\theta ,h)-B(\theta ,h)-B_n(\theta ,h_0)+B(\theta ,h_0)}{r_n^{-2}+|B_n(\theta ,h)|+|B_n(\theta ,h_0)|+|B(\theta ,h)|+|B(\theta ,h_0)|}, \\&s_{n,h}(\gamma )=\text{ sign }\Big [B_n\Big (\theta _0+\frac{\gamma }{r_n},h\Big )\Big ], \\&s_{h}(\gamma )=\text{ sign }\Big [B\Big (\theta _0+\frac{\gamma }{r_n},h\Big )\Big ], \end{aligned}$$

with \(\theta =\theta _0+\gamma /r_n\), and with \(h=\widehat{h}\) in the definition of \(\alpha _{0,n}(\gamma )\).

Because the compact set \(\mathcal {K}\) is bounded and \(\theta _0\) belongs to the interior of \(\Theta \), there exists \(n_\mathcal {K}\) such that for all \(n\ge n_{\mathcal {K}}\) and for all \(\gamma \in \mathcal {K}\), the quantity \(\theta _0+\frac{\gamma }{r_n}\) is in \(\Theta \). Then, for all \(\gamma \in \mathcal {K}\), the definition of \(\alpha _{0,n}(\gamma )\) entails that

$$\begin{aligned} B_n\left( \theta _0+\frac{\gamma }{r_n},\widehat{h}\right)= & {} B_n\left( \theta _0+\frac{\gamma }{r_n},h_0\right) + B\left( \theta _0+\frac{\gamma }{r_n},\widehat{h}\right) - B\left( \theta _0+\frac{\gamma }{r_n},h_0\right) \nonumber \\&+\,\alpha _{0,n}(\gamma )\left( r_n^{-2}+\left| B_n\left( \theta _0+\frac{\gamma }{r_n},\widehat{h}\right) \right| + \Bigg |B_n\left( \theta _0+\frac{\gamma }{r_n},h_0\right) \Bigg |\right. \nonumber \\&\left. +\,\left| B\left( \theta _0+\frac{\gamma }{r_n},\widehat{h}\right) \right| + \left| B\left( \theta _0+\frac{\gamma }{r_n},h_0\right) \right| \right) . \end{aligned}$$
(25)

This can be reformulated as

$$\begin{aligned}&r_n^2B_n\Big (\theta _0+\frac{\gamma }{r_n},\widehat{h}\Big ) \Big (1-\alpha _{0,n}(\gamma )s_{n,\widehat{h}}(\gamma )\Big )\nonumber \\&\quad = r_n^2B_n\Big (\theta _0+\frac{\gamma }{r_n},h_0\Big ) \Big (1+\alpha _{0,n}(\gamma )s_{n,h_0}(\gamma )\Big ) + r_n^2B\Big (\theta _0+\frac{\gamma }{r_n},\widehat{h}\Big ) \Big (1+\alpha _{0,n}(\gamma )s_{\widehat{h}}(\gamma )\Big )\nonumber \\&\qquad -\, r_n^2B\Big (\theta _0+\frac{\gamma }{r_n},h_0\Big ) \Big (1-\alpha _{0,n}(\gamma )s_{h_0}(\gamma )\Big )+\alpha _{0,n}(\gamma ). \end{aligned}$$
(26)
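Identity (26) is purely algebraic: it follows from (25) after multiplying by \(r_n^2\) and writing \(|x|=x\,\text{sign}(x)\). A quick numerical sanity check is sketched below. The function name and argument names are hypothetical, and the four arguments stand for the values of \(B_n(\theta ,\widehat{h})\), \(B_n(\theta ,h_0)\), \(B(\theta ,\widehat{h})\) and \(B(\theta ,h_0)\) at a fixed \(\theta =\theta _0+\gamma /r_n\), treated here as arbitrary real numbers.

```python
def sign(x):
    """Sign function with sign(0) = 0, so that abs(x) == x * sign(x)."""
    return (x > 0) - (x < 0)

def check_identity_26(bn_hat, bn_0, b_hat, b_0, r_n):
    """Return (lhs, rhs) of identity (26) for the given scalar values."""
    # alpha_{0,n} as defined in the proof, with h replaced by h-hat.
    denom = r_n ** -2 + abs(bn_hat) + abs(bn_0) + abs(b_hat) + abs(b_0)
    alpha = (bn_hat - b_hat - bn_0 + b_0) / denom
    r2 = r_n ** 2
    lhs = r2 * bn_hat * (1 - alpha * sign(bn_hat))
    rhs = (r2 * bn_0 * (1 + alpha * sign(bn_0))
           + r2 * b_hat * (1 + alpha * sign(b_hat))
           - r2 * b_0 * (1 - alpha * sign(b_0))
           + alpha)
    return lhs, rhs

lhs, rhs = check_identity_26(1.0, 0.0, 0.0, 0.0, 1.0)
print(lhs, rhs)
```

The check does not depend on any assumption of the theorem; it only confirms that the rearrangement from (25) to (26) holds for arbitrary inputs.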

Then, use assumptions (C1) and (C7) to get

$$\begin{aligned} r_n^2 \left[ B\left( \theta _0+\frac{\gamma }{r_n},\widehat{h}\right) -B\left( \theta _0+\frac{\gamma }{r_n},h_0\right) \right]= & {} r_nW_n(\gamma )+\beta _n\Vert \gamma \Vert ^2+r_n^2o\left( \frac{\Vert \gamma \Vert ^2}{r_n^2}\right) \nonumber \\:= & {} r_nW_n(\gamma )+\alpha _{1,n}(\gamma ). \end{aligned}$$
(27)

Combining (26) and (27), we obtain

$$\begin{aligned}&r_n^2B_n\left( \theta _0+\frac{\gamma }{r_n},\widehat{h}\right) (1+\xi _{0,n}(\gamma )) \nonumber \\&\quad = \left[ r_n^2 B_n \left( \theta _0+\frac{\gamma }{r_n},h_0\right) +r_nW_n(\gamma )\right] (1+\xi _{1,n}(\gamma ))+\xi _{2,n}(\gamma ), \end{aligned}$$
(28)

with

$$\begin{aligned} \xi _{0,n}(\gamma )= & {} -\alpha _{0,n}(\gamma )s_{n,\widehat{h}}(\gamma ),\\ \xi _{1,n}(\gamma )= & {} \alpha _{0,n}(\gamma )s_{n,h_0}(\gamma ),\\ \xi _{2,n}(\gamma )= & {} \alpha _{0,n}(\gamma )\Bigg [1+\Bigg (V(\gamma ,\gamma )+r_n^2o\Bigg (\frac{\Vert \gamma \Vert ^2}{r_n^2}\Bigg )\Bigg )(s_{\widehat{h}}+s_{h_0})(\gamma ) \\&+\,\Bigg (r_nW_n(\gamma )+\alpha _{1,n}(\gamma )\Bigg )(s_{\widehat{h}}-s_{n,h_0})(\gamma )\Bigg ] + \alpha _{1,n}(\gamma )(1+\xi _{1,n}(\gamma )). \end{aligned}$$

It can be easily shown that \(\sup _{\gamma \in \mathcal {K}}|\xi _{j,n}(\gamma )|=o_{P^*}(1)\) for \(j=0,1,2\) using assumptions (C3) and (C7). \(\square \)

Lemma 3

Let \(\mathcal {K}=\{\gamma \in E : \Vert \gamma \Vert \le K\}\). Then, under the assumptions of Theorem 3, the process \(\gamma \mapsto r^2_nB_n(\theta _0+\frac{\gamma }{r_n},h_0)+r_nW_n(\gamma )\) is asymptotically tight, asymptotically uniformly equicontinuous with respect to \(\Vert \cdot \Vert \) on \(\mathcal {K}\), and it converges weakly to the process \(\gamma \mapsto \Lambda (\gamma )+\mathbb {G}(\gamma )\) in \(\ell ^\infty (\mathcal {K})\).

Proof

The main idea of this proof consists in writing the process \(T_n:\gamma \mapsto r_n^2B_n(\theta _0+\frac{\gamma }{r_n},h_0)+r_nW_n(\gamma )\) as the sum of two processes \(T_{1,n}:\gamma \mapsto r_n^2(B_n(\theta _0+\frac{\gamma }{r_n},h_0)-B(\theta _0+\frac{\gamma }{r_n},h_0))\) and \(T_{2,n}:\gamma \mapsto r_n^2B(\theta _0+\frac{\gamma }{r_n},h_0)+r_nW_n(\gamma )\) and studying separately the properties of \(T_{1,n}\) and \(T_{2,n}\). However, in some specific cases it could be possible to state the weak convergence of \(T_n\) without this decomposition. Let us first note that assumption (C7) implies that for n sufficiently large (only depending on \(\mathcal {K}\)) so that \(\theta _0+\frac{\mathcal {K}}{r_n} \subset \Theta \), the processes \(T_{1,n}\) and \(T_{2,n}\) take values in \(\ell ^\infty (\mathcal {K})\).

The process \(T_{1,n}\) does not depend on the estimation of the nuisance parameter. Hence, following similar ideas as in the parametric case we get from assumptions (C4), (C5) and (C10) the asymptotic uniform equicontinuity of \(T_{1,n}\) with respect to \(\Vert \cdot \Vert \) on \(\mathcal {K}\) (as a by-product of the proof of Theorem 2.11.9 in Van der Vaart and Wellner 1996). On the other hand, for n large enough, \(\theta _0+\gamma /r_n \in \Theta \) (see the proof of Lemma 2). Assume now that n is large enough and use assumption (C7) to conclude that for all \(0<\delta \le \delta _1\),

$$\begin{aligned}&\sup _{\gamma ,\gamma '\in \mathcal {K}, \Vert \gamma -\gamma '\Vert \le \delta }|T_{2,n}(\gamma )-T_{2,n}(\gamma ')| \nonumber \\&\qquad = \sup _{\gamma ,\gamma '\in \mathcal {K}, \Vert \gamma -\gamma '\Vert \le \delta } \Big |r_nW_n(\gamma -\gamma ')+V(\gamma ,\gamma )-V(\gamma ',\gamma ')+r_n^2\Big (o\Big (\frac{\Vert \gamma \Vert ^2}{r_n^2}\Big )+o\Big (\frac{\Vert \gamma '\Vert ^2}{r_n^2}\Big )\Big )\Big |\nonumber \\&\qquad \le \delta ^\tau \left( r_n\sup _{\gamma \in E,\,\delta \le \delta _1,\,\Vert \gamma \Vert \le \delta } \Big |\frac{W_n(\gamma )}{\delta ^\tau }\Big |+\sup _{\gamma ,\gamma '\in E,\,\delta \le \delta _1,\,\Vert \gamma -\gamma '\Vert \le \delta }\frac{|V(\gamma ,\gamma )-V(\gamma ',\gamma ')|}{\delta ^\tau }\right) +b_n\nonumber \\&\qquad : = \delta ^\tau \alpha _n+b_n, \end{aligned}$$
(29)

where \(b_n\le \sup _{\gamma ,\gamma '\in \mathcal {K}}|r_n^2(o(\frac{\Vert \gamma \Vert ^2}{r_n^2})+o(\frac{\Vert \gamma '\Vert ^2}{r_n^2}))| \rightarrow 0\) as n tends to infinity, and \(\alpha _n=O_{P^*}(1)\) uniformly over \(\delta \le \delta _1\). Let \(\epsilon \) and \(\eta \) be arbitrary positive constants. It is clear that, for any \(0<\delta \le \delta _1\) and any positive constant K, (29) leads to

$$\begin{aligned}&\text{ limsup }_{n\rightarrow \infty }\mathbb P^*\left( \sup _{\gamma ,\gamma '\in \mathcal {K}, \Vert \gamma -\gamma '\Vert \le \delta }|T_{2,n}(\gamma )-T_{2,n}(\gamma ')|>\epsilon \right) \\&\quad \le \text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (\delta ^\tau \alpha _n+b_n>\epsilon ,\alpha _n\le K,|b_n|<\frac{\epsilon }{2}\Big )+\text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (\alpha _n> K\Big ) \\&\quad \le \text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (\delta ^\tau>\frac{\epsilon }{2K}\Big )+\text{ limsup }_{n\rightarrow \infty }\mathbb P^*\Big (\alpha _n>K\Big ). \end{aligned}$$

Finally choose \(K_\eta \) such that the last term is smaller than \(\eta \), and take \(\delta \le \delta _1\wedge (\frac{\epsilon }{2K_\eta })^\frac{1}{\tau }\). It then follows that \(T_{2,n}\) is asymptotically uniformly equicontinuous in probability with respect to \(\Vert \cdot \Vert \) on \(\mathcal {K}\).
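The choice of \(\delta \) in the last step can be written as a one-line rule. The sketch below uses a hypothetical helper name and assumes the constants \(\epsilon \), \(K_\eta \), \(\tau \) and \(\delta _1\) are given positive numbers; it returns the largest admissible radius, for which \(\delta ^\tau K_\eta \le \epsilon /2\) and hence \(\delta ^\tau \alpha _n+b_n\le \epsilon \) on the event \(\{\alpha _n\le K_\eta ,\,|b_n|<\epsilon /2\}\).

```python
def equicontinuity_radius(eps, K_eta, tau, delta_1):
    """Return min(delta_1, (eps / (2*K_eta))**(1/tau)), so delta**tau * K_eta <= eps/2."""
    if min(eps, K_eta, tau, delta_1) <= 0:
        raise ValueError("all inputs must be positive")
    return min(delta_1, (eps / (2.0 * K_eta)) ** (1.0 / tau))

delta = equicontinuity_radius(eps=0.2, K_eta=5.0, tau=0.5, delta_1=1.0)
print(delta)
```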

Hence, the same is also true for the process \(T_n\), since it is the sum of two such processes. The asymptotic tightness and hence the weak convergence of \(T_n\) to \(\Lambda +\mathbb {G}\) in \(\ell ^\infty (\mathcal {K})\) now follow from Theorems 1.5.7 and 1.5.4 in Van der Vaart and Wellner (1996), together with assumption (C9) and the fact that \(\mathcal {K}\) is totally bounded with respect to the \(\Vert \cdot \Vert \)-norm (since it is compact). Moreover, using Addendum 1.5.8 in the same book, almost all paths of the limiting process on \(\mathcal {K}\) are uniformly continuous with respect to \(\Vert \cdot \Vert \). \(\square \)

About this article


Cite this article

Delsol, L., Van Keilegom, I. Semiparametric M-estimation with non-smooth criterion functions. Ann Inst Stat Math 72, 577–605 (2020). https://doi.org/10.1007/s10463-018-0700-y


  • DOI: https://doi.org/10.1007/s10463-018-0700-y
