
M-estimation in Multistage Sampling Procedures


Multistage (designed) procedures, obtained by splitting the sampling budget suitably across stages and designing the sampling at a particular stage based on information about the parameter obtained from previous stages, are often advantageous from the perspective of precise inference. We develop a generic framework for M-estimation in a multistage setting and apply empirical process techniques to develop limit theorems that describe the large sample behavior of the resulting M-estimates. Applications to change-point estimation, inverse isotonic regression, classification, mode estimation and cusp estimation are provided; in these applications the multistage procedure typically improves the efficiency of the M-estimates by accelerating their rate of convergence relative to one-stage procedures. The step-by-step process induces dependence across stages, which complicates the analysis in such problems; we address this through careful conditioning arguments.



  1. The quantity \(h_{n,\theta}(\lambda)\) is uniformly (in n and \(\theta \in {\Theta }_{n}^{\tau }\)) bounded by a function of λ that goes to 0 as λ goes to 0.


  • Belitser, E., Ghosal, S. and Van Zanten, J. H. (2013). Optimal two-stage procedures for estimating location and size of maximum of multivariate regression functions. Ann. Statist.

  • Bhattacharya, P. K. and Brockwell, P. J. (1976). The minimum of an additive process with applications to signal estimation and storage theory. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 37, 51–75.

  • Bhattacharya, P. K. (1987). Maximum likelihood estimation of a change-point in the distribution of independent random variables: General multiparameter case. J. Multivariate Anal. 23, 183–208.

  • Billingsley, P. (1995). Probability and measure, 3rd edn. Wiley Series in Probability and Mathematical Statistics. Wiley, New York. A Wiley-Interscience Publication.

  • Cohn, D., Ladner, R. and Waibel, A. (1994). Improving generalization with active learning. Machine Learning, 201–221.

  • Groeneboom, P. (1985). Estimating a monotone density. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berkeley, Calif., 1983). Wadsworth Statist./Probab. Ser. 539–555. Wadsworth, Belmont. MR822052 (87i:62076).

  • Groeneboom, P. (1989). Brownian motion with a parabolic drift and Airy functions. Probab. Theory Related Fields 81, 79–109.

  • Hotelling, H. (1941). Experimental determination of the maximum of a function. Ann. Math. Statistics 12, 20–45.

  • Iyengar, V., Apte, C. and Zhang, T. (2000). Active learning using adaptive resampling. In: Proceedings of the Sixth ACM SIGKDD Conference on Knowledge Discovery and Data Mining 91–98. ACM.

  • Katenka, N., Levina, E. and Michailidis, G. (2008). Robust target localization from binary decisions in wireless sensor networks. Technometrics 50, 448–461.

  • Kiefer, J. and Wolfowitz, J. (1952). Stochastic estimation of the maximum of a regression function. Ann. Math. Statistics 23, 462–466.

  • Kim, J. and Pollard, D. (1990). Cube root asymptotics. Ann. Statist. 18, 191–219. doi: 10.1214/aos/1176347498.

  • Kosorok, M. R. (2008). Introduction to empirical processes and semiparametric inference. Springer Series in Statistics. Springer, New York.

  • Lan, Y., Banerjee, M. and Michailidis, G. (2009). Change-point estimation under adaptive sampling. Ann. Statist. 37, 1752–1791.

  • Müller, H.-G. and Song, K.-S. (1997). Two-stage change-point estimators in smooth regression models. Statist. Probab. Lett. 34, 323–335.

  • Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statistics 22, 400–407.

  • Robertson, T., Wright, F. T. and Dykstra, R. L. (1988). Order restricted statistical inference. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wiley, Chichester.

  • Tang, R., Banerjee, M. and Michailidis, G. (2011). A two-stage hybrid procedure for estimating an inverse regression function. Ann. Statist. 39, 956–989.

  • Tang, R., Banerjee, M., Michailidis, G. and Mankad, S. (2015). Two-stage plans for estimating a threshold value of a regression function. Technometrics 57, 395–407.

  • van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes: with applications to statistics. Springer Series in Statistics. Springer, New York.

  • van der Vaart, A. W. and Wellner, J. A. (2007). Empirical processes indexed by estimated functions. In: Asymptotics: particles, processes and inverse problems. IMS Lecture Notes Monogr. Ser. 55 234–252. Inst. Math. Statist., Beachwood.

  • Wei, S. and Kosorok, M. R. (2013). Latent supervised learning. J. Amer. Statist. Assoc. 108, 958–970.

Author information

Corresponding author

Correspondence to Moulinath Banerjee.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supported by NSF Grant DMS-1007751 and a Sokol Faculty Award, University of Michigan

Supported by NSF Grants DMS-1161838 and DMS-1228164

Electronic supplementary material




A.1 Proof of Theorem 1

Note that if \(\kappa_n r_n = O(1)\), i.e., there exists \(C > 0\) such that \(\kappa_n r_n \leq C\) for all n, then

$$ \begin{array}{@{}rcl@{}} P\left( r_{n} \rho_{n}(\hat{d}_{n}, d_{n}) \geq C\right) & = & P\left( r_{n} \kappa_{n} \rho_{n}(\hat{d}_{n}, d_{n}) \geq C \kappa_{n} \right) \\ & \leq & P\left( \rho_{n}(\hat{d}_{n}, d_{n}) \geq \kappa_{n} \right), \end{array} $$

which converges to zero by assumption. Therefore, the conclusion of the theorem is immediate when \(\kappa_n r_n = O(1)\), and we only need to address the situation where \(\kappa _{n} r_{n} \rightarrow \infty \). For a fixed realization \(\hat {\theta }_{n} = \theta \), we use \(\hat {d}_{n}(\theta )\) to denote our estimate, so that \(\hat {d}_{n} = \hat {d}_{n}(\hat {\theta }_{n})\). For any L > 0,

$$ \begin{array}{@{}rcl@{}} {P\left( r_{n} \rho_{n}(\hat{d}_{n}(\hat{\theta}_{n}), d_{n}) \geq 2^{L} \right) }&\leq& { P\left( r_{n} \kappa_{n} > r_{n} \rho_{n}(\hat{d}_{n}(\hat{\theta}_{n}), d_{n}) \geq 2^{L} , \hat{\theta}_{n} \in {\Theta}_{n}^{\tau} \right)} \\ & & + P\left( \rho_{n}(\hat{d}_{n}(\hat{\theta}_{n}), d_{n}) \geq \kappa_{n} \right) + \tau. \end{array} $$

The second term on the right side goes to zero by assumption. Further,

$$ \begin{array}{@{}rcl@{}} &&P\left( r_{n} \kappa_{n} > r_{n} \rho_{n}(\hat{d}_{n}(\hat{\theta}_{n}), d_{n}) \geq 2^{L}, \hat{\theta}_{n} \in {\Theta}_{n}^{\tau}\right) \\ &&~~~~~~~~~~~~~~~~~~~~~~~~= E \left[P\left( r_{n} \kappa_{n} > r_{n} \rho_{n}(\hat{d}_{n}(\hat{\theta}_{n}), d_{n}) \geq 2^{L} \mid \hat{\theta}_{n} \right) 1\left[\hat{\theta}_{n} \in {\Theta}_{n}^{\tau}\right] \right] \\ &&~~~~~~~~~~~~~~~~~~~~~~~~ \leq \sup_{\theta \in {\Theta}_{n}^{\tau}} P\left( r_{n} \kappa_{n} > r_{n} \rho_{n}(\hat{d}_{n}({\theta}), d_{n}) \geq 2^{L} \right). \end{array} $$

Let \(S_{j,n} = \left \{d: 2^{j} \leq r_{n} \rho _{n}(d, d_{n}) < \min \limits (2^{j+1}, \kappa _{n} r_{n})\right \}\) for \(j \in \mathbb {Z}\). If \(r_{n} \rho _{n}(\hat {d}_{n}({\theta }), d_{n})\) is at least \(2^{L}\) for a given positive integer L (and smaller than \(\kappa_n r_n\)), then \(\hat {d}_{n}(\theta)\) lies in one of the shells \(S_{j,n}\) with \(j \geq L\). By definition of \(\hat {d}_{n}({\theta })\) as a minimizer of \(d \mapsto \mathbb {M}_{n}(d, {\theta })\) over \(\mathcal {D}_{{\theta }}\), the infimum of the map \(d \mapsto \mathbb {M}_{n}(d, {\theta }) - \mathbb {M}_{n}(d_{n}, {\theta })\) over the shell containing \(\hat {d}_{n}({\theta })\) (intersected with \(\mathcal {D}_{{\theta }}\)) is not positive. For \(\theta \in {\Theta }_{n}^{\tau }\),

$$ \begin{array}{@{}rcl@{}} &&P\left( r_{n} \kappa_{n} > r_{n} \rho_{n}(\hat{d}_{n}({\theta}), d_{n}) \geq 2^{L} \right)\\ && ~~\leq \sum\limits_{j\geq L, 2^{j} \leq \kappa_{n} r_{n}} P^{*}\left( \inf_{d \in S_{j,n} \cap \mathcal{D}_{{\theta}}}\mathbb{M}_{n}(d, {\theta}) - \mathbb{M}_{n}(d_{n}, {\theta}) \leq 0\right). \end{array} $$

For every j involved in the sum, \(n > N_\tau\) and any \(\theta \in {\Theta }_{n}^{\tau }\), Eq. 2.2 gives

$$ \begin{array}{@{}rcl@{}}\inf_{2^{j}/r_n \leq \rho_n(d, d_n) < \min(2^{j+1}, \kappa_n r_n)/r_n, d \in \mathcal{D}_\theta} M_n(d, \theta) - M_n(d_n, \theta) \geq c_\tau\ \frac{ 2^{2j}}{r^2_n}. \end{array} $$

Also, for such j, \(n > N_\tau\) and \(\theta \in {\Theta }_{n}^{\tau }\),

$$ \begin{array}{@{}rcl@{}} &&P^{*}\left( \inf_{d \in S_{j,n} \cap \mathcal{D}_{\theta}}\mathbb{M}_{n}(d, {\theta}) - \mathbb{M}_{n}(d_{n}, \theta) \leq 0\right)\\ &&~~~\leq P^{*}\left( \inf_{d \in S_{j,n} \cap \mathcal{D}_{\theta}}[(\mathbb{M}_{n}(d, \theta) - M_{n}(d, \theta))- (\mathbb{M}_{n}(d_{n}, \theta) - M_{n}(d_{n}, \theta))] \right. \\ &&\left. ~~\leq - \inf_{d \in S_{j,n} \cap \mathcal{D}_{\theta}} [M_{n}(d, \theta) - M_{n}(d_{n}, \theta)]\right) \\ &&~~\leq P^{*}\left( \inf_{d \in S_{j,n} \cap \mathcal{D}_{\theta}}[(\mathbb{M}_{n}(d, \theta) - M_{n}(d, \theta))- (\mathbb{M}_{n}(d_{n}, \theta) - M_{n}(d_{n}, \theta))] \leq - c_{\tau}\frac{ 2^{2j}}{{r^{2}_{n}}}\right) \\ &&~~\leq P^{*}\left( \sup_{d \in S_{j,n} \cap \mathcal{D}_{\theta}}\left|(\mathbb{M}_{n}(d, \theta) - M_{n}(d, \theta))- (\mathbb{M}_{n}(d_{n}, \theta) - M_{n}(d_{n}, \theta))\right| \geq c_{\tau}\frac{ 2^{2j}}{{r^{2}_{n}}}\right). \end{array} $$

For \(n > N_\tau\), by Markov's inequality and Eq. 2.3, we get

$$ \begin{array}{@{}rcl@{}} &&{\sup\limits_{\theta \in {\Theta}_{n}^{\tau}}\sum\limits_{j\geq L, 2^{j} \leq \kappa_{n} r_{n}} P^{*}\left( \inf_{d \in S_{j,n} \cap \mathcal{D}_{\theta}}\mathbb{M}_{n}(d, \theta ) - \mathbb{M}_{n}(d_{n}, \theta) \leq 0\right) } \\ &&~~~~~~~~~~~~~~~~~~~~\leq C_{\tau} \sum\limits_{j\geq L, 2^{j} \leq \kappa_{n} r_{n}} \frac{\phi_{n}(\min(2^{j+1}, r_{n} \kappa_{n})/r_{n}){r^{2}_{n}}}{c_{\tau} \sqrt{n}2^{2j}}. \end{array} $$

Note that \(\phi_n(c\delta) \leq c^{\alpha}\phi_n(\delta)\) for every c > 1. As \(\kappa _{n} r_{n} \rightarrow \infty \), there exists \(\bar {N} \in \mathbb {N}\) such that \(\kappa_n r_n > 1\) for all \(n > \bar N\). Hence, for L > 0 and \(n > \max \limits (\bar {N}, N_{\tau })\), the above display is bounded by

$$ \frac{C_{\tau}}{ c_{\tau}} \sum\limits_{j\geq L, 2^{j} \leq \kappa_{n} r_{n}} (\min(2^{j+1}, r_{n} \kappa_{n}))^{\alpha} 2^{-2j} \leq \tilde{K} \frac{C_{\tau}}{ c_{\tau}} \sum\limits_{j\geq L, 2^{j} \leq \kappa_{n} r_{n}} 2^{(j+1) \alpha - 2j}, $$

for some universal constant \(\tilde {K}\), by the definition of \(r_n\). For any fixed η > 0, take τ = η/3 and choose \(L_\eta > 0\) such that the right side is less than η/3. Also, there exists \(\tilde {N}_{\eta } \in \mathbb {N}\) such that for all \(n > \tilde {N}_{\eta }\),

$$ P\left( \rho_{n}(\hat{d}_{n}(\hat{\theta}_{n}), d_{n}) \geq \kappa_{n} \right) < \eta/3.$$

Hence, for \(n > \max \limits (\bar {N}, N_{\eta /3}, \tilde {N}_{\eta } )\),

$$ P\left( r_{n} \rho_{n}(\hat{d}_{n}(\hat{\theta}_{n}), d_{n}) > 2^{L_{\eta}} \right) < \eta,$$

by Eqs. 7.2 and 7.5. Thus, we get the result when conditions (2.2) and (2.3) hold for some sequence \(\kappa_n > 0\).

Further, note that if the conditions in part (b) of the theorem hold for all sequences \(\kappa_n > 0\), then, following the arguments in Eqs. 7.2 and 7.3, we have

$$ \begin{array}{@{}rcl@{}} {P\left( r_{n} \rho_{n}(\hat{d}_{n}(\hat{\theta}_{n}), d_{n}) > 2^{L} \right) }&\leq& \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} P\left( r_{n} \rho_{n}(\hat{d}_{n}({\theta}), d_{n}) > 2^{L} \right) + \tau. \end{array} $$

We can now use the shelling argument for \(j \geq L\), letting j go all the way to \(\infty \), where the shell \(S_{j,n}\) is now simply \(\{d : 2^{j} \leq r_{n}\rho_{n}(d,d_{n}) < 2^{j+1}\}\). By our assumption, the bounds in Eqs. 7.4 and 7.5 hold for every such shell when \(n > N_\tau\), and we arrive at the result by similar arguments as above, without needing to control the term \(P\left (\rho _{n}(\hat {d}_{n}(\hat {\theta }_{n}), d_{n}) \geq \kappa _{n} \right )\) in Eq. 7.2 separately.
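The last step of the shelling argument only needs the tail of the geometric series \(\sum_{j \geq L} 2^{(j+1)\alpha - 2j} = 2^{\alpha}\, 2^{(\alpha-2)L}/(1 - 2^{\alpha-2})\), which is finite exactly when α < 2. The Python sketch below evaluates this tail and returns the smallest integer \(L_\eta\) that pushes the bound below η/3. The helper names `shell_tail` and `choose_L` are hypothetical, and the unknown product \(\tilde K C_\tau / c_\tau\) is passed in as a user-supplied constant `ratio`.

```python
import math


def shell_tail(L, alpha):
    """Closed form of sum_{j >= L} 2**((j + 1) * alpha - 2 * j).

    The series is geometric with ratio 2**(alpha - 2), so it converges
    only when alpha < 2.
    """
    if alpha >= 2:
        raise ValueError("the shell sum diverges unless alpha < 2")
    r = 2.0 ** (alpha - 2.0)  # common ratio of the geometric series
    return 2.0 ** alpha * r ** L / (1.0 - r)


def choose_L(eta, alpha, ratio=1.0):
    """Smallest integer L >= 1 with ratio * shell_tail(L, alpha) < eta / 3.

    `ratio` stands in for the (unknown) constant K~ * C_tau / c_tau.
    """
    L = 1
    while ratio * shell_tail(L, alpha) >= eta / 3.0:
        L += 1
    return L


if __name__ == "__main__":
    # Cross-check the closed form against a long partial sum.
    direct = sum(2.0 ** ((j + 1) * 1.0 - 2 * j) for j in range(3, 200))
    print(math.isclose(shell_tail(3, 1.0), direct))
    print(choose_L(eta=0.3, alpha=1.0))
```

Because the tail decays like \(2^{(\alpha-2)L}\), \(L_\eta\) grows only logarithmically in 1/η.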

Proof of Theorem 2

As the sum of tight processes is tight, it suffices to show tightness of \(\zeta _{n}(\cdot , \hat {\theta }_{n})\) and \(\mathbb {G}_{n} f_{n, \cdot , \hat {\theta }_{n}}\) separately. As \({\mathcal{H}}\) is totally bounded under \(\tilde {\rho }\), tightness of the process ζn can be shown by justifying that

$$ \begin{array}{@{}rcl@{}}P^{*}\left[\sup_{\tilde{\rho}(h_1, h_2) < \delta_n} \left| \zeta_n(h_1, \hat{\theta}_n) - \zeta_n(h_2, \hat{\theta}_n) \right| > t \right] \rightarrow 0, \end{array} $$

for every \(\delta_n \downarrow 0\) and t > 0. The probability on the left side of the above display is bounded by

$$ \begin{array}{@{}rcl@{}} &&{P^{*}\left[\sup\limits_{\tilde{\rho}(h_{1}, h_{2}) < \delta_{n}} \left| \zeta_{n}(h_{1}, \hat{\theta}_{n}) - \zeta_{n}(h_{2}, \hat{\theta}_{n}) \right| > t , \hat{\theta}_{n} \in {\Theta}_{n}^{\tau} \right] + P[\hat{\theta}_{n} \notin {\Theta}_{n}^{\tau}]} \\ &&~~~~~~~~~~~~~~~~~~~~~~~\leq 1\left[\sup\limits_{\underset{\theta \in {\Theta}_{n}^{\tau}}{\tilde{\rho} (h_{1},h_{2}) < \delta_{n}}} | \zeta_{n}(h_{1},\theta) - \zeta_{n}(h_{2},\theta)| > t \right] + \tau. \end{array} $$

By Eq. 2.10, the above can be made arbitrarily small for large n and hence, the process \(\zeta _{n}(\cdot , \hat {\theta }_{n})\) is asymptotically tight.

We justify tightness of the process \(\{ \mathbb {G}_{n} f_{n,h, \hat {\theta }}: h \in {\mathcal{H}} \}\) when Eq. 2.11 holds. The proof under the condition on bracketing numbers follows along similar lines.

As was the case with ζn, we consider the expression

$$ \begin{array}{@{}rcl@{}}P^{*}\left[\sup_{\tilde{\rho}(h_1, h_2) < \delta_n} \left| \mathbb{G}_n (f_{n,h_1,\hat{\theta}_n} - f_{n,h_2,\hat{\theta}_n}) \right| > t \right], \end{array} $$

for every \(\delta_n \downarrow 0\) and t > 0.

Let \(e_i, i \geq 1\), denote Rademacher random variables independent of the \(V_i\)'s and \(\hat {\theta }\). By arguments similar to those at the beginning of the proof of Theorem 2.11.1 of van der Vaart and Wellner (1996), which use a symmetrization lemma for probabilities (Lemma 2.3.7 of the same book), for sufficiently large n the above display can be bounded by

$$ \begin{array}{@{}rcl@{}}4 P^{*}\left[\sup\limits_{\tilde{\rho}(h_1, h_2) < \delta_n} \left| \frac{1}{\sqrt{n}} \sum\limits_{i=1}^n e_i (f_{n,h_1,\hat{\theta}}(V_i) - f_{n,h_2,\hat{\theta}}(V_i)) \right| > \frac{t}{4} \right]. \end{array} $$

The only difference from the proof of the cited theorem is that the arguments are carried out for fixed realizations of the \(V_i\)'s and \(\hat {\theta }\) (instead of fixed realizations of the \(V_i\)'s alone), after which outer expectations are taken. Further, by the measurability assumption, the map

$$ (V_{1}, V_{2}, \ldots, V_{n}, \hat{\theta}, e_{1},\ldots,e_{n}) \mapsto \sup\limits_{\tilde{\rho}(h_{1}, h_{2}) < \delta_{n}} \left| \frac{1}{\sqrt{n}} \sum\limits_{i=1}^{n} e_{i} (f_{n,h_{1},\hat{\theta}}(V_{i}) - f_{n,h_{2},\hat{\theta}}(V_{i})) \right| $$

is jointly measurable. Hence, the expression in Eq. 7.6 is a probability. Let \(Q_n\) denote the marginal distribution of \(\hat {\theta }_{n}\). Then, for any τ > 0,

$$ \begin{array}{@{}rcl@{}} &&{4 P\left[\sup\limits_{\tilde{\rho}(h_{1}, h_{2}) < \delta_{n}} \left| \frac{1}{\sqrt{n}} \sum\limits_{i=1}^{n} e_{i} (f_{n,h_{1},\hat{\theta}}(V_{i}) - f_{n,h_{2},\hat{\theta}}(V_{i})) \right| > \frac{t}{4} \right]}\\ &&~~ = 4 \int P\left[\sup\limits_{\tilde{\rho}(h_{1}, h_{2}) < \delta_{n}} \left| \frac{1}{\sqrt{n}} \sum\limits_{i=1}^{n} e_{i} (f_{n,h_{1},{\theta}}(V_{i}) - f_{n,h_{2},{\theta}}(V_{i})) \right| > \frac{t}{4} \right] Q_{n} (d \theta) \\ && ~~\leq 4\sup_{\theta \in {\Theta}_{n}^{\tau}} P\left[\sup\limits_{\tilde{\rho}(h_{1}, h_{2}) < \delta_{n}} \left| \frac{1}{\sqrt{n}} \sum\limits_{i=1}^{n} e_{i} (f_{n,h_{1},{\theta}}(V_{i}) - f_{n,h_{2},{\theta}}(V_{i})) \right| > \frac{t}{4} \right] + 4\tau . \end{array} $$

For a fixed \(\theta \in {\Theta }_{n}^{\tau }\), let \(\mathcal {F}_{n,\theta ,\delta _{n}} = \{ f_{n,h_{1},{\theta }} - f_{n,h_{2},{\theta }} : \tilde {\rho }(h_{1}, h_{2}) < \delta _{n} \} \). For \(g \in \mathcal {F}_{n,\theta ,\delta _{n}}\), the process \(g \mapsto (1/\sqrt {n}) {\sum }_{i=1}^{n} e_{i} g (V_{i})\) (given the \(V_i\)'s) is sub-Gaussian with respect to the \(L_{2}(\mathbb {P}_{n})\) semi-metric. Hence, by Markov's inequality and the chaining bound of Corollary 2.2.8 of van der Vaart and Wellner (1996), the first term of the above display can be bounded, up to a universal constant, by

$$ \begin{array}{@{}rcl@{}} \frac{16}{t} \sup\limits_{\theta \in {\Theta}_n^\tau} E {\int}_0^{\xi_n(\theta)} \sqrt{\log N\left( u, \mathcal{F}_{n, \theta, \delta_n}, L_2(\mathbb{P}_n)\right)} du, \end{array} $$


where

$${\xi_{n}^{2}} (\theta) = \sup\limits_{g \in \mathcal{F}_{n, \theta, \delta_{n}}} \| g\|^{2}_{L_{2}(\mathbb{P}_{n})} = \sup\limits_{g \in \mathcal{F}_{n, \theta, \delta_{n}}} \left[\frac{1}{n} {\sum}_{i=1}^{n} g^{2}(V_{i}) \right].$$

It suffices to show that, for all sufficiently large n, \(\sup_{\theta \in {\Theta}_{n}^{\tau}} E \int_{0}^{\xi_{n}(\theta)} \sqrt{\log N\left(u, \mathcal{F}_{n, \theta, \delta_{n}}, L_{2}(\mathbb{P}_{n})\right)}\, du\) can be made as small as desired. We assume, without loss of generality, that each envelope satisfies \(F_{n,\theta} \geq 1/2\); if necessary, this is arranged by adding 1/2 to each of the original envelopes. (Note that this does not disturb any of the assumptions of Theorem 2.) Since \(N(u,\mathcal {F}_{n, \theta , \delta _{n}}, L_{2}(\mathbb {P}_{n})) \leq N^{2}(u/2,\mathcal {F}_{n, \theta }, L_{2}(\mathbb {P}_{n}))\), we have:

$$ \begin{array}{@{}rcl@{}} &&{\sup\limits_{\theta \in {\Theta}_{n}^{\tau}} E {\int}_{0}^{\xi_{n}(\theta)} \sqrt{\log N\left( u, \mathcal{F}_{n, \theta, \delta_{n}}, L_{2}(\mathbb{P}_{n})\right)} du} \\ && ~~\lesssim \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} E {\int}_{0}^{\xi_{n}(\theta)} \sqrt{\log N\left( u/2, \mathcal{F}_{n, \theta}, L_{2}(\mathbb{P}_{n})\right)} du\\ &&~~\lesssim \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} E \left[{\int}_{0}^{\xi_{n}(\theta)/(2 \|F_{n,\theta}\|_{n})} \sqrt{\log N\left( u \|F_{n,\theta}\|_{n}, \mathcal{F}_{n, \theta}, L_{2}(\mathbb{P}_{n})\right)} du \|F_{n,\theta}\|_{n} \right] \\ && ~~\lesssim \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} E \left[\|F_{n,\theta}\|_{n} {\int}_{0}^{\xi_{n}(\theta)} \sup_{Q \in \mathcal{Q}} \sqrt{\log N\left( u \|F_{n,\theta}\|_{Q,2}, \mathcal{F}_{n, \theta}, L_{2}(Q)\right)} du \right] . \end{array} $$

By Cauchy-Schwarz, the above is bounded by:

$$ \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} \left[\sqrt{\frac{1}{n} \sum\limits_{i=1}^{n} E (F_{n,\theta}^{2}(V_{i}))}\, \sqrt{E (h_{n,\theta}^{2}(\xi_{n}(\theta)))}\right] ,$$


where

$$ h_{n,\theta}(x) = {{\int}_{0}^{x}} \sup\limits_{Q \in \mathcal{Q}} \sqrt{\log N\left( u \|F_{n,\theta}\|_{Q,2}, \mathcal{F}_{n, \theta}, L_{2}(Q)\right)} du .$$

This, in turn, is bounded by:

$$ \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} (PF_{n,\theta}^{2})^{1/2} \times \sqrt{\sup\limits_{\theta \in {\Theta}_{n}^{\tau}} E (h_{n,\theta}^{2}(\xi_{n}(\theta)))} .$$

The first term above remains bounded as \(n \rightarrow \infty \) by Eq. 2.7. To show that the second term can be made small for sufficiently large n, we claim that it suffices to show that \( \sup\limits _{\theta \in {\Theta }_{n}^{\tau }} E^{*} \xi _{n}(\theta )^{2}\) converges to zero. For the moment, assume this. It then follows from Markov's inequality that, for any λ > 0,

$$ \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} P(\xi_{n}(\theta) > \lambda) \rightarrow 0 .$$

Next, note that \(\sup\limits _{\theta \in {\Theta }_{n}^{\tau }} h_{n,\theta }(\xi _{n}(\theta )) \leq \sup\limits _{\theta \in {\Theta }_{n}^{\tau }} h_{n,\theta }(\infty ) < \infty \) by Eq. 2.11. Now, for any λ > 0,

$$ \begin{array}{@{}rcl@{}} E(h_{n,\theta}^{2}(\xi_{n}(\theta))) \!& = &\! E(h_{n,\theta}^{2}(\xi_{n}(\theta)) 1(\xi_{n}(\theta) \leq \lambda)) + E(h_{n,\theta}^{2}(\xi_{n}(\theta)) 1(\xi_{n}(\theta) > \lambda)) \\ \!&\! \leq\! &\! h_{n,\theta}^{2}(\lambda) + h_{n,\theta}^{2}(\infty)P(\xi_{n}(\theta) > \lambda) , \end{array} $$

by virtue of the facts that \(h_{n,\theta}\) is nondecreasing and that \(\xi_n(\theta) \leq \lambda\) on the event in the first indicator, so that

$$ \sup_{\theta \in {\Theta}_{n}^{\tau}} E(h_{n,\theta}^{2}(\xi_{n}(\theta))) \leq \sup_{\theta \in {\Theta}_{n}^{\tau}} h_{n,\theta}^{2}(\lambda) + \sup_{\theta \in {\Theta}_{n}^{\tau}} h_{n,\theta}^{2}(\infty) \sup_{\theta \in {\Theta}_{n}^{\tau}} P(\xi_{n}(\theta) > \lambda) ,$$

which can be made as small as we please by first choosing λ small enough (see Footnote 1) and then letting \(n \rightarrow \infty \). It remains to prove the claim. Note that

$$ \begin{array}{@{}rcl@{}} E^{*} \xi_{n}(\theta)^{2} \leq E^{*} \sup\limits_{g \in \mathcal{F}_{n, \theta, \delta_{n}}} | (\mathbb{P}_{n} - P) g^{2}| + \sup\limits_{g \in \mathcal{F}_{n, \theta, \delta_{n}}} |P g^{2}| . \end{array} $$

By Eq. 2.9, the second term on the right side goes to zero uniformly in \(\theta \in {\Theta }_{n}^{\tau }\). By the symmetrization lemma for expectations, Lemma 2.3.1 of van der Vaart and Wellner (1996), the first term on the right side is bounded by

$$ \begin{array}{@{}rcl@{}} 2E^{*} \sup\limits_{g \in \mathcal{F}^{2}_{n, \theta, \delta_{n}}} \left|\frac{1}{n}\sum\limits_{i=1}^{n} e_{i} g(V_{i}) \right| \leq 2E^{*} \sup\limits_{g \in \mathcal{F}^{2}_{n, \theta, \infty}} \left|\frac{1}{n}\sum\limits_{i=1}^{n} e_{i} g(V_{i}) \right| . \end{array} $$

Note that \(G_{n,\theta} = (2F_{n,\theta})^{2}\) is an envelope for the class \(\mathcal {F}^{2}_{n,\theta ,\infty }\). By condition Eq. 2.8, there exists a sequence of numbers \(\eta_n \downarrow 0\) (converging slowly enough) such that \(\sup _{\theta \in {\Theta }_{n}^{\tau }} P F^{2}_{n, \theta } 1\left [F_{n, \theta } > \eta _{n} \sqrt {n}\right ] \) converges to zero. Let \(\mathcal {F}^{2}_{n,\theta ,\infty , \eta _{n}} = \left \{g 1[G_{n,\theta } \leq n {\eta _{n}^{2}}] :\ g \in \mathcal {F}^{2}_{n, \theta , \infty } \right \} \). Then, the above display is bounded by:

$$ \begin{array}{@{}rcl@{}} 2E^{*}\ \sup\limits_{g \in \mathcal{F}^{2}_{n, \theta, \infty, \eta_{n}}} \left|\frac{1}{n}\sum\limits_{i=1}^{n} e_{i} g(V_{i})\right| + 2 P^{*} G_{n,\theta} 1\left[G_{n,\theta} > n {\eta_{n}^{2}}\right] . \end{array} $$

The second term in the above display goes to zero (uniformly in 𝜃) by Eq. 2.8, and it remains to show that the first term converges to 0 uniformly in 𝜃. By the P-measurability of the class \(\mathcal {F}^{2}_{n, \theta , \infty , \eta _{n}}\), the first term in the above display is an expectation. For u > 0, let \(\mathcal {G}_{u,n}\) be a minimal \(uR_n\)-net in \(L_{1}(\mathbb {P}_{n})\) over \(\mathcal {F}^{2}_{n,\theta ,\infty , \eta _{n}}\), where \(R_{n} = 4\|F_{n,\theta }\|_{n}^{2}\). Note that the cardinality of \(\mathcal {G}_{u,n}\) is \(N(u R_{n}, \mathcal {F}^{2}_{n, \theta , \infty , \eta _{n}}, L_{1}(\mathbb {P}_{n}))\) and that

$$ 2E^{*}\ \sup\limits_{g \in \mathcal{F}^{2}_{n, \theta, \infty, \eta_{n}}} \left|\frac{1}{n}\sum\limits_{i=1}^{n} e_{i} g(V_{i})\right| \leq 2E\ \sup\limits_{g \in \mathcal{G}_{u,n}} \left|\frac{1}{n}\sum\limits_{i=1}^{n} e_{i} g(V_{i})\right| + u E(R_{n}) . $$

Note that \(\sup _{\theta \in {\Theta }_{n}^{\tau }} u E(R_{n}) = 4u \sup _{\theta \in {\Theta }_{n}^{\tau }} PF_{n,\theta }^{2} \lesssim u\), by Eq. 2.7. Using the fact that the \(L_1\) norm is bounded, up to a universal constant, by the \(\psi_2\) Orlicz norm, and letting \(\|\cdot\|_{\psi_2 \mid V}\) denote the conditional Orlicz norm given fixed realizations of the \(V_i\)'s, we obtain the following bound on the first term of the above display:

$$ \begin{array}{@{}rcl@{}} \frac{2}{n}E_{V} E_{e} \left[\sup\limits_{g \in \mathcal{G}_{u,n}} \left|\sum\limits_{i=1}^{n} e_{i} g(V_{i})\right|\right] &\lesssim& \frac{2}{n}E_{V} \left\|\sup\limits_{g \in \mathcal{G}_{u,n}} \left|\sum\limits_{i=1}^{n} e_{i} g(V_{i})\right|\right\|_{\psi_{2} \mid V} \\ &\lesssim& \frac{2}{n} E_{V} \left[\sqrt{1 + \log N(u R_{n}, \mathcal{F}^{2}_{n, \theta, \infty, \eta_{n}}, L_{1}(\mathbb{P}_{n})) } \right.\\ & & \left. \qquad \qquad \times \max_{g \in \mathcal{G}_{u,n}} \Big\|\sum\limits_{i=1}^{n} e_{i} g(V_{i})\Big\|_{\psi_{2} \mid V} \right] , \end{array} $$

where the last inequality follows from a maximal inequality for Orlicz norms (Lemma 2.2.2 of van der Vaart and Wellner 1996). By Hoeffding's inequality, for each \(g \in \mathcal {G}_{u,n}\), \(\|\sum_{i=1}^{n} e_{i} g(V_{i})\|_{\psi_{2} \mid V} \lesssim [\sum_{i} g^{2}(V_{i})]^{1/2}\). This is at most \(\left [{\sum }_{i} n {\eta ^{2}_{n}} G_{n,\theta }(V_{i}) \right ]^{1/2}\), since \(0 \leq g \leq \min(G_{n,\theta}, n\eta_n^2)\) for g in the truncated class. We conclude that the first term on the right side of Eq. 7.8 is bounded, up to a universal constant, by:

$$E \left[\frac{\left[{\sum}_{i} n {\eta^{2}_{n}} G_{n,\theta}(V_{i}) \right]^{1/2}}{n} \sqrt{1 + \log N(u 4\|F_{n,\theta}\|_{n}^{2}, \mathcal{F}^{2}_{n, \theta, \infty, \eta_{n}}, L_{1}(\mathbb{P}_{n})) }\right] . $$


Moreover,

$$ \begin{array}{@{}rcl@{}} \log N(u 4\|F_{n,\theta}\|_{n}^{2}, \mathcal{F}^{2}_{n, \theta, \infty, \eta_{n}}, L_{1}(\mathbb{P}_{n})) &\leq & \log N(u 4\|F_{n,\theta}\|_{n}^{2}, \mathcal{F}^{2}_{n, \theta,\infty}, L_{1}(\mathbb{P}_{n})) \\ & \leq & \log N(u \|F_{n,\theta}\|_{n}, \mathcal{F}_{n, \theta, \infty}, L_{2}(\mathbb{P}_{n})) \\ & \leq & \log N^{2}((u/2) \|F_{n,\theta}\|_{n}, \mathcal{F}_{n, \theta}, L_{2}(\mathbb{P}_{n})) \\ & \leq & 2 \sup\limits_{Q} \log N((u/2) \|F_{n,\theta}\|_{Q,2}, \mathcal{F}_{n, \theta}, L_{2}(Q)) . \end{array} $$

Conclude that the expectation preceding the above display is bounded by:

$$ \begin{array}{@{}rcl@{}} &&{\frac{\eta_{n}}{\sqrt{n}} E \left[\sum\limits_{i=1}^{n} G_{n,\theta}(V_{i}) \right]^{1/2} \sqrt{1 + 2\sup\limits_{Q} \log N((u/2) \|F_{n,\theta}\|_{Q,2}, \mathcal{F}_{n, \theta}, L_{2}(Q)) }} \\ && ~~~\leq \frac{\eta_{n}}{\sqrt{n}} \left[E \left[\sum\limits_{i=1}^{n} G_{n,\theta}(V_{i}) \right]\right]^{1/2} \sqrt{1 + 2\sup\limits_{Q} \log N((u/2) \|F_{n,\theta}\|_{Q,2}, \mathcal{F}_{n, \theta}, L_{2}(Q)) } \\ && ~~~\leq 4 \eta_{n} \left[P F^{2}_{n,\theta}\right] \sqrt{1 + 2\sup_{Q} \log N((u/2) \|F_{n,\theta}\|_{Q,2}, \mathcal{F}_{n, \theta}, L_{2}(Q))}. \end{array} $$

Now, note that u > 0 is arbitrary, that \(\sup _{\theta \in {\Theta }_{n}^{\tau }} P F^{2}_{n,\theta }\) is O(1) by Eq. 2.7, and that, for each fixed u > 0,

$$ \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} \sqrt{1 + 2\sup\limits_{Q} \log N((u/2) \|F_{n,\theta}\|_{Q,2}, \mathcal{F}_{n, \theta}, L_{2}(Q)) } = O(1) ,$$


since the covering numbers are nonincreasing in the radius, which gives

$$\sup_{\theta \in {\Theta}_{n}^{\tau}} h_{n,\theta}(u/2) \geq \sup\limits_{\theta \in {\Theta}_{n}^{\tau}}(u/2) \sup\limits_{Q} \sqrt{\log N((u/2) \|F_{n,\theta}\|_{Q,2}, \mathcal{F}_{n, \theta}, L_{2}(Q))} ,$$

showing that

$$ \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} \sup\limits_{Q} \sqrt{\log N((u/2) \|F_{n,\theta}\|_{Q,2}, \mathcal{F}_{n, \theta}, L_{2}(Q))} \leq (2/u) \sup\limits_{\theta \in {\Theta}_{n}^{\tau}} h_{n,\theta}(u/2) ,$$

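and from Eq. 2.11, \(\sup _{\theta \in {\Theta }_{n}^{\tau }} h_{n,\theta }(u/2)\) is O(1). Hence, for each fixed u > 0, the first term on the right side of Eq. 7.8 converges to zero uniformly over \(\theta \in {\Theta }_{n}^{\tau }\) as \(n \rightarrow \infty \), since \(\eta _{n} \rightarrow 0\). The second term on the right side of Eq. 7.8 is \(\lesssim u\) uniformly in 𝜃. Choosing u small enough and then letting \(n \rightarrow \infty \) therefore makes the left side of Eq. 7.8 as small as desired, uniformly over \(\theta \in {\Theta }_{n}^{\tau }\).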
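The Rademacher step above works conditionally on the \(V_i\)'s. It combines the Orlicz maximal inequality with Hoeffding's inequality to bound \(E_e \max_{g \in \mathcal G_{u,n}} |\sum_i e_i g(V_i)|\) by a multiple of \(\sqrt{1 + \log N}\, \max_g [\sum_i g^2(V_i)]^{1/2}\). A version of this step with an explicit constant is Massart's finite-class lemma, where the bound becomes \(\sqrt{2\log(2N)}\, \max_g [\sum_i g^2(V_i)]^{1/2}\) for N fixed vectors. The Python sketch below compares a Monte Carlo estimate of the left side with that bound. It is only an illustration, not part of the proof: the function names and the indicator class \(g_t(v) = 1[v \leq t]\) are hypothetical choices.

```python
import math
import random


def rademacher_max(vectors, n_rep=2000, seed=0):
    """Monte Carlo estimate of E_e max_g |sum_i e_i g_i| for fixed vectors g.

    Each vector plays the role of (g(V_1), ..., g(V_n)) for one g in a
    finite net, with the V_i held fixed (i.e. conditionally on V).
    """
    rng = random.Random(seed)
    n = len(vectors[0])
    total = 0.0
    for _ in range(n_rep):
        e = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher signs
        total += max(abs(sum(ei * gi for ei, gi in zip(e, g))) for g in vectors)
    return total / n_rep


def massart_bound(vectors):
    """Massart's finite-class bound sqrt(2 log(2N)) * max_g ||g||_2."""
    n_vec = len(vectors)
    max_norm = max(math.sqrt(sum(x * x for x in g)) for g in vectors)
    return math.sqrt(2.0 * math.log(2 * n_vec)) * max_norm


if __name__ == "__main__":
    rng = random.Random(3)
    v = [rng.random() for _ in range(100)]  # fixed design V_1, ..., V_n
    # Hypothetical finite class: indicators g_t(v) = 1[v <= t] on a grid of t.
    vecs = [[1.0 if vi <= t / 20 else 0.0 for vi in v] for t in range(1, 21)]
    print(rademacher_max(vecs), massart_bound(vecs))
```

The bound grows only like the square root of the log-cardinality of the net, which is why the proof can afford a net whose size is controlled by the uniform entropy condition (2.11).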

Proof of Theorem 3

As \(n_1\), \(n_2\) and n are of the same order, we derive bounds in terms of n only. For notational ease, we first consider the situation where \(d \geq d_0\). Recall that \(\theta = (\alpha, \beta, \mu)\). Also, let

$$ \begin{array}{@{}rcl@{}} \begin{array}{lll} {\Theta}_{n_1}^\tau =& \left[\alpha_n - \frac{K_\tau}{ \sqrt{n_1}}, \alpha_n + \frac{K_\tau}{ \sqrt{n_1}}\right] \times \left[\beta_n - \frac{K_\tau}{ \sqrt{n_1}}, \beta_n + \frac{K_\tau}{ \sqrt{n_1}}\right] \times \\ & \left[d_0 - \frac{K_\tau}{ {n_1}^\nu}, d_0 + \frac{K_\tau}{ {n_1}^\nu}\right], \end{array} \end{array} $$

where \(K_\tau\) is chosen such that \(P\left (\hat {\theta }_{n_{1}} \in {\Theta }_{n_{1}}^{\tau } \right ) > 1 - \tau \). For \(\theta \in {\Theta }_{n_{1}}^{\tau }\), \(\beta - \alpha \geq c_{0} n^{-\xi } - 2 K_{\tau }/ \sqrt {n_{1}}\). As ξ < 1/2, \(\mathrm{sgn}(\beta - \alpha) = 1\) for \(n > N^{(1)}_{\tau } := (2K_{\tau }/(\sqrt {p}\, c_{0}))^{2/(1-2\xi)}\). Also, for \(x > d_0\), \(m_n(x) = \beta_n\), and thus,

$$ \begin{array}{@{}rcl@{}} \mathbb{M}_{n_{2}}(d, {\theta}) & = & \mathbb{P}_{n_{2}} \left[g_{n_{2},d, \theta} (V)\right], \end{array} $$

where for \(V = (U, \epsilon ), U \sim \text {Uniform}[-1,1]\),

$$ \begin{array}{@{}rcl@{}} g_{n_{2},d,\theta}(V) & = &\left( \beta_{n} + \epsilon - \frac{\beta +\alpha}{2} \right) 1\left[\mu + K n_{1}^{-\gamma} U \in (d_{0}, d]\right]\\ & = &\left( \beta_{n} + \epsilon - \frac{\beta +\alpha}{2} \right) 1\left[U \in \left( \frac{d_{0}- \mu}{Kn_{1}^{-\gamma}}, \frac{d- \mu}{Kn_{1}^{-\gamma}}\right]\right]. \end{array} $$

Consequently, for \(n > N^{(1)}_{\tau }\),

$$ \begin{array}{@{}rcl@{}} M_{n_{2}}(d, \theta) & = &\frac{1}{2}\left( \beta_{n} - \frac{\beta +\alpha}{2} \right) \lambda\left( [-1,1] \cap \left( \frac{d_{0}- \mu}{Kn_{1}^{-\gamma}}, \frac{d- \mu}{Kn_{1}^{-\gamma}}\right] \right). \end{array} $$

Here λ denotes Lebesgue measure. Since \(\gamma < \nu\), \(d_{0} \in \mathcal {D}_{\theta }\) for all \(\theta \in {\Theta }_{n_{1}}^{\tau }\), and for \(n > N^{(2)}_{\tau } := (1/p) (K_{\tau }/K)^{1/(\nu -\gamma )}\) the intervals

$$\left\{ \left( (d_{0}-\mu)/(Kn_{1}^{-\gamma}), (d -\mu)/(Kn_{1}^{-\gamma})\right]: d>d_{0}, d \in \mathcal{D}_{\theta}, \theta \in {\Theta}_{n_{1}}^{\tau}\right\}$$

are all contained in [− 1,1]. Therefore, for \(n > N^{(3)}_{\tau } := \max \limits (2N^{(1)}_{\tau }, N^{(2)}_{\tau } )\),

$$ \begin{array}{@{}rcl@{}} M_{n_{2}}(d, \theta) & = & \frac{1}{2}\left( \beta_{n} - \frac{\beta +\alpha}{2} \right) \frac{d - d_{0}}{K n_{1}^{-\gamma}}. \end{array} $$

Note that \(M_{n_{2}}(d_{0}, \theta ) = 0\) for all \(\theta \in \mathbb {R}^{3}\). Further, let \({\rho _{n}^{2}}(d, d_{0}) = n^{\gamma -\xi } |d - d_{0}|\). Then, for \(n > N^{(3)}_{\tau }\),

$$ \begin{array}{@{}rcl@{}} M_{n_{2}}(d, \theta) - M_{n_{2}}(d_{0}, \theta) & \geq & \left( \beta_{n} - \frac{\beta_{n} + \alpha_{n}}{2} - \frac{K_{\tau}}{\sqrt{n_{1}}} \right) \frac{d - d_{0}}{2K n_{1}^{-\gamma}} \\ & =& \left( \frac{\beta_{n} - \alpha_{n}}{2} - \frac{K_{\tau}}{\sqrt{n_{1}}} \right) \frac{d - d_{0}}{2K n_{1}^{-\gamma}} \\ & = & \left( \frac{c_{0} n^{-\xi}}{2} - \frac{K_{\tau}}{\sqrt{n_{1}}} \right) \frac{d - d_{0}}{2K n_{1}^{-\gamma}} \\ & \geq& c_{\tau} {\rho_{n}^{2}}(d, d_{0}), \end{array} $$

for some \(c_\tau > 0\) (depending on τ through \(K_\tau\)). The last step uses ξ < 1/2, which makes \(K_\tau/\sqrt{n_1}\) negligible relative to \(c_0 n^{-\xi}/2\) for large n. Also, the above lower bound can be shown to hold for the case \(d < d_0\) as well.
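To make the two-stage design in this proof concrete, here is a small Python simulation sketch. Only the zoomed second-stage design \(X = \mu + K n_1^{-\gamma} U\), \(U \sim\) Uniform[−1,1], is taken from the text. Every other ingredient is an assumption of ours, not the paper's:

- the first stage uses \(n_1 = \lfloor pn \rfloor\) uniform design points on [0,1] and a least-squares step fit for \(\hat\theta = (\hat\alpha, \hat\beta, \hat\mu)\);
- the jump is held fixed rather than shrinking like \(c_0 n^{-\xi}\);
- the errors are Gaussian;
- the second-stage criterion is taken to be \(\mathbb M_{n_2}(d,\theta) = \mathbb P_{n_2}[(Y - (\alpha+\beta)/2) 1[X \leq d]]\), whose centred version \(\mathbb M_{n_2}(d,\theta) - \mathbb M_{n_2}(d_0,\theta)\) reduces to \(\mathbb P_{n_2} g_{n_2,d,\theta}\) for \(d > d_0\).

All function names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Least-squares step fit y ~ alpha * 1[x <= mu] + beta * 1[x > mu].

    Returns (alpha_hat, beta_hat, mu_hat); mu_hat is the last design point
    of the left block.
    """
    pts = sorted(zip(xs, ys))
    n = len(pts)
    total = sum(y for _, y in pts)
    total_sq = sum(y * y for _, y in pts)
    best_sse, best = float("inf"), None
    s = sq = 0.0
    for k in range(1, n):  # left block = first k points
        x_k, y_k = pts[k - 1]
        s += y_k
        sq += y_k * y_k
        left, right = s / k, (total - s) / (n - k)
        sse = (sq - k * left ** 2) + ((total_sq - sq) - (n - k) * right ** 2)
        if sse < best_sse:
            best_sse, best = sse, (left, right, x_k)
    return best


def second_stage(theta, n1, n2, K, gamma, m, sigma, rng):
    """Sample X = mu + K * n1**(-gamma) * U, U ~ Unif[-1, 1], and minimise the
    assumed criterion d -> sum_{X_i <= d} (Y_i - (alpha + beta) / 2)."""
    alpha, beta, mu = theta
    half_width = K * n1 ** (-gamma)
    xs = [mu + half_width * rng.uniform(-1.0, 1.0) for _ in range(n2)]
    pts = sorted((x, m(x) + sigma * rng.gauss(0.0, 1.0)) for x in xs)
    mid = 0.5 * (alpha + beta)
    best_val, d_hat = 0.0, mu - half_width  # empty sum: d left of all points
    running = 0.0
    for x, y in pts:
        running += y - mid
        if running < best_val:
            best_val, d_hat = running, x
    return d_hat


def two_stage(n, p, K, gamma, m, sigma, seed=0):
    """Stage 1: n1 = int(p * n) uniform points on [0, 1] (assumed design);
    stage 2: the remaining n - n1 points in the zoomed window."""
    rng = random.Random(seed)
    n1 = int(p * n)
    x1 = [rng.uniform(0.0, 1.0) for _ in range(n1)]
    y1 = [m(x) + sigma * rng.gauss(0.0, 1.0) for x in x1]
    theta_hat = fit_stump(x1, y1)
    d_hat = second_stage(theta_hat, n1, n - n1, K, gamma, m, sigma, rng)
    return theta_hat, d_hat


if __name__ == "__main__":
    def step(x):
        # Illustrative step function with d_0 = 0.5 and a fixed jump of 1.
        return 0.0 if x <= 0.5 else 1.0

    print(two_stage(n=400, p=0.5, K=1.0, gamma=0.5, m=step, sigma=0.1))
```

Because the second-stage points are packed into a window of half-width \(K n_1^{-\gamma}\) around \(\hat\mu\), their spacing is much finer than that of the first stage. This is the zooming effect whose consequence for the rate is quantified in Theorem 3.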

Further, to apply Theorem 1, we need to bound

$$ \sup\limits_{\theta \in {\Theta}_{n_1}^\tau} E^{*} \sup\limits_{\substack{|d- d_0| < n^{\xi-\gamma}\delta^2 \\ d \in \mathcal{D}_\theta}} \sqrt{n_2}\left| (\mathbb{M}_{n_2} (d, \theta) -M_{n_2}(d, \theta)) - (\mathbb{M}_{n_2}(d_0, \theta) -M_{n_2}(d_0, \theta) ) \right|. $$

Note that for \(d > d_0\), the expression inside \(|\cdot|\) equals \((1/\sqrt {n_{2}}) \mathbb {G}_{n_{2}} g_{n_{2}, d,\theta }\).

The class of functions \(\mathcal {F}_{\delta ,\theta }= \{g_{n_{2}, d,\theta }: 0 \leq d-d_{0} < n^{\xi -\gamma }\delta ^{2}, d \in \mathcal {D}_{\theta } \}\) is VC with index at most 3 (for every (δ,𝜃)) and is enveloped by

$$ \begin{array}{@{}rcl@{}}M_{\delta, \theta}(V) = \left( |\epsilon| + \frac{\beta_n -\alpha_n}{2} + \frac{K_\tau}{\sqrt{n_1}} \right) 1\left[U \in \left[\frac{d_0-\mu}{K {n_1}^{-\gamma}}, \frac{d_0 - \mu +\delta^2 n^{\xi-\gamma}}{K {n_1}^{-\gamma}}\right]\right]. \end{array} $$

Note that

$$ \begin{array}{@{}rcl@{}} &&{E\left[M_{\delta, \theta}(V)\right]^{2}}\\ && ~~= \frac{1}{2}E\left[ \left( |\epsilon| + \frac{\beta_{n} -\alpha_{n}}{2} + \frac{K_{\tau}}{\sqrt{n_{1}}} \right)^{2}\right] \lambda \left[[-1,1] \cap \left[\frac{d_{0}-\mu}{K {n_{1}}^{-\gamma}}, \frac{d_{0} - \mu +\delta^{2} n^{\xi-\gamma}}{K {n_{1}}^{-\gamma}}\right]\right]\\ && ~~\leq \frac{1}{2}E\left[ \left( |\epsilon| + \frac{\beta_{n} -\alpha_{n}}{2} + \frac{K_{\tau}}{\sqrt{n_{1}}} \right)^{2}\right] \lambda \left[\frac{d_{0}-\mu}{K {n_{1}}^{-\gamma}}, \frac{d_{0} - \mu +\delta^{2} n^{\xi-\gamma}}{K {n_{1}}^{-\gamma}}\right]\\ && ~~\leq C^{2}_{\tau} \frac{n^{\xi -\gamma}\delta^{2}}{n^{-\gamma}} = C^{2}_{\tau} n^{\xi} \delta^{2}, \end{array} $$

where Cτ is a positive constant (it depends on τ through Kτ). Further, the uniform entropy integral for \(\mathcal {F}_{\delta ,\theta }\) is bounded by a constant that depends only on its VC index (which, as noted above, is uniformly bounded in (δ,𝜃)); that is, the quantity

$$ J(1, \mathcal{F}_{\delta,\theta}) = \sup_{Q} {{\int}_{0}^{1}} \sqrt{1 + \log N(u \|M_{\delta, \theta}\|_{Q,2}, \mathcal{F}_{\delta,\theta}, L_{2}(Q)) } d u $$

is uniformly bounded in (δ,𝜃); see Theorems 9.3 and 9.15 of Kosorok (2008) for more details. Using Theorem 2.14.1 of van der Vaart and Wellner (1996),

$$ \begin{array}{@{}rcl@{}} E^{*} \sup\limits_{\substack{0 \leq d- d_0 < n^{\xi-\gamma}\delta^2 \\ d \in \mathcal{D}_\theta}} \left|\mathbb{G}_{n_2} g_{n_2, d,\theta}\right| \lesssim J(1, \mathcal{F}_{\delta,\theta}) \|M_{\delta, \theta}\|_2 \lesssim C_\tau n^{\xi/2} \delta. \end{array} $$

Note that this bound does not depend on 𝜃 and can be shown to hold for the case d ≤ d0 as well.
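The second-moment computation for the envelope can be checked by simulation. A minimal sketch, assuming (as the factor 1/2 in that computation suggests) that U is uniform on [−1,1], and additionally assuming Gaussian noise ε ~ N(0, σ²) independent of U; the argument itself only needs a finite second moment. Here a stands for the constant \((\beta_{n}-\alpha_{n})/2 + K_{\tau}/\sqrt{n_{1}}\) and [lo, hi] for the U-interval in the envelope; both function names are hypothetical.

```python
import math
import random


def exact_second_moment(a, sigma, lo, hi):
    """E[(|eps| + a)^2 1{U in [lo, hi]}] for U ~ Uniform[-1, 1], eps ~ N(0, sigma^2) independent."""
    overlap = max(0.0, min(hi, 1.0) - max(lo, -1.0))  # Lebesgue measure of [-1, 1] ∩ [lo, hi]
    # E(|eps| + a)^2 = sigma^2 + 2 a E|eps| + a^2, with E|eps| = sigma sqrt(2/pi)
    e_sq = sigma ** 2 + 2 * a * sigma * math.sqrt(2 / math.pi) + a ** 2
    return 0.5 * e_sq * overlap                       # density of U is 1/2 on [-1, 1]


def mc_second_moment(a, sigma, lo, hi, reps=200_000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        u = rng.uniform(-1.0, 1.0)
        if lo <= u <= hi:  # eps is independent of U, so drawing it only when needed is valid
            total += (abs(rng.gauss(0.0, sigma)) + a) ** 2
    return total / reps
```

With \(n_{1} = pn\), the U-interval has length \(\delta^{2} n^{\xi} p^{\gamma}/K\), so the exact value grows linearly in \(n^{\xi}\delta^{2}\) until the interval covers [−1,1], in line with the bound \(C^{2}_{\tau} n^{\xi}\delta^{2}\).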

Hence, we get the bound \(\phi_{n}(\delta) = n^{\xi/2}\delta\) on the modulus of continuity. Further, for \(n > N^{(3)}_{\tau }\), Eq. 7.10 holds for all \(d \in \mathcal {D}_{\theta }\), and Eq. 7.12 is valid for all δ > 0. Therefore, we do not need to justify a condition of the type \(P\left (\rho _{n}(\hat {d}_{n}, d_{n})\geq \kappa _{n} \right ) \rightarrow 0\) to apply Theorem 1. For \(r_{n} = n^{1/2-\xi/2}\), the relation \({r^{2}_{n}} \phi _{n}(1/r_{n}) \leq \sqrt {n}\) is satisfied. Consequently, \({r^{2}_{n}} (n^{\gamma - \xi } (\hat {d}_{n} -d_{0})) = n^{\eta }(\hat {d}_{n} -d_{0}) = O_{p}(1)\).
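The exponent arithmetic behind the last two sentences, written out with \(\phi_{n}(\delta) = n^{\xi/2}\delta\), \(r_{n} = n^{1/2-\xi/2}\) and η = 1 + γ − 2ξ, shows that the relation in fact holds with equality:

$$ \begin{array}{@{}rcl@{}} {r^{2}_{n}} \phi_{n}(1/r_{n}) & = & {r^{2}_{n}} \, n^{\xi/2} r_{n}^{-1} = r_{n} n^{\xi/2} = n^{1/2-\xi/2} n^{\xi/2} = \sqrt{n}, \\ {r^{2}_{n}} n^{\gamma-\xi} & = & n^{1-\xi} n^{\gamma-\xi} = n^{1+\gamma-2\xi} = n^{\eta}. \end{array} $$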

Proof of Theorem 4

For any L > 0, we start by justifying the conditions of Theorem 2 to prove tightness of the process \(Z_{n_{2}}(h, \hat {\theta }_{n_{1}})\), for h ∈ [−L,L]. For sufficiently large n, the set \(\{ {h: d_{0} + h/n^{\eta } \in \mathcal {D}_{{\theta }}} \}\) contains [−L,L] for all \(\theta \in {\Theta }_{n_{1}}^{\tau }\) and hence, it is not necessary to extend \(Z_{n_{2}}\) (equivalently, \(f_{n_{2}, h, \theta }\)) as done in Eq. 2.5. Further, for a fixed \(\theta \in {\Theta }_{n_{1}}^{\tau }\) (defined in Eq. 7.9), an envelope for the class of functions \(\{ f_{n_{2},h,{\theta }}: |h| \leq L \}\) is given by

$$ \begin{array}{@{}rcl@{}} F_{n_{2}, \theta} (V) & =& n_{2}^{1/2- \xi} \left( \frac{\beta_{n} - \alpha_{n}}{2}+ \frac{K_{\tau}}{\sqrt{n_{1}}} + |\epsilon| \right) \times \\ & & 1\left[\mu + U K n_{1}^{-\gamma} \in [ d_{0} - L n^{-\eta}, d_{0} + L n^{-\eta}] \right]. \end{array} $$

Note that

$$ \begin{array}{@{}rcl@{}}P F^2_{n_2, \theta} \lesssim n^{1 - 2\xi} \left( \left( \frac{\beta_n - \alpha_n}{2} + \frac{K_\tau}{\sqrt{n_1}} \right)^2 + \sigma^2\right) \frac{2L n^{-\eta}}{2K n_1^{-\gamma}} \end{array}. $$
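Writing \(n_{1} = pn\) (the first-stage proportion p that also appears in the limits computed later in this proof), the powers of n in the last factor cancel exactly, while the bracketed factor stays bounded in n:

$$ \begin{array}{@{}rcl@{}} n^{1-2\xi} \frac{2L n^{-\eta}}{2K n_{1}^{-\gamma}} & = & \frac{L p^{\gamma}}{K} n^{1-2\xi-\eta+\gamma} = \frac{L p^{\gamma}}{K} \quad \text{since } \eta = 1 + \gamma - 2\xi. \end{array} $$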

As η = 1 + γ − 2ξ, the right side is O(1); since it does not depend on 𝜃, the bound is uniform in \(\theta \in {\Theta }_{n_{1}}^{\tau }\). Let K0 be a constant (depending on τ) such that \(K_{0} \geq {(\beta _{n} - \alpha _{n})}/{2} + {K_{\tau }}/{\sqrt {n_{1}}}\) for all n. Then, for \(t>0\), \(P F^{2}_{n_{2}, \theta } 1[F_{n_{2}, \theta } > \sqrt {n_{2}} t] \) is bounded by

$$ \begin{array}{@{}rcl@{}} & & n^{1 - 2\xi} P\left( (K_{0} + |\epsilon|)^{2} 1\left[\mu + U K n_{1}^{-\gamma} \in [ d_{0} - L n^{-\eta}, d_{0} + L n^{-\eta}] \right] \times \right. \\ & & \left. 1\left[ n^{1/2 - \xi}(K_{0} + |\epsilon|) > \sqrt{n_{2}} t \right]\right). \end{array} $$

As 𝜖 and U are independent, the above is bounded up to a constant by

$$ P (K_{0} + |\epsilon|)^{2} 1\left[ (K_{0} + |\epsilon|) > \sqrt{1-p}\,{n^{\xi}} t \right] $$

which goes to zero as n → ∞. This justifies conditions (2.7) and (2.8) of Theorem 2. Let \(\tilde {\rho } (h_{1}, h_{2}) = |h_{1} - h_{2}|\). For any L > 0, the space [−L,L] is totally bounded with respect to \(\tilde {\rho }\). For h1,h2 ∈ [−L,L] and \(\theta \in {\Theta }_{n_{1}}^{\tau }\), we have

$$ \begin{array}{@{}rcl@{}} P(f_{n_{2}, h_{1}, \theta} - f_{n_{2}, h_{2}, \theta})^{2} & \lesssim & n^{1 - 2\xi} \frac{|h_{1} - h_{2}|n^{-\eta}}{2K n_{1}^{-\gamma}} E\left[(K_{0} + |\epsilon|)^{2}\right]. \end{array} $$

The right side is bounded (up to a constant multiple depending on τ) by |h1 − h2| for all \(\theta \in {\Theta }_{n_{1}}^{\tau }\). Hence, condition (2.9) is satisfied as well; condition (2.10) is verified later in this proof. Further, the class of functions \(\{ f_{n_{2},h,{\theta }}: |h| \leq L \}\) is VC of index at most 3 with envelope \(F_{n_{2},\theta }\). Hence, it has a bounded entropy integral, with the bound depending only on the VC index of the class (see Theorems 9.3 and 9.15 of Kosorok 2008), so condition (2.11) is also satisfied. The measurability condition (2.13) can be shown to hold by approximating \(\mathcal {F}_{n_{2},\delta } = \{ f_{n_{2}, h_{1}, \theta } - f_{n_{2}, h_{2}, \theta }: |h_{1} -h_{2}| < \delta \} \) (defined in Theorem 2) by the countable class involving only rational choices of h1 and h2: the supremum over this countable class is measurable and agrees with the supremum over \(\mathcal {F}_{n_{2},\delta }\). Thus, \(\mathbb {G}_{n_{2}} f_{n_{2},h,\hat {\theta }}\) is tight in \(l^{\infty }([-L, L])\).
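The truncated second moment that drives conditions (2.7) and (2.8) can be evaluated in closed form if one additionally assumes Gaussian noise ε ~ N(0, σ²). This is an assumption made only for the illustration, since the argument just needs a finite second moment. The hypothetical helper below computes \(P(K_{0} + |\epsilon|)^{2} 1[K_{0} + |\epsilon| > \ell]\) for a truncation level ℓ, which in the proof grows like a constant multiple of \(n^{\xi} t\); it uses the half-normal moments of |ε| restricted to a tail.

```python
import math


def truncated_second_moment(K0, sigma, level):
    """E[(K0 + |eps|)^2 1{K0 + |eps| > level}] for eps ~ N(0, sigma^2) (assumed Gaussian)."""
    # K0 + |eps| > level  <=>  |eps| > s * sigma, with s = max(level - K0, 0) / sigma
    s = max(level - K0, 0.0) / sigma
    phi = math.exp(-s * s / 2) / math.sqrt(2 * math.pi)  # standard normal density at s
    tail = math.erfc(s / math.sqrt(2))                    # P(|eps| > s * sigma)
    m0 = tail                                             # E[1{...}]
    m1 = 2 * sigma * phi                                  # E[|eps| 1{...}]
    m2 = sigma ** 2 * (2 * s * phi + tail)                # E[eps^2 1{...}]
    return K0 ** 2 * m0 + 2 * K0 * m1 + m2
```

At level 0 this returns the untruncated value \(K_{0}^{2} + 2K_{0}\sigma\sqrt{2/\pi} + \sigma^{2}\); evaluating it along ℓ = n^{1/4} for n = 10², 10⁴, 10⁶, 10⁸ shows the rapid decay to zero.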

Next, we apply Corollary 1 to deduce the limit process. Note that for \(\theta \in {\Theta }_{n_{1}}^{\tau }\) and |h|≤ L,

$$ \begin{array}{@{}rcl@{}} \zeta_{n_{2}}(h, \theta) & = & n_{2}^{1 - \xi } \left( \alpha_{n} 1(h \leq 0) + \beta_{n} 1(h > 0) - \frac{\alpha + \beta}{2}\right) \frac{h n^{-\eta}}{2K n_{1}^{-\gamma}}\\ & = & (1-p)^{1 - \xi } \left( \alpha_{n} 1(h \leq 0) + \beta_{n} 1(h > 0) - \frac{\alpha + \beta}{2}\right) \frac{h n^{\xi}}{2K p^{-\gamma}} \\ & = & \frac{(1-p)^{1 - \xi} p^{\gamma} n^{\xi}}{2K} h \left( \alpha_{n} 1(h \leq 0) + \beta_{n} 1(h > 0) - \frac{\alpha_{n} + \beta_{n} }{2}\right) + R_{n}. \end{array} $$

The remainder term Rn in the last step accounts for replacing α + β by αn + βn in the expression for \(\zeta _{n_{2}}\) and is bounded (uniformly in \(\theta \in {\Theta }_{n_{1}}^{\tau }\)) up to a constant by

$$ n^{\xi} L \left( |\alpha_{n} - \alpha| + |\beta_{n} - \beta| \right) = O(n^{\xi-1/2}) .$$
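The drift computation, including the size of the remainder Rn, can be checked numerically. In the Python sketch below (hypothetical names), n1 = pn and n2 = (1 − p)n as in the display, the levels satisfy \(\beta_{n} = \alpha_{n} + c_{0} n^{-\xi}\) (matching the identity \(\beta_{n} - \alpha_{n} = c_{0} n^{-\xi}\) used earlier), and the components α, β of 𝜃 are offset from αn, βn by amounts d_alpha, d_beta. With no offset, the first line of the display equals \(|h|(1-p)^{1-\xi}p^{\gamma}c_{0}/(4K)\) exactly for h of either sign; offsets of order \(n_{1}^{-1/2}\) produce a remainder shrinking like \(n^{\xi-1/2}\).

```python
import math


def zeta(h, n, *, p, xi, gamma, K, c0, alpha_n, d_alpha=0.0, d_beta=0.0):
    """First line of the display for zeta_{n2}(h, theta), with n1 = p n and n2 = (1 - p) n."""
    n1, n2 = p * n, (1 - p) * n
    eta = 1 + gamma - 2 * xi
    beta_n = alpha_n + c0 * n ** (-xi)             # jump of size c0 n^{-xi}
    alpha, beta = alpha_n + d_alpha, beta_n + d_beta  # level components of theta
    level = alpha_n if h <= 0 else beta_n
    return (n2 ** (1 - xi) * (level - (alpha + beta) / 2)
            * h * n ** (-eta) / (2 * K * n1 ** (-gamma)))


def zeta_limit(h, *, p, xi, gamma, K, c0):
    """The claimed limit |h| (1 - p)^{1 - xi} p^gamma c0 / (4K)."""
    return abs(h) * (1 - p) ** (1 - xi) * p ** gamma * c0 / (4 * K)
```

For h ≤ 0 the centered level \(\alpha_{n} - (\alpha_{n}+\beta_{n})/2 = -c_{0}n^{-\xi}/2\) is negative and multiplies a negative h, which is why both signs of h give the same positive limit.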

As \(\xi < 1/2, \sqrt {n_{2}} P f_{n_{2}, h, \theta }\) converges uniformly to \(|h| \left ({(1-p)^{1 - \xi } p^{\gamma } c_{0}}\right )/({4K}) \). Condition (2.10) can be justified by calculations parallel to the above. Further, \(P f_{n_{2}, h, \theta } = \zeta _{n_{2}}(h,\theta )/\sqrt {n_{2}}\) converges to zero (uniformly over \(\theta \in {\Theta }_{n}^{\tau }\)) and hence, the covariance function of the limiting Gaussian process (for h1,h2 > 0) is given by

$$ \begin{array}{@{}rcl@{}} &&{\lim_{n \rightarrow \infty} P f_{n_{2},h_{1},{\theta}} f_{n_{2},h_{2},{\theta}} } \\ & = & \lim_{n \rightarrow \infty} n_{2}^{1- 2 \xi} \left[\left( \beta_{n} - \frac{\alpha + \beta}{2}\right)^{2} + \sigma^{2}\right] \frac{(h_{1} \wedge h_{2}) n^{-\eta}}{2K n_{1}^{-\gamma}}\\ & = & \frac{(1-p)^{1 - 2\xi} p^{\gamma} \sigma^{2} }{2K} (h_{1} \wedge h_{2}). \end{array} $$

Analogous results can be established for other choices of \((h_{1},h_{2}) \in [-L,L]^{2}\). Also, the above convergence can be shown to be uniform in \(\theta \in {\Theta }_{n}^{\tau }\) by a calculation similar to that done for \(\zeta _{n_{2}}\). This justifies the form of the limit Z. Hence, we get the result.
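The two limits used in the covariance computation can be spelled out. With n1 = pn and n2 = (1 − p)n the powers of n cancel, and for h > 0 the centered level splits using \(\beta_{n} - \alpha_{n} = c_{0} n^{-\xi}\) and \(|\alpha_{n} - \alpha| + |\beta_{n} - \beta| = O(n^{-1/2})\) (the same fact used to bound Rn), so its square vanishes in the limit whenever ξ > 0, leaving only σ²:

$$ \begin{array}{@{}rcl@{}} n_{2}^{1-2\xi} \frac{n^{-\eta}}{2K n_{1}^{-\gamma}} & = & \frac{(1-p)^{1-2\xi}p^{\gamma}}{2K} n^{1-2\xi-\eta+\gamma} = \frac{(1-p)^{1-2\xi}p^{\gamma}}{2K}, \\ \beta_{n} - \frac{\alpha+\beta}{2} & = & \frac{\beta_{n}-\alpha_{n}}{2} + \frac{(\alpha_{n}-\alpha) + (\beta_{n}-\beta)}{2} = \frac{c_{0} n^{-\xi}}{2} + O(n^{-1/2}). \end{array} $$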

Proof of Theorem 5

Since Var(Z(t) − Z(s)) ≠ 0 for s ≠ t, uniqueness of the argmin follows immediately from Lemma 2.6 of Kim and Pollard (1990). Also, \(Z(h) \rightarrow \infty \) as \(|h| \rightarrow \infty \) almost surely, because

$$Z(h) = |h| \left[ \sqrt{\frac{(1-p)^{1-2\xi} p^{\gamma}}{2K}}\sigma \frac{B(h)}{|h|} + \frac{(1-p)^{1-\xi} p^{\gamma}}{2K} \frac{c_{0}}{2} \right]$$

with B(h)/|h| converging to zero almost surely as \(|h| \rightarrow \infty \). Consequently, the unique argmin of Z is tight and \(Z \in C_{min}(\mathbb {R})\) with probability one. An application of the argmin continuous mapping theorem (Kim and Pollard 1990, Theorem 2.7) then gives the distributional convergence. Dividing Z by the positive constant \(\sqrt{(1-p)^{1-2\xi}p^{\gamma}/(2K)}\), which does not change its argmin, we see that

$$ \underset{h}{\arg\min} Z(h) = \underset{h}{\arg\min} \left[ \sigma B(h) + \sqrt{\frac{(1-p)p^{\gamma}}{2K}} \frac{c_{0}}{2} |h|\right]. $$

As \(\sigma \sqrt {\lambda _{0} } = \sqrt {({(1-p)p^{\gamma }})/({2K})} ({c_{0}}\lambda _{0} )/{2} \), by the rescaling property of Brownian motion,

$$ \begin{array}{@{}rcl@{}} &&{\underset{h}{\arg\min} \left[ \sigma B(h) + \sqrt{\frac{(1-p)p^{\gamma}}{2K}} \frac{c_{0}}{2} |h|\right]}\\ & = & \lambda_{0} \underset{v}{\arg\min} \left[ \sigma B(\lambda_{0} v) + \sqrt{\frac{(1-p)p^{\gamma}}{2K}} \frac{c_{0}}{2}|\lambda_{0}| |v|\right] \\ & \stackrel{d}{=} & \lambda_{0} \underset{v}{\arg\min} \left[ \sigma \sqrt{\lambda_{0} } B(v) + \sqrt{\frac{(1-p)p^{\gamma}}{2K}} \frac{c_{0}}{2} \lambda_{0} |v| \right] \\ & = & \lambda_{0} \underset{v}{\arg\min} \left[ B(v) + |v| \right]. \end{array} $$
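Solving the defining relation \(\sigma\sqrt{\lambda_{0}} = \sqrt{(1-p)p^{\gamma}/(2K)}\, c_{0}\lambda_{0}/2\) for λ0 > 0 gives \(\lambda_{0} = 8K\sigma^{2}/(c_{0}^{2}(1-p)p^{\gamma})\). The Python sketch below (hypothetical helper names) computes λ0 and draws approximate samples of \(\lambda_{0} \arg\min_{v}[B(v) + |v|]\) by discretizing a two-sided standard Brownian motion on a grid over [−T, T]. The grid spacing dt and the truncation T are assumptions of the illustration, and the grid argmin only approximates the continuous one.

```python
import math
import random


def lambda0(sigma, c0, p, gamma, K):
    """Solve sigma*sqrt(lam) = a*c0*lam/2 with a = sqrt((1-p) p^gamma / (2K)) for lam > 0."""
    a = math.sqrt((1 - p) * p ** gamma / (2 * K))
    return (2 * sigma / (a * c0)) ** 2


def argmin_bm_plus_abs(rng, T=8.0, dt=0.02):
    """One draw of argmin_v [B(v) + |v|] over the grid {-T, ..., T} with spacing dt,
    where B is a two-sided standard Brownian motion with B(0) = 0."""
    steps = int(round(T / dt))
    best_v, best_val = 0.0, 0.0          # value at v = 0 is B(0) + 0 = 0
    for sign in (1.0, -1.0):             # the two sides are independent Brownian motions
        b = 0.0
        for k in range(1, steps + 1):
            b += rng.gauss(0.0, math.sqrt(dt))  # independent Gaussian increments
            v = sign * k * dt
            val = b + abs(v)
            if val < best_val:
                best_v, best_val = v, val
    return best_v


def simulate_limit(sigma, c0, p, gamma, K, reps=1000, seed=1):
    """Approximate draws from lambda0 * argmin_v [B(v) + |v|]."""
    rng = random.Random(seed)
    lam = lambda0(sigma, c0, p, gamma, K)
    return [lam * argmin_bm_plus_abs(rng) for _ in range(reps)]
```

Because the drift is symmetric and the two sides of B are simulated independently, the sample mean should be near zero, and the spread of the draws scales linearly in λ0, mirroring the rescaling step above.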


Cite this article

Mallik, A., Banerjee, M. & Michailidis, G. M-estimation in Multistage Sampling Procedures. Sankhya A 82, 261–309 (2020).

Keywords

  • Cusp estimation
  • M-Estimation
  • Multistage sampling procedures

AMS (2000) subject classification

  • Primary 62E20
  • 62G20
  • Secondary 62L99