Abstract
This paper is concerned with finite sample approximations to the supremum of a non-degenerate U-process of a general order indexed by a function class. We are primarily interested in situations where the function class as well as the underlying distribution change with the sample size, and the U-process itself is not weakly convergent as a process. Such situations arise in a variety of modern statistical problems. We first consider Gaussian approximations, namely, we approximate the U-process supremum by the supremum of a Gaussian process, and derive coupling and Kolmogorov distance bounds. Such Gaussian approximations are, however, often not directly applicable in statistical problems, since the covariance function of the approximating Gaussian process is unknown. This motivates us to study bootstrap-type approximations to the U-process supremum. We propose a novel jackknife multiplier bootstrap (JMB) tailored to the U-process, and derive coupling and Kolmogorov distance bounds for the proposed JMB method. All these results are non-asymptotic and are established under fairly general conditions on the function classes and underlying distributions. The key technical tools in the proofs are new local maximal inequalities for U-processes, which may be useful in other problems. We also discuss applications of the general approximation results to testing for qualitative features of nonparametric functions based on generalized local U-processes.

References
Abrevaya, J., Jiang, W.: A nonparametric approach to measuring and testing curvature. J. Bus. Econ. Stat. 23(1), 1–19 (2005)
Adamczak, R.: Moment inequalities for U-statistics. Ann. Probab. 34(6), 2288–2314 (2006)
Arcones, M., Giné, E.: On the bootstrap of \(U\)- and \(V\)-statistics. Ann. Stat. 20(2), 655–674 (1992)
Arcones, M., Giné, E.: Limit theorems for \(U\)-processes. Ann. Probab. 21(3), 1495–1542 (1993)
Arcones, M., Giné, E.: U-processes indexed by Vapnik–Červonenkis classes of functions with applications to asymptotics and bootstrap of U-statistics with estimated parameters. Stoch. Process. Appl. 52(1), 17–38 (1994)
Bickel, P.J., Freedman, D.A.: Some asymptotic theory for the bootstrap. Ann. Stat. 9(6), 1196–1217 (1981)
Blundell, R., Gosling, A., Ichimura, H., Meghir, C.: Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica 75(2), 323–363 (2007)
Borovskikh, Y.V.: U-Statistics in Banach Spaces. VSP International Science Publishers, Zeist (1996)
Bretagnolle, J.: Lois limites du bootstrap de certaines fonctionnelles. Annales de l’Institut Henri Poincaré Section B 19(3), 281–296 (1983)
Callaert, H., Veraverbeke, N.: The order of the normal approximation for a Studentized \(U\)-statistic. Ann. Stat. 9(1), 360–375 (1981)
Chen, X.: Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications. Ann. Stat. 46(2), 642–678 (2018)
Chernozhukov, V., Chetverikov, D., Kato, K.: Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 41(6), 2786–2819 (2013)
Chernozhukov, V., Chetverikov, D., Kato, K.: Anti-concentration and honest, adaptive confidence bands. Ann. Stat. 42(5), 1787–1818 (2014)
Chernozhukov, V., Chetverikov, D., Kato, K.: Gaussian approximation of suprema of empirical processes. Ann. Stat. 42(4), 1564–1597 (2014)
Chernozhukov, V., Chetverikov, D., Kato, K.: Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings. Stoch. Process. Appl. 126(12), 3632–3651 (2016)
Chetverikov, D.: Testing regression monotonicity in econometric models. arXiv:1212.6757 (2012)
Davydov, Y., Lifshits, M., Smorodina, N.: Local Properties of Distributions of Stochastic Functionals (Translations of Mathematical Monographs, Vol. 173). American Mathematical Society, Providence (1998)
de la Peña, V., Giné, E.: Decoupling: From Dependence to Independence. Springer, Berlin (1999)
Dehling, H., Mikosch, T.: Random quadratic forms and the bootstrap for \(U\)-statistics. J. Multivar. Anal. 51(2), 392–413 (1994)
Dudley, R.M.: Real Analysis and Probability. Cambridge University Press, Cambridge (2002)
Dümbgen, L.: Application of local rank tests to nonparametric regression. J. Nonparametric Stat. 14(5), 511–537 (2002)
Einmahl, U., Mason, D.M.: Uniform in bandwidth consistency of kernel-type function estimators. Ann. Stat. 33(3), 1380–1403 (2005)
Ellison, G., Ellison, S.F.: Strategic entry deterrence and the behavior of pharmaceutical incumbents prior to patent expiration. Am. Econ. J. Microecon. 3(1), 1–36 (2011)
Frees, E.W.: Estimating densities of functions of observations. J. Am. Stat. Assoc. 89(426), 517–525 (1994)
Ghosal, S., Sen, A., van der Vaart, A.: Testing monotonicity of regression. Ann. Stat. 28(4), 1054–1082 (2000)
Giné, E., Latała, R., Zinn, J.: Exponential and moment inequalities for \(U\)-statistics. In: High Dimensional Probability II. Birkhäuser, Boston (2000)
Giné, E., Mason, D.M.: On local \(U\)-statistic processes and the estimation of densities of functions of several sample variables. Ann. Stat. 35(3), 1105–1145 (2007)
Giné, E., Nickl, R.: Uniform limit theorems for wavelet density estimators. Ann. Probab. 37(4), 1605–1646 (2009)
Giné, E., Nickl, R.: Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press, Cambridge (2016)
Hall, P.: On convergence rates of suprema. Probab. Theory Relat. Fields 89(4), 447–455 (1991)
Hoeffding, W.: A class of statistics with asymptotically normal distributions. Ann. Math. Stat. 19(3), 293–325 (1948)
Hušková, M., Janssen, P.: Consistency of the generalized bootstrap for degenerate \(U\)-statistics. Ann. Stat. 21(4), 1811–1823 (1993)
Hušková, M., Janssen, P.: Generalized bootstrap for studentized \(U\)-statistics: a rank statistic approach. Stat. Probab. Lett. 16(3), 225–233 (1993)
Janssen, P.: Weighted bootstrapping of \(U\)-statistics. J. Stat. Plann. Inference 38(1), 31–42 (1994)
Koltchinskii, V.I.: Komlos–Major–Tusnády approximation for the general empirical process and Haar expansions of classes of functions. J. Theor. Probab. 7(1), 73–118 (1994)
Komlós, J., Major, P., Tusnády, G.: An approximation of partial sums of independent rv’s and the sample df. I. Z. Wahrscheinlichkeitstheor. Verw. Geb. 32(1–2), 111–131 (1975)
Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Springer, New York (1991)
Lee, S., Linton, O., Whang, Y.-J.: Testing for stochastic monotonicity. Econometrica 77(2), 585–602 (2009)
Lo, A.Y.: A large sample study of the Bayesian bootstrap. Ann. Stat. 15(1), 360–375 (1987)
Mason, D.M., Newton, M.A.: A rank statistics approach to the consistency of a general bootstrap. Ann. Stat. 20(3), 1611–1624 (1992)
Massart, P.: Strong approximation for multivariate empirical and related processes, via KMT constructions. Ann. Probab. 17(1), 266–291 (1989)
Monrad, D., Philipp, W.: Nearby variables with nearby conditional laws and a strong approximation theorem for Hilbert space valued martingales. Probab. Theory Relat. Fields 88(3), 381–404 (1991)
Nolan, D., Pollard, D.: \(U\)-processes: rates of convergence. Ann. Stat. 15(2), 780–799 (1987)
Nolan, D., Pollard, D.: Functional limit theorems for \(U\)-processes. Ann. Probab. 16(3), 1291–1298 (1988)
Piterbarg, V.I.: Asymptotic Methods in the Theory of Gaussian Processes and Fields. American Mathematical Society, Providence (1996)
Resnick, S.I.: Extreme Values, Regular Variation, and Point Processes. Springer, Berlin (1987)
Rio, E.: Local invariance principles and their application to density estimation. Probab. Theory Relat. Fields 98(1), 21–45 (1994)
Rubin, D.B.: The Bayesian bootstrap. Ann. Stat. 9(1), 130–134 (1981)
Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (1980)
Sherman, R.P.: Limiting distribution of the maximal rank correlation estimator. Econometrica 61(1), 123–137 (1993)
Sherman, R.P.: Maximal inequalities for degenerate \(U\)-processes with applications to optimization estimators. Ann. Stat. 22(1), 439–459 (1994)
Solon, G.: Intergenerational income mobility in the United States. Am. Econ. Rev. 82(3), 393–408 (1992)
van der Vaart, A., Wellner, J.A.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, Berlin (1996)
van der Vaart, A., Wellner, J.A.: A local maximal inequality under uniform entropy. Electron. J. Stat. 5, 192–203 (2011)
Wang, Q., Jing, B.-Y.: Weighted bootstrap for \(U\)-statistics. J. Multivar. Anal. 91(2), 177–198 (2004)
Zhang, D.: Bayesian bootstraps for U-processes, hypothesis tests and convergence of Dirichlet U-processes. Stat. Sin. 11(2), 463–478 (2001)
Acknowledgements
The authors would like to thank the anonymous referees and an Associate Editor for their constructive comments, which improved the quality of this paper.
X. Chen is supported by NSF DMS-1404891, NSF CAREER Award DMS-1752614, and UIUC Research Board Awards (RB17092, RB18099).
Appendices
Appendix A. Supporting lemmas
This appendix collects some supporting lemmas that are repeatedly used in the main text.
Lemma A.1
(An anti-concentration inequality for the Gaussian supremum) Let \((S,{\mathcal {S}},P)\) be a probability space, and let \({\mathcal {G}}\subset L^{2}( P )\) be a P-pre-Gaussian class of functions. Denote by \(W_{P}\) a tight Gaussian random variable in \(\ell ^{\infty }({\mathcal {G}})\) with mean zero and covariance function \({\mathbb {E}}[ W_{P}(g) W_{P}(g') ] = \mathrm {Cov}_{P}(g,g')\) for all \(g,g' \in {\mathcal {G}}\) where \(\mathrm {Cov}_{P}(\cdot ,\cdot )\) denotes the covariance under P. Suppose that there exist constants \({\underline{\sigma }}, {\overline{\sigma }}>0\) such that \({\underline{\sigma }}^{2} \leqslant \mathrm {Var}_{P}(g) \leqslant {\overline{\sigma }}^{2}\) for all \(g \in {\mathcal {G}}\). Then for every \(\varepsilon > 0\),
\[ \sup _{t \in {\mathbb {R}}} {\mathbb {P}}\left( \bigl | \Vert W_{P} \Vert _{{\mathcal {G}}} - t \bigr | \leqslant \varepsilon \right) \leqslant C_{\sigma }\, \varepsilon \left( {\mathbb {E}}\left[ \Vert W_{P} \Vert _{{\mathcal {G}}} \right] + \sqrt{1 \vee \log ({\underline{\sigma }}/\varepsilon )} \right), \]
where \(C_{\sigma }\) is a constant depending only on \({\underline{\sigma }}\) and \({\overline{\sigma }}\).
Proof
See Lemma A.1 in [14]. \(\square \)
Lemma A.2
Let \({\mathcal {F}}\) be a class of real-valued measurable functions on a measurable space \(({\mathcal {X}},{\mathcal {A}})\) with finite measurable envelope F. Then for any probability measure R on \(({\mathcal {X}},{\mathcal {A}})\) such that \(RF^{2} < \infty \), we have
for every \(0 < \varepsilon \leqslant 1\), where \(\sup _{Q}\) is taken over all finitely discrete distributions on \({\mathcal {X}}\).
Proof
This follows from approximating R by a finitely discrete distribution. See Problem 2.5.1 in [53]. \(\square \)
Lemma A.3
Let \(({\mathcal {X}},{\mathcal {A}}), ({\mathcal {Y}},{\mathcal {C}})\) be measurable spaces and let \({\mathcal {F}}\) be a class of real-valued jointly measurable functions on \({\mathcal {X}}\times {\mathcal {Y}}\) with finite measurable envelope F. Let R be a probability measure on \(({\mathcal {Y}},{\mathcal {C}})\) and for a jointly measurable function \(f: {\mathcal {X}}\times {\mathcal {Y}}\rightarrow {\mathbb {R}}\), define \({\overline{f}}: {\mathcal {X}}\rightarrow {\mathbb {R}}\) by \({\overline{f}}(x) := \int f(x,y) dR(y)\) whenever the latter integral is defined and finite for every \(x \in {\mathcal {X}}\). Suppose that \({\overline{F}}\) is everywhere finite and let \({\overline{{\mathcal {F}}}} = \{ {\overline{f}} : f \in {\mathcal {F}}\}\). Then, for every \(r,s \in [1,\infty )\),
where \(\sup _{Q}\) and \(\sup _{Q'}\) are taken over all finitely discrete distributions on \({\mathcal {X}}\) and \({\mathcal {X}}\times {\mathcal {Y}}\), respectively.
Proof
This follows from Lemma A.2 in [25] combined with Lemma A.2. \(\square \)
If \(R=\delta _{y}\) for some \(y \in {\mathcal {Y}}\), then \(\Vert \delta _{y} f \Vert _{Q,r}^{r} = \Vert f \Vert _{Q \times \delta _{y},r}^{r}\) (with \(\delta _{y} f(x) = f(x,y)\)) and \(Q \times \delta _{y}\) is finitely discrete if Q is so. Hence, we have the following corollary.
Corollary A.4
Under the setting of Lemma A.3, for every \(y \in {\mathcal {Y}}\) and \(r \in [1,\infty )\),
Lemma A.5
Let \({\mathcal {F}}\) and \({\mathcal {G}}\) be function classes on a set \({\mathcal {X}}\) with finite envelopes F and G, respectively. If \({\mathcal {F}}\cdot {\mathcal {G}}\) stands for the class of pointwise products of functions from \({\mathcal {F}}\) and \({\mathcal {G}}\), then for any \(r \in [1,\infty )\),
where \(\sup _{Q}\) is taken over all finitely discrete distributions on \({\mathcal {X}}\).
Proof
See Lemma A.1 in [25] or [53, Section 2.10.3]. \(\square \)
Appendix B. Strassen–Dudley theorem and its conditional version
In this appendix, we state the Strassen–Dudley theorem together with its conditional version due to [42]. These results play fundamental roles in the proofs of Proposition 2.1 and Theorem 3.1. In what follows, let (S, d) be a Polish metric space equipped with its Borel \(\sigma \)-field \({\mathcal {B}}(S)\). For any set \(A \subset S\) and \(\delta > 0\), let \(A^{\delta } = \{ x \in S : \inf _{y \in A} d(x,y) \leqslant \delta \}\). We first state the Strassen–Dudley theorem.
Theorem B.1
(Strassen–Dudley) Let X be an S-valued random variable defined on a probability space \((\Omega ,{\mathcal {A}},{\mathbb {P}})\) which admits a uniform random variable on (0, 1) independent of X. Let \(\alpha , \beta >0\) be given constants, and let G be a Borel probability measure on S such that \({\mathbb {P}}(X \in A) \leqslant G(A^{\alpha })+ \beta \) for all \(A \in {\mathcal {B}}(S)\). Then there exists an S-valued random variable Y such that \({\mathcal {L}}(Y) (:= {\mathbb {P}}\circ Y^{-1}) = G\) and \({\mathbb {P}}(d(X,Y) > \alpha ) \leqslant \beta \).
For a proof of the Strassen–Dudley theorem, we refer to [20]. Next, we state a conditional version of the Strassen–Dudley theorem due to [42, Theorem 4].
Theorem B.2
(Conditional version of Strassen–Dudley) Let X be an S-valued random variable defined on a probability space \((\Omega ,{\mathcal {A}},{\mathbb {P}})\), and let \({\mathcal {G}}\) be a countably generated sub \(\sigma \)-field of \({\mathcal {A}}\). Suppose that there is a uniform random variable on (0, 1) independent of \({\mathcal {G}}\vee \sigma (X)\), and let \(\Omega \times {\mathcal {B}}(S) \ni (\omega ,A) \mapsto G(A \mid {\mathcal {G}}) (\omega )\) be a regular conditional distribution given \({\mathcal {G}}\), i.e., for each fixed \(A \in {\mathcal {B}}(S)\), \(G(A \mid {\mathcal {G}})\) is measurable with respect to \({\mathcal {G}}\) and for each fixed \(\omega \in \Omega \), \(G(\cdot \mid {\mathcal {G}})(\omega )\) is a probability measure on \({\mathcal {B}}(S)\). If
\[ {\mathbb {E}}^{*}\left[ \sup _{A \in {\mathcal {B}}(S)} \left\{ {\mathbb {P}}(X \in A \mid {\mathcal {G}}) - G(A^{\alpha } \mid {\mathcal {G}}) \right\} \right] \leqslant \beta , \]
then there exists an S-valued random variable Y such that the conditional distribution of Y given \({\mathcal {G}}\) is identical to \(G(\cdot \mid {\mathcal {G}})\), and \({\mathbb {P}}( d(X,Y) > \alpha ) \leqslant \beta \).
Remark B.1
(i) The map \((\omega ,A) \mapsto {\mathbb {P}}(X \in A \mid {\mathcal {G}})(\omega )\) should be understood as a regular conditional distribution (which is guaranteed to exist since X takes values in a Polish space). (ii) \({\mathbb {E}}^{*}\) denotes the outer expectation.
For completeness, we provide a self-contained proof of Theorem B.2, since [42] does not contain a direct proof.
Proof of Theorem B.2
Since \({\mathcal {G}}\) is countably generated, there exists a real-valued random variable W such that \({\mathcal {G}}= \sigma (W)\). For \(n=1,2,\dots \) and \(k \in {\mathbb {Z}}\), let \(D_{n,k} = \{ k/2^{n} \leqslant W < (k+1)/2^{n} \}\). For each n, \(\{ D_{n,k} : k \in {\mathbb {Z}} \}\) forms a partition of \(\Omega \). Pick any D from \(\{ D_{n,k} : n =1,2,\dots ; k \in {\mathbb {Z}} \}\); let \({\mathbb {P}}_{D} = {\mathbb {P}}(\cdot \mid D)\) and \(G(\cdot \mid D) = \int G(\cdot \mid {\mathcal {G}}) d{\mathbb {P}}_{D}\). Then, the Strassen–Dudley theorem yields that there exists an S-valued random variable \(Y_{D}\) such that \({\mathbb {P}}_{D} \circ Y_{D}^{-1} = G(\cdot \mid D)\) and \({\mathbb {P}}_{D}(d(X,Y_{D}) > \alpha ) \leqslant \varepsilon (D) := \sup _{A \in {\mathcal {B}}(S)} \{ {\mathbb {P}}_{D}(X \in A) - G(A^{\alpha } \mid D) \}\).
For each \(n=1,2,\dots \), let \(Y_{n} = \sum _{k \in {\mathbb {Z}}} Y_{D_{n,k}} 1_{D_{n,k}}\), and observe that
Let M be any (proper) random variable such that \(M \geqslant \sup _{A \in {\mathcal {B}}(S)} \{ {\mathbb {P}}(X \in A \mid {\mathcal {G}}) - G(A^{\alpha } \mid {\mathcal {G}}) \}\), and observe that
where the notation \({\mathbb {E}}^{{\mathbb {P}}_{D}}\) denotes the expectation under \({\mathbb {P}}_{D}\). So,
and taking the infimum over M shows that the left-hand side is bounded by \(\beta \).
Next, we shall verify that \(\{ {\mathcal {L}}(Y_{n}) : n \geqslant 1 \}\) is uniformly tight. In fact,
and since any Borel probability measure on a Polish space is tight by Ulam’s theorem, \(\{ {\mathcal {L}}(Y_{n}) : n \geqslant 1 \}\) is uniformly tight. This implies that the family of joint laws \(\{ {\mathcal {L}}(X,W,Y_{n}) : n \geqslant 1 \}\) is uniformly tight and hence has a weakly convergent subsequence by Prohorov’s theorem. Let \({\mathcal {L}}(X,W,Y_{n'}) {\mathop {\rightarrow }\limits ^{w}} Q\) (the notation \({\mathop {\rightarrow }\limits ^{w}}\) denotes weak convergence), and observe that the marginal law of Q on the “first two” coordinates, \(S \times {\mathbb {R}}\), is identical to \({\mathcal {L}}(X,W)\).
We shall verify that there exists an S-valued random variable Y such that \({\mathcal {L}}(X,W,Y) =Q\). Since S is Polish, there exists a unique regular conditional distribution, \({\mathcal {B}}(S) \times (S \times {\mathbb {R}}) \ni (A,(x,w)) \mapsto Q_{x,w}(A) \in [0,1]\), for Q given the first two coordinates. By the Borel isomorphism theorem [20, Theorem 13.1.1], there exists a bijective map \(\pi \) from S onto a Borel subset of \({\mathbb {R}}\) such that \(\pi \) and \(\pi ^{-1}\) are Borel measurable. Pick and fix any \((x,w) \in S \times {\mathbb {R}}\), and observe that \(Q_{x,w} \circ \pi ^{-1}\) extends to a Borel probability measure on \({\mathbb {R}}\). Denote by \(F_{x,w}\) the distribution function of \(Q_{x,w} \circ \pi ^{-1}\), and let \(F_{x,w}^{-1}\) denote its quantile function. Let U be a uniform random variable on (0, 1) (defined on \((\Omega ,{\mathcal {A}},{\mathbb {P}})\)) independent of (X, W). Then \(F_{x,w}^{-1} (U)\) has law \(Q_{x,w} \circ \pi ^{-1}\), and hence \(Y = \pi ^{-1} \circ F_{X,W}^{-1} (U)\) is the desired random variable.
Now, for any bounded continuous function f on S, observe that, whenever \(N \geqslant n\), \({\mathbb {E}}[ f(Y_{N})1_{D_{n,k}} ] = \int _{D_{n,k}} \int f(y) G(dy \mid {\mathcal {G}}) d{\mathbb {P}}\), which implies that the conditional distribution of Y given \({\mathcal {G}}\) is identical to \(G( \cdot \mid {\mathcal {G}})\). Finally, the Portmanteau theorem yields \({\mathbb {P}}(d(X,Y)> \alpha ) \leqslant \liminf _{n'} {\mathbb {P}}(d(X,Y_{n'}) > \alpha ) \leqslant \beta \). This completes the proof. \(\square \)
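The key step above, that \(F_{x,w}^{-1}(U)\) has the prescribed law, is the classical inverse-CDF (quantile) transform. As a minimal sketch, the following Python snippet (illustrative only; the function names are ours, not the paper's) builds the quantile function of a finitely supported law on \({\mathbb {R}}\) and checks that plugging in uniform draws reproduces the target probabilities:

```python
import random

def quantile_fn(atoms, probs):
    """Return the quantile function u -> F^{-1}(u) of a discrete law on R.

    This mirrors the coupling step in the proof, where F_{x,w}^{-1}(U)
    is shown to have law Q_{x,w} (here for a finitely supported measure).
    """
    cum, s = [], 0.0
    for p in probs:
        s += p
        cum.append(s)  # cumulative distribution at each atom

    def Finv(u):
        # F^{-1}(u) = inf{ t : F(t) >= u } for a step CDF
        for a, c in zip(atoms, cum):
            if u <= c:
                return a
        return atoms[-1]

    return Finv

atoms, probs = [0.0, 1.0, 3.0], [0.2, 0.5, 0.3]
Finv = quantile_fn(atoms, probs)

random.seed(0)
draws = [Finv(random.random()) for _ in range(20000)]
freq1 = sum(1 for d in draws if d == 1.0) / len(draws)  # close to 0.5
```

The same construction works for any law on a Polish space after transporting it to \({\mathbb {R}}\) via a Borel isomorphism \(\pi \), exactly as in the proof.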
Appendix C. Additional proofs for the main text
C.1. Proof of Lemma 6.1
We begin by noting that \({\mathcal {G}}\) is VC type with characteristics \(4\sqrt{A}\) and 2v for envelope G. The rest of the proof is almost the same as that of Theorem 2.1 in [15] with \(B(f) \equiv 0\) (up to adjustments of the notation), but we now allow \(q=\infty \). To avoid repetition, we only point out the required modifications. In what follows, we freely use the notation in the proof of [15, Theorem 2.1], but modify \(K_{n}\) to \(K_{n} = v \log (A \vee n)\); C refers to a universal constant whose value may vary from place to place. In Step 1, change \(\varepsilon \) to \(\varepsilon =1/n^{1/2}\). For this choice, \(\log N({\mathcal {F}},e_{P},\varepsilon b) \leqslant C \log (Ab/(\varepsilon b)) = C\log (A/\varepsilon ) \leqslant CK_{n}\), and Dudley’s entropy integral bound yields that \({\mathbb {E}}[ \Vert G_{P} \Vert _{{\mathcal {F}}_{\varepsilon }}] \leqslant C\varepsilon b \sqrt{\log (Ab/(\varepsilon b))} \leqslant Cb\sqrt{K_{n}/n}\) (there is a slip in the estimate of \({\mathbb {E}}[\Vert G_{P}\Vert _{{\mathcal {F}}_{\varepsilon }}]\) in [15], namely, “\(Ab/\varepsilon \)” inside the log should read “\(Ab/(\varepsilon b)\)”, which of course does not affect the proof under their definition of \(K_{n}\)). Combining this with the Borell–Sudakov–Tsirel’son inequality yields that \({\mathbb {P}}\{ \Vert G_{P}\Vert _{{\mathcal {F}}_{\varepsilon }} > C b\sqrt{K_{n}/n} \} \leqslant 2n^{-1}\). In Step 3, Corollary 5.5 in the present paper (with \(r=k=1\)) yields that \({\mathbb {E}}[ \Vert {\mathbb {G}}_{n} \Vert _{{\mathcal {F}}_{\varepsilon }}] \leqslant C(b\sqrt{K_{n}/n} + bK_{n}/n^{1/2-1/q}) \leqslant CbK_{n}/n^{1/2-1/q}\), which is valid even when \(q=\infty \). Then, instead of applying their Lemma 6.1, we apply Markov’s inequality to deduce that
In Step 4, instead of their equation (14), we have
whenever \(\delta \geqslant 2c\sigma ^{-1/2}(\log N)^{3/2} \cdot (\log n)\) for some universal constant c (\(C_{7}\) comes from their Theorem 3.1 and is universal). Finally, in Step 5, take
for some large but universal constant \(C' > 1\). Under the assumption that \(K_{n}^{3} \leqslant n\), this choice ensures that \(\delta \geqslant 2c\sigma ^{-1/2}(\log N)^{3/2} \cdot (\log n)\), and
It remains to bound \(M_{n,X}(\delta )\). For finite q, their Step 4 shows that
Since \(\log N \leqslant C''K_{n}\) for some universal constant \(C''\), the right hand side is bounded by
Since \(K_{n}\) is bounded from below by a universal positive constant (by assumption), and \(\gamma \in (0,1)\), by taking \(C' > C''\), the above term is bounded by \(\gamma \) up to a universal constant.
Now, consider the \(q=\infty \) case. In that case, \(\max _{1 \leqslant j \leqslant N}| {\widetilde{X}}_{1j} | \leqslant 2b\) almost surely and \(\delta \sqrt{n}/\log N \geqslant 2C'b/(C''\gamma ) > 2b\) provided that \(C' > C''\). Hence \(M_{n,X}(\delta ) =0\) in that case. These modifications lead to the desired conclusion. \(\square \)
C.2. Proofs for Sect. 4
We first prove Theorem 4.2 and Corollary 4.3, and then prove Lemma 4.1 and Theorem 4.4.
Proof of Theorem 4.2
In what follows, the notation \(\lesssim \) signifies that the left hand side is bounded by the right hand side up to a constant that depends only on \(r,m,\zeta ,c_1,c_2,C_1,L\). We also write \(a \simeq b\) if \(a \lesssim b\) and \(b \lesssim a\). In addition, let \(c,C,C'\) denote generic constants depending only on \(r, m,\zeta , c_{1},c_{2}, C_{1}, L\); their values may vary from place to place. We divide the rest of the proof into three steps.
Step 1 Let
In this step, we shall show that the result (15) holds with \({\widehat{S}}_{n}\) and \({\widehat{S}}_{n}^{\sharp }\) replaced by \(S_{n}\) and \(S_{n}^{\sharp }\), respectively.
We first verify Conditions (PM), (VC), (MT), and (5) for the function class
with a symmetric envelope
Condition (PM) follows from our assumption. For Condition (VC), that \({\mathcal {H}}_{n}\) is VC type with characteristics \((A', v')\) satisfying \(\log A' \lesssim \log n\) and \(v' \lesssim 1\) follows from a slight modification of the proof of Lemma 3.1 in [25]. The latter part follows from our assumption. Condition (VC) guarantees the existence of a tight Gaussian random variable \({\mathcal {W}}_{P,n}(g), g \in P^{r-1}{\mathcal {H}}_{n} =: {\mathcal {G}}_{n}\) in \(\ell ^{\infty }({\mathcal {G}}_{n})\) with mean zero and covariance function \({\mathbb {E}}[{\mathcal {W}}_{P,n}(g){\mathcal {W}}_{P,n}(g')] = \mathrm {Cov}_{P}(g,g')\) for \(g,g' \in {\mathcal {G}}_{n}\). Let \(W_{P,n} (\vartheta ) = {\mathcal {W}}_{P,n}(g_{n,\vartheta })\) for \(\vartheta \in \Theta \) where \(g_{n,\vartheta } = b_{n}^{m/2} c_{n}(\vartheta )^{-1} P^{r-1}h_{n,\vartheta }\). It is seen that \(W_{P,n}(\vartheta ), \vartheta \in \Theta \) is a tight Gaussian random variable in \(\ell ^{\infty }(\Theta )\) with mean zero and covariance function (14).
Next, we determine the values of parameters \({\underline{\sigma }}_{{\mathfrak {g}}}, {\overline{\sigma }}_{{\mathfrak {g}}}, b_{{\mathfrak {g}}}, \sigma _{{\mathfrak {h}}}, b_{{\mathfrak {h}}}, \chi _{n},\nu _{{\mathfrak {h}}}\) for the function class \({\mathcal {H}}_n\). We will show in Step 3 that we may choose
and bound \(\nu _{{\mathfrak {h}}}\) and \(\chi _{n}\) as
Given these choices and bounds, Corollaries 2.2 and 3.2 yield that
Step 2 Observe that
We shall bound \(\sup _{\vartheta \in \Theta } | c_{n}(\vartheta )/{\widehat{c}}_{n}(\vartheta ) - 1|, \Vert \sqrt{n}U_{n} \Vert _{{\mathcal {H}}_{n}}\), and \(\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{n}}\).
Choose \(n_{0}\) to be the smallest n such that \(C_{1}n^{-c_{2}} \leqslant 1/2\); it is clear that \(n_{0}\) depends only on \(c_{2}\) and \(C_{1}\). It suffices to prove (15) for \(n \geqslant n_{0}\), since for \(n < n_{0}\) the result (15) becomes trivial by taking C sufficiently large. So let \(n \geqslant n_{0}\). Then Condition (T8) ensures that, with probability at least \(1-C_{1}n^{-c_{2}}\), \(\inf _{\vartheta \in \Theta } {\widehat{c}}_{n}(\vartheta )/c_{n}(\vartheta ) \geqslant 1/2\). Since \(| a^{-1} - 1 | \leqslant 2 | a - 1 |\) for \(a \geqslant 1/2\), Condition (T8) also ensures that
Next, we shall bound \(\Vert \sqrt{n}U_{n} \Vert _{{\mathcal {H}}_{n}}\) and \(\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{n}}\). Given (38) and (39), and in view of the fact that the covering number of \({\mathcal {H}}_{n} \cup (-{\mathcal {H}}_{n}) := \{ h,-h : h \in {\mathcal {H}}_{n} \}\) is at most twice that of \({\mathcal {H}}_{n}\), applying Corollaries 2.2 and 3.2 to the function class \({\mathcal {H}}_{n} \cup (-{\mathcal {H}}_{n})\), we deduce that
(Theorem 3.7.28 in [29] ensures that the Gaussian process \({\mathcal {W}}_{P,n}\) extends to the symmetric convex hull of \({\mathcal {G}}_{n}\) in such a way that \({\mathcal {W}}_{P,n}\) has linear, bounded, and uniformly continuous (with respect to the intrinsic pseudometric) sample paths; in particular, \(\{ {\mathcal {W}}_{P,n}(g) : g \in {\mathcal {G}}_{n} \cup (-{\mathcal {G}}_{n}) \}\) is a tight Gaussian random variable in \(\ell ^{\infty }({\mathcal {G}}_{n} \cup (-{\mathcal {G}}_{n}))\) with mean zero and covariance function \({\mathbb {E}}[{\mathcal {W}}_{P,n}(g){\mathcal {W}}_{P,n}(g')] = \mathrm {Cov}_{P}(g,g')\) for \(g,g' \in {\mathcal {G}}_{n} \cup (-{\mathcal {G}}_{n})\) and \(\sup _{g \in {\mathcal {G}}_{n} \cup (-{\mathcal {G}}_{n})} {\mathcal {W}}_{P,n}(g) = \Vert {\mathcal {W}}_{P,n} \Vert _{{\mathcal {G}}_{n}}\).) Dudley’s entropy integral bound and the Borell–Sudakov–Tsirel’son inequality yield that \({\mathbb {P}}\{ \Vert {\mathcal {W}}_{P,n} \Vert _{{\mathcal {G}}_{n}} > C(\log n)^{1/2} \} \leqslant 2n^{-1}\), so that
Now, the desired result (15) follows from combining (40)–(43) and the anti-concentration inequality (Lemma A.1). In fact, the anti-concentration inequality yields
Hence, combining the bounds (40)–(44), we have for every \(t \in {\mathbb {R}}\),
and likewise \({\mathbb {P}}({\widehat{S}}_{n} \leqslant t) \geqslant {\mathbb {P}}({\widetilde{S}}_{n} \leqslant t) - Cn^{-c}\). Similarly, we have
Step 3 It remains to verify (38) and (39). First, that we may choose \({\underline{\sigma }}_{{\mathfrak {g}}} \simeq 1\) follows from Conditions (T6) and (T7). For \(\varphi \in \Phi \) and \(k=1,\dots ,r-1\), let
and define \({\overline{\varphi }}_{[r-k]}\) similarly. Then, for \(k=1,\dots ,r\),
where \(x-b_{n}x_{k+1:r} = (x-b_{n}x_{k+1},\dots ,x-b_{n}x_{r})\). Likewise, we have
Suppose first that q is finite and let \(\ell \in [2,q]\). Observe that by Jensen’s inequality,
so that \(\sup _{h \in {\mathcal {H}}_n} \Vert P^{r-k}h \Vert _{P^k,\ell } \lesssim b_n^{-m[(k-1/2)-k/\ell ]}\). Hence, we may choose \({\overline{\sigma }}_{\mathfrak {g}}\simeq 1\) and \(\sigma _{\mathfrak {h}}\simeq b_n^{-m/2}\). Similarly, Jensen’s inequality and the symmetry of \({\overline{\varphi }}\) yield that
so that \(\Vert P^{r-k} H_n \Vert _{P^k,\ell } \lesssim b_n^{-m[(1-1/\ell )k - (1/2-1/\ell )]}\). Hence, we may choose \(b_{\mathfrak {g}}\simeq b_n^{-m/2}\), \(b_{\mathfrak {h}}\simeq b_n^{-3m/2}\), and bound \(\chi _{n}\) as
Similar calculations yield that
Hence, \(\nu _{{\mathfrak {h}}} \lesssim b_n^{-m(1-1/q)}\).
It is not difficult to verify that (38) and (39) hold in the \(q=\infty \) case as well under the convention that \(1/q=0\) for \(q=\infty \). This completes the proof. \(\square \)
Proof of Corollary 4.3
Let \(\eta _{n} := Cn^{-c}\) where the constants c, C are those given in Theorem 4.2. Denote by \(q_{{\widetilde{S}}_{n}}(\alpha )\) the \(\alpha \)-quantile of \({\widetilde{S}}_{n}\). Define the event
whose probability is at least \(1-\eta _{n}\). On this event,
where the second equality follows from the fact that the distribution function of \({\widetilde{S}}_{n}\) is continuous (cf. Lemma A.1). This shows that the inequality \(q_{{\widehat{S}}_{n}^{\sharp }}(\alpha ) \leqslant q_{{\widetilde{S}}_{n}}(\alpha +\eta _{n})\) holds on the event \({\mathcal {E}}_{n}\), so that
The above discussion presumes that \(\alpha + \eta _{n} < 1\), but if \(\alpha + \eta _{n} \geqslant 1\), then the last inequality is trivial. Likewise, we have \({\mathbb {P}}\left\{ {\widehat{S}}_{n} \leqslant q_{{\widehat{S}}_{n}^{\sharp }}(\alpha ) \right\} \geqslant \alpha -3\eta _{n}\). This completes the proof. \(\square \)
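In practice, the bootstrap quantile \(q_{{\widehat{S}}_{n}^{\sharp }}(\alpha ) = \inf \{ t : {\mathbb {P}}( {\widehat{S}}_{n}^{\sharp } \leqslant t \mid \text{data} ) \geqslant \alpha \}\) is computed from finitely many bootstrap draws. A hedged sketch of that computation (illustrative only; the function name is ours):

```python
import math

def empirical_quantile(values, alpha):
    """alpha-quantile q(alpha) = inf{ t : (#{v <= t} / n) >= alpha },
    the empirical analogue of the critical value q_{S^sharp}(alpha)."""
    v = sorted(values)
    k = math.ceil(alpha * len(v))  # smallest k with k/n >= alpha
    return v[max(k - 1, 0)]

# e.g. with bootstrap draws 1, 2, ..., 100, the 0.95-quantile is 95
draws = list(range(1, 101))
crit = empirical_quantile(draws, 0.95)  # -> 95
```

A test then rejects when the observed statistic exceeds `crit`; Corollary 4.3 says this calibration is accurate up to the \(Cn^{-c}\) terms above.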
Proof of Lemma 4.1
We begin by noting that
where \(\breve{h}_{n,\vartheta } = b_{n}^{m/2}c_{n}(\vartheta )^{-1} h_{n,\vartheta }\). We note that \(\mathrm {Var}_{P}(P^{r-1}\breve{h}_{n,\vartheta }) =1\) by the definition of \(c_{n}(\vartheta )\). Recall from the proof of Theorem 4.2 that the function class \({\mathcal {H}}_{n} =\{ \breve{h}_{n,\vartheta } : \vartheta \in \Theta \}\) is VC type with characteristics \((A', v')\) satisfying \(\log A' \lesssim \log n\) and \(v' \lesssim 1\) for envelope \(H_{n}\). Now, from Step 5 in the proof of Theorem 3.1 applied with \({\mathcal {H}}= {\mathcal {H}}_{n}\), we have for every \(\gamma \in (0,1)\), with probability at least \(1-\gamma -n^{-1}\),
for some constant C depending only on r. The desired result follows from the choices of parameters \({\overline{\sigma }}_{{\mathfrak {g}}}, b_{{\mathfrak {g}}}, \sigma _{{\mathfrak {h}}}, b_{{\mathfrak {h}}}, \chi _{n}\), and \(\nu _{{\mathfrak {h}}}\) given in the proof of Theorem 4.2 together with choosing \(\gamma = n^{-c}\) for some constant c sufficiently small but depending only on \(r, m, \zeta , c_{1},c_{2}, C_{1}, L\). \(\square \)
Proof of Theorem 4.4
The proof follows from similar arguments to those in the proof of Theorem 4.2, so we only highlight the differences. Define the function class
with a symmetric envelope
Recall that we assume \(q=\infty \) in this theorem. In view of the calculations in the proof of Theorem 4.2, we may choose
and bound \(\nu _{{\mathfrak {h}}}\) and \(\chi _{n}\) as
Given these choices and bounds, the conclusion of the theorem follows from repeating the proof of Theorem 4.2. \(\square \)
Appendix D. Conditional UCLT for JMB
In this section we prove the conditional UCLT for the JMB when the function class \({\mathcal {H}}\) and the distribution P are independent of n, under a metric entropy condition. We follow the notation used in Sects. 2 and 3, but since we consider a limit theorem, we assume that the probability space is \((\Omega ,{\mathcal {A}},{\mathbb {P}}) = (S^{{\mathbb {N}}},{\mathcal {S}}^{{\mathbb {N}}},P^{{\mathbb {N}}}) \times (\Xi , {\mathcal {C}}, R)\) and that \(X_{1},X_{2},\dots \) are the coordinate projections on \((S^{{\mathbb {N}}},{\mathcal {S}}^{{\mathbb {N}}},P^{{\mathbb {N}}})\). To formulate the conditional UCLT, recall that weak convergence in \(\ell ^{\infty }({\mathcal {H}})\) is “metrized” by the bounded Lipschitz distance: for arbitrary maps \({\mathbb {X}}_{n}: \Omega \rightarrow \ell ^{\infty }({\mathcal {H}})\) and a tight Borel measurable map \({\mathbb {X}}: \Omega \rightarrow \ell ^{\infty }({\mathcal {H}})\), \({\mathbb {X}}_{n}\) converges weakly to \({\mathbb {X}}\) if and only if
where \(BL_{1} = \{ f : \ell ^{\infty }({\mathcal {H}}) \rightarrow {\mathbb {R}}: |f| \leqslant 1, |f(x)-f(y)| \leqslant \Vert x-y \Vert _{{\mathcal {H}}} \ \forall x,y \in \ell ^{\infty }({\mathcal {H}}) \}\); see [53, p. 73]. If the function class \({\mathcal {G}}= P^{r-1} {\mathcal {H}}= \{ P^{r-1} h : h \in {\mathcal {H}}\}\) is P-pre-Gaussian, then there exists a tight Gaussian random variable \(W_{P}\) in \(\ell ^{\infty }({\mathcal {G}})\) with mean zero and covariance function \({\mathbb {E}}[W_{P}(g)W_{P}(g')] = \mathrm {Cov}_{P} (g,g')\). Set \({\mathbb {W}}_{P} (h) = W_{P} \circ P^{r-1} (h)\), which is a tight Gaussian random variable in \(\ell ^{\infty }({\mathcal {H}})\) with mean zero and covariance function \({\mathbb {E}}[{\mathbb {W}}_{P} (h){\mathbb {W}}_{P}(h')] = \mathrm {Cov}_{P}(P^{r-1}h,P^{r-1}h')\). We will show that conditionally on \(X_{1}^{\infty } = \{ X_{1},X_{2},\dots \}\), \({\mathbb {U}}_{n}^{\sharp }\) converges weakly to \({\mathbb {W}}_{P}\) in probability in the sense that
converges to zero in outer probability under regularity conditions (\({\mathbb {E}}_{\mid X_{1}^{\infty }}\) denotes the conditional expectation given \(X_{1}^{\infty }\)). Since the map \((\xi _{1},\dots ,\xi _{n}) \mapsto n^{-1/2} \sum _{i=1}^n \xi _{i}[ U_{n-1,-i}^{(r-1)} (\delta _{X_{i}}\cdot ) - U_n(\cdot ) ]\) is continuous from \({\mathbb {R}}^{n}\) into \(\ell ^{\infty }({\mathcal {H}})\), the multiplier process \({\mathbb {U}}_{n}^{\sharp }\) induces a Borel measurable map into \(\ell ^{\infty }({\mathcal {H}})\) for fixed \(X_{1}^{\infty }\). For an arbitrary map \(Y: \Omega \rightarrow {\mathbb {R}}\), let \(Y^{*}\) denote the measurable cover [53, Lemma 1.2.1].
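For intuition only, the multiplier process \({\mathbb {U}}_{n}^{\sharp }\) above can be sketched numerically in the simplest case \(r=2\) with a single kernel. Everything below (the kernel, the sample size, and all names) is an illustrative assumption, not part of the formal development; the multipliers \(\xi _{i}\) are taken i.i.d. standard normal, consistent with the Gaussian-multiplier construction used in the proofs.

```python
import numpy as np

rng = np.random.default_rng(0)

def jmb_replicate(X, h, rng):
    """One JMB draw n^{-1/2} sum_i xi_i [U_{n-1,-i}^{(1)}(h(X_i, .)) - U_n(h)] for r = 2."""
    n = len(X)
    K = h(X[:, None], X[None, :])      # kernel matrix h(X_i, X_j); h assumed symmetric
    np.fill_diagonal(K, 0.0)
    U_n = K.sum() / (n * (n - 1))      # the U-statistic U_n(h)
    # Jackknife estimate of the Hajek projection at X_i:
    # U_{n-1,-i}^{(1)}(delta_{X_i} h) = (n-1)^{-1} sum_{j != i} h(X_i, X_j)
    proj = K.sum(axis=1) / (n - 1)
    xi = rng.standard_normal(n)        # multiplier weights
    return float((xi * (proj - U_n)).sum() / np.sqrt(n))

X = rng.normal(size=200)               # toy sample
h = lambda x, y: np.abs(x - y)         # toy symmetric kernel of order 2
reps = np.array([jmb_replicate(X, h, rng) for _ in range(500)])
```

Conditionally on the data, each replicate is a centered Gaussian linear combination, which is what makes the Gaussian comparison arguments in the proofs available.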
Theorem D.1
(Conditional UCLT for JMB) Let \({\mathcal {H}}\) be a fixed pointwise measurable class of symmetric measurable functions on \(S^{r}\) with symmetric envelope \(H \in L^{2}(P^{r})\) such that \(\int _{0}^{1} \sqrt{\lambda (\varepsilon )} d\varepsilon < \infty \) with \(\lambda (\varepsilon ) = \sup _{Q} \log N({\mathcal {H}},\Vert \cdot \Vert _{Q,2},\varepsilon \Vert H \Vert _{Q,2})\). Then \({\mathcal {G}}= P^{r-1}{\mathcal {H}}= \{ P^{r-1} h : h \in {\mathcal {H}}\}\) is P-pre-Gaussian, \(d_{BL}({\mathbb {U}}_{n}/r,{\mathbb {W}}_{P}) \rightarrow 0\), and \(d_{BL \mid X_{1}^{\infty }}({\mathbb {U}}_{n}^{\sharp },{\mathbb {W}}_{P})^{*} {\mathop {\rightarrow }\limits ^{{\mathbb {P}}}} 0\) as \(n \rightarrow \infty \).
Theorem D.1 should be compared with Theorem 2.1 in [5], which establishes a conditional UCLT for the empirical bootstrap for a non-degenerate U-process under the same metric entropy condition. Interestingly, however, our moment condition on the envelope H is weaker than their condition (2.3), which, if \(r=2\), requires \({\mathbb {E}}[H(X_1,X_1)]<\infty \) in addition to \({\mathbb {E}}[H^{2}(X_1,X_2)] < \infty \). This difference stems from how the Hájek projection is estimated: our JMB estimates it by a jackknife U-statistic, while the empirical bootstrap estimates it by a V-statistic (see Remark 3.1).
If we are interested in \(\sup _{h \in {\mathcal {H}}} {\mathbb {U}}_{n}(h)/r\), then the result of Theorem D.1 implies that
as long as the distribution function of \(\sup _{g \in {\mathcal {G}}} W_{P}(g)\) is continuous, which is true if \(\inf _{g \in {\mathcal {G}}} \mathrm {Var}_{P}(g) > 0\) (cf. Lemma A.1). When the function class \({\mathcal {H}}\) is centrally symmetric (i.e., \(-h \in {\mathcal {H}}\) whenever \(h \in {\mathcal {H}}\)) so that \(\sup _{h \in {\mathcal {H}}}{\mathbb {U}}_{n}(h) = \Vert {\mathbb {U}}_{n} \Vert _{{\mathcal {H}}}\), \(\sup _{g \in {\mathcal {G}}}W_{P}(g) = \Vert W_{P} \Vert _{{\mathcal {G}}}\), and \(\sup _{h \in {\mathcal {H}}}{\mathbb {U}}_{n}^{\sharp }(h) = \Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}}\), the distribution function of \(\Vert W_{P} \Vert _{{\mathcal {G}}}\) is continuous under the much less restrictive assumption that \(\mathrm {Var}_{P}(g) > 0\) for some \(g \in {\mathcal {G}}\). Indeed, from Theorem 11.1 in [17], the distribution of \(\Vert W_{P} \Vert _{{\mathcal {G}}}\) is (absolutely) continuous on \((\ell _{0},\infty )\) with \(\ell _{0} \geqslant 0\) being the left endpoint of the support of \(\Vert W_{P} \Vert _{{\mathcal {G}}}\), but from [37, p. 57–58], \(\ell _{0} = 0\). This implies that, unless \(\Vert W_{P} \Vert _{{\mathcal {G}}} = 0\) almost surely, the distribution function of \(\Vert W_{P} \Vert _{{\mathcal {G}}}\) does not have a jump at \(\ell _{0} = 0\) (as \({\mathbb {P}}(\Vert W_{P} \Vert _{{\mathcal {G}}} = 0) = 0\)) and so is continuous everywhere on \({\mathbb {R}}\).
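As a companion sketch (again purely illustrative: the class \(\{\pm h_{\theta }\}\), the sample size, and the number of draws are all assumptions), for a centrally symmetric class the bootstrap critical value for the supremum statistic is simply a quantile of \(\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}}\) across multiplier draws:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
X = rng.normal(size=n)
thetas = np.array([0.5, 1.0, 1.5])     # indexes a small class h_theta(x, y) = 1{|x - y| <= theta}

def kernel_matrix(theta):
    K = 1.0 * (np.abs(X[:, None] - X[None, :]) <= theta)
    np.fill_diagonal(K, 0.0)
    return K

Ks = [kernel_matrix(t) for t in thetas]
U = np.array([K.sum() / (n * (n - 1)) for K in Ks])       # U_n(h_theta)
proj = np.array([K.sum(axis=1) / (n - 1) for K in Ks])    # jackknife Hajek estimates

def jmb_sup(rng):
    xi = rng.standard_normal(n)
    vals = (proj - U[:, None]) @ xi / np.sqrt(n)          # U_n^sharp(h_theta), r = 2
    return float(np.abs(vals).max())                      # central symmetry: sup = sup-norm

crit = float(np.quantile([jmb_sup(rng) for _ in range(999)], 0.95))
```

One would then reject when a suitably scaled and centered observed supremum statistic exceeds `crit`; the continuity of the limiting distribution function discussed above is what guarantees that this quantile is asymptotically well behaved.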
Proof of Theorem D.1
The first two results are essentially implied by the proof of Theorem 4.9 in [4], but we include their proofs for completeness. By changing H to \(H \vee 1\) if necessary, we may assume \(\Vert G \Vert _{P,2} > 0\) (recall \(G=P^{r-1}H\)), which implies \(\Vert H \Vert _{P,2} > 0\). By Jensen’s inequality, \(\Vert P^{r-1}h \Vert _{P,2} \leqslant \Vert h \Vert _{P^{r},2}\) and so we have
The right hand side is bounded by \(\sup _{Q}N({\mathcal {H}},\Vert \cdot \Vert _{Q,2},\tau \Vert H \Vert _{Q,2}/4)\) by Lemma A.2. Conclude that
which implies by Dudley’s criterion for sample continuity that \({\mathcal {G}}\) is P-pre-Gaussian (to be precise we have to verify \(\int _{0}^{1} \sqrt{\log N(\{ g-Pg : g \in {\mathcal {G}}\},\Vert \cdot \Vert _{P,2},\tau )} d\tau < \infty \) but this is immediate). The convergence of marginals of \({\mathbb {U}}_{n}/r\) to \({\mathbb {W}}_{P}\) follows from the multidimensional CLT for U-statistics. To conclude \(d_{BL}({\mathbb {U}}_{n}/r,{\mathbb {W}}_{P}) \rightarrow 0\), it suffices to show the asymptotic equicontinuity condition
holds for every \(\eta > 0\). We defer the proof of (45) until after the proof of the theorem.
To prove the last result of the theorem, let \(e_{P} (h,h') = \Vert P^{r-1}(h-h') \Vert _{P,2}\) and for given \(\delta > 0\) let \(\{ h_{1},\dots ,h_{N(\delta )} \}\) be a \((\delta \Vert G \Vert _{P,2})\)-net of \(({\mathcal {H}},e_{P})\). Let \(\pi _{\delta }: {\mathcal {H}}\rightarrow \{ h_{1},\dots ,h_{N(\delta )} \}\) be a map such that for each \(h \in {\mathcal {H}}\), \(e_{P} (h,\pi _{\delta }(h)) \leqslant \delta \Vert G \Vert _{P,2}\). Define \({\mathbb {U}}_{n,\delta }^{\sharp } := {\mathbb {U}}_{n}^{\sharp } \circ \pi _{\delta }\) and \({\mathbb {W}}_{P,\delta } := {\mathbb {W}}_{P} \circ \pi _{\delta }\). For any \(f \in BL_{1}\), we have
The third term on the right hand side of (46) is bounded by \({\mathbb {E}}[2 \wedge \Vert {\mathbb {W}}_{P,\delta } - {\mathbb {W}}_{P} \Vert _{{\mathcal {H}}}]\) and by construction \({\mathbb {W}}_{P}\) has sample paths almost surely uniformly \(e_{P}\)-continuous, so that \({\mathbb {E}}[2 \wedge \Vert {\mathbb {W}}_{P,\delta } - {\mathbb {W}}_{P} \Vert _{{\mathcal {H}}}] \rightarrow 0\) as \(\delta \downarrow 0\) by the dominated convergence theorem. Since \({\mathbb {U}}_{n,\delta }^{\sharp }\) can be identified with a Gaussian vector of dimension \(N(\delta )\) conditionally on \(X_{1}^{\infty }\), by Lemma 3.7.46 in [29], the second term on the right hand side of (46) is bounded by
for some constant \(c(\delta )\) that depends only on \(\delta \), where
From Step 5 of the proof of Theorem 3.1 and using the notation in the proof, we have
From the UCLT for the U-process established in the first paragraph, the last term on the right hand side is \(o_{{\mathbb {P}}}(1)\). The function class \(\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}\) is weak P-Glivenko-Cantelli by Lemmas A.3 and A.5 together with Theorem 2.4.3 in [53], which implies that \(n^{-1/2} \Vert {\mathbb {G}}_{n} \Vert _{\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}} = o_{{\mathbb {P}}}(1)\). From Lemma D.3 below, we also have \(\Upsilon _{n} = o_{{\mathbb {P}}}(1)\).
Finally, the first term on the right hand side of (46) is bounded by
for any \(\varepsilon > 0\), where \({\mathcal {H}}_{\delta } = \{ h-h' : h,h' \in {\mathcal {H}}, e_{P}(h,h') < 2\delta \Vert G \Vert _{P,2} \}\). Let \(\Sigma _{n,\delta } := \Vert n^{-1} \sum _{i=1}^{n}\{ U_{n-1,-i}^{(r-1)} (\delta _{X_{i}}h) - U_{n}(h) \}^{2} \Vert _{{\mathcal {H}}_{\delta }}\). By Markov’s inequality,
From Step 5 of the proof of Theorem 3.1,
with \(d(h,h') = \{ {\mathbb {E}}_{\mid X_{1}^{\infty }} [\{ {\mathbb {U}}_{n}^{\sharp } (h) - {\mathbb {U}}_{n}^{\sharp } (h') \}^{2}]\}^{1/2}\). Hence by Dudley’s entropy integral bound, we have
up to a constant independent of n and \(\delta \), and \(\Vert H \Vert _{{\mathbb {P}}_{I_{n,r},2}}^{2} = |I_{n,r}|^{-1}\sum _{I_{n,r}} H^{2}(X_{i_{1}},\dots ,X_{i_{r}}) = \Vert H \Vert _{P^{r},2}^{2} + o_{{\mathbb {P}}}(1)\) by the law of large numbers for U-statistics [18, Theorem 4.1.4]. From Step 4 of the proof of Theorem 3.1,
and the last two terms on the right hand side are \(o_{{\mathbb {P}}}(1)\) while the first term can be arbitrarily small by taking \(\delta \) sufficiently small. This implies that for any \(\eta > 0\),
Putting everything together, we conclude \(d_{BL \mid X_{1}^{\infty }}({\mathbb {U}}_{n}^{\sharp },{\mathbb {W}}_{P})^{*} {\mathop {\rightarrow }\limits ^{{\mathbb {P}}}} 0\), completing the proof. \(\square \)
Lemma D.2
Under the assumption of Theorem D.1, the asymptotic equicontinuity condition (45) holds.
Proof of Lemma D.2
For \(\delta \in (0,1]\), let \({\mathcal {H}}_{\delta }' = \{ h -h' : \Vert h - h' \Vert _{P^{r},2} < \delta \Vert H \Vert _{P^{r},2} \}\). By Markov’s inequality, it suffices to show that
We use Hoeffding’s averaging [49, Section 5.1.6] to bound the expectation. Let
Then we have
where \(\sum _{j_{1},\dots ,j_{n}}\) is taken over all permutations \(j_{1},\dots ,j_{n}\) of \(1,\dots ,n\). By Jensen’s inequality, \({\mathbb {E}}[ \Vert {\mathbb {U}}_{n} \Vert _{{\mathcal {H}}_{\delta }'}]\) is bounded by \(\sqrt{n}{\mathbb {E}}[\Vert S_{h}(X_{1},\dots ,X_{n}) - P^{r}h \Vert _{{\mathcal {H}}_{\delta }'}]\). Since
and since \((X_{(i-1)r+1},\dots ,X_{ir}) , i=1,\dots ,m\) are i.i.d., we can apply Theorem 5.2 in [14] to conclude that
up to a constant that depends only on r, where \(M_{r} = \max _{1 \leqslant i \leqslant m} H(X_{(i-1)r+1},\dots ,X_{ir})\) and the J function is defined in [14]. From a standard calculation, \(J(\delta ,{\mathcal {H}}_{\delta }', 2H) \lesssim J(\delta ,{\mathcal {H}},H) = \int _{0}^{\delta }\sqrt{1+\lambda (\tau )} d\tau \) up to a universal constant and \(\Vert M_{r} \Vert _{{\mathbb {P}},2} = o(\sqrt{m})\) by \(H \in L^{2}(P^{r})\) [53, Problem 2.3.4]. Hence we conclude
up to a constant that depends only on r, and by the dominated convergence theorem the right hand side is o(1) as \(\delta \downarrow 0\). This completes the proof. \(\square \)
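As a numerical aside, the behavior \(J(\delta ,{\mathcal {H}},H) = \int _{0}^{\delta }\sqrt{1+\lambda (\tau )}\, d\tau \downarrow 0\) as \(\delta \downarrow 0\) is easy to see under a VC-type bound \(\lambda (\tau ) = v \log (A/\tau )\); the values of A and v and the grid size below are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def entropy_integral(delta, A=2.0, v=1.0, grid=100_000):
    """Riemann-sum approximation of J(delta) = int_0^delta sqrt(1 + v*log(A/tau)) d tau."""
    tau = np.linspace(delta / grid, delta, grid)
    return float(np.sqrt(1.0 + v * np.log(A / tau)).sum() * (delta / grid))
```

Although the integrand blows up logarithmically near zero, the integral is finite and vanishes with \(\delta \), which is exactly the dominated-convergence step at the end of the proof above.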
Lemma D.3
Under the assumption of Theorem D.1, we have \({\mathbb {E}}[\Upsilon _{n}]= O(n^{-1})\) where \(\Upsilon _{n}\) is defined in (31).
Proof of Lemma D.3
We begin with noting that
By Hoeffding’s averaging [49, Section 5.1.6],
where \(\sum _{j_{1},\dots ,j_{n-1}}\) is taken over all permutations \(j_{1},\dots ,j_{n-1}\) of \(1,\dots ,n-1\), and
By Jensen’s inequality,
By Corollary A.4 and the condition of Theorem D.1, for given \(x \in S\),
Hence, applying Theorem 2.14.1 in [53] conditionally on \(X_{n}\), we have
up to a constant independent of n. Since \({\mathbb {E}}[\Vert \delta _{X_{n}} H \Vert _{P^{r-1},2}^{2}] = \Vert H \Vert _{P^{r},2}^{2}\), we obtain the desired conclusion by Fubini’s theorem. \(\square \)
Appendix E. Gaussian approximation for suprema of U-processes indexed by general function classes
In this section we derive Gaussian approximation error bounds for the U-process supremum indexed by general function classes. We follow the notation used in Sects. 2, 3 and 5. We make the following assumptions on the function class \({\mathcal {H}}\) and the distribution P.
- (A1)
The function class \({\mathcal {H}}\) is pointwise measurable.
- (A2)
The envelope H satisfies that \(H \in L^{3}(P^{r})\).
- (A3)
The class \({\mathcal {G}}= P^{r-1} {\mathcal {H}}= \{ P^{r-1} h : h \in {\mathcal {H}}\}\) is P-pre-Gaussian, i.e., there exists a tight Gaussian random variable \(W_{P}\) in \(\ell ^{\infty }({\mathcal {G}})\) with mean zero and covariance function \({\mathbb {E}}[W_{P}(g) W_{P}(g')] = \mathrm {Cov}(g(X_{1}), g'(X_{1}))\) for all \(g,g' \in {\mathcal {G}}\).
Conditions (A1)–(A3) are parallel to the corresponding conditions in [14]. Condition (A1) is the same as Condition (PM) in Sect. 2. Condition (A3) is a high-level assumption that is implied by Condition (VC) in Sect. 2.
For \(\varepsilon > 0\), define \({\mathcal {N}}_{n}(\varepsilon ) = \log (N({\mathcal {G}}, \Vert \cdot \Vert _{P,2}, \varepsilon \Vert G \Vert _{P,2}) \vee n)\) with \(G= P^{r-1}H\). Under Condition (A3), \({\mathcal {G}}\) is totally bounded for the intrinsic pseudometric induced by \(\Vert \cdot \Vert _{P,2}\) and \({\mathcal {N}}_{n}(\varepsilon )\) is finite for every \(\varepsilon \in (0,1]\). In addition, the Gaussian process \(W_{P}\) extends to the linear hull of \({\mathcal {G}}\) in such a way that \(W_{P}\) has linear sample paths (see e.g., Theorem 3.7.28 in [29]). For \(\varepsilon \in (0,1], \gamma \in (0,1)\), and \(\kappa > 0\), define
where \({\mathcal {G}}_{\varepsilon } = \{g-g' : g, g' \in {\mathcal {G}}, \Vert g-g'\Vert _{P,2} < 2\varepsilon \Vert G\Vert _{P,2}\}\), \(\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}} = \{gg' : g, g' \in \breve{{\mathcal {G}}}\}\), \(\breve{{\mathcal {G}}} = \{g, g-Pg : g \in {\mathcal {G}}\}\), and \(\breve{G} = G + PG\). Here \(c > 0\) is some universal constant. Below is an abstract (yet general) version of the Gaussian coupling bound.
Proposition E.1
(Abstract Gaussian coupling bound) Let \(Z_{n} = \sup _{h \in {\mathcal {H}}} {\mathbb {U}}_{n}(h)/r\). Suppose that Conditions (A1)–(A3) hold. Let \(\kappa \) be any positive constant such that \(\kappa ^{3} \geqslant {\mathbb {E}}[\Vert n^{-1}\sum _{i=1}^{n}|g(X_{i}) - P g|^{3}\Vert _{{\mathcal {G}}}]\). Then, for every \(n \geqslant r+1\), \(\varepsilon \in (0,1]\), and \(\gamma \in (0,1)\), one can construct a random variable \({\widetilde{Z}}_{n} = {\widetilde{Z}}_{n,\varepsilon ,\gamma ,\kappa }\) such that \({\mathcal {L}}({\widetilde{Z}}_{n}) = {\mathcal {L}}(\sup _{g \in {\mathcal {G}}} W_P(g))\) and
where \(C_{1} = C_{1,r}\) is a constant depending only on r and \(C_{2}\) is a universal constant.
The proposition can be viewed as an extension of Theorem 2.1 in [14] to the U-process. To apply the above proposition, we need to derive bounds on
which can be derived under some moment conditions on H and by using the uniform entropy integrals \(J_{k}(\delta ), k=1,\dots ,r\) defined in (19) (cf. Lemma 2.2 in [14] and our Theorem 5.1), where the latter can be simplified in terms of the VC characteristics (A, v) for a VC type function class (cf. the proof of Corollary 5.3).
Proof of Proposition E.1
The proof is based on a modification of the proof of Theorem 2.1 in [14]. In this proof, C denotes a generic universal constant whose value may change from place to place. Let \(\{g_{k}\}_{k=1}^{N}\) be a minimal \(\varepsilon \Vert G\Vert _{P,2}\)-net of \(({\mathcal {G}}, \Vert \cdot \Vert _{P,2})\) with \(N := N({\mathcal {G}}, \Vert \cdot \Vert _{P,2}, \varepsilon \Vert G\Vert _{P,2})\). By the definition of \({\mathcal {G}}\), each \(g_{k}\) corresponds to a kernel \(h_{k} \in {\mathcal {H}}\) such that \(g_{k}=P^{r-1}h_{k}\). Recall the Hoeffding decomposition \({\mathbb {U}}_{n}(h) = r {\mathbb {G}}_{n}(P^{r-1}h) + \sqrt{n} \sum _{k=2}^{r} {r \atopwithdelims ()k} U_{n}^{(k)}(\pi _{k}h)\), where \({\mathbb {G}}_{n}(P^{r-1} h) = n^{-1/2} \sum _{i=1}^{n} (P^{r-1}h (X_{i}) - P^{r}h)\). Let \(L_{n}=\sup _{g \in {\mathcal {G}}} {\mathbb {G}}_{n}(g)\) and \(R_{n}=\Vert r^{-1} \sqrt{n} \sum _{k=2}^{r} {r \atopwithdelims ()k} U_{n}^{(k)}(\pi _{k}h)\Vert _{{\mathcal {H}}}\). Then \(|Z_{n}-L_{n}| \leqslant R_{n}\). Define
We note that \(|L_{n}-L_{n}^{\varepsilon }| \leqslant \Vert {\mathbb {G}}_{n}\Vert _{{\mathcal {G}}_{\varepsilon }}\) and \(|{\widetilde{Z}}-{\widetilde{Z}}^{\varepsilon }| \leqslant \Vert W_{P}\Vert _{{\mathcal {G}}_{\varepsilon }}\). By Corollary 4.1 in [14], we have for every \(B \in {\mathcal {B}}({\mathbb {R}})\) and \(\delta > 0\),
where
Observe that \(T_{1} \leqslant n^{-1/2} {\mathbb {E}}[\Vert {\mathbb {G}}_{n}\Vert _{\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}}]\), \(T_{2} \leqslant n^{-1/2} \kappa ^{3}\), and \(T_{3} \leqslant n^{-1/2} P[\breve{G}^{3} 1(\breve{G}>\delta \sqrt{n} {\mathcal {N}}_{n}(\varepsilon )^{-1})]\). Thus choosing
we have
Since \(\delta \geqslant c \gamma ^{-1/3} n^{-1/6} \kappa {\mathcal {N}}_{n}^{2/3}(\varepsilon )\), we have
Conclude that with \(\eta _{n} = (\gamma / 5) P[(\breve{G}/\kappa )^{3} 1(\breve{G}/\kappa >c \gamma ^{-1/3} n^{1/3} {\mathcal {N}}_{n}(\varepsilon )^{-1/3})]\),
Next, we will bound \(\Vert {\mathbb {G}}_{n}\Vert _{{\mathcal {G}}_{\varepsilon }}\) and \(\Vert W_{P}\Vert _{{\mathcal {G}}_{\varepsilon }}\). By Markov’s inequality, with probability at least \(1-\gamma /5\),
Further, by the Borell–Sudakov–Tsirel’son inequality (see Theorem 2.5.8 in [29]), with probability at least \(1-\gamma /5\), we have
Therefore, for every \(B \in {\mathcal {B}}({\mathbb {R}})\),
The conclusion of the proposition follows from the Strassen–Dudley theorem (see Theorem B.1). \(\square \)
Appendix F. Alternative tests for concavity/convexity and monotonicity of regression functions
We follow the setting of Example 4.2.
F.1. Alternative tests for concavity/convexity of regression function f
Instead of the original localized simplex statistic (11) proposed in [1], we may consider the following modified version:
where \({\widetilde{\varphi }} (v_{1},\dots ,v_{m+2}) = 1\{ (x_{1},\dots ,x_{m+2}) \in {\mathcal {D}}\} w(v_{1},\dots ,v_{m+2})\), and test concavity (resp. convexity) of f by rejecting it when the scaled supremum (resp. infimum) of \({\widetilde{U}}_{n}\) is large (resp. small). These alternative tests work without the symmetry assumption on the conditional distribution of \(\varepsilon \), which is maintained in [1]. Our results below also cover these alternative tests.
F.2. Alternative tests for monotonicity of regression function f
Chetverikov [16] considers testing monotonicity of the regression function f without the assumption that the error term \(\varepsilon \) is independent of X. He studies, e.g., U-statistics given by replacing \(\mathrm {sign}(Y_{j}-Y_{i})\) in (12) by \(Y_{j}-Y_{i}\), and the test statistic defined by taking the maximum of such U-statistics over a discrete set of design points and bandwidths whose cardinality may grow with the sample size (indeed, the cardinality can be much larger than the sample size). His analysis is conditional on the \(X_{i}\)’s, and he cleverly avoids U-process machinery by directly applying the high-dimensional Gaussian and bootstrap approximation theorems developed in [12]. It should be noted that [16] considers more general test statistics and studies multi-step procedures to improve the power of his tests.
Another related test for regression monotonicity is based on the local linear rank statistics [21]. Let \(R_{mk}(i) = \sum _{j=m+1}^{k} 1(Y_{j} \leqslant Y_{i})\) be the local rank of \(Y_{i}\) among \(Y_{m+1},\dots ,Y_{k}\). In [21], Dümbgen considers a test for monotone trend of f (with fixed design points \(X_{1},\dots ,X_{n}\)) via the local linear rank statistics
where \(\beta \) and q are functions on (0, 1) such that: 1) \(\beta (1-u)=-\beta (u)\) and \(q(1-u)=-q(u)\) for \(u \in (0,1)\); 2) \(\beta (\cdot )\) and \(q(\cdot )\) are nondecreasing on (0, 1). Then [21] proposes the multiscale test statistic
where \(s_{i}\) and \(c_{i}\) are properly chosen nonnegative numbers. For the special case of the Wilcoxon score function \(q(u) = 2u-1\) and \(\beta (u) = q(u)\), one can write
The statistic \(T_{mk}\) is related to our test statistic \({\check{U}}_{n}(x)\) with \(L(u) = 1(u \in [-1,1])\): namely, \(T_{mk}\) and \({\check{U}}_{n}(x)\) are (local) U-statistics with kernels \((j-i) \mathrm {sign}(Y_{j}-Y_{i})\) and \(\mathrm {sign}(X_{i}-X_{j}) \mathrm {sign}(Y_{j}-Y_{i})\), respectively. Thus, for a given sequence of bandwidths \(b_{n}\), our monotonicity test based on the U-process \({\check{U}}_{n}(x)\) can be viewed as a single-scale test \(T_{mk}\) with \((k-m)/n = 2 b_{n}\) in Dümbgen’s sense. In particular, both \(T_{0n}\) and \({\check{U}}_{n}(x)\) with \(b_{n} = 1\) quantify monotonicity on the global scale. In addition, the “uniform-in-bandwidth” type results for our U-process approach in Sect. 4.1 can be viewed as the multiscale analog T of \(T_{mk}\) with the Wilcoxon score function. Nevertheless, since [21] considers fixed design points, \(T_{mk}\) is a local U-statistic in the \(Y_{i}\)’s while \({\check{U}}_{n}(x)\) is a local U-statistic in the \((X_{i}, Y_{i})\)’s. Our analysis (which requires X to have a Lebesgue density) is not directly applicable to the local linear rank statistics of [21].
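To make the kernel comparison concrete, here is a toy numerical sketch (the design, sample size, and function names are assumptions) of the local U-statistic with kernel \(\mathrm {sign}(X_{i}-X_{j})\,\mathrm {sign}(Y_{j}-Y_{i})\) and uniform window \(L(u) = 1(u \in [-1,1])\):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = rng.uniform(size=n)
Y = X + 0.3 * rng.normal(size=n)       # increasing trend plus noise

def local_sign_ustat(x0, b, X, Y):
    """Average of sign(X_i - X_j) * sign(Y_j - Y_i) over pairs inside the window [x0-b, x0+b]."""
    idx = np.flatnonzero(np.abs(X - x0) <= b)
    s, cnt = 0.0, 0
    for a in range(len(idx)):
        for c in range(a + 1, len(idx)):
            i, j = idx[a], idx[c]
            s += np.sign(X[i] - X[j]) * np.sign(Y[j] - Y[i])
            cnt += 1
    return s / cnt if cnt else 0.0

stat = local_sign_ustat(0.5, 0.25, X, Y)
```

Under an increasing trend, concordant pairs contribute \(-1\), so the statistic concentrates below zero; a monotonicity test compares a suitably scaled version of it against bootstrap critical values.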
Chen, X., Kato, K. Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications. Probab. Theory Relat. Fields 176, 1097–1163 (2020). https://doi.org/10.1007/s00440-019-00936-y
Keywords
- Gaussian approximation
- Jackknife multiplier bootstrap
- Coupling
- U-process
- Local maximal inequality
Mathematics Subject Classification
- 60F17
- 62E17
- 62F40
- 62G10