Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications

Abstract

This paper is concerned with finite sample approximations to the supremum of a non-degenerate U-process of a general order indexed by a function class. We are primarily interested in situations where the function class as well as the underlying distribution change with the sample size, and the U-process itself is not weakly convergent as a process. Such situations arise in a variety of modern statistical problems. We first consider Gaussian approximations, namely, approximating the U-process supremum by the supremum of a Gaussian process, and derive coupling and Kolmogorov distance bounds. Such Gaussian approximations are, however, often not directly applicable in statistical problems since the covariance function of the approximating Gaussian process is unknown. This motivates us to study bootstrap-type approximations to the U-process supremum. We propose a novel jackknife multiplier bootstrap (JMB) tailored to the U-process, and derive coupling and Kolmogorov distance bounds for the proposed JMB method. All these results are non-asymptotic, and established under fairly general conditions on function classes and underlying distributions. Key technical tools in the proofs are new local maximal inequalities for U-processes, which may be useful in other problems. We also discuss applications of the general approximation results to testing for qualitative features of nonparametric functions based on generalized local U-processes.
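For intuition, the JMB idea can be sketched numerically for a non-degenerate U-process of order two. Everything below (the kernel class \(h_{t}(x,y) = 1\{|x-y| \leqslant t\}\), the sample size, the grid of thresholds, and the Hájek-projection scaling factor 2) is an illustrative choice of ours, not the paper's notation or a prescribed implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative setup (toy choices, not the paper's): an order-2
# non-degenerate U-process indexed by kernels h_t(x, y) = 1{|x - y| <= t}
# over a finite grid of thresholds t.
n = 200
X = rng.standard_normal(n)
ts = np.linspace(0.5, 2.0, 16)

D = np.abs(X[:, None] - X[None, :])                      # pairwise distances
H = (D[None, :, :] <= ts[:, None, None]).astype(float)   # shape (16, n, n)
off_diag = ~np.eye(n, dtype=bool)

# U-statistic U_n(h_t): average of the kernel over pairs i != j
U_n = H[:, off_diag].mean(axis=1)

# Jackknife estimates of the Hajek projection at each observation i:
# ghat_i(t) = (n - 1)^{-1} * sum over j != i of h_t(X_i, X_j)
ghat = (H.sum(axis=2) - np.diagonal(H, axis1=1, axis2=2)) / (n - 1)

def jmb_draw():
    # One jackknife multiplier bootstrap draw with i.i.d. Gaussian
    # multipliers; the factor 2 is the Hajek-projection scaling for a
    # kernel of order 2.
    xi = rng.standard_normal(n)
    centered = ghat - U_n[:, None]                       # shape (16, n)
    return 2.0 * np.abs(centered @ xi).max() / np.sqrt(n)

boot = np.array([jmb_draw() for _ in range(2000)])
crit = np.quantile(boot, 0.95)  # bootstrap critical value for the sup statistic
```

The 0.95 quantile of `boot` then plays the role of a critical value for the two-sided supremum statistic \(\sup _{t} | \sqrt{n} ( U_{n}(h_{t}) - {\mathbb {E}}[h_{t}] ) |\); the paper's coupling and Kolmogorov distance bounds quantify, non-asymptotically, how accurate this type of bootstrap approximation is under VC-type conditions on the kernel class.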

Fig. 1

References

  1. Abrevaya, J., Jiang, W.: A nonparametric approach to measuring and testing curvature. J. Bus. Econ. Stat. 23(1), 1–19 (2005)

  2. Adamczak, R.: Moment inequalities for U-statistics. Ann. Probab. 34(6), 2288–2314 (2006)

  3. Arcones, M., Giné, E.: On the bootstrap of \(U\)- and \(V\)-statistics. Ann. Stat. 20(2), 655–674 (1992)

  4. Arcones, M., Giné, E.: Limit theorems for \(U\)-processes. Ann. Probab. 21(3), 1495–1542 (1993)

  5. Arcones, M., Giné, E.: \(U\)-processes indexed by Vapnik–Červonenkis classes of functions with applications to asymptotics and bootstrap of \(U\)-statistics with estimated parameters. Stoch. Process. Appl. 52(1), 17–38 (1994)

  6. Bickel, P.J., Freedman, D.A.: Some asymptotic theory for the bootstrap. Ann. Stat. 9(6), 1196–1217 (1981)

  7. Blundell, R., Gosling, A., Ichimura, H., Meghir, C.: Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica 75(2), 323–363 (2007)

  8. Borovskikh, Y.V.: \(U\)-Statistics in Banach Spaces. VSP, Zeist (1996)

  9. Bretagnolle, J.: Lois limites du bootstrap de certaines fonctionnelles. Ann. Inst. Henri Poincaré Sect. B 19(3), 281–296 (1983)

  10. Callaert, H., Veraverbeke, N.: The order of the normal approximation for a Studentized \(U\)-statistic. Ann. Stat. 9(1), 360–375 (1981)

  11. Chen, X.: Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications. Ann. Stat. 46(2), 642–678 (2018)

  12. Chernozhukov, V., Chetverikov, D., Kato, K.: Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 41(6), 2786–2819 (2013)

  13. Chernozhukov, V., Chetverikov, D., Kato, K.: Anti-concentration and honest, adaptive confidence bands. Ann. Stat. 42(5), 1787–1818 (2014)

  14. Chernozhukov, V., Chetverikov, D., Kato, K.: Gaussian approximation of suprema of empirical processes. Ann. Stat. 42(4), 1564–1597 (2014)

  15. Chernozhukov, V., Chetverikov, D., Kato, K.: Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings. Stoch. Process. Appl. 126(12), 3632–3651 (2016)

  16. Chetverikov, D.: Testing regression monotonicity in econometric models. arXiv:1212.6757 (2012)

  17. Davydov, Y., Lifshits, M., Smorodina, N.: Local Properties of Distributions of Stochastic Functionals. Translations of Mathematical Monographs, Vol. 173. American Mathematical Society, Providence (1998)

  18. de la Peña, V., Giné, E.: Decoupling: From Dependence to Independence. Springer, Berlin (1999)

  19. Dehling, H., Mikosch, T.: Random quadratic forms and the bootstrap for \(U\)-statistics. J. Multivar. Anal. 51(2), 392–413 (1994)

  20. Dudley, R.M.: Real Analysis and Probability. Cambridge University Press, Cambridge (2002)

  21. Dümbgen, L.: Application of local rank tests to nonparametric regression. J. Nonparametric Stat. 14(5), 511–537 (2002)

  22. Einmahl, U., Mason, D.M.: Uniform in bandwidth consistency of kernel-type function estimators. Ann. Stat. 33(3), 1380–1403 (2005)

  23. Ellison, G., Ellison, S.F.: Strategic entry deterrence and the behavior of pharmaceutical incumbents prior to patent expiration. Am. Econ. J. Microecon. 3(1), 1–36 (2011)

  24. Frees, E.W.: Estimating densities of functions of observations. J. Am. Stat. Assoc. 89(426), 517–525 (1994)

  25. Ghosal, S., Sen, A., van der Vaart, A.: Testing monotonicity of regression. Ann. Stat. 28(4), 1054–1082 (2000)

  26. Giné, E., Latała, R., Zinn, J.: Exponential and moment inequalities for \(U\)-statistics. In: High Dimensional Probability II. Springer, Berlin (2000)

  27. Giné, E., Mason, D.M.: On local \(U\)-statistic processes and the estimation of densities of functions of several sample variables. Ann. Stat. 35(3), 1105–1145 (2007)

  28. Giné, E., Nickl, R.: Uniform limit theorems for wavelet density estimators. Ann. Probab. 37(4), 1605–1646 (2009)

  29. Giné, E., Nickl, R.: Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press, Cambridge (2016)

  30. Hall, P.: On convergence rates of suprema. Probab. Theory Relat. Fields 89(4), 447–455 (1991)

  31. Hoeffding, W.: A class of statistics with asymptotically normal distributions. Ann. Math. Stat. 19(3), 293–325 (1948)

  32. Hušková, M., Janssen, P.: Consistency of the generalized bootstrap for degenerate \(U\)-statistics. Ann. Stat. 21(4), 1811–1823 (1993)

  33. Hušková, M., Janssen, P.: Generalized bootstrap for studentized \(U\)-statistics: a rank statistic approach. Stat. Probab. Lett. 16(3), 225–233 (1993)

  34. Janssen, P.: Weighted bootstrapping of \(U\)-statistics. J. Stat. Plann. Inference 38(1), 31–42 (1994)

  35. Koltchinskii, V.I.: Komlós–Major–Tusnády approximation for the general empirical process and Haar expansions of classes of functions. J. Theor. Probab. 7(1), 73–118 (1994)

  36. Komlós, J., Major, P., Tusnády, G.: An approximation of partial sums of independent rv's and the sample df. I. Z. Wahrscheinlichkeitstheor. Verw. Geb. 32(1–2), 111–131 (1975)

  37. Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Springer, New York (1991)

  38. Lee, S., Linton, O., Whang, Y.-J.: Testing for stochastic monotonicity. Econometrica 77(2), 585–602 (2009)

  39. Lo, A.Y.: A large sample study of the Bayesian bootstrap. Ann. Stat. 15(1), 360–375 (1987)

  40. Mason, D.M., Newton, M.A.: A rank statistics approach to the consistency of a general bootstrap. Ann. Stat. 20(3), 1611–1624 (1992)

  41. Massart, P.: Strong approximation for multivariate empirical and related processes, via KMT constructions. Ann. Probab. 17(1), 266–291 (1989)

  42. Monrad, D., Philipp, W.: Nearby variables with nearby conditional laws and a strong approximation theorem for Hilbert space valued martingales. Probab. Theory Relat. Fields 88(3), 381–404 (1991)

  43. Nolan, D., Pollard, D.: \(U\)-processes: rates of convergence. Ann. Stat. 15(2), 780–799 (1987)

  44. Nolan, D., Pollard, D.: Functional limit theorems for \(U\)-processes. Ann. Probab. 16(3), 1291–1298 (1988)

  45. Piterbarg, V.I.: Asymptotic Methods in the Theory of Gaussian Processes and Fields. American Mathematical Society, Providence (1996)

  46. Resnick, S.I.: Extreme Values, Regular Variation, and Point Processes. Springer, Berlin (1987)

  47. Rio, E.: Local invariance principles and their application to density estimation. Probab. Theory Relat. Fields 98(1), 21–45 (1994)

  48. Rubin, D.B.: The Bayesian bootstrap. Ann. Stat. 9(1), 130–134 (1981)

  49. Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (1980)

  50. Sherman, R.P.: Limiting distribution of the maximal rank correlation estimator. Econometrica 61(1), 123–137 (1993)

  51. Sherman, R.P.: Maximal inequalities for degenerate \(U\)-processes with applications to optimization estimators. Ann. Stat. 22(1), 439–459 (1994)

  52. Solon, G.: Intergenerational income mobility in the United States. Am. Econ. Rev. 82(3), 393–408 (1992)

  53. van der Vaart, A., Wellner, J.A.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, Berlin (1996)

  54. van der Vaart, A., Wellner, J.A.: A local maximal inequality under uniform entropy. Electron. J. Stat. 5, 192–203 (2011)

  55. Wang, Q., Jing, B.-Y.: Weighted bootstrap for \(U\)-statistics. J. Multivar. Anal. 91(2), 177–198 (2004)

  56. Zhang, D.: Bayesian bootstraps for U-processes, hypothesis tests and convergence of Dirichlet U-processes. Stat. Sin. 11(2), 463–478 (2001)

Acknowledgements

The authors would like to thank the anonymous referees and an Associate Editor for their constructive comments, which improved the quality of this paper.

Author information

Correspondence to Xiaohui Chen.

Additional information

X. Chen is supported by NSF DMS-1404891, NSF CAREER Award DMS-1752614, and UIUC Research Board Awards (RB17092, RB18099).

Appendices

Appendix A. Supporting lemmas

This appendix collects some supporting lemmas that are repeatedly used in the main text.

Lemma A.1

(An anti-concentration inequality for the Gaussian supremum) Let \((S,{\mathcal {S}},P)\) be a probability space, and let \({\mathcal {G}}\subset L^{2}( P )\) be a P-pre-Gaussian class of functions. Denote by \(W_{P}\) a tight Gaussian random variable in \(\ell ^{\infty }({\mathcal {G}})\) with mean zero and covariance function \({\mathbb {E}}[ W_{P}(g) W_{P}(g') ] = \mathrm {Cov}_{P}(g,g')\) for all \(g,g' \in {\mathcal {G}}\) where \(\mathrm {Cov}_{P}(\cdot ,\cdot )\) denotes the covariance under P. Suppose that there exist constants \({\underline{\sigma }}, {\overline{\sigma }}>0\) such that \({\underline{\sigma }}^{2} \leqslant \mathrm {Var}_{P}(g) \leqslant {\overline{\sigma }}^{2}\) for all \(g \in {\mathcal {G}}\). Then for every \(\varepsilon > 0\),

$$\begin{aligned} \sup _{t \in {\mathbb {R}}}{\mathbb {P}}\left\{ \left| \sup _{g \in {\mathcal {G}}} W_P(g)-t\right| \leqslant \varepsilon \right\} \leqslant C_{\sigma }\varepsilon \left\{ {\mathbb {E}}\left[ \sup _{g\in {\mathcal {G}}}W_P (g)\right] +\sqrt{1\vee \log ({\underline{\sigma }}/\varepsilon )}\right\} , \end{aligned}$$

where \(C_{\sigma }\) is a constant depending only on \({\underline{\sigma }}\) and \({\overline{\sigma }}\).

Proof

See Lemma A.1 in [14]. \(\square \)
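The shape of this anti-concentration bound can be observed in simulation. As a crude stand-in for \(\sup _{g \in {\mathcal {G}}} W_{P}(g)\), the sketch below uses the maximum of finitely many independent standard Gaussians (so \({\underline{\sigma }} = {\overline{\sigma }} = 1\)); the sample sizes, grid, and the choice \(C_{\sigma } = 1\) for the displayed right-hand side are arbitrary illustrative choices, not calibrated constants:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the Gaussian supremum: the max of N i.i.d. N(0,1) variables.
n_sim, N = 100_000, 100
M = rng.standard_normal((n_sim, N)).max(axis=1)

def levy_concentration(samples, eps):
    # Estimate sup_t P(|M - t| <= eps) over a grid of t values.
    ts = np.linspace(samples.min(), samples.max(), 400)
    return max(np.mean(np.abs(samples - t) <= eps) for t in ts)

for eps in (0.05, 0.1, 0.2):
    # Shape of the bound with C_sigma = 1 and sigma_underline = 1.
    rhs_shape = eps * (M.mean() + np.sqrt(max(1.0, np.log(1.0 / eps))))
    print(f"eps={eps}: concentration={levy_concentration(M, eps):.3f}, "
          f"bound shape={rhs_shape:.3f}")
```

The estimated concentration probabilities scale roughly linearly in \(\varepsilon \) with a slope driven by \({\mathbb {E}}[\max ]\), consistent with the lemma.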

Lemma A.2

Let \({\mathcal {F}}\) be a class of real-valued measurable functions on a measurable space \(({\mathcal {X}},{\mathcal {A}})\) with finite measurable envelope F. Then for any probability measure R on \(({\mathcal {X}},{\mathcal {A}})\) such that \(RF^{2} < \infty \), we have

$$\begin{aligned} N({\mathcal {F}},\Vert \cdot \Vert _{R,2}, 4\varepsilon \Vert F \Vert _{R,2}) \leqslant \sup _{Q}N({\mathcal {F}},\Vert \cdot \Vert _{Q,2},\varepsilon \Vert F \Vert _{Q,2}) \end{aligned}$$

for every \(0 < \varepsilon \leqslant 1\), where \(\sup _{Q}\) is taken over all finitely discrete distributions on \({\mathcal {X}}\).

Proof

This follows from approximating R by a finitely discrete distribution. See Problem 2.5.1 in [53]. \(\square \)

Lemma A.3

Let \(({\mathcal {X}},{\mathcal {A}}), ({\mathcal {Y}},{\mathcal {C}})\) be measurable spaces and let \({\mathcal {F}}\) be a class of real-valued jointly measurable functions on \({\mathcal {X}}\times {\mathcal {Y}}\) with finite measurable envelope F. Let R be a probability measure on \(({\mathcal {Y}},{\mathcal {C}})\) and for a jointly measurable function \(f: {\mathcal {X}}\times {\mathcal {Y}}\rightarrow {\mathbb {R}}\), define \({\overline{f}}: {\mathcal {X}}\rightarrow {\mathbb {R}}\) by \({\overline{f}}(x) := \int f(x,y) dR(y)\) whenever the latter integral is defined and finite for every \(x \in {\mathcal {X}}\). Suppose that \({\overline{F}}\) is everywhere finite and let \({\overline{{\mathcal {F}}}} = \{ {\overline{f}} : f \in {\mathcal {F}}\}\). Then, for every \(r,s \in [1,\infty )\),

$$\begin{aligned} \sup _{Q} N({\overline{{\mathcal {F}}}},\Vert \cdot \Vert _{Q,r},2\varepsilon \Vert {\overline{F}} \Vert _{Q,r}) \leqslant \sup _{Q'} N({\mathcal {F}}, \Vert \cdot \Vert _{Q',s},\varepsilon ^{r} \Vert F \Vert _{Q',s}/4) \end{aligned}$$

where \(\sup _{Q}\) and \(\sup _{Q'}\) are taken over all finitely discrete distributions on \({\mathcal {X}}\) and \({\mathcal {X}}\times {\mathcal {Y}}\), respectively.

Proof

This follows from Lemma A.2 in [25] combined with Lemma A.2. \(\square \)

If \(R=\delta _{y}\) for some \(y \in {\mathcal {Y}}\), then \(\Vert \delta _{y} f \Vert _{Q,r}^{r} = \Vert f \Vert _{Q \times \delta _{y},r}^{r}\) (with \(\delta _{y} f(x) = f(x,y)\)) and \(Q \times \delta _{y}\) is finitely discrete if Q is so. Hence, we have the following corollary.

Corollary A.4

Under the setting of Lemma A.3, for every \(y \in {\mathcal {Y}}\) and \(r \in [1,\infty )\),

$$\begin{aligned} \sup _{Q} N(\delta _{y}{\mathcal {F}},\Vert \cdot \Vert _{Q,r},\varepsilon \Vert \delta _{y}F \Vert _{Q,r}) \leqslant \sup _{Q'} N({\mathcal {F}}, \Vert \cdot \Vert _{Q',r},\varepsilon \Vert F \Vert _{Q',r}). \end{aligned}$$

Lemma A.5

Let \({\mathcal {F}}\) and \({\mathcal {G}}\) be function classes on a set \({\mathcal {X}}\) with finite envelopes F and G, respectively. If \({\mathcal {F}}\cdot {\mathcal {G}}\) stands for the class of pointwise products of functions from \({\mathcal {F}}\) and \({\mathcal {G}}\), then for any \(r \in [1,\infty )\),

$$\begin{aligned}&\sup _{Q} N({\mathcal {F}}\cdot {\mathcal {G}}, \Vert \cdot \Vert _{Q,r},2 \varepsilon \Vert FG \Vert _{Q,r}) \\&\quad \leqslant \sup _{Q} N({\mathcal {F}}, \Vert \cdot \Vert _{Q,r}, \varepsilon \Vert F\Vert _{Q,r}) \sup _{Q}N({\mathcal {G}}, \Vert \cdot \Vert _{Q,r}, \varepsilon \Vert G \Vert _{Q,r}), \end{aligned}$$

where \(\sup _{Q}\) is taken over all finitely discrete distributions on \({\mathcal {X}}\).

Proof

See Lemma A.1 in [25] or [53, Section 2.10.3]. \(\square \)

Appendix B. Strassen–Dudley theorem and its conditional version

In this appendix, we state the Strassen–Dudley theorem together with its conditional version due to [42]. These results play fundamental roles in the proofs of Proposition 2.1 and Theorem 3.1. In what follows, let \((S,d)\) be a Polish metric space equipped with its Borel \(\sigma \)-field \({\mathcal {B}}(S)\). For any set \(A \subset S\) and \(\delta > 0\), let \(A^{\delta } = \{ x \in S : \inf _{y \in A} d(x,y) \leqslant \delta \}\). We first state the Strassen–Dudley theorem.

Theorem B.1

(Strassen–Dudley) Let X be an S-valued random variable defined on a probability space \((\Omega ,{\mathcal {A}},{\mathbb {P}})\) which admits a uniform random variable on (0, 1) independent of X. Let \(\alpha , \beta >0\) be given constants, and let G be a Borel probability measure on S such that \({\mathbb {P}}(X \in A) \leqslant G(A^{\alpha })+ \beta \) for all \(A \in {\mathcal {B}}(S)\). Then there exists an S-valued random variable Y such that \({\mathcal {L}}(Y) (:= {\mathbb {P}}\circ Y^{-1}) = G\) and \({\mathbb {P}}(d(X,Y) > \alpha ) \leqslant \beta \).

For a proof of the Strassen–Dudley theorem, we refer to [20]. Next, we state a conditional version of the Strassen–Dudley theorem due to [42, Theorem 4].

Theorem B.2

(Conditional version of Strassen–Dudley) Let X be an S-valued random variable defined on a probability space \((\Omega ,{\mathcal {A}},{\mathbb {P}})\), and let \({\mathcal {G}}\) be a countably generated sub \(\sigma \)-field of \({\mathcal {A}}\). Suppose that there is a uniform random variable on (0, 1) independent of \({\mathcal {G}}\vee \sigma (X)\), and let \(\Omega \times {\mathcal {B}}(S) \ni (\omega ,A) \mapsto G(A \mid {\mathcal {G}}) (\omega )\) be a regular conditional distribution given \({\mathcal {G}}\), i.e., for each fixed \(A \in {\mathcal {B}}(S)\), \(G(A \mid {\mathcal {G}})\) is measurable with respect to \({\mathcal {G}}\) and for each fixed \(\omega \in \Omega \), \(G(\cdot \mid {\mathcal {G}})(\omega )\) is a probability measure on \({\mathcal {B}}(S)\). If

$$\begin{aligned} {\mathbb {E}}^{*} \left[ \sup _{A \in {\mathcal {B}}(S)} \{ {\mathbb {P}}(X \in A \mid {\mathcal {G}}) - G(A^{\alpha } \mid {\mathcal {G}}) \} \right] \leqslant \beta , \end{aligned}$$

then there exists an S-valued random variable Y such that the conditional distribution of Y given \({\mathcal {G}}\) is identical to \(G(\cdot \mid {\mathcal {G}})\), and \({\mathbb {P}}( d(X,Y) > \alpha ) \leqslant \beta \).

Remark B.1

(i) The map \((\omega ,A) \mapsto {\mathbb {P}}(X \in A \mid {\mathcal {G}})(\omega )\) should be understood as a regular conditional distribution (which is guaranteed to exist since X takes values in a Polish space). (ii) \({\mathbb {E}}^{*}\) denotes the outer expectation.

For completeness, we provide a self-contained proof of Theorem B.2, since [42] does not provide a direct proof.

Proof of Theorem B.2

Since \({\mathcal {G}}\) is countably generated, there exists a real-valued random variable W such that \({\mathcal {G}}= \sigma (W)\). For \(n=1,2,\dots \) and \(k \in {\mathbb {Z}}\), let \(D_{n,k} = \{ k/2^{n} \leqslant W < (k+1)/2^{n} \}\). For each n, \(\{ D_{n,k} : k \in {\mathbb {Z}} \}\) forms a partition of \(\Omega \). Pick any D from \(\{ D_{n,k} : n =1,2,\dots ; k \in {\mathbb {Z}} \}\); let \({\mathbb {P}}_{D} = {\mathbb {P}}(\cdot \mid D)\) and \(G(\cdot \mid D) = \int G(\cdot \mid {\mathcal {G}}) d{\mathbb {P}}_{D}\). Then, the Strassen–Dudley theorem yields that there exists an S-valued random variable \(Y_{D}\) such that \({\mathbb {P}}_{D} \circ Y_{D}^{-1} = G(\cdot \mid D)\) and \({\mathbb {P}}_{D}(d(X,Y_{D}) > \alpha ) \leqslant \varepsilon (D) := \sup _{A \in {\mathcal {B}}(S)} \{ {\mathbb {P}}_{D}(X \in A) - G(A^{\alpha } \mid D) \}\).

For each \(n=1,2,\dots \), let \(Y_{n} = \sum _{k \in {\mathbb {Z}}} Y_{D_{n,k}} 1_{D_{n,k}}\), and observe that

$$\begin{aligned} {\mathbb {P}}(d(X,Y_{n})> \alpha ) = \sum _{k} {\mathbb {P}}_{D_{n,k}} (d(X,Y_{D_{n,k}}) > \alpha ) {\mathbb {P}}(D_{n,k}) \leqslant \sum _{k} \varepsilon (D_{n,k}) {\mathbb {P}}(D_{n,k}). \end{aligned}$$

Let M be any (proper) random variable such that \(M \geqslant \sup _{A \in {\mathcal {B}}(S)} \{ {\mathbb {P}}(X \in A \mid {\mathcal {G}}) - G(A^{\alpha } \mid {\mathcal {G}}) \}\), and observe that

$$\begin{aligned} {\mathbb {P}}_{D}(X \in A) -G(A^{\alpha } \mid D) = {\mathbb {E}}^{{\mathbb {P}}_{D}} [ {\mathbb {P}}(X \in A \mid {\mathcal {G}}) - G(A^{\alpha } \mid {\mathcal {G}}) ] \leqslant {\mathbb {E}}^{{\mathbb {P}}_{D}}[M], \end{aligned}$$

where the notation \({\mathbb {E}}^{{\mathbb {P}}_{D}}\) denotes the expectation under \({\mathbb {P}}_{D}\). So,

$$\begin{aligned} \sum _{k} \varepsilon (D_{n,k}) {\mathbb {P}}(D_{n,k}) \leqslant \sum _{k} {\mathbb {E}}^{{\mathbb {P}}_{D_{n,k}}} [M] {\mathbb {P}}(D_{n,k}) = {\mathbb {E}}[M], \end{aligned}$$

and taking the infimum with respect to M yields that the left-hand side is bounded by \(\beta \).

Next, we shall verify that \(\{ {\mathcal {L}}(Y_{n}) : n \geqslant 1 \}\) is uniformly tight. In fact,

$$\begin{aligned} {\mathbb {P}}(Y_{n} \in A)&= \sum _{k} {\mathbb {P}}(\{ Y_{D_{n,k}} \in A \} \cap D_{n,k}) = \sum _{k} {\mathbb {P}}_{D_{n,k}} (Y_{D_{n,k}} \in A) {\mathbb {P}}(D_{n,k}) \\&= \sum _{k} G(A \mid D_{n,k}) {\mathbb {P}}(D_{n,k}) = {\mathbb {E}}[G(A \mid {\mathcal {G}})], \end{aligned}$$

and since any Borel probability measure on a Polish space is tight by Ulam’s theorem, \(\{ {\mathcal {L}}(Y_{n}) : n \geqslant 1 \}\) is uniformly tight. This implies that the family of joint laws \(\{ {\mathcal {L}}(X,W,Y_{n}) : n \geqslant 1 \}\) is uniformly tight and hence has a weakly convergent subsequence by Prohorov’s theorem. Let \({\mathcal {L}}(X,W,Y_{n'}) {\mathop {\rightarrow }\limits ^{w}} Q\) (the notation \({\mathop {\rightarrow }\limits ^{w}}\) denotes weak convergence), and observe that the marginal law of Q on the “first two” coordinates, \(S \times {\mathbb {R}}\), is identical to \({\mathcal {L}}(X,W)\).

We shall verify that there exists an S-valued random variable Y such that \({\mathcal {L}}(X,W,Y) =Q\). Since S is Polish, there exists a unique regular conditional distribution, \({\mathcal {B}}(S) \times (S \times {\mathbb {R}}) \ni (A,(x,w)) \mapsto Q_{x,w}(A) \in [0,1]\), for Q given the first two coordinates. By the Borel isomorphism theorem [20, Theorem 13.1.1], there exists a bijective map \(\pi \) from S onto a Borel subset of \({\mathbb {R}}\) such that \(\pi \) and \(\pi ^{-1}\) are Borel measurable. Pick and fix any \((x,w) \in S \times {\mathbb {R}}\), and observe that \(Q_{x,w} \circ \pi ^{-1}\) extends to a Borel probability measure on \({\mathbb {R}}\). Denote by \(F_{x,w}\) the distribution function of \(Q_{x,w} \circ \pi ^{-1}\), and let \(F_{x,w}^{-1}\) denote its quantile function. Let U be a uniform random variable on (0, 1) (defined on \((\Omega ,{\mathcal {A}},{\mathbb {P}})\)) independent of (XW). Then \(F_{x,w}^{-1} (U)\) has law \(Q_{x,w} \circ \pi ^{-1}\), and hence \(Y = \pi ^{-1} \circ F_{X,W}^{-1} (U)\) is the desired random variable.
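The quantile-transform step above rests on the standard fact that \(F^{-1}(U)\) has distribution function F when U is uniform on (0, 1). A minimal numerical illustration with a fixed discrete target law standing in for \(Q_{x,w} \circ \pi ^{-1}\) (the specific support and weights are toy choices of ours, and \(\pi \) is the identity since the state space is already a Borel subset of \({\mathbb {R}}\)):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy discrete target law on S = {0, 1, 2}.
support = np.array([0.0, 1.0, 2.0])
probs = np.array([0.2, 0.5, 0.3])
cdf = np.cumsum(probs)

def quantile(u):
    # F^{-1}(u) = inf{ y : F(y) >= u }, the generalized inverse of the cdf.
    return support[np.searchsorted(cdf, u)]

U = rng.uniform(size=100_000)
Y = quantile(U)                       # Y = F^{-1}(U) has the target law
emp = np.array([(Y == s).mean() for s in support])
print(emp)                            # close to [0.2, 0.5, 0.3]
```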

Now, for any bounded continuous function f on S, observe that, whenever \(N \geqslant n\), \({\mathbb {E}}[ f(Y_{N})1_{D_{n,k}} ] = \int _{D_{n,k}} \int f(y) G(dy \mid {\mathcal {G}}) d{\mathbb {P}}\), which implies that the conditional distribution of Y given \({\mathcal {G}}\) is identical to \(G( \cdot \mid {\mathcal {G}})\). Finally, the Portmanteau theorem yields \({\mathbb {P}}(d(X,Y)> \alpha ) \leqslant \liminf _{n'} {\mathbb {P}}(d(X,Y_{n'}) > \alpha ) \leqslant \beta \). This completes the proof. \(\square \)

Appendix C. Additional proofs for the main text

C.1. Proof of Lemma 6.1

We begin by noting that \({\mathcal {G}}\) is VC type with characteristics \(4\sqrt{A}\) and 2v for envelope G. The rest of the proof is almost the same as that of Theorem 2.1 in [15] with \(B(f) \equiv 0\) (up to adjustments of the notation), but we now allow \(q=\infty \). To avoid repetition, we only point out the required modifications. In what follows, we will freely use the notation in the proof of [15, Theorem 2.1], but modify \(K_{n}\) to \(K_{n} = v \log (A \vee n)\), and C refers to a universal constant whose value may vary from place to place. In Step 1, change \(\varepsilon \) to \(\varepsilon =1/n^{1/2}\). For this choice, \(\log N({\mathcal {F}},e_{P},\varepsilon b) \leqslant C \log (Ab/(\varepsilon b)) = C\log (A/\varepsilon ) \leqslant CK_{n}\), and Dudley's entropy integral bound yields that \({\mathbb {E}}[ \Vert G_{P} \Vert _{{\mathcal {F}}_{\varepsilon }}] \leqslant C\varepsilon b \sqrt{\log (Ab/(\varepsilon b))} \leqslant Cb\sqrt{K_{n}/n}\) (there is a slip in the estimate of \({\mathbb {E}}[\Vert G_{P}\Vert _{{\mathcal {F}}_{\varepsilon }}]\) in [15], namely, "\(Ab/\varepsilon \)" inside the log should read "\(Ab/(\varepsilon b)\)", which of course does not affect the proof under their definition of \(K_{n}\)). Combining this with the Borell–Sudakov–Tsirel'son inequality yields that \({\mathbb {P}}\{ \Vert G_{P}\Vert _{{\mathcal {F}}_{\varepsilon }} > C b\sqrt{K_{n}/n} \} \leqslant 2n^{-1}\). In Step 3, Corollary 5.5 in the present paper (with \(r=k=1\)) yields that \({\mathbb {E}}[ \Vert {\mathbb {G}}_{n} \Vert _{{\mathcal {F}}_{\varepsilon }}] \leqslant C(b\sqrt{K_{n}/n} + bK_{n}/n^{1/2-1/q}) \leqslant CbK_{n}/n^{1/2-1/q}\), which is valid even when \(q=\infty \). Then, instead of applying their Lemma 6.1, we apply Markov's inequality to deduce that

$$\begin{aligned} {\mathbb {P}}\left\{ \Vert {\mathbb {G}}_{n} \Vert _{{\mathcal {F}}_{\varepsilon }} > CbK_{n}/(\gamma n^{1/2-1/q}) \right\} \leqslant \gamma . \end{aligned}$$

In Step 4, instead of their equation (14), we have

$$\begin{aligned} {\mathbb {P}}(Z^{\varepsilon } \in B) \leqslant {\mathbb {P}}({\widetilde{Z}}^{\varepsilon } \in B^{C_{7}\delta }) + C \left( \frac{b\sigma ^{2}K_{n}^{2}}{\delta ^{3}\sqrt{n}} + \frac{M_{n,X}(\delta )K_{n}^{2}}{\delta ^{3}\sqrt{n}} + \frac{1}{n} \right) \quad \forall B \in {\mathcal {B}}({\mathbb {R}}) \end{aligned}$$

whenever \(\delta \geqslant 2c\sigma ^{-1/2}(\log N)^{3/2} \cdot (\log n)\) for some universal constant c (\(C_{7}\) comes from their Theorem 3.1 and is universal). Finally, in Step 5, take

$$\begin{aligned} \delta = C' \left\{ \frac{(b\sigma ^{2}K_{n}^{2})^{1/3}}{\gamma ^{1/3}n^{1/6}} + \frac{2bK_{n}}{\gamma n^{1/2-1/q}} \right\} \end{aligned}$$

for some large but universal constant \(C' > 1\). Under the assumption that \(K_{n}^{3} \leqslant n\), this choice ensures that \(\delta \geqslant 2c\sigma ^{-1/2}(\log N)^{3/2} \cdot (\log n)\), and

$$\begin{aligned} \frac{b\sigma ^{2}K_{n}^{2}}{\delta ^{3}\sqrt{n}} \leqslant \frac{1}{(C')^{3}n}. \end{aligned}$$

It remains to bound \(M_{n,X}(\delta )\). For finite q, their Step 4 shows that

$$\begin{aligned} \frac{M_{n,X}(\delta )K_{n}^{2}}{\delta ^{3}\sqrt{n}} \leqslant \frac{2^{q}b^{q}K_{n}^{2}(\log N)^{q-3}}{\delta ^{q}n^{q/2-1}}. \end{aligned}$$

Since \(\log N \leqslant C''K_{n}\) for some universal constant \(C''\), the right hand side is bounded by

$$\begin{aligned} \frac{\gamma ^{q}(C'')^{q-3}}{(C')^{q}K_{n}}. \end{aligned}$$

Since \(K_{n}\) is bounded from below by a universal positive constant (by assumption), and \(\gamma \in (0,1)\), by taking \(C' > C''\), the above term is bounded by \(\gamma \) up to a universal constant.

Now, consider the \(q=\infty \) case. In that case, \(\max _{1 \leqslant j \leqslant N}| {\widetilde{X}}_{1j} | \leqslant 2b\) almost surely and \(\delta \sqrt{n}/\log N \geqslant 2C'b/(C''\gamma ) > 2b\) provided that \(C' > C''\). Hence \(M_{n,X}(\delta ) =0\) in that case. These modifications lead to the desired conclusion. \(\square \)

C.2. Proofs for Sect. 4

We first prove Theorem 4.2 and Corollary 4.3, and then prove Lemma 4.1 and Theorem 4.4.

Proof of Theorem 4.2

In what follows, the notation \(\lesssim \) signifies that the left hand side is bounded by the right hand side up to a constant that depends only on \(r,m,\zeta ,c_1,c_2,C_1,L\). We also write \(a \simeq b\) if \(a \lesssim b\) and \(b \lesssim a\). In addition, let \(c,C,C'\) denote generic constants depending only on \(r, m,\zeta , c_{1},c_{2}, C_{1}, L\); their values may vary from place to place. We divide the rest of the proof into three steps.

Step 1 Let

$$\begin{aligned} S_{n}^{\sharp } := \sup _{\vartheta \in \Theta } \frac{b_{n}^{m/2}}{c_{n}(\vartheta )\sqrt{n}}\sum _{i=1}^n \xi _{i} \left[ U_{n-1,-i}^{(r-1)} (\delta _{D_{i}} h_{n,\vartheta })- U_n(h_{n,\vartheta }) \right] . \end{aligned}$$

In this step, we shall show that the result (15) holds with \({\widehat{S}}_{n}\) and \({\widehat{S}}_{n}^{\sharp }\) replaced by \(S_{n}\) and \(S_{n}^{\sharp }\), respectively.

We first verify Conditions (PM), (VC), (MT), and (5) for the function class

$$\begin{aligned} {\mathcal {H}}_{n} = \left\{ b_{n}^{m/2} c_{n}(\vartheta )^{-1} h_{n,\vartheta } : \vartheta \in \Theta \right\} \end{aligned}$$

with a symmetric envelope

$$\begin{aligned} H_{n}(d_{1:r}) = b_{n}^{-(r-1/2)m} c_{1}^{-1} \Vert L \Vert _{{\mathbb {R}}^{m}}^{r} {\overline{\varphi }}(v_{1:r}) \prod _{i=1}^{r} 1_{{\mathcal {X}}^{\zeta /2}}(x_{i}) \prod _{1 \leqslant i < j \leqslant r} 1_{[-2,2]^m}(b_{n}^{-1}(x_{i}-x_{j})). \end{aligned}$$

Condition (PM) follows from our assumption. For Condition (VC), that \({\mathcal {H}}_{n}\) is VC type with characteristics \((A', v')\) satisfying \(\log A' \lesssim \log n\) and \(v' \lesssim 1\) follows from a slight modification of the proof of Lemma 3.1 in [25]. The latter part follows from our assumption. Condition (VC) guarantees the existence of a tight Gaussian random variable \({\mathcal {W}}_{P,n}(g), g \in P^{r-1}{\mathcal {H}}_{n} =: {\mathcal {G}}_{n}\) in \(\ell ^{\infty }({\mathcal {G}}_{n})\) with mean zero and covariance function \({\mathbb {E}}[{\mathcal {W}}_{P,n}(g){\mathcal {W}}_{P,n}(g')] = \mathrm {Cov}_{P}(g,g')\) for \(g,g' \in {\mathcal {G}}_{n}\). Let \(W_{P,n} (\vartheta ) = {\mathcal {W}}_{P,n}(g_{n,\vartheta })\) for \(\vartheta \in \Theta \) where \(g_{n,\vartheta } = b_{n}^{m/2} c_{n}(\vartheta )^{-1} P^{r-1}h_{n,\vartheta }\). It is seen that \(W_{P,n}(\vartheta ), \vartheta \in \Theta \) is a tight Gaussian random variable in \(\ell ^{\infty }(\Theta )\) with mean zero and covariance function (14).

Next, we determine the values of parameters \({\underline{\sigma }}_{{\mathfrak {g}}}, {\overline{\sigma }}_{{\mathfrak {g}}}, b_{{\mathfrak {g}}}, \sigma _{{\mathfrak {h}}}, b_{{\mathfrak {h}}}, \chi _{n},\nu _{{\mathfrak {h}}}\) for the function class \({\mathcal {H}}_n\). We will show in Step 3 that we may choose

$$\begin{aligned} {\underline{\sigma }}_{{\mathfrak {g}}} \simeq 1, \ {\overline{\sigma }}_{{\mathfrak {g}}} \simeq 1, \ b_{{\mathfrak {g}}} \simeq b_{n}^{-m/2}, \ \sigma _{{\mathfrak {h}}} \simeq b_{n}^{-m/2}, \ b_{{\mathfrak {h}}} \simeq b_{n}^{-3m/2}, \end{aligned}$$
(38)

and bound \(\nu _{{\mathfrak {h}}}\) and \(\chi _{n}\) as

$$\begin{aligned} \nu _{{\mathfrak {h}}} \lesssim b_{n}^{-m(1-1/q)}, \ \chi _{n} \lesssim (\log n)^{3/2}/(nb_{n}^{3m/2}). \end{aligned}$$
(39)

Given these choices and bounds, Corollaries 2.2 and 3.2 yield that

$$\begin{aligned} \begin{aligned}&\sup _{t \in {\mathbb {R}}} \left| {\mathbb {P}}(S_{n} \leqslant t) - {\mathbb {P}}({\widetilde{S}}_{n} \leqslant t) \right| \leqslant Cn^{-c} \ \text {and} \\&{\mathbb {P}}\left\{ \sup _{t \in {\mathbb {R}}} \left| {\mathbb {P}}_{\mid D_{1}^{n}} (S_{n}^{\sharp } \leqslant t) - {\mathbb {P}}({\widetilde{S}}_{n} \leqslant t) \right| > Cn^{-c}\right\} \leqslant Cn^{-c}. \end{aligned} \end{aligned}$$
(40)

Step 2 Observe that

$$\begin{aligned} | {\widehat{S}}_{n} - S_{n} |\leqslant & {} \sup _{\vartheta \in \Theta } \left| \frac{c_{n}(\vartheta )}{{\widehat{c}}_{n}(\vartheta )} - 1 \right| \Vert \sqrt{n}U_{n} \Vert _{{\mathcal {H}}_{n}} \quad \text {and} \nonumber \\ | {\widehat{S}}_{n}^{\sharp }- S_{n}^{\sharp } |\leqslant & {} \sup _{\vartheta \in \Theta } \left| \frac{c_{n}(\vartheta )}{{\widehat{c}}_{n}(\vartheta )} - 1 \right| \Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{n}}. \end{aligned}$$
(41)

We shall bound \(\sup _{\vartheta \in \Theta } | c_{n}(\vartheta )/{\widehat{c}}_{n}(\vartheta ) - 1|\), \(\Vert \sqrt{n}U_{n} \Vert _{{\mathcal {H}}_{n}}\), and \(\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{n}}\) in turn.

Let \(n_{0}\) be the smallest \(n\) such that \(C_{1}n^{-c_{2}} \leqslant 1/2\); clearly \(n_{0}\) depends only on \(c_{2}\) and \(C_{1}\). It suffices to prove (15) for \(n \geqslant n_{0}\), since for \(n < n_{0}\) the result (15) holds trivially by taking C sufficiently large. So let \(n \geqslant n_{0}\). Then Condition (T8) ensures that, with probability at least \(1-C_{1}n^{-c_{2}}\), \(\inf _{\vartheta \in \Theta } {\widehat{c}}_{n}(\vartheta )/c_{n}(\vartheta ) \geqslant 1/2\). Since \(| a^{-1} - 1 | \leqslant 2 | a - 1 |\) for \(a \geqslant 1/2\), Condition (T8) also ensures that

$$\begin{aligned} {\mathbb {P}}\left\{ \sup _{\vartheta \in \Theta } \left| \frac{c_{n}(\vartheta )}{{\widehat{c}}_{n}(\vartheta )} - 1 \right| > Cn^{-c} \right\} \leqslant Cn^{-c}. \end{aligned}$$
(42)

Next, we shall bound \(\Vert \sqrt{n}U_{n} \Vert _{{\mathcal {H}}_{n}}\) and \(\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{n}}\). Given (38) and (39), and in view of the fact that the covering number of \({\mathcal {H}}_{n} \cup (-{\mathcal {H}}_{n}) := \{ h,-h : h \in {\mathcal {H}}_{n} \}\) is at most twice that of \({\mathcal {H}}_{n}\), applying Corollaries 2.2 and 3.2 to the function class \({\mathcal {H}}_{n} \cup (-{\mathcal {H}}_{n})\), we deduce that

$$\begin{aligned}&\sup _{t \in {\mathbb {R}}} \left| {\mathbb {P}}(\Vert \sqrt{n}U_{n} \Vert _{{\mathcal {H}}_{n}} \leqslant t) - {\mathbb {P}}(\Vert {\mathcal {W}}_{P,n} \Vert _{{\mathcal {G}}_{n}} \leqslant t) \right| \leqslant Cn^{-c} \ \text {and} \\&\quad {\mathbb {P}}\left\{ \sup _{t \in {\mathbb {R}}} \left| {\mathbb {P}}_{\mid D_{1}^{n}} (\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{n}} \leqslant t) -{\mathbb {P}}(\Vert {\mathcal {W}}_{P,n} \Vert _{{\mathcal {G}}_{n}} \leqslant t) \right| > Cn^{-c}\right\} \leqslant Cn^{-c}. \end{aligned}$$

(Theorem 3.7.28 in [29] ensures that the Gaussian process \({\mathcal {W}}_{P,n}\) extends to the symmetric convex hull of \({\mathcal {G}}_{n}\) in such a way that \({\mathcal {W}}_{P,n}\) has linear, bounded, and uniformly continuous (with respect to the intrinsic pseudometric) sample paths; in particular, \(\{ {\mathcal {W}}_{P,n}(g) : g \in {\mathcal {G}}_{n} \cup (-{\mathcal {G}}_{n}) \}\) is a tight Gaussian random variable in \(\ell ^{\infty }({\mathcal {G}}_{n} \cup (-{\mathcal {G}}_{n}))\) with mean zero and covariance function \({\mathbb {E}}[{\mathcal {W}}_{P,n}(g){\mathcal {W}}_{P,n}(g')] = \mathrm {Cov}_{P}(g,g')\) for \(g,g' \in {\mathcal {G}}_{n} \cup (-{\mathcal {G}}_{n})\) and \(\sup _{g \in {\mathcal {G}}_{n} \cup (-{\mathcal {G}}_{n})} {\mathcal {W}}_{P,n}(g) = \Vert {\mathcal {W}}_{P,n} \Vert _{{\mathcal {G}}_{n}}\).) Dudley’s entropy integral bound and the Borell-Sudakov-Tsirel’son inequality yield that \({\mathbb {P}}\{ \Vert {\mathcal {W}}_{P,n} \Vert _{{\mathcal {G}}_{n}} > C(\log n)^{1/2} \} \leqslant 2n^{-1}\), so that

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\{ \Vert \sqrt{n}U_{n} \Vert _{{\mathcal {H}}_{n}}> C(\log n)^{1/2} \} \leqslant Cn^{-c} \ \text {and} \\&{\mathbb {P}}\left\{ {\mathbb {P}}_{\mid D_{1}^{n}} \{ \Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{n}}> C (\log n)^{1/2} \} > Cn^{-c}\right\} \leqslant Cn^{-c}. \end{aligned} \end{aligned}$$
(43)

Now, the desired result (15) follows from combining (40)–(43) and the anti-concentration inequality (Lemma A.1). In fact, the anti-concentration inequality yields

$$\begin{aligned} \sup _{t \in {\mathbb {R}}} {\mathbb {P}}( |{\widetilde{S}}_{n} -t| \leqslant Cn^{-c} ) \leqslant C'n^{-c} (\log n)^{1/2}. \end{aligned}$$
(44)

Hence, combining the bounds (40)–(44), we have for every \(t \in {\mathbb {R}}\),

$$\begin{aligned} {\mathbb {P}}({\widehat{S}}_{n} \leqslant t )&\leqslant {\mathbb {P}}(S_{n} \leqslant t + Cn^{-c}) + Cn^{-c} \\&\leqslant {\mathbb {P}}({\widetilde{S}}_{n} \leqslant t+Cn^{-c}) + Cn^{-c} \\&\leqslant {\mathbb {P}}({\widetilde{S}}_{n} \leqslant t) + Cn^{-c}, \end{aligned}$$

and likewise \({\mathbb {P}}({\widehat{S}}_{n} \leqslant t) \geqslant {\mathbb {P}}({\widetilde{S}}_{n} \leqslant t) - Cn^{-c}\). Similarly, we have

$$\begin{aligned} {\mathbb {P}}\left\{ \sup _{t \in {\mathbb {R}}} \left| {\mathbb {P}}_{\mid D_{1}^{n}} ({\widehat{S}}_{n}^{\sharp } \leqslant t) - {\mathbb {P}}({\widetilde{S}}_{n} \leqslant t) \right| > Cn^{-c}\right\} \leqslant Cn^{-c}. \end{aligned}$$

Step 3 It remains to verify (38) and (39). First, that we may choose \({\underline{\sigma }}_{{\mathfrak {g}}} \simeq 1\) follows from Conditions (T6) and (T7). For \(\varphi \in \Phi \) and \(k=1,\dots ,r-1\), let

$$\begin{aligned} \varphi _{[r-k]}(v_{1:k},x_{k+1:r}) = {\mathbb {E}}[ \varphi (v_{1:k}, V_{k+1:r}) \mid X_{k+1:r} = x_{k+1:r}] \prod _{j=k+1}^{r}p(x_{j}), \end{aligned}$$

and define \({\overline{\varphi }}_{[r-k]}\) similarly. Then, for \(k=1,\dots ,r\),

$$\begin{aligned} (P^{r-k}h_{n,\vartheta }) (d_{1:k})= & {} \left( \prod _{j=1}^{k} L_{b_{n}}(x-x_{j}) \right) \int _{[-1,1]^{m(r-k)}}\varphi _{[r-k]}(v_{1:k},x-b_{n} x_{k+1:r}) \\&\left( \prod _{j=k+1}^{r}L(x_{j}) \right) dx_{k+1:r}, \end{aligned}$$

where \(x-b_{n}x_{k+1:r} = (x-b_{n}x_{k+1},\dots ,x-b_{n}x_{r})\). Likewise, we have

$$\begin{aligned} (P^{r-k}H_{n}) (d_{1:k})&\lesssim b_{n}^{-(k-1/2)m} \left( \prod _{i=1}^{k} 1_{{\mathcal {X}}^{\zeta /2}}(x_{i}) \right) \left( \prod _{1 \leqslant i < j \leqslant k} 1_{[-2,2]^m}(b_{n}^{-1}(x_{i}-x_{j})) \right) \\&\quad \times \, \int _{[-2,2]^{m (r-k)}} {\overline{\varphi }}_{[r-k]} (v_{1:k},x_{1}-b_{n}x_{k+1:r}) dx_{k+1:r}. \end{aligned}$$

Suppose first that q is finite and let \(\ell \in [2,q]\). Observe that by Jensen’s inequality,

$$\begin{aligned} \begin{aligned} \Vert P^{r-k}h_{n,\vartheta } \Vert _{P^k,\ell }^{\ell }&\leqslant C^{\ell } b_n^{-(\ell -1)mk} \int _{[-1,1]^{mr}} {\mathbb {E}}\left[ {\overline{\varphi }}^{\ell }(V_{1:r}) \mid X_{1:r} = x-b_{n}x_{1:r} \right] \\&\quad \left( \prod _{j=1}^k p(x-b_n x_{j}) \right) d {x_{1:r}} \\&\leqslant C^{\ell } b_{n}^{-(\ell -1)mk} \int _{[-1,1]^{mr}} {\mathbb {E}}\left[ {\overline{\varphi }}^{\ell }(V_{1:r}) \mid X_{1:r}=x-b_{n}x_{1:r} \right] dx_{1:r} \\&\leqslant C^{\ell } b_{n}^{-(\ell -1)mk}, \end{aligned} \end{aligned}$$

so that \(\sup _{h \in {\mathcal {H}}_n} \Vert P^{r-k}h \Vert _{P^k,\ell } \lesssim b_n^{-m[(k-1/2)-k/\ell ]}\). Hence, we may choose \({\overline{\sigma }}_{\mathfrak {g}}\simeq 1\) and \(\sigma _{\mathfrak {h}}\simeq b_n^{-m/2}\). Similarly, Jensen’s inequality and the symmetry of \({\overline{\varphi }}\) yield that

$$\begin{aligned} \Vert P^{r-k} H_n \Vert _{P^k,\ell }^\ell&\leqslant C^{\ell } b_n^{-(k-1/2)m\ell +m(k-1)} \times \int _{{\mathcal {X}}^{\zeta /2} \times [-2,2]^{m(r-1)}} \\&\quad {\mathbb {E}}\left[ {\overline{\varphi }}^{\ell }(V_{1:r}) \mid X_1 = x_1, X_{2:r} = x_1-b_n x_{2:r} \right] p(x_1) \\&\quad \prod _{j=2}^{k} p(x_1 - b_n x_j) d x_{1:r} \\&\leqslant C^{\ell } b_n^{-(k-1/2)m\ell +m(k-1)}\int _{{\mathcal {X}}^{\zeta /2} \times [-2,2]^{m(r-1)}} \\&\quad {\mathbb {E}}\left[ {\overline{\varphi }}^{\ell }(V_{1:r}) \mid X_1 = x_1, X_{2:r} = x_1-b_n x_{2:r} \right] d x_{1:r} \\&\leqslant C^{\ell } b_n^{-(k-1/2)m\ell +m(k-1)}, \end{aligned}$$

so that \(\Vert P^{r-k} H_n \Vert _{P^k,\ell } \lesssim b_n^{-m[(1-1/\ell )k - (1/2-1/\ell )]}\). Hence, we may choose \(b_{\mathfrak {g}}\simeq b_n^{-m/2}\), \(b_{\mathfrak {h}}\simeq b_n^{-3m/2}\), and bound \(\chi _{n}\) as

$$\begin{aligned} \chi _n \lesssim \sum _{k=3}^r n^{-(k-1)/2} (\log {n})^{k/2} b_n^{-mk/2} \lesssim {(\log n)^{3/2}\over n b_n^{3m/2}}. \end{aligned}$$
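The second inequality collapses the sum to its first term. To see why, note (assuming the bandwidth condition \(nb_{n}^{m} \gtrsim \log n\), which is consistent with the conditions of the theorem) that the ratio of consecutive summands is

```latex
\frac{n^{-k/2}(\log n)^{(k+1)/2}\, b_n^{-m(k+1)/2}}
     {n^{-(k-1)/2}(\log n)^{k/2}\, b_n^{-mk/2}}
  = \left( \frac{\log n}{n b_n^{m}} \right)^{1/2},
```

so that, whenever \(nb_{n}^{m} \geqslant C \log n\) for a sufficiently large constant C, the summands decay geometrically and the \(k=3\) term dominates the sum up to a constant depending only on r.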

Similar calculations yield that

$$\begin{aligned} \Vert (P^{r-2}H_{n})^{\odot 2} \Vert ^{q/2}_{P^{2},q/2}&\leqslant C^{q} b_n^{-m(q-1)} \int _{{\mathcal {X}}^{\zeta /2} \times [-2,2]^{m(r-1)}} \\&\quad {\mathbb {E}}\left[ {\overline{\varphi }}^{q}(V_{1:r}) \mid X_1 = x_1, X_{2:r} = x_1-b_n x_{2:r} \right] d x_{1:r} \\&\leqslant C^{q} b_{n}^{-m(q-1)}. \end{aligned}$$

Hence, \(\nu _{{\mathfrak {h}}} \lesssim b_n^{-m(1-1/q)}\).

It is not difficult to verify that (38) and (39) hold in the \(q=\infty \) case as well under the convention that \(1/q=0\) for \(q=\infty \). This completes the proof. \(\square \)

Proof of Corollary 4.3

Let \(\eta _{n} := Cn^{-c}\), where the constants \(c\) and \(C\) are those given in Theorem 4.2. Denote by \(q_{{\widetilde{S}}_{n}}(\alpha )\) the \(\alpha \)-quantile of \({\widetilde{S}}_{n}\). Define the event

$$\begin{aligned} {\mathcal {E}}_{n}: =\left\{ \sup _{t \in {\mathbb {R}}} \left| {\mathbb {P}}_{\mid D_{1}^{n}} ({\widehat{S}}_{n}^{\sharp } \leqslant t) - {\mathbb {P}}({\widetilde{S}}_{n} \leqslant t) \right| \leqslant \eta _{n} \right\} , \end{aligned}$$

whose probability is at least \(1-\eta _{n}\). On this event,

$$\begin{aligned} {\mathbb {P}}_{\mid D_{1}^{n}} \left\{ {\widehat{S}}_{n}^{\sharp } \leqslant q_{{\widetilde{S}}_{n}}(\alpha +\eta _{n}) \right\}&\geqslant {\mathbb {P}}\left\{ {\widetilde{S}}_{n}\leqslant q_{{\widetilde{S}}_{n}}(\alpha +\eta _{n}) \right\} - \eta _{n} \\&= \alpha +\eta _{n} - \eta _{n} = \alpha , \end{aligned}$$

where the first equality follows from the fact that the distribution function of \({\widetilde{S}}_{n}\) is continuous (cf. Lemma A.1). This shows that the inequality \(q_{{\widehat{S}}_{n}^{\sharp }}(\alpha ) \leqslant q_{{\widetilde{S}}_{n}}(\alpha +\eta _{n})\) holds on the event \({\mathcal {E}}_{n}\), so that

$$\begin{aligned} {\mathbb {P}}\left\{ {\widehat{S}}_{n} \leqslant q_{{\widehat{S}}_{n}^{\sharp }}(\alpha ) \right\}&\leqslant {\mathbb {P}}\left\{ {\widehat{S}}_{n} \leqslant q_{{\widetilde{S}}_{n}}(\alpha +\eta _{n}) \right\} + {\mathbb {P}}( {\mathcal {E}}_{n}^{c}) \\&\leqslant {\mathbb {P}}\left\{ {\widetilde{S}}_{n} \leqslant q_{{\widetilde{S}}_{n}}(\alpha +\eta _{n}) \right\} + 2\eta _{n} \\&= \alpha + 3\eta _{n}. \end{aligned}$$

The above discussion presumes that \(\alpha + \eta _{n} < 1\), but if \(\alpha + \eta _{n} \geqslant 1\), then the last inequality is trivial. Likewise, we have \({\mathbb {P}}\left\{ {\widehat{S}}_{n} \leqslant q_{{\widehat{S}}_{n}^{\sharp }}(\alpha ) \right\} \geqslant \alpha -3\eta _{n}\). This completes the proof. \(\square \)
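The quantile-comparison device used in this proof (a Kolmogorov-distance bound of size \(\eta _{n}\) shifts quantile levels by at most \(\eta _{n}\)) is purely distribution-level and can be checked numerically. The sketch below is our own illustration, not code from the paper: a standard normal stands in for \(\widetilde{S}_{n}\), and a slightly shifted normal stands in for the bootstrap law, which lies within \(\eta \) of it in Kolmogorov distance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, eta, alpha = 200_000, 0.02, 0.90

# Stand-in for the law of S~_n, and a "bootstrap" law within eta in Kolmogorov distance
# (the sup-distance between N(0,1) and N(0.01,1) is about 0.004 < eta).
s_tilde = rng.normal(size=n_draws)
s_boot = rng.normal(loc=0.01, size=n_draws)

# If sup_t |P(S# <= t) - P(S~ <= t)| <= eta, then the alpha-quantile of S#
# lies between the (alpha - eta)- and (alpha + eta)-quantiles of S~.
q_boot = np.quantile(s_boot, alpha)
q_lo = np.quantile(s_tilde, alpha - eta)
q_hi = np.quantile(s_tilde, alpha + eta)
print(q_lo <= q_boot <= q_hi)  # True
```

The same sandwich, applied with \(\alpha \pm \eta _{n}\), is exactly what delivers the \(\alpha \pm 3\eta _{n}\) coverage bound above.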

Proof of Lemma 4.1

We begin with noting that

$$\begin{aligned} \left| \frac{{\widehat{c}}_{n}(\vartheta )}{c_{n}(\vartheta )} - 1 \right| \leqslant \left| \frac{{\widehat{c}}_{n}^2(\vartheta )}{c_{n}^2(\vartheta )} - 1 \right|&\leqslant \frac{1}{n} \sum _{i=1}^{n} \left[ \{ U_{n-1,-i}^{(r-1)}(\delta _{D_{i}}\breve{h}_{n,\vartheta }) - U_{n}(\breve{h}_{n,\vartheta }) \}^2 - 1 \right] , \end{aligned}$$

where \(\breve{h}_{n,\vartheta } = b_{n}^{m/2}c_{n}(\vartheta )^{-1} h_{n,\vartheta }\). We note that \(\mathrm {Var}_{P}(P^{r-1}\breve{h}_{n,\vartheta }) =1\) by the definition of \(c_{n}(\vartheta )\). Recall from the proof of Theorem 4.2 that the function class \({\mathcal {H}}_{n} =\{ \breve{h}_{n,\vartheta } : \vartheta \in \Theta \}\) is VC type with characteristics \((A', v')\) satisfying \(\log A' \lesssim \log n\) and \(v' \lesssim 1\) for envelope \(H_{n}\). Now, from Step 5 in the proof of Theorem 3.1 applied with \({\mathcal {H}}= {\mathcal {H}}_{n}\), we have for every \(\gamma \in (0,1)\), with probability at least \(1-\gamma -n^{-1}\),

$$\begin{aligned}&\left\| \frac{1}{n} \sum _{i=1}^{n} \left[ \{ U_{n-1,-i}^{(r-1)}(\delta _{D_{i}}h) - U_{n}(h) \}^2 - 1 \right] \right\| _{{\mathcal {H}}_{n}} \\&\quad \leqslant C\gamma ^{-1} \Bigg [ (b_{{\mathfrak {g}}} \vee \sigma _{{\mathfrak {h}}}){\overline{\sigma }}_{{\mathfrak {g}}}K_n^{1/2} n^{-1/2} + b_{{\mathfrak {g}}}^{2} K_{n}n^{-1+2/q} \\&\qquad +\, {\overline{\sigma }}_{{\mathfrak {g}}} \left\{ \nu _{{\mathfrak {h}}} K_{n} n^{-3/4+1/q} + (\sigma _{{\mathfrak {h}}} b_{{\mathfrak {h}}})^{1/2} K_{n}^{3/4} n^{-3/4} + b_{{\mathfrak {h}}} K_{n}^{3/2}n^{-1+1/q}+ \chi _{n} \right\} \Bigg ] \end{aligned}$$

for some constant C depending only on r. The desired result follows from the choices of parameters \({\overline{\sigma }}_{{\mathfrak {g}}}, b_{{\mathfrak {g}}}, \sigma _{{\mathfrak {h}}}, b_{{\mathfrak {h}}}, \chi _{n}\), and \(\nu _{{\mathfrak {h}}}\) given in the proof of Theorem 4.2 together with choosing \(\gamma = n^{-c}\) for some constant c sufficiently small but depending only on \(r, m, \zeta , c_{1},c_{2}, C_{1}, L\). \(\square \)

Proof of Theorem 4.4

The proof parallels that of Theorem 4.2, so we only highlight the differences. Define the function class

$$\begin{aligned} {\mathcal {H}}_{n} = \left\{ b^{m/2} c_{n}(\vartheta ,b)^{-1} h_{\vartheta ,b} : \vartheta \in \Theta , b \in {\mathcal {B}}_{n} \right\} \end{aligned}$$

with a symmetric envelope

$$\begin{aligned}&H_{n}(d_{1:r}) = {\underline{b}}_{n}^{-(r-1/2)m} c_{1}^{-1} \Vert L \Vert _{{\mathbb {R}}^{m}}^{r} {\overline{\varphi }}(v_{1:r}) \prod _{i=1}^{r} 1_{{\mathcal {X}}^{\zeta /2}}(x_{i})\\&\quad \prod _{1 \leqslant i < j \leqslant r} 1_{[-2,2]^m}({\overline{b}}_{n}^{-1}(x_{i}-x_{j})). \end{aligned}$$

Recall that we assume \(q=\infty \) in this theorem. In view of the calculations in the proof of Theorem 4.2, we may choose

$$\begin{aligned} {\underline{\sigma }}_{{\mathfrak {g}}} \simeq 1, \ {\overline{\sigma }}_{{\mathfrak {g}}} \simeq 1, \ b_{{\mathfrak {g}}} \simeq \kappa _{n}^{m(r-1)} {\underline{b}}_{n}^{-m/2}, \ \sigma _{{\mathfrak {h}}} \simeq {\underline{b}}_{n}^{-m/2}, \ b_{{\mathfrak {h}}} \simeq \kappa _{n}^{m(r-2)} {\underline{b}}_{n}^{-3m/2}, \end{aligned}$$

and bound \(\nu _{{\mathfrak {h}}}\) and \(\chi _{n}\) as

$$\begin{aligned} \nu _{{\mathfrak {h}}} \lesssim \kappa _{n}^{m/2} {\underline{b}}_{n}^{-m}, \ \chi _{n} \lesssim {\kappa _{n}^{m(r-2)} (\log n)^{3/2} \over n {\underline{b}}_{n}^{3m/2}}. \end{aligned}$$

Given these choices and bounds, the conclusion of the theorem follows from repeating the proof of Theorem 4.2. \(\square \)

Appendix D. Conditional UCLT for JMB

In this section we prove the conditional UCLT for the JMB when the function class \({\mathcal {H}}\) and the distribution P are independent of n, under a metric entropy condition. We follow the notation used in Sects. 2 and 3, but since we consider a limit theorem, we assume that the probability space is \((\Omega ,{\mathcal {A}},{\mathbb {P}}) = (S^{{\mathbb {N}}},{\mathcal {S}}^{{\mathbb {N}}},P^{{\mathbb {N}}}) \times (\Xi , {\mathcal {C}}, R)\) and that \(X_{1},X_{2},\dots \) are the coordinate projections of \((S^{{\mathbb {N}}},{\mathcal {S}}^{{\mathbb {N}}},P^{{\mathbb {N}}})\). To formulate the conditional UCLT, recall that weak convergence in \(\ell ^{\infty }({\mathcal {H}})\) is “metrized” by the bounded Lipschitz distance: for arbitrary maps \({\mathbb {X}}_{n}: \Omega \rightarrow \ell ^{\infty }({\mathcal {H}})\) and a tight Borel measurable map \({\mathbb {X}}: \Omega \rightarrow \ell ^{\infty }({\mathcal {H}})\), \({\mathbb {X}}_{n}\) converge weakly to \({\mathbb {X}}\) if and only if

$$\begin{aligned} d_{BL}({\mathbb {X}}_{n},{\mathbb {X}}) := \sup _{f \in BL_{1}} | {\mathbb {E}}^{*}[f({\mathbb {X}}_{n})] - {\mathbb {E}}[f({\mathbb {X}})]| \rightarrow 0, \end{aligned}$$

where \(BL_{1} = \{ f : \ell ^{\infty }({\mathcal {H}}) \rightarrow {\mathbb {R}}: |f| \leqslant 1, |f(x)-f(y)| \leqslant \Vert x-y \Vert _{{\mathcal {H}}} \ \forall x,y \in \ell ^{\infty }({\mathcal {H}}) \}\); see [53, p. 73]. If the function class \({\mathcal {G}}= P^{r-1} {\mathcal {H}}= \{ P^{r-1} h : h \in {\mathcal {H}}\}\) is P-pre-Gaussian, then there exists a tight Gaussian random variable \(W_{P}\) in \(\ell ^{\infty }({\mathcal {G}})\) with mean zero and covariance function \({\mathbb {E}}[W_{P}(g)W_{P}(g')] = \mathrm {Cov}_{P} (g,g')\). Set \({\mathbb {W}}_{P} (h) = W_{P} \circ P^{r-1} (h)\), which is a tight Gaussian random variable in \(\ell ^{\infty }({\mathcal {H}})\) with mean zero and covariance function \({\mathbb {E}}[{\mathbb {W}}_{P} (h){\mathbb {W}}_{P}(h')] = \mathrm {Cov}_{P}(P^{r-1}h,P^{r-1}h')\). We will show that conditionally on \(X_{1}^{\infty } = \{ X_{1},X_{2},\dots \}\), \({\mathbb {U}}_{n}^{\sharp }\) converges weakly to \({\mathbb {W}}_{P}\) in probability in the sense that

$$\begin{aligned} d_{BL \mid X_{1}^{\infty }} ({\mathbb {U}}_{n}^{\sharp }, {\mathbb {W}}_{P}):= \sup _{f \in BL_{1}} | {\mathbb {E}}_{\mid X_{1}^{\infty }} [f({\mathbb {U}}^{\sharp }_{n})] - {\mathbb {E}}[f({\mathbb {W}}_{P})]| \end{aligned}$$

converges to zero in outer probability under regularity conditions (\({\mathbb {E}}_{\mid X_{1}^{\infty }}\) denotes the conditional expectation given \(X_{1}^{\infty }\)). Since the map \((\xi _{1},\dots ,\xi _{n}) \mapsto n^{-1/2} \sum _{i=1}^n \xi _{i}[ U_{n-1,-i}^{(r-1)} (\delta _{X_{i}}\cdot ) - U_n(\cdot ) ]\) is continuous from \({\mathbb {R}}^{n}\) into \(\ell ^{\infty }({\mathcal {H}})\), the multiplier process \({\mathbb {U}}_{n}^{\sharp }\) induces a Borel measurable map into \(\ell ^{\infty }({\mathcal {H}})\) for fixed \(X_{1}^{\infty }\). For an arbitrary map \(Y: \Omega \rightarrow {\mathbb {R}}\), let \(Y^{*}\) denote its measurable cover [53, Lemma 1.2.1].
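To make the multiplier map above concrete, the following sketch (our own illustration, not code from the paper) computes one JMB draw for an order \(r=2\) U-process indexed by the kernels \(h_{t}(x,y) = 1\{|x-y| \leqslant t\}\) over a grid of thresholds; for \(r=2\), the jackknife estimate of the Hájek projection at \(X_{i}\) is the leave-one-out average \(U_{n-1,-i}^{(1)}(\delta _{X_{i}}h_{t}) = (n-1)^{-1}\sum _{j \neq i} h_{t}(X_{i},X_{j})\).

```python
import numpy as np

def jmb_draw(X, thresholds, rng):
    """One jackknife multiplier bootstrap draw of U_n^sharp over a grid of kernels.

    For r = 2 and h_t(x, y) = 1{|x - y| <= t}:
      U_n^sharp(h_t) = n^{-1/2} * sum_i xi_i * (g_i(t) - U_n(h_t)),
    where g_i(t) is the leave-one-out average at X_i and xi_i are i.i.d. N(0,1).
    """
    n = len(X)
    # Pairwise kernel evaluations: K[t, i, j] = h_t(X_i, X_j).
    D = np.abs(X[:, None] - X[None, :])
    K = (D[None, :, :] <= thresholds[:, None, None]).astype(float)
    off = ~np.eye(n, dtype=bool)
    U_n = K[:, off].mean(axis=1)  # U_n(h_t), average over ordered pairs i != j
    # Leave-one-out averages g[t, i] = (n-1)^{-1} sum_{j != i} K[t, i, j].
    g = (K.sum(axis=2) - np.diagonal(K, axis1=1, axis2=2)) / (n - 1)
    xi = rng.normal(size=n)  # Gaussian multipliers, independent of the data
    return (xi * (g - U_n[:, None])).sum(axis=1) / np.sqrt(n)

rng = np.random.default_rng(1)
X = rng.normal(size=50)
t_grid = np.linspace(0.1, 2.0, 20)
draw = jmb_draw(X, t_grid, rng)  # one bootstrap path over the 20 thresholds
```

Repeating `jmb_draw` with fresh multipliers (data held fixed) gives the conditional law whose supremum quantiles calibrate the test statistics in Sect. 4.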

Theorem D.1

(Conditional UCLT for JMB) Let \({\mathcal {H}}\) be a fixed pointwise measurable class of symmetric measurable functions on \(S^{r}\) with symmetric envelope \(H \in L^{2}(P^{r})\) such that \(\int _{0}^{1} \sqrt{\lambda (\varepsilon )} d\varepsilon < \infty \) with \(\lambda (\varepsilon ) = \sup _{Q} \log N({\mathcal {H}},\Vert \cdot \Vert _{Q,2},\varepsilon \Vert H \Vert _{Q,2})\). Then \({\mathcal {G}}= P^{r-1}{\mathcal {H}}= \{ P^{r-1} h : h \in {\mathcal {H}}\}\) is P-pre-Gaussian, \(d_{BL}({\mathbb {U}}_{n}/r,{\mathbb {W}}_{P}) \rightarrow 0\), and \(d_{BL \mid X_{1}^{\infty }}({\mathbb {U}}_{n}^{\sharp },{\mathbb {W}}_{P})^{*} {\mathop {\rightarrow }\limits ^{{\mathbb {P}}}} 0\) as \(n \rightarrow \infty \).

Theorem D.1 should be compared with Theorem 2.1 in [5], which establishes a conditional UCLT for the empirical bootstrap for a non-degenerate U-process under the same metric entropy condition. Interestingly, however, our moment condition on the envelope H is weaker than their condition (2.3), which, if \(r=2\), requires \({\mathbb {E}}[H(X_1,X_1)]<\infty \) in addition to \({\mathbb {E}}[H^{2}(X_1,X_2)] < \infty \). The difference stems from how the Hájek projection is estimated: our JMB estimates it by a jackknife U-statistic, while the empirical bootstrap estimates it by a V-statistic (see Remark 3.1).

If we are interested in \(\sup _{h \in {\mathcal {H}}} {\mathbb {U}}_{n}(h)/r\), then the result of Theorem D.1 implies that

$$\begin{aligned} \begin{aligned}&\sup _{t \in {\mathbb {R}}} \left| {\mathbb {P}}\left( \sup _{h \in {\mathcal {H}}} {\mathbb {U}}_{n}(h)/r \leqslant t \right) - {\mathbb {P}}\left( \sup _{g \in {\mathcal {G}}} W_{P} (g) \leqslant t \right) \right| \rightarrow 0 \quad \text {and} \\&\sup _{t \in {\mathbb {R}}} \left| {\mathbb {P}}_{\mid X_{1}^{\infty }}\left( \sup _{h \in {\mathcal {H}}} {\mathbb {U}}_n^{\sharp } (h) \leqslant t \right) - {\mathbb {P}}\left( \sup _{g \in {\mathcal {G}}} W_{P} (g)\leqslant t\right) \right| {\mathop {\rightarrow }\limits ^{{\mathbb {P}}}} 0 \end{aligned} \end{aligned}$$

as long as the distribution function of \(\sup _{g \in {\mathcal {G}}} W_{P}(g)\) is continuous, which is true if \(\inf _{g \in {\mathcal {G}}} \mathrm {Var}_{P}(g) > 0\) (cf. Lemma A.1). When the function class \({\mathcal {H}}\) is centrally symmetric (i.e., \(-h \in {\mathcal {H}}\) whenever \(h \in {\mathcal {H}}\)), so that \(\sup _{h \in {\mathcal {H}}}{\mathbb {U}}_{n}(h) = \Vert {\mathbb {U}}_{n} \Vert _{{\mathcal {H}}}\), \(\sup _{g \in {\mathcal {G}}}W_{P}(g) = \Vert W_{P} \Vert _{{\mathcal {G}}}\), and \(\sup _{h \in {\mathcal {H}}}{\mathbb {U}}_{n}^{\sharp }(h) = \Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}}\), the distribution function of \(\Vert W_{P} \Vert _{{\mathcal {G}}}\) is continuous under the much less restrictive assumption that \(\mathrm {Var}_{P}(g) > 0\) for some \(g \in {\mathcal {G}}\). Indeed, from Theorem 11.1 in [17], the distribution of \(\Vert W_{P} \Vert _{{\mathcal {G}}}\) is (absolutely) continuous on \((\ell _{0},\infty )\) with \(\ell _{0} \geqslant 0\) being the left endpoint of the support of \(\Vert W_{P} \Vert _{{\mathcal {G}}}\), but from [37, p. 57–58], \(\ell _{0} = 0\). This implies that, unless \(\Vert W_{P} \Vert _{{\mathcal {G}}} = 0\) almost surely, the distribution function of \(\Vert W_{P} \Vert _{{\mathcal {G}}}\) does not have a jump at \(\ell _{0} = 0\) (as \({\mathbb {P}}(\Vert W_{P} \Vert _{{\mathcal {G}}} = 0) = 0\)) and so is everywhere continuous on \({\mathbb {R}}\).

Proof of Theorem D.1

The first two results are essentially implied by the proof of Theorem 4.9 in [4] but we include their proofs for completeness. By changing H to \(H \vee 1\) if necessary, we may assume \(\Vert G \Vert _{P,2} > 0\) (recall \(G=P^{r-1}H\)), which implies \(\Vert H \Vert _{P,2} > 0\). By Jensen’s inequality, \(\Vert P^{r-1}h \Vert _{P,2} \leqslant \Vert h \Vert _{P^{r},2}\) and so we have

$$\begin{aligned} N({\mathcal {G}},\Vert \cdot \Vert _{P,2},\tau \Vert H \Vert _{P^{r},2}) \leqslant N({\mathcal {H}}, \Vert \cdot \Vert _{P^{r},2}, \tau \Vert H \Vert _{P^{r},2}). \end{aligned}$$

The right hand side is bounded by \(\sup _{Q}N({\mathcal {H}},\Vert \cdot \Vert _{Q,2},\tau \Vert H \Vert _{Q,2}/4)\) by Lemma A.2. Conclude that

$$\begin{aligned} \int _{0}^{1} \sqrt{\log N({\mathcal {G}},\Vert \cdot \Vert _{P,2},\tau \Vert H \Vert _{P^{r},2})} d\tau < \infty , \end{aligned}$$

which implies by Dudley’s criterion for sample continuity that \({\mathcal {G}}\) is P-pre-Gaussian (to be precise we have to verify \(\int _{0}^{1} \sqrt{\log N(\{ g-Pg : g \in {\mathcal {G}}\},\Vert \cdot \Vert _{P,2},\tau )} d\tau < \infty \) but this is immediate). The convergence of marginals of \({\mathbb {U}}_{n}/r\) to \({\mathbb {W}}_{P}\) follows from the multidimensional CLT for U-statistics. To conclude \(d_{BL}({\mathbb {U}}_{n}/r,{\mathbb {W}}_{P}) \rightarrow 0\), it suffices to show the asymptotic equicontinuity condition

$$\begin{aligned} \lim _{\delta \downarrow 0} \limsup _{n \rightarrow \infty } {\mathbb {P}}\left( \sup _{\Vert h-h' \Vert _{P^{r},2} < \delta \Vert H \Vert _{P^{r},2}} | {\mathbb {U}}_{n} (h-h') | > \eta \right) = 0 \end{aligned}$$
(45)

holds for every \(\eta > 0\). We defer the proof of (45) to Lemma D.2 below.

To prove the last result of the theorem, let \(e_{P} (h,h') = \Vert P^{r-1}(h-h') \Vert _{P,2}\) and for given \(\delta > 0\) let \(\{ h_{1},\dots ,h_{N(\delta )} \}\) be a \((\delta \Vert G \Vert _{P,2})\)-net of \(({\mathcal {H}},e_{P})\). Let \(\pi _{\delta }: {\mathcal {H}}\rightarrow \{ h_{1},\dots ,h_{N(\delta )} \}\) be a map such that for each \(h \in {\mathcal {H}}\), \(e_{P} (h,\pi _{\delta }(h)) \leqslant \delta \Vert G \Vert _{P,2}\). Define \({\mathbb {U}}_{n,\delta }^{\sharp } := {\mathbb {U}}_{n}^{\sharp } \circ \pi _{\delta }\) and \({\mathbb {W}}_{P,\delta } := {\mathbb {W}}_{P} \circ \pi _{\delta }\). For any \(f \in BL_{1}\), we have

$$\begin{aligned} \begin{aligned} | {\mathbb {E}}_{\mid X_{1}^{\infty }}[f({\mathbb {U}}_{n}^{\sharp })] - {\mathbb {E}}[f({\mathbb {W}}_{P})] |&\leqslant | {\mathbb {E}}_{\mid X_{1}^{\infty }}[f({\mathbb {U}}_{n}^{\sharp })] - {\mathbb {E}}_{\mid X_{1}^{\infty }} [f({\mathbb {U}}_{n,\delta }^{\sharp })]| \\&\quad +\, |{\mathbb {E}}_{\mid X_{1}^{\infty }}[f({\mathbb {U}}_{n,\delta }^{\sharp })] - {\mathbb {E}}[f({\mathbb {W}}_{P,\delta })]| \\&\quad +\, | {\mathbb {E}}[f({\mathbb {W}}_{P,\delta })] - {\mathbb {E}}[f({\mathbb {W}}_{P})]|. \end{aligned} \end{aligned}$$
(46)

The third term on the right hand side of (46) is bounded by \({\mathbb {E}}[2 \wedge \Vert {\mathbb {W}}_{P,\delta } - {\mathbb {W}}_{P} \Vert _{{\mathcal {H}}}]\) and by construction \({\mathbb {W}}_{P}\) has sample paths almost surely uniformly \(e_{P}\)-continuous, so that \({\mathbb {E}}[2 \wedge \Vert {\mathbb {W}}_{P,\delta } - {\mathbb {W}}_{P} \Vert _{{\mathcal {H}}}] \rightarrow 0\) as \(\delta \downarrow 0\) by the dominated convergence theorem. Since \({\mathbb {U}}_{n,\delta }^{\sharp }\) can be identified with a Gaussian vector of dimension \(N(\delta )\) conditionally on \(X_{1}^{\infty }\), by Lemma 3.7.46 in [29], the second term on the right hand side of (46) is bounded by

$$\begin{aligned} c(\delta ) \max _{1 \leqslant j,k \leqslant N(\delta )} | {\widehat{C}}_{j,k} -\mathrm {Cov}_{P}(P^{r-1}h_{j},P^{r-1}h_{k}) |^{1/3} \end{aligned}$$

for some constant \(c(\delta )\) that depends only on \(\delta \), where

$$\begin{aligned} {\widehat{C}}_{j,k} = n^{-1}\sum _{i=1}^{n}\{ U_{n-1,-i}^{(r-1)}(\delta _{X_{i}}h_{j}) - U_{n}(h_{j}) \}\{ U_{n-1,-i}^{(r-1)}(\delta _{X_{i}}h_{k}) - U_{n}(h_{k}) \}. \end{aligned}$$

From Step 5 of the proof of Theorem 3.1 and using the notation in the proof, we have

$$\begin{aligned}&\max _{1 \leqslant j,k \leqslant N(\delta )} | {\widehat{C}}_{j,k} - \mathrm {Cov}_{P}(P^{r-1}h_{j},P^{r-1}h_{k}) | \\&\quad \leqslant 2 \Upsilon _{n} + 2\Vert G \Vert _{P,2} \Upsilon _{n}^{1/2} + 2n^{-1/2} \Vert {\mathbb {G}}_{n} \Vert _{\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}} + \Vert U_{n}(h) - P^{r}h \Vert _{{\mathcal {H}}}^{2}. \end{aligned}$$

From the UCLT for the U-process established in the first paragraph, the last term on the right hand side is \(o_{{\mathbb {P}}}(1)\). The function class \(\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}\) is weak P-Glivenko-Cantelli by Lemmas A.3 and A.5 together with Theorem 2.4.3 in [53], which implies that \(n^{-1/2} \Vert {\mathbb {G}}_{n} \Vert _{\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}} = o_{{\mathbb {P}}}(1)\). From Lemma D.3 below, we also have \(\Upsilon _{n} = o_{{\mathbb {P}}}(1)\).

Finally, the first term on the right hand side of (46) is bounded by

$$\begin{aligned} \varepsilon + 2{\mathbb {P}}_{\mid X_{1}^{\infty }} (\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{\delta }} > \varepsilon ) \end{aligned}$$

for any \(\varepsilon > 0\), where \({\mathcal {H}}_{\delta } = \{ h-h' : h,h' \in {\mathcal {H}}, e_{P}(h,h') < 2\delta \Vert G \Vert _{P,2} \}\). Let \(\Sigma _{n,\delta } := \Vert n^{-1} \sum _{i=1}^{n}\{ U_{n-1,-i}^{(r-1)} (\delta _{X_{i}}h) - U_{n}(h) \}^{2} \Vert _{{\mathcal {H}}_{\delta }}\). By Markov’s inequality,

$$\begin{aligned} {\mathbb {P}}_{\mid X_{1}^{\infty }} (\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{\delta }} > \varepsilon ) \leqslant \frac{{\mathbb {E}}_{\mid X_{1}^{\infty }}[\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{\delta }}]}{\varepsilon }. \end{aligned}$$

From Step 5 of the proof of Theorem 3.1,

$$\begin{aligned} N({\mathcal {H}}_{\delta },d,2\tau \Vert H \Vert _{{\mathbb {P}}_{I_{n,r},2}}) \leqslant N^{2}({\mathcal {H}},\Vert \cdot \Vert _{{\mathbb {P}}_{I_{n,r},2}}, \tau \Vert H \Vert _{{\mathbb {P}}_{I_{n,r},2}}) \end{aligned}$$

with \(d(h,h') = \{ {\mathbb {E}}_{\mid X_{1}^{\infty }} [\{ {\mathbb {U}}_{n}^{\sharp } (h) - {\mathbb {U}}_{n}^{\sharp } (h') \}^{2}]\}^{1/2}\). Hence by Dudley’s entropy integral bound, we have

$$\begin{aligned} {\mathbb {E}}_{\mid X_{1}^{\infty }}[\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{\delta }}] \lesssim \int _{0}^{\Sigma _{n,\delta }^{1/2}} \sqrt{1 + \lambda (\tau /\Vert H \Vert _{{\mathbb {P}}_{I_{n,r},2}})} d\tau \end{aligned}$$

up to a constant independent of n and \(\delta \), and \(\Vert H \Vert _{{\mathbb {P}}_{I_{n,r},2}}^{2} = |I_{n,r}|^{-1}\sum _{I_{n,r}} H^{2}(X_{i_{1}},\dots ,X_{i_{r}}) = \Vert H \Vert _{P^{r},2}^{2} + o_{{\mathbb {P}}}(1)\) by the law of large numbers for U-statistics [18, Theorem 4.1.4]. From Step 4 of the proof of Theorem 3.1,

$$\begin{aligned} \Sigma _{n,\delta } \leqslant 8(\delta \Vert G \Vert _{P,2})^{2} + 8n^{-1/2} \Vert {\mathbb {G}}_{n} \Vert _{\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}} +8 \Upsilon _{n}, \end{aligned}$$

and the last two terms on the right hand side are \(o_{{\mathbb {P}}}(1)\) while the first term can be arbitrarily small by taking \(\delta \) sufficiently small. This implies that for any \(\eta > 0\),

$$\begin{aligned} \lim _{\delta \downarrow 0} \limsup _{n \rightarrow \infty }{\mathbb {P}}\left( {\mathbb {P}}_{\mid X_{1}^{\infty }} (\Vert {\mathbb {U}}_{n}^{\sharp } \Vert _{{\mathcal {H}}_{\delta }}> \varepsilon ) > \eta \right) = 0. \end{aligned}$$

Putting everything together, we conclude \(d_{BL \mid X_{1}^{\infty }}({\mathbb {U}}_{n}^{\sharp },{\mathbb {W}}_{P})^{*} {\mathop {\rightarrow }\limits ^{{\mathbb {P}}}} 0\), completing the proof. \(\square \)

Lemma D.2

Under the assumption of Theorem D.1, the asymptotic equicontinuity condition (45) holds.

Proof of Lemma D.2

For \(\delta \in (0,1]\), let \({\mathcal {H}}_{\delta }' = \{ h -h' : h, h' \in {\mathcal {H}}, \Vert h - h' \Vert _{P^{r},2} < \delta \Vert H \Vert _{P^{r},2} \}\). By Markov’s inequality, it suffices to show that

$$\begin{aligned} \lim _{\delta \downarrow 0} \limsup _{n \rightarrow \infty } {\mathbb {E}}[ \Vert {\mathbb {U}}_{n} \Vert _{{\mathcal {H}}_{\delta }'}] = 0. \end{aligned}$$

We use Hoeffding’s averaging [49, Section 5.1.6] to bound the expectation. Let

$$\begin{aligned} S_{f}(x_{1},\dots ,x_{n}) = \frac{1}{m} \sum _{i=1}^{m} f(x_{(i-1)r+1},\dots ,x_{ir}) \ \text {with} \ m=\lfloor n/r \rfloor . \end{aligned}$$

Then we have

$$\begin{aligned} U_{n}(h) = \frac{1}{n!} \sum _{j_{1},\dots ,j_{n}} S_{h}(X_{j_{1}},\dots ,X_{j_{n}}), \end{aligned}$$

where \(\sum _{j_{1},\dots ,j_{n}}\) is taken over all permutations \(j_{1},\dots ,j_{n}\) of \(1,\dots ,n\). By Jensen’s inequality, \({\mathbb {E}}[ \Vert {\mathbb {U}}_{n} \Vert _{{\mathcal {H}}_{\delta }'}]\) is bounded by \(\sqrt{n}{\mathbb {E}}[\Vert S_{h}(X_{1},\dots ,X_{n}) - P^{r}h \Vert _{{\mathcal {H}}_{\delta }'}]\). Since

$$\begin{aligned} S_{h}(X_{1},\dots ,X_{n}) - P^{r}h = \frac{1}{m} \sum _{i=1}^{m} (h(X_{(i-1)r+1},\dots ,X_{ir}) - P^{r}h) \end{aligned}$$

and since \((X_{(i-1)r+1},\dots ,X_{ir}) , i=1,\dots ,m\) are i.i.d., we can apply Theorem 5.2 in [14] to conclude that

$$\begin{aligned} {\mathbb {E}}[ \Vert {\mathbb {U}}_{n} \Vert _{{\mathcal {H}}_{\delta }'}] \lesssim \Vert H \Vert _{P^{r},2}J(\delta ,{\mathcal {H}}_{\delta }', 2H) + \frac{\Vert M_{r} \Vert _{{\mathbb {P}},2}J^{2}(\delta ,{\mathcal {H}}_{\delta }',2H)}{\delta ^{2} \sqrt{m}} \end{aligned}$$

up to a constant that depends only on r, where \(M_{r} = \max _{1 \leqslant i \leqslant m} H(X_{(i-1)r+1},\dots ,X_{ir})\) and the function J is defined in [14]. From a standard calculation, \(J(\delta ,{\mathcal {H}}_{\delta }', 2H) \lesssim J(\delta ,{\mathcal {H}},H) = \int _{0}^{\delta }\sqrt{1+\lambda (\tau )} d\tau \) up to a universal constant, and \(\Vert M_{r} \Vert _{{\mathbb {P}},2} = o(\sqrt{m})\) since \(H \in L^{2}(P^{r})\) [53, Problem 2.3.4]. Hence we conclude

$$\begin{aligned} \limsup _{n \rightarrow \infty } {\mathbb {E}}[ \Vert {\mathbb {U}}_{n} \Vert _{{\mathcal {H}}_{\delta }'}] \lesssim \Vert H \Vert _{P^{r},2}J(\delta ,{\mathcal {H}},H) \end{aligned}$$

up to a constant that depends only on r, and by the dominated convergence theorem the right hand side is o(1) as \(\delta \downarrow 0\). This completes the proof. \(\square \)
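Hoeffding’s averaging device used in the proof above, which represents \(U_{n}\) as the average over permutations of the block means \(S_{h}\), can be verified exactly for small n. The sketch below is our own illustration with a symmetric kernel \(h(x,y)=xy\): enumerating all permutations of a sample of size \(n=4\) recovers \(U_{n}\) exactly.

```python
import itertools
import numpy as np

def u_stat(h, X):
    """Order-2 U-statistic: average of h over ordered pairs (i, j), i != j."""
    n = len(X)
    return sum(h(X[i], X[j]) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))

def block_mean(h, x):
    """S_h for r = 2: average of h over m = floor(n/2) disjoint consecutive blocks."""
    m = len(x) // 2
    return sum(h(x[2 * i], x[2 * i + 1]) for i in range(m)) / m

h = lambda x, y: x * y  # a symmetric kernel
X = np.array([0.3, -1.2, 0.7, 2.1])

# Hoeffding's identity: averaging S_h over all n! orderings of the sample gives U_n.
perm_avg = np.mean([block_mean(h, X[list(p)]) for p in itertools.permutations(range(4))])
print(np.isclose(perm_avg, u_stat(h, X)))  # True
```

The identity is what lets the proof pass from the U-process to an average of i.i.d. block means, to which empirical process bounds such as Theorem 5.2 in [14] apply.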

Lemma D.3

Under the assumption of Theorem D.1, we have \({\mathbb {E}}[\Upsilon _{n}]= O(n^{-1})\) where \(\Upsilon _{n}\) is defined in (31).

Proof of Lemma D.3

We begin with noting that

$$\begin{aligned} {\mathbb {E}}[\Upsilon _{n}] \leqslant {\mathbb {E}}\left[ {\mathbb {E}}\left[ \left\| U_{n-1,-n}^{(r-1)} (\delta _{X_{n}}h) - P^{r-1}(\delta _{X_{n}}h) \right\| _{{\mathcal {H}}}^{2} \ \Big | \ X_{n} \right] \right] . \end{aligned}$$

By Hoeffding’s averaging [49, Section 5.1.6],

$$\begin{aligned} U_{n-1,-n}^{(r-1)} (f) = \frac{1}{(n-1)!} \sum _{j_{1},\dots ,j_{n-1}} T_{f}(X_{j_{1}},\dots ,X_{j_{n-1}}), \end{aligned}$$

where \(\sum _{j_{1},\dots ,j_{n-1}}\) is taken over all permutations \(j_{1},\dots ,j_{n-1}\) of \(1,\dots ,n-1\), and

$$\begin{aligned} T_{f}(x_{1},\dots ,x_{n-1}) {=} \frac{1}{m} \sum _{i=1}^{m} f(x_{(i-1)(r-1){+}1},\dots ,x_{i(r-1)}) \ \text {with} \ m=\lfloor (n-1)/(r-1) \rfloor . \end{aligned}$$

By Jensen’s inequality,

$$\begin{aligned}&{\mathbb {E}}\left[ \left\| U_{n-1,-n}^{(r-1)} (\delta _{X_{n}}h) - P^{r-1}(\delta _{X_{n}}h) \right\| _{{\mathcal {H}}}^{2} \ \Big | \ X_{n} \right] \\&\quad \leqslant {\mathbb {E}}\left[ \left\| T_{\delta _{X_{n}}h} (X_{1},\dots ,X_{n-1}) - P^{r-1}(\delta _{X_{n}}h) \right\| _{{\mathcal {H}}}^{2} \ \Big | \ X_{n} \right] . \end{aligned}$$

By Corollary A.4 and the condition of Theorem D.1, for given \(x \in S\),

$$\begin{aligned} \int _{0}^{1} \sqrt{ \sup _{Q} \log N(\delta _{x} {\mathcal {H}}, \Vert \cdot \Vert _{Q,2},\tau \Vert \delta _{x} H \Vert _{Q,2})} \, d\tau \leqslant \int _{0}^{1} \sqrt{\lambda (\tau )} \, d\tau < \infty . \end{aligned}$$

Hence, applying Theorem 2.14.1 in [53] conditionally on \(X_{n}\), we have

$$\begin{aligned}&{\mathbb {E}}\left[ \left\| T_{\delta _{X_{n}}h} (X_{1},\dots ,X_{n-1}) - P^{r-1}(\delta _{X_{n}}h) \right\| _{{\mathcal {H}}}^{2} \ \Big | \ X_{n} \right] \lesssim n^{-1} \Vert \delta _{X_{n}} H \Vert _{P^{r-1},2}^{2} \end{aligned}$$

up to a constant independent of n. Since \({\mathbb {E}}[\Vert \delta _{X_{n}} H \Vert _{P^{r-1},2}^{2}] = \Vert H \Vert _{P^{r},2}^{2}\), we obtain the desired conclusion by Fubini’s theorem. \(\square \)

Appendix E. Gaussian approximation for suprema of U-processes indexed by general function classes

In this section we derive Gaussian approximation error bounds for the U-process supremum indexed by general function classes. We follow the notation used in Sects. 2, 3 and 5. We make the following assumptions on the function class \({\mathcal {H}}\) and the distribution P.

  1. (A1)

    The function class \({\mathcal {H}}\) is pointwise measurable.

  2. (A2)

    The envelope H satisfies that \(H \in L^{3}(P^{r})\).

  3. (A3)

The class \({\mathcal {G}}= P^{r-1} {\mathcal {H}}= \{ P^{r-1} h : h \in {\mathcal {H}}\}\) is P-pre-Gaussian, i.e., there exists a tight Gaussian random variable \(W_{P}\) in \(\ell ^{\infty }({\mathcal {G}})\) with mean zero and covariance function \({\mathbb {E}}[W_{P}(g) W_{P}(g')] = \mathrm {Cov}(g(X_{1}), g'(X_{1}))\) for all \(g,g' \in {\mathcal {G}}\).

Conditions (A1)–(A3) parallel the corresponding conditions in [14]. Condition (A1) is the same as Condition (PM) in Sect. 2. Condition (A3) is a high-level assumption that is implied by Condition (VC) in Sect. 2.

For \(\varepsilon > 0\), define \({\mathcal {N}}_{n}(\varepsilon ) = \log (N({\mathcal {G}}, \Vert \cdot \Vert _{P,2}, \varepsilon \Vert G \Vert _{P,2}) \vee n)\) with \(G= P^{r-1}H\). Under Condition (A3), \({\mathcal {G}}\) is totally bounded for the intrinsic pseudometric induced by \(\Vert \cdot \Vert _{P,2}\) and \({\mathcal {N}}_{n}(\varepsilon )\) is finite for every \(\varepsilon \in (0,1]\). In addition, the Gaussian process \(W_{P}\) extends to the linear hull of \({\mathcal {G}}\) in such a way that \(W_{P}\) has linear sample paths (see e.g., Theorem 3.7.28 in [29]). For \(\varepsilon \in (0,1], \gamma \in (0,1)\), and \(\kappa > 0\), define

$$\begin{aligned} \Delta _n(\varepsilon , \gamma , \kappa ) :=\,&\gamma ^{-1} {\mathbb {E}}[\Vert {\mathbb {G}}_{n}\Vert _{{\mathcal {G}}_{\varepsilon }}] + {\mathbb {E}}[\Vert W_{P}\Vert _{{\mathcal {G}}_{\varepsilon }}] \\&+\, \sqrt{\log (1/\gamma )} \varepsilon \Vert G \Vert _{P,2} + n^{-1/6} \gamma ^{-1/3} \kappa {\mathcal {N}}_{n}^{2/3}(\varepsilon ) \\&+\, n^{-1/4} \gamma ^{-1/2} ({\mathbb {E}}\Vert {\mathbb {G}}_{n}\Vert _{\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}})^{1/2} {\mathcal {N}}_{n}^{1/2}(\varepsilon ) \\&+\, n^{1/2} \gamma ^{-1} \sum _{k=2}^{r} {\mathbb {E}}[\Vert U_{n}^{(k)}(\pi _{k} h)\Vert _{{\mathcal {H}}}], \\ \delta _{n}(\varepsilon , \gamma , \kappa ) :=\,&{1 \over 5} P \left[ (\breve{G}/\kappa )^{3} {1}(\breve{G}/\kappa > c \gamma ^{-1/3} n^{1/3} {\mathcal {N}}_{n}(\varepsilon )^{-1/3}) \right] , \end{aligned}$$

where \({\mathcal {G}}_{\varepsilon } = \{g-g' : g, g' \in {\mathcal {G}}, \Vert g-g'\Vert _{P,2} < 2\varepsilon \Vert G\Vert _{P,2}\}\), \(\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}} = \{gg' : g, g' \in \breve{{\mathcal {G}}}\}\), \(\breve{{\mathcal {G}}} = \{g, g-Pg : g \in {\mathcal {G}}\}\), and \(\breve{G} = G + PG\). Here \(c > 0\) is some universal constant. Below is an abstract (yet general) version of the Gaussian coupling bound.

Proposition E.1

(Abstract Gaussian coupling bound) Let \(Z_{n} = \sup _{h \in {\mathcal {H}}} {\mathbb {U}}_{n}(h)/r\). Suppose that Conditions (A1)–(A3) hold. Let \(\kappa > 0\) be any positive constant such that \(\kappa ^{3} \geqslant {\mathbb {E}}[\Vert n^{-1}\sum _{i=1}^{n}|g(X_{i}) - P g|^{3}\Vert _{{\mathcal {G}}}]\). Then, for every \(n \geqslant r+1\), \(\varepsilon \in (0,1]\), and \(\gamma \in (0,1)\), one can construct a random variable \({\widetilde{Z}}_{n} = {\widetilde{Z}}_{n,\varepsilon ,\gamma ,\kappa }\) such that \({\mathcal {L}}({\widetilde{Z}}_{n}) = {\mathcal {L}}(\sup _{g \in {\mathcal {G}}} W_P(g))\) and

$$\begin{aligned} {\mathbb {P}}\left( |Z_{n} - {\widetilde{Z}}_{n}| > C_{1} \Delta _n(\varepsilon , \gamma ,\kappa ) \right) \leqslant \gamma \{1 + \delta _{n}(\varepsilon , \gamma ,\kappa )\} + {C_{2} \log {n} \over n}, \end{aligned}$$

where \(C_{1} = C_{1,r}\) is a constant depending only on r and \(C_{2}\) is a universal constant.

The proposition should be considered as an extension of Theorem 2.1 in [14] to the U-process. To apply the above proposition, we need to derive bounds on

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}[\Vert {\mathbb {G}}_{n}\Vert _{{\mathcal {G}}_{\varepsilon }}], \; {\mathbb {E}}[\Vert W_{P}\Vert _{{\mathcal {G}}_{\varepsilon }}], \; {\mathbb {E}}\left[ \left\| n^{-1}\sum _{i=1}^{n}|g(X_{i})-Pg|^{3}\right\| _{{\mathcal {G}}}\right] , \\&{\mathbb {E}}[\Vert {\mathbb {G}}_{n}\Vert _{\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}}], \; \text{ and } \; {\mathbb {E}}[\Vert U_{n}^{(k)}(\pi _{k} h)\Vert _{{\mathcal {H}}}], \; k = 2,\dots ,r, \end{aligned} \end{aligned}$$
(47)

which can be derived under some moment conditions on H and by using the uniform entropy integrals \(J_{k}(\delta ), k=1,\dots ,r\) defined in (19) (cf. Lemma 2.2 in [14] and our Theorem 5.1), where the latter can be simplified in terms of the VC characteristics (Av) for a VC type function class (cf. the proof of Corollary 5.3).

Proof of Proposition E.1

The proof is based on a modification to that of Theorem 2.1 in [14]. In this proof C denotes a generic universal constant; the value of C may change from place to place. Let \(\{g_{k}\}_{k=1}^{N}\) be a minimal \(\varepsilon \Vert G\Vert _{P,2}\)-net of \(({\mathcal {G}}, \Vert \cdot \Vert _{P,2})\) with \(N := N({\mathcal {G}}, \Vert \cdot \Vert _{P,2}, \varepsilon \Vert G\Vert _{P,2})\). By the definition of \({\mathcal {G}}\), each \(g_{k}\) corresponds to a kernel \(h_{k} \in {\mathcal {H}}\) such that \(g_{k}=P^{r-1}h_{k}\). Recall the Hoeffding decomposition \({\mathbb {U}}_{n}(h) = r {\mathbb {G}}_{n}(P^{r-1}h) + \sqrt{n} \sum _{k=2}^{r} {r \atopwithdelims ()k} U_{n}^{(k)}(\pi _{k}h)\), where \({\mathbb {G}}_{n}(P^{r-1} h) = n^{-1/2} \sum _{i=1}^{n} (P^{r-1}h (X_{i}) - P^{r}h)\). Let \(L_{n}=\sup _{g \in {\mathcal {G}}} {\mathbb {G}}_{n}(g)\) and \(R_{n}=\Vert r^{-1} \sqrt{n} \sum _{k=2}^{r} {r \atopwithdelims ()k} U_{n}^{(k)}(\pi _{k}h)\Vert _{{\mathcal {H}}}\). Then \(|Z_{n}-L_{n}| \leqslant R_{n}\). Define

$$\begin{aligned} L_{n}^{\varepsilon } = \max _{1 \leqslant j \leqslant N} {\mathbb {G}}_{n}(g_{j}), \; {\widetilde{Z}} = \sup _{g \in {\mathcal {G}}} W_{P}(g), \; {\widetilde{Z}}^{\varepsilon } = \max _{1 \leqslant j \leqslant N} W_{P}(g_{j}). \end{aligned}$$

We note that \(|L_{n}-L_{n}^{\varepsilon }| \leqslant \Vert {\mathbb {G}}_{n}\Vert _{{\mathcal {G}}_{\varepsilon }}\) and \(|{\widetilde{Z}}-{\widetilde{Z}}^{\varepsilon }| \leqslant \Vert W_{P}\Vert _{{\mathcal {G}}_{\varepsilon }}\). By Corollary 4.1 in [14], we have for every \(B \in {\mathcal {B}}({\mathbb {R}})\) and \(\delta > 0\),

$$\begin{aligned} {\mathbb {P}}(L_{n}^{\varepsilon } \in B) - {\mathbb {P}}({\widetilde{Z}}^{\varepsilon } \in B^{16\delta }) \leqslant C \delta ^{-2} \{T_{1}+\delta ^{-1}(T_{2}+T_{3}) {\mathcal {N}}_{n}(\varepsilon )\} {\mathcal {N}}_{n}(\varepsilon ) + C n^{-1} \log {n}, \end{aligned}$$

where

$$\begin{aligned} T_{1} =\,&n^{-1} \\&\quad {\mathbb {E}}\left[ \max _{1 \leqslant j,k \leqslant N} \left| \sum _{i=1}^{n} (g_{j}(X_{i}){-}P g_{j}) (g_{k}(X_{i}){-}P g_{k}) {-} P(g_{j}{-}P g_{j}) (g_{k}-P g_{k})\right| \right] , \\ T_{2} =\,&n^{-3/2} {\mathbb {E}}\left[ \max _{1 \leqslant j \leqslant N} \sum _{i=1}^{n} |g_{j}(X_{i}) {-} P g_{j}|^{3} \right] , \\ T_{3} =\,&n^{-1/2} \\&\quad {\mathbb {E}}\left[ \max _{1 \leqslant j \leqslant N} |g_{j}(X_{1})-Pg_{j}|^{3} \cdot 1\left( \max _{1 \leqslant j \leqslant N} |g_{j}(X_{1})-Pg_{j}| > \delta \sqrt{n} {\mathcal {N}}_{n}(\varepsilon )^{-1} \right) \right] . \end{aligned}$$

Observe that \(T_{1} \leqslant n^{-1/2} {\mathbb {E}}[\Vert {\mathbb {G}}_{n}\Vert _{\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}}]\), \(T_{2} \leqslant n^{-1/2} \kappa ^{3}\), and \(T_{3} \leqslant n^{-1/2} P[\breve{G}^{3} 1(\breve{G}>\delta \sqrt{n} {\mathcal {N}}_{n}(\varepsilon )^{-1})]\). Thus choosing

$$\begin{aligned} \delta \geqslant C \max \left\{ \gamma ^{-1/2} n^{-1/4} ({\mathbb {E}}[\Vert {\mathbb {G}}_{n}\Vert _{\breve{{\mathcal {G}}} \cdot \breve{{\mathcal {G}}}}])^{1/2} {\mathcal {N}}_{n}^{1/2}(\varepsilon ), \; \gamma ^{-1/3} n^{-1/6} \kappa {\mathcal {N}}_{n}^{2/3}(\varepsilon ) \right\} , \end{aligned}$$

we have

$$\begin{aligned} {\mathbb {P}}(L_{n}^{\varepsilon } \in B) \leqslant {\mathbb {P}}({\widetilde{Z}}^{\varepsilon } \in B^{16\delta }) + {2 \gamma \over 5} + {\gamma \over 5} \kappa ^{-3} P[\breve{G}^{3} 1(\breve{G}>\delta \sqrt{n} {\mathcal {N}}_{n}(\varepsilon )^{-1})] + {C \log {n} \over n}. \end{aligned}$$

Since \(\delta \geqslant c \gamma ^{-1/3} n^{-1/6} \kappa {\mathcal {N}}_{n}^{2/3}(\varepsilon )\), we have

$$\begin{aligned} P[\breve{G}^{3} 1(\breve{G}>\delta \sqrt{n} {\mathcal {N}}_{n}(\varepsilon )^{-1})] \leqslant P[\breve{G}^{3} 1(\breve{G}/\kappa >c \gamma ^{-1/3} n^{1/3} {\mathcal {N}}_{n}(\varepsilon )^{-1/3})]. \end{aligned}$$

Conclude that with \(\eta _{n} = (\gamma / 5) P[(\breve{G}/\kappa )^{3} 1(\breve{G}/\kappa >c \gamma ^{-1/3} n^{1/3} {\mathcal {N}}_{n}(\varepsilon )^{-1/3})]\),

$$\begin{aligned} {\mathbb {P}}(L_{n}^{\varepsilon } \in B) \leqslant {\mathbb {P}}({\widetilde{Z}}^{\varepsilon } \in B^{16\delta }) + {2\gamma \over 5} + \eta _{n} + {C \log {n} \over n}. \end{aligned}$$

Next, we will bound \(\Vert {\mathbb {G}}_{n}\Vert _{{\mathcal {G}}_{\varepsilon }}\) and \(\Vert W_{P}\Vert _{{\mathcal {G}}_{\varepsilon }}\). By Markov’s inequality, with probability at least \(1-\gamma /5\),

$$\begin{aligned} \Vert {\mathbb {G}}_{n}\Vert _{{\mathcal {G}}_{\varepsilon }} \leqslant 5\gamma ^{-1}{\mathbb {E}}[\Vert {\mathbb {G}}_{n}\Vert _{{\mathcal {G}}_{\varepsilon }}] =: a. \end{aligned}$$

Further, by the Borell–Sudakov–Tsirel’son inequality (see Theorem 2.5.8 in [29]), with probability at least \(1-\gamma /5\), we have

$$\begin{aligned} \Vert W_{P}\Vert _{{\mathcal {G}}_{\varepsilon }} \leqslant {\mathbb {E}}[\Vert W_{P}\Vert _{{\mathcal {G}}_{\varepsilon }}] + 2 \varepsilon \Vert G \Vert _{P,2} \sqrt{2\log (5/\gamma )} =: b. \end{aligned}$$

Therefore, for every \(B \in {\mathcal {B}}({\mathbb {R}})\),

$$\begin{aligned} {\mathbb {P}}(Z_{n} \in B) \leqslant \,&{\mathbb {P}}(L_{n} \in B^{5\gamma ^{-1} {\mathbb {E}}[R_{n}]}) + {\gamma \over 5} \leqslant {\mathbb {P}}(L_{n}^{\varepsilon } \in B^{a+5\gamma ^{-1} {\mathbb {E}}[R_{n}]}) + {2\gamma \over 5} \\ \leqslant \,&{\mathbb {P}}({\widetilde{Z}}^{\varepsilon } \in B^{a+16\delta +5\gamma ^{-1} {\mathbb {E}}[R_{n}]}) + {4\gamma \over 5} + \eta _{n} + {C \log {n} \over n}\\ \leqslant \,&{\mathbb {P}}({\widetilde{Z}} \in B^{a+b+16\delta +5\gamma ^{-1} {\mathbb {E}}[R_{n}]}) + \gamma + \eta _{n} + {C \log {n} \over n}. \end{aligned}$$

The conclusion of the proposition follows from the Strassen–Dudley theorem (see Theorem B.1). \(\square \)
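The Hoeffding decomposition invoked in the proof above can be verified numerically in the simplest case r = 2. In the Python sketch below, P is taken to be the uniform distribution on a small finite set so that the projections \(Ph\) and \(\pi _{2}h\) can be computed exactly; the kernel is an illustrative choice, not one from the paper:

```python
import itertools
import random

random.seed(1)

S0 = [0.0, 1.0, 2.0, 3.0]           # P = uniform distribution on S0
h = lambda x, y: (x - y) ** 2        # an illustrative symmetric kernel, r = 2

theta = sum(h(x, y) for x in S0 for y in S0) / 16       # theta = P^2 h
g = lambda x: sum(h(x, y) for y in S0) / 4              # g = P^{r-1} h
pi2 = lambda x, y: h(x, y) - g(x) - g(y) + theta        # completely degenerate part

xs = [random.choice(S0) for _ in range(30)]
n = len(xs)
pairs = list(itertools.combinations(range(n), 2))
U = sum(h(xs[i], xs[j]) for i, j in pairs) / len(pairs)
lin = (2.0 / n) * sum(g(x) - theta for x in xs)         # Hajek projection term
deg = sum(pi2(xs[i], xs[j]) for i, j in pairs) / len(pairs)

# Hoeffding decomposition: U_n(h) = theta + 2 U_n^{(1)}(pi_1 h) + U_n^{(2)}(pi_2 h)
assert abs(U - (theta + lin + deg)) < 1e-9
```

The identity holds exactly (up to floating-point error) for every realized sample, since the pointwise decomposition \(h(x,y) = \theta + \pi _{1}h(x) + \pi _{1}h(y) + \pi _{2}h(x,y)\) is algebraic.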

Appendix F. Alternative tests for concavity/convexity and monotonicity of regression functions

We will obey the setting of Example 4.2.

1.1 F.1. Alternative tests for concavity/convexity of regression function f

Instead of the original localized simplex statistic (11) proposed in [1], we may consider the following modified version:

$$\begin{aligned} {\widetilde{U}}_{n}(x) = {1 \over |I_{n,m+2}|} \sum _{(i_{1},\dots ,i_{m+2}) \in I_{n,m+2}} {\widetilde{\varphi }}(V_{i_{1}},\dots ,V_{i_{m+2}}) \prod _{k=1}^{m+2} L_{b_n}(x-X_{i_{k}}), \end{aligned}$$

where \({\widetilde{\varphi }} (v_{1},\dots ,v_{m+2}) = 1\{ (x_{1},\dots ,x_{m+2}) \in {\mathcal {D}}\} w(v_{1},\dots ,v_{m+2})\), and reject concavity (resp. convexity) of f when the scaled supremum (resp. infimum) of \({\widetilde{U}}_{n}\) is large (resp. small). These alternative tests work without the symmetry assumption on the conditional distribution of \(\varepsilon \) that is maintained in [1]. Our results below also cover these alternative tests.
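For concreteness, here is a minimal Python sketch of a statistic of the form \({\widetilde{U}}_{n}(x)\) for m = 1 (triples). The choices below — \({\mathcal {D}}\) as the set of distinct ordered design points, w as the sign of the deviation of the middle response from the chord, and a box kernel L — are hypothetical stand-ins for the objects in (11) and Example 4.2, not the paper’s exact definitions:

```python
import itertools

def L(u):
    # box kernel, an assumed choice
    return 1.0 if -1.0 <= u <= 1.0 else 0.0

def phi_tilde(v1, v2, v3):
    # sort the triple by design point; v_k = (x_k, y_k)
    (x1, y1), (x2, y2), (x3, y3) = sorted([v1, v2, v3])
    if not (x1 < x2 < x3):           # D: distinct ordered design points (assumed)
        return 0.0
    chord = y1 + (y3 - y1) * (x2 - x1) / (x3 - x1)
    # w: +1 if the middle point lies above the chord, -1 if below (assumed)
    return 1.0 if y2 > chord else -1.0

def u_tilde(x, data, bn):
    """Localized average of phi_tilde over all triples, kernel-weighted at x."""
    total, count = 0.0, 0
    for trip in itertools.combinations(data, 3):
        weight = 1.0
        for (xi, _) in trip:
            weight *= L((x - xi) / bn)
        total += phi_tilde(*trip) * weight
        count += 1
    return total / count

# noiseless strictly convex data: every middle point lies below the chord,
# so with a bandwidth covering all points the statistic equals -1
data = [(x / 10.0, (x / 10.0) ** 2) for x in range(11)]
assert abs(u_tilde(0.5, data, 10.0) + 1.0) < 1e-12
```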

1.2 F.2. Alternative tests for monotonicity of regression function f

Chetverikov [16] considers testing monotonicity of the regression function f without the assumption that the error term \(\varepsilon \) is independent of X. He studies, e.g., U-statistics given by replacing \(\mathrm {sign}(Y_{j}-Y_{i})\) in (12) by \(Y_{j}-Y_{i}\), and the test statistic defined by taking the maximum of such U-statistics over a discrete set of design points and bandwidths whose cardinality may grow with the sample size (indeed, the cardinality can be much larger than the sample size). His analysis is conditional on the \(X_{i}\)’s, and he cleverly avoids U-process machinery by directly applying the high-dimensional Gaussian and bootstrap approximation theorems developed in [12]. It should be noted that [16] considers more general test statistics and studies multi-step procedures to improve the power of his tests.

Another related test for regression monotonicity is based on the local linear rank statistics [21]. Let \(R_{mk}(i) = \sum _{j=m+1}^{k} 1(Y_{j} \leqslant Y_{i})\) be the local rank of \(Y_{i}\) among \(Y_{m+1},\dots ,Y_{k}\). In [21], Dümbgen considers a test for monotone trend of f (with fixed design points \(X_{1},\dots ,X_{n}\)) via the local linear rank statistics

$$\begin{aligned} T_{mk} = \sum _{i=m+1}^{k} \beta \left( {i-m \over k-m+1} \right) q \left( {R_{mk}(i) \over k-m+1} \right) , \quad 0 \leqslant m < k \leqslant n, \end{aligned}$$

where \(\beta \) and q are functions on (0, 1) such that: 1) \(\beta (1-u)=-\beta (u)\) and \(q(1-u)=-q(u)\) for \(u \in (0,1)\); 2) \(\beta (\cdot )\) and \(q(\cdot )\) are nondecreasing on (0, 1). Then [21] proposes the multiscale test statistic

$$\begin{aligned} T = \max _{0 \leqslant m < k \leqslant n} (s_{k-m} |T_{mk}| - c_{k-m}), \end{aligned}$$

where \(s_{i}\) and \(c_{i}\) are properly chosen nonnegative numbers. For the special case of the Wilcoxon score function \(q(u) = 2u-1\) and \(\beta (u) = q(u)\), one can write

$$\begin{aligned} T_{mk} = {2 \over (k-m+1)^{2}} \sum _{m< i < j \leqslant k} (j-i) \mathrm {sign}(Y_{j}-Y_{i}). \end{aligned}$$

The statistic \(T_{mk}\) is related to our test statistic \({\check{U}}_{n}(x)\) with \(L(u) = 1(u \in [-1,1])\): \(T_{mk}\) and \({\check{U}}_{n}(x)\) are (local) U-statistics with kernels \((j-i) \mathrm {sign}(Y_{j}-Y_{i})\) and \(\mathrm {sign}(X_{i}-X_{j}) \mathrm {sign}(Y_{j}-Y_{i})\), respectively. Thus, for a given sequence of bandwidths \(b_{n}\), our monotonicity test based on the U-process \({\check{U}}_{n}(x)\) can be viewed as a single-scale test \(T_{mk}\) with \((k-m)/n = 2 b_{n}\) in Dümbgen’s sense. In particular, both \(T_{0n}\) and \({\check{U}}_{n}(x)\) with \(b_{n} = 1\) quantify monotonicity on the global scale. In addition, the “uniform-in-bandwidth” type results for our U-process approach in Sect. 4.1 can be viewed as an analog of the multiscale statistic T built from \(T_{mk}\) with the Wilcoxon score function. Nevertheless, since [21] considers fixed design points, \(T_{mk}\) is a local U-statistic in the \(Y_{i}\)’s, while \({\check{U}}_{n}(x)\) is a local U-statistic in the \((X_{i}, Y_{i})\)’s. Our analysis (which requires X to have a Lebesgue density) is not directly applicable to the local linear rank statistics of [21].
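The algebraic identity between the rank form of \(T_{mk}\) and its U-statistic form under the Wilcoxon score can be checked numerically. The Python sketch below compares the two forms on simulated (tie-free) data:

```python
import random

random.seed(2)

def sign(t):
    return (t > 0) - (t < 0)

def T_rank(Y, m, k):
    """Local linear rank statistic with Wilcoxon scores beta(u) = q(u) = 2u - 1."""
    d = k - m + 1
    q = lambda u: 2.0 * u - 1.0
    out = 0.0
    for i in range(m + 1, k + 1):
        # local rank of Y_i among Y_{m+1}, ..., Y_k (including j = i)
        R = sum(1 for j in range(m + 1, k + 1) if Y[j] <= Y[i])
        out += q((i - m) / d) * q(R / d)
    return out

def T_ustat(Y, m, k):
    """Equivalent local U-statistic form of T_mk for the Wilcoxon score."""
    d = k - m + 1
    s = sum((j - i) * sign(Y[j] - Y[i])
            for i in range(m + 1, k + 1) for j in range(i + 1, k + 1))
    return 2.0 * s / d ** 2

Y = {i: random.random() for i in range(1, 21)}   # continuous responses, no ties
for (m, k) in [(0, 5), (3, 12), (0, 20)]:
    assert abs(T_rank(Y, m, k) - T_ustat(Y, m, k)) < 1e-9
```

The two forms agree exactly in the absence of ties, which is the case relevant to continuous response distributions.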

Cite this article

Chen, X., Kato, K. Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications. Probab. Theory Relat. Fields 176, 1097–1163 (2020). https://doi.org/10.1007/s00440-019-00936-y


Keywords

  • Gaussian approximation
  • Jackknife multiplier bootstrap
  • Coupling
  • U-process
  • Local maximal inequality

Mathematics Subject Classification

  • 60F17
  • 62E17
  • 62F40
  • 62G10