Measure estimation on manifolds: an optimal transport approach

Abstract

Assume that we observe i.i.d. points lying close to some unknown d-dimensional \({\mathcal {C}}^k\) submanifold M in a possibly high-dimensional space. We study the problem of reconstructing the probability distribution generating the sample. After remarking that this problem is degenerate for a large class of standard losses (\(L_p\), Hellinger, total variation, etc.), we focus on the Wasserstein loss, for which we build an estimator, based on kernel density estimation, whose rate of convergence depends on d and the regularity \(s\le k-1\) of the underlying density, but not on the ambient dimension. In particular, we show that the estimator is minimax and matches previous rates in the literature in the case where the manifold M is a d-dimensional cube. The related problem of the estimation of the volume measure of M for the Wasserstein loss is also considered, for which a minimax estimator is exhibited.
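The abstract refers throughout to the Wasserstein loss. As a concrete illustration (not part of the article), in dimension one the distance \(W_p\) between two equal-size empirical measures reduces to matching order statistics; a minimal sketch, with a function name of our choosing:

```python
import numpy as np

def wasserstein_p(x, y, p=1):
    # In dimension 1, the optimal transport plan between two equal-size
    # empirical measures matches sorted samples (quantile coupling), so
    # W_p reduces to a mean of p-th powers of sorted differences.
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))
```

For instance, `wasserstein_p([0.0], [1.0])` returns 1.0, the cost of moving a unit mass by distance 1.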

Fig. 1

References

  1. Liu, H., Lafferty, J., Wasserman, L.: Sparse nonparametric density estimation in high dimensions using the rodeo. In: Meila, M., Shen, X. (eds.) Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, vol. 2, pp. 283–290. PMLR (2007)

  2. Liu, J., Zhang, R., Zhao, W., Lv, Y.: A robust and efficient estimation method for single index models. J. Multivar. Anal. 122, 226–238 (2013)

  3. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)

  4. Genovese, C.R., Perone-Pacifico, M., Verdinelli, I., Wasserman, L.: Minimax manifold estimation. J. Mach. Learn. Res. 13, 1263–1291 (2012)

  5. Aamari, E., Levrard, C.: Stability and minimax optimality of tangential Delaunay complexes for manifold reconstruction. Discrete Comput. Geom. 59(4), 923–971 (2018)

  6. Aamari, E., Levrard, C.: Nonasymptotic rates for manifold, tangent space and curvature estimation. Ann. Stat. 47(1), 177–204 (2019)

  7. Divol, V.: Minimax adaptive estimation in manifold inference. Electron. J. Stat. 15(2), 5888–5932 (2021)

  8. Niyogi, P., Smale, S., Weinberger, S.: Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom. 39(1–3), 419–441 (2008)

  9. Balakrishnan, S., Rinaldo, A., Sheehy, D., Singh, A., Wasserman, L.: Minimax rates for homology inference. In: Lawrence, N.D., Girolami, M. (eds.) Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, vol. 22, pp. 64–72. PMLR (2012). https://proceedings.mlr.press/v22/balakrishnan12a.html

  10. Hein, M., Audibert, J.-Y.: Intrinsic dimensionality estimation of submanifolds in \(\mathbb{R}^d\). In: Proceedings of the 22nd International Conference on Machine Learning, ACM, pp. 289–296 (2005)

  11. Little, A.V., Jung, Y.-M., Maggioni, M.: Multiscale estimation of intrinsic dimensionality of data sets. In: 2009 AAAI Fall Symposium Series (2009)

  12. Kim, J., Rinaldo, A., Wasserman, L.: Minimax rates for estimating the dimension of a manifold. J. Comput. Geom. 10(1), 42–95 (2019)

  13. Aamari, E., Kim, J., Chazal, F., Michel, B., Rinaldo, A., Wasserman, L.: Estimating the reach of a manifold. Electron. J. Stat. 13(1), 1359–1399 (2019)

  14. Berenfeld, C., Harvey, J., Hoffmann, M., Shankar, K.: Estimating the reach of a manifold via its convexity defect function. Discrete Comput. Geom. 67(2), 403–438 (2022). https://doi.org/10.1007/s00454-021-00290-8

  15. Hendriks, H.: Nonparametric estimation of a probability density on a Riemannian manifold using Fourier expansions. Ann. Stat. 18(2), 832–849 (1990). https://doi.org/10.1214/aos/1176347628

  16. Hendriks, H., Janssen, J., Ruymgaart, F.: Strong uniform convergence of density estimators on compact Euclidean manifolds. Stat. Probab. Lett. 16(4), 305–311 (1993)

  17. Pelletier, B.: Kernel density estimation on Riemannian manifolds. Stat. Probab. Lett. 73(3), 297–304 (2005)

  18. Cleanthous, G., Georgiadis, A.G., Kerkyacharian, G., Petrushev, P., Picard, D.: Kernel and wavelet density estimators on manifolds and more general metric spaces. Bernoulli 26(3), 1832–1862 (2020)

  19. Berry, T., Sauer, T.: Density estimation on manifolds with boundary. Comput. Stat. Data Anal. 107, 1–17 (2017)

  20. Wu, H.-T., Wu, N.: Strong uniform consistency with rates for kernel density estimators with general kernels on manifolds. Inf. Infer. J. IMA (2021). https://doi.org/10.1093/imaiai/iaab014

  21. Berenfeld, C., Hoffmann, M.: Density estimation on an unknown submanifold. Electron. J. Stat. 15(1), 2179–2223 (2021)

  22. Peyré, G., Cuturi, M.: Computational optimal transport: with applications to data science. Found. Trends Mach. Learn. 11(5–6), 355–607 (2019)

  23. Dudley, R.M.: The speed of mean Glivenko-Cantelli convergence. Ann. Math. Stat. 40(1), 40–50 (1969)

  24. Dereich, S., Scheutzow, M., Schottstedt, R.: Constructive quantization: approximation by empirical measures. Ann. Inst. Henri Poincaré Probab. Stat. 49(4), 1183–1203 (2013)

  25. Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162(3–4), 707–738 (2015)

  26. Singh, S., Póczos, B.: Minimax distribution estimation in Wasserstein distance. arXiv preprint arXiv:1802.08855 (2018)

  27. Weed, J., Bach, F.: Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli 25(4A), 2620–2648 (2019)

  28. Lei, J.: Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces. Bernoulli 26(1), 767–798 (2020)

  29. Weed, J., Berthet, Q.: Estimation of smooth densities in Wasserstein distance. In: Conference on Learning Theory, pp. 3118–3119. PMLR (2019)

  30. Trillos, N.G., Gerlach, M., Hein, M., Slepčev, D.: Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator. Found. Comput. Math. 20(4), 827–887 (2020)

  31. Federer, H.: Curvature measures. Trans. Am. Math. Soc. 93(3), 418–491 (1959)

  32. Delfour, M., Zolésio, J.-P.: Shapes and Geometries: Metrics, Analysis, Differential Calculus, and Optimization. SIAM (2011). ISBN 978-0-89871-936-9

  33. Poly, J.-B., Raby, G.: Fonction distance et singularités. Bull. Sci. Math. 108, 187–195 (1984)

  34. Brezis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext. Springer, New York (2010)

  35. Triebel, H.: Spaces on Riemannian manifolds and Lie groups. In: Theory of Function Spaces II, pp. 281–346. Springer Basel (1992). https://doi.org/10.1007/978-3-0346-0419-2_7

  36. Lunardi, A.: Interpolation Theory, pp. 1–44. Scuola Normale Superiore, Pisa (2018). https://doi.org/10.1007/978-88-7642-638-4_1

  37. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, New York (2008)

  38. Peyre, R.: Comparison between \(W_2\) distance and \({\dot{H}}^{- 1}\) norm, and localization of Wasserstein distance. ESAIM. Control Optim. Calc. Var. 24(4), 1489–1501 (2018)

  39. Santambrogio, F.: Optimal Transport for Applied Mathematicians. Birkhäuser, New York (2015)

  40. Divol, V.: A short proof on the rate of convergence of the empirical measure for the Wasserstein distance. arXiv preprint arXiv:2101.08126 (2021)

  41. Talagrand, M.: Upper and Lower Bounds for Stochastic Processes. Modern Surveys in Mathematics, vol. 60. Springer, New York (2014)

  42. Tsybakov, A.: Introduction to Nonparametric Estimation. Springer Series in Statistics, New York (2008)

  43. Aubin, T.: Nonlinear Analysis on Manifolds. Monge-Ampère Equations. Grundlehren der mathematischen Wissenschaften. Springer, New York (1982). ISBN 9780387907048

  44. Rosenthal, H.P.: On the subspaces of \(L_p\) (\(p> 2\)) spanned by sequences of independent random variables. Isr. J. Math. 8(3), 273–303 (1970)

  45. Aamari, E.: Vitesses de convergence en inférence géométrique. PhD thesis, Paris Saclay (2017)

  46. Sato, H., Kasai, H., Mishra, B.: Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport. SIAM J. Optim. 29(2), 1444–1472 (2019)

  47. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)

  48. Diaconis, P., Holmes, S., Shahshahani, M.: Sampling from a manifold. In: Advances in Modern Statistical Theory and Applications: A Festschrift in Honor of Morris L. Eaton, pp. 102–125. Institute of Mathematical Statistics (2013)

  49. Zappa, E., Holmes-Cerfon, M., Goodman, J.: Monte Carlo on manifolds: sampling densities and integrating functions. Commun. Pure Appl. Math. 71(12), 2609–2647 (2018)

  50. Giesen, J., Wagner, U.: Shape dimension and intrinsic metric from samples of manifolds with high co-dimension. In: Proceedings of the Nineteenth Annual Symposium on Computational Geometry, pp. 329–337 (2003)

  51. Brasco, L., Carlier, G., Santambrogio, F.: Congested traffic dynamics, weak flows and very degenerate elliptic equations. J. Math. Pures Appl. 93(6), 652–671 (2010)

  52. Besson, G., Courtois, G., Hersonsky, S.: Poincaré inequality on complete Riemannian manifolds with Ricci curvature bounded below. Math. Res. Lett. 25(6), 1741–1769 (2018)

  53. do Carmo, M.P.: Riemannian Geometry. Mathematics: Theory & Applications. Birkhäuser, Boston (1992)

  54. Benamou, J.-D., Brenier, Y.: A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numer. Math. 84, 375–393 (2000)

  55. Brenier, Y.: Extended Monge-Kantorovich theory. In: Optimal Transportation and Applications, pp. 91–121. Springer (2003)

  56. Sogge, C.D.: Fourier Integrals in Classical Analysis, 2nd edn. Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge (2017)

  57. Hiroshima, T.: Construction of the Green function on Riemannian manifold using harmonic coordinates. J. Math. Kyoto Univ. 36(1), 1–30 (1996)

  58. Burkhardt, H.: Sur les fonctions de Green relatives à un domaine d’une dimension. Bull. Soc. Math. France 22, 71–75 (1894)

  59. Giné, E., Nickl, R.: Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press (2015). https://doi.org/10.1017/CBO9781107337862

  60. Yu, B.: Assouad, Fano, and Le Cam. In: Pollard, D., Torgersen, E., Yang, G.L. (eds.) Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, pp. 423–435. Springer, New York (1997). https://doi.org/10.1007/978-1-4612-1880-7_29

  61. Udriste, C.: Minimization of functions on Riemannian manifolds. In: Convex Functions and Optimization Methods on Riemannian Manifolds, pp. 226–286. Springer, Netherlands (2013). https://doi.org/10.1007/978-94-015-8390-9_7

Acknowledgements

I am grateful to Eddie Aamari, Clément Berenfeld, Fréderic Chazal, Clément Levrard and Pascal Massart for helpful discussions and valuable comments on different mathematical aspects of this work.

Corresponding author

Correspondence to Vincent Divol.

Appendices

Geometric properties of \({\mathcal {C}}^k\) manifolds with positive reach and their estimators

Let \(M \in {\mathcal {M}}_{d,\tau _{\min },L}^k\) for some \(k\ge 2\) and \(\tau _{\min },L>0\). Recall that the angle between two d-dimensional subspaces \(T_1\) and \(T_2\) is given by \(\angle (T_1,T_2) = \left\| \pi _{T_1}-\pi _{T_2} \right\| _{\mathrm {op}}\), where \(\pi _{T_1}\) (resp. \(\pi _{T_2}\)) is the orthogonal projection on \(T_1\) (resp. \(T_2\)) and \(\left\| \cdot \right\| _{\mathrm {op}}\) denotes the operator norm.
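The angle between two subspaces can be evaluated numerically from the corresponding projection matrices. A minimal sketch of the projection-based convention recalled above (illustrative only; the basis matrices are assumed to have orthonormal columns):

```python
import numpy as np

def subspace_angle(T1, T2):
    # T1, T2: matrices whose orthonormal columns span the two subspaces.
    # The angle is the operator norm (largest singular value) of the
    # difference of the orthogonal projections pi_{T1} and pi_{T2}.
    P1 = T1 @ T1.T
    P2 = T2 @ T2.T
    return float(np.linalg.norm(P1 - P2, ord=2))
```

In the plane, the two coordinate axes are at angle 1 in this convention, and a subspace is at angle 0 from itself.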

Lemma 20

Let \(x,y\in M\). The following properties hold:

  1. (i)

    One has \(\vert \pi _y^\bot (x-y)\vert \le \frac{\vert x-y\vert ^2}{2\tau _{\min }}\) and \(\angle (T_xM,T_yM)\le 2\frac{\vert y-x\vert }{\tau _{\min }}\).

  2. (ii)

    If \(\pi _M(z) =x\) for some \(z\in M^{\tau _{\min }}\), then \(z-x \in T_x M^\bot \).

  3. (iii)

    If \(h\le \tau _{\min }/4\), then \(c_dh^d\le \mathrm {vol}_M({\mathcal {B}}_M(x,h))\le C_dh^d\).

  4. (iv)

    If \(h\le r_0\), then \({\mathcal {B}}_M(x,h)\subset \Psi _x({\mathcal {B}}_{T_x M}(0,h))\subset {\mathcal {B}}_M(x,8h/7)\). Also, if \(u\in {\mathcal {B}}_{T_x M}(0,r_0)\), then \(\vert u\vert \le \vert \Psi _x(u)-x\vert \le 8\vert u\vert /7\).

  5. (v)

    There exists a map \(N_x:{\mathcal {B}}_{T_x M}(0,r_0)\rightarrow T_xM^\bot \) satisfying \(d N_x(0)= 0\), and such that, for \(u\in {\mathcal {B}}_{T_x M}(0,r_0)\), we have \(\Psi _x(u) = x + u + N_x(u)\) with \(\vert N_x(u)\vert \le L\vert u\vert ^2\).

  6. (vi)

    There exist tensors \(B_x^1,\dots ,B_x^{k-1}\) of operator norm controlled by a constant depending on L, d, k and \(\tau _{\min }\), such that, if \(u\in T_xM\) satisfies \(\vert u\vert \le C_{k,d,L}\), then \(J\Psi _x(u)= 1+ \sum _{i=2}^{k-1} B_{x}^i[u^{\otimes i}] + R_x(u)\), with \(\vert R_x(u)\vert \le C'_{k,d,L}\vert u\vert ^k\).

Proof

See Theorem 4.18 in [31] and Lemma 6 in [50] for (i), Theorem 4.8 in [31] for (ii), and Proposition 8.7 in [5] for (iii). See Lemma A.2 in [6] for the second inclusion of balls in (iv), which also implies the second inequality in (iv). The first inclusion as well as the first inequality in (iv) follow from the fact that \(\Psi _{x}\) is the local inverse of \(\tilde{\pi }_x\), which is 1-Lipschitz.

By a Taylor expansion of \(\Psi _x\) at \(u=0\), we have \(\Psi _x(u)=x+u + N_x(u)\), with \(N_x(u) = \int _0^1 d^2\Psi _x(tu)[u^{\otimes 2}]\mathrm {d}t\). Hence, \(\vert N_x(u)\vert \le L\vert u\vert ^2\). Furthermore, as \(\tilde{\pi }_x\circ \Psi _x(u)=u\), we have \(\pi _x(N_x(u))=0\), i.e. \(N_x\) takes its values in \(T_x M^\bot \). This proves (v).

Eventually, we prove (vi). Let us write \(d\Psi _x(u) = \mathrm {id}_{T_x M} + d N_x(u)\) and \(d\Psi _x(u)^\top d\Psi _x(u) = \mathrm {id}_{T_x M} + (d N_x(u))^\top dN_x(u)\). We obtain

$$\begin{aligned} J\Psi _x(u) = \sqrt{\det (d\Psi _x(u)^\top d\Psi _x(u))}= \sqrt{\det (\mathrm {id}_{T_x M} + (d N_x(u))^\top dN_x(u))}.\end{aligned}$$

One has \(dN_x(u) = dN_x(0) + \sum _{j=2}^{k-1} \frac{d^j N_x(0)}{(j-1)!}[u^{\otimes (j-1)}] + R_x(u)\), with \(\vert R_x(u)\vert \le C_{k,L} \vert u\vert ^{k-1}\) and \(dN_x(0)=0\). Hence, \( (d N_x(u))^\top dN_x(u)\) is written as \(\sum _{j= 2}^{k-1} B_j[u^{\otimes j}] + R'_x(u)\) with \(\vert R'_x(u)\vert \le C'_{k,l} \vert u\vert ^k\), for some j-tensors \(B_j\) whose operator norms are bounded in terms of L. The operator norm of this operator is smaller than, say, 1/2 for \(\vert u\vert \) sufficiently small, and we conclude the proof by writing a Taylor expansion at 0 of the function \(F \mapsto \sqrt{\det (\mathrm {id}+ F)}\). \(\square \)

We now prove Lemma 6, on the construction of smooth partitions of unity based on some set S which is sufficiently sparse and dense over a tubular neighborhood of M.

Proof of Lemma 6

Consider the functions \(\theta \) and \((\chi _x)_{x\in S}\) as in the statement of the lemma, and, for \(y\in M^\delta \), let \(Z(y)=\sum _{x'\in S}\theta \left( \frac{y-x'}{8\delta } \right) \). As \(d_H(M^{\delta }\vert S) \le 4\delta \), we have \(Z(y)\ge 1\), so that the quantity \(\chi _x(y)\) is well defined. The function \(\chi _x\) is smooth and \(\sum _{x\in S} \chi _x \equiv 1\) on \(M^{\delta }\). The differential \(d^l \chi _x(y)\) is a sum of terms of the form \(d^{l-j}\theta \left( \frac{y-x}{8\delta } \right) d^j (Z^{-1})(y)\), and \(d^j (Z^{-1})(y)\) is in turn a sum of terms of the form \(Z^{j'-j-2}(y) d^{j'} Z(y)\) for \(1\le j'\le j\). Also, \(\left\| d^j\theta \left( \frac{y-x'}{8\delta } \right) \right\| _{\mathrm {op}}\le C_{j}\delta ^{-j}\) and \(\left\| d^j Z(y) \right\| _{\mathrm {op}}\le C_{j}\delta ^{-j}\sum _{x\in S} {\mathbf {1}}\{\vert x-y\vert \le 8\delta \}\). Hence, as \(Z\ge 1\), we have for any \(l\ge 0\)

$$\begin{aligned} \left\| d^l \chi _x(y) \right\| _{\mathrm {op}} \le C'_{l} \delta ^{-l}\sum _{x\in S} {\mathbf {1}}\{\vert x-y\vert \le 8\delta \}. \end{aligned}$$

It remains to bound this sum. If \(x\in {\mathcal {B}}(y,8\delta )\), then \(\pi _M(x)\in {\mathcal {B}}(\pi _M(y),10\delta )\). Also, for \(x\ne x'\in S\), we have \(\vert \pi _M(x)-\pi _M(x')\vert \ge \vert x-x'\vert -2\delta \ge 2\delta \). In particular, the balls \({\mathcal {B}}_M(\pi _M(x),\delta )\) for \(x\in S\) are pairwise disjoint, and are all included in \({\mathcal {B}}_M(\pi _M(y),11\delta )\). Therefore, if \(11\delta \le \tau (M)/4\), using Lemma 20(iii) twice, we obtain that \(\mathrm {vol}_M({\mathcal {B}}_M(\pi _M(x),\delta ))\ge c_d \delta ^d\), and that

$$\begin{aligned} \sum _{x\in S} {\mathbf {1}}\{\vert x-y\vert \le 8\delta \}&\le \sum _{x\in S} {\mathbf {1}}\{\vert x-y\vert \le 8\delta \}\frac{\mathrm {vol}_M({\mathcal {B}}_M(\pi _M(x),\delta ))}{c_d \delta ^d}\\&\le \frac{\mathrm {vol}_M({\mathcal {B}}_M(\pi _M(y),11\delta ))}{c_d\delta ^d} \le c'_d. \end{aligned}$$

This concludes the proof. \(\square \)
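A minimal numerical sketch of this partition-of-unity construction (the bump function and the normalization below are illustrative stand-ins for \(\theta \) and \(Z\), not the lemma's exact objects):

```python
import numpy as np

def bump(u):
    # Smooth compactly supported bump: equals 1 at 0 and vanishes for
    # |u| >= 1, playing the role of the function theta in the lemma.
    r2 = np.sum(u * u, axis=-1)
    out = np.zeros_like(r2)
    inside = r2 < 1.0
    out[inside] = np.exp(1.0 - 1.0 / (1.0 - r2[inside]))
    return out

def partition_of_unity(S, y, delta):
    # Weights chi_x(y) = theta((y - x) / (8 delta)) / Z(y) over centers
    # x in S; well defined as soon as some center lies within 8 delta of y.
    vals = bump((y[None, :] - S) / (8.0 * delta))
    return vals / vals.sum()
```

By construction the weights are nonnegative, smooth in y, and sum to one at every point covered by the centers.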

We end this section by detailing the properties of the local polynomial estimators \({\hat{\Psi }}_i\) and \({{\hat{T}}}_i\) defined in [6]. In particular, the next lemma implies Proposition 5. Recall that \(X_i=Y_i+Z_i\) with \(Y_i\in M\) and \(\vert Z_i\vert \le \gamma \). Aamari and Levrard introduce tensors \(V_{j,i}^* \) which are defined as \(d^j\Psi _{X_i}(0)/j!\), where \(d^j\Psi _{X_i}(0)\) is the jth differential of \(\Psi _{X_i}\) at 0 (see the proof of Lemma 2 in [6] for details). In particular, we have \(V_{1,i}^* =\pi _{Y_i}\). Furthermore, as \(\tilde{\pi }_{Y_i}\circ \Psi _{Y_i}=\mathrm {id}\), we have \(\pi _{Y_i}\circ V_{j,i}^* = 0\) for \(j\ge 2\).

Lemma 21

With probability larger than \(1-cn^{-k/d}\), for any \(1\le i \le n\),

  1. (i)

    We have \(\angle (T_{Y_i}M,{{\hat{T}}}_i) \lesssim \varepsilon ^{m-1} + \gamma \varepsilon ^{-1}\).

  2. (ii)

    For \(v\in {{\hat{T}}}_i\), we have \({\hat{\Psi }}_i(v) = X_i + v + {{\hat{N}}}_i(v)\), where \({{\hat{N}}}_i : {{\hat{T}}}_i\rightarrow {{\hat{T}}}_i^\bot \) is defined by \({{\hat{N}}}_i(v) = \sum _{j=2}^{m-1} {{\hat{V}}}_{j,i}[v^{\otimes j}]\).

  3. (iii)

    For any \(2\le j <m\), \(\left\| {{\hat{V}}}_{j,i}\circ {\hat{\pi }}_i-V_{j,i}^* \circ \pi _{Y_i} \right\| _{\mathrm {op}} \lesssim \varepsilon ^{m-j} + \gamma \varepsilon ^{-j}.\)

  4. (iv)

    For \(v\in {\mathcal {B}}_{{{\hat{T}}}_i}(0,3\varepsilon )\), we have

    $$\begin{aligned}&\vert \hat{\Psi }_i(v)-\Psi _{Y_i}(\pi _{Y_i}(v))\vert \lesssim \varepsilon ^m + \gamma , \end{aligned}$$
    (A.1)
    $$\begin{aligned}&\vert {{\hat{N}}}_i(v)-N_{Y_i}(\pi _{Y_i}(v))\vert \lesssim \varepsilon ^m + \gamma , \end{aligned}$$
    (A.2)
    $$\begin{aligned}&\left\| d\hat{\Psi }_i(v) - d(\Psi _{Y_i}\circ \pi _{Y_i})(v) \right\| _{\mathrm {op}} \lesssim \varepsilon ^{m-1} + \gamma \varepsilon ^{-1} \end{aligned}$$
    (A.3)
    $$\begin{aligned}&\left\| d{\hat{N}}_i(v) - d(N_{Y_i}\circ \pi _{Y_i})(v) \right\| _{\mathrm {op}} \lesssim \varepsilon ^{m-1} + \gamma \varepsilon ^{-1}. \end{aligned}$$
    (A.4)

Proof

Lemma 21(i) is stated in Theorem 2 in [6]. Remark that for \(x\in {\mathcal {B}}(X_i,\varepsilon )\), with \(\tilde{x}=x-X_i\),

$$\begin{aligned}&\left| \tilde{x}-\pi (\tilde{x}) - \sum _{j=2}^{m-1} V_j[\pi (\tilde{x})^{\otimes j}]\right| ^2 \\&\quad = \left| \tilde{x}-\pi (\tilde{x}) - \sum _{j=2}^{m-1} \pi ^\bot \circ V_j[\pi (\tilde{x})^{\otimes j}]\right| ^2 + \left| \sum _{j=2}^{m-1} \pi \circ V_j[\pi (\tilde{x})^{\otimes j}]\right| ^2 \end{aligned}$$

so that we may always assume that the tensors \({{\hat{V}}}_{j,i}\) minimizing the criterion (3.8) satisfy \({\hat{\pi }}_i \circ {{\hat{V}}}_{j,i}= 0\) for \(j\ge 2\). This proves Lemma 21(ii).

We prove Lemma 21(iii) by induction on \(2\le j < m\). The result for \(j=2\) is stated in [6, Theorem 2]. It is shown in [6] (see Equation (3)) that there exist tensors \(V_{j,i}'\) for \(1\le j <m\) satisfying with probability larger than \(1-cn^{-k/d}\),

$$\begin{aligned} \left\| V_{j,i}'\circ \pi _{Y_i} \right\| _{\mathrm {op}} \lesssim \varepsilon ^{m-j}+ \gamma \varepsilon ^{-j}. \end{aligned}$$
(A.5)

The tensors \(V_{j,i}'\) are defined by the relation, for \(y\in M\) close enough to \(Y_i\),

$$\begin{aligned} y- Y_i-{\hat{\pi }}_i(y- Y_i) - \sum _{j=2}^{m-1} {{\hat{V}}}_{j,i}[{\hat{\pi _i}}(y- Y_i)^{\otimes j}] = \sum _{j=1}^{m-1} V'_{j,i}[\pi _{Y_i}(y- Y_i)^{\otimes j}] + R(y- Y_i),\nonumber \\ \end{aligned}$$
(A.6)

with \(\vert R(y-Y_i)\vert \lesssim \varepsilon ^m\), see the proof of Lemma 3 in [6]. We also may write

$$\begin{aligned} y- Y_i = \pi _{Y_i}(y- Y_i) + \sum _{j=2}^{m-1}V_{j,i}^* [\pi _{Y_i}(y- Y_i)^{\otimes j}] + R'(y- Y_i), \end{aligned}$$
(A.7)

with \(\vert R'(y-Y_i)\vert \lesssim \varepsilon ^m\). By plugging (A.7) in the left hand side of (A.6) and by noting that \(\pi _{Y_i}\circ V_{j,i}^* =0\) for \(j\ge 2\), we see that \(V'_{j,i}\circ \pi _{Y_i}\) is written as the sum of \((\pi _{Y_i}-{\hat{\pi }}_i)\circ V_{j,i}^* + (V_{j,i}^* \circ \pi _{Y_i} - {{\hat{V}}}_{j,i}\circ {\hat{\pi }}_i)\) and of a sum of terms proportional to

$$\begin{aligned} {{\hat{V}}}_{j',i}[{\hat{\pi }}_i\circ V_{a_1,i}^* \circ \pi _{Y_i},\dots ,{\hat{\pi }}_i\circ V_{a_{j'},i}^* \circ \pi _{Y_i}], \end{aligned}$$
(A.8)

where \(2\le j' <j\) and \(a_1+\cdots +a_{j'} = j\), \(1\le a_1,\dots ,a_{j'} <j\). In particular, at least one of the indices \(a_u\) is larger than 1. Assume without loss of generality that \(a_1,\dots ,a_l >1\) and \(a_{l+1},\dots ,a_{j'}=1\), so that \({\hat{\pi }}_i \circ {{\hat{V}}}_{a_{u},i}=0\) for \(1\le u \le l\). Then,

$$\begin{aligned}&\left\| {{\hat{V}}}_{j',i}[{\hat{\pi }}_i\circ V_{a_1,i}^* \circ \pi _{Y_i},\dots ,{\hat{\pi }}_i\circ V_{a_l,i}^* \circ \pi _{Y_i},\dots ,{\hat{\pi }}_i\circ V_{a_{j'},i}^* \circ \pi _{Y_i}] \right\| _{\mathrm {op}} \\&\quad =\Big \Vert {{\hat{V}}}_{j',i}[{\hat{\pi }}_i\circ (V_{a_1,i}^* -{{\hat{V}}}_{a_1,i})\circ \pi _{Y_i},\dots ,{\hat{\pi }}_i\circ (V_{a_l,i}^* -{{\hat{V}}}_{a_l,i})\circ \pi _{Y_i},\\&\qquad {\hat{\pi }}_i\circ V_{a_{l+1},i}^* \circ \pi _{Y_i},\dots ,{\hat{\pi }}_i\circ V_{a_{j'},i}^* \circ \pi _{Y_i}]\Big \Vert _{\mathrm {op}} \\&\quad \lesssim \ell \prod _{u=1}^l\left\| V_{a_u,i}^* \circ \pi _{Y_i}-{{\hat{V}}}_{a_u,i}\circ \pi _{Y_i} \right\| _{\mathrm {op}} \\&\quad \lesssim \ell \prod _{u=1}^l\left( \left\| V_{a_u,i}^* \circ \pi _{Y_i}-{{\hat{V}}}_{a_u,i}\circ {\hat{\pi }}_{i} \right\| _{\mathrm {op}} + \ell \left\| \pi _{Y_i} - {\hat{\pi }}_i \right\| _{\mathrm {op}} \right) \\&\quad \lesssim \varepsilon ^{-1}\prod _{u=1}^l\left( \varepsilon ^{m-a_u}+ \gamma \varepsilon ^{-a_u}+\varepsilon ^{m-2}+ \gamma \varepsilon ^{-2} \right) \\&\quad \lesssim \varepsilon ^{-1}(\varepsilon ^{lm-(j-l)}+\gamma ^l\varepsilon ^{-(j-l)})\lesssim \varepsilon ^{m-j}+\gamma \varepsilon ^{-j}, \end{aligned}$$

where at the last line we use the induction hypothesis as well as Lemma 21(i), the fact that \(\sum _{u=1}^la_u=j-l\) and that \(\ell \lesssim \varepsilon ^{-1}\). As \(\left\| (\pi _{Y_i}-{\hat{\pi }}_i)\circ V_{j,i}^* \right\| _{\mathrm {op}} \lesssim \varepsilon ^{m-1}+ \gamma \varepsilon ^{-1}\), we obtain that

$$\begin{aligned}\left\| (V_{j,i}^* \circ \pi _{Y_i} - {{\hat{V}}}_{j,i}\circ {\hat{\pi }}_i) - V'_{j,i}\circ \pi _{Y_i} \right\| _{\mathrm {op}}\lesssim \varepsilon ^{m-j}+\gamma \varepsilon ^{-j}.\end{aligned}$$

Hence, using (A.5),

$$\begin{aligned} \left\| V_{j,i}^* \circ \pi _{Y_i} - {{\hat{V}}}_{j,i}\circ {\hat{\pi }}_i \right\| _{\mathrm {op}}&\le \left\| (V_{j,i}^* \circ \pi _{Y_i} - {{\hat{V}}}_{j,i}\circ {\hat{\pi }}_i) - V'_{j,i}\circ \pi _{Y_i} \right\| _{\mathrm {op}}+ \left\| V'_{j,i}\circ \pi _{Y_i} \right\| _{\mathrm {op}}\\&\lesssim \varepsilon ^{m-j}+ \gamma \varepsilon ^{-j}. \end{aligned}$$

We may now prove (A.1). Indeed, for \(v\in {\mathcal {B}}_{{{\hat{T}}}_i}(0,3\varepsilon )\), \({\hat{\Psi }}_i(v) = X_i + v + \sum _{j=2}^{m-1} {{\hat{V}}}_{j,i}[v^{\otimes j}]\), whereas by a Taylor expansion, \(\Psi _{Y_i}\circ \pi _{Y_i}(v)= Y_i + \pi _{Y_i}(v) + \sum _{j= 2}^{m-1} V_{j,i}^* [\pi _{Y_i}(v)^{\otimes j}] + R(v)\), with \(\vert R(v)\vert \lesssim \varepsilon ^m\). By Lemma 21(iii), the difference between the two quantities is bounded with high probability by a sum of terms of order \((\varepsilon ^{m-j}+ \gamma \varepsilon ^{-j})\vert v\vert ^j\lesssim \varepsilon ^m+ \gamma \). Inequality (A.2) is directly implied by (A.1) and Lemma 21(i). Inequality (A.3) is proven in the same way as (A.1), by noting that, for \(h\in {{\hat{T}}}_i\),

$$\begin{aligned} \left\{ \begin{array}{l} d(\Psi _{Y_i}\circ \pi _{Y_i})(v)[h] = \pi _{Y_i}(h) + \sum _{j=2}^{m-1} jV_{j,i}^* [\pi _{Y_i}(v)^{\otimes (j-1)},\pi _{Y_i}(h)] + R'(v)h\\ d{\hat{\Psi }}_i(v)[h] = h + \sum _{j=2}^{m-1} j{{\hat{V}}}_{j,i}[v^{\otimes (j-1)},h], \end{array}\right. \end{aligned}$$

with \(\left\| R'(v) \right\| _{\mathrm {op}}\lesssim \varepsilon ^{m-1}\). Equation (A.4) is shown in a similar way. \(\square \)

Properties of negative Sobolev norms

Proof of Proposition 1

The second inequality in (i) is trivial. The assertion (ii) is stated in [51, Theorem 2.1] for an open set \(\Omega \subset {\mathbb {R}}^d\), and their proof can be straightforwardly adapted to the manifold setting. It remains to prove the first inequality in (i). Note that for any g with \( \Vert \nabla g\Vert _{L_{p^* }(M)}\le 1\), one has \(\int fg \mathrm {d}\mathrm {vol}_M=\int f(g-\int g\mathrm {d}\mathrm {vol}_M)\mathrm {d}\mathrm {vol}_M\) as \(\int f\mathrm {d}\mathrm {vol}_M=0\). Also, by the Poincaré inequality (see [52, Theorem 0.6]),

$$\begin{aligned} \left\| g-\int _M g \right\| _{L_{p^* }(M)} \le C^{\frac{1}{p}}R^{\frac{d}{p^* }+\frac{1}{p}}\Vert \nabla g\Vert _{L_{p^* }(M)} \le C^{\frac{1}{p}}R^{\frac{d}{p^* }+\frac{1}{p}},\end{aligned}$$

where \(R= \max \{d_g(x,y),\ x,y\in M\}\) and C depends on d and on a lower bound \(\kappa \) on the Ricci curvature of M. Therefore, \(\left\| g-\int _M g \right\| _{H^1_{p^* }(M)}\le C^{\frac{1}{p}}R^{\frac{d}{p^* }+\frac{1}{p}}\). The quantity \(\kappa \) can be further lower bounded by a constant depending on \(\tau _{\min }\) and d. Indeed, a bound on the second fundamental form of M entails a bound on the Ricci curvature by the Gauss equation (see e.g. [53, Chapter 6]), and the second fundamental form is controlled by the reach of M, see [8, Proposition 6.1]. As \(C^{\frac{1}{p}} \le C\vee 1\), to conclude, it suffices to bound the geodesic diameter of M. This is done in the following lemma. \(\square \)

Lemma 22

The geodesic diameter of M satisfies \(\sup _{x,y\in M} d_g(x,y) \le c_d \vert \mathrm {vol}_M\vert \tau _{\min }^{1-d}\).

Fig. 2

Illustration of the construction in the proof of Lemma 22

Proof

Consider a covering of M by N open balls of radius \(r_1=\tau _{\min }/4\) (for the Euclidean distance) and let \(x,y\in M\). Such a covering exists with \(N\le c_d\vert \mathrm {vol}_M\vert r_1^{-d}\) by standard packing arguments. Let \(\gamma :[0,\ell ]\rightarrow M\) be a unit speed curve between x and y. Let \(B_0\) be the ball of the covering such that \(x\in B_0\). If \(y\in B_0\), then \(\vert x-y\vert \le 2r_1\), and by [8, Proposition 6.3], we have \(d_g(x,y)\le 4r_1\). Otherwise, let \(t_0 = \inf \{t\in [0,\ell ],\ \forall t'\ge t,\ \gamma (t')\not \in B_0\}\). Then \(x_1:=\gamma (t_0)\) belongs to the boundary of \(B_0\), and is also contained in some other ball \(B_1\). By the previous argument, we have \(d_g(x,x_1)\le 4r_1\). If \(y\in B_1\), then \(d_g(x_1,y)\le 4r_1\) and \(d_g(x,y)\le 8r_1\). Otherwise, we define \(t_1 = \inf \{t\in [t_0,\ell ],\ \forall t'\ge t,\ \gamma (t')\not \in B_1\}\) and we iterate the same argument. At the end, we obtain a sequence \(x=x_0,x_1,\dots ,x_I\) of points in M with associated balls \(B_i\) which contain \(x_i\), such that \(y\in B_I\) and \(d_g(x_i,x_{i+1})\le 4r_1\). Furthermore, all the balls \(B_i\) are pairwise distinct. As \(d_g(x_I,y)\le 4r_1\), we have \(\ell \le (I+1)4r_1 \le (N+1)4r_1\le 8Nr_1\). By letting \(\gamma \) be a geodesic, we obtain in particular \(\ell =d_g(x,y) \le 8Nr_1 \le 8c_d\vert \mathrm {vol}_M\vert r_1^{1-d}\). See also Fig. 2 for an illustration. \(\square \)
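The covering invoked at the start of the proof can be produced by the standard greedy net construction. A short sketch (illustrative, for a finite point cloud rather than a manifold):

```python
import numpy as np

def greedy_net(points, r):
    # Greedy r-net: repeatedly pick an uncovered point as a new center and
    # discard every point within distance r of it. The resulting centers
    # form an r-covering whose cardinality is controlled by packing
    # arguments, as in the proof above.
    centers = []
    remaining = np.asarray(points, dtype=float)
    while len(remaining) > 0:
        c = remaining[0]
        centers.append(c)
        keep = np.linalg.norm(remaining - c, axis=1) > r
        remaining = remaining[keep]
    return np.array(centers)
```

For points spread out at mutual distance 1, a radius below 1 forces one center per point, while a very large radius yields a single center.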

Proof of Proposition 2

Given a measurable map \(\rho :[0,1]\rightarrow {\mathcal {P}}^p\), a vector measure \(E_t\) absolutely continuous with respect to \(\rho _t\) (see [39, Box 4.2]) and a time-dependent vector field \(v(x,t)\), defined as the density of \(E_t\) with respect to \(\rho _t\), we define the Benamou-Brenier functional

$$\begin{aligned} {\mathcal {B}}_p(\rho ,E) := \int _0^1\int \frac{1}{p}\vert v(x,t)\vert ^p \mathrm {d}\rho _t(x)\mathrm {d}t. \end{aligned}$$
(B.1)

The Benamou-Brenier formula [54, 55] asserts that for \(\mu ,\nu \in {\mathcal {P}}_1^p\) supported on some ball of radius R,

$$\begin{aligned} W_{p}^p(\mu ,\nu ) = \min \left\{ {\mathcal {B}}_p(\rho ,E),\ \partial _t \rho _t + \nabla \cdot E_t = 0, \rho _0=\mu , \rho _1= \nu \right\} , \end{aligned}$$
(B.2)

where \(\rho _t\) is supported on the ball of radius R, and the continuity equation \(\partial _t \rho _t + \nabla \cdot E_t = 0\) has to be understood in the distributional sense, i.e.

$$\begin{aligned} \begin{aligned}&\int _{[0,1]\times {\mathbb {R}}^D}\partial _t \phi (t,x)\mathrm {d}\rho (t,x) + \int _{[0,1]\times {\mathbb {R}}^D} \nabla \phi (t,x) \cdot \mathrm {d}E(t,x) =0, \end{aligned} \end{aligned}$$
(B.3)

for all \(\phi \in {\mathcal {C}}^1((0,1)\times {\mathcal {B}}(0,R))\) with compact support.

Assume that \(\mu \) has a density \(f_0\) and \(\nu \) has a density \(f_1\) on M. As \(\tau (M)>0\), the existence of a probability measure supported on M with density bounded below by \(f_{\min }>0\) implies that M is compact, see Remark 3. In particular, M is included in a ball \({\mathcal {B}}(0,R)\) for some R large enough. Let w be a vector field on M with \(\nabla \cdot w = \mu -\nu \) in a distributional sense, i.e. \(\int \nabla g\cdot w=- \int g\,\mathrm {d}(\mu -\nu )\) for all \(g\in {\mathcal {C}}^1(M)\). Let \(\rho _t = (1-t)\mu + t\nu \) and let E be the vector measure having density w with respect to \(\mathrm {Leb}_1 \times \mathrm {vol}_M\), where \(\mathrm {Leb}_1\) is the Lebesgue measure on [0, 1]. Then \((\rho ,E)\) satisfies the continuity equation and \(E = v\cdot \rho \), where \(v(t,x)=\frac{w(x)}{(1-t)f_0(x)+tf_1(x)}\) for \(t\in [0,1]\), \(x\in M\). Hence,

$$\begin{aligned} W_p^p(\mu ,\nu )&\le \int _0^1 \int \frac{1}{p}\vert v\vert ^p \mathrm {d}\rho \\&= \frac{1}{p}\int _0^1\int \frac{\vert w(x)\vert ^p}{\vert (1-t)f_0(x)+tf_1(x)\vert ^p}( (1-t)f_0(x)+tf_1(x)) \mathrm {d}x\mathrm {d}t \\&\le \frac{1}{p}\int \vert w(x)\vert ^p\mathrm {d}x \frac{1}{f_{\min }^{p-1}}. \end{aligned}$$

By taking the infimum on vector fields w on M satisfying \(\nabla \cdot w = \mu -\nu \) and using Proposition 1, we obtain the conclusion. The second inequality in (2.6) follows from Proposition 1. \(\square \)
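As a numerical sanity check, not part of the argument, the inequality can be tested in the simplest setting \(M=[0,1]\), \(d=1\), \(p=1\) (the grid and the two densities below are illustrative choices): there \(w=F_0-F_1\), the difference of the cumulative distribution functions, solves \(w'=f_0-f_1\) with \(w(0)=w(1)=0\), and the bound \(W_1(\mu ,\nu )\le \int \vert w\vert \) (the factor \(f_{\min }^{p-1}\) is 1 for \(p=1\)) is in fact an equality on the real line.

```python
import numpy as np

# Sanity check (illustrative): M = [0, 1], d = 1, p = 1. The field
# w = F0 - F1 (difference of CDFs) solves w' = f0 - f1 with
# w(0) = w(1) = 0, and W_1(mu, nu) = int |F0 - F1| on the line, so the
# bound of the proof is an equality here.
x = np.linspace(0.0, 1.0, 4001)
f0 = 1.0 + 0.5 * np.sin(2 * np.pi * x)   # both densities integrate to 1
f1 = 1.0 + 0.5 * np.cos(2 * np.pi * x)

def trap(y, t):
    # composite trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(t)))

def cdf(f):
    F = np.concatenate([[0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(x))])
    return F / F[-1]   # normalise away the discretisation error

F0, F1 = cdf(f0), cdf(f1)
bound = trap(np.abs(F0 - F1), x)         # int_0^1 |w|, with w = F0 - F1

# independent value of W_1 through the quantile-function formula
q = np.linspace(0.0, 1.0, 4001)
W1 = trap(np.abs(np.interp(q, F0, x) - np.interp(q, F1, x)), q)

assert abs(W1 - bound) < 1e-3
```

The agreement of the two values reflects the one-dimensional identity \(W_1(\mu ,\nu )=\int \vert F_0-F_1\vert \).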

Proofs of Section 4.1

Proof of Lemma 10

We first prove (4.4). Note that if \(\vert x-y\vert \ge h\) for \(x,y\in M\), then \(K_h(x-y)=0\). Hence, by a change of variable, using that \({\mathcal {B}}_M(x,h)\subset \Psi _x({\mathcal {B}}_{T_xM}(0,h))\) according to Lemma 20(iv),

$$\begin{aligned}&\int _M K_h(x-y) B[(x-y)^{\otimes j}] \mathrm {d}y \\&\quad = \int _{{\mathcal {B}}_{T_xM}(0,h)} K_h(x-\Psi _x(v)) B[(x-\Psi _x(v))^{\otimes j}] J\Psi _x(v)\mathrm {d}v\\&\quad = \int _{{\mathcal {B}}_{T_xM}(0,1)} K\left( \frac{x-\Psi _x( h v)}{h} \right) B[(x-\Psi _x( hv))^{\otimes j}]J\Psi _x( hv)\mathrm {d}v. \end{aligned}$$

As the functions \(\Psi _x\) and K are \({\mathcal {C}}^{k}\), according to Lemma 20(v) and Lemma 20(vi), we can write by a Taylor expansion, for \(v,u \in {\mathcal {B}}_{T_xM}(0,r_0)\),

$$\begin{aligned} {\left\{ \begin{array}{ll} \Psi _x(v) = x + v + \sum _{i=2}^{k-1} \frac{d^i\Psi _x(0)}{i!}[v^{\otimes i}] + R_1(x,v)\\ J\Psi _x(v) = 1 + \sum _{i= 2}^{k-1} B_x^i [v^{\otimes i}] + R_2(x,v)\\ K(v+u) = K(v) + \sum _{i=1}^{k-1} \frac{d^i K(v)}{i!}[u^{\otimes i}] + R_3(v,u)\\ B[(v+u)^{\otimes j}] = B[v^{\otimes j}] + \sum _{\emptyset \ne \sigma \subset \{1,\dots ,j\}} B[v^{\sigma },u^{\sigma ^c}], \end{array}\right. } \end{aligned}$$
(C.1)

where \(\vert R_j(x,v)\vert \le C_j\vert v\vert ^{k}\) for \(j=1,2\), \(\vert R_3(v,u)\vert \le C_3\vert u\vert ^{k}\) and \((v^{\sigma },u^{\sigma ^c})\) is the j-tuple whose lth entry is equal to v if \(l\in \sigma \), and to u otherwise. We obtain that

$$\begin{aligned} \frac{x-\Psi _x( hv)}{h}= -v - \sum _{i=2}^{k-1} \frac{d^i\Psi _x(0)}{i!}[(h v)^{\otimes i}] h^{-1} -R_1(x,hv)h^{-1}, \end{aligned}$$

and that the expression \(K\left( \frac{x-\Psi _x( h v)}{h} \right) B[(x-\Psi _x( hv))^{\otimes j}]J\Psi _x( hv)\) is written as a sum of terms of the form

$$\begin{aligned} \begin{aligned}&C_{i_0,i_1,i_2}h^{-i_0}d^{i_0}K( v)[ (d^{i_1}\Psi _x(0)[(hv)^{\otimes i_1}])^{\otimes i_0}] F_{i_2}[ (h v)^{\otimes i_2}] \end{aligned} \end{aligned}$$
(C.2)

for \(0\le i_0\le k-1\), \(2\le i_1\le k-1\) and \(j\le i_2\le k'\), where \(F_{i_2}\) is some tensor of order \(i_2\) and \(k'\) is some integer depending on k and j, plus a remainder term smaller than \(\left\| B \right\| _{\mathrm {op}}\vert h v\vert ^{k-1+j}\) up to a constant depending on k, j, \(L_k\) and K. The terms for which \(i_0i_1+i_2-i_0\ge k\) are smaller than \(\left\| B \right\| _{\mathrm {op}}h^{k}\) up to a constant, whereas the integrals of the other terms vanish, as the kernel is of order k. The first inequality in (4.5) is proven in a similar manner. Let us now bound \(\Vert \rho _h\Vert _{{\mathcal {C}}^j(M)}\). Given \(x\in M\), we have

$$\begin{aligned} d^j(\rho _h\circ \Psi _x)(0) = h^{-j} \int _{{\mathcal {B}}_{T_xM}(0,h)}(d^j K)_h(x-\Psi _x(v)) J\Psi _x(v)\mathrm {d}v. \end{aligned}$$

Therefore, using the same argument as before, we obtain that \(\left\| d^j(\rho _h\circ \Psi _x)(0) \right\| _{\mathrm {op}}\lesssim h^{k-1-j}\). \(\square \)
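The moment cancellation invoked above (the integrals of the terms of low total degree vanish because the kernel is of order k) can be illustrated with an explicit higher-order kernel. The kernel below is a standard fourth-order construction chosen for the example; it is not the kernel of the paper.

```python
import numpy as np

# Illustration (not from the paper): the fourth-order kernel
# K(u) = (15/32)(3 - 7u^2)(1 - u^2) on [-1, 1] has total mass 1 while its
# moments of orders 1, 2, 3 vanish -- the property that kills the terms
# with i_0 i_1 + i_2 - i_0 < k in the expansion above.
u = np.linspace(-1.0, 1.0, 200001)
K = (15.0 / 32.0) * (3.0 - 7.0 * u**2) * (1.0 - u**2)
du = u[1] - u[0]

def trap(y):
    # composite trapezoidal rule on the uniform grid u
    return float(np.sum((y[1:] + y[:-1]) / 2) * du)

moments = [trap(K * u**j) for j in range(4)]

assert abs(moments[0] - 1.0) < 1e-8             # mass one
assert all(abs(m) < 1e-8 for m in moments[1:])  # orders 1, 2, 3 vanish
```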

Proof of Lemma 11

Let \(0\le l \le k-1\) be even, \(\phi \in {\mathcal {C}}^\infty (M)\) be supported in \({\mathcal {B}}_M(x_0,h_0)\) for some \(h_0\) small enough and \(g\in L_{p^* }(M)\) with \(\Vert g\Vert _{L_{p^* }(M)}\le 1\). Let \(x=\Psi _{x_0}(u)\in {\mathcal {B}}_M(x_0,h_0)\) and let \(\tilde{\phi }_{x_0}=\tilde{\phi }\circ \Psi _{x_0}\). Recall that \(\tilde{\phi }_l=d^l\tilde{\phi }_{x_0} \circ \tilde{\pi }_{x_0}\). We have \(K_h(x-\Psi _{x_0}(v))\ne 0\) only if \(\vert x-\Psi _{x_0}(v)\vert \le h\). Hence, as \(\vert x-\Psi _{x_0}(v)\vert \ge \vert u-v\vert \) (recall that \(\Psi _{x_0}\) is the inverse of the projection \(\tilde{\pi }_{x_0}\)), the function \(K_h(x-\Psi _{x_0}(\cdot ))\) is supported on \(B_0\) for \(h,h_0\) small enough. Thus,

$$\begin{aligned} A_h \phi (x)&= \int _{{\mathcal {B}}_M(x,h)} K_h(x-y)( \tilde{\phi }(y)- \tilde{\phi }(x))\mathrm {d}y\\&= \int _{B_0} K_h(x-\Psi _{x_0}(v))( \tilde{\phi }_{x_0}(v)- \tilde{\phi }_{x_0}(u))J\Psi _{x_0}(v)\mathrm {d}v. \end{aligned}$$

We may write

$$\begin{aligned}&\tilde{\phi }_{x_0}(v)-\tilde{\phi }_{x_0}(u)\\&\quad = \sum _{i=1}^{l-1}\frac{d^i \tilde{\phi }_{x_0}(u)}{i!}[(v-u)^{\otimes i}] +\int _0^1 d^l \tilde{\phi }_{x_0}(u+\lambda (v-u))[(v-u)^{\otimes l}]\frac{(1-\lambda )^{l-1}}{(l-1)!}\mathrm {d}\lambda . \end{aligned}$$

Each term \(\int _{B_0} K_h(x-\Psi _{x_0}(v))\frac{d^i \tilde{\phi }_{x_0}(u)}{i!}[(v-u)^{\otimes i}] J\Psi _{x_0}(v)\mathrm {d}v\) is equal to

$$\begin{aligned} \int _M K_h(x-y) \frac{d^i \tilde{\phi }_{x_0}(\tilde{\pi }_{x_0}(x))}{i!}[(\pi _{x_0}(y-x))^{\otimes i}]\mathrm {d}y, \end{aligned}$$

and is therefore of order smaller than \(h^k\max _{1\le i \le l}\left\| \tilde{\phi }_i(x) \right\| _{\mathrm {op}}\) by Lemma 10. Hence, \(A_h \phi (x)\) is equal to the sum of a remainder term of order \(h^{k}\max _{1\le i \le l}\left\| \tilde{\phi }_i(x) \right\| _{\mathrm {op}}\) and of

$$\begin{aligned}&\int _0^1 \int _{B_0} K_h(x-\Psi _{x_0}(v))d^l \tilde{\phi }_{x_0}(u+\lambda (v-u))[(v-u)^{\otimes l}]\frac{(1-\lambda )^{l-1}}{(l-1)!}J\Psi _{x_0}(v)\mathrm {d}v\mathrm {d}\lambda \\&\quad = \int _0^1 \int _{B_0}K_h(x-\Psi _{x_0}(v))\left( d^l \tilde{\phi }_{x_0}(u+\lambda (v-u))-d^l \tilde{\phi }_{x_0}(u) \right) [(v-u)^{\otimes l}]\\&\qquad \times \frac{(1-\lambda )^{l-1}}{(l-1)!}J\Psi _{x_0}(v)\mathrm {d}v\mathrm {d}\lambda +R_1(x), \end{aligned}$$

where \(\vert R_1(x)\vert \lesssim h^{k}\max _{1\le i \le l}\left\| \tilde{\phi }_i(x) \right\| _{\mathrm {op}}\) by Lemma 10. We now fix \(\lambda \in (0,1)\) and write, by the change of variables \(w = u+\lambda (v-u)\), the inner integral as

$$\begin{aligned} U(x) = \lambda ^{-l}\int _{B_0}\lambda ^{-d}K_{h}\left( x-\Psi _{x_0}\left( u+\frac{w-u}{\lambda } \right) \right) \left( d^l \tilde{\phi }_{x_0}(w)-d^l \tilde{\phi }_{x_0}(u) \right) [(w-u)^{\otimes l}]J\Psi _{x_0}\left( u+\frac{w-u}{\lambda } \right) \mathrm {d}w. \end{aligned}$$
Note that \(\vert K_h(u)-K_h(v)\vert \lesssim h^{-d-1}\vert u-v\vert {\mathbf {1}}\{\vert u\vert \le h \text { or } \vert v\vert \le h\}\), and that, as \(\Psi _{x_0}\) is \({\mathcal {C}}^2\),

$$\begin{aligned}&\left| x-\Psi _{x_0}\left( u+\frac{w-u}{\lambda } \right) -\frac{x-\Psi _{x_0}(w)}{\lambda }\right| \\&\quad \le \left| \frac{d\Psi _{x_0}(u)[w-u] -( x -\Psi _{x_0}(w))}{\lambda }\right| + \frac{L_k\vert w-u\vert ^2}{2\lambda ^2} \\&\quad \le \frac{L_k\vert w-u\vert ^2}{\lambda } \lesssim \frac{\vert w-u\vert ^2}{\lambda }, \end{aligned}$$

whereas, as \(J\Psi _{x_0}\) is Lipschitz continuous,

$$\begin{aligned}&\left| J\Psi _{x_0}\left( u+\frac{w-u}{\lambda } \right) -J\Psi _{x_0}(w)\right| \lesssim \left| u+\frac{w-u}{\lambda }-w\right| \lesssim \frac{\vert w-u\vert }{\lambda }. \end{aligned}$$

Hence, U(x) is equal to the sum of

$$\begin{aligned}&\lambda ^{-l}\int _{B_0}K_{h\lambda }\left( x-\Psi _{x_0}(w) \right) \left( d^l \tilde{\phi }_{x_0}(w)-d^l \tilde{\phi }_{x_0}(u) \right) [(w-u)^{\otimes l}]J\Psi _{x_0}\left( w \right) \mathrm {d}w\\&\quad = \lambda ^{-l}\int _{M}K_{h\lambda }\left( x-y \right) \left( \tilde{\phi }_l(y)-\tilde{\phi }_l(x) \right) [(\pi _{x_0}(y-x))^{\otimes l}]\mathrm {d}y, \end{aligned}$$

and of a remainder term smaller than

$$\begin{aligned}&\lambda ^{-l}\int _{B_0}\Big \vert \lambda ^{-d}K_h\left( x-\Psi _{x_0}\left( u+\frac{w-u}{\lambda } \right) \right) J\Psi _{x_0}\left( u+\frac{w-u}{\lambda } \right) \\&\qquad -K_{h\lambda }\left( x-\Psi _{x_0}(w) \right) J\Psi _{x_0}\left( w \right) \Big \vert \times \left\| d^l \tilde{\phi }_{x_0}(w)-d^l \tilde{\phi }_{x_0}(u) \right\| _{\mathrm {op}}\vert w-u\vert ^l \mathrm {d}w \\&\quad \lesssim \lambda ^{-l}\int _{\vert w-u\vert \lesssim \lambda h}\Bigg (\frac{\vert w-u\vert ^2}{(\lambda h)^{d+1}}J\Psi _{x_0}\left( u+\frac{w-u}{\lambda } \right) +\vert K_{h\lambda }\left( x-\Psi _{x_0}(w) \right) \vert \frac{\vert w-u\vert }{\lambda }\Bigg )\\&\qquad \times \left\| d^l \tilde{\phi }_{x_0}(w)-d^l \tilde{\phi }_{x_0}(u) \right\| _{\mathrm {op}}\vert w-u\vert ^l \mathrm {d}w \\&\quad \lesssim h^{l+1}(\lambda h)^{-d}\int _{\vert w-u\vert \lesssim \lambda h} \left\| d^l \tilde{\phi }_{x_0}(w)-d^l \tilde{\phi }_{x_0}(u) \right\| _{\mathrm {op}} \mathrm {d}w. \end{aligned}$$

Putting all the estimates together, we may now write \(\int _M A_h\phi (x)g(x)\mathrm {d}x\) as \(S+R_2\), where, by the symmetrization trick (using that l is even)

$$\begin{aligned} S&= \iint _{M\times M}K^{(l)}_{h}\left( x-y \right) \left( \tilde{\phi }_l(y)-\tilde{\phi }_l(x) \right) [(\pi _{x_0}(y-x))^{\otimes l}] g(x)\mathrm {d}y\mathrm {d}x \\&= \iint _{M\times M}K^{(l)}_{h}\left( x-y \right) \left( \tilde{\phi }_l(x)-\tilde{\phi }_l(y) \right) [(\pi _{x_0}(x-y))^{\otimes l}] g(y)\mathrm {d}y\mathrm {d}x \\&=\frac{1}{2}\iint _{M\times M}K^{(l)}_{h}\left( x-y \right) \left( \tilde{\phi }_l(y)-\tilde{\phi }_l(x) \right) [(\pi _{x_0}(x-y))^{\otimes l}]( g(x)- g(y))\mathrm {d}y\mathrm {d}x, \end{aligned}$$

and, as \(A_h\phi \) is supported on \({\mathcal {B}}_M(x_0,h_0+h)\subset {\mathcal {B}}_M(x_0,2h_0)\) if h is small enough, \(R_2\) is, up to a constant, smaller than

$$\begin{aligned}&h^{l+1}(\lambda h)^{-d}\int _{x\in {\mathcal {B}}_M(x_0,2h_0)} \int _{\vert w-\tilde{\pi }_{x_0}(x)\vert \lesssim \lambda h} \left\| d^l \tilde{\phi }_{x_0}(w)-d^l \tilde{\phi }_{x_0}(\tilde{\pi }_{x_0}(x)) \right\| _{\mathrm {op}}\vert g(x)\vert \mathrm {d}w \mathrm {d}x \end{aligned}$$
(C.3)
$$\begin{aligned}&\qquad +\int _M h^{k}\max _{1\le i \le l}\left\| \tilde{\phi }_i(x) \right\| _{\mathrm {op}}\vert g(x)\vert \mathrm {d}x \nonumber \\&\quad \lesssim h^{l+1}(\lambda h)^{-d}\int _{w\in {\mathcal {B}}_M(x_0,3h_0)}\left\| d^l \tilde{\phi }_{x_0}(w) \right\| _{\mathrm {op}}\int _{\vert w-\tilde{\pi }_{x_0}(x)\vert \lesssim \lambda h} \vert g(x)\vert \mathrm {d}x\mathrm {d}w \nonumber \\&\qquad + h^{l+1}\int _{x\in {\mathcal {B}}_M(x_0,2h_0)} \left\| \tilde{\phi }_l(x) \right\| _{\mathrm {op}}\vert g(x)\vert \mathrm {d}x +\int _M h^{k}\max _{1\le i \le l}\left\| \tilde{\phi }_i(x) \right\| _{\mathrm {op}}\vert g(x)\vert \mathrm {d}x, \end{aligned}$$
(C.4)

where we also used Lemma 20(iii). By the chain rule,

$$\begin{aligned} \max _{1\le i \le l}\left\| \tilde{\phi }_i(x) \right\| _{\mathrm {op}}= & {} \max _{1\le i \le l}\left\| d^i({\tilde{\phi }}\circ \Psi _{x_0})\circ {\tilde{\pi }}_{x_0}(x) \right\| _{\mathrm {op}} \nonumber \\= & {} \max _{1\le i \le l}\left\| d^i({\tilde{\phi }}\circ \Psi _x \circ {\tilde{\pi }}_x \circ \Psi _{x_0})\circ {\tilde{\pi }}_{x_0}(x) \right\| _{\mathrm {op}} \nonumber \\\lesssim & {} \max _{1\le i \le l} \left\| d^i ({\tilde{\phi }}\circ \Psi _x)({\tilde{\pi }}_x \circ \Psi _{x_0}\circ {\tilde{\pi }}_{x_0}(x)) \right\| _{\mathrm {op}}\nonumber \\\lesssim & {} \max _{1\le i \le l} \left\| d^i ({\tilde{\phi }}\circ \Psi _x)(0) \right\| _{\mathrm {op}} = \left\| d^l {\tilde{\phi }}(x) \right\| _{\mathrm {op}}. \end{aligned}$$
(C.5)

Hence, applying Hölder’s inequality and using that \(\Vert g\Vert _{L_{p^* }(M)} \le 1\) shows that the last two terms in (C.4) are of order \(h^{l+1}\Vert \tilde{\phi }\Vert _{H^l_p(M)}\). To bound the first term in (C.4), remark that by Young’s inequality for integral operators [56, Theorem 0.3.1], if \({\mathcal {T}}_{\lambda h}(g)(y) = (\lambda h)^{-d}\int _{\vert x-y\vert \lesssim \lambda h} \vert g(x)\vert \mathrm {d}x\), then \(\Vert {\mathcal {T}}_{\lambda h}g\Vert _{L_{p^* }(M)} \lesssim \Vert g\Vert _{L_{p^* }(M)}\). This yields, by Hölder’s inequality,

$$\begin{aligned} h^{l+1}\int _{w\in {\mathcal {B}}_M(x_0,3h_0)}\left\| d^l \tilde{\phi }_{x_0}(w) \right\| _{\mathrm {op}}{\mathcal {T}}_{h\lambda }(g)(\Psi _{x_0}(w))\mathrm {d}w \lesssim h^{l+1}\Vert \tilde{\phi }\Vert _{H^l_p(M)}, \end{aligned}$$

which concludes the proof of the first statement of Lemma 11. To bound the remainder term in terms of \(\Vert \tilde{\phi }\Vert _{H^{l+1}_p(M)}\), we bound the second term in (C.3) in the same fashion, while, to bound the first term, we write, by a change of variables,

$$\begin{aligned}&\int _{{\mathcal {B}}_M(x_0,2h_0)}\int _{\vert w-\tilde{\pi }_{x_0}(x)\vert \lesssim \lambda h} \left\| d^l \tilde{\phi }_{x_0}(w)-d^l \tilde{\phi }_{x_0}(\tilde{\pi }_{x_0}(x)) \right\| _{\mathrm {op}}\vert g(x)\vert \mathrm {d}x \mathrm {d}w\\&\quad \le \int _0^1 \int _{{\mathcal {B}}_M(x_0,2h_0)}\int _{\vert w-\tilde{\pi }_{x_0}(x)\vert \lesssim \lambda h} \left\| d^{l+1} \tilde{\phi }_{x_0}(\tilde{\pi }_{x_0}(x)+\lambda '(w-\tilde{\pi }_{x_0}(x))) \right\| _{\mathrm {op}} \\&\qquad \times \vert \tilde{\pi }_{x_0}(x)-w\vert \vert g(x)\vert \mathrm {d}x \mathrm {d}w \mathrm {d}\lambda '\\&\quad \lesssim h \int _0^1\int _{{\mathcal {B}}_M(x_0,2h_0)}\int _{\vert u-\tilde{\pi }_{x_0}(x)\vert \lesssim \lambda '\lambda h} \left\| d^{l+1} \tilde{\phi }_{x_0}(u) \right\| _{\mathrm {op}}\vert g(x)\vert \mathrm {d}x \frac{\mathrm {d}u}{\lambda '^d} \mathrm {d}\lambda ', \end{aligned}$$

and this term is bounded as the first term in (C.4) by \(h(h\lambda )^d \Vert \tilde{\phi }\Vert _{H^{l+1}_p(M)}\), concluding the proof of Lemma 11. \(\square \)
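The Young-inequality step used above can be illustrated numerically in dimension one (the grid, radius and exponent below are arbitrary illustrative choices): the averaging kernel \(k(x,y)=r^{-1}{\mathbf {1}}\{\vert x-y\vert \le r\}\) has \(\sup _y\int k(x,y)\mathrm {d}x\le 2\) and \(\sup _x\int k(x,y)\mathrm {d}y\le 2\), so the Schur test gives an operator norm at most 2 on every \(L_p\).

```python
import numpy as np

# One-dimensional illustration (not part of the proof) of the Young /
# Schur-test step: T_r g(y) = r^{-1} int_{|x-y|<=r} |g(x)| dx satisfies
# ||T_r g||_p <= 2 ||g||_p for every p, since both row and column sums of
# its kernel are at most 2.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
g = rng.standard_normal(x.size)
r, p = 0.05, 3.0

Tg = np.array([np.sum(np.abs(g[np.abs(x - y) <= r])) * dx / r for y in x])

def lp_norm(h):
    return float((np.sum(np.abs(h) ** p) * dx) ** (1.0 / p))

# the 2.05 absorbs the O(dx/r) discretisation error in the kernel row sums
assert lp_norm(Tg) <= 2.05 * lp_norm(g)
```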

Proof of Lemma 12

The second inequality of Lemma 12 follows from the definition of the Sobolev norm and from the bound (C.5) applied to \({\tilde{\phi }}=\eta \). To prove the first inequality, write

$$\begin{aligned}&h^{-d} \iint _{{\mathcal {B}}_M(x_0,h_0)^2} {\mathbf {1}}\{\vert x-y\vert \le h\} \frac{\left\| \eta _l(x)-\eta _l(y) \right\| _{\mathrm {op}}^p}{\vert x-y\vert ^p}\mathrm {d}x\mathrm {d}y \\&\quad \lesssim h^{-d} \iint _{{\mathcal {B}}_{T_{x_0}M}(0,h_0)^2} {\mathbf {1}}\{\vert \Psi _{x_0}(u)-\Psi _{x_0}(v)\vert \le h\}\\&\qquad \times \frac{\left\| d^l(\eta \circ \Psi _{x_0})(u)-d^l(\eta \circ \Psi _{x_0})(v) \right\| _{\mathrm {op}}^p}{\vert \Psi _{x_0}(u)-\Psi _{x_0}(v)\vert ^p}\mathrm {d}u\mathrm {d}v \quad \text { as }J\Psi _{x_0}(u)\lesssim 1\text { for }\vert u\vert \lesssim 1\\&\quad \lesssim h^{-d} \int _0^1\iint _{{\mathcal {B}}_{T_{x_0}M}(0,h_0)^2} {\mathbf {1}}\{\vert u-v\vert \le h\} \left\| d^{l+1}(\eta \circ \Psi _{x_0})(u+\lambda (v-u)) \right\| _{\mathrm {op}}^p\mathrm {d}u\mathrm {d}v\mathrm {d}\lambda \\&\qquad \text { as }\vert v-u \vert \le \vert \Psi _{x_0}(u)-\Psi _{x_0}(v)\vert \\&\quad \lesssim h^{-d} \int _0^1\iint _{{\mathcal {B}}_{T_{x_0}M}(0,2h_0)^2} {\mathbf {1}}\{\vert w-u\vert \le \lambda h\} \left\| d^{l+1}(\eta \circ \Psi _{x_0})(w) \right\| _{\mathrm {op}}^p\mathrm {d}u\mathrm {d}w \lambda ^{-d} \mathrm {d}\lambda \\&\quad \lesssim \int _0^1\int _{{\mathcal {B}}_{T_{x_0}M}(0,2h_0)} \left\| d^{l+1}(\eta \circ \Psi _{x_0})(w) \right\| _{\mathrm {op}}^p\mathrm {d}w \lesssim \int _{{\mathcal {B}}_M(x_0,h_0)}\left\| \eta _{l+1}(x) \right\| _{\mathrm {op}}^p\mathrm {d}x, \end{aligned}$$

where at the second to last line, we used that \(w=u+\lambda (v-u)\) is of norm smaller than \(2h_0\) if \(\vert u\vert \le h_0\) and \(\vert v-u\vert \le h\le h_0\), and, at the last line, we used that \(J\Psi _{x_0}(w)\ge 1/2\) for \(\vert w\vert \) small enough. \(\square \)

Proof of Lemma 15

The proof of Lemma 15 relies heavily on the following classical control on the gradient of the Green function.

Lemma 23

Let \(x,y\in M\), then

$$\begin{aligned} \vert \nabla _x G(x,y)\vert \lesssim \frac{1}{d_g(x,y)^{d-1}}\le \frac{1}{\vert x-y\vert ^{d-1}}. \end{aligned}$$
(D.1)

Proof

For \(d\ge 2\), a proof of Lemma 23 can be found in [43, Theorem 4.13]. See also [57, Theorem 5.2] for a proof with more explicit constants in the case \(d\ge 3\). The constants in their proofs depend on d, on bounds on the curvature of M, on \(\vert \mathrm {vol}_M\vert \) and on the geodesic diameter of M. As those last three quantities can be further bounded by constants depending on \(\tau _{\min }\), \(f_{\min }\) and d, see Lemma 22 and [8, Proposition 6.1], this concludes the proof. For \(d=1\), M is isometric to a circle, for which a closed-form expression for G exists [58] and satisfies \(\vert \nabla _x G(x,y)\vert \le 1\). \(\square \)

Recall that, by Lemma 10, \(\vert \rho _h(x)\vert \ge 1/2\) for all \(x\in M\). Therefore, Lemma 23 yields

$$\begin{aligned} \left| \nabla G\left( K_h*\left( \frac{\delta _{x}}{\rho _h} \right) \right) (z)\right|&= \left| \int _{ M} \nabla _z G(z,y) \frac{K_h(x-y)}{\rho _h(x)}\mathrm {d}y \right| \\&\lesssim \int _{ {\mathcal {B}}_M(x,h)} \frac{\Vert K\Vert _\infty h^{-d}}{\vert z-y\vert ^{d-1}}\mathrm {d}y. \end{aligned}$$

If \(d=1\), this quantity is bounded by a constant, as \(\mathrm {vol}_M({\mathcal {B}}_M(x,h))\lesssim h^d\) by Lemma 20(iii). We then obtain the result directly in this case by integrating this inequality against \(f(x)\mathrm {d}x\). If \(d\ge 2\), we use the following argument.

  • If \(\vert x-z\vert \ge 2h\) and \(y\in {\mathcal {B}}_M(x,h)\), then \(\vert z-y\vert \ge \vert x-z\vert -h\ge \vert x-z\vert /2\). Therefore, by Lemma 20(iii),

    $$\begin{aligned}&\int _{{\mathcal {B}}_M(x,h)}\frac{\Vert K\Vert _\infty h^{-d}}{\vert z-y\vert ^{d-1}}\mathrm {d}y \le \frac{2^{d-1}\Vert K\Vert _\infty h^{-d}}{\vert x-z\vert ^{d-1}}\mathrm {vol}_M({\mathcal {B}}_M(x,h)) \lesssim \frac{1}{\vert x-z\vert ^{d-1}}. \end{aligned}$$
  • If \(\vert x-z\vert \le 2h\), then

    $$\begin{aligned} \int _{{\mathcal {B}}_M(x,h)}\frac{\Vert K\Vert _\infty h^{-d}}{\vert z-y\vert ^{d-1}}\mathrm {d}y&\le \int _{ {\mathcal {B}}_M(z,3h)}\frac{\Vert K\Vert _\infty h^{-d}}{\vert z-y\vert ^{d-1}} \mathrm {d}y \\&\le \int _{ {\mathcal {B}}_{T_z M}(0,3h)}\frac{\Vert K\Vert _\infty h^{-d} J\Psi _z(u)}{\vert z-\Psi _z(u)\vert ^{d-1}} \mathrm {d}u\\&\lesssim h^{-d}\int _{{\mathcal {B}}_{T_z M}(0,3h)}\frac{\mathrm {d}u}{\vert u\vert ^{d-1}} \lesssim h^{1-d}, \end{aligned}$$

    where at the last line we used that \(\vert z-\Psi _z(u)\vert \ge \vert u\vert \) and that \(J\Psi _z(u)\lesssim 1\) by Lemma 20.

Hence,

$$\begin{aligned}&{\mathbb {E}}\left[ \vert \nabla (G(K_h*\delta _{X}))(z)\vert ^p\right] = \int _M f(x)\vert \nabla (G(K_h*\delta _{x}))(z)\vert ^p \mathrm {d}x \\&\quad \le f_{\max }\int _M \vert \nabla (G(K_h*\delta _{x}))(z)\vert ^p \mathrm {d}x \\&\quad \lesssim \int _{{\mathcal {B}}_M(z,2h)} \vert \nabla (G(K_h*\delta _{x}))(z)\vert ^p \mathrm {d}x+ \int _{M\backslash {\mathcal {B}}_M(z,2h)} \vert \nabla (G(K_h*\delta _{x}))(z)\vert ^p \mathrm {d}x \\&\quad \lesssim \int _{{\mathcal {B}}_M(z,2h)} h^{(1-d)p} \mathrm {d}x+ \int _{M\backslash {\mathcal {B}}_M(z,2h)} \vert z-x\vert ^{(1-d)p} \mathrm {d}x \\&\quad \lesssim h^{(1-d)p+d}+ \int _{M\backslash {\mathcal {B}}_M(z,2h)} \vert z-x\vert ^{(1-d)p} \mathrm {d}x. \end{aligned}$$

The latter integral is bounded by

$$\begin{aligned}&\int _{2h\le \vert x-z\vert \le r_0}\vert z-x\vert ^{(1-d)p}\mathrm {d}x +\int _{\vert x-z\vert \ge r_0}\vert z-x\vert ^{(1-d)p}\mathrm {d}x \\&\qquad \le \int _{2h\le \vert \Psi _z(u)-z\vert \le r_0} \vert z-\Psi _z(u)\vert ^{(1-d)p}J\Psi _z(u)\mathrm {d}u+\vert \mathrm {vol}_M\vert r_0^{(1-d)p} \\&\qquad \lesssim \int _{14h/8\le \vert u\vert \le r_0}\vert u\vert ^{(1-d)p}\mathrm {d}u+1 \lesssim h^{(1-d)p+d} \text { if }(1-d)p+d< 0, \end{aligned}$$

where at the last line we use that \(\vert u\vert \le \vert z-\Psi _z(u)\vert \le 8\vert u\vert /7\) by Lemma 20. If \(d>2\) or if \(d=2\) and \(p>2\), the condition \((1-d)p+d< 0\) is always satisfied. If \(d=2\) and \(p=2\), then \(\int _{14h/8\le \vert u\vert \le h_0}\vert u\vert ^{(1-d)p}\mathrm {d}u\) is of order \(-\log h\), concluding the proof. \(\square \)
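The final scaling can be checked numerically in a concrete case, say \(d=3\) and \(p=2\), so that \((1-d)p+d=-1<0\) (the choice of d, p and the grid below are illustrative): in polar coordinates the integral over \(2h\le \vert u\vert \le 1\) equals \(4\pi (1/(2h)-1)\), which is indeed of order \(h^{(1-d)p+d}\).

```python
import numpy as np
from math import pi

# Sanity check (illustrative constants) of the last estimate for d = 3,
# p = 2: int_{2h <= |u| <= 1} |u|^{(1-d)p} du = 4*pi*(1/(2h) - 1),
# of order h^{(1-d)p+d} = h^{-1}.
def trap(y, t):
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(t)))

for h in [1e-2, 1e-3, 1e-4]:
    r = np.geomspace(2 * h, 1.0, 200001)
    # radial integrand r^{(1-d)p} * r^{d-1}; the area of S^2 is 4*pi
    integral = 4 * pi * trap(r ** (-4) * r ** 2, r)
    exact = 4 * pi * (1 / (2 * h) - 1)
    assert abs(integral / exact - 1) < 1e-3
```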

Proof of Theorem 4(i)

Let f be the density of \(\mu \) and \(\tilde{f}=f/\rho _h\). By Lemma 10, \(f_{\min }(1-c_0h^{k-1})\le \tilde{f}\le f_{\max }(1+c_0h^{k-1})\) for h small enough. We have

$$\begin{aligned}&K_h *{\tilde{f}}(x) =\int _M K_h(x-y)\tilde{f}(y)\mathrm {d}y\nonumber \\&\quad = \int _{{\mathcal {B}}_{T_xM}(0,h)}K_h(x-\Psi _x(v)) \tilde{f}\circ \Psi _x(v) J\Psi _x(v)\mathrm {d}v \nonumber \\&\quad \ge \int _{{\mathcal {B}}_{T_xM}(0,h)} K_h(v)\tilde{f}\circ \Psi _x(v) J\Psi _x(v)\mathrm {d}v \nonumber \\&\qquad - \int _{{\mathcal {B}}_{T_xM}(0,h)}\vert K_h(x-\Psi _x(v))-K_h(v)\vert \tilde{f}\circ \Psi _x(v) J\Psi _x(v)\mathrm {d}v. \end{aligned}$$
(E.1)

By Lemma 20(v), the quantity \(\vert K_h(x-\Psi _x(v))-K_h(v)\vert \) is bounded by \(\frac{\Vert K\Vert _{{\mathcal {C}}^1({\mathbb {R}}^d)}}{h^{d+1}}\vert x-v-\Psi _x(v)\vert \lesssim \frac{\vert v\vert ^2}{h^{d+1}}\), so that the second term in (E.1) is bounded by \(Cf_{\max }\int _{{\mathcal {B}}_{T_x M}(0,h)}\frac{\vert v\vert ^2}{h^{d+1}}\mathrm {d}v \lesssim h\). Also, using that \(\vert J\Psi _x(v)-1\vert \le c_1 \vert v\vert \) by Lemma 20, the first term is larger than

$$\begin{aligned}&f_{\min }(1-c_0h^{k-1})(1-c_1h)\int _{{\mathbb {R}}^d} K_+ - f_{\max }(1+c_1h)(1+c_0h^{k-1})\int _{{\mathbb {R}}^d} K_-\\&\quad =f_{\min }(1-c_2h)\left( 1+\int _{{\mathbb {R}}^d} K_- \right) - f_{\max }(1+c_2h)\int _{{\mathbb {R}}^d} K_-\\&\quad =f_{\min }(1-c_2h) - (f_{\max }(1+c_2h)-f_{\min }(1-c_2h))\int _{{\mathbb {R}}^d} K_-(v) \mathrm {d}v \\&\quad \ge f_{\min }(1-c_2h) -(f_{\max }(1+c_2h)-f_{\min }(1-c_2h))\beta \\&\quad \ge 3f_{\min }/4, \end{aligned}$$

if \(\beta < f_{\min }/(4(f_{\max }-f_{\min }))\) and h is small enough. Likewise, we show that \(K_h*\tilde{f}(x)\le 3f_{\max }/2\). It remains to show that \(\vert K_h*\tilde{f}(x)-K_h*(\mu _n/\rho _h)(x)\vert \) is small enough for all \(x\in M\) with high probability. Note that \(K_h*\tilde{f}-K_h*(\mu _n/\rho _h)\) is L-Lipschitz with \(L\lesssim h^{-d-1}\). Let \(t=f_{\min }/4\) and consider a covering of M by N balls \({\mathcal {B}}_M(x_j,t/(2L))\). By standard packing arguments, such a covering exists with \(N\lesssim (L/t)^d\). If \(\vert K_h*\tilde{f}(x_j)-K_h*\mu _n(x_j)\vert \le t/2\) for all \(j=1,\dots ,N\), then \(\Vert K_h*\tilde{f}-K_h*\mu _n\Vert _{L_\infty (M)} \le t/2 + Lt/(2L) \le t\). Hence, using Bernstein’s inequality [59, Theorem 3.1.7], as \(\vert K_h(x_j-Y_i)\vert \le \Vert K\Vert _{{\mathcal {C}}^0({\mathbb {R}}^D)}h^{-d}\) and \(\mathrm {Var}(K_h(x_j-Y_i))\le \Vert K^2\Vert _{{\mathcal {C}}^0({\mathbb {R}}^D)}h^{-d}\), we obtain

$$\begin{aligned}&{\mathbb {P}}(\Vert K_h*\tilde{f}-K_h*\mu _n\Vert _{L_\infty (M)} \ge t) \le {\mathbb {P}}(\exists j,\ \vert K_h*\tilde{f}(x_j)-K_h*\mu _n(x_j)\vert \ge t/2) \\&\qquad \lesssim (L/t)^d{\mathbb {P}}(\vert K_h*\tilde{f}(x_j)-K_h*\mu _n(x_j)\vert \ge t/2) \lesssim h^{-d(d+1)} \exp (-Cnh^d). \end{aligned}$$

Choosing \(nh^d= C'\log n\) for \(C'\) large enough yields the conclusion.
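The covering step of the proof (smallness on a t/(2L)-net plus the Lipschitz property give smallness everywhere) can be made concrete with the extremal Lipschitz function; the constants below are illustrative, not those of the paper.

```python
import numpy as np

# The net argument made concrete: the largest L-Lipschitz function that is
# at most t/2 on a net of mesh t/(2L) peaks at t/2 + L*(mesh/2) = 3t/4 <= t
# between net points -- the bound used for ||K_h*f~ - K_h*mu_n||_infty.
L, t = 50.0, 0.2
mesh = t / (2 * L)
grid = np.arange(0.0, 1.0 + mesh / 2, mesh)   # the t/(2L)-net of [0, 1]
xs = np.linspace(0.0, 1.0, 5001)

# extremal function: equals t/2 on the net, grows at slope L in between
phi = t / 2 + L * np.min(np.abs(xs[:, None] - grid[None, :]), axis=1)

assert phi.max() <= t          # controlled everywhere ...
assert phi.max() > t / 2       # ... although it exceeds t/2 off the net
```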

Proofs of Section 4.4

We first prove Lemma 16.

Proof of (a)

The map \(\Psi _{Y_{j}}\circ \pi _{Y_{j}}: {\mathcal {B}}_{{{\hat{T}}}_{j}}(0,3\varepsilon )\rightarrow M\) is a diffeomorphism onto its image, as the composition of the diffeomorphisms \(\Psi _{Y_{j}}\) and \((\pi _{Y_{j}})_{\vert {{\hat{T}}}_{j}}\) (recall that \(\angle ({{\hat{T}}}_{j},T_{Y_{j}}M) \lesssim \varepsilon ^{m-1}+ \gamma \varepsilon ^{-1} \lesssim 1\) by Proposition 5). Furthermore, by Lemma 20(iv) and the bound on the angle,

$$\begin{aligned} {\mathcal {B}}_M(Y_j,2\varepsilon )\subset \Psi _{Y_j}({\mathcal {B}}_{T_{Y_j}M}(0,2\varepsilon )) \subset (\Psi _{Y_j}\circ \pi _{Y_j})({\mathcal {B}}_{{{\hat{T}}}_j}(0,3\varepsilon )). \end{aligned}$$

This proves the first part of Lemma 16(a). Let \(S_j:{\mathcal {B}}_M(Y_j,2\varepsilon )\rightarrow {\mathcal {B}}_{{{\hat{T}}}_j}(0,3\varepsilon )\) be the inverse of \(\Psi _{Y_j}\circ \pi _{Y_j}\). By Lemma 21(ii), \({\hat{\Psi }}_j\) is injective on \({{\hat{T}}}_j\), while, for \(v\in {{\hat{T}}}_j\) with \(\vert v\vert \le 3\varepsilon \),

$$\begin{aligned} \left\| \mathrm {id}-d{\hat{\Psi }}_j(v) \right\| _{\mathrm {op}} \le \left\| \sum _{a= 2}^{m-1}a {{\hat{V}}}_{a,j}[\cdot ,v^{\otimes (a-1)}] \right\| \lesssim \ell \varepsilon \le 1/2 \end{aligned}$$
(F.1)

if \(\ell \lesssim \varepsilon ^{-1}\) is small enough. Hence, \({\hat{\Psi }}_j:{\mathcal {B}}_{{{\hat{T}}}_j}(0,3\varepsilon )\rightarrow {\hat{\Psi }}_j({{\hat{T}}}_j)\) is a diffeomorphism onto its image, and \({\hat{\Psi }}_j\circ S_j\) is a diffeomorphism as a composition of diffeomorphisms. Note that the inverse of \({\hat{\Psi }}_j\) is given by \({\hat{\pi }}_j(\cdot -X_j)\), so that \({\mathcal {B}}_{{\hat{\Psi }}_j({{\hat{T}}}_j)}(X_j,\varepsilon )\subset {\hat{\Psi }}_j({\mathcal {B}}_{{{\hat{T}}}_j}(0,\varepsilon ))\). Furthermore, by Lemma 20,

$$\begin{aligned} (\Psi _{Y_j}\circ \pi _{Y_j})({\mathcal {B}}_{{{\hat{T}}}_j}(0,\varepsilon )) \subset \Psi _{Y_j}({\mathcal {B}}_{T_{Y_j}M}(0,\varepsilon )) \subset {\mathcal {B}}_M(Y_j,8\varepsilon /7), \end{aligned}$$

so that \(({\hat{\Psi }}_j\circ S_j)({\mathcal {B}}_M(Y_j,2\varepsilon ))\) contains \({\mathcal {B}}_{{\hat{\Psi }}_j({{\hat{T}}}_j)}(X_j,\varepsilon )\). Furthermore, these inclusions of balls also hold for any \(\varepsilon '\le \varepsilon \), proving that \(\vert {\hat{\Psi }}_j\circ S_j(z)-X_j\vert \ge (7/8)\vert z-Y_j\vert \) for any \(z\in {\mathcal {B}}_M(Y_j,2\varepsilon )\). \(\square \)

Proof of (b)

The formula for the density \(\tilde{\chi }_j\) follows from a change of variables. \(\square \)

Proof of (c)

The inequality (4.17) follows from Proposition 5. We now prove that, for \(z\in {\mathcal {B}}_M(Y_j,2\varepsilon )\),

$$\begin{aligned} \vert \pi _{Y_j}(z-{\hat{\Psi }}_j\circ S_j(z))\vert \lesssim (\varepsilon +\gamma \varepsilon ^{-1})(\varepsilon ^{m} + \gamma ). \end{aligned}$$
(F.2)

Let \(u\in {{\hat{T}}}_j\) be such that \(z=\Psi _{Y_j}\circ \pi _{Y_j}(u)\) and \(y={\hat{\Psi }}_j(u)\). Recall that \(X_j-Y_j\in T_{Y_j}M^\bot \) by assumption, so that \(\pi _{Y_j}(X_j-Y_j)=0\). Also, by Lemma 20(v), we have \(\Psi _{Y_j}(\pi _{Y_j}(u))=Y_j+ \pi _{Y_j}(u) + N_{Y_j}(\pi _{Y_j}(u))\) with \(N_{Y_j}(\pi _{Y_j}(u))\in T_{Y_j}M^\bot \), while by Lemma 21(ii), we have \({\hat{\Psi }}_j(u)=X_j+u+{{\hat{N}}}_j(u)\) with \({{\hat{N}}}_j(u)\in {{\hat{T}}}_j^\bot \). Hence,

$$\begin{aligned} \vert \pi _{Y_j}(z-y)\vert&=\vert \pi _{Y_j}(Y_j+ \pi _{Y_j}(u) + N_{Y_j}(\pi _{Y_j}(u))-(X_j+u+{{\hat{N}}}_j(u)))\vert \\&= \vert \pi _{Y_j}(N_{Y_j}(\pi _{Y_j}(u))-{{\hat{N}}}_j(u))\vert \\&\le \angle (T_{Y_j}M,{{\hat{T}}}_j)\vert N_{Y_j}(\pi _{Y_j}(u))-{{\hat{N}}}_j(u)\vert +\vert {\hat{\pi }}_j(N_{Y_j}(\pi _{Y_j}(u))-{{\hat{N}}}_j(u))\vert \\&\lesssim (\varepsilon ^{m-1}+\gamma \varepsilon ^{-1})(\varepsilon ^{m}+\gamma ) + \vert {\hat{\pi }}_j(\pi _{Y_j}^\bot (N_{Y_j}(\pi _{Y_j}(u))))\vert \\&\lesssim (\varepsilon ^{m-1}+\gamma \varepsilon ^{-1})(\varepsilon ^{m}+\gamma ) + \angle (T_{Y_j}M,{{\hat{T}}}_j)\vert N_{Y_j}(\pi _{Y_j}(u))\vert \\&\lesssim (\varepsilon ^{m-1}+\gamma \varepsilon ^{-1})(\varepsilon ^{m}+\gamma + \varepsilon ^2) \lesssim (\varepsilon ^{m-1}+\gamma \varepsilon ^{-1})(\varepsilon ^2+\gamma ), \end{aligned}$$

where we used Proposition 5 to bound \(\angle (T_{Y_j}M,{{\hat{T}}}_j)\), Lemma 21 to bound \(\vert N_{Y_j}(\pi _{Y_j}(u))-{{\hat{N}}}_j(u)\vert \) and Lemma 20 to bound \(\vert N_{Y_j}(\pi _{Y_j}(u))\vert \). We obtain (F.2). \(\square \)

To prove inequality (4.18), we first bound \(\vert \chi _j({\hat{\Psi }}_j\circ S_j(z))-\chi _j(z)\vert \) and then bound \(\vert J({\hat{\Psi }}_j\circ S_j)(z)-1\vert \). The first bound is based on the following elementary lemma.

Lemma 24

Let \(\theta :{\mathbb {R}}^D\rightarrow {\mathbb {R}}\) be a smooth radial function. Then, \(\vert \theta (x)-\theta (y)\vert \le \frac{\Vert \theta \Vert _{{\mathcal {C}}^2({\mathbb {R}}^D)}}{2} \vert \vert x\vert ^2-\vert y\vert ^2\vert \).

Proof

As \(d\theta (0)=0\), one can write \(\theta (x) =\tilde{\theta }(\vert x\vert ^2)\) for some function \(\tilde{\theta }\) which is Lipschitz continuous with Lipschitz constant \(\frac{\Vert d^2 \theta \Vert _{{\mathcal {C}}^0({\mathbb {R}}^D)}}{2}\). This implies the conclusion. \(\square \)
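Lemma 24 can be checked numerically with an illustrative radial function (our choice, not the \(\theta \) of the paper): for \(\theta (x)=e^{-\vert x\vert ^2}\), one has \(\tilde{\theta }(s)=e^{-s}\), which is 1-Lipschitz on \([0,\infty )\), matching \(\Vert d^2\theta \Vert _{{\mathcal {C}}^0}/2\ge 1\) (the Hessian at the origin is \(-2\,\mathrm {id}\)).

```python
import numpy as np

# Check of Lemma 24 with the illustrative radial function
# theta(x) = exp(-|x|^2): theta~(s) = exp(-s) is 1-Lipschitz, so
# |theta(x) - theta(y)| <= | |x|^2 - |y|^2 | for all x, y.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 3))
Y = rng.standard_normal((1000, 3))
a = np.sum(X**2, axis=1)
b = np.sum(Y**2, axis=1)
lhs = np.abs(np.exp(-a) - np.exp(-b))

assert np.all(lhs <= np.abs(a - b) + 1e-12)
```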

Recall from the proof of Lemma 6 that we have \(\chi _j(z) =\zeta _j(z)/\sum _{i=1}^J\zeta _{i}(z)\) where \(\zeta _i(z)=\theta \left( \frac{z-X_i}{\varepsilon } \right) \) for some smooth radial function \(\theta \), and that, furthermore, there are at most \(c_d\) non-zero terms in the sum in the denominator, which is always larger than 1. Hence, if we control for every \(i=1,\dots ,J\) the difference \(\vert \vert z-X_i\vert ^2-\vert {\hat{\Psi }}_j\circ S_j(z)-X_i\vert ^2\vert \), then we obtain a control on \(\vert \chi _j(z)-\chi _j({\hat{\Psi }}_j\circ S_j(z))\vert \). Let \(z\in M\) be such that \(\vert z-X_j \vert \le 2\varepsilon \) (for otherwise both \(\chi _j(z)\) and \(\chi _j({\hat{\Psi }}_j\circ S_j(z))\) are equal to zero). We have by (4.17) and (F.2),

$$\begin{aligned}&\vert \vert {\hat{\Psi }}_j\circ S_j(z)-X_i\vert ^2-\vert z-X_i\vert ^2\vert \nonumber \\&\quad = \vert \vert {\hat{\Psi }}_j\circ S_j(z)-z\vert ^2 + 2\langle {\hat{\Psi }}_j\circ S_j(z)-z, z-X_i \rangle \vert \nonumber \\&\quad \lesssim (\varepsilon ^m+\gamma )^2 + \vert \langle {\hat{\Psi }}_j\circ S_j(z)-z, z-Y_i \rangle \vert + \vert \langle {\hat{\Psi }}_j\circ S_j(z)-z, X_i-Y_i \rangle \vert \nonumber \\&\quad \lesssim (\varepsilon ^m+\gamma )^2 + \vert \langle \pi _{Y_j}({\hat{\Psi }}_j\circ S_j(z)-z), \pi _{Y_j}(z-Y_i) \rangle \vert \nonumber \\&\qquad + \vert \langle \pi _{Y_j}^\bot ({\hat{\Psi }}_j\circ S_j(z)-z), \pi _{Y_j}^\bot (z-Y_i) \rangle \vert + (\varepsilon ^m+\gamma )\gamma \nonumber \\&\quad \lesssim (\varepsilon ^m+\gamma )^2 + (\varepsilon +\gamma \varepsilon ^{-1})(\varepsilon ^m +\gamma )\vert z-Y_i\vert \nonumber \\&\qquad + (\varepsilon ^m+\gamma )\vert \pi _{Y_j}^\bot (z-Y_i)\vert + (\varepsilon ^m+\gamma )\gamma . \end{aligned}$$
(F.3)

By Lemma 20(i) and the fact that \(\vert z-Y_j\vert \le \vert z-X_j\vert +\gamma \lesssim \varepsilon \), we have \(\vert \pi _{Y_j}^\bot (z-Y_i)\vert \le \vert \tilde{\pi }_{Y_j}^\bot (z)\vert +\vert \tilde{\pi }_{Y_j}^\bot (Y_i)\vert \lesssim \varepsilon ^2+\vert Y_i-Y_j\vert ^2\). Hence, we obtain that

$$\begin{aligned} \vert \vert {\hat{\Psi }}_j\circ S_j(z)-X_i\vert ^2-\vert z-X_i\vert ^2\vert \lesssim (\varepsilon ^m+\gamma )(\varepsilon ^2 +\gamma + \vert Y_i-Y_j\vert ^2). \end{aligned}$$
(F.4)

Therefore,

$$\begin{aligned} \left| \theta \left( \frac{z-X_i}{\varepsilon } \right) -\theta \left( \frac{{\hat{\Psi }}_j\circ S_j(z)-X_i}{\varepsilon } \right) \right|\lesssim & {} \frac{(\varepsilon ^m+\gamma )(\varepsilon ^2 + \gamma +\vert Y_i-Y_j\vert ^2)}{\varepsilon ^2}\nonumber \\\lesssim & {} (\varepsilon ^m + \gamma )\left( 1+ \gamma \varepsilon ^{-2}+\frac{\vert Y_i-Y_j\vert ^2}{\varepsilon ^2} \right) . \end{aligned}$$
(F.5)

Note also that if \(\vert Y_i-Y_j\vert \ge 3\varepsilon \), then \(\vert z-X_i\vert \ge \vert X_i-X_j\vert -\vert z-X_j\vert \ge 3\varepsilon -\varepsilon -3\gamma \ge \varepsilon \), while by the same argument \(\vert {\hat{\Psi }}_j\circ S_j(z)-X_i\vert \ge \varepsilon \). Hence, both terms on the left-hand side of (F.5) are null in that case. Thus, we may assume that \(\vert Y_i-Y_j\vert \le 3\varepsilon \), so that \(\left| \theta \left( \frac{z-X_i}{\varepsilon } \right) -\theta \left( \frac{{\hat{\Psi }}_j\circ S_j(z)-X_i}{\varepsilon } \right) \right| \lesssim (\varepsilon ^m + \gamma )(1+\gamma \varepsilon ^{-2})\). From the definition of \(\chi _j(z)\), and as the function \(t\mapsto 1/t\) is Lipschitz on \([1,\infty [\), we obtain that

$$\begin{aligned} \vert \chi _j(z)-\chi _j({\hat{\Psi }}_j\circ S_j(z))\vert \lesssim (\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2}). \end{aligned}$$

We now provide a bound on \(\vert J({\hat{\Psi }}_j\circ S_j)(z)-1\vert \). One has, for \(u= S_j(z)\in {{\hat{T}}}_j\),

$$\begin{aligned} \vert J({\hat{\Psi }}_j\circ S_j)(z)-1\vert&= \frac{\vert J{\hat{\Psi }}_j(u)-J(\Psi _{Y_j}\circ \pi _{Y_j})(u)\vert }{J(\Psi _{Y_j}\circ \pi _{Y_j})(u)}. \end{aligned}$$

By Lemmas 20(v) and 21(ii), \(\left\| \mathrm {id}_{{{\hat{T}}}_j}-d(\Psi _{Y_j}\circ \pi _{Y_j})(u) \right\| _{\mathrm {op}}\lesssim 1 \) and \(\left\| \mathrm {id}_{{{\hat{T}}}_j}-d{\hat{\Psi }}_j(u) \right\| _{\mathrm {op}}\lesssim 1\) for u small enough. As a consequence, both Jacobians are larger than, say, 1/2 for u small enough, and, as the function \(A\in {\mathbb {R}}^{d\times d} \mapsto \sqrt{\det (A)}\) is Lipschitz continuous on the set of matrices with \(\det (A)\ge 1/2\) and \(\left\| A \right\| _{\mathrm {op}}\le 2\), we have

$$\begin{aligned} \vert J({\hat{\Psi }}_j\circ S_j)(z)-1\vert&\lesssim \left\| d{\hat{\Psi }}_j(u)^\top d{\hat{\Psi }}_j(u)-d(\Psi _{Y_j}\circ \pi _{Y_j})(u)^\top d(\Psi _{Y_j}\circ \pi _{Y_j})(u) \right\| _{\mathrm {op}}. \end{aligned}$$
(F.6)

Recall that \({\hat{\Psi }}_j(u)=X_j+u+{{\hat{N}}}_j(u)\) and \(\Psi _{Y_j}\circ \pi _{Y_j}(u)=Y_j+\pi _{Y_j}(u)+N_{Y_j}\circ \pi _{Y_j}(u)\). We may write

$$\begin{aligned}&d{\hat{\Psi }}_j(u)^\top d{\hat{\Psi }}_j(u) = \mathrm {id}_{{{\hat{T}}}_j} + (d{{\hat{N}}}_j(u))^\top d {{\hat{N}}}_j(u) \quad \text { and}\\&d(\Psi _{Y_j}\circ \pi _{Y_j})(u)^\top d(\Psi _{Y_j}\circ \pi _{Y_j})(u) \\&\quad = {\hat{\pi }}_j\pi _{Y_j}{\hat{\pi }}_j + (d(N_{Y_j}\circ \pi _{Y_j})(u))^\top d(N_{Y_j}\circ \pi _{Y_j})(u). \end{aligned}$$

One has \(\left\| \mathrm {id}_{{{\hat{T}}}_j}-{\hat{\pi }}_j\pi _{Y_j}{\hat{\pi }}_j \right\| _{\mathrm {op}}=\left\| {\hat{\pi }}_j \pi _{Y_j}^\bot \pi _{Y_j}^\bot {\hat{\pi }}_j \right\| _{\mathrm {op}}\le \angle (T_{Y_j}M,{{\hat{T}}}_j)^2 \lesssim (\varepsilon ^{m-1}+\gamma \varepsilon ^{-1})^2\le (\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2})\). Furthermore, by Lemma 21(iv),

$$\begin{aligned}&\left\| (d{{\hat{N}}}_j(u))^\top d {{\hat{N}}}_j(u)-(d(N_{Y_j}\circ \pi _{Y_j})(u))^\top d(N_{Y_j}\circ \pi _{Y_j})(u) \right\| _{\mathrm {op}} \\&\qquad \le \left( \left\| d {{\hat{N}}}_j(u) \right\| _{\mathrm {op}}+\left\| d(N_{Y_j}\circ \pi _{Y_j})(u) \right\| _{\mathrm {op}} \right) \left\| d {{\hat{N}}}_j(u)-d(N_{Y_j}\circ \pi _{Y_j})(u) \right\| _{\mathrm {op}} \\&\qquad \lesssim \varepsilon (\varepsilon ^{m-1}+\gamma \varepsilon ^{-1})\lesssim \varepsilon ^m+\gamma . \end{aligned}$$

Combining (F.6) with these two inequalities, we obtain that \(\vert J({\hat{\Psi }}_j\circ S_j)(z)-1\vert \lesssim (\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2})\), concluding the proof of Lemma 16.

To conclude the proof of Theorem 8, it remains to control the quantity T appearing in Lemma 17 for \(\phi =K_h*(\nu _n/{\hat{\rho }}_h)\) and \(\phi '=K_h*(\mu _n/\rho _h)\).

Lemma 25

The quantity \(T=\max _{j=1\dots J}\sup _{z\in {\mathcal {B}}(Y_j,\varepsilon )}\vert \phi ({\hat{\Psi }}_j\circ S_j(z))-\phi '(z)\vert \) satisfies \(T\lesssim (\varepsilon ^m + \gamma )(1+\gamma \varepsilon ^{-2})\) with probability larger than \(1-cn^{-k/d}\).

Proof

For \(z\in {\mathcal {B}}(Y_j,\varepsilon )\), we have

$$\begin{aligned} \vert \phi ({\hat{\Psi }}_j\circ S_j(z))-\phi '(z)\vert \le \frac{1}{n}\sum _{i=1}^n \left| \frac{K_h*\delta _{X_i}({\hat{\Psi }}_j\circ S_j(z))}{{\hat{\rho }}_h(X_i)}- \frac{K_h*\delta _{Y_i}(z)}{\rho _h(Y_i)}\right| . \end{aligned}$$

The same computation as in (F.3) shows that

$$\begin{aligned} \vert \vert {\hat{\Psi }}_j\circ S_j(z)-Y_i\vert ^2-\vert z-Y_i\vert ^2\vert \lesssim (\varepsilon ^m+\gamma )(\varepsilon ^2 +\gamma + \vert Y_i-Y_j\vert ^2). \end{aligned}$$

This inequality, together with Lemma 24, yields

$$\begin{aligned}&\vert K_h(X_i-{\hat{\Psi }}_j\circ S_j(z))-K_h(Y_i-z)\vert \\&\qquad \lesssim h^{-d-2}(\varepsilon ^m+\gamma )(\varepsilon ^2 +\gamma + \vert Y_i-Y_j\vert ^2). \end{aligned}$$

We may assume that \(\vert Y_i-Y_j\vert \le 3h\) and \(\vert z-Y_i\vert \le 2h\), for otherwise both quantities on the left-hand side of the above equation are equal to zero. Hence, as \(\varepsilon \lesssim h\) by assumption, we have

$$\begin{aligned} \vert K_h(X_i-{\hat{\Psi }}_j\circ S_j(z))-K_h(Y_i-z)\vert \lesssim h^{-d}(\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2}){\mathbf {1}}\{Y_i\in {\mathcal {B}}_M(z,2h)\}.\nonumber \\ \end{aligned}$$
(F.7)

Let us now bound \(\vert {\hat{\rho }}_h(X_i)-\rho _h(Y_i)\vert \). By the triangle inequality, and using (4.18) and (F.7), we obtain that this quantity is smaller than

$$\begin{aligned}&\vert \sum _{j=1}^J \int _{{\hat{\Psi }}_j({{\hat{T}}}_j)}\chi _j(w)K_h(X_i-w)\mathrm {d}w - \sum _{j=1}^J \int _M \chi _j(z) K_h(Y_i-z)\mathrm {d}z\vert \\&\quad \le \sum _{j=1}^J \int _M\vert \tilde{\chi }_j(z)K_h(X_i-{\hat{\Psi }}_j\circ S_j(z))-\chi _j(z)K_h(Y_i-z)\vert \mathrm {d}z \\&\quad \lesssim \sum _{j=1}^J \int _M( {\mathbf {1}}\{z\in {\mathcal {B}}_M(Y_j,2\varepsilon )\}(\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2})\vert K_h(Y_i-z)\vert \\&\qquad + \tilde{\chi }_j(z)h^{-d}(\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2}){\mathbf {1}}\{z\in {\mathcal {B}}_M(Y_i,2h)\})\mathrm {d}z\\&\quad \lesssim h^{-d}(\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2})\sum _{j=1}^J \int _M {\mathbf {1}}\{z\in {\mathcal {B}}_M(Y_j,2\varepsilon )\}{\mathbf {1}}\{z\in {\mathcal {B}}_M(Y_i,2h)\}\mathrm {d}z \\&\quad \lesssim \varepsilon ^dh^{-d}(\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2})\sum _{j=1}^J {\mathbf {1}}\{\vert Y_j-Y_i\vert \le 4h\} \\&\quad \lesssim h^{-d}(\varepsilon ^m +\gamma )(1+\gamma \varepsilon ^{-2})\sum _{j=1}^J {\mathbf {1}}\{\vert Y_j-Y_i\vert \le 4h\}\mathrm {vol}_M({\mathcal {B}}_M(Y_j,\varepsilon /8))\\&\quad \lesssim h^{-d}(\varepsilon ^m +\gamma )(1+\gamma \varepsilon ^{-2})\mathrm {vol}_M({\mathcal {B}}_M(Y_i,5h))\lesssim (\varepsilon ^m +\gamma )(1+\gamma \varepsilon ^{-2}), \end{aligned}$$

where we used that \(\{X_1,\dots ,X_J\}\) is \(7\varepsilon /24\)-sparse, so that \(\{Y_1,\dots ,Y_J\}\) is \(\varepsilon /4\)-sparse. Therefore, the balls \({\mathcal {B}}_M(Y_j,\varepsilon /8)\) for \(\vert Y_j-Y_i\vert \le 4h\) are pairwise disjoint, and are all included in \({\mathcal {B}}_M(Y_i,4h+\varepsilon /8)\subset {\mathcal {B}}_M(Y_i,5h)\). We conclude by Lemma 20(iii). Letting N(z, 2h) be the number of points \(Y_i\) belonging to \({\mathcal {B}}_M(z,2h)\), we obtain

$$\begin{aligned}&\vert \phi ({\hat{\Psi }}_j\circ S_j(z))-\phi '(z)\vert \\&\quad \lesssim \frac{1}{n}\sum _{i=1}^n (\vert K_h(Y_i-z)\vert (\varepsilon ^m +\gamma )(1+\gamma \varepsilon ^{-2})\\&\qquad +h^{-d}(\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2}){\mathbf {1}}\{Y_i\in {\mathcal {B}}_M(z,2h)\})\\&\quad \lesssim \frac{N(z,2h)}{nh^{d}}(\varepsilon ^m+\gamma )(1+\gamma \varepsilon ^{-2}). \end{aligned}$$

If, for every \(z\in M\) and some \(\lambda >0\), \(N(z,2h)\le \lambda nh^d\), then we have the conclusion. Let us bound

$$\begin{aligned} P_0= {\mathbb {P}}(\exists z\in M,\ N(z,2h)> \lambda nh^d). \end{aligned}$$

If \(N(z,2h)>\lambda nh^d\), then there exists a point \(Y_i\) with \(N(Y_i,4h)\ge N(z,2h)>\lambda nh^d\). Hence, \(P_0 \le n{\mathbb {P}}(N(Y_1,4h)>\lambda nh^d)\). Conditionally on \(Y_1\), \(N(Y_1,4h)=1+U\) with U a binomial random variable with parameters \(n-1\) and \(\mu ({\mathcal {B}}_M(Y_1,4h))\le f_{\max }\mathrm {vol}_M({\mathcal {B}}_M(Y_1,4h))\lesssim h^d\) (see Lemma 20(iii)). In particular, for \(\lambda \) large enough, the probability \(P_0\) is smaller than \(n^{-k/d}\) by Bernstein’s inequality, as long as \(nh^d \gtrsim 1\). \(\square \)

We conclude this section by giving a proof of Proposition 19.

Proof

Recall that \(W_1,\dots ,W_N\) is an N-sample of law \({{\hat{U}}}_M\), and that we are in the noiseless regime \(\gamma =0\) with \(m=k\). Define \(j_a\) as the index with \(W_a\in {\hat{\Psi }}_{j_a}({{\hat{T}}}_{j_a})\), and let \(H_a = ({\hat{\Psi }}_{j_a} \circ S_{j_a})^{-1}(W_a)\). Then, Lemma 16 implies that \(\vert W_a-H_a\vert \lesssim \varepsilon ^k\). Furthermore, the sample \(H_1,\dots ,H_N\) has a law \(\mu _H\) with density \(\sum _{j=1}^J {\tilde{\chi }}_j\) on M. We decompose the distance into

$$\begin{aligned} W_\infty (({{\hat{U}}}_M)_N,U_M)&\le W_\infty (({{\hat{U}}}_M)_N,N^{-1}\sum _{a=1}^N \delta _{H_a}) + W_\infty (N^{-1}\sum _{a=1}^N \delta _{H_a},\mu _H) \\&\qquad + W_\infty (\mu _H,U_M). \end{aligned}$$

The first term is of order \(\varepsilon ^k\), while the second term scales as the second term of (5.8) according to [30]. The third term was already shown to be bounded by \(\varepsilon ^k\) in the proof of Lemma 17 (with \(\phi ={\tilde{\phi }}=1\)). As \(\varepsilon ^k \simeq (\log n/n)^{k/d}\), this concludes the proof. \(\square \)

Lower bounds on minimax risks

In this section, we prove the different lower bounds on minimax risks stated in the article. The main tool used will be Assouad’s lemma. Fix a statistical model \(({\mathcal {Q}},\vartheta ,{\mathcal {L}})\), where we observe a sample of law \(\mu \in {\mathcal {Q}}\), while \(\vartheta (\mu )\) is a quantity of interest to be estimated, with risk measured by the loss function \({\mathcal {L}}\).

Lemma 26

(Assouad’s lemma [60]) Let \(m\ge 1\) be an integer and \({\mathcal {Q}}_m = \{\mu _\sigma ,\ \sigma \in \{-1,1\}^m\}\subset {\mathcal {Q}}\) be a set of probability measures. Assume that for all \(\sigma ,\sigma ' \in \{-1,1\}^m\),

$$\begin{aligned} {\mathcal {L}}(\vartheta (\mu _\sigma ),\vartheta (\mu _{\sigma '})) \ge \vert \sigma -\sigma '\vert \delta , \end{aligned}$$
(G.1)

where \(\vert \sigma -\sigma '\vert =\sum _{i=1}^m {\mathbf {1}}\{\sigma (i)\ne \sigma '(i)\}\) is the Hamming distance between \(\sigma \) and \(\sigma '\). Then,

$$\begin{aligned} {\mathcal {R}}_n(\vartheta ,{\mathcal {Q}},{\mathcal {L}}) \ge m\frac{\delta }{16} \left( 1-\max \left\{ \mathrm {TV}(\mu _\sigma ,\mu _{\sigma '}),\ \vert \sigma - \sigma '\vert =1\right\} \right) ^{2n}. \end{aligned}$$
(G.2)
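To see how the two ingredients of (G.2) interact, here is a small numerical illustration (not part of the proof): when the pairwise total variation scales like \(c/n\), the damping factor \((1-\mathrm {TV})^{2n}\) stays bounded away from zero, so the bound is of order \(m\delta \). All parameter values below are arbitrary toy choices.

```python
import math

# Illustrative evaluation of Assouad's bound (G.2); m, delta, tv, n
# are arbitrary toy values, not quantities from the paper.
def assouad_lower_bound(m, delta, tv, n):
    # m * delta / 16 * (1 - max pairwise TV)^(2n)
    return m * delta / 16 * (1 - tv) ** (2 * n)

n = 10_000
lb = assouad_lower_bound(m=32, delta=0.01, tv=1 / n, n=n)
# For tv = 1/n, the factor (1 - 1/n)^(2n) approaches exp(-2) as n grows
```

Calibrating the perturbation so that the pairwise total variation is of order \(1/n\) is exactly the trade-off used in the proofs below.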

The lower bounds on the minimax rates that we prove actually hold on the smaller model of uniform distributions on manifolds.

Definition 7

Let \(k\ge 2\) and \(\gamma \ge 0\). The set \({\mathcal {Q}}^{k}_{d}\) is the set of uniform distributions on some manifold \(M\in {\mathcal {M}}^k_d\) with \(f_{\max }^{-1}\le \vert \mathrm {vol}_M\vert \le f_{\min }^{-1}\).

One can check that \({\mathcal {Q}}^k_d\subset {\mathcal {Q}}^{k,s}_d\), with parameter \(L_s= f_{\min }^{-1/p}\vee f_{\max }^{1-1/p}\). Therefore, a lower bound on the minimax risk on the model \({\mathcal {Q}}^k_d\) yields a lower bound on the minimax risk on the model \({\mathcal {Q}}^{k,s}_d\), provided the parameter \(L_s\) is large enough.

We build a subfamily of manifolds indexed by \(\sigma \in \{-1,1\}^m\) following [6]. By [6, Section C.2], there exists a d-dimensional manifold \(M \subset {\mathbb {R}}^{d+1}\) of reach \(2\tau _{\min }\), of volume \(C_d\tau _{\min }^d\) which contains \({\mathcal {B}}_{{\mathbb {R}}^d}(0,\tau _{\min })\times \{0\}\) (that we identify with \({\mathcal {B}}_{{\mathbb {R}}^d}(0,\tau _{\min })\)). Let \(\delta >0\) and consider a family of m points \(x_1,\dots ,x_m \in {\mathcal {B}}_{{\mathbb {R}}^d}(0,\tau _{\min }/2)\), with \(\vert x_i-x_{i'}\vert \ge 4\delta \) for \(i\ne i'\) and \(c_d(\tau _{\min }/\delta )^d\le m \le C_d(\tau _{\min }/\delta )^d\). Let \(0<\Lambda <\delta \) and let \(\phi :{\mathbb {R}}^{d+1}\rightarrow [0,1]\) be a smooth radial function supported on \({\mathcal {B}}(0,1)\), with \(\phi \equiv 1\) on \({\mathcal {B}}(0,1/2)\). Let e be the unit vector in the \((d+1)\)th direction. We then let, for \(\sigma \in \{-1,1\}^m\),

$$\begin{aligned} \Phi _\sigma ^{\Lambda }(x) =x + \sum _{i=1}^m \frac{\sigma (i) +1}{2}\Lambda \phi \left( \frac{x-x_i}{\delta } \right) e. \end{aligned}$$
(G.3)

Let \(M_\sigma ^\Lambda = \Phi _\sigma ^\Lambda (M)\) and \(\mu _\sigma ^\Lambda \) be the uniform measure on \(M_\sigma ^\Lambda \). Informally, the manifold \(M_\sigma ^\Lambda \) is obtained by adding bumps of height \(\Lambda \) to the base manifold M at locations \(x_i\) such that \(\sigma (i)=+1\). If \(\Lambda \le c_{k,d,\tau _{\min }}\delta ^k\), then \(\mu _\sigma ^\Lambda \in {\mathcal {Q}}^{k}_{d}\), provided that \(L_k\) is large enough [6, Lemma C.13]. If \(\sigma (i)=1\), the volume of \(\Phi _\sigma ^\Lambda ({\mathcal {B}}_{{\mathbb {R}}^{d}}(x_i,\delta ))\) satisfies (with \(\omega _d\) denoting the volume of the d-dimensional unit ball)

$$\begin{aligned}&\left| \mathrm {vol}_{M_\sigma ^\Lambda }(\Phi _\sigma ^\Lambda ({\mathcal {B}}_{{\mathbb {R}}^{d}}(x_i,\delta )))-\omega _d\delta ^d\right| \le \int _{{\mathcal {B}}_{{\mathbb {R}}^{d}}(x_i,\delta )} \vert J\Phi _\sigma ^\Lambda (x)-1\vert \mathrm {d}x \\&\qquad \le \int _{{\mathcal {B}}_{{\mathbb {R}}^{d}}(x_i,\delta )} \left| \sqrt{1+\Lambda ^2\delta ^{-2}\left| \nabla \phi \left( \frac{x-x_i}{\delta } \right) \right| ^2}-1\right| \mathrm {d}x \le C_d \delta ^d\Lambda ^2 \delta ^{-2}. \end{aligned}$$

Hence, for \(\delta \) small enough, we have \(\vert \vert \mathrm {vol}_{M_\sigma ^\Lambda }\vert -C_d\tau _{\min }^d\vert \le mC_d\delta ^d\Lambda ^2\delta ^{-2} \le C_d\tau _{\min }^d/3\), as \(m\le C_d(\tau _{\min }/\delta )^d\) and \(\Lambda \le c_{k,d,\tau _{\min }}\delta ^k\). As a consequence, if \(\vert \sigma - \sigma '\vert =1\), with for instance \(\sigma (i)=1\) and \(\sigma '(i)=-1\), then

$$\begin{aligned} \mathrm {TV}(\mu _\sigma ^\Lambda ,\mu _{\sigma '}^\Lambda )&\le \max \left( \mu _{\sigma }^\Lambda (\Phi _\sigma ^\Lambda ({\mathcal {B}}_{{\mathbb {R}}^{d}}(x_i,\delta ))),\mu _{\sigma '}^\Lambda ({\mathcal {B}}_{{\mathbb {R}}^{d}}(x_i,\delta )) \right) \le C_{d,\tau _{\min }}\delta ^d. \end{aligned}$$
(G.4)

We may now prove the different minimax lower bounds using Assouad’s Lemma on the family \(\{\mu _\sigma ^\Lambda ,\ \sigma \in \{-1,1\}^m\}\).

Proof of Theorem 3

As g is nondecreasing and convex, by Jensen’s inequality, we may assume without loss of generality that \({\mathcal {L}}=\mathrm {TV}\). Let \(\Gamma = \vert (\mu _\sigma ^\Lambda -\mu _{\sigma '}^\Lambda )(B_i)\vert \), where \(B_i= {\mathcal {B}}_{{\mathbb {R}}^{d}}(x_i,\delta )\) and \(\sigma (i)\ne \sigma '(i)\). Then, \(\mathrm {TV}(\mu _\sigma ^\Lambda ,\mu _{\sigma '}^\Lambda )\ge \vert \sigma -\sigma '\vert \Gamma \). Furthermore, if for instance \(\sigma '(i)=1\), \(\Gamma \ge \mu _{\sigma '}^\Lambda (B_i) = (\omega _d \delta ^d)/\vert \mathrm {vol}_{M_{\sigma '}^\Lambda }\vert \ge c_d \delta ^d/\tau _{\min }^d.\) By Assouad’s Lemma,

$$\begin{aligned} {\mathcal {R}}_n(\mu ,{\mathcal {Q}}^{s,k}_d,\mathrm {TV})&\ge {\mathcal {R}}_n(\mu ,{\mathcal {Q}}^{k}_d,\mathrm {TV}) \ge \frac{m}{16} c_d \frac{\delta ^d}{\tau _{\min }^d} \left( 1-C_{d,\tau _{\min }}\delta ^d \right) ^{2n}\\&\ge C_d\left( 1-C_{d,\tau _{\min }}\delta ^d \right) ^{2n}. \end{aligned}$$

We obtain the conclusion by letting \(\delta \) go to 0. \(\square \)

Proof of Theorem 7(iii)

As \(W_r\ge W_1\), we may assume that \(r=1\). Let \(\sigma ,\sigma '\in \{-1,1\}^m\) with \(\sigma (i)\ne \sigma '(i)\). Let \(p_{\sigma (i)}=\mathrm {vol}_{M_\sigma ^\Lambda }({\mathcal {B}}(x_i,\delta ))\) and \(U_{\sigma ,i}^\Lambda = p_{\sigma (i)}^{-1}(\mathrm {vol}_{M_\sigma ^\Lambda })_{\vert {\mathcal {B}}(x_i,\delta )}\). By the Kantorovich-Rubinstein duality formula, \(W_1(\mu ,\nu ) = \max \int f\mathrm {d}(\mu -\nu )\), where the maximum is taken over all 1-Lipschitz continuous functions \(f:{\mathbb {R}}^D\rightarrow {\mathbb {R}}\). Recall that e is the unit vector in the \((d+1)\)th direction and let \(f:x\mapsto x\cdot e\). Assume for instance that \(\sigma (i)=-1\) and \(\sigma '(i)=1\). We have \(f(x) = 0\) for \(x\in {\mathcal {B}}_{M_\sigma ^\Lambda }(x_i,\delta )\) and \(f(x)=\Lambda \) for \(x\in {\mathcal {B}}_{M_{\sigma '}^\Lambda }(x_i,\delta /2)\). Therefore, we have, as \(p_{\sigma '(i)}\le C\delta ^{d}\),

$$\begin{aligned} W_1(U_{\sigma ,i}^\Lambda ,U_{\sigma ',i}^\Lambda ) \ge p_{\sigma '(i)}^{-1}\Lambda \omega _d(\delta /2)^d \ge c_1\Lambda . \end{aligned}$$

Note also that \(\vert p_{\sigma (i)}-p_{\sigma '(i)}\vert \le \left| \mathrm {vol}_{M_\sigma ^\Lambda }(\Phi _\sigma ^\Lambda ({\mathcal {B}}_{{\mathbb {R}}^{d}}(x_i,\delta )))-\omega _d\delta ^d\right| \le C_d\delta ^{d}\Lambda ^2 \delta ^{-2}\). Furthermore, \(\vert \vert \mathrm {vol}_{M_\sigma ^\Lambda }\vert -\vert \mathrm {vol}_{M_{\sigma '}^\Lambda }\vert \vert \le \sum _{i=1}^m\vert p_{\sigma (i)}-p_{\sigma '(i)}\vert \le \vert \sigma -\sigma '\vert C_d\delta ^{d}\Lambda ^2 \delta ^{-2}\). Let \(f_i\) be a 1-Lipschitz continuous function such that \(W_1(U_{\sigma ,i}^\Lambda ,U_{\sigma ',i}^\Lambda )=\int f_i d(U_{\sigma ,i}^\Lambda -U_{\sigma ',i}^\Lambda )\). One can choose \(f_i\) such that \(f_i(x_i)=0\), so that the maximum of \(\vert f_i\vert \) on \({\mathcal {B}}(x_i,\delta )\) is at most \(\delta \). One can then change the value of \(f_i\) outside the ball without changing the value of the integral, so that \(f_i\) is supported on \({\mathcal {B}}(x_i,2\delta )\) and is 1-Lipschitz continuous. Consider the function f obtained by gluing together the different functions \(f_i\). The function f is 1-Lipschitz continuous, so that

$$\begin{aligned}&W_1\left( \mu _\sigma ^\Lambda ,\mu _{\sigma '}^\Lambda \right) \ge \sum _{i=1}^m \left( \frac{p_{\sigma (i)}}{\vert \mathrm {vol}_{M_\sigma ^\Lambda }\vert }U_{\sigma ,i}^\Lambda -\frac{p_{\sigma '(i)}}{\vert \mathrm {vol}_{M_{\sigma '}^\Lambda }\vert }U_{\sigma ',i}^\Lambda \right) (f)\\&\quad \ge \sum _{i=1}^m \frac{p_{\sigma (i)}}{\vert \mathrm {vol}_{M_\sigma ^\Lambda }\vert }(U_{\sigma ,i}^\Lambda -U_{\sigma ',i}^\Lambda )(f)\\&\qquad - \vert p_{\sigma (i)}-p_{\sigma '(i)}\vert \frac{\vert U_{\sigma ',i}^\Lambda (f)\vert }{\vert \mathrm {vol}_{M_\sigma ^\Lambda }\vert }- p_{\sigma '(i)}\vert U_{\sigma ',i}^\Lambda (f)\vert \left| \frac{1}{\vert \mathrm {vol}_{M_\sigma ^\Lambda }\vert }-\frac{1}{\vert \mathrm {vol}_{M_{\sigma '}^\Lambda }\vert }\right| \\&\quad \ge \sum _{i=1}^m \frac{p_{\sigma (i)}}{\vert \mathrm {vol}_{M_\sigma ^\Lambda }\vert }W_1(U_{\sigma ,i}^\Lambda ,U_{\sigma ',i}^\Lambda ) \\&\qquad -\sum _{i=1}^m c_4\vert p_{\sigma (i)}-p_{\sigma '(i)}\vert \delta {\mathbf {1}}\{\sigma (i)\ne \sigma '(i)\}- c_5\delta \vert \sigma -\sigma '\vert \delta ^{d}\Lambda ^2 \delta ^{-2}\\&\quad \ge \sum _{i=1}^m {\mathbf {1}}\{\sigma (i)\ne \sigma '(i)\} (c_6\delta ^d\Lambda -c_4\delta ^{d}\Lambda ^2 \delta ^{-1}) - c_5\delta \vert \sigma -\sigma '\vert \delta ^{d}\Lambda ^2 \delta ^{-2} \\&\quad \ge c_7\delta ^d \Lambda \vert \sigma -\sigma '\vert , \end{aligned}$$

where we used in the last line that we choose \(\Lambda \le c\delta ^2\) for some constant c small enough. More precisely, we let \(\Lambda = c_{k,d,\tau _{\min },L_k}\delta ^k\) and \(\delta =n^{-1/d}\), and obtain, by Assouad’s Lemma,

$$\begin{aligned} {\mathcal {R}}_n\left( \frac{\mathrm {vol}_M}{\vert \mathrm {vol}_M\vert },{\mathcal {Q}}^k_d(\gamma ),W_r \right) \gtrsim n^{-k/d}. \end{aligned}$$

\(\square \)

Proof of Theorem 4(iv)

Let \(a_n=n^{-\frac{s+1}{2s+d}}\) if \(d\ge 3\) and \(a_n = n^{-1/2}\) if \(d\le 2\). As \(W_p\ge W_1\), we may assume without loss of generality that \(p=1\), and up to rescaling, we assume that \(\tau _{\min }=\sqrt{d}\). Consider the manifold \(M \subset {\mathbb {R}}^{d+1}\) containing \({\mathcal {B}}_{{\mathbb {R}}^d}(0,\sqrt{d})\) of the previous proof. In particular, M contains the cube \([-1,1]^d\). We adapt the proof of Theorem 3 in [29], where the authors consider a family of functions \(f_\sigma : [-1,1]^d \rightarrow {\mathbb {R}}\) indexed by \(\sigma \in \{-1,1\}^m\), with \(f_\sigma = 1+n^{-1/2}\sum _{j=1}^m \sigma _j\psi _j\), where \((\psi _j)_{j=1,\dots ,m}\) are elements of a wavelet basis of \([-1,1]^d\) that satisfy \(\int \psi _j=0\) (see [29, Appendix E] for details on the construction of the wavelet basis). If \(m\lesssim n^{d/(2s+d)}\), then \(t_0\le f_\sigma \le t_1\) for some positive constants \(t_0<1<t_1\), and \(\Vert f_\sigma \Vert _{B^s_{p,q}([-1,1]^d)} \lesssim 1\). Note that each \(\psi _j\) is supported on a small rectangle inside \([-1,1]^d\), and can be extended to a smooth function on M (by simply defining \(\psi _j=0\) outside \([-1,1]^d\)). Therefore, we can also consider \(f_\sigma \) as being defined on M. This extension (that we still denote by \(f_\sigma \)) also satisfies \(t_0\le f_\sigma \le t_1\) and \(\Vert f_\sigma \Vert _{B^s_{p,q}(M)} \lesssim 1\) (this last inequality is clear for the \(\Vert \cdot \Vert _{H^l_p(M)}\) norm for l an integer, while the result follows from interpolation for Besov spaces [36, Corollary 1.1.7]).

As \(\int \psi _j=0\), we have \(\int _M f_\sigma = \vert \mathrm {vol}_M \vert \). Let \({\tilde{f}}_\sigma = f_\sigma /\vert \mathrm {vol}_M \vert \), which is larger than \(f_{\min }= t_0/ \vert \mathrm {vol}_M\vert \) and smaller than \(f_{\max }=t_1/ \vert \mathrm {vol}_M\vert \). Hence, identifying measures with their densities, the set

$$\begin{aligned} {\mathcal {Q}}_m = \{\tilde{f}_{\sigma },\ \sigma \in \{-1,1\}^m\} \end{aligned}$$

is a subset of \({\mathcal {Q}}^{s,k}_d\) for \(f_{\min }\) small enough and \(L_k\), \(L_s\), \(f_{\max }\) large enough. Furthermore, for \(\sigma , \sigma '\in \{-1,1\}^m\), we have \(\mathrm {TV}(\tilde{f}_\sigma ,\tilde{f}_{\sigma '}) = \mathrm {TV}(f_\sigma ,f_{\sigma '})/\vert \mathrm {vol}_M\vert \). Also, for any function \(\phi :{\mathbb {R}}^{d+1}\rightarrow {\mathbb {R}}\) that is 1-Lipschitz, we have

$$\begin{aligned} \int _M \phi (\tilde{f}_\sigma - \tilde{f}_{\sigma '})&= \int _{[-1,1]^d} \frac{\phi (f_{\sigma } - f_{\sigma '})}{\vert \mathrm {vol}_M\vert }, \end{aligned}$$

so that \(W_1(\tilde{f}_\sigma ,\tilde{f}_{\sigma '}) = W_1(f_\sigma ,f_{\sigma '})/\vert \mathrm {vol}_M\vert \). Hence, we have reduced our problem to the case of the cube, and applying Assouad’s inequality in the same fashion as in [29, Theorem 3] yields that \({\mathcal {R}}_n(\mu ,{\mathcal {Q}}^{s,k}_d,W_1) \gtrsim a_n\). \(\square \)
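The mechanism behind this perturbative construction can be checked numerically. The sketch below is a one-dimensional stand-in: the \(\psi _j\) here are mean-zero odd bumps, not the actual wavelets of [29], and all numerical choices are illustrative. It verifies that perturbing the uniform density by \(n^{-1/2}\sum _j \sigma _j\psi _j\) with \(\int \psi _j=0\) keeps the density positive and leaves the total mass unchanged.

```python
import numpy as np

# Toy 1-d analogue of f_sigma = 1 + n^(-1/2) * sum_j sigma_j psi_j on [-1, 1]:
# the psi_j are mean-zero odd bumps (illustrative stand-ins for wavelets).
n, m = 10_000, 8
x = np.linspace(-1.0, 1.0, 100001)
dx = x[1] - x[0]
rng = np.random.default_rng(1)
sigma = rng.choice([-1.0, 1.0], size=m)

f = np.ones_like(x)
for j in range(m):
    center = -1.0 + (2 * j + 1) / m
    u = m * (x - center)                      # local coordinate on the bump
    bump = np.where(np.abs(u) < 1.0,
                    np.exp(-1.0 / np.clip(1.0 - u**2, 1e-12, None)), 0.0)
    psi = u * bump                            # odd around center => integral 0
    f += sigma[j] * n**-0.5 * psi

mass = f.sum() * dx   # ~ 2, the mass of the unperturbed density 1 on [-1, 1]
```

The perturbation has amplitude of order \(n^{-1/2}\), so the density stays bounded away from zero, mirroring the constants \(t_0<1<t_1\) in the proof.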

Existence of kernels satisfying conditions A, B(m) and \(C(\beta )\)

The goal of this section is to prove the existence of a kernel K satisfying the conditions A, B(m) and \(C(\beta )\) stated at the beginning of Sect. 3.

If K is a radial kernel, we have by integration by parts, as K is smooth with compact support,

$$\begin{aligned} \int _{{\mathbb {R}}^d}\partial ^{\alpha _0}K(v)v^{\alpha _1}\mathrm {d}v&= C_{\alpha _0,\alpha _1}\int _{{\mathbb {R}}^d}K(v)v^{\alpha _1+\alpha _0}\mathrm {d}v\\&= C'_{\alpha _0,\alpha _1}\int _{{\mathbb {R}}} K(r)r^{d+\vert \alpha _0 \vert +\vert \alpha _1 \vert -1}\mathrm {d}r. \end{aligned}$$

Hence, to show the existence of such a kernel, it suffices to find, for every \(m\ge 0\) and every positive constant \(\kappa \), a smooth even function \(K:{\mathbb {R}}\rightarrow {\mathbb {R}}\) supported on \([-1,1]\) satisfying

  • Condition \(A'\): \(\int _{{\mathbb {R}}} K(r)r^{d-1}\mathrm {d}r= \kappa \),

  • Condition \(B'(m)\): \(\int _{{\mathbb {R}}} K(r)r^{d+i-1}\mathrm {d}r=0\) for \(i= 1,\dots ,m\),

  • Condition \(C'(\beta )\): \(\int _{{\mathbb {R}}} K(r)^- r^{d-1}\mathrm {d}r \le \beta \).
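These moment conditions can be met by solving a small linear system. The sketch below is illustrative and is not the paper's inductive construction: it expands the radial profile in an even polynomial basis multiplied by a smooth bump vanishing at \(r=1\), and solves for coefficients so that the moment \(\int _0^1 K(r)r^{d-1}\mathrm {d}r\) equals \(\kappa \) while the next m moments vanish. The values of d, m and \(\kappa \) are arbitrary choices.

```python
import numpy as np

# Numerical sketch of a radial profile satisfying A'- and B'(m)-type
# moment conditions on [0, 1): K(r) = (sum_j c_j r^(2j)) * exp(-1/(1-r^2)).
d, m, kappa = 2, 3, 1.0
n = 200_000
r = (np.arange(n) + 0.5) / n                 # midpoint grid on (0, 1)
dr = 1.0 / n
bump = np.exp(-1.0 / (1.0 - r**2))           # vanishes to all orders at r = 1

basis = [r**(2 * j) * bump for j in range(m + 1)]
# Moment matrix: M[i, j] = int_0^1 basis_j(r) r^(d+i-1) dr (midpoint rule)
M = np.array([[(b * r**(d + i - 1)).sum() * dr for b in basis]
              for i in range(m + 1)])
c = np.linalg.solve(M, np.array([kappa] + [0.0] * m))
K = sum(cj * b for cj, b in zip(c, basis))

moments = [(K * r**(d + i - 1)).sum() * dr for i in range(m + 1)]
```

The resulting profile is smooth and compactly supported; negative parts appear because the vanishing-moment constraints force K to change sign, which is why condition \(C'(\beta )\) has to be tracked separately in the proof.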

We show by induction on m that for any \(\beta >0\), there exists such a kernel. For \(m=0\), let \(K_0\) be any smooth even nonnegative function supported on \([-1,1]\). Then, letting \(K=\kappa K_0/\int _{{\mathbb {R}}}K_0\), we obtain a kernel K satisfying the desired conditions for any \(\beta >0\). Consider now the case \(m>0\). Let \(\beta >0\).

  • If \(m+d\) is even, then any K satisfying conditions \(A'\), \(B'(m-1)\) and \(C'(\beta )\) will also satisfy \(B'(m)\). Indeed, as K is even, we have \(\int _{{\mathbb {R}}} K(r)r^{m+d-1}\mathrm {d}r=0\), so that the induction step is proven.

  • If \(m+d\) is odd, let K be a kernel satisfying conditions \(A'\), \(B'(m-1)\) and \(C'(\beta /2)\). We use the following lemma.

Lemma 27

For \(i\ge 0\), let \(e_i:x\in {\mathbb {R}}\mapsto x^{i+d-1}\) and fix an integer \(m>0\). For any \(a\in {\mathbb {R}}\), let \(F_a\) be the set of smooth functions \(f:(1,\infty )\rightarrow {\mathbb {R}}\) with compact support satisfying \(\int fe_i=0 \text { for } 0\le i<m\text { and } \int fe_{m}=a\). Then,

$$\begin{aligned} \inf \left\{ \int \vert f(r)\vert r^{d-1}\mathrm {d}r,\ f\in F_a \right\} = 0. \end{aligned}$$
(H.1)

Assume first that the lemma is true. Let \(a=-\frac{1}{2}\int _{{\mathbb {R}}} K(r)r^{m+d-1}\) and \(f\in F_a\). Then,

$$\begin{aligned} {\left\{ \begin{array}{ll} \int (K(r)+f(\vert r\vert ))r^{d-1}\mathrm {d}r= \kappa +\int f(\vert r\vert )r^{d-1}\mathrm {d}r=\kappa \\ \int (K(r)+f(\vert r\vert ))r^{i+d-1} \mathrm {d}r=\int f(\vert r\vert )r^{i+d-1}\mathrm {d}r = 0\text { for } 0<i<m \\ \int (K(r)+f(\vert r\vert ))r^{m+d-1} \mathrm {d}r=\int K(r)r^{m+d-1}\mathrm {d}r + 2\int _1^{\infty } f(r)r^{m+d-1}\mathrm {d}r = 0. \end{array}\right. } \end{aligned}$$

Hence, the kernel \(K+f(\vert \cdot \vert )\) satisfies the conditions \(A'\) and \(B'(m)\). Also, we have, as \(K(r)=0\) if \(\vert r\vert \ge 1\),

$$\begin{aligned}&\int _{{\mathbb {R}}} (K(r)+f(\vert r\vert ))_-r^{d-1} \mathrm {d}r= \int _{{\mathbb {R}}} K(r)_- r^{d-1}\mathrm {d}r + 2\int _1^{\infty }f(r)_-r^{d-1}\mathrm {d}r \\&\qquad \le \beta /2 + \int _1^\infty \vert f(r)\vert r^{d-1}\mathrm {d}r, \end{aligned}$$

where we used in the last line that \(\int _1^{\infty }f(r)_-r^{d-1}\mathrm {d}r=\int _1^{\infty }f(r)_+r^{d-1}\mathrm {d}r=\frac{1}{2}\int _1^{\infty }\vert f(r)\vert r^{d-1}\mathrm {d}r\). Lemma 27 asserts the existence of \(f\in F_a\) with \(\int \vert f(r)\vert r^{d-1}\mathrm {d}r\le \beta /2\). For such a choice of f, the kernel \(\tilde{K}=K+f(\vert \cdot \vert )\) also satisfies \(C'(\beta )\). Finally, f has compact support, included in [0, R] for some \(R>0\). The kernel \(\tilde{K}_{1/R}\) is supported on \({\mathcal {B}}(0,1)\), and satisfies conditions \(A'\), \(B'(m)\) and \(C'(\beta )\). This concludes the induction step, and the proof of the existence of kernels satisfying conditions A, B(m) and \(C(\beta )\).

Proof of Lemma 27

Consider functions f supported on \([r_0,r_1]\) for some constants \(1< r_0\le r_1\) to be fixed later. Let \(G_{r_0,r_1}\) be the subspace of \(L_2([r_0,r_1])\) spanned by the functions \(e_i\) for \(0\le i \le m-1\) and let \(g_{m}\) be the projection of \(e_{m}\) on \(G_{r_0,r_1}^\bot \), the orthogonal space of \(G_{r_0,r_1}\). Let \(\ell =\Vert g_m\Vert _{L_2[r_0,r_1]}\). The function \(f = \frac{ag_{m}}{\ell ^2} \) is a polynomial of degree m restricted to \([r_0,r_1]\) and satisfies \(\int fe_i=0\) for \(0\le i\le m-1\) by construction, with \(\int fe_{m} = \frac{a}{\ell ^2} \int e_{m} g_{m} = a\). Also, we have for any polynomial \(P\in G_{r_0,r_1}\),

$$\begin{aligned} \Vert e_{m}-P\Vert ^2_{L_2([r_0,r_1])}&= \int _{r_0}^{r_1} \vert r^{m+d-1}-P(r)\vert ^2\mathrm {d}r=\int _{1}^{\frac{r_1}{r_0}} r_0\vert (r_0r)^{d+m-1}-P(rr_0)\vert ^2\mathrm {d}r\\&= r_0^{2(d+m)-1}\int _{1}^{\frac{r_1}{r_0}} \vert r^{d+m-1}-r_0^{-(d+m-1)}P(rr_0)\vert ^2\mathrm {d}r. \end{aligned}$$

As \(r\mapsto r_0^{-(d+m-1)}P(rr_0)\) is an element of \(G_{1,r_1/r_0}\), letting \(r_1=2r_0\), we obtain

$$\begin{aligned} \ell ^2&= \Vert g_{m}\Vert _{L_2([r_0,r_1])}^2 = \min _{P\in G_{r_0,r_1}}\Vert e_{m}-P\Vert ^2_{L_2([r_0,r_1])} \\&= r_0^{2(d+m)-1} \min _{P\in G_{1,2}}\Vert e_{m}-P\Vert ^2_{L_2([1,2])}=Cr_0^{2(d+m)-1}, \end{aligned}$$

where \(C=C_m>0\) is the distance between \(e_m\) restricted to [1, 2] and \(G_{1,2}\). The function f is not smooth, so that it does not belong to \(F_a\). To overcome this issue, we consider a smooth kernel \(\rho \) on \({\mathbb {R}}\) satisfying \(\int \rho =1\) and \(\int \rho (r)r^i \mathrm {d}r=0\) for \(i=1,\dots ,m+d-1\), with support included in \({\mathcal {B}}_{{\mathbb {R}}}(0,r_0/2)\). See e.g. [21, Section 3.2] for the construction of such a kernel \(\rho \). The map \(\rho *f\) is supported on \((1,\infty )\) and it is straightforward to check that \(\rho *f \in F_a\) for \(r_0>2\). By Young’s inequality, \(\Vert \rho *f\Vert _{L_2({\mathbb {R}})} \le \Vert \rho \Vert _{L_1({\mathbb {R}})} \Vert f\Vert _{L_2({\mathbb {R}})}\), so that

$$\begin{aligned} \int \vert \rho *f(r)\vert r^{d-1}\mathrm {d}r&\le \left( \int _{r_0/2}^{5r_0/2}r^{2d-2}\mathrm {d}r \right) ^{1/2}\Vert \rho *f\Vert _{L_2({\mathbb {R}})}\\&\le \left( c_dr_0^{2d-1} \right) ^{1/2} \Vert \rho \Vert _{L_1({\mathbb {R}})}\Vert f\Vert _{L_2({\mathbb {R}})} \le C_{d,m} a r_0^{-m}. \end{aligned}$$

By letting \(r_0\) go to \(\infty \), we see that \(\inf \left\{ \int \vert f(r)\vert r^{d-1}\mathrm {d}r,\ f\in F_a \right\} = 0\). \(\square \)
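The scaling \(\ell ^2 = Cr_0^{2(d+m)-1}\) that drives this proof is easy to verify numerically. The sketch below (with illustrative parameters) projects \(e_m\) onto the span of \(e_0,\dots ,e_{m-1}\) in \(L_2([r_0,2r_0])\) by discrete least squares and compares the squared residual for two values of \(r_0\).

```python
import numpy as np

# Check (illustrative) of the scaling l^2 = C * r0^(2(d+m)-1): the squared
# L2([r0, 2*r0]) distance of e_m(r) = r^(m+d-1) to span{e_0, ..., e_(m-1)}.
d, m = 2, 3

def residual_sq(r0, n=100_000):
    r = np.linspace(r0, 2.0 * r0, n)
    A = np.stack([r**(i + d - 1) for i in range(m)], axis=1)
    b = r**(m + d - 1)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Riemann sum approximating the squared L2 residual
    return ((b - A @ coef) ** 2).sum() * (r[1] - r[0])

ratio = residual_sq(20.0) / residual_sq(10.0)
predicted = 2.0 ** (2 * (d + m) - 1)   # doubling r0 multiplies l^2 by 2^(2(d+m)-1)
```

This growth of \(\ell ^2\) is exactly what makes the normalized function \(f = ag_m/\ell ^2\) arbitrarily cheap in the weighted \(L_1\) cost as \(r_0\rightarrow \infty \).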

Details on Section 5.1

1.1 Optimization of convex functions on Riemannian manifolds

Let \({\mathcal {M}}\) be a complete Riemannian manifold endowed with a metric \(g_{{\mathcal {M}}}\). We write \(\mathrm {Exp}_x\) for the exponential map at \(x\in {\mathcal {M}}\). The geodesic distance is written as \(d_{{\mathcal {M}}}\). We say that a set \(\Omega \subset {\mathcal {M}}\) is geodesically convex if every geodesic joining two points of \(\Omega \) is included in \(\Omega \). We will assume that \(\Omega \) is small enough so that the logarithmic map \(\mathrm {Exp}_x^{-1}\) is defined on \(\Omega \) for every \(x\in \Omega \). We say that a \({\mathcal {C}}^2\) function \(G:\Omega \rightarrow {\mathbb {R}}\) is \(\lambda \)-strongly geodesically convex and \(\beta \)-smooth if \(G\circ \gamma \) is \(\lambda \)-strongly convex and \(\beta \)-smooth for every unit-speed geodesic \(\gamma \) in \(\Omega \). In particular, this implies

$$\begin{aligned} \begin{aligned}&G(y)\ge G(x) + \langle \nabla G(x),\mathrm {Exp}_x^{-1}(y) \rangle + \frac{\lambda }{2}d_{{\mathcal {M}}}(x,y)^2 \\&G(y)\le G(x) + \langle \nabla G(x),\mathrm {Exp}_x^{-1}(y) \rangle + \frac{\beta }{2}d_{{\mathcal {M}}}(x,y)^2 . \end{aligned} \end{aligned}$$
(I.1)

A fundamental result of convex optimization [61] states that a \(\beta \)-smooth and \(\lambda \)-strongly geodesically convex function can be optimized efficiently through gradient descent.

Proposition 28

Assume that \({\mathcal {M}}\) has nonnegative curvature. Let \(G:\Omega \rightarrow {\mathbb {R}}\) be \(\beta \)-smooth and \(\lambda \)-strongly geodesically convex, with minimizer \(x^*\). Assume that \(\Omega \) contains a geodesic ball centered at \(x^*\) of radius \(r_0\). Fix \(x^0\) a point in this geodesic ball and let \(0\le \alpha \le \beta ^{-1}\).

  1. 1.

    The sequence of iterates \(x^{a+1} = \mathrm {Exp}_{x^a}(-\alpha \nabla G(x^a))\) is well-defined for any \(a\in {\mathbb {N}}\).

  2. 2.

    The sequence of iterates satisfies

    $$\begin{aligned} d_{{\mathcal {M}}}(x^a,x^*)^2 \le (1-\lambda \alpha )^a d_{{\mathcal {M}}}(x^0,x^*)^2. \end{aligned}$$
    (I.2)

Such a result is standard, although we could not find it stated in this form in the literature. We provide a short proof here.

Proof

The fact that the sequence of iterates is well-defined follows from \({\mathcal {M}}\) being complete, inequality (I.2), and the fact that \(\Omega \) contains a geodesic ball centered at \(x^*\). It therefore suffices to show (I.2). As the manifold \({\mathcal {M}}\) has nonnegative curvature, we have

$$\begin{aligned} d_{{\mathcal {M}}}(x^{a+1},x^*)^2&\le d_{{\mathcal {M}}}(x^a,x^*)^2 + d_{{\mathcal {M}}}(x^a,x^{a+1})^2 -2 \langle \mathrm {Exp}^{-1}_{x^a}(x^{a+1}),\mathrm {Exp}^{-1}_{x^a}(x^{*}) \rangle \\&\le d_{{\mathcal {M}}}(x^a,x^*)^2 + \alpha ^2 \vert \nabla G(x^a)\vert ^2 + 2\alpha \langle \nabla G(x^a), \mathrm {Exp}^{-1}_{x^a}(x^{*}) \rangle \\&\le (1-\lambda \alpha )d_{{\mathcal {M}}}(x^a,x^*)^2 + \alpha ^2 \vert \nabla G(x^a)\vert ^2 + 2\alpha (G(x^*)-G(x^a)), \end{aligned}$$

where we used (I.1) at the last line. Also, we have by (I.1)

$$\begin{aligned} G(x^*)-G(x^a)&\le G(\mathrm {Exp}_{x^a}(-\beta ^{-1}\nabla G(x^a))) - G(x^a) \\&\le \langle \nabla G(x^a),-\beta ^{-1}\nabla G(x^a) \rangle + \frac{1}{2\beta } \vert \nabla G(x^a)\vert ^2 \le - \frac{1}{2\beta } \vert \nabla G(x^a)\vert ^2, \end{aligned}$$

concluding the proof. \(\square \)
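As a sanity check of Proposition 28 on a concrete nonnegatively curved manifold, the sketch below runs the iteration \(x^{a+1}=\mathrm {Exp}_{x^a}(-\alpha \nabla G(x^a))\) on the unit sphere \(S^2\) for the toy objective \(G(x)=d_{{\mathcal {M}}}(x,x^*)^2/2\), whose Riemannian gradient is \(-\mathrm {Exp}_x^{-1}(x^*)\). All numerical choices (target, initialization, step size) are illustrative.

```python
import numpy as np

# Riemannian gradient descent on S^2 (nonnegative curvature) for the toy
# objective G(x) = dist(x, target)^2 / 2, whose gradient is -Log_x(target).

def exp_map(x, v):
    # Exponential map on the unit sphere
    nv = np.linalg.norm(v)
    if nv < 1e-15:
        return x
    return np.cos(nv) * x + np.sin(nv) * v / nv

def log_map(x, p):
    # Inverse of the exponential map (defined for non-antipodal points)
    theta = np.arccos(np.clip(x @ p, -1.0, 1.0))
    if theta < 1e-15:
        return np.zeros_like(x)
    u = p - (x @ p) * x                  # component of p orthogonal to x
    return theta * u / np.linalg.norm(u)

target = np.array([0.0, 0.0, 1.0])
x = np.array([1.0, 0.0, 0.0])
alpha = 0.5
dists = []
for a in range(15):
    dists.append(np.arccos(np.clip(x @ target, -1.0, 1.0)))
    grad = -log_map(x, target)
    x = exp_map(x, -alpha * grad)
# Here each step moves a fraction alpha along the geodesic, so
# dists[a] = (pi/2) * (1 - alpha)^a, the geometric contraction of (I.2)
```

For this particular objective the contraction is exact; for a general \(\beta \)-smooth, \(\lambda \)-strongly convex G, (I.2) only guarantees the rate \((1-\lambda \alpha )^a\).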

Proposition 18, proven just below, asserts that \(G_m\) is with high probability \(\beta \)-smooth and \(\lambda \)-strongly geodesically convex, with both \(\beta \) and \(\lambda \) of order \(\varepsilon ^2\), on \(\Omega =\{(Q,V)\in {\mathcal {M}}:\ d_{{\mathcal {O}}_*(d,D)}(Q,Q^*)\le \delta \varepsilon ,\ \left\| V \right\| _{\mathrm {op}}\le \ell \}\). Our initialization point is \((Q^0,0)\), where \(Q^0\) is the output of a PCA, which satisfies with high probability \(d_{{\mathcal {O}}_*(d,D)}(Q^0,Q^*)\le c\varepsilon \) for some constant c. The geodesic distance between \((Q^0,0)\) and \((Q^*,V^*)\) is smaller than \(C\varepsilon \) for some larger constant C (using the definition of the metric (5.5)). Hence, for \(\delta \) large enough, \(\Omega \) contains the geodesic ball centered at \((Q^*,V^*)\) of radius \(C\varepsilon \), and we can apply Proposition 28.

Letting \(\alpha =\beta ^{-1}\), the iterates of the gradient descent converge at the rate

$$\begin{aligned} d_{{\mathcal {M}}}((Q^a,V^a),(Q^*,V^*))^2 \le c^a d_{{\mathcal {M}}}((Q^0,V^0),(Q^*,V^*))^2, \end{aligned}$$
(I.3)

where \(c\in (0,1)\) depends on the parameters of the model.
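To make the contraction (I.3) concrete, the following is a minimal numerical sketch in the simplest flat special case \({\mathcal {M}}={\mathbb {R}}^2\), where the exponential map is translation and gradient descent takes its usual form. The function \(G\), the constants \(\lambda ,\beta \), and the iteration count are illustrative choices, not quantities from the model.

```python
# Gradient descent x^{a+1} = Exp_{x^a}(-alpha * grad G(x^a)) in the flat
# case M = R^2, where Exp_x(v) = x + v.  G(x) = (lam*x1^2 + bet*x2^2)/2
# is lam-strongly convex and bet-smooth; with alpha = 1/bet, the squared
# distance to the minimizer x* = 0 contracts geometrically, as in (I.3).
lam, bet = 0.5, 4.0          # illustrative strong convexity / smoothness constants
alpha = 1.0 / bet            # step size used in the text

def grad_G(x):
    return [lam * x[0], bet * x[1]]

def dist2(x):                # squared distance to the minimizer x* = 0
    return x[0] ** 2 + x[1] ** 2

x = [1.0, -1.0]
dists = [dist2(x)]
for _ in range(80):
    g = grad_G(x)
    x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    dists.append(dist2(x))

# Each ratio dist2(x^{a+1}) / dist2(x^a) stays below 1, matching (I.3).
ratios = [b / a for a, b in zip(dists, dists[1:]) if a > 0]
```

The geometric decay of `dists` is the Euclidean shadow of (I.3); on a curved \({\mathcal {M}}\) the update would use the exponential map instead of translation.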

1.2 Convexity of \(G_m\)

We prove in this section Proposition 18. We assume without loss of generality that \(\delta \ge 1/(2\tau _{\min })\) and that \(\delta \le \ell \). Fix \((Q,V)\in \Omega \) and let \((B,W)\in T_{(Q,V)}{\mathcal {M}}\) be a tangent vector with unit norm. Write U for the vector space spanned by the first d columns of Q. The exponential map \(\mathrm {Exp}_{(Q,V)}\) on \({\mathcal {M}}\) is given by

$$\begin{aligned} \mathrm {Exp}_{(Q,V)}(B,W) = \left( Q\exp \begin{pmatrix} 0 &{} -B^\top \\ B &{} 0 \end{pmatrix}, V+W \right) . \end{aligned}$$
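As a small sanity check of the first factor of this exponential map (with arbitrary illustrative dimensions \(D=4\), \(d=2\)): since the block matrix is skew-symmetric, its matrix exponential is orthogonal, so the curve \(t\mapsto Q^t\) stays in the orthogonal group. The truncated Taylor series below is only an implementation shortcut for the matrix exponential.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 4, 2                       # illustrative dimensions

def expm(M, terms=30):
    # Truncated Taylor series exp(M) = sum_k M^k / k!, adequate for small ||M||.
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

Q, _ = np.linalg.qr(rng.standard_normal((D, D)))   # an orthogonal base point
B = 0.1 * rng.standard_normal((D - d, d))          # tangent direction (small)

# A = [[0, -B^T], [B, 0]] is skew-symmetric, hence exp(A) is orthogonal.
A = np.zeros((D, D))
A[d:, :d] = B
A[:d, d:] = -B.T
Q1 = Q @ expm(A)                                   # first factor of Exp_{(Q,V)}(B, W)

orthogonality_error = np.max(np.abs(Q1.T @ Q1 - np.eye(D)))
```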

Introduce the function \(F_x:t\mapsto G_{m,x}(\mathrm {Exp}_{(Q,V)}(tB,tW))\). We denote by \({\mathbb {E}}_N\) the expectation with respect to the empirical distribution associated with \(X_1,\dots ,X_N\), so that \({\mathbb {E}}_N F_X = \frac{1}{N}\sum _{i=1}^N F_{X_i}\). To show that \(G_m\) is geodesically \(\lambda \)-strongly convex and \(\beta \)-smooth on \(\Omega \), it suffices to show that

$$\begin{aligned} \lambda \le \frac{d^2}{dt^2} {\mathbb {E}}_N F_X(t)_{\vert t=0} \le \beta . \end{aligned}$$

To simplify the notation, write \((Q^t,V^t)= \mathrm {Exp}_{(Q,V)}(tB,tW)\), and let \(Q^t= (e_1^t \cdots e_D^t)\). We will also write \(\dot{a}\) (resp. \(\ddot{a}\)) for the first (resp. second) time derivative of a function a evaluated at 0. Let \({\mathbb {V}}_j^t(x) =\iota _j(Q^t,V^t)[x^{\otimes j}]\) and let \({\mathbb {V}}= \sum _{j=2}^{m-1}{\mathbb {V}}_j\). Remark that

$$\begin{aligned} F_x = \frac{1}{2}\left( \vert x\vert ^2 -\sum _{k=1}^d \langle e_k,x \rangle ^2 + \vert {\mathbb {V}}(x)\vert ^2 - 2\langle x,{\mathbb {V}}(x) \rangle \right) . \end{aligned}$$
(I.4)

One can directly compute

$$\begin{aligned} \begin{aligned}&{\ddot{F}}_x = -\sum _{k=1}^d \left( \langle \ddot{e}_k,x \rangle \langle e_k,x \rangle + \langle \dot{e}_k,x \rangle ^2 \right) + \langle {{\ddot{{\mathbb {V}}}}}(x) ,{\mathbb {V}}(x)-x \rangle +\vert {\dot{{\mathbb {V}}}}(x)\vert ^2 \\&{\mathbb {E}}_N {\ddot{F}}_X = \frac{1}{N}\sum _{i=1}^N {\ddot{F}}_{X_i}. \end{aligned} \end{aligned}$$
(I.5)

Also, we have

$$\begin{aligned} \begin{aligned}&\dot{Q}= (\dot{e}_1 \cdots \dot{e}_D) = Q\begin{pmatrix} 0 &{} -B^\top \\ B &{} 0 \end{pmatrix} = (Q_{[d,D]}B\ \vert -Q_{[d]}B^\top ) \\&{\ddot{Q}}= (\ddot{e}_1 \cdots \ddot{e}_D) =Q\begin{pmatrix} 0 &{} -B^\top \\ B &{} 0 \end{pmatrix}^2 = -(Q_{[d]}B^\top B \ \vert \ Q_{[d,D]}B B^\top ). \end{aligned} \end{aligned}$$
(I.6)

Note that (I.6) yields the following identities: for \(1\le k,l\le D\),

$$\begin{aligned} {\left\{ \begin{array}{ll} \langle \dot{e}_k, e_l \rangle = -\langle e_k,\dot{e}_l \rangle \text { and } \langle \ddot{e}_k,e_l \rangle = \langle e_k,\ddot{e}_l \rangle ,\\ \text {for }k\le d, \dot{e}_k \in U^\bot \text { and }\ddot{e}_k \in U,\\ \text {for }k>d, \dot{e}_k \in U\text { and }\ddot{e}_k \in U^\bot . \end{array}\right. } \end{aligned}$$
(I.7)

We let \({\tilde{x}} = Q_{[d]}^\top x\in {\mathbb {R}}^d\). Also, we insist on the distinction between the tensor \({\mathbb {V}}_j\) (which is a tensor from \({\mathbb {R}}^D\) to \({\mathbb {R}}^D\)) and the tensor \(V_j\) (which is a tensor from \({\mathbb {R}}^d\) to \({\mathbb {R}}^{D-d}\)). The two are related by the identity \({\mathbb {V}}_j(x) = Q_{[d,D]}V_j[{\tilde{x}}^{\otimes j}]\). We will also write \({\mathbb {W}}_j\) for the tensor given by \({\mathbb {W}}_j=\iota _j(Q,W_j)\) and let \({\mathbb {W}}= \sum _{j=2}^{m-1}{\mathbb {W}}_j\). We write \(B=u\Sigma v^\top \) for the SVD of B, with u (resp. v) a \((D-d)\times d\) (resp. \(d\times d\)) matrix with orthonormal columns \(u_k\) (resp. \(v_k\)) and \(\Sigma \) a \(d\times d\) diagonal matrix with nonnegative entries \(\sigma _1,\dots ,\sigma _d\). In particular, we have \(\vert B \vert ^2 = \sum _{k=1}^d \sigma _k^2\). We will use the following fact.

Lemma 29

Let \(a=(a_{d+1},\dots ,a_{D})\in {\mathbb {R}}^{D-d}\). Then,

$$\begin{aligned} \vert \sum _{k=d+1}^D \dot{e}_k a_k \vert \le \vert a \vert \vert B \vert . \end{aligned}$$
(I.8)

Proof

We have \(\sum _{k=d+1}^D \dot{e}_k a_k =\dot{Q}_{[d,D]} a = Q_{[d]}v\Sigma u^\top a\). As \(Q_{[d]}\) and v have orthonormal columns, the squared norm of this vector is equal to the squared norm of \(\Sigma u^\top a\), which is equal to \(\sum _{k=1}^d \sigma _k^2 \langle u_k,a \rangle ^2 \le \vert a \vert ^2 \vert B \vert ^2\), as each \(u_k\) is of norm 1. \(\square \)
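Lemma 29 can be tested numerically. By (I.6), the last \(D-d\) columns of \(\dot{Q}\) are \(-Q_{[d]}B^\top \), so the vector \(\sum _k \dot{e}_k a_k\) is \(-Q_{[d]}B^\top a\) up to sign; the dimensions and random data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 6, 2                                   # illustrative dimensions

Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
Qd = Q[:, :d]                                 # Q_{[d]}: first d columns of Q
B = rng.standard_normal((D - d, d))           # tangent direction
a = rng.standard_normal(D - d)

# From (I.6): Qdot_{[d,D]} = -Q_{[d]} B^T, so sum_k edot_k a_k = -Q_{[d]} B^T a.
lhs = np.linalg.norm(Qd @ B.T @ a)            # |sum_k edot_k a_k|
rhs = np.linalg.norm(a) * np.linalg.norm(B)   # |a| |B|  (Frobenius norm)
```

The inequality `lhs <= rhs` is exactly (I.8); it holds because \(Q_{[d]}\) preserves norms and \(\vert B^\top a\vert \le \vert B\vert \,\vert a\vert \).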

1.2.1 Step 1

We first give bounds on \(-\sum _{k=1}^d \left( \langle \ddot{e}_k,x \rangle \langle e_k,x \rangle + \langle \dot{e}_k,x \rangle ^2 \right) \). First, we show that the quantity \(\vert \langle \dot{e}_k,x \rangle \vert \) is negligible.

Lemma 30

For \(1\le k \le d\), we have \(\vert \langle \dot{e}_k,x \rangle \vert \le 2\vert \dot{e}_k\vert \varepsilon ^2\delta \).

Proof

Let \(x\in M\) with \(\vert x\vert \le \varepsilon \). Recall that U is the vector space spanned by \(Q_{[d]}\). It holds that

$$\begin{aligned} \vert \langle \dot{e}_k,x \rangle \vert&= \vert \langle \dot{e}_k,\pi _U^\bot (x) \rangle \vert \le \vert \dot{e}_k\vert (\vert (\pi _U^\bot -\pi _{T_0 M}^\bot )(x)\vert + \vert \pi _{T_0 M}^\bot (x)\vert )\\&\le \vert \dot{e}_k\vert (r\varepsilon + \varepsilon ^2/(2\tau _{\min })). \end{aligned}$$

The fact that \(\delta \ge 1/(2\tau _{\min })\) and that \(r =\delta \varepsilon \) gives the conclusion. \(\square \)

Lemma 30 implies that \(\sum _{k=1}^d \langle \dot{e}_k,x \rangle ^2 \le 4\delta ^2\vert B\vert ^2 \varepsilon ^4\le 4\delta ^2 \varepsilon ^4\). Also, we have

$$\begin{aligned} -\sum _{k=1}^d \langle \ddot{e}_k,x \rangle \langle e_k,x \rangle&= -\sum _{k=1}^d x^\top \ddot{e}_k e_k^\top x = -x^\top {\ddot{Q}}_{[d]}Q_{[d]}^\top x \\&= x^\top Q_{[d]}B^\top B Q_{[d]}^\top x =\vert B {\tilde{x}}\vert ^2. \end{aligned}$$

Therefore, we may lower bound the first term in the expression of \({\ddot{F}}\):

$$\begin{aligned} \begin{aligned}&-{\mathbb {E}}_N \sum _{k=1}^d \left( \langle \ddot{e}_k,X \rangle \langle e_k,X \rangle + \langle \dot{e}_k,X \rangle ^2 \right) \ge {\mathbb {E}}_N \vert B {\tilde{X}}\vert ^2 -4\delta ^2\varepsilon ^4 \end{aligned} \end{aligned}$$
(I.9)

Also, as \(\vert \langle e_k,X_i \rangle \vert \le \varepsilon \) and as \((B,W)\) is of norm 1, we have the upper bound

$$\begin{aligned} -{\mathbb {E}}_N\sum _{k=1}^d \left( \langle \ddot{e}_k,X \rangle \langle e_k,X \rangle + \langle \dot{e}_k,X \rangle ^2 \right) \le c_d\varepsilon ^2\vert B\vert ^2 \le c_d\varepsilon ^2. \end{aligned}$$
(I.10)
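The key identity of Step 1, \(-\sum _{k=1}^d \langle \ddot{e}_k,x \rangle \langle e_k,x \rangle = \vert B{\tilde{x}}\vert ^2\), can also be verified numerically from (I.6); the dimensions and random data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 5, 2                            # illustrative dimensions

Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
Qd = Q[:, :d]                          # Q_{[d]}
B = rng.standard_normal((D - d, d))
x = rng.standard_normal(D)

Qdd_d = -Qd @ B.T @ B                  # ddot{Q}_{[d]} = -Q_{[d]} B^T B, from (I.6)
x_tilde = Qd.T @ x                     # tilde{x} = Q_{[d]}^T x
lhs = -x @ Qdd_d @ Qd.T @ x            # -sum_k <eddot_k, x><e_k, x>
rhs = np.linalg.norm(B @ x_tilde) ** 2 # |B tilde{x}|^2
```

The two quantities agree to machine precision, since \(-x^\top \ddot{Q}_{[d]}Q_{[d]}^\top x = x^\top Q_{[d]}B^\top B Q_{[d]}^\top x = \vert B{\tilde{x}}\vert ^2\).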

1.2.2 Step 2

One can compute

$$\begin{aligned} {\dot{{\mathbb {V}}}}&= \sum _{j=2}^{m-1}\sum _{k=d+1}^D \Bigg ( \dot{e}_k \sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} e_{i_1}\otimes \cdots \otimes e_{i_j} \\&\qquad + e_k \sum _{1\le i_1\le \dots \le i_j\le d} W_{j,k}^{i_1,\dots ,i_j} e_{i_1}\otimes \cdots \otimes e_{i_j}\\&\qquad + e_k \sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j e_{i_1}\otimes \cdots \dot{e}_{i_a}\cdots \otimes e_{i_j} \Bigg ) \end{aligned}$$

Let us lower bound \(\vert {\dot{{\mathbb {V}}}}(x)\vert ^2\). As \(\dot{e}_k \in U\) and \(e_k\in U^\bot \) for \(d+1\le k \le D\),

$$\begin{aligned} \vert {\dot{{\mathbb {V}}}}(x)\vert ^2&\ge \sum _{k=d+1}^D \Bigg (\sum _{j=2}^{m-1} \sum _{1\le i_1\le \dots \le i_j\le d} W_{j,k}^{i_1,\dots ,i_j} \prod _{c=1}^j \langle e_{i_c},x \rangle \\&\qquad +\sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j \langle \dot{e}_{i_a},x \rangle \prod _{c\ne a} \langle e_{i_c},x \rangle \Bigg )^2 \\&= \sum _{k=d+1}^D (A_{1,k}+A_{2,k})^2. \end{aligned}$$

We lower bound \((A_{1,k}+A_{2,k})^2\) by \(A_{1,k}^2 - 2\vert A_{1,k}\vert \vert A_{2,k}\vert \). Notice first that

$$\begin{aligned} \langle \dot{e}_{i_a},x \rangle = \sum _{f=d+1}^D \langle \dot{e}_{i_a},e_f \rangle \langle e_f,x \rangle = -\sum _{f=d+1}^D \langle e_{i_a},\dot{e}_f \rangle \langle e_f,x \rangle = \langle e_{i_a}, z \rangle , \end{aligned}$$
(I.11)

where \(z = -\sum _{f=d+1}^D \dot{e}_f \langle e_f,x \rangle = -\dot{Q}_{[d,D]} Q_{[d,D]}^\top x = Q_{[d]}B^\top Q_{[d,D]}^\top x\). We have \(\vert z\vert ^2 = \sum _{k=1}^d \sigma _k^2 \vert {\tilde{y}}_k\vert ^2\), where \({\tilde{y}}_k\) is the kth entry of the vector \(\tilde{y} = u^\top Q_{[d,D]}^\top x\in {\mathbb {R}}^d\), which is equal to \(\langle x,Q_{[d,D]}u_k \rangle \). As \(Q_{[d,D]}u_k\in U^\bot \) and \(u_k\) is of unit norm, we have \(\vert \tilde{y}_k \vert \le c \delta \varepsilon ^2\) by the same argument as in Lemma 30. Therefore, \(\vert z \vert \le c \delta \vert B \vert \varepsilon ^2\). Write \(\tilde{z}=Q_{[d]}^\top z\in {\mathbb {R}}^d\). This implies

$$\begin{aligned}&\left( \sum _{k=d+1}^D \vert A_{2,k}\vert ^2 \right) ^{1/2} \nonumber \\&\quad = \left( \sum _{k=d+1}^D \left( \sum _{j=2}^{m-1}\sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j \langle e_{i_a},z \rangle \prod _{c\ne a} \langle e_{i_c},x \rangle \right) ^2 \right) ^{1/2} \nonumber \\&\quad \le \sum _{j=2}^{m-1}\left( \sum _{k=d+1}^D V_{j,k}[\tilde{z}, \tilde{x}^{\otimes (j-1)}]^2 \right) ^{1/2} = \sum _{j=2}^{m-1} \vert V_{j}[\tilde{z},\tilde{x}^{\otimes (j-1)}]\vert \nonumber \\&\quad \le \ell \sum _{j=2}^{m-1} \varepsilon ^{j-1} \vert z\vert \le C_{d,m}\delta (\ell \varepsilon ) \varepsilon ^{2} \vert B\vert , \end{aligned}$$
(I.12)

Also, we have

$$\begin{aligned} \vert A_{1,k}\vert \le \sum _{j=2}^{m-1}\varepsilon ^j \sum _{1\le i_1\le \dots \le i_j\le d} \vert W_{j,k}^{i_1,\dots ,i_j}\vert \le C_{d,m}\sum _{j=2}^{m-1}\varepsilon ^j \vert W_{j,k}\vert , \end{aligned}$$

so that

$$\begin{aligned}&\sum _{k=d+1}^D \vert A_{1,k}\vert \vert A_{2,k}\vert \le C_{d,m}\left( \sum _{k=d+1}^D \left( \sum _{j=2}^{m-1}\varepsilon ^j \vert W_{j,k}\vert \right) ^2 \right) ^{1/2} \left( \sum _{k=d+1}^D \vert A_{2,k}\vert ^2 \right) ^{1/2} \\&\quad \le C'_{d,m} \sum _{j=2}^{m-1}\varepsilon ^j \vert W_{j}\vert \delta (\ell \varepsilon )\varepsilon ^2 \vert B\vert \le C''_{d,m}\delta (\ell \varepsilon )(\varepsilon ^2 \vert B\vert ^2 + \sum _{j=2}^{m-1}\varepsilon ^{2j+2} \vert W_{j}\vert ^2) \\&\quad \le C''_{d,m}\delta (\ell \varepsilon ) \varepsilon ^2, \end{aligned}$$

where we used in the last line that \(\vert B\vert ^2 + \sum _{j=2}^{m-1}\varepsilon ^{2(j-1)}\vert W_j\vert ^2\) is the squared norm of the vector \((B,W)\), which is equal to 1 by assumption. As \(\sum _{k=d+1}^D A_{1,k}^2 = \vert {\mathbb {W}}(x)\vert ^2\), we obtain that

$$\begin{aligned} {\mathbb {E}}_N \vert {\dot{{\mathbb {V}}}}(X)\vert ^2 \ge {\mathbb {E}}_N \vert {\mathbb {W}}(X)\vert ^2 - C_{d,m}\delta (\ell \varepsilon )\varepsilon ^2. \end{aligned}$$
(I.13)

Let us now upper bound \(\vert {\dot{{\mathbb {V}}}}(x)\vert \). We have \(\vert \sum _{k=d+1}^D \dot{e}_k V_{j,k}[\tilde{x}^{\otimes j}]\vert ^2 = \vert \dot{Q}_{[d,D]} V_j[\tilde{x}^{\otimes j}]\vert ^2\) (where \(V_j[\tilde{x}^{\otimes j}]\) is the vector in \({\mathbb {R}}^{D-d}\) with entries \(V_{j,k}[\tilde{x}^{\otimes j}]\)). Therefore,

$$\begin{aligned}&\vert \sum _{k=d+1}^D \dot{e}_k V_{j,k}[\tilde{x}^{\otimes j}]\vert ^2 = \vert Q_{[d]}B^\top V_j[\tilde{x}^{\otimes j}]\vert ^2 = \vert Q_{[d]}v\Sigma u^\top V_j[\tilde{x}^{\otimes j}]\vert ^2 \\&\quad = \sum _{l=1}^d \sigma _l^2 \langle u_l,V_{j}[\tilde{x}^{\otimes j}] \rangle ^2 \le \vert B \vert ^2 \ell ^2 \varepsilon ^{2j} \le \vert B \vert ^2 \ell ^2 \varepsilon ^{4}, \end{aligned}$$

where we used that each \(u_l\) is of norm 1. We therefore obtain the upper bound (recalling that \(\ell \le c \varepsilon ^{-1}\) for a certain constant c)

$$\begin{aligned}&{\mathbb {E}}_N\vert {\dot{{\mathbb {V}}}}(X)\vert ^2 \le C_{d,m} \sum _{j=2}^{m-1}{\mathbb {E}}_N \vert \sum _{k=d+1}^D \dot{e}_k V_{j,k}[\tilde{X}^{\otimes j}]\vert ^2 + 2\sum _{k=d+1}^D( A_{1,k}^2 + A_{2,k}^2)\nonumber \\&\quad \le C'_{d,m} ( \vert B \vert ^2 \ell ^2 \varepsilon ^{4} + \sum _{j=2}^{m-1}\varepsilon ^{2j}\vert W_j\vert ^2 + \delta ^2 (\ell \varepsilon )^2 \varepsilon ^4 \vert B \vert ^2) \le C''_{d,m}\varepsilon ^2, \end{aligned}$$
(I.14)

where we used that \((B,W)\) is of norm 1.

1.2.3 Step 3

Finally, we upper bound \(\vert {{\ddot{{\mathbb {V}}}}}(x)\vert \). We first compute

$$\begin{aligned} {{\ddot{{\mathbb {V}}}}}&= \sum _{j=2}^{m-1}\sum _{k=d+1}^D \Bigg ( \ddot{e}_k \sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} e_{i_1}\otimes \cdots \otimes e_{i_j} \\&\quad + 2\dot{e}_k \sum _{1\le i_1\le \dots \le i_j\le d} W_{j,k}^{i_1,\dots ,i_j} e_{i_1}\otimes \cdots \otimes e_{i_j} \\&\quad + 2 \dot{e}_k \sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j e_{i_1}\otimes \cdots \dot{e}_{i_a}\cdots \otimes e_{i_j} \\&\quad + 2e_k \sum _{1\le i_1\le \dots \le i_j\le d} W_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j e_{i_1}\otimes \cdots \dot{e}_{i_a}\cdots \otimes e_{i_j} \\&\quad + 2e_k \sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j \sum _{b> a} e_{i_1}\otimes \cdots \dot{e}_{i_a} \otimes \dot{e}_{i_b} \cdots \otimes e_{i_j} \\&\quad + e_k \sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j e_{i_1}\otimes \cdots \ddot{e}_{i_a}\cdots \otimes e_{i_j} \Bigg )\\&= A_3 + A_4+A_5+A_6 + A_7+A_8. \end{aligned}$$
  • Bound on \(A_3\). We have

    $$\begin{aligned} \vert A_3(x)\vert \le \sum _{j=2}^{m-1}\left( \sum _{k=d+1}^D \vert \ddot{e}_k\vert ^2 \right) ^{1/2}\vert {\mathbb {V}}_j(x)\vert \le C_{d,m}\ell \vert B\vert ^2 \varepsilon ^2\le C_{d,m}\ell \varepsilon ^2. \end{aligned}$$
  • Bound on \(A_4\). By Lemma 29 applied to \(a= W_j[\tilde{x}^{\otimes j}]\in {\mathbb {R}}^{D-d}\), we have

    $$\begin{aligned} \vert A_4(x) \vert&\le 2 \vert B \vert \sum _{j=2}^{m-1}\left( \sum _{k=d+1}^D W_{j,k}[\tilde{x}^{\otimes j}]^2 \right) ^{1/2} \le C_{d,m}\sum _{j=2}^{m-1}\varepsilon ^j \vert B \vert \vert W_j \vert \\&\le C'_{d,m} \varepsilon (\vert B \vert ^2 + \sum _{j=2}^{m-1} \varepsilon ^{2(j-1)}\vert W_j\vert ^2 ) \le C'_{d,m} \varepsilon . \end{aligned}$$

    Note also that

    $$\begin{aligned} A_4(x) = \sum _{j=2}^{m-1}2\dot{Q}_{[d,D]}W_j[\tilde{x}^{\otimes j}]= -2 Q_{[d]}B^\top \sum _{j=2}^{m-1} W_j[\tilde{x}^{\otimes j}]. \end{aligned}$$
  • Bound on \(A_5\). By Lemma 29 applied to \(a= (A_{2,1},\dots ,A_{2,D-d})\in {\mathbb {R}}^{D-d}\) (where the \(A_{2,k}\)s were introduced in Step 2) and by (I.12), we have

    $$\begin{aligned} \vert A_5(x) \vert \le 2 \vert B \vert \left( \sum _{k=d+1}^D A_{2,k}^2 \right) ^{1/2}\le C_{d,m}\delta (\ell \varepsilon )\varepsilon ^2 \vert B\vert ^2\le C_{d,m}\delta (\ell \varepsilon )\varepsilon ^2. \end{aligned}$$
  • Bound on \(A_6\). The quantity \(\vert A_6(x)\vert \) is smaller than

    $$\begin{aligned}&2\left( \sum _{j=2}^{m-1} \sum _{k=d+1}^D \left( \sum _{1\le i_1\le \dots \le i_j\le d} W_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j \langle \dot{e}_{i_a},x \rangle \prod _{c\ne a} \langle e_{i_c},x \rangle \right) ^2 \right) ^{1/2}\\&\quad \le 2\left( \sum _{j=2}^{m-1}\sum _{k=d+1}^D \vert W_{j,k}\vert ^2\left( \sum _{1\le i_1\le \dots \le i_j\le d} \left( \sum _{a=1}^j \langle \dot{e}_{i_a},x \rangle \prod _{c\ne a} \langle e_{i_c},x \rangle \right) ^2 \right) \right) ^{1/2} \\&\quad \le C_{d,m}\sum _{j=2}^{m-1}\varepsilon ^{j-1}\left( \sum _{k=d+1}^D \vert W_{j,k}\vert ^2 \sum _{l=1}^d\langle \dot{e}_{l},x \rangle ^2 \right) ^{1/2} \\&\quad \le C'_{d,m}\sum _{j=2}^{m-1}\varepsilon ^{j-1}\left( \sum _{k=d+1}^D \vert W_{j,k}\vert ^2 \delta ^2\varepsilon ^4\sum _{l=1}^d\vert \dot{e}_{l}\vert ^2 \right) ^{1/2} \text { using Lemma}~30\\&\quad \le C'_{d,m}\delta \sum _{j=2}^{m-1}\varepsilon ^{j+1}\vert B\vert \vert W_j\vert \le C'_{d,m}\delta \varepsilon ^2(\vert B\vert ^2 +\sum _{j=2}^{m-1} \varepsilon ^{2(j-1)}\vert W_j\vert ^2) \\&\le C'_{d,m}\delta \varepsilon ^2. \end{aligned}$$
  • Bound on \(A_7\). Using (I.11), we obtain that \(\vert A_7(x)\vert \) is smaller than

$$\begin{aligned}&2\left| \sum _{j=2}^{m-1} \sum _{k=d+1}^D e_k \sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j \sum _{b> a} \langle e_{i_a},z \rangle \langle e_{i_b},z \rangle \prod _{c\ne a,b} \langle e_{i_c},x \rangle \right| \\&\quad \le C_{d,m} \left| \sum _{j=2}^{m-1} V_j[\tilde{z},\tilde{z},\tilde{x}^{\otimes (j-2)}]\right| \le C_{d,m}\ell \vert z\vert ^2 \le C'_{d,m}\delta ^2 \ell \varepsilon ^{4}\vert B\vert ^2 \le C'_{d,m}\delta ^2 \ell \varepsilon ^{4}. \end{aligned}$$
  • Bound on \(A_8\). We have

    $$\begin{aligned} \langle \ddot{e}_{i_a},x \rangle = \sum _{f=1}^d \langle \ddot{e}_{i_a},e_f \rangle \langle e_f,x \rangle = \sum _{f=1}^d \langle e_{i_a},\ddot{e}_f \rangle \langle e_f,x \rangle = \langle e_{i_a}, y \rangle , \end{aligned}$$

    where \(y= \sum _{f=1}^d \langle e_f,x \rangle \ddot{e}_f\). In particular, \(\vert y\vert \le \varepsilon \sum _{f=1}^d \vert \ddot{e}_f\vert \le c_d \varepsilon \vert B\vert ^2\). Therefore, letting \(\tilde{y} =Q_{[d]}^\top y\),

    $$\begin{aligned} \vert A_8(x)\vert&= \left| \sum _{j=2}^{m-1} \sum _{k=d+1}^D e_k \sum _{1\le i_1\le \dots \le i_j\le d} V_{j,k}^{i_1,\dots ,i_j} \sum _{a=1}^j \langle y, e_{i_a} \rangle \prod _{c\ne a} \langle e_{i_c},x \rangle \right| \\&\le C_{d,m} \vert \sum _{j=2}^{m-1} V_j[\tilde{y},\tilde{x}^{\otimes (j-1)}]\vert \le C_{d,m} \ell \varepsilon ^2 \vert B\vert ^2\le C_{d,m} \ell \varepsilon ^2. \end{aligned}$$

Putting the different terms together, and recalling that \(\ell \ge \delta \), we obtain that \({{\ddot{{\mathbb {V}}}}}(x) =-2 Q_{[d]}B^\top \sum _{j=2}^{m-1}W_j[\tilde{x}^{\otimes j}] + R\), where R is a remainder term of norm smaller than \(C_{d,m}\ell \varepsilon ^2\). Also, we have \(\vert \langle A_4(x),{\mathbb {V}}(x) \rangle \vert \le C_{d,m}(\ell \varepsilon )\varepsilon ^2\). We may therefore write

$$\begin{aligned} \begin{aligned} \langle {{\ddot{{\mathbb {V}}}}}(x), {\mathbb {V}}(x)-x \rangle&= 2 x^\top Q_{[d]}B^\top \sum _{j=2}^{m-1} W_j[\tilde{x}^{\otimes j}] + R'\\&=2 (B{\tilde{x}})^\top \sum _{j=2}^{m-1} W_j[\tilde{x}^{\otimes j}] + R', \end{aligned} \end{aligned}$$
(I.15)

where \(R'\) has norm smaller than \(C'_{d,m}(\ell \varepsilon )\varepsilon ^2\).

1.2.4 Step 4

Putting the lower bounds (I.9) and (I.13) together with identity (I.15), we obtain the lower bound

$$\begin{aligned} {\mathbb {E}}_N{\ddot{F}}_X&\ge {\mathbb {E}}_N \vert B {\tilde{X}} \vert ^2 + {\mathbb {E}}_N \vert \sum _{j=2}^{m-1}W_j[{\tilde{X}}^{\otimes j}]\vert ^2 + 2\sum _{j=2}^{m-1}{\mathbb {E}}_N (B{\tilde{X}})^\top W_j[{\tilde{X}}^{\otimes j}] \nonumber \\&\quad -C_{d,m}(\ell \varepsilon )\varepsilon ^2 \nonumber \\&\ge {\mathbb {E}}_N\vert B{\tilde{X}} + \sum _{j=2}^{m-1}W_j[{\tilde{X}}^{\otimes j}]\vert ^2 -C_{d,m}(\ell \varepsilon )\varepsilon ^2. \end{aligned}$$
(I.16)

Let us now lower bound the quantity \({\mathbb {E}}\vert B{\tilde{X}} + \sum _{j=2}^{m-1}W_j[{\tilde{X}}^{\otimes j}]\vert ^2\), where we take the expectation with respect to the density f of the sample \(X_1,\dots ,X_N\). Letting \(Y= {\tilde{X}}/\varepsilon \) and \(Z_j = \varepsilon ^{j-1}W_j\), we have

$$\begin{aligned} {\mathbb {E}}\vert B{\tilde{X}} + \sum _{j=2}^{m-1}W_j[{\tilde{X}}^{\otimes j}]\vert ^2&= \varepsilon ^2 {\mathbb {E}}\vert BY + \sum _{j=2}^{m-1} Z_j[Y^{\otimes j}]\vert ^2, \end{aligned}$$

where \(\vert B\vert ^2 + \sum _{j=2}^{m-1}\vert Z_j\vert ^2=1\). We may decompose this expectation as

$$\begin{aligned} \varepsilon ^2\sum _{k=d+1}^D {\mathbb {E}}( B_k^\top Y + \sum _{j=2}^{m-1} Z_{j,k}[Y^{\otimes j}])^2. \end{aligned}$$

We show in the next lemma that each term in the sum is larger than \(c_{d,m}f_{\min }(\vert B_k\vert ^2 + \sum _{j=2}^{m-1}\vert Z_{j,k}\vert ^2)\). By summing over k, we obtain that the expectation is larger than \(c_{d,m}f_{\min }\varepsilon ^2\).

Lemma 31

Let \(S_j\) be a j-tensor from \({\mathbb {R}}^d\) to \({\mathbb {R}}\) for each \(j=1,\dots ,m-1\). Then,

$$\begin{aligned} {\mathbb {E}}\left[ \left( \sum _{j=1}^{m-1} S_j[Y^{\otimes j}] \right) ^2\right] \ge f_{\min }c_{d,m} \sum _{j=1}^{m-1}\vert S_j\vert ^2. \end{aligned}$$
(I.17)

Proof

The random variable \({\tilde{X}}\) has entries \(\langle e_k,X \rangle \) for \(1\le k \le d\). As \((e_1,\dots ,e_d)\) is an orthonormal basis of U that is \(\delta \varepsilon \)-close to \(T_0 M\), the random variable \({\tilde{X}}\) has a density lower bounded by \(f_{\min }/2\) on its support, and this support contains \({\mathcal {B}}(0,\varepsilon /2)\). Therefore, the expectation with respect to Y is larger than

$$\begin{aligned} f_{\min }\int _{{\mathcal {B}}(0,1/2)} \left( \sum _{j=1}^{m-1} S_j[y^{\otimes j}] \right) ^2 \mathrm {d}y. \end{aligned}$$

We may also write \(\sum _{j=1}^{m-1} S_j[y^{\otimes j}]\) as \(\langle {\mathbf {S}},{\mathbf {y}} \rangle \), where \({\mathbf {S}}\) and \({\mathbf {y}}\) are vectors indexed by \(\sigma \in \bigcup _{j=1}^{m-1} \{1,\dots ,d\}^j\), with the entries corresponding to \(\sigma =(i_1,\dots ,i_j)\) given by \({\mathbf {S}}_\sigma = S_j^{i_1,\dots ,i_j}\) and \({\mathbf {y}}_\sigma = \prod _{a=1}^j y_{i_a}\). Therefore, this integral is exactly equal to \({\mathbf {S}}^\top {\mathbf {C}} {\mathbf {S}}\), where \({\mathbf {C}}\) is the matrix with entries \({\mathbf {C}}_{\sigma ,\sigma '} = \int _{{\mathcal {B}}(0,1/2)} {\mathbf {y}}_\sigma {\mathbf {y}}_{\sigma '}\mathrm {d}y\). To conclude, we need to show that this matrix is positive definite. This follows from \({\mathbf {C}}\) being a Gram matrix for the \(L_2\) dot product on \({\mathcal {B}}(0,1/2)\) associated with the \(L_2\) functions \(y\mapsto {\mathbf {y}}_\sigma \) that are linearly independent. Therefore, we have \({\mathbf {S}}^\top {\mathbf {C}} {\mathbf {S}} \ge c_{d,m} \vert {\mathbf {S}}\vert ^2 = c_{d,m}\sum _{j=1}^{m-1}\vert S_j\vert ^2\). \(\square \)
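The Gram-matrix argument in the proof of Lemma 31 can be illustrated numerically: for illustrative choices \(d=2\) and \(m-1=2\), the feature vector \({\mathbf {y}}\) collects the monomials of degree 1 and 2, and a Monte Carlo estimate of \({\mathbf {C}}\) over \({\mathcal {B}}(0,1/2)\) has a strictly positive smallest eigenvalue, reflecting the linear independence of the monomials.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 2                                 # illustrative: monomials of degree 1 and 2 in R^2

def features(y):
    # The vector y_bold: all monomials y_{i1}...y_{ij} for 1 <= i1 <= ... <= ij <= d
    y1, y2 = y
    return np.array([y1, y2, y1 * y1, y1 * y2, y2 * y2])

# Monte Carlo estimate of C_{sigma,sigma'} = int_{B(0,1/2)} y_sigma y_sigma' dy
# (up to the volume normalization, which does not affect positive definiteness).
samples = []
while len(samples) < 20000:
    y = rng.uniform(-0.5, 0.5, size=d)
    if np.linalg.norm(y) <= 0.5:      # rejection sampling: keep points in B(0, 1/2)
        samples.append(features(y))
F = np.array(samples)
C = F.T @ F / len(F)                  # estimated Gram matrix

min_eig = np.linalg.eigvalsh(C).min() # strictly positive: C is positive definite
```

Since \({\mathbf {C}}\) is the Gram matrix of linearly independent \(L_2\) functions, its smallest eigenvalue is a positive constant \(c_{d,m}\), which is exactly the constant appearing in (I.17).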

Finally, by Hoeffding’s inequality, with probability at least \(1-\exp (-cN)\), the empirical expectation \({\mathbb {E}}_N\vert B{\tilde{X}} + \sum _{j=2}^{m-1}W_j[{\tilde{X}}^{\otimes j}]\vert ^2\) is larger than \(c_{d,m}f_{\min }\varepsilon ^2/2\). From (I.16), we obtain a lower bound of order \(\varepsilon ^2\) by choosing \(\ell \le c\varepsilon \) for c small enough.

The upper bound, also of order \(\varepsilon ^2\), is obtained by gathering the upper bounds (I.10) and (I.14) from Steps 1 and 2, together with the identity (I.15).


Cite this article

Divol, V. Measure estimation on manifolds: an optimal transport approach. Probab. Theory Relat. Fields 183, 581–647 (2022). https://doi.org/10.1007/s00440-022-01118-z


Keywords

  • Nonparametric estimation
  • Minimax rates
  • Optimal transport
  • Geometric inference

Mathematics Subject Classification

  • 62G05
  • 49Q22
  • 62C20