1 Introduction

How does one optimally distribute N points on the two-dimensional sphere? This question has a long history and appears in a wide range of areas in pure as well as applied mathematics (see the survey [32] and [8, Section 7]). The notion of optimality depends, of course, on the problem at hand. But a recurrent theme is to distribute the configuration of points \({\varvec{x}}_{N}:=(x_{1},...,x_{N})\in X^{N}\) so as to minimize

$$\begin{aligned} \left\| \delta _{N}({\varvec{x}}_{N})-d\sigma \right\| \end{aligned}$$

on \(X^{N},\) where \(\left\| \cdot \right\| \) is a given (semi-)norm on the space of all signed measures on the two-sphere X, \(d\sigma \) denotes the standard uniform probability measure on X and \(\delta _{N}({\varvec{x}}_{N})\) is the empirical measure corresponding to \({\varvec{x}}_{N},\) i.e. the discrete probability measure on X defined by

$$\begin{aligned} \delta _{N}(x_{1},\ldots ,x_{N}):=\frac{1}{N}\sum _{i=1}^{N}\delta _{x_{i}} \end{aligned}$$
(1.1)

More precisely, since finding exact minimizers is usually infeasible, the aim is typically to distribute the N points \((x_{1},...,x_{N})\) so that \(\left\| \delta _{N}({\varvec{x}}_{N})-d\sigma \right\| \) achieves the optimal (minimal) rate as \(N\rightarrow \infty \) (as discussed in the introduction of [26, 27, I]). Here we will be concerned with a notion of optimality which naturally appears in the context of numerical integration (cubature) and quasi-Monte-Carlo integration techniques [10, 12], where the norm in question is a Sobolev norm.

1.1 Background

1.1.1 (Quasi-)Monte-Carlo integration on cubes

Monte-Carlo integration is a standard probabilistic technique for numerically computing the Lebesgue integral of a given, say continuous, function f over a domain X in Euclidean \(\mathbb {R}^{d}\) (or more generally, a Riemannian manifold X). It consists in generating N random points \(x_{1},...,x_{N}\) in X,  with respect to the uniform distribution dx on X (assuming for simplicity that X has unit-volume) and approximating

$$\begin{aligned} \int _{X}fdx\approx \frac{1}{N}\sum _{i=1}^{N}f(x_{i}) \end{aligned}$$

In other words, the points \(x_{1},...,x_{N}\) are viewed as independent \(\mathbb {R}^{d}-\)valued random variables with identical distribution dx. By the central limit theorem the error is of the order \({\mathcal {O}}(N^{-1/2})\) with high probability:

$$\begin{aligned} \lim _{N\rightarrow \infty }\mathbb {P}\left( \left| \int _{X}fdx-\frac{1}{N}\sum _{i=1}^{N}f(x_{i})\right| \ge \frac{\lambda }{N^{1/2}}\right) =\int _{|y|\ge \lambda }e^{-y^{2}}dy/\pi ^{1/2}\qquad \end{aligned}$$
(1.2)

if f is normalized to have unit variance.
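The \({\mathcal {O}}(N^{-1/2})\) Monte-Carlo rate is easy to observe numerically. The following minimal sketch (the integrand \(f(x,y)=xy,\) whose integral over the unit square is 1/4, and all function names are chosen purely for illustration) estimates an integral by averaging over uniform random points:

```python
import random

def mc_integrate(f, n, rng):
    """Plain Monte-Carlo estimate of the integral of f over [0,1]^2."""
    return sum(f(rng.random(), rng.random()) for _ in range(n)) / n

# Example integrand with known integral: f(x, y) = x * y, integral 1/4.
f = lambda x, y: x * y
rng = random.Random(0)
for n in (10**2, 10**4, 10**6):
    est = mc_integrate(f, n, rng)
    print(n, abs(est - 0.25))  # errors shrink roughly like n**-0.5
```

Quadrupling the number of points roughly halves the error, in line with the central limit theorem.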

The popular Quasi-Monte-Carlo method aims at improving the order of the convergence by taking \({\varvec{x}}_{N}:=(x_{1},...,x_{N})\) to be a judiciously constructed deterministic sequence of \(N-\)point configurations on X. In the most commonly studied case, when X is the unit-cube \([0,1]^{d}\) in \(\mathbb {R}^{d},\) there are well-known explicit so called low-discrepancy sequences (e.g. digital nets), constructed using the theory of uniform distribution in number theory [29], such that

$$\begin{aligned} \sup _{f\in C(X):\,V(f)\le 1}\left| \int _{X}fdx-\frac{1}{N}\sum _{i=1}^{N}f(x_{i})\right| \le C_{d}\frac{(\log N)^{d}}{N} \end{aligned}$$
(1.3)

where V(f) is the Hardy-Krause variation of f,  whose general definition is rather complicated, but for f sufficiently regular it is, when \(d=2,\) given by

$$\begin{aligned} V(f):=\int _{[0,1]^{2}}\left| \frac{\partial ^{2}f}{\partial x_{1}\partial x_{2}}\right| dx+\int _{[0,1]}\left| \frac{\partial f}{\partial x_{1}}(x_{1},1)\right| dx_{1}+\int _{[0,1]}\left| \frac{\partial f}{\partial x_{2}}(1,x_{2})\right| dx_{2} \end{aligned}$$
(1.4)

This is a consequence of the Koksma-Hlawka inequality, which is the cornerstone of the theory of quasi-Monte-Carlo integration on a cube [21, 23, 29].
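The quasi-Monte-Carlo idea can be illustrated with the two-dimensional Hammersley point set built from the base-2 van der Corput radical inverse, one of the simplest digital-net-style constructions. The sketch below is an illustrative stand-in for the low-discrepancy sequences of [29], not a construction taken from that reference, and all names are hypothetical:

```python
def van_der_corput(i, base=2):
    """Radical inverse of the integer i in the given base (van der Corput)."""
    q, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        i, r = divmod(i, base)
        q += r / denom
    return q

def hammersley_2d(n):
    """Two-dimensional Hammersley point set: (i/n, radical inverse of i)."""
    return [(i / n, van_der_corput(i)) for i in range(n)]

def qmc_integrate(f, pts):
    """Equal-weight integration rule with the given node set."""
    return sum(f(x, y) for x, y in pts) / len(pts)

f = lambda x, y: x * y          # exact integral over [0,1]^2 is 1/4
n = 2**12
err_qmc = abs(qmc_integrate(f, hammersley_2d(n)) - 0.25)
print(err_qmc)   # of the order 1/n for this smooth integrand
```

For smooth integrands of bounded Hardy-Krause variation the Koksma-Hlawka inequality bounds this error by V(f) times the discrepancy of the node set.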

1.1.2 Numerical integration on manifolds

Let us next recall the general setup for numerical integration on manifolds, following [9, 10]. Let (X, g) be a compact Riemannian manifold, which we shall take to be two-dimensional. Given a configuration \({\varvec{x}}_{N}\in X^{N}\) of N points on X the worst-case error for the integration rule on X with node set \({\varvec{x}}_{N}\) with respect to the smoothness parameter \(s\in ]1,\infty [\) is defined by

$$\begin{aligned} \text {wce }({\varvec{x}}_{N};s):=\sup _{f:\,\left\| f\right\| _{H^{s}(X)}\le 1}\left| \int _{X}fd\sigma -\frac{1}{N}\sum _{i=1}^{N}f(x_{i})\right| \end{aligned}$$
(1.5)

where \(d\sigma \) (also denoted \(d\sigma _{g}\)) is the probability measure obtained by normalizing the volume form defined by g and \(\left\| f\right\| _{H^{s}(X)}\) denotes the norm in the Sobolev space \(H^{s}(X)\) of functions with s fractional derivatives in \(L^{2}(X).\) In other words,

$$\begin{aligned} \text {wce }({\varvec{x}}_{N};s)=\left\| \delta _{N}({\varvec{x}}_{N})-d\sigma _{g}\right\| _{H_{0}^{-s}(X)}, \end{aligned}$$

where \(\delta _{N}({\varvec{x}}_{N})\) is the empirical measure 1.1 and \(H_{0}^{-s}(X)\) denotes the Sobolev space of all mean zero distributions on X, endowed with the Hilbert norm dual to the \(H^{s}(X)-\)norm with smoothness parameter \(s\in ]1,\infty [\) (see Sect. 2.1). The role of the Hardy-Krause variation norm 1.4 on a Euclidean square will, in the present two-dimensional Riemannian setting, be played by the Sobolev norm with smoothness parameter 2:

$$\begin{aligned} \left\| f\right\| _{H^{2}(X)}:=\left( \int _{X}|\Delta _{g}f|^{2}dV_{g}\right) ^{1/2}, \end{aligned}$$

where \(\Delta _{g}\) denotes the Laplace operator on \(C^{\infty }(X).\) The worst case error \(\text {wce }({\varvec{x}}_{N};s)\) is also called the generalized discrepancy [12] because of the similarity with the Koksma-Hlawka inequality on a cube. A sequence \({\varvec{x}}_{N}\in X^{N}\) is said to be of convergence order \({\mathcal {O}}(N^{-\kappa })\) with respect to the smoothness parameter s if

$$\begin{aligned} \text {wce }({\varvec{x}}_{N};s)\le {\mathcal {O}}(N^{-\kappa }) \end{aligned}$$

The optimal convergence order is \({\mathcal {O}}(N^{-s/2}).\) More precisely, by [9, Thm 2.14], there exists a positive constant c(s) such that for any sequence \({\varvec{x}}_{N}\in X^{N}\)

$$\begin{aligned} \text {wce }({\varvec{x}}_{N};s)\ge c(s)N^{-s/2}. \end{aligned}$$
(1.6)

1.1.3 Quasi-Monte Carlo designs on the two-sphere

Consider now the case when X is the two-dimensional sphere endowed with the Riemannian metric induced from the standard embedding of X as the unit-sphere in Euclidean \(\mathbb {R}^{3}.\) We will denote by \(d\sigma \) the probability measure on X obtained by normalizing the area form of g. Following [10] a sequence of \(N-\)point configurations \({\varvec{x}}_{N}\in X^{N}\) is said to be a sequence of Quasi-Monte-Carlo (QMC) designs wrt the smoothness parameter \(s\in ]1,\infty [,\) if the corresponding worst case errors \(\text {wce }({\varvec{x}}_{N};s)\) have optimal convergence order, i.e. if

$$\begin{aligned} \text {wce }({\varvec{x}}_{N};s)={\mathcal {O}}(N^{-s/2}) \end{aligned}$$

In particular, this convergence is faster than the one offered by the standard Monte-Carlo method. Indeed, as recalled above, Monte-Carlo integration gives, with high probability, an error of the order \(N^{-1/2}\) for a fixed function f,  even if the function is smooth.

The notion of a QMC design is modeled on the influential notion of a spherical t-design \({\varvec{x}}_{N}\in X^{N},\) introduced in [13]. In fact, as shown in [10, Thm 6], it follows from the solution of the Korevaar-Meyers conjecture in [7], that there exists a sequence of spherical \(t-\)designs \({\varvec{x}}_{N}\in X^{N}\) with t of the order \(N^{1/2},\) which is a QMC design for any \(s\in ]1,\infty [.\) Moreover, for a fixed \(s\in ]1,2[\) reproducing kernel techniques reveal that any sequence of maximizers \({\varvec{x}}_{N}(s)\in X^{N}\) of the generalized sum

$$\begin{aligned} \sum _{i,j\le N}\left| x_{i}-x_{j}\right| ^{2s-2} \end{aligned}$$

is a QMC design wrt the smoothness parameter s (see [10]).
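The normalized generalized sum can be probed numerically. For \(s=3/2\) the exponent \(2s-2\) equals 1, so the sum is the plain sum of pairwise distances; a classical fact is that the distance energy \(\int \int |x-y|d\mu d\mu \) over probability measures on the unit sphere is maximized by the uniform measure, with value 4/3, so the normalized sum never exceeds 4/3 for any configuration. A minimal sketch (helper names hypothetical):

```python
import math
import random

def random_sphere_points(n, rng):
    """Uniform points on the unit two-sphere via normalized Gaussians."""
    pts = []
    for _ in range(n):
        v = [rng.gauss(0, 1) for _ in range(3)]
        r = math.sqrt(sum(c * c for c in v))
        pts.append(tuple(c / r for c in v))
    return pts

def generalized_sum(pts, s):
    """(1/N^2) * sum over i != j of |x_i - x_j|**(2s - 2)."""
    n, p = len(pts), 2 * s - 2
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += math.dist(pts[i], pts[j]) ** p
    return total / n**2

rng = random.Random(0)
pts = random_sphere_points(300, rng)
e = generalized_sum(pts, 1.5)   # s = 3/2: plain sum of distances
print(e)   # close to, and never above, the continuum maximum 4/3
```

Maximizing configurations push this value as close to 4/3 as the discrete setting allows.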

However, all the sequences of QMC designs discussed above are non-explicit for N large. Moreover, approximating them numerically is very challenging for large N, due, in particular, to an abundance of local minima of the corresponding functionals to be minimized on \(X^{N}\) [19, 36]. One is thus led to wonder if probabilistic methods can be used to improve the convergence order of the standard Monte-Carlo method by taking the points \(x_{1},...,x_{N}\) on the sphere to be appropriately correlated, as in repulsive particle systems? A natural class of such point processes is offered by the class of determinantal point processes, whose utility for Monte-Carlo type numerical integration was advocated in [2]. The main aim of the present work is to show that a particular determinantal point process on the two-sphere known as the spherical ensemble enjoys quite remarkable convergence properties from the point of view of numerical integration, for any smoothness parameter \(s\in ]1,2].\)

1.2 Main results for the spherical ensemble

The spherical ensemble first appeared as a Coulomb gas, also known as a one-component plasma, in the physics literature (see the monograph [16] and references therein). We recall that the Coulomb gas on the two-sphere X with N particles, at inverse temperature \(\beta ,\) is defined by the following symmetric probability measure on \(X^{N}:\)

$$\begin{aligned} d\mathbb {P}_{N,\beta }:=\frac{1}{Z_{N,\beta }}e^{-\beta E^{(N)}}d\sigma ^{\otimes N},\,\,\,\,E^{(N)}:=-\sum _{i\ne j\le N}\frac{1}{2}\log \left| x_{i}-x_{j}\right| \end{aligned}$$
(1.7)

where X has been embedded as the unit-sphere in Euclidean \(\mathbb {R}^{3}.\) It represents the microscopic state of N unit charge particles in thermal equilibrium on X, interacting by the Coulomb energy \(E^{(N)},\) subject to a neutralizing uniform background charge. More precisely, the spherical ensemble \((X,\mathbb {P}_{N})\) coincides with the Coulomb gas on the sphere at the particular inverse temperature \(\beta =2,\) for which the Coulomb gas becomes a determinantal point process [16, 22]. The following elegant random matrix realization of the spherical ensemble was exhibited in [24]. Consider two \(N\times N\) complex matrices A and B whose entries are independent standard normal variables. Then the spherical ensemble coincides with the random point process defined by the eigenvalues of \(AB^{-1}\) in the complex plane, when mapped to the two-sphere using stereographic projection.
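This random matrix realization translates directly into a sampling routine. The sketch below (numpy-based, with hypothetical function names and one common convention for the inverse stereographic projection) draws a configuration from the spherical ensemble; since the eigenvalue distribution of \(AB^{-1}\) is invariant under rescaling the Gaussian entries, the chosen normalization of the entries is immaterial:

```python
import numpy as np

def spherical_ensemble(n, rng):
    """Sample N points of the spherical ensemble: eigenvalues of A B^{-1}
    for N x N matrices A, B with i.i.d. standard complex Gaussian entries,
    mapped to the unit sphere by inverse stereographic projection."""
    A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    B = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    z = np.linalg.eigvals(A @ np.linalg.inv(B))
    # inverse stereographic projection from the complex plane to S^2 in R^3
    d = 1 + np.abs(z) ** 2
    return np.stack([2 * z.real / d, 2 * z.imag / d, (np.abs(z) ** 2 - 1) / d], axis=1)

rng = np.random.default_rng(0)
pts = spherical_ensemble(200, rng)
print(pts.shape)                                      # (200, 3)
print(np.allclose(np.linalg.norm(pts, axis=1), 1.0))  # True: points lie on the unit sphere
```

The cost is dominated by the eigenvalue computation, i.e. the \({\mathcal {O}}(N^{3})\) operation count mentioned below.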

In the present work it is shown that a random \(N-\)point configuration \({\varvec{x}}_{N}\) in the spherical ensemble is, with overwhelming probability, nearly a Quasi-Monte Carlo design for any \(s\in ]1,2]:\)

Theorem 1.1

Consider the spherical ensemble with N particles. There exists a constant C such that for any given R in \([(\log N)^{-1/2},N(\log N)^{-1/2}]\)

$$\begin{aligned} \mathbb {P}_{N}\left( \text {wce }({\varvec{x}}_{N};2)\le CR\frac{(\log N)^{1/2}}{N}\right) \ge 1-\frac{1}{N^{R^{2}/C-C}} \end{aligned}$$

As a consequence, for any \(s\in ]1,2],\) there exists a constant C(s) such that for any given R in \([(\log N)^{-1/2},N(\log N)^{-1/2}]\)

$$\begin{aligned} \mathbb {P}_{N}\left( \text {wce}({\varvec{x}}_{N};s)\le C(s)R^{s/2}\frac{(\log N)^{s/4}}{N^{s/2}}\right) \ge 1-\frac{1}{N^{R^{2}/C-C}}. \end{aligned}$$

Note that for N large the restrictions on the parameter R are essentially negligible. That the estimate in the case \(s=2\) implies the one for \(s<2\) follows from [10, Lemma 26]. Interestingly, the worst case error in the case \(s=2\) is similar to the worst case error of low-discrepancy sequences on the cube (formula 1.3). It should, however, be stressed that on the two-sphere there are no explicitly constructed sequences \(({\varvec{x}}_{N})\) saturating the optimal rate \(\text {wce }({\varvec{x}}_{N};2)={\mathcal {O}}(N^{-1}),\) even if logarithmic factors are included (see the discussion in Sect. 1.5.1).

An interesting feature of the proof of the previous theorem is that it yields attractive values of the constants in question. Indeed, for any \(\eta >2/\log N\) the following explicit bound is obtained:

$$\begin{aligned} \mathbb {P}_{N}\left( \text {wce }({\varvec{x}}_{N};2)\le e\left( \frac{1+\eta }{4\pi }\right) ^{1/2}\frac{\left( \log N\right) ^{1/2}}{N}\right) \ge 1-\frac{(\frac{1}{2}e\eta \log N)^{2}}{N^{\eta }}, \end{aligned}$$
(1.8)

where \(\log N\) denotes the natural logarithm, assuming that \(N\ge 1000\) (otherwise, Euler’s number e appearing in the left hand side has to be replaced by a slightly larger constant). For example, when \(N=1000\) this yields the worst-case-error bound \(\text {wce }({\varvec{x}}_{N};2)<0.003\) with \(99.9\%\) confidence (by taking \(\eta =2)\). Moreover, \(N=10000\) yields \(\text {wce }({\varvec{x}}_{N};2)<0.0004\) with \(99.999\%\) confidence.
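The explicit bound 1.8 is easy to evaluate; the sketch below (the helper name is hypothetical) reproduces the quoted figures for \(N=10000\) by plugging in \(\eta =2\):

```python
import math

def wce_bound(n, eta):
    """Evaluate formula (1.8): the wce bound and the failure probability,
    for N >= 1000 and eta > 2 / log N."""
    assert n >= 1000 and eta > 2 / math.log(n)
    bound = math.e * math.sqrt((1 + eta) / (4 * math.pi)) * math.sqrt(math.log(n)) / n
    fail = (0.5 * math.e * eta * math.log(n)) ** 2 / n**eta
    return bound, fail

for n in (1000, 10000, 100000):
    b, p = wce_bound(n, eta=2)
    print(f"N={n}: wce <= {b:.2e} with confidence {1 - p:.6f}")
```

For \(N=10000\) this gives a worst-case-error bound of about \(4\times 10^{-4}\) with failure probability below \(10^{-5}.\)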

The previous theorem should be compared with the conjecture in [10], supported by numerical simulations, saying that minimizers \({\varvec{x}}_{N}\) of the logarithmic energy \(E^{(N)}\) (formula 1.7) are QMC designs for \(s\in ]1,3].\) However, a practical advantage of the spherical ensemble is that it can be simply generated by employing \({\mathcal {O}}(N^{3})\) elementary operations (using its random matrix representation), while no polynomial time algorithm for constructing near-minimizers of \(E^{(N)}\) is known [34, Problem 7]. Concerning the sharpness of the inequalities in the previous theorem we note that the restriction to \(s\le 2\) is necessary, as follows from formula 1.10 below. Moreover, the power 1/2 of \(\log N\) appearing in the inequalities for \(s=2\) can be expected to be optimal.

Theorem 1.1 will be deduced from a new concentration of measure inequality in Sobolev spaces (Theorem 1.3 below), which, in turn, will follow from the following bound on the moment generating function of the square of the random variable \(\text {wce }({\varvec{x}}_{N};s).\)

Theorem 1.2

For any \(\epsilon >0\) and \(\alpha \in ]0,4\pi 2^{\epsilon }[\)

$$\begin{aligned} \mathbb {E}_{N}\left( e^{\alpha N^{2}\left( \text {wce }({\varvec{x}}_{N};2+\epsilon )\right) ^{2}}\right) \le \left( \det \left( I-\frac{\alpha }{2\pi }\Delta _{g}^{-(1+\epsilon )}\right) \right) ^{-1/2}<\infty \end{aligned}$$

where \(\Delta _{g}\) denotes the Laplace operator on X and \(\det (I-\lambda \Delta _{g}^{-(1+\epsilon )})\) is the Fredholm (spectral) determinant of its fractional power \(\Delta _{g}^{-(1+\epsilon )}\) (see Sect. 2.1).
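On the unit two-sphere the nonzero Laplace eigenvalues are \(l(l+1)\) with multiplicity \(2l+1,\) so the Fredholm determinant in Theorem 1.2 can be approximated by a truncated spectral product; note that the condition \(\alpha \in ]0,4\pi 2^{\epsilon }[\) is precisely what keeps the factor corresponding to the smallest eigenvalue \(\lambda =2\) positive. A minimal sketch (truncation level and names chosen for illustration only):

```python
import math

def fredholm_det(alpha, eps, lmax=2000):
    """Truncated Fredholm determinant det(I - (alpha/2pi) Laplacian^-(1+eps))
    on the unit two-sphere, whose nonzero Laplace eigenvalues are l(l+1)
    with multiplicity 2l + 1."""
    assert 0 < alpha < 4 * math.pi * 2**eps   # the condition of Theorem 1.2
    log_det = 0.0
    for l in range(1, lmax + 1):
        lam = (l * (l + 1)) ** (-(1 + eps))
        log_det += (2 * l + 1) * math.log1p(-alpha / (2 * math.pi) * lam)
    return math.exp(log_det)

d = fredholm_det(alpha=1.0, eps=0.5)
print(d, d ** -0.5)   # the bound of Theorem 1.2 is det^(-1/2)
```

The terms decay like \(l^{-1-2\epsilon },\) so the truncated product converges for any \(\epsilon >0\) but degenerates as \(\epsilon \rightarrow 0,\) consistent with the \(A_{2}/\epsilon \) term in Theorem 1.3.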

In fact, this is an asymptotic equality, as \(N\rightarrow \infty \) (as will be shown elsewhere [3]). Combining Theorem 1.2 with some spectral theory the following concentration of measure inequality for the spherical ensemble is obtained:

Theorem 1.3

There exist explicit constants \(A_{1},A_{2}\) and \(A_{3}\) such that for any positive integer N and any \(\epsilon ,\delta >0\)

$$\begin{aligned} \mathbb {P}_{N}\left( \left\| \delta _{N}-d\sigma \right\| _{H_{0}^{-(2+\epsilon )}(X)}>\delta \right) \le e^{-A_{1}N^{2}\delta ^{2}+\frac{A_{2}}{\epsilon }+A_{3}}. \end{aligned}$$

1.3 Outlook on the case of general compact surfaces

The results for the two-sphere can be generalized to any compact two-dimensional Riemannian manifold X. Here we will just highlight the main points, deferring details to [3]. Given a Riemannian surface (X, g) of strictly positive genus denote by \(g_{c}\) the unique Riemannian metric on X with constant curvature which is conformally equivalent to g. To \((X,g_{c})\) one can attach a canonical \(N-\)particle determinantal point process \((X^{N},d\mathbb {P}_{N}),\) which can be viewed as a higher genus generalization of the spherical ensemble [4]. In this general setting the bound in Theorem 1.2 holds up to multiplying the right hand side with a factor of the form \((1+e^{-\delta /N}),\) for an explicit positive constant \(\delta ,\) depending on the injectivity radius of \((X,g_{c}).\) This is shown in essentially the same way as in the spherical setting, using the general Moser-Trudinger type inequalities in [4] as a replacement for the inequalities 1.11 recalled below. The analogs of Theorems 1.1, 1.3 then follow as before. In particular,

$$\begin{aligned} \left\| \frac{1}{N}\sum _{i=1}^{N}\delta _{x_{i}}-dV_{g_{c}}\right\| _{H_{0}^{-2}(X)}={\mathcal {O}}\left( \frac{(\log N)^{1/2}}{N}\right) \end{aligned}$$

holds with probability \(1-{\mathcal {O}}(1/N^{\infty })\) (in the sense of Theorem 1.1). Expressed in terms of the original Riemannian metric g this means that introducing the “weight function”

$$\begin{aligned} w:=dV_{g}/dV_{g_{c}} \end{aligned}$$

and sampling a configuration \((x_{1},...,x_{N})\) in the canonical \(N-\)particle ensemble \((X^{N},d\mathbb {P}_{N})\) yields

$$\begin{aligned}{} & {} \mathbb {P}_{N}\left( \sup _{f:\,\left\| f\right\| _{H_{0}^{2}(X)}\le 1}\left| \int _{X}fdV_{g}-\frac{1}{N}\sum _{i=1}^{N}f(x_{i})w(x_{i})\right| \le {\mathcal {O}}\left( \frac{(\log N)^{1/2}}{N}\right) \right) \nonumber \\{} & {} \quad \ge 1-{\mathcal {O}}(1/N^{\infty }) \end{aligned}$$
(1.9)

(in the sense of Theorem 1.1). In fact, the original Riemannian metric g also induces a determinantal \(N-\)particle point process on X. In physical terms, the corresponding probability measure \(d\mathbb {P}_{g}^{N}\) on \(X^{N}\) represents the probability density for an integer Quantum Hall state, i.e. an \(N-\)particle state of electrons confined to (X, g) subject to a constant magnetic field, whose strength is proportional to N. However, as explained in [3], the error in the corresponding estimate 1.9 will be of the larger order \({\mathcal {O}}(1/N^{1/2})\) (unless g has constant curvature). Accordingly, replacing the original metric g with the constant curvature one \(g_{c}\) is analogous to the use of importance sampling in the standard Monte-Carlo method. Recall that the latter method amounts to calculating integrals \(\int _{X}fdV_{g}\) by taking the points \(x_{i}\) to be independent realizations of a “target measure” \(\nu \) (taken to be different from \(dV_{g}\) with the aim of reducing the variance; compare the discussion in [2, Section 1.2]).
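The importance-sampling analogy can be made concrete in a one-dimensional toy model: to integrate against a measure \(\mu \) one samples from a target measure \(\nu \) and reweights each sample by \(w=d\mu /d\nu ,\) just as the weight \(w=dV_{g}/dV_{g_{c}}\) enters 1.9. A minimal sketch (all names and the choice of \(\nu \) are for illustration only):

```python
import math
import random

def importance_sample(f, sample_nu, weight, n, rng):
    """Estimate the integral of f against mu by drawing from a target
    measure nu and reweighting each sample by w = dmu/dnu."""
    return sum(f(x) * weight(x) for x in (sample_nu(rng) for _ in range(n))) / n

# Toy example: mu = Lebesgue measure on [0,1], nu = Beta(2,1) (density 2x),
# so the weight is w(x) = 1/(2x).  Integrand f(x) = x**2, exact integral 1/3.
est = importance_sample(
    f=lambda x: x * x,
    sample_nu=lambda rng: math.sqrt(rng.random()),  # inverse-CDF sampling of Beta(2,1)
    weight=lambda x: 1 / (2 * x),
    n=100_000,
    rng=random.Random(0),
)
print(est)   # close to 1/3
```

The estimator is unbiased for any admissible \(\nu \); a good choice of \(\nu \) reduces the variance, which is the point of the analogy.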

It should be stressed, however, that one advantage of the spherical setting is that all the constants can be explicitly estimated, while, in the general setting, the constants depend on spectral invariants of \((X,g_{c}).\) Moreover, from a practical point of view the random matrix realization of the spherical ensemble offers a convenient implementation algorithm, while the general algorithm for simulating determinantal point processes [22] has to be employed for a general surface X (which, loosely speaking, replaces the task of finding the N eigenvalues with Gram-Schmidt orthogonalization).

1.4 Outline of the proof of Theorem 1.1

As shown in [30], for a fixed function f on X the following Central Limit Theorem (CLT) holds for the spherical ensemble: for any \(f\in H^{1}(X),\) normalized so that \(\int |\nabla _{g}f|^{2}dV_{g}=4\pi ,\)

$$\begin{aligned} \lim _{N\rightarrow \infty }\mathbb {P}_{N}\left( \left| \frac{1}{N}\sum _{i=1}^{N}f(x_{i})-\int _{X}fd\sigma \right| \ge \frac{\lambda }{N}\right) =\int _{|y|\ge \lambda }e^{-y^{2}}dy/\pi ^{1/2}\qquad \end{aligned}$$
(1.10)

A key ingredient in the proof of Theorem 1.1, given in Sect. 2, is the following quantitative refinement of the previous CLT, obtained in [4]:

$$\begin{aligned} \mathbb {P}_{N}\left( \frac{1}{N}\sum _{i=1}^{N}f(x_{i})-\int _{X}fd\sigma \ge \frac{\lambda }{N}\right) \le e^{-\lambda ^{2}/2} \end{aligned}$$

More precisely, the following slightly stronger dual bound on the moment generating function was established in [4] (using complex differential geometry):

$$\begin{aligned} \mathbb {E}_{N}\left( \exp N(N+1)\lambda \left( \frac{1}{N}\sum _{i=1}^{N}f(x_{i})-\int _{X}fd\sigma \right) \right) \le e^{N(N+1)\lambda ^{2}/2} \end{aligned}$$
(1.11)

(coinciding when \(N=1\) with the well-known sharp Moser-Trudinger inequality on the two-sphere). Note, however, that these inequalities only hold for a fixed normalized function \(f\in H^{1}(X)\) and fail drastically for the random variable \(\text {wce }({\varvec{x}}_{N};1)\) obtained by taking the sup over all normalized \(f\in H^{1}(X).\) Indeed, \(\text {wce }({\varvec{x}}_{N};1)=\infty \) on all of \(X^{N}\) since \(H^{1}(X)\) contains unbounded functions (recall that \(s=1\) is the borderline case for the Sobolev embedding of \(H^{s}(X)\) into C(X)).

Here we will interpret the inequality 1.11 as the statement that the random variable

$$\begin{aligned} Y_{N}:=\sum _{i=1}^{N}\delta _{x_{i}}-Nd\sigma \end{aligned}$$

taking values in the dual Sobolev space \(H^{-(2+\epsilon )}(X)\) is sub-Gaussian wrt a canonical Gaussian random variable G on \(H^{-(2+\epsilon )}(X)\) (see Remark 3.2). Using some basic Gaussian measure theory and spectral theory for the Laplacian we then deduce the moment bound in Theorem 1.2, which, in turn, implies the concentration of measure inequality Theorem 1.3. Finally, we show that the latter inequality implies Theorem 1.1, when combined with the results in [9, 10] relating \(\text {wce }({\varvec{x}}_{N};s)\) corresponding to different values of s.

1.5 Further comparison with previous results

As shown in [20, Thm 1], building on [1], the spherical ensemble satisfies the following asymptotics: for any \(s\in ]1,2[\) there exists a positive constant C(s) such that

$$\begin{aligned} \sqrt{\mathbb {E}\left( \text {wce }({\varvec{x}}_{N};s)^{2}\right) }=C(s)N^{-s/2}+o(N^{-s/2}) \end{aligned}$$
(1.12)

This result should be compared with [10, Thm 24], which says that if X is partitioned into N equal area regions whose diameters are bounded by \(CN^{-1/2}\) and a sequence \({\varvec{x}}_{N}\) of N points is formed by choosing one point at random from each of the N regions, then the corresponding \(\sqrt{\mathbb {E}\left( \text {wce }({\varvec{x}}_{N};s)^{2}\right) }\) is also of the order \({\mathcal {O}}(N^{-s/2}),\) when \(s\in ]1,2[.\) However, in contrast to Theorem 1.1, the results in [10, 20], referred to above, do not give any information about the probability that the worst-case-error \(\text {wce }({\varvec{x}}_{N};s)\) for a random sequence \({\varvec{x}}_{N}\) in the corresponding ensembles is close to the average worst-case-error. The only previous result in this direction appears to be [1, Thm 1.1], saying that for any \(M>0\) there exists \(C_{M}>0\) such that

$$\begin{aligned} \mathbb {P}\left( D_{L^{\infty }}^{C}({\varvec{x}}_{N})\le C_{M}\frac{(\log N)^{1/2}}{N^{3/4}}\right) \ge 1-\frac{1}{N^{M}}. \end{aligned}$$
(1.13)

where \(D_{L^{\infty }}^{C}({\varvec{x}}_{N})\) is the \(L^{\infty }-\)spherical cap discrepancy defined by

$$\begin{aligned} D_{L^{\infty }}^{C}({\varvec{x}}_{N}):=\sup _{f=1_{{\mathcal {C}}}}\left| \int _{X}fd\sigma -\frac{1}{N}\sum _{i=1}^{N}f(x_{i})\right| \end{aligned}$$

where the sup is taken over all characteristic functions f of the form \(f=1_{{\mathcal {C}}},\) where \({\mathcal {C}}\) is a spherical cap in the two-sphere X, i.e. the intersection of X with a half-space in \(\mathbb {R}^{3}\) (the proof of the inequality 1.13 is based on a variance estimate in [1]). Since

$$\begin{aligned} \text {wce }\left( {\varvec{x}}_{N};\frac{3}{2}\right) \le aD_{L^{\infty }}^{C}({\varvec{x}}_{N}) \end{aligned}$$
(1.14)

for an explicit constant a [10, Page 16]), the inequality 1.13 implies that

$$\begin{aligned} \mathbb {P}\left( \text {wce }\left( {\varvec{x}}_{N};\frac{3}{2}\right) \le C_{M}\frac{(\log N)^{1/2}}{N^{3/4}}\right) \ge 1-\frac{1}{N^{M}}. \end{aligned}$$
(1.15)

This is a bit weaker than the case \(s=3/2\) of Theorem 1.1 (where the power of \(\log N\) is \(3/8(<1/2)\) and moreover the dependence of \(C_{M}\) on M is made explicit). We recall that the inequality 1.14 follows from the fact that \(\text {wce }({\varvec{x}}_{N};\frac{3}{2})\) is comparable to the \(L^{2}-\)spherical cap discrepancy

$$\begin{aligned} D_{L^{2}}^{C}({\varvec{x}}_{N}):=\left( \int \left| \int _{X}fd\sigma -\frac{1}{N}\sum _{i=1}^{N}f(x_{i})\right| ^{2}Df\right) ^{1/2},\,\,\,f=1_{{\mathcal {C}}} \end{aligned}$$

where Df is a certain probability measure on the space of all spherical caps \({\mathcal {C}}\) [10, Page 16].

1.5.1 Explicit sequences for numerical integration on the sphere

As recalled above, \(\text {wce }({\varvec{x}}_{N};\frac{3}{2})\) is comparable to the \(L^{2}-\)spherical cap discrepancy \(D_{L^{2}}^{C}({\varvec{x}}_{N}).\) In [26, 27] the representation theory of Hecke operators and modular forms was used to obtain an explicit sequence satisfying the bound \(D_{L^{2}}^{C}({\varvec{x}}_{N})\le CN^{-1/2}\log N\) (see [26, 27, I, Thm 2.2]). The proof of the bound uses Deligne’s proof of the Weil conjectures and also yields, as explained in [9, Remark 3.10], \(\text {wce }({\varvec{x}}_{N};s)\le CN^{-1/2}\log N\) for any \(s>1.\) However, these rates are only close to optimal as s approaches 1. A different sequence satisfying \(D_{L^{2}}^{C}({\varvec{x}}_{N})\le CN^{-1/2}(\log N)^{1/2}\) was then constructed in [14], by mapping a digital net on the square to X. Numerical evidence was provided in [14] indicating that the latter sequence has the optimal rate \({\mathcal {O}}(N^{-3/4}).\) See also [10, Section 8] for numerical experiments for a range of different classes of point sets on the two-sphere.

1.5.2 Concentration of measure

It may be illuminating to compare Theorem 1.1 for \(s=2\) with the concentration of measure inequalities for independent random variables established in [6], which can be viewed as a quantitative refinement of the classical CLT 1.2. In the particular case of standard Monte-Carlo integration on a cube the inequalities in [6] imply that there exists a constant C such that for any \(R>C\)

$$\begin{aligned} \mathbb {P}_{N}\left( \sup _{\left\| \nabla f\right\| _{L^{\infty }}\le 1}\left| \int _{X}fdx-\frac{1}{N}\sum _{i=1}^{N}f(x_{i})\right| \le R\frac{(\log N)^{1/2}}{N^{1/2}}\right) \ge 1-\frac{1}{N^{R^{2}/C}}\qquad \end{aligned}$$
(1.16)

(see also [5] for a simplified proof). This inequality thus exhibits the slower rate \((\log N)^{1/2}/N^{1/2},\) since the points \(x_{i}\) are independent random variables. Moreover, the role of the Sobolev norm \(W^{1,2}\) appearing for \(s=2\) is in the inequality 1.16 played by the \(W^{1,\infty }-\)norm \(\left\| \nabla f\right\| _{L^{\infty }}\). The proof uses the dual representation of the \(W^{1,\infty }-\)distance between probability measures as the \(L^{1}-\)Wasserstein metric (aka Monge-Kantorovich distance), which fits into the general setting of optimal transport theory. We also recall that in the particular case of one dimension the sharp form of the Dvoretzky-Kiefer-Wolfowitz inequality for N independent real random variables with the same distribution \(\mu ,\) assumed to have a continuous density, says that

$$\begin{aligned} \mathbb {P}_{N}\left( d_{K}\left( \mu ,\frac{1}{N}\sum _{i=1}^{N}\delta _{x_{i}}\right) \ge \lambda \right) \le 2e^{-2N\lambda ^{2}},\,\,\,\,d_{K}(\mu ,\nu ):=\sup _{x\in \mathbb {R}}\left| \int _{-\infty }^{x}(\mu -\nu )\right| , \end{aligned}$$

where \(d_{K}\) denotes the Kolmogorov distance. As a consequence, if \(\mu \) and the points \(x_{i}\) are supported in an interval X of unit-length, then it follows from the fact that the \(L^{1}-\)Wasserstein metric on X is bounded from above by \(d_{K}\) that

$$\begin{aligned} \mathbb {P}_{N}\left( \sup _{\left\| \nabla f\right\| _{L^{\infty }}\le 1}\left| \int _{X}f\mu -\frac{1}{N}\sum _{i=1}^{N}f(x_{i})\right| \ge \frac{\lambda }{N^{1/2}}\right) \le 2e^{-2\lambda ^{2}} \end{aligned}$$

(see the discussion in [5, page 2304-2305]).
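The Dvoretzky-Kiefer-Wolfowitz bound is easy to probe numerically; the following sketch (helper name hypothetical) computes the Kolmogorov distance between the empirical measure of N uniform samples on [0, 1] and the uniform distribution itself, whose CDF is F(x) = x:

```python
import random

def kolmogorov_distance_uniform(xs):
    """Kolmogorov distance between the empirical measure of xs and the
    uniform distribution on [0,1]; for sorted data the sup of |F_n - F|
    is attained just before or after a sample point."""
    xs = sorted(xs)
    n = len(xs)
    return max(max(abs((i + 1) / n - x), abs(x - i / n)) for i, x in enumerate(xs))

rng = random.Random(0)
n = 10_000
d = kolmogorov_distance_uniform([rng.random() for _ in range(n)])
print(d)   # of order n**-0.5, as the DKW inequality predicts
```

By the displayed inequality, values of d above \(\lambda \) occur with probability at most \(2e^{-2N\lambda ^{2}}\).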

For generalized Wigner random matrices of rank N, whose eigenvalues define a random point process with N points on \(\mathbb {R}\) (which for standard Wigner matrices is a determinantal point process, as well as a Coulomb gas), a concentration of measure type inequality was obtained in [15, Thm 2.2], expressed in terms of the Kolmogorov distance \(d_{K}\) on an interval. Denoting by \(\mu \) the semi-circle law it yields, by bounding the \(W^{1,\infty }-\)norm by \(d_{K},\) the following concentration inequality on an interval X containing a neighborhood of the support of \(\mu \),

$$\begin{aligned} \mathbb {P}_{N}\left( \sup _{\left\| \nabla f\right\| _{L^{\infty }}\le 1}\left| \int _{X}f\mu -\frac{1}{N}\sum _{i=1}^{N}f(x_{i})\right| >\frac{(\log N)^{L}}{N}\right) \le Ce^{-c(\log N)^{\epsilon L}}, \end{aligned}$$

for some positive constants C, c and \(\epsilon \) and an appropriate positive number L (depending on N), assuming that N is taken sufficiently large. Thus, in this case the distance in question is of the order \(\frac{(\log N)^{L}}{N}\) with overwhelming probability.

Generalizations of the concentration of measure inequality 1.16 to general Coulomb (and Riesz) gas ensembles \(((\mathbb {R}^{d})^{N},d\mathbb {P}_{N,\beta })\) in Euclidean \(\mathbb {R}^{d}\) have been obtained in [11, 31] and on compact Riemannian manifolds in [17]. In particular, in the case of the spherical ensemble the inequalities in [17] say that

$$\begin{aligned} \mathbb {P}_{N}\left( \sup _{\left\| \nabla f\right\| _{L^{\infty }}\le 1}\left| \int _{X}fdx-\frac{1}{N}\sum _{i=1}^{N}f(x_{i})\right| \le \delta \right) \ge 1-e^{-\frac{N^{2}}{4\pi }\frac{\delta ^{2}}{2}+\frac{N}{4\pi }\log N+CN} \end{aligned}$$
(1.17)

To see the relation to the present \(L^{2}-\)setting note that the Sobolev inequality shows that, in dimension \(d=2,\)

$$\begin{aligned} \left\| \nabla f\right\| _{L^{\infty }(X)}\le C_{\epsilon }\left\| f\right\| _{H^{2+\epsilon }(X)} \end{aligned}$$

for any \(\epsilon >0.\) Hence, the inequality 1.17 implies a concentration inequality for Sobolev \(H^{-(2+\epsilon )}(X)-\)norms which is similar to the inequality in Theorem 1.3. However, the main virtue of the Gaussian estimate in Theorem 1.3 is that there are, apart from the term proportional to \(-N^{2}\delta ^{2}\), no \(N-\)dependent additional terms, for a fixed \(\epsilon >0.\) This allows one to apply Theorem 1.3 to \(\delta \) of the order \(N^{-1}\) (modulo logarithmic factors, by taking \(\epsilon \) to be of the order \(1/\log N\)), as in the proof of Theorem 1.1. In contrast, in the inequality 1.17 one can at best take \(\delta \) of the order \(N^{-1/2}\) due to the presence of the terms of order N.

2 Spectral preparations

We will denote by \(\mathbb {P}_{N}\) and \(\mathbb {E}_{N}\) the probabilities and expectations, respectively, defined wrt the spherical ensemble with \(N-\)particles \((X^{N},d\mathbb {P}_{N})\) (whose definition was recalled in Sect. 1.2). We start with some preliminaries.

2.1 Sobolev spaces and spectral theory

Let us first consider a general setup of Sobolev spaces on a compact Riemannian manifold (Xg). Denote by \(\left\langle \cdot ,\cdot \right\rangle _{L^{2}}\) the corresponding scalar product on \(C^{\infty }(X):\)

$$\begin{aligned} \left\langle u,v\right\rangle _{L^{2}}:=\int _{X}uvdV_{g}, \end{aligned}$$

where \(dV_{g}\) denotes the Riemannian volume form (we will denote by \(d\sigma _{g}\) the probability measure obtained by normalizing \(dV_{g}).\) We will denote by \(\Delta _{g}\) the Laplace operator on \(C^{\infty }(X),\) with the sign convention which makes \(\Delta _{g}\) a densely defined positive symmetric operator on \(L^{2}(X,dV_{g}):\)

$$\begin{aligned} \left\langle \Delta _{g}u,v\right\rangle _{L^{2}}:=\int _{X}g(\nabla _{g}u,\nabla _{g}v)dV_{g}, \end{aligned}$$

where \(\nabla _{g}u\) denotes the gradient of u wrt g. By the spectral theorem, for any \(p\in \mathbb {R}\) the pth power \(\Delta _{g}^{p}\) is a densely defined operator on \(L^{2}(X,dV_{g}).\)

Fix a “smoothness parameter” s,  assumed to be strictly positive:

  • \(H^{s}(X)/\mathbb {R}\) is defined as the completion of \(C^{\infty }(X)/\mathbb {R}\) with respect to the scalar product defined by

    $$\begin{aligned} \left\langle u,u\right\rangle _{s}:=\int _{X}\Delta _{g}^{s/2}u\Delta _{g}^{s/2}udV_{g}=\int _{X}u\Delta ^{s}udV_{g} \end{aligned}$$
    (2.1)
  • \(H_{0}^{-s}(X)\) is defined as the sub-space of all distributions \(\nu \) on X such that \(\left\langle \nu ,1\right\rangle =0\) and

    $$\begin{aligned} \left\langle \nu ,\nu \right\rangle _{-s}:=\sup _{u\in C^{\infty }(X)\setminus \mathbb {R}}\frac{\left\langle \nu ,u\right\rangle ^{2}}{\left\langle u,u\right\rangle _{s}}<\infty \end{aligned}$$

Here we view a distribution \(\nu \) on X as an element in the linear dual of the vector space \(C^{\infty }(X).\) We endow \(H^{s}(X)/\mathbb {R}\) and \(H_{0}^{-s}(X)\) with the Hilbert space structures defined by the scalar products \(\left\langle \cdot ,\cdot \right\rangle _{s}\) and \(\left\langle \cdot ,\cdot \right\rangle _{-s},\) respectively. Note that the norm on \(H^{s}(X)/\mathbb {R}\) is increasing wrt s,  while the norm on \(H_{0}^{-s}(X)\) is decreasing wrt s. Moreover, by definition, we have that

$$\begin{aligned} \left\| \delta _{N}({\varvec{x}}_{N})-d\sigma _{g}\right\| _{H_{0}^{-s}(X)}=\text {wce }({\varvec{x}}_{N};s) \end{aligned}$$
(2.2)

where \(\text {wce }({\varvec{x}}_{N};s)\) is the worst-case error for the integration rule on X with node set \({\varvec{x}}_{N}\) with respect to the smoothness parameter \(s\in ]1,\infty [\) (defined by formula 1.5). By the Sobolev embedding theorem, \(\text {wce }({\varvec{x}}_{N};s)\) is finite precisely when \(s>\dim X/2.\)

By duality the operator \(\Delta _{g}\) is also defined on the space of all distributions \(\nu :\)

$$\begin{aligned} \left\langle \Delta _{g}\nu ,u\right\rangle :=\left\langle \nu ,\Delta _{g}u\right\rangle \end{aligned}$$

The following lemma follows directly from the definition of the Hilbert spaces in question:

Lemma 2.1

The operator \(\Delta _{g}\) induces isometries

$$\begin{aligned} H^{s}(X)/\mathbb {R}\rightarrow H^{s-2}(X)/\mathbb {R},\,\,H_{0}^{-s}(X)\rightarrow H_{0}^{-s-2}(X) \end{aligned}$$

Next, recall that, by the spectral theorem, the eigenfunctions \(f_{i}\) of \(\Delta _{g}\) in \(C^{\infty }(X)\) form an orthonormal basis for \(L^{2}(X,dV_{g}).\) The following lemma then follows directly by duality:

Lemma 2.2

There exists an orthonormal basis \(\nu _{1},\nu _{2},...\) in the Hilbert space \(\left\langle H_{0}^{-s}(X),\left\langle \cdot ,\cdot \right\rangle _{-s}\right\rangle \) such that

$$\begin{aligned} \nu _{i}=\lambda _{i}^{s/2}f_{i}dV_{g} \end{aligned}$$

(acting on \(C^{\infty }(X)\) by integration) where \(f_{i}\) runs over all eigenfunctions of the Laplacian on \(C^{\infty }(X)\) with strictly positive eigenvalues \(\lambda _{i}.\) As a consequence, if \(f_{i}\) is normalized so that \(\left\| f_{i}\right\| _{L^{2}}=1\) and

$$\begin{aligned} \nu =\sum _{i=1}^{\infty }c_{i}f_{i}dV_{g} \end{aligned}$$

in \(H_{0}^{-s}(X)\) then

$$\begin{aligned} \left\langle \nu ,\nu \right\rangle _{-s}=\sum _{i=1}^{\infty }\lambda _{i}^{-s}c_{i}^{2} \end{aligned}$$
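The norm identity of Lemma 2.2 can be illustrated by a small finite-dimensional computation. The following Python sketch (illustrative only, not part of the proofs; the truncation level and coefficients are arbitrary choices) checks that, in a truncated eigenbasis, the supremum defining \(\left\langle \nu ,\nu \right\rangle _{-s}\) is attained at the test function with eigen-coefficients \(\lambda _{i}^{-s}c_{i}\) and dominates random test functions, in accordance with the Cauchy-Schwarz inequality.

```python
import random

# Finite-dimensional illustration of Lemma 2.2: in an eigenbasis of the
# Laplacian, the dual H^{-s} norm of nu = sum_i c_i f_i dV is
# sum_i lambda_i^{-s} c_i^2, and the supremum in its definition is
# attained at the test function u with coefficients u_i = lambda_i^{-s} c_i.
random.seed(0)
s = 2.5
lam = [l * (l + 1) for l in range(1, 30)]        # two-sphere eigenvalues
c = [random.uniform(-1.0, 1.0) for _ in lam]     # coefficients of nu

dual_norm_sq = sum(li ** (-s) * ci ** 2 for li, ci in zip(lam, c))

def rayleigh(u):
    """<nu,u>^2 / <u,u>_s for a test function with eigen-coefficients u."""
    pairing = sum(ci * ui for ci, ui in zip(c, u))
    norm_s = sum(li ** s * ui ** 2 for li, ui in zip(lam, u))
    return pairing ** 2 / norm_s

# The optimal test function saturates the supremum ...
u_opt = [li ** (-s) * ci for li, ci in zip(lam, c)]
assert abs(rayleigh(u_opt) - dual_norm_sq) < 1e-12 * dual_norm_sq

# ... while arbitrary test functions stay below it (Cauchy-Schwarz).
for _ in range(100):
    u = [random.uniform(-1.0, 1.0) for _ in lam]
    assert rayleigh(u) <= dual_norm_sq * (1 + 1e-12)
```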

Remark 2.3

In the literature different Sobolev space norms on \(H^{s}(X)/\mathbb {R}\) are often used, obtained, for example, by replacing \(\Delta ^{s}\) in the last equality in formula 2.1 with \((I+\Delta )^{s}\) (as in [9, 10]) or, more generally, with any other elliptic pseudodifferential operator \(P_{s}\) of order s [12]. In any case, the norms on \(H^{s}(X)/\mathbb {R}\) defined by any two such operators are quasi-isometric, by elliptic regularity theory (see the discussions in [10, 12]). Hence, when the norm is changed, Theorem 1.3 still applies if \(\delta \) is replaced by \(C(\epsilon )\delta \) for a positive constant \(C(\epsilon )\) (and similarly for Theorems 1.1 and 1.2).

2.1.1 Spectral theory

Recall that the spectral zeta function of the Laplacian \(\Delta _{g}\) is defined by

$$\begin{aligned} \text {Tr}(\Delta _{g}^{-p}):=\sum _{i=1}^{\infty }\lambda _{i}^{-p} \end{aligned}$$

which is convergent for \(p>\dim X/2.\) More precisely,

$$\begin{aligned} \text {Tr}(\Delta _{g}^{-(d/2+\epsilon )})=\frac{1}{\Gamma (d/2)}\frac{\text {Vol}(X,g)}{(4\pi )^{d/2}}\frac{1}{\epsilon }+O(1),\,\,\,\epsilon \rightarrow 0^{+} \end{aligned}$$

as follows, for example, from the expansion of the heat kernel, i.e. the Schwartz kernel of \(e^{-t\Delta _{g}}.\) We will prove explicit estimates in the case of the two-sphere below.

The Fredholm (spectral) determinant of \(\Delta _{g}^{-p}\) is the function

$$\begin{aligned} D(\lambda ,p):=\det (I-\lambda \Delta _{g}^{-p}):=\prod _{i=1}^{\infty }(1-\lambda \lambda _{i}^{-p}) \end{aligned}$$

which is convergent for \(p>\dim X/2\) and \(\lambda \in ]0,\lambda _{1}^{p}[.\) Indeed, since the Taylor expansion of \(-\log (1-\lambda t)\) equals \(\sum _{m=1}^{\infty }\frac{(\lambda t)^{m}}{m},\)

$$\begin{aligned} -\log \det (I-\lambda \Delta _{g}^{-p}):=\sum _{m=1}^{\infty }\frac{\text {Tr}(\Delta ^{-mp})}{m}\lambda ^{m} \end{aligned}$$
(2.3)

2.1.2 The case of the two-sphere

Consider now the case when (Xg) is the two-sphere, i.e. the unit-sphere in \(\mathbb {R}^{3}\) endowed with the metric g induced by the Euclidean metric on \(\mathbb {R}^{3}.\) Note that under stereographic projection, whereby X minus the “north pole” is identified with \(\mathbb {R}^{2},\) we have

$$\begin{aligned} \Delta _{g}udV_{g}=-\left( \frac{\partial ^{2}u}{\partial x^{2}}+\frac{\partial ^{2}u}{\partial y^{2}}\right) dx\wedge dy,\,\,\,dV_{g}=\frac{4dx\wedge dy}{(1+x^{2}+y^{2})^{2}} \end{aligned}$$

Moreover, the set of non-zero eigenvalues of \(\Delta _{g}\) is given by all numbers of the form \(l(l+1),\) where l ranges over the positive integers. The eigenvalue \(l(l+1)\) has multiplicity \(2l+1.\)
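Given this explicit spectrum, the expansion 2.3 of the Fredholm determinant can be sanity-checked numerically. The sketch below (illustrative only; the truncation levels are arbitrary choices) compares the two sides of 2.3 over a truncated two-sphere spectrum, where the series over m converges geometrically since \(\lambda /\lambda _{1}^{p}=1/4.\)

```python
import math

# Numerical sanity check of the expansion (2.3) on the two-sphere:
# eigenvalues l(l+1) with multiplicity 2l+1, l <= L. Both sides are
# computed from the same truncated spectrum, so they should agree to
# roughly machine precision (p = 2, lambda = 1 < lambda_1^p = 4).
L, p, lam = 400, 2.0, 1.0
spectrum = [(l * (l + 1), 2 * l + 1) for l in range(1, L + 1)]

# left-hand side: -log det(I - lambda Delta^{-p}) over the truncation
lhs = -sum(m * math.log(1.0 - lam * ev ** (-p)) for ev, m in spectrum)

def trace(q):
    """Truncated spectral zeta value Tr(Delta^{-q})."""
    return sum(m * ev ** (-q) for ev, m in spectrum)

# right-hand side: sum_m lambda^m Tr(Delta^{-mp}) / m, truncated where
# the terms (of size at most ~ (lambda/4)^m) are negligible
rhs = sum(lam ** k * trace(k * p) / k for k in range(1, 60))

assert abs(lhs - rhs) < 1e-10
```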

Remark 2.4

Another convenient norm on \(H^{s}(X)/\mathbb {R}\) may be obtained by replacing \(\Delta _{g}\) with \(\Delta _{g}+1/4\) in formula 2.1 (compare the discussion in Remark 2.3). The point is that the eigenvalues of \(\Delta _{g}+1/4\) are given by \((l+1/2)^{2}.\) This implies that the corresponding spectral function may be explicitly expressed as \(\text {Tr}\left( (\Delta _{g}+4^{-1})^{-p}\right) =2^{2p-2}\zeta (2p-1),\) where \(\zeta \) is Riemann’s zeta function (see [35, page 453]).

We will use the following refinement of [10, Lemma 26].

Lemma 2.5

Let (Xg) be a \(d-\)dimensional Riemannian manifold of non-negative Ricci curvature. Assume that \(d/2<s'<s.\) Then

$$\begin{aligned} \text {wce }({\varvec{x}}_{N};s')\le c(s',s)\,\text {wce }({\varvec{x}}_{N};s)^{s'/s}, \end{aligned}$$

where the constant \(c(s',s)\) is given by

$$\begin{aligned} c(s',s)=\sqrt{\frac{\Gamma (s)}{\Gamma (s')}+\frac{2s^{s}e^{-s}}{\Gamma (s')(s'-d/2)}}. \end{aligned}$$

In particular, when \(d=2,\) \(c(2,2+\epsilon )<1.52<e^{1/2}\) for \(\epsilon \le 0.15.\)

Proof

The main difference compared to the case of the \(d-\)dimensional unit-sphere in [10, Lemma 26] is that the constant \(c(s',s)\) in [10, Lemma 26] depends on a non-explicit constant c such that for \(t\in ]0,\epsilon /2]\) the heat kernel \(K_{t}\) satisfies \(t^{d/2}K_{t}\le c\epsilon ^{d/2}K_{\epsilon }\) on \(X\times X.\) Here we observe, in particular, that one can take \(c=1\) and allow \(t\in ]0,\epsilon ],\) i.e.

$$\begin{aligned} t\in ]0,\epsilon ]\implies t^{d/2}K_{t}\le \epsilon ^{d/2}K_{\epsilon } \end{aligned}$$
(2.4)

as follows from the Li-Yau parabolic Harnack-inequality on any Riemannian manifold with non-negative Ricci curvature (apply [25, Thm 2.3] to \(u(x):=K_{t}(x,y)\) for y fixed and \(\alpha =1).\) Setting

$$\begin{aligned} \mu :=N^{-1}\sum _{i=1}^{N}\delta _{x_{i}}-d\sigma _{g} \end{aligned}$$

formula 2.2 may be expressed as

$$\begin{aligned} W(s)^{2}:=\text {wce }({\varvec{x}}_{N};s)^{2}=\left\| \mu \right\| _{(H^{s}(X)/\mathbb {R})^{*}}^{2}=\sum _{\lambda _{i}>0}\lambda _{i}^{-s}\left| \left\langle \mu ,f_{i}\right\rangle \right| ^{2}, \end{aligned}$$
(2.5)

where, as before, \(\lambda _{i}\) and \(f_{i}\) denote the eigenvalues and eigenfunctions of the Laplacian for (Xg). Using the identity \(\lambda ^{-s}=\int _{0}^{\infty }t^{s-1}e^{-t\lambda }dt/\Gamma (s)\) and that \(\int \mu =0\) it follows that

$$\begin{aligned} \text {wce }({\varvec{x}}_{N};s)^{2}=\frac{1}{\Gamma (s)}\int _{0}^{\infty }t^{s-1}g(t)dt, \end{aligned}$$
(2.6)

where

$$\begin{aligned} g(t):=\int \int {\mathcal {K}}_{t}(x,y)\mu \otimes \mu =\int \int K_{t}(x,y)\mu \otimes \mu ,\,\,\,\,{\mathcal {K}}_{t}(x,y):=\sum _{\lambda _{i}>0}e^{-t\lambda _{i}}f_{i}(x)f_{i}(y) \end{aligned}$$

with \({\mathcal {K}}_{t}\) denoting the heat-kernel \(K_{t}\) for (Xg) with the constant term removed (note that the formula in [10] corresponding to 2.6 contains a factor \(e^{-t}\) due to the different definition of the Sobolev norms in [10]). Now, setting

$$\begin{aligned} \epsilon :=\text {wce }({\varvec{x}}_{N};s)^{2/s}, \end{aligned}$$
(2.7)

we split the integral over t in formula 2.6, with s replaced by \(s',\) over the two disjoint regions \(]\epsilon ,\infty [\) and \(]0,\epsilon ]\) (in [10, Lemma 26] there are three regions and the role of \(\epsilon \) is played by \(\epsilon /2,\) but here we can take \(\epsilon \) since we will use the sharper estimate 2.4). First note that, since \(s'\le s,\)

$$\begin{aligned} \int _{\epsilon }^{\infty }t^{s'-1}g(t)dt= & {} \epsilon ^{s'}\int _{\epsilon }^{\infty }(t/\epsilon )^{s'}t^{-1}g(t)dt\le \epsilon ^{s'}\int _{\epsilon }^{\infty }(t/\epsilon )^{s}t^{-1}g(t)dt\\= & {} \epsilon ^{s'-s}\int _{\epsilon }^{\infty }t^{s-1}g(t)dt. \end{aligned}$$

Hence, bounding the latter integral with the integral over all of \([0,\infty [\) yields

$$\begin{aligned} \frac{1}{\Gamma (s')}\int _{\epsilon }^{\infty }t^{s'-1}g(t)dt\le \frac{\Gamma (s)}{\Gamma (s')}\left( \epsilon ^{s'-s}\frac{1}{\Gamma (s)}\int _{0}^{\infty }t^{s-1}g(t)dt\right) =\frac{\Gamma (s)}{\Gamma (s')}\epsilon ^{s'-s}\epsilon ^{s}=\frac{\Gamma (s)}{\Gamma (s')}\epsilon ^{s'} \end{aligned}$$

Turning to the region where \(t\in [0,\epsilon ],\) the estimate 2.4 for the heat-kernel \(K_{t}\) yields

$$\begin{aligned} g(t)\le t^{-d/2}\epsilon ^{d/2}g(\epsilon ). \end{aligned}$$

Hence,

$$\begin{aligned} \int _{0}^{\epsilon }t^{s'-1}g(t)dt\le \left( \int _{0}^{\epsilon }t^{s'-1-d/2}dt\right) \epsilon ^{d/2}g(\epsilon )=\frac{1}{s'-d/2}\epsilon ^{s'}g(\epsilon ). \end{aligned}$$

Finally, we will show that \(g(\epsilon )\) is uniformly bounded (there is a typo in the proof of [10, Lemma 26] saying that g(t) is uniformly bounded for \(t\in [0,1[,\) which contradicts the blow-up of the heat-kernel as \(t\rightarrow 0).\) First, by the very definition of the worst case error W(s), 

$$\begin{aligned} g(t)=\left\langle \mu ,\int {\mathcal {K}}_{t}(\cdot ,y)d\mu (y)\right\rangle \le W(s)\left\| \int {\mathcal {K}}_{t}(\cdot ,y)d\mu (y)\right\| _{H^{s}(X)}. \end{aligned}$$

The square of the latter Sobolev norm may be expressed as

$$\begin{aligned} \left\| \sum _{\lambda _{i}>0}e^{-t\lambda _{i}}\left\langle \mu ,f_{i}\right\rangle f_{i}(x)\right\| _{H^{s}(X)}^{2}=\sum _{\lambda _{i}>0}\lambda _{i}^{s}e^{-2t\lambda _{i}}\left| \left\langle \mu ,f_{i}\right\rangle \right| ^{2}. \end{aligned}$$

Hence, rewriting \(\lambda ^{s}=\lambda ^{-s}\lambda ^{2s},\)

$$\begin{aligned} \left\| \int {\mathcal {K}}_{t}(\cdot ,y)d\mu (y)\right\| _{H^{s}(X)}\le \left( \sum _{\lambda _{i}>0}\lambda _{i}^{-s}\left| \left\langle \mu ,f_{i}\right\rangle \right| ^{2}\right) ^{1/2}\sup _{\lambda >0}\lambda ^{s}e^{-t\lambda } \end{aligned}$$

By formula 2.5 the first factor equals W(s) and the second factor is given by

$$\begin{aligned} \sup _{\lambda >0}\lambda ^{s}e^{-t\lambda }=t^{-s}C(s),\,\,\,C(s):=s^{s}e^{-s} \end{aligned}$$

Since, by the definition 2.7, \(W(s)^{2}=\epsilon ^{s},\) setting \(t=\epsilon \) thus yields the desired uniform bound:

$$\begin{aligned} g(\epsilon )\le W(s)\cdot W(s)\epsilon ^{-s}C(s)=\epsilon ^{s}\epsilon ^{-s}C(s)=C(s)\le 2C(s). \end{aligned}$$

Finally, adding up the two contributions to the integral in formula 2.6 concludes the proof of the inequality in the lemma. Setting \(s'=2\) gives \(c(2,s)^{2}=\Gamma (s)+2s^{s}e^{-s}\) and hence, if \(\epsilon \le 0.15,\) then a numerical calculation yields \(c(2,2+\epsilon )\le \sqrt{1.073+2(2.15/e)^{2.15}}<1.52<e^{1/2}.\) \(\square \)
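The numerical calculation at the end of the proof can be reproduced directly. The following sketch (an illustrative check, not needed for the proof) evaluates \(c(2,2+\epsilon )\) from the closed form \(c(2,s)^{2}=\Gamma (s)+2s^{s}e^{-s}\) with \(s=2+\epsilon \); both terms are increasing in \(\epsilon ,\) so the bound at \(\epsilon =0.15\) covers the whole range.

```python
import math

# Numerical evaluation of the constant of Lemma 2.5 for d = 2, s' = 2:
# c(2, s)^2 = Gamma(s) + 2 s^s e^{-s}, with s = 2 + eps.
def c_const(eps):
    s = 2.0 + eps
    return math.sqrt(math.gamma(s) + 2.0 * s ** s * math.exp(-s))

# the bound used in the proof: c(2, 2.15) < 1.52 < e^{1/2}
assert c_const(0.15) < 1.52 < math.e ** 0.5

# c(2, 2+eps) is increasing in eps (both Gamma(s) and s^s e^{-s} are
# increasing here), so the bound holds for all eps <= 0.15
assert all(c_const(k * 0.15 / 100) <= c_const(0.15) + 1e-12
           for k in range(101))
```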

Lemma 2.6

On the two-sphere (Xg) the following relations hold for any \(\epsilon >0\)

$$\begin{aligned} \text {Tr}(\Delta _{g}^{-(1+\epsilon )})\le \frac{1}{\epsilon }+2,\,\,\,\text {Tr}(\Delta _{g}^{-2})=1 \end{aligned}$$

and hence

$$\begin{aligned} -\log \det (I-\lambda \Delta _{g}^{-(1+\epsilon )})\le \lambda \frac{1}{\epsilon }-4\log \left( 1-\frac{\lambda }{2}\right) . \end{aligned}$$

Proof

Setting \({\mathcal {Z}}(p):=\text {Tr}(\Delta _{g}^{-p}),\) for \(p\ge 1,\) we have

$$\begin{aligned} {\mathcal {Z}}(p)= & {} \sum _{l=1}^{\infty }\frac{2l+1}{l^{p}(l+1)^{p}}=2\sum _{l=1}^{\infty }\frac{1}{l^{p-1}(l+1)^{p}}+\sum _{l=1}^{\infty }\frac{1}{l^{p}(l+1)^{p}}\\\le & {} 2\sum _{l=1}^{\infty }\frac{1}{l^{p-1}(l+1)^{p}}+1, \end{aligned}$$

where the second sum was estimated by replacing p with 1 to get a telescoping sum. Next,

$$\begin{aligned} \sum _{l=1}^{\infty }\frac{1}{l^{p-1}(l+1)^{p}}\le \frac{1}{2}+\sum _{l=2}^{\infty }\frac{1}{l^{p-1}(l+1)^{p}}\le \frac{1}{2}+\sum _{l=2}^{\infty }\frac{1}{l^{2p-1}}=\frac{1}{2}+\zeta (2p-1)-1 \end{aligned}$$

(using the trivial bound \(l+1\ge l),\) where \(\zeta (s)\) is the Riemann zeta function. Set \(s=1+\delta .\) As is well-known, when \(s>1,\) a resummation argument gives (cf. [18, formula 3])

$$\begin{aligned} \zeta (s)\le \frac{s}{s-1}=1+\frac{1}{\delta }. \end{aligned}$$

Hence, setting \(p=1+\epsilon \) gives \(\zeta (2p-1)=\zeta (1+2\epsilon )\le 1+1/(2\epsilon ).\) All in all, this means that

$$\begin{aligned} {\mathcal {Z}}(p)\le 2\left( \frac{1}{2}+\left( 1+\frac{1}{2\epsilon }\right) -1\right) +1=2+\frac{1}{\epsilon }, \end{aligned}$$

proving the first inequality in the lemma. Next, note that \({\mathcal {Z}}(2)\) can be computed as a telescoping sum:

$$\begin{aligned} {\mathcal {Z}}(2)=\sum _{l=1}^{\infty }\frac{2l+1}{l^{2}(l+1)^{2}}=\sum _{l=1}^{\infty }\frac{(l+1)^{2}-l^{2}}{l^{2}(l+1)^{2}}=\sum _{l=1}^{\infty }\left( \frac{1}{l^{2}}-\frac{1}{(l+1)^{2}}\right) =1. \end{aligned}$$

Now, Taylor expanding \(-\log \det (I-\lambda \Delta _{g}^{-(1+\epsilon )})\) (as in formula 2.3) gives

$$\begin{aligned} -\log \det (I-\lambda \Delta _{g}^{-(1+\epsilon )})= & {} \sum _{m=1}^{\infty }\frac{\lambda ^{m}}{m}\text {Tr}\left( \Delta ^{-m(1+\epsilon )}\right) \\\le & {} \lambda \left( \frac{1}{\epsilon }+2\right) +\sum _{m=2}^{\infty }\frac{\lambda ^{m}}{m}\text {Tr}\left( \Delta ^{-m(1+\epsilon )}\right) . \end{aligned}$$

Since the smallest strictly positive eigenvalue of \(\Delta _{g}\) is equal to 2, rewriting \(\lambda _{i}=2(\lambda _{i}/2)\) and using that \(\lambda _{i}/2\ge 1\) now yields, for \(m\ge 2,\)

$$\begin{aligned} \text {Tr}\left( \Delta _{g}^{-m(1+\epsilon )}\right) \le 2^{2}2^{-m}\text {Tr}\left( \Delta _{g}^{-2}\right) =2^{2}2^{-m}\cdot 1. \end{aligned}$$

As a consequence,

$$\begin{aligned} \sum _{m=2}^{\infty }\frac{\lambda ^{m}}{m}\text {Tr}\left( \Delta _{g}^{-m(1+\epsilon )}\right) \le 2^{2}\sum _{m=2}^{\infty }\left( \frac{\lambda }{2}\right) ^{m}\frac{1}{m}=2^{2}\left( -\log \left( 1-\frac{\lambda }{2}\right) -\frac{\lambda }{2}\right) , \end{aligned}$$

using again that the Taylor expansion of \(-\log (1-t)\) equals \(\sum _{m=1}^{\infty }\frac{t^{m}}{m}.\) All in all, this means that \(-\log \det (I-\lambda \Delta _{g}^{-(1+\epsilon )})\) is bounded from above by

$$\begin{aligned} \lambda \left( \frac{1}{\epsilon }+2\right) -2\lambda -2^{2}\log \left( 1-\frac{\lambda }{2}\right) =\lambda \frac{1}{\epsilon }-4\log \left( 1-\frac{\lambda }{2}\right) . \end{aligned}$$

\(\square \)
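The two relations of Lemma 2.6 can be checked numerically from truncated spectral sums. In the sketch below (illustrative only), the partial sums of \({\mathcal {Z}}(2)\) telescope exactly to \(1-1/(L+1)^{2},\) and for \({\mathcal {Z}}(1+\epsilon )\) the crude tail estimate \((2l+1)/(l(l+1))^{1+\epsilon }\le 3l^{-1-2\epsilon },\) introduced only for this check, is added to the partial sum before comparing with the bound \(1/\epsilon +2.\)

```python
# Truncated numerical check of Lemma 2.6 on the two-sphere spectrum
# (eigenvalues l(l+1), multiplicity 2l+1).
L = 2000

# Z(2): the partial sums telescope to 1 - 1/(L+1)^2, consistent with Z(2) = 1
z2 = sum((2 * l + 1) / (l * (l + 1)) ** 2 for l in range(1, L + 1))
assert abs(z2 - (1.0 - 1.0 / (L + 1) ** 2)) < 1e-12

# Z(1+eps) <= 1/eps + 2: pad the partial sum with the tail bound
# sum_{l>L} 3 l^{-1-2 eps} <= 3 L^{-2 eps} / (2 eps)
for eps in (0.25, 0.5, 1.0):
    partial = sum((2 * l + 1) / (l * (l + 1)) ** (1 + eps)
                  for l in range(1, L + 1))
    tail = 3.0 * L ** (-2 * eps) / (2 * eps)
    assert partial + tail <= 1.0 / eps + 2.0
```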

Remark 2.7

By [28, Prop 5] \(\text {Tr}(\Delta _{g}^{-(1+\epsilon )})=1/\epsilon +2\gamma -1+o(1)\) as \(\epsilon \rightarrow 0^{+},\) where \(\gamma =0.577...\) is Euler’s constant. But the point of the previous lemma is to bound the error term explicitly for \(\epsilon \) fixed.

3 Proofs of the main results

Fix \(s>2\) and a positive integer N. Consider the following \(H_{0}^{-s}(X)-\)valued random variable on \((X^{N},d\mathbb {P}_{N}):\)

$$\begin{aligned} Y_{N}:=N(\delta _{N}-d\sigma ):\,\,\,(X^{N},d\mathbb {P}_{N})\rightarrow H_{0}^{-s}(X), \end{aligned}$$

where \(\delta _{N}\) denotes the empirical measure 1.1 (the space \(H_{0}^{-s}(X)\) contains the image of \(Y_{N}\) for any \(s>1,\) but the restriction to \(s>2\) will turn out to be important in the following). To keep things as elementary as possible it will be convenient to consider truncated random variables taking values in finite dimensional approximations of \(H_{0}^{-s}(X)\) (but a more direct approach could also be used; see Remark 3.2). To this end fix an orthonormal basis \(\nu _{i}\) in the Hilbert space \(\left\langle H_{0}^{-s}(X),\left\langle \cdot ,\cdot \right\rangle _{-s}\right\rangle .\) It will be convenient to take \(\nu _{i}\) as in Lemma 2.2, ordered so that \(0<\lambda _{1}\le \lambda _{2}\le \cdots .\) Let \(H_{\le M}^{-s}(X)\) be the \(M-\)dimensional subspace of \(H_{0}^{-s}(X)\) defined by

$$\begin{aligned} H_{\le M}^{-s}(X):=\mathbb {R}\nu _{1}\oplus \cdots \oplus \mathbb {R}\nu _{M}\subset H_{0}^{-s}(X) \end{aligned}$$

Denote by \(\pi _{M}\) the orthogonal projection from the Hilbert space \(H_{0}^{-s}(X)\) onto the \(M-\)dimensional subspace \(H_{\le M}^{-s}(X).\)

3.1 Step 1: \(\pi _{M}(Y_{N})\) is a sub-Gaussian random variable

We can view \(\pi _{M}(Y_{N})\) as a random variable on \((X^{N},d\mathbb {P}_{N})\) taking values in \(H_{\le M}^{-s}(X):\)

$$\begin{aligned} \pi _{M}(Y_{N}):\,\,\,(X^{N},d\mathbb {P}_{N})\rightarrow H_{\le M}^{-s}(X). \end{aligned}$$

The first step of the proof is to compare \(\pi _{M}(Y_{N})\) with a Gaussian random variable \(G_{M}\) taking values in \(H_{\le M}^{-s}(X).\) To this end we first endow \(H_{\le M}^{-s}(X)\) with the Hilbert structure defined by the scalar product \(\left\langle \cdot ,\cdot \right\rangle _{-1}\) and denote by \(\gamma _{M}\) the Gaussian measure on \(\left\langle H_{\le M}^{-s}(X),\left\langle \cdot ,\cdot \right\rangle _{-1}\right\rangle .\) Concretely, this means that under any linear isometry of \(\left\langle H_{\le M}^{-s}(X),\left\langle \cdot ,\cdot \right\rangle _{-1}\right\rangle \) with \(\mathbb {R}^{M}\) the measure \(\gamma _{M}\) corresponds to the standard centered Gaussian measure on \(\mathbb {R}^{M}.\) Now define \(G_{M}\) as the random element in \(H_{\le M}^{-s}(X)\) with law \(\gamma _{M}.\) In other words, \(G_{M}\) is the random variable defined by the identity map

$$\begin{aligned} G_{M}:=I\,\,\,(H_{\le M}^{-s}(X),\gamma _{M})\rightarrow H_{\le M}^{-s}(X) \end{aligned}$$

The following proposition says that the moment generating function of the random variable \(\pi _{M}(Y_{N}),\) viewed as a function on the linear dual \((H_{\le M}^{-s}(X))^{*}\) of \(H_{\le M}^{-s}(X),\) is bounded from above by the moment generating function of the scaled Gaussian random variable \(G_{M}/(4\pi )^{1/2}.\)

Proposition 3.1

The following inequality holds:

$$\begin{aligned} \mathbb {E}(e^{\left\langle \pi _{M}(Y_{N}),\cdot \right\rangle })\le \mathbb {E}\left( e^{\left\langle \frac{1}{\sqrt{4\pi }}G_{M},\cdot \right\rangle }\right) \end{aligned}$$
(3.1)

Equivalently, denoting by \(p_{N}^{(M)}\) the law of \(\pi _{M}(Y_{N}),\) i.e. the probability measure on \(H_{\le M}^{-s}(X)\) defined by the push-forward of \(d\mathbb {P}_{N}\) under the map \(\pi _{M}(Y_{N}),\) we have

$$\begin{aligned} L[p_{N}^{(M)}]\le L[F_{*}\gamma _{M}],\,\,\,\,\,F(v):=\frac{v}{\sqrt{4\pi }} \end{aligned}$$

where \(L[\Gamma ]\) denotes the Laplace transform of a measure \(\Gamma \) on the finite dimensional vector space \(V:=H_{\le M}^{-s}(X),\) i.e. \(L[\Gamma ]\) is the function on \(V^{*}\) defined by

$$\begin{aligned} L[\Gamma ](w)=\int _{V}e^{-\left\langle v,w\right\rangle }d\Gamma (v) \end{aligned}$$

Proof

Applying the Moser-Trudinger type inequality 1.11 for the spherical ensemble proved in [4] to \(f=w/(N+1)\) for \(w\in H^{1}(X)\) gives

$$\begin{aligned} \mathbb {E}(e^{\left\langle Y_{N},w\right\rangle })\le e^{\frac{N}{N+1}\frac{1}{2}\frac{1}{4\pi }\left\| w\right\| _{H^{1}(X)}^{2}}\le e^{\frac{1}{2}\frac{1}{4\pi }\left\| w\right\| _{H^{1}(X)}^{2}} \end{aligned}$$

In particular, taking \(w\in (H_{\le M}^{-s}(X))^{*},\) identified with a subspace of \(H^{1}(X)/\mathbb {R},\) gives \(\left\langle Y_{N},w\right\rangle =\left\langle \pi _{M}(Y_{N}),w\right\rangle \) and hence it will be enough to verify that

$$\begin{aligned} e^{\frac{1}{2}\left\| w\right\| _{H^{1}(X)}^{2}}=\mathbb {E}\left( e^{\left\langle G_{M},w\right\rangle }\right) \left( =L[\gamma _{M}](-w)\right) \end{aligned}$$
(3.2)

To this end first note that under the identifications above \(\left\| w\right\| _{H^{1}(X)}^{2}\) coincides with the dual norm on the Hilbert space dual of \(\left\langle H_{\le M}^{-s}(X),\left\langle \cdot ,\cdot \right\rangle _{-1}\right\rangle .\) But then formula 3.2 follows from the well-known fact that if \(\gamma \) is the Gaussian measure on a finite dimensional Hilbert space H,  then

$$\begin{aligned} L[\gamma ](w)=\exp \left( \frac{1}{2}\left\| w\right\| _{H^{*}}^{2}\right) . \end{aligned}$$
(3.3)

Indeed, fixing an orthonormal basis in H this reduces to the basic fact that the Laplace transform of the standard Gaussian probability measure \((2\pi )^{-M/2}e^{-|x|^{2}/2}dx\) on Euclidean \(\mathbb {R}^{M}\) is equal to \(e^{|y|^{2}/2},\) which, in turn, follows from “completing the square”. \(\square \)
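The one-dimensional case of this basic fact can be checked by direct quadrature. The sketch below is an illustrative numerical check (the half-width and step size are ad hoc choices): a midpoint rule confirms that the Laplace transform of the standard Gaussian probability measure on \(\mathbb {R}\) equals \(e^{y^{2}/2},\) in accordance with “completing the square”.

```python
import math

# Midpoint-rule check of the fact behind (3.3) in one dimension:
# int exp(x*y) (2*pi)^{-1/2} exp(-x^2/2) dx = exp(y^2/2).
def gaussian_laplace(y, half_width=15.0, n=60000):
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        x = -half_width + (k + 0.5) * h
        total += math.exp(x * y - 0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)

for y in (0.0, 0.7, 1.3, 2.0):
    # completing the square: the integrand is exp(y^2/2) times a
    # Gaussian density centered at y, which integrates to 1
    assert abs(gaussian_laplace(y) - math.exp(0.5 * y * y)) < 1e-9
```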

Remark 3.2

In the terminology introduced by Kahane, the previous inequality says that the random variable \(\pi _{M}(Y_{N})\) is sub-Gaussian with respect to the Gaussian random variable \(\frac{1}{\sqrt{4\pi }}G_{M}.\) In fact, by letting \(M\rightarrow \infty \) this implies that \(Y_{N}\) is sub-Gaussian with respect to the Laplacian of the Gaussian free field [33], viewed as random variables taking values in \(H_{0}^{-(2+\epsilon )}(X).\) This point of view will be elaborated on in [3].

3.2 Step 2: Bounding \(\mathbb {E}(e^{\left\| \pi _{M}(Y_{N})\right\| _{-s}^{2}})\)

We start with the following general lemma.

Lemma 3.3

Let H be a finite dimensional Hilbert space and denote by \(\gamma \) the corresponding Gaussian probability measure. If \(\Gamma \) is a measure on H such that the following inequality of Laplace transforms holds

$$\begin{aligned} L[\Gamma ]\le L[\gamma ] \end{aligned}$$

as functions on the dual vector space \(H^{*},\) then

$$\begin{aligned} \int e^{q}\Gamma \le \int e^{q}\gamma \end{aligned}$$
(3.4)

for the squared semi-norm q(v) defined by any given positive semi-definite symmetric bilinear form on H.

Proof

First observe that the inequality 3.4 holds for any function q on H which has the following positivity property: \(e^{q}\) is the Laplace transform of a positive measure \(\mu _{q}\) on \(H^{*},\) i.e.

$$\begin{aligned} e^{q(v)}=\int _{w\in H^{*}}e^{-\left\langle v,w\right\rangle }d\mu _{q}(w) \end{aligned}$$

Indeed, changing the order of integration (using Fubini) the integral of \(e^{q}\) against \(\Gamma \) may be expressed as

$$\begin{aligned} \int _{v\in H}\left( \int _{w\in H^{*}}e^{-\left\langle v,w\right\rangle }d\mu _{q}(w)\right) d\Gamma (v)=\int _{w\in H^{*}}\left( \int _{v\in H}e^{-\left\langle v,w\right\rangle }d\Gamma (v)\right) d\mu _{q}(w). \end{aligned}$$

Hence, by assumption,

$$\begin{aligned} \int _{v\in H}e^{q(v)}d\Gamma (v)\le \int _{w\in H^{*}}\left( \int _{v\in H}e^{-\left\langle v,w\right\rangle }d\gamma (v)\right) d\mu _{q}(w), \end{aligned}$$

which is equal to \(\int e^{q}\gamma \) (as seen by changing the order of integration again). All that remains is thus to verify the positivity property in question when q is a squared semi-norm. Identifying H with Euclidean \(\mathbb {R}^{M}\) and diagonalizing q we may as well assume that \(q=\sum |a_{i}x_{i}|^{2}/2\) for \(a_{i}\ge 0.\) Moreover, by approximation we may as well assume that \(a_{i}>0.\) But then it follows from formula 3.3 and scaling the variables that the measure \(d\mu _{q}=C\exp \left( -\sum |a_{i}^{-1}y_{i}|^{2}/2\right) dy_{1}...dy_{M}\) has the required property for an appropriate positive constant C. \(\square \)

In the present situation we get the following result.

Proposition 3.4

For any \(\alpha >0\) the following inequality holds:

$$\begin{aligned} \mathbb {E}\left( e^{\alpha \left\| \pi _{M}(Y_{N})\right\| _{-s}^{2}}\right) \le \mathbb {E}\left( e^{\frac{\alpha }{4\pi }\left\| G_{M}\right\| _{-s}^{2}}\right) =\prod _{i\le M}\left( 1-\frac{\alpha }{2\pi }\lambda _{i}^{-(s-1)}\right) ^{-1/2} \end{aligned}$$
(3.5)

Proof

The first inequality follows directly from combining Prop 3.1 and Lemma 3.3 with \(\gamma =F_{*}\gamma _{M}\) and \(q(v):=\alpha \left\| v\right\| _{-s}^{2}.\) To prove the last equality denote by \(v_{i}\) an orthonormal basis in the Hilbert space \(\left( H_{\le M}^{-s}(X),\left\langle \cdot ,\cdot \right\rangle _{-1}\right) .\) Given \(v\in H_{\le M}^{-s}(X)\) we decompose \(v=\sum _{i=1}^{M}v_{i}x_{i}\) and note that

$$\begin{aligned} \left\| v\right\| _{-s}^{2}=\sum _{i=1}^{M}x_{i}^{2}\lambda _{i}^{(1-s)} \end{aligned}$$

as follows from writing

$$\begin{aligned} \left\| v_{i}\right\| _{-s}^{2}:=\left\langle \Delta ^{-s}v_{i},v_{i}\right\rangle _{L^{2}}=\left\langle \Delta ^{-1}\left( \Delta ^{1-s}v_{i}\right) ,v_{i}\right\rangle _{L^{2}}=\lambda _{i}^{(1-s)}\left\langle \Delta ^{-1}v_{i},v_{i}\right\rangle _{L^{2}}=\lambda _{i}^{(1-s)}. \end{aligned}$$

Hence,

$$\begin{aligned} \mathbb {E}\left( e^{\frac{\alpha }{4\pi }\left\| G_{M}\right\| _{-s}^{2}}\right) =\prod _{i\le M}\int e^{\frac{\alpha }{4\pi }x_{i}^{2}\lambda _{i}^{(1-s)}}e^{-x_{i}^{2}/2}dx_{i}/Z_{M}, \end{aligned}$$

where \(Z_{M}\) is the total integral of \(\prod _{i\le M}e^{-x_{i}^{2}/2}dx_{i}.\) Finally, changing variables \(x_{i}\rightarrow \left( 1-\frac{\alpha }{2\pi }\lambda _{i}^{(1-s)}\right) ^{-1/2}x_{i}\) in the corresponding Gaussian integrals then concludes the proof of the proposition. \(\square \)

Letting \(M\rightarrow \infty \) and using the monotone convergence theorem now concludes the proof of Theorem 1.2.
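The one-dimensional Gaussian integral underlying the product formula in Prop 3.4 can be sanity-checked by quadrature. The sketch below is illustrative only (the half-width and step size are ad hoc choices): for a standard normal X and \(c<1/2\) it confirms \(\mathbb {E}(e^{cX^{2}})=(1-2c)^{-1/2},\) which in the proof is applied with \(c=\frac{\alpha }{4\pi }\lambda _{i}^{1-s}.\)

```python
import math

# Midpoint-rule check of the Gaussian integral behind (3.5): for a
# standard normal X and c < 1/2, E(exp(c X^2)) = (1 - 2c)^{-1/2}.
def exp_square_moment(c, half_width=40.0, n=200000):
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        x = -half_width + (k + 0.5) * h
        total += math.exp((c - 0.5) * x * x)
    return total * h / math.sqrt(2.0 * math.pi)

for c in (0.0, 0.1, 0.3, 0.45):
    assert abs(exp_square_moment(c) - (1.0 - 2.0 * c) ** -0.5) < 1e-8
```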

3.3 Proof of Theorem 1.2 and Theorem 1.3

Set \(s=2+\epsilon \) and \(\lambda =\alpha /2\pi .\) By Lemma 2.6

$$\begin{aligned} \prod _{i\le M}\left( 1-\lambda \lambda _{i}^{-(s-1)}\right) ^{-1}\le \det (I-\lambda \Delta _{g}^{-(1+\epsilon )})^{-1}\le \exp \left( f(\lambda )\right) , \end{aligned}$$
(3.6)

where

$$\begin{aligned} f(\lambda ):=\lambda \frac{1}{\epsilon }-4\log \left( 1-\frac{\lambda }{2}\right) . \end{aligned}$$

Hence, for any fixed positive integer M Prop 3.4 gives

$$\begin{aligned} \mathbb {E}_{N}\left( e^{\alpha \left\| \pi _{M}(Y_{N})\right\| _{-(2+\epsilon )}^{2}}\right) \le \det \left( I-\lambda \Delta _{g}^{-(1+\epsilon )}\right) ^{-1/2}\le \exp \left( \frac{1}{2}f\left( \frac{\alpha }{2\pi }\right) \right) . \end{aligned}$$

Letting \(M\rightarrow \infty \) and using the monotone convergence theorem we deduce that

$$\begin{aligned} \mathbb {E}_{N}\left( e^{\alpha \left\| Y_{N}\right\| _{-(2+\epsilon )}^{2}}\right) \le \det \left( I-\lambda \Delta _{g}^{-(1+\epsilon )}\right) ^{-1/2}\le \exp \left( \frac{1}{2}f\left( \frac{\alpha }{2\pi }\right) \right) , \end{aligned}$$

proving Theorem 1.2. Next, by Chebyshev’s inequality, we can write \(\mathbb {P}_{N}\left( \left\| \delta _{N}-\sigma \right\| _{H^{-s}}>\delta \right) \) as

$$\begin{aligned} \mathbb {P}_{N}\left( \alpha \left\| Y_{N}\right\| _{H^{-(2+\epsilon )}(X)}^{2}>\alpha \delta ^{2}N^{2}\right)\le & {} e^{-\alpha \delta ^{2}N^{2}}\mathbb {E}_{N}\left( e^{\alpha \left\| Y_{N}\right\| _{-(2+\epsilon )}^{2}}\right) \\\le & {} \exp \left( -\alpha \delta ^{2}N^{2}+\frac{1}{2}f\left( \frac{\alpha }{2\pi }\right) \right) \end{aligned}$$

Finally, since \(\lambda _{1}=2\) on the two-sphere the determinant in the theorem is finite for \(\lambda <2^{1+\epsilon },\) i.e. when \(\alpha <4\pi 2^{\epsilon }.\)

3.4 The optimal choice of \(\alpha \)

Setting \(\lambda :=\alpha /2\pi \) we have

$$\begin{aligned} \mathbb {P}_{N}\left( \left\| \delta _{N}-\sigma \right\| _{H^{-(2+\epsilon )}(X)}>\delta \right)&\le \exp \left( \frac{1}{2}\left( -4\pi \delta ^{2}N^{2}\lambda +f(\lambda )\right) \right) ,\,\nonumber \\ f(\lambda )&:=\lambda \frac{1}{\epsilon }-4\log \left( 1-\frac{\lambda }{2}\right) . \end{aligned}$$
(3.7)

First observe that \(f(\lambda )\) is strictly convex on \([0,2[,\) with \(f(0)=0\) and \(f(2^{-})=\infty .\) Hence, the optimal choice of \(\lambda \) satisfies

$$\begin{aligned} 4\pi \delta ^{2}N^{2}=f'(\lambda )=\epsilon ^{-1}+\frac{2}{1-\lambda /2},\,\,\,\lambda \in ]0,2[ \end{aligned}$$
(3.8)

if such a \(\lambda \) exists. Introducing the parameters \(R\in ]0,\infty [\) and \(\eta \in ]-1,\infty [\) determined by

$$\begin{aligned} R^{2}:=\delta ^{2}N^{2}\epsilon ,\,\,\,\eta :=4\pi R^{2}-1 \end{aligned}$$

the equation 3.8 for \(\lambda \) becomes

$$\begin{aligned} \eta =\frac{2\epsilon }{1-\lambda /2}\iff \lambda =2\left( 1-\frac{2\epsilon }{\eta }\right) \end{aligned}$$

assuming that R is sufficiently large to ensure that \(\lambda \in ]0,2[,\) i.e. that

$$\begin{aligned} \eta >2\epsilon . \end{aligned}$$

As a consequence, for this optimal \(\lambda ,\) the bracket appearing in the exponent in the estimate 3.7 becomes

$$\begin{aligned} -(4\pi \delta ^{2}N^{2}\epsilon )\epsilon ^{-1}\lambda +f(\lambda )= & {} -(\eta +1)\epsilon ^{-1}\lambda +f(\lambda )\\= & {} \left( -(\eta +1)+1\right) \epsilon ^{-1}\lambda +4\log \left( \left( 1-\frac{\lambda }{2}\right) ^{-1}\right) \\= & {} 2\left( -\eta \right) \left( \epsilon ^{-1}-\frac{2}{\eta }\right) +4\log \frac{\eta }{2\epsilon }\\= & {} 2\left( -\eta \right) \epsilon ^{-1}+4\left( 1+\log \frac{\eta }{2\epsilon }\right) \end{aligned}$$

In particular, taking \(\epsilon :=1/\log N\) gives, for any \(\eta >2/\log N,\)

$$\begin{aligned}{} & {} \mathbb {P}_{N}\left( \left\| \delta _{N}-\sigma \right\| _{H^{-(2+1/\log N)}(X)}>R\frac{(\log N)^{1/2}}{N}\right) \nonumber \\{} & {} \quad \le \exp \left( \frac{1}{2}\left( -4\pi \delta ^{2}N^{2}\lambda +f(\lambda )\right) \right) =\frac{1}{N^{\eta }}\left( \frac{1}{2}\log N\right) ^{2}\exp \left( 2\left( 1+\log \eta \right) \right) . \end{aligned}$$
(3.9)
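The algebra behind the optimal choice of \(\lambda \) can be verified numerically. The following sketch (illustrative only; the sample values of \(\epsilon \) and \(\eta \) are arbitrary) checks that the closed form \(\lambda =2(1-2\epsilon /\eta )\) solves equation 3.8 and that the exponent bracket equals \(-2\eta /\epsilon +4(1+\log (\eta /(2\epsilon ))).\)

```python
import math

# Numerical check of the optimal lambda in (3.8): with
# eta = 4*pi*delta^2*N^2*eps - 1 > 2*eps, the closed form
# lambda = 2(1 - 2*eps/eta) solves f'(lambda) = 4*pi*delta^2*N^2.
eps, eta = 0.1, 1.0
target = (eta + 1.0) / eps        # = 4*pi*delta^2*N^2, by definition of eta

f = lambda l: l / eps - 4.0 * math.log(1.0 - l / 2.0)
f_prime = lambda l: 1.0 / eps + 2.0 / (1.0 - l / 2.0)

lam = 2.0 * (1.0 - 2.0 * eps / eta)
assert 0.0 < lam < 2.0            # admissible since eta > 2*eps
assert abs(f_prime(lam) - target) < 1e-10

# the exponent bracket and its closed form agree
bracket = -target * lam + f(lam)
closed_form = -2.0 * eta / eps + 4.0 * (1.0 + math.log(eta / (2.0 * eps)))
assert abs(bracket - closed_form) < 1e-10
```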

3.5 Proof of Theorem 1.1

By Lemma 2.5 it will be enough to prove the inequality for \(s=2.\) Consider the real-valued random variable \(W(s):=\text {wce }({\varvec{x}}_{N};s)\) on \((X^{N},d\mathbb {P}_{N}).\) Applying Theorem 1.3 to \(\delta =R\epsilon ^{-1/2}/N\) it will be enough to show the following claim when \(\epsilon :=1/\log N\) under the assumption that \(\epsilon ^{1/2}\le R\le N\epsilon ^{1/2}:\)

$$\begin{aligned} \text {claim:\,}W(2+\epsilon )\le R\frac{\epsilon ^{-1/2}}{N}\implies W(2)\le CR\frac{\epsilon ^{-1/2}}{N},\,\,\,C:=e^{1/2}c(2,2+\epsilon ), \end{aligned}$$

where \(c(2,2+\epsilon )\) is defined in Lemma 2.5. To this end recall that, by assumption, \(W(2+\epsilon )\le 1\) and hence, by Lemma 2.5 and since \(\epsilon \le 1,\) we have

$$\begin{aligned} W(s)\le cW(2+\epsilon )^{\frac{s}{2+\epsilon }},\,\,\,c=c(2,2+\epsilon ) \end{aligned}$$

In particular,

$$\begin{aligned} W(2)\le cW(2+\epsilon )^{\frac{2}{2+\epsilon }}\le cW(2+\epsilon )^{(1-\epsilon /2)} \end{aligned}$$

using that \(1/(1+t)\ge 1-t\) if \(t\ge 0\) and that \(W(2+\epsilon )\le 1.\) Hence, \(W(2+\epsilon )\le R\frac{\epsilon ^{-1/2}}{N}\) implies that

$$\begin{aligned} W(2)\le c\left( R\frac{\epsilon ^{-1/2}}{N}\right) ^{(1-\epsilon /2)}=cR\frac{\epsilon ^{-1/2}}{N}\left( R\frac{\epsilon ^{-1/2}}{N}\right) ^{-\epsilon /2}. \end{aligned}$$

But, by assumption, \(R\epsilon ^{-1/2}\ge 1\) and hence

$$\begin{aligned} \left( R\frac{\epsilon ^{-1/2}}{N}\right) ^{-\epsilon /2}=\left( R\epsilon ^{-1/2}\right) ^{-\epsilon /2}N^{\epsilon /2}\le N^{\epsilon /2}=\left( N^{1/\log N}\right) ^{1/2}=e^{1/2}, \end{aligned}$$

since \(\epsilon :=1/\log N.\) \(\square \)

Remark 3.5

In particular, if \(1/\log N\le 0.15\) (for example, \(N=1000)\) then Lemma 2.5 gives \(C\le e\) and hence \(W(2)\le eR\frac{(\log N)^{1/2}}{N}.\)

3.6 Explicit formulation of Theorem 1.1

Combining the previous remark with the explicit bound in formula 3.9 gives the following explicit formulation of the first inequality in Theorem 1.1

$$\begin{aligned} \mathbb {P}_{N}\left( \text {wce }({\varvec{x}}_{N};2)\le e\sqrt{\frac{1+\eta }{4\pi }}\frac{(\log N)^{1/2}}{N}\right) \ge 1-\frac{\left( \frac{1}{2}e\eta \log N\right) ^{2}}{N^{\eta }}, \end{aligned}$$

under the assumption that \(\eta >2/\log N\) and \(N\ge 1000\).
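The explicit bound above is easy to evaluate numerically. The sketch below (illustrative only; \(N=1000\) and \(\eta =1\) are sample values) computes the threshold and the failure probability bound \(\left( \frac{1}{2}e\eta \log N\right) ^{2}/N^{\eta },\) which is already below \(9\%\) at \(N=1000\) and decays polynomially in N.

```python
import math

# Evaluation of the explicit form of Theorem 1.1 at sample values.
def threshold(N, eta):
    """The worst-case-error threshold e*sqrt((1+eta)/(4*pi))*sqrt(log N)/N."""
    return (math.e * math.sqrt((1.0 + eta) / (4.0 * math.pi))
            * math.sqrt(math.log(N)) / N)

def failure_bound(N, eta):
    """The failure probability bound ((1/2)*e*eta*log N)^2 / N^eta."""
    return (0.5 * math.e * eta * math.log(N)) ** 2 / N ** eta

# eta = 1 satisfies the admissibility condition eta > 2/log N for N = 1000
assert 2.0 / math.log(1000) < 1.0
# the bound is already nontrivial (below 9%) at N = 1000 ...
assert failure_bound(1000, 1.0) < 0.09
# ... and improves polynomially as N grows
assert failure_bound(10 ** 6, 1.0) < failure_bound(1000, 1.0)
assert threshold(1000, 1.0) > 0.0
```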