Finding global minima via kernel approximations

  • Full Length Paper
  • Series A
  • Published in: Mathematical Programming

Abstract

We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function, which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum. This is done by using infinite sums of squares of smooth functions and has strong links with polynomial sum-of-squares hierarchies. Leveraging recent representation properties of reproducing kernel Hilbert spaces, the infinite-dimensional optimization problem can be solved by subsampling in time polynomial in the number of function evaluations, and with theoretical guarantees on the obtained minimum. Given n samples, the computational cost is \(O(n^{3.5})\) in time and \(O(n^2)\) in space, and we achieve a convergence rate to the global optimum that is \(O(n^{-m/d + 1/2 + 3/d})\), where m is the degree of differentiability of the function and d the number of dimensions. The rate is nearly optimal in the case of Sobolev functions and more generally makes the proposed method particularly suitable for functions with many derivatives. Indeed, when m is on the order of d, the convergence rate to the global optimum does not suffer from the curse of dimensionality, which affects only the worst-case constants (that we track explicitly throughout the paper).


Algorithm 1
Fig. 1
Fig. 2


References

  1. Novak, E.: Deterministic and Stochastic Error Bounds in Numerical Analysis, vol. 1349. Springer, Berlin (2006)

  2. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer, Berlin (2013)

  3. Ivanov, V.V.: On optimum minimization algorithms in classes of differentiable functions. In: Doklady Akademii Nauk, vol. 201, pp. 527–530. Russian Academy of Sciences, Moscow (1971)

  4. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems: Standard Information for Functionals, vol. 12. European Mathematical Society, Helsinki (2008)

  5. Osborne, M.A., Garnett, R., Roberts, S.J.: Gaussian processes for global optimization. In: International Conference on Learning and Intelligent Optimization (LION3), pp. 1–15 (2009)

  6. Lasserre, J.-B.: Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11(3), 796–817 (2001)

  7. Laurent, M.: Sums of squares, moment matrices and optimization over polynomials. In: Emerging Applications of Algebraic Geometry, pp. 157–270. Springer, Berlin (2009)

  8. Lasserre, J.-B.: Moments, Positive Polynomials and Their Applications, vol. 1. World Scientific, Singapore (2010)

  9. Marteau-Ferey, U., Bach, F., Rudi, A.: Non-parametric models for non-negative functions. Adv. Neural. Inf. Process. Syst. 33, 12816–12826 (2020)

  10. Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, Berlin (2011)

  11. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Elsevier, Amsterdam (2003)

  12. Lasserre, J.-B., Toh, K.-C., Yang, S.: A bounded degree SOS hierarchy for polynomial optimization. EURO J. Comput. Optim. 5(1–2), 87–117 (2017)

  13. Marx, S., Pauwels, E., Weisser, T., Henrion, D., Lasserre, J.: Semi-algebraic approximation using Christoffel–Darboux kernel. Technical Report arXiv:1904.01833 (2019)

  14. Nie, J.: Optimality conditions and finite convergence of Lasserre’s hierarchy. Math. Program. 146(1–2), 97–121 (2014)

  15. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)

  16. Paulsen, V.I., Raghupathi, M.: An Introduction to the Theory of Reproducing Kernel Hilbert Spaces, vol. 152. Cambridge University Press, Cambridge (2016)

  17. Steinwart, I., Christmann, A.: Support Vector Machines. Springer, Berlin (2008)

  18. Wendland, H.: Scattered Data Approximation, vol. 17. Cambridge University Press, Cambridge (2004)

  19. Del Moral, P., Niclas, A.: A Taylor expansion of the square root matrix function. J. Math. Anal. Appl. 465(1), 259–266 (2018)

  20. Narcowich, F.J., Ward, J.D., Wendland, H.: Refined error estimates for radial basis function interpolation. Constr. Approx. 19(4), 541–564 (2003)

  21. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  22. Penrose, M.: Random Geometric Graphs, vol. 5. Oxford University Press, Oxford (2003)

  23. Nemirovski, A.: Interior point polynomial time methods in convex programming. Lecture notes (2004)

  24. Bach, F.: Breaking the curse of dimensionality with convex neural networks. J. Mach. Learn. Res. 18(1), 629–681 (2017)

  25. Lasserre, J.-B.: A sum of squares approximation of nonnegative polynomials. SIAM Rev. 49(4), 651–669 (2007)

  26. Lasserre, J.-B.: A new look at nonnegativity on closed sets and polynomial optimization. SIAM J. Optim. 21(3), 864–885 (2011)

  27. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)

  28. Montaz Ali, M., Khompatraporn, C., Zabinsky, Z.B.: A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. J. Global Optim. 31(4), 635–672 (2005)

  29. Jamil, M., Yang, X.S.: A literature survey of benchmark functions for global optimisation problems. Int. J. Math. Model. Numer. Optim. 4(2), 150 (2013)

  30. Molga, M., Smutnicki, C.: Test functions for optimization needs (2005)

  31. Henrion, D., Lasserre, J.-B., Löfberg, J.: GloptiPoly 3: moments, optimization and semidefinite programming. Optim. Methods Softw. 24, 10 (2007)

  32. Lasserre, J.-B.: The moment-SOS hierarchy and the Christoffel–Darboux kernel. Technical Report arXiv:2011.08566 (2020)

  33. Nesterov, Y.: Squared functional systems and optimization problems. In: High Performance Optimization, pp. 405–440. Springer, Berlin (2000)

  34. Slot, L., Laurent, M.: Near-optimal analysis of Lasserre’s univariate measure-based bounds for multivariate polynomial optimization. Math. Program. 188, 443–460 (2020)

  35. Zhou, D.-X.: Derivative reproducing properties for kernel methods in learning theory. J. Comput. Appl. Math. 220(1–2), 456–463 (2008)

  36. Bach, F.: Sharp analysis of low-rank kernel matrix approximations. In: Conference on Learning Theory, pp. 185–209 (2013)

  37. Rudi, A., Camoriano, R., Rosasco, L.: Less is more: Nyström computational regularization. Adv. Neural. Inf. Process. Syst. 28, 1657–1665 (2015)

  38. Rudi, A., Rosasco, L.: Generalization properties of learning with random features. Adv. Neural. Inf. Process. Syst. 30, 3215–3225 (2017)

  39. Bach, F.: On the equivalence between kernel quadrature rules and random feature expansions. J. Mach. Learn. Res. 18(1), 714–751 (2017)

  40. Hörmander, L.: The Analysis of Linear Partial Differential Operators I: Distribution Theory and Fourier Analysis. Springer, Berlin (2015)

  41. Brenner, S., Scott, R.: The Mathematical Theory of Finite Element Methods, vol. 15. Springer, Berlin (2007)

  42. Weidmann, J.: Linear Operators in Hilbert Spaces, vol. 68. Springer, Berlin (1980)

  43. Bhatia, R.: Matrix Analysis, vol. 169. Springer, Berlin (2013)

  44. Olver, F.W.J., Lozier, D.W., Boisvert, R.F., Clark, C.W.: NIST Handbook of Mathematical Functions. Cambridge University Press, Cambridge (2010)

  45. Sickel, W.: Superposition of functions in Sobolev spaces of fractional order. A survey. Banach Center Publ. 27, 481–497 (1992)


Acknowledgements

We would like to thank Jean-Bernard Lasserre and Edouard Pauwels for their feedback on an earlier version of the manuscript. This work was funded in part by the French government under the management of the Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). We also acknowledge support from the European Research Council (Grants SEQUOIA 724063 and REAL 947908).

Author information

Correspondence to Alessandro Rudi.


Appendices

Additional notation and definitions

We provide here some basic notation that will be used in the rest of the appendices.

Multi-index notation. For \(\alpha \in \mathbb {N}^d\), \(x \in \mathbb {R}^d\) and an infinitely differentiable function f on \(\mathbb {R}^d\), we introduce the following notation:

$$\begin{aligned} |\alpha | = \sum _{j \in [d]} \alpha _j, \quad \alpha ! = \prod _{j \in [d]} \alpha _j!, \quad x^\alpha = \prod _{j \in [d]} x_j^{\alpha _j}, \quad \partial ^\alpha f = \frac{\partial ^{|\alpha |} f}{\partial x_1^{\alpha _1}\cdots \partial x_d^{\alpha _d}}. \end{aligned}$$
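
The multi-index conventions above are easy to mirror in code; the following helpers are ours (illustrative only, not from the paper):

```python
import math

# Illustrative implementations of the multi-index notation above.
def abs_alpha(alpha):
    """|alpha| = sum_j alpha_j"""
    return sum(alpha)

def fact_alpha(alpha):
    """alpha! = prod_j alpha_j!"""
    out = 1
    for a in alpha:
        out *= math.factorial(a)
    return out

def monomial(x, alpha):
    """x^alpha = prod_j x_j^{alpha_j}"""
    out = 1.0
    for xj, aj in zip(x, alpha):
        out *= xj ** aj
    return out

alpha = (2, 0, 1)              # a multi-index in N^3 with |alpha| = 3
x = (2.0, 5.0, 3.0)
assert abs_alpha(alpha) == 3
assert fact_alpha(alpha) == 2  # 2! * 0! * 1! = 2
assert monomial(x, alpha) == 2.0**2 * 3.0  # x_1^2 * x_2^0 * x_3^1
```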

Some useful spaces of functions. Let \(\varOmega \) be an open set. In this paper we denote by \(C^s(\varOmega )\), \(s \in \mathbb {N}\), the set of s-times differentiable functions on \(\varOmega \), and by \(C^s_0(\varOmega )\) the set of functions that are differentiable at least s times and compactly supported in \(\varOmega \). We denote by \(L^p(\varOmega )\) the Lebesgue space of p-integrable functions with respect to the Lebesgue measure, and by \(\Vert \cdot \Vert _{L^p(\varOmega )}\) the associated norm [11].

1.1 Fourier Transform

Given two functions \(f,g:\varOmega \rightarrow \mathbb {R}\) on some set \(\varOmega \), we denote by \(f \cdot g\) the pointwise product of f and g, i.e.,

$$\begin{aligned} (f \cdot g)(x) = f(x)g(x), \quad \forall x \in \varOmega . \end{aligned}$$

For \(f, g \in L^1(\mathbb {R}^d)\), we denote their convolution by \(f \star g\):

$$\begin{aligned} (f \star g)(x) = \int _{\mathbb {R}^d} f(y) g(x-y) dy. \end{aligned}$$

Let \(f \in L^1(\mathbb {R}^d)\). The Fourier transform of f is denoted by \(\tilde{f}\) and is defined as

$$\begin{aligned} \tilde{f}(\omega ) = (2\pi )^{-\frac{d}{2}}\int _{\mathbb {R}^d} e^{-i \,\omega ^\top x} \,f(x)\, dx. \end{aligned}$$

We now recall some basic properties that will be used in the rest of the appendix.

Proposition 2

(Basic properties of the Fourier transform; [18], Chapter 5.2).

  1. (a)

    Let \(f \in L^1(\mathbb {R}^d)\) and let \(r > 0\). Denote by \(\tilde{f}\) its Fourier transform and by \(f_r\) the function \(f_r(x) = f(x/r)\) for all \(x \in \mathbb {R}^d\), then

    $$\begin{aligned} \tilde{f}_r(\omega ) = r^d \tilde{f}(r\omega ). \end{aligned}$$
  2. (b)

    Let \(f, g \in L^1(\mathbb {R}^d)\), then

    $$\begin{aligned} \widetilde{f \cdot g} = (2\pi )^{d/2} \tilde{f} \star \tilde{g}. \end{aligned}$$
  3. (c)

    Let \(\alpha \in \mathbb {N}_0^d\), \(f: \mathbb {R}^d \rightarrow \mathbb {R}\) and \(f, \partial ^\alpha f \in L^1(\mathbb {R}^d)\), then

    $$\begin{aligned} \widetilde{\partial ^\alpha f}\,(\omega ) = i^{|\alpha |} \omega ^\alpha \tilde{f}(\omega ), \quad \forall \omega \in \mathbb {R}^d. \end{aligned}$$
  4. (d)

    Let \(f \in L^1(\mathbb {R}^d)\), then

    $$\begin{aligned} \Vert \tilde{f}\Vert _{L^\infty (\mathbb {R}^d)} \le (2\pi )^{-d/2} \Vert f\Vert _{L^1(\mathbb {R}^d)}. \end{aligned}$$
  5. (e)

    Let \(f \in L^1(\mathbb {R}^d)\) and assume that \(\tilde{f} \in L^1(\mathbb {R}^d)\), then

    $$\begin{aligned} f(x) = (2\pi )^{-\frac{d}{2}}\int _{\mathbb {R}^d} e^{i \,\omega ^\top x} \,\tilde{f}(\omega )\, d\omega , \quad \text {and} \quad \Vert f\Vert _{L^\infty (\mathbb {R}^d)} \le (2\pi )^{-d/2} \Vert \tilde{f}\Vert _{L^1(\mathbb {R}^d)}. \end{aligned}$$
  6. (f)

    There exists a linear isometry \({{\mathcal {F}}}: L^2(\mathbb {R}^d) \rightarrow L^2(\mathbb {R}^d)\) satisfying

    $$\begin{aligned} {{\mathcal {F}}} f = \tilde{f}, \quad f \in L^2(\mathbb {R}^d) \cap L^1(\mathbb {R}^d). \end{aligned}$$

    The isometry is uniquely determined by the property in the equation above. For any \(f \in L^2(\mathbb {R}^d)\) we denote by \(\tilde{f}\) the function \(\tilde{f} = {{\mathcal {F}}} f\).
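
These conventions can be sanity-checked numerically; the sketch below (ours, not from the paper, assuming NumPy) verifies the normalization and the scaling property (a) in d = 1 for a Gaussian, whose transform is known in closed form.

```python
import numpy as np

# Numerical check of f~(w) = (2*pi)^(-d/2) * int e^{-i w x} f(x) dx  (d = 1)
# and of Proposition 2(a), for the Gaussian f(x) = exp(-x^2/2).
x = np.linspace(-30.0, 30.0, 200001)
dx = x[1] - x[0]

def fourier(g, w):
    # real part of the transform via a Riemann sum; valid since g is real and even
    return np.sum(np.cos(w * x) * g) * dx / np.sqrt(2 * np.pi)

f = np.exp(-x**2 / 2)  # self-dual under this convention: f~(w) = exp(-w^2/2)
w = 1.3
assert abs(fourier(f, w) - np.exp(-w**2 / 2)) < 1e-6

# scaling, Proposition 2(a): f_r(x) = f(x/r)  =>  f_r~(w) = r^d f~(r w)
r = 2.0
f_r = np.exp(-(x / r)**2 / 2)
assert abs(fourier(f_r, w) - r * np.exp(-(r * w)**2 / 2)) < 1e-6
```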

1.2 Sobolev Spaces

For this section we refer to [11]. For any \(\alpha \in \mathbb {N}_0^d\) we say that \(v_\alpha \in L^1_{loc}(\mathbb {R}^d)\) is the \(\alpha \)-weak derivative of \(u \in L^1_{loc}(\mathbb {R}^d)\) if, for all compactly supported smooth functions \(\tau \in C^\infty _0(\mathbb {R}^d)\), we have

$$\begin{aligned} \int _{\mathbb {R}^d} v_\alpha (x) \tau (x) dx = (-1)^{|\alpha |}\int _{\mathbb {R}^d} u(x) (\partial ^\alpha \tau )(x) dx, \end{aligned}$$

and we denote \(v_\alpha \) by \(D^\alpha u\). Let \(\varOmega \subseteq \mathbb {R}^d\) be an open set. For \(s \in \mathbb {N}, p \in [1,\infty ]\) the Sobolev spaces \(W^s_p(\varOmega )\) are defined as

$$\begin{aligned} W^s_p(\varOmega ) = \{f \in L^p(\varOmega ) ~|~ \Vert f\Vert _{W^s_p(\varOmega )} < \infty \}, \quad \Vert f\Vert _{W^s_p(\varOmega )} = \sum _{|\alpha | \le s} \Vert D^\alpha f\Vert _{L^p(\varOmega )}. \end{aligned}$$
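
As a numerical illustration of the weak-derivative definition above (ours, not from the paper, assuming NumPy): \(v(x) = \mathrm{sign}(x)\) is the weak derivative of \(u(x) = |x|\) on \(\mathbb {R}\), which the integration-by-parts identity confirms against a smooth compactly supported test function.

```python
import numpy as np

# Check int v * tau dx = - int u * tau' dx for u(x) = |x|, v(x) = sign(x),
# and a smooth, compactly supported, non-symmetric test function tau on (-1, 1).
x = np.linspace(-1, 1, 200001)
dx = x[1] - x[0]

bump = np.where(np.abs(x) < 1, np.exp(-1.0 / np.maximum(1 - x**2, 1e-300)), 0.0)
tau = x * bump                               # x * bump is still C^infty with compact support
tau_prime = np.gradient(tau, dx)             # second-order finite-difference derivative

lhs = np.sum(np.sign(x) * tau) * dx          # int v * tau
rhs = -np.sum(np.abs(x) * tau_prime) * dx    # - int u * tau'
assert abs(lhs - rhs) < 1e-6 and lhs > 0
```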

We now recall some basic results about Sobolev spaces that are useful for the proofs in this paper. We start by recalling the restriction properties of Sobolev spaces. Let \(\varOmega \subseteq \varOmega ' \subseteq \mathbb {R}^d\) be two open sets, and let \(s \in \mathbb {N}\) and \(p \in [1,\infty ]\). By definition of the Sobolev norm above we have

$$\begin{aligned} \Vert g|_\varOmega \Vert _{W^s_p(\varOmega )} \le \Vert g\Vert _{W^s_p(\varOmega ')}, \end{aligned}$$

and so \(g|_\varOmega \in W^s_p(\varOmega )\) for any \(g \in W^s_p(\varOmega ')\). Now we recall the extension properties of Sobolev spaces.

Proposition 3

(Extension operator, 5.24 in [11]). Let \(\varOmega \) be a bounded open subset of \(\mathbb {R}^d\) with locally Lipschitz boundary [11]. Let \(\beta \in \mathbb {N}\) and \(p \in [1,\infty ]\). There exist a bounded operator \(E:W^\beta _p(\varOmega ) \rightarrow W^\beta _p(\mathbb {R}^d)\) and a constant \(C_3\), depending only on \(\beta , p, \varOmega \), such that for any \(h \in W^\beta _p(\varOmega )\) the following holds: (a) \(h = (Eh)|_\varOmega \); (b) \(\Vert Eh\Vert _{W^\beta _p(\mathbb {R}^d)} \le C_3 \Vert h\Vert _{W^\beta _p(\varOmega )}\), with \(C_3 = \Vert E\Vert _{op}\).

1.3 Reproducing Kernel Hilbert spaces

For this section we refer to [15,16,17]. Let S be a set and \(k:S \times S \rightarrow \mathbb {R}\) be a p.d. kernel. We denote by \(\mathcal {H}_k(S)\) the reproducing kernel Hilbert space (RKHS) associated to the kernel k, and by \(\left\langle {\cdot },{\cdot }\right\rangle _k\) the associated inner product. We will omit the dependence on k of \(\mathcal {H}\) and \(\left\langle {\cdot },{\cdot }\right\rangle \) when the kernel used is clear from the context. We will also omit the dependence on S when \(S = \varOmega \), the region considered in this paper. In particular, we will use the shortcuts \(\mathcal {H}= \mathcal {H}_k(\varOmega )\) and \(\mathcal {H}(\mathbb {R}^d) = \mathcal {H}_k(\mathbb {R}^d)\).

Concrete constructions and useful characterizations. In the rest of the section we provide other methods to build RKHSs, together with some useful characterizations of \(\mathcal {H}_k(S)\) and \(\left\langle {\cdot },{\cdot }\right\rangle _k\) that will be used in the rest of the appendix.

Proposition 4

(Construction of RKHS given \(S, \phi \), Thm. 4.21 of [17]). Let \(\phi : S \rightarrow V\) be a continuous map, where V is a separable Hilbert space with inner product \(\left\langle {\cdot },{\cdot }\right\rangle _V\). Let \(k(x,x') = \left\langle {\phi (x)},{\phi (x')}\right\rangle _V\) for any \(x,x' \in S\). Then k is a p.d. kernel and the associated RKHS is characterized as follows:

$$\begin{aligned} \mathcal {H}_k(S) = \{\left\langle {w},{\phi (\cdot )}\right\rangle _V~|~ w \in V\}, \quad \Vert f\Vert _{\mathcal {H}_k(S)} = \inf _{u \in V} \Vert u\Vert _V ~~s.t.~~ f = \left\langle {u},{\phi (\cdot )}\right\rangle _V. \end{aligned}$$
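
A finite-dimensional instance of Proposition 4 makes the construction concrete; the example below (ours, illustrative only, with \(V = \mathbb {R}^3\)) uses the feature map \(\phi (x) = (1, x, x^2)\).

```python
import numpy as np

# Feature map phi(x) = (1, x, x^2), so k(x, x') = <phi(x), phi(x')>_V,
# and every f in H_k(S) is of the form f = <w, phi(.)>_V for some w in V.
def phi(x):
    return np.array([1.0, x, x**2])

def k(x, y):
    return phi(x) @ phi(y)

# the Gram matrix of a p.d. kernel is positive semidefinite
X = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])
K = np.array([[k(a, b) for b in X] for a in X])
eigs = np.linalg.eigvalsh(K)
assert eigs.min() > -1e-10

# f(x) = 2 - x^2 corresponds to w = (2, 0, -1), with ||f||_{H_k} <= ||w||_V
w = np.array([2.0, 0.0, -1.0])
assert abs(w @ phi(0.5) - 1.75) < 1e-12
```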

Proposition 5

(Restriction of an RKHS \(\mathcal {H}_{k_1}(S_1)\) to a subset \(S_0 \subset S_1\) [15, 16]). Let \(k_0\) be the restriction to \(S_0\) of the kernel \(k_1\) defined on \(S_1\). Then the following holds:

  1. (a)

    \(k_0\) is a p.d. kernel,

  2. (b)

    the RKHS \(\mathcal {H}_{k_0}(S_0)\) is characterized as \(\mathcal {H}_{k_0}(S_0) = \{ f|_{S_0} ~|~ f \in \mathcal {H}_{k_1}(S_1)\}\),

  3. (c)

    the norm \(\Vert \cdot \Vert _{\mathcal {H}_{k_0}(S_0)}\) is characterized by

    $$\begin{aligned} \Vert f\Vert _{\mathcal {H}_{k_0}(S_0)} = \inf _{g \in \mathcal {H}_{k_1}(S_1)} \Vert g\Vert _{\mathcal {H}_{k_1}(S_1)}, ~~~s.t.~~~ f(x) = g(x) ~\forall x \in S_0, \end{aligned}$$
  4. (d)

there exists a bounded linear extension operator \(E:\mathcal {H}_{k_0}(S_0) \rightarrow \mathcal {H}_{k_1}(S_1)\) such that \((E f)(x) = f(x)\) for any \(x \in S_0\) and \(f \in \mathcal {H}_{k_0}(S_0)\), and such that

    $$\begin{aligned} \Vert f\Vert _{\mathcal {H}_{k_0}(S_0)} = \Vert Ef\Vert _{\mathcal {H}_{k_1}(S_1)}, \quad \forall f \in \mathcal {H}_{k_0}(S_0), \end{aligned}$$
  5. (e)

there exists a bounded linear restriction operator \(R:\mathcal {H}_{k_1}(S_1) \rightarrow \mathcal {H}_{k_0}(S_0)\) such that \((R f)(x) = f(x)\) for any \(x \in S_0\) and \(f \in \mathcal {H}_{k_1}(S_1)\),

  6. (f)

    R and E are partial isometries. In particular \(E = R^*\) and RE is the identity on \(\mathcal {H}_{k_0}(S_0)\), while ER is a projection operator on \(\mathcal {H}_{k_1}(S_1)\).

Proposition 6

(Translation invariant kernels on \(\mathbb {R}^d\)). Let \(v:\mathbb {R}^d \rightarrow \mathbb {R}\) such that its Fourier transform \(\tilde{v}\) is integrable and satisfies \(\tilde{v} \ge 0\) on \(\mathbb {R}^d\). Then

  1. (a)

The function \(k:\mathbb {R}^d \times \mathbb {R}^d \rightarrow \mathbb {R}\) defined as \(k(x,x') = v(x-x')\) for any \(x,x' \in \mathbb {R}^d\) is a p.d. kernel, called a translation-invariant kernel.

  2. (b)

    The RKHS \(\mathcal {H}_k(\mathbb {R}^d)\) and the norm \(\Vert \cdot \Vert _{\mathcal {H}_k(\mathbb {R}^d)}\) are characterized by

    $$\begin{aligned} \mathcal {H}_k(\mathbb {R}^d) = \{f \in L^2(\mathbb {R}^d) ~|~ \Vert f\Vert _{\mathcal {H}_k(\mathbb {R}^d)} < \infty \}, \quad \Vert f\Vert ^2_{\mathcal {H}_k(\mathbb {R}^d)} = (2\pi )^{-\tfrac{d}{2}}\int _{\mathbb {R}^d} \frac{|({{\mathcal {F}}}f)(\omega )|^2}{\tilde{v}(\omega )}\,d\omega , \end{aligned}$$

    where \({{\mathcal {F}}} f\) is the Fourier transform of f (see Proposition 2 for more details on \({{\mathcal {F}}}\)).

  3. (c)

    The inner product \(\left\langle {\cdot },{\cdot }\right\rangle _k\) is characterized by

    $$\begin{aligned} \left\langle {f},{g}\right\rangle _k = (2\pi )^{-\tfrac{d}{2}} \int _{\mathbb {R}^d} \frac{({{\mathcal {F}}}f)(\omega )\overline{({{\mathcal {F}}}g)(\omega )}}{\tilde{v}(\omega )} d\omega . \end{aligned}$$
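
Proposition 6 can be sanity-checked numerically in d = 1 with the Gaussian \(v(x) = e^{-x^2/2}\), whose transform under the paper's convention is \(\tilde{v}(\omega ) = e^{-\omega ^2/2} \ge 0\); the sketch below (ours, assuming NumPy) checks positive semidefiniteness of the Gram matrix and the norm formula for \(f = k(\cdot ,0) = v\).

```python
import numpy as np

# (a) the Gram matrix of the translation-invariant kernel k(x, x') = v(x - x')
# is positive semidefinite when v~ >= 0 and integrable (Bochner-type statement)
X = np.array([-2.0, -0.5, 0.1, 0.7, 3.0])
K = np.exp(-0.5 * (X[:, None] - X[None, :])**2)
assert np.linalg.eigvalsh(K).min() > -1e-12

# (b) for f = k(., 0) = v, the norm formula gives
# ||f||^2 = (2*pi)^(-1/2) int |v~|^2 / v~ dw = (2*pi)^(-1/2) int e^{-w^2/2} dw = 1,
# matching the reproducing property ||k(., 0)||^2 = k(0, 0) = v(0) = 1.
w = np.linspace(-30, 30, 200001)
norm_sq = np.sum(np.exp(-w**2 / 2)) * (w[1] - w[0]) / np.sqrt(2 * np.pi)
assert abs(norm_sq - 1.0) < 1e-8
```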

1.4 Auxiliary results on \(C^\infty \) functions

Proposition 7

Let U be an open set of \(\mathbb {R}^d\) and \(K \subset U\) be a compact set. Let \(u \in C^\infty (U)\), then there exists \(v \in C^\infty _0(\mathbb {R}^d)\) (with compact support), such that \(v(x) = u(x)\) for all \(x \in K\).

Proof

By Thm. 1.4.1, p. 25 of [40] there exists \(z_{K,U} \in C_0^\infty (U)\), i.e., a smooth function with compact support, such that \(z_{K,U}(x) \in [0,1]\) for any \(x \in U\) and \(z_{K,U}(x) = 1\) for any \(x \in K\). Consider now the function \(v_{K,U}\) defined as \(v_{K,U}(x) = z_{K,U}(x)u(x)\) for all \(x \in U\). The function \(v_{K,U}\) is in \(C^\infty _0(U)\), since it is the product of a \(C^\infty _0(U)\) function and a \(C^\infty (U)\) function; moreover \(v_{K,U}(x) = u(x)\) for all \(x \in K\). The proof is concluded by defining v as the extension of \(v_{K,U}\) to \(\mathbb {R}^d\), i.e., \(v(x) = v_{K,U}(x)\) for any \(x \in U\) and \(v(x) = 0\) for any \(x \in \mathbb {R}^d\setminus U\). This extension is smooth since \(v_{K,U}\) is supported on a compact set \(K'\) contained in the open set U, so \(v_{K,U}\) is already identically zero on the open set \(U \setminus K'\). \(\square \)

Lemma 6

Given \(\zeta \in \mathbb {R}^d\) and \(r > 0\), there exists \(u \in C^\infty _0(\mathbb {R}^d)\) such that for any \(x \in \mathbb {R}^d\), it holds

  1. (i)

    \(u(x) \in [0,1]\);

  2. (ii)

    \(\Vert x - \zeta \Vert \ge r \implies u(x) = 0\);

  3. (iii)

    \(\Vert x - \zeta \Vert \le r/2 \implies u(x) = 1\).

Proof

Assume without loss of generality that \(\zeta = 0\) and \(r = 1\), and consider the following functions:

$$\begin{aligned} u_1(x) = {\left\{ \begin{array}{ll} \exp \left( -\frac{1}{1 - \Vert x\Vert ^2}\right) &{}\text {if} \,\Vert x\Vert <1 \\ 0 &{}\text {otherwise}\end{array}\right. },\qquad u_2(x) = {\left\{ \begin{array}{ll} \exp \left( -\frac{1}{ \Vert x\Vert ^2 -1/4}\right) &{}\text {if} \, \Vert x\Vert > 1/2 \\ 0 &{}\text {otherwise}\end{array}\right. }. \end{aligned}$$

Both \(u_1\) and \(u_2\) belong to \(C^{\infty }(\mathbb {R}^d)\) and take values in [0, 1]. Moreover, \(u_1 > \alpha _1\) on \(B_{3/4}(0)\) and \(u_2 \ge \alpha _2\) on \(\mathbb {R}^d \setminus {B_{3/4}(0)}\) for some \(\alpha _1, \alpha _2 > 0\), which implies that \(u_1 + u_2\) takes values in \(I\) on \(\mathbb {R}^d\), where \(I = [\min (\alpha _1,\alpha _2), 2]\). Since \((\cdot )^{-1}\) is infinitely differentiable on \((0,\infty )\) and \(I \subset \subset (0,\infty )\), the function \(1/(u_1+u_2)\) is well defined on all of \(\mathbb {R}^d\) and belongs to \(C^\infty (\mathbb {R}^d)\). Consider the function

$$\begin{aligned} u_0 = \frac{u_1}{u_1 + u_2}. \end{aligned}$$

It is non-negative, bounded by 1, and infinitely differentiable, being the product of \(u_1 \in C^\infty (\mathbb {R}^d)\) and \(1/(u_1+u_2) \in C^\infty (\mathbb {R}^d)\). Moreover:

$$\begin{aligned}{} & {} \forall x \in B_{1/2}(0),~ u_2(x) = 0 \implies u_0(x) = 1,\\{} & {} \forall x \in \mathbb {R}^d,~ u_1(x) = 0 \Leftrightarrow u_0(x) =0 \Leftrightarrow x \in \mathbb {R}^d\setminus {B_1(0)}. \end{aligned}$$

To conclude the proof, given \(r > 0\) and \(\zeta \in \mathbb {R}^d\) we will take \(u(x) = u_0((x-\zeta )/r)\). \(\square \)
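
The construction in this proof is concrete enough to implement directly; the check below (ours, in d = 1, assuming NumPy) builds \(u_0\) and verifies properties (i)–(iii) on a grid. Floating-point underflow of exp conveniently returns exact zeros where the bumps vanish.

```python
import numpy as np

# The bump construction from the proof of Lemma 6 (zeta = 0, r = 1 by default).
def u1(t):  # t = ||x||^2; supported where t < 1
    return np.where(t < 1, np.exp(-1.0 / np.maximum(1 - t, 1e-300)), 0.0)

def u2(t):  # supported where t > 1/4
    return np.where(t > 0.25, np.exp(-1.0 / np.maximum(t - 0.25, 1e-300)), 0.0)

def u0(x, zeta=0.0, r=1.0):
    t = ((x - zeta) / r)**2
    return u1(t) / (u1(t) + u2(t))  # denominator is bounded away from 0

x = np.linspace(-2, 2, 4001)
u = u0(x)
assert np.all((0 <= u) & (u <= 1))        # (i)   values in [0, 1]
assert np.all(u[np.abs(x) >= 1.0] == 0)   # (ii)  u = 0 outside B_r(zeta)
assert np.all(u[np.abs(x) <= 0.5] == 1)   # (iii) u = 1 on B_{r/2}(zeta)
```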

Lemma 7

Let \(N \in \mathbb {N}_+\), \(\zeta _1,...,\zeta _N \in \mathbb {R}^d\) and \(r_1,...,r_N >0\). For \(n \in \{1,\dots ,N\}\), let \(B_n = B_{r_n}(\zeta _n)\) be the open ball centered at \(\zeta _n\) of radius \(r_n\) and \(B^{\prime }_n = B_{r_n /2}(\zeta _n) \subset B_n\) be the open ball centered at \(\zeta _n\) of radius \(r_n/2\). Then there exist functions \(v_0,v_1,...,v_N \in C^\infty (\mathbb {R}^d)\) such that

  1. (i)

    \(v_0 = v_0\cdot \varvec{1}_{\mathbb {R}^d \setminus {\bigcup _{n =1}^N B^{\prime }_n}}\)

  2. (ii)

    \(v_n = v_n\cdot \varvec{1}_{B_n},~ \forall n \in \{1,\dots ,N\}\)

  3. (iii)

    \(\sum _{n =0}^N{v_n^2} = 1\).

Proof

For all \(n \in [N]\), take \(u_n\) as in Lemma 6 with \(r = r_n, \zeta = \zeta _n\), and define \(u_0 = \prod _{n=1}^N{(1-u_n)}\). Since \(u_n \in [0,1]\) for all \(n \in [N]\), we also have \(u_0 \in [0,1]\). Moreover, let \(R = \max _{n \in [N]}{(\Vert \zeta _n\Vert + r_n)}\); then

$$\begin{aligned} \forall \Vert x\Vert \ge R,~ \forall 1 \le n \le N,~u_n(x) = 0 \text { and } u_0(x) = 1. \end{aligned}$$

Step 1. \(u_0\cdot \varvec{1}_{\mathbb {R}^d \setminus {\bigcup _{n \in [N]}B^{\prime }_n}} = u_0\) and for all \(n \in [N]\), \(u_n \cdot \varvec{1}_{B_n} = u_n\).

By point (iii) of Lemma 6, \(u_n = 1\) on \(B^{\prime }_n\) for all \(n \in [N]\), which shows that \(u_0 = 0\) on \(\bigcup _{n = 1}^N{B^{\prime }_n}\) and hence \(u_0\cdot \varvec{1}_{\mathbb {R}^d \setminus {\bigcup _{n \in [N]}B^{\prime }_n}} = u_0\). On the other hand, for all \(n \in [N]\), point (ii) of Lemma 6 directly implies \(u_n \cdot \varvec{1}_{B_n} = u_n\).

Step 2. The function \(\frac{1}{\sqrt{\sum _{n = 0}^N{u_n^2}}}\) is well defined and in \(C^\infty (\mathbb {R}^d)\).

By definition of \(u_0\), if \(u_0(x) = 0\), then there exists \(n \in [N]\) such that \(u_n(x) = 1\). Since all the \(u_n\) are non-negative, this shows that \(s := \sum _{n =0}^N{u_n^2} > 0\) everywhere. Moreover, consider the closed ball \(\bar{B}\) of radius R centered at 0. Since \(\bar{B}\) is compact, s is continuous and \(s(x) > 0\) for any \(x \in \bar{B}\), there exist \(0< m_R \le M_R < \infty \) such that \(s(x) \in [m_R, M_R]\) for any \(x \in \bar{B}\). Moreover, since \(u_0(x) = 1\) and \(u_n(x) = 0\) for all \(n \in [N]\) whenever \(\Vert x\Vert \ge R\), we see that

$$\begin{aligned} \forall x \in \mathbb {R}^d \setminus {B_R(0)},~ \sum _{n =0}^N{u_n^2(x)} = 1. \end{aligned}$$

Then \(s(x) \in [m, M]\) for any \(x \in \mathbb {R}^d\), where \(m = \min (m_R,1)\) and \(M = \max (M_R,1)\).

Since the interval \(I = [m,M]\) is a compact set included in the open set \((0,\infty )\) and \({1}/{\sqrt{\cdot }}\) is infinitely differentiable on \((0,\infty )\) then by Proposition 7 there exists \(q_I \in C^\infty _0(\mathbb {R})\) such that \(q_I(x) = 1/\sqrt{x}\) for any \(x \in I\). Since \(s(x) \in I\) for any \(x \in \mathbb {R}^d\) we have

$$\begin{aligned} \frac{1}{\sqrt{\sum _{n = 0}^N{u_n^2}}} = q_I \circ s. \end{aligned}$$

Finally, \(q_I \circ s \in C^\infty (\mathbb {R}^d)\), since it is the composition of \(q_I \in C_0^\infty (\mathbb {R})\) with \(s = \sum _{n =0}^N{u_n^2} \in C^\infty (\mathbb {R}^d)\) (all the \(u_n\) being in \(C^\infty (\mathbb {R}^d)\)).

Step 3.

Finally, defining \(v_n = \frac{u_n}{\sqrt{\sum _{j =0}^N{u_j^2}}}\) for all \(0 \le n \le N\), each \(v_n \in C^\infty (\mathbb {R}^d)\), since it is the product of two infinitely differentiable functions. Moreover, \(\sum _{n=0}^N v_n^2 = 1\) by construction, and \(v_0 = v_0\cdot \varvec{1}_{\mathbb {R}^d \setminus {\bigcup _{n =1}^N B^{\prime }_n}}\) since \(u_0\) satisfies the same equality and \(v_0\) is the product of \(u_0\) with the strictly positive function \(1/\sqrt{s}\). Analogously, \(v_n = v_n\cdot \varvec{1}_{B_n}\) for all \(n \in \{1,\dots ,N\}\), since \(u_n\) satisfies the same equality and \(v_n\) is the product of \(u_n\) with the strictly positive function \(1/\sqrt{s}\). \(\square \)
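
The whole partition-of-unity construction of Lemma 7 can likewise be checked numerically; the sketch below (ours, in d = 1 with two balls, assuming NumPy) builds the \(v_n\) and verifies properties (i)–(iii).

```python
import numpy as np

def bump(x, zeta, r):
    # the function u of Lemma 6: equals 1 on B_{r/2}(zeta), 0 outside B_r(zeta)
    t = ((x - zeta) / r)**2
    u1 = np.where(t < 1, np.exp(-1.0 / np.maximum(1 - t, 1e-300)), 0.0)
    u2 = np.where(t > 0.25, np.exp(-1.0 / np.maximum(t - 0.25, 1e-300)), 0.0)
    return u1 / (u1 + u2)

x = np.linspace(-3, 3, 6001)
centers, radii = [-1.0, 1.2], [0.8, 0.6]
U = [bump(x, z, r) for z, r in zip(centers, radii)]
u0 = np.prod([1 - u for u in U], axis=0)           # u_0 = prod_n (1 - u_n)
s = u0**2 + sum(u**2 for u in U)                   # s > 0 everywhere
V = [u0 / np.sqrt(s)] + [u / np.sqrt(s) for u in U]

assert np.allclose(sum(v**2 for v in V), 1.0)      # (iii) sum of squares is 1
for v, z, r in zip(V[1:], centers, radii):         # (ii) v_n vanishes outside B_n
    assert np.all(v[np.abs(x - z) >= r] == 0)
for z, r in zip(centers, radii):                   # (i) v_0 vanishes on the B'_n
    assert np.all(V[0][np.abs(x - z) <= r / 2] == 0)
```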

Fundamental results on scattered data approximation

We recall here some fundamental results about local polynomial approximation. In particular, we report the proofs in order to track the constants explicitly. The proof techniques are essentially from [18, 20]. Denote by \(\pi _k(\mathbb {R}^d)\) the set of multivariate polynomials of degree at most k, with \(k \in \mathbb {N}\). In this section \(B_r(x) \subset \mathbb {R}^d\) denotes the open ball of radius r centered at x.

Proposition 8

([18], Corollary 3.11; local polynomial reproduction on a ball). Let \(k \in \mathbb {N}\), \(d,m \in \mathbb {N}_+\) and \(\delta >0\). Let \(B_\delta \) be an open ball of radius \(\delta \) in \(\mathbb {R}^d\) and let \(\widehat{Y} = \{y_1,\dots ,y_m\} \subset B_\delta \) be a non-empty finite subset of \(B_\delta \). If either \(k = 0\) or \(h_{\widehat{Y},B_\delta } \le \frac{\delta }{9k^2}\), there exist functions \(u_j: B_\delta \rightarrow \mathbb {R}\), \(j \in [m]\), such that

  1. (a)

    \(\sum _{j \in [m]} p(y_j) u_j(x) = p(x), \quad \forall x \in B_\delta , p \in \pi _k(\mathbb {R}^d)\)

  2. (b)

    \(\sum _{j \in [m]} |u_j(x)| \le 2, \quad \forall x \in B_\delta \).

Lemma 8

(Bounds on functions with scattered zeros on a small ball [18, 20]). Let \(k \in \mathbb {N}\), \(d,m \in \mathbb {N}_+\) and \(\delta > 0\). Let \(B_\delta \subset \mathbb {R}^d\) be a ball of radius \(\delta \) in \(\mathbb {R}^d\), let \(f \in C^{k+1}(B_\delta )\), and let \(\widehat{Y} = \{y_1,\dots ,y_m\} \subset B_\delta \) be a non-empty finite subset of \(B_\delta \). If either \(k =0\) or \(h_{\widehat{Y}, B_\delta } \le \frac{\delta }{9 k^2}\), it holds:

$$\begin{aligned} \sup _{x \in B_\delta } |f(x)| ~\le ~ 3C\delta ^{k+1}~+~2\max _{i\in [m]} |f(y_i)|, \qquad C := \sum _{|\alpha | = k+1} \frac{1}{\alpha !} \Vert \partial ^\alpha f\Vert _{L^\infty (B_\delta )}. \end{aligned}$$

Proof

Note that since either \(k= 0\) or \(h_{\widehat{Y}, B_\delta } \le \frac{\delta }{9 k^2}\), we can apply Proposition 8, obtaining functions \(u_j\), \(j \in [m]\), with the local polynomial reproduction property. Define the function \(s_{f,\widehat{Y}} = \sum _{j \in [m]} f(y_j) u_j\) and let \(\tau = \max _{i \in [m]} |f(y_i)|\). Using both Proposition 8(a) and 8(b), we have that for any \(p \in \pi _k(\mathbb {R}^d)\) and any \(x \in B_\delta \),

$$\begin{aligned} |f(x)|&\le |f(x) - p(x)| + |p(x) - s_{f,\widehat{Y}}(x)| + |s_{f,\widehat{Y}}(x)| \\&\le |f(x) - p(x)| + \sum _{j \in [m]} |p(y_j) - f(y_j)| |u_j(x)| + \max _{j \in [m]} |f(y_j)| \sum _{j \in [m]} |u_j(x)| \\&\le \Vert f - p\Vert _{L^\infty (B_\delta )}\left( 1 + \sum _{j \in [m]}|u_j(x)|\right) + \tau \sum _{j \in [m]} |u_j(x)|\\&\le 3\Vert f - p\Vert _{L^\infty (B_\delta )} + 2\tau . \end{aligned}$$

In particular, consider the Taylor expansion of f at the center \(x_0\) of \(B_\delta \) up to order k (e.g., [41], Eq. 4.2.5, p. 95). For any \(x \in B_\delta \), it holds

$$\begin{aligned} f(x)= & {} \sum _{|\alpha | \le k} \frac{1}{\alpha !} \partial ^\alpha f(x_0) (x -x_0)^\alpha \\{} & {} + \sum _{|\alpha | = k+1} \frac{k+1}{\alpha !} (x - x_0)^\alpha \int _0^1 (1-t)^{k} \partial ^\alpha f ((1-t)x_0 + tx) dt. \end{aligned}$$

By choosing \(p(x) = \sum _{|\alpha | \le k} \frac{1}{\alpha !} \partial ^\alpha f(x_0) (x -x_0)^\alpha ~ \in \pi _k(\mathbb {R}^d)\) it holds:

$$\begin{aligned} \Vert f - p\Vert _{L^\infty (B_\delta )} \le \sum _{|\alpha | = k+1} \frac{\delta ^{k+1}}{\alpha !} \Vert \partial ^\alpha f\Vert _{L^\infty (B_\delta )} = C\delta ^{k+1}, \end{aligned}$$

where \(C = \sum _{|\alpha | = k+1} \frac{1}{\alpha !} \Vert \partial ^\alpha f\Vert _{L^\infty (B_\delta )}\) is defined in the lemma. Gathering the previous equations,

$$\begin{aligned} \sup _{x \in B_\delta } |f(x)| \le 2\tau + 3C \delta ^{k+1}. \end{aligned}$$

\(\square \)
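
As a quick numerical illustration of Lemma 8 (ours, not from the paper), take the simplest case k = 0, d = 1, where C reduces to \(\Vert f'\Vert _{L^\infty (B_\delta )}\) and no fill-distance condition is needed:

```python
import numpy as np

# Lemma 8 with k = 0, d = 1: for f in C^1(B_delta),
#   sup |f| <= 3 * C * delta + 2 * max_i |f(y_i)|,  with  C = sup |f'| on B_delta.
delta = 0.3
x = np.linspace(-delta, delta, 10001)
f = lambda t: np.sin(20 * t)
C = 20.0  # sup of |f'(t)| = |20 cos(20 t)|

# sample f at its zeros j*pi/20 lying in B_delta, so max_i |f(y_i)| is ~0
Y = np.array([j * np.pi / 20 for j in range(-1, 2)])
bound = 3 * C * delta + 2 * np.max(np.abs(f(Y)))
assert np.max(np.abs(f(x))) <= bound  # here: sup |f| = 1 <= 18 + O(1e-15)
```

The bound is loose here (18 versus a true supremum of 1), consistent with the lemma's role as a worst-case estimate rather than a sharp one.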

Theorem 11

(Bounds on functions with scattered zeros [18, 20]). Let \(k,m \in \mathbb {N}\) with \(k \le m\) and \(n,d \in \mathbb {N}_+\). Let \(r > 0\) and let \(\varOmega \) be an open set of \(\mathbb {R}^d\) of the form \(\varOmega = \bigcup _{x \in S} B_r(x)\) for some subset S of \(\mathbb {R}^d\). Let \(\widehat{X} = \{x_1,\dots ,x_n\}\) be a non-empty finite subset of \(\varOmega \) and let \(f \in C^{m+1}(\varOmega )\). If \(h_{\widehat{X}, \varOmega } \le r\min (1,\frac{1}{18k^2})\), then

$$\begin{aligned} \sup _{x \in \varOmega } |f(x)| ~~\le ~~ C C_f h_{\widehat{X}, \varOmega }^{k+1} ~+~ 2 \max _{i \in [n]} |f(x_i)|, \end{aligned}$$

where \(C = 3\max (1,18~k^2)^{k+1}\) and \(C_f = \sum _{|\alpha | = k+1} \frac{1}{\alpha !} \Vert \partial ^\alpha f\Vert _{L^\infty (\varOmega )}.\)

Proof

First, note that the condition that there exists a set S such that \(\varOmega = \bigcup _{x \in S}B_r(x)\) implies

$$\begin{aligned} \forall \delta \le r,~ \varOmega = \bigcup _{x_0 \in S_\delta }{B_\delta (x_0)},\qquad S_\delta = \{x^\prime \in \varOmega ~:~ \exists x \in S,~ \Vert x-x^\prime \Vert \le r-\delta \}. \end{aligned}$$

We first prove the theorem for \(k \ge 1\), and then the easier case \(k=0\), where we essentially use only the Lipschitz continuity of f.

Proof of the case \(\varvec{k \ge 1}\). The idea of the proof is to apply Lemma 8 to a collection of balls of radius \(\delta \), for a well chosen \(\delta \le r\), centered at the points \(x_0 \in S_\delta \) defined above. Given \(\widehat{X}\), to apply Lemma 8 on a ball of radius \(\delta \) we restrict the points of \(\widehat{X}\) to the subset belonging to that ball, i.e., \(\widehat{Y}_{x_0,\delta } = \widehat{X} \cap B_\delta (x_0)\), for \(x_0 \in S_\delta \) and \(\delta >0\). The set \(\widehat{Y}_{x_0,\delta }\) has fill distance \(h_{x_0,\delta } = h_{\widehat{Y}_{x_0,\delta },B_\delta (x_0)}\). First we show that \(\widehat{Y}_{x_0,\delta }\) is not empty when \(r \ge \delta > h_{\widehat{X},\varOmega }\). To obtain this result we also need to study the ball \(B_{\delta '}(x_0)\) with \(\delta ' = \delta - h_{\widehat{X},\varOmega }\).

Step 1. Showing that \(\widehat{Y}_{x_0,\delta }\) is not empty and that for any \(y \in B_{\delta '}(x_0)\) there exists \(z \in \widehat{Y}_{x_0,\delta }\) satisfying \(\Vert y-z\Vert \le h_{\widehat{X},\varOmega }\). Let \(x_0 \in S_\delta \) and \(\delta \le r\). Then \(B_\delta (x_0) \subseteq \varOmega \) by the characterization of \(\varOmega \) in terms of \(S_\delta \) given above. Define \(\delta ' = \delta - h_{\widehat{X},\varOmega }\) and note that \(B_{\delta '}(x_0)\) is non-empty, since \(\delta ^\prime > 0\), and that \(B_{\delta '}(x_0) \subset B_\delta (x_0) \subseteq \varOmega \). By definition of the fill distance, for any \(y \in B_{\delta '}(x_0)\) there exists \(z \in \widehat{X}\) such that \(\Vert z - y\Vert \le h_{\widehat{X},\varOmega }\). Moreover \(z \in B_\delta (x_0)\), since \(\Vert x_0 - z\Vert \le \Vert x_0 - y\Vert + \Vert y - z\Vert < \delta - h_{\widehat{X},\varOmega } + h_{\widehat{X},\varOmega } = \delta \). Since \(z \in \widehat{X}\) and \(z \in B_\delta (x_0)\), we conclude \(z \in \widehat{Y}_{x_0,\delta }\) by definition of \(\widehat{Y}_{x_0,\delta }\).

Step 2. Showing that \(h_{x_0,\delta } \le 2h_{\widehat{X},\varOmega }\). Let \(x \in B_{\delta }(x_0)\). We have seen in the previous step that the ball \(B_{\delta '}(x_0)\), with \(\delta ' = \delta - h_{\widehat{X},\varOmega }\), is well defined and non-empty. Note that \(B_{h_{\widehat{X},\varOmega }}(x) \cap B_{\delta '}(x_0)\) is also non-empty: the distance between the centers \(x,x_0\) is strictly smaller than the sum of the two radii, since \(\Vert x-x_0\Vert < \delta = \delta ' + h_{\widehat{X},\varOmega }\) because \(x \in B_{\delta }(x_0)\). Take \(w \in B_{h_{\widehat{X},\varOmega }}(x) \cap B_{\delta '}(x_0)\). Since \(w \in B_{\delta '}(x_0)\), by Step 1 there exists \(z \in \widehat{Y}_{x_0,\delta }\) with \(\Vert w-z\Vert \le h_{\widehat{X},\varOmega }\). Since \(w \in B_{h_{\widehat{X},\varOmega }}(x)\), we have \(\Vert x-w\Vert < h_{\widehat{X},\varOmega }\). So \(\Vert x - z\Vert \le \Vert x-w\Vert + \Vert w-z\Vert < 2 h_{\widehat{X},\varOmega }\).

Step 3. Applying Lemma 8. Since by assumption \(h_{\widehat{X},\varOmega } \le r/(18k^2)\) and \(k \ge 1\), the choice \(\delta = 18 k^2 h_{\widehat{X},\varOmega }\) implies \(r \ge \delta > h_{\widehat{X},\varOmega }\). So we can use the characterization of \(\varOmega \) in terms of \(S_\delta \) together with the results of the previous two steps, obtaining that for any \(x_0 \in S_\delta \) the ball \(B_\delta (x_0) \subseteq \varOmega \), and moreover the set \(\widehat{Y}_{x_0,\delta }\) is not empty and covers \(B_\delta (x_0)\) with fill distance \(h_{x_0,\delta } \le 2h_{\widehat{X},\varOmega }\). Since \(h_{x_0,\delta } \le 2h_{\widehat{X},\varOmega } \le \delta /(9k^2)\), we can apply Lemma 8 to each ball \(B_\delta (x_0)\), obtaining

$$\begin{aligned}{} & {} \sup _{x \in B_\delta (x_0)} |f(x)| ~\le ~ 3C_{\delta ,x_0}\delta ^{k+1}~+~2\max _{z\in \widehat{Y}_{x_0,\delta }} |f(z)|, \\{} & {} \qquad C_{\delta ,x_0} := \sum _{|\alpha | = k+1} \frac{1}{\alpha !} \Vert \partial ^\alpha f\Vert _{L^\infty (B_\delta (x_0))}. \end{aligned}$$

The proof is concluded by noting that \(\varOmega = \bigcup _{x_0 \in S_\delta } B_\delta (x_0)\) and that for any \(x_0 \in S_\delta \) we have \(C_{\delta ,x_0} \le C_f\), \(\delta ^{k+1} \le (18k^2)^{k+1} h_{\widehat{X},\varOmega }^{k+1}\) and moreover that \(\max _{z\in \widehat{Y}_{x_0,\delta }} |f(z)| \le \max _{i \in [n]} |f(x_i)|\), since \(\widehat{Y}_{x_0,\delta } \subseteq \widehat{X}\) by construction.

Proof of the case \(\varvec{k=0}\). Since \(h_{\widehat{X},\varOmega } \le r\) by assumption, the choice \(\delta = h_{\widehat{X},\varOmega }\) implies that \(\varOmega \) admits a characterization as \(\varOmega = \bigcup _{x_0 \in S_\delta }B_{\delta }(x_0)\). Now let \(x \in \varOmega \) and choose \(x_0 \in S_\delta \) such that \(x \in B_\delta (x_0)\). On the one hand, since the segment \([x_0,x]\) is included in \(\varOmega \), by the Taylor inequality, \(|f(x) - f(x_0)| \le C_f\Vert x-x_0\Vert \le C_f h_{\widehat{X},\varOmega }\), with \(C_f = \sum _{|\alpha | = 1}{\frac{1}{\alpha !} \Vert \partial ^\alpha f\Vert _{L^\infty (\varOmega )}}\). On the other hand, by definition of \(h_{\widehat{X},\varOmega }\), there exists \(z \in \widehat{X} \subset \varOmega \) such that \(\Vert z - x_0\Vert \le h_{\widehat{X},\varOmega } = \delta \). Since the open segment \([x_0,z) \subset B_\delta (x_0) \subset \varOmega \) and \(z \in \varOmega \), the whole segment \([x_0,z] \subset \varOmega \), and hence we can apply the Taylor inequality to show \(|f(x_0) - f(z)| \le C_f\Vert z-x_0\Vert \le C_f h_{\widehat{X},\varOmega }\). Then we have

$$\begin{aligned} |f(x)| \le |f(x) - f(x_0)| + |f(x_0) - f(z)| + |f(z)| \le 2 C_f h_{\widehat{X},\varOmega } + \max _{i \in [n]}|f(x_i)|. \end{aligned}$$

The proof of the case \(k = 0\) is concluded by noting that the previous inequality holds for every \(x \in \varOmega \). \(\square \)
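For concreteness, the fill distance \(h_{\widehat{X},\varOmega }\) that drives the bound of Theorem 11 can be estimated numerically. The sketch below is our own illustration (not part of the paper): it approximates \(\varOmega \) by a dense grid over the unit square and computes the largest distance from a grid point to its nearest sample.

```python
import numpy as np

rng = np.random.default_rng(0)
# Samples X_hat in the unit square, and a dense grid standing in for Omega.
X = rng.random((200, 2))
g = np.linspace(0.0, 1.0, 50)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

# Fill distance: sup over Omega of the distance to the closest sample point.
dists = np.linalg.norm(grid[:, None, :] - X[None, :, :], axis=-1)
h = dists.min(axis=1).max()

print(0.0 < h < np.sqrt(2))  # h is positive and at most the diameter of the square
```

Denser sampling (larger n) drives h down, which is exactly how the right-hand side of Theorem 11 shrinks as \(h^{k+1}\).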

Auxiliary results on RKHS

We recall that the nuclear norm of a compact linear operator A is defined as \(\Vert A\Vert _\star = \text {Tr}(\sqrt{A^*A})\) or equivalently \(\Vert A\Vert _\star = \sum _{j \in \mathbb {N}} \sigma _j\), where \((\sigma _j)_{j \in \mathbb {N}}\) are the singular values of A (Chapter 7 of [42] or [43] for the finite dimensional analogue).
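As a finite-dimensional illustration of this definition (our sketch, not part of the paper), the nuclear norm can be computed from the singular values and, for a positive semidefinite matrix, coincides with the trace:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T  # positive semidefinite, so ||A||_* = Tr(A)

# Nuclear norm as the sum of the singular values of A.
nuclear = np.linalg.svd(A, compute_uv=False).sum()

print(abs(nuclear - np.trace(A)) < 1e-10)  # the two characterizations agree
```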

Lemma 9

Let \(\varOmega \) be a set, k be a kernel and \(\mathcal {H}\) the associated RKHS. Let \(A: \mathcal {H}\rightarrow \mathcal {H}\) be a trace class operator. If \(\mathcal {H}\) satisfies Assumption 2(a), then

$$\begin{aligned} \Vert r_A\Vert _{\mathcal {H}} \le \textsf{M}\Vert A\Vert _\star , \quad \text {where} \quad r_A(x) := \left\langle {\phi (x)},{A \phi (x)}\right\rangle , ~~ \forall x \in \varOmega , \end{aligned}$$

and \(\Vert A\Vert _\star \) is the nuclear norm of A. We recall that if \(A \in \mathbb {S}_+(\mathcal {H})\) then \(\Vert A\Vert _\star = \text {Tr}(A)\).

Proof

Since A is compact, it admits a singular value decomposition \(A = \sum _{i \in \mathbb {N}} \sigma _i u_i \otimes v_i\). Here, \((\sigma _j)_{j \in \mathbb {N}}\) is a non-increasing sequence of non-negative singular values converging to zero, and \((u_j)_{j \in \mathbb {N}}\) and \((v_j)_{j \in \mathbb {N}}\) are two orthonormal families of corresponding singular vectors (a family \((e_j)\) is orthonormal if, for \(i,j \in \mathbb {N}\), \(\left\langle {e_i},{e_j}\right\rangle = 1\) when \(i=j\) and \(\left\langle {e_i},{e_j}\right\rangle = 0\) otherwise) [42]. We can write \(r_A\) using this decomposition as \(r_A(x) = \sum _{i \in \mathbb {N}} \sigma _i u_i(x) v_i(x) = \sum _{i \in \mathbb {N}} \sigma _i \, (u_i \cdot v_i)(x)\) for all \(x \in \varOmega \), where \(\cdot \) denotes the pointwise multiplication of two functions (this equality is justified by the absolute convergence bound below). By Assumption 2(a), the fact that A is trace class (i.e., \(\Vert A\Vert _\star < \infty \)) and the fact that \(\Vert u_j\Vert _{\mathcal {H}} = \Vert v_j\Vert _{\mathcal {H}} = 1\) for all \(j \in \mathbb {N}\), the following holds

$$\begin{aligned} \Vert r_A\Vert _{\mathcal {H}}&= \left\| \sum _{j \in \mathbb {N}} \sigma _j (u_j \cdot v_j)\right\| _{\mathcal {H}} \le \sum _{j \in \mathbb {N}} \sigma _j \Vert u_j \cdot v_j\Vert _{\mathcal {H}} \\&\le \textsf{M}\sum _{j \in \mathbb {N}} \sigma _j \Vert u_j\Vert _{\mathcal {H}}\Vert v_j\Vert _{\mathcal {H}} \le \textsf{M}\sum _{j \in \mathbb {N}} \sigma _j = \textsf{M}\Vert A\Vert _\star . \end{aligned}$$

In the case where \(A \in \mathbb {S}_+(\mathcal {H})\), we have \(\Vert A\Vert _\star = \text {Tr}(\sqrt{A^*A}) = \text {Tr}(A)\). \(\square \)

1.1 Proof of Proposition 2

Given the kernel k, the associated RKHS \(\mathcal {H}\), the canonical feature map \(\phi : \varOmega \rightarrow \mathcal {H}\) and a set of distinct points \(\widehat{X} = \{x_1,\dots ,x_n\}\), define the kernel matrix \(K \in \mathbb {R}^{n \times n}\) as \(K_{i,j} = \left\langle {\phi (x_i)},{\phi (x_j)}\right\rangle = k(x_i, x_j)\) for all \(i,j \in [n]\). Since k is a p.d. kernel, K is positive semidefinite; moreover, when k is universal, \(\phi (x_1), \dots , \phi (x_n)\) are linearly independent, so K is full rank and hence invertible. Universality of k is guaranteed since \(\mathcal {H}\) contains the \(C^\infty _0(\varOmega )\) functions by Assumption 1(a), and so can approximate continuous functions over compact subsets of \(\varOmega \) [17]. Denote by R the upper triangular matrix of the Cholesky decomposition of K, i.e., R satisfies \(K = R^\top R\). We are now ready to prove Proposition 2.

Proof

Denote by \(\widehat{S}:\mathcal {H}\rightarrow \mathbb {R}^n\) the linear operator that acts as follows

$$\begin{aligned} \widehat{S} g ~=~ (\,\left\langle {\phi (x_1)},{g}\right\rangle \,,~\dots ,~\left\langle {\phi (x_n)},{g}\right\rangle \,) \in \mathbb {R}^n, \qquad \forall g \in \mathcal {H}. \end{aligned}$$

Define \(\widehat{S}^*:\mathbb {R}^n \rightarrow \mathcal {H}\), the adjoint of \(\widehat{S}\), as \(\widehat{S}^*\beta = \sum _{i=1}^n \beta _i \phi (x_i)\) for \(\beta \in \mathbb {R}^n\). Note, in particular, that \(K = \widehat{S}\widehat{S}^*\) and that \(\widehat{S}^* e_j = \phi (x_j)\), where \(e_j\) is the j-th element of the canonical basis of \(\mathbb {R}^n\). We define the operator \(V = R^{-\top } \widehat{S}\) and its adjoint \(V^* = \widehat{S}^*R^{-1}\). By using the definition of V, the fact that \(K = R^\top R\) by construction of R, and the fact that \(K = \widehat{S}\widehat{S}^*\), we derive two facts.

On the one hand,

$$\begin{aligned} VV^* = R^{-\top } \widehat{S}\widehat{S}^* R^{-1} = R^{-\top } K R^{-1} = R^{-\top } R^\top R R^{-1} = I. \end{aligned}$$

On the other hand, \(P = V^*V\) is a projection operator, i.e., \(P^2 = P\); P is positive semidefinite and its range is \(\text {ran}{P} = \text {span}\{\phi (x_i)~|~i \in [n]\}\), implying \(P \phi (x_i) = \phi (x_i)\) for all \(i \in [n]\). Indeed, using the equation above, \(P^2 = V^*VV^*V = V^*(VV^*)V = V^* V = P\), and the positive semidefiniteness of P holds by construction, since P is the product of an operator and its adjoint. Moreover, the range of P is the same as that of \(V^*\), which in turn is the same as that of \(\widehat{S}^*\), since R is invertible: \(\text {ran}{P} = \text {span}\{\phi (x_i)~|~i \in [n]\}\).

Finally, since \(k(x,x') = \left\langle {\phi (x)},{\phi (x')}\right\rangle \) for any \(x,x' \in \varOmega \), for any \(j \in [n]\) the vector \(\varPhi _j\) is characterized by

$$\begin{aligned} \varPhi _j&= R^{-\top }(k(x_1,x_j), \dots , k(x_n,x_j))\\&= R^{-\top }(\left\langle {\phi (x_1)},{\phi (x_j)}\right\rangle , \dots , \left\langle {\phi (x_n)},{\phi (x_j)}\right\rangle ) = R^{-\top } \widehat{S}\phi (x_j) = V \phi (x_j). \end{aligned}$$

\(\square \)
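The identities \(VV^* = I\) and \(\varPhi _j = V\phi (x_j)\) can be checked numerically in finite dimensions. The sketch below is our own illustration, using an exponential kernel \(k(x,x') = e^{-|x-x'|}\) (chosen only for the numerical stability of this illustration; any universal p.d. kernel would do): it verifies that \(R^{-\top } K R^{-1} = I\) and that \(\Vert \varPhi _j\Vert ^2 = k(x_j,x_j) = 1\).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 1))  # distinct points x_1, ..., x_n

# Exponential (Laplace) kernel matrix, an assumed kernel for illustration only.
K = np.exp(-np.abs(X - X.T))

# K = R^T R with R upper triangular; numpy returns the lower factor L = R^T.
L = np.linalg.cholesky(K)

# Columns of A are Phi_j = R^{-T} (k(x_1,x_j), ..., k(x_n,x_j)) = L^{-1} K e_j.
A = np.linalg.solve(L, K)
# V V^* = R^{-T} K R^{-1} = L^{-1} K L^{-T}, which should equal the identity.
VVt = np.linalg.solve(L, A.T)

print(np.allclose(VVt, np.eye(len(X))))        # V V^* = I
print(np.allclose((A ** 2).sum(axis=0), 1.0))  # ||Phi_j||^2 = k(x_j, x_j) = 1
```

The second check reflects \(\Vert \varPhi _j\Vert ^2 = \langle \phi (x_j), P\phi (x_j)\rangle = k(x_j,x_j)\), since P acts as the identity on the span of the \(\phi (x_i)\).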

The constants of translation invariant and Sobolev kernels

1.1 Results for translation invariant and Sobolev kernels

Lemma 10

Let \(\varOmega \) be a set and let \(k(x,x') = v(x-x')\), for all \(x,x' \in \varOmega \), be a translation invariant kernel for some function \(v:\mathbb {R}^d \rightarrow \mathbb {R}\). Denote by \(\tilde{v}\) the Fourier transform of v and let \(\mathcal {H}\) be the associated RKHS. For any \(f,g \in \mathcal {H}\) we have

$$\begin{aligned}{} & {} \Vert f \cdot g\Vert _{\mathcal {H}} \le C \Vert f\Vert _{\mathcal {H}}\Vert g\Vert _{\mathcal {H}}, \\{} & {} \qquad C = (2\pi )^{d/4}\left\| \frac{\tilde{v} \star \tilde{v}}{\tilde{v}}\right\| ^{1/2}_{L^\infty (\mathbb {R}^d)}. \end{aligned}$$

In particular, if there exists a non-increasing \(g:[0,\infty ) \rightarrow (0,\infty )\) such that \(\tilde{v}(\omega ) \le g(\Vert \omega \Vert )\), then

$$\begin{aligned} C \le \sqrt{2}(2 \pi )^{d/2}v(0)^{1/2}\sup _{\omega \in \mathbb {R}^d} \sqrt{\frac{g(\tfrac{1}{2}\Vert \omega \Vert )}{\tilde{v}(\omega )}}. \end{aligned}$$

Proof

First note that, as recalled in Example 5, there exists an extension operator, i.e., a partial isometry \(E: \mathcal {H}\rightarrow \mathcal {H}(\mathbb {R}^d)\) such that \(r = Eu\) satisfies \(r(x) = u(x)\) for all \(x \in \varOmega \) and \(\Vert u\Vert _{\mathcal {H}} = \Vert r\Vert _{\mathcal {H}(\mathbb {R}^d)}\), for any \(u \in \mathcal {H}\). There also exists a restriction operator \(R:\mathcal {H}(\mathbb {R}^d)\rightarrow \mathcal {H}\), recalled in the same example, such that \(RE:\mathcal {H}\rightarrow \mathcal {H}\) is the identity operator and \(ER:\mathcal {H}(\mathbb {R}^d) \rightarrow \mathcal {H}(\mathbb {R}^d)\) is a projection operator whose range is \(E\mathcal {H}\). Moreover, \(f \cdot g= R(E f \cdot Eg)\), since for any \(x \in \varOmega \), \((R(E f \cdot Eg))(x) = (E f)(x)(Eg)(x) = f(x)g(x) = (f \cdot g)(x)\). Since ER is a projection operator, \(\Vert ER\Vert _\textrm{op} \le 1\), hence

$$\begin{aligned} \Vert f \cdot g\Vert _{\mathcal {H}}&= \Vert R (Ef \cdot Eg)\Vert _{\mathcal {H}} = \Vert ER (Ef \cdot Eg) \Vert _{\mathcal {H}(\mathbb {R}^d)} \\&\le \Vert ER\Vert _\textrm{op} \Vert Ef \cdot Eg \Vert _{\mathcal {H}(\mathbb {R}^d)} \le \Vert Ef \cdot Eg \Vert _{\mathcal {H}(\mathbb {R}^d)}. \end{aligned}$$

Let \(a = Ef\) and \(b = Eg\). Denote by \(\tilde{a},\tilde{b}\) their Fourier transforms and by \(\widetilde{a \cdot b}\) the Fourier transform of \(a \cdot b\) (see Proposition 2 for more details). Expanding the definition of the Hilbert norm associated to a translation invariant kernel,

$$\begin{aligned} \Vert Ef \cdot Eg \Vert ^2_{\mathcal {H}(\mathbb {R}^d)} = \Vert a \cdot b \Vert ^2_{\mathcal {H}(\mathbb {R}^d)} = (2\pi )^{-d/2}\int _{\mathbb {R}^d} \frac{|\widetilde{a \cdot b}\,(\omega )|^2}{\tilde{v}(\omega )} d\omega . \end{aligned}$$

Now we bound \(\widetilde{a\cdot b}\). Since \(\widetilde{a\cdot b} = (2 \pi )^{d/2}\tilde{a} \star \tilde{b}\) (see Proposition 2), where \(\star \) denotes convolution, expanding the convolution and applying the Cauchy–Schwarz inequality we obtain

$$\begin{aligned} (2 \pi )^{-d/2}|\widetilde{a \cdot b}(\omega )|^2&= |(\tilde{a} \star \tilde{b}) (\omega )|^2 = \left( \int _{\mathbb {R}^d} \tilde{a}(\sigma ) \tilde{b}(\omega - \sigma ) d\sigma \right) ^2 \\&= \left( \int _{\mathbb {R}^d} \frac{\tilde{a}(\sigma )}{\sqrt{\tilde{v}(\sigma )}} \frac{\tilde{b}(\omega - \sigma )}{\sqrt{\tilde{v}(\omega - \sigma )}} ~ \sqrt{\tilde{v}}(\sigma ) \sqrt{\tilde{v}}(\omega - \sigma ) d\sigma \right) ^2\\&\le \int _{\mathbb {R}^d} \frac{\tilde{a}^2}{\tilde{v}}(\sigma ) \frac{\tilde{b}^2}{\tilde{v}}(\omega - \sigma ) d\sigma ~ \int _{\mathbb {R}^d} \tilde{v}(\sigma ) \tilde{v}(\omega - \sigma ) d\sigma \\&= \left( \frac{\tilde{a}^2}{\tilde{v}} \star \frac{\tilde{b}^2}{\tilde{v}}\right) (\omega )~(\tilde{v} \star \tilde{v})(\omega ). \end{aligned}$$

By using the bound above together with Hölder inequality and Young inequality for convolutions, we have

$$\begin{aligned} (2\pi )^{-d/2}\int _{\mathbb {R}^d} \frac{|\widetilde{a \cdot b}\,(\omega )|^2}{\tilde{v}(\omega )} d\omega&\le \int _{\mathbb {R}^d} \left( \frac{\tilde{a}^2}{\tilde{v}} \star \frac{\tilde{b}^2}{\tilde{v}}\right) (\omega )\, \frac{(\tilde{v} \star \tilde{v})(\omega )}{\tilde{v}(\omega )}d\omega \\&\le \left\| \frac{\tilde{a}^2}{\tilde{v}} \star \frac{\tilde{b}^2}{\tilde{v}}\right\| _{L^1(\mathbb {R}^d)} \left\| \frac{\tilde{v} \star \tilde{v}}{\tilde{v}}\right\| _{L^\infty (\mathbb {R}^d)} \\&\le \left\| \frac{\tilde{a}^2}{\tilde{v}}\right\| _{L^1(\mathbb {R}^d)} \left\| \frac{\tilde{b}^2}{\tilde{v}}\right\| _{L^1(\mathbb {R}^d)} \left\| \frac{\tilde{v} \star \tilde{v}}{\tilde{v}}\right\| _{L^\infty (\mathbb {R}^d)} \\&= (2 \pi )^{d/2} \left\| \frac{\tilde{v} \star \tilde{v}}{\tilde{v}}\right\| _{L^\infty (\mathbb {R}^d)}\Vert a\Vert ^2_{\mathcal {H}(\mathbb {R}^d)}\Vert b\Vert ^2_{\mathcal {H}(\mathbb {R}^d)} = C^2 \Vert a\Vert ^2_{\mathcal {H}(\mathbb {R}^d)}\Vert b\Vert ^2_{\mathcal {H}(\mathbb {R}^d)}, \end{aligned}$$

where in the last step we used the definitions of the inner products for translation invariant kernels. The proof is concluded by noting that \(\Vert a\Vert _{\mathcal {H}(\mathbb {R}^d)} = \Vert Ef\Vert _{\mathcal {H}(\mathbb {R}^d)} = \Vert f\Vert _{\mathcal {H}}\), and similarly for b, i.e., \(\Vert b\Vert _{\mathcal {H}(\mathbb {R}^d)} = \Vert g\Vert _{\mathcal {H}}\). Finally, C can be further bounded by applying Proposition 9 and noting that \(v(0) = (2\pi )^{-d/2}\int \tilde{v}(\omega ) d\omega = (2\pi )^{-d/2}\Vert \tilde{v}\Vert _{L^1(\mathbb {R}^d)}\), via the characterization of v in terms of \(\tilde{v}\) in Proposition 2(e), since \(\tilde{v}\) is non-negative and integrable. \(\square \)

Proposition 9

Let \(u \in L^1(\mathbb {R}^d) \cap C(\mathbb {R}^d)\) with \(u(x) \ge 0\) for all \(x \in \mathbb {R}^d\), and assume there exists a non-increasing function \(g: [0,\infty ) \rightarrow (0,\infty )\) satisfying \(u(x) \le g(\Vert x\Vert )\) for all \(x \in \mathbb {R}^d\). Then it holds:

$$\begin{aligned} \forall x \in \mathbb {R}^d,~ 0 \le (u \star u)(x) \le 2 \Vert u\Vert _{L^1(\mathbb {R}^d)}g\left( \tfrac{1}{2}\Vert x\Vert \right) . \end{aligned}$$

In particular, if \(u > 0\), it holds

$$\begin{aligned} \left\| \frac{u \star u}{u}\right\| _{L^\infty (\mathbb {R}^d)} \le 2\Vert u\Vert _{L^1(\mathbb {R}^d)}\sup _{x \in \mathbb {R}^d} \frac{g\left( \tfrac{1}{2}\Vert x\Vert \right) }{u(x)}. \end{aligned}$$

Proof

For any \(x \in \mathbb {R}^d\),

$$\begin{aligned} (u \star u)(x)= \int _{\mathbb {R}^d} u(y)u(x-y) dy. \end{aligned}$$

Let \(S_x = \{y ~|~ \Vert x - y\Vert \le \tfrac{1}{2} \Vert x\Vert \}\). Note that, when \(y \in \mathbb {R}^d \setminus S_x\), then \(\Vert x-y\Vert > \tfrac{1}{2}\Vert x\Vert \). Instead, when \(y \in S_x\), then

$$\begin{aligned} \tfrac{1}{2} \Vert x\Vert \le \Vert x\Vert - \Vert x - y\Vert \le \Vert y\Vert . \end{aligned}$$

Since g is non-increasing, for any \(x \in \mathbb {R}^d\) we have

$$\begin{aligned} \int _{\mathbb {R}^d} u(y)u(x-y) dy&= \int _{S_x} u(y)u(x-y) dy ~+~ \int _{\mathbb {R}^d\setminus S_x} u(y)u(x-y) dy \\&\le \int _{S_x} g(\Vert y\Vert )u(x-y) dy ~+~ \int _{\mathbb {R}^d\setminus S_x} u(y)g(\Vert x-y\Vert )dy \\&\le \int _{S_x} g\left( \tfrac{1}{2}\Vert x\Vert \right) u(x-y)dy ~+~ \int _{\mathbb {R}^d\setminus S_x} u(y)g\left( \tfrac{1}{2}\Vert x\Vert \right) dy\\&\le \int _{\mathbb {R}^d} g\left( \tfrac{1}{2}\Vert x\Vert \right) u(x-y) dy ~+~ \int _{\mathbb {R}^d} u(y)g\left( \tfrac{1}{2}\Vert x\Vert \right) dy \\&= \int _{\mathbb {R}^d} g\left( \tfrac{1}{2}\Vert x\Vert \right) u(y) dy ~+~ \int _{\mathbb {R}^d} u(y)g\left( \tfrac{1}{2}\Vert x\Vert \right) dy\\&= 2 ~g\left( \tfrac{1}{2}\Vert x\Vert \right) \int _{\mathbb {R}^d} u(y) dy, \end{aligned}$$

where: in the first inequality we bounded u(y) with \(g(\Vert y\Vert )\) and \(u(x-y)\) with \(g(\Vert x-y\Vert )\), in the first and the second integral, respectively; in the second inequality we bounded \(g(\Vert y\Vert )\) with \(g(\tfrac{1}{2}\Vert x\Vert )\), since \(\Vert y\Vert \ge \tfrac{1}{2}\Vert x\Vert \) when \(y \in S_x\) and we bounded \(g(\Vert x-y\Vert )\) with \(g(\tfrac{1}{2}\Vert x\Vert )\), since \(\Vert x-y\Vert \ge \tfrac{1}{2}\Vert x\Vert \) when \(y \in \mathbb {R}^d\setminus S_x\); in the third we extended the integration domains to \(\mathbb {R}^d\). \(\square \)
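Proposition 9 can be sanity-checked numerically. The sketch below is our own illustration: it takes \(u(x) = e^{-x^2}\) in dimension \(d=1\), for which one may take \(g(t) = e^{-t^2}\) since u is radially non-increasing, and compares a discretized \(u \star u\) with the bound \(2\Vert u\Vert _{L^1}\,g(\tfrac{1}{2}|x|)\).

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
u = np.exp(-x ** 2)  # u(x) <= g(|x|) with g(t) = exp(-t^2), non-increasing

conv = np.convolve(u, u, mode="same") * dx              # discretized (u * u)(x)
bound = 2.0 * (u.sum() * dx) * np.exp(-(np.abs(x) / 2) ** 2)  # 2 ||u||_1 g(|x|/2)

print(np.all(conv <= bound))  # the bound holds pointwise on the grid
```

For this Gaussian example the inequality holds with a comfortable margin (\(u \star u\) equals \(\sqrt{\pi /2}\,e^{-x^2/2}\) in closed form), so discretization error does not affect the check.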

1.2 Proof of Proposition 1

Proof

We prove here that the Sobolev kernel satisfies Assumption 2. Let \(k = k_s\) from Eq. (3.2). As we have seen in Example 1, \(\mathcal {H}= W^s_2(\varOmega )\) and \(\Vert \cdot \Vert _{W^s_2(\varOmega )}\) is equivalent to \(\Vert \cdot \Vert _{\mathcal {H}}\) when \(s > d/2\) and \(\varOmega \) satisfies Assumption 1(a), since this assumption implies that \(\varOmega \) satisfies the cone condition [18].

Recall that k is translation invariant, i.e., \(k(x,x') = v(x-x')\) for any \(x,x' \in \mathbb {R}^d\), with v defined in Example 1. The Fourier transform of v is \(\tilde{v}(\omega ) = C_0(1+\Vert \omega \Vert ^2)^{-s}\) with \(C_0 = \frac{2^{d/2} \varGamma (s)}{\varGamma (s-d/2)}\)  [18]. In the rest of the proof, \(C_0\) will always refer to this constant.

We divide the proof into one step per point of Assumption 2.

Proof of Assumption 2(d) for the Sobolev kernel. Let \(\alpha \in \mathbb {N}^d\), \(m = |\alpha |\). Assume \(m < s-d/2\), i.e., \(m \in \{1,\dots , \lfloor s - (d+1)/2\rfloor \}\). Since k is translation invariant, then \(\partial ^\alpha _x \partial ^\alpha _y k(x,y) = (-1)^m ~v_{2\alpha }(x-y)\) with \(v_{2\alpha }(z) = \partial ^{2\alpha }_z v(z)\) for all \(z \in \mathbb {R}^d\). So

$$\begin{aligned}{} & {} \sup _{x,y \in \varOmega }|\partial ^\alpha _x \partial ^\alpha _y k(x,y)| = \sup _{x,y \in \varOmega }|\partial ^\alpha _x \partial ^\alpha _y v(x-y)| \le \sup _{z \in \mathbb {R}^d}|\partial ^{2\alpha }_z v(z)| \\{} & {} \quad \le (2\pi )^{-d/2}\Vert \omega ^{2\alpha } \tilde{v}(\omega )\Vert _{L^1(\mathbb {R}^d)}, \end{aligned}$$

where in the last step we used elementary properties of the Fourier transform (in particular those recalled in Proposition 2(c) and 2(e)). Let \(S_{d-1} = 2\frac{\pi ^{d/2}}{\varGamma (d/2)}\) be the surface area of the \((d-1)\)-dimensional unit sphere. Since \(m < s - d/2\) and \(\tilde{v} \ge 0\),

$$\begin{aligned} \Vert \omega ^{2\alpha } \tilde{v}(\omega )\Vert _{L^1(\mathbb {R}^d)}&\le \int _{\mathbb {R}^d} \Vert \omega \Vert ^{2m}\tilde{v}(\omega ) d\omega = C_0 S_{d-1} \int _0^\infty \frac{r^{2m + d - 1}}{(1+r^2)^s} dr \\&= C_0 S_{d-1} \int _0^\infty \frac{t^{m +d/2 -1}}{2(1+t)^s} dt = C_0 S_{d-1} \tfrac{\varGamma (m+d/2)\varGamma (s-d/2-m)}{2\varGamma (s)}, \end{aligned}$$

where we performed the change of variable \(r = \sqrt{t}\), \(dr = \frac{dt}{2\sqrt{t}}\), and applied Eq. 5.12.3, p. 142 of [44] to the resulting integral. Thus, Assumption 2(d) holds with

$$\begin{aligned} \textsf{D}_m^2 = C_0 \frac{\pi ^{d/2} \varGamma (m+d/2) \varGamma (s- m -d/2)}{\varGamma (d/2)\varGamma (s)} = \frac{(2\pi )^{d/2} \varGamma (m+d/2)\varGamma (s-d/2-m)}{\varGamma (s-d/2)\varGamma (d/2)}. \end{aligned}$$
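The two equal expressions for \(\textsf{D}_m^2\) above (with and without \(C_0\)) can be cross-checked numerically with the gamma function. The snippet below is our own sanity check, valid when \(m < s - d/2\) so that all gamma arguments are positive.

```python
from math import gamma, pi

def Dm_squared(m: int, d: int, s: float) -> float:
    """Second form: (2*pi)^{d/2} G(m+d/2) G(s-d/2-m) / (G(s-d/2) G(d/2))."""
    return ((2 * pi) ** (d / 2) * gamma(m + d / 2) * gamma(s - d / 2 - m)
            / (gamma(s - d / 2) * gamma(d / 2)))

def Dm_squared_via_C0(m: int, d: int, s: float) -> float:
    """First form: C_0 pi^{d/2} G(m+d/2) G(s-m-d/2) / (G(d/2) G(s))."""
    C0 = 2 ** (d / 2) * gamma(s) / gamma(s - d / 2)
    return (C0 * pi ** (d / 2) * gamma(m + d / 2) * gamma(s - m - d / 2)
            / (gamma(d / 2) * gamma(s)))

# Example with m = 1, d = 2, s = 4 (so m < s - d/2): both forms agree.
print(abs(Dm_squared(1, 2, 4.0) - Dm_squared_via_C0(1, 2, 4.0)) < 1e-12)
```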

Proof of Assumption 2(a) for the Sobolev kernel. First, note that \(C^\infty (\mathbb {R}^d)|_\varOmega \subset W^s_\infty (\varOmega ) \subset W^s_2(\varOmega )\). Indeed, since \(\varOmega \) is bounded, for any \(f \in C^\infty (\mathbb {R}^d)\) we have \(\Vert \partial ^\alpha f|_\varOmega \Vert _{L^\infty (\varOmega )} < \infty \) for any \(\alpha \in \mathbb {N}^d\), which shows that \(f|_\varOmega \in W^s_\infty (\varOmega )\). Moreover \(W^s_\infty (\varOmega ) \subset W^s_2(\varOmega )\), since \(\Vert \cdot \Vert _{L^2(\varOmega )} \le \text {vol}(\varOmega )^{1/2} \Vert \cdot \Vert _{L^\infty (\varOmega )}\) because \(\varOmega \) is bounded. Second, since \(\tilde{v}(\omega ) = g_s(\Vert \omega \Vert )\) with \(g_s(t) = C_0 (1 + t^2)^{-s}\) positive and non-increasing, we can apply Lemma 10. Therefore, for \(C = \sqrt{2}(2\pi )^{d/2}v(0)^{1/2} \sup _{t \ge 0}\big (\tfrac{g_s(t/2)}{g_s(t)}\big )^{1/2}\) it holds that \(\Vert f \cdot g\Vert _{\mathcal {H}} \le C \Vert f\Vert _{\mathcal {H}}\Vert g\Vert _{\mathcal {H}}\). In particular, \(\sup _{t \ge 0}\big (\tfrac{g_s(t/2)}{g_s(t)}\big )^{1/2} \le 2^{s}\) and \(v(0) = 1\), since \(\lim _{t\rightarrow 0} t^{s-d/2}{{\mathcal {K}}}_{s-d/2}(t) = \varGamma (s-d/2)/2^{1+d/2-s} = 1/C_0\) ([44], Eq. 10.30.2, p. 252) and \(v(x) = C_0 t^{s-d/2}{{\mathcal {K}}}_{s-d/2}(t)\), \(t = \Vert x\Vert \). Thus, Assumption 2(a) holds with constant

$$\begin{aligned} \textsf{M}=\pi ^{d/2} 2^{(2s+d+1)/2}. \end{aligned}$$
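The key bound \(\sup _{t\ge 0}\big (g_s(t/2)/g_s(t)\big )^{1/2} \le 2^s\) used above follows from \((1+t^2)/(1+t^2/4) \le 4\); a quick numerical check (our own illustration, for one value of s):

```python
import numpy as np

s = 2.5
t = np.linspace(0.0, 100.0, 100001)
# (g_s(t/2)/g_s(t))^{1/2} = ((1+t^2)/(1+t^2/4))^{s/2}; the constant C_0 cancels.
ratio = ((1 + t ** 2) / (1 + (t / 2) ** 2)) ** (s / 2)

# The ratio increases towards its supremum 2^s, approached as t -> infinity.
print(np.all(ratio <= 2 ** s))
```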

Proof of Assumption 2(b) for the Sobolev kernel. First we recall from [11] that for any \(s > d/2\), there exists a constant \(C_s\) such that

$$\begin{aligned} \forall h \in W^s_2(\mathbb {R}^d),~ \Vert h\Vert _{L^\infty (\mathbb {R}^d)} \le C_s \Vert h\Vert _{ W^s_2(\mathbb {R}^d)}. \end{aligned}$$

In particular, this shows that \(W^s_2(\mathbb {R}^d) \subset L^\infty (\mathbb {R}^d)\). Fix such a constant \(C_s\) in the rest of the proof.

Let \(p \in \mathbb {N}\) and \(g \in C^\infty (\mathbb {R}^p)\) with \(g(0,0,\dots ,0) = 0\). From (i) of Thm. 11 in [45], there exists a constant \(c_g\) depending only on g, p, s such that for any \(h_1,\dots ,h_p \in W^s_2(\mathbb {R}^d)\cap L^\infty (\mathbb {R}^d)\), it holds

$$\begin{aligned} \Vert g(h_1,\dots ,h_p)\Vert _{W^s_2(\mathbb {R}^d)} ~~\le ~~ c_g \sup _{i \in [p]}~\Vert h_i\Vert _{W^s_2(\mathbb {R}^d)}\left( 1 +\Vert h_i\Vert _{L^\infty (\mathbb {R}^d)}^{\max (0,s-1)}\right) . \end{aligned}$$

Since \(s > d/2\), the bound above shows, in particular, that for any \(h_1,\dots ,h_p \in W^s_2(\mathbb {R}^d)\), it holds

$$\begin{aligned} \Vert g(h_1,\dots ,h_p)\Vert _{W^s_2(\mathbb {R}^d)}{} & {} \le c^\prime _g \sup _{i \in [p]}~\left( \Vert h_i\Vert _{W^s_2(\mathbb {R}^d)} +\Vert h_i\Vert _{W^s_2(\mathbb {R}^d)}^{\max (1,s)}\right) , \\ c^\prime _g{} & {} = c_g\max \left( 1 , C_s^{\max (0,s-1)}\right) . \end{aligned}$$

Since \(W^s_2(\mathbb {R}^d) = \mathcal {H}(\mathbb {R}^d)\) and the norms \(\Vert \cdot \Vert _{W^s_2(\mathbb {R}^d)} \) and \( \Vert \cdot \Vert _{\mathcal {H}(\mathbb {R}^d)}\) are equivalent (see [11]), the previous inequality holds for \(\Vert \cdot \Vert _{\mathcal {H}(\mathbb {R}^d)}\) with a certain constant \(c^\prime _g\) depending only on g, p, s, d. In particular, this implies that \(g(h_1,\dots ,h_p) \in \mathcal {H}(\mathbb {R}^d)\) for any \(h_1,\dots , h_p \in \mathcal {H}(\mathbb {R}^d)\). We now prove the same implication for the restriction to \(\varOmega \).

First note that any function \(a \in C^\infty (\mathbb {R}^p)\) can be written as \(a(z) = q\,1(z) + g(z)\), \(z \in \mathbb {R}^p\), where \(q = a(0,0,\cdots ,0) \in \mathbb {R}\), \(g \in C^\infty (\mathbb {R}^p)\) with \(g(0,0,\cdots ,0) = 0\), and \(1(z) = 1\) for all \(z \in \mathbb {R}^p\). Recall the definition and basic properties of the extension operator \(E:\mathcal {H}\rightarrow \mathcal {H}(\mathbb {R}^d)\) from Example 5. For any \(f_1,\dots , f_p \in \mathcal {H}\), note that \(g((Ef_1)(x),\dots ,(Ef_p)(x)) = g(f_1(x),\dots ,f_p(x))\) for all \(x \in \varOmega \). We can now apply the results of Example 5 to show that \(g(f_1,\dots ,f_p) \in \mathcal {H}\):

$$\begin{aligned} \Vert g(f_1,\dots ,f_p)\Vert _{\mathcal {H}}&= \inf _{u} \Vert u\Vert _{\mathcal {H}(\mathbb {R}^d)}~~s.t.~~ u(x) = g(f_1(x),\dots ,f_p(x))~\forall x \in \varOmega \\&\le \Vert g(Ef_1,\dots ,Ef_p)\Vert _{\mathcal {H}(\mathbb {R}^d)} \\&\le c^\prime _g \sup _{j \in [p]} \left( \Vert E f_j\Vert _{\mathcal {H}(\mathbb {R}^d)} + \Vert E f_j\Vert _{\mathcal {H}(\mathbb {R}^d)}^{\max (1,s)}\right) \\&= c^\prime _g \sup _{j \in [p]} \left( \Vert f_j\Vert _{\mathcal {H}} + \Vert f_j\Vert _{\mathcal {H}}^{\max (1,s)}\right) < \infty , \end{aligned}$$

where in the last step we used the fact that \(\Vert \cdot \Vert _{\mathcal {H}} = \Vert E \,\cdot \Vert _{\mathcal {H}(\mathbb {R}^d)}\). The proof of this point is concluded by noting that \(a(f_1,\dots ,f_p) \in \mathcal {H}\), since \(1 \in \mathcal {H}\) by Point (a) above, and

$$\begin{aligned} \Vert a(f_1,\dots ,f_p)\Vert _{\mathcal {H}} \le q\Vert 1\Vert _{\mathcal {H}} + \Vert g(f_1,\dots ,f_p)\Vert _{\mathcal {H}} < \infty . \end{aligned}$$

Proof of Assumption 2(c) for the Sobolev kernel. This is proven in Lemma 11, right below. \(\square \)

Before stating Lemma 11 we recall some properties. First, recall the Young inequality:

$$\begin{aligned} \forall f \in L^2(\mathbb {R}^d),~\forall g \in L^1(\mathbb {R}^d),~ \Vert f\star g\Vert _{L^2(\mathbb {R}^d)} \le \Vert f\Vert _{L^2(\mathbb {R}^d)} ~ \Vert g\Vert _{L^1(\mathbb {R}^d)}. \end{aligned}$$
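The same inequality holds for discrete sequences (Young's convolution inequality for \(\ell ^p\)), which gives a quick numerical sanity check (our own illustration, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(500)
g = rng.standard_normal(300)

conv = np.convolve(f, g)  # full discrete convolution

# Discrete Young inequality: ||f * g||_2 <= ||f||_2 ||g||_1.
lhs = np.linalg.norm(conv)
rhs = np.linalg.norm(f) * np.sum(np.abs(g))
print(lhs <= rhs)
```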

Moreover, the Sobolev kernel is by definition a translation-invariant kernel, with v defined in Example 1 and Fourier transform \(\tilde{v}(\omega ) = C_0 (1+\Vert \omega \Vert ^2)^{-s}\). Let \(\mathcal {H}(\mathbb {R}^d)\) be the reproducing kernel Hilbert space on \(\mathbb {R}^d\) associated to the Sobolev kernel \(k_s\). As recalled in Example 6, the \(\mathcal {H}(\mathbb {R}^d)\)-norm is characterized by

$$\begin{aligned} \forall f \in \mathcal {H}(\mathbb {R}^d),~ \Vert f\Vert _{\mathcal {H}(\mathbb {R}^d)} = (2\pi )^{-d/4} \Vert \tilde{f}/\sqrt{\tilde{v}}\Vert _{L^2(\mathbb {R}^d)}, \end{aligned}$$
(D.1)

where \(\tilde{f} = {{\mathcal {F}}}(f)\) is the Fourier transform of f (see [11]). Then we recall that \(\tilde{v} \in L^1(\mathbb {R}^d)\), since \(s > d/2\), so for any \(f \in \mathcal {H}(\mathbb {R}^d)\)

$$\begin{aligned} \Vert \tilde{f}\Vert _{L^1(\mathbb {R}^d)} = \Vert \sqrt{\tilde{v}} \tilde{f}/\sqrt{\tilde{v}}\Vert _{L^1(\mathbb {R}^d)} \le \Vert \sqrt{\tilde{v}}\Vert _{L^2(\mathbb {R}^d)}\Vert \tilde{f}/\sqrt{\tilde{v}}\Vert _{L^2(\mathbb {R}^d)} = C_1 \Vert f\Vert _{\mathcal {H}(\mathbb {R}^d)}. \end{aligned}$$
(D.2)

where \(C_1 = (2\pi )^{d/4} \Vert \sqrt{\tilde{v}}\Vert _{L^2(\mathbb {R}^d)}\). A useful consequence of the inequality above is obtained by considering that \(\Vert f\Vert _{L^\infty (\mathbb {R}^d)}\) is bounded by the \(L^1\) norm of \(\tilde{f}\) (see Proposition 2(e)), then

$$\begin{aligned} \Vert f\Vert _{L^\infty } \le (2\pi )^{-d/2} \Vert \tilde{f}\Vert _{L^1(\mathbb {R}^d)} \le C_2 \Vert f\Vert _{\mathcal {H}(\mathbb {R}^d)}, \end{aligned}$$
(D.3)

where \(C_2 = (2\pi )^{-d/4} \Vert \sqrt{\tilde{v}}\Vert _{L^2(\mathbb {R}^d)}\).

Lemma 11

(Assumption 2(c) for Sobolev kernels). Let \(\mathcal {H}\) be the RKHS associated to the translation invariant Sobolev kernel defined in Example 1, with \(s > d/2\). Then Assumption 2(c) is satisfied.

Proof

For the rest of the proof we fix \(u : \varOmega \rightarrow \mathbb {R}\) with \(u \in \mathcal {H}\), \(r >0\) and \(z \in \mathbb {R}^d\) such that \(B_r(z) \subset \varOmega \). Let \(E_\varOmega : \mathcal {H}\rightarrow \mathcal {H}(\mathbb {R}^d)\) be the extension operator from \(\varOmega \) to \(\mathbb {R}^d\) (its properties are recalled in Example 5). Let \(\chi \in C^\infty _0(\mathbb {R}^d)\) be given by Lemma 6, such that \(\chi = 1\) on \(B_r(z)\), \(\chi = 0\) on \(\mathbb {R}^d \setminus {B_{2r}(z)}\) and \(\chi \in [0,1]\). Define for any \(t \in \mathbb {R}\) and \(x \in \mathbb {R}^d\)

$$\begin{aligned} h_t(x) = \chi (x) w_t(x), \qquad w_t(x) = w((1-t)z + tx), \qquad w = E_\varOmega u. \end{aligned}$$

In particular we recall that, since \(E_\varOmega \) is a partial isometry (see Example 5) then \(\Vert w\Vert _{\mathcal {H}(\mathbb {R}^d)} = \Vert u\Vert _{\mathcal {H}}\).

Step 1. Fourier transform of \(w_t\). Denote by \(\widetilde{w}\) the Fourier transform of w, which is well defined since \(w \in \mathcal {H}(\mathbb {R}^d) \subset L^2(\mathbb {R}^d)\) (see [11]), and by \(\tilde{\chi }\) the Fourier transform of \(\chi \). For any \(t \ne 0\), denote by \(\widetilde{w_t}\) the Fourier transform of \(w_t\), which is well defined using the results of Proposition 2, and which satisfies

$$\begin{aligned} \forall t \ne 0,~\forall \omega \in \mathbb {R}^d,~ \widetilde{w_t}(\omega ) = |t|^{-d} e^{i\frac{1-t}{t}z^\top \omega }\tilde{w}(\omega /t). \end{aligned}$$
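As a quick sanity check (not part of the proof), this scaling identity can be verified numerically in dimension \(d=1\); the Gaussian test function, the unitary Fourier convention \(\tilde f(\omega ) = (2\pi )^{-1/2}\int f(x) e^{-i\omega x}\,dx\), and all numerical values below are our own choices:

```python
import numpy as np

def ft(f, x, omega):
    """Unitary Fourier transform at frequency omega, by trapezoidal quadrature."""
    y = f(x) * np.exp(-1j * omega * x)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)) / np.sqrt(2 * np.pi)

w = lambda x: np.exp(-x**2 / 2)          # smooth, rapidly decaying test function
z, t, omega = 0.7, 0.5, 1.3              # arbitrary test values, t != 0
x = np.linspace(-40.0, 40.0, 200001)

w_t = lambda x: w((1 - t) * z + t * x)   # w_t(x) = w((1-t)z + tx)

lhs = ft(w_t, x, omega)
# Claimed identity (d = 1): w_t^(omega) = |t|^{-d} exp(i (1-t)/t z.omega) w^(omega/t)
rhs = abs(t)**(-1) * np.exp(1j * (1 - t) / t * z * omega) * ft(w, x, omega / t)

assert abs(lhs - rhs) < 1e-6
```

The trapezoidal rule converges rapidly here because the integrand is smooth and decays fast, so the two sides agree to high precision.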

Step 2. Separating low and high order derivatives of \(h_t\), and bounding the low order terms. For \(t \ne 0\), denote with \(\widetilde{h_t}\) the Fourier transform of \(h_t\) which is well defined since \(\chi \) is bounded and \(w_t \in L^2(\mathbb {R}^d)\). We will now bound \(\Vert h_t\Vert _{\mathcal {H}(\mathbb {R}^d)}\) for all \(t \ne 0\), by using the characterization in Eq. (D.1). Since \((x+y)^s \le 2^{\max (s-1,0)}(x^s + y^s)\) for any \(x,y \ge 0\), \(s \ge 0\), then \((1 + \Vert \omega \Vert ^2)^{s/2} \le c_1(1+\Vert \omega \Vert ^s)\) for any \(\omega \in \mathbb {R}^d\), with \(c_1 = 2^{\max (s/2-1,0)}\) so using Eq. (D.1), we have

$$\begin{aligned} \sqrt{C_0}(2\pi )^{d/4}\Vert h_t\Vert _{\mathcal {H}(\mathbb {R}^d)}&= \Vert (1 + \Vert \cdot \Vert ^2)^{s/2} \widetilde{h_t}\Vert _{L^2(\mathbb {R}^d)}\\&\le c_1\,\Vert \widetilde{h_t}\Vert _{L^2(\mathbb {R}^d)} + c_1\,\Vert ~|\cdot |^s_{\mathbb {R}^d} \widetilde{h_t}\Vert _{L^2(\mathbb {R}^d)}. \end{aligned}$$

The first term on the right hand side can easily be bounded using the fact that the Fourier transform is an isometry of \(L^2(\mathbb {R}^d)\) (see Proposition 2 for more details), indeed

$$\begin{aligned} \Vert \widetilde{h_t}\Vert _{L^2(\mathbb {R}^d)} = \Vert h_t\Vert _{L^2(\mathbb {R}^d)}&= \Vert \chi \cdot w_t\Vert _{L^2(\mathbb {R}^d)} \le \Vert w_t\Vert _{L^\infty (\mathbb {R}^d)} \Vert \chi \Vert _{L^2(\mathbb {R}^d)} < \infty , \end{aligned}$$

since \(\chi \in C^\infty _0(\mathbb {R}^d)\) is bounded and has compact support, implying that \(\Vert \chi \Vert _{L^2(\mathbb {R}^d)} < \infty \); moreover, \(\Vert w_t\Vert _{L^\infty (\mathbb {R}^d)} = \Vert w\Vert _{L^\infty (\mathbb {R}^d)}\) and \(\Vert w\Vert _{L^\infty (\mathbb {R}^d)} \le C_2 \Vert w\Vert _{\mathcal {H}(\mathbb {R}^d)}\) as recalled in Eq. (D.3) (the constant \(C_2\) is defined in the same equation).

Step 3. Decomposing the high order derivatives of \(h_t\). Note that since \(\widetilde{h_t} = \widetilde{\chi \cdot w_t}\), by property of the Fourier transform (see Proposition 2(b)), \(\widetilde{\chi \cdot w_t} = (2\pi )^{d/2} \widetilde{\chi } \star \widetilde{w_t}\). Moreover, since \(\Vert \omega \Vert ^s \le (\Vert \omega -\eta \Vert + \Vert \eta \Vert )^s \le c(\Vert \omega - \eta \Vert ^s + \Vert \eta \Vert ^s)\) for any \(\omega ,\eta \in \mathbb {R}^d\), with \(c = 2^{\max (s-1,0)}\), then, for all \(\omega \in \mathbb {R}^d\) we have

$$\begin{aligned} \Vert \omega \Vert ^s |\widetilde{h_t}(\omega )|&= \Vert \omega \Vert ^s |\widetilde{\chi \cdot w_t}(\omega )| = \Vert \omega \Vert ^s (2\pi )^{\frac{d}{2}} |(\tilde{\chi } \star \tilde{w}_t)(\omega )| \\&= (2\pi )^{\frac{d}{2}} |\int _{\mathbb {R}^d} \Vert \omega \Vert ^s \widetilde{\chi }(\eta ) \widetilde{w_t}(\omega - \eta ) d \eta |\\&\le (2\pi )^{\frac{d}{2}} c\int _{\mathbb {R}^d}(|\widetilde{\chi }(\eta )| ~\Vert \eta \Vert ^s)~|\widetilde{w_t}(\omega - \eta )| ~d\eta \\&\quad + (2\pi )^{\frac{d}{2}} c \int _{\mathbb {R}^d}|\widetilde{\chi }(\eta )| ~(|\widetilde{w_t}(\omega - \eta )|~\Vert \omega - \eta \Vert ^{s}) ~d\eta \\&= c\, ((J_s |\widetilde{\chi }|) \star |\widetilde{w_t}|)(\omega ) ~+~ c\, (|\tilde{\chi }| \star (J_s |\widetilde{w_t}|))(\omega ), \end{aligned}$$

where we denoted by \(J_s\) the function \(J_s(\omega ) = \Vert \omega \Vert ^s\) for any \(\omega \in \mathbb {R}^d\). Applying Young’s inequality, it holds:

$$\begin{aligned} \Vert J_s \widetilde{h_t}\Vert _{L^2(\mathbb {R}^d)}&\le c\, \Vert (J_s |\widetilde{\chi }|) \star |\widetilde{w_t}|\Vert _{L^2(\mathbb {R}^d)} ~+~ c\Vert |\tilde{\chi }| \star (J_s |\widetilde{w_t}|)\Vert _{L^2(\mathbb {R}^d)} \\&\le c \Vert J_s \widetilde{\chi }\Vert _{L^2(\mathbb {R}^d)} \Vert \widetilde{w}_t\Vert _{L^1(\mathbb {R}^d)} + c\Vert J_s \widetilde{w}_t \Vert _{L^2(\mathbb {R}^d)}~\Vert \widetilde{\chi }\Vert _{L^1(\mathbb {R}^d)}. \end{aligned}$$
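Young's convolution inequality used here, \(\Vert f \star g\Vert _{L^2} \le \Vert f\Vert _{L^2}\Vert g\Vert _{L^1}\), also holds exactly for its discrete analogue on a grid, which makes a quick numerical check possible (a sketch with test functions of our own choosing, not tied to the functions in the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
dx = 0.01
x = np.arange(-5.0, 5.0, dx)

# Arbitrary nonnegative test functions sampled on the grid.
f = np.exp(-x**2) * (1 + 0.5 * np.sin(3 * x))**2
g = np.exp(-np.abs(x)) * (1 + rng.random(x.size))

conv = np.convolve(f, g) * dx                 # discretized convolution (f * g)
lhs = np.sqrt(np.sum(conv**2) * dx)           # ||f * g||_{L^2}
rhs = np.sqrt(np.sum(f**2) * dx) * (np.sum(np.abs(g)) * dx)  # ||f||_{L^2} ||g||_{L^1}

assert lhs <= rhs   # discrete Young's inequality holds exactly, for any grid data
```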

Step 4. Bounding the elements of the decomposition. Now we are ready to bound the four terms of the decomposition of \(\Vert J_s \widetilde{h_t}\Vert _{L^2(\mathbb {R}^d)}\). First term, since \(\chi \in C^\infty _0(\mathbb {R}^d) \subset \mathcal {H}(\mathbb {R}^d)\), and \(J_s(\omega ) \le \sqrt{C_0/\tilde{v}(\omega )}\) for any \(\omega \in \mathbb {R}^d\), then \(\Vert J_s \widetilde{\chi }\Vert _{L^2(\mathbb {R}^d)} \le \sqrt{C_0} \Vert \widetilde{\chi }/\sqrt{\tilde{v}}\Vert _{L^2(\mathbb {R}^d)} =(2\pi )^{d/4}\sqrt{C_0}\Vert \chi \Vert _{\mathcal {H}(\mathbb {R}^d)}\), where we used Eq. (D.1). Second term, \(\Vert \widetilde{\chi }\Vert _{L^1(\mathbb {R}^d)} < \infty \), since \(\Vert \widetilde{\chi }\Vert _{L^1(\mathbb {R}^d)} \le C_1 \Vert \chi \Vert _{\mathcal {H}(\mathbb {R}^d)}\), via Eq. (D.2) (the constant \(C_1\) is defined in the same equation) and we have seen already that \(\Vert \chi \Vert _{\mathcal {H}(\mathbb {R}^d)}\) is bounded. Third term, by a change of variable \(\tau = \omega /t\),

$$\begin{aligned} \Vert \widetilde{w}_t\Vert _{L^1(\mathbb {R}^d)}&= \int _{\mathbb {R}^d}|\widetilde{w}_t(\omega )| d\omega = \int _{\mathbb {R}^d}|t|^{-d}|\tilde{w}(\omega /t)| d\omega \\&= \int _{\mathbb {R}^d} |\tilde{w}(\tau )| d\tau = \Vert \tilde{w}\Vert _{L^1(\mathbb {R}^d)}, \end{aligned}$$

moreover \(\Vert \tilde{w}\Vert _{L^1(\mathbb {R}^d)} \le C_1 \Vert w\Vert _{\mathcal {H}(\mathbb {R}^d)} = C_1 \Vert u\Vert _{\mathcal {H}}\) via Eq. (D.2) and the fact that \(\Vert w\Vert _{\mathcal {H}(\mathbb {R}^d)} = \Vert u\Vert _{\mathcal {H}}\) as recalled at the beginning of the proof. Finally, fourth term, for \(t \in \mathbb {R}\setminus {\{0\}}\),

$$\begin{aligned} \Vert J_s \widetilde{w}_t\Vert ^2_{L^2(\mathbb {R}^d)}&= \int _{\mathbb {R}^d}\Vert \omega \Vert ^{2s} |\widetilde{w_t}(\omega )|^2~d\omega = t^{-2d}\int _{\mathbb {R}^d}\Vert \omega \Vert ^{2s} |\widetilde{w}(\omega /t)|^2 d\omega \\&= t^{2s-d} \int _{\mathbb {R}^d}{\Vert \tau \Vert ^{2s}|\widetilde{w}(\tau )|^2 d\tau } \le t^{2s-d} \int _{\mathbb {R}^d}{(1+\Vert \tau \Vert ^2)^{s}|\widetilde{w}(\tau )|^2 d\tau } \\&= t^{2s-d} (2\pi )^{d/2}C_0\Vert w\Vert ^2_{\mathcal {H}(\mathbb {R}^d)}. \end{aligned}$$

where we performed a change of variable \(\omega = t\,\tau \), \(t^d d\tau = d\omega \) and used the definition in Eq. (D.1) and the fact that \(\Vert \tau \Vert ^{2s} \le (1+\Vert \tau \Vert ^2)^s\) for any \(\tau \in \mathbb {R}^d\). The proof of the bound of the fourth term is concluded by recalling that \(\Vert w\Vert _{\mathcal {H}(\mathbb {R}^d)} = \Vert u\Vert _{\mathcal {H}}\) as discussed in the proof of the bound for the previous term.

Conclusion. Putting all our bounds together, we get:

$$\begin{aligned} \forall t \in \mathbb {R}\setminus {\{0\}},~ \Vert h_t\Vert _{\mathcal {H}(\mathbb {R}^d)} \le (A + B~|t|^{s-d/2}) \Vert \chi \Vert _{\mathcal {H}(\mathbb {R}^d)}\Vert u\Vert _{\mathcal {H}}, \end{aligned}$$

where \(A = c_1 C_2 + c c_1 C_1 (2\pi )^{d/4}\sqrt{C_0}\) and \(B = c c_1 C_1 (2\pi )^{d/4} \sqrt{C_0}\), where \(c = 2^{\max (s-1,0)}\), \(c_1 = 2^{\max (s/2-1,0)}\), while \(C_1\) is defined in Eq. (D.2), \(C_2\) in Eq. (D.3).

Now define

$$\begin{aligned} \forall x \in \mathbb {R}^d,~ \overline{g}_{z,r}(x) = \int _0^1 (1-t) h_t(x) dt, \end{aligned}$$

and note that, by construction, \(\overline{g}_{z,r}(x) = \int _0^1 (1-t)\, u((1-t) z + t x)\, dt\) for any \(x \in B_r(z)\), since \(\chi = 1\) on \(B_r(z)\) and w coincides with u on the convex set \(B_r(z) \subset \varOmega \).

Note that the map \(t \in (0,1) \mapsto (1-t)\Vert h_t\Vert _{\mathcal {H}(\mathbb {R}^d)}\) is measurable, using the expression in Eq. (D.1).

Moreover, since \(s > d/2\), for all \(t \in (0,1)\) it holds \(\Vert h_t\Vert _{\mathcal {H}(\mathbb {R}^d)} \le (A + B t^{s-d/2})\Vert \chi \Vert _{\mathcal {H}(\mathbb {R}^d)}\Vert u\Vert _{\mathcal {H}} \le (A + B)\Vert \chi \Vert _{\mathcal {H}(\mathbb {R}^d)}\Vert u\Vert _{\mathcal {H}}\), so the map \(t \mapsto (1-t)h_t\) is integrable, and thus

$$\begin{aligned} \Vert \overline{g}_{z,r}\Vert _{\mathcal {H}(\mathbb {R}^d)}&= \big \Vert \int _0^1 (1-t) h_t dt\big \Vert _{\mathcal {H}(\mathbb {R}^d)}\le \int _0^1 |1-t| \Vert h_t\Vert _{\mathcal {H}(\mathbb {R}^d)} dt \\&\le (A + B)\Vert \chi \Vert _{\mathcal {H}(\mathbb {R}^d)}\Vert u\Vert _{\mathcal {H}} < \infty , \end{aligned}$$

which implies that the function \(\overline{g}_{z,r}\) belongs to \(\mathcal {H}(\mathbb {R}^d)\). Finally, denote by \(R_\varOmega :\mathcal {H}(\mathbb {R}^d) \rightarrow \mathcal {H}\) the restriction operator (see Example 5 for more details). By construction \((R_\varOmega g)(x) = g(x)\) for any \(g \in \mathcal {H}(\mathbb {R}^d)\) and \(x \in \varOmega \); defining \(g_{z,r} = R_\varOmega \overline{g}_{z,r}\), the lemma is proven. \(\square \)

Proofs for Algorithm 1

We start with two technical lemmas that will be used by the proofs in this section.

Lemma 12

(Technical result). Let \(\alpha \ge 1\), \(\beta \ge 2\) and \(n \in \mathbb {N}\). If \(n \ge 2\alpha \log (2\beta \alpha )\), then it holds

$$\begin{aligned} \frac{\alpha \log (\beta n)}{n} \le 1. \end{aligned}$$

Proof

Note that the function \(x \mapsto \frac{\log (\beta x)}{x}\) is strictly decreasing on \([\exp (1)/\beta ,+\infty )\).

Moreover, \(2\alpha \log (2\beta \alpha ) \ge 2\log 4 \ge \exp (1)/2 \ge \exp (1)/\beta \) since \(\beta \ge 2\) and \(\alpha \ge 1\).

Now assume \(n \ge c \alpha \) with \(c = 2\log (2\beta \alpha )\). It holds:

$$\begin{aligned} \frac{\alpha \log (\beta n)}{n} \le \frac{\log (\beta c \alpha )}{c} \le \frac{\log (\tfrac{c}{2}) + \log (2\alpha \beta )}{c} \le \frac{1}{2} + \frac{1}{2}~\frac{2\log (2\beta \alpha )}{c} \le 1, \end{aligned}$$

where we used the definition of c and the fact that \(\log (c/2) \le c/2 -1 \le c/2\). \(\square \)
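Lemma 12 is elementary enough to be checked numerically. The following sketch (with a grid of test values chosen by us) verifies the implication at the smallest admissible integer n, which suffices since \(n \mapsto \alpha \log (\beta n)/n\) is decreasing on the relevant range:

```python
import math

def lemma12_holds(alpha, beta):
    """Check: n >= 2*alpha*log(2*beta*alpha) implies alpha*log(beta*n)/n <= 1."""
    n = math.ceil(2 * alpha * math.log(2 * beta * alpha))  # smallest admissible integer
    return alpha * math.log(beta * n) / n <= 1

# Exhaustive check over a grid of admissible pairs (alpha >= 1, beta >= 2).
assert all(lemma12_holds(a, b)
           for a in (1, 1.5, 2, 5, 10, 100, 1e4)
           for b in (2, 3, 10, 1e3, 1e6))
```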

Lemma 13

Let \(\overrightarrow{u} \in S_{d-1} = \{x \in \mathbb {R}^{d}~|~\Vert x\Vert = 1\}\), \(\alpha \in [0,\pi /2]\), \(x_0 \in \mathbb {R}^d\) and \(t > 0\). Define the cone centered at \(x_0\), directed by \(\overrightarrow{u}\) of radius t with aperture \(\alpha \):

$$\begin{aligned} C_{x_0,\overrightarrow{u},t}^\alpha = \left\{ x \in B_t(x_0)~|~ \tfrac{x-x_0}{\Vert x-x_0\Vert } \cdot \overrightarrow{u} \ge \cos (\alpha ),~x \ne x_0 \right\} , \end{aligned}$$

where we denoted by \(\cdot \) the scalar product among vectors. Then the volume of this cone is lower bounded as

$$\begin{aligned} \text {vol}(C_{x_0,\overrightarrow{u},t}^\alpha ) \ge \frac{(\sqrt{\pi }\sin (\alpha ))^{d-1}(t~\cos \alpha )^d}{d\varGamma ((d+1)/2)}. \end{aligned}$$

Moreover, let \(x_0 \in \mathbb {R}^d\) and \(r >0\). Let \(x \in B_r(x_0)\) and \(0 < t \le r\). The intersection \(B_t(x) \cap B_r(x_0)\) contains the cone \(C_{x,\overrightarrow{u},t}^{\pi /3}\), where \(\overrightarrow{u} = \frac{x_0-x}{\Vert x_0-x\Vert }\) (the unit vector pointing from x toward the center \(x_0\)) if \(x \ne x_0\), and any unit vector otherwise.

Proof

1. Bound on the volume of the cone. Without loss of generality, assume \(x_0 = 0\) and \(\overrightarrow{u} = e_1\), since the Lebesgue measure is invariant under translations and rotations. A simple change of variable also shows that \(\text {vol}(C_{0,\overrightarrow{u},t}^\alpha ) = t^d \text {vol}(C_{0,\overrightarrow{u},1}^\alpha )\). Now note the following inclusion (the verification is straightforward):

$$\begin{aligned} \widetilde{C} := \left\{ x = (x_1,z) \in \mathbb {R}^d = \mathbb {R}\times \mathbb {R}^{d-1}~:~x_1 \le \cos (\alpha ),~\Vert z\Vert _{\mathbb {R}^{d-1}} \le x_1\sin (\alpha )\right\} \subset C_{0,e_1,1}^\alpha . \end{aligned}$$

It is possible to compute the volume of the left-hand set explicitly:

$$\begin{aligned} \text {vol}(\widetilde{C})&= \int _{\mathbb {R}} {\varvec{1}_{x_1 \le \cos (\alpha )} \left( \int _{\mathbb {R}^{d-1}}{\varvec{1}_{\Vert z\Vert \le x_1 \sin (\alpha )} dz}\right) dx_1} \\&= \int _{0}^{\cos (\alpha )}{V_{d-1}(x_1 \sin \alpha )^{d-1}~dx_1} \\&= V_{d-1}\frac{\sin ^{d-1}(\alpha ) \cos ^d(\alpha )}{d} , \end{aligned}$$

where \(V_{d-1} = \pi ^{(d-1)/2}/\varGamma ((d-1)/2 +1)\) denotes the volume of the \((d-1)\)-dimensional unit ball.
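The explicit lower bound can be compared against a Monte Carlo estimate of the cone's volume (a hedged sketch with parameters of our own choosing; the code takes the cone to be the set of points of \(B_t(x_0)\) making angle at most \(\alpha \) with \(\overrightarrow{u}\)):

```python
import numpy as np
from math import pi, gamma, sin, cos, sqrt

def cone_volume_lower_bound(d, alpha, t):
    # Lemma 13: vol >= (sqrt(pi) sin(a))^{d-1} (t cos(a))^d / (d Gamma((d+1)/2))
    return (sqrt(pi) * sin(alpha))**(d - 1) * (t * cos(alpha))**d \
        / (d * gamma((d + 1) / 2))

def cone_volume_mc(d, alpha, t, n=200_000, seed=0):
    """Monte Carlo estimate of vol(C) for x0 = 0, u = e1, sampling [-t, t]^d."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-t, t, size=(n, d))
    r = np.linalg.norm(x, axis=1)
    inside = (r <= t) & (r > 0) & (x[:, 0] >= r * cos(alpha))
    return inside.mean() * (2 * t)**d

for d in (2, 3):
    assert cone_volume_lower_bound(d, pi / 3, 1.0) <= cone_volume_mc(d, pi / 3, 1.0)
```

The bound is loose by design (it only uses the inscribed set \(\widetilde{C}\)), so the Monte Carlo estimate exceeds it by a comfortable margin.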

2. Proof of the second point. The case where \(x = x_0\) is trivial since \(t \le r\). Assume therefore \(x \ne x_0\) and note that by definition, \(C_{x,\overrightarrow{u},t}^{\pi /3} \subset B_t(x)\). We will now show that \(C_{x,\overrightarrow{u},t}^{\pi /3} \subset B_r(x_0)\). Let \(y \in C_{x,\overrightarrow{u},t}^{\pi /3}\) and assume \(y \ne x\) (if \(y=x\) then \(y \in B_r(x_0)\)). Expanding the dot product,

$$\begin{aligned} \Vert y-x_0\Vert ^2&= \Vert y - x\Vert ^2 + 2(y-x)\cdot (x-x_0) + \Vert x-x_0\Vert ^2 \\&= \Vert y - x\Vert ^2 - 2\Vert y-x\Vert ~\Vert x_0-x\Vert ~\tfrac{y-x}{\Vert y-x\Vert } \cdot \overrightarrow{u} + \Vert x-x_0\Vert ^2\\&\le \Vert x-y\Vert ^2 - \Vert x-y\Vert ~\Vert x-x_0\Vert + \Vert x-x_0\Vert ^2, \end{aligned}$$

where the last inequality comes from the definition of the cone and \(\cos \pi /3 =\frac{1}{2}\). Let us distinguish two cases:

  • if \(t > \Vert x_0-x\Vert \), then both \(\Vert x-y\Vert \le t\) and \(\Vert x_0-x\Vert < t\); since \(a^2 - ab + b^2 \le \max (a,b)^2\) for any \(a,b \ge 0\), applying this with \(a = \Vert x-y\Vert \) and \(b = \Vert x-x_0\Vert \) gives \(\Vert y-x_0\Vert ^2 \le t^2 \le r^2\);

  • otherwise \(\Vert x-y\Vert \le t \le \Vert x_0-x\Vert \) and thus \(\Vert y-x_0\Vert ^2 \le \Vert x-x_0\Vert ^2 \le r^2\).

In any case, \(y \in B_r(x_0)\), which concludes the proof. \(\square \)

1.1 Proof of Theorem 4

Proof of Theorem 4

Fix \(\varOmega \) as in Theorem 4. Let U be the uniform probability over \(\varOmega \), i.e., \(U(A) = \tfrac{\text {vol}(A \cap \varOmega )}{\text {vol}(\varOmega )}\) for any Borel-measurable set A. Let \(\mathbb {P} = U^{\otimes n}\) over \(\varOmega ^n\). Throughout this proof, we will use the notation \(V_d\) to denote the volume of the d-dimensional unit ball (recall that \(V_d = \tfrac{\pi ^{d/2}}{\varGamma (d/2+1)}\)).

Step 1. Covering \(\varOmega \). Let \(t > 0\). We say that a subset \(\overline{X}\) of \(\varOmega \) is a t (interior) covering of \(\varOmega \) if \(\varOmega \subset \bigcup _{x \in \overline{X}}{B_t(x)}\). Denote with \(N_t\) the minimal cardinality \(|\overline{X}|\) of a t interior covering of \(\varOmega \) and fix \(\overline{X}_t\) a t interior covering of \(\varOmega \) whose cardinality is minimal, i.e., \(|\overline{X}_t| = N_t\). Since the diameter of \(\varOmega \) is bounded by 2R, it is known that \(N_t \le (1 + 2R/t)^d\).

To prove this fact, one defines a maximal t/2-packing of \(\varOmega \) as a maximal set \(\overline{Y}_{t/2} \subset \varOmega \) such that the balls \(B_{t/2}(\overline{y})\), \(\overline{y} \in \overline{Y}_{t/2}\), are disjoint. It is then easy to check that if \(\overline{Y}_{t/2}\) is a maximal t/2-packing, then it is also a t-covering and hence \(N_t \le |\overline{Y}_{t/2}|\). Finally, since \(\varOmega \) is included in a ball \(B_R(x_0)\) for some \(x_0 \in \mathbb {R}^d\) and since \(\overline{Y}_{t/2} \subset \varOmega \), it holds \(\bigcup _{\overline{y} \in \overline{Y}_{t/2}}B_{t/2}(\overline{y}) \subset B_{R+t/2}(x_0)\). Since the \(B_{t/2}(\overline{y})\) are pairwise disjoint, the result follows from the following equation:

$$\begin{aligned} |\overline{Y}_{t/2}| (t/2)^d V_d = \text {vol}\left( \cup _{\overline{y} \in \overline{Y}_{t/2}}B_{t/2}(\overline{y})\right) \le \text {vol}(B_{R+t/2}(x_0)) = (R + t/2)^d V_d. \end{aligned}$$
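The packing argument can be illustrated numerically: a greedy maximal t/2-packing of a dense finite sample of \(\varOmega \) is a t-covering of the sample, and its size respects the bound \((1+2R/t)^d\) (a sketch in \(d=2\) with \(\varOmega \) the unit disk; all numerical values are our own choices):

```python
import numpy as np

def greedy_packing(points, radius):
    """Greedy maximal packing: kept centers are pairwise more than
    2*radius apart, i.e. the radius-balls around them are disjoint."""
    centers = []
    for p in points:
        if all(np.linalg.norm(p - c) > 2 * radius for c in centers):
            centers.append(p)
    return np.array(centers)

rng = np.random.default_rng(0)
R, t, d = 1.0, 0.3, 2
# Dense finite sample of Omega = ball of radius R.
pts = rng.uniform(-R, R, size=(5000, d))
pts = pts[np.linalg.norm(pts, axis=1) <= R]

centers = greedy_packing(pts, t / 2)
# Size bound N_t <= (1 + 2R/t)^d from the volume comparison.
assert len(centers) <= (1 + 2 * R / t)**d
# Maximality makes the packing a t-covering of the sampled points.
gaps = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=-1).min(axis=1)
assert gaps.max() <= t
```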

Step 2. Probabilistic analysis. Note that for any \((x_1,\dots ,x_n) \in \varOmega ^n\), writing \(\widehat{X} = \{x_1,\dots ,x_n\}\), it holds:

$$\begin{aligned} h_{\widehat{X},\varOmega }&= \max _{x \in \varOmega } \min _{i \in [n]}\Vert x-x_i\Vert = \max _{\overline{x} \in \overline{X}_t} \max _{x \in B_t(\overline{x}) \cap \varOmega } \min _{i \in [n]}\Vert x-x_i\Vert \\&\le t + \max _{\overline{x} \in \overline{X}_t} \min _{i \in [n]}\Vert \overline{x}-x_i\Vert . \end{aligned}$$

Define E to be the following event:

$$\begin{aligned} E= \{(x_1,\dots ,x_n) \in \varOmega ^n ~|~ \max _{\overline{x} \in \overline{X}_t} \min _{i \in [n]}\Vert \overline{x}-x_i\Vert < t \}. \end{aligned}$$

The n-tuple \((x_1,\dots ,x_n)\) belongs to E if for each \(\overline{x} \in \overline{X}_t\) there exists at least one \(i \in [n]\) for which \(\Vert \overline{x} - x_i\Vert < t\). E can therefore be rewritten as follows:

$$\begin{aligned} E = \bigcap _{\overline{x} \in \overline{X}_t} \bigcup _{i \in [n]} \{(x_1,\dots ,x_n) \in \varOmega ^n ~|~ \Vert \overline{x}-x_i\Vert < t\}. \end{aligned}$$

In particular, note that

$$\begin{aligned} E^c = \varOmega ^n \setminus E = \bigcup _{\overline{x} \in \overline{X}_t} \bigcap _{i \in [n]} \{(x_1,\dots ,x_n) \in \varOmega ^n ~|~ \Vert \overline{x}-x_i\Vert \ge t\} = \bigcup _{\overline{x} \in \overline{X}_t} ~(\varOmega \setminus B_t(\overline{x}))^{ n}. \end{aligned}$$

Applying a union bound, we get

$$\begin{aligned} \mathbb {P}(E^c)&= \mathbb {P}\left( \bigcup _{\overline{x} \in \overline{X}_t} (\varOmega \setminus B_t(\overline{x}))^n \right) \\&\le \sum _{\overline{x} \in \overline{X}_t} \mathbb {P}\big ((\varOmega \setminus B_t(\overline{x}))^{n} \big ) = \sum _{\overline{x} \in \overline{X}_t} U(\varOmega \setminus B_t(\overline{x}))^n, \end{aligned}$$

where the last step is due to the fact that \(\mathbb {P}\) is a product measure and so \(\mathbb {P}(A^n) = U^{\otimes n}(A^n) = U(A)^n\). Now we need to evaluate \(U(\varOmega \setminus B_t(\overline{x})) = 1- U(B_t(\overline{x}))\) for \(\overline{x} \in \overline{X_t}\). Since \(\overline{X}_t \subset \varOmega \), it holds

$$\begin{aligned} \forall \overline{x} \in \overline{X}_t,~U( B_t(\overline{x})) = \tfrac{\text {vol}(B_t(\overline{x}) \cap \varOmega )}{\text {vol}(\varOmega )} \ge \tfrac{\min _{x \in \varOmega } \text {vol}(B_t(x) \cap \varOmega )}{\text {vol}(\varOmega )}. \end{aligned}$$

Step 3. Bounding \(\text {vol}(B_t(x) \cap \varOmega )\) when \(t \le r\). Let us now find a lower bound for \(\min _{x \in \varOmega } \text {vol}(B_t(x) \cap \varOmega )\). Recall that since \(\varOmega \) satisfies Assumption 1(a), \(\varOmega \) can be written \(\varOmega = \cup _{z \in S} B_r(z)\). Let \(t \le r\) and \(x \in \varOmega \). By the above, there exists \(z \in S\) such that \(x \in B_r(z) \subset \varOmega \) and hence \(B_t(x) \cap B_r(z) \subset B_t(x) \cap \varOmega \). Let \(C_{x,z,t}\) denote the cone centered at x, directed toward z, of radius t with aperture \(\pi /3\). It is easy to see geometrically that \(B_r(z) \cap B_t(x)\) contains the cone \(C_{x,z,t}\) (this fact is proved in Lemma 13). Moreover, using the lower bound for the volume of this cone provided in Lemma 13, it holds:

$$\begin{aligned} \text {vol}(\varOmega \cap B_t(x))&\ge \text {vol}(B_r(z) \cap B_t(x)) \ge \text {vol}(C_{x,z,t}) \\&\ge \frac{2V_{d-1}}{\sqrt{3}d} \left( \tfrac{\sqrt{3}}{4}\right) ^{d}t^d. \end{aligned}$$

Step 4. Expressing t with respect to n and \(\delta \) and guaranteeing that \(t \le r\). To conclude, let \(C = \frac{V_{d-1}}{2d \text {vol}(\varOmega )} \left( \tfrac{\sqrt{3}}{4}\right) ^{d-1}\). Since \(N_t \le (1 + 2R/t)^d\), and \((1 - c)^x \le e^{-c x}\) for any \(x \ge 0\) and \(c \in [0,1]\), then

$$\begin{aligned} \mathbb {P}(E) \ge 1 - N_t \big (1- C t^d\big )^n \ge 1 - e^{-Ct^d n + d\log (1+ 2R/t)} \ge 1 - \delta , \end{aligned}$$

where the last step is obtained by setting

$$\begin{aligned} t = (Cn)^{-1/d} \left( \log \frac{(1+2R(Cn)^{1/d})^d}{\delta }\right) ^{1/d}. \end{aligned}$$

Then \(h_{\widehat{X},\varOmega } \le 2t\) with probability at least \(1-\delta \), when \(t \le r\). The desired result is obtained by further bounding C and t as follows.

Bounding C. It holds \(\frac{2V_{d-1}}{\sqrt{3}d V_d} = \left( \tfrac{4}{3 d^2 \pi }\right) ^{1/2} \frac{\varGamma (d/2 + 1)}{\varGamma (d/2 + 1/2)}\). Using Gautschi’s inequality and the fact that \(d \ge 1\),

$$\begin{aligned} \left( \tfrac{2}{3 d \pi }\right) ^{1/2}\le \frac{2V_{d-1}}{\sqrt{3}d V_d}\le \left( \tfrac{2(d+2)}{3 d^2 \pi }\right) ^{1/2}\le 1. \end{aligned}$$

Since \(\left( \tfrac{3d\pi }{2}\right) ^{1/2d} \tfrac{4}{\sqrt{3}} \le 2\sqrt{2\pi } \) for all \(d \ge 1\), and since \(V_d r^d \le \text {vol}(\varOmega ) \le V_d R^d\), it holds

$$\begin{aligned} (2\sqrt{2\pi } R)^{-d} \le C \le (4r/\sqrt{3})^{-d} \implies \frac{n^{1/d}}{2\sqrt{2\pi }R} \le (Cn)^{1/d} \le \frac{\sqrt{3}n^{1/d}}{4 r} \le \frac{n^{1/d}}{2 r} . \end{aligned}$$

Bounding t. Since \((1 + x)^d \le (2x)^d \) for any \(x \ge 1\), \(2R(Cn)^{1/d} \le \frac{R}{r} n^{1/d}\), and \(\frac{R}{r} n^{1/d} \ge 1\), it holds

$$\begin{aligned} t \le 2\sqrt{2\pi } R n^{-1/d} (\log \tfrac{n}{\delta } + d \log \tfrac{2R}{r})^{1/d} . \end{aligned}$$

Guaranteeing \(t \le r\). Applying Lemma 12 to \(\alpha = (2\pi )^{d/2} (2R/r)^d\) and \(\beta = (2R/r)^d/\delta \), it holds that if

$$\begin{aligned} n\ge 2\alpha \log (2\alpha \beta ) = 2~(2\pi )^{d/2} (2R/r)^d ~\left( \log \frac{2}{\delta } + d/2\log (2\pi ) + 2d \log (2R/r)\right) , \end{aligned}$$

then \(\alpha /n \log (\beta n) \le 1\), so

$$\begin{aligned} t \le 2\sqrt{2\pi } R n^{-1/d} (\log \tfrac{n}{\delta } + d \log \tfrac{2R}{r})^{1/d} \le r \left( \tfrac{\alpha }{n} \log (\beta n)\right) ^{1/d} \le r . \end{aligned}$$

\(\square \)
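Theorem 4's conclusion can be sanity-checked by simulation (a sketch for \(\varOmega \) the unit disk, so \(r = R = 1\); the grid resolution, n, and \(\delta \) are our own choices, and since the bound only holds with probability at least \(1-\delta \), the assertion is expected, not guaranteed, to pass):

```python
import numpy as np

rng = np.random.default_rng(0)
d, R, r, n, delta = 2, 1.0, 1.0, 2000, 0.5   # Omega = unit disk

# n samples, uniform on the disk (rejection sampling from the square).
pts = rng.uniform(-R, R, size=(4 * n, d))
pts = pts[np.linalg.norm(pts, axis=1) <= R][:n]

# Fill distance h = max_{x in Omega} min_i ||x - x_i||, approximated on a grid.
g = np.linspace(-R, R, 301)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, d)
grid = grid[np.linalg.norm(grid, axis=1) <= R]
h = 0.0
for block in np.array_split(grid, 50):   # chunked to limit memory use
    dists = np.linalg.norm(block[:, None, :] - pts[None, :, :], axis=-1)
    h = max(h, dists.min(axis=1).max())

# Theorem 4: h <= 11 R n^{-1/d} (log((2R/r)^d n / delta))^{1/d}, w.p. >= 1 - delta.
bound = 11 * R * n**(-1 / d) * np.log((2 * R / r)**d * n / delta)**(1 / d)
assert h <= bound
```

In practice the fill distance of uniform samples is far below the bound, reflecting the worst-case constants tracked in the proof.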

1.2 Proof of Theorem 6

Proof

Recall that \(s > d/2\) and \(m < s-\tfrac{d}{2}\) is a positive integer. Assume that \(\varOmega \) satisfies Assumption 1(a) for a certain r and that the diameter of \(\varOmega \) is bounded by 2R. In particular, if \(\varOmega \) is a ball of radius R, then \(\varOmega \) satisfies Assumption 1(a) with \(r = R\). In the first step of the proof we guarantee that n is large enough to apply Theorem 4 and that \(h_{\widehat{X},\varOmega }\), controlled by Theorem 4, satisfies the assumptions of Theorem 5. Then we apply Theorem 5.

Step 1. Guaranteeing n large enough and \(h_{\widehat{X},\varOmega } \le r/(18(m-1)^2)\). Applying Lemma 12 to \(\alpha = \left( \frac{2R}{r}\right) ^d\max (3,10(m-1))^{2d}\) and \(\beta = \tfrac{(2R)^d}{r^d~\delta }\), it holds that if

$$\begin{aligned} n \ge 2\alpha \log (2\alpha \beta ) = \left( \frac{2R}{r}\right) ^d \max (3,10(m-1))^{2d}\left( 2 \log \frac{2}{\delta } + 4d \log \left( \tfrac{R}{r} \max (6,20(m-1))\right) \right) , \end{aligned}$$

then \(\alpha /n \, \log (\beta n) \le 1\), which implies

$$\begin{aligned} n^{-1/d}\left(\log \frac{n}{\delta } + d\log \frac{2R}{r}\right)^{1/d} \le \frac{r}{2R\max (3,10(m-1))^2}. \end{aligned}$$

In particular, n satisfying the condition above is large enough to satisfy the requirement of Theorem 4 (since \(r \le R\)). Therefore, by applying Theorem 4 we have that with probability at least \(1-\delta \),

$$\begin{aligned} h_{\widehat{X},\varOmega } ~~\le ~~ 11 R \, n^{-\frac{1}{d}} \, (\log \tfrac{(2R)^d~n}{r^d~\delta } )^{{1}/{d}} \le \frac{r}{\max (1,18(m-1)^2)}. \end{aligned}$$

Step 2. Applying Theorem 5. In the previous step we provided a condition on n such that \(h_{\widehat{X},\varOmega }\) satisfies \(h_{\widehat{X},\varOmega } \le \frac{r}{\max (1,18(m-1)^2)}\). By Proposition 1, Assumption 2 holds for the Sobolev kernel with smoothness s, for any \(m \in \mathbb {N}\) since \(m < s - d/2\). Then the conditions to apply Theorem 5 are satisfied. Applying Theorem 5 with \(\lambda \ge 2 \eta \max (1, \textsf{M}\textsf{D}_m)\) and \(\eta = \frac{3\max (1,18(m-1)^2)^m~d^m}{m!} h^m_{\widehat{X},\varOmega }\), we have

$$\begin{aligned} |\hat{c} - f_*| \le 2\eta |f|_{\varOmega ,m} + \lambda \text {Tr}(A_*) \le 3 \lambda (|f|_{\varOmega ,m}+\text {Tr}(A_*)). \end{aligned}$$

Thus, under this condition, we have with probability at least \(1-\delta \),

$$\begin{aligned} |\hat{c} - f_*| \le C_{m,s,d} R^m n^{-m/d} \left(\log \frac{2^d n}{\delta }\right)^{m/d}, \end{aligned}$$

where

$$\begin{aligned} C_{m,s,d} = 6\times 11^m \times \frac{\max (1,18(m-1)^2)^m d^m}{m!} \max (1,\textsf{M}\textsf{D}_m). \end{aligned}$$

Step 3. Bounding the constant term \(C_{m,s,d}\) in terms of m, s, d. Note that

$$\begin{aligned} \frac{\varGamma (m+d/2)}{\varGamma (d/2)} = (d/2)\cdots (d/2 + m-1) \le (d/2 + m-1)^{m} \end{aligned}$$

and

$$\begin{aligned} \frac{\varGamma (s-d/2-m)}{\varGamma (s-d/2)} =\frac{1}{ (s-d/2 - m)\cdots (s-d/2-1)} \le \left( \frac{1}{s-d/2-m}\right) ^{m}, \end{aligned}$$

which yields:

$$\begin{aligned} \textsf{D}_m\le (2\pi )^{d/4} \left( \frac{d/2 + m-1}{s-d/2-m}\right) ^{m/2}. \end{aligned}$$

Moreover, using the bound on \(\textsf{M}\), we get

$$\begin{aligned} \textsf{D}_m\textsf{M}\le 2^{s+1/2} ~ (2\pi )^{3d/4} \left( \frac{d/2 + m-1}{s-d/2-m}\right) ^{m/2} . \end{aligned}$$

This yields the following bound for \(C_{m,s,d}\):

$$\begin{aligned} C_{m,s,d}\le \frac{6 \max (1,18(m-1)^2)^m (11d)^m}{m!} \max \left( 1,2^{s+1/2} ~ (2\pi )^{3d/4} \left( \frac{d/2 + m-1}{s-d/2-m}\right) ^{m/2} \right) . \end{aligned}$$

\(\square \)

Global minimizer. Proofs

1.1 Proof of Remark 4

Proof

Since f satisfies both Assumptions 1(b) and 4, denote by \(\zeta \) the unique minimizer of f in \(\varOmega \). Since \(\zeta \) is a strict minimum by Assumption 1(b), there exists \(\beta _1 > 0\) such that \(\nabla ^2 f(\zeta ) \succeq \beta _1 I\). Thus, since \(f \in C^2(\mathbb {R}^d)\), there exists a small radius \(t>0\) such that \(\nabla ^2 f(x) \succeq \tfrac{\beta _1}{2} I\) for all \(x \in B_t(\zeta )\) and hence

$$\begin{aligned} \forall x \in \varOmega \cap B_t(\zeta ),~ f(x) - f_* = f(x) -f(\zeta ) - \nabla f(\zeta )^\top (x-\zeta ) \ge \tfrac{\beta _1}{4}\Vert x-\zeta \Vert ^2. \end{aligned}$$
(F.1)

Moreover, since f has no minimizer on the boundary of \(\varOmega \) and since \(\zeta \) is the unique minimizer of f on \(\varOmega \), f has no minimizer on \(K = \overline{\varOmega } \setminus {B_t(\zeta )}\), which is a compact set. Denote by m the minimum of f on K. Since K is compact, this minimum is attained, and since f does not reach its global minimum \(f_*\) on K, we have \(m - f_* > 0\). Let R be a radius such that \(\overline{\varOmega } \subset B_R(\zeta )\), which exists since \(\varOmega \) is bounded. Then, since \(\Vert x - \zeta \Vert < R\) for any \(x \in \overline{\varOmega }\), it holds for any \(x \in K\):

$$\begin{aligned} f(x) - f_* = f(x) - m + m-f_* \ge m-f_* = \frac{2(m-f_*)}{2R^2}R^2 \ge \frac{2(m-f_*)}{2R^2}\Vert x-\zeta \Vert ^2. \end{aligned}$$
(F.2)

Thus, taking \(\beta = \min (\tfrac{\beta _1}{2},\tfrac{2(m-f_*)}{R^2})\) and combining Eqs. (F.1) and (F.2), it holds

$$\begin{aligned} \forall x \in \varOmega ,~f(x)-f_* \ge \frac{\beta }{2}\Vert x-\zeta \Vert ^2. \\ \end{aligned}$$

\(\square \)
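Remark 4's conclusion — the existence of a global quadratic-growth constant \(\beta \) on \(\varOmega \) — can be illustrated on a simple one-dimensional example (f, \(\varOmega \), and the grid below are our own choices, and \(\beta \) is estimated empirically rather than via the proof's construction):

```python
import numpy as np

# f has a unique strict interior minimizer zeta = 0 on Omega = (-1, 2),
# with f_* = f(0) = 1, since 1 + 0.5*sin(3x) >= 0.5 > 0.
f = lambda x: x**2 * (1 + 0.5 * np.sin(3 * x)) + 1.0
zeta, f_star = 0.0, 1.0

x = np.linspace(-0.999, 1.999, 100001)
x = x[x != zeta]

# Empirical quadratic-growth constant: f(x) - f_* >= (beta/2) |x - zeta|^2.
beta = ((f(x) - f_star) / ((x - zeta)**2 / 2)).min()
assert beta > 0
```

Here the ratio equals \(2(1 + 0.5\sin 3x)\), so the empirical \(\beta \) is close to 1, consistent with the remark.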

1.2 Proof of Theorem 7

Proof

Let us divide the proof into four steps.

Step 1: Extending the parabola outside of \(\varOmega \). Since \(\varOmega \) is an open set containing \(\zeta \), there exists \(t > 0\) such that \(B_t(\zeta ) \subset \varOmega \). Define \(\delta = \tfrac{\beta - \nu }{2}t^2\). It holds:

$$\begin{aligned} \forall x \in \mathbb {R}^d\setminus {\varOmega },~ \frac{\beta }{2}\Vert x-\zeta \Vert ^2 \ge \frac{\nu }{2}\Vert x-\zeta \Vert ^2 + \delta . \end{aligned}$$
(F.3)

Now define the following open set:

$$\begin{aligned} \widetilde{\varOmega } = \left\{ x \in \mathbb {R}^d ~:~ f(x) - f_* - \tfrac{\beta }{2}\Vert x - \zeta \Vert ^2> - \delta /2\right\} . \end{aligned}$$

It is open since f is continuous. Moreover, it contains the closure \(\overline{\varOmega }\) of \(\varOmega \), which is compact since it is closed and bounded in \(\mathbb {R}^d\). Theorem 1.4.2 in [40] applied to \(X = \widetilde{\varOmega }\) and \(K = \overline{\varOmega }\) shows the existence of \(\chi : \mathbb {R}^d \rightarrow \mathbb {R}\) such that \(\chi \in C^\infty (\mathbb {R}^d)\), \(\chi (x) \in [0,1]\), \(\chi = 1\) on \(\overline{\varOmega }\) and \(\chi = 0\) on \(\mathbb {R}^d \setminus {\widetilde{\varOmega }}\). Finally, define \(\overline{p}_{\nu }(x) := \frac{\nu }{2}\Vert x-\zeta \Vert ^2 \chi (x)\). Then \(\overline{p}_{\nu }\) satisfies the following properties:

  • \(\overline{p}_{\nu } \in C^\infty (\mathbb {R}^d)\);

  • for all \(x \in \overline{\varOmega }\), \(\overline{p}_{\nu }(x) = \frac{\nu }{2}\Vert x-\zeta \Vert ^2 \le \frac{\beta }{2}\Vert x-\zeta \Vert ^2\);

  • for all \(x \in \mathbb {R}^d \setminus {\widetilde{\varOmega }}\), \(\overline{p}_{\nu }(x) =0\);

  • for all \(x \in \widetilde{\varOmega }\setminus {\varOmega }\), \(f(x) - f_* - \overline{p}_{\nu }(x) \ge \delta /2\).

The first, second and third properties are direct consequences of the properties of \(\chi \) and the fact that \(\nu < \beta \). The last property comes from combining Eq. (F.3) with the definition of \(\widetilde{\varOmega }\) and the fact that \(\chi \in [0,1]\):

$$\begin{aligned} \forall x \in \widetilde{\varOmega }\setminus {\varOmega },~f(x) - f_* - \overline{p}_{\nu }(x)&= f(x) - f_* - \chi (x)\tfrac{\nu }{2}\Vert x-\zeta \Vert ^2 \\&\ge f(x) - f_* - \tfrac{\nu }{2}\Vert x-\zeta \Vert ^2\\&= \left( f(x) - f_* - \tfrac{\beta }{2}\Vert x-\zeta \Vert ^2\right) \\&\quad + \left( \tfrac{\beta }{2}\Vert x-\zeta \Vert ^2 - \tfrac{\nu }{2}\Vert x-\zeta \Vert ^2\right) \\&\ge -\delta /2 + \delta = \delta /2. \end{aligned}$$

Step 2: Extending \(x \mapsto f(x) - \tfrac{\nu }{2}\Vert x-\zeta \Vert ^2\) outside of \(\varOmega \). Define \(g(x) = f(x) - \overline{p}_\nu (x)\) on \(\mathbb {R}^d\). Then g satisfies Assumption 1(b), g has exactly one minimizer in \(\varOmega \) which is \(\zeta \), and its minimum is \(g(\zeta ) = f_*\). Indeed, the fact that \(g \in C^2(\mathbb {R}^d)\) comes from the fact that \(f \in C^2(\mathbb {R}^d)\) by Assumption 1(b) on f and the fact that \(\overline{p}_\nu \in C^\infty (\mathbb {R}^d)\). Moreover, \(g \ge f_*\) on \(\mathbb {R}^d\) and \(g-f_* \ge \delta /2\) on \(\partial \varOmega \). Indeed, first note that since \(\nu < \beta \), it holds

$$\begin{aligned} \forall x \in \varOmega ,~g(x) = f(x) - \overline{p}_{\nu }(x) = f(x) - \tfrac{\nu }{2}\Vert x-\zeta \Vert ^2 \ge f(x) - \tfrac{\beta }{2}\Vert x-\zeta \Vert ^2 \ge f_*, \end{aligned}$$

where the last inequality comes from Eq. (7.2). Second, since \(\overline{p}_{\nu } = 0\) on \(\mathbb {R}^d\setminus {\widetilde{\varOmega }} \) and since \(f_*\) is the minimum of f, for any \(x \in \mathbb {R}^d \setminus {\widetilde{\varOmega }}\), \(g(x) - f_* = f(x) - f_* \ge 0\). Finally, by the last point of the previous step, we see that \(g(x) \ge f_* + \delta /2 > f_*\) for any \(x \in \widetilde{\varOmega }\setminus {\varOmega }\). In particular, \(g(x) \ge f_* + \delta /2\) for any \(x \in \partial \varOmega \). Since \(g(\zeta ) = f(\zeta ) = f_*\), we see that \(f_*\) is the minimum of g on \(\mathbb {R}^d\), that this minimum is reached at \(\zeta \), and that it is not reached on the boundary of \(\varOmega \). The fact that \(\zeta \) is the unique minimizer on \(\varOmega \) follows since \(\nu < \beta \) and, by Eq. (7.2), for any \(x \in \varOmega \setminus {\{\zeta \}}\) the following holds

$$\begin{aligned} g(x)&= f(x) - \overline{p}_{\nu }(x) = f(x) - \tfrac{\nu }{2}\Vert x-\zeta \Vert ^2 \nonumber \\&> f(x) - \tfrac{\beta }{2}\Vert x-\zeta \Vert ^2 \ge f_*. \end{aligned}$$
(F.4)

The fact that this minimum is not reached on the boundary of \(\varOmega \) comes from the fact stated above that \(g(x) \ge f_* + \delta /2\) for any \(x \in \partial \varOmega \). Finally, the fact that \(\zeta \) is a strict minimum of g also comes from Eq. (F.4) which implies that \(\nabla ^2 g(\zeta ) \succeq (\beta - \nu ) I\) since g reaches a minimum in \(\zeta \), g is \(C^2\) and \(\nu < \beta \).

Note that g also satisfies Assumption 3 since f satisfies Assumption 3 and \(\overline{p}_\nu \in C^\infty (\mathbb {R}^d) \subset C^2(\mathbb {R}^d)\cap \mathcal {H}\) by Assumption 2(a).

Step 3: Applying Corollary 1 to g. The previous step shows that g satisfies Assumptions 1(b) and 3 and that g has a unique minimizer in \(\varOmega \). Moreover, \(\mathcal {H}\) satisfies Assumption 2. Hence, applying Corollary 1 to g and \(\mathcal {H}\), the following holds: there exists \(A_* \in \mathbb {S}_+(\mathcal {H})\) with \(\text {rank}(A_*) \le d + 1\) such that \(g(x) - f_* = \left\langle {\phi (x)},{A_* \phi (x)}\right\rangle \) for all \(x \in \varOmega \).

Step 4. Let \(p_0\) be the maximum of Eq. (7.1). In Lemma 5 we have seen that the optimal value of Eq. (7.1) is \(p_0 = f_*\). Since \(A \succeq 0\) implies \(\left\langle {\phi (x)},{ A \phi (x)}\right\rangle \ge 0\) for all \(x \in \varOmega \), the problem in Eq. (7.1) is a relaxation of Eq. (7.3), where the constraint \(f(x) - \tfrac{\nu }{2}\Vert x\Vert ^2 + \nu x^\top z - c = \left\langle {\phi (x)},{ A \phi (x)}\right\rangle \) is substituted by \(f(x) - \tfrac{\nu }{2}\Vert x\Vert ^2 + \nu x^\top z - c \ge 0, \forall x \in \varOmega \). Then \(p_0 \ge p^*\) if a maximum \(p^*\) exists for Eq. (7.3). Thus, if there exists A that satisfies the constraints in Eq. (7.3) for the values \(c_* = f_*+\frac{\nu }{2}\Vert \zeta \Vert ^2\) and \(z_* = \zeta \), then \(p_0 = p^*\) and \((c_*, \zeta , A)\) is a maximizer of Eq. (7.3).

The proof is concluded by noting that such an A indeed exists for the values \(c_* = f_*+\frac{\nu }{2}\Vert \zeta \Vert ^2\) and \(z_* = \zeta \): it is the operator \(A_*\) obtained in the previous step. \(\square \)

1.3 Proof of Theorem 8

Proof

The proof is a variation of the one for Theorem 5; the main difference is that we must take care of the additional term \(z - \zeta \).

Step 0. The SDP problem in Eq. (7.4) admits a solution

(a) Under the constraints of Eq. (7.4), \(c - \frac{\nu }{2}\Vert z\Vert ^2\) cannot be larger than \(\min _{i \in [n]} f(x_i)\). Indeed, for any \(i\in [n]\), since \(B \succeq 0\), the i-th constraint implies

$$\begin{aligned} f(x_i) - \tfrac{\nu }{2}\Vert x_i - z\Vert ^2 - c + \frac{\nu }{2}\Vert z\Vert ^2 = f(x_i) - \tfrac{\nu }{2}\Vert x_i\Vert ^2 + \nu x_i^\top z - c = \varPhi _i^\top B \varPhi _i \ge 0. \end{aligned}$$

Hence, \(f(x_i) \ge f(x_i) - \tfrac{\nu }{2}\Vert x_i - z\Vert ^2 \ge c - \frac{\nu }{2}\Vert z\Vert ^2\). Thus, since \(\text {Tr}(B) \ge 0\) when \(B \succeq 0\), for any (B, z, c) satisfying the constraints, \(c - \frac{\nu }{2}\Vert z\Vert ^2 - \lambda \text {Tr}(B) \le \min _{i \in [n]}{f(x_i)}\).

(b) There exists an admissible point. Indeed, let \((c_*, z_*, A_*)\) be the solution of Eq. (7.3) such that \(A_*\) has minimum trace norm (by Theorem 7, we know that this solution exists with \(c_* = f_* + \tfrac{\nu }{2}\Vert \zeta \Vert ^2\) and \(z_* = \zeta \), under Assumptions 1 to 4). Then, by Lemma 3 applied to \(g(x) = f(x) - \frac{\nu }{2}\Vert x\Vert ^2 + \nu x^\top z_* - c_*\) and \(A =A_*\), given \(\widehat{X} = \{x_1,\dots ,x_n\}\) we know that there exists \(\overline{B} \in \mathbb {S}_+(\mathbb {R}^n)\) satisfying \(\text {Tr}(\overline{B}) \le \text {Tr}(A_*)\) such that the constraints of Eq. (7.4) are satisfied for \(c = c_*\) and \(z = z_*\). Then \((c_*,z_*, \overline{B})\) is admissible for the problem in Eq. (7.4). Since there exists an admissible point for the constraints of Eq. (7.4) and, by point (a), the objective is bounded above by \(\min _{i \in [n]}f(x_i)\), the SDP problem in Eq. (7.4) admits a solution [21].

Step 1. Consequences of the existence of \(A_*\). Let \((\hat{c}, \hat{z}, \hat{B})\) be a maximizer of Eq. (7.4). The existence of the admissible point \((c_*, z_*, \overline{B})\) implies that

$$\begin{aligned} \hat{c} - \tfrac{\nu }{2}\Vert \hat{z}\Vert ^2 - \lambda \text {Tr}(\hat{B}) \ge c_* - \tfrac{\nu }{2}\Vert z_*\Vert ^2 - \lambda \text {Tr}(\overline{B}) \ge f_* - \lambda \text {Tr}(A_*). \end{aligned}$$
(F.5)

From which we derive,

$$\begin{aligned} \lambda \text {Tr}(\hat{B}) - \lambda \text {Tr}(A_*) ~~\le ~~ \varDelta , \quad \varDelta := \hat{c} - \tfrac{\nu }{2}\Vert \hat{z}\Vert ^2 - f_*. \end{aligned}$$
(F.6)

Step 2. \(L^\infty \) bound due to the scattered zeros. Note that the solution \((\hat{c},\hat{z},\hat{B})\) satisfies \(\hat{g}(x_i) = \varPhi _i^\top \hat{B} \varPhi _i\) for \(i \in [n]\), where the function \(\hat{g}\) is defined as \(\hat{g}(x) = f(x) - \frac{\nu }{2}\Vert x\Vert ^2 + \nu x^\top \hat{z} - \hat{c}\) for \(x \in \varOmega \); moreover, \(h_{\widehat{X},\varOmega } \le \frac{r}{\max (1,18(m-1)^2)}= \frac{r}{18(m-1)^2}\) by assumption, since \(m\ge 2\). Then we can apply Theorem 4 with \(g = \hat{g}\), \(\tau = 0\) and \(B = \hat{B}\), obtaining, for all \(x \in \varOmega \),

$$\begin{aligned} f(x) - \tfrac{\nu }{2}\Vert x\Vert ^2 + \nu x^\top \hat{z} - \hat{c} = \hat{g}(x) \ge - \eta (|\hat{g}|_{\varOmega ,m} + \textsf{M}\textsf{D}_m\text {Tr}(\hat{B})), \quad \eta = C_0 h^m_{\widehat{X},\varOmega }, \end{aligned}$$

where \(C_0\) is defined in Theorem 4 and \(C_0 = 3\frac{(18d)^m (m-1)^{2m}}{m!}\) since \(m \ge 2\). Since the inequality above holds for any \(x \in \varOmega \), by evaluating it at the global minimizer \(\zeta \in \varOmega \), where \(f(\zeta ) = f_*\), we obtain

$$\begin{aligned} -\varDelta - \tfrac{\nu }{2}\Vert \hat{z} - \zeta \Vert ^2 = \hat{g}(\zeta ) \ge -\eta (|\hat{g}|_{\varOmega ,m} + \textsf{M}\textsf{D}_m\text {Tr}(\hat{B})). \end{aligned}$$

Now we bound \(|\hat{g}|_{\varOmega ,m}\). Since \(\hat{g}(x) = f(x) - p_{\hat{z},\hat{c}}(x)\), where \(p_{\hat{z},\hat{c}}\) is the second-degree polynomial defined as \(p_{\hat{z},\hat{c}}(x) = \tfrac{\nu }{2} \Vert x\Vert ^2 - \nu x^\top \hat{z} + \hat{c}\), we have

$$\begin{aligned} |\hat{g}|_{\varOmega ,m} \le |f|_{\varOmega ,m} \,+\, |p_{\hat{z},\hat{c}}|_{\varOmega ,m} \le |f|_{\varOmega ,m} + \nu , \end{aligned}$$
(F.7)

since for \(m = 2\), we have \(|p_{\hat{z},\hat{c}}|_{\varOmega ,2} = \sup _{i,j \in [d], x \in \varOmega } |\frac{\partial ^2 p_{\hat{z},\hat{c}}(x)}{\partial x_i \partial x_j}| = \nu \) and also \(|p_{\hat{z},\hat{c}}|_{\varOmega ,m} = 0\) for \(m > 2\). Then

$$\begin{aligned} \varDelta \le \varDelta + \tfrac{\nu }{2}\Vert \hat{z} - \zeta \Vert ^2 \le \eta |f|_{\varOmega ,m} + \eta \textsf{M}\textsf{D}_m\text {Tr}(\hat{B}) + \eta \nu . \end{aligned}$$
(F.8)

Conclusion. Combining Eq. (F.8) with Eq. (F.6), since \(\tfrac{\nu }{2}\Vert \hat{z} - \zeta \Vert ^2 \ge 0\) and since \(\lambda \ge 2\textsf{M}\textsf{D}_m\eta \) by assumption, we have

$$\begin{aligned} \tfrac{\lambda }{2} \text {Tr}(\hat{B}) \le (\lambda - \textsf{M}\textsf{D}_m\eta ) \text {Tr}(\hat{B}) \le \eta |f|_{\varOmega ,m} + \eta \nu + \lambda \text {Tr}(A_*), \end{aligned}$$

from which we obtain Eq. (7.7). Moreover, the inequality Eq. (7.6) is derived by bounding \(\varDelta \) from below as \(\varDelta \ge -\lambda \text {Tr}(A_*)\) by Eq. (F.6), since \(\text {Tr}(\hat{B}) \ge 0\) by construction, and bounding it from above as

$$\begin{aligned} \varDelta \le 2\eta |f|_{\varOmega ,m} + 2\eta \nu + \lambda \text {Tr}(A_*), \end{aligned}$$

that is obtained by combining Eq. (F.8) with Eq. (7.7) and with the assumption \(\textsf{M}\textsf{D}_m\eta \le \lambda /2\). Finally from Eq. (F.8) we obtain

$$\begin{aligned} \tfrac{\nu }{2}\Vert \hat{z} - \zeta \Vert ^2 \le |\varDelta | ~+~ \eta |f|_{\varOmega ,m} + \eta \textsf{M}\textsf{D}_m\text {Tr}(\hat{B}) + \eta \nu , \end{aligned}$$

from which we derive the bound on \(\frac{\nu }{2}\Vert \hat{z} - \zeta \Vert ^2\) in Eq. (7.5), by bounding \(|\varDelta |\) and \(\text {Tr}(\hat{B})\) via Eq. (7.6) and Eq. (7.7). \(\square \)

Proofs for the extensions

1.1 Proof of Theorem 9

Proof

Let \((\tilde{c}, \tilde{B})\) be a minimum trace-norm solution of Eq. (2.4). The optimal value \(p_{\lambda ,n}\) of Eq. (2.4) then corresponds to \(p_{\lambda ,n} = \tilde{c} - \lambda \text {Tr}(\tilde{B})\). Combining Eq. (8.1) with Eq. (5.7) from the proof of Theorem 5 and the fact that \(\theta _2 \le \lambda /8\), we have that

$$\begin{aligned} \tfrac{7}{8}\lambda \text {Tr}(\tilde{B}) - \lambda \text {Tr}(A_*) - \theta _1 \le {\tilde{\varDelta }}, \quad {\tilde{\varDelta }} := \tilde{c} - f_*. \end{aligned}$$
(G.1)

Analogously to Step 3 of the proof of Theorem 5, by applying Theorem 4 to Eq. (8.2) with \(g(x) = f(x) - \tilde{c}, B = \tilde{B}\) and \(\tau = \tau _1 + \tau _2\text {Tr}(\tilde{B})\), we obtain for any \(x \in \varOmega \)

$$\begin{aligned} f(x) - \tilde{c} ~\ge ~ -2\tau _1 - 2\tau _2\text {Tr}(\tilde{B}) ~-~ \eta (|g|_{\varOmega ,m} + \textsf{M}\textsf{D}_m\text {Tr}(\tilde{B})), \qquad \eta = C_0 h^m_{\widehat{X},\varOmega }, \end{aligned}$$

with \(C_0\) defined in Theorem 4. Now evaluating the inequality above for \(x = \zeta \), noting that \(|g|_{\varOmega ,m} = |f|_{\varOmega ,m}\) since \(m \ge 1\), and considering that by assumption \(\tau _2 \le \lambda /8\) and \(\textsf{M}\textsf{D}_m\eta \le \lambda /2\) we derive

$$\begin{aligned} \tilde{\varDelta } = -(f(\zeta ) - \tilde{c}) \le 2\tau _1 + \tfrac{3}{4}\lambda \text {Tr}(\tilde{B}) + \eta |f|_{\varOmega ,m}. \end{aligned}$$
(G.2)

The desired result is obtained by combining Eq. (G.2) and Eq. (G.1) as we did in Step 3 of Theorem 5. \(\square \)

1.2 Proof of Corollary 2

Proof

Define \(\mathcal {H}= \{g \in C^s(\varOmega )~:~ \exists f \in C^s(\mathbb {R}^d),~f|_\varOmega = g \}\), endowed with the following norm:

$$\begin{aligned} \forall g \in \mathcal {H},~ \Vert g\Vert _{\mathcal {H}} = \sup _{|\alpha | \le s}\sup _{x \in \varOmega }{\Vert \partial ^\alpha g(x)\Vert }. \end{aligned}$$

Note that this norm is well defined: for any \(g \in \mathcal {H}\) there exists \(f \in C^s(\mathbb {R}^d)\) such that \(g = f|_\varOmega \), and since all the derivatives of f are continuous, they are bounded on the bounded set \(\varOmega \); hence so are all the derivatives of g.

Now note that \(\mathcal {H}\) satisfies Assumptions 2(a) to 2(c). Indeed, given \(u,v \in \mathcal {H}\) the first assumption is satisfied as a simple consequence of the Leibniz formula, since for any \(x \in \varOmega \), \(\partial ^{\alpha }(u\cdot v)(x) = \sum _{\beta \le \alpha }{ \left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) \partial ^\beta u(x) \partial ^{\alpha - \beta } v(x)}\) which in turn implies that for any \(|\alpha | \le s\) and \(x \in \varOmega \), \(\Vert \partial ^{\alpha }(u\cdot v)(x)\Vert \le 2^{|\alpha |}~\Vert u\Vert _{\mathcal {H}} ~\Vert v\Vert _{\mathcal {H}}\) and hence \(\Vert u\cdot v\Vert _{\mathcal {H}} \le 2^s \Vert u\Vert _{\mathcal {H}} ~\Vert v\Vert _{\mathcal {H}} \). Assumption 2(b) is trivially satisfied and Assumption 2(c) is a simple consequence of the dominated convergence theorem. Indeed, if \(u \in \mathcal {H}\) and \(\overline{u} \in C^s(\mathbb {R}^d)\) such that \(\overline{u}|_\varOmega = u\), define

$$\begin{aligned} \forall x,z \in \mathbb {R}^d,~ \overline{v}_z(x) = \int _{0}^1{(1-t)\overline{u}(z + t(x-z))dt}. \end{aligned}$$

\(\overline{v}_z\) is in \(C^s(\mathbb {R}^d)\) by dominated convergence, and \(v_{z} = \overline{v}_z|_{\varOmega }\) satisfies the desired property (in this case, there is no need to depend on r and one can simply take \(g_{r,z} = v_z\)).

Moreover, if \(f \in C^{s+2}(\mathbb {R}^d)\), then in particular, for any \(i,j \in [d]\), \(\tfrac{\partial ^2 f}{\partial x_i \partial x_j} \in C^s(\mathbb {R}^d) \) and hence its restriction to \(\varOmega \) is in \(\mathcal {H}\). Moreover, since \(s \ge 0\), \(f|_\varOmega \in \mathcal {H}\). This shows that f satisfies Assumptions 1(b) and 3.

Therefore, Theorem 2 can be applied, and there exist \(\tilde{w}_1,\dots ,\tilde{w}_p \in \mathcal {H}\), \(p \in \mathbb {N}_+\), such that

$$\begin{aligned} \forall x \in \varOmega ,~f(x) - f_* = \sum _{j \in [p]} \tilde{w}_j^2(x). \end{aligned}$$

By definition of \(\mathcal {H}\), taking \(w_1,\dots ,w_p \in C^s(\mathbb {R}^d)\) such that \(w_j|_\varOmega = \tilde{w}_j\), the corollary holds. \(\square \)

1.3 Certificate of optimality for the global minimizer candidate of Eq. (7.4)

Theorem 12

(Certificate of optimality for Eq. (7.4)). Let \(\varOmega \) satisfy Assumption 1(a) for some \(r > 0\). Let k be a kernel satisfying Assumptions 2(a) and 2(d) for some \(m \ge 2\). Let \(\widehat{X} = \{x_1,\dots , x_n\} \subset \varOmega \) with \(n \in \mathbb {N}\) such that \(h_{\widehat{X},\varOmega } \le \frac{r}{18(m-1)^2}\). Let \(f \in C^m(\varOmega )\) and let \(\hat{c} \in \mathbb {R}, \hat{z} \in \mathbb {R}^d,\hat{B} \in \mathbb {S}_+(\mathbb {R}^n)\) and \(\tau \ge 0\) satisfying

$$\begin{aligned} |f(x_i) - \tfrac{\nu }{2}\Vert x_i\Vert ^2 + \nu x_i^\top {\hat{z}} - \hat{c} ~-~ \varPhi _i^\top \hat{B} \varPhi _i | \le \tau , \quad i \in [n] \end{aligned}$$
(G.3)

where \(\varPhi _i\) are defined in Sect. 2. Let \(f_* = \min _{x \in \varOmega } f(x)\) and \(\hat{f} = \hat{c} - \tfrac{\nu }{2}\Vert \hat{z}\Vert ^2\). Then,

$$\begin{aligned} |f(\hat{z}) -f_*|&\le f(\hat{z}) - \hat{f} +2 \tau + C_1 h_{\widehat{X},\varOmega }^m , \end{aligned}$$
(G.4)
$$\begin{aligned} \tfrac{\nu }{2}\Vert \zeta -\hat{z}\Vert ^2&\le f(\hat{z}) - \hat{f} + 2\tau + C_2 h_{\widehat{X},\varOmega }^m. \end{aligned}$$
(G.5)

and \(C_1 = C_0(|f|_{\varOmega ,m}+\textsf{M}\textsf{D}_m\text {Tr}(\hat{B}) + \textsf{M}\textsf{D}_m\hat{C})\), \(C_2 = C_0(|f|_{\varOmega ,m}+ \nu +\textsf{M}\textsf{D}_m\text {Tr}(\hat{B}))\), where \(\hat{C} = \tfrac{\nu }{2} \Vert R^{-\top } (X - 1_n\hat{z}^\top )\Vert ^2\), with \(X \in \mathbb {R}^{n\times d}\) the matrix whose i-th row corresponds to the point \(x_i\) and \(1_n \in \mathbb {R}^n\) the vector whose entries are all equal to 1. The constants \(C_0\), defined in Theorem 4, and \(m, \textsf{M}, \textsf{D}_m\), defined in Assumptions 2(a) and 2(d), do not depend on \(n, \widehat{X}, h_{\widehat{X},\varOmega }, \hat{c}, \hat{B}\) or f.

Proof

We divide the proof into two steps.

Step 1. First note that

$$\begin{aligned} \hat{g}(x) := f(x) - \tfrac{\nu }{2}\Vert x\Vert ^2 + \nu x^\top {\hat{z}} - \hat{c} = f(x) - \tfrac{\nu }{2}\Vert x - \hat{z} \Vert ^2 - \hat{f}. \end{aligned}$$

By applying Theorem 4 with \(g = \hat{g}\) and \(B = \hat{B}\) we have that, for any \(x \in \varOmega \), \(f(x) - \tfrac{\nu }{2}\Vert x -\hat{z}\Vert ^2 - \hat{f} = \hat{g}(x) \ge - \varepsilon - 2\tau \), where \(\varepsilon = C_0(|\hat{g}|_{\varOmega ,m} + \textsf{M}\textsf{D}_m\text {Tr}(\hat{B})) h_{\widehat{X},\varOmega }^m\) and \(C_0\) is defined in Theorem 4. In particular this implies that

$$\begin{aligned} f(\zeta ) - \hat{f} - \tfrac{\nu }{2}\Vert \zeta -\hat{z}\Vert ^2 \ge - \varepsilon - 2\tau , \end{aligned}$$

from which Eq. (G.5) is obtained by considering that \(f(\hat{z}) \ge f(\zeta )\) since \(\zeta \) is a minimizer of f. To conclude the proof of Eq. (G.5) note that \(|\hat{g}|_{\varOmega ,m} \le |f|_{\varOmega ,m} + \nu \) since \(m \ge 2\).

Step 2. Now to obtain Eq. (G.4) we need a slightly different construction. Let \(u_j(x) = e_j^\top (x - \hat{z})\) for any \(x \in \varOmega \). Note that since \(u_j\) is the restriction to \(\varOmega \) of a \(C^\infty \) function on \(\mathbb {R}^d\), by Assumption 2(a), \(u_j \in \mathcal {H}\). Moreover, note that \(\tfrac{\nu }{2}\Vert x-\hat{z}\Vert ^2 = \tfrac{\nu }{2}\sum _{j=1}^d u_j(x)^2\). Take \(\hat{u}_j \in \mathbb {R}^n\) defined as \(\hat{u}_j = Vu_j\) and note that

$$\begin{aligned} \varPhi _i^\top \hat{u}_j = \big<V\phi (x_i),Vu_j\big> = \big<V^*V\phi (x_i),u_j\big> = \big <P\phi (x_i),u_j\big > = u_j(x_i). \end{aligned}$$

Then, defining \(\hat{G} = \tfrac{\nu }{2}\sum _{j=1}^d \hat{u}_j \hat{u}_j^\top \in \mathbb {S}_+(\mathbb {R}^n)\) we have

$$\begin{aligned} \tfrac{\nu }{2}\Vert x_i-\hat{z}\Vert ^2 = \varPhi _i^\top \hat{G} \varPhi _i, \qquad \forall i \in [n]. \end{aligned}$$

Substituting \(-\tfrac{\nu }{2}\Vert x_i\Vert ^2 + \nu x_i^\top \hat{z}\) with \(\tfrac{\nu }{2}\Vert \hat{z}\Vert ^2 - \varPhi _i^\top \hat{G} \varPhi _i\) in the inequality in Eq. (G.3), we obtain

$$\begin{aligned} |f(x_i) - \hat{f} ~-~ \varPhi _i^\top (\hat{B} + \hat{G}) \varPhi _i | \le \tau , \quad \forall i \in [n]. \end{aligned}$$

By applying Theorem 4 with \(g(x) = f(x) - \hat{f}\) and \(B = \hat{B} + \hat{G}\) we have that \(f(x) - \hat{f} \ge - \varepsilon -2 \tau \) for all \(x \in \varOmega \), where \(\varepsilon = C' h^m_{\hat{X},\varOmega }\) with \(C' = C_0 (|g|_{\varOmega ,m}+\textsf{M}\textsf{D}_m\text {Tr}(\hat{B} + \hat{G}))\). In particular, \(f(\zeta ) - \hat{f} \ge - \varepsilon - 2\tau \), from which Eq. (G.4) is obtained considering that \(f(\hat{z}) \ge f_*\) since \(\zeta \) is a minimizer of f.

Finally, note that \(|g|_{\varOmega ,m} \le |f|_{\varOmega ,m}\) since \(m \ge 1\). The proof is concluded by noting that using the definition of V we have \(\hat{u}_j = R^{-\top }\hat{v}_j\) with \(\hat{v}_j \in \mathbb {R}^n\) corresponding to \(\hat{v}_j = (u_j(x_1), \dots , u_j(x_n))\) for \(j \in [d]\) and that \(\text {Tr}(\hat{G}) = \tfrac{\nu }{2}\sum _{j\in [d]} \Vert \hat{u}_j\Vert ^2\). In particular, some basic linear algebra leads to \(\text {Tr}(\hat{G}) = \tfrac{\nu }{2}\Vert R^{-\top } (X - 1_n\hat{z}^\top )\Vert ^2\). \(\square \)

Details on the algorithmic setup used in the benchmark experiments

In this section, we detail the algorithmic setup used to perform the experiments in Sect. 10.1. In all the following problems, the set \(\varOmega \) on which we minimize the function is a hyper-rectangle. Given a hyper-rectangle R, we identify it with its center \(c_R\in \mathbb {R}^d\) and its width \(w_R \in \mathbb {R}^d\), such that \(R = \prod _{i=1}^d\left( (c_R)_i - (w_R)_i/2,(c_R)_i + (w_R)_i/2\right) \).
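With this center/width parametrization, sampling uniformly from a hyper-rectangle is a one-liner. The sketch below is illustrative (the helper name `sample_hyper_rectangle` is ours, not from the paper): each coordinate is drawn uniformly from the corresponding interval.

```python
import numpy as np

def sample_hyper_rectangle(c_R, w_R, n, rng=None):
    """Draw n points uniformly from the hyper-rectangle with center c_R and width w_R."""
    rng = np.random.default_rng() if rng is None else rng
    c_R, w_R = np.asarray(c_R, float), np.asarray(w_R, float)
    # Coordinate i is uniform on ((c_R)_i - (w_R)_i / 2, (c_R)_i + (w_R)_i / 2).
    u = rng.random((n, c_R.size))
    return c_R + (u - 0.5) * w_R

# A 2-d example: rectangle centered at the origin with widths (2, 4).
X = sample_hyper_rectangle([0.0, 0.0], [2.0, 4.0], n=1000)
```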

Algorithm 2

Finding a minimizer given points X

We start by defining Algorithm 2, whose main goal is to find a global minimizer as described in the previous sections, given sample points \((x_1,\dots ,x_n)\). Recall that the algorithm introduced in Sects. 6 and 7.1 computes a minimizer by solving the problem:

$$\begin{aligned} \widehat{\alpha } = \mathop {\text {argmin}}_{\begin{array}{c} \alpha \in \mathbb {R}^n\\ \alpha ^{\top } 1_n = 1 \end{array}} \sum _{i=1}^n \alpha _i f(x_i) - \frac{\varepsilon }{n} \log \det \big (\varPhi ^\top \textrm{Diag}(\alpha ) \varPhi + \lambda I \big ) + \frac{\varepsilon }{n} \log \frac{ \varepsilon }{n} - \varepsilon , \end{aligned}$$
(H.1)

where \(\varPhi \) satisfies \(\varPhi ^{\top } \varPhi = K\) for \(K = (k(x_i,x_j))_{1 \le i,j \le n} \in \mathbb {R}^{n \times n}\), and choosing \(\widehat{x}\) as the approximation of the minimizer, defined by

$$\begin{aligned} \widehat{x} = \sum _{i=1}^n{\widehat{\alpha }_i x_i }. \end{aligned}$$
(H.2)

However, the kernel k and the hyperparameter \(\lambda \) also have to be chosen. Therefore, Algorithm 2 uses as inputs: 1) the function f to minimize; 2) the evaluation points \(x_i,~1 \le i \le n\), summarized in a matrix \(X \in \mathbb {R}^{n \times d}\); 3) the kernel k; 4) two parameters \(\lambda _{\min }\) and \(\lambda _{\max }\) such that \(\lambda \) is chosen in \([\lambda _{\min },\lambda _{\max }]\); 5) the parameter \(\varepsilon \), which controls the log barrier. For simplicity, we hide the hyperparameters linked to solving Eq. (H.1) with a Newton method, as explained in Sect. 7.1. Algorithm 2 automatically selects the hyperparameter \(\lambda \) by minimizing the function which maps \(\lambda \) (on a log scale) to the function value at the resulting \(\widehat{x}\) (\({\textsc {ScalarFunction}}\)). This function is minimized over the range \([\lambda _{\min },\lambda _{\max }]\) through the function \({\textsc {MinimizeScalar}}\). Hence, the number of function evaluations inherent to running this algorithm is \(n + n_{\min }\), where \(n_{\min }\) is a minimum number of evaluations (equal to 10).
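To make Eq. (H.1) and Eq. (H.2) concrete, here is a minimal sketch of evaluating the penalized objective at a fixed feasible \(\alpha \) and recovering the candidate minimizer \(\widehat{x}\). The function names are ours; the actual algorithm solves Eq. (H.1) with the Newton method of Sect. 7.1, which we do not reproduce here. We take \(\varPhi \) as the upper Cholesky factor R of K, so that \(\varPhi ^\top \varPhi = K\).

```python
import numpy as np

def h1_objective(alpha, f_vals, K, lam, eps):
    """Objective of Eq. (H.1) at a feasible alpha (alpha >= 0, sum(alpha) = 1).

    Phi is the upper Cholesky factor R of the kernel matrix K, so Phi^T Phi = K.
    """
    n = len(alpha)
    Phi = np.linalg.cholesky(K).T                      # R with R^T R = K
    M = Phi.T @ (alpha[:, None] * Phi) + lam * np.eye(n)  # Phi^T Diag(alpha) Phi + lam I
    sign, logdet = np.linalg.slogdet(M)                # M is positive definite for lam > 0
    return alpha @ f_vals - (eps / n) * logdet + (eps / n) * np.log(eps / n) - eps

def candidate_minimizer(alpha, X):
    """Eq. (H.2): the candidate minimizer is the alpha-weighted mean of the samples."""
    return alpha @ X
```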

In our experiments, we use \(\varepsilon = 10^{-3}\), \(\lambda _{\min } = 10^{-12}\), \(\lambda _{\max } = 1\), and we use either the Brent method or simply a grid search with a maximum of 100 points; this minimization does not have to be very precise. The full algorithm we use is an iterative scheme, written down in Algorithm 3, which computes a sequence \((x_k)\) of approximations of a minimizer of f by iteratively reducing the size of the hyper-rectangle from which the points used in Algorithm 2 are sampled. More precisely, we start from the hyper-rectangle with center \(x_0 = c_{\varOmega }\) and width \(w_0 = w_{\varOmega }\) (that is, the hyper-rectangle \(\varOmega \)), from which we draw \(m-1\) samples which, together with \(x_0\), form the points \(\widetilde{X}_0\in \mathbb {R}^{m \times d}\) used by FindMinimizer to compute the first approximation of the minimizer, \(x_1\). Then, at each step k, we use the last approximation of the minimizer \(x_k\) as the new center of the hyper-rectangle, whose width is set through the predefined function Contraction as \(w_k = {\textsc {Contraction}}(k)\,w_0\). As for the first iteration, we then form the concatenation \(\widetilde{X}_k \in \mathbb {R}^{m\times d}\) of \(m-1\) samples from this hyper-rectangle plus \(x_k\). In order to keep track of the previous points (as a kind of momentum), we apply FindMinimizer to \(X_k = [\widetilde{X}_k,\widetilde{X}_{k-1},\widetilde{X}_{k-2}]\), that is, keeping the last two sets of points as well as the ones sampled at the k-th step.
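The iterative scheme just described can be sketched as follows. This is a simplified outline under our own naming (`find_minimizer_iter`); `find_minimizer(f, X)` stands for Algorithm 2 applied to the points X, which we leave abstract here.

```python
import numpy as np

def find_minimizer_iter(f, c_omega, w_omega, find_minimizer, contraction, m, N, rng=None):
    """Outline of the iterative scheme: shrink the sampling hyper-rectangle
    around the current candidate minimizer at each step."""
    rng = np.random.default_rng() if rng is None else rng
    x_k, w0 = np.asarray(c_omega, float), np.asarray(w_omega, float)
    tiles = []  # the successive point sets \widetilde{X}_k
    for k in range(N):
        w_k = contraction(k) * w0  # contraction(0) = 1, so the first box is Omega
        # m - 1 fresh samples from the current hyper-rectangle, plus its center x_k.
        fresh = x_k + (rng.random((m - 1, len(w0))) - 0.5) * w_k
        tiles.append(np.vstack([x_k, fresh]))
        # Keep the last two point sets as well (a kind of momentum).
        X_k = np.vstack(tiles[-3:])
        x_k = find_minimizer(f, X_k)
    return x_k
```

Since the current candidate \(x_k\) is always included among the points passed to `find_minimizer`, the best function value found can only improve from one iteration to the next.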

Algorithm 3

Converging to the minimum

The function FindMinimizerIter uses the following parameters: 1) a kernel function \(x,x',\sigma \mapsto k_{\sigma }(x,x')\), where \(\sigma \) is a parameter adapted to the typical width of the data; 2) the initial hyper-rectangle \(\varOmega \); 3) the function f; 4) the contraction function Contraction used to set the width of the successive hyper-rectangles; 5) the number m of new points sampled and used at each iteration; 6) the number N of iterations. In our implementation, we use the following parameters.

  • For \(\sigma >0\) and \(x,y \in \mathbb {R}^d\), we use the following kernel, which is a mix between the Gaussian kernel (very regular functions) and the Abel kernel (Sobolev functions of order \(s = (d+1)/2\)), plus a small constant term 0.01 which makes it easier to handle the constant component of a function.

    $$\begin{aligned} k_{\sigma } (x,y) = 0.01 + \exp (-\Vert x-y\Vert ^2/(2\sigma ^2)) + \exp (-\Vert x-y\Vert /\sigma ). \end{aligned}$$
    (H.3)
  • We use the following contraction function, which depends on the dimension as well as the iteration index:

    $$\begin{aligned} {\textsc {Contraction}}(k)=\max \left( \left( 1 + \tfrac{1}{d}\right) ^{-k},\tfrac{1}{1+k^{0.6}}\right) . \end{aligned}$$
    (H.4)
  • The number N of iterations will be set to \(N= 200\) unless stated otherwise.

  • m will be specified in the experiments: indeed, the higher the dimension, the larger m has to be in order to get meaningful results. Note that one actually uses \(n=3m\) points (from the third iteration onwards) to form the optimization problem; hence the dimension of the SDP solved with the Newton method will be 3m.
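The kernel of Eq. (H.3) and the contraction schedule of Eq. (H.4) translate directly into code; the sketch below is a straightforward transcription (function names are ours).

```python
import numpy as np

def k_sigma(x, y, sigma):
    """Kernel of Eq. (H.3): constant term + Gaussian + Abel (exponential) kernel."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    return 0.01 + np.exp(-r ** 2 / (2 * sigma ** 2)) + np.exp(-r / sigma)

def contraction(k, d):
    """Contraction factor of Eq. (H.4) at iteration k in dimension d."""
    return max((1 + 1 / d) ** (-k), 1 / (1 + k ** 0.6))
```

Note that at \(x = y\) the kernel evaluates to \(0.01 + 1 + 1 = 2.01\), and that \({\textsc {Contraction}}(0) = 1\), so the first hyper-rectangle is \(\varOmega \) itself.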

Remark 6

It is equivalent to minimize a function f and to minimize the function \(\tfrac{f}{f+c}\) for a positive constant c (wherever \(f + c > 0\)). This makes it possible to minimize a function taking values in [0, 1] instead of a general real-valued function; however, it also makes higher derivatives behave differently from those of the original function. In practice, instead of minimizing f directly, we minimize \(\tfrac{f}{f+c}\), where c is chosen such that \(\tfrac{f}{f+c}\) is spread evenly over [0, 1], typically by selecting c as a quantile of the \((f(x_i))_{1 \le i \le n}\) (we choose the 0.25 quantile). We performed experiments comparing this renormalization with simply minimizing f, and the renormalization yields slightly better results.
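The renormalization of Remark 6 can be sketched as below (the helper name `renormalize` is ours). We assume positive observed values, so that \(f + c > 0\) and \(x \mapsto x/(x+c)\) is increasing, making the transformation order-preserving.

```python
import numpy as np

def renormalize(f_vals, q=0.25):
    """Map observed values f(x_i) to f / (f + c), with c the q-quantile of the
    values, so the transformed values spread over [0, 1].

    Assumes f_vals > 0 (shift f beforehand otherwise), so the map is
    increasing and minimizing it is equivalent to minimizing f."""
    c = np.quantile(f_vals, q)
    return f_vals / (f_vals + c)
```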

1.1 Additional experiments for global optimization

See Table 4.

Table 4 Complete results of our algorithm on functions on \(\mathbb {R}^d\) for \(d \ge 2\)


Cite this article

Rudi, A., Marteau-Ferey, U. & Bach, F. Finding global minima via kernel approximations. Math. Program. (2024). https://doi.org/10.1007/s10107-024-02081-4
