Construction and Monte Carlo estimation of wavelet frames generated by a reproducing kernel

We introduce a novel construction of multiscale tight frames on general domains. The frame elements are obtained by spectral filtering of the integral operator associated with a reproducing kernel. Our construction extends classical wavelets, as well as generalized wavelets on both continuous and discrete non-Euclidean structures such as Riemannian manifolds and weighted graphs. Moreover, it allows us to study the relation between continuous and discrete frames in a random sampling regime, where discrete frames can be seen as Monte Carlo estimates of the continuous ones. Pairing spectral regularization with learning theory, we show that a sample frame tends to its population counterpart, and derive explicit finite-sample rates on spaces of Sobolev and Besov regularity. Our results also prove the stability of frames constructed on empirical data, in the sense that all stochastic discretizations have the same underlying limit regardless of the initial set of training samples.


Introduction
Wavelet systems have long been employed in time-frequency analysis and approximation theory to break the uncertainty principle and resolve local singularities against global smoothness. Nonlinear approximation over redundant families of localized waveforms has enabled the construction of efficient sparse representations, becoming common practice in signal processing, source coding, noise reduction, and beyond. Sparse dictionaries are also an important goal in machine learning, where the extraction of a few relevant features can significantly enhance a myriad of learning tasks, allowing them to scale to enormous quantities of data. However, the role of wavelets in machine learning is still unclear, and the impact they have had on signal processing has, so far, not been matched. One objective obstacle to a direct application of classical wavelet techniques to modern data science is geometric in nature: real data are typically high-dimensional and inherently structured, often featuring or hiding non-Euclidean topologies. On the other hand, a representation built on empirical samples poses an additional problem of stability, measured by how well it generalizes to future data. In this paper we introduce a data-driven construction of wavelet frames on non-Euclidean domains, and provide a result of asymptotic stability in high probability.
Starting from Haar's seminal work [22] and since the founding contributions of Grossmann and Morlet [21], a general theory of wavelet transforms and a wealth of specific families of wavelets have rapidly arisen [6,9,16,26,27], first and foremost on R^d, but soon thereafter also on non-Euclidean structures such as manifolds and graphs [7,8,18,19,23]. Generalized wavelets usually consist of frames with a looser or tighter link to ideas from multi-resolution analysis. At the very least, elements of a wavelet frame ought to be associated with locations and scales, decomposing signals into a sum of local features of increasing resolution. On a basic conceptual level, many of these generalized constructions stem from a reinterpretation of the frequency domain as the spectrum of a differential operator. Indeed, wavelets on R are commonly generated by dilating and translating a well-localized function ψ, but, taking the Fourier transform, they can be rewritten as

ψ_{a,b}(x) = ∫ G_a(ξ) e^{−2πıbξ} v_ξ(x) dξ,    (1)

with G_a(ξ) = |a|^{1/2} ψ̂(aξ) and v_ξ(x) = e^{2πıxξ}. This allows us to reinterpret the wavelet ψ_{a,b}(x) as a superposition of Fourier harmonics v_ξ(x), modulated by a spectral filter G_a(ξ). Moreover, each v_ξ can be seen as an eigenfunction of the Laplacian ∆ = −d²/dx². Hence, in principle, we may retrace an analogous construction whenever some notion of Laplacian is at hand. In particular, Riemannian manifolds and weighted graphs are examples of spaces where this is possible, using the Laplace-Beltrami operator or the graph Laplacian. A more detailed overview of related work based on these or similar ideas is postponed to Section 5. Thus far, the study of generalized wavelets on non-Euclidean domains has primarily focused on either the continuous or the discrete setting. It is nonetheless natural to investigate the relationship between the two.
Regarding a graph, for instance, as a sample of a manifold, we may ask whether and in what sense the frame built on the graph tends to the one living on the manifold. In this paper we present a unified framework for the construction and comparison of continuous and discrete frames. Returning for a moment to the real line, let us consider the semigroup e^{−t∆} generated by the Laplacian. This defines an integral operator e^{−t∆}f(x) = ∫ K_t(x, y)f(y) dy, with K_t(x, y) the heat kernel. Such a representation suggests that generalized Fourier analysis, already revisited as spectral analysis of the Laplacian, can be translated in terms of a corresponding integral operator. With the attention shifting from the Laplacian to an integral kernel, the idea is then to recast the above constructions inside a reproducing kernel Hilbert space. Exploiting the reproducing kernel, one can in particular extend a discrete frame beyond the given samples, and thus compare it to its natural continuous counterpart.
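The correspondence between spectral filtering of the Laplacian and integration against the heat kernel can be checked numerically on the circle: applying e^{−t∆} as the Fourier multiplier e^{−tk²} gives the same result as integrating against the periodic heat kernel. The sketch below assumes an illustrative grid size and diffusion time.

```python
import numpy as np

# Signal on the circle, sampled on a uniform grid (illustrative sizes).
n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
f = np.sign(np.sin(3.0 * x))              # a discontinuous test signal
t = 0.05                                  # diffusion time

# Spectral side: e^{-t*Laplacian} acts as the Fourier multiplier e^{-t*k^2}.
k = np.fft.fftfreq(n, d=1.0 / n)          # integer frequencies on the circle
f_spec = np.real(np.fft.ifft(np.exp(-t * k**2) * np.fft.fft(f)))

# Kernel side: K_t(x) = (1/2pi) * sum_m e^{-t*m^2} e^{imx}; the semigroup is
# the integral operator f -> int K_t(x - y) f(y) dy, i.e. a circular convolution.
K_t = np.real(np.fft.ifft(np.exp(-t * k**2))) * n / (2.0 * np.pi)
f_ker = (2.0 * np.pi / n) * np.real(np.fft.ifft(np.fft.fft(K_t) * np.fft.fft(f)))

print(np.max(np.abs(f_spec - f_ker)))     # the two computations agree
```

The agreement is exact up to rounding, since on the grid both computations reduce to the same Fourier multiplier.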
Our construction yields empirical frames Ψ N on sets of N data. We will show that Ψ N converges in high probability to a continuous frame Ψ on a reproducing kernel Hilbert space H as N → ∞, thus providing a proof of its stability in an asymptotic sense. The empirical frames Ψ N can be seen as Monte Carlo estimates of Ψ. Repeated random sampling will in fact produce a sequence of frames Ψ N on an increasing chain of finite dimensional reproducing kernel Hilbert spaces H N , which approximates Ψ on H up to a desired sampling resolution quantifiable by finite sample bounds in high probability.
Overturning this perspective, one may also look at our result as a form of stochastic discretization of continuous frames. Going from the continuum to the discrete is an important problem in frame theory and in applications of coherent states. Given a continuous frame of a Hilbert space, the discretization problem [2, Chapter 17] asks to extract a discrete frame out of it. Originally motivated by the need for numerical implementations of coherent states arising in quantum mechanics [10,32], the problem was then generalized to continuous frames [1] and addressed in several theoretical efforts [14,17,20], until it found a complete yet non-constructive characterization in [15]. Sampling the continuous frame is tantamount to sampling the parameter space on which the frame is indexed. For a wavelet frame, this means selecting a discrete set of scales and locations. While the discretization of the scales can be readily obtained by a dyadic parametrization, the difficult part is usually sampling locations, that is, the domain where the frame is defined. How to do this is known in many cases, and consists in a careful selection of nets of well-covering yet sufficiently separated points. Already delicate in the Euclidean setting, this procedure can be hard to generalize and implement in more general geometries [8]. In this respect, our Monte Carlo frame estimation provides a randomized approach to frame discretization, as opposed to a deterministic sampling design. Clearly, our Monte Carlo estimate does not solve the discretization problem in its original form, since it defines frames only on finite dimensional subspaces. It rather provides an asymptotic approximate solution, computing frames on an invading sequence of subspaces H_N ⊂ H.
We should also remark that, due to covering properties, standard frame discretization always entails a loosening of the frame bounds; hence, in particular, only non-tight frames may be sampled, even when the starting continuous frame is Parseval. As a result, signal reconstruction with respect to the discretized frame will in general require the computation of a dual frame, which is a problem on its own. On the contrary, in our randomized construction we preserve the tightness, albeit at the expense of a (possibly large) loss of resolution power H \ H N .
The remainder of the paper is organized as follows. The general notation used throughout the paper is listed in Table 1. In Section 2 we give an overview of our main results. In Section 3 we introduce the general framework and define the fundamental objects used in our analysis. The focus is on kernels, reproducing kernel Hilbert spaces, and associated integral operators. In Section 4 we present our frame construction based on the spectral calculus of the integral operator. Our theory encompasses continuous and discrete frames within a unified formalism, paving the way for a principled comparison of the two. In particular, interpreting discrete locations as samples from a probability distribution, we propose a Monte Carlo method for the estimation of continuous frames. In Section 5 we compare and contrast our approach with the existing literature. In Section 6 we prove the consistency of our Monte Carlo wavelets and obtain explicit convergence rates under Sobolev regularity of the signals. This is done by combining techniques borrowed from the theory of spectral regularization with concentration of measure bounds. In Section 7 we study the convergence rates in Besov spaces. In Section 8 we draw our conclusions and point at some directions for future work.

Table 1: General notation.
⟨·, ·⟩_H, ‖·‖_H — inner product and norm in a RKHS H
P_S — orthogonal projection onto a closed subspace S
σ(A) — spectrum of a linear operator A
supp(ρ) — support of a measure ρ
‖·‖ — operator norm
⟨·, ·⟩_ρ, ‖·‖_ρ — inner product and norm in L²(X; ρ)

Overview of the main result
To illustrate our construction in a simple form, let us consider a graph with vertices X_N = {x_1, ..., x_N} and pairwise similarities encoded in a positive definite matrix K ∈ C^{N×N}. The space of signals on the graph X_N is then isomorphic to C^N. Computing the eigenvalues λ_i and eigenvectors u_i of the matrix K, we can define, in analogy with (1), the graph wavelets (2) for a suitable spectral filter F_j(λ). We will show (Proposition 4.7) that (2) is a Parseval frame on the space of graph signals. Now, suppose X_N is embedded in a compact Riemannian manifold X. Furthermore, suppose X admits a positive definite kernel K : X × X → C such that K[i, ℓ] = K(x_i, x_ℓ)/N, and let H := span{K(·, x) : x ∈ X} denote the reproducing kernel Hilbert space associated with K. The kernel allows us to define the out-of-sample extension (3) of a signal u ∈ C^N to H, where S : H → C^N is the sampling operator Sf[ℓ] = f(x_ℓ). By means of (3), we thus define the family of functions (4). This family is a Parseval frame for the finite-dimensional reproducing kernel Hilbert space H_N = span{K(·, x) : x ∈ X_N} ⊂ H, and it is isomorphic to the frame (2) (Proposition 4.7). Notice though that, despite being isomorphic to a frame defined only on X_N, the frame functions (4) are well defined on the entire manifold X. In particular, for any signal f in the reproducing kernel Hilbert space H, we can study the wavelet expansion (5). This series approximates f up to a resolution τ and a sampling rate N. Our main result (Theorem 6.6) states that, cutting off the frequencies at a threshold τ = τ(N) and letting N go to infinity, the error of (5) goes to zero, at a rate that depends on the regularity of the signal f. In other words, the frame constructed on the graph asymptotically resolves the signal defined on the manifold. This result is derived as a finite-sample bound in high probability.
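As a numerical sanity check of the graph-side construction, the following Python sketch builds squared filters by telescoping Tikhonov-type spectral functions (from Table 2) on the spectrum of a hypothetical Gaussian similarity matrix, and verifies that the total coefficient energy of a random signal matches its norm, i.e. that the frame is numerically Parseval. The kernel, bandwidth, and ridge term are illustrative choices, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
# Hypothetical graph: vertices are random points in [0, 1]; similarities are a
# Gaussian kernel (plus a small ridge for numerical conditioning), scaled by 1/N.
x = np.sort(rng.uniform(0.0, 1.0, N))
K = (np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.02) + 0.1 * np.eye(N)) / N
lam, U = np.linalg.eigh(K)               # eigenvalues lam_i and eigenvectors u_i

# Tikhonov-type spectral functions g_j(lam) = 1/(lam + 2^{-j}) and squared
# filters obtained by telescoping: F_0^2 = lam*g_0, F_j^2 = lam*(g_j - g_{j-1}),
# so that sum_{j<=tau} F_j^2(lam) = lam*g_tau(lam), which tends to 1.
g = lambda j: 1.0 / (lam + 2.0 ** (-j))
tau = 40
F2 = np.stack([lam * g(0)] + [lam * (g(j) - g(j - 1)) for j in range(1, tau + 1)])

# Frame coefficients of a signal f: <psi_{j,k}, f> = sum_i F_j(lam_i) u_i[k] <u_i, f>,
# so the total coefficient energy is sum_{i,j} F_j(lam_i)^2 <u_i, f>^2.
f = rng.standard_normal(N)
fh = U.T @ f
energy = float(np.sum(F2 * fh ** 2))
print(energy, float(f @ f))              # near-Parseval: the two values agree
```

With the cutoff τ = 40 the partition of unity holds up to 2^{-τ}/λ_min, which is negligible here.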

RKHS and integral operators
We now prepare the technical ground on which our results will be built (see also [31]). Let X be a locally compact, second countable topological space endowed with a Borel probability measure ρ. Given a continuous, positive semi-definite kernel K : X × X → C, we denote the associated reproducing kernel Hilbert space (RKHS) by H, where K_x := K(·, x) ∈ H, and the closure is taken with respect to the inner product ⟨K_x, K_y⟩_H := K(x, y).
Elements of H are continuous functions satisfying the following reproducing property: The space H is separable, since X is separable. We further assume that K is bounded on X, and denote κ² := sup_{x∈X} K(x, x), which implies that H is continuously embedded into the space of bounded continuous functions on X. We define the (non-centered) covariance operator T : H → H by Tf = ∫_X ⟨f, K_x⟩_H K_x dρ(x), where the integral converges strongly. The operator T is positive and trace-class (therefore compact), with σ(T) ⊂ [0, κ²]. Hence, the spectral theorem ensures the existence of a countable orthonormal set {v_i}_{i∈I_ρ∪I_0} ⊂ H and a sequence (λ_i)_{i∈I_ρ} ⊂ (0, κ²] such that Let L²(X; ρ) be the space of square-integrable functions on X with respect to the measure ρ, and denote X_ρ := supp(ρ). We define the integral operator L_K : L²(X; ρ) → L²(X; ρ) by The spaces H and L²(X; ρ) and the operators T and L_K are related through the inclusion operator S : The adjoint operator S* : L²(X; ρ) → H acts as the strongly converging integral We have T = S*S and L_K = SS*. Hence, σ(T)\{0} = σ(L_K)\{0}, and the eigenfunctions correspond accordingly. Mercer's theorem gives a series expansion of the kernel, which converges absolutely and uniformly on compact subsets. Defining H_ρ accordingly, we can identify H_ρ as a (non-closed) subspace of L²(X; ρ). The closure of H_ρ in L²(X; ρ) is denoted H̄_ρ, and the following decompositions hold: For f ∈ H_ρ, we can relate the norms in H and L²(X; ρ) as In other words, √T induces an isometric isomorphism between H_ρ and H̄_ρ. We define the partial isometry U : As examples of this setting, we may think of X as R^d, or a non-Euclidean domain such as a compact connected Riemannian manifold or a weighted graph. In these cases, we can take K as the heat kernel associated with the proper notion of Laplacian, be it the Laplace-Beltrami operator or the graph Laplacian.
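The relations T = S*S and L_K = SS*, and the equality of their nonzero spectra, can be illustrated with a toy finite-dimensional RKHS. The feature map below is purely hypothetical; with K(x, y) = φ(x)·φ(y) we have H ≅ R^d, and the empirical versions of T and L_K become ordinary matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 6, 200
A = rng.standard_normal((d, d))
phi = lambda x: np.cos(A @ x)            # hypothetical feature map: K(x,y) = phi(x).phi(y)

X = rng.uniform(-1.0, 1.0, (N, d))
Phi = np.stack([phi(x) for x in X])      # N x d, row i is the feature vector of x_i

T_hat = Phi.T @ Phi / N                  # empirical covariance S*S on H = R^d
K_N = Phi @ Phi.T / N                    # kernel matrix K(x_i, x_l)/N, i.e. SS* on C^N

# S*S and SS* share their nonzero spectrum.
eT = np.sort(np.linalg.eigvalsh(T_hat))[::-1]
eK = np.sort(np.linalg.eigvalsh(K_N))[::-1]
print(np.max(np.abs(eT - eK[:d])))       # agreement of the top d eigenvalues
```

The remaining N − d eigenvalues of the kernel matrix vanish up to rounding, reflecting the finite rank of the toy RKHS.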

Wavelet frames by reproducing kernels
In this section we build Parseval frames in the RKHS H and in L²(X; ρ). Our construction is centered around eigenfunctions of the integral operator (7) and filters acting on the corresponding eigenvalues. Continuous frames emerged in the mathematical physics community for the study of coherent states, which satisfy an integral resolution of the identity. This naturally leads to the notion of continuous frame, as a generalization of the more common discrete frame [2,16].

Definition 4.1 (frame). Let H be a Hilbert space, A a locally compact space and µ a Radon measure on
In the above definition it is implicitly assumed that the map a ↦ ⟨Ψ_a, f⟩_H is measurable for every f ∈ H. It is important to note that this definition depends on the choice of the measure µ. In the case of a counting measure, we recover the standard definition of a discrete frame.

Filters
To construct our wavelet frames, we first need to define filters, i.e. functions acting on the spectrum of T that satisfy a partition of unity condition.
is called a family of filters.
By the spectral theorem, such filters act on T through its eigendecomposition. An easy way to define filters is by differences of suitable spectral functions.
is called a family of spectral functions.
Given a family of spectral functions {g_j}_{j≥0}, filters {G_j}_{j≥0} can be obtained by setting The filters thus defined give rise to a telescoping sum: Taking the limit as τ → ∞, condition (10) is satisfied thanks to (11). Conversely, starting from a family of filters {G_j}_{j≥0}, we can define spectral functions {g_j}_{j≥0} by (12), which enjoy (11) due to (10). Therefore, the notion of filter and that of spectral function are equivalent, and we will refer to them interchangeably. The definition in (12) allows us to find a wealth of filters by tapping into regularization theory [12]. In the forthcoming analysis, we will use the following notion of qualification.
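As an illustration of the filter/spectral-function equivalence, the following sketch builds telescoping squared filters from the Landweber spectral functions of Table 2 (with an illustrative step size γ and spectrum grid, both assumptions of this example) and checks the partition of unity numerically.

```python
import numpy as np

# Landweber (gradient-descent) spectral functions from Table 2, in closed form:
# g_j(lam) = gamma * sum_{k=0}^{j} (1 - gamma*lam)^k = (1 - (1-gamma*lam)^{j+1})/lam,
# valid for gamma < 1/kappa^2. Step size and spectrum grid are illustrative.
gamma = 0.5
lam = np.linspace(0.05, 1.0, 500)

def g(j):
    return (1.0 - (1.0 - gamma * lam) ** (j + 1)) / lam

# Telescoping squared filters: F_0^2 = lam*g_0 and, for j >= 1,
# F_j^2 = lam*(g_j - g_{j-1}) = gamma*lam*(1 - gamma*lam)^j >= 0 (up to rounding).
tau = 1000
F2 = np.stack([lam * g(0)] + [lam * (g(j) - g(j - 1)) for j in range(1, tau + 1)])

partition = F2.sum(axis=0)               # equals lam*g_tau(lam) -> 1 as tau grows
print(float(F2.min()), float(np.max(np.abs(partition - 1.0))))
```

The slow convergence of the partition of unity at small eigenvalues mirrors the finite-sample trade-off between resolution τ and spectral decay discussed later.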
where the constant C ν does not depend on j.
In the theory of regularization of ill-posed inverse problems [12], the qualification represents the extent to which a regularizer can exploit the regularity of the true solution. In particular, methods with finite qualification suffer from a so-called saturation effect.
Some standard examples of spectral functions, together with their qualifications, are listed in Table 2. Table 2: Spectral regularizers and their qualifications. Landweber iteration and Nesterov acceleration require γ < 1/κ 2 and β ≥ 1. In heavy ball, α j , β j are suitably selected sequences depending on ν, where ν is any positive real (see [29]).
Additional examples of admissible filters widely used in the construction of wavelet frames (see e.g. [8,13]) are given by the following: Then the family {g j } j≥0 satisfies the properties (11). Furthermore, the corresponding filters (12) are localized, meaning that, defining F j (λ) :

Frames
We are now ready to define our wavelet frames. We first form frame elements in H, and then use the partial isometry U : H → L 2 (X ; ρ) to obtain frames in L 2 (X ; ρ).
We define the families of wavelets in (15). Observe that, since ψ_{j,x} and φ_{j,x} are defined for x ∈ X_ρ, we actually have Ψ ⊂ H_ρ ⊂ H, and Φ ⊂ H_ρ ⊂ L²(X; ρ). In particular, the orthogonality of H_ρ and ker S entails ⟨K_x, G_j(T)v_i⟩_H = 0 for all i ∈ I_0. By the reproducing property (6), condition (14) is thus equivalent to Σ_{i∈I_ρ} G_j(λ_i)² |v_i(x)|² < ∞ for all j ≥ 0 and almost every x ∈ X_ρ.
If G j is a bounded function, then G j (T) is a bounded operator, hence D j = H. In this case, which includes the spectral functions listed in Table 2, condition (14) is trivially satisfied.
Using the spectral decomposition of G_j(T) and the reproducing property, we obtain These expressions allow us to interpret Ψ and Φ as families of wavelets, in the sense of (1). We interpret x as the location and j as the scale parameter; the functions K_x localize the signal in space, whereas the filters G_j regularize, or localize, in frequency. Note also the analogy with (8), in light of which (16) may be seen as a filtered Mercer representation.
With the following proposition we show that (15) defines Parseval frames.
Proposition 4.7. Assume the setting in Section 3, and let Ψ, Φ be defined as in Definition 4.6. Then, for any f ∈ H we have (17), and for any F ∈ L²(X; ρ) we have (18). Proof. The equality (18) follows from (17) and the fact that U is unitary from H_ρ onto its closure in L²(X; ρ). To establish (17), in view of Lemma A.1 it suffices to consider functions in the dense subspace D ⊂ H. Thus, let f ∈ D. Since G_j(T) is self-adjoint on D_j, and D ⊂ D_j for all j, we have Summing over j ≥ 0 and using (10), we therefore obtain (17). The frame property can also be expressed as a resolution of the identity. Such a formulation will be particularly useful in Section 6.
Depending on the choice of the measure ρ, Proposition 4.7 gives the frame property in either a continuous or a discrete setting. Namely, consider a discrete set {x_1, ..., x_N}, and let ρ_N be the corresponding empirical measure. With the choice of the discrete measure ρ_N, (7) defines the discrete (non-centered) covariance operator T̂ : H → H by T̂f = (1/N) Σ_{k=1}^{N} ⟨f, K_{x_k}⟩_H K_{x_k}. Furthermore, Definition 4.6 produces the family of wavelets ψ_{j,k} := G_j(T̂)K_{x_k} for j ≥ 0 and k = 1, ..., N, which, by Proposition 4.7, constitutes a discrete Parseval frame on H_N = span{K_{x_k} : k = 1, ..., N}. In Section 6 we will refer back to this construction to define Monte Carlo wavelets, where the points x_1, ..., x_N are drawn at random from X_ρ.
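The discrete construction can be tested numerically. The sketch below uses a hypothetical feature map (so that H ≅ R^d and T̂ is a matrix), forms the wavelets ψ_{j,k} = G_j(T̂)K_{x_k} with Tikhonov-type filters, and verifies the Parseval identity with respect to the empirical measure ρ_N; all concrete choices (features, dimensions, cutoff) are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 5, 40
A = rng.standard_normal((d, d))
phi = lambda x: np.cos(A @ x)            # hypothetical feature map, K_x = phi(x), H = R^d
X = rng.uniform(-1.0, 1.0, (N, d))
Phi = np.stack([phi(x) for x in X])      # row k is K_{x_k}

T_hat = Phi.T @ Phi / N                  # discrete covariance operator
lam, V = np.linalg.eigh(T_hat)

# Squared filters G_j^2 = g_j - g_{j-1} with Tikhonov g_j(lam) = 1/(lam + 2^{-j}),
# so that sum_{j<=tau} lam*G_j(lam)^2 = lam*g_tau(lam) ~ 1 for large tau.
g = lambda j: 1.0 / (lam + 2.0 ** (-j))
tau = 60
G2 = np.stack([g(0)] + [g(j) - g(j - 1) for j in range(1, tau + 1)])

# Wavelets psi_{j,k} = G_j(T_hat) K_{x_k}; check the Parseval identity
# sum_j (1/N) sum_k |<psi_{j,k}, f>|^2 = ||f||^2 for f in H_N.
f = T_hat @ rng.standard_normal(d)       # a signal in H_N = range(T_hat)
fh = V.T @ f
energy = 0.0
for j in range(tau + 1):
    Gj_f = V @ (np.sqrt(np.maximum(G2[j], 0.0)) * fh)   # G_j(T_hat) f
    c_j = Phi @ Gj_f                                    # coefficients <psi_{j,k}, f>
    energy += float(c_j @ c_j) / N                      # weight 1/N from rho_N
print(energy, float(f @ f))
```

Note the 1/N weight: it is the empirical measure ρ_N, not a plain sum, that makes the discrete frame Parseval.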

Two generalizations
We discuss here two generalizations of the framework presented in Section 4.2. First, one may readily consider more general scale parameterizations. Namely, let Ω be a locally compact, second countable topological space, endowed with a measure µ defined on the Borel σ-algebra of Ω, finite on compact subsets, and such that supp µ = Ω. Adjusting the definitions accordingly, for instance replacing the sums over all non-negative integers j in (10) and (17) with integrals over Ω with respect to µ, the proof of Proposition 4.7 follows along the same steps. In this context, Definition 4.2 can be seen as the special case where Ω is countable and µ is the counting measure. Second, the assumption that the kernel K is bounded, which implies that L_K admits an orthonormal basis of eigenvectors, is not necessary for our construction of Parseval frames. Indeed, it is enough to assume that This implies that H is a subspace of L²(X; ρ) and the inclusion operator S is bounded. The integral (7) now converges in the weak operator topology, and the covariance operator T is positive and bounded. Thus, the Riesz-Markov theorem entails that, for all f ∈ H, there is a unique finite measure ν_f on [0, +∞) such that ν_f([0, +∞)) = ‖f‖²_H, where now D_j is defined by the corresponding integrability condition on f ∈ H. Assume further that D_∞ is a dense subset of H. Assumption (14) and Definition 4.6 remain valid. Moreover, the proof of Proposition 4.7 remains essentially unchanged. The only difference is in the following chain of equalities: for a given f ∈ D_∞, we have where the second equality is due to Tonelli's theorem.

Comparison with other frame constructions
The approach we adopt in Section 4 differs from the existing literature in several crucial aspects. We now give an overview of similarities and differences. As argued in Section 1, many techniques for the analysis of signals on non-Euclidean domains, such as graphs and manifolds, are based on spectral filtering of some suitable operator. There are, generally speaking, two distinct yet related perspectives.
Starting from a discrete setting, in graph signal processing one considers a weight (or adjacency) matrix to define a certain graph operator L, such as the graph Laplacian [23] or the diffusion operator [7]. The frame elements are then defined in the spectral domain as ψ_{j,x} := g(jL)δ_x, where g is an admissible wavelet kernel, j a scale parameter, and δ_x the indicator function of a vertex x. This is conceptually similar to (15), although there are several distinctions. First, following [19], our construction results in Parseval frames. This simplifies the computational effort, since Parseval frames are canonically self-dual, and thus signal reconstruction does not require the computation of a dual frame. Moreover, to localize the frame in space we use the continuous kernel function K_x instead of the impulse δ_x. Since in our setting the kernel K is used both to define the underlying integral operator and to localize the frame elements, we can use the theory of RKHS to establish a connection between continuous and discrete frames, as we will show in Section 6. In typical constructions of frames on graphs, a more delicate analysis is usually required to establish analogous convergence results.
On the other hand, a different line of research has primarily focused on smooth manifolds, or has been inspired by them to extend the analysis of signals to general (continuous) spaces. We distinguish two approaches. First, as in [25,28], one can take an arbitrary orthonormal basis {w_i}_{i≥0} of a separable Hilbert space of functions defined on a quasi-metric measure space, together with a suitable sequence of positive reals (l_i)_{i≥0}, to construct a kernel-like function K_H(x, ·) := Σ_{i≥0} H(l_i) w_i(x) w_i. This mirrors the basis expansion of the frame elements (16), but in our case a specific orthonormal basis is taken, namely the eigenbasis of the integral operator, and (l_i)_{i≥0} are the corresponding eigenvalues. Due to the use of an arbitrary basis and sequence, additional effort (or a set of assumptions) is needed to ensure the desired properties, such as the decay of the approximation error as the number of eigenvalues resolved by the function H increases. Some of the results are similar to those in our paper, although estimation errors or sample bounds have not been established in this context.
A second type of methods builds frames for function spaces on compact differentiable manifolds associated with certain positive operators (predominantly the Laplace-Beltrami operator). In [8,18], filter functions g_j are applied to the given operator L, giving g_j(√L) for j ≥ 0. One then needs to ensure that this defines an integral operator with a corresponding kernel ψ_j(√L)(x, y), which often poses a technical challenge and relies on the relationship between the operator L and local metric properties of the manifold. We avoid this by using a positive definite kernel from the start. The next step is to sample points {x_{j,k}}_{k=1}^{m_j} from the manifold for each scale j, in such a way that they form a δ_j-net and satisfy a cubature rule for functions in the desired space. Frame elements are then defined by C_{j,k} ψ_j(√L)(x_{j,k}, ·), for suitable weights C_{j,k}. The resulting family of functions constitutes a non-tight frame on the entire function space. On the contrary, our sampled frames are Parseval frames on finite-dimensional subspaces. As we show in the next section, in order to establish convergence we do not require a stringent selection of points; instead, we sample at random, which allows for a straightforward algorithmic approach, independent of the specific geometry of the underlying space.

Monte Carlo wavelets
In this section we study the relationship between continuous and discrete frames, regarding the latter as Monte Carlo estimates of the former. We begin by restricting our attention to H, and we will then extend the analysis to L 2 (X ; ρ). In the following, we adopt notations, definitions and assumptions of Sections 3 and 4. For the sake of simplicity, we further assume supp(ρ) = X , so that H ρ = H. By Proposition 4.7, the family Ψ defined in (15) describes a Parseval frame on the entire Hilbert space H.
We thus interpret Ψ N as a Monte Carlo estimate of Ψ. In this view, we are interested in studying the asymptotic behavior of Ψ N as N → ∞, and, in particular, the convergence of Ψ N to Ψ.

Convergence in H

Let T_j and T̂_j be the frame operators associated with the scale j for Ψ and its empirical counterpart Ψ_N, respectively. By Proposition 4.8, we have For f ∈ H, given a threshold scale τ ∈ N and a sample size N, we let f_{τ,N} be the empirical approximation (24) of f using the first τ scales of the frame Ψ_N. The reconstruction error of f_{τ,N} can be decomposed into The first term is the approximation error, arising from the truncation of the resolution of the identity. The second term is the estimation error, which stems from estimating the measure by means of empirical samples. Next, we derive quantitative error bounds for both terms, and then balance the resolution τ against the sample size N to obtain our convergence result.
Approximation error. Note that Proposition 4.7 already implies that the approximation error vanishes as τ → ∞, as it is the tail of a convergent series. To quantify the speed of convergence with respect to τ, approximation theory suggests that f must obey some notion of regularity. In the following we assume a smoothness of Sobolev kind (see [13] and Section 7), also known in statistical learning theory as the source condition (see [5]): f = T^α h for some h ∈ H and α > 0. Proposition 6.2. Assume that g_j has qualification ν ∈ (0, ∞] and f ∈ range(T^α) for some α > 0. Let β := min{ν, α}. Then Proof. By (22) we have Σ_{j>τ} T_j = Id_H − T g_τ(T). Hence, Estimation error. To bound the second term in (25), we rely on concentration results for covariance operators [31].
Then, for every f ∈ H and t > 0, with probability at least 1 − 2e^{−t} we have Proof. Using (22) and Lemma A. Proof. For the first four spectral functions of Table 2, the claim follows by bounding the explicit derivative of λ ↦ λg_τ(λ); for the last two, from an application of the Markov brothers' inequality (see [29, Supplemental, Lemma 1]). For the filters of Example 4.5, we differentiate λ ↦ g(2^τ λ) and use |g′| ≤ B.
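The concentration phenomenon invoked here can be observed directly. In the toy model below (an assumption of this example), the features are cos(kx) with x uniform on [0, 2π), so the population covariance is known in closed form (T = I/2) and the Hilbert-Schmidt error of the empirical covariance can be tracked as N grows.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
ks = np.arange(1, d + 1)
phi = lambda x: np.cos(np.outer(ks, x))  # features cos(kx), k = 1..d (illustrative)

# For x ~ Uniform[0, 2pi), E[cos(jx)cos(kx)] = delta_{jk}/2, so T = I/2 exactly.
T = 0.5 * np.eye(d)

errs = []
for N in (100, 1000, 10000):
    x = rng.uniform(0.0, 2.0 * np.pi, N)
    P = phi(x)                           # d x N matrix of sampled features
    T_hat = P @ P.T / N                  # empirical covariance
    errs.append(np.linalg.norm(T_hat - T))   # Frobenius = Hilbert-Schmidt norm
    print(N, errs[-1])                   # error decays roughly like 1/sqrt(N)
```

This is exactly the quantity ‖T − T̂‖_HS controlled by the concentration bounds of [31], here visible at the matrix level.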
Remark 6.5. In this paper we are not interested in optimizing constants. We rely on the Hilbert-Schmidt norm since it provides both a simple bound on ‖T − T̂‖_HS and, by the Lipschitz assumption, the stability bound ‖Tg_τ(T) − T̂g_τ(T̂)‖_HS ≤ L(τ) ‖T − T̂‖_HS. Our result can be improved by using the sharper bound where r(T) = trace(T)/‖T‖ (see Theorem 9 in [24] and the techniques in the proof of Theorem 3.4 in [4] to bound ‖Tg_τ(T) − T̂g_τ(T̂)‖).
Reconstruction error and convergence. Combining Propositions 6.2 and 6.3, we can finally prove the convergence of our Monte Carlo wavelets. In order to balance approximation and estimation error, we need to tune the resolution τ with the number of samples N and the smoothness α of the signal, in so far as the qualification ν of the filter allows.
If supp ρ ≠ X, we have instead a frame on H_ρ, and the corresponding resolution of the identity Id_{H_ρ} = Σ_{j≥0} T_j. The reconstruction error would thus include an additional bias term: Classical spectral functions from Table 2 satisfy the assumptions of Theorem 6.6. We report the explicit rates in Table 3. A convergence result for filters of Example 4.5 will be provided at the end of Section 7.
Combining everything, we obtain the following bound in L²(X; ρ).
Then, for every t > 0, with probability at least 1 − 2e −t we have See Table 3 for specific rates regarding spectral functions from Table 2.

Monte Carlo wavelet approximation as noiseless kernel ridge regression
We conclude this section with an observation linking Monte Carlo wavelets to the regression problem. Let f_{τ,N} be the Monte Carlo wavelet approximation (24) of f ∈ H at resolution τ given samples x_1, ..., x_N. Then With the choice of the Tikhonov filter g_j(λ) = (λ + τ^{−1})^{−1} (Table 2), and defining This is the (unique) solution to the kernel regularized least squares problem where y_i = y[i] and λ = τ^{−1}. Therefore, f_{τ,N} is the kernel ridge estimator for the noiseless regression problem, and the squared reconstruction error ‖f − f_{τ,N}‖²_ρ is the generalization error of f_{τ,N}. Contrasting this with the minimax-optimal rate for kernel ridge regression [5] shows that the rate in Table 3 is suboptimal for Tikhonov regularization, and presumably for all other regularizers. This is to be expected from the crude Lipschitz bound used in Proposition 6.3. The scope of the present work is to establish a first convergence result for randomly sampled frames, rather than to identify optimal convergence rates. Refining our bounds will be the object of future investigation (see also Remark 6.5).
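The identification with kernel ridge regression is easy to check numerically. In the sketch below (hypothetical feature map; dyadic Tikhonov spectral functions with regularization μ = 2^{−τ}, an illustrative variant of the text's g_j(λ) = (λ + τ^{−1})^{−1}), the frame reconstruction, which telescopes to T̂ g_τ(T̂)f, is compared with the ridge estimator computed from the kernel matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
d, N, tau = 5, 60, 8
A = rng.standard_normal((d, d))
phi = lambda x: np.cos(A @ x)            # hypothetical feature map, H = R^d
X = rng.uniform(-1.0, 1.0, (N, d))
Phi = np.stack([phi(x) for x in X])
T_hat = Phi.T @ Phi / N
lam, V = np.linalg.eigh(T_hat)

f = rng.standard_normal(d)               # target signal in H
y = Phi @ f                              # noiseless samples y_i = f(x_i)

# Frame reconstruction with Tikhonov spectral functions g_j(lam) = 1/(lam + 2^{-j}):
# f_{tau,N} = sum_{j<=tau} T_hat G_j(T_hat)^2 f, telescoping to T_hat g_tau(T_hat) f.
g = lambda j: 1.0 / (lam + 2.0 ** (-j))
G2 = np.stack([g(0)] + [g(j) - g(j - 1) for j in range(1, tau + 1)])
f_frame = V @ (lam * G2.sum(axis=0) * (V.T @ f))

# Kernel ridge regression on (x_i, y_i) with regularization mu = 2^{-tau}:
# f_ridge = Phi^T (Phi Phi^T + N*mu*I)^{-1} y = (T_hat + mu)^{-1} T_hat f.
mu = 2.0 ** (-tau)
alpha = np.linalg.solve(Phi @ Phi.T + N * mu * np.eye(N), y)
f_ridge = Phi.T @ alpha

print(float(np.max(np.abs(f_frame - f_ridge))))   # the two estimators coincide
```

The agreement holds up to rounding, since both expressions equal (T̂ + μ)^{−1} T̂ f.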

Sobolev and Besov spaces in RKHS
The convergence rates of the frame reconstruction error in Theorem 6.6 depend on the approximation rates in Proposition 6.2, hence on the regularity of the original signal f, as quantified by the condition f ∈ range(T^α). Thinking of T as the inverse square root of the Laplacian allows us to interpret range(T^α) as a Sobolev space. The theory of smoothness function spaces [33] plays a critical role in harmonic analysis, and also serves as a basis for the definition of statistical priors in learning theory [3]. In this section we examine general notions of regularity and their effect on the reconstruction error. Many of the reported results on Besov spaces are well known [13], but we nonetheless include them here to be self-contained and to adapt them to our setting and notation. As in the previous section, we assume supp(ρ) = X.
Sobolev spaces as domains of powers of a positive operator. By virtue of the spectral theorem, for every α > 0, T^α is a positive, bounded, injective operator on H, with σ(T^α) ⊂ (0, κ^{2α}]. Thus, T^{−α} is a positive, closed, densely defined, injective operator with σ(T^{−α}) ⊂ [κ^{−2α}, ∞). We introduce the following definition. The space H_α is a Hilbert space. Moreover, we have a characterization which expresses H_α in terms of the speed of decay of the Fourier coefficients, thus generalizing the standard Sobolev spaces H_α = W^{α,2}. Theorem 6.6 establishes the convergence of Monte Carlo wavelets for signals in the class H_α.
Besov spaces as approximation spaces. Besov spaces on Euclidean domains are traditionally defined by the decay of the modulus of continuity. A characterization that is best suited to generalize to arbitrary domains, and to which we also adhere, is through approximation and interpolation spaces [13,30,33]. We begin with the approximation perspective by defining a scale of Paley-Wiener spaces.
The associated approximation error for f ∈ H is

E(f, ω) := inf{ ‖f − g‖_H : g ∈ PW(ω) }.

The Besov space B^s_q is equipped with the norm

‖f‖_{B^s_q} := ( ∫_1^∞ (ω^s E(f, ω))^q dω/ω )^{1/q}.    (26)

The space B^s_∞ is defined with the usual adjustment. Discretizing the integral in (26), we obtain the equivalent norm

( Σ_{j≥0} (2^{js} E(f, 2^j))^q )^{1/q}.    (27)

In particular, a function f belongs to B^s_q if and only if the sequence (2^{js} E(f, 2^j))_{j≥0} belongs to ℓ^q. It is easy to see that the scale of spaces B^s_q obeys the following lexicographical order [30, Proposition 3]: B^s_q ⊂ B^s_p for q < p.

Besov spaces as interpolation spaces. The Sobolev space H^α is continuously embedded into B^s_q for every α > s. Indeed, for f ∈ H^α we have the Jackson-type inequality E(f, ω) ≤ ω^{−α} ‖f‖_{H^α}. Furthermore, B^s_q interpolates between H^α and H.

Definition 7.4 (interpolation spaces). For quasi-normed spaces E and F, θ ∈ (0, 1) and q ∈ (0, ∞), the quasi-normed interpolation space (E, F)_{θ,q} is defined by

(E, F)_{θ,q} := { f ∈ E : ‖f‖_{θ,q} := ( ∫_0^∞ (t^{−θ} K(t, f))^q dt/t )^{1/q} < ∞ },    K(t, f) := inf_{g ∈ F} ( ‖f − g‖_E + t ‖g‖_F ).

The space (E, F)_{θ,∞} is defined with the usual adjustment.
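The discretized norm (27) is straightforward to compute in a spectral toy model. The sketch below assumes, for illustration only, that PW(ω) is spanned by the eigenfunctions of T with eigenvalue at least ω^{−1}, so that E(f, 2^j) is an ℓ²-tail of Fourier coefficients; it then checks the norm ordering consistent with the embedding B^s_q ⊂ B^s_p for q < p.

```python
import numpy as np

rng = np.random.default_rng(2)

# spectral toy model: eigenvalues sigma_i of T and Fourier coefficients of f
sigma = np.sort(rng.uniform(1e-4, 1.0, 200))[::-1]
coeffs = sigma ** 1.2 * rng.standard_normal(200)  # a moderately smooth signal

# illustrative assumption: PW(omega) = span{e_i : sigma_i >= 1/omega},
# so E(f, 2^j) is the l2 norm of the coefficients with sigma_i < 2^{-j}
def approx_error(j):
    tail = coeffs[sigma < 2.0 ** (-j)]
    return np.sqrt((tail ** 2).sum())

s, J = 0.5, 12
seq = np.array([2.0 ** (j * s) * approx_error(j) for j in range(J)])

def besov_seq_norm(q):
    return (np.abs(seq) ** q).sum() ** (1.0 / q)

# l^q norms decrease in q, consistent with B^s_q ⊂ B^s_p for q < p
assert besov_seq_norm(1.0) >= besov_seq_norm(2.0) >= besov_seq_norm(4.0)
```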
Standard interpolation theory [13,33] gives

B^s_q = (H, H^α)_{θ,q},   with s = θα, θ ∈ (0, 1).

In the next proposition we show that, as in the Euclidean setting, the Besov space B^s_2 coincides with the Sobolev space H^s of the same order. As in the classical setting, this is particular to the case q = 2. This is probably a known fact, but we could find neither a proof nor a statement of it in the literature.
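A heuristic computation suggests why q = 2 is special. Assuming, for illustration, that PW(ω) is spanned by the eigenfunctions e_i of T with σ_i ≥ ω^{−1}, so that E(f, 2^j) is a spectral tail, the discretized norm (27) with q = 2 reduces to the Sobolev norm through a geometric series (≃ denotes equivalence up to constants):

```latex
\|f\|_{B^s_2}^2 \simeq \sum_{j \ge 0} 2^{2js}\, E(f, 2^j)^2
  = \sum_{j \ge 0} 2^{2js} \sum_{i \,:\, \sigma_i < 2^{-j}} |\langle f, e_i\rangle|^2
  = \sum_i |\langle f, e_i\rangle|^2 \sum_{j \,:\, 2^{-j} > \sigma_i} 2^{2js}
  \simeq \sum_i \sigma_i^{-2s} |\langle f, e_i\rangle|^2 = \|f\|_{H^s}^2 .
```

For q ≠ 2 the inner ℓ^q sum does not collapse into a weighted ℓ² norm of the coefficients, which is why only B^s_2 can coincide with a Sobolev space.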
Besov spaces by wavelet coefficients. The Besov norm can also be expressed by means of wavelet coefficients.
Assume the filters are of the form

F_j(λ)² = G_j(λ) − G_{j−1}(λ),   j ≥ 0,   G_{−1} := 0,

where G_j is a filter as in Definition 4.2. The partition of unity (10) becomes

Σ_{j≥0} F_j(λ)² = 1,   λ ∈ σ(T).

Moreover, in view of (19), for a frame Ψ as in Definition 4.6 we have ⟨f, ψ_{j,·}⟩_{L²(X;ρ)} = F_j(T)f in H, and the frame property (17) can be rewritten as

Σ_{j≥0} ‖F_j(T)f‖²_H = ‖f‖²_H.

If we further assume the localization property (cf. Example 4.5)

supp(F_j) ⊆ [2^{−(j+1)}, 2^{−j}],    (36)

then a weighted ℓ^q-norm of the sequence (‖F_j(T)f‖_H)_{j≥0} gives an equivalent characterization of the space B^s_q.
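The frame property (17), rewritten as Σ_j ‖F_j(T)f‖²_H = ‖f‖²_H, can be verified numerically for band-limited filters. The sketch below uses Haar-like indicator filters on dyadic spectral bands — an illustrative choice in the spirit of Example 4.5, satisfying both a partition of unity and local spectral support.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy positive operator with spectrum inside (0, 1]
A = rng.standard_normal((30, 30))
T = A @ A.T
T /= 1.1 * np.linalg.norm(T, 2)  # margin keeps the top eigenvalue below 1
sigma, E = np.linalg.eigh(T)

# Haar-like band filters: F_j(t)^2 = 1 on (2^{-(j+1)}, 2^{-j}], 0 elsewhere;
# the bands are disjoint and cover (2^{-J}, 1], giving a partition of unity
J = 60
def F(j, t):
    return ((t > 2.0 ** -(j + 1)) & (t <= 2.0 ** -j)).astype(float)

f = rng.standard_normal(30)

# sum of squared coefficient norms: sum_j ||F_j(T) f||^2
coeffs_energy = sum(
    np.linalg.norm(E @ (F(j, sigma) * (E.T @ f))) ** 2 for j in range(J)
)

# tight (Parseval) frame identity
assert np.allclose(coeffs_energy, np.linalg.norm(f) ** 2)
```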
Proof. We upper and lower bound the discretized norm in (27). Using (35) (which holds thanks to (34)) and (36), we have

E(f, 2^j)^q ≤ Σ_{k≥j} ‖F_k(T)f‖^q_H.

Thus, by the discrete Hardy inequality (Lemma A.3), we get

Σ_{j≥0} (2^{js} E(f, 2^j))^q ≤ (2^{sq} / (2^{sq} − 1)) Σ_{j≥0} (2^{js} ‖F_j(T)f‖_H)^q.

Conversely, F_j(T)g = 0 for every g ∈ PW(2^j), and therefore

‖F_j(T)f‖_H = inf_{g ∈ PW(2^j)} ‖F_j(T)(f − g)‖_H ≤ E(f, 2^j),

which gives the reverse bound.

Convergence of spectrally-localized Monte Carlo wavelets. Proposition 7.6 can be used to obtain approximation bounds for frames built with filters satisfying the localization property (36).
Proposition 7.7. Under the conditions of Proposition 7.6, for every f ∈ B^s_q and ε ∈ (0, s), we have

Proof. By Proposition 7.6, we have

If q > 2, then B^s_q ⊂ B^{s−ε}_2 for every ε ∈ (0, s), thanks to (28), and the claim follows.
Putting together Proposition 7.7 and Proposition 6.3 yields a convergence result for Monte Carlo wavelets with localized filters.
Then, for every t > 0, with probability at least 1 − 2e^{−t} we have

Compared to Theorem 6.6, Theorem 7.8 requires the resolution τ to grow only logarithmically with the sample size N. Note that the conditions of Theorem 7.8 exclude the spectral functions of Table 2, since they do not satisfy (36). Admissible filters are instead given by Example 4.5: these have local support (36) but exponentially growing Lipschitz constants.

Concluding remarks and future directions
We presented a new construction of tight frames, extending wavelets to general domains, based on the spectral filtering of a reproducing kernel. Depending on the measure considered, our construction leads to continuous or discrete frames, covering non-Euclidean structures such as Riemannian manifolds and weighted graphs. Besides the standard frequency-localized filters commonly used in wavelet frames, we defined admissible spectral filters by resorting to methods from regularization theory, such as Tikhonov regularization and Landweber iteration. Regarding discrete measures as empirical measures arising from independent realizations of a continuous density, we interpreted discrete frames as Monte Carlo estimates of continuous frames. We proved that the Monte Carlo frame converges to the corresponding deterministic continuous frame, and provided finite-sample bounds holding with high probability, with rates depending on the Sobolev or Besov regularity of the reconstructed signal. This demonstrates the stability of empirical frames built on sampled data.
In future work we intend to study the numerical implementation of our Monte Carlo wavelets, along with possible applications in graph signal processing, regression analysis and denoising. Further theoretical investigations may include extensions to L^p Banach frames, nonlinear approximation rates, refinements of the Lipschitz bounds, and explicit localization properties for specific families of kernels.

A. Appendix
We recall the following result, whose proof can be collected from [16].
Lemma A.1. Let (Ω, µ) be a measure space and H a Hilbert space. Given a weakly measurable mapping ω ↦ Ψ_ω from Ω to H, assume there exist a dense subset D ⊂ H and a constant C > 0 such that, for every f ∈ D,

∫_Ω |⟨f, Ψ_ω⟩_H|² dµ(ω) ≤ C ‖f‖²_H.    (37)

Then (37) holds for every f ∈ H. Furthermore, there exists a positive bounded operator A : H → H such that, for every f, g ∈ H,

⟨Af, g⟩_H = ∫_Ω ⟨f, Ψ_ω⟩_H ⟨Ψ_ω, g⟩_H dµ(ω).

Proof. For f ∈ H, define the measurable mapping ω ↦ ⟨f, Ψ_ω⟩_H.

Proof. Let {e_i}_{i∈I} and {f_j}_{j∈J} be orthonormal bases of H such that Ae_i = λ_i e_i and Bf_j = µ_j f_j.

We include a proof of the discrete Hardy inequality [11, equation 5.2], where we explicitly compute the Hardy constant.

Lemma A.3 (discrete Hardy inequality). Let (a_j)_{j≥0} and (b_j)_{j≥0} be sequences such that |b_j|^q ≤ Σ_{k≥j} |a_k|^q for every j ≥ 0. Then, for every s > 0, we have

Σ_{j≥0} (2^{js} |b_j|)^q ≤ (2^{sq} / (2^{sq} − 1)) Σ_{j≥0} (2^{js} |a_j|)^q,

provided all the sums are finite.
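A quick numerical sanity check of the discrete Hardy inequality, with the hypothesis saturated, i.e. |b_j|^q = Σ_{k≥j} |a_k|^q; the exponents q, s and the sequence (a_j) below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
q, s, n = 1.5, 0.8, 40
a = rng.uniform(0.0, 1.0, n)

# build (b_j) saturating the hypothesis: b_j^q = sum_{k >= j} a_k^q
b = (np.cumsum((a ** q)[::-1])[::-1]) ** (1.0 / q)

j = np.arange(n)
lhs = np.sum((2.0 ** (s * j) * b) ** q)
rhs = np.sum((2.0 ** (s * j) * a) ** q)
C = 2.0 ** (s * q) / (2.0 ** (s * q) - 1.0)  # the explicit Hardy constant

assert lhs <= C * rhs
```

Exchanging the order of summation turns the left-hand side into a geometric series in j, which is where the constant 2^{sq}/(2^{sq} − 1) comes from.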