Kernel embedding of measures and low-rank approximation of integral operators

We describe a natural coisometry from the Hilbert space of all Hilbert-Schmidt operators on a separable reproducing kernel Hilbert space (RKHS) H onto the RKHS G associated with the squared-modulus of the reproducing kernel of H. Through this coisometry, trace-class integral operators defined by general measures and the reproducing kernel of H are isometrically represented as potentials in G, and the quadrature approximation of these operators is equivalent to the approximation of integral functionals on G. We then discuss the extent to which the approximation of potentials in RKHSs with squared-modulus kernels can be regarded as a differentiable surrogate for the characterisation of low-rank approximations of integral operators.


Introduction
Integral operators with positive-semidefinite (PSD) kernels play a central role in the theory of reproducing kernel Hilbert spaces (RKHSs) and their applications; see for instance [4,5,19,20,26]. As an important special case, this class of operators encompasses the PSD matrices.
Under suitable conditions, an integral operator defined by a PSD kernel K and a measure μ can be regarded as a Hilbert-Schmidt (HS) operator L_μ on the RKHS H associated with K; see e.g. [20-22]. Let G be the RKHS for which the squared-modulus kernel |K|² is reproducing. Following [10], when the integral of the diagonal of K with respect to the variation of μ is finite, the HS operator L_μ on H can be isometrically represented as the Riesz representation g_μ ∈ G of the integral functional on G defined by the measure μ̄, the conjugate of μ. The operator L_μ is in this case trace-class, and g_μ is the potential, or kernel embedding, of the measure μ in the RKHS G. In the Hilbert space HS(H) of all HS operators on H, the quadrature approximation of trace-class integral operators with kernel K is hence equivalent to the approximation of integral functionals on G. Considering another measure ν and denoting by B_G the closed unit ball of G, we more specifically have

‖L_μ − L_ν‖_{HS(H)} = sup_{g ∈ B_G} |∫_X g dμ − ∫_X g dν|,

so that the map (μ, ν) → ‖L_μ − L_ν‖_{HS(H)} corresponds to a generalised integral probability metric, or maximum mean discrepancy (see e.g. [2,15,16,24,27]).
We give an overall description of the framework surrounding such an isometric representation, and illustrate that it follows from the definition of a natural coisometry from HS(H) onto G; this coisometry maps self-adjoint operators to real-valued functions, and PSD operators to nonnegative functions (Sect. 2). Under adequate measurability conditions on K, and assuming that the diagonal of K is integrable with respect to |μ|, we show that L_μ always belongs to the initial space of this coisometry, and that its image is g_μ. We then describe the equivalence between the quadrature approximation of integral operators with PSD kernels and the approximation of potentials in RKHSs with squared-modulus kernels (Sect. 3).
For an approximate measure ν, and denoting by H_ν the closure in H of the range of L_|ν| (so that when ν is finitely-supported, H_ν is fully characterised by the support of ν), we next investigate the extent to which the approximation of potentials in G can be used as a differentiable surrogate for the characterisation of approximations of L_μ of the form P_ν L_μ, L_μ P_ν or P_ν L_μ P_ν, with P_ν the orthogonal projection from H onto H_ν (Sect. 4). When the measure μ is nonnegative, the operator L_μ admits the decomposition L_μ = ι*_μ ι_μ, with ι_μ the natural embedding of H in L²(μ). The three operators ι*_μ : L²(μ) → H, ι_μ ι*_μ : L²(μ) → L²(μ), and ι_μ ι*_μ ι_μ : H → L²(μ) can then also be regarded as integral operators defined by the kernel K and the measure μ, and through the partial embedding ι_μ P_ν, a measure ν characterises approximations of each of these operators. We study the properties of these approximations and further illustrate the connections between the low-rank approximation of integral operators with PSD kernels and the approximation of potentials in RKHSs with squared-modulus kernels. We also describe the link between the considered framework and the low-rank approximation of PSD matrices through column sampling (Sect. 5). The presentation ends with a concluding discussion (Sect. 6), and some technical results are gathered in appendix (Appendix A). The approximation schemes considered in this note should be apprehended from the point of view of numerical strategies for discretisation or dimension reduction; in practical applications, approximations will generally be characterised by finitely-supported measures.

Framework, notations and basic properties
By default, all the Hilbert spaces considered in this note are complex; they are otherwise explicitly referred to as real Hilbert spaces; we use a similar convention for vector spaces. Inner products of complex Hilbert spaces are assumed to be linear with respect to their right argument. For z ∈ C, we denote by z̄, |z| and ℜ(z) the conjugate, modulus and real part of z, respectively, and i ∈ C is the imaginary unit. By analogy, for a complex-valued function f on a general set S, we denote by f̄ and |f| the functions obtained by taking pointwise the conjugate and the modulus of f, that is, f̄(s) is the conjugate of f(s), and |f|(s) = |f(s)|, s ∈ S. For two Hilbert spaces H and F, we denote by A* the adjoint of a bounded linear operator A : H → F. The map A is an isometry if A*A = id_H, the identity operator on H, and A is a coisometry if A* is an isometry (and so AA* = id_F). A coisometry A is a surjective partial isometry (that is, AA*A = A), and A*A is then the orthogonal projection from H onto the initial space I(A) of A, with I(A) the orthogonal complement in H of the nullspace of A. We denote by null(A) the nullspace of A, and by range(A) its range. Also, for a subset C of H, we denote by C^⊥_H the orthogonal complement of C in H, and by the superscript H the closure of C in H.

RKHSs and Hilbert-Schmidt operators
Below, we introduce the various Hilbert spaces relevant to our study.
Underlying RKHS. Let H be a separable RKHS of complex-valued functions on a general set X, with reproducing kernel K : X × X → C; see e.g. [1,17]. For t ∈ X, let k_t ∈ H be defined as k_t(x) = K(x, t), x ∈ X. The continuous dual H′ of the vector space H is a Hilbert space, and the Riesz map h → ξ_h is a bijective conjugate-linear isometry from H to H′ (we may notice that ξ_{αh} = ᾱξ_h, α ∈ C). The linear map densely defined as T_{a,b} → a ⊗ ξ_b (see Remark 2.1) is then a bijective isometry from the Hilbert space HS(H) to the tensor Hilbert space H ⊗ H′.

Conjugate RKHS.
Let H̄ be the RKHS of complex-valued functions on X associated with the conjugate kernel K̄, with K̄(x, t) the conjugate of K(x, t), x and t ∈ X. For all h ∈ H, we have h̄ ∈ H̄ (that is, the function h̄ : x → conjugate of h(x) is a vector of H̄), and the map h → h̄ is a bijective conjugate-linear isometry from H to H̄. We have k̄_t(x) = K̄(x, t), the conjugate of k_t(x). We also consider the bijective linear isometry from HS(H) to the tensor Hilbert space H ⊗ H̄, densely defined as T_{a,b} → a ⊗ b̄, a and b ∈ H.

Squared-kernel RKHS. The kernels K and K̄ being PSD, by the Schur-product theorem, so is the squared-modulus kernel |K|², given by |K|²(x, t) = K(x, t)K̄(x, t) = |K(x, t)|².

Remark 2.3
Let G be the RKHS of complex-valued functions on X for which |K|² is reproducing (G = H H̄ is the product of the two RKHSs H and H̄; see e.g. [1,17]). Following [17, Chapter 5], we denote by C : H ⊗ H̄ → G the coisometry densely defined as C(a ⊗ b̄) = ab̄, a and b ∈ H, where ab̄ ∈ G is the complex-valued function on X given by (ab̄)(x) = a(x) times the conjugate of b(x). For ϒ ∈ H ⊗ H̄, we more generally have C(ϒ)(x) = ⟨k_x ⊗ k̄_x | ϒ⟩_{H ⊗ H̄}, x ∈ X. The initial space of C is the closure in H ⊗ H̄ of the linear space spanned by the simple tensors k_x ⊗ k̄_x, x ∈ X.

Remark 2.4 From (1), for all x ∈ X, we have C*[|k_x|²] = k_x ⊗ k̄_x; the space span_C{|k_x|² | x ∈ X} being dense in G (see e.g. [17, Chapter 2]), we have CC* = id_G, so that C* is an isometry.

Natural coisometry from HS(H) onto G
We can now define a natural coisometry from the Hilbert space HS(H) of all HS operators on a RKHS H onto the RKHS G associated with the squared-modulus of the reproducing kernel of H. The terminology natural is used to emphasise that the considered construction does not depend on the choice of any specific basis.

Lemma 2.1 The linear map from HS(H) to G obtained by composing C with the isometry of Remark 2.2 is a coisometry; its initial space is the closure in HS(H) of the linear space spanned by the operators S_{k_x}, x ∈ X, and, for all T ∈ HS(H), the image of T is the function x → ⟨k_x | T[k_x]⟩_H.

Proof The linear isometry from HS(H) to H ⊗ H̄ being bijective, its composition with the coisometry C is itself a coisometry, and its initial space is the preimage of the initial space of C. By definition of C and of the underlying isometry, the image of T_{a,b} is the function ab̄. The reproducing property in G then gives the stated evaluation formula. We next observe that ⟨k_x | T_{a,b}[k_x]⟩_H = (ab̄)(x); indeed, as T_{a,0} = 0, equality (4) trivially holds for b = 0, and for b ≠ 0, we have ⟨k_x | T_{a,b}[k_x]⟩_H = ⟨b | k_x⟩_H ⟨k_x | a⟩_H = (ab̄)(x). The following diagram summarises the construction (the ≅ symbol refers to the two bijective linear isometries discussed in Remarks 2.2 and 2.3).

Through this coisometry, the HS operators on H belonging to its initial space can be isometrically represented as functions in the RKHS G associated with the squared-modulus kernel |K|². In the framework of Remark 2.1, we may notice that if T = Σ_{i∈I} σ_i T_{u_i,v_i} is an SVD of T ∈ HS(H), then the image of T is the function Σ_{i∈I} σ_i u_i v̄_i.

Lemma 2.2
The following assertions hold: Proof Assertions 1 and 2 follow directly from (3). To prove assertion 3, we assume that T ∈ HS(H) is PSD, and we consider a spectral expansion T = Σ_{j∈I} λ_j S_{ϕ_j} of T, with λ_j ≥ 0, ϕ_j ∈ H and I ⊆ N; observing that the image of S_{ϕ_j} is |ϕ_j|², j ∈ I, we obtain assertion 3. To prove assertion 4, we first observe that if g ∈ G, then ḡ ∈ G (that is, the function ḡ is a vector of G); the map is indeed surjective, and if g is the image of T ∈ HS(H), then ḡ is the image of T*. By linearity, the real and imaginary parts of g are then also vectors of G, and ‖ḡ‖_G = ‖g‖_G (see for instance [17, Chapter 5]; see also Remark 2.6). Since the map is a partial isometry, for T ∈ HS(H), the norm ‖T‖_{HS(H)} dominates the G-norm of the image of T, with equality if and only if T belongs to the initial space; as ‖T‖_{HS(H)} = ‖T*‖_{HS(H)}, the result follows.
Remark 2.5 The diagram (5) is also well-defined when the involved Hilbert spaces are real.

Remark 2.6 The PSD kernel |K|² being real-valued, it is the reproducing kernel of a real RKHS G_R of real-valued functions on X. The decomposition G = G_R + iG_R holds, and G_R is the real-linear subspace of all real-valued functions in G. This decomposition mirrors the decomposition HS(H) = HS_R(H) + iHS_R(H), with HS_R(H) ⊂ HS(H) the real-linear subspace of all self-adjoint HS operators on H. Also, the real convex cone HS⁺_R(H) ⊂ HS_R(H) of all PSD HS operators on H is generating in HS_R(H), and the real convex cone of all nonnegative functions in G_R plays the analogous role in G_R.

Remark 2.7
Let F be another separable RKHS of complex-valued functions on X, with reproducing kernel J : X × X → C. We denote by HS(F, H) the Hilbert space of all HS operators from F to H, and let H F̄ be the product of the RKHSs H and F̄, that is, the RKHS with kernel K J̄. Following (5), we can more generally define a natural coisometry from HS(F, H) onto H F̄.

Trace-class integral operators with PSD kernels
From Lemma 2.1, if T ∈ HS(H) is of the form T = Σ_{j=1}^n ω_j S_{k_{s_j}}, with n ∈ N, s_j ∈ X and ω_j ∈ C, then T belongs to the initial space of the coisometry. We in this case have T[h](x) = Σ_{j=1}^n ω_j K(x, s_j)h(s_j), h ∈ H and x ∈ X, so that T can be regarded as an integral operator on H defined by the kernel K and the finitely-supported measure Σ_{j=1}^n ω_j δ_{s_j}, with δ_x the Dirac measure at x ∈ X. We also have that the image of T in G is the function x → Σ_{j=1}^n ω_j |K(x, s_j)|², which is thus the Riesz representation of the integral functional on G defined by the measure Σ_{j=1}^n ω̄_j δ_{s_j}. Under measurability conditions, this observation holds for all trace-class integral operators on H defined by the reproducing kernel K of H and general measures on X, as illustrated below.
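As an illustration, the action of such a finitely-supported integral operator can be sketched numerically; the Gaussian kernel and all function names below are our own illustrative choices, not part of the framework above.

```python
import numpy as np

def kernel(x, t, ell=1.0):
    # Illustrative PSD kernel on the real line (Gaussian); any PSD
    # kernel K could be substituted.
    return np.exp(-((x - t) ** 2) / (2.0 * ell ** 2))

def apply_operator(h, x, support, weights):
    # (T h)(x) = sum_j w_j K(x, s_j) h(s_j): the integral operator on H
    # defined by K and the finitely-supported measure sum_j w_j delta_{s_j}.
    return sum(w * kernel(x, s) * h(s) for s, w in zip(support, weights))
```

For instance, `apply_operator(lambda t: 1.0, 0.0, [0.0], [2.0])` evaluates T, applied to the constant function equal to 1, at x = 0.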

Integral operators and kernel embedding of measures
Let A be a σ-algebra of subsets of X. We consider the Borel σ-algebra of C, and make the following assumption on K and the measurable space (X, A):

A.1 The kernel K is measurable on X × X, equipped with the product σ-algebra A ⊗ A.

The RKHSs H and G being separable, A.1 ensures that all the functions in H and G are measurable; see for instance [25, Lemma 4.24]. Consequently, under A.1, the maps t → k_t, t → |k_t|² and t → S_{k_t}, t ∈ X, are weakly-measurable, and since the Hilbert spaces H, G and HS(H) are separable, by the Pettis measurability theorem, these maps are also strongly-measurable (see e.g. [8,28]).
We denote by M⁺, M and M_C the sets of all nonnegative, signed and complex measures on (X, A), respectively, and for a measure μ we set τ_μ = ∫_X K(t, t) d|μ|(t), with |μ| the variation of μ. We next introduce the sets T⁺(K), T(K) and T_C(K) of all measures μ in M⁺, M and M_C such that τ_μ is finite, respectively; the inclusions T⁺(K) ⊂ T(K) ⊂ T_C(K) hold, and we write T_F(K) when referring generically to measures with finite τ_μ.

Integral operators on H with kernel K. By assumption, for μ ∈ T_F(K), the integral ∫_X ‖S_{k_t}‖_{HS(H)} d|μ|(t) = τ_μ is finite, and the map t → S_{k_t} is thus Bochner-integrable with respect to μ (Bochner integrability criterion, see e.g. [8,28]; see also Remark 3.1). We set L_μ = ∫_X S_{k_t} dμ(t) ∈ HS(H). From (4), for h ∈ H and x ∈ X, we have L_μ[h](x) = ∫_X K(x, t)h(t) dμ(t), so that L_μ ∈ HS(H) can be regarded as an integral operator on H defined by the kernel K and the measure μ.
By boundedness of the linear evaluation map T → T[h] from HS(H) to H, we obtain L_μ[h] = ∫_X h(t)k_t dμ(t), h ∈ H (6) (see for instance [28, Chapter 5]).

Kernel embedding of measures in G. By assumption again, for μ ∈ T_F(K), the integral ∫_X ‖|k_t|²‖_G d|μ|(t) is finite, and the map t → |k_t|² is therefore Bochner-integrable with respect to μ. We set g_μ = ∫_X |k_t|² dμ̄(t) ∈ G. We have ⟨g_μ | g⟩_G = ∫_X g(t) dμ(t), g ∈ G, so that g_μ is the Riesz representation of the linear functional I_μ : G → C, with I_μ(g) = ∫_X g(t) dμ(t); we may observe that |I_μ(g)| ≤ ‖g‖_G τ_μ, and that g_μ(x) = ∫_X |K(x, t)|² dμ̄(t), x ∈ X. The vector g_μ is referred to as the kernel embedding, or potential, of the measure μ in the RKHS G; see for instance [6,15,24].
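For a finitely-supported real measure μ = Σ_j ω_j δ_{s_j}, the potential reduces to a finite sum, g_μ(x) = Σ_j ω_j |K(x, s_j)|². A minimal numerical sketch, again assuming an illustrative Gaussian kernel (the function names are ours):

```python
import numpy as np

def kernel(x, t, ell=1.0):
    # Illustrative Gaussian PSD kernel on the real line.
    return np.exp(-((x - t) ** 2) / (2.0 * ell ** 2))

def potential(x, support, weights):
    # Kernel embedding g_mu(x) = sum_j w_j |K(x, s_j)|^2 of the
    # finitely-supported real measure mu = sum_j w_j delta_{s_j} in G.
    return sum(w * abs(kernel(x, s)) ** 2 for s, w in zip(support, weights))
```

For a nonnegative measure, the potential is a nonnegative function, in line with Remark 3.3.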
Theorem 3.1 For all μ ∈ T_F(K), the operator L_μ belongs to the initial space of the coisometry of Lemma 2.1, and its image is g_μ.

Proof From Lemma 2.1 and by definition of L_μ and g_μ, for all T ∈ HS(H), we have ⟨T | L_μ⟩_{HS(H)} = ∫_X ⟨T | S_{k_t}⟩_{HS(H)} dμ(t), and the claim follows.

Remark 3.3 Following Lemma 2.2, for a signed measure μ ∈ T(K), the function g_μ is real-valued, and the operator L_μ is self-adjoint. Also, for a nonnegative measure μ ∈ T⁺(K), the function g_μ is nonnegative, and the operator L_μ is PSD. We may notice that L_{δ_x} = S_{k_x}, x ∈ X. From Theorem 3.1, for μ and ν ∈ T_F(K), the following equalities hold:

⟨L_μ | L_ν⟩_{HS(H)} = ⟨g_μ | g_ν⟩_G = ∫_X g_ν(t) dμ(t), (7)

relating the evaluation of inner products in HS(H) between trace-class integral operators with kernel K to the integration of potentials in G.

Quadrature approximation
Let B_G = {g ∈ G | ‖g‖_G ≤ 1} be the closed unit ball of G. We set

M_G(μ, ν) = sup_{g ∈ B_G} |∫_X g dμ − ∫_X g dν|, μ and ν ∈ T_F(K).

The map M_G defines a pseudometric on T_F(K); for probability measures, such pseudometrics are referred to as integral probability metrics, or maximum mean discrepancies; see for instance [15,16,23,24,27].
The following Corollary 3.1 describes the equivalence between the quadrature approximation of trace-class integral operators with PSD kernels and the approximation of integral functionals on RKHSs with squared-modulus kernels.

Corollary 3.1 For all μ and ν ∈ T_F(K), we have ‖L_μ − L_ν‖_{HS(H)} = ‖g_μ − g_ν‖_G = M_G(μ, ν).

Proof From Theorem 3.1 and by linearity of the isometric representation, we have ‖L_μ − L_ν‖_{HS(H)} = ‖g_μ − g_ν‖_G. The CS inequality in G and the definition of g_μ and g_ν then give ‖g_μ − g_ν‖_G = sup_{g ∈ B_G} |⟨g_μ − g_ν | g⟩_G|. We conclude by observing that for all g ∈ G, we have ḡ ∈ G and ‖ḡ‖_G = ‖g‖_G (see the proof of Lemma 2.2), so that ‖g_μ − g_ν‖_G = M_G(μ, ν).
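In the finitely-supported real case, the squared error ‖g_μ − g_ν‖²_G expands through Gram matrices of the squared-modulus kernel, giving a closed form for M_G(μ, ν). A sketch under the same illustrative Gaussian-kernel assumption (function names are ours):

```python
import numpy as np

def sq_gram(A, B, ell=1.0):
    # Gram matrix of the squared-modulus kernel |K|^2 for an
    # illustrative Gaussian base kernel K on the real line.
    D = np.subtract.outer(np.asarray(A, float), np.asarray(B, float))
    return np.exp(-(D ** 2) / (2.0 * ell ** 2)) ** 2

def quadrature_error(sup_mu, w_mu, sup_nu, w_nu):
    # ||L_mu - L_nu||_HS = ||g_mu - g_nu||_G, expanded through the
    # reproducing property of |K|^2 (an MMD between the two measures).
    w_mu, w_nu = np.asarray(w_mu, float), np.asarray(w_nu, float)
    sq = (w_mu @ sq_gram(sup_mu, sup_mu) @ w_mu
          - 2.0 * w_mu @ sq_gram(sup_mu, sup_nu) @ w_nu
          + w_nu @ sq_gram(sup_nu, sup_nu) @ w_nu)
    return float(np.sqrt(max(sq, 0.0)))
```

The error vanishes exactly when the two measures induce the same potential, as in Corollary 3.1.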

Further properties
In this section and in anticipation of the forthcoming developments, we discuss some further properties verified by the integral operators considered in Theorem 3.1.
For μ ∈ T⁺(K), let L²(μ) be the Hilbert space of all square-integrable functions with respect to μ. From the CS inequality in H, we have ∫_X |h(t)|² dμ(t) ≤ ‖h‖²_H ∫_X K(t, t) dμ(t) = τ_μ ‖h‖²_H, h ∈ H, so that the linear embedding ι_μ : H → L²(μ), with ι_μ[h] the equivalence class of all measurable functions μ-almost everywhere equal to h, is bounded (see e.g. [26]).
For ν ∈ T_F(K), we by definition have |ν| ∈ T⁺(K) and ν̄ ∈ T_F(K); from (6), we also have L*_ν = L_ν̄. The following relation (Lemma 3.3) holds between the range of L_ν and the range of L_|ν|.

Measures and projection-based approximations
In this section, we illustrate the extent to which the approximation of potentials in G can be used as a surrogate for the characterisation of closed linear subspaces of H for the approximation of L_μ ∈ HS(H) through projections (see Remark 4.1).

Additional notations and general properties
For a closed linear subspace H S of H, we denote by P S the orthogonal projection from H onto H S .Endowed with the Hilbert structure of H, the vector space H S is a RKHS, and its reproducing kernel K S verifies K S (x, t) = P S [k t ](x), x and t ∈ X .

Remark 4.1 The linear map T → P S T is the orthogonal projection from HS(H) onto R(H S ) = {T ∈ HS(H)| range(T ) ⊆ H S }, the closed linear subspace of HS(H) of all operators with range included in H S . Also, the linear map T → T P S is the orthogonal projection from HS(H) onto Z(H S ) = {T ∈ HS(H)| range(T * ) ⊆ H S }.
The two orthogonal projections T → P_S T and T → T P_S commute, and their composition, that is, the linear map T → P_S T P_S, is the orthogonal projection from HS(H) onto R(H_S) ∩ Z(H_S). As (P_S T)* = T* P_S, the orthogonal projections onto R(H_S) and Z(H_S) are intrinsically related; for this reason, in what follows, we mainly focus on approximations of the form P_S T and P_S T P_S. By orthogonality, for all T ∈ HS(H), we have ‖T − P_S T P_S‖²_{HS(H)} = ‖T − P_S T‖²_{HS(H)} + ‖P_S T − P_S T P_S‖²_{HS(H)}. By boundedness of P_S, for μ ∈ T_F(K), we have P_S L_μ = ∫_X P_S S_{k_t} dμ(t), and so P_S L_μ[h](x) = ∫_X K_S(x, t)h(t) dμ(t), h ∈ H and x ∈ X. The operator P_S L_μ ∈ HS(H) can thus be regarded as an integral operator on H defined by the kernel K_S and the measure μ. Since K_S(t, t) ≤ K(t, t), t ∈ X, we may notice that T_F(K) ⊆ T_F(K_S). For the corresponding approximation errors (10), (11) and (12), see Lemma A.1 in Appendix A for a detailed computation (see also Remark 4.2 for an alternative computation).

Remark 4.2
Let H_U and H_V be two closed linear subspaces of H. For μ ∈ T_F(K), from Theorem 3.1 and the properties of the coisometry, the operator P_V L_μ P_U can be related to the potential g_μ; for general subspaces H_U and H_V, however, the operator P_V L_μ P_U does not necessarily belong to the initial space of the coisometry; see Remark 4.3 for an example where this situation occurs.

Projections defined by measures
Motivated by Lemmas 3.3 and 4.1, for ν ∈ T_F(K), we set H_ν equal to the closure in H of range(L_|ν|), and we denote by P_ν the orthogonal projection from H onto H_ν.
For an initial operator L_μ, with μ ∈ T_F(K), an approximate measure ν ∈ T_F(K) thus defines, through the orthogonal projection P_ν and in addition to L_ν, the approximations P_ν L_μ, L_μ P_ν and P_ν L_μ P_ν of L_μ.

Lemma 4.3 For all μ and ν ∈ T F (K ), we have
Proof Using the notations of Remark 4.1 and applying Lemma 4.2, we obtain (14). In the same way, we obtain (15).

Error maps on sets of measures
In the framework of Sects. 3.2 and 4.2, the characterisation of measures leading to accurate approximations of an initial operator L_μ, μ ∈ T_F(K), relates to the minimisation of error maps measuring the accuracy of the approximations induced by a measure ν ∈ T_F(K).
Quadrature approximation. We define the error map D_μ : T_F(K) → R≥0 as D_μ(ν) = ‖L_μ − L_ν‖_{HS(H)} = ‖g_μ − g_ν‖_G (see Corollary 3.1). For ν and η in a convex subset C of T_F(K), the directional derivative of D_μ at ν along η − ν can be computed in closed form; the convexity of D_μ on C then follows from the convexity of the norm of G, and the expansion of ‖g_μ − g_ν‖_G provides the expected expression for the directional derivatives of D_μ.
Projection-based approximation. We denote by C^P_μ and C^PP_μ : T_F(K) → R≥0 the error maps defined as C^P_μ(ν) = ‖L_μ − P_ν L_μ‖_{HS(H)} and C^PP_μ(ν) = ‖L_μ − P_ν L_μ P_ν‖_{HS(H)}.

Theorem 4.1 For μ ∈ T_F(K) and X ∈ {P, PP}, the map C^X_μ is convex on the real convex cone T⁺(K), and for all ν and η ∈ T⁺(K), the directional derivative of C^X_μ at ν along η − ν takes its values in {−∞, 0}.

Proof For ν, η ∈ T⁺(K) and ρ ∈ (0, 1), we set ξ = ν + ρ(η − ν) ∈ T⁺(K). The three operators L_ν, L_η and L_ξ being PSD, independently of ρ ∈ (0, 1), we have null(L_ξ) = null(L_ν) ∩ null(L_η); the subspace H_ξ, and so the maps ρ → C^X_μ(ν + ρ(η − ν)), are therefore constant on the open interval (0, 1). The conclusion then follows from Lemma 4.1 and (10).

In view of Theorem 4.1, the maps C^P_μ and C^PP_μ are akin to piecewise-constant functions. By contrast (see Lemma 4.4), the directional derivatives of the map D_μ are informative, in the sense that the landscape of D_μ can be explored through steepest descents. From Remark 4.1 and Lemma 4.3, the maps D_μ, C^P_μ and C^PP_μ verify the sequence of inequalities (16) (see also Remark 4.3). The quadrature-approximation error map D_μ may hence be regarded as a differentiable relaxation of the projection-based-approximation error maps C^P_μ and C^PP_μ; see Fig. 1 for an illustration.

Fig. 1 Graphical representation of the maps D_μ and C^PP_μ as functions of the weight parameters characterising an approximate measure ν ∈ T⁺(K). The measures μ and ν are supported by the same set of points {x_1, x_2} ⊆ X, and described by their weight parameters (ω_1, ω_2) and (υ_1, υ_2) ∈ R²≥0, respectively; the red star represents the weight parameters of μ = ω_1 δ_{x_1} + ω_2 δ_{x_2}. In the graph of C^PP_μ, the point on the vertical axis indicates the value of the map at ν = 0, and the bold lines indicate the constant values taken by the map along the horizontal axes (and following Remark 4.3, the graph of C^PP_μ is tangent to the graph of D_μ along the horizontal axes).
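One concrete sense in which D_μ behaves as a differentiable relaxation: for a fixed support, D_μ(ν)² is a convex quadratic in the weight vector of ν, so the best unconstrained weights solve a linear system. A sketch under the same illustrative Gaussian-kernel assumption (names are ours):

```python
import numpy as np

def sq_gram(A, B, ell=1.0):
    # Gram matrix of the squared-modulus kernel |K|^2 (Gaussian base kernel).
    D = np.subtract.outer(np.asarray(A, float), np.asarray(B, float))
    return np.exp(-(D ** 2) / (2.0 * ell ** 2)) ** 2

def best_weights(sup_mu, w_mu, sup_nu):
    # Minimise D_mu(nu)^2 = v'Sv - 2 v'Rw + w'Qw over the weight vector v
    # of nu, with S, R, Q the squared-kernel Gram matrices; the least-norm
    # minimiser solves the normal equations S v = R w.
    S = sq_gram(sup_nu, sup_nu)
    R = sq_gram(sup_nu, sup_mu)
    return np.linalg.pinv(S) @ R @ np.asarray(w_mu, float)
```

When the support of ν contains the support of μ, the minimiser recovers μ exactly and the error vanishes; gradient-based descent on the weights exploits the same differentiability.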
Remark 4.3 For s ∈ X, setting c_{δ_s} = g_μ(s)/K(s, s)² when K(s, s) > 0, and c_{δ_s} = 0 otherwise, we have P_{δ_s} L_μ P_{δ_s} = c_{δ_s} S_{k_s}. For K(s, s) = 0, we indeed have k_s = 0, and so P_{δ_s} = 0; for K(s, s) > 0, the map P_{δ_s} is the orthogonal projection onto span_C{k_s}. We obtain that P_{δ_s} L_μ P_{δ_s} belongs to the initial space of the coisometry, with image c_{δ_s}|k_s|².

Remark 4.4 From a numerical standpoint, in view of (12) and (13), for ν ∈ T_F(K), the evaluation of C^P_μ(ν) or C^PP_μ(ν) requires a suitable characterisation of the reproducing kernel K_ν of H_ν (or equivalently, of the orthogonal projection P_ν); in practice, K_ν is a priori unknown and needs to be computed from K and ν (see Remark 4.5). In comparison, and in view of (7), the error map D_μ only involves the kernel K; the projection-free nature of D_μ is of notable interest for numerical applications.

Remark 4.5 Following Lemma 3.4, for a measure ν supported by S = {s_1, ..., s_n}, n ∈ N, the reproducing kernel K_ν of H_ν can be expressed as K_ν(x, t) = Σ_{i,j=1}^n K(x, s_i) κ_{i,j} K(s_j, t), where κ_{i,j} is the i, j entry of the pseudoinverse (Moore-Penrose inverse) of the n × n kernel matrix with i, j entry K(s_i, s_j). The worst-case time complexity of the evaluation of K_ν at M different locations is O(n³ + n²M): the term O(n³) is related to the pseudoinversion of the kernel matrix defined by K and S, while the term O(n²M) corresponds to the evaluation, from this pseudoinverse and the kernel K, of K_ν at the M locations.
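The expression for K_ν in Remark 4.5 can be sketched directly; the O(n³) pseudoinversion cost appears explicitly. The Gaussian kernel and the function names are, again, our illustrative assumptions:

```python
import numpy as np

def kernel(x, t, ell=1.0):
    # Illustrative Gaussian PSD kernel; accepts 1-D arrays and returns
    # the matrix of pairwise kernel values.
    D = np.subtract.outer(np.asarray(x, float), np.asarray(t, float))
    return np.exp(-(D ** 2) / (2.0 * ell ** 2))

def projected_kernel(x, t, S):
    # K_nu(x, t) = sum_{i,j} K(x, s_i) kappa_{i,j} K(s_j, t), with kappa
    # the pseudoinverse of the kernel matrix of the support S.
    kappa = np.linalg.pinv(kernel(S, S))        # O(n^3) pseudoinversion
    return kernel(x, S) @ kappa @ kernel(S, t)  # O(n^2) per evaluation
```

As expected from the projection interpretation, K_ν(x, t) = K(x, t) whenever t belongs to the support, and K_ν(x, x) ≤ K(x, x) for all x.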

Nonnegative measures and L 2 -embeddings
Following Sect. 3.3, for μ ∈ T⁺(K), the embedding ι_μ : H → L²(μ) is HS. For f ∈ L²(μ) and x ∈ X, we have ι*_μ[f](x) = ∫_X K(x, t)f(t) dμ(t), so that, in addition to L_μ = ι*_μ ι_μ ∈ HS(H), the three operators ι*_μ : L²(μ) → H, ι_μ ι*_μ : L²(μ) → L²(μ), and ι_μ ι*_μ ι_μ : H → L²(μ) can also be regarded as integral operators defined by the kernel K and the nonnegative measure μ. These four interpretations are inherent to K, which characterises H, and μ, which characterises L²(μ); see for instance [4,19,20,22,26] for illustrations. In each case, the corresponding operator is HS, and we denote by HS(μ, H), HS(μ) and HS(H, μ) the Hilbert spaces of all HS operators from L²(μ) to H, on L²(μ), and from H to L²(μ), respectively.

Partial L 2 -embeddings
For a closed linear subspace H_S ⊆ H, the embedding ι_μ can be approximated by the partial embedding ι_μ P_S. For f ∈ L²(μ) and x ∈ X, we have (ι_μ P_S)*[f](x) = ∫_X K_S(x, t)f(t) dμ(t) (20), the integrand being integrable from the CS inequality in ℓ²(I) and in H. From (20) and Fubini's theorem, we then for instance obtain the error maps C^tr_μ and C^F_μ introduced in (18) and (19). The notations C^tr_μ and C^F_μ are motivated by the relation between these maps and the trace and Frobenius norms; see Sect. 5.3. As observed for C^P_μ and C^PP_μ, we may notice that the maps C^tr_μ and C^F_μ are akin to piecewise-constant functions (Theorem 5.1).

Proof We follow the same reasoning as in the proof of Theorem 4.1. For two measures ν and η ∈ T⁺(K) and for ρ ∈ (0, 1), we set ξ = ν + ρ(η − ν) ∈ T⁺(K). We then have that H_ξ is the closure in H of H_ν + H_η, independently of ρ ∈ (0, 1). We conclude by combining the inclusions H_ν ⊆ H_ξ and H_η ⊆ H_ξ with the inequalities provided in Lemma A.2 (Appendix A).
As illustrated by Lemma 5.1 and Theorem 5.1, and as already observed for the error maps C^P_μ and C^PP_μ, the error maps C^tr_μ and C^F_μ are akin to piecewise-constant functions, and their evaluation requires a suitable characterisation of the kernel of subspaces of H. For μ ∈ T⁺(K), the error maps C^X_μ, X ∈ {tr, F, P, PP}, can be regarded as alternative ways to assess the accuracy of the approximation of ι_μ by ι_μ P_ν, ν ∈ T_F(K). From the relation between the error maps D_μ and C^X_μ, X ∈ {P, PP} (see Lemma 4.3), the approximation of potentials in G can hence more generally be regarded as a differentiable and projection-free surrogate for the characterisation of accurate partial embeddings. From Lemma 5.2, we may notice that C^F_μ(ν) ≤ C^P_μ(ν), ν ∈ T_F(K), extending the sequence of inequalities (16).

Remark 5.3
Let ν ∈ T_C(K) be a complex measure with real and imaginary parts ν_r and ν_i ∈ T(K). For μ ∈ T(K), the three operators L_μ, L_{ν_r} and L_{ν_i} are self-adjoint; we thus have ‖L_μ − L_{ν_r}‖_{HS(H)} ≤ ‖L_μ − L_ν‖_{HS(H)}, and so D_μ(ν_r) ≤ D_μ(ν). Hence, when L_μ is self-adjoint, the search for an approximate measure ν for the approximation of L_μ by L_ν can be restricted to T(K).

Column sampling for PSD-matrix approximation
Let K be an N × N PSD matrix, with N ∈ N; we denote by [N] the set of all integers between 1 and N. For a subset I ⊆ [N], the Nyström approximation of K induced by I is the N × N PSD matrix K(I) = K_{•,I}(K_{I,I})†K_{I,•}, where K_{•,I} is the matrix defined by the columns of K with index in I, where (K_{I,I})† is the pseudoinverse of the principal submatrix of K defined by I, and where K_{I,•} consists of the corresponding rows of K; see e.g. [9,11,14,18]. For i and j ∈ [N], the i, j entry of K may be regarded as the value K(i, j) of a PSD kernel K defined on the discrete set X = [N]. The j-th column of K then corresponds to the function k_j ∈ H, j ∈ X, and the subset I defines the closed linear subspace H_I = span_C{k_j | j ∈ I} ⊆ H; in particular, the i, j entry of K(I) is K_I(i, j), with K_I the reproducing kernel of H_I (see e.g. [17], and Remark 4.5).
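A minimal sketch of the Nyström approximation itself (the naming is ours; NumPy's `pinv` plays the role of (K_{I,I})†):

```python
import numpy as np

def nystrom(K, I):
    # Nystrom approximation K(I) = K[:, I] (K[I, I])^+ K[I, :] of a
    # PSD matrix K induced by the column subset I.
    C = K[:, I]                          # columns of K with index in I
    W = np.linalg.pinv(K[np.ix_(I, I)])  # pseudoinverse of the principal submatrix
    return C @ W @ K[I, :]
```

For a rank-one PSD matrix, a single column with nonzero diagonal entry recovers K exactly; in general, the approximation error is non-increasing as I grows, in line with the piecewise-constant behaviour of the projection-based error maps above.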
Introducing μ = Σ_{i=1}^N δ_i, the Hilbert space L²(μ) can be identified with the Euclidean space C^N; following Sect. 5.2, we then observe that
• the trace norm ‖K − K(I)‖_tr corresponds to (18), and
• the squared Frobenius norm ‖K − K(I)‖²_F corresponds to (19).
The column-sampling problem for the Nyström approximation of a PSD matrix K, that is, the search for a subset I ⊆ [N] leading to an accurate approximation K(I) of K, is thus a special instance of the general framework discussed in Sect. 5.1. In particular, the support of an approximate measure ν on X = [N] defines a subset of columns of K, and the approximation of potentials in the RKHS G may be used as a surrogate for the characterisation of such measures. In the discrete setting, G corresponds to the RKHS defined by the N × N PSD matrix S with i, j entry |K_{i,j}|² (that is, S is the entry-wise product between K and K̄, the conjugate of K).

Concluding discussion
We described the overall framework surrounding the isometric representation of integral operators with PSD kernels as potentials, and illustrated the equivalence between the quadrature approximation of such integral operators and the approximation of integral functionals on RKHSs with squared-modulus kernels. Through subspaces defined by measures and partial L²-embeddings, we also discussed the extent to which the approximation of potentials in RKHSs with squared-modulus kernels can be used as a differentiable surrogate for the characterisation of projection-based approximations of integral operators with PSD kernels.
The link between integral-operator approximation and potential approximation may be leveraged to design sampling strategies for low-rank approximation (where approximations are characterised by sparse finitely-supported measures). The direct minimisation of D_μ under sparsity-inducing constraints is for instance considered in [10], while the possibility to locally optimise the support of approximate measures using particle-flow techniques is studied in [13]. Sequential approaches, where support points are added one at a time on the basis of information provided by the directional derivatives of D_μ, are investigated in [12]. The present work aims at supporting this type of approach by strengthening its theoretical underpinning.

Theorem 5.1
For μ ∈ T⁺(K), the statement of Theorem 4.1 also holds for the maps C^tr_μ and C^F_μ; that is, these two maps are convex on the real convex cone T⁺(K), and their directional derivatives take values in the set {−∞, 0}.
For all h ∈ H and x ∈ X, we have h(x) = ⟨k_x | h⟩_H, where ⟨·|·⟩_H stands for the inner product of H (this equality is often referred to as the reproducing property); we denote by ‖·‖_H the norm of H, and we use a similar convention for the inner products and norms of all the Hilbert spaces encountered in this note. An operator T ∈ HS(H) always admits a singular value decomposition (SVD) of the form T = Σ_{i∈I} σ_i T_{u_i,v_i}, I ⊆ N, where {σ_i}_{i∈I} ∈ ℓ²(I) is the set of all strictly-positive singular values of T, and where {u_i}_{i∈I} and {v_i}_{i∈I} are two orthonormal systems in H; the series converges in HS(H). Let H′ be the continuous dual of H. For h ∈ H, let ξ_h ∈ H′ be the bounded linear functional such that ξ_h[f] = ⟨h | f⟩_H, f ∈ H; see e.g. [3,7]. Hilbert-Schmidt space. Let HS(H) be the Hilbert space of all HS operators on H; see e.g. [3,7]. For T ∈ HS(H), we denote by T[h] ∈ H the image of h ∈ H through T, and by T[h](x) the value of the function T[h] at x ∈ X; we use similar notations for all function-valued operators. For a and b ∈ H, let T_{a,b} ∈ HS(H) be the rank-one operator given by T_{a,b}[h] = a⟨b | h⟩_H, h ∈ H; we also set S_b = T_{b,b}. Remark 2.1

Following Remark 2.2, the linear map ξ_h → h̄ is a bijective isometry from H′ to H̄. Further, the linear map densely defined as a ⊗ ξ_b → a ⊗ b̄ is a bijective isometry from H ⊗ H′ to H ⊗ H̄; the composition of this isometry with the bijective isometry from HS(H) to H ⊗ H′ discussed in Remark 2.2 yields the isometry from HS(H) to H ⊗ H̄.
The condition |ν| ∈ T⁺(K) ensures that the embedding ι_|ν| is well-defined. We have |ν| = Σ_{i=1}^n |ω_i| δ_{s_i} ∈ T⁺(K), and L_|ν| is PSD. From Lemma 3.2, we obtain that null(L_|ν|) = {h ∈ H | ι_|ν|[h] = 0} = ∩_{i=1}^n {h ∈ H | ⟨k_{s_i} | h⟩_H = 0}, and so null(L_|ν|)^⊥_H = span_C{k_{s_1}, ..., k_{s_n}}. Observing that L_|ν| is self-adjoint, the result follows.