Approximation and Interpolation of Singular Measures by Trigonometric Polynomials

Complex signed measures of finite total variation are a powerful signal model in many applications. Restricting to the $d$-dimensional torus, finitely supported measures allow for exact recovery if the trigonometric moments up to some order are known. Here, we consider the approximation of general measures, e.g., supported on a curve, by trigonometric polynomials of fixed degree with respect to the Wasserstein-1 distance. We prove sharp lower bounds for their best approximation and (almost) matching upper bounds for effectively computable approximations when the trigonometric moments of the measure are known. A second class of sum-of-squares polynomials is shown to interpolate the characteristic function on the support of the measure and to converge to zero outside.


Introduction
Data science in general, and signal and image processing more specifically, relies on mathematical methods, with the fast Fourier transform as the most prominent example. Besides its favourable computational complexity, its success relies on the good approximation of smooth functions by trigonometric polynomials. Mainly driven by specific applications, functions with additional properties, together with their computational schemes, have gained some attention: signals might for instance be sparse, as in single-molecule fluorescence microscopy [30], or live on some other lower-dimensional structure like microfilaments, again in bio-imaging. Such properties are well modeled by measures, which can express the underlying structure through the geometry of their support, e.g., being discrete or singular continuous. This representation has in particular led to a better understanding of the sparse super-resolution problem [5,8,11], but has also proven useful in many more applications, such as phase retrieval in X-ray crystallography or contour reconstruction in natural images. In this work, we consider measures supported on the torus. The available data then consists of trigonometric moments of low to moderate order, and one asks for the reconstruction or approximation of the measure.
Related work. For discrete measures, there is a large variety of methods that compute or approximate the parameters of the measure, e.g., parametric methods like Prony's method [39,37,21,19,43], matrix pencil [17,31,11], ESPRIT [40,2,41,26] or MUSIC [45,27], or variational methods, such as TV-minimization via the Beurling LASSO [6,5], which can be challenging for higher spatial dimensions [7,38] or larger polynomial degrees. The positive-dimensional case, on the other hand, is more involved. Specific curves in a two-dimensional domain are identified by the kernel of moment matrices in [34,13,33]; more general discussions can be found in [25] and [47]. In another line of work, Christoffel functions offer interesting guarantees both in terms of support identification [24] and approximation on the support [20,28,35], but, to the best of our knowledge, require strong regularity assumptions and only come with separate guarantees on and outside the support of the measure.
Contributions. Following the seminal paper [29], we introduce easily computable trigonometric polynomials to approximate an arbitrary measure on the d-dimensional torus. In contrast to [29], we provide tight bounds on the pointwise approximation error as well as with respect to the Wasserstein-1 distance, the latter scaling inverse linearly with the polynomial degree (up to a logarithmic factor). After setting up the notation, Section 2 considers the approximation of measures by trigonometric polynomials. Theorem 2.2 proves the existence of a best approximation and provides a lower bound which is attained in the univariate case and is sharp within a factor 6d for spatial dimensions d > 1. The convolution of the measure with the Fejér kernel has a representation via the moment matrix of the measure and is shown to be a sum of squares for non-negative measures in Lemma 2.3. Theorem 2.5 proves a sharp upper bound for its approximation error, which is a log-factor worse than the best approximation. Theorem 2.7 and Remark 2.8 discuss the saturation of this approximation and the removal of the log-factor by using the Jackson kernel. In the univariate case, the Wasserstein-1 distance of measures is realized as an L^1-norm after convolution with the Bernoulli spline of degree 1, and this also yields uniqueness of the best approximation for absolutely continuous real measures.
Section 3 studies another sum-of-squares trigonometric polynomial defined via the moment matrix of the measure, similarly suggested in [21, Thm. 3.5] and [33, Prop. 5.3] (and indeed closely related to the rational function [45, Eq. (6)]). This polynomial interpolates the constant one function on the Zariski closure of the support of the measure and converges pointwise to zero outside. Theorem 3.1 proves a variational characterisation as well as the interpolation property. The pointwise convergence is proved in Theorems 3.4 and 3.8 for the discrete and singular continuous case, respectively. The discrete case also allows for a weak convergence result in Theorem 3.7. We end by illustrating the theoretical results with numerical examples in Section 4.

Preliminaries
Let d ∈ N, 1 ≤ p ≤ ∞, and let $|x-y|_p = \min_{k \in \mathbb{Z}^d} \|x - y + k\|_p$ denote the wrap-around p-norm on T^d = [0, 1)^d. For d = 1 these wrap-around distances coincide and we denote them by |x−y|_1 to distinguish from the absolute value. Throughout this paper, let µ, ν denote some complex Borel measures on T^d with finite total variation and normalization µ(T^d) = ν(T^d) = 1. We denote the set of all such measures by M and restrict to the real signed and non-negative case by M_R and M_+, respectively.
A function f has Lipschitz constant at most 1 if |f(x) − f(y)| ≤ |x − y|_1 for all x, y ∈ T^d, and we denote this by the shorthand Lip(f) ≤ 1. Using the dual characterisation by Kantorovich-Rubinstein, the Wasserstein-1 distance of µ and ν is defined by
$$W_1(\mu, \nu) = \sup_{\operatorname{Lip}(f) \le 1} \left| \int_{\mathbb{T}^d} f \,\mathrm{d}(\mu - \nu) \right|$$
for any µ, ν ∈ M, and µ, ν ∈ M_+ also admit the primal formulation
$$W_1(\mu, \nu) = \inf_{\pi} \int_{\mathbb{T}^d \times \mathbb{T}^d} |x - y|_1 \,\mathrm{d}\pi(x, y),$$
where the infimum is taken over all couplings π with marginals µ and ν, respectively. We note in passing that the Wasserstein-1 distances for the other p-norms on T^d are equivalent, with lower and upper constants 1 and d^{1−1/p}, respectively. Moreover, the Wasserstein distance defines a metric induced by the norm which makes the space of Borel measures with finite total variation a Banach space. By slight abuse of notation, we also write W_1(p, µ) in case the measure ν has density p, i.e., dν(x) = p(x) dx.
The Fourier coefficients or trigonometric moments of µ are given by
$$\hat\mu(k) = \int_{\mathbb{T}^d} \mathrm{e}^{-2\pi\mathrm{i}\langle k, x \rangle} \,\mathrm{d}\mu(x), \qquad k \in \mathbb{Z}^d,$$
and these are finite with |µ̂(k)| ≤ ‖µ‖_TV and µ̂(0) = 1. We are interested in the reconstruction of the measure given these moments for indices k ∈ {−n, ..., n}^d. Besides collecting them in a vector, we also set up the finite moment matrix
$$T_n = \bigl(\hat\mu(k - l)\bigr)_{k, l \in [n]^d} \in \mathbb{C}^{N \times N}, \qquad [n] := \{0, \dots, n\}, \quad N = (n+1)^d, \qquad (2.5)$$
and denote its singular value decomposition by $T_n = U \Sigma V^*$. From the data stored in T_n, we compute trigonometric approximations q_n to the underlying measure and distinguish between pointwise convergence to the characteristic function of supp µ and weak convergence, i.e.,
$$\int_{\mathbb{T}^d} f(x)\, q_n(x) \,\mathrm{d}x \to \int_{\mathbb{T}^d} f \,\mathrm{d}\mu \qquad (n \to \infty)$$
for all continuous test functions f : T^d → C. The latter is denoted by q_n ⇀ µ as n → ∞, and the space of test functions can be restricted to Lipschitz continuous test functions by the Portmanteau theorem. Moreover, q_n ⇀ µ is equivalent to lim_{n→∞} W_1(q_n, µ) = 0 on the bounded set T^d, and we can quantify rates of weak convergence in terms of the Wasserstein distance (cf. e.g. [46, Thm. 6.9]).
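For concreteness, the moment matrix can be assembled directly from the moments. The following sketch is our own illustration for d = 1 with a hypothetical discrete measure; the sign convention for the moments is an assumption. It checks that T_n is Hermitian and positive semidefinite for a non-negative measure and that its rank equals the number of support points:

```python
import numpy as np

def moment_matrix(muhat, n):
    """Toeplitz moment matrix T_n = (muhat(k - l))_{k,l=0..n} for d = 1."""
    return np.array([[muhat(k - l) for l in range(n + 1)]
                     for k in range(n + 1)])

# Hypothetical nonnegative measure mu = 0.5*delta_{0.2} + 0.5*delta_{0.7};
# we use the convention muhat(m) = sum_j lambda_j exp(-2*pi*i*m*x_j).
muhat = lambda m: 0.5 * np.exp(-2j * np.pi * m * 0.2) \
                + 0.5 * np.exp(-2j * np.pi * m * 0.7)

T = moment_matrix(muhat, n=8)
eigvals = np.linalg.eigvalsh(T)               # T is Hermitian
print(eigvals.min() >= -1e-10)                # PSD for a nonnegative measure
print(sum(np.linalg.svd(T, compute_uv=False) > 1e-8))  # rank = number of spikes
```

The rank observation anticipates the Vandermonde factorisation used in Section 3.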

Approximation
We give two introductory examples, prove a lower bound on a best approximation, and an upper bound on the easily computable approximation by convolution with the Fejér kernel.
Example 2.1. Our first example for d = 1 is the measure (2.1), where λ denotes the Lebesgue measure; it has both singular and absolutely continuous parts, including an integrable pole at x = 7/8. Given the first trigonometric moments, the Fourier partial sums might serve as a sequence of approximations. Another classical sequence of approximations is given by convolution with the Fejér kernels, see (2.4).
Our second example is a singular continuous measure for d = 2. We take µ = (2πr_0)^{-1} δ_C ∈ M_+ as the uniform measure on the circle C for some radius 0 < r_0 < 1/2. The total variation of this measure is ‖µ‖_TV = µ(T²) = 1. Using the Poisson summation formula and a well-known representation of the Fourier transform of a radial function, we find for the trigonometric moments of µ
$$\hat\mu(k) = J_0(2\pi r_0 \|k\|_2), \qquad k \in \mathbb{Z}^2, \qquad (2.2)$$
where J_0 denotes the 0-th order Bessel function of the first kind. These decay asymptotically with rate ‖k‖_2^{-1/2}. The Fourier partial sum as well as the convolution with the Fejér kernel for n = 29 are shown with maximal contrast in the left and right panel of Figure 2.2.

Proof (of Theorem 2.2). We directly have existence of a best approximation by polynomials in the Banach space of Borel measures with finite total variation (e.g. cf. [9, Thm. 3.1.1]). For the lower bound, we compute
where p̄ denotes the reflection of p. It remains to find the worst-case error for the best approximation of a Lipschitz function by a trigonometric polynomial. This is well understood for d = 1 (cf. [1,14]), while we did not find a reference for whether and how d > 1 is possible as well. Therefore, we show that the idea by [15] for the case d = 1 works also for d > 1 in our situation. A main ingredient of Fisher's proof is the duality relation
for a Banach space X, x_0 ∈ X, with subset Y and dual space X*. The second ingredient is given by the 1-periodic Bernoulli spline of degree 1. A Lipschitz continuous and 1-periodic function f : T → R with Lip(f) ≤ 1 has a derivative f′ almost everywhere, and this derivative satisfies $\int_{\mathbb{T}} f'(s) \,\mathrm{d}s = 0$ by the periodicity of f. Therefore, it follows that
The dual space of the space of continuous periodic functions is the space of periodic finite regular Borel measures equipped with the total variation norm, and the duality formulation gives
Our main contribution to this result is the observation how to transfer the multivariate setting back to the univariate one. It is easy to verify that f and λ being the Lebesgue measure on T is admissible. Since this
choice of µ_s integrates ∫ g dµ_s = 0 if g is constant with respect to x_s (and the same holds for constant univariate functions integrated against µ*_0), we obtain
and µ̂*_0(k) = 0 otherwise. Together with the Fourier representation (2.3) of B_1, this gives
Hence, we take f_0(s) = ±d depending on the sign of
with this choice. Finally, we end up with
and this was the claim.
Let the Fejér kernel F_n : T → R and, by slight abuse of notation, also its multivariate tensor-product version be given by
$$F_n(x) = \sum_{k=-n}^{n} \Bigl(1 - \frac{|k|}{n+1}\Bigr) \mathrm{e}^{2\pi\mathrm{i}kx} \qquad\text{and}\qquad F_n(x) = \prod_{s=1}^{d} F_n(x_s), \qquad (2.4)$$
respectively. The main object of study now is the approximation p_n = F_n * µ; two examples are given in Example 2.1. We start by noting that p_n can be expressed in terms of the moment matrix and preserves non-negativity and normalization.
Choosing q = e_n(x) yields the second claim and, by interchanging the order of integration and noting that the value of the inner integral is independent of y, also the remaining claim. Our next goal is a quantitative approximation result, for which we need the following preparatory lemma. This result can be found in qualitative form e.g. in [4, Lemma 1.6.4].
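Lemma 2.3 is easy to verify numerically: for a discrete measure, the quadratic form $\frac{1}{N} e_n(x)^* T_n e_n(x)$ agrees with the Fejér-smoothed Fourier sum and is non-negative. A minimal sketch (our own illustration; node positions, weights, and sign conventions are assumptions):

```python
import numpy as np

# Hypothetical discrete test measure mu = sum_j lambda_j delta_{x_j} on T.
nodes   = np.array([0.15, 0.55, 0.8])
weights = np.array([0.2, 0.5, 0.3])
n = 10
N = n + 1
muhat = lambda m: np.sum(weights * np.exp(-2j * np.pi * m * nodes))

# Moment matrix and the quadratic-form representation of Lemma 2.3.
T = np.array([[muhat(k - l) for l in range(N)] for k in range(N)])

def p_quadratic(x):
    e = np.exp(-2j * np.pi * np.arange(N) * x)
    return (e.conj() @ T @ e).real / N

def p_fejer(x):   # (F_n * mu)(x) = sum_m (1 - |m|/(n+1)) muhat(m) e^{2 pi i m x}
    return sum((1 - abs(m) / (n + 1)) * (muhat(m) * np.exp(2j * np.pi * m * x)).real
               for m in range(-n, n + 1))

xs = np.linspace(0, 1, 13, endpoint=False)
err = max(abs(p_quadratic(x) - p_fejer(x)) for x in xs)
print(err)   # agreement up to rounding; p_n is also nonnegative here
```

The non-negativity of the quadratic form for non-negative measures is exactly the sum-of-squares structure mentioned in the lemma.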
Lemma 2.4. Let n, d ∈ N; then we have

Proof. First note that, by the tensor-product structure of F_n and |x|_1 = Σ_{s=1}^d |x_s|_1, it is sufficient to consider the univariate case. With the representation
$$F_n(x) = 1 + 2 \sum_{k=1}^{n} \Bigl(1 - \frac{k}{n+1}\Bigr) \cos(2\pi kx)$$
we find after elementary integration
The lower bound follows similarly by bounding the series from the previous calculation from below by integrals.

Theorem 2.5. Let d, n ∈ N and µ ∈ M; then the measure with density p_n converges weakly to µ with
$$W_1(p_n, \mu) \le \|\mu\|_{TV} \, W_1(F_n, \delta_0),$$
which is sharp since µ ∈ M_+ implies ‖µ‖_TV = 1 and equality holds for µ = δ_0.

Proof. We compute
note that both inequalities become equalities when choosing µ = δ_0 and f(x) = |x|_1, and then apply Lemma 2.4. We note in passing that W_1(F_n, δ_0) = ∫_{T^d} F_n(x) |x|_1 dx.

Example 2.6. Theorem 2.5 gives a worst-case lower bound and, on the other hand, the Lebesgue measure is approximated by F_n * λ = λ without any error. We may thus ask how well a measure dµ = w(x) dx with smooth (non-negative) density might be approximated. If we choose the analytic density w(x) = 1 + cos(2πx), then F_n * w(x) − w(x) = cos(2πx)/(n+1) and, by testing with the Lipschitz function f(x) = cos(2πx)/(2π), we see that W_1(F_n * w, w) ≥ 1/(4π(n+1)).
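The quantity $W_1(F_n * \delta_0, \delta_0) = \int_{\mathbb{T}} F_n(x)\,|x|_1\,\mathrm{d}x$ from the proof above can be approximated by a Riemann sum. The following sketch (d = 1; the grid size is our own choice) reproduces the log(n)/n-type decay of Lemma 2.4:

```python
import numpy as np

def w1_fejer_to_dirac(n, M=200_000):
    """W_1(F_n * delta_0, delta_0) = int_T F_n(x) |x|_T dx for d = 1."""
    x = (np.arange(M) + 0.5) / M          # avoid the removable pole at x = 0
    fejer = (np.sin((n + 1) * np.pi * x) / np.sin(np.pi * x)) ** 2 / (n + 1)
    dist = np.minimum(x, 1 - x)           # wrap-around distance to 0
    return np.mean(fejer * dist)          # Riemann sum over [0, 1)

vals = {n: w1_fejer_to_dirac(n) for n in (5, 20, 100)}
print(vals)   # decays roughly like log(n)/n, cf. Lemma 2.4
```

The decay is visibly slower than 1/n, which is the saturation phenomenon discussed next.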
In greater generality, such a lower bound holds for each measure individually and can be inferred from a nice relationship between the Wasserstein distance and a discrepancy, cf. [12].
Theorem 2.7. For each individual measure µ ∈ M different from the Lebesgue measure, there is a constant c > 0 such that W_1(F_n * µ, µ) ≥ c/n holds for all n ∈ N.
Remark 2.8. The gap between the upper and lower bounds can be narrowed by choosing another convolution kernel, which then, however, does not allow for the representation in Lemma 2.3. The Jackson kernel has degree n = 2m − 2 and satisfies an analogue of Lemma 2.4 without the logarithmic factor.
Analogously to Theorem 2.5, we get a bound which is still an approximate factor 6 worse than the lower bound in the univariate case. A factor 3 is due to the above estimate, and a factor 2 seems to indicate that the Jackson kernel is not optimal. Moreover, upper and lower bounds differ by a factor d in the multivariate case, which might be due to the used norms or our proof techniques. We mention at this point that an analogous chain of inequalities holds for any kernel K_n and measure µ ∈ M, with equality in the second inequality if K_n is nonnegative.
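Remark 2.8 can be illustrated numerically. Taking the Jackson kernel in one of its common forms, as the normalized fourth power of sin(mπx)/sin(πx) (an assumption on our part; the paper's exact normalization may differ), one checks that it yields a smaller transport error than the Fejér kernel of the same degree n = 2m − 2:

```python
import numpy as np

M = 400_000
x = (np.arange(M) + 0.5) / M
dist = np.minimum(x, 1 - x)

def w1_to_dirac(kernel):          # W_1(K * delta_0, delta_0) = int K(x)|x|_T dx
    return np.mean(kernel * dist)

n = 20                            # compare kernels of equal degree n = 2m - 2
m = (n + 2) // 2
fejer = (np.sin((n + 1) * np.pi * x) / np.sin(np.pi * x)) ** 2 / (n + 1)
jackson = (np.sin(m * np.pi * x) / np.sin(np.pi * x)) ** 4
jackson /= np.mean(jackson)       # normalize so the kernel integrates to 1

print(w1_to_dirac(jackson) < w1_to_dirac(fejer))  # Jackson loses the log factor
```

Both kernels are nonnegative, so the last inequality of the remark holds with equality for them.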

Univariate case
In one variable, the question of uniqueness of the best approximation can be equivalently characterised by the uniqueness of the best approximation in L^1(T) and thus allows for the following lemma.
Lemma 2.9 (Best approximation in the univariate case). For d = 1, any absolutely continuous real measure admits a unique best approximation by a polynomial of degree n ∈ N with respect to the Wasserstein-1 distance.
Proof. Let µ, ν ∈ M_R and B_1 denote the Bernoulli spline of degree 1 from the proof of Theorem 2.2; then we have
Since the integral over f′ is zero by the periodicity of f, any c ∈ R yields
On the other hand, choosing c* such that {t :
by taking f with f′(t) = ±1 depending on the sign of the term in brackets. Because of
we proceed by computing explicitly
If µ does not give mass to single points, we have that B_1 * µ is continuous and hence there exists a unique best L^1-approximation p (e.g. cf. [9, Thm. 3.10.9]), which defines p* uniquely by p = B_1 * p*.
Example 2.10. Uniqueness and non-uniqueness of L^1-approximation is discussed in some detail in [32,10] and we note the following:
(i) For the measure, where λ is again the Lebesgue measure, one finds
As proved in [32, Thm. 5.1], this function does not have a unique L^1-approximation, and thus µ does not admit a unique approximation by a polynomial due to Lemma 2.9 for even n.
(ii) For µ = δ_0 one has B_1 * µ = B_1 and, according to [32, Lem. 2.2], this function with only one jump has a unique best L^1-approximation given by the interpolation polynomial, yielding the unique best approximation to δ_0. Since the error of the best L^1-approximation of B_1 is known from a theorem by Favard [14] (e.g. this is mentioned in [9, p. 213]), we can compute it explicitly, and this reveals that the estimates in the proof of Theorem 2.2 are sharp.

Figure 2.3 and Table 2.1 summarize our findings on the approximation of δ_0. The best approximation p* as well as the Dirichlet kernel D_n(x) = sin((2n + 1)πx)/sin(πx) are signed with small full width at half maximum but positive and negative oscillations at the sides. The latter might be seen as an unwanted artifact in applications. The approximations given by the Fejér and the Jackson kernel are non-negative. For completeness, we note that the Dirichlet kernel is the Fourier partial sum of δ_0 and allows for the estimate (2.8), and thus [29, eq. (2.1) and (2.2)] needs some adjustment.
On the real line, we indeed have µ((−∞, x]) = (H * µ)(x) for the Heaviside function H(x) = 1 for x ≥ 0 and H(x) = 0 else, such that the Wasserstein distance again can be computed via
$$W_1(\mu, \nu) = \int_{\mathbb{R}} \bigl| \bigl(H * (\mu - \nu)\bigr)(x) \bigr| \,\mathrm{d}x,$$
see e.g. [42, Prop. 2.17]. Lemma 2.9 might be considered as the periodic analogue of this result.
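The cumulative-distribution identity above translates directly into a grid-based computation of W_1 on the torus, mirroring the Bernoulli-spline reformulation of Lemma 2.9. In the sketch below (our own discretization), the optimal constant shift is the median of the CDF difference:

```python
import numpy as np

def w1_torus(nodes_a, w_a, nodes_b, w_b, M=100_000):
    """W_1 between two discrete probability measures on T = [0, 1),
    computed as min_c ||G - c||_{L^1} for the CDF difference G."""
    t = (np.arange(M) + 0.5) / M
    G = np.zeros(M)
    for xj, lj in zip(nodes_a, w_a):
        G += lj * (t >= xj)               # CDF of the first measure
    for xj, lj in zip(nodes_b, w_b):
        G -= lj * (t >= xj)               # minus CDF of the second
    c = np.median(G)                      # optimal constant shift on the torus
    return np.mean(np.abs(G - c))

print(w1_torus([0.0], [1.0], [0.9], [1.0]))   # wrap-around distance: 0.1
```

The constant shift is what distinguishes the periodic setting from the real line, where c = 0 can be used.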
(ii) One can relate our work to a main result in [29]. As Lemma 2.9 reformulates the Wasserstein distance of two univariate measures in terms of the L^1-distance of their convolutions with the Bernoulli spline, one can view this Bernoulli spline as a kernel of type β = 1 following the notation of [29]. Thus, one can take p = 1, p = ∞ in [29, Thm. 4.1], yielding that the Wasserstein distance between a measure µ and its trigonometric approximation is bounded from above by c/n. The latter agrees with our Remark 2.8, which additionally gives an explicit and small constant.
(iii) The observation that the construction of p* for δ_0 is possible via FFTs might lead to the idea to construct near-best approximations to any measure µ by interpolating B_1 * µ by some p̃ and to obtain the polynomial p of near-best approximation, which satisfies p̃ = B_1 * p, by multiplying with the Fourier coefficients of the Bernoulli spline B_1. A first problem would be that the limited knowledge of moments only allows to interpolate the partial Fourier sum S_n(B_1 * µ), which does not converge to B_1 * µ uniformly as n → ∞ for discrete µ. Secondly, the near-best approximation p cannot be expected to be nonnegative for a nonnegative measure µ, which is another drawback compared to convolution with nonnegative kernels like the Fejér or Jackson kernel.
(iv) Finally, note that kernels K_n with stronger localization and 'smoother' Fourier coefficients, e.g. higher powers of the Fejér kernel, allow to improve the rate beyond n^{-1} if the measure has a smooth density w. This can be seen from partial integration and the above arguments. However, note that from a practical perspective this asks for a-priori smoothness assumptions on the measure in order to choose a suitable kernel.

Interpolation
Using the singular functions of the moment matrix as in (2.5) and with r = r(n) = rank T_n, we define the noise and signal functions p_{0,n}, p_{1,n} : T^d → [0, 1] by
$$p_{0,n}(x) = \frac{1}{N} \sum_{j=r+1}^{N} |u_j^* e_n(x)|^2 \qquad\text{and}\qquad p_{1,n}(x) = \frac{1}{N} \sum_{j=1}^{r} |u_j^* e_n(x)|^2, \qquad (3.1)$$
respectively, where u_j denotes the j-th left singular vector. In what follows, we suppose that V ⊆ T^d denotes the smallest set containing supp µ that is the zero-locus of some unknown trigonometric polynomial, i.e., V ⊇ supp µ is the Zariski closure of the support. We show pointwise convergence to the characteristic function of this set. Clearly this cannot be uniform, and our first result shows interpolation of the value 1 for finite n as well as a variational characterization.
Theorem 3.1. Suppose V ⊆ T^d is the set consisting of the common roots of all the polynomials in ker T_n. Then p_{0,n}(x) + p_{1,n}(x) = 1 for all x ∈ T^d. In particular, we have a variational characterization in which the maximum is taken over all trigonometric polynomials p ∈ span{e^{2πikx} : k ∈ [n]}, p ≠ 0, such that p(y) = 0 for all y ∈ V.
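The interpolation property of Theorem 3.1 can be checked directly: for a discrete measure, p_{1,n} equals one on the support and stays small away from it. A sketch (node positions, weights, and the sign convention for e_n are our assumptions):

```python
import numpy as np

# Hypothetical discrete measure with r = 3 well-separated support points.
nodes   = np.array([0.1, 0.37, 0.8])
weights = np.array([0.5, 0.3, 0.2])
n = 12
N = n + 1
r = len(nodes)

muhat = lambda m: np.sum(weights * np.exp(-2j * np.pi * m * nodes))
T = np.array([[muhat(k - l) for l in range(N)] for k in range(N)])

U, S, Vh = np.linalg.svd(T)
def p1(x):                        # p_{1,n}(x) = ||U_r^* e_n(x)||^2 / N
    e = np.exp(-2j * np.pi * np.arange(N) * x)
    return np.linalg.norm(U[:, :r].conj().T @ e) ** 2 / N

print(p1(0.37))   # = 1 on the support (interpolation property)
print(p1(0.55))   # small away from the support
```

Since p_{1,n} is the squared norm of an orthogonal projection of the normalized exponential vector, it never exceeds 1, matching p_{0,n} + p_{1,n} = 1.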

Zero-dimensional situation
If the measure is given by µ = Σ_{j=1}^r λ_j δ_{x_j} with support V = supp µ = {x_1, ..., x_r} ⊂ T^d and complex weights Λ = diag(λ_1, ..., λ_r), then the support is zero-dimensional and the moment matrix allows for the Vandermonde factorisation
$$T_n = A_n^* \Lambda A_n, \qquad A_n = \bigl(\mathrm{e}^{2\pi\mathrm{i}\langle k, x_j \rangle}\bigr)_{j=1,\dots,r;\; k \in [n]^d} \in \mathbb{C}^{r \times N},$$
which will be instrumental.
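The Vandermonde factorisation is straightforward to verify numerically for d = 1; the sketch below uses hypothetical nodes and weights, and the sign of the exponent is our convention:

```python
import numpy as np

nodes   = np.array([0.2, 0.6, 0.9])     # pairwise separated support points
weights = np.array([0.5, 0.25, 0.25])   # mu = sum_j lambda_j delta_{x_j}
n, r = 8, 3
N = n + 1

# Vandermonde matrix A_n and the factorisation T_n = A_n^* Lambda A_n.
A = np.exp(2j * np.pi * np.outer(nodes, np.arange(N)))   # r x N
T_factored = A.conj().T @ np.diag(weights) @ A

muhat = lambda m: np.sum(weights * np.exp(-2j * np.pi * m * nodes))
T_direct = np.array([[muhat(k - l) for l in range(N)] for k in range(N)])

print(np.allclose(T_factored, T_direct))   # factorization holds
print(np.linalg.matrix_rank(T_direct))     # rank T_n = r = 3
```

In particular, the rank of T_n reveals the number of support points once n is large enough, which underlies the definition of the signal function p_{1,n}.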
Theorem 3.4 (Pointwise convergence). Let µ = Σ_{j=1}^r λ_j δ_{x_j} be a discrete measure and let

Proof. Comparing (3.1) with Lemma 2.3 and using (2.4) yields
The final estimate follows from
Regarding the second estimate, consider the Vandermonde matrix and note that its pseudo-inverse gives rise to the Lagrange polynomial ℓ_{r+1} satisfying ℓ_{r+1}(x_j) = 0 for j = 1, ..., r and ℓ_{r+1}(x) = 1. We compute
and use Theorem 3.1 to bound
The assertion follows from known estimates on the smallest singular value of the Vandermonde matrix with pairwise clustering nodes, see [16, Corollary 3.20].
Remark 3.5. Actually, Theorem 3.4 shows the correct orders in n and min_j |x − x_j|²_∞ in the upper bound of p_{1,n}(x). First note that 1 − p_{1,n} and all its partial derivatives of order 1 vanish on x_1, ..., x_r. For fixed x ∈ T^d, the Taylor expansion in x_0 = argmin_j |x − x_j|_∞ thus gives
where the last inequality uses an entrywise Bernstein inequality and ‖p_{1,n}‖_∞ = 1.
Lemma 3.6 (Convergence of singular values). Let µ = Σ_{j=1}^r λ_j δ_{x_j} be a discrete complex measure whose weights are ordered non-increasingly with respect to their absolute value. Assume that (n + 1) min_{j ≠ ℓ} |x_j − x_ℓ|_∞ > d; then the singular values σ_j of the moment matrix T_n fulfil
where the first inequality is due to [3, Theorem 2.2.8]. Each entry of the matrix (1/N) A_n A_n^* − I_r is a modified Dirichlet kernel and can be bounded uniformly by

Theorem 3.7. Normalizing differently, we have (N/r) p_{1,n} ⇀ µ̃ := r^{-1} Σ_{j=1}^r δ_{x_j} as n → ∞.
Proof. First note that ‖p_{1,n}‖_{L^1} = r/N. We define p̃_n = F_n * µ̃ and observe that, for any continuous function f on T^d, ∫ f p̃_n dx → ∫ f dµ̃ by Theorem 2.5; hence it is enough to show that ‖(N/r) p_{1,n} − p̃_n‖_{L^1} converges to zero for n → ∞. If n is sufficiently large, then by (2.5) we can write p̃_n(x) = (1/N) e_n(x)^* Ũ Σ̃ Ũ^* e_n(x), where Σ̃ ∈ C^{r×r} denotes the diagonal matrix consisting of the non-zero singular values and Ũ ∈ C^{N×r} denotes the corresponding singular vector matrix of the moment matrix of µ̃.
As p_{1,n} only depends on the signal space of the moment matrix T_n of µ, which agrees with the signal space of the moment matrix of µ̃, it follows by (3.1) that p_{1,n}(x) = (1/N) e_n(x)^* Ũ Ũ^* e_n(x), and thus
Since r is constant, the result follows from Lemma 3.6.
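Lemma 3.6 and the normalization in Theorem 3.7 suggest that the scaled singular values σ_j(T_n)/N approach the moduli of the weights as n grows. A quick check for d = 1 with a hypothetical measure:

```python
import numpy as np

nodes   = np.array([0.1, 0.45, 0.7])
weights = np.array([0.6, 0.3, 0.1])      # ordered non-increasingly

def singular_values_scaled(n):
    N = n + 1
    muhat = lambda m: np.sum(weights * np.exp(-2j * np.pi * m * nodes))
    T = np.array([[muhat(k - l) for l in range(N)] for k in range(N)])
    return np.linalg.svd(T, compute_uv=False)[:3] / N

# sigma_j(T_n)/N approaches |lambda_j| as n grows.
errs = [np.max(np.abs(singular_values_scaled(n) - weights)) for n in (10, 40, 160)]
print(errs)   # errors shrink as n grows
```

The error is governed by the off-diagonal Dirichlet-kernel entries of (1/N) A_n A_n^* − I_r, which decay once the nodes are separated on the scale 1/(n+1).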

Positive-dimensional situation
For a measure µ whose support is an algebraic variety, we derive a pointwise convergence rate p_{1,n}(x) = O(n^{-1}) outside of the variety in Theorem 3.8, and this proves (3.2). It is not clear whether this is already optimal, as we found O(n^{-2}) as an approximation rate in the case of a discrete measure.
Theorem 3.8. Let y ∈ T^d and let g ∈ span{e^{2πi⟨k,x⟩} : k ∈ [m]} be a trigonometric polynomial of max-degree m such that g(y) ≠ 0 and g vanishes on supp µ. Then

Proof. Set N_n = (n+1)^d for n ∈ N and define the trigonometric polynomial p(x) = e_{n,y}(x) g(x) of max-degree n + m, where e_{n,y}(x) := e_n(x)^* e_n(y). Furthermore, we define f(x) :=
for all x ∈ T^d. On the other hand,
Thus, by (3.4), we obtain
where we define (y), which proves the first statement. For the upper bound, we compute
by using that f = |g|² is a trigonometric polynomial of degree m. Then it follows from (3.7) that

Numerical examples
We illustrate in this section the asymptotic behaviour of p_n and p_{1,n} for several types of singular measures with respect to the Wasserstein-1 distance. We compute the distance using a semidiscrete optimal transport algorithm, described below. Our experiments focus on three examples on T²: a discrete measure µ_d supported on 15 points with (nonnegative) random amplitudes, a uniform measure µ_cu supported on the trigonometric algebraic curve (4.1), and a uniform measure µ_ci supported on the circle centered in c_0 = (1/2, 1/2) with radius r_0 = 0.3. The moments of µ_cu are computed numerically up to machine precision using Arb [18] with a parametrization of the implicit curve (4.1). It follows from (2.2) that the trigonometric moments of the measure µ_ci are given by µ̂_ci(k) = e^{−2πi⟨k, c_0⟩} J_0(2π r_0 ‖k‖_2).
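The closed-form moments of the circle measure can be cross-checked against direct quadrature of the circle integral. A sketch using scipy's Bessel function (the quadrature rule and sample counts are our own choices):

```python
import numpy as np
from scipy.special import j0

c0, r0 = np.array([0.5, 0.5]), 0.3

def moment_circle(k):
    """Closed form: e^{-2 pi i <k, c0>} J_0(2 pi r0 |k|_2), cf. (2.2)."""
    k = np.asarray(k, dtype=float)
    return np.exp(-2j * np.pi * k @ c0) * j0(2 * np.pi * r0 * np.linalg.norm(k))

def moment_quadrature(k, Q=4000):
    """Average of e^{-2 pi i <k, x>} over Q equispaced points on the circle."""
    theta = 2 * np.pi * np.arange(Q) / Q
    pts = c0 + r0 * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return np.mean(np.exp(-2j * np.pi * pts @ np.asarray(k, dtype=float)))

print(moment_circle([0, 0]))                  # = 1: the measure is normalized
print(abs(moment_circle([3, -2]) - moment_quadrature([3, -2])))
```

The agreement is essentially to machine precision, since the trapezoidal rule converges spectrally for smooth periodic integrands.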
The polynomials p_n, J_n * µ, and p_{1,n} can be evaluated efficiently via the fast Fourier transform over a regular grid in T². For the polynomial p_{1,n}, the singular value decomposition of the moment matrix T_n can be computed at reduced cost by exploiting that T_n has Toeplitz structure and resorting only to matrix-vector multiplications.
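In one variable, the grid evaluation via FFT amounts to placing the Fejér-damped moments into a length-M coefficient array. A sketch with our own conventions and a hypothetical measure:

```python
import numpy as np

nodes, weights = np.array([0.2, 0.7]), np.array([0.6, 0.4])
n, M = 30, 256                      # degree n, grid of M > 2n points
muhat = lambda m: np.sum(weights * np.exp(-2j * np.pi * m * nodes))

# Place the Fejér-damped moments into an FFT coefficient array ...
C = np.zeros(M, dtype=complex)
for m in range(-n, n + 1):
    C[m % M] = (1 - abs(m) / (n + 1)) * muhat(m)
values = M * np.fft.ifft(C)         # p_n(idx/M) for idx = 0, ..., M-1

# ... and compare against direct evaluation at one grid point.
idx = 51
direct = sum((1 - abs(m) / (n + 1)) * muhat(m) * np.exp(2j * np.pi * m * idx / M)
             for m in range(-n, n + 1))
print(abs(values[idx] - direct))    # agreement up to rounding
```

In two variables the same layout applies along each axis, which is what makes the grid evaluation over T² cheap.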
To compute transport distances to the measure µ ∈ {µ_cu, µ_ci}, let the curve C = supp µ ⊂ T^d denote its support with arc-length L. Now let s ∈ N, take a partition C = ∪_{ℓ=1}^s C_ℓ into path-connected curves with measure µ(C_ℓ) = s^{-1} and arc-length L_ℓ, and any x_ℓ ∈ C_ℓ; then we set µ^s := s^{-1} Σ_{ℓ=1}^s δ_{x_ℓ}. We denote the resulting discrete measures by µ^s_cu and µ^s_ci, respectively (see Figure 4.1). In our tests, we use s = 3000 samples, which offers a satisfactory tradeoff between computational time and accuracy for our range of degrees n. Indeed, the computational cost of evaluating the objective (4.2) or its gradient grows linearly in s, while for degrees up to n = 250, sampling beyond 3000 points has no effect on the output of our algorithm for computing W_1(p_n, µ^s), see Figure 4.1 (right). Here, the Laguerre cells associated to the weight vector w are given by Ω_j(w) = {y ∈ T^d : |x_j − y| − w_j ≤ |x_i − y| − w_i, i = 1, ..., s}, see e.g. [36]. In our implementation, the density measure (and the Laguerre cells) are computed over a 502 × 502 grid. We use a BFGS algorithm to perform the maximization, using the Matlab implementation [44]; we stop the iterations when the change of value of the objective goes below 10^{-9}, or when the infinity norm ‖∇f‖_∞ goes below 10^{-5}. Note that this last condition has a geometrical interpretation, since the j-th component of ∇f corresponds to the difference between the measure of the Laguerre cell Ω_j(w) and the amplitude λ_j. We set the limit number of iterations to 100.

In the discrete case, our numerical results show that the Wasserstein distance W_1(p_n, µ^s) decreases at a rate close to the worst-case bound derived in Theorem 2.5. This is also the case for W_1(p_{1,n}, µ̃_s), which is coherent with the bound given in the proof of Theorem 3.7. In the positive-dimensional cases, one would need to compute the Wasserstein distances for degrees larger than n = 250 to be able to reliably estimate a rate, but this would require better optimized algorithms, in the spirit for instance
of [23], which goes beyond the scope of this paper. Still, our preliminary results seem to indicate that the rates for F_n * µ and J_n * µ in the positive-dimensional situation are similar to the ones for discrete measures, but below. Both approximations for n = 19 are shown in the left and right panel of Figure 2.1, respectively.

Figure 2.1: The example measure (2.1) and its approximations by the Fourier partial sum (left) and the convolution with the Fejér kernel (right). The weight 1/3 of the Dirac measure in 1/8 is displayed by an arrow of height n/3 for visibility.

Figure 2.2: Uniform measure on a circle of radius r_0 = 1/3 and its approximations by the Fourier partial sum (left) and the convolution with the Fejér kernel (right).

Theorem 2.2. For any d, n ∈ N and for every µ ∈ M there exists a polynomial of best approximation in the Wasserstein-1 distance. Moreover, we have sup


Figure 3.1: Summary of the bounds on p_{1,n} from Theorem 3.4 and Remark 3.5 for d = 2, n = 20, and a discrete measure µ supported on four points. The polynomial p_{1,20} was evaluated on a grid in T² and interpolated on the magenta cross section (left), while the bounds on p_{1,20} on this cross section are displayed (right). We see that specifically the bound 1 − σ_min(Ã_{n,x})²/N from the proof of Theorem 3.4 reproduces the behaviour of p_{1,n}. The constant upper bound on p_{1,n} away from the support of µ can be derived by using estimates for σ_min(Ã_{n,x}) in the case of separated nodes.

for j = 1, ..., r.

Proof. With the polar decomposition $N^{-1/2} A_n^* = PH$, where P ∈ C^{N×r} is unitary and H ∈ C^{r×r} is positive definite, we have that |λ_1| ≥ ··· ≥ |λ_r| are the singular values of the matrix PΛP^*. Therefore, for the singular values of T_n = A_n^* Λ A_n, we obtain max_{1≤j≤r}

Figure 4.1: The two example measures µ^s_cu (left) and µ^s_ci (middle) used in our numerical tests. In this display the two continuous measures are discretized using s = 60 samples. The amplitudes of the spikes in both measures are taken equal and normalized. The last plot shows the Wasserstein distance W_1(F_n * µ_cu, µ^s_cu) for degrees n = 1, ..., 250 and several values of s.

Now let µ = Σ_{j=1}^s λ_j δ_{x_j} refer to either µ_d, µ^s_cu or µ^s_ci. The semidiscrete optimal transport between a measure with density p and the discrete measure µ may be computed by solving the finite-dimensional optimization problem
$$\max_{w \in \mathbb{R}^s} \; \sum_{j=1}^{s} \lambda_j w_j + \sum_{j=1}^{s} \int_{\Omega_j(w)} \bigl( |x_j - y| - w_j \bigr) p(y) \,\mathrm{d}y. \qquad (4.2)$$

Table 2.1: Convergence rates of different trigonometric polynomials approximating δ_0, listing for each trigonometric polynomial the sign of the polynomial and W_1(δ_0, K_n).

Remark 2.11. We close with some remarks which are specific to the univariate setting:
(i) We stress that (2.7) in the proof of Lemma 2.9 allows to compute the Wasserstein distance as an L^1-distance for real signed univariate measures. Similarly, this allows to compute the so-called star discrepancy ‖ν([0, ·))‖_∞ as suggested in [29, eq. (2.1) and (2.2)]. However, note that (2.8) has some additional term, cf. the well-known bound on the Lebesgue constant [4, Prop. 1.2.3] and Example 2.10 (iii).