Super-Resolution for Doubly-Dispersive Channel Estimation

In this work we consider the problem of identification and reconstruction of doubly-dispersive channel operators which are given by finite linear combinations of time-frequency shifts. Such operators arise as time-varying linear systems for example in radar and wireless communications. In particular, for information transmission in highly non-stationary environments the channel needs to be estimated quickly with identification signals of short duration and for vehicular application simultaneous high-resolution radar is desired as well. We consider the time-continuous setting and prove an exact resampling reformulation of the involved channel operator when applied to a trigonometric polynomial as identifier in terms of sparse linear combinations of real-valued atoms. Motivated by recent works of Heckel et al. we present an exact approach for off-the-grid superresolution which allows to perform the identification with realizable signals having compact support. Then we show how an alternating descent conditional gradient algorithm can be adapted to solve the reformulated problem. Numerical examples demonstrate the performance of this algorithm, in particular in comparison with a simple adaptive grid refinement strategy and an orthogonal matching pursuit algorithm.


INTRODUCTION
Sensing and information retrieval in highly non-stationary environments are challenging inverse problems in radar and sonar applications, and their fundamental understanding is also required for future wireless communication in very rapidly time-varying mobile scenarios. In such problems, the task is to identify or estimate channel parameters in a robust manner by probing the channel with a particular identifier signal w of finite duration, also called pilot signal. In radar, for example, a known radar waveform is transmitted and from the received reflections, distance and relative velocity of a target can be obtained by estimating delay and Doppler shifts. Several reflections superimpose at the receiver, hence the core task consists in estimating the multiple time-frequency shifts from finitely many samples of the received signal: taken within a finite observation interval. Here each triplet (η s , τ s , ν s ) can be interpreted as a particular transmission path with a delay τ s and Doppler-shift ν s due to relative distance and velocity, respectively, with a complex-valued attenuation factor η s . This so called tapped delay-line model, is a special case of a doubly-dispersive (or linear timevariant) channel, where the spreading function is a (finite) point measure. For more details on this terminology, see for example classical works [1,25]. Intuitively, it is clear that simultaneous accuracy in time and frequency are governed by the uncertainty relation and that the shape of the waveform should fit time and frequency dispersion of the channel. However, often only few scatterers are affecting the wave propagation and therefore the number of time-frequency shifts is rather small compared to the number of samples one may acquire at the receiver.
In so-called coherent communication the wireless channel needs to be estimated to equalize unknown data signals consecutively or simultaneously transmitted with the pilot signal. This principle is used for example in orthogonal frequency-division multiplexing (OFDM) modulation scheme [7] which is implemented in many of today's communication technologies like WiFi, LTE and 5G standards, as well as broadcasting systems like DAB and certain DVB standards [28]. Thus, the first goal here is to estimate the action of the channel operator on a particular restricted class of data signals. A channel which is exclusively time-or frequency-selective, reduces to convolutions or multiplication operators and equalization (inverting the action of the channel) is then often possible via conventional deconvolution techniques. In the doubly-selective case however, more advanced equalization approaches are necessary to deal with selfinterference effects. For this purpose the delay-Doppler shifts are usually approximated to lie on a-priori fixed lattices leading to leakage effects [10]. In essence, the intrinsic sparsity of channel does not carry over to the approximated model, rendering compressed sensing methods like [40,34] much less effective.
In radar instead it is important to achieve high resolution on the time-frequency shift parameters itself. However, in future high mobility vehicular communication [29] and automotive applications both aspects will become relevant, i.e., discover the instantaneous neighborhood using radar and simultaneous communicating with other vehicles or road side units. In particular, combined radar and communication transceivers which simultaneously shall use the same hardware and frequency band for both tasks, are recently proposed and investigated in the literature, see exemplary [32]. However, since the propagation environment may change in such vehicular applications as well on a short time-scale and usually in an almost unpredictable manner, it is also important to perform channel estimation and radar in short time cycles with short signals. The traffic type in automotive applications also enforces to ensure strict latency requirements in communication for decoding the equalized data signals.
Beside the practical needs for advanced signal processing algorithms in this challenging engineering field, the estimation problem itself has been attracted researchers working in harmonic analysis. First works in this field and from the perspective of channel identification are due Pfander et al. [35]. Identifiying a linear operator with restricted spreading, i.e., with bandlimited symbol has been investigated in [27].
Finally, we like to mention that there exist other methods for superresolution as Prony-like methods [13,30,31,37,38]. These are spectral methods which perform spike localization from low frequency measurements. They not need any discretization and recover the initial signal as long as there are enough observations. So far we have not examined if and how such methods could be applied for our specific modulationtranslation setting.
Main contribution. The main contribution of this paper is twofold. First, we establish an exact sampling formula for operators which are sparse complex linear combinations of modulation and translation operators H = S ∑ s=1 η s M ν s T τ s applied to (truncated) trigonometric polynomials w as identifiers. The basic resampling idea goes back to the work of Heckel et al. [23], where the problem to identify the parameters η s , ν s , τ s of the unknown operator H is approximated by a discrete formulation without explicitly accounting for the employed function spaces and by applying an approximate sampling formula. Using trigonometric polynomials as identifiers, we derive an explicit resampling formula for the continuous problem such that we can completely avoid the approximation errors in [23]. By this, we also overcome particular parameter limitations in the original proof since we not directly couple time-bandwidth limitation of operator and the identifier.
As a second main result we provide explicit algorithmic reconstruction approaches. Our sampling reformulation allows the straightforward application of standard modifications of the conditional gradient method, also known as Frank-Wolfe algorithm, to determine the amplitudes η s ∈ C and the two-dimensional positions (τ s , ν s ). Here we focus on the alternating direction conditional gradient (ADCG) algorithm propose by Boyd et al. [2]. The corresponding optimization problem takes noise into account and penalizes the sparsity of the above linear combination by the 1 -norm of the amplitudes. The optimization problem can be rephrased in terms of atomic measures, where the 1norm is directly related to the total variation norm of the measure, resp. to the atomic norm of a certain set of atoms. Such problems are known as BLASSO [16]. Besides Frank-Wolfe like algorithms that minimize the location parameters over a continuous domain, a common approach consists in constraining the locations to lie on a grid. This leads to a finite dimensional convex optimization problems, known as LASSO [41] or basis pursuit [8] , for which there exist numerous solvers [12,14,20,43]. We will compare the ADCG applied to our resampled problem with a grid method, where we incorporate an adaptive grid refinement. As a third group of methods, we like to mention the reformulation of the optimization problem via its dual into an equivalent finite dimensional semi-definite program (SDP). This technique was first proposed in [5] and then adapted by many other authors. However, the equivalence of formulations are only true in the one-dimensional setting and in higher dimensions one needs to use e.g. the so-called Lassere hierarchy [16]. An SDP approach for our two-dimensional setting based on a results of [18] was also proposed in the paper of Heckel et al. [23]. Since this approach appears to be highly expansive both in time and memory requirement and has moreover to fight with many non specific local maxima related to the so-called dual certificate, it is not appropriate for our setting.
This paper is organized as follows: In Section 2, we collect the basic notation and results from Fourier analysis and measure theory which are needed in the following sections. At the end of the section we establish a theorem which relates trigonometric polynomials with periodic functions arising from the Fourier transform of compactly supported measures. The proof of the theorem is given in Appendix A. In Section 3, we formulate our superresolution problem for doubly-dispersive channel estimations. More precisely, we are interested in the two-dimensional parameter detection of sparse linear combinations of translation-modulation operators. Instead of treating the original problem, we give a sampling reformulation of the involved translation-modulation operators for identifiers which are trigonometric polynomials. Here the relation between these polynomials and Fourier transforms of measures will play a role. Since the identifiers have only to evaluated at points lying in a compact interval, our choice implies no restriction for practical purposes. In Section 3, we prove the sampling theorem for translation-modulation operators applied to trigonometric polynomials. Then, in Section 5, we show how an alternating descent conditional gradient algorithm can be applied to solve the reformulated problem. Finally, we demonstrate the performance of this algorithm in comparison with simple adaptive grid refinement algorithm and an orthogonal matching pursuit method in Section 6.

PRELIMINARIES
Function spaces. Let I be an open finite interval of R or R itself. By C(I) we denote the space of complex-valued continuous functions on I, by C b (I) the Banach space of bounded functions endowed with the norm f ∞ = sup x∈I | f (x)|. Further, let C 0 (R) ⊂ C b (R) be the closed subspace of functions vanishing at infinity. Let L p (I), p ∈ [1, ∞] be the Banach space of (equivalence classes) of complex-valued Borel measurable functions with finite norm For compact I, it holds L 1 (I) ⊃ L r (I) ⊃ L s (I) ⊃ L ∞ (I), r < s.
An entire (holomorphic) function f : C → C is of exponential type if there exist positive constants A, B > 0 such that The exponential type of f is then defined as the number The Bernstein space B p σ , p ∈ [1, ∞] consist of all functions f of exponential type σ whose restriction to R belongs to L p (R). Endowed with the L p norm, B p σ becomes a Banach space, too. We will need the following sampling result of Nikol'skii [33].
Fourier transform of functions. The Fourier transform F : is a bounded linear operator. For 1 < p ≤ 2, this operator can be extended as F : L p (R) → L q (R), 1 p + 1 q = 1 via the limit in the norm of L q (R) of By Plancherel's equality, the Fourier transform is an isometry on L 2 (R). Note that the Fourier transform of a function f ∈ L p (R) with p > 2 can be defined in terms of tempered distributions. However, the distributional Fourier transformf does in general not correspond to a function. A special role plays the sinus cardinalis defined as The sinc function is in L 2 (R) but not in L 1 (R). Further, we havê where χ S denotes the characteristic function of a set S ⊆ R, i.e., χ S (x) = 1 if x ∈ S and χ S (x) = 0 if x / ∈ S. The counterpart of scaled sinc functions in the periodic setting are the Nth Dirichlet kernels given by For arbitrary f ∈ L 1 (R) withf ∈ L 1 (R), the Fourier inversion formula holds true almost everywhere and, moreover, pointwise if the function f is continuous.
For two functions f ∈ L 1 (R) and w ∈ L p (R), p ∈ [1, ∞], the convolution f * w is defined almost everywhere by and is contained in L p (R). For p ∈ [1,2], the relation between convolution and Fourier transform is given by f * w =fŵ. For σ > 0 and p ∈ [1, ∞], we denote by PW p σ the Paley-Wiener class of functions f : C → C of the form for some g ∈ L p (−σ , σ ). We have the inclusion PW r σ ⊂ PW s σ for 1 ≤ s < r. Functions of the class PW p σ are holomorphic and of exponential type 2πσ by For p ∈ [1, 2], we further have PW p σ ⊂ B q 2πσ , see [24]. Measure spaces. Let X be a compact subset of R d or R d itself. By M (X) we denote all regular, finite, complex-valued measures, i.e., all mappings µ : B(X) → C from the Borel σ -algebra of R n to C with |µ(X)| < ∞ and for any sequence {B k } k∈N ⊂ B(X) of pairwise disjoint sets. We suppose that the series on the right-hand side converges absolutely, so that the indices of the sets B k can be arbitrarily reordered. The support of a complex measure µ ∈ M (X) is defined by where ρ + − ρ − = ℜ(µ) and ι + − ι − = ℑ(µ) are the Hahn decompositions of the real and imaginary part into non-negative measures. The support of a non-negative measure ν is the closed set The total variation of a measure µ ∈ M (X) is defined by With the norm µ M := |µ|(X) the space M (X) becomes a Banach space. The space M (X) can be identified via Riesz's representation theorem with the dual space of C 0 (X) and the weak- * topology on M (X) gives rise to the weak convergence of measures.
We will need that, for a bounded Borel-measurable function g, the measure gµ defined by gµ(B) : Indeed, the integral with respect to a measure µ ∈ M (R) is also well defined for every The Fourier transform is a linear, bounded operator from M (R) into C b (R) with operator norm one. Moreover, it is unique in the sense that µ ∈ M (R) withμ ≡ 0 implies that µ is the zero measure. We are especially interested in the Fourier transform of atomic measures µ := ∑ k∈Z α k δ (· − t k ) with α k ∈ C, t k ∈ R given bŷ If the point masses are equispaced located at t k = k n with n ∈ N, the Fourier transform becomes an n-periodic Fourier series. Moreover, restricting the support of µ to [−σ , σ ], we obtain the n-periodic trigonometric polynomial where N = σ n and t k = k n , k = −N, . . . , N. The following theorem shows that also the reverse direction is true, i.e., every periodic function given as the Fourier transform of a compact measure is a finite trigonometric polynomials.
Suppose that f is n-periodic for n > 0. Then f is a trigonometric polynomial of the form where N = σ n .
The proof of the theorem is given in Appendix A.

SUPERRESOLUTION IN DOUBLY-DISPERSIVE CHANNEL ESTIMATION
In doubly-dispersive channel estimation we are both interested in the detection of shifts and modulations of signals. Recall that the shift operator T τ and the modulation operator M ν are defined for x, τ, ν ∈ R by respectively. Their concatenation is given by .
Both operators are unitary on L 2 (R). Note that a similar definition of shifts and modulations can be given for tempered distributions, see, e.g., [36, Section 4.3.1]. For S ∈ N and T , Ω > 0, we consider the operator We are interested in the following superresolution problem: for a known function w ∈ C b (R), determine the amplitudes η s ∈ C and positions τ s , ν s ∈ X, s = 1, . . . , S from certain samples of In this context, the function w is often called identifier. Our solution will be based on an exact sampling formula of Hw which contains sparse linear combination of certain real-valued "atoms". The idea to use such a reformulation for later computations originates from a paper of Heckel et al. [23]. However, the approach of those authors uses only an approximate sampling formula without given error bound and not an exact one, see Remark 2. The main sampling result is given in the following theorem.

Theorem 3 (Sampling Formula for Translation-Modulation Operators
be an L 1 Ω -periodic trigonometric polynomial. Then, we have for τ, ν ∈ R and x j = T j with so-called atoms The proof of Theorem 3 is the content of the next section. By Theorem 3, we can rewrite the superresolution problem (3) with an identifier of the form (4) for given samples By periodicity of the atoms (6), it makes indeed sense to restrict ourselves to In this case, all points x j − u Ω at which the periodic identifier w in (7) must be evaluated, belong to the interval I = (−T , T ). In practice, we therefore have not really to work with a periodic identifier, but can restrict ourselves to the compactly supported function χ I w. Setting where we consider (u, v) arranged as a vector, and introducing the operator T w x j − u Ω we can rewrite the superresulution problem (7) as In practical applications, the measurements y are often corrupted by noise so that we finally intend to solve the regularized problem where η = (η s ) S s=1 and τ = (τ s ) S s=1 , ν = (ν s ) S s=1 . Indeed, we may choose S larger than the number of expected translation-modulations and hope that the regularization term enforces the sparsest solution.
Remark 1. The above problem is closely related to an inverse problem in the space of measures. To this end, we consider the linear, continuous operator A : Then, we may consider the inverse problem min µ∈M (X) Problems of this kind are also known as BLASSO [16,5] and were studied in several papers, e.g., by Bredies and Pikkarainen [3] and Denoyelle et al. [17]. In particular, it was shown that the problem has a solution. Since GA * is not injective, the solution is in general not unique. Restricted to atomic measures in M (X), i.e. µ = ∑ S s=1 η s δ (· − (τ s , ν s )), problem (9) takes the form (8). The superresolution problem may be also seen from the point of view the so-called atomic norm formulation addressed in a couple of papers [6,9,17,19,39]. Since η s = |η s |e 2πiφ s is complex-valued, the set of atoms must be redefined as {e 2πiφ s a(τ, ν) : φ ∈ [0, 1), (τ, ν) ∈ X} to take real linear combinations of atoms.
As already mentioned, superresolution problem (3) has been already considered by Heckel et al. [23]. However, these authors proposed to use a different identifier, an issue addressed in the next remark.
Remark 2 (Relation to the work of Heckel et al. [23]). The authors of [23] considered the case N 1 = N 2 = N and L 1 = L 2 = L := T Ω, so that the resampling formula (7) becomes However, as identifier they propose Actually, R = 1 was applied in [23]. Since the sinc function is not periodic, the resampling formula (10) does not hold exactly and only gives an approximation.

RESAMPLING RESULTS FOR TRANSLATION-MODULATION OPERATORS
In this section, we prove Theorem 3. The basis is the Sampling Theorem 4 for L 1 functions. Then we prove certain sampling formulas which are of interest on their own. First, in Lemma 2, we show a sampling formula for p H(q * w), where p, q are compactly supported functions with Fourier transform in L 1 (R), for general w ∈ L ∞ (R) using certain compactly supported helper functions φ and ψ. Restricting to identifiers w which are Fourier transforms of measures, we will see in Theorem 5 that the helper functions can be avoided. Finally, we will use this theorem together with approximation arguments involving sequences of compactly supported Schwartz functions {p n } n and {q n } n to prove Theorem 3. We start by recalling a sampling theorem for L 1 functions.
for all x ∈ R with absolute and uniform convergence on R and convergence in L 1 (R).
For convenience, the proof is given in Appendix B. In the following, we will further need the next auxiliary lemma.
Then D w is continuous and for all τ, ν ∈ R we have for a.e. x ∈ R.
Proof. For any x ∈ R, we have Thus D w L 1 (R 2 )→L ∞ (R) ≤ w L ∞ (R) and the first claim follows.
For the left-hand side of (11) we have by Young's convolution inequality, see [36], that q * w ∞ ≤ q 1 w ∞ .
x dξ a.e., we obtain for a.e. x ∈ R that We use the above lemma to show the following intermediate sampling formula.
Lemma 2. Let H be given by (2). Let w ∈ L ∞ (R) and p, q for all x ∈ R, where The series on the right side of (12) converges uniformly on R.
Proof. By linearity it suffices to consider the case . Moreover, by the support properties of p and q, we get suppF . Consequently, we can apply Theorem 4 to F along each dimension w.r.t. the step-sizes a and b and low-pass kernelsφ andψ to obtain which converges absolutely and uniformly. For the L 1 -convergence, we have to show that vanishes for N → ∞, which follows for both integrals as discussed in the proof of Theorem 4.
As the operator D w : L 1 (R 2 ) → L ∞ (R) defined in Lemma 1 is continuous we conclude a.e.
By applying Lemma 1 once again, we obtain for a.e. x ∈ R.
Consequently we get for a.e. x ∈ R that Note that by Theorem 1 the sequences q(ak − τ) k∈Z and p(b − ν) ∈Z are absolutely summable. The functions M b T ak (ψ * w) are bounded by Thus, the series (13) converges uniformly on R and, since the partial sums in (13) are continuous functions, we conclude that the series converges to a continuous bounded function. As p andq * w are also continuous and bounded, we see that (12) holds for all x ∈ R.
Although Theorem 2 works on arbitrary bounded identifiers w ∈ L ∞ (R), the fact that the left side of (12) does not depend on φ and ψ suggests that there might be a way to avoid the use of these functions. For this purpose, we restrict our attention to a subset of L ∞ (R), namely functions f =μ f with µ f ∈ M (R). Having the Fourier convolution theorem in mind, for a Borel measurable, bounded function φ , we define the convolution which yields a continuous and bounded function. If φ ∈ L 1 (R) ∩ C 0 (R) such thatφ ∈ L 1 (R), then our convolution may be expressed by the Fourier convolution as We have the following convergence result.
Lemma 3. Let f =μ f with µ f ∈ M (R) and let g be a bounded Borel-measurable function. Assume that the uniformly bounded and Borel measurable functions g n : R → C converge pointwise to g : R → C. Then g n F f converges uniformly to g F f , i.e., Proof. Applying Fatou's lemma, we obtain lim sup The lemma of Fatou is applicable since g − g n ∞ ≤ 2M for some M > 0 and constant functions are integrable w.r.t. µ f ∈ M (R).
Theorem 5. Let H be given by (2). Let w =μ w with µ w ∈ M (R) and p, q ∈ L 1 (R) ∩ C 0 (R) withp,q ∈ L 1 (R) and supp p ⊆ [− . Choose 0 < a < 1/Ω q and 0 < b < 1/T p . Then, for all x ∈ R, we have where The series on the right-hand side of (14) converges uniformly on R.
Proof. Let (ψ n ) n∈Z and (φ n ) n∈N be uniformly bounded sequences of Schwartz functions with 2a . Abbreviating y := pH(q * w), we obtain by Theorem 2 that for all x ∈ R and n, m ∈ N.
Note that neither y(x) nor c k, depend on n or m. Letting n → ∞, we immediately obtain the pointwise limit for all x ∈ R and m ∈ N.
Now consider the series: We already used in the proof of Theorem 2 that by Theorem 1 the coefficients (c k, ) k, ∈Z ∈ 1 (Z 2 ) are absolutely summable. Moreover, writing φ := aχ (− 1 2a , 1 2a ) we know by construction that φ m (x) → φ (x) as m → ∞ for every x ∈ R and (φ m ) m∈Z is uniformly bounded. We can therefore apply Lemma 3 to obtain Letting m → ∞ the right side converges to 0 which proves that for all x ∈ R, which is equivalent to (14). The uniform convergence of the series follows immediately from (c k, ) k, ∈Z ∈ 1 (Z 2 ) and

Now we can prove our main theorem.
Proof of Theorem 3. 1. Since |k|Ω in the reprsentation (4) of the identifier w, we see that supp and let (γ n ) n∈N and (λ n ) n∈N be sequences of positive numbers such that 1 < γ n and β < λ n < 1 and γ n λ n < 1 for all n ∈ N which converge to 1 as n → ∞. Then, for n ∈ N, define T n := γ n T , Ω n := γ n Ω, a n := λ n Ω , b n := λ n T , as well as the functions Clearly, we have for all n ∈ N that w n =μ w n , where µ w n ∈ M (R) fulfills Further, the function w n is a n L 1 -periodic. Let (p n ) n∈N , (q n ) n∈N be sequences of Schwartz functions with p n (x) = 1, for |x| ≤ T 2 , 0, for |x| ≥ T n 2 , q n (x) = 1, for |x| ≤ Ω 2 , 0, for |x| ≥ Ω n 2 . We consider the signal y n (x) := p n (x)M ν T τ (q n * w n )(x), x ∈ R. Now p n , q n as well as a n = λ n Ω < 1 Ω n and b n = λ n T < 1 T n satisfy the assumptions of Theorem 5. Hence we get y n (x) = a n b n χ (− 1 2bn , 1 2bn ) (x) ∑ k∈Z ∑ ∈Z c n,k, M b n T a n k χ (− 1 2an , 1 2an ) F w n (x).
with c n,k, :=q n (a n k − τ)p n (b n − ν) for k, ∈ Z. Since 1 a n = Ω λ n > Ω it follows that supp µ w n ⊂ (− Ω 2 , Ω 2 ) ⊂ (− 1 2a n , 1 2a n ). Therefore we have for all x ∈ R and n ∈ N that Thus for |x| < 1 2b n we can simplify (15) to y n (x) = a n b n ∑ k∈Z ∑ ∈Z c n,k, M b n T a n k w n (x).
2. For j = −N 2 , . . . , N 2 , we consider y n j b n L 2 = a n b n ∑ k∈Z ∑ ∈Z c n,k, w n j b n L 2 − a n k e 2πi j L 2 .
Sincep n ,q n are Schwartz functions, we know that (c n,k, ) k, ∈Z ∈ 1 (Z 2 ). Further w n is bounded, so that the series in (16) converges absolutely. Consequently we can rearrange the summation and use the substitution k = rL 1 + u and = tL 2 + v for r,t ∈ Z and u = −N 1 , . . . , N 1 as well as v = −N 2 , . . . , N 2 to obtain c n,rL 1 +u,tL 2 +v w n j b n L 2 − a n u − a n L 1 r e 2πi(t j+ v j L 2 ) = a n b n − a n u e 2πi v j L 2 ∑ r∈Z ∑ t∈Z c n,rL 1 +u,tL 2 +v = a n b n where in the last line we abbreviate ∑ r∈Z ∑ t∈Z c n,rL 1 +u,tL 2 +v = ∑ r∈Zq n a n (rL 1 + u) − τ We can significantly simplify (18) via Poisson's summation formula: Indeed,q n ,p n are band-limited, integrable functions, so by Lemma 4 we obtain Q n,u (τ) = ∑ r∈Zq n a n (rL 1 + u) − τ = 1 a n L 1 We used that q n ( −k a n L 1 ) = 0 if |k| ≥ L 1 2 since this implies |k| a n L 1 ≥ 1 2a n > Ω n 2 and also p n ( − b n L 2 ) = 0 if | | ≥ L 2 2 because then | | b n L 2 ≥ 1 2b n > T n 2 .
3. Finally, we take limits. By continuity of w it is easy to compute lim n→∞ w n j b n L 2 − a n u = lim n→∞ w j Now consider the limits of Q n,u (τ) and P n,v (ν). It follows from a n Ω = λ n > β > 2N 1 L 1 that |k| a n L 1 ≤ N 1 a n L 1 < Ω 2 for k = −N 1 , . . . , N 1 , which in turn implies q n ( −k a n L 1 ) = 1. Similarly, since b n T = λ n > β > 2N 2 L 2 we have | | b n L 2 ≤ N 2 a n L 2 < T 2 and thus p( − b n L 2 ) = 1 for all = −N 2 , . . . , N 2 . Consequently, it follows lim n→∞ Q n,u (τ) = lim n→∞ 1 a n L 1 and by the an analogous computation, Therefore taking the limit of (17) yields lim n→∞ y n j b n L 2 Next we consider the limit of the definition of y n ( j b n L 2 ), i.e., Using the assumptions on q n we obtain for all n ∈ N, so that (20) can be written as We already showed in a previous argument that p n ( j b n L 2 ) = 1 for j = −N 2 , . . . , N 2 for all n ∈ N. Then it follows from continuity that lim n→∞ y n j b n L 2 = lim n→∞ w j Combining (19) with (21) then proves (5).

NUMERICAL ALGORITHMS
In this section, we propose to solve problem (8), i.e., argmin η∈C S ,(τ,ν)∈X S G S ∑ s=1 η s a(τ s , ν s ) − y 2 + λ η 1 , λ > 0 by two kind of algorithms. We adapt the alternating descent conditional gradient method from [2] to our setting in Subsection 5.2. We will address the theoretical convergence behaviour in a forthcoming and refer only to the literature here. For numerical comparisons, we start with a simple grid refinement algorithm in the next subsection 5.1.

Multi-Level Time-Frequency Refinement Algorithm.
Instead of solving the optimization problem over the continuous set X = [−T /2, T /2] × [−Ω/2, Ω/2], we may discretize X on a grid J of cardinality J. For instance we could choose an equidistant grid. Then we consider the atoms on the grid points (τ j , ν j ), j ∈ J . Setting and η ∈ C J , we reduce (9) to the convex minimization problem The sparsity of the discrete measure is here promoted by the 1-norm. In other words, we hope that η has only S J entries which are not near zero. For one-dimensional problems on the torus, Duval and Peyré [19] showed that the discretized problem Γconverges to the continuous problem in the sense of Remark 1 if the regular grid gets finer and finer under certain assumptions; so if the grid is fine enough, we should obtain a sufficient precise solution. On the contrary, a fine grid blows up the problem dimension and make its numerically intractable. Further, as described in [17] and references therein for general total variation minimization problems, the true point masses are usually approximated by several point masses of the grid in a small neighbourhood. These clusters may be detected and replaced by an averaged point mass. Further, the minimization problem (22) is a basis pursuit often encountered in compressed sensing and can be solved using toolboxes like CVX [22] or, approximately, by greedy methods like matching pursuits [15,42,4]. Expand τ ∈ R k , ν ∈ R k by (τ k+1 , ν k+1 ) := argmax (τ,ν)∈J | r,Ga(τ,ν) | Ga(τ,ν) 2 .
Instead of choosing a fine grid on the entire domain, we would like to solve the 1 minimization problem (22) on a small set J that, in the ideal case, only covers the neighbourhoods of the unknown true parameters in X to reduce the numerical effort. For this purpose, we initially apply the orthogonal matching pursuit in Algorithm 1 on a fine regular grid until the residuum r gets small or a certain number of atoms is determined. Although the performance of the greedy method strongly depends on the current instance, the computed atoms are usually located near the true point masses. Surrounding the computed atoms with a fine local grid, we obtain a good starting set J 0 for (22). Next, we would like to let the local grid become finer and finer to improve the solution and to let the number of atoms be nearly the same. Having an optimal η * of (22) for J r , we may chose a new finer grid J r+1 around the interesting features by one of the following refinement strategies: (1) Determine the dominant atoms corresponding to (τ j , ν j ) ∈ J r with |η * j | ≥ ε. Discretize the neighbourhood around these atoms by a finer grid. Chose J r+1 as the union of these finer grids.
(2) Determine the importance γ j of the atom corresponding to (τ j , ν j ) ∈ J r by where the coefficients of all atoms with parameters in a neighbourhood U j around (τ j , ν j ) are summed up. For the most important neighbourhood U j , compute the barycenter by Add a finer grid around (τ j ,ν j ) to J r+1 , remove the atoms in U j from J r , and repeat the procedure as long as there are important points with γ j ≥ ε.
The new local grids should cover a smaller neighbourhood. For instance, these grids could again be regular with decreasing step size according to r. Notice that the numerical effort of the first refinement strategy is less than for the second one. On the other hand, the second strategy can leave the local grids due to the barycenters. After determining a final atomic set J * containing the most dominant atoms or barycenters, the corresponding coefficients can be computed by solving the least square problem In summary, we obtain Algorithm 2.

4:
Determine a new atomic set J r+1 using strategy 1 or 2.

5.2.
Alternating Descent Conditional Gradient Algorithm. Next, we adapt the ADCG from [2] to our setting. This algorithm minimizes over the continuous domain X. The ADCG is a modification of the conditional gradient method (CGM) -also known as the Frank-Wolfe algorithm introduced in [21] -for total variation regularization. The original Frank-Wolfe algorithm on R d solves optimization problems of the form argmin x∈V f (x), where the feasible set V ⊂ R d is compact and convex and the function f is a differentiable and convex. Given the kth iterate x k each iteration consists basically of two steps, namely i) minimizing a linearized version of f in x k over the feasible set .
In superresolution, the first step always consists in an update of the support of the measure as it is also done in the first step of our Algorithm 3.
Concerning the second step, all important convergence guarantees of the algorithms are still valid, if we replace x k+1 in the second step by any feasiblex k+1 that fulfills f (x k+1 ) ≤ f (x k+1 ). This flexibility has led to several successful variations of the classical Frank-Wolfe algorithm. ADCG related algorithms which differ in the second step are for example the algorithm in [3] and the so-called sliding Frank-Wolfe in [17]. While the first one uses soft shrinkage to update the amplitudes and a discrete gradient flow over the locations, the second one uses a non-convex solver to jointly minimize over the amplitudes and positions with a suitable starting values for the amplitudes.
Adapting the ADCG to our setting results in Algorithm 3, whose details are discussed in the following. For convergence results we refer to [2]. The expansion step of the ADCG algorithm is very similar to the greedy matching pursuit in Algorithm 1 without normalization of the atoms. To find a solution (τ J k +1 , ν J k +1 ) := argmax (τ,ν)∈X | r, Ga(τ, ν) |, the objective can first be evaluated on a fine regular grid of X. The obtained (τ J k +1 , ν J k +1 ) may then be improved using a gradient descent method. In our numerical simulations, we however notice that this improvement step has no crucial impact on the recovered measure for our problem and can be skipped.
The second step consists in the update of the parameters by (η, τ, ν) := argmin ν 1 ), . . . , a(τ S , ν S )]. In difference to the general algorithm in [17], the coefficient of the point masses η are complex numbers such that the above update consists in the minimization of a non-smooth objective. Therefore, we use the alternating minimization proposed in [2], which splits up the minimization into the basis Expand τ ∈ R J k , ν ∈ R J k by | r, Ga(τ, ν) |.

6:
Compute a minimizer . The 1 regularized problem can be solved as discussed above and the second one by a gradient descent or quasi Newton method like BFGS. A short computation shows that the gradients of the objective F are just given by where · * denotes the conjugation and transposition of a matrix. The partial derivatives of the atoms a(τ j , ν j ) with respect to τ j and ν j are collected in the matrices The derivative of the N-th Dirichlet kernel D N is given by Finally, we like to mention that the numerical effort of ADCG algorithm is much higher compared with the multi-level refinement in Algorithm 2 since several optimization problems have to be solved for each added point mass.

NUMERICAL RESULTS
In the following experiments, we compare the orthogonal matching pursuit, the multilevel time-frequency refinement, and the ADCG. First, we consider the performance for a specific synthetic instance. Then we study the general performance with respect to the noise level and how many measurements are needed to estimate the unknown channel. Finally, the influence of the identifier model is discussed. Channel estimation from synthetic measurements. For this experiment, we assume that the unknown channel or operator H in (2) has exactly s = 10 features and that this number is known in advance. The shifts and modulations (τ j , ν j ) are independently generated with respect to the uniform distribution on [−T /2, T /2] × [−Ω/2, Ω/2] = [−1.5, 1.5] × [15.5, 15.5]. The coefficients η j are independently and uniformly drawn from the complex unit circle. The employed identifier w is a trigonometric polynomial of degree N 1 = 50, i.e. L 1 = 101, whose coefficients are independently drawn from the complex unit circle too. The true samples y j = Hw( T j L 2 ) with j = −N 2 , . . . , N 2 and L 2 = 101 are corrupted by additive complex Gaussian noise such that ||y − y δ ||/||y|| = 0.1, which corresponds to −10 db 1 white noise -the noisy data are again denoted by y δ .
To recover the unknown channel parameters, we apply the orthogonal matching pursuit (Algorithm 1) with the regular grid S of [−T /2, T /2]×[−Ω/2, Ω/2] consisting of 1 024 points in each direction. The same grid is used to compute the location of the new point masses in the ADCG (Algorithm 3). Both methods are stopped after computing exactly 10 features. The multi-level refinement in Algorithm 2 is initialized by applying the orthogonal matching pursuit to a coarser grid with 256 points in each direction. The local 5 × 5 grids are then refined 15 times by reducing the stepsize by a factor of 0.75. We always use the second refinement strategy. The multi-level refinement and the ADCG are applied to the Tikhonov regularization (9) with λ = 500. The recovered shifts and modulations of all three methods are shown in Figure 1. The true parameters are denoted with an additional †. The absolute errors of the estimation are recorded in   Table 1, where the experiment has been repeated 50 times and the errors are averaged. For this instance, all three methods yield comparable results, where the shifts τ j and modulations ν j are quite accurate. The multi-level refinement and the ADCG method achieve slightly higher accuracies than the orthogonal matching pursuit, but, on the downside, the ADCG method is much more time-consuming than the others. Considering the noise level, the results are nevertheless satisfying and show that in particular the shifts and modulations are recoverable from highly noisy measurements.
Influence of Noise. Next, we study the influence of the noise on the recovery quality of the algorithms in more details. Therefore, the unknown channel is again randomly generated with respect to 10 coefficients on the complex unit circle. In contrast to the first numerical example, the algorithms are henceforth stopped if the residuum becomes small or if the objective stagnates; in other words, the algorithms have no knowledge of the true sparsity S. The degree of the random identifier with unimodular coefficients and the number samples is L 1 = L 2 = 101 once more. The remaining parameters are T = 1 and Ω = 101. The parameter λ is chosen with respect to the noise level and goes to zero for vanishing noise. Differently from the experiment before, we want to measure how well the estimated channel approximates the true one. Since we are only interested on Due to Parseval's identity, the considered subspace is isometrically isomorph to the coefficient space C L 1 . After discretizing [−T /2, T /2] and employing the midpoint rule, the operator norm may be computed numerically using the singular value decomposition.
The mean performance of the discussed algorithms is shown in Figure 2, where for every noise level the experiment has been repeated 50 times. During the multi-level refinement, the step size of the local grids is decreased 25 times by a factor of 2/3. For the ADCG method the 1 and least-square minimization is alternated 25 times. The observation of the first experiment for −10 dB noise carry over. Notice that already small parameter errors lead to large relative errors in the operator norm. The reconstruction error for the multi-level method and the ADCG method corresponds nearly one-to-one to the noise level of the measurements. The reconstruction by the orthogonal matching pursuit does not improve if the noise is decreasing. Although the orthogonal matching pursuit yields sufficient results as starting point for the refinement method, the problem cannot be solved sufficiently accurate by applying only this greedy method.   Number of required measurements During our numerical experiments, we have noticed that around 10 times more samples than unknown features are required to estimate the parameters of the channel sufficiently well. In the following, we explore the question how many measurements are needed in more details. For this, we consider the solution of Algorithm 3 for different numbers of features and numbers of measurements. The remaining parameters of the setting are Ω = L 1 = L 2 = 101 and T = 1. The coefficient of the unknown channel are unimodular, and the measurements are exact. We declare a reconstruction as success if the relative error ||H † − H||/||H † is less than −40 dB, and repeat the experiment 50 times for each data point. The success rate and the mean relative error in the operator norm are shown in Figure 3 and sustain our observation.
This experiment is the numerical analogon to the theoretical recovery guarantee in [23,Thm 1], where the unknown parameters (η s , τ s , ν s ) of (10) in Remark 2 are determined by solving an atomic norm problem. More precisely, the minimizer of the atomic norm problem yields the wanted parameters with high probability under certain assumptions. For the theoretical statement, at least L ≥ 1024 measurements are required. Considering the phase transition in Figure 3, we see that, from a numerical point of view, much less measurements are needed to recover the unknown channel. In particular for higher sparsity levels, the transition between failure and success becomes non-linear, which corresponds to the theoretical results.
Influence of the minimal separation. Continuing the discussion of the theoretical guarantees, we recall that one of the crucial assumptions is a lower bound for the minimal separation min If the distance between two or more features in the parameter space become to close, they cannot be resolved numerically and are often combined into one feature. This wellknown effect may heavily lower the quality of the reconstruction and also occur in our setting. To study this behaviour numerically, we again consider random channels with 10 unimodular features for L 1 = L 2 = 101, T = 1, Ω = 101. The shifts and modulations  are generated such that the parameter set exactly possesses a certain minimal separation.
The results with respect to the operator norm on P N 1 are shown in Figure 4, where the experiments have been repeated 50 times without noise. If the separation falls below 0.01, then the error increases rapidly. Note that this transition point depends on the problem dimension L 1 , L 2 and on the number of unknown features. Importance of the identifier model. Finally, we study how the chosen identifier model affects the recovery quality. During the entire paper, we used trigonometric polynomials as identifier w for the unknown channel. On the basis of w, the given samples Hw(T d/L 2 ) are related to the unknown parameters by Theorem 3, which are then determined by solving the Tikhonov functional (9) with respect to the total variation norm for measures. In [23], for the special case The real coefficients are chosen partially periodic as c k = c k+L = c k−L for k = −N, . . . , N.
We denote the L-dimensional span of the sinc functions (24) by S L . The given samples are then only approximated by (5) in Theorem 3. The replacement of the trigonometric polynomial by a sum of sinc functions leads to a model error. Considering a channel with 10 features and 101 samples as before, and studying the recovery error of Algorithm 3 measured in the operator norm, we see that the model mismatch corresponds to a noise level of around −25 db. Notice that the comparison with respect to trigonometric polynomials is somehow subjective. For this reason, we also compute the relative reconstruction error based on the subspace of sinc functions (24). Numerically, the difference between both error terms is negligible. The clearly visible approximation for sinc functions does not occur for trigonometric identifiers.  To prove Theorem 2 we need the following auxiliary lemmata. We start with Poisson's summation formula for bandlimited functions. Since we have not found it directly in the literature we give the proof for convenience.
Proof. By assumption, we have suppf ⊆ [−σ , σ ] for some σ > 0 so that we may identify f as an element in B 1 2πσ . By Theorem 1, we know that for all x ∈ R. This shows that F α is indeed well-defined and bounded. In particular, F α ∈ L ∞ (R/αZ) ⊂ L 1 (R/αZ) and we can compute the Fourier coefficientŝ Interchanging the series and integral in (26) is allowed by the theorem of Fubini-Tonelli since x → ∑ ∈Z | f (x − α )| is uniformly bounded by (25) and thus integrable on [0, α].