Abstract
In this work we consider the problem of identification and reconstruction of doubly-dispersive channel operators which are given by finite linear combinations of time-frequency shifts. Such operators arise as time-varying linear systems, for example in radar and wireless communications. In particular, for information transmission in highly nonstationary environments the channel needs to be estimated quickly with identification signals of short duration, and for vehicular applications simultaneous high-resolution radar is desired as well. We consider the time-continuous setting and prove an exact resampling reformulation of the involved channel operator, when applied to a trigonometric polynomial as identifier, in terms of sparse linear combinations of real-valued atoms. Motivated by recent works of Heckel et al., we present an exact approach for off-the-grid super-resolution which allows the identification to be performed with realizable signals having compact support. Then we show how an alternating descent conditional gradient algorithm can be adapted to solve the reformulated problem. Numerical examples demonstrate the performance of this algorithm, in particular in comparison with a simple adaptive grid refinement strategy and an orthogonal matching pursuit algorithm.
Introduction
Sensing and information retrieval in highly nonstationary environments are challenging inverse problems in radar and sonar applications, and their fundamental understanding is also required for future wireless communication in very rapidly time-varying mobile scenarios. In such problems, the task is to identify or estimate channel parameters in a robust manner by probing the channel with a particular identifier signal w of finite duration, also called a pilot signal. In radar, for example, a known radar waveform is transmitted and, from the received reflections, distance and relative velocity of a target can be obtained by estimating delay and Doppler shifts. Several reflections superimpose at the receiver, hence the core task consists in estimating the multiple time-frequency shifts from finitely many samples of the received signal:
taken within a finite observation interval. Here each triplet \((\eta _s, \tau _s, \nu _s)\) can be interpreted as a particular transmission path with a delay \(\tau _s\) and Doppler shift \(\nu _s\) due to relative distance and velocity, respectively, with a complex-valued attenuation factor \(\eta _s\). This so-called tapped delay-line model is a special case of a doubly-dispersive (or linear time-variant) channel, where the spreading function is a (finite) point measure. For more details on this terminology, see for example the classical works [1, 25]. Intuitively, it is clear that simultaneous accuracy in time and frequency is governed by the uncertainty relation and that the shape of the waveform should fit the time and frequency dispersion of the channel. However, often only a few scatterers affect the wave propagation, and therefore the number of time-frequency shifts is rather small compared to the number of samples one may acquire at the receiver.
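For intuition, the tapped delay-line model above can be simulated directly. The following sketch (function name and test identifier are illustrative, not from the paper) superimposes several delayed and Doppler-shifted copies of a pilot signal w:

```python
import numpy as np

def apply_channel(w, x, etas, taus, nus):
    """Evaluate H w(x) = sum_s eta_s * exp(2*pi*i*nu_s*x) * w(x - tau_s),
    a superposition of delayed (tau_s) and Doppler-shifted (nu_s) copies of w."""
    x = np.asarray(x, dtype=float)
    y = np.zeros(x.shape, dtype=complex)
    for eta, tau, nu in zip(etas, taus, nus):
        y += eta * np.exp(2j * np.pi * nu * x) * w(x - tau)
    return y
```

With a single path of unit gain and zero delay and Doppler shift, the receiver simply observes the pilot itself; each further triplet adds one weighted time-frequency shift.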
In so-called coherent communication the wireless channel needs to be estimated in order to equalize unknown data signals transmitted consecutively or simultaneously with the pilot signal. This principle is used for example in the orthogonal frequency-division multiplexing (OFDM) modulation scheme [7], which is implemented in many of today’s communication technologies like WiFi, LTE and 5G standards, as well as broadcasting systems like DAB and certain DVB standards [28]. Thus, the first goal here is to estimate the action of the channel operator on a particular restricted class of data signals. A channel which is exclusively time- or frequency-selective reduces to a convolution or multiplication operator, and equalization (inverting the action of the channel) is then often possible via conventional deconvolution techniques. In the doubly-selective case, however, more advanced equalization approaches are necessary to deal with self-interference effects. For this purpose the delay-Doppler shifts are usually approximated to lie on a priori fixed lattices, leading to leakage effects [10]. In essence, the intrinsic sparsity of the channel does not carry over to the approximated model, rendering compressed sensing methods like [34, 40] much less effective.
In radar, by contrast, it is important to achieve high resolution on the time-frequency shift parameters themselves. However, in future high-mobility vehicular communication [29] and automotive applications both aspects will become relevant, i.e., discovering the instantaneous neighborhood using radar and simultaneously communicating with other vehicles or roadside units. In particular, combined radar and communication transceivers, which shall simultaneously use the same hardware and frequency band for both tasks, have recently been proposed and investigated in the literature, see, e.g., [32]. However, since the propagation environment in such vehicular applications may also change on a short timescale, and usually in an almost unpredictable manner, it is important to perform channel estimation and radar in short time cycles with short signals. The traffic in automotive applications also enforces strict latency requirements in communication for decoding the equalized data signals.
Besides the practical need for advanced signal processing algorithms in this challenging engineering field, the estimation problem itself has attracted researchers working in harmonic analysis. First works in this field, from the perspective of channel identification, are due to Pfander et al. [35]. Identifying a linear operator with restricted spreading, i.e., with band-limited symbol, has been investigated in [27].
Finally, we would like to mention that there exist other methods for super-resolution such as Prony-like methods [13, 30, 31, 37, 38]. These are spectral methods which perform spike localization from low-frequency measurements. They do not need any discretization and recover the initial signal as long as there are enough observations. So far we have not examined if and how such methods could be applied to our specific modulation-translation setting.
Main contribution The main contribution of this paper is twofold. First, we establish an exact sampling formula for operators which are sparse complex linear combinations of modulation and translation operators
applied to (truncated) trigonometric polynomials w as identifiers. The basic resampling idea goes back to the work of Heckel et al. [23], where the problem of identifying the parameters \(\eta _s\), \(\nu _s\), \(\tau _s\) of the unknown operator H is approximated by a discrete formulation without explicitly accounting for the employed function spaces and by applying an approximate sampling formula. Using trigonometric polynomials as identifiers, we derive an explicit resampling formula for the continuous problem such that we can completely avoid the approximation errors in [23]. By this, we also overcome particular parameter limitations in the original proof, since we do not directly couple the time-bandwidth limitation of the operator and the identifier.
As a second main result we provide explicit algorithmic reconstruction approaches. Our sampling reformulation allows the straightforward application of standard modifications of the conditional gradient method, also known as the Frank-Wolfe algorithm, to determine the amplitudes \(\eta _s \in \mathbb {C}\) and the two-dimensional positions \((\tau _s,\nu _s)\). Here we focus on the alternating descent conditional gradient (ADCG) algorithm proposed by Boyd et al. [2]. The corresponding optimization problem takes noise into account and penalizes the sparsity of the above linear combination by the \(\ell _1\)-norm of the amplitudes. The optimization problem can be rephrased in terms of atomic measures, where the \(\ell _1\)-norm is directly related to the total variation norm of the measure, resp. to the atomic norm of a certain set of atoms. Such problems are known as BLASSO [16]. Besides Frank-Wolfe-like algorithms that minimize the location parameters over a continuous domain, a common approach consists in constraining the locations to lie on a grid. This leads to a finite-dimensional convex optimization problem, known as LASSO [41] or basis pursuit [8], for which there exist numerous solvers [12, 14, 20, 43]. We will compare the ADCG applied to our resampled problem with a grid method, where we incorporate an adaptive grid refinement. As a third group of methods, we would like to mention the reformulation of the optimization problem via its dual into an equivalent finite-dimensional semidefinite program (SDP). This technique was first proposed in [5] and then adapted by many other authors. However, the equivalence of the formulations is only true in the one-dimensional setting, and in higher dimensions one needs to use e.g. the so-called Lasserre hierarchy [16]. An SDP approach for our two-dimensional setting based on results of [18] was also proposed in the paper of Heckel et al. [23].
Since this approach appears to be highly expensive both in time and memory requirements, and moreover has to contend with many unspecific local maxima related to the so-called dual certificate, it is not appropriate for our setting.
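To make the preceding discussion concrete, here is a minimal, hedged sketch of the ADCG template of Boyd et al. [2]: alternately add the atom most correlated with the current residual (conditional gradient step) and refit the amplitudes. The atom function, candidate grid, and plain least-squares refit are illustrative simplifications; the actual algorithm in Sect. 5 additionally performs local descent on the positions and accounts for the \(\ell _1\) penalty.

```python
import numpy as np

def adcg(A, y, grid, n_iter=3):
    """Sketch of alternating descent conditional gradient.
    A(tau, nu) returns a complex atom vector; grid is a 1D candidate grid
    used for both parameters. Local refinement of (tau, nu) is omitted."""
    params = []
    etas = np.zeros(0, dtype=complex)
    for _ in range(n_iter):
        residual = y - sum(e * A(*p) for e, p in zip(etas, params))
        # conditional gradient step: pick the atom most correlated with residual
        _, best = max((abs(np.vdot(A(t, n), residual)), (t, n))
                      for t in grid for n in grid)
        params.append(best)
        # (partial) alternating descent step: refit all amplitudes jointly
        M = np.stack([A(*p) for p in params], axis=1)
        etas, *_ = np.linalg.lstsq(M, y, rcond=None)
    return etas, params
```

On synthetic data whose spikes lie on the candidate grid, a few iterations of this loop already drive the residual to (numerically) zero.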
This paper is organized as follows: In Sect. 2, we collect the basic notation and results from Fourier analysis and measure theory which are needed in the following sections. At the end of the section we establish a theorem which relates trigonometric polynomials to periodic functions arising from the Fourier transform of compactly supported measures. The proof of the theorem is given in Appendix A. In Sect. 3, we formulate our super-resolution problem for doubly-dispersive channel estimation. More precisely, we are interested in the two-dimensional parameter detection of sparse linear combinations of translation-modulation operators. Instead of treating the original problem, we give a sampling reformulation of the involved translation-modulation operators for identifiers which are trigonometric polynomials. Here the relation between these polynomials and Fourier transforms of measures will play a role. Since the identifiers only have to be evaluated at points lying in a compact interval, our choice implies no restriction for practical purposes. In Sect. 4, we prove the sampling theorem for translation-modulation operators applied to trigonometric polynomials. Then, in Sect. 5, we show how an alternating descent conditional gradient algorithm can be applied to solve the reformulated problem. Finally, we demonstrate the performance of this algorithm in comparison with a simple adaptive grid refinement algorithm and an orthogonal matching pursuit method in Sect. 6.
Preliminaries
Function spaces Let I be an open finite interval of \(\mathbb {R}\) or \(\mathbb {R}\) itself. By C(I) we denote the space of complex-valued, continuous functions on I, by \(C_b(I)\) the Banach space of bounded, complex-valued, continuous functions endowed with the norm \(\Vert f\Vert _\infty = \sup _{x \in I} |f(x)|\). Further, let \(C_0(\mathbb {R}) \subset C_b(\mathbb {R})\) be the closed subspace of complex-valued, continuous functions vanishing at infinity. Let \(L^r(I)\), \(r \in [1,\infty ]\), be the Banach space of (equivalence classes) of complex-valued Borel measurable functions with finite norm
For compact I, it holds \(L^1(I) \supset L^r(I) \supset L^s(I) \supset L^\infty (I)\), \(r < s\). The r-norm \(\Vert \cdot \Vert _r\) is defined analogously on higher-dimensional domains, sequences, and vectors.
An entire (holomorphic) function \(f : \mathbb {C}\rightarrow \mathbb {C}\) is of exponential type if there exist positive constants \(a, b > 0\) such that
The exponential type of f is then defined as the number
The Bernstein space \(B_\sigma ^r\), \(r \in [1,\infty ]\), consists of all entire functions f of exponential type \(\sigma \) whose restriction to \(\mathbb {R}\) belongs to \(L^r(\mathbb {R})\). Endowed with the \(L^r\)-norm, \(B_\sigma ^r\) becomes a Banach space, too. We will need the following sampling result of Nikol’skiĭ [33].
Theorem 1
(Nikol’skiĭ’s Inequality [33, Thm 3.3.1]) Let \(r \in [1, \infty ]\). Then, for every \(f \in B_\sigma ^r\) and \(a > 0\), we have
Fourier transform of functions The Fourier transform \({\mathscr {F}}: L^1(\mathbb {R}) \rightarrow C_0 (\mathbb {R}) \subset L^\infty (\mathbb {R})\) defined by
is a bounded linear operator. For \(1 < r \le 2\), this operator can be extended as \({\mathscr {F}}: L^r(\mathbb {R}) \rightarrow L^s (\mathbb {R})\), \(\frac{1}{r} + \frac{1}{s} = 1\) via the limit in the norm of \(L^s(\mathbb {R})\) of
By Plancherel’s equality, the Fourier transform is an isometry on \(L^2(\mathbb {R})\). Note that the Fourier transform of a function \(f \in L^r(\mathbb {R})\) with \(r > 2\) can be defined in terms of tempered distributions. However, the distributional Fourier transform \({\hat{f}}\) does in general not correspond to a function. A special role is played by the sinus cardinalis defined as
The sinc function is in \(L^2(\mathbb {R})\) but not in \(L^1(\mathbb {R})\). Further, we have
where \(\chi _{C}\) denotes the characteristic function of a set \(C \subseteq \mathbb {R}\), i.e., \(\chi _{C}(x) = 1\) if \(x \in C\) and \(\chi _{C}(x) = 0\) if \(x \notin C\). The counterpart of the scaled sinc functions in the periodic setting are the Nth Dirichlet kernels given by
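Since the displayed formula is not reproduced here, the following sketch assumes the common 1-periodic convention \(D_N(x) = \sum _{n=-N}^{N} \mathrm {e}^{2\pi \mathrm {i}n x}\); it evaluates the kernel via its closed form and can be checked against the direct sum:

```python
import numpy as np

def dirichlet(x, N):
    """N-th Dirichlet kernel D_N(x) = sum_{n=-N}^{N} exp(2*pi*i*n*x)
    = sin((2N+1)*pi*x) / sin(pi*x), continued by its limit 2N+1 at integers."""
    x = np.asarray(x, dtype=float)
    den = np.sin(np.pi * x)
    num = np.sin((2 * N + 1) * np.pi * x)
    safe = np.where(np.isclose(den, 0.0), 1.0, den)
    return np.where(np.isclose(den, 0.0), float(2 * N + 1), num / safe)
```

Like the scaled sinc in the non-periodic setting, the kernel peaks with value \(2N+1\) at the integers and is 1-periodic.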
For arbitrary \(f\in L^1(\mathbb {R})\) with \({\hat{f}}\in L^1(\mathbb {R})\), the Fourier inversion formula
holds true almost everywhere and, moreover, pointwise if the function f is continuous. For two functions \(f \in L^1(\mathbb {R})\) and \(g \in L^r(\mathbb {R})\), \(r\in [1,\infty ]\), the convolution \(f*g\) is defined almost everywhere by
and is contained in \(L^r(\mathbb {R})\). For \(r \in [1,2]\), the relation between convolution and Fourier transform is given by \(\widehat{f*g} = {\hat{f}} \, {\hat{g}}\).
For \(\sigma >0\) and \(r \in [1,\infty ]\), we denote by \(\mathrm {PW}_\sigma ^r\) the PaleyWiener class of functions \(f : \mathbb {C} \rightarrow \mathbb {C}\) of the form
for some \(g \in L^r(-\sigma ,\sigma )\). We have the inclusion \(\mathrm {PW}_\sigma ^r \subset \mathrm {PW}_\sigma ^s\) for \(1 \le s < r\). Functions of the class \(\mathrm {PW}_\sigma ^r\) are holomorphic and of exponential type \(2 \pi \sigma \) by
For \(r \in [1,2]\), we further have \(\mathrm {PW}_{\sigma }^r\subset B^s_{2 \pi \sigma }\) with \(\frac{1}{r} + \frac{1}{s} = 1\), see [24].
Measure spaces Let X be a compact subset of \(\mathbb {R}^d\) or \(\mathbb {R}^d\) itself. By \({\mathscr {M}}(X)\) we denote all regular, finite, complex-valued measures, i.e., all mappings \(\mu : {\mathscr {B}}(X) \rightarrow \mathbb {C}\) from the Borel \(\sigma \)-algebra of X to \(\mathbb {C}\) with \(|\mu |(X) < \infty \) and
for any sequence \(\{B_k\}_{k \in \mathbb {N}} \subset {\mathscr {B}}(X)\) of pairwise disjoint sets. We suppose that the series on the righthand side converges absolutely, so that the indices of the sets \(B_k\) can be arbitrarily reordered. The support of a complex measure \(\mu \in {\mathscr {M}}(X)\) is defined by
where \(\rho ^+ - \rho ^- = \mathfrak {R}(\mu )\) and \(\iota ^+ - \iota ^- = \mathfrak {I}(\mu )\) are the Hahn decompositions of the real and imaginary part into nonnegative measures. The support of a nonnegative measure \(\nu \) is the closed set
The total variation of a measure \(\mu \in \mathscr {M}(X)\) is defined by
With the norm \(\Vert \mu \Vert _{{\mathscr {M}}(X)} :=|\mu |(X)\), the space \({\mathscr {M}}(X)\) becomes a Banach space. The space \({\mathscr {M}}(X)\) can be identified via Riesz’s representation theorem with the dual space of \(C_0(X)\), and the weak-\(*\) topology on \({\mathscr {M}}(X)\) gives rise to the weak convergence of measures.
We will need that, for a bounded Borelmeasurable function g, the measure \(g \mu \) defined by \(g \mu (B) := \int _B g(x) \,\mathrm {d}\mu (x)\) for open \(B \subset \mathbb {R}^d\) is again in \({\mathscr {M}}(\mathbb {R}^d)\) and \(\Vert g \mu \Vert _{{\mathscr {M}}(X)} \le \Vert g\Vert _\infty \Vert \mu \Vert _{{\mathscr {M}}(X)}\).
Fourier transform of measures For our purposes, it is enough to consider the Fourier transform of measures on \(X = \mathbb {R}\). If we consider the open balls \(B_R :=\{x : |x| < R\}\) of radius \(R > 0\), then
Indeed, the integral with respect to a measure \(\mu \in {\mathscr {M}}(\mathbb {R})\) is also well defined for every \(\varphi \in C_b(\mathbb {R})\) and
Consequently, we can define the Fourier transform \({\mathscr {F}} :{\mathscr {M}}(\mathbb {R}) \rightarrow C_{b}(\mathbb {R})\) by
The Fourier transform is a linear, bounded operator from \({\mathscr {M}}(\mathbb {R})\) into \(C_b(\mathbb {R})\) with operator norm one. Moreover, it is unique in the sense that \(\mu \in {\mathscr {M}}(\mathbb {R})\) with \({\hat{\mu }} \equiv 0\) implies that \(\mu \) is the zero measure. We are especially interested in the Fourier transform of atomic measures \(\mu :=\sum _{k\in \mathbb {Z}} c_k \delta (\cdot - t_k)\) with \(c_k \in \mathbb {C}\), \(t_k \in \mathbb {R}\) given by
If the point masses are equispaced, located at \(t_k = \frac{k}{K}\) with \(K \in \mathbb {N}\), the Fourier transform becomes a K-periodic Fourier series. Moreover, restricting the support of \(\mu \) to \([-\sigma , \sigma ]\), we obtain the K-periodic trigonometric polynomial
where \(N = \lfloor \sigma K \rfloor \) and \(t_k = \frac{k}{K}\), \(k=-N,\ldots ,N\). The following theorem shows that the reverse direction is also true, i.e., every periodic function given as the Fourier transform of a compactly supported measure is a finite trigonometric polynomial.
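The stated K-periodicity can be verified numerically. The snippet below evaluates the Fourier transform of an equispaced atomic measure; the sign convention \(\mathrm {e}^{-2\pi \mathrm {i}t\xi }\) in the forward transform is an assumption, as the displayed definition is not reproduced here:

```python
import numpy as np

def mu_hat(xi, c, K):
    """Fourier transform of mu = sum_{k=-N}^{N} c_k delta(. - k/K), i.e. a
    K-periodic trigonometric polynomial (sign convention e^{-2 pi i t xi})."""
    N = (len(c) - 1) // 2
    t = np.arange(-N, N + 1) / K          # equispaced point masses t_k = k/K
    return sum(ck * np.exp(-2j * np.pi * tk * xi) for ck, tk in zip(c, t))
```

Shifting the argument by K multiplies every term by \(\mathrm {e}^{-2\pi \mathrm {i}k} = 1\), which is exactly the claimed K-periodicity.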
Theorem 2
Let \(f = {\hat{\mu }}_f\) with \(\mu _f \in {\mathscr {M}}(\mathbb {R})\) fulfill \({{\,\mathrm{\mathrm {supp}}\,}}\mu _f \subseteq [-\sigma , \sigma ]\) for some \(\sigma > 0\). Suppose that f is K-periodic for \(K \in \mathbb {N}\). Then f is a trigonometric polynomial of the form
where \(N = \lfloor \sigma K \rfloor \).
The proof of the theorem is given in Appendix A.
Superresolution in doublydispersive channel estimation
In doubly-dispersive channel estimation we are interested in the detection of both shifts and modulations of signals. Recall that the shift operator \(T_\tau \) and the modulation operator \(M_\nu \) are defined for \(x, \tau , \nu \in \mathbb {R}\) by
respectively. Their concatenation is given by
Similarly, for \(f \in L^r(\mathbb {R})\) with \(r \in [1,2]\), it holds
Both operators are unitary on \(L^2(\mathbb {R})\). Note that a similar definition of shifts and modulations can be given for tempered distributions, see, e.g., [36, Section 4.3.1]. For \(S \in \mathbb {N}\) and \({\mathscr {T}}, \varOmega >0\), we consider the operator
with \(\mathbb {C}_* := \mathbb {C}\setminus \{0\}\). We are interested in the following superresolution problem: for a known function \(w \in C_b(\mathbb {R})\), determine the amplitudes \(\eta _s \in \mathbb {C}_*\) and positions \(\tau _s , \nu _s \in \mathbb {R}\), \(s=1,\ldots ,S\) from certain samples of
In this context, the function w is often called identifier.
Our solution will be based on an exact sampling formula for Hw which contains sparse linear combinations of certain real-valued “atoms”. The idea to use such a reformulation for later computations originates from a paper of Heckel et al. [23]. However, the approach of those authors uses only an approximate sampling formula without a given error bound, and not an exact one, see Remark 2. The main sampling result is given in the following theorem.
Theorem 3
(Sampling Formula for TranslationModulation Operators) Choose \({\mathscr {T}}, \varOmega > 0\), \(N_1, N_2 \in \mathbb {N}\) and set \(L_1 :=2N_1 + 1\), \(L_2 :=2N_2 + 1\). Let
be an \(\frac{L_1}{\varOmega }\)-periodic trigonometric polynomial. Then, we have for \(\tau , \nu \in \mathbb {R}\) and \(x_j = \frac{{\mathscr {T}}j}{L_2}\), \(j = -N_2, \dots , N_2\) that
with so-called atoms \(A : \mathbb {R}^2 \rightarrow \mathbb {C}^{L_1 L_2}\) given by
where \((n_1,n_2)\) denotes the corresponding, unique index in \(\mathbb {C}^{L_1 L_2}\) for \(n_1 = -N_1,\ldots ,N_1\), \(n_2 = -N_2, \ldots ,N_2\).
Figuratively, an atom \(A(\tau ,\nu )\) may be interpreted as a vectorized \(L_1\times L_2\)-dimensional matrix. The proof of Theorem 3 is the content of the next section.
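The flattening of the index pair \((n_1,n_2)\) into a single index of \(\mathbb {C}^{L_1 L_2}\) can be made explicit; the row-major ordering below is an assumption, as any fixed bijection serves the purpose:

```python
import numpy as np

N1, N2 = 2, 3                       # so L1 = 2*N1 + 1 = 5 and L2 = 2*N2 + 1 = 7
L1, L2 = 2 * N1 + 1, 2 * N2 + 1

def flat_index(n1, n2):
    """Bijection from (n1, n2), n1 = -N1..N1 and n2 = -N2..N2, onto
    0, ..., L1*L2 - 1 (row-major ordering assumed); used to address the
    entries of the vectorized L1-by-L2 atom matrices."""
    return (n1 + N1) * L2 + (n2 + N2)
```

Every pair hits a distinct slot, so the \(L_1 L_2\) entries of an atom are enumerated exactly once.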
By Theorem 3, we can rewrite the super-resolution problem (3) with an identifier of the form (4) for given samples \(y_j = H w(x_j)\), \( x_j = \frac{{\mathscr {T}}j}{L_2}\), \(j=-N_2,\ldots ,N_2\) as
By the periodicity of the atoms (6), it indeed makes sense to restrict ourselves to
and to choose \(L_k \ge {\mathscr {T}} \varOmega \), \(k=1,2\). In this case, all points \(x_j - \frac{n_1}{\varOmega }\) at which the periodic identifier w in (7) must be evaluated belong to the interval \(I = (-{\mathscr {T}},{\mathscr {T}})\).
In practice, we would like to use a compactly supported identifier, whereas our theory is based on periodic identifiers. Since only the function values w(x) with \(x \in (-{\mathscr {T}},{\mathscr {T}})\) are involved in the sampling process of Hw, we may theoretically replace the periodic identifier w by the compactly supported and partially periodic function \(\chi _I\, w\) without changing the obtained samples. Consequently, we may apply the resampling formula to identify a doubly-dispersive channel H using compactly supported and partially periodic signals like \(\chi _I \, w\), which links our theory to the real-world setting.
Setting \( y :=(y_j)_{j=-N_2}^{N_2}\) and introducing the operator
with entries
where \((n_1,n_2)\) is again the corresponding index in \(\mathbb {C}^{L_1L_2}\) as for the atoms, we can rewrite the super-resolution problem (7) as
In practical applications, the measurements y are often corrupted by noise so that we finally intend to solve the regularized problem
where \(\eta = (\eta _s)_{s=1}^S\) and \(\tau = (\tau _s)_{s=1}^S\), \(\nu = (\nu _s)_{s=1}^S\). Indeed, we may choose S larger than the number of expected translationmodulations, minimize over \(\eta \in \mathbb {C}^S\), and hope that the regularization term enforces the sparsest solution. Especially in the numerics, we allow \(\eta _s\) to become zero; the captured triple \((\eta _s, \tau _s, \nu _s)\) with \(\eta _s = 0\) may then be neglected.
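The regularized objective can be sketched as follows; the exact weighting of the fidelity term and the atom function are assumptions, since the displayed formula is not reproduced here:

```python
import numpy as np

def objective(etas, taus, nus, A, y, lam):
    """Squared data-fidelity term plus lambda times the l1 norm of the
    amplitudes; the 1/2 weighting of the fidelity term is an assumption."""
    model = sum(e * A(t, n) for e, t, n in zip(etas, taus, nus))
    return 0.5 * np.linalg.norm(y - model) ** 2 + lam * np.sum(np.abs(etas))
```

Triples with \(\eta _s = 0\) contribute neither to the model nor to the penalty, which matches the convention above of simply neglecting them.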
Remark 1
The above problem is closely related to an inverse problem in the space of measures. To this end, we consider the linear, continuous operator \({\mathbf {A}}: \mathbb {C}^{L_1L_2} \rightarrow C(X)\) defined by \({\mathbf {A}} y := \{ (\tau ,\nu ) \mapsto \langle A(\tau ,\nu ) , y \rangle \}\) for \(y \in \mathbb {C}^{L_1L_2}\). Its adjoint \({\mathbf {A}}^*: {\mathscr {M}}(X) \rightarrow \mathbb {C}^{L_1L_2}\) is given by
Then, we may consider the inverse problem
Problems of this kind are also known as BLASSO [5, 16] and were studied in several papers, e.g., by Bredies and Pikkarainen [3] and Denoyelle et al. [17]. In particular, it was shown that the problem has a solution. Since \(G {\mathbf {A}}^*\) is not injective, the solution is in general not unique. Restricted to atomic measures in \({\mathscr {M}}(X)\), i.e., \(\mu = \sum _{s=1}^S \eta _s \delta \left( \cdot - (\tau _s,\nu _s) \right) \), problem (9) takes the form (8).
The super-resolution problem may also be seen from the point of view of the so-called atomic norm formulation addressed in a couple of papers [6, 9, 17, 19, 39]. Since \(\eta _s = |\eta _s| \mathrm {e}^{2\pi \mathrm {i}\phi _s}\) is complex-valued, the set of atoms must be redefined as \(\{ \mathrm {e}^{2\pi \mathrm {i}\phi } A(\tau ,\nu ): \phi \in [0,1), (\tau ,\nu ) \in X\}\) in order to take real linear combinations of atoms.
As mentioned, the super-resolution problem (3) has already been considered by Heckel et al. [23]. However, these authors proposed to use a different identifier, an issue addressed in the next remark.
Remark 2
(Relation to the work of Heckel et al. [23]) The authors of [23] considered the case \(N_1=N_2=N\) and \(L_1 = L_2 = L := {\mathscr {T}}\varOmega \), so that the resampling formula (7) becomes
However, as identifier they propose
with some \(K \in \mathbb {N}\). Actually, \(K=1\) was applied in [23]. Since the sinc function is not periodic, the resampling formula (10) does not hold exactly and only gives an approximation.
Resampling results for translationmodulation operators
In this section, we prove Theorem 3. The basis is the sampling theorem for \(L^1\)-functions (Theorem 4). Then we prove certain sampling formulas which are of interest in their own right. First, in Lemma 2, we show a sampling formula for \(p\,H ({\hat{q}} * w)\), where p, q are compactly supported functions with Fourier transform in \(L^1(\mathbb {R})\), for general \(w \in L^\infty (\mathbb {R})\), using certain compactly supported helper functions \(\phi \) and \(\psi \). Restricting to identifiers w which are Fourier transforms of measures, we will see in Theorem 5 that the helper functions can be avoided. Finally, we will use this theorem together with approximation arguments involving sequences of compactly supported Schwartz functions \(\{p_n\}_n\) and \(\{q_n\}_n\) to prove Theorem 3. We start by recalling a sampling theorem for \(L^1\)-functions, which extends the classical sampling theorem of Shannon, Whittaker, and Kotelnikov, see for instance [36, Thm 2.29], by the \(L^1\)-convergence of the interpolation formula for \(L^1\) sampling functions.
Theorem 4
(Sampling Theorem for \(L^1\)-functions) Let \(f \in L^1(\mathbb {R}) \cap C_0(\mathbb {R})\) be a bandlimited function with \({{\,\mathrm{\mathrm {supp}}\,}}{\hat{f}} \subseteq [-\frac{\varOmega }{2}, \frac{\varOmega }{2}]\). Choose \(0< a < 1/\varOmega \). Then for any lowpass kernel \(\phi \in L^1(\mathbb {R}) \cap C_0(\mathbb {R})\) satisfying
we have
for all \(x \in \mathbb {R}\) with absolute and uniform convergence on \(\mathbb {R}\) and convergence in \(L^1(\mathbb {R})\).
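A numerical illustration of such an oversampled reconstruction: the normalization \(f(x) = a \sum _{k} f(ak)\,\phi (x-ak)\) and the concrete trapezoidal low-pass kernel below are assumptions consistent with the standard oversampling argument, not taken from the (unreproduced) display.

```python
import numpy as np

# f(x) = sinc(x)^2 has f_hat = triangle on [-1, 1], so Omega = 2; pick a = 0.4 < 1/Omega.
f = lambda x: np.sinc(x) ** 2
# Low-pass kernel phi: phi_hat is a trapezoid, equal to 1 on [-1, 1] and
# supported in [-1.25, 1.25]; in time this is a product of two sinc factors.
phi = lambda x: 2.25 * np.sinc(2.25 * x) * np.sinc(0.25 * x)

def reconstruct(x, a=0.4, K=500):
    """Truncated oversampled reconstruction f(x) = a * sum_k f(a*k) * phi(x - a*k)."""
    k = np.arange(-K, K + 1)
    return a * np.sum(f(a * k) * phi(x - a * k))
```

Because both f and the kernel decay quadratically, truncating the series at |k| = 500 already reproduces f to high accuracy, in contrast to the slowly decaying sinc kernel of the classical theorem.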
For convenience, the proof is given in Appendix B. In the classical sampling theorem of Shannon, Whittaker, and Kotelnikov, the function \(\phi \) is the sinus cardinalis, which however prevents the convergence in \(L^1\). In the following, we will further need the next auxiliary lemma.
Lemma 1
Let \(w \in L^\infty (\mathbb {R})\) and \(p, q \in L^1(\mathbb {R})\) with \({\hat{p}}, {\hat{q}} \in L^1(\mathbb {R})\). For \(F \in L^1(\mathbb {R}^2)\), we define the linear operator \({\mathscr {D}}_w : L^1(\mathbb {R}^2) \rightarrow L^\infty (\mathbb {R})\) by
Then \({\mathscr {D}}_w\) is continuous and for all \(\tau , \nu \in \mathbb {R}\) we have
Proof
For any \(x \in \mathbb {R}\), we have
Thus \(\Vert {\mathscr {D}}_w\Vert _{L^1(\mathbb {R}^2) \rightarrow L^\infty (\mathbb {R})} \le \Vert w\Vert _{\infty }\) and the first claim follows.
For the left-hand side of (11) we have by Young’s convolution inequality, see [36], that
Since \({\hat{p}} \in L^1(\mathbb {R})\), we know that \(p \in L^\infty (\mathbb {R})\). This implies \(p \, M_\nu T_\tau ({\hat{q}} *w) \in L^\infty (\mathbb {R})\). Using that \(p(x) = \int _\mathbb {R}{\hat{p}}(\xi ) \mathrm {e}^{2 \pi \mathrm {i}\xi x} \,\mathrm {d}\xi \) a.e., we obtain for almost every \(x \in \mathbb {R}\) that
\(\square \)
We use the above lemma to show the following intermediate sampling formula.
Lemma 2
Let H be given by (2). Let \(w \in L^\infty (\mathbb {R})\) and \(p, q \in L^1(\mathbb {R}) \cap C_0(\mathbb {R})\) with \({\hat{p}}, {\hat{q}} \in L^1(\mathbb {R})\) and \({{\,\mathrm{\mathrm {supp}}\,}}p \subseteq [-\frac{{\mathscr {T}}_p}{2}, \frac{{\mathscr {T}}_p}{2}]\) as well as \({{\,\mathrm{\mathrm {supp}}\,}}q \subseteq [-\frac{\varOmega _q}{2}, \frac{\varOmega _q}{2}]\). Choose stepsizes \(0< a < 1 / \varOmega _q\) and \(0< b < 1 / {\mathscr {T}}_p\). Then for any \(\phi , \psi \in L^1(\mathbb {R}) \cap C_0(\mathbb {R})\) with \({\hat{\phi }}, {\hat{\psi }} \in L^1(\mathbb {R})\) obeying
we have
for all \(x \in \mathbb {R}\), where
The series on the right side of (12) converges uniformly on \(\mathbb {R}\).
Proof
By linearity it suffices to consider the case \(H = M_\nu T_\tau \). Since \(p,q \in L^1(\mathbb {R})\), we have \({\hat{p}},{\hat{q}} \in C_0(\mathbb {R})\) so that \(F :=T_\tau {\hat{q}} \otimes T_\nu {\hat{p}} \in L^1(\mathbb {R}^2) \cap C_0(\mathbb {R}^2)\). Moreover, by the support properties of p and q, we get \({{\,\mathrm{\mathrm {supp}}\,}}{\hat{F}} \subset [-\tfrac{\varOmega _q}{2}, \tfrac{\varOmega _q}{2}] \times [-\tfrac{{\mathscr {T}}_p}{2}, \tfrac{{\mathscr {T}}_p}{2}]\). Consequently, we can apply Theorem 4 to F along each dimension w.r.t. the stepsizes a and b and lowpass kernels \({\hat{\phi }}\) and \( {\hat{\psi }}\) to obtain
which converges absolutely and uniformly. For the \(L^1\)convergence, we have to show that
vanishes for \(K \rightarrow \infty \), which follows for both integrals as discussed in the proof of Theorem 4.
As the operator \({\mathscr {D}}_w : L^1(\mathbb {R}^2) \rightarrow L^\infty (\mathbb {R})\) defined in Lemma 1 is continuous we conclude
By applying Lemma 1 once again, we obtain
Consequently we get for almost every \(x \in \mathbb {R}\) that
Note that by Theorem 1 the sequences \(\bigl ({\hat{q}}(ak_1 - \tau )\bigr )_{k_1 \in \mathbb {Z}}\) and \(\bigl ({\hat{p}}(b k_2 - \nu )\bigr )_{k_2 \in \mathbb {Z}}\) are absolutely summable. The functions \(M_{bk_2}T_{ak_1} ({\hat{\psi }} *w)\) are bounded by
Thus, the series (13) converges uniformly on \(\mathbb {R}\) and, since the partial sums in (13) are continuous functions, we conclude that the series converges to a continuous bounded function. As p and \({\hat{q}} *w\) are also continuous and bounded, we see that (12) holds for all \(x \in \mathbb {R}\). \(\square \)
Although Lemma 2 works for arbitrary bounded identifiers \(w \in L^\infty (\mathbb {R})\), the fact that the left side of (12) does not depend on \(\phi \) and \(\psi \) suggests that there might be a way to avoid the use of these functions. For this purpose, we restrict our attention to a subset of \(L^\infty (\mathbb {R})\), namely functions \(f = {\hat{\mu }}_f\) with \(\mu _f \in {\mathscr {M}}(\mathbb {R})\). Having the Fourier convolution theorem in mind, for a Borel measurable, bounded function \(\phi \), we define the convolution
which yields a continuous and bounded function. If \(\phi \in L^1(\mathbb {R}) \cap C_0(\mathbb {R})\) such that \({\hat{\phi }} \in L^1(\mathbb {R})\), then our convolution may be expressed by the Fourier convolution as
We have the following convergence result.
Lemma 3
Let \(f = {\hat{\mu }}_f\) with \(\mu _f \in {\mathscr {M}}(\mathbb {R})\) and let g be a bounded Borelmeasurable function. Assume that the uniformly bounded and Borel measurable functions \(g_m : \mathbb {R}\rightarrow \mathbb {C}\) converge pointwise to \(g : \mathbb {R}\rightarrow \mathbb {C}\). Then \(g_m \star _{{\mathscr {F}}} f\) converges uniformly to \(g \star _{{\mathscr {F}}} f\), i.e.,
Proof
Applying Fatou’s lemma, we obtain
The lemma of Fatou is applicable since \(\Vert g  g_m\Vert _\infty \le 2M\) for some \(M > 0\) and constant functions are integrable w.r.t. \(\mu _f \in {\mathscr {M}}(\mathbb {R})\). \(\square \)
Theorem 5
Let H be given by (2). Let \(w = {\hat{\mu }}_w\) with \(\mu _w \in {\mathscr {M}}(\mathbb {R})\) and \(p, q \in L^1(\mathbb {R}) \cap C_0(\mathbb {R})\) with \({\hat{p}}, {\hat{q}} \in L^1(\mathbb {R})\) and \({{\,\mathrm{\mathrm {supp}}\,}}p \subseteq [-\frac{{\mathscr {T}}_p}{2}, \frac{{\mathscr {T}}_p}{2}]\) and \({{\,\mathrm{\mathrm {supp}}\,}}q \subseteq [-\frac{\varOmega _q}{2}, \frac{\varOmega _q}{2}]\). Choose \(0< a < 1 /\varOmega _q\) and \(0< b < 1 / {\mathscr {T}}_p\). Then, for all \(x \in \mathbb {R}\), we have
where
The series on the righthand side of (14) converges uniformly on \(\mathbb {R}\).
Proof
Let \((\psi _m)_{m \in \mathbb {N}}\) and \((\phi _m)_{m\in \mathbb {N}}\) be uniformly bounded sequences of Schwartz functions with
for all \(m \in \mathbb {N}\) which converge for \(m\rightarrow \infty \) pointwise as
Abbreviating \(y :=p H ({\hat{q}} *w)\), we obtain by Theorem 2 that
Note that neither y(x) nor \(c_{k_1,k_2}\) depend on \(m_1\) or \(m_2\). Letting \(m_1 \rightarrow \infty \), we immediately obtain the pointwise limit
Now consider the series: we already used in the proof of Theorem 2 that, by Theorem 1, the coefficients \((c_{k_1,k_2})_{k_1,k_2 \in \mathbb {Z}} \in \ell ^1(\mathbb {Z}^2)\) are absolutely summable. Moreover, writing \(\phi :=a \chi _{(-\frac{1}{2a}, \frac{1}{2a})}\), we know by construction that \(\phi _{m_2}(x) \rightarrow \phi (x)\) as \(m_2 \rightarrow \infty \) for every \(x \in \mathbb {R}\) and that \((\phi _{m_2})_{m_2 \in \mathbb {N}}\) is uniformly bounded. We can therefore apply Lemma 3 to obtain
Since we have \({\hat{\phi }}_{m_2} *w = \phi _{m_2} \star _{{\mathscr {F}}} w\) for all \(m_2 \in \mathbb {N}\), we estimate
Letting \(m_2 \rightarrow \infty \), the right side converges to 0, which proves that
for all \(x\in \mathbb {R}\), which is equivalent to (14).
The uniform convergence of the series follows immediately from \((c_{k_1,k_2})_{k_1,k_2 \in \mathbb {Z}} \in \ell ^1(\mathbb {Z}^2)\) and \(\chi _{(-\frac{1}{2a}, \frac{1}{2a})} \star _{{\mathscr {F}}} w \in C_b(\mathbb {R})\).\(\square \)
Now we can prove our main theorem.
Proof (Theorem 3)
1. Since \(|\frac{n\varOmega }{L_1}| \le \frac{(L_1-1)\varOmega }{2L_1}\) for \(n = -N_1, \dots , N_1\) in the representation (4) of the identifier w, we see that \({{\,\mathrm{\mathrm {supp}}\,}}\mu _w \subset [-\frac{L_1-1}{2L_1}\varOmega , \frac{L_1-1}{2L_1}\varOmega ]\). Choose \(\max \{\frac{L_1-1}{L_1}, \frac{L_2-1}{L_2} \}< \beta < 1\) and let \((\gamma _m)_{m\in \mathbb {N}}\) and \((\lambda _m)_{m \in \mathbb {N}}\) be sequences of positive numbers with \(1 < \gamma _m\), \(\beta< \lambda _m < 1\) and \(\gamma _m \lambda _m < 1\) for all \(m \in \mathbb {N}\) that converge to 1 as \(m \rightarrow \infty \). Then, for \(m \in \mathbb {N}\), define
as well as the functions
Clearly, we have for all \(m \in \mathbb {N}\) that \(w_m= {\hat{\mu }}_{w_m}\), where \(\mu _{w_m} \in {\mathscr {M}}(\mathbb {R})\) fulfills
Further, the function \(w_m\) is \(a_m L_1\)periodic. Let \((p_m)_{m \in \mathbb {N}}, (q_m)_{m \in \mathbb {N}}\) be sequences of Schwartz functions with
We consider the signal
Now \(p_m, q_m\) as well as \(a_m = \frac{ \lambda _m}{\varOmega } < \frac{1}{\varOmega _m}\) and \(b_m = \frac{ \lambda _m}{ {\mathscr {T}}} < \frac{1}{{\mathscr {T}}_m}\) satisfy the assumptions of Theorem 5. Hence we get
with \(c_{m, k_1, k_2} :={\hat{q}}_m(a_m k_1 - \tau ) {\hat{p}}_m(b_m k_2 - \nu )\) for \(k_1, k_2 \in \mathbb {Z}\).
Since \(\frac{1}{a_m} = \frac{\varOmega }{\lambda _m} > \varOmega \), it follows that \({{\,\mathrm{\mathrm {supp}}\,}}\mu _{w_m} \subset (-\frac{\varOmega }{2}, \frac{\varOmega }{2}) \subset (-\frac{1}{2a_m}, \frac{1}{2a_m})\). Therefore we have for all \(x \in \mathbb {R}\) and \(m \in \mathbb {N}\) that
Thus for \(|x| < \frac{1}{2b_m}\) we can simplify (15) to
2. For \(j = -N_2, \dots , N_2\), we consider
Since \({\hat{p}}_m, {\hat{q}}_m\) are Schwartz functions, we know that \((c_{m,k_1,k_2})_{k_1,k_2\in \mathbb {Z}} \in \ell ^1(\mathbb {Z}^2)\). Further, \(w_m\) is bounded, so that the series in (16) converges absolutely. Consequently, we can rearrange the summation and use the substitution \(k_1 = \ell _1 L_1 + n_1\) and \(k_2 = \ell _2 L_2 + n_2\) for \(\ell _1, \ell _2 \in \mathbb {Z}\) and \(n_1 = -N_1, \dots , N_1\) as well as \(n_2 = -N_2, \dots , N_2\) to obtain
where in the last line we abbreviate
We can significantly simplify (18) via Poisson’s summation formula: Indeed, \({\hat{q}}_m, {\hat{p}}_m\) are bandlimited, integrable functions, so by Lemma 4 we obtain
and
We used that \(q_m(\frac{\ell _1}{a_m L_1}) = 0\) if \(|\ell _1| \ge \frac{L_1}{2}\) since this implies \(|\frac{\ell _1}{a_mL_1}| \ge \frac{1}{2a_m} > \frac{\varOmega _m}{2}\), and also that \(p_m(\frac{\ell _2}{b_mL_2}) = 0\) if \(|\ell _2| \ge \frac{L_2}{2}\) because then \(|\frac{\ell _2}{b_mL_2}| \ge \frac{1}{2b_m} > \frac{{\mathscr {T}}_m}{2}\).
3. Finally, we take limits. By continuity of w, it is easy to compute
Now consider the limits of \(Q_{m, n_1}(\tau )\) and \(P_{m, n_2}(\nu )\). It follows from \(a_m \varOmega = \lambda _m> \beta > \frac{2N_1}{L_1}\) that \(|\frac{\ell _1}{a_mL_1}| \le \frac{N_1}{a_mL_1} < \frac{\varOmega }{2}\) for \(\ell _1 = -N_1, \dots , N_1\), which in turn implies \(q_m(\frac{\ell _1}{a_mL_1}) = 1\). Similarly, since \(b_m {\mathscr {T}}= \lambda _m> \beta > \frac{2N_2}{L_2}\), we have \(|\frac{\ell _2}{b_mL_2}| \le \frac{N_2}{b_mL_2} < \frac{{\mathscr {T}}}{2}\) and thus \(p_m(\frac{\ell _2}{b_mL_2}) = 1\) for all \(\ell _2 = -N_2, \dots , N_2\). Consequently, it follows
and by an analogous computation,
Therefore taking the limit of (17) yields
Next we consider the limit of the definition of \(y_m(\frac{j}{b_m L_2})\), i.e.,
Using the assumptions on \(q_m\) we obtain
for all \(m\in \mathbb {N}\), so that (20) can be written as
We already showed above that \(p_m(\frac{j}{b_m L_2}) = 1\) for \(j = -N_2, \dots , N_2\) and all \(m\in \mathbb {N}\). Then it follows from continuity that
Numerical algorithms
In this section, we propose to solve problem (8), i.e.,
by two kinds of algorithms. We adapt the alternating descent conditional gradient (ADCG) method from [2] to our setting in Sect. 5.2. We will address the theoretical convergence behaviour in a forthcoming manuscript and refer only to the literature here. For numerical comparisons, we start with a simple grid refinement algorithm in Sect. 5.1.
Multilevel time–frequency refinement algorithm
Instead of solving the optimization problem over the continuous set \(X= [-{\mathscr {T}}/2, {\mathscr {T}}/2] \times [-\varOmega /2, \varOmega /2]\), we may discretize X on a grid \({\mathscr {J}}\) of cardinality J. For instance, we could choose an equidistant grid. Then we consider the atoms on the grid points \((\tau _j, \nu _j)\), \(j \in {\mathscr {J}}\). Setting
and \(\eta \in \mathbb {C}^J\), we reduce (9) to the convex minimization problem
The sparsity of the discrete measure is here promoted by the \(\ell ^1\)-norm. In other words, we hope that \(\eta \) has only \(S \ll J\) entries that are not near zero. For one-dimensional problems on the torus, Duval and Peyré [19] showed that, under certain assumptions, the discretized problem \(\varGamma \)-converges to the continuous problem in the sense of Remark 1 as the regular grid gets finer and finer; so if the grid is fine enough, we should obtain a sufficiently precise solution. On the other hand, a fine grid blows up the problem dimension and makes it numerically intractable. Further, as described in [17] and references therein for general total variation minimization problems, the true point masses are usually approximated by several point masses of the grid in a small neighbourhood. These clusters may be detected and replaced by an averaged point mass. Finally, the minimization problem (22) is a basis pursuit often encountered in compressed sensing and can be solved using toolboxes like CVX [22] or, approximately, by greedy methods like matching pursuits [4, 15, 42].
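The discretized problem (22) is a complex-valued basis pursuit denoising (LASSO) problem. A minimal sketch of one standard solver, proximal gradient descent (ISTA) with complex soft-thresholding, is given below; the matrix `A` stands in for the sampled atoms on the grid points, and all names are illustrative rather than the paper's implementation:

```python
import numpy as np

def soft_threshold(z, t):
    """Complex soft-thresholding: shrink the magnitudes, keep the phases."""
    mag = np.abs(z)
    return np.where(mag > t, (1 - t / np.maximum(mag, 1e-30)) * z, 0)

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5*||A @ eta - y||^2 + lam*||eta||_1 over complex eta."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    eta = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ eta - y)
        eta = soft_threshold(eta - grad / L, lam / L)
    return eta
```

For small grids this is sufficient; for very fine grids the accelerated variants (FISTA) or the cited toolboxes are preferable.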
Instead of choosing a fine grid on the entire domain, we would like to solve the \(\ell ^1\) minimization problem (22) on a small set \({\mathscr {J}}\) that, in the ideal case, only covers the neighbourhoods of the unknown true parameters in X, in order to reduce the numerical effort. For this purpose, we initially apply the orthogonal matching pursuit in Algorithm 1 on a fine regular grid until the residuum r gets small or a certain number of atoms is determined. Although the performance of the greedy method strongly depends on the current instance, the computed atoms are usually located near the true point masses. Surrounding the computed atoms with fine local grids, we obtain a good starting set \({{\mathscr {J}}}_0\) for (22). Next, we would like to let the local grids become finer and finer to improve the solution while keeping the number of atoms nearly the same. Given an optimal \(\eta ^*\) of (22) for \({{\mathscr {J}}}_r\), we may choose a new, finer grid \({{\mathscr {J}}}_{r+1}\) around the interesting features by one of the following refinement strategies:

1.
Determine the dominant atoms corresponding to \((\tau _j, \nu _j) \in {{\mathscr {J}}}_r\) with \(|\eta _j^*| \ge \epsilon \). Discretize the neighbourhood around these atoms by a finer grid. Choose \({{\mathscr {J}}}_{r+1}\) as the union of these finer grids.

2.
Determine the importance \(\gamma _j\) of the atom corresponding to \((\tau _j, \nu _j) \in {{\mathscr {J}}}_r\) by
$$\begin{aligned} \gamma _j := \sum _{(\tau _k, \nu _k) \in {{\mathscr {J}}}_r \cap U_j} |\eta _k^*|, \end{aligned}$$where the moduli of the coefficients of all atoms with parameters in a neighbourhood \(U_j\) around \((\tau _j, \nu _j)\) are summed up. For the most important neighbourhood \(U_j\), compute the barycenter by
$$\begin{aligned} ({\tilde{\tau }}_j, {\tilde{\nu }}_j) := \sum _{(\tau _k, \nu _k) \in {{\mathscr {J}}}_r \cap U_j} \tfrac{|\eta _k^*|}{\gamma _j} \, (\tau _k, \nu _k). \end{aligned}$$Add a finer grid around \(({\tilde{\tau }}_j, {\tilde{\nu }}_j)\) to \({{\mathscr {J}}}_{r+1}\), remove the atoms in \(U_j\) from \({{\mathscr {J}}}_r\), and repeat the procedure as long as there are important points with \(\gamma _j \ge \epsilon \).
The new local grids should cover smaller neighbourhoods; for instance, they could again be regular with a step size that decreases with r. Notice that the numerical effort of the first refinement strategy is smaller than that of the second one. On the other hand, the second strategy can move off the local grids via the barycenters. After determining a final atomic set \({{\mathscr {J}}}^*\) containing the most dominant atoms or barycenters, the corresponding coefficients can be computed by solving the least squares problem
In summary, we obtain Algorithm 2.
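The second refinement strategy can be sketched as follows. The neighbourhood shape (a Chebyshev ball of given radius), the fixed \(5\times 5\) local grid, and the use of coefficient magnitudes as importance weights are our own illustrative choices, not prescribed by the algorithm:

```python
import numpy as np

def barycentric_refinement(points, eta, radius, step, eps=1e-3):
    """Sketch of refinement strategy 2: cluster the grid atoms, replace
    each important cluster by its coefficient-weighted barycenter, and
    lay a finer local grid around it.  `points` is a (J, 2) array of
    (tau_j, nu_j) pairs, `eta` the optimal coefficients on J_r."""
    points = np.asarray(points, float)
    weights = np.abs(np.asarray(eta))
    remaining = np.ones(len(points), bool)
    new_grids = []
    while remaining.any():
        j = np.argmax(weights * remaining)              # most important atom
        in_U = remaining & (np.max(np.abs(points - points[j]), axis=1) <= radius)
        gamma = weights[in_U].sum()                     # importance of the cluster
        remaining &= ~in_U                              # remove the cluster from J_r
        if gamma < eps:
            continue
        center = (weights[in_U, None] * points[in_U]).sum(0) / gamma  # barycenter
        offsets = np.arange(-2, 3) * step               # finer 5x5 local grid
        gx, gy = np.meshgrid(center[0] + offsets, center[1] + offsets)
        new_grids.append(np.column_stack([gx.ravel(), gy.ravel()]))
    return np.vstack(new_grids) if new_grids else np.empty((0, 2))
```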
Alternating descent conditional gradient algorithm
Next, we adapt the ADCG from [2] to our setting. This algorithm minimizes over the continuous domain X. The ADCG is a modification of the conditional gradient method (CGM) – also known as the Frank–Wolfe algorithm, introduced in [21] – for total variation regularization. The original Frank–Wolfe algorithm on \(\mathbb {R}^d\) solves optimization problems of the form \({{\,\mathrm{\mathrm {argmin}}\,}}_{x \in {\mathscr {V}}} f(x)\), where the feasible set \({\mathscr {V}} \subset \mathbb {R}^d\) is compact and convex and the function f is differentiable and convex. Given the kth iterate \(x_k\), each iteration consists of two basic steps, namely

(i)
minimizing a linearized version of f in \(x_k\) over the feasible set
$$\begin{aligned} v_k = {{\,\mathrm{\mathrm {argmin}}\,}}_{v \in {\mathscr {V}}} f(x_k) + \langle \nabla f(x_k), v - x_k\rangle , \end{aligned}$$ 
(ii)
updating with
$$\begin{aligned} x_{k+1} = x_k + \gamma (v_k - x_k). \end{aligned}$$
In superresolution, the first step always consists in an update of the support of the measure as it is also done in the first step of our Algorithm 3.
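The two steps above can be sketched generically. As a toy feasible set we take an \(\ell ^1\)-ball, whose linear minimization oracle returns a signed, scaled unit vector; this example and all names in it are ours, not the paper's setting:

```python
import numpy as np

def frank_wolfe(grad_f, lmo, x0, n_iter=200):
    """Classical Frank-Wolfe: (i) minimize the linearization of f at x_k
    over the feasible set via the linear minimization oracle `lmo`,
    (ii) update by the convex combination with gamma_k = 2/(k+2)."""
    x = x0
    for k in range(n_iter):
        v = lmo(grad_f(x))                 # step (i): argmin_v <grad f(x_k), v>
        gamma = 2.0 / (k + 2.0)
        x = x + gamma * (v - x)            # step (ii): stays feasible by convexity
    return x

def l1_ball_lmo(tau):
    """LMO of the l1-ball of radius tau: the vertex -tau*sign(g_i)*e_i
    for the index i of the largest |g_i|."""
    def lmo(g):
        v = np.zeros_like(g)
        i = np.argmax(np.abs(g))
        v[i] = -tau * np.sign(g[i])
        return v
    return lmo
```

The sparsity-promoting behaviour is visible here already: every iteration adds at most one vertex, i.e. one coordinate, which is the analogue of adding one point mass per iteration in super-resolution.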
Concerning the second step, all important convergence guarantees of the algorithm remain valid if we replace \(x_{k+1}\) in the second step by any feasible \({\tilde{x}}_{k+1}\) that fulfills \(f({\tilde{x}}_{k+1}) \le f(x_{k+1})\). This flexibility has led to several successful variations of the classical Frank–Wolfe algorithm. ADCG-related algorithms which differ in the second step are, for example, the algorithm in [3] and the so-called sliding Frank–Wolfe in [17]. While the first one uses soft shrinkage to update the amplitudes and a discrete gradient flow over the locations, the second one uses a nonconvex solver to jointly minimize over the amplitudes and positions with suitable starting values for the amplitudes.
Adapting the ADCG to our setting results in Algorithm 3, whose details are discussed in the following. For convergence results we refer to [2]. The expansion step of the ADCG algorithm is very similar to the greedy matching pursuit in Algorithm 1 without normalization of the atoms. To find a solution
the objective can first be evaluated on a fine regular grid of X. The obtained \((\tau _{{J_k}+1}, \nu _{{J_k}+1})\) may then be improved using a gradient descent method. In our numerical simulations, however, we noticed that this improvement step has no crucial impact on the recovered measure for our problem and can be skipped.
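The expansion step can be sketched as a grid search over the correlations between the atoms and the current residual. The Dirichlet-kernel atom model below is a simplified stand-in for the paper's atoms \(A(\tau ,\nu )\), assumed here only for illustration:

```python
import numpy as np

def dirichlet(x, N):
    """N-th Dirichlet kernel D_N(x) = sum_{k=-N}^{N} exp(2*pi*i*k*x)."""
    k = np.arange(-N, N + 1)
    return np.exp(2j * np.pi * np.outer(k, x)).sum(axis=0)

def expansion_step(residual, atom, tau_grid, nu_grid):
    """ADCG expansion sketch: evaluate |<A(tau, nu), r>| on a regular
    grid of X and return the maximizing (tau, nu)."""
    best, arg = -1.0, None
    for tau in tau_grid:
        for nu in nu_grid:
            corr = abs(np.vdot(atom(tau, nu), residual))
            if corr > best:
                best, arg = corr, (tau, nu)
    return arg

# Hypothetical atom model: a time-shifted Dirichlet kernel, modulated
# in frequency and sampled at 64 equispaced points.
t = np.arange(64) / 64 - 0.5
atom = lambda tau, nu: dirichlet(t - tau, 8) * np.exp(2j * np.pi * nu * t)
```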
The second step consists in the update of the parameters by
with \(Z (\tau , \nu ) :=[A(\tau _1,\nu _1),\dots , A(\tau _S, \nu _S)]\). In contrast to the general algorithm in [17], the coefficients of the point masses \(\eta \) are complex numbers, so that the above update consists in the minimization of a nonsmooth objective. Therefore, we use the alternating minimization proposed in [2], which splits the minimization into the basis pursuit or LASSO problem
and the smooth minimization problem
The \(\ell ^1\)-regularized problem can be solved as discussed above, and the second one by a gradient descent or quasi-Newton method like BFGS. A short computation shows that the gradients of the objective F are given by
where \(\cdot ^*\) denotes the conjugate transpose of a matrix. The partial derivatives of the atoms \(A(\tau _j,\nu _j)\) with respect to \(\tau _j\) and \(\nu _j\) are collected in the matrices
with
The derivative of the Nth Dirichlet kernel \(D_N\) is given by
Finally, we would like to mention that the numerical effort of the ADCG algorithm is much higher than that of the multilevel refinement in Algorithm 2, since several optimization problems have to be solved for each added point mass.
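For reference, the Dirichlet kernel \(D_N(x) = \sum _{k=-N}^{N} \mathrm {e}^{2\pi \mathrm {i}k x}\) and its term-by-term derivative can be evaluated directly from the exponential sum. The sketch below checks the derivative against a central finite difference and the kernel against the closed form \(\sin ((2N+1)\pi x)/\sin (\pi x)\):

```python
import numpy as np

def dirichlet(x, N):
    """D_N(x) = sum_{k=-N}^{N} exp(2*pi*i*k*x); real-valued for real x."""
    k = np.arange(-N, N + 1)
    return np.exp(2j * np.pi * k * x).sum().real

def dirichlet_prime(x, N):
    """Term-by-term derivative: sum_{k=-N}^{N} 2*pi*i*k * exp(2*pi*i*k*x)."""
    k = np.arange(-N, N + 1)
    return (2j * np.pi * k * np.exp(2j * np.pi * k * x)).sum().real
```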
Numerical results
In the following experiments, we compare the orthogonal matching pursuit, the multilevel time–frequency refinement, and the ADCG. First, we consider the performance for a specific synthetic instance. Then we study the general performance with respect to the noise level and how many measurements are needed to estimate the unknown channel. Finally, the influence of the identifier model is discussed.
Channel estimation from synthetic measurements For this experiment, we assume that the unknown channel or operator H in (2) has exactly \(S=10\) features and that this number is known in advance. The shifts and modulations \((\tau _j, \nu _j)\) are independently generated with respect to the uniform distribution on \([-{\mathscr {T}}/ 2, {\mathscr {T}}/ 2] \times [-\varOmega /2, \varOmega /2] = [-1.5, 1.5] \times [-15.5, 15.5]\). The coefficients \(\eta _j\) are independently and uniformly drawn from the complex unit circle. The employed identifier w is a trigonometric polynomial of degree \(N_1 = 50\), i.e., \(L_1 = 101\), whose coefficients are independently drawn from the complex unit circle too. The true samples \(y_j = H w (\tfrac{{\mathscr {T}}j}{L_2})\) with \(j = -N_2, \dots , N_2\) and \(L_2 = 101\) are corrupted by additive complex Gaussian noise such that \(\Vert y - y^\delta \Vert _2 / \Vert y \Vert _2 = 0.1\), which corresponds to \(10~\mathrm {dB}\)^{Footnote 1} white noise – the noisy data are again denoted by \(y^\delta \).
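The synthetic instance can be generated along the following lines; the function names are illustrative, and the noise is scaled so that the prescribed relative level is met exactly:

```python
import numpy as np

def synthetic_channel(S, T, Omega, rng):
    """Draw S random features: shifts and modulations uniform on
    [-T/2, T/2] x [-Omega/2, Omega/2], unimodular coefficients."""
    tau = rng.uniform(-T / 2, T / 2, S)
    nu = rng.uniform(-Omega / 2, Omega / 2, S)
    eta = np.exp(2j * np.pi * rng.uniform(0, 1, S))
    return tau, nu, eta

def add_noise(y, rel_level, rng):
    """Add complex Gaussian noise scaled to ||y - y_delta|| / ||y|| = rel_level."""
    e = rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)
    return y + rel_level * np.linalg.norm(y) * e / np.linalg.norm(e)
```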
To recover the unknown channel parameters, we apply the orthogonal matching pursuit (Algorithm 1) with a regular grid \({\mathscr {J}}\) of \([-{\mathscr {T}}/2, {\mathscr {T}}/2] \times [-\varOmega /2, \varOmega /2]\) consisting of 1024 points in each direction. The same grid is used to compute the location of the new point masses in the ADCG (Algorithm 3). Both methods are stopped after computing exactly 10 features. The multilevel refinement in Algorithm 2 is initialized by applying the orthogonal matching pursuit to a coarser grid with 256 points in each direction. The local \(5\times 5\) grids are then refined 15 times by reducing the step size by a factor of 0.75. We always use the second refinement strategy. The multilevel refinement and the ADCG are applied to the Tikhonov regularization (9) with \(\lambda = 500\). The recovered shifts and modulations of all three methods are shown in Fig. 1. The true parameters are denoted with an additional \(\dagger \). The absolute errors of the estimation are recorded in Table 1, where the experiment has been repeated 50 times and the errors are averaged. For this instance, all three methods yield comparable results, and the shifts \(\tau _j\) and modulations \(\nu _j\) are estimated quite accurately. The multilevel refinement and the ADCG method achieve slightly higher accuracies than the orthogonal matching pursuit, but, on the downside, the ADCG method is much more time-consuming than the others. Considering the noise level, the results are nevertheless satisfying and show that in particular the shifts and modulations are recoverable from highly noisy measurements.
Influence of noise Next, we study the influence of the noise on the recovery quality of the algorithms in more detail. To this end, the unknown channel is again randomly generated with 10 coefficients on the complex unit circle. In contrast to the first numerical example, the algorithms are henceforth stopped if the residuum becomes small or if the objective stagnates; in other words, the algorithms have no knowledge of the true sparsity S. The degree of the random identifier with unimodular coefficients and the number of samples are given by \(L_1 = L_2 = 101\) once more. The remaining parameters are \({\mathscr {T}} = 1\) and \(\varOmega = 101\). The parameter \(\lambda \) is chosen with respect to the noise level and goes to zero for vanishing noise. Differently from the experiment before, we want to measure how well the estimated channel approximates the true one. Since we are only interested in the behaviour of the true channel on the sampled interval \([-{\mathscr {T}}/2, {\mathscr {T}}/2]\), we interpret the restriction of H as an operator from the space of \(L_1/\varOmega \)-periodic trigonometric polynomials \({\mathscr {P}}_{N_1} \subset L^2([-L_1/2\varOmega , L_1/2\varOmega ))\) of degree at most \(N_1\) to the square-integrable functions \(L^2([-{\mathscr {T}}/2, {\mathscr {T}}/2])\), i.e.
The difference between the true operator \(H^\dagger \) and the estimated operator H is henceforth measured by the operator norm
where \(\Vert \cdot \Vert _{L^2(I)}\) is the \(L^2\)-norm of the restriction to the specified interval I. Due to Parseval's identity, the considered subspace is isometrically isomorphic to the coefficient space \(\mathbb {C}^{L_1}\). After discretizing \([-{\mathscr {T}}/2, {\mathscr {T}}/2]\) and employing the midpoint rule, the operator norm may be computed numerically using the singular value decomposition.
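The described computation can be sketched as follows, assuming a generic routine `apply_H` that evaluates \(Hw\) at given points for a given coefficient vector; the normalization relies on the stated isometry and is meant purely as an illustration:

```python
import numpy as np

def operator_norm(apply_H, N1, T, n_quad=512):
    """Estimate the operator norm of H restricted to trigonometric
    polynomials of degree <= N1 (coefficient space C^{2N1+1}), sampling
    Hw on [-T/2, T/2] with the midpoint rule.  `apply_H(c, x)` returns
    the values (Hw)(x) for the polynomial with coefficients c."""
    L1 = 2 * N1 + 1
    dx = T / n_quad
    x = -T / 2 + dx * (np.arange(n_quad) + 0.5)     # midpoints of the quadrature
    # Column j contains H applied to the j-th Fourier basis function.
    M = np.column_stack([apply_H(e, x) for e in np.eye(L1, dtype=complex)])
    # By the Parseval isometry, the operator norm is the largest
    # singular value of the quadrature-weighted sample matrix.
    return np.sqrt(dx) * np.linalg.svd(M, compute_uv=False)[0]
```

As a sanity check, for the identity operator on polynomials with an orthonormal Fourier basis on a unit interval, the computed norm is 1 up to quadrature accuracy.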
The mean performance of the discussed algorithms is shown in Fig. 2, where for every noise level the experiment has been repeated 50 times. During the multilevel refinement, the step size of the local grids is decreased 25 times by a factor of 2/3. For the ADCG method, the \(\ell ^1\) and least squares minimizations are alternated 25 times. The observations of the first experiment for \(10\,\mathrm {dB}\) noise carry over. Notice that already small parameter errors lead to large relative errors in the operator norm. The reconstruction error of the multilevel method and the ADCG method corresponds nearly one-to-one to the noise level of the measurements. The reconstruction by the orthogonal matching pursuit does not improve if the noise decreases. Although the orthogonal matching pursuit yields sufficient results as a starting point for the refinement method, the problem cannot be solved sufficiently accurately by applying only this greedy method.
Number of required measurements During our numerical experiments, we have noticed that around 10 times more samples than unknown features are required to estimate the parameters of the channel sufficiently well. In the following, we explore the question of how many measurements are needed in more detail. For this, we consider the solution of Algorithm 3 for different numbers of features and numbers of measurements. The remaining parameters of the setting are \(\varOmega = L_1 = L_2 = 101\) and \({\mathscr {T}}= 1\). The coefficients of the unknown channel are unimodular, and the measurements are exact. We declare a reconstruction a success if the relative error \(\Vert H^\dagger - H\Vert _{\mathrm {op}}/\Vert H^\dagger \Vert _{\mathrm {op}}\) is less than \(-40 \, \mathrm {dB}\), and repeat the experiment 50 times for each data point. The success rate and the mean relative error in the operator norm are shown in Fig. 3 and support our observation.
This experiment is the numerical analogue of the theoretical recovery guarantee in [23, Thm 1], where the unknown parameters \((\eta _s, \tau _s, \nu _s)\) of (10) in Remark 2 are determined by solving an atomic norm problem. More precisely, the minimizer of the atomic norm problem yields the wanted parameters with high probability under certain assumptions. The theoretical statement requires at least \(L \ge 1024\) measurements. Considering the phase transition in Fig. 3, we see that, from a numerical point of view, far fewer measurements are needed to recover the unknown channel. In particular, for higher sparsity levels, the transition between failure and success becomes nonlinear, which corresponds to the theoretical results.
Influence of the minimal separation Continuing the discussion of the theoretical guarantees, we recall that one of the crucial assumptions is a lower bound for the minimal separation
If the distance between two or more features in the parameter space becomes too small, they cannot be resolved numerically and are often combined into one feature. This well-known effect may heavily lower the quality of the reconstruction and also occurs in our setting. To study this behaviour numerically, we again consider random channels with 10 unimodular features for \(L_1 = L_2 = 101\), \({\mathscr {T}}= 1\), \(\varOmega = 101\). The shifts and modulations are generated such that the parameter set possesses exactly a certain minimal separation. The results with respect to the operator norm on \({\mathscr {P}}_{N_1}\) are shown in Fig. 4, where the experiments have been repeated 50 times without noise. If the separation falls below 0.01, the error increases rapidly. Note that this transition point depends on the problem dimensions \(L_1\), \(L_2\) and on the number of unknown features.
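Assuming the wrap-around maximum distance commonly used in super-resolution (the precise definition used above is not reproduced here, so this convention is an assumption), the minimal separation of a parameter set can be computed as:

```python
import numpy as np

def minimal_separation(taus, nus, T, Omega):
    """Minimal separation of the features, taken as the minimum over all
    pairs of the normalized wrap-around max-distance
    max(|tau_i - tau_j|_T / T, |nu_i - nu_j|_Omega / Omega),
    where |.|_P denotes the distance on a torus of circumference P."""
    def torus_dist(a, b, P):
        d = abs(a - b) % P
        return min(d, P - d)
    S = len(taus)
    seps = [max(torus_dist(taus[i], taus[j], T) / T,
                torus_dist(nus[i], nus[j], Omega) / Omega)
            for i in range(S) for j in range(i + 1, S)]
    return min(seps)
```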
Importance of the identifier model Finally, we study how the chosen identifier model affects the recovery quality. Throughout the paper, we have used trigonometric polynomials as identifiers w for the unknown channel. On the basis of w, the given samples \(Hw({\mathscr {T}}d/L_2)\) are related to the unknown parameters by Theorem 3, which are then determined by minimizing the Tikhonov functional (9) with respect to the total variation norm for measures. In [23], for the special case \(L := L_1 = L_2\), \(N := N_1 = N_2\), and odd \(L = {\mathscr {T}}\varOmega \), Heckel, Morgenshtern, and Soltanolkotabi have suggested solving an atomic norm problem based on a model approximation where the identifier is chosen as a sum of shifted sinc functions
The real coefficients are chosen partially periodic as \(c_n = c_{n+L} = c_{n-L}\) for \(n = -N, \dots , N\). We denote the L-dimensional span of the sinc functions (24) by \({\mathscr {S}}_L\). The given samples are then only approximated by (5) in Theorem 3; see Fig. 5.
The replacement of the trigonometric polynomial by a sum of sinc functions leads to a model error. Considering a channel with 10 features and 101 samples as before, and studying the recovery error of Algorithm 3 measured in the operator norm, we see that the model mismatch corresponds to a noise level of around \(25~\mathrm {dB}\). Notice that the comparison with respect to trigonometric polynomials is somewhat subjective. For this reason, we also compute the relative reconstruction error based on the subspace of sinc functions (24). Numerically, the difference between both error terms is negligible. The clearly visible approximation error for sinc functions does not occur for trigonometric identifiers.
Notes
The unit decibel henceforth refers to the scale \(10 \log _{10}(\Vert \cdot - p\Vert / \Vert p \Vert )\) for a reference point p – usually the true measurements or operator. Depending on the context, the norm refers to the Euclidean or the operator norm.
References
Bello, P.: Measurement of random timevariant linear channels. IEEE Trans. Inform. Theory 15(4), 469–475 (1969)
Boyd, N., Schiebinger, G., Recht, B.: The alternating descent conditional gradient method for sparse inverse problems. SIAM J. Optim. 27(2), 616–639 (2017)
Bredies, K., Pikkarainen, H.K.: Inverse problems in spaces of measures. ESAIM Control Optim. Calc. Var. 19(1), 190–218 (2013)
Cai, T.T., Wang, L.: Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans. Inform. Theory 57(7), 4680–4688 (2011)
Candès, E.J., FernandezGranda, C.: Towards a mathematical theory of superresolution. Comm. Pure Appl. Math. 67(6), 906–956 (2014)
Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. 12(6), 805–849 (2012)
Chang, R.W.: Synthesis of bandlimited orthogonal signals for multichannel data transmission. AT&T Bell Labs. Tech. J. 45(10), 1775–1796 (1966)
Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1998)
Chi, Y., Ferreira Da Costa, M.: Harnessing sparsity over the continuum: atomic norm minimization for superresolution. IEEE Sig. Proces. Mag. 37(2), 39–57 (2020)
Chi, Y., Scharf, L.L., Pezeshki, A., Calderbank, A.R.: Sensitivity to basis mismatch in compressed sensing. IEEE Trans. Signal Process. 59(5), 2182–2195 (2011)
Cohn, D.L.: Measure Theory, 2nd edn. Birkhäuser Advanced Texts: Basel Textbooks. Birkhäuser/Springer, New York (2013)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forwardbackward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Condat, L., Hirabayashi, A.: Cadzow denoising upgraded: a new projection method for the recovery of Dirac pulses from noisy linear measurements. Sampl. Theory Signal Image Process. 14(1), 17–47 (2015)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57(11), 1413–1457 (2004)
Davis, G., Mallat, S., Avellaneda, M.: Adaptive greedy approximations. Constr. Approx. 13(1), 57–98 (1997)
De Castro, Y., Gamboa, F., Henrion, D., Lasserre, J.B.: Exact solutions to super resolution on semialgebraic domains in higher dimensions. IEEE Trans. Inform. Theory 63(1), 621–630 (2017)
Denoyelle, Q., Duval, V., Peyré, G., Soubies, E.: The sliding FrankWolfe algorithm and its application to superresolution microscopy. Inverse Problems 36(1), 014001, 42 (2020)
Dumitrescu, B.: Positive Trigonometric Polynomials and Signal Processing Applications. Signals and Communication Technology. Springer, Dordrecht (2007)
Duval, V., Peyré, G.: Sparse regularization on thin grids I: the Lasso. Inverse Problems 33(5), 05500, 29 (2017)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Statist. 32(2), 407–499 (2004)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Quart. 3, 95–110 (1956)
Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (2014)
Heckel, R., Morgenshtern, V.I., Soltanolkotabi, M.: Superresolution radar. Inf. Inference 5(1), 22–75 (2016)
Higgins, J.R., Stens, R.L.: Sampling Theory in Fourier and Signal Analysis: Advanced Topics. Oxford Science Publications. Oxford University Press, Oxford (1999)
Kailath, T.: Measurements on timevariant communication channels. IRE Trans. on Inform. Theory 8(5), 229–236 (1962)
Katznelson, Y.: An Introduction to Harmonic Analysis. Cambridge University Press, Cambridge (2004)
Krahmer, F., Pfander, G.E.: Local sampling and approximation of operators with bandlimited KohnNirenberg symbols. Constr. Approx. 39(3), 541–572 (2014)
Kumar, S.: Wireless Communications: Fundamental and Advanced Concepts. River Publishers Series in Communications Series. River Publishers, Aalborg (2015)
Kumari, P., Choi, J., GonzálezPrelcic, N., Heath, R.W.: IEEE 802.11adbased radar: an approach to joint vehicular communicationradar system. IEEE Trans. Veh. Technol. 67(4), 3012–3027 (2018)
Kunis, S., Möller, H.M., Peter, T., von der Ohe, U.: Prony’s method under an almost sharp multivariate Ingham inequality. J. Fourier Anal. Appl. 24, 1306–1318 (2018)
Liao, W., Fannjiang, A.: MUSIC for singlesnapshot spectral estimation: stability and superresolution. Appl. Comput. Harmon. Anal. 40(1), 33–67 (2016)
Liu, F., Masouros, C., Petropulu, A.P., Griffiths, H., Hanzo, L.: Joint radar and communication design: applications, stateoftheart, and the road ahead. IEEE Trans. Comm. 68(6), 3834–3862 (2020)
Nikol’skiĭ, S.M.: Approximation of Functions of Several Variables and Imbedding Theorems. Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellung. SpringerVerlag, Berlin (1975)
Pfander, G.E., Rauhut, H.: Sparsity in timefrequency representations. J. Fourier Anal. Appl. 16(2), 233–260 (2010)
Pfander, G.E., Walnut, D.F.: Operator identification and Feichtinger’s algebra. Sampl. Theory Signal Image Process. 5(2), 183–200 (2006)
Plonka, G., Potts, D., Steidl, G., Tasche, M.: Numerical Fourier Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser, Basel (2018)
Potts, D., Tasche, M.: Parameter estimation for multivariate exponential sums. Electron. Trans. Numer. Anal. 40, 204–224 (2013)
Stampfer, K., Plonka, G.: The generalized operator based Prony method. Constr. Approx. 52, 1–36 (2020)
Tang, G., Bhaskar, B.N., Recht, B.: Near minimax line spectral estimation. IEEE Trans. Inform. Theory 61(1), 499–512 (2015)
Taubock, G., Hlawatsch, F., Eiwen, D., Rauhut, H.: Compressive estimation of doubly selective channels in multicarrier systems: leakage effects and sparsityenhancing processing. IEEE J. Sel. Top. Signal Process. 4(2), 255–271 (2010)
Tibshirani, R.: Regression shrinkage and selection via the LASSO. J. Roy. Statist. Soc. Ser. B 58(1), 267–288 (1996)
Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50(10), 2231–2242 (2004)
Wu, T.T., Lange, K.: Coordinate descent algorithms for LASSO penalized regression. Ann. Appl. Stat. 2(1), 224–244 (2008)
Acknowledgements
We thank Götz Pfander and Dae Gwan Lee for inspiring discussions. Further, we thank the unknown referees for their valuable comments to improve the manuscript. This work was supported by Deutsche Forschungsgemeinschaft (DFG) Grant JU 2795/3.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
This work was supported by Deutsche Forschungsgemeinschaft (DFG) Grant JU 2795/3.
Communicated by Antonio García.
Appendices
Proof of Theorem 2
To prove Theorem 2, we need the following auxiliary lemmata. We start with Poisson's summation formula for bandlimited functions. Since we have not found it in this form in the literature, we give the proof for convenience.
Lemma 4
(Poisson Summation Formula for Bandlimited \(L^1\)-Functions) Let \(f \in L^1(\mathbb {R}) \cap C_0(\mathbb {R})\) be bandlimited. Then, for \(a > 0\), the a-periodic function \(F_a\) given by
converges absolutely for all \(x \in \mathbb {R}\), and we have
In particular, \(F_a\) is a trigonometric polynomial for all \(a > 0\).
Proof
By assumption, we have \({{\,\mathrm{\mathrm {supp}}\,}}{\hat{f}} \subseteq [-\sigma , \sigma ]\) for some \(\sigma > 0\), so that we may identify f with an element of \(B^1_{2 \pi \sigma }\). By Theorem 1, we know that
for all \(x \in \mathbb {R}\). This shows that \(F_a\) is indeed well-defined and bounded. In particular, \(F_a \in L^\infty (\mathbb {R}/ a \mathbb {Z}) \subset L^1(\mathbb {R}/ a \mathbb {Z})\) and we can compute the Fourier coefficients
\(c_k(F_a) = \frac{1}{a} \int _0^a F_a(x) \, \mathrm {e}^{-2 \pi \mathrm {i}k x / a} \, \mathrm {d}x = \frac{1}{a} \int _{\mathbb {R}} f(x) \, \mathrm {e}^{-2 \pi \mathrm {i}k x / a} \, \mathrm {d}x = \frac{1}{a} \, {\hat{f}}\bigl (\tfrac{k}{a}\bigr ). \quad (26)\)
Interchanging the series and integral in (26) is allowed by the theorem of Fubini–Tonelli since \(x \mapsto \sum _{\ell \in \mathbb {Z}} f(x - a \ell )\) is uniformly bounded by (25) and thus integrable on [0, a].
Since \({\hat{f}}\) has compact support, only finitely many Fourier coefficients are nonzero, so the Fourier series
converges uniformly and is indeed a trigonometric polynomial.\(\square \)
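The identity of Lemma 4 can be probed numerically. The following Python sketch (purely illustrative, not part of the proof) uses \(f = \mathrm{sinc}^2\), whose Fourier transform is the triangle function \({\hat{f}}(\xi ) = \max (1 - |\xi |, 0)\) supported on \([-1, 1]\); for \(a = 2\), only the coefficients for \(k \in \{-1, 0, 1\}\) survive and the periodization is the trigonometric polynomial \(\tfrac{1}{2} + \tfrac{1}{2} \cos (\pi x)\):

```python
import math

def sinc(x):
    # normalized sinc: sin(pi x) / (pi x), with value 1 at x = 0
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def f(x):
    # f = sinc^2 lies in L^1 cap C_0 and is bandlimited:
    # hat{f}(xi) = max(1 - |xi|, 0) vanishes outside [-1, 1]
    return sinc(x) ** 2

def F(a, x, N=2000):
    # truncated periodization F_a(x) = sum_l f(x - a l)
    return sum(f(x - a * l) for l in range(-N, N + 1))

# Poisson summation: for a = 2 only hat{f}(0) = 1 and hat{f}(+-1/2) = 1/2
# contribute, hence F_2(x) = 1/2 + (1/2) cos(pi x)
for x in [0.0, 0.3, 0.77, 1.5]:
    assert abs(F(2.0, x) - (0.5 + 0.5 * math.cos(math.pi * x))) < 1e-3
```

The truncation at \(N = 2000\) terms leaves a tail of order \(10^{-5}\) because \(f\) decays quadratically.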
Lemma 5
Let \(f = {\hat{\mu }}_f\), where \(\mu _f \in {\mathscr {M}}(\mathbb {R})\) fulfills \({{\,\mathrm{\mathrm {supp}}\,}}\mu _f \subseteq [-\sigma , \sigma ]\) for some \(\sigma > 0\). Then f is infinitely often differentiable and \(f^{(n)} = {\hat{\mu }}_{f^{(n)}}\) with \(\mu _{f^{(n)}} \in {\mathscr {M}}(\mathbb {R})\) for all \(n \in \mathbb {N}\), where
\(\mathrm {d}\mu _{f^{(n)}}(\xi ) = (2 \pi \mathrm {i}\xi )^n \, \mathrm {d}\mu _f(\xi ).\)
In particular, \({{\,\mathrm{\mathrm {supp}}\,}}\mu _{f^{(n)}} \subseteq [-\sigma , \sigma ]\).
Proof
Consider the difference quotients \(g_h(x, \xi ) :=\frac{1}{h}\bigl (\mathrm {e}^{2\pi \mathrm {i}(x + h) \xi } - \mathrm {e}^{2 \pi \mathrm {i}x \xi }\bigr )\) for \(x \in \mathbb {R}\), \(\xi \in [-\sigma , \sigma ]\) and \(h \ne 0\). Due to the mean value theorem, they are uniformly bounded by
\(|g_h(x, \xi )| \le 2 \pi |\xi | \le 2 \pi \sigma .\)
Since constant functions are integrable w.r.t. \(\mu _f \in {\mathscr {M}}(\mathbb {R})\), it follows from the dominated convergence theorem that
\(f'(x) = \lim _{h \rightarrow 0} \int _{\mathbb {R}} g_h(x, \xi ) \, \mathrm {d}\mu _f(\xi ) = \int _{\mathbb {R}} 2 \pi \mathrm {i}\xi \, \mathrm {e}^{2 \pi \mathrm {i}x \xi } \, \mathrm {d}\mu _f(\xi ).\)
Repeating the above argument starting with \(f'\), then \(f^{(2)}\), and so forth, we obtain the claim inductively for all \(n \in \mathbb {N}\).\(\square \)
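For a discrete measure, Lemma 5 can be illustrated directly: if \(\mu _f\) is a finite linear combination of Dirac measures, then f is a finite trigonometric sum and differentiation multiplies each coefficient by \(2 \pi \mathrm {i}\xi _j\). The following Python sketch, with hypothetical coefficients chosen only for illustration, compares this against a central finite difference:

```python
import cmath

# hypothetical discrete measure mu_f = sum_j c_j delta_{xi_j},
# supported in [-sigma, sigma] with sigma = 1; pairs are (c_j, xi_j)
coeffs = [(0.7, -0.4), (1.0 - 0.5j, 0.25), (0.3j, 0.9)]

def f(x):
    # f(x) = hat{mu}_f(x) = sum_j c_j e^{2 pi i x xi_j}
    return sum(c * cmath.exp(2j * cmath.pi * x * xi) for c, xi in coeffs)

def f_prime(x):
    # Lemma 5: d mu_{f'}(xi) = 2 pi i xi d mu_f(xi)
    return sum(2j * cmath.pi * xi * c * cmath.exp(2j * cmath.pi * x * xi)
               for c, xi in coeffs)

# compare with a central finite difference at an arbitrary point
x, h = 0.37, 1e-6
fd = (f(x + h) - f(x - h)) / (2 * h)
assert abs(fd - f_prime(x)) < 1e-4
```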
Lemma 6
([26, Thm 4.4, p. 25]) Let f be an infinitely often differentiable, T-periodic function for \(T > 0\). Denote the Fourier coefficients of f by
\(c_k(f) := \frac{1}{T} \int _0^T f(x) \, \mathrm {e}^{-2 \pi \mathrm {i}k x / T} \, \mathrm {d}x, \quad k \in \mathbb {Z}.\)
Then for all \(j \in \mathbb {N}_0\) there exists \(C_j > 0\) such that
\(|c_k(f)| \le C_j \, |k|^{-j} \quad \text {for all } k \in \mathbb {Z}\setminus \{0\}.\)
Proof
(Theorem 2) By Lemma 5, we know that f is infinitely often differentiable, and by Lemma 6 we have, for all \(j \in \mathbb {N}\), that \(|{\hat{f}}(k)| \le C_j |k|^{-j}\) for all \(k \ne 0\) and some \(C_j > 0\); in particular, \(({\hat{f}}(k))_{k \in \mathbb {Z}} \in \ell ^1(\mathbb {Z})\). Define the Borel measure \(\mu \) by
\(\mu := \sum _{k \in \mathbb {Z}} {\hat{f}}(k) \, \delta _{k/K}.\)
We have to show that \(\mu \in {\mathscr {M}}(\mathbb {R})\) and we will use that \(({\mathscr {M}}(\mathbb {R}), \Vert \cdot \Vert _{{\mathscr {M}}(\mathbb {R})})\) is the dual space of \((C_0(\mathbb {R}), \Vert \cdot \Vert _\infty )\). Let \(\varphi \in C_0(\mathbb {R})\) be arbitrary, then
This shows that \(\mu \) indeed defines a continuous linear functional on \(C_0(\mathbb {R})\). The Fourier transform of \(\mu \) is
Since the Fourier transform is unique, this implies \(\mu = \mu _f\). Finally, by assumption \({{\,\mathrm{\mathrm {supp}}\,}}\mu = {{\,\mathrm{\mathrm {supp}}\,}}\mu _f \subseteq [-\sigma , \sigma ]\), so that \({\hat{f}}(k) = 0\) for all \(k \in \mathbb {Z}\) satisfying \(|k| > \sigma K\), and we obtain (1). This concludes the proof. \(\square \)
Proof of Theorem 4
Proof
(Theorem 4) The first part can be proven exactly along the lines of the classical sampling theorem of Shannon, Whittaker and Kotelnikov; see [36, Thm 2.29] for instance. It remains to show the convergence in \(L^1(\mathbb {R})\). Applying Theorem 1 to \(\phi \), we obtain
Since the right-hand side vanishes as \(M \rightarrow \infty \) independently of x due to \(f \in C_0 (\mathbb {R})\), the pointwise convergent series \(\sum _{k \in \mathbb {Z}} f(ak)\phi (x-ak)\) also converges uniformly. The partial sums are continuous functions, so the limit is continuous too and, in particular, measurable. Using Levi's monotone convergence theorem [11, Thm 2.4.1], we have
Since Theorem 1 ensures \(\bigl (f(a k) \bigr )_{k \in \mathbb {Z}} \in \ell ^1(\mathbb {Z})\), the last expression converges to zero as \(M \rightarrow \infty \), which establishes the \(L^1\)convergence.\(\square \)
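The cardinal series of Theorem 4 can likewise be checked numerically. The Python sketch below (illustrative only) reconstructs \(f = \mathrm{sinc}^2\), which is bandlimited to \([-1, 1]\), from its samples on the critical grid \(a = 1/2\), using the classical generator \(\phi = \mathrm{sinc}(\cdot / a)\) as a stand-in for the kernel of the theorem:

```python
import math

def sinc(x):
    # normalized sinc: sin(pi x) / (pi x), with value 1 at x = 0
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def f(x):
    # sinc^2 is bandlimited to [-1, 1] and lies in L^1 cap C_0
    return sinc(x) ** 2

def reconstruct(x, a=0.5, N=1000):
    # truncated cardinal series sum_k f(a k) phi(x - a k), phi = sinc(./a)
    return sum(f(a * k) * sinc((x - a * k) / a) for k in range(-N, N + 1))

# the samples f(k/2) determine f everywhere up to the truncation error
for x in [0.1, 0.33, 1.7]:
    assert abs(reconstruct(x) - f(x)) < 1e-4
```

Since the samples \(f(ak)\) decay quadratically, the series converges absolutely and the truncation to \(|k| \le 1000\) is far below the tested tolerance.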
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Beinert, R., Jung, P., Steidl, G. et al. Super-resolution for doubly-dispersive channel estimation. Sampl. Theory Signal Process. Data Anal. 19, 16 (2021). https://doi.org/10.1007/s43670-021-00016-0
Keywords
 Super-resolution
 Channel estimation
 Doubly-dispersive
 Time-frequency
 Sampling
Mathematics Subject Classification
 47A62
 65R30
 65T99
 94A20