1 Introduction

The Nyquist–Shannon sampling theorem is perhaps the most impactful result in the theory of signal processing, fundamentally shaping the practice of acquiring and processing data [1, 2] (also attributed to Kotel’nikov [3], Ferrar [4], Cauchy [5], Ogura [6], Whittaker [7, 8]). In this setting, typical acquisition of a continuous-time signal involves taking equispaced samples at a rate slightly higher than a prescribed frequency \(\omega \) Hz in order to obtain a bandlimited approximation via a quickly decaying kernel. Such techniques provide accurate approximations of (noisy) signals whose spectral energy is largely contained in the band \([-\omega /2,\omega /2]\) [5, 9,10,11].

As a consequence, industrial signal acquisition and post-processing methods tend to be designed to incorporate uniform sampling. However, such sampling schemes are difficult to honor in practice due to physical constraints and natural factors that perturb sampling locations from the uniform grid, i.e., nonuniform or off-the-grid samples. In response, nonuniform analogs of the noise-free sampling theorem have been developed, where an average sampling density proportional to the highest frequency \(\omega \) of the signal guarantees accurate interpolation, e.g., Landau density [11,12,13]. However, non-equispaced samples are typically unwanted and regarded as a burden due to the extra computational cost involved in regularization, i.e., interpolating the nonuniform samples onto the desired equispaced grid.

On the other hand, many works in the literature have considered the potential benefits of deliberate nonuniform sampling [14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]. Suppression of aliasing error, i.e., anti-aliasing, is a well-known advantage of randomly perturbed samples. For example, jittered sampling is a common technique for anti-aliasing that also provides a well distributed set of samples [30, 42,43,44]. To the best of the authors’ knowledge, this phenomenon was first noticed by Shapiro and Silverman [14] (also by Beutler [15, 16] and implicitly by Landau [12]) and remained unused in applications until it was rediscovered at Pixar Animation Studios by Cook [17]. According to our literature review, such observations remain largely empirical or arguably uninformative for applications. Closing this gap between theory and experiments would help the practical design of such widely used methodologies.

To this end, in this paper we propose a practical framework that allows us to concretely investigate the properties of randomly deviated samples for undersampling, anti-aliasing and general noise attenuation. To elaborate (see Sect. 1.1 for notation), let \({{\textbf {f}}} :[-\frac{1}{2}, \frac{1}{2})\rightarrow {\mathbb {C}}\) be our function of interest that belongs to some smoothness class. Our goal is to obtain a uniform discretization \(f\in {\mathbb {C}}^N,\) where an estimate of \(f_k = {{\textbf {f}}}(\frac{k-1}{N}-\frac{1}{2})\) will provide an accurate approximation \({{\textbf {f}}}^{\sharp }(x)\) of \({{\textbf {f}}}(x)\) for all \(x\in [-\frac{1}{2}, \frac{1}{2}).\) We are given noisy non-equispaced samples, \(b = \tilde{f} + d \in {\mathbb {C}}^m,\) where \(\tilde{f}_k = {{\textbf {f}}}(\frac{k-1}{m}-\frac{1}{2} + \Delta _k)\) is the nonuniformly sampled signal and \(d\in {\mathbb {C}}^m\) encompasses unwanted additive noise. In general, we will consider functions \({{\textbf {f}}}\) with support on \([-\frac{1}{2}, \frac{1}{2})\) whose periodic extension is in the Wiener algebra \(A(\Omega )\) [45], where by abuse of notation \(\Omega \) denotes the interval \([-\frac{1}{2}, \frac{1}{2})\) and the torus \({\mathbb {T}}.\)

To achieve undersampling and anti-aliasing, we assume our uniform signal admits a sparse (or compressible) representation along the lines of compressive sensing [46,47,48]. We say that f is compressible with respect to a transform, \(\Psi \in {\mathbb {C}}^{N\times N},\) if there exists some \(g\in {\mathbb {C}}^N\) such that \(f = \Psi g\) and g can be accurately approximated by an s-sparse vector \((s\le N).\) In this scenario, our methodology consists of constructing an interpolation kernel \({\mathcal {S}}\in {\mathbb {R}}^{m\times N}\) that achieves \({\mathcal {S}}f\approx \tilde{f}\) accurately for smooth signals, in order to define our estimate \({{\textbf {f}}}^{\sharp }(x)\) using the discrete approximation \(\Psi g^{\sharp }\approx f\) where

$$\begin{aligned} {g^{\sharp }{:}{=}{{\,\mathrm{arg\,\hspace{-2pt}min}\,}}_{h\in {\mathbb {C}}^N}\lambda \Vert h\Vert _1+\frac{\sqrt{N}}{\sqrt{m}}\Vert {\mathcal {S}}\Psi h - b\Vert _2} \end{aligned}$$
(1)

and \(\lambda \ge 0\) is a parameter to be chosen appropriately. We show that for signals in the Wiener algebra and under certain distributions \({\mathcal {D}},\) if we have \(m\sim {\mathcal {O}}(s{{\,\mathrm{poly\,\hspace{-2pt}log}\,}}(N))\) off-the-grid samples with i.i.d. deviations \(\{\Delta _k\}_{k=1}^{m}\sim {\mathcal {D}}\) then the approximation error \(|{{\textbf {f}}}^{\sharp }(x)-{{\textbf {f}}}(x)|\) is proportional to \(\Vert d\Vert _2,\) the error of the best s-sparse approximation of g,  and the error of the best \(\frac{N}{2}\)-bandlimited approximation of \({{\textbf {f}}}\) in the Wiener algebra norm (see equation 6.1 in [45]). If \(s\ll N,\) the average sampling rate required for our result (step size \(\sim \frac{1}{s{{\,\mathrm{poly\,\hspace{-2pt}log}\,}}(N)}\)) provides a stark contrast to standard density conditions where a rate proportional to the highest frequency \(\omega \sim N,\) resulting in step size \(\sim \frac{1}{N},\) is needed for the same bandlimited approximation. The result is among the first to formally state the anti-aliasing nature of nonuniform sampling in the context of undersampling (see Sect. 3).

Removing the sparse signal model, we attenuate measurement noise (i.e., denoise) by defining \({{\textbf {f}}}^{\sharp }(x)\) using the discrete estimate

$$\begin{aligned} f \approx f^{\sharp } := {{\,\mathrm{arg\,\hspace{-2pt}min}\,}}_{h\in {\mathbb {C}}^N}\Vert {\mathcal {S}}h - b\Vert _2. \end{aligned}$$
(2)

In this context, our main result states that \(m\ge N\log (N)\) i.i.d. randomly deviated samples provide approximation error \(|{{\textbf {f}}}^{\sharp }(x)-{{\textbf {f}}}(x)|\) proportional to the noise level \(\frac{\Vert d\Vert _2}{\sqrt{\log (N)}}\) and the error of the best \(\frac{N}{2}\)-bandlimited approximation of \({{\textbf {f}}}\) in the Wiener algebra norm. Thus, by nonuniform oversampling (relative to the desired \(\frac{N}{2}\)-bandwidth) we attenuate unwanted noise regardless of its structure. While uniform oversampling is a common noise filtration technique, our results show that general nonuniform samples also possess this denoising property (see Sect. 4).

The rest of the paper is organized as follows: Sect. 2 provides a detailed elaboration of our sampling scenario, signal model and methodology. Section 3 showcases our results for anti-aliasing and undersampling of compressible signals while Sect. 4 considers noise attenuation via oversampling. A comprehensive discussion of the results and implications is presented in Sect. 5. Several experiments and computational considerations are found in Sect. 6, followed by concluding remarks in Sect. 7. We postpone the proofs of our statements until Sect. 8. Before proceeding to the next section, we find it best to introduce the general notation that will be used throughout. However, each subsequent section may introduce additional notation helpful in its specific context.

1.1 Notation

We denote complex valued functions of real variables using bold letters, e.g., \({{\textbf {f}}}:{\mathbb {R}}\rightarrow {\mathbb {C}}.\) For any integer \(n\in {\mathbb {N}},\) [n] denotes the set \(\{\ell \in {\mathbb {N}}: 1 \le \ell \le n\}.\) For \(k,\ell \in {\mathbb {N}},\) \(b_{k}\) indicates the k-th entry of the vector b, \(D_{k\ell }\) denotes the \((k,\ell )\) entry of the matrix D and \(D_{k*} \ (D_{*\ell })\) denotes the k-th row (resp. \(\ell \)-th column) of the matrix. We reserve x to denote real variables and write the complex exponential as \({{\textbf {e}}}(x) := e^{2\pi ix},\) where i is the imaginary unit. For a vector \(f\in {\mathbb {C}}^{n}\) and \(1\le p < \infty ,\) \(\Vert f\Vert _p := \big [\sum _{k=1}^{n}|f_k|^p\big ]^{1/p}\) is the p-norm, \(\Vert f\Vert _{\infty } = \max _{k\in [n]}|f_k|,\) and \(\Vert f\Vert _0\) gives the total number of non-zero elements of f. For a matrix \(X\in {\mathbb {C}}^{n\times m},\) \(\sigma _k(X)\) denotes the k-th largest singular value of X and \(\Vert X\Vert := \sigma _1(X)\) is the spectral norm. \(A(\Omega )\) is the Wiener algebra and \(H^k(\Omega )\) is the Sobolev space \(W^{k,2}(\Omega )\) (with domain \(\Omega \)), \(S^{n-1}\) is the unit sphere in \({\mathbb {C}}^n,\) and the adjoint of a linear operator \({\mathcal {A}}\) is denoted by \({\mathcal {A}}^*.\)

2 Assumptions and methodology

In this section we develop the signal model, deviation model and interpolation kernel, in Sects. 2.1, 2.2 and 2.3, respectively. This will allow us to proceed to Sects. 3 and 4 where the main results are elaborated. We note that the deviation model (Sect. 2.2) and sparse signal model at the end of Sect. 2.1 only apply to the compressive sensing results in Sect. 3. However, the sampling on the torus assumption in Sect. 2.1 does apply to the results in Sect. 4 as well.

2.1 Signal model

For the results in Sects. 3 and 4, let \(\Omega = [-\frac{1}{2},\frac{1}{2})\) and let \({{\textbf {f}}}:\Omega \rightarrow {\mathbb {C}}\) be the function of interest to be sampled. We assume \({{\textbf {f}}}\in A(\Omega )\) with Fourier expansion

$$\begin{aligned} {{\textbf {f}}}(x) = \sum _{\ell =-\infty }^{\infty }c_{\ell }{{\textbf {e}}}(\ell x), \end{aligned}$$
(3)

on \(\Omega .\) Note that our regularity assumption implies that

$$\begin{aligned} \sum _{\ell =-\infty }^{\infty }|c_{\ell }|<\infty , \end{aligned}$$

which will be crucial for our error bounds. Further, \(H^{k}(\Omega )\subset A(\Omega )\) for \(k\ge 1\) so that our context applies to many signals of interest.

Henceforth, let \(N\in {\mathbb {N}}\) be odd. We denote the discretized regular data vector by \(f \in {\mathbb {C}}^{N},\) which is obtained by sampling \({{\textbf {f}}}\) on the uniform grid \(\tau =\{t_1,\ldots , t_N\} \subset \Omega \) of equispaced points \(t_k := \frac{k-1}{N}-\frac{1}{2},\) so that \(f_k = {{\textbf {f}}}(t_{k}).\) The vector f will be our discrete signal of interest to recover via nonuniform samples in order to ultimately obtain an approximation to \({{\textbf {f}}}(x)\) for all \(x\in \Omega .\) Similar results can be obtained in the case N is even; our current assumption is adopted to simplify the exposition.

The observed nonuniform data is encapsulated in the vector \(\tilde{f} \in {\mathbb {C}}^{m}\) with underlying unstructured grid \(\tilde{\tau } = \{\tilde{t}_1,\ldots , \tilde{t}_m\} \subset \Omega \) where \(\tilde{t}_k := \frac{k-1}{m}-\frac{1}{2} + \Delta _k\) is now a collection of generally non-equispaced points. The entries of the perturbation vector \(\Delta \in {\mathbb {R}}^{m}\) define the pointwise deviations of \(\tilde{\tau }\) from the equispaced grid \(\{\frac{k-1}{m}-\frac{1}{2}\}_{k=1}^{m},\) where \(\tilde{f}_k = {{\textbf {f}}}(\tilde{t}_{k}).\) Noisy nonuniform samples are given as

$$\begin{aligned} b = \tilde{f} + d \in {\mathbb {C}}^m, \end{aligned}$$

where the noise model, d,  does not incorporate off-the-grid corruption. We assume that we know \(\tilde{\tau }.\)

In order for the expansion (3) to remain valid for \(x\in \tilde{\tau },\) we must impose \(\tilde{\tau }\subset \Omega .\) This is not possible for the general deviations \(\Delta \) we wish to examine, so we instead adopt the torus as our sampling domain to ensure this condition.

Sampling on the torus: for all our results, we consider sampling schemes to be on the torus. In other words, we allow grid points \(\tilde{\tau }\) to lie outside of the interval \([-\frac{1}{2},\frac{1}{2})\) but they will correspond to samples of \({{\textbf {f}}}\) within \([-\frac{1}{2},\frac{1}{2})\) via a circular wrap-around. To elaborate, if \({{\textbf {f}}}|_{\Omega }(x)\) is given as

$$\begin{aligned} {{\textbf {f}}}|_{\Omega }(x) = {\left\{ \begin{array}{ll} {{\textbf {f}}}(x) &{} \text {if} \ x\in \left[ -\frac{1}{2},\frac{1}{2}\right) \\ 0 &{} \text {if} \ x\notin \left[ -\frac{1}{2},\frac{1}{2}\right) , \end{array}\right. } \end{aligned}$$

then we define \(\tilde{{{\textbf {f}}}}(x)\) as the periodic extension of \({{\textbf {f}}}|_{\Omega }(x)\) to the whole line

$$\begin{aligned} \tilde{{{\textbf {f}}}}(x) = \sum _{\ell =-\infty }^{\infty }{{\textbf {f}}}|_{\Omega }(x+\ell ). \end{aligned}$$

We now apply samples generated from our deviations \(\tilde{\tau }\) to \(\tilde{{{\textbf {f}}}}(x).\) Indeed, any \(\tilde{t}_k\) generated outside of \(\Omega \) will satisfy \(\tilde{{{\textbf {f}}}}(\tilde{t}_k) = {{\textbf {f}}}(t^*)\) for some \(t^*\in \Omega .\) In this way, we avoid restricting the magnitude of the entries of \(\Delta \) and the expansion (3) will remain valid for any nonuniform samples generated.
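In implementations, this wrap-around is a one-line modular reduction. A minimal sketch in Python/NumPy (the function name is ours, for illustration):

```python
import numpy as np

def wrap_to_omega(t):
    # Map arbitrary sample locations onto Omega = [-1/2, 1/2) (the torus),
    # so the periodic extension satisfies f_tilde(t) = f(wrap_to_omega(t)).
    return (t + 0.5) % 1.0 - 0.5
```

For instance, a sample location \(\tilde{t}_k = 0.7\) corresponds to the sample \({{\textbf {f}}}(-0.3).\)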

Sparse signal model: For the results of Sect. 3 only, we impose a compressibility condition on \(f \in {\mathbb {C}}^{N}.\) To this end, let \(\Psi \in {\mathbb {C}}^{N\times N}\) be a basis with \(0< \sigma _N(\Psi ) =: \alpha \) and \(\sigma _1(\Psi ) =: \beta .\) We assume there exists some \(g\in {\mathbb {C}}^N\) such that \(f = \Psi g,\) where g can be accurately approximated by an \(s\le N\) sparse vector. To be precise, for \(s\in [N]\) we define the error of best s-sparse approximation of g as

$$\begin{aligned} \epsilon _s(g) := \min _{\Vert h\Vert _0\le s}\Vert h-g\Vert _1, \end{aligned}$$

and assume s has been chosen so that \(\epsilon _s(g)\) is within a prescribed error tolerance determined by the practitioner.
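Since the minimum in this definition is attained by retaining the s largest-magnitude entries of g, \(\epsilon _s(g)\) is simple to compute. A short sketch (Python/NumPy, our own illustration):

```python
import numpy as np

def eps_s(g, s):
    # epsilon_s(g): l1-norm of g outside its s largest-magnitude entries
    mags = np.sort(np.abs(g))               # ascending order
    return mags[:max(len(g) - s, 0)].sum()
```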

In Sect. 8.1, we will relax the condition that \(\Psi \) be a basis by allowing full column rank matrices \(\Psi \in {\mathbb {C}}^{N\times n}\) with \(n\le N.\) While such transforms are not typical in compressive sensing, we argue that they may be of practical interest since our results will show that if \(\Psi \) can be selected as a tall matrix then the sampling complexity will solely depend on its number of columns (i.e., the smallest dimension n).

The transform \(\Psi \) will have to be coherent with respect to the 1D centered discrete Fourier basis \({\mathcal {F}}\in {\mathbb {C}}^{N\times N}\) (see Sect. 2.3 for definition of \({\mathcal {F}}).\) We define the DFT-incoherence parameter as

$$\begin{aligned} \gamma = \max _{\ell \in [N]} \sum _{k=1}^{N}|\langle {\mathcal {F}}_{*k},\Psi _{*\ell }\rangle |, \end{aligned}$$

which provides a uniform bound on the \(\ell _1\)-norm of the discrete Fourier coefficients of the columns of \(\Psi .\) This parameter will play a role in the sampling complexity of our result in Sect. 3, as a metric that quantifies the smoothness of our signal of interest. We discuss \(\gamma \) in detail in Sect. 5.3, including examples for several transforms common in compressive sensing.
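For a given transform, \(\gamma \) can be computed directly from this definition. The following sketch (Python/NumPy, with helper names of our own) builds the centered DFT of Sect. 2.3 and confirms two cases discussed later in Sect. 5.3: \(\Psi = {\mathcal {F}}\) gives \(\gamma = 1\) while \(\Psi = {\mathcal {I}}_N\) gives \(\gamma = \sqrt{N}:\)

```python
import numpy as np

def centered_dft(N):
    # centered DFT with entries F[p, u] = e(-t_p (u - N~ - 1)) / sqrt(N)
    t = np.arange(N)/N - 0.5                 # uniform grid tau
    q = np.arange(N) - (N - 1)//2            # centered frequencies
    return np.exp(-2j*np.pi*np.outer(t, q))/np.sqrt(N)

def dft_incoherence(Psi):
    # gamma: max over columns of the l1-norm of their DFT coefficients
    F = centered_dft(Psi.shape[0])
    return np.abs(F.conj().T @ Psi).sum(axis=0).max()

N = 31
print(dft_incoherence(centered_dft(N)))      # 1.0       (Psi = F)
print(dft_incoherence(np.eye(N)))            # sqrt(N)   (Psi = identity)
```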

2.2 Deviation model

Section 3 will apply to random deviations \(\Delta \in {\mathbb {R}}^{m}\) whose entries are i.i.d. from any distribution \({\mathcal {D}}\) that obeys the following: for \(\delta \sim {\mathcal {D}},\) there exists some \(\theta \ge 0\) such that for all integers \(0<|j|\le \frac{N-1}{m}\) we have

$$\begin{aligned} \frac{2N}{m}|{\mathbb {E}}{{\textbf {e}}}(jm\delta )|\le \theta . \end{aligned}$$
(4)

This will be known as our deviation model. In our results, distributions with a smaller \(\theta \) parameter will require fewer samples and provide reduced error bounds. We postpone further discussion of the deviation model until Sect. 5.2, where we will also provide examples of deviations that fit our criteria. We note that the deviation model is most relevant when \(m<N.\) The case \(m\ge N\) is discussed in Sect. 4, which no longer requires this deviation model or the sparse signal model.
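For a candidate distribution, \(\theta \) can be estimated empirically by replacing the expectation in (4) with a sample mean over the relevant integer frequencies. A minimal Monte Carlo sketch (Python/NumPy, our own illustration), checked here against the jittered distribution of example 1 in Sect. 5.2:

```python
import numpy as np

def theta_mc(sampler, N, m, trials=200_000, rng=None):
    # estimate (2N/m) * max_j |E e(j m delta)| over 0 < j <= (N-1)/m, cf. (4)
    rng = rng if rng is not None else np.random.default_rng(0)
    deltas = sampler(trials, rng)
    jmax = (N - 1) // m
    return max((2*N/m) * abs(np.mean(np.exp(2j*np.pi*k*m*deltas)))
               for k in range(1, jmax + 1))

N, m = 2015, 201
jitter = lambda n, rng: rng.uniform(-1/(2*m), 1/(2*m), n)
print(theta_mc(jitter, N, m))   # ~0 up to Monte Carlo error (theta = 0)
```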

2.3 Dirichlet kernel

In Sects. 3 and 4, we model our nonuniform samples via an interpolation kernel \({\mathcal {S}}\in {\mathbb {R}}^{m\times N}\) that achieves \({\mathcal {S}}f\approx \tilde{f}\) accurately. We consider the Dirichlet kernel defined by \({\mathcal {S}} = {\mathcal {N}}{\mathcal {F}}^*:{\mathbb {C}}^{N}\rightarrow {\mathbb {C}}^{m},\) where \({\mathcal {F}}\in {\mathbb {C}}^{N\times N}\) is a 1D centered discrete Fourier transform (DFT) and \({\mathcal {N}}\in {\mathbb {C}}^{m\times N}\) is a 1D centered nonuniform discrete Fourier transform (NDFT, see [49, 50]) with normalized rows and non-harmonic frequencies chosen according to \(\tilde{\tau }.\) In other words, let \(\tilde{N} = \frac{N-1}{2},\) then the \((k, \ell ) \in [m]\times [N]\) entry of \({\mathcal {N}}\) is given as

$$\begin{aligned} {\mathcal {N}}_{k\ell } = \frac{1}{\sqrt{N}}{{\textbf {e}}}(-\tilde{t}_k(\ell -\tilde{N}-1)). \end{aligned}$$

This NDFT is referred to as a nonuniform discrete Fourier transform of type 2 in [50]. Thus, the action of \({\mathcal {S}}\) on \(f\in {\mathbb {C}}^{N}\) can be given as follows: we first apply the centered inverse DFT to our discrete uniform data

$$\begin{aligned} \check{f}_u := ({\mathcal {F}}^*f)_u = \sum _{p=1}^{N}f_p{\mathcal {F}}^*_{up} = \frac{1}{\sqrt{N}}\sum _{p=1}^{N}f_p{{\textbf {e}}}(t_p(u-\tilde{N}-1)),\quad \forall u\in [N], \end{aligned}$$
(5)

followed by the NDFT in terms of \(\tilde{\tau }\):

$$\begin{aligned} ({\mathcal {S}}f)_k := ({\mathcal {N}}\check{f})_k = \sum _{u=1}^{N}\check{f}_u{\mathcal {N}}_{ku} = \frac{1}{\sqrt{N}}\sum _{u=1}^{N} \check{f}_u{{\textbf {e}}}(-\tilde{t}_k(u-\tilde{N}-1)),\quad \forall k\in [m]. \end{aligned}$$
(6)

Equivalently,

$$\begin{aligned} ({\mathcal {S}}f)_k =\frac{1}{N}\sum _{p=1}^{N}f_p{{\textbf {K}}}(\tilde{t}_{k}-t_p)\quad \forall k\in [m], \end{aligned}$$
(7)

where \({{\textbf {K}}}(x) = \frac{\sin {(N\pi x})}{\sin {(\pi x)}}\) is the Dirichlet kernel. This equality is well known and holds by applying the geometric series formula upon expansion. This kernel is commonly used for trigonometric interpolation and is accurate when acting on signals that can be well approximated by trigonometric polynomials of finite order, as we show in the following theorem.
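Both representations of \({\mathcal {S}}\) are straightforward to implement. As a sanity check before stating the theorem, the following sketch (Python/NumPy, hypothetical small dimensions; all names are ours) constructs \({\mathcal {S}}\) directly via (7) and via the DFT/NDFT factorization, and confirms that the two agree:

```python
import numpy as np

def dirichlet_S(N, tt):
    # direct construction via (7); for N odd, K(x) -> N as x -> any integer
    t = np.arange(N)/N - 0.5
    x = tt[:, None] - t[None, :]
    num, den = np.sin(N*np.pi*x), np.sin(np.pi*x)
    safe = np.where(np.isclose(den, 0.0), 1.0, den)
    return np.where(np.isclose(den, 0.0), float(N), num/safe) / N

def dirichlet_S_fourier(N, tt):
    # equivalent factorization S = N F^* (centered NDFT times inverse DFT)
    t = np.arange(N)/N - 0.5
    q = np.arange(N) - (N - 1)//2
    F  = np.exp(-2j*np.pi*np.outer(t,  q))/np.sqrt(N)   # centered DFT
    Nd = np.exp(-2j*np.pi*np.outer(tt, q))/np.sqrt(N)   # centered NDFT
    return (Nd @ F.conj().T).real

N, m = 31, 11
rng = np.random.default_rng(0)
tt = np.arange(m)/m - 0.5 + rng.uniform(-0.5/m, 0.5/m, m)
print(np.max(np.abs(dirichlet_S(N, tt) - dirichlet_S_fourier(N, tt))))  # ~1e-14
```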

Theorem 1

Let \({\mathcal {S}}, f\) and \(\tilde{f}\) be defined as above and \(\tilde{t}_k\in \Omega \) for some \(k\in [m].\) If \(\tilde{t}_k = t_p\) for some \(p\in [N]\) then

$$\begin{aligned} \left( \tilde{f}-{\mathcal {S}}f\right) _k = 0 \end{aligned}$$
(8)

and otherwise

$$\begin{aligned} \left( \tilde{f}-{\mathcal {S}}f\right) _k = \sum _{|\ell |>\tilde{N}}c_{\ell }\left( {{{\textbf {e}}}}(\ell \tilde{t}_k) -(-1)^{\lfloor \frac{\ell +\tilde{N}}{N}\rfloor }{{{\textbf {e}}}}(r(\ell )\tilde{t}_k)\right) , \end{aligned}$$
(9)

where \(r(\ell ) = {{\,\mathrm{rem\,\hspace{-2pt}}\,}}(\ell + \tilde{N},N) - \tilde{N}\) with \({{\,\mathrm{rem\,\hspace{-2pt}}\,}}(p,q)\) giving the remainder after division of p by q. As a consequence,  if \(\tilde{t}_k\in \Omega \) for all \(k\in [m]\) then for any integer \(1\le p< \infty \)

$$\begin{aligned} \Vert \tilde{f}-{\mathcal {S}}f\Vert _p \le 2m^{\frac{1}{p}}\sum _{|\ell |>\tilde{N}}|c_{\ell }|, \end{aligned}$$
(10)

and

$$\begin{aligned} \Vert \tilde{f}-{\mathcal {S}}f\Vert _{\infty } \le 2\sum _{|\ell |>\tilde{N}}|c_{\ell }|. \end{aligned}$$
(11)

The proof of this theorem is postponed until Sect. 8.3. Therefore, the error due to \({\mathcal {S}}\) is proportional to the 1-norm (or Wiener algebra norm) of the Fourier coefficients of \({{\textbf {f}}}\) that correspond to frequencies larger than \(\tilde{N} = \frac{N-1}{2}.\) In particular, notice that if \(c_{\ell } = 0\) for all \(|\ell | > \tilde{N}\) we obtain perfect interpolation, as expected from standard results in signal processing (i.e., bandlimited signals consisting of trigonometric polynomials with finite degree \(\le \tilde{N}).\) Despite the wide usage of trigonometric interpolation in applications [51,52,53], a result giving such a sharp error bound does not seem to exist in the literature.

Notice that Theorem 1 only holds for \(\tilde{\tau }\subset \Omega \) as restricted in Sect. 2.1. However, the result continues to hold for unrestricted \(\tilde{\tau }\) if we sample on the torus as imposed in Sect. 2.1. Therefore, the error bound will always hold under our setup.
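Theorem 1 is also easy to verify numerically: for a trigonometric polynomial of degree at most \(\tilde{N},\) interpolation is exact up to floating point error. A small self-contained check (Python/NumPy, our own illustration):

```python
import numpy as np

N, m = 31, 17
Nt = (N - 1)//2
rng = np.random.default_rng(1)
q = np.arange(-Nt, Nt + 1)                    # frequencies |l| <= Nt
c = rng.standard_normal(N) + 1j*rng.standard_normal(N)
t  = np.arange(N)/N - 0.5                     # uniform grid
tt = np.sort(rng.uniform(-0.5, 0.5, m))       # arbitrary nonuniform grid
f  = np.exp(2j*np.pi*np.outer(t,  q)) @ c     # uniform samples of f
ft = np.exp(2j*np.pi*np.outer(tt, q)) @ c     # nonuniform samples of f
F  = np.exp(-2j*np.pi*np.outer(t,  q))/np.sqrt(N)
Nd = np.exp(-2j*np.pi*np.outer(tt, q))/np.sqrt(N)
S  = (Nd @ F.conj().T).real                   # Dirichlet kernel of Sect. 2.3
print(np.max(np.abs(ft - S @ f)))             # ~1e-13, i.e., exact as in (8)
```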

3 Anti-aliasing via nonuniform sampling

With the definitions and assumptions introduced in Sect. 2, our methodology in this section will consist of modeling our m nonuniform measurements via \({\mathcal {S}}\) and approximating the s largest coefficients of f in \(\Psi \) (in the representation \(f=\Psi g).\) This discrete approximation will provide an accurate estimate \({{\textbf {f}}}^{\sharp }(x)\) of \({{\textbf {f}}}(x)\) for all \(x\in \Omega ,\) achieving precision comparable to that given by the best \(\frac{N}{2}\)-bandlimited approximation of \({{\textbf {f}}}\) while requiring \(m\ll N\) samples.

The following is a simplified statement, assuming that \(\Psi \) is an orthonormal basis and \(m\le N.\) We focus on this cleaner result for ease of exposition, presented as a corollary of the main result in Sect. 8.1. The full statement considers the case \(m\ge N\) and allows for more general and practical \(\Psi \) that can reduce the sample complexity.

Theorem 2

Let \(2\le s\le N\) and \(m\le N,\) where m is the number of nonuniform samples. Under our signal model with Fourier expansion (3), let \(\Psi \in {\mathbb {C}}^{N\times N}\) be an orthonormal basis with DFT-incoherence parameter \(\gamma .\) Define the interpolation kernel \({\mathcal {S}}\) as in Sect. 2.3 with the entries of \(\Delta \) i.i.d. from any distribution satisfying our deviation model from Sect. 2.2 with \(\theta < 1.\)

Define

$$\begin{aligned} g^{\sharp }{:}{=}{{\,\mathrm{arg\,\hspace{-2pt}min}\,}}_{h\in {\mathbb {C}}^N}\lambda \Vert h\Vert _1+\frac{\sqrt{N}}{\sqrt{m}}\Vert {\mathcal {S}}\Psi h - b\Vert _2 \end{aligned}$$
(12)

with

$$\begin{aligned} 0<\lambda \le \frac{\sqrt{1-\theta }}{2\sqrt{2s}}. \end{aligned}$$

If

$$\begin{aligned} m \ge \frac{C_1\gamma ^2(1+\theta )}{(1-\theta )^2} s\log ^4\left( C_2N\right) \end{aligned}$$
(13)

where \(C_1\) and \(C_2\) are absolute constants,  then

$$\begin{aligned} \Vert f-\Psi g^{\sharp }\Vert _{2} \le \frac{8\epsilon _s(g)}{\sqrt{s}} + \left( \frac{4}{\lambda \sqrt{s}}+\frac{8\sqrt{2}}{\sqrt{1-\theta }} \right) \left( \frac{\sqrt{N}}{\sqrt{m}}\Vert d\Vert _2 + 2\sqrt{N}\sum _{|\ell |>\frac{N-1}{2}}|c_{\ell }|\right) \end{aligned}$$
(14)

with probability exceeding \(1-\frac{1}{N}.\)

Therefore, with \(m\sim s\log ^4(N)\) randomly perturbed samples we can recover f with error (14) proportional to the sparse model mismatch \(\epsilon _s(g),\) the noise level \(\Vert d\Vert _2,\) and the error of the best \(\frac{N-1}{2}\)-bandlimited approximation of \({{\textbf {f}}}\) in the Wiener algebra norm (i.e., \(\sum _{|\ell |>\frac{N-1}{2}}|c_{\ell }|).\) As a consequence, we can approximate \({{\textbf {f}}}(x)\) for all \(x\in \Omega \) as stated in the following corollary.

Corollary 1

Let \({{{\textbf {h}}}}:\Omega \rightarrow {\mathbb {C}}^N\) be the vector valued function defined entry-wise for \(\ell \in [N]\) as

$$\begin{aligned} {{{\textbf {h}}}}(x)_{\ell } {:}{=}\frac{1}{\sqrt{N}}{{\textbf {e}}}(-x(\ell -\tilde{N}-1)), \end{aligned}$$
(15)

and define the function \({{{\textbf {f}}}}^{\sharp }:\Omega \rightarrow {\mathbb {C}}\) via

$$\begin{aligned} {{{\textbf {f}}}}^{\sharp }(x) = \langle {{{\textbf {h}}}}(x),{\mathcal {F}}^*\Psi g^{\sharp }\rangle , \end{aligned}$$
(16)

where \(g^{\sharp }\) is given by (12).

Then,  under the assumptions of Theorem 2,

$$\begin{aligned} |{{{\textbf {f}}}}(x) - {{{\textbf {f}}}}^{\sharp }(x)|&\le \frac{8\epsilon _s(g)}{\sqrt{s}} + \left( \frac{4}{\lambda \sqrt{s}}+\frac{8\sqrt{2}}{\sqrt{1-\theta }}\right) \frac{\sqrt{N}}{\sqrt{m}}\Vert d\Vert _2\nonumber \\&\quad +\left( \frac{8\sqrt{N}}{\lambda \sqrt{s}}+\frac{16\sqrt{2N}}{\sqrt{1-\theta }}+2\right) \sum _{|\ell |>\frac{N-1}{2}}|c_{\ell }|\end{aligned}$$
(17)

holds for all \(x\in \Omega = [-\frac{1}{2},\frac{1}{2})\) with probability exceeding \(1-\frac{1}{N}.\)

The proof of this corollary is presented in Sect. 8.3. In the case \(\epsilon _s(g) = \Vert d\Vert _2 = 0,\) the results intuitively say that we can recover a \(\frac{N-1}{2}\)-bandlimited approximation of \({{\textbf {f}}}\) with \({\mathcal {O}}(s{{\,\mathrm{poly\,\hspace{-2pt}log}\,}}(N))\) random off-the-grid samples. In the case of equispaced samples, \({\mathcal {O}}(N)\) measurements are needed for the same quality of reconstruction by the Nyquist–Shannon sampling theorem (or by Theorem 1 directly). Thus, for compressible signals with \(s\ll N,\) random nonuniform samples provide a significant reduction in sampling complexity (undersampling) and simultaneously allow recovery of frequency components exceeding the sampling density (anti-aliasing). See Sect. 5 for further discussion.
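Evaluating the continuous estimate (16) amounts to evaluating a trigonometric polynomial whose coefficients are \({\mathcal {F}}^*\Psi g^{\sharp }.\) A minimal sketch (Python/NumPy), assuming N is odd and that a solution \(g^{\sharp }\) of (12) and the transform \(\Psi \) are already in scope:

```python
import numpy as np

def f_sharp_eval(x, g_sharp, Psi):
    # f#(x) = <h(x), F* Psi g#> from (15)-(16), x a scalar in [-1/2, 1/2)
    N = Psi.shape[0]
    q = np.arange(N) - (N - 1)//2               # centered frequencies
    t = np.arange(N)/N - 0.5
    F = np.exp(-2j*np.pi*np.outer(t, q))/np.sqrt(N)
    coeffs = F.conj().T @ (Psi @ g_sharp)       # F* Psi g#
    h_x = np.exp(-2j*np.pi*x*q)/np.sqrt(N)      # h(x) from (15)
    return np.vdot(h_x, coeffs)                 # np.vdot conjugates h(x)
```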

Notice that general denoising is not guaranteed in an undersampling scenario, due to the term \(\frac{\sqrt{N}}{\sqrt{m}}\Vert d\Vert _2\) in (14) and (17). In other words, one cannot expect the output estimate to reduce the measurement noise \(\Vert d\Vert _2\) since the factor \(\frac{\sqrt{N}}{\sqrt{m}}\ge 1\) appearing in our error bound implies an amplification of the input noise level. Such situations with limited samples are typical in compressive sensing and this noise amplifying behavior is demonstrated numerically in Sect. 6.3. In general, a practitioner must oversample (i.e., \(N< m\)) to attenuate the effects of generic noise. However, Theorem 2 and Corollary 1 state that nonuniform samples specifically attenuate aliasing noise.

4 Denoising via nonuniform oversampling

In this section, we show that reduction in the noise level introduced during acquisition is possible given nonuniform samples whose average density exceeds the Nyquist rate (relative to a desired bandwidth). While the implications of this section are not surprising in the context of classical sampling theory, to the best of our knowledge such guarantees do not exist in the literature when the sampling points are nonuniform.

By removing the sparse signal model (Sect. 2.1), deviation model (Sect. 2.2), and requiring \(m\ge N\) off-the-grid samples (on the torus, see Sect. 2.1), we now use the numerically cheaper program of least squares. To reiterate, \({{\textbf {f}}}\in A(\Omega )\) with Fourier expansion \(\sum _{\ell =-\infty }^{\infty }c_{\ell }{{\textbf {e}}}(\ell x)\) is our continuous signal of interest. With N odd, \(f \in {\mathbb {C}}^{N}\) is the discrete signal to be approximated, where \(f_k = {{\textbf {f}}}(t_{k})\) for \(t_k := \frac{k-1}{N}-\frac{1}{2}.\) The vector \(\tilde{f} \in {\mathbb {C}}^{m}\) encapsulates the nonuniformly sampled data where \(\tilde{f}_k = {{\textbf {f}}}(\tilde{t}_{k})\) for \(\tilde{t}_k := \frac{k-1}{m}-\frac{1}{2} + \Delta _k.\) Noisy nonuniform samples are given as

$$\begin{aligned} b = \tilde{f} + d \in {\mathbb {C}}^m, \end{aligned}$$

where the additive noise model, d,  does not incorporate off-the-grid corruption.

In this oversampling context, we provide a denoising result for a more general set of deviations.

Theorem 3

Let the entries of \(\Delta \) be i.i.d. from any distribution and define

$$\begin{aligned} f^{\sharp } {:}{=}{{\,\mathrm{arg\,\hspace{-2pt}min}\,}}_{h\in {\mathbb {C}}^N}\Vert {\mathcal {S}} h - b\Vert _2. \end{aligned}$$
(18)

If \(m= \kappa N\log (N)\) with \(\kappa \ge \frac{4}{\log \left( e/2\right) },\) then

$$\begin{aligned} \Vert f- f^{\sharp }\Vert _{2} \le \frac{2\sqrt{2}}{\sqrt{\kappa \log (N)}}\Vert d\Vert _2 + 4\sqrt{2N}\sum _{|\ell |>\frac{N-1}{2}}|c_{\ell }|\end{aligned}$$
(19)

with probability exceeding \(1-\frac{1}{N}.\)

The proof can be found in Sect. 8.2. In this scenario, we oversample relative to the \(\frac{N-1}{2}\)-bandlimited output by generating a set of samples with average density exceeding the Nyquist rate (step size \(\frac{1}{N}).\) With \(m\ge \kappa N\log (N)\) for \(\kappa \ge 1,\) bound (19) tells us that we can diminish the measurement noise level \(\Vert d\Vert _2\) by a factor \( \sim \frac{1}{\sqrt{\kappa \log (N)}}.\) The oversampling parameter \(\kappa \) may be varied for increased attenuation at the cost of denser sampling. We comment that the methodology from Sect. 3 with \(m\ge N\) also allows for denoising and similar error bounds (see Theorem 4). However, focusing on the oversampling case distinctly provides simplified results with many additional benefits.

In particular, here the deviations \(\Delta \) need not be from our deviation model in Sect. 2.2 and instead the result applies to perturbations generated by any distribution. This includes the degenerate distribution (deterministic), so the claim also holds in the case of equispaced samples. Furthermore, we no longer require the sparsity assumption and the result applies to all functions in the Wiener algebra. Finally, the recovery method (18) consists of standard least squares which can be solved cheaply relative to the square-root LASSO decoder (12) from the previous section.
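Numerically, (18) is an overdetermined least squares problem and can be handled by any standard solver. A one-function sketch (Python/NumPy, our own illustration), assuming the kernel \({\mathcal {S}}\) and data b have been formed as in Sect. 2.3:

```python
import numpy as np

def denoise_ls(S, b):
    # least squares estimate (18): S is the m x N Dirichlet kernel, m >= N
    f_sharp, *_ = np.linalg.lstsq(S.astype(complex), b, rcond=None)
    return f_sharp
```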

We may proceed analogously to Corollary 1 and show that the output discrete signal \(f^{\sharp }\) provides a continuous approximation

$$\begin{aligned} {{\textbf {f}}}^{\sharp }(x) {:}{=}\langle {{\textbf {h}}}(x),{\mathcal {F}}^*f^{\sharp }\rangle \approx {{\textbf {f}}}(x) \end{aligned}$$

for all \(x\in \Omega ,\) where \({{\textbf {h}}}(x)\) is defined in (15). The error of this estimate is bounded as

$$\begin{aligned} |{{\textbf {f}}}^{\sharp }(x)-{{\textbf {f}}}(x)|\le \frac{2\sqrt{2}}{\sqrt{\kappa \log (N)}}\Vert d\Vert _2 + \left( 4\sqrt{2N}+2\right) \sum _{|\ell |>\frac{N-1}{2}}|c_{\ell }|, \end{aligned}$$

proportional to the error of the best \(\frac{N-1}{2}\)-bandlimited approximation in the Wiener algebra norm while attenuating the introduced measurement noise. In the result, the structure of the deviated samples is quite general and accounts for many practical cases.

While related results exist in the equispaced case (see for example Sect. 4 of [10]), Theorem 3 is the first such statement in a general non-equispaced case. The result therefore provides insight into widely applied techniques for the removal of unwanted noise, without making any assumptions on the noise structure.

5 Discussion

This section elaborates on several aspects of the results. Section 5.1 discusses relevant work in the literature. Section 5.2 provides examples of distributions that satisfy our deviation model and intuition of its meaning. Section 5.3 explores the \(\gamma \) parameter with examples of transformations \(\Psi \) that produce a satisfiable sampling complexity.

5.1 Related work

Several studies in the compressive sensing literature are similar to our results in Sect. 3 [53, 54]. In contrast to these references, we derive recovery guarantees for non-orthonormal systems (when \(\theta \ne 0\)) while focusing the scope of the paper within the context of classical sampling theory (introducing error according to the bandlimited approximation). The work in [53] considers sampling of sparse trigonometric polynomials and overlaps with our application in the case \(\Psi = {\mathcal {F}}.\) Our results generalize this work to allow for other signal models and sparsifying transforms. Furthermore, [53] assumes that the samples are chosen uniformly at random from a continuous interval or a discrete set of N equispaced points. In contrast, our results pertain to general deviations from an equispaced grid with average sampling density \(\sim s{{\,\mathrm{poly\,\hspace{-2pt}log}\,}}(N)\) and allow for many other distributions of the perturbations.

5.2 Deviation model

In this section, we present several examples of distributions that are suitable for our results in Sect. 3. Notice that our deviation model utilizes the characteristic function of a given distribution, evaluated at a finite set of points. This allows one to easily consider many distributions for our purpose by consulting the relevant and exhaustive literature of characteristic functions (see for example [55]).

1. Uniform continuous: \({\mathcal {D}} = {\mathcal {U}}[-\frac{1}{2m},\frac{1}{2m}]\) gives \(\theta = 0.\) To generalize this example, we may take \({\mathcal {D}} = {\mathcal {U}}[\mu -\frac{p}{2m},\mu +\frac{p}{2m}],\) for any \(\mu \in {\mathbb {R}}\) and \(p\in {\mathbb {N}}\setminus \{0\}\) to obtain \(\theta = 0\) (i.e., shift and dilate on the torus). Notice that with \(p=m,\) we obtain i.i.d. samples chosen uniformly from the whole interval \(\Omega \) (as in [53]).

2. Uniform discrete: \({\mathcal {D}} = {\mathcal {U}}\{- \frac{1}{2m} + \frac{k}{m\bar{n}}\}_{k=0}^{\bar{n}-1}\) with \(\bar{n}:=\lceil \frac{2(N-1)}{m}\rceil + 1\) gives \(\theta = 0.\) To generalize, we may shift and dilate \({\mathcal {D}} = {\mathcal {U}}\{\mu - \frac{p}{2m} + \frac{pk}{m\bar{n}}\}_{k=0}^{\bar{n}-1},\) for any \(\mu \in {\mathbb {R}},\) \(p\in {\mathbb {N}}\setminus \{0\}\) and integer \(\bar{n}>\frac{2(N-1)p}{m},\) and obtain \(\theta = 0\) as well.

3. Normal: \({\mathcal {D}} = {\mathcal {N}}(\mu ,\bar{\sigma }^2),\) for any mean \(\mu \in {\mathbb {R}}\) and variance \(\bar{\sigma }^2>0\) (a numerical check of examples 3-5 is sketched after this list). Here

    $$\begin{aligned} \theta = \frac{2N}{m}e^{-2(\bar{\sigma }\pi m)^2}. \end{aligned}$$

    In particular, for fixed \(\bar{\sigma },\) m may be chosen large enough to satisfy the conditions of Theorem 4 and vice versa.

4. Laplace: \({\mathcal {D}} = {\mathcal {L}}(\mu ,b),\) for any location \(\mu \in {\mathbb {R}}\) and scale \(b>0\) gives

    $$\begin{aligned} \theta = \frac{2N}{m(1+(2\pi bm)^2)}. \end{aligned}$$
5. Exponential: \({\mathcal {D}} = \text{ Exp }(\lambda ),\) for any rate \(\lambda >0\) gives

    $$\begin{aligned} \theta = \frac{2N}{m\sqrt{1+4\pi ^2m^2\lambda ^{-2}}}. \end{aligned}$$
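The closed forms in examples 3-5 make it easy to check whether the assumption \(\theta < 1\) of Theorem 2 holds for concrete parameter values. A small sketch with the dimensions of Sect. 6 and hypothetical parameter choices of our own:

```python
import numpy as np

N, m = 2015, 201                        # dimensions from Sect. 6
sigma_bar, b, lam = 2e-3, 5e-3, 50.0    # hypothetical parameter choices
theta_normal  = (2*N/m) * np.exp(-2*(sigma_bar*np.pi*m)**2)      # ~0.83
theta_laplace = (2*N/m) / (1 + (2*np.pi*b*m)**2)                 # ~0.49
theta_exp     = (2*N/m) / np.sqrt(1 + 4*np.pi**2*m**2/lam**2)    # ~0.79
print(theta_normal, theta_laplace, theta_exp)   # all below 1
```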

In particular, notice that examples 1 and 2 include cases of jittered sampling [30, 42,43,44]. Indeed, with \(p=1\) these examples partition \(\Omega \) into m regions of equal size and these distributions will choose a point randomly from each region (in a continuous or discrete sense). The jittered sampling list can be expanded by considering other distributions to generate samples within each region.

In general we will have \(\theta > 0,\) which implies deteriorated output quality and an increased number of required off-the-grid samples according to Theorem 2. Arguably, our deviation model introduces a notion of optimal jitter when the chosen distribution achieves \(\theta = 0,\) the ideal case in our results. This observation may be of interest in the active literature of jittered sampling techniques [30].

Intuitively, \(\theta \) is measuring how biased a given distribution is in generating deviations. If \(\delta \sim {\mathcal {D}},\) \(|{\mathbb {E}}{{\textbf {e}}}(jm\delta )|\approx 0\) means that the distribution is nearly centered and impartial. On the other hand, \(|{\mathbb {E}}{{\textbf {e}}}(jm\delta )|\approx 1\) gives the opposite interpretation where the deviations will be generated favoring a certain direction in an almost deterministic sense. Our result is not applicable to such biased distributions, since in Theorem 2 as \(\theta \rightarrow 1\) the error bound becomes unbounded and meaningless.

5.3 Signal model

In this section we discuss the DFT-incoherence parameter \(\gamma ,\) introduced in Sect. 2.1 as

$$\begin{aligned} \gamma = \max _{\ell \in [n]} \sum _{k=1}^{N}|\langle {\mathcal {F}}_{*k},\Psi _{*\ell }\rangle |, \end{aligned}$$

where we now let \(\Psi \in {\mathbb {C}}^{N\times n}\) be a full column rank matrix with \(n\le N.\) The parameter \(\gamma \) is a uniform upper bound on the 1-norm of the discrete Fourier coefficients of the columns of \(\Psi .\) Since the decay of the Fourier coefficients of a function is related to its smoothness, intuitively \(\gamma \) can be seen as a measure of the smoothness of the columns of \(\Psi .\) Implicitly, this also measures the smoothness of \({{\textbf {f}}}\) since its uniform discretization admits a representation via this transformation \(f = \Psi g.\)

Therefore, the role of \(\gamma \) in the sampling complexity is clear: a relatively small \(\gamma \) implies that our signal of interest is smooth and therefore requires fewer samples. This observation is intuitive, since non-smooth functions will require additional samples to capture discontinuities in accordance with the Gibbs phenomenon. This argument is validated numerically in Sect. 6.1, where we compare reconstruction via an infinitely differentiable ensemble (FFT) and a discontinuous wavelet (Daubechies 2).

We now consider several common choices for \(\Psi \) and discuss the respective \(\gamma \) parameter:

1. If \(\Psi = {\mathcal {F}}\) (the DFT), then \(\gamma = 1,\) which is optimal. However, most appropriate and common is the choice \(\Psi = {\mathcal {F}}^*,\) which can be shown to exhibit \(\gamma \sim {\mathcal {O}}(1)\) by a simple calculation.

2. When \(\Psi = {\mathcal {H}}^*\) is the inverse 1D Haar wavelet transform, we have \(\gamma \sim {\mathcal {O}}(\log (N)).\) In [56] it is shown that the inner products between rows of \({\mathcal {F}}\) and rows of \({\mathcal {H}}\) decay according to an inverse power law of the frequency (see Lemma 1 therein). A similar proof shows that \(|\langle {\mathcal {F}}_{*k},{\mathcal {H}}^{*}_{*\ell }\rangle |\sim \frac{1}{|k|},\) which gives the desired upper bound for \(\gamma \) via an integral comparison. Notice that these basis vectors have jump discontinuities, and yet we still obtain an acceptable DFT-incoherence parameter for nonuniform undersampling.

3. \(\Psi = {\mathcal {I}}_N\) (the \(N\times N\) identity) gives \(\gamma = \sqrt{N}.\) This is the worst case scenario for normalized transforms since

    $$\begin{aligned} \max _{v\in S^{N-1}} \sum _{k=1}^{N}|\langle {\mathcal {F}}_{*k},v\rangle |= \max _{v\in S^{N-1}} \sum _{k=1}^{N}|\langle {\mathcal {F}}_{*k},{\mathcal {F}}^*v\rangle |= \max _{v\in S^{N-1}} \sum _{k=1}^{N}|v_k|= \sqrt{N}. \end{aligned}$$

    In general, our smooth signals of interest are not fit for this sparsity model.

4. Let \(p\ge 1\) be an integer and consider matrices \(\Psi \) whose columns are uniform discretizations of p-differentiable functions with \(p-1\) continuous periodic derivatives and a piecewise continuous p-th derivative. In this case \(\gamma \sim {\mathcal {O}}(\log (N))\) if \(p=1\) and \(\gamma \sim {\mathcal {O}}(1)\) if \(p\ge 2.\) For the sake of brevity we do not provide this calculation, but refer the reader to Section 2.8 in [57] for an informal argument.

Example 4 is particularly informative due to its generality and ability to somewhat formalize the intuition behind \(\gamma \) previously discussed. This example implies the applicability of our result to a general class of smooth functions that agree nicely with our signal model defined in Sect. 2.1 (functions in \(A(\Omega )).\)

6 Numerical experiments

In this section we present numerical experiments to explore several aspects of our methodology and results. Specifically, we consider the effects of the DFT-incoherence and \(\theta \) parameter in Sects. 6.1 and 6.2 respectively. Section 6.3 investigates the noise attenuation properties of nonuniform samples. We first introduce several terms and models to describe the setup of the experiments. Throughout we let \(N = 2015\) be the size of the uniformly discretized signal f.

Program (1) with \(\lambda = \frac{1}{2\sqrt{2s}}\) is solved using CVX [58, 59], a MATLAB\(^{\circledR }\) optimization toolbox for solving convex problems. We implement the Dirichlet kernel using (7) directly to construct \({\mathcal {S}}.\) We warn the reader that in this section we have not dedicated much effort to optimize the numerical complexity of the interpolation kernel. For a faster implementation, we recommend instead applying the DFT/NDFT representation \({\mathcal {S}}= {\mathcal {N}}{\mathcal {F}}^*\) (see Sect. 2.3) using NFFT 3 software from [49] or its parallel counterpart [60].
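For readers outside the MATLAB ecosystem, an analogous setup is a few lines in CVXPY. The following sketch is our own illustration (not the code used for the experiments), assuming the matrices \({\mathcal {S}},\) \(\Psi \) and data b have been formed:

```python
import numpy as np
import cvxpy as cp

def sqrt_lasso(S, Psi, b, s):
    # square-root LASSO (1) with lambda = 1/(2 sqrt(2s)), as in this section
    m, N = S.shape
    lam = 1.0/(2.0*np.sqrt(2.0*s))
    h = cp.Variable(N, complex=True)
    obj = lam*cp.norm1(h) + np.sqrt(N/m)*cp.norm(S @ Psi @ h - b, 2)
    cp.Problem(cp.Minimize(obj)).solve()
    return h.value
```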

Given output \(f^{\sharp } = \Psi g^{\sharp }\) with true solution f,  we consider the relative norm of the reconstruction error as a measure of output quality, given as

$$\begin{aligned} \text{ Relative } \text{ Error } = \frac{\Vert f^{\sharp }-f\Vert _2}{\Vert f\Vert _2}. \end{aligned}$$

Grid perturbations: To construct the nonuniform grid \(\tilde{\tau },\) we introduce an irregularity parameter \(\rho \ge 0.\) We define our perturbations by sampling from a uniform distribution, so that each \(\Delta _k\) is drawn uniformly at random from \([-\frac{\rho }{m},\frac{\rho }{m}]\) for all \(k\in [m]\) independently. Off-the-grid samples \(\tilde{\tau }\) are generated independently for each signal reconstruction experiment.
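A short sketch of this perturbation scheme (Python/NumPy, names ours):

```python
import numpy as np

def perturbed_grid(m, rho, rng):
    # equispaced base grid plus i.i.d. uniform deviations from [-rho/m, rho/m]
    base = np.arange(m)/m - 0.5
    return base + rng.uniform(-rho/m, rho/m, size=m)

tau_tilde = perturbed_grid(201, 0.5, np.random.default_rng(0))
```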

Complex exponential signal model: We consider bandlimited complex exponentials with random harmonic frequencies. With bandwidth \(\omega = \frac{N-1}{2} = 1007,\) and sparsity level \(s = 50\) we generate \({\vec {\omega }}\in {\mathbb {Z}}^s\) by choosing s frequencies uniformly at random from \(\{-\omega ,-\omega +1,\ldots ,\omega \}\) and let

$$\begin{aligned} {{\textbf {f}}}(x) = \sum _{k=1}^{s}{{\textbf {e}}}\left( {\vec {\omega }}_k x\right) . \end{aligned}$$

We use the DFT as a sparsifying transform \(\Psi = {\mathcal {F}}\) so that \(g = \Psi ^{*} f = \Psi ^{-1} f\) is a 50-sparse vector. This transform is implemented using MATLAB’s fft function. The frequency vector, \({\vec {\omega }},\) is generated randomly for each independent set of experiments. Note that in this case we have optimal DFT-incoherence parameter \(\gamma = 1\) (see Sect. 5.3).
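A sketch of this signal model (Python/NumPy; the paper's experiments use MATLAB, and we assume here that the s random frequencies are drawn without repetition):

```python
import numpy as np

def complex_exponential_model(N, s, rng):
    # s harmonic frequencies drawn from {-(N-1)/2, ..., (N-1)/2}
    omega = (N - 1)//2
    w = rng.choice(np.arange(-omega, omega + 1), size=s, replace=False)
    f = lambda x: np.exp(2j*np.pi*np.outer(np.atleast_1d(x), w)).sum(axis=1)
    return f, w

f, w = complex_exponential_model(2015, 50, np.random.default_rng(0))
```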

Gaussian signal model: We consider a non-bandlimited signal consisting of sums of Gaussian functions. This signal model is defined as

$$\begin{aligned} {{\textbf {f}}}(x) = -e^{-100x^2}+e^{-100(x-.104)^2}-e^{-100(x+.217)^2}. \end{aligned}$$

For this dataset, we use the Daubechies 2 wavelet as a sparsifying transform \(\Psi ,\) implemented using the Rice Wavelet Toolbox [61]. This provides \(g = \Psi ^{*} f = \Psi ^{-1} f\) that can be well approximated by a 50-sparse vector. In other words, all entries of g are non-zero but \(\epsilon _{50}(g) < .088 \approx \frac{\Vert f\Vert _2}{250}\) and if \(g_{50}\) is the best 50-sparse approximation of g then \(\Vert f-\Psi g_{50}\Vert _2 < .026 \approx \frac{\Vert f\Vert _2}{850}.\) The smallest singular value of the transform is \(\sigma _{2015}(\Psi ) = 1\) and we have \(\gamma \approx 36.62,\) computed numerically.

6.1 Effect of DFT-incoherence

This section is dedicated to exploring the effect of the DFT-incoherence parameter in signal reconstruction. We consider the complex exponential and Gaussian signal models described above. Recall that in the complex exponential model we have \(\Psi = {\mathcal {F}}\) (the DFT) with optimal DFT-incoherence parameter \(\gamma = 1.\) In the Gaussian model \(\Psi \) is the Daubechies 2 wavelet with \(\gamma \approx 36.62.\) Varying the number of nonuniform samples, we will compare the quality of reconstruction using both signal models with respective transforms to investigate the role of \(\gamma \) in the reconstruction error. We consider the sparsity level \(s=50\) and solve (1) with \(\lambda = \frac{1}{2\sqrt{2s}} = \frac{1}{20},\) though the Gaussian signal model is not 50-sparse in the Daubechies domain (see last paragraph of this subsection for further discussion).

Here we set irregularity parameter \(\rho = \frac{1}{2}\) to generate the deviations (so that \(\theta = 0\)) and vary the average step size of the nonuniform samples. We do so by letting m vary through the set \(\{\lfloor \frac{N}{1.5}\rfloor ,\lfloor \frac{N}{2}\rfloor ,\lfloor \frac{N}{2.5}\rfloor ,\ldots ,\lfloor \frac{N}{10}\rfloor \}.\) For each fixed value of m,  the average relative error is obtained by averaging the relative errors of 50 independent reconstruction experiments. The results are shown in Fig. 1, where we plot the average step size vs average relative reconstruction error.

Fig. 1

Plot of average relative reconstruction error vs average step size for both signal models. In the complex exponential model \((\Psi ={\mathcal {F}},\) the DFT) we have \(\gamma = 1\) and in the Gaussian signal model we have \(\gamma \approx 36.62\) (Daubechies 2 wavelet). Notice that the complex exponential model allows for reconstruction from larger step sizes in comparison to the Gaussian signal model

These experiments demonstrate the negative effect of larger DFT-incoherence parameters in signal reconstruction. Indeed, in Fig. 1 we see that the complex exponential model with \(\gamma =1\) allows for accurate reconstruction from larger step sizes. This is to be expected from Sect. 3, where the results imply that the Daubechies 2 wavelet will require more samples for successful reconstruction according to its parameter \(\gamma \approx 36.62.\)

To appropriately interpret these experiments, it is important to note that the signal from the Gaussian model is only compressible and does not exhibit a 50-sparse representation via the Daubechies transform. Arguably, this may render the experiments of this section inappropriate to purely determine the effect of \(\gamma \) since the impact of approximating the Gaussian signal with a 50-sparse vector may be significant and produce an unfair comparison (i.e., due to the sparse model mismatch term \(\epsilon _{50}(g)\) appearing in our error bound (14)). This is important for the reader to keep in mind, but we argue that the effect of this mismatch is negligible since in the Gaussian signal model with \(g = \Psi ^{-1} f\) we have \(\epsilon _{50}(g) < \frac{\Vert f\Vert _2}{250}\) and if \(g_{50}\) is the best 50-sparse approximation of g then \(\Vert f-\Psi g_{50}\Vert _2 < \frac{\Vert f\Vert _2}{850}.\) This argument can be further validated with modified numerical experiments where f does have a 50-sparse representation in the Daubechies domain, producing reconstruction errors with behavior and magnitude identical to those in Fig. 1. Therefore, we believe our results here are informative to understand the impact of \(\gamma .\) For brevity, we do not present these modified experiments since such an f would no longer satisfy the Gaussian signal model and would complicate our discussion.

6.2 Effect of the deviation model parameter

In this section we generate the deviations in such a way as to vary the deviation model parameter \(\theta ,\) in order to explore its effect on signal reconstruction. We only consider the complex exponential signal model for this purpose and fix \(m = \lfloor \frac{N}{10}\rfloor = 201.\)

We vary \(\theta \) by generating deviations with irregularity parameter \(\rho \) varying over \(\{.001,.002,.003,\ldots ,.009\}\bigcup \{.01,.02,.03,\ldots ,.5\}.\) For each fixed \(\rho \) value we compute the average relative reconstruction error of 50 independent experiments. Notice that for each \(k\in [m]\) and any j

$$\begin{aligned} {\mathbb {E}}{{\textbf {e}}}\left( jm\Delta _k\right) = \frac{\sin \left( 2\pi j \rho \right) }{2\pi j \rho }. \end{aligned}$$

Given \(\rho ,\) we use this observation and definition (4) to compute the respective \(\theta \) value by considering the maximum of the expression above over all \(0<|j|\le \lfloor \frac{N-1}{m}\rfloor = 10.\) The relationship between \(\rho \) and \(\theta \) is illustrated in Fig. 2 (right plot), where smaller irregularity parameters \(\rho \approx 0\) provide larger deviation model parameters \(\theta .\)
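Concretely, the \(\theta \) value associated with each \(\rho \) can be computed as follows (a small sketch in Python/NumPy):

```python
import numpy as np

def theta_from_rho(rho, N=2015, m=201):
    # (2N/m) * max over 0 < j <= floor((N-1)/m) of |sin(2 pi j rho)/(2 pi j rho)|
    j = np.arange(1, (N - 1)//m + 1)
    return (2*N/m) * np.abs(np.sin(2*np.pi*j*rho)/(2*np.pi*j*rho)).max()

print(theta_from_rho(0.5))     # ~0: the maximally jittered case
print(theta_from_rho(0.001))   # ~20.05: nearly equispaced, theta -> 2N/m
```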

According to (4), this allows \(\theta \in [0,20.05),\) which violates the assumption \(\theta < 1\) of Theorem 2 and does not allow (1) to be implemented with a parameter in the required range

$$\begin{aligned} 0<\lambda \le \frac{\sqrt{1-\theta }}{2\sqrt{2s}}. \end{aligned}$$

Despite this, we implement all experiments in this section with \(\lambda = \frac{1}{2\sqrt{2s}} = \frac{1}{20}\) (where \(s=50).\) Such a fixed choice may not provide a fair set of results, since the parameter is not adapted in any way to the deviation model. Regardless, the experiments will prove to be informative while revealing the robustness of the square-root LASSO decoder with respect to parameter selection.

Figure 2 plots \(\theta \) vs average relative reconstruction error (left plot). In the plot, our main result (Theorem 2) is only strictly applicable in three cases (outlined in red, \(\theta = 0,.409,.833).\) However, the experiments demonstrate that decent signal reconstruction may be achieved when the condition \(\theta < 1\) does not hold and the parameter \(\lambda \) is not chosen appropriately. Therefore, the applicability of the methodology goes beyond the restrictions of the theorem and the numerical results demonstrate the flexibility of the square-root LASSO decoder.

Fig. 2

(Left) Plot of average relative reconstruction error vs corresponding \(\theta \) parameter and (right) plot illustrating the relationship between the irregularity parameter \(\rho \) and the deviation model parameter \(\theta .\) The plots emphasize via red outlines the \(\theta \) values that satisfy the conditions of Theorem 2 (i.e., \(\theta < 1).\) Although our results only hold for three \(\theta \) values (0, .409, .833),  the experiments demonstrate that accurate recovery is possible otherwise

6.3 Noise attenuation

This section explores the robustness of the methodology when presented with measurement noise, in both the undersampled and oversampled cases relative to the target bandwidth \(\frac{N-1}{2}\) (Sects. 3 and 4 respectively). We only solve the square-root LASSO problem (1) with \(\lambda = \frac{1}{2\sqrt{2s}} = \frac{1}{20},\) and avoid the least squares problem (18) for brevity. However, we note that both programs produce similar results and conclusions in the oversampled case (see Theorem 4). We only consider the bandlimited complex exponential signal model for this purpose. We generate additive random noise \(d\in {\mathbb {R}}^m\) from a uniform distribution: each entry of d is drawn i.i.d. from \([-\frac{\chi }{1000},\frac{\chi }{1000}]\) where \(\chi = \frac{1}{\sqrt{m}}\Vert f\Vert _1,\) chosen to maintain \(\Vert d\Vert _2\) relatively constant as m varies.
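In code, this noise model is two lines (a sketch with our own names, assuming the discrete signal f, the nonuniform samples f_tilde and m are in scope):

```python
import numpy as np
rng = np.random.default_rng(0)
chi = np.linalg.norm(f, 1)/np.sqrt(m)          # keeps ||d||_2 roughly constant in m
d = rng.uniform(-chi/1000, chi/1000, size=m)   # i.i.d. uniform noise entries
b = f_tilde + d                                # noisy nonuniform samples
```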

We set \(\rho = \frac{1}{2}\) to generate the deviations (so that \(\theta = 0\)) and vary the average step size of the nonuniform samples. We do so by letting m vary through the set \(\{\lfloor \frac{N}{.5}\rfloor ,\lfloor \frac{N}{.75}\rfloor ,N,\ldots ,\lfloor \frac{N}{6.75} \rfloor ,\lfloor \frac{N}{7}\rfloor \};\) notice that only the first two cases correspond to oversampling. For each fixed value of m, the relative reconstruction error is obtained by averaging the result of 50 independent experiments. The results are shown in Fig. 3, where we plot the average step size vs average relative reconstruction error and average relative input noise level \(\Vert d\Vert _2/\Vert f\Vert _2.\)

Fig. 3

Plot of average relative reconstruction error \((\Vert f-f^{\sharp }\Vert _2/\Vert f\Vert _2)\) vs average step size (blue curve) and average input relative measurement error \((\Vert d\Vert _2/\Vert f\Vert _2)\) vs average step size (red curve). Notice that the first 13 step size values achieve noise attenuation, i.e., the reconstruction error is lower than the input noise level

The first two cases \((m = \lfloor \frac{N}{.5}\rfloor ,\lfloor \frac{N}{.75}\rfloor \)) correspond to oversampling and illustrate the results from Sect. 4 (and Theorem 4), where attenuation of the input noise level is achieved. Surprisingly, these experiments demonstrate that nonuniform undersampling also allows for denoising. This is seen in Fig. 3, where the values \(m = \lfloor \frac{N}{1.25}\rfloor ,\lfloor \frac{N}{1.5}\rfloor ,\ldots ,\lfloor \frac{N}{3.5}\rfloor \) correspond to sub-Nyquist rates and output an average relative reconstruction error smaller than the input measurement noise level. Thus, when nonuniform samples are not severely undersampled, the negative effects of random noise can be reduced.

7 Conclusions

This paper provides a concrete framework to study the benefits of random nonuniform samples for signal acquisition (in comparison to equispaced sampling), with explicit statements that are informative for practitioners. Related observations are extensive but largely empirical in the sampling theory literature. Therefore, this work supplies novel theoretical insights on this widely discussed phenomenon. In the context of compressive sensing, we extend the applicability of this acquisition paradigm by demonstrating how it naturally intersects with standard sampling techniques. We hope that these observations will prompt a broader usage of compressive sensing in real world applications that rely on classical sampling theory.

There are several avenues for future research. First, the overall methodology requires the practitioner to know the nonuniform sampling locations \(\tilde{\tau }\) accurately. While this is typical for signal reconstruction techniques that involve non-equispaced samples, it would be of practical interest to extend the methodology in such a way that allows for robustness to inaccurate sampling locations and even self-calibration. Further, as mentioned in Sect. 6, this work has not dedicated much effort to a numerically efficient implementation of the Dirichlet kernel \({\mathcal {S}}.\) This is crucial for large-scale applications, where a direct implementation of the Dirichlet kernel via its Fourier or Dirichlet representation (see [62]) may be too inefficient for practical purposes. As future work, it would be useful to consider other interpolation kernels with greater numerical efficiency (e.g., a low order Lagrange interpolation operator).

Finally, to explore the undersampling and anti-aliasing properties of nonuniform samples, our results here require a sparse signal assumption and adopt compressive sensing methodologies. However, most work that first discussed this nonuniform sampling phenomenon precedes the introduction of compressive sensing and does not explicitly impose sparsity assumptions. Therefore, to fully determine the benefits provided by off-the-grid samples it would be most informative to consider a more general setting, e.g., only relying on the smoothness of continuous-time signals. We believe the work achieved here provides a potential avenue to do so.

8 Proofs

We now provide proofs to all of our claims. In Sect. 8.1 we prove Theorem 2 via a more general result. Theorem 3 is proven in Sect. 8.2. Section 8.3 establishes the Dirichlet kernel error bounds in Theorem 1 and Corollary 1.

8.1 Proof of Theorem 2

In this section, we will prove a more general result than Theorem 2 assuming that \(\Psi \) is a full column-rank matrix and allowing \(m\ge N.\) Theorem 2 will follow from Theorem 4 by taking \(\alpha = \beta = 1, n=N,\) and simplifying some terms.

Theorem 4

Let \(2\le s\le n\le N\) and \(\Psi \in {\mathbb {C}}^{N\times n}\) be a full column rank matrix with DFT-incoherence parameter \(\gamma \) and extreme singular values \(\sigma _1(\Psi ){:}{=}\beta \ge \alpha {:}{=}\sigma _n(\Psi )>0.\) Let the entries of \(\Delta \) be i.i.d. from any distribution satisfying our deviation model with \(\theta <1.\) Define

$$\begin{aligned} g^{\sharp }{:}{=}{{\,\mathrm{arg\,\hspace{-2pt}min}\,}}_{h\in {\mathbb {C}}^n}\lambda \Vert h\Vert _1+\frac{\sqrt{N}}{\sqrt{m}}\Vert {\mathcal {S}}\Psi h - b\Vert _2 \end{aligned}$$
(20)

with

$$\begin{aligned} 0<\lambda \le \frac{\alpha \sqrt{1-\theta }}{2\sqrt{2s}}. \end{aligned}$$

If

$$\begin{aligned} m&\ge \frac{C_1\gamma ^2\beta ^2(1+\theta )}{\alpha ^4(1-\theta )^2}s\nonumber \\&\quad \cdot \,\left( \log \left( \frac{C_2\gamma ^2\beta ^2(1+\theta )}{\alpha ^4(1-\theta )^2}s +2\right) \log ^2\left( \frac{C_2\beta ^2(1+\theta )}{\alpha ^2(1-\theta )}s\right) \log (n) + \log (n)\right) \nonumber \\ \end{aligned}$$
(21)

where \(C_1\) and \(C_2\) are absolute constants,  then

$$\begin{aligned} \Vert f-\Psi g^{\sharp }\Vert _{2} \le \frac{8\beta \epsilon _s(g)}{\sqrt{s}} +\left( \frac{4\beta }{\lambda \sqrt{s}}+\frac{8\beta \sqrt{2}}{\alpha \sqrt{1-\theta }}\right) \left( \frac{\sqrt{N}}{\sqrt{m}}\Vert d\Vert _2 + 2\sqrt{N}\sum _{|\ell |>\frac{N-1}{2}}|c_{\ell }|\right) \end{aligned}$$

with probability exceeding \(1-\frac{1}{n}.\)

This theorem generalizes Theorem 2 to more general sparsifying transformations \(\Psi .\) This is more practical since the columns of \(\Psi \) need not be orthogonal; linear independence suffices (with knowledge of the singular values \(\alpha ,\beta ).\) In particular, notice that (21) depends on n and does not involve N, as opposed to \(m\sim s\log ^4(N)\) in (13). Since \(n\le N,\) this general result allows for a potential reduction in sample complexity if the practitioner can construct such an efficient \(\Psi \) while still allowing a sparse and accurate representation of f.

Furthermore, notice that this more general result allows for oversampling \(m\ge n\) or \(m\ge N.\) If we apply Theorem 4 with \(s=n\) then \(\epsilon _s(g) = 0\) and we obtain an error bound similar to those in Sect. 4, reducing additive noise by a factor \(\frac{\sqrt{N}}{\sqrt{n}\log ^2(n)}\) from \(m\sim n\log ^4(n)\) off-the-grid samples. However, in this scenario the sparsifying transform is no longer of much relevance, and it is arguably best to consider the approach of Sect. 4, which removes the need to consider \(\gamma , \beta , \alpha ,\) and \(\theta \) via a numerically cheaper methodology and a more general set of deviations.
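Though not part of our theoretical development, we note for practitioners that program (20) is a convex problem of square-root LASSO type and can be handled by off-the-shelf solvers. The following is a minimal sketch (our own illustration, not an implementation used in this work), where `S`, `Psi`, `b`, and `lam` are assumed to be given as NumPy data:

```python
# A minimal sketch of solving program (20) with CVXPY (illustrative only).
# Assumptions: S is the m x N Dirichlet kernel matrix, Psi the N x n transform,
# b the m noisy nonuniform samples, and lam the regularization parameter.
import numpy as np
import cvxpy as cp

def solve_program_20(S, Psi, b, lam):
    m, N = S.shape
    n = Psi.shape[1]
    h = cp.Variable(n, complex=True)
    objective = lam * cp.norm(h, 1) + np.sqrt(N / m) * cp.norm(S @ Psi @ h - b, 2)
    cp.Problem(cp.Minimize(objective)).solve()
    return h.value  # g-sharp; the signal estimate is then Psi @ h.value
```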

To establish Theorem 4 we will consider the \({\mathcal {G}}\)-adjusted restricted isometry property \(({\mathcal {G}}\)-RIP) [63], defined as follows:

Definition 1

\(({\mathcal {G}}\)-adjusted restricted isometry property [63]) Let \(1\le s\le n\) and \({\mathcal {G}}\in {\mathbb {C}}^{n\times n}\) be invertible. The s-th \({\mathcal {G}}\)-adjusted Restricted Isometry Constant \(({\mathcal {G}}\)-RIC) \(\delta _{s,{\mathcal {G}}}\) of a matrix \({\mathcal {A}}\in {\mathbb {C}}^{m\times n}\) is the smallest \(\delta > 0\) such that

$$\begin{aligned} (1-\delta )\Vert {\mathcal {G}}v\Vert _2^2 \le \Vert {\mathcal {A}}v\Vert _2^2 \le (1+\delta )\Vert {\mathcal {G}}v\Vert _2^2 \end{aligned}$$

for all \(v\in \{z\in {\mathbb {C}}^{n} \ |\ \Vert z\Vert _0 \le s\}.\) If \(0< \delta _{s,{\mathcal {G}}} < 1\) then the matrix \({\mathcal {A}}\) is said to satisfy the \({\mathcal {G}}\)-adjusted Restricted Isometry Property \(({\mathcal {G}}\)-RIP) of order s.
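As a concrete (and purely illustrative) companion to this definition, the \({\mathcal {G}}\)-RIC of a small matrix can be computed by brute force over all size-s supports; on each support the requirement reduces to a generalized eigenvalue problem. A minimal sketch:

```python
# Brute-force computation of the s-th G-adjusted RIC (toy sizes only, since the
# number of supports grows combinatorially). On a support, the constraints
# (1-delta)||Gv||^2 <= ||Av||^2 <= (1+delta)||Gv||^2 hold iff the generalized
# eigenvalues of the pencil (A_S^* A_S, G_S^* G_S) lie in [1-delta, 1+delta].
import itertools
import numpy as np

def g_ric(A, G, s):
    n = A.shape[1]
    delta = 0.0
    for supp in itertools.combinations(range(n), s):
        idx = list(supp)
        AS, GS = A[:, idx], G[:, idx]
        eigs = np.linalg.eigvals(
            np.linalg.solve(GS.conj().T @ GS, AS.conj().T @ AS)).real
        delta = max(delta, np.max(np.abs(eigs - 1.0)))
    return delta

rng = np.random.default_rng(0)
m, n, s = 80, 12, 2
A = rng.standard_normal((m, n)) / np.sqrt(m)  # rows i.i.d.; E A^*A = identity
print(g_ric(A, np.eye(n), s))  # with G = I this is the classical RIC
```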

This property ensures that a measurement matrix is well conditioned when restricted to the set of s-sparse signals, allowing for successful compressive sensing from \(\sim s{{\,\mathrm{poly\,\hspace{-2pt}log}\,}}(n)\) measurements. Once this property is established for our measurement ensemble, Theorem 4 will follow by applying the following result:

Theorem 5

(Theorem 13.9 in [63]) Let \({\mathcal {G}}\in {\mathbb {C}}^{n\times n}\) be invertible and \({\mathcal {A}}\in {\mathbb {C}}^{m\times n}\) have the \({\mathcal {G}}\)-RIP of order q and constant \(0<\delta <1\) where

$$\begin{aligned} q = 2\lceil {4s\left( \frac{1+\delta }{1-\delta }\right) \Vert {\mathcal {G}} \Vert ^2\Vert {\mathcal {G}}^{-1}\Vert ^2}\rceil . \end{aligned}$$
(22)

Let \(g\in {\mathbb {C}}^{n},\) \(y = {\mathcal {A}}g+d\in {\mathbb {C}}^{m},\) and \(\lambda \le \frac{\sqrt{1-\delta }}{2\Vert {\mathcal {G}}^{-1}\Vert \sqrt{s}}.\) Then

$$\begin{aligned} g^{\sharp }={{\,\mathrm{arg\,\hspace{-2pt}min}\,}}_{h\in {\mathbb {C}}^n}\lambda \Vert h\Vert _1+\Vert {\mathcal {A}}h - y\Vert _2 \end{aligned}$$

satisfies

$$\begin{aligned} \Vert g-g^{\sharp }\Vert _2 \le \frac{8\epsilon _s(g)}{\sqrt{s}} + 8\left( \frac{1}{2\lambda \sqrt{s}}+\frac{\Vert {\mathcal {G}}^{-1}\Vert }{\sqrt{1-\delta }}\right) \Vert d\Vert _2. \end{aligned}$$
(23)

We therefore obtain our main result if we establish the \({\mathcal {G}}\)-RIP for

$$\begin{aligned} {\mathcal {A}}:=\sqrt{N}{\mathcal {S}}\Psi . \end{aligned}$$

To do so, we note that our measurement ensemble is generated from a nondegenerate collection of independent families of random vectors. Such random matrices have been shown to possess the \({\mathcal {G}}\)-RIP in the literature. To be specific, a nondegenerate collection is defined as follows:

Definition 2

(Nondegenerate collection [63]) Let \({\varvec{{\mathscr {A}}}}_1,\ldots , {{\varvec{\mathscr {A}}}}_m\) be independent families of random vectors on \({\mathbb {C}}^{n}.\) The collection \({{\varvec{\mathscr {C}}}}=\{{{\varvec{\mathscr {A}}}}_k\}_{k=1}^{m}\) is nondegenerate if the matrix

$$\begin{aligned} \frac{1}{m}\sum _{k=1}^{m}{\mathbb {E}}\left( a_ka_k^*\right) , \end{aligned}$$

where \(a_k\sim {{\varvec{\mathscr {A}}}}_k,\) is positive-definite. In this case, write \({\mathcal {G}}_{{{\varvec{\mathscr {C}}}}}\in {\mathbb {C}}^{n\times n}\) for its unique positive-definite square root.

Our ensemble fits this definition, with the rows of \({\mathcal {N}}\in {\mathbb {C}}^{m\times N}\) generated from a collection of m independent families of random vectors:

$$\begin{aligned} {\mathcal {N}}_{k*}^*&= \frac{1}{\sqrt{N}}\begin{bmatrix} {{\textbf {e}}}\left( -\tilde{N}\left( \frac{k-1}{m} - \frac{1}{2} + \Delta _k\right) \right) \\ {{\textbf {e}}}\left( -(\tilde{N}-1)\left( \frac{k-1}{m} - \frac{1}{2} + \Delta _k\right) \right) \\ \vdots \\ {{\textbf {e}}}\left( \tilde{N}\left( \frac{k-1}{m} - \frac{1}{2} + \Delta _k\right) \right) \end{bmatrix}\quad \text{ with } \ \Delta _k \sim {\mathcal {D}}. \end{aligned}$$
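For concreteness, these random rows are straightforward to instantiate numerically; the following sketch assumes \({{\textbf {e}}}(x) = e^{2\pi ix}\) and, purely for illustration, a uniform jitter standing in for \({\mathcal {D}}:\)

```python
# A sketch of the random rows of N. Assumptions: e(x) = exp(2*pi*1j*x) as in
# the paper's notation, and a uniform jitter standing in for the distribution D.
import numpy as np

def sample_rows_N(N, m, rng, jitter=0.01):
    Ntil = (N - 1) // 2
    u = np.arange(-Ntil, Ntil + 1)          # frequencies -Ntil, ..., Ntil
    delta = rng.uniform(-jitter, jitter, m)  # Delta_k i.i.d. (stand-in for D)
    t = np.arange(m) / m - 0.5 + delta       # nonuniform locations t~_k
    # Row k has entries N_{k,l} = (1/sqrt(N)) e(-t~_k (l - Ntil - 1)).
    return np.exp(-2j * np.pi * np.outer(t, u)) / np.sqrt(N)

rows = sample_rows_N(N=33, m=48, rng=np.random.default_rng(1))
print(rows.shape)  # (48, 33)
```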

Therefore, in our scenario, the k-th family \({{\varvec{\mathscr {A}}}}_k\) independently generates deviation \(\Delta _k \sim {\mathcal {D}}\) and produces a random vector of the form above as the k-th row of \({\mathcal {N}}.\) This in turn also generates the rows of \({\mathcal {A}}\) independently, since its k-th row is given as \(\sqrt{N}{\mathcal {N}}_{k*}{\mathcal {F}}^*\Psi .\) To apply \({\mathcal {G}}\)-RIP results from the literature for such matrices, we will have to consider the coherence of our collection:

Definition 3

(Coherence of an unsaturated collection \({{\varvec{\mathscr {C}}}}\) [63]) Let \({{\varvec{\mathscr {A}}}}_1,\ldots , {{\varvec{\mathscr {A}}}}_m\) be independent families of random vectors, with smallest constants \(\mu _1,\ldots ,\mu _m\) such that

$$\begin{aligned} \Vert a_k\Vert _{\infty }^{2}\le \mu _k \end{aligned}$$

holds almost surely for \(a_k\sim {{\varvec{\mathscr {A}}}}_k.\) The coherence of an unsaturated collection \({{\varvec{\mathscr {C}}}}=\{{{\varvec{\mathscr {A}}}}_k\}_{k=1}^{m}\) is

$$\begin{aligned} \mu ({{\varvec{\mathscr {C}}}}) = \max _{k\in [m]}\mu _k. \end{aligned}$$

In the above definition, a family \({{\varvec{\mathscr {A}}}}_k\) is saturated if it consists of a single vector, and a collection is unsaturated if no family in the collection is saturated. In our context, it is easy to see that the condition \(\theta < 1\) prevents saturation, so the definition above applies. The coherence of our collection of families will translate to the DFT-incoherence parameter defined in Sect. 2.1.

With these definitions in mind, we now state a simplified version of Theorem 13.12 in [63] that will show the \({\mathcal {G}}\)-RIP for our ensemble.

Theorem 6

Let \(0<\delta ,\epsilon <1,\) \(n\ge s\ge 2,\) and \({{\varvec{\mathscr {C}}}}=\{{{\varvec{\mathscr {A}}}}_k\}_{k=1}^{m}\) be a nondegenerate collection generating the rows of \({\mathcal {A}}.\) Suppose that

$$\begin{aligned} m \ge \frac{\tilde{c}_1\Vert {\mathcal {G}}_{{{\varvec{\mathscr {C}}}}}^{-1}\Vert ^2\mu ({{\varvec{\mathscr {C}}}})s}{\delta ^2}\left( \log \left( 2\left( \Vert {\mathcal {G}} _{{{\varvec{\mathscr {C}}}}}^{-1}\Vert ^2\mu ({{\varvec{\mathscr {C}}}})s + 1\right) \right) \log ^2(s)\log (n) + \log \left( \epsilon ^{-1}\right) \right) ,\nonumber \\ \end{aligned}$$
(24)

where \(\tilde{c}_1\) is an absolute constant. Then with probability at least \(1-\epsilon ,\) the matrix \({\mathcal {A}}\) has the \({\mathcal {G}}\)-RIP of order s with constant \(\delta _{s,{\mathcal {G}}} \le \delta .\)

In summary, to obtain Theorem 4 we will first show that \({\mathcal {A}}\) is generated by a nondegenerate collection with unique positive-definite square root \({\mathcal {G}}.\) Establishing this will provide upper bounds for \(\Vert {\mathcal {G}}^{-1}\Vert ,\) \(\Vert {\mathcal {G}}\Vert ,\) and \(\mu ({{\varvec{\mathscr {C}}}}).\) At this point, Theorem 6 will provide \({\mathcal {A}}\) with the \({\mathcal {G}}\)-RIP and subsequently Theorem 5 can be applied to obtain the error bounds.

To establish that the collection \({{\varvec{\mathscr {C}}}}=\{{{\varvec{\mathscr {A}}}}_k\}_{k=1}^{m}\) above is nondegenerate, it suffices to show that

$$\begin{aligned} \frac{1}{m}{\mathbb {E}}\Vert {\mathcal {A}}w\Vert _2^2 \le \beta ^2(1+\theta )\Vert w\Vert _2^2 \quad \text{ and }\quad \frac{1}{m}{\mathbb {E}}\Vert {\mathcal {A}}w\Vert _2^2 \ge \alpha ^2(1-\theta )\Vert w\Vert _2^2 \end{aligned}$$
(25)

for all \(w\in {\mathbb {C}}^{n}.\) This will show that \(\frac{1}{m}{\mathbb {E}}{\mathcal {A}}^*{\mathcal {A}}\) is positive-definite whenever the deviation model satisfies \(\theta < 1.\) Further, letting \({\mathcal {G}}\) be the unique positive-definite square root of \(\frac{1}{m}{\mathbb {E}}{\mathcal {A}}^*{\mathcal {A}},\) (25) will also show that

$$\begin{aligned} \Vert {\mathcal {G}}\Vert \le \beta \sqrt{1+\theta } \quad \text{ and }\quad \Vert {\mathcal {G}}^{-1}\Vert \le \frac{1}{\alpha \sqrt{1-\theta }}. \end{aligned}$$
(26)

To this end, let \(w\in {\mathbb {C}}^n\) and normalize \(\tilde{{\mathcal {N}}} := \sqrt{N}{\mathcal {N}}\) so that for \(k\in [m], \ell \in [N]\)

$$\begin{aligned} \tilde{{\mathcal {N}}}_{k\ell } := {{\textbf {e}}}(-\tilde{t}_k(\ell -\tilde{N}-1)). \end{aligned}$$

Throughout, let \(\tilde{\Delta }\in {\mathbb {R}}\) be an independent copy of the entries of \(\Delta \in {\mathbb {R}}^m.\) Then with \(v := {\mathcal {F}}^*\Psi w,\)

$$\begin{aligned}&\frac{1}{m}{\mathbb {E}}\Vert {\mathcal {A}}w\Vert _2^2 = \frac{1}{m}{\mathbb {E}}\Vert \tilde{{\mathcal {N}}}{\mathcal {F}}^*\Psi w\Vert _2^2 := \frac{1}{m}{\mathbb {E}}\Vert \tilde{{\mathcal {N}}}v\Vert _2^2\\&\quad = {\mathbb {E}}\frac{1}{m}\sum _{k=1}^{m}|\langle \tilde{{\mathcal {N}}}_{k*},v\rangle |^2 = {\mathbb {E}}\frac{1}{m}\sum _{k=1}^{m}\Biggl |\sum _{\ell =1}^{N}{{\textbf {e}}}(\tilde{t}_k (\ell -\tilde{N}-1))v_{\ell }\Biggr |^2\\&\quad = {\mathbb {E}}\frac{1}{m}\sum _{k=1}^{m}\left( \sum _{\ell =1}^{N}\sum _{\tilde{\ell }=1}^{N}{{\textbf {e}}} (\tilde{t}_k(\ell -\tilde{\ell }))v_{\ell }\bar{v}_{\tilde{\ell }}\right) \\&\quad = \sum _{\ell =1}^{N}\sum _{\tilde{\ell }=1}^{N}v_{\ell }\bar{v}_{\tilde{\ell }}\left( {\mathbb {E}} \frac{1}{m}\sum _{k=1}^{m}{{\textbf {e}}}(\tilde{t}_k(\ell -\tilde{\ell }))\right) \\&\quad = \sum _{\ell =1}^{N}|v_{\ell }|^2 + \sum _{j= 1}^{\lfloor (N-1)/m \rfloor }\sum _{\ell -\tilde{\ell } = jm}v_{\ell }\bar{v}_{\tilde{\ell }}{\mathbb {E}}{{\textbf {e}}}(jm(\tilde{\Delta }-1/2)) \\&\qquad + \sum _{j= 1}^{\lfloor (N-1)/m \rfloor }\sum _{\ell -\tilde{\ell } = -jm}v_{\ell }\bar{v}_{\tilde{\ell }} {\mathbb {E}}{{\textbf {e}}}(-jm(\tilde{\Delta }-1/2)). \end{aligned}$$

The last equality can be obtained as follows,

$$\begin{aligned}&{\mathbb {E}}\frac{1}{m}\sum _{k=1}^{m}{{\textbf {e}}}(\tilde{t}_k(\ell -\tilde{\ell })) ={\mathbb {E}}\frac{1}{m}\sum _{k=1}^{m}{{\textbf {e}}}\left( \left( \frac{k-1}{m}-\frac{1}{2} + \Delta _k\right) (\ell -\tilde{\ell })\right) \\&\quad = \frac{1}{m}\sum _{k=1}^{m}{{\textbf {e}}}\left( \left( \frac{k-1}{m}-\frac{1}{2}\right) (\ell -\tilde{\ell })\right) {\mathbb {E}}{{\textbf {e}}}(\Delta _k(\ell -\tilde{\ell }))\\&\quad = \frac{1}{m}\sum _{k=1}^{m}{{\textbf {e}}}\left( \left( \frac{k-1}{m}-\frac{1}{2}\right) (\ell -\tilde{\ell })\right) {\mathbb {E}}{{\textbf {e}}}(\tilde{\Delta }(\ell -\tilde{\ell }))\\&\quad = {\mathbb {E}}{{\textbf {e}}}\left( (\tilde{\Delta }-1/2)(\ell -\tilde{\ell })\right) \sum _{k=1}^{m}\frac{1}{m} {{\textbf {e}}}\left( \frac{k-1}{m}(\ell -\tilde{\ell })\right) \\&\quad =\left\{ \begin{array}{ll} 1 &{} \quad \text{ if } \ell = \tilde{\ell } \\ {\mathbb {E}}{{\textbf {e}}}\left( jm(\tilde{\Delta }-1/2)\right) &{} \quad \text{ if } \ell - \tilde{\ell } = jm, j\in {\mathbb {Z}}/\{0\} \\ 0 &{}\quad \text{ otherwise }. \end{array} \right. \end{aligned}$$

The third equality uses the fact that \({\mathbb {E}}{{\textbf {e}}}(\Delta _k(\ell -\tilde{\ell })) = {\mathbb {E}}{{\textbf {e}}}(\tilde{\Delta }(\ell -\tilde{\ell }))\) for all \(k\in [m]\) in order to properly factor out this constant from the sum in the fourth equality. The last equality is due to the geometric series formula.
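The geometric-series factor in this computation is easy to verify numerically (again assuming \({{\textbf {e}}}(x) = e^{2\pi ix}):\)

```python
# Check: (1/m) * sum_{k=1}^m e(((k-1)/m) * d) equals 1 when m divides d and 0
# otherwise, which is exactly the case split used above.
import numpy as np

m = 7
k = np.arange(m)
for d in range(-20, 21):
    val = np.mean(np.exp(2j * np.pi * k * d / m))
    expected = 1.0 if d % m == 0 else 0.0
    assert abs(val - expected) < 1e-12
print("geometric-series identity verified for |d| <= 20")
```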

Returning to our original calculation, we bound the last term using our deviation model assumptions

$$\begin{aligned}&\Biggl |\sum _{j= 1}^{\lfloor (N-1)/m \rfloor }\sum _{\ell -\tilde{\ell } = -jm}v_{\ell }\bar{v}_{\tilde{\ell }}{\mathbb {E}}{{\textbf {e}}} (-jm(\tilde{\Delta }-1/2))\Biggr |\\&\quad = \Biggl |\sum _{j= 1}^{\lfloor (N-1)/m \rfloor }\sum _{\ell \in Q_j}v_{\ell }\bar{v}_{\ell +jm}{\mathbb {E}}{{\textbf {e}}}(-jm(\tilde{\Delta }-1/2))\Biggr |\\&\quad \le \sum _{j= 1}^{\lfloor (N-1)/m \rfloor }\sum _{\ell \in Q_j}|v_{\ell }||v_{\ell +jm}||{\mathbb {E}}{{\textbf {e}}}(-jm(\tilde{\Delta }-1/2))|\\&\quad \le \frac{\theta m}{2N}\sum _{j= 1}^{\lfloor (N-1)/m \rfloor }\sum _{\ell \in Q_j}|v_{\ell }||v_{\ell + jm}|\le \frac{\theta m}{2N}\sum _{j= 1}^{\lfloor (N-1)/m \rfloor }|\langle v,v\rangle |\\&\quad = \frac{\theta m \Vert v\Vert _2^2}{2N}\sum _{j= 1}^{\lfloor (N-1)/m \rfloor }1 = \frac{\theta m \Vert v\Vert _2^2}{2N}\left\lfloor { \frac{N-1}{m}} \right\rfloor \le \frac{\theta \Vert v\Vert _2^2}{2} . \end{aligned}$$

Here \(Q_j\subset [N]\) denotes the set of allowed indices \(\ell \) for a given j, i.e., those satisfying \(\ell \in [N]\) and \(\ell +jm \in [N].\) The second inequality holds by our deviation model assumption (4).

The remaining sum (with \(\ell -\tilde{\ell } = jm)\) can be bounded similarly. Combining these inequalities with the unitarity of \({\mathcal {F}}\) (so that \(\Vert v\Vert _2 = \Vert {\mathcal {F}}^*\Psi w\Vert _2 = \Vert \Psi w\Vert _2)\) and the extreme singular values of \(\Psi ,\) we obtain

$$\begin{aligned} \frac{1}{m}{\mathbb {E}}\Vert {\mathcal {A}}w\Vert _2^2 \le \Vert v\Vert _2^2 + \frac{2\theta \Vert v\Vert _2^2}{2} = \Vert \Psi w\Vert _2^2\left( 1 + \theta \right) \le \beta ^2\Vert w\Vert _2^2\left( 1 + \theta \right) , \end{aligned}$$

and

$$\begin{aligned} \frac{1}{m}{\mathbb {E}}\Vert {\mathcal {A}}w\Vert _2^2 \ge \alpha ^2\Vert w\Vert _2^2\left( 1 - \theta \right) . \end{aligned}$$

We will apply this inequality and similar orthogonality properties in what follows (e.g., in Sect. 8.2), and ask the reader to keep this in mind.

To upper bound the coherence of the collection \({{\varvec{\mathscr {C}}}},\) let \(\tilde{{\mathcal {N}}}_{k*}{\mathcal {F}}^*\Psi \sim {{\varvec{\mathscr {A}}}}_k\) as above. Then

$$\begin{aligned}&\big \Vert \tilde{{\mathcal {N}}}_{k*}{\mathcal {F}}^*\Psi \big \Vert _{\infty } = \max _{\ell \in [n]}\big |\langle \tilde{{\mathcal {N}}}_{k*},({\mathcal {F}}^*\Psi )_{*\ell }\rangle \big |\\&\quad \le \max _{\ell \in [n]}\big \Vert \tilde{{\mathcal {N}}}_{k*}\big \Vert _{\infty }\Vert ({\mathcal {F}}^*\Psi )_{*\ell }\Vert _1 = \max _{\ell \in [n]}\sum _{j=1}^{N}|\langle {\mathcal {F}}_{*j},\Psi _{*\ell }\rangle |:= \gamma \end{aligned}$$

and therefore

$$\begin{aligned} \mu ({{\varvec{\mathscr {C}}}}) \le \gamma ^2. \end{aligned}$$
(27)
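For reference, the DFT-incoherence parameter \(\gamma \) appearing here is cheap to compute for any given \(\Psi ;\) a sketch (using the standard uncentered unitary DFT, which has the same entrywise magnitudes as the centered one and hence yields the same \(\gamma ):\)

```python
# gamma = max_l sum_j |<F_{*j}, Psi_{*l}>| = max_l || F^* Psi_{*l} ||_1.
import numpy as np

def dft_incoherence(Psi):
    N = Psi.shape[0]
    F = np.fft.fft(np.eye(N), axis=0) / np.sqrt(N)  # unitary DFT matrix
    return np.max(np.sum(np.abs(F.conj().T @ Psi), axis=0))

N = 33
print(dft_incoherence(np.eye(N)))  # identity basis: gamma = sqrt(N) ~ 5.74
```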

The proof of Theorem 4 is now an application of Theorems 6 and 5 using the derivations above.

Proof of Theorem 4

We are considering the equivalent program

$$\begin{aligned} g^{\sharp }{:}{=}{{\,\mathrm{arg\,\hspace{-2pt}min}\,}}_{h\in {\mathbb {C}}^n}\lambda \Vert h\Vert _1+\frac{1}{\sqrt{m}}\Vert {\mathcal {A}} h - \sqrt{N}b\Vert _2. \end{aligned}$$

From the arguments above, the rows of \({\mathcal {A}}\) are generated by a nondegenerate collection \({{\varvec{\mathscr {C}}}}\) with coherence bounded as in (27). The unique positive-definite square root of \(\frac{1}{m}{\mathbb {E}}{\mathcal {A}}^*{\mathcal {A}},\) denoted \({\mathcal {G}},\) satisfies the bounds (26).

We now apply Theorem 6 with \(\delta = 1/2,\) \(\epsilon = n^{-1}\) and order

$$\begin{aligned} q = 2 \lceil {4s\left( \frac{1+\delta }{1-\delta }\right) \Vert {\mathcal {G}}\Vert ^2\Vert {\mathcal {G}}^{-1}\Vert ^2}\rceil . \end{aligned}$$

By (26) and (27), if

$$\begin{aligned} m \ge \frac{\tilde{c}_1\gamma ^2 q}{\delta ^2\alpha ^2(1-\theta )}\left( \log \left( 2\left( \frac{\gamma ^2 q}{\alpha ^2(1-\theta )} + 1\right) \right) \log ^2(q)\log (n) + \log \left( n\right) \right) , \end{aligned}$$
(28)

then (24) is satisfied and the conclusion of Theorem 6 holds. Therefore, with probability exceeding \(1-n^{-1},\) \({\mathcal {A}}\) has \({\mathcal {G}}\)-RIP of order q with constant \(\delta _{q,{\mathcal {G}}}\le \delta = 1/2.\)

To show that our sampling assumption (21) satisfies (28), notice that by (26)

$$\begin{aligned} q = 2 \lceil {12s\Vert {\mathcal {G}}\Vert ^2\Vert {\mathcal {G}}^{-1}\Vert ^2}\rceil \le 2 \lceil {12s\frac{\beta ^2(1+\theta )}{\alpha ^2(1-\theta )}}\rceil \le 2\left( 12\left( 1+\frac{1}{24}\right) s\frac{\beta ^2(1+\theta )}{\alpha ^2(1-\theta )}\right) {:}{=}\tilde{q}. \end{aligned}$$

The last inequality holds since

$$\begin{aligned} \frac{12s\beta ^2(1+\theta )}{\alpha ^{2}(1-\theta )}\ge 12s\ge 24, \end{aligned}$$

and for any real number \(a\ge 24\) it holds that \(\lceil a\rceil \le (1+\frac{1}{24})a.\) In (28), replace q with \(\tilde{q}.\) This provides our assumed sampling complexity, where expression (21) simplifies by absorbing all absolute constants into \(C_1\) and \(C_2.\)

With the parameter \(\lambda \) chosen as prescribed for (20), the conditions of Theorem 5 hold with \(\delta = 1/2\) and we obtain the error bound

$$\begin{aligned} \Vert g-g^{\sharp }\Vert _2 \le \frac{8\epsilon _s(g)}{\sqrt{s}} + 8\left( \frac{1}{2\lambda \sqrt{s}}+\frac{\sqrt{2}}{\alpha \sqrt{1-\theta }}\right) \frac{\sqrt{N}}{\sqrt{m}}\Vert {\mathcal {S}}f - b\Vert _2. \end{aligned}$$

To finish, notice that

$$\begin{aligned} \Vert g-g^{\sharp }\Vert _2 \ge \frac{1}{\beta }\Vert \Psi (g-g^{\sharp })\Vert _2 = \frac{1}{\beta }\Vert f-\Psi g^{\sharp }\Vert _2, \end{aligned}$$

and

$$\begin{aligned} \Vert {\mathcal {S}}f - b\Vert _2 \le \Vert {\mathcal {S}}f-\tilde{f}\Vert _2 + \Vert d\Vert _2 \le 2\sqrt{m}\sum _{|\ell |> \frac{N-1}{2}}|c_{\ell }|+ \Vert d\Vert _2, \end{aligned}$$

where the last inequality holds by Theorem 1. \(\square \)

To obtain Theorem 2 from Theorem 4, notice that in Theorem 2 we have \(n=N\) and \(\alpha =\beta =1.\) The assumption \(m\le N\) gives that

$$\begin{aligned} N \ge \frac{\gamma ^2(1+\theta )s}{(1-\theta )^2} \ge \frac{(1+\theta )s}{(1-\theta )^2}, \end{aligned}$$

which allows further simplification by combining all the logarithmic factors into a single \({{\,\mathrm{poly\,\hspace{-2pt}log}\,}}(N)\) term (introducing absolute constants where necessary). We note that the condition \(m\le N\) is not needed and is only applied for ease of exposition in the introductory result.

8.2 Proof of Theorem 3

To establish the claim, we aim to show that

$$\begin{aligned} \inf _{v\in S^{N-1}}\Vert {\mathcal {S}} v\Vert _2 \ge \delta >0, \end{aligned}$$
(29)

holds with high probability. By optimality of \(f^{\sharp },\) this will give

$$\begin{aligned}&\Vert f - f^{\sharp }\Vert _2 \le \frac{1}{\delta }\Vert {\mathcal {S}}(f - f^{\sharp })\Vert _2 \le \frac{1}{\delta }\Vert {\mathcal {S}}f - b\Vert _2 + \frac{1}{\delta }\Vert b - {\mathcal {S}}f^{\sharp }\Vert _2 \\&\quad \le \frac{2}{\delta }\Vert {\mathcal {S}}f - b\Vert _2 \le \frac{2}{\delta }\Vert d\Vert _2 + \frac{4}{\delta } \sqrt{m}\sum _{|\ell |>\frac{N-1}{2}}|c_{\ell }|, \end{aligned}$$

where the last inequality is due to our noise model and trigonometric interpolation error (Theorem 1).

To this end, we normalize by letting \(\tilde{{\mathcal {S}}} = \frac{1}{\sqrt{m}}\tilde{{\mathcal {N}}}{\mathcal {F}}^* := \frac{\sqrt{N}}{\sqrt{m}}{\mathcal {N}}{\mathcal {F}}^*\) and note that when \(m\ge N\) our sampling operator is an isometry in expectation, in the sense that

$$\begin{aligned} {\mathbb {E}}\tilde{{\mathcal {S}}}^*\tilde{{\mathcal {S}}} = {\mathcal {F}}\left( \frac{1}{m}{\mathbb {E}}\tilde{{\mathcal {N}}}^*\tilde{{\mathcal {N}}}\right) {\mathcal {F}}^* = {\mathcal {I}}_{N} \end{aligned}$$
(30)

where \({\mathcal {I}}_{N}\) is the \(N\times N\) identity matrix. To see this, we use our calculations from the previous section (that establish (25)) to obtain as before that for \(\ell ,\tilde{\ell }\in [N]\)

$$\begin{aligned}&{\mathbb {E}}\left( \frac{1}{m}\tilde{{\mathcal {N}}}^*\tilde{{\mathcal {N}}}\right) _{\ell \tilde{\ell }} = \frac{1}{m}{\mathbb {E}}\langle \tilde{{\mathcal {N}}}_{*\ell },\tilde{{\mathcal {N}}}_{*\tilde{\ell }}\rangle = \frac{1}{m}{\mathbb {E}}\sum _{k=1}^{m}{{\textbf {e}}}(\tilde{t}_k(\ell -\tilde{\ell }))\\&\quad = \left\{ \begin{array}{ll} 1 &{} \quad \text{ if } \ell = \tilde{\ell } \\ {\mathbb {E}}{{\textbf {e}}}\left( jm(\tilde{\Delta }-1/2)\right) &{} \quad \text{ if } \ell - \tilde{\ell } = jm, j\in {\mathbb {Z}}/\{0\} \\ 0 &{} \quad \text{ otherwise }. \end{array} \right. \end{aligned}$$

However, if \(m\ge N,\) notice that the middle case never occurs since \(|\ell -\tilde{\ell }|\le N-1 < m\) for all \(\ell ,\tilde{\ell }\in [N].\) Therefore, (30) holds.
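A quick Monte Carlo sanity check of (30) (with \({{\textbf {e}}}(x) = e^{2\pi ix}\) and, purely for illustration, a uniform jitter standing in for \({\mathcal {D}}):\)

```python
# Empirical check of (30): for m >= N, E[(1/m) Ntilde^* Ntilde] = I_N, and
# hence E[Stilde^* Stilde] = I_N since F is unitary.
import numpy as np

rng = np.random.default_rng(2)
N, m, trials = 17, 32, 5000
Ntil = (N - 1) // 2
u = np.arange(-Ntil, Ntil + 1)
acc = np.zeros((N, N), dtype=complex)
for _ in range(trials):
    t = np.arange(m) / m - 0.5 + rng.uniform(-0.05, 0.05, m)
    Nt = np.exp(-2j * np.pi * np.outer(t, u))  # Ntilde
    acc += Nt.conj().T @ Nt / m
print(np.max(np.abs(acc / trials - np.eye(N))))  # tends to 0 as trials grow
```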

With the isometry established, we may now proceed to the main component of the proof of Theorem 3.

Theorem 7

Let \(m\ge \kappa N\) with \(\kappa \ge \frac{2\log (N)}{\log (\sqrt{e}/\sqrt{2})}\) and the entries of \(\Delta \) be i.i.d. from any distribution. Then

$$\begin{aligned} \inf _{v\in S^{N-1}}\Vert {\mathcal {S}} v\Vert _2 \ge \frac{\sqrt{m}}{\sqrt{2N}}, \end{aligned}$$

with probability exceeding \(1-\frac{1}{N}.\)

Proof

We will apply a matrix Chernoff inequality to lower bound the smallest eigenvalue of \(\tilde{{\mathcal {S}}}^*\tilde{{\mathcal {S}}}.\) To apply Theorem 1.1 in [64], notice that we can expand

$$\begin{aligned} \tilde{{\mathcal {S}}}^*\tilde{{\mathcal {S}}} = \sum _{k=1}^{m}\tilde{{\mathcal {S}}}_{k*}^*\tilde{{\mathcal {S}}}_{k*}, \end{aligned}$$

which is a sum of independent, random, self-adjoint, and positive semi-definite matrices. Our isometry condition (30) gives that \({\mathbb {E}}\tilde{{\mathcal {S}}}^*\tilde{{\mathcal {S}}} ={\mathcal {I}}_N\) has extreme eigenvalues equal to 1; we stress that this holds because we assume \(m\ge N,\) as shown above. Further,

$$\begin{aligned} \Vert \tilde{{\mathcal {S}}}_{k*}^*\tilde{{\mathcal {S}}}_{k*}\Vert = \frac{1}{m} \Vert {\mathcal {F}}\tilde{{\mathcal {N}}}_{k*}^*\tilde{{\mathcal {N}}}_{k*}{\mathcal {F}}^*\Vert = \frac{1}{m} \Vert \tilde{{\mathcal {N}}}_{k*}^*\tilde{{\mathcal {N}}}_{k*}\Vert = \frac{N}{m}. \end{aligned}$$

Therefore, by Theorem 1.1 in [64] with \(R=\frac{N}{m}\) and \(\delta = \frac{1}{2},\) we obtain

$$\begin{aligned} {\mathbb {P}}\left( \lambda _{\text{ min }}\left( \tilde{{\mathcal {S}}}^*\tilde{{\mathcal {S}}}\right) \le \frac{1}{2} \right) \le N\left( \frac{\sqrt{2}}{\sqrt{e}}\right) ^{m/N}. \end{aligned}$$

With \(m\ge \kappa N\) and \(\kappa \ge \frac{2\log (N)}{\log (\sqrt{e}/\sqrt{2})},\) the right-hand side (and hence the probability on the left) is upper bounded by \(N^{-1}.\) Since the singular values of \(\tilde{{\mathcal {S}}}\) are the square roots of the eigenvalues of \(\tilde{{\mathcal {S}}}^*\tilde{{\mathcal {S}}},\) this establishes the result. \(\square \)
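An empirical companion to Theorem 7 (our own illustration; since the theorem places no restriction on \({\mathcal {D}},\) we use large uniform deviations):

```python
# Empirical check of Theorem 7: with m >= kappa*N samples, the smallest
# eigenvalue of Stilde^* Stilde stays above 1/2. Since Stilde^* Stilde is
# unitarily similar to (1/m) Ntilde^* Ntilde, we form the latter directly.
import numpy as np

rng = np.random.default_rng(3)
N = 17
kappa = 2 * np.log(N) / np.log(np.sqrt(np.e) / np.sqrt(2))
m = int(np.ceil(kappa * N))
Ntil = (N - 1) // 2
u = np.arange(-Ntil, Ntil + 1)
lam_min = []
for _ in range(200):
    t = np.arange(m) / m - 0.5 + rng.uniform(-0.5, 0.5, m)  # arbitrary jitter
    Nt = np.exp(-2j * np.pi * np.outer(t, u))
    lam_min.append(np.linalg.eigvalsh(Nt.conj().T @ Nt / m)[0])
print(m, min(lam_min))  # empirically well above the guaranteed 1/2
```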

With our remarks at the beginning of this section, we can now easily establish Theorem 3.

Proof of Theorem 3

Under our assumptions, apply Theorem 7 to obtain that

$$\begin{aligned} \inf _{v\in S^{N-1}}\Vert {\mathcal {S}}v\Vert _2 \ge \frac{\sqrt{m}}{\sqrt{2N}} \end{aligned}$$

holds with the prescribed probability. This establishes (29) with \(\delta = \frac{\sqrt{m}}{\sqrt{2N}}.\) The remainder of the proof follows from our outline at the beginning of this section. \(\square \)

8.3 Interpolation error of Dirichlet kernel: proof

In this section we establish the error bound of our interpolation operator when applied to our signal model (Theorem 1), as well as the error bound given in Corollary 1.

Proof of Theorem 1

We begin by showing (8), i.e., if \(\tilde{t}_k = t_{\tilde{p}}\) for some \(\tilde{p}\in [N]\) (our “nonuniform” sample lies on the equispaced interpolation grid) then the error is zero. This is easy to see by orthogonality of the complex exponentials: combining (5) and (6) (recall that \(\tilde{N} = \frac{N-1}{2}),\) we have

$$\begin{aligned}&({\mathcal {S}}f)_{k} = \langle f, {\mathcal {S}}_{k*}\rangle = \sum _{p=1}^{N} f_p{\mathcal {S}}_{kp} = \frac{1}{N}\sum _{p=1}^{N} f_p\left( \sum _{u=-\tilde{N}}^{\tilde{N}}{{\textbf {e}}}(ut_p){{\textbf {e}}}(-u\tilde{t}_k)\right) \\&\quad = \frac{1}{N}\sum _{p=1}^{N} f_p\left( \sum _{u=-\tilde{N}}^{\tilde{N}}{{\textbf {e}}}(ut_p){{\textbf {e}}}(-ut_{\tilde{p}})\right) = f_{\tilde{p}} = {\textbf{f}}(t_{\tilde{p}}) = {\textbf{f}}(\tilde{t}_k) = \tilde{f}_k. \end{aligned}$$

The fourth equality holds since we are assuming \(\tilde{t}_k = t_{\tilde{p}}\) for some \(\tilde{p}\in [N].\)
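This exact-reproduction property is easy to confirm numerically (with \({{\textbf {e}}}(x) = e^{2\pi ix}\) assumed):

```python
# If the "nonuniform" location equals a grid point t_p, the Dirichlet row
# reproduces the corresponding sample exactly: (S f)_k = f_p.
import numpy as np

N = 17
Ntil = (N - 1) // 2
u = np.arange(-Ntil, Ntil + 1)
t = np.arange(N) / N - 0.5                       # grid t_p = (p-1)/N - 1/2
f = np.random.default_rng(4).standard_normal(N)  # arbitrary grid samples
p = 5                                            # 0-based index of the hit grid point
S_row = np.exp(2j * np.pi * np.outer(t, u)) @ np.exp(-2j * np.pi * u * t[p]) / N
print(abs(S_row @ f - f[p]))                     # zero up to round-off
```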

We now deal with the general case (9). Recall the Fourier expansion of our underlying function

$$\begin{aligned} {\textbf{f}}(x) = \sum _{\ell =-\infty }^{\infty }c_{\ell }{\textbf{e}}(\ell x). \end{aligned}$$

Again, using (5), (6) and the Fourier expansion at \({{\textbf {f}}}(t_p) = f_p\) we obtain

$$\begin{aligned}&({\mathcal {S}}f)_{k} = \langle f, {\mathcal {S}}_{k*}\rangle = \sum _{p=1}^{N} f_p{\mathcal {S}}_{kp}\\&\quad := \frac{1}{N}\sum _{p=1}^{N} \left( \sum _{\ell =-\infty }^{\infty }c_{\ell }{\textbf{e}}(\ell t_p)\right) \left( \sum _{u=-\tilde{N}}^{\tilde{N}}{{\textbf {e}}}(ut_p){{\textbf {e}}}(-u\tilde{t}_k)\right) . \end{aligned}$$

At this point, we wish to switch the order of summation and sum over all \(p\in [N],\) for which we must ensure the corresponding summands are non-zero. To this end, we proceed under the assumption \(f_p, {\mathcal {S}}_{kp} \ne 0\) for all \(p\in [N],\) and deal with the remaining cases separately afterward. In particular, we will remove this assumption for the \(f_p\)’s and show that \({\mathcal {S}}_{kp} \ne 0\) holds under our assumption \(\tilde{\tau }\subset \Omega .\)

Proceeding, we may now sum over all \(p\in [N]\) to obtain

$$\begin{aligned} ({\mathcal {S}}f)_{k}&= \frac{1}{N}\sum _{u=-\tilde{N}}^{\tilde{N}}\sum _{\ell =-\infty }^{\infty }c_{\ell }{{\textbf {e}}} (-u\tilde{t}_k)\sum _{p=1}^{N}{{\textbf {e}}}((u + \ell )t_p)\\&= \sum _{u=-\tilde{N}}^{\tilde{N}}\sum _{j=-\infty }^{\infty }(-1)^{jN}c_{jN+u}{{\textbf {e}}}(u\tilde{t}_k) = \sum _{j=-\infty }^{\infty }(-1)^{\lfloor \frac{j+\tilde{N}}{N}\rfloor }c_j{{\textbf {e}}}(r(j)\tilde{t}_k). \end{aligned}$$

The second equality is obtained by orthogonality of the exponential basis functions, \(\sum _{p=1}^{N}{{\textbf {e}}}((u + \ell )t_p) = 0\) when \(\ell +u\notin N{\mathbb {Z}}\) and otherwise equal to \(N(-1)^{jN}\) for some \(j\in {\mathbb {Z}}\) where \(u+\ell = jN.\) The last equality results from a reordering of the absolutely convergent series where the mapping r is defined as in the statement of Theorem 1.

To illustrate the reordering, we consider \(j\ge 0\) (for simplicity) and first notice that \((-1)^{jN} = (-1)^j\) since N is assumed to be odd in Sect. 2.1. Expanding the previous sum in array form gives

$$\begin{aligned}&\sum _{u=-\tilde{N}}^{\tilde{N}}\sum _{j=0}^{\infty } (-1)^j c_{jN+u}{{\textbf {e}}}\left( u\tilde{t}_k\right) \\&\quad ={{\textbf {e}}}(-\tilde{N}\tilde{t}_k)\left( c_{-\tilde{N}} -c_{N-\tilde{N}}+c_{2N-\tilde{N}}- \cdots \right) \\&\qquad + {{\textbf {e}}}((-\tilde{N}+1)\tilde{t}_k)\left( c_{-\tilde{N}+1} -c_{N-\tilde{N}+1} + c_{2N-\tilde{N}+1} - \cdots \right) \\&\qquad \qquad \vdots \\&\qquad +{{\textbf {e}}}(0\cdot \tilde{t}_k)\left( c_{0} -c_{N}+ c_{2N} - \cdots \right) \\&\qquad \qquad \vdots \\&\qquad +{{\textbf {e}}}(\tilde{N}\tilde{t}_k)\left( c_{\tilde{N}} -c_{N+\tilde{N}}+ c_{2N+\tilde{N}} - \cdots \right) . \end{aligned}$$

Notice that in the first row, starting at the second coefficient, we have indices \(N-\tilde{N} = \tilde{N}+1\) followed by \(2N-\tilde{N} = N+\tilde{N}+1\) and so on, which are subsequent to the indices of the coefficients in the last row (one column prior). Therefore, if we start at the top-left coefficient \(c_{-\tilde{N}}\) and traverse this infinite array of Fourier coefficients “column-wise,” we will obtain the ordered sequence \(\{(-1)^{\lfloor \frac{j+\tilde{N}}{N}\rfloor }c_j\}_{j=-\tilde{N}}^{\infty }\) (with no repetitions).

The coefficients in row \(q\in [N]\) correspond to frequency value \(-\tilde{N}+q-1\) and have indices of the form \(pN-\tilde{N}+q-1\) for some \(p\in {\mathbb {N}}.\) To establish that the reordered series is equivalent, we finish by checking that for a given index the mapping r gives the correct frequency value, i.e., \(r(pN-\tilde{N}+q-1) = -\tilde{N}+q-1\) for all \(q\in [N]\):

$$\begin{aligned}&r(pN-\tilde{N}+q-1) {:}{=}\text{ rem }(pN-\tilde{N}+q-1 + \tilde{N}, N) - \tilde{N}\\&\quad = \text{ rem }(pN+q-1, N) - \tilde{N} = q-1 - \tilde{N}. \end{aligned}$$

We can therefore reorder the series as desired and incorporate the sum over \(j<0\) via the same logic to establish the equality.
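The folding map r itself is a one-liner to verify (taking \(\text{rem}\) as the nonnegative remainder, which Python’s % operator provides):

```python
# r(j) := rem(j + Ntil, N) - Ntil sends index p*N - Ntil + q - 1 to the
# frequency -Ntil + q - 1, for every integer p.
N = 17
Ntil = (N - 1) // 2
r = lambda j: (j + Ntil) % N - Ntil
for p in range(-3, 4):
    for q in range(1, N + 1):
        assert r(p * N - Ntil + q - 1) == -Ntil + q - 1
print("folding identity verified")
```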

Since for \(\ell \in \{-\tilde{N}, -\tilde{N} + 1,\ldots , \tilde{N}\}\) we have \(r(\ell ) = \ell \) and \((-1)^{\lfloor \frac{\ell +\tilde{N}}{N}\rfloor } = 1,\) we finally obtain

$$\begin{aligned} {{\textbf {f}}}(\tilde{t}_k)-({\mathcal {S}}f)_{k} = \sum _{|\ell |>\tilde{N}}c_{\ell }\left( {{\textbf {e}}}(\ell \tilde{t}_k) -(-1)^{\lfloor \frac{\ell +\tilde{N}}{N}\rfloor }{{\textbf {e}}}(r(\ell )\tilde{t}_k)\right) . \end{aligned}$$

The definition of the p-norms along with the triangle inequality give the remaining claim. In particular,

$$\begin{aligned}&\Vert \tilde{f}-{\mathcal {S}}f\Vert _p = \left( \sum _{k=1}^{m} |{{\textbf {f}}}(\tilde{t}_k)-({\mathcal {S}}f)_{k}|^p\right) ^{1/p}\\&\quad = \left( \sum _{k=1}^{m} \Biggl |\sum _{|\ell |>\tilde{N}}c_{\ell }\left( {{\textbf {e}}}(\ell \tilde{t}_k) -(-1)^{\lfloor \frac{\ell +\tilde{N}}{N}\rfloor }{{\textbf {e}}}(r(\ell )\tilde{t}_k)\right) \Biggr |^p\right) ^{1/p}\\&\quad \le \left( \sum _{k=1}^{m} \left( \sum _{|\ell |>\tilde{N}}2|c_{\ell }|\right) ^p\right) ^{1/p} = \left( m \left( \sum _{|\ell |>\tilde{N}}2|c_{\ell }|\right) ^p\right) ^{1/p} = 2m^{1/p} \sum _{|\ell |>\tilde{N}}|c_{\ell }|. \end{aligned}$$

This finishes the proof in the case \(f_p, {\mathcal {S}}_{kp} \ne 0\) for all \(p\in [N].\) To remove this condition for the \(f_p\)’s, we may find a real number \(\mu \) such that the function

$$\begin{aligned} {\textbf{g}}(x) {:}{=}{\textbf{f}}(x) + \mu = \sum _{\ell \in {\mathbb {Z}}/\{0\}}c_{\ell }{\textbf{e}}(\ell x) + c_0 + \mu \end{aligned}$$

is non-zero when \(x \in \{t_p\}_{p=1}^{N}.\) In particular notice that if we define \(h = f + \mu \in {\mathbb {C}}^N,\) then \(h_p \ne 0\) for all \(p\in [N].\) Therefore, only assuming now that \({\mathcal {S}}_{kp} \ne 0\) for \(p\in [N],\) the previous argument can be applied to conclude

$$\begin{aligned} {{\textbf {g}}}(\tilde{t}_k)-({\mathcal {S}}h)_{k} = \sum _{|\ell |>\tilde{N}}c_{\ell }\left( {{\textbf {e}}}(\ell \tilde{t}_k) -(-1)^{\lfloor \frac{\ell +\tilde{N}}{N}\rfloor }{{\textbf {e}}}(r(\ell )\tilde{t}_k)\right) . \end{aligned}$$

However, if \(1_N\in {\mathbb {C}}^{N}\) denotes the all ones vector and \(e_{\tilde{N}+1}\in {\mathbb {C}}^{N}\) is the \(\tilde{N}+1\)-th standard basis vector, notice that

$$\begin{aligned}&({\mathcal {S}}h)_{k} = \langle {\mathcal {S}}_{k*},h\rangle = \langle {\mathcal {S}}_{k*},f\rangle + \mu \langle {\mathcal {S}}_{k*},1_N\rangle = \langle {\mathcal {S}}_{k*},f\rangle + \mu \langle {\mathcal {N}}_{k*},{\mathcal {F}}^*1_N\rangle \\&\quad = \langle {\mathcal {S}}_{k*},f\rangle + \mu \sqrt{N}\langle {\mathcal {N}}_{k*}, e_{\tilde{N}+1}\rangle = \langle {\mathcal {S}}_{k*},f\rangle + \mu = ({\mathcal {S}}f)_{k} + \mu . \end{aligned}$$

The fourth equality holds by orthogonality of \({\mathcal {F}}^*\) and since \({\mathcal {F}}_{(\tilde{N}+1)*}^* = \frac{1}{\sqrt{N}}1_N.\) The fifth equality holds since \({\mathcal {N}}_{k(\tilde{N}+1)}=\frac{1}{\sqrt{N}}.\) Therefore

$$\begin{aligned} {{\textbf {g}}}(\tilde{t}_k)-({\mathcal {S}}h)_{k} = {{\textbf {f}}}(\tilde{t}_k)+\mu -\left( ({\mathcal {S}}f)_{k} + \mu \right) = {{\textbf {f}}}(\tilde{t}_k)-({\mathcal {S}}f)_{k}, \end{aligned}$$

and the claim holds in this case as well.

The assumption \({\mathcal {S}}_{kp} \ne 0\) will always hold if \(\tilde{\tau }\subset \Omega ,\) i.e., \(\tilde{t}_k\in [-\frac{1}{2},\frac{1}{2})\) for all \(k\in [m].\) We show this by deriving the conditions under which \({\mathcal {S}}_{kp} = 0\) can occur. As noted before, we have

$$\begin{aligned}&{\mathcal {S}}_{kp} := \sum _{u=-\tilde{N}}^{\tilde{N}}{{\textbf {e}}}(u(t_p-\tilde{t}_k)) = \sum _{u=0}^{N-1}{{\textbf {e}}}(u(t_p-\tilde{t}_k)){{\textbf {e}}}(-\tilde{N}(t_p-\tilde{t}_k))\\&\quad ={{\textbf {e}}}(-\tilde{N}(t_p-\tilde{t}_k))\frac{1-{{\textbf {e}}}(N(t_p-\tilde{t}_k))}{1-{{\textbf {e}}}(t_p-\tilde{t}_k)} \end{aligned}$$

and we see that \({\mathcal {S}}_{kp} = 0\) iff \(N(t_p-\tilde{t}_k)\in {\mathbb {Z}}/\{0\}\) and \(t_p-\tilde{t}_k\notin {\mathbb {Z}}.\) However, notice that

$$\begin{aligned} N(t_p-\tilde{t}_k) = N\left( \frac{p-1}{N}-\frac{k-1}{m}-\Delta _k\right) = p-1-\frac{N(k-1)}{m}-N\Delta _k, \end{aligned}$$

so that \(N(t_p-\tilde{t}_k)\in {\mathbb {Z}}/\{0\}\) iff \(\frac{N(k-1)}{m}+N\Delta _k = N\tilde{t}_k + \frac{N}{2} \in {\mathbb {Z}}/\{p-1\}.\) This condition equivalently requires \(\tilde{t}_k = \frac{j}{N} - \frac{1}{2}\) for some \(j\in {\mathbb {Z}}/\{p-1\}.\) Since this must hold for all \(p\in [N],\) we finally have that

$$\begin{aligned} N(t_p-\tilde{t}_k)\in {\mathbb {Z}}/\{0\} \quad \text{ iff } \ \tilde{t}_k = \frac{j}{N} - \frac{1}{2} \ \text{ for } \text{ some } \ j\in {\mathbb {Z}}/\{0,1,\ldots , N-1\}. \end{aligned}$$

We see that such a condition would imply that \(\tilde{t}_k \notin \Omega := [-\frac{1}{2},\frac{1}{2}),\) which violates our assumption \(\tilde{\tau }\subset \Omega .\) This finishes the proof. \(\square \)
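To make the zero condition tangible, the following sketch (with \({{\textbf {e}}}(x) = e^{2\pi ix}\) assumed) compares the Dirichlet sum with its closed form and exhibits a zero at an excluded location \(\tilde{t}_k = j/N - \frac{1}{2}\) with \(j\notin \{0,\ldots ,N-1\}:\)

```python
# S_kp as a direct sum versus its geometric-series closed form; the kernel
# vanishes when ttilde_k = j/N - 1/2 with j outside {0, ..., N-1}, i.e., when
# ttilde_k falls outside Omega.
import numpy as np

def S_sum(tp, tk, N):
    Ntil = (N - 1) // 2
    u = np.arange(-Ntil, Ntil + 1)
    return np.sum(np.exp(2j * np.pi * u * (tp - tk)))

def S_closed(tp, tk, N):
    Ntil = (N - 1) // 2
    d = tp - tk
    return (np.exp(-2j * np.pi * Ntil * d)
            * (1 - np.exp(2j * np.pi * N * d)) / (1 - np.exp(2j * np.pi * d)))

N = 17
tp = 3 / N - 0.5                                          # a grid point t_p
print(abs(S_sum(tp, 0.123, N) - S_closed(tp, 0.123, N)))  # agreement ~ 1e-12
print(abs(S_sum(tp, (N + 1) / N - 0.5, N)))               # j = N + 1: kernel is zero
```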

We end this section with the proof of Corollary 1.

Proof of Corollary 1

The proof will consist of applying Theorem 2 (under identical assumptions) and Theorem 1.

By Theorem 2, we have that

$$\begin{aligned} \Vert f-\Psi g^{\sharp }\Vert _{2} \le \frac{8\epsilon _s(g)}{\sqrt{s}} + \left( \frac{4}{\lambda \sqrt{s}}+\frac{8\sqrt{2}}{\sqrt{1-\theta }} \right) \left( \frac{\sqrt{N}}{\sqrt{m}}\Vert d\Vert _2 + 2\sqrt{N}\sum _{|\ell |>\frac{N-1}{2}}|c_{\ell }|\right) \end{aligned}$$

with probability exceeding \(1-\frac{1}{N}.\) As in the proof of Theorem 1, we can show that for \(x\in \Omega \)

$$\begin{aligned} {{\textbf {f}}}(x) - \langle {{\textbf {h}}}(x),{\mathcal {F}}^*f\rangle = \sum _{|\ell |>\tilde{N}}c_{\ell }\left( {{\textbf {e}}}(\ell x) - (-1)^{\lfloor \frac{\ell +\tilde{N}}{N}\rfloor }{{\textbf {e}}}(r(\ell )x)\right) . \end{aligned}$$

Therefore

$$\begin{aligned}&|{{\textbf {f}}}(x)-{{\textbf {f}}}^{\sharp }(x)|:= |{{\textbf {f}}}(x)-\langle {{\textbf {h}}}(x),{\mathcal {F}}^*\Psi g^{\sharp }\rangle |\\&\quad \le |{{\textbf {f}}}(x)-\langle {{\textbf {h}}}(x),{\mathcal {F}}^*f\rangle |+ |\langle {{\textbf {h}}}(x),{\mathcal {F}}^*f\rangle -\langle {{\textbf {h}}}(x),{\mathcal {F}}^*\Psi g^{\sharp }\rangle |\\&\quad \le \Biggl |\sum _{|\ell |>\tilde{N}}c_{\ell }\left( {{\textbf {e}}}(\ell x) -(-1)^{\lfloor \frac{\ell +\tilde{N}}{N}\rfloor }{{\textbf {e}}}(r(\ell )x)\right) \Biggr |+ \Vert {{\textbf {h}}}(x)\Vert _2\Vert {\mathcal {F}}^*(f-\Psi g^{\sharp })\Vert _2\\&\quad \le 2\sum _{|\ell |>\tilde{N}}|c_{\ell }|+ \frac{8\epsilon _s(g)}{\sqrt{s}} + \left( \frac{4}{\lambda \sqrt{s}}+ \frac{8\sqrt{2}}{\sqrt{1-\theta }}\right) \left( \frac{\sqrt{N}}{\sqrt{m}}\Vert d\Vert _2 + 2\sqrt{N}\sum _{|\ell |>\frac{N-1}{2}}|c_{\ell }|\right) . \end{aligned}$$

The last inequality holds since \(\Vert {{\textbf {h}}}(x)\Vert _2 = 1\) (here x is considered fixed and \({{\textbf {h}}}(x)\in {\mathbb {C}}^{N}).\) This finishes the proof. \(\square \)