Sparse sampling: theory, methods and an application in neuroscience

The current methods used to convert analogue signals into discrete-time sequences have been deeply influenced by the classical Shannon–Whittaker–Kotelnikov sampling theorem. This approach restricts the class of signals that can be sampled and perfectly reconstructed to bandlimited signals. During the last few years, a new framework has emerged that overcomes these limitations and extends sampling theory to a broader class of signals named signals with finite rate of innovation (FRI). Instead of characterising a signal by its frequency content, FRI theory describes it in terms of the innovation parameters per unit of time. Bandlimited signals are thus a subset of this more general definition. In this paper, we provide an overview of this new framework and present the tools required to apply this theory in neuroscience. Specifically, we show how to monitor and infer the spiking activity of individual neurons from two-photon imaging of calcium signals. In this scenario, the problem is reduced to reconstructing a stream of decaying exponentials.

Sampling lies at the heart of signal processing and has been instrumental to the digital revolution of the past 60 years. Without the sampling process, we could not convert real-life signals into digital form, and without digital samples, we could not use computers for digital computation. The sampling process is also ubiquitous: it is present in every mobile phone and digital camera, but also in sophisticated medical devices such as MRI or ultrasound machines, in sensor networks and in digital microscopes, to name just a few examples.
Over the last six decades, our understanding of the conversion of continuous-time signals into discrete form has been heavily influenced by the Shannon-Whittaker-Kotelnikov sampling theorem (Shannon 1949; Whittaker 1929; Kotelnikov 1933; Unser 2000), which showed that the sampling and perfect reconstruction of a signal are possible when the Fourier spectrum of the signal has finite support. In this case, the signal is said to be bandlimited and must be sampled at a rate (the Nyquist rate) at least twice its maximum nonzero frequency in order to be reconstructed without error.
We are so used to this approach that we tend to forget that it comes with many strings attached. First of all, no natural phenomenon is exactly bandlimited (Slepian 1976). Moreover, we tend to forget that the Shannon sampling theorem provides sufficient but not necessary conditions for perfect reconstruction. In other words, the theorem does not claim that it is impossible to sample and reconstruct non-bandlimited signals. It is therefore incorrect to assume that the bandwidth of a signal is related to its information content. Consider for instance the function shown in Fig. 1a. This is a stream of short pulses, which appears in many applications including bio-imaging, seismic signals and spread-spectrum communication. If the pulse shape is known a priori, the signal is completely determined by the amplitudes and locations of the pulses. If there are at most K pulses in a unit interval, then the signal is completely specified by the knowledge of these 2K parameters per unit of time. Assume now that the duration of the pulses is reduced but that the average number of pulses per unit interval stays the same. Clearly, the information content of the signal is not changing (still 2K parameters per unit of time); however, its bandwidth is increasing (bandwidth increases when the support of a function decreases).
[Figure caption: Nyquist rate versus information rate. The signal x_1(t) depicted in part a is bandlimited as shown in part b. The sum of x_1(t) with a step function leads to a signal x_2(t) with infinite bandwidth, as shown in parts c and d.]
Consider, as a second example, the signal shown in Fig. 2c. It is given by the sum of a bandlimited signal and a step function. Clearly, the step function has only two degrees of freedom: the location of the discontinuity and its amplitude. Its information content is therefore finite. The bandlimited function also has a finite number of degrees of freedom per unit of time, since it is fully determined by its samples at points spaced by the sampling period (the inverse of the Nyquist rate). We thus say that both have a finite rate of innovation. However, the combination of these two functions leads to a signal with infinite bandwidth (see Fig. 2d). Now, if we were to relate the information content of the signal to its bandwidth, we would conclude incorrectly that this signal has an infinite rate of information, since it requires an infinite sampling rate for perfect reconstruction. Therefore, bandwidth and information content are not always synonymous.
A first attempt to reconcile these two notions, sampling rate and information content, was made in Vetterli et al. (2002). There, the authors introduced a new class of signals called signals with finite rate of innovation (FRI), which includes both bandlimited signals and the non-bandlimited functions discussed so far. They went on to show that classes of FRI signals can be sampled and perfectly reconstructed using an appropriate acquisition device. These results have since been extended to more classes of acquisition devices (Dragotti et al. 2007; Seelamantula and Unser 2008; Asl et al. 2010; Tur et al. 2011; Urigüen et al. 2013) and more classes of signals (Maravić and Vetterli 2005; Berent et al. 2010; Chen et al. 2012). FRI sampling theory has also had an impact on various applications (Baboulaz and Dragotti 2009; Poh and Marziliano 2010; Tur et al. 2011; Kandaswamy et al. 2013); here, we focus on an application in neuroscience.
The paper is organised as follows. In the next section, we define FRI signals and give some examples. Section 3 presents the framework for sampling and reconstructing some classes of FRI signals. Specifically, we show how to sample and perfectly reconstruct a stream of Diracs and which conditions the acquisition device has to satisfy. We also extend this framework to the case of streams of decaying exponentials and present some denoising strategies. Section 4 presents an algorithm to reconstruct streaming signals in which there is no clear separation between consecutive bursts of spikes. Section 5 describes an application of this theory to the monitoring of neural activity from two-photon calcium images. Finally, conclusions are drawn in Sect. 6.

Notations
For f(t) ∈ L_2(ℝ), where L_2(ℝ) is the Hilbert space of finite-energy functions, the Fourier transform of f(t) is denoted by f̂(ω) and is given by

f̂(ω) = ∫_{−∞}^{∞} f(t) e^{−iωt} dt.

The indicator function of a set A is denoted by 1_A(t) and is given by 1_A(t) = 1 if t ∈ A and 0 otherwise. δ_{i,j} denotes the Kronecker delta, which is defined as δ_{i,j} = 1 if i = j and 0 otherwise. ⌊·⌋ and ⌈·⌉ denote the floor and ceiling functions.

Finite rate of innovation signals
Classical sampling theorems state that any bandlimited function x(t), such that x̂(ω) = 0 for all |ω| > ω_max, can be perfectly recovered from its samples x_n = x(t)|_{t=nT} if the sampling rate 2π/T is at least twice the highest frequency component of x(t), that is, 2π/T ≥ 2ω_max. Moreover, the original signal can be perfectly reconstructed as follows:

x(t) = Σ_{n∈ℤ} x_n sinc(t/T − n),    (1)

where sinc(t) = sin(πt)/(πt). If x(t) is not bandlimited, sampling with an ideal lowpass filter (h(t) = sinc(t/T)) and reconstructing by applying (1) provides a lowpass approximation of x(t). This is the best approximation, in the least-squares sense, of x(t) in the space spanned by {sinc(t/T − n)}_{n∈ℤ} (Unser 2000). However, it is only an approximation, and perfect reconstruction of the original signal is not achieved.
We also note that signals defined as in (1) are completely specified by the knowledge of one new parameter x_n every T seconds. Based on this observation, consider now a new class of signals that extends the one in (1) (Vetterli et al. 2002):

x(t) = Σ_{k∈ℤ} Σ_{r=0}^{R} a_{r,k} g_r(t − t_k),    (2)

where {g_r(t)}_{r=0}^{R} is a set of known functions. We note that, since the g_r(t) are known, signals in (2) are uniquely determined by the set of parameters a_{r,k} and t_k. Introducing a counting function C_x(t_a, t_b) that counts the number of degrees of freedom of x(t) over the interval [t_a, t_b], we define the rate of innovation ρ as follows (Vetterli et al. 2002; Dragotti et al. 2007; Blu et al. 2008; Urigüen et al. 2013):

ρ = lim_{τ→∞} (1/τ) C_x(−τ/2, τ/2),

and signals with a finite ρ are called signals with a finite rate of innovation (FRI). It is of interest to note that bandlimited signals fall under this definition. Therefore, one possible interpretation is that it is possible to sample them because they have a finite rate of innovation (rather than because they are bandlimited). Examples of FRI signals which are not bandlimited and which are of interest to us include:
- Streams of pulses: x(t) = Σ_k a_k p(t − t_k).
For instance, streams of decaying exponentials,

x(t) = Σ_k a_k e^{−α(t − t_k)} 1_{t ≥ t_k},

are a good fit for the calcium transient signals induced by neural activity in two-photon calcium imaging. Figure 1a, b shows examples of such signals.
- Piecewise sinusoidal signals (see Fig. 1c).
- Streams of Diracs (see Fig. 1d): x(t) = Σ_k a_k δ(t − t_k).
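As a concrete illustration, the following sketch (in Python, with arbitrary illustrative parameters) evaluates such a stream of decaying exponentials and computes its rate of innovation over the observation interval:

```python
import numpy as np

def decaying_exponential_stream(t, locations, amplitudes, alpha):
    """Evaluate x(t) = sum_k a_k * e^{-alpha*(t - t_k)} for t >= t_k."""
    x = np.zeros_like(t)
    for t_k, a_k in zip(locations, amplitudes):
        x = x + a_k * np.exp(-alpha * (t - t_k)) * (t >= t_k)
    return x

t = np.linspace(0.0, 10.0, 1001)
locations = [1.0, 3.5, 4.0, 7.2]     # arbitrary activation times t_k
amplitudes = [1.0, 0.8, 1.2, 0.9]    # arbitrary amplitudes a_k
x = decaying_exponential_stream(t, locations, amplitudes, alpha=1.5)

# 2K degrees of freedom (a location and an amplitude per pulse) over 10 s:
K = len(locations)
rate_of_innovation = 2 * K / 10.0
print(rate_of_innovation)  # 0.8 innovation parameters per unit of time
```

Note that the rate of innovation stays the same regardless of how fast the exponentials decay, whereas the bandwidth of x(t) is infinite.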

Sampling scheme
Consider the typical acquisition process shown in Fig. 3. It is usually modelled as a filtering stage followed by a sampling stage. The filter accounts for the modifications that the analogue signal x(t) experiences before being sampled. It may model an anti-aliasing filter, or it may account for the distortion introduced by the acquisition device, for example, in the case of a digital camera, the distortion due to the lens.
Filtering the signal x(t) with h(t) = ϕ(−t/T) and retrieving samples at the instants of time t = nT is equivalent to computing the inner product between x(t) and ϕ(t/T − n). Specifically, the filtered signal is given by

y(t) = (x ∗ h)(t) = ∫ x(u) ϕ(u/T − t/T) du.

Moreover, sampling y(t) at regular intervals of time t = nT leads to

y_n = ⟨x(t), ϕ(t/T − n)⟩.    (8)

The function ϕ(t) is called the sampling kernel. In order to guarantee perfect reconstruction of the signal x(t), the sampling kernel and the input signal have to satisfy some conditions. The literature presents a variety of kernels that can be used to achieve perfect reconstruction of FRI signals.
[Figure caption: Two different kernels that reproduce exactly different exponentials. The first row illustrates the reproduction of two real exponential functions with the kernel ϕ_1(t), shown in a. The second row illustrates the reproduction of two complex exponentials with the kernel ϕ_2(t), shown in d. In e and f, only the real part of the exponentials is shown (e^{α_m t} = e^{iω_m t} = cos(ω_m t) + i sin(ω_m t)). The thin lines represent the shifted and weighted kernels, the thick line represents their sum and the dashed line the true exponential. Note that both kernels are of compact support and the summation in (9) is truncated.]
Here, we will focus on exponential reproducing kernels since they offer the best flexibility and resilience to noise.
- Exponential reproducing property: a function ϕ(t), together with a linear combination of its shifted versions, reproduces exponential functions of the form e^{α_m t}, with α_m ∈ ℂ and m = 0, 1, …, P, if there exist coefficients c_{m,n} such that

Σ_{n∈ℤ} c_{m,n} ϕ(t − n) = e^{α_m t}.    (9)

The exponential reproduction property is illustrated in Fig. 4 for two different kernels that reproduce different exponentials. In both cases, the kernels are of compact support. The advantage of such kernels is that the summation in (9) can be truncated and there is still a region in time where the exponential functions are perfectly reproduced. In general, the exponentials e^{α_m t} are perfectly reproduced when the summation is computed over n ∈ ℤ. Let t ∈ [0, L) be the support of ϕ(t), that is, ϕ(t) = 0 for t ∉ [0, L). If the summation is truncated to n = n_0, …, n_f, it follows that the perfect reproduction of the exponential functions holds for t ∈ [n_0 + L − 1, n_f + 1).

Exponential reproducing kernels
For the sake of clarity, in what follows we restrict the analysis to the case where the parameters α_m in (9) are purely imaginary, that is, α_m = iω_m for m = 0, 1, …, P, where ω_m ∈ ℝ. This analysis can easily be extended to the more general case where α_m has nonzero real and imaginary parts, or is purely real. A function ϕ(t), together with a linear combination of its shifted versions, reproduces the exponentials {e^{iω_m t}}_{m=0}^{P} as in (9) if and only if it satisfies the generalised Strang-Fix conditions:

φ̂(ω_m) ≠ 0 and φ̂(ω_m + 2πl) = 0,    (10)

where m = 0, 1, …, P, l ∈ ℤ \ {0} and φ̂(ω) is the Fourier transform of ϕ(t) (Strang and Fix 1971; Unser and Blu 2005; Urigüen et al. 2013). A family of functions that satisfies these conditions is that of the exponential B-splines, also named E-splines. These functions are constructed through the convolution of elementary zero-order E-splines, where each elementary function reproduces a particular exponential e^{iω_m t}. The Fourier transform of the zero-order E-spline that reproduces the exponential e^{αt} is given by

β̂_α(ω) = (1 − e^{α − iω}) / (iω − α).    (11)

Figure 5 illustrates the Fourier transform of zero-order E-splines for two different values of the parameter α.
The E-spline that reproduces the set of exponentials {e^{α_m t}}_{m=0}^{P} is obtained as follows:

β_α(t) = (β_{α_0} ∗ β_{α_1} ∗ ⋯ ∗ β_{α_P})(t),

where α = (α_0, α_1, …, α_P). Thus, the Fourier transform of β_α(t) is given by

β̂_α(ω) = Π_{m=0}^{P} (1 − e^{α_m − iω}) / (iω − α_m).

E-splines have compact support of size P + 1 and have P − 1 continuous derivatives. It can be shown that any function that reproduces the set of exponentials {e^{α_m t}}_{m=0}^{P} can be expressed as the convolution of another function γ(t) with the E-spline that reproduces these exponentials, that is, ϕ(t) = γ(t) ∗ β_α(t) (Unser and Blu 2005; Delgado-Gonzalo et al. 2012). It is also true that the exponential reproduction property is preserved through convolution: let ψ(t) = ϕ(t) ∗ θ(t) for some function θ(t) with θ̂(ω_m) ≠ 0. The function ψ(t) then reproduces the same set of exponentials. This is easy to verify, since ψ(t) also satisfies the Strang-Fix conditions (10).
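The construction of an E-spline as a cascade of convolutions can be sketched numerically as follows (the parameters are illustrative and the zero-order atoms are discretised on a fine grid):

```python
import numpy as np

dt = 1e-3

def zero_order_espline(alpha, dt=dt):
    """Zero-order E-spline e^{alpha*t} on [0, 1), sampled on a fine grid."""
    t = np.arange(0.0, 1.0, dt)
    return np.exp(alpha * t)

def espline(alphas, dt=dt):
    """E-spline of order P = len(alphas) - 1, built by successive convolutions."""
    b = zero_order_espline(alphas[0], dt).astype(complex)
    for a in alphas[1:]:
        b = np.convolve(b, zero_order_espline(a, dt)) * dt  # approximates the integral
    return b

# Purely imaginary parameters in complex-conjugate pairs give a real kernel
beta = espline([1j * 0.5, 0.0, -1j * 0.5])

print(len(beta) * dt)              # ≈ 3, i.e. compact support of size P + 1
print(np.max(np.abs(beta.imag)))   # ≈ 0: the kernel is real-valued
```

The discrete convolution scaled by dt approximates the continuous one, so the numerical support and the (vanishing) imaginary part mirror the properties stated above.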

Sampling with an exponential reproducing kernel
The choice of purely imaginary parameters α_m = iω_m leads to an important family of sampling kernels. These design parameters directly determine the information about the analogue input signal x(t) that we acquire, and they allow us to perfectly reconstruct the input signal from the discrete samples y_n for some classes of signals. Specifically, the different ω_m correspond to the frequencies of the Fourier transform of x(t) that we are able to retrieve from the sole knowledge of the samples y_n. It can be shown that if the parameters α_m are real or appear in complex conjugate pairs, the corresponding E-spline is real. We thus impose that, for every α_m that is nonzero, its complex conjugate is also present in α. If the parameters α_m = iω_m in the vector α are sorted in increasing order of ω_m, we then have α*_m = α_{P−m}. Let us assume that the function x(t) is localised in time, so that only N samples y_n are nonzero. Let (s_m)_{m=0}^{P} be the sequence obtained by linearly combining the samples y_n with the coefficients c_{m,n} from (9), that is, s_m = Σ_{n=1}^{N} c_{m,n} y_n. We have that

s_m = Σ_{n=1}^{N} c_{m,n} y_n
    (a)= Σ_{n=1}^{N} c_{m,n} ⟨x(t), ϕ(t/T − n)⟩
    (b)= ⟨x(t), Σ_{n=1}^{N} c_{m,n} ϕ(t/T − n)⟩
    (c)= ∫ x(t) e^{iω_m t/T} dt = x̂(−ω_m/T),    (15)

where (a) follows from (8), (b) from the linearity of the inner product and (c) from the exponential reproduction property (9). The quantity s_m therefore corresponds to the Fourier transform of x(t) evaluated at the frequency ω = −ω_m/T.

Computation of c m,n coefficients
We have established the properties that a function ϕ(t) has to satisfy in order to reproduce exponentials, namely the generalised Strang-Fix conditions. Moreover, we have seen the importance of E-splines, since they allow us to obtain samples of the Fourier transform of the input signal. We now show how to obtain the coefficients c_{m,n} in (9) that are required to reproduce the exponential functions {e^{iω_m t}}_{m=0}^{P} and that are used to obtain the sequence s_m in (15). These coefficients are given by

c_{m,n} = ⟨e^{α_m t}, φ̃(t − n)⟩,    (16)

where φ̃(t) is chosen to form with ϕ(t) a quasi-biorthonormal set (Dragotti et al. 2007). This includes, as a particular case, the dual of ϕ(t). The introduction of φ̃(t) is a technicality needed to show where the coefficients c_{m,n} come from, but we do not need to work with this function. From (16), it follows that

c_{m,n} = c_{m,0} e^{α_m n}.    (17)

If we plug this expression into (9), we can derive an expression to compute c_{m,0} for each m = 0, …, P. Writing ψ(t) = ϕ(t) e^{−iω_m t}, we have

e^{iω_m t} = c_{m,0} Σ_{n∈ℤ} e^{iω_m n} ϕ(t − n) = c_{m,0} e^{iω_m t} Σ_{n∈ℤ} ψ(t − n)    (18)
           (a)= c_{m,0} e^{iω_m t} Σ_{l∈ℤ} ψ̂(2πl) e^{i2πlt} (b)= c_{m,0} e^{iω_m t} Σ_{l∈ℤ} φ̂(ω_m + 2πl) e^{i2πlt},    (19)

where (a) follows from the Poisson summation formula and (b) from the fact that the Fourier transform of ψ(t) is equal to the Fourier transform of ϕ(t) shifted by ω_m. Since φ̂(ω) satisfies the Strang-Fix conditions, from (18) and (19) it follows that

c_{m,0} = 1 / φ̂(ω_m).    (20)

The dots in Fig. 6b illustrate the values φ̂(ω_m) that are used in the computation of the different c_{m,0} for an E-spline with P = 6. Note that the generalised Strang-Fix conditions (10) impose some constraints on the choice of ω_m, since we have to guarantee that φ̂(ω_m) ≠ 0. From (11) and Fig. 5, it is clear that each ω_m introduces zeros at the locations ω_m + 2πl, where l ∈ ℤ \ {0}; we thus have to guarantee that for all pairs of distinct m, n we have ω_m − ω_n ≠ 2πl. In Fig. 6b, it can be appreciated that φ̂(ω) is nonzero for all ω = ω_m, and that at the locations ω_m + 2π and ω_m − 2π it is zero, since the curve in dB tends to −∞.
[Figure caption: E-spline with P = 6. Note that the support in time is equal to P + 1 and the E-spline quickly decays to zero. This E-spline reproduces the exponentials in Fig. 4, among others. The parameters α_m are purely imaginary or equal to zero; purely imaginary α_m appear in complex conjugate pairs. a illustrates the shape of the E-spline in time and b in frequency (expressed in dB). The dots in b represent the locations at which the Fourier transform of ϕ(t) is sampled in order to compute the c_{m,0} coefficients as in (20). c is a representation of the complex values e^{α_m}.]
From (20) and (17), we can compute the coefficients c_{m,n} for our choice of (α_m)_{m=0}^{P} and any value of n ∈ ℤ. By combining these coefficients with {ϕ(t − n)}_{n∈ℤ}, the exponentials {e^{α_m t}}_{m=0}^{P} are perfectly reproduced, as shown in Fig. 4.
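A numerical sketch of this computation, under the assumptions above and with illustrative purely imaginary parameters, is the following: an E-spline of order P = 2 is built on a fine grid, the coefficients c_{m,n} = e^{iω_m n}/φ̂(ω_m) are formed from the closed-form Fourier transform, and the reproduction of one exponential is checked.

```python
import numpy as np

# Illustrative parameters: E-spline reproducing e^{i*w*t} for w in {-pi/4, 0, pi/4}
dt = 1e-3
alphas = [-1j * np.pi / 4, 0.0, 1j * np.pi / 4]   # conjugate pairs -> real kernel

# Kernel built by convolving zero-order E-splines (support [0, P + 1) = [0, 3))
t0 = np.arange(0.0, 1.0, dt)
phi = np.exp(alphas[0] * t0)
for a in alphas[1:]:
    phi = np.convolve(phi, np.exp(a * t0)) * dt

def phi_hat(w):
    """Closed-form Fourier transform of the E-spline; each 0/0 factor tends to 1."""
    out = 1.0 + 0j
    for a in alphas:
        s = 1j * w - a
        out *= 1.0 if abs(s) < 1e-12 else (1.0 - np.exp(-s)) / s
    return out

w_m = np.pi / 4                      # reproduce e^{i*pi/4*t}
t = np.arange(0.0, 3.0, dt)
f = np.zeros_like(t, dtype=complex)
for n in range(-3, 4):               # all shifts whose support intersects [0, 3)
    c_mn = np.exp(1j * w_m * n) / phi_hat(w_m)
    idx = np.round((t - n) / dt).astype(int)
    mask = (idx >= 0) & (idx < len(phi))
    f[mask] += c_mn * phi[idx[mask]]

err = np.max(np.abs(f - np.exp(1j * w_m * t)))
print(err)  # small; limited only by the discretisation step dt
```

The residual error comes only from the grid-based approximation of the convolutions, not from the reproduction formula itself.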

Approximate reproduction of exponentials
The generalised Strang-Fix conditions (10) impose restrictive constraints on the sampling kernel. This becomes a problem when we have no control or flexibility over the design of the acquisition device. Recent publications (Dragotti et al. 2013) show that these conditions can be relaxed while still achieving a very accurate exponential reproduction, which is the property we require in order to reconstruct the analogue input signal. The first part of the Strang-Fix conditions, that is, φ̂(ω_m) ≠ 0, is easy to achieve, but the second part is harder to guarantee when we do not have control over the sampling device. If the sampling kernel does not satisfy the generalised Strang-Fix conditions, the exponential reproduction property (9) cannot be satisfied exactly. We thus have to find the coefficients c_{m,n} that best approximate the different exponentials e^{iω_m t}:

Σ_{n∈ℤ} c_{m,n} ϕ(t − n) ≈ e^{iω_m t}.
There are various options to compute these coefficients, but a good and stable approximation is obtained with the constant least-squares approach. If the Fourier transform of the sampling kernel is sufficiently small at ω = ω_m + 2πl, l ≠ 0, the c_{m,n} coefficients are given by

c_{m,n} = e^{iω_m n} / φ̂(ω_m).    (22)
Gaussian filters are good candidates for this approach, since they are smooth and their shape in time is very similar to that of E-splines (see Fig. 7a). The Fourier transform of such a filter, ϕ(t) = e^{−t²/(2σ²)}, is given by

φ̂(ω) = σ √(2π) e^{−σ²ω²/2}.

The filter is clearly nonzero at ω = ω_m + 2πl, l ≠ 0; however, as can be appreciated from Fig. 7a, the attenuation at these frequencies is very strong. This makes the exponential reproduction very accurate, as illustrated in Fig. 7b, c.
In the case of the Gaussian filter, we can easily obtain the c_{m,n} coefficients of the exponentials to be reproduced, since we have an analytical expression for its Fourier transform. When an analytical expression is unknown, we can still apply this approach, since we only need knowledge of the transfer function of the acquisition device at the frequencies ω = ω_m. The c_{m,n} coefficients are then given by (22).
The approximate Strang-Fix framework is therefore very attractive since it allows us to use the theory discussed so far with any acquisition device.
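The following sketch illustrates the approximate reproduction with a Gaussian kernel (the values of σ and ω_m are arbitrary choices):

```python
import numpy as np

# Approximate exponential reproduction with a Gaussian kernel (illustrative
# parameters): phi(t) = exp(-t^2/(2*sigma^2)).
sigma = 0.7
w_m = np.pi / 8

def phi(t):
    return np.exp(-t**2 / (2.0 * sigma**2))

def phi_hat(w):
    return sigma * np.sqrt(2.0 * np.pi) * np.exp(-sigma**2 * w**2 / 2.0)

t = np.linspace(0.0, 10.0, 2001)
f = np.zeros_like(t, dtype=complex)
for n in range(-10, 21):                        # shifts covering [0, 10]
    c_mn = np.exp(1j * w_m * n) / phi_hat(w_m)  # constant least-squares coefficients
    f += c_mn * phi(t - n)

err = np.max(np.abs(f - np.exp(1j * w_m * t)))
print(err)  # small: phi_hat is strongly attenuated at w_m + 2*pi*l, l != 0
```

The reproduction error is governed by the residual values φ̂(ω_m + 2πl), l ≠ 0, which the Gaussian attenuates by many orders of magnitude.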

Perfect reconstruction of FRI signals
In the previous section, we have seen some properties of exponential reproducing kernels. We have also seen that if the sampling kernel satisfies the exponential reproducing property, we can obtain some samples of the Fourier transform of the analogue input signal from the measurements (y_n)_{n=1}^{N} that result from the sampling process. We now show how this partial knowledge of the Fourier transform can be used to perfectly reconstruct some classes of non-bandlimited signals.

Perfect reconstruction of a stream of Diracs
We assume that the input signal is a stream of Diracs, x(t) = Σ_{k=1}^{K} a_k δ(t − t_k), and that the sampling kernel ϕ(t) satisfies the exponential reproduction property for a choice of α = (α_m)_{m=0}^{P} such that α_m = iω_m, where ω_m ∈ ℝ for m = 0, 1, …, P. We further impose the frequencies ω_m to be equispaced, that is, ω_{m+1} − ω_m = λ, and symmetric, that is, ω_m = −ω_{P−m}. We thus have ω_m = ω_0 + λm and ω_P = −ω_0.
Since x(t) is a sum of Diracs, its Fourier transform is given by a sum of exponentials:

x̂(ω) = Σ_{k=1}^{K} a_k e^{−iωt_k}.

This is clearly a non-bandlimited signal. We now consider the sequence s_m obtained by linearly combining the samples y_n with the coefficients c_{m,n} from the exponential reproducing property (9). From (15), we have that s_m = x̂(−ω_m/T) and therefore

s_m = Σ_{k=1}^{K} a_k e^{iω_m t_k/T} = Σ_{k=1}^{K} b_k u_k^m,    (25)

where b_k := a_k e^{iω_0 t_k/T} and u_k := e^{iλt_k/T}. Note that we have also used the fact that the frequencies can be expressed as ω_m = ω_0 + λm. The perfect recovery of the original stream of Diracs, that is, the estimation of the locations t_k and the amplitudes a_k of the K Diracs, is now recast as the estimation of the parameters b_k and u_k from the knowledge of the values s_m. The problem of estimating the parameters of a sum of exponentials from a set of samples arises in a variety of fields and has been analysed for many years by the spectral estimation community (Pisarenko 1973; Paulraj et al. 1985; Schmidt 1986). One way to solve it is by realising that the sequence s_m given in (25) is the solution to the following linear homogeneous recurrence relation:

h_K s_{m−K} + ⋯ + h_1 s_{m−1} + s_m = 0.    (26)
See section "Linear homogeneous recurrence relations with constant coefficients" of the Appendix for a description of this type of homogeneous system and its solutions. Note that the coefficients h_1, …, h_K are unknown, but they can be obtained from the following linear system of K equations:

S h = −s,    (27)

where S is the K × K Toeplitz matrix with entries S_{i,j} = s_{K−1+i−j} for i, j = 1, …, K, h = (h_1, h_2, …, h_K)^T and s = (s_K, s_{K+1}, …, s_{2K−1})^T. It can be shown that, if the K parameters u_k in (25) are distinct, which is a direct consequence of the fact that all the delays t_k are different, the Toeplitz matrix on the left-hand side of (27) has rank K, and therefore the solution is unique (see section "Rank deficiency of Toeplitz matrix" of the Appendix for a proof of the rank of this matrix). As shown in section "Linear homogeneous recurrence relations with constant coefficients" of the Appendix, the parameters u_k are obtained from the roots of the polynomial H(z) = h_K z^{−K} + ⋯ + h_1 z^{−1} + 1. Once the parameters u_k have been obtained, the amplitudes b_k of the sum of exponentials can be retrieved directly from (25) by solving the associated least-squares problem. From u_k and b_k, we can then compute t_k and a_k. The stream of Diracs is thus perfectly recovered. In the literature, this approach is known as Prony's method or the annihilating filter method (Stoica and Moses 2005).
The system of equations (27) requires at least 2K consecutive values s_m. Recall that the sequence s_m is obtained as s_m = Σ_{n=1}^{N} c_{m,n} y_n, with m = 0, 1, …, P, where P + 1 is the number of exponentials reproduced by the sampling kernel. We thus have a lower bound on the number of exponentials that the sampling kernel has to reproduce: P + 1 ≥ 2K. The perfect reconstruction of a stream of Diracs is summarised in the following theorem.

Theorem 1 Consider a stream x(t) of K Diracs, x(t) = Σ_{k=1}^{K} a_k δ(t − t_k), and a sampling kernel ϕ(t) that can reproduce the exponentials e^{i(ω_0 + λm)t}, with m = 0, 1, …, P and P + 1 ≥ 2K. Then, the samples y_n = ⟨x(t), ϕ(t/T − n)⟩ are sufficient to characterise x(t) uniquely.

Figure 8 illustrates the entire sampling process. Note that, since the sampling kernel is of compact support and the stream of Diracs is localised in time, only a small number of samples y_n are nonzero. From Fig. 8e, it is clear that the signal is not bandlimited. Furthermore, in the classical sampling setup, in order to sample a continuous-time signal at rate T^{−1} Hz, an anti-aliasing filter that sets x̂(ω) to zero for |ω| ≥ π/T has to be applied before acquisition. The FRI framework does not impose this stringent condition, since the sampling kernel is not necessarily equal to zero for all |ω| ≥ π/T.
[Figure caption: Sampling of a stream of decaying exponentials and perfect reconstruction. Since x(t) is an infinite-duration signal, the samples y_n are nonzero for n ≥ n_0, for some n_0 that depends on the location of the first decaying exponential. However, if the number of decaying exponentials is finite, the number of nonzero samples z_n = y_n − y_{n−1} e^{−αT} is also finite, since they are equivalent to sampling a stream of Diracs with a compact-support kernel. a Input signal x(t), b filtered and sampled signal, c reconstructed signal.]
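The recovery procedure of this section can be sketched numerically as follows (the parameters are illustrative; for simplicity, ω_0 = 0 is assumed, so that b_k = a_k in (25)):

```python
import numpy as np

# Annihilating filter (Prony) sketch with illustrative parameters
K = 3
T = 1.0
lam = np.pi / 8                       # spacing lambda of the frequencies
t_true = np.array([1.3, 4.7, 7.1])    # Dirac locations
a_true = np.array([1.0, -0.5, 2.0])   # Dirac amplitudes
P = 7                                 # P + 1 = 8 >= 2K measurements s_m

u = np.exp(1j * lam * t_true / T)
s = np.array([np.sum(a_true * u**m) for m in range(P + 1)])

# Annihilating filter: K x K Toeplitz system for h = (h_1, ..., h_K) as in (27)
S = np.array([[s[K - 1 + i - j] for j in range(K)] for i in range(K)])
h = np.linalg.solve(S, -s[K:2 * K])

# The u_k are the roots of z^K + h_1 z^{K-1} + ... + h_K
u_est = np.roots(np.concatenate(([1.0], h)))
t_est = np.sort(np.angle(u_est) * T / lam)

# Amplitudes from the least-squares fit of s_m = sum_k b_k u_k^m
V = np.vander(np.exp(1j * lam * t_est / T), P + 1, increasing=True).T
a_est = np.real(np.linalg.lstsq(V, s, rcond=None)[0])

print(np.round(t_est, 6))   # ≈ [1.3, 4.7, 7.1]
print(np.round(a_est, 6))   # ≈ [1.0, -0.5, 2.0]
```

In the noiseless case, the recovery is exact up to machine precision, as the theorem guarantees.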

Perfect reconstruction of a stream of decaying exponentials
Streams of Diracs are an idealisation of streams of pulses. Although this example may seem limited, the framework presented so far can be applied to other classes of functions that model a variety of signals. For instance, the calcium concentration measurements obtained from two-photon imaging to track the activity of individual neurons can be modelled with a stream of decaying exponentials. In this model, the time delays correspond to the activation times of the tracked neuron, that is, to the action potentials (APs).
Let x(t) be a stream of K decaying exponentials, that is,

x(t) = Σ_{k=1}^{K} a_k ρ_α(t − t_k),    (28)

where ρ_α(t) := e^{−αt} 1_{t≥0}. See Fig. 9a for an example of such a signal. This is also an FRI signal, since x(t) is perfectly determined by a finite number of parameters: {(t_k, a_k)}_{k=1}^{K}. Let us assume that x(t) is sampled with the acquisition device described in Sect. 3.2.1, that is, an exponential reproducing kernel h(t) = ϕ(−t/T), followed by a sampling stage. We thus have that ϕ(t) satisfies (9), and the resulting samples y_n can be expressed as the inner product between x(t) and ϕ(t/T − n) as in (8).
Let us also assume that the reproduced exponentials e^{iω_m t} can be expressed as e^{i(ω_0 + λm)t}, with m = 0, 1, …, P. It can be shown that sampling the signal in (28) with ϕ(−t/T) and computing the finite differences

z_n = y_n − e^{−αT} y_{n−1}

yields the sequence that would result from sampling the stream of Diracs s(t) = Σ_{k=1}^{K} a_k δ(t − t_k) with the kernel

ψ(t) = ϕ(t) ∗ β_{αT}(−t),

where β_{αT}(−t) is a time-reversed zero-order E-spline with parameter αT (Oñativia et al. 2013a). Note that α is the exponent in (28). We thus have that

z_n = ⟨s(t), ψ(t/T − n)⟩.

Since convolution preserves the exponential reproduction property, ψ(t) reproduces the same exponentials as ϕ(t). Thus, we can find coefficients d_{m,n} such that Σ_{n∈ℤ} d_{m,n} ψ(t − n) = e^{iω_m t}, for m = 0, 1, …, P.
We now have all the elements needed to perfectly reconstruct the stream of decaying exponentials x(t) from the samples y_n, that is, to estimate the set of parameter pairs {(t_k, a_k)}_{k=1}^{K}. By combining the sequence z_n with the coefficients d_{m,n}, we obtain exactly the same measurements s_m as in (25):

s_m = Σ_n d_{m,n} z_n = Σ_{k=1}^{K} b_k u_k^m,

where b_k = a_k e^{iω_0 t_k/T} and u_k = e^{iλt_k/T}. We can therefore apply Prony's method to this sequence and obtain the parameters of interest. Figure 9 illustrates the perfect reconstruction of a stream of K = 4 decaying exponentials.
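The effect of the finite-difference step can be sketched numerically; the sketch below uses idealised pointwise samples y_n = x(nT) instead of the kernel-based measurements, and all parameters are illustrative. It shows how each decaying exponential collapses to a single nonzero sample of z_n:

```python
import numpy as np

# Illustrative stream of decaying exponentials, sampled pointwise
alpha, T = 1.5, 0.25
t_k = np.array([1.32, 4.07, 7.81])   # activation times
a_k = np.array([1.0, 0.6, 1.4])      # amplitudes

n = np.arange(41)                    # samples covering t in [0, 10]
t_n = n * T
y = np.zeros_like(t_n)
for tk, ak in zip(t_k, a_k):
    y += ak * np.exp(-alpha * (t_n - tk)) * (t_n >= tk)

# Finite differences z_n = y_n - e^{-alpha*T} * y_{n-1}: the exponential tails
# cancel exactly, leaving one nonzero sample right after each activation time
z = y.copy()
z[1:] = y[1:] - np.exp(-alpha * T) * y[:-1]

nonzero = np.flatnonzero(np.abs(z) > 1e-12)
print(nonzero)  # [ 6 17 32]: the first sampling instants after each t_k
```

This is the same mechanism that, with a proper exponential reproducing kernel, turns the acquisition of (28) into the acquisition of a stream of Diracs.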

FRI signals with noise
The acquisition process inevitably introduces noise, making the solutions described so far only ideal. Perturbations may arise in both the analogue and the digital domain. We model the noise of the acquisition process as a white Gaussian process added to the ideal samples. The noisy samples are therefore given by

ỹ_n = y_n + ε_n,

where y_n are the ideal noiseless samples from (8) and ε_n are i.i.d. Gaussian random variables with zero mean and variance σ_ε². In order to make the reconstruction more robust, we increase the number of measurements s_m by making the order P larger than the critical value 2K − 1.
The denoising strategies that can be applied to improve the performance of the reconstruction process come from the spectral analysis community, where the problem of finding sinusoids in noise has been extensively studied. There are two main approaches. The first, the Cadzow denoising algorithm, is an iterative procedure applied to the Toeplitz matrix constructed from the samples s_m as in (27). Let us denote this matrix by S. By construction, this matrix is Toeplitz and, in the noiseless case, it has rank K. The presence of noise makes the matrix full rank. The Cadzow algorithm (Cadzow 1988) looks for the closest rank-deficient matrix that is Toeplitz. At each step, we force the matrix S to have rank K by computing its singular value decomposition (SVD), keeping only the K largest singular values and setting the rest to zero. The resulting matrix is no longer Toeplitz; we thus compute a new Toeplitz matrix by averaging along its diagonals. This last matrix might not be rank deficient, so we can iterate again. The next step is to solve equation (27). This is done by computing the total least-squares solution that minimises ‖Sh‖₂ subject to ‖h‖₂ = 1, where h is an extended version of the vector in (27) of length K + 1. If this vector is normalised with respect to its first element, the following K elements correspond to the coefficients h_k in (26). This approach has been successfully applied in the FRI setting in Blu et al. (2008).
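A minimal sketch of the Cadzow iteration follows; the matrix dimensions, the noise level and the number of iterations are illustrative choices:

```python
import numpy as np

def cadzow(s, K, n_iter=20):
    """Alternate rank-K truncation and diagonal averaging on the matrix of s_m."""
    s = np.array(s, dtype=complex)
    P = len(s) - 1
    cols = (P + 2) // 2
    rows = P + 2 - cols
    for _ in range(n_iter):
        # Toeplitz matrix with S[i, j] = s[i + cols - 1 - j]
        S = np.array([[s[i + cols - 1 - j] for j in range(cols)] for i in range(rows)])
        U, sv, Vh = np.linalg.svd(S, full_matrices=False)
        sv[K:] = 0.0                                   # force rank K
        S = (U * sv) @ Vh
        for m in range(P + 1):                         # average each diagonal
            vals = [S[i, i + cols - 1 - m] for i in range(rows)
                    if 0 <= i + cols - 1 - m < cols]
            s[m] = np.mean(vals)
    return s

rng = np.random.default_rng(0)
u = np.exp(1j * np.array([0.4, 1.1]))
s_clean = np.array([np.sum(u**m) for m in range(10)])
s_noisy = s_clean + 0.05 * (rng.standard_normal(10) + 1j * rng.standard_normal(10))

s_den = cadzow(s_noisy, K=2)
print(np.abs(s_den - s_clean).sum())    # typically smaller than the raw error
print(np.abs(s_noisy - s_clean).sum())
```

Note that a noiseless sum of K exponentials is a fixed point of the iteration: its Toeplitz matrix already has rank K, so neither step changes it.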
The second approach is based on subspace techniques for estimating the generalised eigenvalues of matrix pencils (Hua and Sarkar 1990, 1991). This approach has also been applied in the FRI framework (Maravić and Vetterli 2005). The method exploits the particular structure of the matrix S, which is Toeplitz and whose elements are given by a sum of exponentials. Let S_0 be the matrix constructed from S by dropping the first row and S_1 the matrix constructed from S by dropping the last row. It can be shown that, in the matrix pencil S_0 − μS_1, the parameters {u_k}_{k=1}^{K} from (25) are rank-reducing numbers, that is, the matrix S_0 − μS_1 has rank K − 1 for μ = u_k and rank K otherwise. The parameters {u_k}_{k=1}^{K} are thus given by the eigenvalues of the generalised eigenvalue problem (S_0 − μS_1)v = 0.
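A minimal numerical sketch of this idea follows (noiseless, with illustrative parameters; numpy's pseudo-inverse is used here instead of a dedicated generalised eigensolver):

```python
import numpy as np

# Matrix pencil sketch: the rank-reducing numbers u_k of the pencil S0 - mu*S1
# appear as the K dominant eigenvalues of pinv(S1) @ S0.
K = 2
u_true = np.exp(1j * np.array([0.4, 1.3]))
b = np.array([1.0, 0.7])
s = np.array([np.sum(b * u_true**m) for m in range(8)])

# Toeplitz matrix with S[i, j] = s[K + i - j]
S = np.array([[s[K + i - j] for j in range(K + 1)] for i in range(len(s) - K)])
S0 = S[1:, :]    # S without its first row
S1 = S[:-1, :]   # S without its last row

eigvals = np.linalg.eigvals(np.linalg.pinv(S1) @ S0)
u_est = eigvals[np.argsort(-np.abs(eigvals))][:K]   # spurious eigenvalues are ~0
print(np.sort(np.angle(u_est)))  # ≈ [0.4, 1.3]
```

The spurious eigenvalues introduced by the pseudo-inverse are (numerically) zero, so the K eigenvalues of largest magnitude are the u_k, which lie on the unit circle.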
Further variations of these two fundamental approaches have been proposed recently. See for example Tan and Goyal (2008), Erdozain and Crespo (2011), Hirabayashi et al. (2013).

Sampling streaming FRI signals
In the previous section, we have seen how to sample and reconstruct a set of K Diracs. We now consider the case where we have a streaming signal:

x(t) = Σ_{k∈ℤ} a_k δ(t − t_k).

If the stream is made of clearly separable bursts, we can apply the previously described strategy by assuming that each burst has a maximum number of spikes. However, when this separation cannot be made, because of the presence of noise or due to the nature of the signal itself, this strategy is not valid. The infinite stream presents an obvious constraint due to the number of parameters that have to be recovered. We have seen that the order of the sampling kernel, P, and its support are directly related to the number of parameters to be estimated. However, we cannot increase P indefinitely. In order to handle this type of signal, we thus consider a sequential and local approach (Oñativia et al. 2013b).
[Figure caption: Fig. 10 Border effects with the sliding window approach. In this example, N = 16 and T = 1/4. a A nearby Dirac located before the observation window τ influences the samples y_n of the window. b A Dirac inside the window but close to the right border generates nonzero samples outside the window.]

Sliding window approach
We assume that x(t) has a bounded local rate of innovation of 2K/τ, that is, any time window of duration τ contains at most K Diracs. Since each Dirac has two degrees of freedom, location and amplitude, the local rate of innovation is 2K/τ. We analyse the infinite stream sequentially with a sliding window that progresses in time by steps equal to the sampling interval T. The i-th window covers a temporal interval of duration τ = NT, where N is the number of samples processed for each position of the sliding window. The acquisition device is the same as in the previous section: the sampling kernel is given by h(t) = ϕ(−t/T) and y_n = ⟨x(t), ϕ(t/T − n)⟩. In order to have a causal filter h(t), that is, h(t) = 0 for t < 0, we impose the support of ϕ(t) to be t ∈ (−L, 0], where L = P + 1 if ϕ(t) is an E-spline of order P. The support of ϕ(t/T − n) is therefore t ∈ ((n − L)T, nT]. Consequently, a Dirac located at t = t_k influences L samples y_n, namely those with indices n = ⌈t_k/T⌉, …, ⌈t_k/T⌉ + L − 1. When we process the stream sequentially, there are border effects due to the fact that we only process N samples at a time. Diracs located just before the sliding window influence samples within the window, and Diracs inside the observation window which are close to the right border influence samples outside the window. These effects are illustrated in Fig. 10. However, if the sliding window is large enough, a good number of window positions fully capture each individual Dirac and therefore lead to a good estimate of its amplitude and location. In the noiseless case, we can detect whether these border effects are present; when they are absent, the reconstruction can be exact. Nonetheless, in the presence of noise, we cannot guarantee perfect reconstruction.
For this reason, the sequential algorithm works in two steps: first, it estimates the locations for each position of the sliding window; second, it analyses the consistency of the retrieved locations among different windows. The i-th window processes the samples y_n for n = n_i, …, n_i + N − 1. Let {t_k^(i)} be the set of estimated locations within the i-th window. When the observation window is at position t = n_i T, we know that Diracs located at t < (n_i − L)T cannot have any influence on the current samples. We can therefore analyse the consistency of the locations up to t = (n_i − L)T. Figure 11a shows the retrieved locations for different positions of the sliding window, where the horizontal axis corresponds to the window index, n_i, and the vertical axis to the locations in time; that is, for a given window index, each dot corresponds to an estimate from the set {t_k^(i)}. Consistent locations among different windows appear as horizontally aligned dots. The shaded area represents the evolution in time of the observation window: for a given index n_i, the vertical cross section of the shaded area represents the time interval τ that is seen by this window. This consistency can be analysed by building a histogram of all the estimated locations up to a given time, as illustrated in Fig. 11b. The Diracs are then estimated from the peaks of this histogram.
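The consistency step above can be sketched as follows: pool the per-window estimates, histogram those that can no longer change, and declare a Dirac wherever enough windows agree. Function name, bin width and vote count are illustrative choices, not values from the paper:

```python
import numpy as np

def consistent_locations(estimates, t_max, bin_width, min_count):
    """Fuse per-window location estimates into final Dirac locations.

    estimates : list of lists, one per window position, each holding the
                locations {t_k^(i)} retrieved from that window.
    Only estimates up to t_max (the time no future window can still
    influence) are histogrammed; bins whose count reaches min_count are
    declared Diracs, located at the bin centre.
    """
    all_t = np.concatenate([np.asarray(e, dtype=float) for e in estimates])
    all_t = all_t[all_t <= t_max]
    edges = np.arange(0.0, t_max + bin_width, bin_width)
    counts, edges = np.histogram(all_t, bins=edges)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres[counts >= min_count]
```

A Dirac repeatedly located near t = 1.03 by several windows produces a dominant histogram bin there, while an isolated spurious estimate is voted out.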

Application to neuroscience
To understand how neurons process information, neuroscientists need accurate information about the firing of action potentials (APs, or spikes) by individual neurons. We thus need techniques that allow us to monitor large areas of the brain with a spatial resolution that distinguishes single neurons and with a temporal resolution that resolves APs. Of the currently available techniques, only multiphoton calcium imaging (Denk et al. 1990, 1994; Svoboda et al. 1999; Stosiek et al. 2003) and multielectrode array electrophysiology (Csicsvari et al. 2003; Blanche et al. 2005; Du et al. 2009) offer this capability. Of these, only multiphoton calcium imaging currently allows precise three-dimensional localisation of each individual monitored neuron within the region of tissue studied, in the intact brain. Populations of neurons are simultaneously labelled with a fluorescent indicator, such as acetoxymethyl (AM) ester calcium dyes (Stosiek et al. 2003). This allows simultaneous monitoring of action potential-induced calcium signals in a plane (Ohki et al. 2005) or volume (Göbel and Helmchen 2007) of tissue. The calcium concentration is measured with a laser-scanning two-photon imaging system. For a given region of interest (ROI) where a neuron is located, the calcium concentration is obtained by averaging the value of the pixels of the ROI for each frame. The result is a one-dimensional fluorescence sequence. We assume that when a neuron is activated, the calcium concentration jumps instantaneously, and each jump has the same amplitude A. The concentration then decays exponentially, with time constant τ, to a baseline concentration. The one-dimensional fluorescence signal can therefore be characterised by convolving the spike train with a decaying exponential and adding noise: f(t) = A ∑_k e^(−(t − t_k)/τ) u(t − t_k) + ε(t), (38) where u(t) is the unit step, the index k runs over the different spikes, t_k denotes their occurrence times and ε(t) is the noise. Hence, the goal of spike detection algorithms is to recover the values t_k.
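The calcium transient model can be simulated directly: each spike adds an instantaneous jump of amplitude A followed by an exponential decay with time constant τ, plus additive noise. A minimal sketch, with illustrative parameter values of our own choosing:

```python
import numpy as np

def fluorescence(t, spike_times, A=1.0, tau=0.5, sigma=0.0, rng=None):
    """Synthesize a fluorescence trace from a spike train.

    Each spike at t_k contributes A * exp(-(t - t_k)/tau) for t >= t_k
    (jump of amplitude A, then exponential decay); white Gaussian noise
    of standard deviation sigma is optionally added on top.
    """
    t = np.asarray(t, dtype=float)
    f = np.zeros_like(t)
    for t_k in spike_times:
        after = t >= t_k                      # the transient is causal
        f[after] += A * np.exp(-(t[after] - t_k) / tau)
    if sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        f += sigma * rng.standard_normal(t.shape)
    return f
```

Two spikes closer than a few time constants simply superpose: at t = 1.0 a spike fired at t = 0.5 still contributes A·e^(−1) on top of the fresh jump.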
A number of methods have previously been used to detect spike trains from calcium imaging data, including thresholding the first derivative of the calcium signal (Smetters et al. 1999), and the application of template-matching algorithms based on either fixed exponential (Kerr et al. 2005, 2007; Greenberg et al. 2008) or data-derived (Schultz et al. 2009; Ozden et al. 2008) templates. Machine learning techniques (Sasaki et al. 2008) and probabilistic methods based on a sequential Monte Carlo framework (Vogelstein et al. 2009) or fast deconvolution (Vogelstein et al. 2010) have also been proposed. Some broadly used methods, such as template matching or derivative thresholding, have the disadvantage that they do not deal well with multiple events occurring within a time period comparable to the sampling interval. Our spike detection algorithm is based on connecting the calcium transient estimation problem to the theory of FRI signals. The calcium concentration model in (38) is clearly an FRI signal; we can thus apply the techniques presented in the previous sections.

Spike inference algorithm
The spike inference algorithm is based on applying the sliding window approach presented in Sect. 4.1 combined with the reconstruction of streams of decaying exponentials presented in Sect. 3.2.2. One major issue of the framework presented so far is that we have assumed the number K of spikes within a time window to be known a priori. In practice, this is a value that has to be estimated.
In the noiseless case, the number of spikes can be recovered from the rank of the Toeplitz matrix S constructed from the samples s_m, with entries [S]_{i,j} = s_{M+i−j} for i = 0, …, P − M and j = 0, …, M, where M = ⌊P/2⌋: we then have K = rank(S). In the noisy case, the matrix S becomes full rank. An estimate of K can still be obtained by thresholding the normalised singular values of S. Let μ_1 ≥ μ_2 ≥ … ≥ μ_{M+1} be the singular values of S sorted in decreasing order. We can estimate K as the number of singular values that satisfy μ_i/μ_1 ≥ μ_0, where 0 < μ_0 < 1 is a threshold adjusted depending on the level of noise. This approach tends to overestimate K. Moreover, we never detect the K = 0 case, since the largest normalised singular value always equals μ_1/μ_1 = 1.

Fig. 13 Fluorescence signal and detected spikes using the double consistency approach. The spikes are detected from the peaks of the histogram in Fig. 12c
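The rank-based model-order estimation can be sketched as follows: build the Toeplitz matrix from the sequence s_0, …, s_P and count the normalised singular values above the threshold μ_0. The function name and the default threshold are illustrative assumptions:

```python
import numpy as np

def estimate_K(s, mu0=0.3):
    """Estimate the number of spikes K from the samples s_0, ..., s_P.

    A Toeplitz matrix S with entries S[i, j] = s[M + i - j], M = P // 2,
    is formed; in the noiseless case K = rank(S), and with noise K is
    taken as the number of normalised singular values mu_i / mu_1 that
    reach the threshold mu0 (0 < mu0 < 1, tuned to the noise level).
    """
    s = np.asarray(s, dtype=complex)
    P = len(s) - 1
    M = P // 2
    # Toeplitz matrix of size (P - M + 1) x (M + 1).
    S = np.array([[s[M + i - j] for j in range(M + 1)]
                  for i in range(P - M + 1)])
    mu = np.linalg.svd(S, compute_uv=False)   # sorted in decreasing order
    return int(np.sum(mu / mu[0] >= mu0))
```

For a noiseless sequence generated by K = 2 exponentials, only two singular values are (numerically) nonzero, so the count recovers K exactly.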

To overcome these inaccuracies, we make the algorithm more robust by applying a double consistency approach. We run the sliding window approach presented in Sect. 4.1 twice: first, with a sufficiently large window, where we estimate K from the singular values of S; second, with a smaller window, where we assume that we only capture one spike and therefore always set K = 1. We then build a joint histogram out of all the locations retrieved from both passes and estimate the spikes from the peaks of the histogram. This approach is illustrated in Figs. 12 and 13 with real data.
This technique is fast and robust in high-noise and low-temporal-resolution scenarios. On real data it achieves a detection rate of 84 % of electrically confirmed APs (Oñativia et al. 2013a), outperforming other state-of-the-art real-time approaches. Due to its low complexity, tens of streams can be processed in parallel on a commercial off-the-shelf computer.

Conclusions
We have presented a framework to sample and reconstruct signals with finite rate of innovation. We have shown that it is possible to sample and perfectly reconstruct streams of Diracs and, more importantly, streams of decaying exponentials. The latter offer a perfect fit for the calcium transients induced by the spiking activity of neurons. The presented approach is sequential, and the reconstruction is local. These two features make the overall algorithm resilient to noise and give it a low complexity, offering real-time capabilities.
The theoretical framework, where perfect reconstruction can be achieved, has also been extended to the more realistic case where we do not have full control over the sampling kernel. In this case, perfect reconstruction cannot be guaranteed, but we can still reconstruct the underlying analogue signal with high precision, provided the sampling kernel can approximately reproduce exponentials.