1 Background

Since the seminal papers [1–6] of Agrawal and Raskar, the coded exposure (flutter shutter) method has received a lot of follow-ups [7–39]. In a nutshell, the authors proposed to open and close the camera shutter, according to a sequence called “code,” during the exposure time. By this clever exposure technique, the coded exposure method permits one to arbitrarily increase the exposure time when photographing (flat) scenes moving at a constant velocity. Note that with a coded exposure method only one picture is stored/transmitted. A rich body of empirical results suggests that the coded exposure method allows for a gain in terms of Mean Square Error (MSE) or Signal to Noise Ratio (SNR) compared to a classic camera, i.e., a snapshot. Therefore, the coded exposure method seems to be a magic tool that should equip all cameras.

We now briefly review the different applications, variants, and studies that surround the coded exposure method. An application of the coded exposure method to bar codes is given in [1–6, 35], to fluorescent cell imaging in [27], to periodic events in [25, 34, 36], to multi-spectral imaging in [10], and to iris recognition in [21]. Applications to motion estimation/deblurring are presented in [9, 19, 20, 26, 31, 33, 37, 38]. An extension to space-dependent blur is investigated in [28]. Methods to find better or optimal sequences are investigated in [12–14, 22, 23, 39] or in [15], which aims at adapting the sequence to the velocity. Diverse implementations of the method are presented in [17, 18, 24, 32]. The method is used for a spatial/temporal trade-off in [7, 8, 11]. A numerical and mathematical investigation of the gain of the method can be found in [29, 30], but their camera model contains only photon (shot) noise and neglects all other noise sources, contrary to the model we shall develop in this paper.

Therefore, as far as we know, little is known about the coded exposure method from a rigorous mathematical point of view, and it seems useful for applications to build a theory that sheds some light on this promising method. For instance, to the best of our knowledge, little is known about the gain, in terms of MSE and SNR, of the coded exposure method compared to a standard (snapshot) camera. This paper proposes a mathematical model of photon acquisition by a light sensor. The model can cope with any additive readout noise of finite variance in addition to the Poisson photon (shot) noise. The model is compatible with the Shannon–Whittaker framework; it assumes that the relative camera-scene velocity is constant and known, that the sensor does not saturate, that the readout noise has finite variance, and that the coded exposure method allows for an invertible transformation among the class of band-limited functions (this means that the observed image can be deblurred using a filter). Note that with this model the image has a structure: the image is assumed to be band limited. This set of assumptions represents an ideal mathematical framework that allows us to give a rigorous analysis of the limits, in terms of MSE and SNR, of the coded exposure method. For instance, it is clear that the MSE (resp. SNR) will increase (resp. decrease) if one needs to estimate the velocity from the observed data, compared to the formulae we shall prove in this theoretical paper.

To be thorough, a mathematical analysis of a camera requires going rigorously from the continuous observed scene to the discrete samples of the final restored image. This is needed to mathematically analyze the whole imaging chain: from the photon emission to the final restored image via the observed discrete samples measured by the camera. As far as we know, the coded exposure method is useful chiefly for moving scenes. Consequently, we need a formalism capable of dealing with moving scenes. Since the observed scene moves continuously with respect to time we adopt a continuous point of view. This means that we shall model the observed scene as a function s. Loosely speaking, s(x) gives the light intensity at a spatial position x (in contrast, a discrete formalism would model the observed scene as a vector of \(\mathbb {R}^n\) but requires a more restrictive assumption, see below). We shall rely on the Shannon–Whittaker framework (see, e.g., [40]) to perform the mathematical analysis of sampling-related questions. This framework requires the structure of band-limited (with a cutoff frequency) signals or images and will allow us to perform a rigorous mathematical analysis of the coded exposure method. Recall that a discrete formalism would model the observed scene as a vector of \(\mathbb {R}^n\) and the convolution would use Toeplitz matrices. Therefore, the scene would be assumed to be periodic and also band limited for sampling purposes. Note that the continuous formalism that we shall develop in this paper does not require assuming that the observed scene s is periodic (most natural scenes are not periodic). However, the adaptation of the formalism that we shall develop in this paper to periodic band-limited scenes is straightforward if needed for some application.

Our first goal is to provide closed mathematical formulae that give the MSE and SNR of images obtained by a coded exposure camera. Therefore, we shall start by carefully modeling the photon acquisition by a light sensor and then deduce a mathematical model of the coded exposure method.

The mathematical model of a camera that we shall develop in this paper has not, to the best of our knowledge, been developed in the existing literature on the coded exposure method. Indeed, the model we shall develop is able to cope with the Poisson photon (shot) noise in addition to any additive (sensor readout) noise of finite variance and does not require assuming that the observed scene is periodic. For example, the model developed in [30] does not consider any additive (sensor readout) noise. The formulae that give the MSE and SNR of the final crisp image:

  • Assume the Shannon–Whittaker framework that (1) requires band-limited (with a frequency cutoff) images, and that (2) requires the pixel size to be designed according to the Shannon–Whittaker theory. In this paper, we prove the validity of the Shannon–Whittaker interpolation for non-stationary noises (see also Sect. 2.2).

  • Assume that the relative camera scene velocity is constant and known.

  • Assume that the sensor does not saturate.

  • Assume that the additive (sensor readout) noise has zero mean and finite variance (this term contains, without loss of generality, the quantization noise).

  • Assume that the coded exposure allows for an invertible transformation among the class of band-limited functions (this means that the observed image can be deblurred using a filter).

  • Neglect the boundary effects of the deconvolution (the inverse filter of a coded exposure camera has a larger support than the inverse filter of a snapshot; thus, this slightly overestimates the gain of the coded exposure method with respect to the snapshot).

We assume that the sensor readout (additive) noise has zero mean. However, with our formalism, the adaptation to non-zero mean additive (sensor readout) noise is straightforward if needed for some application. This zero mean assumption for the additive noise can be found in, e.g., [41, 1st paragraph and Eqs. (22)–(25)]. It can also be found in [42, p. 2, 3rd paragraph] and [43, p. 554, column 2, “noise model” paragraph] (for HDR sensors). It is also common for CMOS (3T) APS sensors, see, e.g., [44, p. 179, paragraph 2], and for certain infrared sensors (microbolometers), see, e.g., [45, p. 98], which states that these devices have the readout noise of a CMOS device.

The paper is organized as follows. Section 2 gives a mathematical model of classic cameras. This mathematical model is extended in Sect. 3 to model coded exposure cameras. Section 4 gives an upper bound for the gain of the coded exposure method, in terms of MSE and SNR with respect to a snapshot, as a function of the temporal sampling of the code. The upper bound of Corollary 4.2 is illustrated in Fig. 2. In addition, Table 1 provides numerical experiments illustrating these results. Appendices A–L contain several proofs of propositions that are used throughout this paper. A glossary of notations is in Appendix M (in the sequel Latin numerals refer to the glossary of notations).

2 A mathematical model of classic cameras

The goal of this section is to provide a mathematical model of the photon acquisition by a light sensor and the formalism that we shall use to model the coded exposure method in the sequel.

As usual in the coded exposure literature [2, 3, 9, 21, 22, 30, 35, 46] and for the sake of clarity we shall formalize the coded exposure method using a one-dimensional framework. In other words, the sensor array and the observed image are assumed to be one-dimensional. One could think that this one-dimensional framework is a limitation of the theory. However, it is not. Indeed, as we have seen, we assume that the image acquisition obeys the Shannon–Whittaker sampling theory. This means that the frequency cutoff is compatible with the image sampling grid. The extension to any two-dimensional grid (and two-dimensional images) is straightforward (the sketch of the proof is in Appendix A). Therefore, the one-dimensional framework that we shall consider is no limitation for the scope of this paper, which proposes a mathematical analysis of coded exposure cameras. A fortiori, the calculations of MSE and SNR that we shall propose in this paper remain valid for two-dimensional images. The noise is, in general, non-stationary. This is due both to the sensor (see, e.g., [47]) and to the observed scene. In this paper, we also prove that the Shannon–Whittaker interpolation is valid for non-stationary noises (see also Sect. 2.2). In addition, we shall assume that the motion blur kernel is known, i.e., the relative camera-scene velocity vector and the exposure code (or function) are known (this kernel is called “PSF motion” in, e.g., [1] and is also assumed to be known there [1, p. 2]).

We now turn to the mathematical model of photon acquisition by a light sensor.

2.1 A mathematical model of photon acquisition by a light sensor

The goal of this subsection is to give a rigorous mathematical definition (see Definition 2) of the samples produced by a pixel sensor that observes a moving scene. This definition of the observed sample can cope with any additive zero-mean (sensor readout) noise of finite variance in addition to the standard Poisson photon (shot) noise. Note that the model developed in [30] does not consider any additive (sensor readout) noise. Therefore, the results of [30] do not cover this more elaborate mathematical model. In particular, the advantages of the coded exposure method in terms of MSE, within this more elaborate setup, are, to the best of our knowledge, open questions.

We consider a continuous formalism in order to ease the transition from steady scenes to scenes moving at an arbitrary real velocity. Another advantage of this continuous formalism is that it allows us to avoid the implicit periodicity assumption on the observed scene needed if one uses Toeplitz matrices to represent the convolutions, see, e.g., [1, Eq. 2, p. 3] (this matters because, in general, natural scenes are not periodic).

We now sketch the construction of our camera model. We first consider the photon emission, then include the optical and sensor kernels, and then the effect of the exposure time and of the motion. The last two steps consist in adding the Poisson photon (shot) noise and the additive (sensor readout) noise to our camera model. The camera model that we shall consider in this paper is depicted in Fig. 1.

Fig. 1

Schematic diagram of our camera model. The observed scene emits light and moves at velocity \(v \in \mathbb {R}\). The light undergoes the blur of the optical system and is measured by a pixel sensor. The pixel sensor produces a Poisson random variable (shot noise) that is further corrupted by an additive (sensor readout) noise of finite variance to produce the observed sample

We assume that the observed scene emits photons at a deterministic rate s defined by

$$\begin{aligned} s: \mathbb {R} \longrightarrow (0,+\infty ), \qquad x \longmapsto s(x). \end{aligned}$$

Here and in the sequel, the variable \(x\in \mathbb {R}\) represents the spatial position (we will specify the unit of x, i.e., the unit we shall use to measure distances, when we introduce the pixel sensor). Intuitively, \(s\) represents the ideal crisp image, i.e., the image that one would observe if there were no noise whatsoever, no motion, a perfect optical system (formally, a point spread function equal to a Dirac mass), and a pixel sensor of infinitesimal area. In a nutshell, \(s(x)\) would be the gray level of the image at position \(x \in \mathbb {R}\) in the idealistic case mentioned above. The quantity \(s(x)\) can also be seen as the intensity of light emission at position x.

We now introduce the optical system in our model. The effect of the optical system is described by its point spread function (PSF), denoted \(g\), and we assume that \(g\ge 0\). Formally, the effect of the point spread function is modeled by a convolution in space (see, e.g., [48, Eq. 7.1, p. 171]; see also, e.g., [1, Eq. 1, Sect. 2]). Therefore, in the noiseless case, if there is no motion, the gray level of the acquired image at position \(x \in \mathbb {R}\) is, formally, described by

$$\begin{aligned} (g*s)(x), \end{aligned}$$
(1)

where \(*\) denotes the convolution (see (ix) for the definition) (recall that here and in the rest of the text, Latin numerals refer to the formulae in the final glossary). We shall give the assumptions on \(g\) and \(s\) so that the quantity in (1) is well defined later on.

A pixel sensor can be small but nevertheless has a positive area. Indeed, a pixel sensor integrates the incoming light \(g*s\) (the scene is observed through the optical system) on some surface element of the form \([x_1,x_2]\subset \mathbb {R}\) with \(x_1<x_2\). Therefore, formally, the output of a pixel sensor supported by \([x_1,x_2]\), in the noiseless case and without motion, is

$$\begin{aligned} \int _{x_1}^{x_2} (g*s)(y)\mathrm{d}y. \end{aligned}$$
(2)

In the sequel, we shall assume that all the pixel sensors of the sensor array have the same length. Mathematically, we can normalize this length so that every pixel sensor of the array has unit length. This corresponds to using the pixel sensor length as the unit to measure distances. Thus, this represents no limitation. Hence, from now on the unit of x is the pixel sensor length. By definition, with this unit, all the pixel sensors have length 1. Therefore, from now on when we speak of a pixel sensor centered at x we mean that the pixel sensor is supported on the interval \([x-\frac{1}{2},x +\frac{1}{2}]\). Hence, from (2) we deduce that the output of a pixel sensor supported on the interval \([x-\frac{1}{2}, x+ \frac{1}{2}]\), which stares at the scene s through the optical system modeled by \(g\), is, in the noiseless case and without motion,

$$\begin{aligned} \int _{x-\frac{1}{2}}^{x + \frac{1}{2}} (g*s)(y)\mathrm{d}y= (\underbrace{\mathbb {1}_{[-\frac{1}{2},\frac{1}{2}]}*g*s}_{u})(x). \end{aligned}$$
(3)

Remark

We implicitly assume a 100 % fill factor for the sensor as the pixel sensor is supported on \([-\frac{1}{2},\frac{1}{2}]\) and we have a pixel sensor at every unit. This is no loss of generality for studying the gain of the flutter shutter with respect to a snapshot. Indeed, the fill factor impacts the snapshot and the flutter shutter equally. In addition, the RMSE calculations are carried out using the function u in (3) as reference and using an unbiased estimator for u. Thus, all results we give in this paper hold if one replaces u by \(u=\mathbb {1}_{[-\varepsilon ,\varepsilon ]}*g*s\) in (3) for any \(\varepsilon \in (0,\frac{1}{2}]\).

Consider the deterministic function formally defined by \(u:=\mathbb {1}_{[-\frac{1}{2},\frac{1}{2}]}*g*s\). The deterministic quantity u(x) represents the gray level of the image at position x if there were no noise and no motion. Indeed, u contains the kernels of the optical system \(g\) and of the sensor. Note that the quantity u(x) can also be seen as an intensity of light emission received by a unit pixel sensor centered at x. With the formalism of, e.g., [1, Eq. 1, Sect. 2] \(\mathbb {1}_{[-\frac{1}{2},\frac{1}{2}]}\) represents “\(h_{\text {sensor}}\)” and \(g\) represents “\(h_{\text {lens}}\).”

We now introduce the exposure time in our model. Indeed, the sensor accumulates the light during a time span of the form \([t_1,t_2]\subset \mathbb {R}\), with \(t_1<t_2\). We denote by \(\Delta t\) the positive quantity \(\Delta t:=t_2-t_1\) that we shall call exposure time. Thus, from (3), the output of a pixel sensor centered at x that integrates on the time interval \([t_1,t_2]\) is, in the noiseless case

$$\begin{aligned} \int _{t_1}^{t_2}\int _{x-\frac{1}{2}}^{x + \frac{1}{2}} (g*s)(y)\mathrm{d}y\mathrm{d}t=\int _{t_1}^{t_2} u(x)\mathrm{d}t. \end{aligned}$$
(4)

Note that the quantity in (4) is the amount of light measured by the pixel sensor, and it evolves linearly with the exposure time \(\Delta t~(=t_2-t_1)\).

We now extend the above formalism to cope with moving scenes. Without loss of generality (w.l.o.g.) we assume that the camera is steady while the scene \(s\) moves. The coded exposure method is designed to deal with uniform motions. Therefore, we assume that the scene \(s\) moves at a constant velocity \(v \in \mathbb {R}\) (measured in pixels per second) during the exposure time interval \([t_1,t_2]\). This means that the scene evolves with respect to time as \(s(x-vt)\). Here and in the sequel the temporal variable is denoted by t. Therefore, from (4) we deduce that the output of a pixel sensor centered at x and integrating on the time interval \([t_1,t_2]\) is, in the noiseless case,

$$\begin{aligned} \int _{t_1}^{t_2} \int _{x-\frac{1}{2}}^{x + \frac{1}{2}} (g*s)(y-vt)\mathrm{d}y\mathrm{d}t= \int _{t_1}^{t_2} u(x-vt)\mathrm{d}t. \end{aligned}$$
(5)

For example, suppose that we take a constant velocity \(v=1\) in (5). In this case, the output of a pixel sensor centered at x is, in the noiseless case,

$$\begin{aligned} \int _{t_1}^{t_2} \int _{x-\frac{1}{2}}^{x + \frac{1}{2}} (g*s)(y-t)\mathrm{d}y\mathrm{d}t =\left( \mathbb {1}_{[t_1,t_2]} *\mathbb {1}_{[-\frac{1}{2},\frac{1}{2}]}*g*s\right) (x)=\left( \mathbb {1}_{[t_1,t_2]}*u\right) (x), \end{aligned}$$

where \(\mathbb {1}_{[a,b]}\) represents the characteristic function of the interval [a, b] (see (3) for the last equality). From this simple example we can qualitatively describe where the exposure code will act. Indeed, by a clever exposure technique, the coded exposure method will allow us to replace the function \(\mathbb {1}_{[t_1,t_2]}\) in the above formula by a more general class of functions that need not be window functions. With the formalism of, e.g., [1, Eq. 1, Sect. 2], \(\mathbb {1}_{[t_1,t_2]}\) represents “\(h_{\text {motion}}\)” for a classic camera.

We now extend our model to cope with the Poisson photon (shot) noise; the additive readout noise will be added afterwards. The photon emission follows a Poisson distribution, see, e.g., [49] (if X is a random variable that follows a Poisson distribution then all the possible realizations of X are in \(\mathbb {N}\); in addition, the probability of the event \(X=k\) is \(\mathbb {P}(X=k)=\frac{\lambda ^k e^{-\lambda }}{k!}\), where \(\lambda >0\) is the intensity of the Poisson random variable). We assume that a pixel sensor behaves as a photon counter.Footnote 1 That is to say, we assume that a pixel sensor integrates the photons emitted by the moving observed scene \(s\) on some surface element over the time span \([t_1,t_2]\) and produces a sample. This sample follows a Poisson distribution. From (5), this means that the sample produced by a pixel sensor centered at x that integrates over the time span \([t_1,t_2]\) has law

$$\begin{aligned} \mathcal {P}\left( \int _{t_1}^{t_2} \int _{x-\frac{1}{2} }^{x+ \frac{1}{2}} (g*s)(y-vt)\mathrm{d}y\mathrm{d}t \right) , \end{aligned}$$
(6)

where the notation \(\mathcal {P}(\lambda )\) denotes a Poisson random variable of intensity \(\lambda \). With (3) the above equation can be rewritten as

$$\begin{aligned} \mathcal {P}\left( \int _{t_1}^{t_2} u(x-vt)\mathrm{d}t\right) . \end{aligned}$$
(7)

Thus, the value of this sample can be any realization of a Poisson random variable with intensity \(\int _{t_1}^{t_2} u(x-vt)\mathrm{d}t\). Consequently, the probability that the sample has value \(k\in \mathbb {N}\) when observing the scene \(s\) on the time span \([t_1,t_2]\) with the pixel sensor centered at x is

$$\begin{aligned} \frac{\left( \int _{t_1}^{t_2}u(x-vt)\mathrm{d}t \right) ^{k}\exp \left( - \int _{t_1}^{t_2} u(x-vt)\mathrm{d}t\right) }{k!}. \end{aligned}$$

This quantity is nothing but the probability that the pixel sensor counts \(k \in \mathbb {N}\) photons during the time interval \([t_1,t_2]\).
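As a quick numerical illustration of this law, the following minimal sketch evaluates these photon count probabilities for a hypothetical intensity \(\int _{t_1}^{t_2} u(x-vt)\mathrm{d}t=100\) (an arbitrary value chosen for illustration):

```python
import math

def photon_count_prob(k: int, intensity: float) -> float:
    """P(X = k) for X ~ Poisson(intensity): probability of counting k photons."""
    return intensity ** k * math.exp(-intensity) / math.factorial(k)

lam = 100.0  # hypothetical expected photon count over [t1, t2]
print(photon_count_prob(100, lam))                             # ~0.0399
print(sum(photon_count_prob(k, lam) for k in range(80, 121)))  # ~0.96: mass near the mean
```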

With the formalism we have introduced we can compute the SNR of the produced image, to verify that we retrieve the fundamental theorem of photography that underlies statements like “the capture SNR increases proportional to the square root of the exposure time,” which can be found in, e.g., [2, p. 1, column 2, 1st paragraph]. To this aim, consider the case where \(v=0\), \(t_1=0\), \(t_2=\Delta t\) in (7). If the observed value \(\mathrm{obs}(x)\) at position \(x \in \mathbb {R}\) follows \(\mathcal {P}\left( \int _0^{\Delta t} u(x)\mathrm{d}t \right) \) we have \(\mathbb {E}(\mathrm{obs}(x))=\Delta t u(x)\). This means that, in expectation, the number of photons caught by the pixel sensor centered at x increases linearly with the exposure time. If we time-normalize the obtained quantity and consider, formally, a random variable \(\mathbb {u}_\mathrm{est}(x)\) that follows \(\frac{\mathcal {P}\left( \int _0^{\Delta t} u(x)\mathrm{d}t \right) }{\Delta t}\) we obtain \(\mathbb {E}\left( \mathbb {u}_\mathrm{est}(x) \right) =u(x)\). This means that \(\mathbb {u}_\mathrm{est}(x)\) estimates u(x) without bias. In addition, we have \(\mathrm {var}\left( \mathbb {u}_\mathrm{est}(x) \right) =\frac{\Delta t u(x)}{ \Delta t^2}=\frac{u(x)}{\Delta t}.\)

Consider the SNR on the spatial interval \([-R,R]\) given by \(\frac{\frac{1}{2R}\int _{-R}^R \mathbb {E}\left( \mathbb {u}_\mathrm{est}(x)\right) \mathrm{d}x}{\sqrt{\frac{1}{2R}\int _{-R}^R \mathrm {var}\left( \mathbb {u}_\mathrm{est}(x)\right) \mathrm{d}x}}\). This definition of the SNR can be found in, e.g., [48, Eq. 1.39, p. 42], [46, Eq. 15, p. 4], [2, Eq. 1, p. 2562]. We have

$$\begin{aligned} \frac{\frac{1}{2R}\int _{-R}^R \mathbb {E}\left( \mathbb {u}_\mathrm{est}(x)\right) \mathrm{d}x}{\sqrt{\frac{1}{2R}\int _{-R}^R \mathrm {var}\left( \mathbb {u}_\mathrm{est}(x)\right) \mathrm{d}x}}=\frac{\frac{1}{2R}\int _{-R}^R u(x) \mathrm{d}x}{\sqrt{\frac{1}{2R}\int _{-R}^R \frac{u(x)}{\Delta t}\mathrm{d}x}}=\sqrt{\frac{\Delta t}{2R} \int _{-R}^{R} u(x)\mathrm{d}x}. \end{aligned}$$

Therefore, assuming that \(\mu \), the “mean signal level” [48, p. 42] (\(\mu \) relates to \(\bar{i}_0\) in, e.g., [2, Sect. 2]), defined by

$$\begin{aligned} \mathbb {R}^+ \ni \mu :=\lim _{R \rightarrow +\infty }\frac{1}{2R}\int _{-R}^R u(x)\mathrm{d}x \end{aligned}$$

is finite, we can define the SNR by

$$\begin{aligned} \text {SNR}(\mathbb {u}_\mathrm{est}):=\frac{\lim _{R\rightarrow +\infty }\frac{1}{2R}\int _{-R}^R \mathbb {E}\left( \mathbb {u}_\mathrm{est}(x)\right) \mathrm{d}x}{\sqrt{\lim _{R\rightarrow +\infty }\frac{1}{2R}\int _{-R}^R \mathrm {var}\left( \mathbb {u}_\mathrm{est}(x)\right) \mathrm{d}x}}. \end{aligned}$$

Thus, we have \(\text {SNR}(\mathbb {u}_\mathrm{est})=\sqrt{\mu \Delta t}\). For example, if the mean photon emission \(\mu \) doubles then the SNR is multiplied by a factor \(\sqrt{2}\) (and we retrieve the fundamental theorem of photography, namely \(\mathrm {SNR} \rightarrow + \infty \) when \(\Delta t\rightarrow +\infty \)). Note that if we have no control over the photon emission then the only sure way to increase the SNR with a given camera is to increase the exposure time \(\Delta t\). Similarly, we can define the MSE by

$$\begin{aligned} \text {MSE}(\mathbb {u}_\mathrm{est}):=\lim _{R\rightarrow +\infty } \frac{1}{2R}\int _{-R}^{R}\mathbb {E}\left( \left| \mathbb {u}_\mathrm{est}(x)-u(x)\right| ^2\right) \mathrm{d}x, \end{aligned}$$

whenever the limit exists, and we have \(\text {MSE}\left( \mathbb {u}_\mathrm{est}\right) =\frac{\mu }{\Delta t}\).
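Both identities are easy to verify by simulation. The minimal sketch below (the values of \(\mu \) and \(\Delta t\) are arbitrary choices) draws the time-normalized estimator for a constant scene \(u\equiv \mu \) and compares the empirical MSE and SNR with \(\mu /\Delta t\) and \(\sqrt{\mu \Delta t}\):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, dt = 50.0, 4.0  # hypothetical mean photon emission and exposure time
n = 200_000         # number of simulated pixels

# Each pixel counts Poisson(mu * dt) photons; time-normalization gives u_est.
u_est = rng.poisson(mu * dt, size=n) / dt

print(np.mean((u_est - mu) ** 2), mu / dt)               # empirical MSE vs mu / dt
print(np.mean(u_est) / np.std(u_est), np.sqrt(mu * dt))  # empirical SNR vs sqrt(mu dt)
```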

We are now in a position to extend our model to include the additive (readout) noise. Here and in the sequel the additive (readout) noise of a pixel sensor centered at x is modeled by a zero-mean real random variable of finite variance denoted by \(\eta (x)\). Therefore, from (6), the output of the pixel sensor centered at x that integrates photons on the time span \([t_1,t_2]\) can be, formally, any realization of the sum of the random variables

$$\begin{aligned} \mathcal {P}\left( \int _{t_1}^{t_2} \int _{x-\frac{1}{2}}^{x+\frac{1}{2}} (g*s)(y-vt)\mathrm{d}y\mathrm{d}t \right) +\eta (x), \end{aligned}$$
(8)

or equivalently (see (3)),

$$\begin{aligned} \mathcal {P}\left( \int _{t_1}^{t_2} u(x-vt)\mathrm{d}t \right) +\eta (x). \end{aligned}$$
(9)

Recall that the deterministic quantity u(x) represents the gray level of the image at position x, as seen by a pixel sensor centered at x, if there were no noise and no motion. The quantity \(\int _{t_1}^{t_2} u(x-vt)\mathrm{d}t\) represents the amount of light received on the time interval \([t_1,t_2]\) by a steady pixel sensor centered at x that gathers the light emitted by the observed scene moving at velocity \(v\in \mathbb {R}\).

We now give a mathematical framework to make precise the above formulae. We shall assume that the scene \(s\in L^1_{loc}(\mathbb {R})\) so that the convolution in (3) is well defined. We shall assume that the PSF \(g\) belongs to the Schwartz class which, hereinafter, we shall denote \(S(\mathbb {R})\). In addition, we shall assume that the PSF \(g\in S(\mathbb {R})\) furnishes a cutoff frequency. This assumption is needed by the Shannon–Whittaker sampling theory. We shall assume that the frequency cutoff of \(g\) is \(\pi \), i.e., \(g\) is \([-\pi ,\pi ]\) band limited. In other words, \(\hat{g}(\xi )=0\) for any \(\xi \in \mathbb {R}\) such that \(|\xi |>\pi \), where, here and in the sequel, we denote by \(\hat{g}\) or \(\mathcal {F}(g)\) the Fourier transform of \(g\) [see (xiv) for the definition of the Fourier transform] and (here and elsewhere) \(\xi \in \mathbb {R}\) represents the (Fourier) frequency coordinate. One could think that this \([-\pi ,\pi ]\) band limitation restricts the theory. However, it does not. The choice of \([-\pi ,\pi ]\) in the following definition is thoroughly justified in Appendix B.

Definition 1

(The observable scene u) We call observable scene any non-negative deterministic function u of the form \( u=\mathbb {1}_{[-\frac{1}{2},\frac{1}{2}]} *g*s.\) Recall that \(\mathbb {1}_{[-\frac{1}{2},\frac{1}{2}]}\) denotes the characteristic function of the interval \([-\frac{1}{2},\frac{1}{2}]\) and is related to the normalized pixel sensor. The PSF satisfies \(g\in S(\mathbb {R})\), \(g\ge 0\), and is \([-\pi ,\pi ]\) band limited. The (non-negative) photon emission intensity is denoted \(s\in L^1_{loc}(\mathbb {R})\). We have that \(u\in L_{loc}^1(\mathbb {R})\) and we assume that u satisfies \(\mu :=\lim _{R\rightarrow +\infty }\frac{1}{2R}\int _{-R}^{R} u(x)\mathrm{d}x \in \mathbb {R}^+\). In addition, we assume that \(\tilde{u}:=u-\mu \in L^1(\mathbb {R}) \cap L^2(\mathbb {R})\).

Note that u is the sum of the constant \(\mu \) and of \(\tilde{u} \in L^1(\mathbb {R})\). Thus, we have \(u\in S'(\mathbb {R})\) (the space of tempered distributions). This means that u enjoys a Fourier transform in \(S'(\mathbb {R})\), see, e.g., [50, p. 173], see also [51, p. 23]. In addition, u and \(\tilde{u}\) inherit the frequency cutoff of the PSF \(g\). Therefore, u and \(\tilde{u}\) are \([-\pi ,\pi ]\) band limited. Note also that the assumption \(\tilde{u} \in L^2(\mathbb {R}) \) entails no loss of generality. Indeed, since \(\tilde{u} \in L^1(\mathbb {R})\), from the Riemann–Lebesgue theorem (see, e.g., [52, Proposition 2.1]) we have that \(\hat{\tilde{u}}\) is continuous. In addition, since \(\tilde{u}\) is \([-\pi ,\pi ]\) band limited, \(\hat{\tilde{u}}\) is continuous with compact support. We deduce that \(\hat{\tilde{u}}\in L^2(\mathbb {R})\), hence \(\tilde{u} \in L^2(\mathbb {R})\): the assumption indeed holds w.l.o.g.

We can now give a definition of the observed sample at a pixel centered at \(x \in \mathbb {R}\) that we shall denote \(\mathrm{obs}(x)\).

Definition 2

(Observed sample of a pixel, including any additive (sensor readout) noise of finite variance in addition to the Poisson photon (shot) noise) We assume that the observed sample produced by a unit pixel sensor centered at \(x \in \mathbb {R}\) is corrupted by an additive noise \(\eta (x)\) that we shall call readout noise. We assume that \(\mathbb {E}(\eta (x))=0\) and that \(\mathrm {var}(\eta (x))=\sigma _r^2<+\infty \). Hereinafter, we shall denote this observed sample by \(\mathrm{obs}(x)\). From (9), we have that \(\mathrm{obs}(x)\) satisfies, for any \(x \in \mathbb {R}\),

$$\begin{aligned} \mathrm{obs}(x) \sim \mathcal {P}\left( \int _{t_1}^{t_2} u(x-vt) \mathrm{d}t\right) +\eta (x), \end{aligned}$$
(10)

where \([t_1,t_2]\) is the exposure time interval, the observable scene u is defined by Definition 1 and moves at velocity \(v\in \mathbb {R}\). The notation \(X\sim Y\) means that the random variables X and Y have the same law.

In the sequel we will need to compute MSEs as well as SNRs. Therefore, we will need to compute expected values and variances of the observed samples. Thus, we need to justify the validity of these operations. This is done in Appendix C.

Definition 2 entails that \(\mathrm{obs}(x)\), the observed sample of a pixel sensor centered at position x, is a measurable function (a random variable, see, e.g., [53, p. 168]) for which it is mathematically possible to compute, e.g., the expectation and the variance.
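A minimal simulation of Definition 2 can be sketched as follows. The scene u below is a hypothetical positive function, the readout noise is taken Gaussian as one possible zero-mean finite-variance choice, and the Poisson intensity is approximated by the trapezoidal rule:

```python
import numpy as np

rng = np.random.default_rng(1)

def observe(u, v, t1, t2, sigma_r, n_pixels, n_t=1000):
    """Simulate obs(x) of Definition 2 on the pixels x = 0, ..., n_pixels - 1."""
    t = np.linspace(t1, t2, n_t)
    x = np.arange(n_pixels)
    # intensity(x) ~ int_{t1}^{t2} u(x - v t) dt, by the trapezoidal rule
    intensity = np.trapz(u(x[:, None] - v * t[None, :]), t, axis=1)
    return rng.poisson(intensity) + rng.normal(0.0, sigma_r, size=n_pixels)

u = lambda x: 100.0 + 30.0 * np.sin(0.5 * x)  # hypothetical (smooth, positive) scene
samples = observe(u, v=1.0, t1=0.0, t2=0.5, sigma_r=2.0, n_pixels=64)
```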

The images produced by a digital camera are discrete. In addition, the image obtained by a coded exposure camera must undergo a deconvolution to yield the final crisp image. The calculation of the adequate deconvolution filter requires a continuous model. Thus, we now turn to sampling and interpolation in order to go comfortably from the discrete observations to the latent continuous image.

2.2 Sampling and interpolation

This section recalls the principles of the Shannon–Whittaker interpolation that applies to, e.g., images that have the band-limitedness structure. Consider a \([-\pi ,\pi ]\) band-limited deterministic function \(f\in L^1(\mathbb {R})\cap L^2(\mathbb {R})\). From the values f(n) for \(n \in \mathbb {Z}\) the Shannon–Whittaker interpolation of f is

$$\begin{aligned} f(x)=\sum _{n=-\infty }^{+\infty } f(n)\ \mathrm {sinc}\ (x-n), \end{aligned}$$

and the above series converges uniformly to f(x) for any \(x \in \mathbb {R}\) (see, e.g., [40, p. 354]).

We recall that Appendix B proves that it is no loss to assume that u is \([-\pi ,\pi ]\) band limited. However, the sample \(\mathrm{obs}(x)\) defined in Definition 2 and produced by the sensor is noisy. Indeed, the sample \(\mathrm{obs}(x)\) contains the Poisson photon (shot) noise and the additive sensor readout noise. This means that \( \mathbb {R}\ni x \mapsto \mathrm{obs}(x)\) is not a deterministic function and that \(\mathrm{obs}\) does not belong to any Lebesgue space. The Shannon–Whittaker theorem is usually applied to deterministic functions. Some generalizations exist for the case where the observed samples are corrupted by an additive noise, see, e.g., [54, p. 111], or for sampling wide-sense stationary stochastic signals, see, e.g., [54, p. 148]. However, the Poisson photon shot noise is not additive. Therefore, the first generalization is not applicable. In addition, from Definition 1 we deduce that the autocorrelation function \(\mathbb {E}\left( \mathrm{obs}(x) \mathrm{obs}(y) \right) \) is not a function of the variable \(x-y\). This means that the samples of a coded exposure camera cannot be seen as the samples of a wide-sense stationary stochastic process (see, e.g., [55, p. 17] for the definition). In addition, the sensor itself can introduce non-stationary noise, see, e.g., [47]. Thus, to the best of our knowledge, the existing generalizations of the Shannon–Whittaker theorem are not sufficient to treat the observed samples of a coded exposure camera (defined in Definition 2). Consequently, in the sequel, we shall carefully prove that

$$\begin{aligned} \mathrm{obs}(x)=\sum _{n=-\infty }^{+\infty } \mathrm{obs}(n)\ \mathrm {sinc}\ (x-n) \end{aligned}$$
(11)

is mathematically feasible for the \(\mathrm{obs}\) defined in Definition 2.

Therefore, in the sequel, we assume that the observed samples are obtained from a sensor array and that the sensor array is designed according to the Shannon–Whittaker sampling theory. Thus, we assume that the samples \(\mathrm{obs}(x)\) are obtained at a unit rate, i.e., for \(x \in \mathbb {Z}\). Consequently, we shall denote the observed samples by \(\mathrm{obs}(n)\). This means that, in the sequel, we shall neglect the boundary effects due to the deconvolution. This is another way to get rid of the boundary effects without assuming that the observed scene is periodic, as required by linear algebra based models (with Toeplitz matrices), see, e.g., [1–3, 5, 12, 25, 29, 30] (this matters because most natural scenes are not periodic). Note that this slightly overestimates the gain of the coded exposure method with respect to the snapshot. Indeed, the support of the coded exposure function is larger than the support of the exposure function of a snapshot. This means that in practice the boundary artifacts due to the deconvolution are stronger with the coded exposure method.

Hereinafter, we assume that the random variables \((\eta (n))_{n \in \mathbb {Z}}\) are mutually independent, identically distributed, and independent from the shot noise, i.e., independent from \(\mathcal {P}\left( \int _{t_1}^{t_2} u(n-vt) \mathrm{d}t\right) \) for \(n \in \mathbb {Z}\). This independence assumption represents no limitation for the model. Indeed, a photon can only be sensed once. In addition, the additive (sensor readout) noise comes from an inaccurate reading of the sample value, which does not depend on the light intensity emission or on the Poisson photon (shot) noise.

We have defined the observed samples produced by a light sensor in Definition 2. This definition includes both the Poisson photon (shot) noise and an additive (readout) noise of finite variance. We now turn to the mathematical formalization of the coded exposure method.

3 A mathematical model of coded exposure camera that includes any additive (sensor readout) noise of finite variance in addition to the Poisson photon (shot) noise

The goal of this section is to formalize the coded exposure method. In this section, we consider invertible “exposure codes” and provide the MSE and SNR of these exposure strategies. The study yields Theorem 3.4.

The coded exposure (flutter shutter) method permits modulating, with respect to time, the photon flux caught by the sensor array. Indeed, the Agrawal and Raskar coded exposure method [1–6] consists in opening/closing the camera shutter on sub-intervals of the exposure time. In such a situation the exposure function that controls when the shutter is open or closed is binary and piecewise constant. Since it is piecewise constant it is possible to encode this function by an “exposure code.” (We give a mathematical definition of these objects below.)

Note that neither the model nor the results of [30] can be used in this paper. Indeed, in [30] the additive (sensor readout) noise is neglected. Therefore, the formalism of [30] does not hold in the more elaborate setup that we shall consider here. Indeed, this paper considers any additive sensor readout noise of finite variance in addition to the Poisson photon (shot) noise.

As we have seen, in their seminal work [1–6], Agrawal and Raskar propose to use binary exposure codes. Yet, mathematically, one could envisage smoother exposure codes that are not binary. Indeed, with a larger search space for the exposure code the MSE and SNR can be expected to be better than with the smaller set of binary codes. Therefore, in the sequel, we shall assume that the exposure codes have values in [0, 1]. The value 0 means that the shutter is closed while the value 1 means the shutter is open and, e.g., \(\frac{1}{2}\) means that half of the photons are allowed to reach the sensor. We do not consider the practical feasibility of these non-binary exposure codes as this is outside the scope of this paper, which proposes a mathematical framework and formulae.

We first formalize the fact that the coded exposure method temporally modulates the flux of photons that are allowed to reach the sensor, by giving an adequate definition of an “exposure function” that, hereinafter, we shall denote \(\alpha \). To be precise, the gain \(\alpha (t)\) at time t is defined as the proportion of photons that are allowed to travel to the sensor. We then give the formula of the observed samples taking the exposure function into account (see Definition 4).

Definition 3

(Exposure function, exposure code) We call exposure function any function \(\alpha \) of the form

$$\begin{aligned} \alpha : \mathbb {R} \longrightarrow [0,1], \qquad t \longmapsto \sum \limits _{k=-\infty }^{+\infty } \alpha _k \mathbb {1}_{[k\Delta t,(k+1)\Delta t)}(t). \end{aligned}$$
(12)

We assume that \(\alpha _k \in [0,1]\) for any k, that \((\alpha _k)_k \in \ell ^1(\mathbb {Z})\) and that \(\Delta t>0\). The sequence \((\alpha _k)_k\) is called the exposure code.

Remark

It is easy to see that \(\alpha \in L^1(\mathbb {R}) \cap L^2(\mathbb {R}) \cap L^\infty (\mathbb {R})\) and that the above definition can cope with finitely supported codes, e.g., the Agrawal and Raskar code [5, p. 5] and patent application [6, p. 5].
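Because \(\alpha \) is piecewise constant, its Fourier transform has a closed form: with the convention \(\hat{f}(\xi )=\int f(t)e^{-i\xi t}\mathrm{d}t\) (the one consistent with (16) below), \(\hat{\alpha }(\xi )=\Delta t \ \mathrm {sinc}\ \left( \frac{\xi \Delta t}{2\pi }\right) \sum _k \alpha _k e^{-i\xi (k+\frac{1}{2})\Delta t}\). A minimal sketch evaluating this closed form for a finitely supported binary code (a random stand-in code for illustration, not the code of [5, 6]):

```python
import numpy as np

def alpha_hat(xi, code, dt):
    """Closed-form Fourier transform of the exposure function of Definition 3.

    Sums F(1_{[k dt, (k+1) dt)})(xi) = dt sinc(xi dt / 2 pi) exp(-i xi (k + 1/2) dt)
    over the exposure code; np.sinc(x) is the normalized sin(pi x) / (pi x).
    """
    xi = np.atleast_1d(np.asarray(xi, dtype=float))
    k = np.arange(len(code))
    phases = np.exp(-1j * np.outer(xi, (k + 0.5) * dt))
    return dt * np.sinc(xi * dt / (2.0 * np.pi)) * (phases @ np.asarray(code, dtype=float))

code = np.random.default_rng(2).integers(0, 2, size=52)  # stand-in binary code
xi = np.linspace(-np.pi, np.pi, 513)
print(np.abs(alpha_hat(xi, code, dt=1.0)).min())  # > 0 means invertible for v = 1
```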

We have defined the exposure function that controls with respect to the time the camera shutter. We now give the formula of the observed samples of a coded exposure camera.

Let \(\alpha \) be an exposure function and \(u(x-vt)\) a scene moving at velocity \(v \in \mathbb {R}\). Recall that \(\alpha (t)\) is nothing but the proportion of photons allowed to reach the sensor at time t. Therefore, from Definition 2, we deduce that \(\mathrm{obs}(n)\), the observed sample at a position \(n \in \mathbb {Z}\), is a random variable that satisfies, for any \(n \in \mathbb {Z}\),

$$\begin{aligned} \mathrm{obs}(n) \sim \mathcal {P}\left( \int _{- \infty }^{ \infty } \alpha (t) u(n-vt) \mathrm{d}t \right) + \eta (n) \sim \mathcal {P}\left( \left( \frac{1}{|v|} \alpha \left( \frac{\cdot }{v}\right) *u\right) (n) \right) + \eta (n). \end{aligned}$$

This yields

Definition 4

(Observed samples of a coded exposure camera) Let \(\alpha \) be an exposure function. We call observed sample at position n of the scene u (defined in Definition 1), moving at velocity \(v \in \mathbb {R}\), the random variable

$$\begin{aligned} \mathrm{obs}(n) \sim \mathcal {P} \left( \left( \frac{1}{|v|}\alpha \left( \frac{\cdot }{v}\right) *u\right) (n) \right) + \eta (n) . \end{aligned}$$
(13)

Recall that the random variables \(\mathrm{obs}(n)\), \(n \in \mathbb {Z}\), are mutually independent. From Definition 1 we have that u is of the form \(L^1(\mathbb {R})\) plus a constant and is band limited. From Definition 3 we have that \(\alpha \in L^1(\mathbb {R})\). We obtain that the convolution in (13) is well defined everywhere. In addition, note that the pixels are read only once as in, e.g., [1–6]. (Only one image is observed, stored, and transmitted.)
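Continuing the sketches above (same hypothetical scene as in Sect. 2.1 and the stand-in code of the previous sketch, with \(\Delta t=1\) s and \(v=1\) pixel/s), the observed samples (13) can be simulated with a Riemann sum for the intensity \(\int \alpha (t) u(n-vt)\mathrm{d}t\):

```python
import numpy as np

rng = np.random.default_rng(3)

def observe_coded(u, code, dt, v, sigma_r, n_pixels, oversample=32):
    """Simulate Definition 4: obs(n) ~ P(int alpha(t) u(n - v t) dt) + eta(n)."""
    # Sample the piecewise-constant exposure function on a fine temporal grid.
    t = (np.arange(len(code) * oversample) + 0.5) * dt / oversample
    a = np.repeat(np.asarray(code, dtype=float), oversample)
    n = np.arange(n_pixels)
    intensity = (a[None, :] * u(n[:, None] - v * t[None, :])).sum(axis=1) * dt / oversample
    return rng.poisson(intensity) + rng.normal(0.0, sigma_r, size=n_pixels)

u = lambda x: 100.0 + 30.0 * np.sin(0.5 * x)  # hypothetical scene, as in Sect. 2.1
obs = observe_coded(u, code, dt=1.0, v=1.0, sigma_r=10.0, n_pixels=128)
```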

Proposition 3.1

Let \(\mathrm{obs}\) be as in Definition 4. For any \(n \in \mathbb {Z}\), we have

$$\begin{aligned} \mathbb {E}\left( \mathrm{obs}(n)\right) =\left( \frac{1}{|v|} \alpha \left( \frac{\cdot }{v}\right) *u\right) (n) \mathrm {\quad and \quad } \mathrm {var}\left( \mathrm{obs}(n)\right) = \left( \frac{1}{|v|}\alpha \left( \frac{\cdot }{v}\right) *u\right) (n) +\sigma ^2_r.\nonumber \\ \end{aligned}$$
(14)

Proof

The proof is a direct consequence of Definition 4. \(\square \)

Remark

(The motion blur of a standard camera is not invertible as soon as its support exceeds two pixels) A standard camera can be seen as a coded exposure camera whose exposure function \(\alpha \) is of the form \(\alpha =\mathbb {1}_{[0,\Delta t]}\), where \(\Delta t>0\) is the exposure time measured in second(s). Consider the idealistic noiseless case where, from (13), one would observe

$$\begin{aligned} \mathbb {E}\left( \mathrm{obs}(n)\right) =\left( \frac{1}{|v|}\mathbb {1}_{[0,\Delta t]} \left( \frac{\cdot }{v}\right) *u \right) (n). \end{aligned}$$
(15)

From Definition 1, u is \([-\pi ,\pi ]\) band limited. Therefore, we deduce that the convolution in (15) is non-invertible as soon as the Fourier transform of \(\frac{1}{|v|}\mathbb {1}_{[0,\Delta t]} \left( \frac{\cdot }{v}\right) \) vanishes somewhere on \([-\pi ,\pi ]\). For any \(\xi \in \mathbb {R}\), we have \(\mathcal {F}\left( \frac{1}{|v|}\mathbb {1}_{[0,\Delta t]} \left( \frac{\cdot }{v}\right) \right) (\xi )=\mathcal {F}\left( \mathbb {1}_{[0,\Delta t]}\right) (\xi v)\). In addition, from the definition of the Fourier transform (xiv), for any \(\xi \in \mathbb {R}\) we have

$$\begin{aligned} \mathcal {F}\left( \mathbb {1}_{[0,\Delta t]} \right) (\xi )=\Delta t \ \mathrm {sinc}\ \left( \frac{\xi \Delta t}{2\pi }\right) e^{\frac{-i\xi \Delta t}{2}}. \end{aligned}$$
(16)

Therefore, for any \(\xi \in \mathbb {R}\), we have \(\mathcal {F}\left( \frac{1}{|v|}\mathbb {1}_{[0,\Delta t]} \left( \frac{\cdot }{v}\right) \right) (\xi )= \Delta t \ \mathrm {sinc}\ \left( \frac{\xi v \Delta t}{2\pi }\right) e^{\frac{-i\xi v\Delta t}{2}}.\) From the definition (xvi) of the sinc function, we deduce that the convolution in (15) is not invertible as soon as the blur support \(|v| \Delta t\) satisfies \(|v| \Delta t \ge 2\). Since the velocity v is measured in pixel(s) per second and the exposure time \(\Delta t\) is measured in second(s), we deduce that as soon as the blur support \(|v|\Delta t\) exceeds two pixels the motion blur of a standard camera is not invertible.
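This threshold is easy to check numerically: evaluating \(\Delta t\,|\ \mathrm {sinc}\ (\frac{\xi v \Delta t}{2\pi })|\) on \([-\pi ,\pi ]\) shows the first zero entering the band exactly when \(|v|\Delta t\) reaches 2 (a minimal sketch; the factor \(\Delta t\) is dropped since it does not affect the zeros):

```python
import numpy as np

xi = np.linspace(-np.pi, np.pi, 4097)
for vdt in (1.0, 1.9, 2.0, 2.5):  # blur support |v| * dt, in pixels
    modulus = np.abs(np.sinc(xi * vdt / (2.0 * np.pi)))
    print(f"|v|dt = {vdt}: min |FT| over [-pi, pi] ~ {modulus.min():.4f}")
# |v|dt < 2: the modulus stays positive; |v|dt >= 2: a zero of sinc enters the band.
```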

The observed samples of any coded exposure camera are formalized in Definition 4. We wish to compute the MSE and SNR of a deconvolved crisp image with respect to the continuous observable scene u. To this aim a continuous deconvolved crisp signal \(\mathbb {u}_\mathrm{est}\) must be defined from the samples \(\mathrm{obs}(n)\) observed for \(n \in \mathbb {Z}\). Thus, we (1) prove the mathematical feasibility of the Shannon–Whittaker interpolation “\(\mathrm{obs}(x)=\sum _{n \in \mathbb {Z}} \mathrm{obs}(n) \ \mathrm {sinc}\ (x-n)\)”, (2) deduce the conditions on the exposure function \(\alpha \) for the existence of an inverse filter \(\gamma \) that deconvolves the observed samples, (3) define the final crisp image \(\mathbb {u}_\mathrm{est}\) and (4) give the formulae for the MSE and SNR of \(\mathbb {u}_\mathrm{est}\). The study yields Theorem 3.4.

The mathematical feasibility of the Shannon–Whittaker interpolation is formalized by

Proposition 3.2

(Mathematical feasibility of the Shannon–Whittaker interpolation of the observed samples \(\mathrm{obs}(n)\), \(n\in \mathbb {Z}\)) Let \(\mathrm{obs}\) be as in Definition 4. For any \(x \in \mathbb {R}\), the series

$$\begin{aligned} \mathrm{obs}(x)=\sum _{n=-\infty }^{+\infty } \mathrm{obs}(n)\ \mathrm {sinc}\ (x-n) \end{aligned}$$
(17)

converges in quadratic mean. In addition, for any \(x \in \mathbb {R}\), we have

$$\begin{aligned}&\mathbb {E} \left( \mathrm{obs}(x)\right) =\left( \frac{1}{|v|}\alpha \left( \frac{\cdot }{v}\right) *u\right) (x);\end{aligned}$$
(18)
$$\begin{aligned}&\mathrm {var}\left( \mathrm{obs}(x)\right) =\sum _{n=-\infty }^{+\infty }\left[ \left( \frac{1}{|v|}\alpha \left( \frac{\cdot }{v}\right) *u\right) (n) ~\ \mathrm {sinc}\ ^2(x-n)\right] +\sigma _r^2<+\infty .\qquad \quad \end{aligned}$$
(19)

Proof

See Appendix D. \(\square \)
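A truncated version of the series (17) is straightforward to evaluate numerically. A minimal sketch with toy Poisson samples (truncation to finitely many terms approximates the quadratic-mean limit):

```python
import numpy as np

def sinc_interpolate(samples, x):
    """Truncated Shannon-Whittaker series (17): obs(x) ~ sum_n obs(n) sinc(x - n).

    samples[n] holds obs(n) for n = 0, ..., N - 1; np.sinc(t) = sin(pi t) / (pi t)
    matches the [-pi, pi] band limit with unit-rate samples.
    """
    n = np.arange(len(samples))
    return np.sinc(np.subtract.outer(np.atleast_1d(x), n)) @ samples

toy = np.random.default_rng(4).poisson(100.0, size=256).astype(float)  # toy obs(n)
mid_values = sinc_interpolate(toy, np.arange(0.5, 255.5))  # at half-integer positions
```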

We now treat step 2. We cannot resort to a Wiener filter to define \(\gamma \). Indeed, due to the Poisson photon (shot) noise, the noise in our observations \(\mathrm{obs}(n)\) defined in Definition 4 is not additive. Therefore, the Wiener filter is not defined (see, e.g., [56, p. 205], [55, p. 95], [57, p. 159]; see also [58, p. 252] for a definition). Instead of using a Wiener filter we shall propose a filter designed so that the restored crisp image \(\mathbb {u}_\mathrm{est}\) is unbiased. This is also the setup considered in, e.g., [1, Sect. 3.1, p. 6]. We now provide the condition under which an inverse filter \(\gamma \) yields an unbiased restored crisp image \(\mathbb {u}_\mathrm{est}\).

If the exposure function \(\alpha \) defined in Definition 3 satisfies \(\hat{\alpha }(\xi v)= 0\) for some \(\xi \in [-\pi ,\pi ]\), the convolution in (18) is not invertible and some information is destroyed. Therefore, it is no longer possible to retrieve an arbitrary observed scene u (in a discrete setting, this would mean that the Toeplitz matrix associated with the convolution kernel is not invertible). Thus, if \(\hat{\alpha }(\xi v)= 0\) for some \(\xi \in [-\pi ,\pi ]\) there exists no inverse filter \(\gamma \) capable of giving back an arbitrary observed scene u. Hence, we assume that the exposure function \(\alpha \) satisfies \(\hat{\alpha }(\xi v)\ne 0\) for any \(\xi \in [-\pi ,\pi ]\). Under that condition the convolution \(\left( \frac{1}{|v|} \alpha \left( \frac{\cdot }{v}\right) \right) *u\) in (18) is invertible because u is \([-\pi ,\pi ]\) band limited. Therefore, we have

Definition 5

(Admissible \(\alpha \) and definition of the inverse filter \(\gamma \)) Let \(\alpha \) be as in Definition 3. If \(\alpha \) satisfies \(\hat{\alpha }(\xi v)\ne 0\) for any \(\xi \in [-\pi ,\pi ]\) then the inverse filter \(\gamma \), which deconvolves the observed samples (and will be proved to give back a crisp image), exists and is defined by [its inverse Fourier transform (xiv)]

$$\begin{aligned} \gamma (x):=\mathcal F^{-1}\left( \frac{\mathbb {1}_{[-\pi ,\pi ]}(\xi )}{\hat{\alpha }(\xi v)}\right) (x). \end{aligned}$$
(20)

Remark

From Definition 5, we deduce that \(\mathbb {R}\ni \xi \mapsto \hat{\gamma }(\xi )\) is bounded and has compact support. Hence, we have \(\hat{\gamma }\in L^1(\mathbb {R}) \cap L^2(\mathbb {R})\) and therefore \(\gamma \in L^2(\mathbb {R})\). In addition, from the Riemann–Lebesgue theorem we have that \(\gamma \) is continuous and bounded. Consequently, \(\gamma \) is \([-\pi ,\pi ]\) band limited, \(C^{\infty }(\mathbb {R})\), bounded, and belongs to \( L^2(\mathbb {R})\).
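For numerical experiments, \(\gamma \) can be applied in the Fourier domain. The sketch below is a discrete, periodized stand-in for (20): the DFT of the samples is divided by \(\hat{\alpha }(\xi v)\) on the DFT frequency grid (it reuses alpha_hat, obs, and code from the sketches above, and it ignores the boundary effects discussed in Sect. 2.2):

```python
import numpy as np

def deconvolve(obs, code, dt, v):
    """Frequency-domain stand-in for the inverse filter gamma of Definition 5.

    hat{gamma}(xi) = 1 / hat{alpha}(xi v) is evaluated on the DFT grid, so the
    continuous deconvolution is approximated by a circular one.
    """
    xi = 2.0 * np.pi * np.fft.fftfreq(len(obs))  # DFT frequencies in [-pi, pi)
    g_hat = 1.0 / alpha_hat(xi * v, code, dt)    # requires hat{alpha}(xi v) != 0
    return np.real(np.fft.ifft(np.fft.fft(obs) * g_hat))

u_est = deconvolve(obs, code, dt=1.0, v=1.0)  # crisp estimate at integer pixels
```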

We now treat step 3. The mathematical feasibility of the deconvolved crisp signal \(\mathbb {u}_\mathrm{est}\) is formalized by

Proposition 3.3

(Validity/existence of the crisp deconvolved image \(\mathbb {u}_\mathrm{est}\)) Let \(\mathrm{obs}\) be as in Definition 4 and \(\alpha \), \(\gamma \) be as in Definition 5. For any \(x \in \mathbb {R}\), the series

$$\begin{aligned} \mathbb {u}_\mathrm{est}(x):=\sum _{n=-\infty }^\infty \mathrm{obs}(n)\gamma (x-n) \end{aligned}$$
(21)

converges in quadratic mean. In addition, for any \(x \in \mathbb {R}\), we have

$$\begin{aligned}&\mathbb {E}\left( \mathbb {u}_\mathrm{est}(x)\right) = u(x);\end{aligned}$$
(22)
$$\begin{aligned}&\mathrm {var}\left( \mathbb {u}_\mathrm{est}(x)\right) =\sum _{n=-\infty }^\infty \mathrm {var}\left( \mathrm{obs}(n)\right) (\gamma (x-n))^2<+\infty . \end{aligned}$$
(23)

This proposition means that \(\mathbb {u}_\mathrm{est}\) is an unbiased estimator of the observable scene u.

Proof

See Appendix F. \(\square \)

We now treat step 4. We have

Theorem 3.4

(MSE and SNR of the coded exposure method) Let \(\mathbb {u}_\mathrm{est}\) be as in Proposition 3.3. Consider a scene \(u(x-vt)\) (see Definition 1) that moves at velocity \(v\in \mathbb {R}\) and let \(\sigma _r^2\) be the (finite) variance of the additive (readout) noise.

The MSE and SNR of the final crisp image \(\mathbb {u}_\mathrm{est}\) satisfy

$$\begin{aligned} \mathrm{MSE}_{\mathrm{coded exp.}}(\alpha ):= & {} \lim _{R\rightarrow +\infty } \frac{1}{2R} \int _{-R}^R \mathbb {E}\left( \left| \mathbb {u}_\mathrm{est}(x) -u(x) \right| ^2\right) \mathrm{d}x \nonumber \\= & {} \frac{1}{2\pi }\int _{-\pi }^\pi \frac{\mu \Vert \alpha \Vert _{L^1(\mathbb {R})} + \sigma ^2_r}{\left| \hat{\alpha }(\xi v )\right| ^2} \mathrm{d}\xi ; \end{aligned}$$
(24)
$$\begin{aligned} \mathrm{SNR}_{\mathrm{coded exp.}}(\alpha ):= & {} \frac{\lim _{R\rightarrow + \infty } \frac{1}{2R} \int _{-R}^{R} \mathbb {E}\left( \mathbb {u}_\mathrm{est}(x) \right) \mathrm{d}x}{\sqrt{\lim _{R \rightarrow +\infty } \frac{1}{2R} \int _{-R}^{R} \mathrm {var}\left( \mathbb {u}_\mathrm{est}(x) \right) \mathrm{d}x}} = \frac{\mu }{\sqrt{\frac{1}{2\pi }\int _{-\pi }^\pi \frac{\mu \Vert \alpha \Vert _{L^1(\mathbb {R})} + \sigma ^2_r}{\left| \hat{\alpha }(\xi v)\right| ^2}\mathrm{d}\xi }}.\nonumber \\ \end{aligned}$$
(25)

Proof

See Appendix I. \(\square \)

We now connect the formulae in Theorem 3.4 with the existing literature on the coded exposure method. The mean photon emission \(\mu \) relates to \(\bar{i}_0\) in, e.g., [2, Sect. 2]. In addition, from (25), we have that for a fixed exposure function \(\alpha \) and additive (readout) noise variance \(\sigma _r^2\) the SNR evolves proportionally to \(\sqrt{\frac{\mu }{1 +\frac{\sigma _r^2}{\mu }}}\) with the mean photon emission \(\mu \). In particular, from (25), if \(\sigma _r^2=0\) and for a fixed \(\alpha \) we deduce that the SNR evolves proportionally to \(\sqrt{\mu }\), and we retrieve the fundamental theorem of photography. Note that it is equivalent to minimize (24) or to maximize (25) with respect to the exposure function \(\alpha \). Therefore, in the sequel we choose, w.l.o.g., to use formula (24) and to evaluate the performance of the coded exposure method in terms of MSE. The calculation for the SNR can be immediately deduced. As an easy application of Theorem 3.4, we have the following corollary, which provides the MSE of any invertible snapshot, i.e., one that satisfies \(|v|\Delta t<2\) (see Sect. 5), where \(\Delta t\) is the exposure time. This corollary will also be needed to compare the coded exposure method and the snapshot, in terms of MSE, in Sect. 4.

Corollary 3.5

(MSE of a snapshot with an exposure time of \(\Delta t\)) Let \(\mathbb {u}_\mathrm{est}\), v, \(u(x-vt)\), and \(\sigma _r^2\) be as in Theorem 3.4 and \(\Delta t\) be such that \(|v|\Delta t <2\). The MSE of a snapshot with exposure time \(\Delta t\) is

$$\begin{aligned} \mathrm{MSE}_{\mathrm{snap.}}(\Delta t)= & {} \lim _{R\rightarrow +\infty } \frac{1}{2R} \int _{-R}^R \mathbb {E}\left( \left| \mathbb {u}_\mathrm{est}(x) -u(x) \right| ^2\right) \mathrm{d}x\nonumber \\= & {} \frac{1}{2\pi }\int _{-\pi }^\pi \frac{\mu \Delta t + \sigma ^2_r}{\left| \Delta t\ \mathrm {sinc}\ \left( \frac{\xi v \Delta t}{2\pi }\right) \right| ^2} \mathrm{d}\xi . \end{aligned}$$
(26)

Proof

The proof is immediate combining (16) and (24). \(\square \)
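Both (24) and (26) are one-dimensional integrals and can be evaluated by quadrature. A sketch, reusing alpha_hat and the stand-in code from the sketches of Sect. 3 (the values of \(\mu \), \(\sigma _r^2\), and v echo the orders of magnitude of Fig. 3, so the numbers are illustrative only and do not reproduce the RMSE values quoted there):

```python
import numpy as np

def mse_coded(code, dt, v, mu, sigma_r2, n_xi=20001):
    """Numerical evaluation of Eq. (24); ||alpha||_L1 = dt * sum(code) for codes in [0, 1]."""
    xi = np.linspace(-np.pi, np.pi, n_xi)
    l1 = dt * float(np.sum(code))
    integrand = (mu * l1 + sigma_r2) / np.abs(alpha_hat(xi * v, code, dt)) ** 2
    return np.trapz(integrand, xi) / (2.0 * np.pi)

def mse_snapshot(dt, v, mu, sigma_r2, n_xi=20001):
    """Numerical evaluation of Eq. (26); requires |v| * dt < 2."""
    xi = np.linspace(-np.pi, np.pi, n_xi)
    integrand = (mu * dt + sigma_r2) / (dt * np.sinc(xi * v * dt / (2.0 * np.pi))) ** 2
    return np.trapz(integrand, xi) / (2.0 * np.pi)

mu, sigma_r2, v = 625.0, 100.0, 1.0
print(np.sqrt(mse_coded(code, dt=1.0, v=v, mu=mu, sigma_r2=sigma_r2)))  # RMSE, coded
print(np.sqrt(mse_snapshot(dt=1.0, v=v, mu=mu, sigma_r2=sigma_r2)))     # RMSE, snapshot
```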

We now turn to Sect. 4, which proposes a theoretical evaluation of the gain of the coded exposure method with respect to a snapshot.

4 An upper bound of performance for coded exposure cameras

This section studies the gain, in terms of MSE, of the coded exposure method with respect to a snapshot, as a function of the temporal sampling of the exposure code. The study yields a theoretical bound that is formalized in Theorem 4.1 and Corollary 4.2. The bound is valid for any exposure code provided \(|v|\Delta t \le 1\) (we recall that the code time step \(\Delta t\) is defined in Definition 3). This means that the proposed bound is an upper bound for the gain of any coded exposure camera, provided \(|v|\Delta t \le 1\). We have

Theorem 4.1

(A lower bound for the MSE of coded exposure cameras) Let \(\mathbb {u}_\mathrm{est}\), v, \(u(x-vt)\), and \(\sigma _r^2\) be as in Theorem 3.4. The MSE of any coded exposure camera satisfies, as soon as \(|v|\Delta t \le 1\),

$$\begin{aligned} {}\mathrm{MSE}_{\mathrm{any \ flutter}}(\alpha ) \ge |v|\left[ \frac{\mu }{2\pi } \int _{-\pi }^{\pi } \frac{\mathrm{d}\xi }{\ \mathrm {sinc}\ ^2\left( \frac{\xi |v|\Delta t}{2\pi }\right) }+ \sigma ^2_r|v| \left( \frac{1}{2\pi }\int _{-\pi }^{\pi } \frac{\mathrm{d}\xi }{ \ \mathrm {sinc}\ \left( \frac{\xi v \Delta t}{2\pi }\right) }\right) ^2\right] . \end{aligned}$$
(27)

Proof

See Appendix J. \(\square \)

We also have

Fig. 2

This figure depicts the upper bound proved in Corollary 4.2. The x axis represents the quantity \(|v|\Delta t\), which is inversely proportional to the sampling frequency of the exposure function. The x axis varies in the interval [0, 1] because Corollary 4.2 is valid in this range. The y axis represents the upper bound of the gain, in terms of root mean square error, of the flutter shutter with respect to a snapshot with an exposure time \(\Delta t=\frac{1}{|v|}\). Note that the curve is an upper bound. Thus, the actual gain of the coded exposure method lies below this curve

Fig. 3

In this experiment, we assume that the scene s moves at velocity \(v=1\) pixel per second. The additive (readout) noise is Gaussian with a standard deviation equal to 10. We also assume that the scene emits 625 photons per second (for other values see Table 1). On the top left panel: the observed image using the Agrawal and Raskar code [5, 6]. On the top right panel: the observed image for a snapshot with an exposure time of 1 second, i.e., a blur support of 1 pixel. On the bottom left panel: the reconstructed image for the Agrawal and Raskar code [5, 6]. On the bottom right panel: the reconstructed image for the snapshot (blur support of 1 pixel). For the Agrawal and Raskar code the blur has a support of 52 pixels. In other words, this code permits increasing the exposure time by a factor of 52 compared to the snapshot. The RMSE using the Agrawal and Raskar code is equal to 9.84. The RMSE of the snapshot is equal to 5.96. We refer to Table 1 for different values of the mean photon count and additive (readout) noise variance. This simulation is based on a variant of [29]

Corollary 4.2

(Upper bound of any coded exposure camera in terms of MSE with respect to a snapshot) Let \(\mathbb {u}_\mathrm{est}\), v, \(\Delta t\), \(u(x-vt)\) and \(\sigma _r^2\) be as in Theorem 4.1. We have

$$\begin{aligned} \frac{ \mathrm{MSE}_{\mathrm{optimal~snapshot}}}{ \mathrm{MSE}_{\mathrm{any \ flutter}}(\alpha )}\le \frac{\mathrm{MSE}_{\mathrm{snapshot}}(\Delta t)}{ \mathrm{MSE}_{\mathrm{any \ flutter}}(\alpha )}\le \frac{\left( \frac{1}{2\pi } \int _{-\pi }^\pi \left| \frac{\xi }{2 \sin \left( \frac{\xi }{2}\right) }\right| ^2 \mathrm{d}\xi \right) }{\left( \frac{1}{2\pi }\int _{-\pi }^{\pi } \frac{\mathrm{d}\xi }{ \ \mathrm {sinc}\ \left( \frac{\xi v \Delta t}{2\pi }\right) }\right) ^2}. \end{aligned}$$
(28)

Proof

See Appendix K. \(\square \)

Corollary 4.2 directly provides an upper bound for the gain of the coded exposure method, in terms of MSE, with respect to a snapshot. Given our hypotheses, this bound is valid for any code as soon as \(|v|\Delta t \le 1\). We depict, in Fig. 2, the upper bound of Corollary 4.2 as the quantity \(|v|\Delta t\) varies. Note that the quantity \(|v|\Delta t\) is inversely proportional to the temporal sampling frequency of the exposure code. Note also that the curve is an upper bound. Thus, the actual gain of the coded exposure method lies below this curve.
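Since the right-hand side of (28) depends only on \(|v|\Delta t\), the curve of Fig. 2 can be reproduced by one-dimensional quadrature. A minimal sketch, using the identity \(\frac{\xi }{2\sin (\xi /2)}=1/\ \mathrm {sinc}\ (\frac{\xi }{2\pi })\) to avoid the removable singularity at \(\xi =0\):

```python
import numpy as np

def gain_upper_bound(vdt, n_xi=20001):
    """RMSE gain bound of Corollary 4.2: square root of the MSE ratio of Eq. (28)."""
    xi = np.linspace(-np.pi, np.pi, n_xi)
    num = np.trapz(1.0 / np.sinc(xi / (2.0 * np.pi)) ** 2, xi) / (2.0 * np.pi)
    den = (np.trapz(1.0 / np.sinc(xi * vdt / (2.0 * np.pi)), xi) / (2.0 * np.pi)) ** 2
    return np.sqrt(num / den)

for vdt in (0.25, 0.5, 1.0):
    print(f"|v|dt = {vdt}: RMSE gain <= {gain_upper_bound(vdt):.3f}")
```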

We now illustrate numerically Corollary 4.2 in Fig. 3 and Table 1.

Table 1 This table provides the evolution of the RMSE as the mean photon emission intensity varies, for a fixed additive (readout) noise variance

5 Limitations and discussion

We have proposed a mathematical model of coded exposure cameras. The model includes the Poisson photon (shot) noise and any additive readout noise of finite variance. The model is based on the Shannon–Whittaker framework, which assumes band-limited images. This formalism has allowed us to give closed formulae for the Mean Square Error and Signal to Noise Ratio of coded exposure cameras. In addition, we have given an explicit formula for an absolute upper bound on the gain of any coded exposure camera, in terms of Mean Square Error, with respect to a snapshot. The calculations take into account the whole imaging chain, which includes the Poisson photon (shot) noise and any additive (readout) noise of finite variance, in addition to the deconvolution. Our mathematical model does not allow us to prove that the coded exposure method allows for very large gains compared to an optimal snapshot. This may be the result of an imperfect model of our mathematical coded exposure camera. Indeed, our model assumes that the sensor does not saturate, that the relative camera-scene velocity is known, that the scene has finite energy and is observed through an optical system that provides a cutoff frequency, and that the additive (readout) noise has finite variance, and it neglects the boundary effects due to the deconvolution. How the results change if one has to, e.g., estimate the velocity is, to the best of our knowledge, an open question.