Image Reconstruction in Light-Sheet Microscopy: Spatially Varying Deconvolution and Mixed Noise

We study the problem of deconvolution for light-sheet microscopy, where the data is corrupted by spatially varying blur and a combination of Poisson and Gaussian noise. The spatial variation of the point spread function of a light-sheet microscope is determined by the interaction between the excitation sheet and the detection objective PSF. We introduce a model of the image formation process that incorporates this interaction, and we formulate a variational model that accounts for the combination of Poisson and Gaussian noise through a data fidelity term consisting of the infimal convolution of the single noise fidelities, first introduced in L. Calatroni et al. (SIAM J Imaging Sci 10(3):1196–1233, 2017). We establish convergence rates and a discrepancy principle for the infimal convolution fidelity, and we solve the inverse problem by applying the primal–dual hybrid gradient (PDHG) algorithm in a novel way. Numerical experiments performed on simulated and real data show superior reconstruction results in comparison with other methods.


Introduction
Light-sheet microscopy is a fluorescence microscopy technique that enables volumetric imaging of biological samples at high frame rate, with better sectioning and lower phototoxicity in comparison with other fluorescence techniques. This is achieved by illuminating a thin slice of the sample using a sheet of light and detecting the emitted fluorescence from this plane with another objective perpendicular to the plane of the sheet. A schematic representation of a light-sheet microscope is shown in Fig. 1. Other microscopy techniques present certain disadvantages. For example, widefield microscopy [1] illuminates the whole sample using a single objective and achieves only very limited sectioning, while confocal microscopy [1] allows improved sectioning by utilising a pinhole to discard out-of-focus light, at the cost of higher phototoxicity and reduced frame rate. Light-sheet microscopy avoids these downsides by selectively illuminating only the slice of the sample being imaged. In this way, less phototoxic damage is induced and, therefore, imaging of living samples over a longer period of time is possible. The combination of lower phototoxicity, better sectioning capabilities and faster image acquisition led to light-sheet microscopy being recognised as "Method of the Year" by Nature Methods in 2014 [2].
The focus of the present manuscript is on deconvolution techniques for light-sheet microscopy data. In this context, deconvolution refers to the computational method of reversing the effect of blurring in the image acquisition process due to the point spread function (PSF) of the microscope [3][4][5]. Specifically, the PSF of an imaging system represents its response to a point object. In general, knowledge of the PSF can be modelled mathematically and calibrated using bead data (samples containing small spheres of known dimensions), and then used in the formulation of a forward model of the image formation, which can then be inverted, for example using optimisation methods, to reconstruct the original, deblurred object [6].

Fig. 1 Schematic of a light-sheet microscope, showing the illumination and the detection directions. The interaction of the light-sheet with the detection PSF leads to a spatially varying overall PSF and decreasing pixel intensities away from the centre in the horizontal direction.
However, in the case of light-sheet microscopy, simply knowing or estimating the PSF of the detection objective is not sufficient, since the overall response of the system to a point source is also influenced by the excitation light-sheet used to illuminate the slice. The overall PSF could be approximated by the detection PSF in the region where the illumination sheet is focused. However, the detection PSF becomes more distorted and loses intensity away from the focus of the excitation light-sheet, an effect illustrated in Fig. 1, so this approach is not accurate. Therefore, we address this problem as a case of spatially varying deconvolution [7,8], where the variation of the system's overall PSF is determined by the interaction between the detection PSF and the light-sheet. We note that, in general, the detection PSF itself can be spatially varying due to optical aberrations in the sample, a problem that is not specific to light-sheet microscopy. We do not address this source of variability in this work, although such a spatially varying detection PSF could in principle be incorporated in our method.
Two examples of acquired data are shown in Fig. 2. We can see in both cases the effect of the spatially varying light-sheet: the image is sharper in the centre and blurry on the sides, with the amount of blur growing with the horizontal distance from the centre. In addition, the fluorescence intensity of imaged beads in Fig. 2a is unevenly distributed despite imaging a homogeneous sample of beads, with the centre of the image being brighter than the left and right sides. The aim of our work is to correct these effects.

Contribution
We propose a method for deconvolution of 3D light-sheet microscopy data that takes into account the spatially varying nature of the PSF and is scalable to the dimensions typical of biological samples imaged using light-sheet microscopy: 4.86 GB per 3D 16-bit stack of 2048 × 2048 × 580 voxels.
Our approach is based on a new model for image formation that describes the interaction between the light-sheet and the detection PSF and replicates the physics of the microscope. We then formulate an inverse problem in which the forward operator is given by the model of the image formation process and which takes into account the degradation of the data by both Gaussian and Poisson noise through an infimal convolution (a concept that will be defined in Sect. 3) of an L² term and a Kullback–Leibler divergence term, following [9]. The proposed variational problem is solved by applying the Primal–Dual Hybrid Gradient (PDHG) algorithm [6,10,11] in a novel way. Finally, we exploit the noise model to automatically tune the balance between the data fidelity and the regularisation by resorting to a discrepancy principle. We obtain convergence rates in a Bregman distance for the infimal convolution fidelity from [9] under a standard source condition.
In our numerical experiments, we first show how this method performs on simulated data, where the ground truth is known; then we apply our method to two examples of experimental data: an image of fluorescent beads and a sample of Marchantia. In both cases, we see that the deconvolved images show improved contrast, while outperforming deconvolution using only the constant detection PSF.

Fig. 2 We show for both samples maximum intensity projections on the x−y plane in the top row and on the x−z plane in the bottom row, with the bead intensity shown in log scale for increased contrast. The effect of the light-sheet is visible along the horizontal direction (the x axis), as the image is sharp and has higher intensity in the centre, where the sheet is focused, while the quality of the image decreases away from the centre. The blurring effect of the light-sheet in the z direction is particularly noticeable in the x−z projections in the bottom row. Another source of blur, observed especially in the bead image (left), is given by optical aberrations due to the sample imaging medium. The Marchantia image has been acquired using samples from Dr. Alessandra Bonfanti and Dr. Sarah Robinson, using the genetic line provided by Prof. Sebastian Schornack and Dr. Giulia Arsuffi at the Sainsbury Laboratory, Cambridge University.

Related Work
Before describing in more detail our approach to the deconvolution problem, we give a brief overview of the literature on spatially varying deconvolution in the context of microscopy and how our work relates to it.
Purely data-driven approaches estimate a spatially varying PSF in a low dimensional space (for scalability reasons) using bead images [7,8,12]. This is usually not application specific and can be included in a more general blind deconvolution framework. Similarly, the work in [13] involves writing the spatially varying PSF as a convex combination of spatially invariant PSFs. The algorithm alternates between estimating the image and estimating the PSF. In a similar vein, the authors of [14] approach the problem of blind deconvolution by defining the convolution operator using efficient matrix-vector multiplication operations. This decomposition is similar to the discrete formulation of our image formation model. These methods optimise over the (unknown) operator in addition to the unknown image. Related to these results is [15], where the authors consider the models from [12] and [14] under the assumption that the blurring operator is known and given as a sum of weighted spatially invariant operators. They exploit this structure of the operator and use a Douglas-Rachford-based splitting to solve the optimisation problem efficiently. A different data-driven approach is presented in [16], where a deep artificial neural network is used to learn the spatially varying PSF from simulated data obtained using a forward model of the microscope. While these approaches are more general than our method, we consider that using the knowledge of the image formation process in the forward model is advantageous for the reconstruction of light-sheet microscopy data.
A number of groups consider the problem of reconstruction from multiple views in the context of light-sheet microscopy. In [17], the problem of multi-view reconstruction under a spatially varying blurring operator for 3D light-sheet data is considered. They divide the image into small blocks where they perform deconvolution using spatially invariant PSFs estimated from beads (and interpolated PSFs in regions where there are no beads). In [18], the authors extend the Richardson-Lucy algorithm to the multi-view reconstruction problem in a Bayesian setting. While it allows for different PSFs for each view (estimated using beads), this work does not consider spatial variations of the PSF. While using data from multiple views improves the quality of the reconstruction, these approaches are agnostic to the physics of the microscope.
The approach taken in [19,20] involves directly measuring the spatially varying PSF in different regions of the field of view using an additional hardware module installed with the microscope, and then deconvolving the image in each region using the measured PSF. In particular, [20] employs a sophisticated tiling-based deconvolution method based on the Richardson-Lucy algorithm and a formulation similar to a convolutional neural network in order to avoid artefacts usually caused by stitching tiles deconvolved with different PSFs.
Taking an approach similar in spirit to ours, the authors of [21] model the effective PSF of a light-sheet microscope, which is then plugged into a regularised version of the Richardson–Lucy algorithm for deconvolution. However, while they model the detection PSF and the light-sheet separately, they assume that the effective PSF of the microscope is spatially invariant and given by the point-wise product of the two PSFs. In contrast, we do not take this simplifying step in our modelling, as we consider that the relationship between the two PSFs plays an important role in the resulting blur of the image.
The work of Guo et al. [22] uses a modified Richardson-Lucy algorithm implemented on GPU to improve the speed of convergence, further improved by the use of a deep neural network, which is a promising approach.
Moreover, in [23] the authors introduce an image formation model similar to the one described in the present manuscript. However, the regions of the resulting PSF where the light-sheet is out of focus are discarded, so that the overall PSF is approximated by a constant PSF, and deconvolution is then performed using the ADMM algorithm. In Cueva et al. [24], a mathematical model which takes into account image fusion with two-sided illumination is derived from first principles. However, it is restricted to 2D and the authors do not apply the method to real data.
Lastly, regarding the mixed Gaussian–Poisson noise fidelity, our method follows the infimal convolution variational approach described in [9], with the additional light-sheet blurring operator. The same inverse problem, without the blurring operator, is solved in [25], albeit using an ADMM algorithm for the minimisation.

Paper Structure
The paper is organised as follows. In Sect. 2, we introduce a mathematical model of the image formation process in a light-sheet microscope. This model describes how the sample is blurred by the excitation illumination together with the detection objective PSF. Optical aberrations of the system are modelled using Zernike polynomials in the detection PSF, which we discuss in Sect. 2.3. In Sect. 3, we define the mathematical setting for the deconvolution problem and we state an inverse problem using a data fidelity given by the infimal convolution of the individual Gaussian and Poisson data fidelities; we also discuss convergence rates and a discrepancy principle for choosing the regularisation parameter. In Sect. 4, we describe how PDHG is applied to this inverse problem, with details of the implementation of the proximal operator and the convex conjugate of the joint Kullback–Leibler divergence. Finally, we validate our method with numerical experiments both with simulated and real data in Sect. 5, before concluding and giving a few directions for future work in Sect. 6.

Forward Model
The first contribution of the current work is a model of the image formation process in light-sheet microscopy. By modelling the excitation light-sheet and the detection PSF separately and their interaction in a way that replicates the physics of the microscope, we are able to accurately simulate the spatially varying PSF of the imaging system. We then incorporate this knowledge as the forward model in an inverse problem, which we solve to remove the noise and blur in light-sheet microscopy data. In this section, we describe the image formation process and the PSF model.

Image Formation Model
A light-sheet propagated along the x direction is focused by the excitation objective at an axial position z = z_0 and the local light-sheet intensity l is modelled by the incoherent point spread function (PSF) of the excitation objective. The sample with local density of fluorophores u emits photons proportionally to the local intensity l of the light-sheet. These photons are then collected by a detection objective, whose action on the illuminated sample is modelled as a convolution with its PSF h. For clarity, see Fig. 3 for the directions of the axes. Finally, the sensor conjugated with the image plane z_0 collects photons and converts them to digital values for storage. Consequently, the recorded image is corrupted by a combination of Gaussian and Poisson noise. We can see here again how the local variation of the light-sheet will result in a spatially varying blur and spatially varying illumination intensity in the captured image. This process is then repeated for each z_0 to obtain the measured data f.
More specifically, we model u, f, l and h as functions defined on Ω ⊂ R³, a rectangular domain. For the sample u, the light-sheet l and the detection PSF h, the recorded data f is modelled as

f(x, y, z) = C ∫ l(s, t, w) u(s, t, w + z) h(x − s, y − t, −w) ds dt dw, (2.1)

for a normalisation constant C. The detection PSF h is given by

h = g_σ * h_z, (2.2)

where h_z is obtained from the generic defocus model (2.5) below with pupil function p_Z and wavelength λ_h, and the light-sheet l is the y-averaged beam PSF l_beam:

l(x, y, z) = ∫ l_beam(x, t, z) dt, (2.3)

where l_beam is obtained from (2.5) with pupil function p_0 and wavelength λ_l, with the roles of the axes exchanged so that the beam propagates (and defocuses) along x. Here n is the refractive index, λ_h, λ_l are the wavelengths corresponding to the detection objective and light-sheet beam, respectively, and g_σ represents Gaussian blur. Lastly, p_Z(κ_x, κ_y) and p_0(κ_z, κ_y) are the pupil functions for the detection PSF and the light-sheet beam, respectively, both given by:

p_i(κ_1, κ_2) = e^{iθ_i(κ_1, κ_2)} if κ_1² + κ_2² ≤ (NA_i/λ_i)², and p_i(κ_1, κ_2) = 0 otherwise, (2.4)

for their respective wavelengths, λ_i = λ_h or λ_i = λ_l, and numerical apertures, NA_i = NA_h or NA_i = NA_l, where the phase θ_0 for the light-sheet pupil p_0 is equal to zero and the phase θ_Z for the detection PSF pupil p_Z is an approximation of the optical aberrations written as an expansion in a Zernike polynomial basis. The Gaussian blur g_σ in (2.2) is a technical detail that enables better fitting of the detection PSF h to the optical aberrations seen in bead data, an idea introduced in [26]. More details about the pupil functions, the aberration fitting using Zernike polynomials and the Gaussian blur g_σ will be given in Sect. 2.3. In general, the NA of the excitation sheet is much lower than the NA of the detection lens. We note that the overall process is not translation invariant and cannot be modelled by a convolution operator. Note that both the detection PSF h and the light-sheet PSF have a similar formulation, derived from

h_gen(x, y, z) = | ∫ p(κ_x, κ_y) e^{2πi z √((n/λ)² − κ_x² − κ_y²)} e^{2πi (x κ_x + y κ_y)} dκ_x dκ_y |², (2.5)

which includes the pupil function for modelling aberrations and a defocus term before taking the Fourier transform (see, for example, [27,28]). In addition, the actual light-sheet which illuminates a slice of the sample is obtained by rapid scanning of the illumination beam, which we model by y-averaging the illumination PSF l_beam given in (2.3) and repeating it in the y direction for the full length of the sample.
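To make the generic defocus model concrete, the following sketch evaluates a zero-phase (aberration-free) version of it numerically: the pupil is an ideal disc of radius NA/λ, a defocus phase is applied for each axial position, and the intensity PSF is the squared modulus of the inverse Fourier transform. The grid size, wavelength, numerical aperture and refractive index below are illustrative values, not the calibration used in this work.

```python
import numpy as np

def defocus_psf(n_pix=64, pixel_nm=100.0, wavelength_nm=520.0,
                na=0.8, n_medium=1.33, z_planes_nm=(-1000.0, 0.0, 1000.0)):
    """Sketch of the scalar defocus PSF model: for each axial position z,
    the intensity PSF is |IFFT[pupil * exp(2*pi*i*z*kz)]|^2, where the
    pupil is an ideal disc of radius NA/lambda (zero aberration phase)."""
    k = np.fft.fftfreq(n_pix, d=pixel_nm)            # lateral spatial frequencies (1/nm)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    kr2 = kx**2 + ky**2
    pupil = (kr2 <= (na / wavelength_nm) ** 2).astype(complex)  # ideal circular pupil
    # axial frequency kz = sqrt((n/lambda)^2 - kx^2 - ky^2) on the pupil support
    kz = np.sqrt(np.maximum((n_medium / wavelength_nm) ** 2 - kr2, 0.0))
    psf = np.empty((n_pix, n_pix, len(z_planes_nm)))
    for j, z in enumerate(z_planes_nm):
        field = np.fft.ifft2(pupil * np.exp(2j * np.pi * z * kz))
        psf[:, :, j] = np.abs(field) ** 2            # PSF centred at index (0, 0)
    return psf / psf.sum()                           # normalise total intensity to one

psf = defocus_psf()
```

The in-focus plane (z = 0) concentrates its energy into a sharper, brighter peak than the defocused planes, which is exactly the axial behaviour that the light-sheet model exploits.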
In practice, the image formation process modelled by (2.1) is discretised at the point of recording by the camera sensor in the x y plane and by the step size of the light-sheet in the z direction. If the camera has a resolution of N_x × N_y pixels and the light-sheet illuminates the sample at N_z distinct steps, the model (2.1) becomes:

f̃_{i,j,k} = C̃ Σ_{s,t,w} l̃_{s,t,w} ũ_{s,t,w+k} h̃_{i−s,j−t,−w},

for i = 1, ..., N_x, j = 1, ..., N_y, k = 1, ..., N_z (with out-of-range terms vanishing) and a normalisation constant C̃, where ũ, f̃, l̃, h̃ ∈ R^{N_x × N_y × N_z} are the discretised versions of u, f, l, h, respectively. Similarly, the sampling performed by the camera sensor leads to a discretisation of the Fourier space and the use of the discrete Fourier transform in the PSF and light-sheet models (2.2) and (2.3). Lastly, in our implementation we normalise h̃ so that its entries sum to one and choose the normalisation constant C̃ so that the norm of the resulting operator is equal to one.
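The discrete model can be sketched directly by looping over light-sheet positions and depth slices: each slab is illuminated by the sheet, blurred in 2D by the matching detection-PSF slice, and accumulated. This is a minimal illustration, assuming a detection PSF that is symmetric in z (as allowed by the remark in Sect. 2.2), periodic FFT boundary conditions, and omitting the normalisation constant; the function name is hypothetical, not from the paper's code.

```python
import numpy as np

def light_sheet_forward(u, l, h):
    """Discretised light-sheet image formation (sketch): for each sheet
    position k, the shifted sample is multiplied by the sheet intensity l
    and blurred in 2D by the corresponding z-slice of the detection PSF h.
    All arrays have shape (Nx, Ny, Nz); l and h are centred at Nz // 2;
    the normalisation constant is omitted."""
    nx, ny, nz = u.shape
    c = nz // 2                                  # in-focus plane index of l and h
    f = np.zeros_like(u, dtype=float)
    for k in range(nz):                          # light-sheet position
        for w in range(nz):                      # depth slice relative to the sheet
            zk = k + (w - c)                     # sample plane illuminated at offset w - c
            if 0 <= zk < nz:
                slab = l[:, :, w] * u[:, :, zk]  # illuminate the slab
                # 2D convolution with the matching detection-PSF slice via FFT
                f[:, :, k] += np.real(np.fft.ifft2(
                    np.fft.fft2(slab) * np.fft.fft2(np.fft.ifftshift(h[:, :, w]))))
    return f
```

With an idealised delta PSF and a uniform sheet the operator reduces to the identity, which makes a convenient sanity check before plugging in realistic PSFs.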

Derivation of the Model
Let l, u, h be defined as in Sect. 2.1, with h and l centred at the origin and l translation invariant in the y direction and symmetric around the yz plane. For a fixed z_0 ∈ [−Z/2, Z/2], we take the following steps, which replicate the inner workings of a light-sheet microscope:

1. Image the sample at z = z_0: centre the sample u at z_0 and multiply the result with the light-sheet l:

u_{z_0}(s, t, w) := u(s, t, w + z_0) l(s, t, w).

2. Convolve with the objective PSF h:

(u_{z_0} * h)(x, y, z) = ∫ u(s, t, w + z_0) l(s, t, w) h(x − s, y − t, z − w) ds dt dw.

3. Slice at z = 0:

f(x, y, z_0) = C (u_{z_0} * h)(x, y, 0) = C ∫ l(s, t, w) u(s, t, w + z_0) h(x − s, y − t, −w) ds dt dw.

This is the same as model (2.1), where we substitute z for z_0. Note that, if there are no aberrations in h or other sources of asymmetry in the z direction, we could simply write h(x − s, y − t, w) instead. For a discretisation of the domain using a 3D grid with N_x × N_y pixels and N_z light-sheet steps, the forward model can be computed by following the three steps above for each k = 1, ..., N_z, where we perform the convolutions using the fast Fourier transform (FFT), resulting in a number of O(N_x N_y N_z² log(N_x N_y N_z)) operations. Alternatively, we can rewrite the last integral above as:

f(x, y, z_0) = C ∫ [ (l(·, ·, w) u(·, ·, w + z_0)) * h(·, ·, −w) ](x, y) dw, (2.11)

where the convolution in (2.11) is a 2D convolution in (x, y):

(a * b)(x, y) := ∫ a(s, t) b(x − s, y − t) ds dt. (2.12)

In terms of numbers of FFTs performed on a discretised N_x × N_y × N_z grid, this alternative formulation requires O(N_x N_y N_z² log(N_x N_y)) operations.

Point Spread Function Model
While both the light-sheet profile and the detection PSF are based on the same model of a defocused system (2.5) introduced in [27], note that our definition of h in (2.2) includes an additional convolution operation with a Gaussian g σ and a pupil function p Z with a nonzero phase. Let us turn to why this is the case. It is well known that optical aberrations hamper results based on deconvolution with theoretical PSFs. In light-sheet microscopy, the effect of aberrations is more visible away from the centre, as shown for example in the bead image in Fig. 2, or in the more detailed example beads in Fig. 4. It is, therefore, required that we model the (spatially invariant) aberrations of the detection lens.
The general PSF model (2.5), with the phase of the pupil function equal to zero, does not take optical aberrations into account and therefore it is not an accurate representation of the objective PSF h. For example, a PSF calculated using (2.5) with zero phase of the pupil and the parameters of the detection objective, shown in Fig. 5, does not resemble the actual bead images in the data in Fig. 4.
There has been extensive work on the problem of phase reconstruction in the literature [26,29,30], but here we take a more straightforward approach using Zernike polynomials to include aberrations in the PSF [31], as follows. Let h_z be the objective PSF calculated using (2.5) with Zernike polynomials in the phase of the pupil function p_z(κ_x, κ_y; c), (2.14) where p_z(κ_x, κ_y; c) is the pupil function with N_Z Zernike polynomials in the phase:

θ_z(κ_x, κ_y; c) = Σ_{i=1}^{N_Z} c_i Z_i(κ_x, κ_y),

and c = [c_1, ..., c_{N_Z}]ᵀ are coefficients corresponding to the polynomials Z_i, for some integer N_Z > 0. Moreover, let h_zb be the blurred PSF obtained by convolving h_z with a Gaussian g_σ of width σ:

h_zb := g_σ * h_z.

This allows us to obtain a better approximation of the objective PSF [26]. The parameters c and σ are calculated by solving

min_{c, σ} || h_zb(·; c, σ) * b − h_bead ||₂², subject to ||c||_∞ ≤ B_Z,

for some B_Z > 0, where h_bead is a bead image containing the aberrations that one wants to capture in the fitted detection PSF (for example the bead in Fig. 4b) and b is equal to one inside the sphere of radius equal to the radius of the bead (a parameter that is provided) and zero outside the sphere. This takes into account the non-negligible size of the beads used to generate the data.
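The spirit of the fitting step, minimising the mismatch between a blurred model PSF and a measured bead over the model parameters, can be illustrated in one dimension. The sketch below fits only the Gaussian width σ and an intensity scale by grid search (the paper fits the Zernike coefficients c jointly, over a bounded region B_Z); all names and values are illustrative.

```python
import numpy as np

def gaussian_blur_1d(x, sigma):
    """Periodic 1D Gaussian blur implemented in Fourier space."""
    k = np.fft.fftfreq(x.size)
    kernel = np.exp(-2 * (np.pi * k * sigma) ** 2)   # Fourier transform of the Gaussian
    return np.real(np.fft.ifft(np.fft.fft(x) * kernel))

def fit_blur_width(h_model, h_bead, sigmas):
    """Toy analogue of the PSF-fitting step: choose the blur width sigma
    (and an intensity scale, optimal in closed form) minimising the squared
    error between the blurred model PSF and the measured bead image."""
    best = None
    for s in sigmas:
        hb = gaussian_blur_1d(h_model, s)
        scale = hb @ h_bead / (hb @ hb)              # least-squares optimal scale
        err = np.sum((scale * hb - h_bead) ** 2)
        if best is None or err < best[0]:
            best = (err, s, scale)
    return best[1], best[2]
```

Recovering a known blur width from synthetically blurred data is a quick way to validate such a fitting routine before applying it to real bead images.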
In the implementation of the fitting procedure, we normalise both the bead image h bead and the simulated PSF h zb by their maximum values before calculating their error, and we include two additional parameters, scaling and shift, to ensure a better fit of the intensity values (not shown here for simplicity of the presentation).
The best choice of the number of Zernike polynomial basis elements N_Z and the bound B_Z on the coefficients c depends on the data h_bead and on how good a fit is required in the deconvolution step. In general, at least the first 15 Zernike polynomials are needed to capture the main optical aberrations, such as spherical aberration and astigmatism. In our experiments, for the bead shown in Fig. 4b, we found that N_Z = 15 and B_Z = 3 are an appropriate choice. The Zernike polynomials used and their resulting corresponding coefficients are shown in Table 1 and in Fig. 6. The resulting PSF is the detection PSF model (2.2) and is shown in Fig. 7.
Finally, we note that a more thorough approach to choosing the Zernike basis is possible, by using multiple bead images for fitting the Zernike coefficients and comparing the quality of the fit for different values of N Z and B Z . Alternatively, one could average multiple beads and perform the fitting procedure described above on the averaged bead. In both cases, it is worth mentioning that, since the optical aberrations vary within the sample image, we would only be able to fit the general shape of the PSF rather than the sharper features present in each bead, effectively fitting the low frequency information in the beads. In the end, this would achieve a similar effect to the Gaussian blur g σ that we use in the fitting process. Moreover, one can employ more advanced techniques such as the ones described in [26,29,30] for estimating the pupil function, which can be plugged in to our image formation model. However, such an analysis focused on the pupil function is beyond the scope of the present work.

Convolution with Spatially Varying Kernel
Having introduced the image formation model for a light-sheet microscope (2.1), as well as the models for the individual PSFs, it is worth expanding on the source of spatial variability that we tackle in this work. First, note that with a change of variable w → w + z, we can rewrite the model (2.1) as:

f(x, y, z) = ∫ h̃(x, y, z; s, t, w) u(s, t, w) ds dt dw,

which is the convolution of u(x, y, z) with the spatially varying kernel

h̃(x, y, z; s, t, w) := C l(s, t, w − z) h(x − s, y − t, z − w).

This gives the expression of a kernel h̃(x, y, z; ·, ·, ·) which varies with its centre (x, y, z). Therefore, the model presented in this section describes the spatial variation of the overall PSF of the system h̃, as a consequence of the interaction between the light-sheet beam PSF l and the detection PSF h. We highlight that l and h are not themselves spatially varying. By the process described in Sect. 2.2, this interaction (and spatial variability) is modelled explicitly. This is in contrast to approaches such as [16], where the spatial variability of the PSF is learned from the data and encoded in the black-box mechanism of an artificial neural network.
In practice, a second source of spatial variability of the PSF may be the detection PSF h, due to the optical aberrations that can vary within the sample image. As described in Sect. 2.3, in this work we do not account for this potential spatial variability of h, and we fit one pupil function to the bead data.

Problem Statement
In this section we formally state the inverse problem of deblurring a light-sheet microscopy image. Let Ω ⊂ R³ be a bounded Lipschitz domain and let L : L^p(Ω) → L²(Ω) be the forward operator defined by (2.1). Here 1 < p < 3/2 is chosen such that the embedding of the space BV(Ω) into L^p(Ω) is compact [32]. Clearly, L is linear.
We consider the following inverse problem:

f = v + w, v ∼ Pois(f̄), (3.1)

where v ∼ Pois(f̄) is a Poisson distributed random variable with mean f̄ = Lu, the exact data, and w represents additive zero-mean Gaussian noise. We do not model the Gaussian noise statistically; instead, in the spirit of (deterministic) variational regularisation, we assume that w ∈ L²(Ω) is a fixed perturbation with ||w||_{L²(Ω)} ≤ σ_G for some known σ_G > 0. Poisson noise is typically modelled using the Kullback–Leibler divergence as the data fidelity term [33,34].
Let us give a brief justification of the inverse problem formulation described in this section [9,35], from a Bayesian perspective. First, by using the Poisson and Gaussian probability density functions, we have that

p(f | v) ∝ exp(−||f − v||²/(2σ²)) and p(v | u) ∝ Π ( (Lu)^v e^{−Lu} / v! ),

and from Bayes' theorem and conditional probability:

p(u, v | f) = p(f | v) p(v | u) p(u) / p(f), (3.2)

where we used that p(f | u, v) = p(f | v). Moreover, we assume that the prior is a Gibbs distribution p(u) = e^{−αJ(u)} for a convex functional J(u), which we will introduce later.
To obtain a maximum a posteriori estimate of u and v (i.e. to maximise the posterior distribution p(u, v | f)), we take the minimum of the negative log of (3.2) and, after discarding the denominator p(f) and using the Stirling approximation for the factorial, log v! ≈ v log v − v, we obtain the minimisation problem:

min_{u, v} α J(u) + (1/(2σ²)) ||f − v||²_{L²(Ω)} + D_KL(v, Lu), (3.3)

where the first term is the regularisation term and the remaining terms form the data fidelity term.
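The passage from the posterior to the minimisation problem can be spelled out term by term; the following sketch uses formal continuum notation, drops additive constants, and keeps the Gaussian variance σ² explicit:

```latex
\begin{aligned}
-\log p(u, v \mid f)
  &= -\log p(f \mid v) - \log p(v \mid u) - \log p(u) + \log p(f) \\
  &= \frac{1}{2\sigma^2}\|f - v\|_{L^2}^2
     + \int_\Omega \bigl( Lu - v \log Lu + \log v! \bigr)\,\mathrm{d}x
     + \alpha J(u) + \mathrm{const} \\
  &\approx \frac{1}{2\sigma^2}\|f - v\|_{L^2}^2
     + \int_\Omega \Bigl( v \log \frac{v}{Lu} - v + Lu \Bigr)\,\mathrm{d}x
     + \alpha J(u) + \mathrm{const},
\end{aligned}
```

where the last step uses Stirling's approximation $\log v! \approx v \log v - v$, and the integral in the last line is exactly the Kullback–Leibler divergence $D_{\mathrm{KL}}(v, Lu)$.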
We will now describe the formal mathematical setting for (3.3) in the context of variational regularisation. This will allow us to show well-posedness of the model, establish convergence rates of the solution with respect to the noise in the measurements and to introduce a discrepancy principle for choosing the value of the regularisation parameter α.
First, note that in (3.3), we can perform the minimisation over v only on the data fidelity part of the objective, which can be written as an infimal convolution of the two separate Gaussian and Poisson fidelities. The infimal convolution of two functionals ϕ_1, ϕ_2 on L²(Ω) is defined as¹:

(ϕ_1 □ ϕ_2)(f) := inf_{v ∈ L²(Ω)} ϕ_1(f − v) + ϕ_2(v) (3.4)

for f ∈ L²(Ω). Therefore, we define the following data fidelity term, as proposed in [9]:

Φ(f̄, f) := inf_{v ∈ L²₊(Ω)} (1/2) ||f − v||²_{L²(Ω)} + D_KL(v, f̄) (3.5)

for f ∈ L²(Ω) and f̄ ∈ L¹₊(Ω), where L^{1,2}₊(Ω) denotes the positive cone in L^{1,2}(Ω) (that is, functions f ∈ L^{1,2}(Ω) such that f ≥ 0 a.e.) and D_KL denotes the Kullback–Leibler divergence, which we define as follows:

D_KL(v, f̄) := ∫_Ω ( v log(v/f̄) − v + f̄ ) dx, if v ≥ 0 a.e.; +∞, otherwise. (3.6)

We note that ∫_Ω v log v dx < ∞ for v ∈ L²(Ω), since L²(Ω) is continuously embedded into the Orlicz space L log L(Ω) of functions of finite entropy [36,37]:

L log L(Ω) := { v : ∫_Ω (|v| log |v|)₊ dx < ∞ },

where (·)₊ = max{·, 0} denotes the positive part. A proof of the following result can be found in [9], but we provide it here for the readers' convenience.
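Because the squared-L² and Kullback–Leibler terms in the fidelity (3.5) decouple across points, the inner minimisation can be solved pointwise: the optimality condition is (v − f) + log(v / f̄) = 0, whose left-hand side is strictly increasing in v. The sketch below evaluates the fidelity this way with a vectorised bisection; it assumes the unit-weight form of (3.5) and strictly positive f̄.

```python
import numpy as np

def ic_fidelity(fbar, f, iters=60):
    """Infimal-convolution fidelity (sketch):
    Phi(fbar, f) = inf_{v >= 0} 0.5 * ||f - v||^2 + KL(v, fbar), with
    KL(v, fbar) = sum(v * log(v / fbar) - v + fbar).  The pointwise
    minimiser solves (v - f) + log(v / fbar) = 0, found by bisection
    since the left-hand side is increasing in v."""
    f = np.asarray(f, dtype=float)
    fbar = np.asarray(fbar, dtype=float)
    lo = np.full_like(f, 1e-12)               # optimality condition is negative here
    hi = np.maximum(np.abs(f), fbar) + 1.0    # ...and positive here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = (mid - f) + np.log(mid / fbar)
        lo = np.where(g < 0, mid, lo)
        hi = np.where(g < 0, hi, mid)
    v = 0.5 * (lo + hi)
    kl = v * np.log(v / fbar) - v + fbar
    return np.sum(0.5 * (f - v) ** 2 + kl), v
```

When f already equals f̄ the fidelity vanishes, and in general it never exceeds the pure Gaussian fidelity obtained by fixing v = f̄, which is the defining property of the infimal convolution.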
where χ denotes the characteristic function. The function ϕ is proper, convex, non-negative and lower semicontinuous, while ψ is proper, convex, lower semicontinuous and coercive. Therefore, by [38,Prop. 12.14], the infimal convolution is exact and is itself a proper, convex and lower semicontinuous function. Uniqueness follows from strict convexity of ψ.
Now we turn our attention to the lower semicontinuity of the functional Φ(·, f) in its first argument.

Proposition 3.2 (Lower semicontinuity) For any f ∈ L²(Ω), the functional Φ(·, f) is lower semicontinuous on L¹(Ω).
where g ∈ L¹(Ω), v*(g) is as defined in Proposition 3.1 and C := {g ∈ L¹₊(Ω) : ∫_Ω g dx = 1}. The characteristic function is lower semicontinuous because C is closed in L¹(Ω), and the rest is lower semicontinuous by [9, Thm. 4.1].
The following fact is easily established.

Proposition 3.3 The operator L : L^p(Ω) → L²(Ω) is bounded; moreover, L1 ≥ 0 and L1 does not vanish identically, where 1 denotes the constant one function.

Proof By (2.1), we have

(Lu)(x, y, z) = C ∫ l(s, t, w) u(s, t, w + z) h(x − s, y − t, w) dμ_{stw},

where dμ_{stw} := ds dt dw. Noting that the light-sheet PSF l and the detection PSF h are bounded from above by some C_1, C_2 > 0, we have that:

||Lu||_{L²(Ω)} ≤ C(p) ||u||_{L^p(Ω)},

where in the last inequality we applied Hölder's inequality and C(p) is a constant that depends on p (as well as C_1, C_2 and Ω). Hence, we obtain the first claim. For the second claim, we observe that

L1(x, y, z) = C ∫ l(s, t, w) h(x − s, y − t, w) dμ_{stw} ≥ 0,

so that

∫_Ω L1 dμ_{xyz} = C ∫_Ω ∫ l(s, t, w) h(x − s, y − t, w) dμ_{stw} dμ_{xyz},

and let B_{l,h} ⊂ supp l ∩ supp h. Then, since both l and h are non-negative on Ω and strictly positive on B_{l,h}, from the last equality above we have that ∫_Ω L1 dμ_{xyz} > 0, which proves the second claim.

Remark 3.1
Our setting with the measured data f ∈ L 2 ( ) differs slightly from [9], where f ∈ L ∞ ( ) was assumed.
We will consider the following variational regularisation problem:

min_{u ∈ L^p(Ω)} Φ(Lu, f) + α J(u), (3.8)

where Φ is the infimal convolution fidelity as defined in (3.5), J : L^p(Ω) → R₊ ∪ {+∞} is a regularisation functional, α ∈ R₊ is a regularisation parameter and 1 < p < 3/2. Without loss of generality, we assume that ∫_Ω f dx = 1.
As the regulariser J, we choose the total variation [39]:

J(u) = TV(u) := sup { ∫_Ω u div φ dx : φ ∈ C_c¹(Ω; R³), ||φ||_∞ ≤ 1 }.

By the Rellich–Kondrachov theorem, the space BV(Ω) := {u ∈ L¹(Ω) : TV(u) < ∞} is compactly embedded into L^p(Ω) for 1 ≤ p < 3/2 and continuously embedded into L^{3/2}(Ω), since Ω ⊂ R³. Therefore, we consider TV : L^p(Ω) → R₊ ∪ {+∞}. We will denote by u†_TV the TV-minimising solution of (3.1), i.e. a solution that satisfies

u†_TV ∈ argmin { TV(u) : Lu = f̄ }.

The existence of such a solution is obtained by standard arguments [40]. We will make the reasonable assumption that the TV-minimising solution is positive, i.e. u†_TV ≥ 0 a.e. Due to the positivity of the kernels involved in (2.1), it is clear that u†_TV ≥ 0 implies Lu†_TV = f̄ ≥ 0. Since by Proposition 3.1 the infimal convolution (3.5) is exact, we can equivalently rewrite (3.8) as follows:

min_{u, v} (1/2) ||f − v||²_{L²(Ω)} + D_KL(v, Lu) + α TV(u). (3.9)

Existence of minimisers in (3.8) and (3.9) is obtained by standard arguments [9, Thm. 4.1].
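In the discrete setting, the total variation is typically computed from finite differences. The sketch below implements the isotropic version for a 3D array with replicate boundary conditions, an assumed discretisation for illustration rather than the paper's exact one.

```python
import numpy as np

def tv_discrete(u, voxel=(1.0, 1.0, 1.0)):
    """Isotropic discrete total variation of a 3D array: the sum over
    voxels of the Euclidean norm of the forward-difference gradient,
    with replicate (Neumann) boundary conditions."""
    dx = np.diff(u, axis=0, append=u[-1:, :, :]) / voxel[0]
    dy = np.diff(u, axis=1, append=u[:, -1:, :]) / voxel[1]
    dz = np.diff(u, axis=2, append=u[:, :, -1:]) / voxel[2]
    return np.sqrt(dx**2 + dy**2 + dz**2).sum()
```

A constant volume has zero TV, while a unit step contributes exactly its jump height times the area of the jump surface, which is the behaviour that makes TV a natural edge-preserving regulariser.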

Proposition 3.4 Each of the optimisation problems (3.8) and
(3.9) admits a unique minimiser.
We will also need the following coercivity result.

Proposition 3.5
The functional Φ(·, f) : L¹₊(Ω) → R₊ ∪ {+∞} is strongly coercive with exponent 2, i.e. there exists a constant C > 0 such that

Φ(g, f) ≥ C ||f − g||²_{L¹(Ω)}.

Proof Using Pinsker's inequality for the Kullback–Leibler divergence, we get

||v − g||²_{L¹(Ω)} ≤ C' D_KL(v, g)

for some C' > 0. Note that Pinsker's inequality assumes that v, g ≥ 0 and ∫_Ω v dx = ∫_Ω g dx = 1, which we ensure by definition in (3.6). Now, using the inequality (1/2)(a + b)² ≤ a² + b², which holds for all a, b ∈ R, and the triangle inequality, we obtain

(1/2) ||f − g||²_{L¹(Ω)} ≤ ||f − v||²_{L¹(Ω)} + ||v − g||²_{L¹(Ω)} ≤ |Ω| ||f − v||²_{L²(Ω)} + C' D_KL(v, g),

and taking the infimum over v gives the claim.

Convergence Rates
Our aim in this section is to establish convergence rates of minimisers of (3.8) as the amount of noise in the data decreases. But first we need to specify what we mean by the amount of noise in our setting.
We argue as follows. Since the noise in the measurement is generated sequentially, i.e. photo-electrons are first counted by the sensor, leading to Poisson noise, and later collected by the electronic circuit, generating additive Gaussian noise, for any exact data f̄ there exists a realisation z̄ of Pois(f̄) such that D_KL(z̄, f̄) ≤ γ, where γ > 0 depends on the exposure time t and vanishes as t → ∞ [33]. Further, there exists w ∈ L²(Ω) with ||w||_{L²} ≤ σ_G such that f = z̄ + w. Since z̄ ≥ 0 is feasible in (3.5), we get the following upper bound on the fidelity term (3.5) evaluated at the measurement f and the exact data f̄:

Φ(f̄, f) ≤ (1/2) ||f − z̄||²_{L²(Ω)} + D_KL(z̄, f̄) ≤ σ_G²/2 + γ. (3.10)

The standard tool for establishing convergence rates are Bregman distances associated with the regulariser J. We briefly recall the necessary definitions.

Definition 3.1 Let X be a Banach space and J : X → R₊ ∪ {+∞} a proper convex functional. The generalised Bregman distance between x, y ∈ X corresponding to the subgradient q ∈ ∂J(y) is defined as follows:

D_J^q(x, y) := J(x) − J(y) − ⟨q, x − y⟩.

Here ∂J(y) denotes the subdifferential of J at y ∈ X. If, in addition, p ∈ ∂J(x), the symmetric Bregman distance between x, y ∈ X corresponding to the subgradients p, q is defined as follows:

D_J^{p,q}(x, y) := D_J^q(x, y) + D_J^p(y, x) = ⟨p − q, x − y⟩.

To obtain convergence rates, an additional assumption on the regularity of the TV-minimising solution, called the source condition, needs to be made. We use the following variant [41]: there exists μ† ∈ L²(Ω) such that q† := L*μ† ∈ ∂TV(u†_TV).
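The Bregman distance is easy to compute for simple convex functionals. The sketch below uses J = ||·||_1 (a stand-in for TV, chosen because its subgradients are explicit): any valid subgradient q of J at y yields a non-negative distance D_J^q(x, y).

```python
import numpy as np

def bregman_l1(x, y, q):
    """Generalised Bregman distance D_J^q(x, y) = J(x) - J(y) - <q, x - y>
    for J = ||.||_1.  q must be a subgradient of J at y: q_i = sign(y_i)
    where y_i != 0, and any value in [-1, 1] where y_i = 0."""
    assert np.all(np.abs(q) <= 1)
    assert np.all(q[y != 0] == np.sign(y[y != 0]))
    return np.abs(x).sum() - np.abs(y).sum() - q @ (x - y)
```

The distance vanishes at x = y and is non-negative by convexity, but it is not a metric: it is asymmetric and can vanish for x ≠ y, which is one reason the symmetric variant is also used in the convergence analysis.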

Parameter Choice Rules
Let us summarise what we know about the fidelity function Φ as defined in (3.5), the regularisation functional TV and the forward operator L:

- Φ(f, ·) is proper, convex and coercive in L¹(Ω) (Proposition 3.5);
- Φ(·, ·) is jointly convex [42] and lower semicontinuous (Propositions 3.1 and 3.2);
- Φ(f, g) = 0 if and only if f = g;
- TV: L¹(Ω) → R ∪ {+∞} is proper, convex and lower semicontinuous [32] and its null space is given by N(TV) = span{1}, where 1 denotes the constant one function;
- TV is coercive on the complement of its null space in L¹(Ω) [32].

Using these facts and slightly modifying the proofs from [43], we obtain a convergence rate for an a priori choice of the regularisation parameter, where q† = L*μ† is the subgradient from Theorem 3.1 and σ_G, γ > 0 are as defined in (3.10).
In a similar manner, we can obtain convergence rates for an a posteriori parameter choice rule known as the discrepancy principle [44–46]. Let f be the noisy data and δ > 0 the amount of noise such that Φ(f, f̄) ≤ δ, where Φ is as defined in (3.5). In our case, δ = σ_G²/2 + γ by (3.10). The discrepancy principle amounts to selecting α = α(f, δ) such that

α(f, δ) = sup{α > 0 : Φ(Lu_α, f) ≤ τδ}, (3.11)

where u_α is the regularised solution corresponding to the regularisation parameter α and τ > 1 is a parameter. Again, slightly modifying the proofs from [43], we obtain a corresponding convergence rate, where q† = L*μ† is the subgradient from Theorem 3.1 and σ_G, γ > 0 are as defined in (3.10).
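Rule (3.11) can be implemented by exploiting the monotonicity of the discrepancy in α. The sketch below uses a toy scalar Tikhonov problem with an L² discrepancy standing in for the infimal-convolution fidelity Φ (all names and values are illustrative, not the paper's model), and selects α by bisection:

```python
def u_alpha(f, alpha):
    # Closed-form regularised solution of min_u 0.5*||u - f||^2 + 0.5*alpha*||u||^2.
    return [fj / (1.0 + alpha) for fj in f]

def discrepancy(u, f):
    # Toy L2 discrepancy standing in for the infimal-convolution fidelity Phi.
    return 0.5 * sum((uj - fj) ** 2 for uj, fj in zip(u, f))

def discrepancy_principle(f, delta, tau=1.1, lo=0.0, hi=1e6, iters=100):
    """Bisection for alpha = sup{alpha > 0 : Phi(u_alpha, f) <= tau * delta}.

    Relies on the discrepancy being increasing in alpha, which holds here."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if discrepancy(u_alpha(f, mid), f) <= tau * delta:
            lo = mid          # still feasible: alpha can grow
        else:
            hi = mid          # too much regularisation
    return lo

f = [2.0, -1.0, 3.0]
delta = 0.05
alpha = discrepancy_principle(f, delta)
# At the selected alpha the discrepancy sits at the bound tau * delta.
assert abs(discrepancy(u_alpha(f, alpha), f) - 1.1 * delta) < 1e-6
```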

PDHG for Infimal Convolution Model
In practice, due to the joint convexity of the Kullback-Leibler divergence, we solve the minimisation problem (3.9), where we treat the reconstructed sample u and the Gaussian denoised image v jointly and, in addition, we impose lower and upper bound constraints on u and v by including the corresponding characteristic functions in the objective (4.1). Note that the objective function in (4.1) is a sum of convex functions (the Kullback-Leibler divergence D_KL is jointly convex [47]), and therefore is itself convex. We then write the problem (4.1) as

min_w G(w) + Σ_{i=1}^m H_i(L_i w), (4.2)

where we solve for w = (u, v) and m = 3, with G and the terms H_i ∘ L_i collecting the constraint, fidelity and regularisation terms; here L is the forward operator corresponding to the image formation model from Sect. 2.1.

Rather than solving the problem (4.2) directly, a common approach is to reformulate it as a saddle point problem using the Fenchel conjugate G*(y) = sup_z ⟨z, y⟩ − G(z). For a proper, convex and lower semicontinuous function G, we have G** = G, so (4.2) can be written as the saddle point problem

min_w sup_{y_1,...,y_m} G(w) + Σ_{i=1}^m ⟨L_i w, y_i⟩ − H_i*(y_i), (4.7)

and by swapping the min and the sup and applying the definition of the convex conjugate G*, one obtains the dual of (4.2):

sup_{y_1,...,y_m} −G*(−Σ_{i=1}^m L_i* y_i) − Σ_{i=1}^m H_i*(y_i). (4.8)

The saddle point problem (4.7) is commonly solved using the primal-dual hybrid gradient (PDHG) algorithm [6,10,11], and by doing so, both the primal problem (4.2) and the dual (4.8) are solved. We apply the variant of PDHG from [48], which accounts for the sum of composite terms H_i ∘ L_i. Given an initial guess for (w_0, y_{1,0}, ..., y_{m,0}) and the parameters σ, τ > 0 and ρ ∈ [ε, 2 − ε] for some ε > 0, each iteration k ≥ 0 consists of the following steps:

1. w̃_{k+1} = prox_{τG}(w_k − τ Σ_{i=1}^m L_i* y_{i,k});
2. w_{k+1} = ρ w̃_{k+1} + (1 − ρ) w_k;
3. ∀i = 1, ..., m: ỹ_{i,k+1} = prox_{σH_i*}(y_{i,k} + σ L_i(2w̃_{k+1} − w_k));
4. ∀i = 1, ..., m: y_{i,k+1} = ρ ỹ_{i,k+1} + (1 − ρ) y_{i,k}; (4.9)

where, for a proper, lower semicontinuous, convex function G, prox_{τG} is its proximal operator, defined as

prox_{τG}(z) = argmin_x ½‖x − z‖² + τ G(x).

The iterates (w_k)_{k∈N} and (y_{i,k})_{k∈N} (i = 1, ..., m) are shown to converge if the parameters σ and τ are chosen such that στ‖Σ_{i=1}^m L_i* L_i‖ ≤ 1 ([48, Theorem 5.3]).
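The update structure can be sketched on a toy problem. The following is a minimal PDHG implementation with m = 2, G = 0 and L₁ = L₂ = Id, applied to min_w ½‖w − f‖² + α‖w‖₁, whose minimiser is soft-thresholding of f. This illustrates only the iteration itself, not the light-sheet model:

```python
def prox_h1_conj(y, f, sigma):
    # H1(z) = 0.5*||z - f||^2, so prox of sigma*H1* is (y - sigma*f)/(1 + sigma).
    return [(yj - sigma * fj) / (1.0 + sigma) for yj, fj in zip(y, f)]

def prox_h2_conj(y, alpha):
    # H2 = alpha*||.||_1: H2* is the indicator of [-alpha, alpha]^n,
    # whose prox is the component-wise projection (clamping).
    return [max(-alpha, min(alpha, yj)) for yj in y]

def pdhg(f, alpha, sigma=0.7, tau=0.7, rho=1.0, iters=20000):
    """PDHG (steps 1-4 above) for min_w H1(w) + H2(w), with L1 = L2 = Id, G = 0.

    Here sigma*tau*||L1*L1 + L2*L2|| = 0.98 <= 1, as required for convergence."""
    n = len(f)
    w = [0.0] * n
    y1, y2 = [0.0] * n, [0.0] * n
    for _ in range(iters):
        # Step 1: primal prox step (prox of tau*G is the identity, since G = 0).
        w_bar = [wj - tau * (aj + bj) for wj, aj, bj in zip(w, y1, y2)]
        # Step 3: dual prox steps at the extrapolated point 2*w_bar - w.
        z = [2.0 * wbj - wj for wbj, wj in zip(w_bar, w)]
        y1_bar = prox_h1_conj([aj + sigma * zj for aj, zj in zip(y1, z)], f, sigma)
        y2_bar = prox_h2_conj([aj + sigma * zj for aj, zj in zip(y2, z)], alpha)
        # Steps 2 and 4: relaxation with parameter rho.
        w = [rho * a + (1 - rho) * b for a, b in zip(w_bar, w)]
        y1 = [rho * a + (1 - rho) * b for a, b in zip(y1_bar, y1)]
        y2 = [rho * a + (1 - rho) * b for a, b in zip(y2_bar, y2)]
    return w

f, alpha = [3.0, -1.0, 0.2], 0.5
w = pdhg(f, alpha)
# The minimiser of 0.5*||w - f||^2 + alpha*||w||_1 is soft-thresholding of f.
soft = [max(abs(fj) - alpha, 0.0) * (1.0 if fj > 0 else -1.0) for fj in f]
assert all(abs(wj - sj) < 1e-3 for wj, sj in zip(w, soft))
```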
In step 3 in (4.9), we use Moreau's identity to obtain prox_{σH_i*} from the proximal operator of H_i itself:

prox_{σH_i*}(y) = y − σ prox_{σ⁻¹H_i}(y/σ).

As a stopping criterion, one can use the primal-dual gap, i.e. the difference between the primal objective at the current iterate w and the dual objective at the current dual iterate (y_1, ..., y_m):

PD-gap(w, y_1, ..., y_m) = G(w) + Σ_i H_i(L_i w) + G*(−Σ_i L_i* y_i) + Σ_i H_i*(y_i). (4.12)

Due to strong duality, optimality is reached when the primal-dual gap is zero, so a practical stopping criterion is met when the gap falls below a threshold set in advance.

Lastly, note that the optimisation is performed jointly over both u and v, which introduces a difficulty for the term H_2(L_2 w) in step 3 above, as this requires the proximal operator of the joint Kullback-Leibler divergence D_KL(u, v). Similarly, the computation of the primal-dual gap in (4.12) requires the convex conjugate of the joint Kullback-Leibler divergence. We describe the details of these computations in Sects. 4.2 and 4.3, respectively.
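Moreau's identity is easily verified in a case where both sides are available in closed form, namely H = α|·|: the prox of σH* is the projection onto [−α, α], while the right-hand side involves soft-thresholding. A quick check with illustrative values:

```python
def clamp(y, alpha):
    # prox of sigma*H* for H = alpha*|.|: H* is the indicator of [-alpha, alpha],
    # whose prox is the projection onto that interval (independent of sigma).
    return max(-alpha, min(alpha, y))

def soft(z, t):
    # prox of t*|.|: soft-thresholding with threshold t.
    return max(abs(z) - t, 0.0) * (1.0 if z > 0 else -1.0 if z < 0 else 0.0)

def prox_conj_via_moreau(y, alpha, sigma):
    # Moreau's identity: prox_{sigma*H*}(y) = y - sigma * prox_{H/sigma}(y/sigma).
    return y - sigma * soft(y / sigma, alpha / sigma)

# Both evaluations agree for a range of points and step sizes.
for y in [-3.0, -0.4, 0.0, 0.2, 2.5]:
    for sigma in [0.5, 1.0, 2.0]:
        assert abs(prox_conj_via_moreau(y, 0.5, sigma) - clamp(y, 0.5)) < 1e-12
```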

Computing the Proximal Operator of the Joint Kullback-Leibler Divergence
When writing the optimisation problem in the form (4.2), it is common that the functions G and H_i (i = 1, ..., m) are "simple", meaning that their proximal operators have a closed-form solution or can be easily computed with high precision. This is certainly true for G and H_1, but not obvious for the joint Kullback-Leibler divergence. First, for discrete images u = [u_1, ..., u_N]ᵀ and v = [v_1, ..., v_N]ᵀ, the definition (3.6) becomes the sum (4.13), so the proximal operator decouples into N scalar problems (4.14), where we define the per-pixel function φ as in (4.15). To find the minimiser of φ(u_j, v_j), we set its gradient equal to zero (4.16). In the second equation, we write u_j as a function of v_j, which we substitute into the first equation to obtain a scalar equation in v_j alone (4.17). This equation is then solved using Newton's method, with the iteration given in (4.18).
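The substitution-plus-Newton strategy can be sketched as follows, assuming the per-pixel integrand φ(u, v) = v log(v/u) − v + u (a plausible form consistent with the discrete divergence; the paper's exact expressions (4.14)-(4.18) may differ in scaling):

```python
import math

def prox_joint_kl(a, b, sigma, iters=50):
    """Pixel-wise prox of sigma * (v*log(v/u) - v + u) at the point (a, b).

    Setting the gradient of 0.5*(u-a)^2 + 0.5*(v-b)^2 + sigma*(v*log(v/u)-v+u)
    to zero gives
        (u - a) + sigma * (1 - v/u) = 0,
        (v - b) + sigma * log(v/u)  = 0.
    The second equation yields u = v * exp((v - b)/sigma); substituting into
    the first gives a scalar equation g(v) = 0, solved by Newton's method."""
    def g(v):
        e = math.exp((v - b) / sigma)
        return v * e - a + sigma * (1.0 - 1.0 / e)

    def dg(v):
        e = math.exp((v - b) / sigma)
        return e * (1.0 + v / sigma) + 1.0 / e

    v = max(b, 1e-3)          # positive initial guess
    for _ in range(iters):
        v = v - g(v) / dg(v)  # g is increasing, so the root is unique
    u = v * math.exp((v - b) / sigma)
    return u, v

u, v = prox_joint_kl(a=2.0, b=1.0, sigma=0.5)
# Both first-order optimality conditions hold at the computed point.
assert abs((u - 2.0) + 0.5 * (1.0 - v / u)) < 1e-8
assert abs((v - 1.0) + 0.5 * math.log(v / u)) < 1e-8
```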

Computing the Convex Conjugate of the Joint Kullback-Leibler Divergence
We compute the convex conjugate of the discrete joint Kullback-Leibler divergence D_KL(v, u) in (4.13) for u, v ∈ [l_1, l_2]^N. The computation (4.19) again decouples into N scalar problems involving a function φ defined in (4.20). To solve the optimisation problem on the last line of (4.19), we write the KKT conditions (4.21), where, to simplify notation, we use u, v instead of u_j, v_j, and where the functions g_i correspond to the bound constraints u, v ∈ [l_1, l_2]. Noting that the stationarity condition (4.21a) can be rewritten in a more convenient form, we solve the last two equations by using the complementarity conditions (4.21d), distinguishing the cases in which the Lagrange multipliers μ_i are zero or nonzero.
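A case-based KKT implementation of this conjugate can be sanity-checked by brute force: for fixed (p, q), a grid search over [l_1, l_2]² gives a lower bound on the supremum, and the Fenchel-Young inequality must hold against it. A sketch, using the same assumed per-pixel integrand φ as above (all values illustrative):

```python
import math

def phi(u, v):
    # Assumed per-pixel joint KL integrand (same form as in the prox sketch).
    return v * math.log(v / u) - v + u

def conjugate_grid(p, q, l1, l2, n=200):
    """Lower bound on phi*(p, q) = sup_{(u,v) in [l1,l2]^2} p*u + q*v - phi(u, v),
    by exhaustive search on an n x n grid (a check for a KKT-based solver)."""
    step = (l2 - l1) / (n - 1)
    grid = [l1 + k * step for k in range(n)]
    return max(p * u + q * v - phi(u, v) for u in grid for v in grid)

l1, l2 = 0.1, 5.0
val = conjugate_grid(0.3, -0.2, l1, l2)
# Fenchel-Young: p*u + q*v <= phi*(p, q) + phi(u, v) for feasible (u, v);
# spot-check it against the grid approximation at a point inside the box.
u0, v0 = 1.0, 2.0  # hypothetical test point
assert 0.3 * u0 + (-0.2) * v0 <= val + phi(u0, v0) + 1e-6
```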

Numerical Results
In this section, we describe a number of numerical experiments that illustrate the performance of our deconvolution method. We start with four examples of simulated data, where we can quantitatively compare the reconstructed image with the known ground truth. Then, we show how our method performs on microscopy data, where we reconstruct an image of spherical beads and a sample of a Marchantia thallus. In the experiments with microscopy data, we compare our method with two standard approaches to shift-invariant deconvolution: one where the convolution kernel is the detection PSF, and one where the convolution kernel is the point-wise multiplication of the detection PSF with the light-sheet.

Simulated Data
We consider four images of size 128×125×64: a 5×5×5 grid of beads where the effect of the light-sheet in the z coordinate and the shape of the objective PSF are noticeable, a piecewise constant image of "steps" where the Poisson noise affects each step differently based on intensity, and an image that replicates realistic biological samples of tissue. These are shown in the top row of Fig. 8.
To obtain the measured data, we proceed as follows. Given the ground truth image u_0, the forward operator described in Sect. 2.1 is applied to obtain the blurred image Lu_0, which is then corrupted by Poisson noise and additive Gaussian noise. The parameters for the forward model are taken to be those of the microscope used in the experimental setup and are given in Table 2. The resulting simulated measured data are shown in the bottom row of Fig. 8.
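The corruption step can be sketched as follows: Poisson noise applied to the blurred intensities, followed by additive Gaussian read-out noise. The sampler below is Knuth's method, adequate for moderate intensities; the intensity values and σ are illustrative, not those of Table 2:

```python
import math
import random

def poisson_sample(rng, lam):
    # Knuth's method: count uniforms until their product drops below exp(-lam).
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < limit:
            return k
        k += 1

def corrupt(blurred, sigma_g, rng):
    """Mixed Poisson-Gaussian corruption of a (flattened) blurred image Lu0:
    photon counting first, then additive Gaussian read-out noise."""
    return [poisson_sample(rng, max(x, 0.0)) + rng.gauss(0.0, sigma_g)
            for x in blurred]

rng = random.Random(0)
blurred = [50.0, 120.0, 5.0, 0.0]   # hypothetical blurred intensities Lu0
f = corrupt(blurred, sigma_g=2.0, rng=rng)
assert len(f) == len(blurred)
```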
We compare the reconstruction obtained using the proposed approach, which we will refer to as LS-IC (light-sheet infimal convolution), with the reconstructions obtained by using an L² data fidelity term instead of the infimal convolution term, and/or by using a convolution operator corresponding to the objective PSF instead of the light-sheet forward model from Sect. 2.1. Specifically, we compare the solution of (4.1) with the solutions of the corresponding problems LS-L2, PSF-IC and PSF-L2, all solved using PDHG as described in Sect. 4, where H is the convolution operator with the detection objective PSF h_z as given in (2.14). For each test image and each method above, the PDHG parameters ρ and σ used are given in Table 3, and τ is set to τ = 1/(σ‖Σ_{i=1}^m L_i* L_i‖) to ensure convergence according to Theorem 5.3 in [48]. As a stopping criterion, we used the primal-dual gap (4.12), normalised by the number of pixels N and the dynamic range of the measured image f, with a threshold of 10⁻⁶ and a maximum of 10,000 iterations.
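The operator norm ‖Σ_i L_i* L_i‖ needed to set τ is typically estimated by power iteration. The sketch below does this for a hypothetical pair L₁ = Id, L₂ = 1-D forward difference (not the light-sheet operators); for n = 5 the exact largest eigenvalue is 3 + 2cos(π/5) ≈ 4.618:

```python
def apply_sum_LtL(x):
    # (L1*L1 + L2*L2) x with L1 = Id and L2 the 1-D forward-difference operator,
    # i.e. x + D^T D x where (Dx)_j = x_{j+1} - x_j.
    n = len(x)
    d = [x[j + 1] - x[j] for j in range(n - 1)]          # D x
    dtd = [0.0] * n                                      # D^T (D x)
    for j in range(n - 1):
        dtd[j] -= d[j]
        dtd[j + 1] += d[j]
    return [x[j] + dtd[j] for j in range(n)]

def operator_norm(n, iters=500):
    """Power iteration for the largest eigenvalue of L1*L1 + L2*L2."""
    x = [1.0] + [0.0] * (n - 1)   # start vector not orthogonal to the top eigenvector
    lam = 1.0
    for _ in range(iters):
        y = apply_sum_LtL(x)
        lam = max(abs(yj) for yj in y)
        x = [yj / lam for yj in y]
    return lam

norm = operator_norm(5)
sigma = 0.7
tau = 1.0 / (sigma * norm)   # step sizes satisfying sigma*tau*||sum L_i* L_i|| = 1
assert 4.61 < norm < 4.63
```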
The results of the four methods applied to the test images are given in Fig. 9 and quantitative results are given in Table 4. For each test image and each method, the regularisation parameter has been chosen to optimise the normalised l2 error and the structural similarity index (SSIM), respectively. We note that PSF-L2 and PSF-IC perform particularly poorly, highlighting the importance of an accurate representation of the image formation model instead of simply using the detection objective PSF as the forward operator. Comparing LS-IC and LS-L2, we see better results when using the infimal convolution data fidelity for the beads and the steps images, both visually and quantitatively. The deblurring is better on the beads image, while on the steps image we see a stronger denoising effect, especially along the edges in the image. For the tissue image, both fidelities give comparable results but, as we see in Fig. 10, when the ground truth is not known, choosing α using the discrepancy principle gives a better result for the infimal convolution model.
The reconstructions shown in Fig. 10 are obtained by applying the discrepancy principle corresponding to each method. For LS-IC, we choose a value of α that satisfies a variation of the discrepancy principle given in (3.11), where we enforce that each single noise fidelity is bounded by its respective noise bound, rather than bounding the sum of the fidelities by the sum of the noise bounds, as stated in (3.11). While both versions give good results, we found the former to give more accurate reconstructions. Here, the bound on the Poisson noise is set to 1/2, motivated by the following lemma from [49], which estimates the expected value of the Kullback-Leibler divergence:

Lemma 5.1 Let Y_β be a Poisson random variable with expected value β and consider the function F(y) = y log(y/β) − y + β. Then, for large β, the expected value of F(Y_β) satisfies E[F(Y_β)] = 1/2 + O(1/β).

One last observation worth making about the results in Figs. 9 and 10 concerns the square shape of the reconstructed beads (the first column of both figures). Looking carefully at the ground truth bead image in Fig. 8, one can see that the beads are almost square to begin with, due to the small dimensions of the image. The finer details that make them appear round are lost in the blurring process, which, in combination with the total variation regulariser used in the deconvolution algorithm, leads to this detail not being present in the reconstruction, thus making the beads square. Moreover, the sharpening of their edges is an expected effect of the total variation regularisation, which could be avoided by using a different regularisation technique; this, however, is beyond the scope of this article.
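The constant 1/2 in the lemma can be illustrated by Monte Carlo simulation, assuming F(y) = y log(y/β) − y + β with the convention 0 log 0 = 0 (sampler, β and seed are illustrative):

```python
import math
import random

def poisson_sample(rng, lam):
    # Knuth's method, adequate for moderate lam.
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < limit:
            return k
        k += 1

def F(y, beta):
    # Per-pixel Kullback-Leibler integrand, with the convention 0*log(0) = 0.
    return (y * math.log(y / beta) if y > 0 else 0.0) - y + beta

rng = random.Random(42)
beta, n = 50.0, 20000
mean = sum(F(poisson_sample(rng, beta), beta) for _ in range(n)) / n
# Lemma 5.1: E[F(Y_beta)] = 1/2 + O(1/beta), so the empirical mean is near 1/2.
assert abs(mean - 0.5) < 0.05
```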
The experiments were run using Matlab version R2020b Update 2 (9.9.0.1524771) 64-bit in Scientific Linux 7.9 on a machine with Intel Xeon E5-2680 v4 2.40 GHz CPU, 256 GB memory and Nvidia P100 16 GB GPU. The running times, averaged over 5 runs for each method and each image, are given in Table 5.

Light-Sheet Data
In this section, we show the results of applying LS-IC to a cropped portion of the full resolution images in Fig. 2. Specifically, we select a cropped beads image of 1127×111×100 voxels and a cropped Marchantia image of 1127×156×100 voxels.
For comparison, we also run PSF-L2 on the same images. In addition, we run an alternative light-sheet deconvolution method, where we perform shift-invariant deconvolution using a PSF h̃ obtained by point-wise multiplication of the detection PSF h_z in (2.14) and the light-sheet l, effectively clipping h_z by the width of the light-sheet. The resulting problem, which we denote by PSF-L2-clip, is the analogue of PSF-L2 with H replaced by H̃, the convolution operator with the PSF h̃ = h_z · l. A justification of this method is given by a simplified image formation model in which the light-sheet is assumed to have constant width (in the z direction) and constant intensity throughout the full sample, or in a region of interest where the deconvolution is performed, as is done for example in [21].

We run each method on both images for up to 6000 iterations, with a normalised primal-dual gap of 10⁻⁶ as a stopping criterion. The parameters for the image formation model are the same as in Table 2 and the PDHG parameters are given in Table 6. The results of the deconvolution are shown in Figs. 11 and 13 for the beads image and the Marchantia image, respectively. In both figures, the first row shows the position of the light-sheet (due to the cropping, this is no longer centred), the second row shows the measured data, and the third, fourth and fifth rows show the PSF-L2, PSF-L2-clip and LS-IC reconstructions, respectively. The regularisation parameter α was chosen visually in all cases, such that a balance is achieved between the amount of regularisation and the residual noise in the reconstruction.
In the beads image in Fig. 11, we note that LS-IC performs better than PSF-L2 and PSF-L2-clip at reversing the effect of the light-sheet. This is most obvious in the zy plane on the right-hand side of the image, where the length of the beads in the z direction has been reduced to a greater extent than in the PSF-L2 and PSF-L2-clip reconstructions. In addition, the beads appear less blurry in the LS-IC reconstruction on the right-hand side of the xy plane. We also note that PSF-L2-clip fails to properly reverse the effects of the optical aberrations in the beads. This is not unexpected, as the information related to the aberrations is lost when the detection PSF is clipped by setting its upper and lower extremities to zero. The extent to which this happens depends on the width of the light-sheet: as the light-sheet becomes wider, the overall PSF approaches the detection PSF, in which case the deconvolved image will be the same as the PSF-L2 reconstruction. We show the bead images in 3D in Fig. 12, where the effect of the deconvolution in the z direction is more significant in the LS-IC reconstruction than in both the PSF-L2 and PSF-L2-clip reconstructions, namely the beads are shorter in the z direction.
In the Marchantia reconstruction in Fig. 13, we see a similar effect of better sharpening in the z direction, most easily seen in the right-hand side and bottom projections of each panel (maximum intensity projections in the zy and xz planes, respectively). In particular, we see additional artefacts in the PSF-L2-clip reconstruction: horizontal lines (parallel to the xy plane), likely due to the clipping of the detection PSF. Moreover, the 3D rendering of the Marchantia sample in Fig. 14 shows smoother cell edges in the LS-IC reconstruction compared to the other methods. Specifically, the PSF-L2 reconstruction contains reconstruction artefacts that are absent from the LS-IC reconstruction (indicated by the yellow arrows), while the PSF-L2-clip reconstruction contains areas where the blur has not been fully removed (for example at the same locations indicated by the yellow arrows) and where the edges are not as sharp as in the LS-IC reconstruction.
Lastly, we reiterate that the strength of our proposed method lies in the physically accurate modelling of the interaction between the detection PSF and the light-sheet. This allows one to model the optical aberrations as part of the detection PSF (with no restrictions on how this is done), as well as the spatial dependence of the width and the intensity of the light-sheet, and to combine them in an image formation model that does not require approximation by a light-sheet of constant width and intensity. As the numerical experiments in this section show, such an approximation, while faster and computationally cheaper, leads to loss of information and to results that are at best locally accurate.

Conclusion
In this paper, we introduced a novel method for performing deconvolution for light-sheet microscopy. We start by modelling the image formation process in a way that replicates the physics of a light-sheet microscope, which is achieved by explicitly modelling the interaction of the illumination light-sheet and the detection objective PSF. Moreover, the optical aberrations in the system are modelled using a linear combination of Zernike polynomials in the pupil function of the detection PSF, fitted to bead data using a least squares procedure. We then formulate a variational model taking into account the image formation model as the forward operator and a combination of Poisson and Gaussian noise in the data. The model combines a total variation regularisation term with a fidelity term that is an infimal convolution of an L² fidelity, accounting for the Gaussian noise, and a Kullback-Leibler divergence, accounting for the Poisson noise. In addition, we establish convergence rates with respect to the noise and we introduce a discrepancy principle for selecting the regularisation parameter α in the mixed noise setting. We solve the resulting inverse problem by applying the PDHG algorithm in a non-trivial way.
The results in the numerical experiments section show that our method, LS-IC, outperforms simpler approaches to deconvolution of light-sheet microscopy data, in which one does not take into account the variability of the overall PSF introduced by the light-sheet excitation, or the combination of Gaussian and Poisson noise. In particular, numerical experiments with simulated data show superior reconstruction quality in terms of the normalised l2 error and the structural similarity index, not only when optimising over the regularisation parameter α given the ground truth, but also with an a posteriori choice of α using the stated discrepancy principle. On bead data, the reconstruction obtained using LS-IC shows a more significant reduction of the blur in the z direction compared to PSF-L2, where the light-sheet variations and the Poisson noise are not taken into account. Moreover, reconstruction of a Marchantia sample with LS-IC shows fewer artefacts than the PSF-L2 reconstruction, as well as sharper cell edges and smoother cell membranes.
Future work includes applying this technique to a broader range of samples and using it to answer questions of biological interest. To do so, we see a number of potential future directions that this work can take:

1. Adapting the discrepancy principle given in (3.11) for choosing the regularisation parameter α to real data sets, like the ones in Sect. 5.2.
2. Improving the running time of the method, potentially by means of randomised approaches.
3. Investigating other regularisation terms.