1 Introduction

Light field (LF) refers to the concept of capturing a comprehensive description of light rays. Although a complete parameterization of LF would require the 7D plenoptic function [1], practical applications have successfully made use of its simplified 4D version [2]. Of the four dimensions, two index the perspective, and the other two encode the spatial information, see Fig. 1. The rich-content property of LF brings a great advantage to numerous applications such as autonomous systems [3], virtual reality [4], and 3D television [5]. However, this benefit also comes at a cost in computational resources. Processing 4D LF images requires more memory bandwidth, computing power, and runtime than conventional 2D images. This problem encourages the use of the Graphics Processing Unit (GPU) for offloading LF image processing tasks. There are three main techniques to capture 4D LF data: time-sequential [6], multi-sensor [7], and multiplexing [1]. These acquisition methods trade off spatial resolution against angular or temporal resolution, i.e., using a low-resolution imaging sensor to reduce cost while increasing the number of cameras for a higher angular resolution [7]; moving a camera through more spatial steps to capture more perspective images at the expense of a long acquisition time [6]; or increasing the number of microlenses for a higher spatial resolution while reducing the angular resolution [1]. These existing challenges in high-resolution LF acquisition are driving recent research on LF super-resolution (LFSR) [8].

Fig. 1 Light-field representation and acquisition; a two-plane parameterisation; b 2D array representation of sub-aperture images (SAIs)

LF image super-resolution aims to reconstruct a high-resolution (HR) view, also referred to as a sub-aperture image (SAI), from a 2D array of low-resolution (LR) views, see Fig. 1(b). Many approaches have been proposed for LFSR, including convolutional neural network (CNN) based approaches [9,10,11] and optimization-based approaches [12,13,14]. Although providing high-quality SR results, these approaches typically comprise multiple processing stages and complex algorithms, leading to high computational demand and long processing times. For example, multi-stage CNN-based approaches [9, 11] divide LFSR into two steps. The first step employs very deep and large CNNs [15, 16] for separately up-scaling LR SAIs. Another refinement CNN is trained and applied in the second step to enhance the quality of the HR SAI. Other examples are [13], which involves time-consuming graph processing tasks, and [14], which involves a computationally demanding 5D filtering operator. As far as we know, the literature on GPU-accelerated LFSR is very limited despite its importance. While focusing on the quality aspect, previous approaches put aside the run-time constraint and leave the possibility of accelerating SR tasks undiscussed.

This paper presents a GPU-accelerated approach for 4D light-field image super-resolution. First, we propose a computational framework for reconstructing high-resolution sub-aperture images from 4D LF data, Sect. 3. The LF super-resolution model, derived from a statistical perspective, consists of a joint \(\ell ^1\)-\(\ell ^2\) data fidelity term and a weighted nonlocal total variation regularization term. While the first term provides a proper treatment of mixed Gaussian-impulse noise conditions, the second term introduces an effective way to integrate image features for a better regularization effect. A weighting scheme combining a bilateral effect with edge and occlusion features is also proposed. Second, we show that the proposed optimization problem can be effectively solved with the alternating direction method of multipliers (ADMM), Sect. 4. ADMM resolves the main problem of steepest gradient descent, namely finding a proper step size, while avoiding costly line-search operations. Third, a GPU-accelerated architecture is presented for speeding up the iterative solver, Sect. 5. By realizing the transformation matrices as linear functions, which are executed in the form of GPU kernels, the proposed approach alleviates the resource shortage of a sparse matrix implementation. As shown in the experimental results, the proposed approach can super-resolve large images (i.e., up to 5760\({\times }\)5760) on a single GPU, as compared to the 4 GPUs used in the related work [17]. In Sect. 7, an extensive experiment is conducted on synthetic 4D LF datasets [18, 19] and the natural image dataset DIV8K [20] to validate the robustness of the proposed SR model and evaluate the performance of the accelerated computational framework. Through the OpenCL framework, the accelerated solver can be deployed on various GPU platforms, achieving a speed-up of 77\({\times }\) as compared to CPU execution. The contributions of this work can be summarized as follows:

  • An optimization-based approach for spatial SR of LF images under mixed Gaussian-impulse noise conditions, combining a joint \(\ell ^1-\ell ^2\) data term with a weighted nonlocal TV regularization term.

  • Application of ADMM to solve the proposed optimization problem. As shown in Sect. 4, by properly rewriting the optimization problem into the ADMM form, the solving process becomes simpler and more suitable for parallel implementation on the GPU platform.

  • OpenCL-based acceleration of the iterative solving process. As discussed in Sects. 5 and 7, our accelerator not only provides a significant speed-up as compared to CPU execution but also overcomes the limitation of previous work in handling large-scale SR problems on the GPU platform.

2 Related works

This section discusses previous works on the super-resolution of 4D LF images, which are divided into two categories: optimization-based approaches and learning-based approaches. Of the two, learning-based approaches present state-of-the-art performance.

2.1 Optimization-based methods

Optimization-based methods generally formulate LFSR as an optimization problem, including a data fidelity term built upon a degradation model and a regularization term based on an assumed prior. Regarding the data term, previous works proposed either penalizing the coherence between LR and HR sub-aperture images [12, 14] or enforcing the intensity similarity over the angular dimension by warping sub-aperture images [13, 21]. Regarding the regularization term, the choice is more diverse. Many image priors have been proposed to achieve better output quality at a reasonable computational cost, e.g., Markov random field (MRF) [12], bilateral TV (BTV) [21], graph-based [13], and sparsity [14].

In [12], Bishop et al. formulated the LF imaging process by a set of spatially variant point spread functions (PSFs). Under Gaussian optics assumptions, these PSFs are derived and applied in a Bayesian SR framework. In [21], LFSR was studied in the context of a multi-image super-resolution problem which considers the degradation process as a combination of three operators: warping, blurring, and down-scaling. The authors employed a variational framework [22] to estimate the disparity maps used for the warping functions, while BTV was selected for regularization. In [13], Rossi et al. assembled an optimization problem with a graph-based regularizer and two \(\ell ^2\) data terms. They employed block matching for estimating disparity values, which were used to build the graph map. A patch-based SR approach was proposed in [14]. The authors made use of a 5D transform filter consisting of a 2D shape-adaptive DCT, a 2D DCT transform, and a 1D Haar wavelet. By a proper selection of 5D patches, a high degree of sparsity was expected in the transformed signal. This sparsity property was employed for regularization in combination with an \(\ell ^2\) data term.

2.2 Deep learning-based approaches

Deep learning-based methods for LFSR are mainly categorized into two groups. While the first group directly exploits the multi-dimensional structure of LF in learning an end-to-end neural network to synthesize high-resolution views [10, 23], the second group employs a multi-stage processing model for a step-by-step improvement of the reconstruction quality [9, 11]. A 4D convolution method was proposed in [23] to fully exploit the 4D structure of LF images. The 4D convolution was realized as an angular-spatial separable convolution allowing the acquisition of feature maps from both angular and spatial domains. In [10], a residual CNN-based approach was proposed for the super-resolution of LF images. Their network was provided with stacked images from four different angles and predicted an HR image at the central perspective. Due to the diversity in directional position, six CNNs were needed for completely reconstructing the high-resolution LF. Compared to learning a single SR network, the two-stage model provides more flexibility and potentially higher reconstruction quality. This type of approach takes advantage of well-trained single image super-resolution (SISR) networks [15, 16] to separately reconstruct an HR view of each SAI in the first stage. These HR images are then enhanced in the second stage through a novel CNN which makes use of inter-perspective information across multiple SAIs. In [24], Fan et al. used VDSR [15] in the first stage and applied a patch-based warping strategy to register the pre-scaled images. The registered images were combined with a reference image before being fed to the second-stage CNN for rendering the final HR view. Yuan et al. [9] employed EDSR [16] as the SISR network and proposed a refinement CNN which relies on 2D epipolar images (EPIs) for the second stage. Recently, Tran et al. [11] proposed an approach that exploits the 3D EPI structure of LF in a two-stage SR framework. Their method aimed at various LFSR problems, i.e., spatial, angular, and angular-spatial super-resolution. As compared to the 2D EPI, which is limited to one spatial dimension, the 3D EPI, which assembles two spatial dimensions along with one angular dimension, provides a significant contribution to enhancing reconstruction quality. Departing from the usual strategy of employing a CNN to directly enhance SR reconstruction quality, Guo et al. [25] proposed to learn a coded aperture from LF data and used it as an implicit LF image prior in a deep learning-based framework for de-noising and reconstructing HR LF. Their approach, however, does not consider impulse noise and treats de-noising and HR reconstruction as separate problems.

2.3 GPU accelerated LF processing

The high demand for computational resources due to the large amount of data provided with 4D LF images encourages the use of the GPU as an acceleration platform. Recent works on GPU-based acceleration focus on two main LF processing tasks: disparity estimation [26, 27] and super-resolution [21]. For disparity estimation, a GPU acceleration architecture was presented in [26] for cost-volume-based optimization. The authors employed an advanced matching cost from [28] but chose the winner-take-all solution over the global minimum, sacrificing accuracy for lower complexity and computation. In contrast, GVLD [27] proposed a GPU-accelerated approach based on a variational computation framework. The framework combines the intrinsic sub-pixel precision of the variational formulation and the effectiveness of weighted median filtering to produce a highly accurate solution. A fully parallelized and optimized OpenCL implementation was provided for finding the global minimum solution.

For super-resolution, Tran et al. [21] proposed to accelerate an optimization problem which assembles an \(\ell ^1\) data fidelity term and a BTV [29] regularization term. Using steepest descent as an iterative solver, which is fully realized with OpenCL kernel execution, the approach provides a significant speed-up as compared to the implementation running on a CPU. This paper extends our previous work [21] mainly as follows. First, we revisit the super-resolution model from the statistical perspective and propose a mixed noise (Gaussian and impulse noise) model based on a combination of \(\ell ^1\) and \(\ell ^2\) fidelity terms. Secondly, we propose a nonlocal total variation weighting scheme that combines bilateral filtering with image features to improve the regularization effect. Thirdly, the alternating direction method of multipliers (ADMM) is employed in this work for solving the optimization problem as a replacement for steepest descent. ADMM addresses the shortcoming of steepest descent in finding an appropriate step size while avoiding time-consuming line search. Lastly, we present an accelerated architecture for realizing the computational framework on the GPU platform. The proposed approach is validated and evaluated through extensive experiments on a synthetic 4D LF dataset and a high-resolution natural image dataset.

3 Proposed approach

This section discusses our proposed approach for reconstructing high-resolution LF images under mixed noise conditions. The section starts with a presentation of the degradation model and notation, which form the basis for discussing the proposed optimization model derived from the Bayesian image reconstruction framework. Our selections of the data fidelity term and regularization term are then discussed in turn at the end of this section.

3.1 Degradation model and notation

Light-field is a 4D parameterization of the plenoptic function [2], which can be illustrated as a light ray intersecting two parallel planes,

$$\begin{aligned} \varvec{L}:\varOmega {\times }\varPi \rightarrow \mathbb {R},\qquad (\mathfrak {\varvec{z}},\varvec{\theta }) \rightarrow \varvec{L}(\mathfrak {\varvec{z}},\varvec{\theta }), \end{aligned}$$
(1)

with \(\varvec{\theta }= [\rho ,\tau ]^T\) and \(\mathfrak {\varvec{z}}=[x,y]^T\) indicating the coordinates in the directional plane \(\varPi \subset \mathbb {R}^2\) and the spatial plane \(\varOmega \subset \mathbb {R}^2\), see Fig. 1(a). By fixing the directional coordinate \(\varvec{\theta }\) and letting the spatial coordinate \(\mathfrak {\varvec{z}}\) vary, we obtain the spatial information from one perspective. Such spatial information is referred to as a sub-aperture image (SAI) or a perspective image. Fig. 1(b) shows the \(5{\times }5\) angular views of the LF scene ‘table’ [18]. From this perspective, a 4D LF is a collection of 2D images captured from different viewpoints, and the reconstruction of high-resolution SAIs shows a strong connection to the multi-image super-resolution (MISR) problem.

Fig. 2 Degradation process

Let us rearrange the 2D angular view of LR SAIs into a 1D set of \(s_k\) LR observations \(Y_k \in \mathbb {R}^{s_y{\times }s_x}\), \(k \in [1,s_k]\). Our goal is to approximate the HR version \(X \in \mathbb {R}^{s_Y\times s_X}\), where \(s_y\times s_x\) and \(s_Y\times s_X\) are the size of the LR images and the size of the HR image, respectively. In practice, an LR image \(Y_k\) is considered a degraded version of the HR image X. This degradation can be modelled by the application of three linear operators: warping (\(\mathcal {W}_k\)), blurring (\(\mathcal {B}\)), and down-sampling (\(\mathcal {D}\)), as depicted in Fig. 2. The warping operator represents the positioning of the camera. Shifting the camera’s position will result in corresponding shifts of pixels in the captured image. We define the warping operator as \(\mathcal {W}_k: \mathbb {R}^{s_Y\times s_X} \rightarrow \mathbb {R}^{s_Y\times s_X}\), which transforms an HR image into a new one observed from a different perspective. The blurring operator represents the point spread function (PSF), which describes the response of an imaging system. Depending on the setup of lenses and imaging sensors, PSFs can be very complicated and even spatially variant. However, as shown in the literature [29, 30], it is sufficient to assume a spatially invariant PSF, which can be modelled by a linear operator, i.e. \(\mathcal {B}: \mathbb {R}^{s_Y\times s_X} \rightarrow \mathbb {R}^{s_Y\times s_X}\). The down-sampling operator represents the digital sampling process of an imaging sensor, i.e., \(\mathcal {D}: \mathbb {R}^{s_Y\times s_X} \rightarrow \mathbb {R}^{s_y\times s_x}\). As a combination of these linear operators, the image formation process can be described as

$$\begin{aligned} Y_k = \mathcal {D}\circ \mathcal {B}\circ \mathcal {W}_k (X) + \epsilon _k, \forall k \in [1,s_k], \end{aligned}$$
(2)

where \(\epsilon _k\) represents the measurement error or additive noise, which is commonly assumed to follow a Gaussian or Laplace distribution. For a more compact presentation, we transform Eq. 2 into vector form,

$$\begin{aligned} \mathfrak {\varvec{y}}_{k} = DBW_k\mathfrak {\varvec{x}}+ \varvec{\epsilon }_k \end{aligned}$$
(3)

where \(\mathfrak {\varvec{y}}_k, \varvec{\epsilon }_k \in \mathbb {R}^{s_xs_y}\) and \(\mathfrak {\varvec{x}}\in \mathbb {R}^{s_X s_Y}\) are the column-vector representations of \(Y_k, \epsilon _k\) and X. Linear transformation matrices D, B, and \(W_k\) respectively replace the linear operators \(\mathcal {D}\), \(\mathcal {B}\), and \(\mathcal {W}_k\). To further simplify the notation, we define \(p=s_Xs_Y\), \(q=s_xs_y\), and combine D, B and \(W_k\) into \(A_k\), i.e., \(A_k=DBW_k\). It follows that \(B,W_k \in \mathbb {R}^{p\times p}\), \(D \in \mathbb {R}^{q\times p}\), and \(A_k \in \mathbb {R}^{q\times p}\).
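To make the notation concrete, the following minimal NumPy sketch simulates the degradation of Eq. 2 for one view. The function names (warp, blur, downsample, degrade) and the bilinear backward resampling inside warp are illustrative stand-ins for the operators \(\mathcal {W}_k\), \(\mathcal {B}\), and \(\mathcal {D}\) detailed in Sect. 5, not our GPU kernels.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def warp(x, disparity, view_offset):
    # W_k: resample the HR image x along a disparity map scaled by the
    # angular offset of view k (bilinear backward resampling as a stand-in).
    h, w = x.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    dy, dx = view_offset  # (tau_k, rho_k) relative to the reference view
    coords = np.stack([yy + dy * disparity, xx + dx * disparity])
    return map_coordinates(x, coords, order=1, mode='nearest')

def blur(x, zeta):
    # B: space-invariant Gaussian PSF with sigma = sqrt(zeta^2 - 1)/4,
    # the choice used later in Sect. 5.
    return gaussian_filter(x, sigma=np.sqrt(zeta**2 - 1) / 4)

def downsample(x, zeta):
    # D: keep one pixel per zeta x zeta block (top-left sample).
    return x[::zeta, ::zeta]

def degrade(x, disparity, view_offset, zeta, sigma_n, rng):
    # Eq. 2: y_k = D(B(W_k(x))) + eps_k with additive Gaussian noise.
    y = downsample(blur(warp(x, disparity, view_offset), zeta), zeta)
    return y + rng.normal(0.0, sigma_n, size=y.shape)
```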

3.2 Bayesian image reconstruction framework

Let us start with the standard Bayesian formulation, which poses the SR problem as a maximum a posteriori (MAP) estimation of the HR image \(\mathfrak {\varvec{x}}\) given a set of LR samples \(\{\mathfrak {\varvec{y}}_k\ |\ k=1,..,s_k\}\):

$$\begin{aligned} \tilde{\mathfrak {\varvec{x}}} = \mathop {{\mathrm{arg\,max}}}\limits _{\mathfrak {\varvec{x}}} \mathcal {P}(\mathfrak {\varvec{x}}|\mathfrak {\varvec{y}}_1,...,\mathfrak {\varvec{y}}_{s_k}), \end{aligned}$$
(4)

where \(\mathcal {P}(\mathfrak {\varvec{x}}|\mathfrak {\varvec{y}}_1,...,\mathfrak {\varvec{y}}_{s_k})\) is called the posterior and represents the conditional probability density of \(\mathfrak {\varvec{x}}\) given the set of degraded images (\(\mathfrak {\varvec{y}}_k\)). Following Bayes’ rule, we have

$$\begin{aligned} \mathcal {P}(\mathfrak {\varvec{x}}|\mathfrak {\varvec{y}}_1,...,\mathfrak {\varvec{y}}_{s_k}) = \frac{\mathcal {P}(\mathfrak {\varvec{x}})\prod \nolimits _{k=1}^{s_k} \mathcal {P}(\mathfrak {\varvec{y}}_k|\mathfrak {\varvec{x}})}{\prod \nolimits _{k=1}^{s_k}\mathcal {P}(\mathfrak {\varvec{y}}_k)}, \end{aligned}$$
(5)

where \(\mathcal {P}(\mathfrak {\varvec{y}}_k|\mathfrak {\varvec{x}})\) is the likelihood function, which encodes the probability of observing the LR image \(\mathfrak {\varvec{y}}_k\) given the HR image \(\mathfrak {\varvec{x}}\). This function is defined based on the assumed noise model of \(\varvec{\epsilon }_k\). Here, we assume that the noise affecting each observed LR image \(\mathfrak {\varvec{y}}_k\) is independent. \(\mathcal {P}(\mathfrak {\varvec{x}})\) is an image prior describing the properties of the high-resolution image being reconstructed. Since the low-resolution samples are known, \(\mathcal {P}(\mathfrak {\varvec{y}}_k), k=1,...,s_k\), are constants, and the above MAP problem can be transformed into a minimization of the negative log-likelihood

$$\begin{aligned} \mathop {{\mathrm{arg\,max}}}\limits _{\mathfrak {\varvec{x}}} \mathcal {P}(\mathfrak {\varvec{x}}|\mathfrak {\varvec{y}}_1,...,\mathfrak {\varvec{y}}_{s_k}) = \mathop {{\mathrm{arg\,min}}}\limits _{\mathfrak {\varvec{x}}} - ln \mathcal {P}(\mathfrak {\varvec{x}}) - \sum _{k=1}^{s_k}ln \mathcal {P}(\mathfrak {\varvec{y}}_k|\mathfrak {\varvec{x}}). \end{aligned}$$

The above two logarithmic terms represent the typical setup of an optimization problem consisting of a data fidelity term (i.e., \(E(\mathfrak {\varvec{x}}) := - \sum _{k=1}^{s_k}ln \mathcal {P}(\mathfrak {\varvec{y}}_k|\mathfrak {\varvec{x}})\)) and a regularization term (i.e., \(R(\mathfrak {\varvec{x}}) := - ln \mathcal {P}(\mathfrak {\varvec{x}})\))

$$\begin{aligned} \begin{aligned} \hat{\mathfrak {\varvec{x}}} =\quad \mathop {{\mathrm{arg\,min}}}\limits _{\mathfrak {\varvec{x}}} E(\mathfrak {\varvec{x}}) + R(\mathfrak {\varvec{x}}) \end{aligned} \end{aligned}$$
(6)

3.3 The data fidelity term

The construction of the data fidelity term depends on the noise model, which is commonly assumed to follow a Gaussian or Laplace distribution [31]. For additive Gaussian noise, \(\varvec{\epsilon }_k \sim \mathcal {N}(\mu _k ,\sigma _k^{2})\) follows a normal distribution with the probability density function given by \(\left( 1/\sqrt{2\pi \sigma _k^2}\right) e^{-(\mu _k - \varvec{\epsilon }_k)^2/2\sigma _k^2}\). Assuming a zero-centered distribution (i.e. \(\mu _k=0\)), the likelihood function \(\mathcal {P}(\mathfrak {\varvec{y}}_k|\mathfrak {\varvec{x}})\) reads

$$\begin{aligned} \prod _{k=1}^{s_k}\mathcal {P}(\mathfrak {\varvec{y}}_k|\mathfrak {\varvec{x}}) \propto \texttt {exp}\left( -\sum _{k=1}^{s_k}\left\Vert A_k\mathfrak {\varvec{x}}- \mathfrak {\varvec{y}}_k\right\Vert _2^2\right) , \end{aligned}$$
(7)

which results in the well-known least squares fidelity term. In the case of Laplace noise (i.e., impulse noise), \(\varvec{\epsilon }_k\sim \mathcal {L}(\mu _k, b)\) has the probability density function given by \(\frac{1}{2b} e^{-\left\Vert \mu _k - \varvec{\epsilon }_k \right\Vert _1/b}\), yielding

$$\begin{aligned} \prod _{k=1}^{s_k}\mathcal {P}(\mathfrak {\varvec{y}}_k|\mathfrak {\varvec{x}}) \propto \texttt {exp}\left( -\sum _{k=1}^{s_k}\left\Vert A_k\mathfrak {\varvec{x}}- \mathfrak {\varvec{y}}_k\right\Vert _1\right) . \end{aligned}$$
(8)

This results in an \(\ell ^1\) norm data fidelity term, which shows robustness against outliers and superior performance with impulse noise [29]. In order to handle the mixed Gaussian-impulse noise situation, we follow previous works [32, 33] and combine the \(\ell ^1\) and \(\ell ^2\) norms, resulting in a joint \(\ell ^1-\ell ^2\) data fidelity term,

$$\begin{aligned} E(\mathfrak {\varvec{x}}) = \sum _{l\in \{1,2\}} \lambda _l\sum _{k=1}^{s_k}\left\Vert A_k\mathfrak {\varvec{x}}- \mathfrak {\varvec{y}}_k\right\Vert _l^l, \end{aligned}$$
(9)

where parameters \(\lambda _1\) and \(\lambda _2\) control the contributions of the \(\ell ^1\) and \(\ell ^2\) norms, respectively.
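For illustration, a direct NumPy evaluation of Eq. 9 could look as follows, where each entry of the hypothetical list A_ops applies \(A_k=DBW_k\) to the vectorized HR image:

```python
import numpy as np

def data_term(x, ys, A_ops, lam1, lam2):
    # Joint l1-l2 data fidelity of Eq. 9 (sketch; names are illustrative).
    e = 0.0
    for A_k, y_k in zip(A_ops, ys):
        r = A_k(x) - y_k                 # residual of view k
        e += lam1 * np.abs(r).sum()      # l1 part: robust to impulse noise
        e += lam2 * np.square(r).sum()   # l2 part: suited to Gaussian noise
    return e
```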

3.4 Regularization term

In the Bayesian framework, it is generally assumed that \(\mathfrak {\varvec{x}}\) is a Markov random field (MRF) with a strictly positive joint probability density. Therefore, following the Hammersley-Clifford theorem, its joint probability density must have the form of a Gibbs distribution [34]:

$$\begin{aligned} \mathcal {P}(\mathfrak {\varvec{x}}) \propto \frac{1}{Z} \texttt {exp}\left( -\frac{1}{T} \sum _{C\in \mathcal {C}}V_{C}(\mathfrak {\varvec{x}}) \right) , \end{aligned}$$
(10)

where Z is a normalizing constant and T stands for temperature, controlling the degree of peaking [34]. \(V_{C}\) is a potential function defined for a local group of pixels, or clique, C. The sum is taken over the set \(\mathcal {C}\) of all possible cliques. The definition of the clique set \(\mathcal {C}\) and the selection of the potential \(V_{C}\) lead to various types of image priors, which share the following form

$$\begin{aligned} \mathcal {P}(\mathfrak {\varvec{x}}) \propto \texttt {exp}\Big ( \sum \limits _{\mathfrak {\varvec{u}}\in \varOmega } \sum \limits _{\mathfrak {\varvec{v}}\in \mathcal {N}(\mathfrak {\varvec{u}})} w(\mathfrak {\varvec{u}},\mathfrak {\varvec{v}}) \varPhi (\mathfrak {\varvec{x}}_{\mathfrak {\varvec{u}}}, \mathfrak {\varvec{x}}_{\mathfrak {\varvec{v}}}) \Big ), \end{aligned}$$
(11)

where \(\mathfrak {\varvec{u}},\mathfrak {\varvec{v}}\in \varOmega\) represent the 2D indices of \(\mathfrak {\varvec{x}}\). \(w:\varOmega {\times }\varOmega \rightarrow \mathbb {R}^+\) and \(\varPhi : \mathbb {R}{\times }\mathbb {R} \rightarrow \mathbb {R}^+\) are the weighting function and the distance function, respectively. The weighting function characterizes the dependency on pixel locations, while the distance function penalizes the difference in pixel intensities. \(\mathcal {N}(\mathfrak {\varvec{u}})\) represents a set of indices defined with regard to the index \(\mathfrak {\varvec{u}}\). By setting \(\varPhi\) to the absolute difference, we arrive at the following weighted regularization term,

$$\begin{aligned} \mathfrak {\text {R}}(\mathfrak {\varvec{x}}) = \sum _{\mathfrak {\varvec{u}}\in \varOmega } \sum _{\mathfrak {\varvec{v}}\in \mathcal {N}(\mathfrak {\varvec{u}})} w(\mathfrak {\varvec{u}},\mathfrak {\varvec{v}}) |\mathfrak {\varvec{x}}_{\mathfrak {\varvec{u}}} - \mathfrak {\varvec{x}}_{\mathfrak {\varvec{v}}}|, \end{aligned}$$
(12)

which can be considered a generalized version of many total variation based image priors, e.g., TV [35], BTV [29], NLTV [36], and BSWTV [37]. In vector form, Eq. 12 can be rewritten as

$$\begin{aligned} \mathfrak {\text {R}}(\mathfrak {\varvec{x}}) = \sum _{\mathfrak {\varvec{v}}\in \mathcal {N}(\mathfrak {\varvec{u}})} \left\Vert W_{\mathfrak {\varvec{d}}} \odot (S_{\mathfrak {\varvec{d}}} - \text {I})\mathfrak {\varvec{x}}\right\Vert _1,\quad \mathfrak {\varvec{d}}=\mathfrak {\varvec{u}}-\mathfrak {\varvec{v}}, \end{aligned}$$
(13)

where \(S_{\mathfrak {\varvec{d}}}\in \mathbb {R}^{p\times p}\) denotes the shifting matrix which shifts \(\mathfrak {\varvec{x}}\) by \(\mathfrak {\varvec{d}}\) (in 2D coordinates), and \(\odot\) denotes the Hadamard product. Weighting functions are assembled in the weighting matrix \(W_{\mathfrak {\varvec{d}}}= \text {diag}(\mathfrak {\varvec{w}}_{\mathfrak {\varvec{d}}})\), with \(\mathfrak {\varvec{w}}_{\mathfrak {\varvec{d}}}\in \mathbb {R}^{p}\). The main advantage of this regularization term is the flexibility in defining weighting functions to capture unique features of the SR problem. For example, setting \(\mathcal {N}(\mathfrak {\varvec{u}})\) to the direct neighborhood and the weighting to a constant gives us TV [35], which regularizes the local smoothness between adjacent pixels. Setting the weighting to a function of the pixel distance gives us BTV [29], which assumes that the smoothness is spatially dependent. Another weighting scheme, based on the bilateral spectrum, was used in [37] to provide a successful regularization for images with mixed Gaussian-Poisson noise. Considering the 4D LF data, we propose a discontinuity-aware weighting scheme which combines three data properties, i.e., spatial distance, edge, and occlusion features,

$$\begin{aligned} \mathfrak {\varvec{w}}_{\mathfrak {\varvec{d}}} := w_{\mathfrak {\varvec{d}}}\, \mathfrak {\varvec{w}}_{e}\odot \mathfrak {\varvec{w}}_{o},\quad w_{\mathfrak {\varvec{d}}}\in \mathbb {R},\ \mathfrak {\varvec{w}}_{e},\mathfrak {\varvec{w}}_{o}\in \mathbb {R}^p, \end{aligned}$$
(14)

where the spatial weight \(w_{\mathfrak {\varvec{d}}} := \texttt {exp}\left( -\frac{\left\Vert \mathfrak {\varvec{d}}\right\Vert _2^2}{\sigma _s} \right)\) adjusts the impact of the weighting w.r.t. the relative distance \(\mathfrak {\varvec{d}}\) and provides a bilateral filtering effect. The edge weight \(\mathfrak {\varvec{w}}_{e}:= \texttt {exp}\left( -\frac{\left\Vert \nabla \mathfrak {\varvec{x}}\right\Vert _2^2}{\sigma _e} \right)\) and the occlusion weight \(\mathfrak {\varvec{w}}_{o}\) reduce the smoothing at image discontinuities. We follow the related works [27, 38] to define the occlusion weight \(\mathfrak {\varvec{w}}_{o}\) as follows,

$$\begin{aligned} \mathfrak {\varvec{w}}_o(\mathfrak {\varvec{z}}) = e^{-\frac{b(\mathfrak {\varvec{z}})^2}{2\sigma _{o_1}^2}} e^{-\frac{p(\mathfrak {\varvec{z}})^2}{2\sigma _{o_2}^2}}, \end{aligned}$$
(15)

where \(b(\mathfrak {\varvec{z}})\) and \(p(\mathfrak {\varvec{z}})\) are functions of the occlusion boundary and the projection error, respectively. By a one-sided divergence, \(b(\mathfrak {\varvec{z}})\) assigns weight to occluding boundaries,

$$\begin{aligned} b(\mathfrak {\varvec{z}}) = \left\{ \begin{array}{ll} \texttt {sum}\{\nabla \varvec{\omega }(\mathfrak {\varvec{z}})\} , &{} \texttt {sum}\{\nabla \varvec{\omega }(\mathfrak {\varvec{z}})\}<0\\ 0 , &{} \text {otherwise}\\ \end{array} \right. , \end{aligned}$$
(16)

where \(\nabla \varvec{\omega }\) denotes the gradient of a disparity map. The projection error function is computed as the intensity difference between a warped view and the reference view, i.e., \(p(\mathfrak {\varvec{z}}) = \varvec{L}(\mathfrak {\varvec{z}},\varvec{\theta }_0) - \varvec{L}(\mathfrak {\varvec{z}}+\varvec{\theta }_i\varvec{\omega }(\mathfrak {\varvec{z}}),\varvec{\theta }_i)\).
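A minimal NumPy sketch of this weighting scheme is given below; it assumes the weights are computed on 2D arrays, uses np.gradient as the discrete \(\nabla\), and folds the sum of Eq. 16 into a one-sided clamp. All function names are illustrative.

```python
import numpy as np

def spatial_weight(d, sigma_s):
    # w_d of Eq. 14: scalar bilateral-style decay with displacement d.
    return np.exp(-np.dot(d, d) / sigma_s)

def edge_weight(x, sigma_e):
    # w_e of Eq. 14: damp regularization where the image gradient is strong.
    gy, gx = np.gradient(x)
    return np.exp(-(gx**2 + gy**2) / sigma_e)

def occlusion_weight(disparity, proj_err, s1, s2):
    # w_o of Eqs. 15-16: the one-sided divergence of the disparity map marks
    # occluding boundaries; proj_err is the warped-vs-reference difference.
    gy, gx = np.gradient(disparity)
    b = np.minimum(gx + gy, 0.0)  # keep only the negative (occluding) side
    return np.exp(-b**2 / (2 * s1**2)) * np.exp(-proj_err**2 / (2 * s2**2))
```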

4 Optimization approach

Combining the data-fidelity term and regularization term discussed in the previous section, we finalize the minimization problem with the following cost function

$$\begin{aligned} \begin{aligned} \mathfrak {\text {J}}(\mathfrak {\varvec{x}}) &=\lambda _1\sum \limits _{k=1}^{s_k}\left\Vert A_k\mathfrak {\varvec{x}}- \mathfrak {\varvec{y}}_k\right\Vert _{1} + \lambda _2\sum \limits _{k=1}^{s_k}\left\Vert A_k\mathfrak {\varvec{x}}- \mathfrak {\varvec{y}}_k\right\Vert _{2}^{2} \\&\quad + \sum \limits _{d=1}^{s_d} \left\Vert W_d\odot (S_{d} - \mathfrak {\text {I}})\mathfrak {\varvec{x}}\right\Vert _1, \end{aligned} \end{aligned}$$
(17)

Although non-smooth, the cost function is convex, and the existence of a global minimum is guaranteed. Many algorithms can be used to optimize it. One traditional approach is to apply a first-order iterative algorithm such as steepest gradient descent. A more recent approach is the alternating direction method of multipliers (ADMM) [39], which breaks a complex optimization problem into smaller sub-problems, each of which can be solved in a simpler manner. Although ADMM requires more computation per iterative step than gradient descent, we observe that the overall computation of ADMM is much lower for a similar minimization threshold. We start by rewriting the objective function in a more compact form,

$$\begin{aligned} \mathfrak {\text {J}}(\mathfrak {\varvec{x}})= \left\Vert A\mathfrak {\varvec{x}}-\mathfrak {\varvec{b}}\right\Vert _2^2 + \left\Vert F\mathfrak {\varvec{x}}- \mathfrak {\varvec{b}}'\right\Vert _1, \end{aligned}$$
(18)

where the matrices F, A and column vectors \(\mathfrak {\varvec{b}}\), \(\mathfrak {\varvec{b}}'\) are defined as in Eq. 19. Notice that \(\lambda _1\) and \(\lambda _2\) are absorbed into the matrices and column vectors to simplify the notation. The sizes of A, F, \(\mathfrak {\varvec{b}}\) and \(\mathfrak {\varvec{b}}'\) are respectively \(qs_k\times p\), \((qs_k+ps_d)\times p\), \(qs_k\times 1\) and \((qs_k+ps_d)\times 1\). All transformation matrices (\(A_k\) and \(S_d\)) and weighting matrices (\(W_d\)) are assembled into A and F. The low-resolution images \(\mathfrak {\varvec{y}}_k\) are stacked into \(\mathfrak {\varvec{b}}\). \(O_{ps_d}\) is a zero vector of size \(ps_d\times 1\).

$$\begin{aligned}&A := \sqrt{\lambda _2} \begin{bmatrix} A_1\\ A_2\\ ... \\ A_{s_k} \end{bmatrix},\quad \mathfrak {\varvec{b}}:= \sqrt{\lambda _2}\begin{bmatrix} \mathfrak {\varvec{y}}_1 \\ \mathfrak {\varvec{y}}_2 \\ ... \\ \mathfrak {\varvec{y}}_{s_k} \end{bmatrix} \nonumber \\&F := \begin{bmatrix} \frac{\lambda _1}{\sqrt{\lambda _2}} A \\ S \end{bmatrix}, \mathfrak {\varvec{b}}' := \begin{bmatrix} \frac{\lambda _1}{\sqrt{\lambda _2}}\mathfrak {\varvec{b}}\\ O_{ps_d} \end{bmatrix}, S := \begin{bmatrix} W_1\odot (S_1-I) \\ W_2\odot (S_2-I)\\ ... \\ W_{s_d}\odot (S_{s_d}-I) \end{bmatrix} \end{aligned}$$
(19)

Using this compact representation, we rewrite the optimization problem of Eq. 17 in the form of an ADMM problem,

$$\begin{aligned} \begin{aligned}&{\mathrm{minimize}}_{\mathfrak {\varvec{x}},\mathfrak {\varvec{z}}} \quad \left\Vert A\mathfrak {\varvec{x}}- \mathfrak {\varvec{b}}\right\Vert _2^2 + \left\Vert \mathfrak {\varvec{z}}\right\Vert _1\\&{\mathrm{subject\,to}}\quad F\mathfrak {\varvec{x}}- \mathfrak {\varvec{z}}= \mathfrak {\varvec{b}}', \end{aligned} \end{aligned}$$
(20)

where the augmented Lagrangian reads,

$$\begin{aligned} \mathcal {L}_{\vartheta }\big (\mathfrak {\varvec{x}}, \mathfrak {\varvec{z}}, \mathfrak {\varvec{w}}\big )&:=\left\Vert A\mathfrak {\varvec{x}}- \mathfrak {\varvec{b}}\right\Vert _2^2 + \left\Vert \mathfrak {\varvec{z}}\right\Vert _1 \nonumber \\&\quad + \mathfrak {\varvec{w}}^{\intercal }(F\mathfrak {\varvec{x}}- \mathfrak {\varvec{z}}-\mathfrak {\varvec{b}}') + \frac{\vartheta }{2} \left\Vert F\mathfrak {\varvec{x}}- \mathfrak {\varvec{z}}-\mathfrak {\varvec{b}}'\right\Vert _2^2 \end{aligned}$$
(21)

The ADMM problem, Eq. 20, is then broken into the following sub-problems for the two unknowns \(\mathfrak {\varvec{x}}\) and \(\mathfrak {\varvec{z}}\).

$$\begin{aligned} \mathfrak {\varvec{x}}^{(k+1)} \,=\,&{\mathrm{arg\,min}}_{\mathfrak {\varvec{x}}} \mathcal {L}_{\vartheta }(\mathfrak {\varvec{x}}, \mathfrak {\varvec{z}}^{(k)}, \mathfrak {\varvec{w}}^{(k)}) \end{aligned}$$
(22a)
$$\begin{aligned} \,=\,&{\mathrm{arg\,min}}_{\mathfrak {\varvec{x}}} \left\Vert A\mathfrak {\varvec{x}}- \mathfrak {\varvec{b}}\right\Vert _2^2 + \frac{\vartheta }{2}\left\Vert F\mathfrak {\varvec{x}}- \mathfrak {\varvec{z}}^{(k)} -\mathfrak {\varvec{b}}'+ \frac{\mathfrak {\varvec{w}}^{(k)}}{\vartheta }\right\Vert _2^2 \nonumber \\ \mathfrak {\varvec{z}}^{(k+1)} \,=\,&{\mathrm{arg\,min}}_{\mathfrak {\varvec{z}}} \mathcal {L}_{\vartheta }(\mathfrak {\varvec{x}}^{(k+1)}, \mathfrak {\varvec{z}}, \mathfrak {\varvec{w}}^{(k)}) \end{aligned}$$
(22b)
$$\begin{aligned} \,=\,&{\mathrm{arg\,min}}_{\mathfrak {\varvec{z}}} \left\Vert \mathfrak {\varvec{z}}\right\Vert _1 + \frac{\vartheta }{2} \left\Vert \mathfrak {\varvec{z}}- \left( F\mathfrak {\varvec{x}}^{(k+1)} -\mathfrak {\varvec{b}}'+ \frac{\mathfrak {\varvec{w}}^{(k)}}{\vartheta }\right) \right\Vert _2^2 \nonumber \\ \mathfrak {\varvec{w}}^{(k+1)} \,=\,&\mathfrak {\varvec{w}}^{(k)} + \vartheta \big ( F\mathfrak {\varvec{x}}^{(k+1)} - \mathfrak {\varvec{z}}^{(k+1)} -\mathfrak {\varvec{b}}' \big ) \end{aligned}$$
(22c)

The sub-problem of \(\mathfrak {\varvec{z}}\), in Eq. 22b, is in fact a proximal operator of the \(\ell ^1\) norm,

$$\begin{aligned} \mathfrak {\varvec{z}}^{(k+1)} = \mathfrak {\varvec{prox}}_{\vartheta ^{-1}\left\Vert \cdot \right\Vert _1}\left( F\mathfrak {\varvec{x}}^{(k+1)} - \mathfrak {\varvec{b}}' + \frac{\mathfrak {\varvec{w}}^{(k)}}{\vartheta } \right) , \end{aligned}$$

which has the following closed form solution

$$\begin{aligned} \begin{aligned} \mathfrak {\varvec{z}}^{(k+1)} =&\left[ \left|F\mathfrak {\varvec{x}}^{(k+1)} - \mathfrak {\varvec{b}}' + \frac{\mathfrak {\varvec{w}}^{(k)}}{\vartheta }\right| - \frac{1}{\vartheta }\right] _{+}\\&\quad \odot \mathfrak {\varvec{sgn}}\left( F\mathfrak {\varvec{x}}^{(k+1)} - \mathfrak {\varvec{b}}' + \frac{\mathfrak {\varvec{w}}^{(k)}}{\vartheta } \right) \end{aligned} \end{aligned}$$
(23)
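In code, this closed form is the element-wise soft-thresholding operator. A one-line NumPy sketch, with v standing for \(F\mathfrak {\varvec{x}}^{(k+1)} - \mathfrak {\varvec{b}}' + \mathfrak {\varvec{w}}^{(k)}/\vartheta\):

```python
import numpy as np

def prox_l1(v, theta):
    # Eq. 23: soft-thresholding, the proximal operator of (1/theta)*||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - 1.0 / theta, 0.0)
```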

The sub-problem of \(\mathfrak {\varvec{x}}\) (Eq. 22a) has the form of a least squares approximation problem,

$$\begin{aligned} \begin{aligned} \tilde{\mathfrak {\varvec{x}}}= \mathop {{\mathrm{arg\,min}}}\limits _{\mathfrak {\varvec{x}}} \left\Vert G\mathfrak {\varvec{x}}-\mathfrak {\varvec{c}}\right\Vert _2^2 \end{aligned}, \end{aligned}$$
(24)

with \(G=\begin{bmatrix} A\\ \sqrt{\vartheta /2}F \end{bmatrix}\), and \(\mathfrak {\varvec{c}}=\begin{bmatrix} \mathfrak {\varvec{b}}\\ \sqrt{\vartheta /2}\left( \mathfrak {\varvec{z}}^{(k)} + \mathfrak {\varvec{b}}' - \mathfrak {\varvec{w}}^{(k)}/\vartheta \right) \end{bmatrix}\). Equation 24 can be effectively solved with a conjugate gradient approach on the normal equations [40].

4.1 Treatment of linear operators

All computations eventually break down to matrix multiplications, for which the largest computational efforts are spent on \(A_k, S_d,\quad k\in [1,s_k], d\in [1,s_d]\), and their adjoint versions \(A_k^{\intercal }, S_d^{\intercal }\). These matrices are very large and sparse. For example, consider a low-resolution/high-resolution pair with \(s_x\times s_y:= 128\times 128\) and \(s_X\times s_Y=512\times 512\) (i.e., \(4\times\) super-resolution). Assuming \(s_k= 16\) and \(s_d = 8\), the size of A is \(2^{18}\times 2^{18}\) and the size of S is \(2^{21}\times 2^{18}\). Direct computation with these matrices is infeasible. Therefore, we decided to implement them in the form of linear functions of 2D variables instead of sparse matrices and vectorized inputs.

Fig. 3 Implementation of the downsampling operator

For the downsampling operator \(\mathcal {D}\), a simple resampling scheme is employed as depicted in Fig. 3. For each block of \(\zeta _x\times \zeta _y\) pixels, one pixel at the top-left location is picked and put into the low-resolution grid. The adjoint operator \(\mathcal {D}^{*}\) therefore simply puts the corresponding pixel back at this location. The blurring operator \(\mathcal {B}\) is modelled by a simple Gaussian kernel with a standard deviation of \(\sigma =\frac{1}{4}\sqrt{\zeta ^2-1}\) and a size of \(3\sigma\), as suggested in [41]. The warping operator \(\mathcal {W}_k\) and its adjoint operator \(\mathcal {W}_k^{*}\) are implemented as forward-warping and backward-warping functions. These functions are associated with a set of disparity maps at each of the perspectives employed for super-resolution. Assume that a set of \(s_k\) low-resolution sub-aperture images with perspective indices \(P=\{\varvec{\theta }_1, \varvec{\theta }_2,...,\varvec{\theta }_{s_k}\}\) are the inputs for estimating a super-resolution image at \(\varvec{\theta }_0\in P\). For each perspective \(\varvec{\theta }_k\), we need to find the disparity map \(\varvec{\omega }_k\). The forward warping function \(\mathcal {W}_k\) warps the SAI from perspective \(\varvec{\theta }_0\) to \(\varvec{\theta }_k\) using \(\varvec{\omega }_k\), i.e., \(\widehat{L}(\mathfrak {\varvec{z}},\varvec{\theta }_k) = \varvec{L}(\mathfrak {\varvec{z}}+ \varvec{\theta }_k\varvec{\omega }_k,\varvec{\theta }_k)\), while the backward warping function \(\mathcal {W}_k^{*}\) warps the input SAI from perspective \(\varvec{\theta }_k\) to \(\varvec{\theta }_0\) using \(\varvec{\omega }_0\), i.e., \(\widehat{L}(\mathfrak {\varvec{z}},\varvec{\theta }_0) = \varvec{L}(\mathfrak {\varvec{z}}+ \varvec{\theta }_0\varvec{\omega }_0,\varvec{\theta }_0)\).
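As a sanity check for such operator pairs, the following NumPy sketch implements \(\mathcal {D}\) and \(\mathcal {D}^{*}\) and verifies the adjoint identity \(\langle \mathcal {D}x, y\rangle = \langle x, \mathcal {D}^{*}y\rangle\), the property the conjugate gradient solver relies on; the names are illustrative, not those of our kernels.

```python
import numpy as np

def down(x, zy, zx):
    # D: pick the top-left pixel of each zy x zx block (Fig. 3).
    return x[::zy, ::zx]

def down_adj(y, zy, zx, shape):
    # D*: scatter each LR pixel back to its top-left HR location.
    x = np.zeros(shape, dtype=y.dtype)
    x[::zy, ::zx] = y
    return x

# Adjoint check: <D x, y> == <x, D* y> up to floating-point error.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = rng.standard_normal((4, 4))
assert np.isclose((down(x, 2, 2) * y).sum(),
                  (x * down_adj(y, 2, 2, x.shape)).sum())
```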

The transformation matrix S can be implemented in the form of a weighted directional gradient (\(\nabla ^{U,V}\)) computed for a direction set \(U=\{\mathfrak {\varvec{d}}_i|\mathfrak {\varvec{d}}_i\in \mathbb {N}^2, i=1,..,s_d\}\) and a weight set \(V=\{V_i|V_i\in \mathbb {R}^{s_X{\times }s_Y}, i=1,..,s_d\}\). Let I be the SAI at perspective \(\varvec{\theta }_0\), \(I(\mathfrak {\varvec{z}}) = \varvec{L}(\mathfrak {\varvec{z}},\varvec{\theta }_0)\); we compute \(\nabla ^{U,V}I\) as follows,

$$\begin{aligned} \varvec{G} = \nabla ^{U,V}I = \left( \frac{\partial }{\partial \mathfrak {\varvec{d}}_1}, \frac{\partial }{\partial \mathfrak {\varvec{d}}_2}, .., \frac{\partial }{\partial \mathfrak {\varvec{d}}_{s_d}}\right) I, \end{aligned}$$
(25)

with the weighted directional derivative \(\partial /\partial \mathfrak {\varvec{d}}_i\) approximated by finite differences,

$$\begin{aligned} \varvec{G}_{d_i}(\mathfrak {\varvec{z}}) = \frac{\partial }{\partial \mathfrak {\varvec{d}}_i} I(\mathfrak {\varvec{z}}) = V_i(\mathfrak {\varvec{z}}) \left( I(\mathfrak {\varvec{z}}) - I(\mathfrak {\varvec{z}}+\mathfrak {\varvec{d}}_i)\right) . \end{aligned}$$
(26)

The adjoint matrix \(S^{\intercal }\) is then computed in the form of weighted directional divergence,

$$\begin{aligned} \texttt {div}^{U,V}\varvec{G} = \nabla ^{U,V}\cdot \varvec{G} = \sum _{i=1}^{s_d} \frac{\partial \varvec{G}_{\mathfrak {\varvec{d}}_i}}{\partial \mathfrak {\varvec{d}}_i}. \end{aligned}$$
(27)
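The following NumPy sketch implements Eqs. 26-27 as an adjoint pair and verifies \(\langle \nabla ^{U,V}x, G\rangle = \langle x, \texttt {div}^{U,V}G\rangle\). Periodic boundary handling via np.roll is an assumption of this sketch, not necessarily the boundary rule of our kernels.

```python
import numpy as np

def wdg(x, dirs, weights):
    # Eq. 26: weighted forward differences G_i = V_i * (x - shift(x, d_i)).
    return [V * (x - np.roll(x, shift=(-d[0], -d[1]), axis=(0, 1)))
            for d, V in zip(dirs, weights)]

def wdd(G, dirs, weights):
    # Adjoint of wdg (Eq. 27): weighted directional divergence.
    out = np.zeros_like(G[0])
    for g, d, V in zip(G, dirs, weights):
        vg = V * g
        out += vg - np.roll(vg, shift=(d[0], d[1]), axis=(0, 1))
    return out

# Adjoint check: <wdg(x), G> == <x, wdd(G)>.
rng = np.random.default_rng(1)
dirs = [(0, 1), (1, 0), (1, 1)]
ws = [rng.random((6, 6)) for _ in dirs]
x = rng.standard_normal((6, 6))
G = [rng.standard_normal((6, 6)) for _ in dirs]
lhs = sum((a * b).sum() for a, b in zip(wdg(x, dirs, ws), G))
assert np.isclose(lhs, (x * wdd(G, dirs, ws)).sum())
```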

5 GPU-accelerated architecture

This section presents the accelerated architecture for 4D LFSR. Acceleration is achieved by parallel computation on graphics processing units. Due to its multi-platform compatibility, we select OpenCL over CUDA for the implementation of the proposed approach. To solve the cost function optimization problem of Eq. 17, we follow the iterative solving process discussed in Sect. 4. As will be discussed in the experimental results (Sect. 7.1), the ADMM solver provides better performance in optimizing the cost function as compared to the gradient descent approach.

Algorithm 1 ADMM solver

For better handling of the computation flow, we made the following modifications to the ADMM iteration in Eq. 22. First, the order of the sub-problems is rearranged such that the \(\mathfrak {\varvec{x}}\)-step comes after the \(\mathfrak {\varvec{z}}\)-step and \(\mathfrak {\varvec{w}}\)-step. This ordering allows us to reuse the computation of \(F\mathfrak {\varvec{x}}\) across all sub-problems. Secondly, the parameter \(\vartheta\) is absorbed into \(\mathfrak {\varvec{w}}\) (i.e., \(\mathfrak {\varvec{w}}\) instead of \(\mathfrak {\varvec{w}}/\vartheta\)) to save unnecessary scalar multiplications. \(\vartheta\) only takes part in the computation of the proximal operator (\(\mathfrak {\varvec{z}}\)-step) and the solving of the least squares problem (\(\mathfrak {\varvec{x}}\)-step). Fig. 4 illustrates the modified computations of the ADMM solver, which is also listed in Algorithm 1.

Fig. 4 Computation flow of one ADMM iteration

The ADMM solver takes three arguments: the parameter \(\vartheta\), an initial guess (\(\mathfrak {\varvec{x}}_0\)), and the number of iterations (N), as in Algorithm 1. Before the iteration, we initialize \(\mathfrak {\varvec{x}}\) with \(\mathfrak {\varvec{x}}_0\), a bi-cubic up-sampling of the low-resolution image, and \(\mathfrak {\varvec{w}}\) with zeros (lines 1-2). Each iteration starts with the computation of \(A\mathfrak {\varvec{x}}\) and \(F\mathfrak {\varvec{x}}\), which are associated with the \(\ell ^2\) and \(\ell ^1\) terms of the objective function, Eq. 18. While \(\mathfrak {\varvec{b}}\) is subtracted from \(A\mathfrak {\varvec{x}}\) (line 4), \(F\mathfrak {\varvec{x}}\) is reduced by \(\mathfrak {\varvec{b}}'\) and summed with \(\mathfrak {\varvec{w}}\) (line 5). Since F is a stack of A and S and \(\mathfrak {\varvec{b}}'\) includes \(\mathfrak {\varvec{b}}\), Eq. 19, we avoid the re-computation of \(A\mathfrak {\varvec{x}}-\mathfrak {\varvec{b}}\) by extracting it from \(F\mathfrak {\varvec{x}}-\mathfrak {\varvec{b}}'\) as depicted in Fig. 4. The sum and subtract operations in line 5 are realized by two-argument sum kernels (i.e., sum in Fig. 4). The gray box attached to each input of the sum kernel denotes the scalar scaling of the input. In line 6, we conduct the \(\mathfrak {\varvec{z}}\)-step by computing the proximal operator of \(\mathfrak {\varvec{u}}\). This proximal operator is realized by an OpenCL kernel prox, as in Fig. 4, followed by a sum kernel which realizes the \(\mathfrak {\varvec{w}}\)-step (Algorithm 1, line 7).

After the computation of \(\mathfrak {\varvec{z}}\) and \(\mathfrak {\varvec{w}}\), the next step is to prepare the residual input for the conjugate gradient descent solver in the \(\mathfrak {\varvec{x}}\)-step, \(\mathfrak {\varvec{v}}= G^{\intercal }(G\mathfrak {\varvec{x}}-\mathfrak {\varvec{c}})\). From Eq. 24, we have

$$\begin{aligned} \mathfrak {\varvec{v}}=&\quad \begin{bmatrix}A\\ \sqrt{\vartheta /2}F\end{bmatrix}^T \begin{bmatrix}A\mathfrak {\varvec{x}}-\mathfrak {\varvec{b}}\\ \sqrt{\vartheta /2}\left( F\mathfrak {\varvec{x}}- \mathfrak {\varvec{z}}^{(n)}-\mathfrak {\varvec{b}}'+\mathfrak {\varvec{w}}^{(n)} \right) \end{bmatrix} \nonumber \\ =&\quad A^{\intercal }(A\mathfrak {\varvec{x}}-\mathfrak {\varvec{b}}) + \vartheta /2 F^{\intercal }(F\mathfrak {\varvec{x}}-\mathfrak {\varvec{z}}^{(n)}-\mathfrak {\varvec{b}}'+\mathfrak {\varvec{w}}^{(n)}) \nonumber \\ =&\quad A^{\intercal }\mathfrak {\varvec{a}}+ \frac{\vartheta }{2} F^T\mathfrak {\varvec{f}}\end{aligned}$$
(28)

with the computation of \(\mathfrak {\varvec{f}}\) (Algorithm 1, line 8) given by \(\mathfrak {\varvec{f}}= 2 \mathfrak {\varvec{w}}^{(n)} - \mathfrak {\varvec{w}}^{(n-1)} = \mathfrak {\varvec{u}}-\mathfrak {\varvec{z}}- \mathfrak {\varvec{w}}^{(n-1)} + \mathfrak {\varvec{w}}^{(n)} = F\mathfrak {\varvec{x}}^{(n-1)}-\mathfrak {\varvec{z}}- \mathfrak {\varvec{b}}'+\mathfrak {\varvec{w}}^{(n)}\). The computations of \(\mathfrak {\varvec{f}}\) and \(\mathfrak {\varvec{v}}\) are realized by two sum kernels directly before and after \(F^{T}\), as in Fig. 4. Notice that we scale \(A^T\mathfrak {\varvec{a}}\) by \(\frac{\lambda _2}{\lambda _1}\), since \(\mathfrak {\varvec{a}}\) is extracted from \(F\mathfrak {\varvec{x}}-\mathfrak {\varvec{b}}'\), which carries a different scalar scaling of the matrix A and the column vector \(\mathfrak {\varvec{b}}\). Another note on the implementation in Fig. 4 is that the group of OpenCL kernels marked by the dashed rectangle is combined into a single kernel, since these kernels share element-wise operators.
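Putting the steps together, one iteration of the reordered loop can be sketched in NumPy as below. A, At, F, Ft, and cg_solve are hypothetical callables standing in for the GPU kernels of this section; \(\vartheta\) is absorbed into \(\mathfrak {\varvec{w}}\) as described above, and the \(\lambda\)-rescaling of the extracted residual is sidestepped by applying A directly.

```python
import numpy as np

def admm_iteration(x, w, A, At, F, Ft, b, bp, theta, cg_solve):
    # One iteration of Algorithm 1 / Fig. 4 (theta absorbed into w).
    u = F(x) - bp + w                 # line 5: F x is shared by all steps
    z = np.sign(u) * np.maximum(np.abs(u) - 1.0 / theta, 0.0)  # z-step
    w_new = u - z                     # w-step: w + (F x - z - b')
    f = 2.0 * w_new - w               # line 8: equals F x - z - b' + w_new
    a = A(x) - b                      # l2 residual
    v = At(a) + 0.5 * theta * Ft(f)   # residual for the x-step, Eq. 28
    x = cg_solve(v, x)                # x-step: CG on the normal equations
    return x, z, w_new
```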

Fig. 5 Computation flow of the x-step

As discussed in the previous section, conjugate gradient descent on the normal equations is employed to solve the \(\ell ^2\) optimization problem of the \(\mathfrak {\varvec{x}}\)-step. Figure 5 depicts the computation flow of the \(\mathfrak {\varvec{x}}\)-step, while its pseudo code is listed in Algorithm 2. There are two inputs, i.e. \(\mathfrak {\varvec{v}}\), \(\mathfrak {\varvec{x}}^{(n-1)}\), and two scalar parameters, i.e., \(\tau\), K. The computed HR image from the previous ADMM iteration, \(\mathfrak {\varvec{x}}^{(n-1)}\), is used as the initial guess for the conjugate gradient descent solver, while the residual \(\mathfrak {\varvec{v}}\) is used to initialize \(\mathfrak {\varvec{r}}^{(0)}\), \(\mathfrak {\varvec{p}}^{(0)}\) and compute the initial error \(\pi ^{(0)}\). The two parameters \(\tau\) and K specify the error threshold and the maximum number of conjugate gradient iterations, respectively. The stop condition is that either the residual \(\mathfrak {\varvec{r}}\) is sufficiently small or the maximum number of iterations is reached (Algorithm 2, lines 4-5). All computations in Algorithm 2 can be effectively broken down into GPU kernel implementations. Besides the forward and backward transforms (\(G,G^{\intercal }\)), there are two kernels, sum and dot, as in Fig. 5, which represent the element-wise sum and the dot product, respectively.

From Eqs. 19 and 24, we can derive the computation of \(G^TG\) in terms of A and S as

$$\begin{aligned} \begin{aligned} G^TG = A^TA + \frac{\vartheta }{2}F^{\intercal }F = \left( 1+\frac{\vartheta }{2}\frac{\lambda _1^2}{\lambda _2}\right) A^TA + \frac{\vartheta }{2}S^TS, \end{aligned} \end{aligned}$$
(29)

with the kernel realization of A, S and their adjoint versions \(A^{\intercal }\), \(S^{\intercal }\) shown in Fig. 6. The figure illustrates the change in the size of the column vector after each kernel execution. In Fig. 6, fwarp, bwarp, blur, up, and down denote the forward warp, backward warp, blur, up-sampling, and down-sampling kernels, respectively. The wdg kernel realizes the weighted directional gradient (i.e., \(\nabla ^{U,V}\)), while the weighted directional divergence (i.e., div\(^{U,V}\)) is implemented by the wdd kernel.
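A compact NumPy sketch of Algorithm 2 under these definitions is shown below; GtG is a hypothetical callable applying Eq. 29, and v is the residual \(G^{\intercal }(G\mathfrak {\varvec{x}}-\mathfrak {\varvec{c}})\) prepared in Eq. 28.

```python
import numpy as np

def xstep_cg(GtG, x, v, tau, K):
    # Algorithm 2 (sketch): conjugate gradient on the normal equations
    # G^T G x = G^T c, warm-started from the previous ADMM estimate.
    r = -v                       # r^(0): negative residual of the normal eq.
    p = r.copy()
    pi = (r * r).sum()           # pi^(0): squared residual norm
    for _ in range(K):           # at most K iterations ...
        if pi < tau:             # ... or stop early when the error is small
            break
        q = GtG(p)
        alpha = pi / (p * q).sum()
        x = x + alpha * p
        r = r - alpha * q
        pi_new = (r * r).sum()
        p = r + (pi_new / pi) * p
        pi = pi_new
    return x
```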

Algorithm 2 Conjugate gradient solver of the x-step
Fig. 6 Kernel realization of \(A,A^{\intercal },S,S^{\intercal }\)

6 Limitation and discussion

Although the strategy of realizing the degradation process with linear functions has the advantage of saving computational resources and simplifying the GPU implementation, it presents a drawback in dealing with a more challenging blurring process, i.e., space-variant PSFs. In this work, we assume that the PSF is space-invariant and can be approximated by a single Gaussian blur kernel. However, depending on the optical setup, the blurring process may involve a set of space-variant PSFs. This means that each blur kernel may only be applied to a group of pixels, and different regions of an image would require different blur kernels. In such a case, a sparse matrix realization of the blurring operator would be a reasonable option to avoid the complication of maintaining and applying region-specific blur kernels.

Besides Gaussian and impulse noise, there is another challenging noise originating from the discrete nature of the electric charge, namely photon noise or shot noise [42]. Different from additive Gaussian noise, which is pixel-independent, photon noise is pixel-dependent and follows the Poisson distribution. Taking the notation from Sect. 3.1, the degradation model considering Poisson noise and additive Gaussian noise reads

$$\begin{aligned} \mathfrak {\varvec{y}}_k = \mathfrak {\varvec{z}}_k + \varvec{\epsilon }, \end{aligned}$$
(30)

where \(\mathfrak {\varvec{z}}_k \sim \mathcal {P}(A_k\mathfrak {\varvec{x}})\) and \(\varvec{\epsilon }\sim \mathcal {N}(0,\sigma ^2)\) represent the Poisson distribution and a zero-mean Gaussian distribution, respectively. Following the work in [30], we can rewrite our data fidelity term as

$$\begin{aligned} E(\mathfrak {\varvec{x}}) = \sum \limits _{k=1}^{s_k}\left\Vert A_k\mathfrak {\varvec{x}}-\mathfrak {\varvec{y}}_k\right\Vert _{W_k}^2 + \left\langle \log \big (A_k\mathfrak {\varvec{x}}+ \sigma ^2\big ),\varvec{1}\right\rangle , \end{aligned}$$
(31)

where \(\log (\cdot )\) is computed element-wise and the diagonal weight matrix \(W_k\) is computed as

$$\begin{aligned} W_k = diag\left( \frac{1}{[A_k\mathfrak {\varvec{x}}]_i + [\sigma ]_i^2} \right) , \end{aligned}$$
(32)

where \([\mathfrak {\varvec{x}}]_i\) denotes the \(i^{th}\) element of the column vector \(\mathfrak {\varvec{x}}\). As discussed in [30], although \(\ell ^1\)/\(\ell ^2\) data terms can also be applied to input data with Poisson noise, their reconstruction quality is about 1 dB worse than that obtained by applying Eq. 31. Due to the log function, the above data term leads to a non-convex optimization problem in which a global minimum is not guaranteed. Solving this new problem requires a new decomposition strategy for ADMM. This task, together with the acceleration of the new solving process, is part of our plan for future work.
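For reference, a direct NumPy evaluation of Eqs. 31-32 could look as follows; this is a sketch only, with A_ops the same hypothetical operator list as before, and the non-convexity caused by the log term remains.

```python
import numpy as np

def poisson_gauss_data_term(x, ys, A_ops, sigma2):
    # Weighted data term of Eqs. 31-32 for mixed Poisson-Gaussian noise.
    e = 0.0
    for A_k, y_k in zip(A_ops, ys):
        z = A_k(x)
        w = 1.0 / (z + sigma2)           # diagonal of W_k, Eq. 32
        e += (w * (z - y_k) ** 2).sum()  # ||A_k x - y_k||^2_{W_k}
        e += np.log(z + sigma2).sum()    # <log(A_k x + sigma^2), 1>
    return e
```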

7 Experimental results

This section discusses the results of our experiments, in which the robustness of the proposed SR model is validated through numerous testing scenarios. Comparisons to state-of-the-art approaches under severe mixed noise conditions and to previous GPU acceleration approaches are presented. In addition, the performance of the accelerated computational framework is analyzed and discussed.

7.1 Evaluation of LFSR computational framework

Light-field scenes from the 4D synthetic dataset [18] are employed to evaluate the robustness of the SR model and analyze the convergence of the iterative solvers. This dataset is selected since it includes a variety of scenes and provides accurate disparity maps. We follow the degradation model discussed in Sect. 3.1 to prepare the input data with two test scaling factors, i.e., \({\times }2\) and \({\times }4\). The observation noise is parameterized by \(\sigma\) and \(\nu\), which respectively denote the standard deviation of the Gaussian noise (i.e., \(\mathcal {N}(\mu ,\sigma )\)) and the percentage of impulse noise (i.e., salt and pepper). To match practical use cases in which high-resolution disparity maps are not available, the provided disparity maps are down-scaled by the same factor as the test case (i.e., \({\times }2\), \({\times }4\)), interpolated back to the original size, and then used in the warping functions. For handling color input data, we follow the strategy proposed in [21] to solve the cost function for the Y color channel while applying bi-cubic interpolation to the Cb and Cr channels.
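The mixed-noise degradation used here can be reproduced with a few lines of NumPy; the sketch below assumes 8-bit intensity values in [0, 255] and applies Gaussian noise of standard deviation \(\sigma\) followed by \(\nu\)% salt-and-pepper pixels.

```python
import numpy as np

def add_mixed_noise(img, sigma, nu, rng):
    # Gaussian noise (std sigma) plus nu percent salt-and-pepper impulse
    # noise, matching the sigma/nu parameterization of the experiments.
    out = img + rng.normal(0.0, sigma, img.shape)
    mask = rng.random(img.shape) < nu / 100.0
    out[mask] = rng.choice([0.0, 255.0], size=mask.sum())  # salt or pepper
    return np.clip(out, 0.0, 255.0)
```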

Fig. 7 Regularization weights calculation for LF scene ‘boardgames’; top row: full-size image and weights; bottom row: zoom-in of the region marked by the green rectangle; a ground-truth image; b weights at the \(1^{st}\) iteration; c weights after the \(10^{th}\) iteration

Fig. 8 \(\times 4\) super-resolution of LF scene ‘dishes’ (\(\sigma =1\), \(\nu =1\%\)); top row: full-size image; bottom row: zoom-in of the region marked by the green rectangle; a ground-truth image; b bi-cubic up-sampling (21.72 dB); c 1st iteration (24.96 dB); d 5th iteration (29.53 dB)

The regularization weights computed for the scene ‘boardgames’ are shown in Fig. 7. As expected, strong weighting is applied to regions occupied by high-frequency information (i.e., texture edges, occlusions). As discussed in Sect. 3.4, the regularization weight is a combination of the spatial weight (\(w_{\mathfrak {\varvec{d}}}\)), the edge weight (\(\mathfrak {\varvec{w}}_e\)), and the occlusion weight (\(\mathfrak {\varvec{w}}_o\)). To strengthen the regularizing effect, the weights are recomputed in each ADMM iteration using the currently computed super-resolution image \(\mathfrak {\varvec{x}}\). When the optimization starts, \(\mathfrak {\varvec{x}}\) is initialized to a bi-cubic up-sampling of the low-resolution image. This explains the blurred edges of the regularization weights at iteration 1, as shown in Fig. 7(b). However, after each ADMM iteration, the qualities of \(\mathfrak {\varvec{x}}\) and of the regularization weights gradually improve. As shown in Fig. 7(c), the regularization weights after 10 ADMM iterations capture well the high-resolution structure of the reconstructed scene.

Figure 8 visualizes the SR result for the \({\times }4\) test case of LF scene ‘dishes’. We employed 17 LR sub-aperture images as inputs to build the cost function in Eq. 17, which is then solved by the ADMM iterative solver. The SAIs are picked from the \(5{\times }5\) angular views in a star-like structure. As compared to the bi-cubic up-sampled image used as the initial solution [Fig. 8(b)], the reconstructed HR image after the first ADMM iteration [Fig. 8(c)] demonstrates an obvious improvement in visibility. Although the noise effect from the combination of multiple SAIs is still visible, it is possible to observe the texture content (i.e., the small characters in the middle of the zoom-in region). After 5 ADMM iterations, the noise effect is removed, resulting in a significant enhancement in visual quality with 4.6 dB and 7.8 dB improvements as compared to the first iteration’s solution and the initial solution, respectively.

Fig. 9 \(\times 2\) super-resolution of LF scene ‘medieval2’ (\(\sigma =10, \nu =1\%\)); left: a cropped noisy LR input; right: four zoom-ins of the marked region from an LR input and three different configurations of the data fidelity term

Fig. 10 \(\times 4\) super-resolution results of LF scene ‘vinyl’ under different numbers of inputs; top: full-size HR image; bottom: zoom-in of the marked region; a ground-truth image; b bi-cubic upsampling (24.04 dB); c 3 SAIs (27.55 dB); d 5 SAIs (29.33 dB); e 9 SAIs (30.54 dB); f 25 SAIs (31.65 dB); g 49 SAIs (32.09 dB)

To evaluate the contributions of the \(\ell ^1\) and \(\ell ^2\) data terms in reconstructing an HR perspective image under mixed-noise conditions, we prepared a test case in which the LR light field is severely damaged by noise, see Fig. 9. While keeping the regularization part unchanged, we tuned the data fidelity parameters (\(\lambda _1\), \(\lambda _2\)) to find the solution with the highest PSNR score for each model (i.e., \(\ell ^1\), \(\ell ^2\), \(\ell ^1+\ell ^2\)). We observed that using only the \(\ell ^2\) data fidelity tends to over-smooth the solution due to the effect of the \(\ell ^2\) norm. Although the \(\ell ^1\) data fidelity preserves the sharp edge structure well, it also carries the effect of the noisy pixels into the solution. The proposed mixed-noise data term combines the impacts of both the \(\ell ^2\) norm and the \(\ell ^1\) norm and provides a better reconstruction quality.

The number of input LR images plays an important role in the quality of the reconstructed HR image. Although demanding more computational resources, using more input SAIs tends to provide higher reconstruction quality. Figure 10 reports the \(\times 4\) super-resolution results of LF scene ‘vinyl’ with different numbers of LR sub-aperture images. As can be seen from the figure, giving more input images to the computational problem (Eq. 17) results in a better visual quality of the HR solutions, which is also evident from the reported PSNR scores. Specifically, an improvement of 3.5 dB as compared to bi-cubic up-sampling can be achieved with three input images. When increasing the number of LR images to 5, 9, 25, and 49, we observed incremental gains of 1.8 dB, 1.2 dB, 1.1 dB, and 0.44 dB, respectively.

To compare the convergence of the iterative solvers, we employ the matrix transform functions (i.e., A, S) and their adjoint versions (i.e., \(A^{\intercal }\), \(S^{\intercal }\)) as computation units (CUs). As derived in Sect. 4, these transforms are the most dominant computation tasks and appear in every iterative step. Each CU is either a combination of A and S, as used for computing the cost function \(\mathfrak {\text {J}}\), or of \(A^{\intercal }\) and \(S^{\intercal }\), as used for computing the gradient \(\nabla \mathfrak {\text {J}}\). In this experiment, we built a cost function for the \(\times 2\) SR problem of LF scene ‘vinyl’ and applied four different configurations of the iterative solvers to optimize it. The first two are gradient descent (GD) solvers without and with line search, denoted as gd and gd-ls respectively. The last two are ADMM solvers in which we set the maximum number of conjugate gradient steps to 5 (admm-5) and 10 (admm-10). Figure 11 plots the loss function against the accumulated CUs. Provided with a good step size, GD without line search can rapidly reduce the cost function over the first few iterations. However, due to the fixed step size, GD cannot optimize the loss function further after 80 CUs. In contrast, gd-ls seems slow at the beginning due to the search for an appropriate step size but is able to surpass gd at around 100 CUs and approach the global minimum after around 300 CUs. Avoiding the costly line-search tasks, both configurations of the ADMM solver demonstrate a superior convergence rate as compared to GD. We also observed that setting the maximum number of conjugate gradient steps to 5 does shorten the computational effort for the first few iterations. However, at later iterations, when the early stop condition is satisfied (i.e., Algorithm 2, line 5), both settings result in similar performance.

Fig. 11

Optimization results of different solvers

Fig. 12

\({\times }2\) super-resolution result of LF scene ‘vinyl’ degraded by motion blur. a a crop of the ground truth with two marked regions and the motion blur kernel shown in the top-left corner; b zoom-in of the ground truth image; c bi-cubic initial image (26.86 dB); d after the \(1^{st}\) ADMM iteration (30.27 dB); e after the \(10^{th}\) ADMM iteration (35.43 dB)

The proposed computational framework can also be applied to more challenging imaging conditions such as motion blur. In this case, the motion blur is modelled by a convolutional kernel realizing the linear operator \(\mathcal {B}\) (see Fig. 2). Figure 12 shows our \({\times }2\) SR result for a low-resolution LF input degraded by a 45° motion blur. The blur kernel is shown in the top-left corner of Fig. 12(a), and two zoomed-in regions of the bi-cubic upsampling of the degraded low-resolution SAI are shown in Fig. 12(c). Taking 25 SAIs as input to our reconstruction algorithm, we achieve more than a 3 dB improvement in PSNR after a single ADMM iteration. The high-resolution LF image is well reconstructed after the \(10^{th}\) ADMM iteration, with clear texture information and the motion blur effectively removed.
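For illustration, a line-shaped point spread function of this kind can be generated and applied as follows; the kernel size and sampling density are assumptions, not the exact degradation used in Fig. 12.

```python
import numpy as np
from scipy.ndimage import convolve

def motion_blur_kernel(length=9, angle_deg=45.0):
    """Line-shaped PSF approximating linear motion blur (illustrative).
    The normalized kernel realizes the linear operator B as a 2D
    convolution."""
    k = np.zeros((length, length))
    c = (length - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-c, c, 4 * length):   # dense sampling along the line
        y = int(round(c + t * np.sin(theta)))
        x = int(round(c + t * np.cos(theta)))
        k[y, x] = 1.0
    return k / k.sum()

# Degrade a toy SAI: this is the forward model the ADMM solver inverts.
sai = np.random.rand(64, 64)
blurred = convolve(sai, motion_blur_kernel(), mode='nearest')
```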

Table 1 Quantitative comparison of LFSR approaches under various mixed noise settings
Fig. 13

Visual comparisons of LFSR approaches under various mixed noise settings. From top to bottom (‘scene’ - \(\sigma\)/\(\nu\)): ‘Rooster-clock’ - 20/0; ‘Coffee-beans-vases’ - 20/5; ‘Smiling-crowd’ - 50/0; ‘Dishes’ - 50/20

7.2 Comparison to LFSR approaches

In this section, we evaluate the performance of the proposed method under severe mixed noise conditions and compare it to state-of-the-art approaches (i.e., resLF [10], DRLF [25], and 3DVSR [11]), which currently provide the best performance in reconstructing high-resolution LF images. To the best of our knowledge, only DRLF [25] supports LFSR with noisy input. For the evaluation, we randomly select five scenes from the Inria LF dataset [19]. For each scene, we generate a low-resolution LF (\({\times }2\)) and inject noise with four configurations: (\(\sigma \texttt {=}20,\nu \texttt {=}0\%\)), (\(\sigma \texttt {=}20,\nu \texttt {=}5\%\)), (\(\sigma \texttt {=}50\), \(\nu \texttt {=}0\%\)), and (\(\sigma \texttt {=}50,\nu \texttt {=}20\%\)). The Gaussian noise levels are chosen to match the pre-trained weights published by DRLF: DRLF requires separate training for each noise condition, and pre-trained weights are available only for the three Gaussian settings \(\sigma \texttt {=}10\), \(\sigma \texttt {=}20\), and \(\sigma \texttt {=}50\). In addition, DRLF does not process noisy LR inputs directly; it provides separate networks for de-noising and super-resolution. We therefore first applied its de-noising network to the noisy LR inputs and then applied its SR network to the de-noised LR outputs. In this way, we are also able to evaluate the other two state-of-the-art LFSR approaches (i.e., resLF [10], 3DVSR [11]) using the de-noised LR output from DRLF.
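A minimal sketch of this degradation is given below, assuming a salt-and-pepper model for the impulse component with density \(\nu\) and additive Gaussian noise with standard deviation \(\sigma\) on the 8-bit range; the exact impulse model used in the evaluation may differ in detail.

```python
import numpy as np

def add_mixed_noise(img, sigma, nu, rng=None):
    """Degrade an 8-bit image with additive Gaussian noise (std sigma)
    and salt-and-pepper impulse noise hitting a fraction nu of the
    pixels (illustrative mixed Gaussian-impulse model)."""
    rng = rng or np.random.default_rng(0)
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    impulse = rng.random(img.shape) < nu      # pixels hit by impulse noise
    salt = rng.random(img.shape) < 0.5        # half salt, half pepper
    noisy[impulse & salt] = 255.0
    noisy[impulse & ~salt] = 0.0
    return np.clip(noisy, 0.0, 255.0).astype(np.uint8)

# The four test settings of this section:
for sigma, nu in [(20, 0.00), (20, 0.05), (50, 0.00), (50, 0.20)]:
    lr = np.full((32, 32), 128, dtype=np.uint8)   # toy LR view
    print(sigma, nu, add_mixed_noise(lr, sigma, nu).std())
```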

The experimental results are reported in Table 1 and visualized in Fig. 13. For resLF and 3DVSR, which do not support noisy LF input, we generate a de-noised LF with DRLF and use it as input; these results are denoted as De+resLF and De+3DVSR, respectively. For all noise settings, our approach provides the best reconstruction quality in terms of PSNR. For the mixed noise settings, the proposed method also achieves the highest average SSIM score among all approaches. These high scores are attributable to the robustness of the proposed model, in which de-noising and super-resolution are resolved jointly. Without de-noising, resLF and 3DVSR fail completely to reconstruct a good-quality HR image: they up-scale not only the texture but also the noise, so their scores are even worse than those of the bi-cubic upsampling approach, in which the noise is blurred out. From Fig. 13, it is evident that the HR images reconstructed by the other approaches are over-smoothed, while our approach preserves the texture content and high-frequency information well, e.g., the object edges in the ‘Dishes’ scene and the background pattern in the ‘Smiling-crowd’ scene. Since DRLF supports only Gaussian noise, it fails to recognize impulse noise in the LR input. The impulse noise is either ignored, when the Gaussian noise level is low, or mistreated, in a severe Gaussian noise setting. Consequently, the reconstructed HR images exhibit noise traces, e.g., in the ‘Coffee-beans-vases’ scene, or lose texture detail, e.g., the flower bud in the ‘Dishes’ scene.

7.3 Comparison to GPU-accelerated approach

As discussed in Sect. 3.1, the proposed framework shares a similar setup with multi-frame super-resolution (MISR) and can indeed be applied to this kind of problem as well. To evaluate the performance of our accelerated framework, we conducted an experiment on the natural image dataset DIV8K [20] and compared it to recent related work in the field (FL-MISR [17]). We follow the experimental setup described in [17] to prepare the low-resolution images and perform the HR reconstruction with our accelerated solver. Specifically, we select seven images from the DIV8K dataset and generate, for each of them, four LR images for \({\times }2\) SR and nine LR images for \({\times }3\) SR. The sub-pixel shifts of the \({\times }2\) and \({\times }3\) image sets are \(\frac{1}{2}\) px and \(\frac{1}{3}\) px, respectively, and the Gaussian noise is configured with \(\sigma =1\). Since FL-MISR uses an \(\ell ^1\) data fidelity and BTV regularization in its model, we turn off our \(\ell ^2\) term and configure the nonlocal weighting (i.e., \(W_{\varvec{d}}\)) to match the BTV condition. The accelerated ADMM iterative solver is then executed to minimize the cost function in Eq. 17. For a fair comparison, we stop our iterative solver as soon as the quality of the reconstructed image is comparable to FL-MISR and measure the execution time. Quantitative evaluation results are listed in Table 2, and a visual comparison is given in Fig. 14. From the table, it is evident that our GPU accelerated solver outperforms FL-MISR in processing speed for all test cases while providing a better reconstruction quality. Compared to FL-MISR, our GPU-based solver achieves an average speed-up of 2.46\({\times }\) and 1.57\({\times }\) for \({\times }2\) and \({\times }3\) up-scaling, respectively. This performance boost is attributable to the effectiveness of the ADMM solver and the realization strategy of the transformation matrices (\(A_k, S_{\varvec{d}}\)). In contrast to FL-MISR, which implements \(A_k, S_{\varvec{d}}\) with sparse matrices, our approach takes advantage of linear functions (i.e., \(\mathcal {W}, \mathcal {B}, \mathcal {D}\)) to optimize GPU memory and computational resources. As a result, our GPU-based solver fits within a single GTX 1080Ti GPU, while FL-MISR needs four of them to solve the same problem.
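A minimal sketch of this LR generation step is given below; decimation with integer HR offsets is an assumption that reproduces the \(\frac{1}{2}\) px and \(\frac{1}{3}\) px offsets, while the exact anti-aliasing chain of [17] may differ.

```python
import numpy as np

def make_lr_set(hr, scale, sigma=1.0, rng=None):
    """Generate scale*scale sub-pixel-shifted LR images from one HR
    image (sketch of the MISR test setup). A one-HR-pixel offset
    corresponds to a 1/scale-pixel shift on the LR grid."""
    rng = rng or np.random.default_rng(0)
    lr_set = []
    for dy in range(scale):
        for dx in range(scale):
            lr = hr[dy::scale, dx::scale].astype(np.float64)
            lr += rng.normal(0.0, sigma, lr.shape)  # sigma = 1 as in the setup
            lr_set.append(lr)
    return lr_set

hr = np.random.rand(256, 256) * 255.0
assert len(make_lr_set(hr, 2)) == 4   # x2 SR: four LR images, 1/2 px apart
assert len(make_lr_set(hr, 3)) == 9   # x3 SR: nine LR images, 1/3 px apart
```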

Table 2 Evaluation of parallel computing approach for MISR problem on 8-bit natural images in DIV8K dataset
Fig. 14

HR reconstruction results of DIV8K dataset [20]. Top \({\times }2\) results of image 0002; bottom \({\times }3\) results of image 0084

7.4 Performance analysis of OpenCL-based solvers

Fig. 15

Cumulative execution time and speed-up of three realization strategies

Fig. 16

Execution time of the GPU-based solver for different numbers of inputs on various OpenCL platforms

To analyze the performance improvement of the proposed GPU accelerated approach, we evaluate three realization strategies of the ADMM iterative solver. Figure 15 reports the cumulative execution time of the three GPU implementations. The initial GPU implementation (buf) serves as the baseline; it uses 1D buffer objects to hold variables and input data in GPU global memory. In the second implementation, denoted as i2d, the 1D buffer objects are replaced by Image2D objects. This allows us to use the texture cache provided by the GPU architecture to speed up access to image-like data. The third implementation, denoted as \(i2d\_local\), additionally takes advantage of local memory for buffering and sharing data within a work-group. Since local memory is close to the compute units, it provides a high-speed data pool for kernel tasks that frequently access multiple neighbouring pixels (e.g., blurring, warping). For this experiment, we use \(5{\times }5\) angular views as input for \({\times }4\) SR to a spatial resolution of \(512{\times }512\). The numbers of ADMM and CG iterations are set to 10 and 5, respectively. The execution time of the ADMM solver is divided into three parts. The io part covers the time for transferring input data from CPU memory to GPU global memory and for reading the reconstructed HR image back from GPU to CPU memory. The \(wz\)-step part represents the computation time for updating \(\varvec{w}\) and \(\varvec{z}\) in an ADMM iteration, while the \(x\)-step part measures the time to solve for \(\varvec{x}\) with the conjugate gradient technique, see Fig. 4. As can be seen from Fig. 15, the io part accounts for only a small fraction of the overall execution time, while most of the time is spent on the \(x\)-step and \(wz\)-step. Compared to the buf version, the texture cache provided by the Image2D objects in i2d shortens the computation time of the ADMM solver by a factor of 1.2\(\times\). The local-memory sharing technique further speeds up the computation by a factor of 1.8\(\times\).
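A minimal pyopencl sketch of the \(i2d\_local\) strategy is given below for a 3\({\times }\)3 box blur, a stand-in for the neighbour-access kernels of the actual solver; the kernel name, tile size, and 16\({\times }\)16 work-group size are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
import pyopencl as cl

# i2d_local-style kernel: the work-group cooperatively stages an
# 18x18 tile (16x16 pixels + 1-pixel halo) of the Image2D into local
# memory, then each work-item filters from the fast on-chip tile.
SRC = r"""
__kernel void blur3x3(__read_only image2d_t src, __write_only image2d_t dst)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                          CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
    __local float tile[18][18];
    int lid = get_local_id(1) * get_local_size(0) + get_local_id(0);
    int nthreads = get_local_size(0) * get_local_size(1);
    int ox = get_group_id(0) * get_local_size(0) - 1;  // tile origin incl. halo
    int oy = get_group_id(1) * get_local_size(1) - 1;
    for (int t = lid; t < 18 * 18; t += nthreads) {    // cooperative load
        int tx = t % 18, ty = t / 18;
        tile[ty][tx] = read_imagef(src, smp, (int2)(ox + tx, oy + ty)).x;
    }
    barrier(CLK_LOCAL_MEM_FENCE);                      // tile is complete
    int lx = get_local_id(0) + 1, ly = get_local_id(1) + 1;
    float acc = 0.0f;
    for (int j = -1; j <= 1; ++j)
        for (int i = -1; i <= 1; ++i)
            acc += tile[ly + j][lx + i];
    write_imagef(dst, (int2)(get_global_id(0), get_global_id(1)),
                 (float4)(acc / 9.0f, 0.0f, 0.0f, 0.0f));
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, SRC).build()

img = np.random.rand(512, 512).astype(np.float32)
src = cl.image_from_array(ctx, img, num_channels=1)
fmt = cl.ImageFormat(cl.channel_order.R, cl.channel_type.FLOAT)
dst = cl.Image(ctx, cl.mem_flags.WRITE_ONLY, fmt, shape=(512, 512))

prog.blur3x3(queue, (512, 512), (16, 16), src, dst)
out = np.empty_like(img)
cl.enqueue_copy(queue, out, dst, origin=(0, 0), region=(512, 512))
```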

An advantage of using the OpenCL framework is that the accelerated solver can be executed on various platforms. Figure 16 shows the execution time of \(i2d\_local\) on different GPU platforms. In this test, we vary the number of input LR images over 9, 25, 49, and 81, denoted as \(3\times 3\), \(5\times 5\), \(7\times 7\), and \(9\times 9\), respectively. The regularization window size is configured to \(5{\times }5\), and the number of conjugate gradient steps is set to 5. For each case, we measure the execution time of a single ADMM iteration and compare it to the CPU implementation executed on an Intel i7-5820K at 3.30 GHz. In general, we observed a higher speed-up over the CPU execution when more input images are provided: the speed-up ranges from 23\(\times\) to 40\(\times\) for \(3\times 3\) inputs and from 43\(\times\) to 77\(\times\) for \(9\times 9\) inputs.
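As an illustration of this portability, a few lines of pyopencl suffice to enumerate the platforms and devices on which the same kernel source can be built and executed; the printed attributes are examples.

```python
import pyopencl as cl

# List every OpenCL platform/device the i2d_local solver can target;
# the same kernel source is compiled per device at runtime.
for platform in cl.get_platforms():
    for dev in platform.get_devices():
        print(f"{platform.name}: {dev.name} "
              f"({dev.max_compute_units} CUs, "
              f"{dev.local_mem_size // 1024} KiB local memory)")
```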

8 Conclusion

This paper presents a GPU-accelerated computational framework for reconstructing high-resolution SAIs from 4D LF data under mixed Gaussian-Impulse noise conditions. The proposed SR model, derived from a statistical perspective, takes advantage of a joint \(\ell ^1\)-\(\ell ^2\) data fidelity term for dealing with mixed noise and of a weighted nonlocal total variation regularizer for enforcing the LF image prior. Our approach combines the de-noising effect and SR reconstruction into a single optimization problem which, as shown in the experimental results, allows us to surpass the current state-of-the-art approaches, in which the de-noising and SR problems are resolved separately. The non-smooth convex optimization problem resulting from the proposed SR model is effectively solved by the ADMM algorithm. By transforming the minimization of the \(\ell ^1\)-\(\ell ^2\)-\(\ell ^1\) mixture cost function into least-squares approximation and proximal operator problems, ADMM overcomes the main problem of the gradient descent technique, namely finding a suitable step size. We showed that GPU acceleration is well suited to speeding up the iterative solving process. To verify the robustness of the proposed SR model and evaluate the performance of the accelerated optimizer, extensive experiments were conducted on a synthetic 4D LF dataset and a high-resolution natural image dataset. The experimental results show that the proposed approach outperforms previous work in accelerating the super-resolution task and in optimizing GPU resources. While providing a better reconstruction quality, our accelerated framework achieves an average speed-up of 2.46\({\times }\) and 1.57\({\times }\) for \({\times }2\) and \({\times }3\) SR tasks, respectively, and a speed-up of 77\({\times }\) compared to the CPU implementation.

The proposed approach encourages further research on both the algorithmic and the computing-architecture level. In the first direction, we would extend the SR model to handle more challenging noise settings, e.g., photon noise, which follows a Poisson distribution. Solving such a problem would require a new ADMM decomposition strategy for the resulting non-convex, non-smooth optimization problem. In the second direction, the iterative solving process could be realized on a field-programmable gate array (FPGA) platform, which promises a much higher processing speed and much lower energy consumption than a GPU. For this task, the main challenges lie in the hardware realization of the warping function and in the access to 4D LF data.