1 Introduction

Multispectral imaging has been widely applied in various fields, including biomedicine [1], remote sensing [2], and color reproduction [3]. Multispectral imaging can achieve high spectral resolution, but it offers lower spatial resolution than general RGB cameras. The objective of this work is to reconstruct a high-resolution (HR) multispectral image by fusing a low-resolution (LR) multispectral image and an HR RGB image of the same scene.

The fusion of multispectral and RGB images can be conveniently formulated in the Bayesian inference framework. The work [4] estimates the signal-dependent noise statistics to generate the conditional probability distribution of the acquired images, which makes the reconstruction robust to noise corruption. However, extracting auxiliary information in the Bayesian framework requires additional calculations and reduces the reconstruction efficiency to some degree.

Matrix factorization has been widely employed in image fusion. As spectral bands are highly correlated, principal component analysis (PCA) is used in [5] to decompose the image data. By adopting the coupled nonnegative matrix factorization criterion, the spectral unmixing principle is employed in [6] to unmix the hyperspectral and multispectral images in a coupled fashion. Meanwhile, tensor factorization has the potential to fully exploit the inherent spatial-spectral structures during image fusion. The work [7] incorporates the non-local spatial self-similarity into sparse tensor factorization and casts the image fusion problem as estimating a sparse core tensor and dictionaries of three modes.

Regularization techniques can be employed to produce a reasonable approximate solution when the fusion problem is ill-posed. The HySure algorithm [8] uses vector total variation as an edge-preserving regularizer to promote a piecewise-smooth solution. The NSSR algorithm [9] uses a clustering-based regularizer to exploit the spatial correlations among local and nonlocal similar pixels. The regularized problem is usually solved through iteration. To decrease the computational complexity, the R-FUSE algorithm [10] derives a robust and efficient solution to the regularized image fusion problem based on a generalized Sylvester equation. In addition, the work [11] explores the properties of the decimation matrix and derives an analytical solution for the \(\ell _2\) norm regularized super-resolution problem.

Deep learning presents new solutions for the multispectral image super-resolution. The work [12] learns a mapping function between LR and HR images by training a deep neural network with the modified sparse denoising autoencoder. PanNet [13] has the ability to preserve both the spectral and spatial information during the learning process, as its network parameters are trained on the high-pass components of the PAN and upsampled LR multispectral images.

Inspired by the above works, this paper proposes a super-resolution algorithm to reconstruct the target HR multispectral data via structure-guided RGB image fusion. In the algorithm, the spatial and spectral degradation models are used to fit the acquired image data. An edge-preserving regularizer, which is in the form of directional total variation (dTV) [14], is used to guide the image reconstruction. It is based on the reasonable assumption that the spectral images and RGB image share not only the edge location but also the edge direction. To avoid the singularity induced by spectral dependence, the reconstruction is performed on a subspace of the LR multispectral image. The fusion problem is finally solved by the alternating direction method of multipliers (ADMM) algorithm [15] through iteration. The solutions of subproblems are in closed-form and can be accelerated in frequency domain.

The main contributions of this paper include: (1) the image fusion accuracy is improved by guiding the recovered edge structure in accordance with that of the RGB image, and (2) the image fusion efficiency is improved by solving the subproblems in closed form and accelerating the solutions in the frequency domain. These make the proposed algorithm more suitable for practical applications.

2 Problem Formulation

The acquired LR multispectral image is denoted as \(\widetilde{\mathbf {Y}}\in \mathbb {R}^{m\times n\times L}\), where \(m\times n\) is the spatial resolution and L is the number of spectral bands. The acquired HR RGB image \(\widetilde{\mathbf {Z}}\in \mathbb {R}^{M\times N\times 3}\) has the spatial resolution \(M\times N\). Denoting the scale factor of resolution improvement by d, the spatial dimensions are related by \(M = m\times d\) and \(N = n\times d\). The goal of super-resolution is to estimate the HR multispectral image \(\widetilde{\mathbf {X}}\in \mathbb {R}^{M\times N\times L}\) by fusing \(\widetilde{\mathbf {Y}}\) and \(\widetilde{\mathbf {Z}}\).

2.1 Observation Model

By indexing pixels in lexicographic order, the image cubes \(\widetilde{\mathbf {Y}}\), \(\widetilde{\mathbf {Z}}\) and \(\widetilde{\mathbf {X}}\) can be represented by matrices \(\mathbf {Y}\in \mathbb {R}^{L\times mn}\), \(\mathbf {Z}\in \mathbb {R}^{3\times MN}\) and \(\mathbf {X}\in \mathbb {R}^{L\times MN}\), respectively. The row vectors of these matrices are the vectorized band images. With this treatment, the spatial degradation model can be constructed as

$$\begin{aligned} \mathbf {Y} = \mathbf {X}\mathbf {BS}, \end{aligned}$$
(1)

where matrix \(\mathbf {B}\in \mathbb {R}^{MN\times MN}\) is a spatial blurring matrix representing the point spread function (PSF) of the multispectral sensor in the spatial domain of \(\mathbf {X}\). The blur is assumed to act under circular boundary conditions. Matrix \(\mathbf {S}\in \mathbb {R}^{MN\times mn}\) accounts for a uniform downsampling of the image with scale factor d.

The spectral degradation model can be formulated as

$$\begin{aligned} \mathbf {Z} = \mathbf {RX}, \end{aligned}$$
(2)

where matrix \(\mathbf {R}\in \mathbb {R}^{3\times L}\) denotes the spectral sensitivity function (SSF) and holds in its rows the spectral responses of the RGB camera.
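The two degradation models (1) and (2) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the row-major pixel ordering, and the choice of keeping every d-th pixel starting at the first one are assumptions made here for illustration. The PSF is zero-padded without re-centering, so the blurred image is shifted by the PSF offset.

```python
import numpy as np

def blur_circular(X, psf, M, N):
    """Right-multiply by B: blur every band (row of X) with circular boundaries."""
    L = X.shape[0]
    cube = X.reshape(L, M, N)                 # assumed row-major pixel ordering
    otf = np.fft.fft2(psf, s=(M, N))          # zero-pad the PSF to the image size
    blurred = np.fft.ifft2(np.fft.fft2(cube, axes=(1, 2)) * otf, axes=(1, 2))
    return np.real(blurred).reshape(L, M * N)

def downsample(X, M, N, d):
    """Right-multiply by S: keep every d-th pixel in both spatial directions."""
    L = X.shape[0]
    return X.reshape(L, M, N)[:, ::d, ::d].reshape(L, -1)

def degrade(X, R, psf, M, N, d):
    """Simulate Y = X B S (Eq. 1) and Z = R X (Eq. 2)."""
    Y = downsample(blur_circular(X, psf, M, N), M, N, d)
    Z = R @ X
    return Y, Z
```

With `X` of shape `(L, M*N)` and `R` of shape `(3, L)`, `degrade` returns `Y` of shape `(L, m*n)` and `Z` of shape `(3, M*N)`.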

2.2 Edge-Preserving Regularizer

A regularizer, which is in the form of dTV [14], is used to preserve both the location and direction of image edges during the super-resolution procedure. It is based on a priori knowledge that the RGB image and spectral images are likely to show very similar edge structures.

Fig. 1. Demonstration of edge structure preserving effect by the proposed algorithm. From left to right: An HR image region and its edge structure, real band image at band 420 nm, reconstructed band image using R-FUSE [10], reconstructed band image using the proposed algorithm. The spatial resolution is improved by 16\(\times \).

The edge-preserving dTV regularizer is formulated as

$$\begin{aligned} \begin{aligned} \mathrm {dTV}(\mathbf {XD}_x, \mathbf {XD}_y)=&~||\mathbf {XD}_x-\left[ \mathbf {G}_x\odot (\mathbf {XD}_x)+\mathbf {G}_y\odot (\mathbf {XD}_y)\right] \odot \mathbf {G}_x||_1\\ +&\, ||\mathbf {XD}_y-\left[ \mathbf {G}_x\odot (\mathbf {XD}_x)+\mathbf {G}_y\odot (\mathbf {XD}_y)\right] \odot \mathbf {G}_y||_1, \end{aligned} \end{aligned}$$
(3)

where \(\odot \) and \(||\cdot ||_1\) denote the Hadamard product and element-wise \(\ell _1\) norm, respectively. Matrices \(\mathbf {D}_x\) and \(\mathbf {D}_y\) represent the first-order horizontal and vertical derivative matrices under circular boundary conditions. Matrices \(\mathbf {G}_x\) and \(\mathbf {G}_y\) denote the normalized horizontal and vertical gradient components of the RGB image \(\mathbf {Z}\), which can be computed in advance as

$$\begin{aligned} \mathbf {G}_* = \frac{\mathbf {f}(\mathbf {ZD}_*)}{\sqrt{\mathbf {f}(\mathbf {ZD}_x)\odot \mathbf {f}(\mathbf {ZD}_x)+\mathbf {f}(\mathbf {ZD}_y)\odot \mathbf {f}(\mathbf {ZD}_y)+\eta ^2}}, \quad *:= x, y \end{aligned}$$

where \(\cdot /\cdot \) and \(\sqrt{\cdot }\) are element-wise division and square root operators. The grayscale conversion function \(\mathbf {f}(\cdot )\) integrates image gradient information across the visible spectrum. Constant \(\eta \) adjusts the relative magnitude of edges and is set to 0.01 in this work. Through the regulating effect of Eq. (3), the component of the reconstructed gradient that is orthogonal to the RGB gradient at the same edge location is penalized. Thus the reconstructed image \(\mathbf {X}\) tends to share the same edge direction with the RGB image \(\mathbf {Z}\). Meanwhile, the noise of the reconstructed image is suppressed in flat areas, since \(\mathbf {G}_x\) and \(\mathbf {G}_y\) vanish there and Eq. (3) reduces to total variation. Figure 1 shows that the proposed algorithm keeps the edge structure of the reconstructed band image consistent with that of the RGB image, and also suppresses the band image noise. In comparison, the R-FUSE [10] algorithm, which is based on dictionary learning and sparse representation, fails to recover the edge structure.
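The behavior of the regularizer in Eq. (3) can be checked numerically with a short sketch. It assumes forward differences with wrap-around for \(\mathbf {D}_x\) and \(\mathbf {D}_y\) and a plain channel average for \(\mathbf {f}(\cdot )\); the paper does not specify either choice, and the function names are hypothetical.

```python
import numpy as np

def dx(img):
    """Horizontal forward difference with circular boundary (assumed form of D_x)."""
    return np.roll(img, -1, axis=-1) - img

def dy(img):
    """Vertical forward difference with circular boundary (assumed form of D_y)."""
    return np.roll(img, -1, axis=-2) - img

def guidance(rgb, eta=0.01):
    """Normalized gradient fields G_x, G_y of an RGB cube of shape (3, M, N)."""
    fx = dx(rgb).mean(axis=0)          # f(.) taken as a channel average (assumption)
    fy = dy(rgb).mean(axis=0)
    norm = np.sqrt(fx * fx + fy * fy + eta ** 2)
    return fx / norm, fy / norm

def dtv(cube, Gx, Gy):
    """Directional total variation of Eq. (3) for a cube of shape (L, M, N)."""
    ux, uy = dx(cube), dy(cube)
    proj = Gx * ux + Gy * uy           # gradient component along the guide direction
    return np.abs(ux - proj * Gx).sum() + np.abs(uy - proj * Gy).sum()
```

For a constant guide image the fields `Gx` and `Gy` are zero and `dtv` equals the anisotropic total variation; for an image whose gradients are aligned with the guide, the penalty is smaller than its total variation.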

2.3 Optimization Problem

The target HR multispectral image \(\mathbf {X}\) usually lives in a linear subspace, i.e.,

$$\begin{aligned} \mathbf {X} = \mathbf {\Psi }\mathbf {C}, \end{aligned}$$
(4)

where matrix \(\mathbf {\Psi }\in \mathbb {R}^{L\times K_{\Psi }}\) is the subspace basis that can be obtained in advance by applying PCA on the LR multispectral image \(\mathbf {Y}\), and the dimension \(K_{\Psi }\) is set to 10 in this work. Matrix \(\mathbf {C}\in \mathbb {R}^{K_{\Psi }\times MN}\) holds the corresponding projection coefficients of \(\mathbf {X}\).
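One possible way to obtain such a basis is a truncated singular value decomposition of \(\mathbf {Y}\). The sketch below is an assumption about the PCA step, not the authors' code: it skips mean-centering so that \(\mathbf {X} = \mathbf {\Psi }\mathbf {C}\) holds without a separate mean term, and `pca_basis` is a hypothetical name.

```python
import numpy as np

def pca_basis(Y, k=10):
    """Return an L x k orthonormal basis from the top-k left singular vectors of Y.

    Y has shape (L, m*n): one row per spectral band, one column per LR pixel.
    """
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k]
```

Because the columns of the returned basis are orthonormal, \(\mathbf {\Psi }^\mathsf {H}\mathbf {\Psi }\) is the identity and the coefficients of any spectrum in the subspace are given by `Psi.T @ Y`.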

In this case, based on the degradation models and the proposed regularizer, the reconstruction problem is converted to estimating the unknown coefficient matrix \(\mathbf {C}\) from the following optimization problem

$$\begin{aligned} \mathbf {C}= \arg \min _\mathbf {C}\;\frac{1}{2}||\mathbf {Y} - \mathbf {\Psi CBS}||_F^2 + \frac{\beta }{2}||\mathbf {Z} - \mathbf {R\Psi C}||_F^2 + \gamma \mathrm {dTV}(\mathbf {\Psi CD}_x,\mathbf {\Psi CD}_y), \end{aligned}$$
(5)

where \(\beta \) and \(\gamma \) are weighting and regularization parameters, respectively, and \(||\,.\,||_F\) denotes the Frobenius norm.

3 Optimization Method

Due to the nature of dTV regularizer, which is nonquadratic and nonsmooth, the ADMM algorithm [15] is employed to solve problem (5) through the variable splitting technique. Each subproblem can be efficiently solved.

3.1 ADMM for Problem (5)

By introducing five auxiliary variables, the original problem (5) is reformulated as

$$\begin{aligned} \begin{aligned} \min \quad&\frac{1}{2}||\mathbf {Y} - \mathbf {\Psi CB}\mathbf {S}||_F^2 + \frac{\beta }{2}||\mathbf {Z} - \mathbf {R\Psi V}_1||_F^2 + \gamma \left\{ ||\mathbf {V}_2||_1 + ||\mathbf {V}_3||_1 \right\} _\mathrm {dTV} \\ \mathrm {s.t.}\quad&\mathbf {V}_1 = \mathbf {C},\\&\mathbf {V}_2 = \mathbf {V}_x - \left( \mathbf {G}_x\odot \mathbf {V}_x + \mathbf {G}_y\odot \mathbf {V}_y\right) \odot \mathbf {G}_x,\;\mathbf {V}_x = \mathbf {\Psi }\mathbf {CD}_x,\\&\mathbf {V}_3 = \mathbf {V}_y - \left( \mathbf {G}_x\odot \mathbf {V}_x + \mathbf {G}_y\odot \mathbf {V}_y\right) \odot \mathbf {G}_y,\;\mathbf {V}_y = \mathbf {\Psi CD}_y. \end{aligned} \end{aligned}$$
(6)

The auxiliary variable \(\mathbf {V}_1\) helps bypass singularity. The auxiliary variables \(\mathbf {V}_2\) and \(\mathbf {V}_3\) help generate closed-form solutions associated with the dTV regularizer. The auxiliary variables \(\mathbf {V}_x\) and \(\mathbf {V}_y\) help compute the coefficient matrix \(\mathbf {C}\) in frequency domain. Problem (6) has the following augmented Lagrangian

$$\begin{aligned} \begin{aligned}&\;\min \;\mathcal {L}_\rho (\mathbf {C},\mathbf {V}_1,\mathbf {V}_2,\mathbf {V}_3,\mathbf {V}_x,\mathbf {V}_y,\mathbf {A}_1,\mathbf {A}_2,\mathbf {A}_3, \mathbf {A}_x, \mathbf {A}_y)\\&=\,\frac{1}{2}||\mathbf {Y} - \mathbf {\Psi CBS}||_F^2 + \frac{\beta }{2}||\mathbf {Z} - \mathbf {R\Psi V}_1||_F^2 + \frac{\rho }{2}||\mathbf {C - V}_1 - \mathbf {A}_1||_F^2\\&+\,\gamma ||\mathbf {V}_2||_1 + \frac{\rho }{2}||\left[ \mathbf {V}_x - \left( \mathbf {G}_x\odot \mathbf {V}_x + \mathbf {G}_y\odot \mathbf {V}_y\right) \odot \mathbf {G}_x\right] - \mathbf {V}_2 - \mathbf {A}_2||_F^2\\&+\,\gamma ||\mathbf {V}_3||_1 + \frac{\rho }{2}||\left[ \mathbf {V}_y - \left( \mathbf {G}_x\odot \mathbf {V}_x + \mathbf {G}_y\odot \mathbf {V}_y\right) \odot \mathbf {G}_y\right] - \mathbf {V}_3 - \mathbf {A}_3||_F^2\\&+\,\frac{\rho }{2}||\mathbf {\Psi CD}_x - \mathbf {V}_x - \mathbf {A}_x||_F^2 + \frac{\rho }{2}||\mathbf {\Psi CD}_y - \mathbf {V}_y - \mathbf {A}_y||_F^2, \end{aligned} \end{aligned}$$
(7)

where matrices \(\mathbf {A}_1\), \(\mathbf {A}_2\), \(\mathbf {A}_3\), \(\mathbf {A}_x\), \(\mathbf {A}_y\) represent five scaled dual variables, and \(\rho \) denotes the penalty parameter.

The variables in (7) are solved through iteration. The subproblem of the coefficient matrix \(\mathbf {C}^{j+1}\) can be minimized efficiently in the frequency domain, which is detailed in Subsect. 3.2.

The auxiliary variable \(\mathbf {V}_1\) has the following closed-form solution of an unconstrained least squares problem

$$\begin{aligned} \mathbf {V}_{1}^{j+1} = \left( \beta (\mathbf {R\Psi })^\mathsf {H}(\mathbf {R\Psi }) + \rho \mathbf {I}\right) ^{-1}\left( \beta (\mathbf {R\Psi })^\mathsf {H}\mathbf {Z} + \rho (\mathbf {C}^{j+1} - \mathbf {A}_{1}^{j})\right) , \end{aligned}$$
(8)

where \((\cdot )^\mathsf {H}\) denotes the matrix conjugate transpose and \(\mathbf {I}\) represents the identity matrix with proper dimensions.
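The update (8) is a small \(K_{\Psi }\times K_{\Psi }\) linear solve. A minimal sketch, assuming real-valued data so that the conjugate transpose reduces to a transpose (the function name is hypothetical):

```python
import numpy as np

def update_v1(C, A1, Z, R, Psi, beta, rho):
    """Closed-form V1 update of Eq. (8) for real-valued data."""
    RP = R @ Psi                                        # 3 x K matrix R * Psi
    lhs = beta * RP.T @ RP + rho * np.eye(RP.shape[1])  # K x K, positive definite for rho > 0
    rhs = beta * RP.T @ Z + rho * (C - A1)
    return np.linalg.solve(lhs, rhs)
```

Solving the system with `np.linalg.solve` avoids forming the explicit inverse; since the left-hand matrix does not change across iterations, it could also be factorized once and reused.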

Using the soft shrinkage operator, the minimization problems involving \(\mathbf {V}_2\) and \(\mathbf {V}_3\) have the analytical solutions

$$\begin{aligned} \begin{aligned}&\mathbf {V}_{2}^{j+1} = \mathsf {shrink}\left\{ \left[ \mathbf {V}_x^{j} - \left( \mathbf {G}_x\odot \mathbf {V}_x^{j} +\mathbf {G}_y\odot \mathbf {V}_y^{j}\right) \odot \mathbf {G}_x\right] - \mathbf {A}_{2}^{j}, \; {\gamma }/{\rho }\right\} , \\&\mathbf {V}_{3}^{j+1} = \mathsf {shrink}\left\{ \left[ \mathbf {V}_y^{j} -\left( \mathbf {G}_x\odot \mathbf {V}_x^{j} +\mathbf {G}_y\odot \mathbf {V}_y^{j}\right) \odot \mathbf {G}_y\right] - \mathbf {A}_{3}^{j}, \; {\gamma }/{\rho }\right\} , \end{aligned} \end{aligned}$$
(9)

where \(\mathsf {shrink}\left\{ y,\kappa \right\} := \mathsf {sgn}(y)\cdot \mathsf {max}(|y| - \kappa , 0)\), with the sign and maximum functions denoted by \(\mathsf {sgn}(\cdot )\) and \(\mathsf {max}(\cdot ,\cdot )\) respectively.
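The shrinkage operator acts element-wise and maps directly onto array operations. A minimal sketch:

```python
import numpy as np

def shrink(y, kappa):
    """Element-wise soft shrinkage: sgn(y) * max(|y| - kappa, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - kappa, 0.0)
```

In the updates (9), the threshold `kappa` is \(\gamma /\rho \), so entries whose magnitude is below \(\gamma /\rho \) are set to zero and the rest are pulled toward zero by that amount.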

Under the definitions of the Hadamard product and the Frobenius norm, every matrix element of \(\mathbf {V}_x^{j+1}\) and \(\mathbf {V}_y^{j+1}\) can be solved independently by minimizing a simple quadratic function. The solution details are omitted for the sake of simplicity.
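Since the details are omitted, the following is only one way the per-element solve could be written, derived here from the terms of (7) that contain \(\mathbf {V}_x\) and \(\mathbf {V}_y\). For each element, with \(g = (G_x, G_y)^\mathsf {T}\) and \(M = I - gg^\mathsf {T}\), those terms read \(||Mv - b||^2 + ||v - c||^2\) up to the common factor \(\rho /2\), whose minimizer satisfies \((M^2 + I)v = Mb + c\). The function name, the argument names `Px`, `Py` (standing for \(\mathbf {\Psi CD}_x\), \(\mathbf {\Psi CD}_y\)) and the choice of which iterates to pass in are assumptions.

```python
import numpy as np

def update_vx_vy(Px, Py, V2, V3, A2, A3, Ax, Ay, Gx, Gy):
    """Element-wise minimizer of the V_x, V_y terms of the augmented Lagrangian (7)."""
    bx, by = V2 + A2, V3 + A3          # targets of the projected-gradient terms
    cx, cy = Px - Ax, Py - Ay          # targets of the coupling terms
    gb = Gx * bx + Gy * by
    rx = bx - Gx * gb + cx             # right-hand side r = M b + c
    ry = by - Gy * gb + cy
    s = Gx ** 2 + Gy ** 2
    alpha = 2.0 - s                    # M^2 + I = 2 I - alpha * g g^T
    gr = Gx * rx + Gy * ry
    t = alpha * gr / (2.0 - alpha * s) # Sherman-Morrison correction; denominator >= 1
    return 0.5 * (rx + Gx * t), 0.5 * (ry + Gy * t)
```

The 2 x 2 system is inverted analytically, so the update costs a handful of element-wise operations per pixel and band.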

Then the scaled dual variables are updated according to the ADMM iterative framework [15]. At the end of the iterations, the target HR image \(\mathbf {X}\) is recovered as \(\mathbf {X} = \mathbf {\Psi }\mathbf {C}\). Algorithm 1 lists the procedure of this reconstruction. For any \(\beta > 0\), \(\gamma > 0\), and \(\rho > 0\), Algorithm 1 converges to a solution of (5), since the functions involved in its ADMM steps are all closed, proper, and convex [15]. Our study reveals that 20 iterations are enough to obtain a satisfactory HR image.

Algorithm 1.

3.2 Solving Coefficient Matrix

By forcing the derivative of the augmented Lagrangian (7) w.r.t. \(\mathbf {C}\) to be zero, an efficient analytical solution can be derived in terms of solving the following Sylvester equation

$$\begin{aligned} \mathbf {C}^{j+1}\mathbf {W}_1+\mathbf {W}_2\mathbf {C}^{j+1}=\mathbf {W}_3, \end{aligned}$$
(10)

where

$$\begin{aligned} \mathbf {W}_1=\mathbf {BS}\mathbf {S}^\mathsf {H}\mathbf {B}^\mathsf {H}+\rho \mathbf {D}_x\mathbf {D}_x^\mathsf {H}+\rho \mathbf {D}_y\mathbf {D}_y^\mathsf {H}, \end{aligned}$$
$$\begin{aligned} \mathbf {W}_2=\rho (\mathbf {\Psi }^\mathsf {H}\mathbf {\Psi })^{-1}, \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \mathbf {W}_3 = (\mathbf {\Psi }^\mathsf {H}\mathbf {\Psi })^{-1}[ \mathbf {\Psi }^\mathsf {H}\mathbf {Y}\mathbf {S}^\mathsf {H}\mathbf {B}^\mathsf {H}+&\rho (\mathbf {V}_1^j + \mathbf {A}_1^j) + \rho \mathbf {\Psi }^\mathsf {H}(\mathbf {V}_x^j + \mathbf {A}_x^j)\mathbf {D}_x^\mathsf {H}\\ +&\rho \mathbf {\Psi }^\mathsf {H}(\mathbf {V}_y^j + \mathbf {A}_y^j)\mathbf {D}_y^\mathsf {H}]. \end{aligned} \end{aligned}$$

Using the eigen-decomposition \(\mathbf {W}_2=\mathbf {Q}\mathbf {\Lambda }\mathbf {Q}^{-1}\) and left-multiplying both sides of (10) by \(\mathbf {Q}^{-1}\) leads to

$$\begin{aligned} \mathbf {\overline{C}W}_1+\mathbf {\Lambda }\mathbf {\overline{C}}=\mathbf {\overline{W}}_3, \end{aligned}$$

where \(\mathbf {\overline{C}} = \mathbf {Q}^{-1}\mathbf {C}^{j+1}\) and \(\mathbf {\overline{W}}_3=\mathbf {Q}^{-1}\mathbf {W}_3\). Thus each row of \(\mathbf {\overline{C}}\) can be solved independently as

$$\begin{aligned} \mathbf {\overline{C}}_i = \mathbf {\overline{W}}_3(\mathbf {W}_1+\lambda _i\mathbf {I})^{-1}, \;\; 1\le i\le K_{\Psi }, \end{aligned}$$
(11)

where i denotes the row index, so that the right-hand side of (11) uses the ith row of \(\mathbf {\overline{W}}_3\), and \(\lambda _i\) denotes the ith eigenvalue of \(\mathbf {W}_2\).
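The row-wise decoupling can be verified on a small dense problem before any frequency-domain acceleration. The sketch below assumes real symmetric \(\mathbf {W}_2\), which holds for \(\rho (\mathbf {\Psi }^\mathsf {H}\mathbf {\Psi })^{-1}\) with real \(\mathbf {\Psi }\), so that `np.linalg.eigh` applies and \(\mathbf {Q}^{-1}=\mathbf {Q}^\mathsf {T}\); the function name is hypothetical and the dense solves are for illustration only.

```python
import numpy as np

def solve_sylvester_rows(W1, W2, W3):
    """Solve C W1 + W2 C = W3 by diagonalizing the small symmetric matrix W2."""
    lam, Q = np.linalg.eigh(W2)        # W2 = Q diag(lam) Q^T
    W3_bar = Q.T @ W3                  # left-multiply by Q^{-1} = Q^T
    C_bar = np.empty_like(W3_bar)
    eye = np.eye(W1.shape[0])
    for i, lam_i in enumerate(lam):
        # Row i of Eq. (11): c (W1 + lam_i I) = w, i.e. (W1 + lam_i I)^T c^T = w^T
        C_bar[i] = np.linalg.solve((W1 + lam_i * eye).T, W3_bar[i])
    return Q @ C_bar                   # map back: C = Q C_bar
```

Only \(K_{\Psi }\) independent row systems remain, each sharing the same matrix \(\mathbf {W}_1\) up to a diagonal shift, which is what makes the subsequent frequency-domain treatment possible.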

Utilizing the properties of convolution and decimation matrices, the solution (11) can be accelerated in the frequency domain. Convolution matrices \(\mathbf {B}\), \(\mathbf {D}_x\) and \(\mathbf {D}_y\) can be diagonalized by the Fourier matrix \(\mathbf {F}\in \mathbb {C}^{MN\times MN}\), i.e., \(\mathbf {B}=\mathbf {F\Lambda }_{B}\mathbf {F}^\mathsf {H}\), \(\mathbf {D}_x=\mathbf {F\Lambda }_x\mathbf {F}^\mathsf {H}\) and \(\mathbf {D}_y=\mathbf {F\Lambda }_y\mathbf {F}^\mathsf {H}\). Then, when computing \(\mathbf {\overline{W}}_3\), right multiplication with these matrices can be achieved through fast Fourier transform (FFT) and entry-wise multiplication operations. Meanwhile, right multiplication with \(\mathbf {S}^\mathsf {H}\) is equivalent to a simple upsampling operation.
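The diagonalization can be illustrated in one dimension, where right multiplication with a circulant difference matrix is reproduced by an FFT, an entry-wise product, and an inverse FFT. The matrix below is one assumed form of the circular first-order difference, and both function names are hypothetical.

```python
import numpy as np

def circular_difference_matrix(n):
    """Circulant matrix D such that (x @ D)[t] = x[t+1] - x[t] with wrap-around."""
    return np.roll(np.eye(n), 1, axis=0) - np.eye(n)

def right_multiply_fft(x):
    """Compute x @ D for each row of x without forming D."""
    n = x.shape[-1]
    k = np.arange(n)
    eig = np.exp(2j * np.pi * k / n) - 1.0   # eigenvalues of D in the DFT basis
    return np.real(np.fft.ifft(np.fft.fft(x, axis=-1) * eig, axis=-1))
```

The dense product costs \(\mathcal {O}(n^2)\) per row, whereas the FFT route costs \(\mathcal {O}(n\log n)\); the two-dimensional case works the same way with `np.fft.fft2`.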

For further simplification, the matrix inverse in (11) is represented as

$$\begin{aligned} \mathbf {F}\left( \mathbf {\Lambda }_{B}\mathbf {F}^\mathsf {H}\mathbf {SS}^\mathsf {H}\mathbf {F\Lambda }^\mathsf {H}_{B}+\rho \mathbf {\Lambda }_x^2+\rho \mathbf {\Lambda }_y^2+\lambda _i\mathbf {I}\right) ^{-1}\mathbf {F}^\mathsf {H}:=\mathbf {F}\mathbf {K}^{-1}\mathbf {F}^\mathsf {H}. \end{aligned}$$

By translating the frequency properties of decimation matrix [10] into

$$\begin{aligned} \mathbf {F}^\mathsf {H}\mathbf {SS}^\mathsf {H}\mathbf {F}=\mathbf {PP}^\mathsf {H}/d^2, \end{aligned}$$

\(\mathbf {K}\) can be consolidated as

$$\begin{aligned} \mathbf {K} =\frac{1}{d^2}\mathbf {\Lambda }_{B}\mathbf {PP}^\mathsf {H}\mathbf {\Lambda }^\mathsf {H}_{B}+\mathbf {\Lambda }_{K}, \end{aligned}$$

where \(\mathbf {\Lambda }_{K} = \rho \mathbf {\Lambda }_x^2+\rho \mathbf {\Lambda }_y^2+\lambda _i\mathbf {I}\) is a diagonal matrix, and \(\mathbf {P}\in \mathbb {R}^{MN\times mn}\) is a transform matrix with 0 and 1 elements. Right multiplication with \(\mathbf {P}\) and \(\mathbf {P}^\mathsf {H}\) can be achieved by performing sub-block accumulating and image copying operations on the corresponding image. As directly inverting a large-scale matrix is computationally expensive, the Woodbury inversion lemma [11] is used to decompose \(\mathbf {K}^{-1}\) as

$$\begin{aligned} \mathbf {K}^{-1}=\mathbf {\Lambda }_{K}^{-1}-\mathbf {\Lambda }_{K}^{-1}\mathbf {\Lambda }_{B}\mathbf {P}\left( d^2\mathbf {I}+\mathbf {P}^\mathsf {H}\mathbf {\Lambda }^\mathsf {H}_{B}\mathbf {\Lambda }_{K}^{-1}\mathbf {\Lambda }_{B}\mathbf {P}\right) ^{-1}\mathbf {P}^\mathsf {H}\mathbf {\Lambda }^\mathsf {H}_{B}\mathbf {\Lambda }_{K}^{-1}, \end{aligned}$$
(12)

where matrix \(d^2\mathbf {I}+\mathbf {P}^\mathsf {H}\mathbf {\Lambda }^\mathsf {H}_{B}\mathbf {\Lambda }_{K}^{-1}\mathbf {\Lambda }_{B}\mathbf {P}\) is diagonal.
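The identity (12) can be checked numerically on a small example. In this sketch the product \(\mathbf {\Lambda }_{B}\mathbf {P}\) is replaced by a generic tall matrix `U` and \(\mathbf {\Lambda }_{K}\) by a real positive diagonal, both assumptions made for illustration; with a generic `U` the inner matrix is not diagonal, so a small linear solve is used instead of an element-wise division.

```python
import numpy as np

def woodbury_inverse(lam_K, U, d):
    """Inverse of K = U U^H / d^2 + diag(lam_K), following the pattern of Eq. (12).

    lam_K: real positive diagonal entries (length n); U: n x r matrix standing in
    for Lambda_B P.
    """
    inv_diag = 1.0 / lam_K
    KU = inv_diag[:, None] * U                           # Lambda_K^{-1} U
    small = d ** 2 * np.eye(U.shape[1]) + U.conj().T @ KU  # r x r inner matrix
    return np.diag(inv_diag) - KU @ np.linalg.solve(small, KU.conj().T)
```

Only the r x r inner matrix is inverted; in the setting of the paper that matrix is diagonal, so the whole inverse reduces to entry-wise operations.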

Inserting (12) into (11) yields the final solution

$$\begin{aligned} \begin{aligned} \mathbf {\overline{C}}_{i}=\mathbf {\overline{W}}_3\mathbf {F\Lambda }_{K}^{-1}\mathbf {F}^\mathsf {H}-&\mathbf {\overline{W}}_3\mathbf {F\Lambda }_{K}^{-1}\mathbf {\Lambda }_{B}\mathbf {P}\left( d^2\mathbf {I}+\mathbf {P}^\mathsf {H}\mathbf {\Lambda }^\mathsf {H}_{B}\mathbf {\Lambda }_{K}^{-1}\mathbf {\Lambda }_{B}\mathbf {P}\right) ^{-1}\\&\mathbf {P}^\mathsf {H}\mathbf {\Lambda }^\mathsf {H}_{B}\mathbf {\Lambda }_{K}^{-1}\mathbf {F}^\mathsf {H}, \;\; 1\le i\le K_{\Psi }, \end{aligned} \end{aligned}$$
(13)

and the coefficient matrix is computed as \(\mathbf {C}^{j+1}=\mathbf {Q}\mathbf {\overline{C}}\). Note that this solution procedure mainly consists of efficient FFT, entry-wise multiplication, sub-block accumulating, and image copying operations.

4 Experiments

Experiments are performed on both simulated LR multispectral images and images acquired by ourselves. In the simulation, the LR multispectral images with 31 bands are generated by applying Gaussian blur and downsampling operations to the images in the Harvard scene dataset [16] and CAVE object dataset [17]. The HR RGB images are generated using the SSF of the Canon 60D camera provided in the CamSpec database [18]. In our real image set, the LR multispectral images with 31 bands are acquired across the visible spectrum 400–720 nm by an imaging system consisting of a liquid crystal tunable filter and a CoolSnap monochrome camera. The HR RGB images are captured using a Canon 70D camera. The acquired multispectral and RGB images are aligned according to [19].

Fig. 2. Reconstruction results of imgc4 with 16\(\times \) spatial resolution improvement. The 1st row shows the reconstructed HR images at 580 nm using different algorithms. The LR image and ground truth image are listed on the right. The remaining rows illustrate the corresponding RMSE maps and SAM maps calculated across all the spectral bands.

To evaluate the quality of the reconstructed multispectral images, four objective quality metrics, namely spectral angle mapper (SAM) [6], root mean squared error (RMSE) [6], relative dimensionless global error in synthesis (ERGAS) [6], and peak signal-to-noise ratio (PSNR) [6], are used in our study. For comparison, three leading super-resolution methods, namely HySure [8], R-FUSE [10], and NSSR [9], are also implemented under the same environment. Their source codes are publicly available online.
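For orientation, three of these metrics can be sketched on the matrix representation (one row per band, one column per pixel). The exact definitions in [6] may differ in normalization, averaging, and units, so the versions below (SAM averaged over pixels in degrees, global RMSE, band-averaged PSNR with a user-supplied peak value) are assumptions, as are the function names.

```python
import numpy as np

def rmse(X_ref, X_hat):
    """Root mean squared error over all bands and pixels."""
    return np.sqrt(np.mean((X_ref - X_hat) ** 2))

def sam_degrees(X_ref, X_hat, eps=1e-12):
    """Mean spectral angle (degrees) between corresponding pixel spectra (columns)."""
    num = np.sum(X_ref * X_hat, axis=0)
    den = np.linalg.norm(X_ref, axis=0) * np.linalg.norm(X_hat, axis=0) + eps
    return np.degrees(np.mean(np.arccos(np.clip(num / den, -1.0, 1.0))))

def psnr(X_ref, X_hat, peak=1.0):
    """PSNR in dB, computed per band (row) and averaged over bands."""
    mse = np.mean((X_ref - X_hat) ** 2, axis=1)
    return np.mean(10.0 * np.log10(peak ** 2 / mse))
```

SAM is insensitive to a per-pixel intensity scaling and therefore measures spectral fidelity, while RMSE and PSNR respond to any intensity deviation.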

Fig. 3. The average RMSE values of all the reconstructed images with respect to parameters (a) \(\mathrm {log}_{10}\beta \), (b) \(\mathrm {log}_{10}\gamma \), and (c) \(\mathrm {log}_{10}\rho \).

4.1 Parameter Setting

We evaluate the effect of three key parameters (weighting parameter \(\beta \), regularization parameter \(\gamma \), and penalty parameter \(\rho \)) on the reconstruction accuracy of the proposed algorithm. Figure 3 plots the average RMSE values of all the reconstructed images with respect to these parameters. In this work, we set \(\beta =1\), \(\gamma =10^{-6}\), and \(\rho =10^{-5}\), which result in small RMSE values. We note that setting \(\beta \) too large overemphasizes the RGB data term, and setting \(\gamma \) too small weakens the role of RGB edge guidance.

4.2 Results on Simulated Images

Figure 2 shows the reconstruction results of imgc4 with 16\(\times \) spatial resolution improvement, as well as the detailed RMSE maps and SAM maps. The average RMSE and SAM values are also listed for quantitative comparison. It is observed that the HySure [8] algorithm exhibits large spectral errors, and the R-FUSE [10] and NSSR [9] algorithms do not handle the spatial details well. In comparison, the proposed algorithm produces relatively accurate HR images.

Table 1. Average SAM, RMSE, ERGAS, and PSNR values produced by different algorithms on two datasets. The resolution is improved by 16\(\times \).

Table 1 shows the average SAM, RMSE, ERGAS, and PSNR values of all the reconstructed multispectral images in Harvard and CAVE datasets. The spatial resolution is improved by 16 times. It is observed that the proposed algorithm outperforms all the competitors when evaluated using these metrics. Furthermore, Fig. 4 shows the overall reconstruction accuracy on the multispectral images of the two datasets in terms of RMSE and SAM. For clear demonstration, the image indexes are sorted in ascending order with respect to the metric values produced by the proposed algorithm. It is observed that in most cases the proposed algorithm performs better than the competing methods when evaluated using either spatial or spectral metrics.

Fig. 4. (a) RMSE and (b) SAM values produced by different algorithms on all the simulated data with scale factor d = 16.

Fig. 5. (a) Reconstruction results on real data Masks at band 590 nm with 8\(\times \) spatial resolution improvement. (b) Marked pixels in reconstructed images compared with the ones in original LR image.

4.3 Results on Real Images

We also evaluate the performance of the proposed algorithm on real images acquired in our laboratory. The RGB image is linearized beforehand with the inverse camera response function estimated by [20]. The SSF is computed through linear regression with existing image data. Figure 5(a) shows the original HR RGB image and LR band image at 590 nm of Masks, as well as the corresponding reconstructed results with 8\(\times \) spatial resolution improvement. Figure 5(b) shows the marked pixels in smooth regions. Each marked pixel in the reconstructed HR image is compared with the one in the original LR image, and it is desired that the intensity of the two pixels should be close. It is observed that the face edges produced by HySure and NSSR are not clear, and the intensity of eye produced by R-FUSE is too high. In comparison, the proposed algorithm performs well in handling these details.

4.4 Computational Complexity

The complexity of the proposed algorithm is dominated by the FFTs when computing the coefficient matrix \(\mathbf {C}\), and is of order \(\mathcal {O}(K_{\Psi }MN\mathsf {log}(MN))\) per ADMM iteration. Table 2 shows the running times of the HySure [8], R-FUSE [10], NSSR [9], and proposed algorithms for reconstructing an HR multispectral image with 31 spectral bands and 1392 \(\times \) 1040 spatial resolution. These algorithms are all implemented using MATLAB R2016a on a personal computer with 2.60 GHz CPU (Intel Xeon E5-2630) and 64 GB RAM. As reported in Table 2, the proposed algorithm is more computationally efficient than the competitors.

Table 2. Running times (in seconds) of different algorithms for reconstructing an HR multispectral image with 31 bands and 1392 \(\times \) 1040 spatial resolution. The numbers in parentheses are the speedup of the proposed algorithm over the corresponding competitors

5 Conclusions

This paper has proposed a super-resolution algorithm to improve the spatial resolution of a multispectral image with an HR RGB image. The HR multispectral image is efficiently reconstructed according to the linear image degradation models, and the dTV operator is used to keep the recovered edge locations and directions in accordance with those of the RGB image. Experimental results validate that the proposed algorithm performs better than the state-of-the-art methods in terms of both reconstruction accuracy and computational efficiency.