1 Introduction

Remote sensing applications such as geographical navigation, disaster management, and agricultural monitoring require images with high spatial and spectral resolution. High spatial resolution allows accurate geometric analysis, while high spectral resolution allows better thematic interpretation. Multispectral (MS) images usually carry rich spectral information but have low spatial resolution. Many applications rely on high-resolution (HR) MS images, and in such cases low-resolution (LR) MS images provide neither high-quality visuals nor a sound basis for analysis. Reconstructing MS images with high spatial and spectral resolution from the available LR MS images is therefore an important research topic [5, 10, 14].

In remote sensing, several satellites, such as Landsat, QuickBird, and SPOT, provide a set of LR MS band images along with a corresponding high-resolution panchromatic (HR PAN) image. For example, QuickBird data consist of an HR PAN band at 0.65 m resolution and several LR MS bands at 2.62 m resolution. Similarly, the Landsat-7 PAN and MS bands have resolutions of 15 m and 30 m, respectively. An MS color image is formed by combining the LR spectral bands, whereas the PAN image is a single-band gray-scale image. Figure 1 shows MS and PAN images of the same area, in which the visual differences between the two are clearly visible.

HR MS images can be produced by fusing the HR PAN and the LR MS images. Such methods are known as pan-sharpening, since the resolution of the MS image is made equal to that of the PAN image. A large number of MS and PAN fusion methods are reviewed in [2]. Popular pan-sharpening methods include the intensity-hue-saturation (IHS) method [11], the principal component analysis (PCA) method [9], and the Brovey transform based method [13]. These are component-substitution methods: the MS image is first transformed into a color image and then interpolated so that its spatial resolution matches that of the PAN image. Next, the pixels of the luminance channel of the color-transformed MS image are replaced with those of the PAN image, and the final HR image is obtained through an inverse color transformation. One major limitation of this approach is that the pixels of the luminance channel and of the PAN image follow different statistical distributions, so the output images suffer from significant spectral distortions [10].

Fig. 1. Example of QuickBird images of the same area: PAN (left) and MS (right).

The prime focus of this paper is to produce an HR MS image from the given LR MS image using a super-resolution (SR) method. Although pan-sharpening and SR are similar in approach, they differ in that pan-sharpening enhances only the spatial information of the MS image, whereas SR tries to estimate both the spatial and the spectral information of the target HR image from the LR image itself.

Reconstructing an SR image from an available LR image is an inverse problem. It is also ill-posed, since many HR images may yield the same LR image. Sparse representation theory is widely applied to image denoising and restoration problems in signal and image processing [3, 4], and it has been used effectively in several recent single-image super-resolution works [1, 7, 12]. These works learn a pair of dictionaries (LR and HR) from a given dataset of HR RGB images and reconstruct the luminance channel of the LR input using a patch-wise sparse representation. For MS image SR, however, a standard HR MS image dataset is usually not available for learning the dictionary pair, and reconstruction from a transformed RGB image, as in pan-sharpening, introduces spectral distortions.

In this work, to overcome the dictionary learning issue, we first use the input HR PAN image to train the LR/HR dictionary pair, since it contains the high spatial detail desired in the target HR image. Zhu et al. [14] present a similar pan-sharpening method in which the PAN image patches themselves serve as dictionary atoms. We instead train the dictionaries on features, such as edges, extracted from the PAN image patches, so that these features improve the representation of the LR patches during reconstruction. Second, instead of reconstructing an RGB MS image, we reconstruct each MS band separately and combine the results into the output HR MS image, so that each band retains its original spectral characteristics. The major contributions of this paper are as follows:

  • A self-learning based K-singular value decomposition (K-SVD) coupled dictionary learning method is developed using the single HR PAN image.

  • An MS output image with high spatial and spectral resolution is reconstructed through band-wise super-resolution with sparsity regularization.

  • Different methods are compared using standard spatial and spectral quality metrics.

The remainder of this paper is organized as follows. Section 2 gives background on the image acquisition process. The proposed MS image super-resolution method is explained in Sect. 3. The experiments and the simulation results are detailed in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Background

The general image acquisition model can be expressed as [12]

$$\begin{aligned} {\mathbf {X}} = {\mathbf {SHX'}} + v, \end{aligned}$$
(1)

where \({\mathbf {X}}\) represents the LR image, \({\mathbf {X'}}\) represents the HR image, and \(\mathbf {H}\) and \(\mathbf {S}\) are the blurring and down-sampling operators, respectively. Here, v is the additive noise, which is ideally equal to zero.
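The acquisition model of Eq. 1 can be simulated for a single band as in the minimal sketch below. The 3 × 3 box blur, the decimation-style down-sampling, and the function names `blur`, `downsample`, and `acquire` are illustrative assumptions; the actual PSF of a sensor is not specified here.

```python
import numpy as np

def blur(img, kernel):
    """Apply H: 2-D correlation with edge-replicated borders (assumed PSF)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def downsample(img, factor):
    """Apply S: keep every `factor`-th pixel in both directions."""
    return img[::factor, ::factor]

def acquire(hr, kernel, factor, noise_std=0.0, rng=None):
    """Simulate X = S H X' + v for one band."""
    rng = np.random.default_rng(0) if rng is None else rng
    lr = downsample(blur(hr, kernel), factor)
    return lr + noise_std * rng.standard_normal(lr.shape)

hr = np.arange(64, dtype=float).reshape(8, 8)   # toy HR band
box = np.full((3, 3), 1.0 / 9.0)                # assumed PSF: 3x3 box blur
lr = acquire(hr, box, factor=4)                 # noise-free case, v = 0
print(lr.shape)  # (2, 2)
```

With `noise_std=0.0` the sketch reproduces the ideal, noise-free case mentioned above.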

Now, \({\mathbf {X'}}\) can be estimated by solving the inverse problem shown below.

$$\begin{aligned} {\mathbf {\hat{X}'}} = \arg \mathop {\min }\limits _{{\mathbf {X'}}} \left\| {{\mathbf {X - SHX'}}} \right\| _2^2 \end{aligned}$$
(2)

The problem in Eq. 2 is ill-posed, since many HR images \({\mathbf {X'}}\) may satisfy the above condition for a given input LR image \({\mathbf {X}}\). To resolve this, we regularize the problem with a local patch-wise sparsity prior. The details of the proposed model are discussed in the following section.
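The ill-posedness can be seen on a toy example. In the sketch below, the combined operator \(\mathbf {SH}\) is assumed to be simple block averaging (an illustrative choice, not the sensor model of this paper); a flat image and a checkerboard image then map to exactly the same LR image.

```python
import numpy as np

def block_average(hr, factor):
    """Assumed S*H: average each factor x factor block of the HR image."""
    h, w = hr.shape
    return hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

flat = np.full((4, 4), 2.0)
checker = np.tile(np.array([[1.0, 3.0], [3.0, 1.0]]), (2, 2))

# Two different HR candidates produce the same LR observation.
same_lr = np.allclose(block_average(flat, 2), block_average(checker, 2))
print(same_lr)  # True
```

Both candidates give a zero residual in Eq. 2, which is why an additional prior is needed to select among them.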

3 Proposed MS Image Super-Resolution Method

The proposed SR algorithm uses the input PAN image \(\mathbf {Y}\) to learn a pair of overcomplete dictionaries \(\mathbf {D_H}\) and \(\mathbf {D_L}\). A patch-based sparse representation of the input LR MS image \(\mathbf {X}\) is then computed to generate the desired output HR MS image \({\mathbf {X'}}\). The workflow of the proposed SR algorithm is depicted in Fig. 2. It comprises two main steps: dictionary training and SR reconstruction.

Fig. 2. Schematic of the proposed MS image SR algorithm.

3.1 Dictionary Training

To extract LR patches from the given PAN image \(\mathbf {Y}\), the image is blurred and down-sampled so that its point spread function (PSF) matches that of the input MS image. The resulting LR PAN image is then passed through four 1-D filters, of first and second order, to extract horizontal and vertical gradient features:

$$\begin{aligned} f_1 = [ - 1,0,1],\,\,\,\,\,f_2 = [1,0, - 2,0,1],\,\,\,\,\,f_3 = f_1 ^T ,\,\,\,\,\,f_4 = f_2 ^T . \end{aligned}$$
(3)
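The feature extraction step can be sketched as follows. The sketch assumes that the two vertical filters are the transposes of the first- and second-order horizontal filters, uses zero padding at the image border, and introduces the hypothetical helpers `correlate_same` and `feature_vector`; none of these details are fixed by the text.

```python
import numpy as np

F1 = np.array([[-1.0, 0.0, 1.0]])             # first-order, horizontal
F2 = np.array([[1.0, 0.0, -2.0, 0.0, 1.0]])   # second-order, horizontal
FILTERS = [F1, F2, F1.T, F2.T]                # assumed vertical filters: transposes

def correlate_same(img, kernel):
    """Correlate img with kernel (zero padding), output the same size as img."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def feature_vector(img, top, left, size):
    """Concatenate the four filter responses of one size x size patch."""
    maps = [correlate_same(img, f) for f in FILTERS]
    return np.concatenate([m[top:top + size, left:left + size].ravel() for m in maps])

# Toy image whose intensity grows linearly along the columns.
ramp = np.tile(np.arange(12, dtype=float), (12, 1))
feat = feature_vector(ramp, top=3, left=3, size=5)
print(feat.shape)  # (100,)
```

On this ramp, only the first-order horizontal filter responds in the interior of the image, which illustrates how the four responses separate edge direction and order.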

The four feature vectors obtained from each patch after filtering are concatenated into a single vector that represents an LR patch. The corresponding HR patch is extracted directly from the given HR PAN image. Two vectors, \({\mathbf {y}}_{\mathbf {H}}\) and \({\mathbf {y}}_{\mathbf {L}}\), collect all the HR and LR patches, respectively. A sparse representation problem is then formulated to train the coupled dictionary \({{\mathbf {\tilde{D}}}}\) from the combined patch vector of \({\mathbf {y}}_{\mathbf {H}}\) and \({\mathbf {y}}_{\mathbf {L}}\):

$$\begin{aligned} \mathop {\min }\limits _{\left\{ {\mathbf {D}}_{\mathbf {H}}, {\mathbf {D}}_{\mathbf {L}}, {\mathbf {z}} \right\} } \left\| {{\mathbf {y}}_{\mathbf {C}} - {\mathbf {\tilde{D}z}}} \right\| _2^2 + \lambda \left\| {\mathbf {z}} \right\| _1, \end{aligned}$$
(4)

where \({\mathbf {\tilde{D}}} = \left[ \begin{gathered} \tfrac{1}{{\sqrt{m} }}{\mathbf {D}}_{\mathbf {H}} \\ \tfrac{1}{{\sqrt{n} }}{\mathbf {D}}_{\mathbf {L}} \\ \end{gathered} \right] \) and \({\mathbf {y}}_{\mathbf {C}} = \left[ \begin{gathered} \tfrac{1}{{\sqrt{m} }}{\mathbf {y}}_{\mathbf {H}} \\ \tfrac{1}{{\sqrt{n} }}{\mathbf {y}}_{\mathbf {L}} \\ \end{gathered} \right] \); here, m and n denote the sizes of the HR and LR patches in vector form, respectively.
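The weighted stacking used in Eq. 4 can be illustrated with a short sketch. The sizes m = 49 and n = 100, the random data, and the helper names `stack_coupled` and `split_coupled` are assumptions made only for illustration; recovering the two parts by undoing the weights is likewise an assumed convention.

```python
import numpy as np

def stack_coupled(hr_part, lr_part):
    """Stack HR rows over LR rows with weights 1/sqrt(m) and 1/sqrt(n)."""
    m, n = hr_part.shape[0], lr_part.shape[0]
    return np.vstack([hr_part / np.sqrt(m), lr_part / np.sqrt(n)])

def split_coupled(stacked, m, n):
    """Undo the stacking: return the HR and LR parts with the weights removed."""
    return np.sqrt(m) * stacked[:m], np.sqrt(n) * stacked[m:m + n]

rng = np.random.default_rng(0)
m, n, num_patches = 49, 100, 20               # illustrative sizes only
Y_H = rng.standard_normal((m, num_patches))   # one HR patch per column
Y_L = rng.standard_normal((n, num_patches))   # one LR feature vector per column
Y_C = stack_coupled(Y_H, Y_L)
print(Y_C.shape)  # (149, 20)
```

The same two helpers apply unchanged to the dictionary, since the coupled dictionary is stacked with the same weights as the patch vectors.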

The minimization problem in Eq. 4 is solved with the optimized K-SVD training algorithm [8] to obtain the trained coupled overcomplete dictionary \({{\mathbf {\tilde{D}}}}\).
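The core of K-SVD is the atom update, in which one dictionary column and its coefficients are refitted by a rank-1 SVD of the residual. The sketch below shows this single step in its textbook form; it is not the optimized implementation of [8], the sparse coding stage is omitted, and the function name `ksvd_atom_update` and the random test data are assumptions.

```python
import numpy as np

def ksvd_atom_update(Y, D, Z, k):
    """Refit atom k and its coefficients by a rank-1 SVD of the residual."""
    users = np.flatnonzero(Z[k])          # signals whose code uses atom k
    if users.size == 0:
        return D, Z                       # unused atom: leave it unchanged
    D, Z = D.copy(), Z.copy()
    # Residual of those signals with atom k's own contribution added back.
    E = Y[:, users] - D @ Z[:, users] + np.outer(D[:, k], Z[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                     # new unit-norm atom
    Z[k, users] = s[0] * Vt[0]            # new coefficients on the same support
    return D, Z

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 30))                      # training vectors (columns)
D = rng.standard_normal((8, 12))
D /= np.linalg.norm(D, axis=0)                        # unit-norm atoms
Z = rng.standard_normal((12, 30)) * (rng.random((12, 30)) < 0.2)  # sparse codes

err_before = np.linalg.norm(Y - D @ Z)
D_new, Z_new = ksvd_atom_update(Y, D, Z, k=0)
err_after = np.linalg.norm(Y - D_new @ Z_new)
```

Because the rank-1 SVD is the best rank-1 fit to the residual, the approximation error cannot increase after the update, and the sparsity pattern of the codes is preserved.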

3.2 SR Reconstruction

The LR MS image is processed band by band for SR reconstruction. Each band image is first passed through the feature extraction stage to obtain feature patch vectors, as in dictionary training. Next, for each feature patch vector \(\mathbf {x}\) of the LR MS image, a sparse representation problem is formulated using the dictionary \({{\mathbf {\tilde{D}}}}\):

$$\begin{aligned} {\mathbf {\alpha }}^{\mathbf {*}} = \arg \mathop {\min }\limits _{\mathbf {\alpha }} \left\{ {\left\| {{\mathbf {\tilde{D}\alpha }} - {\mathbf {x}}} \right\| _2^2 + \lambda \left\| {\mathbf {\alpha }} \right\| _1 } \right\} \end{aligned}$$
(5)

Equation 5 is an \(\ell _1\)-\(\ell _2\) minimization problem. We estimate the sparse coefficient vector \({\mathbf {\alpha }}^{\mathbf {*}}\) using the feature-sign search algorithm [6], a convex optimization method.
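For illustration, the objective of Eq. 5 can also be minimized with the iterative shrinkage-thresholding algorithm (ISTA). The sketch below uses ISTA as a simpler stand-in and is not the feature-sign search algorithm of [6]; the function names, the iteration count, and the toy dictionary are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam, n_iter=500):
    """Minimize ||D a - x||_2^2 + lam * ||a||_1 by proximal gradient steps."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - x)    # gradient of the quadratic term
        a = soft_threshold(a - grad / L, lam / L)
    return a

def objective(D, x, a, lam):
    return float(np.sum((D @ a - x) ** 2) + lam * np.sum(np.abs(a)))

# With an identity dictionary the solution is a soft-thresholded copy of x.
alpha = ista(np.eye(3), np.array([1.0, -0.2, 0.05]), lam=0.2)

rng = np.random.default_rng(0)
D = rng.standard_normal((10, 25))         # toy overcomplete dictionary
x = rng.standard_normal(10)
a_star = ista(D, x, lam=0.5)
```

Larger values of the regularization weight drive more coefficients exactly to zero, which is the sparsity effect that Eq. 5 relies on.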

Since the patches of the LR and HR images share a common sparse representation with respect to their individual dictionaries, the HR image patches can be reconstructed as follows:

$$\begin{aligned} {\mathbf {x'}} = {\mathbf {D}}_{\mathbf {H}} {\mathbf {\alpha }}^* \end{aligned}$$
(6)
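Eq. 6 and the subsequent assembly of patches into a band image can be sketched as below. Averaging the pixels where patches overlap is a common choice and is assumed here; the helper names `reconstruct_patch` and `assemble` and the toy values are likewise illustrative.

```python
import numpy as np

def reconstruct_patch(D_H, alpha, size):
    """Eq. 6: x' = D_H alpha, reshaped to a size x size patch."""
    return (D_H @ alpha).reshape(size, size)

def assemble(patches, corners, shape):
    """Place patches at their top-left corners and average the overlaps."""
    acc = np.zeros(shape)
    weight = np.zeros(shape)
    for patch, (r, c) in zip(patches, corners):
        h, w = patch.shape
        acc[r:r + h, c:c + w] += patch
        weight[r:r + h, c:c + w] += 1.0
    return acc / np.maximum(weight, 1.0)  # uncovered pixels stay at zero

patch = reconstruct_patch(np.eye(4), np.array([1.0, 2.0, 3.0, 4.0]), size=2)

# Two 2x2 patches that overlap in the middle column of a 2x3 image.
image = assemble([np.ones((2, 2)), 3.0 * np.ones((2, 2))], [(0, 0), (0, 1)], (2, 3))
print(image[0])  # [1. 2. 3.]
```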

Tiling all the reconstructed patches in their corresponding channel yields an intermediate HR image \({\mathbf {X'}}_{\mathbf {0}}\). Finally, back-projection enforces the global imaging model constraint of Eq. 2 on \({\mathbf {X'}}_{\mathbf {0}}\) to obtain the final HR image \({\mathbf {X'}^*}\). Mathematically,

$$\begin{aligned} {\mathbf {X'}}^{\mathbf {*}} = \arg \mathop {\min }\limits _{{\mathbf {X'}}} \left\| {{\mathbf {X - SHX'}}} \right\| _2^2 + c\left\| {{\mathbf {X'}} - {\mathbf {X'}}_{\mathbf {0}} } \right\| _2^2 \end{aligned}$$
(7)

Equation 7 is solved efficiently using the gradient descent method.
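A minimal gradient descent sketch for Eq. 7 is given below. It assumes that \(\mathbf {SH}\) is plain block averaging, so that its adjoint has a closed form; the value of c, the step size rule, and the function names are illustrative assumptions rather than the settings used in the experiments.

```python
import numpy as np

def sh(hr, f):
    """Assumed S*H: f x f block averaging."""
    h, w = hr.shape
    return hr.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def sh_adjoint(lr, f):
    """Adjoint of block averaging: spread each LR pixel over its block."""
    return np.kron(lr, np.ones((f, f))) / (f * f)

def back_project(X, X0, f, c=0.1, n_iter=200):
    """Gradient descent on ||X - SH X'||^2 + c * ||X' - X0||^2."""
    step = 1.0 / (2.0 / (f * f) + 2.0 * c)   # inverse Lipschitz constant
    Xh = X0.astype(float).copy()
    for _ in range(n_iter):
        grad = -2.0 * sh_adjoint(X - sh(Xh, f), f) + 2.0 * c * (Xh - X0)
        Xh -= step * grad
    return Xh

X = np.full((2, 2), 5.0)        # observed LR band
X0 = np.zeros((4, 4))           # deliberately poor intermediate HR estimate
X_star = back_project(X, X0, f=2, c=0.1)
```

The weight c controls how strongly the result stays close to the intermediate image: in this toy case the solution settles at 25/7, between the intermediate value 0 and the observed value 5.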

4 Experimental Results

Experiments on SR reconstruction are carried out with the proposed method on two test MS images, and the results are compared with four existing pan-sharpening-based MS image SR methods, namely IHS [11], PCA [9], Brovey [13], and SparseFI [14]. Datasets containing PAN and MS images of size 2048 \(\times \) 2048 and 512 \(\times \) 512, respectively, are taken from a QuickBird image of the region India-The Sundarbans, captured on 02 November 2002 and obtained from the source given in Footnote 1. Simulations are carried out in MATLAB on a PC running Windows 7 with 4 GB of RAM.

In this experiment, to retain an HR reference image for quality assessment, the input PAN and MS images are down-sampled by a factor of 4. Thus, for both datasets, the test LR MS image and the training PAN image have dimensions of 128 \(\times \) 128 and 512 \(\times \) 512, respectively.

Fig. 3. Visuals for the first QuickBird dataset using different methods. First row (from left): original MS, down-sampled MS, down-sampled PAN, IHS output; second row (from left): results of PCA, Brovey, SparseFI, and the proposed method.

During the training phase, a coupled overcomplete dictionary of 1024 atoms is learned from 10000 sample patches taken from the input HR PAN image. We extract patches of size 7 \(\times \) 7, which Zhu et al. [14] report to give better results. Each LR MS band image is upscaled individually: during reconstruction, each feature patch of the input band is sparsely coded, and an HR patch is obtained from the sparse coefficients and the HR dictionary. Finally, the reconstructed HR band images are combined to form the output HR MS image.
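The patch sampling and the band-wise processing described above can be sketched as follows. The random stand-in images, the reduced patch count, and the nearest-neighbour placeholder `nearest` (which takes the place of the actual sparse reconstruction of one band) are assumptions made to keep the example small.

```python
import numpy as np

def sample_patches(img, size, count, rng):
    """Draw `count` random size x size patches, one flattened patch per column."""
    rows = rng.integers(0, img.shape[0] - size + 1, count)
    cols = rng.integers(0, img.shape[1] - size + 1, count)
    return np.stack([img[r:r + size, c:c + size].ravel()
                     for r, c in zip(rows, cols)], axis=1)

def upscale_bandwise(ms, upscale_band):
    """Apply a single-band upscaler to every band and restack the result."""
    return np.stack([upscale_band(ms[..., b]) for b in range(ms.shape[2])], axis=2)

def nearest(band):
    """Placeholder for the SR step: factor-4 nearest-neighbour upscaling."""
    return np.kron(band, np.ones((4, 4)))

rng = np.random.default_rng(0)
pan = rng.random((64, 64))                     # stand-in for the HR PAN image
P = sample_patches(pan, size=7, count=100, rng=rng)

lr_ms = rng.random((16, 16, 4))                # stand-in for a 4-band LR MS image
hr_ms = upscale_bandwise(lr_ms, nearest)
print(P.shape, hr_ms.shape)  # (49, 100) (64, 64, 4)
```

Processing the bands independently means that no band is ever mixed with another, which is the reason given above for the band-wise design.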

Table 1. Performance evaluation for QuickBird first dataset.

To assess the quality of the images produced by the different methods, the following quantitative metrics are computed: root mean-square error (RMSE), spatial correlation coefficient (CC), spectral distortion (SD), universal image quality index (UIQI), spectral angle mapper (SAM), and erreur relative globale adimensionnelle de synthèse (ERGAS).
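Four of these metrics can be sketched using their commonly used definitions, as below. The exact variants used for the tables are not stated in the text, so the formulas, the function names, and the `ratio` argument of ERGAS (PAN pixel size divided by MS pixel size, 1/4 for a factor-4 setup) should be read as assumptions.

```python
import numpy as np

def rmse(ref, est):
    """Root mean-square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((ref - est) ** 2)))

def cc(ref, est):
    """Pearson correlation coefficient between two bands."""
    return float(np.corrcoef(ref.ravel(), est.ravel())[0, 1])

def sam_degrees(ref, est, eps=1e-12):
    """Mean spectral angle over pixels; inputs have shape (rows, cols, bands)."""
    dot = np.sum(ref * est, axis=2)
    norms = np.linalg.norm(ref, axis=2) * np.linalg.norm(est, axis=2)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))

def ergas(ref, est, ratio):
    """100 * ratio * sqrt(mean over bands of (RMSE_k / mean_k)^2)."""
    terms = [(rmse(ref[..., k], est[..., k]) / np.mean(ref[..., k])) ** 2
             for k in range(ref.shape[2])]
    return float(100.0 * ratio * np.sqrt(np.mean(terms)))

rng = np.random.default_rng(0)
ref = rng.random((8, 8, 4)) + 0.5     # toy 4-band reference image
brighter = 2.0 * ref                  # same spectra, doubled intensity
```

For the `brighter` image the spectral angle stays near zero while RMSE and ERGAS are large, which shows why spectral and radiometric metrics are reported together.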

Fig. 4. Visuals for the second QuickBird dataset using different methods. First row (from left): original MS, down-sampled MS, down-sampled PAN, IHS output; second row (from left): results of PCA, Brovey, SparseFI, and the proposed method.

Table 2. Performance evaluation for QuickBird second dataset.

The output images of the different methods for the two test datasets are shown in Figs. 3 and 4, and the visual results are validated quantitatively with the above metrics in Tables 1 and 2. The results indicate that the proposed method preserves spatial information better and introduces less spectral distortion than the other methods.

5 Conclusion

A new MS image SR method based on sparse representation is presented in this paper. A coupled overcomplete dictionary is learned from the input HR PAN image using the K-SVD technique. The HR output MS image is reconstructed by patch-wise sparsity regularization followed by back-projection. Simulations are carried out on two standard datasets. The proposed method outperforms the other SR and pan-sharpening methods in terms of both quantitative and visual analysis. As future work, more effective prior-based regularization terms can be considered for better spatial or spectral reconstruction.