1 Introduction

Images and videos are essential for everyday communication, information transmission and surveillance systems, and they appear ever more frequently with the rapid development of hand-held imaging devices. However, most images suffer from degradation: blurring caused by the point spread function (PSF) of the imaging system and by atmospheric turbulence, aliasing from subsampling, and noise. Super-resolution (SR) has emerged to address these problems [1].

The SR problem is severely ill-posed, since multiple high-resolution (HR) images can be mapped to the same low-resolution (LR) image through the degradation model; the reconstruction process must therefore rely on strong prior information. Over the years, researchers have proposed various methods, which can be roughly divided into two categories: multiple-frame super-resolution [2,3,4,5] and single-image super-resolution (SISR) [6]. SISR approaches can be further divided into interpolation-based methods [7,8,9], reconstruction-based methods [10,11,12,13,14] and learning-based methods [15, 16]. Among these, learning-based SR is considered one of the most effective solutions: it learns the mapping between LR and HR image pairs and uses it as the image prior for reconstruction.

A representative learning-based work is neighbor embedding (NE) [17,18,19], which assumes that LR and HR patches have locally similar geometry on low-dimensional nonlinear manifolds, so the estimated HR output patch can share the interpolation coefficients used to approximate the corresponding input LR patch over an existing database. However, NE must search a vast dataset for patterns similar to the complicated structures of the input images, which is very time consuming. Using machine learning, Yang et al. [20, 21] treated LR and HR patches as sparse linear combinations of jointly learned LR and HR dictionary atoms, based on the assumption that LR and HR patches share the same sparse representation coefficients [22]. Although this method obtains superior results with remarkably low computational complexity, it is still limited by the strong same-sparse-representation constraint and by the cost of L1-norm regularization in the optimization process. Zeyde et al. [23] developed this framework by applying K-SVD [24, 25] to LR dictionary training and using principal component analysis (PCA) and orthogonal matching pursuit (OMP) [26] for sparse representation, showing substantial improvements in both running speed and reconstruction quality. Moreover, Jiang et al. [27] combined neighbor embedding with sparse representation theory, applying a manifold-preserving framework on a sparse support base and learning a local regression as the mapping function from LR to HR. This method, named manifold-regularized sparse support regression (MSSR), also achieves pleasing results. However, its support base is obtained by applying OMP to the LR dictionary [28], which may make the manifold insufficiently precise for predicting the HR version of an input LR image, and its online process is still time consuming.

In this paper, inspired by MSSR, we propose a new algorithm named manifold-regularized collaboration representation support (MCSR) for the super-resolution problem. Compared to the previous work [27], our main contributions are threefold. First, by reformulating the SR problem as an L2-norm-regularized least squares regression and applying collaboration representation to it, we obtain a projective matrix offline; multiplying this matrix by the input LR image yields the support base used to calculate the final mapping between the input LR patch and its HR version. Unlike computing a sparse support base online, our approach boosts the online execution speed. Second, the collaboration representation is constructed on the training sample pool for each atom in the dictionary, so the selected support base is much closer to the input LR image and the manifold is approximated better. Third, to achieve a fast execution speed, we further transfer the whole online process, including manifold preserving and support base calculation, offline. Experiments on commonly used datasets show sharper edges, finer textures and higher index values compared to other methods.

The remainder of this paper is organized as follows: In Sect. 2, we take a close look at the MSSR approach. In Sect. 3, we present the proposed MCSR method and describe its details. In Sect. 4, we report experimental results. In Sect. 5, we conclude the paper.

2 Related works

In this section, we briefly review the previous MSSR approach.

The manifold assumption in NE (the manifolds spanned by LR and HR patches have locally similar geometry) and the strong constraint in sparse coding-based SR (LR and HR patches share the same sparse representation coefficients) may not hold in practice [29]. MSSR therefore focuses on learning a more stable LR-to-HR mapping function in the support domain, where the same-sparse-representation constraint is relaxed, while preserving the geometry of the reconstructed HR patch space. Denoting the original HR image patches as X = [x1, x2, …, xk], \({x_k} \in {\mathbb{R}^D}\), and the corresponding LR patches as Y = [y1, y2, …, yk], \({y_k} \in {\mathbb{R}^d}\), where D and d are the dimensions of the HR and LR patches, the objective function of MSSR can be written as follows:

$${O_{{\text{MSSR}}}}=\sum\nolimits_{i} {\varepsilon (P,{y_i},{x_i})+\alpha \left\| P \right\|_{{\text{H}}}^{2}+\beta \left\| P \right\|_{{\text{M}}}^{2}} ,$$
(1)

where \(\varepsilon (P,{y_i},{x_i})=\sum\nolimits_{{i \in S}} {{{(P{y_i} - {x_i})}^2}}\) is a predefined loss function over the support samples, \(\left\| P \right\|_{{\text{H}}}^{2}\) is a regularization function measuring the smoothness of the parameter P, and \(\left\| P \right\|_{{\text{M}}}^{2}\) is a penalty term that should reflect the intrinsic structure of the data. The parameter α controls the complexity of the function in the ambient space, while β controls its complexity along geodesics in the intrinsic geometry of the data space.

The goal of MSSR is to minimize the loss term by finding a function f(y, P) = Py that maps an input LR patch yin to its corresponding HR version. Here, S denotes the set of indices of nonzero elements in the coding coefficients of patch yin over the LR dictionary; these elements construct the sparse support domain, and \(i \in S\). The support sample sets are denoted XS and YS. For the manifold-preserving term \(\left\| P \right\|_{M}^{2}\), since the data manifold M is unknown, the geometric relations are preserved using an L1 graph [30]: each HR patch is sparsely approximated on the smooth patch manifold by a linear combination of a few nearby patches drawn from the support domain XS. Any HR patch xi can be approximated by the data matrix XS, excluding xi itself, as follows:

$$\begin{gathered} {\mathop W\limits^{ \wedge } _i}=\mathop {\arg \hbox{min} }\limits_{{{W_i}}} {\left\| {{x_i} - {X_S}{W_i}} \right\|_2}+{\lambda _1}{\left\| {{W_i}} \right\|_1}, \hfill \\ {\text{ s}}{\text{.t}}{\text{. }}{W_{ii}}=0 \hfill \\ \end{gathered}$$
(2)

where Wi denotes the i-th column of the edge weight matrix W, whose diagonal elements are zeros, and λ1 balances the coding error of xi against the sparsity of Wi. The geometry encoded in W can then be preserved by requiring each \(P{y_i}\) to stay close to \(P{Y_S}{W_i}\):

$$\begin{gathered} \sum\limits_{{i \in S}} {\left\| {P{y_i} - P{Y_S}{W_i}} \right\|_{2}^{2}=\left\| {P{Y_S} - P{Y_S}W} \right\|_{{\text{F}}}^{2}} \hfill \\ =\left\| {P{Y_S}(I - W)} \right\|_{{\text{F}}}^{2}, \hfill \\ \end{gathered}$$
(3)

where I is an identity matrix. Considering the two properties mentioned above, the objective function is defined as

$${O_{{\text{MSSR}}}}=\left\| {P{Y_S} - {X_S}} \right\|_{{\text{F}}}^{2}+\alpha \left\| P \right\|_{{\text{F}}}^{2}+\beta \left\| {P{Y_S}(I - W)} \right\|_{{\text{F}}}^{2}.$$
(4)
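The rearrangement in Eq. (3), from a per-column sum of squared errors to a single Frobenius norm, is easy to verify numerically. The sketch below (Python/NumPy; the sizes and random data are illustrative assumptions, not values from the paper) checks the identity:

```python
import numpy as np

# Numerical check of identity (3): summing the per-column errors
# ||P y_i - P Y_S W_i||_2^2 equals the matrix form ||P Y_S (I - W)||_F^2.
rng = np.random.default_rng(0)
d, D, s = 9, 81, 10          # LR dim, HR dim, support size (assumed)
P = rng.standard_normal((D, d))
Y_S = rng.standard_normal((d, s))
W = rng.standard_normal((s, s))
np.fill_diagonal(W, 0.0)     # W_ii = 0, as required by Eq. (2)

lhs = sum(np.sum((P @ Y_S[:, i] - P @ Y_S @ W[:, i]) ** 2) for i in range(s))
rhs = np.linalg.norm(P @ Y_S @ (np.eye(s) - W), 'fro') ** 2
assert np.isclose(lhs, rhs)
```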

Using the matrix properties \({\text{tr}}(BA)={\text{tr}}(AB)\), \(\left\| A \right\|_{{\text{F}}}^{2}={\text{tr}}(A{A^{\text{T}}})\), and \({\text{tr}}(A)={\text{tr}}({A^{\text{T}}})\), we have

$$\begin{aligned} {O_{{\text{MSSR}}}}= \,& {\text{tr}}(P{Y_S}Y_{S}^{{\text{T}}}{P^{\text{T}}} - P{Y_S}X_{S}^{{\text{T}}} - {X_S}Y_{S}^{{\text{T}}}{P^{\text{T}}}+{X_S}X_{S}^{{\text{T}}}) \\ & \quad +\alpha {\text{tr}}(P{P^{\text{T}}})+\beta {\text{tr}}(P{Y_S}GY_{S}^{{\text{T}}}{P^{\text{T}}}), \\ \end{aligned}$$
(5)

where \(G=(I - W){(I - W)^{\text{T}}}\). Taking the derivative of \({O_{{\text{MSSR}}}}\) with respect to P and setting it to zero gives

$$P({Y_S}Y_{S}^{{\text{T}}}+\alpha I+\beta {Y_S}GY_{S}^{{\text{T}}})={X_S}Y_{S}^{{\text{T}}}.$$
(6)

Finally, solving this linear system yields the algebraic solution

$$P={X_S}Y_{S}^{{\text{T}}}{({Y_S}Y_{S}^{{\text{T}}}+\alpha I+\beta {Y_S}GY_{S}^{{\text{T}}})^{ - 1}}.$$
(7)
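As a sanity check on the derivation, the closed-form solution (7) can be sketched in a few lines; XS, YS and W below are random stand-ins for the support sets and the L1-graph weights, not values from the paper:

```python
import numpy as np

# Sketch of the closed-form MSSR solution (7) and a check that it
# satisfies the normal equation (6). Sizes and data are illustrative.
rng = np.random.default_rng(1)
d, D, s = 9, 81, 10
alpha, beta = 0.3, 10.0       # regularization weights from Eq. (1)
X_S = rng.standard_normal((D, s))
Y_S = rng.standard_normal((d, s))
W = rng.standard_normal((s, s))
np.fill_diagonal(W, 0.0)

G = (np.eye(s) - W) @ (np.eye(s) - W).T
A = Y_S @ Y_S.T + alpha * np.eye(d) + beta * Y_S @ G @ Y_S.T
P = X_S @ Y_S.T @ np.linalg.inv(A)   # Eq. (7)

# P should satisfy Eq. (6): P (Y_S Y_S^T + aI + b Y_S G Y_S^T) = X_S Y_S^T
assert np.allclose(P @ A, X_S @ Y_S.T)
```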

MSSR remarkably improves reconstruction quality over SC- and NE-based methods. However, the preserved manifold is constructed from sparse vectors selected by OMP, which may limit its precision. Moreover, the final post-processing step of iterative back projection is not recommended, since it may produce unpleasing artifacts [23]. The complicated computation also makes MSSR very time consuming.

3 Manifold-regularized collaboration representation support (MCSR)

In this section, we will present the details of our proposed algorithm. We fix the magnification factor to 3.

3.1 Collaboration representation support

As mentioned in Sect. 2, MSSR uses the OMP algorithm to select, from the LR dictionary, the vectors that contribute most to representing the input LR image. The problem is that OMP selects atoms by their inner products with the current residual and guarantees that the residual is always orthogonal to the previously selected atoms. Let a1 and a2 be the first two atoms selected by OMP to represent the input LR patch yin, and let r1 be the residual after this step; r1 is orthogonal to both a1 and a2. After the next projection, OMP obtains a third atom a3 and a new residual r2, with r2 orthogonal to a1, a2 and a3. These vectors therefore best sparsely represent yin but are not necessarily closest to yin, so the manifold approximated by them will not be precise enough.
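This behavior can be reproduced with a minimal OMP sketch (hypothetical random dictionary and patch): after each least-squares refit, the residual is orthogonal to every selected atom, confirming that OMP picks atoms that best represent the patch rather than atoms closest to it:

```python
import numpy as np

# Minimal OMP sketch on random data (sizes are assumptions).
rng = np.random.default_rng(2)
d, n, sparsity = 9, 40, 3
A = rng.standard_normal((d, n))
A /= np.linalg.norm(A, axis=0)       # unit-norm dictionary atoms
y = rng.standard_normal(d)

support, r = [], y.copy()
for _ in range(sparsity):
    k = int(np.argmax(np.abs(A.T @ r)))     # atom most correlated with residual
    support.append(k)
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    r = y - A[:, support] @ coef            # refit residual
    # residual is orthogonal to all atoms selected so far
    assert np.allclose(A[:, support].T @ r, 0, atol=1e-8)
```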

Inspired by [31], we address this problem with collaboration representation instead of sparse coding. The L2 norm encourages the weights to shrink smoothly toward zero, which is less likely to overfit than the L1 norm. In our framework, the sparse constraint is therefore reformulated as an L2-norm-regularized least squares regression:

$$\mathop {\min }\limits_{\theta } \left\| {{y_{{\text{in}}}} - {N_{\text{L}}}\theta } \right\|_{2}^{2}+{\lambda _2}\left\| \theta \right\|_{2}^{2},$$
(8)

where θ is the coefficient vector, λ2 is a weighting parameter that balances the two terms, and NL is the LR neighborhood corresponding to the input patch feature yin. By grouping the K nearest neighbors from the whole training sample pool for each dictionary atom, NL can be taken as the neighborhood of the atom most similar to yin. With collaboration representation, the algebraic solution is obtained by

$$\theta ={(N_{{\text{L}}}^{{\text{T}}}{N_{\text{L}}}+{\lambda _2}I)^{ - 1}}N_{{\text{L}}}^{{\text{T}}}{y_{{\text{in}}}}={P_{\text{C}}}{y_{{\text{in}}}},$$
(9)

where the projective matrix

$${P_{\text{C}}}={(N_{{\text{L}}}^{{\text{T}}}{N_{\text{L}}}+{\lambda _2}I)^{ - 1}}N_{{\text{L}}}^{{\text{T}}},$$
(10)

can be computed offline for each atom in the dictionary. We can then obtain the support base by applying PC to yin instead of solving a sparse coding problem, which boosts the online execution speed. Note that θ is not sparse, so we choose the J most contributive vectors as our support base, denoted XC and YC. Since all the vectors in the support base are chosen from the nearest neighborhood, the manifold is approximated much better than with a sparse support base. Using XC as the support domain in the manifold-preserving step, we obtain the final objective function:

$${O_{{\text{MCSR}}}}=\left\| {P{Y_{\text{C}}} - {X_{\text{C}}}} \right\|_{{\text{F}}}^{2}+\alpha \left\| P \right\|_{{\text{F}}}^{2}+\beta \left\| {P{Y_{\text{C}}}(I - W)} \right\|_{{\text{F}}}^{2},$$
(11)

and the projective matrix can be obtained by the following expression:

$$P={X_{\text{C}}}Y_{{\text{C}}}^{{\text{T}}}{({Y_{\text{C}}}Y_{{\text{C}}}^{{\text{T}}}+\alpha I+\beta {Y_{\text{C}}}GY_{{\text{C}}}^{{\text{T}}})^{ - 1}}.$$
(12)

This matrix is then applied to the input yin, mapping it to its final HR version. Note that iterative back projection is not needed in our framework owing to the high precision and smoothness of the manifold. The complete procedure of MCSR is given in Algorithm 1.
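The collaboration representation support step above can be sketched as follows, with hypothetical sizes and random data in place of a real trained dictionary; PC is precomputed offline, so the online cost reduces to a matrix-vector product plus a top-J selection:

```python
import numpy as np

# Offline: ridge-style projective matrix P_C for one atom's neighborhood.
# d, K, lam2, J and the random data are illustrative assumptions.
rng = np.random.default_rng(3)
d, K, lam2, J = 9, 50, 0.15, 10
N_L = rng.standard_normal((d, K))            # LR neighborhood of one atom
P_C = np.linalg.inv(N_L.T @ N_L + lam2 * np.eye(K)) @ N_L.T

# Online: coefficients via a single matrix-vector product, then keep
# the J most contributive vectors as the support base.
y_in = rng.standard_normal(d)
theta = P_C @ y_in
support = np.argsort(-np.abs(theta))[:J]     # indices of the support base

# theta minimizes ||y_in - N_L theta||_2^2 + lam2 ||theta||_2^2,
# so its gradient N_L^T (N_L theta - y_in) + lam2 theta vanishes.
assert np.allclose(N_L.T @ (N_L @ theta - y_in) + lam2 * theta, 0, atol=1e-8)
```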


3.2 Global solution and offline processing

Although our framework reduces the running time, the online cost is still not satisfactory for practical use because of the manifold-preserving process. In our experiments, we find that as more vectors are chosen for the support base, the algorithm becomes more time consuming, while the reconstruction quality improves to a peak and then slowly drops, as Fig. 1 shows. We notice that when the number of support vectors J equals the neighborhood size K, the support domain coincides with the corresponding neighborhood. Therefore, denoting the new support domain as XN and YN, the manifold-preserving process can be transferred offline by the following expression:

Fig. 1 The average performance of MCSR on Set14 images using different numbers of support vectors

$$P={X_N}Y_{N}^{{\text{T}}}{({Y_N}Y_{N}^{{\text{T}}}+\alpha I+\beta {Y_N}GY_{N}^{{\text{T}}})^{ - 1}}.$$
(13)

The final projective matrix can be computed offline for each dictionary atom. Online, we only need to find the atom closest to the input patch yin and apply the corresponding projective matrix to obtain the HR result. With this global solution, we achieve a very fast online execution speed compared to the original MCSR method with a small support domain. The detailed process is depicted in Algorithm 2.
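The global online step can be sketched as follows (the dictionary, precomputed matrices and sizes are hypothetical stand-ins): with every per-atom projection computed offline, upscaling a patch costs one nearest-atom search and one matrix multiply:

```python
import numpy as np

# Sketch of the global solution: every dictionary atom has an
# offline-precomputed HR projection matrix; online we only look up the
# nearest atom and apply its matrix. All data here are random stand-ins.
rng = np.random.default_rng(4)
d, D, n_atoms = 9, 81, 1024
dictionary = rng.standard_normal((d, n_atoms))
dictionary /= np.linalg.norm(dictionary, axis=0)   # unit-norm atoms
P_per_atom = rng.standard_normal((n_atoms, D, d))  # offline-computed maps

def upscale_patch(y_in):
    # for unit-norm atoms, the Euclidean-nearest atom maximizes the dot product
    k = int(np.argmax(dictionary.T @ y_in))
    return P_per_atom[k] @ y_in                    # one matrix multiply online

x_hat = upscale_patch(rng.standard_normal(d))
assert x_hat.shape == (D,)
```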


4 Experimental results

In this section, we give the experimental results of our proposed MCSR method, and compare it to other algorithms in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).

To ensure a fair comparison, we build our dictionary on the same training dataset of 91 images proposed by Yang et al. [20] and fix the dictionary size to 1024 for all algorithms. We use the same 4 filters (first and second derivatives, horizontally and vertically) as [20, 21] to extract the high-frequency components of LR images as input features. The magnification factor is 3 and the size of LR patches is set to 3 × 3. The number of support bases of MSSR is set to 10, the same as the neighborhood number of NE [17], and the sparsity parameter of SC [20, 23] is set to 0.1. For our MCSR, the neighborhood size clustered to an atom is set to 2048, as [31] recommends; the number of support bases is varied to verify the reconstruction quality under different conditions. The regularization parameters λ1, λ2, α and β are empirically set to 0.1, 0.15, 0.3 and 10, respectively. We evaluate the reconstruction performance on the common datasets Set5 and Set14. All procedures are implemented in MATLAB on a 2.4 GHz Intel i7 CPU with 8 GB of memory.

Table 1 summarizes the PSNR and SSIM values of the different methods. Our normal MCSR method, which uses ten support vectors as MSSR does, already outperforms the other algorithms in most cases. The global MCSR method further surpasses the normal one in both average PSNR and SSIM, meaning it recovers the most details and best preserves the structure of the original HR images. These experimental results verify the effectiveness of collaboration representation in MCSR compared to the previous MSSR framework.

Table 1 SR performance comparison in terms of PSNR (dB) and SSIM (bold for best, italics for second best)

Visual results are shown in Figs. 2, 3 and 4, demonstrating that MCSR is visually comparable or superior to the other methods at a magnification of 3. Our proposed method provides the best visual plausibility on all images. In detail, NE [17] suffers from unpleasant artifacts and block effects in all results. Yang et al.'s method [20] introduces some noise and suppresses texture details because of the strong same-sparse-representation constraint on the jointly learned LR and HR dictionaries. Zeyde et al.'s method [23] and MSSR ease this problem and both obtain good results, but they still do not recover enough details: for [23], the few chosen sparse vectors may be unable to represent complex image patterns, while for MSSR the manifold approximated by sparse support vectors may lack precision. We conclude that our MCSR reconstructs HR images more faithful to the originals, with sharper edges, finer textures and fewer artifacts, owing to the collaboration support framework, which is more powerful and flexible in representing complex image patterns.

Fig. 2 Visual qualitative comparison for the ‘butterfly’ image from Set5 with magnification ×3

Fig. 3 Visual qualitative comparison for the ‘lenna’ image from Set14 with magnification ×3

Fig. 4 Visual qualitative comparison for the ‘pepper’ image from Set14 with magnification ×3

We further compare the running times of all methods; the results are shown in Fig. 5. The NE [17] method costs more than 1000 s due to the k-NN search over a large dataset. By learning a compact dictionary, Yang et al.'s method [20] and MSSR shorten the time cost to below 100 s, which is still far from practical. Zeyde et al.'s method efficiently boosts the process by using OMP to select a few dictionary atoms for the input LR patch instead of solving a complex L1 optimization problem. Although our normal MCSR method is similar to the MSSR framework, collaboration representation support further shortens the running time. Note that using 320 support vectors in the manifold-preserving process yields the best reconstruction quality, but the time cost becomes prohibitive due to the large L1 graph-solving process. This problem is effectively addressed by our global solution, which costs only around 0.8 s with no compromise in reconstruction quality, leading to a very efficient solution for the single-image SR problem.

Fig. 5 The average PSNR and running time on Set14 images for different methods

5 Conclusion

In this paper, we proposed a new SR strategy named manifold-regularized collaboration representation support regression (MCSR) to overcome the limitations of optical systems. By incorporating the collaboration representation of the training samples into the previous manifold-preserving framework, we obtain a much more precise manifold than MSSR. Furthermore, we proposed a global solution for the MCSR framework that moves the whole mapping matrix calculation offline, increasing the online execution speed by about 100 times. Experimental results show that MCSR provides a substantial improvement in reconstruction quality over other algorithms, with more faithful details and fewer artifacts. Simplifying the manifold-preserving process and improving the manifold mapping precision will be our future work.