1 Introduction

Hyperspectral images (HSIs) are acquired from the radiance measured by airborne or space-borne sensors and capture information about objects or scenes on the Earth's surface. They contain rich spatial and spectral information and have been widely used in numerous fields, e.g., terrain detection, environmental monitoring, and biological diagnosis [1]. However, HSIs are often corrupted by various types of noise caused by photon effects, random errors in light counting, calibration errors, and so on [2]. These degradations damage the underlying structure of an HSI and impede subsequent tasks. Therefore, mixed noise removal has become an essential step for further HSI analysis and application.

In the past few decades, many different techniques have been developed for HSI denoising. The simplest and most intuitive idea is to denoise band by band. Representative single-band image denoising methods include the maximum noise fraction (MNF) transformation [3], the non-local means (NLMs) filter [4], and collaborative filtering of groups of similar patches (BM3D) [5]. The HSI cube can also be denoised by multi-band image denoising methods, including block-matching and 4-D filtering (BM4D) [6], video denoising using separable 4-D non-local spatio-temporal transforms (VBM4D) [7], and multi-spectral principal component analysis (MSPCA-BM3D) [8]. However, these methods mainly exploit spatial information and cannot fully take advantage of the correlation among all bands of an HSI. Moreover, they may introduce artifacts or distortions into the restored results, so their denoising results are often unsatisfactory.

To achieve better denoising performance, spatial and spectral information must be considered jointly. Yuan et al. [9] proposed a spectral-spatial adaptive total variation (TV) denoising method that accounts for spectral and spatial differences simultaneously. Fu et al. [10] exploited the spectral correlation and the non-local spatial similarity of each band to learn an adaptive dictionary for HSI denoising. Chen et al. [11] took advantage of the different characteristics of an HSI in the spatial and spectral domains and established a maximum a posteriori (MAP) framework for HSI denoising. In addition, subspace-based methods are widely used to describe spectral correlation and have achieved good results, including fast hyperspectral denoising (FastHyDe) [12] and global local factorization (GLF) [13]. Moreover, because the 2-D images in different bands of an HSI are structurally very similar, many methods exploit the low-rank property of the HSI along the spectral dimension. For instance, Zhang et al. [14] proposed a low-rank matrix recovery (LRMR) method for HSI restoration, Fan et al. [15] built a 3-D low-rank tensor model for HSI denoising, and Zhang et al. [16] proposed a double low-rank matrix decomposition method for HSI denoising and destriping. For further low-rank matrix recovery techniques, which recover the data and reveal its internal structure efficiently and effectively, see [17,18,19].

In addition, deep learning methods, represented by convolutional neural networks (CNNs), have been applied to HSI denoising, e.g., the HSI denoising method based on a residual convolutional neural network (HSID-CNN) [20] and the HSI single-denoising CNN (HSI-SDeCNN) [21]. Although these deep learning-based methods achieve state-of-the-art performance, they are not necessarily robust. Moreover, they do not fully explore the intrinsic spectral and spatial relationships of an HSI, so they do not adapt well to different data and noise types: when the noise in the test data becomes more complex, the model must be retrained from scratch, which is time-consuming.

In recent years, sparse representation has been widely used for image denoising [22] and restoration [23]. A signal is approximated by a combination of a few atoms from a dictionary. This sparse set of atoms reduces the redundancy of the original high-dimensional signal, so that the information it contains can be retrieved more easily. Sparse representation has been applied to 2-D image denoising [24]; however, it often struggles with HSI denoising because it ignores the correlation between the bands and between the spatial pixels of an HSI. To make full use of the spectral correlation and spatial similarity of HSIs, Ye et al. proposed a sparse representation method that exploits non-local spectral-spatial structure [25]. Zhao et al. proposed a model that combines sparse representation with low-rank constraints to solve the HSI restoration problem [26].

HSI denoising can also be regarded as an inverse imaging problem, so appropriate image priors for regularization are very important. Graph Laplacian regularization, a widely used prior in recent years, has proven empirically useful in denoising [27, 28], sharpening [29], HSI unmixing [30, 31], etc. For HSIs, Lu et al. [32] proposed a destriping method that combines graph regularization with a low-rank constraint to exploit the local manifold structure of the HSI.

Although the methods described above achieve good results, few of them treat stripe noise as a separate component; hence most mixed noise removal methods cannot remove stripe noise successfully. Stripe noise has significant structural and directional characteristics, and stripes often recur periodically, which gives the stripe component a low-rank property. In this paper, we add a low-rank constraint on the stripe noise to an HSI denoising method based on sparse representation and graph Laplacian regularization. The contributions of this paper are as follows:

  1. We regard the stripe noise as an independent component and use the nuclear norm to regularize its low-rank property, which yields satisfactory destriping and denoising performance;

  2. We solve the proposed non-convex model with an iterative update algorithm that minimizes the least-squares error to find a locally optimal solution;

  3. We conduct extensive experiments on simulated and real data to select appropriate parameters, and the experimental results demonstrate that the proposed method outperforms many mainstream methods in both quantitative evaluation indexes and visual quality.

The remainder of this paper is organized as follows. Section 2 introduces related work on graph regularized nonnegative matrix factorization and multi-task graph-regularized sparse nonnegative matrix factorization. Section 3 presents the proposed model and derives its solution. Section 4 reports the results of different methods on simulated and real data. Finally, conclusions are drawn in Sect. 5.

2 Related work

An HSI stacks images of the same area acquired at different wavelengths. Hence, its bands are strongly correlated, which leads to information redundancy [33]. In this paper, we use matrix decomposition to reduce this redundancy [34]. Matrix decomposition approximates the original high-dimensional data matrix by the product of two or more low-dimensional matrices, which reduces the data dimension and thus the redundancy of the HSI.

Because of its convenience and simplicity, the nonnegative matrix factorization (NMF) algorithm has been widely applied in computer vision, document clustering, recommendation systems, and other fields [35]. Unlike sparse dictionary learning, NMF requires the original matrix, the basis matrix, and the coefficient matrix to be nonnegative, which makes it well suited to nonnegative data such as images, whose pixel intensities are always nonnegative. However, NMF does not consider the geometric structure of the data. To address this issue, Cai et al. constructed an affinity graph to encode geometric information and sought a matrix factorization that respects the structure of the graph [36]. The resulting algorithm is called graph regularized nonnegative matrix factorization (GNMF), and its objective function is defined as:

$$\begin{aligned} \{\mathbf {U,V}\} = \mathop {\arg\min }\limits _{{{\textbf {U,V}}}} \left\| {{{\textbf {Z}}} - {{\textbf {UV}}}} \right\| _{\mathbf {F}}^{\mathbf {2}} + \frac{\mu }{2}\sum \limits _{i,j = 1}^{{\mathcal {N}}} {{{\left\| {{v_i} - {v_j}} \right\| }^2}} {{{\textbf {W}}}_{i,j}}, \end{aligned}$$
(1)

where \(\mathbf {Z}=[{{z}_{1}},{{z}_{2}},\ldots ,{{z}_\mathcal {N}}]\in \mathbb {R}^{\mathcal {M}\times \mathcal {N}}\) is the given data matrix; each column of \(\mathbf {Z}\) is a sample vector, \(\mathcal {M}\) is the dimension of the vectors, and \(\mathcal {N}\) is the number of samples. \(\mathbf {U}\in \mathbb {R}_{\ge 0}^{\mathcal {M}\times \mathcal {R}}\) is the nonnegative basis of \(\mathbf {Z}\) in the NMF representation, and \(\mathbf {V}=[{{v}_{1}},{{v}_{2}},\ldots ,{{v}_\mathcal {N}}]\in \mathbb {R}_{\ge 0}^{\mathcal {R}\times \mathcal {N}}\) (\(\mathcal {R}\ll \mathcal {N}\)) is the nonnegative coefficient matrix. An \(\mathcal {N}\)-node graph \(\mathcal {G}\) is formed whose nodes carry the graph signals \(\{{{v}_{1}},{{v}_{2}},\ldots ,{{v}_\mathcal {N}}\}\). \(\mathbf {W}\) is the corresponding adjacency matrix, whose element \(\mathbf {W}_{i,j}\) is the edge weight between nodes \(v_i\) and \(v_j\), and \(\mu\) is the regularization parameter that controls the weight of the regularization term. The second term in (1) penalizes differences between the coefficient vectors of strongly connected nodes, i.e., it measures the association between nodes [37], and it can be rewritten as follows:

$$\begin{aligned} \begin{aligned} \frac{\mathbf {1}}{\mathbf {2}}\sum \limits _{i,j=1}^{\mathcal {N}}{\left\| {{v}_{i}}-{{v}_{j}} \right\| _{\mathbf {2}}^{\mathbf {2}}{\mathbf {{W}}_{i,j}}}&=\sum \limits _{i=1}^{\mathcal {N}}{v_{i}^{T}{{v}_{i}}{\mathbf {{D}}_{ii}}-}\sum \limits _{i,j=1}^{\mathcal {N}}{v_{i}^{T}}{{v}_{j}}{\mathbf {{W}}_{i,j}} \\&=Tr(\mathbf {V}\mathbf {D}{\mathbf {{V}}^{T}})-Tr(\mathbf {V}\mathbf {W}{\mathbf {{V}}^{T}}) \\&=Tr(\mathbf {V}\mathbf {L}{\mathbf {{V}}^{T}}), \\ \end{aligned} \end{aligned}$$
(2)

where \(Tr(\cdot )\) denotes the trace of a matrix and \(\mathbf {D}\) is a diagonal matrix with \(\mathbf {D}_{i,i}=\sum \nolimits _{j=1}^{\mathcal {N}}\mathbf {W}_{i,j}\); the first equality in (2) uses the symmetry of \(\mathbf {W}\). The graph Laplacian matrix \(\mathbf {L}=\mathbf {D}-\mathbf {W}\in \mathbb {R}^{\mathcal {N}\times \mathcal {N}}\) is symmetric positive semidefinite [38]. Therefore, the GNMF objective can be rewritten as:

$$\begin{aligned} \{\mathbf {U},\mathbf {V}\} =\underset{\mathbf {U},\mathbf {V}}{\mathop {\arg \min }}\,\left\| \mathbf {Z}-\mathbf {UV} \right\| _{F}^{2}+\mu Tr(\mathbf {V}\mathbf {L}\mathbf {V}^{T}). \end{aligned}$$
(3)
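As a quick numerical check of identity (2), the following NumPy sketch builds \(\mathbf{D}\) and \(\mathbf{L}=\mathbf{D}-\mathbf{W}\) from a random symmetric weight matrix and compares the pairwise form of the regularizer with \(Tr(\mathbf{V}\mathbf{L}\mathbf{V}^T)\). The function names are illustrative, and the weights are arbitrary rather than a graph used in this paper.

```python
import numpy as np


def graph_laplacian(W):
    """L = D - W, where D is diagonal with D_ii = sum_j W_ij."""
    D = np.diag(W.sum(axis=1))
    return D - W


def smoothness_pairwise(V, W):
    """Left-hand side of Eq. (2): 0.5 * sum_ij W_ij * ||v_i - v_j||^2 (v_i = column i of V)."""
    diff = V[:, :, None] - V[:, None, :]          # diff[:, i, j] = v_i - v_j
    return 0.5 * np.sum(W * np.sum(diff ** 2, axis=0))


def smoothness_trace(V, W):
    """Right-hand side of Eq. (2): Tr(V L V^T)."""
    return np.trace(V @ graph_laplacian(W) @ V.T)


rng = np.random.default_rng(0)
V = rng.random((3, 6))          # R = 3 coefficients for each of N = 6 nodes
A = rng.random((6, 6))
W = (A + A.T) / 2               # Eq. (2) relies on a symmetric weight matrix
print(np.isclose(smoothness_pairwise(V, W), smoothness_trace(V, W)))  # True
```

Because the weights are nonnegative, \(\mathbf{L}\) is also positive semidefinite, so the regularizer can never be negative.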

GNMF can reveal the geometric relationships inside high-dimensional data. Building on GNMF, Lei et al. extended it to HSI denoising and proposed multi-task graph-regularized sparse nonnegative matrix factorization (MTGSNMF) [39]. In this method, an \(\ell _1\) norm is added on the coefficient matrix \(\mathbf {V}\) to enforce sparsity and cope with sparse noise, and the restoration model is:

$$\begin{aligned} \{{\textbf {U}},{\textbf {V}}\}=\underset{\mathbf {U},\mathbf {V}}{\mathop {\arg \min }}\,\left\| \mathbf {Z-UV} \right\| _{F}^{2}+\lambda {{\left\| \mathbf {V} \right\| }_{1}}+\mu Tr(\mathbf {VL}{{\mathbf {V}}^{T}}) . \end{aligned}$$
(4)

3 Method

3.1 Problem formulation

As mentioned above, HSIs are usually corrupted by several types of noise. In this article, we assume that the noise components are independent and additive, so that:

$$\begin{aligned} {{\textbf {Y}}} = {{\textbf {X}}} + {{\textbf {N}}} + {{\textbf {B}}} + {{\textbf {S}}}, \end{aligned}$$
(5)

where \(\mathbf {Y}\in \mathbb {R}^{M\times N\times P}\) is the observed HSI with M rows, N columns, and P bands, \(\mathbf {X}\) is the clean image, and the remaining terms are additive noise components: \(\mathbf {N}\) is the Gaussian noise, \(\mathbf {B}\) is the sparse noise, and \(\mathbf {S}\) is the stripe noise. Given the observed noisy data \(\mathbf {Y}\), our goal is to restore \(\mathbf {X}\).

3.2 Proposed model

Sparse representation provides an approximation of the clean HSI, and graph Laplacian regularization describes the non-local similarity of the entire HSI. Moreover, the stripe noise is low-rank; hence, we combine a low-rank constraint on the stripe noise with MTGSNMF. The proposed HSI reconstruction model is as follows:

$$\begin{aligned}&\underset{\mathbf{U},\mathbf{V},\mathbf{S}}{\mathop{\arg \min }}\,\left\| \mathbf{Y'}-\mathbf{UV}-\mathbf{S'} \right\| _{\mathbf{F}}^{\mathbf{2}}+\lambda{{\left\| \mathbf{V} \right\| }_{\mathbf{1}}}+\mu Tr(\mathbf{VL}{{\mathbf{V}}^{T}}) +\beta \sum \limits _{b=1}^{P}{\mathrm{rank}({{\mathbf{S}}_{b}})}, \\ \end{aligned}$$
(6)

where \(\mathbf {Y'}\) and \(\mathbf {S'}\) are block-rearranged versions of \(\mathbf {Y}\) and \(\mathbf {S}\), constructed as follows. To exploit the redundancy of the HSI and the correlation between bands, and inspired by NLMs [4], we divide the observed \(\mathbf {Y}\) into a set of K overlapping full-band blocks \(\{\mathbf{Y}_1, \mathbf{Y}_2,\ldots ,\mathbf{Y}_K\}\). To simplify the representation of each graph node and facilitate subsequent calculations, each block \(\mathbf{Y}_k \in \mathbb {R}^{m\times m\times P}\) is vectorized into \(\mathbf {Y}_k' \in \mathbb {R}^{m^2P\times 1}\), where m is the width and height of the block and P is the number of bands, and \(\mathbf {Y'}=[{\mathbf {Y}_1'},{\mathbf {Y}_2'},\ldots ,{\mathbf {Y}_K'}]\). \(\mathbf {S'}\) is obtained from \(\mathbf {S}\) in the same way, so \(\mathbf {Y'},\mathbf {S'}\in \mathbb {R}^{m^2P\times K}\). Finally, \(\mathbf {S}_b \in \mathbb {R}^{M\times N}\) is the 2-D stripe noise image in the b-th band, and \(\mathbf {S} = [\mathbf {S}_1, \mathbf {S}_2,\ldots ,\mathbf {S}_P]\).
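The block rearrangement that produces \(\mathbf{Y'}\) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions: blocks are vectorized in row-major (row, column, band) order, and block positions that would run past the image border are skipped; the paper does not specify either detail, and the helper name `blocks_to_columns` is hypothetical.

```python
import numpy as np


def blocks_to_columns(Y, m=4, step=2):
    """Stack overlapping full-band m x m x P blocks of an (M, N, P) cube as columns.

    Returns Y' with shape (m*m*P, K). Assumptions (not from the paper): row-major
    vectorization, and blocks that would cross the image border are skipped.
    """
    M, N, P = Y.shape
    cols = []
    for i in range(0, M - m + 1, step):          # vertical block origins
        for j in range(0, N - m + 1, step):      # horizontal block origins
            cols.append(Y[i:i + m, j:j + m, :].reshape(-1))   # vectorize one block
    return np.stack(cols, axis=1)


Y = np.random.default_rng(0).random((10, 10, 3))
Yp = blocks_to_columns(Y)
print(Yp.shape)   # (48, 16): 4*4*3 entries per block, 4 x 4 block positions
```

With a 4 × 4 block and a step of 2, neighboring blocks overlap by half their width, so every interior pixel appears in several columns of \(\mathbf{Y'}\).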

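Because the rank function is non-convex and discontinuous, problem (6) is difficult to solve directly. Therefore, we replace the rank by its convex surrogate, the nuclear norm \(\left\| \cdot \right\| _{*}\) [40, 41]. For a matrix \(\mathbf {C}\) with singular value decomposition \(\mathbf {C}=\mathbf {U}_C\boldsymbol{\Sigma }\mathbf {V}_C^{T}\), \(\left\| \mathbf {C} \right\| _* =Tr\sqrt{\mathbf {C}^{T}\mathbf {C}}=Tr\sqrt{\mathbf {V}_C\boldsymbol{\Sigma }^{T}\boldsymbol{\Sigma }\mathbf {V}_C^{T}}=\sum \nolimits _{i}\sigma _i(\mathbf {C})\), i.e., the sum of the singular values \(\sigma _i(\mathbf {C})\) on the diagonal of \(\boldsymbol{\Sigma }\) (the subscript C distinguishes these SVD factors from the NMF factors \(\mathbf {U}\) and \(\mathbf {V}\)). Thus, the reconstruction model is reformulated as: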

$$\begin{aligned}&\underset{\mathbf{U},\mathbf{V},\mathbf{S}}{\mathop{\arg \min }}\,\left\| \mathbf{Y'}-\mathbf{UV}-\mathbf{S'} \right\| _{\mathbf{F}}^{\mathbf{2}}+\lambda{{\left\| \mathbf{V} \right\| }_{\mathbf{1}}}+\mu Tr(\mathbf{VL}{{\mathbf{V}}^{T}}) +\beta \sum \limits _{b=1}^{P}{{{\left\| {{\mathbf{S}}_{b}} \right\| }_{\mathbf{*}}}}, \\ \end{aligned}$$
(7)

where \(\left\| \mathbf {Y'}-\mathbf {UV}-\mathbf {S'} \right\| _{\mathbf {F}}^{\mathbf {2}}\) is the data fidelity term, which suppresses Gaussian noise through the squared Frobenius norm; \(\left\| \mathbf {V} \right\| _{\mathbf {1}}\) is the sparsity term, which enforces sparse coefficients and also constrains the sparse noise; and \(\sum \nolimits _{b = 1}^P \left\| \mathbf {S}_b \right\| _{*}\) is the low-rank constraint on the stripe noise.
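To make the nuclear-norm term concrete, the sketch below checks numerically that \(Tr\sqrt{\mathbf{C}^T\mathbf{C}}\) equals the sum of singular values, and shows why the constraint suits stripes: a band whose stripes are constant offsets along selected columns is a rank-one matrix. The stripe offsets are made-up values for illustration only.

```python
import numpy as np


def nuclear_norm(C):
    """||C||_* as the sum of the singular values of C."""
    return np.linalg.svd(C, compute_uv=False).sum()


def nuclear_norm_via_trace(C):
    """Tr(sqrt(C^T C)) from the (clipped) eigenvalues of the PSD matrix C^T C."""
    evals = np.linalg.eigvalsh(C.T @ C)
    return np.sqrt(np.clip(evals, 0.0, None)).sum()


rng = np.random.default_rng(1)
C = rng.standard_normal((5, 4))
print(np.isclose(nuclear_norm(C), nuclear_norm_via_trace(C)))  # True

# Illustrative vertical stripes: the same offset all the way down columns 1 and 4
offsets = np.zeros(6)
offsets[[1, 4]] = 0.075
stripes = np.ones((8, 1)) @ offsets[None, :]
print(np.linalg.matrix_rank(stripes), nuclear_norm(stripes))  # rank 1, norm about 0.3
```

For a rank-one matrix \(\mathbf{a}\mathbf{b}^T\) the nuclear norm is simply \(\|\mathbf{a}\|_2\|\mathbf{b}\|_2\), here \(\sqrt{8}\cdot 0.075\sqrt{2}=0.3\).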

Furthermore, \(Tr(\mathbf {VL}\mathbf {V}^T)\) is a graph Laplacian regularizer that helps preserve spatial relations. As mentioned above, the graph has K nodes, and node k carries the signal \(\mathbf{Y}_k'\). Following [39], we apply k-means clustering to the graph nodes so that they are divided into c classes. Two nodes in the same class are connected by an edge of weight 1, and two nodes in different classes have weight 0. The edge weights are therefore defined as:

$$\begin{aligned} {{\mathbf {W}}_{i,j}}=\left\{ \begin{matrix} 1, &{} {\mathbf {Y'}_{i}}\ \text {and}\ {\mathbf {Y'}_{j}}\ \text {are in the same cluster,} \\ 0, &{} \text {otherwise.} \\ \end{matrix} \right. \end{aligned}$$
(8)

The corresponding graph Laplacian matrix \(\mathbf {L}\) is obtained by \(\mathbf {L}=\mathbf {D}-\mathbf {W}\).
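A minimal sketch of the graph construction in (8) follows. It assumes a plain Lloyd k-means with farthest-point initialization on the columns of \(\mathbf{Y'}\) (the clustering details of [39] may differ), and the function names are hypothetical. The resulting \(\mathbf{W}\) is block-diagonal up to a permutation of the nodes, so every row of \(\mathbf{L}\) sums to zero.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distances between the columns of X (d x K) and C (d x c) -> K x c."""
    return (np.sum(X ** 2, axis=0)[:, None] - 2 * X.T @ C
            + np.sum(C ** 2, axis=0)[None, :])


def kmeans_labels(X, c, iters=20, seed=0):
    """Assumed clustering: Lloyd k-means on the columns of X, farthest-point init."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(X.shape[1]))]
    while len(idx) < c:                       # add the column farthest from chosen centers
        idx.append(int(sq_dists(X, X[:, idx]).min(axis=1).argmax()))
    centers = X[:, idx].copy()
    for _ in range(iters):
        labels = sq_dists(X, centers).argmin(axis=1)
        for q in range(c):
            if np.any(labels == q):           # keep the old center if a cluster empties
                centers[:, q] = X[:, labels == q].mean(axis=1)
    return labels


def cluster_graph(labels):
    """Eq. (8): W_ij = 1 if nodes i and j share a cluster, else 0. Returns W, D, L = D - W."""
    W = (labels[:, None] == labels[None, :]).astype(float)
    D = np.diag(W.sum(axis=1))
    return W, D, D - W


rng = np.random.default_rng(0)
# ten hypothetical block columns drawn from two well-separated groups
Yp = np.hstack([rng.normal(0.2, 0.01, (12, 5)), rng.normal(0.8, 0.01, (12, 5))])
W, D, L = cluster_graph(kmeans_labels(Yp, c=2))
print(W.sum(axis=1))   # each node links to the 5 nodes of its cluster, itself included
```

The diagonal entries \(\mathbf{W}_{i,i}=1\) cancel in \(\mathbf{L}=\mathbf{D}-\mathbf{W}\), so including self-loops does not change the regularizer.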

Moreover, \(\lambda\), \(\mu\), and \(\beta\) are regularization parameters: \(\lambda\) controls the sparsity, \(\mu\) controls the strength of the graph regularization, and \(\beta\) weights the low-rank stripe regularizer. The objective function has three variables to optimize, \(\mathbf {U}\), \(\mathbf {V}\), and \(\mathbf {S}\), and we solve it with an alternating minimization strategy, as follows.

3.3 Optimization procedure

The objective function in (7) is non-convex, so its global minimum is difficult to find. \(\mathbf {U}\) and \(\mathbf {V}\) are initialized randomly with entries between 0 and 1, so their product \(\mathbf {UV}\) is initially only a rough approximation of the observation matrix \(\mathbf {Y'}\in \mathbb {R}^{m^2P\times K}\). To solve for \(\mathbf {U}\) and \(\mathbf {V}\), we use an iterative update algorithm that minimizes the least-squares error to find a locally optimal solution [42]. To solve for the stripe component \(\mathbf {S}\), we rearrange the 2-D matrix \(\mathbf {UV}\) into a 3-D cube, apply the low-rank constraint band by band, and soft-threshold the singular values [43]. For convenience, let

$$\begin{aligned}&\mathcal{L}=\left\| \mathbf{Y'-UV-S'} \right\| _{\mathbf{F}}^{\mathbf{2}}+\lambda{{\left\| \mathbf{V} \right\| }_{\mathbf{1}}}+\mu Tr(\mathbf{VL}{{\mathbf{V}}^{T}}) +\beta \sum \limits _{b=1}^{P}{{{\left\| {{\mathbf{S}}_{b}} \right\| }_{\mathbf{*}}}}, \\ \end{aligned}$$
(9)

To minimize \(\mathcal {L}\) with respect to \(\mathbf {U}\) and \(\mathbf {V}\), we first compute its partial derivatives (the nuclear-norm term does not depend on \(\mathbf {U}\) or \(\mathbf {V}\)):

$$\begin{aligned} \begin{aligned} \left\{ \begin{aligned}&\frac{{\partial \mathcal{L}}}{{\partial {{\textbf {U}}}}} = - {\mathbf {2}}{{\textbf {Y'}}}{{{\textbf {V}}}^T} + {\mathbf {2}}{{\textbf {UV}}}{{{\textbf {V}}}^T} + {\mathbf {2}}{{\textbf {S'}}}{{{\textbf {V}}}^T} \\&\frac{{\partial \mathcal{L}}}{{\partial {{\textbf {V}}}}} = - {\mathbf {2}}{{{\textbf {U}}}^T}{{\textbf {Y'}}} + {\mathbf {2}}{{{\textbf {U}}}^T}{{\textbf {UV}}} + {\mathbf {2}}{{{\textbf {U}}}^T}{{\textbf {S'}}} + \lambda + {\mathbf {2}}\mu {{\textbf {V}}}{{{\textbf {L}}}^T}\\ \end{aligned} \right. \end{aligned}, \end{aligned}$$
(10)

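where the term \(\lambda\) denotes \(\lambda \mathbf {1}\), with \(\mathbf {1}\) the all-ones matrix of the same size as \(\mathbf {V}\), because \(\left\| \mathbf {V} \right\| _{1}=\sum \nolimits _{i,j}\mathbf {V}_{i,j}\) for \(\mathbf {V}\ge 0\). Substituting \(\mathbf {L} = \mathbf {D} - \mathbf {W}\), \(\frac{\partial \mathcal{L}}{\partial \mathbf {V}}\) becomes: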

$$\begin{aligned} \frac{{\partial \mathcal{L}}}{{\partial {{\textbf {V}}}}} = - {\mathbf {2}}{{{\textbf {U}}}^T}{{\textbf {Y'}}} + {\mathbf {2}}{{{\textbf {U}}}^T}{{\textbf {UV}}} + {\mathbf {2}}{{{\textbf {U}}}^T}{{\textbf {S'}}} + \lambda + {\mathbf {2}}\mu {{\textbf {V}}}{{{\textbf {D}}}^T} - {\mathbf {2}}\mu {{\textbf {V}}}{{{\textbf {W}}}^T}, \end{aligned}$$
(11)

Using the KKT conditions \(\mathbf {U} \odot \frac{\partial \mathcal{L}}{\partial \mathbf {U}} = 0\) and \(\mathbf {V} \odot \frac{\partial \mathcal{L}}{\partial \mathbf {V}} = 0\), where \(\odot\) denotes element-wise multiplication, and dividing both conditions by 2, we obtain the following equations w.r.t. \(\mathbf {U}\) and \(\mathbf {V}\):

$$\begin{aligned} \begin{aligned} \left\{ \begin{aligned}&{{\textbf {U}}} \odot ( - {{\textbf {Y'}}}{{{\textbf {V}}}^T} + {{\textbf {UV}}}{{{\textbf {V}}}^T} + {{\textbf {S'}}}{{{\textbf {V}}}^T}) = 0 \\&{{\textbf {V}}} \odot ( - {{{\textbf {U}}}^T}{{\textbf {Y'}}} + {{{\textbf {U}}}^T}{{\textbf {UV}}} + {{{\textbf {U}}}^T}{{\textbf {S'}}} + \frac{\lambda }{{\mathbf {2}}} + \mu {{\textbf {V}}}{{{\textbf {D}}}^T} - \mu {{\textbf {V}}}{{{\textbf {W}}}^T})= 0 \\ \end{aligned} \right. \end{aligned}, \end{aligned}$$
(12)

These equations lead to the following multiplicative update rules for \(\mathbf {U}\) and \(\mathbf {V}\), in which the fraction denotes element-wise division; for the proof of these rules, see Theorem 1 of GNMF [36].

$$\begin{aligned} \begin{aligned} \left\{ \begin{aligned}&{{\textbf {U}}} \leftarrow {{\textbf {U}}} \odot \frac{{{{\textbf {Y'}}}{{{\textbf {V}}}^T}}}{{{{\textbf {UV}}}{{{\textbf {V}}}^T} + {{\textbf {S'}}}{{{\textbf {V}}}^T}}} \\&{{\textbf {V}}} \leftarrow {{\textbf {V}}} \odot \frac{{{{{\textbf {U}}}^T}{{\textbf {Y'}}} + \mu {{\textbf {V}}}{{{\textbf {W}}}^T}}}{{{{{\textbf {U}}}^T}{{\textbf {UV}}} + {{{\textbf {U}}}^T}{{\textbf {S'}}} + \frac{\lambda }{{\mathbf {2}}} + \mu {{\textbf {V}}}{{{\textbf {D}}}^T}}} \\ \end{aligned} \right. \end{aligned}, \end{aligned}$$
(13)
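One pass of the multiplicative rules in (13) can be sketched as follows. Two assumptions are ours, not the paper's: \(\mathbf{U}\) is updated before \(\mathbf{V}\) within a pass, and each denominator is floored at a small `eps`, because a stripe block matrix \(\mathbf{S'}\) with negative entries could otherwise make a denominator zero or negative and break the nonnegativity of the factors. The usage lines run on random nonnegative data with \(\mathbf{S'}=0\) and a hypothetical two-cluster graph.

```python
import numpy as np


def mu_update(Yp, Sp, U, V, W, D, lam=10.0, mu=10.0, eps=1e-12):
    """One pass of the multiplicative rules of Eq. (13).

    Yp, Sp: (m^2 P, K) observed and stripe block matrices; U: (m^2 P, R); V: (R, K);
    W, D: (K, K) adjacency and degree matrices of the cluster graph.
    """
    # U <- U * (Y' V^T) / (U V V^T + S' V^T), element-wise
    den_U = U @ V @ V.T + Sp @ V.T
    U = U * (Yp @ V.T) / np.maximum(den_U, eps)     # eps floor: our safeguard
    # V <- V * (U^T Y' + mu V W^T) / (U^T U V + U^T S' + lam/2 + mu V D^T), element-wise
    num_V = U.T @ Yp + mu * V @ W.T
    den_V = U.T @ U @ V + U.T @ Sp + lam / 2 + mu * V @ D.T
    V = V * num_V / np.maximum(den_V, eps)
    return U, V


rng = np.random.default_rng(0)
Yp = rng.random((48, 16))                 # nonnegative toy data
Sp = np.zeros_like(Yp)                    # no stripe estimate yet
labels = np.repeat([0, 1], 8)             # two hypothetical clusters of 8 nodes
W = (labels[:, None] == labels[None, :]).astype(float)
D = np.diag(W.sum(axis=1))
U, V = rng.random((48, 4)), rng.random((4, 16))   # random init in [0, 1)
err0 = np.linalg.norm(Yp - U @ V)
for _ in range(200):
    U, V = mu_update(Yp, Sp, U, V, W, D, lam=0.1, mu=0.1)
print(err0, np.linalg.norm(Yp - U @ V))  # the fitting error drops
```

Since every factor in the numerators and (floored) denominators is nonnegative, the updates keep \(\mathbf{U}\) and \(\mathbf{V}\) nonnegative without any projection step.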

After obtaining \(\mathbf {U}\) and \(\mathbf {V}\) from (13), we rearrange \(\mathbf {UV}\) into a 3-D cube \(\mathbf {E}\), which approximates the denoised image. The stripe component \(\mathbf {S}\) is then solved band by band with the augmented Lagrange multiplier (ALM) method [44]:

$$\begin{aligned} \begin{aligned} {\mathbf {S}_{b}}&=\underset{{\mathbf {S}_{b}}}{\mathop {\arg \min }}\,\sum \limits _{b=1}^{P}{\left( \beta {{\left\| {\mathbf {S}_{b}} \right\| }_{*}}+{{Y}_{b}^S} \cdot (\mathbf {Y}_{b}-\mathbf {E}_{b}-\mathbf {S}_{b})+\frac{\rho }{2}\left\| \mathbf {Y}_{b}-\mathbf {E}_{b}-\mathbf {S}_{b} \right\| _{\mathbf {F}}^{\mathbf {2}} \right) } \\&=\underset{{\mathbf {S}_{b}}}{\mathop {\arg \min }}\,\sum \limits _{b=1}^{P}{\left( \beta {{\left\| {\mathbf {S}_{b}} \right\| }_{*}}+\frac{\rho }{2}\left\| \mathbf {Y}_{b}-\mathbf {E}_{b}-\mathbf {S}_{b}+\frac{{{Y}_{b}^S}}{\rho } \right\| _{\mathbf {F}}^{\mathbf {2}} \right) }, \\ \end{aligned} \end{aligned}$$
(14)

This is a low-rank matrix approximation (LRMA) problem, which is solved by soft-thresholding the singular values [43]. This gives the following update rules:

$$\begin{aligned} \left\{ \begin{array}{l} \mathrm{SVD}\left( \mathbf {Y}_b - \mathbf {E}_b+\frac{Y_b^S}{\rho }\right) =U\Sigma V^{*} \\ \Sigma _{r}=\mathrm{diag}(\sigma _{i}),\ 1\le i\le r \\ \mathrm{shrink}\_L_{*}\left( \Sigma _{r},\frac{\beta }{\rho }\right) =\mathrm{diag}\left\{ \max \left( \Sigma _{r,ii}-\frac{\beta }{\rho },0\right) \right\} \\ \mathbf {S}_b=U\,\mathrm{shrink}\_L_{*}\left( \Sigma _{r},\frac{\beta }{\rho }\right) V^{*} \\ \end{array} \right. \end{aligned}$$
(15)

where \(\rho\) is the penalty factor, \(Y_b^S\) is the Lagrange multiplier, \(\Sigma _{r,ii}\) is the i-th diagonal element of the truncated singular value matrix \(\Sigma _r = \mathrm{diag}(\sigma _i),\ 1\le i\le r\), \(\mathrm{shrink}\_L_{*}(\cdot )\) is the soft-thresholding operator, and r is the upper bound on the rank of the low-rank matrix.

Finally, the Lagrange multiplier can be updated in parallel:

$$\begin{aligned} {Y_b^S} = {Y_b^S} + \rho \cdot (\mathbf {Y}_b-\mathbf {E}_b-\mathbf {S}_b). \end{aligned}$$
(16)
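The stripe subproblem (14) and the updates (15) and (16) can be sketched band by band as follows. This is an illustration under assumptions: the cube layout is (M, N, P), the optional rank cap `r` simply zeroes all but the r largest thresholded singular values, and `svt` and `stripe_alm_step` are hypothetical names. The usage lines apply one step to synthetic rank-one stripes plus weak Gaussian noise with \(\mathbf{E}=0\); with \(\beta/\rho\) above the noise level, the recovered \(\mathbf{S}_b\) is rank one.

```python
import numpy as np


def svt(M, tau, r=None):
    """Singular value soft-thresholding U diag(max(sigma - tau, 0)) V^*, optionally rank-capped."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    if r is not None:
        s[r:] = 0.0                     # keep at most the r largest singular values
    return (U * s) @ Vh                 # U * s scales column i of U by s[i]


def stripe_alm_step(Y, E, S, Lam, beta=5.0, rho=1.0, r=None):
    """One band-by-band pass of Eqs. (15)-(16) on (M, N, P) cubes; Lam holds the Y_b^S."""
    S, Lam = S.copy(), Lam.copy()
    for b in range(Y.shape[2]):
        R_b = Y[:, :, b] - E[:, :, b]
        S[:, :, b] = svt(R_b + Lam[:, :, b] / rho, beta / rho, r)   # Eq. (15)
        Lam[:, :, b] += rho * (R_b - S[:, :, b])                    # Eq. (16)
    return S, Lam


rng = np.random.default_rng(0)
offsets = np.zeros(40)
offsets[rng.choice(40, size=6, replace=False)] = 0.3
stripes = np.ones((40, 1)) @ offsets[None, :]          # rank-one vertical stripes
Y = np.stack([stripes + 0.01 * rng.standard_normal((40, 40)) for _ in range(2)], axis=2)
E = np.zeros_like(Y)                                    # clean part set to zero in this toy case
S, Lam = stripe_alm_step(Y, E, np.zeros_like(Y), np.zeros_like(Y), beta=0.5, rho=1.0)
print(np.linalg.matrix_rank(S[:, :, 0]))               # 1
```

Soft-thresholding shrinks the retained singular value by \(\beta/\rho\) as well, so a larger \(\beta\) removes more noise at the cost of slightly underestimating the stripe amplitude.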

The proposed algorithm for HSI denoising, termed SRGLR, is summarized in Algorithm 1.

Algorithm 1: SRGLR

4 Experiments

4.1 Experimental setup

For the simulated experiments, the Pavia City Center and Washington DC Mall datasets serve as clean HSIs. The Pavia City Center dataset was collected by the Reflective Optics System Imaging Spectrometer (ROSIS-03), and we use a \(200\times 200\times 80\) sub-image. The Washington DC Mall dataset was acquired by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor over the Washington, DC Mall, and we use a \(256\times 256\times 80\) sub-image.

For the real data experiments, we use the Indian Pines dataset and the Gaofen-5 (GF-5) Shanghai dataset. The Indian Pines dataset was captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and has \(145\times 145\) pixels and 220 bands. The GF-5 Shanghai dataset was acquired by the Advanced Hyperspectral Imager (AHSI) on the GF-5 satellite, and we use a sub-image of size \(300\times 300\times 155\). Some bands of both real datasets are severely degraded by various types of noise.

To evaluate performance, we use the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index. PSNR measures the quality of the restored image based on the mean squared error, and SSIM measures the similarity between the restored image and the reference image; higher PSNR and SSIM indicate better denoising. To further evaluate spectral fidelity, we also use the spectral angle mapper (SAM), which measures the angle between the spectral vectors of the restored HSI and the noise-free HSI; a smaller SAM means that the restored spectra are closer to the original ones. To reflect the overall recovery quality, we report the mean PSNR (MPSNR) and mean SSIM (MSSIM) over all bands and the mean SAM (MSAM) over all pixels.
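For reference, MPSNR and MSAM can be computed as sketched below for cubes scaled to [0, 1], as in our experiments. The sketch assumes PSNR with a peak value of 1 and reports SAM in degrees averaged over pixels; these conventions are our assumption. MSSIM, i.e., the per-band SSIM averaged over bands, is omitted here; scikit-image's `skimage.metrics.structural_similarity` is one available SSIM implementation.

```python
import numpy as np


def mpsnr(X, Xhat, peak=1.0):
    """Mean over bands of PSNR = 10 log10(peak^2 / MSE) for (M, N, P) cubes."""
    mse = np.mean((X - Xhat) ** 2, axis=(0, 1))          # per-band MSE, shape (P,)
    return np.mean(10 * np.log10(peak ** 2 / mse))


def msam(X, Xhat, eps=1e-12):
    """Mean spectral angle (assumed to be in degrees) between clean and restored pixel spectra."""
    a = X.reshape(-1, X.shape[2])
    b = Xhat.reshape(-1, Xhat.shape[2])
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    return np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))


rng = np.random.default_rng(0)
X = rng.random((64, 64, 10))
Xhat = X + rng.normal(0.0, 0.01, X.shape)                # noise std 0.01, so MSE is about 1e-4
print(mpsnr(X, Xhat), msam(X, Xhat))                      # MPSNR close to 40 dB
```

SAM is invariant to scaling a spectrum, so it complements PSNR: a uniform brightness error lowers PSNR but leaves SAM unchanged.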

To test the effectiveness of our method on both simulated and real data, we conduct a series of comparative experiments with six classical or state-of-the-art model-based denoising methods and one deep learning method: BM4D [6], low-rank matrix recovery (LRMR) [14], noise-adjusted iterative low-rank matrix approximation (NAILRMA) [45], spatial-spectral total variation regularized local low-rank matrix recovery (LLRGTV) [46], multi-task graph-regularized sparse nonnegative matrix factorization (MTGSNMF) [39], stripe spectral low-rank and spatial-spectral TV regularization (SSTVSSLR) [44], and HSI-SDeCNN [21]. Since the pre-trained HSI-SDeCNN model provided by its authors only considers Gaussian noise, for a fair comparison we retrained the network with salt-and-pepper noise and stripe noise added to the training and validation data; the best model was obtained after 56.7 hours of training.Footnote 1

Before denoising, to help the algorithm converge to a local optimum more efficiently, we normalize the HSI data to [0, 1] by dividing by the maximum value of the data cube. Stripe noise is simulated in a randomly selected \(30\%\) of the bands. Empirically, we set the regularization parameters to \(\lambda =10\), \(\mu =10\), and \(\beta =5\); the block size is \(4\times 4\), and the step size is 2 in both the horizontal and vertical directions. The stopping threshold and the maximum number of iterations are set to \(\varepsilon ={{10}^{-6}}\) and \({{I}_{\max }=300}\), respectively. To simulate different noise conditions, we consider the following cases:

Case 1: We add Gaussian noise with standard deviation \(\sigma\) = 0.05 and salt-and-pepper noise with percentage o = 0.01. The stripe noise intensity is v = 0.075, and the stripe percentage increases from r = 0.3 (Case 1.1) to r = 0.5 (Case 1.2) and r = 0.7 (Case 1.3).

Case 2: The Gaussian noise and salt-and-pepper noise are the same as in Case 1. The stripe percentage is r = 0.3, and the stripe intensity ranges from v = 0.05 (Case 2.1) to v = 0.1 (Case 2.2).
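The noise cases can be simulated with a sketch like the one below. The paper does not detail its noise generators, so the following choices are assumptions: salt-and-pepper pixels are set to 0 or 1 with equal probability, stripes are vertical (column-wise), r is the fraction of striped columns in each selected band, and each stripe adds a constant offset drawn uniformly from [-v, v]. The function name `add_mixed_noise` is hypothetical.

```python
import numpy as np


def add_mixed_noise(X, sigma=0.05, o=0.01, v=0.1, r=0.3, band_frac=0.3, seed=0):
    """Corrupt a clean (M, N, P) cube in [0, 1] following Y = X + N + B + S (assumed generators)."""
    rng = np.random.default_rng(seed)
    M, N, P = X.shape
    Y = X + rng.normal(0.0, sigma, X.shape)                   # Gaussian noise N
    # salt-and-pepper noise B: a fraction o of all pixels set to 0 or 1
    mask = rng.random(X.shape) < o
    Y[mask] = rng.integers(0, 2, mask.sum()).astype(float)
    # stripe noise S: in a random band_frac of the bands, a fraction r of the columns
    # receives a constant offset drawn uniformly from [-v, v]
    S = np.zeros_like(X)
    bands = rng.choice(P, size=int(round(band_frac * P)), replace=False)
    for b in bands:
        cols = rng.choice(N, size=int(round(r * N)), replace=False)
        S[:, cols, b] = rng.uniform(-v, v, size=cols.size)    # one offset per striped column
    return Y + S, S, bands


X = np.random.default_rng(1).random((40, 40, 10))             # stand-in for a clean cube
Y, S, bands = add_mixed_noise(X, v=0.1, r=0.3)                # settings of Case 2.2
print(sorted(bands))
```

Because every striped column carries a single constant offset, each striped band of \(\mathbf{S}\) is rank one, which matches the low-rank stripe model in (7).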

4.2 Simulated experimental results

Figures 1 and 2 show the denoising results of different methods for the fourth band of the Pavia City Center image in Case 1.3 and for the second band of the Washington DC Mall image in Case 2.2, respectively. In both figures, the first row shows the restored images, and the second row shows an enlarged local region. Figures 1a, b and 2a, b display the original noise-free images and the noisy images, respectively. Figures 1c–i and 2c–i are the images restored by the compared methods, and Figs. 1j and 2j are the results of our proposed method. Visual comparison shows that BM4D removes neither the stripe noise nor the salt-and-pepper noise, because it ignores the correlation among bands. NAILRMA only considers Gaussian noise, so its results are unsatisfactory. LRMR, LLRGTV, MTGSNMF, and HSI-SDeCNN remove Gaussian and salt-and-pepper noise but fail to remove stripe noise. Although SSTVSSLR removes mixed noise well, in some areas, such as the enlarged region in Fig. 1, it either enhances or attenuates the bright points at the top of the image, whereas our method better preserves details while removing mixed noise.

Fig. 1

The Pavia City Center image (4th band) [top] and zoom-in image [bottom] before and after denoising in Case 1.3 with \(\sigma\) = 0.05 and o = 0.01, r = 0.7, v = 0.075. a Original image; b Noisy image; the denoising results obtained by c BM4D; d LRMR; e NAILRMA; f LLRGTV; g MTGSNMF; h SSTVSSLR; i HSI-SDeCNN; j SRGLR (proposed)

Fig. 2

The Washington DC Mall image (2nd band) [top] and zoom-in image [bottom] before and after denoising in Case 2.2 with \(\sigma\) = 0.05 and o = 0.01, r = 0.3, v = 0.1. a Original image; b Noisy image; the denoising results obtained by c BM4D; d LRMR; e NAILRMA; f LLRGTV; g MTGSNMF; h SSTVSSLR; i HSI-SDeCNN; j SRGLR (proposed)

Fig. 3

Vertical mean profiles of band 4 in the Pavia City Center image a Original image; b Noisy image; before and after denoising via the different methods: c BM4D; d LRMR; e NAILRMA; f LLRGTV; g MTGSNMF; h SSTVSSLR; i HSI-SDeCNN; j SRGLR (proposed). The green, red, and blue curves represent the mean DN of the clean, noisy and restored images, respectively

Fig. 4

Vertical mean profiles of band 2 in the Washington DC Mall image a Original image; b Noisy image; before and after denoising via the different methods: c BM4D; d LRMR; e NAILRMA; f LLRGTV; g MTGSNMF; h SSTVSSLR; i HSI-SDeCNN; j SRGLR (proposed). The green, red, and blue curves represent the mean DN of the clean, noisy and restored images, respectively

Figures 3 and 4 display the vertical mean profiles corresponding to Figs. 1 and 2. The horizontal axis is the column index, and the vertical axis is the mean gray level, or digital number (DN), of each column. The green, red, and blue curves represent the mean DN of the clean, noisy, and restored images, respectively; the closer the blue curve is to the green curve, the better the denoising. Figures 3c–j and 4c–j compare the mean DN of the denoised images with the ground truth. The curves obtained by BM4D, LRMR, NAILRMA, LLRGTV, and MTGSNMF are close to the noisy curve (red), because band 4 of the Pavia dataset contains strong stripe noise, which significantly affects the mean DN. Since none of these five methods removes the stripe noise, their restored mean DN curves are almost identical to the noisy curve. Compared with these model-based methods, HSI-SDeCNN gives slightly better visual performance, but it fails to preserve details and leaves some stripe noise. SSTVSSLR and our method perform well on both datasets, which indicates that a low-rank constraint on the stripe component is essential for removing stripe noise. In addition, the curve obtained by our method is closer to the clean curve, which further illustrates the advantage of our method.
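The profiles in Figs. 3 and 4 are column means of a single band. The sketch below computes them and adds a hypothetical scalar summary, the RMS gap between two profiles, which is not a metric used in this paper but quantifies how close the blue and green curves are.

```python
import numpy as np


def vertical_mean_profile(band):
    """Mean DN of each column of a 2-D band (the curves plotted in Figs. 3 and 4)."""
    return band.mean(axis=0)


def profile_gap(clean_band, restored_band):
    """Hypothetical summary: RMS difference between two column-mean profiles."""
    d = vertical_mean_profile(clean_band) - vertical_mean_profile(restored_band)
    return float(np.sqrt(np.mean(d ** 2)))


clean = np.full((50, 8), 0.4)                 # flat toy band
striped = clean.copy()
striped[:, 3] += 0.075                        # one vertical stripe in column 3
print(vertical_mean_profile(striped))         # the spike at column 3 reveals the stripe
print(profile_gap(clean, striped))
```

A stripe shifts the mean of its whole column, which is why stripe noise dominates these profiles while zero-mean Gaussian noise largely averages out.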

Figure 5 shows the PSNR and SSIM of each band for the Pavia City Center dataset in Case 1.3 and the Washington DC Mall dataset in Case 2.2. Our method achieves the highest PSNR and SSIM in some bands and, more importantly, gives comparatively stable results across bands. The curves of some methods show severe sawtooth fluctuations, which lowers their overall indexes, whereas our method yields relatively smooth and steady curves and hence a better overall result.

Fig. 5

PSNR and SSIM values of each band: a, b Pavia City Center image in Case 1.3; c, d Washington DC Mall image in Case 2.2

Table 1 lists the MPSNR, MSSIM, and MSAM of the different methods on the Pavia City Center and Washington DC Mall datasets under both noise cases, i.e., Case 1 and Case 2. The best values are shown in bold. Our method achieves the highest MPSNR, the highest MSSIM, and the lowest MSAM in most cases. Moreover, its advantage over the other methods becomes more pronounced as the percentage and intensity of the stripe noise increase. These experiments demonstrate the effectiveness of the proposed method in the simulated cases.
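MSAM measures spectral fidelity: for each pixel it computes the angle between the clean and restored spectra and then averages over all pixels, while MPSNR and MSSIM are typically band-wise averages. The sketch below assumes (rows, cols, bands) cubes and reports the angle in degrees, a common but not universal convention; the small eps that guards against zero-norm spectra is a choice of this sketch, not of the paper.

```python
import numpy as np

def msam(clean, restored, eps=1e-12):
    """Mean spectral angle in degrees; cubes have shape (rows, cols, bands)."""
    x = np.asarray(clean, dtype=float)
    y = np.asarray(restored, dtype=float)
    x = x.reshape(-1, x.shape[-1])  # one spectrum per row
    y = y.reshape(-1, y.shape[-1])
    cos = (x * y).sum(axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1) + eps)
    # Clipping guards against rounding pushing cos slightly outside [-1, 1].
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    return np.degrees(angles).mean()

rng = np.random.default_rng(4)
cube = rng.uniform(0.1, 1.0, size=(8, 8, 20))
print(msam(cube, 2.0 * cube))  # close to 0: SAM ignores a global rescaling
print(msam(np.array([[[1.0, 0.0]]]), np.array([[[1.0, 1.0]]])))  # about 45
```

Because the angle is insensitive to a global scaling of a spectrum, MSAM complements MPSNR and MSSIM, which penalize intensity errors.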

Table 1 Quantitative image quality indices of the different denoising methods on the two HSI datasets under simulated mixed noise (Case 1 and Case 2)

4.3 Real noisy data experimental results

Figure 6 shows the results on the Indian Pines dataset. Figure 6a displays the original noisy image at band 219, and Fig. 6b–i show the denoising results of the different methods. As Fig. 6b shows, BM4D cannot remove the mixed noise. As shown in Fig. 6c, d, LRMR and NAILRMA remove some of the noise, but significant noise remains. In Fig. 6e, LLRGTV produces artifacts in a local area. In Fig. 6f, MTGSNMF oversmooths the result and removes image details. In Fig. 6h, HSI-SDeCNN leaves obvious stripes in local areas and generates checkerboard artifacts. In this visual comparison, SSTVSSLR and our proposed method give the best results.

Fig. 6

Band 219 of the Indian Pines dataset before and after denoising via the different methods: a Original image; b BM4D; c LRMR; d NAILRMA; e LLRGTV; f MTGSNMF; g SSTVSSLR; h HSI-SDeCNN; i SRGLR (proposed)

Fig. 7

Band 155 of the GF-5 Shanghai image before and after denoising via the different methods: a Original image; b BM4D; c LRMR; d NAILRMA; e LLRGTV; f MTGSNMF; g SSTVSSLR; h HSI-SDeCNN; i SRGLR (proposed)

Figure 7a shows band 155 of the GF-5 Shanghai dataset. As Fig. 7a shows, the image is severely corrupted by Gaussian noise, impulse noise, dense stripe noise, and dead lines. In the detailed views of the denoised images, the results of BM4D and LRMR still contain heavy stripe noise, dead lines, and some sparse noise. NAILRMA, LLRGTV, MTGSNMF, and HSI-SDeCNN perform better but cannot fully remove the stripe noise and dead lines. SSTVSSLR performs better than these methods, but some noise remains in the lower-left and upper-right corners. Overall, our proposed method delivers the best visual result.

In addition, we provide the vertical mean profiles of the 155th band of the GF-5 Shanghai dataset before and after denoising. Figure 8a shows that the original data are severely affected by various types of noise, and the mean DN curve fluctuates rapidly. Figure 8b–i show the mean DN curves after denoising by the different methods. BM4D and LRMR barely change the curve relative to the original noisy one. The remaining methods suppress the noise to varying degrees, and SSTVSSLR, HSI-SDeCNN, and our method produce smoother curves. Compared with the other methods, our method gives the best result both globally and in local details, which further supports the superiority of the proposed method.

Fig. 8

Vertical mean profiles of band 155 in the GF-5 Shanghai image before and after denoising via the different methods: a Original image; b BM4D; c LRMR; d NAILRMA; e LLRGTV; f MTGSNMF; g SSTVSSLR; h HSI-SDeCNN; i SRGLR (proposed). The red and blue curves represent the mean DN of the original and restored images, respectively

Fig. 9

Change in the MPSNR values of the proposed method for the Pavia City Center image (top) and the Washington DC Mall image (bottom) when varying the parameters \(\lambda\), \(\mu\), and \(\beta\). The data were corrupted by the noise simulated in Case 1 and Case 2 with \(\sigma\) = 0.05 and o = 0.01: a, f r = 0.3, v = 0.075; b, g r = 0.5, v = 0.075; c, h r = 0.7, v = 0.075; d, i r = 0.3, v = 0.05; e, j r = 0.3, v = 0.1

4.4 Discussion

Parameter tuning is essential for the denoising task. Our model has three parameters to tune: \(\lambda\), \(\mu\), and \(\beta\). \(\lambda\) controls the regularization of the salt-and-pepper noise, \(\mu\) is the graph Laplacian regularization parameter, and \(\beta\) is the parameter for the stripe noise. Figure 9 shows how the MPSNR changes on the Pavia City Center and Washington DC Mall data under Case 1 and Case 2 as the three parameters are varied along the three coordinate axes.
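The role of \(\mu\) can be illustrated with a generic graph Laplacian regularizer. The sketch below is an illustration under stated assumptions, not the graph construction behind cost function (7): the nodes are the rows of a matrix X (for example, vectorized non-local patches), the edge weights use a Gaussian kernel with a hypothetical bandwidth h, and \(L = D - W\). It checks numerically that \(\mathrm{tr}(X^\top L X)\) equals one half of the weighted sum of squared differences between nodes, so weighting this term by a larger \(\mu\) pulls strongly connected (similar) nodes closer together.

```python
import numpy as np

def gaussian_graph_laplacian(X, h=1.0):
    """Unnormalized Laplacian L = D - W for nodes given by the rows of X."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / h ** 2)  # hypothetical Gaussian-kernel weights
    np.fill_diagonal(W, 0.0)  # no self-loops
    D = np.diag(W.sum(axis=1))  # degree matrix
    return D - W, W

def laplacian_regularizer(X, L):
    """tr(X^T L X): small when strongly connected nodes have similar features."""
    return np.trace(X.T @ L @ X)

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 4))  # 6 illustrative nodes with 4 features each
L, W = gaussian_graph_laplacian(X, h=2.0)

# Equivalent pairwise form: 0.5 * sum_ij W_ij * ||x_i - x_j||^2
pairwise = 0.5 * sum(W[i, j] * np.sum((X[i] - X[j]) ** 2)
                     for i in range(len(X)) for j in range(len(X)))
print(laplacian_regularizer(X, L), pairwise)  # the two values agree
```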

As mentioned in Sect. 3.3, the cost function (7) is not convex, so its global optimum is difficult to find. This is why Fig. 9 shows several local maxima rather than a single clear optimum. We therefore take the best local solution as our target. Balancing the experimental results across different noise conditions and datasets, we finally set the parameters to \(\lambda = 10\), \(\mu = 10\), and \(\beta = 5\). As the experimental results above show, these parameters are robust to various noise conditions and different datasets.
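The parameter selection described above amounts to evaluating a grid and keeping the setting with the best score averaged over noise cases and datasets. The sketch assumes a user-supplied callable score(case, lam, mu, beta), for example the MPSNR of the denoiser on one simulated case; here a synthetic toy function stands in for it, so its output says nothing about the actual SRGLR results. Because only grid points are evaluated, the search returns the best point on the grid, in line with settling for a good local solution rather than a guaranteed global optimum.

```python
import itertools
import numpy as np

def select_parameters(score, grid, cases):
    """Return the (lam, mu, beta) with the highest mean score over all cases."""
    best, best_val = None, -np.inf
    for lam, mu, beta in itertools.product(grid["lam"], grid["mu"], grid["beta"]):
        # Average over noise cases/datasets to trade off their results.
        val = np.mean([score(case, lam, mu, beta) for case in cases])
        if val > best_val:
            best, best_val = (lam, mu, beta), val
    return best, best_val

def toy_score(case, lam, mu, beta):
    """Synthetic stand-in for an MPSNR surface; NOT results from the paper."""
    return (30.0 - (np.log10(lam) - case) ** 2
            - np.log10(mu) ** 2 - (np.log10(beta) - 0.5) ** 2)

grid = {"lam": [0.1, 1, 10, 100], "mu": [0.1, 1, 10, 100], "beta": [1, 3, 5, 10]}
best, best_val = select_parameters(toy_score, grid, cases=[0.5, 1.5])
print(best, best_val)
```

With real data, each call to score would run the full denoiser, so the grid is usually kept coarse, as with the few values per axis varied in Fig. 9.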

5 Conclusions

Focusing on the high dimensionality and particular geometric structure of HSIs, this paper proposes a sparse representation and graph Laplacian regularization (SRGLR) method for destriping and denoising. Based on an analysis of the properties of the mixed noise, we combine graph Laplacian regularization, sparse representation, and a low-rank term to construct our denoising model. The sparse representation ensures a good approximation of the restored image, and the graph Laplacian regularization term promotes the non-local similarity of the HSI. In addition, the low-rank constraint helps remove the stripe noise, and the sparse regularizer eliminates the sparse noise. To solve the proposed model, an iterative algorithm that seeks a local minimum is used to obtain the final restored image. Finally, experiments on both simulated and real data show that our method has advantages in removing mixed noise. In future work, we will consider removing more complex mixed noise, such as dead lines, dead zones, and Poisson noise.
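Why a low-rank constraint suits stripe noise can be checked on a toy example. If each striped column carries a constant offset, every row of the stripe component repeats the same offset pattern, so the stripe matrix of a band has rank one, while dense Gaussian noise is full rank. The sketch below uses hypothetical stripe positions and intensities; real stripes whose intensity varies along a column are only approximately low-rank, and the exact low-rank term used in SRGLR is the one defined in the model section.

```python
import numpy as np

rng = np.random.default_rng(3)
rows, cols = 50, 40

# Column-wise stripes: each affected column gets one constant offset.
offsets = np.zeros(cols)
stripe_cols = rng.choice(cols, size=12, replace=False)  # hypothetical stripe positions
offsets[stripe_cols] = rng.uniform(-0.2, 0.2, size=12)  # hypothetical intensities
S = np.outer(np.ones(rows), offsets)  # every row repeats the same offset pattern

N = rng.normal(0.0, 0.05, size=(rows, cols))  # dense Gaussian noise for comparison

print(np.linalg.matrix_rank(S))  # 1: the stripe component is rank one
print(np.linalg.matrix_rank(N))  # 40: Gaussian noise is full rank
```

This gap in rank is what, in principle, lets a nuclear-norm or similar low-rank penalty separate the stripe component from Gaussian noise.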