Multispectral image denoising using sparse and graph Laplacian Tucker decomposition

Multispectral image denoising is a fundamental problem whose results affect subsequent processes such as target detection and classification. Numerous approaches have been proposed, but many challenges remain, particularly in exploiting prior knowledge of multispectral images, which is crucial for solving the ill-posed problem of noise removal. This paper considers both non-local self-similarity in space and global correlation in spectrum. We propose a novel low-rank Tucker decomposition model for removing noise, in which sparse and graph Laplacian regularization terms encode this prior knowledge. It can jointly learn a sparse and low-rank representation while preserving the local geometrical structure between spectral bands, so as to better capture the correlation in spatial and spectral directions simultaneously. We adopt the alternating direction method of multipliers to solve the resulting optimization problem. Experiments demonstrate that the proposed method outperforms state-of-the-art methods, such as cube-based and tensor-based methods, both quantitatively and qualitatively.


Introduction
A multispectral image (MSI) contains dozens of bands, each captured over a specific wavelength range of the electromagnetic spectrum. It provides rich spectral-spatial information that improves the performance of remote sensing applications such as target detection [1], classification [2], and tracking [3]. However, various kinds of noise inevitably degrade visual quality and affect the results obtained in those applications. MSI denoising is thus useful as a basic preprocessing step that can improve the performance of subsequent processes. In recent years, it has attracted increasing attention in computer vision and remote sensing.
A great many MSI denoising approaches have been proposed to date [4]. The spatial-spectral structure of an MSI corresponds to a 1D signal at each spatial point, and a grayscale image in each spectral band. Hence, straightforward denoising employs conventional methods for 1D signals and grayscale images, such as wavelet domain soft thresholding [5], K-SVD [6], and block-matching and 3D filtering (BM3D) [7]. However, those methods only take into account spectral correlation or spatial correlation, and thus satisfactory results cannot be generally achieved. To overcome this issue, various methods aimed at making full use of spatial-spectral correlation have been proposed. Some examples include the use of principal component analysis [8], multidimensional Wiener filtering [9], and tensor decompositions [10][11][12].
Because MSI denoising is inherently ill-posed, formulating and exploiting prior knowledge plays a central role in addressing the problem. Non-local spatial self-similarity and global correlation in spectrum, which have been shown to be two very reasonable priors [13][14][15], are commonly used. Non-local self-similarity refers to the phenomenon that, when an MSI is observed from the perspective of small full-band patches, each full-band patch is similar to many other, non-local, ones. Global correlation in spectrum means that there is a large amount of spectral redundancy in an MSI: the images of different bands are generally highly correlated.
In the state-of-the-art approach proposed in Ref. [16], to use these two kinds of priors, a Kronecker basis representation (KBR) based tensor sparsity measure is introduced and applied to tensors formed from non-local similar full-band patches. However, it fails to consider the geometrical structure linking spectral bands: the set of clean band images lies on a low-dimensional manifold embedded in image space. Thus, as the results of band selection in Ref. [17] show, there are possibilities to further investigate spectral correlations using this geometrical structure, to obtain better denoising results.
This paper aims to take a further step in effectively utilizing the priors of non-local self-similarity and spectral correlation. As in Ref. [16], we build tensors from non-local similar full-band patches to naturally form representations based on those two priors. The low-rank Tucker decomposition is adopted to capture correlation simultaneously in all dimensions of the patch tensor. Furthermore, to fully exploit the two kinds of priors, the band-wise local geometrical structure is formulated and incorporated as well as sparse regularization. Our main contributions are:
• A sparse and graph Laplacian regularized low-rank Tucker decomposition model for removing noise. We use weighted graphs to characterize the intrinsic local geometrical structure of spectral bands. As similar bands should have similar low-dimensional representations, graph embedding is used to jointly learn a sparse low-rank representation while preserving the graph structure.
• An algorithm based on this model and the alternating direction method of multipliers (ADMM) for removing noise.
• An evaluation of this approach on a benchmark MSI dataset with various noise levels. Detailed analyses, such as the balance of noise removal across spectral bands, are carried out. The results show that our method outperforms state-of-the-art methods, both quantitatively and qualitatively.
The rest of the paper is organized as follows. In Sections 2 and 3, we respectively introduce useful tensor concepts and give a brief review of related work. Section 4 investigates how to encode the priors, and proposes a sparse, graph Laplacian regularized, low-rank Tucker decomposition model for MSI denoising. A denoising algorithm based on the alternating direction method of multipliers employing this model is also given. In Section 5, experiments show quantitative and qualitative evaluation of the proposed method. Finally, we conclude the paper in Section 6.

Tensor background
In this section, we briefly describe some useful tensor concepts. Readers are referred to Refs. [18,19] for details and applications.
First, we explain the notation used in this paper. We denote scalars by non-bold letters, vectors by bold lowercase letters, matrices by bold uppercase letters, and higher-order tensors by bold calligraphic letters. The ith column of a matrix A is denoted by a_i or A(:, i), and the ith row by A(i, :). We use [N] to denote the set {1, · · · , N} for N ∈ N_+.
Tensors are elements of a tensor product of vector spaces, and are often identified with their array representations once bases for those vector spaces are fixed. Specifically, a real Nth-order (I_1, · · · , I_N)-dimensional tensor A ∈ R^(I_1×···×I_N) is an I_1 × · · · × I_N array

A = (a_{i_1···i_N}), where i_n ∈ [I_n] for n ∈ [N]

Scalars, vectors, and matrices are 0th-order, 1st-order, and 2nd-order tensors respectively. Tensors of order 3 or higher are called higher-order tensors, and are natural representations of multidimensional data that preserve its native structure. For example, an MSI can be naturally treated as a 3rd-order tensor, in which the spatial-spectral structure is preserved. Let A ∈ R^(I_1×···×I_N). A mode-n vector of tensor A is an I_n-dimensional vector defined by fixing all indices but i_n; the mode-n matricization, denoted A_(n), is an I_n × (I_1 · · · I_{n−1} I_{n+1} · · · I_N) matrix formed by arranging all mode-n vectors as its columns. For instance, the mode-2 matricization of a matrix is its transpose.
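As an illustration, the mode-n matricization can be sketched in a few lines of NumPy. The paper's implementation uses the MATLAB Tensor Toolbox; this Python `unfold` helper is our own, shown only for illustration. The column ordering of an unfolding depends on convention, but the property that the mode-2 matricization of a matrix is its transpose holds regardless:

```python
import numpy as np

def unfold(A, n):
    """Mode-n matricization (hypothetical helper, not from the paper):
    move axis n to the front, then flatten the remaining axes.
    Columns of the result are the mode-n vectors of A."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

A = np.arange(24).reshape(2, 3, 4)
A1 = unfold(A, 0)   # 2 x 12 matrix
A2 = unfold(A, 1)   # 3 x 8 matrix
assert A1.shape == (2, 12) and A2.shape == (3, 8)

# For a matrix (2nd-order tensor), the mode-2 matricization is the transpose
M = np.arange(6).reshape(2, 3)
assert np.array_equal(unfold(M, 1), M.T)
```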
The scalar product of tensors A, B ∈ R^(I_1×···×I_N) is defined as

⟨A, B⟩ = Σ_{i_1=1}^{I_1} · · · Σ_{i_N=1}^{I_N} a_{i_1···i_N} b_{i_1···i_N}

It is clear that the scalar product of tensors can be expressed in matrix form as ⟨A, B⟩ = tr(A_(n)^T B_(n)) for any n ∈ [N], in which tr(·) denotes the matrix trace operator. The Frobenius norm of a tensor A ∈ R^(I_1×···×I_N) is given by ‖A‖_F = √⟨A, A⟩.
The n-mode product of a tensor A ∈ R^(I_1×···×I_N) with a matrix U ∈ R^(J_n×I_n), denoted A ×_n U, is a tensor B ∈ R^(I_1×···×I_{n−1}×J_n×I_{n+1}×···×I_N) defined entry-wise by

b_{i_1···i_{n−1} j_n i_{n+1}···i_N} = Σ_{i_n=1}^{I_n} a_{i_1···i_N} u_{j_n i_n}

It can be matricized as B_(n) = U A_(n): each mode-n vector of A is multiplied by U.
A tensor S ∈ R^(I_1×···×I_N) is said to be rank-1 if it can be expressed as the outer product of N vectors:

S = v^(1) ◦ v^(2) ◦ · · · ◦ v^(N)

where v^(n) ∈ R^(I_n) for n ∈ [N]. Let A be an arbitrary I_1 × · · · × I_N tensor. The rank of A, denoted rank(A), is defined as the minimal number of rank-1 tensors that yield A as their sum; the n-rank rank_n(A) is defined as the column rank of A_(n); the multilinear rank is consequently defined as the N-tuple (R_1, · · · , R_N), in which R_n = rank_n(A). Note that the values of R_1, · · · , R_N and R = rank(A) can all be different for higher-order tensors.
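The entry-wise definition of the n-mode product can be checked numerically. The sketch below (the `unfold`/`fold`/`mode_n_product` helper names are our own, not from the paper) implements the matricized form B_(n) = U A_(n) and verifies one entry against the summation formula:

```python
import numpy as np

def unfold(A, n):
    """Mode-n matricization: mode-n vectors become the columns."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def fold(mat, n, shape):
    """Inverse of unfold for a target tensor of the given shape."""
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(mat.reshape(full), 0, n)

def mode_n_product(A, U, n):
    """Compute B = A x_n U via B_(n) = U A_(n)."""
    shape = list(A.shape)
    shape[n] = U.shape[0]
    return fold(U @ unfold(A, n), n, shape)

A = np.random.rand(4, 5, 6)
U = np.random.rand(3, 5)
B = mode_n_product(A, U, 1)
assert B.shape == (4, 3, 6)

# Entry-wise check against the definition b_{ijk} = sum_t a_{itk} u_{jt}
i, j, k = 2, 1, 4
assert np.isclose(B[i, j, k], sum(U[j, t] * A[i, t, k] for t in range(5)))
```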
Using the concepts of tensors, the singular value decomposition (SVD) of a matrix A = UΣV^T can be written as A = Σ ×_1 U ×_2 V. This means that a 2nd-order tensor A can be decomposed into a diagonal core tensor Σ multiplied by orthonormal matrices U and V along its 1-mode and 2-mode respectively. Tensor decompositions are useful tools in multilinear algebra, and are higher-order generalizations of the matrix SVD. However, a decomposition that satisfies both properties, namely orthogonality of the factor matrices and diagonality of the core tensor, does not exist in general. There are two basic models that treat a tensor as a global quantity.
One basic model is the CP decomposition proposed by Hitchcock [20]. It factorizes a tensor A ∈ R^(I_1×···×I_N) into a linear combination of R rank-1 tensors:

A = Σ_{r=1}^{R} λ_r v_r^(1) ◦ · · · ◦ v_r^(N)

where λ_r ∈ R and v_r^(n) ∈ R^(I_n) are normalized vectors for r ∈ [R] and n ∈ [N]. The other basic model is the Tucker decomposition [21]. It factorizes a tensor A ∈ R^(I_1×···×I_N) into a core tensor G ∈ R^(R_1×···×R_N) multiplied by a matrix U^(n) ∈ R^(I_n×R_n) along each n-mode:

A = G ×_1 U^(1) ×_2 U^(2) · · · ×_N U^(N)

Here, the factor matrices U^(1), · · · , U^(N) are usually column orthonormal. This decomposition can be expressed in matrix form as

A_(n) = U^(n) G_(n) (U^(N) ⊗ · · · ⊗ U^(n+1) ⊗ U^(n−1) ⊗ · · · ⊗ U^(1))^T

The symbol ⊗ represents the Kronecker product of matrices.
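A Tucker decomposition with column-orthonormal factors can be computed by the truncated higher-order SVD (HOSVD), shown below as one concrete, non-iterative sketch; at full multilinear rank the reconstruction G ×_1 U^(1) ×_2 U^(2) ×_3 U^(3) is exact. This routine is for illustration only and is not the optimization scheme used later in the paper:

```python
import numpy as np

def unfold(A, n):
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def hosvd(A, ranks):
    """Truncated HOSVD: factor U^(n) holds the leading left singular
    vectors of the mode-n unfolding; the core is
    G = A x_1 U1^T x_2 U2^T x_3 U3^T (illustrative helper)."""
    Us = []
    for n, r in enumerate(ranks):
        Un, _, _ = np.linalg.svd(unfold(A, n), full_matrices=False)
        Us.append(Un[:, :r])
    G = A
    for n, U in enumerate(Us):
        # n-mode product with U^T, implemented via tensordot
        G = np.moveaxis(np.tensordot(U.T, G, axes=(1, n)), 0, n)
    return G, Us

A = np.random.rand(5, 6, 7)
G, Us = hosvd(A, (5, 6, 7))   # full multilinear rank: exact reconstruction
B = G
for n, U in enumerate(Us):
    B = np.moveaxis(np.tensordot(U, B, axes=(1, n)), 0, n)
assert np.allclose(A, B)
```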

Related work
Many methods have been proposed for MSI denoising. This section briefly reviews the relevant literature.

2D and extended 2D approaches
As mentioned earlier, grayscale image denoising approaches, such as non-local means (NLM) [22], K-SVD [6], and BM3D [7], can be directly used to reduce noise in each spectral band separately. However, an MSI is not just a collection of grayscale images: there is significant correlation between the spectral bands. Hence, these methods generally cannot achieve satisfactory results. In Ref. [23], a spectral-spatial kernel method is proposed; it can maintain spectral correlation while simultaneously matching the original structure between the two spatial dimensions. Another reasonable idea is to extend patch-based 2D denoising methods to 3D by building small cubes from an MSI. An adaptive non-local means (ANLM3D) method is proposed in Ref. [24]; the state-of-the-art method in this line is BM4D, proposed in Ref. [13], which utilizes 3D non-local self-similarity to collaboratively remove noise in similar cubes. Such methods still do not reach the full potential for noise removal, since the global spectral correlation prior is neglected.

Tensor-based approaches
Tensor decompositions are promising tools to process multidimensional data, preserving the native form of the data. Recently, the mathematical foundation of tensor decompositions has rapidly developed as well as tensor-based data processing techniques [18,19]. An MSI can be naturally represented by a 3rd-order tensor, and many denoising methods applying various tensor decompositions have been proposed to date.
In Ref. [12], the CP decomposition is used. Based on the Tucker decomposition, a multidimensional Wiener filtering method is presented in Ref. [9], and a multilinear low-rank approximation method is proposed in Ref. [10]. To avoid explicitly estimating the rank parameters, a sparse regularization on the core tensor is introduced in Ref. [25]. The recently proposed tensor singular value decomposition (t-SVD) [26] is adopted in Ref. [27] to achieve an effective representation of the MSI, and thus effective recovery. The advantage of these methods is that they jointly capture the correlation in spatial and spectral directions, but the non-local self-similarity prior is not well utilized.
The tensor dictionary learning (TDL) [14] method takes the two kinds of priors mentioned earlier into account, but it only coarsely encodes the non-local self-similarity prior, using a relatively small number of full-band clusters. The state-of-the-art method in this line was proposed in Ref. [16], in which a KBR based sparsity measure is introduced and applied to tensors formed by non-local similar full-band patches. However, it does not consider the geometrical structure linking spectral bands, i.e., that the set of clean band images resides on a low-dimensional manifold. There are possibilities to further investigate spectral correlation by using this geometrical structure, thereby improving the denoising performance, as the band selection results in Ref. [17] show.

Sparse and graph regularized low-rank Tucker decomposition
We treat an MSI as an H × W × S tensor, where H, W , and S are the height, width, and number of spectral bands respectively. Driven by the idea of using prior knowledge to improve the results of denoising, this paper investigates the priors of nonlocal self-similarity and spectral correlation based on a patch framework. We propose a low-rank Tucker decomposition model with sparse and graph Laplacian regularization, in which a sparse low-rank representation incorporating the local geometrical information linking spectral bands can be jointly learned. Its suitability for combining MSI correlation simultaneously along spatial and spectral directions produces better results. Figure 1 illustrates the idea of our method.

Problem formulation
Typically, an MSI is degraded by additive Gaussian noise. Hence, we consider the noise degradation model

Y = X + E

where Y is the noisy MSI, X is the clean MSI, and E is the Gaussian noise. The problem of MSI denoising, that is, recovering X from Y, is inherently ill-posed, and prior knowledge is needed to address it. This paper employs both the non-local self-similarity and spectral correlation priors at the same time, as do other state-of-the-art methods [13,14,16].
For each selected full-band patch, a group of M non-local similar full-band patches, each of size w × h × S, is extracted from Y by block matching [7]. Then, an S × wh × M patch tensor Y^(i) is built up by stacking the mode-3 matricizations of those M patches along the third mode. The recovery of the underlying clean patch tensor X^(i) from Y^(i) is formulated as an optimization problem:

min_{X^(i)} (λ/2) ‖Y^(i) − X^(i)‖_F^2 + R(X^(i))    (9)

in which R(X^(i)) is the regularization term encoding prior knowledge about clean patch tensors, and λ > 0 is a constant controlling the tradeoff between the two terms. Finally, the overall MSI can be estimated by aggregating all restored patches. For simplicity and without confusion, we hereafter omit the superscripts on patch tensors. The Tucker decomposition is used to capture the correlation simultaneously in all dimensions:

X = G ×_1 U ×_2 V ×_3 W    (10)

in which G ∈ R^(R_1×R_2×R_3) is a core tensor, and U ∈ R^(S×R_1), V ∈ R^(wh×R_2), and W ∈ R^(M×R_3) are column orthonormal factor matrices.
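A simplified sketch of how such a patch tensor might be assembled is given below. The exhaustive L2-distance block matching and the `build_patch_tensor` helper are our own assumptions for illustration; the paper uses the block matching of Ref. [7]:

```python
import numpy as np

def build_patch_tensor(Y, top_left, w, h, M, stride=1):
    """Stack the S x wh mode-3 matricizations of the M full-band patches
    most similar to a reference patch into an S x wh x M patch tensor.
    Illustrative sketch: exhaustive search with squared L2 distance."""
    H, W, S = Y.shape
    i0, j0 = top_left
    ref = Y[i0:i0 + h, j0:j0 + w, :]
    candidates = []
    for i in range(0, H - h + 1, stride):
        for j in range(0, W - w + 1, stride):
            p = Y[i:i + h, j:j + w, :]
            candidates.append((np.sum((p - ref) ** 2), p))
    candidates.sort(key=lambda t: t[0])          # most similar first
    mats = [p.reshape(w * h, S).T for _, p in candidates[:M]]
    return np.stack(mats, axis=2)

Y = np.random.rand(32, 32, 8)                    # toy noisy MSI
T = build_patch_tensor(Y, (4, 4), 6, 6, M=10)
assert T.shape == (8, 36, 10)                    # S x wh x M
# The best match is the reference patch itself (distance 0)
assert np.allclose(T[:, :, 0], Y[4:10, 4:10, :].reshape(36, 8).T)
```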
The Tucker decomposition is not unique [18]. This opens the door to placing constraints on the core tensor to improve uniqueness, thereby improving the performance of noise removal. Next, the regularization term in model (9) is formulated specifically.
Fig. 1 Concept of our method. Superscripts i in full-band patches, and (i) in patch tensors, are omitted for simplicity. We investigate the priors of non-local self-similarity and spectral correlation based on a patch framework. Non-local similar full-band patches P^(2), · · · , P^(M) are extracted for a selected full-band patch P^(1), and a noisy patch tensor Y delivering the two priors is built by matricizing and stacking those patches. We use the Tucker decomposition to capture the correlation simultaneously in all directions. In this representation, the patch tensor X of a clean MSI has the properties of multilinear low rank and a sparse core tensor. Furthermore, the local geometrical structure linking spectral bands is also considered. We use a weighted graph, constructed from the noisy patch tensor Y or from spectral priors, to characterize the intrinsic local structure. Graph embedding is utilized to jointly learn a sparse low-rank representation while preserving the graph structure.

Sparse and low-rank regularization
By construction, the patch tensors should intuitively be low-rank in each mode, i.e., R_1 < S, R_2 < wh, and R_3 < M. Hence, a direct approach is to estimate the rank parameters and then compute a multilinear low-rank approximation. However, it is difficult to obtain accurate rank estimates. In Ref. [25], the authors instead imposed a sparse regularization on the core tensor, such as ‖G‖_1, avoiding explicit estimation of those rank parameters.
Another natural way is to introduce a low-rank part into the regularization term of the model in Eq. (9). However, there are many ways to formulate the low rank of tensors. The tensor trace norm [28]

‖X‖_tr = Σ_{n=1}^{N} ω_n ‖X_(n)‖_*

is widely used in denoising methods with different settings of the weights ω_n. For example, equal weights ω_n = 1/N, n ∈ [N], are used in Refs. [29,30]. In Ref. [31], the product of the trace norms of the mode-n matricizations, Π_{n=1}^{N} ‖X_(n)‖_*, is adopted, which is equivalent to setting the weights adaptively: the mode-n subspace with lower rank receives a larger weight, and vice versa. Furthermore, based on this rank-product framework, the method proposed in Ref. [15] also considers fine-grained sparsity inside the core tensor, allowing it to describe the detailed information in the data well. We follow Ref. [15] in this work, and formulate the sparse and low-rank regularization term in Eq. (9) as

ω ‖G‖_0 + (1 − ω) Π_{n=1}^{3} rank(X_(n))    (11)

where ω ∈ (0, 1) is a constant. As mentioned in Refs. [15,16], this regularization constrains the volume of the nonzero top-left block of the core tensor, as well as the number of Kronecker bases used in the Tucker representation. By construction of the patch tensors, the singular values in the non-local self-similarity mode generally decay much faster than those in the spectral and spatial modes: R_3 ≪ R_1, R_2. Thus, this regularization has a larger penalty for R_3 than for R_1 and R_2, which is in accord with our prior knowledge about patch tensors.

Graph Laplacian regularization
As demonstrated in Refs. [15,16], the sparse and low-rank regularization term in Eq. (11) can efficiently encode prior knowledge concerning both non-local spatial self-similarity and global correlation in spectrum. However, one drawback is that it does not consider the geometrical structure linking spectral bands, namely that the set of clean band images lies on a low-dimensional manifold embedded in image space. Further improvement is possible by using this geometrical structure, as the band selection results in Ref. [17] show.
Motivated by recently developed graph embeddings for manifold learning and regularization with tensor decompositions [32][33][34], this paper uses weighted graphs to characterize the intrinsic local geometrical structure linking spectral bands, and introduces a graph Laplacian regularization aiming to jointly learn a sparse and low-rank representation while preserving the structure of the graph. Specifically, we rewrite Eq. (10) in matrix form as

X_(1) = U G_(1) (W ⊗ V)^T

This implies that the ith band X_(1)(i, :) ∈ R^(whM) has a low-dimensional representation U(i, :) ∈ R^(R_1) in the basis consisting of the rows of G_(1)(W ⊗ V)^T. The idea is that similar bands should have similar representations.
In the framework of graph embedding, an edge-weighted graph G with all bands as its vertices is built to model the local geometrical structure of the spectral bands. Let C ∈ R^(S×S) be the weight matrix of graph G; entry c_ij measures the similarity between the ith and jth bands. The goal is to learn an optimal low-dimensional representation U for the vertices of graph G which best characterizes their similarities. This can be achieved by introducing the regularization term

(η/2) Σ_{i,j=1}^{S} c_ij ‖U(i, :) − U(j, :)‖_2^2 = η tr(U^T L U)    (12)

where η > 0 is a constant, and L ∈ R^(S×S) is the graph Laplacian defined by L = D − C, with D the diagonal degree matrix whose entries are d_ii = Σ_j c_ij.
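The graph Laplacian regularization tr(U^T L U) equals (1/2) Σ_{ij} c_ij ‖U(i,:) − U(j,:)‖², which is exactly what makes it a smoothness penalty on the band representations: similar bands (large c_ij) are pushed toward similar rows of U. This identity is easy to check numerically (a generic sketch, not the paper's code):

```python
import numpy as np

def graph_laplacian(C):
    """L = D - C, with D the diagonal degree matrix d_ii = sum_j c_ij."""
    return np.diag(C.sum(axis=1)) - C

rng = np.random.default_rng(0)
S, R1 = 6, 3
C = rng.random((S, S))
C = (C + C.T) / 2           # symmetric similarity weights
np.fill_diagonal(C, 0)
U = rng.random((S, R1))     # low-dimensional band representations

L = graph_laplacian(C)
lhs = np.trace(U.T @ L @ U)
rhs = 0.5 * sum(C[i, j] * np.sum((U[i] - U[j]) ** 2)
                for i in range(S) for j in range(S))
assert np.isclose(lhs, rhs)
```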

Proposed model
By adding the regularization terms in Eqs. (11) and (12) to the model in Eq. (9), our final model for MSI denoising is given by

min_{G,U,V,W} ‖G‖_0 + α Π_{n=1}^{3} rank(X_(n)) + β tr(U^T L U) + (γ/2) ‖Y − X‖_F^2
s.t. X = G ×_1 U ×_2 V ×_3 W, U^T U = I, V^T V = I, W^T W = I    (13)

where α = (1 − ω)/ω, β = η/ω, and γ = λ/ω. Clearly, Eq. (13) is a non-convex optimization problem, since the l_0 and rank terms are all non-convex. As in Ref. [16], we use log-sum forms to approximate the l_0 norm of tensors and the rank of matrices. Let

L_s(G) = Σ_{i_1,i_2,i_3} log(|g_{i_1 i_2 i_3}| + ε),   L_*(A) = Σ_i log(σ_i(A) + ε)

in which ε is a small positive constant, and σ_i(A) is the ith singular value of matrix A. This leads to the following optimization problem:

min_{G,U,V,W} L_s(G) + α Π_{n=1}^{3} L_*(X_(n)) + β tr(U^T L U) + (γ/2) ‖Y − X‖_F^2
s.t. X = G ×_1 U ×_2 V ×_3 W, U^T U = I, V^T V = I, W^T W = I    (14)
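The log-sum surrogates behave as smooth stand-ins for l_0 and rank: as entries or singular values approach zero, the surrogate drops sharply, mimicking the counting behaviour of l_0 and rank far better than the l_1 norm does. A minimal sketch (function names are ours, for illustration):

```python
import numpy as np

def logsum_l0(G, eps=1e-3):
    """Log-sum surrogate for ||G||_0: sum of log(|g| + eps) over entries."""
    return np.sum(np.log(np.abs(G) + eps))

def logsum_rank(A, eps=1e-3):
    """Log-sum surrogate for rank(A), applied to the singular values."""
    return np.sum(np.log(np.linalg.svd(A, compute_uv=False) + eps))

# A rank-1 matrix scores far lower than a full-rank one of the same size,
# because its trailing singular values are (near) zero.
A_lowrank = np.outer(np.ones(4), np.ones(5))     # rank 1
A_full = np.eye(4, 5) + 0.5 * np.ones((4, 5))    # rank 4
assert logsum_rank(A_lowrank) < logsum_rank(A_full)

# Likewise, sparser tensors score lower under the l0 surrogate
assert logsum_l0(np.zeros((2, 2))) < logsum_l0(np.ones((2, 2)))
```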

Model optimization
We adopt the ADMM method [35] to solve the problem in Eq. (14), since it is hard to estimate the multiple unknowns directly. By introducing three auxiliary tensors Z^(1), Z^(2), Z^(3) ∈ R^(S×wh×M), Eq. (14) can be equivalently reformulated as

min L_s(G) + α Π_{n=1}^{3} L_*(Z^(n)_(n)) + β tr(U^T L U) + (γ/2) ‖Y − X‖_F^2
s.t. Z^(n) = X, n ∈ [3], X = G ×_1 U ×_2 V ×_3 W

Then, the augmented Lagrangian function can be defined as

L(Z^(1), Z^(2), Z^(3), G, U, V, W, T^(1), T^(2), T^(3)) = L_s(G) + α Π_{n=1}^{3} L_*(Z^(n)_(n)) + β tr(U^T L U) + (γ/2) ‖Y − X‖_F^2 + Σ_{n=1}^{3} (⟨T^(n), Z^(n) − X⟩ + (μ/2) ‖Z^(n) − X‖_F^2)

where μ > 0 is a constant, T^(1), T^(2), and T^(3) are Lagrange multipliers, and U, V, and W are column orthonormal matrices. In the ADMM framework, the problem can now be solved via the following sub-problems. Before starting, we first define

P = ρ (γ Y + Σ_{n=1}^{3} (μ Z^(n) + T^(n))),   ρ = 1/(γ + 3μ)

(1) Updating U, V, and W. There is a closed-form solution to the problem of optimizing only V or W while keeping the other variables fixed. For example, V can be updated by solving

min_{V^T V = I} ‖G ×_1 U ×_2 V ×_3 W − P‖_F^2

Since U, V, and W are column orthonormal, the problem can be rewritten as

max_{V^T V = I} tr(V^T B)

where B = P_(2)(W ⊗ U)G^T_(2). Thus, we can obtain a closed-form solution by using von Neumann's trace inequality [36].
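The closed-form orthogonal update obtained from von Neumann's trace inequality is the polar/Procrustes solution: maximizing tr(V^T B) over column-orthonormal V gives V = P Q^T from the thin SVD B = P diag(s) Q^T, and the optimum equals the sum of the singular values of B. A small numerical check (a generic sketch, not the paper's code):

```python
import numpy as np

def orthogonal_update(B):
    """argmax tr(V^T B) over column-orthonormal V, via the thin SVD."""
    P, _, Qt = np.linalg.svd(B, full_matrices=False)
    return P @ Qt

rng = np.random.default_rng(1)
B = rng.standard_normal((8, 3))
V = orthogonal_update(B)
assert np.allclose(V.T @ V, np.eye(3))          # feasible: V^T V = I

# Optimal value is the sum of singular values (nuclear norm) of B
s = np.linalg.svd(B, compute_uv=False)
assert np.isclose(np.trace(V.T @ B), s.sum())

# Random feasible competitors attain a smaller objective
for _ in range(20):
    Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
    assert np.trace(Q.T @ B) <= np.trace(V.T @ B) + 1e-9
```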
With the other variables fixed, U can be updated by solving

min_{U^T U = I} β tr(U^T L U) + ((γ + 3μ)/2) ‖G ×_1 U ×_2 V ×_3 W − P‖_F^2

This is equivalent to

min_{U^T U = I} tr(U^T A U) + (γ + 3μ) tr(U^T B)

where A = βL and B = −P_(1)(W ⊗ V)G^T_(1). When B ≠ 0, this problem is difficult to solve [37]. Here we use the conjugate gradient algorithm on a Stiefel manifold [38] to solve it. Note that this is the most time-consuming step in the algorithm.
(2) Updating G. With the other variables fixed, we update G by solving

min_G ρ L_s(G) + (1/2) ‖G ×_1 U ×_2 V ×_3 W − P‖_F^2

where ρ = 1/(γ + 3μ). Since U, V, and W are column orthonormal, the problem can be rewritten as

min_G ρ L_s(G) + (1/2) ‖G − Q‖_F^2

where Q = P ×_1 U^T ×_2 V^T ×_3 W^T. It was proved in Ref. [39] that this problem has a closed-form solution.
(3) Updating Z^(n) and T^(n) for n ∈ [3]. We optimize Z^(n) while fixing the other variables by solving

min_{Z^(n)} α_n L_*(Z^(n)_(n)) + (μ/2) ‖Z^(n) − (X − T^(n)/μ)‖_F^2

where α_n = (α/μ) Π_{i≠n} L_*(Z^(i)_(i)). As proved in Ref. [40], this problem has a closed-form solution.
The multipliers T^(n) can be updated using

T^(n) = T^(n) + μ (Z^(n) − X),   n ∈ [3]

Experiments
To verify the effectiveness of the proposed method, this section presents results of an experimental comparison with a wide range of state-of-the-art approaches. These include 2D band-wise methods: band-wise K-SVD (BW K-SVD) [6] and band-wise BM3D (BW BM3D) [7]; cube-based methods extended from 2D methods: ANLM3D [24] and BM4D [13]; and tensor-based methods: PARAFAC [12], TDL [14], t-SVD [27], and KBR [16]. We do not make a comparison with deep learning based methods such as Ref. [41]. All experiments were conducted in MATLAB R2018b on a 64-bit Windows 10 PC with a 2.30 GHz Intel Xeon Gold 5118 CPU and 64 GB RAM. The MATLAB Tensor Toolbox Version 2.6 [42] and the Manopt toolbox [43] were used to implement our algorithm.

Dataset and experimental setting
The competing methods were evaluated on the Columbia MSI dataset [44], which consists of 32 scenes covering various real-world objects and materials such as textiles, paint, and skin. Each MSI has 31 bands, with spectral reflectance data from 400 to 700 nm in 10 nm steps. In the experiments, each MSI was downsampled to a spatial resolution of 128 × 128, and the pixel values were normalized to [0, 1].
Additive Gaussian noise was used to generate the noisy MSIs. The noise level σ ranged from 0.05 to 0.30 in steps of 0.05. The parameters α, β, and γ were set to 10, 0.6, and 10^(−3)σ respectively. The parameter μ was initially set to 250, and increased in each iteration by a factor of 1.2. The weight c_ij linking the ith and jth bands was computed from the similarity of the corresponding noisy band images. In practice, reconstructing a graph consistent with reality is a crucial step in obtaining a good denoising result; in this experiment we simply used the above settings, and refer to Ref. [34] for other possible settings. Parameter settings for methods other than our own are taken from their original descriptions.
Five quantitative image quality indices were used to measure the quality of the restored MSIs: peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [45], feature similarity (FSIM) [46], erreur relative globale adimensionnelle de synthèse (ERGAS) [47], and spectral angle mapper (SAM) [48]. Larger values of the first three indices, and smaller values of the last two, indicate better restored MSIs. The values in Table 1 are computed over all combinations of 32 scenes and 6 noise levels. Clearly, the proposed method outperforms all others with respect to all quality indices. Compared to the second-best method, KBR [16], we achieve an improvement of around 0.5 dB in PSNR, and around 5 in ERGAS. These results provide evidence for the advance provided by the proposed sparse and graph Laplacian Tucker decomposition model. The improvements beyond KBR [16] also indicate the validity of using the geometrical structure linking spectral bands. Of course, if there were no such structure, or it were incorrectly characterized, a poor result would be obtained due to unreasonable regularization.
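Of the five indices, PSNR has the simplest form: for images normalized to [0, 1] it is 10 log10(1/MSE). A minimal sketch (our own helper, shown only to make the index concrete; the other indices have published reference implementations [45-48]):

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """PSNR in dB between a restored MSI x and a reference y in [0, 1]."""
    mse = np.mean((x - y) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(2)
clean = rng.random((16, 16, 8))
noisy = np.clip(clean + 0.05 * rng.standard_normal(clean.shape), 0, 1)
denoised = np.clip(clean + 0.01 * rng.standard_normal(clean.shape), 0, 1)

# A better restoration yields a higher PSNR against the clean reference
assert psnr(denoised, clean) > psnr(noisy, clean)
```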

Experimental results
We also observe in Table 1 that our method is the slowest: this is its main limitation. It is slow because it is full-band patch based, and needs to solve a difficult optimization problem due to the incorporation of the graph Laplacian; more efficient algorithms need to be considered. We also note that there is still room for further improvement, by simultaneously considering the geometrical structure of the other models, such as the non-local self-similarity model, or by considering the geometrical structure shared between these models. We intend to consider these in future work.
We also show the variation of image quality index with Gaussian noise level for different methods in Fig. 4. These results were obtained by averaging over the 32 scenes. At all noise levels, the proposed method performs best with respect to all indices. Table 2 lists, for each image quality index, the improvement of our method over the second-best method. Furthermore, to analyze whether the proposed method achieves better results in each spectral band, we show the results of KBR [16] and our method for each of the 31 bands in Fig. 5. These results are obtained by averaging over all combinations of 32 scenes and 6 noise levels. Note that in this figure the ERGAS is computed on each band separately, so it degenerates to a quality index of grayscale images. It is clear that the results obtained by our method are better than those obtained by KBR [16] in almost all bands. Moreover, the improvements in different bands are relatively balanced.
Fig. 4 Image quality index versus Gaussian noise level for BW K-SVD [6], ANLM3D [24], TDL [14], BM4D [13], KBR [16], and the proposed method. The results were obtained by averaging over the 32 scenes.

Conclusions
This paper has focused on the MSI denoising problem. Two kinds of prior knowledge, non-local spatial self-similarity and global correlation in spectrum, were both considered to achieve better results than previous methods. We used the Tucker decomposition to capture the correlation simultaneously in all modes of the patch tensor formed from non-local similar full-band patches. To fully exploit the two kinds of priors, the local geometrical structure of the spectral bands was characterized by weighted graphs. Besides sparse and low-rank regularization, we applied graph Laplacian regularization to jointly learn a sparse and low-rank representation while preserving the graph structure. The alternating direction method of multipliers was adopted to solve the proposed model. Experimental results demonstrate that our method outperforms state-of-the-art methods in terms of both quantitative and qualitative results.
In future work, we will investigate models that consider geometrical structures simultaneously in multiple modes, both within and between modes, together with efficient algorithms for solving them. We will also consider other types of noise, and mixed noise.