
1 Introduction

One of the most important tasks in computer vision is image denoising. The general idea is to regard a noisy image d as being obtained by corrupting a noiseless image m; given a model for the noise corruption, the desired image m is the solution of the corresponding inverse problem

$$\begin{aligned} d = m + n \end{aligned}$$

where n is the noise; in this paper we assume that the noise is additive and normally distributed. Although image denoising is the simplest possible inverse problem, it remains a difficult task due to the complicated structures and textures of natural images. In general, the image degradation process can be described mathematically by a Fredholm operator equation of the first kind

$$\begin{aligned} (\mathcal K m)(x,y) =\int \!\!\int \mathcal K(x-\xi ,y-\eta )\, m(\xi ,\eta )\,d\xi \, d\eta + n(x,y) = d(x,y) \end{aligned}$$

where \(\mathcal K\) is usually a linear kernel function. For the deblurring problem, \(\mathcal K\) is the point spread function (PSF); for denoising, it is simply the identity operator. Since the inverse problem is extremely ill-posed, most denoising procedures have to employ some sort of regularization. Generally, the image denoising problem can be solved within the framework of Tikhonov regularization:

$$\begin{aligned} \min _m \mathcal J(m) =||\mathcal K m -d||^2 + \alpha {\varOmega }[m] \end{aligned}$$

where \({\varOmega }[m]\) is the stabilizing functional for m. Depending on the type of problem, \({\varOmega }[m]\) can take different forms, such as Total Variation [1, 2], the F-norm [3], the \(L_0\)-norm [4], and so on.
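As a concrete instance of the observation model above, the following toy Python snippet (our own illustration; for denoising the operator \(\mathcal K\) is the identity) forms a noisy observation d from a clean image m with additive Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.zeros((64, 64))
m[16:48, 16:48] = 1.0                      # toy noiseless image
n = 0.1 * rng.standard_normal(m.shape)     # additive Gaussian noise
d = m + n                                  # observed noisy image: d = m + n
```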

For images with low-rank structures, rank regularization is a good choice. The key difficulty, however, is that the rank of an image matrix is hard to formulate in closed form. Recently, reference [4] showed that, under surprisingly broad conditions, one can exactly recover the low-rank matrix A from D = A + E with gross but sparse errors E by solving the following convex optimization problem:

$$\begin{aligned} \arg \min _A ||D - A||_{F}^{2} + ||A||_* \end{aligned}$$

where \(||\cdot ||_*\) represents the nuclear norm of a matrix (the sum of its singular values). In [5], this optimization is dubbed Robust PCA (RPCA), because it enables one to correctly recover the underlying low-rank structure in the data, even in the presence of gross errors or outlying observations. This optimization can easily be recast as a semidefinite program and solved by an off-the-shelf interior point solver (e.g., [6–8]). However, although interior point methods offer superior convergence rates, the complexity of computing the step direction is \(O(m^6)\), so they do not scale well with the size of the matrix.

In recent years, the search for more scalable algorithms for high-dimensional convex optimization problems has prompted a return to first-order methods. One striking example of this is the current popularity of iterative thresholding algorithms for \( l^1\)-norm minimization problems arising in compressed sensing [9–11]. Similar iterative thresholding techniques [12–15] can be applied to the problem of recovering a low-rank matrix from an incomplete (but clean) subset of its entries. This optimization is closely related to the RPCA problem, and the algorithm and convergence proof extend quite naturally to RPCA. However, the iterative thresholding scheme proposed in [4] exhibits extremely slow convergence: solving one instance requires about \(10^4 \) iterations, each of which has the same cost as one singular value decomposition.
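As an illustration of the basic building block shared by these iterative thresholding schemes, the sketch below implements singular value soft-thresholding, the proximal operator of the nuclear norm; it is a minimal Python example of ours, not code from the cited works:

```python
import numpy as np

def svt(D, tau):
    """Singular value soft-thresholding: minimizes 0.5*||A - D||_F^2 + tau*||A||_*."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Each iteration of the schemes mentioned above performs essentially one such step, which is why the per-iteration cost is that of one singular value decomposition.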

In this paper, we attempt to solve the rank regularization problem from a new direction. In contrast to the nuclear-norm approximation introduced in [4], we propose a continuous approximation of the rank regularization and analyze the regularity of the algorithm. Compared with the nuclear-norm approximation, the new method proposed in this paper is a ‘direct’ solver for the rank regularization, and it leads to a smooth optimization problem with a well-developed theoretical and computational foundation. Finally, denoising experiments based on the new regularization are presented for images with periodic textures.

The paper is organized as follows. In Sect. 2 we describe the proposed smooth approximation algorithm in detail and discuss the criterion for automatic parameter selection; numerical experimental results are presented in Sect. 3, and conclusions are drawn in Sect. 4.

2 Proposed Method

In this paper, we consider the rank regularization optimization problem, which can be described by the following model:

$$\begin{aligned} \min _u \mathcal J(u) =||u-u_0||^2 + rank(u) \end{aligned}$$
(1)

As the rank of an image matrix is nonconvex and difficult to formulate in closed form, the optimization problem (1) is hard to solve directly. The nuclear norm is a relaxation of the rank, and it can be proved that under certain conditions this relaxation is a good convex approximation:

$$\begin{aligned} \min _u \mathcal J(u) =||u-u_0||^2 + \alpha \Vert u\Vert _* \end{aligned}$$

Many optimization algorithms for this problem have been studied, such as the Accelerated Proximal Gradient (APG) method [16–18], the Augmented Lagrange Multiplier (ALM) method [19], and Iterative Singular Value Thresholding (ISVT) [12]. One common shortcoming of these methods is that they have to compute a singular value decomposition at each iteration, which increases the computational cost considerably.

In this paper, we attempt to solve the rank regularization problem from a new direction. Instead of using the nuclear norm as a relaxation, we propose a ‘direct’ solver for the rank regularization. The main contributions of this algorithm are as follows: firstly, it is a smooth optimization problem that is easy to solve, and we propose an automatic parameter selection criterion based on the Morozov discrepancy principle; secondly, compared with traditional methods, it only needs to compute the singular value decomposition once, so it is more efficient.

From matrix theory, the rank of a matrix equals the number of its non-zero singular values: for a matrix A, \(rank(A)=\#\{\sigma _i \ne 0\}\), where \(\sigma \) is the vector composed of the singular values of A. From this point of view, computing the rank of a matrix is a process of counting non-zeros; it is a generalization of the \(L_0\) norm to two-dimensional space. However, this ‘counting process’ is also difficult to express in closed form and cannot be optimized directly. In this paper, we propose a continuous function to describe the process of “counting non-zeros”.

Firstly, for a matrix A, the singular value decomposition is computed, \(A = S\cdot V \cdot D\), where S and D are unitary matrices and V is a diagonal matrix composed of the singular values of A. Then, we construct a characteristic function \(p(\sigma _i)\) for each singular value \(\sigma _i\), which should have the following property: \(p(\sigma _i) = \left\{ \begin{array}{ll} 0, &{} \hbox {if } \sigma _i =0; \\ 1, &{} \hbox {else.} \end{array} \right. \) In this paper, we choose a Gaussian-type function as the approximation of the characteristic function, with a single parameter s acting as the variance:

$$\begin{aligned} p(\sigma ) = 1-\exp \Big (-\frac{\sigma ^2}{s^2}\Big ), \end{aligned}$$
(2)

From Fig. 1, we can see that as \(s\rightarrow 0\), the function \(p(\cdot )\) becomes a good approximation of the characteristic function. So the rank can be approximated by the sum of the characteristic functions of the singular values, \(rank(A)\approx \sum _{i} p(\sigma _i)\). With the proposed relaxation of the rank, the denoising model (1) can be formulated as follows

$$\begin{aligned} \min _\sigma \mathcal J(\sigma ) =||S\cdot V \cdot D-u_0||^2 + \lambda \sum _i p(\sigma _i) \end{aligned}$$
(3)

where \(V=diag\{\sigma \}\), and the final denoised image is \(u=S\cdot V(\sigma )\cdot D\).
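As a quick sanity check of this relaxation, the smooth surrogate \(\sum _i p(\sigma _i)\) approaches the true rank as s decreases; the following toy Python example (ours, not from the paper) illustrates this:

```python
import numpy as np

def smooth_rank(A, s):
    """Smooth rank surrogate: sum of p(sigma_i) = 1 - exp(-sigma_i^2 / s^2)."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return np.sum(1.0 - np.exp(-sigma**2 / s**2))

A = np.diag([3.0, 1.5, 0.2, 0.0])          # toy matrix of rank 3
print(np.linalg.matrix_rank(A))            # 3
for s in (1.0, 0.1, 0.01):
    print(s, smooth_rank(A, s))            # approaches 3 as s -> 0
```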

Fig. 1. The curve of the characteristic function \(p(\sigma )=1-\exp (-\frac{\sigma ^2}{s^2})\) for different values of the parameter \(s = 0.1, 0.01, 0.001\).

Compared with the nuclear norm, the model (3) is a new relaxation of the rank regularization. We can see that it is a smooth optimization problem and, in particular, it requires computing the singular value decomposition only once.

Using the unitarity of the matrices S and D, the model (3) can be reformulated as

$$\begin{aligned} \min _{\sigma } ||V - S^T u_0 D^T||^2 + \lambda \sum _i p(\sigma _i) \end{aligned}$$

In the discrete representation, since V is diagonal, only the diagonal entries of \(S^T u_0 D^T\) enter the data term (up to a constant independent of \(\sigma \)), and the problem can be written as

$$\begin{aligned} \min _{\sigma } \sum _{i}\bigl (\sigma _i -(S^T u_0 D^T)_{ii}\bigr )^2 + \lambda \sum _i p(\sigma _i) \end{aligned}$$
(4)

Setting the derivative with respect to \(\sigma _i\) to zero, the final problem is to solve the following smooth system of equations, which can be efficiently solved by Newton's method.

$$\begin{aligned} \sigma _i + \frac{2\lambda }{s^2}\sigma _i \exp \Big (-\frac{\sigma _i^2}{s^2}\Big ) =(S^T u_0 D^T)_{ii} \end{aligned}$$
(5)

The detail process is shown in Algorithm 1.

Algorithm 1.
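A minimal Python sketch of this procedure is given below. It assumes (as the one-time SVD mentioned above suggests) that S and D are taken from the SVD of the noisy image \(u_0\), so that \((S^T u_0 D^T)_{ii}\) are simply the noisy singular values; the function name, iteration count and the non-negativity clipping are our own choices, not specified in the paper.

```python
import numpy as np

def denoise_rank_smooth(u0, lam, s, newton_iters=20):
    """Sketch of Algorithm 1 for fixed parameters lam and s."""
    # One SVD of the noisy image; S and Dt are then kept fixed.
    S, b, Dt = np.linalg.svd(u0, full_matrices=False)   # b = diag(S^T u0 D^T)
    sig = b.copy()                                       # start from the noisy singular values
    for _ in range(newton_iters):
        e = np.exp(-sig**2 / s**2)
        g = sig + (2.0 * lam / s**2) * sig * e - b       # residual of Eq. (5)
        dg = 1.0 + (2.0 * lam / s**2) * e * (1.0 - 2.0 * sig**2 / s**2)
        sig = np.maximum(sig - g / dg, 0.0)              # Newton step, kept non-negative
    return S @ np.diag(sig) @ Dt                         # denoised image u = S V(sigma) D
```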

In Algorithm 1, there are two parameters, \(\lambda \) and s, that need to be initialized. Choosing these two parameters is an important and difficult task. In the following, we discuss selection criteria for \(\lambda \) and s.

The choice of the parameter \(\lambda \) is difficult for regularization problems: if the weighting parameter \(\lambda \) is too small, the algorithm will be unstable, and if \(\lambda \) is too large, the regularization will be too strong and the resulting algorithm will not accurately solve the original problem. In this paper, we propose an adaptive selection criterion for the parameter \(\lambda \) based on the Morozov discrepancy principle.

As the final optimal \(\sigma \) depends on the choice of the parameter \(\lambda \), the problem (3) should be reformulated as

$$\begin{aligned} {} \min _{\sigma } J(\sigma ,\lambda )= ||S\cdot diag(\sigma ^{\lambda }) \cdot D - u_{0,\delta }||^2 + \lambda \sum _i p(\sigma _i^{\lambda }) \end{aligned}$$
(6)

There are two variables, \(\sigma \) and \(\lambda \). For a fixed \(\lambda \), the optimization over \(\sigma \) is easily solved via the Euler equation:

$$\begin{aligned} \sigma _i^{\lambda } + \lambda \, p'(\sigma _i^{\lambda }) = b_i, \end{aligned}$$
(7)

where \(b_i = (S^T u_{0,\delta } D^T)_{ii}\). For fixed \(\sigma \), the parameter \(\lambda \) can then be chosen by the Morozov discrepancy principle.

Proposition 1

The criterion for choosing the parameter \(\lambda \): define the functional \(\phi (\lambda )\),

$$\begin{aligned} \phi (\lambda ) = ||S\cdot diag(\sigma ^{\lambda })\cdot D - u_{0,\delta }||^2 - \delta ^2 \end{aligned}$$
(8)

according to the Morozov discrepancy principle, the parameter \(\lambda \) should satisfy \(\phi (\lambda )=0\), which leads to the Newton iteration

$$\begin{aligned} \lambda _{k+1} = \lambda _k -\frac{\phi (\lambda _k)}{\phi '(\lambda _k)} \end{aligned}$$
(9)

where, according to (7), \(\frac{d\sigma _i^{\lambda }}{d\lambda } = -\bigl (1 + \lambda _k\, p''(\sigma _i^{\lambda })\bigr )^{-1} p'(\sigma _i^{\lambda })\), so that \(\phi '(\lambda _k) = 2\sum _i \bigl (\sigma _i^{\lambda } - b_i\bigr )\,\frac{d\sigma _i^{\lambda }}{d\lambda }\).
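For reference, the derivatives of the characteristic function appearing above, computed directly from (2), are

$$\begin{aligned} p'(\sigma ) = \frac{2\sigma }{s^2}\exp \Big (-\frac{\sigma ^2}{s^2}\Big ), \qquad p''(\sigma ) = \frac{2}{s^2}\Big (1-\frac{2\sigma ^2}{s^2}\Big )\exp \Big (-\frac{\sigma ^2}{s^2}\Big ). \end{aligned}$$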

We can see that the function \(p(\sigma )\) is not convex; in particular, as \(s\rightarrow 0\) it becomes difficult to obtain the globally optimal solution. For larger s, however, this is a smooth optimization problem whose global solution can be found reliably. Therefore, in this paper we choose a decreasing sequence of values \(s_0>s_1>\cdots >s_n\) and solve the problem for each s in turn; the complete procedure is summarized in Algorithm 2.

Algorithm 2.
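A self-contained Python sketch of this combined procedure (decreasing s, with \(\lambda \) updated by the discrepancy principle) is given below; it again assumes that S and D come from the SVD of the noisy image, and the sequence of s values, iteration counts and safeguards are illustrative choices of ours rather than values from the paper.

```python
import numpy as np

def p_prime(sig, s):
    return (2.0 * sig / s**2) * np.exp(-sig**2 / s**2)

def p_second(sig, s):
    return (2.0 / s**2) * np.exp(-sig**2 / s**2) * (1.0 - 2.0 * sig**2 / s**2)

def solve_sigma(b, lam, s, iters=20):
    """Newton solve of the Euler equation (7): sigma + lam * p'(sigma) = b."""
    sig = b.copy()
    for _ in range(iters):
        g = sig + lam * p_prime(sig, s) - b
        dg = 1.0 + lam * p_second(sig, s)
        sig = np.maximum(sig - g / dg, 0.0)
    return sig

def denoise_with_continuation(u0, delta, s_seq=(1.0, 0.5, 0.1, 0.05, 0.01),
                              lam0=1.0, outer_iters=10):
    """Sketch of Algorithm 2: continuation over s with Morozov-based lambda updates."""
    S, b, Dt = np.linalg.svd(u0, full_matrices=False)    # S, D fixed from the noisy image
    lam, sig = lam0, b.copy()
    for s in s_seq:                                      # decreasing sequence s_0 > s_1 > ...
        for _ in range(outer_iters):
            sig = solve_sigma(b, lam, s)
            phi = np.sum((sig - b)**2) - delta**2        # discrepancy functional, Eq. (8)
            dsig = -p_prime(sig, s) / (1.0 + lam * p_second(sig, s))
            dphi = 2.0 * np.sum((sig - b) * dsig)
            if abs(dphi) < 1e-12:
                break
            lam = max(lam - phi / dphi, 0.0)             # Newton step on lambda, Eq. (9)
    return S @ np.diag(sig) @ Dt
```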

3 Numerical Experiments

In this section, we report experimental results for the algorithm proposed in the previous section. We present experiments on both synthetic data and real images. All the experiments are conducted and timed on the same PC with an Intel Core i5 2.50 GHz CPU that has 2 cores and 16 GB of memory, running Windows 7 and Matlab (Version 7.10).

We compare the proposed method with some of the most popular low-rank matrix completion algorithms, including the Accelerated Proximal Gradient algorithm (APG), the Iterative Singular Value Thresholding algorithm (ISVT) [11], and the Augmented Lagrange Multiplier (ALM) method.

3.1 Synthetic Data

We generate a synthetic matrix \(X\in R^{250\times 250}\) with rank equal to 25; as shown in Fig. 2(a), it is a block-diagonal matrix. Gaussian noise with level 0.35 is then added, as shown in Fig. 2(b). For the compared algorithms, we use the default parameters in their released codes. The results are shown in Fig. 2(c–f), and the PSNR values, estimated ranks and running times are listed in Table 1. From the experimental results, we can see that our algorithm achieves an exact estimate of the rank and a higher PSNR value while having the lowest running time.
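For reproducibility, a test matrix with these properties can be generated along the following lines; the exact block structure used in the paper is not specified, so the block sizes below are our own assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.zeros((250, 250))
for k in range(25):                                   # 25 rank-one blocks -> rank(X) = 25
    X[10*k:10*(k+1), 10*k:10*(k+1)] = 1.0             # block-diagonal structure, cf. Fig. 2(a)
X_noisy = X + 0.35 * rng.standard_normal(X.shape)     # additive Gaussian noise, level 0.35
```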

Fig. 2. Experimental results for synthetic data.

Table 1. PSNR, estimated rank and running time of the different algorithms

3.2 Real Image Denoising

As general natural images may not have the low-rank property, we choose a particular type of natural image with periodic textures, such as images of buildings, fabric, and so on (as shown in Figs. 3 and 4). The PSNR values for two test images are shown in Fig. 5.

Fig. 3. Image denoising results of the different algorithms.

Fig. 4. Image denoising results of the different algorithms.

Fig. 5. PSNR values of the different algorithms for the building and fabric images.

4 Conclusions

In this paper, we propose a novel rank-regularized optimization algorithm. The new method is a ‘direct’ solver for the rank regularization: it is a smooth optimization problem and requires computing the singular value decomposition only once. An automatic parameter selection criterion based on the Morozov discrepancy principle is also proposed. Finally, the experimental results show that the proposed algorithm achieves a better estimate of the rank and a higher PSNR value while having the lowest running time.