
1 Introduction

This work is concerned with the general problem of image restoration under sparsity constraints formulated as

$$\begin{aligned} \min _\mathbf {x} \; \phi (\mathbf {x}) = \frac{1}{2}\Vert \mathbf {A}\mathbf {x}-\mathbf {b}\Vert _2^2 + \lambda \Vert \mathbf {x}\Vert _1 \end{aligned}$$
(1)

where \(\mathbf {A}\) is an \({m\times n}\) real matrix (usually \(m\le n\)), \(\mathbf {x}\in \mathbb {R}^{n}\), \(\mathbf {b}\in \mathbb {R}^{m}\) and \(\lambda \) is a positive parameter. (Throughout the paper, \(\Vert \cdot \Vert \) denotes the Euclidean norm.) In some image restoration applications, \(\mathbf {A}=\mathbf {K}\mathbf {W}\) where \(\mathbf {K}\in \mathbb {R}^{m\times n}\) is a discretized linear operator and \(\mathbf {W}\in \mathbb {R}^{n\times n}\) is a transformation matrix from a domain where the image is a priori known to have a sparse representation. The variable \(\mathbf {x}\) contains the coefficients of the unknown image and the data \(\mathbf {b}\) is the measurement vector, which is assumed to be affected by Gaussian white noise intrinsic to the detection process. The formulation (1) is usually referred to as the synthesis formulation since it is based on the synthesis equation of the unknown image from its coefficients \(\mathbf {x}\).
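As a minimal illustration (with hypothetical sizes and a dense random \(\mathbf {A}\); in practice \(\mathbf {A}=\mathbf {K}\mathbf {W}\) would be applied as a fast operator), the objective in (1) can be evaluated as follows:

```python
import numpy as np

def objective(A, x, b, lam):
    """Evaluate phi(x) = 0.5*||A x - b||_2^2 + lam*||x||_1, as in Eq. (1)."""
    r = A @ x - b                                   # residual in data space
    return 0.5 * np.dot(r, r) + lam * np.sum(np.abs(x))

# Illustrative sizes: m <= n; A plays the role of K W in the synthesis setting.
rng = np.random.default_rng(0)
m, n = 64, 128
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = A @ x + 0.01 * rng.standard_normal(m)           # noisy measurements
print(objective(A, x, b, lam=0.1))
```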

The penalization of the \(\ell _1\)-norm of the coefficient vector \(\mathbf {x}\) in (1) simultaneously favors sparsity and avoids overfitting. For this reason, sparsity constrained image restoration has received considerable attention in the recent literature and has been successfully used in various areas. The efficient solution of problem (1) is a critical issue since the nondifferentiability of the \(\ell _1\)-norm makes standard unconstrained optimization methods unusable. Among the current state-of-the-art methods there are gradient descent-type methods such as TwIST [5], SpaRSA [14], FISTA [2] and NESTA [3]. GPSR [6] is a gradient-projection algorithm for the equivalent convex quadratic program obtained by splitting the variable \(\mathbf {x}\) into its positive and negative parts. Fixed-point continuation methods [9], as well as methods based on Bregman iterations [7] and variable splitting, such as SALSA [1], have also been recently proposed. In [12], the classic Newton projection method is used to solve the bound-constrained quadratic program formulation of (1) obtained by splitting \(\mathbf {x}\). A Modified Newton Projection (MNP) method has recently been proposed in [11] for the analysis formulation of the \(\ell _1\)-regularized least squares problem, where \(\mathbf {W}\) is the identity matrix and \(\mathbf {x}\) represents the image itself. The MNP method uses a suitable regularized approximation to the Hessian matrix so that products of its inverse with vectors can be computed at low computational cost. As a result, the only operations required for the search direction computation are matrix-vector products.

The main contribution of this work is to extend the MNP method of [11], developed for the case \(\mathbf {W}=\mathbf {I}_n\), to the synthesis formulation of problem (1) where \(\mathbf {W}\ne \mathbf {I}_n\). In the proposed approach, problem (1) is first formulated as a nonnegatively constrained quadratic programming problem by splitting the variable \(\mathbf {x}\) into its positive and negative parts. The quadratic program is then solved by a special purpose MNP method in which a suitable regularized approximation to the Hessian matrix is proposed so that products of its inverse with vectors can be computed at low computational cost. As a result, the search direction can be obtained efficiently. The convergence of the proposed MNP method is analyzed. Even though the splitting doubles the size of the problem, the low computational cost per iteration and the small number of iterations make MNP quite efficient. The performance of MNP is evaluated on several image restoration problems and compared with that of some state-of-the-art methods. The results of the comparative study show that MNP is competitive and in some cases outperforms state-of-the-art methods in terms of computational complexity and achieved accuracy.

The rest of the paper is organized as follows. In Sect. 2, the quadratic program formulation of (1) is derived. The MNP method is presented and its convergence is analyzed in Sect. 3; the efficient computation of the search direction is also discussed there. In Sect. 4, the numerical results are presented. Conclusions are given in Sect. 5.

2 Nonnegatively Constrained Quadratic Program Formulation

The proposed approach firstly needs to reformulate (1) as a nonnegatively constrained quadratic program (NCQP). The NCQP formulation is obtained by splitting the variable \(\mathbf {x}\) into its positive and negative parts [6], i.e.

$$\begin{aligned} \mathbf {x}=\mathbf {u}-\mathbf {v}, \quad \mathbf {u}=\max (\mathbf {x},0), \quad \mathbf {v}=\max (-\mathbf {x},0). \end{aligned}$$

Problem (1) can be written as the following NCQP:

$$\begin{aligned} \begin{aligned}&\min _{(\mathbf {u},\mathbf {v})} \; \mathcal {F}(\mathbf {u},\mathbf {v})=\frac{1}{2}\Vert \mathbf {A}(\mathbf {u}-\mathbf {v})-\mathbf {b}\Vert ^2 + \lambda \mathbf {1}^H\mathbf {u} + \lambda \mathbf {1}^H\mathbf {v} \\&\text { s.t.} \quad \mathbf {u}\ge 0, \; \mathbf {v}\ge 0 \\ \end{aligned} \end{aligned}$$
(2)

where \(\mathbf {1}\) denotes the n-dimensional column vector of ones. The gradient \(\mathbf {g}\) and Hessian \(\mathbf {H}\) of \(\mathcal {F}(\mathbf {u},\mathbf {v})\) are respectively defined by

$$\begin{aligned} \mathbf {g} = \begin{bmatrix} \mathbf {A}^H\mathbf {A}(\mathbf {u}-\mathbf {v})-\mathbf {A}^H\mathbf {b}+\lambda \mathbf {1} \\ -\mathbf {A}^H\mathbf {A}(\mathbf {u}-\mathbf {v})+\mathbf {A}^H\mathbf {b}+\lambda \mathbf {1} \\ \end{bmatrix}, \; \mathbf {H} = \begin{bmatrix} \mathbf {A}^H\mathbf {A}&-\mathbf {A}^H\mathbf {A} \\ -\mathbf {A}^H\mathbf {A}&\mathbf {A}^H\mathbf {A} \\ \end{bmatrix}. \end{aligned}$$
(3)
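For concreteness, a minimal NumPy sketch (illustrative names; \(\mathbf {A}\) is assumed to be available as a dense array, or as a fast operator in practice) of evaluating \(\mathcal {F}(\mathbf {u},\mathbf {v})\) and the gradient (3) with a single product by \(\mathbf {A}\) and a single product by \(\mathbf {A}^H\):

```python
import numpy as np

def qp_objective_and_gradient(A, u, v, b, lam):
    """Objective (2) and gradient (3), using one product by A and one by A^H.

    Sketch under the assumption that A is a real dense array; in practice
    A @ x and A.T @ r would be replaced by fast operator evaluations.
    """
    x = u - v
    r = A @ x - b                                   # single product by A
    AtR = A.T @ r                                   # single product by A^H (A real)
    F = 0.5 * np.dot(r, r) + lam * (u.sum() + v.sum())
    g = np.concatenate([AtR + lam, -AtR + lam])     # gradient (3)
    return F, g
```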

We remark that the computation of the objective function and gradient values requires only one multiplication by \(\mathbf {A}\) and one by \(\mathbf {A}^H\), despite the doubling of the problem size. Since \(\mathbf {H}\) is positive semidefinite, we propose to approximate it with the positive definite matrix \(\mathbf {H}_\tau \):

$$\begin{aligned} \mathbf {H}_\tau = \begin{bmatrix} \mathbf {A}^H\mathbf {A} +\tau \mathbf {I}_n&-\mathbf {A}^H\mathbf {A} \\ -\mathbf {A}^H\mathbf {A}&\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}_n \\ \end{bmatrix} \end{aligned}$$
(4)

where \(\tau \) is a positive parameter and \(\mathbf {I}_n\) is the identity matrix of size n.

Proposition 2.1

Let \(\sigma _1,\sigma _2,\ldots ,\sigma _n\) be the nonnegative eigenvalues of \(\mathbf {A}^H\mathbf {A}\) in nonincreasing order:

$$\begin{aligned} \sigma _1\ge \sigma _2\ge \ldots \ge \sigma _m\ge \sigma _{m+1}=\ldots =\sigma _n=0. \end{aligned}$$
(5)

Then, \(\mathbf {H}_\tau \) is a positive definite matrix whose eigenvalues are

$$\begin{aligned} 2\sigma _1+\tau ,\; 2\sigma _2+\tau ,\;\ldots ,\; 2\sigma _n+\tau ,\;\tau ,\;\ldots ,\;\tau . \end{aligned}$$
(6)

The proof is immediate since the spectrum of \(\mathbf {H}_\tau \) is the union of the spectra of \(\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}+\mathbf {A}^H\mathbf {A}=2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\) and \(\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}-\mathbf {A}^H\mathbf {A}=\tau \mathbf {I}\).
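The eigenvalue structure stated in Proposition 2.1 can be verified numerically on a small random example (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, tau = 4, 6, 0.5
A = rng.standard_normal((m, n))
B = A.T @ A                                        # A^H A (A real)

H_tau = np.block([[B + tau * np.eye(n), -B],
                  [-B, B + tau * np.eye(n)]])      # Eq. (4)

eig_H = np.sort(np.linalg.eigvalsh(H_tau))
sigma = np.sort(np.linalg.eigvalsh(B))             # eigenvalues of A^H A; n-m of them are zero
expected = np.sort(np.concatenate([2 * sigma + tau, np.full(n, tau)]))
print(np.allclose(eig_H, expected))                # True, matching Eq. (6)
```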

The following proposition shows that an explicit formula for the inverse of \(\mathbf {H}_\tau \) can be derived.

Proposition 2.2

The inverse of the matrix \(\mathbf {H}_\tau \) is the matrix \(\mathbf {M}_\tau \) defined as

$$\begin{aligned} \mathbf {M}_\tau = \frac{1}{\tau } \mathbf {M}_1\mathbf {M}_2 \end{aligned}$$
(7)

where

$$\begin{aligned} \mathbf {M}_1=\begin{bmatrix} \mathbf {A}^H\mathbf {A} +\tau \mathbf {I}&\mathbf {A}^H\mathbf {A} \\ \mathbf {A}^H\mathbf {A}&\mathbf {A}^H\mathbf {A}+\tau \mathbf {I} \\ \end{bmatrix}, \quad \mathbf {M}_2 = \begin{bmatrix} (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}&\mathbf {0} \\ \mathbf {0}&(2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I})^{-1} \\ \end{bmatrix}. \end{aligned}$$

Proof

We have

$$\begin{aligned} \frac{1}{\tau }\mathbf {H}_\tau \mathbf {M}_1&= \frac{1}{\tau } \begin{bmatrix} (\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^2 -(\mathbf {A}^H\mathbf {A})^2 & \mathbf {0} \\ \mathbf {0}&(\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^2 -(\mathbf {A}^H\mathbf {A})^2 \\ \end{bmatrix} \nonumber \\&= \frac{1}{\tau } \begin{bmatrix} 2\tau \mathbf {A}^H\mathbf {A} +\tau ^2 \mathbf {I}&\mathbf {0} \\ \mathbf {0}&2\tau \mathbf {A}^H\mathbf {A}+\tau ^2 \mathbf {I} \\ \end{bmatrix} = \mathbf {M}_2^{-1}. \end{aligned}$$
(8)

Similarly, we can prove that

$$\begin{aligned} \frac{1}{\tau }\mathbf {M}_1 \mathbf {H}_\tau =\mathbf {M}_2^{-1}. \end{aligned}$$
(9)

We now show that \(\mathbf {M}_1\mathbf {M}_2=\mathbf {M}_2\mathbf {M}_1\). We have

$$\begin{aligned} \mathbf {M}_1\mathbf {M}_2&= \begin{bmatrix} (\mathbf {A}^H\mathbf {A})(2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1} \;&\quad (\mathbf {A}^H\mathbf {A})(2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1} \\ (\mathbf {A}^H\mathbf {A})(2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1} \;&\; (\mathbf {A}^H\mathbf {A})(2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1} \\ \end{bmatrix} \nonumber \\&\qquad \qquad \qquad + \begin{bmatrix} \tau (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}&\mathbf {0} \\ \mathbf {0}&\tau (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1} \\ \end{bmatrix} \\ \mathbf {M}_2\mathbf {M}_1&= \begin{bmatrix} (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}(\mathbf {A}^H\mathbf {A}) \;&\; (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}(\mathbf {A}^H\mathbf {A}) \\ (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}(\mathbf {A}^H\mathbf {A}) \;&\; (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}(\mathbf {A}^H\mathbf {A})\\ \end{bmatrix} \nonumber \\&\qquad \qquad \qquad + \begin{bmatrix} \tau (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}&\mathbf {0} \\ \mathbf {0}&\tau (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1} \\ \end{bmatrix}. \end{aligned}$$

After some simple algebra, it can be proved that

$$\begin{aligned} (\mathbf {A}^H\mathbf {A})(2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}=(2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}(\mathbf {A}^H\mathbf {A}). \end{aligned}$$

and thus

$$\begin{aligned} \mathbf {M}_1\mathbf {M}_2=\mathbf {M}_2\mathbf {M}_1. \end{aligned}$$
(10)

From (8), (9) and (10), it follows

$$\begin{aligned} \mathbf {H}_\tau \mathbf {M}_\tau&= \frac{1}{\tau }\mathbf {H}_\tau \mathbf {M}_1\mathbf {M}_2=\mathbf {M}_2^{-1}\mathbf {M}_2=\mathbf {I}_{2n} \\ \mathbf {M}_\tau \mathbf {H}_\tau&= \frac{1}{\tau } \mathbf {M}_1 \mathbf {M}_2\mathbf {H}_\tau = \frac{1}{\tau } \mathbf {M}_2 \mathbf {M}_1\mathbf {H}_\tau = \mathbf {M}_2\mathbf {M}_2^{-1}=\mathbf {I}_{2n}. \end{aligned}$$
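Analogously, the explicit inverse of Proposition 2.2 can be checked numerically (small illustrative example):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, tau = 4, 6, 0.5
A = rng.standard_normal((m, n))
B = A.T @ A
I = np.eye(n)

H_tau = np.block([[B + tau * I, -B], [-B, B + tau * I]])
M1 = np.block([[B + tau * I, B], [B, B + tau * I]])
inv_block = np.linalg.inv(2 * B + tau * I)
M2 = np.block([[inv_block, np.zeros((n, n))], [np.zeros((n, n)), inv_block]])
M_tau = M1 @ M2 / tau                                # Eq. (7)

print(np.allclose(H_tau @ M_tau, np.eye(2 * n)))     # True
print(np.allclose(M_tau @ H_tau, np.eye(2 * n)))     # True
```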

3 The Modified Newton Projection Method

The Algorithm. The Newton projection method [4] for problem (2) can be written as

$$\begin{aligned} \begin{bmatrix} \mathbf {u}^{(k+1)} \\ \mathbf {v}^{(k+1)} \\ \end{bmatrix}= \left[ \begin{bmatrix} \mathbf {u}^{(k)} \\ \mathbf {v}^{(k)} \\ \end{bmatrix} -\alpha ^{(k)}\mathbf {p}^{(k)}\right] ^+, \quad \mathbf {p}^{(k)}=\mathbf {S}^{(k)}\mathbf {g}^{(k)}, \quad \mathbf {g}^{(k)}= \begin{bmatrix} \mathbf {g}_\mathbf {u}^{(k)} \\ \mathbf {g}_\mathbf {v}^{(k)} \\ \end{bmatrix} \end{aligned}$$
(11)

where \([\cdot ]^+\) denotes the projection onto the nonnegative orthant, and \(\mathbf {g}_\mathbf {u}^{(k)}\) and \(\mathbf {g}_\mathbf {v}^{(k)}\) respectively denote the partial derivatives of \(\mathcal {F}\) with respect to \(\mathbf {u}\) and \(\mathbf {v}\) at the current iterate. The scaling matrix \(\mathbf {S}^{(k)}\) is diagonal with respect to the index set \(\mathcal {A}^{(k)}\) defined as

$$\begin{aligned}&\quad \qquad \qquad \mathcal {A}^{(k)} = \Big \{i \; |\; 0\le y_i^{(k)} \le \varepsilon ^{(k)} \text { and } g^{(k)}_i>0\Big \} \\&\mathbf {y}^{(k)}= \begin{bmatrix} \mathbf {u}^{(k)} \\ \mathbf {v}^{(k)} \\ \end{bmatrix}, \; \varepsilon ^{(k)} = \min \{ \varepsilon , w^{(k)} \}, \; w^{(k)}=\Vert \mathbf {y}^{(k)}-[\mathbf {y}^{(k)}- \mathbf {g}^{(k)}]^+\Vert \end{aligned}$$

and \(\varepsilon \) is a small positive parameter.
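To make the active-set test concrete, the following is a minimal NumPy sketch (function names are illustrative, not the authors' implementation) of the index set \(\mathcal {A}^{(k)}\) and of the projection \([\cdot ]^+\) used in (11):

```python
import numpy as np

def active_set(y, g, eps=1e-6):
    """Boolean mask of A^(k): small nonnegative components with positive gradient."""
    w = np.linalg.norm(y - np.maximum(y - g, 0.0))   # w^(k) = ||y - [y - g]^+||
    eps_k = min(eps, w)                              # eps^(k) = min{eps, w^(k)}
    return (y >= 0.0) & (y <= eps_k) & (g > 0.0)

def project(y):
    """Projection [.]^+ onto the nonnegative orthant."""
    return np.maximum(y, 0.0)
```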

The step-length \(\alpha ^{(k)}\) is computed with the Armijo rule along the projection arc [4]. Let \(\mathbf {E}^{(k)}\) and \(\mathbf {F}^{(k)}\) be the diagonal matrices [13] such that

$$\begin{aligned} \{ \mathbf {E}^{(k)} \}_{ii}&= \left\{ \begin{array}{ll} 1, &{} i \notin \mathcal {A}^{(k)}; \\ 0, &{} i \in \mathcal {A}^{(k)}; \end{array} \right. , \quad \mathbf {F}^{(k)} = \mathbf {I}_{2n}-\mathbf {E}^{(k)} . \end{aligned}$$

In MNP, we propose to define the scaling matrix \(\mathbf {S}^{(k)}\) as

$$\begin{aligned} \mathbf {S}^{(k)}=\mathbf {E}^{(k)}\mathbf {M}_\tau \mathbf {E}^{(k)}+\mathbf {F}^{(k)} . \end{aligned}$$
(12)

Therefore, the cost of computing the search direction \(\mathbf {p}^{(k)}=\mathbf {S}^{(k)}\mathbf {g}^{(k)}\) is mainly due to one multiplication of the inverse Hessian approximation \(\mathbf {M}_\tau \) with a vector, since matrix-vector products involving \(\mathbf {E}^{(k)}\) and \(\mathbf {F}^{(k)}\) only extract some components of the vector and need not be explicitly performed. We remark that, in the Newton projection method proposed in [4], the scaling matrix is \(\mathbf {S}^{(k)}=\left( \mathbf {E}^{(k)}\mathbf {H}_\tau \mathbf {E}^{(k)}+\mathbf {F}^{(k)}\right) ^{-1}\), which requires extracting a submatrix of \(\mathbf {H}_\tau \) and then inverting it.
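As an illustration, a minimal sketch (names illustrative, not the authors' code) of the search-direction computation \(\mathbf {p}^{(k)}=\mathbf {S}^{(k)}\mathbf {g}^{(k)}\) with \(\mathbf {S}^{(k)}\) as in (12): the diagonal matrices \(\mathbf {E}^{(k)}\) and \(\mathbf {F}^{(k)}\) are realized as boolean masks, and the product by \(\mathbf {M}_\tau \) is delegated to a user-supplied callable (for instance built from (21)-(22)):

```python
import numpy as np

def search_direction(g, active, M_tau_matvec):
    """p = S g with S = E M_tau E + F (Eq. (12)), using boolean masks.

    `active` is the boolean mask of A^(k); E zeroes the active components,
    F keeps only them. `M_tau_matvec` applies M_tau to a 2n-vector; only
    one such product is needed per iteration.
    """
    Eg = np.where(active, 0.0, g)                     # E^(k) g
    p = np.where(active, 0.0, M_tau_matvec(Eg))       # E^(k) M_tau E^(k) g
    p[active] = g[active]                             # + F^(k) g
    return p
```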

Convergence Analysis. As proved in [4], the convergence of Newton-type projection methods can be established under the following general assumptions, which basically require the scaling matrices \(\mathbf {S}^{(k)}\) to be positive definite with uniformly bounded eigenvalues.


A1:

The gradient \(\mathbf {g}\) is Lipschitz continuous on each bounded set of \(\mathbb {R}^{2n}\).

A2:

There exist positive scalars \(c_1\) and \(c_2\) such that

$$\begin{aligned} c_1\Vert \mathbf {y}\Vert ^2 \le \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y} \le c_2 \Vert \mathbf {y}\Vert ^2, \; \forall \mathbf {y}\in \mathbb {R}^{2n}, \; k=0,1,\ldots \end{aligned}$$

The key convergence result is provided in Proposition 2 of [4], which is restated here for the sake of completeness.

Proposition 3.1

[4, Proposition 2] Let \(\{[\mathbf {u}^{(k)},\mathbf {v}^{(k)}]\}\) be a sequence generated by iteration (11), where \(\mathbf {S}^{(k)}\) is a positive definite symmetric matrix which is diagonal with respect to \(\mathcal {A}^{(k)}\) and \(\alpha ^{(k)}\) is computed by the Armijo rule along the projection arc. Under assumptions A1 and A2 above, every limit point of the sequence \(\{[\mathbf {u}^{(k)},\mathbf {v}^{(k)}]\}\) is a critical point of problem (2).

Since the objective \(\mathcal {F}\) of (2) is twice continuously differentiable, it satisfies assumption A1. From Propositions 2.1 and 2.2, it follows that \(\mathbf {M}_\tau \) is a symmetric positive definite matrix and hence the scaling matrix \(\mathbf {S}^{(k)}\) defined by (12) is a symmetric positive definite matrix which is diagonal with respect to \(\mathcal {A}^{(k)}\). The global convergence of the MNP method is therefore guaranteed provided that \(\mathbf {S}^{(k)}\) satisfies assumption A2.

Proposition 3.2

Let \(\mathbf {S}^{(k)}\) be the scaling matrix defined by (12), i.e. \( \mathbf {S}^{(k)}=\mathbf {E}^{(k)}\mathbf {M}_\tau \mathbf {E}^{(k)}+\mathbf {F}^{(k)}. \) Then, there exist two positive scalars \(c_1\) and \(c_2\) such that

$$\begin{aligned} c_1\Vert \mathbf {y}\Vert ^2 \le \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y} \le c_2 \Vert \mathbf {y}\Vert ^2, \quad \forall \mathbf {y}\in \mathbb {R}^{2n}, \quad k=0,1,\ldots \end{aligned}$$

Proof

Proposition 2.1 implies that the largest and smallest eigenvalues of \(\mathbf {M}_\tau \) are respectively \({1}/{\tau }\) and \(1/(2\sigma _1+\tau )\); therefore

$$\begin{aligned} \frac{1}{2\sigma _1+\tau }\Vert \mathbf {y}\Vert ^2 \le \mathbf {y}^H \mathbf {M}_\tau \mathbf {y} \le \frac{1}{\tau } \Vert \mathbf {y}\Vert ^2, \quad \forall \mathbf {y}\in \mathbb {R}^{2n}. \end{aligned}$$
(13)

We have \( \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y}= \big ( \mathbf {E}^{(k)}\mathbf {y}\big )^H \mathbf {M}_\tau \big ( \mathbf {E}^{(k)}\mathbf {y}\big ) + \mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y} . \) From (13) it follows that

$$\begin{aligned} \frac{\Vert \mathbf {E}^{(k)}\mathbf {y}\Vert ^2}{2\sigma _1+\tau }+\mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y} \le \big ( \mathbf {E}^{(k)}\mathbf {y}\big )^H \mathbf {M}_\tau \big ( \mathbf {E}^{(k)}\mathbf {y}\big ) + \mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y} \le \frac{\Vert \mathbf {E}^{(k)}\mathbf {y}\Vert ^2}{\tau }+\mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y}. \end{aligned}$$

Moreover we have \( \mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y} = \sum _{i\in \mathcal {A}^{(k)}} y_i^2\), \(\Vert \mathbf {E}^{(k)}\mathbf {y}\Vert ^2 = \sum _{i\notin \mathcal {A}^{(k)}} y_i^2 \); hence:

$$\begin{aligned} \frac{1}{2\sigma _1+\tau }\sum _{i\notin \mathcal {A}^{(k)}} y_i^2+\sum _{i\in \mathcal {A}^{(k)}} y_i^2 \le \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y} \le \frac{1}{\tau }\sum _{i\notin \mathcal {A}^{(k)}} y_i^2+\sum _{i\in \mathcal {A}^{(k)}} y_i^2 \end{aligned}$$

and

$$\begin{aligned} \min \{\frac{1}{2\sigma _1+\tau },1\}\Vert \mathbf {y} \Vert ^2 \le \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y} \le \max \{\frac{1}{\tau },1\}\Vert \mathbf {y}\Vert ^2. \end{aligned}$$

The claim follows by setting \(c_1=\min \{\frac{1}{2\sigma _1+\tau },1\}\) and \(c_2=\max \{\frac{1}{\tau },1\}\).

Computing the Search Direction. We suppose that \(\mathbf {K}\) is the matrix representation of a spatially invariant convolution operator with periodic boundary conditions, so that \(\mathbf {K}\) is a block circulant with circulant blocks (BCCB) matrix and matrix-vector products can be efficiently performed via the FFT. Moreover, we assume that the columns of \(\mathbf {W}\) form an orthogonal basis for which fast sparsifying algorithms exist, such as a wavelet basis. Under these assumptions, \(\mathbf {A}\) is a full and dense matrix but matrix-vector operations with \(\mathbf {A}\) and \(\mathbf {A}^H\) are relatively cheap. As shown by (12), the computation of the search direction \(\mathbf {p}^{(k)}=\mathbf {S}^{(k)}\mathbf {g}^{(k)}\) requires the multiplication of a vector by \(\mathbf {M}_\tau \). Let \( \begin{bmatrix} \mathbf {z}, \mathbf {w} \end{bmatrix} \in \mathbb {R}^{2n}\) be a given vector; then it immediately follows that

$$\begin{aligned} \mathbf {M}_\tau \begin{bmatrix}\mathbf {z} \\ \mathbf {w} \end{bmatrix} = \frac{1}{\tau } \begin{bmatrix} \mathbf {A}^H\mathbf {A}\big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}(\mathbf {z}+\mathbf {w}) +\tau \big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}\mathbf {z} \\ \mathbf {A}^H\mathbf {A}\big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}(\mathbf {z}+\mathbf {w}) +\tau \big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}\mathbf {w} \end{bmatrix}. \end{aligned}$$
(14)

Formula (14) needs the inversion of \(2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\). Our experimental results indicate that the search direction can be efficiently and effectively computed as follows. Using the Sherman-Morrison-Woodbury formula, we obtain

$$\begin{aligned} \big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}= \frac{1}{\tau }\Big (\mathbf {I}-\mathbf {W}^H\mathbf {K}^H\big (\mathbf {K}\mathbf {K}^H +\frac{\tau }{2}\big )^{-1}\mathbf {K}\mathbf {W}\Big ) \end{aligned}$$
(15)
$$\begin{aligned} \mathbf {A}^H\mathbf {A}\big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1} = \frac{1}{\tau }\Big ( \mathbf {W}^H \big (\mathbf {K}^H\mathbf {K} -\mathbf {K}^H\mathbf {K}\mathbf {K}^H(\mathbf {K}\mathbf {K}^H+\frac{\tau }{2})^{-1}\mathbf {K}\big )\mathbf {W} \Big ). \end{aligned}$$
(16)
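As a quick sanity check of the Sherman-Morrison-Woodbury identity (15), the following small dense example with an orthogonal \(\mathbf {W}\) (all sizes illustrative) can be used:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, tau = 5, 8, 0.3
K = rng.standard_normal((m, n))
W, _ = np.linalg.qr(rng.standard_normal((n, n)))     # orthogonal W
A = K @ W                                            # synthesis operator A = K W

lhs = np.linalg.inv(2 * A.T @ A + tau * np.eye(n))
rhs = (np.eye(n)
       - W.T @ K.T @ np.linalg.inv(K @ K.T + 0.5 * tau * np.eye(m)) @ K @ W) / tau
print(np.allclose(lhs, rhs))                         # True, matching Eq. (15)
```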

Substituting (15) and (16) in (14), we obtain

$$\begin{aligned}&\mathbf {A}^H\mathbf {A}\big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}(\mathbf {z}+\mathbf {w}) +\tau \big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}\mathbf {z} =\nonumber \\&\quad \quad \quad \quad \frac{1}{\tau }\mathbf {W}^H \big (\mathbf {K}^H\mathbf {K} -\mathbf {K}^H\mathbf {K}\mathbf {K}^H(\mathbf {K}\mathbf {K}^H+\frac{\tau }{2})^{-1}\mathbf {K}\big ) (\mathbf {W}\mathbf {z}+\mathbf {W}\mathbf {w}) \nonumber \\&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad -\mathbf {W}^H\mathbf {K}^H(\mathbf {K}\mathbf {K}^H+\frac{\tau }{2})^{-1}\mathbf {K}\mathbf {W}\mathbf {z} +\mathbf {z} \end{aligned}$$
(17)
$$\begin{aligned}&\mathbf {A}^H\mathbf {A}\big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}(\mathbf {z}+\mathbf {w}) +\tau \big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}\mathbf {w} =\nonumber \\&\quad \quad \quad \quad \frac{1}{\tau } \mathbf {W}^H \big (\mathbf {K}^H\mathbf {K} -\mathbf {K}^H\mathbf {K}\mathbf {K}^H(\mathbf {K}\mathbf {K}^H+\frac{\tau }{2})^{-1}\mathbf {K}\big ) (\mathbf {W}\mathbf {z}+\mathbf {W}\mathbf {w})\nonumber \\&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad -\mathbf {W}^H\mathbf {K}^H(\mathbf {K}\mathbf {K}^H+\frac{\tau }{2})^{-1}\mathbf {K}\mathbf {W}\mathbf {w} +\mathbf {w}. \end{aligned}$$
(18)

Since \(\mathbf {K}\) is BCCB, it is diagonalized by the Discrete Fourier Transform (DFT), i.e. \(\mathbf {K} = \mathbf {U}^H\mathbf {D}\mathbf {U}\) where \(\mathbf {U}\) denotes the unitary matrix representing the DFT and \(\mathbf {D}\) is a diagonal matrix. Thus, we have:

$$\begin{aligned}&\quad \quad \quad \quad \quad \mathbf {K}^H(\mathbf {K}\mathbf {K}^H+\frac{\tau }{2})^{-1}\mathbf {K} = \mathbf {U}^H \Big ( \frac{|\mathbf {D}|^2}{|\mathbf {D}|^2+\frac{\tau }{2}} \Big )\mathbf {U} \end{aligned}$$
(19)
$$\begin{aligned}&\mathbf {K}^H\mathbf {K} -\mathbf {K}^H\mathbf {K}\mathbf {K}^H(\mathbf {K}\mathbf {K}^H+\frac{\tau }{2})^{-1}\mathbf {K} = \mathbf {U}^H \Big ( |\mathbf {D}|^2 -\frac{|\mathbf {D}|^4}{|\mathbf {D}|^2+\frac{\tau }{2}} \Big )\mathbf {U}. \end{aligned}$$
(20)

Substituting (19) and (20) in (17) and (18), we obtain

$$\begin{aligned} \mathbf {A}^H\mathbf {A}\big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}(\mathbf {z}+\mathbf {w}) +\tau \big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}\mathbf {z} =\quad \quad \quad \quad \quad \quad \quad \quad \nonumber \\ \frac{1}{\tau } \mathbf {W}^H\mathbf {U}^H \big (|\mathbf {D}|^2-\frac{|\mathbf {D}|^4}{|\mathbf {D}|^2+\frac{\tau }{2}}\big ) (\mathbf {UW}\mathbf {z}+\mathbf {UW}\mathbf {w}) -\mathbf {W}^H\mathbf {U}^H\big ( \frac{|\mathbf {D}|^2}{|\mathbf {D}|^2+\frac{\tau }{2}} \big )\mathbf {UW}\mathbf {z} +\mathbf {z} \end{aligned}$$
(21)
$$\begin{aligned} \mathbf {A}^H\mathbf {A}\big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}(\mathbf {z}+\mathbf {w}) +\tau \big (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I}\big )^{-1}\mathbf {w} =\quad \quad \quad \quad \quad \quad \quad \quad \nonumber \\ \frac{1}{\tau } \mathbf {W}^H\mathbf {U}^H \big (|\mathbf {D}|^2-\frac{|\mathbf {D}|^4}{|\mathbf {D}|^2+\frac{\tau }{2}}\big )(\mathbf {UW}\mathbf {z}+\mathbf {UW}\mathbf {w}) -\mathbf {W}^H\mathbf {U}^H\big ( \frac{|\mathbf {D}|^2}{|\mathbf {D}|^2+\frac{\tau }{2}} \big )\mathbf {UW}\mathbf {w} +\mathbf {w}. \end{aligned}$$
(22)

Equations (14), (21) and (22) show that, at each iteration, the computation of the search direction \(\mathbf {p}^{(k)}\) requires two products by \(\mathbf {W}\), two products by \(\mathbf {W}^H\), two products by \(\mathbf {U}\) and two products by \(\mathbf {U}^H\). The latter products can be performed efficiently by using the FFT algorithm.
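The following sketch assembles the product of \(\mathbf {M}_\tau \) with a vector from (14), (21) and (22). It assumes that \(\mathbf {U}\) is the 2-D DFT (so that \(\mathbf {D}\) contains the DFT of the periodic PSF) and takes the sparsifying transform \(\mathbf {W}\) and its adjoint as user-supplied callables; the function and argument names are illustrative, not the authors' code:

```python
import numpy as np
from numpy.fft import fft2, ifft2

def m_tau_matvec(z, w, D, tau, W, Wt):
    """Apply M_tau to [z; w] via Eqs. (14), (21)-(22).

    Sketch under the stated assumptions: K is BCCB with eigenvalues D
    (the 2-D DFT of the periodic PSF), U is the 2-D DFT, and W, Wt are
    callables for the orthogonal sparsifying transform and its adjoint
    (e.g. an orthogonal wavelet transform). z and w are the two halves
    of the input vector, reshaped as images.
    """
    absD2 = np.abs(D) ** 2
    filt1 = absD2 - absD2 ** 2 / (absD2 + 0.5 * tau)   # |D|^2 - |D|^4/(|D|^2 + tau/2)
    filt2 = absD2 / (absD2 + 0.5 * tau)                # |D|^2 / (|D|^2 + tau/2)

    a, c = fft2(W(z)), fft2(W(w))                      # U W z and U W w
    top = Wt(np.real(ifft2(filt1 * (a + c) / tau - filt2 * a))) + z
    bot = Wt(np.real(ifft2(filt1 * (a + c) / tau - filt2 * c))) + w
    return top / tau, bot / tau                        # overall 1/tau factor from (14)

# Illustrative use with W equal to the identity (the analysis case W = I_n):
if __name__ == "__main__":
    rng = np.random.default_rng(4)
    psf = np.zeros((32, 32)); psf[:3, :3] = 1.0 / 9.0  # toy 3x3 periodic blur
    D = fft2(psf)
    z, w = rng.standard_normal((32, 32)), rng.standard_normal((32, 32))
    top, bot = m_tau_matvec(z, w, D, tau=0.1, W=lambda x: x, Wt=lambda x: x)
```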

Fig. 1. Top line: exact image (left), Gaussian blurred image (middle) and MNP reconstruction (right). Bottom line: out-of-focus blurred image (left) and MNP reconstruction (right). The noise level is NL = \(7.5\cdot 10^{-3}\).

4 Numerical Results

In this section, we present numerical results for some image restoration test problems. The numerical experiments aim at illustrating the performance of MNP compared with some state-of-the-art methods such as SALSA [1], CGIST [8], the nonmonotone version of GPSR [6], and the Split Bregman method [7]. Although SALSA has been shown to outperform GPSR [1], we include GPSR in our comparative study since, like MNP, it solves the quadratic program (2). The Matlab source code of the considered methods, made publicly available by the authors, has been used in the numerical experiments. The numerical experiments have been executed in Matlab R2012a on a personal computer with an Intel Core i7-2600, 3.40 GHz processor.

The numerical experiments are based on the well-known Barbara image (Fig. 1), whose size is \(512\times 512\) and whose pixels have been scaled into the range between 0 and 1. In our experiments, the matrix \(\mathbf {W}\) represents an orthogonal Haar wavelet transform with four levels. For all the considered methods, the initial iterate \(\mathbf {x}^{(0)}\) has been chosen as \(\mathbf {x}^{(0)}=\mathbf {W}^H\mathbf {b}\); the regularization parameter \(\lambda \) has been heuristically chosen. In MNP, the parameter \(\tau \) of the Hessian approximation has been fixed at \(\tau = 100\lambda \). This value has been chosen after extensive experimentation and has been used in all the presented numerical experiments.

The iterations of each method are terminated when the relative distance between two successive objective function values becomes smaller than a tolerance \(tol_\phi \). A maximum of 100 iterations has been allowed for each method.
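A minimal sketch of this stopping rule (hypothetical helper name):

```python
def stop_criterion(phi_prev, phi_curr, tol_phi):
    """Stop when the relative distance between two successive objective
    values drops below tol_phi (a cap of 100 iterations is also enforced)."""
    return abs(phi_curr - phi_prev) < tol_phi * abs(phi_prev)
```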

Fig. 2. MSE (left column) and objective function (right column) histories of MNP (blue solid line), SALSA (magenta dashed line), GPSR (red dotted line), CGIST (black dash-dotted line) and Split Bregman (cyan dash-dotted line). Top line: Gaussian blur; bottom line: out-of-focus blur. The noise level is NL = \(1.5\cdot 10^{-2}\). (Color figure online)

Table 1. Restoring the noised and blurred Barbara image: numerical results.

In the first experiment, the Barbara image has been convolved with a Gaussian PSF with variance equal to 2, obtained with the code psfGauss from [10], and then the blurred image has been corrupted by Gaussian noise with noise levels equal to \(7.5\cdot 10^{-3}\) and \(1.5\cdot 10^{-2}\). (The noise level \(\text {NL}\) is defined as \(\text {NL}:={\Vert \varvec{\eta }\Vert }\big /{\Vert \mathbf {A}\mathbf {x}_\text {original}\Vert }\) where \(\mathbf {x}_\text {original}\) is the original image and \(\varvec{\eta }\) is the noise vector.) In the second experiment, the Barbara image has been corrupted by out-of-focus blur, obtained with the code psfDefocus from [10], and by Gaussian noise with noise levels equal to \(2.5\cdot 10^{-3}\) and \(7.5\cdot 10^{-3}\). The degraded images and the MNP restorations are shown in Fig. 1 for \(\text {NL}=7.5\cdot 10^{-3}\). Table 1 reports the Mean Squared Error (MSE) values, the objective function values and the CPU times in seconds obtained with the stopping tolerance values \(tol_\phi =10^{-1},10^{-2},10^{-3}\). In Fig. 2, the MSE behavior and the decrease of the objective function versus time (in seconds) are illustrated.
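For reproducibility of the setup, a small sketch of how measurements at a prescribed noise level NL can be generated (illustrative helper, not the authors' code):

```python
import numpy as np

def add_noise(Ax, nl, rng=None):
    """Add Gaussian white noise eta scaled so that ||eta|| / ||A x_original|| = nl."""
    rng = np.random.default_rng(0) if rng is None else rng
    eta = rng.standard_normal(Ax.shape)
    eta *= nl * np.linalg.norm(Ax) / np.linalg.norm(eta)
    return Ax + eta
```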

The reported numerical results indicate that MNP is competitive with the considered state-of-the-art methods and that, in terms of MSE reduction, MNP reaches its minimum MSE value very early.

5 Conclusions

In this work, the MNP method has been proposed for sparsity constrained image restoration. In order to achieve low computational complexity, the MNP method uses a suitable approximation of the Hessian matrix so that the search direction can be computed efficiently using only FFTs and fast sparsifying algorithms. The results of the numerical experiments show that MNP can be competitive with some state-of-the-art methods both in terms of computational efficiency and accuracy.