1 Introduction

This paper is concerned with the solution of ill-conditioned linear systems of equations of the form

$$\begin{aligned} {{\textbf {d}}}={{\textbf {Gm}}} + {{\textbf {e}}}, \end{aligned}$$
(1.1)

arising from the discretization of continuous inverse problems in different fields of applied sciences and engineering; see, e.g., [3, 24]. Here \({{\textbf {G}}}\) is an \(M\times N\) real matrix representing a forward operator, \({{\textbf {d}}}\) and \({{\textbf {e}}}\) are real vectors of length M representing known measured data and unknown Gaussian white noise, respectively, and the unknown \({{\textbf {m}}}\) is a real vector of length N representing a quantity of interest. The singular values of \({{\textbf {G}}}\) may concentrate close to the origin and decay to zero, making the system ill-conditioned. Both one-dimensional (1D) and two-dimensional (2D) problems are considered in this paper; in the 2D case, the model parameters \({{\textbf {m}}}\) are obtained by stacking the columns of a rectangular \(N_z\times N_x\) array, so that \(N=N_z\cdot N_x\). Due to the presence of noise and the ill-conditioning of the system, a naive (e.g., least squares) solution of (1.1) may give a meaningless estimate of the unknown \({{\textbf {m}}}\). Thus we seek to determine an approximation of \({{\textbf {m}}}\) by regularizing problem (1.1), i.e., by incorporating some a priori information about \({{\textbf {m}}}\) into the problem formulation; see [13].

In this paper, we assume that the solution to problem (1.1) is piecewise smooth, e.g., it consists of a smooth background while containing regions with discontinuities. This allows us to decompose \({{\textbf {m}}}\) as the sum of two components \({{\textbf {m}}}_1\) and \({{\textbf {m}}}_2\), i.e.,

$$\begin{aligned} {{\textbf {m}}} = {{\textbf {m}}}_1+{{\textbf {m}}}_2, \end{aligned}$$
(1.2)

where \({{\textbf {m}}}_1\) is the piecewise constant component of \({{\textbf {m}}}\) and \({{\textbf {m}}}_2\) is the smooth component of \({{\textbf {m}}}\). Following [11], we incorporate regularization by considering a Tikhonov regularization term for the smooth component and a(n anisotropic) total variation (TV) term for the piecewise constant component. A scalar balancing parameter \(\beta >0\) is included to control the amount of regularization associated with each term and, hence, to properly separate the two complementary components of the solution. Assuming that the value \(\varepsilon =\Vert {{\textbf {e}}}\Vert _2^2\) is available, the pair (\({{\textbf {m}}}_1,{{\textbf {m}}}_2\)) is obtained by solving the following Tikhonov-TV constrained optimization problem

$$\begin{aligned}&\mathop {\mathrm{minimize}}\limits _{{{\textbf {m}}}_1,{{\textbf {m}}}_2}\qquad {\Vert {{\textbf {D}}}_1{{\textbf {m}}}_1\Vert _1+\frac{\beta }{2}\Vert {{\textbf {D}}}_2 {{\textbf {m}}}_2\Vert _2^2}\nonumber \\&\text {subject to}\qquad {\Vert {{\textbf {G}}}({{\textbf {m}}}_1+{{\textbf {m}}}_2)-{{\textbf {d}}}\Vert _2^2}{=\varepsilon }, \end{aligned}$$
(1.3)

where \(\Vert \cdot \Vert _p\), \(p=1,2\), denotes the vector p-norm, and \({{\textbf {D}}}_1\) and \({{\textbf {D}}}_2\) are scaled finite difference discretizations of the gradient and partial second derivative operators, respectively.

The Tikhonov-TV regularization method (1.3) has gained considerable attention in scientific applications, for example: image denoising, inpainting and deblurring [11], signal decomposition [14], optical flow computation [16], signal filtering [21], seismic travel time tomography [2], and seismic full waveform inversion [1]. This paper proposes an alternative algorithm to the one in [11], featuring a fully automatic balancing parameter choice strategy, which can be reliably used to solve large-scale problems arising in the applications mentioned above. Note that, in [11], \(\xi =\beta ^{-1}\) was used as the balancing parameter; using \(\beta \) is more convenient for the approach presented in this paper.

In this paper we solve (1.3) using the alternating direction method of multipliers (ADMM) [7, 10]. This method has proved to be particularly advantageous for dealing with large-scale problems such as those arising in geophysics and other scientific applications [5]. Specifically, ADMM splits the original optimization problem into smaller subproblems that are solved in sequence, each of which is efficient and simple to implement. Furthermore, ADMM naturally handles non-smooth terms (such as the 1-norm term in (1.3)) by employing proximal methods. Very recent research has focused on applying ADMM to nonlinear, nonconvex equality-constrained optimization problems similar to the problem formulation (1.3). For instance, [26] analyzes the convergence of ADMM for a nonlinear objective function subject to a squared 2-norm equality constraint; because of our approach in handling each subproblem (see Sect. 2), such analysis can be adapted to our problem formulation (1.3).

The value of the parameter \(\beta \) has a great impact on the quality of the regularized solution for the problem at hand. The authors of [11] proposed to determine \(\beta \) via L-curve analysis, still assuming that an estimate of the error bound \(\varepsilon \) is known a priori. Specifically, an unconstrained counterpart of (1.3) is considered, which involves two parameters (one for weighting the fit-to-data and the regularization terms, and one for balancing the two regularization terms); the problem is solved by applying both the discrepancy principle and the L-curve (taking a range of values assigned to \(\beta \)), requiring the solution of many instances of the considered optimization problem (one for each value of \(\beta \)) to generate the L-curve. Recently, the authors of [14] presented a procedure for tuning \(\beta \) in image decomposition problems, i.e., problems like (1.3) where \({{\textbf {G}}}\) is the identity matrix. However, their method assumes that a good estimate of the minimum non-zero gradient norm of the blocky component \({{\textbf {m}}}_1\) is known a priori. Therefore, their method is limited to problems for which this quantity can be estimated directly from the input data.

In this paper, we present an extremely simple and efficient strategy for the automatic selection of \(\beta \) using robust statistics. Our basic idea is that, since the desired model \({{\textbf {m}}}\) consists of the blocky component \({{\textbf {m}}}_1\) and the smooth component \({{\textbf {m}}}_2\), from a statistical point of view the model gradient \({{\textbf {D}}}_1 {{\textbf {m}}}\) is drawn from a mixture of non-Gaussian and Gaussian distributions. Specifically, the entries of \({{\textbf {D}}}_1 {{\textbf {m}}}_1\) are considered as anomalies/outliers in the model gradient and therefore statistical anomaly detection tools are used to identify them. The robust z-score [15] allows us to optimally determine the lower limit of the anomalous entries (or the upper limit of normal/Gaussian distributed entries) in the gradient vector. In particular, the optimal \(\beta \) is determined such that \(\Vert {{\textbf {D}}}_1{{\textbf {m}}}_2\Vert _{\infty }\) is equal to the minimum value of the anomalies (or, equivalently, the maximum value of the normal entries) determined by the z-score. A simple algorithm is presented to achieve this, which updates the value of \(\beta \) at each ADMM iteration, leading to an optimal separation of the model components. Extensive numerical examples from imaging, tomography and geophysical inverse problems are presented to show the excellent performance of the proposed method for solving ill-posed inverse problems with piecewise smooth solutions.

The rest of the paper is organized as follows. In Sect. 2, we present more details about the formulation and ADMM implementation of the Tikhonov-TV regularization method (1.3). In Sect. 3, we introduce the new technique for the selection of the balancing parameter \(\beta \). Experimental results are displayed in Sect. 4. Finally, some concluding remarks are proposed in Sect. 5.

2 Solving the Tikhonov-TV regularized problem using ADMM

In this section we provide some details about the formulation and the solution of problem (1.3), where the parameter \(\beta \) is fixed. We start by precisely defining the matrices \({{\textbf {D}}}_{1}\) and \({{\textbf {D}}}_{2}\) introduced in Sect. 1. For problems (1.1) formulated in 1D, we take

(2.1)

and

(2.2)

where the alternative notations and highlight the differentiation order, the dimensionality of the problem, and the size of the matrices. Although, for the 1D case, , we keep these notations to better match the 2D case (see (2.4)). Note that \({{\textbf {D}}}_{1}\) and \({{\textbf {D}}}_{2}\) as expressed above are associated to a rescaled forward first order and second order finite difference scheme, respectively. For problems (1.1) formulated in 2D, denoting Kronecker products by \(\otimes \) and identity matrices of size N by \({{\textbf {I}}}_{N}\), we take

(2.3)

which represents the discrete partial first order derivatives in the x (or horizontal) direction (matrix on the top) and in the z (or vertical) direction (matrix on the bottom). Similarly to the 1D case, we take

(2.4)

Note that the one adopted above is not the only way of defining discrete differential operators of the first and second order. Indeed, one may consider alternative operators obtained varying boundary conditions and/or varying the way first-order operators are combined [13].
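As an illustration of one such choice, the following Python sketch builds first- and second-order forward-difference operators of the kind just described, with the 2D gradient assembled via Kronecker products as in (2.3). The unit grid spacing and the boundary handling are illustrative assumptions and may differ from the exact (rescaled) matrices used in (2.1)–(2.4).

```python
import numpy as np
import scipy.sparse as sp

def first_diff(n):
    """(n-1) x n forward first-order difference matrix."""
    e = np.ones(n)
    return sp.spdiags([-e, e], [0, 1], n - 1, n, format="csr")

def second_diff(n):
    """(n-2) x n second-order difference matrix."""
    e = np.ones(n)
    return sp.spdiags([e, -2 * e, e], [0, 1, 2], n - 2, n, format="csr")

def grad_2d(nz, nx):
    """Stacked partial first-order derivatives in x (top block) and z (bottom block),
    acting on a column-stacked nz-by-nx array, in the spirit of (2.3)."""
    Dx = sp.kron(first_diff(nx), sp.eye(nz))   # horizontal differences
    Dz = sp.kron(sp.eye(nx), first_diff(nz))   # vertical differences
    return sp.vstack([Dx, Dz], format="csr")
```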

In order to compute the regularized solution \({{\textbf {m}}}\), one can solve problem (1.3) for \({{\textbf {m}}}_1\) and \({{\textbf {m}}}_2\) separately, or reformulate the problem to be solved directly for \({{\textbf {m}}}\). Note that, in general, the two components \({{\textbf {m}}}_1\) and \({{\textbf {m}}}_2\) are not unique: indeed, for any constant signal \({{\textbf {c}}}\), we have that \({{\textbf {D}}}_1{{\textbf {c}}}={{\textbf {0}}}\) and \({{\textbf {D}}}_2{{\textbf {c}}}={{\textbf {0}}}\), so that, if the pair (\({{\textbf {m}}}_1\), \({{\textbf {m}}}_2\)) is a solution of (1.3), then \(({{\textbf {m}}}_1\pm {{\textbf {c}}}, {{\textbf {m}}}_2 {\mp } {{\textbf {c}}})\) is also a solution of (1.3). However, the sum \({{\textbf {m}}}={{\textbf {m}}}_1+{{\textbf {m}}}_2\) is unique; see [25]. Because of this, in this paper, we solve directly for \({{\textbf {m}}}\).

Let us introduce the auxiliary variables

$$\begin{aligned} {{\textbf {e}}}={{\textbf {d}}}-{{\textbf {Gm}}},~~~{{\textbf {g}}}_1={{\textbf {D}}}_1{{\textbf {m}}}_1,~~~{{\textbf {g}}}_2={{\textbf {D}}}_1{{\textbf {m}}}_2\,, \end{aligned}$$
(2.5)

so that problem (1.3) can be written in the following equivalent form

$$\begin{aligned}&\mathop {\mathrm{minimize}}\limits _{{{\textbf {m}}},{{\textbf {g}}}_1,{{\textbf {g}}}_2,{{\textbf {e}}}}\qquad {\Vert {{\textbf {g}}}_1\Vert _1+\frac{\beta }{2}\Vert \bar{{{\textbf {D}}}}_1 {{\textbf {g}}}_2\Vert _2^2} \nonumber \\&\text {subject to} \qquad {{{\textbf {D}}}_1{{\textbf {m}}}}{={{\textbf {g}}}_1+{{\textbf {g}}}_2}, \nonumber \\&\qquad \qquad \qquad \qquad {{{\textbf {G}}}{{\textbf {m}}}+{{\textbf {e}}}}\,{={{\textbf {d}}}}, \nonumber \\&\qquad \qquad \qquad \qquad \quad {\Vert {{\textbf {e}}}\Vert _2^2}{=\varepsilon }. \end{aligned}$$
(2.6)

The augmented Lagrangian function associated with problem (2.6) is

$$\begin{aligned}&{\mathscr {L}}({{{\textbf {m}}}},{{{\textbf {g}}}}_1,{{{\textbf {g}}}}_2,{{{\textbf {e}}}},\widehat{\varvec{\lambda }}_1,\widehat{\varvec{\lambda }}_2,\widehat{{\lambda }}_3)= \Vert {{{\textbf {g}}}}_1\Vert _1+\frac{\beta }{2}\Vert \bar{{{{\textbf {D}}}}}_1 {{{\textbf {g}}}}_2\Vert _2^2 \nonumber \\&\quad - \langle \widehat{\varvec{\lambda }}_1,{{{\textbf {D}}}}_1{{{\textbf {m}}}}-{{{\textbf {g}}}}_1-{{{\textbf {g}}}}_2\rangle - \langle \widehat{\varvec{\lambda }}_2,{{{\textbf {G}}}}{{{\textbf {m}}}}+{{{\textbf {e}}}}-{{{\textbf {d}}}}\rangle - \langle {\widehat{\lambda }}_3,\Vert {{{\textbf {e}}}}\Vert _2^2-\varepsilon \rangle \nonumber \\&\quad + \frac{\mu _1}{2}\Vert {{{\textbf {D}}}}_1{{{\textbf {m}}}}-{{{\textbf {g}}}}_1-{{{\textbf {g}}}}_2\Vert _2^2 +\frac{\mu _2}{2}\Vert {{{\textbf {G}}}}{{{\textbf {m}}}}+{{{\textbf {e}}}}-{{{\textbf {d}}}}\Vert _2^2 + \frac{\mu _3}{2}(\Vert {{{\textbf {e}}}}\Vert _2^2-\varepsilon )^2,\end{aligned}$$
(2.7)

where \(\langle \cdot ,\cdot \rangle \) denotes the canonical inner product in \({\mathbb {R}}^d\) (\(d=1,M,2N\)), \(\widehat{\varvec{\lambda }}_1,\,\widehat{\varvec{\lambda }}_2,\,{\widehat{\lambda }}_3\) are the Lagrange multipliers, and \(\mu _1,\,\mu _2,\,\mu _3>0\) are the penalty parameters; see [17].

The kth ADMM iteration has the form

$$\begin{aligned} {{\textbf {m}}}^{k}&= \arg \min _{{{\textbf {m}}}} \,{\mathscr {L}}({{\textbf {m}}},{{\textbf {g}}}_1^{k-1},{{\textbf {g}}}_2^{k-1},{{\textbf {e}}}^{k-1},\widehat{\varvec{\lambda }}_1^{k-1},\widehat{\varvec{\lambda }}_2^{k-1},{\widehat{\lambda }}_3^{k-1}), \end{aligned}$$
(2.8a)
$$\begin{aligned} {{\textbf {g}}}_1^{k}&= \arg \min _{{{\textbf {g}}}_1} \,{\mathscr {L}}({{\textbf {m}}}^{k},{{\textbf {g}}}_1,{{\textbf {g}}}_2^{k-1},{{\textbf {e}}}^{k-1},\widehat{\varvec{\lambda }}_1^{k-1},\widehat{\varvec{\lambda }}_2^{k-1},{\widehat{\lambda }}_3^{k-1}), \end{aligned}$$
(2.8b)
$$\begin{aligned} {{\textbf {g}}}_2^{k}&= \arg \min _{{{\textbf {g}}}_2} \,{\mathscr {L}}({{\textbf {m}}}^{k},{{\textbf {g}}}_1^{k},{{\textbf {g}}}_2,{{\textbf {e}}}^{k-1},\widehat{\varvec{\lambda }}_1^{k-1},\widehat{\varvec{\lambda }}_2^{k-1},{\widehat{\lambda }}_3^{k-1}), \end{aligned}$$
(2.8c)
$$\begin{aligned} {{\textbf {e}}}^{k}&= \arg \min _{{{\textbf {e}}}} \,{\mathscr {L}}({{\textbf {m}}}^{k},{{\textbf {g}}}_1^{k},{{\textbf {g}}}_2^k,{{\textbf {e}}},\widehat{\varvec{\lambda }}_1^{k-1},\widehat{\varvec{\lambda }}_2^{k-1},{\widehat{\lambda }}_3^{k-1}), \end{aligned}$$
(2.8d)
$$\begin{aligned} \widehat{\varvec{\lambda }}_1^{k}&=\widehat{\varvec{\lambda }}_1^{k-1} - \mu _1 ({{\textbf {D}}}_1{{\textbf {m}}}^{k}-{{\textbf {g}}}_1^{k}-{{\textbf {g}}}_2^{k}), \end{aligned}$$
(2.8e)
$$\begin{aligned} \widehat{\varvec{\lambda }}_2^{k}&=\widehat{\varvec{\lambda }}_2^{k-1} - \mu _2({{\textbf {G}}}{{\textbf {m}}}^{k}+{{\textbf {e}}}^k-{{\textbf {d}}}). \end{aligned}$$
(2.8f)
$$\begin{aligned} {\widehat{\lambda }}_3^{k}&={\widehat{\lambda }}_3^{k-1} - \mu _3(\Vert {{\textbf {e}}}^k\Vert _2^2-\varepsilon )\,. \end{aligned}$$
(2.8g)

By combining the linear and quadratic terms in the augmented Lagrangian function (2.7) as

$$\begin{aligned} -\langle \widehat{\varvec{\lambda }}_i,{{\textbf {x}}}\rangle + \frac{\mu _i}{2}\Vert {{\textbf {x}}}\Vert _2^2= \frac{\mu _i}{2}\Vert {{\textbf {x}}}-\left( \frac{1}{\mu _i}\right) \widehat{\varvec{\lambda }}_i \Vert _2^2 - \frac{1}{2\mu _i}\Vert \widehat{\varvec{\lambda }}_i\Vert _2^2, \end{aligned}$$
(2.9)

and by making a change of variables \(\varvec{\lambda }_i=(1/\mu _i)\widehat{\varvec{\lambda }}_i, i=1,2,3\), the ADMM iteration in (2.8a)–(2.8g) can be written in the simpler scaled form

$$\begin{aligned} {{\textbf {m}}}^{k}&= \arg \min _{{{\textbf {m}}}} \left\{ \frac{\mu _1}{2} \Vert {{\textbf {D}}}_1{{\textbf {m}}}-{{\textbf {g}}}_1^{k-1}-{{\textbf {g}}}_2^{k-1} -\varvec{\lambda }_1^{k-1}\Vert _2^2 +\frac{\mu _2}{2}\Vert {{\textbf {Gm}}}+{{\textbf {e}}}^{k-1}-{{\textbf {d}}}-\varvec{\lambda }_2^{k-1}\Vert _2^2\right\} , \end{aligned}$$
(2.10a)
$$\begin{aligned} {{\textbf {g}}}_1^{k}&= \arg \min _{{{\textbf {g}}}_1} \left\{ \frac{\mu _1}{2} \Vert {{\textbf {D}}}_1{{\textbf {m}}}^k-{{\textbf {g}}}_1-{{\textbf {g}}}_2^{k-1} -\varvec{\lambda }_1^{k-1}\Vert _2^2+ \Vert {{\textbf {g}}}_1\Vert _1\right\} , \end{aligned}$$
(2.10b)
$$\begin{aligned} {{\textbf {g}}}_2^{k}&= \arg \min _{{{\textbf {g}}}_2} \left\{ \frac{\mu _1}{2} \Vert {{\textbf {D}}}_1{{\textbf {m}}}^k-{{\textbf {g}}}_1^{k}-{{\textbf {g}}}_2 -\varvec{\lambda }_1^{k-1}\Vert _2^2+\frac{\beta }{2}\Vert \bar{{{\textbf {D}}}}_1 {{\textbf {g}}}_2\Vert _2^2\right\} , \end{aligned}$$
(2.10c)
$$\begin{aligned} {{\textbf {e}}}^{k}&= \arg \min _{{{\textbf {e}}}} \left\{ \frac{\mu _2}{2}\Vert {{\textbf {Gm}}}^k+{{\textbf {e}}}-{{\textbf {d}}} -\varvec{\lambda }_2^{k-1}\Vert _2^2+\frac{\mu _3}{2}\left( \Vert {{\textbf {e}}}\Vert _2^2-\varepsilon - \lambda _3^{k-1}\right) ^2\right\} , \end{aligned}$$
(2.10d)
$$\begin{aligned} \varvec{\lambda }_1^{k}&=\varvec{\lambda }_1^{k-1} + {{\textbf {g}}}_1^{k}+{{\textbf {g}}}_2^{k}-{{\textbf {D}}}_1{{\textbf {m}}}^{k}, \end{aligned}$$
(2.10e)
$$\begin{aligned} \varvec{\lambda }_2^{k}&=\varvec{\lambda }_2^{k-1} + {{\textbf {d}}}- {{\textbf {e}}}^k-{{\textbf {G}}}{{\textbf {m}}}^{k}, \end{aligned}$$
(2.10f)
$$\begin{aligned} {\lambda }_3^{k}&={\lambda }_3^{k-1} + \varepsilon -\Vert {{\textbf {e}}}^k\Vert _2^2\,. \end{aligned}$$
(2.10g)

In the following, we explain how the subproblems (2.10a)–(2.10d) can be solved efficiently.

Regarding \({{\textbf {m}}}\), since the associated subproblem (2.10a) is a quadratic (Tikhonov-type) least squares problem, the optimality conditions for (2.10a) lead to

$$\begin{aligned} {{{\textbf {m}}}}^{k}= \left( \mu _1 {{{\textbf {D}}}}_1^T{{{\textbf {D}}}}_1 + \mu _2 {{{\textbf {G}}}}^T{{{\textbf {G}}}} \right) ^{-1}\!\!\! \left( \mu _1 {{{\textbf {D}}}}_1^T( {{{\textbf {g}}}}_1^{k-1}\!\! +\, {{{\textbf {g}}}}_2^{k-1}+\varvec{\lambda }_1^{k-1})\! + \mu _2{{{\textbf {G}}}}^T({{{\textbf {d}}}}-{{{\textbf {e}}}}^{k-1}\!\!+\varvec{\lambda }_2^{k-1})\right) \!.\quad \end{aligned}$$
(2.11)

Although \({{\textbf {m}}}^{k}\) has the closed form expression given above, in practice and in a large-scale setting, depending on the properties of \({{\textbf {G}}}\), it may be necessary to apply an iterative solver (such as CG or CGLS) to compute \({{\textbf {m}}}^{k}\); additional details about this will be provided in Sect. 4.
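As a minimal sketch of such an iterative solve, the normal equations in (2.11) can be handled with SciPy's CG through a matrix-free operator; the routine name update_m and the assumption that \({{\textbf {G}}}\) and \({{\textbf {D}}}_1\) are available as sparse matrices (or linear operators) are illustrative choices, not part of the original implementation.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def update_m(G, D1, g1, g2, lam1, d, e, lam2, mu1, mu2, m0=None):
    """Solve (mu1 D1^T D1 + mu2 G^T G) m = rhs, cf. (2.11), by CG."""
    n = D1.shape[1]
    rhs = mu1 * D1.T @ (g1 + g2 + lam1) + mu2 * G.T @ (d - e + lam2)
    A = LinearOperator((n, n),
                       matvec=lambda v: mu1 * (D1.T @ (D1 @ v))
                                        + mu2 * (G.T @ (G @ v)))
    m, _ = cg(A, rhs, x0=m0, maxiter=100)   # warm start with the previous iterate
    return m
```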

Regarding \({{\textbf {g}}}_1\), since solving the associated subproblem (2.10b) amounts, by definition, to evaluating the proximal operator of the \(\ell _1\)-norm penalty, its solution can be computed as follows

$$\begin{aligned} {{\textbf {g}}}_1^{k}= {\mathscr {T}}_{\frac{1}{\mu _1}}({{\textbf {D}}}_1{{\textbf {m}}}^{k}-{{\textbf {g}}}_2^{k-1}-\varvec{\lambda }_1^{k-1}), \end{aligned}$$
(2.12)

where the operator \({\mathscr {T}}_{\frac{1}{\mu _1}}\) is the soft-thresholding operator defined component-wise as:

$$\begin{aligned} {[}{\mathscr {T}}_{\frac{1}{\mu _1}}({{\textbf {x}}})]_i= x_i~\max \left( 1 - \frac{1}{\mu _1|x_i|},0\right) . \end{aligned}$$
(2.13)
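A minimal sketch of the update (2.12)–(2.13), written in the equivalent shrinkage form \(\mathrm{sign}(x_i)\max (|x_i|-1/\mu _1,0)\); the function name soft_threshold is an illustrative choice.

```python
import numpy as np

def soft_threshold(x, tau):
    """Component-wise soft-thresholding, equivalent to (2.13) with tau = 1/mu1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# g1-update (2.12), assuming D1, m, g2, lam1 and mu1 are available:
# g1 = soft_threshold(D1 @ m - g2 - lam1, 1.0 / mu1)
```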

Regarding \({{\textbf {g}}}_2\), the associated subproblem (2.10c) is again differentiable, and the optimality condition for (2.10c) leads to

$$\begin{aligned} {{\textbf {g}}}_2^{k}= \left( {{\textbf {I}}} + (\beta /\mu _1) \bar{{{\textbf {D}}}}_1^T\bar{{{\textbf {D}}}}_1\right) ^{-1}\left( {{{\textbf {D}}}_1}{{\textbf {m}}}^{k}-{{\textbf {g}}}_1^{k}-\varvec{\lambda }_1^{k-1}\right) . \end{aligned}$$
(2.14)

This is a first-order Tikhonov filter and can be calculated efficiently by different methods including those based on the fast Fourier transform or the discrete cosine transform; see [23] and the references therein.
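For moderate problem sizes, (2.14) can also be applied via a direct sparse factorization, as in the sketch below; the FFT/DCT-based filters mentioned above remain preferable for large 2D problems, and the helper name update_g2 is an illustrative choice.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def update_g2(D1bar, rhs, beta, mu1):
    """Solve (I + (beta/mu1) D1bar^T D1bar) g2 = rhs, cf. (2.14),
    with a direct sparse factorization."""
    n = D1bar.shape[1]
    A = sp.eye(n, format="csc") + (beta / mu1) * (D1bar.T @ D1bar)
    return splu(A.tocsc()).solve(rhs)
```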

Regarding \({{\textbf {e}}}\), the associated subproblem (2.10d) is differentiable but nonconvex, and the optimality condition of (2.10d) guarantees that \({{\textbf {e}}}^{k}\) satisfies

$$\begin{aligned} \left( 1 + \frac{2\mu _3}{\mu _2} (\Vert {{\textbf {e}}}^{k}\Vert _2^2-\varepsilon - \lambda _3^{k-1})\right) {{\textbf {e}}}^{k} = {{\textbf {d}}}-{{\textbf {Gm}}}^k+\varvec{\lambda }_2^{k-1}\,. \end{aligned}$$
(2.15)

The solution to this system of equations may not be unique. However, we claim that

$$\begin{aligned} {{\textbf {e}}}^{k}= \gamma _k({{\textbf {e}}}^{k}) ({{\textbf {d}}}-{{\textbf {Gm}}}^k+\varvec{\lambda }_2^{k-1})\,, \end{aligned}$$
(2.16)

where the scale parameter \(\gamma _k=\gamma _k({{\textbf {e}}}^{k})\) is the maximum real root of the depressed cubic equation (see [6])

$$\begin{aligned} \gamma ^3\,+\, \frac{\mu _2-2\mu _3(\varepsilon +\lambda _3^{k-1})}{2\mu _3 \Vert {{{\textbf {d}}}}-{{{\textbf {Gm}}}}^k+\varvec{\lambda }_2^{k-1}\Vert _2^2}\,\gamma \, - \,\frac{\mu _2}{2\mu _3 \Vert {{{\textbf {d}}}}-{{{\textbf {Gm}}}}^k+\varvec{\lambda }_2^{k-1}\Vert _2^2}=0\,, \end{aligned}$$
(2.17)

satisfies (2.15) and is also the global minimizer of (2.10d). Indeed, expression (2.16) comes from the fact that \({{\textbf {e}}}^{k}\) on the left-hand side of (2.15) is premultiplied by a real scalar depending on \(\Vert {{\textbf {e}}}^{k}\Vert _2^2\). Substituting \({{\textbf {e}}}^{k}\) as expressed in (2.16) into (2.15) leads to the determination of \(\gamma _k\) as a root of the depressed cubic equation (2.17). Furthermore, plugging \({{\textbf {e}}}^{k}\) from (2.16) into the objective function (2.10d) (here denoted by \(f(\gamma _k)\)), and letting

$$\begin{aligned} E_k=\Vert {{\textbf {d}}}-{{\textbf {Gm}}}^k+\varvec{\lambda }_2^{k-1}\Vert _2^2,\quad p_k=\frac{\mu _2-2\mu _3(\varepsilon +\lambda _3^{k-1})}{2\mu _3 E_k},\quad q_k=-\frac{\mu _2}{2\mu _3 E_k} \end{aligned}$$

to simplify the notation (so that (2.17) reads \(\gamma _k^3+p_k\gamma _k+q_k=0\)), we get that

$$\begin{aligned} f(\gamma _k)&=\frac{\mu _2}{2}\Vert {{\textbf {Gm}}}^k+\gamma _k ({{\textbf {d}}}-{{\textbf {Gm}}}^k +\varvec{\lambda }_2^{k-1}) -{{\textbf {d}}} -\varvec{\lambda }_2^{k-1}\Vert _2^2\nonumber \\&\quad +\frac{\mu _3}{2}\left( \Vert \gamma _k ({{\textbf {d}}}-{{\textbf {Gm}}}^k +\varvec{\lambda }_2^{k-1})\Vert _2^2-\varepsilon - \lambda _3^{k-1}\right) ^2\nonumber \\&= \frac{\mu _2 E_k}{2}(\gamma _k-1)^2 + \frac{\mu _3}{2}(E_k\gamma _k^2-\varepsilon - \lambda _3^{k-1})^2\nonumber \\&=\mu _3 E_k^2 \gamma _k \left( \frac{1}{2}\gamma _k^3 + \frac{\mu _2-2\mu _3(\varepsilon +\lambda _3^{k-1})}{2\mu _3 E_k}\gamma _k - \frac{\mu _2}{\mu _3 E_k} \right) \nonumber \\&\quad + \frac{1}{2}(\mu _2 E_k + \mu _3(\varepsilon + \lambda _3^{k-1})^2) \nonumber \\&=\mu _3 E_k^2 \gamma _k \left( -\frac{1}{2}\gamma _k^3 + \gamma _k^3 + p_k\gamma _k + q_k - \frac{\mu _2}{2\mu _3 E_k} \right) \nonumber \\&\quad + \frac{1}{2}(\mu _2 E_k + \mu _3(\varepsilon + \lambda _3^{k-1})^2) \nonumber \\&=-\mu _3 E_k^2 \gamma _k \left( \frac{1}{2}\gamma _k^3 + \frac{\mu _2}{2\mu _3 E_k} \right) + \frac{1}{2}(\mu _2 E_k + \mu _3(\varepsilon + \lambda _3^{k-1})^2)\,, \end{aligned}$$
(2.18)

where we have used the fact that \(\gamma _k\) solves (2.17). It follows that \(f(\gamma _k)\) is minimized when evaluated at the largest root of (2.17).
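A minimal sketch of the resulting e-update: given \({{\textbf {r}}}={{\textbf {d}}}-{{\textbf {Gm}}}^k+\varvec{\lambda }_2^{k-1}\), it forms the depressed cubic (2.17), picks its largest real root and returns \({{\textbf {e}}}^k\) as in (2.16); the routine name update_e is an illustrative choice.

```python
import numpy as np

def update_e(r, eps, lam3, mu2, mu3):
    """e-update via (2.16)-(2.17); r = d - G m^k + lam2^{k-1}."""
    E = np.dot(r, r)                                      # E_k = ||r||_2^2
    p = (mu2 - 2.0 * mu3 * (eps + lam3)) / (2.0 * mu3 * E)
    q = -mu2 / (2.0 * mu3 * E)
    roots = np.roots([1.0, 0.0, p, q])                    # gamma^3 + p*gamma + q = 0
    gamma = roots[np.abs(roots.imag) < 1e-8].real.max()   # largest real root
    return gamma * r
```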

3 Balancing parameter selection by robust statistics

The Tikhonov-TV regularized problem (1.3) introduced in Sect. 1 stems from the statistical assumption that the model gradient \({{\textbf {g}}}={{\textbf {D}}}_1{{\textbf {m}}}\) is a mixture of two components resulting from different distributions: a non-Gaussian distributed (sparse) component \({{\textbf {g}}}_1={{\textbf {D}}}_1{{\textbf {m}}}_1\) and a Gaussian distributed (non-sparse) component \({{\textbf {g}}}_2={{\textbf {D}}}_1{{\textbf {m}}}_2\). We determine the balancing parameter \(\beta \) by using statistical tools that allow us to optimally separate these two components of the gradient. Specifically, we assume that the non-zero entries of \({{\textbf {g}}}_1\), which are associated with jumps in the regularized solution, can be considered as anomalies (or outliers) in the gradient vector \({{\textbf {g}}}\). The lowest value attained by these anomalous entries can be determined by robust statistics [19].

According to the classical z-score statistic, an element of a data set is considered an anomaly if it falls outside a given distance (e.g., 2.5 standard deviations) from the mean. A smaller distance (e.g., 2.0 standard deviations) can be used if the model size is small and a larger distance (e.g., 3.0 or 3.5 standard deviations) can be used if the model size is large. Its value can therefore be set according to the user's preference; throughout this paper we denote this threshold distance by \({\tau _{\tiny {\text {nrm}}}}\) and, unless otherwise stated, we set its value to 2.5 for all numerical examples. A main difficulty in anomaly detection is that the z-score is sensitive to extreme values, because both the mean and the standard deviation are sensitive to extreme values.

The robust z-score [15] is based on robust estimators rather than on the mean and the standard deviation. A robust measure of the mean is the median. For a given vector \({{\textbf {g}}}\), the median of \({{\textbf {g}}}\), denoted by \(\text {median}({{\textbf {g}}})\), is the value such that at least half of the entries in \({{\textbf {g}}}\) are smaller than or equal to \(\text {median}({{\textbf {g}}})\), and at least half of the entries in \({{\textbf {g}}}\) are larger than or equal to \(\text {median}({{\textbf {g}}})\). Thus, if \({{\textbf {g}}}\) has an odd number of elements, then \(\text {median}({{\textbf {g}}})\) is the middle value of the sorted vector; otherwise it is taken as the average of the two middle values. A robust measure of the standard deviation is the median absolute deviation (MAD). For a given vector \({{\textbf {g}}}\), the MAD of \({{\textbf {g}}}\), denoted by \(\text {MAD}({{\textbf {g}}})\), is given by

$$\begin{aligned} \text {MAD}({{\textbf {g}}})=1.4826~ \text {median}(|{{\textbf {g}}} - \text {median}({{\textbf {g}}})|)\,, \end{aligned}$$
(3.1)

where the constant value 1.4826 is used to make the estimator consistent for Gaussian distributions; see [18]. Other alternatives to the MAD can also be used for estimation of the standard deviation; see, again, [18]. Using these robust measures of the mean and standard deviation, a robust z-score is assigned to each gradient sample, calculated as

$$\begin{aligned} z_i=\frac{[{{\textbf {g}}}]_i - \text {median}({{\textbf {g}}})}{\text {MAD}({{\textbf {g}}})}\,, \end{aligned}$$
(3.2)

and every element for which \(|z_i|\) is larger than the threshold value \({\tau _{\tiny {\text {nrm}}}}\) is considered an anomaly. Consequently, we define \(\text {nrm}({{\textbf {g}}})\) as the vector comprising the ‘normal’ (i.e., the ‘not-anomalous’) entries of \({{\textbf {g}}}\), i.e.,

$$\begin{aligned} \text {nrm}({{\textbf {g}}})=\{[{{\textbf {g}}}]_i:~ |z_i| \le {\tau _{\tiny {\text {nrm}}}}\}. \end{aligned}$$
(3.3)
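As an illustration, the following sketch computes the robust z-scores (3.2) from the median and the MAD (3.1) and splits a gradient vector into its ‘normal’ entries (3.3) and its anomalies; the function name normal_entries is an illustrative choice.

```python
import numpy as np

def normal_entries(g, tau_nrm=2.5):
    """Robust z-scores (3.2) from the median and the MAD (3.1);
    returns the 'normal' entries (3.3) and the anomalies of g."""
    med = np.median(g)
    mad = 1.4826 * np.median(np.abs(g - med))   # assumes MAD(g) > 0
    z = (g - med) / mad
    normal = g[np.abs(z) <= tau_nrm]
    anomalies = g[np.abs(z) > tau_nrm]
    return normal, anomalies
```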

According to the insight given at the beginning of this section, the optimal balancing parameter to be used in (1.3) should be such that the smooth component \({{\textbf {g}}}_2\) of the computed solution gradient \({{\textbf {g}}}\) is classified as ‘normal’ (3.3) according to the robust z-score (3.2). We impose this condition by defining \(\beta \) as a root of the equation

$$\begin{aligned} \phi (\beta ):=\Vert {{\textbf {g}}}_2(\beta )\Vert _{\infty }-\Vert \text {nrm}({{\textbf {g}}}(\beta ))\Vert _{\infty }=0\,. \end{aligned}$$
(3.4)

In the above equation, the notations \({{\textbf {g}}}_2(\beta )\) and \({{\textbf {g}}}(\beta )\) have been employed to highlight the dependence of the vectors \({{\textbf {g}}}_2\) and \({{\textbf {g}}}\) on \(\beta \); in the following, both these notations will be freely used depending on the context. In (3.4), the vectors \({{\textbf {g}}}_2\) and \(\text {nrm}({{\textbf {g}}})\) are measured in the max norm to enforce that the maximum entries of the smooth and normal components of \({{\textbf {g}}}\) coincide.

Proposition 3.1

The function \(\phi (\beta )\) defined in (3.4) has at least one positive root.

Proof

Once the ADMM iterations (2.10a)–(2.10g) have converged, we can write the computed \({{\textbf {g}}}_1\) and \({{\textbf {g}}}_2\) as

$$\begin{aligned} {{\textbf {g}}}_1= & {} {\mathscr {T}}_{\frac{1}{\mu _1}}({{\textbf {D}}}_1{{\textbf {m}}}-{{\textbf {g}}}_2-\varvec{\lambda }_1) \end{aligned}$$
(3.5)
$$\begin{aligned} {{\textbf {g}}}_2= & {} \left( {{\textbf {I}}} + (\beta /\mu _1) \bar{{{\textbf {D}}}}_1^T\bar{{{\textbf {D}}}}_1\right) ^{-1}\left( {{{\textbf {D}}}_1}{{\textbf {m}}}-{{\textbf {g}}}_1-\varvec{\lambda }_1\right) \end{aligned}$$
(3.6)

where the update rules (2.12) and (2.14) have been exploited. From the above equalities it can be clearly seen that, when \(\beta = 0\), \({{\textbf {g}}}_2(0)={{\textbf {D}}}_1{{\textbf {m}}}- {{\textbf {g}}}_1-\varvec{\lambda }_1\), so that

$$\begin{aligned} {{\textbf {g}}}_1 = {\mathscr {T}}_{\frac{1}{\mu _1}}({{\textbf {D}}}_1{{\textbf {m}}}-{{\textbf {g}}}_2-\mathbf {\lambda }_1)={\mathscr {T}}_{\frac{1}{\mu _1}}({{\textbf {g}}}_1)\,, \end{aligned}$$

which holds if and only if \({{\textbf {g}}}_1={\mathbf {0}}\). As a consequence, \({{\textbf {g}}}(0)={{\textbf {g}}}_2(0)\). After defining the diagonal matrix \({{\textbf {D}}}({{\textbf {g}}}_2)\) whose diagonal entries are

$$\begin{aligned}{}[{{\textbf {D}}}({{\textbf {g}}}_2)]_{i,i}={\left\{ \begin{array}{ll}1 &{} \text{ if } [{{\textbf {g}}}_2]_i\in \text {nrm}({{\textbf {g}}})\\ 0 &{} \text{ otherwise } \end{array}\right. }\,, \end{aligned}$$

and using standard norm inequalities, we get

$$\begin{aligned} \Vert \text {nrm}({{\textbf {g}}}(0))\Vert _{\infty }=\Vert {{\textbf {D}}}({{\textbf {g}}}_2){{\textbf {g}}}_2(0)\Vert _{\infty }\le \Vert {{\textbf {D}}}({{\textbf {g}}}_2)\Vert _{\infty }\Vert {{\textbf {g}}}_2(0)\Vert _{\infty } =\Vert {{\textbf {g}}}_2(0)\Vert _{\infty }\,, \end{aligned}$$

so that

$$\begin{aligned} \phi (0)=\Vert {{\textbf {g}}}_2(0)\Vert _{\infty }-\Vert \text {nrm}({{\textbf {g}}}_2(0))\Vert _{\infty }\ge 0\,. \end{aligned}$$

Let us denote by \({{\textbf {v}}}_i\) and \(\sigma _i\), \(i=1,\dots ,N\) the eigenvectors and nonnegative eigenvalues of \(\bar{{{\textbf {D}}}}_1^T\bar{{{\textbf {D}}}}_1\), respectively. It follows that \({{\textbf {g}}}_2\) can be expressed as

$$\begin{aligned} {{\textbf {g}}}_2=\sum _{i=1}^N\left( 1+\frac{\beta }{\mu _1}\sigma _i\right) ^{-1}({{\textbf {v}}}_i^T({{\textbf {D}}}_1{{\textbf {m}}}-{{\textbf {g}}}_1-\varvec{\lambda }_1)){{\textbf {v}}}_i, \end{aligned}$$

so that, when \(\beta \rightarrow +\infty \), \({{\textbf {g}}}_2(\beta )\rightarrow {\mathbf {0}}\) and \(\Vert {{\textbf {g}}}_2(\beta )\Vert _{\infty }\rightarrow 0\). As a consequence,

$$\begin{aligned} \phi (\beta )=\Vert {{\textbf {g}}}_2(\beta )\Vert _{\infty }-\Vert \text {nrm}({{\textbf {g}}}(\beta ))\Vert _{\infty }\rightarrow -\Vert \text {nrm}({{\textbf {g}}}(\beta ))\Vert _{\infty }\le 0\quad \text{ as }\quad \beta \rightarrow +\infty \,. \end{aligned}$$

The result follows from the continuity of \(\phi (\beta )\), applying the intermediate value theorem. \(\square \)

Under the reasonable assumption that \(\Vert \text {nrm}({{\textbf {g}}}(\beta ))\Vert _{\infty }\) is an increasing function of \(\beta \), it is possible to prove that \(\phi (\beta )\) defined in (3.4) is a strictly decreasing function of \(\beta >0\) and, therefore, the zero of \(\phi (\beta )\) is unique; see the arguments in Appendix A for more details.

In order to solve (3.4), we first reformulate it as an equivalent fixed point problem of the form

$$\begin{aligned} \beta \left( \Vert \text {nrm}({{\textbf {g}}}(\beta ))\Vert _{\infty } +\Vert {{\textbf {g}}}_2(\beta )\Vert _{\infty }\right) =\beta \left( 2\Vert {{\textbf {g}}}_2(\beta )\Vert _{\infty }\right) \,. \end{aligned}$$
(3.7)

Then, starting from an initial value \(\beta ^0>0\), we solve (3.7) via an iteration of the form

$$\begin{aligned} \beta ^{j+1}=\frac{1}{2}\beta ^j + \frac{1}{2}\underbrace{\left( \frac{4\Vert {{\textbf {g}}}_2(\beta ^j)\Vert _{\infty }}{\Vert {{\textbf {g}}}_2(\beta ^j)\Vert _{\infty }+\Vert \text {nrm}({{\textbf {g}}}(\beta ^j))\Vert _{\infty }}-1\right) \beta ^j}_{=:\psi (\beta ^j)},\quad j=0,1,\dots \,. \end{aligned}$$
(3.8)

Since the above update rule can be regarded as an averaged iteration algorithm and since, under mild assumptions, it can be proved that the function \(\psi (\beta )\) is nonexpansive (see Appendix A), thanks to the theory presented in [20] the iteration (3.8) is guaranteed to converge globally to a fixed point of (3.7) and, therefore, to a solution of (3.4).
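For reference, one step of the averaged iteration (3.8) can be coded as follows, given the current smooth gradient component \({{\textbf {g}}}_2(\beta )\) and the ‘normal’ entries \(\text {nrm}({{\textbf {g}}}(\beta ))\); the helper name update_beta is an illustrative choice.

```python
import numpy as np

def update_beta(beta, g2, g_normal):
    """One step of the averaged fixed-point iteration (3.8)."""
    a = np.max(np.abs(g2))        # ||g2(beta)||_inf
    b = np.max(np.abs(g_normal))  # ||nrm(g(beta))||_inf
    psi = (4.0 * a / (a + b) - 1.0) * beta
    return 0.5 * beta + 0.5 * psi
```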

Within the ADMM scheme (2.10a)–(2.10g) we dynamically tune the value of \(\beta \) according to rule (3.8), in such a way that condition (3.4) is (approximately) satisfied for the final regularized solution (i.e., at the end of the ADMM iterations). Namely, suppose that, at iteration k, we have access to the balancing parameter \(\beta ^{k-1}\) and to the quantities \({{\textbf {g}}}_1^{k-1}\), \({{\textbf {g}}}_2^{k-1}\), \({{\textbf {e}}}^{k-1}\), \(\varvec{\lambda }_1^{k-1}\), \(\varvec{\lambda }_2^{k-1}\). Then: \({{\textbf {m}}}^{k}\) is computed from \({{\textbf {g}}}_1^{k-1}\), \({{\textbf {g}}}_2^{k-1}\), \({{\textbf {e}}}^{k-1}\), \(\varvec{\lambda }_1^{k-1}\), \(\varvec{\lambda }_2^{k-1}\); \({{\textbf {g}}}_1^{k}\) is computed from \({{\textbf {m}}}^{k}\), \({{\textbf {g}}}_2^{k-1}\), \(\varvec{\lambda }_1^{k-1}\); \({{\textbf {g}}}_2^{k}\) is computed from \({{\textbf {m}}}^{k}\), \({{\textbf {g}}}_1^{k}\), \(\varvec{\lambda }_1^{k-1}\) and \(\beta ^{k-1}\). We set the value of \(\beta ^{k}\) by applying one iteration of the update rule (3.8), using the updated quantities available at iteration k (i.e., \({{\textbf {g}}}_2^k(\beta ^{k-1})\) and \({{\textbf {g}}}^k(\beta ^{k-1})\)). Although a convergence proof for the ADMM scheme (2.8a)–(2.8g) with \(\beta \) updated according to (3.8) is outside the scope of this paper, numerical experiments consistently show that both ADMM applied with the fixed value obtained when (3.8) has converged and ADMM applied with adaptive \(\beta \) selected by (3.8) converge to the same solution; see also Sect. 4. The resulting method is summarized in Algorithm 1.

Algorithm 1
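To make the overall scheme concrete, the following self-contained Python sketch runs Algorithm 1 on a toy 1D denoising problem (\({{\textbf {G}}}={{\textbf {I}}}\)), combining the updates (2.11), (2.12), (2.14), (2.16)–(2.17), the multiplier updates (2.10e)–(2.10g) and the \(\beta \)-update (3.8); the synthetic signal, the parameter values and the fixed iteration count are illustrative assumptions only.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Toy 1D denoising instance of Algorithm 1 (G = I); all values are illustrative.
rng = np.random.default_rng(0)
N = 400
t = np.linspace(0.0, 1.0, N)
m_true = np.sin(2 * np.pi * t) + (t > 0.5)            # smooth + blocky model
noise = 0.05 * rng.standard_normal(N)
d = m_true + noise
eps = np.dot(noise, noise)                            # eps = ||e||_2^2, assumed known

D1 = sp.spdiags([-np.ones(N), np.ones(N)], [0, 1], N - 1, N, format="csc")
D1bar = sp.spdiags([-np.ones(N - 1), np.ones(N - 1)], [0, 1], N - 2, N - 1, format="csc")
I_N = sp.eye(N, format="csc")

mu1, mu2, mu3, tau_nrm, beta = 10.0, 1.0, 1.0, 2.5, 1.0
g1 = np.zeros(N - 1); g2 = np.zeros(N - 1); lam1 = np.zeros(N - 1)
e = np.zeros(N); lam2 = np.zeros(N); lam3 = 0.0

for k in range(500):
    # m-update (2.11); with G = I the system is sparse and cheap to solve directly
    rhs = mu1 * D1.T @ (g1 + g2 + lam1) + mu2 * (d - e + lam2)
    m = spsolve((mu1 * D1.T @ D1 + mu2 * I_N).tocsc(), rhs)
    # g1-update (2.12): soft-thresholding
    v = D1 @ m - g2 - lam1
    g1 = np.sign(v) * np.maximum(np.abs(v) - 1.0 / mu1, 0.0)
    # g2-update (2.14): first-order Tikhonov filter
    g2 = spsolve((sp.eye(N - 1) + (beta / mu1) * D1bar.T @ D1bar).tocsc(),
                 D1 @ m - g1 - lam1)
    # e-update (2.16)-(2.17): largest real root of the depressed cubic
    r = d - m + lam2
    E = np.dot(r, r)
    p = (mu2 - 2.0 * mu3 * (eps + lam3)) / (2.0 * mu3 * E)
    q = -mu2 / (2.0 * mu3 * E)
    roots = np.roots([1.0, 0.0, p, q])
    gamma = roots[np.abs(roots.imag) < 1e-8].real.max()
    e = gamma * r
    # scaled multiplier updates (2.10e)-(2.10g)
    lam1 = lam1 + g1 + g2 - D1 @ m
    lam2 = lam2 + d - e - m
    lam3 = lam3 + eps - np.dot(e, e)
    # beta-update (3.8) via the robust z-score (assumes MAD(g) > 0)
    g = D1 @ m
    med = np.median(g)
    mad = 1.4826 * np.median(np.abs(g - med))
    nrm = g[np.abs((g - med) / mad) <= tau_nrm]
    a, b = np.max(np.abs(g2)), np.max(np.abs(nrm))
    beta = 0.5 * beta + 0.5 * (4.0 * a / (a + b) - 1.0) * beta
```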

4 Numerical experiments

In this section, we assess the performance of the proposed automated ADMM algorithm in solving inverse problems arising in different applications.

Concerning the inputs of Algorithm 1, for all the experiments we set \(\mu _1>\mu _2\) and \(\mu _2=\mu _3= 1\). Moreover, we assume that the value \(\varepsilon =\Vert {{\textbf {e}}}\Vert _2^2\) is known in all but one experiment, where we test the robustness of the proposed solver with respect to inaccuracies in the value of \(\varepsilon \); similarly, we take \({\tau _{\tiny {\text {nrm}}}}=2.5\) in all but one experiment, where we test the robustness of the proposed solver with respect to the choice of \({\tau _{\tiny {\text {nrm}}}}\). The iterations are stopped either when a maximum number of iterations is reached or when the computed \({{\textbf {m}}}\) stabilizes, i.e., \({\Vert {{{\textbf {m}}}}^{k+1}-{{{\textbf {m}}}}^k\Vert _2}/{\Vert {{{\textbf {m}}}}^k\Vert _2} <10^{{-4}}\).

We compare the results obtained by applying the new method to the results obtained applying TV regularization and Tikhonov regularization (with a regularization term of the form \(\Vert \bar{{{\textbf {D}}}}_1{{\textbf {D}}}_1{{\textbf {m}}}\Vert _2^2\)). Both TV and Tikhonov regularization are implemented through ADMM and can be regarded as special cases of Algorithm 1. Specifically, to recover the TV formulation, we set \({{\textbf {g}}}_2={\mathbf {0}}\), so that minimization in (2.6) happens on \({{\textbf {m}}}\), \({{\textbf {g}}}_1\) and \({{\textbf {e}}}\) only, and the first constraint reduces to \({{\textbf {D}}}_1{{\textbf {m}}}={{\textbf {g}}}_1\); as a consequence, the updates (2.11), (2.12), (2.15) and (2.10e)–(2.10g) are performed taking \({{\textbf {g}}}_2^{k-1}={{\textbf {g}}}_2^k={\mathbf {0}}\), and update (2.14) is discarded. Similarly, to recover the Tikhonov formulation, we set \({{\textbf {g}}}_1={\mathbf {0}}\), so that minimization in (2.6) happens on \({{\textbf {m}}}\), \({{\textbf {g}}}_2\) and \({{\textbf {e}}}\) only, and the first constraint reduces to \({{\textbf {D}}}_1{{\textbf {m}}}={{\textbf {g}}}_2\); as a consequence, the updates (2.11), (2.14), (2.15) and (2.10e)–(2.10g) are performed taking \({{\textbf {g}}}_1^{k-1}={{\textbf {g}}}_1^k={\mathbf {0}}\), and update (2.12) is discarded. To quantitatively evaluate the accuracy of each algorithm we compute the 2-norm relative error at the kth iteration, \(k=1,2,\dots \), defined as \({\Vert {{{\textbf {m}}}}^k-{{{\textbf {m}}}}\Vert _2}/{\Vert {{{\textbf {m}}}}\Vert _2}\).

4.1 Compressed sensing and denoising

Compressed sensing theory allows the recovery of a sparse signal from a small number of random projections [9]. In this subsection, we first show the performance of the new automated Tikhonov-TV regularization method in recovering 1D signals with different features, in the framework of compressed sensing. We consider four test signals, ranging from very smooth to very rough: these are shown in the top row of Fig. 1 (dashed red lines). The observation matrix \({{\textbf {G}}}\) consists of 1024 random column vectors of length 250, all drawn from a standard Gaussian distribution and then normalized. For each test signal, the data vector is generated as \({{\textbf {d}}}={{\textbf {Gm}}}+{{\textbf {e}}}\), where \({{\textbf {m}}}\) is the test signal and \({{\textbf {e}}}\) is Gaussian white noise (such that the ‘noise level’ \(\Vert {{\textbf {e}}}\Vert _{2}/\Vert {{\textbf {G}}}{{\textbf {m}}}\Vert _2\) is 0.1%). To evaluate the long-term behavior of Algorithm 1, we run the method for 500 iterations. The first row of Fig. 1 shows the reconstructed signals (in blue) overlaid by the ground truth (dashed, in red), while the nonsmooth and smooth components of the regularized solution are shown in the second and third rows, respectively. We can observe that nearly optimal reconstructions and decompositions are obtained for all of the signals. The model gradients of the recovered signals are shown in the last row of Fig. 1 (in blue), with the gradient of the recovered smooth component (dashed red curve) overlaid. In each frame of the bottom row of Fig. 1, the lower bound for the anomalous gradient entries, as determined by the z-score (3.2), (3.3), is shown by horizontal dashed lines. We can observe that, in all cases, the z-score determined meaningful bounds, which resulted in an optimal determination of \(\beta \). Fig. 2 shows the evolution of \(\beta \) and \(\phi (\beta )\) versus iteration count for all four signals, with different initial guesses \(\beta _0\). We can observe that formula (3.8) stably converges to the root of \(\phi \), irrespective of the initial value \(\beta _0\).

Fig. 1

Compressed sensing problem. First row: reconstructed signal (in blue) overlaid by the ground truth (in red). Second and third row: the blocky component \({{\textbf {m}}}_1\) and the smooth component \({{\textbf {m}}}_2\), respectively. Last row: the gradient of the reconstruction (in blue) overlaid by the gradient of the smooth component \({{\textbf {m}}}_2\) (in red); the horizontal dashed lines show the lower value determined for anomalous components

Fig. 2

Compressed sensing problem. Evolution of the balancing parameter \(\beta \) (top row) and \(\phi (\beta )\) (bottom row) versus iteration count, for different initial values \(\beta _0\) of \(\beta \), corresponding to the signals shown in Fig. 1

We then turn to a 2D image denoising example. The image to be denoised, displayed in Fig. 3a, is obtained by adding Gaussian white noise of noise level \(\Vert {{\textbf {e}}}\Vert _2/\Vert {{\textbf {m}}}\Vert _2=30\%\) to a clean synthetic image consisting of piecewise-smooth regions, displayed in Fig. 3b. We use the combined Tikhonov-TV method and, to evaluate its regularization performance, we compare it with the TV-only and Tikhonov-only methods (implemented as explained above). We run each method for 500 iterations (to evaluate the long-term behavior of Algorithm 1 even after the stopping criterion based on the stabilization of the solution is satisfied); the obtained results are depicted in Fig. 3c-e. As expected, the Tikhonov-TV method estimates a more accurate image by balancing the restoration of the sharp edges and of the smooth regions; instead, the TV-only and Tikhonov-only methods place more emphasis on the restoration of the sharp edges and of the smooth regions, respectively, and hence provide a suboptimal estimate of the noiseless image. The evolution of the discrepancy for all three methods is depicted in Fig. 4a. Also, Figs. 4b-c show the evolution of \(\beta \) and \(\phi (\beta )\) versus iteration count for the Tikhonov-TV method. For this test problem, we experimentally study the sensitivity of the new Algorithm 1 with respect to two algorithmic parameters that must be given in input, namely: the noise magnitude \(\varepsilon =\Vert {{\textbf {e}}}\Vert _2^2\) and the threshold \({\tau _{\tiny {\text {nrm}}}}\) for the normal entries of \({{\textbf {g}}}\) (3.3). Concerning the former, we consider cases where the true value of \(\varepsilon \) is over- and under-estimated by 5% and 10%. The results of these tests are displayed in Fig. 5, a-d. In particular, looking at frame (a), it is evident that, as the iterations proceed, the value of the squared discrepancy \(\Vert {{\textbf {G}}}{{\textbf {m}}}- {{\textbf {d}}}\Vert _2^2\) stabilizes around the prescribed value of \(\varepsilon \) (leading to over- and under-fitted data when \(\varepsilon \) is under- and over-estimated, respectively). Looking at frames (b)-(c) we can see that the update rule (3.8) for \(\beta \) converges to a zero of \(\phi (\beta )\) for all the considered values of \(\varepsilon \), although the values of \(\beta \) determined at the end of the iterations differ. The impact of an inaccurate value of \(\varepsilon \) on the quality of the solution is visible in frame (d): it is clear that, for this test problem, an under-estimation of \(\varepsilon \) (corresponding to an over-fitting of the data) still leads to results comparable to the case where an accurate value of \(\varepsilon \) is considered; naturally, the higher the error in the estimate of \(\varepsilon \), the lower the quality of the computed solution. In Fig. 5e we illustrate some of the entries of the recovered noise vector \({{\textbf {e}}}\) at the end of the ADMM iterations. The new ADMM formulation allows recovery of the random noise vector \({{\textbf {e}}}\) (through subproblem (2.8d)), alongside the inverse problem solution \({{\textbf {m}}}\) and its gradient components \({{\textbf {g}}}_1\) and \({{\textbf {g}}}_2\). We can clearly see that, for this test problem, the computed approximation of \({{\textbf {e}}}\) accurately reproduces the behavior of the unknown noise corrupting the original image.
Concerning the sensitivity of Algorithm 1 with respect to the value of \({\tau _{\tiny {\text {nrm}}}}\), we consider the default value \({\tau _{\tiny {\text {nrm}}}}=2.5\) together with \({\tau _{\tiny {\text {nrm}}}}=2\) and \({\tau _{\tiny {\text {nrm}}}}=3\); the behavior of the solver for these tests is displayed in Fig. 6. Looking at frame (a) we can clearly see that a higher value of \({\tau _{\tiny {\text {nrm}}}}\) results in a lower value of the computed \(\beta \) at the end of the iterations of Algorithm 1: this is expected, as a higher \({\tau _{\tiny {\text {nrm}}}}\) implies that more entries of \({{\textbf {g}}}\) are regarded as ‘normal’ (see Eq. (3.3)), which is achieved by imposing less penalization on the smoothness-enforcing term in (2.6). Similarly, a lower \({\tau _{\tiny {\text {nrm}}}}\) results in a higher value of \(\beta \), i.e., fewer entries of \({{\textbf {g}}}\) are regarded as ‘normal’ and the smoothness-enforcing term in (2.6) is penalized more. Looking at frame (b) we can see that such variations in \({\tau _{\tiny {\text {nrm}}}}\) do not impact the quality of the solution computed by Algorithm 1. Although not reported, a similar behavior is observed when testing a wider range of \({\tau _{\tiny {\text {nrm}}}}\) values. Finally, we experimentally assess the convergence properties of the new ADMM scheme (2.8a)–(2.8g) applied with the iteration-dependent choice of the balancing parameter \(\beta \) according to (3.8), as well as with the fixed value \(\beta =\beta _{500}\), i.e., the value the fixed point iteration (3.8) converged to after 500 iterations. Fig. 7 shows the residual and error values versus iteration number: we can clearly see that, although the two instances of ADMM display some discrepancies during approximately the first 100 iterations, they eventually converge to the same value. Although not reported, this behavior is observed in all the considered test problems.

Fig. 3

Image denoising example. a image degraded by Gaussian noise (noise level 30%); b noise-free image; c estimated image by the proposed method (error = \(1.98\times 10^{-4}\)); d estimated image by the TV method (error = \(3.18\times 10^{-4}\)); e estimated image by the Tikhonov method (error = \(4.08\times 10^{-4}\))

Fig. 4

Image denoising example. Evolution of a discrepancy, b \(\beta \) and c \(\phi (\beta )\) versus iteration for the proposed method. The asterisk highlights the quantities computed at the 135th iteration, i.e., when the stopping criterion on the stabilization of the solution is satisfied

Fig. 5

Image denoising example. Sensitivity of Algorithm 1 with respect to over- and under-estimation of \(\varepsilon \). a-d evolution of the a discrepancy, b \(\beta \), c \(\phi (\beta )\), and d 2-norm of the estimated model error. Frame e shows the approximation of entries 166 to 346 (at column 256) of the noise vector \({{\textbf {e}}}\) at the 500th (final) ADMM iteration. In this frame, the black line shows the true noise

Fig. 6

Image denoising example. Evolution of a \(\beta \) and b 2-norm relative error versus iteration for the proposed method for different values of the threshold \(\tau _{\tiny {\text {nrm}}}\) in (3.3)

Fig. 7

Image denoising example. Evolution of a discrepancy and b 2-norm of the estimated model error for Algorithm 1 with iteration-dependent choice of \(\beta \) and for the ADMM method (2.8a)–(2.8f) with fixed \(\beta =\beta _{500}\)

4.2 Inverse problems in geophysics

We apply the new automated Tikhonov-TV regularization method for subsurface interval-velocity estimation from root-mean-square (RMS) velocities. For a horizontally layered earth model, the RMS velocity V(t) is a continuous function of time defined as

$$\begin{aligned} V(t) = \sqrt{\frac{1}{t}\int _{0}^t [v(\tau )]^2d\tau }, \end{aligned}$$
(4.1)

where v(t) is the instantaneous/interval velocity function and t is the wave propagation time; see [8]. In practice, the velocity analysis of common-depth-point (CDP) gathers gives an estimate of the RMS velocity; when both the RMS velocity V(t) and the interval velocity v(t) are needed, the estimation of the latter from the RMS velocity is a severely ill-conditioned problem, and thus proper regularization is required to stabilize the solution. Upon discretization, Eq. (4.1) reads

$$\begin{aligned} V_i = \sqrt{\frac{1}{i}\sum _{j=1}^i v^2_j}, \end{aligned}$$
(4.2)

where \(i=1,\ldots ,N\) is the sample number and N is the number of earth layers. Squaring both sides of Eq. (4.2) results in a system of linear equations of the form \({{\textbf {d}}}={{\textbf {G}}}{{\textbf {m}}}\), where \(d_i=iV_i^2\), \(m_j=v_j^2\) and the discrete forward operator \({{\textbf {G}}}\) is a causal integration matrix, i.e., a lower triangular matrix of ones

$$\begin{aligned} {{\textbf {G}}}= \begin{pmatrix} 1 &{} &{} &{} &{} \\ 1 &{} 1 &{} &{} &{} \\ 1 &{} 1 &{} 1 &{} &{} \\ \vdots &{}\vdots &{} \ddots &{} \ddots &{} \\ 1 &{} 1 &{} \cdots &{} 1&{} 1 \end{pmatrix}\in {\mathbb {R}}^{N\times N}. \end{aligned}$$
(4.3)

We simulate a 1D RMS velocity vector from a velocity log of the 2004 BP model [4]. The interval and RMS velocities are shown in Figs. 8a and b, respectively. In order to make the simulation more realistic, only 25% of the RMS velocity elements were used as input to the inversion, mimicking real situations in which one picks RMS velocities only at the positions of strong reflections. This leads to a compressed sensing problem for the Dix inversion [12]. For this example, the forward operator is \(\varPhi {{\textbf {G}}}\), where \(\varPhi \) is a 382 by 1911 sampling matrix (made up of 382 rows of an identity matrix of size 1911, picked at the locations of the velocity model with strong seismic response/reflectivity) and \({{\textbf {G}}}\) is defined as in (4.3) and has size \(1911\times 1911\). The estimated interval velocity obtained by applying 100 iterations of Algorithm 1 is shown in Fig. 8c; we can clearly observe that the interval velocities are estimated accurately.
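For concreteness, a small sketch of how the forward operator of this 1D example can be assembled; the random choice of the picked indices is an illustrative placeholder for the reflection-based picking described above.

```python
import numpy as np

N = 1911
G = np.tril(np.ones((N, N)))                     # causal integration matrix (4.3)

# hypothetical picked indices; the paper picks them at strong reflections
picks = np.sort(np.random.choice(N, size=382, replace=False))
Phi = np.eye(N)[picks, :]                        # 382-by-1911 sampling matrix
A = Phi @ G                                      # forward operator of the 1D example

# data and model as in Sect. 4.2: d_i = i * V_i^2 (picked entries), m_j = v_j^2
```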

We then consider a 2D Dix inversion example. The full RMS velocity field is shown in Fig. 9a. We used 30% of the traces (vertical lines, picked at random; shown in Fig. 9b) as the input to the inversion. In this case, the size of the RMS velocity field is 478 \(\times \) 1349 pixels and the forward operator is of the form \(\varPhi \otimes {{\textbf {G}}}\), where \(\varPhi \) is a sampling matrix of size \(404\times 1349\) (made up of 404 rows of an identity matrix of size 1349, picked at random) and \({{\textbf {G}}}\) is defined as in (4.3) and has size \(478\times 478\). The estimated full interval velocity field obtained by applying 100 iterations of Algorithm 1 is shown in Fig. 9c; also for this test problem we can clearly see that the interval velocity model is reconstructed accurately. Fig. 10 shows the evolution of \(\beta \) and \(\phi (\beta )\) versus iteration for the considered 1D and 2D Dix inversion examples.

Fig. 8

One-dimensional Dix velocity inversion. a True interval velocity. b Decimated noisy RMS velocity. c Estimated interval velocity

Fig. 9

Two-dimensional Dix velocity inversion. a RMS velocity field corresponding to the 2004 BP model. b Input RMS field from velocity scan using 30% of CDP gathers selected at random. c Interval velocity field obtained by inversion using the proposed method

Fig. 10

Dix velocity inversion. Left frame: values of the balancing parameter \(\beta \) versus iteration number. Right frame: values of \(\phi (\beta )\) versus iteration number. Solid curves correspond to the 1D case, while dashed curves correspond to the 2D case. The asterisk highlights the iteration at which the stopping criterion on the stabilization of the solution is satisfied

Finally, again using two datasets that are popular in the seismic community, we assess the performance of our algorithm when used for image decomposition. Fig. 11a-b, top row, show the 2004 BP velocity model [4] and the 2007 BP velocity model [22], respectively; the velocity variation is a piecewise smooth function of space for both these models. We apply 500 iterations of the Tikhonov-TV regularization method to decompose each velocity model into smooth and nonsmooth components: the results are displayed in Fig. 11, in the middle and bottom rows, respectively. We can clearly observe that the two components \({{\textbf {m}}}_1\) and \({{\textbf {m}}}_2\) are optimally separated and recover complementary parts of the original velocity model. Fig. 12 shows the evolution of the values of \(\beta \) and \(\phi (\beta )\) versus iteration for these two test problems. We can observe that, in both cases, the simple update scheme in (3.8) stably converges to a root of \(\phi (\beta )\), leading to an optimal separation between the smooth and nonsmooth components of the model \({{\textbf {m}}}\).

Fig. 11

Model decomposition into smooth and nonsmooth components. Column a: the 2004 BP velocity model. Column b: the 2007 BP velocity model. Top row: the original models \({{\textbf {m}}}\). Middle row: the nonsmooth component \({{\textbf {m}}}_1\). Bottom row: the smooth component \({{\textbf {m}}}_2\)

Fig. 12

Model decomposition into smooth and nonsmooth components. Left frame: values of the balancing parameter \(\beta \) versus iteration number. Right frame: values of \(\phi (\beta )\) versus iteration number. The asterisk highlights the iteration at which the stopping criterion on the stabilization of the solution is satisfied

4.3 X-ray tomography

We first consider a parallel X-ray computed tomography test problem. The unknown \({{\textbf {m}}}\) to be reconstructed is the Shepp-Logan phantom of size \(320\times 320\) pixels. The original phantom is shown in the leftmost frame of Fig. 13. We consider a full projection setting with 453 parallel rays for each of the 90 equispaced angles in the range \([-90^\circ ,+90^\circ ]\). This results in a discrete forward operator \(\mathbf{G}\) of size \(40770 \times 102400\). Gaussian white noise of level \(1\%\) is added to the data. During the inversion stage, at the kth ADMM iteration, the linear system (2.11) for \({{\textbf {m}}}\) is solved using the CG method. The CG iterations are stopped when the relative residual tolerance \(10^{-7}\) is reached, or when the maximum number of 100 iterations is reached; the solution estimated at the \((k-1)\)th ADMM iteration is used as the initial guess for CG at the current iteration. This results in a dynamic number of inner CG iterations, as shown in Fig. 15. The behavior of other relevant quantities is displayed in Fig. 14. In particular, we can observe that, for this test problem, whose solution solely contains piecewise constant features, there are almost no differences in the performance of the Tikhonov-TV and TV regularization methods.

Fig. 13

X-ray tomography test problem. From left to right: true image (320 \(\times \) 320 pixels), estimation by the proposed balanced Tikhonov-TV method, estimation by the TV method, and estimation by the Tikhonov method. The numbers in parentheses report the relative error of each estimate

Fig. 14

X-ray tomography test problem. Evolution of the a relative errors and b squared discrepancy versus iteration for the methods considered in Fig. 13. Evolution of c \(\beta \) and d \(\phi (\beta )\) for the Tikhonov-TV method. The asterisk highlights the iteration at which the stopping criterion on the stabilization of the solution is satisfied

Fig. 15

X-ray tomography test problem. The number of CG iterations performed at each ADMM iteration

We then take a phantom containing both piecewise constant and smooth features, and we resize it to \(128\times 128\) pixels; the resulting phantom is shown in the leftmost frame of Fig. 16. We consider a setting where projections along parallel X-rays can be performed only over a limited range of angles: specifically, we consider 181 parallel rays for each of the 85 equispaced angles in \([-42^{\circ }, +42^{\circ }]\). This results in a discrete forward operator \({{\textbf {G}}}\) of size \(15385\times 16384\). Gaussian white noise of level \(0.1\%\) is added to the data. For this particular test problem, ADMM applied to the Tikhonov-TV and TV-only formulations displays very slow convergence (this is visible in Fig. 17b, where the only method whose squared residual approximately equals \(\varepsilon \) within 600 iterations is Tikhonov regularization). Despite this, the reconstruction computed by the Tikhonov-TV method achieves the lowest relative error among the considered methods, with a good resolution of the background and of the piecewise constant and smooth features: this is clearly visible in Fig. 16. Figure 17 reports the progress of other relevant quantities versus the iteration count.

Fig. 16

X-ray tomography test problem, with limited angles. Exact phantom along with the reconstructions achieved after 600 iterations of each method. The numbers in brackets are the 2-norm relative errors associated with each reconstruction. To better highlight differences, the square root of the modulus of each pixel is displayed

Fig. 17

X-ray tomography test problem, with limited angles. Frame a: 2-norm relative errors versus iteration number for different methods. Frame b: squared 2-norm discrepancies versus iteration number for different methods. Frame c: values of \(\beta \) versus iteration number for Tikhonov-TV. Frame d: values of \(\phi (\beta )\) versus iteration number for Tikhonov-TV

5 Conclusions

In this paper, we proposed a method for selecting the optimal balancing parameter in Tikhonov-TV regularization for the solution of discrete ill-posed inverse problems with piecewise smooth solutions. Namely, the solution of the inverse problem is split into the sum of a smooth and a piecewise constant component, which are separately regularized by a Tikhonov and a TV term that must be balanced. We have used robust statistical methods to determine the optimal balance of these terms, motivated by the fact that the gradient entries of the piecewise constant component at jump locations can be considered as anomalies/outliers in comparison to the other gradient entries constituting the smooth background. This led to the characterization of the best balancing parameter as a root of a scalar function. Finally, an extremely simple update scheme was proposed for determining the balancing parameter, which can be naturally coupled with ADMM to solve the Tikhonov-TV regularized problem. Extensive numerical experiments on different inverse problems demonstrate that high-quality reconstructions can be obtained by applying the proposed algorithm, and validate the robustness of the proposed balancing parameter selection strategy.