1 Introduction

We are interested in the solution of an inverse problem arising in geophysics. We wish to determine some properties of the ground, such as the electrical conductivity and/or the magnetic permeability distributions, by non-invasive techniques. This can be achieved by inverting FDEM data. For simplicity of notation, we assume that the magnetic permeability of the ground is known and constant. We consider a two-dimensional discretization of the problem, that is, we are interested in a vertical section of the soil. The extension to unknown magnetic permeability and three-dimensional ground is straightforward and will not be investigated here. This geophysical application is described by a nonlinear model. Once discretized, the problem can be expressed as

$$\begin{aligned} \min _\mathbf {\Sigma }\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2, \end{aligned}$$
(1)

where \(\mathbf {\Sigma }\in \mathbb {R}^{n\times m}\) collects the values of electrical conductivity in m sites at n depths, \(\textbf{B}\in \mathbb {R}^{s\times m}\) contains the s measured data, obtained with different configurations of the FDEM device, in m sites, \(M:\mathbb {R}^{n \times m}\rightarrow \mathbb {R}^{s \times m}\) is a nonlinear function that maps the electrical conductivity to the measured data, and \(\left\| \cdot \right\| _F\) denotes the Frobenius norm. The data matrix \(\textbf{B}\) is usually corrupted by some measurement errors that we assume can be well approximated by white Gaussian noise. We will also assume that M is Fréchet differentiable and \(s<n\). In this application, since \(s<n\) and because of the ill-conditioning of the Jacobian of M, the problem is extremely ill-conditioned, i.e., it is very sensitive to perturbations in the data. Therefore, we need to regularize the problem.

A regularization approach that has gathered a lot of attention in the past few years is the \(\ell ^p\)-\(\ell ^q\) regularization [8, 12, 31, 33]. A Matlab toolbox for the solution of the \(\ell ^p\)-\(\ell ^q\) minimization for linear problems has been recently proposed in [11] and a Python implementation is included in [40]. This approach has been successfully applied to this problem in [5]. In this work we expand on the work in [5]. In particular, we propose an Alternating Directions Multiplier Method (ADMM) for the inversion of the problem, we prove the convergence of the resulting method, and we propose an automated strategy to determine the parameters involved in the method.

Following [5], we propose to compute a regularized solution of (1) by solving the following minimization problem

$$\begin{aligned} \min _\mathbf {\Sigma }\frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+\frac{\mu }{q}\left\| L(\mathbf {\Sigma })\right\| _q^q,\quad 0<q\le 2, \end{aligned}$$
(2)

where \(\mu >0\), \(L:\mathbb {R}^{n\times m}\rightarrow \mathbb {R}^p\) is a linear operator, and the \(\ell ^q\)-norm is \(\left\| \mathbf {\Sigma }\right\| _q^q=\sum _{i=1}^{n}\sum _{j=1}^{m}\left| \left( \mathbf {\Sigma }\right) _{i,j}\right| ^q\). If \(1\le q\le 2\), then \(\left\| \cdot \right\| _q\) is a norm; otherwise, it is not a norm since it does not satisfy the triangle inequality. However, with a slight abuse of notation, we will refer to \(\left\| \cdot \right\| _q\) as the \(\ell ^q\)-norm regardless of the value of q. To solve the minimization problem (2) we construct an ADMM that decouples the \(\ell ^2\) part from the \(\ell ^q\) part of the functional. At each iteration of ADMM, two minimization problems are solved. The first is a nonlinear least-squares problem, while the second is an \(\ell ^2\)-\(\ell ^q\) minimization problem in which all the operators involved are linear. The former is solved efficiently using the algorithm proposed in [6], while for the latter we use an adaptation of the Majorization-Minimization (MM) algorithm proposed in [31]. The difference between the \(\ell ^2\)-\(\ell ^q\) minimization procedure proposed here and the one in [31] is that, since we select L with a favorable structure, we do not need to project the problem onto generalized Krylov subspaces; see below.

Here, our purpose is to reconstruct a 2D representation of the electrical conductivity. The most common device used to collect FDEM measurements is the Ground Conductivity Meter (GCM). Its principle of operation is based on an alternating electrical current which flows through a small electric wire coil (the transmitter) and a second coil (the receiver) positioned at a fixed distance from the first one. The transmitter sends electromagnetic waves into the subsoil generating a primary electromagnetic field \(H_{\textrm{P}}\). This primary field induces small eddy currents within the ground, which in turn generate a secondary magnetic field \(H_\textrm{S}\) that propagates back to the surface and the air above. The ratio between the two magnetic fields is recorded by the receiver as the apparent conductivity of the soil. The two coil axes may be aligned either vertically or horizontally with respect to the subsurface. The measurements are complex data and depend on the instrument configuration, like the frequency of the device, the inter-coil distance, the height of the instrument above the ground, or the orientation of the coils.

A brief explanation of the model which describes the interaction between the soil and the GCM is provided in the next section. This nonlinear FDEM model has been developed in [47, 48] and solved in several papers for different device configurations and using different techniques; see [3, 14, 17,18,19, 23, 29, 41, 46]. The Matlab package FDEMtools including a graphical user interface (GUI) for the inversion procedure has been introduced in [15]. More recently, an updated version of this Matlab package together with a GUI for the forward model has been implemented in [16].

In most of the papers cited above, only one-dimensional inversions have been considered and, in the case of two-dimensional reconstructions, the result is obtained by juxtaposing one-dimensional approximate solutions. However, this approach results in a “spliced” two-dimensional reconstruction since the horizontal information is not considered and no horizontal continuity of the computed two-dimensional solution is required. As mentioned above, the variational model (2) was proposed in [5]. The novelty of that approach is that the one-dimensional reconstructions are coupled along the horizontal axis, ensuring the continuity of the two-dimensional solution. This allowed the authors to obtain very accurate reconstructions of the electrical conductivity of the ground and to avoid the so-called “splicing”. In [5] the authors provided a theoretical analysis of (2) and showed that it induces a regularization method; see [5, 24] for more details. The authors proposed a minimization algorithm that, however, was extremely computationally demanding and required the careful tuning of several parameters.

Our purpose here is to propose a computationally cheaper algorithm to solve (2) that requires minimal tuning of parameters and that can be used in a plug-and-play fashion. We construct the method and show that, under reasonable theoretical hypotheses, it converges to a stationary point of the problem.

Finally, we would like to stress that when we deal with small values of the conductivity, a linear model is available [38]. This model has been treated in [4, 25, 45] and later in [20] in order to obtain an optimized numerical approach. More recently, in [21, 22] the inversion problem is solved in a reproducing kernel Hilbert space. The method proposed in this paper can be easily extended to this case. However, we do not dwell on the linearized model here.

The main contributions of this paper are the following. Firstly, we show the convergence of the ADMM in the nonlinear and nonconvex case, adapting the proof in [30]. We consider different assumptions for the convergence that are suitable for our application. In particular, we assume that the norm of the iterates is bounded rather than requiring that the nonlinear function is Lipschitz differentiable. Secondly, to automatically select the regularization parameter, we exploit the Residual Whiteness Principle; see [37, 42]. To the best of our knowledge, this criterion for the determination of the regularization parameter has so far been used exclusively in the linear case and has never been applied to nonlinear problems. Finally, we propose an automatic rule for the selection of the regularization parameter that exploits the statistical properties of the noise. Rules of this kind have never been considered for the inversion of FDEM data.

This paper is structured as follows. Section 2 presents and briefly describes the nonlinear FDEM model. In Sect. 3, we propose our new numerical approach and we demonstrate its convergence in Sect. 4. Two variations of the developed ADMM are presented in Sect. 5: one introduces a nonnegativity constraint, while the other provides a cheaper way to determine \(\mu \) in (2). Section 6 reports some numerical experiments that illustrate the effectiveness of our method, and concluding remarks can be found in Sect. 7.

2 The Nonlinear FDEM Model

In this section, we briefly introduce the nonlinear FDEM forward model. It is composed of two Fredholm integral equations of the first kind and describes the interaction between the soil and the FDEM induction device. The predicted data are functions of the electrical conductivity and of the magnetic permeability.

The model assumes a layered soil structure with n layers, each one characterized by an electrical conductivity \(\sigma _k\) (measured in Siemens per meter) and a magnetic permeability \(\mu _k\) (measured in Henry per meter), for \(k=1,\ldots ,n\). Each layer has a thickness \(d_k\), measured in meters, with \(d_n\) assumed to be infinite.

Taking into account the interaction between the soil and the GCM, the nonlinear FDEM model reads as follows

$$\begin{aligned} M_\nu (\varvec{\sigma },\varvec{\mu };h,\omega ,r) = -r^{3-\nu } \int _0^\infty \lambda ^{2-\nu }e^{-2h\lambda } R_{\omega ,0}(\lambda ) J_\nu (r\lambda ) d\lambda , \qquad \nu =0,1, \end{aligned}$$
(3)

where \(\nu \in \{0,1\}\) represents the orientation of the coils, i.e., horizontal and vertical, respectively, \(\varvec{\sigma }=(\sigma _1,\ldots ,\sigma _n)^T\) is the electrical conductivity vector, \(\varvec{\mu }=(\mu _1,\ldots ,\mu _n)^T\) is the magnetic permeability vector, where the superscript \(^T\) denotes transposition, h represents the height above the ground at which the measurements are taken, \(\omega \) stands for the angular frequency of the electromagnetic wave generated by the device, r is the distance between the coils, and \(J_0\) and \(J_1\) are Bessel functions of the first kind of order 0 and 1, respectively. The reflection factor \(R_{\omega ,0}(\lambda )\) is defined by

$$\begin{aligned} R_{\omega ,0}(\lambda ) = \frac{N_0(\lambda ) - Y_1(\lambda )}{N_0(\lambda ) + Y_1(\lambda )}, \end{aligned}$$

where \(N_0(\lambda )=\lambda /({\textrm{i}}\mu _0\omega )\), with \({\textrm{i}}\) the imaginary unit, \(\mu _0=4\pi \cdot 10^{-7}\) H/m (Henry per meter) the vacuum magnetic permeability, and \(Y_1(\lambda )\) is computed by the back-recursion

$$\begin{aligned} Y_k(\lambda ) = N_k(\lambda )\frac{Y_{k+1}(\lambda )+N_k(\lambda )\tanh (d_k u_k(\lambda ))}{N_k(\lambda ) + Y_{k+1}(\lambda )\tanh (d_k u_k(\lambda ))}, \quad k=n-1,\ldots ,1, \end{aligned}$$

where \( N_k(\lambda ) = u_k(\lambda )/({\textrm{i}}\mu _k\omega )\), \(u_k(\lambda )=\sqrt{\lambda ^2 + {\textrm{i}}\sigma _k\mu _k\omega }\) is the propagation constant, and the recursion is initialized by \(Y_n(\lambda )=N_n(\lambda )\). For more details about the nonlinear FDEM model see [14, 16, 19, 29].
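To make the recursion and the integral in (3) concrete, the following is a minimal illustrative sketch in Python. The truncated trapezoidal quadrature (and the values of lam_max and n_quad) is only a naive stand-in for the fast Hankel-transform techniques typically used in practice, and all numerical values in the example call are illustrative.

```python
import numpy as np
from scipy.special import jv            # Bessel functions of the first kind J_nu
from scipy.integrate import trapezoid

MU0 = 4 * np.pi * 1e-7                  # vacuum magnetic permeability [H/m]

def reflection_factor(lam, sigma, mu, d, omega):
    """R_{omega,0}(lambda) via the backward recursion Y_n, ..., Y_1."""
    n = len(sigma)
    u = np.sqrt(lam**2 + 1j * sigma * mu * omega)   # propagation constants u_k
    N = u / (1j * mu * omega)                        # N_k = u_k / (i mu_k omega)
    Y = N[n - 1]                                     # Y_n = N_n
    for k in range(n - 2, -1, -1):                   # k = n-1, ..., 1 (0-based indexing)
        t = np.tanh(d[k] * u[k])
        Y = N[k] * (Y + N[k] * t) / (N[k] + Y * t)
    N0 = lam / (1j * MU0 * omega)
    return (N0 - Y) / (N0 + Y)

def forward_model(nu, sigma, mu, d, h, omega, r, lam_max=20.0, n_quad=2000):
    """M_nu(sigma, mu; h, omega, r) by naive trapezoidal quadrature of (3)."""
    lam = np.linspace(1e-8, lam_max, n_quad)
    R = np.array([reflection_factor(l, sigma, mu, d, omega) for l in lam])
    integrand = lam**(2 - nu) * np.exp(-2 * h * lam) * R * jv(nu, r * lam)
    return -r**(3 - nu) * trapezoid(integrand, lam)

# Example: a three-layer soil (d_3 is infinite and does not enter the recursion)
sigma = np.array([0.05, 0.2, 0.1])      # conductivities [S/m]
mu = np.full(3, MU0)                    # free-space permeability
d = np.array([1.0, 2.0])                # thicknesses of the first n-1 layers [m]
print(forward_model(nu=1, sigma=sigma, mu=mu, d=d, h=1.0, omega=2*np.pi*1e4, r=1.66))
```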

It is possible to obtain simultaneous measurements with different configurations of the GCM device. For instance, with different inter-coil distances or different operating frequencies at different heights. To represent all the measurements, the vectors containing the loop-loop distances, the angular frequencies, and the heights at which the readings were taken are denoted, respectively, by \(\textbf{r}=(r_1,\ldots ,r_{s_r})^T\), \(\varvec{\omega }=(\omega _1,\ldots ,\omega _{s_\omega })^T\), and \(\textbf{h}=(h_1,\ldots ,h_{s_h})^T\). In this way, if we consider both orientations of the device, we have \(s=2 s_r s_\omega s_h\) measurements, arranged in a vector \(\textbf{b}\in \mathbb {R}^s\).

From now on, we will assume that the distribution of the magnetic permeability is the one of free space, i.e., \(\mu _k=\mu _0\), for \(k=1,\ldots ,n\), so that the measurements are sensitive only to the electrical conductivity values \(\sigma _k\). Of course, the same numerical procedure introduced in this paper can be applied to the inversion of the magnetic permeability when the electrical conductivity is negligible; see [14].

Finally, we assume in (1) that m equispaced measurement sets \(\textbf{b}_j \in \mathbb {R}^s\) are collected using different configurations of the FDEM device, i.e.,

$$\begin{aligned} \textbf{B}= [\textbf{b}_1,\ldots ,\textbf{b}_m]\in \mathbb {R}^{s\times m}, \qquad \mathbf {\Sigma }= [\varvec{\sigma }_1,\ldots ,\varvec{\sigma }_m]\in \mathbb {R}^{n\times m}, \end{aligned}$$

where \(\varvec{\sigma }_j\) is the electrical conductivity column corresponding to the measurements \(\textbf{b}_j\). In this way, the minimization problem (1) can be rewritten as

$$\begin{aligned} \sum _{j=1}^{m}\min _{\varvec{\sigma }_j}\left\| M(\varvec{\sigma }_j)-\textbf{b}_j\right\| _2^2, \end{aligned}$$

where \(\left\| \cdot \right\| _2\) is the Euclidean vector norm, and the matrix-valued function

$$\begin{aligned} M(\mathbf {\Sigma })=[M(\varvec{\sigma }_1), \ldots ,M(\varvec{\sigma }_m)] \in \mathbb {R}^{s\times m} \end{aligned}$$

returns the readings predicted by the model in the same order in which they are arranged in the vectors \(\textbf{b}_j\). This can be done because the model is one-dimensional, i.e., the values of the j-th column of \(\textbf{B}\) depend only on the entries of the j-th column of \(\mathbf {\Sigma }\). Therefore, once discretized, the two-dimensional problem is reduced to m independent one-dimensional problems.
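As an illustration of this column-wise decoupling, the following sketch solves the m independent nonlinear least-squares problems with a generic solver; here toy_M is a hypothetical stand-in for the FDEM forward map acting on a single conductivity column.

```python
import numpy as np
from scipy.optimize import least_squares

def toy_M(sigma):
    """Hypothetical stand-in for M: R^n -> R^s with s < n (here s = 4)."""
    A = np.vander(np.linspace(0.1, 1.0, 4), len(sigma), increasing=True)
    return np.tanh(A @ sigma)

def invert_columns(B, n):
    """Solve min_{sigma_j} ||M(sigma_j) - b_j||_2^2 independently for each column."""
    m = B.shape[1]
    Sigma = np.zeros((n, m))
    for j in range(m):                   # the m problems are independent -> parallelizable
        res = least_squares(lambda s: toy_M(s) - B[:, j], x0=np.full(n, 0.1))
        Sigma[:, j] = res.x
    return Sigma

B = np.tanh(np.random.default_rng(0).normal(size=(4, 5)))   # synthetic data, s = 4, m = 5
print(invert_columns(B, n=6).shape)      # (n, m)
```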

Remark 1

The model (3) takes complex values; indeed, GCM devices record both the real part (called the in-phase component) and the imaginary part (called the quadrature component) of the ratio between the secondary and the primary fields. In the inversion process, sometimes we are interested in studying only one component (for instance the quadrature component, as in Tests 1 and 2 of Sect. 6) or both components (as in Test 3 of Sect. 6). The treatment of complex values does not introduce obstacles in solving the inversion problem with the procedure described in this paper, because the real and imaginary parts are considered separately. Specifically, we have to handle only real matrices of the form

$$\begin{aligned} \begin{bmatrix} \text {Re}(\textbf{B})\\ \text {Im}(\textbf{B}) \end{bmatrix}\in \mathbb {R}^{2s \times m}, \qquad \begin{bmatrix} \text {Re}(M(\mathbf {\Sigma }))\\ \text {Im}(M(\mathbf {\Sigma })) \end{bmatrix}\in \mathbb {R}^{2s \times m}. \end{aligned}$$

With a slight abuse of notation, we will use s for the number of measurements in both cases.

3 Alternating Directions Multiplier Method Applied to the Inversion of FDEM Data

We now describe how we apply the ADMM to the minimization problem (2). Before applying ADMM to the solution of our problem we have to note that, if \(q\le 1\), then the minimized functional is non-smooth. Since the case \(q\le 1\) is the one of interest in our application, we wish to substitute the non-smooth functional with a smooth one, following the approach in [31, 33].

Let \(\varepsilon >0\) be a fixed parameter, we consider the smoothed \(\ell ^q\)-norm defined by

$$\begin{aligned} \left\| \textbf{x}\right\| _{q,\varepsilon }^q=\sum _{j=1}^n\left( \sqrt{\textbf{x}_j^2+\varepsilon ^2}\right) ^q. \end{aligned}$$

Note that \(\left\| \textbf{x}\right\| _{q,\varepsilon }^q\) is everywhere differentiable. This substitution usually does not affect the quality of the computed solutions; see [9].
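For concreteness, a small helper computing \(\left\| \textbf{x}\right\| _{q,\varepsilon }^q\); the values of q and \(\varepsilon \) in the example call are purely illustrative.

```python
import numpy as np

def lq_eps(x, q, eps):
    """Smoothed, everywhere differentiable surrogate of sum_i |x_i|^q."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.sqrt(x**2 + eps**2) ** q)

print(lq_eps([1.0, -2.0, 0.0], q=0.5, eps=1e-3))
```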

We can now reformulate (2), with the smoothed \(\ell ^q\)-norm, as a constrained minimization problem

$$\begin{aligned} \min _{\mathbf {\Sigma },\mathbf {\Xi }} \left\{ \frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+\frac{\mu }{q}\left\| L(\mathbf {\Xi })\right\| _{q,\varepsilon }^q, \;\mathbf {\Sigma }=\mathbf {\Xi }\right\} . \end{aligned}$$

We can write the augmented Lagrangian of the problem as

$$\begin{aligned} \mathcal {L}_\rho \left( \mathbf {\Sigma },\mathbf {\Xi };\textbf{y}\right) = \frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+ \frac{\mu }{q}\left\| L(\mathbf {\Xi })\right\| _{q,\varepsilon }^q+ \left\langle \textbf{y},\mathbf {\Sigma }-\mathbf {\Xi }\right\rangle +\frac{\rho }{2}\left\| \mathbf {\Sigma }-\mathbf {\Xi }\right\| _F^2, \end{aligned}$$

where \(\rho >0\) is a fixed parameter (see below), \(\textbf{y}\in \mathbb {R}^{n\times m}\), and

$$\begin{aligned} \left\langle \textbf{y},\textbf{x}\right\rangle =\sum _{i=1}^n\sum _{j=1}^m\textbf{y}_{i,j}\textbf{x}_{i,j} \end{aligned}$$

is the scalar product on \(\mathbb {R}^{n\times m}\) that induces the norm \(\left\| \cdot \right\| _F\). A solution of (2) is obtained as a saddle point of \(\mathcal {L}_\rho \). We define

$$\begin{aligned} \left( \mathbf {\Sigma }^*,\mathbf {\Xi }^*;\textbf{y}^*\right) =\arg \min _{\mathbf {\Sigma },\mathbf {\Xi }}\max _\textbf{y}\mathcal {L}_\rho (\mathbf {\Sigma },\mathbf {\Xi };\textbf{y}). \end{aligned}$$

The ADMM approximately computes \(\left( \mathbf {\Sigma }^*,\mathbf {\Xi }^*;\textbf{y}^*\right) \) with the following iteration

$$\begin{aligned} {\left\{ \begin{array}{ll} \displaystyle \mathbf {\Sigma }^{(k+1)}\in \arg \min _\mathbf {\Sigma }\mathcal {L}_\rho \left( \mathbf {\Sigma },\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) ,\\ \displaystyle \mathbf {\Xi }^{(k+1)}\in \arg \min _\mathbf {\Xi }\mathcal {L}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi };\textbf{y}^{(k)}\right) ,\\ \displaystyle \textbf{y}^{(k+1)}=\textbf{y}^{(k)}+\rho \left( \mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }^{(k+1)}\right) . \end{array}\right. } \end{aligned}$$
(4)

This scheme is intuitively obtained by combining a Gauss-Seidel minimization method with a proper update of the Lagrange multiplier \(\textbf{y}\). For simplicity of notation we will use the equality sign, rather than \(\in \), in the following exposition. Note that, if \(\rho \) is large enough, the solutions of the minimization problems in (4) are unique. This is a requirement of the proof of convergence; see Remark 2 in Sect. 4.
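The outer loop of (4) can be sketched as follows, assuming two user-supplied solvers for the \(\mathbf {\Sigma }\)- and \(\mathbf {\Xi }\)-subproblems; the placeholders solve_sigma and solve_xi stand for the Gauss-Newton and MM procedures described below, and the toy solvers in the usage example are illustrative only.

```python
import numpy as np

def admm(B, solve_sigma, solve_xi, rho, n, m, maxit=50, tol=1e-6):
    """Schematic ADMM loop (4); solve_sigma and solve_xi are user-supplied solvers."""
    Sigma = np.zeros((n, m))
    Xi = np.zeros((n, m))
    Y = np.zeros((n, m))                              # Lagrange multiplier
    for _ in range(maxit):
        Sigma = solve_sigma(B, Xi - Y / rho, rho)     # Sigma-subproblem, cf. (5)-(7)
        Xi = solve_xi(Sigma + Y / rho, rho)           # Xi-subproblem, cf. (10)
        Y = Y + rho * (Sigma - Xi)                    # multiplier update
        if np.linalg.norm(Sigma - Xi) <= tol * max(1.0, np.linalg.norm(Sigma)):
            break
    return Sigma, Xi, Y

# Toy usage with simple proximal stand-ins (illustrative only):
solve_sigma = lambda B, T, rho: (B + rho * T) / (1.0 + rho)  # min 1/2||S-B||^2 + rho/2||S-T||^2
solve_xi = lambda T, rho: np.maximum(T, 0.0)                 # a nonnegativity prox as a stand-in
S, X, Y = admm(np.random.default_rng(0).normal(size=(4, 3)),
               solve_sigma, solve_xi, rho=1.0, n=4, m=3)
print(np.linalg.norm(S - X))             # small: Sigma and Xi agree at convergence
```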

We now discuss how we (approximately) solve the two minimization problems in (4).

Let us first consider the \(\mathbf {\Sigma }\) subproblem. We discard in \(\mathcal {L}_\rho (\mathbf {\Sigma },\mathbf {\Xi }^{(k)};\textbf{y}^{(k)})\) the terms that do not depend on \(\mathbf {\Sigma }\), since they do not contribute to the minimization,

$$\begin{aligned} \mathbf {\Sigma }^{(k+1)}=\arg \min _\mathbf {\Sigma }\frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+\left\langle \textbf{y}^{(k)},\mathbf {\Sigma }\right\rangle +\frac{\rho }{2}\left\| \mathbf {\Sigma }-\mathbf {\Xi }^{(k)}\right\| _F^2. \end{aligned}$$
(5)

Rearranging the terms, we obtain

$$\begin{aligned} \mathbf {\Sigma }^{(k+1)}=\arg \min _\mathbf {\Sigma }\frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+\frac{\rho }{2}\left\| \mathbf {\Sigma }-\left( \mathbf {\Xi }^{(k)}-\frac{\textbf{y}^{(k)}}{\rho }\right) \right\| _F^2. \end{aligned}$$

Defining the function \(\widetilde{M}\) and the matrix \(\widetilde{\textbf{B}}\) as follows

$$\begin{aligned} \widetilde{M}(\mathbf {\Sigma })=\begin{bmatrix} M(\mathbf {\Sigma })\\ \sqrt{\rho }\mathbf {\Sigma }\end{bmatrix},\quad \quad \widetilde{\textbf{B}}=\begin{bmatrix} \textbf{B}\\ \sqrt{\rho }\left( \mathbf {\Xi }^{(k)}-\frac{\textbf{y}^{(k)}}{\rho }\right) \end{bmatrix}, \end{aligned}$$

we get

$$\begin{aligned} \mathbf {\Sigma }^{(k+1)}=\arg \min _\mathbf {\Sigma }\frac{1}{2}\left\| \widetilde{M}(\mathbf {\Sigma })-\widetilde{\textbf{B}}\right\| _F^2. \end{aligned}$$
(6)

Thanks to the structure of M, we can rewrite (6) as follows. Denote by \(\varvec{\sigma }_j^{(k+1)}\), \(\varvec{\xi }_j^{(k)}\), \(\textbf{y}_j^{(k)}\), and \(\varvec{b}_j\) the j-th columns of \(\mathbf {\Sigma }^{(k+1)}\), \(\mathbf {\Xi }^{(k)}\), \(\textbf{y}^{(k)}\), and \(\textbf{B}\), respectively. Then, we can write

$$\begin{aligned} \begin{aligned} \varvec{\sigma }_j^{(k+1)}&=\arg \min _{\varvec{\sigma }}\frac{1}{2}\left\| \begin{bmatrix} \varvec{b}_j\\ \sqrt{\rho }\left( \varvec{\xi }_j^{(k)}-\frac{\textbf{y}_j^{(k)}}{\rho }\right) \end{bmatrix}-\begin{bmatrix} M(\varvec{\sigma })\\ \sqrt{\rho }\varvec{\sigma }\end{bmatrix}\right\| _2^2\\&=\arg \min _{\varvec{\sigma }}\frac{1}{2}\left\| r_j(\varvec{\sigma })\right\| _2^2,\quad j=1,\ldots ,m, \end{aligned} \end{aligned}$$
(7)

where \(r_j(\varvec{\sigma })=\begin{bmatrix} \varvec{b}_j\\ \sqrt{\rho }\left( \varvec{\xi }_j^{(k)}-\frac{\textbf{y}_j^{(k)}}{\rho }\right) \end{bmatrix}-\begin{bmatrix} M(\varvec{\sigma })\\ \sqrt{\rho }\varvec{\sigma }\end{bmatrix}\) is the residual function associated with the j-th column. We can now use the algorithm proposed in [6] on each minimization problem in (7). Note that the m minimization problems are independent and, therefore, can be solved in parallel.

We now briefly discuss the algorithm proposed in [6]. This iterative procedure implements a Gauss-Newton method and employs a projection onto fairly small linear subspaces to ensure a low computational cost. We note that the algorithm in [6] was developed for well-posed problems, which is our case provided that \(\rho \) in (7) is large enough. As we will see in the numerical simulations, this is not too restrictive a requirement and a fairly small value of \(\rho \) can be used.

Denote by \(J(\varvec{\sigma })\) the Jacobian matrix of M at the point \(\varvec{\sigma }\); then the Jacobian \(\widetilde{J}\) of \( \widetilde{M}(\varvec{\sigma })=\begin{bmatrix} M(\varvec{\sigma })\\ \sqrt{\rho }\varvec{\sigma }\end{bmatrix} \) is

$$\begin{aligned} \widetilde{J}(\varvec{\sigma })=\begin{bmatrix} J(\varvec{\sigma })\\ \sqrt{\rho }I_n\end{bmatrix}, \end{aligned}$$

where \(I_n\) denotes the identity matrix of order n. Given an approximation \(\varvec{\sigma }^{(k,l)}_j\) of \(\varvec{\sigma }_j^{(k+1)}\), i.e., of a solution of (7), the Gauss-Newton method computes the next approximate solution by

$$\begin{aligned} \varvec{\sigma }^{(k,l+1)}_j=\varvec{\sigma }_j^{(k,l)}+\alpha ^{(l)}\textbf{q}^{(l)}, \qquad l=0,1,\ldots , \end{aligned}$$

where \(\alpha ^{(l)}>0\) is determined by the Armijo-Goldstein principle (see below) and

$$\begin{aligned} \textbf{q}^{(l)}=\arg \min _\textbf{q}\frac{1}{2}\left\| r_j\left( \varvec{\sigma }_j^{(k,l)}\right) +J^{(l)}\textbf{q}\right\| _2^2. \end{aligned}$$

Here we denote by \(J^{(l)}\) the Jacobian matrix of \(r_j\) computed in \(\varvec{\sigma }_j^{(k,l)}\), i.e.,

$$\begin{aligned} J^{(l)}=-\widetilde{J}\left( \varvec{\sigma }^{(k,l)}_j\right) . \end{aligned}$$

In applications, the Gauss-Newton method is not always convergent. To ensure convergence, we resort to the Armijo-Goldstein principle, which is satisfied by a given \(\alpha \) if

$$\begin{aligned} \left\| r_j\left( \varvec{\sigma }_j^{(k,l)}\right) \right\| _2^2-\left\| r_j\left( \varvec{\sigma }_j^{(k,l)}+\alpha \textbf{q}^{(l)}\right) \right\| _2^2\ge \frac{1}{2}\alpha \left\| J^{(l)}\textbf{q}^{(l)}\right\| _2^2. \end{aligned}$$

To determine such an \(\alpha \), we employ a line search algorithm. Given a certain \(\alpha _0\): if it satisfies the condition above, we set \(\alpha ^{(l)}=\alpha _0\); otherwise we define

$$\begin{aligned} \alpha _1=\frac{\alpha _0}{2}. \end{aligned}$$

We iterate in this way until we determine a t such that

$$\begin{aligned} \alpha _t=\frac{\alpha _0}{2^t} \end{aligned}$$

satisfies the inequality above and we set \(\alpha ^{(l)}=\alpha _t\). This procedure always terminates in a finite number of steps and ensures that the Gauss-Newton algorithm converges to a stationary point of the problem; see [26].
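A minimal sketch of a damped Gauss-Newton iteration with this Armijo-Goldstein backtracking rule is given below; the residual function and its Jacobian are passed as callables, and the toy problem at the end is purely illustrative (this is not the implementation of [6], which is sketched next).

```python
import numpy as np

def gauss_newton(r, jac, x0, alpha0=1.0, maxit=50, tol=1e-8):
    """Damped Gauss-Newton: r is the residual function, jac its Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        rx = r(x)
        J = jac(x)
        q, *_ = np.linalg.lstsq(J, -rx, rcond=None)   # q = argmin ||r(x) + J q||_2
        alpha = alpha0                                # Armijo-Goldstein backtracking
        while (np.sum(rx**2) - np.sum(r(x + alpha * q)**2)
               < 0.5 * alpha * np.sum((J @ q)**2)) and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * q
        if np.linalg.norm(alpha * q) <= tol * max(1.0, np.linalg.norm(x)):
            break
    return x

# Toy usage on a small nonlinear least-squares problem (purely illustrative)
r = lambda x: np.array([x[0]**2 - 1.0, x[0] * x[1] - 2.0, x[1] - 2.0])
jac = lambda x: np.array([[2 * x[0], 0.0], [x[1], x[0]], [0.0, 1.0]])
print(gauss_newton(r, jac, x0=[0.5, 0.5]))            # approximately [1, 2]
```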

In [6] the authors lowered the computational cost of the Gauss-Newton algorithm by projecting it onto a so-called Generalized Krylov Subspace (GKS); see [32]. They determine an iterate

$$\begin{aligned} \varvec{\sigma }_j^{(k,l+1)}=V_l\textbf{z}^{(l+1)}, \end{aligned}$$

where \(V_l\in \mathbb {R}^{n\times \widehat{l}}\) has orthonormal columns and \(\widehat{l}\ll n\). The coefficients \(\textbf{z}^{(l+1)}\) are obtained using the Gauss-Newton algorithm, i.e., by

$$\begin{aligned} \textbf{z}^{(l+1)}=\textbf{z}^{(l)}+\alpha ^{(l)}\textbf{q}^{(l)}, \end{aligned}$$

where the step \(\textbf{q}^{(l)}\) is obtained by

$$\begin{aligned} \textbf{q}^{(l)}=\arg \min _\textbf{q}\frac{1}{2}\left\| r_j\left( V_l\textbf{z}^{(l)}\right) +\widehat{J}^{(l)}\textbf{q}\right\| _2^2. \end{aligned}$$

The matrix \(\widehat{J}^{(l)}\) is the Jacobian matrix of the function \(r_j\left( V_l\textbf{z}\right) \), with respect to the variable \(\textbf{z}\), computed in \(\textbf{z}^{(l)}\), i.e.,

$$\begin{aligned} \widehat{J}^{(l)}=-\widetilde{J}\left( V_l\textbf{z}^{(l)}\right) V_l. \end{aligned}$$

This latter matrix has many more rows than columns and, therefore, the computation of \(\textbf{q}\) is extremely cheap. As in the classical algorithm, the parameter \(\alpha ^{(l)}\) satisfies the Armijo-Goldstein principle, i.e.,

$$\begin{aligned} \left\| r_j\left( V_l\textbf{z}^{(l)}\right) \right\| _2^2-\left\| r_j\left( V_l\left( \textbf{z}^{(l)}+\alpha ^{(l)}\textbf{q}^{(l)}\right) \right) \right\| _2^2\ge \frac{1}{2}\alpha ^{(l)}\left\| \widehat{J}^{(l)}\textbf{q}^{(l)}\right\| _2^2, \end{aligned}$$

and is determined by a line search algorithm.

Once \(\textbf{z}^{(l+1)}\) is determined, the search-subspace is enlarged by computing

$$\begin{aligned} \textbf{g}^{(l+1)}=\widetilde{J}\left( V_l\textbf{z}^{(l+1)}\right) ^Tr_j\left( V_l\textbf{z}^{(l+1)}\right) , \end{aligned}$$

and reorthogonalizing it against the basis \(V_l\), i.e.,

$$\begin{aligned} \widetilde{\textbf{g}}^{(l+1)}=\textbf{g}^{(l+1)}-V_lV_l^T\textbf{g}^{(l+1)}. \end{aligned}$$

Then, the new basis results by adding the normalized \(\widetilde{\textbf{g}}^{(l+1)}\), i.e.,

$$\begin{aligned} V_{l+1}=\begin{bmatrix} V_l&\frac{\widetilde{\textbf{g}}^{(l+1)}}{\left\| \widetilde{\textbf{g}}^{(l+1)}\right\| _2} \end{bmatrix}. \end{aligned}$$

Given an initial vector \(\varvec{\sigma }^{(k,0)}\ne \textbf{0}\), we set \(V_0=\frac{\varvec{\sigma }^{(k,0)}}{\left\| \varvec{\sigma }^{(k,0)}\right\| _2}\), so that \(V_l\in \mathbb {R}^{n\times (l+1)}\). We will assume that few iterations are needed for this procedure to converge (or at least to produce a reasonable approximation of \(\varvec{\sigma }^{(k+1)}\)) and so \(l\ll n\). From a theoretical point of view, we set

$$\begin{aligned} \varvec{\sigma }^{(k+1)}=\varvec{\sigma }^{(k,\infty )}. \end{aligned}$$
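The following is a schematic sketch of the projected Gauss-Newton iteration with the GKS basis enlargement described above, under the assumption that the residual \(r_j\) and its Jacobian are available as callables; names and stopping rules are illustrative, and the sign convention for the enlargement direction is immaterial since the vector is normalized.

```python
import numpy as np

def gks_gauss_newton(r, jac, sigma0, maxit=30, alpha0=1.0, tol=1e-8):
    """Gauss-Newton projected onto a growing generalized Krylov subspace."""
    sigma0 = np.asarray(sigma0, dtype=float)
    V = (sigma0 / np.linalg.norm(sigma0)).reshape(-1, 1)      # V_0
    z = np.array([np.linalg.norm(sigma0)])                    # so that V_0 z = sigma_0
    for _ in range(maxit):
        s = V @ z
        rs = r(s)
        Jhat = jac(s) @ V                                     # reduced Jacobian of r(V z)
        q, *_ = np.linalg.lstsq(Jhat, -rs, rcond=None)
        alpha = alpha0                                        # Armijo-Goldstein backtracking
        while (np.sum(rs**2) - np.sum(r(V @ (z + alpha * q))**2)
               < 0.5 * alpha * np.sum((Jhat @ q)**2)) and alpha > 1e-12:
            alpha *= 0.5
        z = z + alpha * q
        if np.linalg.norm(alpha * q) <= tol * max(1.0, np.linalg.norm(z)):
            break
        s = V @ z
        g = jac(s).T @ r(s)                                   # gradient direction (up to sign)
        g = g - V @ (V.T @ g)                                 # reorthogonalize against V
        if np.linalg.norm(g) > 1e-12:                         # enlarge only if a new direction exists
            V = np.column_stack([V, g / np.linalg.norm(g)])
            z = np.append(z, 0.0)                             # keep V z unchanged
    return V @ z

r = lambda x: np.array([x[0]**2 - 1.0, x[0] * x[1] - 2.0, x[1] - 2.0])
jac = lambda x: np.array([[2 * x[0], 0.0], [x[1], x[0]], [0.0, 1.0]])
print(gks_gauss_newton(r, jac, sigma0=np.array([0.5, 0.5])))  # approximately [1, 2]
```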

We now turn to the \(\mathbf {\Xi }\) subproblem. By neglecting in \(\mathcal {L}_\rho (\mathbf {\Sigma }^{(k+1)},\mathbf {\Xi };\textbf{y}^{(k)})\) the terms that do not depend on \(\mathbf {\Xi }\), we obtain

$$\begin{aligned} \mathbf {\Xi }^{(k+1)}=\arg \min _\mathbf {\Xi }\frac{\mu }{q}\left\| L(\mathbf {\Xi })\right\| _{q,\varepsilon }^q -\left\langle \textbf{y}^{(k)},\mathbf {\Xi }\right\rangle +\frac{\rho }{2}\left\| \mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }\right\| _F^2. \end{aligned}$$
(8)

We first construct the operator L. To do this, we define the \(\textrm{vec}\) operator for a matrix \(\textbf{C}\in \mathbb {R}^{m\times n}\) as

$$\begin{aligned} \textrm{vec}(\textbf{C})=\begin{bmatrix} \textbf{C}_{1,1}&\ldots&\textbf{C}_{m,1}&\textbf{C}_{1,2}&\ldots&\textbf{C}_{m,2}&\ldots&\textbf{C}_{1,n}&\ldots&\textbf{C}_{m,n} \end{bmatrix}^T. \end{aligned}$$

For simplicity of notation, let us assume that \(n=m\) and denote by \(L_2\) the following discretization of the one-dimensional Laplacian operator with reflective boundary conditions

$$\begin{aligned} L_2=\begin{bmatrix}1& -1\\ -1& 2& -1\\ & \ddots & \ddots & \ddots \\ & & -1& 2& -1\\ & & & -1& 1\end{bmatrix}\in \mathbb {R}^{n\times n}. \end{aligned}$$

Let \(I_n\) denote the identity matrix of order n, we define

$$\begin{aligned} \textrm{vec}(L(\mathbf {\Xi }))=(L_2\otimes I_n+I_n\otimes L_2)\textrm{vec}(\mathbf {\Xi }), \end{aligned}$$

where \(\otimes \) is the Kronecker product. Thanks to the structure of \(L_2\), the matrix \((L_2\otimes I_n+I_n\otimes L_2)\) is diagonalized by the two-dimensional Discrete Cosine Transform C and it holds

$$\begin{aligned} (L_2\otimes I_n+I_n\otimes L_2)=C^T\Lambda C, \end{aligned}$$
(9)

where \(\Lambda \) is a diagonal matrix containing the eigenvalues of the matrix \((L_2\otimes I_n+I_n\otimes L_2)\), which can be computed by applying the DCT algorithm to its first column; see, e.g., [28, 39] for more details.
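The diagonalization (9) can be checked numerically as follows (a small sketch assuming \(n=m\)); the eigenvalues are recovered from the action of the operator on the first canonical basis vector, divided entry-wise by its DCT.

```python
import numpy as np
from scipy.fft import dctn, idctn

n = 8
L2 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian, reflective BCs
L2[0, 0] = L2[-1, -1] = 1.0

C  = lambda X: dctn(X, type=2, norm='ortho')             # the 2D DCT (the matrix C)
Ct = lambda X: idctn(X, type=2, norm='ortho')            # its inverse (C^T)

def apply_A(X):
    """Action of (L2 (x) I_n + I_n (x) L2) on vec(X), written matrix-wise."""
    return L2 @ X + X @ L2.T

# Eigenvalues from the first column: A e_1 = C^T Lambda C e_1, hence an entrywise ratio
E1 = np.zeros((n, n)); E1[0, 0] = 1.0
Lam = C(apply_A(E1)) / C(E1)

X = np.random.default_rng(0).normal(size=(n, n))
print(np.allclose(apply_A(X), Ct(Lam * C(X))))           # True: verifies (9)
```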

We can now rearrange the terms in the \(\mathbf {\Xi }\) subproblem (8) obtaining

$$\begin{aligned} \mathbf {\Xi }^{(k+1)}=\arg \min _\mathbf {\Xi }\frac{1}{2}\left\| \mathbf {\Xi }-\left( \mathbf {\Sigma }^{(k+1)}+ \frac{\textbf{y}^{(k)}}{\rho }\right) \right\| _F^2+\frac{\mu }{q\rho }\left\| L(\mathbf {\Xi })\right\| _{q,\varepsilon }^q. \end{aligned}$$
(10)

To compute \(\mathbf {\Xi }^{(k+1)}\) we use a variation, already used in [5], of the Majorization-Minimization (MM) algorithm proposed in [31].

The MM algorithm determines a sequence of vectors that converges to a stationary point of (10). At each iteration the functional in (10) is majorized by a quadratic functional \(\mathcal {Q}\) and the new approximate solution is obtained as the unique minimizer of \(\mathcal {Q}\). To simplify the notation, let \(\widetilde{\mathbf {\Sigma }}^{(k+1)}=\mathbf {\Sigma }^{(k+1)}+\frac{\textbf{y}^{(k)}}{\rho }\) and define

$$\begin{aligned} \varvec{\xi }=\textrm{vec}(\mathbf {\Xi })\quad \text{ and }\quad \widetilde{\varvec{\sigma }}^{(k+1)}=\textrm{vec}\left( \widetilde{\mathbf {\Sigma }}^{(k+1)}\right) . \end{aligned}$$

Therefore, we can rewrite (10), using (9), as

$$\begin{aligned} \varvec{\xi }^{(k+1)}=\arg \min _{\varvec{\xi }}\frac{1}{2}\left\| \varvec{\xi }-\widetilde{\varvec{\sigma }}^{(k+1)}\right\| _2^2+\frac{\mu }{q\rho }\left\| C^T\Lambda C\varvec{\xi }\right\| _{q,\varepsilon }^q. \end{aligned}$$

Let \(\varvec{\xi }^{(k,j)}\) be an approximation of the solution of the problem above. A quadratic tangent majorant of \(\mathcal {J}(\varvec{\xi })=\frac{1}{2}\left\| \varvec{\xi }-\widetilde{\varvec{\sigma }}^{(k+1)}\right\| _2^2+\frac{\mu }{q\rho }\left\| C^T\Lambda C\varvec{\xi }\right\| _{q,\varepsilon }^q\) in the point \(\varvec{\xi }^{(k,j)}\) is defined as a function \(\mathcal {Q}\left( \varvec{\xi },\varvec{\xi }^{(k,j)}\right) \) such that

  1. (i)

    \(\mathcal {Q}\left( \varvec{\xi },\varvec{\xi }^{(k,j)}\right) \) is quadratic in \(\varvec{\xi }\);

  2. (ii)

    \(\mathcal {Q}\left( \varvec{\xi }^{(k,j)},\varvec{\xi }^{(k,j)}\right) =\mathcal {J}\left( \varvec{\xi }^{(k,j)}\right) \);

  3. (iii)

    \(\nabla \mathcal {Q}\left( \varvec{\xi }^{(k,j)},\varvec{\xi }^{(k,j)}\right) =\nabla \mathcal {J}\left( \varvec{\xi }^{(k,j)}\right) \);

  4. (iv)

    \(\mathcal {Q}\left( \varvec{\xi },\varvec{\xi }^{(k,j)}\right) \ge \mathcal {J}\left( \varvec{\xi }\right) \) for all \(\varvec{\xi }\).

The choice of \(\mathcal {Q}\) is not unique. We consider here the so-called fixed majorant

$$\begin{aligned} \mathcal {Q}\left( \varvec{\xi },\varvec{\xi }^{(k,j)}\right) =\frac{1}{2}\left\| \varvec{\xi }-\widetilde{\varvec{\sigma }}^{(k+1)}\right\| _2^2+\frac{\eta }{2}\left( \left\| C^T\Lambda C\varvec{\xi }\right\| _2^2-2\varvec{\omega }_j^TC^T\Lambda C\varvec{\xi }\right) , \end{aligned}$$

with \(\eta =\frac{\mu }{\rho }\varepsilon ^{q-2}\),

$$\begin{aligned} \varvec{\omega }_j=\textbf{u}_j\left( \textbf{1}-\left( \frac{\textbf{u}_j^2+\varepsilon ^2\textbf{1}}{\varepsilon ^2}\right) ^{q/2-1}\right) ,\quad \text{ and }\quad \textbf{u}_j=C^T\Lambda C\varvec{\xi }^{(k,j)}, \end{aligned}$$

where \(\textbf{1}\) denotes a vector of the same size as \(\textbf{u}_j\) with all components equal to 1, and all operations are meant element-wise. The next iterate is obtained by

$$\begin{aligned} \begin{aligned} \varvec{\xi }^{(k,j+1)}&=\arg \min _{\varvec{\xi }}\mathcal {Q}\left( \varvec{\xi },\varvec{\xi }^{(k,j)}\right) \\&=\arg \min _{\varvec{\xi }}\left\| \begin{bmatrix} I\\ \sqrt{\eta }C^T\Lambda C \end{bmatrix}\varvec{\xi }-\begin{bmatrix} \widetilde{\varvec{\sigma }}^{(k+1)}\\ \sqrt{\eta }\varvec{\omega }_j \end{bmatrix}\right\| _2^2\\&=\left( I+\eta C^T\Lambda ^TCC^T\Lambda C\right) ^{-1}\left( \widetilde{\varvec{\sigma }}^{(k+1)}+\eta C^T\Lambda ^TC\varvec{\omega }_j\right) \\&=C^T\left( I+\eta \Lambda ^2\right) ^{-1}\left( C\widetilde{\varvec{\sigma }}^{(k+1)}+\eta \Lambda C\varvec{\omega }_j\right) , \end{aligned} \end{aligned}$$
(11)

where the last step is obtained by recalling that \(C^TC=CC^T=I_{n^2}\) and that \(\Lambda \) is a real, square, and diagonal matrix. Therefore, the cost for computing \(\varvec{\xi }^{(k,j+1)}\) is dominated by that of two matrix–vector products with C and one with \(C^T\). The matrix–vector products with C can be performed in \(O(nm\log (nm))\) operations using the dct algorithm.
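A minimal sketch of this MM inner iteration, assuming the DCT diagonalization above (and its obvious extension to \(n\ne m\)); the number of inner iterations and the values of \(\mu \), \(\rho \), q, and \(\varepsilon \) in the example call are illustrative only.

```python
import numpy as np
from scipy.fft import dctn, idctn

def mm_xi_update(Sigma_tilde, mu, rho, q, eps, n_inner=30):
    """MM iterations (11) for the Xi-subproblem (10) with L diagonalized by the DCT."""
    n, m = Sigma_tilde.shape
    lam1 = lambda k: 2.0 - 2.0 * np.cos(np.pi * np.arange(k) / k)   # 1D Laplacian eigenvalues
    Lam = lam1(n)[:, None] + lam1(m)[None, :]        # eigenvalues of the Kronecker sum
    C  = lambda X: dctn(X, type=2, norm='ortho')
    Ct = lambda X: idctn(X, type=2, norm='ortho')
    eta = (mu / rho) * eps**(q - 2)
    Cs = C(Sigma_tilde)
    Xi = Sigma_tilde.copy()
    for _ in range(n_inner):
        u = Ct(Lam * C(Xi))                           # u_j = C^T Lambda C xi^(k,j)
        w = u * (1.0 - ((u**2 + eps**2) / eps**2) ** (q / 2.0 - 1.0))
        Xi = Ct((Cs + eta * Lam * C(w)) / (1.0 + eta * Lam**2))   # update (11)
    return Xi

Xi = mm_xi_update(np.random.default_rng(1).normal(size=(16, 16)),
                  mu=1e-2, rho=1.0, q=0.5, eps=1e-2)
print(Xi.shape)
```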

The computations are summarized in Algorithm 1.

Algorithm 1 (ADMM for FDEM data)

3.1 Selection of the Regularization Parameter

We now describe an automatic procedure to determine the regularization parameter \(\mu \). To this aim, we exploit the assumption that the noise that corrupts the data is white Gaussian. Let

$$\begin{aligned} \mathbf {\Sigma }_\mu =\arg \min _\mathbf {\Sigma }\frac{1}{2}\left\| M\left( \mathbf {\Sigma }\right) -\textbf{B}\right\| _F^2+\frac{\mu }{q}\left\| L\left( \mathbf {\Sigma }\right) \right\| _{q,\varepsilon }^q. \end{aligned}$$

In the ideal scenario, i.e., when \(\mathbf {\Sigma }_\mu \) is the exact solution of the problem, the residual

$$\begin{aligned} \textbf{R}_\mu =M\left( \mathbf {\Sigma }_\mu \right) -\textbf{B}\end{aligned}$$

coincides with the noise and, therefore, each of its entries is the realization of a random variable with Gaussian distribution with zero mean and fixed variance. We wish to determine \(\mu \) so that \(\textbf{R}_\mu \) satisfies this property as closely as possible.

This idea of exploiting the whiteness property of the corrupting noise has been widely used for the solution of linear inverse problems. Its main advantage is that it does not require any additional information on the noise that corrupts the data aside from its whiteness. This idea has been extensively investigated for image denoising and deblurring problems; see, e.g., [1, 2, 27, 34,35,36, 42,43,44].

Here, we follow the approach proposed in [8] that uses the a posteriori criterion described in [1, 37]. We first outline the main ideas of this a posteriori criterion, referred to as Residual Whiteness Principle (RWP), and then describe how to use it to determine a suitable value for \(\mu \).

Denote by \(\varvec{\eta } \in \mathbb {R}^{s\times m}\) the noise that corrupts the data \(\textbf{B}\). Thus,

$$\begin{aligned} \varvec{\eta } = \left\{ \eta _{i,j} \right\} _{(i,j) \in \Omega }, \quad \Omega := \{ 0, \,\ldots \,, s-1 \} \times \{ 0, \,\ldots \,, m-1 \}. \end{aligned}$$

The sample auto-correlation of \(\varvec{\eta }\) is defined as

$$\begin{aligned} a(\varvec{\eta })= \left\{ a_{l,k}(\varvec{\eta }) \right\} _{(l,k) \in \mathrm {\Theta }}, \end{aligned}$$

where \(\mathrm {\Theta } \;{:=}\; \{ -(s -1), \,\ldots \,, s - 1 \} \times \{ -(m -1),\,\ldots \,,m-1\}\). The values \(a_{l,k}(\varvec{\eta })\in \mathbb {R}\) are computed by

$$\begin{aligned} \begin{aligned} a_{l,k}(\varvec{\eta })&= \frac{1}{sm}\left( \varvec{\eta }\star \varvec{\eta }\right) _{l,k}= \frac{1}{sm}\left( \varvec{\eta } {*} \varvec{\eta }^{\prime }\right) _{l,k}\\&=\frac{1}{sm}\sum _{(i,j)\in \mathrm {\Omega }} \eta _{i,j} \eta _{i+l,j+k}, \quad (l,k) \in \mathrm {\Theta }, \end{aligned} \end{aligned}$$
(12)

where the pairs (l, k) are referred to as lags, \(\cdot \star \cdot \) and \(\cdot *\cdot \) denote the two-dimensional discrete correlation and convolution operators, respectively, and \(\varvec{\eta }^{\prime }(i,j) = \varvec{\eta }(-i,-j)\).

Since we wish to define the auto-correlation in (12) for all lags \((l,k) \in \mathrm {\Theta }\), we need to pad the noise realization \(\varvec{\eta }\) with at least \(s-1\) samples in the first direction and \(m-1\) samples in the second one. For simplicity, we will assume periodic boundary conditions for \(\varvec{\eta }\). Thanks to this assumption, the operators \(\cdot \star \cdot \) and \(\cdot *\cdot \) in (12) become the two-dimensional circular correlation and convolution, respectively. As a result of the structure imposed on the auto-correlation, it suffices to consider only a subset of the lags, namely

$$\begin{aligned} (l,k) \in \overline{\mathrm {\Theta }}:=\{ 0, \ldots , s - 1\} \times \{ 0,\ldots , m - 1 \}. \end{aligned}$$

Since we assume that the error \(\varvec{\eta }\) is a realization of a white noise process, the sample auto-correlation \(a(\varvec{\eta })\) satisfies the well-known asymptotic property

$$\begin{aligned} \lim _{m \rightarrow +\infty } a_{l,k}(\varvec{\eta }) = {\left\{ \begin{array}{ll} \sigma ^2 & \text{ for } (l,k) = (0,0),\\ 0 & \text{ for } (l,k) \in \overline{\mathrm {\Theta }}_0:= \overline{\mathrm {\Theta }}\setminus \{(0,0)\}. \end{array}\right. } \end{aligned}$$
(13)

To avoid the dependency on the noise variance \(\sigma ^2\), we consider the normalized sample auto-correlation of the noise realization \(\varvec{\eta }\)

$$\begin{aligned}\nonumber \beta (\varvec{\eta }) = \frac{1}{a_{0,0}(\varvec{\eta })} a(\varvec{\eta }) = \frac{1}{\left\| \varvec{\eta }\right\| _F^2} \left( \varvec{\eta } \star \varvec{\eta } \right) . \end{aligned}$$

From (13) it is trivial to see that

$$\begin{aligned} \lim _{m \rightarrow +\infty } \beta _{l,k}(\varvec{\eta }) = {\left\{ \begin{array}{ll} 1 & \text{ for } (l,k) = (0,0),\\ 0 & \text{ for } (l,k) \in \overline{\mathrm {\Theta }}_0. \end{array}\right. } \end{aligned}$$

We can, therefore, introduce the following \(\sigma \)-independent non-negative scalar measure of whiteness \(\mathcal {W}: \mathbb {R}^{s \times m} \rightarrow \mathbb {R}^+\) of the noise realization \(\varvec{\eta }\)

$$\begin{aligned} \mathcal {W}(\varvec{\eta }):= \left\| \beta (\varvec{\eta }) \right\| _F^2 = \frac{\left\| \,\varvec{\eta }\star \varvec{\eta }\right\| _F^2}{\left\| \varvec{\eta }\right\| _F^4}. \end{aligned}$$
(14)

As stated above, if \(\mathbf {\Sigma }_\mu \) well approximates the exact solution of the problem, the associated \(s\times m\) residual \(\textbf{R}_\mu = M\left( \mathbf {\Sigma }_\mu \right) - \textbf{B}\) well approximates the white noise realization \(\varvec{\eta }\). Hence, in this case, the residual \(\textbf{R}_\mu \) is white according to the scalar measure in (14). We can formulate the RWP for automatically selecting the regularization parameter \(\mu \) as follows

$$\begin{aligned} \mu ^* \in \arg \min _{\mu >0} W(\mu ), \quad W(\mu ):= \mathcal {W}\left( \textbf{R}_{\mu }\right) , \quad \textbf{R}_\mu =M\left( \mathbf {\Sigma }_\mu \right) - \textbf{B}, \end{aligned}$$

where the function W is defined by

$$\begin{aligned} W(\mu ) = \left\| \beta (\textbf{R}_\mu ) \right\| _F^2 = \frac{\left\| \, \textbf{R}_\mu \star \textbf{R}_\mu \right\| _F^2}{\left\| \textbf{R}_\mu \right\| _F^4}. \end{aligned}$$
(15)

We refer to W as the residual whiteness function.
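Under the periodic boundary conditions assumed above, the sample auto-correlation is a circular correlation and can be evaluated with the 2D FFT; the following sketch computes the whiteness measure (14), and the synthetic residuals in the example only illustrate that a white residual scores lower than a strongly correlated one.

```python
import numpy as np

def whiteness(R):
    """Scalar whiteness measure (14) of a residual matrix R, via circular auto-correlation."""
    F = np.fft.fft2(R)
    acorr = np.real(np.fft.ifft2(F * np.conj(F)))    # circular auto-correlation (all lags)
    return np.sum((acorr / acorr[0, 0]) ** 2)         # ||beta(R)||_F^2

rng = np.random.default_rng(0)
white = rng.normal(size=(40, 60))                     # white residual
smooth = np.cumsum(white, axis=0)                     # strongly correlated residual
print(whiteness(white), whiteness(smooth))            # the white residual scores lower
```

The selection strategy described next then amounts to evaluating this function on the residuals \(\textbf{R}_{\mu _j}\) associated with a grid of candidate values and picking the minimizer.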

We propose to determine \(\mu \) by following one of the strategies in [8]. Consider a set of candidate values for the regularization parameter \(\{\mu _1,\ldots ,\mu _d\}\subset \mathbb {R}^+\). For each \(\mu _j\) we compute, using Algorithm 1, an approximation of \(\mathbf {\Sigma }_{\mu _j}\). We then select a suitable value of the regularization parameter, denoted by \(\mu ^*\), as

$$\begin{aligned} \mu ^*=\arg \min \left\{ W\left( \mu _j\right) ,\;j=1,\ldots ,d\right\} , \end{aligned}$$

where W is defined in (15).

4 Convergence of the Iterations

We now wish to show that the iterations determined by Algorithm 1 converge to a stationary point of

$$\begin{aligned} \frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+\frac{\mu }{q}\left\| L(\mathbf {\Sigma })\right\| _{q,\varepsilon }^q. \end{aligned}$$

The proof of this result is inspired by the ones in [7, 30] with some substantial modifications.

We need the following assumptions.

Assumption 1

Denote by \(\left\{ \mathbf {\Sigma }^{(k)}\right\} _{k\in \mathbb {N}}\) and \(\left\{ \mathbf {\Xi }^{(k)}\right\} _{k\in \mathbb {N}}\) the iterates generated by Algorithm 1. We assume that these sequences are bounded, that is, there exists \(R>0\) such that \(\left\| \mathbf {\Sigma }^{(k)}\right\| _F\le R\) and \(\left\| \mathbf {\Xi }^{(k)}\right\| _F\le R\), for all \(k\in \mathbb {N}\).

Assumption 2

The computed \(\mathbf {\Sigma }^{(k+1)}\) and \(\mathbf {\Xi }^{(k+1)}\) are such that

$$\begin{aligned} \mathbf {\Sigma }^{(k+1)}&=\arg \min _{\mathbf {\Sigma }}\mathcal {L}_\rho \left( \mathbf {\Sigma },\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) ,\\ \mathbf {\Xi }^{(k+1)}&=\arg \min _{\mathbf {\Xi }}\mathcal {L}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi };\textbf{y}^{(k)}\right) , \end{aligned}$$

i.e., the inner iterations in Algorithm 1 converge to the unique minimizer of the respective subproblem.

Remark 2

Assumption 2 requires that \(\rho \) is large “enough” so that the subproblems are strictly convex and that the inner iterations reach convergence; see below. However, note that this is rarely satisfied in practice. Firstly, only a finite number of inner iterations can be performed, so they may not have reached convergence. Secondly, an excessively large value of \(\rho \) may significantly slow down the computations and, to ensure that the algorithm is computationally feasible, a small value of \(\rho \) may be needed. Therefore, the subproblems may be non-convex and, even if a stationary point is computed, it may be a saddle point or a local minimum. Nevertheless, extensive numerical experience suggests that the procedure converges even when only an accurate enough approximate minimization is performed and when \(\rho \) is not too large.

Note that, due to Assumption 1 and since \(M(\mathbf {\Sigma })\in C^1\), there exists \(c_1\) such that

$$\begin{aligned} \left\| \nabla M\left( \mathbf {\Sigma }^{(j)}\right) - \nabla M\left( \mathbf {\Sigma }^{(k)}\right) \right\| _F \le c_1 \left\| \mathbf {\Sigma }^{(j)} - \mathbf {\Sigma }^{(k)}\right\| _F, \qquad \forall \ j,k, \end{aligned}$$

where \(\nabla M\) is the gradient of M.

Before showing our main result, we need some auxiliary results.

Lemma 1

Consider the iterates \(\textbf{y}^{(k)}\) and \(\mathbf {\Xi }^{(k)}\) generated by Algorithm 1 and assume that Assumption 2 is satisfied. Then, there exists a constant \(c_2\) such that

$$\begin{aligned} \left\| \textbf{y}^{(k+1)}-\textbf{y}^{(k)}\right\| _F\le c_2 \left\| \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right\| _F,\quad \text{ for } \text{ all } k\in \mathbb {N}. \end{aligned}$$

Proof

Let \(f_q(\mathbf {\Xi }):= \frac{\mu }{q}\left\| L(\mathbf {\Xi })\right\| _{q,\varepsilon }^q\). Note that \(f_q\) is continuously differentiable and it is easy to see that \(\nabla f_q\) is Lipschitz continuous.

By definition, \(\mathbf {\Xi }^{(k+1)}\) is a minimizer of \(\mathcal {L}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi };\textbf{y}^{(k)}\right) \); hence, it holds

$$\begin{aligned} 0&=\left. \nabla \left( f_q(\mathbf {\Xi }) - \langle \textbf{y}^{(k)},\mathbf {\Xi }\rangle + \frac{\rho }{2}\left\| \mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }\right\| _F^2 \right) \right| _{\mathbf {\Xi }^{(k+1)}}\\&=\nabla f_q\left( \mathbf {\Xi }^{(k+1)}\right) -\textbf{y}^{(k)}-\rho \left( \mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }^{(k+1)}\right) . \end{aligned}$$

From this equality and the definition of \(\textbf{y}^{(k+1)}\) in (4), it immediately follows that

$$\begin{aligned} \nabla f_q\left( \mathbf {\Xi }^{(k+1)}\right) = \textbf{y}^{(k+1)}. \end{aligned}$$
(16)

This relation and the Lipschitz continuity of \(\nabla f_q\) imply that there exists a constant \(c_2\) such that

$$\begin{aligned} \left\| \textbf{y}^{(k+1)}-\textbf{y}^{(k)}\right\| _F = \left\| \nabla f_q(\mathbf {\Xi }^{(k+1)})-\nabla f_q(\mathbf {\Xi }^{(k)})\right\| _F \le c_2 \left\| \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right\| _F, \end{aligned}$$

which concludes the proof. \(\square \)

We now recall the definition of an \(\omega \)-strongly convex function.

Definition 1

A differentiable function f is called strongly convex with parameter \(\omega >0\) if the following inequality holds for all points x, y in its domain

$$\begin{aligned} f(x)-f(y)\le \langle \nabla f(x),x-y\rangle -\frac{\omega }{2}\left\| x-y\right\| ^{2}. \end{aligned}$$

We can now show the following proposition.

Proposition 1

Under Assumptions 1 and 2 and with the notation above, assume that \(\rho \) is large enough so that \(\mathcal {L}_\rho \left( \mathbf {\Sigma },\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \) and \(\mathcal {L}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi };\textbf{y}^{(k)}\right) \) are both \(\omega \)-strongly convex functions of \(\mathbf {\Sigma }\) and \(\mathbf {\Xi }\), respectively. Then,

$$\begin{aligned}&{\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k+1)};\textbf{y}^{(k+1)}\right) - {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \\&\le \left( \frac{c_2^2}{\rho }-\frac{\omega }{2}\right) \left\| \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right\| _F^2 - \frac{\omega }{2}\left\| \mathbf {\Sigma }^{(k+1)}-\mathbf {\Sigma }^{(k)}\right\| _F^2. \end{aligned}$$

Proof

We rewrite the difference between the augmented Lagrangians in the statement in an equivalent form

$$\begin{aligned}&{\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k+1)};\textbf{y}^{(k+1)}\right) - {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \nonumber \\&={\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k+1)};\textbf{y}^{(k+1)}\right) - {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k+1)};\textbf{y}^{(k)}\right) \nonumber \\&\quad +{\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k+1)};\textbf{y}^{(k)}\right) - {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) . \end{aligned}$$
(17)

By manipulating the first two terms on the right-hand side of (17) and by using the definition of \(\textbf{y}^{(k+1)}\) in the last line of (4), we obtain

$$\begin{aligned}&\left\langle \textbf{y}^{(k+1)},\mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }^{(k+1)} \right\rangle - \left\langle \textbf{y}^{(k)},\mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }^{(k+1)} \right\rangle \nonumber \\&= \left\langle \textbf{y}^{(k+1)}-\textbf{y}^{(k)},\mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }^{(k+1)} \right\rangle \nonumber \\&= \left\langle \textbf{y}^{(k+1)}-\textbf{y}^{(k)},\frac{1}{\rho }(\textbf{y}^{(k+1)}-\textbf{y}^{(k)}) \right\rangle = \frac{1}{\rho }\left\| \textbf{y}^{(k+1)}-\textbf{y}^{(k)}\right\| ^2_F. \end{aligned}$$
(18)

We consider the last two terms of the right-hand side of (17). Applying the \(\omega \)-strong convexity of \({\mathcal {L}}_\rho \) gives

$$\begin{aligned}&{\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k+1)};\textbf{y}^{(k)}\right) - {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \nonumber \\&={\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k+1)};\textbf{y}^{(k)}\right) - {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \nonumber \\&\quad +{\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) - {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \nonumber \\&\le \left\langle \nabla {\mathcal {L}}_\rho \left( \mathbf {\Xi }^{(k+1)}\right) , \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)} \right\rangle - \frac{\omega }{2}\left\| \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right\| _F^2 \nonumber \\&\quad + \left\langle \nabla {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)}\right) , \mathbf {\Sigma }^{(k+1)}-\mathbf {\Sigma }^{(k)}\right\rangle - \frac{\omega }{2}\left\| \mathbf {\Sigma }^{(k+1)}-\mathbf {\Sigma }^{(k)}\right\| _F^2 \nonumber \\&= -\frac{\omega }{2} \left( \left\| \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right\| _F^2 + \left\| \mathbf {\Sigma }^{(k+1)}-\mathbf {\Sigma }^{(k)}\right\| _F^2 \right) , \end{aligned}$$
(19)

where the last equality follows from \(\nabla {\mathcal {L}}_\rho \left( \mathbf {\Xi }^{(k+1)}\right) =0\) and \(\nabla {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)}\right) =0\). Then, summing (18) and (19) yields

$$\begin{aligned} \begin{aligned}&{\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }^{(k+1)};\textbf{y}^{(k+1)}\right) - {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \\&\le \frac{1}{\rho }\left\| \textbf{y}^{(k+1)}-\textbf{y}^{(k)}\right\| ^2_F -\frac{\omega }{2} \left( \left\| \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right\| _F^2 + \left\| \mathbf {\Sigma }^{(k+1)}-\mathbf {\Sigma }^{(k)}\right\| _F^2 \right) \\&\le \frac{c_2^2}{\rho } \left\| \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right\| _F^2 - \frac{\omega }{2} \left( \left\| \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right\| _F^2 + \left\| \mathbf {\Sigma }^{(k+1)}-\mathbf {\Sigma }^{(k)}\right\| _F^2 \right) , \end{aligned} \end{aligned}$$

where the second inequality follows from Lemma 1. Rearranging this expression concludes the proof. \(\square \)

We can now show that the sequence \(\left\{ {\mathcal {L}}_\rho (\mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)})\right\} _{k\in \mathbb {N}}\) converges to a limit point.

Lemma 2

Under the assumptions and notation of Proposition 1, assume that \(\rho \) is large enough so that \(\frac{c_2^2}{\rho }-\frac{\omega }{2}<0\) and denote by

$$\begin{aligned} \nu = \inf \left\{ \frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+ \frac{\mu }{q}\left\| L(\mathbf {\Sigma })\right\| _{q,\varepsilon }^q \right\} . \end{aligned}$$

Then,

$$\begin{aligned} \lim _{k\rightarrow \infty } {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \ge \nu , \end{aligned}$$

and the sequence \(\left\{ {\mathcal {L}}_\rho (\mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)})\right\} _{k\in \mathbb {N}}\) converges.

Proof

If \(\frac{c_2^2}{\rho }-\frac{\omega }{2}<0\), then the sequence \(\left\{ {\mathcal {L}}_\rho (\mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)})\right\} _{k\in \mathbb {N}}\) is monotonically decreasing. We now show that it is bounded from below. From the proof of Lemma 1, we obtain that \(\textbf{y}^{(k)} = \nabla f_q\left( \mathbf {\Xi }^{(k)}\right) \), where \(f_q(\cdot )=\frac{\mu }{q}\left\| L(\cdot )\right\| _{q,\varepsilon }^q\). Plugging the latter equality into the definition of \(\mathcal {L}_\rho \), we obtain that the augmented Lagrangian evaluated at \(\left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \) satisfies

$$\begin{aligned} {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right)&= \frac{1}{2}\left\| M(\mathbf {\Sigma }^{(k)})-\textbf{B}\right\| _F^2+ f_q(\mathbf {\Xi }^{(k)}) + \left\langle \nabla f_q(\mathbf {\Xi }^{(k)}),\mathbf {\Sigma }^{(k)}-\mathbf {\Xi }^{(k)}\right\rangle \\&\phantom {=} + \frac{\rho }{2}\left\| \mathbf {\Sigma }^{(k)}-\mathbf {\Xi }^{(k)}\right\| _F^2 \\&\ge \frac{1}{2}\left\| M(\mathbf {\Sigma }^{(k)})-\textbf{B}\right\| _F^2+ \frac{\omega }{2}\left\| \mathbf {\Sigma }^{(k)}-\mathbf {\Xi }^{(k)}\right\| _F^2 + f_q(\mathbf {\Sigma }^{(k)}) \\&\phantom {=} +\frac{\rho }{2}\left\| \mathbf {\Sigma }^{(k)}-\mathbf {\Xi }^{(k)}\right\| _F^2 \\&\ge \frac{1}{2}\left\| M(\mathbf {\Sigma }^{(k)})-\textbf{B}\right\| _F^2+ \frac{\mu }{q}\left\| L(\mathbf {\Sigma }^{(k)})\right\| _{q,\varepsilon }^q \ge \nu , \end{aligned}$$

where the first inequality follows from the \(\omega \)-strong convexity of \(f_q(\cdot )=\frac{\mu }{q}\left\| L(\cdot )\right\| _{q,\varepsilon }^q\). The convergence of the sequence \(\left\{ {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \right\} _{k\in \mathbb {N}}\) is ensured by its monotonic decrease and its boundedness from below. \(\square \)

We are now in a position to show that the sequences \(\left\{ \mathbf {\Sigma }^{(k)}\right\} _{k\in \mathbb {N}}\) and \(\left\{ \mathbf {\Xi }^{(k)}\right\} _{k\in \mathbb {N}}\) converge to a limit point.

Lemma 3

With the notation and the assumptions of Lemma 2, the sequences \(\left\{ \mathbf {\Sigma }^{(k)}\right\} _{k\in \mathbb {N}}\) and \(\left\{ \mathbf {\Xi }^{(k)}\right\} _{k\in \mathbb {N}}\) converge.

Proof

Let k be fixed, but arbitrary. From Proposition 1 we have

$$\begin{aligned}&{\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(0)},\mathbf {\Xi }^{(0)};\textbf{y}^{(0)}\right) - {\mathcal {L}}_\rho \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)}\right) \\&\ge -\sum _{j=0}^{k-1} \left[ \left( \frac{c_2^2}{\rho }-\frac{\omega }{2}\right) \left\| \mathbf {\Xi }^{(j+1)}-\mathbf {\Xi }^{(j)}\right\| _F^2 - \frac{\omega }{2}\left\| \mathbf {\Sigma }^{(j+1)}-\mathbf {\Sigma }^{(j)}\right\| _F^2 \right] \ge 0, \end{aligned}$$

where the last inequality follows from the assumption that \(\frac{c_2^2}{\rho }-\frac{\omega }{2}<0\). Denote by \(\mathcal {L}_\rho ^*\) the limit point of the sequence \(\left\{ {\mathcal {L}}_\rho (\mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)};\textbf{y}^{(k)})\right\} _{k\in \mathbb {N}}\), which exists thanks to Lemma 2. Then,

$$\begin{aligned}&\sum _{j=0}^{\infty } \left[ \left( \frac{\omega }{2}-\frac{c_2^2}{\rho }\right) \left\| \mathbf {\Xi }^{(j+1)}-\mathbf {\Xi }^{(j)}\right\| _F^2 + \frac{\omega }{2}\left\| \mathbf {\Sigma }^{(j+1)}-\mathbf {\Sigma }^{(j)}\right\| _F^2 \right] \\&\le {\mathcal {L}}_\rho (\mathbf {\Sigma }^{(0)},\mathbf {\Xi }^{(0)};\textbf{y}^{(0)}) - {\mathcal {L}}_\rho ^*. \end{aligned}$$

Therefore, we have

$$\begin{aligned} \lim _{j\rightarrow \infty }\left\| \mathbf {\Sigma }^{(j+1)}-\mathbf {\Sigma }^{(j)}\right\| _F^2=0\quad \text{ and }\quad \lim _{j\rightarrow \infty }\left\| \mathbf {\Xi }^{(j+1)}-\mathbf {\Xi }^{(j)}\right\| _F^2=0, \end{aligned}$$

and, since the sequences \(\left\{ \mathbf {\Sigma }^{(j)}\right\} _{j\in \mathbb {N}}\) and \(\left\{ \mathbf {\Xi }^{(j)}\right\} _{j\in \mathbb {N}}\) are bounded thanks to Assumption 1, they converge to a limit point. \(\square \)

We can now prove our main result.

Theorem 1

Under the assumptions and with the notation above, let \(\left( \mathbf {\Sigma }^*,\mathbf {\Xi }^*\right) \) be the limit point of the sequence \(\left\{ \left( \mathbf {\Sigma }^{(k)},\mathbf {\Xi }^{(k)}\right) \right\} _{k\in \mathbb {N}}\) and set \(g_M(\cdot )=\frac{1}{2}\left\| M(\cdot )-\textbf{B}\right\| ^2_F\). Then, \((\mathbf {\Sigma }^*,\mathbf {\Xi }^*)\) satisfies:

  1. (i)

    \(\mathbf {\Sigma }^*=\mathbf {\Xi }^*\);

  2. (ii)

    The sequence \(\left\{ \textbf{y}^{(k)}\right\} _{k\in \mathbb {N}}\) converges to a limit point \(\textbf{y}^*\) such that \(\nabla f_q(\mathbf {\Xi }^*)-\textbf{y}^*=0\) and \(\nabla g_M(\mathbf {\Sigma }^*)+\textbf{y}^*=0\);

  3. (iii)

    \(\mathbf {\Sigma }^*\) is a stationary point of \(g_M+f_q\).

Proof

First we observe that the existence of the limit point \((\mathbf {\Sigma }^*,\mathbf {\Xi }^*)\) is ensured by Lemma 3. We can now show the three points in the statement of the theorem.

  1. (i)

    From Lemma 3 we have that \(\left\| \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right\| _F \rightarrow 0\) as \(k\rightarrow \infty \), therefore, Lemma 1 yields \(\left\| \textbf{y}^{(k+1)}-\textbf{y}^{(k)}\right\| _F \rightarrow 0\) as \(k\rightarrow \infty \). From the definition of \(\textbf{y}^{(k+1)}\) in (4), it follows that

    $$\begin{aligned} \lim _{k\rightarrow \infty }\left\| \mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }^{(k+1)}\right\| _F=\lim _{k\rightarrow \infty }\left\| \textbf{y}^{(k+1)}-\textbf{y}^{(k)}\right\| _F=0, \end{aligned}$$

    that implies \(\mathbf {\Sigma }^*=\mathbf {\Xi }^*\).

  2. (ii)

    Recall that (16) states that \(\textbf{y}^{(k)}=\nabla f_q\left( \mathbf {\Xi }^{(k)}\right) \). Therefore,

    $$\begin{aligned} \textbf{y}^*=\lim _{k\rightarrow \infty }\nabla f_q\left( \mathbf {\Xi }^{(k)}\right) =\nabla f_q\left( \mathbf {\Xi }^*\right) , \end{aligned}$$

    where the last equality follows from the continuity of \(\nabla f_q\). This proves that the sequence \(\left\{ \textbf{y}^{(k)}\right\} _{k\in \mathbb {N}}\) converges and the first relation in the statement trivially follows. The optimality condition on \(\mathbf {\Sigma }^{(k+1)}\) [see (5)] implies

    $$\begin{aligned} 0&=\nabla g_M\left( \mathbf {\Sigma }^{(k+1)}\right) + \textbf{y}^{(k)} + \rho \left( \mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }^{(k)}\right) \\&=\nabla g_M\left( \mathbf {\Sigma }^{(k+1)}\right) + \textbf{y}^{(k+1)} + \rho \left( \mathbf {\Xi }^{(k+1)}-\mathbf {\Xi }^{(k)}\right) . \end{aligned}$$

    We obtain the second relation in the statement by taking the limit as \(k\rightarrow \infty \).

  3. (iii)

    Since \(\mathbf {\Sigma }^*=\mathbf {\Xi }^*\), adding the two equalities in (ii) shows that

    $$\begin{aligned} \nabla f_q(\mathbf {\Sigma }^*)+\nabla g_M(\mathbf {\Sigma }^*)=0, \end{aligned}$$

    which concludes the proof.

\(\square \)

5 Variations of Algorithm 1

We now present two variations of Algorithm 1 that either improve the accuracy of the computed solutions or lower the computational cost. At present we are not able to prove the convergence of these methods, so they remain heuristic algorithms. Nevertheless, extensive numerical experience shows that they perform well and provide good reconstructions.

5.1 Nonnegativity Constraint

Since the electrical conductivity is a nonnegative quantity, we wish to add the constraint \(\mathbf {\Sigma }\ge 0\) on the computed solution, i.e., we wish to solve

$$\begin{aligned} \min _{\mathbf {\Sigma }\ge 0}\frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+ \frac{\mu }{q}\left\| L\left( \mathbf {\Sigma }\right) \right\| _{q,\varepsilon }^q. \end{aligned}$$
(20)

To do this we follow the procedure in [7, 13]. We rewrite the above minimization problem as

$$\begin{aligned} \min _{\mathbf {\Sigma }}\frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+\frac{\mu }{q}\left\| L\left( \mathbf {\Sigma }\right) \right\| _{q,\varepsilon }^q+\iota _0\left( \mathbf {\Sigma }\right) , \end{aligned}$$

where \(\iota _0\) denotes the indicator function of the nonnegative cone, i.e.,

$$\begin{aligned} \iota _0(\mathbf {\Sigma })={\left\{ \begin{array}{ll} 0& \mathbf {\Sigma }\ge 0,\\ +\infty & \text{ else. } \end{array}\right. } \end{aligned}$$

We can reformulate this minimization problem as a constrained one by

$$\begin{aligned} \min _{\mathbf {\Sigma },\;\mathbf {\Xi }_L,\;\mathbf {\Xi }_0}\left\{ \frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+\frac{\mu }{q}\left\| L\left( \mathbf {\Xi }_L\right) \right\| _{q,\varepsilon }^q+\iota _0\left( \mathbf {\Xi }_0\right) ,\;\mathbf {\Sigma }=\mathbf {\Xi }_L,\;\mathbf {\Sigma }=\mathbf {\Xi }_0\right\} . \end{aligned}$$

The associated augmented Lagrangian is

$$\begin{aligned} \mathcal {L}_\rho \left( \mathbf {\Sigma },\mathbf {\Xi }_L,\mathbf {\Xi }_0;\textbf{y}_L,\textbf{y}_0\right) =&\frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+\frac{\mu }{q}\left\| L\left( \mathbf {\Xi }_L\right) \right\| _{q,\varepsilon }^q+\iota _0\left( \mathbf {\Xi }_0\right) \\&+\left\langle \textbf{y}_L,\mathbf {\Sigma }-\mathbf {\Xi }_L\right\rangle +\frac{\rho }{2}\left\| \mathbf {\Sigma }-\mathbf {\Xi }_L\right\| _F^2\\&+\left\langle \textbf{y}_0,\mathbf {\Sigma }-\mathbf {\Xi }_0\right\rangle +\frac{\rho }{2}\left\| \mathbf {\Sigma }-\mathbf {\Xi }_0\right\| _F^2. \end{aligned}$$

The ADMM iterations can be written as

$$\begin{aligned} \mathbf {\Sigma }^{(k+1)}&=\arg \min _{\mathbf {\Sigma }}\mathcal {L}_\rho \left( \mathbf {\Sigma },\mathbf {\Xi }_L^{(k)},\mathbf {\Xi }_0^{(k)};\textbf{y}_L^{(k)},\textbf{y}_0^{(k)}\right) ,\\ \mathbf {\Xi }_L^{(k+1)}&=\arg \min _{\mathbf {\Xi }_L}\mathcal {L}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }_L,\mathbf {\Xi }_0^{(k)};\textbf{y}_L^{(k)},\textbf{y}_0^{(k)}\right) ,\\ \mathbf {\Xi }_0^{(k+1)}&=\arg \min _{\mathbf {\Xi }_0}\mathcal {L}_\rho \left( \mathbf {\Sigma }^{(k+1)},\mathbf {\Xi }_L^{(k+1)},\mathbf {\Xi }_0;\textbf{y}_L^{(k)},\textbf{y}_0^{(k)}\right) ,\\ \textbf{y}_L^{(k+1)}&=\textbf{y}_L^{(k)}+\rho \left( \mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }_L^{(k+1)}\right) ,\\ \textbf{y}_0^{(k+1)}&=\textbf{y}_0^{(k)}+\rho \left( \mathbf {\Sigma }^{(k+1)}-\mathbf {\Xi }_0^{(k+1)}\right) . \end{aligned}$$

The \(\mathbf {\Sigma }^{(k+1)}\) subproblem can be reformulated as

$$\begin{aligned} \mathbf {\Sigma }^{(k+1)}=\arg \min _\mathbf {\Sigma }\frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2 + \frac{\rho }{2}\left\| \begin{bmatrix}I_n\\ I_n\end{bmatrix}\mathbf {\Sigma }-\begin{bmatrix} \mathbf {\Xi }^{(k)}_L-\frac{\textbf{y}_L^{(k)}}{\rho }\\ \mathbf {\Xi }^{(k)}_0-\frac{\textbf{y}_0^{(k)}}{\rho } \end{bmatrix}\right\| _F^2. \end{aligned}$$

This minimization problem can be solved in the same way described in Sect. 3. The only difference is in the definition of the functional \(\widetilde{M}\) and of its Jacobian matrix, which become

$$\begin{aligned} \widetilde{M}(\varvec{\sigma })=\begin{bmatrix} M(\varvec{\sigma })\\ \sqrt{\rho }\varvec{\sigma }\\ \sqrt{\rho }\varvec{\sigma }\end{bmatrix} \quad \text{ and }\quad \widetilde{J}(\varvec{\sigma })=\begin{bmatrix} J(\varvec{\sigma })\\ \sqrt{\rho }I_n\\ \sqrt{\rho }I_n\end{bmatrix}. \end{aligned}$$
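For illustration, a minimal NumPy sketch of how the stacked residual and Jacobian of this augmented least-squares problem could be assembled is given below. All quantities are assumed to be vectorized (1-D arrays), and M and jac are hypothetical callables returning \(M(\varvec{\sigma })\) and \(J(\varvec{\sigma })\); the actual nonlinear least-squares solver from [6] is not reproduced here.

```python
import numpy as np

def stacked_residual_and_jacobian(sigma, b, xi_L, xi_0, y_L, y_0, rho, M, jac):
    """Residual and Jacobian of the augmented nonlinear least-squares problem
    for the Sigma-update (a sketch; M and jac are hypothetical callables)."""
    sq = np.sqrt(rho)
    r = np.concatenate([
        M(sigma) - b,                       # data-misfit block
        sq * (sigma - (xi_L - y_L / rho)),  # coupling with Xi_L
        sq * (sigma - (xi_0 - y_0 / rho)),  # coupling with Xi_0
    ])
    J = np.vstack([jac(sigma),
                   sq * np.eye(sigma.size),
                   sq * np.eye(sigma.size)])
    return r, J
```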

The minimization problem for the computation of \(\mathbf {\Xi }_L^{(k+1)}\) is identical to the one in Sect. 3 and, therefore, we do not dwell on it here.

Finally, we discuss the minimization problem to obtain \(\mathbf {\Xi }^{(k+1)}_0\). Dropping the constant terms with respect to \(\mathbf {\Xi }_0\), this can be rewritten as

$$\begin{aligned} \mathbf {\Xi }^{(k+1)}_0=\arg \min _{\mathbf {\Xi }_0}\frac{1}{2}\left\| \mathbf {\Xi }_0-\left( \mathbf {\Sigma }^{(k+1)}+\frac{\textbf{y}_0^{(k)}}{\rho }\right) \right\| _F^2+\iota _0\left( \mathbf {\Xi }_0\right) , \end{aligned}$$

which can be solved in closed form as

$$\begin{aligned} \mathbf {\Xi }^{(k+1)}_0=\max \left\{ \left( \mathbf {\Sigma }^{(k+1)}+\frac{\textbf{y}_0^{(k)}}{\rho }\right) ,0\right\} , \end{aligned}$$

where the maximum is taken element-wise.
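This update is a one-line element-wise projection onto the nonnegative cone; a minimal NumPy sketch (assuming the iterates are stored as arrays) reads:

```python
import numpy as np

def update_xi0(Sigma, y0, rho):
    # Closed-form Xi_0-update: element-wise projection onto the nonnegative cone.
    return np.maximum(Sigma + y0 / rho, 0.0)
```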

We do not present a proof of convergence for this algorithm here. Note that the proof in Sect. 4 cannot be applied, since \(\iota _0\), while convex, is only weakly lower semicontinuous and not differentiable.

5.2 Non-stationary Regularization Parameter

The method proposed in Sect. 3.1 requires the computation of many \(\mathbf {\Sigma }_\mu \) in order to select the “best” one. This approach can be computationally demanding, since problem (2) has to be solved several times, although it ensures the convergence of the method, as shown in Sect. 4. Here we describe a heuristic, but computationally cheaper, way to determine a suitable value of \(\mu \). We follow the ideas in [8, Section 3.2.2] and in [42].

Instead of running Algorithm 1 for a fixed value of \(\mu \), we construct a non-stationary method that generates a sequence of values of \(\mu \) so that the RWP is satisfied at each (inner) iteration.

At iteration k of Algorithm 1 (or its constrained counterpart in Sect. 5.1) the parameter \(\mu \) is considered only when computing \(\mathbf {\Xi }^{(k+1)}\), i.e., when solving (10). We recall that this minimization problem is solved by the MM algorithm described in Sect. 3. At each iteration (kj) of the inner loop that computes \(\mathbf {\Xi }^{(k+1)}\), we determine a “proper” parameter \(\mu _{k,j}\). As stated above, to select \(\mu _{k,j}\) at each iteration, we still exploit the RWP. Denote by \(\varvec{\xi }_\mu ^{(k,j)}\) the value obtained in (11) with \(\eta =\frac{\mu \varepsilon ^{q-2}}{\rho }\) and let \(\mathbf {\Xi }_\mu ^{(k,j)}\) be such that

$$\begin{aligned} \varvec{\xi }_\mu ^{(k,j)}=\textrm{vec}\left( \mathbf {\Xi }_\mu ^{(k,j)}\right) . \end{aligned}$$

We define

$$\begin{aligned} \textbf{R}_\mu ^{(k,j)}=M\left( \mathbf {\Xi }_\mu ^{(k,j)}\right) -\textbf{B}\quad \text{ and }\quad W^{(k,j)}(\mu )=\frac{\left\| \textbf{R}_\mu ^{(k,j)}\star \textbf{R}_\mu ^{(k,j)}\right\| _F^2}{\left\| \textbf{R}_\mu ^{(k,j)}\right\| _F^4} \end{aligned}$$

as in (15). We determine a “proper” value for \(\mu _{k,j}\) by

$$\begin{aligned} \mu _{k,j}=\arg \min _\mu W^{(k,j)}(\mu ) \end{aligned}$$

and we set the iterate \(\varvec{\xi }^{(k,j)}=\varvec{\xi }^{(k,j)}_{\mu _{k,j}}\).
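A minimal sketch of this selection step is given below. Here xi_of_mu is a hypothetical callable returning \(\mathbf {\Xi }_\mu ^{(k,j)}\) for a given \(\mu \) [i.e., evaluating (11) and reshaping], autocorr is a hypothetical callable implementing the operation denoted by \(\star \) in (15), and SciPy's bounded scalar minimizer is used as a stand-in for a bounded one-dimensional solver (see below); the bounds are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def whiteness(R, autocorr):
    # W = ||R star R||_F^2 / ||R||_F^4, with "star" the operation in (15),
    # here delegated to the hypothetical callable autocorr.
    return np.linalg.norm(autocorr(R), 'fro')**2 / np.linalg.norm(R, 'fro')**4

def select_mu(xi_of_mu, M, B, autocorr, mu_min=1e-10, mu_max=1e-1):
    # Pick mu_{k,j} as the minimizer of the whiteness measure W^{(k,j)}(mu).
    def W(mu):
        R = M(xi_of_mu(mu)) - B
        return whiteness(R, autocorr)
    return minimize_scalar(W, bounds=(mu_min, mu_max), method='bounded').x
```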

By employing this strategy, we require only a single run of the modified version of Algorithm 1. However, the cost per iteration increases due to the additional computational cost required to minimize \(W^{(k,j)}(\mu )\). Nevertheless, the overall cost is much lower than when Algorithm 1 is used in a stationary way for several values of \(\mu \); see the numerical results in Sect. 6.

In practice, to minimize the function \(W^{(k,j)}(\mu )\), we employ the Matlab routine fminbnd. This method requires several evaluations of \(W^{(k,j)}(\mu )\) for different values of \(\mu \), which may become computationally expensive. To further lower the computational cost, we impose the RWP only on a subset of the residual \(\textbf{R}\). We fix a number \(2\le t\le s\) of contiguous columns of \(\textbf{R}\) to be considered when minimizing \(W^{(k,j)}\). To ensure that most of the columns are eventually considered, we randomly select the set of t columns at each outer iteration k. In particular, at iteration k we fix a random index \(i_k\in \{1,\ldots ,s-(t-1)\}\) and we define

$$\begin{aligned} \widetilde{\textbf{R}}_\mu ^{(k,j)}=\left( M\left( \mathbf {\Xi }_\mu ^{(k,j)}\right) -\textbf{B}\right) _{i_k:i_k+t-1}, \end{aligned}$$

where \((A)_{i:j}\) denotes the matrix obtained by extracting columns i through j of the matrix A. Thanks to the structure of the function M, we can compute \(\widetilde{\textbf{R}}_\mu ^{(k,j)}\) as

$$\begin{aligned} \widetilde{\textbf{R}}_\mu ^{(k,j)}=\begin{bmatrix} M\left( \mathbf {\Xi }_\mu ^{(k,j)}\right) _{i_k}&M\left( \mathbf {\Xi }_\mu ^{(k,j)}\right) _{i_k+1}&\ldots&M\left( \mathbf {\Xi }_\mu ^{(k,j)}\right) _{i_k+t-1}\end{bmatrix}-\textbf{B}_{i_k:i_k+t-1}. \end{aligned}$$

Therefore, computing \(\widetilde{\textbf{R}}_\mu ^{(k,j)}\) requires only t one-dimensional evaluations of M, instead of s. In our experiments we set \(t=4\).
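A sketch of the random column-block extraction (with \(t=4\), as in our experiments) could look as follows; M_col is a hypothetical callable returning a single column of \(M(\mathbf {\Xi })\), i.e., one evaluation of the one-dimensional forward model, and indices are 0-based.

```python
import numpy as np

def random_residual_block(M_col, Xi_mu, B, t=4, rng=None):
    # M_col(Xi, j) is a hypothetical callable returning the j-th column of M(Xi).
    rng = np.random.default_rng() if rng is None else rng
    n_cols = B.shape[1]                        # number of columns of the residual
    i_k = rng.integers(0, n_cols - t + 1)      # random starting column
    cols = range(i_k, i_k + t)                 # t contiguous columns
    R_block = np.column_stack([M_col(Xi_mu, j) for j in cols]) - B[:, i_k:i_k + t]
    return R_block, i_k
```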

6 Numerical Examples

In this section, we show the effectiveness of the numerical procedures described in this paper through three synthetic examples. In Test 1 we detail the behavior of our algorithmic proposal with respect to the different parameters involved. In Test 2, we generate several FDEM datasets by considering one model profile for the electrical conductivity and two different FDEM devices; we illustrate the effectiveness of our heuristic choice rule for the regularization parameter and compare the stationary and non-stationary versions of our method. Finally, in Test 3, we compare the results obtained by the algorithms proposed here with the ones obtained in [5].

We briefly describe the approach in [5]. The algorithm proposed there tackles the minimization problem (20), which is reformulated as

$$\begin{aligned} \min _{\mathbf {\Sigma },\mathbf {\Xi }}\frac{1}{2}\left\| M(\mathbf {\Sigma })-\textbf{B}\right\| _F^2+\frac{\mu }{q}\left\| L\left( \mathbf {\Xi }\right) \right\| _{q,\varepsilon }^q+\iota _0\left( \mathbf {\Sigma }\right) +\frac{\beta }{2}\left\| \mathbf {\Xi }-\mathbf {\Sigma }\right\| _F^2, \end{aligned}$$

where \(\beta >0\) is “large enough”. The solution is obtained using the alternating minimization algorithm, which introduces an additional parameter to estimate, namely \(\beta \). The minimization with respect to \(\mathbf {\Xi }\) is performed as in our current proposal, while the other minimization is performed using the method in [18]. Since this algorithm does not project the problem into GKS, its overall computational cost is higher.

To simulate experimental errors, we add white Gaussian noise to the data by letting

$$\begin{aligned} \varvec{\eta }=\delta \frac{ \left\| \textbf{B}_\textrm{exact}\right\| _F}{\left\| \textbf{W}\right\| _F}\textbf{W}, \end{aligned}$$

where \(\textbf{W}\) is a matrix with normally distributed entries, \(\textbf{B}_\textrm{exact}\) is the exact synthetic data, and \(\delta \) denotes the noise level, so that the noisy data are \(\textbf{B}=\textbf{B}_\textrm{exact}+\varvec{\eta }\).

In order to assess the quality of the algorithms, we compute the Relative Restoration Error (RRE)

$$\begin{aligned} \textrm{RRE}(\mathbf {\Sigma })=\frac{\left\| \mathbf {\Sigma }-\mathbf {\Sigma }_\textrm{exact}\right\| _F}{\left\| \mathbf {\Sigma }_{\textrm{exact}}\right\| _F}, \end{aligned}$$

where \(\mathbf {\Sigma }_{\textrm{exact}}\) denotes the exact solution of the problem and \(\mathbf {\Sigma }\) the computed approximation. We also record the CPU time of every numerical approach.
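Both the noise model and the RRE above admit direct implementations; a minimal NumPy sketch, assuming the data and the solutions are stored as 2-D arrays, is the following.

```python
import numpy as np

def add_noise(B_exact, delta, rng=None):
    # White Gaussian noise scaled so that ||eta||_F = delta * ||B_exact||_F.
    rng = np.random.default_rng() if rng is None else rng
    W = rng.standard_normal(B_exact.shape)
    eta = delta * np.linalg.norm(B_exact, 'fro') / np.linalg.norm(W, 'fro') * W
    return B_exact + eta

def rre(Sigma, Sigma_exact):
    # Relative Restoration Error.
    return (np.linalg.norm(Sigma - Sigma_exact, 'fro')
            / np.linalg.norm(Sigma_exact, 'fro'))
```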

To avoid the estimation of \(\varepsilon \) we select it in an adaptive way. In particular, when solving the problem for \(\mathbf {\Xi }^{(k+1)}\) we fix

$$\begin{aligned} \varepsilon ^{(k+1)}=\frac{1}{100nm}\sum _{i=1}^n\sum _{j=1}^m\left( \mathbf {\Sigma }^{(k+1)}\right) _{i,j}, \end{aligned}$$

i.e., one-hundredth of the mean value of \(\mathbf {\Sigma }^{(k+1)}\). Note that, in practice, the value for \(\varepsilon \) has little to no impact on the quality of the computed results; see, e.g., [9, 10].
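For reference, this adaptive rule amounts to a single line in the implementation (a sketch, assuming \(\mathbf {\Sigma }^{(k+1)}\) is stored as a NumPy array):

```python
def adaptive_epsilon(Sigma):
    # epsilon^{(k+1)} = mean(Sigma^{(k+1)}) / 100
    return Sigma.mean() / 100.0
```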

In all the experiments we adopt the following stopping criteria: we set the maximum number of iterations to 500 and the convergence tolerance to \(\tau =10^{-3}\).

All simulations have been performed using MATLAB version 9.10 (R2021a) on an Intel® Xeon® Gold 6136 CPU 3.00GHz processor with 128 GB of RAM and 32 cores, running the Ubuntu GNU/Linux operating system.

Test 1. We wish to detail the behavior of the stationary algorithm with respect to the various parameters involved. In particular, we consider the effects of \(\mu \) and \(\rho \). The analysis with respect to \(\varepsilon \) produces results similar to those in [9, 10], where the authors determined that the algorithm is stable with respect to the value of \(\varepsilon \) in terms of RRE, but small values of \(\varepsilon \) may lead to slow convergence.

We first observe that, in order to satisfy the hypothesis of Theorem 1, i.e., that \(\mathcal {L}_\rho (\mathbf {\Sigma }^{(k+1)},\mathbf {\Xi };\textbf{y}^{(k)})\) is an \(\omega \)-strongly convex function of \(\mathbf {\Xi }\), we require that \(\rho \) is large enough when compared to \(\mu \). Therefore, the values of these two parameters are necessarily linked.

We construct a synthetic example concerning the inversion of the imaginary part of the data (quadrature component of the signal) generated by the Geophex GEM-2 device with the following configuration: both orientations of the coils, inter-coil distance \(r=1.66~\textrm{m}\), five operating frequencies \(f = 775\), 1175, 3925, 9825, 21725 Hz, and two measuring heights \(h= 0.75\), 1.5 m above the ground.

Therefore, we have \(s=20\) measurements for each position \(j=1,\ldots ,m\). Choosing \(m=10\) soundings, the forward model generates the matrix \(\textbf{B}_{\textrm{exact}}\) of dimension \(s\times m\) of exact synthetic measurements. In this experiment, the data are perturbed by white Gaussian noise of levels \(\delta = 10^{-3}\) and \(\delta = 10^{-2}\). We discretize the soil with \(n=50\) layers, all of the same thickness, up to the depth of 6 m.

We run our stationary method for \((\rho ,\mu )\in \{\rho _1,\ldots ,\rho _{10}\}\times \{\mu _1,\ldots ,\mu _{10}\}\), where \(\rho _j\) and \(\mu _j\) are logarithmically equispaced numbers between \(10^{-7}\) and \(10^{-3}\).

We show in Fig. 1 the relative restoration error obtained with all the considered pairs of parameters and the two noise levels. We can see that the obtained RRE depends on the values of both \(\mu \) and \(\rho \). In particular, choosing a very small value of \(\rho \) may lead to instabilities, especially if \(\mu \) is large. This is due to the fact that, in this case, the hypothesis of Theorem 1 may not be satisfied, since \({\mathcal {L}}_\rho \) may not be \(\omega \)-strongly convex.

Fig. 1

Test 1. Relative restoration error obtained for noise level \(10^{-3}\) (left) and \(10^{-2}\) (right) using Algorithm 1

Fig. 2

Test 1. Relative restoration error against the number of iterations for noise level \(10^{-3}\) (left) and \(10^{-2}\) (right) obtained by Algorithm 1 for some selection of \(\mu \) and \(\rho \)

We report the evolution of the RRE against the number of iterations for some selections of \(\mu \) and \(\rho \) in Fig. 2. We can observe that, as \(\rho \) increases, the method slows down, while smaller values of \(\rho \) produce faster convergence. If the value of \(\rho \) is too large, the convergence may be so slow that the stopping criterion terminates the iterations too soon. Note that we cannot simply choose \(\rho \) very small, since we need to satisfy the assumptions of Theorem 1. To verify this empirically, one can monitor the behavior of \(\left\| \mathbf {\Sigma }^{(k+1)}-\mathbf {\Sigma }^{(k)}\right\| _F/\left\| \mathbf {\Sigma }^{(k)}\right\| _F\), since this quantity is expected to decay smoothly. An oscillatory behavior may indicate that a larger value of \(\rho \) is required.

Test 2. This example concerns the inversion of synthetic FDEM data generated by the Geophex GEM-2 device with the following configuration: vertical orientation of the coils and the same inter-coil distance, frequencies, and measuring heights above the ground as in Test 1.

Therefore, we have \(s=10\) measurements for each position \(j=1,\ldots ,m\). Choosing \(m=25\) soundings, the forward model generates the matrix \(\textbf{B}_{\textrm{exact}}\) of dimension \(s\times m\) of exact synthetic measurements. In this experiment, the data are perturbed by white Gaussian noise of level \(\delta = 10^{-3}\). We discretize the soil with \(n=50\) layers, all of the same thickness, up to the depth of 4.5 m.

Figure 3 reports the exact solution (a) and the reconstructions (b), (c), and (d) for three values of the parameter \(q=0.1,0.5,1\), respectively, obtained by applying Algorithm 1 with the nonnegativity constraint. The parameter \(\rho \) has been chosen as \(\rho =10^{-9}\) and \(\mu \) has been automatically estimated by the procedure described in Sect. 3.1. The initial vector is set as \(\varvec{\sigma }_0=0.5\).

Fig. 3

Test 2. Reconstructions of the electrical conductivity from data generated by the Geophex GEM-2 obtained by Algorithm 1: the plots show the exact solution (a), the approximate solution with \(q=0.1\) (b), \(q=0.5\) (c), and \(q=1\) (d)

We can observe that all reconstructions are very accurate.

Figure 4 analyzes the relation between the whiteness [see (15)] and the error for different values of the parameter \(\mu \). We can observe that the \(\textrm{RRE}\) and W, seen as functions of the regularization parameter \(\mu \), behave quite similarly; therefore, we can use the latter to determine a good approximation of the minimizer of the former.

Fig. 4

Test 2. Whiteness (left) and error (right) for the reconstruction of the electrical conductivity computed by Algorithm 1, as functions of the parameter \(\mu \). The measurement data are generated by the Geophex GEM-2 configuration. The errors and the whiteness are obtained with \(q=0.1\) (red line), \(q=0.5\) (orange dash-dotted line), and \(q=1\) (blue dashed line), respectively (Color figure online)

The reconstructions of the electrical conductivity obtained by the non-stationary algorithm described in Sect. 5 are reported in Fig. 5. Also in this case, we compare the exact solution with the ones approximated by using different values of the parameter q. From a visual analysis, the best reconstruction is obtained for \(q=1\); see Fig. 5d.

Fig. 5

Test 2. Reconstructions of the electrical conductivity from data generated by the Geophex GEM-2 obtained by the non-stationary algorithm: the plots show the exact solution (a), the approximate solution with \(q=0.1\) (b), \(q=0.5\) (c), and \(q=1\) (d)

The main advantage of the non-stationary version of Algorithm 1 is its computational cost. Table 1 shows the CPU time in seconds and the RRE of the reconstructions obtained by the two algorithms proposed in this paper. We can see that the error committed by the non-stationary approach is comparable to the one obtained by Algorithm 1. However, the non-stationary version reduces the CPU time considerably, to roughly half of that required by Algorithm 1.

Table 1 Test 2. CPU time (in seconds), RRE, and number of iterations of the two algorithms for different values of \(q=0.1, 0.5, 1\)

Test 3. In this third experiment, in order to compare the new algorithms with the one proposed in [5], we consider two synthetic datasets presented in [5, Section 5.1, Test 1]. We briefly recall them here. The complex datasets available for the inversion procedure are generated by two different FDEM devices. The first dataset is constructed considering the following configuration of the Geophex GEM-2: both orientations of the coils, inter-coil distance \(r=1.66\) m, six operating frequencies \(f = 775, 1175, 3925, 9825, 21725, 47025\) Hz, placed at a height of \(h=1\) m above the ground. Therefore, we have \(s=12\) measurements for each position \(j=1,\ldots ,m\). The second dataset is constructed considering another device, the CMD Explorer, placed at \(h=1\) m above the ground, with the following configuration: both orientations of the coils, three values of the inter-coil distance \(r=1.48, 2.82, 4.49\) m, and a frequency \(f=10\) kHz. In this way, we get \(s=6\) measurements for each position \(j=1,\ldots ,m\).

In both datasets, we add white Gaussian noise of level \(\delta = 10^{-2}\) and we discretize the soil with \(n=20\) layers up to a depth of 10 m. In this experiment, we simulate data collection along a straight line with \(m=50\) soundings. The initial vector is set as \(\varvec{\sigma }_0=0.1\).

In all the examples we have chosen the value of the parameter \(\rho \) as the smallest one such that the Jacobian matrix involved in the problem has a small enough condition number, i.e., \(\kappa _2(J)\approx 10^6\).

Moreover, an automatic choice of \(\mu \) and different values of q have been tested and compared.

Fig. 6

Test 3. Reconstruction of the electrical conductivity from data generated by the Geophex GEM-2 obtained by Algorithm 1 (left panels) and by the non-stationary algorithm (right panels). The rows report the exact solution (a) and the approximate solutions with \(q=0.1\) (b), \(q=0.5\) (c), and \(q=1\) (d)

Fig. 7

Test 3. Reconstruction of the electrical conductivity from data generated by the CMD Explorer obtained by Algorithm 1 (left panel) and by the non-stationary algorithm (right panel). The rows report the exact solution (a) and the approximate solutions with \(q=0.1\) (b), \(q=0.5\) (c), and \(q=1\) (d)

Table 2 Test 3. Comparison between Algorithm 1, non-stationary algorithm, and the algorithm proposed in [5]: relative restoration error (RRE) and number of iterations obtained with \(q=0.1, 0.5, 1\)

The reconstructions obtained from the first dataset (Geophex GEM-2 configuration) are depicted in Fig. 6, while the ones obtained from the second dataset (CMD Explorer) are illustrated in Fig. 7. Table 2 reports the RRE produced by each algorithm for both device configurations. The methods proposed in this paper produce more accurate reconstructions, in terms of RRE, than the algorithm proposed in [5].

7 Conclusions

In this paper, we have proposed an ADMM algorithm to invert FDEM data and reconstruct the electrical conductivity distribution of the ground, assuming the magnetic permeability is known. We have shown the convergence of the method and presented two variations that improve either the accuracy or the computational cost of the ADMM. We have compared the obtained results with those computed by the variational method in [5], showing through several numerical tests that the proposed algorithms compute more accurate solutions of the nonlinear problem. We remark that the same procedure could be applied to recover the magnetic permeability distribution by assuming that the electrical conductivity is known.