
Damped Dynamical Systems for Solving Equations and Optimization Problems

  • Mårten Gulliksson
  • Magnus Ögren
  • Anna Oleynik
  • Ye Zhang

Abstract

We present an approach for solving optimization problems, with or without constraints, which we call the dynamical functional particle method (DFPM). The method consists of formulating the optimization problem as a second-order damped dynamical system and then applying a symplectic method to solve it numerically. In the first part of the chapter, we give an overview of the method and provide the necessary mathematical background. We show that DFPM is a stable, efficient, and, given the optimal choice of parameters, competitive method. Optimal parameters are derived for linear systems of equations, linear least squares, and linear eigenvalue problems. A framework for solving nonlinear problems is developed and numerically tested. In the second part, we apply the method to several important applications such as image analysis, inverse problems for partial differential equations, and quantum physics. At the end, we present open problems and share some ideas of future work on generalized (nonlinear) eigenvalue problems, handling constraints with reflection, global optimization, and nonlinear ill-posed problems.

Keywords

Optimization · Damped dynamical systems · Convex problems · Eigenvalue problems · Image analysis · Inverse problems · Quantum physics · Schrödinger equation

Introduction

In this chapter we describe the idea of solving optimization problems (with or without constraints) and equations by using a second-order damped dynamical system. In order to introduce the idea, let us consider a simple example from classical mechanics. The harmonic oscillator is described by
$$\displaystyle \begin{aligned} m\ddot{u} = -ku, \quad k>0. \end{aligned} $$
(1)
Here u(t) is the distance from the equilibrium of a mass m on which a force − ku, k > 0 is acting and the dot denotes the time derivative. For example, the mass could be attached to a spring where then ku is the force acting on the mass; see Fig. 1. The solution of (1) is
$$\displaystyle \begin{aligned}u(t) = C_1\sin \left(\sqrt{\frac{k}{m}}\, t + C_2\right)\end{aligned}$$
where the constants Ci are given by the initial position and velocity at, say, t = t0. The mass never reaches equilibrium unless u(t0) = 0. However, if we in addition assume that there is some friction proportional to the velocity \(\dot {u}\), e.g., between the mass and the surface on which it is sliding, we get a damped second-order system
$$\displaystyle \begin{aligned} m\ddot{u} = -\eta \dot{u} -ku, \end{aligned} $$
(2)
where η > 0 is the friction constant. In this case, the solution is given as
$$\displaystyle \begin{aligned}u(t) = C_1 e^{\xi_1 t} + C_2 e^{\xi_2 t}\end{aligned}$$
where Ci are determined by the initial conditions and \(\xi _i = -\eta /(2m) \pm \sqrt {(\eta /(2m))^2 - k/m}.\) It is easy to see that u tends to zero, which is the equilibrium position of the mass and the stationary solution to (1). Thus, the system (2) can be used to find the stationary solution of the harmonic oscillator (1). Further, we note that u = 0 is also the solution of the (trivial) optimization problem \(\min ku^2/2\). In fact, the convex function V(u) = ku^2∕2 is the potential corresponding to the force F = −ku, which is conservative since F = −dV∕du. Of course, in this case it is easier to solve ku = 0 or \(\min ku^2/2\) than to integrate (2). However, this simple idea will be extended to solve more challenging problems.
Fig. 1

A simple oscillating spring-mass system
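To make the idea concrete, the following minimal sketch (our own illustration; the mass, spring constant, damping, and step size are arbitrary choices) integrates the damped system (2) with the symplectic Euler method used throughout this chapter and shows the mass settling at the equilibrium u = 0.

```python
# Minimal sketch: integrate the damped oscillator (2) with symplectic Euler.
# All values (m, k, eta, dt, initial data) are arbitrary illustrative choices.
m, k, eta = 1.0, 4.0, 1.0
dt, n_steps = 0.01, 5000
u, v = 1.0, 0.0                       # initial position and velocity

for _ in range(n_steps):
    v += dt * (-eta * v - k * u) / m  # velocity update from (2)
    u += dt * v                       # position update uses the new velocity

print(f"u after {n_steps} steps: {u:.2e}")  # close to the stationary solution 0
```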

Let us generalize the mechanical system slightly by imagining masses mij in a rectangular grid where each mass except on the boundary is connected to its four closest neighbors by springs. Different types of boundary conditions could be introduced that we do not specify here. The damped dynamical system for this problem is \(M\ddot {u} + \eta \dot {u} = -Au\) where u is a vector consisting of the x- and y-parts of the displacement of masses in some given order and A is a positive definite symmetric matrix with spring constants. It is not difficult to show that the stationary solution is the solution to the linear system of equations Au = 0. More generally, we realize that we can solve any linear system of equations Au = b with A positive definite by finding the stationary solution to a damped dynamical system.

The potential corresponding to the force Au is the convex function V (u) = uTAu∕2 that has a unique minimum corresponding to the solution of the linear system of equations. As before F = −Au is conservative since F = −∇V .

The continuous problem corresponding to the last example is an isotropic linear elastic continuum where u = u(x, y, t) is the displacement field. The second-order damped system is the partial differential equation (sometimes called the damped wave equation) utt + ηut = ∇⋅ (k∇u) where k = k(x, y) > 0. The stationary solution is given by the solution of Poisson's equation ∇⋅ (k∇u) = 0, and we note that damped dynamical systems can be used to find the solution of many different partial differential equations.

We now expand our idea into a more general setting and show how to numerically solve the problem in a stable and efficient way. For the sake of brevity, we omit proofs of the theorems that require introducing additional theory or are lengthy. We however provide numerous useful references to recent publications and to the state of the art where one could find the missing and additional information. Otherwise we try to supply enough details, with all the numerical results being reproducible, so the reader can expand on the theory and applications. Some of the ideas presented here are work in progress.

Our starting point is the minimization problem
$$\displaystyle \begin{aligned} \min\limits_{u\in \mathcal{H}} V(u), \end{aligned} $$
(3)
where \(\mathcal {H}\) is a real Hilbert space and \(V: \mathcal {H}\rightarrow {\mathbb R}\) is a smooth (analytical) convex functional. We use the conventional notation for inner product and norm as (⋅, ⋅) and ∥⋅∥, respectively, with subindices added to specify the underlying spaces if needed.
The main idea, already introduced above, for solving (3) is to utilize the fact that the solution to (3) is also a stationary solution, say u, to the second-order damped dynamical system
$$\displaystyle \begin{aligned} \ddot{u}(t) + \eta \dot{u}(t) = -\nabla V(u(t)),\, \eta>0, \end{aligned} $$
(4)
and this solution is unique and globally exponentially stable, see, e.g., references in Begout et al. (2015). We have seen that the problem (4) naturally appears in modeling mechanical systems where an additional relevant example is the heavy ball with friction system (HBF). In this case (4) describes the motion of a material point with positive mass. The optimization properties for HBF with different friction have been studied in detail in Attouch et al. (2000), Attouch and Alvarez (2000), and Alvarez (2000) and references therein. Throughout the whole chapter, we reserve the dot notation for the derivatives with respect to the fictitious time t. Moreover, we use u as the unknown everywhere except in section “Inverse Problems for Partial Differential Equations” where we use p.
Since the minimum of (3) is the (unique) solution of
$$\displaystyle \begin{aligned} \nabla V(u)=0, \end{aligned} $$
(5)
we can also use (4) to solve equations such as linear equations that we will expand upon in the sections to come.

After the problem has been formulated as (4), the important question is the following: How does one choose a numerical method for solving (4) that is efficient and fast enough to compete with the existing methods of solving (3) or (5)?

We observe that the dynamical system (4) is Hamiltonian with the total energy
$$\displaystyle \begin{aligned} E(u(t),\dot{u}(t)) = V(u(t)) + \dfrac{1}{2}\| \dot{u}(t) \|{}^2, \end{aligned} $$
(6)
which is also a Lyapunov function for (4). Its exponential decrease in time results in an exponential decay of ∥u(t) − u∥. Symplectic methods, such as symplectic Runge-Kutta, Störmer-Verlet, etc., are tailor-made for Hamiltonian systems and (nearly) preserve the energy over long times. This serves as the motivation for our choice of numerical method.
Let us rewrite (4) as the first-order system
$$\displaystyle \begin{aligned} \begin{array}{l} \dot{u} = v\\ \dot{v} = -\eta v - \nabla V(u). \end{array} \end{aligned} $$
(7)
Then, we apply a one-step explicit symplectic method, such as symplectic Euler or Störmer-Verlet (Hairer et al., 2006), which gives an iterative map of the form
$$\displaystyle \begin{aligned} w_{k+1} = F(w_k,\Delta t_k, \eta_k) , w_k = (u_k,v_k), k = 1, 2, \ldots, \end{aligned} $$
(8)
where the time step Δtk and damping ηk may depend on k. The choice of the parameters Δtk and ηk can be aimed at optimizing the performance of the numerical method, which generally is a nontrivial task.
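As an illustration, here is a minimal sketch (our own, not from the original text) of the map (8) realized with symplectic Euler applied to (7); grad_V stands for a user-supplied ∇V, and the test potential and parameter values are arbitrary choices.

```python
import numpy as np

def dfpm_step(u, v, grad_V, dt, eta):
    # One symplectic-Euler step for the first-order system (7):
    # the velocity is updated first, then the position uses the new velocity.
    v_next = v + dt * (-eta * v - grad_V(u))
    u_next = u + dt * v_next
    return u_next, v_next

# Toy usage: V(u) = ||u||^2/2, so grad V(u) = u and the minimizer is u = 0.
u, v = np.ones(3), np.zeros(3)
for _ in range(500):
    u, v = dfpm_step(u, v, lambda x: x, dt=0.5, eta=1.0)
print(np.linalg.norm(u))  # small: the iterates approach the minimizer
```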

We call the approach of finding the solution to (3) or (5) by solving (4) with a symplectic method the dynamical functional particle method (DFPM) (Gulliksson et al., 2013). We would like to emphasize that it is the combination of the damped dynamical system with an efficient (fast, stable, accurate) symplectic solver that makes DFPM a novel and powerful method. Even if the idea of solving minimization problems using (4) with different damping strategies goes far back (see Poljak 1964 and Sandro et al. 1979), it has not been systematically treated with symplectic solvers and optimal parameter choices.

DFPM is readily extended to constrained problems. Consider a minimization problem on a convex constraint set \(\mathcal {G}= \left \{u\in H: g(u) = 0\right \}\) where g is smooth, i.e.,
$$\displaystyle \begin{aligned} \begin{array}{l} \min_{u\in H} V(u) \\ \text{s.t. } g(u)=0. \end{array} \end{aligned} $$
(9)
The problem (9) has a unique solution u if ∇g(u) is of full rank. The corresponding dynamical system can be formulated using the Lagrange function L(u, μ) = V (u) + (∇g(u), μ), where μ is the Lagrange parameter. The dynamical system for solving (9) is given by
$$\displaystyle \begin{aligned} \ddot{u} + \eta \dot{u} = -\nabla V (u) - (\nabla g(u), \mu ) \end{aligned} $$
(10)
with μ(t) chosen such that u(t) tends to u. For existence and uniqueness of solutions to constrained problems in a more general setting, see McLachlan et al. (2014) and Alvarez et al. (2002) and references therein.
In order to solve (9) using (10), we can choose μ(t) such that u(t) either remains on the constraint set \(\mathcal {G}\) or approaches it as t grows. We intend to realize these two strategies by using (i) projection and (ii) a fully damped formulation, respectively. The first approach is executed by formulating the iterative map (8) and then projecting wk onto the set \(\mathcal {G}\) at each iteration step. This method is generally costly, but there are exceptions, e.g., eigenvalue problems with only normalization constraints (see section “Linear Eigenvalue Problems”). We consider nonlinear ill-posed inverse problems using DFPM with projection in section “Inverse Problems for Partial Differential Equations” and nonlinear eigenvalue problems in section “The Yrast Spectrum for Atoms Rotating in a Ring”. For the fully damped approach, we introduce an additional dynamical system for the constraints g as
$$\displaystyle \begin{aligned} \ddot{g}_i + \xi_i \dot{g}_i = -k_i g_i,\, \xi_i, \, k_i>0. \end{aligned} $$
(11)
Then g(u(t)) tends to zero exponentially fast, and the equations in (11) can be used to derive explicit expressions of μ(t) in (10). A first attempt in this direction was made in Gulliksson (2017) (see section “Linear Eigenvalue Problems”) and also applied to a nonlinear Schrödinger equation (see section “Excited States to the Schrödinger Equation”).

Symplectic solvers for undamped Hamiltonian systems have long been known for their excellent behavior in integration over long time intervals (Hairer et al., 2006). There has been much less research on symplectic methods for the damped system, and even less is known about the use of symplectic methods for the constrained problem (9). However, there are some results (see, e.g., Bhatt et al. 2016 and Mclachlan et al. 2006) that indicate a similarly good long-term performance for the damped case, which further supports DFPM. To the best of our knowledge, there are no other symplectic methods developed specifically to attain fast convergence to a stationary solution of (4) and (10) except DFPM for linear problems (see section “Linear Problems” and Edvardsson et al. 2015, Neuman et al. 2015, and Gulliksson 2017) and our work on the constrained problems (see sections “Linear Eigenvalue Problems” and “The Yrast Spectrum for Atoms Rotating in a Ring”, and Sandin et al. 2016 and Gulliksson 2017).

Very recently we have started to develop DFPM for ill-posed problems (see sections “Ill-Posed Problems”, “Image Analysis”, and “Inverse Problems for Partial Differential Equations”), inspired by work on second-order methods for ill-posed linear problems and the work in Attouch and Chbani (2016). One of the main advantages of DFPM as opposed to Tikhonov or other regularization techniques is the simplified choice of regularization. In the linear case, this is manifested by an a posteriori stopping time (see section “Ill-Posed Problems”) and for inverse ill-posed source problems by using a decreasing regularization term (see section “Inverse Problems for Partial Differential Equations”).

A closely related approach for solving (3) that has been studied extensively in, e.g., Smyrlis and Zisis (2004), is the steepest descent method
$$\displaystyle \begin{aligned} \dot{u}(t) + \alpha \nabla V(u(t)) =0,\, \alpha>0. \end{aligned} $$
(12)
It might seem that (12) should be better than (4) since the exponential decrease toward the stationary solution in (12) can be made arbitrarily large by choosing α large enough. However, this is not true if one takes into account the stability and accuracy of the numerical solver. DFPM has been shown to have a remarkably faster convergence to the stationary solution than any numerical method applied to (12) (see Gulliksson et al. 2012 and Edvardsson et al. 2015).

Finally, we would like to mention some other methods that are based on introducing an extra parameter in order to solve equations. These are the continuation method (Watson et al., 1997), fictitious time methods (Ascher et al., 2007; Tsai et al., 2010), and dynamical system methods developed for solving numerical linear algebra problems (Chu, 2008). Although these methods and DFPM share a common idea, the similarities do not extend further.

The chapter is organized as follows. Linear systems of equations, linear eigenvalue problems, linear least squares, and linear ill-posed problems are treated in section “Linear Problems”. The first three of these parts deal with finite dimensional problems, whereas linear ill-posed problems are considered in an infinite dimensional Hilbert space setting. In section “From Linear to Nonlinear Problems” we present two novel ideas for solving nonlinear problems using either a local linearization or the Lyapunov function given by the total energy in (6). The next section contains important applications in image analysis, inverse source problems, and quantum physics where both linear and nonlinear (ill-posed) problems are treated. Finally, we draw some conclusions and present possible future research directions in which DFPM can be further developed and analyzed.

Linear Problems

Linear Equations

Consider the linear system of equations
$$\displaystyle \begin{aligned} Au=b,\, A\in\mathbb{R}^{n\times n},\, b\in\mathbb{R}^{n},{} \end{aligned} $$
(13)
where A is positive definite. It is straightforward to formulate (13) as the dynamical system (4) by letting V (u) = uTAu∕2 − bTu. We have
$$\displaystyle \begin{aligned} \ddot{u}+ \eta \dot{u}=b-Au,\, u(0)=u_{0},\,\dot{u}(0)=v_{0}. \end{aligned} $$
(14)
As it was already discussed in section “Introduction”, the solution to (14) is globally asymptotically stable.
We reformulate (14) as the system
$$\displaystyle \begin{aligned} \begin{array}{l} \dot{u} = v\\ \dot{v} = -\eta v + (b-Au) \end{array}. \end{aligned} $$
(15)
and apply symplectic Euler to obtain
$$\displaystyle \begin{aligned} \left\{ \begin{array}{l} v_{k+1}= (I - \Delta t \, \eta ) v_k + \Delta t( b- Au_{k}) \\ u_{k+1} = u_k + \Delta t \, v_{k+1} \\ \end{array} \right. \end{aligned} $$
(16)
or, equivalently,
$$\displaystyle \begin{aligned} w_{k+1}=B w_k + c,\, B = \left[ \begin{array}{cc} I - \Delta t^2\, A & \Delta t (1 - \Delta t \, \eta )\, I \\ -\Delta t \,A & (1 - \Delta t \, \eta )\, I \end{array} \right], \ c = \left[ \begin{array}{c} \Delta t^2 b\\ \Delta t b \end{array} \right], \end{aligned} $$
(17)
where wk = [uk, vk]T.

In order to ensure fast convergence of the method, one must make the spectral radius of B less than one and as small as possible. This choice provides the optimal time step and damping, which can be summarized in the following theorem.

Theorem 1

Consider symplectic Euler (16) where λi = λi(A) > 0 are the eigenvalues of A. Then the parameters
$$\displaystyle \begin{aligned} \Delta t = \dfrac{2}{\sqrt{ \lambda_{\mathrm{min}}} + \sqrt{ \lambda_{\mathrm{max}}}}, \eta = \dfrac{2\sqrt{ \lambda_{\mathrm{min}}}\sqrt{ \lambda_{\mathrm{max}}}}{\sqrt{ \lambda_{\mathrm{min}}} + \sqrt{ \lambda_{\mathrm{max}}}}, \end{aligned} $$
(18)
where λmin =miniλi and λmax =maxiλi are the solution to the problem
$$\displaystyle \begin{aligned} \min_{\Delta t, \eta} \max_{1\leq i \leq 2n} | \mu_i (B) |, \end{aligned}$$

where μi(B) are the eigenvalues of B.

Proof

We outline the idea of the proof and refer for details to Edvardsson et al. (2015). Let A = U ΛUT be a diagonalization of A. We apply the transformations \(\tilde {u}=U^T u\) and \(\tilde {w}_{k}=diag(U^T,U^T) w_k\) to (14) and (17) and obtain the system of n decoupled oscillators \( \ddot {\tilde {u}}_i + \eta \dot {\tilde {u}}_i + \lambda _i \tilde {u}_i = \tilde {b}_i\) and
$$\displaystyle \begin{aligned} \tilde{w}_{k+1}=\tilde{B} \tilde{w_k}+ \tilde{c}, \quad \tilde{B}= \left[ \begin{array}{cc} U^T&O\\ O & U^T \end{array} \right] B \left[ \begin{array}{cc} U&O\\ O & U \end{array} \right]= \left[ \begin{array}{cc} I - \Delta t^2\, \Lambda & \Delta t (1 - \Delta t \, \eta )\, I \\ -\Delta t \, \Lambda & (1 - \Delta t \, \eta )\, I \end{array} \right], \end{aligned}$$
respectively. The eigenvalues of \(\tilde {B},\) and hence B, are then given as
$$\displaystyle \begin{aligned} \mu_i (B) = 1- \dfrac{\Delta t}{2}(\eta + \Delta t \lambda_i) \pm \dfrac{\Delta t}{2}\sqrt{ (\eta + \Delta t \lambda_i)^2 - 4 \lambda_i}. \end{aligned}$$
From Edvardsson et al. (2015) each oscillator has fastest decay if they are critically damped. This implies \(\eta + \Delta t \lambda _{\mathrm {min}} = 2\sqrt {\lambda _{\mathrm {min}}}\) and \(\eta + \Delta t \lambda _{\mathrm {max}} = 2\sqrt {\lambda _{\mathrm {max}}}\) which consequently yields (18). □

The obtained above optimal parameter choice gives the following important property.

Corollary 1

Symplectic Euler defined by (16) with Δt and η given by (18) is stable and convergent.

Proof

By construction the eigenvalues μi(B) corresponding to the smallest and largest eigenvalue of A have no imaginary part, and all other μi(B) are complex (underdamped). By direct calculation we then get
$$\displaystyle \begin{aligned} |\mu_i (B)| = \sqrt{Re(\mu_i (B))^2 + Im(\mu_i (B))^2} =\sqrt{1 - \eta \Delta t}. \end{aligned}$$
Substituting the optimal parameters in the equation above yields 0 < 1 − η Δt < 1 and therefore stability. Since symplectic Euler is consistent, we get convergence by the Lax equivalence theorem. □
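To illustrate Theorem 1 and Corollary 1, the following sketch (our own; the test matrix, spectrum, and tolerance are arbitrary choices) runs the iteration (16) with the optimal parameters (18) on a symmetric positive definite system.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(1.0, 100.0, n)        # prescribed positive spectrum (illustrative)
A = Q @ np.diag(lam) @ Q.T              # symmetric positive definite test matrix
b = rng.standard_normal(n)

lmin, lmax = lam.min(), lam.max()
dt = 2.0 / (np.sqrt(lmin) + np.sqrt(lmax))                          # time step (18)
eta = 2.0 * np.sqrt(lmin * lmax) / (np.sqrt(lmin) + np.sqrt(lmax))  # damping (18)

u = np.zeros(n); v = np.zeros(n)
for k in range(10_000):
    v = (1.0 - dt * eta) * v + dt * (b - A @ u)   # symplectic Euler (16)
    u = u + dt * v
    if np.linalg.norm(b - A @ u) < 1e-10 * np.linalg.norm(b):
        break
print(k, np.linalg.norm(A @ u - b))
```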

In Edvardsson et al. (2015) the system (16) has been used to solve very large sparse linear systems (up to 10^7 unknowns) using the optimal parameters (18) in symplectic Euler. In particular, DFPM was tested on a linear Poisson equation discretized using finite differences and a problem where the matrix originates from an s-limit two-electron Hamiltonian in quantum mechanics. For the Poisson problem, DFPM clearly outperformed classical methods such as Gauss-Seidel and Jacobi, as well as the method based on the first-order ODE (12). On the quantum physics problem, DFPM was slightly faster in CPU time than the conjugate gradient method and Chebyshev semi-iterations. It was shown that DFPM is less computationally expensive in each iteration than the other two methods, and there is no benefit in choosing a higher-order symplectic solver.

DFPM has shown excellent convergence rates for indefinite systems of equations (see Neuman et al. 2015). The drawback of the method in that case is the difficulty of finding an efficient set of parameters.

Finally the more general system
$$\displaystyle \begin{aligned} M\ddot{u}+N\dot{u}=b-Au,\, u(0)=u_{0},\,\dot{u}(0)=v_{0},{} \end{aligned} $$
(19)
with M, N positive definite and A not necessarily positive definite, can be considered instead of (14). This case was analyzed for convergence in Gulliksson et al. (2013). While it is straightforward to apply a symplectic method to solve (19), the choice of the optimal time step and the parameters in M, N remains an open problem.

Linear Eigenvalue Problems

Consider the eigenvalue problem
$$\displaystyle \begin{aligned} A u = \lambda u, \quad u^T u = 1, \end{aligned} $$
(20)
where \(A\in \mathbb {R}^{n\times n}\) is positive definite. We assume that the eigenvalues are distinct (non-defective eigenvalue problem) and sorted as 0 < λ1 < λ2 < ⋯ < λn. Then the solution of the minimization problem
$$\displaystyle \begin{aligned} \begin{array}{l} \min_{u\in \mathbb{R}^n} V(u) = \frac{1}{2}u^T A u \\ \text{s.t } g(u) = u^T u-1 =0, \end{array} \end{aligned} $$
(21)
is the eigenvector u1 corresponding to the smallest eigenvalue λ1.

As we have discussed in section “Introduction”, the constrained optimization problem could be solved by using (i) projection to the manifold or by (ii) introducing additional dynamical system for constraints. Here we present both approaches.

For the first approach, we formulate the corresponding dynamical system to (21) where we project the gradient of V (u) onto the tangent space of the unit sphere ∥u∥2 = 1, that is, (I − uuT)∇V (u) = (I − uuT)Au. This gives
$$\displaystyle \begin{aligned} \begin{array}{ll} \ddot{u}+ \eta \dot{u}= (u^T A u)u - Au,& \| u\|{}_2=1 \\ u(0)=u_{0},\dot{u}(0)=v_{0}.& \end{array} \end{aligned} $$
(22)

Theorem 2

The system (22) is globally exponentially stable.

Proof

The total energy (6) is given as \(E = u^TAu/2 + \| \dot {u} \|{ }_2^2/2\) where ∥u∥2 = 1. Since \(\dot{u}\) is tangent to the unit sphere, \(\dot{u}^Tu = 0\), and we have \( dE/dt = \dot {u}^T(I - uu^T)Au + \dot {u}^T \ddot {u} \) and, using (22), \(dE/dt = - \eta \| \dot {u} \|{ }_2^2\). Therefore, E is a Lyapunov function, and the system (22) is globally exponentially stable. □

Applying symplectic Euler to solve (22) we get
$$\displaystyle \begin{aligned} \begin{array}{l} v_{k+1} = (1 - \Delta t\, \eta) v_k - \Delta t\, (I-u_k u_k^T)Au_k\\ y_{k+1} = u_k + \Delta t v_{k+1}, \ u_{k+1} = y_{k+1}/\| y_{k+1} \|{}_2 . \end{array} \end{aligned} $$
(23)
As the system (23) is nonlinear, the optimal time step and damping are not necessarily constant, as they are for linear systems of equations. However, close to u1 we can derive locally optimal parameters by considering a linearization.
We calculate the Jacobian of F(u) = (uTAu)u − Au and project it on the tangent space of the unit sphere to obtain
$$\displaystyle \begin{aligned} J(u) = (I - uu^T)\partial F / \partial u = (I - uu^T)((u^T A u)I - uu^TA - Auu^T - A). \end{aligned}$$
The equalities \(u_1^TAu_1 = \lambda _1\) and \((I - u_1u_1^T)u_1=0\) yield J(u1) = λ1I − A. Hence the first-order approximation of (22) close to u1 is given by
$$\displaystyle \begin{aligned} \ddot{u}+ \eta \dot{u}= (\lambda_1 I - A)u, \ \| u \|{}_2 = 1. \end{aligned} $$
(24)
Using Theorem 1 and (24), we derive the locally optimal time step and damping as
$$\displaystyle \begin{aligned} \Delta t = \dfrac{2}{\sqrt{ \lambda_{2}-\lambda_{1}} + \sqrt{\lambda_{n}-\lambda_{1}}}, \eta = 2\dfrac{\sqrt{ \lambda_{2}-\lambda_{1}}\sqrt{ \lambda_{n}-\lambda_{1}}} {\sqrt{ \lambda_{2}-\lambda_{1}} + \sqrt{\lambda_{n}-\lambda_{1}}}. \end{aligned} $$
(25)
This result is easily extended to the more general problem of finding eigenvector um assuming ui, i = 1, …, m − 1 to be known. Indeed, instead of (21) we solve
$$\displaystyle \begin{aligned} \begin{array}{l} \min_{u\in \mathbb{R}^n} V(u) = \frac{1}{2}u^T A u \\ \text{s.t } u_1^Tu=0, \ldots, u_{m-1}^Tu=0, u^Tu =1. \end{array} \end{aligned} $$
(26)
The locally optimal parameters of the corresponding discretized system are then given as
$$\displaystyle \begin{aligned} \Delta t = \dfrac{2}{\sqrt{ \lambda_{m+1}-\lambda_{m}} + \sqrt{\lambda_{n}-\lambda_{m}}}, \eta = 2\dfrac{\sqrt{ \lambda_{m+1}-\lambda_{m}}\sqrt{ \lambda_{n}-\lambda_{m}}} {\sqrt{ \lambda_{m+1}-\lambda_{m}} + \sqrt{\lambda_{n}-\lambda_{m}}}. \end{aligned} $$
(27)
In Gulliksson et al. (2012) DFPM with symplectic Euler was used for finding the s-limit of the Helium ground state and first excited state, giving matrices of size n = 500,000 or larger. It was shown that DFPM outperforms the standard package ARPACK and has a complexity rate of \(\mathcal {O}(n^{3/2})\), i.e., the same order as conjugate gradient methods. For further details and numerical tests, we refer the reader to Gulliksson (2017) and section “Excited States to the Schrödinger Equation”.
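A minimal sketch of the projected iteration (23) with the locally optimal parameters (25) follows (our own illustration; for simplicity the parameters are computed from the exact spectrum, which in practice would be replaced by estimates).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
lam = np.sort(rng.uniform(1.0, 50.0, n))          # distinct positive eigenvalues
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(lam) @ Q.T

g1, gn = lam[1] - lam[0], lam[-1] - lam[0]        # spectral gaps entering (25)
dt = 2.0 / (np.sqrt(g1) + np.sqrt(gn))
eta = 2.0 * np.sqrt(g1 * gn) / (np.sqrt(g1) + np.sqrt(gn))

u = rng.standard_normal(n); u /= np.linalg.norm(u)
v = np.zeros(n)
for _ in range(5000):
    r = A @ u - (u @ A @ u) * u                   # projected gradient (I - uu^T)Au
    v = (1.0 - dt * eta) * v - dt * r
    y = u + dt * v
    u = y / np.linalg.norm(y)                     # renormalize onto the sphere

print(u @ A @ u, lam[0])   # Rayleigh quotient approaches the smallest eigenvalue
```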
For the damped constraint approach, we introduce
$$\displaystyle \begin{aligned}g_m = (u^Tu-1)/2 = 0, \quad g_i = u_i^Tu = 0, i=1, \ldots, m-1. \end{aligned}$$
Then we consider the following dynamical system
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \ddot{u} + \eta \dot{u} = -Au + \sum_{i=1}^{m-1} \mu_i u_i + \mu_m u \end{array} \end{aligned} $$
(28)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \ddot{g}_i + \eta \dot{g}_i = -k_i g_i, i = 1, \ldots , m . \end{array} \end{aligned} $$
(29)
Observe that ∇gm = u and ∇gi = ui, i = 1, …, m − 1. This method for solving the eigenvalue problem was first introduced in Gulliksson (2017), where it was shown that u(t) converges asymptotically to the eigenvector um. In Gulliksson (2017) it was also shown that the choice of ki does not change the local convergence rate if λm+1 − λm < ki < λn − λm. Given these constraints, the local convergence of the corresponding symplectic Euler with the parameters as in (25) will be the same as for the projection approach. However, while the two approaches have the same local behavior, it is not known generally which of the two methods is faster for a specific problem.

Linear Least Squares

The linear least squares problem can be formulated as the damped dynamical system (4) using
$$\displaystyle \begin{aligned} V(u) = \dfrac{1}{2}\| Au - b\|{}_2^2, \quad A\in {\mathbb R}^{m\times n},\, b\in {\mathbb R}^m,\, m \geq n. \end{aligned} $$
(30)
We obtain
$$\displaystyle \begin{aligned} \ddot{u}+ \eta \dot{u}= A^T(b-Au), \end{aligned} $$
(31)
or, equivalently,
$$\displaystyle \begin{aligned} \begin{array}{l} \dot{u} = v\\ \dot{v} = -\eta v + A^T(b-Au). \end{array} \end{aligned} $$
(32)
Here we assume that A has full rank in order to keep the problem well-posed (see section “Ill-Posed Problems” for the ill-posed case). The most common and efficient iterative methods for (30) are conjugate gradient (CG) methods such as LSQR and LSMR. From Edvardsson et al. (2015) and preliminary results, we know that DFPM in some cases performs as well as conjugate gradient methods and even outperforms them. This serves as a motivation to further develop DFPM for the least squares problem. Once again we use symplectic Euler to obtain the iterative map
$$\displaystyle \begin{aligned} \left\{ \begin{array}{l} u_{k+1} = u_k + \Delta t \, v_k \\ v_{k+1}= (I - \Delta t \, \eta ) v_k - \Delta t\, A^T (Au_{k+1}-b). \end{array} \right. \end{aligned} $$
(33)
We choose the parameters as in (18) where λi(A) is substituted by \(\lambda _i (A^TA) = \sigma _i^2 (A) \), the square of the singular values σi(A) of A.
It is well known from the analysis of the conjugate gradient method that explicitly forming the matrix ATA will introduce unnecessary rounding errors. Therefore, we introduce a new vector d = Au − b in (33) and get
$$\displaystyle \begin{aligned} \left\{ \begin{array}{l} d_{k+1} = d_k + \Delta t \, A v_k \\ v_{k+1}= (I - \Delta t \, \eta ) v_k - \Delta t\, A^T d_{k+1}. \end{array} \right. \end{aligned} $$
(34)
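A minimal sketch of the iteration (34) (our own; the random test problem and iteration count are illustrative choices), with the parameters taken from (18) using the squared singular values:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 500, 100
A = rng.standard_normal((m, n))       # full-rank test matrix (illustrative)
b = rng.standard_normal(m)

s = np.linalg.svd(A, compute_uv=False)
l1, l2 = s[-1] ** 2, s[0] ** 2        # sigma_min^2 and sigma_max^2 of A
dt = 2.0 / (np.sqrt(l1) + np.sqrt(l2))
eta = 2.0 * np.sqrt(l1 * l2) / (np.sqrt(l1) + np.sqrt(l2))

u = np.zeros(n); v = np.zeros(n)
d = A @ u - b                         # residual d = Au - b, updated incrementally
for _ in range(500):
    u = u + dt * v                    # u_{k+1} = u_k + dt v_k, as in (33)
    d = d + dt * (A @ v)              # d_{k+1} = d_k + dt A v_k, as in (34)
    v = (1.0 - dt * eta) * v - dt * (A.T @ d)

print(np.linalg.norm(A.T @ (A @ u - b)))   # normal-equations residual, near zero
```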
The result in Fig. 2 shows the performance of DFPM compared to the conjugate gradient method for a set of random matrices A. No preconditioning was done since its effect is expected to be similar for both methods.
Fig. 2

Estimated normalized execution time as a function of the number of rows in A for DFPM compared to the conjugate gradient method. The number of columns is n = 1000, the sparsity is 0.0001%, the condition number is 20, and the absolute tolerance is 10−4. The markers in red and green indicate when DFPM performs better or worse, respectively. The curve in the green part is the mean value of 50 random problems. The estimated time for convergence is attained with Matlab's tic and toc commands

Further work will be done to investigate the convergence of DFPM. Of specific interest is the influence of the clustering of eigenvalues which is well known to improve the convergence of conjugate gradient methods.

Ill-Posed Problems

Consider the linear operator equation
$$\displaystyle \begin{aligned} \begin{array}{rcl}{} A u = y, \end{array} \end{aligned} $$
(35)
where A is an injective and compact linear operator acting between two infinite dimensional Hilbert spaces \(\mathcal {U}\) and \(\mathcal {Y}\). A class of examples of (35) is integral equations, which are models with applications in the natural sciences (Wang et al., 2012, 2013; Zhang et al., 2017a), mathematics (Zhang et al., 2013, 2016c), imaging (Yao et al., 2018; Zhang et al., 2015), and engineering (Zhang et al., 2016a). Since A is injective, the operator equation (35) has a unique solution \(u^\dagger \in \mathcal {U}\) for every y from the range \(\mathcal {R}(A)\) of the linear operator A. In this context, \(\mathcal {R}(A)\) is assumed to be an infinite dimensional subspace of \(\mathcal {Y}\).

Suppose that, instead of the exact right-hand side y = Au, we are given noisy data \(y^\delta \in \mathcal {Y}\) obeying the deterministic noise model ∥yδ − y∥≤ δ with noise level δ > 0. Since A is compact and \(\mathrm {dim}(\mathcal {R}(A))=\infty \), we have \(\mathcal {R}(A)\neq \overline {\mathcal {R}(A)}\) and the problem (35) is ill-posed. Therefore, regularization methods should be employed for obtaining stable approximate solutions.

Loosely speaking, two groups of regularization methods exist: variational regularization methods and iterative regularization methods. Tikhonov regularization is certainly the most prominent variational regularization method, while the Landweber iteration is the most famous iterative regularization approach (see, e.g., Tikhonov et al. 1998 and Engl et al. 1996).

For the linear problem (35), the Landweber iteration is defined by
$$\displaystyle \begin{aligned} \begin{array}{rcl}{} u_{k+1} = u_{k} + \Delta t A^* ( y^\delta- A u_{k} ), \quad \Delta t\in\left( 0, 2/\|A\|{}^2 \right), \end{array} \end{aligned} $$
(36)
where A∗ denotes the adjoint operator of A. We refer to Engl et al. (1996, § 6.1) for the regularization property of the Landweber iteration. The continuous analogue of (36) can be considered as a first-order evolution equation in Hilbert spaces
$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \dot{u}(t) + A^* A u(t)= A^* y^\delta \end{array} \end{aligned} $$
(37)
if an artificial scalar time t is introduced, and Δt → 0 in (36). The formulation (37) is known as Showalter’s method or asymptotic regularization (Tautenhahn, 1994; Vainikko and Veretennikov, 1986) that we introduced earlier in (12). The regularization property of (37) can be analyzed through a proper choice of the terminating time. Moreover, it has been shown that by using Runge-Kutta integrators, all of the properties of asymptotic regularization (37) carry over to its numerical realization (Rieder, 2005).

From a computational viewpoint, the Landweber iteration, as well as the steepest descent method and the minimal error method, is quite slow. Therefore, in practice accelerating strategies are usually used; see Kaltenbacher et al. (2008) and Neubauer (2000) and references therein for details.

Over the last few decades, besides the first-order iterative methods, there has been increasing evidence that discrete second-order iterative methods also enjoy remarkable acceleration properties for ill-posed inverse problems. Well-known methods are the Levenberg-Marquardt method, the iteratively regularized Gauss-Newton method, and the Nesterov acceleration scheme (Neubauer, 2017). DFPM can be viewed as a second-order iterative method with the corresponding dynamical system
$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \left\{\begin{array}{ll} \ddot{u}(t) + \eta \dot{u}(t) + A^* A u(t)= A^* y^\delta, \\ u(0)=u_0, \quad \dot{u}(0)= v_0, \end{array}\right. \end{array} \end{aligned} $$
(38)
where \(u_0, v_0 \in \mathcal {U}\) are the prescribed initial data and η is a positive constant damping parameter.
Let us assume we have found a stopping time T∗(δ) < ∞ using some criteria to be discussed later in more detail. The rate of convergence of u(T) → u† as T →∞ in the case of precise data, and of u(T∗(δ)) → u† as δ → 0 in the case of noisy data, can be arbitrarily slow for solutions u† which are not smooth enough (see Schock 1985). In order to prove convergence rates, some kind of smoothness assumptions imposed on the exact solution must be employed. Here we use the range-type source conditions, that is, we assume that there exist elements \(z_0,z_1\in \mathcal {X}\) and numbers p > 0 and ρ ≥ 0 such that
$$\displaystyle \begin{aligned} u_0 - u^\dagger = \left( A^* A \right)^{p} z_0 \quad \mathrm{~with~} \quad \|z_0\|\leq \rho \end{aligned} $$
(39)
and
$$\displaystyle \begin{aligned} v_0 = \left( A^* A \right)^{p} z_1 \quad \text{ with } \quad \|z_1\|\leq \rho. \end{aligned} $$
(40)
In many cases the source conditions could be interpreted in the form of differentiability of the exact solution, boundary conditions, or similar.

For the choice v0 = 0, the condition (40) is trivially satisfied. However, following the discussion in Zhang and Hofmann (2018), the regularized solutions essentially depend on the value of v0. A good choice of v0 provides an acceleration of the regularization algorithm. In practice, one can choose a relatively small value of v0 to balance the source condition and the acceleration effect. Below we give three theorems on convergence properties of the method and the choice of the regularization parameter. For the sake of brevity, we omit the proofs here. The range of the positive constants γ, γ∗, γ1, and γ2 and further details can be found in Zhang and Hofmann (2018).

Theorem 3

(A priori choice of the regularization parameter) If the terminating time T∗ of the second-order flow (38) is selected by the a priori parameter choice
$$\displaystyle \begin{aligned} T^* (\delta) =c_0 \rho^{2/(2p+1)} \,\delta^{-2/(2p+1)} \end{aligned} $$
(41)
with the constant c0 = (2γ)^{2∕(2p+1)}, then we have the error estimate for δ ∈ (0, δ0]
$$\displaystyle \begin{aligned} \| u(T^*) - u^\dagger \| \leq c \rho^{1/(2p+1)}\, \delta^{2p/(2p+1)}, \end{aligned} $$
(42)

where c = (1 + γ)(2γ)^{1∕(2p+1)} and δ0 = 2γρη^{2p+1}.

In practice, the stopping rule in (41) is not realistic, since a good terminating time point T∗ requires knowledge of ρ, which is a characteristic of the unknown exact solution. Such knowledge, however, is not necessary in the case of a posteriori parameter choices. For choosing the termination time point a posteriori, we make use of Morozov’s conventional discrepancy principle and the newly developed total energy discrepancy principle (see Zhang and Hofmann 2018).

In our setting, Morozov’s conventional discrepancy principle means searching for values T > 0 satisfying the equation
$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \chi(T):= \|A u(T)-y^\delta\| - \tau \delta=0, \end{array} \end{aligned} $$
(43)
where τ is bounded below by γ1 ≥ 1.

Lemma 1

If ∥Au0 − yδ∥ > τδ, then the function χ(T) has at least one root.

Theorem 4

(A posteriori choice I of the regularization parameter) If the terminating time T∗ of the second-order flow (38) is chosen according to the discrepancy principle (43), we have for any δ ∈ (0, δ0] and p > 0 the error estimates
$$\displaystyle \begin{aligned} T^* \leq C_0 \rho^{2/(2p+1)} \delta^{-2/(2p+1)} \end{aligned} $$
(44)
and
$$\displaystyle \begin{aligned} \| u(T^*) - u^\dagger \| \leq C_1 \delta^{2p/(2p+1)}, \end{aligned} $$
(45)

where δ0 is defined in the Theorem 3 , C0 := (τγ1)−2∕(2p+1)(2γ)2∕(2p+1), and \(C_1:= \left ( \tau + \gamma _1 \right )^{2p/(2p+1)} (\gamma _1+\gamma _2)^{1/(2p+1)} + \gamma _* (\tau -\gamma _1)^{-1/(2p+1)} (2\gamma )^{1/(2p+1)}\).

The second method searches for roots T > 0 of the total energy discrepancy function, i.e.,
$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \chi_{te}(T):= \|A u(T)-y^\delta\|{}^2+ \|\dot{u}(T)\|{}^2 - \tau_{te}^2 \delta^2=0, \end{array} \end{aligned} $$
(46)
where τte > γ1 is a constant.

Lemma 2

The function χte(T) is continuous and monotonically non-increasing. If \(\|A u_0-y^\delta \|{ }^2+ \|\dot {u}_0\|{ }^2> \tau _{te}^2 \delta ^2\), then χte(T) = 0 has a unique solution.

Theorem 5

(A posteriori choice II of the regularization parameter) Assume that a positive number δ1 exists such that for all δ ∈ (0, δ1], the unique root T∗ of χte(T) satisfies the inequality ∥Au(T∗) − yδ∥≥ τ1δ, where τ1 > γ1 is a constant, independent of δ. Then, for any δ ∈ (0, δ0] and p > 0 we have the error estimates
$$\displaystyle \begin{aligned} T^* \leq C_0 \rho^{2/(2p+1)}\, \delta^{-2/(2p+1)}, \quad \| u(T^*) - u^\dagger \| \leq C_1\, \delta^{2p/(2p+1)}, \end{aligned} $$
(47)

where δ0 is defined in Theorem 3 and C0 and C1 are the same as in Theorem 4 .

Numerical Simulations

Roughly speaking, in the language of ill-posed problems, DFPM yields a discrete second-order iterative method, and, as mentioned before, symplectic integrators are well suited since they reproduce the exponential decrease in energy caused by the damping while retaining stability. Here, we use the Störmer-Verlet method, which belongs to the family of symplectic integrators, giving at the k-th iteration the scheme
$$\displaystyle \begin{aligned} \left\{\begin{array}{l} v^{k+\frac{1}{2}} = v^{k} + \frac{\Delta t}{2} \left( A^*(y^\delta-Au^{k}) - \eta v^{k+\frac{1}{2}} \right), \\ u^{k+1} =u^{k} + \Delta t v^{k+\frac{1}{2}} , \\ v^{k+1} = v^{k+\frac{1}{2}} + \frac{\Delta t}{2} \left( A^*(y^\delta-Au^{k+1}) - \eta v^{k+\frac{1}{2}} \right) , \\ u^{0}=u_0, v^{0}=v_0. \end{array}\right. \end{aligned} $$
(48)
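For concreteness, here is a sketch of (48) with the discrepancy principle (43) as stopping rule (our own illustration; the matrix A stands in for a discretization of the compact operator, and the function name, default values, and toy data are assumptions):

```python
import numpy as np

def dfpm_ill_posed(A, y_delta, delta, dt=0.5, eta=0.05, tau=2.0, kmax=100_000):
    # Damped Stormer-Verlet (48) for u'' + eta u' = A^T(y_delta - A u),
    # stopped by Morozov's discrepancy principle (43).
    u = np.zeros(A.shape[1]); v = np.zeros(A.shape[1])
    for k in range(kmax):
        r = A.T @ (y_delta - A @ u)
        v_half = (v + 0.5 * dt * r) / (1.0 + 0.5 * dt * eta)  # implicit half step
        u = u + dt * v_half
        r_new = A.T @ (y_delta - A @ u)
        v = v_half + 0.5 * dt * (r_new - eta * v_half)
        if np.linalg.norm(A @ u - y_delta) <= tau * delta:    # stopping rule (43)
            break
    return u, k

# Toy usage: a smoothing (integration-like) operator with noisy data.
rng = np.random.default_rng(3)
A = np.tril(np.ones((50, 50))) / 50
y = A @ np.sin(np.linspace(0.0, np.pi, 50))
noise = 1e-3 * rng.standard_normal(50)
u_hat, k = dfpm_ill_posed(A, y + noise, np.linalg.norm(noise))
print(k, np.linalg.norm(A @ u_hat - y))
```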
Our numerical tests are for the following integral equation
$$\displaystyle \begin{aligned} Au(s):= \int^1_0 K(s,t) u(t) dt = y(s), \quad K(s,t)=s(1-t)\chi_{s\leq t} + t(1-s)\chi_{s> t}. \end{aligned} $$
(49)
If we choose \(\mathcal {X}=\mathcal {Y}=L^2[0,1]\), the operator A is compact, self-adjoint, and injective. It is well known that the integral equation (49) has the solution u† = −y″ if \(y\in H^2[0,1]\cap H^1_0[0,1]\). Moreover, the operator A has the eigensystem Aφj = σjφj, where σj = (jπ)^{−2} and \(\varphi _j(t)=\sqrt {2}\sin {}(j\pi t)\). Furthermore, using interpolation theory (see, e.g., Lions and Magenes 1972), it is not difficult to show that for 4p − 1∕2∉ℕ
$$\displaystyle \begin{aligned} \begin{array}{rcl} R ((A^*A)^{p}) = \left\{ u\in H^{4p}[0,1]:~u^{(2l)}(0)=u^{(2l)}(1)=0,~l=0,1,\ldots,\lfloor 2p-1/4 \rfloor \right\}, \end{array} \end{aligned} $$
where ⌊⋅⌋ denotes the standard floor function.
In general, a regularization procedure becomes numerically feasible only after an appropriate discretization. Here, we apply linear finite elements to solve (49). Let \(\mathcal {Y}_n\) be the finite element space of piecewise linear functions on a uniform grid with step size 1∕(n − 1). Denote by Pn the orthogonal projection operator acting from \(\mathcal {Y}\) onto \(\mathcal {Y}_n\). Define An := PnA and \(\mathcal {U}_n:= A^*_n \mathcal {Y}_n\). Let \(\{\phi _j\}^n_{j=1}\) be a basis of the finite element space \(\mathcal {Y}_n\); then, instead of the original problem (49), we solve the following system of linear equations
$$\displaystyle \begin{aligned} A_n u_n = y_n, \end{aligned} $$
(50)
where \([A_n]_{ij}= \int ^1_0 \left ( \int ^1_0 K(s,t) \phi _i(s) ds \right ) \phi _j(t) dt\) and \([y_n]_{j} = \int ^1_0 y(t) \phi _j(t) dt\).
Uniformly distributed noise of magnitude δ′ is added to the discretized exact right-hand side:
$$\displaystyle \begin{aligned}{}[y^\delta_n]_j := \left[ 1 + \delta' \cdot(2 \mathrm{Rand}(u) -1) \right] \cdot [y_n]_j, \quad j=1, \ldots, n, \end{aligned} $$
(51)
where Rand(u) returns a pseudorandom value drawn from a uniform distribution on [0,1]. The noise level of measurement data is calculated by \(\delta =\|y^\delta _n - y_n\|{ }_2\), where ∥⋅∥2 denotes the standard vector norm in \(\mathbb {R}^n\).
To assess the accuracy of the approximate solutions, we define the L2-norm relative error for an approximate solution un(T∗(δ)) as
$$\displaystyle \begin{aligned}\mathrm{L2Err}:= \|u_n(T^*(\delta)) - u^\dagger\|{}_{L^2[0,1]}/\|u^\dagger\|{}_{L^2[0,1]},\end{aligned}$$
where u† is the exact solution to the corresponding model problem.
In order to demonstrate the advantages of DFPM over the traditional approaches, we solve the same problems by four well-known regularization methods: the Landweber method, Nesterov’s method, the ν-method, and the conjugate gradient method for the normal equation (CGNE). The Landweber method is given in (36), while Nesterov’s method is defined as (Nesterov, 1983)
$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \left\{\begin{array}{ll} z^k = x^k + \frac{k-1}{k+\alpha-1} (x^k - x^{k-1}), \\ x^{k+1}=z^k + \Delta t A^*(y^\delta-A z^k), \end{array}\right. \end{array} \end{aligned} $$
(52)
where α > 3 (we choose α = 3.1 in our simulations). Moreover, we select the Chebyshev method as our special ν-method, i.e., ν = 1∕2 (Engl et al., 1996, § 6.3). For all four traditional iterative regularization methods, we use Morozov’s conventional discrepancy principle as the stopping rule.
We consider the following two different right-hand sides for the integral equation (49).
  • Example 1: y(s) = s(1 − s). Then, u† = 2, and u†∈ R((A∗A)p) for all p < 1∕8. This example uses the discretization size n = 50. Other parameters are Δt = 19.4946, η = 2.5648 × 10−4, u0 = 1, v0 = 0, τ = 2, p = 0.1125.

  • Example 2: y(s) = s4(1 − s)3. Then, u†(t) = −6t2(1 − t)(2 − 8t + 7t2), and u†∈ R((A∗A)p) for all p < 5∕8. This example uses the discretization size n = 100.

The results of the simulation are presented in Table 1, where kmax = 400,000 is the maximal number of iterations, DP is the discrepancy principle, and TEDP is the total energy discrepancy principle. We can conclude that, in general, DFPM needs fewer iterations and offers a more accurate regularized solution, and that the two discrepancy principles are comparable both in the number of iterations and in accuracy.
Table 1

Comparisons with the Landweber method, Nesterov’s method, the Chebyshev method, and the CGNE method

            DP                        TEDP                      Landweber
δ           k*(δ)   CPU (s)   L2Err   k*(δ)   CPU (s)   L2Err   k*(δ)    CPU (s)    L2Err
Example 1
1.0400e-2   70      0.2023    0.1494  70      0.1995    0.1494  20438    52.6428    0.2639
1.0380e-3   2872    8.9239    0.0945  2871    1.9105    0.0945  kmax     1.6899e3   0.1807
1.0445e-4   56246   177.3387  0.0597  56246   177.9603  0.0473  kmax     1.8048e3   0.1807
Example 2
1.0761e-2   15      0.0554    0.3676  25      0.0753    0.1987  1457     6.3672     0.7032
1.0703e-3   93      0.2488    0.0637  132     0.3567    0.0581  23790    67.6082    0.1767
1.1006e-4   453     1.4931    0.0195  531     1.8000    0.0184  187188   679.8274   0.0509

            Nesterov                  Chebyshev                 CGNE
δ           k*(δ)   CPU (s)   L2Err   k*(δ)   CPU (s)   L2Err   k*(δ)    CPU (s)    L2Err
Example 1
1.0400e-2   419     1.1384    0.2590  264     0.7378    0.2553  6        0.0351     0.2213
1.0380e-3   2813    8.5986    0.1600  2229    7.6512    0.1496  18       0.0904     0.1383
1.0445e-4   16642   60.1179   0.1025  17443   52.2056   0.0897  39       0.2013     0.0894
Example 2
1.0761e-2   102     0.3018    0.7043  62      0.1768    0.7102  6        0.0148     0.4835
1.0703e-3   416     1.1732    0.1676  415     1.1908    0.1190  12       0.0261     0.1514
1.1006e-4   1805    6.0793    0.0280  2226    7.6967    0.0196  15       0.0309     0.0447

From Linear to Nonlinear Problems

As we have mentioned before, the main challenge of making DFPM efficient is the choice of the damping parameter and the time step. Obviously if V (u) is nonlinear, this choice cannot be made a priori as for linear problems. In this section we discuss some ways of dealing with the choice of damping and time step for nonlinear problems, some of them later appearing in the applications (see section “Applications”). We note that this section is a subject of future research, and therefore we do not go into details but rather present main ideas and illustrate the potential of the method. Here we assume that \(V : \mathbb {R}^n \rightarrow \mathbb {R}\) in (3) is convex and \(V\in C^2(\mathbb {R}^n)\). However, we believe that many of the ideas presented here can be extended to more general Hilbert spaces.

We distinguish between two ideas that seem possible to develop into efficient and globally convergent algorithms. The first one is to linearize the nonlinear problem and then apply the optimal damping and time step for the symplectic method as in the linear case. The second idea is to use the total energy as a Lyapunov function to determine the time step, either with constant damping or with a time-dependent damping. These two approaches can be combined for better efficiency.

Local Linearization Using Optimal Damping and Time Step

Let us assume we have some initial approximation u1 of the convex problem (3) together with an initial velocity v1. By linearizing (4) around u1 and dropping higher-order terms, we get the dynamical system
$$\displaystyle \begin{aligned} \ddot{u}(s) + \eta \dot{u}(s) = -\nabla V (u_1)-H(u_1)(u(s)-u_1), \ s\geq 0 \end{aligned} $$
(53)
where H(u1) is the Hessian of V (u) at u1 with elements hij = ∂2V∕∂ui∂uj. The initial conditions are set to u(0) = u1, v(0) = v1. We note that H(u1) is positive definite due to the convexity of V (u). Then the stationary solution of (53) is given by −∇V (u1) − H(u1)(u − u1) = 0, i.e., \(\tilde {u}_1 = u_1 -H(u_1)^{-1}\nabla V (u_1)\). Moreover, \(\tilde {u}_1\) is the solution to the quadratic convex minimization problem
$$\displaystyle \begin{aligned} \min\limits_{u} V(u_1) + \nabla V (u_1)(u-u_1) + \dfrac{1}{2}(u-u_1)^T H(u_1)(u-u_1), \end{aligned} $$
(54)
and we recognize that \(\tilde {u}_1\) is actually a full Newton step from u1. The system (53) can be solved as we have discussed earlier by a symplectic method. Let us now describe the overall algorithm.
At iteration k assume that we have uk as an approximation of (3) together with a velocity vk and define tk as the time corresponding to iteration k. The linearized problem
$$\displaystyle \begin{aligned} \begin{array}{c} {} \ddot{u}(t) + \eta \dot{u}(t) = -\nabla V (u_k)-H(u_k)(u(t)-u_k), \\ u(t_k) = u_k,\quad v(t_k) = v_k, \end{array} \end{aligned} $$
(55)
where H(uk) is the Hessian of V at uk then has the stationary solution \(\tilde {u}_k = u_k - H(u_k)^{-1}\nabla V (u_k)\) which is also a Newton step at uk. By applying, e.g., a symplectic Euler for the dynamical system (55) with time step Δtk and damping ηk, we get
$$\displaystyle \begin{aligned} w_{j+1} = B_k (\Delta t_k, \eta_k) w_j + c_k,\ w_j=(u_j, \, v_j), \ j = 1, 2, \ldots , \ w_0 = (u_k,\, v_k) \end{aligned} $$
(56)
where Bk is given by B in (17) with A = H(uk) and \(c = c_k = (H(u_k)u_k - \nabla V (u_k))\left [ \Delta t_k^2 , \Delta t_k \right ]^T\). Furthermore, we choose Δtk, ηk according to Theorem 1 using the eigenvalues of H(uk). To terminate the inner iterations (56) and to attain global convergence, i.e., ∥uk − u∗∥→ 0 as k →∞ for any initial conditions u0, v0, we have chosen to use an Armijo rule, that is, we exit the inner iterations when
$$\displaystyle \begin{aligned} V(u_{j+1}) < V(u_j) + \gamma (u_j-u_k)^T\nabla V(u_k), \gamma \in (0,1). \end{aligned} $$
(57)
If the inner iterations are terminated at j = J, we set uk+1 = uJ+1, vk+1 = vJ+1 and tk+1 = tk +  Δtk. Since H(uk) is positive definite, the inner iterations always converge and the corresponding solution will define a Newton step. Therefore, the inequality in (57) will eventually be satisfied, and the algorithm is thus globally convergent. For further details on proving global convergence using the Armijo rule, we refer to Bertsekas (2015).
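The outer-inner structure just described can be sketched as follows (our own illustration; V, grad_V, and hess_V are user-supplied callables, the inner cap and tolerances are arbitrary, and the exit test is an Armijo-type rule in the spirit of (57)):

```python
import numpy as np

def linearized_dfpm(V, grad_V, hess_V, u0, gamma=1e-4, tol=1e-8,
                    outer=50, inner=1000):
    # Outer loop: linearize as in (55); inner loop: symplectic Euler (56)
    # with the optimal parameters of Theorem 1 for A = H(u_k), exited by
    # an Armijo-type test.
    u = u0.copy(); v = np.zeros_like(u0)
    for _ in range(outer):
        g, H = grad_V(u), hess_V(u)
        lam = np.linalg.eigvalsh(H)                 # spectrum of the Hessian
        sl, sh = np.sqrt(lam[0]), np.sqrt(lam[-1])
        dt, eta = 2.0 / (sl + sh), 2.0 * sl * sh / (sl + sh)
        uj, vj = u.copy(), v.copy()
        for _ in range(inner):
            force = -g - H @ (uj - u)               # linearized force in (55)
            vj = (1.0 - dt * eta) * vj + dt * force
            uj = uj + dt * vj
            if V(uj) < V(u) + gamma * (uj - u) @ g: # Armijo-type exit
                break
        u, v = uj, vj
        if np.linalg.norm(grad_V(u)) < tol:
            break
    return u
```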

We obtain a variant of the algorithm above if we solve the original nonlinear problem with a symplectic method using, at each step, the parameters obtained from the matrix Bk given above. This appears to be a rather promising approach (see section “Numerical Experiments”), even if it is not generally globally convergent.

Obviously, the two approaches are only valid if the largest and smallest eigenvalues of H(uk) can be estimated cheaply enough. On the other hand, the parameters Δtk and ηk do not necessarily have to be calculated in every step k and can be held constant when close to the solution. Moreover, an approximation of H(uk), together with its minimum and maximum eigenvalues, may be obtained from rank-one updates as in quasi-Newton methods for further efficiency.

Total Energy as a Lyapunov Function

As mentioned in section “Introduction”, the total energy in (6) is a Lyapunov function for the dynamical system (4). We can use this fact in order to find a globally convergent method for solving (3) or (4) following the ideas in Karafyllis and Grüne (2013).

Let wk+1( Δt) = F(wk,  Δt) be the iterative map defined by a symplectic solver applied to the dynamical system of interest (see (7)). Here, as before, wk+1( Δt) = (uk+1( Δt), vk+1( Δt)) where we have emphasized the dependence on Δt. The total energy at the (k + 1)-th iterate is then given as Ek+1( Δt) = V (uk+1( Δt)) + ∥vk+1( Δt)∥2∕2. The approach here is to determine the next time step Δtk as the solution to minΔtEk+1( Δt). However, fully resolving this minimization problem would generally not be efficient. Therefore, we follow the idea used in Karafyllis and Grüne (2013), where Ek+1( Δt) is instead approximately minimized using the algorithm below.

Algorithm 1 A simplified time step determination based on the total energy


Using Theorem 8 in Karafyllis and Grüne (2013), it can be proven that wk → w∗ = (u∗, 0) as k →∞, where u∗ is the minimum of V (u) in (3). There are more sophisticated ways of choosing Δtinit and Δtk in the algorithm above, described in Karafyllis and Grüne (2013), that can improve the efficiency of DFPM and lead to fewer iterations.
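Since the listing of Algorithm 1 is not reproduced here, the following sketch shows one plausible realization of the idea (our own reading of the strategy, not the authors' exact algorithm): shrink a trial time step by a factor ρ until the total energy (6) decreases.

```python
import numpy as np

def energy_step(u, v, grad_V, V, eta, dt_init=1.0, rho=0.9, max_tries=50):
    # Backtrack the trial time step until the total energy (6) decreases;
    # dt_init, rho, and max_tries are assumed defaults, not from the chapter.
    E0 = V(u) + 0.5 * (v @ v)                 # current total energy (6)
    dt = dt_init
    for _ in range(max_tries):
        v_new = (1.0 - dt * eta) * v - dt * grad_V(u)   # symplectic Euler step
        u_new = u + dt * v_new
        if V(u_new) + 0.5 * (v_new @ v_new) < E0:
            break
        dt *= rho                             # shrink the step and try again
    return u_new, v_new, dt
```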

Numerical Experiments

We consider the problem
$$\displaystyle \begin{aligned} \min_{u\in \mathbb{R}^n} V(u), \quad V(u)=\exp({u^T\text{diag}(a) u}) \quad a \in \mathbb{R}^n, \quad a_i>0. \end{aligned}$$
The condition ai > 0 yields convexity of V (u) and the existence of the unique solution u∗ = 0. We calculate \(\nabla V (u)_i = 2 \exp (u^T\text{diag}(a) u)a_i u_i\), and the Hessian H(u) with elements \(h_{ij} (u) = \exp ({u^T\text{diag}(a) u})(4a_i u_i a_j u_j + 2a_i \delta _{ij})\), where δij stands for the Kronecker delta. The values ai are chosen uniformly at random in (0, 1].
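For reference, a direct transcription of this test problem (our own sketch; the random seed and dimension are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
a = 1.0 - rng.random(n)                      # a_i uniformly random in (0, 1]

def V(u):
    return np.exp(u @ (a * u))               # V(u) = exp(u^T diag(a) u)

def grad_V(u):
    return 2.0 * V(u) * a * u                # (grad V)_i = 2 e^{...} a_i u_i

def hess_V(u):
    # h_ij = e^{...} (4 a_i u_i a_j u_j + 2 a_i delta_ij)
    w = a * u
    return V(u) * (4.0 * np.outer(w, w) + 2.0 * np.diag(a))
```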

The intention is not to derive the most efficient implementation but to illustrate the behavior of our suggested algorithms. Therefore, we do not report CPU times but only the number of iterations.

We have implemented and tested the four different algorithms.

The first method is symplectic Euler given as
$$\displaystyle \begin{aligned} \left\{ \begin{array}{l} v_{k+1}= (I - \Delta t_k \, \eta ) v_k - \Delta t_k \nabla V(u_k) \\ u_{k+1} = u_k + \Delta t_k \, v_{k+1} \\ \end{array} \right. \end{aligned} $$
(58)
where Δtk is chosen according to Algorithm 1 with ρ = 0.9 and η is constant. In order to get a more efficient algorithm, we switch to the optimal parameter values based on (18) when ∥∇V (uk)∥2 < 10−2, and the parameters are kept constant from then on until convergence. A typical convergence behavior is shown in Fig. 3. In the figure we show the decrease of the norm of the gradient ∥∇V (uk)∥, the total energy, and the norm of the velocity ∥vk∥. The vertical line indicates the first iterate from which the damping and time step are kept constant.
Fig. 3

Convergence of the algorithm based on Algorithm 1

The second algorithm shown in Fig. 4, denoted by uNesterov, is the famous method of Nesterov
$$\displaystyle \begin{aligned} u_{k+1} = u_k + \gamma (u_{k}-u_{k-1}) - \omega \nabla V(u_k) \end{aligned}$$
where we have used γ = (k − 1)∕(k + α − 1), α = 3, and ω = 0.6.
Fig. 4

Convergence of three different algorithms based on different choices of damping and time step; see the legend in the plot. The dimension of the problem is n = 100, and the initial conditions u0, v0 with ∥u0∥2 = ∥v0∥2 = 1 are chosen uniformly at random and are the same for all algorithms

The third algorithm with approximations \(u_k^{eig}\) is based directly on the optimal parameters in (18), say Δtk, ηk, given by the smallest and largest eigenvalue of H(uk). The symplectic Euler iterations then are given by
$$\displaystyle \begin{aligned} \left\{ \begin{array}{l} v_{k+1}= (I - \Delta t_k \, \eta_k ) v_k - \Delta t_k \nabla V(u_k) \\ u_{k+1} = u_k + \Delta t_k \, v_{k+1} \\ \end{array} \right. \end{aligned} $$
(59)
In order to improve efficiency, we keep the time step and damping constant when ∥∇V (uk)∥ < 10−2 which is indicated with a vertical line in Fig. 4. Note that this algorithm is not globally stable and did indeed diverge for some starting points.
The final algorithm we use here is based on the linearization in section “Local Linearization Using Optimal Damping and Time Step”. We use constant parameters when ∥∇V (uk)∥ < 10−4, after which the iterations are given by (58). In Fig. 5 the error in the solution is shown as a function of the total number of inner iterations given by (56). The stopping criterion for the inner iterations is based on the Armijo rule (57), but for illustration purposes we added a criterion of a minimal number of inner iterations, as described in the caption. Note that for more inner iterations, the convergence rate gets closer to quadratic since then the iterates are very close to Newton iterations. Even for an increasing number of inner iterations, it is not evident how many inner iterations are the most efficient, since this depends on the cost of calculating the Hessian and its smallest and largest eigenvalues.
Fig. 5

Convergence of the solution ∥uk − u∗∥ for the linearized approach in section “Local Linearization Using Optimal Damping and Time Step” using the Armijo rule (57) with γ = 10−4. On the horizontal axis is the total number of inner iterations. The minimal number of inner iterations (56) is 1 (marker ‘+’), 5 (marker ‘×’), and 20 (marker ‘*’), respectively

For comparison we used a Polak-Ribière-Polyak constrained by Fletcher-Reeves nonlinear conjugate gradient method with a line search based on the strong Wolfe condition. On average it required the same number of iterations as the third algorithm presented here. However, the methods cannot be compared in CPU time before DFPM is made more efficient using the suggestions in sections “Local Linearization Using Optimal Damping and Time Step” and “Total Energy as a Lyapunov Function”.

Applications

Image Analysis

Image denoising is a fundamental task in image processing. An essential challenge in image denoising is to remove as much noise as possible without eliminating the most representative characteristics of the image, such as edges, corners, and other sharp structures. Traditional denoising methods assume that some information about the noise is given. The problem of blind image denoising involves computing the denoised image from the noisy one without any a priori knowledge of the noise. The energy functional approach has in recent years been very successful in blind image denoising. The approach is to minimize an energy functional which, most often, takes the form
$$\displaystyle \begin{aligned} \mathcal{E} (u) = \frac{1}{2} \int_{\Omega} (u-u_0)^2 dx + \alpha \int_{\Omega} \Phi(|\nabla u|) dx. {} \end{aligned} $$
(60)
Here u0(x) is the observed (noisy) image, and \(\Omega \subset \mathbb {R}^d\) (d = 2, 3) is a bounded domain with almost everywhere C2 smooth boundary ∂Ω. The first term in (60) is a fidelity term, the second term is a regularization term, and α > 0 is the regularization parameter. The regularization term Φ(|∇u|) is usually assumed to be strictly convex.

A well-studied case of \(\mathcal {E}(u)\) is when the regularization term is the p-Dirichlet energy, i.e., Φ(|∇u|) = |∇u|^p∕p, p ≥ 1. The case p = 1 corresponds to the total variation principle (see Scherzer et al. 2009), and p > 1 has been studied in, e.g., Baravdish et al. (2015).

The Euler-Lagrange equation \(\partial \mathcal {E} / \partial u =0\) associated with the functional \(\mathcal {E}(u)\) is given by
$$\displaystyle \begin{aligned} \left\{\begin{array}{rl} u-u_0 - \alpha \cdot {div} \left( \frac{\Phi'(|\nabla u|)}{|\nabla u|} \nabla u \right) =0 & {in} \ \Omega,\\ \frac{\partial u}{\partial \mathbf{n}} =0& {on} \ \partial\Omega, \end{array}\right.\end{aligned} $$
(61)
where n is the outward normal to the boundary ∂Ω. For the p-Dirichlet energy, letting α → 0, we obtain the first-order flow
$$\displaystyle \begin{aligned} \left\{\begin{array}{rl} \dot{u} - \Delta_{p} u = 0 & {in} \ (0,T)\times\Omega,\\ u(x,0) = u_0(x) & {in} \ \Omega,\\ \frac{\partial u}{\partial \mathbf{n}} =0 & {on} \ \partial\Omega, \end{array}\right.\end{aligned} $$
(62)
where the p-Laplace operator is defined by \(\Delta _{p} u = {div}\left ( |\nabla u|{ }^{p-2} \nabla u \right )\). The p-parabolic equation in (62) has been studied intensively in Roubíček (2013) and references therein.
Motivated by DFPM, we consider the following PDE instead of (62):
$$\displaystyle \begin{aligned} \left\{\begin{array}{rl} \ddot{u} + \eta \dot{u} - \Delta_{p} u =0 & {in} \ (0,T)\times\Omega,\\ u(x,0) = u_0(x), \quad \dot{u}(x,0) = 0 & {in} \ \Omega,\\ \frac{\partial u}{\partial \mathbf{n}}=0& {on} \ \partial\Omega, \end{array}\right.\end{aligned} $$
(63)
where again η > 0 is the damping parameter.
In order to overcome the ill-posedness of the formulation (63), we introduce regularization. Let Gσ(x) be the Gaussian kernel
$$\displaystyle \begin{aligned} G_{\sigma}(x) = \frac{1}{(2\pi\sigma)^{d/2}}e^{-\frac{|x|{}^{2}}{2\sigma}}, \quad \sigma>0, \end{aligned}$$
and ⋆ denote the cross-correlation,
$$\displaystyle \begin{aligned} (f \star g)(x) =\int_{\mathbb{R}^d} \bar{f}(y)\ g(x+y)\,dy, \end{aligned}$$
where \(\bar {f}\) is the complex conjugate of f. Then we consider
$$\displaystyle \begin{aligned} \left\{\begin{array}{rl} \ddot{u}+\eta\dot{u}-\mathrm{div}\left((\varepsilon+|\nabla G_\sigma \star u|{}^{2})^{\frac{p-2}{2}} \nabla u \right) =0 & {in} \ (0,T)\times\Omega,\\ u(x,0) = u_0(x), \quad \dot{u}(x,0) = 0& {in} \ \Omega,\\ \frac{\partial u}{\partial \mathbf{n}}=0& {on} \ \partial\Omega. \end{array}\right. \end{aligned} $$
(64)
where ε > 0 is a fixed small number and \(|\nabla G_{\sigma}\star u| = \left (\sum _{j=1}^{d}\left (\frac {\partial G_\sigma }{\partial x_{j}}\star u\right )^{2}\right )^{1/2}\).
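On a pixel grid, the smoothed gradient ∇G_σ ⋆ u, and hence the diffusion coefficient (ε + |∇G_σ ⋆ u|²)^{(p−2)∕2} appearing in (64), can be evaluated with standard Gaussian filtering. A minimal sketch, assuming SciPy is available; note that the filter parameter below is a standard deviation, whereas σ in the kernel above plays the role of a variance.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def diffusivity(u, sigma, p=1.5, eps=1e-6):
    """(eps + |grad(G_sigma * u)|^2)^((p-2)/2) for a 2D image u.
    For the symmetric Gaussian, cross-correlation reduces to ordinary
    smoothing, and order=1 differentiates the kernel along one axis."""
    gy = gaussian_filter(u, sigma, order=(1, 0))  # (dG/dy) correlated with u
    gx = gaussian_filter(u, sigma, order=(0, 1))  # (dG/dx) correlated with u
    return (eps + gx**2 + gy**2) ** ((p - 2.0) / 2.0)
```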

Theorem 6

Assume that p ∈ [1, 2] and u0 ∈ H2( Ω). Then a unique weak solution to problem (64) exists if T is sufficiently small with an upper bound depending on \(||u_{0}||{ }_{H^{1}(\Omega )}\), Gσ and Ω.

For the proof we refer to Baravdish et al. (2018).

Now, let us consider the numerical algorithm based on equation (64). For simplicity and clarity, we assume Ω is a rectangular region in \(\mathbb {R}^2\) and consider a uniform grid \(\Omega _{MN}=\{(x_i,y_j)\}^{M,N}_{i,j=1}\) in Ω with uniform step size h = x_{i+1} − x_i = y_{j+1} − y_j. Define \(\mathbf {u}(t)=[u(x_i,y_j,t)]^{M,N}_{i,j=1}\), and let u^k denote the projection of u(x, y, t) onto the spatial grid Ω_MN at the time point t = t_k. We approximate the nonlinear term \( {div}\left ( a^{\varepsilon } (u) \nabla u \right )\) by the linearization \( {div}\left ( a^{\varepsilon }({\mathbf {u}}^{k-1}) \nabla {\mathbf {u}}^{k} \right )\), where \(a^{\varepsilon }(u)=(\varepsilon + |\nabla G_{\sigma }\star u|{ }^{2})^{\frac {p-2}{2}}\). Using the central difference discretization rule, we have
$$\displaystyle \begin{aligned} \begin{array}{rl} & {div}\left( a^{\varepsilon}({\mathbf{u}}^{k-1}) \nabla {\mathbf{u}}^{k} \right) = D_{x,\frac{h}{2}} \left( a^{\varepsilon, k-1}_{i,j} D_{x,\frac{h}{2}} {\mathbf{u}}^{k}_{i,j} \right) + D_{y,\frac{h}{2}} \left( a^{\varepsilon, k-1}_{i,j} D_{y,\frac{h}{2}} {\mathbf{u}}^{k}_{i,j} \right) \\ & = D_{x,\frac{h}{2}} \left( a^{\varepsilon, k-1}_{i,j} \frac{{\mathbf{u}}^{k}_{i+\frac{1}{2},j} - {\mathbf{u}}^{k}_{i-\frac{1}{2},j}}{h} \right) + D_{y,\frac{h}{2}} \left( a^{\varepsilon, k-1}_{i,j} \frac{{\mathbf{u}}^{k}_{i,j+\frac{1}{2}} - {\mathbf{u}}^{k}_{i,j-\frac{1}{2}}}{h} \right) \\ & = \frac{1}{h^2} \Big\{ a^{\varepsilon, k-1}_{i-\frac{1}{2},j} {\mathbf{u}}^{k}_{i-1,j} + a^{\varepsilon, k-1}_{i,j-\frac{1}{2}} {\mathbf{u}}^{k}_{i,j-1} - \left( a^{\varepsilon, k-1}_{i-\frac{1}{2},j} + a^{\varepsilon, k-1}_{i,j-\frac{1}{2}} + a^{\varepsilon, k-1}_{i+\frac{1}{2},j} + a^{\varepsilon, k-1}_{i,j+\frac{1}{2}} \right) {\mathbf{u}}^{k}_{i,j} \\ & \qquad \qquad + a^{\varepsilon, k-1}_{i,j+\frac{1}{2}} {\mathbf{u}}^{k}_{i,j+1} + a^{\varepsilon, k-1}_{i+\frac{1}{2},j} {\mathbf{u}}^{k}_{i+1,j} \Big\} , \end{array} \end{aligned} $$
(65)
where
$$\displaystyle \begin{aligned} a^{\varepsilon, k-1}_{i-\frac{1}{2},j} = \left( \varepsilon + |\nabla {\mathbf{G}}_{\sigma}\star {\mathbf{u}}^{k-1}_{i-\frac{1}{2},j} |{}^{2} \right)^{\frac{p-2}{2}} , {\mathbf{u}}^{k-1}_{i-\frac{1}{2},j} = \frac{{\mathbf{u}}^{k-1}_{i-1,j} + {\mathbf{u}}^{k-1}_{i,j}}{2}, \end{aligned} $$
(66)
and ∇Gσ is the projection of the function ∇Gσ on the same grid ΩMN.
Given a matrix \(\mathbf {u}\in \mathbb {R}^{M\times N}\), one can obtain a vector \(\vec {\mathbf {u}}\in \mathbb {R}^{MN}\) by stacking the columns of u. This defines a linear operator \(vec: \mathbb {R}^{M\times N} \to \mathbb {R}^{MN}\),
$$\displaystyle \begin{aligned} \begin{array}{rl} & vec(\mathbf{u}) = ({\mathbf{u}}_{1,1}, {\mathbf{u}}_{2,1}, \cdot\cdot\cdot, {\mathbf{u}}_{M,1}, {\mathbf{u}}_{1,2}, {\mathbf{u}}_{2,2}, \cdot\cdot\cdot, {\mathbf{u}}_{M,2}, \cdot\cdot\cdot, {\mathbf{u}}_{1,N}, {\mathbf{u}}_{2,N}, \cdot\cdot\cdot, {\mathbf{u}}_{M,N})^T \\ & \vec{\mathbf{u}} = vec(\mathbf{u}), \quad \vec{\mathbf{u}}_q = {\mathbf{u}}_{i,j}, \quad q=(j-1)M+i. \end{array} \end{aligned}$$
This corresponds to a lexicographical column ordering of the components in the matrix u. The symbol array denotes the inverse of the vec operator; that is, the following equalities hold:
$$\displaystyle \begin{aligned} array(vec(\mathbf{u})) = \mathbf{u}, \quad vec(array(\vec{\mathbf{u}})) = \vec{\mathbf{u}}, \end{aligned}$$
whenever \(\mathbf {u}\in \mathbb {R}^{M\times N}\) and \(\vec {\mathbf {u}}\in \mathbb {R}^{MN}\).
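In code, vec and array are simply column-major reshapes. A small NumPy check with illustrative sizes M = 3, N = 4:

```python
import numpy as np

M, N = 3, 4
u = np.arange(M * N, dtype=float).reshape(M, N)

vec = lambda a: a.reshape(-1, order="F")       # stack the columns
array = lambda w: w.reshape(M, N, order="F")   # inverse of vec

w = vec(u)
assert np.array_equal(array(vec(u)), u)
assert np.array_equal(vec(array(w)), w)
# the 1-based entry (i, j) lands at position q = (j - 1)*M + i
i, j = 2, 3
assert w[(j - 1) * M + (i - 1)] == u[i - 1, j - 1]
```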

Based on the above definitions, we can rewrite (65) in matrix form as \({\mathbf {F}}^{k-1} \vec {\mathbf {u}}^{k}\), where the matrix F^{k−1} depends only on \(\vec {\mathbf {u}}^{k-1}\).

Denote \(\vec {\mathbf {v}}^k=\dot { \vec {\mathbf {u}}}^k\); then the Störmer-Verlet method for the PDE (64) gives the scheme
$$\displaystyle \begin{aligned} \left\{\begin{array}{l} \vec{\mathbf{v}}^{k+\frac{1}{2}} = \vec{\mathbf{v}}^{k} + \frac{\Delta t_k}{2} \left( {\mathbf{F}}^{k-1} \vec{\mathbf{u}}^{k} - \eta \vec{\mathbf{v}}^{k+\frac{1}{2}} \right), \\ \vec{\mathbf{u}}^{k+1} = \vec{\mathbf{u}}^{k} + \Delta t_k \vec{\mathbf{v}}^{k+\frac{1}{2}}, \\ \vec{\mathbf{v}}^{k+1} = \vec{\mathbf{v}}^{k+\frac{1}{2}} + \frac{\Delta t_k}{2} \left( {\mathbf{F}}^{k} \vec{\mathbf{u}}^{k+1} - \eta \vec{\mathbf{v}}^{k+\frac{1}{2}} \right), \\ \vec{\mathbf{u}}_{0}=\vec{\mathbf{u}}^\delta_0, \vec{\mathbf{v}}_0=0, \end{array}\right. \end{aligned} $$
(67)
where \(\vec {\mathbf {u}}^\delta _0= vec({\mathbf {u}}^\delta _0)\) and \({\mathbf {u}}^\delta _0\) is the projection of \(u^\delta _0(x)\) on the grid Ω_MN. It has been shown in Baravdish et al. (2018) that for each k all MN eigenvalues of F^k are nonpositive. Let \(\lambda ^{(k)}_{\mathrm {max}}>0\) be the largest eigenvalue of −F^k; then the following result holds.

Theorem 7

The scheme above is convergent provided \(\Delta t_k \leq \eta /\sqrt {\lambda ^{(k)}_{\mathrm {max}}}. \)
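In practice, one step of (67) is inexpensive: the first stage is implicit in \(\vec{\mathbf{v}}^{k+\frac{1}{2}}\), but only linearly so, and can be solved in closed form. A minimal sketch, assuming the stencil matrices from (65) and (66) are available (e.g., as SciPy sparse matrices):

```python
def stormer_verlet_step(u, v, F_prev, F_curr, dt, eta):
    """One step of scheme (67); F_prev and F_curr are built from the
    iterates u^{k-1} and u^k via (65)-(66)."""
    # first stage: v_half appears on both sides, but only linearly
    v_half = (v + 0.5 * dt * (F_prev @ u)) / (1.0 + 0.5 * dt * eta)
    u_new = u + dt * v_half
    v_new = v_half + 0.5 * dt * (F_curr @ u_new - eta * v_half)
    return u_new, v_new

# Theorem 7 suggests checking dt <= eta / sqrt(lam_max), with lam_max the
# largest eigenvalue of -F; by Gershgorin's theorem a crude, safe surrogate
# for the stencil (65) is lam_max <= 8 * max(a) / h**2.
```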

Various stopping criteria exist for iterative algorithms (Scherzer et al., 2009). In principle, the stopping criterion for image denoising problems should be chosen case by case. In real-world problems, in order to obtain a high-quality denoised image, a manual stopping criterion is always required, especially for PDE-based denoising techniques. Nevertheless, an automatic stopping criterion can be helpful to select a good initial guess of the denoised image. Here, we use a frequency domain threshold method based on the fact that noise is usually represented by high frequencies.

Define the energy of the high frequencies by
$$\displaystyle \begin{aligned} \Delta_{N_0} (\mathbf{u}) = \sum_{i+j\geq N_0} \left| \mathfrak{F}(\mathbf{u}) (i,j) \right|{}^2, \end{aligned}$$
where \(\mathfrak {F}(\mathbf {u})\) denotes the 2D discrete Fourier transform of the image u and N_0 denotes the high-frequency cutoff index. In the simulation, we set N_0 = ⌊0.6N²⌋. Define by
$$\displaystyle \begin{aligned} RDE(k)=|\Delta_{N_0} ({\mathbf{u}}^k) - \Delta_{N_0} ({\mathbf{u}}^{k-1})| / \Delta_{N_0} ({\mathbf{u}}^{k-1}) \end{aligned}$$
the relative denoising efficiency. The value of RDE at every iteration can then be used as a stopping criterion. Based on this stopping criterion, an algorithm for image denoising is proposed in Algorithm 2.

Algorithm 2 DFPM for image denoising

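The stopping quantities in Algorithm 2 are straightforward to compute. A minimal sketch of Δ_{N_0}(u) and RDE(k) using NumPy's FFT, with the cutoff N_0 passed in as a parameter:

```python
import numpy as np

def high_freq_energy(u, N0):
    """Delta_{N0}(u): energy of the Fourier modes with index sum i + j >= N0."""
    U = np.abs(np.fft.fft2(u)) ** 2
    i, j = np.indices(U.shape)
    return U[i + j >= N0].sum()

def rde(u_curr, u_prev, N0):
    """Relative denoising efficiency between consecutive iterates; the
    damped flow is stopped once this falls below a chosen tolerance."""
    d_prev = high_freq_energy(u_prev, N0)
    return abs(high_freq_energy(u_curr, N0) - d_prev) / d_prev
```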

In order to show the advantages of our algorithm over existing approaches, we solve the same problem by the following methods: total variation (TV), modified telegraph (MTele), and telegraph (Tele). The degraded test image is given in the first picture of Fig. 6, while the denoised images are displayed in the last five pictures. We see that DFPM, together with the TV method, gives the highest Structural Similarity Index (SSIM) value, 0.549. It also gives a smoother result than the other methods. Note that in Fig. 6 we have chosen the terminating time which produces the highest SSIM value. We also note that, in general, with an appropriate choice of damping parameter η, DFPM exhibits an acceleration phenomenon, i.e., with an earlier termination time, the reconstructed image by DFPM attains a higher SSIM value than the denoised image by TV. A rigorous theoretical analysis of the acceleration of the damped flow (64) will be addressed in future work.
Fig. 6

Example results for DFPM, TV, MTele, and Tele methods

Inverse Problems for Partial Differential Equations

The interaction between partial differential equations and inverse problems has produced remarkable developments in the last couple of decades, partly due to its importance in applications of mathematics, science, and engineering, such as inverse scattering and inverse spectral problems (Chadan et al., 1997), inverse homogenization problems (Gulliksson et al., 2018), inverse chromatography problems (Cheng et al., 2018; Lin et al., 2018b; Zhang et al., 2016b, 2017b), parameter identification problems (Lin et al., 2018a), etc. Here we consider an inverse source problem (Cheng et al., 2014; Zhang et al., 2018a,b). Let \(\Omega \subset \mathbb {R}^d\) (d = 2, 3) be a bounded domain with Lipschitz boundary ∂Ω. Given g_1 and g_2 on ∂Ω, we are concerned with finding the source p such that the solution u of the boundary value problem (BVP)
$$\displaystyle \begin{aligned} -\triangle u + cu = p\chi_{\Omega_0} \mathrm{~in~} \Omega, \qquad \frac{\partial u}{\partial \mathbf{n}} = g_2 \mathrm{~on~} \partial \Omega, \end{aligned} $$
(68)
satisfies
$$\displaystyle \begin{aligned} u = g_1 \mathrm{~on~} \partial \Omega. \end{aligned} $$
(69)
Here c is a positive constant, n stands for the outward normal, Ω_0 ⊂ Ω is known as a permissible region of the source function, and χ is the indicator function such that \(\chi _{\Omega _0}(x)=1\) for x ∈ Ω_0 and \(\chi _{\Omega _0}(x)=0\) when x ∉ Ω_0.
In order to find p, we introduce the minimization problem minpV (p) where the minimum is taken over an admissible set that incorporates a priori information about the source function p. Here for simplicity we assume it is given by L2( Ω0). The data fitting term V (p) may have different forms. For instance, in Han et al. (2006),
$$\displaystyle \begin{aligned} V(p) = \frac{1}{2} \|u(p)-g_1\|{}^2_{L^2(\partial \Omega)}{} \end{aligned} $$
(70)
with u(p) being the weak solution of (68). Another choice of V (p) is using the Kohn-Vogelius-type functionals which are expected to give more robust optimization procedures when compared with the boundary fitting formulation (Afraites et al., 2007). In this approach (see, e.g., Song and Huang 2012),
$$\displaystyle \begin{aligned} V(p)=\frac{1}{2} \|u_1(p)-u_2(p)\|{}^2_{L^2(\Omega)} {} \end{aligned} $$
(71)
with u1, u2 ∈ H1( Ω) being the weak solutions of \(-\triangle u_{1,2} + c u_{1,2} = p\chi _{\Omega _0}\) with Dirichlet and Neumann data, respectively. Recently, in Cheng et al. (2014), a novel coupled complex boundary method (CCBM) was introduced where the Dirichlet data g1 and the Neumann data g2 are used simultaneously in a single BVP and the data fitting term V (p) takes the form
$$\displaystyle \begin{aligned} V(p)= \frac{1}{2} \|u_{im}(p)\|{}^2_{L^2(\Omega)}, {} \end{aligned} $$
(72)
with u = u_re + iu_im (\(\mathrm {i}=\sqrt {-1}\) is the imaginary unit) solving the complex BVP
$$\displaystyle \begin{aligned} -\triangle u + cu = p\chi_{\Omega_0} \mathrm{~in~} \Omega, \qquad \frac{\partial u}{\partial \mathbf{n}} + \mathrm{i} u = g_2 + \mathrm{i} g_1 \mathrm{~on~} \partial \Omega. \end{aligned} $$
(73)
In practice the data g1 and g2 are noisy, and therefore one needs to introduce regularization. Thus, we consider
$$\displaystyle \begin{aligned} p_\varepsilon = \operatorname*{\arg\min}_{p} V_\varepsilon(p), \qquad V_\varepsilon(p):=V(p) + \frac{\varepsilon}{2} \|p\|{}^2_{L^2(\Omega_0)}, {} \end{aligned} $$
(74)
where ε > 0 is a regularization parameter.

Under certain assumptions, (74) admits a unique stable solution \(p_\varepsilon\), which converges to \(p^*\), the solution of the original inverse source problem with minimal L2-norm, as ε → 0 (see Han et al. 2006, Cheng et al. 2014, and Song and Huang 2012 for the three forms of V(p) in (70), (71), and (72), respectively).

To solve the optimization problem (74), one can use DFPM. To that end, we consider the damped dynamical system
$$\displaystyle \begin{aligned} \ddot{p}(t) + \eta \dot{p}(t) + \nabla V_\varepsilon (p) =0,\end{aligned} $$
(75)
where η > 0 is a damping parameter and ∇Vε is the gradient of Vε.

The regularization parameter ε can be chosen constant or as a function of the artificial scalar time t. In the first case, ε must be chosen a priori, which is not always possible. In the latter case, no such information is required. It turns out that, given ε(t) → 0 and some mild assumptions on the monotonicity of ε(t), the solution p(t) of (75) converges to \(p^*\) as t → ∞. We call the approach above the damped dynamical regularization method. Below we give more details of the CCBM for the inverse source problem with noisy data.

Suppose that, instead of the exact boundary data {g_1, g_2}, we are only given approximate data \(\{g^\delta _1,g^\delta _2\}\) such that \(\|g^\delta _k-g_k\|{ }_{L^2(\partial \Omega )} \leq \delta\), k = 1, 2, where δ reflects the magnitude of noise in the measurements. Then, u = u_re + iu_im solves the BVP (73) with \(\{g_1, g_2\}\) replaced by \(\{g^\delta_1, g^\delta_2\}\); we refer to this perturbed problem as (76).

It is not difficult to show that the second Fréchet derivative satisfies \(V''(p)q^2=\|u_{im}(q)-u_{im}(0)\|{ }^2_{L^2(\Omega )}\). Hence V(p), and thus V_ε(p), is convex. The next theorem allows us to obtain ∇_p V(p) using an adjoint problem.

Theorem 8

The Fréchet derivative of the convex functional V(p), defined in (72), is the imaginary part, restricted to Ω_0, of the solution w to the adjoint problem, where u_im is the imaginary part of u, the solution of (76); i.e., \(\nabla _p V(p) = w_{im}(p)\chi _{\Omega _0}\).

Let t ∈ [t_0, T(δ)], where T(δ) → ∞ as δ → 0. We set T(δ) = 1∕δ; however, for other choices of T(δ) the same results hold with slight modifications.

We formulate the main results.

Theorem 9

Assume that the dynamical regularization parameter ε : [t_0, ∞) → (0, 1] satisfies the following conditions:
  1. (i)

    ε(t) → 0 as t → ∞,

     
  2. (ii)

    \(\dot {\varepsilon }(t)\leq 0\) on [t_0, ∞) and \(\dot {\varepsilon }(t)\to 0\) as t → ∞,

     
  3. (iii)

    \(\int ^{\infty }_{t_0} \varepsilon (t) dt = \infty .\)

     
Then, the following statements hold:
  1. (a)
    For each pair \((p_{0}, \dot {p}_{0})\in L^2(\Omega _0)\times L^2(\Omega _0)\), there exists a unique solution \(p^\delta : [t_0, T(\delta)] \to L^2(\Omega_0)\) of the Cauchy problem
    $$\displaystyle \begin{aligned} \left\{\begin{array}{ll} \ddot{p}(x,t) + \eta \dot{p}(x,t) + w^\delta_{im}(x,t) + \varepsilon(t) p(x,t)=0, & x\in\Omega_0,~ t\in [t_0, T(\delta)], \\ p(x,t_0)=p_{0}(x), \dot{p}(x,t_0)=\dot{p}_{0}(x), & x\in\Omega_0, \end{array}\right. \end{aligned} $$
    (78)
    where \(w^\delta =w^\delta _{re}+\mathrm {i} w^\delta _{im}\) is the solution of the adjoint problem with the same t, and \(u^\delta =u^\delta _{re}+\mathrm {i} u^\delta _{im}\) is the solution of the BVP with noisy data; see (79) and (80).
     
  2. (b)

    For any fixed t ∈ [t_0, T(δ)], \(p^\delta(t) \to p(t)\) in L2(Ω_0) as δ → 0. Here, p(t) denotes the solution of the system (78), (79), and (80) with noise-free boundary data.

     
  3. (c)

    Both \(\dot {p}^\delta (T(\delta ))\) and \(\ddot {p}^\delta (T(\delta ))\) converge to zero in L2( Ω0) when the noise level δ vanishes.

     
  4. (d)

    Let \(p^*\) be the minimal L2(Ω_0)-norm solution of problem (72) with noise-free boundary data; then \(p^\delta(T(\delta)) \to p^*\) in L2(Ω_0) as δ → 0.

     

The proof of Theorem 9 can be found in Zhang et al. (2018a). A dynamical regularization parameter satisfying the conditions of Theorem 9 can, for example, be chosen as \(\varepsilon (t)= C/t\), \(C/\log (t)\), or \(C/(t\log (t))\), where C is a constant and t_0 is taken large enough that ε(t) is restricted to (0, 1]. The constant C does not affect the value of the approximate solution; however, it influences the speed of the numerical solver. With an appropriate value of C, our algorithm may attain a sufficiently good approximate solution within a few iterations.

By using standard linear finite elements on a triangular mesh of size h for the weak formulations of (79) and (80), we obtain discrete approximations u^h(x), w^h(x) with corresponding imaginary parts \(u^h_{im}(x), w^h_{im}(x)\). Further, we assume that p^h(x) is the finite element ansatz in the same finite element space but restricted to Ω_0.

The finite element approximations introduced above are now all assumed to be time dependent in order to formulate our semi-discrete dynamical system corresponding to (78). By defining \(q^h(x,t)=\dot {p}^h(x,t)\), we can write the semi-discrete dynamical system in first-order form
$$\displaystyle \begin{aligned} \left\{\begin{array}{ll} \dot{q}^h = - \eta q^h - \varepsilon p^h - w^h_{im} \chi_{\Omega_0} , \\ \dot{p}^h = q^h, \\ p^h(t_0)=p^h_{0}, q^h(t_0)=\dot{p}^h_{0}. \end{array}\right. \end{aligned} $$
(81)
We then apply the symplectic Euler method to get
$$\displaystyle \begin{aligned} \left\{\begin{array}{l} q^{h}_{k+1} = q^{h}_{k} - \Delta t \left( \eta q^{h}_{k+1} + \varepsilon_{k} p^{h}_{k} + w^{h}_{im}(p^{h}_{k}) \chi_{\Omega_0} \right), \\ p^{h}_{k+1} = p^{h}_{k} + \Delta t q^{h}_{k+1}, \\ q^{h}(t_0)=\dot{p}^{h}_{0}, p^{h}(t_0)=p^{h}_{0}, \end{array}\right. \end{aligned} $$
(82)
where \(p^{h}_{k}=p^{h}(t_k)\) and Δt is a fixed time step size.
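For illustration, scheme (82) with the dynamical regularization parameter used in the experiments below can be sketched as follows. The function w_im is a placeholder for a finite element solve of the adjoint problem returning \(w^{h}_{im}(p^{h}_{k})\chi_{\Omega_0}\) as a nodal vector; the default parameter values mirror those of the numerical simulations (η = 1, Δt = 10, t_0 = 2, ε(t) = 0.1∕(t ln t)).

```python
import numpy as np

def dfpm_source(w_im, p0, eta=1.0, dt=10.0, t0=2.0, C=0.1, n_steps=6000):
    """Sketch of scheme (82) with eps(t) = C/(t*ln(t)), one of the
    admissible choices in Theorem 9 (t0 > 1 keeps eps positive)."""
    p = np.asarray(p0, dtype=float).copy()
    q = np.zeros_like(p)
    t = t0
    for _ in range(n_steps):
        eps = C / (t * np.log(t))          # dynamical regularization
        # the first line of (82) is implicit in q_{k+1}, but only linearly
        q = (q - dt * (eps * p + w_im(p))) / (1.0 + dt * eta)
        p = p + dt * q
        t += dt
    return p
```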

Theorem 10

Let \(p^{h}_k\) be the finite element solution of the scheme (82) with noisy boundary data and \(p^*\) be the minimal L2(Ω_0)-norm solution of the noise-free problem (72). If the iteration number of scheme (82) satisfies \(k(\Delta t, \delta)\,\Delta t \to \infty\) as (δ, Δt) → 0, then \(\|p^{h}_{k(\Delta t,\delta )} - p^*\|{ }_{0,\Omega } \to 0\) as (δ, Δt, h) → 0. Here δ is the noise level of the boundary data, and Δt and h are the discretization parameters for problems (78), (79), and (80) in time and in space, respectively.

Numerical Simulations

We let \(\Omega =\{(x,y,z)\in \mathbb {R}^3\ |\ x^2+y^2<1, 0< z<2\}\), c = 1, and the true source \(p^*(x, y, z) = 10(1 + x + y + z)\) be defined on Ω_0 = {(x, y, z) ∈ Ω ∣ (x − 0.5)² + (y − 0.5)² + (z − 1)² ≤ 0.3²}, with noise-free Neumann data g_2 = 0. We solved (73) with p = \(p^*\) in order to produce g_1; the data were computed on a fine mesh with h = 0.1050 (55,637 nodes and 316,565 elements).

Then uniformly distributed noise with level δ is added to both g_1 and g_2 to get \(g^\delta _1\) and \(g^\delta _2\), that is,
$$\displaystyle \begin{aligned} g^\delta_i(x,y,z)=[1+\delta\cdot(2\,\mathrm{rand}(x,y,z)-1)]\,g _i(x,y,z),\quad (x,y,z)\in \partial \Omega,\quad i = 1, 2,\end{aligned}$$
where rand(x, y, z) returns a pseudorandom value drawn from a uniform distribution on [0, 1] for each fixed (x, y, z).
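In code, this perturbation is a one-liner (a sketch, with NumPy's uniform generator standing in for rand):

```python
import numpy as np

def perturb(g, delta, rng=np.random.default_rng(0)):
    """g_delta = (1 + delta*(2*rand - 1)) * g with rand ~ U[0, 1]."""
    g = np.asarray(g, dtype=float)
    return (1.0 + delta * (2.0 * rng.random(g.shape) - 1.0)) * g
```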
With properly chosen parameters η, Δt, ε(t), the scheme (82) was implemented to get p^h, a stable approximation of \(p^*\). We chose t_0 = 2 and \(p_{0}=\dot {p}_{0}=0\), which is far from the true source \(p^*\). The approximate source function p^h is recovered on a mesh with h = 0.4068 (1077 nodes, 5157 elements) for various values of the noise level δ, with \(\Delta t=10, \eta =1, \varepsilon (t)=0.1/(t\ln (t))\). The approximate source functions are plotted in Figs. 7 and 8. The corresponding relative errors (L2Err) are 7.8728e-2, 7.8381e-2, 7.9860e-2, 7.7507e-2, 8.6974e-2, 7.0421e-2, and 1.0757e-1, and the iteration numbers are 5760, 5936, 6261, 3805, 1651, 456, and 494, respectively. We conclude from these numerical results that the reconstruction is accurate and stable. Moreover, the regularized solution is easily attained without having to use any additional equation for the regularization parameter.
Fig. 7

Exact source p and reconstructed source ph for δ = 0, 0.005, 0.01

Fig. 8

Reconstructed source ph for δ = 0.05, 0.1, 0.2, 0.3

Applications in Quantum Physics

A quantum particle can be described through a complex wave function that solves the Schrödinger equation, which in the DFPM formulation becomes a time- and space-dependent linear PDE. For many interacting quantum particles, the dimensionality of the corresponding Schrödinger equation normally makes it unsuitable for direct computations. However, in various situations, so-called mean-field methods can yield an approximate wave function through the solution of a nonlinear Schrödinger equation. In the following, we present three such examples. Note that the notation in this section is not standard in quantum physics, where usually E denotes the total energy and V the potential in the Hamiltonian.

Excited States to the Schrödinger Equation

We start with an example of how to calculate the well-known normalized wave functions (eigenstates) u and energies (eigenvalues) μ of the linear Schrödinger equation with a harmonic potential (here in dimensionless units)
$$\displaystyle \begin{aligned} -\frac{1}{2}u_{xx}+\frac{1}{2}x^{2}u=\mu u,\:\| u \|{}_{L_2(\mathbb{R})} = 1.{} \end{aligned} $$
(83)
We note that (83) is an infinite dimensional counterpart of the linear eigenvalue problem discussed in section “Linear Eigenvalue Problems” where the energy level μ corresponds to the unknown positive eigenvalue. That is, we are solving the following optimization problem
$$\displaystyle \begin{aligned}\begin{array}{ll} \min\limits_u V(u) = \frac{1}{2}(u,\, Hu)_{L^2(\mathbb{R})},& H = -\frac{1}{2}\frac{d^2}{dx^2} + \frac{1}{2}x^{2},\\ s.t. \| u \|{}_{L^2(\mathbb{R})} = 1.& \end{array}\end{aligned}$$

We emphasize that even though (83) is a simple problem with an exact solution (see below), it has all the relevant properties of more challenging eigenvalue problems, such as the ones treated in Gulliksson et al. (2012), and is thus excellent for illustrating DFPM.

Let \(u^{\left (m\right )}\) denote the mth eigenstate and \(\mu ^{\left ( m\right )} = (u^{\left (m\right )},Hu^{\left (m\right )})\) the corresponding eigenvalue. To obtain \(u^{\left (m\right )}\), we use, in addition to (83), the m − 1 orthogonality constraints
$$\displaystyle \begin{aligned} \int\limits_{\mathbb{R}}\bar{{u}}^{\left(1\right)}u^{\left(m\right)}dx=0,\:\int\limits_{\mathbb{R}}\bar{{u}}^{\left(2\right)}u^{\left(m\right)}dx=0,\:\ldots,\:\int\limits_{\mathbb{R}}\bar{{u}}^{\left(m-1\right)}u^{\left(m\right)}dx=0,{} \end{aligned} $$
(84)
where, as before, the bar denotes complex conjugation. Using (83) and (84), we formulate the dynamical system and the m constraints, compactly written using Kronecker's delta, that is,
$$\displaystyle \begin{aligned} \begin{array}{c} \ddot{u}^{\left(m\right)}+\eta \dot{u}^{\left(m\right)}=\frac{1}{2}u_{xx}^{\left(m\right)}-\frac{1}{2}x^{2}u^{\left(m\right)}+\mu^{\left(m\right)}u^{\left(m\right)},\\ \\ g_{n}=\int\limits_{\mathbb{R}}\bar{{u}}^{\left(n\right)}u^{\left(m\right)}dx-\delta_{nm}=0,\: n=1,\:2,\:\ldots,m. \end{array} \end{aligned} $$
(85)

Since one needs access to \(u^{\left (n\right )},\: n=1,\:2,\:\ldots ,m-1\), we solve (85) in consecutive order. Here we have used the damped constrained approach described in section “Linear Eigenvalue Problems” with symplectic Euler, where the parameters Δt and η were chosen according to (25).

Equation (83) possesses the explicit solutions
$$\displaystyle \begin{aligned} u^{\left(n\right)}=\frac{1}{\pi^{1/4}\sqrt{2^{\left(n-1\right)}\left(n-1\right)!}}\mathcal{H}_{n-1}\left(x\right)\exp\left(-\frac{x^{2}}{2}\right),\: \mu^{(n)}=n-\frac{1}{2},\: n=1,\:2,\:3,\:\ldots\:,{} \end{aligned} $$
(86)
where \(\mathcal {H}_{k}\) denotes the Hermite polynomial of degree k.
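The exact states (86) are convenient reference values when testing a DFPM implementation. The sketch below evaluates them with NumPy's (physicists') Hermite polynomials and verifies the normalization on the grid used in Fig. 9:

```python
import numpy as np
from math import factorial, pi
from numpy.polynomial.hermite import hermval

def eigenstate(n, x):
    """Exact solution (86): the n-th state uses the Hermite polynomial
    of degree n - 1, n = 1, 2, 3, ..."""
    coeffs = np.zeros(n)
    coeffs[-1] = 1.0   # select H_{n-1}
    norm = 1.0 / (pi**0.25 * np.sqrt(2.0**(n - 1) * factorial(n - 1)))
    return norm * hermval(x, coeffs) * np.exp(-x**2 / 2.0)

x = np.arange(-20.0, 20.0 + 1e-2, 1e-2)   # grid as in the text
dx = x[1] - x[0]
for n in range(1, 6):
    u = eigenstate(n, x)
    assert abs(np.sum(u * u) * dx - 1.0) < 1e-8   # ||u||_{L2} = 1
    # the corresponding energies are mu^(n) = n - 1/2
```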
In the left plot of Fig. 9 we show the first five wave functions \(u^{\left (n\right )},\: n=1,\:2,\:\ldots \:,5\), for the harmonic potential. The wave functions are sorted according to their energies μ^(n) = n − 1∕2, which is reflected in the ordering of the curves in the figure. We used −20 ≤ x ≤ 20 and Δx = 10^{−2} for the discretization. The parameters η and Δt were chosen optimally according to (25). We used the damped constraint approach (see section “Linear Eigenvalue Problems”) with k_1 = k_2 = … = k_5 = 5. In the right plot we give examples of the convergence of the corresponding energies \(\mu ^{\left (n\right )}(t)\) as t → ∞.
Fig. 9

Left figure: The five lowest eigenstates \(u^{ \left (n \right )},\: n=1,\:2,\:\ldots \:,5\) of (83) sorted by their respective energy μ(n). Dotted lines correspond to the numerical solution and the rings to the analytic solutions (86). The dashed black curve shows the potential x2∕2. Right figure: The convergence of the numerically calculated five energies \(\mu ^{ \left (n \right )}(t),\: n=1,\:2,\:\ldots \:,5\)

The Yrast Spectrum for Atoms Rotating in a Ring

In a recent development of DFPM, we investigated a one-dimensional nonlinear Schrödinger equation with a rotational term on a ring geometry with radius R = 1, i.e., \(I = \left \{x\in \mathbb {R}: -\pi < x \leq \pi \right \}\) with periodic boundary conditions.

The aim is to minimize
$$\displaystyle \begin{aligned} V=\int\limits_{-\pi}^{\pi} \left|\nabla u\right|{}^{2} + \gamma \pi \left|u\right|{}^{4} dx, \end{aligned} $$
(87)
subject to the constraints for normalization and the total angular momentum
$$\displaystyle \begin{aligned} g_{1}=\int\limits_{-\pi}^{\pi}\left|u\right|{}^{2}dx-1 = 0,\ g_{2}=-\mathrm{i} \int\limits_{-\pi}^{\pi}\bar{{u}}u_{x}dx-\ell=0. \end{aligned} $$
(88)
Together with the normalization and momentum constraints, the damped dynamical equation is given, following Sandin et al. (2016), as
$$\displaystyle \begin{aligned} \ddot{u}+\eta \dot{u}=u_{xx}-2\pi\gamma\left|u\right|{}^{2}u-\mathrm{i}\Omega u_{x}+\mu u, \ u \in L_2(I). \end{aligned} $$
(89)
In this case DFPM was implemented with a modified RATTLE method (Andersen, 1983), in which we solved for the Lagrange parameters μ and − Ω in each time step (see Sandin et al. 2016 for details).
In Fig. 10 we have plotted the resulting so-called Yrast curve (main figure) together with the density and phase of the corresponding complex wave function u for the particularly interesting points ℓ = 0, 0.5, 1 (inset figures). At integer values of ℓ (0 and 1 here), u is a plane wave \(u=\exp \left (i\ell x\right )/\sqrt {2\pi }\). At half-integer values (e.g., ℓ = 0.5), u corresponds to a dark soliton that circulates in the ring (see the right-upper and mid-lower inset figures).
Fig. 10

Yrast curve, i.e., energy vs momentum, with some examples of the density and the phase for the wavefunction u for γ = 7.5

In Sandin et al. (2016) it was demonstrated that DFPM outperformed another commonly used first-order method.

Phase Separation of Bosonic- and Fermionic-Densities in an Ultracold Atomic Mixture

In a recent work (Abdullaev et al., 2018), DFPM was used to calculate the initial conditions for a real-time propagation of coupled nonlinear Schrödinger-like equations to model a so-called Fermi-Bose mixture. This is an exotic state of matter consisting of two different interacting atomic species at ultracold temperatures.

The aim is to minimize the total energy functional E_tot subject to
$$\displaystyle \begin{aligned} g_{n}=\int\left|u_{n}\right|{}^{2}dx-N_{n,0},\ n=1,\ 2,{} \end{aligned} $$
(91)
where N_{1,0} = 1000 and N_{2,0} = 200 are the numbers of atoms and \(V_{1,2}^{(ext)}\) are the external potentials, which in this example were harmonic, \(V_{1,2}^{(ext)}=x^{2}/4\). We kept the bosonic coupling g_1 = 1 and varied the interatomic interaction g_12 = g_21 in the numerical example below.
In order to solve the problem above, we formulated the coupled dynamical system
$$\displaystyle \begin{aligned} \begin{array}{l} \ddot{{u}}_{1}+\eta_{1}\dot{{u}}_{1}=\left(\nabla^{2}-V_{1}^{(ext)}-g_{1}\left|u_{1}\right|{}^{2}-g_{12}\left|u_{2}\right|{}^{2}+\mu_{1}\right)u_{1}\\ \ddot{{u}}_{2}+\eta_{2}\dot{{u}}_{2}=\left(\nabla^{2}-V_{2}^{(ext)}-\frac{\pi^{2}}{4}\left|u_{2}\right|{}^{4}-g_{21}\left|u_{1}\right|{}^{2}+\mu_{2}\right)u_{2}, \end{array} \end{aligned} $$
(92)
where μ1 and μ2 are the Lagrange parameters.
Using the dynamical formulation for the constraints (see (11)), we evaluate μn(t), n = 1, 2, at each iteration step tk from
$$\displaystyle \begin{aligned} \mu_{1}=\frac{\int\left|\nabla u_{1}\right|{}^{2}+V_{1}^{(ext)}\left|u_{1}\right|{}^{2}+g_{1}\left|u_{1}\right|{}^{4}+g_{12}\left|u_{2}\right|{}^{2}\left|u_{1}\right|{}^{2}-\left|\dot{{u}}_{1}\right|{}^{2}dx-\frac{k_{1}}{2}\left(\int\left|u_{1}\right|{}^{2}dx-N_{1,0}\right)}{\int\left|u_{1}\right|{}^{2}dx},{} \end{aligned} $$
(93)
and
$$\displaystyle \begin{aligned} \mu_{2}=\frac{\int\left|\nabla u_{2}\right|{}^{2}+V_{2}^{(ext)}\left|u_{2}\right|{}^{2}+\frac{\pi^{2}}{4}\left|u_{2}\right|{}^{6}+g_{21}\left|u_{1}\right|{}^{2}\left|u_{2}\right|{}^{2}-\left|\dot{{u}}_{2}\right|{}^{2}dx-\frac{k_{2}}{2}\left(\int\left|u_{2}\right|{}^{2}dx-N_{2,0}\right)}{\int\left|u_{2}\right|{}^{2}dx}.{} \end{aligned} $$
(94)

For the simulations we used a fourth-order Runge-Kutta method in the XMDS2 code generator (www.xmds.org) with η1 = η2 = k1 = k2 = 1 and Δtk = 0.01.

We have studied the atomic densities \(\left |u_{1}\right |{ }^{2}\) and \(\left |u_{2}\right |{ }^{2}\) of the mixture for different values of g_12 = g_21 = ∓π (see Fig. 11). For attractive interatomic forces, g_12 = g_21 < 0, the densities overlap in order to minimize the total energy (see the left plot in Fig. 11). For repulsive interatomic forces, g_12 = g_21 > 0, the densities instead (partly) separate from each other in order to minimize the total energy (see the right plot in Fig. 11).
Fig. 11

Ground-state densities for an ultracold Fermi-Bose mixture with different interatomic interaction with the attractive interaction g12 = −π (left) and the repulsive interaction g12 = π (right). In both figures the blue curve shows the bosonic density, and the red dashed curve shows the fermionic density

In Fig. 12 (left) we have plotted the total energy Etot and the dynamical constraints g1(t) and g2(t) at each iteration step to demonstrate convergence while keeping the parameters constant η1 = η2 = k1 = k2 = 1.
Fig. 12

The left figure shows an example of convergence for the total energy (blue curve) and the normalization constraints (red and green dashed curves). The right figure shows a landscape of the inverse number of iterations for different parameter values. The thick black curve corresponds to the result using the projection method (see text)

We then performed a numerical (sub-)optimization of the parameters η_1 = η_2 and k_1 = k_2 to minimize the number of iterations needed for DFPM to reach the ground state. The corresponding landscape for the inverse number of iterations needed to reach \(\left |E_{\mathrm {tot}}-E_{\mathrm {ref}}\right |<10^{-8}\) is shown in the right part of Fig. 12. Here E_ref is the ground-state energy, calculated numerically for a large time (i.e., formally t → ∞). The corresponding result using projection instead of dynamic constraints for the two normalization conditions, i.e., with only η_1 = η_2 as the free parameter, was also calculated and is plotted in the right part of Fig. 12. In some sense this projection corresponds to the limit k_1 = k_2 → ∞, and qualitative agreement can be seen, especially in the right part of the figure, corresponding to the overdamped regime. The resolution of the parameters in the right figure is Δη_1 = Δk_1 = 0.1. In all cases we used Gaussian initial conditions for u_n, n = 1, 2, with initial normalization \(\int \left |u_{n}\right |{ }^{2}dx=2N_{n,0},\: n=1,\:2\) (in order to have substantial dynamics for the constraints), and interatomic interaction g_12 = −π.

For this example with only normalization constraints, we cannot conclude an advantage of the DFPM version with dynamic constraints, compared to DFPM with projection, with respect to the number of iterations. However, it is generally known that projection requires a smaller time step to maintain stability.

Conclusions and Future Work

We have described a new approach, DFPM, for solving optimization problems and equations using a second-order damped dynamical system together with symplectic methods. The strength of the method lies in the combination of a globally exponentially stable system and an energy-preserving symplectic method.

Based on the work presented here, we believe that DFPM has the capacity to solve a variety of problems more efficiently and accurately than existing methods. This is shown to be true for linear problems such as linear systems of equations, linear eigenvalue problems, and linear least squares problems (see section “Linear Problems”), where it is fairly easy to choose the parameters in DFPM efficiently.

A straightforward extension of the linear eigenvalue problems in section “Linear Eigenvalue Problems” is to consider the generalized eigenvalue problem Au = λBu, where A, B are positive definite matrices. Optimal choices of parameters can be derived by a simultaneous transformation of A and B to diagonal matrices and then estimated numerically. The corresponding Rayleigh quotient is easily calculated for finding the approximation of the generalized eigenvalue in each iteration. A major challenge here is to approximate the optimal parameters efficiently. However, because of the simplicity of DFPM, we believe that DFPM might be more useful for nonlinear eigenvalue problems A(λ)u = 0, where A(λ) is most often a polynomial, a rational function, or a function containing exponentials.

For ill-posed linear problems, considered in section “Ill-Posed Problems”, the method has the advantage of being highly competitive with existing iterative methods, as well as not requiring an a priori choice of the regularization parameter, as is needed in, e.g., Tikhonov regularization methods. It would be most interesting to further develop DFPM toward nonlinear ill-posed problems following the approach we have developed for the linear case. It will be a challenge to find good convergence criteria for the iterations as well as an efficient and robust way of choosing the damping parameter.

Regarding inequality constraints, we are currently considering using reflections for simple inequality bound constraints based on the idea of reflecting particles (see Kaufman and Pai 2012). A major challenge here is to construct the reflections such that the numerical method for solving the damped dynamical system conserves the energy without losing accuracy. As a simple example, we can take the optimization problem \(\min V(u)\) with positivity constraints u > 0. In step k of the numerical method for solving (4), we attain an updated position u_{k+1}. If any component of u_{k+1}, say \(u_i^{(k+1)}\), is nonpositive, a new update is defined for that component by finding the reflection at the surface u_i = 0, using the direction of \(v_i^{(k+1)}\) to define the angle of reflection α (see Fig. 13 and the sketch below). The energy conservation is maintained by adjusting the distance between u_{k+1} and the reflection point (although there are other ways of conserving the energy).
Fig. 13

Reflection on the surface u_i = 0, where u_k, v_k are the current approximations of the position and velocity u(t_k), v(t_k). The updated approximate position and velocity are u_{k+1}, v_{k+1}, and α is the angle of reflection with respect to the surface u_i = 0
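A minimal version of the reflection update for the bound u > 0, assuming a plain mirror across each violated surface (the energy-conserving variant described above would additionally rescale the reflected distance):

```python
import numpy as np

def reflect(u_new, v_new):
    """Mirror every nonpositive component of u_new across u_i = 0 and
    reverse the corresponding velocity component, so that the angle of
    incidence equals the angle of reflection alpha."""
    hit = u_new <= 0.0
    return np.where(hit, -u_new, u_new), np.where(hit, -v_new, v_new)
```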

For nonlinear problems, we presented in section “From Linear to Nonlinear Problems” two different approaches, based on the Lyapunov function and on a local linearization, that both can be made convergent for convex problems. We showed preliminary numerical results on a simple example. With more future research, using ideas from quasi-Newton methods and higher-order symplectic solvers, we think DFPM will be a competitive alternative for these problems.

By using negative damping, it is possible to add energy to the dynamical system, forcing the iterates to leave any vicinity of a minimum. This can be used to solve a global optimization problem. To illustrate the idea, we used DFPM with symplectic Euler to find the two minima of the peaks function
$$\displaystyle \begin{aligned} V(u) = 3(1-u_1)^2 e^{-u_1^2 - (u_2+1)^2} - 10\left(\frac{1}{5}u_1 - u_1^3 - u_2^5\right)e^{-u_1^2-u_2^2} - \frac{1}{3} e^{-(u_1+1)^2 - u_2^2}, \end{aligned}$$
see Fig. 14. In Fig. 15 the upper curve shows the norm of the gradient and the lower curve the size of the damping (note the negative damping). The starting point is chosen close to the minimum at (0.22826, −1.6255), and DFPM converges initially toward that minimum. The damping is then switched to a negative value, increasing the norm of the gradient, and the iterates depart from the minimum. When the damping is switched back to a positive value, DFPM converges toward the second local minimum, situated approximately at (−1.3474, 0.20450). We have not tried to find any optimal parameters in this simple example, which explains the rather slow convergence; a minimal sketch of the damping switch is given after Fig. 15.
Fig. 14

The peaks function

Fig. 15

The upper curve shows the convergence of DFPM and the lower curve the size of the damping when finding the minima of the peaks function
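The damping switch itself is easy to experiment with. The sketch below reuses the symplectic Euler update (59) on the peaks function with a finite-difference gradient; all thresholds and parameter values are illustrative and untuned, so which basin the final damped phase settles in depends on them.

```python
import numpy as np

def peaks(u):
    x, y = u
    return (3 * (1 - x)**2 * np.exp(-x**2 - (y + 1)**2)
            - 10 * (x / 5 - x**3 - y**5) * np.exp(-x**2 - y**2)
            - np.exp(-(x + 1)**2 - y**2) / 3)

def grad(u, h=1e-6):
    """Central-difference gradient of the peaks function."""
    e = np.eye(2)
    return np.array([(peaks(u + h * e[k]) - peaks(u - h * e[k])) / (2 * h)
                     for k in range(2)])

def run(u, v, eta, dt, stop, max_steps=20000):
    """Symplectic Euler (59) with fixed damping eta until stop(u) is true."""
    for _ in range(max_steps):
        if stop(u):
            break
        v = (1.0 - dt * eta) * v - dt * grad(u)
        u = u + dt * v
    return u, v

dt = 0.05
u, v = np.array([0.3, -1.5]), np.zeros(2)
# settle into the nearby minimum, ...
u, v = run(u, v, 1.0, dt, lambda u: np.linalg.norm(grad(u)) < 1e-6)
# ... inject energy with negative damping until the gradient grows, ...
u, v = run(u, v, -0.5, dt, lambda u: np.linalg.norm(grad(u)) > 2.0)
# ... then damp again and settle into whichever basin was reached
u, v = run(u, v, 1.0, dt, lambda u: np.linalg.norm(grad(u)) < 1e-6)
print(u, peaks(u))
```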

References

1. Abdullaev FKh, Ögren M, Sørensen MP (2018, submitted) Collective dynamics of Fermi-Bose mixtures with an oscillating scattering length
2. Afraites L, Dambrine M, Kateb D (2007) Conformal mappings and shape derivatives for the transmission problem with a single measurement. Numer Funct Anal Optim 28:519–551
3. Alvarez F (2000) On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J Control Optim 38(4):1102–1119
4. Alvarez F, Attouch H, Bolte J, Redont P (2002) A second-order gradient-like dissipative dynamical system with Hessian-driven damping. J Math Pures Appl 81(8):747–779
5. Andersen HC (1983) RATTLE: a velocity version of the SHAKE algorithm for molecular dynamics calculations. J Comput Phys 52:24–34
6. Ascher U, van den Doel K, Huang H (2007) Artificial time integration. BIT 47:3–25
7. Attouch H, Alvarez F (2000) The heavy ball with friction dynamical system for convex constrained minimization problems. Lect Notes Econ Math Syst 481:25–35
8. Attouch H, Chbani Z (2016) Combining fast inertial dynamics for convex optimization with Tikhonov regularization. arXiv:1602.01973
9. Attouch H, Goudou X, Redont P (2000) The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Commun Contemp Math 2(1):1–34
10. Baravdish G, Svensson O, Åström F (2015) On backward p(x)-parabolic equations for image enhancement. Numer Funct Anal Optim 36(2):147–168
11. Baravdish G, Svensson O, Gulliksson M, Zhang Y (2018) A damped flow for image denoising. ArXiv e-prints
12. Begout P, Bolte J, Jendoubi M (2015) On damped second-order gradient systems. J Differ Equ 259(7):3115–3143
13. Bertsekas DP (2015) Convex optimization algorithms. Athena Scientific, Belmont
14. Bhatt A, Floyd D, Moore BE (2016) Second order conformal symplectic schemes for damped Hamiltonian systems. J Sci Comput 66(3):1234–1259
15. Chadan K, Colton D, Päivärinta L, Rundell W (1997) An introduction to inverse scattering and inverse spectral problems. SIAM, Philadelphia
16. Cheng X, Gong R, Han W, Zheng W (2014) A novel coupled complex boundary method for inverse source problems. Inverse Prob 30:055002
17. Cheng X, Lin G, Zhang Y, Gong R, Gulliksson M (2018) A modified coupled complex boundary method for an inverse chromatography problem. J Inverse Ill-Posed Prob 26:33–49
18. Chu MT (2008) Numerical linear algebra algorithms as dynamical systems. Acta Numer 17:1–86
19. Edvardsson S, Neuman M, Edström P, Olin H (2015) Solving equations through particle dynamics. Comput Phys Commun 197:169–181
20. Engl H, Hanke M, Neubauer A (1996) Regularization of inverse problems, vol 375. Springer, New York
21. Gulliksson M (2017) The discrete dynamical functional particle method for solving constrained optimization problems. Dolomites Res Notes Approx 10:6–12
22. Gulliksson M, Edvardsson S, Persson J (2012) The dynamical functional particle method: an approach for boundary value problems. J Appl Mech 79(2):021012
23. Gulliksson M, Edvardsson S, Lind A (2013) The dynamical functional particle method. ArXiv e-prints
24. Gulliksson M, Holmbom A, Persson J, Zhang Y (2018) A separating oscillation method of recovering the G-limit in standard and non-standard homogenization problems. Inverse Prob 32:025005
25. Hairer E, Lubich C, Wanner G (2006) Geometric numerical integration, 2nd edn. Springer, Berlin/Heidelberg
26. Han W, Cong W, Wang G (2006) Mathematical theory and numerical analysis of bioluminescence tomography. Inverse Prob 22:1659–1675
27. Kaltenbacher B, Neubauer A, Scherzer O (2008) Iterative regularization methods for nonlinear ill-posed problems. Walter de Gruyter, Berlin
28. Karafyllis I, Grüne L (2013) Lyapunov function based step size control for numerical ODE solvers with application to optimization algorithms. In: Hüper K, Trumpf J (eds) Mathematical system theory – festschrift in honor of Uwe Helmke on the occasion of his 60th birthday. CreateSpace, pp 183–210. http://num.math.unibayreuth.de/de/publications/2013/gruene_karafyllis_lyapunov_function_based_step_size_control_2013/index.html
29. Kaufman D, Pai D (2012) Geometric numerical integration of inequality constrained, nonsmooth Hamiltonian systems. SIAM J Sci Comput 34(5):A2670–A2703
30. Lin G, Cheng X, Zhang Y (2018a) A parametric level set based collage method for an inverse problem in elliptic partial differential equations. J Comput Appl Math 340:101–121
31. Lin G, Zhang Y, Cheng X, Gulliksson M, Forssen P, Fornstedt T (2018b) A regularizing Kohn-Vogelius formulation for the model-free adsorption isotherm estimation problem in chromatography. Appl Anal 97:13–40
32. Lions J, Magenes E (1972) Non-homogeneous boundary value problems and applications, vol I. Springer, Berlin
33. McLachlan RI, Quispel GRW (2006) Geometric integrators for ODEs. J Phys A 39:5251–5285
34. McLachlan R, Modin K, Verdier O, Wilkins M (2014) Geometric generalisations of SHAKE and RATTLE. Found Comput Math 14(2):339
35. Nesterov Y (1983) A method of solving a convex programming problem with convergence rate O(1/k²). Sov Math Dokl 27:372–376
36. Neubauer A (2000) On Landweber iteration for nonlinear ill-posed problems in Hilbert scales. Numer Math 85:309–328
37. Neubauer A (2017) On Nesterov acceleration for Landweber iteration of linear ill-posed problems. J Inverse Ill-Posed Prob 25:381–390
38. Neuman M, Edvardsson S, Edström P (2015) Solving the radiative transfer equation with a mathematical particle method. Opt Lett 40(18):4325–4328
39. Poljak BT (1964) Some methods of speeding up the convergence of iterative methods. Zh Vychisl Mat Mat Fiz 4:791
40. Rieder A (2005) Runge-Kutta integrators yield optimal regularization schemes. Inverse Prob 21:453–471
41. Roubíček T (2013) Nonlinear partial differential equations with applications, vol 153. Springer, Basel
42. Sandin P, Ögren M, Gulliksson M (2016) Numerical solution of the stationary multicomponent nonlinear Schrödinger equation with a constraint on the angular momentum. Phys Rev E 93:033301
43. Incerti S, Parisi V, Zirilli F (1979) A new method for solving nonlinear simultaneous equations. SIAM J Numer Anal 16(5):779–789
44. Scherzer O, Grasmair M, Grossauer H, Haltmeier M, Lenzen F (2009) Variational methods in imaging. Springer, New York
45. Schock E (1985) Approximate solution of ill-posed equations: arbitrarily slow convergence vs. superconvergence. Constructive Methods for the Practical Treatment of Integral Equations 73:234–243
46. Smyrlis G, Zisis V (2004) Local convergence of the steepest descent method in Hilbert spaces. J Math Anal Appl 300(2):436–453
47. Song S, Huang J (2012) Solving an inverse problem from bioluminescence tomography by minimizing an energy-like functional. J Comput Anal Appl 14:544–558
48. Tautenhahn U (1994) On the asymptotical regularization of nonlinear ill-posed problems. Inverse Prob 10:1405–1418
49. Tikhonov A, Leonov A, Yagola A (1998) Nonlinear ill-posed problems, vols I and II. Chapman and Hall, London
50. Tsai C-C, Liu C-S, Yeih W-C (2010) Fictitious time integration method of fundamental solutions with Chebyshev polynomials for solving Poisson-type nonlinear PDEs. CMES 56(2):131–151
51. Vainikko G, Veretennikov A (1986) Iteration procedures in ill-posed problems. Nauka, Moscow (in Russian)
52. Wang Y, Zhang Y, Lukyanenko D, Yagola A (2012) A method of restoring the aerosol particle size distribution function on the set of piecewise-convex functions. Vychisl Metody Programm 13:49–66
53. Wang Y, Zhang Y, Lukyanenko D, Yagola A (2013) Recovering aerosol particle size distribution function on the set of bounded piecewise-convex functions. Inverse Prob Sci Eng 21:339–354
54. Watson L, Sosonkina M, Melville R, Morgan A, Walker H (1997) Algorithm 777: HOMPACK90: a suite of Fortran 90 codes for globally convergent homotopy algorithms. ACM Trans Math Softw 23(4):514–549
55. Yao Z, Zhang Y, Bai Z, Eddy WF (2018) Estimating the number of sources in magnetoencephalography using spiked population eigenvalues. J Am Stat Assoc 113(522):505–518
56. Zhang Y, Hofmann B (2018) On the second order asymptotical regularization of linear ill-posed inverse problems. Appl Anal, pp 1–26. https://doi.org/10.1080/00036811.2018.1517412
57. Zhang Y, Lukyanenko D, Yagola A (2013) Using Lagrange principle for solving linear ill-posed problems with a priori information. Vychisl Metody Programm 14:468–482
58. Zhang Y, Lukyanenko D, Yagola A (2015) An optimal regularization method for convolution equations on the sourcewise represented set. J Inverse Ill-Posed Prob 23:465–475
59. Zhang Y, Gulliksson M, Hernandez Bennetts V, Schaffernicht E (2016a) Reconstructing gas distribution maps via an adaptive sparse regularization algorithm. Inverse Prob Sci Eng 24:1186–1204
60. Zhang Y, Lin G, Forssen P, Gulliksson M, Fornstedt T, Cheng X (2016b) A regularization method for the reconstruction of adsorption isotherms in liquid chromatography. Inverse Prob 32:105005
61. Zhang Y, Lukyanenko D, Yagola A (2016c) Using Lagrange principle for solving two-dimensional integral equation with a positive kernel. Inverse Prob Sci Eng 24:811–831
62. Zhang Y, Forssen P, Fornstedt T, Gulliksson M, Dai X (2017a) An adaptive regularization algorithm for recovering the rate constant distribution from biosensor data. Inverse Prob Sci Eng 24:1–26
63. Zhang Y, Lin G, Forssen P, Gulliksson M, Fornstedt T, Cheng X (2017b) An adjoint method in inverse problems of chromatography. Inverse Prob Sci Eng 25:1112–1137
64. Zhang Y, Gong R, Cheng X, Gulliksson M (2018a) A dynamical regularization algorithm for solving inverse source problems of elliptic partial differential equations. Inverse Prob 34:065001
65. Zhang Y, Gong R, Gulliksson M, Cheng X (2018b) A coupled complex boundary expanding compacts method for inverse source problems. J Inverse Ill-Posed Prob, pp 1–20. https://doi.org/10.1515/jiip-2017-0002
