1 Introduction

We aim to minimize a non-smooth functional consisting of a combined \(L^1/L^2\) data fidelity term and a total variation term. Let \(\varOmega \subseteq {\mathbb {R}}^d\) be an open, bounded and simply connected domain with Lipschitz boundary, where \(d \in {\mathbb {N}}\) denotes the spatial dimension, e.g. \(d=1\) for signals or \(d=2\) for images. Further, we denote by \(g \in L^2(\varOmega )\) the given data, by \(T:L^2(\varOmega )^m \rightarrow L^2(\varOmega )\) a bounded linear operator, where \(m\in {\mathbb {N}}\) denotes the number of channels, e.g. \(m=1\) for grey-scale images or \(m=d\) for motion fields, and by \(\alpha _1,\alpha _2,\lambda \ge 0\) adjustable weighting parameters. Then we consider the so-called \(L^1\)-\(L^2\)-TV model

$$\begin{aligned} \inf _{{\textbf{u}}\in L^2(\varOmega )^m\cap BV(\varOmega )^m} \alpha _1 \Vert T {\textbf{u}} -g \Vert _{L^1(\varOmega )} + \tfrac{\alpha _2}{2} \Vert T {\textbf{u}} -g \Vert _{L^2(\varOmega )}^2 + \lambda \int _{\varOmega }|D{\textbf{u}}|_F, \end{aligned}$$
(1)

which was first proposed in a slightly more general form in [36] for the scalar-valued case \(m=1\). Here \(BV(\varOmega )^m\) denotes the space of m-vector-valued functions of bounded variation, i.e. \(BV(\varOmega )^m:= \{{\textbf{u}} \in L^1(\varOmega )^m :\int _\varOmega |D{\textbf{u}}|_F < \infty \}\), where \(\int _\varOmega |D{\textbf{u}}|_F\) denotes the total variation of \({\textbf{u}}\) in \(\varOmega \) defined by

$$\begin{aligned} \int _{\varOmega }|D{\textbf{u}}|_F:= \sup \left\{ \int _{\varOmega } {\textbf{u}} \cdot {\text {div}}\,{\textbf{p}} \,\textrm{d}\textbf{x} \,:\, {\textbf{p}} \in (C_0^\infty (\varOmega ))^{d\times m},\ |{\textbf{p}}(\textbf{x})|_F \le 1 \ \forall \textbf{x} \in \varOmega \right\} . \end{aligned}$$
(2)

The operator \({\text {div}}:(C_0^\infty (\varOmega ))^{d\times m} \rightarrow C_0^{\infty }(\varOmega )^m\) describes the divergence with respect to d (i.e. column-wise), while \(|{\,\cdot \,}|_F:{\mathbb {R}}^{d\times m} \rightarrow {\mathbb {R}}\) denotes the Frobenius norm. For \({\textbf{u}} \in H^1(\varOmega )^m\) the total variation becomes \(\int _\varOmega |\nabla {\textbf{u}}|_F \,\textrm{d}\textbf{x}\), see [7, Section 10.1] or Proposition 3.4 below in a more general setting. The space \(BV(\varOmega )^m\) equipped with the norm \(\Vert {\textbf{u}}\Vert _{BV(\varOmega )^m}:= \Vert {\textbf{u}}\Vert _{L^1(\varOmega )^m} + \int _{\varOmega }|D{\textbf{u}}|_F\) is a Banach space [7, Theorem 10.1.1]. Note that choosing \(|{\,\cdot \,}|_F\) leads to rotational invariance of the total variation in both the domain (change of coordinates) and the range (global rotation of the vector field) of \({\textbf{u}}\). We refer to [30] for a short overview of other ways to define the total variation for vector-valued functions; in all of these definitions the topological properties remain the same. If we replace the pointwise norm \(|{\,\cdot \,}|_F\) in the definition of the total variation above with any other matrix norm, the resulting total variation may differ but the space \(BV(\varOmega )^m\) remains topologically unchanged. Indeed, since any two norms \(|{\,\cdot \,}|_a,~|{\,\cdot \,}|_b:~{\mathbb {R}}^{d\times m}~\rightarrow ~[0,\infty )\) are equivalent, i.e. \(c |\textbf{x}|_b \le |\textbf{x}|_a \le C |\textbf{x}|_b\) for all \(\textbf{x} \in {\mathbb {R}}^{d\times m}\) and constants \(c, C > 0\), we observe for any 1-homogeneous functional \(F: C_0^\infty (\varOmega )^{d\times m} \rightarrow {\mathbb {R}}\), i.e. \(F(t{\textbf{p}})=tF({\textbf{p}})\) for any \(t>0\), that

$$\begin{aligned} \tfrac{1}{C} \sup _{|{\textbf{p}}(\textbf{x})|_b \le 1} F({\textbf{p}}) \le \tfrac{1}{C} \sup _{\tfrac{1}{C}|{\textbf{p}}(\textbf{x})|_a \le 1} F({\textbf{p}})&= \sup _{|{\textbf{p}}(\textbf{x})|_a \le 1} F({\textbf{p}}) \\&\le \sup _{c|{\textbf{p}}(\textbf{x})|_b \le 1} F({\textbf{p}}) = \tfrac{1}{c} \sup _{|{\textbf{p}}(\textbf{x})|_b \le 1} F({\textbf{p}}). \end{aligned}$$

Consequently, the corresponding norms \(\Vert {\,\cdot \,}\Vert _a:=\Vert {\,\cdot \,}\Vert _{L^1(\varOmega )^m} + \int _\varOmega |D{\,\cdot \,}|_a\) and \(\Vert {\,\cdot \,}\Vert _b:=\Vert {\,\cdot \,}\Vert _{L^1(\varOmega )^m} + \int _\varOmega |D{\,\cdot \,}|_b\) on \(BV(\varOmega )^m\) are equivalent and \(BV(\varOmega )^m\) carries the same topology as e.g. the space of bounded variation from the extensive work [6].
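
The two-sided bound above can be sanity-checked numerically. The following sketch (our own illustration, not from the paper) compares a discrete total variation built from the pointwise Frobenius norm with one built from the entrywise 1-norm, for which \(|X|_F \le |X|_1 \le \sqrt{dm}\,|X|_F\) holds pointwise and hence also for the sums:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, npts = 2, 3, 100
J = rng.standard_normal((npts, d, m))            # pointwise "Jacobians" Du(x_k)

tv_frob = np.linalg.norm(J, axis=(1, 2)).sum()   # discrete TV with |.|_F
tv_one = np.abs(J).sum()                         # discrete TV with entrywise |.|_1

# |X|_F <= |X|_1 <= sqrt(d*m) |X|_F pointwise carries over to the sums,
# so both choices induce equivalent norms on the discrete level as well
assert tv_frob <= tv_one <= np.sqrt(d * m) * tv_frob
```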

It is demonstrated in [36, 43, 45] that optimization problem (1) is well suited to the task of removing a mixture of Gaussian and impulse noise. Moreover, it is easy to see that (1) is a generalization of two well-known total variation models. For \(\alpha _1=0\) in (1) we obtain the so-called \(L^2\)-TV model, which has been successfully used to remove Gaussian noise in images, see e.g. [20]; for \(\alpha _2 =0\) we obtain the so-called \(L^1\)-TV model, which has been proposed for removing impulse noise, see e.g. [5, 47, 48]. Moreover, these two special instances have been used for calculating the optical flow in image sequences, cf. [24]. In the literature modifications of the \(L^1\)-\(L^2\)-TV model have been presented, see e.g. [31, 46]. In [31] the total variation is replaced by \(\Vert Wu\Vert _{L^1}\) with W being a wavelet tight frame transform. The second-order total generalized variation [16] has been used as regularization term in [46], where also box constraints are incorporated to ensure that the reconstruction lies in the respective dynamic range.
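
To make the interplay of the three terms in (1) concrete, here is a minimal discrete sketch (our own illustration) for \(d=2\), \(m=1\) and \(T=I\), using forward differences for the gradient; the function name and discretization are our own choices:

```python
import numpy as np

def l1_l2_tv_energy(u, g, alpha1, alpha2, lam):
    """Discrete version of the L1-L2-TV energy (1) for d = 2, m = 1, T = I."""
    # combined L1 / squared-L2 data fidelity
    fidelity = alpha1 * np.abs(u - g).sum() + 0.5 * alpha2 * ((u - g) ** 2).sum()
    # forward differences approximate Du; |.|_F is the pointwise Frobenius norm
    dx = np.diff(u, axis=0, append=u[-1:, :])
    dy = np.diff(u, axis=1, append=u[:, -1:])
    tv = np.sqrt(dx ** 2 + dy ** 2).sum()
    return fidelity + lam * tv

g = np.zeros((4, 4)); g[:, 2:] = 1.0    # piecewise-constant "image"
# for u = g only the jump across the edge contributes (length 4, height 1)
assert np.isclose(l1_l2_tv_energy(g, g, 1.0, 1.0, 0.1), 0.1 * 4)
```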

In this paper we derive a primal-dual semi-smooth Newton method, cf. [34], in order to find an approximate solution of (1) on a finite element grid. Such Newton methods have already been used for the \(L^2\)-TV model [39, 44] and the \(L^1\)-TV model [25, 42] in image reconstruction, i.e. \(m=1\). We extend the approach of semi-smooth Newton methods to a vector-valued setting and to the full \(L^1\)-\(L^2\)-TV model. In particular, dualization results for the scalar models in [35] and [37] need to be adjusted to our vector-valued setting. In comparison to the primal-dual methods in [25, 39, 42, 44], where the dualization is performed either on smooth or on discrete function spaces, our dualization setting allows for non-smooth solutions, in particular for solutions in \(L^2(\varOmega )^m \cap BV(\varOmega )^m\). We rigorously analyze the existence and uniqueness of solutions of the respective optimization problems.

Further, our proposed algorithm is compared with the primal-dual method of [21] in a finite element setting. Note that based on the method in [21] finite element discretizations of the \(L^2\)-TV model have been considered in [10, 11, 12, 50] and of the \(L^1\)-\(L^2\)-TV model with T being the identity in [3]. We refer the reader to [22] for an overview of finite element discretization techniques for the total variation. Our comparison demonstrates numerically that the proposed Newton method substantially outperforms the method in [21].

The rest of the paper is organized as follows: In Sect. 2 we describe the functional-analytic setting used in this paper and formulate the mathematical problem. Conditions for the existence and uniqueness of a solution of the problem are analyzed for different function space settings. A regularized model is considered in Sect. 3 for which a pair of primal-dual problems is derived and analyzed based on Fenchel duality. We prove that this regularized model \(\varGamma \)-converges to the non-regularized model of Sect. 2. In Sect. 4 the primal-dual semi-smooth Newton algorithm based on the pair of primal-dual problems of Sect. 3 is introduced and its well-posedness is analyzed. We present in Sect. 5 the discretization of the considered problem using finite element spaces. Numerical experiments demonstrate the applicability of the proposed algorithm in a finite element setting in Sect. 6.

2 Preliminaries

2.1 Basic Terminology

For a Banach space V we write its corresponding norm as \(\Vert {\,\cdot \,}\Vert _V\), while \(|{\,\cdot \,}|\) describes the Euclidean norm on \({\mathbb {R}}^n\), \(n\in {\mathbb {N}}\). Further, the expression \(V^*\) denotes the continuous dual space of V, i.e. the space of bounded linear functionals \(V \rightarrow {\mathbb {R}}\), and we use \({\langle }{\,\cdot \,}, {\,\cdot \,}{\rangle }_{V, V^*}\) for the duality pairing. For a bounded linear operator \(\varLambda : V \rightarrow W\) between two Banach spaces V and W we use \(\Vert \varLambda \Vert := \Vert \varLambda \Vert _{{\mathcal {L}}(V,W)}\) for the operator norm and denote the adjoint operator by \(\varLambda ^*: W^* \rightarrow V^*\).

For \(V=L^2(\varOmega )^{n}\), \(n \in {\mathbb {N}}\), i.e. the Hilbert space of square-integrable vector-valued functions, we denote the associated inner product by brackets \({\langle }{\,\cdot \,}, {\,\cdot \,}{\rangle }_V: \big ((u_k)_{k=1}^n, (v_k)_{k=1}^n\big ) \mapsto \sum _{k=1}^n {\langle }u_{k}, v_{k}{\rangle }_{L^2(\varOmega )}\), where \({\langle }{\,\cdot \,}, {\,\cdot \,}{\rangle }_{L^2(\varOmega )}\) is the standard \(L^2\) inner product. For notational convenience, we identify a matrix-valued space \(L^2(\varOmega )^{d\times m}\), \(m \in {\mathbb {N}}\), with \(L^2(\varOmega )^{dm}\) via the numbering \((i,j) \mapsto (j-1) \cdot d + i\), \(i \in \{1, \dotsc , d\}\), \(j \in \{1, \dotsc , m\}\) of the respective components. Moreover, for any \(L^2\) function space we may use the inner product shorthand notations \({\langle }{\,\cdot \,}, {\,\cdot \,}{\rangle }_{L^2}:= {\langle }{\,\cdot \,}, {\,\cdot \,}{\rangle }_V\) and similarly \(\Vert {\,\cdot \,}\Vert _{L^2}:= \Vert {\,\cdot \,}\Vert _V\) for the norm.

Often operations are applied in a pointwise sense, such that for a vector-valued function \({\textbf{u}}: \varOmega \rightarrow {\mathbb {R}}^m\), \(m \in {\mathbb {N}}\) the expression \(|{\textbf{u}}|\) denotes the function \(|{\textbf{u}}|: \varOmega \rightarrow {\mathbb {R}}\), \(\textbf{x} \mapsto |{\textbf{u}}(\textbf{x})|\).

Similarly \(|{\textbf{u}}| \ge 1\) would denote a predicate \(w: \varOmega \rightarrow \{\textrm{true}, \textrm{false}\}\) evaluating to true where \(|{\textbf{u}}(\textbf{x})| \ge 1\) for \(\textbf{x}\in \varOmega \) and to false otherwise. For such a predicate w we define the indicator \(\chi _w \in \overline{{\mathbb {R}}}\) as

$$\begin{aligned} \chi _{w}:= {\left\{ \begin{array}{ll} 0 &{} \text {if }w(\textbf{x})\text { is true for a.e.} \ \textbf{x} \in \varOmega , \\ \infty &{} \text {else}. \end{array}\right. } \end{aligned}$$

Thus \(\chi _{|{\textbf{u}}| \le 1}\) evaluates to \(\infty \) if and only if \(|{\textbf{u}}|\) is greater than 1 on a set of non-zero measure.
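
On a discrete grid, where "almost everywhere" degenerates to "at every node", the indicator can be sketched as follows (hypothetical helper, our own illustration):

```python
import numpy as np

def chi(predicate_values):
    """Indicator chi_w: 0 if the predicate holds at every grid node, inf otherwise."""
    return 0.0 if np.all(predicate_values) else np.inf

p = np.array([0.3, -0.9, 0.5])
assert chi(np.abs(p) <= 1.0) == 0.0     # |p| <= 1 holds everywhere
assert chi(np.abs(p) <= 0.5) == np.inf  # violated on a "set of non-zero measure"
```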

A function \(f: V \rightarrow \overline{{\mathbb {R}}}:={\mathbb {R}}\cup \{+\infty \}\) is called proper if \(f(u) < \infty \) for at least one \(u \in V\) and \(f(u) > -\infty \) for all \(u \in V\). Further, f is called coercive if for any sequence \((v_n)_{n\in {\mathbb {N}}} \subseteq V\) we have

$$\begin{aligned} \Vert v_n\Vert _V \rightarrow \infty \implies f(v_n) \rightarrow \infty . \end{aligned}$$

A bilinear form \(a:V\times V \rightarrow {\mathbb {R}}\) is called V-elliptic or coercive, if there exists a constant \(c>0\) such that \(a(v,v) \ge c \Vert v\Vert ^2_V\) for all \(v\in V\).

For a convex functional \(f: V \rightarrow \overline{{\mathbb {R}}}\), we define the subdifferential of f at \(v\in V\) as the set-valued mapping \(\partial f(v) = \emptyset \) if \(f(v)=\infty \), and otherwise as

$$\begin{aligned} \partial f(v) = \{v^* \in V^* :\langle v^*, u-v\rangle _{V^*,V} + f(v) \le f(u) \ \ \forall u\in V \}. \end{aligned}$$
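
As a one-dimensional illustration (ours, not from the paper): for \(f(v) = |v|\) on \(V = {\mathbb {R}}\) the subdifferential at \(v=0\) is the interval \([-1,1]\); the defining inequality can be spot-checked on a grid:

```python
import numpy as np

f = abs                       # f(v) = |v| on V = R
v = 0.0
us = np.linspace(-2.0, 2.0, 401)

# every v* in [-1, 1] satisfies  <v*, u - v> + f(v) <= f(u)  for all u,
# so [-1, 1] is contained in the subdifferential of |.| at 0
for v_star in (-1.0, -0.3, 0.0, 0.7, 1.0):
    assert np.all(v_star * (us - v) + f(v) <= np.abs(us) + 1e-12)

# v* = 1.5 violates the inequality for some u, so it is not a subgradient
assert np.any(1.5 * (us - v) > np.abs(us))
```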

Let us recall the notion of \(\varGamma \)-convergence and \(\varGamma \)-limit, see [15]: a sequence \((f_j)_j\) of functions \(f_j:V \rightarrow \overline{{\mathbb {R}}}\) \(\varGamma \)-converges in V to its \(\varGamma \)-limit \(f:V \rightarrow \overline{{\mathbb {R}}}\) (we write shortly \(f_j \overset{\varGamma }{\rightarrow } f\)), if for all \(v\in V\) we have

  1. \(f(v) \le \liminf _{j\rightarrow \infty } f_j(v_j)\) for every \((v_j)_j\subseteq V\) converging to v;

  2. \(f(v) \ge \limsup _{j\rightarrow \infty } f_j(v_j)\) for some \((v_j)_j\subseteq V\) converging to v.
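
A classic textbook example (not from the paper): \(f_j(v) = v^2 + \sin (jv)\) \(\varGamma \)-converges to \(f(v) = v^2 - 1\). One practical consequence, convergence of the minimal values under equicoercivity, can be checked numerically:

```python
import numpy as np

vs = np.linspace(-2.0, 2.0, 200001)
f = vs ** 2 - 1.0                  # Gamma-limit of f_j(v) = v^2 + sin(j v)

for j in (10, 100, 1000):
    fj = vs ** 2 + np.sin(j * vs)
    # under equicoercivity, Gamma-convergence implies min f_j -> min f = -1
    assert abs(fj.min() - f.min()) < 1.0 / j + 1e-2
```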

A sequence \((v_j)_j\) is called (weakly) V-convergent, if it converges (weakly) in the space V. A function \(f: V \rightarrow \overline{{\mathbb {R}}}\) is called lower semi-continuous (l.s.c.) if for all \(u \in V\) we have that \(\liminf _{k \rightarrow \infty } f(v_k) \ge f(u)\) for any sequence \((v_k)_k \rightarrow u\) as \(k \rightarrow \infty \). In a Banach space we have the following important set-based characterization of lower semi-continuity.

Proposition 2.1

[15, Remark 1.3] Let V be a Banach space. A function \(F: V \rightarrow \overline{{\mathbb {R}}}\) is lower semi-continuous if and only if all level sets \(L_a:= \{v \in V: F(v) \le a\}\), \(a \in {\mathbb {R}}\) are sequentially closed.

Further, for convex functions lower semi-continuity with respect to weak convergence coincides with lower semi-continuity with respect to strong convergence.

Lemma 2.1

Let V be a Banach space and \(F: V \rightarrow \overline{{\mathbb {R}}}\) be a convex function. Then F is lower semi-continuous if and only if it is weakly lower semi-continuous.

Proof

Since F is convex, the level sets \(L_a\), \(a \in {\mathbb {R}}\) of F are convex. Then due to [27, Corollary 8.74] all \(L_a\), \(a \in {\mathbb {R}}\) are closed if and only if they are weakly closed. The characterization of lower semi-continuity due to Proposition 2.1 finalizes the argument. \(\square \)

Finally, lower semi-continuity propagates to the supremum.

Lemma 2.2

[15, Remark 1.4 (ii)] Let V be a Banach space and \(F_k: V \rightarrow \overline{{\mathbb {R}}}\), \(k \in {\mathcal {I}}\) for some index set \({\mathcal {I}}\) be lower semi-continuous functions. Then the supremum \(F: V \rightarrow \overline{{\mathbb {R}}}\), \(F(v):= \sup _{k\in {\mathcal {I}}} F_k(v)\) is lower semi-continuous.

The conjugate function (or Legendre transform) of a convex function \(F: V\rightarrow \overline{{\mathbb {R}}}\) is defined as \(F^*: V^* \rightarrow \overline{{\mathbb {R}}}\) with

$$\begin{aligned} F^*(v^*)= \sup _{v\in V} \{\langle v,v^* \rangle _{V,V^*} - F(v)\}. \end{aligned}$$
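
For instance, \(F(v) = \tfrac{1}{2}v^2\) on \({\mathbb {R}}\) is its own conjugate, \(F^*(v^*) = \tfrac{1}{2}(v^*)^2\); a grid-based Legendre transform (our own sketch) reproduces this for arguments in the interior of the grid:

```python
import numpy as np

vs = np.linspace(-10.0, 10.0, 20001)   # grid discretizing V = R

def conjugate(F_vals, v_star):
    # F*(v*) = sup_v { <v, v*> - F(v) }, approximated by a maximum over the grid
    return np.max(v_star * vs - F_vals)

F_vals = 0.5 * vs ** 2
for v_star in (-3.0, 0.0, 1.5, 4.0):
    assert abs(conjugate(F_vals, v_star) - 0.5 * v_star ** 2) < 1e-4
```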

If F is separable, i.e. \(F(v_1, v_2) = F_1(v_1) + F_2(v_2)\) for functions \(F_1: V_1 \rightarrow \overline{{\mathbb {R}}}\) and \(F_2: V_2 \rightarrow \overline{{\mathbb {R}}}\), then so is its conjugate \(F^*\):

$$\begin{aligned} F^*(v_1^*, v_2^*) = F_1^*(v_1^*) + F_2^*(v_2^*), \end{aligned}$$

see [28, III, Remark 4.3]. We present a specific version of the Fenchel duality theorem which will be convenient to us for the type of minimization problem we are considering in this paper.

Theorem 2.1

(Fenchel duality, [28, Remark III.4.2]) Let V and W be reflexive Banach spaces, \(A: V \rightarrow W\) be a continuous linear operator and \(F: V \rightarrow \overline{{\mathbb {R}}}\), \(G: W \rightarrow \overline{{\mathbb {R}}}\) be proper, convex, lower semi-continuous functions such that there exists \(v_0 \in V\) with \(F(v_0) + G(A v_0) < \infty \) and G continuous at \(A v_0\). Then the following holds:

$$\begin{aligned} \inf _{v\in V} F(v) + G(A v) = \sup _{w^*\in W^*} - F^*(A^*w^*) - G^*(-w^*). \end{aligned}$$
(3)

The problem on the right hand side in (3) has at least one solution. In addition \({\hat{v}} \in V\), \({\hat{w}}^* \in W^*\) are solutions to both optimization problems if and only if

$$\begin{aligned} A^* {\hat{w}}^* \in \partial F({\hat{v}}), \quad \text { and} \quad -{\hat{w}}^* \in \partial G (A {\hat{v}}). \end{aligned}$$
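
A minimal one-dimensional instance of (3) (our own illustration): take \(V = W = {\mathbb {R}}\), \(A = I\), \(F(v) = \tfrac{1}{2}v^2\) and \(G(w) = |w - g|\). Then \(F^*(p) = \tfrac{1}{2}p^2\) and \(G^*(q) = qg\) for \(|q| \le 1\) (and \(+\infty \) otherwise), and the primal and dual optimal values coincide:

```python
import numpy as np

g = 2.0
vs = np.linspace(-5.0, 5.0, 100001)
primal = np.min(0.5 * vs ** 2 + np.abs(vs - g))  # inf_v F(v) + G(Av)

ws = np.linspace(-1.0, 1.0, 20001)               # dom G*(-w*) forces |w*| <= 1
dual = np.max(-0.5 * ws ** 2 + ws * g)           # sup_w* -F*(A*w*) - G*(-w*)

# strong duality (3): both optimal values equal 3/2 (attained at v = 1, w* = 1)
assert abs(primal - 1.5) < 1e-6
assert abs(dual - 1.5) < 1e-6
```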

2.2 Problem Formulation

The intended application lies in imaging and hence we want to allow for discontinuous solutions. To this aim the space of interest is \(BV(\varOmega )^m\), i.e. the space of functions of bounded variation; cf. (1). In this paper we use \(m=1\) for denoising (T being the identity) and inpainting (T being the characteristic function of a subset of \(\varOmega \)) of greyscale images and \(m=2\) for determining the optical flow between two consecutive greyscale images of a sequence (see Sect. 6.4 below for the choice of T). We will not deal with other choices of m, such as \(m=3\), which might be of use to treat color images as three separate channels.

In order to derive a primal-dual semi-smooth Newton method, we consider the following penalized version of (1)

$$\begin{aligned} \inf _{{\textbf{u}}\in V} \alpha _1 \Vert T {\textbf{u}} -g \Vert _{L^1} + \tfrac{\alpha _2}{2} \Vert T {\textbf{u}} -g \Vert _{L^2}^2 + \tfrac{\beta }{2} \Vert S{\textbf{u}}\Vert _{V_S}^2 + \lambda \int _{\varOmega }|D{\textbf{u}}|_F, \end{aligned}$$
(4)

where \(\beta \ge 0\) is an optional penalization parameter, typically chosen very small such that problem (4) is a close approximation of (1), \(V\subseteq L^2(\varOmega )^m\) is a continuously embedded Hilbert space, and \(S: V \rightarrow V_S\) is a bounded linear operator for some Hilbert space \(V_S\). We note that searching for solutions \({\textbf{u}} \in V\) in (4) instead of in the space \(V \cap BV(\varOmega )^m\) as in (1) does not affect the original problem in its intended purpose. Indeed, provided \(\lambda > 0\), any \({\textbf{u}} \in V \subseteq L^2(\varOmega )^m \subseteq L^1(\varOmega )^m\) for which the energy in (4) is finite needs to have finite total variation and is therefore an element of \(BV(\varOmega )^m\).

For the operator S and its related spaces we will restrict ourselves to the choices

\((S.\textrm{i})\):

\(S = I: V \rightarrow V_S\) with the normed subspaces \((V,\Vert {\,\cdot \,}\Vert _{V})\) and \((V_S,\Vert {\,\cdot \,}\Vert _{L^2})\) where \(V \subseteq L^2(\varOmega )^m\) is weakly closed, \(\Vert {\,\cdot \,}\Vert _V:= \Vert {\,\cdot \,}\Vert _{L^2}\), \(V_S \subseteq L^2(\varOmega )^m\), \(\Vert {\,\cdot \,}\Vert _{V_S}:= \Vert {\,\cdot \,}\Vert _{L^2}\) or

\((S.\textrm{ii})\):

\(S = \nabla : V \rightarrow V_S\) with the normed subspaces \((V,\Vert {\,\cdot \,}\Vert _{V})\) and \((V_S,\Vert {\,\cdot \,}\Vert _{L^2})\) where \(V \subseteq H^1(\varOmega )^m\) is weakly closed, \(\Vert {\,\cdot \,}\Vert _V:= \Vert {\,\cdot \,}\Vert _{H^1}\) and \(V_S \subseteq L^2(\varOmega )^{d\times m}\) (the boundedness of S follows due to \(\Vert \nabla {\textbf{v}}\Vert _{L^2} \le \Vert {\textbf{v}}\Vert _{H^1}\)), \(\Vert {\,\cdot \,}\Vert _{V_S}:= \Vert {\,\cdot \,}\Vert _{L^2}\),

which we will refer to as Setting \((S.\textrm{i})\) and Setting \((S.\textrm{ii})\) respectively. Note that Setting \((S.\textrm{ii})\) has \(V \subseteq H^1(\varOmega )^m\), which restricts \({\textbf{u}} \in V\) to allow for weak derivatives, while Setting \((S.\textrm{i})\) does not. However, in Setting \((S.\textrm{i})\) \(V \subseteq H^1(\varOmega )^m \subseteq L^2(\varOmega )^m\) is possible as long as V is weakly closed in \(L^2(\varOmega )^m\), which is not the case for \(V=H^1(\varOmega )^m\). We emphasize that since every finite dimensional subspace of a normed vector space is closed [40, Corollary 5.34] and convex, it is also weakly closed [27, Corollary 8.74]. In particular, Setting \((S.\textrm{i})\) and Setting \((S.\textrm{ii})\) cover the case of discrete subspaces \(V \subseteq L^2(\varOmega )^m\) or \(V \subseteq H^1(\varOmega )^m\) respectively.

We are aware that setting \(S=\nabla \) in (4), i.e. Setting \((S.\textrm{ii})\), adds regularity to the solution space V. As we are interested in solutions which may have discontinuities, this might indeed be a disadvantage of this setting. Nevertheless we still consider Setting \((S.\textrm{ii})\) for the following reasons: (i) To the best of our knowledge until now in a continuous setting primal-dual semi-smooth Newton methods have only been presented in the literature for total variation minimization with \(S=\nabla \) [26, 38, 39, 44]. In this vein Setting \((S.\textrm{ii})\) naturally extends the existing approaches to the \(L^1\)-\(L^2\)-TV case. (ii) It allows us to compare Setting \((S.\textrm{i})\) and Setting \((S.\textrm{ii})\), see for example Fig. 5  for a numerical comparison in image inpainting.

2.3 The Bilinear Form \(a_B\)

To describe the differentiable part of (4) it is convenient to define the symmetric bilinear form \(a_B: V \times V \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} a_B({\textbf{u}},\textbf{w}):= \alpha _2 {\langle }T {\textbf{u}}, T \textbf{w}{\rangle }_{L^2} + \beta {\langle }S{\textbf{u}}, S\textbf{w}{\rangle }_{L^2} = {\langle }B{\textbf{u}}, \textbf{w}{\rangle }_{V^*,V} \end{aligned}$$
(5)

with \(B: V \rightarrow V^*\) denoting the operator \(B:= \alpha _2 T^* T + \beta S^* S\). Thus \(B{\textbf{u}} = {\textbf{v}}\) for \({\textbf{u}} \in V\), \({\textbf{v}} \in V^*\) if and only if

$$\begin{aligned} a_B({\textbf{u}}, \textbf{w}) = {\langle }{\textbf{v}}, \textbf{w}{\rangle }_{V^*, V} \end{aligned}$$
(6)

for all \(\textbf{w} \in V\). The bilinear form \(a_B({\,\cdot \,},{\,\cdot \,})\) induces a respective energy norm defined by

$$\begin{aligned} \Vert {\textbf{u}} \Vert _B^2:= a_B({\textbf{u}},{\textbf{u}}) \qquad \text {for} \ {\textbf{u}}\in V. \end{aligned}$$

Since T and S are bounded linear operators, it is easy to see that \(a_B\) is bounded (i.e. continuous) as well. In particular, we have

$$\begin{aligned} |a_B({\textbf{v}},\textbf{w})|&\le \alpha _2\Vert T\Vert _{{\mathcal {L}}(L^2, L^2)}^2\Vert {\textbf{v}}\Vert _{L^2}\Vert \textbf{w}\Vert _{L^2} + \beta \Vert S{\textbf{v}}\Vert _{L^2}\Vert S\textbf{w}\Vert _{L^2} \\&\le (\alpha _2 \Vert T\Vert _{{\mathcal {L}}(L^2, L^2)}^2 + \beta )\Vert {\textbf{v}}\Vert _V\Vert \textbf{w}\Vert _V \end{aligned}$$

for any \({\textbf{v}},\textbf{w}\in V\).
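
In a one-dimensional discrete sketch (our own illustration, with \(T = I\) and S a forward-difference gradient), B becomes the symmetric matrix \(\alpha _2 T^\top T + \beta D^\top D\), and the identity (6) between \(a_B\) and B can be checked directly:

```python
import numpy as np

n, alpha2, beta = 50, 1.0, 0.1
T = np.eye(n)                            # T = I
D = np.diff(np.eye(n), axis=0)           # forward differences, shape (n-1, n)
B = alpha2 * T.T @ T + beta * D.T @ D    # B = alpha2 T*T + beta S*S

assert np.allclose(B, B.T)               # a_B is symmetric

rng = np.random.default_rng(1)
v, w = rng.standard_normal(n), rng.standard_normal(n)
a_B = alpha2 * (T @ v) @ (T @ w) + beta * (D @ v) @ (D @ w)
assert np.isclose(a_B, v @ (B @ w))      # a_B(v, w) = <Bv, w>, cf. (6)
```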

Remark 2.1

Note that a continuous bilinear form \(a_B\) is coercive, i.e. there exists \(c_B>0\) such that \(a_B({\textbf{v}},{\textbf{v}})\ge c_B \Vert {\textbf{v}}\Vert _V^2\) for all \({\textbf{v}}\in V\), if and only if \(a_B\) is strongly convex, i.e. the functional \(F: V\rightarrow {\mathbb {R}}\) defined as \(F({\textbf{u}}):=a_B({\textbf{u}},{\textbf{u}})\) is strongly convex. Assuming that the bilinear form \(a_B\) is continuous and coercive, the Lax-Milgram Lemma, see e.g. [23, Theorem 1.1.3], implies that the inverse \(B^{-1}: V^* \rightarrow V\) exists and is bounded with \(\Vert B^{-1}{\textbf{v}}^*\Vert _V \le c_B^{-1} \Vert {\textbf{v}}^*\Vert _{V^*}\) for all \({\textbf{v}}^* \in V^*\), where \(c_B\) denotes the coercivity constant of \(a_B\), cf. [23, Remark 1.1.3].

The definition of \(a_B\) allows us to give the following simple condition for existence and uniqueness of (4).

Proposition 2.2

If \(a_B\) is coercive, then (4) has a unique solution \(\hat{{\textbf{u}}} \in V\). If additionally \(\lambda > 0\), then \(\hat{ {\textbf{u}}} \in V \cap BV(\varOmega )^m\).

Proof

We denote by F the functional from (4) and aim to apply the direct method, see e.g. [13, Theorem 2.1]. Since F is bounded below by 0 and satisfies \(F(0) < \infty \), it is proper; it remains to check that F is coercive and weakly lower semi-continuous.

Since \(T: V \rightarrow L^2(\varOmega )\) is bounded, \(V \rightarrow {\mathbb {R}}, {\textbf{u}} \mapsto \alpha _1\Vert T{\textbf{u}} - g\Vert _{L^1} + \tfrac{\alpha _2}{2} \Vert T{\textbf{u}} - g\Vert _{L^2}^2\) is continuous and due to convexity weakly lower semi-continuous, see Lemma 2.1 (or [29, Proof of Theorem 1, p. 525]). By the same argument, since S is bounded, \(V \rightarrow {\mathbb {R}}, {\textbf{u}} \mapsto \tfrac{\beta }{2} \Vert S{\textbf{u}}\Vert _{L^2}^2\) is weakly lower semi-continuous. The total variation is weakly lower semi-continuous on \(L^1(\varOmega )^m\), see [6, Remark 3.5, p. 119], and in particular on V, because \(V \subseteq L^2(\varOmega )^m \subseteq L^1(\varOmega )^m\) is assumed to be a continuously embedded subspace of \(L^2(\varOmega )^m\). In total, \(F: V \rightarrow \overline{{\mathbb {R}}}\) is weakly lower semi-continuous.

Observe that

$$\begin{aligned} F({\textbf{u}})&\ge \frac{\alpha _2}{2}\Vert T{\textbf{u}} -g\Vert _{L^2}^2 + \frac{\beta }{2}\Vert S{\textbf{u}}\Vert _{L^2}^2\\&= \frac{\alpha _2}{2}\Vert T{\textbf{u}}\Vert _{L^2}^2 - \alpha _2 {\langle }T{\textbf{u}},g{\rangle }_{L^2} + \frac{\alpha _2}{2}\Vert g\Vert _{L^2}^2+ \frac{\beta }{2}\Vert S{\textbf{u}}\Vert _{L^2}^2\\&\ge \tfrac{1}{2} a_B({\textbf{u}}, {\textbf{u}}) - \alpha _2 \Vert T\Vert _{{\mathcal {L}}(L^2, L^2)}\Vert {\textbf{u}}\Vert _{L^2} \Vert g\Vert _{L^2} + \frac{\alpha _2}{2}\Vert g\Vert _{L^2}^2 . \end{aligned}$$

Since \(a_B\) is coercive, i.e. there exists \(c_B>0\) such that \(a_B({\textbf{v}},{\textbf{v}})\ge c_B \Vert {\textbf{v}}\Vert _V^2\) for all \({\textbf{v}}\in V\), and \(\Vert {\textbf{u}}\Vert _{L^2}\le \Vert {\textbf{u}}\Vert _{V}\) from the latter inequality we obtain

$$\begin{aligned} F({\textbf{u}})&\ge \Vert {\textbf{u}}\Vert _{V} \left( \frac{c_B}{2}\Vert {\textbf{u}}\Vert _{V} - \alpha _2 \Vert T\Vert _{{\mathcal {L}}(L^2, L^2)}\Vert g\Vert _{L^2} \right) + \frac{\alpha _2}{2}\Vert g\Vert _{L^2}^2. \end{aligned}$$

Hence for \(\Vert {\textbf{u}}\Vert _{V} \rightarrow \infty \) we have \(F({\textbf{u}}) \rightarrow \infty \), which shows the coercivity of F.

Since V is reflexive and weakly closed in \(L^2(\varOmega )^m\) (Setting \((S.\textrm{i})\)) or in \(H^1(\varOmega )^m\) (Setting \((S.\textrm{ii})\)), the existence of a minimizer \(\hat{{\textbf{u}}} \in V\) now follows from [13, Theorem 2.1] and [13, Remark 2.2].

For uniqueness, we note that F is strongly convex since \(a_B\) is coercive and we may write \(F({\textbf{u}}) = \tfrac{1}{2} a_B({\textbf{u}}, {\textbf{u}}) + \alpha _1\Vert T{\textbf{u}} - g\Vert _{L^1} - \alpha _2 {\langle }T{\textbf{u}}, g{\rangle }_{L^2} + \tfrac{\alpha _2}{2}\Vert g\Vert _{L^2}^2 + \lambda \int _{\varOmega }|D{\textbf{u}}|_F\) with all terms being convex.

Since \(0 \in V\) has finite energy F(0), for the minimizer \(\hat{{\textbf{u}}}\) we have \(F(\hat{{\textbf{u}}})<\infty \) and in particular \(\int _\varOmega |D\hat{{\textbf{u}}}|_F <\infty \) if \(\lambda > 0\). In this case we conclude \(\hat{{\textbf{u}}} \in V \cap BV(\varOmega )^m\) since \(\hat{{\textbf{u}}} \in V \subseteq L^1(\varOmega )^m\). \(\square \)

Specifically for our two main choices \(S \in \{I, \nabla \}\) we can describe coercivity of the bilinear form \(a_B\) in slightly more explicit terms as given by the following proposition.

Proposition 2.3

The bilinear form \(a_B: V \times V \rightarrow {\mathbb {R}}\) is coercive in any of the following cases:

  1. (i)

    \(\alpha _2 > 0\) and \(T = I\),

  2. (ii)

    \(\beta > 0\) and \(S = I\),

  3. (iii)

    \(\beta > 0\), \(S = \nabla \) and \(1 \notin \ker T\).

Proof

  1. (i)

    For \(T = I\) with \(\alpha _2 > 0\) we immediately have

    $$\begin{aligned} a_B({\textbf{v}}, {\textbf{v}}) = \alpha _2 \Vert T {\textbf{v}}\Vert _{L^2}^2 + \beta \Vert S {\textbf{v}}\Vert _{L^2}^2 \ge \alpha _2 \Vert {\textbf{v}}\Vert _{V}^2 \end{aligned}$$

    for all \({\textbf{v}}\in V\).

  2. (ii)

    For \(S = I\) with \(\beta > 0\), as before, we directly obtain

    $$\begin{aligned} a_B({\textbf{v}}, {\textbf{v}}) = \alpha _2 \Vert T {\textbf{v}}\Vert _{L^2}^2 + \beta \Vert S {\textbf{v}}\Vert _{L^2}^2 \ge \beta \Vert {\textbf{v}}\Vert _{V}^2 \end{aligned}$$

    for all \({\textbf{v}}\in V\).

  3. (iii)

    Now we have \(S = \nabla \), which implies that we are in Setting \((S.\textrm{ii})\). Hence we need to show coercivity on \(H^1(\varOmega )^m\), from which coercivity on the subspace V follows immediately. We split \({\textbf{u}} \in H^1(\varOmega )^m\) into \({\textbf{u}} = {\textbf{v}} + \textbf{w}\) with \(\textbf{w}:= \tfrac{1}{|\varOmega |}\int _\varOmega {\textbf{u}} \,\textrm{d}\textbf{x}\) being the componentwise mean and \({\textbf{v}}\in H^1(\varOmega )^m\) such that \(\int _\varOmega v_i \,\textrm{d}\textbf{x} = 0\) for \(i=1,\ldots ,m\). Due to the Poincaré-Wirtinger inequality, see e.g. [7, Corollary 5.4.1], we have

    $$\begin{aligned} \begin{aligned} \Vert {\textbf{u}}\Vert _{H^1(\varOmega )^m}^2&= \Vert {\textbf{v}} + \textbf{w}\Vert _{L^2}^2 + \Vert \nabla {\textbf{v}}\Vert _{L^2}^2 \\&\le \Vert {\textbf{v}}\Vert _{L^2}^2 + 2\Vert {\textbf{v}}\Vert _{L^2}\Vert \textbf{w}\Vert _{L^2} + \Vert \textbf{w}\Vert _{L^2}^2 + \Vert \nabla {\textbf{v}}\Vert _{L^2}^2 \\&\le 2\Vert \textbf{w}\Vert _{L^2}^2 + 2\Vert {\textbf{v}}\Vert _{L^2}^2 + \Vert \nabla {\textbf{v}}\Vert _{L^2}^2 \le 2\Vert \textbf{w}\Vert _{L^2}^2 + c_1 \Vert \nabla {\textbf{v}}\Vert _{L^2}^2 \end{aligned} \end{aligned}$$
    (7)

    for a constant \(c_1 > 0\), where we used \((a + b)^2 \le 2(a^2 + b^2)\), \(a, b \ge 0\), to obtain the second inequality. Because the operator T cannot annihilate constant functions, there is a constant \(c_T > 0\) independent of \(\textbf{w}\) such that \(\Vert T \textbf{w}\Vert _{L^2} \ge c_T \Vert \textbf{w}\Vert _{L^2}\). This means that if \(\Vert \textbf{w}\Vert _{L^2} \ge 2c_T^{-1}\Vert T\Vert _{{\mathcal {L}}(L^2,L^2)} \Vert {\textbf{v}}\Vert _{L^2}\), then

    $$\begin{aligned} \Vert T {\textbf{u}}\Vert _{L^2} = \Vert T \textbf{w} + T {\textbf{v}}\Vert _{L^2} \ge c_T \Vert \textbf{w}\Vert _{L^2} - \Vert T\Vert _{{\mathcal {L}}(L^2,L^2)} \Vert {\textbf{v}}\Vert _{L^2} \ge \tfrac{c_T}{2} \Vert \textbf{w}\Vert _{L^2}. \end{aligned}$$

    This together with (7) yields

    $$\begin{aligned} \Vert {\textbf{u}}\Vert _{H^1(\varOmega )^m}^2&\le 2\Vert \textbf{w}\Vert _{L^2}^2 + c_1\Vert \nabla {\textbf{v}} \Vert _{L^2}^2 \le \tfrac{8}{c_T^2} \Vert T {\textbf{u}}\Vert _{L^2}^2 + c_1 \Vert \nabla {\textbf{u}}\Vert _{L^2}^2 \le c_2 a_B({\textbf{u}},{\textbf{u}}) \end{aligned}$$

    for some constant \(c_2 > 0\). If on the other hand \(\Vert \textbf{w}\Vert _{L^2} < 2c_T^{-1}\Vert T\Vert _{{\mathcal {L}}(L^2,L^2)} \Vert {\textbf{v}}\Vert _{L^2}\) then (again using the Poincaré-Wirtinger inequality) we have

    $$\begin{aligned} \Vert \textbf{w}\Vert _{L^2} < 2c_T^{-1}\Vert T\Vert _{{\mathcal {L}}(L^2,L^2)} \Vert {\textbf{v}}\Vert _{L^2} \le c_3 \Vert \nabla {\textbf{v}}\Vert _{L^2} \end{aligned}$$

    for some constant \(c_3 > 0\) and hence

    $$\begin{aligned} \Vert {\textbf{u}}\Vert _{H^1(\varOmega )^m}^2&\le 2\Vert \textbf{w}\Vert _{L^2}^2 + c_1\Vert \nabla {\textbf{v}} \Vert _{L^2}^2 \le (c_1 + 2 c_3^2) \Vert \nabla {\textbf{u}}\Vert _{L^2}^2 \le \tfrac{c_1 + 2 c_3^2}{\beta } a_B({\textbf{u}}, {\textbf{u}}) \end{aligned}$$

    which concludes coercivity of \(a_B\) for Item (iii).

\(\square \)
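
Case (iii) can be probed numerically in a one-dimensional discrete sketch (our own illustration): take for T an inpainting-type mask that does not annihilate constants. Then \(\beta S^*S\) alone is singular (constants lie in its kernel), but \(B = \alpha _2 T^*T + \beta S^*S\) is positive definite, and the Lax-Milgram bound of Remark 2.1 holds with \(c_B\) the smallest eigenvalue:

```python
import numpy as np

n, beta = 50, 0.1
D = np.diff(np.eye(n), axis=0)                 # discrete gradient (S = nabla)
mask = np.zeros(n); mask[: n // 2] = 1.0
T = np.diag(mask)                              # keeps half the domain, T(1) != 0

eig_grad = np.linalg.eigvalsh(beta * D.T @ D)
assert abs(eig_grad[0]) < 1e-10                # S*S alone: constants in the kernel

B = T.T @ T + beta * D.T @ D                   # alpha2 = 1, case (iii)
eig_B = np.linalg.eigvalsh(B)
assert eig_B[0] > 1e-6                         # coercive: smallest eigenvalue positive

# Lax-Milgram bound of Remark 2.1 with c_B = smallest eigenvalue of B
rhs = np.ones(n)
assert np.linalg.norm(np.linalg.solve(B, rhs)) <= np.linalg.norm(rhs) / eig_B[0] + 1e-9
```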

In the sequel we will assume that \(a_B\) is coercive, which due to Remark 2.1 guarantees the invertibility of \(B = \alpha _2 T^* T + \beta S^* S: V \rightarrow V^*\).

(A1):

The bilinear form \(a_B: V \times V \rightarrow {\mathbb {R}}\) is coercive.

While this assumption is not required for dualization in itself, it will allow us to state the dual problem to (4) in a more explicit form in Theorem 2.2 and (11) using the inverse of B. Namely, we introduce on \(V^*\) the dual norm \(\Vert {\textbf{u}}^*\Vert _{B^{-1}}^2:= {\langle }{\textbf{u}}^*, {B^{-1}}{\textbf{u}}^*{\rangle }_{V^*, V}\) for \({\textbf{u}}^*\in V^*\). Coercivity of \(a_B\) will also be useful later in showing other uniqueness properties as in Theorems 3.2 and 4.1.

2.4 Dualization in \(H^1(\varOmega )^m\)

In this subsection we fix Setting \((S.\textrm{ii})\) with \(V = H^1(\varOmega )^m\) and aim to derive the dual problem to (4), which will later motivate the regularized predual formulation (11) in a more general setting. We recall that for this choice of V the total variation reduces to \(\int _\varOmega |\nabla {\textbf{u}}|_F \,\textrm{d}\textbf{x}\).

Theorem 2.2

Let \(V = H^1(\varOmega )^m\) and \(W = W_1 \times W_2 = L^2(\varOmega ) \times L^2(\varOmega )^{d\times m}\). Then the problem

$$\begin{aligned} \begin{aligned} \inf _{{\textbf{p}}=(p_1, {\textbf{p}}_2) \in W^*}\;&\tfrac{1}{2} \Vert T^* p_1 + \nabla ^* {\textbf{p}}_2 + \alpha _2 T^*g \Vert _{B^{-1}}^2 \\&\qquad - \tfrac{\alpha _2}{2} \Vert g\Vert _{L^2}^2 - {\langle } g,p_1{\rangle }_{L^2} + \chi _{|p_1| \le \alpha _1} + \chi _{|{\textbf{p}}_2|_F\le \lambda }, \end{aligned} \end{aligned}$$
(8)

is dual to (4). Furthermore, solutions \({\textbf{u}} \in V\) and \({\textbf{p}} \in W^*\) to (4) and (8), respectively, are characterized by

$$\begin{aligned} T^* p_1 + \nabla ^* {\textbf{p}}_2&= B{\textbf{u}} - \alpha _2 T^* g,\nonumber \\ |T {\textbf{u}} -g|\, p_1&= - \alpha _1 (T {\textbf{u}} - g),&\quad |p_1|&\le \alpha _1, \nonumber \\ |\nabla {\textbf{u}}|_{F}\, {\textbf{p}}_2&= - \lambda \nabla {\textbf{u}},&\quad |{\textbf{p}}_2|_F&\le \lambda . \end{aligned}$$
(9)

The proof of this statement follows standard arguments. However for completeness it is stated in “Appendix A”.

Note that (9) is a relation in the dual space \(V^*\) and the term \(\nabla ^*\) may be understood as \(\nabla ^*: L^2(\varOmega )^{d \times m} \rightarrow V^*, {\textbf{p}} \mapsto (\textbf{w} \mapsto {\langle }{\textbf{p}}, \nabla \textbf{w}{\rangle }_{L^2})\). Further, the first equation in (9) can be rewritten using the bilinear form \(a_B\) from equation (5) as

$$\begin{aligned} {\langle }p_1, T {\textbf{v}}{\rangle }_{L^2} {+} {\langle }{\textbf{p}}_2, \nabla {\textbf{v}}{\rangle }_{L^2} = a_B({\textbf{u}}, {\textbf{v}}) - l({\textbf{v}}) \qquad \forall {\textbf{v}} \in V, \end{aligned}$$
(10)

where \(l({\textbf{v}}):= \alpha _2 {\langle }g, T{\textbf{v}}{\rangle }_{L^2}\).

3 Regularized Model

The dual problem (8) is convex but does not necessarily have a unique solution due to the nontrivial kernel of \(\nabla ^*\). To be able to enforce a unique solution, we slightly modify the objective function in (8) by adding terms \(\frac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2\) and \(\frac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2\) with \(\gamma _1, \gamma _2 \ge 0\). Additionally, compared to the motivation by Theorem 2.2 in a smooth setting, we will generalize the space V to allow for discontinuous functions as originally intended by (4).

3.1 Predual Problem and Dualization

We aim to choose W as a Hilbert space such that the linear operator \(\varLambda := (T, \nabla ): V \rightarrow W = (W_1, W_2)\), corresponding to A in the proof of Theorem 2.2, remains bounded. In particular we restrict ourselves to closed subspaces \(W_1 \subseteq L^2(\varOmega )\) equipped with \(\Vert {\,\cdot \,}\Vert _{L^2}\) and the following settings for \(\nabla : V \rightarrow W_2\) and its corresponding spaces:

(\(\nabla \).i):

\(V \subseteq H^1(\varOmega )^m\), allowing for Settings \((S.\textrm{i})\) and \((S.\textrm{ii})\), and the normed subspace \((W_2,\Vert {\,\cdot \,}\Vert _{L^2})\) with \(W_2 \subseteq L^2(\varOmega )^{d\times m}\) being closed,

(\(\nabla \).ii):

\(V \subseteq H_0^1(\varOmega )^m\), allowing for Settings \((S.\textrm{i})\) and \((S.\textrm{ii})\), and the normed subspace \((W_2,\Vert {\,\cdot \,}\Vert _{L^2})\) with \(W_2 \subseteq L^2(\varOmega )^{d\times m}\) being closed,

(\(\nabla \).iii):

\(V \subseteq L^2(\varOmega )^m\) with Setting \((S.\textrm{i})\) and the normed subspace \((W_2,\Vert {\,\cdot \,}\Vert _{(H_0^{\textrm{div}})^*})\) with \(W_2 \subseteq (H_0^{\textrm{div}}(\varOmega )^m)^*\) closed by defining \(\nabla : {\textbf{u}} \mapsto ({\textbf{p}} \mapsto {\langle }{\textbf{u}}, -{\text {div}}{\textbf{p}}{\rangle }_{L^2})\).

Note that for Setting (\(\nabla .\textrm{iii}\)) we have \(\nabla ^* = -{\text {div}}\) due to vanishing boundary terms, while for Settings \((\nabla .\textrm{i})\) and \((\nabla .\textrm{ii})\) this is not necessarily true, as in these settings the scalar product associated to the Hilbert space V is not the \(L^2\)-scalar product [41].

Using \(\gamma _1, \gamma _2 \ge 0\) we propose the following regularized dual problem:

$$\begin{aligned} \begin{aligned} \inf _{{\textbf{p}} = (p_1, {\textbf{p}}_2) \in W^*}\; \Big \{&\tfrac{1}{2} \big \Vert \varLambda ^* {\textbf{p}} - \alpha _2 T^*g \big \Vert _{B^{-1}}^2 - \tfrac{\alpha _2}{2} \Vert g\Vert _{L^2}^2 + {\langle }g, p_1{\rangle }_{L^2} \\& + \chi _{|p_1| \le \alpha _1} + \tfrac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2 + \chi _{|{\textbf{p}}_2|_F\le \lambda } + \tfrac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2 =:D({\textbf{p}}) \Big \}, \end{aligned} \end{aligned}$$
(11)

Note that if \(\alpha _1 = 0\), then it follows immediately that \(p_1 = 0\) due to the box-constraint \(\chi _{|p_1|\le \alpha _1}\). Analogously if \(\lambda = 0\), then \({\textbf{p}}_2=0\). In these cases we use the convention that the terms \(\tfrac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2\) and \(\tfrac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2\) vanish respectively. This convention both makes sense as a continuous extension of the limit process \(\alpha _1, \lambda \rightarrow 0\) and agrees with setting \(\alpha _1, \lambda = 0\) prior to dualization.

Theorem 3.1

The dual problem to (11) reads

$$\begin{aligned} \begin{aligned} \inf _{{\textbf{u}} \in V} \Big \{ F_1^*(T{\textbf{u}}) + \tfrac{\alpha _2}{2} \Vert T {\textbf{u}} -g \Vert _{L^2}^2 + \tfrac{\beta }{2} \Vert S{\textbf{u}}\Vert _{L^2}^2 + F_2^*(\nabla {\textbf{u}}) =: E({\textbf{u}}) \Big \} \end{aligned} \end{aligned}$$
(12)

where \(F_1^*\), \(F_2^*\) are the convex conjugates to \(F_1: W_1^* \rightarrow \overline{{\mathbb {R}}}\), \(F_2: W_2^* \rightarrow \overline{{\mathbb {R}}}\) given by

$$\begin{aligned} F_1(p_1)&:= {\langle } g,p_1{\rangle }_{L^2} + \chi _{|p_1| \le \alpha _1} + \tfrac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2, \\ F_2({\textbf{p}}_2)&:= \chi _{|{\textbf{p}}_2|_F\le \lambda } + \tfrac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2. \end{aligned}$$

Furthermore, solutions \(p = (p_1, {\textbf{p}}_2) \in W^*\), \({\textbf{u}} \in V\) of (11) and (12) respectively are characterized by

$$\begin{aligned} \begin{aligned} 0&= \varLambda ^* {\textbf{p}} - \alpha _2 T^* g + B{\textbf{u}}, \\ T{\textbf{u}}&\in \partial F_1(p_1), \\ \nabla {\textbf{u}}&\in \partial F_2({\textbf{p}}_2). \end{aligned} \end{aligned}$$
(13)

Proof

We use the Fenchel duality from Theorem 2.1, choosing \({\mathcal {F}}: W^* \rightarrow \overline{{\mathbb {R}}}\), \({\mathcal {G}}: V^* \rightarrow \overline{{\mathbb {R}}}\) and \(A: W^* \rightarrow V^*\) as follows

$$\begin{aligned} \begin{aligned} {\mathcal {F}}({\textbf{p}})&:= F_1(p_1) + F_2({\textbf{p}}_2) \\&= {\langle } g,p_1{\rangle }_{L^2} + \chi _{|p_1| \le \alpha _1} + \chi _{|{\textbf{p}}_2|_F\le \lambda } + \tfrac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2+ \tfrac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2, \\ {\mathcal {G}}(A {\textbf{p}})&:= \tfrac{1}{2} \Vert A {\textbf{p}} - \alpha _2 T^*g \Vert _{B^{-1}}^2 - \tfrac{\alpha _2}{2} \Vert g \Vert _{L^2}^2, \qquad A {\textbf{p}} := \varLambda ^* {\textbf{p}} = T^* p_1 + \nabla ^* {\textbf{p}}_2. \end{aligned} \end{aligned}$$

For \({\mathcal {G}}^*\) we get by the definition of the convex conjugate

$$\begin{aligned} {\mathcal {G}}^*({\textbf{u}}) = \sup _{{\textbf{v}} \in V^*} \big \{ {\langle }{\textbf{v}}, {\textbf{u}}{\rangle }_{V^*,V} - {\mathcal {G}}({\textbf{v}}) \big \}, \end{aligned}$$

where the supremum is attained whenever

$$\begin{aligned} \begin{aligned} 0 = \partial _{{\textbf{v}}} \big ({\langle }{\textbf{v}}, {\textbf{u}}{\rangle }_{V^*,V} - {\mathcal {G}}({\textbf{v}}) \big )&= {\textbf{u}} - B^{-1}({\textbf{v}} - \alpha _2 T^*g), \end{aligned} \end{aligned}$$

which implies \({\textbf{v}} = B {\textbf{u}} + \alpha _2 T^*g\). Hence we have

$$\begin{aligned} {\mathcal {G}}^*({\textbf{u}})&= {\langle } B {\textbf{u}} + \alpha _2 T^*g , {\textbf{u}} {\rangle }_{V^*,V} - \tfrac{1}{2} {\langle } B^{-1} B {\textbf{u}} , B {\textbf{u}} {\rangle }_{V,V^*} + \tfrac{\alpha _2}{2}\Vert g\Vert _{L^2}^2 \\&= {\langle } {\textbf{u}}, B {\textbf{u}} {\rangle }_{V,V^*} + {\langle } {\textbf{u}},\alpha _2 T^* g {\rangle }_{L^2} - \tfrac{1}{2} {\langle }{\textbf{u}}, B {\textbf{u}} {\rangle }_{V,V^*} + \tfrac{\alpha _2}{2}\Vert g\Vert _{L^2}^2 \\&= \tfrac{1}{2} {\langle } {\textbf{u}},(\alpha _2 T^* T + \beta S^* S) {\textbf{u}} {\rangle }_{V,V^*} + {\langle } {\textbf{u}},\alpha _2 T^* g {\rangle }_{L^2} + \tfrac{\alpha _2}{2}\Vert g\Vert _{L^2}^2 \\&= \tfrac{\alpha _2}{2} {\langle } T {\textbf{u}}, T{\textbf{u}} {\rangle }_{L^2} + \tfrac{\beta }{2} {\langle } S {\textbf{u}},S {\textbf{u}} {\rangle }_{L^2} + \alpha _2 {\langle } T {\textbf{u}}, g {\rangle }_{L^2} + \tfrac{\alpha _2}{2}\Vert g\Vert _{L^2}^2 \\&= \tfrac{\alpha _2}{2} \Vert T {\textbf{u}} + g \Vert _{L^2}^2 + \tfrac{\beta }{2} \Vert S {\textbf{u}}\Vert _{L^2}^2. \end{aligned}$$

For \({\mathcal {F}}^*\), since \({\mathcal {F}}\) is separable in \(p_1\) and \({\textbf{p}}_2\), we only apply the separability property [28, III, Remark 4.3] without resolving \(F_1^*\) and \(F_2^*\) explicitly. The optimality conditions in Theorem 2.1 correspond to \(\varLambda {\textbf{u}} \in \partial {\mathcal {F}}({\textbf{p}})\) and \(-{\textbf{u}} = B^{-1} (\varLambda ^* {\textbf{p}} - \alpha _2 T^* g)\) which yield (13). \(\square \)

Theorem 3.1 established the duality of (11) and (12) based on the predual formulation (11) similar to the approach of [35]. It is, however, interesting to note that the spaces V and W used for dualization are reflexive and thus the Fenchel duality from Theorem 2.1 may be used to equivalently establish the duality of (12) and (11) based on the primal formulation (12) (the only difference being a change in sign as can be seen when comparing (8) with (11)).

Next we analyze the existence and uniqueness of a solution of (12) and start by showing the lower semi-continuity of E.

Lemma 3.1

(Sequential lower semi-continuity) The functional E defined in (12) is sequentially lower semi-continuous with respect to weak V-convergence.

Proof

We show lower semi-continuity of each summand of E:

  1. (i)

    The term \(F_1^*(T {\textbf{u}})\) is by definition given by the supremum

    $$\begin{aligned} F_1^*(T {\textbf{u}}) = \sup _{\begin{array}{c} p_1 \in L^2(\varOmega ) \\ |p_1| \le \alpha _1 \end{array}} \Big \{ {\langle }T{\textbf{u}} - g, p_1{\rangle }_{L^2} - \tfrac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2 \Big \}. \end{aligned}$$

Since the supremum of lower semi-continuous functions is lower semi-continuous due to Lemma 2.2, it suffices to show that \({\tilde{F}}_1: V \rightarrow {\mathbb {R}}\), \({\tilde{F}}_1({\textbf{u}}):= {\langle }T{\textbf{u}} - g, p_1{\rangle }_{L^2} - \tfrac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2\) is V-weakly lower semi-continuous for every fixed \(p_1 \in L^2(\varOmega )\), \(|p_1| \le \alpha _1\). This is immediate since \({\tilde{F}}_1\) is a continuous affine functional of \({\textbf{u}}\) and as such even V-weakly continuous.

  2. (ii)

    Similarly, the term \(F_2^*(\nabla {\textbf{u}})\) is given by the supremum

    $$\begin{aligned} F_2^*(\nabla {\textbf{u}}) = \sup _{\begin{array}{c} {\textbf{p}}_2 \in W_2^* \\ |{\textbf{p}}_2|_F \le \lambda \end{array}} \Big \{ {\langle }{\textbf{u}}, \nabla ^* {\textbf{p}}_2{\rangle }_{V,V^*} - \tfrac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2 \Big \} \end{aligned}$$

    and we conclude by the same argument.

  3. (iii)

    Since the terms \(\Vert T{\textbf{u}} - g\Vert _{L^2}^2\) and \(\Vert {\textbf{u}}\Vert _{L^2}^2\) are both convex and continuous in \({\textbf{u}} \in V\), they are also weakly lower semi-continuous.

  4. (iv)

For the term \(\Vert S{\textbf{u}}\Vert _{L^2}^2\) we distinguish both possible choices of S. If \(S = I: V \rightarrow V_S\), \(V = V_S \subseteq L^2(\varOmega )^m\), then \({\textbf{u}} \mapsto \Vert {\textbf{u}}\Vert _{L^2}^2\) is weakly lower semi-continuous since it is both convex and continuous. If \(S = \nabla : V \rightarrow V_S\), \(V \subseteq H^1(\varOmega )^m\), then \({\textbf{u}} \mapsto \Vert \nabla {\textbf{u}}\Vert _{L^2}^2\) is weakly lower semi-continuous by the same argument since \(\nabla : H^1(\varOmega )^m \rightarrow L^2(\varOmega )^{d\times m}\) is a continuous linear operator. \(\square \)

Proposition 3.1

If \(a_B\) is coercive, then (12) has a unique solution \(\hat{{\textbf{u}}} \in V\). If additionally \(\lambda > 0\), then \(\hat{{\textbf{u}}} \in V \cap BV(\varOmega )^m\).

Proof

The proof is similar to the one of Proposition 2.2. Coercivity of E is shown as in Proposition 2.2, while we use Lemma 3.1 for weak lower semi-continuity.

Since \(C_0^\infty (\varOmega )^{d\times m} \subseteq W_2^*\), we have

$$\begin{aligned} \lambda \Big ( \int _\varOmega |D{\textbf{u}}|_F - c \Big )&\le \lambda \sup _{\begin{array}{c} {\textbf{p}}_2 \in C_0^\infty (\varOmega )^{d\times m} \\ |{\textbf{p}}_2|_F \le 1 \end{array}} \Big \{ {\langle }{\textbf{u}}, -{\text {div}}{\textbf{p}}_2{\rangle }_{L^2} - \tfrac{\gamma _2}{2} \Vert {\textbf{p}}_2\Vert _{L^2}^2 \Big \} \\&\le \sup _{\begin{array}{c} {\textbf{p}}_2 \in W_2^* \\ |{\textbf{p}}_2|_F \le \lambda \end{array}} \Big \{ {\langle }{\textbf{u}}, - {\text {div}}{\textbf{p}}_2{\rangle }_{L^2} - \tfrac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2 \Big \} = F_2^*(\nabla {\textbf{u}}) \end{aligned}$$

for some constant \(c>0\), and hence we obtain the existence of a unique solution \(\hat{{\textbf{u}}}\) with \(\hat{{\textbf{u}}} \in V\cap BV(\varOmega )^m\) if \(\lambda > 0\), cf. Proposition 3.4. \(\square \)

To examine existence and uniqueness of the dual model (11) we utilize the following lemma for \(p=2\).

Lemma 3.2

The set \({\mathcal {M}}:= \{f \in L^p(\varOmega ): |f| \le \alpha \} \subseteq L^p(\varOmega )\), \(1 \le p \le \infty \) is (weakly) closed, convex and bounded for any \(\alpha \in L^p(\varOmega )\).

Proof

It is easy to see that \({\mathcal {M}}\) is convex by a pointwise consideration of the constraint and bounded in \(L^p(\varOmega )\) by \(\alpha \). To show closedness, let \((p_n)_{n\in {\mathbb {N}}} \subseteq {\mathcal {M}}\) be a sequence with \(p_n \rightarrow p\) in \(L^p(\varOmega )\). Due to [2, Theorem 13.6] there exists a subsequence \((q_n)_{n\in {\mathbb {N}}}\) of \((p_n)_{n\in {\mathbb {N}}}\) with \(q_n \rightarrow p\) pointwise almost everywhere. In particular we have \(|p| = \lim _{n\rightarrow \infty } |q_n| \le \alpha \) almost everywhere and therefore conclude \(p \in {\mathcal {M}}\). Finally, since a closed convex subset of a Banach space is weakly closed [27, Corollary 8.74], we find that \({\mathcal {M}}\) is weakly closed as well. \(\square \)

Theorem 3.2

Problem (11) has at least one solution \({\textbf{p}} \in W^*\), which is unique if \(\gamma _1, \gamma _2 > 0\).

Proof

Similarly as in the proof of Proposition 2.2 we apply the direct method, see e.g. [13, Theorem 2.1], using the weak topology on \(W^*\). That is, we show that the functional \(D: W^* \rightarrow \overline{{\mathbb {R}}}\) is proper, weakly l.s.c. and coercive.

The functional D is proper since it is bounded from below and admits a finite value, e.g. \(D(0) < \infty \).

Further, since the linear operator \(\varLambda ^*: W^* \rightarrow V^*\) is bounded and \(B^{-1}: V^* \rightarrow V\) is bounded as well due to coercivity of \(a_B\), see Remark 2.1, the term \({\textbf{p}} \mapsto \Vert \varLambda ^* {\textbf{p}} - \alpha _2 T^*g\Vert _{B^{-1}}^2\) is continuous and due to convexity also weakly lower semi-continuous, see Lemma 2.1. Similarly, the term \({\textbf{p}} \mapsto -\frac{\alpha _2}{2}\Vert g\Vert _{L^2}^2 + {\langle }g,p_1{\rangle }_{L^2} + \frac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2 + \frac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2\) is weakly lower semi-continuous. By an application of Lemma 3.2 the set \({\tilde{K}}:= \{{\textbf{p}} \in L^2(\varOmega ) \times L^2(\varOmega )^{d\times m}: |p_1|\le \alpha _1, |{\textbf{p}}_2|_F \le \lambda \}\) is weakly closed in \(L^2(\varOmega ) \times L^2(\varOmega )^{d\times m}\) and in particular \({\tilde{K}} \cap (L^2(\varOmega ) \times H_0^{\textrm{div}}(\varOmega )^m)\) is weakly closed in \(L^2(\varOmega ) \times H_0^{\textrm{div}}(\varOmega )^m\).

Since the subspace \(W^*\) is closed and convex and therefore weakly closed in \(L^2(\varOmega ) \times L^2(\varOmega )^{d\times m}\) or \(L^2(\varOmega ) \times H_0^{\textrm{div}}(\varOmega )^m\), the set \(K:= {\tilde{K}} \cap W^*\) must be weakly closed in \(W^*\). Noticing that K defines the only non-trivial level set of \({\textbf{p}} \mapsto \chi _{|p_1|\le \alpha _1} + \chi _{|{\textbf{p}}_2|_F \le \lambda }\), we conclude by Proposition 2.1 that this term is weakly lower semi-continuous, and hence so is D as a whole.

Now we show that \(D: W^* \rightarrow \overline{{\mathbb {R}}}\) is coercive. Due to the box-constraints \(\chi _{|p_1|\le \alpha _1} + \chi _{|{\textbf{p}}_2|_F \le \lambda }\) it is easy to see that \(\Vert {\textbf{p}}\Vert _{L^2} \rightarrow \infty \) implies \(D({\textbf{p}}) \rightarrow \infty \). It therefore remains to check coercivity in the case of Setting (\(\nabla .\textrm{iii}\)) with \(W_2^* \subseteq H_0^{\textrm{div}}(\varOmega )^m\) when \(\Vert {\text {div}}{\textbf{p}}_2\Vert _{L^2} \rightarrow \infty \). Since \(a_B\) is coercive with coercivity constant \(c_B > 0\), we have

$$\begin{aligned} \Vert {\textbf{v}}\Vert _{V^*}^2 \le \Vert B\Vert ^2\Vert B^{-1}{\textbf{v}}\Vert _{V}^2&\le \tfrac{\Vert B\Vert ^2}{c_B}a_B(B^{-1}{\textbf{v}}, B^{-1}{\textbf{v}}) \\&\le \tfrac{\Vert B\Vert ^2}{c_B}{\langle }BB^{-1}{\textbf{v}}, B^{-1}{\textbf{v}}{\rangle }_{V^*,V} = \tfrac{\Vert B\Vert ^2}{c_B}\Vert {\textbf{v}}\Vert _{B^{-1}}^2 \end{aligned}$$

for any \({\textbf{v}}\in V^*\), which allows us to bound

$$\begin{aligned} D({\textbf{p}})&\ge \frac{1}{2}\Vert T^* p_1 - {\text {div}}{\textbf{p}}_2 - \alpha _2 T^* g\Vert _{B^{-1}}^2 + c_1 \\&\ge \frac{1}{2} \left( \Vert {\text {div}}{\textbf{p}}_2\Vert _{B^{-1}} - \Vert T^*p_1 - \alpha _2 T^*g\Vert _{B^{-1}} \right) ^2 +c_1 \\&\ge \frac{1}{2} \left( c_2 \Vert {\text {div}}{\textbf{p}}_2\Vert _{V^*} - c_3\right) ^2 + c_1 \rightarrow \infty \end{aligned}$$

for some constants \(c_1\in {\mathbb {R}}\), \(c_2,c_3 > 0\) independent of \({\textbf{p}}_2\), which shows coercivity of the functional D. The direct method, see for example [13, Theorem 2.1], then concludes the existence of a solution \({\textbf{p}} \in W^*\).

Uniqueness in case \(\gamma _1, \gamma _2 > 0\) follows from strict convexity in the terms \(\frac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2\) and \(\frac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2\) similar to the proof of Proposition 2.2. \(\square \)

For special choices of V the regularized terms in the primal problem (12) may be formulated in a more explicit way. They form integral expressions similar to those of the non-regularized primal problem (4) but include a pointwise so-called Huber-smoothing of the integrand.

Proposition 3.2

The terms \(F_1^*(T{\textbf{u}})\) and \(F_2^*(\nabla {\textbf{u}})\) from Theorem 3.1 are called Huber-regularized \(L^1\) and Huber-regularized total variation respectively and depending on V can be given explicitly by

  1. (i)

    \(F_1^*(T{\textbf{u}}) = \alpha _1 \int _\varOmega \varphi _{\gamma _1}(|T {\textbf{u}} - g|) \,dx\) if \(V \in \{H^1(\varOmega )^m, L^2(\varOmega )^m\}\),

  2. (ii)

    \(F_2^*(\nabla {\textbf{u}}) = \lambda \int _\varOmega \varphi _{\gamma _2}(|\nabla {\textbf{u}}|_F) \,dx\) if \(V = H^1(\varOmega )^m\),

where the Huber-function \(\varphi _\gamma : {\mathbb {R}}\rightarrow [0,\infty )\) for \(\gamma \ge 0\) is defined by

$$\begin{aligned} \varphi _\gamma (x):= {\left\{ \begin{array}{ll} \frac{1}{2\gamma } x^2 &{} \text {if } |x| \le \gamma , \\ |x| - \frac{\gamma }{2} &{} \text {if } |x| > \gamma . \end{array}\right. } \end{aligned}$$
(14)

In particular, if \(V = H^1(\varOmega )^m\), then the optimality conditions (13) may be written as

$$\begin{aligned} \begin{aligned} 0&= \varLambda ^* {\textbf{p}} - \alpha _2 T^* g + B{\textbf{u}}, \\ 0&= p_1 \max \{\gamma _1,|T {\textbf{u}} -g|\} - \alpha _1 (T {\textbf{u}} -g),&|p_1|&\le \alpha _1 ,\\ 0&= {\textbf{p}}_2 \max \{\gamma _2,|\nabla {\textbf{u}}|_{F}\} - \lambda \nabla {\textbf{u}},&|{\textbf{p}}_2|_F&\le \lambda , \end{aligned} \end{aligned}$$
(15)

where \(\max \) denotes the pointwise maximum.
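The last two equations of (15) determine \({\textbf{p}}\) pointwise from \({\textbf{u}}\). The following sketch (a discrete illustration with randomly generated placeholder arrays of ours, not the paper's algorithm) computes the dual variables in closed form and checks that the residuals of (15) vanish and the constraints hold:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha1, lam = 0.8, 0.3
gamma1, gamma2 = 0.1, 0.05

r  = rng.normal(size=100)            # stands in for T u - g
Gu = rng.normal(size=(2, 3, 50))     # stands in for the gradient field, shape (d, m, pixels)

# closed-form solutions of the last two equations in (15)
p1 = alpha1 * r / np.maximum(gamma1, np.abs(r))
nF = np.linalg.norm(Gu, axis=(0, 1))            # pointwise Frobenius norm |grad u|_F
p2 = lam * Gu / np.maximum(gamma2, nF)

# the residuals of (15) vanish and the box-constraints hold
assert np.allclose(p1 * np.maximum(gamma1, np.abs(r)) - alpha1 * r, 0.0)
assert np.allclose(p2 * np.maximum(gamma2, nF) - lam * Gu, 0.0)
assert np.all(np.abs(p1) <= alpha1 + 1e-12)
assert np.all(np.linalg.norm(p2, axis=(0, 1)) <= lam + 1e-12)
```

Note that for \(\gamma _1, \gamma _2 > 0\) the dual variables are uniquely determined by \({\textbf{u}}\) in this way, which reflects the uniqueness statement of Theorem 3.2.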

Proof

We have

$$\begin{aligned} F_1^*(q_1)&= \sup _{p_1\in W_1^*} \Big \{{\langle }p_1, q_1{\rangle }_{W_1^*, W_1} - {\langle }p_1, g{\rangle }_{L^2} - \chi _{|p_1| \le \alpha _1} - \tfrac{\gamma _1}{2\alpha _1} \Vert p_1\Vert _{L^2}^2 \Big \}. \end{aligned}$$

The supremum is attained at \(p_1\) if in an a.e. sense either \(|p_1| < \alpha _1\) with \(0 = q_1 - g - \frac{\gamma _1}{\alpha _1} p_1\) or \(|p_1| = \alpha _1\) with \(0 = q_1 - g - \mu p_1 - \frac{\gamma _1}{\alpha _1} p_1\) for some \(\mu \ge 0\). In the former case we have \(p_1 = \frac{\alpha _1}{\gamma _1}(q_1 - g)\) and \(|q_1 - g| < {\gamma _1}\), while in the latter we have \(\mu = \tfrac{1}{\alpha _1}(|q_1 - g| - \gamma _1) \ge 0\), therefore \(|q_1 - g| \ge \gamma _1\) and \(p_1 = \frac{q_1 - g}{\mu + \frac{\gamma _1}{\alpha _1}} = \alpha _1 \frac{q_1 - g}{|q_1 - g|}\). We thus deduce

$$\begin{aligned} F_1^*(q_1) = \alpha _1 \int _\varOmega \varphi _{\gamma _1}(|q_1 - g|) \,dx. \end{aligned}$$
For the conjugate \(F_2^*\) of \(F_2\) we get

$$\begin{aligned} \begin{aligned} F_2^*(\textbf{q}_2)&= \sup _{{\textbf{p}}_2 \in W_2^*} \Big \{{\langle } {\textbf{p}}_2, \textbf{q}_2 {\rangle }_{W_2^*,W_2} - \chi _{|{\textbf{p}}_2|_F\le \lambda } - \tfrac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2 \Big \}. \end{aligned} \end{aligned}$$

After scaling with \(\frac{1}{\lambda }\), i.e. substituting \(\textbf{w}:=\frac{{\textbf{p}}_2}{\lambda }\), we obtain for \(\textbf{q}_2 = \nabla {\textbf{u}}\)

$$\begin{aligned} F_2^*(\nabla {\textbf{u}}) = \lambda \sup _{\begin{array}{c} \textbf{w} \in W_2^* \\ |\textbf{w}|_F \le 1 \end{array}} \Big \{ {\langle }\nabla {\textbf{u}}, \textbf{w}{\rangle }_{L^2} - \tfrac{\gamma _2}{2} \Vert \textbf{w}\Vert _{L^2}^2 \Big \}. \end{aligned}$$
(16)

The pointwise constrained maximization problem on the right hand side yields the Karush-Kuhn-Tucker (KKT) conditions

$$\begin{aligned} \nabla {\textbf{u}} - \gamma _2 \textbf{w} - 2\mu \textbf{w}&= 0,&\quad |\textbf{w}|_F^2 - 1&\le 0,\\ \mu (|\textbf{w}|_F^2 - 1)&= 0,&\mu&\ge 0. \end{aligned}$$

Assuming \(\gamma _2 > 0\) implies \(\gamma _2 + 2\mu > 0\) and hence we have \(\textbf{w} = \frac{\nabla {\textbf{u}}}{\gamma _2 + 2\mu }\). If \(|\textbf{w}|_F < 1\) then \(\mu = 0\) and hence we obtain \(\textbf{w}=\frac{\nabla {\textbf{u}}}{\gamma _2}\). Inserting this in (16) yields the integrand \(\frac{1}{2\gamma _2} |\nabla {\textbf{u}}|_F^2\). If \(|\textbf{w}|_F = 1\) then we observe that \(1 = |\textbf{w}|_F = \frac{1}{\gamma _2 + 2\mu } |\nabla {\textbf{u}}|_F\) which leads to \(\gamma _2 + 2\mu = |\nabla {\textbf{u}}|_F\) and thus \(\textbf{w} = \frac{\nabla {\textbf{u}}}{|\nabla {\textbf{u}}|_F}\). Inserting in (16) yields the integrand \(|\nabla {\textbf{u}}|_F - \frac{\gamma _2}{2}\). Summarizing our findings we arrive at the integrand

$$\begin{aligned} \varphi _{\gamma _2}(|\nabla {\textbf{u}}|_F) = {\left\{ \begin{array}{ll} \tfrac{1}{2\gamma _2} |\nabla {\textbf{u}}|_F^2 &{} \text {if }|\nabla {\textbf{u}}|_F < \gamma _2, \\ |\nabla {\textbf{u}}|_F - \tfrac{\gamma _2}{2} &{} \text {else} \end{array}\right. } \end{aligned}$$

and thus

$$\begin{aligned} F_2^*(\nabla {\textbf{u}}) = \lambda \int _\varOmega \varphi _{\gamma _2}(|\nabla {\textbf{u}}|_F) \,dx. \end{aligned}$$

If \(\gamma _2=0\), a similar argument shows that

$$\begin{aligned} F_2^*(\nabla {\textbf{u}}) = \lambda \int _\varOmega |\nabla {\textbf{u}}|_F \,dx = \lambda \int _\varOmega \varphi _{0}(|\nabla {\textbf{u}}|_F) \,dx. \end{aligned}$$
To show that (13) can be written as in (15) if \(V = H^1(\varOmega )^m\), we derive from \(L^2(\varOmega )^{d\times m} \ni \nabla {\textbf{u}} \in \partial F_2({\textbf{p}}_2)\) for \(\gamma _2 > 0\) that necessarily \(|{\textbf{p}}_2|_F \le \lambda \) and pointwise

$$\begin{aligned} \nabla {\textbf{u}}&\in {\left\{ \begin{array}{ll} \{ \tfrac{\gamma _2}{\lambda } {\textbf{p}}_2 \} &{} \text {if }|{\textbf{p}}_2|_F< \lambda , \\ \{ \mu {\textbf{p}}_2 : \mu \ge 0 \} &{} \text {if }|{\textbf{p}}_2|_F = \lambda , \end{array}\right. } \\ \iff \qquad {\textbf{p}}_2&= {\left\{ \begin{array}{ll} \lambda \tfrac{\nabla {\textbf{u}}}{\gamma _2} &{} \text {if }|\nabla {\textbf{u}}|_F < \gamma _2, \\ \lambda \tfrac{\nabla {\textbf{u}}}{|\nabla {\textbf{u}}|_F} &{} \text {if }|\nabla {\textbf{u}}|_F \ge \gamma _2, \end{array}\right. } \\&= \lambda \tfrac{\nabla {\textbf{u}}}{\max \{\gamma _2, |\nabla {\textbf{u}}|_F\}}. \end{aligned}$$

For \(\gamma _2 = 0\) the same argument applies, except for \(\nabla {\textbf{u}} = 0\), in which case only \(|{\textbf{p}}_2|_F \le \lambda \) holds. In any case, we can summarize for \(\gamma _2 \ge 0\) that \(\nabla {\textbf{u}} \in \partial F_2({\textbf{p}}_2)\) is indeed equivalent to

$$\begin{aligned} 0&= {\textbf{p}}_2 \max \{\gamma _2, |\nabla {\textbf{u}}|_F\} - \lambda \nabla {\textbf{u}},&|{\textbf{p}}_2|_F&\le \lambda . \end{aligned}$$

For the representation \(T{\textbf{u}} \in \partial F_1(p_1)\) one may proceed analogously. \(\square \)

We note that in general, e.g. for discrete subspaces V, the terms \(F_1^*(T{\textbf{u}})\) and \(F_2^*(\nabla {\textbf{u}})\) may not have such a simple explicit form as in Proposition 3.2.
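In such general cases the conjugate can still be evaluated numerically. For the scalar case, the closed form derived in the proof of Proposition 3.2 (which evaluates pointwise to \(\alpha _1\varphi _{\gamma _1}(|q_1-g|)\)) can be sanity-checked against a brute-force evaluation of the defining supremum over a grid of admissible \(p_1\)-values; the discretization and names below are our own illustration:

```python
import numpy as np

alpha1, gamma1 = 1.5, 0.4

def huber(x, gamma):
    """Huber function (14)."""
    return np.where(np.abs(x) <= gamma, x**2 / (2 * gamma), np.abs(x) - gamma / 2)

def conj_bruteforce(q, g):
    """Pointwise sup_{|p| <= alpha1} { p*(q-g) - gamma1/(2*alpha1) * p^2 } on a grid."""
    p = np.linspace(-alpha1, alpha1, 20001)[:, None]
    vals = p * (q - g)[None, :] - gamma1 / (2 * alpha1) * p**2
    return vals.max(axis=0)

q = np.linspace(-3, 3, 41)
g = 0.5 * np.ones_like(q)
closed_form = alpha1 * huber(np.abs(q - g), gamma1)
assert np.max(np.abs(conj_bruteforce(q, g) - closed_form)) < 1e-4
```

The grid maximization plays the role of the supremum in the definition of \(F_1^*\) and agrees with the Huber closed form up to the grid resolution.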

3.2 Dual Characterization of the Huber-TV-Functional

In Proposition 3.2 we have seen the pointwise representation of the regularized primal total variation term \(F_2^*(\nabla {\textbf{u}})\) for \(V = H^1(\varOmega )^m\) by utilizing the Huber-function (14). We will now extend this representation to \(V = L^2(\varOmega )^m\) by means of a more generally defined Huber-TV functional, cf. Definition 3.1. This functional has been used in [49] for regularization, and its relation to the dual problem was studied in [18]. It is recognized to reduce the staircasing effect of the total variation [18]. We state some elementary properties of the Huber function and the Huber-TV functional and provide their proofs.

Proposition 3.3

The Huber-function (14) satisfies the following properties:

  1. (i)

    \(0 \le \gamma _- \le \gamma _+ \implies \forall x \in {\mathbb {R}}: \varphi _{\gamma _-}(x) \ge \varphi _{\gamma _+}(x)\),

  2. (ii)

    \(\forall x \in {\mathbb {R}}: \lim _{\gamma \rightarrow 0^+} \varphi _\gamma (x) = \varphi _0(x) = |x|\).

  3. (iii)

    \(\forall x \in {\mathbb {R}}: |\varphi _\gamma '(x)| \le 1\),

  4. (iv)

    \(\lim _{\gamma \rightarrow 0^+} \int _\varOmega \varphi _{\gamma }(f) \,dx = \int _\varOmega |f| \,dx\) for any \(f \in L^2(\varOmega )\).

Proof

  1. (i)

    We distinguish depending on \(x \in {\mathbb {R}}\) the cases

    $$\begin{aligned} |x| \le \gamma _- \le \gamma _+:&\quad \tfrac{1}{2\gamma _+}x^2 \le \tfrac{1}{2\gamma _-}x^2, \\ \gamma _- \le |x| \le \gamma _+:&\quad \tfrac{1}{2\gamma _+}x^2 \le \tfrac{1}{2}|x| \le |x| - \tfrac{\gamma _-}{2}, \\ \gamma _- \le \gamma _+ \le |x|:&\quad |x| - \tfrac{\gamma _+}{2} \le |x| - \tfrac{\gamma _-}{2}. \end{aligned}$$
  2. (ii)

    For \(x = 0\) it is clear that \(\varphi _\gamma (0) = 0 = |0|\) for all \(\gamma \ge 0\). Otherwise, for \(0< \gamma < |x|\) one has \(\varphi _\gamma (x) = |x| - \tfrac{\gamma }{2}\), which tends to \(|x|\) as \(\gamma \rightarrow 0^+\).

  3. (iii)

    We derive for \(x \in {\mathbb {R}}\) and \(\gamma > 0\) directly

    $$\begin{aligned} \varphi _\gamma '(x) = {\left\{ \begin{array}{ll} \tfrac{x}{\gamma } &{} \text {if } |x| \le \gamma , \\ {\text {sign}}(x) &{} \text {if } |x| > \gamma . \end{array}\right. } \end{aligned}$$

    In any case \(|\varphi _\gamma '(x)| \le 1\) for every \(x \in {\mathbb {R}}\).

  4. (iv)

    Due to Item (ii) by the Monotone Convergence Theorem [29, Appendix E, Theorem 4] we have \(\lim _{n \rightarrow \infty } \int _\varOmega \varphi _{\gamma _n}(f) \,dx = \int _\varOmega |f| \,dx\) for any monotonically decreasing sequence \(\gamma _n \rightarrow 0^+\). Then the statement follows by Item (i). \(\square \)
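These elementary properties can be checked numerically from the definition (14) alone; the following minimal sketch (our illustration) verifies the monotonicity in \(\gamma \), the convergence to the absolute value, and the slope bound:

```python
import numpy as np

def huber(x, gamma):
    """Huber function (14); gamma = 0 gives the absolute value."""
    x = np.asarray(x, dtype=float)
    if gamma == 0.0:
        return np.abs(x)
    return np.where(np.abs(x) <= gamma, x**2 / (2.0 * gamma), np.abs(x) - gamma / 2.0)

x = np.linspace(-2.0, 2.0, 401)

assert np.all(huber(x, 0.25) >= huber(x, 0.5) - 1e-12)    # (i): smaller gamma, larger values
assert np.max(np.abs(huber(x, 1e-8) - np.abs(x))) < 1e-6  # (ii): converges to |x|
d = np.gradient(huber(x, 0.5), x)                          # numerical derivative
assert np.all(np.abs(d) <= 1.0 + 1e-9)                     # (iii): |phi'| <= 1
```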

Definition 3.1

(Huber-TV-Functional, cf. [49]) For \({\textbf{u}} \in L^2(\varOmega )^m\) and \(\gamma \ge 0\) we denote by

$$\begin{aligned} \int _\varOmega \varphi _{\gamma }(|D {\textbf{u}}|_{{F}}):= \sup _{\begin{array}{c} \textbf{w} \in C_0^\infty (\varOmega )^{d\times m} \\ |\textbf{w}|_F \le 1 \end{array}} \Big \{ {\langle }{\textbf{u}}, -{\text {div}}\textbf{w}{\rangle }_{L^2} - \tfrac{\gamma }{2} \Vert \textbf{w}\Vert _{L^2}^2 \Big \} \end{aligned}$$
(17)

the \(\gamma \)-regularized Huber-TV functional.
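In a discrete setting the supremum in (17) can be resolved pointwise (cf. Proposition 3.2), so the functional reduces to summing \(\varphi _\gamma \) of the pointwise Frobenius norm of a discrete gradient. The following sketch uses forward differences with replicate (Neumann) boundary, a discretization choice of ours:

```python
import numpy as np

def huber(x, gamma):
    if gamma == 0.0:
        return np.abs(x)
    return np.where(np.abs(x) <= gamma, x**2 / (2.0 * gamma), np.abs(x) - gamma / 2.0)

def huber_tv(u, lam, gamma):
    """Discrete analogue of lam * int_Omega phi_gamma(|Du|_F) for u of shape (m, H, W),
    using forward differences with replicate boundary."""
    dx = np.diff(u, axis=1, append=u[:, -1:, :])
    dy = np.diff(u, axis=2, append=u[:, :, -1:])
    grad_norm = np.sqrt((dx**2 + dy**2).sum(axis=0))  # pointwise Frobenius norm
    return lam * huber(grad_norm, gamma).sum()

u = np.zeros((1, 8, 8))
u[:, :, 4:] = 1.0                                # image with one vertical jump of height 1

assert np.isclose(huber_tv(u, 1.0, 0.0), 8.0)    # exact TV: 8 rows times jump height 1
# Proposition 3.4 (iii): the Huber-TV is non-increasing in gamma
assert huber_tv(u, 1.0, 0.5) <= huber_tv(u, 1.0, 0.25) <= huber_tv(u, 1.0, 0.0)
```

For \(\gamma = 0\) this is the usual discrete (isotropic, vector-valued) total variation, and growing \(\gamma \) smooths the kink of the integrand without changing its linear growth.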

In Definition 3.1, similarly to the total variation from (2), the supremum is taken over pointwise constrained functions in \(C_0^\infty (\varOmega )^{d\times m}\), while \(F_2^*\) takes it over \(H_0^{\textrm{div}}(\varOmega )^m\) in Setting (\(\nabla .\textrm{iii}\)), see the proof of Proposition 3.2. Although \(C_0^\infty (\varOmega )^{d \times m}\) is a dense subset of \(H_0^{\textrm{div}}(\varOmega )^m\), the equivalence of \(F_2^*\) and (17) is non-trivial in view of the pointwise constraints, cf. [37]. This kind of equivalence was first claimed in [35], while the necessary argument was only fully established later in [37].

We have the following density result.

Theorem 3.3

Let \(W_2^* \in \{H_0^{\textrm{div}}(\varOmega )^{m}, L^2(\varOmega )^{d\times m}\}\), \(\lambda > 0\) and denote

$$\begin{aligned} K_\lambda := \{{\textbf{p}} \in W_2^* : |{\textbf{p}}|_F \le \lambda \}. \end{aligned}$$

Then \(\overline{K_\lambda \cap C_0^\infty (\varOmega )^{d\times m}}^{\Vert {\,\cdot \,}\Vert _{W_2^*}} = K_\lambda \).

Proof

The proof may be carried out analogously to the proof of [37, Theorem 1]. For the decomposition into appropriate star-shaped domains necessary in that proof, we additionally refer the reader to [19, Proposition 2.5.3 and 2.5.4]. \(\square \)

Corollary 3.1

The term \(F_2^*(\nabla {\textbf{u}})\) from Theorem 3.1 is called Huber-regularized total variation and may take on the following explicit form

$$\begin{aligned} F_2^*(\nabla {\textbf{u}}) = \lambda \int _\varOmega \varphi _{\gamma _2}(|D {\textbf{u}}|_F) \end{aligned}$$

if \(V \in \{H^1(\varOmega )^m, L^2(\varOmega )^m\}\).

Proof

Using the notation \(W_2^* \in \{L^2(\varOmega )^{d\times m}, H_0^{\textrm{div}}(\varOmega )^m\}\) respectively from Theorem 3.1, by the definition of the convex conjugate and the set density result from Theorem 3.3 we have

$$\begin{aligned} F_2^*(\nabla {\textbf{u}})&= \sup _{\begin{array}{c} {\textbf{p}}_2 \in W_2^* \\ |{\textbf{p}}_2|_F \le \lambda \end{array}} \Big \{ {\langle }\nabla {\textbf{u}}, {\textbf{p}}_2{\rangle }_{W_2,W_2^*} - \tfrac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2 \Big \} \\&= \sup _{\begin{array}{c} {\textbf{p}}_2 \in C_0^\infty (\varOmega )^{d\times m} \\ |{\textbf{p}}_2|_F \le \lambda \end{array}} \Big \{ {\langle }\nabla {\textbf{u}}, {\textbf{p}}_2{\rangle }_{L^2} - \tfrac{\gamma _2}{2\lambda } \Vert {\textbf{p}}_2\Vert _{L^2}^2 \Big \} \\&= \lambda \sup _{\begin{array}{c} \tilde{{\textbf{p}}}_2 \in C_0^\infty (\varOmega )^{d\times m} \\ |\tilde{{\textbf{p}}}_2|_F \le 1 \end{array}} \Big \{ {\langle }{\textbf{u}}, -{\text {div}}\tilde{{\textbf{p}}}_2{\rangle }_{L^2} - \tfrac{\gamma _2}{2} \Vert \tilde{{\textbf{p}}}_2\Vert _{L^2}^2 \Big \} = \lambda \int _\varOmega \varphi _{\gamma _2}(|D{\textbf{u}}|_F), \end{aligned}$$

where the last equality is the definition of the Huber-TV functional, see (17). \(\square \)

If \({\textbf{u}} \in H^1(\varOmega )^m\), the Huber-TV functional reduces to the Lebesgue integral over \(\varOmega \) of the Huber-function term \(\varphi _{\gamma _2}(|\nabla {\textbf{u}}|_F)\), as we see in the following proposition.

Proposition 3.4

The Huber-TV functional (17) satisfies the following properties

  1. (i)

    \({\textbf{u}} \in BV(\varOmega )^m \iff \int _\varOmega \varphi _{\gamma }(|D {\textbf{u}}|_F) < \infty \) for any \(\gamma \ge 0\),

  2. (ii)

    If \({\textbf{u}} \in H^1(\varOmega )^m\), then

    $$\begin{aligned} \int _\varOmega \varphi _{\gamma }(|D {\textbf{u}}|_F) = \int _\varOmega \varphi _{\gamma }(|\nabla {\textbf{u}}|_F) \,dx, \end{aligned}$$

    where \(\varphi _{\gamma }\), \(\gamma \ge 0\) in the second integral is the Huber-function (14).

  3. (iii)

    \(0 \le \gamma _- \le \gamma _+ \implies \int _\varOmega \varphi _{\gamma _-}(|D{\textbf{u}}|_F) \ge \int _\varOmega \varphi _{\gamma _+}(|D{\textbf{u}}|_F)\),

  4. (iv)

    \(\lim _{\gamma \rightarrow 0} \int _\varOmega \varphi _{\gamma }(|D{\textbf{u}}|_F) = \int _\varOmega |D{\textbf{u}}|_F\).

Proof

  1. (i)

    Since \(\textbf{w}\) is pointwise constrained in the supremum from Definition 3.1, we can bound \(\int _\varOmega \varphi _{\gamma }(|D {\textbf{u}}|_F)\) from above and below:

    $$\begin{aligned} \int _\varOmega |D{\textbf{u}}|_F - c&\le \sup _{\begin{array}{c} \textbf{w} \in C_0^\infty (\varOmega )^{d\times m} \\ |\textbf{w}|_F \le 1 \end{array}} \Big \{ {\langle }{\textbf{u}}, -{\text {div}}\textbf{w}{\rangle }_{L^2} - \tfrac{\gamma }{2} \Vert \textbf{w}\Vert _{L^2}^2 \Big \} \le \int _\varOmega |D{\textbf{u}}|_F, \end{aligned}$$

    where \(c:= \tfrac{\gamma }{2}|\varOmega | < \infty \).

  2. (ii)

    Using integration by parts we get

    $$\begin{aligned} \int _\varOmega \varphi _{\gamma }(|D {\textbf{u}}|_F) = \sup _{\begin{array}{c} \textbf{w} \in C_0^\infty (\varOmega )^{d\times m} \\ |\textbf{w}|_F \le 1 \end{array}} \Big \{ {\langle }\nabla {\textbf{u}}, \textbf{w}{\rangle }_{L^2} - \tfrac{\gamma }{2} \Vert \textbf{w}\Vert _{L^2}^2 \Big \}. \end{aligned}$$

    We may replace \(C_0^\infty (\varOmega )^{d\times m}\) by \(L^2(\varOmega )^{d\times m}\) due to Theorem 3.3. By the same pointwise consideration as in the proof of Proposition 3.2, we see that the supremum is attained for the Huber-function integrand \(\varphi _{\gamma }(|\nabla {\textbf{u}}|_F)\).

  3. (iii)

    There exists a sequence \((\textbf{w}_n)_{n \in {\mathbb {N}}} \subseteq C_0^{\infty }(\varOmega )^{d\times m}\), \(|\textbf{w}_n|_F \le 1\), such that

    $$\begin{aligned} \int _\varOmega \varphi _{\gamma _+}(|D{\textbf{u}}|_F)&= \lim _{n\rightarrow \infty } \Big \{ {\langle }{\textbf{u}}, -{\text {div}}\textbf{w}_n{\rangle }_{L^2} - \tfrac{\gamma _+}{2} \Vert \textbf{w}_n\Vert _{L^2}^2 \Big \} \\&\le \liminf _{n\rightarrow \infty } \Big \{ {\langle }{\textbf{u}}, -{\text {div}}\textbf{w}_n{\rangle }_{L^2} - \tfrac{\gamma _-}{2} \Vert \textbf{w}_n\Vert _{L^2}^2 \Big \} \le \int _\varOmega \varphi _{\gamma _-}(|D{\textbf{u}}|_F). \end{aligned}$$

  4. (iv)

    Because of the monotonicity from (iii), the limit \(\lim _{\gamma \rightarrow 0} \int _\varOmega \varphi _\gamma (|D{\textbf{u}}|_F)\) is achieved by the supremum

    $$\begin{aligned} \lim _{\gamma \rightarrow 0} \int _\varOmega \varphi _{\gamma }(|D{\textbf{u}}|_F) = \sup _{\gamma > 0} \sup _{\begin{array}{c} \textbf{w} \in C_0^\infty (\varOmega )^{d\times m} \\ |\textbf{w}|_F \le 1 \end{array}} \Big \{ {\langle }{\textbf{u}}, -{\text {div}}\textbf{w}{\rangle }_{L^2} - \tfrac{\gamma }{2} \Vert \textbf{w}\Vert _{L^2}^2 \Big \} = \int _\varOmega |D{\textbf{u}}|_F. \end{aligned}$$
\(\square \)

3.3 \(\varGamma \)-Convergence

We will now analyze how minimizers of (12) behave for \(\gamma :=(\gamma _1,\gamma _2) \rightarrow 0\) by making use of \(\varGamma \)-convergence.

Lower semi-continuity from Lemma 3.1 together with the properties of the Huber-TV functional allow us to prove a \(\varGamma \)-convergence result for the functional E.

Lemma 3.3

(\(\varGamma \)-convergence) Let \((\gamma _1^j)_{j\in {\mathbb {N}}}, (\gamma _2^j)_{j\in {\mathbb {N}}} > 0\) be monotonically decreasing sequences with \(\lim _{j\rightarrow \infty } \gamma _1^j = \lim _{j\rightarrow \infty } \gamma _2^j = 0\). Denote by \(E^j: V \rightarrow \overline{{\mathbb {R}}}\) the energy functional in (12) for \((\gamma _1, \gamma _2)= (\gamma _1^j, \gamma _2^j)\) for \(j\in {\mathbb {N}}\) and by \(E^\infty \) the functional in (4). Then \(E^j {\mathop {\rightarrow }\limits ^{\varGamma }} E^\infty \) with respect to weak V-convergence.

Proof

By the monotonicity property of the Huber-TV functional from Proposition 3.4 (iii) and of the Huber function from Proposition 3.3 (i), we observe that \(E^j({\textbf{u}}) \le E^{j+1}({\textbf{u}})\) and \(E^j({\textbf{u}}) \rightarrow E^\infty ({\textbf{u}})\) pointwise for every fixed \({\textbf{u}}\in V\). Further for every \(j\in {\mathbb {N}}\) we have that \(E^j\) is (sequentially) weakly lower semi-continuous in V due to Lemma 3.1. According to [15, Remark 1.40 (ii)] we thus have \(\varGamma \)-convergence of \(E^j\) to \(E^\infty \) with respect to weak V-convergence. \(\square \)

The following lemma ensures that the minimizers of \(E^j\), \(j \in {\mathbb {N}}\) are all contained in some common weakly compact set \(K \subseteq V\), which is a prerequisite for showing that the minimizers converge for \(j \rightarrow \infty \).

Lemma 3.4

(Equi-coercivity) Let \(\lambda > 0\) and \((E^j)_{j\in {\mathbb {N}}}, (\gamma _1^j)_{j\in {\mathbb {N}}}, (\gamma _2^j)_{j\in {\mathbb {N}}}\) as in Lemma 3.3. Then the sequence \((E^j)_{j}\) is equi-mildly coercive with regard to weak V-convergence, i.e. there exists a non-empty sequentially (with regard to weak V-convergence) compact set \(K \subseteq V\) such that \(\inf _V E^j = \inf _K E^j\) for all \(j \in {\mathbb {N}}\).

Proof

As \(E^j\) is proper for any \(j\in {\mathbb {N}}\), i.e. there exists \({\textbf{u}}\in V\) such that \(E^j({\textbf{u}})<\infty \), by coercivity of \(a_B\) from Assumption (A1) we obtain the coercivity of \(E^j\) in V for all \(j\in {\mathbb {N}}\).

Denote by \(L^j_a:= \{{\textbf{u}} \in V: E^j({\textbf{u}}) \le a\}\), \(a \in {\mathbb {R}}\) the lower level sets of \(E^j\) for \(j\in {\mathbb {N}}\). The level sets \(L^j_a\), \(j \in {\mathbb {N}}\), are bounded due to coercivity of \(E^j\) shown above.

Since \(E^j \le E^{j+1}\) due to Proposition 3.4, the level sets \(L_a^j\) are nested for any fixed \(a \in {\mathbb {R}}\), i.e. \(L_a^j \supseteq L_a^{j+1}\) for \(j\in {\mathbb {N}}\).

Moreover, \(E^j \le E^\infty \), and since \(E^\infty (0) < \infty \) we may choose \(a:= E^\infty (0)\) to ensure \(L_a^j \ne \emptyset \) for all \(j \in {\mathbb {N}}\).

For all \(j \in {\mathbb {N}}\) the minimizers of \(E^j\) exist in V (see Proposition 3.1) and are contained within some non-empty weakly closed ball \(K \supseteq \overline{L_a^j}\) in V centred at the origin. Since V is reflexive K is weakly compact, see e.g. [17, Theorem 3.18], concluding the proof. \(\square \)

We are now ready to show our final main result, namely that for \(\gamma \rightarrow 0\) minimizers of (12) approach the minimizer of (4).

Theorem 3.4

Let \(\lambda > 0\) and \({\textbf{u}}^j, {\textbf{u}}^\infty \) denote the unique minimizers of \(E^j\) and \(E^\infty \) as given in Lemma 3.3 respectively for \(j\in {\mathbb {N}}\). Then \({\textbf{u}}^j \rightharpoonup {\textbf{u}}^\infty \) for \(j\rightarrow \infty \) with respect to weak V-convergence.

Proof

As shown in the proof of Lemma 3.4 the minimizers \(({\textbf{u}}^j)_{j\in {\mathbb {N}}}\) are contained within a sequentially compact (with regard to weak V-convergence) set K. Then, according to [15, Theorem 1.21] every weak limit of a subsequence of \(({\textbf{u}}^j)_{j\in {\mathbb {N}}}\) is a minimum point of \(E^\infty \). Since the minimum \({\textbf{u}}^\infty \) of \(E^\infty \) is unique, we have \({\textbf{u}}^j \rightharpoonup {\textbf{u}}^\infty \) for \(j\rightarrow \infty \). \(\square \)

4 Primal-Dual Semi-smooth Newton Algorithm

In this section we derive a primal-dual semi-smooth Newton method, cf. [34], in order to find an approximate solution of (12). Note that such Newton methods have already been used for the \(L^2\)-TV model [26, 38, 39, 44] and \(L^1\)-TV model [25, 42] in image reconstruction, i.e. \(m=1\) (greyscale images). We extend the approach of semi-smooth Newton methods to a vector-valued setting and to the \(L^1\)-\(L^2\)-TV model.

4.1 Derivation

In general (13) has a solution \(\hat{{\textbf{u}}} \in V\) which can be approximated using continuous piecewise linear finite elements [13, Chapter 10.2]. Since all such discrete functions are elements of \(H^1(\varOmega )^m\), we derive for both settings, i.e. Setting \((S.\textrm{i})\) and Setting \((S.\textrm{ii})\), the semi-smooth Newton system using the spaces \(V=H^1(\varOmega )^m\) and \(W^*=L^2(\varOmega ) \times L^2(\varOmega )^{d \times m}\) for the primal and predual variable respectively. This simplification is sufficient for our discrete setting in any case, and sufficient for the continuous setting as long as \(V = H^1(\varOmega )^m\).

Let us denote for convenience

$$\begin{aligned} m_1&:= m_1({\textbf{u}}) := \max \{\gamma _1, |T{\textbf{u}} - g|\},&\chi _1&:=\chi _1({\textbf{u}}):= {\left\{ \begin{array}{ll} 1 &{} \text {if }|T{\textbf{u}}-g| \ge \gamma _1\\ 0 &{} \text {else} \end{array}\right. }, \\ m_2&:= m_2({\textbf{u}}) := \max \{\gamma _2, |\nabla {\textbf{u}}|_F\},&\chi _2&:=\chi _2({\textbf{u}}):= {\left\{ \begin{array}{ll} 1 &{} \text {if }|\nabla {\textbf{u}}|_F \ge \gamma _2 \\ 0 &{} \text {else} \end{array}\right. }. \end{aligned}$$

Writing the system of optimality condition (15) as \((0,0,0)=F({\textbf{u}},p_1,{\textbf{p}}_2)\), the resulting Newton system \((0,0,0) = DF({\textbf{u}},p_1,{\textbf{p}}_2)(\textbf{d}_{{\textbf{u}}},d_{p_1},\textbf{d}_{{\textbf{p}}_2})\) reads as follows:

$$\begin{aligned} \alpha _2 T^* T \textbf{d}_{{\textbf{u}}} + \beta S^*S \textbf{d}_{{\textbf{u}}} + T^* d_{p_1} + \nabla ^*\textbf{d}_{{\textbf{p}}_2} = -\Big (\nabla ^*({\textbf{p}}_2) + T^* p_1 + \alpha _2 T^*(T {\textbf{u}} - g) + \beta S^* S {\textbf{u}}\Big ), \end{aligned}$$
(18)
$$\begin{aligned} \chi _1 \frac{(T {\textbf{u}} - g) \cdot T \textbf{d}_{{\textbf{u}}}}{|T {\textbf{u}} - g|} p_1 - \alpha _1 T \textbf{d}_{{\textbf{u}}} + m_1 d_{p_1} = -\Big (m_1 p_1 - \alpha _1(T {\textbf{u}} - g)\Big ), \end{aligned}$$
(19)
$$\begin{aligned} \chi _2 \frac{\nabla {\textbf{u}} \cdot \nabla \textbf{d}_{\textbf{u}}}{|\nabla {\textbf{u}}|_F} {\textbf{p}}_2 - \lambda \nabla \textbf{d}_{{\textbf{u}}} + m_2 \textbf{d}_{{\textbf{p}}_2} = -\Big (m_2 {\textbf{p}}_2 - \lambda \nabla {\textbf{u}}\Big ), \end{aligned}$$
(20)

where \({\textbf{u}}\in H^1(\varOmega )^m\), \(p_1 \in L^2(\varOmega )\), \({\textbf{p}}_2 \in L^2(\varOmega )^{d \times m}\) represent the variables from the previous Newton step and \((\textbf{d}_{{\textbf{u}}},d_{p_1},\textbf{d}_{{\textbf{p}}_2}) \in H^1(\varOmega )^m \times L^2(\varOmega ) \times L^2(\varOmega )^{d \times m}\) is the solution of the Newton system.

Rearranging (19) and (20) for \(d_{p_1}\) and \(\textbf{d}_{{\textbf{p}}_2}\) yields

$$\begin{aligned} d_{p_1}&= -p_1 + \frac{\alpha _1}{m_1}( T({\textbf{u}} + \textbf{d}_{\textbf{u}}) - g) - \chi _1 \frac{(T{\textbf{u}} -g)\cdot T \textbf{d}_{\textbf{u}}}{|T {\textbf{u}} - g|^2} p_1, \end{aligned}$$
(21)
$$\begin{aligned} \textbf{d}_{{\textbf{p}}_2}&= -{\textbf{p}}_2 + \frac{\lambda }{m_2} \nabla ({\textbf{u}} + \textbf{d}_{\textbf{u}}) - \chi _2 \frac{\nabla {\textbf{u}} \cdot \nabla \textbf{d}_{\textbf{u}}}{|\nabla {\textbf{u}}|_{{F}}^2} {\textbf{p}}_2. \end{aligned}$$
(22)

Plugging these two equations into (18) leads to

$$\begin{aligned} 0&= T^* \Big ( \frac{\alpha _1}{m_1} \big (T({\textbf{u}} + \textbf{d}_{\textbf{u}}) - g\big ) - \chi _1 \frac{(T {\textbf{u}} - g) \cdot T \textbf{d}_{\textbf{u}}}{|T {\textbf{u}} - g|^2} p_1 \Big ) \\&\quad + \nabla ^* \Big (\frac{\lambda }{m_2} \nabla ({\textbf{u}} + \textbf{d}_{\textbf{u}}) - \chi _2 \frac{\nabla {\textbf{u}} \cdot \nabla \textbf{d}_{\textbf{u}}}{|\nabla {\textbf{u}}|_{{F}}^2} {\textbf{p}}_2 \Big ) \\&\quad + \alpha _2 T^* \big ( T({\textbf{u}} + \textbf{d}_{\textbf{u}}) -g \big ) + \beta S^*S ({\textbf{u}} + \textbf{d}_{\textbf{u}}), \end{aligned}$$

which is to be understood in a weak sense.
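Pointwise, the dual updates (21) and (22) are plain algebraic expressions in the previous iterates. The following sketch evaluates them at a single point (Python for illustration; `r` and `dr` stand for the scalars \((T{\textbf{u}} - g)(\textbf{x})\) and \((T\textbf{d}_{\textbf{u}})(\textbf{x})\), and the helper names are hypothetical):

```python
import numpy as np

def update_p1(p1, r, dr, alpha1, gamma1):
    """Update (21) at a single point: r = (T u - g)(x), dr = (T d_u)(x)."""
    m1 = max(gamma1, abs(r))
    d = -p1 + alpha1 * (r + dr) / m1
    if abs(r) >= gamma1:               # chi_1 = 1, and then m1 = |r|
        d -= (r * dr) / r**2 * p1
    return p1 + d

def update_p2(p2, G, dG, lam, gamma2):
    """Update (22) at a single point: G = (grad u)(x), dG = (grad d_u)(x)."""
    nG = np.linalg.norm(G)             # Frobenius norm of the d x m matrix
    m2 = max(gamma2, nG)
    d = -p2 + lam * (G + dG) / m2
    if nG >= gamma2:                   # chi_2 = 1, and then m2 = |grad u|_F
        d -= np.sum(G * dG) / nG**2 * p2
    return p2 + d
```

Note that for \(\textbf{d}_{\textbf{u}} = 0\) both updates return exactly the pointwise optimality values \(\alpha _1(T{\textbf{u}}-g)/m_1\) and \(\lambda \nabla {\textbf{u}}/m_2\) from (15).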

Recall \(a_B: H^1(\varOmega )^m \times H^1(\varOmega )^m \rightarrow {\mathbb {R}}\) from (5) and define \(a_1, a_2: H^1(\varOmega )^m \times H^1(\varOmega )^m \rightarrow {\mathbb {R}}\) and \(l: H^1(\varOmega )^m \rightarrow {\mathbb {R}}\) as follows:

$$\begin{aligned} a_1({\textbf{v}},\textbf{w})&:= {\Big \langle }\tfrac{\alpha _1}{m_1} T {\textbf{v}} - \chi _1 \tfrac{(T{\textbf{u}} - g)\cdot T {\textbf{v}}}{|T{\textbf{u}} - g|^2}\, p_1, T\textbf{w}{\Big \rangle }_{L^2}, \\ a_2({\textbf{v}},\textbf{w})&:= {\Big \langle }\tfrac{\lambda }{m_2} \nabla {\textbf{v}} - \chi _2 \tfrac{\nabla {\textbf{u}} \cdot \nabla {\textbf{v}}}{|\nabla {\textbf{u}}|_F^2}\, {\textbf{p}}_2, \nabla \textbf{w}{\Big \rangle }_{L^2}, \\ l({{\varphi }})&:= \alpha _2 {\langle }g, T {{\varphi }}{\rangle }_{L^2} - {\langle }B {\textbf{u}}, {{\varphi }}{\rangle }_{L^2} - {\big \langle }\tfrac{\alpha _1}{m_1}(T {\textbf{u}} - g), T {{\varphi }}{\big \rangle }_{L^2} - {\big \langle }\tfrac{\lambda }{m_2} \nabla {\textbf{u}}, \nabla {{\varphi }}{\big \rangle }_{L^2}. \end{aligned}$$

We then have the following result.

Theorem 4.1

Let \(V \subseteq H^1(\varOmega )^m\) be a subspace such that there exists \(c_S > 0\) with \(\Vert \nabla {\textbf{u}}\Vert _{L^2}\le c_S \Vert S {\textbf{u}}\Vert _{L^2}\) for all \({\textbf{u}} \in V\). If \(p_1 \in L^2(\varOmega )\), \({\textbf{p}}_2 \in L^2(\varOmega )^{d \times m}\) such that \(|p_1| \le \alpha _1\), \(|{\textbf{p}}_2|_F \le \lambda \) holds a.e. in \(\varOmega \), then the problem

$$\begin{aligned} a(\textbf{d}_{\textbf{u}}, {{\varphi }}):= a_1(\textbf{d}_{\textbf{u}}, {{\varphi }}) + a_2(\textbf{d}_{\textbf{u}}, {{\varphi }}) + a_B(\textbf{d}_{\textbf{u}}, {{\varphi }}) = l({{\varphi }}) \qquad \forall {{\varphi }} \in V \end{aligned}$$
(23)

admits a unique solution \(\textbf{d}_{\textbf{u}} \in V\).

Proof

In V we verify the prerequisites for the Lax-Milgram Lemma, see e.g. [23, Theorem 1.1.3], i.e. boundedness of a and l, as well as coercivity of a with regard to \(\Vert {\,\cdot \,}\Vert _{H^1(\varOmega )^m}\).

We first verify the boundedness of l:

$$\begin{aligned} |l({{\varphi }})|&\le \Vert B\Vert \Vert {\textbf{u}}\Vert _{L^2} \Vert {{\varphi }}\Vert _{L^2} + \lambda |\varOmega |^{1/2} \Vert \nabla {{\varphi }}\Vert _{L^2} + \alpha _1 |\varOmega |^{1/2} \Vert T {{\varphi }}\Vert _{L^2} + \alpha _2 \Vert g\Vert _{L^2} \Vert T {{\varphi }}\Vert _{L^2} \\&\le c \Vert {{\varphi }}\Vert _{H^1(\varOmega )^m} \end{aligned}$$

for some constant \(c > 0\), since T and \(\varOmega \) are bounded and \({\textbf{u}} \in L^2(\varOmega )^m\), \(g \in L^2(\varOmega )\).

Boundedness of \(a_1, a_2\) follows from

$$\begin{aligned} |a_1({\textbf{v}},\textbf{w})|&\le \Big (\Vert \tfrac{\alpha _1}{m_1} T {\textbf{v}}\Vert _{L^2} + \Vert \tfrac{\chi _1}{m_1^2}(T{\textbf{u}} - g)(T {\textbf{v}}) p_1\Vert _{L^2}\Big ) \Vert T \textbf{w}\Vert _{L^2} \\&\le \tfrac{2\alpha _1}{\gamma _1} \Vert T\Vert ^2 \Vert {\textbf{v}}\Vert _{L^2} \Vert \textbf{w}\Vert _{L^2}, \\ |a_2({\textbf{v}},\textbf{w})|&\le \Big (\Vert \tfrac{\lambda }{m_2} \nabla {\textbf{v}}\Vert _{L^2} + \Vert \tfrac{\chi _2}{m_2^2}(\nabla {\textbf{u}} \cdot \nabla {\textbf{v}}) {\textbf{p}}_2\Vert _{L^2}\Big )\Vert \nabla \textbf{w}\Vert _{L^2}\\&\le \tfrac{2\lambda }{\gamma _2} \Vert \nabla {\textbf{v}}\Vert _{L^2} \Vert \nabla \textbf{w}\Vert _{L^2}. \end{aligned}$$

Since \(a_B\) is bounded due to T and S being bounded, this implies that \(|a({\textbf{v}},\textbf{w})| \le c \Vert {\textbf{v}}\Vert _{H^1(\varOmega )^m}\Vert \textbf{w}\Vert _{H^1(\varOmega )^m}\) for some constant \(c > 0\).

For Setting \((S.\textrm{ii})\), i.e. \(S = \nabla \), coercivity of \(a_B\) with regard to \(\Vert {\,\cdot \,}\Vert _{H^1(\varOmega )^m}\) follows directly from Assumption (A1), while this is not the case for Setting \((S.\textrm{i})\), i.e. \(S = I\). In Setting \((S.\textrm{i})\) Assumption (A1) gives only coercivity of \(a_B\) with regard to \(\Vert {\,\cdot \,}\Vert _{L^2}\). By the additional prerequisite \(\Vert \nabla {\textbf{u}}\Vert _{L^2} \le c_S\Vert {\textbf{u}}\Vert _{L^2}\) for all \({\textbf{u}}\in V\) coercivity of \(a_B\) with respect to \(\Vert {\,\cdot \,}\Vert _{H^1(\varOmega )^m}\) follows also in this setting.

Having ensured the coercivity of \(a_B\), it is now sufficient to show that \(a_1\) and \(a_2\) are positive semi-definite. Using the vectorization operator \({\text {vec}}: {\mathbb {R}}^{d\times m} \rightarrow {\mathbb {R}}^{dm}: X \mapsto (X_{((k - 1) \,{\text {mod}}\, d)+1,\lfloor \frac{k-1}{d}\rfloor + 1})_{k=1}^{dm}\) applied in a pointwise sense for convenience we see that

$$\begin{aligned} a_1({\textbf{v}},\textbf{w})&= {\langle }A_1\, T {\textbf{v}}, T \textbf{w}{\rangle }_{L^2},&A_1&:= \tfrac{\alpha _1}{m_1} - \chi _1 \tfrac{(T{\textbf{u}} - g)\, p_1}{m_1^2}, \\ a_2({\textbf{v}},\textbf{w})&= {\langle }A_2\, {\text {vec}}(\nabla {\textbf{v}}), {\text {vec}}(\nabla \textbf{w}){\rangle }_{L^2},&A_2&:= \tfrac{\lambda }{m_2} I_{dm\times dm} - \tfrac{\chi _2}{m_2^2} {\text {vec}}({\textbf{p}}_2) {\text {vec}}(\nabla {\textbf{u}})^T, \end{aligned}$$

where \(I_{dm\times dm} \in {\mathbb {R}}^{dm\times dm}\) denotes the unit matrix. It thus remains to show pointwise positive semi-definiteness for \(A_1: \varOmega \rightarrow {\mathbb {R}}\) and \(A_2: \varOmega \rightarrow {\mathbb {R}}^{dm\times dm}\). We see this by evaluating for \(\textbf{x} \in {\mathbb {R}}^{dm}\):

$$\begin{aligned} A_1&\ge \tfrac{\alpha _1}{m_1} - \chi _1 \tfrac{|p_1|}{m_1} \tfrac{|T{\textbf{u}}-g|}{m_1} \ge \tfrac{\alpha _1}{m_1} - \chi _1 \tfrac{\alpha _1}{m_1} \ge 0, \\ \textbf{x}^T A_2 \textbf{x}&\ge \tfrac{\lambda }{m_2} |\textbf{x}|^2 - \chi _2 \tfrac{|{\text {vec}}({\textbf{p}}_2)|}{m_2} \tfrac{|{\text {vec}}(\nabla {\textbf{u}})|}{m_2} |\textbf{x}|^2 \ge \big (\tfrac{\lambda }{m_2} - \chi _2 \tfrac{\lambda }{m_2}\big )|\textbf{x}|^2 \ge 0. \end{aligned}$$

This establishes coercivity of the sum \(a = a_1 + a_2 + a_B\), and applying the Lax-Milgram Lemma yields the required result. \(\square \)
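The pointwise estimate for \(A_2\) can also be spot-checked numerically. Writing \(A_2 = \tfrac{\lambda }{m_2} I - \tfrac{\chi _2}{m_2^2}{\text {vec}}({\textbf{p}}_2){\text {vec}}(\nabla {\textbf{u}})^T\) (a sketch under this assumed matrix form, as read off from \(a_2\)), \(\textbf{x}^T A_2 \textbf{x} \ge 0\) should hold whenever \(|{\textbf{p}}_2|_F \le \lambda \):

```python
import numpy as np

# Spot-check x^T A_2 x >= 0 for random admissible data (|p2|_F <= lam).
rng = np.random.default_rng(1)
lam, gamma2, d, m = 1.0, 1e-3, 2, 2

for _ in range(1000):
    G = rng.normal(size=d * m)                   # vec(grad u) at one point
    p2 = rng.normal(size=d * m)
    p2 *= min(1.0, lam / np.linalg.norm(p2))     # enforce |p2|_F <= lam
    m2 = max(gamma2, np.linalg.norm(G))
    chi2 = 1.0 if np.linalg.norm(G) >= gamma2 else 0.0
    A2 = lam / m2 * np.eye(d * m) - chi2 / m2**2 * np.outer(p2, G)
    x = rng.normal(size=d * m)
    assert x @ A2 @ x >= -1e-12                  # positive semi-definite
```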

Theorem 4.1 proves the solvability of the semi-smooth Newton step and thus ensures that the following semi-smooth Newton algorithm is well-defined in a general Hilbert space setting.

[Algorithm 1: primal-dual semi-smooth Newton method]

If not otherwise specified, we use for Algorithm 1 the Cauchy stopping criterion

$$\begin{aligned} \tfrac{1}{|\varOmega |}\Big (\Vert {\textbf{u}}^{n} - {\textbf{u}}^{n-1}\Vert _{L^2}^2 + \Vert p_{1}^{n} - p_{1}^{n-1}\Vert _{L^2}^2 + \Vert {\textbf{p}}_{2}^{n} - \textbf{p}_{2}^{n-1}\Vert _{L^2}^2\Big ) < \varepsilon _{\text {newton}}, \end{aligned}$$
(24)

for some specified constant \(\varepsilon _{\text {newton}}>0\) and \((p_1^n,{\textbf{p}}_2^n):={\textbf{p}}^n\).
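In a discrete setting where the \(L^2\) norms in (24) are approximated by Euclidean norms of coefficient vectors (an illustrative simplification; the exact norms depend on the mass matrix), the criterion can be sketched as:

```python
import numpy as np

def cauchy_stop(u, u_prev, p1, p1_prev, p2, p2_prev, vol, eps_newton):
    """Cauchy stopping criterion (24); coefficient-vector norms stand in
    for the L2 norms, vol approximates |Omega|."""
    s = (np.sum((u - u_prev) ** 2)
         + np.sum((p1 - p1_prev) ** 2)
         + np.sum((p2 - p2_prev) ** 2)) / vol
    return bool(s < eps_newton)
```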

We would like to mention that it is not clear to us how the above derivation of the semi-smooth Newton method could be extended to the case \(V=L^2(\varOmega )^m\). For example in the weak formulation we require function spaces which allow for derivatives, see the bilinear form \(a_2\) above, and the expression \(|\nabla \cdot |\), see e.g. (22), needs to be well defined.

5 Discretization: Finite Elements

In order to implement the proposed semi-smooth Newton method, see Algorithm 1, we consider a finite dimensional subspace V. More precisely, we use a finite element subspace. We construct the polynomial basis functions over a mesh consisting of triangles. As stated in the introduction, we intend to apply the method to images and hence we briefly discuss the alignment of meshes with respect to image pixels.

For two-dimensional computer images given by an array \(A \in [0, 1]^{n_1 \times n_2}\) define the domain \(\varOmega := [1, n_1] \times [1, n_2]\). If not otherwise noted, \(\varOmega \) is triangulated using simplices with nodes at integer coordinates \((x_1, x_2) \in {\mathbb {Z}}^2\), \(1 \le x_1 \le n_1\), \(1 \le x_2 \le n_2\) corresponding to pixel centers as depicted in Fig. 1.
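The image-aligned grid can be built by placing a vertex at each pixel centre and splitting every unit pixel square into two triangles. A minimal construction (Python sketch with 0-based vertex indices; the paper's implementation [33] is in Julia):

```python
def image_grid(n1, n2):
    """Vertices at the integer pixel centres of an n1 x n2 image and two
    triangles per pixel square (0-based indices; cf. Fig. 1)."""
    verts = [(x1, x2) for x2 in range(1, n2 + 1) for x1 in range(1, n1 + 1)]
    idx = lambda x1, x2: (x2 - 1) * n1 + (x1 - 1)
    cells = []
    for x2 in range(1, n2):
        for x1 in range(1, n1):
            a, b = idx(x1, x2), idx(x1 + 1, x2)
            c, d = idx(x1, x2 + 1), idx(x1 + 1, x2 + 1)
            cells.append((a, b, c))    # lower-left triangle of the square
            cells.append((b, d, c))    # upper-right triangle
    return verts, cells
```

For an \(n_1 \times n_2\) image this yields \(n_1 n_2\) vertices and \(2(n_1-1)(n_2-1)\) cells.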

Fig. 1

Image aligned simplicial grid construction

Let \({\mathcal {T}}\) denote the set of cells and \(\varGamma \) the set of oriented facets (i.e. edges for \(d = 2\)) of the simplicial triangulation. For any cell \(K \in {\mathcal {T}}\) let \(P_k(K)\) be the space of polynomial functions on K with total degree \(k \in {\mathbb {N}}\). We choose finite dimensional subspaces \(V_h \subseteq H^1(\varOmega )^m \subseteq V\), \(W_h^* \subseteq L^2(\varOmega ) \times L^2(\varOmega )^{d \times m} \subseteq W^*\), \(Z_h \subseteq L^2(\varOmega )\) as follows:

$$\begin{aligned} \begin{aligned} V_h&:= \{{\textbf{u}} \in C(\varOmega )^m: {\textbf{u}}|_K \in P_1(K)^m, K \in {\mathcal {T}}\}, \\ W_h^*&:= \{(p_1, {\textbf{p}}_2) \in C(\varOmega ) \times L^2(\varOmega )^{d \times m}: p_1|_K \in P_1(K), {\textbf{p}}_2|_{K} \in P_0(K)^{d \times m},\\&\quad \qquad K \in {\mathcal {T}} \}, \\ Z_h&:= \{g \in C(\varOmega ): g|_K \in P_1(K), K \in {\mathcal {T}} \}, \end{aligned} \end{aligned}$$
(25)

i.e. piecewise linear continuous elements for \({\textbf{u}}\), g, \(p_1\) and piecewise constant discontinuous elements for \({\textbf{p}}_2\).

Returning to problem (12) we note that dualization and discretization do not in general commute. First restricting V to a subspace \(V_h\) and then constructing the dual problem may lead to a different result than vice versa. Indeed, the simple pointwise representations deduced for the dual problem in Proposition 3.2 do not necessarily hold true for subspaces of V. For that reason a modified primal discrete energy is introduced in [32], which allows for a manageable dual representation with direct constraints on the degrees of freedom. Here, we explore a suitable discretization of the continuous optimality conditions (15) instead. Namely, in the discrete finite element setting we will search for solutions \(p = (p_1, {\textbf{p}}_2) \in W_h^*\), \(u \in V_h\) which satisfy

$$\begin{aligned} \begin{aligned} 0&= \varLambda ^* {\textbf{p}} - \alpha _2 T^* g + B{\textbf{u}}, \\ 0&= p_1 \max \{\gamma _1,|T {\textbf{u}} -g|\} - \alpha _1 (T {\textbf{u}} -g),&|p_1|&\le \alpha _1, \\ 0&= {\textbf{p}}_2 \max \{\gamma _2,|\nabla {\textbf{u}}|_{F}\} - \lambda \nabla {\textbf{u}},&|{\textbf{p}}_2|_F&\le \lambda , \end{aligned} \end{aligned}$$
(26)

where the last two equations are enforced on vertices only. This is due to the fact that on a single cell \(T{\textbf{u}} - g\) is linear, while the expression \(|T{\textbf{u}} - g|\) in general is not. To solve this discrete system of equations, we utilize Algorithm 1 with \(V=V_h\) and \(W^*=W_h^*\). We note that since the optimality conditions (26) for \({\textbf{p}}\) are enforced on vertices only, the updates (21) and (22) are carried out in the same way. In the assembly of the system \(a(\textbf{d}_{\textbf{u}}, {{\varphi }}) = l({{\varphi }})\) from Theorem 4.1 for terms involving \({\textbf{p}}\) we use a quadrature formula which only requires evaluations on vertices. Further, we remark that the proposed semi-smooth Newton method in a finite element setting is well-defined due to Theorem 4.1. In particular we have the following statement.

Corollary 5.1

Assume \(|p_1| \le \alpha _1\), \(|{\textbf{p}}_2|_F \le \lambda \) holds for \(p = (p_1, {\textbf{p}}_2) \in W_h^*\). Then the discrete problem of finding \(\textbf{d}_{\textbf{u}} \in V_h\) such that \(a(\textbf{d}_{\textbf{u}}, {{\varphi }}) = l({{\varphi }})\) for all \({{\varphi }} \in V_h\) admits a unique solution.

Proof

If \(S = \nabla \), the statement follows immediately from Theorem 4.1 using \(c_S = 1\). Let \(S = I\), then the finite element inverse inequality (see e.g. [23, Theorem 3.2.6] or [1, Theorem 1.3]) yields

$$\begin{aligned} \Vert \nabla {\textbf{u}}\Vert _{L^2} \le c h^{-1} \Vert {\textbf{u}}\Vert _{L^2} = c h^{-1}\Vert S {\textbf{u}} \Vert _{L^2}, \end{aligned}$$

where h is the smallest cell diameter and c is a constant independent of h. Since in our case \(h=\sqrt{2}\), Theorem 4.1 with \(c_S = c h^{-1}\) again yields the required result. \(\square \)

We start out with an \(L^2\)-norm estimate of the gradient operator in our finite element setting, which will be used for choosing the stepsize in Algorithm 2.

Lemma 5.1

Let \(d = 2\). For every cell \(K \in \mathcal T\) with diameter \(\rho _K\) and every \({\textbf{u}} \in V_h\) we have the upper bound

$$\begin{aligned} \Vert \nabla {\textbf{u}}\Vert _{L^2(K)} \le \tfrac{6\sqrt{2}}{\rho _K} \Vert {\textbf{u}}\Vert _{L^2(K)}. \end{aligned}$$

Proof

Let \(F: {\hat{K}} \rightarrow K\), \(\hat{\textbf{x}} \mapsto A\hat{\textbf{x}} + b\) be the affine transformation bijectively mapping the reference cell \({\hat{K}}\) to K and set \(\hat{{\textbf{u}}}:= {\textbf{u}} \circ F\) to be \({\textbf{u}}\) transformed onto \({\hat{K}}\). As in the proof of [51, Proposition 3.38], since K contains a ball with diameter \(\rho _K\) and \({\hat{K}}\) is contained in a ball with diameter \(h_{{\hat{K}}}\), we have

$$\begin{aligned} \frac{\Vert \nabla {\textbf{u}}\Vert _{L^2(K)}}{\Vert {\textbf{u}}\Vert _{L^2(K)}} = \frac{\Vert A^{-t} \nabla \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}}{\Vert \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}} \le \frac{h_{{\hat{K}}}}{\rho _K} \frac{\Vert \nabla \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}}{\Vert \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}} = \frac{\sqrt{2}}{\rho _K} \frac{\Vert \nabla \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}}{\Vert \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}} \end{aligned}$$

and it remains to bound \(\tfrac{\Vert \nabla \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}}{\Vert \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}}\). Representing \(\hat{{\textbf{u}}}\) in local coordinates, \(\hat{{\textbf{u}}}(x, y) = a x + by + c(1-x-y)\), \(a, b, c \in {\mathbb {R}}\), we explicitly calculate using a computer algebra system

$$\begin{aligned} \Vert \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}^2&= \tfrac{1}{12}\big (a^2 + b^2 + c^2 + ab + ac + bc\big ), \\ \Vert \nabla \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}^2&= \tfrac{1}{2}\big (a^2 + b^2 + 2c^2 - 2ac - 2bc\big ). \end{aligned}$$

Using \(0 \le 3(a + b + c)^2 = 3(a^2 + b^2 + c^2 + 2ab + 2ac + 2bc)\) and \(0 \le (\sqrt{2} x + \tfrac{c}{\sqrt{2}})^2 = 2 x^2 + \tfrac{c^2}{2} + 2xc\), \(x \in \{a, b\}\) we bound

$$\begin{aligned} a^2 + b^2 + 2c^2 - 2ac - 2bc \le 4a^2 + 4b^2 + 5c^2 + 6ab + 4ac + 4bc \\ \le 6a^2 + 6b^2 + 6c^2 + 6ab + 6ac + 6bc \end{aligned}$$

and infer \(\Vert \nabla \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}^2 \le 6 \cdot \tfrac{12}{2} \Vert \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}^2 = 36 \Vert \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}^2\). Combining this with the transformation above, we get

$$\begin{aligned} \frac{\Vert \nabla {\textbf{u}}\Vert _{L^2(K)}}{\Vert {\textbf{u}}\Vert _{L^2(K)}}&\le \frac{\sqrt{2}}{\rho _K} \frac{\Vert \nabla \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}}{\Vert \hat{{\textbf{u}}}\Vert _{L^2({\hat{K}})}} \le \frac{6\sqrt{2}}{\rho _K}. \end{aligned}$$

\(\square \)
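As a sanity check (not part of the proof), the two reference-cell norms admit closed forms in the coefficients \(a, b, c\), obtained by elementary integration over the unit triangle, and the resulting inequality \(\Vert \nabla \hat{{\textbf{u}}}\Vert ^2 \le 36 \Vert \hat{{\textbf{u}}}\Vert ^2\) can be tested for random coefficients:

```python
import random

def ref_norms_sq(a, b, c):
    """Squared L2 norms of u = a*x + b*y + c*(1-x-y) and of its gradient
    on the unit reference triangle, by elementary integration."""
    u2 = (a * a + b * b + c * c + a * b + a * c + b * c) / 12.0
    g2 = ((a - c) ** 2 + (b - c) ** 2) / 2.0
    return u2, g2

random.seed(0)
for _ in range(1000):
    a, b, c = (random.uniform(-1.0, 1.0) for _ in range(3))
    u2, g2 = ref_norms_sq(a, b, c)
    assert g2 <= 36.0 * u2 + 1e-12   # the bound of Lemma 5.1 on K-hat
```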

6 Numerical Experiments

In the following we present numerical experiments to show that our model together with Algorithm 1 can indeed be applied in practice to solve image processing tasks such as denoising, inpainting and the calculation of optical flow fields. The algorithms are implemented in the programming language Julia [14]; the code can be found at [33].

We begin by comparing our proposed method with the well-known semi-implicit primal-dual algorithm of Chambolle and Pock [21] with respect to convergence speed.

6.1 Convergence Rate

We apply the accelerated method [21, Algorithm 2] to our problem (12) and in the discrete case to (26) which yields the following algorithm. A similar method was used in [3] for a special case of our model.

[Algorithm 2: semi-implicit primal-dual method, cf. [21]]

The non-accelerated variant of Algorithm 2 is obtained by using constant \(\tau _n:= \tau _0\), \(\sigma _n:= \sigma _0\) (and thus constant \(\theta _n\)) for all \(n \in {\mathbb {N}}\). In accordance with (26), for the discrete setting \(V = V_h\) the projections \(p_1 \mapsto \tfrac{p_1}{\max \{1, |p_1|/\alpha _1\}}\) and \({\textbf{p}}_2 \mapsto \tfrac{{\textbf{p}}_2}{\max \{1, |{\textbf{p}}_2|_F/\lambda \}}\) onto the constraint sets \(|p_1| \le \alpha _1\) and \(|{\textbf{p}}_2|_F \le \lambda \) are carried out in a nodal sense.

Note that with the exception of \({\textbf{u}}^{n+1}\) in Algorithm 2 all steps can be performed locally as a simple update, whereas for \({\textbf{u}}^{n+1}\) in general the solution of the variational equality

$$\begin{aligned} \begin{aligned}&{\langle }{\textbf{u}}^{n+1}, {\textbf{v}}{\rangle }_{L^2} + \tau _n\big ( \alpha _2{\langle }T {\textbf{u}}^{n+1}, T{\textbf{v}}{\rangle }_{L^2} + \beta {\langle }S {\textbf{u}}^{n+1}, S{\textbf{v}}{\rangle }_{L^2}\big ) \\&\qquad = {\langle }{\textbf{u}}^n, {\textbf{v}}{\rangle }_{L^2} - \tau _n\big ( {\langle }p_1^n - \alpha _2 g, T{\textbf{v}}{\rangle }_{L^2} + {\langle }{\textbf{p}}_2^n, \nabla {\textbf{v}}{\rangle }_{L^2}\big ) \end{aligned} \end{aligned}$$
(27)

for all \({\textbf{v}} \in V\) is required.
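For intuition, the non-accelerated iteration can be sketched on a regular pixel grid with finite differences instead of finite elements, in the special case \(T = I\), \(\alpha _1 = \beta = 0\), \(\gamma _2 = 0\), where the \({\textbf{u}}\)-update (27) reduces to a pointwise division. This is a simplified Python sketch, not the paper's finite element implementation:

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary (zero last row/column)."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence matching grad above (its negative adjoint)."""
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[0, :] = px[0, :]
    dx[1:, :] = px[1:, :] - px[:-1, :]
    dy[:, 0] = py[:, 0]
    dy[:, 1:] = py[:, 1:] - py[:, :-1]
    return dx + dy

def l2_tv_denoise(g, lam, alpha2, tau, sigma, n_iter=200):
    """Non-accelerated primal-dual iteration for T = I, alpha1 = beta = 0,
    gamma2 = 0; the u-update collapses to a pointwise division."""
    u, ubar = g.copy(), g.copy()
    px, py = np.zeros_like(g), np.zeros_like(g)
    for _ in range(n_iter):
        gx, gy = grad(ubar)
        px, py = px + sigma * gx, py + sigma * gy
        scale = np.maximum(1.0, np.sqrt(px**2 + py**2) / lam)
        px, py = px / scale, py / scale        # projection onto |p2| <= lam
        u_old = u
        u = (u + tau * (div(px, py) + alpha2 * g)) / (1.0 + tau * alpha2)
        ubar = 2.0 * u - u_old                 # extrapolation, theta = 1
    return u
```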

To numerically observe the asymptotic convergence properties of Algorithm 1, a small image g as depicted in Fig. 2 has been chosen along with the denoising setting \(T = I\), \(S = I\), \(\alpha _1 = 0\), \(\alpha _2 = 30\), \(\lambda = 1 \), \(\beta = 0 \) and \(\gamma _1 = 1\times 10^{-2}\), \(\gamma _2 = 1\times 10^{-3} \). We use \(L = \Vert \nabla \Vert _{L^2} = \max _{K \in {\mathcal {T}}} \tfrac{6\sqrt{2}}{\rho _K} \approx 8.36\) according to Lemma 5.1, \(\tau _0 = \tfrac{1}{L}\), cf. [21], and \(\mu = \alpha _2 + \beta \) for Algorithm 2. Since here \(m=1\), V is a scalar-valued function space and we write \(u\in V\) instead of \({\textbf{u}}\in V\) (as before). We stop iterating when either the criterion (24) with \(\varepsilon _{\text {newton}} = 1\times 10^{-10} \) or \(n \ge 10{,}000 \) holds true, whichever comes first. The energy \({\hat{E}}:= 112.47 \) was obtained as the minimal energy over all iterations and algorithms and is attained by Algorithm 1.

From the step lengths and energies in Fig. 3 one can see the sublinear convergence of the semi-implicit method and its accelerated variant. The semi-smooth Newton method displays superlinear convergence, reaches the desired tolerance after only a few iterations and attains the minimal energy \({\hat{E}}\) in its last steps, which are excluded from the logarithmic plot.

Fig. 2

From left to right: \(64\times 64\) pixel input image g and respective denoised outputs for the semi-implicit, semi-implicit accelerated and semi-smooth Newton algorithms

Fig. 3

Comparison of steps and energy. The minimal energy \({\hat{E}}\) is reached by Algorithm 1 in the last step, which is why the corresponding data point has been excluded from the logarithmic plot.

6.2 Denoising

From the original image \({\tilde{g}}\) in Fig. 4 we generate an artificially noisy input \(g:= \varphi ({\tilde{g}} + \eta )\), where \(\eta \) denotes zero mean additive Gaussian noise with variance 0.1 and \(\varphi (x) \in \{0,1,x\}\) with probability \(\tfrac{p}{2}, \tfrac{p}{2}, 1 - p\) respectively and \(p = 2\times 10^{-2}\).
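The degradation model \(g = \varphi ({\tilde{g}} + \eta )\) can be sketched as follows (Python for illustration; `degrade` is a hypothetical helper name):

```python
import numpy as np

def degrade(img, p=2e-2, var=0.1, seed=0):
    """g = phi(img + eta): additive Gaussian noise (mean 0, variance var)
    followed by salt-and-pepper outliers with probability p/2 each."""
    rng = np.random.default_rng(seed)
    g = img + rng.normal(0.0, np.sqrt(var), img.shape)
    r = rng.random(img.shape)
    g = np.where(r < p / 2, 0.0, g)                 # phi(x) = 0
    g = np.where((r >= p / 2) & (r < p), 1.0, g)    # phi(x) = 1
    return g                                        # else phi(x) = x
```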

We denoise (i.e. remove the noise from) g using Algorithm 1 by setting \(T = I\), \(S = I\) and using manually chosen parameters \(\alpha _1 = 0.2\), \(\alpha _2 = 8\), \(\lambda = 1\), \(\beta = 0\), \(\gamma _1 =1\times 10^{-4} \), \(\gamma _2 = 1\times 10^{-4}\), \(\varepsilon _{\text {newton}} = 1\times 10^{-5}\) to obtain visually pleasing results. The result visible in Fig. 4 matches the expected behaviour of total variation denoising, i.e. coherent noisy regions are flattened out, while sharp edges are preserved.

Fig. 4

From left to right: original image \({\tilde{g}}\), noisy input image g, denoised output image u

6.3 Combined Inpainting and Denoising

We want to show how the regularization due to S and \(\beta \) affects the output. To that end we choose the example of combined inpainting and denoising, i.e. \(m=1\). Inpainting is the task of restoring a given image \(g\in L^2(\varOmega \setminus D)\) inside the defective (inpainting) region \(D\subseteq \varOmega \). The operator \(T:=\textrm{Id}_{\varOmega \setminus D}\) then denotes the masking operator defined by

$$\begin{aligned} (\textrm{Id}_{\varOmega \setminus D} u)(\textbf{x}) := {\left\{ \begin{array}{ll} u(\textbf{x}) &{} \textbf{x} \in \varOmega \setminus D, \\ 0 &{} \textbf{x} \in D, \end{array}\right. } \end{aligned}$$
(28)

The input image in Fig. 5 is generated by first applying the inpainting mask in Fig. 5, which defines the inpainting region D, to obtain a masked image \({\tilde{g}}\), and then adding noise to arrive at \(g:= \varphi ({\tilde{g}} + \eta )\), where \(\eta \) denotes zero mean additive Gaussian noise with variance 0.1 and \(\varphi (x) \in \{0,1,x\}\) with probability \(\tfrac{p}{2}, \tfrac{p}{2}, 1 - p\) respectively and \(p = 2\times 10^{-2}\).

Note that for the application of image inpainting special care has to be taken in our finite element setting. This is because image interpolation may leak corrupt data from within the inpainting area if the inpainting mask is not extended to cover this area. In particular, global interpolation methods, such as \(L^2\)-projection in the case of cellwise linear continuous elements, should be avoided and for other interpolation methods, the inpainting mask needs to be extended to cover the area of influence. Since our mesh is image-aligned, i.e. cell vertices correspond to pixel centres, we use nodal interpolation for g and implement the operator T for the discrete setting \(V = V_h\) in a cell-wise sense to be 0 whenever any of a cell's vertices is masked. An illustration of which cells this affects can be seen in Fig. 6.
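With the triangulation described above, the cell-wise extension of the mask is a one-liner (Python sketch; `cellwise_mask` is a hypothetical helper operating on vertex-index tuples):

```python
def cellwise_mask(cells, vertex_masked):
    """A cell belongs to the extended inpainting mask as soon as any of
    its vertices lies in D (cf. Fig. 6); T acts as zero on such cells."""
    return [any(vertex_masked[v] for v in cell) for cell in cells]
```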

Fig. 5

Top row: inpainting mask and masked noisy input image g, center row: output for \(S = I\) and \(\beta = 3\times 10^{-2}\), \(\beta = 0.3\) and \(\beta = 3\) respectively, bottom row: output for \(S = \nabla \) and \(\beta = 5\), \(\beta = 50\) and \(\beta = 500\) respectively

Fig. 6

Simplicial grid on \(3\times 3\) image with image inpainting mask covering the bottom center pixel and corresponding cell-wise inpainting mask (horizontally striped area)

We execute Algorithm 1 on the original image resolution grid and choose parameters by visual preference as follows: \(\alpha _1 = 0.2\), \(\alpha _2 = 8\), \(\lambda = 1\), \(\gamma _1 = 1\times 10^{-4}\), \(\gamma _2 = 1\times 10^{-4}\). We use \(\varepsilon _{\text {newton}} = 1\times 10^{-4}\) in Algorithm 1.

In Fig. 5 we see the results for \(S = I\) and \(S = \nabla \) respectively. In each case three choices of \(\beta \) were made: first with \(\beta \) sufficiently large for the regularization tradeoff to be noticeable, and then with \(\beta \) decreased and increased by a factor of 10, respectively. We see that for small \(\beta \) the outputs for \(S = I\) and \(S = \nabla \) are visually almost indistinguishable, while for larger \(\beta \) undesirable features are introduced. Namely for \(S = \nabla \) the output image becomes blurry, while for \(S = I\) a general darkening takes place which is dominant in the inpainting area, where no data term is guiding the output.

6.4 Optical Flow

The problem of optical flow is to compute the apparent motion field of an image sequence. One approach, given two grey-scale images \(f_0, f_1: \varOmega \rightarrow [0, 1]\), is to estimate a displacement field \({\textbf{u}}: \varOmega \rightarrow {\mathbb {R}}^m\), \(m = d\), which maps points of similar brightness, i.e. for all \(\textbf{x} \in \varOmega \)

$$\begin{aligned} f_0(\textbf{x}) = f_1(\textbf{x} + {\textbf{u}}(\textbf{x})). \end{aligned}$$
(29)

Here displacements leaving the domain, \(\textbf{x} + {\textbf{u}}(\textbf{x}) \not \in \varOmega \), are ignored. Equation (29) is called the brightness constancy assumption. It is usually underdetermined, since \({\textbf{u}}\) is vector-valued while (29) is scalar, and depending on \(f_0, f_1\) there might not even exist a solution, e.g. due to occlusion or brightness change. Nevertheless, (29) may still be applied in a minimization setting as a data term, e.g. using the \(L^2\) residual, together with suitable regularization to arrive at an approximate motion field \({\textbf{u}}\) [8].

Assuming smooth \(f_0, f_1\) and expanding the right hand side of (29) at \(\textbf{x} + {\textbf{u}}_0(\textbf{x})\) for some smooth initial guess \({{\textbf{u}}_0}: \varOmega \rightarrow {\mathbb {R}}^2\) one arrives at

$$\begin{aligned} \begin{aligned} f_0(\textbf{x})&= f_1(\textbf{x} + {\textbf{u}}_0(\textbf{x}) + ({\textbf{u}} - {\textbf{u}}_0)(\textbf{x})) \\&\approx f_1(\textbf{x} + {\textbf{u}}_0(\textbf{x})) + \nabla f_1(\textbf{x} + {\textbf{u}}_0(\textbf{x})) \cdot ({\textbf{u}} - {\textbf{u}}_0)(\textbf{x}) \\&\approx f_w(\textbf{x}) + \nabla f_w(\textbf{x}) \cdot ({\textbf{u}} - {\textbf{u}}_0)(\textbf{x}) \\&= f_w(\textbf{x}) + \nabla f_w(\textbf{x}) \cdot {\textbf{u}}(\textbf{x}) - \nabla f_w(\textbf{x}) \cdot {\textbf{u}}_0(\textbf{x}), \end{aligned} \end{aligned}$$
(30)

where \(f_w\), defined as \(f_w(\textbf{x}):= f_1(\textbf{x} + {\textbf{u}}_{0}(\textbf{x}))\), is a (backwards-)warped version of \(f_1\). Note that in the derivation sketched above we generally have

$$\begin{aligned} \nabla f_w(\textbf{x}) = (I + {\textbf{u}}_0'(\textbf{x})^T) \nabla f_1(\textbf{x} + {\textbf{u}}_0(\textbf{x})) \ne \nabla f_1(\textbf{x} + {\textbf{u}}_0(\textbf{x})). \end{aligned}$$

We call (30) the optical flow equation linearized at the initial guess \({\textbf{u}}_0\). Note that for any solution \({\textbf{u}}\) to (30), \({\textbf{u}} + {\textbf{v}}\) with \(\nabla f_w \cdot {\textbf{v}} = 0\) is a solution as well, i.e. the linearized optical flow equation provides flow information only in the image gradient direction, a phenomenon also known as aperture problem.

We use our model (1) to estimate a solution to (30) by setting

$$\begin{aligned} T {\textbf{u}} := \nabla f_{w} \cdot {\textbf{u}}, \qquad g := \nabla f_{w} \cdot {\textbf{u}}_{0} - (f_{w} - f_0). \end{aligned}$$
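In a discrete pixel setting this choice of T and g can be sketched as follows (a hypothetical NumPy toy example; the arrays f0, fw, u0 are random stand-ins for the data in the text). The last lines also illustrate the aperture problem: perturbing \({\textbf{u}}\) pointwise orthogonally to \(\nabla f_w\) leaves \(T{\textbf{u}}\) unchanged.

```python
import numpy as np

# Hypothetical toy data on an 8x8 pixel grid.
rng = np.random.default_rng(0)
f0 = rng.standard_normal((8, 8))
fw = rng.standard_normal((8, 8))
u0 = np.zeros((2, 8, 8))              # initial guess, here u0 = 0

grad_fw = np.stack(np.gradient(fw))   # discrete grad(f_w), shape (2, H, W)

def T(u):
    # T u := grad(f_w) . u, a pointwise inner product over the channels
    return np.sum(grad_fw * u, axis=0)

g = T(u0) - (fw - f0)

# Aperture problem: adding any v with grad(f_w) . v = 0 leaves T u unchanged.
u = rng.standard_normal((2, 8, 8))
v = np.stack([-grad_fw[1], grad_fw[0]])  # pointwise 90-degree rotation
assert np.allclose(T(u + v), T(u))
```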

The parameters \(\alpha _1\), \(\alpha _2\), \(\lambda \) in (1) allow one to tune the optical flow model. Notable special cases in the discrete setting include the L1-TV optical flow model of [52] and the comparison of L1-TV and L2-TV in [24].

While the linearized optical flow equation (30) has localized the global condition (29), it comes at the cost of misrepresenting large displacements. One may alleviate this problem by repositioning the linearization point as in Algorithm 3.

[Algorithm 3]

The images \(f_0\), \(f_1\), \(f_{w,k}\) in Algorithm 3 used for the model (12) generally make use of the discrete space \(Z_h\), whereas for g we instead use a cellwise linear discontinuous space to capture the discontinuous component \(\nabla f_{w,k-1}\). The warping step \(f_{w,k}(\textbf{x}) = f_1(\textbf{x} + {\textbf{u}}_k(\textbf{x}))\) itself, however, is carried out by evaluating the original image \(f_1\) using bicubic interpolation.
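A minimal sketch of the warping step, assuming SciPy is available: `map_coordinates` with `order=3` performs cubic B-spline interpolation, used here as a stand-in for the bicubic interpolation mentioned above, and out-of-domain coordinates are clamped to the boundary as one way to ignore exceeding displacements.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(f1, u):
    """Backwards-warp f1 by the flow u: f_w(x) = f1(x + u(x)).

    Sketch only: order=3 gives cubic B-spline interpolation (a stand-in
    for bicubic); mode="nearest" clamps coordinates leaving the domain
    to the boundary.
    """
    H, W = f1.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([ys + u[0], xs + u[1]])
    return map_coordinates(f1, coords, order=3, mode="nearest")
```

For a constant integer flow the warp reduces to a plain shift of the image, which makes the sketch easy to sanity-check.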

To approximately solve for \({{\textbf{u}}_k}\) in Algorithm 3, we make in (12) the choice

$$\begin{aligned} T {\textbf{u}} := \nabla f_{w,k-1} \cdot {\textbf{u}}, \qquad g := \nabla f_{w,k-1} \cdot {\textbf{u}}_{k-1} -(f_{w,k-1} - f_0). \end{aligned}$$

We use in Algorithm 3 the stopping criterion

$$\begin{aligned} \tfrac{ \Vert f_{w,k-1} - f_0\Vert _{L^2} - \Vert f_{w,k} - f_0\Vert _{L^2}}{\Vert f_{w,k-1} - f_0\Vert _{L^2}} < \varepsilon _{\text {warp}} \end{aligned}$$

for some specified constant \(\varepsilon _{\text {warp}}\), which ensures that warping continues only as long as the remaining image difference \(\Vert f_{w,k} - f_0\Vert _{L^2}\) is reduced sufficiently.
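The stopping rule can be sketched as a small predicate (the value \(5\times 10^{-2}\) for \(\varepsilon _{\text {warp}}\) is the one used in our experiments):

```python
def warp_converged(res_prev, res_cur, eps_warp=5e-2):
    """Stopping criterion for the warping iteration.

    res_prev = ||f_{w,k-1} - f0||_{L^2}, res_cur = ||f_{w,k} - f0||_{L^2}.
    Returns True (stop warping) once the relative decrease of the
    residual falls below eps_warp.
    """
    return (res_prev - res_cur) / res_prev < eps_warp

assert warp_converged(1.0, 0.99)      # only 1% decrease: stop
assert not warp_converged(1.0, 0.5)   # 50% decrease: keep warping
```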

Note that the warping technique in Algorithm 3 may be combined with a coarse-to-fine scheme, where \({{\textbf{u}}_k}\) is solved on increasingly finer scales, resolving large displacements on an early coarse scale and filling in detail later. In upcoming work [4] we plan to use adaptive finite elements to establish such a coarse-to-fine scheme.
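For illustration, a generic image-pyramid variant of such a coarse-to-fine scheme can be sketched as follows. This is a standard construction, not the adaptive finite element scheme planned in [4], and `solve` is a hypothetical placeholder for one run of Algorithm 3 on a fixed grid.

```python
import numpy as np

def coarse_to_fine(f0, f1, solve, levels=3):
    """Generic coarse-to-fine sketch: solve on the coarsest grid first,
    then prolongate the flow and refine on finer grids.

    `solve(f0, f1, u_init)` is a hypothetical placeholder for one run
    of the warping scheme on a fixed grid.
    """
    # Build a pyramid by factor-2 subsampling (plain slicing for brevity).
    pyr = [(f0, f1)]
    for _ in range(levels - 1):
        f0, f1 = f0[::2, ::2], f1[::2, ::2]
        pyr.append((f0, f1))
    u = np.zeros((2,) + pyr[-1][0].shape)
    for f0_l, f1_l in reversed(pyr):
        if u.shape[1:] != f0_l.shape:
            # Prolongate to the finer grid; displacements in pixel
            # units double when the resolution doubles.
            u = 2.0 * u.repeat(2, axis=1).repeat(2, axis=2)
            u = u[:, :f0_l.shape[0], :f0_l.shape[1]]
        u = solve(f0_l, f1_l, u)
    return u
```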

Fig. 7

Middlebury Dimetrodon Optical Flow Benchmark. Top row: \(f_0\), \(f_1\), image difference \(f_1 - f_0\). Bottom left: computed optical flow \({\textbf{u}}\) using Algorithm 3 stopped after one iteration. Bottom center: computed optical flow \({\textbf{u}}\) using Algorithm 3. Bottom right: ground truth optical flow

In our experiments we use the manually chosen model parameters \(S = \nabla \), \(\alpha _1 = 10\), \(\alpha _2 = 0\), \(\lambda = 1\) to obtain visually pleasing results (cf. the superiority of L1-TV observed in [24]), the parameters \(\beta = 1\times 10^{-5}\), \(\gamma _1 = 1\times 10^{-4}\), \(\gamma _2 = 1\times 10^{-4}\) to balance speed and quality of the reconstruction, and the initial guess \({{\textbf{u}}_0}:= 0\). For Algorithm 1 we chose \(\varepsilon _{\text {newton}} = 1\times 10^{-3}\), and in Algorithm 3 we use \(\varepsilon _{\text {warp}} = 5\times 10^{-2}\).

In Fig. 7 we evaluate Algorithm 3 visually against the Middlebury optical flow benchmark [9]. We also consider Algorithm 3 stopped after just one iteration, i.e. the classical linearized optical flow equation.

The color-coded images representing optical flow fields are normalized by the maximum motion of the ground truth flow data; black areas of the ground truth represent unknown flow information, e.g. due to occlusion. The computed optical flow closely resembles the ground truth, and the effect of total variation regularization, i.e. sharp edges separating homogeneous regions, is clearly visible. It is unclear how much visual improvement a more careful or adaptive parameter selection might yield; further study remains to be done.