1 Introduction

More than 25 years have passed since Leonid Rudin, Stanley Osher and Emad Fatemi proposed their classical model for edge-preserving denoising of images [37]. Suppose an image corrupted by zero-mean Gaussian noise is modelled as a function \(f:{\varOmega }\rightarrow {\mathbf {R}}\), where \({\varOmega }\) is a bounded, piecewise smooth domain in the plane. The original Rudin–Osher–Fatemi (ROF) model then proposes to recover the clean image as the function \(u_0:{\varOmega }\rightarrow {\mathbf {R}}\) which solves the following constrained minimization problem:

$$\begin{aligned}&\min \int _{{\varOmega }} |\nabla u(x)|\,\mathrm{d}x\quad \text {subject to }\\&\int _{{\varOmega }} u\,\mathrm{d}x = \int _{{\varOmega }}f\,\mathrm{d}x\quad \text {and}\quad \int _{{\varOmega }} (u-f)^2\,\mathrm{d}x=\sigma ^2. \end{aligned}$$

Here, the expression \(\int _{{\varOmega }} |\nabla u(x)|\,\mathrm{d}x\) represents the total variation of the function u. (The set of functions for which the total variation is finite is denoted \(\mathrm{BV}({\varOmega })\).) The linear constraint models that the noise has zero mean, and the quadratic constraint, that it has variance \(\sigma ^2\). In practice, one studies the Lagrange formulation of this problem and minimizes the functional

$$\begin{aligned} \lambda \int _{{\varOmega }}|\nabla u|\,\mathrm{d}x + \frac{1}{2}\int _{{\varOmega }}(u-f)^2\,\mathrm{d}x \end{aligned}$$

over functions \(u\in \mathrm{BV}({\varOmega })\). The linear constraint is then automatically satisfied (see Sect. 4), and it was shown early on, by Chambolle and Lions [19], that there exists a positive value for the Lagrange multiplier \(\lambda \) such that the quadratic constraint is satisfied as well. (It is convenient to place the Lagrange multiplier in front of the objective function instead of in front of the constraint.)

The present paper will carry out a thorough analysis of the one-dimensional ROF model: we take \({\varOmega }\) to be a bounded interval \(I=(a,b)\) and let \(f\in L^2(I)\) denote a given (noisy) signal. To this signal, we associate the (ROF) functional

$$\begin{aligned} E_\lambda (u) = \lambda \int _a^b | u'(x)|\,\mathrm{d}x + \frac{1}{2}\int _a^b(f(x)-u(x))^2\,\mathrm{d}x\;, \end{aligned}$$

where \(\lambda >0\) is a parameter, and define the denoised signal as the function \(u_\lambda \in \mathrm{BV}(I)\) which minimizes this energy, i.e.

$$\begin{aligned} u_\lambda := \mathop {{{{\mathrm{arg}}~{\mathrm{min}}}}}\limits _{u\in \mathrm{BV}(I)} E_\lambda (u)\;. \end{aligned}$$
(1)

Precise definitions of the total variation and the space \(\mathrm{BV}(I)\) will be given in Sects. 2 and 3.

We are going to compare the one-dimensional ROF model to the taut string algorithm, which is an alternative method for denoising of signals with applications in statistics, nonparametric estimation, real-time communication systems and stochastic analysis. In the continuous setting, for analogue signals, the taut string algorithm can be stated in the following manner (cf. Fig. 1):

The taut string algorithm:

Step 1. Compute the cumulative signal \(F(x)=\int _a^x f(t)\,\mathrm{d}t\), \(x\in {\bar{I}}\).

Step 2. Form the tube \(T_\lambda \) consisting of all functions \(W\in H^1(I)\) whose graphs pass through the points (a, F(a)) and (b, F(b)) and which satisfy \(\Vert W-F\Vert _\infty \le \lambda \).

Step 3. Find the taut string \(W_\lambda \), i.e. the function in \(T_\lambda \) which minimizes the curve length

$$\begin{aligned} L(W)=\int _a^b \sqrt{1+W'(x)^2}\,\mathrm{d}x\;. \end{aligned}$$
(2)

Step 4. Output the denoised signal \(f_\lambda := W_\lambda '\).

The taut string algorithm has been extensively studied in the discrete setting by Mammen and van de Geer as well as by Davies and Kovac [20, 29] and Dümbgen and Kovac [21]. Recently, using the highly developed methods of real interpolation theory (Peetre’s K-functional and the notion of invariant K-minimal sets, etc.), Niyobuhungiro [31] has investigated the ROF model in the discrete case, and Setterqvist [40] has probed the limits to which taut string methods may be extended.

The taut string algorithm instructs us to minimize the curve length functional L(W) in (2) among all functions W whose graphs are curves through the points (a, F(a)) and (b, F(b)) and which lie within the tube \(T_\lambda \). The name of the algorithm derives from this shortest path problem. It turns out that one may just as well minimize the energy associated with an elastic rubber band satisfying the same boundary conditions and the same constraints:

$$\begin{aligned} \min _{W\in T_\lambda } E(W):=\frac{1}{2}\int _a^b W'(x)^2\,\mathrm{d}x\;. \end{aligned}$$
(3)

That the new problem (3) has the same solution (\(W_\lambda \)) as (2) is the content of the following interesting lemma:

Lemma 1

Let \(H:{\mathbf {R}}\rightarrow {\mathbf {R}}\) be a convex \(C^1\)-function and set

$$\begin{aligned} L_H(W)=\int _I H(W'(x))\,\mathrm{d}x. \end{aligned}$$

If \(W_*:={{{{\mathrm{arg}}~{\mathrm{min}}}}}_{W\in T_\lambda }E(W)\), then \(W_*\) is also a solution to the minimization problem \(\min _{W\in T_\lambda } L_H(W)\). Moreover, if H is strictly convex then \(W_*\) is the unique minimizer in \(T_\lambda \) of \(L_H\).

If we take \(H(s)=(1+s^2)^{1/2}\), it follows that (2) and (3) have precisely the same minimizer in \(T_\lambda \), namely \(W_\lambda \). While this statement seems intuitively clear from our everyday experience with rubber bands and strings, the mathematical assertion is not equally self-evident. A proof is consequently offered in “Appendix A”.
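For readers who wish to experiment, the following is a minimal numerical sketch of the taut string algorithm on a uniform grid; it is not the author's implementation, and the grid, the solver and the name `taut_string` are choices made here purely for illustration. By Lemma 1, the sketch minimizes the elastic energy (3) instead of the curve length (2), so that the tube becomes a set of simple box constraints.

```python
import numpy as np
from scipy.optimize import minimize

def taut_string(f, lam, a=0.0, b=1.0):
    """Sketch of the taut string algorithm for samples f of a signal on (a, b).

    Returns the denoised samples f_lam = W_lam' (forward differences of the
    taut string W_lam).
    """
    n = len(f)
    h = (b - a) / n
    # Step 1: the cumulative signal F, with F(a) = 0.
    F = np.concatenate(([0.0], np.cumsum(f) * h))
    # Step 2: the tube T_lam, with the endpoints pinned at F(a) and F(b).
    lo, hi = F - lam, F + lam
    lo[0] = hi[0] = F[0]
    lo[-1] = hi[-1] = F[-1]
    # Step 3: minimize the elastic energy E(W) = (1/2) int W'(x)^2 dx over the
    # tube; by Lemma 1, this yields the same minimizer as the curve length L.
    def energy(W):
        dW = np.diff(W) / h
        return 0.5 * np.sum(dW ** 2) * h
    def grad(W):
        dW = np.diff(W) / h
        g = np.zeros_like(W)
        g[:-1] -= dW
        g[1:] += dW
        return g
    res = minimize(energy, x0=F, jac=grad, method="L-BFGS-B",
                   bounds=list(zip(lo, hi)))
    # Step 4: the denoised signal is the derivative of the taut string.
    return np.diff(res.x) / h
```

A general-purpose bound-constrained solver is used here only for transparency; the discrete-setting literature cited above contains dedicated, much faster constructions of the taut string.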

The paper, which is a considerably modified and enlarged version of the author’s preprint [32] and subsequent conference paper [33], has two main purposes. The first is to present a new, elementary proof of the following remarkable result:

Theorem 1

The taut string algorithm and the ROF model yield the same solution: \(f_\lambda = u_\lambda \).

The result itself is not new: a discrete version of this theorem was proved in [20, 29]. In the continuous setting, the equivalence result was explicitly stated and proved by Grasmair [24]. There is also an extensive treatment in [39, Ch. 4]. Indeed, a few years earlier, Hintermüller and Kunisch [26, p. 7] had, in a brief (but inconclusive) remark, referred to the close relation between the one-dimensional ROF model and the taut string algorithm. The new proof is given in Sect. 5 after the basic existence theory for the ROF model has been developed.

Fig. 1 Graphical illustrations of the steps in the taut string algorithm applied to a piecewise constant signal

The second main purpose of the paper is to prove the following “fundamental” estimate on the denoised signal (see Sect. 8):

Theorem 2

If the signal f belongs to \(\mathrm{BV}(I)\), then for any \(\lambda > 0\), the denoised signal \(u_\lambda \) satisfies the inequality

$$\begin{aligned} -(f')^- \le u_\lambda ' \le (f')^+ \, , \end{aligned}$$
(4)

where \((f')^+\) and \((f')^-\) denote the positive and the negative variations, respectively, of \(f'\) (distributional derivative).

Here, \(f'\), as well as the derivative \(u_\lambda '\), is computed in the distributional sense and is, in general, a signed measure. Recall that \((f')^+\) and \((f')^-\) are finite positive measures satisfying \(f'=(f')^+ - (f')^-\), see, e.g. [38, Sect. 6.6]. As an example, the reader may compute the derivatives of f and \(u_\lambda =f_\lambda \) shown in Fig. 1d. Theorem 2 immediately implies the “edge-preserving” property of the one-dimensional ROF model: the solution \(u_\lambda \) can only have a “jump” at points where the data f has a jump, a qualitative result which holds even in the multidimensional case, as proved by Caselles et al. [16]. In the one-dimensional case, we obtain, in addition, a quantification of these jumps: they have the same sign as, and are dominated by, the jumps in the data. This assertion is embodied in the estimate (4), which does not carry over to higher dimensions. The proof of Theorem 2 is based on (an extension to bilateral obstacle problems of) the classic Lewy–Stampacchia inequality [28] and uses the taut string interpretation (Theorem 1) in an essential way. The proof of Theorem 2 and its implications are given in Sect. 8.

Our Theorem 2 turns out to be a special case of an estimate proved by Briani et al. [14, Lemmas 3, 4] and is related to a result (Lemmas 2, 3) in Bonforte and Figalli [11, p. 4459]. Both papers study the gradient flows associated with certain one-homogeneous functionals. In [14], these are functionals of the form \(\int _{{\varOmega }} | {\text {div}}\,{\varvec{w}}|\,\mathrm{d}x\) for vector fields \({\varvec{w}}\) defined on \({\varOmega }\), and the paper was not directly concerned with the ROF model. The relevance of these papers was first pointed out to the author after the publication of [33]. Moreover, our method of proof differs from those in [11, 14], and we use Theorem 2 to derive a number of fundamental properties of the one-dimensional ROF model.

Perhaps the most significant consequence of Theorem 2 is that for any in-signal f belonging to \(\mathrm{BV}(I)\), we get \(u_\lambda \rightarrow f\) strongly in \(\mathrm{BV}(I)\) as \(\lambda \rightarrow 0+\); in particular, we show that \(\int _I | f' - u_\lambda '|\,\mathrm{d}x \rightarrow 0\) as \(\lambda \rightarrow 0+\), see Proposition 8. The usual Moreau–Yosida approximation result, see, for example, [2, Ch. 17], only contains the weaker assertions that \(u_\lambda \rightarrow f\) in \(L^2(I)\) and \(\int _I |u_\lambda '|\,dx\rightarrow \int _I|f'|\,dx\) as \(\lambda \) tends to zero.

The literature on the ROF model is extensive, and many of the results about the one-dimensional case are scattered throughout research articles and monographs as examples illustrating the more general multidimensional theory that is their real focus. This sometimes makes these results hard to find and, once found, the examples may be hard to follow because they rest on the general theoretical framework developed up to that point in the text. Newcomers to the field, as well as more application-oriented researchers, would probably welcome an introduction which, on the one hand, gives an overview of the theory of the one-dimensional ROF model and, on the other, introduces some of the ideas used in the analysis of the general case.

The present paper may be seen as such an overview. Here, the theory is developed from scratch, and known properties of the ROF model are collected in one place and given efficient proofs within a unified framework. The style is expository, and the text is intended to be accessible to anyone who wants to learn about total variation techniques: a little measure theory, basic functional analysis and Sobolev spaces in one dimension are the only prerequisites needed to follow the text. In fact, once the total variation of a function has been properly defined, it turns out that the theory of the ROF model in one dimension hinges on little more than the projection theorem (onto closed convex sets) and completion of squares. As we shall see, this elementary setting allows us to introduce and highlight, in a concrete manner, some of the interesting phenomena which occur in the analysis of more general convex variational problems.

Some of the known results, apart from Theorems 1 and 2, for which new proofs have been supplied are: (i) Proposition 4, where some basic properties of the ROF model are re-derived; (ii) Propositions 5 and 6, where some precise results on the rate of convergence \(u_\lambda \rightarrow f\), and of the value function \(E_\lambda (u_\lambda )\), as \(\lambda \) tends to zero are collected in one place; (iii) a new and slick proof of the fact that the solutions \(u_\lambda \) form a semi-group with respect to \(\lambda \) (Proposition 10), together with a derivation of the infinitesimal generator of this semi-group; and (iv) an indication of how our method can be modified to prove the “lower convex envelope” interpretation of the solution to the isotonic regression problem, Sect. 11.

Some entirely new results have also emerged: Proposition 8, which asserts that \(u_\lambda \rightarrow f\) in \(\mathrm{BV}(I)\) as \(\lambda \rightarrow 0+\) whenever \(f\in \mathrm{BV}(I)\), is new. The author also believes that the statement in part 2 of Proposition 9 is new, and likewise its consequence in Corollary 2. The explicit solution to the ROF model given in Example 5 also seems to appear here for the first time. And, at least in the context of the ROF model, the improved convergence rate proved in Proposition 6 seems to have gone unnoticed until now. Moreover, Proposition 3 gives an improvement of a known “gap” estimate found in [44]. Finally, our treatment of the fused lasso model in the continuous setting in Sect. 10 seems to be the first of its kind.

2 Our Analysis Toolbox

Throughout this paper, I denotes an open, bounded interval \((a,b)\), where \(a<b\) are real numbers, and \({\bar{I}}=[a,b]\) is the corresponding closed interval.

\(C_0^1(I)\) denotes the space of continuously differentiable (test) functions \(\xi :I\rightarrow {\mathbf {R}}\) with compact support in I, and \(C({\bar{I}})\) is the space of continuous functions on the closure of I.

For \(1\le p \le \infty \), \(L^p(I)\) denotes the Lebesgue space of measurable functions \(f:I\rightarrow {\mathbf {R}}\) with finite p-norm; \(\Vert f\Vert _p:= \big (\int _a^b |f(x)|^p\,\mathrm{d}x\big )^{1/p} < \infty \), when p is finite, and \(\Vert f\Vert _\infty = {\text {ess sup}}_{x\in I}|f(x)| <\infty \) when \(p=\infty \). The space \(L^2(I)\) is a Hilbert space with the inner product \( \langle f,g\rangle =\langle f,g\rangle _{L^2(I)} := \int _a^b f(x)g(x)\,\mathrm{d}x\) and the corresponding norm \(\Vert f\Vert :=(\langle f,f\rangle _{L^2(I)})^{1/2}=\Vert f\Vert _2\).

We are going to need the Sobolev spaces over \(L^2\):

$$\begin{aligned} H^1(I)=\big \{ u\in L^2(I)\,:\, u'\in L^2(I) \big \}\;, \end{aligned}$$

where \(u'\) denotes the distributional derivative of u. This is a Hilbert space when equipped with the inner product \(\langle u,v\rangle _{H^1}:= \langle u,v\rangle + \langle u',v'\rangle \) and the corresponding norm \(\Vert u\Vert _{H^1} =( \Vert u'\Vert _2^2 + \Vert u\Vert _2^2 )^{1/2}\). Any \(u\in H^1(I)\) can, after correction on a set of measure zero, be identified with a unique function in \(C({\bar{I}})\). In particular, a unique value u(x) can be assigned to u for every \(x\in {\bar{I}}\).

The following subspace of \(H^1(I)\) plays an important role in our analysis:

$$\begin{aligned} H_0^1(I) = \big \{ u\in H^1(I)\,:\, u(a)=0\text { and } u(b)=0\, \big \}\;. \end{aligned}$$

Here, \(\langle u,v\rangle _{H_0^1(I)} := \int _a^b u'(x)v'(x)\,\mathrm{d}x\) defines an inner product on \(H_0^1(I)\) whose induced norm

$$\begin{aligned} \Vert u\Vert _{H_0^1(I)}=\Vert u'\Vert _2 \end{aligned}$$

is equivalent to the norm inherited from \(H^1(I)\) (by the Poincaré inequality).

Finally, let H be a (general) real Hilbert space with inner product between \(u,v\in H\) denoted by \(\langle u,v \rangle \) and the corresponding norm \(\Vert u\Vert =\sqrt{\langle u,u \rangle }\). The following result is standard [13, Théorème V.2]:

Proposition 1

(Projection Theorem) Let \(K\subset H\) be a non-empty closed convex set. Then, for every \(z\in H\) there exists a unique point \(x_*\in K\) such that

$$\begin{aligned} \Vert z-x_* \Vert = \min _{x\in K}\Vert z- x \Vert . \end{aligned}$$

Moreover, the minimizer \(x_*\) is characterized by the following property:

$$\begin{aligned} x_*\in K\quad \text {and}\quad \langle z-x_*,x-x_*\rangle \le 0,\text { for all } x\in K. \end{aligned}$$
(5)

The point \(x_*\) is called the projection of z onto K and is denoted \(x_*={Pr}_K(z)\).

Recall that the projection onto a set K is a non-expansive mapping:

$$\begin{aligned} \Vert {Pr}_K(y)-{Pr}_K(x)\Vert \le \Vert y -x\Vert \end{aligned}$$
(6)

for all \(x, y\in H\). We also need the nonlinear mapping \({Sr}_K:H\rightarrow H\) defined as the residual after projection onto K;

$$\begin{aligned} {Sr}_K(z) := z - {Pr}_K(z),\quad (z\in H). \end{aligned}$$
(7)

This is called the shrinkage mapping associated with \({Pr}_K\), and it is also non-expansive,

$$\begin{aligned} \Vert {Sr}_K(y)-{Sr}_K(x)\Vert \le \Vert y -x\Vert \end{aligned}$$
(8)

for all \(x, y\in H\).

The two estimates (6) and (8) follow immediately from the inequality,

$$\begin{aligned}&\Vert {Pr}_K(y)- {Pr}_K(x)\Vert ^2 \nonumber \\&\quad + \Vert {Sr}_K(y)-{Sr}_K(x)\Vert ^2 \le \Vert y -x\Vert ^2 \end{aligned}$$
(9)

which is valid for all \(x, y\in H\). To derive this inequality, take \(u={Pr}_K(y)-{Pr}_K(x)\) and \(v={Sr}_K(y)-{Sr}_K(x)\) in the parallelogram identity \(2(\Vert u\Vert ^2 + \Vert v\Vert ^2)=\Vert u-v\Vert ^2 + \Vert u+v\Vert ^2\). After the second term on the right-hand side has been expanded and the resulting identity simplified, we obtain:

$$\begin{aligned}&\Vert {Pr}_K(y)-{Pr}_K(x)\Vert ^2+ \Vert {Sr}_K(y)-{Sr}_K(x)\Vert ^2 \\&\quad =\Vert y -x\Vert ^2 -2\langle {Pr}_K(y)-{Pr}_K(x),{Sr}_K(y)-{Sr}_K(x)\rangle . \end{aligned}$$

Now, (9) follows if it can be proved that

$$\begin{aligned} -\langle {Pr}_K(y)-{Pr}_K(x),{Sr}_K(y)-{Sr}_K(x)\rangle \ge 0. \end{aligned}$$

This follows from the characterization (5) of the projection in Proposition 1: first apply (5) to the projection of y, choosing \({Pr}_K(x)\) as the competing point in K; then apply (5) to the projection of x, choosing \({Pr}_K(y)\). Adding the resulting inequalities gives the inequality above. This proves (9).
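As a quick numerical sanity check of (9), one may test it with K a box in \({\mathbf {R}}^5\), for which the projection is componentwise truncation; the dimension and the number of random trials below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def Pr(z):   # projection onto the box K = [-1, 1]^5 (componentwise truncation)
    return np.clip(z, -1.0, 1.0)

def Sr(z):   # shrinkage: the residual after projection, cf. eq. (7)
    return z - Pr(z)

# Inequality (9); the non-expansiveness estimates (6) and (8) follow from it.
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    lhs = np.sum((Pr(y) - Pr(x)) ** 2) + np.sum((Sr(y) - Sr(x)) ** 2)
    assert lhs <= np.sum((y - x) ** 2) + 1e-12
```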

Example 1

As an illustration, consider the existence and uniqueness of a taut string as defined by (3). Assume, for simplicity, that the cumulative signal F satisfies \(F(a)=F(b)=0\). Then, \(T_\lambda \) is a closed convex subset of \(H_0^1(I)\)—non-empty because \(F\in T_\lambda \). Since \(\Vert u\Vert _{H_0^1(I)} = \Vert u'\Vert _{L^2(I)}\), we immediately see that (3) is equivalent to

$$\begin{aligned} \min _{W\in T_\lambda }\Vert z - W\Vert _{H_0^1(I)}, \end{aligned}$$

with \(z = 0\) (the origin). It follows from the projection theorem that there exists a unique solution \(W_\lambda = {Pr}_{T_\lambda }(0)\) of this minimization problem.

Example 2

An even simpler example is the following. The set \(B:=\{ u\in L^2(I)\,:\, \Vert u\Vert _\infty \le 1\}\) is a closed convex subset of \(L^2(I)\). For \(\lambda \ge 0\), let \(\lambda B := \{ \lambda u \;:\; u\in B\}\). The projection of \(\varphi \in L^2(I)\) onto \(\lambda B\) is given by truncation:

$$\begin{aligned} {Pr}_{\lambda B}(\varphi ) = {Pr}_{[-\lambda ,\lambda ]}\circ \,\varphi \;, \end{aligned}$$
(10)

where \({Pr}_{[-\lambda ,\lambda ]}: {\mathbf {R}}\rightarrow {\mathbf {R}}\) is the function given by

$$\begin{aligned} {Pr}_{[-\lambda ,\lambda ]}(t) = {\left\{ \begin{array}{ll} t &{} \text {if }|t|\le \lambda \;,\\ \lambda {\text {sign}}(t) &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(11)

This is, in fact, the projection in one dimension of \(t\in {\mathbf {R}}\) onto the closed interval \([-\lambda ,\lambda ]\). Since \({Pr}_{[-\lambda ,\lambda ]}\circ \,\varphi \) clearly belongs to \(\lambda B\), we only need to verify the condition \(\langle \varphi - {Pr}_{[-\lambda ,\lambda ]}\circ \,\varphi \,,\,v-{Pr}_{[-\lambda ,\lambda ]}\circ \,\varphi \rangle \le 0\) for all \(v\in \lambda B\). Now, this inequality follows from a simple calculation:

$$\begin{aligned}&\langle \varphi - {Pr}_{[-\lambda ,\lambda ]}\circ \,\varphi \,,\,v-{Pr}_{[-\lambda ,\lambda ]}\circ \,\varphi \rangle \\&\quad = \int _{\{\varphi \ge \lambda \}}(\varphi -\lambda )(v-\lambda )\,\mathrm{d}x + \int _{\{\varphi \le -\lambda \}}(\varphi +\lambda )(v+\lambda )\,\mathrm{d}x\le 0 \end{aligned}$$

which holds because \(v\in \lambda B\) implies \(-\lambda \le v(x)\le \lambda \) for almost all \(x\in I\). The “residual” \(\varphi - {Pr}_{\lambda B}(\varphi )\) of the projection is given by the formula \({Sr}_{[-\lambda ,\lambda ]}\circ \,\varphi \), where \({Sr}_{[-\lambda ,\lambda ]}:{\mathbf {R}}\rightarrow {\mathbf {R}}\) is the so-called soft threshold map or shrinkage map defined by \({Sr}_{[-\lambda ,\lambda ]}(t) = t - {Pr}_{[-\lambda ,\lambda ]}(t)\). We shall meet both functions again in the sequel.
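On sampled functions, the projection (10)–(11) and the associated shrinkage map are one-liners; a minimal sketch (the function names are ours):

```python
import numpy as np

def pr_lam(phi, lam):
    """Projection onto lam*B: pointwise truncation to [-lam, lam], eqs. (10)-(11)."""
    return np.clip(phi, -lam, lam)

def sr_lam(phi, lam):
    """Soft threshold (shrinkage): the residual phi - Pr(phi), cf. eq. (7)."""
    return phi - np.clip(phi, -lam, lam)
```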

3 Precise Definition of the ROF Model

The expression \(\int _I |u'|\,\mathrm{d}x\) for the total variation makes sense for \(u\in H^1(I)\) but is otherwise merely a convenient symbol. A more general and precise definition is needed: one which works when \(u'\) does not exist in the classical sense. The standard way to define the total variation is via duality: For \(u\in L^1(I)\) set,

$$\begin{aligned} J(u) = {\text {sup}}\Big \{ \int _a^bu(x)\xi '(x)\,\mathrm{d}x\, : \xi \in C_0^1(I),\, \Vert \xi \Vert _\infty \le 1\Big \} . \end{aligned}$$

If \(J(u)<\infty \), u is said to be a function of bounded variation on I, and J(u) is called the total variation of u (using the same notation as in [17, 19]). The set of all integrable functions on I of bounded variation is denoted \(\mathrm{BV}(I)\), that is, \(\mathrm{BV}(I) = \big \{u\in L^1(I)\, :\, J(u) <\infty \big \}\). This becomes a Banach space when equipped with the norm \(\Vert u\Vert _{BV}:=J(u) + \Vert u\Vert _{L^1}\). Notice that, as already mentioned, if \(u\in H^1(I)\), then \(J(u)=\int _I |u'|\,\mathrm{d}x < \infty \), so \(u\in \mathrm{BV}(I)\).

Let us illustrate how the definition works for a function with a jump discontinuity:

Example 3

Let \(u(x)={\text {sign}}(x)\) for \(x\in I=(-1,1)\). For any \(\xi \in C_0^1(I)\), satisfying \(|\xi (x)|\le 1\) for all \(x\in I\), we have

$$\begin{aligned}&\int _{-1}^{1} u(x)\xi '(x)\,\mathrm{d}x \\&\quad = \int _0^1\xi '(x)\,\mathrm{d}x -\int _{-1}^0\xi '(x)\,\mathrm{d}x=-2\xi (0) \le 2, \end{aligned}$$

where equality holds for any admissible \(\xi \) which satisfies \(\xi (0)=-1\). So \(J(u)=2\) and \(u\in \mathrm{BV}(I)\), as predicted by intuition. Here, the supremum is attained for many choices of \(\xi \). This is not always the case; if \(u(x)=x\) on \(I=(0,1)\), then \(J(u)=1\), but the supremum is not attained by any admissible test function.
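For sampled signals, the total variation reduces to the familiar sum of absolute increments, which makes Example 3 easy to check numerically (the grid of 1001 points is an arbitrary choice):

```python
import numpy as np

def tv(u):
    """Total variation of a sampled function: the sum of absolute increments."""
    return np.sum(np.abs(np.diff(u)))

x = np.linspace(-1.0, 1.0, 1001)
print(tv(np.sign(x)))                    # -> 2.0, i.e. J(u) = 2 as in Example 3
print(tv(np.linspace(0.0, 1.0, 1001)))   # -> 1.0 for u(x) = x on (0, 1)
```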

The following lemma shows that the definition of the total variation J and the space \(\mathrm{BV}(I)\) can be moved to a Hilbert space setting involving \(L^2\) and \(H_0^1\).

Lemma 2

Every \(u\in \mathrm{BV}(I)\) belongs to \(L^2(I)\) and

$$\begin{aligned} J(u)=\mathop {{\text {sup}}}\limits _{\xi \in K}\, \langle u,\xi '\rangle _{L^2(I)}\;, \end{aligned}$$
(12)

where \(K=\{\,\xi \in H_0^1(I)\,:\, \Vert \xi \Vert _\infty \le 1 \,\}\), which is a closed and convex set in \(H_0^1(I)\).

Proof

If \(u\in \mathrm{BV}(I)\), then Sobolev’s lemma for functions of bounded variation, see [2, p. 152], ensures that \(u\in L^\infty (I)\). This in turn implies \(u\in L^2(I)\) because I is bounded. The (ordinary) Sobolev’s lemma asserts that \(H_0^1(I)\) is continuously embedded in \(L^\infty (I)\). Since K is the inverse image under the embedding map of the unit ball in \(L^\infty (I)\), which is both closed and convex, we draw the conclusion that K is closed and convex in \(H_0^1\).

It only remains to prove (12). Clearly, J(u) cannot exceed the right-hand side because the set \(\{\, \xi \in C_0^1(I)\,:\, \Vert \xi \Vert _\infty \le 1\,\}\) is contained in K. To verify that equality holds, it is enough to prove the inequality

$$\begin{aligned} \langle u,\xi '\rangle _{L^2(I)} \le J(u) \Vert \xi \Vert _{\infty },\quad \text {for all } \xi \in H_0^1(I), \end{aligned}$$
(13)

as it implies that the right-hand side of (12) cannot exceed J(u). To do this, we first notice that the inequality holds for all \(\zeta \in C_0^1(I)\). This follows by applying homogeneity to the definition of J(u). Second, if \(\xi \in H_0^1(I)\) we can use that \(C_0^1(I)\) is dense in \(H_0^1(I)\) and find functions \(\zeta _n\in C_0^1(I)\) such that \(\zeta _n\rightarrow \xi \) in \(H_0^1(I)\) (and in \(L^\infty (I)\) by the continuous embedding). It follows that

$$\begin{aligned} \langle u,\xi '\rangle _{L^2(I)}&= \lim _{n\rightarrow \infty } \langle u,\zeta _n'\rangle _{L^2(I)}\\&\le J(u) \lim _{n\rightarrow \infty } \Vert \zeta _n\Vert _{\infty } = J(u) \Vert \xi \Vert _{\infty }, \end{aligned}$$

which establishes (13) and the proof is complete. \(\square \)

The inequality (13) shows that \(\xi \rightarrow \langle u,\xi '\rangle \) extends to a continuous linear functional on \(C_0(I)\). Riesz’ representation theorem (cf., e.g. [2, Thm. 1.54]) therefore implies that the distributional derivative \(u'\) of \(u\in \mathrm{BV}(I)\) is a signed (Radon) measure \(\mu \) on I, and that we may write

$$\begin{aligned} \langle u, \xi ' \rangle _{L^2(I)} = \int _I \xi \,\mathrm{d}\mu \, . \end{aligned}$$

This will be useful later on.

We can now give the precise definition of the ROF model: For any \(f\in L^2(I)\) and any real number \(\lambda >0\), the ROF functional is the function \(E_\lambda :\mathrm{BV}(I)\rightarrow {\mathbf {R}}\) given by

$$\begin{aligned} E_\lambda (u)=\lambda J(u)+\frac{1}{2}\Vert f- u \Vert _{L^2(I)}^2\;. \end{aligned}$$
(14)

Denoising according to the ROF model is the mapping \(L^2(I) \ni f\mapsto u_\lambda \in \mathrm{BV}(I)\) defined by (1). To emphasize the role of the in-signal f, we sometimes write \(E_\lambda (f;u)\) instead of \(E_\lambda (u)\) and denote the corresponding ROF minimizer \(u_\lambda \) by the more elaborate \(u_\lambda (f)\). Well-posedness of the ROF model is demonstrated in Sect. 5 after some simple observations about the symmetry properties of \(E_\lambda (f;\cdot )\) have been presented in the next section.

4 Simple Symmetries of the ROF Functional

We begin with some simple observations. For any \(u\in \mathrm{BV}(I)\), the total variation functional J satisfies \(J(u+c)=J(u)\) for all \(c\in {\mathbf {R}}\) and \(J(cu) = cJ(u)\) whenever \(c>0\). This implies the following formulas for the ROF functional:

$$\begin{aligned}&E_\lambda (f;u) = E_\lambda (f+c;u+c)\,,\quad \text {for } c\in {\mathbf {R}}, \text { and}\\&c^2E_\lambda (f;u) = E_{c\lambda }(cf;cu)\,,\quad \text {for }c>0. \end{aligned}$$

When \(u_\lambda \) denotes the minimizer of \(E_\lambda (f;\cdot )\), the first of these identities implies that the minimizer of the functional \(E_\lambda (f+c;\cdot )\) is the function \(u_\lambda + c\). Likewise, if \(c>0\), the second identity shows that the function \(cu_{\lambda /c}\) minimizes \(E_\lambda (cf;\cdot )\). These symmetry properties (translation and scaling of the dependent variable) can be expressed as

$$\begin{aligned}&u_\lambda (f+c) = u_\lambda (f)+c\quad \text {and} \quad (c\in {\mathbf {R}}) \end{aligned}$$
(15)
$$\begin{aligned}&u_\lambda (cf) = cu_{\lambda /c}(f), \quad (c> 0.) \end{aligned}$$
(16)

We also have \(E_\lambda (-f;-u)=E_\lambda (f;u)\), so changing the sign of the signal will just change the sign of the minimizer; \(u_\lambda (-f)=-u_\lambda (f)\). For any integrable function f, let \(f_I=|I|^{-1}\int _I f\,\mathrm{d}x\) denote the mean value of f on the interval I. The formula (15) with \(c=-f_I\) becomes

$$\begin{aligned} u_\lambda (f) = u_\lambda (f-f_I) + f_I \end{aligned}$$
(17)

which shows that we may restrict ourselves to consider signals with zero mean, \(f_I=0\). In fact, the following identity, easily verified when \(f_I=0\),

$$\begin{aligned} E_\lambda (u-u_I) = E_\lambda (u) -\frac{1}{2}\Vert u_I \Vert ^2 \end{aligned}$$

implies that the minimizer \(u_\lambda \) of \(E_\lambda \) has zero mean. (For general f, we have \((u_\lambda )_I = f_I\) which is seen by taking mean values in (17).) Consequently, if \(f_I=0\) it is enough to minimize \(E_\lambda \) over functions u with zero mean.
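These symmetry properties are easy to test numerically. The sketch below assumes the `taut_string` function from the sketch in Sect. 1 together with the fact, proved in Sect. 5, that its output coincides with the ROF minimizer \(u_\lambda (f)\); the random test signal and the parameter values are arbitrary.

```python
import numpy as np
# Assumes taut_string from the sketch in Sect. 1; it returns u_lambda(f).
rng = np.random.default_rng(1)
f = rng.normal(size=200)
f -= f.mean()                  # zero-mean test signal
lam, c = 0.05, 3.0

u = taut_string(f, lam)
# (15): translation, u_lambda(f + c) = u_lambda(f) + c
print(np.max(np.abs(taut_string(f + c, lam) - (u + c))))
# (16): scaling, u_lambda(c f) = c * u_{lambda/c}(f)
print(np.max(np.abs(taut_string(c * f, lam) - c * taut_string(f, lam / c))))
# Mean values agree, (u_lambda)_I = f_I, cf. (17)
print(abs(u.mean() - f.mean()))
# All three printed numbers should vanish up to the solver tolerance.
```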

5 Existence Theory for the ROF Model

The following theorem contains the key result for developing the properties of the ROF model. It proves existence and uniqueness of the ROF minimizer \(u_\lambda \) for a general signal f in \(L^2\) and gives a necessary and sufficient characterization of this minimizer in terms of itself and a dual variable (\(\xi _\lambda \) in the theorem). Throughout the analysis, we assume that f has mean value zero in \(I=(a,b)\). This assumption, which is strictly speaking not needed in order for the result to be true, implies that the cumulative signal F(x) satisfies \(F(a)=F(b)=0\), and hence, \(F\in H_0^1(I)\), which will simplify the exposition.

Theorem 3

We have the equality

$$\begin{aligned} \min _{u\in \mathrm{BV}(I)} E_\lambda (u) = \max _{\xi \in K} \frac{1}{2}\Big \{ \Vert f\Vert _{L^2(I)}^2 - \Vert f-\lambda \xi '\Vert _{L^2(I)}^2\Big \},\nonumber \\ \end{aligned}$$
(18)

with the minimum achieved by a unique \(u_\lambda \in \mathrm{BV}(I)\) and the maximum achieved by a unique \(\xi _\lambda \in K\). The two functions are related by the identities

$$\begin{aligned} u_\lambda =f-\lambda \xi _\lambda '\;, \end{aligned}$$
(19a)

and

$$\begin{aligned} J(u_\lambda ) = \langle u_\lambda , \xi _\lambda ' \rangle _{L^2(I)}\;. \end{aligned}$$
(19b)

Moreover, if \(u_\lambda \ne 0\), then \(\Vert \xi _\lambda \Vert _\infty =1\). Conversely, the conditions (19a) and (19b) characterize the solution; if a pair of functions \({\bar{u}}\in \mathrm{BV}(I)\) and \({\bar{\xi }}\in K\) satisfy \({\bar{u}}=f-\lambda {\bar{\xi }}'\) and \(J({\bar{u}}) = \langle {\bar{u}} , {\bar{\xi }}' \rangle _{L^2(I)}\), then \({\bar{u}}=u_\lambda \) and \({\bar{\xi }}=\xi _\lambda \).

This result is an instance of the Fenchel–Rockafellar theorem, see, for example, Brezis [13, p. 11]. It is tailored to our specific needs and will be proved with our bare hands using the projection theorem. (The general version was used by Hintermüller and Kunisch [26] in their analysis of the multidimensional ROF model with the “Manhattan metric”.) In one of the first theoretical analyses of the ROF model, Chambolle and Lions [19] proved the existence of a minimizer (for a more general case) using the standard argument where a minimizing sequence is shown to converge weakly to a function which can then be shown to be the desired solution. The equality (18) has played an important role in the development of numerical algorithms for total variation minimization, either directly, as in Zhu et al. [44], or, more indirectly, as in Chambolle [17].

Before the proof starts, let us remind the reader of the following general fact: If M and N are arbitrary non-empty sets and \({\varPhi }:M\times N\rightarrow {\mathbf {R}}\) is any real-valued function, then the inequality

$$\begin{aligned} \inf _{x\in M}\mathop {{\text {sup}}}\limits _{y\in N}{\varPhi }(x,y) \ge \mathop {{\text {sup}}}\limits _{y\in N}\inf _{x\in M}{\varPhi }(x,y)\;, \end{aligned}$$
(20)

is always true, as is easily checked. The use of \(\inf \)’s and \({\text {sup}}\)’s is crucial, as neither the greatest lower bounds nor the least upper bounds are necessarily attained.

Proof

Since \(E_\lambda (u) = {{\text {sup}}}_{\xi \in K} \lambda \langle u,\xi '\rangle + \frac{1}{2}\Vert f-u\Vert ^2\), it follows from (20) that

$$\begin{aligned} \inf _{u\in \mathrm{BV}(I)}E_\lambda (u) \ge \mathop {{\text {sup}}}\limits _{\xi \in K}\Big \{ \inf _{u\in \mathrm{BV}(I)}\lambda \langle u,\xi '\rangle +\frac{1}{2}\Vert u-f\Vert ^2 \Big \}\;. \end{aligned}$$

We first solve, for \(\xi \in K\) fixed, the minimization problem on the right-hand side. Expanding \(\Vert f-u\Vert ^2\) and completing squares with respect to u yields:

$$\begin{aligned}&\lambda \langle u,\xi '\rangle +\frac{1}{2}\Vert u-f\Vert ^2\\&\quad = \frac{1}{2}\Big \{ \Vert u-(f-\lambda \xi ')\Vert ^2 -\Vert f-\lambda \xi '\Vert ^2+\Vert f\Vert ^2 \Big \} \end{aligned}$$

The right-hand side is clearly minimized by the \(L^2(I)\)-function \(u=f-\lambda \xi '\) and

$$\begin{aligned} \inf _{u\in \mathrm{BV}(I)}E_\lambda (u) \ge \mathop {{\text {sup}}}\limits _{\xi \in K} \frac{1}{2}\Big \{ \Vert f\Vert ^2-\Vert f-\lambda \xi '\Vert ^2\Big \} \end{aligned}$$
(21)

holds. The maximization problem on the right-hand side is equivalent to

$$\begin{aligned} \inf _{\xi \in K} \Vert f-\lambda \xi '\Vert&= \inf _{\xi \in K} \Vert F'-\lambda \xi '\Vert \nonumber \\&= \lambda \inf _{\xi \in K} \Vert \lambda ^{-1}F -\xi \Vert _{H_0^1(I)}\;. \end{aligned}$$
(22)

By Proposition 1, this problem has the unique solution \(\xi _\lambda ={Pr}_K(\lambda ^{-1}F)\in K\), so the supremum is attained in (21). Now, let the function \(u_\lambda \) be defined by (19a) in the theorem. A priori, \(u_\lambda \) belongs to \(L^2(I)\), but we are going to show that \(u_\lambda \in \mathrm{BV}(I)\): the characterization of \(\xi _\lambda \) according to the projection theorem states that \( \xi _\lambda \in K\) and \(\langle f-\lambda \xi _\lambda ', \lambda \xi '-\lambda \xi _\lambda '\rangle \le 0\) for all \(\xi \in K\). If we use the definition of \(u_\lambda \) and divide by \(\lambda >0\), this characterization becomes

$$\begin{aligned} \langle u_\lambda ,\xi '\rangle \le \langle u_\lambda ,\xi _\lambda '\rangle \quad \text {for all }\xi \in K, \end{aligned}$$

where the right-hand side is finite. It follows from the definition of the total variation that \(u_\lambda \in \mathrm{BV}(I)\) with \(J(u_\lambda )=\langle u_\lambda ,\xi _\lambda '\rangle \), as asserted in the theorem. (This reasoning can be reversed; if (19b) is true, then \(\xi _\lambda \) is the minimizer in (22).) Also, if \(u_\lambda \ne 0\) then \(\Vert \xi _\lambda \Vert _\infty <1\) is not consistent with the maximizing property (19b), and hence, \(\Vert \xi _\lambda \Vert _\infty =1\), as claimed.

It remains to be verified that \(u_\lambda \) minimizes \(E_\lambda \) and that equality holds in (21). This follows from a direct calculation:

$$\begin{aligned} \inf _{u\in \mathrm{BV}(I)} E_\lambda (u)&\ge \max _{\xi \in K} \frac{1}{2}\Big \{ \Vert f\Vert ^2-\Vert f-\lambda \xi '\Vert ^2\Big \}\\&= \frac{1}{2}\Vert f\Vert ^2 - \frac{1}{2}\Vert u_\lambda \Vert ^2\\&= \frac{1}{2}\Vert f\Vert ^2 +\frac{1}{2}\Vert u_\lambda \Vert ^2 - \Vert u_\lambda \Vert ^2\\&= \frac{1}{2}\Vert f\Vert ^2 +\frac{1}{2}\Vert u_\lambda \Vert ^2 -\langle u_\lambda ,f-\lambda \xi _\lambda '\rangle \\&=\frac{1}{2}\Vert f-u_\lambda \Vert ^2 + \lambda \langle u_\lambda , \xi _\lambda '\rangle \\&=\frac{1}{2}\Vert f-u_\lambda \Vert ^2 + \lambda J(u_\lambda )\\&= E_\lambda (u_\lambda )\;. \end{aligned}$$

So \(\inf E_\lambda (u) = E_\lambda (u_\lambda )\), the infimum is attained, and equality holds in (21). The inequality \(E_\lambda (u)-E_\lambda (u_\lambda )\ge \frac{1}{2}\Vert u-u_\lambda \Vert ^2\) implies the uniqueness of \(u_\lambda \). The converse statement is proved by backtracking the steps of the above proof. \(\square \)

The equivalence of the two denoising models can now be established:

Proof of Theorem 1

It follows from Theorem 3 that the minimizer \(u_\lambda \) of the ROF functional is given by \(u_\lambda = f- \lambda \xi _\lambda '\) where \(\xi _\lambda \) is the unique solution of

$$\begin{aligned} \min _{\xi \in K}\frac{1}{2}\Vert f-\lambda \xi '\Vert _{L^2(I)}^2\;. \end{aligned}$$
(23)

If we introduce the new variable \(W:=F-\lambda \xi \), where \(F\in H_0^1(I)\) is the cumulative signal, then \(W\in H_0^1(I)\) and the condition \(\Vert \xi \Vert _\infty \le 1\) implies that W satisfies \(F(x)-\lambda \le W(x)\le F(x)+\lambda \) on I. Therefore, (23) is equivalent to

$$\begin{aligned} \min _{W\in T_\lambda }\frac{1}{2}\Vert W'\Vert _{L^2(I)}^2\,, \end{aligned}$$

which is the minimization problem in step 3 of the taut string algorithm whose solution is denoted by \(W_\lambda \). It follows that \(W_\lambda =F-\lambda \xi _\lambda \) and differentiation yields \(f_\lambda = W_\lambda '= f-\lambda \xi _\lambda ' = u_\lambda \), the desired result. \(\square \)

Our proof of Theorem 1 is essentially a change of variables and, as such, becomes almost a “derivation” of the taut string interpretation. We also get the existence and uniqueness of solutions to both models in one stroke. The proof given in [24] first shows that \(u_\lambda \) and \(W_\lambda '\) satisfy the same set of three necessary conditions, and that these conditions admit at most one solution. Then, it proceeds to drive home the point by establishing existence separately for both models. The argument assumes \(f\in L^\infty \) and involves a fair amount of measure-theoretic considerations. The proof of equivalence given in [39] is based on a thorough functional analytic study of Meyer’s G-norm and is not elementary.

The last two proofs contain the following useful observations:

Corollary 1

The optimal dual variable \(\xi _\lambda \) is given by projection in \(H_0^1(I)\) onto K,

$$\begin{aligned} \xi _\lambda ={Pr}_K(\lambda ^{-1}F)\, , \end{aligned}$$

and the taut string \(W_\lambda \) by the shrinkage map,

$$\begin{aligned} W_\lambda = {Sr}_{\lambda K}(F)= \lambda {Sr}_K(\lambda ^{-1}F)\, , \end{aligned}$$

where \(F\in H_0^1(I)\) is the cumulative signal and \(\lambda >0\).
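As a numerical illustration of Corollary 1 and of the optimality system (19a)–(19b), the dual variable can be reconstructed from the taut string. The sketch below assumes the `taut_string` function from Sect. 1; the piecewise constant test signal is chosen only as an example.

```python
import numpy as np
# Assumes taut_string from the sketch in Sect. 1.
n = 400
h = 1.0 / n
x = np.linspace(0.0, 1.0, n, endpoint=False) + 0.5 * h
f = np.sign(np.cos(3 * np.pi * x))
f -= f.mean()
lam = 0.02

u = taut_string(f, lam, 0.0, 1.0)              # = u_lambda by Theorem 1
F = np.concatenate(([0.0], np.cumsum(f) * h))  # cumulative signal
W = np.concatenate(([0.0], np.cumsum(u) * h))  # taut string, since W' = u
xi = (F - W) / lam                             # dual variable, from (19a)
print(np.max(np.abs(xi)))                      # -> 1, since u != 0 (Theorem 3)
print(np.sum(np.abs(np.diff(u))),              # J(u_lambda) ...
      np.sum(u * np.diff(xi)))                 # ... equals <u_lambda, xi'>, (19b)
```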

Denoising according to the ROF model is a mapping \(f\mapsto u_\lambda (f)\) which is non-expansive in the data f and continuous with respect to the parameter \(\lambda >0\):

Proposition 2

(a) For signals \(f, {\bar{f}}\in L^{2}(I)\), we have

$$\begin{aligned} \Vert u_\lambda ({\bar{f}}) -u_\lambda (f) \Vert _{L^2(I)} \le \Vert {\bar{f}} - f\Vert _{L^2(I)}. \end{aligned}$$

(b) For any \(f\in L^2(I)\),

$$\begin{aligned} \Vert u_\lambda (f)-u_\nu (f) \Vert _{L^2(I)} \le \frac{2|\lambda -\nu |}{\max (\lambda ,\nu )}\Vert f\Vert _{L^2(I)} \end{aligned}$$

for all \(\lambda , \nu >0\).

The first assertion of the proposition is a well-known property of the Moreau–Yosida approximation (or of the proximal map), see [6, Theorem 17.2.1]. Both assertions are easy consequences of the corollary.

Proof

(a) We apply the taut string interpretation. Let F and \({\bar{F}}\) be the cumulative signals corresponding to f and \({\bar{f}}\), and let \(W_\lambda \) and \({\bar{W}}_\lambda \) be the associated taut strings. The non-expansiveness of the shrinkage map (8) yields

$$\begin{aligned} \Vert u_\lambda ({\bar{f}})-u_\lambda (f)\Vert _{L^2(I)}&\le \Vert {\bar{W}}_\lambda -W_\lambda \Vert _{H_0^1(I)}\\&= \Vert {Sr}_{\lambda K}({\bar{F}}) - {Sr}_{\lambda K}(F)\Vert _{H_0^1(I)}\\&\le \Vert {\bar{F}} - F \Vert _{H_0^1(I)} \\&= \Vert {\bar{f}} - f \Vert _{L^2(I)}\,, \end{aligned}$$

which is the desired estimate.

(b) This time, we use the non-expansiveness of the projection map to obtain a bound on the difference:

$$\begin{aligned} \Vert u_\lambda - u_\nu \Vert _{L^2(I)}&= \Vert \lambda \xi _\lambda ' -\nu \xi _\nu '\Vert _{L^2(I)}\\&= \Vert \lambda \xi _\lambda -\nu \xi _\nu \Vert _{H_0^1(I)}\\&= \Vert \lambda {Pr}_K(\lambda ^{-1}F) - \nu {Pr}_K(\nu ^{-1}F) \Vert _{H_0^1(I)}\\&\le \Vert \lambda {Pr}_K(\lambda ^{-1}F) - \lambda {Pr}_K(\nu ^{-1}F) \Vert _{H_0^1}\\&\quad + \Vert \lambda {Pr}_K(\nu ^{-1}F) - \nu {Pr}_K(\nu ^{-1}F) \Vert _{H_0^1}\\&\le \lambda \Vert \lambda ^{-1}F-\nu ^{-1}F\Vert _{H_0^1} \\&\quad + \nu ^{-1}|\lambda -\nu |\, \Vert F\Vert _{H_0^1}\\&\le 2\frac{|\lambda -\nu |}{\nu }\Vert f\Vert _{L^2(I)}\, . \end{aligned}$$

Here, it was used that \(\Vert {Pr}_K(F)\Vert _{H_0^1(I)}\le \Vert F\Vert _{H_0^1(I)}\). This follows from the non-expansiveness of the projection since \({Pr}_K(0)=0\). If the roles of \(\lambda \) and \(\nu \) are interchanged, then we get another bound,

$$\begin{aligned} \Vert u_\lambda - u_\nu \Vert _{L^2(I)} \le 2\frac{|\lambda -\nu |}{\lambda } \Vert f\Vert _{L^2(I)}\, . \end{aligned}$$

Taking the smaller of the two gives the desired result. \(\square \)

It is an interesting observation that Theorem 3 associates a unique test function (or dual variable) \(\xi _\lambda \in K\) with the solution \(u_\lambda \) of the ROF model, namely the one which satisfies \(J(u_\lambda )=\langle u_\lambda , \xi _\lambda '\rangle _{L^2}\). This is noteworthy because, as demonstrated in Example 3, there are functions u for which the supremum in the definition of \(J(u)\) is not attained. An explicit example of a ROF minimizer looks as follows:

Example 4

Consider the step function on \(I=(-1,1)\),

$$\begin{aligned} f(x)={\text {sign}}(x)\,,\qquad x\in I. \end{aligned}$$

An easy calculation, based on the taut string interpretation, shows that

$$\begin{aligned} u_\lambda =(1-\lambda )_+{\text {sign}}(x)\quad \text {and}\quad \xi _\lambda =\frac{|x|-1}{\max (1,\lambda )}\in H_0^1(I). \end{aligned}$$

In fact, the cumulative signal is \(F(x)=|x|-1\) and, in the case when \(0<\lambda <1\), it is easy to check that \(W_\lambda (x) =(1-\lambda )F(x)\). It follows from the taut string interpretation that \(u_\lambda =W_\lambda '\) and \(\xi _\lambda \) can then be computed from (19a). Notice that the dual variable \(\xi _\lambda \) is not in \(C_0^1(I)\), so the extension of the space of test functions from \(C_0^1\) to \(H_0^1\) is essential to our theory. For \(\lambda \ge 1\), we find \(u_\lambda =0\) (draw a picture) and \(\xi _\lambda = \lambda ^{-1}F(x)\). Note that \(\Vert \xi _\lambda \Vert _\infty = 1\) when \(u_\lambda \ne 0\).

This example was considered in Strong and Chan [42], one of the very first papers treating explicit solutions to the ROF model. At the time, they had access neither to the duality formulation of the ROF model nor to its taut string interpretation, and their solution consequently goes on for several pages.
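The closed form in Example 4 also provides a convenient test case for the numerical sketch from Sect. 1 (the grid size and the values of \(\lambda \) below are arbitrary):

```python
import numpy as np
# Assumes taut_string from the sketch in Sect. 1.
n = 1000
h = 2.0 / n
x = np.linspace(-1.0, 1.0, n, endpoint=False) + 0.5 * h
f = np.sign(x)
for lam in (0.25, 0.5, 2.0):
    u = taut_string(f, lam, -1.0, 1.0)
    u_exact = max(1.0 - lam, 0.0) * np.sign(x)  # u_lambda = (1 - lam)_+ sign(x)
    print(lam, np.max(np.abs(u - u_exact)))     # small, up to solver tolerance
```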

Another observation is the following: if we derive the Euler–Lagrange equation for the ROF functional \(E_\lambda (u)=\lambda \int _I |u'|+(1/2)\int _I (f-u)^2\) in a purely formal manner, we get

$$\begin{aligned} -\lambda \frac{\mathrm{d}}{\mathrm{d}x}\Big ( \frac{u'}{|u'|}\Big ) = f-u. \end{aligned}$$

The meaning of the nonlinear differential operator on the left-hand side of this equation is unclear as it stands. However, we know from (19a) in Theorem 3 that the minimizer \(u_\lambda \) satisfies \(u_\lambda = f-\lambda \xi _\lambda '\), which, if compared to the formal Euler–Lagrange equation above, leads to the interpretation,

$$\begin{aligned} -\frac{\mathrm{d}}{\mathrm{d}x}\Big ( \frac{u_\lambda '}{|u_\lambda '|}\Big ) = \xi _\lambda '\, \end{aligned}$$
(24)

after cancellation of the common factor \(\lambda \). In Example 4, we found that the ROF minimizer was proportional to the signal f, namely \(u_\lambda =(1-\lambda )f\) for \(0<\lambda <1\). Since \(u'/|u'|\) is unchanged when u is multiplied by a positive constant, it follows that

$$\begin{aligned} -\frac{\mathrm{d}}{\mathrm{d}x}\Big ( \frac{u_\lambda '}{|u_\lambda '|}\Big ) = -\frac{\mathrm{d}}{\mathrm{d}x}\Big ( \frac{f'}{|f'|}\Big ). \end{aligned}$$

On the other hand, \(\xi _\lambda ' = \min (1,\lambda ^{-1})f=f\) which combined with the above equation and (24) yields

$$\begin{aligned} -\frac{\mathrm{d}}{\mathrm{d}x}\Big ( \frac{f'}{|f'|}\Big ) = f\,, \end{aligned}$$
(25)

that is, f is an eigenfunction of the nonlinear operator \(u\mapsto (\mathrm {d}/\mathrm{d}x)(u'/|u'|)\). Eigenfunctions of this sort have been extensively studied in the two-dimensional setting in Bellettini et al. [9]. There it was shown that if f is any such eigenfunction, then \(u_\lambda =(1-\lambda )_+f\) minimizes the two-dimensional ROF functional with regularization weight \(\lambda \) (see also [1, 8]).

6 Certifying the Quality of Approximate Solutions

If we define the functional

$$\begin{aligned} L_\lambda (\xi ) = \frac{1}{2}\big \{ \Vert f\Vert ^2 -\Vert f-\lambda \xi '\Vert ^2\big \}, \end{aligned}$$
(26)

then Theorem 3 states that \(E_\lambda (u)\ge L_\lambda (\xi )\) holds for all \(u\in \mathrm{BV}(I)\) and all \(\xi \in K\), and that we have equality if and only if \(u=u_\lambda \) and \(\xi =\xi _\lambda \). In this section, we ask ourselves the following question: suppose \((u,\xi )\) is an approximate solution in the sense that \(E_\lambda (u)\) is only slightly bigger than \(L_\lambda (\xi )\), does it follow that \((u,\xi )\) is a good approximation to the ROF minimizer in the sense that u is close to \(u_\lambda \) and \(\xi \) close to \(\xi _\lambda \) in their respective functional spaces?

To investigate this question, we follow Zhu et al. [44] and introduce the “gap function”

$$\begin{aligned} {\text {gap}}(u,\xi )&:= E_\lambda (u)-L_\lambda (\xi ) \nonumber \\&= \lambda J(u) + \frac{1}{2}\Vert f-u \Vert ^2_{L^2(I)} \nonumber \\&\quad + \frac{1}{2}\Vert f-\lambda \xi ' \Vert ^2_{L^2(I)}-\frac{1}{2}\Vert f\Vert ^2_{L^2(I)}\, , \end{aligned}$$
(27)

where \(u\in \mathrm{BV}(I)\) and \(\xi \in K\). Using this “gap function”, we can prove the following refined version of an estimate found in [44]:

Proposition 3

For all \(u\in \mathrm{BV}(I)\) and \(\xi \in K\), the following estimate holds

$$\begin{aligned} {\text {gap}}(u,\xi ) \ge \frac{1}{2}\Vert u-u_\lambda \Vert ^2_{L^2(I)} + \frac{1}{2}\Vert \lambda \xi ' -\lambda \xi _\lambda ' \Vert ^2_{L^2(I)}\, , \end{aligned}$$

where \(u_\lambda \) and \(\xi _\lambda \) are the solutions of the ROF problem and its dual, respectively.

If we assume that the function u and the dual variable \(\xi \) satisfy the relation \(u = f - \lambda \xi '\), as is the case in many numerical algorithms for TV-minimization, then we get the following estimate as a corollary of the above proposition:

$$\begin{aligned} {\text {gap}}(u,\xi ) \ge \Vert u - u_\lambda \Vert ^2\;. \end{aligned}$$
(28)

This is the actual result stated in [44].
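The estimate (28) is easy to observe numerically. In the sketch below, which assumes the `taut_string` function from Sect. 1, the trial dual variable \(\xi \) is an arbitrary admissible element of K, and the trial u is defined by \(u = f-\lambda \xi '\):

```python
import numpy as np
# Assumes taut_string from the sketch in Sect. 1.
n = 500
h = 1.0 / n
x = np.linspace(0.0, 1.0, n, endpoint=False) + 0.5 * h
f = np.sign(np.cos(3 * np.pi * x))
f -= f.mean()
lam = 0.05

xi = np.sin(np.pi * np.linspace(0.0, 1.0, n + 1))  # some xi in K
u = f - lam * np.diff(xi) / h                      # trial pair, u = f - lam*xi'

gap = (lam * np.sum(np.abs(np.diff(u)))            # lam * J(u)
       + 0.5 * np.sum((f - u) ** 2) * h
       + 0.5 * np.sum((f - lam * np.diff(xi) / h) ** 2) * h
       - 0.5 * np.sum(f ** 2) * h)
u_lam = taut_string(f, lam, 0.0, 1.0)
print(gap, np.sum((u - u_lam) ** 2) * h)   # gap dominates ||u - u_lam||^2, (28)
```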

Fig. 2 In-signal \(f(x)=\cos (nx)\) with \(n=3\) and the ROF minimizer \(u_\lambda \) with \(3\lambda = \sin (\pi /3)- (\pi /3)\cos (\pi /3) = (3\sqrt{3}- \pi )/6\) and \(\alpha (\lambda ) = \cos (\pi /3)=1/2\) superimposed

Proof

We first introduce the ROF solution pair \(u_\lambda \), \(\xi _\lambda \) into the definition of the duality gap and expand:

$$\begin{aligned} {\text {gap}}(u,\xi )&:= \lambda J(u) + \frac{1}{2}\Vert f-u_\lambda + u_\lambda -u \Vert ^2 \\&\quad \,+ \frac{1}{2}\Vert f-\lambda \xi _\lambda ' + \lambda \xi _\lambda ' -\lambda \xi ' \Vert ^2-\frac{1}{2}\Vert f\Vert ^2 \\&=\lambda J(u) -\langle u,\lambda \xi _\lambda '\rangle + \langle u_\lambda , \lambda \xi _\lambda '\rangle \\&\quad + \langle u_\lambda , \lambda \xi _\lambda ' -\lambda \xi '\rangle \\&\quad + \frac{1}{2}\Vert f-u_\lambda \Vert ^2 + \frac{1}{2}\Vert f-\lambda \xi _\lambda '\Vert ^2 -\frac{1}{2}\Vert f\Vert ^2\\&\quad + \frac{1}{2}\Vert u-u_\lambda \Vert ^2 + \frac{1}{2}\Vert \lambda \xi ' -\lambda \xi _\lambda ' \Vert ^2\\&\ge \frac{1}{2}\Vert f-u_\lambda \Vert ^2 + \langle u_\lambda , \lambda \xi _\lambda '\rangle + \frac{1}{2}\Vert f-\lambda \xi _\lambda '\Vert ^2\\&\quad - \frac{1}{2}\Vert f\Vert ^2 + \frac{1}{2}\Vert u-u_\lambda \Vert ^2 + \frac{1}{2}\Vert \lambda \xi ' -\lambda \xi _\lambda ' \Vert ^2\\&= \frac{1}{2}\Vert u-u_\lambda \Vert ^2 + \frac{1}{2}\Vert \lambda \xi ' -\lambda \xi _\lambda ' \Vert ^2\, , \end{aligned}$$

where the first inequality drops the two non-negative terms \(\lambda J(u)-\langle u,\lambda \xi _\lambda '\rangle \) and \(\langle u_\lambda , \lambda \xi _\lambda '-\lambda \xi '\rangle \), and the final equality holds because \(\langle u_\lambda ,\lambda \xi _\lambda '\rangle = \lambda J(u_\lambda )\) by (19b), so that the first four terms add up to \(E_\lambda (u_\lambda )-L_\lambda (\xi _\lambda )=0\) by (18). The proof is complete. \(\square \)

7 Consequences of Theorem 3 and the Taut String Interpretation

We now prove some known, and some new, properties of the ROF model.

The taut string algorithm suggests that \(W_\lambda =0\), and therefore, \(u_\lambda =0\), when \(\lambda \) is sufficiently large, and that \(W_\lambda \) must touch the sides \(F\pm \lambda \) of the tube \(T_\lambda \) when \(\lambda \) is small. These assertions can be made precise:

Proposition 4

(a) The denoised signal \(u_\lambda = 0\) if and only if \(\lambda \ge \Vert F\Vert _\infty \),

(b) if \(0< \lambda < \Vert F\Vert _\infty \), then \(\Vert F - W_\lambda \Vert _\infty = \lambda \), and

(c) \(\Vert W_\lambda \Vert _\infty = \max (0, \Vert F\Vert _\infty - \lambda )\).

The results (a) and (b) are well known, and proofs, valid in the multidimensional case, can be found in Meyer’s treatise [30]. The natural estimate in (c) seems to be stated here for the first time. Notice that the maximum norm \(\Vert F\Vert _\infty \) of the cumulative signal F coincides, in one dimension, with Meyer’s G-norm \(\Vert f\Vert _{*}\) of the signal f. Theorem 3 and the taut string interpretation of the ROF model allow us to give very short and direct proofs of all three properties.

Proof

(a) By Theorem 1, the denoised signal \(u_\lambda \) is zero if and only if the taut string \(W_\lambda \) is zero. We know that \(W_\lambda =F-\lambda \xi _\lambda \) where, as seen from (22), \(\xi _\lambda \) is the projection in \(H_0^1(I)\) of \(\lambda ^{-1}F\) onto the closed convex set K. Therefore, \(u_\lambda = 0\) if and only if \(\lambda ^{-1}F\in K\), that is, if and only if \(\Vert F\Vert _\infty \le \lambda \), as claimed.

(b) If \(0< \lambda < \Vert F\Vert _\infty \), then \(u_\lambda \ne 0\) hence \(\Vert \xi _\lambda \Vert _\infty = 1\), by Theorem 3. The assertion now follows by taking norms in the identity \(\lambda \xi _\lambda = F-W_\lambda \).

(c) The equality clearly holds when \(\lambda \ge \Vert F\Vert _\infty \) because \(W_\lambda =0\) by (a). When \(c:=\Vert F\Vert _\infty -\lambda >0\), we use a truncation argument: If W belongs to \(T_\lambda \), then so does \({\hat{W}}:=\min (c,W)\), in particular \(c>0\) ensures that \({\hat{W}}(a)={\hat{W}}(b)=0\). Since \(E({\hat{W}})\le E(W)\), and \(W_\lambda \) is the (unique) minimizer of E over \(T_\lambda \), we conclude that \(\max _I W_\lambda \le c\). A similar argument gives \(-\min _I W_\lambda \le c\). Thus, \(\Vert W_\lambda \Vert _\infty \le \max (0, \Vert F\Vert _\infty - \lambda )\). The reverse inequality follows from (b). \(\square \)

As an application of the above result and of Theorem 3, we consider the following exactly solvable example, which confirms the results of several numerical simulations and which would most likely be out of reach with the methods developed in [42]:

Example 5

On the interval \(I=(0,2\pi ]\), and for a positive integer n, define \(f(x)=\cos (nx)\). If, for simplicity, we change the setting a little and impose periodic boundary conditions on the admissible functions u in \(E_\lambda (u)\), then we find that

If \(\lambda \ge 1/n\), then \(u_\lambda =0\), and for \(0<\lambda <1/n\), there exists a number \(\alpha = \alpha (\lambda )\) such that \(0<\alpha <1\) and the denoised signal is given by truncation,

$$\begin{aligned} u_\lambda = {Pr}_{[-\alpha ,\alpha ]}\circ f. \end{aligned}$$
(29)

Here, \({Pr}_{[-\alpha ,\alpha ]}: {\mathbf {R}}\rightarrow {\mathbf {R}}\) denotes projection onto the closed (convex) interval \([-\alpha ,\alpha ]\). The case with \(n=3\) is illustrated in Fig. 2.

Let us first clarify what we mean by periodic boundary conditions. It is the same as defining and minimizing the ROF functional \(E_\lambda \) for functions defined on the unit circle T rather than over an interval I. The theory developed earlier, in particular Theorem 3, still holds if we replace \(H_0^1(I)\) by \(H^1(T)\) and define the closed convex set K accordingly.

First, notice that the cumulative function \(F(x)=\int _0^xf(t)\,\mathrm{d}t=n^{-1}\sin (nx)\) satisfies \(\Vert F\Vert _\infty =1/n\), so, by Proposition 4, \(u_\lambda =0\) if \(\lambda \ge 1/n\). Therefore, it is enough to study the case \(0<\lambda <1/n\). To deal with that case, we set \(u_*= {Pr}_{[-\alpha ,\alpha ]}\circ f\) and try to find a dual function \(\xi _*\in K\) such that the sufficient conditions for optimality (19a) and (19b) hold, for then \(u_*=u_\lambda \) by Theorem 3. It follows from the first of these conditions that \(\xi _*\) must be the unique periodic function which satisfies

$$\begin{aligned} \xi _*' = \frac{f - u_*}{\lambda }\,, \end{aligned}$$

whose mean value over I is zero. It remains to be verified that \(\Vert \xi _*\Vert _\infty \le 1\) and that the second of the two conditions holds. We consider the latter first. It is easy to see that

$$\begin{aligned} J(u_*)=4n\alpha . \end{aligned}$$

We consequently need to show that there exists a number \(\alpha \) between zero and one such that \(\langle u_*,\xi _*'\rangle = 4n\alpha \). Using periodicity and symmetry of the functions \(u_*\) and \(\xi _*'\), it is easy to see that

$$\begin{aligned} \langle u_*,\xi _*'\rangle = 2n\int _{-a}^a \alpha \frac{\cos (nt)-\alpha }{\lambda }\,\mathrm{d}t \end{aligned}$$

where a in the limits of the integral is the smallest positive number such that \(\alpha = \cos (na)\). We note that \(0<a<\pi /2n\), see Fig. 2. Evaluation of this integral leads to the following condition on the number a, and therefore, on \(\alpha \):

$$\begin{aligned} \sin (na)-na \cos (na) = n\lambda \quad (0<a<\pi /2n). \end{aligned}$$
(30)

Since the function \(\theta \mapsto \sin \theta - \theta \cos \theta \) is strictly increasing on the interval \([0,\pi /2]\) with range [0, 1], it follows that Eq. (30) has a unique solution \(a_\lambda \in (0,\pi /2n)\). Thus, we have shown that the condition (19b) holds if we take \(\alpha =\alpha (\lambda )=\cos (na_\lambda )\). The proof will be complete if we can verify that \(\Vert \xi _*\Vert _\infty \le 1\). But this essentially follows from the work already done and the following calculation, valid for \(0\le x\le a_\lambda \):

$$\begin{aligned} 0\le \xi _*(x)&= \int _0^x \xi _*'(t)\,\mathrm{d}t \\&= \int _0^x\frac{\cos (nt)-\cos (na_\lambda )}{\lambda }\,\mathrm{d}t\\&\le \int _0^{a_\lambda }\frac{\cos (nt)-\cos (na_\lambda )}{\lambda }\,\mathrm{d}t\\&= \frac{\sin (na_\lambda )-na_\lambda \cos (na_\lambda )}{n\lambda } = 1\,, \end{aligned}$$

since \(a_\lambda \) is the solution of (30). Since the function \(\xi _*\) is periodic, it is now easy to see that \(|\xi _*(x)|\le 1\) on the entire interval I, and the verification is complete.
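Numerically, (30) is a one-dimensional root-finding problem. The following sketch computes \(a_\lambda \) and \(\alpha (\lambda )\) and assembles the truncated solution (29) for the parameter values used in Fig. 2; the use of scipy's brentq is our choice.

```python
import numpy as np
from scipy.optimize import brentq

n = 3
lam = (3 * np.sqrt(3) - np.pi) / 18       # so that 3*lam = (3*sqrt(3) - pi)/6

# Solve sin(n a) - n a cos(n a) = n lam for a in (0, pi/(2n)), eq. (30).
g = lambda a: np.sin(n * a) - n * a * np.cos(n * a) - n * lam
a_lam = brentq(g, 1e-12, np.pi / (2 * n))
alpha = np.cos(n * a_lam)
print(a_lam, alpha)                       # alpha -> 0.5, as in Fig. 2

x = np.linspace(0.0, 2 * np.pi, 2001)
u_lam = np.clip(np.cos(n * x), -alpha, alpha)   # the denoised signal, eq. (29)
```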

We continue with some additional properties of the ROF model. Define, for \(\lambda > 0\), the value function

$$\begin{aligned} e(\lambda ) := \inf _{u\in \mathrm{BV}(I)} E_\lambda (u), \end{aligned}$$

that is, \(e(\lambda ) = E_\lambda (u_\lambda )\). The next two propositions contain essentially well-known results.

Proposition 5

The function \(e:(0,+\infty )\rightarrow (0,+\infty )\) is non-decreasing and concave, hence continuous, and satisfies \(e(\lambda )=\Vert f\Vert ^2/2\) for \(\lambda \ge \Vert F\Vert _\infty \). Moreover, for \(f\in L^2(I)\)

$$\begin{aligned} \lim _{\lambda \rightarrow 0+} e(\lambda )=0. \end{aligned}$$

and if \(f\in \mathrm{BV}(I)\) then \(e(\lambda )=O(\lambda )\) as \(\lambda \rightarrow 0+\).

Proof

If \(\lambda _2\ge \lambda _1 > 0\), then the inequality \(E_{\lambda _2}(u) \ge E_{\lambda _1}(u)\) holds trivially for all u. Taking infimum over the functions in \(\mathrm{BV}(I)\) yields \(e(\lambda _2) \ge e(\lambda _1)\), so e is non-decreasing.

For any u, the right-hand side of the inequality

$$\begin{aligned} e(\lambda ) \le E_\lambda (u) = \lambda J(u) + \frac{1}{2}\Vert u-f\Vert ^2\;, \end{aligned}$$

is an affine and therefore a concave, function of \(\lambda \). Because the infimum of any family of concave functions is again concave, it follows that \(e(\lambda ) = \inf _{u\in \mathrm{BV}(I)} E_\lambda (u)\) is concave.

For \(\lambda \ge \Vert F\Vert _\infty \), we know from Proposition 4 that \(u_\lambda = 0\), so \(e(\lambda )=E_\lambda (0)=\Vert f\Vert ^2/2\).

To prove the assertion about \(e(\lambda )\) as \(\lambda \) tends to zero from the right, we first assume that \(f\in \mathrm{BV}(I)\), in which case it follows that \(0 < e(\lambda ) \le E_\lambda (f) =\lambda J(f)\), so \(e(\lambda )=O(\lambda )\) because \(J(f)<\infty \).

If we merely have \(f\in L^2(I)\), an approximation argument is needed: For any \(\epsilon > 0\), take a function \(f_\epsilon \in H_0^1(I)\) such that \(\Vert f-f_\epsilon \Vert ^2/2 < \epsilon \). Then \(f_\epsilon \in \mathrm{BV}(I)\) and \(0\le e(\lambda ) \le E_\lambda (f_\epsilon ) < \lambda J(f_\epsilon ) + \epsilon .\) It follows that \(0 \le {{{{\mathrm{lim}}~{\mathrm{sup}}}}}_{\lambda \rightarrow 0+} e(\lambda ) < \epsilon \). Since \(\epsilon \) is arbitrary, we get \(\lim _{\lambda \rightarrow 0+}e(\lambda )=0\). \(\square \)

The map \(f\rightarrow u_\lambda \) is in fact the Moreau–Yosida resolvent (or proximal map) of the total variation functional J, see [6, Sect. 17.2.1], and the following proposition is therefore a special case of a much more general result from the theory of Moreau–Yosida approximations. Notice, however, that the second part of our proposition contains a refined quantification of the rate of convergence of \(u_\lambda \) to f as \(\lambda \rightarrow 0\), in that the common \(O(\lambda )\) bound for the squared \(L^2\)-distance is replaced by \(o(\lambda )\). The latter is not easily located in the literature.

Proposition 6

For any \(f\in L^2(I)\), we have \(u_\lambda \rightarrow f\) in \(L^2\) as \(\lambda \rightarrow 0+\). Moreover, if \(f\in \mathrm{BV}(I)\) then \(J(u_\lambda )\rightarrow J(f)\) and \(\Vert u_\lambda -f\Vert _{L^2(I)}=o(\lambda ^{1/2})\) as \(\lambda \rightarrow 0+\).

Proof

The obvious inequality \(\Vert f-u_\lambda \Vert ^2/2\le e(\lambda )\) and the fact that \(\lim _{\lambda \rightarrow 0+}e(\lambda )=0\), proved above, imply the first assertion. When \(f\in \mathrm{BV}(I)\), it follows from the inequality \(\lambda J(u_\lambda ) + \frac{1}{2}\Vert u_\lambda -f\Vert _{L^2(I)}^2 = e(\lambda )\le E_\lambda (f) =\lambda J(f)\) that

$$\begin{aligned} \Vert u_\lambda -f\Vert _{L^2(I)}^2 \le 2\lambda ( J(f)-J(u_\lambda ))\;. \end{aligned}$$
(31)

Consequently, \(\Vert u_\lambda -f\Vert ^2_{L^2(I)} = O(\lambda )\) and we also notice that \(J(u_\lambda )\le J(f)\) for all \(\lambda >0\). But we can do slightly better than that. Since \(u_\lambda \rightarrow f\) in \(L^2\) as \(\lambda \rightarrow 0+\), we have

$$\begin{aligned} J(f)\le \liminf _{\lambda \rightarrow 0+} J(u_\lambda )\, , \end{aligned}$$

where the lower semi-continuity of the total variation J, cf. [2, Prop. 3.6], was used. Since \(J(u_\lambda )\le J(f)\), we also obtain an estimate from above: \(\limsup _{\lambda \rightarrow 0+} J(u_\lambda ) \le J(f)\). We conclude that \(\lim _{\lambda \rightarrow 0+}J(u_\lambda ) = J(f)\). If this is used in (31), we find that \(\Vert u_\lambda -f\Vert ^2_{L^2(I)} = o(\lambda )\) as \(\lambda \rightarrow 0+\). \(\square \)

The next example shows that the convergence rate stated in the proposition is optimal in the sense that the exponent \(1/2\) cannot be improved.

Example 6

On the interval \(I=(0,1)\) and with \(\alpha > 1\), let the data be given by \(f(x)=\alpha x^{\alpha -1}\) for \(x\in I\). Since f is monotone, \(J(f)= \lim _{\epsilon \rightarrow 0+}(f(1-\epsilon )-f(\epsilon ))=\alpha < \infty \), and hence, \(f\in \mathrm{BV}(I)\). Applying the taut string interpretation to the cumulative function \(F(x)=x^\alpha \) shows that the denoised signal \(u_\lambda \) is constant near the interval end points and coincides with f in between. In fact, near the left interval end point we have \(u_\lambda (x)= \alpha (\lambda /(\alpha -1))^{(\alpha -1)/\alpha }\) for \(0\le x\le (\lambda /(\alpha -1))^{1/\alpha }\). It follows, by an easy computation, that for each choice of \( \alpha \) there exists a positive constant \(C=C(\alpha )\) such that \(\Vert f-u_\lambda \Vert _{L^2(I)}^2 \ge C\lambda ^{2-1/\alpha }\). This bounds the rate of convergence from below.
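For completeness, the contribution from the left end point alone already yields an explicit constant: writing \(x_\lambda =(\lambda /(\alpha -1))^{1/\alpha }\), so that \(u_\lambda =\alpha x_\lambda ^{\alpha -1}\) on \([0,x_\lambda ]\), one checks that

$$\begin{aligned} \int _0^{x_\lambda }(f-u_\lambda )^2\,\mathrm{d}x&=\alpha ^2\int _0^{x_\lambda }\big (x_\lambda ^{\alpha -1}-x^{\alpha -1}\big )^2\,\mathrm{d}x\\&=\frac{2\alpha (\alpha -1)^2}{2\alpha -1}\,x_\lambda ^{2\alpha -1}=\frac{2\alpha (\alpha -1)^{1/\alpha }}{2\alpha -1}\,\lambda ^{2-1/\alpha }\,, \end{aligned}$$

so one may take \(C(\alpha )=2\alpha (\alpha -1)^{1/\alpha }/(2\alpha -1)\). Since the exponent \(2-1/\alpha \) approaches 1 from above as \(\alpha \rightarrow 1+\), no estimate of the form \(\Vert f-u_\lambda \Vert _{L^2(I)}=O(\lambda ^\beta )\) with \(\beta >1/2\) can hold uniformly over \(\mathrm{BV}(I)\).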

The following result, mentioned by Burger and Osher [15, Sect. 5], shows that if the data f belong to the space of functions with bounded variation and satisfy an additional regularity condition, then the convergence rate for the limits \(u_\lambda \rightarrow f\) and \(J(u_\lambda )\rightarrow J(f)\) as \(\lambda \rightarrow 0+\) can be improved (considerably) to \(O(\lambda )\).

Proposition 7

Suppose the data \(f\in \mathrm{BV}(I)\) satisfy \(J(f) = \langle \xi _0', f\rangle \) for some test function \(\xi _0\in K\). Then we have the bounds

$$\begin{aligned} \Vert f - u_\lambda \Vert _{L^2(I)} \le 2\Vert \xi _0' \Vert _{L^2(I)}\cdot \lambda \, \end{aligned}$$
(32a)

and

$$\begin{aligned} 0< J(f)-J(u_\lambda ) \le 2\Vert \xi _0' \Vert _{L^2(I)}^2\cdot \lambda \, \end{aligned}$$
(32b)

for any \(\lambda > 0\).

The additional requirement on the data—the supremum in \(J(f)=\sup _{\xi \in K}\langle \xi ',f\rangle \) being attained for some \(\xi _0\)—is an instance of the so-called source condition. The source condition for non-quadratic convex variational regularization of inverse problems was identified in [15] and used to derive convergence rates for the generalized Bregman distances. The authors point out that the above result, which they write in a slightly different way, may be proved along the same lines as their other estimates. Here, we provide the details:

Proof

Clearly, \(E_\lambda (u_\lambda )\le E_\lambda (f)\) implies the inequality

$$\begin{aligned} \lambda J(u_\lambda ) +\frac{1}{2}\Vert f-u_\lambda \Vert ^2 \le \lambda J(f). \end{aligned}$$

Moreover, since \(J(u_\lambda ) \ge \langle \xi _0',u_\lambda \rangle \), by the definition of the total variation, we find that the extra assumption on the data f gives yet another inequality:

$$\begin{aligned} J(f) - J(u_\lambda ) \le \langle \xi _0' , f - u_\lambda \rangle . \end{aligned}$$
(33)

Combining these two estimates with the Cauchy–Schwarz inequality, we get

$$\begin{aligned} \Vert f-u_\lambda \Vert ^2 \le 2\lambda \langle \xi _0',f-u_\lambda \rangle \le 2\lambda \Vert \xi _0'\Vert \, \Vert f-u_\lambda \Vert \,, \end{aligned}$$

which gives (32a). If we apply the Cauchy–Schwarz inequality once more, this time to the right-hand side of (33) and use (32a) then (32b) follows. \(\square \)

8 Proof and Applications of Theorem 2

We begin with the proof of the fundamental estimate on the derivative of the denoised signal:

Proof of Theorem 2

The estimate (4) is a consequence of an extension of the original Lewy–Stampacchia inequality [28] to bilateral obstacle problems. The bilateral obstacle problem, in the one-dimensional setting, is to minimize the energy \(E(u):=\frac{1}{2}\int _a^b u'(x)^2\,\mathrm{d}x\) in (3) over the closed convex set \(C=\{ u\in H_0^1(I): \phi (x) \le u(x)\le \psi (x) \text { a.e. }I\}\). The obstacles are given by the functions \(\phi ,\psi \in H^1(I)\) which satisfy the conditions \(\phi <\psi \) on I, and \(\phi< 0 <\psi \) on \(\partial I =\{ a,b\}\). The latter ensures that C is non-empty.

Suppose that \(\phi '\) and \(\psi '\) are in \(\mathrm{BV}(I)\), so that \(\phi ''\) and \(\psi ''\) are signed measures. Then the solution \(u_0\) of the minimization problem \(\min _{u\in C} E(u)\) satisfies the following inequality (as measures)

$$\begin{aligned} -(\phi '')^- \le u_0'' \le \,\, (\psi '')^+\, . \end{aligned}$$
(34)

Here, the notation \(\mu ^+\) and \(\mu ^-\) is used to denote the positive and negative variations, respectively, of a signed measure \(\mu \). The estimate in (34) is the extension of the Lewy–Stampacchia inequality. We prove this result in “Appendix B”. Our proof is based on an abstract argument, valid in much more general settings, given in [23].

The assumption of our theorem that \(f\in \mathrm{BV}(I)\) implies that \(F''=f'\) is a signed measure. If we apply (34) with \(\phi =F-\lambda \) and \(\psi = F+\lambda \), then we find that the taut string \(W_\lambda \) satisfies

$$\begin{aligned} -(F'')^- \le W_\lambda '' \le \,\, (F'')^+ \, . \end{aligned}$$

The estimate (4) follows if we substitute the identities \(F'=f\) and \(W_\lambda '=u_\lambda \) into the above inequality. \(\square \)
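To make the taut string picture concrete, the following minimal numerical sketch (an illustration only, not part of the proof) computes a discrete taut string by projected gradient on the bilateral obstacle problem and recovers \(u_\lambda =W_\lambda '\); the grid size, the signal and the iteration count are arbitrary choices.

```python
import numpy as np

# Sketch: discrete taut string by projected gradient on the obstacle problem
#   min (1/2)||W'||^2  subject to  F - lam <= W <= F + lam,  W(0) = W(1) = 0.
n, lam = 200, 0.02
h = 1.0 / n
x = (np.arange(n) + 0.5) * h
f = np.where(x < 0.5, 1.0, -1.0)                 # mean-zero signal, one node at 1/2
F = np.concatenate(([0.0], np.cumsum(f) * h))    # cumulative function, F(0) = F(1) = 0
W = F.copy()
W[0] = W[-1] = 0.0                               # feasible starting point
tau = h / 4.0                                    # step 1/L for the Dirichlet energy
for _ in range(200_000):
    grad = (2.0 * W[1:-1] - W[:-2] - W[2:]) / h  # discrete -W'' at interior nodes
    W[1:-1] = np.clip(W[1:-1] - tau * grad, F[1:-1] - lam, F[1:-1] + lam)
u = np.diff(W) / h                               # u_lambda = W_lambda'
print(u.min(), u.max())                          # approx -(1 - 2*lam) and (1 - 2*lam)
```

In accordance with the fundamental estimate, the derivative of the computed \(u_\lambda \) is supported at the single node of f: the jump is preserved in location but reduced in height.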

Having established Theorem 2, we are able to prove the following result about the strong convergence in \(\mathrm{BV}(I)\) of the ROF minimizer as the regularization parameter approaches zero.

Proposition 8

If \(f\in \mathrm{BV}(I)\) then

$$\begin{aligned} J(f-u_\lambda )=J(f)-J(u_\lambda ). \end{aligned}$$
(35)

In particular, both \(J(f-u_\lambda )\) and \(\Vert f-u_\lambda \Vert _{BV}\) tend to zero as \(\lambda \rightarrow 0+\).

Proof

The measures \((f')^+\) and \((f')^-\) are concentrated on disjoint measurable sets (Hahn decomposition, see [38, Sect. 6.14]), so Theorem 2 implies the pair of inequalities, \(0\le (u_\lambda ')^+\le (f')^+\) and \(0\le (u_\lambda ')^-\le (f')^-\). A direct calculation, using the fact that \(J(v)=(v')^+(I) + (v')^-(I)\) for any function \(v\in \mathrm{BV}(I)\), yields

$$\begin{aligned} J(f-u_\lambda )&= (f'-u_\lambda ')^+(I) +(f'-u_\lambda ')^-(I)\\&=(f')^+(I) - (u_\lambda ')^+(I) + (f')^-(I) - (u'_\lambda )^-(I)\\&= J(f)-J(u_\lambda ), \end{aligned}$$

where the right-hand side tends to zero as \(\lambda \rightarrow 0+\), by Proposition 6. \(\square \)

Theorem 2 also implies the first part of

Proposition 9

Suppose f is a piecewise constant function on I. 1) The ROF minimizer \(u_\lambda \) is again piecewise constant for all \(\lambda >0\). 2) There exists a number \({\bar{\lambda }}>0\) and a piecewise linear function \({\bar{\xi }}\in K\) such that \(\xi _\lambda ={\bar{\xi }}\) for all \(\lambda \), \(0<\lambda \le {\bar{\lambda }}\).

Proof

If f is piecewise constant, then there exist nodes \(a=x_0<x_1<\cdots<x_{N-1}<x_N=b\) which partition the interval \(I=(a,b]\) into N subintervals \(I_i=(x_{i-1},x_i]\) such that f equals the constant value \(f_i\in {\mathbf {R}}\) on \(I_i\) for \(i=1,\ldots ,N\). The derivative of the signal becomes

$$\begin{aligned} f'=\sum _{i=1}^{N-1} b_i\delta _{x_i} \end{aligned}$$
(36)

where \(b_i=f_{i+1}-f_i\) for \(i=1,\ldots ,N-1\) and \(\delta _x\) denotes the Dirac measure supported at x. We may assume that \(f_{i+1}\ne f_i\) and therefore \(b_i\ne 0\) for all \(i=1,\ldots ,N-1\). From (36), it follows that \(J(f)=\sum _{i=1}^{N-1} |b_i|<\infty \) so f belongs to \(\mathrm{BV}(I)\) and Theorem 2 is applicable:

$$\begin{aligned}&\sum _{i=1}^{N-1} \min \{0,b_i \}\delta _{x_i} \\&\quad = -(f')^- \le u_\lambda '\le (f')^+ \\&\quad = \sum _{i=1}^{N-1} \max \{0,b_i \}\delta _{x_i}\, . \end{aligned}$$

This estimate shows that there exist numbers \(c_i(\lambda )\) such that \(u_\lambda '=\sum _{i=1}^{N-1} c_i(\lambda ) \delta _{x_i}\) and that the \(c_i(\lambda )\)’s satisfy \(0\le c_i(\lambda )\cdot b_i^{-1}\le 1\) for \(i=1,\ldots ,N-1\). Since the derivative is zero except at a finite set of points, we draw the conclusion that \(u_\lambda \) is a piecewise constant function, as claimed. Observe that the set of nodes of \(u_\lambda \) is contained in the set of nodes of f. (The latter is the “edge-preserving” property of the one-dimensional ROF model.)

The proof of the second part of the proposition is suggested by the taut string interpretation of the ROF model: When the parameter \(\lambda \) is sufficiently close to zero, the taut string will, at each node of the signal f, meet either the upper obstacle \(F+\lambda \) or the lower obstacle \(F-\lambda \). The idea of the proof is to guess \({\bar{\xi }}\) and set

$$\begin{aligned} {\bar{u}}_\lambda = f - \lambda {\bar{\xi }}'\,, \end{aligned}$$
(37)

as required by (19a), and then verify the identity (19b). Then Theorem 3 implies that \({\bar{u}}_\lambda =u_\lambda \) and \({\bar{\xi }}=\xi _\lambda \), as claimed.

As \({\bar{\xi }}\) we choose the continuous piecewise linear function which is zero at the end points of the interval I and satisfies

$$\begin{aligned} {\bar{\xi }}(x_i) = -{\text {sign}}(b_i)\,,\quad i=1,\ldots ,N-1\,, \end{aligned}$$

and, as mentioned above, let \({\bar{u}}_\lambda \) be defined by (37). It is clear that \(\Vert {\bar{\xi }}\Vert _\infty \le 1\) so that \({\bar{\xi }}\in K\). Our task is now to verify that the pair \({\bar{u}}_\lambda , {\bar{\xi }}\) satisfies \(J({\bar{u}}_\lambda )=\langle {\bar{u}}_\lambda , {\bar{\xi }}'\rangle \). First, notice that

$$\begin{aligned} {\bar{u}}'_\lambda = f'-\lambda {\bar{\xi }}'' = \sum _{i=1}^{N-1} (b_i - \lambda d_i)\delta _{x_i}\,, \end{aligned}$$

where we have used that \({\bar{\xi }}\) is piecewise linear such that \({\bar{\xi }}''\) exists and equals \(\sum _{i=1}^{N-1}d_i\delta _{x_i}\) for some real numbers \(d_1,\ldots ,d_{N-1}\) (which could in principle be computed from the definition of \({\bar{\xi }}\) and knowledge of where the nodes of f are located). It is now clear that we may find a positive number \({\bar{\lambda }}\) such that if \(0\le \lambda <{\bar{\lambda }}\) then each of the numbers \(b_i-\lambda d_i\), used to represent \({\bar{u}}'_\lambda \), has the same sign as \(b_i\). (In fact, \({\bar{\lambda }}=(\max _{1\le i\le N-1}(d_i/b_i))^{-1}\) works when the maximum is positive; otherwise any \({\bar{\lambda }}>0\) will do.) For any \(\lambda \) smaller than \({\bar{\lambda }}\), we have

$$\begin{aligned} \langle {\bar{u}}_\lambda , {\bar{\xi }}'\rangle&= \langle -{\bar{u}}'_\lambda , {\bar{\xi }}\rangle \\&=\sum _{i=1}^{N-1} -(b_i-\lambda d_i)\langle \delta _{x_i},{\bar{\xi }}\rangle \\&=\sum _{i=1}^{N-1} (b_i-\lambda d_i){\text {sign}}(b_i)\\&=\sum _{i=1}^{N-1} |b_i-\lambda d_i| = J({\bar{u}}_\lambda ) \end{aligned}$$

as wanted, and the proof of the second part of the proposition is complete. \(\square \)

The proposition implies the strongest approximation rate imaginable:

Corollary 2

If f is a piecewise constant function, then \(\Vert f -u_\lambda \Vert _{L^2(I)} = O(\lambda )\) and \(J(f)-J(u_\lambda )=O(\lambda )\) as \(\lambda \rightarrow 0+\).

Proof

Clearly, \(f\in \mathrm{BV}(I)\) so the result will follow from the bounds in Proposition 7 provided \(J(f)=\langle \xi _0',f\rangle \) for some \(\xi _0\in K\). But this is a consequence of the result just proved: \(\xi _\lambda = {\bar{\xi }}\) when \(\lambda \) is close to zero, hence

$$\begin{aligned} J(f) = \lim _{\lambda \rightarrow 0+} J(u_\lambda ) = \lim _{\lambda \rightarrow 0+} \langle u_\lambda , {\bar{\xi }}'\,\rangle _{L^2(I)} = \langle f, {\bar{\xi }}'\,\rangle _{L^2(I)}, \end{aligned}$$

so we may take \(\xi _0={\bar{\xi }}\). \(\square \)

9 ROF Denoising as a Semi-Group

We now turn our attention to ROF denoising as the map \(f\mapsto u_\lambda \) and therefore need to adjust notation to incorporate the dependence of \(u_\lambda \) upon the signal f in a natural manner. For each \(\lambda \ge 0\), define a mapping \(S_\lambda : L^2(I) \rightarrow L^2(I)\) by setting, for any \(f\in L^2(I)\),

$$\begin{aligned} S_\lambda (f) = {\left\{ \begin{array}{ll} u_\lambda &{} \text {if } \lambda > 0, \text { and}\\ f &{} \text {when }\lambda = 0. \end{array}\right. } \end{aligned}$$
(38)

Then \(\{ S_\lambda \}_{\lambda \ge 0}\) is a family of nonlinear operators which we claim has the following properties:

  1. \(S_0= {\text {Id}}\), the identity mapping on \(L^2(I)\).

  2. For any \(f\in L^2(I)\), \([0,+\infty )\ni \lambda \mapsto S_\lambda (f)\in L^2(I)\) is continuous.

  3. For any \(\lambda \ge 0\), \(S_\lambda \) is non-expansive;

     $$\begin{aligned} \Vert S_\lambda (f_2)-S_\lambda (f_1)\Vert _{L^2(I)} \le \Vert f_2-f_1\Vert _{L^2(I)} \end{aligned}$$

     for all \(f_1,f_2\in L^2(I)\).

  4. \(S_\lambda \circ S_\nu = S_{\lambda +\nu }\) for all \(\lambda ,\nu \ge 0\).

Property 1) follows, of course, from the definition of \(S_\lambda \) and reflects our intuition associated with the definition of \(u_\lambda \) in (1). This intuition is confirmed by Proposition 6 which states that \(u_\lambda \rightarrow f\) in \(L^2\) as \(\lambda \rightarrow 0+\). This observation together with part (b) of Proposition 2 implies the second property in the above list. The third property is simply part (a) of Proposition 2 rewritten in terms of \(S_\lambda \). If the fourth and last property can be proved, then the family \(\{S_\lambda \}_{\lambda \ge 0}\) satisfies the axioms of a semi-group of nonlinear operators on the Hilbert space \(L^2(I)\), see Barbu [7, Ch. III.1, p.98]. The last property is the content of the following proposition:

Proposition 10

(Semi-group property) For \(f\in L^2(I)\), the formula

$$\begin{aligned} S_\lambda ( S_\nu (f)) = S_{\lambda +\nu }(f) \end{aligned}$$
(39)

holds for all \(\lambda ,\nu \ge 0\).

A proof of the semi-group property can be found in [39]. However, the fundamental estimate in Theorem 2 and the characterization of the ROF minimizer in Theorem 3 allow us to present a short and very direct proof of this result.

Proof

The assertion holds trivially if either \(\lambda \) or \(\nu \) is equal to zero, so we may assume that \(\lambda ,\nu > 0\). The idea of the proof is to set

$$\begin{aligned} {\bar{u}} = S_\nu (u_\lambda ) = S_\nu (S_\lambda (f)) \end{aligned}$$

and then show that there exists a dual variable \({\bar{\xi }}\in K\) such that

$$\begin{aligned} {\left\{ \begin{array}{ll} {\bar{u}} = f - (\lambda +\nu ){\bar{\xi }}'\quad \text {and }\\ J({\bar{u}}) = \langle {\bar{u}} , {\bar{\xi }}' \rangle . \end{array}\right. } \end{aligned}$$

For then, the characterization of the ROF minimizer in Theorem 3 implies that \({\bar{u}}=S_{\lambda +\nu }(f)\) and (39) holds.

Since \(u_\lambda \) and \({\bar{u}}\) are both defined as ROF minimizers, they satisfy the conditions (19a) and (19b) of Theorem 3. That is, there exist uniquely determined functions \(\xi _\lambda \) and \({\tilde{\xi }}_\nu \) in K such that

$$\begin{aligned} {\left\{ \begin{array}{ll} u_\lambda = f - \lambda \xi _\lambda '\; ,\\ J(u_\lambda ) = \langle u_\lambda , \xi _\lambda '\rangle , \end{array}\right. } \quad \text {and}\qquad {\left\{ \begin{array}{ll} {\bar{u}} = u_\lambda - \nu {\widetilde{\xi }}_\nu '\; ,\\ J({\bar{u}}) = \langle {\bar{u}} , {\tilde{\xi }}_\nu ' \rangle . \end{array}\right. } \end{aligned}$$

Here, the tilde in \({\widetilde{\xi }}_\nu \) signifies that we are dealing with the dual variable in the denoising of \(u_\lambda \) rather than of f.

If we set

$$\begin{aligned} {\bar{\xi }} = \frac{\lambda \xi _\lambda + \nu {\widetilde{\xi }}_\nu }{\lambda + \nu }, \end{aligned}$$

then \({\bar{\xi }}\in K\) because it is the convex combination of two elements of K. Using the above characterizations of \(u_\lambda \) and \({\bar{u}}\), we find that

$$\begin{aligned} f - (\lambda +\nu ){\bar{\xi }}' = (f - \lambda \xi _\lambda ')-\nu {\widetilde{\xi }}_\nu ' = u_\lambda -\nu {\widetilde{\xi }}_\nu ' = {\bar{u}}\, , \end{aligned}$$

in other words \({\bar{u}}\) and \({\bar{\xi }}\) fulfil (19a), by construction. It remains to verify that (19b) holds as well. We have

$$\begin{aligned} \langle {\bar{u}} , {\bar{\xi }}'\rangle&= \frac{\lambda }{\lambda +\nu }\langle {\bar{u}} , \xi _\lambda '\rangle + \frac{\nu }{\lambda +\nu }\langle {\bar{u}} , {\widetilde{\xi }}_\nu '\rangle \\&=\frac{\lambda }{\lambda +\nu }\langle {\bar{u}} , \xi _\lambda '\rangle + \frac{\nu }{\lambda +\nu }J({\bar{u}}) \, , \end{aligned}$$

where we have used the characterization of \({\bar{u}}\) stated above. We see that (19b) follows if it can be shown that \(\langle {\bar{u}} , \xi _\lambda '\rangle = J({\bar{u}})\). This follows from the identity in Proposition 8 applied with \(u_\lambda \) as the input signal:

$$\begin{aligned} J({\bar{u}})=J(u_\lambda ) - J(u_\lambda -{\bar{u}})\, . \end{aligned}$$

In fact, since \(J(v) \ge \langle v,\xi '\rangle \) for all \(v\in \mathrm{BV}(I)\) and \(\xi \in K\), applied here with \(v=u_\lambda -{\bar{u}}\) and \(\xi =\xi _\lambda \), this identity implies the inequality

$$\begin{aligned} J({\bar{u}})&\le J(u_\lambda ) - \langle u_\lambda -{\bar{u}} , \xi _\lambda '\rangle \\&= J(u_\lambda ) - J(u_\lambda ) + \langle {\bar{u}} , \xi _\lambda '\rangle \\&= \langle {\bar{u}} , \xi _\lambda '\rangle \end{aligned}$$

hence \(J({\bar{u}}) = \langle {\bar{u}} ,\xi _\lambda '\rangle \) and the proof is complete. \(\square \)

The last part of the proof yields

Corollary 3

If \(\lambda > 0 \) then \(J(u_{\lambda }) = \langle u_{\lambda } , \xi _\nu ' \rangle _{L^2(I)}\) for all \(\nu \), \(0<\nu \le \lambda \).

That is, the total variation of \(u_\lambda \) can be computed by taking the inner product with any of the previous \(\xi _\nu \)’s.
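As an illustration of Proposition 10 (a numerical sketch only, in a discrete analogue of the model), the denoising map can be computed by projected gradient on the dual problem \(\min _{\Vert \xi \Vert _\infty \le 1}\frac{1}{2}\Vert f-\lambda D^T\xi \Vert ^2\), mirroring (19a), and the semi-group identity then checked directly; the signal, the parameters and the iteration counts below are arbitrary choices.

```python
import numpy as np

# Sketch: discrete ROF prox via projected gradient on the dual problem;
# u = f - lam * D^T xi with |xi_i| <= 1, where (D v)_i = v[i+1] - v[i].
def rof(f, lam, n_iter=200_000):
    if lam == 0.0:
        return f.copy()
    xi = np.zeros(f.size - 1)
    for _ in range(n_iter):
        u = f + lam * np.diff(xi, prepend=0.0, append=0.0)  # u = f - lam * D^T xi
        xi = np.clip(xi + np.diff(u) / (4.0 * lam), -1.0, 1.0)
    return f + lam * np.diff(xi, prepend=0.0, append=0.0)

rng = np.random.default_rng(0)
f = np.repeat(rng.normal(size=8), 25) + 0.05 * rng.normal(size=200)
lam, nu = 0.5, 0.3
err = np.max(np.abs(rof(rof(f, nu), lam) - rof(f, lam + nu)))
print(err)   # small (up to solver accuracy): S_lam(S_nu(f)) = S_{lam+nu}(f)
```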

We now know that ROF denoising defines a family of nonlinear operators \(\{S_\lambda \}_{\lambda \ge 0}\) which forms a contractive semi-group under composition. It is natural to seek the infinitesimal generator of this semi-group.

The infinitesimal generator, should it exist, is given by the right-hand derivative of \(S_\lambda (f)\) at \(\lambda =0\):

$$\begin{aligned} \lim _{\lambda \rightarrow 0+} \frac{S_\lambda (f) - f}{\lambda } = \lim _{\lambda \rightarrow 0+} \frac{u_\lambda - f}{\lambda } = -\lim _{\lambda \rightarrow 0+} \xi _\lambda '\, , \end{aligned}$$

where we have used (19a). If the limit \(\lim _{\lambda \rightarrow 0+}\xi _\lambda := \xi _0\) exists in \(H^1(I)\), then it follows that

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}\lambda }S_\lambda (f)\big |_{\lambda =0} = -\xi _0'. \end{aligned}$$
(40)

So in order to determine the infinitesimal generator of the semi-group, we first need to show that the limit \(\xi _0:=\lim _{\lambda \rightarrow 0+}\xi _\lambda \) exists for sufficiently many \(f\in \mathrm{BV}(I)\) and then characterize \(\xi _0\) in terms of f alone. A first step in this characterization is based on Lemma 3, where the notion of the subgradient of a convex function is used.

We recall the definition of the subdifferential of a convex functional, restricting ourselves to the Hilbert space case. Let H be a real Hilbert space and \({\varPhi }:H\rightarrow (-\infty ,\infty ]\) a lower semi-continuous convex functional defined on H such that \({\text {Dom}}{\varPhi } := \{x\in H\, : \, {\varPhi }(x)<\infty \}\) is non-empty. Let \(x_0\in {\text {Dom}}{\varPhi }\) and suppose there is a vector \(y\in H\) such that the following inequality holds

$$\begin{aligned} {\varPhi }(x) - {\varPhi }(x_0) \ge \langle y, x-x_0\rangle \quad \text {for all } x\in {\text {Dom}}{\varPhi }. \end{aligned}$$

Then y is called a subgradient of \({\varPhi }\) at \(x_0\). The set of all such subgradients is called the subdifferential of \({\varPhi }\) at \(x_0\) and is denoted \(\partial {\varPhi }(x_0)\). The map \(x\mapsto \partial {\varPhi }(x)\) is a set-valued operator. It is possible that \(\partial {\varPhi }(x_0)=\emptyset \). The map \(x\mapsto \partial {\varPhi }(x)\) is monotone in the sense that if \(x,x_0\) are points satisfying \(\partial {\varPhi }(x)\ne \emptyset \) and \(\partial {\varPhi }(x_0)\ne \emptyset \), then for any \(\xi \in \partial {\varPhi }(x)\) and \(\xi _0\in \partial {\varPhi }(x_0)\) we have \(\langle \xi -\xi _0, x-x_0\rangle \ge 0\). This follows immediately from the definition of the subgradient.

If we take \(H=L^2(I)\) and let \({\varPhi }(u)=J(u)\) be the total variation of u, then \({\text {Dom}}J=\mathrm{BV}(I)\) and we can characterize the subdifferential \(\partial J\) in the following manner:

Lemma 3

Let \(u_0\in {\text {Dom}}J\). Then \(\eta \in \partial J(u_0)\) if and only if there exists \(\xi _0\in K\) such that \(\eta =\xi _0'\) and \(J(u_0)=\langle u_0,\xi _0'\rangle \).

The total variation J is considered as a function on \(L^2(I)\) so \(\eta \in L^2(I)\). Example 3 in Sect. 3 shows that there are cases where \(u_0\in {\text {Dom}}J\) but \(\partial J(u_0)=\emptyset \). The lemma is the one-dimensional instance of a more general multidimensional result, see Alter et al. [1, Lemma 1, p. 335] as well as Bellettini et al. [8]. For completeness of exposition, we provide a proof:

Proof

Assume first that the equality \(J(u_0)=\langle u_0,\xi _0'\rangle \) holds for some \(\xi _0\in K\) and let \(u\in \mathrm{BV}(I)\). By the definition of the total variation, we have \(J(u)\ge \langle u,\xi _0'\rangle \). Subtracting the first identity from this inequality yields

$$\begin{aligned} J(u)-J(u_0)\ge \langle u-u_0,\xi _0'\rangle \, , \end{aligned}$$
(41)

which is precisely the condition for \(\xi _0'\in \partial J(u_0)\).

Conversely, suppose \(\eta \in \partial J(u_0)\), i.e. that the inequality (41) holds for all \(u\in \mathrm{BV}(I)\). If we set \(\xi _0(x)=\int _a^x\eta (t)\,\mathrm{d}t\) then \(\xi _0\in H^1\) with \(\eta =\xi _0'\). Clearly, \(\xi _0(a)=0\) so to conclude that \(\xi _0\in H_0^1(I)\) we need to show that \(\xi _0(b)=0\). This, on the other hand, follows if we can show that \(\langle \eta , 1\rangle =0\). By substituting \(u = u_0+1\) into the definition of the subgradient of J at \(u_0\), we get

$$\begin{aligned} 0 = J(u)-J(u_0) \ge \langle \eta , u - u_0\rangle = \langle \eta ,1\rangle . \end{aligned}$$

Similarly, if \(u=u_0-1\) we find the reverse inequality and may conclude that \(\langle \eta , 1\rangle =0\). Consequently, \(\xi _0\in H_0^1(I)\).

Our next aim is to show that \(\xi _0\) belongs to K. For any \(v\in \mathrm{BV}(I)\), we set \(u=u_0+v\). An application of the triangle inequality for J and the inequality (41) implies

$$\begin{aligned} J(v)\ge J(u_0+v)-J(u_0)\ge \langle v, \xi _0'\rangle \, . \end{aligned}$$

Since this inequality holds for all \(v\in \mathrm{BV}(I)\), we conclude that \(\Vert \xi _0\Vert _\infty \le 1\), and hence, \(\xi _0\in K\). (In fact, any \(v\in H^1\) belongs to BV with \(J(v) = \Vert v'\Vert _{L^1}\) and may therefore be used in the above estimate;

$$\begin{aligned} \Vert v'\Vert _{L^1} = J(v)\ge \langle v, \xi _0'\rangle = \langle -v',\xi _0\rangle \,, \end{aligned}$$

where integration by parts was used in the last step. This shows that \(L^2\ni g\mapsto \langle g, \xi _0\rangle \in {\mathbf {R}}\) satisfies \(|\langle g,\xi _0\rangle |\le \Vert g\Vert _{L^1}\) and therefore extends by continuity to a bounded linear functional on \(L^1\). Riesz’ theorem then implies that \(\xi _0\in L^\infty \) with \(\Vert \xi _0\Vert _{\infty }\le 1\).)

Now, if we take \(u=(1/2)u_0\) in (41) and use the positive homogeneity of J then this inequality becomes

$$\begin{aligned} -\frac{1}{2}J(u_0) \ge -\frac{1}{2}\langle u_0,\xi _0'\rangle \end{aligned}$$

or

$$\begin{aligned} J(u_0) \le \langle u_0,\xi _0'\rangle \, , \end{aligned}$$

which, in view of the definition of J, implies \(J(u_0) = \langle u_0,\xi _0'\rangle \), and the proof is complete. \(\square \)

In view of Lemma 3, the necessary and sufficient conditions (19a) and (18) in Theorem 3 can be reformulated as \(\lambda ^{-1}(f-u_\lambda )\in \partial J(u_\lambda )\), i.e. Fermat’s rule for a minimum (for convex functions). We note in passing that the equation \(u+\partial J(u)\ni f\) has a unique solution (namely \(u_\lambda \) with \(\lambda =1\)) for each \(f\in L^2(I)\). Hence, \(\partial J\) is a maximal monotone (or \(-\partial J\) is a maximal dissipative) set-valued operator on \(L^2\), cf. Barbu [7, p. 71].

Now, suppose that \(f\in L^2(I)\) is such that the limit \(\xi _\lambda \rightarrow \xi _0\) exists in \(H_0^1(I)\) as \(\lambda \rightarrow 0+\). Here, as usual, \(\xi _\lambda \) is the unique element in K associated with the ROF minimizer \(u_\lambda =S_\lambda (f)\). Clearly, \(\xi _0\in K\). Since the identity

$$\begin{aligned} J(u_\lambda ) = \langle u_\lambda , \xi _\lambda '\rangle \end{aligned}$$

holds for all \(\lambda >0\) and \(u_\lambda \rightarrow f\) in \(L^2(I)\) as \(\lambda \rightarrow 0+\) by Proposition 6, the lower semi-continuity of the total variation implies

$$\begin{aligned} J(f) \le \lim _{\lambda \rightarrow 0+} \langle u_\lambda , \xi _\lambda '\rangle = \langle f,\xi _0'\rangle \, . \end{aligned}$$

We conclude from this that \(f\in {\text {Dom}}J=\mathrm{BV}(I)\) with \(J(f)=\langle f,\xi _0'\rangle \). Moreover, Lemma 3 implies that \(\xi _0'\in \partial J(f)\) so, by (40), we arrive at

$$\begin{aligned} -\frac{\mathrm{d}}{\mathrm{d}\lambda }S_\lambda (f)\Big |_{\lambda =0} = \xi _0'\in \partial J(f). \end{aligned}$$

Notice that the derivative \((\mathrm{d}/\mathrm{d}\lambda )S_\lambda (f)\big |_{\lambda =0}\) exists for all f in a dense subset of \(L^2(I)\). To see this, we use that the limit \(\lim _{\lambda \rightarrow 0+}\xi _\lambda := \xi _0\) exists whenever f is a piecewise constant function (by Proposition 9) and that the piecewise constant functions on I are dense in \(L^2(I)\). We have proved (Cf. [7, Th. 1.2, p.175]):

Theorem 4

The infinitesimal generator of the nonlinear contractive semi-group \(\{ S_\lambda \}_{\lambda \ge 0}\) is the nonlinear set-valued mapping \(f\mapsto -\partial J(f)\).

Using the formal notation introduced in Sect. 5, this result may be expressed by saying that \(u(x,\lambda ) := u_\lambda (x)\) solves the following Cauchy problem:

$$\begin{aligned} \partial _\lambda u = \partial _x\Big ( \frac{\partial _x u}{|\partial _x u|}\Big )\quad \text {on }I\times [0,\infty )\text { and } u(\cdot ,0) = f. \end{aligned}$$

The nonlinear parabolic PDE above is known as the total variation flow. The total variation flow in higher dimensions has been extensively studied in Andreu et al. [3, 4], Bellettini et al. [8] and Alter et al. [1]. They all start with the construction of the minimizing gradient flow associated with the total variation functional and then go on to derive its various properties, including its relation to the ROF model. Here, we have started at the other end: using the nice properties of the one-dimensional case, we proved that the ROF model gives rise to a semi-group of nonlinear contractive operators and then derived its infinitesimal generator, which turns out to be minus the subdifferential of the total variation.

In view of the characterization of J’s subgradient given in Lemma 3, it follows from Example 3 that there exist \(f\in L^2(I)\) such that \(\partial J(f)\) consists of more than one element. If f is such a function and the derivative \(-(\mathrm{d}/\mathrm{d}\lambda )S_\lambda (f)|_{\lambda =0}\) exists, then it is reasonable to ask which element of \(\partial J(f)\) this derivative corresponds to. The answer is provided by the following result.

Proposition 11

Suppose \(f\in L^2(I)\) is such that the derivative \( (\mathrm{d}/\mathrm{d}\lambda )S_\lambda (f)\big |_{\lambda =0} := -\xi _0'\) exists. Then \(\xi _0'\) is the element in \(\partial J(f)\) with the smallest \(L^2\)-norm.

Proof

The assumption on f implies that the limit \(\xi _\lambda \rightarrow \xi _0\) exists in \(H_0^1(I)\) as \(\lambda \rightarrow 0+\). For each \(\lambda >0\), \(\xi _\lambda \) is characterized as the unique member of K which solves

$$\begin{aligned} \max _{\xi \in K} \Big \{ \Vert f\Vert ^2-\Vert f-\lambda \xi '\Vert ^2\Big \}\, . \end{aligned}$$

It follows that

$$\begin{aligned} 2\langle f,\xi _\lambda '\rangle - \lambda \Vert \xi _\lambda '\Vert ^2\ge 2\langle f,\xi '\rangle - \lambda \Vert \xi '\Vert ^2 \end{aligned}$$

for all \(\xi \in K\). If we pick \(\xi \in K\) such that \(\xi '\in \partial J(f)\), then \(\langle f,\xi '\rangle = J(f)\), by the lemma, and we find

$$\begin{aligned} \Vert \xi '\Vert ^2 - \Vert \xi _\lambda '\Vert ^2 \ge \frac{2}{\lambda }\Big \{ J(f) - \langle f,\xi _\lambda '\rangle \Big \} \ge 0\, , \end{aligned}$$

which holds for all \(\lambda >0\). Letting \(\lambda \rightarrow 0+\), and using that \(\xi _\lambda '\rightarrow \xi _0'\) in \(L^2\), gives \(\Vert \xi '\Vert \ge \Vert \xi _0'\Vert \) for every \(\xi '\in \partial J(f)\), which is the desired result. \(\square \)

A very detailed and well-written analysis of the one-dimensional total variation flow can be found in the paper by Bonforte and Figalli [11]. Here, the theory of the flow is developed as a limit of a time-discretized problem (the Crandall–Liggett approach) which leads them to study certain properties of the ROF functional, some of which are close to ours (e.g. Lemma 2.3 in [11] seems to contain the same insight as our Theorem 2.)

10 Application to Fused Lasso

In this section and the next, we briefly analyze two TV-minimization problems that are related to the ROF model. The first is a generalization of the ROF model obtained by adding a positive multiple of the \(L^1\)-norm of the variable u to the ROF functional (14):

$$\begin{aligned} E_{\lambda ,\mu }(u) = \lambda J(u) + \mu \Vert u \Vert _{L^1(I)} + \frac{1}{2}\Vert f - u\Vert _{L^2(I)}^2\;, \end{aligned}$$
(42)

where \(\lambda , \mu \ge 0\) are regularization parameters. Suppose f is a piecewise constant function on \(I=(a,b]\) with equidistant nodes \(a=x_0<x_1<\cdots<x_{N-1}<x_N=b\) and with the constant value \(f_i\) on each subinterval \((x_{i-1},x_i]\) for \(i=1,\ldots ,N\), i.e.

$$\begin{aligned} f=\sum _{i=1}^N f_i\chi _{(x_{i-1},x_i]}. \end{aligned}$$

Also, restrict minimization to the set of functions u which are piecewise constant with the same nodes as f and the constant value \(u_i\) on the ith subinterval; \(u=\sum _{i=1}^N u_i\chi _{(x_{i-1},x_i]}\). Substitution of such f and u into (42) leads to minimizing the following discretized functional,

$$\begin{aligned} E_{\lambda ,\mu }(u) = \lambda \sum _{i=1}^{N-1} |u_{i+1}-u_i| + \mu \sum _{i=1}^N|u_i| + \frac{1}{2} \sum _{i=1}^N |f_i-u_i|^2. \end{aligned}$$

This minimization problem, known as the fused lasso model (in Lagrange form), was introduced in Tibshirani et al. [43]. The functional is strictly convex and has compact sub-level sets and therefore possesses a unique minimizer \(u^*=\sum _{i=1}^N u^*_i\chi _{(x_{i-1},x_i]}\) which is called the fused lasso signal approximator. The idea of the fused lasso model is to simultaneously promote sparsity in \(u^*\) and its (discrete) derivative. As shown by Friedman et al. [22], there is a close relationship between the fused lasso model and the discrete ROF model (i.e. \(E_{\lambda ,\mu }\) with \(\mu =0\)): the fused lasso signal approximator \(u^*\) can be obtained from the discrete ROF minimizer \(u_\lambda = \sum _{i=1}^N u_{\lambda ,i}\chi _{(x_{i-1},x_i]}\) by soft thresholding at level \(\mu \),

$$\begin{aligned} u^*_i = {\left\{ \begin{array}{ll} u_{\lambda ,i} - \mu {\text {sign}}(u_{\lambda ,i}) &{} \text {for } |u_{\lambda ,i}|>\mu \\ 0 &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

for \(i=1,\ldots , N\).
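In code, the componentwise rule above is ordinary soft thresholding. The following small sketch (an illustration with arbitrary test values) checks that it agrees with the closed-form shrinkage map (44) used in Theorem 5 below.

```python
import numpy as np

def soft_piecewise(t, mu):                   # the componentwise rule above
    return np.where(np.abs(t) > mu, t - mu * np.sign(t), 0.0)

def soft_closed(t, mu):                      # the shrinkage map (44)
    return t - mu * t / np.maximum(mu, np.abs(t))

t = np.linspace(-3.0, 3.0, 1201)
print(np.max(np.abs(soft_piecewise(t, 1.0) - soft_closed(t, 1.0))))   # 0.0
```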

The purpose of this section is to extend this result to the continuous fused lasso model (42). In the continuous case, the fused lasso signal approximator is defined by

$$\begin{aligned} u_{\lambda ,\mu } = \mathop {{{{\mathrm{arg}}~{\mathrm{min}}}}}\limits _{u\in \mathrm{BV}(I)} E_{\lambda ,\mu }(u)\;. \end{aligned}$$

The claimed relation to the ROF model reads as follows:

Theorem 5

The fused lasso signal approximator \(u_{\lambda ,\mu }\) can be computed, for all \(\lambda , \mu \ge 0\), from the ROF minimizer \(u_\lambda \) by the formula

$$\begin{aligned} u_{\lambda ,\mu } = {Sr}_{[-\mu ,\mu ]} \circ u_\lambda \;, \end{aligned}$$
(43)

where the function \({Sr}_{[-\mu ,\mu ]}:{\mathbf {R}}\rightarrow {\mathbf {R}}\) given by

$$\begin{aligned} {Sr}_{[-\mu ,\mu ]}(t) = t - \frac{\mu t}{\max (\mu , |t|)}\;,\quad t\in {\mathbf {R}}\;, \end{aligned}$$
(44)

is the soft threshold map (or shrinkage map) at level \(\mu \) (cf. Sect. 2).

Proof

The total variation is again expressed in terms of a dual variable \(\xi \) as \(J(u)=\sup _{\xi \in K}\langle u,\xi '\rangle \). We follow the same pattern and represent the \(L^1\)-norm by \(\Vert u\Vert _{L^1} = \sup _{\eta \in B}\langle u,\eta \rangle \) where \(B:=\{ \eta \in L^{\infty }(I)\; : \; \Vert \eta \Vert _{\infty } \le 1\}\) denotes the closed unit ball in \(L^\infty \) (which is a subset of \(L^2(I)\) because I is bounded). This formula holds because \(L^\infty \) is the dual of \(L^1\), cf. [38, Thm. 6.16]. It follows that (42) may be written as

$$\begin{aligned} E_{\lambda ,\mu }(u) = \sup _{\eta \in B,\xi \in K} \Big \{ \lambda \langle u,\xi '\rangle + \mu \langle u,\eta \rangle + \frac{1}{2}\Vert f - u\Vert ^2\Big \}\;. \end{aligned}$$

Now, the convex sets \(K':=\{ \zeta =\xi '\, :\, \xi \in K \}\) and B are both closed in \(L^2(I)\). The same is true for the dilated sets \(\lambda K'\) and \(\mu B\) and for their Minkowski sum

$$\begin{aligned} C:=\lambda K' + \mu B. \end{aligned}$$

We can therefore rewrite the fused lasso energy as

$$\begin{aligned} E_{\lambda ,\mu }(u) = \sup _{\zeta \in C} \Big \{ \langle u,\zeta \rangle + \frac{1}{2}\Vert f - u\Vert ^2\Big \}\;. \end{aligned}$$

By following the proof of Theorem 3 step by step, we can show that

$$\begin{aligned} \min E_{\lambda ,\mu }(u) = \max _{\zeta \in C}\frac{1}{2}\Big \{ \Vert f\Vert ^2 -\Vert f-\zeta \Vert ^2\Big \} \end{aligned}$$
(45)

where equality is attained for a unique pair of functions \(u_*\in \mathrm{BV}(I)\) and \(\zeta _*\in C\). Moreover, this pair is characterized by the condition that

$$\begin{aligned} u_* = f - \zeta _*\quad \text {and}\quad \langle f-\zeta _*, \zeta -\zeta _*\rangle \le 0\;,\quad \forall \zeta \in C\;. \end{aligned}$$

By definition, \(u_*=u_{\lambda ,\mu }\), the fused lasso estimator.

Fig. 3 Graphical illustrations of the taut string interpretation of isotonic regression

Now, the second member \(\zeta _*\) may be written as a sum \(\zeta _* = \lambda \xi _*' + \mu \eta _*\) for some \(\xi _*\in K\) and \(\eta _*\in B\). This decomposition of \(\zeta _*\) is usually not unique, but the following argument holds for any such decomposition. The necessary and sufficient condition for optimality in (45) may be expressed as

$$\begin{aligned} {\left\{ \begin{array}{ll} \langle f-\lambda \xi _*'-\mu \eta _*\;,\; \zeta - \lambda \xi _*'-\mu \eta _* \rangle \le 0\;,\\ \text {for all}\quad \zeta \in \lambda K'+\mu B\;. \end{array}\right. } \end{aligned}$$
(46)

This, in turn, can be split into a pair of conditions by choosing first \(\zeta = \lambda \xi '+\mu \eta _*\) for \(\xi \in K\) and then \(\zeta = \lambda \xi _*' + \mu \eta \) for \(\eta \in B\) in (46):

$$\begin{aligned} \langle f-\lambda \xi _*'-\mu \eta _*\;,\; \xi ' - \xi _*'\rangle \le 0\;,&\quad \forall \xi \in K\;, \end{aligned}$$
(47)
$$\begin{aligned} \langle f-\lambda \xi _*'-\mu \eta _*\;,\; \eta -\eta _* \rangle \le 0\;,&\quad \forall \eta \in B\;. \end{aligned}$$
(48)

These conditions together are both necessary and sufficient for \(\zeta _* = \lambda \xi _*' + \mu \eta _*\) to be the maximizer of the right-hand side of (45). By the projection theorem, (48) implies that \(\mu \eta _*={Pr}_{\mu B}(f-\lambda \xi _*')\). This means that the fused lasso solution can be written in terms of \(\xi _*\) as

$$\begin{aligned} u_{\lambda ,\mu }&= f-\zeta _*\nonumber \\&= f - \lambda \xi _*' -\mu \eta _*\nonumber \\&= (f-\lambda \xi _*')-{Pr}_{\mu B}(f-\lambda \xi _*')\nonumber \\&={Sr}_{[-\mu ,\mu ]}(f-\lambda \xi _*'). \end{aligned}$$
(49)

It remains to be proved that we can take \(\xi _*\) to be the optimal dual variable \(\xi _\lambda \) in the ROF model. This is a consequence of the following argument: From Sect. 2, we know that the projection \(\mu \eta _*\) of \(f-\lambda \xi _*'\) onto \(\mu B\) can be computed explicitly by the formula

$$\begin{aligned} \mu \eta _* = \frac{\mu \, (f-\lambda \xi _*')}{\max (\mu , |f-\lambda \xi _*'|)}\;. \end{aligned}$$
(50)

If this expression is substituted into (47), it follows that \(\xi _*\) satisfies:

$$\begin{aligned} \langle \, {Sr}_{[-\mu ,\mu ]} (f-\lambda \xi _*') \, , \, \xi '-\xi _*'\,\rangle \le 0 \quad \forall \xi \in K\;, \end{aligned}$$
(51)

where \({Sr}_{[-\mu ,\mu ]}\) is the soft threshold (44). Now, let the function \(H:{\mathbf {R}}\rightarrow {\mathbf {R}}\) be defined by

$$\begin{aligned} H(t) = {\left\{ \begin{array}{ll} \frac{1}{2}(t-\mu )^2 &{} \text {for }t\ge \mu \;,\\ 0 &{} \text {when }-\mu \le t\le \mu \text { and}\\ \frac{1}{2}(t+\mu )^2 &{} \text {for } t\le -\mu \;. \end{array}\right. } \end{aligned}$$
(52)

This function is convex and differentiable, and it is easy to check that its derivative is \(H'(t) = {Sr}_{[-\mu ,\mu ]}(t)\). Thus, (51) can be rewritten as

$$\begin{aligned} \langle \, H' (f-\lambda \xi _*') \, , \, \xi '-\xi _*'\,\rangle \le 0 \quad \forall \xi \in K\;, \end{aligned}$$

which is the necessary and sufficient condition for \(\xi _*\in K\) to be a solution to the minimization problem

$$\begin{aligned} \inf _{\xi \in K} L_H(F-\lambda \xi )\,\text { where }\, L_H(W):=\int _I H(W')\,\mathrm{d}x. \end{aligned}$$
(53)

In summary, for any decomposition of \(\zeta _*\) into a sum \(\lambda \xi _*'+\mu \eta _*\) where \(\xi _*\in K\) and \(\eta _*\in B\), the first function \(\xi _*\) solves (53) and the second one \(\eta _*\) can be computed from \(\xi _*\) using (50). Conversely, if we find a solution \(\xi _*\) of (53) and define \(\eta _*\) by the explicit formula in (50) then \(\zeta _*=\lambda \xi _*'+\mu \eta _*\) maximizes the right-hand side in (45). Now, Lemma 1 implies that the optimal dual variable \(\xi _\lambda \) of the ROF model is a solution to (53). Since \(u_\lambda = f-\lambda \xi _\lambda '\), the formula (43) follows immediately from (49). \(\square \)
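A numerical sanity check of (43) in the discrete setting (an illustration only, reusing the dual projected-gradient sketch of Sect. 9): we soft-threshold the discrete ROF output and verify, coarsely, that random perturbations do not lower the discrete fused lasso energy. All sizes and parameters are arbitrary choices.

```python
import numpy as np

def rof(f, lam, n_iter=200_000):             # dual projected gradient, as before
    xi = np.zeros(f.size - 1)
    for _ in range(n_iter):
        u = f + lam * np.diff(xi, prepend=0.0, append=0.0)
        xi = np.clip(xi + np.diff(u) / (4.0 * lam), -1.0, 1.0)
    return f + lam * np.diff(xi, prepend=0.0, append=0.0)

def energy(u, f, lam, mu):                   # the discrete functional E_{lam,mu}
    return (lam * np.abs(np.diff(u)).sum() + mu * np.abs(u).sum()
            + 0.5 * ((f - u) ** 2).sum())

rng = np.random.default_rng(1)
f = np.repeat(rng.normal(size=6), 20)
lam, mu = 0.4, 0.2
u_rof = rof(f, lam)
u_star = u_rof - mu * u_rof / np.maximum(mu, np.abs(u_rof))   # Sr applied to u_lam
E0 = energy(u_star, f, lam, mu)
gaps = [energy(u_star + 0.05 * rng.normal(size=f.size), f, lam, mu) - E0
        for _ in range(100)]
print(min(gaps))   # positive: no tested perturbation improves the energy
```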

11 Application to Isotonic Regression

As a second example, we briefly outline, mostly without proofs, how the theory developed for the ROF model can be modified in order to derive the so-called “lower convex envelope” interpretation of the isotonic regression estimator. Isotonic regression is a method from mathematical statistics used for nonparametric estimation of probability distributions, see, for instance, [5]. It is a least-squares problem with a monotonicity constraint: for \(f\in L^2(I)\), determine a non-decreasing function \(u_\uparrow \in L^2(I)\) which solves the minimization problem,

$$\begin{aligned} \min _{u\in L^2_\uparrow (I)}\frac{1}{2}\Vert u-f\Vert _{L^2(I)}^2\;, \end{aligned}$$
(54)

where \(L^2_\uparrow (I)\) denotes the set of all non-decreasing functions in \(L^2(I)\). The “lower convex envelope” interpretation is shown for a piecewise constant signal f in Fig. 3.

The idea is to reformulate (54) as an unconstrained optimization problem by replacing the total variation term J of the ROF functional by a regularization term \(J_\uparrow \) that can tell whether a function is non-decreasing or not. To achieve this, we set

$$\begin{aligned} K_+ = \big \{ \xi \in H_0^1(I)\, :\, \xi (x)\ge 0\text { for all } x\in I \big \} \end{aligned}$$

and define the functional

$$\begin{aligned} J_\uparrow (u) = \mathop {{\text {sup}}}\limits _{\xi \in K_+} \, \langle u,\xi '\rangle _{L^2(I)}. \end{aligned}$$

It can be shown that

$$\begin{aligned} J_\uparrow (u) = {\left\{ \begin{array}{ll} 0 &{} \text {if } u\in L_\uparrow ^2(I)\;,\\ +\infty &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

In the terminology of convex analysis, \(J_\uparrow \) is the indicator function of the closed convex cone \(L_\uparrow ^2(I)\). It follows that the isotonic regression problem (54) is equivalent to finding the minimizer \(u_\uparrow \) in \(L^2(I)\) of the functional

$$\begin{aligned} E_\uparrow (u) := J_\uparrow (u) + \frac{1}{2}\Vert u-f\Vert _{L^2(I)}^2. \end{aligned}$$
(55)

Notice that there is no need for a positive weight (such as the \(\lambda \) in the ROF functional) in this functional because the regularizer assumes only the values zero and infinity.

Again, we may assume that f has mean value zero so that the cumulative function \(F(x):=\int _a^xf(t)\,\mathrm{d}t\) belongs to \(H_0^1(I)\). This will be used below.

Mimicking the proof of Theorem 3, we get:

$$\begin{aligned} \min _{u\in L^2(I)} E_\uparrow (u) = \max _{\xi \in K_+}\frac{1}{2}\Big \{\Vert f\Vert ^2 - \Vert f-\xi '\Vert _{L^2(I)}^2\Big \}\,, \end{aligned}$$
(56)

where the minimum on the left-hand side is attained by a unique function \(u_\uparrow \in L^2_\uparrow (I)\) and the maximum on the right-hand side by a unique \(\xi _\uparrow \in K_+\). This pair of functions satisfies

$$\begin{aligned} u_\uparrow = f-\xi _\uparrow '\quad \text {and}\quad \langle u_\uparrow \,,\, \xi _\uparrow '\rangle =0. \end{aligned}$$
(57)

These conditions are also sufficient for a pair of functions \(u_\uparrow \in L^2_\uparrow (I)\) and \(\xi _\uparrow \in K_+\) to be the optimizers in (56).

Notice that if the two conditions of (57) are combined, the solution to the isotonic regression problem (54) can be characterized by the conditions \(u_\uparrow \in L^2_\uparrow (I)\) and \(f-u_\uparrow \in K_+':=\{ \xi '\,:\, \xi \in K_+\}\) and \(\langle u_\uparrow \, ,\, f-u_\uparrow \rangle =0\). Thus, \(K_+'\) is the dual cone of \(L^2_\uparrow (I)\) and the pair \(u_\uparrow , f-u_\uparrow \) is the Moreau decomposition of f.

Example 7

For \(f(x)=1-|x|\), \(-1<x<1\), the solution of the isotonic regression problem is

$$\begin{aligned} u_\uparrow (x) = {\left\{ \begin{array}{ll} f(x) &{} \text {for } -1<x<\alpha -1,\\ \alpha &{} \text {for }\alpha -1\le x <1, \end{array}\right. }\,, \end{aligned}$$

where \(\alpha = 2-\sqrt{2}\). To verify this, observe that \(u_\uparrow \) is clearly in \(L^2_\uparrow (-1,1)\) and if we define \(\xi _\uparrow '=f-u_\uparrow \), then

$$\begin{aligned} \langle u_\uparrow \,,\, \xi _\uparrow '\rangle = \int _{-1}^{\alpha -1}0\,\mathrm{d}x +\int _{\alpha -1}^{1}\alpha (f(x)-\alpha )\,\mathrm{d}x=0\,, \end{aligned}$$

so that the conditions (57) are both fulfilled. Since it is easy to verify that \(\xi _\uparrow (x):=\int _{-1}^{x} \xi _\uparrow (t)\,\mathrm{d}t\ge 0\) for all \(-1<x<1\), so that \(\xi _\uparrow \in K_+\), it follows from the above characterization that \(u_\uparrow \) is indeed the isotonic regressor associated with f.

Now, to prove the lower envelope interpretation we introduce the new variable \(W=F-\xi \), where \(\xi \in K_+\), and set

$$\begin{aligned} T=\{ W\in H_0^1(I)\, :\, W(x)\le F(x)\text { for }x \in I\}. \end{aligned}$$

Maximization of the right-hand side of (56) is seen to be equivalent to solving the obstacle problem

$$\begin{aligned} \min _{W\in T} \frac{1}{2}\Vert W'\Vert _{L^2(I)}^2\,, \end{aligned}$$

which admits a unique solution \(W_\uparrow \) by the projection theorem. It follows that (55) also has the unique solution \(u_\uparrow = W_\uparrow '\quad (\text {distributional derivative})\) which belongs to \(L_\uparrow ^2(I)\) because \(J_\uparrow (u_\uparrow )\) is finite and therefore zero.

The solution \(W_\uparrow \) of the obstacle problem satisfies \(W_\uparrow ''\ge 0\) (this is the “easy” part of the original Lewy–Stampacchia inequality, \(0\le W_\uparrow ''\le (F'')^+\)) and is therefore automatically a convex function. In fact, by optimality, \(W_\uparrow \) is the maximal convex function lying below F, i.e. it is the lower convex envelope of F. Similar problems are considered in the multidimensional case, using higher-order methods (the space of functions with bounded Hessians), in Hinterberger and Scherzer [25].
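The lower-convex-envelope construction is easy to implement for sampled signals. The sketch below (an illustration; the grid size is an arbitrary choice) computes the greatest convex minorant of the cumulative function with a simple stack and takes its slopes; applied to \(f(x)=1-|x|\), it reproduces the constant level \(\alpha =2-\sqrt{2}\) from Example 7.

```python
import numpy as np

# Sketch: isotonic regression as the derivative of the lower convex envelope
# of the cumulative function F (cf. Fig. 3). The stack verts holds the
# envelope's vertices; u is constant on each facet, equal to its slope.
def isotonic(f, h):
    F = np.concatenate(([0.0], np.cumsum(f) * h))
    x = h * np.arange(F.size)
    verts = [0]
    for i in range(1, F.size):
        verts.append(i)
        while len(verts) >= 3:                   # restore non-decreasing slopes
            a, b, c = verts[-3], verts[-2], verts[-1]
            if (F[b] - F[a]) * (x[c] - x[b]) <= (F[c] - F[b]) * (x[b] - x[a]):
                break
            del verts[-2]                        # vertex b lies above the chord a-c
    u = np.empty_like(f)
    for a, b in zip(verts[:-1], verts[1:]):
        u[a:b] = (F[b] - F[a]) / (x[b] - x[a])
    return u

n = 4000
h = 2.0 / n
xm = -1.0 + (np.arange(n) + 0.5) * h
u = isotonic(1.0 - np.abs(xm), h)
print(u[-1], 2.0 - np.sqrt(2.0))                 # the two values nearly agree
```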

12 Higher-Order Total Variation Regularization

In this section, we briefly consider the analogue of the ROF model and its taut string interpretation for denoising using higher-order total variation regularization. The one-dimensional nth order ROF model is defined as minimization of the functional

$$\begin{aligned} E^n_\lambda (u) := \lambda \int _a^b |u^{(n)}(x)|\,\mathrm{d}x + \frac{1}{2}\int _a^b (f(x)-u(x))^2\,\mathrm{d}x, \end{aligned}$$

where \(u^{(n)}\) denotes the nth derivative of \(u \in C^n(a,b)\) and \(\lambda >0\) is the regularization parameter. Here, we shall treat only the case \(n=2\), i.e. the second-order ROF model. In the multidimensional setting, second- and higher-order regularizations have been considered early on by Pöschl and Scherzer [36] as well as by Bredies et al. [12]. They have subsequently found applications in restoration of MRI [27] and image inpainting [35], to mention just two examples. A detailed account of the second-order regularization in image restoration, as well as additional references, can be found in Bergounioux [10]. The one-dimensional case has been studied in a purely discrete setting in Steidl et al. [41], and one-dimensional examples are found in [34, 36], among others.

We first give a proper definition of the second-order total variation, \(J_2\), which will replace the formal expression \(\int _a^b |u''|\,\mathrm{d}x\) used in the definition of the second-order ROF functional \(E^2_\lambda (u)\). For \(u\in L^1(I)\), where \(I=(a,b)\), we define

$$\begin{aligned} J_2(u) = \sup \int _a^b u\xi ''\,\mathrm{d}x \end{aligned}$$

with the supremum being taken over all \(\xi \in C_c^2(I)\) which satisfy \(|\xi (x)|\le 1\) for all \(x\in I\). The set of functions for which \(J_2\) has a finite value is denoted \(BV_2(I)\). It is clear from this definition that if \(u\in H^1(I)\) then \(J_2(u)=J(u')\); just apply integration by parts in the above integral and recall the definition of J. It can be shown that any \(u\in BV_2(I)\) is automatically a member of \(H^1(I)\) (in fact \(u\in W^{1,\infty }(I)\) holds). In particular, \(u\in L^2(I)\) and therefore the definition of the second-order total variation \(J_2\) may be rephrased as

$$\begin{aligned} J_2(u) = \sup _{\xi \in K_2} \langle u, \xi ''\rangle _{L^2(I)}, \end{aligned}$$

where \(K_2 := \{ \xi \in H_0^2(I) \,:\, \Vert \xi \Vert _\infty \le 1\}\). Recall that \(H_0^2(I)\) is the closure of \(C_c^2(I)\) in the Hilbert space \(H^2(I)\), in fact

$$\begin{aligned} H_0^2(I) = \big \{ \xi \in H^2(I) \,:\, \xi =\xi '=0\text { at }\partial I=\{a,b\} \big \}\,, \end{aligned}$$

see Brezis [13, p. 134, Remarque 18]. Moreover, the map \(u\mapsto \Vert u''\Vert _{L^2(I)}\) defines a Hilbert space norm on \(H_0^2(I)\), i.e. the norm induced by the bilinear form \(\langle u'' , v''\rangle _{L^2(I)}\), which defines an inner product on \(H_0^2(I)\). The precise definition of the second-order ROF functional is

$$\begin{aligned} E^2_\lambda (u) := \lambda J_2(u)+\frac{1}{2}\Vert f - u\Vert _{L^2(I)}^2. \end{aligned}$$

The function \(u_\lambda :I\rightarrow {\mathbf {R}}\) given by

$$\begin{aligned} u_\lambda := \mathop {{{{\mathrm{arg}}~{\mathrm{min}}}}}\limits _{u\in BV_2(I)} E^2_\lambda (u)\,, \end{aligned}$$

is the denoising of \(f\in L^2(I)\) using the second-order total variation regularization with weight \(\lambda \).

The function \(u_\lambda \), if it exists, is the solution of the unconstrained minimization problem for \(E^2_\lambda \) over \(BV_2(I)\). By considering variations of the form \(u=u_\lambda + a_0 + a_1x\), where \(a_0,a_1\in {\mathbf {R}}\), we see that the optimality of \(u_\lambda \) implies that \(\partial E^2_\lambda /\partial a_0 = \partial E^2_\lambda /\partial a_1 =0\). Since \(J_2(u + a_0 + a_1x) = J_2(u)\) for all \(u\in BV_2(I)\), this leads to the two conditions: \(\int _a^b u_\lambda =\int _a^b f\) and \(\int _a^b xu_\lambda = \int _a^b xf\). We may therefore assume, without loss of generality, that the data f satisfies

$$\begin{aligned} \int _a^b f(x)\,\mathrm{d}x = 0\quad \text {and}\quad \int _a^b xf(x)\,\mathrm{d}x = 0. \end{aligned}$$

This assumption is imposed throughout the rest of this section, unless otherwise stated.

Using the methods introduced in the study of the (ordinary) ROF model, it is possible to prove the following result:

Theorem 6

We have the identity

$$\begin{aligned} \min _{u\in BV_2(I)} E^2_\lambda (u) = \max _{\xi \in K_2} \frac{1}{2}\Big ( \Vert f\Vert _{L^2(I)}^2 - \Vert f-\lambda \xi ''\Vert _{L^2(I)}^2\Big ), \end{aligned}$$
(58)

with the minimum achieved by a unique \(u_\lambda \in BV_2(I)\) and the maximum by a unique \(\xi _\lambda \in K_2\); the two functions are related by

$$\begin{aligned} u_\lambda =f-\lambda \xi _\lambda ''\,, \end{aligned}$$
(59a)

and

$$\begin{aligned} J_2(u_\lambda )=\langle \,u_\lambda \, ,\,\xi _\lambda ''\,\rangle _{L^2(I)}. \end{aligned}$$
(59b)

Moreover, if \(u_\lambda \ne 0\), then \(\Vert \xi _\lambda \Vert _\infty =1\). Conversely, the conditions (59a) and (59b) are sufficient for \(u_\lambda \) to be a solution of the second-order ROF model.

It follows from this theorem that the minimizer \(u_\lambda \) of the second-order ROF functional exists and can be obtained by finding the solution \(\xi _\lambda \) of

$$\begin{aligned} \min _{\xi \in K_2} \frac{1}{2}\Vert f-\lambda \xi ''\Vert _{L^2(I)}^2 \end{aligned}$$

and then setting \(u_\lambda = f-\lambda \xi _\lambda ''\).

Now, let us define \(F'(x) = \int _a^x f(t)\,\mathrm{d}t\) and \(F(x)=\int _a^x F'(t)\,\mathrm{d}t\) such that \(F''=f\). Then \(F(a)=F'(a)=0\) by construction and \(F(b)=F'(b)=0\) by our assumptions on the signal f. Since \(f\in L^2(I)\), we see that \(F\in H_0^2(I)\). If we introduce the new variable \(W=F-\lambda \xi \), where \(\xi \in K_2\), then the above minimization problem becomes:

$$\begin{aligned} u_\lambda = W_\lambda ''\text { where } W_\lambda = \mathop {{{{\mathrm{arg}}~{\mathrm{min}}}}}\limits _{W\in T^2_\lambda } \frac{1}{2}\Vert W''\Vert _{L^2(I)}^2. \end{aligned}$$

Here, the set of admissible functions

$$\begin{aligned} T^2_\lambda = \{ W\in H_0^2(I) \, :\, F-\lambda \le W\le F+\lambda \} \end{aligned}$$

is the second-order \(\lambda \)-tube. The condition on \(W:=F-\lambda \xi \) follows from the condition \(\Vert \xi \Vert _\infty \le 1\) on the functions \(\xi \) in \(K_2\).

The above procedure for finding \(u_\lambda \) has a mechanical interpretation: The second-order regularized denoising \(u_\lambda \) of f is the second (weak) derivative of the function \(W_\lambda \) which in turn corresponds to the energy minimizing shape of an ideal elastic beam (a cubic spline), clamped at the end points of I and forced to lie between a pair of parallel walls at a uniform distance \(\lambda \) from the graph of the bi-cumulative signal F. Denoising using nth order total variation regularization may be analyzed in a similar manner, but for \(n>2\) there is no obvious mechanical interpretation.

This “restricted spline” interpretation of the second-order ROF model can be used to “guess” the analytical solution of the second-order ROF model for simple f. The following is probably the simplest imaginable non-trivial example.

Example 8

On the interval \(I=(-1,1)\), let the signal \(f:I\rightarrow {\mathbf {R}}\) be given by \(f(x)=|x|-\frac{1}{2}\). Notice that \(\int _I f=0\) and \(\int _I xf=0\). We can use the necessary and sufficient condition formulated in Theorem 6 to prove that the restored signal is

$$\begin{aligned} u_\lambda =(1-12\lambda )_+f\,, \end{aligned}$$
(60)

for any \(\lambda >0\). In particular it follows that \(u_\lambda =0\) if \(\lambda \ge 1/12\) and that \(\Vert f-u_\lambda \Vert _{BV_2(I)}:=J_2(f-u_\lambda )+\Vert f-u_\lambda \Vert _{L^1(I)}=O(\lambda )\) as \(\lambda \rightarrow 0+\). The proof runs as follows: If (60) is substituted into (59a), we obtain

$$\begin{aligned} \xi _\lambda '' = \min (12,\lambda ^{-1})f. \end{aligned}$$

Since the signal f satisfies \(\int _If =\int _Ixf = 0\) it follows that \(\xi _\lambda = \min (12,\lambda ^{-1})F\in H_0^2(I)\). Moreover, \(\Vert \xi _\lambda \Vert _\infty \le 1\) with equality if and only if \(0\le \lambda \le 1/12\). It is now easy to verify that (59b) holds:

$$\begin{aligned} \langle \xi _\lambda '',u_\lambda \rangle&= \langle \min (12,\lambda ^{-1})f, (1-12\lambda )_+f\rangle \\&= 12(1-12\lambda )_+\Vert f\Vert ^2_{L^2(I)} \\&= 2(1-12\lambda )_+ = J_2(u_\lambda )\,, \end{aligned}$$

the last equality being a consequence of the fact that \(J_2(f)=J(f')=2\). This completes the verification.
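The verification can be replayed numerically as a sanity check (a sketch; the grid size is arbitrary): build the bi-cumulative function F by two cumulative sums and confirm that \(\Vert \xi _\lambda \Vert _\infty \le 1\), with equality for \(\lambda \le 1/12\), and that the pairing \(\langle u_\lambda ,\xi _\lambda ''\rangle \) matches \(J_2(u_\lambda )=2(1-12\lambda )_+\).

```python
import numpy as np

n = 20_000
h = 2.0 / n
x = -1.0 + (np.arange(n) + 0.5) * h
f = np.abs(x) - 0.5
F1 = np.cumsum(f) * h                     # F'(x); vanishes at both end points
F = np.cumsum(F1) * h                     # the bi-cumulative function F
for lam in (0.02, 0.05, 0.2):
    c = min(12.0, 1.0 / lam)
    xi = c * F                            # candidate dual variable xi_lam
    u = max(1.0 - 12.0 * lam, 0.0) * f    # the claimed minimizer (60)
    J2 = 2.0 * max(1.0 - 12.0 * lam, 0.0)
    pairing = np.sum(u * c * f) * h       # <u_lam, xi_lam''>, since xi'' = c * f
    print(lam, np.max(np.abs(xi)), J2, pairing)
```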

The above explicit example was first found by Papafitsoros and Bredies [34], see their Fig. 8d, but the derivation presented here is considerably shorter. Their paper focuses on a regularization term which is a kind of weighted combination of the first- and second-order total variation and contains the pure second-order term as a special case. The example may be viewed as the second-order analogue of Example 4, which considered denoising of a simple piecewise constant signal in the ROF model; in both cases, the restored signal is obtained by a simple scaling of the data.

Theorem 6 and its “restricted spline” interpretation make it clear that certain results for the (ordinary) ROF model carry over to the second-order case. For instance, parts (a) and (b) of Proposition 4 still hold: \(u_\lambda =0\) if and only if \(\Vert F\Vert _\infty \le \lambda \), and \(\Vert W_\lambda -F\Vert _\infty = \lambda \) holds whenever \(\lambda \le \Vert F\Vert _\infty \), where F is the bi-cumulative function and \(W_\lambda \) the optimal spline.

13 Concluding Remarks

We have developed the theory for the one-dimensional ROF model for a quite general class of signals and given a thorough investigation of the properties of its solutions. This includes a useful fundamental estimate, Theorem 2, on the denoised signal. The theory may find practical applications in signal processing and image analysis alike. Indeed, by using the fundamental estimate, we saw how application of the ROF model to a piecewise constant signal leads to a piecewise constant denoised signal with the same nodes, i.e. the model is “edge-preserving” (Proposition 9). Our theory can be modified to cover the one-dimensional ROF model defined on the real line \({\mathbf {R}}\) or the half-line \({\mathbf {R}}_+\). To do this, the Sobolev spaces have to be replaced by certain Beppo Levi spaces. The theory can also be modified to handle regularization with a weighted total variation term, something like

$$\begin{aligned} J_w(u):=\int _{\mathbf {R}}w(x)|u'(x)|\,\mathrm{d}x\, , \end{aligned}$$

where \(w(x)>0\). These two extensions would allow us to analyze the higher-dimensional ROF functional for spherically symmetric signals. The theory for the N-dimensional ROF model can be developed along almost the same lines, it seems, with the exception of the uniqueness of the dual variables and the fundamental estimate; when \(N\ge 2\) the dual variables are vector fields \(\xi =(\xi _1,\ldots ,\xi _N)\) whose magnitudes are bounded by one. As is well known, only the divergence of this vector field is uniquely determined, not the vector field itself. The natural generalization of the fundamental estimate does not seem to hold for \(N\ge 2\). The reason why it fails is that if f is the characteristic function of the unit square in the plane, then the denoising \(u_\lambda \) has level curves which look like squares with the corners rounded off, see [18, Sect. 2.2.3]. This means that the support of the gradient of the denoised signal is not contained in the support of the gradient of the original signal (rather, the support shifts inward, inside the square), and this is incompatible with a bound like the fundamental estimate. It would be interesting to know if there exists some alternative estimate on the denoised signal which could replace the fundamental estimate in higher dimensions.