Bilevel Parameter Learning for Higher-Order Total Variation Regularisation Models
Abstract
We consider a bilevel optimisation approach for parameter learning in higher-order total variation image reconstruction models. Apart from the least squares cost functional, naturally used in bilevel learning, we propose and analyse an alternative cost based on a Huber-regularised TV seminorm. Differentiability properties of the solution operator are verified and a first-order optimality system is derived. Based on the adjoint information, a combined quasi-Newton/semismooth Newton algorithm is proposed for the numerical solution of the bilevel problems. Numerical experiments are carried out to show the suitability of our approach and the improved performance of the new cost functional. Thanks to the bilevel optimisation framework, a detailed comparison between \(\text {TGV}^2\) and \(\text {ICTV}\) is also carried out, showing the advantages and shortcomings of both regularisers, depending on the structure of the processed images and their noise level.
Keywords
Bilevel optimisation; Total variation regularisers; Image quality measures
1 Introduction
In this paper, we propose a bilevel optimisation approach for parameter learning in higher-order total variation regularisation models for image restoration. The reconstruction of an image from imperfect measurements is essential for all research which relies on the analysis and interpretation of image content. Mathematical image reconstruction approaches aim to maximise the information gain from acquired image data by intelligent modelling and mathematical analysis.
While functional modelling (1.1) constitutes a mathematically rigorous and physical way of setting up the reconstruction of an image, providing reconstruction guarantees in terms of error and stability estimates, it is limited with respect to its adaptivity for real data. On the other hand, data-based modelling of reconstruction approaches is set up to produce results which are optimal with respect to the given data. However, in general, it neither offers insights into the structural properties of the model nor provides comprehensible reconstruction guarantees. Indeed, we believe that for the development of reliable, comprehensible and at the same time effective models (1.1), it is essential to aim for a unified approach that seeks tailor-made regularisation and data models by combining model-based and data-based approaches.
Rather than working on the discrete problem, as is done in standard parameter learning and model optimisation methods, we optimise the regularisation models in infinite-dimensional function space. The resulting problems are difficult to treat due to the nonsmooth structure of the lower level problem, which makes it impossible to verify standard constraint qualification conditions for Karush–Kuhn–Tucker (KKT) systems. Therefore, in order to obtain characterising first-order necessary optimality conditions, alternative analytical approaches have emerged, in particular regularisation techniques [4, 20, 28]. We consider such an approach here and study the related regularised problem in depth. In particular, we prove the Fréchet differentiability of the regularised solution operator, which enables us to obtain an optimality condition for the problem under consideration and an adjoint state for the efficient numerical solution of the problem. The bilevel problems under consideration are related to the emerging field of generalised mathematical programmes with equilibrium constraints (MPEC) in function space. Let us remark that even for finite-dimensional problems, there are few recent references dealing with stationarity conditions and solution algorithms for this type of problem (see, e.g. [18, 30, 33, 34, 38]).
Let us give an account of the state of the art of bilevel optimisation for model learning. In machine learning, bilevel optimisation is well established. It is a semi-supervised learning method that optimally adapts itself to a given dataset of measurements and desirable solutions. In [15, 23, 43], for instance, the authors consider bilevel optimisation for finite-dimensional Markov random field models. In inverse problems, the optimal inversion and experimental acquisition setup is discussed in the context of optimal model design in works by Haber, Horesh and Tenorio [25, 26], as well as Ghattas et al. [3, 9]. Recently, parameter learning in the context of functional variational regularisation models (1.1) also entered the image processing community with works by the authors [10, 22], Kunisch, Pock and co-workers [14, 33], Chung et al. [16] and Hintermüller et al. [30].
For the existence analysis of an optimal solution as well as for the derivation of an optimality system for the corresponding learning problem (1.2), we will consider a smoothed version of the constraint problem (1.1), which is in fact the one used in the numerics. That is, we replace R(u) (TV, TGV or ICTV in this paper) by a Huber-regularised version and add an \(H^1\) regularisation with a small weight to (1.1). In this setting and under the special assumption of box constraints on \(\alpha \) and \(\beta \), we provide a simple existence proof for an optimal solution. A more general existence result that holds also for the original nonsmooth problem and does not require box constraints is derived in [19], and we refer the reader to this paper for a more sophisticated analysis of the structure of solutions.
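To make the smoothing concrete, the following minimal sketch evaluates one common form of the Huber regularisation of the Euclidean norm and a discrete Huber-regularised TV of an image. The precise form used in this paper is the one of Definition 2.1; the formula, the function names `huber_norm` and `huber_tv`, and the forward-difference discretisation below are assumptions made for illustration only.

```python
import numpy as np

def huber_norm(g, gamma):
    """Pointwise Huber-regularised Euclidean norm (assumed form).

    g has shape (..., d); the norm is taken over the last axis.
    |g|_gamma = |g| - 1/(2*gamma)   if |g| >= 1/gamma,
                (gamma/2) * |g|**2  otherwise.
    """
    norm = np.linalg.norm(g, axis=-1)
    return np.where(norm >= 1.0 / gamma,
                    norm - 0.5 / gamma,
                    0.5 * gamma * norm ** 2)

def huber_tv(u, gamma):
    """Discrete Huber-regularised TV of a 2D image (forward differences)."""
    # Repeating the last column/row makes the boundary difference zero.
    dx = np.diff(u, axis=1, append=u[:, -1:])
    dy = np.diff(u, axis=0, append=u[-1:, :])
    grad = np.stack([dx, dy], axis=-1)
    return float(huber_norm(grad, gamma).sum())
```

For large \(\gamma \) the quadratic zone shrinks and the value approaches the unsmoothed norm, which is why a value such as \(\gamma =100\) changes the functional only near zero gradients.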
The proposed bilevel approach has an important indirect consequence: It establishes a basis for the comparison of the different total variation regularisers employed in image denoising tasks. In the last part of this paper, we exhaustively compare the performance of \(\text {TV}\), \(\text {TGV}^2\) and \(\text {ICTV}\) for various image datasets. The parameters are chosen optimally, according to the proposed bilevel approach, and different quality measures (like PSNR and SSIM) are considered for the comparison. The obtained results are enlightening about when to use each one of the considered regularisers. In particular, \(\text {ICTV}\) appears to behave better for images with arbitrary structure and moderate noise levels, whereas \(\text {TGV}^2\) behaves better for images with large smooth areas.
Outline of the paper. In Sect. 2, we state the bilevel learning problem for the two higher-order total variation regularisation models, TGV and ICTV, and prove existence of an optimal parameter pair \(\alpha ,\beta \). The bilevel optimisation problem is analysed in Sect. 3, where existence of Lagrange multipliers is proved and an optimality system, as well as a gradient formula, is derived. Based on the optimality condition, a BFGS algorithm for the bilevel learning problem is devised in Sect. 4.1. For the numerical solution of each denoising problem, an infeasible semismooth Newton method is considered. Finally, we discuss the performance of the parameter learning method by means of several examples for the denoising of natural photographs in Sect. 5. Therein, we also present a statistical analysis on how TV, ICTV and TGV regularisation compare in terms of returned image quality, carried out on 200 images from the Berkeley segmentation dataset BSDS300.
2 Problem Statement and Existence Analysis
2.1 Formal Statement
Let \(\Omega \subset \mathbb {R}^n\) be an open bounded domain with Lipschitz boundary. This will be our image domain. Usually \(\Omega =(0, w) \times (0, h)\) for w and h the width and height of a two-dimensional image, although no such assumptions are made in this work. Our data f and \(f_0\) are assumed to lie in \(L^2(\Omega )\).
Definition 2.1
Remark 2.1
Please note that in our formulation of the bilevel problem (2.3), we only impose a nonnegativity constraint on the parameters \(\alpha \) and \(\beta \), i.e. we do not strictly bound them away from zero. There are two reasons for that. First, for the existence analysis of the smoothed problem, the case \(\alpha =\beta =0\) is not critical since compactness can be secured by the \(H^1\) term in the functional, compare Sect. 2.2. Second, in [19], we indeed prove that even for the nonsmooth problem (as \(\mu \rightarrow 0\)), under appropriate assumptions on the given data, the optimal \(\alpha ,\beta \) are guaranteed to be strictly positive.
2.2 Existence of an Optimal Solution
The existence of an optimal solution for the learning problem (2.3) is a special case of the class of bilevel problems considered in [19], where the existence of optimal parameters in \((0,+\infty ]^{2N}\) is proven. For the convenience of the reader, we provide a simplified proof for the case where additional box constraints on the parameters are imposed. We start with an auxiliary lower semicontinuity result for the Huber-regularised functionals.
Lemma 2.1
Let \(u,v\in L^p(\Omega )\), \(1\le p<\infty \). Then, the functional \(u \mapsto \int _\Omega |u-v|_\gamma ~ dx\), where \(|\cdot |_\gamma \) is the Huber regularisation in Definition 2.1, is lower semicontinuous with respect to weak* convergence in \(\mathcal {M}(\Omega ; \mathbb {R}^d)\).
Proof
Our main existence result is the following.
Theorem 2.1
We consider the learning problem (2.3) for TGV\(^2\) and ICTV regularisation, optimising over parameters \((\alpha ,\beta )\) such that \(0 \le \alpha \le \bar{\alpha }, 0 \le \beta \le \bar{\beta }\). Here \((\bar{\alpha },\bar{\beta })<\infty \) is an arbitrary but fixed vector in \(\mathbb R^{2}\) that defines a box constraint on the parameter space. There exists an optimal solution \((\hat{\alpha },\hat{\beta })\in \mathbb R^{2}\) for this problem for both choices of cost functionals, \(F=F_{L^2_2}\) and \(F=F_{{L_\eta ^1\!\nabla }}\).
Proof
Remark 2.2

Using the existence result in [19], in principle we could allow infinite values for \(\alpha \) and \(\beta \). This would include both \(\text {TV}^2\) and \(\text {TV}\) as possible optimal regularisers in our learning problem.
In [19], in the case of the \(L^2\) cost and assuming that
$$\begin{aligned} R_{\alpha ,\beta }^{\gamma }(f)>R_{\alpha ,\beta }^{\gamma }(f_0), \end{aligned}$$
we moreover show that the parameters \((\alpha ,\beta )\) are strictly larger than 0. In the case of the Huberised TV cost, this is proven in a discretised setting. Please see [19] for details.

The existence of solutions with \(\mu =0\), that is without elliptic regularisation, is also proven in [19]. Note that here, we focus on the \(\mu >0\) case since the elliptic regularity is required for proving the existence of Lagrange multipliers in the next section.
Remark 2.3
In [19], it was shown that the solution map of our bilevel problem is outer semicontinuous. This implies, in particular, that the minimisers of the regularised bilevel problems converge towards the minimiser of the original one.
3 Lagrange Multipliers
In this section, we prove the existence of Lagrange multipliers for the learning problem (2.3) and derive an optimality system that characterises stationary points. Moreover, a gradient formula for the reduced cost functional is obtained, which plays an important role in the development of fast solution algorithms for the learning problems (see Sect. 4.1).
3.1 Differentiability of the Solution Operator
Theorem 3.1
Proof
Thanks to the ellipticity of \(a(\cdot , \cdot )\) and the monotonicity of \(h_\gamma \), the existence of a unique solution to the linearised equation follows from the Lax–Milgram theorem.
Remark 3.1
The extra regularity result for second-order systems used in the last proof, due to Gröger [24, Thm. 1, Rem. 14], relies on the properties of the domain \(\Omega \). The result was originally proved for \(C^2\) domains. However, the regularity of the domain (in the sense of Gröger) may also be verified for convex Lipschitz bounded domains [17], which is precisely our image domain case.
Remark 3.2
The Fréchet differentiability proof makes use of the quasilinear structure of the \(\text {TGV}^2\) variational form, making it difficult to extend to the ICTV model without further regularisation terms. For the latter, however, a Gâteaux differentiability result may be obtained using the same proof technique as in [22].
3.2 The Adjoint Equation
Next, we use the Lagrangian formalism for deriving the adjoint equations for both the \(\text {TGV}^2\) and ICTV learning problems. The existence of a solution to the adjoint equation follows from the Lax–Milgram theorem.
Theorem 3.2
Proof
Remark 3.3
3.3 Optimality Condition
Using the differentiability of the solution operator and the well-posedness of the adjoint equation, we next derive an optimality system for the characterisation of local minima of the bilevel learning problem. Besides the optimality condition itself, a gradient formula arises as a by-product, which is of importance in the design of solution algorithms for the learning problems.
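The role of the adjoint state in the gradient formula can be illustrated on a finite-dimensional toy problem. The sketch below is not the \(\text {TGV}^2\) or ICTV system of this paper: as a stated simplification, the lower level is a linear (quadratically regularised) denoising problem \((I+\alpha L)u=f\) with one scalar parameter, and the upper level is the squared \(L^2\) cost. The names `reduced_cost_and_gradient`, `L`, `f` and `f0` are hypothetical.

```python
import numpy as np

def reduced_cost_and_gradient(alpha, f, f0, L):
    """Toy bilevel problem (a stand-in, not the paper's model).

    Lower level: (I + alpha*L) u = f.
    Upper level: F(alpha) = 0.5 * ||u(alpha) - f0||^2.
    """
    n = f.size
    A = np.eye(n) + alpha * L
    u = np.linalg.solve(A, f)            # state: denoised signal
    p = np.linalg.solve(A.T, u - f0)     # adjoint state
    cost = 0.5 * float(np.dot(u - f0, u - f0))
    grad = -float(np.dot(p, L @ u))      # dF/dalpha from one adjoint solve
    return cost, grad

# Usage on a noisy 1D signal with a discrete Laplacian-type operator.
rng = np.random.default_rng(0)
n = 50
f0 = np.sin(np.linspace(0.0, np.pi, n))
f = f0 + 0.1 * rng.standard_normal(n)
D = np.diff(np.eye(n), axis=0)           # (n-1) x n forward differences
L = D.T @ D
cost, grad = reduced_cost_and_gradient(0.5, f, f0, L)
```

The point of the construction is that the gradient costs one extra linear solve regardless of the number of parameters, which is what makes quasi-Newton methods on the reduced cost practical.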
Theorem 3.3
Proof
Remark 3.4
From the existence result (see Remark 2.2), we actually know that, under some assumptions on F, \(\bar{\alpha }\) and \(\bar{\beta }\) are strictly greater than zero. This implies that the multipliers \(\lambda _1\) and \(\lambda _2\) may be zero, and the problem becomes an unconstrained one. This plays an important role in the design of solution algorithms, since only a mild treatment of the constraints has to be taken into account, as shown in Sect. 6.
4 Numerical Algorithms
In this section, we propose a second-order quasi-Newton method for the solution of the learning problem with scalar regularisation parameters. The algorithm is based on a BFGS update, preserving the positivity of the iterates through the line search strategy and updating the matrix cyclically depending on the satisfaction of the curvature condition. For the solution of the lower level problem, a semismooth Newton method with a properly modified Jacobi matrix is considered. Moreover, warm initialisation strategies have to be taken into account in order to get convergence for the \(\text {TGV}^2\) problem.
4.1 BFGS Algorithm
Thanks to the gradient characterisation obtained in Theorem 3.3, we next devise a BFGS algorithm to solve the bilevel learning problems with higher-order regularisers. We employ a few technical tricks to ensure convergence of the classical method. In particular, we limit the step length to get at most a fraction closer to the boundary. As shown in [19], the solution is in the interior for the regularisation and cost functionals we are interested in.
Moreover, the good behaviour of the BFGS method depends upon the BFGS matrix staying positive definite. This would be ensured by the Wolfe conditions, but because of our step length limitation, the curvature condition is not necessarily satisfied. (The Wolfe conditions are guaranteed to be satisfied for some step length \(\sigma \), if our domain is unbounded, but the range, where the step satisfies the criterion, may be beyond our maximum step length and is not necessarily satisfied closer to the current point.) Instead, we skip the BFGS update if the curvature is not positive.
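A guarded update of this kind might be sketched as follows. This is a minimal illustration of the standard BFGS formula for the Hessian approximation with a curvature test; the function name `guarded_bfgs_update` is hypothetical, with `s` the difference of iterates and `r` the difference of gradients.

```python
import numpy as np

def guarded_bfgs_update(B, s, r):
    """BFGS update of the Hessian approximation B, skipped if s^T r <= 0."""
    curvature = float(s @ r)
    if curvature <= 0.0:
        return B  # curvature condition violated: keep the previous matrix
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(r, r) / curvature
```

When the update is applied, the new matrix satisfies the secant equation (it maps `s` to `r`) and stays positive definite if `B` was; when it is skipped, positive definiteness is trivially preserved.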
Overall, our learning algorithm may be written as follows:
Algorithm 4.1
(1) Solve the adjoint equation (3.10b) for \(\Pi ^i\), and calculate \(\nabla \mathcal F (\alpha ^i,\beta ^i)\) from (3.11).
(2) If \(i \ge 2\), do the following:
    (a) Set \(s :=(\alpha ^i,\beta ^i)-(\alpha ^{i-1}, \beta ^{i-1})\), and \(r :=\nabla \mathcal F(\alpha ^i,\beta ^i)-\nabla \mathcal F(\alpha ^{i-1},\beta ^{i-1})\).
    (b) Perform the BFGS update
        $$\begin{aligned} B^i :={\left\{ \begin{array}{ll} B^{i-1}, &{} s^T r \le 0,\\ B^{i-1} - \frac{(B^{i-1} s) (B^{i-1}s)^T}{s^T B^{i-1} s} + \frac{r r^T}{s^Tr}, &{} s^T r > 0. \end{array}\right. } \end{aligned}$$
(3) Compute \(\delta _{\alpha , \beta }\) from
    $$\begin{aligned} B^i \delta _{\alpha , \beta } = -g^i. \end{aligned}$$
(4) Initialise \(\sigma :=\min \{1, \sigma _{\max }/2\}\), where
    $$\begin{aligned} \sigma _{\max } :=\max \{ \sigma \ge 0 \mid (\alpha ^i, \beta ^i)+\sigma \delta _{\alpha , \beta } > 0\}. \end{aligned}$$
    Repeat the following:
    (a) Let \((\alpha _\sigma , \beta _\sigma ) :=(\alpha ^i, \beta ^i)+\sigma \delta _{\alpha , \beta }\), and solve the denoising problem (2.3b) for \((\alpha , \beta )=(\alpha _\sigma , \beta _\sigma )\), yielding \(u_\sigma \).
    (b) If the residual \(\Vert (\alpha _\sigma , \beta _\sigma ) - (\alpha ^i, \beta ^i)\Vert /\Vert (\alpha _\sigma , \beta _\sigma )\Vert < \rho \), do the following:
        (i) If \(\min _\sigma \mathcal F(\alpha _\sigma , \beta _\sigma ) < \mathcal F(\alpha ^i, \beta ^i)\) over all \(\sigma \) tried, choose \(\sigma ^*\) the minimiser, set \((\alpha ^{i+1}, \beta ^{i+1}) :=(\alpha _{\sigma ^*}, \beta _{\sigma ^*})\), \(u^{i+1} :=u_{\sigma ^*}\), and continue from Step 5.
        (ii) Otherwise end the algorithm with solution \((\alpha ^*, \beta ^*) :=(\alpha ^i, \beta ^i)\).
    (c) Otherwise, if the Armijo condition \(\mathcal F(\alpha _\sigma , \beta _\sigma ) \le \mathcal F(\alpha ^i, \beta ^i) + \sigma c \nabla \mathcal F(\alpha ^i,\beta ^i)^T \delta _{\alpha , \beta }\) holds, set \((\alpha ^{i+1}, \beta ^{i+1}) :=(\alpha _{\sigma }, \beta _{\sigma })\), \(u^{i+1} :=u_{\sigma }\), and continue from Step 5.
    (d) In all other cases, set \(\sigma :=\sigma /2\) and continue from Step 4a.
(5) If the residual \(\Vert (\alpha ^{i+1}, \beta ^{i+1}) - (\alpha ^i, \beta ^i)\Vert /\Vert (\alpha ^{i+1}, \beta ^{i+1})\Vert < \rho \), end the algorithm with \((\alpha ^* , \beta ^*) :=(\alpha ^{i+1}, \beta ^{i+1})\). Otherwise continue from Step 1 with \(i :=i+1\).
Step (4) ensures that the iterates remain feasible, without making use of a projection step.
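The step length limitation of Step (4) can be sketched as below. This is a simplified illustration, assuming a generic callable `cost` in place of the reduced functional (whose evaluation would require a denoising solve) and omitting the residual test of Step (4b); the names `max_feasible_step` and `limited_armijo` are hypothetical.

```python
import numpy as np

def max_feasible_step(x, delta):
    """Largest sigma with x + sigma*delta >= 0 componentwise (inf if unbounded)."""
    neg = delta < 0
    if not np.any(neg):
        return np.inf
    return float(np.min(-x[neg] / delta[neg]))

def limited_armijo(cost, x, grad, delta, c=1e-4, max_halvings=30):
    """Backtrack from sigma = min(1, sigma_max/2) until the Armijo condition holds."""
    sigma = min(1.0, 0.5 * max_feasible_step(x, delta))
    f_x = cost(x)
    slope = float(grad @ delta)
    for _ in range(max_halvings):
        x_new = x + sigma * delta
        if cost(x_new) <= f_x + sigma * c * slope:
            return x_new, sigma
        sigma *= 0.5
    return x, 0.0  # no acceptable step found within the halving budget
```

Because the first trial step covers at most half the distance to the boundary and later steps only shrink, a strictly positive iterate stays strictly positive, so no projection is needed.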
4.2 An Infeasible Semismooth Newton Method
In this section, we consider semismooth Newton methods for solving the \(\text {TGV}^2\) and the ICTV denoising problems. Semismooth Newton methods feature a local superlinear convergence rate and have previously been applied successfully to image processing problems (see, e.g. [21, 29, 32]). The primal-dual algorithm we use here is an extension of the method proposed in [29] to the case of higher-order regularisers.
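The idea of a semismooth Newton iteration can be shown on a small nonsmooth equation. The sketch below is a generic illustration, not the primal-dual \(\text {TGV}^2\)/ICTV system of this paper: it solves \(x + c\max (x,0) = b\) componentwise, replacing the classical derivative by an element of the generalised derivative. The function name and the model equation are assumptions chosen for brevity.

```python
import numpy as np

def semismooth_newton(b, c=2.0, tol=1e-12, max_iter=50):
    """Solve x + c*max(x, 0) = b componentwise by a semismooth Newton iteration."""
    x = np.zeros_like(b, dtype=float)
    for k in range(max_iter):
        residual = x + c * np.maximum(x, 0.0) - b
        if np.linalg.norm(residual) <= tol:
            return x, k
        # Element of the generalised derivative: 1 + c where x > 0, else 1.
        slope = 1.0 + c * (x > 0)
        x = x - residual / slope
    return x, max_iter
```

For this piecewise linear equation the iteration terminates after finitely many steps once the sign pattern of `x` is identified, which mirrors the fast local convergence mentioned above.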
Remark 4.1
Table 1 Quantified results for the parrot image (\(\ell =256=\text {image width/height in pixels}\))
| Denoise | Cost | Initial (\(\alpha \), \(\beta \)) | Result (\(\alpha ^*\), \(\beta ^*\)) | Value | SSIM | PSNR | Its. | Fig. |
|---|---|---|---|---|---|---|---|---|
| \(\hbox {TGV}^{2}\) | \(L_\eta ^1 \nabla \) | \((\alpha _{\mathrm{TV}}^*/\ell ,\alpha _{\mathrm{TV}}^*)\) | (0.069/\(\ell ^{2}\), 0.051/\(\ell \)) | 6.615 | 0.897 | 31.720 | 12 | 4c |
| \(\hbox {TGV}^{2}\) | \(L_2^2 \) | \((\alpha _{\mathrm{TV}}^*/\ell ,\alpha _{\mathrm{TV}}^*)\) | (0.058/\(\ell ^{2}\), 0.041/\(\ell \)) | 6.412 | 0.890 | 31.992 | 11 | 4d |
| ICTV | \(L_\eta ^1 \nabla \) | \((\alpha _{\mathrm{TV}}^*/\ell ,\alpha _{\mathrm{TV}}^*)\) | (0.068/\(\ell ^{2}\), 0.051/\(\ell \)) | 6.656 | 0.895 | 31.667 | 16 | 4e |
| ICTV | \(L_2^2 \) | \((\alpha _{\mathrm{TV}}^*/\ell ,\alpha _{\mathrm{TV}}^*)\) | (0.051/\(\ell ^{2}\), 0.041/\(\ell \)) | 6.439 | 0.887 | 31.954 | 7 | 4f |
| TV | \(L_\eta ^1 \nabla \) | \(0.1/\ell \) | 0.057/\(\ell \) | 6.944 | 0.887 | 31.298 | 10 | 4g |
| TV | \(L_2^2 \) | \(0.1/\ell \) | 0.042/\(\ell \) | 6.623 | 0.879 | 31.710 | 12 | 4h |
4.3 Warm Initialisation
In our numerical experimentation, we generally found Algorithm 4.1 to perform well for learning the regularisation parameter for \(\text {TV}\) denoising as was done in [22]. For learning the two (or even more) regularisation parameters for \(\text {TGV}^2\) denoising, we found that a warm initialisation is needed to obtain convergence. More specifically, we use \(\text {TV}\) as an aid for discovering both the initial iterate \((\alpha ^0,\beta ^0)\) as well as the initial BFGS matrix \(B^1\). This is outlined in the following algorithm:
Algorithm 4.2
(1) Solve the corresponding problem for \(\text {TV}\) using Algorithm 4.1. This yields optimal \(\text {TV}\) denoising parameter \(\alpha _\text {TV}^*\), as well as the BFGS estimate \(B_\text {TV}\) for \(\nabla ^2 \mathcal F (\alpha _\text {TV}^*)\).
(2) Run Algorithm 4.1 for \(\text {TGV}^2\) with initialisation \((\alpha ^0,\beta ^0) :=(\alpha _\text {TV}^* \delta _0, \alpha _\text {TV}^*)\), and initial BFGS matrix \(B^1 :=\mathrm {diag}(B_\text {TV}\delta _0, B_\text {TV})\).
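Step (2) amounts to building a two-parameter starting point and a diagonal matrix from the scalar \(\text {TV}\) result. A minimal sketch, assuming the \(\text {TV}\) run has already produced the scalars `alpha_tv` and `b_tv` and that the scaling `delta0` is supplied by the user (all three names, and the function name, are hypothetical):

```python
import numpy as np

def warm_start_from_tv(alpha_tv, b_tv, delta0):
    """Initial iterate and initial BFGS matrix for the two-parameter problem."""
    x0 = np.array([alpha_tv * delta0, alpha_tv])   # (alpha^0, beta^0)
    B1 = np.diag([b_tv * delta0, b_tv])            # diag(B_TV * delta0, B_TV)
    return x0, B1
```

Starting from a positive `alpha_tv`, `b_tv` and `delta0` gives a strictly positive iterate and a positive definite matrix, so the subsequent BFGS iteration begins from a feasible, well-scaled state.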
Table 2 Quantified results for the synthetic image (\(\ell =256=\text {image width/height in pixels}\))
| Denoise | Cost | Initial \(\vec \alpha \) | Result \(\vec \alpha ^*\) | Value | SSIM | PSNR | Its. | Fig. |
|---|---|---|---|---|---|---|---|---|
| \(\hbox {TGV}^{2}\) | \(L_\eta ^1 \nabla \) | \((\alpha _{\mathrm{TV}}^*/\ell ,\alpha _{\mathrm{TV}}^*)\) | (0.453/\(\ell ^{2}\), 0.071/\(\ell \)) | 3.769 | 0.989 | 36.606 | 17 | 5c |
| \(\hbox {TGV}^{2}\) | \(L_2^2 \) | \((\alpha _{\mathrm{TV}}^*/\ell ,\alpha _{\mathrm{TV}}^*)\) | (0.307/\(\ell ^{2}\), 0.055/\(\ell \)) | 3.603 | 0.986 | 36.997 | 19 | 5d |
| ICTV | \(L_\eta ^1 \nabla \) | \((\alpha _{\mathrm{TV}}^*/\ell ,\alpha _{\mathrm{TV}}^*)\) | (0.505/\(\ell ^{2}\), 0.103/\(\ell \)) | 4.971 | 0.970 | 34.201 | 23 | 5e |
| ICTV | \(L_2^2 \) | \((\alpha _{\mathrm{TV}}^*/\ell ,\alpha _{\mathrm{TV}}^*)\) | (0.056/\(\ell ^{2}\), 0.049/\(\ell \)) | 3.947 | 0.965 | 36.206 | 7 | 5f |
| TV | \(L_\eta ^1 \nabla \) | \(0.1/\ell \) | 0.136/\(\ell \) | 5.521 | 0.966 | 33.291 | 6 | 5g |
| TV | \(L_2^2 \) | \(0.1/\ell \) | 0.052/\(\ell \) | 4.157 | 0.948 | 35.756 | 7 | 5h |
5 Experiments
In this section, we present some numerical experiments to verify the theoretical properties of the bilevel learning problems and the efficiency of the proposed solution algorithms. In particular, we exhaustively compare the performance of the newly proposed cost functional with respect to well-known quality measures, showing a better behaviour of the new cost for the chosen test images. The performance of the proposed BFGS algorithm, combined with the semismooth Newton method for the lower level problem, is also examined.
Moreover, on the basis of the proposed learning setting, a thorough comparison between \(\text {TGV}^2\) and \(\text {ICTV}\) is carried out. The use of higher-order regularisers in image denoising is rather recent, and the question of whether \(\text {TGV}^2\) or ICTV performs better has remained open. We target that question and, on the basis of the bilevel learning approach, we are able to give some partial answers.
5.1 Gaussian Denoising
We tested Algorithm 4.1 for \(\text {TV}\) and Algorithm 4.2 for \(\text {TGV}^2\) Gaussian denoising parameter learning on various images. Here we report the results for two images, the parrot image in Fig. 4a, and the geometric image in Fig. 5. We applied synthetic noise to the original images, such that the PSNR of the parrot image is 24.7, and the PSNR of the geometric image is 24.8.
In order to learn the regularisation parameter \(\alpha \) for \(\text {TV}\), we picked initial \(\alpha ^0=0.1/\ell \). For \(\text {TGV}^2\), initialisation by \(\text {TV}\) was used as in Algorithm 4.2. We chose the other parameters of Algorithm 4.1 as \(c=10^{-4}\), \(\rho =10^{-5}\), \(\theta =10^{-8}\), and \(\Theta =10\). For the SSN denoising method, the parameters \(\gamma =100\) and \(\mu =10^{-10}\) were chosen.
We have included results for both the \(L^2\)-squared cost functional \({L_2^2}\) and the Huberised total variation cost functional \({L_\eta ^1\!\nabla }\). The learning results are reported in Table 1 for the parrot image, and Table 2 for the geometric image. The denoising results with the discovered parameters are shown in Figs. 4 and 5. We report the resulting optimal parameter values, the cost functional value, PSNR, SSIM [46], as well as the number of iterations taken by the outer BFGS method.
Our first observation is that all approaches successfully learn a denoising parameter that gives a good-quality denoised image. Secondly, we observe that the gradient cost functional \({L_\eta ^1\!\nabla }\) performs visually and in terms of SSIM significantly better for \(\text {TGV}^2\) parameter learning than the cost functional \({L_2^2}\). In terms of PSNR, the roles are reversed, as they should be, since the \({L_2^2}\) cost is equivalent to PSNR. This again confirms that PSNR is a poor quality measure for images. For \(\text {TV}\), there is no significant difference between the different cost functionals in terms of visual quality, although the PSNR and SSIM differ.
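The equivalence between the squared \(L^2\) cost and PSNR follows from the fact that PSNR is a strictly decreasing function of the mean squared error. A minimal sketch of the standard definition, assuming intensities in \([0, 255]\) (the peak value is an assumption, as the intensity scaling is not stated in this section):

```python
import numpy as np

def psnr(u, f0, peak=255.0):
    """Peak signal-to-noise ratio in dB for intensities in [0, peak]."""
    diff = np.asarray(u, dtype=float) - np.asarray(f0, dtype=float)
    mse = np.mean(diff ** 2)   # mean squared error; must be positive
    return 10.0 * np.log10(peak ** 2 / mse)
```

Since the logarithm is monotone, any parameter choice that lowers the mean squared error raises the PSNR, so learning with the \({L_2^2}\) cost optimises PSNR directly.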
We also observe that the optimal \(\text {TGV}^2\) parameters \((\alpha ^*, \beta ^*)\) generally satisfy \(\beta ^*/\alpha ^* \in (0.75, 1.5)/\ell \). This confirms the earlier observed heuristic that if \(\ell \approx 128,\, 256\) then \(\beta \in (1, 1.5) \alpha \) tends to be a good choice. As we can observe from Figs. 4 and 5, this optimal \(\text {TGV}^2\) parameter choice also avoids the staircasing effect that can be observed with \(\text {TV}\) in the results.
In Fig. 3, we have plotted by the red star the discovered regularisation parameter \((\alpha ^*, \beta ^*)\) reported in Fig. 4. Studying the location of the red star, we may conclude that Algorithms 4.1 and 4.2 manage to find a nearly optimal parameter in very few BFGS iterations.
5.2 Statistical Testing
To obtain a statistically significant outlook on the performance of different regularisers and cost functionals, we made use of the Berkeley segmentation dataset BSDS300 [36], displayed in Fig. 6. We resized each image to 128 pixels on its shortest edge and took the \(128\times 128\) top left square of the image. To this dataset, we applied pixelwise Gaussian noise of variance \(\sigma ^2=2,10\), and 20. We tested the performance of both cost functionals, \({L_\eta ^1\!\nabla }\) and \({L_2^2}\), as well as the \(\text {TGV}^2\), \(\text {ICTV}\), and \(\text {TV}\) regularisers on this dataset, for all noise levels. In the first instance, reported in Figs. 7, 8, 9 and 10 (noise levels \(\sigma ^2=2,20\) only), and Tables 3, 4 and 5, we applied the proposed bilevel learning model on each image individually, to learn the optimal parameters specifically for that image, and a corresponding noisy image for all of the noise levels separately. For the algorithm, we use the same parametrisation as presented in Sect. 5.1.
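The cropping and noising steps could be sketched as follows. The resizing step is omitted because it needs an image library, and the function name `prepare_test_pair`, the fixed seed and the convention that `sigma` is the standard deviation of the noise are assumptions; if a variance is intended, its square root should be passed instead.

```python
import numpy as np

def prepare_test_pair(image, sigma, size=128, seed=0):
    """Crop the top-left size x size square and add pixelwise Gaussian noise."""
    clean = np.asarray(image, dtype=float)[:size, :size]
    rng = np.random.default_rng(seed)
    noisy = clean + sigma * rng.standard_normal(clean.shape)
    return clean, noisy
```

Fixing the seed makes the noisy image reproducible, so that all regularisers and cost functionals are compared on exactly the same data.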
The figures display the noisy images and indicate by colour coding the best result as judged by the structural similarity measure SSIM [46], PSNR and the objective function value (\({L_\eta ^1\!\nabla }\) or \({L_2^2}\) cost). These criteria are, respectively, the top, middle and bottom rows of colour-coding squares. A red square indicates that \(\text {TV}\) performed the best, a green square indicates that \(\text {ICTV}\) performed the best and a blue square indicates that \(\text {TGV}^2\) performed the best; this is naturally for the optimal parameters for the corresponding regulariser and cost functional discovered by our algorithms.
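Counting how often each regulariser gives the best result, as in the "Best" columns of the tables, might be done as in the sketch below. The function name `count_best` and the dictionary layout are hypothetical, and ties are attributed to the first regulariser listed, which is an assumption since the tie-breaking rule is not stated here.

```python
import numpy as np

def count_best(scores, higher_is_better=True):
    """scores maps a regulariser name to its per-image scores; returns win counts."""
    names = list(scores)
    table = np.stack([np.asarray(scores[n], dtype=float) for n in names])
    if higher_is_better:
        winners = table.argmax(axis=0)   # SSIM, PSNR
    else:
        winners = table.argmin(axis=0)   # cost functional value
    return {n: int(np.sum(winners == i)) for i, n in enumerate(names)}
```

For SSIM and PSNR the largest score wins, whereas for the cost functional value the smallest does, hence the `higher_is_better` switch.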
Table 3 Regulariser performance with individual learning, \(L_2^2 \) and \(L_\eta ^1 \nabla \) costs and noise variance \(\sigma ^{2} = 2\); BSDS300 dataset, resized
| Method | SSIM mean | SSIM std | SSIM med | SSIM best | PSNR mean | PSNR std | PSNR med | PSNR best | Value mean | Value std | Value med | Value best |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Noisy data | 0.978 | 0.015 | 0.981 | 0 | 41.56 | 0.86 | 41.95 | 0 | 2.9E4 | 3.1E2 | 2.9E4 | 0 |
| \(L_\eta ^1 \nabla \)-TV | 0.988 | 0.005 | 0.989 | 1 | 42.57 | 1.10 | 42.46 | 5 | 2.4E4 | 3.7E3 | 2.5E4 | 1 |
| \(L_\eta ^1 \nabla \)-ICTV | 0.989 | 0.005 | 0.990 | 141 | 42.74 | 1.16 | 42.62 | 143 | 2.3E4 | 3.9E3 | 2.4E4 | 137 |
| \(L_\eta ^1 \nabla \)-\(\hbox {TGV}^{2}\) | 0.989 | 0.005 | 0.989 | 58 | 42.70 | 1.17 | 42.55 | 52 | 2.4E4 | 4.0E3 | 2.5E4 | 62 |
| 95 % t test | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | |
| \(L_2^2 \)-TV | 0.988 | 0.005 | 0.988 | 2 | 42.64 | 1.14 | 42.50 | 2 | 0.41 | 0.08 | 0.43 | 2 |
| \(L_2^2 \)-ICTV | 0.988 | 0.005 | 0.989 | 142 | 42.79 | 1.18 | 42.64 | 148 | 0.39 | 0.08 | 0.41 | 148 |
| \(L_2^2 \)-\(\hbox {TGV}^{2}\) | 0.988 | 0.005 | 0.989 | 56 | 42.76 | 1.19 | 42.58 | 50 | 0.40 | 0.08 | 0.42 | 50 |
| 95 % t test | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | |
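The "95 % t test" rows rank the regularisers by statistical significance. The exact test is not spelled out in this section; one plausible reading, stated here as an assumption, is a one-sided paired t-test on the per-image scores. The sketch below uses hypothetical function names and a critical value of about 1.653, the one-sided 95 % quantile for roughly 200 pairs.

```python
import numpy as np

def paired_t_statistic(a, b):
    """t statistic of the paired differences a - b (degrees of freedom: len(a) - 1)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))

def significantly_better(a, b, t_crit=1.653):
    """One-sided paired test: are the scores in a larger than those in b?"""
    return paired_t_statistic(a, b) > t_crit
```

A paired test is natural here because every regulariser is evaluated on the same images, so small but consistent per-image gains can be significant even when the means in the table look nearly identical.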
Table 4 Regulariser performance with individual learning, \(L_2^2 \) and \(L_\eta ^1 \nabla \) costs and noise variance \(\sigma ^{2} = 10\); BSDS300 dataset, resized
| Method | SSIM mean | SSIM std | SSIM med | SSIM best | PSNR mean | PSNR std | PSNR med | PSNR best | Value mean | Value std | Value med | Value best |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Noisy data | 0.731 | 0.120 | 0.744 | 0 | 27.72 | 0.88 | 28.09 | 0 | 1.4E5 | 2.5E3 | 1.4E5 | 0 |
| \(L_\eta ^1 \nabla \)-TV | 0.898 | 0.036 | 0.900 | 4 | 31.28 | 1.63 | 30.97 | 8 | 7.3E4 | 2.2E4 | 7.3E4 | 1 |
| \(L_\eta ^1 \nabla \)-ICTV | 0.906 | 0.034 | 0.909 | 139 | 31.54 | 1.68 | 31.21 | 142 | 7.1E4 | 2.2E4 | 7.1E4 | 121 |
| \(L_\eta ^1 \nabla \)-\(\hbox {TGV}^{2}\) | 0.905 | 0.035 | 0.907 | 57 | 31.47 | 1.72 | 31.10 | 50 | 7.1E4 | 2.2E4 | 7.1E4 | 78 |
| 95 % t test | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | |
| \(L_2^2 \)-TV | 0.897 | 0.033 | 0.898 | 9 | 31.54 | 1.76 | 31.15 | 2 | 5.52 | 1.89 | 5.51 | 2 |
| \(L_2^2 \)-ICTV | 0.903 | 0.032 | 0.903 | 131 | 31.72 | 1.76 | 31.33 | 148 | 5.30 | 1.81 | 5.35 | 148 |
| \(L_2^2 \)-\(\hbox {TGV}^{2}\) | 0.902 | 0.033 | 0.903 | 60 | 31.67 | 1.80 | 31.28 | 50 | 5.38 | 1.87 | 5.39 | 50 |
| 95 % t test | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | |
Table 5 Regulariser performance with individual learning, \(L_2^2 \) and \(L_\eta ^1 \nabla \) costs and noise variance \(\sigma ^{2} = 20\); BSDS300 dataset, resized
| Method | SSIM mean | SSIM std | SSIM med | SSIM best | PSNR mean | PSNR std | PSNR med | PSNR best | Value mean | Value std | Value med | Value best |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Noisy data | 0.505 | 0.143 | 0.516 | 0 | 21.80 | 0.92 | 22.14 | 0 | 2.8E5 | 7.9E3 | 2.8E5 | 0 |
| \(L_\eta ^1 \nabla \)-TV | 0.795 | 0.063 | 0.799 | 7 | 27.27 | 1.64 | 27.02 | 11 | 1.0E5 | 3.5E4 | 9.7E4 | 1 |
| \(L_\eta ^1 \nabla \)-ICTV | 0.810 | 0.061 | 0.814 | 120 | 27.52 | 1.66 | 27.24 | 125 | 9.7E4 | 3.4E4 | 9.6E4 | 79 |
| \(L_\eta ^1 \nabla \)-\(\hbox {TGV}^{2}\) | 0.808 | 0.062 | 0.814 | 73 | 27.50 | 1.74 | 27.15 | 64 | 9.8E4 | 3.5E4 | 9.5E4 | 120 |
| 95 % t test | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV, \(\hbox {TGV}^{2}\) > TV | | | | ICTV, \(\hbox {TGV}^{2}\) > TV | | | |
| \(L_2^2 \)-TV | 0.802 | 0.056 | 0.804 | 8 | 27.70 | 1.93 | 27.28 | 0 | 13.65 | 5.53 | 13.14 | 0 |
| \(L_2^2 \)-ICTV | 0.811 | 0.056 | 0.816 | 126 | 27.86 | 1.91 | 27.45 | 138 | 13.14 | 5.22 | 12.62 | 138 |
| \(L_2^2 \)-\(\hbox {TGV}^{2}\) | 0.810 | 0.057 | 0.814 | 66 | 27.83 | 1.94 | 27.41 | 62 | 13.28 | 5.38 | 12.77 | 62 |
| 95 % t test | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | |
For the first image of the dataset, \(\text {ICTV}\) performs better than \(\text {TGV}^2\) in all of Figs. 7, 8, 9, 10, 11, 12, 13 and 14, while for the second image, the situation is reversed. We have highlighted these two images for the \({L_\eta ^1\!\nabla }\) cost in Figs. 15, 16, 17 and 18, for both noise levels \(\sigma =2\) and \(\sigma =20\). In the case where \(\text {ICTV}\) does better, hardly any difference can be observed by the eye, while for the second image, \(\text {TGV}^2\) clearly has less staircasing in the smooth areas of the image, especially with the noise level \(\sigma =20\).
5.3 The Choice of Cost Functional
Regulariser performance with batch learning, \(L_\eta ^1 \nabla \) and \(L_2^2 \) costs, noise variance \(\sigma ^{2} = 2\); BSDS300 dataset, resized
| Method | SSIM mean | SSIM std | SSIM med | SSIM best | PSNR mean | PSNR std | PSNR med | PSNR best | Value mean | Value std | Value med | Value best |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Noisy data | 0.978 | 0.015 | 0.981 | 16 | 41.56 | 0.86 | 41.95 | 24 | 2.9E4 | 3.1E2 | 2.9E4 | 16 |
| \(L_\eta ^1 \nabla \)-TV | 0.987 | 0.006 | 0.988 | 23 | 42.43 | 1.07 | 42.37 | 21 | 2.5E4 | 3.4E3 | 2.5E4 | 20 |
| \(L_\eta ^1 \nabla \)-ICTV | 0.988 | 0.006 | 0.989 | 119 | 42.56 | 1.06 | 42.51 | 135 | 2.4E4 | 3.5E3 | 2.5E4 | 113 |
| \(L_\eta ^1 \nabla \)-\(\hbox {TGV}^{2}\) | 0.987 | 0.006 | 0.989 | 42 | 42.51 | 1.09 | 42.44 | 20 | 2.4E4 | 3.6E3 | 2.5E4 | 51 |
| 95 % t test | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | |
| \(L_2^2 \)-TV | 0.986 | 0.007 | 0.987 | 13 | 42.46 | 0.95 | 42.43 | 17 | 0.42 | 0.07 | 0.43 | 17 |
| \(L_2^2 \)-ICTV | 0.987 | 0.007 | 0.988 | 139 | 42.57 | 0.95 | 42.56 | 128 | 0.41 | 0.07 | 0.42 | 128 |
| \(L_2^2 \)-\(\hbox {TGV}^{2}\) | 0.987 | 0.007 | 0.988 | 38 | 42.53 | 0.97 | 42.51 | 40 | 0.41 | 0.07 | 0.42 | 40 |
| 95 % t test | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | | ICTV > \(\hbox {TGV}^{2}\) > TV | | | |
Regulariser performance with batch learning, \(L_\eta ^1 \nabla \) and \(L_2^2 \) costs, noise variance \(\sigma ^{2} = 10\); BSDS300 dataset, resized
Method  SSIM Mean  SSIM Std  SSIM Med  SSIM Best  PSNR Mean  PSNR Std  PSNR Med  PSNR Best  Value Mean  Value Std  Value Med  Value Best
Noisy data  0.731  0.120  0.744  8  27.72  0.88  28.09  2  1.4E5  2.5E3  1.4E5  0
\(L_\eta ^1 \nabla \)-TV  0.893  0.035  0.897  23  31.24  1.87  30.94  23  7.5E4  2.2E4  7.3E4  18
\(L_\eta ^1 \nabla \)-ICTV  0.897  0.034  0.902  134  31.36  1.81  31.11  150  7.4E4  2.2E4  7.2E4  107
\(L_\eta ^1 \nabla \)-TGV\(^{2}\)  0.896  0.035  0.901  35  31.31  1.88  31.01  25  7.4E4  2.3E4  7.2E4  75
95 % t test  SSIM: ICTV > TGV\(^{2}\) > TV  PSNR: ICTV > TGV\(^{2}\) > TV  Value: ICTV, TGV\(^{2}\) > TV
\(L_2^2 \)-TV  0.887  0.035  0.889  29  31.31  1.50  31.15  25  5.72  1.91  5.51  25
\(L_2^2 \)-ICTV  0.889  0.036  0.893  127  31.41  1.44  31.28  131  5.57  1.83  5.37  131
\(L_2^2 \)-TGV\(^{2}\)  0.888  0.035  0.891  44  31.38  1.50  31.20  44  5.64  1.90  5.44  44
95 % t test  SSIM: ICTV > TGV\(^{2}\) > TV  PSNR: ICTV > TGV\(^{2}\) > TV  Value: ICTV > TGV\(^{2}\) > TV
Regulariser performance with batch learning, \(L_\eta ^1 \nabla \) and \(L_2^2 \) costs, noise variance \(\sigma ^{2} = 20\); BSDS300 dataset, resized
Method  SSIM Mean  SSIM Std  SSIM Med  SSIM Best  PSNR Mean  PSNR Std  PSNR Med  PSNR Best  Value Mean  Value Std  Value Med  Value Best
Noisy data  0.505  0.143  0.516  4  21.80  0.92  22.14  1  2.8E5  7.9E3  2.8E5  0
\(L_\eta ^1 \nabla \)-TV  0.789  0.067  0.798  18  27.37  2.13  26.98  24  1.0E5  3.7E4  9.8E4  14
\(L_\eta ^1 \nabla \)-ICTV  0.795  0.065  0.804  139  27.46  2.10  27.05  141  1.0E5  3.6E4  9.6E4  91
\(L_\eta ^1 \nabla \)-TGV\(^{2}\)  0.794  0.066  0.804  39  27.44  2.12  27.04  34  1.0E5  3.7E4  9.6E4  95
95 % t test  SSIM: ICTV > TGV\(^{2}\) > TV  PSNR: ICTV > TGV\(^{2}\) > TV  Value: TGV\(^{2}\) > ICTV > TV
\(L_2^2 \)-TV  0.786  0.053  0.790  31  27.50  1.71  27.27  33  14.11  5.78  13.16  33
\(L_2^2 \)-ICTV  0.790  0.054  0.790  123  27.56  1.64  27.37  119  13.84  5.54  12.75  119
\(L_2^2 \)-TGV\(^{2}\)  0.789  0.053  0.793  46  27.55  1.70  27.33  48  13.93  5.73  12.95  48
95 % t test  SSIM: ICTV, TGV\(^{2}\) > TV  PSNR: ICTV, TGV\(^{2}\) > TV  Value: ICTV > TGV\(^{2}\) > TV
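As a plausibility check on the "Noisy data" rows, the PSNR of an image corrupted by additive Gaussian noise can be estimated from the noise level alone. The sketch below assumes that the values 2, 10 and 20 act as noise standard deviations on a 0 to 255 intensity scale (as in the noise levels \(\sigma =2\) and \(\sigma =20\) quoted in the text), that PSNR is defined with peak value 255, and that clipping is ignored. None of these conventions is stated in this section, so the function name and its defaults are illustrative only.

```python
import math


def expected_noisy_psnr(sigma, peak=255.0):
    """PSNR in dB when the mean squared error equals sigma**2."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 20.0 * math.log10(peak / sigma)


# Noise levels assumed to be standard deviations on a 0-255 scale.
for sigma in (2, 10, 20):
    print(f"sigma = {sigma:2d}: PSNR ~ {expected_noisy_psnr(sigma):.2f} dB")
```

Under these assumptions the estimates are roughly 42, 28 and 22 dB, which is close to the mean PSNR reported for the noisy data in the three tables (41.56, 27.72 and 21.80). The remaining gap may stem from a different peak convention or from the resizing of the images.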
6 Conclusion and Outlook
In this paper, we propose a bilevel optimisation approach in function space for learning the optimal choice of parameters in higher-order total variation regularisation. We present a rigorous analysis of this optimisation problem as well as a numerical discussion in the context of image denoising.
Analytically, we obtain existence results for the bilevel optimisation problem and prove the Fréchet differentiability of the solution operator. This leads to the existence of Lagrange multipliers and a first-order optimality system characterising optimal solutions. In particular, the existence of an adjoint state allows us to obtain a gradient formula for the cost functional, which is important in the design of efficient solution algorithms.
Acknowledgments
This research has been supported by King Abdullah University of Science and Technology (KAUST) Award No. KUKI100743, EPSRC grants Nr. EP/J009539/1 “Sparse & Higher-order Image Restoration” and Nr. EP/M00483X/1 “Efficient computational tools for inverse imaging problems”, Escuela Politécnica Nacional de Quito Award No. PIS 1214, MATH-AmSud project SOCDE “Sparse Optimal Control of Differential Equations” and the Leverhulme Trust project on “Breaking the non-convexity barrier”. While in Quito, T. Valkonen has moreover been supported by SENESCYT (Ecuadorian Ministry of Higher Education, Science, Technology and Innovation) under a Prometeo Fellowship.
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.