Structural Changes in Nonlocal Denoising Models Arising Through Bi-Level Parameter Learning

We introduce a unified framework based on bi-level optimization schemes to deal with parameter learning in the context of image processing. The goal is to identify the optimal regularizer within a family depending on a parameter in a general topological space. Our focus lies on the situation with non-compact parameter domains, which is, for example, relevant when the commonly used box constraints are disposed of. To overcome this lack of compactness, we propose a natural extension of the upper-level functional to the closure of the parameter domain via Gamma-convergence, which captures possible structural changes in the reconstruction model at the edge of the domain. Under two main assumptions, namely, Mosco-convergence of the regularizers and uniqueness of minimizers of the lower-level problem, we prove that the extension coincides with the relaxation, thus admitting minimizers that relate to the parameter optimization problem of interest. We apply our abstract framework to investigate a quartet of practically relevant models in image denoising, all featuring nonlocality. The associated families of regularizers exhibit qualitatively different parameter dependence, describing a weight factor, an amount of nonlocality, an integrability exponent, and a fractional order, respectively. After the asymptotic analysis that determines the relaxation in each of the four settings, we finally establish theoretical conditions on the data that guarantee structural stability of the models and give examples of when stability is lost.


Introduction
One of the most widely used methods to solve image restoration problems is the variational regularization approach. It consists of minimizing a reconstruction functional that decomposes into a fidelity and a regularization term, which give rise to competing effects. While the fidelity term ensures that the reconstructed image is close to the (noisy) data, the regularization term is designed to remove the noise by incorporating prior information on the clean image. In the case of a simple L²-fidelity term, the reconstruction functional is given by

J(u) = ‖u − u_η‖²_{L²(Ω)} + R(u),  u ∈ L²(Ω),

where Ω ⊂ Rⁿ is the image domain, u_η ∈ L²(Ω) the noisy image, and R : L²(Ω) → [0, ∞] the regularizer. A common choice for R is the total variation (TV) regularization proposed by Rudin, Osher, & Fatemi [52], which penalizes sharp oscillations, but does not exclude edge discontinuities, as they appear in most images. Since its introduction, the TV-model has inspired a variety of more advanced regularization terms, like the infimal-convolution total variation (ICTV) [19], the total generalized variation (TGV) [14], and many more, cf. [10] and the references therein. Due to the versatility of the variational formulation, regularizers of a completely different type can be used as well. Recently, a lot of attention has been directed towards regularizers incorporating nonlocal effects, such as those induced by difference quotients [5,11,15,38] and fractional operators [1,3,4]. Nonlocal regularizers have the advantage of not requiring the existence of (full) derivatives, allowing one to work with functions that are less regular than those in the local counterpart.
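For intuition, a minimal discrete sketch of such a reconstruction functional, using a quadratic smoothness term as an illustrative stand-in for the regularizers discussed here (the solver, signal, and all parameter values are hypothetical, not taken from the paper):

```python
import numpy as np

# Minimal sketch of variational denoising with a quadratic smoothness
# regularizer (an illustrative stand-in, not the paper's model).  We minimize
#     J(u) = ||u - u_eta||^2 + alpha * ||D u||^2
# over discrete signals u, where D is the forward-difference operator.
# The minimizer solves the linear system (I + alpha * D^T D) u = u_eta.

def denoise_quadratic(u_eta, alpha):
    n = len(u_eta)
    D = np.diff(np.eye(n), axis=0)       # forward differences, shape (n-1, n)
    A = np.eye(n) + alpha * D.T @ D
    return np.linalg.solve(A, u_eta)

def J(u, u_eta, alpha):
    return np.sum((u - u_eta) ** 2) + alpha * np.sum(np.diff(u) ** 2)

rng = np.random.default_rng(0)
u_clean = np.sign(np.sin(np.linspace(0, 2 * np.pi, 64)))  # piecewise-constant "image"
u_eta = u_clean + 0.3 * rng.standard_normal(64)           # noisy data
u_rec = denoise_quadratic(u_eta, alpha=1.0)
```

Since `u_rec` is the exact minimizer of the discrete energy, it can only lower J relative to the data itself; stronger regularizers like TV would additionally preserve the jumps.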
With an abundance of available choices, finding a suitable regularization term for a specific application is paramount for obtaining accurate reconstructions. This is often done by fixing a parameter-dependent family of regularizers and tuning the parameter in accordance with the noise and data. Carrying out this process via trial and error can be hard and inefficient, which led to the development of a more structured approach in the form of bi-level optimization. We refer, e.g., to [30,31] (see also [21,22,35,53]) and to the references therein, as well as to [33] for a detailed overview. The idea behind bi-level optimization is to employ a supervised learning scheme based on a set of training data consisting of noisy images and their corresponding clean versions. To determine an optimal parameter, one minimizes a selected cost functional which quantifies the error with respect to the training data. Overall, this results in a nested variational problem with upper- and lower-level optimization steps related to the cost and reconstruction functional, respectively. Key aspects of the mathematical study of these bi-level learning schemes include establishing the existence of solutions and deriving optimality conditions, which lay the foundation for devising reliable numerical solution methods.
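The supervised training loop just described can be sketched as a simple grid search over the parameter, with a quadratic lower-level solver standing in for the general reconstruction model (all names and values here are illustrative assumptions):

```python
import numpy as np

# Sketch of the bi-level idea: for each candidate parameter alpha, solve
# the lower-level denoising problem, then score the result against the
# known clean image (upper-level cost).  Grid search stands in for the
# more sophisticated optimization schemes discussed in the text.

def lower_level(u_eta, alpha):
    # closed-form minimizer of ||u - u_eta||^2 + alpha * ||D u||^2
    n = len(u_eta)
    D = np.diff(np.eye(n), axis=0)
    return np.linalg.solve(np.eye(n) + alpha * D.T @ D, u_eta)

def upper_level_cost(alpha, u_clean, u_eta):
    return np.sum((lower_level(u_eta, alpha) - u_clean) ** 2)

rng = np.random.default_rng(1)
u_clean = np.sin(np.linspace(0, 2 * np.pi, 64))
u_eta = u_clean + 0.5 * rng.standard_normal(64)

alphas = np.geomspace(1e-3, 1e3, 25)
costs = [upper_level_cost(a, u_clean, u_eta) for a in alphas]
best_alpha = alphas[int(np.argmin(costs))]
```

The analysis in this paper concerns what happens when the search interval for the parameter is not compact, so that minimizing sequences may run off to its boundary.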
In recent years, there has been a rapid growth in the literature devoted to addressing the above questions. To mention but a few examples, we first refer to the paper [41] dealing with learning real-valued weight parameters in front of the regularization terms for a rather general class of inverse problems; in [2,6], the authors optimize the fractional parameter of a regularizer depending on the spectral fractional Laplacian; spatially dependent weights are determined through training via other nonlocal bi-level schemes (e.g., inside the Gagliardo semi-norm [40] or in a type of fractional gradient [32]), and in classical TV-models [23,39,47]; as done in [29], one can also learn the fidelity term instead of the regularizer.
A common denominator in the above references is the presence of certain a priori compactness constraints on the set of admissible parameters, such as box constraints like in [41], where the weights are assumed to lie in some compact interval away from 0 and infinity. These conditions make it possible to prove stability of the lower-level problem and obtain existence of optimal parameters within a class of structurally equivalent regularizers. However, imposing artificial restrictions like these on the parameter range may lead to suboptimal results depending on the given training data.
It is therefore natural to consider removing such constraints in order to work on maximal domains naturally associated with the parameters, which is also our focus in this paper. An inherent effect of this approach is that qualitative changes in the structure of the regularizer may occur at the edges of the domain. If optimal parameters are attained at the boundary, this indicates that the chosen class of regularization terms is not well-suited to the training data. To exclude these degenerate cases, it is of interest to provide analytic conditions that guarantee that the optimal parameters are attained in the interior of the domain, thereby preserving the structure of the regularizer. The first work to address the aforementioned tasks is [30] by De Los Reyes, Schönlieb, & Valkonen, where optimization is carried out for weighted sums of local regularizers of different type with each weight factor allowed to take any value in [0, ∞]. As such, their bi-level scheme is able to encompass multiple regularization structures at once, like TV and TV² and their interpolation TGV. Similarly, the authors in [44] vary the weight factor in the whole range [0, ∞] as well as the underlying finite-dimensional norm of the total variation regularizer. We also mention [28], where the order of a newly introduced nonlocal counterpart of the TGV-regularizer is tuned, and [27], which studies a bi-level scheme covering the cases of TV, TGV², and NsTGV² in a comprehensive way.
In this paper, we introduce a unified framework to deal with parameter learning beyond structural stability in the context of bi-level optimization schemes. In contrast to the above references, where the analysis is tailored to a specifically chosen type of parameter dependence, our regularizers can exhibit a general dependence on parameters in a topological space. Precisely, we consider a parametrized family of regularizers R_λ : L²(Ω) → [0, ∞] with λ ranging over a subset Λ of a topological space X, which is assumed to be first countable. If we focus for brevity on a single data point (u_c, u_η) ∈ L²(Ω) × L²(Ω), with u_c and u_η the clean and noisy images (see Sect. 2 for larger data sets), the bi-level optimization problem reads:

(Upper-level)  Minimize I(λ) := min_{u ∈ K_λ} ‖u − u_c‖²_{L²(Ω)} over λ ∈ Λ,
(Lower-level)  K_λ := argmin_{u ∈ L²(Ω)} J_λ(u),

where J_λ(u) := ‖u − u_η‖²_{L²(Ω)} + R_λ(u) is the reconstruction functional. Our approach for studying this general bi-level learning scheme relies on asymptotic tools from the calculus of variations. We define a suitable notion of stability for the lower-level problems that requires the family of functionals {J_λ}_{λ∈Λ} to be closed under taking Γ-limits; see [13,25] for a comprehensive introduction to Γ-convergence. Since Γ-convergence ensures the convergence of sequences of minimizers, one can conclude that, in the presence of stability, the upper-level functional I admits a minimizer (Theorem 2.3).
A different strategy is required to obtain the existence of solutions when stability fails. Especially relevant here is the case of real-valued parameters when box constraints are disposed of and non-closed intervals are considered; clearly, stability is then lost for the simple fact that a sequence of parameters can converge to the boundary of Λ. To overcome this issue, we propose a natural extension Ī : Λ̄ → [0, ∞] of I, now defined on the closure Λ̄ of our parameter domain, and identified via Γ-convergence of the lower-level functionals. Precisely,

Ī(λ) := min_{u ∈ argmin J̄_λ} ‖u − u_c‖²_{L²(Ω)},

where the functionals J̄_λ : L²(Ω) → [0, ∞] are characterized as L²-weak Γ-limits (if they exist) of functionals J_{λ_k} with λ_k → λ. To justify the choice of this particular extension, we derive an intrinsic connection with relaxation theory in the calculus of variations (for an introduction, see, e.g., [24, Chapter 9] and the references therein). Explicitly, the relaxation of the upper-level functional I is given by its lower semicontinuous envelope (after the trivial extension to Λ̄ by ∞),

I^rlx(λ) := inf { lim inf_{k→∞} I(λ_k) : (λ_k)_k ⊂ Λ with λ_k → λ }.

This relaxed version of I has the desirable property that it admits a minimizer (if Λ̄ is compact) and minimizing sequences of I have subsequences that converge to an optimal parameter of I^rlx. Our main theoretical result (Theorem 2.5) shows that the extension Ī coincides with the relaxation I^rlx under suitable assumptions and therefore inherits the same properties (cf. Corollary 2.8).
Besides the generic conditions that each R_λ is weakly lower semicontinuous and has non-empty domain (see (H)), which ensure that J_λ possesses a minimizer, we work under two main assumptions: (i) the Mosco-convergence of the regularizers, i.e., Γ-convergence with respect to both the strong and weak L²-topology, and (ii) the uniqueness of minimizers of J̄_λ for λ ∈ Λ̄ \ Λ.
We demonstrate in Example 2.7 that these assumptions are in fact optimal. Due to (i), the Γ-limits J̄_λ preserve the additive decomposition into the L²-fidelity term and a regularizer, and coincide with J_λ inside Λ. As a consequence of the latter, it follows that Ī = I in Λ, making Ī a true extension of I. For the parameter values at the boundary, λ ∈ Λ̄ \ Λ, however, the regularizers present in J̄_λ can have a completely different structure from the family of regularizers {R_λ}_{λ∈Λ} that we initially started with. When the optimal parameter of the extended problem is attained inside Λ, one recovers instead a solution to the original training scheme, yielding structure preservation. For a discussion on related results in the context of optimal control problems [9,16,17], we refer to the end of Sect. 2.
To demonstrate the applicability of our abstract framework, we investigate a quartet of practically relevant scenarios with families of nonlocal regularizers that induce qualitatively different structural changes; namely, learning the optimal weight, varying the amount of nonlocality, optimizing the integrability exponent, and tuning the fractional parameter. More precisely, in all four applications, our starting point is a non-closed real interval Λ ⊂ [−∞, ∞] and we seek to determine the extension Ī on the closed interval Λ̄, which admits a minimizer by the theory outlined above. The first step is to calculate the Mosco-limits of the regularizers, which reveals the type of structural change occurring at the boundary points. Subsequently, we study for which training sets of clean and noisy images the optimal parameters are attained either inside or at the edges. In two cases, we determine explicit analytic conditions on the data that guarantee structure preservation for the optimization process.
The first setting involves a rather general nonlocal regularizer R : L²(Ω) → [0, ∞] multiplied by a weight parameter α in Λ = (0, ∞). Inside the domain, we observe structural stability as J̄_α = J_α for all α ∈ Λ; in contrast, the regularization disappears when α = 0 and forces the solutions to be constant when α = ∞. Moreover, we derive sufficient conditions in terms of the data that prevent the optimal parameter from being attained at the boundary points; for a single data point (u_c, u_η), they specialize to requiring that R penalizes the noisy image more than the clean one and that u_η lies closer to u_c than its mean value does; see Theorem 3.2. Notice that the first of these two conditions is comparable to the one in [30, Eq. (10)] and shows positivity of optimal weights. Inspired by the use of different L^p-norms in image processing, such as in the form of quadratic, TV, and Lipschitz regularization [50, Sect. 4], we focus our second case on the integrability exponent of nonlocal regularizers of double-integral type; precisely, functionals of the form

R_p(u) = ( ∫_Ω ∫_Ω f(x, y, u(x), u(y))^p dx dy )^{1/p}

with a suitable f : Ω × Ω × R × R → [0, ∞). Possible choices for the integrand f include bounded functions or functions of difference-quotient type. We prove stability of the lower-level problem in Λ, and determine the Mosco-limit for p → ∞ via L^p-approximation techniques as in [20,42]. In particular, we show that it is given by a double-supremal functional of the form

R_∞(u) = ess sup_{(x,y) ∈ Ω×Ω} f(x, y, u(x), u(y)).

In order to see how this structural change affects the image reconstruction, we highlight examples of training data for which the supremal regularizer performs better or worse than the integral counterparts. As a third application, we consider two families of nonlocal regularizers {R_δ}_{δ∈Λ} with Λ = (0, ∞), which were introduced by Aubert & Kornprobst [5] and Brezis & Nguyen [15], respectively, and are closely related to nonlocal filters frequently used in image processing. The parameter δ reflects the amount of nonlocality in the regularizer. It is known that the functionals R_δ tend, as δ → 0, to a multiple of the total variation in the sense of Γ-convergence.
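The L^p-to-supremal transition behind the second application can be sketched on a discrete grid (the grid, the difference-quotient integrand, and the normalization are illustrative assumptions, not the paper's exact setup): normalized p-means of the integrand increase toward its maximum as p grows.

```python
import numpy as np

# Discrete sketch of the L^p -> L^infty effect: for a difference-quotient
# integrand f(x, y, u(x), u(y)) = |u(x) - u(y)| / |x - y| on a grid, the
# normalized p-norms
#     R_p(u) = ( mean_{x != y} f^p )^{1/p}
# are nondecreasing in p and approach the supremal value max f as p -> infty.

x = np.linspace(0.0, 1.0, 20)
u = np.sin(2 * np.pi * x)

X, Y = np.meshgrid(x, x)
mask = X != Y                               # exclude the diagonal x = y
f = np.abs(u[:, None] - u[None, :])[mask] / np.abs(X - Y)[mask]

def R_p(p):
    return np.mean(f ** p) ** (1.0 / p)

R_sup = f.max()                             # discrete supremal functional
```

The monotonicity of p ↦ R_p here is the elementary Lyapunov inequality for p-means with respect to a probability measure, the discrete analogue of the L^p-approximation used in the text.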
Based on these results, we prove in both cases that the reconstruction functional of our bi-level optimization scheme turns into the classical TV-denoising model when δ = 0, whereas the regularization vanishes at the other boundary value, δ = ∞. As such, the extended bi-level schemes encode simultaneously nonlocal and total variation regularizations. We round off the discussion by presenting some instances of training data where the optimal parameters are attained either at the boundary or in the interior of Λ̄.
Our final bi-level optimization problem features a different type of nonlocality arising from fractional operators; to be precise, we consider, in the same spirit as in [1], the L²-norm of the spectral fractional Laplacian as a regularizer. The parameter of interest here is the order s/2 of the fractional Laplacian, which is taken in the fractional range s ∈ Λ = (0, 1). At the values s = 0 and s = 1, we recover local models with regularizers equal to the L²-norm of the function and of its gradient, respectively. Thus, one expects the fractional model to perform better than the two local extremes. We quantify this presumption by deriving analytic conditions in terms of the eigenfunctions and eigenvalues of the classical Laplacian on Ω ensuring that the optimal parameters are attained in the truly fractional regime. These conditions on the training data are established by proving and exploiting the differentiability of the extended upper-level functional Ī.
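The spectral mechanism can be sketched in one dimension (the interval, eigenbasis, weight, and data below are toy assumptions, not the paper's model): on (0, π) with Dirichlet conditions, the eigenpairs of the negative Laplacian are (sin(kx), k²), the fractional regularizer acts modewise as k^{2s}, and the lower-level problem decouples across modes.

```python
import numpy as np

# Toy spectral sketch: for u = sum_k c_k sin(k x), take
#     R_s(u) = sum_k (k^2)^s c_k^2,
# so that minimizing ||u - u_eta||^2 + alpha * R_s(u) is diagonal:
#     c_k = d_k / (1 + alpha * k^(2s)),  d_k = coefficients of u_eta.
# s = 0 penalizes the L^2-norm of u, s = 1 the L^2-norm of its gradient.

K = 50
k = np.arange(1, K + 1)
alpha = 1e-3
d_clean = 1.0 / k ** 2                          # smooth clean image (fast decay)
d_eta = d_clean + np.where(k > 25, 0.05, 0.0)   # noise concentrated at high modes

def upper_cost(s):
    c = d_eta / (1.0 + alpha * k ** (2.0 * s))  # lower-level minimizer, modewise
    return np.sum((c - d_clean) ** 2)           # squared L^2-error (up to a constant)

costs = {s: upper_cost(s) for s in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

Since the damping factor scales like k^{2s}, larger s suppresses high-frequency noise more strongly; with the noise placed at high modes as above, s = 1 beats s = 0, and the paper's eigenvalue conditions decide when an intermediate, genuinely fractional s is optimal.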
For completeness, we mention that practically relevant scenarios when Λ is a subset of a general topological space include those in which the reconstruction parameters are space-dependent, and thus described by functions. The analysis of this class of applications is left open for future investigations.
The outline of the paper is as follows. In Sect. 2, we present the general abstract bi-level framework, and prove the results regarding the existence of optimal parameters and the two types of extensions of bi-level optimization schemes. Sections 3-6 then deal with the four different, practically relevant applications mentioned in the previous paragraph. As a note, we point out that they are each presented in a self-contained way, allowing the readers to move directly to the sections that correspond best to their interests.

Establishing the Unified Framework
Let Ω ⊂ Rⁿ be an open bounded set, and let {(u_{c,j}, u_{η,j})}_{j=1}^N be a set of available square-integrable training data, where each u_{c,j} represents a clean image and u_{η,j} a distorted version thereof, which can be obtained, for instance, by applying some noise to u_{c,j}. These data are collected in the vector-valued functions u_c = (u_{c,1}, …, u_{c,N}) and u_η = (u_{η,1}, …, u_{η,N}) ∈ L²(Ω; R^N). To reconstruct each damaged image u_{η,j}, we consider denoising models that consist of a simple fidelity term and a (possibly nonlocal) regularizer; precisely, we minimize functionals J_{λ,j} : L²(Ω) → [0, ∞],

J_{λ,j}(u) := ‖u − u_{η,j}‖²_{L²(Ω)} + R_λ(u),    (2.1)

where the regularizer R_λ : L²(Ω) → [0, ∞], with Dom R_λ = {v ∈ L²(Ω) : R_λ(v) < ∞}, is a (possibly nonlocal) functional parametrized over λ ∈ Λ, with Λ a subset of a topological space X satisfying the first axiom of countability. Throughout the paper, we always assume that

(H)  for every λ ∈ Λ: Dom R_λ is non-empty and R_λ is weakly L²-lower semicontinuous.
Observe that the functionals J_{λ,j} then have a minimizer by the direct method in the calculus of variations. The result of the reconstruction process, meaning the quality of the reconstructed image resulting as a minimizer of (2.1), is known to depend on the choice of the regularizing term R_λ. Our goal is to set up a training scheme that is able to learn how to select a "good" parameter λ within a corresponding given family {R_λ}_{λ∈Λ} of regularizers. Here, as briefly described in the Introduction for the single data point case (N = 1), we follow the approach introduced in [30,31] in the spirit of machine learning optimization schemes, where training the regularization term means to solve the nested variational problem

(T)  Minimize I(λ) := min_{w ∈ K_λ} Σ_{j=1}^N ‖w_j − u_{c,j}‖²_{L²(Ω)} over λ ∈ Λ,    (2.2)

with K_λ := {w ∈ L²(Ω; R^N) : w_j ∈ argmin_{u ∈ L²(Ω)} J_{λ,j}(u) for j = 1, …, N} and J_{λ,j} as in (2.1). Notice that K_λ ≠ ∅ because, for all j ∈ {1, . . . , N}, the functional J_{λ,j} attains its minimum by Assumption (H).
To study the training scheme (T), we start by introducing a notion of weak L²-stability for the family {J_λ}_{λ∈Λ}, with

J_λ := (J_{λ,1}, …, J_{λ,N}).    (2.3)

This notion relies on the concept of Γ-convergence and is related to the notion of (weak) stability as in [41, Definition 2.3], which is defined in terms of minimizers of the lower-level problem. Before proceeding, we briefly recall the definition and some properties of Γ-convergence in the setting relevant to us; for more on this topic, see [13,25] for instance.

Definition 2.2 (Γ- and Mosco-convergence) Let F_k, F : L²(Ω) → [0, ∞] for k ∈ N. The sequence (F_k)_k Γ-converges to F with respect to the weak topology in L²(Ω), written F = Γ(w-L²)-lim_{k→∞} F_k, if the following two conditions hold:
• (Liminf inequality) For every u ∈ L²(Ω) and every sequence (u_k)_k ⊂ L²(Ω) with u_k ⇀ u in L²(Ω), it holds that F(u) ≤ lim inf_{k→∞} F_k(u_k).
• (Limsup inequality) For every u ∈ L²(Ω), there exists a sequence (u_k)_k ⊂ L²(Ω) such that u_k ⇀ u in L²(Ω) and lim sup_{k→∞} F_k(u_k) ≤ F(u).
The sequence (F_k)_k converges in the sense of Mosco-convergence in L²(Ω) to F, written F = Mosc(L²)-lim_{k→∞} F_k, if, in addition, the limsup inequality can be realised by a sequence converging strongly in L²(Ω).
If the liminf inequality holds, then the sequence from the limsup inequality automatically satisfies lim_{k→∞} F_k(u_k) = F(u), and is therefore often called a recovery sequence. We note that the above sequential definition of Γ-convergence coincides with the topological definition [25, Proposition 8.10] for equi-coercive sequences (F_k)_k, i.e., F_k ≥ Ψ for all k ∈ N and for some Ψ : L²(Ω) → [0, ∞] whose sublevel sets are weakly compact. In particular, the theory implies that the Γ-limit F is (sequentially) L²-weakly lower semicontinuous. Γ-convergence has the key property of yielding the convergence of solutions (if they exist) to those of the limit problem, which makes it a suitable notion of variational convergence. Precisely, if u_k is a minimizer of F_k for all k ∈ N and u is a cluster point of the sequence (u_k)_k, then u is a minimizer of F and min_{L²(Ω)} F = lim_{k→∞} inf_{L²(Ω)} F_k [25, Corollary 7.20]. Notice that the existence of cluster points is implied by the assumption of equi-coercivity. In the special case when (F_k)_k is a constant sequence of functionals, say F_k = G for all k ∈ N, the Γ-limit corresponds to the relaxation of G, i.e., its L²-weakly lower semicontinuous envelope. Observe that replacing each F_k by its relaxation does not affect the Γ-limit of (F_k)_k, see [25, Proposition 6.11].
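The convergence-of-minimizers property can be checked by hand on a toy quadratic family (all choices here are illustrative, not from the paper): for F_k(u) = ‖u − u_η‖² + (1/k)‖u‖², the minimizers u_k = u_η/(1 + 1/k) converge to u_η, the minimizer of the limit F(u) = ‖u − u_η‖², and the minimum values converge to min F = 0.

```python
import numpy as np

# Toy illustration of the variational property of Gamma-convergence:
# minimizers and minimum values of F_k converge to those of the limit F.

u_eta = np.array([1.0, -2.0, 3.0])

def minimizer(k):
    # argmin of F_k(u) = ||u - u_eta||^2 + (1/k)||u||^2, by first-order conditions
    return u_eta / (1.0 + 1.0 / k)

def F_k(u, k):
    return np.sum((u - u_eta) ** 2) + (1.0 / k) * np.sum(u ** 2)

mins = [F_k(minimizer(k), k) for k in (1, 10, 100, 1000)]
```

A short computation gives min F_k = (1/k)/(1 + 1/k) · ‖u_η‖², which decreases to 0 = min F as k → ∞.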
As we discuss next, weak L²-stability provides existence of solutions to the training scheme (T). We note that the family of functionals {J_λ}_{λ∈Λ} as in (2.3) is equi-coercive in a componentwise sense.

Theorem 2.3 Suppose that the family {J_λ}_{λ∈Λ} is weakly L²-stable. Then the upper-level functional I admits a minimizer on Λ.

Proof The statement follows directly from the direct method and the classical properties of Γ-convergence.
Let (λ_k)_k ⊂ Λ be a minimizing sequence for I. Then, for each k ∈ N, there is w_k ∈ K_{λ_k} such that

I(λ_k) = Σ_{j=1}^N ‖w_{k,j} − u_{c,j}‖²_{L²(Ω)}.    (2.4)

In particular, (w_k)_k is uniformly bounded in L²(Ω; R^N); hence, extracting a subsequence if necessary, one may assume that w_k ⇀ w in L²(Ω; R^N) as k → ∞ for some w ∈ L²(Ω; R^N). Using the equi-coercivity, we apply the compactness result for Γ-limits [25, Corollary 8.12] to find a further subsequence of (λ_k)_k (not relabeled) such that (J_{λ_k,j})_k (w-L²)-converges for all j ∈ {1, ..., N}. Consequently, by the weak L²-stability assumption and the properties of Γ-convergence on minimizing sequences, there exists λ̄ ∈ Λ such that w ∈ K_λ̄. Then, along with (2.4) and the weak lower semicontinuity of the norm,

I(λ̄) ≤ Σ_{j=1}^N ‖w_j − u_{c,j}‖²_{L²(Ω)} ≤ lim inf_{k→∞} I(λ_k) = inf_Λ I,

which finishes the proof.

Remark 2.4
We give a simple counterexample to illustrate that minimizers of I may not exist in general. Take Λ = (0, ∞) ⊂ R, a single data point (u_c, u_η) with u_c = u_η ≠ 0, and R_λ(u) = λ‖u‖²_{L²(Ω)} for λ ∈ Λ. The unique minimizer of J_λ is u_η/(1 + λ), so that I(λ) = (λ/(1 + λ))²‖u_η‖²_{L²(Ω)}, which does not have a minimizer on Λ = (0, ∞). By the previous theorem, the family must fail to be weakly L²-stable. Indeed, Γ(w-L²)-lim_{λ→0} J_λ coincides with the pointwise limit and is equal to ‖· − u_η‖²_{L²(Ω)}, which is not an element of {J_λ}_{λ∈(0,∞)}.

Theorem 2.3 is useful in many situations, including the basic case when the parameter set Λ is a compact real interval. However, weak L²-stability is not always guaranteed, as Remark 2.4 illustrates. If, for instance, we have a sequence (λ_k)_k converging to a point in X outside Λ, then there is no reason to expect that the Γ-limit of (J_{λ_k})_k belongs to the family {J_λ}_{λ∈Λ}. To overcome this issue and provide a more general existence framework, we will look at a suitable replacement of the bi-level scheme. In the following, we denote by Λ̄ the closure of Λ and suppose that for each j ∈ {1, . . . , N} and λ ∈ Λ̄, the Γ-limits

J̄_{λ,j} := Γ(w-L²)-lim_{k→∞} J_{λ_k,j}    (2.5)

exist, where (λ_k)_k is an arbitrary sequence in Λ with λ_k → λ. We further set

K̄_{λ,j} := argmin_{u ∈ L²(Ω)} J̄_{λ,j}(u)  and  K̄_λ := {w ∈ L²(Ω; R^N) : w_j ∈ K̄_{λ,j} for j = 1, …, N}.

Based on these definitions, we introduce Ī : Λ̄ → [0, ∞] as the extension of the upper-level functional I given by

Ī(λ) := min_{w ∈ K̄_λ} Σ_{j=1}^N ‖w_j − u_{c,j}‖²_{L²(Ω)}.    (2.6)

Observe that K̄_{λ,j} is L²-weakly closed because the functional J̄_{λ,j}, as a (w-L²)-limit by (2.5), is L²-weakly lower semicontinuous. Hence, the minimum in the definition of Ī is actually attained. Notice that taking constant sequences in the parameter space in (2.5) and using the weak lower semicontinuity of the regularizers R_λ in (H), we conclude that J̄_λ coincides with J_λ whenever λ ∈ Λ. In that sense, we can think of {J̄_λ}_{λ∈Λ̄} as the extension of the family {J_λ}_{λ∈Λ} to the closure of Λ.
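The degeneracy behind this kind of counterexample can be checked numerically (the concrete choice u_c = u_η ≠ 0 with the quadratic regularizer R_λ(u) = λ‖u‖² is illustrative): the lower-level minimizer is u_η/(1 + λ), so the upper-level cost decreases strictly as λ → 0 and its infimum 0 is not attained on the open interval (0, ∞).

```python
import numpy as np

# Numerical sketch: with a single data point u_c = u_eta != 0 and
# R_lambda(u) = lambda * ||u||^2, the lower-level minimizer of
# ||u - u_eta||^2 + lambda * ||u||^2 is u_eta / (1 + lambda), hence
#     I(lambda) = (lambda / (1 + lambda))^2 * ||u_eta||^2,
# which has no minimizer on (0, infty).

u_eta = np.array([2.0, -1.0])

def I(lam):
    u_min = u_eta / (1.0 + lam)              # closed-form lower-level minimizer
    return np.sum((u_min - u_eta) ** 2)      # upper-level cost (u_c = u_eta)

lams = [1.0, 0.1, 0.01, 0.001]
vals = [I(l) for l in lams]
```

The extension to λ = 0 introduced next repairs exactly this: Ī(0) = 0 is attained, and Ī coincides with the relaxation of I.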

Altogether, this leads to the extended bi-level problem

Minimize Ī(λ) over λ ∈ Λ̄.

The theorem below compares the extended upper-level functional Ī with the relaxation of I (after trivial extension to Λ̄ by ∞), that is, with its lower semicontinuous envelope I^rlx : Λ̄ → [0, ∞] given by

I^rlx(λ) := inf { lim inf_{k→∞} I(λ_k) : (λ_k)_k ⊂ Λ with λ_k → λ }.    (2.7)

As we will see, the key assumption to obtain the equality between Ī and I^rlx is the Mosco-convergence of the family of regularizers in (2.9), which is stronger than the Γ-convergence of the reconstruction functionals in (2.5). It even implies the Mosco-convergence of the reconstruction functionals and, in this case, the limit passage can be performed additively in the fidelity and regularizing term; thus, for all j ∈ {1, . . . , N}, we have

J̄_{λ,j}(u) = ‖u − u_{η,j}‖²_{L²(Ω)} + R̄_λ(u)    (2.8)

for u ∈ L²(Ω).
Theorem 2.5 Suppose that:
(i) the Mosco-limits

R̄_λ := Mosc(L²)-lim_{k→∞} R_{λ_k}    (2.9)

exist for each λ ∈ Λ̄, with (λ_k)_k taking values on sequences in Λ, and
(ii) K̄_λ is a singleton for every λ ∈ Λ̄ \ Λ.
Then, the extension Ī of I to the closure Λ̄ coincides with the relaxation of I, i.e., Ī = I^rlx on Λ̄.
Proof To show that Ī ≤ I^rlx, we take λ ∈ Λ̄ and let (λ_k)_k ⊂ Λ with λ_k → λ be an admissible sequence for I^rlx(λ) in (2.7). We may even assume that ∞ > lim inf_{k→∞} I(λ_k) = lim_{k→∞} I(λ_k). Then, recalling (2.2) and fixing δ > 0, we can find w_k ∈ K_{λ_k} such that

Σ_{j=1}^N ‖w_{k,j} − u_{c,j}‖²_{L²(Ω)} ≤ I(λ_k) + δ.

In particular, (w_k)_k is uniformly bounded in L²(Ω; R^N), which allows us to extract an L²-weakly converging subsequence (not relabeled) with limit w̄ ∈ L²(Ω; R^N). By the properties of Γ-convergence on cluster points of minimizing sequences recalled above (see also [25, Corollary 7.20]), we infer from (2.5) that w̄_j ∈ argmin_{u ∈ L²(Ω)} J̄_{λ,j}(u) for all j ∈ {1, . . . , N}; in other words, w̄ ∈ K̄_λ. Thus,

Ī(λ) ≤ Σ_{j=1}^N ‖w̄_j − u_{c,j}‖²_{L²(Ω)} ≤ lim inf_{k→∞} Σ_{j=1}^N ‖w_{k,j} − u_{c,j}‖²_{L²(Ω)} ≤ lim_{k→∞} I(λ_k) + δ.

By letting δ → 0 first, and then taking the infimum over all admissible sequences for I^rlx(λ), we conclude that Ī(λ) ≤ I^rlx(λ).

To prove the reverse inequality, we start by recalling that for λ ∈ Λ, J_λ is weakly L²-lower semicontinuous by Assumption (H); thus, (2.5) yields J̄_λ = J_λ for λ ∈ Λ. Consequently, Ī(λ) = I(λ) ≥ I^rlx(λ) for λ ∈ Λ. We are then left to consider λ ∈ Λ̄ \ Λ and find a sequence (λ_k)_k ⊂ Λ converging to λ and satisfying lim inf_{k→∞} I(λ_k) ≤ Ī(λ). To that end, take any (λ_k)_k ⊂ Λ with λ_k → λ, and let w_k ∈ K_{λ_k} for k ∈ N. Recalling (ii), denote by w_λ = (w_{λ,1}, . . . , w_{λ,N}) the unique element in K̄_λ. Then, using (2.5) and the equi-coercivity of (J_λ)_{λ∈Λ}, we obtain by the theory of Γ-convergence (see [25, Corollary 7.24]) that

w_k ⇀ w_λ in L²(Ω; R^N) and J_{λ_k,j}(w_{k,j}) → J̄_{λ,j}(w_{λ,j}) for all j ∈ {1, . . . , N}.    (2.10)

The following shows that (w_k)_k converges even strongly in L²(Ω; R^N). Indeed, fixing j ∈ {1, . . . , N}, we infer from (2.10) along with the Mosco-convergence of the regularizers in (i) and (2.8) that

‖w_{k,j} − u_{η,j}‖²_{L²(Ω)} → ‖w_{λ,j} − u_{η,j}‖²_{L²(Ω)};

thus, w_k → w_λ strongly in L²(Ω; R^N) using the combination of weak convergence and convergence of norms by the Radon-Riesz property. With this, we finally conclude that

lim sup_{k→∞} I(λ_k) ≤ lim_{k→∞} Σ_{j=1}^N ‖w_{k,j} − u_{c,j}‖²_{L²(Ω)} = Σ_{j=1}^N ‖w_{λ,j} − u_{c,j}‖²_{L²(Ω)} = Ī(λ),

finishing the proof.

Remark 2.6
By inspecting the proof, it becomes clear that the estimate Ī ≤ I^rlx holds without the additional assumptions (i) and (ii) from the previous theorem; in other words, Ī always provides a lower bound for the relaxation of I.
The identity Ī = I^rlx may fail if either of the assumptions (i) or (ii) in Theorem 2.5 is dropped, as the following example shows.
Example 2.7 a) For the necessity of (i), one can construct a family of regularizers built from highly oscillating functions v_λ for which the Γ-limits J̄_λ (in the sense of (2.5) and (2.1)) exist and are given in terms of χ_E, the indicator function of a set E ⊂ L²(Ω), i.e., χ_E(u) = 0 if u ∈ E and χ_E(u) = ∞ otherwise. The non-trivial case is when λ = 0. In this case, we observe that we can take (v_λ)_λ as a recovery sequence for u = 0 because it converges weakly in L²(Ω) as λ → 0 to ∫_{(0,1)^n} v dx = 0 by the Riemann-Lebesgue lemma for periodically oscillating sequences. For the liminf inequality, let u_λ ⇀ u as λ → 0 and suppose without loss of generality that the corresponding energies are uniformly bounded; passing to the limit completes the proof of (2.11) when λ = 0.
In view of (2.11), one can now read off that K_λ = K̄_λ = {v_λ/(1+λ)} for λ ∈ (0, 1] and K̄_0 = {0}. In particular, condition (ii) on the uniqueness of minimizers of the extended lower-level problem is fulfilled here. This determines I(λ) = Ī(λ) for λ ∈ (0, 1] as well as Ī(0), and it is immediate to see from (2.12) that Ī and I^rlx differ at λ = 0. Notice that this example hinges on the fact that the minimizers v_λ/(1+λ) only converge weakly as λ → 0, which, in view of the proof of Theorem 2.5, implies that the family of regularizers {R_λ}_{λ∈Λ} does not Mosco-converge in L²(Ω) in the sense of (2.9), thus failing to satisfy (i).

b) For the necessity of (ii), consider Λ = (0, 1], a single data point (u_c, u_η) with u_c = 0 and ‖u_η‖²_{L²(Ω)} = 1, and a suitable family of regularizers R_λ. Consequently, for λ ∈ (0, 1], we have J̄_λ = J_λ and u_η is its unique minimizer; in contrast, for λ = 0, J̄_0 has two minimizers, namely K̄_0 = {u_η, 0} = {u_η, u_c}. Finally, we observe that the conclusion of Theorem 2.5 fails here because Ī(0) = 0 < 1 = I^rlx(0).

The following result is a direct consequence of Theorem 2.5 and standard properties of relaxation.

Corollary 2.8 Under the assumptions of Theorem 2.5 and if Λ̄ is compact, it holds that:
(i) The extension Ī has at least one minimizer and min_{Λ̄} Ī = inf_Λ I.
(ii) Any minimizing sequence (λ_k)_k ⊂ Λ of I converges up to subsequence to a minimizer λ ∈ Λ̄ of Ī. (iii) If λ ∈ Λ minimizes Ī, then λ is also a minimizer of I.
We conclude this section on the theoretical framework with a brief comparison with related works on optimal control problems. By setting K = {(w, λ) ∈ L²(Ω; R^N) × Λ : w ∈ K_λ}, the bi-level optimization problem (T) can be equivalently rephrased into minimizing

(w, λ) ↦ Σ_{j=1}^N ‖w_j − u_{c,j}‖²_{L²(Ω)}

as a functional of two variables on K; observe that inf_Λ I coincides with the infimum of this functional over K. Similar functionals and their relaxations have been studied in the literature, including [9,16,17]. Especially the paper [9] by Belloni, Buttazzo, & Freddi, where the authors propose to extend the control space to its closure and find a description of the relaxed optimal control problem, shares many parallels with our results. Apart from some differences in the assumptions and abstract set-up, the main reason why their results are not applicable here is the continuity condition of the cost functional with respect to the state variable [9, Eq. (2.11)]. In our setting, this would translate into weak continuity of the L²-norm, which is clearly false. The argument in the proof of Theorem 2.5 exploiting the Mosco-convergence of the regularizers (see (2.9)) is precisely what circumvents this issue.

Learning the Optimal Weight of the Regularization Term
In this section, we study the optimization of a weight factor, often called tuning parameter, in front of a fixed regularization term. Such tuning parameters are typically employed in practical implementations of variational denoising models to adjust the best level of regularization. This setting constitutes a simple, yet non-trivial, application of our general theory and therefore helps to exemplify the abstract results from the previous section. As above, Ω ⊂ Rⁿ is a bounded open set and u_c, u_η ∈ L²(Ω; R^N) are the given data representing pairs of clean and noisy images. We take Λ = (0, ∞) describing the range of a weight factor and, to distinguish the various parameters throughout this paper, denote by α an arbitrary point in Λ̄ = [0, ∞]. For a fixed map R : L²(Ω) → [0, ∞] with the properties that

(H1_α) R is convex, vanishes exactly on constant functions, and Dom R is dense in L²(Ω),
(H2_α) R is lower semicontinuous on L²(Ω),

we define the weighted regularizers R_α := αR for α ∈ Λ. Note that (H1_α) and (H2_α) imply that the family {R_α}_{α∈(0,∞)} satisfies (H) because convexity and lower semicontinuity yield weak lower semicontinuity, making this setting match the framework of Sect. 2. Following the definition of the training scheme (T), we introduce here for α ∈ (0, ∞) and j ∈ {1, . . . , N} the reconstruction functionals

J_{α,j}(u) := ‖u − u_{η,j}‖²_{L²(Ω)} + αR(u),  u ∈ L²(Ω),

cf. (2.1), and consider accordingly the upper-level functional I : (0, ∞) → [0, ∞) given by (2.2). Further, the following set of hypotheses on the training data will play a crucial role for our main result in this section (Theorem 3.2):

(H3_α) the data u_η and u_c satisfy Σ_{j=1}^N R(u_{η,j}) > Σ_{j=1}^N R(u_{c,j}),
(H4_α) the data u_η and u_c satisfy Σ_{j=1}^N ‖u_{η,j} − u_{c,j}‖²_{L²(Ω)} < Σ_{j=1}^N ‖⨍_Ω u_{η,j} dx − u_{c,j}‖²_{L²(Ω)}.

Remark 3.1 (Discussion of the hypotheses (H1_α)-(H4_α)) a) Note that (H1_α) implies that the set of minimizers of the reconstruction functionals, K_α, has cardinality one, owing to the convexity of R and the strict convexity of the fidelity term, considering also that J_{α,j} ≢ ∞.
In the following, we write w^(α) = (w_1^(α), . . . , w_N^(α)) for this unique minimizer. b) An example of a nonlocal regularizer satisfying (H1_α) and (H2_α) is

R(u) = ∫_Ω ∫_Ω a(x, y) g(u(x) − u(y)) dx dy.

As an explicit choice, one can take g(t) = |t|^p for t ∈ R and a(x, y) = |y − x|^{−n−sp} for x, y ∈ Ω with some s ∈ (0, 1) and p ≥ 1, which corresponds to a fractional Sobolev regularization. c) Assumption (H3_α) asserts that the regularizer penalizes the noisy images more than the clean ones on average. This is a natural condition because any good regularizer should reflect the prior knowledge on the training data, favoring the clean images.
d) The second condition on the data, (H4_α), means that the noisy image lies closer to the clean image than its mean value does, which can be considered a reasonable assumption in the case of moderate noise and a non-trivial ground truth. Indeed, if the noise u_η − u_c is suitably small in L²(Ω), the left-hand side of (H4_α) is small, while the right-hand side stays bounded away from zero for a non-constant ground truth, as a consequence of the triangle inequality and Jensen's inequality.
Next, we prove that the assumptions (H1 α )-(H4 α ) on the regularization term and on the training set give rise to optimal weight parameters that stay away from the extremal regimes, α = 0 and α = ∞. Thus, in this case, the bi-level parameter optimization procedure preserves the structure of the original denoising model. A related statement in the same spirit can be found in [30,Theorem 1], although some of the details of the proof were not entirely clear to us. Our proof of Theorem 3.2 is based on a different approach and hinges on the following two lemmas, the first of which determines the Mosco-limits of the regularizers, and thereby provides an explicit formula of the extension I of I as introduced in (2.6).  Proof Using standard arguments, we show that the Mosco-limit of (R α k ) k exists for every sequence (α k ) k of positive real numbers with α k → α ∈ [0, ∞], and corresponds to the right hand side of (2.3).
Case 1: α ∈ (0, ∞). Using (H2 α ) for the liminf inequality and a constant recovery sequence for the upper bound, we conclude that the Mosco-limit of (R α k ) k coincides with R α .
Case 2: α = 0. The liminf inequality is trivial. For the recovery sequence, take u ∈ L²(Ω) and let (u_k)_k ⊂ Dom R converge strongly to u in L²(Ω), which is feasible due to the density assumption in (H1_α). By possibly repeating certain entries of the sequence (u_k)_k (not relabeled), one can slow down the rate at which R(u_k) potentially blows up and assume that α_k R(u_k) → 0 as k → ∞. Thus, Then, along with the weak lower semicontinuity of R (see Remark 3.1 a)), This shows that R(u) = 0, which implies by the assumption on the zero level set of R in (H1_α) that u is constant, i.e., u ∈ C.
As a consequence of the previous proposition, we deduce that the extension I : [0, ∞] → [0, ∞] of I in the sense of (2.6) can be explicitly determined as Indeed, a straightforward calculation of the unique componentwise minimizer of the extended reconstruction functionals J_α at the boundary points α = 0 and α = ∞ leads to Since the assumptions (i) and (ii) of Theorem 2.5 are satisfied, I coincides with the relaxation I^rlx. By Corollary 2.8 (i), I attains its minimum at some ᾱ ∈ [0, ∞]. The degenerate cases ᾱ ∈ {0, ∞} cannot be excluded a priori, but the next lemma shows that the minimum is attained in the interior (0, ∞) under suitable assumptions on the training data. (i) Under the additional assumption (H3_α), there exists α ∈ (0, ∞) such that (ii) Under the additional assumption (H4_α), there exists α_0 ∈ (0, ∞) such that, for all α ∈ (0, α_0), . (2.5)

Proof
We start by providing two useful auxiliary results about the asymptotic behavior of the reconstruction vector w^(α) as α tends to zero; precisely, which proves the first part of (2.6) due to the arbitrariness of ε. Exploiting the minimality of w_j^(α) for J_{α,j} again with α ∈ (0, ∞) entails and, together with the first part of (2.6) and the lower semicontinuity of R by (H2_α), it then follows that showing the second part of (2.6). Regarding (i), we observe that the minimality of w_j^(α) for J_{α,j} for any α ∈ (0, ∞) and j ∈ {1, . . . , N} imposes the necessary condition 0 ∈ ∂J_{α,j}(w_j^(α)) or, equivalently, where ∂C(u) ∈ L²(Ω)* ≅ L²(Ω) is the subdifferential of a convex function C at u, and ⟨·, ·⟩_{L²(Ω)} denotes the standard L²(Ω)-inner product. Summing both sides over j ∈ {1, . . . , N} results in By (H3_α) in combination with the second part of (2.6), there exists α_0 > 0 such that for all α ∈ (0, α_0), so that choosing ᾱ ∈ (0, α_0) concludes the proof of (i).
To show (ii), we exploit the first limit in (2.6). Due to (H4 α ), it follows then for any , which gives rise to (2.5) for all k sufficiently large.

Proof of Theorem 3.2
Since I in (2.4) attains its infimum at a point ᾱ ∈ (0, ∞) by Lemma 3.4, we conclude from Corollary 2.8 (iii) that ᾱ is also a minimizer of I.
Let us finally remark that the assumptions (H3_α) and (H4_α) on the training data cannot simply be dropped if one wants structure preservation in the sense of Theorem 3.2, as the following examples show.
As a result, it follows that, if we take u_c = 0 and suppose additionally that u_η has zero mean value, then I(α) > 0 for all α ∈ (0, ∞), while clearly I(∞) = 0; that is, the minimum of I is only attained at the boundary point α = ∞. Similarly, for u_c = u_η, the unique minimizer of I is α = 0.
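The boundary degeneracies and the interior optimum of Theorem 3.2 can be reproduced in a small discrete experiment. The following sketch is an illustration only, not the paper's setting: it uses the quadratic first-difference regularizer R(u) = Σ_i (u_{i+1} − u_i)², which is convex, lower semicontinuous, and vanishes exactly on constants, so discrete analogues of (H1_α) and (H2_α) hold.

```python
import numpy as np

# Discrete sketch of the weighted model (an illustration, NOT the paper's setting):
# R(u) = sum_i (u[i+1] - u[i])^2 is convex, lower semicontinuous, and vanishes
# exactly on constants, so discrete analogues of (H1_a) and (H2_a) hold.
n = 100
x = np.linspace(0.0, 1.0, n)
u_clean = np.sin(np.pi * x)                       # smooth ground truth
u_noisy = u_clean + 0.3 * np.sin(40 * np.pi * x)  # deterministic high-frequency "noise"

D = np.diff(np.eye(n), axis=0)  # forward-difference matrix, shape (n-1, n)
L = D.T @ D                     # graph Laplacian of the path; null space = constants

def reconstruct(alpha):
    # unique minimizer of ||u - u_noisy||^2 + alpha * R(u)  (normal equations)
    return np.linalg.solve(np.eye(n) + alpha * L, u_noisy)

# structural changes at the edge of the parameter domain:
w_small = reconstruct(1e-10)  # alpha -> 0: reconstruction tends to the noisy image
w_large = reconstruct(1e10)   # alpha -> infinity: projection onto constants (the mean)
assert np.allclose(w_small, u_noisy, atol=1e-6)
assert np.allclose(w_large, np.mean(u_noisy), atol=1e-3)

# the upper-level functional I(alpha) = ||w(alpha) - u_clean||^2 dips in the interior
alphas = np.logspace(-6, 6, 25)
errors = [np.sum((reconstruct(a) - u_clean) ** 2) for a in alphas]
k = int(np.argmin(errors))
assert 0 < k < len(alphas) - 1  # the optimal weight stays away from 0 and infinity
```

With smooth clean data and high-frequency noise, the scan mirrors the structure-preservation statement: the error is large at both extremes and minimal at an interior weight.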
(H2_p) f is separately convex in the last two variables, i.e., f(x, y, ·, ζ) and f(x, y, ξ, ·) are convex for a.e. x, y ∈ Ω and every ξ, ζ ∈ R^n. In this setting, we take p ∈ [1, ∞) and consider the regularization term R_p :

Remark 4.1 a)
Since the regularizer R_p is invariant under symmetrization, one can assume without loss of generality that f is symmetric in the sense that f(x, y, ξ, ζ) = f(y, x, ζ, ξ) for all x, y ∈ Ω and ξ, ζ ∈ R. b) Let p, q ∈ [1, ∞) with p > q. Hölder's inequality then yields for every u ∈ Dom R_p = {u ∈ L²(Ω) : R_p(u) < ∞} that ∬_{Ω×Ω} f^p(x, y, u(x), u(y)) dx dy A basic example of a symmetric Carathéodory function f satisfying (H1_p) with β = 0 and (H2_p) is where a ∈ L^∞(R^n) is an even function such that ess inf_{R^n} a > 0. Another example of such a function f with β = 1 in (H1_p) is with b > 0; note that in the case p > n, the corresponding regularizer R_p is, up to a multiplicative constant, the Gagliardo seminorm of the fractional Sobolev space W^{1−n/p, p}(Ω). Before showing how the framework of Sect. 2 can be applied here, let us first collect and discuss a few properties of the regularizers R_p with p ∈ [1, ∞). We introduce the notation to indicate a suitable (p, β)-nonlocal seminorm. Our first lemma shows that the boundedness of the regularizer R_p is equivalent to the simultaneous boundedness of the L^p-norm and of the (p, β)-seminorm.

Lemma 4.2
There exists a constant C > 0, depending on n, p, Ω, M, δ, and β, such that

3)
and for all u ∈ L 2 ( ), and for all p ∈ [1, ∞). The next result provides a characterization of the domain of R p .

Lemma 4.3 For any p ∈ [1, ∞) there holds
If, additionally, βp < n, then If, instead, βp > n, then are implicitly restricted to Ω. The next lemma shows that any element of the domain of R_p can be extended to a function having compact support and finite (p, β)-seminorm. Proof If βp > n, this follows directly from well-established extension results for fractional Sobolev spaces on Ω to those on R^n (cf. [34, Theorem 5.4]), considering (2.6). If 1 ≤ βp ≤ n, the map x → |x − y|^{−βp} is no longer integrable at infinity. Property (2.7) follows by minor modifications of the arguments in [34, Sect. 5].
Elements of the domain of R p can be approximated by sequences of smooth maps with compact support.

Lemma 4.5 Let p ∈ [1, ∞). For every u
Proof Let ū be an extension of u as in Lemma 4.3. We define u_l = ϕ_{1/l} * ū ∈ C_c^∞(R^n) for l ∈ N with (ϕ_ε)_{ε>0} a family of smooth standard mollifiers satisfying 0 ≤ ϕ_ε ≤ 1 and ∫_{R^n} ϕ_ε dx = 1, and whose support lies in the ball centered at the origin with radius ε > 0, supp ϕ_ε ⊂ B_ε(0) ⊂ R^n. Then, u_l → u in L^p(Ω) and u_l → u pointwise a.e. in Ω as l → ∞. To show that Lebesgue's dominated convergence theorem can be applied, we use the upper bound in (H1_p) to derive the following estimate for any l ∈ N: for a.e. (x, y) ∈ Ω × Ω. By Jensen's inequality and Fubini's theorem, Finally, we characterize the weak lower semicontinuity of the regularizers. We refer to [8,36,48] for a discussion of sufficient (and necessary) conditions for the weak lower semicontinuity of inhomogeneous double-integral functionals. Given a collection of noisy images u_η ∈ L²(Ω; R^N) and p ∈ [1, ∞), we set, for each j ∈ {1, . . . , N}, with K_{p,j} := arg min J_{p,j} ≠ ∅ since (H) is satisfied. As in (T), we define I : where K_p = K_{p,1} × K_{p,2} × · · · × K_{p,N}. Next, we prove the Mosco-convergence result that will provide us with an extension of I to [1, ∞]. It is an L^p-approximation statement in the present nonlocal setting, which can be obtained from a modification of standard arguments. Proof To show (2.9), it suffices to show that for every sequence (p_k)_k ⊂ [1, ∞) converging to p ∈ [1, ∞], (2.9) holds with p replaced by p_k. We divide the proof into two cases. Case 1: p ∈ [1, ∞). For the recovery sequence, consider u ∈ Dom R_p and take (u_l)_l ⊂ C_c^∞(R^n) as in Lemma 4.5, satisfying u_l → u in L^p(Ω) and R_p(u_l) → R_p(u) as l → ∞. In view of Lemma 4.3, we know that (u_l)_l is contained in Dom R_p and Dom R_{p_k} for all k ∈ N, and we conclude via Lebesgue's dominated convergence theorem that so that one can find a recovery sequence by extracting an appropriate diagonal sequence.
To prove the lower bound, let u k u in L 2 ( ) be such that lim k→∞ R p k (u k ) = lim inf k→∞ R p k (u k ) < ∞, and fix s ∈ (1, p) (or s = 1 if p = 1). Observe that p k ≥ s for all k sufficiently large because p k → p for k → ∞. Then, Remark 4.1 b) and the weak lower semicontinuity of R s according to Lemma 4.6 imply that If s = p = 1 the argument is complete, whereas in the case p > 1, an additional application of Fatou's lemma shows lim inf s p R s (u) ≥ R p (u), giving rise to the desired liminf inequality.
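The comparison from Remark 4.1 b) used in this liminf argument can be sketched as follows (assuming, as the notation there suggests, that R_p integrates the p-th power of f; the constant involving |Ω × Ω| is indicative only): by Hölder's inequality with exponent p/q > 1,

```latex
\iint_{\Omega\times\Omega} f^{q}\bigl(x,y,u(x),u(y)\bigr)\,\mathrm{d}x\,\mathrm{d}y
\;\le\;
\Bigl(\iint_{\Omega\times\Omega} f^{p}\bigl(x,y,u(x),u(y)\bigr)\,\mathrm{d}x\,\mathrm{d}y\Bigr)^{q/p}
\,\lvert \Omega\times\Omega\rvert^{1-q/p},
```

so that, up to such multiplicative constants, R_q is controlled by R_p^{q/p} and, in particular, Dom R_p ⊂ Dom R_q for p > q.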
Case 2: p = ∞. That constant sequences serve as recovery sequences results from the observation that R_{p_k}(u) → R_∞(u) as k → ∞ for all u ∈ Dom R_∞. The latter is an immediate consequence of classical L^p-approximation, i.e., the well-known fact To prove the lower bound, we argue via Young measure theory (see, e.g., [37,48] for a general introduction). Let u_k ⇀ u in L²(Ω), and denote by ν = {ν_x}_{x∈Ω} the Young measure generated by a (non-relabeled) subsequence of (u_k)_k. The barycenter [ν_x] := ∫_R ξ dν_x(ξ) then coincides with u(x) for a.e. x ∈ Ω. Without loss of generality, one can suppose that (2.10) On the other hand, with the nonlocal field v_u associated with some u : Letting q → ∞, we use classical L^p-approximation results and the Jensen-type inequality for separately convex functions in [43, Lemma 3.5 (iv)] to conclude that Finally, the lower bound follows from the previous estimate and (2.10).
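The classical L^p-approximation fact invoked in Case 2, namely that normalized L^p-norms of a bounded function increase to its essential supremum as p → ∞, can be checked numerically (an illustrative sketch with a hypothetical test function, not data from the paper):

```python
import numpy as np

# classical L^p-approximation: for bounded f, the normalized L^p-means
#   (mean of |f|^p)^(1/p)
# increase to ess sup |f| as p -> infinity (hypothetical test function below)
x = np.linspace(0.0, 1.0, 100_001)
f = 4.0 * x * (1.0 - x)  # maximum value 1, attained at x = 1/2

def lp_mean(p):
    return np.mean(np.abs(f) ** p) ** (1.0 / p)

vals = [lp_mean(p) for p in (1, 2, 10, 100, 1000)]
assert all(a < b for a, b in zip(vals, vals[1:]))  # monotone in p (power-mean inequality)
assert 0.9 < vals[-1] < 1.0                        # close to ess sup f = 1
```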
The above result implies that the reconstruction functional for p = ∞ and j ∈ {1, . . . , N } is given by Under the additional convexity condition on the given function f : × ×R n ×R n → R that (H3 p ) f is (jointly) level convex in its last two variables, where level convexity means convexity of the sub-level sets of the function, the supremal functional R ∞ also becomes level convex. In combination with the strict convexity of the fidelity term, the reconstruction functional J ∞, j then admits a unique minimizer. Since level convexity is weaker than convexity, we do not necessarily have that J p, j for p ∈ [1, ∞) is (level) convex, and it may have multiple minimizers.
If we suppose that f fulfills (H1_p)–(H3_p), then Theorem 2.5 and Proposition 4.8 imply that the extension I : [1, ∞] → [0, ∞] is given by where w^(∞) denotes the unique componentwise minimizer of J_∞. In particular, hypothesis (ii) of Theorem 2.5 is satisfied, which shows that I is the relaxation of I and, thus, admits a minimizer p̄ ∈ [1, ∞].
We conclude this section with a discussion of examples when optimal values of the integrability exponents are obtained in the interior of the original interval or at its boundary, respectively. In one case, the presence of noise causes R ∞ to penalize u c more than u η , while R q for some q ∈ [1, ∞) prefers the clean image. This entails that the optimal parameter is attained in = [1, ∞). In the second case instead, the reconstruction functional for p = ∞ gives back the exact clean image and outperforms the reconstruction functionals for other parameter values.

Example 4.9 a)
Let f = α f : × × R n × R n → R, for some α > 0 to be specified later, be a double-integrand satisfying (H1 p ), (jointly) convex in the last two variables, and vanishing exactly on {(x, y, ξ, ξ) : x, y ∈ , ξ ∈ R}. Following (2.1), we set for u ∈ L 2 ( ) and p ∈ [1, ∞). We further introduce the following two conditions on the given data u η , u c ∈ L 2 ( ; R N ): By applying Lemma 3.4 (i) from the previous section with R = R q -the conditions (H1 α ), (H2 α ), and (H3 α ) are immediate to verify in view of Lemma 4.3, Lemma 4.6, and (H4 p ) -we can then deduce for small enough α that I(q) < u η − u c 2 L 2 ( ;R N ) . On the other hand, due to (H5 p ), the same lemma can be applied to R = R ∞ with R ∞ (u) = ess sup (x,y)∈ × f (x, y, u(x), u(y)) for u ∈ L 2 ( ) to find (2.11) provided α is sufficiently small. The reverse triangle inequality then yields where in the second and third inequality we have used (2.11). This proves that the optimal parameter is attained inside [1, ∞), and, therefore, is also a minimizer of I. b) We illustrate a) with a specific example. Consider = (0, 1) and let f (x, y, ξ, ζ ) = |ξ − ζ |/|x − y| for x, y ∈ and ξ , ζ ∈ R n . This leads then to the difference quotient regularizers and with Lip(u) denoting the Lipschitz constant of (a representative of) u, which could be infinite.
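A small discrete check of the limiting behavior behind these difference-quotient regularizers: for a hypothetical test image u(x) = x² on (0, 1) (an illustration, not the paper's sawtooth data), the normalized p-th-power regularizers increase toward the supremal one, whose value is the Lipschitz constant Lip(u) = 2.

```python
import numpy as np

# discrete sketch of the difference-quotient quantities of Example 4.9 b) on (0,1)
# for the hypothetical test image u(x) = x^2 (NOT the paper's sawtooth data):
# f(x, y, xi, zeta) = |xi - zeta| / |x - y|, and the normalized p-th-power
# regularizers approach the supremal one, R_inf(u) = Lip(u) = 2, as p grows
n = 51
x = np.linspace(0.0, 1.0, n)
u = x ** 2

X, Y = np.meshgrid(x, x)
U, V = np.meshgrid(u, u)
off_diag = ~np.eye(n, dtype=bool)                     # exclude the diagonal x = y
q = np.abs(U - V)[off_diag] / np.abs(X - Y)[off_diag] # all pairwise difference quotients

def r_p(p):
    # normalized p-mean of the difference quotients
    return np.mean(q ** p) ** (1.0 / p)

lip = q.max()  # discrete Lipschitz constant, close to Lip(u) = 2
assert abs(lip - 2.0) < 0.05
assert r_p(50) < r_p(200) <= lip  # increases with p toward the Lipschitz constant
assert lip - r_p(200) < 0.2
```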
With the sawtooth function v : we take a single clean and noisy image given by respectively, where ε > 0 is small; see Fig. 1.
We observe that u c is constant near the boundaries and only slightly steeper than u η in the middle of the domain. Numerical calculations show that for small ε, such as ε = 0.1, the estimate R 2 (u c ) < R 2 (u η ), and hence (H4 p ) with q = 2, holds; moreover, (H5 p ) holds since the clean image has a higher Lipschitz constant than the noisy image in the sense that Therefore, we find that for α > 0 small enough, the optimal parameter lies inside = [1, ∞). c) If we work with the same regularizers as in b), there are reasonable images for which the Lipschitz regularizer in (2.13) performs better than the other regularizers in (2.12). Let us consider with α > 0 chosen as in b), the images Since u η is affine, we can show that the reconstruction with the Lipschitz regularizer is also an affine function. Indeed, for every other function, one can find an affine function with at most the same Lipschitz constant without increasing the distance to u η anywhere. This, in combination with the fact that the images are odd functions with respect to x = 1/2, shows that w (∞) is of the form w (∞) (x) = γ (x −1/2) = γ u c with γ ≥ 0. Due to the optimality of w (∞) , the constant γ has to minimize the quantity which yields γ = 1. Hence, w (∞) coincides with the clean image and therefore I(∞) = 0, which implies that p = ∞ is the optimal parameter in this case.

Varying the Amount of Nonlocality
Next, we study two classes of nonlocal regularizers, R_δ with δ ∈ (0, ∞), considered by Brezis & Nguyen [15] and Aubert & Kornprobst [5], respectively, in the context of image processing. In both cases, we aim at optimizing the parameter δ that encodes the amount of nonlocality in the problem. We mention further that both families of functionals recover the classical TV-reconstruction model in the limit δ → 0, cf. [5,15].
To set the stage for our analysis, consider training data (u c , u η ) ∈ L 2 ( ; R N ) × L 2 ( ; R N ) and the reconstruction functionals J δ, j : L 2 ( ) → [0, ∞] with δ ∈ and j ∈ {1, 2, . . . , N } given by After showing that the sets (2.1) are non-empty for each of the two choices of the regularizers R δ , the upper-level functional from (T ) in Sect. 2 becomes In order to find its extension I defined on = [0, ∞], we determine the Mosco-limits of the regularizers (cf. (2.6) and Theorem 2.5). This is the content of Propositions 5.3 and 5.5 below, which provide the main results of this section.
To guarantee that the functionals R δ satisfy a suitable compactness property, see Theorem 5.2 b), we must additionally assume that (H5 δ ) ϕ(t) > 0 for all t > 0.
Clearly, the last two functions from Example 5.1 satisfy the positivity condition, while the first one does not. In identifying the Mosco-limits R_δ in each of the three cases δ ∈ (0, ∞), δ = 0, and δ = ∞, we make repeated use of [15, Theorems 1, 2 and 3], which we recall here for the reader's convenience. (i) There exists a constant K(ϕ) ∈ (0, 1], independent of Ω, such that (R_{δ_k})_k Γ-converges as k → ∞, with respect to the L¹(Ω)-topology, to R_0 : (b) Suppose that (H5_δ) holds in addition to the above conditions, and let (u_k)_k be a bounded sequence in L¹(Ω) with sup_k R_δ(u_k) < ∞ for some δ > 0. Then, there exists a subsequence (u_{k_l})_l of (u_k)_k and a function u ∈ L¹(Ω) such that lim_{l→∞} ||u_{k_l} − u||_{L¹(Ω)} = 0.
Proof Considering a sequence (δ_k)_k ⊂ (0, ∞) with limit δ ∈ [0, ∞], one needs to verify that the Mosco-limit of (R_{δ_k})_k exists and is given by the right-hand side of (2.3). We split the proof into three cases. Case 1: δ = 0. Let (u_k)_k ⊂ L²(Ω) and u ∈ L²(Ω) be such that u_k ⇀ u in L²(Ω). We aim to show that (2.4) One may thus assume without loss of generality that the limit inferior on the right-hand side of (2.4) is finite and, after extracting a subsequence if necessary, also Hence, by Theorem 5.2 a) (ii), it follows that u_k → u in L¹(Ω), which together with Theorem 5.2 a) (i) yields (2.4).
To complement this lower bound, we need to obtain for each u ∈ L 2 ( ) ∩ BV ( ) a sequence (u k ) k ⊂ L 2 ( ) such that u k → u in L 2 ( ) and (2.5) The idea is to suitably truncate a recovery sequence of the -limit (L 1 )-lim k→∞ R δ k from Theorem 5.2 (i). For the details, fix l ∈ N and consider the truncation function, (2.6) Choosing a sequence (l k ) k ⊂ R such that l k → ∞ and l k v k − u L 1 ( ) → 0 as k → ∞, we define Then, an application of Hölder's inequality shows that as k → ∞. Therefore, u k → u in L 2 ( ) and, in view of the monotonicity of ϕ in (H3 δ ), we conclude that which implies (2.5) by (2.6). Case 2: δ ∈ (0, ∞). Consider a sequence (u k ) k ⊂ L 2 ( ) and u ∈ L 2 ( ) such that u k u in L 2 ( ) and We start by observing that there existδ > 0 and K ∈ N such that for all k ≥ K , we haveδ/2 ≤ δ k ≤δ. Hence, the previous estimate and (H3 δ ) yield Consequently, in view of Theorem 5.2 b), we may further assume that Using Fatou's lemma first, and then (2.7) together with the lower semicontinuity of ϕ on [0, ∞), we get which proves the liminf inequality. For the recovery sequence, fix u ∈ L 2 ( ) and take u k = δ k δ u for k ∈ N. Then, Case 3: δ = ∞. The lower bound follows immediately by the non-negativity of R δ k for k ∈ N. As a recovery sequence for u ∈ L 2 ( ), take a sequence (u k ) k ⊂ L 2 ( ) such that u k → u in L 2 ( ) and Lip(u k ) ≤ δ 1/4 k , which is possible since δ k → ∞ as k → ∞. Then, using (H2 δ ), Hence, R δ k (u k ) → 0 as k → ∞, which concludes the proof.

Aubert & Kornprobst Setting
Let Ω ⊂ R^n be a bounded Lipschitz domain. We fix a nonnegative function ρ : and consider the regularizers given for δ ∈ (0, ∞) and u ∈ L²(Ω) by
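One plausible form of these regularizers, consistent with the rescaled kernels ρ_δ := δ^{−n} ρ(|·|/δ) used in the proof of Proposition 5.5 and with the nonlocal functionals of [5,12] (the exact normalization here is an assumption), is

```latex
R_\delta(u) := \iint_{\Omega\times\Omega}
\frac{\lvert u(x)-u(y)\rvert}{\lvert x-y\rvert}\;
\rho_\delta(x-y)\,\mathrm{d}x\,\mathrm{d}y,
\qquad
\rho_\delta := \delta^{-n}\rho(\lvert\cdot\rvert/\delta).
```

As δ → 0 the kernels ρ_δ concentrate at the origin and the functionals localize toward a weighted total variation, in line with Proposition 5.5 below.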

Remark 5.4 a)
As ρ is non-increasing, we have for all 0 < δ < δ̃ and x, y ∈ Ω that for all u ∈ L²(Ω). b) Note that the assumption (H) from Sect. 2 is satisfied here; in particular, R_δ is L²-weakly lower semicontinuous. Indeed, as the dependence of the integrand on u is convex, it is enough to prove strong lower semicontinuity in L²(Ω). This is in turn a simple consequence of Fatou's lemma. c) In this set-up, the sets K_{δ,j} in (2.1) consist of a single element w_j^(δ) ∈ L²(Ω) in light of the strict convexity of the fidelity term and the convexity of R_δ. The upper-level functional from (2.2) then becomes The nonlocal functionals in (2.8) have been applied to problems in imaging in [5], providing a derivative-free alternative to popular local models. The localization behavior of these functionals as δ → 0 is well-studied, originally by Bourgain, Brezis, & Mironescu [12] and later extended to the BV-case in [26,51]. Using these results, we show that, as δ → 0, the reconstruction functional in our bi-level scheme turns into the TV-reconstruction functional, see Proposition 5.5 below. Moreover, in order to get structural stability inside the domain (0, ∞), we exploit the monotonicity properties of the functional R_δ, cf. Remark 5.4 a). Lastly, as δ → ∞, we observe that the regularization term vanishes. (2.10) with κ_n = ∫_{S^{n−1}} |e · σ| dσ for any e ∈ S^{n−1}.
Proof Given (δ k ) k ⊂ (0, ∞) with limit δ ∈ [0, ∞], the arguments below, subdivided into three different regimes, show that the Mosco-limit of (R δ k ) k exists and is equal to the right-hand side of (2.9). Case 1: δ = 0. For the lower bound, take a sequence u k u in L 2 ( ) and assume without loss of generality that By [12,Theorem 4], (u k ) k is relatively compact in L 1 ( ), so that u k → u in L 1 ( ). We now use the -liminf result with respect to the L 1 ( )-convergence in [51,Corollary 8], to deduce that as desired. For the recovery sequence, we may suppose that u ∈ L 2 ( ) ∩ BV ( ). Then, it follows from [51, Corollary 1] that showing that the constant sequence u k = u for all k ∈ N provides a recovery sequence. Case 2: δ ∈ (0, ∞). For the liminf inequality, take a sequence (u k ) k converging weakly to u in L 2 ( ). Ifδ ∈ (0, δ), then δ k >δ for all k ∈ N large enough. Hence, it follows from Remark 5.4 a) that where the last inequality uses the weak lower semicontinuity of Rδ, cf. Remark 5.4 b). Lettingδ δ and using the monotone convergence theorem gives lim inf For the limsup inequality, consider u ∈ L 2 ( ) with R δ (u) < ∞. Since ρ is nonincreasing by (H6 δ ), we may extend u to a functionū ∈ L 2 (R n ) by reflection across the boundary of the Lipschitz domain such that cf. [12,Proof of Theorem 4]. With (ϕ ε ) ε a family of smooth standard mollifiers, the sequence u l := ϕ 1/l * ū for l ∈ N converges to u in L 2 ( ) as l → ∞, and we may argue similarly to the proof of Lemma 4.5 to conclude that With ρ δ := δ −n ρ(| · |/δ) and for a fixed l ∈ N, we find that where Lip(u l ) is the Lipschitz constant of u l . We have ρ δ k → ρ δ in L 1 (R n ) as k → ∞ by a standard argument approximating ρ with smooth functions. Hence, we obtain and, letting l → ∞, results in The limsup inequality now follows by extracting an appropriate diagonal sequence. Case 3: δ = ∞. 
The only nontrivial case is the limsup inequality, for which we take a sequence (u l ) l ⊂ C ∞ c (R n ) that converges to u in L 2 ( ). Then, with R larger than the diameter of , one obtains for every l ∈ N that As k → ∞, the last quantity goes to zero since ρ(| · |) ∈ L 1 (R n ). Therefore, we deduce that lim k→∞ R δ k (u l ) = 0, and conclude again with a diagonal argument.

Conclusions and Examples
In both the Brezis & Nguyen and the Aubert & Kornprobst settings, we now find that the extension I : [0, ∞] → [0, ∞] is given by where w_j^(0) for j ∈ {1, . . . , N} is the unique minimizer of the TV-reconstruction functional J_{0,j} (with different weight factors in the two cases). In particular, we deduce from Theorem 2.5 and Corollary 2.8 that I is the relaxation of I and that these extended upper-level functionals admit minimizers δ̄ ∈ [0, ∞]. To get an intuition about when this optimal parameter is attained at the boundary or in the interior of (0, ∞), we present the following examples.
Example 5.6 a) For both settings analyzed in this section, it is clear that if the noisy and clean image coincide, u c ≡ u η , then the reconstruction model with parameter δ = ∞ gives the exact clean image back. Hence, in this case the optimal parameter is attained at the boundary point δ = ∞.
To see this, we observe that J_0(ũ) ≤ J_0(u) for any u ∈ BV(−1, 1) with where u_− := ess inf_{x∈(−1,1)} u(x) and u_+ := ess sup_{x∈(−1,1)} u(x). Indeed, the map ũ has at most the same total variation as u and does not increase the distance to u_η anywhere. Next, since u_η is an odd function, the same should hold for the minimizer, meaning that −θ_1 = θ_2 =: θ ∈ [0, κ_n]. We can now determine the value of θ by optimizing the quantity J_0(w^(0)) in θ. This boils down to minimizing and yields θ = 0. Hence, the reconstruction model for δ = 0 yields the exact clean image, so that I(0) = 0. The same conclusions can be drawn for the Brezis & Nguyen setting by replacing κ_n in the example above with K(ϕ). c) Let us finally address the case when I becomes minimal inside (0, ∞). We work once again with the Aubert & Kornprobst model from Sect. 5.2, and assume in addition to (H6_δ) that the function ρ is equal to 1 in a neighborhood of zero. We consider the following conditions on the pair of data points (u_c, u_η) ∈ L²(Ω; R^N) × L²(Ω; R^N): is the componentwise minimizer of the TV-reconstruction functional J_0 and we set The two hypotheses above can be realized, for example, by taking u_η = (1 + ε)u_c for some small ε > 0 and w^(0) = u_c. Notice that (H7_δ) immediately rules out δ = 0 as an optimal candidate, since the reconstruction at δ = ∞ is better. On the other hand, ρ is assumed to be equal to 1 near zero, so that we infer for large enough δ that for all u ∈ L²(Ω). Since, for large δ, the dependence of the regularizer on δ is of the same type as in the weight case from Sect. 3, we may apply Lemma 3.4 (i) in view of (H8_δ). This yields, for all δ large enough, that with w^(δ) the minimizer of J_δ. This shows that the optimal parameter is not attained at δ = ∞ either and, as a result, needs to be attained inside (0, ∞). Hence, the optimal regularizer lies within the class we started with.
The same conclusions can be drawn for the Brezis & Nguyen case described in Sect. 5.1 if we assume that ϕ(t) = ct r for small t with c > 0 and r ≥ 2. One may take, for instance, the normalized version of the second function in Example 5.1. We then suppose that the pair of data points (u c , u η ) satisfies (H7 δ ) and (H8 δ ), but now instead of (2.11), take We observe with l = u η L ∞ ( ;R N ) (which we assume to be finite) and T l the truncation as in the proof of Proposition 5.3 that for all u ∈ L 2 ( ) and δ ∈ (0, ∞). Therefore, we may restrict our analysis to functions u ∈ L 2 ( ) with |u(x) − u(y)| ≤ 2 l for all x, y ∈ . By additionally considering δ large enough, we now find in analogy to (2.12).

Tuning the Fractional Parameter
This final section revolves around regularization via the L²-norm of the spectral fractional Laplacian of order s/2, with s in the parameter range (0, 1). Our aim here is twofold. First, we determine the Mosco-limits of the regularizers, which allows us to conclude, in view of the general theory in Sect. 2, that the extended bi-level problem recovers local models at the boundary points of [0, 1]. Second, we provide analytic conditions ensuring that the optimal parameter lies in the interior (0, 1), and illustrate them with an explicit example.
The motivation behind the fractional Laplacian as a regularizer comes from [1], where the authors show that replacing the total variation in the classical ROF model [52] with a spectral fractional Laplacian can lead to comparable reconstruction results with a much smaller computational cost, if the order is chosen correctly. An abstract optimization of the fractional parameter for the spectral fractional Laplacian has already been undertaken in [6], although we remark that a convex penalization term is added there to the model to ensure that the optimal fractional parameter lies inside (0, 1).
We begin with the problem set-up. Let Ω ⊂ R^n be a bounded Lipschitz domain and let (ψ_m)_{m∈N} ⊂ H_0^1(Ω) be a sequence of eigenfunctions of the Laplace operator (−Δ) forming an orthonormal basis of L²(Ω). With the corresponding eigenvalues 0 < λ_1 ≤ λ_2 ≤ λ_3 ≤ · · · → ∞, it holds for every m ∈ N that It holds that H^s(Ω) is a Hilbert space for every s ∈ (0, 1); for more details on these spaces, we refer, e.g., to [18,46]. In view of (2.1), the so-called spectral fractional Laplacian of order s/2 (with Dirichlet boundary conditions) on these spaces is defined as For s ∈ (0, 1), we consider the regularizer with some μ > 0. At the end of this section (see Remark 6.4), the weight parameter μ will be used to exhibit examples where structure preservation holds. The regularizers R_s coincide with μ‖·‖²_{H^s(Ω)} on H^s(Ω), and are L²-weakly lower semicontinuous because u_k ⇀ u in L²(Ω) yields by a discrete version of Fatou's lemma. Therefore, the hypotheses in (H) from Sect. 2 are satisfied. Next, we determine the Mosco-limits of the regularizers and thereby provide the basis for extending the upper-level functional according to Sect. 2. Proposition 6.1 (Mosco-convergence of the regularizers) Let the parameter range be (0, 1) and let R_s for each s ∈ (0, 1) be given by (2.2). Then, for u ∈ L²(Ω) and s ∈ [0, 1], Fixing a sequence (s_k)_k ⊂ (0, 1) with limit s ∈ [0, 1], we want to prove now that the Mosco-limit of (R_{s_k})_k exists and is given by the right-hand side of (2.3).
Step 1: The liminf-inequality. Let u k u in L 2 ( ), and assume without loss of generality that lim inf k→∞ R s k (u k ) < ∞. Then, since (u k ) m → u m for each m ∈ N as k → ∞, it follows from a discrete version of Fatou's lemma that In light of (2.4) for the cases s ∈ {0, 1}, the last quantity equals the regularizer on the right hand side of (2.3) in all the three regimes. This finishes the proof of the lower bound.
Step 2: Construction of a recovery sequence. We first consider the case u ∈ H_0^1(Ω). By the regularity of u and Lebesgue's dominated convergence theorem (applied to the counting measure), and by considering the constant recovery sequence u_k = u, we get The existence of a recovery sequence in the general case then follows by classical diagonalization arguments, using the previous case.
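In summary, writing u_m := ⟨u, ψ_m⟩_{L²(Ω)} for the coefficients in the eigenbasis, the objects entering Proposition 6.1 can be sketched as follows (consistent with the spectral definitions above):

```latex
(-\Delta)^{s/2} u = \sum_{m=1}^{\infty} \lambda_m^{s/2}\, u_m\, \psi_m,
\qquad
R_s(u) = \mu\,\bigl\lVert (-\Delta)^{s/2} u \bigr\rVert_{L^2(\Omega)}^{2}
       = \mu \sum_{m=1}^{\infty} \lambda_m^{s}\, u_m^{2},
```

with R_s(u) := +∞ whenever the series diverges, i.e., whenever u ∉ H^s(Ω).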
Given clean and noisy images u_c, u_η ∈ L²(Ω; R^N), we work with the reconstruction functionals The following lemma investigates how w^(s) varies with s. In the case s > 0, this lemma is essentially contained in [6, Theorem 2] (albeit in a slightly different setting, with periodic instead of Dirichlet boundary conditions). The proof below contains some additional details for the reader's convenience. In particular, we may take the limit t → s on the left-hand side of the preceding estimate and interchange it with the sum to show the claim.
It follows as a consequence of Lemma 6. Since (H1 s ) guarantees that the minimizer of I is not s = 0 and (H2 s ) ensures the minimizer to be different from s = 1, Corollary 2.8 (iii) yields the following result.
We close this section with an interpretation of the conditions (H1_s) and (H2_s), and a specific example in which they are both satisfied. If we assume that the noise has mostly high frequencies and that the clean image has mostly moderate frequencies, then the mixed terms in (2.9) will be small. The first condition is then close to which holds for sufficiently small μ. Similarly, for sufficiently large μ, the second condition is satisfied. As we analyse in b) below, there are instances where we can find a range for μ that implies both conditions. b) In the case Ω = (0, π)², by indexing the eigenfunctions via m = (m_1, m_2) ∈ N², we find ψ_m(x) = sin(m_1 x_1) sin(m_2 x_2) with corresponding eigenvalues λ_m = m_1² + m_2². By choosing u_c = ψ_(1,1) as the clean image and η = (1/10) ψ_(10,10) as the noise, the condition (2.9) turns into

  −100 μ log(2) + log(200) > 0,
  −4μ log(2)/(1 + 2μ)³ + 2 log(200)/(1 + 200μ)³ < 0,

which is satisfied for 0.0236 ≈ μ_− < μ < μ_+ ≈ 0.0764.
On the other hand, when μ = 0.023, then s = 1 is optimal, while the optimal solution for μ = 0.11 is s = 0. This can be seen numerically, as for these values of μ the derivative I′ is negative or positive on all of [0, 1], respectively.
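The window for μ can be verified numerically. The sketch below evaluates I(s) mode by mode, using the closed-form componentwise minimizer in the eigenbasis (an elementary consequence of minimizing the quadratic reconstruction functional coordinate-wise; the common factor ‖ψ_m‖² is dropped since it does not affect the location of the minimum):

```python
import numpy as np

# numeric sketch of Example b): only the modes (1,1) and (10,10) are active
# (lambda = 2 and 200).  In the eigenbasis, minimizing
#   ||u - u_eta||^2 + mu * sum_m lambda_m^s * u_m^2
# coordinate-wise gives w_m = (u_eta)_m / (1 + mu * lambda_m^s), so I(s) can be
# evaluated mode by mode (the common factor ||psi_m||^2 is dropped).
def upper_level(s, mu):
    lam = np.array([2.0, 200.0])
    c_clean = np.array([1.0, 0.0])  # coefficients of u_c = psi_(1,1)
    c_noisy = np.array([1.0, 0.1])  # coefficients of u_eta = u_c + (1/10) psi_(10,10)
    w = c_noisy / (1.0 + mu * lam ** s)
    return float(np.sum((w - c_clean) ** 2))

s_grid = np.linspace(0.0, 1.0, 101)

# mu inside the window (mu_-, mu_+): the optimal s is attained in the interior
k = int(np.argmin([upper_level(s, 0.05) for s in s_grid]))
assert 0 < k < len(s_grid) - 1

# mu just below / above the window: the optimum degenerates to s = 1 or s = 0
assert int(np.argmin([upper_level(s, 0.023) for s in s_grid])) == len(s_grid) - 1
assert int(np.argmin([upper_level(s, 0.11) for s in s_grid])) == 0
```

The scan reproduces both the structure-preserving regime and the two degenerate regimes discussed above.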