Uncertainty Quantification in Image Segmentation Using the Ambrosio–Tortorelli Approximation of the Mumford–Shah Energy

The quantification of uncertainties in image segmentation based on the Mumford-Shah model is studied. The aim is to address how noise and other errors in the original image propagate to the restoration result and especially to the reconstructed edges (sharp image contrasts). Analytically, we rely on the Ambrosio-Tortorelli approximation and discuss the existence of measurable selections of its solutions as well as sampling-based methods and the limitations of other popular methods. Numerical examples illustrate the theoretical findings.


Introduction
In the use of modern imaging techniques in medicine and embedded artificial intelligence systems for the automated investigation of anatomical structures or the classification of objects (human, tree, cloud, etc.), one of the fundamental steps is the division of a given image into distinct areas with characteristic properties; see, e.g., [35, Chapter 2] and [12, Part II]. In this context, image segmentation is concerned with the problem of identifying regions with approximately homogeneous features (such as color or gray values, and texture) within given image data. The boundary curve separating such homogeneous features is called the edge set or simply the edge.

(Funding note: The first two authors acknowledge financial support by the DFG under the Grant HI 1466/10-1 and the third author under SU 963/1-1, both associated with the project "Generalized Nash Equilibrium Problems with Partial Differential Operators: Theory, Algorithms, and Risk Aversion" within the priority programme SPP1962 "Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation and Hierarchical Optimization." Moreover, the first author received funding by the Einstein Center Mathematics research project.)
In this work, we focus on the Mumford-Shah energy (model) introduced in [32], which reads

MS_g(u, Γ) := ∫_{Ω\Γ} |∇u|² dx + α H¹(Γ) + β ∫_Ω (u − g)² dx.

Here, g ∈ L²(Ω) is a given grayscale image, possibly degraded by noise, where Ω ⊂ R² represents the image domain, and α, β > 0 are given parameters. Further, H¹ denotes the one-dimensional Hausdorff measure. In order to discuss the structure of MS_g, we note that the third term is a least-squares data fitting term involving our desired reconstruction u and the data g. It is inspired by a maximum likelihood argument regarding the statistical properties of the underlying noise. Since the Mumford-Shah model aims to establish a division into monochromatic regions, the (generalized) gradient of the reconstructed image is minimized (penalized when viewed together with the choices of α and β) over the complement of the edge set Γ in Ω. This motivates the first term in MS_g. In order to achieve a certain robustness in finding Γ with respect to noise, the length of the edge set is penalized in the second term. Moreover, penalizing the length of Γ secures a Caccioppoli property of the edge set. The aim associated with the above Mumford-Shah model typically is to find a restored image u : Ω → R and an edge set Γ, which separates homogeneous regions of interest, by minimizing MS_g; see (MS) below. In this context, the Hausdorff term in MS_g is one ingredient yielding existence of a minimizing pair (u_g, Γ_g). We note that such a minimizer is attached to the specific realization of the noise ξ contained in the data, i.e., g = g₀ + ξ with g₀ representing the underlying true image.
Obviously, the location of Γ_g within Ω is influenced by the specific realization ξ of an associated random process and might not properly reflect the location of the true edge set Γ_{g₀} pertinent to g₀. Indeed, rather than computing Γ_g alone, one is interested in estimating the uncertainty in the location of the reconstructed edge set, given statistical properties of the underlying random process. This brings us into the realm of uncertainty quantification of geometric objects.
In general, one of the fundamental components of predictive estimation in uncertainty quantification is to analyze how uncertainty propagates through a given mathematical model; also known as model prediction [38,Chap. 9]. We observe that the vast majority of the existing literature on the propagation of uncertainty is concerned with well-defined systems whose (unique) solutions are given by real numbers, vectors, matrices, or distributed parameters; see, e.g., [38,Chap. 9,Chap. 10] or for an introduction with emphasis on methods also [39].
However, within the context of image processing (segmentation, in particular), variational models of crack propagation, or free discontinuity problems in general, the output of the model is a geometric quantity, which itself does not even live in a linear space; see [6] and the references therein. Due to the associated complexity, we emphasize the role played by the geometric variable (the edge set). In fact, among the existing methods for the propagation of uncertainty (direct evaluation of the mean and variance of a quantity of interest, sampling methods, perturbation methods, and spectral representations), only sampling methods appear to be viable.
In contrast to inverse problems, where Bayesian techniques admitting probability densities to characterize the true value are utilized, we are here confronted rather with a kind of forward problem. It is also worth mentioning that the model considered in this work yields solution pairs consisting of the restored image and associated edge set, which is not guaranteed to be unique due to a lack of strict convexity of the objective in the Mumford-Shah problem.
Our intention here is therefore to draw attention to an important but somewhat neglected class of problems for uncertainty quantification. As a means of investigating these models, in particular the edge set, we later propose the notion of a generalized cumulative distribution function and of pointwise quantiles. Of course, pointwise averages, variances, etc., are also conceivable. However, we believe that an estimation of the probability of an edge appearing within a given image patch (or a patch of pixels in the discrete setting) is the more valuable information.
Let us now return to the Mumford-Shah model and introduce the associated variational framework. Given sufficient Sobolev regularity of u on the set Ω\Γ, this leads us to the following minimization problem:

minimize MS_g(u, Γ) over all closed edge sets Γ ⊂ Ω and u ∈ H¹(Ω\Γ).  (MS)

Note that (MS) constitutes the renowned Mumford-Shah problem. The interested reader is referred to the original paper [32] for more details; for a condensed overview, including a discussion of the existence of solutions, we point to [6] and the references therein. Uniqueness of a solution cannot be expected in general, as can be seen from a one-dimensional counterexample in [18, Section 7.4.5]. The analytic difficulties arise due to the set variable Γ as well as u being an element of a space depending on this set. Moreover, the Hausdorff measure appears directly in the functional. These aspects not only significantly complicate the analysis of (MS), but they also challenge its numerical treatment.
An elegant way to address these issues is due to Ambrosio and Tortorelli (cf. [10,11]). Driven by the analytical notion of Γ-convergence, the main idea is to treat the edge set along with its Hausdorff measure by a phase field approach. In fact, one minimizes the so-called Ambrosio-Tortorelli functional, which (consistent with the first-order conditions and a priori bounds below) takes the form

AT_ε^g(u, z) := ∫_Ω z² |∇u|² dx + α ∫_Ω ( ε |∇z|² + (1/(4ε)) (z − 1)² ) dx + β ∫_Ω (u − g)² dx + η ∫_Ω |∇u|² dx,

over H¹(Ω) × K together with the constraint set

K := { y ∈ H¹(Ω) : 0 ≤ y ≤ 1 almost everywhere (a.e.) on Ω }.
As before, the quantity u ∈ H¹(Ω) denotes the restored image. The geometric quantity Γ, on the other hand, is approximated by the function z ∈ H¹(Ω), which can be interpreted as an edge indicator. In this context, the zero-level set of z is associated with an edge, and the level set associated with z = 1 relates to the absence of such an edge. From this, we see that the first term in AT_ε^g refers to the first term of (MS). The second and third terms approximate the Hausdorff measure. The remaining terms are the data fitting term as well as an additional regularization for the restored image u with weighting factor η > 0. This term ensures coercivity. In [11], it is shown that the sequence of minimization problems Γ-converges in the strong L¹(Ω; R²)-topology to the original Mumford-Shah problem (MS) as ε → 0 and η = o(ε), with o(t)/t → 0 as t → 0. As motivated above, given noisy observations g, we are interested in studying the influence of random errors and other data transformation effects in the image data on the reconstruction of edges in the image. The degradation model underlying our study is given by

g = L g₀ + ξ,

where L is a linear continuous operator from L²(Ω) to L²(Ω), i.e., L ∈ L(L²(Ω)). Note that L may model image blur, subsampling, or Fourier and wavelet transforms, to mention only a few. Moreover, ξ ∈ L²(Ω) is an oscillating map with zero mean. From a stochastic point of view, L and ξ constitute realizations of stochastic processes. We also assume that L does not annihilate constant functions, i.e., L1 ≠ 0.
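For intuition, the degradation model g = L g₀ + ξ can be simulated numerically. The following sketch is purely illustrative: the test image, the choice of L as a simple box blur, and the noise level are all assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical piecewise-constant "cartoon" image g0 with one circular
# region, so that the true edge set is the circle's boundary.
n = 64
xs = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")
g0 = np.where((X - 0.5) ** 2 + (Y - 0.5) ** 2 < 0.3 ** 2, 1.0, 0.0)

def box_blur(img, k=3):
    """A simple linear continuous operator L: local averaging (box blur).
    Note that L maps constants to themselves, so L1 != 0."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    m0, m1 = img.shape
    out = np.zeros_like(img)
    for i in range(k):
        for j in range(k):
            out += padded[i:i + m0, j:j + m1]
    return out / (k * k)

sigma = 0.05                        # noise level (illustrative choice)
xi = sigma * rng.standard_normal(g0.shape)
xi -= xi.mean()                     # enforce the zero-mean assumption on xi
g = box_blur(g0) + xi               # degradation model: g = L g0 + xi
```

Each call with a fresh noise draw produces one realization of the random datum g; repeating this is exactly what the sampling methods of Sect. 4 rely on.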
Since Γ in (MS) is a set, a mathematically meaningful notion of uncertainty propagation of this object is not immediately clear. In view of this, the Ambrosio-Tortorelli formulation offers a transformation of this geometric feature into a functional variable which is more amenable to the study of the propagation of uncertainty. Taking account of the additional data transformation operator L in our model, AT_ε^g becomes

AT_ε^g(u, z) := ∫_Ω z² |∇u|² dx + α ∫_Ω ( ε |∇z|² + (1/(4ε)) (z − 1)² ) dx + β ∫_Ω (Lu − g)² dx + η ∫_Ω |∇u|² dx,  (AT)

which constitutes the segmentation model under investigation in this paper. The rest of the paper is organized as follows. In Sect. 2, we provide a collection of important results on the Ambrosio-Tortorelli functional concerning properties of the functional as well as first-order conditions for its minimizers. These are used in Sect. 3 to give a mathematical justification of random processed images and edge indicators in the sense of measurable selections based on the theorem of Kuratowski-Ryll-Nardzewski. In Sect. 4, we briefly discuss the limitations of popular techniques in uncertainty quantification before we turn our attention to sampling-based methods. We conclude our work in Sect. 5 with numerical examples.

Properties of the Deterministic Problem
We now collect several properties of the Ambrosio-Tortorelli functional and its minimizers. In order to reduce technicalities in the subsequent proofs, we establish the following norm equivalence. Below, ‖·‖_{L²(Ω)} denotes the usual L²(Ω)-norm on a bounded domain Ω ⊂ R^d, and H¹(Ω) is the Sobolev space of functions in L²(Ω) with generalized gradient in L²(Ω)^d; see, e.g., [3].
Lemma 1 The quantity |||u||| := ( ‖∇u‖²_{L²(Ω)} + |∫_Ω u dx|² )^{1/2} defines a norm on H¹(Ω) which is equivalent to the standard H¹(Ω)-norm.

Proof For proving the inverse inequality, we define the mean value m := |Ω|⁻¹ ∫_Ω u dx, where |Ω| denotes the Lebesgue measure of Ω, and let C_P > 0 be the Poincaré constant, i.e., the smallest real number fulfilling the Poincaré inequality

‖u − m‖_{L²(Ω)} ≤ C_P ‖∇u‖_{L²(Ω)} for all u ∈ H¹(Ω);

see, e.g., [2, Corollary 5.4.1]. Using the above together with the triangle inequality ‖u‖_{L²(Ω)} ≤ ‖u − m‖_{L²(Ω)} + ‖m‖_{L²(Ω)}, we find the asserted estimate, which completes the proof. □
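The triangle-inequality step can be spelled out as follows (a sketch, assuming the norm |||u|||² = ‖∇u‖²_{L²(Ω)} + |∫_Ω u dx|² indicated by the use of the mean value and the Poincaré inequality in the proof):

```latex
\|u\|_{L^2(\Omega)}
  \le \|u - m\|_{L^2(\Omega)} + \|m\|_{L^2(\Omega)}
  \le C_P\,\|\nabla u\|_{L^2(\Omega)}
      + |\Omega|^{-1/2}\,\Big|\int_\Omega u \,\mathrm{d}x\Big|
  \le \sqrt{2}\,\max\!\big(C_P,\,|\Omega|^{-1/2}\big)\,|\!|\!|u|\!|\!|,
```

where the last step uses a + b ≤ √2 (a² + b²)^{1/2}, so that the full H¹(Ω)-norm is controlled by a constant multiple of |||·|||.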
From now on, we equip the Sobolev space H 1 (Ω) for the processed image u with the new norm ||| · |||, whereas the norm of the z-component remains unchanged. Next we consider continuity properties of the Ambrosio-Tortorelli functional.

Theorem 2 The mapping AT_ε^g : H¹(Ω) × K → R is weakly lower semicontinuous.
Proof Let (u_n, z_n) ⇀ (u, z) be a weakly convergent sequence in H¹(Ω) × K. Then there exists a subsequence (not relabeled) such that the limit inferior is attained and, additionally, z_n → z pointwise almost everywhere (Fischer-Riesz). Using Lebesgue's dominated convergence theorem, we obtain z_n ∇u_n ⇀ z∇u in L²(Ω; R^d). Eventually, the weak lower semicontinuity of the L²-norm yields lim inf_n AT_ε^g(u_n, z_n) ≥ AT_ε^g(u, z), which proves the weak lower semicontinuity of AT_ε^g. □
Using the direct method of the calculus of variations in combination with Lemma 1 and Theorem 2, it is not difficult to derive the existence of a solution of (AT). Due to the nonconvexity of AT_ε^g, uniqueness cannot be expected in general. For the sake of characterizing critical points, we turn our attention to necessary first-order optimality conditions. The derivation of a corresponding system is challenged by the nonsmoothness of AT_ε^g with respect to the H¹(Ω) × H¹(Ω)-topology together with the presence of the constraint set K. We address these issues by using a truncation argument. In fact, for all u, z ∈ H¹(Ω) with z ∈ L^∞(Ω) it holds that AT_ε^g(u, z) < ∞. The constraint set is then substituted by z ∈ X with X := H¹(Ω) ∩ L^∞(Ω), leading to

inf_{(u,z) ∈ H¹(Ω) × K} AT_ε^g(u, z) = inf_{(u,z) ∈ H¹(Ω) × X} AT_ε^g(u, z), (1)

and the respective sets of minimizers coincide. The advantage of the right-hand side formulation in (1) lies in the fact that there the Ambrosio-Tortorelli functional is continuously Fréchet differentiable. Here, ⟨·, ·⟩ denotes the dual pairing between H¹(Ω) × X and its dual. The stationarity system associated with the problem on H¹(Ω) × X in (1) then reads, in its weak form,

a^g(u, z; ϕ) = 0 for all ϕ ∈ H¹(Ω), (2a)
b(u, z; ψ) = 0 for all ψ ∈ X, (2b)

where a^g and b denote the weak forms of the partial derivatives of AT_ε^g with respect to u and z, respectively. Utilizing coercivity properties of the involved inner products, one readily obtains the a priori bounds

min(η, β) |||u||| ≤ β ‖g‖_{L²(Ω)}, (3a)

as well as a corresponding a priori bound (3b) for z. Moreover, we deduce z ∈ K, which results from (2b) by testing with ψ = −(z − 1)⁺ + (−z)⁺ with (·)⁺ = max{0, ·} in a pointwise sense; see also [14, Prop. 1.3].
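The bound (3a) can be recovered by testing the u-equation with ϕ = u; the following is a hedged sketch, assuming that the fidelity and gradient-regularization terms enter the functional as β∫(u−g)² dx and η∫|∇u|² dx, and that |||·|||² is comparable to ‖∇·‖² + ‖·‖²_{L²}:

```latex
% Testing a^g(u,z;\varphi)=0 with \varphi = u (sketch under the stated assumptions):
\eta\,\|\nabla u\|_{L^2(\Omega)}^2
  + \int_\Omega z^2\,|\nabla u|^2 \,\mathrm{d}x
  + \beta\,\|u\|_{L^2(\Omega)}^2
  = \beta\,(g,u)_{L^2(\Omega)}
  \le \beta\,\|g\|_{L^2(\Omega)}\,\|u\|_{L^2(\Omega)}.
% Dropping the nonnegative middle term:
\min(\eta,\beta)\,\big(\|\nabla u\|_{L^2}^2 + \|u\|_{L^2}^2\big)
  \le \beta\,\|g\|_{L^2(\Omega)}\,|\!|\!|u|\!|\!|
\quad\Longrightarrow\quad
\min(\eta,\beta)\,|\!|\!|u|\!|\!| \lesssim \beta\,\|g\|_{L^2(\Omega)}.
```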
After clarifying existence of solutions and first-order necessary conditions, we next draw our attention to the following convergence result.

Theorem 3 Suppose there exists a bounded sequence ((u_j, z_j))_{j∈N} ⊂ H¹(Ω) × X of approximate stationary points in the sense that |a^g(u_j, z_j; ϕ)| ≤ μ_j ‖ϕ‖_{H¹(Ω)} for all ϕ ∈ H¹(Ω) and |b(u_j, z_j; ψ)| ≤ ν_j ‖ψ‖ for all ψ ∈ X. Further, let (μ_j, ν_j) → 0 and z_j ∈ K for all j ∈ N. Then there exist a stationary point (u*, z*) and a subsequence of ((u_j, z_j))_{j∈N} converging strongly to (u*, z*) in H¹(Ω; R²).

Proof The result follows from a straightforward adaptation of the proof in [14, Theorem 4.1]. □
The above result is useful when studying properties of solution sets of (2) and for the convergence analysis of a splitting-type algorithm in function space for iteratively solving (2). We continue here by studying such a splitting method and postpone the former aspect to Sect. 3.

Algorithm 1: Splitting Method
Data: parameters α, β, ε, η, image data g, starting values (u⁰, z⁰).
We note that the algorithm splits the overall nonconvex minimization problem into the iterative solution of two subsequent problems. Indeed, within one iteration, in each step a convex quadratic minimization problem needs to be solved, which is equivalent to solving a linear partial differential equation (PDE), respectively. The convergence of this iteration scheme can then be obtained by applying a result due to [23] concerning the decay of the distance of two consecutive image iterates.
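The alternating structure can be sketched numerically. The following minimal one-dimensional implementation is an assumption-laden illustration (the functional weights follow one common normalization, and all parameter values are illustrative, not the paper's): each step assembles and solves exactly one small symmetric positive definite linear system, i.e., a discretized linear PDE.

```python
import numpy as np

def ambrosio_tortorelli_split(g, alpha=1.0, beta=1000.0, eps=0.02,
                              eta=1e-4, iters=60, tol=1e-8):
    """Alternating (splitting) minimization sketch for a 1-D discretization
    of an Ambrosio-Tortorelli-type functional, assumed here in the form
      int z^2 |u'|^2 + alpha * int ( eps |z'|^2 + (z-1)^2 / (4 eps) )
      + beta * int (u - g)^2 + eta * int |u'|^2.
    Each half-step is a convex quadratic problem solved exactly."""
    n = g.size
    h = 1.0 / (n - 1)
    # forward-difference matrix mapping n nodes to n-1 cell midpoints
    D = (np.eye(n - 1, n, k=1) - np.eye(n - 1, n)) / h
    u, z = g.copy(), np.ones(n)
    for _ in range(iters):
        u_old = u.copy()
        # u-step: weighted diffusion (z^2 + eta) plus beta-fidelity term
        zmid = 0.5 * (z[:-1] + z[1:])            # z on cell midpoints
        A_u = beta * np.eye(n) + D.T @ np.diag(zmid**2 + eta) @ D
        u = np.linalg.solve(A_u, beta * g)
        # z-step: reaction term |u'|^2 plus the phase-field terms
        w = np.zeros(n)                           # |u'|^2 averaged to nodes
        du2 = (D @ u) ** 2
        w[:-1] += 0.5 * du2
        w[1:] += 0.5 * du2
        A_z = np.diag(w + alpha / (4 * eps)) + alpha * eps * (D.T @ D)
        z = np.linalg.solve(A_z, np.full(n, alpha / (4 * eps)))
        # stopping rule based on the distance of consecutive image iterates
        if np.linalg.norm(u - u_old) <= tol:
            break
    return u, z
```

On a noisy step signal, z dips toward 0 near the jump and stays close to 1 elsewhere, while u is denoised on both sides of the edge; note that the z-step automatically produces 0 ≤ z ≤ 1 by a discrete maximum principle, mirroring the deduction z ∈ K above.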

Corollary 4
Let (u_j, z_j)_{j∈N} be the sequence generated by the splitting algorithm. Then there exist a stationary point (u*, z*) ∈ H¹(Ω) × X and a subsequence converging strongly to (u*, z*) in H¹(Ω; R²).

Proof By the optimality of the iterates, we obtain for every ϕ ∈ H¹(Ω) an estimate of the form

|a^g(u_{j+1}, z_{j+1}; ϕ)| ≤ max(1 + η, β) |||u_{j+1} − u_j||| ‖ϕ‖_{H¹(Ω)}. (5)

Since b(u_j, z_j; ψ) = 0 for all ψ ∈ X, we see that the conditions of Theorem 3 are fulfilled with μ_j = max(1 + η, β)|||u_{j+1} − u_j||| and ν_j = 0. Hence, the assertion follows. □
We point out that the last estimate in (5), leading to the criterion max(1 + η, β)|||u_{j+1} − u_j||| ≤ tol for some tolerance tol > 0, can also be used as a numerically practical stopping rule for the splitting algorithm.
With these results at hand, we are now able to derive finer properties of the dependence of the solutions of the minimization problem and the first-order system on the given image data, respectively.

Existence of Measurable Selections
Next we introduce the concept of uncertain edges. As motivated earlier, we assume that the real image is inaccessible due to corruption by (random) noise and a deterministic degradation operator. The resulting data are therefore modeled by a random variable, which we denote in the following again by g for the sake of readability. Rather than addressing the issue of finding a reconstruction of the image along with an edge set, we seek to study the propagation of noise into the solution.

Measurable Selections for Ambrosio-Tortorelli
For the reasons mentioned above, we focus on the Ambrosio-Tortorelli approximation (AT) of the Mumford-Shah energy. The main issue now is to establish the existence of random variables of solutions as well as their characterization. For this purpose, we define the set-valued solution operators S_min, S_stat : L²(Ω) ⇒ H¹(Ω) × K by

S_min(g) := argmin_{(u,z) ∈ H¹(Ω) × K} AT_ε^g(u, z), S_stat(g) := { (u, z) ∈ H¹(Ω) × X : (u, z) solves the stationarity system (2) }.

We start by providing structural properties of these operators.
Theorem 5 Let S ∈ {S_min, S_stat} be one of the solution operators defined above. Then the following properties hold true: (i) S has nonempty and compact values; (ii) for every closed set C ⊂ H¹(Ω) × K, the preimage S⁻¹(C) is closed in L²(Ω).
Proof We split the proof into several steps. Below C ⊂ H 1 (Ω) × K always denotes a closed set.
Step 1 (Nonemptiness). Since we know that the Ambrosio-Tortorelli problem has a minimizer for every g ∈ L²(Ω), it holds that S_min(g) ≠ ∅. Since every minimizer fulfils the stationarity system, S_stat(g) ≠ ∅ follows as well. Step 2 (S_min(g) is closed). Since according to Theorem 2 the Ambrosio-Tortorelli functional is weakly l.s.c. on H¹(Ω) × K with respect to the H¹(Ω; R²)-topology, we find that S_min(g) is closed.
Step 3 (S_stat(g) and S_min(g) are compact). Let (u_n, z_n)_{n∈N} ⊂ S_stat(g) be a sequence of stationary points. Then this sequence is bounded in H¹(Ω) × X by the a priori estimate (3), and z_n ∈ K holds true as well. We apply Theorem 3 with μ_j = ν_j = 0 and get the existence of a stationary point together with a strongly H¹(Ω; R²)-convergent subsequence. Thus, the compactness of S_stat(g) as well as the compactness of S_min(g) (using Step 2) are established.

Step 4 (S_stat⁻¹(C) is closed). Let (g_n)_{n∈N} ⊂ S_stat⁻¹(C) with g_n → g in L²(Ω), and let (u_n, z_n)_{n∈N} ⊂ C be a sequence of stationary points for g_n, n ∈ N. Then 0 ≤ z_n ≤ 1 holds almost everywhere and

a^{g_n}(u_n, z_n; ϕ) = 0 for all ϕ ∈ H¹(Ω), b(u_n, z_n; ψ) = 0 for all ψ ∈ X,

for every n ∈ N. Again, we get the boundedness of ((u_n, z_n))_{n∈N} in H¹(Ω) × X by the uniform boundedness of ‖g_n‖_{L²(Ω)}, n ∈ N, combined with the a priori bound (3). Moreover, we find that a^g(u_n, z_n; ·) differs from a^{g_n}(u_n, z_n; ·) = 0 only by a term bounded by β ‖g_n − g‖_{L²(Ω)}. Since g_n → g in L²(Ω), the conditions of Theorem 3 are fulfilled for g with μ_n = β ‖g_n − g‖_{L²(Ω)} and ν_n = 0. Hence, there exists a stationary point (u*, z*) together with a strongly H¹(Ω; R²)-convergent subsequence; by the closedness of C we get (u*, z*) ∈ C, so that S_stat⁻¹(C) is closed.

Step 5 (S_min⁻¹(C) is closed). Since every minimizer is a stationary point, we obtain from the arguments of Step 4 a stationary point (u*, z*) ∈ C and a strongly H¹(Ω; R²)-convergent subsequence. It remains to show that (u*, z*) is also a minimizer. In fact, for arbitrary (u, z) ∈ H¹(Ω) × K we have

AT_ε^g(u*, z*) ≤ lim inf_n AT_ε^{g_n}(u_n, z_n) ≤ lim_n AT_ε^{g_n}(u, z) = AT_ε^g(u, z).

Since (u, z) was arbitrary, we conclude that (u*, z*) is also a minimizer for g. Hence S_min⁻¹(C) is closed. This completes the proof. □
The aim is to construct a random variable, whose values are the solutions of (AT) for the given image and for almost every realization of noise. This quantity then defines an associated reconstructed image and edge indicator, respectively. Therefore we aim at using the Kuratowski-Ryll-Nardzewski measurable selection theorem, which we state here for ease of reference and refer the interested reader to [13, Theorem 6.6.7].
Theorem 6 (Kuratowski and Ryll-Nardzewski) Let Y be a Polish space, (X, A) a measurable space, and Ψ : X ⇒ Y a set-valued operator such that (i) Ψ(x) is nonempty and closed for all x ∈ X, and (ii) Ψ is measurable in the sense that Ψ⁻¹(U) := {x ∈ X : Ψ(x) ∩ U ≠ ∅} ∈ A for every open set U ⊂ Y. Then there exists a measurable selection, i.e., a measurable mapping ψ : X → Y with ψ(x) ∈ Ψ(x) for all x ∈ X.

Given the notation of Theorem 6, we note that the fulfilment of condition (ii) already follows if Ψ⁻¹(C) is closed for every closed set C ⊂ Y. Indeed, since Y is metrizable, every open set U ⊂ Y can be written as a countable union of closed sets (C_n)_{n∈N}, i.e., U = ∪_n C_n. Then we obtain Ψ⁻¹(U) = ∪_n Ψ⁻¹(C_n), a countable union of closed, hence measurable, sets. Based on this observation, we are now able to prove the existence of measurable selections for the operators S_min and S_stat.
Theorem 7 There exist measurable selections of S_min and S_stat, i.e., Borel measurable mappings g ↦ (u(g), z(g)) with (u(g), z(g)) ∈ S_min(g) (respectively S_stat(g)) for all g ∈ L²(Ω).

Proof Since H¹(Ω) is a separable Hilbert space, it is a Polish space. As closed subsets of Polish spaces are Polish as well (see [40]), H¹(Ω) × K ⊂ H¹(Ω)² is a Polish space, and hence, in view of Theorem 5 and the preceding observation, the assumptions of Theorem 6 are fulfilled. □

Measurable Selections for Mumford-Shah
Similarly to the previous subsection, it is possible to derive an analogous result for the Mumford-Shah problem (MS). Here, we focus on the denoising case, i.e., L = id_{L²(Ω)}. Due to the analytical difficulties concerning the edge set as a geometric quantity, we had drawn our attention to the Ambrosio-Tortorelli model. In the following, we temporarily return to the original Mumford-Shah model. In fact, it is possible to derive a result similar to the one above for the processed image only. To this end, we first discuss a reformulation of the Mumford-Shah problem (MS) based on special functions of bounded variation.
For this purpose, we recall that the space of functions of bounded variation, BV(Ω), is defined as the set of L¹-functions with bounded total variation. The latter quantity is given by

|Du|(Ω) := sup { ∫_Ω u div ϕ dx : ϕ ∈ C_c¹(Ω; R^d), |ϕ| ≤ 1 on Ω }.

For u ∈ BV(Ω), the distributional derivative Du is a Radon measure which can be decomposed according to

Du = ∇u λ^d + (u⁺ − u⁻) ν_u H^{d−1} ⌞ S_u + D^c u.

Here, the first term is the part of Du that is absolutely continuous with respect to the Lebesgue measure λ^d, with ∇u denoting its density. The terms u⁺, u⁻ are the so-called upper, respectively lower, approximate limits of u, and S_u := {x ∈ Ω : u⁺(x) ≠ u⁻(x)} represents the jump set. The second term denotes the part of the measure that is absolutely continuous with respect to the Hausdorff measure restricted to S_u, which we denote here by H^{d−1} ⌞ S_u, with ν_u the measure-theoretic normal to S_u. The remaining part D^c u is singular with respect to both the Lebesgue and the Hausdorff measure and is referred to as the Cantor part of Du. For the definition of approximate limits as well as further details, the reader is referred to [2, Section 10], [21, Section 5], as well as to the monograph [6]. The space of special functions of bounded variation is then defined as

SBV(Ω) := { u ∈ BV(Ω) : D^c u = 0 },

which is the subspace of BV-functions with vanishing Cantor part.
Using the above notation, it is possible to provide the following equivalent reformulation of the Mumford-Shah problem, originally proposed by [20], as a problem posed over the SBV-space for the processed image, i.e.,

minimize ∫_Ω |∇u|² dx + α H^{d−1}(S_u) + β ∫_Ω (u − g)² dx over u ∈ SBV(Ω). (MS-SBV)

We note that a solution must fulfil u ∈ L²(Ω) and ∇u ∈ L²(Ω; R^d) for the value of the functional to be finite. Observe further that this formulation has the advantage that the edge set does not enter as an explicit variable, but is rather encoded in the function u through its jump set. On the other hand, a potential drawback of this approach is given by the fact that now u is contained in a space that is neither reflexive nor separable. For details on the equivalence of the two formulations as well as the existence proof for g ∈ L^∞(Ω), the reader is referred to [20] or to [6, Section 7]. For (MS-SBV), we are now able to derive an analogous version of Theorem 5 when we consider the solution operator as a mapping from L^∞(Ω) to L²(Ω). Let us briefly discuss the requirement on g. In fact, from a practical viewpoint, an image signal is the product of a technical measurement process producing data in a fixed and finite range. Therefore the image signal will typically be essentially bounded. From the mathematical viewpoint, we seek to apply a compactness result by Ambrosio addressing SBV functions as well as Theorem 6. In order to do so, we need bounded sequences in L^∞(Ω) and a separable solution space L²(Ω) ⊃ SBV(Ω).
Theorem 8 Let T : L^∞(Ω) ⇒ L²(Ω) denote the solution operator of (MS-SBV). Then the following properties hold: (i) T has nonempty and compact values; (ii) for every closed set C ⊂ L²(Ω), the preimage T⁻¹(C) is closed in L^∞(Ω).
Before we provide the proof, we also state that, as a consequence, there exists a measurable selection of minimizers, i.e., a Borel function g ↦ u(g) with u(g) ∈ T(g) for all g ∈ L^∞(Ω).

Proof Step 1 (The set T(g) ⊂ L²(Ω) is compact). We know that the Mumford-Shah problem has a solution (cf. [9]). Let (u_n)_{n∈N} ⊂ T(g) be a sequence of solutions. By the same truncation argument as in [9], we obtain ‖u_n‖_{L^∞(Ω)} ≤ ‖g‖_{L^∞(Ω)} for all n ∈ N. Thus, (u_n)_{n∈N} is uniformly bounded, and by comparing the functional values with those of a fixed competitor we obtain that the conditions of the compactness result of Ambrosio [8] are fulfilled. Hence, there exist a subsequence (u_{n_k})_{k∈N} as well as u ∈ SBV(Ω) ∩ L^∞(Ω) such that u_{n_k} → u almost everywhere and ∇u_{n_k} ⇀ ∇u weakly in L²(Ω; R^d). Due to the L^∞-boundedness, we obtain u_{n_k} → u in L²(Ω) by dominated convergence. Moreover, we get by the lower semicontinuity of the norm and the second part of Ambrosio's compactness result that the functional value of u does not exceed the common minimal value, and eventually u ∈ T(g). This proves the compactness of T(g) in L²(Ω).
Step 2 (T⁻¹(C) ⊂ L^∞(Ω) is closed for all closed subsets C ⊂ L²(Ω)). Let C ⊂ L²(Ω) be closed. We choose an arbitrary sequence (g_n)_{n∈N} ⊂ T⁻¹(C) converging in L^∞(Ω) to some g ∈ L^∞(Ω). Then for every n ∈ N there exists u_n ∈ T(g_n) ∩ C that minimizes the corresponding Mumford-Shah functional. It thus holds that MS_{g_n}(u_n) ≤ MS_{g_n}(v) for all v ∈ SBV(Ω). By plugging in v = 0 and using the uniform boundedness of (g_n) in L^∞(Ω), we recognize that the conditions of Ambrosio's compactness result are fulfilled, and therefore a subsequence (u_{n_k})_{k∈N} as well as a limit u exist. By the above arguments, we see u_{n_k} → u in L²(Ω) and, by the closedness assumption, also u ∈ C. Together with the convergence of (g_{n_k})_{k∈N}, we find analogously to Step 1 that MS_g(u) ≤ MS_g(v) for all v ∈ SBV(Ω). Thus, u ∈ T(g) ∩ C, and therefore T⁻¹(C) is a closed subset of L^∞(Ω). We obtain the existence of a measurable selection again by the theorem of Kuratowski-Ryll-Nardzewski. □
While the above result provides some theoretical insight into the behavior of the computed image, it does not yield a way to treat the jump sets, respectively the edge sets. We therefore return to the Ambrosio-Tortorelli approximation.
So far we have discussed measurable selections of operators on function spaces with respect to their topologies and induced Borel σ-algebras. We now consider an abstract probability space (Ξ, F, P) together with a random variable g : Ξ → L²(Ω), measurable with respect to Ξ equipped with the σ-algebra F and L²(Ω) equipped with the Borel σ-algebra B(L²(Ω)). (Here and below, Ξ denotes the sample space of noise realizations ξ.) If we now consider a measurable selection g ↦ (u_sel(g), z_sel(g)) of minimizers (stationary points), then the composition ξ ↦ (u_sel(g(ξ)), z_sel(g(ξ))) is a random variable of minimizers (stationary points). We refer to the respective components simply as ξ ↦ (u(ξ), z(ξ)).

Characterizing Measurable Selections for Ambrosio-Tortorelli
After establishing the existence of measurable selections, the next step is to derive a variational characterization, i.e., we seek to interpret selections of stationary points as solutions of a stochastic partial differential equation (SPDE) and, respectively, selections of minimizers as minimizers of an optimization problem. For this purpose, we need to introduce elements of Bochner space theory. First we equip the (abstract) measurable space (Ξ, F) with a probability measure P. The resulting measure space induces a Lebesgue integral for R-valued random variables, denoted by

E[X] := ∫_Ξ X(ξ) dP(ξ),

provided the argument is Lebesgue integrable. For a measurable set M ∈ F, we define the characteristic function 1_M by 1_M(ξ) := 1 for ξ ∈ M and 1_M(ξ) := 0 otherwise. Since we do not change (Ξ, F), we abbreviate the associated Bochner spaces by L^p_P(V) for a Banach space V, and we write λ^d ⊗ P for the product measure on Ω × Ξ, equipped with its product σ-algebra B(Ω) ⊗ F. In particular, we set H_P := L²_P(H¹(Ω)) and denote by X_P the analogous space of X-valued random variables, equipped with the norms ‖·‖²_{H_P} := E[|||·|||²] and a corresponding norm, respectively. In any case, for our random edge indicator z ∈ X_P we obtain 0 ≤ z ≤ 1 λ^d ⊗ P-almost everywhere and, in combination with the a priori estimate (3b), boundedness in X_P and L^∞_P(H¹(Ω)). If we, moreover, assume that g ∈ L^p_P(L²(Ω)), then also u ∈ L^p_P(H¹(Ω)). From now on, let g ∈ L²_P(L²(Ω)). Our first target is the SPDE characterization. For this purpose, we proceed regarding the notation as with the image g and denote by (u, z) a random variable of stationary points. Choosing arbitrary ϕ ∈ H_P and ψ ∈ X_P, we prove the measurability of ξ ↦ a^{g(ξ)}(u(ξ), z(ξ); ϕ(ξ)) as well as of ξ ↦ b(u(ξ), z(ξ); ψ(ξ)). The proof is only carried out for the latter, as the proof for the former is analogous.
Indeed, for c ∈ R we consider the preimage of (−∞, c] under b (restricted to test functions with ‖ψ‖ ≤ R), which is given by

B := { (u, z, ψ) : b(u, z; ψ) ≤ c, ‖ψ‖ ≤ R },

and show that it is closed under strong convergence. Let (u_n, z_n, ψ_n)_{n∈N} ⊂ B converge strongly to (u₀, z₀, ψ₀). By dominated convergence, there exists a subsequence (not relabeled) such that z_n and ψ_n converge pointwise almost everywhere. It also holds that

|b(u_n, z_n; ψ_n) − b(u₀, z₀; ψ₀)| ≤ |b(u_n, z_n; ψ_n) − b(u₀, z_n; ψ_n)| + |b(u₀, z_n; ψ_n) − b(u₀, z₀; ψ₀)|.

By the strong convergence of ∇u_n, the first part tends to zero, and by the pointwise convergence of z_n ψ_n and the bound |∇u₀|² |z_n ψ_n − z₀ ψ₀| ≤ 2R |∇u₀|², the second part approaches zero by dominated convergence as well. Hence we obtain b(u₀, z₀; ψ₀) ≤ c, yielding (u₀, z₀, ψ₀) ∈ B. By the H¹-measurability of u, z, and ψ, we get with R := ‖ψ‖_{X_P} that

{ ξ ∈ Ξ : b(u(ξ), z(ξ); ψ(ξ)) ≤ c } ∈ F

holds for all c ∈ R. This implies the measurability of ξ ↦ b(u(ξ), z(ξ); ψ(ξ)). Analogously, we obtain the measurability of ξ ↦ a^{g(ξ)}(u(ξ), z(ξ); ϕ(ξ)).
Hence, every measurable selection of stationary points fulfils the SPDE system (in weak form)

E[a^g(u, z; ϕ)] = 0 for all ϕ ∈ H_P, E[b(u, z; ψ)] = 0 for all ψ ∈ X_P. (6)

Consequently, every measurable selection of stationary points is also a solution of the SPDE (6). The converse holds true as well. Let ϕ* ∈ H¹(Ω) and ψ* ∈ X be arbitrary. Then we define the events on which a^g(u, z; ϕ*) is positive, respectively negative, and test (6) with the correspondingly sign-adapted random test function built from ϕ*. This yields 0 = E[|a^g(u, z; ϕ*)|], and analogously 0 = E[|b(u, z; ψ*)|], from which we deduce a^g(u, z; ϕ*) = 0 P-a.s. as well as b(u, z; ψ*) = 0 P-a.s. for all ϕ* ∈ H¹(Ω) and ψ* ∈ X. From this, we see the equivalence of the realization-wise interpretation of (2) and the SPDE system (6).
After the characterization of stationary points, we want to characterize measurable selections of minimizers in a manner similar to the interchange of minimization and integration in [37, Chapter 14, Section F]. For this purpose, we define the constraint set K_P := {v ∈ H_P : 0 ≤ v ≤ 1 λ^d ⊗ P-almost everywhere} and formulate the minimization problem

minimize E[AT_ε^g(u, z)] over (u, z) ∈ H_P × K_P. (7)

Using the same arguments as for the deterministic Ambrosio-Tortorelli problem, we obtain the following result.

Theorem 9
The problem (7) admits a solution, and one can relax the constraint z ∈ K_P to z ∈ X_P without changing the set of minimizers, in analogy to (1). Moreover, the functional E[AT_ε^g(·, ·)] : H_P × X_P → R is Fréchet differentiable, and every minimizer (u, z) ∈ H_P × X_P satisfies the stationarity system (6).
Let (u*, z*) be a measurable selection of minimizers and (ū, z̄) a minimizer of (7). From the scenario-wise minimality we obtain AT_ε^g(u*, z*) ≤ AT_ε^g(ū, z̄) P-a.s. Taking the expectation yields E[AT_ε^g(u*, z*)] ≤ E[AT_ε^g(ū, z̄)]. Hence (u*, z*) is a solution of (7), and it follows by the above scenario-wise inequality that AT_ε^g(u*, z*) = AT_ε^g(ū, z̄) P-a.s. The solution of the minimization problem for the space of Banach space-valued random variables is therefore identical to the solution in the sense of P-almost every scenario. We summarize our findings on the characterization of measurable selections in the following theorem.
Theorem 10 Let g ∈ L²_P(L²(Ω)). A random variable (u, z) ∈ H_P × K_P is a solution of (7) if and only if (u(ξ), z(ξ)) ∈ S_min(g(ξ)) for P-a.e. ξ, and it solves (6) if and only if (u(ξ), z(ξ)) ∈ S_stat(g(ξ)) for P-a.e. ξ.
The purpose of this section is to give a mathematical meaning to random processed images and edges. The approximation of the Mumford-Shah problem yields the existence of random variables that solve the nondeterministic problem scenario-wise. In this sense, a measurable selection of minimizers serves the purpose of a random reconstruction and a random edge indicator. Our numerical treatment of noise propagation in the following section is based on this observation.

Numerical Methods
As we are concerned with computing the influence of the signal's randomness on the edge, our problem class falls into the realm of forward problems in the terminology of uncertainty quantification (UQ). For the sake of accessibility, we briefly summarize the main challenges, also in view of designing numerical solution schemes: the solution map is set-valued due to nonuniqueness, point evaluations and a continuous dependence on the data are not available in general, and the edge indicator is subject to pointwise constraints. These facts challenge the use of many popular methods in UQ. Indeed:
1. Direct evaluation of the mean and variance of a quantity of interest as well as perturbation methods are impeded by the set-valuedness of the solution map and the lack of a single-valued, differentiable dependence of the solution on the data.
2. The Wiener chaos expansion is significantly troubled by the local constraint 0 ≤ z ≤ 1: the global orthonormal expansion is not compatible with the pointwise inequality constraints. For a computational approach in this direction we refer to [36].
3. Stochastic collocation is essentially an interpolation method and necessarily needs point evaluations and a continuous dependence on the parameter.
The above techniques seek to approximate the entire random variable by discrete objects. A different approach is based on the definition of a quantity of interest and its approximation using the techniques of statistics and numerics. In this paper, we employ Monte Carlo-based techniques, which rely on the strong law of large numbers in a Hilbert space setting: for an i.i.d. sequence of integrable Hilbert space-valued random variables, the sample means converge almost surely in norm to the expectation. For a proof see, e.g., [19].
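In this spirit, the sample mean of the function-valued edge indicator can be accumulated in a streaming fashion. The following sketch uses a toy stand-in for the segmentation pipeline; the sampler, the profile, and the noise level are illustrative assumptions only.

```python
import numpy as np

def monte_carlo_mean(sample_fn, n_samples, rng):
    """Monte Carlo estimate of E[z] for a function-space-valued random
    variable, relying on the strong law of large numbers in Hilbert space.
    `sample_fn(rng)` is assumed to return one realization z(xi) as an array
    (e.g. the edge indicator computed from one noisy image)."""
    mean = None
    for k in range(1, n_samples + 1):
        z = sample_fn(rng)
        if mean is None:
            mean = np.zeros_like(z)
        mean += (z - mean) / k          # running (streaming) sample average
    return mean

# Toy stand-in for the pipeline: a fixed "edge indicator profile" perturbed
# by zero-mean noise, so that the true expectation is the profile itself.
profile = np.clip(np.abs(np.linspace(-1.0, 1.0, 101)), 0.0, 1.0)
rng = np.random.default_rng(0)
z_bar = monte_carlo_mean(
    lambda r: profile + 0.1 * r.standard_normal(profile.shape), 2000, rng)
```

The running-average update avoids storing all samples, which matters when each realization is a full image-sized field.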

Generalized Confidence Intervals
In our context, one important quantity of interest is concerned with the identification of regions where it is more likely to encounter an edge. In this sense, we would like to obtain a notion of confidence intervals for the edge indicator. To achieve this, we first provide a pointwise generalization of cumulative distribution functions (cdfs) in the following definition.

Definition 12
Let Z be an L^2(Ω)-valued random variable. The function F : L^2(Ω) → L^∞(Ω), defined pointwise almost everywhere by F(τ)(x) := P[Z(x) ≤ τ(x)] for τ ∈ L^2(Ω), is called the generalized cumulative distribution function (gcdf) of Z.
Note that the above pointwise definition is justified by the Fubini theorem on product spaces. Using this, we choose a level of significance s ∈ (0, 1) and define for z (interpreted as a random variable in L^2_P(L^2(Ω)) only) the pointwise quantile τ_s for (almost all) x ∈ Ω by τ_s(x) := inf{ t ∈ [0, 1] : F(t)(x) ≥ s }. This defines a one-sided confidence interval (the lower bound is zero), and two questions immediately arise: (i) why is this object measurable, and (ii) how can it be estimated in practice? Both questions are answered by the following method.

Algorithm 2: Bisection method
Data: generalized cdf F, level of significance s ∈ (0, 1). Result: pointwise quantile τ_s. 1 Set t_0 = 0 and t_0^u = 1 as elements t_0, t_0^u ∈ L^2(Ω); 2 for n = 0, 1, 2, . . . do 3 calculate the midpoint t_n^mid = (t_n + t_n^u)/2; 4 update the lower/upper bound pointwise: where F(t_n^mid) < s set t_{n+1} = t_n^mid and t_{n+1}^u = t_n^u, otherwise set t_{n+1} = t_n and t_{n+1}^u = t_n^mid. By definition, the lower sequence is nondecreasing and the upper sequence is nonincreasing, i.e., almost everywhere it holds that t_n ≤ t_{n+1} and t_n^u ≥ t_{n+1}^u, and moreover F(t_n) < s ≤ F(t_n^u) for all n ∈ N. By the monotonicity, we also obtain the inequality 0 ≤ t_n < t_n^u ≤ 1 and hence the convergence t_n ↗ t_* and t_n^u ↘ t^* pointwise almost everywhere on Ω. Due to the measurability of t_n, t_n^u, the pointwise convergence yields that the limits are measurable as well. Moreover, by construction we get |t_{n+1} − t_{n+1}^u| ≤ (1/2)|t_n − t_n^u| and inductively |t_n − t_n^u| ≤ 2^{−n} almost everywhere. Therefore t_* = t^*.
In addition, t^* is the pointwise quantile of interest, which is equivalent to stating that for all t̃ ≤ t^* with λ^d(A) > 0, where A := {x ∈ Ω : t̃(x) < t^*(x)}, also P[z < t̃] < s holds a.e. on A. Indeed, by the choice of t̃ there exists for every x ∈ A a number n = n(x) ∈ N with t̃(x) < t_n(x). Hence, defining A_n := {x ∈ Ω : t̃ < t_n}, we see that A_n ⊂ A_{n+1} and A = ∪_{n=1}^∞ A_n. By the construction of (t_n)_{n∈N} it holds that P[z < t̃] ≤ P[z ≤ t_n] < s almost everywhere on A_n. We then obtain the required inequality P[z < t̃] < s almost everywhere on A and have thus shown t^* = τ_s.
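On a discretized domain, the bisection of Algorithm 2 can be carried out for all points simultaneously. The sketch below is a hypothetical vectorized implementation; the empirical gcdf built from uniform samples and the grid size are assumptions made purely for illustration (for U(0, 1) data the s-quantile equals s).

```python
import numpy as np

def pointwise_quantile_bisection(F, s, shape, n_iter=40):
    """Algorithm 2 (bisection) for the pointwise s-quantile of a [0,1]-valued
    random field.  F(t) evaluates the generalized cdf at the level array t and
    returns P[z(x) <= t(x)] pointwise as an array of the given shape."""
    t_lo = np.zeros(shape)   # lower bound t_n
    t_hi = np.ones(shape)    # upper bound t_n^u
    for _ in range(n_iter):
        t_mid = 0.5 * (t_lo + t_hi)
        below = F(t_mid) < s                   # quantile lies above the midpoint
        t_lo = np.where(below, t_mid, t_lo)
        t_hi = np.where(below, t_hi, t_mid)
    return t_hi  # t_lo and t_hi agree up to 2**(-n_iter)

# Hypothetical check with iid U(0,1) samples on 8 "pixels".
rng = np.random.default_rng(1)
samples = rng.uniform(size=(20000, 8))
F = lambda t: np.mean(samples <= t, axis=0)    # empirical gcdf
tau = pointwise_quantile_bisection(F, 0.9, (8,))
```

Forty iterations bound the bisection error by 2^{-40}, so the remaining error is dominated by the sampling error of the empirical gcdf.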
The result of this discussion is summarized in the following proposition.

Proposition 13
Let Z be an L^2(Ω)-valued random variable and let a level of significance s ∈ (0, 1) be given. Then the pointwise quantile τ_s is well defined and measurable.
In practice, the exact gcdf is unknown and hence an approximation is needed. To this end, we define the empirical quantiles in full analogy to the classical case of R-valued random variables.

Definition 14
Let Z be an L^2(Ω)-valued random variable and consider an iid family of samples (Z_i)_{i∈N} with distribution P_Z. The empirical gcdf is defined as the mapping F_n(τ)(x) := (1/n) Σ_{i=1}^n 1{Z_i(x) ≤ τ(x)}. To prove convergence of (F_n)_{n∈N} as introduced in Definition 14, we recall the statistical theory for R-valued random variables and refer the reader to [41] for proofs and additional information. After these preparations, we can formulate a function space valued version of the previous theorem. For this, we need to extend the continuity assumption for the quantile in Lemma 15 on the level of significance to our situation. Note further that the gcdf induces for almost all x ∈ Ω a cdf of an R-valued random variable, reading as R ∋ t ↦ P[z(x) ≤ t] = F(t)(x). Hence one can interpret the quantile function in a pointwise fashion as s ↦ (F( · )(x))^{−1}(s) for almost all x ∈ Ω. This observation leads to the following assumption. Utilizing Theorem 16 in a pointwise fashion, we obtain the following result.
Theorem 18 Let z ∈ K_P and let (z_i)_{i∈N} be iid samples drawn according to the distribution P_z. Moreover, fix a level of significance s ∈ (0, 1) fulfilling Assumption 17 with respect to the gcdf of z. Then the sequence of pointwise empirical s-quantiles τ_{s,n} converges to the s-quantile τ_s for almost all points with probability 1, as well as in L^p with respect to λ^d ⊗ P for all p ∈ [1, ∞).
Proof Since for almost every x ∈ Ω the function t ↦ F_n(t)(x) is a cdf, we can apply Theorem 16 pointwise and obtain the convergence F_n^{−1}(s) → F^{−1}(s) pointwise λ^d ⊗ P-almost everywhere. Due to the pointwise restriction of z, we know almost surely that τ_{s,n} ∈ [0, 1] a.e. on Ω. Since |τ_{s,n} − τ_s| ≤ 1 holds λ^d ⊗ P-a.e., the assertion follows by dominated convergence, which ends the proof.
In this subsection, we have encountered the pointwise quantile as quantity of interest for the investigation of random edges. The associated convergence analysis yields a foundation for a sample-based approximation method for the gcdf and the pointwise quantile. The bisection method from the beginning of this section can then be used to efficiently approximate the empirical quantile for a given sample size.
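The empirical gcdf of Definition 14 and the pointwise empirical quantile can be sketched directly on sample arrays; the uniform test data below are a hypothetical sanity check (for U(0, 1) samples the s-quantile equals s), not the paper's edge indicators.

```python
import numpy as np

def empirical_gcdf(samples, tau):
    """Empirical gcdf: F_n(tau)(x) = (1/n) * #{i : Z_i(x) <= tau(x)}.
    `samples` has shape (n, *grid); `tau` is broadcast over the grid axes."""
    return np.mean(samples <= tau, axis=0)

def empirical_pointwise_quantile(samples, s):
    """Pointwise empirical s-quantile tau_{s,n}, computed per grid point."""
    return np.quantile(samples, s, axis=0)

# Hypothetical sanity check: z(x) ~ U(0,1) iid across samples, median = 0.5.
rng = np.random.default_rng(2)
z = rng.uniform(size=(50000, 4))
tau = empirical_pointwise_quantile(z, 0.5)
```

By Theorem 18, refining the sample size drives these empirical quantiles to the true pointwise quantile almost everywhere with probability 1.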

Variance Reduction by Control Variates
In contrast to the previous section, where we focused on quantiles, we discuss here the expected value as a quantity of interest. We therefore consider the random edge indicator as an object in L^2_P(L^2(Ω)) and aim at employing the Monte Carlo method here as well. Since the convergence order is always 1/2, the convergence speed depends only on the variance. Associated acceleration methods aim at the formulation of a new random variable with the same expectation as the original one, but a smaller variance. Several such methods exist, mainly in finite dimensions (cf. [34] for an overview). We pursue an approach based on control variates. For convenience, we first review the one-dimensional case, aiming at an infinite-dimensional generalization.
Let Z be a square-integrable R-valued random variable on the underlying probability space. Our aim is to estimate E[Z]. Suppose we have another square-integrable random variable Y, called the control variate, together with its (exact) expected value E[Y]. Then we sample from the variable Z_ρ := Z + ρ (Y − E[Y]) for a yet to be chosen ρ ∈ R. Obviously, it holds that E[Z_ρ] = E[Z], and the variance of Z_ρ is minimal with respect to the weighting parameter ρ for the choice ρ = −Cov(Z, Y)/Var(Y). The minimal variance for Z_ρ then becomes Var(Z_ρ) = Var(Z) − Cov(Z, Y)^2/Var(Y) and, whenever Cov(Z, Y) ≠ 0, a variance reduction is achieved (when compared to Z). Of course, this concept can be extended to m control variates (Y^(j))_{j=1,...,m} by defining Z_ρ := Z + Σ_{j=1}^m ρ_j (Y^(j) − E[Y^(j)]). The optimal ρ = (ρ_1, . . . , ρ_m) then reads ρ = −C_Y^+ c_Z, where C_Y = (Cov(Y^(i), Y^(j)))_{i,j=1,...,m} ∈ R^{m×m} denotes the covariance matrix of Y and the quantity c_Z = (Cov(Z, Y^(j)))_{j=1,...,m} ∈ R^m denotes the covariance vector of the control variates and the desired target. For a matrix A ∈ R^{m×m}, we denote its Moore–Penrose inverse by A^+. In the same way as in the previous section, we extend the above methodology in a pointwise fashion and introduce for this aim the following notation.
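The one-dimensional construction can be sketched as follows; the toy variables Z and Y below are hypothetical (Y explains most of the variance of Z, so a large reduction is expected).

```python
import numpy as np

def control_variate_estimate(z, y, ey):
    """Estimate E[Z] with one control variate Y of known mean `ey`.
    Optimal weight: rho = -Cov(Z, Y)/Var(Y); then Z_rho = Z + rho*(Y - E[Y])
    has the same mean as Z but variance Var(Z) - Cov(Z, Y)**2/Var(Y)."""
    cov = np.cov(z, y)                 # 2x2 sample covariance matrix
    rho = -cov[0, 1] / cov[1, 1]
    z_rho = z + rho * (y - ey)
    return np.mean(z_rho), np.var(z_rho)

# Hypothetical example: Z = 2 + Y + small noise, E[Y] = 0, so E[Z] = 2.
rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, 100000)
z = 2.0 + y + rng.normal(0.0, 0.1, 100000)
mean_cv, var_cv = control_variate_estimate(z, y, ey=0.0)
```

In practice the optimal ρ is itself estimated from samples, which is why the experiments below reserve a separate batch of samples for the multipliers.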
For U, V ∈ L^2_P(L^2(Ω)) we define the pointwise covariance Cov(U, V)(x) := E[(U(x) − E[U](x))(V(x) − E[V](x))], where E is the expectation defined via the Bochner integral. Further, we define the pointwise variance V[U](x) := Cov(U, U)(x), in contrast to the variance of U, which reads Var(U) := ∫_Ω V[U](x) dx. For a random variable Z ∈ L^2_P(L^2(Ω)), a family (Y^(j))_{j=1,...,m} ⊂ L^2_P(L^2(Ω)) of control variates, and a measurable function ρ : Ω → R^m, we consider the random variable Z_ρ := Z + Σ_{j=1}^m ρ_j (Y^(j) − E[Y^(j)]). The pointwise minimization of V[Z_ρ] with respect to ρ gives us the minimizer of Var(Z_ρ) due to the Fubini theorem. Next we prove its measurability.

Lemma 19
The mapping induced by the Moore-Penrose inverse ( · ) + : R m×m → R m×m with A → A + is measurable.
Proof By Theorem 3.4 in [7], it holds that A^+ = lim_{k→∞} (A^T A + (1/k) Id)^{−1} A^T. Let GL_m(R) denote the set of invertible matrices. Since the mapping ( · )^{−1} : GL_m(R) → GL_m(R) is continuous, the mappings f_k(A) := (A^T A + (1/k) Id)^{−1} A^T are continuous and thus measurable. Due to the pointwise convergence A^+ = lim_{k→∞} f_k(A), the mapping ( · )^+ is measurable as well.
Hence we have proven the following result.

Proposition 20
The mapping ρ : Ω → R^m, defined by ρ = −C_Y^+ c_Z, is (Borel) measurable. Proof Since compositions and products of measurable functions are measurable, the measurability of ρ follows from the preceding lemma.
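The pointwise construction with the Moore–Penrose inverse can be sketched on sample arrays as follows. This is a hypothetical discretization: the toy target and control variates are invented for illustration, and for simplicity the same samples are used to estimate ρ and the mean (the experiments below use a separate batch for the multipliers).

```python
import numpy as np

def pointwise_cv_mean(z, ys, ey):
    """Pointwise control variates: z has shape (n, npix), ys shape (m, n, npix),
    ey shape (m, npix) holds the exact means of the control variates.
    At each point x, rho(x) = -C_Y(x)^+ c_Z(x), using np.linalg.pinv
    (Moore-Penrose inverse, measurable by Proposition 20)."""
    n, npix = z.shape
    zc = z - z.mean(axis=0)                       # centered target
    yc = ys - ys.mean(axis=1, keepdims=True)      # centered control variates
    out = np.empty(npix)
    for x in range(npix):
        C = yc[:, :, x] @ yc[:, :, x].T / (n - 1)      # m x m covariance matrix
        c = yc[:, :, x] @ zc[:, x] / (n - 1)           # covariance vector c_Z
        rho = -np.linalg.pinv(C) @ c
        z_rho = z[:, x] + rho @ (ys[:, :, x] - ey[:, x, None])
        out[x] = z_rho.mean()
    return out

# Hypothetical example with m = 2 control variates on 3 "pixels": E[Z] = 1.
rng = np.random.default_rng(4)
y = rng.normal(size=(2, 5000, 3))
z = 1.0 + y[0] + 0.5 * y[1] + 0.05 * rng.normal(size=(5000, 3))
mean_cv = pointwise_cv_mean(z, y, ey=np.zeros((2, 3)))
```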

Numerical Examples
Our numerical methods are next applied to a selection of real-world images. In this respect, the given image data are a (discrete) pixel matrix (G_{jk})_{j,k=1}^m of size m × m with values between 0 (black) and 255 (white). In order to utilize our continuous framework, we interpret this pixel matrix as a function in L^∞(Ω) with Ω := (0, m)^2. This is done by decomposing Ω (up to null sets) into open squares of side length 1 and writing g = Σ_{j,k=1}^m G_{jk} 1_{Q_{jk}} with Q_{jk} := (0, 1)^2 + (j − 1, m − k). Concerning the underlying degradation models, we focus on two cases: (i) white noise and (ii) randomized motion blur.
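The piecewise constant interpretation can be sketched as a point evaluator; the indexing convention below (j horizontal, k vertical, G stored as G[j-1, k-1]) is one possible assumption and not necessarily the paper's.

```python
import numpy as np

def pixel_function(G):
    """Interpret an m x m pixel matrix G (values in [0, 255]) as the piecewise
    constant function g = sum_{j,k} G_{jk} 1_{Q_{jk}} on Omega = (0, m)^2,
    with Q_{jk} = (0,1)^2 + (j-1, m-k).  Indexing convention is assumed."""
    m = G.shape[0]
    def g(x1, x2):
        j = min(int(np.floor(x1)), m - 1) + 1    # x1 in (j-1, j)
        k = m - min(int(np.floor(x2)), m - 1)    # x2 in (m-k, m-k+1)
        return float(G[j - 1, k - 1])
    return g
```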

White Noise Model
In this setting, we assume L := id_{L^2(Ω)} and that the pixel values are corrupted by a family of independent, normally distributed random variables with standard deviation σ > 0 for all pixels. The random variable then reads g = g_0 + σ Σ_{j,k=1}^m ξ_{jk} 1_{Q_{jk}}, where the ξ_{jk} ∼ N(0, 1) are iid with standard normal distribution. In our experiments, we set σ = 40. This is in fact a rather large standard deviation compared to situations in practice. However, since the Mumford–Shah model has been designed to suppress the influence of noise, a more aggressive noise model helps the visualization of results in the subsequent experiments.
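On the pixel grid, drawing one realization of this degradation model amounts to adding scaled iid standard normal noise per pixel. The flat gray test image below is a hypothetical placeholder for g_0.

```python
import numpy as np

def white_noise_sample(G0, sigma, rng):
    """One realization of the white noise model g = g0 + sigma * xi,
    with xi_{jk} ~ N(0, 1) iid per pixel on the discrete grid."""
    return G0 + sigma * rng.standard_normal(G0.shape)

rng = np.random.default_rng(5)
G0 = np.full((64, 64), 128.0)              # hypothetical flat gray image
g = white_noise_sample(G0, sigma=40.0, rng=rng)
```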

Motion Blur Model
In addition to the noise model above, we now also consider the motion blur operator L_ℓ ∈ L(L^2(Ω)) with horizontal movement defined by (L_ℓ g)(x) := (1/ℓ) ∫_0^ℓ g(x + t v) dt, with v = (1, 0) the direction of motion. The parameter ℓ controls the velocity of the motion and thereby the strength of the blurring effect. Since the integration domain exceeds Ω, we extend the argument g_0 by zero outside its domain. In our model, we assume the parameter ℓ ∼ U(0, ℓ̄) to be a uniformly distributed random variable, where we choose ℓ̄ = m/4, i.e., one fourth of the image size. In view of our underlying degradation framework, we decompose the above object into L_ℓ g_0 = L g_0 + ξ with L ∈ L(L^2(Ω)) the linear operator such that L g_0 = E[L_ℓ g_0] holds for all g_0 ∈ L^2(Ω) and ξ := L_ℓ g_0 − L g_0 a centered random variable. The expected value is the subject of the following auxiliary result. Proof It is easily seen that for each g ∈ C_0^∞(Ω) the mapping ℓ ↦ L_ℓ g is (Lipschitz-)continuous. Using ‖L_ℓ‖_{L(L^2(Ω))} ≤ 1 for all ℓ ∈ R (shown similarly to the proof of ‖L‖_{L(L^2(Ω))} ≤ 1 below) and the density of C_0^∞(Ω) in L^2(Ω), we obtain by the Banach–Steinhaus theorem the continuity of ℓ ↦ L_ℓ g for all g ∈ L^2(Ω) and hence the measurability of the above random variable.
For the calculation of the expected value, choose again g ∈ C_0^∞(R^2) arbitrarily. Then E[L_ℓ g] can be obtained by direct computation as an iterated integral over the blur parameter and the shift variable; the corresponding set of integration M is depicted in Fig. 1. Now we change the order of integration by slicing the set vertically instead of horizontally; a slice at t ∈ (−1, 1) then reads as the interval (|t|, 1). Proceeding in this way yields an explicit representation of L g. Finally, we obtain the boundedness ‖L‖_{L(L^2(Ω))} ≤ 1, from which we deduce the asserted relation on all of L^2(Ω) by using again the density of C_0^∞(Ω) in L^2(Ω).
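On the pixel level, the blur can be sketched by averaging the image over an integer number of unit shifts with zero extension. This is an assumed discretization for illustration, not the paper's exact operator; by convexity of the average, the discrete operator is also a contraction on the discrete L^2 norm.

```python
import numpy as np

def motion_blur(G, ell):
    """Pixel-level sketch of the motion blur L_ell with direction v = (1, 0):
    average the image over `ell` unit shifts, extending by zero outside the
    domain (an assumed discretization of the continuous averaging operator)."""
    steps = max(int(round(ell)), 1)
    out = np.zeros_like(G, dtype=float)
    for t in range(steps):
        shifted = np.zeros_like(G, dtype=float)
        if t == 0:
            shifted[:] = G
        else:
            shifted[:, t:] = G[:, :-t]   # shift by t pixels, zero-fill
        out += shifted
    return out / steps

# ell ~ U(0, m/4) as in the paper, here with m = 64.
rng = np.random.default_rng(6)
ell = rng.uniform(0.0, 16.0)
blurred = motion_blur(np.eye(64), ell)
```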

Experimental Setup
As test images we take the cameraman image as well as an MRI scan of a human head in side view (see Fig. 2). We apply both instances of our degradation method to both test images. A selection of samples is depicted in Fig. 3. Since both of our methods are sample based, we need to compute a critical point of the Ambrosio–Tortorelli functional for each sample; this task can be parallelized. Of course, we are unable to solve the minimization problem exactly and hence employ the finite element method (FEM) proposed in [14] in combination with the splitting method for the corresponding discretization. In order to obtain an FE mesh, we divide each of the abovementioned pixel squares into two triangles by inserting diagonals in an alternating fashion; see Fig. 4. Since parts of the FEM approximation do not depend on the image (such as parts of the local stiffness matrices), we are able to share them among the minimization processes in order to reduce computational overhead. Our implementation is done in MATLAB. For the numerical solution of the PDEs, we use the software package AFEM [4].
In the white noise case, we use the parameters α = 75, β = 1, ε = 10^{−2} and η = ε^{1.1}; in the motion blur case, we change to α = 50, β = 10 for the cameraman and α = 20, β = 10 for the MRI. Since in the latter case the noise component ξ depends on the original image, a change of parameters is justified.
Moreover, we emphasize that, depending on the form of the operator L, it can be difficult to obtain the mass matrix with entries (Lφ_i, Lφ_j)_{L^2(Ω)} exactly, where the φ_i are nodal basis functions. Hence we work with the interpolations I_T Lφ_i instead, where I_T : C(Ω̄) → C(Ω̄) denotes the nodal interpolation operator defined by I_T v := Σ_i v(x_i) φ_i, with (x_i)_{i=1,...,(n+1)(m+1)} being the coordinates of the nodes.

Pointwise Quantiles
First, we consider the pointwise quantiles in order to detect areas where an edge is likely to be found. As level of significance, we take s = 0.9 for the white noise cases. For the motion blur case, we take s = 0.5 (i.e., the median) and s = 0.1 to better study the behavior of the quantiles. The visual results are depicted in Fig. 5. We note that the quantiles for the white noise case do not differ significantly from the edge indicators of the original images. In fact, in samples of edge indicators one can see little dark spots generated by the algorithm. Since they are not very distinct and their position is random, the quantile does not react sensitively to them. A more sensitive reaction can be seen for the expected values in the following section.
In the motion blur model, the reconstruction behavior has an influence. Since our blurring operator and the actually applied L_ℓ differ, the reconstruction step tries to transport intensity to places where it does not belong. This behavior leads to strong oscillations (especially for small ℓ) as well as doubling effects (especially for large ℓ) in the corresponding reconstructed image. This results in the horizontal motion of strongly indicated edges as well as weak, blurred edges. Only very small parts of the edges, mainly horizontal ones, remain at the same position.
In order to further clarify this effect, we resort to an additional synthetic test image similar to the one from [27] for the parameters α = 20, β = 10 and evaluate the quantiles using only 200 samples for a descending series of levels of significance. The results can be found in Fig. 6.
One observes that for high levels of significance (s = 0.9 and s = 0.75) the only clearly recognizable edges are the horizontal ones (the upper and lower edge of the bar, the lower edge of the triangle, and the upper and lower poles of the circle). When reducing s, the moving edges gain importance, which results in gray areas around the places corresponding to the circle and the triangle. As can be seen in the result for s = 0.3, the presence of distinct edges next to each other leads, due to the motion blur, to an overlap. This results in the detection of an edge at a position where no edge occurs in the original image at all. For the low levels of significance s = 0.2 and s = 0.1, the edges corresponding to the original image are duplicated in the horizontal direction around their original positions. In the last image, this results in a broad band of edges.
Coming back to the images considered in Fig. 5, we can build on the above observations to obtain an improved understanding of the mechanisms leading to these results. As we have seen, edges of different orientations occur in the quantile for very different values of s. Thus, we use the two levels s = 0.5 and s = 0.1 to get a better understanding. Again, we observe for the first value only gray areas as well as prolonged horizontal edges (e.g., the shoulder of the cameraman and the top of the head in the MRI example). For the lower value s = 0.1, we observe doubled edges around the ones of the original images. At the left and right boundary of the cameraman image, gray horizontal 'oscillations' can be observed. This effect occurs since the image function has been extended by 0 outside of Ω: as the background is originally gray, an edge is generated there (in contrast to the MRI image, where the background was black in the first place). This also leads to some oscillatory effects on the left side due to 'collisions' of the boundary and the man. We note that, since our synthetic test image is an extreme example, we do not observe such large bands of edges here, but we still find duplications of edges.

Expected Values
In this section, we utilize the control variates technique to obtain the expected values of the edge indicators. Besides the mean values themselves, we also want to assess the variance reduction effect. For the construction of control variates, we choose a different edge detection approach based on filtering, which is far cheaper than the Mumford–Shah or Ambrosio–Tortorelli-based methods. Specifically, we use the so-called Marr–Hildreth filter Δ(k_σ ∗ g), where k_σ(x) := exp(−|x|^2/(2σ)) denotes the Gaussian kernel. The idea is to apply a smoothing step that, on the one hand, reduces the possible influence of noise and, on the other hand, increases the regularity from an image signal in L^2(R^2) to a smoothed signal in C^∞(R^2). The subsequent application of the Laplacian seeks to find sources and valleys in the gradient map, which correspond to rapid changes in the gradient. Due to its structure, the Marr–Hildreth filter is also called the Laplacian of Gaussian in the literature. For more details, the interested reader is referred to [18, Section 2.6.2] and the references therein.
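A minimal discrete sketch of the Marr–Hildreth filter, assuming separable Gaussian smoothing followed by a 5-point discrete Laplacian (the kernel truncation radius and boundary handling are implementation assumptions, not taken from the paper):

```python
import numpy as np

def marr_hildreth(G, sigma):
    """Marr-Hildreth (Laplacian of Gaussian) filter: Delta(k_sigma * g),
    with k_sigma(x) ~ exp(-|x|^2/(2*sigma)) as in the text.  Implemented as
    separable Gaussian smoothing plus a 5-point discrete Laplacian."""
    r = int(np.ceil(3.0 * np.sqrt(sigma)))       # truncation radius (assumed)
    t = np.arange(-r, r + 1)
    k = np.exp(-t**2 / (2.0 * sigma))
    k /= k.sum()                                  # normalized 1-D kernel
    pad = np.pad(G.astype(float), r, mode="edge")
    sm = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, pad)
    sm = np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, sm)
    # 5-point Laplacian with replicated boundary values
    p = np.pad(sm, 1, mode="edge")
    return p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2] - 4.0 * sm
```

The filter vanishes on constant regions and responds near intensity jumps, which is exactly what makes it a cheap surrogate for the edge indicator.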
As control variates, we then choose the absolute value of the filter output as well as the exponential of its weighted negative. Especially the latter has a chance to behave, at least locally, similarly to the edge indicator obtained by our model. Additionally, we use the original image in order to capture the noise applied to the image itself.
For motion blur, a direct application of the above technique would be inappropriate. Due to the presence of the operator L, we first apply a reconstruction step to the image based on the following consideration: given the edge indicator, one would only need to solve the linear equation (2a) to obtain the restored image. Since the true edge indicator is not given and expensive to calculate, we perform the reconstruction with respect to the two extreme cases: z = 1, corresponding to the a priori assumption of no edge occurring in the image at all, and z = 0, corresponding to the case of edges appearing everywhere in the image. Based on this, we consider the functions v_1, v_2 ∈ H^1(Ω) as solutions of the PDEs obtained from (2a) for z ≡ 1 and z ≡ 0, respectively, where L^* is the adjoint of L and ν denotes the outward unit normal to ∂Ω. In this sense, we obtain two reconstructions of the image with respect to two different parameter choices. As in the white noise case, we apply the Marr–Hildreth filter to v_1, v_2 and use the absolute values as well as exponentials of their weighted negatives.
To study the variance reduction effect, we compare the variances of the edge indicator z with those of the modified variable z_cv. Since our approach is based on the pointwise application of control variates, we moreover investigate where the variance has been reduced. To this end, we plot the difference of the pointwise variances V[z] − V[z_cv]. We remark that mathematically this difference can never be negative; however, since our multiplier and our variances are merely estimates, scattered negative values can occur. Hence the plots have been truncated at zero from below to give a clearer view. Note that in Fig. 7 the colormap is inverted for better visibility.
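The truncated variance difference can be sketched directly on sample arrays; the two synthetic sample sets below are hypothetical (z_cv is constructed with a quarter of the variance of z, so the map should read roughly 0.75 everywhere).

```python
import numpy as np

def variance_reduction_map(z, z_cv):
    """Pointwise variance difference V[z] - V[z_cv], estimated over the sample
    axis 0 and truncated at zero from below (scattered negative values can
    occur because the variances are only estimates)."""
    diff = np.var(z, axis=0, ddof=1) - np.var(z_cv, axis=0, ddof=1)
    return np.maximum(diff, 0.0)

# Hypothetical check: z_cv has a quarter of the variance of z.
rng = np.random.default_rng(8)
d = variance_reduction_map(rng.normal(0.0, 1.0, (10000, 5)),
                           rng.normal(0.0, 0.5, (10000, 5)))
```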
An overall number of 1000 samples has been generated, 100 of which have been used to approximate the multipliers for the control variates. Consequently, the remaining 900 were used for the estimation of the mean, the variance, and the difference of the pointwise variances.
For the white noise case, we see in Fig. 8 that the background is considerably darker in comparison to the quantiles. This behavior is caused by the isotropic distribution of the little dark spots generated by the noise. Besides that, there is again a strong similarity to the edge indicator of the original image. Moreover, a noticeable variance reduction can be observed. Considering the plots of the difference of the pointwise variances, we see a bigger improvement in regions where many edges are close to each other.
For the motion blur model, we see horizontal edges being stretched. On closer inspection, one finds several edges parallel and close to the original ones, forming a weak, blurred band. This is caused by the horizontal motion of edges described in the previous section. A significantly higher variance reduction effect can be observed in Table 2, which is not surprising, as the Mumford–Shah model has been designed to reduce the effect of noise. Regarding the difference of the pointwise variances, we find the highest reductions in close proximity to the edges of the original image.

Conclusion
In this paper, we studied the influence of errors in images on edges resulting from an image segmentation procedure. Due to the inherent difficulty of performing uncertainty quantification for a geometric quantity, we transitioned to the Ambrosio-Tortorelli problem to profit from the accessible vector space structure. Despite the lack of convexity or unique solutions, we were able to establish a rigorous theoretical foundation for selections of solutions. To perform a quantitative treatment, we proposed numerical methods addressing quantiles and the expected values and applied them to practically meaningful images and instances of the considered degradation model.
Since uncertainty quantification of geometric objects is a relatively new and scarcely investigated topic, we offer here a possible access point for researchers dealing with problems having a similar geometric structure. From the practical viewpoint, our results might be of importance to obtain a qualitative understanding of the uncertainty associated with (reconstructed) edges in certain medical or machine vision applications.
Our investigation opens several new research perspectives. On the one hand, one may incorporate more general degradation operators in the Mumford-Shah formulation. This leads to analytical difficulties for the existence of solutions as well as numerical challenges.
On the numerical level, due to the anisotropic nature of the error indicator as well as the phase field approach, the use of adaptive mesh refinement is of interest. Further, given the large amount of data generated from images, the use of compression techniques such as low-rank approximations might be attractive. However, in both cases the interplay with the uncertainties might lead to difficulties and needs careful treatment.

A Word of Caution at the End
As an addendum, we want to emphasize that the white noise model considered in this work cannot be interpreted as a pixelwise projection of a random variable with values in L^2(Ω). In the following we make use of techniques from probability theory, which can, for example, be found in [29]. In contrast to before, let Ω = (0, 1)^2 and define for n ∈ N a uniform grid Q_n = (Q^n_{jk})_{j,k=1}^n such that the area of the square around every pixel is 1/n^2. Precisely speaking, we are going to show that there exists no random variable ξ^* ∈ L^2_P(L^2(Ω)) such that the white noise ξ_{Q_n} with respect to this grid has the same distribution as Q_n ξ^*, where the operator Q_n ∈ L(L^2(Ω)) is the piecewise projection onto the pixel grid defined by Q_n g := Σ_{j,k=1}^n (1/|Q^n_{jk}|) (∫_{Q^n_{jk}} g dx) · 1_{Q^n_{jk}}. Having λ^2(Q^n_{jk}) = 1/n^2, we introduce the orthonormal system (e^n_{jk})_{j,k=1}^n with e^n_{jk} := n 1_{Q^n_{jk}} and rewrite the projection as Q_n g = Σ_{j,k=1}^n (e^n_{jk}, g)_{L^2(Ω)} · e^n_{jk}.
The white noise reads as ξ_n = ξ_{Q_n} = n Σ_{j,k=1}^n ξ_{jk} 1_{Q^n_{jk}} = Σ_{j,k=1}^n ξ_{jk} e^n_{jk}, with (ξ_{jk})_{j,k=1}^n a family of independent, identically distributed random variables with standard normal distribution. It is straightforward to show that Q_n ξ_{2n} = ξ_n holds for all n. To prove the non-existence of ξ^* as described above, we assume the contrary. Using the fact that diam(Q^n_{jk}) = √2/n goes to zero for n → ∞, one can show by the density of smooth functions in L^2(Ω), combined with a version of the Banach–Steinhaus theorem (cf. [42, Korollar IV 2.5]), that the operators Q_n converge pointwise to the identity on L^2(Ω). Hence Q_n ξ^* → ξ^* converges λ^2 ⊗ P-a.e. In particular, this yields that ξ_n, having the same distribution as Q_n ξ^*, must converge in distribution to ξ^*, meaning that the corresponding probability distributions μ_n are weakly convergent in the sense of measures. This implies in particular the pointwise convergence of the respective characteristic functions, reading for an arbitrary v ∈ L^2(Ω) as φ_n(v) = E[exp(i (ξ_n, v)_{L^2(Ω)})] = exp(−(1/2) Σ_{j,k=1}^n (e^n_{jk}, v)^2_{L^2(Ω)}) = exp(−(1/2) ‖Q_n v‖^2_{L^2(Ω)}). As we have seen above, the operators Q_n converge pointwise to the identity, so that we deduce the convergence φ_n(v) → exp(−(1/2) ‖v‖^2_{L^2(Ω)}) for all v ∈ L^2(Ω).
Since we have assumed that the μ_n converge weakly, this implies that the characteristic function of the distribution of ξ^* must be exp(−(1/2) ‖v‖^2_{L^2(Ω)}). But according to [31, Proposition 1.2.11], there exists no probability measure on L^2(Ω) with this particular characteristic function. This yields the anticipated contradiction.
Hence we deduce that the term white noise is only justified with respect to a given pixel grid.
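The divergence underlying this argument is easy to check numerically: since the e^n_{jk} are orthonormal, the squared L^2 norm of the grid white noise equals the sum of n^2 squared standard normals, so its expectation is n^2 and blows up under grid refinement. The sketch below is a hypothetical illustration of this fact.

```python
import numpy as np

def white_noise_sq_norm(n, rng):
    """Squared L^2(Omega) norm of the grid white noise xi_n = sum_jk xi_jk e^n_jk
    on Omega = (0,1)^2: by orthonormality, ||xi_n||^2 = sum_jk xi_jk^2, which
    has expectation n^2 and hence diverges as the grid is refined -- no
    L^2(Omega)-valued limit ("true" white noise) can exist."""
    xi = rng.standard_normal((n, n))
    return float(np.sum(xi**2))

rng = np.random.default_rng(7)
norms = [white_noise_sq_norm(n, rng) for n in (10, 100)]  # roughly 10^2, 100^2
```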