Abstract
The quantification of uncertainties in image segmentation based on the Mumford–Shah model is studied. The aim is to address the propagation of noise and other error types from the original image to the restoration result and, especially, to the reconstructed edges (sharp image contrasts). Analytically, we rely on the Ambrosio–Tortorelli approximation and discuss the existence of measurable selections of its solutions as well as sampling-based methods and the limitations of other popular methods. Numerical examples illustrate the theoretical findings.
1 Introduction
In the use of modern imaging techniques in medicine and embedded artificial intelligence systems for an automated investigation of anatomical structure or the classification of objects (human, tree, cloud, etc.), one of the fundamental steps is the division of a given image into distinct areas with characteristic properties; see, e.g., [35, Chapter 2] and [12, Part II]. In this context, image segmentation is concerned with the problem of identifying regions with approximately homogeneous features (such as color or gray values, and texture) within given image data. The boundary curve separating such homogeneous features is called the edge set or simply the edge.
Over the years, different approaches have been proposed in order to accomplish this task by solving, e.g., suitably defined partial differential equations or by minimizing appropriate energies. The latter technique leads to variational models; see, e.g., [15,16,17, 22, 24,25,26, 30, 32] and the monographs [33, 5, Chapter 4] or [18, Chapter 7] as well as the references therein for an overview.
In this work, we focus on the Mumford–Shah energy (model)
which was introduced in [32]. Here, \(g \in L^2(\varOmega )\) is a given grayscale image, possibly degraded by noise, where \(\varOmega \subset \mathbb {R}^2\) represents the image domain, and \(\alpha ,\beta >0\) are given parameters. Further, \(\mathcal {H}^{1}\) denotes the one-dimensional Hausdorff measure. In order to discuss the structure of \(MS^g\), we note that the third term is a least-squares data fitting term involving our desired reconstruction u and the data g. It is inspired by a maximum likelihood argument regarding the statistical properties of the underlying noise. Since the Mumford–Shah model aims to establish a division into monochromatic regions, the (generalized) gradient of the reconstructed image is minimized (penalized when viewed together with the choices of \(\alpha \) and \(\beta \)) over the complement of the edge set \(\varGamma \) in \(\varOmega \). This motivates the first term in \(MS^g\). In order to achieve a certain robustness in finding \(\varGamma \) with respect to noise, the length of the edge set is penalized in the second term. Moreover, penalizing the length of \(\varGamma \) secures a Caccioppoli property of the edge set.
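Collecting the three terms just discussed, the energy presumably takes the following form (our reconstruction, since the displayed formula is not reproduced here; that \(\alpha \) weights the length term and \(\beta \) the data fitting term is an assumption):

```latex
MS^g(u,\varGamma ) = \int _{\varOmega \setminus \varGamma } |\nabla u|^2 \,\mathrm {d}x
+ \alpha \, \mathcal {H}^{1}(\varGamma )
+ \beta \int _\varOmega (u - g)^2 \,\mathrm {d}x .
```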
The aim associated with the above Mumford–Shah model typically is to find a restored image \(u: \varOmega \rightarrow \mathbb {R}\) and an edge set \(\varGamma \), which separates homogeneous regions of interest, by minimizing \(MS^g\); see (MS) above. In this context, the Hausdorff term in \(MS^g\) is one ingredient yielding existence of a minimizing pair \((u_g,\varGamma _g)\). We note that such a minimizer is attached to the specific realization of the noise \(\xi \) contained in the data, i.e., \(g=g_0 + \xi \) with \(g_0\) representing the underlying true image.
Obviously, the location of \(\varGamma _g\) within \(\varOmega \) is influenced by the specific realization \(\xi \) of an associated random process and might not properly reflect the location of the true edge \(\varGamma _{g_0}\) pertinent to \(g_0\). Indeed, rather than computing \(\varGamma _g\) one is merely interested in estimating the uncertainty in the location of the reconstructed edge set, given statistical properties of the underlying random process. This brings us into the realm of uncertainty quantification of geometric objects.
In general, one of the fundamental components of predictive estimation in uncertainty quantification is to analyze how uncertainty propagates through a given mathematical model; also known as model prediction [38, Chap. 9]. We observe that the vast majority of the existing literature on the propagation of uncertainty is concerned with well-defined systems whose (unique) solutions are given by real numbers, vectors, matrices, or distributed parameters; see, e.g., [38, Chap. 9, Chap. 10] or for an introduction with emphasis on methods also [39].
However, within the context of image processing (segmentation, in particular), variational models of crack propagation, or free discontinuity problems in general, the output of the model is a geometric quantity, which itself does not even live in a linear space; see [6] and the references therein. Due to the associated complexity, we emphasize the role played by the geometric variable (the edge set). In fact, among the existing methods for the propagation of uncertainty (direct evaluation of the mean and variance of a quantity of interest, sampling methods, perturbation methods, and spectral representations), only sampling methods appear to be applicable.
In contrast to inverse problems, where Bayesian techniques admitting probability densities to characterize the true value are utilized, we are here confronted rather with a kind of forward problem. It is also worth mentioning that the model considered in this work yields solution pairs consisting of the restored image and associated edge set, which is not guaranteed to be unique due to a lack of strict convexity of the objective in the Mumford–Shah problem.
Our intention here is therefore to draw attention to an important but somewhat neglected class of problems for uncertainty quantification. As a means of investigating these models, in particular the edge set, we later propose the notions of a generalized cumulative distribution function and of pointwise quantiles. Of course, pointwise averages, variances, etc., are also conceivable. However, we believe that an estimation of the probability of an edge appearing within a given image patch (or a patch of pixels in the discrete setting) is the more valuable information.
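As an illustration of this pointwise viewpoint, the following self-contained sketch (our own toy construction; the sample generation, the threshold 0.5, and all names are assumptions, not taken from the paper) estimates a generalized CDF and a patchwise edge probability from samples of an edge indicator z, which is near 0 at an edge and near 1 elsewhere:

```python
import random

random.seed(0)

# Hypothetical samples of an edge indicator z on a 1-D pixel grid.
# The detected edge location jitters from sample to sample,
# mimicking the propagation of noise into the reconstruction.
n_pixels, n_samples, true_edge = 100, 500, 60
samples = []
for _ in range(n_samples):
    z = [1.0] * n_pixels
    z[true_edge + random.randint(-3, 3)] = 0.0   # jittered edge location
    samples.append(z)

def pointwise_cdf(samples, x, t):
    """Generalized CDF evaluated pointwise: F_x(t) = P(z(x) <= t)."""
    return sum(z[x] <= t for z in samples) / len(samples)

def patch_edge_probability(samples, lo, hi, threshold=0.5):
    """Probability that the patch [lo, hi) contains an edge,
    i.e., that min z over the patch falls below the threshold."""
    return sum(min(z[lo:hi]) <= threshold for z in samples) / len(samples)

p_pixel = pointwise_cdf(samples, true_edge, 0.5)   # edge probability at one pixel
p_patch = patch_edge_probability(samples, 55, 66)  # edge probability on a patch
```

Note that the patchwise probability aggregates the jitter: each individual pixel carries only a small edge probability, while the patch containing all likely edge locations carries probability close to one.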
Let us now return to the Mumford–Shah model and introduce the associated variational framework. Given sufficient Sobolev regularity of u on the set \(\varOmega \backslash \varGamma \), this leads us to the following minimization problem:
Note that (MS) constitutes the renowned Mumford–Shah problem. The interested reader is referred to the original paper [32] for more details, and for a condensed overview including a discussion of the existence of solutions we point to [6] and the references therein. Uniqueness of a solution cannot be expected in general, as can be seen from a one-dimensional counterexample in [18, Section 7.4.5]. The analytic difficulties arise due to the set variable \(\varGamma \) as well as u being an element of a space depending on this set. Moreover, we have the Hausdorff measure directly appearing in the functional. These aspects not only significantly complicate the analysis of (MS), but they also challenge the numerical treatment.
An elegant way to address these issues is due to Ambrosio and Tortorelli (cf. [10, 11]). Driven by the analytical notion of \(\varGamma \)-convergence, the main idea is to treat the edge set along with its Hausdorff measure by a phase field approach. In fact, the so-called Ambrosio–Tortorelli functional \(AT^g_\varepsilon : H^1(\varOmega ) \times K \rightarrow \mathbb {R}\) is defined by
together with the constraint set \(K := \{ z \in H^1(\varOmega ) : 0 \le z \le 1 \text { a.e. in } \varOmega \}\).
As before, the quantity \(u \in H^1(\varOmega )\) denotes the restored image. The geometric quantity \(\varGamma \), on the other hand, is approximated by the function \(z \in H^1(\varOmega )\), which can be interpreted as an edge indicator. In this context, the zero-level set of z is associated with an edge and the level set associated with \(z = 1\) relates to the absence of such an edge. From this, we see that the first term in \(\widetilde{AT}_\varepsilon ^g\) refers to the first term of (MS). The second and third terms approximate the Hausdorff measure. The remaining terms are the data fitting term as well as an additional regularization for the restored image u with weighting factor \(\eta > 0\). This term ensures coercivity. In [11], it is shown that the sequence of minimization problems
\(\varGamma \)-converges in the strong \(L^1(\varOmega ;\mathbb {R}^2)\)-topology to the original Mumford–Shah problem (MS) as \(\varepsilon \rightarrow 0\) and \(\eta = o(\varepsilon )\), with \(\frac{o(t)}{t} \rightarrow 0\) as \(t \rightarrow 0\).
As motivated above, given noisy observations g, we are interested in studying the influence of random errors and other data transformation effects in the image data on the reconstruction of edges in the image. The degradation model underlying our study is given by
where L is a linear continuous operator from \(L^2(\varOmega )\) to \(L^2(\varOmega )\), i.e., \(L \in \mathcal {L}(L^2(\varOmega ))\). Note that L may model image blur, subsampling, or Fourier and wavelet transforms, to mention only a few. Moreover, \(\xi \in L^2(\varOmega )\) is an oscillating map with zero mean. From a stochastic point of view, L and \(\xi \) constitute realizations of stochastic processes. We also assume that L does not annihilate constant functions, i.e., \(L 1 \ne 0\).
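The degradation model can be simulated directly. The following 1-D sketch (our own illustration; the choice of a moving-average blur for L, the noise level, and all names are assumptions) produces data \(g = L g_0 + \xi \) from a piecewise-constant true image:

```python
import random

random.seed(1)
n = 64

# true piecewise-constant image g0 with a single edge
g0 = [0.0] * (n // 2) + [1.0] * (n - n // 2)

def blur(u, radius=2):
    """A simple moving-average blur: one concrete choice for the linear
    operator L (which could instead model subsampling or a transform).
    It maps constants to the same constants, so L1 != 0."""
    out = []
    for i in range(len(u)):
        window = u[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

# zero-mean oscillating noise realization xi
xi = [random.gauss(0.0, 0.1) for _ in range(n)]
xi_mean = sum(xi) / n
xi = [v - xi_mean for v in xi]        # enforce zero mean exactly

# observed data g = L g0 + xi
g = [a + b for a, b in zip(blur(g0), xi)]
```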
Since \(\varGamma \) in (MS) is a set, a mathematically meaningful notion of uncertainty propagation of this object is not immediately clear. In view of this, the Ambrosio–Tortorelli formulation offers a transformation of this geometric feature to a functional variable which is more amenable to the study of the propagation of uncertainty. Taking account of the additional data transformation operator L in our model, \(AT^g_\varepsilon \) becomes
which constitutes the segmentation model under investigation in this paper.
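In explicit form, the functional presumably reads as follows (our reconstruction from the term-by-term description above, since the displayed formula is not reproduced here; the placement of the weights \(\alpha , \beta , \eta \) is an assumption, chosen consistently with the constants \(\max (1+\eta ,\beta )\) and \(\beta \) appearing in Sect. 2):

```latex
AT^g_\varepsilon (u,z) = \int _\varOmega z^2\, |\nabla u|^2 \,\mathrm {d}x
+ \alpha \varepsilon \int _\varOmega |\nabla z|^2 \,\mathrm {d}x
+ \frac{\alpha }{4\varepsilon } \int _\varOmega (1 - z)^2 \,\mathrm {d}x
+ \beta \int _\varOmega (L u - g)^2 \,\mathrm {d}x
+ \eta \int _\varOmega |\nabla u|^2 \,\mathrm {d}x .
```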
The rest of the paper is organized as follows. In Sect. 2, we provide a collection of important results on the Ambrosio–Tortorelli functional concerning properties of the functional as well as first-order conditions for its minimizers. These will be used in Sect. 3 to give a mathematical justification of random processed images and edge indicators in the sense of measurable selections based on the theorem of Kuratowski–Ryll-Nardzewski. In Sect. 4, we briefly discuss the limitations of popular techniques in uncertainty quantification before we turn our attention to sampling-based methods. We conclude our work in Sect. 5 with numerical examples.
2 Properties of the Deterministic Problem
Now we collect several properties of the Ambrosio–Tortorelli functional and its minimizers. In order to reduce technicalities in the subsequent proofs, we establish the following norm equivalence. Below, \(\Vert \cdot \Vert _{L^2(\varOmega )}\) denotes the usual \(L^2(\varOmega )\)-norm on a bounded domain \(\varOmega \subset \mathbb {R}^d\) and \(H^1(\varOmega )\) is the Sobolev space of functions in \(L^2(\varOmega )\) with generalized gradient in \(L^2(\varOmega )^d\); see, e.g., [3].
Lemma 1
Let \(L \in \mathcal {L}(L^2(\varOmega ))\) with \(L 1 \ne 0\). Then the norm \({\vert \vert \vert }\cdot {\vert \vert \vert }\) given by \({\vert \vert \vert }u{\vert \vert \vert }^2 := \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \Vert L u\Vert _{L^2(\varOmega )}^2\)
and the usual \(H^1\)-norm, i.e., \(\Vert u\Vert _{H^1(\varOmega )}^2 = \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \Vert u\Vert _{L^2(\varOmega )}^2\), are equivalent on \(H^1(\varOmega )\).
Proof
Let \(u \in H^1(\varOmega )\) be chosen arbitrarily. At first, we see
and hence \({\vert \vert \vert }u {\vert \vert \vert }\le \max \left( 1, \Vert L\Vert _{\mathcal {L}(L^2)} \right) \Vert u\Vert _{H^1(\varOmega )}\), where \(\Vert \cdot \Vert _{\mathcal {L}(L^2)}\) denotes the operator norm for linear bounded operators from \(L^2(\varOmega )\) to \(L^2(\varOmega )\).
To prove the reverse inequality, we define the quantity \(m := \frac{1}{\lambda ^d(\varOmega )} \int _\varOmega u \,\mathrm {d}x\in \mathbb {R}\), where \(\lambda ^d\) denotes the d-dimensional Lebesgue measure, and let \(C_P > 0\) be the Poincaré constant, which is the (smallest) real number fulfilling the Poincaré inequality \(\Vert u - m\Vert _{L^2(\varOmega )} \le C_P \Vert \nabla u\Vert _{L^2(\varOmega )}\);
see, e.g., [2, Corollary 5.4.1]. Using the above, we find
which completes the proof. \(\square \)
From now on, we equip the Sobolev space \(H^1(\varOmega )\) for the processed image u with the new norm \({\left| \left| \left| \ \cdot \ \right| \right| \right| }\), whereas the norm of the z-component remains unchanged. Next we consider continuity properties of the Ambrosio–Tortorelli functional.
Theorem 2
The mapping \(AT^g_\varepsilon : {H^1(\varOmega )}\times K \rightarrow \mathbb {R}\) is weakly lower semi-continuous.
Proof
Let \((u_n,z_n) \rightharpoonup (u,z)\) be a weakly convergent sequence in \({H^1(\varOmega )}\times K\). Then there exists a subsequence (not relabeled) such that the limit inferior is attained and \(z_n \rightarrow z\) additionally converges pointwise almost everywhere (Riesz–Fischer). Using Lebesgue’s dominated convergence theorem, we obtain \(z_n \nabla u_n \rightharpoonup z \nabla u\) in \(L^2(\varOmega ;\mathbb {R}^d)\). Eventually, we obtain by the weak lower semi-continuity of the \(L^2\)-norm that
which proves the lower-semicontinuity of \(AT^g_\varepsilon \). \(\square \)
Using the direct method of calculus of variations in combination with Lemma 1 and Theorem 2, it is not difficult to derive the existence of a solution of (AT). Due to the nonconvexity of \(AT_\varepsilon ^g\), uniqueness cannot be expected in general.
For the sake of characterizing critical points, we turn our attention to necessary first-order optimality conditions. The derivation of a corresponding system is challenged by the nonsmoothness of \(AT^g_\varepsilon \) with respect to the \({H^1(\varOmega )}\times {H^1(\varOmega )}\)-topology together with the presence of the constraint set K. We address these issues by using a truncation argument. In fact, for all \(u, z \in H^1(\varOmega )\) with \(z \in L^\infty (\varOmega )\) it holds that \(AT^g_\varepsilon (u,z) < \infty \) as well as
The constraint set is then substituted by \(z \in X\) with \(X:= {H^1(\varOmega )}\cap L^\infty (\varOmega )\) together with the norm \(\Vert \cdot \Vert _X^2 := \Vert \cdot \Vert _{{H^1(\varOmega )}}^2 + \Vert \cdot \Vert _{L^\infty (\varOmega )}^2\). Consequently, we obtain
and the respective sets of minimizers coincide. The advantage of the right-hand side formulation in (1) lies in the fact that the Ambrosio–Tortorelli functional is continuously Fréchet differentiable on \(H^1(\varOmega ) \times X\) with derivative
Here, \(\langle \cdot , \cdot \rangle \) denotes the dual pairing between \({H^1(\varOmega )}\times X\) and its dual. Then the stationarity system associated with the problem on \(H^1(\varOmega ) \times X\) in (1) in its weak form reads
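Presumably (our reconstruction, obtained by formally differentiating the Ambrosio–Tortorelli functional; the precise scaling of the weights \(\alpha , \beta , \eta , \varepsilon \) is an assumption) the system consists of the two variational equations

```latex
\begin{aligned}
&\int _\varOmega (z^2 + \eta )\, \nabla u \cdot \nabla \varphi \,\mathrm {d}x
  + \beta \int _\varOmega (L u - g)\, L\varphi \,\mathrm {d}x = 0, \\
&\int _\varOmega z\, |\nabla u|^2\, \psi \,\mathrm {d}x
  + \alpha \varepsilon \int _\varOmega \nabla z \cdot \nabla \psi \,\mathrm {d}x
  - \frac{\alpha }{4\varepsilon } \int _\varOmega (1 - z)\, \psi \,\mathrm {d}x = 0,
\end{aligned}
```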
for all \(\varphi \in {H^1(\varOmega )}\) and \(\psi \in X\). Utilizing coercivity properties of the involved inner products, one readily obtains the a priori bounds
Moreover, we deduce \(z \in K\), which results from (2b) by testing with \(\psi = -(z-1)^+ + (-z)^+\) with \((\cdot )^+=\max \{0,\cdot \}\) in a pointwise sense; see also [14, Prop. 1.3].
After clarifying existence of solutions and first-order necessary conditions, we next draw our attention to the following convergence result.
Theorem 3
Suppose there exists a bounded sequence \(((u_j,z_j))_{j\in \mathbb {N}} \subset {H^1(\varOmega )}\times X\) such that
Further, let \((\mu _j,\nu _j) \rightarrow 0\) and \(z_j \in K\) for all \(j\in \mathbb {N}\). Then there exist a point \((u^*,z^*)\) and a subsequence such that
Proof
The result follows from a straightforward adaptation of the proof in [14, Theorem 4.1]. \(\square \)
The above result is useful when studying properties of solution sets of (2) and for the convergence analysis of a splitting-type algorithm in function space for iteratively solving (2). We continue here by studying such a splitting method and postpone the former aspect to Sect. 3.
We note that the algorithm splits the overall nonconvex minimization problem into the iterative solution of two subsequent subproblems. Indeed, within each iteration, two convex quadratic minimization problems need to be solved, each of which is equivalent to solving a linear partial differential equation (PDE). The convergence of this iteration scheme can then be obtained by applying a result due to [23] concerning the decay of the distance of two consecutive image iterates.
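To make the splitting concrete, here is a minimal, self-contained 1-D discretization for the denoising case \(L = \mathrm {id}\) (our own illustrative sketch; the parameter values, the discretization, and the explicit form of the discrete functional are assumptions, not the paper's). Each half-step minimizes a convex quadratic, i.e., solves a symmetric tridiagonal linear system:

```python
import random

# --- parameters (illustrative choices, not the paper's values) ---
n, alpha, beta, eps, eta, tol = 50, 0.1, 4.0, 0.1, 1e-3, 1e-6

def solve_tridiag(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal system; sub[0], sup[-1] are ignored."""
    m = len(diag)
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / denom if i < m - 1 else 0.0
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / denom
    x = [0.0] * m
    x[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# noisy 1-D step image g
random.seed(0)
g = [(0.0 if i < n // 2 else 1.0) + random.gauss(0.0, 0.02) for i in range(n)]

u, z = g[:], [1.0] * n
for _ in range(100):
    u_old = u[:]
    # u-step: minimize sum (z_i^2+eta)(u_{i+1}-u_i)^2 + beta*sum (u_i-g_i)^2
    w = [z[i] ** 2 + eta for i in range(n - 1)]
    diag = [beta + (w[i] if i < n - 1 else 0.0) + (w[i - 1] if i > 0 else 0.0)
            for i in range(n)]
    off = [-w[i] for i in range(n - 1)]
    u = solve_tridiag([0.0] + off, diag, off + [0.0], [beta * gi for gi in g])
    # z-step: minimize sum d_i^2 z_i^2 + alpha*eps*sum (z_{i+1}-z_i)^2
    #         + alpha/(4 eps)*sum (1-z_i)^2, with d_i = u_{i+1}-u_i
    d2 = [(u[i + 1] - u[i]) ** 2 for i in range(n - 1)] + [0.0]
    c = alpha / (4.0 * eps)
    diag = [d2[i] + c + alpha * eps * ((i > 0) + (i < n - 1)) for i in range(n)]
    off = [-alpha * eps] * (n - 1)
    z = solve_tridiag([0.0] + off, diag, off + [0.0], [c] * n)
    # stop once consecutive image iterates are close
    if max(abs(a - b) for a, b in zip(u, u_old)) < tol:
        break
```

After convergence, the edge indicator z dips toward 0 near the jump of g and stays near 1 elsewhere, while u is a piecewise nearly constant restoration.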
Corollary 4
Let \((u_j,z_j)_{j\in \mathbb {N}}\) be the sequence generated by the splitting algorithm. Then there exists a stationary point \((u^*,z^*) \in {H^1(\varOmega )}\times X\) and a subsequence such that
Proof
Adapting [23, Lemma 2.3.9 and Theorem 2.3.10], we obtain
Since \(u_{j+1} ={\text {argmin}}_{u \in {H^1(\varOmega )}} AT^g_\varepsilon (u,z_j)\) it holds that
Hence, we obtain for every \(\varphi \in {H^1(\varOmega )}\) that
Since \(b(u_j,z_j;\psi ) = 0\) for all \(\psi \in X\), we see that the conditions of Theorem 3 are fulfilled with \(\mu _j = \max (1+\eta ,\beta ){\vert \vert \vert }u_{j+1} - u_j {\vert \vert \vert }\) and \(\nu _j = 0\). Hence, the assertion follows. \(\square \)
We point out that the last estimate in (5), leading to \(\max (1+\eta , \beta ){\vert \vert \vert }u_{j+1} - u_j{\vert \vert \vert }\le \mathrm {tol}\) for some tolerance \(\mathrm {tol}>0\), can also be used as a numerically useful stopping criterion for the splitting algorithm.
With these results at hand, we are now able to derive finer properties of the dependence of the solutions of the minimization problem and the first-order system on the given image data, respectively.
3 Existence of Measurable Selections
Next we introduce the concept of uncertain edges. As motivated earlier, we assume that the real image is inaccessible due to corruption by (random) noise and a deterministic degradation operator. The resulting data are therefore modeled by a random variable, which we denote in the following again by g for the sake of readability. Rather than addressing the issue of finding a reconstruction of the image along with an edge set, we seek to study the propagation of noise into the solution.
3.1 Measurable Selections for Ambrosio–Tortorelli
For the reasons mentioned above, we focus on the Ambrosio–Tortorelli approximation (AT) of the Mumford–Shah energy. The main issue now is to establish the existence of random variables of solutions as well as their characterization. For this purpose, we introduce the following set-valued solution operators \(\mathcal {S}_{\min } , \mathcal {S}_\mathrm {stat}: {L^2(\varOmega )}\rightrightarrows {H^1(\varOmega ;\mathbb {R}^2)}\) given by
We start by providing structural properties of these operators.
Theorem 5
Let \(\mathcal {S}\in \{\mathcal {S}_{\min },\mathcal {S}_\mathrm {stat}\}\). Then the following properties hold true:
-
(i)
\(\mathcal {S}\) has nonempty and compact values.
-
(ii)
For all closed sets \(C \subset {H^1(\varOmega )}\times K\), it holds that \(\mathcal {S}^{-1}(C) \subset {L^2(\varOmega )}\) is closed,
where we use \(\mathcal {S}^{-1}(C):= \{g \in {L^2(\varOmega )}: \mathcal {S}(g) \cap C \ne \emptyset \}\).
Proof
We split the proof into several steps. Below \(C \subset H^1(\varOmega ) \times K\) always denotes a closed set.
Step 1 (Nonemptiness). Since we know that the Ambrosio–Tortorelli problem has a minimizer for every \(g \in {L^2(\varOmega )}\), it holds that \(\mathcal {S}_{\min }(g) \ne \emptyset \). Since every minimizer fulfils the stationarity system, \(\mathcal {S}_\mathrm {stat}(g) \ne \emptyset \) follows, as well.
Step 2 (\(\mathcal {S}_{\min }(g)\) is closed). Let \(m := \min _{(u,z) \in {H^1(\varOmega )}\times K} AT^g_\varepsilon (u,z)\). Then we rewrite
Since according to Theorem 2 the Ambrosio–Tortorelli functional is (weakly) l.s.c. on \({H^1(\varOmega )}\times K\) with respect to the \({H^1(\varOmega ;\mathbb {R}^2)}\)-topology, we find that \(\mathcal {S}_{\min }(g)\) is closed.
Step 3 (\(\mathcal {S}_\mathrm {stat}(g)\) and \(\mathcal {S}_{\min }(g)\) are compact). Let \((u_n,z_n)_{n\in \mathbb {N}} \subset \mathcal {S}_\mathrm {stat}(g)\) be a sequence of stationary points. Then this sequence is bounded in \({H^1(\varOmega )}\times X\) by the a priori estimate (3), and \(z_n \in K\) holds true, as well. We apply Theorem 3 with \(\mu _j = \nu _j = 0\) and get the existence of a stationary point together with a strongly \({H^1(\varOmega ;\mathbb {R}^2)}\)-convergent subsequence. Thus, the compactness of \(\mathcal {S}_\mathrm {stat}(g)\) as well as the compactness of \(\mathcal {S}_{\min }(g)\) (using Step 2) are established.
Step 4 (Proof of (ii) for \(\mathcal {S}_\mathrm {stat}\)). Let \((g_n)_{n\in \mathbb {N}} \subset \mathcal {S}_\mathrm {stat}^{-1}(C)\) be a (strongly) \({L^2(\varOmega )}\)-convergent sequence with limit \(g \in {L^2(\varOmega )}\). We show \(g \in \mathcal {S}_\mathrm {stat}^{-1}(C)\). For this purpose, let \((u_n,z_n)_{n\in \mathbb {N}} \subset C\) be a sequence of stationary points for \(g_n\), \(n\in \mathbb {N}\). Then, \(0 \le z_n \le 1\) holds almost everywhere and
for every \(n\in \mathbb {N}\). Again, we get the boundedness of \(((u_n,z_n))_{n \in \mathbb {N}}\) in \(H^1(\varOmega ) \times X\) by the uniform boundedness of \(\Vert g_n\Vert _{L^2(\varOmega )}\), \(n\in \mathbb {N}\), combined with the a priori bound (3). Moreover, we find
Since \(g_n \rightarrow g\) in \({L^2(\varOmega )}\), the conditions of Theorem 3 are fulfilled for g with \(\mu _n = \beta \Vert g_n - g\Vert _{{L^2(\varOmega )}}\) and \(\nu _n = 0\). Hence, there exists a stationary point \((u^*,z^*)\) together with a strongly \({H^1(\varOmega ;\mathbb {R}^2)}\)-convergent subsequence of \((u_n,z_n)_{n\in \mathbb {N}}\). Since \((u_n,z_n) \subset C\) for all \(n\in \mathbb {N}\) and C is closed, we obtain \((u^*,z^*) \in C\). Therefore, \(\mathcal {S}_\mathrm {stat}(g) \cap C \ne \emptyset \) and, hence, \(g \in \mathcal {S}_\mathrm {stat}^{-1}(C)\).
Step 5 (Proof of (ii) for \(\mathcal {S}_{\min }\)). Let \((g_n)_{n\in \mathbb {N}} \subset \mathcal {S}_{\min }^{-1}(C)\) be a strongly \({L^2(\varOmega )}\)-convergent sequence with limit \(g \in {L^2(\varOmega )}\). We show \(g \in \mathcal {S}_{\min }^{-1}(C)\). Take \((u_n,z_n)_{n\in \mathbb {N}} \subset C\) as a corresponding sequence of minimizers for \(g_n\), \(n\in \mathbb {N}\). For all \(n \in \mathbb {N}\) and all \((u,z) \in {H^1(\varOmega )}\times K\) it thus holds that
Since every minimizer is a stationary point, we obtain from the arguments of step 4 a stationary point \((u^*,z^*) \in C\) and a strongly \({H^1(\varOmega ;\mathbb {R}^2)}\)-convergent subsequence \((u_{n_k},z_{n_k})_{k\in \mathbb {N}}\). Then \((u^*,z^*)\) is also a minimizer. In fact, we have
Since (u, z) was arbitrary, we conclude that \((u^*,z^*)\) is also a minimizer for g. Hence \(\mathcal {S}_{\min }^{-1}(C)\) is closed. This completes the proof. \(\square \)
The aim is to construct a random variable whose values are solutions of (AT) for the given image and for almost every realization of the noise. This quantity then defines an associated reconstructed image and edge indicator, respectively. For this purpose, we employ the Kuratowski–Ryll-Nardzewski measurable selection theorem, which we state here for ease of reference, referring the interested reader to [13, Theorem 6.6.7] for a proof.
Theorem 6
(Kuratowski and Ryll-Nardzewski). Let Y be a Polish space, \((\mathcal {X},\mathcal {A})\) a measurable space and \(\varPsi : {\mathcal {X}} \rightrightarrows Y\) a set-valued operator such that
-
(i)
\(\varPsi ({x})\) is nonempty and closed for all \(x \in \mathcal {X}\), and
-
(ii)
for all \(U \subset Y\) open, it holds that \(\varPsi ^{-1}(U) = \{{x \in \mathcal {X}} : \varPsi ({x}) \cap U \ne \emptyset \} \in {\mathcal {A}}\).
Then there exists a measurable selection, i.e., a mapping \(\varPsi _\mathrm {sel}: {\mathcal {X}} \rightarrow Y\) such that \(\varPsi _\mathrm {sel}({x}) \in \varPsi ({x})\) for all \(x \in \mathcal {X}\).
Given the notation of Theorem 6, we note that the fulfilment of the condition \(\varPsi ^{-1}(C) \in {\mathcal {A}}\) for all closed subsets \(C \subset Y\) implies assumption (ii). Indeed, let \(U \subset Y\) be an arbitrary open set and denote by \(U^c\) the complement of U in Y. Rewriting U gives \(U = \bigcup _{n \in \mathbb {N}} C_n\) with \(C_n := \{y \in Y: {\text {dist}}(y,U^c) \ge \tfrac{1}{n}\} = ({\text {dist}}(\ \cdot \ ,U^c))^{-1} ([\tfrac{1}{n},+\infty ))\) as a closed set for every \(n \in \mathbb {N}\). Then, we obtain \(\varPsi ^{-1}(U) = \bigcup _{n \in \mathbb {N}} \varPsi ^{-1}(C_n)\).
Since \(\varPsi ^{-1}(C_n) \in {\mathcal {A}}\) by assumption, we have \(\varPsi ^{-1}(U) \in {\mathcal {A}}\).
Based on this observation, we are now able to prove the existence of measurable selections for the operators \(\mathcal {S}_{\min }\) and \(\mathcal {S}_\mathrm {stat}\).
Corollary 7
Let \(\mathcal {S}\in \{\mathcal {S}_{\min }, \mathcal {S}_\mathrm {stat}\}\). Then there exist Borel functions \(u_\mathrm {sel}: L^2(\varOmega ) \rightarrow H^1(\varOmega )\) and \(z_\mathrm {sel}: L^2(\varOmega ) \rightarrow K\) such that \((u_\mathrm {sel}(g), z_\mathrm {sel}(g)) \in \mathcal {S}(g)\) for all \(g \in L^2(\varOmega )\).
Proof
Let \(\varPsi \in \{\mathcal {S}_{\min }, \mathcal {S}_\mathrm {stat}\}\) and \({\mathcal {X}} = L^2(\varOmega )\), let \({\mathcal {A}} = \mathcal {B}(L^2(\varOmega ))\) be the Borel \(\sigma \)-algebra, and set \(Y = H^1(\varOmega ) \times K\). Since \(H^1(\varOmega ;\mathbb {R}^2)\) is a separable Hilbert space, it is a Polish space. As closed subsets of Polish spaces are Polish as well (see [40, Proposition A.1] or use [1, Corollary 3.5]), also \(Y \subset H^1(\varOmega )^2\) is a Polish space, and hence the assumptions of Theorem 6 are fulfilled thanks to Theorem 5 and the observation above. \(\square \)
3.2 Measurable Selections for Mumford–Shah
Due to the analytical difficulties concerning the edge set as a geometric quantity, we have so far drawn our attention to the Ambrosio–Tortorelli model. In this subsection, we temporarily return to the original Mumford–Shah problem (MS) and derive a result analogous to that of the previous subsection, albeit for the processed image only. Here, we focus on the denoising case, i.e., \(L = \mathrm {id}_{L^2(\varOmega )}\). For this sake, we first discuss a reformulation of the Mumford–Shah problem (MS) based on special functions of bounded variation.
For this purpose, we recall that the space of functions of bounded variation \(BV(\varOmega )\) is defined as the set of \(L^1\)-functions with bounded total variation. The latter quantity is given by \(|Du|(\varOmega ) := \sup \left\{ \int _\varOmega u \, {\text {div}}\, \varphi \,\mathrm {d}x : \varphi \in C_c^1(\varOmega ;\mathbb {R}^d), \ \Vert \varphi \Vert _{L^\infty (\varOmega )} \le 1 \right\} .\)
Accordingly, we have \(BV(\varOmega ) = \{ u \in L^1(\varOmega ) : |Du|(\varOmega ) < +\infty \}\). Equipped with the norm \(\Vert u\Vert _{BV} = \Vert u\Vert _{L^1(\varOmega )} + |Du|(\varOmega )\), the space \(BV(\varOmega )\) is a Banach space. The distributional derivative Du of \(u \in BV(\varOmega )\) is a Radon measure which can be decomposed according to \(Du = \nabla u \, \lambda ^d + (u^+ - u^-)\, \nu _u \, \mathcal {H}^{1} \llcorner S_u + D^c u\), where \(\nu _u\) denotes the measure-theoretic normal to the jump set.
Here, the first term is the part of Du that is absolutely continuous with respect to the Lebesgue measure with \(\nabla u\) denoting its density. The terms \(u^+, u^-\) are the so-called upper, respectively, lower approximate limits of u and \(S_u := \{x \in \varOmega : u^+(x) > u^-(x)\}\) represents the jump set. The second term denotes the part of the measure that is absolutely continuous with respect to the Hausdorff measure restricted to \(S_u\), which we denote here as \(\mathcal {H}^{1} \llcorner S_u\). The remaining part is singular with respect to both the Lebesgue and the Hausdorff measure and is referred to as the Cantor part of Du. For the definition of approximate limits as well as further details the reader is referred to [2, Section 10], [21, Section 5] as well as to the monograph [6].
The space of special functions of bounded variation is then defined as \(SBV(\varOmega ) := \{ u \in BV(\varOmega ) : D^c u = 0 \},\)
which is the subspace of BV-functions with vanishing Cantor part.
Using the above notation, it is possible to provide the following equivalent reformulation of the Mumford–Shah problem, originally proposed by [20], as a problem with the processed image contained in the SBV-space, i.e.,
We note that a solution must fulfil \(u \in L^2(\varOmega )\) and \(\nabla u \in L^2(\varOmega ;\mathbb {R}^d)\) for the value of the functional to be finite. Observe further that this formulation has the advantage that the edge set does not enter as an explicit variable, but it is rather encoded in the function u through a jump. On the other hand, a potential drawback of this approach is given by the fact that now u is contained in a space that is neither reflexive nor separable. For details on the equivalence of the two formulations as well as the existence proof for \(g \in L^\infty (\varOmega )\) the reader is referred to [20] or to [6, Section 7].
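With the jump set \(S_u\) taking over the role of \(\varGamma \), the SBV formulation presumably reads as follows (our reconstruction, since the displayed formula is not reproduced here; the placement of the weights \(\alpha \) and \(\beta \) is an assumption):

```latex
\min _{u \in SBV(\varOmega )} \; MS_{SBV}^g(u) := \int _\varOmega |\nabla u|^2 \,\mathrm {d}x
+ \alpha \, \mathcal {H}^{1}(S_u)
+ \beta \int _\varOmega (u - g)^2 \,\mathrm {d}x .
```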
For (MS-SBV), we are now able to derive an analogous version of Theorem 5 when we consider the solution operator as a mapping from \(L^\infty (\varOmega )\) to \(L^2(\varOmega )\). Let us briefly discuss the requirement on g. In fact, from a practical viewpoint, an image signal is the output of a technical measurement process producing data in a fixed and finite range. Therefore the image signal will typically be essentially bounded. From the mathematical viewpoint, we seek to apply a compactness result by Ambrosio addressing SBV functions as well as Theorem 6. In order to do so, we need bounded sequences in \(L^\infty (\varOmega )\) and a separable solution space \(L^2(\varOmega ) \supset SBV(\varOmega )\).
Theorem 8
Let \(T:L^\infty (\varOmega ) \rightrightarrows L^2(\varOmega )\) be the solution mapping of the Mumford–Shah problem (MS-SBV), i.e.,
Then the following properties hold:
-
(i)
T has nonempty and compact values.
-
(ii)
For all closed sets \(C \subset L^2(\varOmega )\), it holds that \(T^{-1}(C) \subset L^2(\varOmega )\) is closed.
Before we provide the proof, we also state that there exists a measurable selection of minimizers, i.e., a Borel function \(u_\mathrm {sel}:L^\infty (\varOmega ) \rightarrow L^2(\varOmega )\) with \(u_\mathrm {sel}(g) \in T(g)\) for all \(g \in L^\infty (\varOmega )\).
Proof
Let \(g \in L^\infty (\varOmega )\).
Step 1 (The set \(T(g) \subset L^2(\varOmega )\) is compact). We know that the Mumford–Shah problem has a solution (cf. [9, Example 5.2]). Hence \(T(g) \ne \emptyset \). Let \((u_n)_{n \in \mathbb {N}} \subset T(g) \subset SBV(\varOmega )\) be a sequence of solutions. By the same truncation argument as in [9], we obtain \(\Vert u_n\Vert _{L^\infty (\varOmega )} \le \Vert g\Vert _{L^\infty (\varOmega )}\) for all \(n \in \mathbb {N}\). Thus, \((u_n)_{n \in \mathbb {N}}\) is uniformly bounded and by using
we obtain that the conditions of the compactness result of Ambrosio [8] are fulfilled. Hence, there exist a subsequence \((u_{n_k})_{k \in \mathbb {N}}\) as well as \(u \in SBV(\varOmega ) \cap L^\infty (\varOmega )\) such that \(u_{n_k} \rightarrow u\) almost everywhere and the gradients \(\nabla u_{n_k} \rightharpoonup \nabla u\) converge weakly in \(L^2(\varOmega ; \mathbb {R}^d)\). Due to the \(L^\infty \)-boundedness, we obtain \(u_{n_k} \rightarrow u\) in \(L^2(\varOmega )\) by dominated convergence. Moreover, we get by the lower semi-continuity of the norm and the second part of Ambrosio’s compactness result that
and eventually \(u \in T(g)\). This proves the compactness of T(g) in \(L^2(\varOmega )\).
Step 2 (\(T^{-1}(C) \subset L^\infty (\varOmega )\) is closed for all closed subsets \(C \subset L^2(\varOmega )\)). Let \(C \subset L^2(\varOmega )\) be closed. We choose an arbitrary sequence \((g_n)_{n \in \mathbb {N}} \subset T^{-1}(C)\) convergent in \(L^\infty (\varOmega )\) towards \(g \in L^\infty (\varOmega )\). Then for all \(n \in \mathbb {N}\) there exists \(u_n \in T(g_n)\) that minimizes the corresponding Mumford–Shah functional. It thus holds that \(MS_{SBV}^{g_n}(u_n) \le MS_{SBV}^{g_n}(v)\) for all \(v \in SBV(\varOmega )\). By plugging in \(v = 0\) and using the uniform boundedness of \(g_n\) in \(L^\infty \), we recognize that the conditions of Ambrosio’s compactness result are fulfilled, and therefore a subsequence \((u_{n_k})_{k \in \mathbb {N}}\) as well as a limit u exist. By the above arguments, we see \(u_{n_k} \rightarrow u\) in \(L^2(\varOmega )\) and by the closedness assumption also \(u \in C\). Together with the convergence of \((g_{n_k})_{k \in \mathbb {N}}\), we find analogously to Step 1 that
for all \(v \in SBV(\varOmega )\). Thus, \(u \in T(g) \cap C\), and therefore \(T^{-1}(C)\) is a closed subset of \(L^\infty (\varOmega )\). We obtain the existence of a measurable selection again by the theorem of Kuratowski–Ryll-Nardzewski. \(\square \)
While the above result provides some theoretical insight into the behavior of the computed image, it does not indicate how to treat the jump sets, i.e., the edge sets. We therefore return to the Ambrosio–Tortorelli approximation.
So far, we have discussed measurable selections of operators on function spaces with respect to their topologies and the induced Borel algebras. We now consider an abstract probability space \((\Xi ,\mathcal {F},\mathbb {P})\) together with a random variable \(g:\Xi \rightarrow L^2(\varOmega )\), measurable with respect to \(\Xi \) equipped with the \(\sigma \)-algebra \(\mathcal {F}\) and \(L^2(\varOmega )\) equipped with the Borel algebra \(\mathcal {B}(L^2(\varOmega ))\). If we now consider a measurable selection \(g \mapsto (u_\mathrm {sel}(g),z_\mathrm {sel}(g))\) of minimizers (stationary points), then the composition \(\xi \mapsto (u_\mathrm {sel}(g(\xi )),z_\mathrm {sel}(g(\xi )))\) is a random variable of minimizers (stationary points). We refer to the respective components simply as \(\xi \mapsto (u(\xi ), z(\xi ))\).
3.3 Characterizing Measurable Selections for Ambrosio–Tortorelli
After establishing the existence of measurable selections, the next step is to derive a variational characterization, i.e., we seek to interpret selections of stationary points as solutions of a stochastic partial differential equation (SPDE) and, respectively, selections of minimizers as minimizers of an optimization problem. For this purpose, we need to introduce elements of Bochner space theory.
First we equip the (abstract) measurable space \((\Xi , \mathcal {F})\) with a probability measure \(\mathbb {P}\). The resulting measure space induces a Lebesgue integral for \(\mathbb {R}\)-valued random variables denoted by \(\mathbb {E}\left[ {Z}\right] := \int _\Xi Z \,\mathrm {d}\mathbb {P},\)
provided the argument is Lebesgue integrable. For a measurable set \(M \in \mathcal {F}\), we define the characteristic function as
and remark that \(\mathbb {E}\left[ {\mathbb {1}_M}\right] = \mathbb {P}[M]\) holds. Let now V be a separable Banach space and \(p \in [1, +\infty ]\). The space of Bochner p-integrable functions is denoted by \(L^p(\Xi , \mathcal {F}, \mathbb {P}; V) := \{ v :\Xi \rightarrow V \text { strongly measurable}: \mathbb {E}[\Vert v\Vert _V^p] < \infty \}\) (with the usual essential-supremum modification for \(p = \infty \)).
Since we do not change \((\Xi , \mathcal {F})\), we also denote the above space by \(L^p_\mathbb {P}(V)\). Equipped with the norm \(\Vert v\Vert _{L^p_{\mathbb {P}}(V)} := \mathbb {E}[\Vert v\Vert _V^p]^{\frac{1}{p}}\) it is even a Banach space, and in the case of \(p = 2\) and V being a Hilbert space, it is a Hilbert space, as well. For all \(v \in L^p_\mathbb {P}(V)\) the Bochner integral
exists and is referred to as the expectation. Since the Lebesgue integral is just a special case of the Bochner integral with \(V = \mathbb {R}\), there is no risk of confusion. For an overview on Bochner spaces, the interested reader is referred to, e.g., [28, Section 1]. For \(V = {H^1(\varOmega )}\), we also write \({H_\mathbb {P}}:= {L^{2}_\mathbb {P}}({H^1(\varOmega )})\) as well as
where we write \(\lambda ^d \otimes \mathbb {P}\) for the product measure on \(\varOmega \times \Xi \), equipped with its product \(\sigma \)-algebra \(\mathcal {B}(\varOmega ) \otimes \mathcal {F}\). The spaces are equipped with the norms
respectively. In any case, for our random edge indicator \(z \in {X_\mathbb {P}}\) we obtain that \(0 \le z \le 1\) holds \(\lambda ^d\otimes \mathbb {P}\)-almost everywhere, which in combination with the a priori estimate (3b) yields boundedness in \({X_\mathbb {P}}\) and \(L^\infty _\mathbb {P}(H^1(\varOmega ))\). If we, moreover, assume that \(g \in {L^{p}_\mathbb {P}}(L^2(\varOmega ))\), then also \(u \in {L^{p}_\mathbb {P}}(H^1(\varOmega ))\). From now on, let \(g \in {L^{2}_\mathbb {P}}(L^2(\varOmega ))\).
Our first target is the SPDE characterization. For this purpose, we follow the same notational convention as for the image g and denote by (u, z) a random variable of stationary points. Choosing arbitrary \(\varphi \in {H_\mathbb {P}}\) and \(\psi \in {X_\mathbb {P}}\), we prove the measurability of \(\xi \mapsto a^{g(\xi )}(u(\xi ),z(\xi );\varphi (\xi ))\) as well as of \(\xi \mapsto b(u(\xi ),z(\xi );\psi (\xi ))\). The proof is only carried out for the latter, as the proof for the former is analogous.
Indeed, for \(c \in \mathbb {R}\) we consider the preimage of \((-\infty , c]\) under b, which is
where \(R>0\) and \(B_R^{L^\infty }(0) := \{ \tilde{\psi }\in X: \Vert \tilde{\psi }\Vert _{L^\infty } \le R\}\). We seek to show the closedness of B in \(H^1(\varOmega ;\mathbb {R}^3)\). Let \((u_n',z_n',\psi _n') \rightarrow (u',z',\psi ')\) converge in \(H^1(\varOmega ;\mathbb {R}^3)\). By dominated convergence, there exists a subsequence (not relabeled) such that \(z_n'\) and \(\psi _n'\) converge pointwise almost everywhere. It also holds that
By the strong convergence of \(\nabla u_n'\) the first part tends to zero, and by the pointwise convergence of \(z_n' \psi _n'\) and \(|\nabla u'|^2 |z_n' \psi _n' - z' \psi '| \le 2R |\nabla u'|^2\) the second part approaches zero by dominated convergence, as well. Hence we obtain
yielding \((u',z',\psi ') \in B\). By the \(H^1\)-measurability of u, z and \(\psi \), we get with \(R := \Vert \psi \Vert _{X_\mathbb {P}}\) that
holds for all \(c \in \mathbb {R}\). This implies the measurability of \(\xi \mapsto b(u(\xi ), z(\xi ); \psi (\xi ))\). Analogously, we obtain the measurability of \(a^{g(\cdot )}(u(\,\cdot \,),z(\,\cdot \,);\varphi (\,\cdot \,))\).
Using the Cauchy–Schwarz inequality, we obtain for \(u,\varphi \in {H_\mathbb {P}}\) and \(z,\psi \in {X_\mathbb {P}}\)
Hence, every measurable selection of stationary points fulfils the SPDE-system (in weak form)
Consequently, every measurable selection of stationary points is also a solution of the SPDE (6). The converse holds true as well. Let \(\varphi ^* \in H^1(\varOmega )\) and \(\psi ^* \in X\) be arbitrary. Then we define the following events
Plugging \(\varphi = \varphi ^* \mathbb {1}_{A_\Xi } - \varphi ^* \mathbb {1}_{A_\Xi ^c}\) and \(\psi = \psi ^* \mathbb {1}_{B_\Xi } - \psi ^* \mathbb {1}_{B_\Xi ^c}\) into (6), we obtain
and analogously \(0 = \mathbb {E}\left[ |b(u,z;\psi ^*)|\right] \), from which we deduce \(a(u,z;\varphi ^*) = 0\) \(\mathbb {P}\text {-a.s.}\) as well as \(b(u,z;\psi ^*) = 0\) \(\mathbb {P}\text {-a.s.}\) for all \(\varphi ^* \in H^1(\varOmega )\) and \(\psi ^* \in X\). From this, we see the equivalence of the realization-wise interpretation of (2) and the SPDE-system (6).
After the characterization of stationary points, we want to characterize measurable selections of minimizers in a manner similar to the interchange of minimization and integration in [37, Chapter 14, Section F]. For this purpose, we define the constraint set
and formulate the following minimization problem
Using the same arguments as for the deterministic Ambrosio–Tortorelli problem, we obtain the following result.
Theorem 9
The problem (7) admits a solution, and one can relax the constraint \(z \in K_\mathbb {P}\) to \(z \in {X_\mathbb {P}}\), i.e.,
Moreover, the functional \(\mathbb {E}[AT^g_\varepsilon (\cdot ,\cdot )]: H_\mathbb {P} \times X_\mathbb {P} \rightarrow \mathbb {R}\) is Fréchet differentiable, and for every minimizer \((u,z) \in H_\mathbb {P} \times X_\mathbb {P}\) the stationarity system (6) is satisfied.
Let \((u^*,z^*)\) be a measurable selection of minimizers and \((\bar{u},\bar{z})\) a minimizer of (7). From this, we obtain \(AT^g_\varepsilon (u^*,z^*) \le AT^g_\varepsilon (\bar{u}, \bar{z}) \ \mathbb {P}\text {-a.s.}\) Taking the expectation yields \(\mathbb {E}[AT^g_\varepsilon (u^*,z^*)] \le \mathbb {E}[AT^g_\varepsilon (\bar{u},\bar{z})]\). Since \((\bar{u}, \bar{z})\) minimizes (7), it holds that \(\mathbb {E}[AT^g_\varepsilon (\bar{u}, \bar{z})] \le \mathbb {E}[AT^g_\varepsilon (u^*,z^*)]\). Hence \((u^*,z^*)\) is a solution of (7), and it follows by the above scenario-wise inequality that \(AT^g_\varepsilon (u^*,z^*) = AT^g_\varepsilon (\bar{u}, \bar{z}) \ \mathbb {P}\text {-a.s.}\) The solution of the minimization problem in the space of Banach-space-valued random variables is therefore identical to the solution in the sense of \(\mathbb {P}\)-almost every scenario. We summarize our findings on the characterization of measurable selections in the following theorem.
Theorem 10
Let \(g \in {L^{2}_\mathbb {P}}(L^2(\varOmega ))\). A random variable \((u,z) \in H_\mathbb {P} \times K_\mathbb {P}\) is a solution of (7) if and only if it is a \(\mathbb {P}\)-a.s. measurable selection of minimizers, and it solves (6) if and only if it is a \(\mathbb {P}\)-a.s. measurable selection of stationary points.
The purpose of this section is to give a mathematical meaning to the random processed image and the random edges. The approximation of the Mumford–Shah problem yields the existence of random variables that solve the nondeterministic problem scenario-wise. In this sense, a measurable selection of minimizers serves the purpose of a random reconstruction and a random edge. Our numerical treatment of noise propagation in the following section is based on this observation.
4 Numerical Methods
As we are concerned with computing the influence of the signal's randomness on the edge, our problem class falls into the realm of forward problems in the terminology of uncertainty quantification (UQ). For the sake of accessibility, we briefly summarize some of the main challenges, which also arise when designing numerical solution schemes.
1. The solution may be non-unique.
2. Due to the lack of convexity of the solution sets, we are unable to use the Michael selection theorem (cf. [13, Theorem 6.6.3]) to establish the existence of continuous selections.
These facts challenge the use of many popular methods in UQ. Indeed:
3. The Wiener chaos expansion is significantly troubled by the local constraint \(0 \le z \le 1\): the global orthonormal expansion is not compatible with the pointwise inequality constraints. For a computational approach in this direction we refer to [36].
4. Stochastic collocation is essentially an interpolation method and necessarily requires point evaluations as well as continuous dependence on the parameter.
The above techniques seek to approximate the entire random variable by discrete objects. A different approach is based on the definition of a quantity of interest and its approximation using techniques from statistics and numerics. In this paper, we employ Monte Carlo-based techniques, which rely on the strong law of large numbers in a Hilbert space setting. The latter is stated here for the sake of convenience; for a proof see, e.g., [19].
Theorem 11
Let H be a separable Hilbert space and \((Z_n)_{n \in \mathbb {N}}\) a sequence of pairwise independent, identically distributed (iid.) random variables in \(L^1_\mathbb {P}(H)\) with \(\mathbb {E}[Z_n] = m\) for all \(n \in \mathbb {N}\). Then the sample means \(\hat{m}_n := \frac{1}{n} \sum _{i = 1}^{n} Z_i\) converge to m \(\mathbb {P}\)-almost surely in H,
i.e., \(\mathbb {P}\left[ \lim _{n \rightarrow \infty }\Vert \hat{m}_n - m\Vert _H = 0\right] = 1\) holds true. If, additionally, \((Z_n)_{n \in \mathbb {N}}\) is a family in \(L^2_\mathbb {P}(H)\), then \(\hat{m}_n \rightarrow m\) in \(L^2_\mathbb {P}(H)\), as well.
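Theorem 11 underlies the basic sample-mean estimator used throughout this section. A minimal numerical sketch (our own toy example, with \(\varOmega \) discretized as a \(32 \times 32\) pixel grid; all names are ours):

```python
import numpy as np

def sample_mean(samples):
    """Monte Carlo estimator m_hat_n = (1/n) * sum_i Z_i for function-valued
    samples stored as an array of shape (n, H, W)."""
    return samples.mean(axis=0)

# Toy model: Z_i = m + iid. pixelwise noise, so E[Z_i] = m.
rng = np.random.default_rng(0)
m_true = np.linspace(0.0, 1.0, 32 * 32).reshape(32, 32)    # deterministic "mean image"
n = 4_000
Z = m_true[None, :, :] + rng.standard_normal((n, 32, 32))  # iid. perturbations
m_hat = sample_mean(Z)

# root-mean-square pixel error, a discrete analogue of the L^2(Omega)-distance;
# it behaves like 1/sqrt(n) here
err = np.sqrt(np.mean((m_hat - m_true) ** 2))
```

With \(n = 4000\) samples of unit-variance noise, the error is of order \(1/\sqrt{n} \approx 0.016\), illustrating the \(L^2_\mathbb {P}\)-convergence stated in the theorem.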
4.1 Generalized Confidence Intervals
In our context, one important quantity of interest is concerned with the identification of regions where it is more likely to encounter an edge. In this sense, we would like to obtain a notion of confidence intervals for the edge indicator. To achieve this, we first provide a pointwise generalization of cumulative distribution functions (cdfs) in the following definition.
Definition 12
Let Z be an \({L^2(\varOmega )}\)-valued random variable. The function \(F: {L^2(\varOmega )}\rightarrow L^\infty (\varOmega )\) is defined pointwise almost everywhere by \(F(\tau )(x) := \mathbb {P}\left[ Z(x) \le \tau (x)\right] \)
for \(\tau \in L^2(\varOmega )\) and is called the generalized cumulative distribution function (gcdf) of Z.
Note that the above pointwise definition is justified by the Fubini theorem on integration on product spaces. Using this, we choose a level of significance \(s \in (0,1)\) and define for z – interpreted as a random variable in \({L^{2}_\mathbb {P}}({L^2(\varOmega )})\) only – the pointwise quantile \(\tau _s\) for (almost all) \(x \in \varOmega \) by \(\tau _s(x) := \inf \{ t \in \mathbb {R}: F(t)(x) \ge s \}.\)
This defines a one-sided confidence interval (the lower bound is zero), and two questions immediately arise: (i) Why is this object measurable, and (ii) how can this object be estimated in practice? Both questions are answered by the following bisection method: set \(t_0^\ell := 0\) and \(t_0^u := 1\), and, given \(t_n^\ell , t_n^u\), define pointwise almost everywhere the midpoint \(c_n := \frac{1}{2}(t_n^\ell + t_n^u)\) and set \(t_{n+1}^u := c_n\), \(t_{n+1}^\ell := t_n^\ell \) where \(F(c_n) \ge s\) holds, as well as \(t_{n+1}^u := t_n^u\), \(t_{n+1}^\ell := c_n\) otherwise.
By definition, the above sequences are nondecreasing, respectively, nonincreasing, and almost everywhere, it holds that \(t^\ell _n \le t^\ell _{n + 1}\) (\(t_n^u \ge t^u_{n + 1}\)) and, moreover, \(F(t_n^\ell ) < s \le F(t_n^u)\) for all \(n \in \mathbb {N}\). By the monotonicity, we also obtain the inequality \(0 \le t_n^\ell < t_n^u \le 1\) and hence the convergence \(t^\ell _n \nearrow t_*\) and \(t_n^u \searrow t^*\) pointwise almost everywhere on \(\varOmega \). Due to the measurability of \(t_n^\ell , t_n^u\), we obtain by the pointwise convergence, that the limits are measurable as well. Moreover, by construction we get \(|t^\ell _{n + 1} - t^u_{n + 1}| \le \frac{1}{2}|t^\ell _n - t_n^u|\) and inductively \(|t^\ell _n - t_n^u| \le 2^{-n}\) almost everywhere. Therefore we get \(t_* = t^*\).
In addition, \(t^*\) is the pointwise quantile of interest, which is equivalent to stating that for all \(\tilde{t} \le t^*\) such that \(A := \{x \in \varOmega : \tilde{t}(x) < t^*(x)\}\) has positive Lebesgue measure, \(\mathbb {P}[z< \tilde{t}] < s\) holds a.e. on A. Indeed, by the choice of \(\tilde{t}\), for every \(x \in A\) there exists a number \(n = n(x) \in \mathbb {N}\) with \(\tilde{t}(x) < t^\ell _n(x)\). Hence, defining \(A_n := \{ x \in \varOmega : \tilde{t} < t_n^\ell \}\), we see that \(A_n \subset A_{n+1}\) and \(A = \bigcup _{n=1}^\infty A_n\). By the construction of \((t^\ell _n)_{n \in \mathbb {N}}\), it holds that \(\mathbb {P}[z< \tilde{t}] \le \mathbb {P}[z \le t_n^\ell ] < s\) almost everywhere on \(A_n\). This yields the required inequality \(\mathbb {P}[z< \tilde{t}] < s\) almost everywhere on A, and we have thus shown \(t^* = \tau _s\).
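The halving of the interval \([t_n^\ell , t_n^u]\) can be sketched in code. The following is our own illustration, applied to an empirical approximation of the gcdf built from samples (the toy uniform field and all names are assumptions, not the paper's implementation):

```python
import numpy as np

def pointwise_quantile(samples, s, iters=40):
    """Pointwise s-quantile of an ensemble of [0,1]-valued fields via bisection.

    `samples` has shape (n, P): n realizations on P pixels.  The empirical gcdf
    F_n(t)(x) = (1/n) * #{i : samples[i, x] <= t} is evaluated at the midpoint;
    the interval [t_lo, t_hi] is halved so that F_n stays < s at t_lo and
    >= s at t_hi."""
    n, P = samples.shape
    t_lo = np.zeros(P)
    t_hi = np.ones(P)
    for _ in range(iters):
        mid = 0.5 * (t_lo + t_hi)
        F = (samples <= mid).mean(axis=0)   # empirical gcdf at the midpoint
        hit = F >= s                        # quantile lies at or below the midpoint
        t_hi = np.where(hit, mid, t_hi)
        t_lo = np.where(hit, t_lo, mid)
    return t_hi

# Toy field: pixelwise U(0,1) values, whose true s-quantile is s itself.
rng = np.random.default_rng(1)
z_samples = rng.uniform(size=(10_000, 16))
tau = pointwise_quantile(z_samples, s=0.9)
```

After 40 halvings the interval width is below \(2^{-40}\), so the returned value is governed by the sampling error of the empirical gcdf alone.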
The result of this discussion is summarized in the following proposition.
Proposition 13
Let Z be an \(L^2(\varOmega )\)-valued random variable and let a level of significance \(s \in (0,1)\) be given. Then, the pointwise quantile \(\tau _s(x) := \inf \{ t \in \mathbb {R}: F(t)(x) \ge s \}\) is a measurable function.
In practice, the exact gcdf is unknown and hence an approximation is needed. To this end, we define the empirical quantiles in full analogy to the classical case of \(\mathbb {R}\)-valued random variables.
Definition 14
Let Z be an \(L^2(\varOmega )\)-valued random variable. Consider an iid. family of samples \((Z_i)_{i \in \mathbb {N}}\) with distribution \(\mathbb {P}_Z\). The empirical gcdf is defined as the mapping \(F_n(\tau )(x) := \frac{1}{n} \sum _{i = 1}^{n} \mathbb {1}_{\{Z_i(x) \le \tau (x)\}}.\)
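A minimal discrete reading of Definition 14, assuming the samples are stored as an (n, P) array of pixel values (names are ours):

```python
import numpy as np

def empirical_gcdf(samples):
    """Return the empirical gcdf F_n as a callable.

    `samples` has shape (n, P); F_n(tau) is evaluated pixelwise for a
    threshold field `tau` of shape (P,) (a scalar also broadcasts)."""
    def F_n(tau):
        # fraction of samples below the threshold, pixel by pixel
        return np.mean(samples <= tau, axis=0)
    return F_n

# For pixelwise U(0,1) samples, F_n(t) approximates the identity t |-> t.
rng = np.random.default_rng(2)
F = empirical_gcdf(rng.uniform(size=(50_000, 8)))
```

Evaluating `F` at a constant threshold recovers the classical empirical cdf in every pixel, which is how the pointwise theory below is applied.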
For proving convergence of \((F_n)_{n \in \mathbb {N}}\) as introduced in Definition 14, we recall the statistical theory for \(\mathbb {R}\)-valued random variables and refer the reader to [41] for proofs and additional information.
Lemma 15
Let \((F_n)_{n \in \mathbb {N}}\) be a sequence of cdfs of associated \(\mathbb {R}\)-valued random variables and let F be the cdf of an \(\mathbb {R}\)-valued random variable. We assume \(F_n(t) \rightarrow F(t)\) for all \(t \in \mathbb {R}\). Let \(s \in (0,1)\) such that the quantile function \(F^{-1}\) is continuous in s. Then we have \(F_n^{-1}(s) \rightarrow F^{-1}(s)\).
Proof
See [41, Lemma 21.2], where a stronger form is proven. \(\square \)
Based on the preceding lemma, we now prove a convergence result for the s-quantiles of the empirical gcdfs.
Theorem 16
Let Z be an \(\mathbb {R}\)-valued random variable with cdf \(F :\mathbb {R} \rightarrow [0,1]\) and \((Z_i)_{i \in \mathbb {N}}\) be an iid. family of samples with distribution \(\mathbb {P}_Z\). For \(n \in \mathbb {N}\) we denote by \(F_n\) the empirical cdf. For a level of significance \(s \in (0,1)\), chosen such that the quantile function \(F^{-1}\) is continuous in s, the empirical quantiles \(F_n^{-1}(s) \rightarrow F^{-1}(s)\) converge with probability 1. If, moreover, \(0 \le Z \le 1\) holds \(\mathbb {P}\)-a.s., then in addition we get \(F_n^{-1}(s) \rightarrow F^{-1}(s) \text { in } L^p(\Xi , \mathcal {F}, \mathbb {P})\) for all \(p \in [1, \infty )\).
Proof
Using the strong law of large numbers, we get \(F_n(t) \rightarrow \mathbb {E}[F_n(t)] = F(t)\) \(\mathbb {P}\)-a.s. for all \(t \in \mathbb {R}\). By Lemma 15, we know that for every realization of \(F_n\) the corresponding quantiles converge, and therefore \(F_n^{-1}(s) \rightarrow F^{-1}(s)\) with probability 1. If we assume \(0 \le Z \le 1\), we immediately get \(0 \le F_n^{-1}(s), F^{-1}(s) \le 1\). By dominated convergence, we see that for all \(p \in [1, \infty )\) also \(\mathbb {E}[|F_n^{-1}(s) - F^{-1}(s)|^p] \rightarrow 0\) converges. \(\square \)
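The convergence statement of Theorem 16 is easy to observe numerically. The following sketch (our own, using `np.quantile` and `scipy.stats.norm.ppf`) compares empirical and exact 0.9-quantiles of a standard normal random variable for growing sample sizes:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
s = 0.9
errors = []
for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)
    # |F_n^{-1}(s) - F^{-1}(s)| for the empirical quantile of n samples
    errors.append(abs(np.quantile(x, s) - norm.ppf(s)))
```

The error at \(n = 10^6\) is of order \(10^{-3}\), consistent with the almost-sure convergence of the empirical quantiles.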
After the above preparations, we can formulate a function space valued version of the previous theorem. For this, we need to extend the continuity assumption of Lemma 15 on the quantile function at the level of significance to our situation. Note further that the gcdf induces, for almost all \(x \in \varOmega \), a cdf of an \(\mathbb {R}\)-valued random variable, namely \(\mathbb {R} \ni t \mapsto \mathbb {P}[z(x) \le t] = F(t)(x)\). Hence one can interpret the quantile function in a pointwise fashion as a function \(s \mapsto \left( F(\,\cdot \,)(x)\right) ^{-1}(s)\) for almost all \(x\in \varOmega \). This observation leads to the following assumption.
Assumption 17
Let \(s \in (0,1)\) be fixed such that the set \(\{ x \in \varOmega : (F(\,\cdot \,)(x))^{-1} \text { is not continuous in } s \}\)
has Lebesgue measure zero.
Utilizing Theorem 16 in a pointwise fashion, we obtain the following result.
Theorem 18
Let \(z \in K_\mathbb {P}\) and let \((z_i)_{i \in \mathbb {N}}\) be iid. samples drawn according to the distribution \(\mathbb {P}_z\). Moreover, fix the level of significance \(s \in (0,1)\) fulfilling Assumption 17 with respect to the gcdf of z. Then the sequence of pointwise empirical s-quantiles \(\tau _{s,n}\) converges to the s-quantile \(\tau _s\) for almost all points with probability 1 and in \(L^p(\varOmega \times \Xi )\) for all \(p \in [1,\infty )\), i.e., \(\tau _{s,n} \rightarrow \tau _s\) \(\lambda ^d \otimes \mathbb {P}\)-a.e. and in \(L^p(\varOmega \times \Xi )\).
Proof
Since for almost all \(x \in \varOmega \) the function \(t \mapsto \mathbb {P}[z(x) \le t] = F(t)(x)\) is a cdf, we can apply Theorem 16 pointwise and obtain the convergence of \(F_n^{-1}(s) \rightarrow F^{-1}(s)\) pointwise \(\lambda ^d \otimes \mathbb {P}\)-almost everywhere. Due to the pointwise restriction of z, we know almost surely that \(\tau _{s,n} \in [0,1]\) a.e. on \(\varOmega \). Since \(|\tau _{s,n} - \tau _s| \le 1\) holds \(\lambda ^d \otimes \mathbb {P}\)-a.e. we derive by dominated convergence
which ends the proof. \(\square \)
In this subsection, we have encountered the pointwise quantile as a quantity of interest for the investigation of random edges. The associated convergence analysis yields a foundation for a sample-based approximation method for the gcdf and the pointwise quantile. The bisection method from the beginning of this subsection can then be used to efficiently approximate the empirical quantile for a given sample size.
4.2 Variance Reduction by Control Variates
In contrast to the previous subsection, where we focused on quantiles, we discuss here the expected value as a quantity of interest. We therefore consider the random edge indicator as an object in \(L^2_\mathbb {P}(L^2(\varOmega ))\) and aim at employing the Monte Carlo method here as well. Since the convergence order is always \(\tfrac{1}{2}\), the convergence speed essentially depends on the variance. Associated acceleration methods aim at the formulation of a new random variable with the same expectation as the original one, but with a smaller variance. Several such methods exist, mainly in finite dimensions (cf. [34] for an overview). We pursue an approach based on control variates. For convenience, we first review the one-dimensional case, aiming at an infinite-dimensional generalization.
Let \(Z \in L^2(\Xi ,\mathcal {F},\mathbb {P})\) be an \(\mathbb {R}\)-valued random variable. Our aim is to estimate \(\mathbb {E}\left[ {Z}\right] \). Suppose we have another random variable Y, called the control variate, together with its (exact) expected value. Then we sample from the variable \(Z_{{\rho }} := Z + {{\rho }}\, (Y - \mathbb {E}\left[ {Y}\right] )\)
for a yet to be chosen \({{\rho }} \in \mathbb {R}\). Obviously, it holds that \(\mathbb {E}\left[ {Z_{{\rho }}}\right] = \mathbb {E}\left[ {Z}\right] \), and the variance of \(Z_{{\rho }}\) is minimal with respect to the weighting parameter \({{\rho }}\) for the choice \({{\rho }} = -\frac{{\text {Cov}}\left( Z,Y \right) }{{\text {Var}}\left( Y \right) },\)
where \({\text {Cov}}\left( Z,Y \right) := \mathbb {E}\left[ {(Z - \mathbb {E}\left[ {Z}\right] ) \cdot (Y - \mathbb {E}\left[ {Y}\right] )}\right] \) and \({\text {Var}}\left( Y \right) := \mathbb {E}\left[ {(Y - \mathbb {E}\left[ {Y}\right] )^2}\right] \). The minimal variance for \(Z_{{\rho }}\) then becomes \({\text {Var}}\left( Z_{{\rho }} \right) = {\text {Var}}\left( Z \right) - \frac{{\text {Cov}}\left( Z,Y \right) ^2}{{\text {Var}}\left( Y \right) },\)
and, when \({\text {Cov}}\left( Z,Y \right) \ne 0\), a variance reduction is achieved (when compared to Z). Of course, this concept can be extended to m control variates \((Y^{(j)})_{j = 1}^m\) by defining \(Z_{{\rho }} := Z + \sum _{j = 1}^{m} {{\rho }}_j \left( Y^{(j)} - \mathbb {E}\left[ {Y^{(j)}}\right] \right) .\)
The optimal \({{\rho }} = ({{\rho }}_1, \dots , {{\rho }}_m)^\top \) then reads \({{\rho }} = -C_Y^+ c_Z,\)
where \(C_Y = \left( {\text {Cov}}\left( Y^{(i)},Y^{(j)} \right) \right) _{i, j = 1, \dots , m} \in \mathbb {R}^{m \times m}\) denotes the covariance matrix of Y and the quantity \(c_Z = ({\text {Cov}}\left( Z,Y^{(j)} \right) )_{j = 1, \dots , m} \in \mathbb {R}^m\) denotes the covariance vector of the control variates and the desired target. For a matrix \(A \in \mathbb {R}^{m \times m}\), we denote its Moore–Penrose inverse by \(A^+\). In the same way as in the previous subsection, we extend the above methodology in a pointwise fashion and introduce for this aim the following notation.
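Before passing to the pointwise extension, the one-dimensional recipe just reviewed can be sketched numerically; the toy pair (Z, Y) below, coupled through a known linear relation, is our own illustration and not part of the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
Y = rng.standard_normal(n)             # control variate with known mean E[Y] = 0
Z = 2.0 * Y + rng.standard_normal(n)   # target, strongly correlated with Y

# empirical analogue of the optimal weight rho = -Cov(Z, Y) / Var(Y)
rho = -np.cov(Z, Y)[0, 1] / np.var(Y)
Z_rho = Z + rho * (Y - 0.0)            # E[Y] = 0 is known exactly here

var_plain, var_cv = np.var(Z), np.var(Z_rho)
# Var(Z) is about 5 in this model, while Var(Z_rho) drops to about 1.
```

The empirical mean of `Z_rho` agrees with that of `Z` up to sampling error, while its variance is reduced by roughly the factor \({\text {Cov}}(Z,Y)^2 / ({\text {Var}}(Z){\text {Var}}(Y))\).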
For \(U,V \in L_\mathbb {P}^2(L^2(\varOmega ))\) we define the pointwise covariance as \({\text {Cov}}\left( U,V \right) (x) := \mathbb {E}\left[ {(U(x) - \mathbb {E}\left[ {U}\right] (x)) \cdot (V(x) - \mathbb {E}\left[ {V}\right] (x))}\right] ,\) where \(\mathbb {E}\) is the expectation defined via the Bochner integral. Further we define the pointwise variance as \(\mathbb {V}\left[ {U}\right] := {\text {Cov}}\left( U,U \right) ,\) in contrast to the variance of U, which reads \({\text {Var}}\left( U \right) := \mathbb {E}\left[ {\Vert U - \mathbb {E}\left[ {U}\right] \Vert _{L^2(\varOmega )}^2}\right] .\)
For a random variable \(Z \in L^2_\mathbb {P}(L^2(\varOmega ))\), a family \((Y^{(j)})_{j = 1, \dots , m} \subset L^2_\mathbb {P}(L^2(\varOmega ))\) of control variates, and a measurable function \({{\rho }}: \varOmega \rightarrow \mathbb {R}^m\) we consider the random variable \(Z_{{\rho }} := Z + \sum _{j = 1}^{m} {{\rho }}_j \left( Y^{(j)} - \mathbb {E}\left[ {Y^{(j)}}\right] \right) .\)
The pointwise minimization of \(\mathbb {V}\left[ {Z_{{\rho }}}\right] \) with respect to \({{\rho }}\) gives us the minimizer of \({\text {Var}}\left( Z_{{\rho }} \right) \) due to the Fubini theorem. Next we prove its measurability.
Lemma 19
The mapping induced by the Moore-Penrose inverse \(( \ \cdot \ )^+: \mathbb {R}^{m \times m} \rightarrow \mathbb {R}^{m \times m}\) with \(A \mapsto A^+\) is measurable.
Proof
By Theorem 3.4 in [7], it holds that \(A^+ = \lim _{\delta \searrow 0} \left( A^\top A + \delta I\right) ^{-1} A^\top .\)
Let \(GL_m(\mathbb {R})\) denote the set of invertible matrices. Since the mapping \((\ \cdot \ )^{-1}: GL_m(\mathbb {R}) \rightarrow GL_m(\mathbb {R})\) is continuous, the functions \(f_k: \mathbb {R}^{m \times m} \rightarrow \mathbb {R}^{m \times m}\) defined by \(f_k(A) := \left( A^\top A + \tfrac{1}{k} I\right) ^{-1} A^\top \)
are continuous and thus measurable. Due to the pointwise convergence \(A^+ = \lim _{k \rightarrow \infty } f_k(A)\) also the mapping \((\cdot )^+\) is measurable. \(\square \)
Hence we have proven the following result.
Proposition 20
The mapping \({{\rho }}: \varOmega \rightarrow \mathbb {R}^m\), defined by \({{\rho }} = -C_Y^+ c_Z\), is (Borel) measurable.
Proof
Since compositions and products of measurable functions are again measurable, we obtain the measurability of \({{\rho }}\) using the preceding lemma. \(\square \)
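A pointwise (per-pixel) version of the construction, with \({{\rho }}(x) = -C_Y(x)^+ c_Z(x)\) realized via NumPy's batched Moore–Penrose inverse, can be sketched as follows (our own illustration; for simplicity, empirical means replace the exact expectations of the control variates):

```python
import numpy as np

def pointwise_cv_weights(Z, Ys):
    """Pointwise optimal control-variate weights rho(x) = -C_Y(x)^+ c_Z(x).

    Z:  shape (n, P)     -- n samples of the target field on P pixels
    Ys: shape (m, n, P)  -- samples of m control-variate fields
    Returns rho of shape (m, P)."""
    m, n, P = Ys.shape
    Yc = Ys - Ys.mean(axis=1, keepdims=True)           # centered controls
    Zc = Z - Z.mean(axis=0, keepdims=True)             # centered target
    C_Y = np.einsum('inp,jnp->pij', Yc, Yc) / (n - 1)  # pointwise covariance matrices
    c_Z = np.einsum('np,jnp->pj', Zc, Yc) / (n - 1)    # pointwise covariance vectors
    # np.linalg.pinv broadcasts over the leading pixel axis
    return -np.einsum('pij,pj->ip', np.linalg.pinv(C_Y), c_Z)

rng = np.random.default_rng(5)
n, P = 20_000, 6
Y = rng.standard_normal((2, n, P))                 # two synthetic control fields
Z = Y[0] + 0.5 * Y[1] + 0.3 * rng.standard_normal((n, P))
rho = pointwise_cv_weights(Z, Y)
Z_cv = Z + np.einsum('ip,inp->np', rho, Y - Y.mean(axis=1, keepdims=True))
```

Since centered controls are added, the pixelwise sample means of `Z_cv` and `Z` coincide, while the pixelwise variance drops from about 1.34 to the residual level of about 0.09 in this toy model.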
5 Numerical Examples
Our numerical methods are next applied to a selection of real-world images. In this respect, the given image data are a (discrete) pixel matrix \({(G_{jk})}^m_{j,k = 1}\) of size \(m \times m\) with values between 0 (black) and 255 (white). In order to utilize our continuous framework, we interpret this pixel mask as a function in \(L^\infty (\varOmega )\) with \(\varOmega := (0,m)^2\). This is done by decomposing \(\varOmega \) (up to null sets) into open squares of size 1 and rewriting \(g_0 = \sum _{j,k = 1}^{m} G_{jk}\, \mathbb {1}_{Q_{jk}}\)
with \(Q_{jk}:= (0,1)^2 + (j-1, m-k)\). Concerning the underlying degradation models, we focus on two cases: (i) white noise and (ii) randomized motion blur.
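The pixel-to-function convention can be sketched as a simple lookup (our own illustration; we store the matrix so that `G[j-1, k-1]` equals \(G_{jk}\)):

```python
import numpy as np

def g0_eval(G, x, y):
    """Evaluate the piecewise constant image function g0 at (x, y) in (0, m)^2.

    Pixel (j, k) occupies the unit square Q_jk = (0,1)^2 + (j-1, m-k), so the
    column index is j = floor(x) + 1 and the row index is k = m - floor(y)."""
    m = G.shape[0]
    j = int(np.floor(x)) + 1
    k = m - int(np.floor(y))
    return G[j - 1, k - 1]

# Toy 2x2 "image" with G[j-1, k-1] = G_jk = 10*j + k.
G = np.array([[11, 12],
              [21, 22]])
```

For instance, the point \((0.5, 1.5)\) lies in \(Q_{11} = (0,1) \times (1,2)\) and evaluates to \(G_{11}\).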
5.1 White Noise Model
In this setting, we assume \(L := id_{L^2(\varOmega )}\) and that the pixel values are corrupted by a family of independent, normally distributed random variables, whose standard deviation is \(\sigma > 0\) for all pixels. Then the random variable reads as \(g = g_0 + \sigma \sum _{j,k = 1}^{m} \xi _{jk}\, \mathbb {1}_{Q_{jk}},\)
where \(\xi _{jk} \sim \mathcal {N}(0,1)\) are iid. with standard normal distribution. In our experiments, we set \(\sigma = 40\). This is a rather large standard deviation compared to practical situations. However, since the Mumford–Shah model is designed to suppress the influence of noise, a more aggressive noise model aids the visualization of the results in the subsequent experiments.
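Drawing a sample from this degradation model amounts to one line of array arithmetic (a sketch with a placeholder mid-gray image; names are ours):

```python
import numpy as np

def noisy_sample(G, sigma, rng):
    """One realization of the white noise model: each pixel value G_jk is
    perturbed by sigma * xi_jk with xi_jk iid. standard normal."""
    return G + sigma * rng.standard_normal(G.shape)

rng = np.random.default_rng(6)
G = np.full((256, 256), 128.0)   # placeholder image (mid-gray)
g = noisy_sample(G, sigma=40.0, rng=rng)
```

Note that, as in the model, the sample is not clipped to the range [0, 255].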
5.2 Motion Blur Model
In addition to the noise model above, we now also consider the motion blur operator \(\widehat{L}_\ell \in \mathcal {L}(L^2(\varOmega ))\) with horizontal movement defined by \(\widehat{L}_\ell g_0(x) := \frac{1}{2} \int _{-1}^{1} g_0(x + t \ell v) \,\mathrm {d}t\)
and \(v = (1,0)^\top \) the direction of motion. The parameter \(\ell \) controls the velocity of the motion, respectively, the strength of the blurring effect. Since the integration domain exceeds \(\varOmega \), we extend the argument \(g_0\) by zero outside its domain. In our model, we assume the parameter \(\ell \sim \mathcal {U}(0,\bar{\ell })\) to be a uniformly distributed random variable, where we choose \(\bar{\ell }= \frac{m}{4}\) to be one fourth of the image size. In view of our underlying degradation framework, we decompose the above object into \(\widehat{L}_\ell g_0 = L g_0 + \xi \)
with \(L \in \mathcal {L}(L^2(\varOmega ))\) the linear operator such that \(L g_0 = \mathbb {E}\left[ {\widehat{L}_\ell g_0}\right] \) holds for all \(g_0 \in L^2(\varOmega )\) and \(\xi := \widehat{L}_\ell g_0 - L g_0\) as centered random variable. The expected value is the subject of the following auxiliary result.
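The action of \(\widehat{L}_\ell \) can be approximated discretely by averaging zero-extended horizontal shifts (our own sketch, using a crude uniform quadrature in t; `scipy.ndimage.shift` performs the interpolation and the zero extension):

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def motion_blur(g, ell, n_quad=33):
    """Approximate (1/2) * int_{-1}^{1} g(x + t*ell*v) dt with v = (1, 0)^T.

    The integral is discretized by averaging over a uniform t-grid; the image
    is extended by zero outside its domain (mode='constant', cval=0)."""
    ts = np.linspace(-1.0, 1.0, n_quad)
    acc = np.zeros_like(g, dtype=float)
    for t in ts:
        # horizontal movement acts on the second (column) axis
        acc += nd_shift(g, shift=(0.0, t * ell), order=1,
                        mode='constant', cval=0.0)
    return acc / n_quad

rng = np.random.default_rng(7)
g = np.ones((32, 32))
ell = rng.uniform(0.0, 32 / 4)   # ell ~ U(0, m/4) with m = 32
blurred = motion_blur(g, ell)
```

On a constant image, interior pixels whose shifted sampling points stay inside the domain are reproduced exactly, while pixels near the left and right boundary lose intensity to the zero extension.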
Lemma 21
Let \(\widehat{L}_\ell g(x) := \frac{1}{2} \int _{-1}^1 g(x + t\ell v) \mathrm {d}t\) as above and \(\ell \sim \mathcal {U}(0,\bar{\ell })\) be a uniformly distributed random variable with parameter \(\bar{\ell }> 0\). Then the operator L has the form \(L g(x) = -\frac{1}{2} \int _{-1}^{1} \ln |t| \; g(x + t \bar{\ell }v) \,\mathrm {d}t.\)
Proof
It is easily seen that for each \(g \in C_0^\infty (\varOmega )\) the mapping \(\ell \mapsto \widehat{L}_\ell g\) is (Lipschitz-)continuous. Using \(\Vert \widehat{L}_\ell \Vert _{\mathcal {L}(L^2(\varOmega ))} \le 1\) for all \(\ell \in \mathbb {R}\) (shown similarly to the proof of \(\Vert L\Vert _{\mathcal {L}(L^2(\varOmega ))} \le 1\) below) and the density of \(C_0^\infty (\varOmega )\) in \(L^2(\varOmega )\), we obtain by the Banach–Steinhaus theorem the continuity of \(\ell \mapsto \widehat{L}_\ell g\) for all \(g \in L^2(\varOmega )\) and hence the measurability of the above random variable.
For the calculation of the expected value, choose again \(g \in C_0^\infty (\mathbb {R}^d)\) arbitrarily. Then we obtain by direct computation
The set of integration M is depicted in Fig. 1. Now we change the order of integration by slicing the set vertically instead of horizontally. A slice at \(t \in (-1,1)\) then reads as the interval (|t|, 1). Hence, we proceed and obtain \(\mathbb {E}\left[ {\widehat{L}_\ell g}\right] (x) = \frac{1}{2} \int _{-1}^{1} \int _{|t|}^{1} \frac{1}{r} \,\mathrm {d}r \; g(x + t \bar{\ell }v) \,\mathrm {d}t = -\frac{1}{2} \int _{-1}^{1} \ln |t| \; g(x + t \bar{\ell }v) \,\mathrm {d}t.\)
Finally, we obtain the boundedness of L by
from which we deduce the relation for \(L^2(\varOmega )\) by using again the density of \(C_0^\infty (\varOmega )\) in \(L^2(\varOmega )\). \(\square \)
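As a numerical cross-check of the computation in the proof (under our reading, the change of the order of integration produces the kernel \(-\tfrac{1}{2}\ln |t|\)), one can compare the \(\ell \)-averaged blur with the kernel representation by quadrature in one space dimension:

```python
import numpy as np
from scipy.integrate import quad, dblquad

# Claimed identity (our derivation, to be verified numerically):
#   (1/lbar) * int_0^lbar (1/2) * int_{-1}^{1} g(x + t*ell) dt dell
#     = -(1/2) * int_{-1}^{1} ln|t| * g(x + t*lbar) dt
g = np.cos                 # smooth 1-D test function
x0, lbar = 0.3, 2.0

# left-hand side: expectation over ell of the averaged shifts
lhs = dblquad(lambda t, ell: 0.5 * g(x0 + t * ell) / lbar,
              0.0, lbar, -1.0, 1.0)[0]

# right-hand side: integration against the logarithmic kernel
# (integrable singularity at t = 0, flagged via `points`)
rhs = quad(lambda t: -0.5 * np.log(abs(t)) * g(x0 + t * lbar),
           -1.0, 1.0, points=[0.0])[0]
```

Both quadratures agree to high accuracy, supporting the kernel obtained from the slicing argument.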
5.3 Experimental Setup
As test images we take the well-known cameraman image as well as an MRI scan of a human head in side view (see Fig. 2). We apply both instances of our degradation method to each test image. A selection of samples is depicted in Fig. 3. Since both of our methods are sample based, we need to generate a critical point of the Ambrosio–Tortorelli functional for each sample; this task can be parallelized. Of course, we are unable to solve the minimization problem exactly and hence employ the finite element method (FEM) proposed in [14] in combination with the splitting method for the corresponding discretization. In order to obtain an FE mesh, we divide each of the abovementioned pixel squares into two triangles by alternately inserting a diagonal; see Fig. 4. Since parts of the FEM approximation (such as contributions to the local stiffness matrices) do not depend on the image, they can be shared among the minimization processes in order to reduce computational overhead. Our implementation is done in MATLAB. For the numerical solution of the PDEs, we use the software package AFEM [4].
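The alternating-diagonal triangulation can be sketched as follows (our own reading of the rule, using a checkerboard pattern to decide the orientation of the diagonal; the paper's MATLAB implementation may differ in detail):

```python
import numpy as np

def crisscross_mesh(m):
    """Triangulate the m x m pixel grid: each unit square is split into two
    triangles by a diagonal whose orientation alternates in a checkerboard
    pattern over the squares."""
    nodes = np.array([(i, j) for j in range(m + 1) for i in range(m + 1)], float)
    idx = lambda i, j: j * (m + 1) + i        # lexicographic node numbering
    tris = []
    for j in range(m):
        for i in range(m):
            a, b = idx(i, j), idx(i + 1, j)           # bottom corners
            c, d = idx(i, j + 1), idx(i + 1, j + 1)   # top corners
            if (i + j) % 2 == 0:   # diagonal from a to d
                tris += [(a, b, d), (a, d, c)]
            else:                  # diagonal from b to c
                tris += [(a, b, c), (b, d, c)]
    return nodes, np.array(tris)

nodes, tris = crisscross_mesh(8)
```

Each pixel square contributes two triangles of area one half, so the mesh of an \(m \times m\) image consists of \(2m^2\) triangles covering \(\varOmega \) exactly.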
In the white noise case, we use the parameters \(\alpha = 75, \beta = 1, \varepsilon = 10^{-2}\) and \(\eta = \varepsilon ^{1.1}\); in the motion blur case, we change the parameters to \(\alpha = 50, \beta = 10\) for the cameraman and \(\alpha = 20, \beta = 10\) for the MRI. Since in the latter case the noise component \(\xi \) depends on the original image, a change of parameters is justified.
Moreover, we emphasize that depending on the form of the operator L it can be difficult to obtain the mass matrix with entries \((L {{\phi }}_i, L {{\phi }}_j)_{L^2(\varOmega )}\) exactly, where the \({{\phi }}_i\) are nodal basis functions. Hence we work with the interpolations \(I_\mathcal {T} L {{\phi }}_i\) instead, where \(I_\mathcal {T}: C(\bar{\varOmega }) \rightarrow C(\bar{\varOmega })\) denotes the nodal interpolation operator defined by \(I_\mathcal {T} f := \sum _{i} f(x_i)\, {{\phi }}_i\)
with \((x_i)_{i = 1, \dots , (n+1)(m+1)}\) being the coordinates of the nodes.
5.4 Pointwise Quantiles
First, we consider the pointwise quantiles in order to detect areas where an edge is likely to be found. As level of significance, we take \(s = 0.9\) for the white noise cases. For the motion blur case, we take \(s = 0.5\)—obtaining the median—and \(s = 0.1\) to better study the behavior of the quantiles. The visual results are depicted in Fig. 5.
We note that the quantiles in the white noise case do not differ significantly from the edge indicators of the original images. In fact, in samples of edge indicators one can see small dark spots generated by the algorithm. Since they are not very distinct and their position is random, the quantile does not react sensitively to them. A more sensitive reaction can be seen for the expected values in the following section.
In the motion blur model, the reconstruction behavior has an influence. Since our blurring operator and the actually applied \(\widehat{L}_\ell \) differ, the reconstruction part tries to transport intensity to where it would normally not belong. This behavior leads to high oscillations (especially for small \(\ell \)) as well as to doubling effects (especially for large \(\ell \)) in the corresponding reconstructed image. This results in the horizontal motion of strongly indicated edges as well as in weak, blurred edges. Only very small parts of the edges—mainly horizontal ones—remain at the same position.
In order to further clarify this effect, we resort to an additional synthetic test image similar to the one from [27] for the parameters \(\alpha = 20\), \(\beta = 10\) and evaluate the quantiles using only 200 samples for a descending series of levels of significance. The results can be found in Fig. 6.
One observes that for high levels of significance (\(s = 0.9\) and \(s = 0.75\)) the only clearly recognizable edges are the horizontal ones (the upper and lower edge of the bar, the lower edge of the triangle, and the upper and lower poles of the circle). When reducing s, the moving edges gain importance, which results in gray areas around the places corresponding to the circle and the triangle. As can be seen in the result for \(s = 0.3\), the presence of distinct edges next to each other leads—due to the motion blur—to an overlap. This results in the detection of an edge at a position where no edge occurs in the original image at all. For the low levels of significance \(s = 0.2\) and \(s = 0.1\), the edges corresponding to the original image are duplicated in horizontal direction around their original positions. In the last image, this results in a broad band of edges.
Returning now to the images considered in Fig. 5, we can build on the above observations to gain an improved understanding of the mechanisms leading to these results. As we have seen, edges in different directions appear in the quantile for very different values of s. Thus, we use here the two levels \(s = 0.5\) and \(s=0.1\) to get a better understanding. Again, we observe for the first value only gray areas as well as prolonged horizontal edges (e.g., the shoulder of the cameraman and the top of the head in the MRI example). For the lower value \(s = 0.1\), we observe doubled edges around the ones of the original images. In the cameraman image, gray horizontal ‘oscillations’ can be observed at the left and right boundaries. This effect occurs since the image function has been extended by 0 outside of \(\varOmega \). Since the background is originally gray, we generate an edge here (in contrast to the MRI image, where the background was already black in the first place). This also leads to some oscillatory effects on the left side due to “collisions” of the boundary and the man. We note that, since our synthetic test image is an extreme example, we do not obtain such large bands of edges here, but we still find duplications of edges.
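The pointwise quantile evaluation discussed above can be sketched as follows. The NumPy-based realization, the array shapes, and the identification of the level of significance s with the quantile order are our illustrative assumptions, not the computational setup used for the experiments.

```python
import numpy as np

def pointwise_quantile(samples, s):
    """Pointwise s-quantile of a stack of edge indicators.

    samples: array of shape (N, H, W) with values in [0, 1], one
             edge indicator per degraded-image sample (0 = edge).
    s:       level of significance; for smaller s, edges detected
             in fewer samples remain visible in the result.
    """
    return np.quantile(samples, s, axis=0)

# Synthetic demonstration: a vertical edge detected in 40% of samples
samples = np.ones((200, 8, 8))
samples[:80, :, 4] = 0.0           # edge present in 80 of 200 samples

q_strict = pointwise_quantile(samples, 0.9)    # edge is suppressed
q_lenient = pointwise_quantile(samples, 0.3)   # edge remains visible
```

In this toy setting, the strict level \(s = 0.9\) returns 1 (no edge) at the edge pixels, while \(s = 0.3\) returns 0 there, mirroring how lowering s lets the moving, less persistent edges gain importance.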
5.5 Expected Values
In this section, we utilize the control variates technique to obtain the expected values of the edge indicators. Besides the mean values themselves, we also want to assess the variance reduction effect. In our experiments, we use a total of 3 (respectively, 4) control variates \((y_j)\) for the noise-only model (respectively, the motion blur model). Their choice is heuristic and specified in Table 1. For the noise case, we choose a different, filter-based edge detection approach, which is considerably cheaper than the Mumford–Shah and Ambrosio–Tortorelli-based methods. Specifically, we use the so-called Marr–Hildreth filter, which reads
$$g \mapsto \Delta (k_\sigma * g),$$
where \(k_\sigma := \exp (-\frac{1}{2\sigma }|x|^2)\) denotes the Gaussian kernel. The idea is to apply a smoothing step in order, on the one hand, to reduce the possible influence of noise and, on the other hand, to increase the regularity from an image signal in \(L^2(\mathbb {R}^2)\) to a smoothed signal in \(C^\infty (\mathbb {R}^2)\). The subsequent application of the Laplacian seeks to find sources and valleys in the gradient map, which correspond to rapid changes in the gradient. Due to its structure, the Marr–Hildreth filter is also called the Laplacian of Gaussian in the literature. For more details, the interested reader is referred to [18, Section 2.6.2] and the references therein.
As control variates, we then choose its absolute value as well as the exponential of its weighted negative. In particular, the latter has a chance to behave, at least locally, similarly to the edge indicator obtained by our model. Additionally, we use the given image itself in order to capture the noise applied to it.
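A minimal, NumPy-only sketch of the Marr–Hildreth filter described above is given below; the truncated separable Gaussian, the kernel radius, the edge padding, and the five-point Laplacian are our discretization assumptions for illustration, not the implementation used in the experiments.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    # Truncated, normalized 1-D Gaussian kernel
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth(g, sigma):
    # Separable Gaussian smoothing with replicating (edge) padding
    r = int(4 * sigma)
    k = gaussian_kernel1d(sigma, r)
    pad = np.pad(g, r, mode="edge")
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, tmp)

def marr_hildreth(g, sigma):
    # Laplacian of Gaussian: smooth first, then apply the discrete
    # five-point Laplacian; sign changes mark rapid gradient changes
    u = smooth(g, sigma)
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1]
                       + u[1:-1, 2:] + u[1:-1, :-2]
                       - 4.0 * u[1:-1, 1:-1])
    return lap

# Demonstration on a vertical step edge: the response changes sign
# across the step and vanishes in flat regions
g = np.zeros((32, 32))
g[:, 16:] = 1.0
mh = marr_hildreth(g, 2.0)
```

Control variates such as `np.abs(mh)` or an exponential of a weighted negative of the response could then be formed from this output; the concrete weights are not reproduced in this excerpt.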
For motion blur, a direct application of the above technique would be inappropriate. Due to the presence of the operator L, we first apply a reconstruction step to the image based on the following consideration: given the edge indicator, one would only need to solve the linear equation (2a) to obtain the restored image. Since the true edge indicator is not available and expensive to compute, we perform the reconstruction with respect to the two extreme cases, \(z = 1\) corresponding to the a priori assumption that no edge occurs in the image at all, and \(z = 0\) corresponding to the case of edges appearing everywhere in the image. Based on this, we consider the functions \(v_1,v_2 \in H^1(\varOmega )\) as solutions of the PDEs
where \(L^*\) is the adjoint of L and \(\nu \) denotes the outward unit normal to \(\partial \varOmega \). In this sense, we obtain two reconstructions of the image with respect to two different parameter choices. As in the noise model case, we again apply the Marr–Hildreth filter to \(v_1,v_2\) and use their absolute values as well as the exponentials of their weighted negatives.
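To illustrate the reconstruction step for the extreme case \(z \equiv 1\), the following sketch assumes that the optimality condition takes the screened form \(L^*Lv_1 - \alpha \Delta v_1 = L^*g\) and, deviating from the Neumann condition indicated above, imposes periodic boundary conditions so that blur and Laplacian diagonalize under the FFT; both the form of the equation and the discretization are our assumptions, since the precise PDEs are not reproduced in this excerpt.

```python
import numpy as np

def reconstruct_z1(g, kernel, alpha):
    """Sketch of the v1 reconstruction for the extreme case z == 1.

    Solves L*L v - alpha * Laplacian(v) = L* g in Fourier space,
    assuming periodic boundary conditions (the text uses Neumann ones).
    """
    H, W = g.shape
    Lhat = np.fft.fft2(kernel, s=(H, W))        # symbol of the blur L
    wy = 2.0 * np.pi * np.fft.fftfreq(H)
    wx = 2.0 * np.pi * np.fft.fftfreq(W)
    # Symbol of the negative five-point Laplacian on the periodic grid
    lap = (2.0 - 2.0 * np.cos(wy))[:, None] + (2.0 - 2.0 * np.cos(wx))[None, :]
    vhat = np.conj(Lhat) * np.fft.fft2(g) / (np.abs(Lhat) ** 2 + alpha * lap)
    return np.real(np.fft.ifft2(vhat))

# Sanity check: identity blur (Dirac kernel) and tiny alpha return
# the data almost unchanged
rng = np.random.default_rng(0)
g = rng.random((16, 16))
kernel = np.zeros((3, 3))
kernel[0, 0] = 1.0
v = reconstruct_z1(g, kernel, 1e-8)
```

The \(v_2\) reconstruction for \(z \equiv 0\) would be treated analogously with the coefficient of the gradient term adjusted; we omit it here since the corresponding regularization is not reproduced in this excerpt.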
To study the variance reduction effect, we compare the variances of the edge indicator z with those of the modified variable \(z_{cv}\). Since our approach is based on the pointwise application of control variates, we moreover investigate where the variance has been reduced. To this end, we plot the difference of the pointwise variances \(\mathbb {V}\left[ {z}\right] - \mathbb {V}\left[ {z_{cv}}\right] \). We remark that mathematically this difference can never be negative; however, since our multiplier and our variances are only estimates, scattered negative values can occur. Hence, the plots have been truncated at zero from below to give a clearer view. Note that in Fig. 7 the colormap is inverted for better visibility.
A total of 1000 samples was generated, 100 of which were used to approximate the multipliers for the control variates. Consequently, the remaining 900 were used for the estimation of the mean, the variance, and the difference of the pointwise variances.
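The pointwise control-variate estimator with a pilot set for the multipliers can be sketched as follows. For simplicity, we use a single control variate and treat its pointwise mean as known, which is an assumption of this sketch; the array shapes and the synthetic samples are likewise illustrative.

```python
import numpy as np

def control_variate_mean(z, y, mu_y, n_pilot):
    """Pointwise control-variate estimate of E[z] and Var[z_cv].

    z, y:    arrays of shape (N, H, W); z are edge-indicator samples,
             y are samples of one control variate with known pointwise
             mean mu_y (an assumption of this sketch).
    n_pilot: number of samples used only to estimate the multiplier.
    """
    zp, yp = z[:n_pilot], y[:n_pilot]
    # Pointwise (near-)optimal multiplier c* = Cov(z, y) / Var(y)
    cov = ((zp - zp.mean(0)) * (yp - yp.mean(0))).mean(0)
    var = yp.var(0)
    c = np.where(var > 0, cov / np.where(var > 0, var, 1.0), 0.0)
    # Modified variable: same mean as z, but reduced variance
    z_cv = z[n_pilot:] - c * (y[n_pilot:] - mu_y)
    return z_cv.mean(0), z_cv.var(0)

# Demonstration with strongly correlated synthetic samples
rng = np.random.default_rng(0)
y = rng.standard_normal((1000, 4, 4))
z = 0.5 + 0.8 * y + 0.1 * rng.standard_normal((1000, 4, 4))
mean_cv, var_cv = control_variate_mean(z, y, np.zeros((4, 4)), 100)
```

In this synthetic example the pointwise variance drops from roughly \(0.65\) to roughly \(0.01\), while the mean estimate remains unbiased.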
For the noise case, we see in Fig. 8 that the background is considerably darker in comparison to the quantiles. This behavior is caused by the isotropic distribution of the little dark spots generated by the noise. Besides that, there is again a strong similarity to the edge indicator of the original image. Moreover, a noticeable variance reduction can be observed. Considering the plots of the difference of the pointwise variances, we see a larger improvement in regions where many edges are close to each other.
For the motion blur model, we see horizontally elongated edges. On closer inspection, one finds several edges parallel and close to the original ones, forming a weak, blurred band. This is caused by the motion of edges in the horizontal direction described in the previous section. A significantly higher variance reduction effect can be observed in Table 2, which is not surprising, as the Mumford–Shah model has been designed to reduce the effect of noise. Regarding the difference of the pointwise variances, we find the highest reductions in close proximity to the edges of the original image.
6 Conclusion
In this paper, we studied the influence of errors in images on edges resulting from an image segmentation procedure. Due to the inherent difficulty of performing uncertainty quantification for a geometric quantity, we transitioned to the Ambrosio–Tortorelli problem to profit from the accessible vector space structure. Despite the lack of convexity or unique solutions, we were able to establish a rigorous theoretical foundation for selections of solutions. To perform a quantitative treatment, we proposed numerical methods addressing quantiles and the expected values and applied them to practically meaningful images and instances of the considered degradation model.
Since uncertainty quantification of geometric objects is a relatively new and scarcely investigated topic, we offer here a possible access point for researchers dealing with problems having a similar geometric structure. From the practical viewpoint, our results might be of importance to obtain a qualitative understanding of the uncertainty associated with (reconstructed) edges in certain medical or machine vision applications.
Our investigation opens several new research perspectives. On the one hand, one may incorporate more general degradation operators in the Mumford–Shah formulation. This leads to analytical difficulties for the existence of solutions as well as numerical challenges.
On the numerical level, the use of adaptive mesh refinement is of interest due to the anisotropic nature of the error indicator as well as the phase field approach. Further, given the large amount of data generated from images, the use of compression techniques such as low-rank approximations might be attractive. However, in both cases the interplay with the uncertainties might lead to difficulties and needs careful treatment.
7 A Word of Caution at the End
As an addendum, we want to emphasize that the white noise model considered in this work cannot be interpreted as a pixelwise projection of a random variable with values in \(L^2(\varOmega )\). In the following, we make use of techniques from probability theory, which can, for example, be found in [29]. In contrast to before, let \(\varOmega = (0,1)^2\) and define for \(n \in \mathbb {N}\) a uniform grid \(Q^n = {(Q^n_{jk})}^n_{j,k = 1}\) such that the area of the square around every pixel is \(\tfrac{1}{n^2}\). Precisely speaking, we are going to show that there exists no random variable \(\xi ^* \in {L^{2}_\mathbb {P}}\) such that the white noise \(\xi _Q\) with respect to this grid has the same distribution as \(\Pi _Q \xi ^*\), where the operator \(\Pi _Q \in \mathcal {L}(L^2(\varOmega ))\) is the piecewise projection onto the pixel grid defined by
$$\Pi _Q v := \sum _{j,k = 1}^{n} \frac{1}{\lambda ^2(Q^n_{jk})} \Big ( \int _{Q^n_{jk}} v \,\mathrm {d}\lambda ^2 \Big )\, \mathbb {1}_{Q^n_{jk}}.$$
Having \(\lambda ^2(Q^n_{jk}) = \frac{1}{n^2}\), we introduce the orthonormal system \((e_{jk}^n)_{j,k = 1}^n\) with \(e_{jk}^n := n \mathbb {1}_{Q_{jk}^n}\) and rewrite the projection \(\Pi _{Q^n}\) as
$$\Pi _{Q^n} v = \sum _{j,k = 1}^{n} {(v, e^n_{jk})}_{L^2(\varOmega )}\, e^n_{jk}.$$
The white noise reads as
$$\xi ^n := \sum _{j,k = 1}^{n} \xi _{jk}\, e^n_{jk}$$
with \((\xi _{jk})_{j, k = 1}^n\) a family of independent, identically distributed random variables with standard normal distribution. It is straightforward to show that \(\Pi _{Q^{n}}\xi ^{2n} = \xi ^n\) holds for all n. For proving the non-existence of \(\xi ^*\) as described above, we assume the contrary. Using the fact that \({\text {diam}}(Q_{jk}^n) = \frac{\sqrt{2}}{n}\) goes to zero for \(n \rightarrow \infty \) one can show by the density of smooth functions in \(L^2(\varOmega )\) combined with a version of the Banach–Steinhaus theorem (cf. [42, Korollar IV 2.5]) that the operators \(\Pi _{Q^n}\) converge pointwise to the identity on \(L^2(\varOmega )\).
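The refinement property \(\Pi _{Q^n}\xi ^{2n} = \xi ^n\) (in distribution) and the growth \(\mathbb {E}\Vert \xi ^n\Vert _{L^2}^2 = n^2\), which already hints at the non-existence of an \(L^2\) limit, can be checked numerically. The pixelwise scaling by the factor n follows the orthonormal system above; the NumPy realization is our own illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64

# White noise w.r.t. the fine grid Q^{2n}: coefficients xi_jk on the
# basis e^{2n}_jk = 2n * indicator, i.e., pixel values 2n * xi_jk
xi_fine = 2 * n * rng.standard_normal((2 * n, 2 * n))

# Projection onto Q^n: average over each 2x2 block of fine pixels
coarse = xi_fine.reshape(n, 2, n, 2).mean(axis=(1, 3))

# Coefficients w.r.t. e^n_jk = n * indicator: each coarse coefficient
# equals (xi_1 + xi_2 + xi_3 + xi_4) / 2 and is again standard normal
coeff = coarse / n

# Squared L^2 norm of xi^n: pixel values squared times cell area 1/n^2;
# its expectation equals n^2 and hence blows up under grid refinement
norm_sq = (coarse ** 2).sum() / n ** 2
```

The empirical variance of the coarse coefficients is close to 1, confirming that the projected fine white noise is again grid white noise, while the squared norm grows like \(n^2\).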
Hence, \(\Pi _{Q^n} \xi ^* \rightarrow \xi ^*\) in \(L^2(\varOmega )\) \(\mathbb {P}\)-almost surely. In particular, this yields that \(\xi ^n\), having the same distribution as \(\Pi _{Q^n} \xi ^*\), must converge in distribution to \(\xi ^*\), meaning that the corresponding probability distributions \(\mu _n\) are weakly convergent in the sense of measures. This implies in particular the pointwise convergence of the respective characteristic functions, reading for an arbitrary \(v \in L^2(\varOmega )\) as
$$\varphi _n(v) := \mathbb {E}\big [\exp \big (i\, {(\xi ^n, v)}_{L^2(\varOmega )}\big )\big ]$$
towards the characteristic function of \(\xi ^*\). The computation of \(\varphi _n\) yields
$$\varphi _n(v) = \prod _{j,k = 1}^{n} \mathbb {E}\big [\exp \big (i\, \xi _{jk}\, {(e^n_{jk}, v)}_{L^2(\varOmega )}\big )\big ] = \exp \Big ({-\tfrac{1}{2} \big \Vert \Pi _{Q^n} v \big \Vert _{L^2(\varOmega )}^2}\Big ).$$
As we have seen above, the linear operators \(\Pi _{Q^n}\) converge pointwise to the identity, such that we deduce the convergence
$$\varphi _n(v) \longrightarrow \exp \Big ({-\tfrac{1}{2} \Vert v \Vert _{L^2(\varOmega )}^2}\Big ) \quad \text {for every } v \in L^2(\varOmega ).$$
Since we have assumed that the \(\mu _n\) converge weakly, this implies that the characteristic function of the distribution of \(\xi ^*\) must be \(\exp \left( -\tfrac{1}{2}\Vert v\Vert _H^2\right) \). But according to [31, Proposition 1.2.11], there exists no probability measure with this particular characteristic function. This yields the anticipated contradiction.
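The obstruction can also be seen directly from the covariance structure; the following short computation is our supplementary remark and not part of the original argument. A measure with characteristic function \(\exp (-\tfrac{1}{2}\Vert v\Vert ^2)\) would be a Gaussian with identity covariance, but covariance operators of Gaussian measures on a separable Hilbert space must be trace class, while for any orthonormal basis \((h_k)_{k \in \mathbb {N}}\) of \(L^2(\varOmega )\)

```latex
\operatorname{tr}(\operatorname{Id})
  = \sum_{k=1}^{\infty} {\left( \operatorname{Id} h_k, h_k \right)}_{L^2(\varOmega)}
  = \sum_{k=1}^{\infty} 1 = \infty .
```

Equivalently, a candidate \(\xi ^*\) would have to satisfy \(\mathbb {E}\Vert \xi ^*\Vert _{L^2(\varOmega )}^2 = \operatorname {tr}(\operatorname {Id}) = \infty \), contradicting \(\xi ^* \in L^2_{\mathbb {P}}\).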
Hence we deduce that the term white noise is only justified with respect to a given pixel grid.
References
Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, Berlin (2006)
Attouch, H., Buttazzo, G., Michaille, G.: Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization, MOS-SIAM Series on Optimization, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2014)
Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Pure and Applied Mathematics. Elsevier Science, Amsterdam (2003)
AFEM software package. www.math.hu-berlin.de/~ccafm/teachingAdvanced/Tutorials/afem.php. Last visited 10th of May 2019
Aubert, G., Faugeras, O., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Applied Mathematical Sciences. Springer, New York (2008)
Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Oxford Science Publications, Clarendon Press, London (2000)
Albert, A.: Regression and the Moore–Penrose Pseudoinverse. Mathematics in Science and Engineering. Elsevier Science, Amsterdam (1972)
Ambrosio, L.: A compactness theorem for a new class of functions of bounded variation. Bollettino della Unione Matematica Italiana 3(4), 857–881 (1989)
Ambrosio, L.: Existence theory for a new class of variational problems. Arch. Ration. Mech. Anal. 111(4), 291–322 (1990)
Ambrosio, L., Tortorelli, V.M.: Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence. Commun. Pure Appl. Math. 43(8), 999–1036 (1990)
Ambrosio, L., Tortorelli, V.M.: On the approximation of free discontinuity problems. Boll. Unione Mat. Ital. VII. Ser. B 6(1), 105–123 (1992)
Bankman, I.: Handbook of Medical Image Processing and Analysis. Academic Press Series in Biomedical Engineering, 2nd edn. Elsevier Science, Amsterdam (2008)
Beer, G.: Topologies on Closed and Closed Convex Sets. Mathematics and Its Applications. Springer, Berlin (1993)
Burke, S., Ortner, C., Süli, E.: An adaptive finite element approximation of a variational model of brittle fracture. SIAM J. Numer. Anal. 48(3), 980–1012 (2010)
Chambolle, A.: Image segmentation by variational methods: Mumford and Shah functional and the discrete approximations. SIAM J. Appl. Math. 55(3), 827–863 (1995)
Caselles, V., Kimmel, R., Sapiro, G., Sbert, C.: Minimal surfaces based object segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 19(4), 394–398 (1997)
Cohen, L.D.: Avoiding local minima for deformable curves in image analysis. In: Le Méhauté, A., Rabut, C., Schumaker, L.L. (eds.) Curves and Surfaces with Applications in CAGD, pp. 77–84. Vanderbilt University Press, Nashville (1997)
Chan, T.F., Shen, J.: Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. Other Titles in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia (2005)
Chen, Y.-X., Zhu, W.-J.: Note on the strong law of large numbers in a Hilbert space. Gen. Math. 19(3), 11–18 (2011)
De Giorgi, E., Carriero, M., Leaci, A.: Existence theorem for a minimum problem with free discontinuity set. Arch. Ration. Mech. Anal. 108(4), 195–218 (1989)
Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. Studies in Advanced Mathematics. Taylor & Francis, New York (1991)
Geman, D., Geman, S.: Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6(6), 721–741 (1984)
Haehnle, J.: Numerical Analysis of the Mumford–Shah and Mumford–Shah–Euler Functionals for Sphere-Valued Functions, and Applications to Numerical Image Processing. Dissertation, Universität Tübingen (2010)
Hintermüller, M., Laurain, A.: Multiphase image segmentation and modulation recovery based on shape and topological sensitivity. J. Math. Imaging Vis. 35(1), 1–22 (2009)
Hintermüller, M., Ring, W.: A second order shape optimization approach for image segmentation. SIAM J. Appl. Math. 64(2), 442–467 (2003)
Hintermüller, M., Ring, W.: An inexact Newton-CG-type active contour approach for the minimization of the Mumford–Shah functional. J. Math. Imaging Vis. 20, 19–42 (2004)
Hintermüller, M., Stadler, G.: An infeasible primal–dual algorithm for total bounded variation-based inf-convolution-type image restoration. SIAM J. Sci. Comput. 28(1), 1–23 (2006)
Hytönen, T., van Neerven, J., Veraar, M., Weis, L.: Analysis in Banach Spaces: Volume I—Martingales and Littlewood–Paley Theory. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge/A Series of Modern Surveys in Mathematics. Springer International Publishing, Cham (2016)
Klenke, A.: Probability Theory: A Comprehensive Course. Universitext. Springer, London (2013)
Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1, 321–331 (1987)
Maniglia, S., Rhandi, A.: Gaussian measures on separable Hilbert spaces and applications. Quaderni del Dipartimento di Matematica dell’Università del Salento 2004(1) (2004)
Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
Morel, J.-M., Solimini, S.: Variational Methods in Image Segmentation, Volume 14 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser Boston Inc., Boston (1995). With seven image processing experiments
Owen, A.B.: Monte Carlo Theory, Methods and Examples. Draft, available at http://statweb.stanford.edu/~owen/mc/ (2013)
Parker, J.R.: Algorithms for Image Processing and Computer Vision. EBL-Schweitzer, 2nd edn. Wiley, New York (2010)
Pätz, T., Kirby, R.M., Preusser, T.: Ambrosio–Tortorelli segmentation of stochastic images: model extensions, theoretical investigations and numerical methods. Int. J. Comput. Vis. 103(2), 190–212 (2013)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Grundlehren der mathematischen Wissenschaften. Springer, Berlin (2009)
Smith, R.C.: Uncertainty Quantification: Theory, Implementation, and Applications. Computational Science and Engineering. SIAM, Philadelphia (2013)
Sullivan, T.J.: Introduction to Uncertainty Quantification. Texts in Applied Mathematics. Springer International Publishing, Cham (2015)
Treves, F., Smith, P.A., Eilenberg, S.: Topological Vector Spaces, Distributions and Kernels: Pure and Applied Mathematics, vol. 25. Elsevier Science, Amsterdam (2016)
van der Vaart, A.W.: Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (2000)
Werner, D.: Funktionalanalysis. Springer-Lehrbuch. Springer, Berlin (2006)
Acknowledgements
The authors thank the anonymous referees for many constructive comments and suggestions that helped to improve the paper.
Funding
Open Access funding enabled and organized by Projekt DEAL.
The first two authors acknowledge financial support by the DFG under the Grant HI 1466/10-1 and the third author under SU 963/1-1, both associated with the project “Generalized Nash Equilibrium Problems with Partial Differential Operators: Theory, Algorithms, and Risk Aversion” within the priority programme SPP1962 “Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation and Hierarchical Optimization.” Moreover, the first author received funding by the Einstein Center Mathematics research project (ECMath CH12) as well as by the Berlin Mathematics Research Center MATH+
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Hintermüller, M., Stengl, SM. & Surowiec, T.M. Uncertainty Quantification in Image Segmentation Using the Ambrosio–Tortorelli Approximation of the Mumford–Shah Energy. J Math Imaging Vis 63, 1095–1117 (2021). https://doi.org/10.1007/s10851-021-01034-2