1 Introduction

Problems involving reconstruction tasks for functions with discontinuities appear in various biological and medical applications. Examples are the steps in the rotation of the bacterial flagellar motor [70, 78, 79], the cross-hybridization of DNA [30, 44, 77], X-ray tomography [74], electron tomography [49] and SPECT [51, 93]. An engineering example is crack detection in brittle material in mechanics [3]. Further examples can be found, for instance, in the papers [22, 25, 34, 58, 59] and the references therein. In general, signals with discontinuities appear in many applied problems. A central task is to restore the jumps, edges, change points or segments of the signals or images from the observed data. These observed data are usually measured indirectly. Furthermore, they consist of measurements on a discretized grid and are typically corrupted by noise.

In many scenarios, non-convex nonsmooth variational methods are a suitable choice for the partitioning task, i.e., the task of finding the jumps/edges/change points; see for example [13, 58, 70]. In particular, methods based on piecewise constant Mumford–Shah functionals [62, 63] have been used in various different applications. The piecewise constant Mumford–Shah model also appears in statistics and image processing where it is often called Potts model [13,14,15, 36, 72, 91]; this is a tribute to Renfrey B. Potts and his work in statistical mechanics [73]. The variational formulation of the piecewise constant Mumford–Shah/Potts model (with an indirect measurement term) is given by

$$\begin{aligned} \textstyle {\text {argmin}}_u \ \gamma \, \Vert \nabla u\Vert _{0} + \left\Vert A u - f \right\Vert _{2}^2. \end{aligned}$$
(1)

Here, A is a linear operator modeling the measurement process, e.g., the Radon transform in computed tomography (CT), or the point-spread function of the microscope in microscopy. Further, f is an element of the data space, e.g., a sinogram or part of it in CT, or the blurred microscopy image in microscopy. The mathematically precise definition of the jump term \(\Vert \nabla u \Vert _{0}\) in the general situation is rather technical. However, if u is piecewise constant and the discontinuity set of u is sufficiently regular, say, a union of \(C^1\) curves, then \(\Vert \nabla u\Vert _{0}\) is just the total arc length of this union. In general, the gradient \(\nabla u\) is given in the distributional sense and the boundary length is expressed in terms of the \((d-1)\)-dimensional Hausdorff measure. When u is not piecewise constant, the jump penalty is infinite [75]. The second term measures the fidelity of a solution u to the data f. The parameter \(\gamma > 0\) controls the balance between data fidelity and jump penalty. (A wider class of Mumford–Shah models can be obtained by replacing the squared \(L^2\) distance by more general data terms such as other norm-based expressions or divergences.)

The piecewise constant Mumford–Shah/Potts model can be interpreted in two ways. On the one hand, if the imaged object is (approximately) piecewise constant, then the solution is an (approximate) reconstruction of the imaged object. On the other hand, since a piecewise constant solution directly induces a partitioning of the image domain, it can be seen as joint reconstruction and segmentation. Executing reconstruction and segmentation jointly typically leads to better results than performing the two steps successively [51, 74, 75, 85]. We note that in order to deal with the discrete data, the energy functional is typically discretized; see Sect. 2.1. Some references concerning Mumford–Shah functionals are [2, 8, 18, 33, 45, 67, 75] and also the references therein; see also the book [1]. The piecewise constant Mumford–Shah functionals are perhaps among the most well-known representatives of the class of free-discontinuity problems introduced by De Giorgi [29].

The analysis of the nonsmooth and non-convex problem (1) is rather involved. We discuss some analytic aspects. We first note that without additional assumptions the existence of minimizers of (1) is not guaranteed in a continuous domain setting [32, 33, 75, 84]. To ensure the existence of minimizers, additional penalty terms such as an \(L^p\) (\(1<p<\infty \)) term of the form \(\Vert u\Vert _p^p\) [74, 75] or pointwise boundedness constraints [45] have been considered. We note that the existence of minimizers is guaranteed in the discrete domain setup for typical discretizations [33, 84]. Another important topic is to verify that the Potts model is a regularization method in the sense of inverse problems. The first work dealing with this task is [75]: The authors assume that the solution space consists of non-degenerate piecewise constant functions with at most k (arbitrary, but fixed) different values which are additionally bounded. Under relatively mild assumptions on the operator A, they show stability. Further, by giving a suitable parameter choice rule, they show that the method is a regularizer in the sense of inverse problems. Related references are [45, 50] with the latter including (non-piecewise constant) Mumford–Shah functionals. We note that Mumford–Shah approaches (including the piecewise constant Mumford–Shah variant) also regularize the boundaries of the discontinuity set of the underlying signal [45].

Solving the Potts problem is algorithmically challenging. For \(A = \mathrm {id},\) it is NP-hard for multivariate domains [13, 87], and, for general linear operators A,  it is even NP-hard for univariate signals [84]. Thus, finding a global minimizer within reasonable time seems to be unrealistic in general. Nevertheless, due to its importance, many approximate strategies for multivariate Potts problems with \(A = \mathrm {id}\) have been proposed. (We note that the case \(A = \mathrm {id}\) is important as well since it captures the partitioning problem in image processing.) For the Potts problem with a general operator A, considerably fewer approaches exist, in particular in the multivariate situation. For a more detailed discussion, we refer to the paragraph on algorithms for piecewise constant Mumford–Shah problems below. A further discussion of methods for reconstructing piecewise constant signals may be found in [59]. In [90], we have considered the univariate Potts problem for a general operator A and have proposed a majorization–minimization strategy which we called iterative Potts minimization in analogy to iterative thresholding schemes. In this work, we develop iterative Potts minimization schemes for the more demanding multivariate situation, which is important for applications in imaging.

Existing Algorithmic Approaches to the Piecewise Constant Mumford–Shah Problem and Related Problems We first consider the Potts problem for a general operator A. In [5], Bar et al. consider an Ambrosio–Tortorelli-type approximation. Kim et al. use a level-set-based active contour method for deconvolution in [48]. Ramlau and Ring [74] employ a related level-set approach for the joint reconstruction and segmentation of X-ray tomographic images; further applications are electron tomography [49] and SPECT [51]. The authors of the present paper have proposed a strategy based on the alternating direction method of multipliers in [84] for the univariate case and in [85] for the multivariate case.

Fornasier and Ward [33] rewrite Mumford–Shah problems as a pointwise penalized problem and derive generalized iterative thresholding algorithms for the rewritten problems in the univariate situation. Further, they show that their method converges to a local minimizer in the univariate case. Their approach principally carries over to the piecewise constant Mumford–Shah functional as explained in [84, 90] and then results in an \(\ell ^0\) sparsity problem. In the univariate situation, this NP-hard optimization problem is unconstrained and may be addressed by iterative hard thresholding algorithms for \(\ell ^0\) penalizations, analyzed by Blumensath and Davies in [9, 10]. (Note that related algorithms based on iterative soft thresholding for \(\ell ^1\) penalized problems have been considered by Daubechies, Defrise and De Mol in [28].) Artina et al. [3] in particular consider the multivariate discrete Mumford–Shah model using the pointwise penalization approach of [33]. In the multivariate setting, this results in a corresponding non-convex and nonsmooth problem with linear constraints. The authors successively minimize local quadratic and strictly convex perturbations (depending on the previous iterate) of a (fixed) smoothed version of the objective by augmented Lagrangian iterations which themselves can be accomplished by iterative thresholding via a Lipschitz continuous thresholding function. They show that the accumulation points of the sequences produced by their algorithm are constrained critical points of the smoothed problem. In the multivariate situation, a similar approach for rewriting the Potts problem results in an \(\ell ^0\) sparsity problem with additional equality constraints. Algorithmic approaches for such \(\ell ^0\) sparsity problems with equality constraints are the penalty decomposition methods of [60, 61, 96]. The connection with iterative hard thresholding is that the inner loop of the employed two-stage process usually is of iterative hard thresholding type. The difference between the hard-thresholding-based methods and our approach in this paper is that we do not have to deal with constraints and the full matrix A, but with the nonseparable regularizing term \(\Vert \nabla u \Vert _0\) instead of its separable counterpart \(\Vert u\Vert _0.\) Hence, we cannot use hard thresholding.

Another frequently used method in the context of the restoration of piecewise constant images is total variation minimization [76]. There, the jump penalty \(\Vert \nabla u \Vert _0\) is replaced by the total variation \(\Vert \nabla u \Vert _1.\) The arising minimization problem is convex and therefore numerically tractable with convex optimization techniques [21, 26]. Candès, Wakin and Boyd [17] use iteratively reweighted total variation minimization for piecewise constant recovery problems. Results of compressed sensing type related to the Potts problem have been derived by Needell and Ward [65, 66]: under certain conditions, minimizers of the Potts function agree with total variation minimizers. However, in the presence of noise, total variation minimizers might differ significantly from minimizers of the Potts problem, and the latter are frequently the results desired in practice. Further, algorithms based on convex relaxations of the Potts problem (1) have gained a lot of interest in recent years; see, e.g., [4, 16, 20, 37, 56, 86].

We next discuss approaches for the multivariate Potts problem in the situation \(A = \mathrm {id},\) which is particularly interesting in image processing and for which there are some further approaches. A first class of approaches is based on graph cuts. Here, the range space of u is a priori restricted to a relatively small number of values. The problem remains NP-hard, but it then allows for an approach by sequentially solving binary partitioning problems via minimal graph cut algorithms [12, 13, 52]. We point out that this approach can also deal with (possibly non-convex) data fidelity terms more general than the squared \(L^2\) data term employed in (1) (in the case \(A = \mathrm {id}\)). Another approach is to limit the number k of different values which u may take without discretizing the range space a priori. For \(k=2,\) active contours were used by Chan and Vese [24] to minimize the corresponding binary Potts model. They use a level-set function to represent the partitions which evolves according to the Euler–Lagrange equations of the Potts model. A globally convergent strategy for the binary segmentation problem is presented in [23]. The active contour method for \(k=2\) was extended to larger k in [88]. Note that for \(k >2\) the problem is NP-hard. We refer to [27] for an overview of level-set segmentation. In [40,41,42], Hirschmüller proposes a non-iterative strategy for the Potts problem which is based on cost aggregation. It has lower computational cost, but comes with lower-quality reconstructions compared with graph cuts. Due to the small number of potential values of u, these methods mainly appear in connection with image segmentation. Methods for restoring piecewise constant images without restricting the range space are proposed by Nikolova et al. [68, 69]. They use non-convex regularizers which are algorithmically approached using a graduated non-convexity approach. We note that the Potts problem (1) does not fall into the class of problems considered in [68, 69]. Last but not least, Xu et al. [94] proposed a piecewise constant model reminiscent of the Potts model that is approached by a half-quadratic splitting using a pixelwise iterative thresholding type technique. It was later extended to a method for blind image deconvolution [95].

Contributions The contributions of this paper are threefold: (i) We propose a new iterative minimization strategy for multivariate piecewise constant Mumford–Shah/Potts objective functions as well as a (still NP-hard) quadratic penalty relaxation. (ii) We provide a convergence analysis of the proposed schemes. (iii) We show the applicability of our schemes in several experiments.

Concerning (i), we propose two schemes which are based on majorization–minimization or forward–backward splitting methods of Douglas–Rachford type [57]. One scheme addresses the Potts problem directly, whereas the other treats a quadratic penalty relaxation. The solutions of the relaxed problem are themselves not feasible for the Potts problem, but they are close to a feasible solution of the Potts problem, where the closeness can be quantified. In particular, the relaxed scheme is applicable whenever a given tolerance is acceptable in the application. In contrast to the approaches in [9, 33] and [60, 61] for sparsity problems which lead to thresholding algorithms, our approach leads to non-separable yet computationally tractable problems in the backward step.

Concerning (ii), we first analyze the proposed quadratic penalty relaxation scheme. In particular, we show convergence toward a local minimizer. Due to the NP-hardness of the quadratic penalty relaxation, this convergence result is essentially the best that can be expected. Concerning the scheme for the non-relaxed Potts problem, we also perform a convergence analysis. In particular, we obtain results on the convergence toward local minimizers on subsequences. The quality of the convergence results is comparable with those in [60, 61]. We note that, compared with [60, 61], we face the additional challenge of dealing with the non-separability of the backward step. (We note that in practice we observe convergence of the whole sequence, not only on a subsequence.)

Concerning (iii), we consider problems with full and partial data. We begin by applying our algorithms to deconvolution problems; in particular, we consider the joint deblurring and denoising of images degraded by Gaussian blur and by motion blur, respectively. We further consider noisy and undersampled Radon data, together with the task of joint reconstruction, denoising and segmentation. Finally, we use our method in the situation of pure image partitioning (without blur) which is a widely considered problem in computer vision.

Organization of the Paper In Sect. 2, we derive the proposed algorithmic schemes. In Sect. 3, we provide a convergence analysis for the proposed schemes. In Sect. 4, we apply the algorithms derived in the present paper to concrete reconstruction problems. In Sect. 5, we draw conclusions.

2 Majorization–Minimization Algorithms for Multivariate Potts Problems

2.1 Discretization

We use the following finite-difference-type discretization of the multivariate Potts problem (1):

$$\begin{aligned} P_\gamma (u) = \Vert Au - f\Vert _2^2 + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u \, \right\| _0, \end{aligned}$$
(2)

where the \(a_s \in {\mathbb {Z}}^2\) come from a finite set of directions and the symbol \(\nabla _{a_s} u \, (i,j)\) denotes the directional difference \(u_{(i,j)+a_s} - u_{i,j}\) with respect to the direction \(a_s\) at the pixel \((i,j)\). The symbol \(\Vert \nabla _{a_s} u \Vert _0 \) denotes the number of nonzero entries of \(\nabla _{a_s} u.\) The simplest set of directions consists of the unit vectors \(a_1=(0,1),\) \(a_2=(1,0)\) along with unit weights. Unfortunately, when refining the grid, this discretization converges to a limit that measures the boundary in terms of the \(\ell ^1\) analogue of the Hausdorff measure [18]. The practical consequences are unwanted block artifacts in the reconstruction (geometric staircasing). More isotropic results are obtained by adding the diagonals \(a_3=(1,1),a_4=(1,-1)\) to the directions \(a_1\) and \(a_2;\) a near isotropic discretization can be achieved by extending this system by the knight moves \(a_5=(1,2),a_6=(2,1),a_7=(1,-2),a_8=(2,-1).\) (The name is inspired by the possible moves of a knight in chess.) Weights \(\omega _s\) for the system \(\{a_1,a_2,a_3,a_4\}\) of coordinate directions and diagonal directions can be chosen as \(\omega _{s} = \sqrt{2}-1 \) for the coordinate part \(s=1,2\) and \(\omega _{s} = 1-\tfrac{\sqrt{2}}{2}\) for the diagonal part \(s=3,4\). When additionally adding knight-move directions, weights \(\omega _s\) for the system \(\{a_1,\ldots ,a_8\}\) can be chosen as \(\omega _{s} = \sqrt{5} - 2\) for the coordinate part \(s=1,2,\) \(\omega _{s} = \sqrt{5} - \frac{3}{2}\sqrt{2}\) for the diagonal part \(s=3,4\), and \(\omega _{s} = \frac{1}{2}(1 + \sqrt{2} - \sqrt{5}) \) for the knight-move part \(s=5,\ldots ,8.\) There are several ways to derive weights \(\omega _s\) for the neighborhood systems: the method of [19] is based on an optimization approach, the method of [11] is based on the Cauchy–Crofton formula, and the approach of [85] is based on equating the Euclidean lengths of straight lines and the lengths of their digital counterparts. We note that for the system \(\{a_1,a_2,a_3,a_4\}\) of coordinate directions and diagonal directions the weights of [19] and of [85] coincide; the weights displayed for the knight-move case above are the ones derived by the scheme in [85]. For further details, we refer to these references.
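For illustration, the discrete energy (2) for the near isotropic neighborhood system with the weights of [85] stated above can be evaluated as in the following Python sketch; here, apply_A is a placeholder for the forward operator (e.g., a blurring operator or a discretized Radon transform), which is assumed to be supplied by the user.

```python
import numpy as np

# Directions a_1, ..., a_8 (coordinate, diagonal, knight moves) and the weights of [85].
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1), (1, 2), (2, 1), (1, -2), (2, -1)]
WEIGHTS = ([np.sqrt(5) - 2] * 2
           + [np.sqrt(5) - 1.5 * np.sqrt(2)] * 2
           + [0.5 * (1 + np.sqrt(2) - np.sqrt(5))] * 4)

def directional_jumps(u, a):
    """Number of nonzero directional differences u_{(i,j)+a} - u_{(i,j)} inside the domain."""
    da, db = a
    h, w = u.shape
    diff = (u[max(da, 0):h + min(da, 0), max(db, 0):w + min(db, 0)]
            - u[max(-da, 0):h - max(da, 0), max(-db, 0):w - max(db, 0)])
    return int(np.count_nonzero(diff))

def potts_energy(u, f, gamma, apply_A):
    """Discrete Potts energy (2): squared data term plus weighted jump penalty."""
    data_term = np.sum((apply_A(u) - f) ** 2)
    jump_term = sum(w_s * directional_jumps(u, a_s)
                    for a_s, w_s in zip(DIRECTIONS, WEIGHTS))
    return data_term + gamma * jump_term
```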

We record that the considered problem (2) has a minimizer.

Theorem 1

The discrete multivariate Potts problem (2) has a minimizer.

The validity of Theorem 1 can be seen by following the lines of the proof of [43, Theorem 2.1] where an analogous statement is shown for the (non-piecewise constant) Mumford–Shah problem.

Vector-Valued Images We briefly discuss the extension of (2) to vector-valued images and multi-channel data, e.g., (blurred) RGB color images. To this end, we assume multi-channel data \(f = (f_1,\ldots ,f_C)\) consisting of C channels and images \(u = (u_1,\ldots ,u_C)\). In this situation, the role of the first summand on the right-hand side of (2) is taken by the channel-wise sum \(\sum _{c=1}^C \Vert Au_c - f_c\Vert _2^2\). The symbol \(\nabla _{a_s} u(i,j)\) now denotes the vector of directional differences with entries \(u_{(i,j)+a_s,c} - u_{i,j,c}\), \(c=1,\ldots ,C,\) and these vectors form the rows of \(\nabla _{a_s} u\). Consequently, \(\Vert \nabla _{a_s} u \Vert _0\) denotes the number of nonzero rows of \(\nabla _{a_s}u\). As a result, introducing a jump between two pixels in all channels has the same cost as opening a jump in a single channel only. This promotes jumps that are aligned across the channels, in contrast to a channel-wise application of the single-channel Potts model (2).
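A corresponding sketch of the channel-coupled jump count (the number of nonzero rows of the directional difference) reads as follows; the image u is assumed to be stored as an array of shape (H, W, C).

```python
import numpy as np

def vector_jumps(u, a):
    """Channel-coupled jump count: a jump between two pixels is counted once,
    regardless of how many of the C channels actually jump there."""
    da, db = a
    h, w = u.shape[:2]
    diff = (u[max(da, 0):h + min(da, 0), max(db, 0):w + min(db, 0), :]
            - u[max(-da, 0):h - max(da, 0), max(-db, 0):w - max(db, 0), :])
    return int(np.count_nonzero(np.any(diff != 0, axis=-1)))
```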

2.2 Derivation of the Proposed Algorithmic Schemes

We start out with the discretization (2) of the multivariate Potts problem. We introduce S versions \(u_1,\ldots ,u_S\) of the target u and link them via equality constraints in the following consensus form to obtain the problem

$$\begin{aligned} P_\gamma (u_1,\ldots ,u_S) \rightarrow \min , \qquad \text { s.t. } \quad u_1 = \ldots = u_S, \end{aligned}$$
(3)

where the function \(P_\gamma (u_1,\ldots ,u_S)\) of the S variables \(u_1,\ldots ,u_S\) is given by

$$\begin{aligned} P_\gamma (u_1,\ldots ,u_S) = \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0. \end{aligned}$$
(4)

Note that solving (3) is equivalent to solving the discrete Potts problem (2). Further, note that we have overloaded the symbol \(P_\gamma \) which, for one argument u,  denotes the Potts function of (2) and for S arguments \(u_1,\ldots ,u_S\) denotes the energy function of (4); we have the relation \(P_\gamma (u,\ldots ,u)= P_\gamma (u)\).

A Majorization–Minimization Approach to the Quadratic Penalty Relaxation of the Potts Problem The quadratic penalty relaxation of (4) is given by

$$\begin{aligned} P_{\gamma , \rho }(u_1,\ldots ,u_S)= & {} \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0 \nonumber \\&+ \rho \sum _{1\le s < s' \le S} c_{s,s'} \ \Vert u_s - u_{s'}\Vert _2^2. \end{aligned}$$
(5)

Here, the soft constraints which replace the equalities \(u_1 = \ldots = u_S\) are realized via the squared Euclidean norms \(\sum _{1\le s < s' \le S} c_{s,s'} \ \Vert u_s - u_{s'}\Vert _2^2,\) where the nonnegative numbers \(c_{s,s'}\) denote weights (which may be set to zero if no direct coupling between the particular \(u_s, u_{s'}\) is desired). The symbol \(\rho \) denotes a positive penalty parameter enforcing the soft constraints, i.e., increasing \(\rho \) forces the \(u_s\) to be closer to each other w.r.t. the Euclidean distance. We note that we later analytically quantify the size of \(\rho \) which is necessary to obtain an a priori prescribed tolerance in the \(u_s;\) see (18). Frequently, we use the short-hand notation

$$\begin{aligned} \rho _{s,s'}= \rho \ c_{s,s'}. \end{aligned}$$
(6)

Typical choices of the \(\rho _{s,s'}\) are

$$\begin{aligned} \rho _{s,s'} = \rho \quad \text { for all } s,s', \qquad \qquad \text { or} \quad \rho _{s,s'} = \rho \ \delta _{((s+1)\mathrm{\,mod \,} S ), s'}\,, \end{aligned}$$
(7)

i.e., the constant choice (\(c_{s,s'}=1\)), as well as the coupling between consecutive variables with constant parameter (\(\delta _{s,t} =1\) if and only if \(s=t,\) and \(\delta _{s,t} =0\) otherwise). We note that in these situations only one additional positive parameter \(\rho \) appears, and that this parameter is tied to the tolerance one is willing to accept as a distance between the \(u_s;\) see Algorithm 1.
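The two choices in (7) may, for instance, be encoded as follows (the function name coupling_weights is merely illustrative; only the upper triangle \(s < s'\) enters the sums above).

```python
import numpy as np

def coupling_weights(S, mode="all_pairs"):
    """Weights c_{s,s'} for the two typical choices in (7): all-pairs coupling
    or cyclic coupling of consecutive variables."""
    c = np.zeros((S, S))
    for s in range(S):
        for t in range(s + 1, S):
            if mode == "all_pairs":
                c[s, t] = 1.0
            else:  # cyclic: couple u_s with u_{(s+1) mod S}
                c[s, t] = 1.0 if (t == s + 1) or (s == 0 and t == S - 1) else 0.0
    return c
```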

For the majorization–minimization approach, we derive a surrogate functional [28] of the function \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) of (5). For this purpose, we introduce the block matrix B and the vector g given by

$$\begin{aligned} B = \begin{pmatrix} S^{-1/2}A &{} 0 &{} &{} \cdots &{} &{} 0 \\ 0 &{} S^{-1/2}A &{} &{} \cdots &{} &{} 0 \\ \vdots &{} &{} &{} \ddots &{} &{} \vdots \\ 0 &{} 0 &{} &{} \cdots &{} S^{-1/2}A &{} 0 \\ 0 &{} 0 &{} &{} \cdots &{} 0 &{} S^{-1/2}A \\ \rho _{1,2}^{1/2}I &{} -\rho _{1,2}^{1/2}I &{} 0 &{} \ldots &{} 0 &{} 0\\ \rho _{1,3}^{1/2}I &{} 0 &{} -\rho _{1,3}^{1/2}I &{} \ldots &{} 0 &{} 0\\ &{} \vdots &{} &{} &{} \vdots &{} \\ \rho _{1,S}^{1/2}I &{} 0 &{} 0 &{} \ldots &{} 0 &{} -\rho _{1,S}^{1/2}I \\ 0 &{} \rho _{2,3}^{1/2}I &{} -\rho _{2,3}^{1/2}I &{} \ldots &{} 0 &{} 0 \\ &{} \vdots &{} &{} &{} \vdots &{} \\ 0 &{} \rho _{2,S}^{1/2}I &{} 0 &{} \ldots &{} 0 &{} -\rho _{2,S}^{1/2}I \\ &{} &{} &{} \vdots &{} &{} \\ &{} &{} &{} \vdots &{} &{} \\ 0 &{} 0 &{} 0 &{} \ldots &{} \rho _{S-1,S}^{1/2}I &{} -\rho _{S-1,S}^{1/2}I \\ \end{pmatrix}, \quad g = \begin{pmatrix} S^{-1/2}f\\ S^{-1/2}f\\ \vdots \\ S^{-1/2}f \\ S^{-1/2}f \\ 0 \\ 0\\ \vdots \\ 0 \\ 0 \\ \vdots \\ 0 \\ \vdots \\ \vdots \\ 0\\ \end{pmatrix}. \end{aligned}$$
(8)

Here, I denotes the identity matrix and 0 the zero matrix; the matrix B has S block columns and \(S+S(S-1)/2\) block rows. Further, we introduce the difference operator D given by

$$\begin{aligned} D(u_1,\ldots ,u_S) = \begin{pmatrix} \nabla _{a_1} u_1 \\ \vdots \\ \nabla _{a_S} u_S \\ \end{pmatrix} \end{aligned}$$
(9)

which applies the difference w.r.t. the sth direction \(a_s\) to the sth component \(u_s\). We employ the weights \(\omega _1,\) \(\ldots , \omega _S\) to define the quantity \(\Vert D(u_1,\ldots ,u_S)\Vert _{0,\omega }\) which counts the weighted number of jumps by

$$\begin{aligned} \Vert D(u_1,\ldots ,u_S)\Vert _{0,\omega } = \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0. \end{aligned}$$
(10)

With this notation at hand, we may rewrite the function of (5) as

$$\begin{aligned} P_{\gamma ,\rho }(u_1,\ldots ,u_S) = \left\| B(u_1,\ldots ,u_S)^\mathrm{T} - g \right\| _2^2 + \gamma \ \Big \Vert \ D(u_1,\ldots ,u_S) \ \Big \Vert _{0,\omega }. \end{aligned}$$
(11)

Using the representation (11), the surrogate functional in the sense of [28] of \(P_{\gamma ,\rho }\) is given by

$$\begin{aligned} P_{\gamma , \rho }^{\mathrm{surr}}(u_1,\ldots ,u_S,v_1,\ldots ,v_S)&= \frac{1}{L_{\rho }^2}\left\| B(u_1,\ldots ,u_S)^\mathrm{T} - g \right\| _2^2 + \frac{\gamma }{L_{\rho }^2} \ \Big \Vert \ D(u_1,\ldots ,u_S) \ \Big \Vert _{0,\omega }\\&\quad - \frac{1}{L_{\rho }^2}\left\| B(u_1,\ldots ,u_S)^\mathrm{T} - B(v_1,\ldots ,v_S)^\mathrm{T} \right\| _2^2 \nonumber \\&\quad + \left\| (u_1,\ldots ,u_S)^\mathrm{T} - (v_1,\ldots ,v_S)^\mathrm{T} \right\| _2^2. \nonumber \end{aligned}$$
(12)

Here, \(L_{\rho } \ge 1\) denotes a constant which is chosen larger than the spectral norm \(\Vert B \Vert \) of B (i.e., the operator norm w.r.t. the \(\ell ^2\) norm.) This scaling is made to ensure that \(B/L_{\rho }\) is contractive. In terms of A and the penalties \(\rho _{s,s'},\) we require that

$$\begin{aligned} L_{\rho }^2 > \Vert A\Vert _2^2/S + 2 \max _{s \in \{1,\ldots ,S \} } \sum _{s': s'\ne s}^{S} \rho _{s,s'}. \end{aligned}$$
(13)

For the particular choice \(\rho _{s,s'} = \rho \) as on the left-hand side of (7) we can choose \(L_{\rho }^2\) smaller, i.e., \( L_{\rho }^2 > \Vert A\Vert _2^2/S + S \rho . \) When coupling only neighboring \(u_s\) with the same constant \(\rho \), i.e., the right-hand coupling of (7), we have \( L_{\rho }^2 > \Vert A\Vert _2^2/S + \alpha \rho , \) where \(\alpha = 4,\) if S is even, and \(\alpha = 2 - 2 \cos \left( \frac{\pi (S-1)}{S}\right) \) if S is odd. These choices ensure that \(B/L_{\rho }\) is contractive by Lemma 1. Basics on surrogate functionals as we need them for this paper are gathered in Sect. 3.4. Further details on surrogate functionals can be found in [9, 10, 28].
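In practice, \(L_{\rho }\) may be chosen slightly above these bounds, for instance as in the following sketch; the spectral norm \(\Vert A\Vert _2\) is assumed to be known or estimated beforehand (e.g., by power iteration), and the safety margin is an arbitrary choice.

```python
import numpy as np

def choose_L_rho(norm_A, rho, S, coupling="all_pairs", margin=1.01):
    """Pick L_rho slightly above the bounds below (13) so that B / L_rho is contractive.

    'all_pairs' refers to the constant all-pairs coupling and 'cyclic' to the
    consecutive coupling of (7); norm_A is (an upper bound of) ||A||_2."""
    if coupling == "all_pairs":
        bound_sq = norm_A ** 2 / S + S * rho
    else:  # cyclic consecutive coupling
        alpha = 4.0 if S % 2 == 0 else 2.0 - 2.0 * np.cos(np.pi * (S - 1) / S)
        bound_sq = norm_A ** 2 / S + alpha * rho
    return np.sqrt(margin * bound_sq)
```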

Expanding the squared norms and using elementary properties of the inner product shows that

$$\begin{aligned}&P_{\gamma , \rho }^{\mathrm{surr}} (u_1,\ldots ,u_S,v_1,\ldots ,v_S) \nonumber \\&\quad = \bigg \Vert (u_1,\ldots ,u_S)^\mathrm{T}- \bigg ((v_1,\ldots ,v_S)^\mathrm{T}-\frac{1}{L_{\rho }^2} B^\mathrm{T} (B(v_1,\ldots ,v_S)^\mathrm{T}-g) \bigg ) \bigg \Vert ^2_2 \nonumber \\&\qquad + \frac{\gamma }{L_\rho ^2}\Big \Vert D(u_1,\ldots ,u_S) \Big \Vert _{0,\omega } + R(v_1,\ldots ,v_S) , \end{aligned}$$
(14)

where \( R(v_1,\ldots ,v_S) \) is a remainder term which is irrelevant when minimizing \(P_{\gamma ,\rho }^{\mathrm{surr}}\) w.r.t. \(u_1,\ldots ,u_S\) for fixed \(v_1,\ldots ,v_S.\) Writing this down in terms of the original system matrix A and the data f yields

$$\begin{aligned}&P_{\gamma , \rho }^{\mathrm{surr}}\left( u_1,\ldots ,u_S,v_1,\ldots ,v_S \right) \\&\quad = \sum _{s=1}^S \left[ \left\| u_s - \left( v_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A v_s - \sum _{s \ne s'}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (v_s-v_{s'}) \right) \right\| _2^2 \right. \nonumber \\&\qquad \left. + \tfrac{\gamma \omega _s}{L_{\rho }^2} \left\| \nabla _{a_s} u_s \, \right\| _0 \right] + R(v). \nonumber \end{aligned}$$
(15)

For the quadratic penalty relaxation of the Potts problem, i.e., for minimizing the problem (5), we propose to use the surrogate iteration, i.e., \(u^{(n+1)}_1,\ldots ,u^{(n+1)}_S\) \(\in {\text {argmin}}_{u_1,\ldots ,u_S} P_{\gamma , \rho }^{\mathrm{surr}} (u_1,\ldots ,\) \(u_S,u^{(n)}_1,\ldots ,u^{(n)}_S).\) Applied to (15), this surrogate iteration reads

$$\begin{aligned} \left( u^{(n+1)}_1,\ldots ,u^{(n+1)}_S \right) \in \mathop {{{\text {argmin}}}}\limits _{u_1,\ldots ,u_S} \sum _{s=1}^S \left[ \left\| u_s - h^{(n)}_s \right\| _2^2 + \tfrac{\gamma \omega _s}{L_{\rho }^2} \left\| \nabla _{a_s} u_s \, \right\| _0 \right] \end{aligned}$$
(16)

where \(h^{(n)}_s\) is given by

$$\begin{aligned} h^{(n)}_s = u^{(n)}_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A u^{(n)}_s - \sum _{s':s' \ne s}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (u^{(n)}_s-u^{(n)}_{s'}), \quad \text { for all } s \in \{1,\ldots ,S\}. \end{aligned}$$
(17)

Note that in Sect. 2.3, we derive an efficient algorithm which computes an exact minimizer of (16). Now assume that we are willing to accept a small deviation between the \(u_s,\) i.e.,

$$\begin{aligned} \Vert u_s - u_{s'}\Vert ^2_2 = \sum _{i,j}|(u_s)_{ij} - (u_{s'})_{ij}|^2 < \tfrac{\varepsilon ^2}{c_{s,s'}}, \end{aligned}$$
(18)

for \(\varepsilon >0\) and for indices \(s,s'\) with \(c_{s,s'} \ne 0.\) The following algorithm computes a result fulfilling (18).

Algorithm 1

We consider the quadratic penalty relaxation (5) of the Potts problem and a tolerance \(\varepsilon \) we are willing to accept for the targets \(u_s\). We propose the following algorithm for the relaxed Potts problem (5) (which yields a result with targets \(u_s\) deviating from each other by at most \(\varepsilon /\sqrt{c_{s,s'}}\)).

  • Set \(\rho \) according to (34), set \(L_{\rho }\) according to (13) (or, in the special cases of (7), as below (34) and (13).)

  • Initialize \(u^{(0)}_s\) as discussed in the corresponding paragraph below (e.g., \(u^{(0)}_s = 0\) for all s).

  • Iterate until convergence:

    $$\begin{aligned} \text {1.}\quad&h^{(n)}_s = u^{(n)}_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A u^{(n)}_s - \sum _{s':s' \ne s}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (u^{(n)}_s-u^{(n)}_{s'}), \quad s = 1,\ldots ,S, \nonumber \\ \text {2.}\quad&\left( u^{(n+1)}_1,\ldots ,u^{(n+1)}_S \right) \in \mathop {{{\text {argmin}}}}\limits _{u_1,\ldots ,u_S} \sum _{s=1}^S \left[ \left\| u_s - h^{(n)}_s \right\| _2^2 + \tfrac{\gamma \omega _s}{L_{\rho }^2} \left\| \nabla _{a_s} u_s \, \right\| _0 \right] . \end{aligned}$$
    (19)

We will see in Theorem 3 that this algorithm converges to a local minimizer of the quadratic penalty relaxation (5) and that the \(u_s\) are \(\varepsilon \)-close, i.e., (18) is fulfilled.
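The following Python sketch illustrates the structure of Algorithm 1; the routines apply_A, apply_At and solve_potts_2d (an exact solver for the direction-wise subproblem (28), cf. Sect. 2.3) are assumed to be supplied by the user, rho_mat is a symmetric array holding the \(\rho _{s,s'}\), and the simple stopping rule on successive iterates stands in for "iterate until convergence".

```python
import numpy as np

def iterative_potts_qp(f, apply_A, apply_At, solve_potts_2d, directions, weights,
                       gamma, rho_mat, L_rho, u_init, max_iter=500, tol=1e-6):
    """Sketch of Algorithm 1 for the quadratic penalty relaxation (5)."""
    S = len(directions)
    u = [u_init.copy() for _ in range(S)]
    for _ in range(max_iter):
        u_prev = [us.copy() for us in u]
        for s in range(S):
            # forward step (17): gradient step on the data term plus coupling term
            h_s = (u_prev[s]
                   - apply_At(apply_A(u_prev[s]) - f) / (S * L_rho ** 2)
                   - sum(rho_mat[s, t] * (u_prev[s] - u_prev[t])
                         for t in range(S) if t != s) / L_rho ** 2)
            # backward step (16): exact Potts solve w.r.t. the single direction a_s
            u[s] = solve_potts_2d(h_s, directions[s], gamma * weights[s] / L_rho ** 2)
        if max(np.linalg.norm(u[s] - u_prev[s]) for s in range(S)) < tol:
            break
    return u
```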

The Relation Between the Potts Problem and Its Quadratic Penalty Relaxation, and Obtaining a Feasible Solution for the Potts Problem (4) from the Output of Algorithm 1 As pointed out above, we show in Theorem 3 that Algorithm 1 produces a local minimizer of the quadratic penalty relaxation (5) of the Potts problem (4) and that the corresponding variables of a resulting solution are close up to an a priori prescribed tolerance. This may in practice be already enough. However, strictly speaking, a local minimizer of the quadratic penalty relaxation (5) is not feasible for the Potts problem (4).

We will now explain a projection procedure to derive a feasible solution for the Potts problem (4) from a local minimizer of (5) with nearby variables \(u_s\) (as produced by Algorithm 1). Related theoretical results are stated as Theorem 4. In particular, we will see that, in case the imaging operator A is lower bounded, the projection procedure applied to the output of Algorithm 1 yields a feasible point which is close to a local minimizer of the original Potts problem (4).

In order to explain the projection procedure, we need some notions on partitionings. Recall that a partitioning \({\mathcal {P}}\) consists of a (finite number of) segments \({\mathcal {P}}_i\) which are pairwise disjoint sets of pixel coordinates whose union equals the image domain \(\Omega ,\) i.e.,

$$\begin{aligned} \cup _{i=1}^{N_{\mathcal {P}}} {\mathcal {P}}_i = \Omega , \qquad {\mathcal {P}}_i \cap {\mathcal {P}}_j = \emptyset \quad \text {for all } i,j=1,\ldots ,N_{\mathcal {P}} \text { with } i \ne j. \end{aligned}$$
(20)

Here, we assume that each segment \({\mathcal {P}}_i\) is connected w.r.t. the neighborhood system \(a_1,\ldots ,a_S\) in the sense that there is a path connecting any two elements in \({\mathcal {P}}_i\) with steps in \(a_1,\ldots ,a_S.\)

We will need the following notion of a directional partitioning. A directional partition w.r.t. a set of S directions \(a_1,\ldots ,a_S\) consists of a set \({\mathcal {I}}\) of (discrete) intervals I, where each interval I is associated with exactly one of the directions \(a_1,\ldots ,a_S;\) here, an interval I associated with the direction \(a_s\) has to be of the form \(I = \{(i,j)+ k a_s : k = 0,\ldots , K-1\},\) where \(K \in {\mathbb {N}}\) and I is contained in the discrete domain. (For each direction \(a_s\), the corresponding intervals form an ordinary partition.) We note that Algorithm 1, which produces output \(u=(u_1,\ldots ,u_S):\Omega \rightarrow {\mathbb {R}}^S,\) induces a directional partitioning as follows. We observe that each variable \(u_s\) is associated with a direction \(a_s.\) For any \(s \in \{1,\ldots ,S\},\) we let each (maximal) interval of constancy of \(u_s\) be an interval in \({\mathcal {I}}\) associated with \(a_s.\)

Each partitioning induces a directional partitioning \({\mathcal {I}}\) by letting the intervals I of \({\mathcal {I}}\) be the stripes with direction \(a_s\) obtained from segment \({\mathcal {P}}_i\) for each direction \(s =1,\ldots , S\) and each segment \({\mathcal {P}}_i, i=1, \ldots , N_{\mathcal {P}}.\) Furthermore, each directional partitioning \({\mathcal {I}}\) induces a partitioning by the following merging process.

Definition 1

We say that pixels x, y are related, in symbols \(x \sim y\), if there is a path \(x_0=x,\ldots ,x_N=y\) connecting x and y in the sense that for any consecutive members \(x_i,x_{i+1},\) \(i=0,\ldots ,N-1,\) of the path there is an interval I of the directional partitioning \({\mathcal {I}}\) containing both \(x_i\) and \(x_{i+1}.\)

The relation \(x\sim y\) obviously defines an equivalence relation and the corresponding equivalence classes \({\mathcal {P}}_i\) yield a partitioning on \(\Omega .\) We use the symbols

$$\begin{aligned} {\mathcal {I}}({\mathcal {P}}) = {\mathcal {I}}_{{\mathcal {P}}},\qquad {\mathcal {P}}({\mathcal {I}}) = {\mathcal {P}}_{{\mathcal {I}}}, \end{aligned}$$
(21)

to denote the mappings assigning to a partitioning a directional partitioning and vice versa, respectively.

As a final preparation, we consider a function \(u=(u_1,\ldots ,u_S):\Omega \rightarrow {\mathbb {R}}^S\) as produced by Algorithm 1 and a partitioning \({\mathcal {P}}\) of \(\Omega \) and define the projection to a function \(\pi _{\mathcal {P}}(u): \Omega \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} \pi _{\mathcal {P}}(u)|_{{\mathcal {P}}_i} = \frac{\sum _{x \in {\mathcal {P}}_i} \sum _{s = 1}^S u_s(x) }{ S \ \#{\mathcal {P}}_i}, \end{aligned}$$
(22)

where the symbol \(\#{\mathcal {P}}_i\) denotes the number of elements in the segment \({\mathcal {P}}_i.\) Hence, the projection \(\pi \) defined via (22) averages w.r.t. all components of u and all members of the segment \({\mathcal {P}}_i\) and so produces a piecewise constant function w.r.t. the partitioning \({\mathcal {P}}.\)
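A sketch of the projection (22) reads as follows; the partitioning is assumed to be encoded as an integer label image of the same size as the \(u_s\).

```python
import numpy as np

def project_to_partition(u_list, labels):
    """Projection (22): average all variables u_s over each segment of the
    partitioning and return a single piecewise constant image."""
    stacked = np.stack(u_list, axis=0)            # shape (S, H, W)
    out = np.zeros(u_list[0].shape, dtype=float)
    for seg in np.unique(labels):
        mask = labels == seg
        out[mask] = stacked[:, mask].mean()       # mean over all s and all pixels of the segment
    return out
```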

Using these notions, we propose the following projection procedure.

Procedure 1

(Projection Procedure) We consider output \(u=(u_1,\ldots ,u_S):\Omega \rightarrow {\mathbb {R}}^S\) of Algorithm 1 together with its induced directional partitioning \({\mathcal {I}}.\)

  1.

    Compute the partitioning \({\mathcal {P}}({\mathcal {I}}) = {\mathcal {P}}_{{\mathcal {I}}}\) induced by the directional partitioning \({\mathcal {I}}\) as explained above (21).

  2.

    Project \(u=(u_1,\ldots ,u_S):\Omega \rightarrow {\mathbb {R}}^S\) to \(\pi _{{\mathcal {P}}_{{\mathcal {I}}}}(u)\) using (22) for the partitioning \({\mathcal {P}}({\mathcal {I}}) = {\mathcal {P}}_{{\mathcal {I}}},\) and return \(\pi _{{\mathcal {P}}_{{\mathcal {I}}}}(u)\) as output.

We note that, given the partitioning \({\mathcal {P}}_{{\mathcal {I}}},\) solving the normal equations in the space of functions constant on \({\mathcal {P}}_{{\mathcal {I}}}\) would be an alternative to the second step above; this, however, might be more expensive.
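The merging of the directional partitioning into a partitioning (Definition 1 and the first step of Procedure 1) can be sketched via a union-find over pixels: adjacent pixels in direction \(a_s\) are merged whenever \(u_s\) does not jump between them, since such pixels lie in a common interval of constancy. The resulting label image can then be passed to the projection sketch given after (22).

```python
import numpy as np

def merge_directional_partition(u_list, directions):
    """Merge the directional partitioning induced by (u_1, ..., u_S) into a
    partitioning of the pixel grid (cf. Definition 1); returns a label image."""
    h, w = u_list[0].shape
    parent = np.arange(h * w)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    for u_s, (da, db) in zip(u_list, directions):
        for i in range(h):
            for j in range(w):
                i2, j2 = i + da, j + db
                if 0 <= i2 < h and 0 <= j2 < w and u_s[i, j] == u_s[i2, j2]:
                    # both pixels lie in the same interval of constancy of u_s
                    union(i * w + j, i2 * w + j2)

    return np.array([find(i) for i in range(h * w)]).reshape(h, w)
```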

A Penalty Method for the Potts Problem Based on a Majorization–Minimization Approach for Its Quadratic Penalty Relaxation Intuitively, increasing the parameter \(\rho \) during the iterations should tie the \(u_s\) closer together so that the constraint of (3) is ultimately fulfilled; this results in an approach for the initial Potts problem (2). Recall that \(\rho _{s,s'} = \rho \ c_{s,s'}\) was defined in (6), where the \(c_{s,s'}\) are nonnegative numbers weighting the constraints. We here increase \(\rho \) while leaving the \(c_{s,s'}\) fixed during this process.

Algorithm 2

We consider the Potts problem (3) in S variables (which is equivalent to (2) as explained above). We propose the following algorithm for the Potts problem (3).

  • Let \(\rho ^{(k)}\) be a strictly increasing sequence (e.g., \(\rho ^{(k)} = \tau ^k\rho ^{(0)},\) with \(\rho ^{(0)}>0\) and \(\tau >1\)) and let \(\delta _k\) be a strictly decreasing sequence converging to zero (e.g., \(\delta _k = \delta _0/\tau ^k\)). Further, let

    $$\begin{aligned} t > 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \ \Vert f\Vert , \end{aligned}$$
    (23)

    where \(\sigma _1\) is the smallest nonzero eigenvalue of \(C^\mathrm{T}C\) with C given by (49). For the particular choice of coupling given by the left-hand and right-hand side of (7) we let

    $$\begin{aligned}&t> \tfrac{2}{S}\Vert A\Vert \ \Vert f\Vert , \quad \text { and }\quad t > 2(2-2\cos (2\pi /S))^{-1/2} S^{-1/2} \Vert A\Vert \ \Vert f\Vert , \nonumber \\ \end{aligned}$$
    (24)

    respectively. Initialize \(u^{(0)}_s := u^{(0,0)}_s\) as discussed in the corresponding paragraph below (e.g., \(u^{(0)}_s = 0\) for all s).

  • Set \(\rho = \rho ^{(0)},\ \rho _{s,s'} = \rho ^{(0)}c_{s,s'},\ \delta = \delta _0,\ k,n=0;\) set \(L_{\rho }\) according to (13) (or, in the special cases of (7), as explained below (13))

    A.

      While

      $$\begin{aligned} \left\| u^{(k,n)}_s - u^{(k,n)}_{s'} \right\|> \frac{t}{\rho \sqrt{c_{s,s'}}}, \quad \text { or } \quad \left\| u^{(k,n)}_s - u^{(k,n-1)}_s \right\| > \frac{\delta }{L_{\rho }} \end{aligned}$$
      (25)

      do

      $$\begin{aligned} \text {1.}\quad&h^{(k,n)}_s = u^{(k,n)}_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A u^{(k,n)}_s - \sum _{s':s' \ne s}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (u^{(k,n)}_s-u^{(k,n)}_{s'}),\nonumber \\&\qquad s = 1,\ldots ,S, \nonumber \\ \text {2.}\quad&\left( u^{(k,n+1)}_1,\ldots ,u^{(k,n+1)}_S \right) \in \mathop {{{\text {argmin}}}}\limits _{u_1,\ldots ,u_S} \sum _{s=1}^S \left[ \left\| u_s - h^{(k,n)}_s \right\| _2^2 + \tfrac{\gamma \omega _s}{L_{\rho }^2} \left\| \nabla _{a_s} u_s \, \right\| _0 \right] , \end{aligned}$$
      (26)

      and set \(n=n+1.\)

    B.

      Set

      $$\begin{aligned} u^{(k+1)}_s = u^{(k+1,0)}_s = u^{(k,n)}_s, \end{aligned}$$
      (27)

      set \(k = k+1, n=0,\) and let \(\rho = \rho ^{(k)}, \rho _{s,s'} = \rho ^{(k)} \ c_{s,s'}, \delta = \delta _k;\) set \(L_{\rho }\) according to (13) (or, in the special cases of (7), as below (13)) and goto A.

This approach is inspired by [60] which considers quadratic penalty methods in the sparsity context. There, the authors search for a solution with only a few nonzero entries. The corresponding prior is separable. In contrast to that work, the present work considers a non-separable prior.
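The outer penalty loop of Algorithm 2 can be sketched as follows; surrogate_step is assumed to perform one forward/backward sweep (26) and to return the updated variables (e.g., one iteration of the Algorithm 1 sketch above), c_mat is the symmetric matrix of coupling weights \(c_{s,s'}\) with zero diagonal, norm_A bounds \(\Vert A\Vert _2\), and t is chosen according to (23)/(24).

```python
import numpy as np

def iterative_potts_penalty(surrogate_step, c_mat, norm_A, S, t, u_init,
                            rho0=1.0, tau=2.0, delta0=1.0,
                            outer_iter=20, max_inner=10000):
    """Sketch of the penalty method of Algorithm 2 with increasing rho."""
    u = [ui.copy() for ui in u_init]
    rho, delta = rho0, delta0
    for _ in range(outer_iter):
        rho_mat = rho * c_mat
        # L_rho slightly above the general bound (13)
        L_rho = np.sqrt(1.01 * (norm_A ** 2 / S + 2 * np.max(rho_mat.sum(axis=1))))
        for _ in range(max_inner):   # inner loop A: iterate (26) until (25) fails
            u_prev, u = u, surrogate_step(u, rho_mat, L_rho)
            close = all(np.linalg.norm(u[s] - u[r]) <= t / (rho * np.sqrt(c_mat[s, r]))
                        for s in range(S) for r in range(s + 1, S) if c_mat[s, r] > 0)
            small_step = all(np.linalg.norm(u[s] - u_prev[s]) <= delta / L_rho
                             for s in range(S))
            if close and small_step:
                break
        rho, delta = tau * rho, delta / tau   # step B: tighten the penalty
    return u
```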

Initialization Although the initialization of Algorithm 1 and of Algorithm 2 is not relevant for their convergence properties (cf. Sect. 3), the choice of the initialization influences the final result. (Please note that this may also happen for convex but not strictly convex problems.) We discuss different initialization strategies. The simplest choice is the all-zero initialization \((u_1^{(0)},\ldots ,u_S^{(0)}) = (0,\ldots ,0).\) Likewise, one can select the right-hand side of the normal equations of the underlying least squares problem, that is, \(A^\mathrm{T}f\). A third reasonable choice is the solution of the normal equations itself or an approximation of it; using an approximation might in particular be reasonable to obtain a regularized initialization. A possible strategy to obtain such a regularized initialization is to apply a fixed number of Landweber iterations [54] or of the conjugate gradient method to the underlying least squares problem. (In our experiments, we initialized Algorithm 1 with the result of 1000 Landweber iterations and Algorithm 2 with \(A^\mathrm{T}f\).)
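A sketch of such a Landweber initialization reads as follows; the step size \(1/\Vert A\Vert _2^2\) is one admissible choice, and apply_A, apply_At are again placeholders for the operator and its adjoint.

```python
import numpy as np

def landweber_init(f, apply_A, apply_At, norm_A, n_iter=1000):
    """Fixed number of Landweber iterations for the least squares problem ||Au - f||^2."""
    step = 1.0 / norm_A ** 2
    u = np.zeros_like(apply_At(f))
    for _ in range(n_iter):
        u = u - step * apply_At(apply_A(u) - f)
    return u
```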

2.3 A Non-iterative Algorithm for Minimizing the Potts Subproblem (16)

Both proposed algorithms require solving the Potts subproblem (16) in the backward step; see (19), (26). We first observe that (16) can be solved for each of the \(u_s\) separately. The corresponding S minimization problems are of the prototypical form

$$\begin{aligned} \begin{aligned} \mathop {{{\text {argmin}}}}\limits _{u_s:\Omega \rightarrow {\mathbb {R}}} \Vert u_s - f\Vert _2^2 + \gamma '_s \Vert \nabla _{a_s} u_s \Vert _0 \end{aligned} \end{aligned}$$
(28)

with given data f, the jump penalty \(\gamma '_s = \tfrac{\gamma \omega _s}{L_{\rho }^2}> 0\) and the direction \(a_s\in \mathbb {Z}^2\). As a next step, we see that (28) decomposes into univariate Potts problems for data along the paths in f induced by \(a_s\), e.g., for \(a_s = e_1\) those paths correspond to the rows of f and we obtain a minimizer \(u_s^*\) of (28) by determining each of its rows individually. The univariate Potts problem amounts to minimizing

$$\begin{aligned} \begin{aligned} P^{\mathrm {id,1d}}_\gamma (x) = \Vert x - g\Vert ^2_2 + \gamma \Vert \nabla x\Vert _0 \rightarrow \min , \end{aligned} \end{aligned}$$
(29)

where the data g is given by the restriction of f to the pixels in \(\Omega \) of the form \(v+a_s z,\) for \(z\in \mathbb {Z}\), i.e., \(g(z)=f(v+a_s z)\).

Here, the offset v is fixed when solving each univariate problem, but varied afterward to get all lines in the image with direction \(a_s.\) The target to optimize is denoted by \(x\in {\mathbb {R}^n}\) and, in the resulting univariate situation, \(\Vert \nabla x\Vert _0= \vert \{ i:x_i \ne x_{i+1} \} \vert \) denotes the number of jumps of x.

It is well known that the univariate direct problem (29) has a unique minimizer. Further, these particular problems can be solved exactly by dynamic programming [18, 35, 62, 63, 92], which we briefly describe in the following. For further details, we refer to [35, 82]. Assume we have computed minimizers \(x^l\) of (29) for partial data \((g_1,\ldots ,g_l)\) for each \(l=1,\ldots ,r\), \(r<n\). Then, the minimum value of (29) for \((g_1,\ldots ,g_{r+1})\) can be found by

$$\begin{aligned} \begin{aligned} P^{\mathrm {id,1d}}_\gamma ({x^{r+1}} )= \min _{l=1,\ldots ,r+1} P^{\mathrm {id,1d}}_\gamma (x^{l-1}) + \gamma +{\mathcal {E}}^{l:r+1}, \end{aligned} \end{aligned}$$
(30)

where we let \(x^0\) be the empty vector, \(P^{\mathrm {id,1d}}_\gamma (x^0) = -\gamma \) and \({\mathcal {E}}^{l:r+1}\) be the quadratic deviation of \((g_l,\ldots ,g_{r+1})\) from its mean. By denoting the minimizing argument in (30) by \(l^*\) the minimizer \(x^{r+1}\) is given by

$$\begin{aligned} x^{r+1} = (x^{l^*-1}, \mu _{[l^*,r+1]},\ldots ,\mu _{[l^*,r+1]}), \end{aligned}$$
(31)

where \(\mu _{[l^*,r+1]}\) is the mean value of \((g_{l^*},\ldots ,g_{r+1})\). Thus, we obtain a minimizer for full data g by successively computing \(x^l\) for each \(l=1,\ldots ,n\). By precomputing the first and second moments of the data g and storing only jump locations, the described method can be implemented in \({\mathcal {O}}(n^2)\) time [35]. Another way to achieve \({\mathcal {O}}(n^2)\) is based on the QR decomposition of the design matrix by means of Givens rotations; see [82]. Furthermore, the search space can be pruned to speed up computations [47, 83].
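A sketch of this dynamic program, together with its direction-wise application to the subproblem (28) for the unit directions (as assumed in the sketches of Algorithms 1 and 2 above; diagonal and knight-move paths are handled analogously), reads as follows.

```python
import numpy as np

def solve_potts_1d(g, gamma):
    """Exact dynamic programming solver for the univariate Potts problem (29),
    cf. (30) and (31); O(n^2) time using precomputed first and second moments."""
    g = np.asarray(g, dtype=float)
    n = len(g)
    m1 = np.concatenate(([0.0], np.cumsum(g)))        # first moments
    m2 = np.concatenate(([0.0], np.cumsum(g ** 2)))   # second moments
    B = np.empty(n + 1)                               # B[r]: optimal value for g[:r]
    B[0] = -gamma
    prev = np.zeros(n + 1, dtype=int)                 # index before the last segment
    for r in range(1, n + 1):
        best, arg = np.inf, 0
        for l in range(1, r + 1):
            # squared deviation of g[l-1:r] from its mean, from the moments
            eps = m2[r] - m2[l - 1] - (m1[r] - m1[l - 1]) ** 2 / (r - l + 1)
            val = B[l - 1] + gamma + eps
            if val < best:
                best, arg = val, l - 1
        B[r], prev[r] = best, arg
    x = np.empty(n)
    r = n
    while r > 0:                                      # backtracking: fill segments with means
        l = prev[r]
        x[l:r] = (m1[r] - m1[l]) / (r - l)
        r = l
    return x

def solve_potts_2d(g, a, gamma):
    """Direction-wise Potts solve of (28) for the unit directions a = (0, 1) or (1, 0)."""
    out = np.array(g, dtype=float)
    if a == (0, 1):
        for i in range(out.shape[0]):
            out[i, :] = solve_potts_1d(out[i, :], gamma)
    else:
        for j in range(out.shape[1]):
            out[:, j] = solve_potts_1d(out[:, j], gamma)
    return out
```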

We briefly describe the extensions of the above scheme necessary to approach (29) for vector-valued data \(g\in \mathbb {R}^{n\times C}\) (e.g., the row of a color image). In this situation, the symbol \({\mathcal {E}}^{l:r+1}\) in (30) denotes the sum of the quadratic deviations of \((g_l,\ldots ,g_{r+1})\) from its channel-wise means. Further, \(\mu _{[l^*,r+1]}\in \mathbb {R}^{C}\) in (31) is the vector of channel-wise means of the data \((g_{l^*},\ldots ,g_{r+1})\). On the computational side, the first and second moments of each channel have to be precomputed separately. It is worth mentioning that the theoretical computational cost of the described method grows only linearly in the number of channels [83]. Thus, the proposed algorithm can be efficiently applied to vector-valued images with a high-dimensional codomain.

3 Analysis

3.1 Analytic Results

In the course of the derivation of the proposed algorithms above, we consider the quadratic penalty relaxation (5) of the multivariate Potts problem. Although it is algorithmically more accessible via our approach, we first note that this problem is still NP-hard (as is the original problem).

Theorem 2

Finding a (global) minimizer of the quadratic penalty relaxation (5) of the multivariate Potts problem is an NP-hard problem.

The proof is given in Sect. 3.3. In Sect. 2.2, we have proposed Algorithm 1 to approach the quadratic penalty relaxation of the multivariate Potts problem. We show that the proposed algorithm converges to a local minimizer and that a feasible point of the original multivariate Potts problem is nearby.

Theorem 3

We consider the iterative Potts minimization Algorithm 1 for the quadratic penalty relaxation (5) of the multivariate Potts problem.

  i.

    Algorithm 1 computes a local minimizer of the quadratic penalty relaxation (5) of the multivariate Potts problem for any starting point. The convergence rate is linear.

  ii.

    We have the following relation between local minimizers \({{\mathcal {L}}}\), global minimizers \({{\mathcal {G}}}\) and the fixed points \(\mathrm {Fix}({\mathbb {I}})\) of the iteration of Algorithm 1,

    $$\begin{aligned} {{\mathcal {G}} } \subset \mathrm {Fix}({\mathbb {I}}) \subset {{\mathcal {L}}}. \end{aligned}$$
    (32)
  iii.

    Assume a tolerance \(\varepsilon \) we are willing to accept for the distance between the \(u_s,\) i.e.,

    $$\begin{aligned} \sum _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2 = \sum _{s,s'} c_{s,s'} \sum _{i,j}|(u_s)_{ij} - (u_{s'})_{ij}|^2 \le \varepsilon ^2. \end{aligned}$$
    (33)

    Running Algorithm 1 with the choice of the parameter \(\rho \) by

    $$\begin{aligned} \rho > 2 \varepsilon ^{-1} \ \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert \end{aligned}$$
    (34)

    (where \(\sigma _1\) is the smallest nonzero eigenvalue of \(C^\mathrm{T}C\) with C given by (49); for the particular choice of the coupling given by (7), \(\sigma _1 = S\) and \(\sigma _1 = (2-2\cos (2\pi /S)),\) respectively) yields a local minimizer of the quadratic penalty relaxation (5) such that the \(u_s\) are close up to \(\varepsilon ,\) i.e., (33) is fulfilled.

The proof is given in Sect. 3.5. A solution of Algorithm 1 is not a feasible point for the initial Potts problem (3). However, we see below that it produces a \(\delta \)-approximative solution \(u^*\) in the sense that there are \(\mu ^*\) and a partitioning \({\mathcal {P}}^*\) such that

$$\begin{aligned} \sum _{s,s'} c_{s,s'} \Vert u^*_s - u^*_{s'}\Vert ^2_2< \delta , \qquad \text { and } \quad L(\mu ^*) < \delta , \end{aligned}$$
(35)

where \(L(\mu ^*)\) is given by (53). In this context, note that the conditions for a local minimizer are given by \(\sum _{s,s'} c_{s,s'} \Vert u^*_s - u^*_{s'}\Vert ^2_2 = 0\) and the Lagrange multiplier condition \(L(\mu ^*) = 0.\) So (35) intuitively means that both the constraint and the Lagrange multiplier condition are approximately fulfilled for the partitioning induced by \(u^*\).

Further, given a solution of Algorithm 1, we find a feasible point for the Potts problem (3) (or, equivalently, (2)) which is nearby, as detailed in the following theorem.

Theorem 4

We consider the iterative Potts minimization Algorithm 1 for the quadratic penalty relaxation (5) in connection with the (non-relaxed) Potts problem (3).

  i.

    Algorithm 1 produces an approximative solution of the Potts problem (3) in the sense of (35).

  ii.

    The projection procedure (Procedure 1) proposed in Sect. 2.2 applied to the solution \(u'=(u'_1,\ldots ,u'_S)\) of Algorithm 1 produces a feasible image \({{\hat{u}}}\) (together with a valid partitioning) for the Potts problem (3) which is close to \(u'\) in the sense that

    $$\begin{aligned} \Vert u_s'-{{\hat{u}}}\Vert \le C_1 \varepsilon \qquad \text {for all} \quad s \in \{1,\ldots ,S\}, \end{aligned}$$
    (36)

    where \(\varepsilon = \max _{s,s'} \Vert u'_s-u'_{s'}\Vert \) quantifies the deviation between the \(u_s.\) Here, \(C_1 = \# \Omega /4, \) where the symbol \(\# \Omega \) denotes the number of elements in \(\Omega .\) If the imaging operator A is lower bounded, i.e., there is a constant \(c>0\) such that \(\Vert Au\Vert \ge c \Vert u\Vert \), a local minimizer \(u^*\) of the Potts problem (3) is nearby, i.e.,

    $$\begin{aligned} \Vert u^*-{{\hat{u}}}\Vert \le \frac{\sqrt{\eta }}{c} \end{aligned}$$
    (37)

    where

    $$\begin{aligned} \eta := \left( \Vert A \Vert ^2 \varepsilon C_1^2 + 2 \Vert A \Vert C_1 \Vert f \Vert _2 \right) \varepsilon . \end{aligned}$$
    (38)

The proof of Theorem 4 can be found at the end of Sect. 3.4, where most relevant statements are already shown in Sect. 3.3. Theorem 4 theoretically underpins the fact that, on the application side, we may use Algorithm 1 for the Potts problem (3) (accepting some arbitrarily small tolerance we may fix in advance).

In addition, in Sect. 2.2, we have proposed Algorithm 2 to approach the Potts problem (3). We first show that Algorithm 2 is well defined.

Theorem 5

Algorithm 2 is well defined in the sense that the inner iteration governed by (25) terminates, i.e., for any \(k \in {\mathbb {N}},\) there is \(n \in {\mathbb {N}}\) such that the condition in (25) is no longer fulfilled.

The proof of Theorem 5 is given in Sect. 3.6. Concerning the convergence properties of Algorithm 2, we obtain the following results.

Theorem 6

We consider the iterative Potts minimization algorithm (Algorithm 2) for the Potts problem (3).

  • Any cluster point of the sequence \(u^{(k)}\) is a local minimizer of the Potts problem (3) (which in particular implies that the components of each cluster point \(u^*\) are equal, i.e., \(u_s^{*} = u_{s'}^{*}\) for all \(s,s'\)).

  • If A is lower bounded, the sequence \(u^{(k)}\) produced by Algorithm 2 has a cluster point and the produced cluster points are local minimizers of the Potts problem (3).

The proof of Theorem 6 can be found in Sect. 3.6.

3.2 Estimates on Operator Norms and Lagrange Multipliers

Lemma 1

The spectral norm of the block matrix B given by (8) fulfills

$$\begin{aligned} \Vert B \Vert _2 \le \bigg (\tfrac{1}{S}\Vert A\Vert _2^2 + 2 \max _{s \in \{1,\ldots ,S \} } \sum _{s': s'\ne s}^{S} \rho _{s,s'}\bigg )^{\frac{1}{2}}. \end{aligned}$$
(39)

For the particular choice of constant \(\rho _{s,s'} = \rho \) (independent of \(s,s'\)) as on the left-hand side of (7), we have the improved estimate

$$\begin{aligned} \Vert B \Vert _2 \le \bigg (\tfrac{1}{S}\Vert A\Vert _2^2 + S \rho \bigg )^{\frac{1}{2}}. \end{aligned}$$
(40)

For only coupling neighboring \(u_s\) with the same constant \(\rho \), i.e., the right-hand coupling of (7), we have

$$\begin{aligned} \Vert B \Vert _2 \le \bigg (\tfrac{1}{S}\Vert A\Vert _2^2 + \alpha \rho \bigg )^{\frac{1}{2}}, \quad \text { where } \quad \alpha = {\left\{ \begin{array}{ll} 4, &{}\quad \text {if }S\text { is even}, \\ 2 - 2 \cos \left( \frac{\pi (S-1)}{S}\right) , &{} \quad \text {if }S\text { is odd.} \end{array}\right. } \end{aligned}$$
(41)

Proof

We decompose the matrix B according to \(B =\begin{pmatrix} S^{-1/2}{{\tilde{A}}} \\ {{\tilde{P}}} \end{pmatrix}.\) Here, \({{\tilde{A}}}\) denotes an \(S \times S\)-block diagonal matrix with each diagonal entry being equal to A,  where A is the matrix representing the forward/imaging operator; see (8). The matrix \({{\tilde{P}}}\) is given as the lower \({S \atopwithdelims ()2} \times S\)-block in (8) which represents the soft constraints.

Using this decomposition of B, we may decompose the symmetric and positive (semidefinite) matrix \(B^\mathrm{T}B\) according to

$$\begin{aligned} B^\mathrm{T}B = \tfrac{1}{S} {{\tilde{A}}}^\mathrm{T} {{\tilde{A}}} + {{\tilde{P}}}^\mathrm{T} {{\tilde{P}}}, \end{aligned}$$
(42)

where \({{\tilde{A}}}^\mathrm{T} {\tilde{A}}\) is an \(S \times S\)-block diagonal matrix with each diagonal entry being equal to \(A^\mathrm{T} A,\) and \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) is an \(S \times S\)-block diagonal matrix with block entries given by

$$\begin{aligned} {\tilde{P}}^\mathrm{T} {\tilde{P}} = \begin{pmatrix} \sum \nolimits _{k=2}^S\rho _{1,k} I &{} -\rho _{1,2} I &{} -\rho _{1,3} I &{} \ldots &{} -\rho _{1,S} I\\ -\rho _{1,2} I &{} \sum \nolimits _{k=1,k \ne 2}^S\rho _{2,k} I &{} -\rho _{2,3}I &{} \ldots &{} -\rho _{2,S} I\\ &{} \vdots &{} &{} &{} \vdots &{} \\ -\rho _{1,S}I &{} -\rho _{2,S}I &{} -\rho _{3,S}I &{} \ldots &{} \sum \nolimits _{k=1}^{S-1}\rho _{S,k}I \end{pmatrix}, \end{aligned}$$
(43)

with \(\rho _{l,k}:= \rho _{k,l}\) for \(l>k.\) Using Gerschgorin’s Theorem (see for instance [81]), the eigenvalues of \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) are contained in the union of the balls with centers \(x_r=\sum _{k=1,k \ne r}^S\rho _{r,k}\) and radii \(\sum _{k=1,k \ne r}^S | {-\rho _{r,k}}| = x_r.\) These balls are all contained in the larger ball with center 0 and radius \(2 \cdot \max _r x_r.\) This implies the general estimate (39).

For seeing (40), we decompose an argument \(u=(u_1,\ldots ,u_S)\) according to \(u= {{\bar{u}}} + u^0\) with an “average” part \({{\bar{u}}} = (\tfrac{1}{S}\sum _{i=1}^{S}u_i,\ldots ,\tfrac{1}{S}\sum _{i=1}^{S}u_i)\) and \(u^0:=u-{{\bar{u}}}\) such that \(u^0\) has average 0,  i.e., \(\sum _{i=1}^{S}u^0_i = 0,\) where 0 denotes the vector containing only zero entries here. In the situation of (40), the matrix \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) has the form \({\tilde{P}}^\mathrm{T} {\tilde{P}} = \rho (S\cdot I - (1,\ldots ,1)(1,\ldots ,1)^\mathrm{T}).\) We have \({\tilde{P}}^\mathrm{T} {\tilde{P}} {{\bar{u}}} = 0.\) Further, \( {\tilde{P}}^\mathrm{T} {\tilde{P}} u^0 = \rho S u^0.\) Hence, the largest modulus of an eigenvalue of \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) equals \(\rho S\) which in turn shows the estimate (40).

For seeing (41), we notice that in case of (41), the matrix \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) has a cyclic shift structure with three nonzero entries in each row. The discrete Fourier matrix w.r.t. the cyclic group of order S diagonalizes \({\tilde{P}}^\mathrm{T} {\tilde{P}}.\) The corresponding eigenvalues are given by \(\lambda _k = \rho \left( 2 - 2 \cos \left( 2\pi \frac{k}{S} \right) \right) ,\) where \(k=0,\ldots ,S-1.\) The largest modulus of an eigenvalue is thus given by \(4 \ \rho ,\) if S is even, and by \(\rho \cdot \left( 2 - 2 \cos \left( \frac{\pi (S-1)}{S}\right) \right) ,\) if S is odd. \(\square \)

Note that the problem of estimating the operator norm of B in (39) involves computing the operator norm of \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) given by (43). This problem is intimately related to computing the spectral norm of the Laplacian of a corresponding weighted graph (e.g., [38, 80]); in particular, we conclude from this link that the general estimate (39) is sharp in the sense that the factor of 2 in front of the sum cannot be made smaller. This is because, for a general graph, the (normalized) graph Laplacian has spectral norm at most two, and this factor of two is sharp; cf. [38, 80].
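The eigenvalue formula used in the last step of the proof can be checked numerically, e.g., by the following small sketch (for \(S \ge 3\)).

```python
import numpy as np

def cyclic_coupling_spectrum(S, rho=1.0):
    """Sanity check: for the cyclic consecutive coupling, the per-pixel block of
    P~^T P~ is rho * (2I - C - C^T) with the cyclic shift C, and its eigenvalues
    are rho * (2 - 2 cos(2 pi k / S)), k = 0, ..., S-1."""
    C = np.roll(np.eye(S), 1, axis=0)
    PtP = rho * (2 * np.eye(S) - C - C.T)
    numeric = np.sort(np.linalg.eigvalsh(PtP))
    analytic = np.sort(rho * (2 - 2 * np.cos(2 * np.pi * np.arange(S) / S)))
    return numeric, analytic
```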

We recall that we have introduced the concept of a directional partitioning \({\mathcal {I}}\) and discussed its relation with the concept of a partitioning near (21) above. For a function \(f: \Omega \rightarrow {\mathbb {R}}^S\) (representing its S component functions \(f_1,\ldots ,f_S:\Omega \rightarrow {\mathbb {R}}\)) defined on a grid \(\Omega ,\) we consider the orthogonal projection \(P_{\mathcal {I}}\) associated with a directional partition \({\mathcal {I}}\) by first sorting the intervals of \({\mathcal {I}}\) into \({\mathcal {I}}_1,\ldots ,{\mathcal {I}}_S\) according to their associated directions \(a_s,\) \(s =1,\ldots , S,\) and then letting

$$\begin{aligned} P_{\mathcal {I}} f = \begin{pmatrix} P_{{\mathcal {I}}_1} f_1 \\ \vdots \\ P_{{\mathcal {I}}_S} f_S \end{pmatrix}, \qquad \text { where } \quad P_{{\mathcal {I}}_s} f_s|_{I} = \frac{\sum _{x \in I} f_s(x) }{ \# I}, \end{aligned}$$
(44)

i.e., the function \(P_{{\mathcal {I}}_s} f_s\) on the interval I is given as the arithmetic mean of \(f_s\) on the interval I,  for all intervals \(I \in {\mathcal {I}}_s\) and all \(s =1,\ldots , S.\) Here, the symbol \(\# I\) denotes the number of elements in I. We note that \(P_{\mathcal {I}}\) defines an orthogonal projection on the corresponding \(\ell ^2\) space of discrete functions \(f: \Omega \rightarrow {\mathbb {R}}^S\) with the norm \(\Vert f\Vert ^2 = \sum _{s,i} |(f_s)_i|^2\) where i iterates through all the indices of \(f_s.\)
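The action of the projection \(P_{{\mathcal {I}}_s}\) in (44) is plain averaging per interval. The following minimal sketch illustrates this for a single component; the encoding of the intervals as index arrays is an assumption made for illustration only.

```python
# Minimal sketch of the interval-wise averaging in (44); the intervals are assumed to be
# given as lists of index arrays (an encoding chosen only for this illustration).
import numpy as np

def project_onto_intervals(f, intervals):
    """Orthogonal projection onto the signals that are constant on each interval."""
    g = f.copy()
    for idx in intervals:
        g[idx] = f[idx].mean()             # replace the values on the interval by their mean
    return g

f = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 2.0])
intervals = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5])]
print(project_onto_intervals(f, intervals))   # [1.0333..., 1.0333..., 1.0333..., 5.05, 5.05, 2.0]
```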

We consider a partitioning \({\mathcal {P}}\) of \(\Omega ,\) its induced directional partitioning \({\mathcal {I}}^{{\mathcal {P}}}\) w.r.t. a set of S directions \(a_1,\ldots ,a_S,\) and the subspace

$$\begin{aligned} {\mathcal {A}}^{{\mathcal {P}}} = P_{{\mathcal {I}}^{{\mathcal {P}}}}(\ell ^2(\Omega , {\mathbb {R}}^S)) \end{aligned}$$
(45)

of functions which are constant on the intervals of the induced directional partitioning \({\mathcal {I}}^{{\mathcal {P}}}\) (which equals the image of the orthogonal projection \(P_{{\mathcal {I}}^{{\mathcal {P}}}}\)).

Functions \(g: \Omega \rightarrow {\mathbb {R}}\) which are piecewise constant w.r.t. a partitioning \({\mathcal {P}},\) i.e., which are constant on each segment \({\mathcal {P}}_i,\) are in one-to-one correspondence with the linear subspace \({\mathcal {B}}^{{\mathcal {P}}}\) of \({\mathcal {A}}^{{\mathcal {P}}}\) given by

$$\begin{aligned} {\mathcal {B}}^{{\mathcal {P}}} = \{f \in {\mathcal {A}}^{{\mathcal {P}}}: f_1 = \ldots = f_S \} \end{aligned}$$
(46)

as shown by the following lemma.

Lemma 2

There is a one-to-one correspondence between the linear space of piecewise constant mappings w.r.t. the partitioning \({\mathcal {P}},\) and the subspace \({\mathcal {B}}^{{\mathcal {P}}}\) of \({\mathcal {A}}^{{\mathcal {P}}}\) via the mapping \(\iota : g \mapsto (g,\ldots ,g).\)

Proof

Let g be a piecewise constant mapping w.r.t. the partitioning \({\mathcal {P}},\) then \((g,\ldots ,g)\) is constant on each interval I of the induced directional partitioning \({\mathcal {I}}^{{\mathcal {P}}},\) and \((g,\ldots ,g) \in {\mathcal {B}}^{{\mathcal {P}}}.\) This shows that \(\iota \) is well defined in the sense that its range is contained in \({\mathcal {B}}^{{\mathcal {P}}}.\) Obviously, \(\iota \) is an injective linear mapping so that it remains to show that any \(f \in {\mathcal {B}}^{{\mathcal {P}}}\) is the image under \(\iota \) of some \(g:\Omega \rightarrow {\mathbb {R}}\) which is piecewise constant w.r.t. the partitioning \({\mathcal {P}}.\) To this end, let \(f \in {\mathcal {B}}^{{\mathcal {P}}}.\) By definition, f has the form \(f = (g,\ldots ,g)\) for some \(g:\Omega \rightarrow {\mathbb {R}}.\) Now, toward a contradiction, assume there is a segment \({\mathcal {P}}_i\) and points \(x,y \in {\mathcal {P}}_i\) with \(g(x)\ne g(y).\) Since there is a path \(x_0=x,\ldots ,x_N=y\) connecting x and y in \({\mathcal {P}}_i\) with steps in \(a_1,\ldots ,a_S,\) we have that, for any index j,  there is an interval I in the induced partitioning \({\mathcal {I}}^{{\mathcal {P}}}\) containing \(x_j\) together with \(x_{j+1}.\) Since g is constant on each I in \({\mathcal {I}}^{{\mathcal {P}}},\) we get \(g(x_j) = g(x_{j+1})\) for all j which implies \(g(x)= g(y).\) This contradicts our assumption and shows the lemma. \(\square \)

Using the identification given by Lemma 2, we define, for a given partitioning \({\mathcal {P}},\) the projection \(Q_{{\mathcal {P}}}\) onto \({\mathcal {B}}^{{\mathcal {P}}}\) by

$$\begin{aligned} Q_{{\mathcal {P}}} f = \begin{pmatrix} \pi _{{\mathcal {P}}} f\\ \vdots \\ \pi _{{\mathcal {P}}} f \end{pmatrix}, \qquad \text { where } \quad \pi _{{\mathcal {P}}} f|_{{\mathcal {P}}_i} = \frac{\sum _{s=1}^S\sum _{x \in {\mathcal {P}}_i} f_s(x) }{ \# {\mathcal {P}}_i \ S}, \end{aligned}$$
(47)

i.e., we average over each segment and over all component functions as given by (22). Since the components of \(Q_{{\mathcal {P}}} f\) are all identical, we will not distinguish \(Q_{{\mathcal {P}}}\) and \(\pi _{{\mathcal {P}}}\) in the following. This means that we also use the symbol \(Q_{{\mathcal {P}}}f\) to denote the scalar-valued function which is piecewise constant on the partitioning \({\mathcal {P}}.\)
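For illustration, a minimal sketch of the projection \(Q_{{\mathcal {P}}}\) of (47) is given below; the partitioning is assumed to be encoded as an integer label map (an encoding chosen here only for illustration), and the returned array is the scalar-valued function \(\pi _{{\mathcal {P}}}f\).

```python
# Minimal sketch of Q_P in (47); the partitioning P is assumed to be encoded as an
# integer label map (illustration only), and the result is the scalar-valued pi_P f.
import numpy as np

def project_onto_partition(u, labels):
    """u: array of shape (S, ...) with the S component functions; labels: segment ids."""
    out = np.empty(labels.shape)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = u[:, mask].mean()      # average over the segment and over all components
    return out

u = np.random.rand(3, 4, 4)                # S = 3 component functions on a 4x4 grid
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1                          # two segments: left half and right half
print(project_onto_partition(u, labels))
```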

On \({\mathcal {A}}^{{\mathcal {P}}},\) we consider the problem

$$\begin{aligned} \mathop {{{\text {argmin}}}}\limits _{u_1,\ldots ,u_S} \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 \quad \text { subject to } \quad Cu=0, \end{aligned}$$
(48)

i.e., given the directional partitioning we are searching for a solution which belongs to \({\mathcal {B}}^{{\mathcal {P}}}\). Here, C denotes the matrix

$$\begin{aligned} C = \begin{pmatrix} c_{1,2} I &{} -c_{1,2}I &{} 0 &{} \ldots &{} 0 &{} 0\\ c_{1,3} I &{} 0 &{} -c_{1,3}I &{} \ldots &{} 0 &{} 0\\ &{} \vdots &{} &{} &{} \vdots &{} \\ c_{1,S} I &{} 0 &{} 0 &{} \ldots &{} 0 &{} -c_{1,S}I \\ 0 &{} c_{2,3} I &{} -c_{2,3}I &{} \ldots &{} 0 &{} 0 \\ &{} \vdots &{} &{} &{} \vdots &{} \\ 0 &{} c_{2,S} I &{} 0 &{} \ldots &{} 0 &{} -c_{2,S}I \\ &{} &{} &{} \vdots &{} &{} \\ 0 &{} 0 &{} 0 &{} \ldots &{} c_{S-1,S}I &{} -c_{S-1,S}I \\ \end{pmatrix}, \end{aligned}$$
(49)

where the \(c_{s,s'}\) are as in (5); if \(c_{s,s'}=0,\) the corresponding line is removed from the constraint matrix C. For the special choices of (7), we have

$$\begin{aligned} C = \begin{pmatrix} I &{} -I &{} 0 &{} \ldots &{} 0 &{} 0\\ I &{} 0 &{} -I &{} \ldots &{} 0 &{} 0\\ &{} \vdots &{} &{} &{} \vdots &{} \\ I &{} 0 &{} 0 &{} \ldots &{} 0 &{} -I \\ 0 &{} I &{} -I &{} \ldots &{} 0 &{} 0 \\ &{} \vdots &{} &{} &{} \vdots &{} \\ 0 &{} I &{} 0 &{} \ldots &{} 0 &{} -I \\ &{} &{} &{} \vdots &{} &{} \\ 0 &{} 0 &{} 0 &{} \ldots &{} I &{} -I \\ \end{pmatrix}, \quad \text { and } \quad C = \begin{pmatrix} I &{} -I &{} 0 &{} 0 &{} \ldots &{} 0 &{} 0&{} 0\\ 0 &{} I &{} -I &{} 0 &{} \ldots &{} 0 &{} 0&{} 0 \\ 0 &{} 0 &{} I &{} -I &{} \ldots &{} 0 &{} 0&{} 0 \\ &{} \vdots &{} &{} &{} &{} &{} \vdots &{} &{} \\ 0 &{} 0 &{} 0 &{} 0&{} \ldots &{} I &{} -I &{} 0 \\ 0 &{} 0 &{} 0 &{} 0&{}\ldots &{} 0 &{} I &{} -I\\ -I &{} 0 &{} 0 &{} 0&{}\ldots &{} 0 &{} 0 &{} I\\ \end{pmatrix} \end{aligned}$$
(50)

which reflects the constraints \(u_1=\ldots =u_S.\) We recall that \(\mu _{{\mathcal {P}}}\) is a Lagrange multiplier of the problem in (48) if

$$\begin{aligned} \min _{u \in {\mathcal {B}}^{{\mathcal {P}}}} \ \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 = \min _{u \in {\mathcal {A}}^{{\mathcal {P}}}} \ \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 + \mu _{{\mathcal {P}}}^\mathrm{T} C u. \end{aligned}$$
(51)

We note that, for quadratic problems such as (48), Lagrange multipliers always exist [7]. We have that

$$\begin{aligned} \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}^{{\mathcal {P}}}} u_{{\mathcal {P}}}^*- \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}} {\tilde{A}}^\mathrm{T} {\tilde{f}} = C^\mathrm{T} \mu _{{\mathcal {P}}} = P_{{\mathcal {I}}^{{\mathcal {P}}}} C^\mathrm{T} \mu _{{\mathcal {P}}}, \end{aligned}$$
(52)

or, in other form,

$$\begin{aligned} L(\mu _{{\mathcal {P}}}) := \left\| \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}^{{\mathcal {P}}}} u_{{\mathcal {P}}}^*- \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}} {\tilde{A}}^\mathrm{T} {\tilde{f}} - P_{{\mathcal {I}}^{{\mathcal {P}}}} C^\mathrm{T} \mu _{{\mathcal {P}}} \right\| =0, \end{aligned}$$
(53)

where \({\tilde{A}}\) is the block diagonal matrix with constant entry A on each diagonal component, \({\tilde{f}}\) is a block vector of corresponding dimensions with entry f in each component, and \(u_{{\mathcal {P}}}^*\) is a minimizer of the constraint problem in \({\mathcal {B}}^{{\mathcal {P}}}\). We note that the last equality \(C^\mathrm{T} \mu _{{\mathcal {P}}} = P_{{\mathcal {I}}^{{\mathcal {P}}}} C^\mathrm{T} \mu _{{\mathcal {P}}}\) in (52) holds since the left-hand side of (52) is contained in the image of \(P_{{\mathcal {I}}^{{\mathcal {P}}}}\).

Lemma 3

We consider a partitioning \({\mathcal {P}}\) of the discrete domain \(\Omega \) and the corresponding problem (48). There is a Lagrange multiplier \(\mu _{{\mathcal {P}}}\) for (48) with

$$\begin{aligned} \Vert \mu _{{\mathcal {P}}} \Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert . \end{aligned}$$
(54)

Here, \(\sigma _1\) is the smallest nonzero eigenvalue of \(C^\mathrm{T}C\) with C given by (49). For the particular choice of C given by the left-hand side of (50), we have

$$\begin{aligned} \Vert \mu _{{\mathcal {P}}} \Vert \le \tfrac{2}{S} \Vert A\Vert \Vert f\Vert ; \end{aligned}$$
(55)

and, for the particular choice of C given by the right-hand side of (50) we have

$$\begin{aligned} \Vert \mu _{{\mathcal {P}}} \Vert \le 2 (2-2\cos (2\pi /S))^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert , \end{aligned}$$
(56)

(e.g., for \(S=4,\) i.e., an eight-neighborhood, we have \( 2-2\cos (2\pi /S) = \sigma _1 = 2\)). In particular, the right-hand sides and the constants in all these estimates are independent of the particular partitioning \({\mathcal {P}}\).

Proof

For any minimizer \(u_{{\mathcal {P}}}^*\) of the constraint problem in \({\mathcal {B}}^{{\mathcal {P}}},\) we have that

$$\begin{aligned} \Vert \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}^{{\mathcal {P}}}} u_{{\mathcal {P}}}^*- \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}} {\tilde{A}}^\mathrm{T} {\tilde{f}}\Vert&\le \Vert \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{A}} u_{{\mathcal {P}}}^*- \tfrac{2}{S} {\tilde{A}}^\mathrm{T} {\tilde{f}} \Vert \le \tfrac{2}{S} \Vert A\Vert \Vert {\tilde{f}}\Vert \nonumber \\&\le \tfrac{2 \sqrt{S}}{S} \Vert A\Vert \Vert f\Vert , \end{aligned}$$
(57)

where we recall that \({\tilde{A}}\) is the block diagonal matrix with constant entry A,  and \({\tilde{f}}\) is a block vector with entry f in each component. The first inequality is a consequence of the fact that \(P_{{\mathcal {I}}^{{\mathcal {P}}}}\) is an orthogonal projection. The second inequality may be seen by evaluating the term for the constant zero function (which always belongs to \({\mathcal {B}}^{{\mathcal {P}}}\) ) as a candidate and by noting that \(\Vert A^\mathrm{T} \Vert =\Vert A\Vert .\)

Using (52), we have \(\Vert C^\mathrm{T} \mu _{{\mathcal {P}}}\Vert \le \tfrac{2}{\sqrt{S}} \Vert A\Vert \Vert f\Vert .\) Choosing \(\mu _{{\mathcal {P}}}\) in the orthogonal complement of the kernel of \(C^\mathrm{T},\) we get

$$\begin{aligned} \Vert C^\mathrm{T} \mu _{{\mathcal {P}}}\Vert \ge \inf _{x \in \left( {\text {ker}}\left( C^\mathrm{T}\right) \right) ^\perp ,\Vert x\Vert =1} \Vert C^\mathrm{T}x\Vert \ \Vert \mu _{{\mathcal {P}}}\Vert . \end{aligned}$$
(58)

We observe that finding the infimum in (58) corresponds to finding the square root of the smallest nonzero eigenvalue of \(C^\mathrm{T}C.\) This is because (i) the nonzero eigenvalues of \(C^\mathrm{T}C\) equal the nonzero eigenvalues of \(CC^\mathrm{T},\) i.e.,

$$\begin{aligned} \min \left\{ \sigma : \sigma \in \mathrm {spectrum}(CC^\mathrm{T}){\setminus } \{0\} \right\} = \min \left\{ \sigma : \sigma \in \mathrm {spectrum}(C^\mathrm{T}C){\setminus } \{0\} \right\} = \sigma _1, \end{aligned}$$
(59)

where \(\sigma _1\) is the smallest nonzero eigenvalue of \(C^\mathrm{T}C\). Further, (ii) for \(x \in \left( {\text {ker}}\left( C^\mathrm{T}\right) \right) ^\perp ,\) \(\Vert C^\mathrm{T}x\Vert ^2 = \langle x, CC^\mathrm{T}x \rangle \ge \min \left\{ \sigma : \sigma \in \mathrm {spectrum}(CC^\mathrm{T}){\setminus } \{0\} \right\} \Vert x\Vert ^2.\) Hence, using (59) in (58) we get that \(\Vert C^\mathrm{T} \mu _{{\mathcal {P}}}\Vert \ge \sqrt{\sigma _1} \Vert \mu _{{\mathcal {P}}}\Vert ,\) and together with (52) and (57), we obtain

$$\begin{aligned} \Vert \mu _{{\mathcal {P}}}\Vert \le \sigma _1^{-1/2} \Vert C^\mathrm{T} \mu _{{\mathcal {P}}}\Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert \end{aligned}$$
(60)

which shows (54).

Now we consider the particular choice of C given by the left-hand side of (50). Similar to the derivation of (43), we have that \(C^\mathrm{T}C= S \cdot I - (1,\ldots ,1)(1,\ldots ,1)^\mathrm{T}.\) Further, the constants constitute the kernel of \(C^\mathrm{T}C\) and any vector u in its orthogonal complement is mapped to Su. Hence, \(\sigma _1 = S\) which shows (55).

Finally, we consider the particular choice of C given by the right-hand side of (50). As already explained in the proof of Lemma 1, the discrete Fourier transform shows that the corresponding eigenvalues are given by \(\lambda _k = 2 - 2 \cos \left( 2\pi \frac{k}{S} \right) ,\) where \(k=0,\ldots ,S-1.\) The smallest nonzero eigenvalue is thus given by \(2-2\cos (2\pi /S).\) This shows (56) which completes the proof of the lemma. \(\square \)
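The two values of \(\sigma _1\) derived in the proof can also be verified numerically. The following sketch (for illustration only) builds the two choices of C from (50) with the identity blocks collapsed to scalars and computes the smallest nonzero eigenvalue of \(C^\mathrm{T}C\).

```python
# Illustration only: build the two choices of C in (50) with the identity blocks
# collapsed to scalars and verify the values of sigma_1 used in Lemma 3.
import numpy as np

S = 6

# Left-hand choice of C in (50): one row per pair s < s'.
rows = []
for s in range(S):
    for r in range(s + 1, S):
        row = np.zeros(S)
        row[s], row[r] = 1.0, -1.0
        rows.append(row)
C_full = np.array(rows)

# Right-hand choice of C in (50): cyclic neighboring differences.
C_cyc = np.eye(S) - np.roll(np.eye(S), -1, axis=1)

def smallest_nonzero_eigenvalue(C):
    e = np.linalg.eigvalsh(C.T @ C)
    return e[e > 1e-10].min()

print(smallest_nonzero_eigenvalue(C_full), S)                             # sigma_1 = S
print(smallest_nonzero_eigenvalue(C_cyc), 2 - 2 * np.cos(2 * np.pi / S))  # sigma_1 = 2 - 2 cos(2 pi / S)
```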

3.3 The Quadratic Penalty Relaxation of the Potts Problem and Its Relation to the Potts Problem

In this subsection, we reveal some relations between the Potts problem and its quadratic penalty relaxation; in particular, we show Theorem 2 and parts of Theorem 4. We start out by showing that the quadratic penalty relaxation of the Potts problem is NP-hard, which was formulated as Theorem 2.

Proof of Theorem 2

We consider the quadratic penalty relaxation (5) of the multivariate Potts problem in its equivalent form (11) which reads

$$\begin{aligned} P_{\gamma ,\rho }(u_1,\ldots ,u_S) = \left\| B(u_1,\ldots ,u_S)^\mathrm{T} - g \right\| _2^2 + \gamma \ \Big \Vert \ D(u_1,\ldots ,u_S) \ \Big \Vert _{0,\omega }. \end{aligned}$$

with B and g given by (8) and D given by (9). We serialize \( u=(u_1,\ldots ,u_S): \Omega \rightarrow {\mathbb {R}}^S\) into a function \({{\hat{u}}}: X \rightarrow {\mathbb {R}}\) with \(X \subset {\mathbb {Z}}\) being a discrete interval of size \(S \#\Omega \) as follows: for \(u_s,\) we consider the discrete lines in the image with direction \(a_s\) and interpret u on these lines as a vector; then we concatenate these vectors starting with the one corresponding to the leftmost upper line to obtain a vector of length \(\#\Omega ;\) for each s,  we obtain such a vector and we again concatenate these vectors starting with index \(s=1,2,\ldots \) to obtain the resulting object which we denote by \({{\hat{u}}}.\) Using this serialization we may arrange B,  g and D accordingly to obtain the univariate Potts problem

$$\begin{aligned} {{\hat{P}}}_{\gamma ,\rho }({{\hat{u}}}) = \left\| {{\hat{B}}} {{\hat{u}}} - {{\hat{g}}} \right\| _2^2 + \gamma \ \Big \Vert {{\hat{\omega }}} \nabla {{\hat{u}}} \Big \Vert _{0}, \quad \text { where } {{\hat{\omega }}}: X \rightarrow [0,\infty ) \end{aligned}$$

is a weight vector, \({{\hat{\omega }}} \nabla {{\hat{u}}}\) denotes pointwise multiplication, and \({{\hat{B}}},{{\hat{g}}}\) are the matrix and the vector corresponding to B,  g w.r.t. the serialization. The weight vector may be zero which in particular happens at the line breaks, i.e., those indices where two vectors have been concatenated in the above procedure. More precisely, the discrete lines w.r.t. the directions \(a_s\) induce a directional segmentation on \(\Omega \) and the image of this directional segmentation under the above serialization procedure induces a partitioning of the univariate domain X;  precisely between these segments, the weight vector equals zero. Now, for each segment \([d_1,\ldots ,d_r]\) in X,  we transform the basis \(\delta _{d_1},\ldots ,\delta _{d_r}\) to the basis \(\delta _{d_2}-\delta _{d_1}, \ldots , \delta _{d_r}-\delta _{d_{r-1}}, \tfrac{1}{r}\sum _{l=1}^r \delta _{d_l} \) obtained by neighboring differences and the average. As a result (which is elaborated in detail in [84]), we obtain a problem of the form

$$\begin{aligned} {{\hat{P}}}_{\gamma ,\rho }({{\hat{u}}}) = \left\| {\tilde{B}} {\tilde{u}} - {\tilde{b}} \right\| _2^2 + \gamma \ \Big \Vert {{\hat{\omega }}} {\tilde{u}} \Big \Vert _{0}, \quad \text { where } {{\hat{\omega }}}: X \rightarrow [0,\infty ) \end{aligned}$$
(61)

is the weight vector from above. This is a sparsity problem which is known to be NP-hard; see, for instance, [84]. This shows the assertion. \(\square \)
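To make the serialization step of the preceding proof more concrete, the following rough sketch is given under the assumption \(S=2\) with the horizontal and vertical directions; the handling of the jump weights is simplified and chosen only for illustration.

```python
# Rough illustration of the serialization (assumption: S = 2 with the horizontal and
# vertical directions; the handling of the jump weights is simplified): every line of
# each component becomes one chunk, and the weight is set to zero at the line breaks.
import numpy as np

def serialize(u1, u2):
    chunks, weights = [], []
    for line in u1:                        # direction a_1: the rows of u_1
        chunks.append(line)
        w = np.ones(len(line))
        w[0] = 0.0                         # no jump penalty across the line break
        weights.append(w)
    for line in u2.T:                      # direction a_2: the columns of u_2
        chunks.append(line)
        w = np.ones(len(line))
        w[0] = 0.0
        weights.append(w)
    return np.concatenate(chunks), np.concatenate(weights)

u1 = np.arange(12.0).reshape(3, 4)
u2 = np.arange(12.0).reshape(3, 4)
u_hat, w_hat = serialize(u1, u2)
print(u_hat.shape, w_hat)                  # length 2 * 12; zeros mark the line breaks
```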

We next characterize the local minimizers of the relaxed Potts problem (5) and of the Potts problem (2).

Lemma 4

A local minimizer \(u=(u_1,\ldots ,u_S)\) of the quadratic penalty relaxation (5) is characterized as follows: let \({\mathcal {I}}\) be the directional partitioning induced by the minimizer u,  and \({\mathcal {P}} = {\mathcal {P}}_{{\mathcal {I}}}\) be the induced partitioning, then u is a minimizer of the problem

$$\begin{aligned} \min _{u \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(u), \quad \text { where } \quad F_{\rho }(u) = \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Au_s - f \right\| _2^2 + \rho \Vert C u \Vert ^2. \end{aligned}$$
(62)

Conversely, if u minimizes (62) on \({\mathcal {A}}^{{\mathcal {P}}},\) then u is a local minimizer of the relaxed Potts problem (5).

Proof

Let \(u=(u_1,\ldots ,u_S)\) be a local minimizer of the quadratic penalty relaxation (5). Hence, there is a neighborhood \({\mathcal {U}}\) of u such that, for any \(v \in {\mathcal {U}},\) \(P_{\gamma , \rho }(v) \ge P_{\gamma , \rho }(u).\) Now if \(v \in {\mathcal {A}}^{{\mathcal {P}}}\) and \(\Vert v-u\Vert \) is small, then \(\sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0\) \( =\sum _{s=1}^S \omega _s \left\| \nabla _{a_s} v_s \, \right\| _0\) which implies that

$$\begin{aligned} F_{\rho }(u) = P_{\gamma , \rho }(u) - \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \,\right\| _0 \le P_{\gamma , \rho }(v) - \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} v_s \, \right\| _0 = F_{\rho }(v). \end{aligned}$$
(63)

This shows that u minimizes (62). Conversely, we assume that u minimizes (62). If the directional partitioning \({\mathcal {I}}'\) induced by u is coarser than \({\mathcal {I}},\) we consider the coarser directional partitioning \({\mathcal {I}}'\) instead of \({\mathcal {I}}.\) Let the maximum norm of \(h=(h_1,\ldots ,h_S)\) be smaller than the height of the smallest jump of u;  then, for \(u+h,\)

$$\begin{aligned} \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u_s+h_s) \, \right\| _0 \ge \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0. \end{aligned}$$
(64)

If strict inequality holds in (64), the continuity of \(F_{\rho }\) implies that \(F_{\rho }(u+h) \ge F_{\rho }(u) - \varepsilon \) for small enough h and arbitrary \(\varepsilon > 0.\) Hence,

$$\begin{aligned} P_{\gamma , \rho }(u)&= F_{\rho }(u) + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0 \le F_{\rho }(u+h) - \gamma \min _s \omega _s \nonumber \\&\quad + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u_s+h_s) \, \right\| _0 + \varepsilon \nonumber \\&\le F_{\rho }(u+h) + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u_s+h_s) \, \right\| _0 = P_{\gamma , \rho }(u+h), \end{aligned}$$
(65)

if we choose \(\varepsilon \) small enough. If equality holds in (64), we have that \(u+h \in {\mathcal {A}}^{{\mathcal {P}}}\) which implies \(F_{\rho }(u) \le F_{\rho }(u+h)\) since u is a minimizer of \(F_{\rho }\) on \({\mathcal {A}}^{{\mathcal {P}}}.\) This in turn implies \(P_{\gamma , \rho }(u) \le P_{\gamma , \rho }(u+h)\) by the assumed equality in (64). Together, in any case, \(P_{\gamma , \rho }(u) \le P_{\gamma , \rho }(u+h)\) for any small perturbation h. This shows that u is a local minimizer of \(P_{\gamma , \rho }\) which completes the proof. \(\square \)

Lemma 5

We consider a function \(u^*:\Omega \rightarrow {\mathbb {R}}\) and its induced partitioning \({\mathcal {P}}.\) Then, \(u^*\) is a local minimizer of the Potts problem (2) if and only if \((u^*,\ldots ,u^*)\) minimizes (48) w.r.t. \({\mathcal {P}}.\)

Proof

Since the proof of this statement is very similar to the proof of Lemma 4, we keep it rather short and refer to the proof of Lemma 4 if more explanation is necessary. Let \(u^*\) be a local minimizer of (2); writing \(u = u^*,\) this is equivalent to \({{\bar{u}}} = (u,\ldots ,u)\) being a local minimizer of (4). There is a neighborhood \({\mathcal {U}}\) of \({{\bar{u}}}\) such that, for any \({{\bar{v}}}=(v,\ldots ,v) \in {\mathcal {U}},\) \(P_{\gamma }(v) \ge P_{\gamma }(u).\) For \({{\bar{v}}} \in {\mathcal {B}}^{{\mathcal {P}}}\) with small \(\Vert {{\bar{v}}}- {{\bar{u}}}\Vert ,\) we have \(\sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u \, \right\| _0\) \( =\sum _{s=1}^S \omega _s \left\| \nabla _{a_s} v \, \right\| _0.\) Hence, by the definition of \(P_\gamma \) in (4),  \( \Vert Au - f \Vert _2^2 \le \Vert Av - f \Vert _2^2 \) which shows that \((u^*,\ldots ,u^*)\) minimizes (48).

Conversely, let \({{\bar{u}}} = (u,\ldots ,u)\) be a minimizer of (48) with the partitioning \({\mathcal {P}}\) induced by u. For \({{\bar{h}}}=(h,\ldots ,h)\) with absolute value smaller than the minimal height of a jump of u,  we have the estimate \( \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u+h) \, \right\| _0 \) \( \ge \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u \, \right\| _0. \) If strict inequality holds in this estimate, the continuity of the data term implies that \( \Vert A(u+h) - f \Vert _2^2 \ge \Vert Au - f \Vert _2^2 - \varepsilon \) for small enough h and arbitrary \(\varepsilon > 0.\) Hence, \( P_{\gamma }( {{\bar{u}}}) \le \Vert A(u+h) - f \Vert _2^2 - \gamma \min _s \omega _s + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u+h) \, \right\| _0 + \varepsilon \) \( \le P_{\gamma }({{\bar{u}}}+{{\bar{h}}}) \) if \(\varepsilon \) is small. If equality holds above, i.e., \( \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u+h) \, \right\| _0 \) \( = \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u \, \right\| _0, \) then \({{\bar{u}}}+ {{\bar{h}}} \in {\mathcal {B}}^{{\mathcal {P}}}\) which implies that \(\Vert Au - f \Vert _2^2 \le \Vert A(u+h) - f \Vert _2^2\) since \({{\bar{u}}}\) is a minimizer of the corresponding function on \({\mathcal {B}}^{{\mathcal {P}}}.\) As a consequence,  \(P_{\gamma }({{\bar{u}}}) \le P_{\gamma }({{\bar{u}}}+ {{\bar{h}}})\) for any small perturbation h. This shows that u is a local minimizer of \(P_{\gamma }\) which completes the proof. \(\square \)

Proposition 1

Any local minimizer of the quadratic penalty relaxation (5) is an approximate local minimizer in the sense of (35) of the Potts problem (3).

Proof

By Lemma 4, a local minimizer \(u=(u_1,\ldots ,u_S)\) of the quadratic penalty relaxation (5) is a minimizer of the problem (62). Let us thus consider a local minimizer u of (5) with induced partitioning \({\mathcal {P}} = {\mathcal {P}}_{{\mathcal {I}}}.\) Since u minimizes (62), we have

$$\begin{aligned} \tfrac{1}{S} P_{{\mathcal {I}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}} u - \tfrac{1}{S} P_{{\mathcal {I}}} {\tilde{A}}^\mathrm{T} {\tilde{f}} + \rho P_{{\mathcal {I}}} C^\mathrm{T} C P_{{\mathcal {I}}} u = 0 \end{aligned}$$
(66)

since the gradient projected to \({\mathcal {A}}^{{\mathcal {P}}}\) equals zero for any local minimizer of the restricted problem on the subspace \({\mathcal {A}}^{{\mathcal {P}}}.\) (The notation is chosen as in (53) above.) We define \(\mu \) by \(\mu = -2\rho \, C P_{{\mathcal {I}}} u\) and obtain

$$\begin{aligned} L(\mu ) = \Vert \tfrac{2}{S} P_{{\mathcal {I}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}} u - \tfrac{2}{S} P_{{\mathcal {I}}} {\tilde{A}}^\mathrm{T} {\tilde{f}} - P_{{\mathcal {I}}} C^\mathrm{T} \mu \Vert = 0 \end{aligned}$$
(67)

by (66). It remains to show that \(\Vert Cu\Vert \) becomes small. To this end, we observe that, by Lemma 6 below, for arbitrary \(v=(v_1,\ldots ,v_S)\in {\mathcal {A}}^{{\mathcal {P}}}\), \( \Vert C v \Vert = \Vert C P_{{\mathcal {I}}} v \Vert \le \tfrac{1}{\rho }\Vert \mu ^*\Vert + \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }}, \) where \(\mu ^*\) is an arbitrary Lagrange multiplier of (48). Plugging in the minimizer u for v yields \(\Vert C u \Vert \le \tfrac{1}{\rho }\Vert \mu ^*\Vert .\) Thus, letting \(\delta := \tfrac{1}{\rho ^2}\Vert \mu ^*\Vert ^2,\) we have

$$\begin{aligned} \sum _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2 = \Vert Cu\Vert ^2 \le \delta , \end{aligned}$$
(68)

and \(L(\mu ) = 0\) by (67) which by (35) shows the assertion and completes the proof. \(\square \)

In the proof of Proposition 1, as well as in the following, we need the next lemma. Similar statements are [53, Proposition 13] and [60, Lemma 2.5]. However, since there are differences concerning the precise estimate in these references, and the setup here is slightly different, we provide a brief proof for the reader's convenience.

Lemma 6

Let \({\mathcal {P}}\) be a partitioning and \({\mathcal {I}} = {\mathcal {I}}_{{\mathcal {P}}}\) be the corresponding induced directional partitioning. For arbitrary \(v=(v_1,\ldots ,v_S)\in {\mathcal {A}}^{{\mathcal {P}}}\),

$$\begin{aligned} \Vert C v \Vert = \Vert C P_{{\mathcal {I}}} v \Vert \le \tfrac{1}{\rho }\Vert \mu ^*\Vert + \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }}, \end{aligned}$$
(69)

where \(\mu ^*\) is an arbitrary Lagrange multiplier of (48).

Proof

By [53, Corollary 2], we have for arbitrary \(v=(v_1,\ldots ,v_S)\in {\mathcal {A}}^{{\mathcal {P}}}\) that

$$\begin{aligned} \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Av_s - f \right\| _2^2 - \min _{(y,\ldots ,y) \in {\mathcal {B}}^{{\mathcal {P}}}} \left\| Ay - f \right\| _2^2 \ge - \Vert \mu ^*\Vert \ \Vert Cv\Vert . \end{aligned}$$
(70)

Then,

$$\begin{aligned} F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)&\ge \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Av_s - f \right\| _2^2 + \rho \Vert C v \Vert ^2 - \min _{(y,\ldots ,y) \in {\mathcal {B}}^{{\mathcal {P}}}} F_{\rho }(y,\ldots ,y) \nonumber \\&= \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Av_s - f \right\| _2^2 +\rho \Vert C v \Vert ^2 - \min _{(y,\ldots ,y) \in {\mathcal {B}}^{{\mathcal {P}}}} \left\| Ay - f \right\| _2^2 \nonumber \\&\ge \rho \Vert C v \Vert ^2 - \Vert \mu ^*\Vert \ \Vert Cv\Vert . \end{aligned}$$
(71)

For the first inequality, we wrote down the definition of \(F_{\rho }\) and restricted the set with respect to which the minimum is formed, which results in a potentially larger minimum value. For the equality, we notice that, for \((y,\ldots ,y) \in {\mathcal {B}}^{{\mathcal {P}}}\), we have \(C(y,\ldots ,y)=0\) so that \(F_{\rho }(y,\ldots ,y) = \left\| Ay - f \right\| _2^2;\) for the last inequality we employed (70). Now, writing \( z^2 - \frac{\Vert \mu ^*\Vert }{\rho } z\) \(= z^2 - \frac{\Vert \mu ^*\Vert }{\rho } z + \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2 - \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2\) \(= (z - \tfrac{\Vert \mu ^*\Vert }{2\rho } )^2 - \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2 \) and plugging this into (71) with \(z := \Vert Cv\Vert \) yields

$$\begin{aligned} \tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho } \ge \left( \Vert Cv\Vert - \tfrac{\Vert \mu ^*\Vert }{2\rho } \right) ^2 - \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2, \end{aligned}$$
(72)

and hence

$$\begin{aligned} \left| \Vert Cv\Vert - \tfrac{\Vert \mu ^*\Vert }{2\rho } \right| \le \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho } + \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2 } \le \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }} + \tfrac{\Vert \mu ^*\Vert }{2\rho } \end{aligned}$$
(73)

where the last inequality is a consequence of the fact that the unit ball w.r.t. the \(\ell ^1\) norm is contained in the unit ball w.r.t. the \(\ell ^2\) norm. As a consequence, \(\Vert Cv\Vert \le \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }} + \tfrac{\Vert \mu ^*\Vert }{2\rho } + \tfrac{\Vert \mu ^*\Vert }{2\rho }\) which completes the proof. \(\square \)

Next, we see that for any local minimizer of the quadratic penalty relaxation (5), we can find a nearby feasible point using the projection procedure (Procedure 1) proposed in Sect. 2.2. Further, if the imaging operator A is lower bounded, we find a nearby local minimizer of the Potts problem.

Proposition 2

Procedure 1 applied to a local minimizer \(u'=(u'_1,\ldots ,u'_S)\) of the quadratic penalty relaxation (5) produces a feasible image \({{\hat{u}}}\) (together with a valid partitioning) for the Potts problem (3) which is close to \(u'\) in the sense that

$$\begin{aligned} \Vert u_s'-{{\hat{u}}}\Vert \le C_1 \varepsilon \qquad \text {for all} \quad s \in \{1,\ldots ,S\}, \end{aligned}$$
(74)

where \(\varepsilon = \max _{s,s'} \Vert u'_s-u'_{s'}\Vert \) quantifies the deviation between the \(u_s.\) Here \(C_1 = \# \Omega /4, \) where the symbol \(\# \Omega \) denotes the number of elements in \(\Omega .\)

If the imaging operator A is lower bounded, i.e., there is a constant \(c>0\) such that \(\Vert Au\Vert \ge c \Vert u\Vert \), a local minimizer \(u^*\) of the Potts problem (3) is nearby, i.e.,

$$\begin{aligned} \Vert u^*-{{\hat{u}}}\Vert \le \frac{\sqrt{\eta }}{c} \end{aligned}$$
(75)

where

$$\begin{aligned} \eta := \left( \Vert A \Vert ^2 \varepsilon C_1^2 + 2 \Vert A \Vert C_1 \Vert f \Vert _2 \right) \varepsilon . \end{aligned}$$
(76)

Proof

We denote the directional partitioning induced by \(u'\) by \({\mathcal {I}}\) and the corresponding induced partitioning by \({\mathcal {P}} = {\mathcal {P}}_{{\mathcal {I}}}.\) We note that Procedure 1 applied to \(u'\) precisely produces

$$\begin{aligned} ({{\hat{u}}},\ldots ,{{\hat{u}}})= Q_{\mathcal {P}} u', \end{aligned}$$
(77)

with the projection \(Q_{\mathcal {P}}\) given by (47). We first note that the average \(({{\bar{u}}})_{ij} = \tfrac{1}{S} \sum _{s=1}^{S} (u_s')_{ij}\) fulfills \(|({{\bar{u}}})_{ij}- (u_s')_{ij}| \le \varepsilon .\) Further, the function value of \({{\hat{u}}},\) which is piecewise constant w.r.t. \({\mathcal {P}},\) is obtained by \({{\hat{u}}}|_{{\mathcal {P}}_i} = \sum _{x \in {\mathcal {P}}_i}{{\bar{u}}}(x)/ \# {\mathcal {P}}_i.\) Hence, we may estimate

$$\begin{aligned} \Vert u_s'-{{\hat{u}}}\Vert _2 \le \varepsilon L, \end{aligned}$$
(78)

where L is the maximal length of a path connecting any two pixels as given by Definition 1. As a worst case estimate, we get \(L\le C_1\) where we define \(C_1\) as one fourth of the number of elements in \(\Omega ,\) i.e., \(C_1 = \tfrac{\# \Omega }{4}.\) This shows (74).

For \(F_{\rho }\) given by (62), we have

$$\begin{aligned} F_{\rho }(u')&\le F_{\rho }({{\hat{u}}},\ldots ,{{\hat{u}}}) = \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| A{{\hat{u}}} - f \right\| _2^2 \nonumber \\&\le \sum \nolimits _{s=1}^S \tfrac{1}{S} \left( \left\| A{{\hat{u}}} - A u_s' \right\| _2 + \left\| Au_s' - f \right\| _2\right) ^2 \nonumber \\&\le \sum \nolimits _{s=1}^S \tfrac{1}{S} \left( \Vert A \Vert \varepsilon C_1 + \left\| Au_s' - f \right\| _2\right) ^2 \\&\le \Vert A \Vert ^2 \varepsilon ^2 C_1^2 + 2 \Vert A \Vert \varepsilon C_1 \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Au_s' - f \right\| _2 + \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Au_s' - f \right\| _2^2 \nonumber \\&\le \eta + F_{\rho }(u'), \nonumber \end{aligned}$$
(79)

with

$$\begin{aligned} \eta = \left( \Vert A \Vert ^2 \varepsilon C_1^2 + 2 \Vert A \Vert C_1 \Vert f \Vert _2 \right) \varepsilon , \end{aligned}$$
(80)

as given in (76). The first inequality holds since, as a local minimizer of the quadratic penalty relaxation (5), \(u'\) is the global minimizer of \(F_{\rho }\) on \({\mathcal {A}}^{{\mathcal {P}}}\) by Lemma 4 and since \(({{\hat{u}}},\ldots ,{{\hat{u}}}) \in {\mathcal {A}}^{{\mathcal {P}}}\) by construction. The next inequalities apply the triangle inequality, the estimate (74) and standard bounds on operator norms. The last inequality is a consequence of the fact that \(\sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Au_s' - f \right\| _2 \le \Vert f\Vert _2:\) otherwise, i.e., if \(\Vert Au_s' - f \Vert _2 > \Vert f \Vert _2\) for some s,  choosing \(u_s'=0\) would yield a lower function value which would contradict the minimality of \(u'.\)

Now consider the partitioning \({\mathcal {P}}'\) induced by \({{\hat{u}}},\) and the corresponding minimizer \(u^*,\) i.e.,

$$\begin{aligned} (u^*,\ldots ,u^*) = \mathop {{{\text {argmin}}}}\limits _{u \in {\mathcal {B}}^{{\mathcal {P}}'}} F_{\rho }(u) \end{aligned}$$
(81)

where, for \((u,\ldots ,u) \in {\mathcal {B}}^{{\mathcal {P}}'},\) we have \(F_{\rho }(u,\ldots ,u) = \left\| Au - f \right\| _2^2.\) By Lemma 5, \(u^*\) is a local minimizer of the Potts problem (2). On the other hand, by orthogonality in an inner product space, we have

$$\begin{aligned} Au^*= P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) } f, \quad \text { and }\quad \Vert f- P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) } f\Vert ^2 = \min _{u \in {\mathcal {B}}^{{\mathcal {P}}'}} F_{\rho }(u), \end{aligned}$$
(82)

where \(P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) }\) denotes the orthogonal projection onto the image of \({\mathcal {B}}^{{\mathcal {P}}'}\) under the linear mapping A. Thus,

$$\begin{aligned} \Vert A {{\hat{u}}} - Au^*\Vert ^2&= \Vert A {{\hat{u}}} - P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) } f\Vert ^2 \nonumber \\&= \Vert A {{\hat{u}}} - f\Vert ^2 - \Vert f- P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) } f\Vert ^2 = \Vert A {{\hat{u}}} - f\Vert ^2 - \Vert A u^*- f\Vert ^2. \end{aligned}$$
(83)

Inserting \(u^*\) in the estimate (79), we get

$$\begin{aligned} F_{\rho }(u') \le F_{\rho }(u^*,\ldots ,u^*) \le F_{\rho }({{\hat{u}}},\ldots ,{{\hat{u}}}) \le \eta + F_{\rho }(u') \le \eta + F_{\rho }(u^*,\ldots ,u^*). \end{aligned}$$
(84)

This allows us to further estimate

$$\begin{aligned} \Vert A {{\hat{u}}} - Au^*\Vert ^2&= \Vert A {{\hat{u}}} - f\Vert ^2 - \Vert A u^*- f\Vert ^2 \le \Vert A u^*- f\Vert ^2+ \eta - \Vert A u^*- f\Vert ^2 = \eta . \end{aligned}$$
(85)

If now the operator A is lower bounded, then

$$\begin{aligned} \Vert {{\hat{u}}} - u^*\Vert ^2 \le \frac{1}{c^2} \Vert A {{\hat{u}}} - Au^*\Vert ^2 \le \frac{\eta }{c^2} \end{aligned}$$
(86)

which completes the proof. \(\square \)

3.4 Majorization–Minimization for Multivariate Potts Problems

In this part, we lay the foundation for the convergence analysis of Algorithms 1 and 2.

We first recall some basics on surrogate functionals. We consider functionals F(u) of the form \(F(u) = \Vert Xu-z\Vert ^2+ \gamma J(u),\) where X is a given (measurement) matrix with operator norm \(\Vert X\Vert <1\) (with the operator norm formed w.r.t. the \(\ell ^2\) norm), z is a given vector (of data), J is an arbitrary (not necessarily convex) lower semicontinuous functional, and \(\gamma >0\) is a parameter. In general, the surrogate functional \(F^{\mathrm{surr}}(u,v)\) of F(u) is given by

$$\begin{aligned} F^{\mathrm{surr}}(u,v) = F(u) + \Vert u-v\Vert ^2 - \Vert Xu-Xv\Vert ^2. \end{aligned}$$
(87)

Lemma 7

Consider the functionals \(F(u) = \Vert Xu-z\Vert ^2+ \gamma J(u)\) as above with \(\Vert X\Vert <1.\) (For our purposes, J is the regularizer \(\Vert D(u)\Vert _{0,\omega }\) given by (10).) Then, we get for the associated surrogate functional \(F^{\mathrm{surr}}\) given by (87) (with J as regularizer), that

  i.

    the inequality

    $$\begin{aligned} F^{\mathrm{surr}}(u,v) \ge F(u) \end{aligned}$$

    holds for all v;  and \(F^{\mathrm{surr}}(u,v) = F(u)\) holds if and only if \(u=v;\)

  ii.

    the functional values \(F(u^k)\) of the sequence \(u^k\) given by the surrogate iteration \(u^{k+1} = {\text {argmin}}_u F^{\mathrm{surr}}(u,u^k)\) are non-increasing, i.e.,

    $$\begin{aligned} F(u^{k+1}) \le F(u^{k}); \end{aligned}$$
    (88)
  iii.

    the distance between consecutive members of the previous surrogate sequence \(u^k\) converges to 0,  i.e.,

    $$\begin{aligned} \lim _{k \rightarrow \infty } \Vert u^{k+1}-u^k\Vert = 0. \end{aligned}$$
    (89)

We note that, when minimizing F,  the condition \(\Vert X\Vert <1\) can always be achieved by rescaling, i.e., by dividing the functional F by a number which is larger than \(\Vert X\Vert ^2.\) Proofs of the general statements above on surrogate functionals (which do not rely on the specific structure of the problems considered here) may for instance be found in the above-mentioned papers [9, 28, 33].
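For illustration, the following sketch runs the surrogate iteration of Lemma 7 for the sparsity regularizer \(J(u)=\Vert u\Vert _0\) (chosen here instead of the directional Potts regularizer purely to keep the surrogate minimization explicit); the monotonicity (88) is checked in every step.

```python
# Sketch of the surrogate iteration of Lemma 7 with J(u) = ||u||_0 (a simpler regularizer
# chosen here only to make the surrogate minimizer explicit; the paper uses the
# directional Potts regularizer instead). The monotonicity (88) is checked in every step.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
X /= 1.01 * np.linalg.norm(X, 2)           # rescale so that ||X|| < 1
u_true = np.zeros(50)
u_true[:3] = [3.0, -2.0, 1.5]
z = X @ u_true
gamma = 0.05

def F(u):                                   # F(u) = ||Xu - z||^2 + gamma * ||u||_0
    return np.sum((X @ u - z) ** 2) + gamma * np.count_nonzero(u)

u = np.zeros(50)
for _ in range(200):
    h = u + X.T @ (z - X @ u)               # F^surr(., u) equals ||. - h||^2 + gamma ||.||_0 + const
    u_new = np.where(h ** 2 > gamma, h, 0.0)  # exact minimizer: componentwise hard threshold
    assert F(u_new) <= F(u) + 1e-12           # monotonicity, cf. (88)
    u = u_new
print(np.nonzero(u)[0], F(u))
```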

We now employ properties of the quadratic penalty relaxation \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) of the Potts energy given by (5). The strategy is similar to the authors’ approach for the univariate case in [90]. We first show that the minimizers of \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) (with \(B= \mathrm {id}\) in (11)), which are precisely the solutions of (16), have a minimal directional jump height which only depends on the scale parameter \(\gamma ,\) the directional weights \(\omega _s\) and the constant \(L_{\rho }\) but not on the particular input data. Here, for the multivariate discrete function \(u= (u_1,\ldots ,u_S)\) (and the directional system \(a_s,\) \(s=1,\ldots ,S\)) a directional jump is a jump in the sth component \(u_s\) in direction \(a_s\) for some s. In particular, jumps of \(u_s\) in directions \(a_{s'}\) with \(s'\ne s\) are not considered.

Lemma 8

We consider the function \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) of (11) for the choice \(B=\mathrm {id}\) and data \(h=(h_1,\ldots ,h_S)\). In other words, we consider the problem (16) for arbitrary data \(h=(h_1,\ldots ,h_S)\). Then, there is a constant \(c>0\) which is independent of the minimizer \(u^*=(u_1^*,\ldots ,u_S^*)\) of (16) and the data h such that the minimal directional jump height \(j_{\min }(u^*)\) (w.r.t. the directional system \(a_s,\) \(s=1,\ldots ,S,\)) of a minimizer \(u^{*}\) fulfills

$$\begin{aligned} j_{\min }(u^*) \ge c. \end{aligned}$$
(90)

The constant c depends on \(\gamma ,\) the directional weights \(\omega _s\) and the constant \(L_{\rho }.\)

Proof

Writing \( u = (u_1,\ldots ,u_S)\), we restate (16) as the problem of minimizing

$$\begin{aligned} P^{\mathrm {id}}_{\gamma /L_{\rho }^2}(u_1,\ldots ,u_S) = \left\| u - h \right\| _2^2 + \frac{\gamma }{L_{\rho }^2} \ \Big \Vert \ D(u_1,\ldots ,u_S) \ \Big \Vert _{0,\omega } \end{aligned}$$
(91)

where we use the notation \(\Vert D(u_1,\ldots ,u_S)\Vert _{0,\omega } = \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0\) introduced in (10). We let

$$\begin{aligned} c = \sqrt{\tfrac{\gamma \ \min _{s \in \{1,\ldots ,S\}} \omega _s }{L_{\rho }^2 W}}, \end{aligned}$$
(92)

where W denotes the maximal length of the signal u per dimension (e.g., if u denotes an \(l \times b\) image, then \(W=\max (l,b)\).) We now assume that \(j_{\min }(u^*) < c,\) which means that the minimizer \(u^*\) has a directional jump of height smaller than c. For such \(u^*,\) we construct an element \(u'\) with a smaller \(P^{\mathrm {id}}_{\gamma /L_{\rho }^2}\) value which yields a contradiction since \(u^*\) is a minimizer of \(P^{\mathrm {id}}_{\gamma /L_{\rho }^2}\). To this end, we let \(a_s\) be a direction such that the component \(u^*_s\) of \(u^*\) has a jump of height smaller than c. We denote the (discrete) directional intervals in direction \(a_s\) adjacent to this directional jump by \(I_1,I_2\) and the grid points on either side of the jump of \(u_s^*\) by \(x_1\) and \(x_2.\) We let \(m_1,m_2\) and m be the mean of \(h_s\) on \(I_1,I_2\) and \(I_1 \cup I_2\), respectively. We define

$$\begin{aligned} u_{s'}' = u_{s'}^*\quad \text {if} \quad s' \ne s, \qquad \text { and } \qquad u_s'(x) = {\left\{ \begin{array}{ll} m &{} \text { for } x \in I_1 \cup I_2 \\ u_s^*(x) &{} \text { elsewhere. } \end{array}\right. } \end{aligned}$$
(93)

By construction, \(\Vert \nabla _{a_s} u_s'\Vert _0 = \Vert \nabla _{a_s} u_s^*\Vert _0 -1,\) and thus

$$\begin{aligned} \Vert D(u'_1,\ldots ,u'_S)\Vert _{0,\omega } = \Vert D(u^*_1,\ldots ,u^*_S)\Vert _{0,\omega } - \omega _s \le \Vert D(u^*_1,\ldots ,u^*_S)\Vert _{0,\omega } - \min _{s \in \{1,\ldots ,S\}} \omega _s. \end{aligned}$$
(94)

Since \(u^*\) is a minimizer of \(P^{\mathrm {id}}_{\gamma /L_{\rho }^2},\) its sth component \(u^*_s\) equals \(m_1\) on \(I_1\) and \(m_2\) on \(I_2.\) Further, as \(u_{s'}' = u_{s'}^*\) if \(s' \ne s\) and \(u_s^*\) and \(u_s'\) only differ on \(I_1 \cup I_2,\) we have that

$$\begin{aligned} \Vert u'-h\Vert ^2 = \sum _{s'=1}^S \Vert u_{s'}'-h_{s'}\Vert ^2&= \sum _{s'=1, s' \ne s }^S \Vert u_{s'}^*-h_{s'}\Vert ^2 +\Vert u_s^*-h_s\Vert ^2 \nonumber \\&\quad + l_1 |m_1-m|^2 + l_2 |m_2-m|^2 \nonumber \\&< \Vert u^*-h\Vert ^2 + W c^2, \end{aligned}$$
(95)

where \(l_1,l_2\) denote the lengths of \(I_1\) and \(I_2,\) respectively. Here we used that \(l_1 |m_1-m|^2 + l_2 |m_2-m|^2 = \tfrac{l_1 l_2}{l_1+l_2} |m_1-m_2|^2 \le \min (l_1,l_2) \, |m_1-m_2|^2 < W c^2,\) since the jump height \(|m_1-m_2|\) of \(u^*_s\) is smaller than c. Employing (94) together with (95), we get

$$\begin{aligned} P^{\mathrm {id}}_{\gamma /L_{\rho }^2}(u'_1,\ldots ,u'_S)&= \left\| u' - h \right\| _2^2 + \frac{\gamma }{L_{\rho }^2} \ \Big \Vert \ D(u'_1,\ldots ,u'_S) \ \Big \Vert _{0,\omega } \\&< \Vert u^*-h\Vert ^2 + W c^2 + \frac{\gamma }{L_{\rho }^2}\Vert D(u^*_1,\ldots ,u^*_S)\Vert _{0,\omega } - \frac{\gamma }{L_{\rho }^2}\min _{s \in \{1,\ldots ,S\}} \omega _s \\&\le \Vert u^*-h\Vert ^2 + \frac{\gamma }{L_{\rho }^2}\Vert D(u^*_1,\ldots ,u^*_S)\Vert _{0,\omega } = P^{\mathrm {id}}_{\gamma /L_{\rho }^2}(u^*_1,\ldots ,u^*_S). \end{aligned}$$

The validity of the last inequality follows by (92). Together, \(u'\) has a smaller function value than \(u^*,\) contradicting the minimality of \(u^*.\) This shows the assertion. \(\square \)
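The merging argument of the proof can be checked numerically: the following small sketch (illustrative values only) compares the increase of the data term with the decrease of the jump penalty when a jump of height below c is removed.

```python
# Numerical illustration (values chosen arbitrarily) of the merging step: removing a
# directional jump of height below c increases the data term by less than the penalty saved.
import numpy as np

gamma, omega_min, L_rho_sq, W = 1.0, 1.0, 4.0, 10.0
c = np.sqrt(gamma * omega_min / (L_rho_sq * W))   # the constant of (92)

l1, l2 = 4, 6                       # lengths of the intervals I_1, I_2 (both at most W)
m1 = 0.0
m2 = m1 + 0.9 * c                   # a jump of height below c
m = (l1 * m1 + l2 * m2) / (l1 + l2)

data_increase = l1 * (m1 - m) ** 2 + l2 * (m2 - m) ** 2   # = l1*l2/(l1+l2) * (m1 - m2)^2
penalty_saved = gamma / L_rho_sq * omega_min              # one directional jump less, cf. (94)
print(data_increase < penalty_saved)                      # True: merging decreases the energy
```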

Proposition 3

The iteration (19) of Algorithm 1 converges to a local minimizer of the quadratic penalty relaxation \(P_{\gamma , \rho }\) of the Potts objective function given by (5). The convergence rate is linear.

Proof

We divide the proof into three parts. First, we show that the directional partitionings induced by the iterates \(u^{(n)}\) become fixed after sufficiently many iterations. In a second part, we derive the convergence of Algorithm 1 and, in a third part, we show that the limit point is a local minimizer of \(P_{\gamma , \rho }\).

(1) We first show that the directional partitioning \({\mathcal {I}}^n\) induced by the iterates \(u^{(n)}\) gets fixed for large n. For every \(n \in {\mathbb {N}},\) the iterate \(u^{(n)}\) of Algorithm 1 is a minimizer of the function \(P_{\gamma , \rho }\) of (11) for the choice \(B=\mathrm {id}\) as it appears in (16). Here, the data \(h=(h_1,\ldots ,h_S)\) is given by (17). By Lemma 8, there is a constant \(c>0\) which is independent of the particular \(u^{(n)} =(u_1^{(n)},\ldots ,u_S^{(n)})\) of (16) and the data h such that the minimal directional jump height \(j_{\min }(u^{(n)})\) fulfills

$$\begin{aligned} j_{\min }(u^{(n)}) \ge c \quad \text { for all }\quad n \in {\mathbb {N}}. \end{aligned}$$
(96)

We note that the parameter \(\gamma ,\) the directional weights \(\omega _s\) and the constant \(L_{\rho }\) which the constant c depends on by Lemma 8 do not change during the iteration of Algorithm 1.

If two iterates \(u^{(n)},u^{(n+1)}\) have different induced directional partitionings \({\mathcal {I}}^n, {\mathcal {I}}^{n+1},\) their \(\ell ^\infty \) distance always fulfills \(\Vert u^{(n)}-u^{(n+1)}\Vert _\infty > c/2\) since both \(u^{(n)},u^{(n+1)}\) have minimal jump height of at least c and different induced directional partitionings. This implies \(\Vert u^{(n)}-u^{(n+1)}\Vert _2> c/2\) for the \(\ell ^2\) distance as well. This may only happen for finitely many n since, by (89) of Lemma 7, we have \(\Vert u^{(n)}-u^{(n+1)}\Vert _2 \rightarrow 0\) as n increases. Hence, there is an index N such that, for all \(n \ge N,\) the directional partitionings \({\mathcal {I}}^n\) are identical.

(2) We use the previous observation to show the convergence of Algorithm 1. We consider iterates \(u^{(n)}\) with \(n \ge N;\) they have the same induced directional partitionings which we denote by \({\mathcal {I}}',\) and all jumps have minimal jump height c. Hence, for \(n \ge N,\) the iteration of (16) can be written as

$$\begin{aligned} u^{(n+1)} = P_{{\mathcal {I}}'}(h^{(n)}) \end{aligned}$$
(97)

with \(P_{{\mathcal {I}}'}\) being the orthogonal projection onto the \(\ell ^2\) space \({\mathcal {A}}^{{\mathcal {P}}}\) consisting of functions which are piecewise constant w.r.t. the directional partitioning \({\mathcal {I}}',\) and where \(h^{(n)}\) depends on \(u^{(n)}\) via

$$\begin{aligned} h^{(n)}_s = u^{(n)}_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A u^{(n)}_s - \sum _{s':s' \ne s}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (u^{(n)}_s-u^{(n)}_{s'}), \quad \text { for all } s \in \{1,\ldots ,S\}, \end{aligned}$$
(98)

as given by (17). As introduced before, we use the symbols \({\tilde{A}}\) to denote the block diagonal matrix with constant entry A on each diagonal component, and \({\tilde{f}}\) for the block vector of corresponding dimensions with entry f in each component. With this notation, we may write (97) as

$$\begin{aligned} u^{(n+1)} = P_{{\mathcal {I}}'}((I - \tfrac{1}{SL_{\rho }^2} ({\tilde{A}})^\mathrm{T} {\tilde{A}}- \tfrac{1}{SL_{\rho }^2} \rho \ C^\mathrm{T}C) u^{(n)} + \tfrac{1}{SL_{\rho }^2} {\tilde{A}}^\mathrm{T}{\tilde{f}}). \end{aligned}$$
(99)

Since \(u^{(n)}\) is piecewise constant w.r.t. the directional partitioning \({\mathcal {I}}',\) we have \(u^{(n)} = P_{{\mathcal {I}}'} u^{(n)}.\) Using this fact and the fact that \(P_{{\mathcal {I}}'}\) is an orthogonal projection we obtain

$$\begin{aligned} u^{(n+1)}= & {} \left( I - \left( \left( \tfrac{{\tilde{A}} P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) ^\mathrm{T} \left( \tfrac{{\tilde{A}} P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) + \left( \tfrac{\sqrt{\rho }C P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) ^\mathrm{T} \left( \tfrac{\sqrt{\rho }C P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) \right) \right) u^{(n)} \nonumber \\&+ \left( \tfrac{{\tilde{A}} P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) ^\mathrm{T} \tfrac{{\tilde{f}}}{{\sqrt{S} L_{\rho }}}. \end{aligned}$$
(100)

Since \(C {\tilde{A}}^\mathrm{T}{\tilde{f}} = 0,\) the iteration (100) can be interpreted as Landweber iteration for the block matrix consisting of the upper block \(({\tilde{A}} P_{{\mathcal {I}}'})/(\sqrt{S} L_{\rho })\) and the lower block \((\sqrt{\rho }C P_{{\mathcal {I}}'})/(\sqrt{S} L_{\rho })\) and data \({\tilde{f}}/(\sqrt{S} L_{\rho })\) extended by 0. The Landweber iteration converges at a linear rate; cf., e.g., [31]. Thus, the iteration (97) converges and, in turn, we get the convergence of Algorithm 1 at a linear rate to some limit \(u^*\).

(3) We show that \(u^*\) is a local minimizer. Since \(u^*\) is the limit of the iterates \(u^{(n)},\) the jumps of \(u^*\) also have minimal height c,  the number of jumps equals that of the \(u^{(n)}\) for all \(n \ge N,\) and the induced directional partitioning \({\mathcal {I}}^*\) equals the partitioning \({\mathcal {I}}'\) of the \(u^{(n)}\) for \(n \ge N.\) Since \(u^*\) equals the limit of the above Landweber iteration, \(u^*\) minimizes \(F_\rho \) given by (62) on \({\mathcal {A}}^{{\mathcal {P}}_{{\mathcal {I}}'}}.\) Then, by Lemma 4,  \(u^*\) is a local minimizer of the relaxed Potts energy \(P_{\gamma ,\rho }\) which completes the proof. \(\square \)
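The Landweber interpretation in part (2) of the proof can be illustrated by a generic example. The following sketch (with a random operator M of norm below one standing in for the stacked block matrix of (100), an assumption made purely for illustration) shows the typical linear decay of the error.

```python
# Generic illustration of the Landweber iteration behind (100): a random operator M with
# ||M|| < 1 stands in for the stacked block matrix (an assumption for illustration only).
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((30, 10))
M /= 1.5 * np.linalg.norm(M, 2)             # ensure ||M|| < 1
g = rng.standard_normal(30)

u_star, *_ = np.linalg.lstsq(M, g, rcond=None)   # least-squares limit of the iteration
u = np.zeros(10)
errors = []
for _ in range(100):
    u = u + M.T @ (g - M @ u)               # u^(n+1) = (I - M^T M) u^(n) + M^T g
    errors.append(np.linalg.norm(u - u_star))
print(errors[10] / errors[9], errors[60] / errors[59])   # nearly constant ratio: linear rate
```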

After having shown the convergence of Algorithm 1 to a local minimizer, we have now gathered all information to show Theorem 4.

Proof of Theorem 4

Assertion i. was stated and shown as Proposition 1 in Sect. 3.3. By Proposition 3, Algorithm 1 produces a local minimizer. Then, the assertion ii. is a consequence of Proposition 2. \(\square \)

3.5 Estimating the Distance Between the Objectives

The next lemma is a preparation for the proof of item (iii) of Theorem 3.

Lemma 9

We consider Algorithm 1 for the quadratic penalty relaxation (5) of the multivariate Potts problem. For any output \(u =(u_1,\ldots ,u_S)\) of Algorithm 1 we have that

$$\begin{aligned} \left( \sum \nolimits _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2\right) ^{\tfrac{1}{2}} \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert / \rho . \end{aligned}$$
(101)

Here, \(\sigma _1\) denotes the smallest nonzero eigenvalue of \(C^\mathrm{T}C\) with C given by (49).

Proof

Since \(u =(u_1,\ldots ,u_S)\) is the output of Algorithm 1 it is a local minimizer of the relaxed Potts problem (5). In particular, there is a directional partitioning \({\mathcal {I}}\) with respect to which u is piecewise constant. We denote the induced partitioning by \({\mathcal {P}} ={\mathcal {P}}_{{\mathcal {I}}}.\) By Lemma 6, we have

$$\begin{aligned} \left( \sum \nolimits _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2\right) ^{\tfrac{1}{2}} = \Vert C u \Vert = \Vert C P_{{\mathcal {I}}} u \Vert \le \tfrac{1}{\rho }\Vert \mu ^*\Vert + \sqrt{\tfrac{F_{\rho }(u)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }},\nonumber \\ \end{aligned}$$
(102)

where \(\mu ^*\) is a Lagrange multiplier of (48). By Lemma 3, we may choose \(\mu ^*\) such that \(\Vert \mu ^*\Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert ,\) for any partitioning of the discrete domain \(\Omega ,\) and in particular for the partitioning \({\mathcal {P}} ={\mathcal {P}}_{{\mathcal {I}}}.\) This shows that

$$\begin{aligned} \Vert Cu\Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert / \rho + \sqrt{\tfrac{F_{\rho }(u)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }}. \end{aligned}$$

Since u is a local minimizer of the relaxed Potts problem (5), it is a minimizer of \(F_{\rho }\) on \({\mathcal {A}}^{{\mathcal {P}}}\) by Lemma 4, and the second summand on the right-hand side equals zero. This shows (101) and completes the proof. \(\square \)

We have now gathered all information necessary to show Theorem 3.

Proof of Theorem 3

Part (i) is shown by Proposition 3.

Concerning (ii), we first show that any global minimizer of the relaxed Potts energy \(P_{\gamma , \rho }\) given by (5) appears as a stationary point of Algorithm 1. To this end, we start Algorithm 1 with a global minimizer \({{\bar{u}}}^*=(u_1^*,\ldots ,u_S^*)\) as initialization. Then, we have for all \({{\bar{v}}}=(v_1,\ldots ,v_S)\) with \({{\bar{v}}} \ne {{\bar{u}}}^*,\)

$$\begin{aligned} P_{\gamma , \rho }^{\mathrm{surr}}\left( v_1,\ldots ,v_S,u_1^*,\ldots ,u_S^*\right)&= P_{\gamma , \rho }({{\bar{v}}}) - \Vert B{{\bar{v}}}-B{{\bar{u}}}^*\Vert ^2 + \Vert {{\bar{v}}}-{{\bar{u}}}^*\Vert ^2\\&> P_{\gamma , \rho }({{\bar{v}}}) \ge P_{\gamma , \rho }({{\bar{u}}}^*) = P_{\gamma , \rho }^{\mathrm{surr}} ({{\bar{u}}}^*,{{\bar{u}}}^*). \nonumber \end{aligned}$$
(103)

Here, B is given by (8). The estimate (103) means that \({{\bar{u}}}^*\) is the minimizer of the surrogate functional w.r.t. the first component, i.e., it is the minimizer of the mapping \({{\bar{v}}} \mapsto \) \(P_{\gamma , \rho }^{\mathrm{surr}}({{\bar{v}}},{{\bar{u}}}^*).\) Hence, the iterate \({{\bar{u}}}^{(1)}=(u^{(1)}_1,\ldots ,u^{(1)}_S)\) of Algorithm 1 equals \({{\bar{u}}}^*\) when the iteration is started with \({{\bar{u}}}^*.\) Thus, the global minimizer \({{\bar{u}}}^*\) is a stationary point of Algorithm 1.

It remains to show that each stationary point of Algorithm 1 is a local minimizer of the relaxed Potts energy \(P_{\gamma , \rho }\). This has essentially already been done in the proof of Proposition 3: start the iteration given by (16) with a stationary point \(u';\) its limit equals \(u'\) and is thus a local minimizer by Proposition 3.

Concerning (iii), we use Lemma 9 to estimate

$$\begin{aligned} \left( \sum \nolimits _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2\right) ^{\tfrac{1}{2}} \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert / \rho < \varepsilon . \end{aligned}$$
(104)

The second inequality follows by our choice of \(\rho \) in (34) as \(\rho > 2 \varepsilon ^{-1} \ \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert .\) This shows the validity of (iii) and completes the proof. \(\square \)

3.6 Convergence Analysis of Algorithm 2

We start out by showing that Algorithm 2 is well defined in the sense that the inner iteration governed by (25) terminates. This result was formulated as Theorem 5.

Proof of Theorem 5

We have to show that, for any \(k \in {\mathbb {N}},\) there is \(n \in {\mathbb {N}}\) such that

$$\begin{aligned} \left\| u^{(k,n)}_s - u^{(k,n)}_{s'} \right\| \le \frac{t}{\rho _k \sqrt{c_{s,s'}}}, \quad \text { and } \quad \left\| u^{(k,n)}_s - u^{(k,n-1)}_s \right\| \le \frac{\delta _k}{L_{\rho }}. \end{aligned}$$
(105)

To see the right-hand inequality in (105), we notice that, by Proposition 3, the iteration (19) converges to a local minimizer of the quadratic penalty relaxation \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) of the Potts energy. The inner loop of Algorithm 2 precisely computes the iteration (19) (for the penalty parameter \(\rho _k\) which increases with k). Thus, the distance between consecutive iterates \(u^{(k,n)}_s,u^{(k,n-1)}_s\) converges to zero as n increases, which implies the validity of the right-hand inequality in (105) for sufficiently large n and all \(k \in {\mathbb {N}}.\)

To see the left-hand inequality in (105), we notice that, by the considerations above, the inner loop of Algorithm 2 would converge to a minimizer \({{\bar{u}}}^{(k),*} = (u_1^{(k),*},\ldots ,u_S^{(k),*})\) if it were not terminated by (105), for all \(k \in {\mathbb {N}}.\) Since \({{\bar{u}}}^{(k),*}\) is a local minimizer of the relaxed Potts problem (5) for the parameter \(\rho _k\), it is a minimizer of \(F_{\rho _k}\) on \({\mathcal {A}}^{{\mathcal {P}}}\) (where \({\mathcal {P}}\) denotes the partitioning induced by \({{\bar{u}}}^{(k),*}\)) by Lemma 4. Hence, for any \(k\in {\mathbb {N}}\) and any \(\xi >0\) there is \({{\bar{u}}}^{(k,n)} =(u^{(k,n)}_1,\ldots ,u^{(k,n)}_S)\) such that \(F_{\rho _k}({{\bar{u}}}^{(k,n)})- F_{\rho _k}({{\bar{u}}}^{(k),*}) < \xi .\) We let \(\tau = (t - 2\sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \ \Vert f\Vert ) /\rho _k,\) and choose \(\xi = \rho _k \tau ^2.\) Using this together with Lemma 6, we estimate

$$\begin{aligned} \sqrt{c_{s,s'}} \Vert u_s^{(k,n)} - u_{s'}^{(k,n)}\Vert _2 \le \Vert C {{\bar{u}}}^{(k,n)} \Vert&\le \tfrac{1}{\rho _k}\Vert \mu ^*\Vert + \sqrt{\tfrac{F_{\rho _k}({{\bar{u}}}^{(k,n)})- F_{\rho _k}({{\bar{u}}}^{(k),*})}{\rho _k}} \nonumber \\&\le \tfrac{1}{\rho _k}\Vert \mu ^*\Vert + \sqrt{\tfrac{\xi }{\rho _k}} \le \tfrac{1}{\rho _k}\Vert \mu ^*\Vert + \tau \le \tfrac{t}{\rho _k} \end{aligned}$$
(106)

where \(\mu ^*\) is a Lagrange multiplier of (48) which, by Lemma 3, may be chosen such that \(\Vert \mu ^*\Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert .\) Here, the last inequality is true since this bound implies that \(\tau \le (t-\Vert \mu ^*\Vert )/\rho _k.\) The estimate (106) shows the left-hand inequality in (105) and completes the proof. \(\square \)
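The interplay of the outer penalty loop and the two stopping criteria in (105) can be illustrated on a toy problem. The following sketch is a strong simplification (two components, no jump penalty, plain gradient steps as the inner iteration, ad hoc parameter choices) and only mirrors the structure of Algorithm 2, not its actual steps.

```python
# Toy illustration of the penalty-continuation structure of Algorithm 2 (assumptions:
# S = 2, no jump penalty, plain gradient steps as inner iteration, ad hoc parameters);
# it only mirrors the interplay of the outer rho-loop with the stopping criteria (105).
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((15, 8))
f = rng.standard_normal(15)
t = 1.0
u1, u2 = np.zeros(8), np.zeros(8)

for k in range(6):
    rho = 2.0 ** k                          # increasing penalty parameter rho_k
    delta_k = 1.0 / (k + 1) ** 2
    L = np.linalg.norm(A, 2) ** 2 + 2 * rho # a crude bound on the Lipschitz constant
    while True:
        g1 = A.T @ (A @ u1 - f) + rho * (u1 - u2)
        g2 = A.T @ (A @ u2 - f) + rho * (u2 - u1)
        u1_new, u2_new = u1 - g1 / L, u2 - g2 / L
        step = max(np.linalg.norm(u1_new - u1), np.linalg.norm(u2_new - u2))
        u1, u2 = u1_new, u2_new
        if np.linalg.norm(u1 - u2) <= t / rho and step <= delta_k:
            break                           # analogues of the two criteria in (105)
    print(k, rho, np.linalg.norm(u1 - u2))  # the coupling error decays as rho grows
```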

We have now gathered all information to prove Theorem 6 which deals with the convergence properties of Algorithm 2.

Proof of Theorem 6

We start out by showing that any accumulation point of the sequence \(u^{(k)}\) produced by Algorithm 2 is a local minimizer of the Potts problem (3). Let \(u^*\) be such an accumulation point and let \({\mathcal {I}}^*\) be the directional partitioning induced by \(u^*.\) We may extract a subsequence \(u^{(k_l)}\) of the sequence \(u^{(k)}\) such that \(u^{(k_l)}\) converges to \(u^*\) as \(l \rightarrow \infty ,\) and such that the directional partitionings \({\mathcal {I}}^{k_l}\) induced by the \(u^{(k_l)}\) all equal the directional partitioning \({\mathcal {I}}^*,\) i.e., \({\mathcal {I}}^{k_l}={\mathcal {I}}^*\) for all \(l \in {\mathbb {N}}.\) We let

$$\begin{aligned} \mu ^{k_l} = - 2 \rho _{k_l} \ C u^{k_l} \end{aligned}$$
(107)

with the matrix C given by (49), and estimate

$$\begin{aligned} \nonumber \Vert \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{A}} u^{k_l} - \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{f}}- C^\mathrm{T} \mu ^{k_l} \Vert&= \Vert \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{A}} u^{k_l} - \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{f}}+ 2 \rho _{k_l} C^\mathrm{T} \ C u^{k_l}\Vert \\&= \Vert \nabla F_{\rho _{k_l}}(u^{k_l}) \Vert \le \tfrac{\delta _{k_l}}{L_{\rho _{k_l}}} \le \delta _{k_l}. \end{aligned}$$
(108)

We recall that \({\tilde{A}}\) was the block diagonal matrix having the matrix A as entry in each diagonal component and that \(F_{\rho _{k_l}}\) was given by (62). We notice that the second-to-last inequality follows from the right-hand inequality in (105). We further estimate

$$\begin{aligned} \Vert \mu ^{k_l}\Vert = 2 \rho _{k_l} \Vert C u^{k_l}\Vert \le 2 \rho _{k_l} \tfrac{S t}{\rho _{k_l}} = 2 S t \end{aligned}$$

which is a consequence of the left-hand inequality in (105). Hence, the sequence \(\mu ^{k_l}\) is bounded and thus has a cluster point, say \(\mu ^*,\) by the Bolzano–Weierstraß Theorem. By passing to a further subsequence (where we suppress the new indexation for better readability and still use the symbol l for the index), we get that

$$\begin{aligned} \mu ^{k_l} \rightarrow \mu ^*\quad \text { as }\quad l \rightarrow \infty . \end{aligned}$$
(109)

Now, on this subsequence, we have that \(u^{(k_l)} \rightarrow u^{*}\) and that \(\mu ^{k_l} \rightarrow \mu ^*.\) Hence, taking limits on both sides of (108) yields

$$\begin{aligned} \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{A}} u^{*} - \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{f}} - C^\mathrm{T} \mu ^{*} = 0, \end{aligned}$$
(110)

since \(\delta _{k_l} \rightarrow 0\) as \(l \rightarrow \infty .\) Further,

$$\begin{aligned} \Vert Cu^*\Vert \le \lim _{l \rightarrow \infty }\tfrac{\Vert \mu ^{k_l}\Vert }{\rho _{k_l}} \le \Vert \mu ^*\Vert \lim _{l \rightarrow \infty }\tfrac{1}{\rho _{k_l}} =0. \end{aligned}$$
(111)

This implies that the components of \(u^*\) are equal, i.e., \(u_s^{*} = u_{s'}^{*}\) for all \(s,s'.\) In particular, \(u^*\) is a feasible point for the Potts problem (3). Put differently, letting \({\mathcal {P}}^*\) be the partitioning induced by \(u^*,\) we have \(u^*\in {\mathcal {B}}^{{\mathcal {P}}^*}.\) Then, (110) shows that \(u^*\) minimizes (48), which by Lemma 5 means that \(u^*\) is a local minimizer of (3) or, synonymously, that each component of \(u^*\) (the components are all equal) is a local minimizer of the Potts problem (2). This shows the first assertion of Theorem 6.

We continue by showing the second assertion of Theorem 6, i.e., that if A is lower bounded, then the sequence \(u^{(k)}\) produced by Algorithm 2 has a cluster point; by the above considerations, each cluster point is then a local minimizer, which shows the assertion. To this end, we show that, if A is lower bounded, the sequence \(u^{(k)}\) is bounded, which by the Heine–Borel property of finite-dimensional Euclidean space implies that it has a cluster point. So we assume that A is lower bounded and consider the sequence \(u^{(k)}=(u_1^{(k)},\ldots ,u_S^{(k)})\) produced by Algorithm 2. As in the proof of Theorem 5 we see that, for any \(k \in {\mathbb {N}}\), there is a local minimizer \( u^{(k),*} = (u_1^{(k),*},\ldots ,u_S^{(k),*})\) of (5) such that

$$\begin{aligned} \Vert u^{(k)}- u^{(k),*}\Vert \le C_2 \delta _k, \end{aligned}$$
(112)

where \(C_2\) is a constant independent of k. By Lemma 4, \( u^{(k),*}\) is a minimizer of \(F_{\rho _k}\) on \({\mathcal {A}}^{{\mathcal {P}}}\) (where \({\mathcal {P}}\) denotes the partitioning induced by \( u^{(k),*}\)). Hence,

$$\begin{aligned} \tfrac{1}{S} \sum \nolimits _{s=1}^S\Vert A u_s^{(k),*} - f \Vert ^2 \le F_{\rho _k}( u^{(k),*}) \le \Vert f \Vert ^2 \end{aligned}$$

by choosing, as a candidate, the element whose components are all the zero function. Using \(\Vert A u_s^{(k),*}\Vert ^2 \le 2\Vert A u_s^{(k),*} - f\Vert ^2 + 2\Vert f\Vert ^2,\) this implies

$$\begin{aligned} \tfrac{1}{S} \sum \nolimits _{s=1}^S\Vert A u_s^{(k),*} \Vert ^2 \le 4 \Vert f \Vert ^2. \end{aligned}$$
(113)

Then, since A is lower bounded, there is a constant \(c>0\) such that

$$\begin{aligned} \Vert u^{(k),*}\Vert ^2 = \tfrac{1}{S} \sum \nolimits _{s=1}^S\Vert u_s^{(k),*}\Vert ^2 \le \tfrac{1}{S} \sum \nolimits _{s=1}^S c^2\Vert A u_s^{(k),*}\Vert ^2 \le 4 c^2 \Vert f \Vert ^2 \end{aligned}$$
(114)

where we used (113) for the last inequality. Combining this estimate with (112) yields

$$\begin{aligned} \Vert u^{(k)}\Vert \le \Vert u^{(k)}- u^{(k),*}\Vert + \Vert u^{(k),*}\Vert \le C_2 \delta _k + 2 c \Vert f\Vert . \end{aligned}$$
(115)

Since we have chosen \(\delta _k\) as a sequence converging to zero, (115) shows that the sequence \(u^{(k)}\) is bounded which implies that it has cluster points. This completes the proof. \(\square \)

4 Numerical Results

In this section, we show the applicability of our methods to different imaging tasks. We start out by providing the necessary implementation details. Then, we compare the results of the quadratic relaxation (5) (Algorithm 1) to the ones of the Potts problem (2) (Algorithm 2). Next, we apply Algorithm 2 to blurred image data and to image reconstruction from incomplete Radon data. Finally, we consider the image partitioning problem according to the classical Potts model.

Implementation Details We implemented Algorithms 1 and 2 for the coupling schemes in (7) and the set of compass and diagonal directions \( (1,0),(0,1),(1,1),(1,-1)\) with weights \(\omega _{1,2} = \sqrt{2}-1\) and \(\omega _{3,4} = 1-\frac{\sqrt{2}}{2}\).

Concerning Algorithm 1, we observed visually and quantitatively appealing results when using the relaxed step sizes \(L_\rho ^\lambda = L_\rho [\lambda + (1-(n+1)^{-1/2})(1-\lambda ) ]\) for an empirically chosen parameter \(0<\lambda \le 1\), where \(L_\rho \) denotes the estimate in Lemma 1. The iterations were stopped when the nearness condition (18) was fulfilled and the iterates did not change anymore, i.e., when both \(\Vert u_1^{(n)} - u_1^{(n-1)}\Vert /(\Vert u_1^{(n)}\Vert + \Vert u_1^{(n-1)}\Vert )\) and \(\Vert u_2^{(n)} - u_2^{(n-1)}\Vert /(\Vert u_2^{(n)}\Vert + \Vert u_2^{(n-1)}\Vert )\) were smaller than \(10^{-6}\). The result of Algorithm 1 was transformed into a feasible solution of (3) by applying the projection procedure described in Sect. 2.2 (Procedure 1). For initialization, we applied 1000 Landweber iterations with step size \(1/\Vert A\Vert ^2\) to the least squares problem induced by the linear operator A and the data f.
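For illustration, the following Python sketch implements the relaxed step-size schedule and the Landweber initialization just described; the function names and the callable interface for A and its adjoint are hypothetical and serve only to make the above choices concrete.

```python
import numpy as np

def relaxed_step_size(L_rho, lam, n):
    # Relaxed step size L_rho^lambda = L_rho * [lambda + (1 - (n+1)^(-1/2)) * (1 - lambda)]
    # with the estimate L_rho from Lemma 1 and an empirical parameter 0 < lambda <= 1.
    return L_rho * (lam + (1.0 - (n + 1) ** (-0.5)) * (1.0 - lam))

def landweber_init(A, At, f, norm_A, iters=1000):
    # 1000 Landweber iterations with step size 1/||A||^2 applied to the least
    # squares problem induced by A and f; A and At are callables for A and A^T.
    u = np.zeros_like(At(f))
    step = 1.0 / norm_A ** 2
    for _ in range(iters):
        u = u - step * At(A(u) - f)
    return u
```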

Concerning Algorithm 2, we set \(\rho ^{(0)} = 10^{-3}\) in all experiments which we incremented by the factor \(\tau = 1.05\) in each outer iteration. The \(\delta \)-sequence was chosen as \(\delta ^{(k)} = \frac{1}{\eta \rho ^{(k)}}\) for \(\eta = 0.95\) when coupling all variables and \(\eta = 0.98\) when coupling consecutive variables. Similarly to Algorithm 1, step A of Algorithm 2 was performed using the relaxed step sizes \(L_\rho ^\lambda = L_\rho [\lambda + (1-(n+1)^{-1/2})(1-\lambda ) ]\) for an application-dependent parameter \(0<\lambda \le 1\) and for the estimate \(L_\rho \) in Lemma 1. We stopped the iterations when the relative discrepancy of the first two splitting variables \(\Vert u^{(k)}_1 - u^{(k)}_2 \Vert / (\Vert u^{(k)}_1 \Vert + \Vert u^{(k)}_2 \Vert ) \) was smaller than \(10^{-6}\). We initialized Algorithm 2 with \(A^\mathrm{T}f\).
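As a further illustration, the outer loop of Algorithm 2 with the above parameter choices can be organized as in the following sketch; `inner_solve`, which stands for step A of Algorithm 2, is a placeholder, and the interface is an assumption made for this sketch.

```python
import numpy as np

def algorithm2_outer_loop(inner_solve, u0, rho0=1e-3, tau=1.05, eta=0.95,
                          tol=1e-6, max_outer=10000):
    # Outer penalty loop: rho is increased by the factor tau in each iteration,
    # and the inner accuracy delta^(k) = 1/(eta * rho^(k)) is tightened accordingly.
    u, rho = u0, rho0
    for _ in range(max_outer):
        delta = 1.0 / (eta * rho)
        u = inner_solve(u, rho, delta)      # step A of Algorithm 2 (placeholder)
        u1, u2 = u[0], u[1]                 # first two splitting variables
        rel = np.linalg.norm(u1 - u2) / (np.linalg.norm(u1) + np.linalg.norm(u2))
        if rel < tol:                       # splitting variables have merged numerically
            break
        rho *= tau
    return u
```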

Comparison of Algorithm 1 and Algorithm 2 We compare Algorithm 1 and Algorithm 2 for blurred image data, that is, the linear operator A in (1) amounts to convolution with a kernel K. In the present experiment, we chose a Gaussian kernel with standard deviation \(\sigma =7\) and of size \(6\sigma + 1\). Here, we coupled all splitting variables and chose the step-size parameter \(\lambda =0.4\) for Algorithm 1 and \(\lambda = 0.35\) for Algorithm 2, respectively. In Fig. 1, we applied both methods to a blurred natural image. While both algorithms yield reasonable partitionings, Algorithm 2 provides smoother edges than Algorithm 1. Further, Algorithm 1 produces some smaller segments (at the treetops).
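A blur operator of this type can, for instance, be realized with SciPy as in the minimal sketch below; there, `truncate=3.0` yields a kernel support of \(6\sigma +1\) pixels, while the boundary handling is an assumption and may differ from our experimental setup.

```python
from scipy.ndimage import gaussian_filter

def gaussian_blur(u, sigma=7.0):
    # Convolution with a Gaussian kernel of standard deviation sigma;
    # truncate=3.0 corresponds to a kernel of size 6*sigma + 1.
    return gaussian_filter(u, sigma=sigma, truncate=3.0, mode='nearest')
```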

Fig. 1

Results of Algorithms 1 and 2 for partitioning an image blurred by a Gaussian kernel of standard deviation 3. Both methods provide reasonable partitionings. Algorithm 2 provides smoother edges than Algorithm 1 (e.g., the boundary between the meadow and the forest, the back of the cow). In addition, Algorithm 1 produces some smaller segments around the treetops

Application to Blurred Data For the following experiments, we focus on Algorithm 2. In case of motion blur we set the step-size parameter to \(\lambda = 0.25\), while for Gaussian blur we set \(\lambda =0.35\) as in Fig. 1. We compare our method with the Ambrosio–Tortorelli approximation [2] of the classical Mumford–Shah model (which itself tends to the piecewise constant Mumford–Shah model for increasing variation penalty) given by

$$\begin{aligned} \begin{aligned} A_\varepsilon (u,v) = \gamma \int \varepsilon \vert \nabla v \vert ^2 +\frac{(v-1)^2}{4\varepsilon } \mathrm {d}x +\alpha \int v^2 \Vert \nabla u \Vert ^2 \mathrm {d}x + \frac{1}{2} \int (K *u - f)^2 \mathrm {d}x. \end{aligned} \end{aligned}$$
(116)

The variable v serves as an edge indicator and \(\varepsilon >0\) is an edge smoothing parameter that is chosen empirically. The parameter \(\gamma > 0\) controls the weight of the edge length penalty and the parameter \(\alpha > 0\) penalizes the variation. In this respect, a higher value of \(\alpha \) promotes solutions which are closer to being piecewise constant. In the limit \(\alpha \rightarrow \infty \), minimizers of (116) are piecewise constant. Our implementation follows the scheme presented in [6]. The functional \(A_\varepsilon \) is alternately minimized w.r.t. u and v. To this end, we iteratively solve the Euler–Lagrange equations

$$\begin{aligned} \begin{aligned} 2\alpha v \Vert \nabla u \Vert _2^2 + \gamma \frac{v-1}{2\varepsilon } - 2\varepsilon \gamma \nabla ^2 v&= 0, \\ (K*u - f) *{\widetilde{K}} - 2\alpha \mathrm {div}(v^2 \nabla u)&= 0, \end{aligned} \end{aligned}$$
(117)

where \({\widetilde{K}}(x) = K(-x)\). The first equation is solved for v using a MINRES solver, and the second is solved for u using the method of conjugate gradients [6]. The iterations were stopped when the relative change of both variables was small, i.e., when both \(\Vert u^{k+1} - u^k \Vert /(\Vert u^k\Vert + 10^{-6}) <10^{-3}\) and \(\Vert v^{k+1} - v^k \Vert /(\Vert v^k\Vert + 10^{-6}) <10^{-3}\).
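To make the v-update in (117) concrete, the following sketch performs one such update on a regular grid by a matrix-free MINRES solve; the finite-difference discretization and the replicate-type boundary handling are assumptions made for this sketch and not necessarily those of [6].

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def update_edge_indicator(u, alpha, gamma, eps):
    # One v-update of the alternating scheme: solve
    # (2*alpha*|grad u|^2 + gamma/(2*eps)) * v - 2*eps*gamma * Laplace(v) = gamma/(2*eps).
    m, n = u.shape
    gx = np.diff(u, axis=0, append=u[-1:, :])   # forward differences of u
    gy = np.diff(u, axis=1, append=u[:, -1:])
    grad_sq = gx ** 2 + gy ** 2                 # squared gradient magnitude |grad u|^2

    def laplacian(v):
        p = np.pad(v, 1, mode='edge')           # replicate boundary values
        return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * v

    def matvec(x):
        v = x.reshape(m, n)
        out = (2 * alpha * grad_sq + gamma / (2 * eps)) * v - 2 * eps * gamma * laplacian(v)
        return out.ravel()

    op = LinearOperator((m * n, m * n), matvec=matvec)
    rhs = np.full(m * n, gamma / (2 * eps))
    v, _ = minres(op, rhs)
    return v.reshape(m, n)
```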

Figure 2 shows the restoration of a traffic sign from simulated horizontal motion blur. For the Ambrosio–Tortorelli approximation, we set \(\alpha = 10^5\) to promote a piecewise constant solution. We observe that both the Ambrosio–Tortorelli approximation and the proposed method restore the data to a human-readable form. However, the Ambrosio–Tortorelli result shows clutter and blur artifacts, whereas our method provides sharp edges and produces fewer artifacts.

In Fig. 3, we partition a natural image blurred by a Gaussian kernel and corrupted by Gaussian noise. We observed that the Ambrosio–Tortorelli result was heavily corrupted by artifacts for large values of the variation penalty \(\alpha \). This might be attributed to the underlying linear systems in scheme (117), which become severely ill-conditioned for large choices of \(\alpha \). Therefore, we chose the moderate variation penalty \(\alpha =10^{5}\), which only provides an approximately piecewise constant (rather, piecewise smooth) result. This result does not fully separate the background from the fish in terms of edges. The result of the proposed method, on the other hand, sharply differentiates between background and fish and highlights various segments of the fish.

Fig. 2

Restoration from simulated horizontal motion blur of length 80 pixels and Gaussian noise with \(\sigma =0.02\). The result of the Ambrosio–Tortorelli scheme exhibits noisy and blurred artifacts and bumpy edges (e.g., the boundaries of the digits). The contours of the proposed result are sharp, and considerably less clutter is present

Fig. 3

Partitioning of an image blurred by a Gaussian kernel of standard deviation 7 and corrupted by Gaussian noise with \(\sigma =0.2\). The result of the Ambrosio–Tortorelli approximation does not yield a convincing partitioning of the scene, in particular many parts of the fish are merged with the background. The proposed approach provides a partitioning which reflects many parts of the fish

Fig. 4

Reconstruction of the Shepp–Logan phantom from undersampled Radon data (25 projection angles) corrupted by Gaussian noise with \(\sigma =0.7\). The proposed method provides a genuinely piecewise constant reconstruction, and it improves the MSSIM over filtered backprojection by a factor of 11.58 and over total variation by a factor of 1.05

Reconstruction from Radon Data Here we consider reconstruction from Radon data, which appears for instance in computed tomography. We recall that the Radon transform reads

$$\begin{aligned} \begin{aligned} Ru(\theta ,s) = \int _{-\infty }^\infty u(s\theta + t\theta ^\perp ) \mathrm {d}t, \end{aligned} \end{aligned}$$
(118)

where \(s\in \mathbb {R},\) \(\theta \in S^1\), and \(\theta ^\perp \in S^1\) is (counterclockwise) perpendicular to \(\theta \); see [64]. For our experiments, we use a discretization of the Radon transform generated with the AIR Tools software package [39]. Regarding our method, we employed coupling of consecutive splitting variables, and the step-size parameter was set to \(\lambda =0.11\). To quantify the reconstruction quality, we use the mean structural similarity index (MSSIM) [89], which is bounded from above by 1; higher values indicate better results.

We compare the proposed method to filtered back projection (FBP), which is the standard method in practice [71]. The FBP is computed using the Matlab implementation with the standard Ram–Lak filter. Furthermore, we compare with total variation (TV) regularization [76] in the Lagrange form \(\Vert Ru-f\Vert _2^2 + \mu \Vert \nabla u \Vert _1\) with parameter \(\mu >0\); the implementation follows the Chambolle–Pock algorithm [21]. The parameter \(\mu \) was tuned w.r.t. the MSSIM index.
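Readers who wish to reproduce a comparable FBP baseline without the AIR Tools/Matlab toolchain may use, for instance, scikit-image as in the sketch below; the discretization, the noise scaling, and the data range passed to the SSIM routine are assumptions and do not exactly match our experimental pipeline.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale
from skimage.metrics import structural_similarity

phantom = rescale(shepp_logan_phantom(), 0.5)        # downscaled Shepp-Logan phantom
theta = np.linspace(0.0, 180.0, 25, endpoint=False)  # 25 projection angles
sinogram = radon(phantom, theta=theta)
sinogram += 0.7 * np.random.randn(*sinogram.shape)   # additive Gaussian noise (illustrative scale)

# Filtered back projection with the Ram-Lak ("ramp") filter
fbp = iradon(sinogram, theta=theta, filter_name='ramp')

mssim = structural_similarity(phantom, fbp,
                              data_range=phantom.max() - phantom.min())
print(f"MSSIM of the FBP reconstruction: {mssim:.3f}")
```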

In Fig. 4, we show the reconstruction results for the Shepp–Logan phantom from undersampled (25 angles) and noisy Radon data. Standard FBP produces strong streak artifacts which are typical for angular undersampling, and the reconstruction suffers from noise. The TV regularization and the proposed method both provide considerably improved reconstruction results. The proposed method achieves a higher MSSIM value than the TV reconstruction, and it provides a reconstruction which is less grainy than the TV result.

Fig. 5

Comparison of partitionings of a natural image corrupted by Gaussian noise with \(\sigma =0.2\). The Ambrosio–Tortorelli result is noisy and corrupted by clutter. The \(L_0\) gradient smoothing over-segments the large window on the left-hand side, while details of the cross in the bottom right are smoothed out. The proposed result is visually competitive with the state-of-the-art graph cuts result

Image Partitioning Finally, we consider the classical Potts problem, which corresponds to \(A=\mathrm {id}\) in (1). While the focus of the present paper is on general imaging operators A, we observe that the proposed method also works rather well for \(A=\mathrm {id}\). We used the full coupling scheme and set the step-size parameter to \(\lambda = 0.55\).

To put our result into context, we added the results of two other methods for \(A=\mathrm {id}\): the \(L_0\) gradient smoothing method of Xu et al. [94] and the state-of-the-art \(\alpha \)-expansion graph cut algorithm based on max-flow/min-cut from the GCOptimization 3.0 library of Veksler and Delong [12, 13, 52]. The method of [94] has a parameter \(\kappa >1\) which controls the convergence speed and a smoothing weight \(\nu \); in our experiments, we set \(\kappa = 1.01\) and \(\nu =0.1\). For the graph cuts, the same neighborhood weights and jump penalty were used as for the proposed method, and the discrete labels required by the graph cut method were computed via k-means.

In Fig. 5, we show the results for a natural image corrupted by Gaussian noise. The Ambrosio–Tortorelli result suffers from clutter and remains noisy. The result of \(L_0\) gradient smoothing over-segments the textured window area while it smooths out details of the cross. The state-of-the-art graph cut method and the proposed method both provide satisfying results which are visually comparable. Further, they yield solutions with comparable Potts energy values. For instance, on the IVC dataset [55], which consists of 10 natural color images of size \(512\times 512,\) the mean energies of the proposed approach for the model parameters \(\gamma = 0.25\) and \(\gamma =1\) are 7107.8 and 13053.2, compared with the respective mean energies 7093.2 and 13008.7 of the graph cut approach; the values differ by less than half a percent. (For the results in Fig. 5, the energy value of the proposed approach is 25067.7 compared with 25119.5 for the graph cut approach.) Here, for the graph cut approach, we took the mean value of the input image on each computed segment before evaluating the Potts objective function; a sketch of this evaluation is given below. To sum up, while the proposed method can handle general linear operators A, the quality of its results for \(A=\mathrm {id}\) is comparable with that of the state-of-the-art graph cut algorithm.
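For concreteness, the following sketch evaluates the discrete Potts objective for a given grayscale image u (e.g., the segment-wise mean image) and data f with the directions and weights stated at the beginning of this section; this array-based formulation is a hypothetical illustration rather than our actual evaluation code.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]
WEIGHTS = [SQRT2 - 1, SQRT2 - 1, 1 - SQRT2 / 2, 1 - SQRT2 / 2]

def potts_energy(u, f, gamma):
    # Discrete Potts objective gamma * ||grad u||_0 + ||u - f||_2^2 for A = id:
    # the jump term counts weighted neighbor pairs with differing values of u.
    jump = 0.0
    for (dx, dy), w in zip(DIRECTIONS, WEIGHTS):
        shifted = np.roll(u, shift=(dx, dy), axis=(0, 1))
        diff = shifted != u
        valid = np.ones(u.shape, dtype=bool)   # mask out pairs wrapped around by np.roll
        if dx > 0:
            valid[:dx, :] = False
        if dy > 0:
            valid[:, :dy] = False
        if dy < 0:
            valid[:, dy:] = False
        jump += w * np.count_nonzero(diff & valid)
    return gamma * jump + np.sum((u - f) ** 2)
```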

5 Conclusion

In this paper, we have proposed a new iterative minimization strategy for multivariate piecewise constant Mumford–Shah/Potts energies as well as their quadratic penalty relaxations. Our schemes are based on majorization–minimization or forward–backward splitting methods of Douglas–Rachford type [57]. In contrast to the approaches in [9, 33, 60, 61] for sparsity problems which lead to thresholding algorithms, our approach leads to non-separable yet computationally tractable problems in the backward step.

As a second part, we have provided a convergence analysis for the proposed algorithms. For the proposed quadratic penalty relaxation scheme, we have shown convergence toward a local minimizer. Due to the NP-hardness of the quadratic penalty relaxation, this is within the range of the best that can be expected. For the scheme addressing the non-relaxed Potts problem, we have also performed a convergence analysis; in particular, we have obtained results on the convergence toward local minimizers along subsequences. The quality of these results is comparable with that of [60, 61]; compared with these works, we additionally had to deal with the non-separability of the backward step.

Finally, we have demonstrated the applicability of our schemes in several experiments. We have applied our algorithms to deconvolution problems, including the deblurring and denoising of images corrupted by motion blur. We have further dealt with noisy and undersampled Radon data for the task of joint reconstruction, denoising and segmentation. Lastly, we have applied our approach to pure image partitioning (without blur), which is a widely considered problem in computer vision.