Iterative Potts Minimization for the Recovery of Signals with Discontinuities from Indirect Measurements: The Multivariate Case

Signals and images with discontinuities appear in many problems in such diverse areas as biology, medicine, mechanics and electrical engineering. The concrete data are often discrete, indirect and noisy measurements of some quantities describing the signal under consideration. A frequent task is to find the segments of the signal or image, which corresponds to finding the discontinuities or jumps in the data. Methods based on minimizing the piecewise constant Mumford–Shah functional—whose discretized version is known as Potts energy—are advantageous in this scenario, in particular in connection with segmentation. However, due to their non-convexity, minimization of such energies is challenging. In this paper, we propose a new iterative minimization strategy for the multivariate Potts energy dealing with indirect, noisy measurements. We provide a convergence analysis and underpin our findings with numerical experiments.


Introduction
Problems involving reconstruction tasks for functions with discontinuities appear in various biological and medical applications. Examples are the steps in the rotation of the bacterial flagellar motor [68,76,77], the cross-hybridization of DNA [29,43,75], x-ray tomography [72], electron tomography [48] and SPECT [50,91]. An engineering example is crack detection in brittle material in mechanics [3]. Further examples may for instance be found in the papers [21,24,33,56,57] and the references therein. In general, signals with discontinuities appear in many applied problems. A central task is to restore the jumps, edges, change points or segments of the signals or images from the observed data. These observed data are usually indirectly measured. Furthermore, they consist of measurements on a discretized grid and are typically corrupted by noise.
In many scenarios, nonconvex nonsmooth variational methods are a suitable choice for the partitioning task, i.e., the task of finding the jumps/edges/change points; see for example [12,56,68]. In particular, methods based on piecewise constant Mumford-Shah functionals [60,61] have been used in various different applications. The piecewise-constant Mumford-Shah model also appears in statistics and image processing where it is often called Potts model [12-14,35,70,89]; this is a tribute to Renfrey B. Potts and his work in statistical mechanics [71]. The variational formulation of the piecewise-constant Mumford-Shah/Potts model (with an indirect measurement term) is given by

$$\operatorname*{argmin}_{u} \; \gamma \, \|\nabla u\|_0 + \|Au - f\|_2^2. \qquad (1)$$

Here, A is a linear operator modeling the measurement process, e.g., the Radon transform in computed tomography (CT), or the point-spread function of the microscope in microscopy. Further, f is an element of the data space, e.g., a sinogram or part of it in CT, or the blurred microscopy image in microscopy. The mathematically precise definition of the jump term $\|\nabla u\|_0$ in the general situation is rather technical. However, if u is piecewise constant and the discontinuity set of u is sufficiently regular, say, a union of $C^1$ curves, then $\|\nabla u\|_0$ is just the total arc length of this union. In general, the gradient $\nabla u$ is given in the distributional sense and the boundary length is expressed in terms of the $(d-1)$-dimensional Hausdorff measure. When u is not piecewise constant, the jump penalty is infinite [73]. The second term measures the fidelity of a solution u to the data f. The parameter $\gamma > 0$ controls the balance between data fidelity and jump penalty.
The piecewise-constant Mumford-Shah/Potts model can be interpreted in two ways. On the one hand, if the imaged object is (approximately) piecewise constant, then the solution is an (approximate) reconstruction of the imaged object. On the other hand, since a piecewise-constant solution directly induces a partitioning of the image domain, it can be seen as joint reconstruction and segmentation. Executing reconstruction and segmentation jointly typically leads to better results than performing the two steps successively [50,72,73,83]. We note that in order to deal with the discrete data, the energy functional is typically discretized; see Section 2.1. Some references concerning Mumford-Shah functionals are [2,8,17,32,44,65,73] and also the references therein; see also the book [1]. The piecewise constant Mumford-Shah functionals are among the most well-known representatives of the class of free-discontinuity problems introduced by De Giorgi [28].
The analysis of the nonsmooth and nonconvex problem (1) is rather involved. We discuss some analytic aspects. We first note that without additional assumptions the existence of minimizers of (1) is not guaranteed in a continuous domain setting [31,32,73,82]. To ensure the existence of minimizers, additional penalty terms such as an $L^p$ ($1 < p < \infty$) term of the form $\|u\|_p^p$ [72,73] or pointwise boundedness constraints [44] have been considered. We note that the existence of minimizers is guaranteed in the discrete domain setup for typical discretizations [32,82]. Another important topic is to verify that the Potts model is a regularization method in the sense of inverse problems. The first work dealing with this task is [73]: the authors assume that the solution space consists of non-degenerate piecewise-constant functions with at most k (arbitrary, but fixed) different values which are additionally bounded. Under relatively mild assumptions on the operator A, they show stability. Further, by giving a suitable parameter choice rule, they show that the method is a regularizer in the sense of inverse problems. Related references are [44,49], with the latter including (non-piecewise-constant) Mumford-Shah functionals. We note that Mumford-Shah approaches (including the piecewise constant Mumford-Shah variant) also regularize the boundaries of the discontinuity set of the underlying signal [44].
Solving the Potts problem is algorithmically challenging. For A = id, it is NP-hard for multivariate domains [12,85], and, for general linear operators A, it is even NP-hard for univariate signals [82]. Thus, finding a global minimizer within reasonable time seems to be unrealistic in general. Nevertheless, due to its importance, many approximate strategies for multivariate Potts problems with A = id have been proposed. (We note that the case A = id is important as well since it captures the partitioning problem in image processing.) For the Potts problem with general A there are still some, but not that many, existing approaches, in particular in the multivariate situation. For a more detailed discussion, we refer to the paragraph on algorithms for piecewise constant Mumford-Shah problems below. A further discussion of methods for reconstructing piecewise constant signals may be found in [57]. In [88], we have considered the univariate Potts problem for a general operator A and have proposed a majorization-minimization strategy which we called iterative Potts minimization, in analogy to iterative thresholding schemes. In this work, we develop iterative Potts minimization schemes for the more demanding multivariate situation, which is important for multivariate applications as appearing in imaging problems.
Existing algorithmic approaches to the piecewise constant Mumford-Shah problem and related problems. We start by considering the Potts problem for general operator A. In [5], Bar et al. consider an Ambrosio-Tortorelli-type approximation. Kim et al. use a level-set based active contour method for deconvolution in [47]. Ramlau and Ring [72] employ a related level-set approach for the joint reconstruction and segmentation of x-ray tomographic images; further applications are electron tomography [48] and SPECT [50]. The authors of the present paper have proposed a strategy based on the alternating direction method of multipliers in [82] for the univariate case and in [83] for the multivariate case.
Fornasier and Ward [32] rewrite Mumford-Shah problems as pointwise penalized problems and derive generalized iterative thresholding algorithms for the rewritten problems in the univariate situation. Further, they show that their method converges to a local minimizer in the univariate case. Their approach principally carries over to the piecewise constant Mumford-Shah functional as explained in [82,88] and then results in an $\ell_0$ sparsity problem. In the univariate situation, this NP-hard optimization problem is unconstrained and may be addressed by iterative hard thresholding algorithms for $\ell_0$ penalizations, analyzed by Blumensath and Davies in [9,10]. (Note that related algorithms based on iterative soft thresholding for $\ell_1$ penalized problems have been considered by Daubechies, Defrise, and De Mol in [27].) Artina et al. [3] in particular consider the multivariate discrete Mumford-Shah model using the pointwise penalization approach of [32]. In the multivariate setting, this results in a corresponding nonconvex and nonsmooth problem with linear constraints. The authors successively minimize local quadratic and strictly convex perturbations (depending on the previous iterate) of a (fixed) smoothed version of the objective by augmented Lagrangian iterations, which themselves can be accomplished by iterative thresholding via a Lipschitz continuous thresholding function. They show that the accumulation points of the sequences produced by their algorithm are constrained critical points of the smoothed problem. In the multivariate situation, a similar approach for rewriting the Potts problem results in an $\ell_0$ sparsity problem with additional equality constraints. Algorithmic approaches for such $\ell_0$ sparsity problems with equality constraints are the penalty decomposition methods of [58,59,94]. The connection with iterative hard thresholding is that the inner loop of the employed two-stage process usually is of iterative hard thresholding type. The difference between the hard-thresholding-based methods and our approach in this paper is that we do not have to deal with constraints and the full matrix A, but with the nonseparable regularizing term $\|\nabla u\|_0$ instead of its separable counterpart $\|u\|_0$. Hence we cannot use hard thresholding.
Another frequently appearing method in the context of restoration of piecewise constant images is total variation minimization [74]. There the jump penalty $\|\nabla u\|_0$ is replaced by the total variation $\|\nabla u\|_1$. The arising minimization problem is convex and therefore numerically tractable with convex optimization techniques [20,25]. Candès, Wakin, and Boyd [16] use iteratively reweighted total variation minimization for piecewise constant recovery problems. Results of compressed sensing type related to the Potts problem have been derived by Needell and Ward [63,64]: under certain conditions, minimizers of the Potts functional agree with total variation minimizers. However, in the presence of noise, total variation minimizers might significantly differ from minimizers of the Potts problem. But the minimizers of the Potts problem are the results frequently desired in practice. Further, algorithms based on convex relaxations of the Potts problem (1) have gained a lot of interest in recent years; see, e.g., [4,15,19,36,54,84].
We next discuss approaches for the multivariate Potts problem in the situation A = id, which is particularly interesting in image processing and for which there are some further approaches. The first class of approaches proceeds via graph cuts. Here, the range space of u is a priori restricted to a relatively small number of values. The problem remains NP-hard, but it then allows for an approach by sequentially solving binary partitioning problems via minimal graph cut algorithms [11,12,51]. Another approach is to limit the number k of different values which u may take without discretizing the range space a priori. For k = 2, active contours were used by Chan and Vese [23] to minimize the corresponding binary Potts model. They use a level set function to represent the partitions which evolves according to the Euler-Lagrange equations of the Potts model. A globally convergent strategy for the binary segmentation problem is presented in [22]. The active contour method for k = 2 was extended to larger k in [86]. Note that, for k > 2, the problem is NP-hard. We refer to [26] for an overview of level set segmentation. In [39-41], Hirschmüller proposes a non-iterative strategy for the Potts problem which is based on cost aggregation. It has lower computational cost, but comes with lower-quality reconstructions compared with graph cuts. Due to the small number of potential values of u, these methods mainly appear in connection with image segmentation. Methods for restoring piecewise constant images without restricting the range space are proposed by Nikolova et al. [66,67]. They use non-convex regularizers which are algorithmically approached using a graduated non-convexity approach. We note that the Potts problem (1) does not fall into the class of problems considered in [66,67]. Last but not least, Xu et al. [92] proposed a piecewise constant model reminiscent of the Potts model that is approached by a half-quadratic splitting using a pixelwise iterative thresholding type technique. It was later extended to a method for blind image deconvolution [93].
Contributions. The contributions of this paper are threefold: (i) We propose a new iterative minimization strategy for multivariate piecewise constant Mumford-Shah/Potts functionals as well as a (still NP-hard) quadratic penalty relaxation. (ii) We provide a convergence analysis of the proposed schemes. (iii) We show the applicability of our schemes in several experiments.
Concerning (i), we propose two schemes which are based on majorization-minimization or forward-backward splitting methods of Douglas-Rachford type [55]. One scheme addresses the Potts problem directly, whereas the other scheme treats a quadratic penalty relaxation. The solutions of the relaxed problem themselves are not feasible for the Potts problem but are near to a feasible solution of the Potts problem, where nearness can be quantified. In particular, whenever a given tolerance is acceptable in an application, the relaxed scheme is applicable. In contrast to the approaches in [9,32] and [58,59] for sparsity problems, which lead to thresholding algorithms, our approach leads to non-separable yet computationally tractable problems in the backward step.
Concerning (ii), we first analyze the proposed quadratic penalty relaxation scheme. In particular, we show convergence towards a local minimizer. Due to the NP-hardness of the quadratic penalty relaxation, this convergence result is the best that can reasonably be expected. Concerning the scheme for the non-relaxed Potts problem, we also perform a convergence analysis. In particular, we obtain results on the convergence towards local minimizers on subsequences. The quality of the convergence results is comparable with the ones in [58,59]. We note that, compared with [58,59], we face the additional challenge of dealing with the non-separability of the backward step. (We note that in practice we observe convergence of the whole sequence, not only on a subsequence.) Concerning (iii), we consider problems with full and partial data. We first apply our algorithms to deconvolution problems. In particular, we consider deblurring and denoising of images corrupted by Gaussian blur and by motion blur, respectively. We further consider noisy and undersampled Radon data, together with the task of joint reconstruction, denoising and segmentation. Finally, we use our method in the situation of pure image partitioning (without blur), which is a widely considered problem in computer vision.
Organization of the paper.In Section 2 we derive the proposed algorithmic schemes.In Section 3 we provide a convergence analysis for the proposed schemes.In Section 4 we apply the algorithms derived in the present paper to concrete reconstruction problems.In Section 5 we draw conclusions.

Majorization-minimization algorithms for multivariate Potts problems

Discretization
We use the following finite difference type discretization of the multivariate Potts problem (1):

$$\min_{u} \; P_\gamma(u) = \gamma \sum_{s=1}^{S} \omega_s \, \|\nabla_{a_s} u\|_0 + \|Au - f\|_2^2, \qquad (2)$$

where the $a_s \in \mathbb{Z}^2$ come from a finite set of directions with nonnegative weights $\omega_s$, and the symbol $\nabla_{a_s} u(i,j)$ denotes the directional difference $u((i,j)+a_s) - u(i,j)$ with respect to the direction $a_s$ at the pixel $(i,j)$. The symbol $\|\nabla_{a_s} u\|_0$ denotes the number of nonzero entries of $\nabla_{a_s} u$. The simplest set of directions consists of the unit vectors $a_1 = (0, 1)$, $a_2 = (1, 0)$ along with unit weights. Unfortunately, when refining the grid, this discretization converges to a limit that measures the boundary in terms of the $\ell_1$ analogue of the Hausdorff measure [17]. The practical consequence is unwanted block artifacts in the reconstruction (geometric staircasing). More isotropic results are obtained by adding the diagonals $a_3 = (1, 1)$, $a_4 = (1, -1)$ to the directions $a_1$ and $a_2$; a near-isotropic discretization can be achieved by extending this system by the knight moves $a_5 = (1, 2)$, $a_6 = (2, 1)$, $a_7 = (1, -2)$, $a_8 = (2, -1)$. (The name is inspired by the possible moves of a knight in chess.) Weights $\omega_s$ for the system $\{a_1, a_2, a_3, a_4\}$ of coordinate and diagonal directions can be chosen as $\omega_s = \sqrt{2} - 1$ for the coordinate directions $s = 1, 2$ and $1 - \sqrt{2}/2$ for the diagonal directions $s = 3, 4$. When additionally adding the knight-move directions, suitable weights $\omega_s$ for the system $\{a_1, \ldots, a_8\}$ can be chosen accordingly. General schemes for weight design are proposed in [18] and in [83]. We note that, for the system $\{a_1, a_2, a_3, a_4\}$ of coordinate and diagonal directions, the weights of [18] and of [83] coincide; in the knight-move case, we use the weights derived by the scheme in [83].
For further details we refer to these references.
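To make the discretization concrete, the following NumPy sketch (function names are ours, not the paper's) evaluates the weighted jump count $\|\nabla_{a_s} u\|_0$ for an arbitrary direction $a_s \in \mathbb{Z}^2$ and assembles the discrete Potts energy of (2) for a given forward operator:

```python
import numpy as np

def directional_jumps(u, a):
    """Number of nonzero directional differences u[(i,j)+a] - u[(i,j)],
    taken over all pixels (i,j) for which (i,j)+a lies inside the grid."""
    dy, dx = a
    m, n = u.shape
    src = u[max(0, -dy):m - max(0, dy), max(0, -dx):n - max(0, dx)]
    dst = u[max(0, dy):m - max(0, -dy), max(0, dx):n - max(0, -dx)]
    return int(np.count_nonzero(dst - src))

def potts_energy(u, f, apply_A, gamma, dirs, weights):
    """Discrete Potts energy gamma * sum_s w_s ||grad_{a_s} u||_0 + ||A u - f||_2^2."""
    jumps = sum(w * directional_jumps(u, a) for a, w in zip(dirs, weights))
    return gamma * jumps + float(np.sum((apply_A(u) - f) ** 2))
```

For instance, a 4x4 image whose left half is 0 and right half is 1 has exactly one jump per row in direction $a_1 = (0,1)$ and none in direction $a_2 = (1,0)$.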
Theorem 1. The considered problem (2) has a minimizer.
The validity of Theorem 1 can be seen by following the lines of the proof of [42, Theorem 2.1] where an analogous statement is shown for the (non-piecewise constant) Mumford-Shah problem.

Derivation of the proposed algorithmic schemes
We start out with the discretization (2) of the multivariate Potts problem. We introduce S versions $u_1, \ldots, u_S$ of the target u and link them via equality constraints in the following consensus form to obtain the problem

$$\min_{u_1, \ldots, u_S} \; P_\gamma(u_1, \ldots, u_S) \quad \text{subject to} \quad u_1 = u_2 = \ldots = u_S, \qquad (3)$$

where the functional $P_\gamma(u_1, \ldots, u_S)$ of the S variables $u_1, \ldots, u_S$ is given by

$$P_\gamma(u_1, \ldots, u_S) = \gamma \sum_{s=1}^{S} \omega_s \, \|\nabla_{a_s} u_s\|_0 + \frac{1}{S} \sum_{s=1}^{S} \|A u_s - f\|_2^2. \qquad (4)$$

Note that solving (3) is equivalent to solving the discrete Potts problem (2). Further, note that we have overloaded the symbol $P_\gamma$ which, for one argument u, denotes the discrete Potts functional of (2) and, for S arguments $u_1, \ldots, u_S$, denotes the functional of (4); we have the relation $P_\gamma(u, \ldots, u) = P_\gamma(u)$.
A majorization-minimization approach to the quadratic penalty relaxation of the Potts problem. The quadratic penalty relaxation of (4) is given by

$$P_{\gamma,\rho}(u_1, \ldots, u_S) = P_\gamma(u_1, \ldots, u_S) + \rho \sum_{1 \le s < s' \le S} c_{s,s'} \, \|u_s - u_{s'}\|_2^2. \qquad (5)$$

Here, the soft constraints which replace the equalities $u_1 = \ldots = u_S$ are realized via the squared Euclidean norms $\sum_{1 \le s < s' \le S} c_{s,s'} \|u_s - u_{s'}\|_2^2$, where the nonnegative numbers $c_{s,s'}$ denote weights (which may be set to zero if no direct coupling between the particular $u_s$, $u_{s'}$ is desired). The symbol $\rho$ denotes a positive penalty parameter promoting the soft constraint, i.e., increasing $\rho$ enforces the $u_s$ to be closer to each other w.r.t. the Euclidean distance. We note that we later analytically quantify the size of $\rho$ which is necessary to obtain an a priori prescribed tolerance on the $u_s$; see (18) below. Frequently, we use the short-hand notation

$$\rho_{s,s'} = \rho \, c_{s,s'}. \qquad (6)$$

Typical choices of the $\rho_{s,s'}$ are

$$\rho_{s,s'} = \rho \quad \text{for all } s, s', \qquad \text{or} \qquad \rho_{s,s'} = \rho \, \delta_{(s+1) \bmod S, \, s'}, \qquad (7)$$

i.e., the constant choice ($c_{s,s'} = 1$), as well as the coupling between consecutive variables with constant parameter ($\delta_{s,t} = 1$ if and only if $s = t$, and $\delta_{s,t} = 0$ otherwise). We note that in these situations only one additional positive parameter $\rho$ appears, and that this parameter is tied to the tolerance one is willing to accept as a distance between the $u_s$; see Algorithm 1.
For the majorization-minimization approach, we derive a surrogate functional [27] of the functional $P_{\gamma,\rho}(u_1, \ldots, u_S)$ of (5). For this purpose, we introduce a block matrix B (built from copies of A, the identity matrix I and the zero matrix 0) and a vector g such that

$$\|B(u_1, \ldots, u_S) - g\|_2^2 = \frac{1}{S} \sum_{s=1}^{S} \|A u_s - f\|_2^2 + \sum_{1 \le s < s' \le S} \rho_{s,s'} \, \|u_s - u_{s'}\|_2^2. \qquad (8)$$

The matrix B has S block columns and $S + S(S-1)/2$ block rows: the first S block rows carry the blocks $A/\sqrt{S}$ on the diagonal (with the corresponding entries of g equal to $f/\sqrt{S}$), and each remaining block row encodes one coupling term $\sqrt{\rho_{s,s'}}\,(u_s - u_{s'})$ (with zero entries in g). Further, we introduce the difference operator D given by

$$D(u_1, \ldots, u_S) = (\nabla_{a_1} u_1, \ldots, \nabla_{a_S} u_S), \qquad (9)$$

which applies the difference w.r.t. the sth direction to the sth component of u. We employ the weights $\omega_1, \ldots, \omega_S$ to define the quantity $\|D(u_1, \ldots, u_S)\|_{0,\omega}$ which counts the weighted number of jumps by

$$\|D(u_1, \ldots, u_S)\|_{0,\omega} = \sum_{s=1}^{S} \omega_s \, \|\nabla_{a_s} u_s\|_0. \qquad (10)$$

With all this notation at hand, we may rewrite the functional of (5) as

$$P_{\gamma,\rho}(u_1, \ldots, u_S) = \gamma \, \|D(u_1, \ldots, u_S)\|_{0,\omega} + \|B(u_1, \ldots, u_S) - g\|_2^2. \qquad (11)$$

Using the representation (11), the surrogate functional in the sense of [27] of the functional $P_{\gamma,\rho}$ is given by

$$P^{\mathrm{surr}}_{\gamma,\rho}(u; v) = \gamma \, \|D u\|_{0,\omega} + \|B u - g\|_2^2 + L_\rho^2 \, \|u - v\|_2^2 - \|B(u - v)\|_2^2, \qquad (12)$$

where we abbreviate $u = (u_1, \ldots, u_S)$ and $v = (v_1, \ldots, v_S)$. Here $L_\rho \ge 1$ denotes a constant which is chosen larger than the spectral norm $\|B\|$ of B (i.e., the operator norm w.r.t. the $\ell_2$ norm). This scaling is made to ensure that $B/L_\rho$ is contractive; in terms of A and the penalties $\rho_{s,s'}$, we require that

$$L_\rho^2 > \|B\|^2. \qquad (13)$$

For the particular choice $\rho_{s,s'} = \rho$ as on the left-hand side of (7), we can choose $L_\rho^2$ smaller, i.e., $L_\rho^2 > \|A\|_2^2/S + S\rho$. For only coupling neighboring $u_s$ with the same constant $\rho$, i.e., the right-hand coupling of (7), we have $L_\rho^2 > \|A\|_2^2/S + \alpha\rho$, where $\alpha = 4$ if S is even, and $\alpha = 2 - 2\cos\frac{\pi(S-1)}{S}$ if S is odd. These choices ensure that $B/L_\rho$ is contractive by Lemma 8. Basics on surrogate functionals as we need them for this paper are gathered in Section 3.4. Further details on surrogate functionals can be found in [9,10,27].
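In practice, the condition $L_\rho > \|B\|$ can be checked numerically when B is available as an explicit matrix. A generic power-iteration sketch (our illustration, not part of the paper's scheme) estimates the spectral norm:

```python
import numpy as np

def spectral_norm(B, n_iter=500, seed=0):
    """Estimate the spectral norm ||B|| (largest singular value)
    by power iteration applied to B^T B."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(B.shape[1])
    for _ in range(n_iter):
        v = B.T @ (B @ v)       # one step of power iteration on B^T B
        v /= np.linalg.norm(v)  # renormalize to avoid overflow
    return float(np.linalg.norm(B @ v))
```

A valid scaling constant is then any value strictly above the estimate, e.g., `L_rho = 1.01 * spectral_norm(B)`.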
Using elementary properties of the inner product shows that

$$P^{\mathrm{surr}}_{\gamma,\rho}(u; v) = \gamma \, \|D u\|_{0,\omega} + L_\rho^2 \, \Big\| u - \Big( v + \tfrac{1}{L_\rho^2} B^T (g - B v) \Big) \Big\|_2^2 + R(v_1, \ldots, v_S), \qquad (14)$$

where $R(v_1, \ldots, v_S)$ is a rest term which is irrelevant when minimizing $P^{\mathrm{surr}}$ w.r.t. $u_1, \ldots, u_S$ for fixed $v_1, \ldots, v_S$. Writing this down in terms of the original system matrix A and the data f yields, componentwise for $s = 1, \ldots, S$,

$$h_s = v_s + \tfrac{1}{S L_\rho^2} A^T (f - A v_s) + \tfrac{\rho}{L_\rho^2} \sum_{s' \ne s} c_{s,s'} \, (v_{s'} - v_s). \qquad (15)$$

For the quadratic penalty relaxation of the Potts problem, i.e., for minimizing the problem (5), we propose to use the surrogate iteration. Applied to (15), this surrogate iteration reads

$$(u_1^{(n+1)}, \ldots, u_S^{(n+1)}) \in \operatorname*{argmin}_{u_1, \ldots, u_S} \; \gamma \, \|D(u_1, \ldots, u_S)\|_{0,\omega} + L_\rho^2 \sum_{s=1}^{S} \|u_s - h_s^{(n)}\|_2^2, \qquad (16)$$

where $h^{(n)}_s$ is given by

$$h^{(n)}_s = u^{(n)}_s + \tfrac{1}{S L_\rho^2} A^T (f - A u^{(n)}_s) + \tfrac{\rho}{L_\rho^2} \sum_{s' \ne s} c_{s,s'} \, (u^{(n)}_{s'} - u^{(n)}_s). \qquad (17)$$

Note that in Section 2.3 below, we derive an efficient algorithm which computes an exact minimizer of (16). Now assume that we are willing to accept a small deviation between the $u_s$, i.e.,

$$c_{s,s'} \, \|u_s - u_{s'}\|_2^2 \le \varepsilon^2 \qquad (18)$$

for $\varepsilon > 0$ and for indices $s, s'$ with $c_{s,s'} \ne 0$. The following algorithm computes a result fulfilling (18).
Algorithm 1. We consider the quadratic penalty relaxed Potts problem (5) and a tolerance $\varepsilon$ for the targets $u_s$ that we are willing to accept. We propose the following algorithm for the relaxed Potts problem (5) (which yields a result with targets $u_s$ deviating from each other by at most $\varepsilon/\sqrt{c_{s,s'}}$).

• Set $\rho$ according to (34), and set $L_\rho$ according to (13) (or, in the special cases of (7), as below (34) and (13)). Initialize $u^{(0)}_s$ as discussed in the corresponding paragraph below (e.g., $u^{(0)}_s = 0$ for all s).
• Iterate until convergence: compute $h^{(n)}_s$, $s = 1, \ldots, S$, by the forward step (17), and then solve the Potts subproblem (16) to obtain $u^{(n+1)}_1, \ldots, u^{(n+1)}_S$ (backward step).

We will see in Theorem 4 that this algorithm converges to a local minimizer of the quadratic penalty relaxation (5) and that the $u_s$ are $\varepsilon$-close, i.e., (18) is fulfilled.
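The forward step of the iteration can be sketched as follows; this is a minimal NumPy rendering of our reconstruction of the update formula (with the data term weighted by 1/S and pairwise couplings $\rho\, c_{s,s'}$), not the paper's verbatim pseudocode:

```python
import numpy as np

def forward_step(us, f, A, rho, L2, c):
    """One forward (gradient) step on the quadratic part, per variable u_s.
    us: list of S arrays; A: matrix; L2: the constant L_rho^2;
    c: S x S coupling weights (our reconstruction of the surrogate step)."""
    S = len(us)
    hs = []
    for s in range(S):
        h = us[s] + A.T @ (f - A @ us[s]) / (S * L2)
        for t in range(S):
            if t != s:
                # quadratic coupling pulls u_s towards the other copies
                h = h + rho * c[s][t] * (us[t] - us[s]) / L2
        hs.append(h)
    return hs
```

A useful sanity check: if all copies agree and fit the data exactly, the forward step leaves them unchanged (a fixed point of the quadratic part).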
The relation between the Potts problem and its quadratic penalty relaxation, and obtaining a feasible solution for the Potts problem (4) from the output of Algorithm 1. As pointed out above, we show in Theorem 4 that Algorithm 1 produces a local minimizer of the quadratic penalty relaxation (5) of the Potts problem (4) and that the corresponding variables of a resulting solution are close up to an a priori prescribed tolerance. This may in practice already be enough. However, strictly speaking, a local minimizer of the quadratic penalty relaxation (5) is not feasible for the Potts problem (4).
We will now explain a projection procedure to derive a feasible solution for the Potts problem (4) from a local minimizer of (5) with nearby variables $u_s$ (as produced by Algorithm 1). Related theoretical results are stated as Theorem 5. In particular, we will see that, in case the imaging operator A is lower bounded, the projection procedure applied to the output of Algorithm 1 yields a feasible point which is close to a local minimizer of the original Potts problem (4).
In order to explain the averaging procedure, we need some notions on partitionings. Recall that a partitioning P consists of a finite number of segments $P_i$ which are pairwise disjoint sets of pixel coordinates whose union equals the image domain $\Omega$, i.e., $\Omega = \bigcup_i P_i$ with $P_i \cap P_j = \emptyset$ for $i \ne j$. Here, we assume that each segment $P_i$ is connected w.r.t. the neighborhood system $a_1, \ldots, a_S$ in the sense that there is a path connecting any two elements of $P_i$ with steps in $a_1, \ldots, a_S$. We will further need the following proposed notion of a directional partitioning. A directional partitioning w.r.t. a set of S directions $a_1, \ldots, a_S$ consists of a set I of (discrete) intervals I, where each interval I is associated with exactly one of the directions $a_1, \ldots, a_S$; here, an interval I associated with the direction $a_s$ has to be of the form $I = \{(i, j) + k a_s : k = 0, \ldots, K - 1\}$, where $K \in \mathbb{N}$ and I belongs to the discrete domain. (For each direction $a_s$, the corresponding intervals form an ordinary partition.) We note that Algorithm 1, which produces output $u = (u_1, \ldots, u_S) : \Omega \to \mathbb{R}^S$, induces a directional partitioning as follows. We observe that each variable $u_s$ is associated with a direction $a_s$. For any $s \in \{1, \ldots, S\}$, we let each (maximal) interval of constancy of $u_s$ be an interval in I associated with $a_s$.
Each partitioning P induces a directional partitioning I by letting the intervals I of I be the stripes with direction $a_s$ obtained from segment $P_i$, for each direction $s = 1, \ldots, S$ and each segment $P_i$, $i = 1, \ldots, N_P$. Furthermore, each directional partitioning I induces a partitioning by the following merging process.

Definition 2. We say that pixels x, y are related, in symbols $x \sim y$, if there is a path $x_0 = x, \ldots, x_N = y$ connecting x, y in the sense that, for any consecutive members $x_i, x_{i+1}$, $i = 0, \ldots, N-1$, of the path, there is an interval I of the directional partitioning I containing both $x_i$ and $x_{i+1}$.

The relation $x \sim y$ obviously defines an equivalence relation, and the corresponding equivalence classes $P_i$ yield a partitioning of $\Omega$. We use the symbols $P(\cdot)$ and $I(\cdot)$ to denote the mappings assigning a partitioning to a directional partitioning and vice versa, respectively.
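The merging process of Definition 2 amounts to computing connected components, which is conveniently done with a union-find structure. A minimal sketch (indices and names are ours; pixels are flattened to integers for simplicity):

```python
def merge_directional_partition(n_pixels, intervals):
    """Union-find realization of the merging process: pixels that share an
    interval of the directional partitioning end up in the same segment."""
    parent = list(range(n_pixels))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving for near-constant lookups
            x = parent[x]
        return x

    for interval in intervals:             # each interval: list of pixel indices
        for a, b in zip(interval, interval[1:]):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb            # union the two components

    labels = [find(x) for x in range(n_pixels)]
    segments = {}
    for x, r in enumerate(labels):
        segments.setdefault(r, []).append(x)
    return list(segments.values())
```

For example, two overlapping intervals chain their pixels into one segment, while untouched pixels remain singleton segments.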
As a final preparation, we consider a function $u = (u_1, \ldots, u_S) : \Omega \to \mathbb{R}^S$ as produced by Algorithm 1 and a partitioning P of $\Omega$, and define the projection to a function $\pi_P(u) : \Omega \to \mathbb{R}$ by

$$\pi_P(u)(x) = \frac{1}{S \cdot \# P_i} \sum_{s=1}^{S} \sum_{y \in P_i} u_s(y) \quad \text{for } x \in P_i, \qquad (22)$$

where the symbol $\#P_i$ denotes the number of elements in the segment $P_i$. Hence the projection $\pi$ defined via (22) averages w.r.t. all components of u and all members of the segment $P_i$, and so produces a piecewise constant function w.r.t. the partitioning P.
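The averaging projection can be sketched in a few lines of NumPy (function name ours; segments given as lists of pixel coordinates):

```python
import numpy as np

def project_onto_partition(us, segments):
    """The projection pi_P: average all S components of u over each segment
    P_i, yielding a function constant on every segment of the partitioning."""
    S = len(us)
    out = np.zeros_like(us[0])
    for seg in segments:                 # seg: list of (i, j) pixel coordinates
        idx = tuple(np.array(seg).T)     # fancy index addressing the segment's pixels
        mean = sum(float(u[idx].sum()) for u in us) / (S * len(seg))
        out[idx] = mean
    return out
```

With two components taking values 1 and 3 on a single-pixel segment, the projected value is their mean 2, in line with (22).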
Using these notions, we propose the following projection procedure (Procedure 1).
1. Compute the partitioning $P(I) = P_I$ induced by the directional partitioning I as explained above (21).
2. Compute the projection $\hat{u} = \pi_{P_I}(u)$ by averaging according to (22).

We notice that, once the partitioning $P_I$ is available, solving the normal equations in the space of functions constant on $P_I$ would be an alternative to the above second step; this alternative, however, might be more expensive.
A penalty method for the Potts problem based on a majorization-minimization approach for its quadratic penalty relaxation. Intuitively, increasing the parameter $\rho$ during the iterations should tie the $u_s$ closer together such that the constraint of (3) is ultimately fulfilled, which results in an approach for the initial Potts problem (2). Recall that $\rho_{s,s'} = \rho \, c_{s,s'}$ was defined in (6), where the $c_{s,s'}$ are nonnegative numbers weighting the constraints. We here increase $\rho$ while leaving the $c_{s,s'}$ fixed during this process.
Algorithm 2. We consider the Potts problem (3) in S variables (which is equivalent to (2) as explained above). We propose the following algorithm for the Potts problem (3).
• Initialize $\rho$ and $L_\rho$, and initialize the $u_s$ as discussed below.
• A: Run the surrogate iteration (16), (17) for the current value of $\rho$ until the iterates (nearly) stagnate.
• Increase $\rho$, update $L_\rho$ according to (13) (or, in the special cases of (7), as below (13)) and goto A.
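The outer continuation loop of this penalty strategy can be sketched generically; the inner relaxed solve is abstracted into a callback, and all names here are our illustration rather than the paper's pseudocode (a scalar toy stands in for the image-valued variables):

```python
def penalty_continuation(solve_relaxed, rho0=1.0, factor=10.0, tol=1e-3, max_outer=30):
    """Outer loop of a quadratic-penalty continuation, as we read Algorithm 2:
    solve the relaxed problem for the current rho, then increase rho
    until the split variables (nearly) agree."""
    rho, us = rho0, None
    for _ in range(max_outer):
        us = solve_relaxed(rho, warm_start=us)          # inner relaxed solve
        gap = max(abs(a - b) for a in us for b in us)   # deviation of the copies
        if gap <= tol:
            break
        rho *= factor                                   # tighten the soft constraint
    return us, rho
```

As a toy check, minimizing $(u_1 - 0)^2 + (u_2 - 1)^2 + \rho (u_1 - u_2)^2$ has the closed-form solution $u_1 = \rho/(1 + 2\rho)$, $u_2 = 1 - u_1$, whose gap $1/(1+2\rho)$ vanishes as $\rho$ grows, so the loop terminates with nearly equal copies.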
This approach is inspired by [58], which considers quadratic penalty methods in the sparsity context. There, the authors search for a solution with only a few nonzero entries; the corresponding prior is separable. In contrast to that work, the present work considers a nonseparable prior.
Initialization. Although the initialization of Algorithm 1 and of Algorithm 2 is not relevant for their convergence properties (cf. Section 3), the choice of the initialization influences the final result. (Please note that this may also happen for convex but not strictly convex problems.) We discuss different initialization strategies. The simplest choice is the all-zero initialization $(u^{(0)}_1, \ldots, u^{(0)}_S) = (0, \ldots, 0)$. Likewise, one can select the right-hand side of the normal equations of the underlying least squares problem, that is, $A^T f$. A third reasonable choice is the solution of the normal equations itself, or an approximation of it. Using an approximation might in particular be reasonable to obtain a regularized approximation of the solution of the normal equations. A possible strategy to obtain such a regularized initialization is to apply a fixed number of Landweber iterations [53] or of conjugate gradient iterations to the underlying least squares problem. (In our experiments, we initialized Algorithm 1 with the result of 1000 Landweber iterations and Algorithm 2 with $A^T f$.)
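The Landweber initialization mentioned above can be sketched as follows (a minimal version for an explicit matrix A; the step size bound is the standard convergence condition for Landweber iteration):

```python
import numpy as np

def landweber(A, f, n_iter=1000, tau=None):
    """Landweber iteration u <- u + tau * A^T (f - A u); converges for
    0 < tau < 2 / ||A||^2, and early stopping acts as regularization."""
    if tau is None:
        tau = 1.0 / np.linalg.norm(A, 2) ** 2  # safe default step size
    u = np.zeros(A.shape[1])
    for _ in range(n_iter):
        u = u + tau * (A.T @ (f - A @ u))
    return u
```

For well-posed A and enough iterations the result approaches the least squares solution; with few iterations, one obtains a smoothed approximation suitable as a starting point.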

A non-iterative algorithm for minimizing the Potts subproblem (16)
Both proposed algorithms require solving the Potts subproblem (16) in the backward step; see (19), (26). We first observe that (16) can be solved for each of the $u_s$ separately. The corresponding S minimization problems are of the prototypical form

$$\operatorname*{argmin}_{u} \; \gamma_s \, \|\nabla_{a_s} u\|_0 + \|u - f\|_2^2 \qquad (28)$$

with given data f, the jump penalty $\gamma_s = \gamma \omega_s / L_\rho^2 > 0$ and the direction $a_s \in \mathbb{Z}^2$. As a next step, we see that (28) decomposes into univariate Potts problems for data along the paths in f induced by $a_s$; e.g., for $a_s = e_1$ those paths correspond to the rows of f, and we obtain a minimizer $u^*_s$ of (28) by determining each of its rows individually. The univariate Potts problem amounts to minimizing

$$P^{\mathrm{1d}}_{\gamma_s}(x) = \gamma_s \, \|\nabla x\|_0 + \|x - g\|_2^2, \qquad (29)$$

where the data g is given by the restriction of f to the pixels in $\Omega$ of the form $v + a_s \mathbb{Z}$. Here the offset v is fixed when solving each univariate problem, but varied afterwards to obtain all lines in the image with direction $a_s$. The target to optimize is denoted by $x \in \mathbb{R}^n$ and, in the resulting univariate situation, $\|\nabla x\|_0 = |\{i : x_i \ne x_{i+1}\}|$ denotes the number of jumps of x.
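Enumerating the paths induced by a direction $a_s$ is a simple grid traversal; the following sketch (names ours) collects, for every maximal line through an m x n grid in direction a, the list of pixel coordinates whose data make up one instance of (29):

```python
def paths_along(shape, a):
    """All maximal pixel paths through an m x n grid in direction a = (dy, dx);
    each path yields the data of one univariate Potts problem."""
    m, n = shape
    dy, dx = a
    paths = []
    for i in range(m):
        for j in range(n):
            # start a path only at pixels with no predecessor inside the grid
            if 0 <= i - dy < m and 0 <= j - dx < n:
                continue
            p, y, x = [], i, j
            while 0 <= y < m and 0 <= x < n:
                p.append((y, x))
                y, x = y + dy, x + dx
            paths.append(p)
    return paths
```

Every pixel lies on exactly one path per direction, so the decomposition covers the whole image without overlap.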
It is well known that the univariate direct problem (29) has a unique minimizer. Further, these particular problems can be solved exactly by dynamic programming [17,34,60,61,90], which we briefly describe in the following; for further details we refer to [34,80]. Assume we have computed minimizers $x^l$ of (29) for the partial data $(g_1, \ldots, g_l)$ for each $l = 1, \ldots, r$, $r < n$. Then the minimum value of (29) for $(g_1, \ldots, g_{r+1})$ can be found by

$$P^{\mathrm{1d}}_{\gamma}(x^{r+1}) = \min_{1 \le l \le r+1} \; P^{\mathrm{1d}}_{\gamma}(x^{l-1}) + \gamma + E_{l:r+1}, \qquad (30)$$

where we let $x^0$ be the empty vector, $P^{\mathrm{1d}}_{\gamma}(x^0) = -\gamma$, and $E_{l:r+1}$ be the quadratic deviation of $(g_l, \ldots, g_{r+1})$ from its mean. Denoting the minimizing argument in (30) by $l^*$, the minimizer $x^{r+1}$ is given by

$$x^{r+1} = \big(x^{l^*-1}, \, \mu_{[l^*, r+1]}, \ldots, \mu_{[l^*, r+1]}\big), \qquad (31)$$

where $\mu_{[l^*, r+1]}$ is the mean value of $(g_{l^*}, \ldots, g_{r+1})$. Thus, we obtain a minimizer for the full data g by successively computing $x^l$ for each $l = 1, \ldots, n$. By precomputing the first and second moments of the data g and storing only jump locations, the described method can be implemented in $O(n^2)$ time [34]. Another way to achieve $O(n^2)$ is based on the QR decomposition of the design matrix by means of Givens rotations; see [80]. Furthermore, the search space can be pruned to speed up computations [46,81].
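The dynamic program above can be sketched compactly; the following is a minimal $O(n^2)$ implementation in the spirit of [34] (variable names ours), using cumulative first and second moments so that each segment deviation is evaluated in constant time:

```python
import numpy as np

def potts_1d(g, gamma):
    """Exact minimizer of gamma * ||grad x||_0 + ||x - g||_2^2 via the
    dynamic program: optimize over the left endpoint l of the last segment."""
    g = np.asarray(g, dtype=float)
    n = len(g)
    M = np.concatenate(([0.0], np.cumsum(g)))       # first moments
    Q = np.concatenate(([0.0], np.cumsum(g ** 2)))  # second moments

    def dev(l, r):  # squared deviation of g[l..r] (1-based) from its mean
        return Q[r] - Q[l - 1] - (M[r] - M[l - 1]) ** 2 / (r - l + 1)

    P = np.empty(n + 1)
    P[0] = -gamma                                   # so the first segment incurs no jump cost
    last = np.zeros(n + 1, dtype=int)               # left endpoint of the optimal last segment
    for r in range(1, n + 1):
        best, arg = np.inf, 1
        for l in range(1, r + 1):
            val = P[l - 1] + gamma + dev(l, r)
            if val < best:
                best, arg = val, l
        P[r], last[r] = best, arg

    x, r = np.empty(n), n                           # backtrack, filling in segment means
    while r > 0:
        l = last[r]
        x[l - 1:r] = (M[r] - M[l - 1]) / (r - l + 1)
        r = l - 1
    return x
```

For a step signal, a small $\gamma$ keeps the jump (paying $\gamma$ instead of the much larger deviation cost), while a very large $\gamma$ collapses the signal to its global mean.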

Analytic results
In the course of the derivation of the proposed algorithms above, we considered the quadratic penalty relaxation (5) of the multivariate Potts problem. Although it is more straightforward to access algorithmically via our approach, we first note that this problem is still NP-hard (as is the original problem).
Theorem 3. Finding a (global) minimizer of the quadratic penalty relaxation (5) of the multivariate Potts problem is NP-hard.
The proof is given in Section 3.3 below. In Section 2.2 we have proposed Algorithm 1 to approach the quadratic penalty relaxation of the multivariate Potts problem. We show that the proposed algorithm converges to a local minimizer and that a feasible point of the original multivariate Potts problem is nearby.
iii. Assume a tolerance ε > 0 that we are willing to accept for the distance between the u_s, i.e., for the coupling term Σ_{s,s'} c_{s,s'} ‖u_s − u_{s'}‖^2 as in (33). Running Algorithm 1 with the parameter ρ chosen according to (34) (where σ_1 is the smallest nonzero eigenvalue of C^T C with C given by (49); for the particular choices of the coupling given by (7), σ_1 = S and σ_1 = 2 − 2 cos(2π/S), respectively) yields a local minimizer of the quadratic penalty relaxation (5) such that the u_s are close up to ε, i.e., (33) is fulfilled.
The proof is given in Section 3.5 below. A solution of Algorithm 1 is not a feasible point for the initial Potts problem (3). However, we see below that it produces a δ-approximative solution u* in the sense of (35): there are μ* and a partitioning P* for which both quantities in (35) are small, where L(μ*) is given by (53) below. In this context, note that the conditions for a local minimizer are Σ_{s,s'} c_{s,s'} ‖u*_s − u*_{s'}‖_2^2 = 0 and the Lagrange multiplier condition L(μ*) = 0. So (35) intuitively means that both the constraint and the Lagrange multiplier condition are approximately fulfilled for the partitioning induced by u*.
Further, given a solution of Algorithm 1, we find a nearby feasible point for the Potts problem (3) (or, equivalently, (2)), as detailed in the following theorem.

ii. The projection procedure (Procedure 1) proposed in Section 2.2, applied to the solution u = (u_1, ..., u_S) of Algorithm 1, produces a feasible image û (together with a valid partitioning) for the Potts problem (3) which is close to u, where ε = max_{s,s'} ‖u_s − u_{s'}‖ quantifies the deviation between the u_s. Here C_1 = #Ω/4, where the symbol #Ω denotes the number of elements in Ω. If the imaging operator A is lower bounded, i.e., there is a constant c > 0 such that ‖Au‖ ≥ c‖u‖, a local minimizer u* of the Potts problem (3) is nearby as well. The proof of Theorem 5 can be found at the end of Section 3.4; most of the relevant statements are already shown in Section 3.3. Theorem 5 theoretically underpins the fact that, on the application side, we may use Algorithm 1 for the Potts problem (3) (accepting some arbitrarily small tolerance that we may fix in advance).
In addition, in Section 2.2, we have proposed Algorithm 2 to approach the Potts problem (3). We first show that Algorithm 2 is well-defined.

Theorem 6. Algorithm 2 is well-defined in the sense that the inner iteration governed by (25) terminates, i.e., for any k ∈ N, there is n ∈ N such that the termination criterion given by (25) holds.
The proof of Theorem 6 is given in Section 3.6. Concerning the convergence properties of Algorithm 2, we obtain the following results.
Theorem 7.
• Any cluster point of the sequence u^(k) is a local minimizer of the Potts problem (3) (which implicitly implies that the components of each limit u* are equal, i.e., u*_s = u*_{s'} for all s, s').
• If A is lower bounded, the sequence u^(k) produced by Algorithm 2 has a cluster point, and the produced cluster points are local minimizers of the Potts problem (3).
The proof of Theorem 7 can be found in Section 3.6.

Estimates on operator norms and Lagrange multipliers
Lemma 8. The spectral norm of the block matrix B given by (8) fulfills the general estimate (39). For the particular choice of a constant ρ_{s,s'} = ρ (independent of s, s') as on the left-hand side of (7), we have an improved estimate, and for the coupling of neighboring u_s only, with the same constant ρ, i.e., the right-hand coupling of (7), we have the estimate (41).

Proof. We decompose the matrix B into its upper and lower blocks. Here Ã denotes the S × S-block diagonal matrix with each diagonal entry equal to A, where A is the matrix representing the forward/imaging operator; see (8). The matrix P is given as the lower S^2 × S block in (8), which represents the soft constraints.
Using this decomposition of B, we may decompose the symmetric and positive semidefinite matrix B^T B accordingly, where Ã^T Ã is the S × S-block diagonal matrix with each diagonal entry equal to A^T A, and P^T P is an S × S-block matrix with block entries given by the coupling constants, with ρ_{l,k} := ρ_{k,l} for l > k. Using Gershgorin's theorem (see for instance [79]), the eigenvalues of P^T P are contained in the union of the balls with centers x_r = Σ_{k=1, k≠r}^S ρ_{r,k} and radii x_r = Σ_{k=1, k≠r}^S |−ρ_{r,k}|. These balls are all contained in the larger ball with center 0 and radius 2·max_r x_r. This implies the general estimate (39).
For seeing (41), we notice that in this case the matrix P^T P has cyclic shift structure with three nonzero entries in each row. The discrete Fourier matrix w.r.t. the cyclic group of order S diagonalizes P^T P; the corresponding eigenvalues are given by λ_k = ρ (2 − 2 cos(2πk/S)), k = 0, ..., S − 1.
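This diagonalization is easy to verify numerically. The following sketch (setup and names our own) checks the eigenvalue formula for the cyclic neighbor coupling with ρ = 1, in which case P^T P is the circulant graph Laplacian of the S-cycle:

```python
import numpy as np

# Cyclic neighbor coupling with rho = 1: C has rows e_s - e_{s-1 mod S},
# so C^T C is the circulant Laplacian of the S-cycle.
S = 8
C = np.eye(S) - np.roll(np.eye(S), -1, axis=1)
laplacian = C.T @ C

k = np.arange(S)
predicted = 2 - 2 * np.cos(2 * np.pi * k / S)   # eigenvalues via the DFT
computed = np.linalg.eigvalsh(laplacian)
assert np.allclose(np.sort(computed), np.sort(predicted))

# The smallest nonzero eigenvalue is 2 - 2*cos(2*pi/S), as used later in Lemma 10.
sigma_1 = np.sort(computed)[1]
```

The same computation with a scaling ρ in front of the rows simply multiplies all eigenvalues by ρ.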
Note that the problem of estimating the operator norm of B in (39) involves computing the operator norm of P given by (43). This problem is intimately related to computing the spectral norm of the Laplacian of a corresponding weighted graph (e.g., [37,78]). In particular, we conclude from this link that the general estimate (39) is sharp in the sense that the factor of 2 in front of the sum cannot be made smaller: for a general graph, the normalized Laplacian has spectral norm at most two, and this factor of two is sharp; cf. [37,78].
We recall that we have introduced the concept of a directional partitioning I and discussed its relation to the concept of a partitioning near (21) above. For a function f: Ω → R^S (with the S component functions f_1, ..., f_S: Ω → R) defined on a grid Ω, we consider the orthogonal projection P_I associated with a directional partitioning I, obtained by first sorting the intervals of I into I_1, ..., I_S according to their associated directions a_s, s = 1, ..., S, and then averaging: the function (P_I f)_s on an interval I is given as the arithmetic mean of f_s on I, for all intervals I ∈ I_s and all s = 1, ..., S. Here, the symbol #I denotes the number of elements in I. We note that P_I defines an orthogonal projection on the corresponding ℓ^2 space of discrete functions f: Ω → R^S with the norm ‖f‖^2 = Σ_{s,i} |(f_s)_i|^2, where i iterates through all the indices of f_s.
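For a single component f_s, the projection is plain interval averaging; a minimal sketch (our own names; intervals are given as lists of indices into the flattened component):

```python
import numpy as np

def project_P_I(f_s, intervals):
    """Apply the orthogonal projection P_I to one component f_s.

    Sketch with our own names: each interval is a list of (flat) indices
    into f_s, and the projection replaces the values on an interval by
    their arithmetic mean.
    """
    out = np.array(f_s, dtype=float)
    for interval in intervals:
        out[interval] = out[interval].mean()
    return out
```

Applying the map twice changes nothing, as expected for a projection.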
We consider a partitioning P of Ω, its induced directional partitioning I_P w.r.t. a set of S directions a_1, ..., a_S, and the subspace A_P of functions which are constant on the intervals of the induced directional partitioning I_P (which equals the image of the orthogonal projection P_{I_P}). Functions g: Ω → R which are piecewise constant w.r.t. a partitioning P, i.e., constant on each segment P_i, are in one-to-one correspondence with the linear subspace B_P of A_P given by (46), as shown by the following lemma.
Lemma 9. There is a one-to-one correspondence between the linear space of piecewise constant mappings w.r.t. the partitioning P and the subspace B_P of A_P via the mapping ι: g ↦ (g, ..., g).
Proof. Let g be a piecewise constant mapping w.r.t. the partitioning P; then (g, ..., g) is constant on each interval I of the induced directional partitioning I_P, and (g, ..., g) ∈ B_P. This shows that ι is well-defined in the sense that its range is contained in B_P. Obviously, ι is an injective linear mapping, so it remains to show that any f ∈ B_P is the image under ι of some g: Ω → R which is piecewise constant w.r.t. the partitioning P. To this end, let f ∈ B_P. By definition, f has the form f = (g, ..., g) for some g: Ω → R. Now, towards a contradiction, assume there are a segment P_i and points x, y ∈ P_i with g(x) ≠ g(y). Since there is a path x_0 = x, ..., x_N = y connecting x and y in P_i with steps in a_1, ..., a_S, for any i there is an interval I in the induced partitioning I_P containing x_i together with x_{i+1}. Since g is constant on each I in I_P, we get g(x_i) = g(x_{i+1}) for all i, which implies g(x) = g(y). This contradicts our assumption and shows the lemma.
Using the identification given by Lemma 9, we define, for a given partitioning P, the projection Q_P onto B_P by (47); i.e., we average w.r.t. the segment and over all component functions as given by (22). Since the components of Q_P f are all identical, we will not distinguish Q_P and π_P in the following. This means that we also use the symbol Q_P f to denote the scalar-valued function which is piecewise constant on the partitioning P.
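The averaging over both the segment and all components can be sketched as follows (our own names; u is stored as an S × n array and `segments` lists the index sets of the partitioning):

```python
import numpy as np

def project_Q_P(u, segments):
    """Projection Q_P: average over each segment and over all S components.

    Sketch with our own names: u has shape (S, n); `segments` is a list of
    index lists partitioning range(n). Returns the scalar-valued function
    which is piecewise constant on the partitioning.
    """
    u = np.asarray(u, dtype=float)
    out = np.empty(u.shape[1])
    for seg in segments:
        out[seg] = u[:, seg].mean()   # mean over the segment and all components
    return out
```

Since all components are averaged identically, applying ι from Lemma 9 to the output and projecting again changes nothing, in line with Q_P being a projection.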
On A_P, we consider the constrained problem (48); i.e., given the directional partitioning, we search for a solution which belongs to B_P. Here, C denotes the matrix built from the coupling coefficients c_{s,s'} as in (5); if c_{s,s'} = 0, the corresponding row is removed from the constraint matrix C. For the special choices of (7), we have the matrices in (50), which reflect the constraints u_1 = ... = u_S. We recall that μ_P is a Lagrange multiplier of the problem in (48) if the corresponding stationarity condition holds. We note that for quadratic problems such as (48), Lagrange multipliers always exist [7]. We have the identity (52), in other form, L(μ_P) := (2/S) P_{I_P} Ã^T Ã P_{I_P} u*_P − (2/S) P_{I_P} Ã^T f − P_{I_P} C^T μ_P, where Ã is the block diagonal matrix with constant entry A on each diagonal component, f denotes (with a slight abuse of notation) the block vector of corresponding dimensions with entry f in each component, and u*_P is a minimizer of the constrained problem in B_P. We note that the last equality C^T μ_P = P_{I_P} C^T μ_P in (52) holds since the left-hand side of (52) is contained in the image of P_{I_P}.

Lemma 10. We consider a partitioning P of the discrete domain Ω and the corresponding problem (48). There is a Lagrange multiplier μ_P for (48) which fulfills the estimate (54). Here, σ_1 is the smallest nonzero eigenvalue of C^T C with C given by (49). For the particular choice of C given by the left-hand side of (50), we have the estimate (55), and for the particular choice of C given by the right-hand side of (50), we have the estimate (56) (e.g., for S = 4, an eight-neighborhood, 2 − 2 cos(2π/S) = σ_1 = 2). In particular, the right-hand sides and the constants in all these estimates are independent of the particular partitioning P.
Proof. For any minimizer u*_P of the constrained problem in B_P, we have the estimate (57), where we recall that Ã is the block diagonal matrix with constant entry A, and f the block vector with entry f in each component. The first inequality is a consequence of the fact that P_{I_P} is an orthogonal projection. The second inequality may be seen by evaluating the functional at the constant zero function (which always belongs to B_P) as a candidate, and by noting that ‖A^T‖ = ‖A‖. Using (52), we have ‖C^T μ_P‖ ≤ 2 S^{−1/2} ‖A‖ ‖f‖. Choosing μ_P in the orthogonal complement of the null space of C^T, we get (58). We observe that finding the infimum in (58) corresponds to finding the square root of the smallest nonzero eigenvalue of C^T C. This is because (i) the nonzero eigenvalues of C^T C equal the nonzero eigenvalues of C C^T, and (ii) on the orthogonal complement of the null space, ⟨x, C C^T x⟩ ≥ min{σ : σ ∈ spectrum(C C^T) \ {0}} ‖x‖^2, where σ_1 denotes this smallest nonzero eigenvalue. Hence, using (59) in (58), we get ‖C^T μ_P‖ ≥ √σ_1 ‖μ_P‖, and together with (52) and (57) we obtain (54). Now we consider the particular choice of C given by the left-hand side of (50). Similar to the derivation in (43), we have C^T C = S·I − (1, ..., 1)(1, ..., 1)^T. Further, the constant vectors constitute the kernel of C^T C, and any vector u in its orthogonal complement is mapped to S u. Hence σ_1 = S, which shows (55).
Finally, we consider the particular choice of C given by the right-hand side of (50). As already explained in the proof of Lemma 8, the discrete Fourier transform shows that the corresponding eigenvalues are given by λ_k = 2 − 2 cos(2πk/S), where k = 0, ..., S − 1. The smallest nonzero eigenvalue is thus given by 2 − 2 cos(2π/S). This shows (56) and completes the proof of the lemma.

The quadratic penalty relaxation of the Potts problem and its relation to the Potts problem
In this subsection we reveal some relations between the Potts problem and its quadratic penalty relaxation; in particular, we show Theorem 3 and parts of Theorem 5. We start by showing that the quadratic penalty relaxation of the Potts problem is NP-hard, which was formulated as Theorem 3.
Proof of Theorem 3. We consider the quadratic penalty relaxation (5) of the multivariate Potts problem in its equivalent form (11), with B and g given by (8) and D given by (9). We serialize u = (u_1, ..., u_S): Ω → R^S into a function û: X → R, with X ⊂ Z a discrete interval of size S·#Ω, as follows: for u_s, we consider the discrete lines in the image with direction a_s and interpret u on these lines as a vector; then we concatenate these vectors, starting with the one corresponding to the leftmost upper line, to obtain a vector of length #Ω. For each s, we obtain such a vector, and we again concatenate these vectors in the order s = 1, 2, ... to obtain the resulting object, which we denote by û. Using this serialization, we may rearrange B, g and D accordingly to obtain a weighted univariate Potts problem. This is a sparsity problem which is known to be NP-hard; see, for instance, [82]. This shows the assertion.
We next characterize the local minimizers of the relaxed Potts problem (5) and of the Potts problem (2).

Lemma 11. A local minimizer u = (u_1, ..., u_S) of the quadratic penalty relaxation (5) is characterized as follows: let I be the directional partitioning induced by the minimizer u and let P = P_I be the induced partitioning; then u is a minimizer of the restricted problem (62). Conversely, if u minimizes (62) on A_P, then u is a local minimizer of the relaxed Potts problem (5).
Proof. Let u = (u_1, ..., u_S) be a local minimizer of the quadratic penalty relaxation (5). Hence, there is a neighborhood U of u such that P_{γ,ρ}(v) ≥ P_{γ,ρ}(u) for any v ∈ U. Now if v ∈ A_P and

Proposition 13. Any local minimizer of the quadratic penalty relaxation (5) is an approximate local minimizer, in the sense of (35), of the Potts problem (3).
Proof. By Lemma 11, a local minimizer u = (u_1, ..., u_S) of the quadratic penalty relaxation (5) is a minimizer of the problem (62). Let us thus consider a local minimizer u of (5) with induced partitioning P = P_I. Since u minimizes (62), we have (66), since the gradient projected to A_P equals zero for any local minimizer of the restricted problem on the subspace A_P. (The notation is chosen as in (53) above.) We define μ by μ = ρ C P_I u and obtain (67) by (66). It remains to show that Cu becomes small. To this end, we observe that, by Lemma 14, ‖Cv‖ is bounded in terms of ρ^{−1}‖μ*‖ for any minimizer v of the restricted problem, where μ* is an arbitrary Lagrange multiplier of (48). Plugging in the minimizer u for v yields ‖Cu‖ < ρ^{−1} ‖μ*‖. Thus, letting δ = ρ^{−1} ‖μ*‖, we have ‖Cu‖ < δ and L(μ) = 0 by (67), which by (35) shows the assertion and completes the proof.
For the proof of Proposition 13, as well as in the following, we need the next lemma. Similar statements are [52, Proposition 13] and [58, Lemma 2.5]. However, since these references differ concerning the precise estimates, and since the setup here is slightly different, we provide a brief proof for the reader's convenience.

Lemma 14. Let P be a partitioning and let I = I_P be the corresponding induced directional partitioning. For any minimizer v of the restricted problem on A_P, the norm ‖Cv‖ is bounded in terms of ρ^{−1}‖μ*‖, where μ* is an arbitrary Lagrange multiplier of (48).
Proof. By [52, Corollary 2], we have the estimate (70) for arbitrary data. Then the chain of inequalities (71) follows: for the first inequality, we write down the definition of F_ρ and restrict the set over which the minimum is formed, which results in a potentially larger functional value; for the second inequality, we notice that C(y, ..., y) = 0 for (y, ..., y) ∈ B_P; and for the last inequality, we employ (70). Now, plugging z := Cv into (71) yields a bound which, using the fact that the unit ball w.r.t. the ℓ^1 norm is contained in the unit ball w.r.t. the ℓ^2 norm, gives the asserted bound on ‖Cv‖. This completes the proof.

Next, we see that for any local minimizer of the quadratic penalty relaxation (5), we can find a nearby feasible point using the projection procedure (Procedure 1) proposed in Section 2.2. Further, if the imaging operator A is lower bounded, we find a nearby minimizer.

Proposition 15. Procedure 1 applied to a local minimizer u = (u_1, ..., u_S) of the quadratic penalty relaxation (5) produces a feasible image û (together with a valid partitioning) for the Potts problem (3) which is close to u in the sense of (74), where ε = max_{s,s'} ‖u_s − u_{s'}‖ quantifies the deviation between the u_s. Here C_1 = #Ω/4, where the symbol #Ω denotes the number of elements in Ω.
If the imaging operator A is lower bounded, i.e., there is a constant c > 0 such that ‖Au‖ ≥ c‖u‖, a local minimizer u* of the Potts problem (3) is nearby as well.

Proof. We denote the directional partitioning induced by u by I and the corresponding induced partitioning by P = P_I. We note that Procedure 1 applied to u precisely produces û = Q_P u, with the projection Q_P given by (47). We first consider the average ū of the components u_1, ..., u_S. The function value of û, which is piecewise constant w.r.t. P, is then obtained by û|_{P_i} = Σ_{x ∈ P_i} ū(x)/#P_i. Hence, we may estimate the deviation of û from u in terms of L, the maximal length of a path connecting any two pixels as given by Definition 2. As a worst-case estimate, we get L ≤ C_1, where we define C_1 as one fourth of the number of elements in Ω, i.e., C_1 = #Ω/4. This shows (74). For F_ρ given by (62), we have the chain of estimates (79). The first inequality holds since, as a local minimizer of the quadratic penalty relaxation (5), u is the global minimizer of F_ρ on A_P by Lemma 11, and since (û, ..., û) ∈ A_P by construction. The next inequalities apply the triangle inequality and estimates on matrix norms. The last inequality is a consequence of the fact that choosing u_s = 0 for all s = 1, ..., S would otherwise yield a lower functional value, which would contradict the minimality of u.
Now consider the partitioning P induced by û and the corresponding minimizer u*, i.e., (u*, ..., u*) = argmin_{(u,...,u) ∈ B_P} F_ρ(u, ..., u), where, for (u, ..., u) ∈ B_P, we have F_ρ(u, ..., u) = ‖Au − f‖_2^2. By Lemma 12, u* is a local minimizer of the Potts problem (2). On the other hand, by orthogonality in an inner product space, we have Au* = P_{A(B_P)} f, where P_{A(B_P)} denotes the orthogonal projection onto the image of B_P under the linear mapping A. Inserting u* into the estimate (79), we may further estimate the distance between û and u*. If now the operator A is lower bounded, the asserted bound on ‖û − u*‖ follows, which completes the proof.

Majorization-minimization for multivariate Potts problems
In this part we build the basis for the convergence analysis of Algorithm 1 and Algorithm 2.
We first recall some basics on surrogate functionals. We consider functionals F(u) of the form F(u) = ‖Xu − z‖^2 + γ J(u), where X is a given (measurement) matrix with operator norm ‖X‖ < 1 (formed w.r.t. the ℓ^2 norm), z is a given data vector, J is an arbitrary (not necessarily convex) lower semicontinuous functional, and γ > 0 is a parameter. In general, the surrogate functional F_surr(u, v) of F(u) is given by (87).

Lemma 16. Consider a functional F(u) = ‖Xu − z‖^2 + γ J(u) as above with ‖X‖ < 1. (For our purposes, J is the regularizer ‖D(u)‖_{0,ω} given by (10).) Then, for the associated surrogate functional F_surr given by (87) (with J as regularizer):
i. the inequality F_surr(u, v) ≥ F(u) holds for all v, and F_surr(u, v) = F(u) holds if and only if u = v;
ii. the functional values F(u^k) of the sequence u^k given by the surrogate iteration u^{k+1} = argmin_u F_surr(u, u^k) are non-increasing, i.e., F(u^{k+1}) ≤ F(u^k); (88)
iii. the distance between consecutive members of the surrogate sequence u^k converges to 0, i.e., lim_k ‖u^{k+1} − u^k‖ = 0.

We note that, when minimizing F, the condition ‖X‖ < 1 can always be achieved by rescaling, i.e., by dividing the functional F by a number larger than ‖X‖^2. Proofs of the general statements above on surrogate functionals (which do not rely on the specific structure of the problems considered here) may for instance be found in the above-mentioned papers [9,27,32].
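The monotonicity (88) of the surrogate iteration is easy to observe numerically. The sketch below uses the separable sparsity regularizer ‖u‖_0 as a stand-in (for which the backward step reduces to hard thresholding); in the paper, the regularizer is instead the non-separable directional jump penalty, and the backward step is the Potts subproblem. All names and parameters here are our own.

```python
import numpy as np

# Surrogate (majorization-minimization) iteration u^{k+1} = argmin F_surr(u, u^k)
# for F(u) = ||Xu - z||^2 + gamma * ||u||_0, with ||u||_0 as a stand-in regularizer.

rng = np.random.default_rng(0)
n, m, gamma = 20, 12, 0.05
X = rng.standard_normal((m, n))
X /= 1.1 * np.linalg.norm(X, 2)          # enforce ||X|| < 1 by rescaling
z = rng.standard_normal(m)

def F(u):
    return np.sum((X @ u - z) ** 2) + gamma * np.count_nonzero(u)

u = np.zeros(n)
values = [F(u)]
for _ in range(50):
    # Forward (gradient) step on the quadratic part; the surrogate then decouples.
    w = u + X.T @ (z - X @ u)
    # Backward step: hard thresholding solves argmin ||u - w||^2 + gamma*||u||_0.
    u = np.where(w ** 2 > gamma, w, 0.0)
    values.append(F(u))

# Monotonicity F(u^{k+1}) <= F(u^k) as in Lemma 16 (ii), up to tiny round-off.
assert all(values[k + 1] <= values[k] + 1e-9 for k in range(len(values) - 1))
```

With the Potts regularizer, the backward step is solved by the dynamic program discussed earlier instead of componentwise thresholding; the monotonicity argument is identical.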
We now employ properties of the quadratic penalty relaxation P_{γ,ρ}(u_1, ..., u_S) of the Potts functional given by (5). The strategy is similar to the authors' approach for the univariate case in [88]. We first show that the minimizers of P_{γ,ρ}(u_1, ..., u_S) (with B = id in (11)), which are precisely the solutions of (16), have a minimal directional jump height which depends only on the scale parameter γ, the directional weights ω_s, and the constant L_ρ, but not on the particular input data. Here, for a multivariate discrete function u = (u_1, ..., u_S) (and the directional system a_s, s = 1, ..., S), a directional jump is a jump of the sth component u_s in direction a_s for some s. In particular, jumps of u_s in directions a_{s'} with s' ≠ s are not considered.
Lemma 17. We consider the functional P_{γ,ρ}(u_1, ..., u_S) of (11) for the choice B = id and data h = (h_1, ..., h_S); in other words, we consider the problem (16) for arbitrary data h = (h_1, ..., h_S). Then there is a constant c > 0, independent of the minimizer u* = (u*_1, ..., u*_S) of (16) and of the data h, such that the minimal directional jump height j_min(u*) (w.r.t. the directional system a_s, s = 1, ..., S) of a minimizer u* fulfills j_min(u*) ≥ c. The constant c depends on γ, the directional weights ω_s, and the constant L_ρ.

Proof. Writing u = (u_1, ..., u_S), we restate (16) as the problem of minimizing P^{id}_{γ/L_ρ^2}, where we use the notation ‖D(u_1, ..., u_S)‖_{0,ω} = Σ_{s=1}^S ω_s ‖∇_{a_s} u_s‖_0 introduced in (10). We define the constant c accordingly, where W denotes the maximal length of the signal u per dimension (e.g., if u denotes an l × b image, then W = max(l, b)). We now assume that j_min(u*) < c, which means that the minimizer u* has a directional jump of height smaller than c. For such u*, we construct an element u with a smaller P^{id}_{γ/L_ρ^2} value, which yields a contradiction since u* is a minimizer of P^{id}_{γ/L_ρ^2}. To this end, we let a_s be a direction such that the component u*_s of u* has a jump of height smaller than c. We denote the (discrete) directional intervals in direction a_s adjacent to the directional jump by I_1, I_2 and the corresponding points next to the directional jump of u*_s by x_1 and x_2. We let m_1, m_2 and m be the means of h_s on I_1, I_2 and I_1 ∪ I_2, respectively. We define u by removing this jump. By construction, ‖∇u‖_0 = ‖∇u*‖_0 − 1, and thus we obtain the relation given by (17).

As introduced before, we use the symbol Ã to denote the block diagonal matrix with constant entry A on each diagonal component, and f for the block vector of corresponding dimensions with entry f in each component. With this notation, we may rewrite the iteration (97). Since u^(n) is piecewise constant w.r.t.
the directional partitioning I, we have u^(n) = P_I u^(n). Using this fact and the fact that P_I is an orthogonal projection, we obtain (100). Since C Ã^T f = 0, the iteration (100) can be interpreted as a Landweber iteration for the block matrix consisting of the upper block (Ã P_I)/(√S L_ρ) and the lower block (√ρ C P_I)/(√S L_ρ), with data f/(√S L_ρ) extended by 0. The Landweber iteration converges at a linear rate; cf., e.g., [30]. Thus, the iteration (97) converges and, in turn, we obtain the convergence of Algorithm 1 to some limit u* at a linear rate.
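For illustration, the Landweber iteration u^{n+1} = u^n + M^T(d − M u^n) can be sketched as follows; M and d are generic stand-ins for the scaled block matrix and the extended data above, chosen by us so that ‖M‖ < 1.

```python
import numpy as np

# Minimal Landweber sketch: gradient descent with unit step on ||Mu - d||^2.
M = np.array([[0.5, 0.1],
              [0.0, 0.4],
              [0.2, 0.3]])      # ||M|| < 1, so the iteration contracts
d = np.array([1.0, 2.0, 3.0])

u = np.zeros(2)
for _ in range(200):
    u = u + M.T @ (d - M @ u)   # Landweber step

# The iterates converge at a linear rate to the least-squares solution.
u_star, *_ = np.linalg.lstsq(M, d, rcond=None)
assert np.allclose(u, u_star, atol=1e-10)
```

The linear rate is governed by 1 − σ_min(M)^2, which is the mechanism behind the linear convergence claimed for Algorithm 1.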
(3) We show that u* is a local minimizer. Since u* is the limit of the iterates u^(n), the jumps of u* also have the minimal height c, the number of jumps equals that of the u^(n) for all n ≥ N, and the induced directional partitioning I* equals the partitioning I of the u^(n) for n ≥ N. Since u* equals the limit of the above Landweber iteration, u* minimizes F_ρ given by (62).

Estimating the distance between the objectives
The next lemma is a preparation for the proof of item (iii) of Theorem 4.
Lemma 19. We consider Algorithm 1 for the quadratic penalty relaxation (5) of the multivariate Potts problem. For any output u = (u_1, ..., u_S) of Algorithm 1, we have the bound (101) on the coupling term Σ_{s,s'} c_{s,s'} ‖u_s − u_{s'}‖^2. Here, σ_1 denotes the smallest nonzero eigenvalue of C^T C with C given by (49).
Proof. Since u = (u_1, ..., u_S) is the output of Algorithm 1, it is a local minimizer of the relaxed Potts problem (5). In particular, there is a directional partitioning I with respect to which u is piecewise constant. We denote the induced partitioning by P = P_I. By Lemma 14, we have a bound on ‖Cu‖ in terms of an arbitrary Lagrange multiplier μ* of (48). By Lemma 10, we have ‖μ*‖ ≤ 2 σ_1^{−1/2} S^{−1/2} ‖A‖ ‖f‖ for any partitioning of the discrete domain Ω, and in particular for the partitioning P = P_I. This shows the corresponding estimate. Since u is a local minimizer of the relaxed Potts problem (5), it is a minimizer of F_ρ on A_P by Lemma 11, and the second summand on the right-hand side equals zero. This shows (101) and completes the proof.
We have now gathered all information necessary to show Theorem 4.
Proof of Theorem 4. Part (i) is shown by Proposition 18. Concerning (ii), we first show that any global minimizer of the relaxed Potts functional P_{γ,ρ} given by (5) appears as a stationary point of Algorithm 1. To this end, we start Algorithm 1 with a global minimizer ū* = (u*_1, ..., u*_S) as initialization. Then we have (103) for all v = (v_1, ..., v_S) with v ≠ ū*. Here, B is given by (8). The estimate (103) means that ū* is the minimizer of the surrogate functional w.r.t. the first component, i.e., it is the minimizer of the mapping v ↦ P^surr_{γ,ρ}(v, ū*). Thus, the iterate ū^(1) = (u^(1)_1, ..., u^(1)_S) of Algorithm 1 equals ū* when the iteration is started with ū*, and so the global minimizer ū* is a stationary point of Algorithm 1. It remains to show that each stationary point of Algorithm 1 is a local minimizer of the relaxed Potts functional P_{γ,ρ}. This has essentially already been done in the proof of Proposition 18: start the iteration given by (16) with a stationary point u; its limit equals u and is thus a local minimizer by Proposition 18.
Concerning (iii), we use Lemma 19 to estimate the coupling term Σ_{s,s'} c_{s,s'} ‖u_s − u_{s'}‖^2. The second inequality follows by our choice of ρ in (34) as ρ > 2 ε^{−1} σ_1^{−1/2} S^{−1/2} ‖A‖ ‖f‖. This shows the validity of (iii) and completes the proof.

Convergence Analysis of Algorithm 2
We start by showing that Algorithm 2 is well-defined in the sense that the inner iteration governed by (25) terminates. This result was formulated as Theorem 6.
Proof of Theorem 6. We have to show that, for any k ∈ N, there is n ∈ N such that the criterion (105) holds. To see the right-hand side of (105), we notice that, by Proposition 18, the iteration (19) converges to a local minimizer of the quadratic penalty relaxation P_{γ,ρ}(u_1, ..., u_S) of the Potts functional. The inner loop of Algorithm 2 precisely computes the iteration (19) (for the penalty parameter ρ, which increases with k). Thus, the distance between consecutive iterates u^{(k,n)}_s, u^{(k,n−1)}_s converges to zero as n increases, which implies the validity of the right-hand side of (105) for sufficiently large n and all k ∈ N.
To see the left-hand inequality in (105), we notice that, by the considerations above, the inner loop of Algorithm 2 would converge to a minimizer ū^{(k),*} = (u^{(k),*}_1, ..., u^{(k),*}_S) if it were not terminated by (105), for all k ∈ N. Since ū^{(k),*} is a local minimizer of the relaxed Potts problem (5) for the parameter ρ_k, it is a minimizer of F_{ρ_k} on A_P (where P denotes the partitioning induced by ū^{(k),*}) by Lemma 11. Hence, for any k ∈ N and any ξ > 0, there is an iterate of the inner loop within distance ξ of ū^{(k),*}. Using this together with Lemma 14, we obtain the estimate (106), where μ* is an arbitrary Lagrange multiplier of (48); the last inequality is true since, by Lemma 10, the norm of μ* is bounded independently of the partitioning. The estimate (106) shows the left-hand inequality in (105) and completes the proof.
We have now gathered all information to prove Theorem 7, which deals with the convergence properties of Algorithm 2.
Proof of Theorem 7. We first show that any accumulation point of the sequence u^(k) produced by Algorithm 2 is a local minimizer of the Potts problem (3). Let u* be such an accumulation point and let I* be the directional partitioning induced by u*. We may extract a subsequence u^(k_l) of the sequence u^(k) such that u^(k_l) converges to u* as l → ∞, and such that the directional partitionings I_{k_l} induced by the u^(k_l) all equal the directional partitioning I*, i.e., I_{k_l} = I* for all l ∈ N. We define μ_{k_l} accordingly, with the matrix C given by (49), and estimate (108). We recall that Ã is the block diagonal matrix having the matrix A as entry in each diagonal component and that F_{ρ_{k_l}} is given by (62). We notice that the second-to-last inequality follows from the right-hand side of (105). We further estimate the norm of μ_{k_l}, which is a consequence of the left-hand side of (105). Hence, the sequence μ_{k_l} is bounded and thus has a cluster point, say μ*, by the Bolzano-Weierstraß theorem. By passing to a further subsequence (where we suppress the new indexation for better readability and still use the symbol l for the index), we get that μ_{k_l} converges to μ*. Now, on this subsequence, we have that u^(k_l) → u* and that μ_{k_l} → μ*. Hence, taking limits on both sides of (108) yields the Lagrange multiplier condition, since δ_{k_l} → 0 as l → ∞. Further, taking limits in the constraint estimate implies that the components of u* are equal, i.e., u*_s = u*_{s'} for all s, s'. In particular, u* is a feasible point for the Potts problem (3). Letting P* be the partitioning induced by u*, we have u* ∈ B_{P*}. Then (110) tells us that u* minimizes (48), which by Lemma 12 tells us that u* is a local minimizer of (3), or synonymously, that each component of u* (which are all equal) minimizes the Potts problem (2). This shows the first assertion of Theorem 7.
We continue by showing the second assertion of Theorem 7, i.e., that if A is lower bounded, then the sequence u^(k) produced by Algorithm 2 has a cluster point; by the above considerations, each cluster point is then a local minimizer, which shows the assertion. To this end, we show that, if A is lower bounded, the sequence u^(k) produced by Algorithm 2 is bounded, which by the Heine-Borel property of finite-dimensional Euclidean space implies that it has a cluster point. So we assume that A is lower bounded and consider the sequence u^(k) = (u^(k)_1, ..., u^(k)_S) produced by Algorithm 2. As in the proof of Theorem 6, we see that, for any k ∈ N, there is a local minimizer u^{(k),*} = (u^{(k),*}_1, ..., u^{(k),*}_S) of (5) whose distance to u^(k) is bounded by (112), where C_2 is a constant independent of k. By Lemma 11, u^{(k),*} is a minimizer of F_ρ on A_P (where P denotes the partitioning induced by u^{(k),*}). Hence, a bound on the functional value follows by choosing the function having the zero function as entry in each component as a candidate; this implies (113). Then, since A is lower bounded, there is a constant c > 0 such that the corresponding bound on ‖u^{(k),*}‖ holds, where we used (113) for the last inequality. Combining this estimate with (112) yields (115). Since we have chosen δ_k as a sequence converging to zero, (115) shows that the sequence u^(k) is bounded, which implies that it has cluster points. This completes the proof.

Application to blurred data. For the following experiments, we focus on Algorithm 2.
In case of motion blur, we set the step-size parameter to λ = 0.25, while for Gaussian blur we set λ = 0.35 as in Figure 1. We compare our method with the Ambrosio-Tortorelli approximation [2] of the classical Mumford-Shah model (which itself tends to the piecewise constant Mumford-Shah model for an increasing variation penalty), given by (116). The variable v serves as an edge indicator, and ε > 0 is an edge-smoothing parameter that is chosen empirically. The parameter γ > 0 controls the weight of the edge-length penalty, and the parameter α > 0 penalizes the variation; in this respect, a higher value of α promotes solutions which are closer to being piecewise constant, and in the limit α → ∞, minimizers of (116) are piecewise constant. Our implementation follows the scheme presented in [6]. The functional A_ε is alternately minimized w.r.t. u and v. To this end, we iteratively solve the Euler-Lagrange equations (117), where K̃(x) = K(−x). The first line is solved w.r.t. v using a MINRES solver, and the second line is solved using the method of conjugate gradients [6]. The iterations are stopped when the relative change of both variables is small, i.e., if both ‖u^{k+1} − u^k‖/(‖u^k‖ + 10^{−6}) < 10^{−3} and ‖v^{k+1} − v^k‖/(‖v^k‖ + 10^{−6}) < 10^{−3}. Figure 2 shows the restoration of a traffic sign from simulated horizontal motion blur. For the Ambrosio-Tortorelli approximation, we set α = 10^5 to promote a piecewise constant solution. We observe that both the Ambrosio-Tortorelli approximation and the proposed method restore the data to a human-readable form. However, the Ambrosio-Tortorelli result shows clutter and blur artifacts, whereas our method provides sharp edges and produces fewer artifacts.
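The stopping test for the alternating scheme can be written as a small helper (the name is our own), mirroring the relative-change criterion described above:

```python
import numpy as np

def small_relative_change(new, old, tol=1e-3, eps=1e-6):
    """Relative-change stopping test: ||x_new - x_old|| / (||x_old|| + eps) < tol.

    Mirrors the criterion used for both u and v in the alternating scheme;
    the helper name is our own.
    """
    return np.linalg.norm(new - old) / (np.linalg.norm(old) + eps) < tol
```

The alternating minimization stops once this test holds simultaneously for u and v.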
In Figure 3 we partition a natural image blurred by a Gaussian kernel and corrupted by Gaussian noise. We observed that the Ambrosio-Tortorelli result was heavily corrupted by artifacts for α = 10^5. This might be attributed to the underlying linear systems in scheme (117), which become severely ill-conditioned for large values of the variation penalty α. Therefore, we chose the moderate variation penalty α = 5, which only provides an approximately piecewise constant (rather, piecewise smooth) result. This result does not fully separate the background from the fish in terms of edges. The result of the proposed method, on the other hand, sharply differentiates between background and fish; further, it highlights various segments of the fish.
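The ill-conditioning for large α can be illustrated on a simplified surrogate: a 1D system matrix of the form I + αL, with L a discrete Laplacian. This toy system is an assumption for illustration only and is not the actual system arising in scheme (117); its condition number nevertheless shows the same roughly linear growth in α.

```python
import numpy as np

def laplacian_1d(n):
    """1D discrete Laplacian with Neumann-type boundary rows."""
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0  # boundary adjustment
    return L

n = 50
L = laplacian_1d(n)
for alpha in (5.0, 1e5):
    # Condition number of the surrogate system matrix I + alpha * L
    print(alpha, np.linalg.cond(np.eye(n) + alpha * L))
```

Since L has eigenvalue 0 (constant vectors) and a largest eigenvalue near 4, the condition number of I + αL behaves like 1 + 4α, which explains the severe ill-conditioning at α = 10^5 compared to α = 5.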
Reconstruction from Radon data. We here consider reconstruction from Radon data, which appears for instance in computed tomography. We recall that the Radon transform reads Ru(s, θ) = ∫_R u(sθ + tθ^⊥) dt, where s ∈ R, θ ∈ S^1, and θ^⊥ ∈ S^1 is (counterclockwise) perpendicular to θ; see [62]. For our experiments, we use a discretization of the Radon transform created using the AIR tools software package [38]. Regarding our method, we employed coupling of consecutive splitting variables.
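As a toy illustration of what a discrete Radon operator computes (the experiments use the AIR tools discretization, which is not reproduced here), the following sketch evaluates line-integral sums for the two axis-aligned directions only:

```python
import numpy as np

def radon_axis_aligned(u):
    """Crude discrete Radon transform restricted to two angles:
    theta = 0 (sum along image rows, i.e., over the first axis) and
    theta = 90 degrees (sum along image columns, i.e., over the second axis).
    A full discretization would cover many angles and oblique rays."""
    return np.stack([u.sum(axis=0), u.sum(axis=1)])

u = np.zeros((4, 4))
u[1:3, 1:3] = 1.0            # a small constant square as a phantom
sino = radon_axis_aligned(u)  # a two-row "sinogram"
```

Each row of `sino` is one projection; note that both projections preserve the total mass of the phantom, a basic consistency property of the Radon transform.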

Conclusion
In this paper, we have proposed a new iterative minimization strategy for multivariate piecewise constant Mumford-Shah/Potts functionals as well as for their quadratic penalty relaxations. Our schemes are based on majorization-minimization or forward-backward splitting methods of Douglas-Rachford type [55]. In contrast to the approaches in [9,32,58,59] for sparsity problems, which lead to thresholding algorithms, our approach leads to non-separable yet computationally tractable problems in the backward step. As a second part, we have provided a convergence analysis for the proposed algorithms. For the proposed quadratic penalty relaxation scheme, we have shown convergence towards a local minimizer. Due to the NP-hardness of the quadratic penalty relaxation, this convergence result is the best that can be expected. Concerning the scheme for the non-relaxed Potts problem, we have also performed a convergence analysis; in particular, we have obtained results on the convergence towards local minimizers on subsequences. The quality of these results is comparable with those of [58,59], where, compared with these papers, we had to deal with the non-separability of the backward step as an additional challenge.
Finally, we have shown the applicability of our schemes in several experiments. We have applied our algorithms to deconvolution problems, including the deblurring and denoising of motion-blurred images. We have further dealt with noisy and undersampled Radon data for the task of joint reconstruction, denoising and segmentation. In addition, we have applied our approach to pure image partitioning (without blur), which is a widely considered problem in computer vision.

Theorem 4.
We consider the iterative Potts minimization Algorithm 1 for the quadratic penalty relaxation (5) of the multivariate Potts problem. i. Algorithm 1 computes a local minimizer of the quadratic penalty relaxation (5) of the multivariate Potts problem for any starting point; the convergence rate is linear. ii. We have the following relation between the local minimizers L, the global minimizers G, and the fixed points Fix(I) of the iteration of Algorithm 1:

Theorem 5.
We consider the iterative Potts minimization Algorithm 1 for the quadratic penalty relaxation (5) in connection with the (non-relaxed) Potts problem (3).

i. Algorithm 1 produces an approximate solution, in the sense of (35), of the Potts problem (3).
is a weight vector, ω∇û denotes pointwise multiplication, and B̂, ĝ are the matrix and the vector corresponding to B, g w.r.t. the serialization. The weight vector may be zero, which in particular happens at the line breaks, i.e., at those indices where two vectors have been concatenated in the above procedure. More precisely, constant data induce a directional segmentation on Ω, and the image of the directional segmentation under the above serialization procedure induces a partitioning of the univariate domain D; precisely between these segments, the weight vector equals zero. Now, for each segment [d_1, . . ., d_r] in D, we transform the basis δ_{d_1}, . . ., δ_{d_r} to the basis δ_{d_2} − δ_{d_1}, . . ., δ_{d_r} − δ_{d_{r−1}}, (1/r) ∑_{l=1}^r δ_{d_l} obtained by neighboring differences and the average. As a result (which is elaborated in detail in [82]), we obtain a problem of the form P̃_{γ,ρ}(ũ) = ‖B̃ũ − b̃‖_2^2 + γ‖ωũ‖_0 on A_{P_I}. Then, by Lemma 11, u* is a local minimizer of the relaxed Potts functional P_{γ,ρ}, which completes the proof. After having shown the convergence of Algorithm 1 to a local minimizer, we have now gathered all the information needed to prove Theorem 5. Proof of Theorem 5. Assertion i. was stated and shown as Proposition 13 in Section 3.3. By Proposition 18, Algorithm 1 produces a local minimizer. Assertion ii. is then a consequence of Proposition 15.
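The basis change on a single segment [d_1, . . ., d_r] can be sketched in a few lines. The matrix name `T` and the Python framing are illustrative only and not part of the detailed construction in [82]: the first r − 1 rows encode the neighboring differences and the last row encodes the average.

```python
import numpy as np

def segment_basis_change(r):
    """Matrix T mapping coefficients in the canonical basis delta_{d_1..d_r}
    to coefficients w.r.t. the neighboring differences
    delta_{d_{l+1}} - delta_{d_l} (l = 1, ..., r-1) and the average (1/r) sum_l delta_{d_l}."""
    T = np.zeros((r, r))
    for l in range(r - 1):
        T[l, l] = -1.0      # -delta_{d_l}
        T[l, l + 1] = 1.0   # +delta_{d_{l+1}}
    T[r - 1, :] = 1.0 / r   # the averaging row
    return T

T = segment_basis_change(4)
c = np.full(4, 3.0)  # constant data on the segment
coeffs = T @ c       # all difference coefficients vanish; last entry is the mean
```

On constant data all difference coefficients vanish and only the average survives, which is exactly why this basis isolates the jump penalty from the segment means.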

Figure 1: Results of Algorithm 1 and Algorithm 2 for partitioning an image blurred by a Gaussian kernel of standard deviation 3. Both methods provide reasonable partitionings. Algorithm 2 provides smoother edges than Algorithm 1 (e.g., at the boundary between the meadow and the forest, and at the back of the cow). In addition, Algorithm 1 produces some smaller segments around the treetops.

Figure 2: Restoration from simulated horizontal motion blur of 80 pixel length and Gaussian noise with σ = 0.02. The result of the Ambrosio-Tortorelli scheme exhibits noisy and blurred artifacts and bumpy edges (e.g., at the boundaries of the digits). The contours of the proposed result are concise, and considerably less clutter is present.

Figure 3: Partitioning of an image blurred by a Gaussian kernel of standard deviation 7 and corrupted by Gaussian noise with σ = 0.2. The result of the Ambrosio-Tortorelli approximation does not yield a convincing partitioning of the scene; in particular, many parts of the fish are merged with the background. The proposed approach provides a partitioning which reflects many parts of the fish.

Figure 5: Comparison of partitionings of a natural image corrupted by Gaussian noise with σ = 0.2. The Ambrosio-Tortorelli result is noisy and corrupted by clutter. The L0 gradient smoothing over-segments the large window on the left-hand side, while details of the cross in the bottom right are smoothed out. The proposed result is visually competitive with the state-of-the-art graph cuts result.

[…] u_s equals m_1 on I_1 and m_2 on I_2. Further, as ũ_{s′} = u*_{s′} if s′ ≠ s, and u*_s and ũ_s only differ on I_1 ∪ I_2, we have that […]