1 Introduction

Problems involving reconstruction tasks for functions with discontinuities appear in various biological and medical applications. Examples are the steps in the rotation of the bacterial flagellar motor [70, 78, 79], the cross-hybridization of DNA [30, 44, 77], X-ray tomography [74], electron tomography [49] and SPECT [51, 93]. An engineering example is crack detection in brittle material in mechanics [3]. Further examples can be found, for instance, in the papers [22, 25, 34, 58, 59] and the references therein. In general, signals with discontinuities appear in many applied problems. A central task is to restore the jumps, edges, change points or segments of the signals or images from the observed data. These observed data are usually measured indirectly. Furthermore, they consist of measurements on a discretized grid and are typically corrupted by noise.

In many scenarios, non-convex nonsmooth variational methods are a suitable choice for the partitioning task, i.e., the task of finding the jumps/edges/change points; see for example [13, 58, 70]. In particular, methods based on piecewise constant Mumford–Shah functionals [62, 63] have been used in various different applications. The piecewise constant Mumford–Shah model also appears in statistics and image processing where it is often called Potts model [13,14,15, 36, 72, 91]; this is a tribute to Renfrey B. Potts and his work in statistical mechanics [73]. The variational formulation of the piecewise constant Mumford–Shah/Potts model (with an indirect measurement term) is given by

$$\begin{aligned} \textstyle {\text {argmin}}_u \ \gamma \, \Vert \nabla u\Vert _{0} + \left\Vert A u - f \right\Vert _{2}^2. \end{aligned}$$
(1)

Here, A is a linear operator modeling the measurement process, e.g., the Radon transform in computed tomography (CT), or the point-spread function of the microscope in microscopy. Further, f is an element of the data space, e.g., a sinogram or part of it in CT, or the blurred microscopy image in microscopy. The mathematically precise definition of the jump term \(\Vert \nabla u \Vert _{0}\) in the general situation is rather technical. However, if u is piecewise constant and the discontinuity set of u is sufficiently regular, say, a union of \(C^1\) curves, then \(\Vert \nabla u\Vert _{0}\) is just the total arc length of this union. In general, the gradient \(\nabla u\) is given in the distributional sense and the boundary length is expressed in terms of the \((d-1)\)-dimensional Hausdorff measure. When u is not piecewise constant, the jump penalty is infinite [75]. The second term measures the fidelity of a solution u to the data f. The parameter \(\gamma > 0\) controls the balance between data fidelity and jump penalty. (A wider class of Mumford–Shah models can be obtained by replacing the squared \(L^2\) distance by more general data terms such as other norm-based expressions or divergences.)

The piecewise constant Mumford–Shah/Potts model can be interpreted in two ways. On the one hand, if the imaged object is (approximately) piecewise constant, then the solution is an (approximate) reconstruction of the imaged object. On the other hand, since a piecewise constant solution directly induces a partitioning of the image domain, it can be seen as joint reconstruction and segmentation. Executing reconstruction and segmentation jointly typically leads to better results than performing the two steps successively [51, 74, 75, 85]. We note that in order to deal with the discrete data, the energy functional is typically discretized; see Sect. 2.1. Some references concerning Mumford–Shah functionals are [2, 8, 18, 33, 45, 67, 75] and also the references therein; see also the book [1]. The piecewise constant Mumford–Shah functionals are perhaps among the most well-known representatives of the class of free-discontinuity problems introduced by De Giorgi [29].

The analysis of the nonsmooth and non-convex problem (1) is rather involved. We discuss some analytic aspects. We first note that without additional assumptions the existence of minimizers of (1) is not guaranteed in a continuous domain setting [32, 33, 75, 84]. To ensure the existence of minimizers, additional penalty terms such as an \(L^p\) (\(1<p<\infty \)) term of the form \(\Vert u\Vert _p^p\) [74, 75] or pointwise boundedness constraints [45] have been considered. We note that the existence of minimizers is guaranteed in the discrete domain setup for typical discretizations [33, 84]. Another important topic is to verify that the Potts model is a regularization method in the sense of inverse problems. The first work dealing with this task is [75]: The authors assume that the solution space consists of non-degenerate piecewise constant functions with at most k (arbitrary, but fixed) different values which are additionally bounded. Under relatively mild assumptions on the operator A, they show stability. Further, by giving a suitable parameter choice rule, they show that the method is a regularizer in the sense of inverse problems. Related references are [45, 50] with the latter including (non-piecewise constant) Mumford–Shah functionals. We note that Mumford–Shah approaches (including the piecewise constant Mumford–Shah variant) also regularize the boundaries of the discontinuity set of the underlying signal [45].

Solving the Potts problem is algorithmically challenging. For \(A = \mathrm {id},\) it is NP-hard for multivariate domains [13, 87], and, for general linear operators A,  it is even NP-hard for univariate signals [84]. Thus, finding a global minimizer within reasonable time seems to be unrealistic in general. Nevertheless, due to its importance, many approximate strategies for multivariate Potts problems with \(A = \mathrm {id}\) have been proposed. (We note that the case \(A = \mathrm {id}\) is important as well since it captures the partitioning problem in image processing.) For the Potts problem with a general operator A, considerably fewer approaches exist, in particular in the multivariate situation. For a more detailed discussion, we refer to the paragraph on algorithms for piecewise constant Mumford–Shah problems below. A further discussion of methods for reconstructing piecewise constant signals may be found in [59]. In [90], we have considered the univariate Potts problem for a general operator A and have proposed a majorization–minimization strategy which we called iterative Potts minimization in analogy to iterative thresholding schemes. In this work, we develop iterative Potts minimization schemes for the more demanding multivariate situation, which is important for applications in imaging.

Existing Algorithmic Approaches to the Piecewise Constant Mumford–Shah Problem and Related Problems We first consider the Potts problem for a general operator A. In [5], Bar et al. consider an Ambrosio–Tortorelli-type approximation. Kim et al. use a level-set-based active contour method for deconvolution in [48]. Ramlau and Ring [74] employ a related level-set approach for the joint reconstruction and segmentation of X-ray tomographic images; further applications are electron tomography [49] and SPECT [51]. The authors of the present paper have proposed a strategy based on the alternating direction method of multipliers in [84] for the univariate case and in [85] for the multivariate case.

Fornasier and Ward [33] rewrite Mumford–Shah problems as a pointwise penalized problem and derive generalized iterative thresholding algorithms for the rewritten problems in the univariate situation. Further, they show that their method converges to a local minimizer in the univariate case. Their approach principally carries over to the piecewise constant Mumford–Shah functional as explained in [84, 90] and then results in an \(\ell ^0\) sparsity problem. In the univariate situation, this NP-hard optimization problem is unconstrained and may be addressed by iterative hard thresholding algorithms for \(\ell ^0\) penalizations, analyzed by Blumensath and Davies in [9, 10]. (Note that related algorithms based on iterative soft thresholding for \(\ell ^1\) penalized problems have been considered by Daubechies, Defrise and De Mol in [28].) Artina et al. [3] in particular consider the multivariate discrete Mumford–Shah model using the pointwise penalization approach of [33]. In the multivariate setting, this results in a corresponding non-convex and nonsmooth problem with linear constraints. The authors successively minimize local quadratic and strictly convex perturbations (depending on the previous iterate) of a (fixed) smoothed version of the objective by augmented Lagrangian iterations which themselves can be accomplished by iterative thresholding via a Lipschitz continuous thresholding function. They show that the accumulation points of the sequences produced by their algorithm are constrained critical points of the smoothed problem. In the multivariate situation, a similar approach for rewriting the Potts problem results in an \(\ell ^0\) sparsity problem with additional equality constraints. Algorithmic approaches for such \(\ell ^0\) sparsity problems with equality constraints are the penalty decomposition methods of [60, 61, 96]. The connection with iterative hard thresholding is that the inner loop of the employed two-stage process usually is of iterative hard thresholding type. The difference between the hard-thresholding-based methods and our approach in this paper is that we do not have to deal with constraints and the full matrix A, but with the nonseparable regularizing term \(\Vert \nabla u \Vert _0\) instead of its separable counterpart \(\Vert u\Vert _0.\) Hence, we cannot use hard thresholding.

Another frequently used method in the context of the restoration of piecewise constant images is total variation minimization [76]. There, the jump penalty \(\Vert \nabla u \Vert _0\) is replaced by the total variation \(\Vert \nabla u \Vert _1.\) The arising minimization problem is convex and therefore numerically tractable with convex optimization techniques [21, 26]. Candès, Wakin and Boyd [17] use iteratively reweighted total variation minimization for piecewise constant recovery problems. Results of compressed sensing type related to the Potts problem have been derived by Needell and Ward [65, 66]: under certain conditions, minimizers of the Potts function agree with total variation minimizers. However, in the presence of noise, total variation minimizers might differ significantly from minimizers of the Potts problem, and the latter are frequently the results desired in practice. Further, algorithms based on convex relaxations of the Potts problem (1) have gained a lot of interest in recent years; see, e.g., [4, 16, 20, 37, 56, 86].

We next discuss approaches for the multivariate Potts problem in the situation \(A = \mathrm {id},\) which is particularly interesting in image processing and for which there are some further approaches. A first class of approaches is based on graph cuts. Here, the range space of u is a priori restricted to a relatively small number of values. The problem remains NP-hard, but it then allows for an approach by sequentially solving binary partitioning problems via minimal graph cut algorithms [12, 13, 52]. We point out that this approach can also deal with (possibly non-convex) data fidelity terms more general than the squared \(L^2\) data term employed in (1) (in the case \(A = \mathrm {id}\)). Another approach is to limit the number k of different values which u may take without discretizing the range space a priori. For \(k=2,\) active contours were used by Chan and Vese [24] to minimize the corresponding binary Potts model. They use a level-set function to represent the partitions which evolves according to the Euler–Lagrange equations of the Potts model. A globally convergent strategy for the binary segmentation problem is presented in [23]. The active contour method for \(k=2\) was extended to larger k in [88]. Note that for \(k >2\) the problem is NP-hard. We refer to [27] for an overview of level-set segmentation. In [40,41,42], Hirschmüller proposes a non-iterative strategy for the Potts problem which is based on cost aggregation. It has lower computational cost, but comes with lower-quality reconstructions compared with graph cuts. Due to the small number of potential values of u, these methods mainly appear in connection with image segmentation. Methods for restoring piecewise constant images without restricting the range space are proposed by Nikolova et al. [68, 69]. They use non-convex regularizers which are algorithmically approached using a graduated non-convexity approach. We note that the Potts problem (1) does not fall into the class of problems considered in [68, 69]. Last but not least, Xu et al. [94] proposed a piecewise constant model reminiscent of the Potts model that is approached by a half-quadratic splitting using a pixelwise iterative thresholding type technique. It was later extended to a method for blind image deconvolution [95].

Contributions The contributions of this paper are threefold: (i) We propose a new iterative minimization strategy for multivariate piecewise constant Mumford–Shah/Potts objective functions as well as a (still NP-hard) quadratic penalty relaxation. (ii) We provide a convergence analysis of the proposed schemes. (iii) We show the applicability of our schemes in several experiments.

Concerning (i), we propose two schemes which are based on majorization–minimization or forward–backward splitting methods of Douglas–Rachford type [57]. One scheme addresses the Potts problem directly, whereas the other treats a quadratic penalty relaxation. The solutions of the relaxed problem are themselves not feasible for the Potts problem, but they are close to a feasible solution of the Potts problem, where the closeness can be quantified. In particular, the relaxed scheme is applicable whenever a given tolerance is acceptable in the application. In contrast to the approaches in [9, 33] and [60, 61] for sparsity problems which lead to thresholding algorithms, our approach leads to non-separable yet computationally tractable problems in the backward step.

Concerning (ii), we first analyze the proposed quadratic penalty relaxation scheme. In particular, we show convergence toward a local minimizer. Due to the NP-hardness of the quadratic penalty relaxation, this convergence result is essentially the best that can be expected. Concerning the scheme for the non-relaxed Potts problem, we also perform a convergence analysis. In particular, we obtain results on the convergence toward local minimizers on subsequences. The quality of the convergence results is comparable with those in [60, 61]. We note that, compared with [60, 61], we face the additional challenge of dealing with the non-separability of the backward step. (We note that in practice we observe convergence of the whole sequence, not only on a subsequence.)

Concerning (iii), we consider problems with full and partial data. We begin by applying our algorithms to deconvolution problems; in particular, we consider the joint deblurring and denoising of images degraded by Gaussian blur and by motion blur, respectively. We further consider noisy and undersampled Radon data, together with the task of joint reconstruction, denoising and segmentation. Finally, we use our method in the situation of pure image partitioning (without blur) which is a widely considered problem in computer vision.

Organization of the Paper In Sect. 2, we derive the proposed algorithmic schemes. In Sect. 3, we provide a convergence analysis for the proposed schemes. In Sect. 4, we apply the algorithms derived in the present paper to concrete reconstruction problems. In Sect. 5, we draw conclusions.

2 Majorization–Minimization Algorithms for Multivariate Potts Problems

2.1 Discretization

We use the following finite-difference-type discretization of the multivariate Potts problem (1):

$$\begin{aligned} P_\gamma (u) = \Vert Au - f\Vert _2^2 + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u \, \right\| _0, \end{aligned}$$
(2)

where the \(a_s \in {\mathbb {Z}}^2\) come from a finite set of directions and the symbol \(\nabla _{a_s} u \, (i,j)\) denotes the directional difference \(u_{(i,j)+a_s} - u_{i,j}\) with respect to the direction \(a_s\) at the pixel \((i,j)\). The symbol \(\Vert \nabla _{a_s} u \Vert _0 \) denotes the number of nonzero entries of \(\nabla _{a_s} u.\) The simplest set of directions consists of the unit vectors \(a_1=(0,1),\) \(a_2=(1,0)\) along with unit weights. Unfortunately, when refining the grid, this discretization converges to a limit that measures the boundary in terms of the \(\ell ^1\) analogue of the Hausdorff measure [18]. The practical consequences are unwanted block artifacts in the reconstruction (geometric staircasing). More isotropic results are obtained by adding the diagonals \(a_3=(1,1),a_4=(1,-1)\) to the directions \(a_1\) and \(a_2;\) a near isotropic discretization can be achieved by extending this system by the knight moves \(a_5=(1,2),a_6=(2,1),a_7=(1,-2),a_8=(2,-1).\) (The name is inspired by the possible moves of a knight in chess.) Weights \(\omega _s\) for the system \(\{a_1,a_2,a_3,a_4\}\) of coordinate directions and diagonal directions can be chosen as \(\omega _{s} = \sqrt{2}-1 \) for the coordinate part \(s=1,2\) and \(\omega _{s} = 1-\tfrac{\sqrt{2}}{2}\) for the diagonal part \(s=3,4\). When additionally adding knight-move directions, weights \(\omega _s\) for the system \(\{a_1,\ldots ,a_8\}\) can be chosen as \(\omega _{s} = \sqrt{5} - 2\) for the coordinate part \(s=1,2,\) \(\omega _{s} = \sqrt{5} - \frac{3}{2}\sqrt{2}\) for the diagonal part \(s=3,4\), and \(\omega _{s} = \frac{1}{2}(1 + \sqrt{2} - \sqrt{5}) \) for the knight-move part \(s=5,\ldots ,8.\) There are several ways to derive weights \(\omega _s\) for the neighborhood systems: the method of [19] is based on an optimization approach, the method of [11] is based on the Cauchy–Crofton formula, and the approach of [85] is based on equating the Euclidean lengths of straight lines and the lengths of their digital counterparts. We note that for the system \(\{a_1,a_2,a_3,a_4\}\) of coordinate directions and diagonal directions the weights of [19] and of [85] coincide; the weights displayed for the knight-move case above are the ones derived by the scheme in [85]. For further details, we refer to these references.
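For illustration, the discrete energy (2) for the near isotropic neighborhood system with the weights of [85] stated above can be evaluated as in the following Python sketch; here, apply_A is a placeholder for the forward operator (e.g., a blurring operator or a discretized Radon transform), which is assumed to be supplied by the user.

```python
import numpy as np

# Directions a_1, ..., a_8 (coordinate, diagonal, knight moves) and the weights of [85].
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1), (1, 2), (2, 1), (1, -2), (2, -1)]
WEIGHTS = ([np.sqrt(5) - 2] * 2
           + [np.sqrt(5) - 1.5 * np.sqrt(2)] * 2
           + [0.5 * (1 + np.sqrt(2) - np.sqrt(5))] * 4)

def directional_jumps(u, a):
    """Number of nonzero directional differences u_{(i,j)+a} - u_{(i,j)} inside the domain."""
    da, db = a
    h, w = u.shape
    diff = (u[max(da, 0):h + min(da, 0), max(db, 0):w + min(db, 0)]
            - u[max(-da, 0):h - max(da, 0), max(-db, 0):w - max(db, 0)])
    return int(np.count_nonzero(diff))

def potts_energy(u, f, gamma, apply_A):
    """Discrete Potts energy (2): squared data term plus weighted jump penalty."""
    data_term = np.sum((apply_A(u) - f) ** 2)
    jump_term = sum(w_s * directional_jumps(u, a_s)
                    for a_s, w_s in zip(DIRECTIONS, WEIGHTS))
    return data_term + gamma * jump_term
```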

We record that the considered problem (2) has a minimizer.

Theorem 1

The discrete multivariate Potts problem (2) has a minimizer.

The validity of Theorem 1 can be seen by following the lines of the proof of [43, Theorem 2.1] where an analogous statement is shown for the (non-piecewise constant) Mumford–Shah problem.

Vector-Valued Images We briefly discuss the extension of (2) to vector-valued images and multi-channel data, e.g., (blurred) RGB color images. To this end, we assume multi-channel data \(f = (f_1,\ldots ,f_C)\) consisting of C channels and images \(u = (u_1,\ldots ,u_C)\). In this situation, the role of the first summand on the right-hand side of (2) is taken by the channel-wise sum \(\sum _{c=1}^C \Vert Au_c - f_c\Vert _2^2\). The symbol \(\nabla _{a_s} u(i,j)\) now denotes the vector of directional differences with entries \(u_{(i,j)+a_s,c} - u_{i,j,c}\), \(c=1,\ldots ,C,\) and these vectors form the rows of \(\nabla _{a_s} u\). Consequently, \(\Vert \nabla _{a_s} u \Vert _0\) denotes the number of nonzero rows of \(\nabla _{a_s}u\). As a result, introducing a jump between two pixels in all channels has the same cost as opening a jump in a single channel only. This promotes jumps that are aligned across the channels, in contrast to a channel-wise application of the single-channel Potts model (2).
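A corresponding sketch of the channel-coupled jump count (the number of nonzero rows of the directional difference) reads as follows; the image u is assumed to be stored as an array of shape (H, W, C).

```python
import numpy as np

def vector_jumps(u, a):
    """Channel-coupled jump count: a jump between two pixels is counted once,
    regardless of how many of the C channels actually jump there."""
    da, db = a
    h, w = u.shape[:2]
    diff = (u[max(da, 0):h + min(da, 0), max(db, 0):w + min(db, 0), :]
            - u[max(-da, 0):h - max(da, 0), max(-db, 0):w - max(db, 0), :])
    return int(np.count_nonzero(np.any(diff != 0, axis=-1)))
```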

2.2 Derivation of the Proposed Algorithmic Schemes

We start out with the discretization (2) of the multivariate Potts problem. We introduce S versions \(u_1,\ldots ,u_S\) of the target u and link them via equality constraints in the following consensus form to obtain the problem

$$\begin{aligned} P_\gamma (u_1,\ldots ,u_S) \rightarrow \min , \qquad \text { s.t. } \quad u_1 = \ldots = u_S, \end{aligned}$$
(3)

where the function \(P_\gamma (u_1,\ldots ,u_S)\) of the S variables \(u_1,\ldots ,u_S\) is given by

$$\begin{aligned} P_\gamma (u_1,\ldots ,u_S) = \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0. \end{aligned}$$
(4)

Note that solving (3) is equivalent to solving the discrete Potts problem (2). Further, note that we have overloaded the symbol \(P_\gamma \) which, for one argument u,  denotes the Potts function of (2) and for S arguments \(u_1,\ldots ,u_S\) denotes the energy function of (4); we have the relation \(P_\gamma (u,\ldots ,u)= P_\gamma (u)\).

A Majorization–Minimization Approach to the Quadratic Penalty Relaxation of the Potts Problem The quadratic penalty relaxation of (4) is given by

$$\begin{aligned} P_{\gamma , \rho }(u_1,\ldots ,u_S)= & {} \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0 \nonumber \\&+ \rho \sum _{1\le s < s' \le S} c_{s,s'} \ \Vert u_s - u_{s'}\Vert _2^2. \end{aligned}$$
(5)

Here, the soft constraints which replace the equalities \(u_1 = \ldots = u_S\) are realized via the squared Euclidean norms \(\sum _{1\le s < s' \le S} c_{s,s'} \ \Vert u_s - u_{s'}\Vert _2^2,\) where the nonnegative numbers \(c_{s,s'}\) denote weights (which may be set to zero if no direct coupling between the particular \(u_s, u_{s'}\) is desired). The symbol \(\rho \) denotes a positive penalty parameter enforcing the soft constraints, i.e., increasing \(\rho \) forces the \(u_s\) to be closer to each other w.r.t. the Euclidean distance. We note that we later analytically quantify the size of \(\rho \) which is necessary to obtain an a priori prescribed tolerance in the \(u_s;\) see (18). Frequently, we use the short-hand notation

$$\begin{aligned} \rho _{s,s'}= \rho \ c_{s,s'}. \end{aligned}$$
(6)

Typical choices of the \(\rho _{s,s'}\) are

$$\begin{aligned} \rho _{s,s'} = \rho \quad \text { for all } s,s', \qquad \qquad \text { or} \quad \rho _{s,s'} = \rho \ \delta _{((s+1)\mathrm{\,mod \,} S ), s'}\,, \end{aligned}$$
(7)

i.e., the constant choice (\(c_{s,s'}=1\)), as well as the coupling between consecutive variables with constant parameter (\(\delta _{s,t} =1\) if and only if \(s=t,\) and \(\delta _{s,t} =0\) otherwise). We note that in these situations only one additional positive parameter \(\rho \) appears, and that this parameter is tied to the tolerance one is willing to accept as a distance between the \(u_s;\) see Algorithm 1.
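The two choices in (7) may, for instance, be encoded as follows (the function name coupling_weights is merely illustrative; only the upper triangle \(s < s'\) enters the sums above).

```python
import numpy as np

def coupling_weights(S, mode="all_pairs"):
    """Weights c_{s,s'} for the two typical choices in (7): all-pairs coupling
    or cyclic coupling of consecutive variables."""
    c = np.zeros((S, S))
    for s in range(S):
        for t in range(s + 1, S):
            if mode == "all_pairs":
                c[s, t] = 1.0
            else:  # cyclic: couple u_s with u_{(s+1) mod S}
                c[s, t] = 1.0 if (t == s + 1) or (s == 0 and t == S - 1) else 0.0
    return c
```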

For the majorization–minimization approach, we derive a surrogate functional [28] of the function \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) of (5). For this purpose, we introduce the block matrix B and the vector g given by

$$\begin{aligned} B = \begin{pmatrix} S^{-1/2}A &{} 0 &{} &{} \cdots &{} &{} 0 \\ 0 &{} S^{-1/2}A &{} &{} \cdots &{} &{} 0 \\ \vdots &{} &{} &{} \ddots &{} &{} \vdots \\ 0 &{} 0 &{} &{} \cdots &{} S^{-1/2}A &{} 0 \\ 0 &{} 0 &{} &{} \cdots &{} 0 &{} S^{-1/2}A \\ \rho _{1,2}^{1/2}I &{} -\rho _{1,2}^{1/2}I &{} 0 &{} \ldots &{} 0 &{} 0\\ \rho _{1,3}^{1/2}I &{} 0 &{} -\rho _{1,3}^{1/2}I &{} \ldots &{} 0 &{} 0\\ &{} \vdots &{} &{} &{} \vdots &{} \\ \rho _{1,S}^{1/2}I &{} 0 &{} 0 &{} \ldots &{} 0 &{} -\rho _{1,S}^{1/2}I \\ 0 &{} \rho _{2,3}^{1/2}I &{} -\rho _{2,3}^{1/2}I &{} \ldots &{} 0 &{} 0 \\ &{} \vdots &{} &{} &{} \vdots &{} \\ 0 &{} \rho _{2,S}^{1/2}I &{} 0 &{} \ldots &{} 0 &{} -\rho _{2,S}^{1/2}I \\ &{} &{} &{} \vdots &{} &{} \\ &{} &{} &{} \vdots &{} &{} \\ 0 &{} 0 &{} 0 &{} \ldots &{} \rho _{S-1,S}^{1/2}I &{} -\rho _{S-1,S}^{1/2}I \\ \end{pmatrix}, \quad g = \begin{pmatrix} S^{-1/2}f\\ S^{-1/2}f\\ \vdots \\ S^{-1/2}f \\ S^{-1/2}f \\ 0 \\ 0\\ \vdots \\ 0 \\ 0 \\ \vdots \\ 0 \\ \vdots \\ \vdots \\ 0\\ \end{pmatrix}. \end{aligned}$$
(8)

Here, I denotes the identity matrix and 0 the zero matrix; the matrix B has S block columns and \(S+S(S-1)/2\) block rows. Further, we introduce the difference operator D given by

$$\begin{aligned} D(u_1,\ldots ,u_S) = \begin{pmatrix} \nabla _{a_1} u_1 \\ \vdots \\ \nabla _{a_S} u_S \\ \end{pmatrix} \end{aligned}$$
(9)

which applies the difference w.r.t. the sth direction \(a_s\) to the sth component \(u_s\). We employ the weights \(\omega _1,\) \(\ldots , \omega _S\) to define the quantity \(\Vert D(u_1,\ldots ,u_S)\Vert _{0,\omega }\) which counts the weighted number of jumps by

$$\begin{aligned} \Vert D(u_1,\ldots ,u_S)\Vert _{0,\omega } = \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0. \end{aligned}$$
(10)

With this notation at hand, we may rewrite the function of (5) as

$$\begin{aligned} P_{\gamma ,\rho }(u_1,\ldots ,u_S) = \left\| B(u_1,\ldots ,u_S)^\mathrm{T} - g \right\| _2^2 + \gamma \ \Big \Vert \ D(u_1,\ldots ,u_S) \ \Big \Vert _{0,\omega }. \end{aligned}$$
(11)

Using the representation (11), the surrogate functional in the sense of [28] of \(P_{\gamma ,\rho }\) is given by

$$\begin{aligned} P_{\gamma , \rho }^{\mathrm{surr}}(u_1,\ldots ,u_S,v_1,\ldots ,v_S)&= \frac{1}{L_{\rho }^2}\left\| B(u_1,\ldots ,u_S)^\mathrm{T} - g \right\| _2^2 + \frac{\gamma }{L_{\rho }^2} \ \Big \Vert \ D(u_1,\ldots ,u_S) \ \Big \Vert _{0,\omega }\\&\quad - \frac{1}{L_{\rho }^2}\left\| B(u_1,\ldots ,u_S)^\mathrm{T} - B(v_1,\ldots ,v_S)^\mathrm{T} \right\| _2^2 \nonumber \\&\quad + \left\| (u_1,\ldots ,u_S)^\mathrm{T} - (v_1,\ldots ,v_S)^\mathrm{T} \right\| _2^2. \nonumber \end{aligned}$$
(12)

Here, \(L_{\rho } \ge 1\) denotes a constant which is chosen larger than the spectral norm \(\Vert B \Vert \) of B (i.e., the operator norm w.r.t. the \(\ell ^2\) norm.) This scaling is made to ensure that \(B/L_{\rho }\) is contractive. In terms of A and the penalties \(\rho _{s,s'},\) we require that

$$\begin{aligned} L_{\rho }^2 > \Vert A\Vert _2^2/S + 2 \max _{s \in \{1,\ldots ,S \} } \sum _{s': s'\ne s}^{S} \rho _{s,s'}. \end{aligned}$$
(13)

For the particular choice \(\rho _{s,s'} = \rho \) as on the left-hand side of (7) we can choose \(L_{\rho }^2\) smaller, i.e., \( L_{\rho }^2 > \Vert A\Vert _2^2/S + S \rho . \) When coupling only neighboring \(u_s\) with the same constant \(\rho \), i.e., the right-hand coupling of (7), we have \( L_{\rho }^2 > \Vert A\Vert _2^2/S + \alpha \rho , \) where \(\alpha = 4,\) if S is even, and \(\alpha = 2 - 2 \cos \left( \frac{\pi (S-1)}{S}\right) \) if S is odd. These choices ensure that \(B/L_{\rho }\) is contractive by Lemma 1. Basics on surrogate functionals as we need them for this paper are gathered in Sect. 3.4. Further details on surrogate functionals can be found in [9, 10, 28].
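In practice, \(L_{\rho }\) may be chosen slightly above these bounds, for instance as in the following sketch; the spectral norm \(\Vert A\Vert _2\) is assumed to be known or estimated beforehand (e.g., by power iteration), and the safety margin is an arbitrary choice.

```python
import numpy as np

def choose_L_rho(norm_A, rho, S, coupling="all_pairs", margin=1.01):
    """Pick L_rho slightly above the bounds below (13) so that B / L_rho is contractive.

    'all_pairs' refers to the constant all-pairs coupling and 'cyclic' to the
    consecutive coupling of (7); norm_A is (an upper bound of) ||A||_2."""
    if coupling == "all_pairs":
        bound_sq = norm_A ** 2 / S + S * rho
    else:  # cyclic consecutive coupling
        alpha = 4.0 if S % 2 == 0 else 2.0 - 2.0 * np.cos(np.pi * (S - 1) / S)
        bound_sq = norm_A ** 2 / S + alpha * rho
    return np.sqrt(margin * bound_sq)
```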

Expanding the squared norms and using elementary properties of the inner product shows that

$$\begin{aligned}&P_{\gamma , \rho }^{\mathrm{surr}} (u_1,\ldots ,u_S,v_1,\ldots ,v_S) \nonumber \\&\quad = \bigg \Vert (u_1,\ldots ,u_S)^\mathrm{T}- \bigg ((v_1,\ldots ,v_S)^\mathrm{T}-\frac{1}{L_{\rho }^2} B^\mathrm{T} (B(v_1,\ldots ,v_S)^\mathrm{T}-g) \bigg ) \bigg \Vert ^2_2 \nonumber \\&\qquad + \frac{\gamma }{L_\rho ^2}\Big \Vert D(u_1,\ldots ,u_S) \Big \Vert _{0,\omega } + R(v_1,\ldots ,v_S) , \end{aligned}$$
(14)

where \( R(v_1,\ldots ,v_S) \) is a remainder term which is irrelevant when minimizing \(P_{\gamma ,\rho }^{\mathrm{surr}}\) w.r.t. \(u_1,\ldots ,u_S\) for fixed \(v_1,\ldots ,v_S.\) Writing this down in terms of the original system matrix A and the data f yields

$$\begin{aligned}&P_{\gamma , \rho }^{\mathrm{surr}}\left( u_1,\ldots ,u_S,v_1,\ldots ,v_S \right) \\&\quad = \sum _{s=1}^S \left[ \left\| u_s - \left( v_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A v_s - \sum _{s \ne s'}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (v_s-v_{s'}) \right) \right\| _2^2 \right. \nonumber \\&\qquad \left. + \tfrac{\gamma \omega _s}{L_{\rho }^2} \left\| \nabla _{a_s} u_s \, \right\| _0 \right] + R(v). \nonumber \end{aligned}$$
(15)

For the quadratic penalty relaxation of the Potts problem, i.e., for minimizing the problem (5), we propose to use the surrogate iteration, i.e., \(u^{(n+1)}_1,\ldots ,u^{(n+1)}_S\) \(\in {\text {argmin}}_{u_1,\ldots ,u_S} P_{\gamma , \rho }^{\mathrm{surr}} (u_1,\ldots ,\) \(u_S,u^{(n)}_1,\ldots ,u^{(n)}_S).\) Applied to (15), this surrogate iteration reads

$$\begin{aligned} \left( u^{(n+1)}_1,\ldots ,u^{(n+1)}_S \right) \in \mathop {{{\text {argmin}}}}\limits _{u_1,\ldots ,u_S} \sum _{s=1}^S \left[ \left\| u_s - h^{(n)}_s \right\| _2^2 + \tfrac{\gamma \omega _s}{L_{\rho }^2} \left\| \nabla _{a_s} u_s \, \right\| _0 \right] \end{aligned}$$
(16)

where \(h^{(n)}_s\) is given by

$$\begin{aligned} h^{(n)}_s = u^{(n)}_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A u^{(n)}_s - \sum _{s':s' \ne s}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (u^{(n)}_s-u^{(n)}_{s'}), \quad \text { for all } s \in \{1,\ldots ,S\}. \end{aligned}$$
(17)

Note that in Sect. 2.3, we derive an efficient algorithm which computes an exact minimizer of (16). Now assume that we are willing to accept a small deviation between the \(u_s,\) i.e.,

$$\begin{aligned} \Vert u_s - u_{s'}\Vert ^2_2 = \sum _{i,j}|(u_s)_{ij} - (u_{s'})_{ij}|^2 < \tfrac{\varepsilon ^2}{c_{s,s'}}, \end{aligned}$$
(18)

for \(\varepsilon >0\) and for indices \(s,s'\) with \(c_{s,s'} \ne 0.\) The following algorithm computes a result fulfilling (18).

Algorithm 1

We consider the quadratic penalty relaxation (5) of the Potts problem and a tolerance \(\varepsilon \) we are willing to accept for the targets \(u_s\). We propose the following algorithm for the relaxed Potts problem (5) (which yields a result with targets \(u_s\) deviating from each other by at most \(\varepsilon /\sqrt{c_{s,s'}}\)).

  • Set \(\rho \) according to (34), set \(L_{\rho }\) according to (13) (or, in the special cases of (7), as below (34) and (13).)

  • Initialize \(u^{(0)}_s\) as discussed in the corresponding paragraph below (e.g., \(u^{(0)}_s = 0\) for all s).

  • Iterate until convergence:

    $$\begin{aligned} \text {1.}\quad&h^{(n)}_s = u^{(n)}_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A u^{(n)}_s - \sum _{s':s' \ne s}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (u^{(n)}_s-u^{(n)}_{s'}), \quad s = 1,\ldots ,S, \nonumber \\ \text {2.}\quad&\left( u^{(n+1)}_1,\ldots ,u^{(n+1)}_S \right) \in \mathop {{{\text {argmin}}}}\limits _{u_1,\ldots ,u_S} \sum _{s=1}^S \left[ \left\| u_s - h^{(n)}_s \right\| _2^2 + \tfrac{\gamma \omega _s}{L_{\rho }^2} \left\| \nabla _{a_s} u_s \, \right\| _0 \right] . \end{aligned}$$
    (19)

We will see in Theorem 3 that this algorithm converges to a local minimizer of the quadratic penalty relaxation (5) and that the \(u_s\) are \(\varepsilon \)-close, i.e., (18) is fulfilled.
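The following Python sketch illustrates the structure of Algorithm 1; the routines apply_A, apply_At and solve_potts_2d (an exact solver for the direction-wise subproblem (28), cf. Sect. 2.3) are assumed to be supplied by the user, rho_mat is a symmetric array holding the \(\rho _{s,s'}\), and the simple stopping rule on successive iterates stands in for "iterate until convergence".

```python
import numpy as np

def iterative_potts_qp(f, apply_A, apply_At, solve_potts_2d, directions, weights,
                       gamma, rho_mat, L_rho, u_init, max_iter=500, tol=1e-6):
    """Sketch of Algorithm 1 for the quadratic penalty relaxation (5)."""
    S = len(directions)
    u = [u_init.copy() for _ in range(S)]
    for _ in range(max_iter):
        u_prev = [us.copy() for us in u]
        for s in range(S):
            # forward step (17): gradient step on the data term plus coupling term
            h_s = (u_prev[s]
                   - apply_At(apply_A(u_prev[s]) - f) / (S * L_rho ** 2)
                   - sum(rho_mat[s, t] * (u_prev[s] - u_prev[t])
                         for t in range(S) if t != s) / L_rho ** 2)
            # backward step (16): exact Potts solve w.r.t. the single direction a_s
            u[s] = solve_potts_2d(h_s, directions[s], gamma * weights[s] / L_rho ** 2)
        if max(np.linalg.norm(u[s] - u_prev[s]) for s in range(S)) < tol:
            break
    return u
```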

The Relation Between the Potts Problem and Its Quadratic Penalty Relaxation, and Obtaining a Feasible Solution for the Potts Problem (4) from the Output of Algorithm 1 As pointed out above, we show in Theorem 3 that Algorithm 1 produces a local minimizer of the quadratic penalty relaxation (5) of the Potts problem (4) and that the corresponding variables of a resulting solution are close up to an a priori prescribed tolerance. This may in practice be already enough. However, strictly speaking, a local minimizer of the quadratic penalty relaxation (5) is not feasible for the Potts problem (4).

We will now explain a projection procedure to derive a feasible solution for the Potts problem (4) from a local minimizer of (5) with nearby variables \(u_s\) (as produced by Algorithm 1). Related theoretical results are stated as Theorem 4. In particular, we will see that, in case the imaging operator A is lower bounded, the projection procedure applied to the output of Algorithm 1 yields a feasible point which is close to a local minimizer of the original Potts problem (4).

In order to explain the projection procedure, we need some notions on partitionings. Recall that a partitioning \({\mathcal {P}}\) consists of a (finite number of) segments \({\mathcal {P}}_i\) which are pairwise disjoint sets of pixel coordinates whose union equals the image domain \(\Omega ,\) i.e.,

$$\begin{aligned} \cup _{i=1}^{N_{\mathcal {P}}} {\mathcal {P}}_i = \Omega , \qquad {\mathcal {P}}_i \cap {\mathcal {P}}_j = \emptyset \quad \text {for all } i,j=1,\ldots ,N_{\mathcal {P}} \text { with } i \ne j. \end{aligned}$$
(20)

Here, we assume that each segment \({\mathcal {P}}_i\) is connected w.r.t. the neighborhood system \(a_1,\ldots ,a_S\) in the sense that there is a path connecting any two elements in \({\mathcal {P}}_i\) with steps in \(a_1,\ldots ,a_S.\)

We will need the following notion of a directional partitioning. A directional partition w.r.t. a set of S directions \(a_1,\ldots ,a_S\) consists of a set \({\mathcal {I}}\) of (discrete) intervals I, where each interval I is associated with exactly one of the directions \(a_1,\ldots ,a_S;\) here, an interval I associated with the direction \(a_s\) has to be of the form \(I = \{(i,j)+ k a_s : k = 0,\ldots , K-1\},\) where \(K \in {\mathbb {N}}\) and I is contained in the discrete domain. (For each direction \(a_s\), the corresponding intervals form an ordinary partition.) We note that Algorithm 1, which produces output \(u=(u_1,\ldots ,u_S):\Omega \rightarrow {\mathbb {R}}^S,\) induces a directional partitioning as follows. We observe that each variable \(u_s\) is associated with a direction \(a_s.\) For any \(s \in \{1,\ldots ,S\},\) we let each (maximal) interval of constancy of \(u_s\) be an interval in \({\mathcal {I}}\) associated with \(a_s.\)

Each partitioning induces a directional partitioning \({\mathcal {I}}\) by letting the intervals I of \({\mathcal {I}}\) be the stripes with direction \(a_s\) obtained from segment \({\mathcal {P}}_i\) for each direction \(s =1,\ldots , S\) and each segment \({\mathcal {P}}_i, i=1, \ldots , N_{\mathcal {P}}.\) Furthermore, each directional partitioning \({\mathcal {I}}\) induces a partitioning by the following merging process.

Definition 1

We say that pixels x, y are related, in symbols \(x \sim y\), if there is a path \(x_0=x,\ldots ,x_N=y\) connecting x and y in the sense that for any consecutive members \(x_i,x_{i+1},\) \(i=0,\ldots ,N-1,\) of the path there is an interval I of the directional partitioning \({\mathcal {I}}\) containing both \(x_i\) and \(x_{i+1}.\)

The relation \(x\sim y\) obviously defines an equivalence relation and the corresponding equivalence classes \({\mathcal {P}}_i\) yield a partitioning on \(\Omega .\) We use the symbols

$$\begin{aligned} {\mathcal {I}}({\mathcal {P}}) = {\mathcal {I}}_{{\mathcal {P}}},\qquad {\mathcal {P}}({\mathcal {I}}) = {\mathcal {P}}_{{\mathcal {I}}}, \end{aligned}$$
(21)

to denote the mappings assigning to a partitioning a directional partitioning and vice versa, respectively.

As a final preparation, we consider a function \(u=(u_1,\ldots ,u_S):\Omega \rightarrow {\mathbb {R}}^S\) as produced by Algorithm 1 and a partitioning \({\mathcal {P}}\) of \(\Omega \) and define the projection to a function \(\pi _{\mathcal {P}}(u): \Omega \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} \pi _{\mathcal {P}}(u)|_{{\mathcal {P}}_i} = \frac{\sum _{x \in {\mathcal {P}}_i} \sum _{s = 1}^S u_s(x) }{ S \ \#{\mathcal {P}}_i}, \end{aligned}$$
(22)

where the symbol \(\#{\mathcal {P}}_i\) denotes the number of elements in the segment \({\mathcal {P}}_i.\) Hence, the projection \(\pi \) defined via (22) averages w.r.t. all components of u and all members of the segment \({\mathcal {P}}_i\) and so produces a piecewise constant function w.r.t. the partitioning \({\mathcal {P}}.\)
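A sketch of the projection (22) reads as follows; the partitioning is assumed to be encoded as an integer label image of the same size as the \(u_s\).

```python
import numpy as np

def project_to_partition(u_list, labels):
    """Projection (22): average all variables u_s over each segment of the
    partitioning and return a single piecewise constant image."""
    stacked = np.stack(u_list, axis=0)            # shape (S, H, W)
    out = np.zeros(u_list[0].shape, dtype=float)
    for seg in np.unique(labels):
        mask = labels == seg
        out[mask] = stacked[:, mask].mean()       # mean over all s and all pixels of the segment
    return out
```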

Using these notions, we propose the following projection procedure.

Procedure 1

(Projection Procedure) We consider output \(u=(u_1,\ldots ,u_S):\Omega \rightarrow {\mathbb {R}}^S\) of Algorithm 1 together with its induced directional partitioning \({\mathcal {I}}.\)

  1.

    Compute the partitioning \({\mathcal {P}}({\mathcal {I}}) = {\mathcal {P}}_{{\mathcal {I}}}\) induced by the directional partitioning \({\mathcal {I}}\) as explained above (21).

  2.

    Project \(u=(u_1,\ldots ,u_S):\Omega \rightarrow {\mathbb {R}}^S\) to \(\pi _{{\mathcal {P}}_{{\mathcal {I}}}}(u)\) using (22) for the partitioning \({\mathcal {P}}({\mathcal {I}}) = {\mathcal {P}}_{{\mathcal {I}}},\) and return \(\pi _{{\mathcal {P}}_{{\mathcal {I}}}}(u)\) as output.

We note that, given the partitioning \({\mathcal {P}}_{{\mathcal {I}}},\) solving the normal equations in the space of functions constant on \({\mathcal {P}}_{{\mathcal {I}}}\) would be an alternative to the second step above; this, however, might be more expensive.
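The merging of the directional partitioning into a partitioning (Definition 1 and the first step of Procedure 1) can be sketched via a union-find over pixels: adjacent pixels in direction \(a_s\) are merged whenever \(u_s\) does not jump between them, since such pixels lie in a common interval of constancy. The resulting label image can then be passed to the projection sketch given after (22).

```python
import numpy as np

def merge_directional_partition(u_list, directions):
    """Merge the directional partitioning induced by (u_1, ..., u_S) into a
    partitioning of the pixel grid (cf. Definition 1); returns a label image."""
    h, w = u_list[0].shape
    parent = np.arange(h * w)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    for u_s, (da, db) in zip(u_list, directions):
        for i in range(h):
            for j in range(w):
                i2, j2 = i + da, j + db
                if 0 <= i2 < h and 0 <= j2 < w and u_s[i, j] == u_s[i2, j2]:
                    # both pixels lie in the same interval of constancy of u_s
                    union(i * w + j, i2 * w + j2)

    return np.array([find(i) for i in range(h * w)]).reshape(h, w)
```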

A Penalty Method for the Potts Problem Based on a Majorization–Minimization Approach for Its Quadratic Penalty Relaxation Intuitively, increasing the parameter \(\rho \) during the iterations should tie the \(u_s\) closer together so that the constraint of (3) is ultimately fulfilled; this results in an approach for the initial Potts problem (2). Recall that \(\rho _{s,s'} = \rho \ c_{s,s'}\) was defined in (6), where the \(c_{s,s'}\) are nonnegative numbers weighting the constraints. We here increase \(\rho \) while leaving the \(c_{s,s'}\) fixed during this process.

Algorithm 2

We consider the Potts problem (3) in S variables (which is equivalent to (2) as explained above). We propose the following algorithm for the Potts problem (3).

  • Let \(\rho ^{(k)}\) be a strictly increasing sequence (e.g., \(\rho ^{(k)} = \tau ^k\rho ^{(0)},\) with \(\rho ^{(0)}>0\) and \(\tau >1\)) and let \(\delta _k\) be a strictly decreasing sequence converging to zero (e.g., \(\delta _k = \delta _0/\tau ^k\)). Further, let

    $$\begin{aligned} t > 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \ \Vert f\Vert , \end{aligned}$$
    (23)

    where \(\sigma _1\) is the smallest nonzero eigenvalue of \(C^\mathrm{T}C\) with C given by (49). For the particular choice of coupling given by the left-hand and right-hand side of (7) we let

    $$\begin{aligned}&t> \tfrac{2}{S}\Vert A\Vert \ \Vert f\Vert , \quad \text { and }\quad t > 2(2-2\cos (2\pi /S))^{-1/2} S^{-1/2} \Vert A\Vert \ \Vert f\Vert , \nonumber \\ \end{aligned}$$
    (24)

    respectively. Initialize \(u^{(0)}_s := u^{(0,0)}_s\) as discussed in the corresponding paragraph below (e.g., \(u^{(0)}_s = 0\) for all s).

  • Set \(\rho = \rho ^{(0)},\ \rho _{s,s'} = \rho ^{(0)}c_{s,s'},\ \delta = \delta _0,\ k,n=0;\) set \(L_{\rho }\) according to (13) (or, in the special cases of (7), as explained below (13))

    A.

      While

      $$\begin{aligned} \left\| u^{(k,n)}_s - u^{(k,n)}_{s'} \right\|> \frac{t}{\rho \sqrt{c_{s,s'}}}, \quad \text { or } \quad \left\| u^{(k,n)}_s - u^{(k,n-1)}_s \right\| > \frac{\delta }{L_{\rho }} \end{aligned}$$
      (25)

      do

      $$\begin{aligned} \text {1.}\quad&h^{(k,n)}_s = u^{(k,n)}_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A u^{(k,n)}_s - \sum _{s':s' \ne s}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (u^{(k,n)}_s-u^{(k,n)}_{s'}),\nonumber \\&\qquad s = 1,\ldots ,S, \nonumber \\ \text {2.}\quad&\left( u^{(k,n+1)}_1,\ldots ,u^{(k,n+1)}_S \right) \in \mathop {{{\text {argmin}}}}\limits _{u_1,\ldots ,u_S} \sum _{s=1}^S \left[ \left\| u_s - h^{(k,n)}_s \right\| _2^2 + \tfrac{\gamma \omega _s}{L_{\rho }^2} \left\| \nabla _{a_s} u_s \, \right\| _0 \right] , \end{aligned}$$
      (26)

      and set \(n=n+1.\)

    B.

      Set

      $$\begin{aligned} u^{(k+1)}_s = u^{(k+1,0)}_s = u^{(k,n)}_s, \end{aligned}$$
      (27)

      set \(k = k+1, n=0,\) and let \(\rho = \rho ^{(k)}, \rho _{s,s'} = \rho ^{(k)} \ c_{s,s'}, \delta = \delta _k;\) set \(L_{\rho }\) according to (13) (or, in the special cases of (7), as below (13)) and goto A.

This approach is inspired by [60] which considers quadratic penalty methods in the sparsity context. There, the authors search for a solution with only a few nonzero entries. The corresponding prior is separable. In contrast to that work, the present work considers a non-separable prior.
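The outer penalty loop of Algorithm 2 can be sketched as follows; surrogate_step is assumed to perform one forward/backward sweep (26) and to return the updated variables (e.g., one iteration of the Algorithm 1 sketch above), c_mat is the symmetric matrix of coupling weights \(c_{s,s'}\) with zero diagonal, norm_A bounds \(\Vert A\Vert _2\), and t is chosen according to (23)/(24).

```python
import numpy as np

def iterative_potts_penalty(surrogate_step, c_mat, norm_A, S, t, u_init,
                            rho0=1.0, tau=2.0, delta0=1.0,
                            outer_iter=20, max_inner=10000):
    """Sketch of the penalty method of Algorithm 2 with increasing rho."""
    u = [ui.copy() for ui in u_init]
    rho, delta = rho0, delta0
    for _ in range(outer_iter):
        rho_mat = rho * c_mat
        # L_rho slightly above the general bound (13)
        L_rho = np.sqrt(1.01 * (norm_A ** 2 / S + 2 * np.max(rho_mat.sum(axis=1))))
        for _ in range(max_inner):   # inner loop A: iterate (26) until (25) fails
            u_prev, u = u, surrogate_step(u, rho_mat, L_rho)
            close = all(np.linalg.norm(u[s] - u[r]) <= t / (rho * np.sqrt(c_mat[s, r]))
                        for s in range(S) for r in range(s + 1, S) if c_mat[s, r] > 0)
            small_step = all(np.linalg.norm(u[s] - u_prev[s]) <= delta / L_rho
                             for s in range(S))
            if close and small_step:
                break
        rho, delta = tau * rho, delta / tau   # step B: tighten the penalty
    return u
```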

Initialization Although the initialization of Algorithm 1 and of Algorithm 2 is not relevant for their convergence properties (cf. Sect. 3), the choice of the initialization influences the final result. (Please note that this may also happen for convex but not strictly convex problems.) We discuss different initialization strategies. The simplest choice is the all-zero initialization \((u_1^{(0)},\ldots ,u_S^{(0)}) = (0,\ldots ,0).\) Likewise, one can select the right-hand side of the normal equations of the underlying least squares problem, that is, \(A^\mathrm{T}f\). A third reasonable choice is the solution of the normal equations itself or an approximation of it; using an approximation might in particular be reasonable to obtain a regularized initialization. A possible strategy to obtain such a regularized initialization is to apply a fixed number of Landweber iterations [54] or of the conjugate gradient method to the underlying least squares problem. (In our experiments, we initialized Algorithm 1 with the result of 1000 Landweber iterations and Algorithm 2 with \(A^\mathrm{T}f\).)
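A sketch of such a Landweber initialization reads as follows; the step size \(1/\Vert A\Vert _2^2\) is one admissible choice, and apply_A, apply_At are again placeholders for the operator and its adjoint.

```python
import numpy as np

def landweber_init(f, apply_A, apply_At, norm_A, n_iter=1000):
    """Fixed number of Landweber iterations for the least squares problem ||Au - f||^2."""
    step = 1.0 / norm_A ** 2
    u = np.zeros_like(apply_At(f))
    for _ in range(n_iter):
        u = u - step * apply_At(apply_A(u) - f)
    return u
```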

2.3 A Non-iterative Algorithm for Minimizing the Potts Subproblem (16)

Both proposed algorithms require solving the Potts subproblem (16) in the backward step; see (19), (26). We first observe that (16) can be solved for each of the \(u_s\) separately. The corresponding S minimization problems are of the prototypical form

$$\begin{aligned} \begin{aligned} \mathop {{{\text {argmin}}}}\limits _{u_s:\Omega \rightarrow {\mathbb {R}}} \Vert u_s - f\Vert _2^2 + \gamma '_s \Vert \nabla _{a_s} u_s \Vert _0 \end{aligned} \end{aligned}$$
(28)

with given data f, the jump penalty \(\gamma '_s = \tfrac{\gamma \omega _s}{L_{\rho }^2}> 0\) and the direction \(a_s\in \mathbb {Z}^2\). As a next step, we see that (28) decomposes into univariate Potts problems for data along the paths in f induced by \(a_s\), e.g., for \(a_s = e_1\) those paths correspond to the rows of f and we obtain a minimizer \(u_s^*\) of (28) by determining each of its rows individually. The univariate Potts problem amounts to minimizing

$$\begin{aligned} \begin{aligned} P^{\mathrm {id,1d}}_\gamma (x) = \Vert x - g\Vert ^2_2 + \gamma \Vert \nabla x\Vert _0 \rightarrow \min , \end{aligned} \end{aligned}$$
(29)

where the data g is given by the restriction of f to the pixels in \(\Omega \) of the form \(v+a_s z,\) for \(z\in \mathbb {Z}\), i.e., \(g(z)=f(v+a_s z)\).

Here, the offset v is fixed when solving each univariate problem, but varied afterward to get all lines in the image with direction \(a_s.\) The target to optimize is denoted by \(x\in {\mathbb {R}^n}\) and, in the resulting univariate situation, \(\Vert \nabla x\Vert _0= \vert \{ i:x_i \ne x_{i+1} \} \vert \) denotes the number of jumps of x.

It is well known that the univariate direct problem (29) has a unique minimizer. Further, these particular problems can be solved exactly by dynamic programming [18, 35, 62, 63, 92], which we briefly describe in the following. For further details, we refer to [35, 82]. Assume we have computed minimizers \(x^l\) of (29) for partial data \((g_1,\ldots ,g_l)\) for each \(l=1,\ldots ,r\), \(r<n\). Then, the minimum value of (29) for \((g_1,\ldots ,g_{r+1})\) can be found by

$$\begin{aligned} \begin{aligned} P^{\mathrm {id,1d}}_\gamma ({x^{r+1}} )= \min _{l=1,\ldots ,r+1} P^{\mathrm {id,1d}}_\gamma (x^{l-1}) + \gamma +{\mathcal {E}}^{l:r+1}, \end{aligned} \end{aligned}$$
(30)

where we let \(x^0\) be the empty vector, \(P^{\mathrm {id,1d}}_\gamma (x^0) = -\gamma \) and \({\mathcal {E}}^{l:r+1}\) be the quadratic deviation of \((g_l,\ldots ,g_{r+1})\) from its mean. By denoting the minimizing argument in (30) by \(l^*\) the minimizer \(x^{r+1}\) is given by

$$\begin{aligned} x^{r+1} = (x^{l^*-1}, \mu _{[l^*,r+1]},\ldots ,\mu _{[l^*,r+1]}), \end{aligned}$$
(31)

where \(\mu _{[l^*,r+1]}\) is the mean value of \((g_{l^*},\ldots ,g_{r+1})\). Thus, we obtain a minimizer for full data g by successively computing \(x^l\) for each \(l=1,\ldots ,n\). By precomputing the first and second moments of the data g and storing only jump locations, the described method can be implemented in \({\mathcal {O}}(n^2)\) time [35]. Another way to achieve \({\mathcal {O}}(n^2)\) is based on the QR decomposition of the design matrix by means of Givens rotations; see [82]. Furthermore, the search space can be pruned to speed up computations [47, 83].
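A sketch of this dynamic program, together with its direction-wise application to the subproblem (28) for the unit directions (as assumed in the sketches of Algorithms 1 and 2 above; diagonal and knight-move paths are handled analogously), reads as follows.

```python
import numpy as np

def solve_potts_1d(g, gamma):
    """Exact dynamic programming solver for the univariate Potts problem (29),
    cf. (30) and (31); O(n^2) time using precomputed first and second moments."""
    g = np.asarray(g, dtype=float)
    n = len(g)
    m1 = np.concatenate(([0.0], np.cumsum(g)))        # first moments
    m2 = np.concatenate(([0.0], np.cumsum(g ** 2)))   # second moments
    B = np.empty(n + 1)                               # B[r]: optimal value for g[:r]
    B[0] = -gamma
    prev = np.zeros(n + 1, dtype=int)                 # index before the last segment
    for r in range(1, n + 1):
        best, arg = np.inf, 0
        for l in range(1, r + 1):
            # squared deviation of g[l-1:r] from its mean, from the moments
            eps = m2[r] - m2[l - 1] - (m1[r] - m1[l - 1]) ** 2 / (r - l + 1)
            val = B[l - 1] + gamma + eps
            if val < best:
                best, arg = val, l - 1
        B[r], prev[r] = best, arg
    x = np.empty(n)
    r = n
    while r > 0:                                      # backtracking: fill segments with means
        l = prev[r]
        x[l:r] = (m1[r] - m1[l]) / (r - l)
        r = l
    return x

def solve_potts_2d(g, a, gamma):
    """Direction-wise Potts solve of (28) for the unit directions a = (0, 1) or (1, 0)."""
    out = np.array(g, dtype=float)
    if a == (0, 1):
        for i in range(out.shape[0]):
            out[i, :] = solve_potts_1d(out[i, :], gamma)
    else:
        for j in range(out.shape[1]):
            out[:, j] = solve_potts_1d(out[:, j], gamma)
    return out
```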

We briefly describe the extensions of the above scheme necessary to approach (29) for vector-valued data \(g\in \mathbb {R}^{n\times C}\) (e.g., the row of a color image). In this situation, the symbol \({\mathcal {E}}^{l:r+1}\) in (30) denotes the sum of the quadratic deviations of \((g_l,\ldots ,g_{r+1})\) from its channel-wise means. Further, \(\mu _{[l^*,r+1]}\in \mathbb {R}^{C}\) in (31) is the vector of channel-wise means of the data \((g_{l^*},\ldots ,g_{r+1})\). On the computational side, the first and second moments of each channel have to be precomputed separately. It is worth mentioning that the theoretical computational cost of the described method grows only linearly in the number of channels [83]. Thus, the proposed algorithm can be efficiently applied to vector-valued images with a high-dimensional codomain.

3 Analysis

3.1 Analytic Results

In the course of the derivation of the proposed algorithms above, we consider the quadratic penalty relaxation (5) of the multivariate Potts problem. Although it is algorithmically more accessible via our approach, we first note that this problem is still NP-hard (as is the original problem).

Theorem 2

Finding a (global) minimizer of the quadratic penalty relaxation (5) of the multivariate Potts problem is an NP-hard problem.

The proof is given in Sect. 3.3. In Sect. 2.2, we have proposed Algorithm 1 to approach the quadratic penalty relaxation of the multivariate Potts problem. We show that the proposed algorithm converges to a local minimizer and that a feasible point of the original multivariate Potts problem is nearby.

Theorem 3

We consider the iterative Potts minimization Algorithm 1 for the quadratic penalty relaxation (5) of the multivariate Potts problem.

  i.

    Algorithm 1 computes a local minimizer of the quadratic penalty relaxation (5) of the multivariate Potts problem for any starting point. The convergence rate is linear.

  ii.

    We have the following relation between local minimizers \({{\mathcal {L}}}\), global minimizers \({{\mathcal {G}}}\) and the fixed points \(\mathrm {Fix}({\mathbb {I}})\) of the iteration of Algorithm 1,

    $$\begin{aligned} {{\mathcal {G}} } \subset \mathrm {Fix}({\mathbb {I}}) \subset {{\mathcal {L}}}. \end{aligned}$$
    (32)
  iii.

    Assume a tolerance \(\varepsilon \) we are willing to accept for the distance between the \(u_s,\) i.e.,

    $$\begin{aligned} \sum _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2 = \sum _{s,s'} c_{s,s'} \sum _{i,j}|(u_s)_{ij} - (u_{s'})_{ij}|^2 \le \varepsilon ^2. \end{aligned}$$
    (33)

    Running Algorithm 1 with the choice of the parameter \(\rho \) by

    $$\begin{aligned} \rho > 2 \varepsilon ^{-1} \ \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert \end{aligned}$$
    (34)

    (where \(\sigma _1\) is the smallest nonzero eigenvalue of \(C^\mathrm{T}C\) with C given by (49); for the particular choice of the coupling given by (7), \(\sigma _1 = S\) and \(\sigma _1 = (2-2\cos (2\pi /S)),\) respectively) yields a local minimizer of the quadratic penalty relaxation (5) such that the \(u_s\) are close up to \(\varepsilon ,\) i.e., (33) is fulfilled.

The proof is given in Sect. 3.5. A solution of Algorithm 1 is not a feasible point for the initial Potts problem (3). However, we see below that it produces a \(\delta \)-approximative solution \(u^*\) in the sense that there are \(\mu ^*\) and a partitioning \({\mathcal {P}}^*\) such that

$$\begin{aligned} \sum _{s,s'} c_{s,s'} \Vert u^*_s - u^*_{s'}\Vert ^2_2< \delta , \qquad \text { and } \quad L(\mu ^*) < \delta , \end{aligned}$$
(35)

where \(L(\mu ^*)\) is given by (53). In this context, note that the conditions for a local minimizer are given by \(\sum _{s,s'} c_{s,s'} \Vert u^*_s - u^*_{s'}\Vert ^2_2 = 0\) and the Lagrange multiplier condition \(L(\mu ^*) = 0.\) So (35) intuitively means that both the constraint and the Lagrange multiplier condition are approximately fulfilled for the partitioning induced by \(u^*\).

Further, given a solution of Algorithm 1, we find a feasible point for the Potts problem (3) (or, equivalently, (2)) which is nearby, as detailed in the following theorem.

Theorem 4

We consider the iterative Potts minimization Algorithm 1 for the quadratic penalty relaxation (5) in connection with the (non-relaxed) Potts problem (3).

  i.

    Algorithm 1 produces an approximative solution of the Potts problem (3) in the sense of (35).

  ii.

    The projection procedure (Procedure 1) proposed in Sect. 2.2 applied to the solution \(u'=(u'_1,\ldots ,u'_S)\) of Algorithm 1 produces a feasible image \({{\hat{u}}}\) (together with a valid partitioning) for the Potts problem (3) which is close to \(u'\) in the sense that

    $$\begin{aligned} \Vert u_s'-{{\hat{u}}}\Vert \le C_1 \varepsilon \qquad \text {for all} \quad s \in \{1,\ldots ,S\}, \end{aligned}$$
    (36)

    where \(\varepsilon = \max _{s,s'} \Vert u'_s-u'_{s'}\Vert \) quantifies the deviation between the \(u_s.\) Here, \(C_1 = \# \Omega /4, \) where the symbol \(\# \Omega \) denotes the number of elements in \(\Omega .\) If the imaging operator A is lower bounded, i.e., there is a constant \(c>0\) such that \(\Vert Au\Vert \ge c \Vert u\Vert \), a local minimizer \(u^*\) of the Potts problem (3) is nearby, i.e.,

    $$\begin{aligned} \Vert u^*-{{\hat{u}}}\Vert \le \frac{\sqrt{\eta }}{c} \end{aligned}$$
    (37)

    where

    $$\begin{aligned} \eta := \left( \Vert A \Vert ^2 \varepsilon C_1^2 + 2 \Vert A \Vert C_1 \Vert f \Vert _2 \right) \varepsilon . \end{aligned}$$
    (38)

The proof of Theorem 4 can be found at the end of Sect. 3.4, where most relevant statements are already shown in Sect. 3.3. Theorem 4 theoretically underpins the fact that, on the application side, we may use Algorithm 1 for the Potts problem (3) (accepting some arbitrarily small tolerance we may fix in advance).

In addition, in Sect. 2.2, we have proposed Algorithm 2 to approach the Potts problem (3). We first show that Algorithm 2 is well defined.

Theorem 5

Algorithm 2 is well defined in the sense that the inner iteration governed by (25) terminates, i.e., for any \(k \in {\mathbb {N}},\) there is \(n \in {\mathbb {N}}\) such that the condition in (25) is no longer fulfilled.

The proof of Theorem 5 is given in Sect. 3.6. Concerning the convergence properties of Algorithm 2, we obtain the following results.

Theorem 6

We consider the iterative Potts minimization algorithm (Algorithm 2) for the Potts problem (3).

  • Any cluster point of the sequence \(u^{(k)}\) is a local minimizer of the Potts problem (3) (which in particular implies that the components of each cluster point \(u^*\) are equal, i.e., \(u_s^{*} = u_{s'}^{*}\) for all \(s,s'\)).

  • If A is lower bounded, the sequence \(u^{(k)}\) produced by Algorithm 2 has a cluster point and the produced cluster points are local minimizers of the Potts problem (3).

The proof of Theorem 6 can be found in Sect. 3.6.

3.2 Estimates on Operator Norms and Lagrange Multipliers

Lemma 1

The spectral norm of the block matrix B given by (8) fulfills

$$\begin{aligned} \Vert B \Vert _2 \le \bigg (\tfrac{1}{S}\Vert A\Vert _2^2 + 2 \max _{s \in \{1,\ldots ,S \} } \sum _{s': s'\ne s}^{S} \rho _{s,s'}\bigg )^{\frac{1}{2}}. \end{aligned}$$
(39)

For the particular choice of constant \(\rho _{s,s'} = \rho \) (independent of \(s,s'\)) as on the left-hand side of (7), we have the improved estimate

$$\begin{aligned} \Vert B \Vert _2 \le \bigg (\tfrac{1}{S}\Vert A\Vert _2^2 + S \rho \bigg )^{\frac{1}{2}}. \end{aligned}$$
(40)

For only coupling neighboring \(u_s\) with the same constant \(\rho \), i.e., the right-hand coupling of (7), we have

$$\begin{aligned} \Vert B \Vert _2 \le \bigg (\tfrac{1}{S}\Vert A\Vert _2^2 + \alpha \rho \bigg )^{\frac{1}{2}}, \quad \text { where } \quad \alpha = {\left\{ \begin{array}{ll} 4, &{}\quad \text {if }S\text { is even}, \\ 2 - 2 \cos \left( \frac{\pi (S-1)}{S}\right) , &{} \quad \text {if }S\text { is odd.} \end{array}\right. } \end{aligned}$$
(41)

Proof

We decompose the matrix B according to \(B =\begin{pmatrix} S^{-1/2}{{\tilde{A}}} \\ {{\tilde{P}}} \end{pmatrix}.\) Here, \({{\tilde{A}}}\) denotes an \(S \times S\)-block diagonal matrix with each diagonal entry being equal to A,  where A is the matrix representing the forward/imaging operator; see (8). The matrix \({{\tilde{P}}}\) is given as the lower \({S \atopwithdelims ()2} \times S\)-block in (8) which represents the soft constraints.

Using this decomposition of B, we may decompose the symmetric and positive (semidefinite) matrix \(B^\mathrm{T}B\) according to

$$\begin{aligned} B^\mathrm{T}B = \tfrac{1}{S} {{\tilde{A}}}^\mathrm{T} {{\tilde{A}}} + {{\tilde{P}}}^\mathrm{T} {{\tilde{P}}}, \end{aligned}$$
(42)

where \({{\tilde{A}}}^\mathrm{T} {\tilde{A}}\) is an \(S \times S\)-block diagonal matrix with each diagonal entry being equal to \(A^\mathrm{T} A,\) and \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) is an \(S \times S\)-block diagonal matrix with block entries given by

$$\begin{aligned} {\tilde{P}}^\mathrm{T} {\tilde{P}} = \begin{pmatrix} \sum \nolimits _{k=2}^S\rho _{1,k} I &{} -\rho _{1,2} I &{} -\rho _{1,3} I &{} \ldots &{} -\rho _{1,S} I\\ -\rho _{1,2} I &{} \sum \nolimits _{k=1,k \ne 2}^S\rho _{2,k} I &{} -\rho _{2,3}I &{} \ldots &{} -\rho _{2,S} I\\ &{} \vdots &{} &{} &{} \vdots &{} \\ -\rho _{1,S}I &{} -\rho _{2,S}I &{} -\rho _{3,S}I &{} \ldots &{} \sum \nolimits _{k=1}^{S-1}\rho _{S,k}I \end{pmatrix}, \end{aligned}$$
(43)

with \(\rho _{l,k}:= \rho _{k,l}\) for \(l>k.\) Using Gerschgorin’s Theorem (see for instance [81]), the eigenvalues of \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) are contained in the union of the balls with centers \(x_r=\sum _{k=1,k \ne r}^S\rho _{r,k}\) and radii \(\sum _{k=1,k \ne r}^S | {-\rho _{r,k}}| = x_r.\) These balls are all contained in the larger ball with center 0 and radius \(2 \cdot \max _r x_r.\) This implies the general estimate (39).

For seeing (40), we decompose an argument \(u=(u_1,\ldots ,u_S)\) according to \(u= {{\bar{u}}} + u^0\) with an “average” part \({{\bar{u}}} = (\tfrac{1}{S}\sum _{i=1}^{S}u_i,\ldots ,\tfrac{1}{S}\sum _{i=1}^{S}u_i)\) and \(u^0:=u-{{\bar{u}}}\) such that \(u^0\) has average 0,  i.e., \(\sum _{i=1}^{S}u^0_i = 0,\) where 0 denotes the vector containing only zero entries here. In the situation of (40), the matrix \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) has the form \({\tilde{P}}^\mathrm{T} {\tilde{P}} = \rho (S\cdot I - (1,\ldots ,1)(1,\ldots ,1)^\mathrm{T}).\) We have \({\tilde{P}}^\mathrm{T} {\tilde{P}} {{\bar{u}}} = 0.\) Further, \( {\tilde{P}}^\mathrm{T} {\tilde{P}} u^0 = \rho S u^0.\) Hence, the largest modulus of an eigenvalue of \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) equals \(\rho S\) which in turn shows the estimate (40).

For seeing (41), we notice that in case of (41), the matrix \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) has a cyclic shift structure with three nonzero entries in each row. The discrete Fourier matrix w.r.t. the cyclic group of order S diagonalizes \({\tilde{P}}^\mathrm{T} {\tilde{P}}.\) The corresponding eigenvalues are given by \(\lambda _k = \rho \left( 2 - 2 \cos \left( 2\pi \frac{k}{S} \right) \right) ,\) where \(k=0,\ldots ,S-1.\) The largest modulus of an eigenvalue is thus given by \(4 \ \rho ,\) if S is even, and by \(\rho \cdot \left( 2 - 2 \cos \left( \frac{\pi (S-1)}{S}\right) \right) ,\) if S is odd. \(\square \)

Note that the problem of estimating the operator norm of B in (39) involves computing the operator norm of \({\tilde{P}}^\mathrm{T} {\tilde{P}}\) given by (43). This problem is intimately related to computing the spectral norm of the Laplacian of a corresponding weighted graph (e.g., [38, 80]); in particular, we conclude from this link that the general estimate (39) is sharp in the sense that the factor of 2 in front of the sum cannot be made smaller. This is because, for a general graph, the (normalized) graph Laplacian has spectral norm at most two, and this factor of two is sharp; cf. [38, 80].
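The eigenvalue formula used in the last step of the proof can be checked numerically, e.g., by the following small sketch (for \(S \ge 3\)).

```python
import numpy as np

def cyclic_coupling_spectrum(S, rho=1.0):
    """Sanity check: for the cyclic consecutive coupling, the per-pixel block of
    P~^T P~ is rho * (2I - C - C^T) with the cyclic shift C, and its eigenvalues
    are rho * (2 - 2 cos(2 pi k / S)), k = 0, ..., S-1."""
    C = np.roll(np.eye(S), 1, axis=0)
    PtP = rho * (2 * np.eye(S) - C - C.T)
    numeric = np.sort(np.linalg.eigvalsh(PtP))
    analytic = np.sort(rho * (2 - 2 * np.cos(2 * np.pi * np.arange(S) / S)))
    return numeric, analytic
```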

We recall that we have introduced the concept of a directional partitioning \({\mathcal {I}}\) and discussed its relation with the concept of a partitioning near (21) above. For a function \(f: \Omega \rightarrow {\mathbb {R}}^S\) (representing its S component functions \(f_1,\ldots ,f_S:\Omega \rightarrow {\mathbb {R}}\)) defined on a grid \(\Omega ,\) we consider the orthogonal projection \(P_{\mathcal {I}}\) associated with a directional partition \({\mathcal {I}}\) by first sorting the intervals of \({\mathcal {I}}\) into \({\mathcal {I}}_1,\ldots ,{\mathcal {I}}_S\) according to their associated directions \(a_s,\) \(s =1,\ldots , S,\) and then letting

$$\begin{aligned} P_{\mathcal {I}} f = \begin{pmatrix} P_{{\mathcal {I}}_1} f_1 \\ \vdots \\ P_{{\mathcal {I}}_S} f_S \end{pmatrix}, \qquad \text { where } \quad P_{{\mathcal {I}}_s} f_s|_{I} = \frac{\sum _{x \in I} f_s(x) }{ \# I}, \end{aligned}$$
(44)

i.e., the function \(P_{{\mathcal {I}}_s} f_s\) on the interval I is given as the arithmetic mean of \(f_s\) on the interval I,  for all intervals \(I \in {\mathcal {I}}_s\) and all \(s =1,\ldots , S.\) Here, the symbol \(\# I\) denotes the number of elements in I. We note that \(P_{\mathcal {I}}\) defines an orthogonal projection on the corresponding \(\ell ^2\) space of discrete functions \(f: \Omega \rightarrow {\mathbb {R}}^S\) with the norm \(\Vert f\Vert ^2 = \sum _{s,i} |(f_s)_i|^2\) where i iterates through all the indices of \(f_s.\)
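The action of the projection \(P_{{\mathcal {I}}_s}\) in (44) is plain averaging per interval. The following minimal sketch illustrates this for a single component; the encoding of the intervals as index arrays is an assumption made for illustration only.

```python
# Minimal sketch of the interval-wise averaging in (44); the intervals are assumed to be
# given as lists of index arrays (an encoding chosen only for this illustration).
import numpy as np

def project_onto_intervals(f, intervals):
    """Orthogonal projection onto the signals that are constant on each interval."""
    g = f.copy()
    for idx in intervals:
        g[idx] = f[idx].mean()             # replace the values on the interval by their mean
    return g

f = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 2.0])
intervals = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5])]
print(project_onto_intervals(f, intervals))   # [1.0333..., 1.0333..., 1.0333..., 5.05, 5.05, 2.0]
```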

We consider a partitioning \({\mathcal {P}}\) of \(\Omega ,\) its induced directional partitioning \({\mathcal {I}}^{{\mathcal {P}}}\) w.r.t. a set of S directions \(a_1,\ldots ,a_S,\) and the subspace

$$\begin{aligned} {\mathcal {A}}^{{\mathcal {P}}} = P_{{\mathcal {I}}^{{\mathcal {P}}}}(\ell ^2(\Omega , {\mathbb {R}}^S)) \end{aligned}$$
(45)

of functions which are constant on the intervals of the induced directional partitioning \({\mathcal {I}}^{{\mathcal {P}}}\) (which equals the image of the orthogonal projection \(P_{{\mathcal {I}}^{{\mathcal {P}}}}\)).

Functions \(g: \Omega \rightarrow {\mathbb {R}}\) which are piecewise constant w.r.t. a partitioning \({\mathcal {P}},\) i.e., which are constant on each segment \({\mathcal {P}}_i,\) are in one-to-one correspondence with the linear subspace \({\mathcal {B}}^{{\mathcal {P}}}\) of \({\mathcal {A}}^{{\mathcal {P}}}\) given by

$$\begin{aligned} {\mathcal {B}}^{{\mathcal {P}}} = \{f \in {\mathcal {A}}^{{\mathcal {P}}}: f_1 = \ldots = f_S \} \end{aligned}$$
(46)

as shown by the following lemma.

Lemma 2

There is a one-to-one correspondence between the linear space of piecewise constant mappings w.r.t. the partitioning \({\mathcal {P}},\) and the subspace \({\mathcal {B}}^{{\mathcal {P}}}\) of \({\mathcal {A}}^{{\mathcal {P}}}\) via the mapping \(\iota : g \mapsto (g,\ldots ,g).\)

Proof

Let g be a piecewise constant mapping w.r.t. the partitioning \({\mathcal {P}},\) then \((g,\ldots ,g)\) is constant on each interval I of the induced directional partitioning \({\mathcal {I}}^{{\mathcal {P}}},\) and \((g,\ldots ,g) \in {\mathcal {B}}^{{\mathcal {P}}}.\) This shows that \(\iota \) is well defined in the sense that its range is contained in \({\mathcal {B}}^{{\mathcal {P}}}.\) Obviously, \(\iota \) is an injective linear mapping so that it remains to show that any \(f \in {\mathcal {B}}^{{\mathcal {P}}}\) is the image under \(\iota \) of some \(g:\Omega \rightarrow {\mathbb {R}}\) which is piecewise constant w.r.t. the partitioning \({\mathcal {P}}.\) To this end, let \(f \in {\mathcal {B}}^{{\mathcal {P}}}.\) By definition, f has the form \(f = (g,\ldots ,g)\) for some \(g:\Omega \rightarrow {\mathbb {R}}.\) Now, toward a contradiction, assume there is a segment \({\mathcal {P}}_i\) and points \(x,y \in {\mathcal {P}}_i\) with \(g(x)\ne g(y).\) Since there is a path \(x_0=x,\ldots ,x_N=y\) connecting x and y in \({\mathcal {P}}_i\) with steps in \(a_1,\ldots ,a_S,\) we have that, for any index j,  there is an interval I in the induced partitioning \({\mathcal {I}}^{{\mathcal {P}}}\) containing \(x_j\) together with \(x_{j+1}.\) Since g is constant on each I in \({\mathcal {I}}^{{\mathcal {P}}},\) we get \(g(x_j) = g(x_{j+1})\) for all j which implies \(g(x)= g(y).\) This contradicts our assumption and shows the lemma. \(\square \)

Using the identification given by Lemma 2, we define, for a given partitioning \({\mathcal {P}},\) the projection \(Q_{{\mathcal {P}}}\) onto \({\mathcal {B}}^{{\mathcal {P}}}\) by

$$\begin{aligned} Q_{{\mathcal {P}}} f = \begin{pmatrix} \pi _{{\mathcal {P}}} f\\ \vdots \\ \pi _{{\mathcal {P}}} f \end{pmatrix}, \qquad \text { where } \quad \pi _{{\mathcal {P}}} f|_{{\mathcal {P}}_i} = \frac{\sum _{s=1}^S\sum _{x \in {\mathcal {P}}_i} f_s(x) }{ \# {\mathcal {P}}_i \ S}, \end{aligned}$$
(47)

i.e., we average over each segment and over all component functions as given by (22). Since the components of \(Q_{{\mathcal {P}}} f\) are all identical, we will not distinguish \(Q_{{\mathcal {P}}}\) and \(\pi _{{\mathcal {P}}}\) in the following. This means that we also use the symbol \(Q_{{\mathcal {P}}}f\) to denote the scalar-valued function which is piecewise constant on the partitioning \({\mathcal {P}}.\)
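For illustration, a minimal sketch of the projection \(Q_{{\mathcal {P}}}\) of (47) is given below; the partitioning is assumed to be encoded as an integer label map (an encoding chosen here only for illustration), and the returned array is the scalar-valued function \(\pi _{{\mathcal {P}}}f\).

```python
# Minimal sketch of Q_P in (47); the partitioning P is assumed to be encoded as an
# integer label map (illustration only), and the result is the scalar-valued pi_P f.
import numpy as np

def project_onto_partition(u, labels):
    """u: array of shape (S, ...) with the S component functions; labels: segment ids."""
    out = np.empty(labels.shape)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = u[:, mask].mean()      # average over the segment and over all components
    return out

u = np.random.rand(3, 4, 4)                # S = 3 component functions on a 4x4 grid
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1                          # two segments: left half and right half
print(project_onto_partition(u, labels))
```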

On \({\mathcal {A}}^{{\mathcal {P}}},\) we consider the problem

$$\begin{aligned} \mathop {{{\text {argmin}}}}\limits _{u_1,\ldots ,u_S} \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 \quad \text { subject to } \quad Cu=0, \end{aligned}$$
(48)

i.e., given the directional partitioning we are searching for a solution which belongs to \({\mathcal {B}}^{{\mathcal {P}}}\). Here, C denotes the matrix

$$\begin{aligned} C = \begin{pmatrix} c_{1,2} I &{} -c_{1,2}I &{} 0 &{} \ldots &{} 0 &{} 0\\ c_{1,3} I &{} 0 &{} -c_{1,3}I &{} \ldots &{} 0 &{} 0\\ &{} \vdots &{} &{} &{} \vdots &{} \\ c_{1,S} I &{} 0 &{} 0 &{} \ldots &{} 0 &{} -c_{1,S}I \\ 0 &{} c_{2,3} I &{} -c_{2,3}I &{} \ldots &{} 0 &{} 0 \\ &{} \vdots &{} &{} &{} \vdots &{} \\ 0 &{} c_{2,S} I &{} 0 &{} \ldots &{} 0 &{} -c_{2,S}I \\ &{} &{} &{} \vdots &{} &{} \\ 0 &{} 0 &{} 0 &{} \ldots &{} c_{S-1,S}I &{} -c_{S-1,S}I \\ \end{pmatrix}, \end{aligned}$$
(49)

where the \(c_{s,s'}\) are as in (5); if \(c_{s,s'}=0,\) the corresponding line is removed from the constraint matrix C. For the special choices of (7), we have

$$\begin{aligned} C = \begin{pmatrix} I &{} -I &{} 0 &{} \ldots &{} 0 &{} 0\\ I &{} 0 &{} -I &{} \ldots &{} 0 &{} 0\\ &{} \vdots &{} &{} &{} \vdots &{} \\ I &{} 0 &{} 0 &{} \ldots &{} 0 &{} -I \\ 0 &{} I &{} -I &{} \ldots &{} 0 &{} 0 \\ &{} \vdots &{} &{} &{} \vdots &{} \\ 0 &{} I &{} 0 &{} \ldots &{} 0 &{} -I \\ &{} &{} &{} \vdots &{} &{} \\ 0 &{} 0 &{} 0 &{} \ldots &{} I &{} -I \\ \end{pmatrix}, \quad \text { and } \quad C = \begin{pmatrix} I &{} -I &{} 0 &{} 0 &{} \ldots &{} 0 &{} 0&{} 0\\ 0 &{} I &{} -I &{} 0 &{} \ldots &{} 0 &{} 0&{} 0 \\ 0 &{} 0 &{} I &{} -I &{} \ldots &{} 0 &{} 0&{} 0 \\ &{} \vdots &{} &{} &{} &{} &{} \vdots &{} &{} \\ 0 &{} 0 &{} 0 &{} 0&{} \ldots &{} I &{} -I &{} 0 \\ 0 &{} 0 &{} 0 &{} 0&{}\ldots &{} 0 &{} I &{} -I\\ -I &{} 0 &{} 0 &{} 0&{}\ldots &{} 0 &{} 0 &{} I\\ \end{pmatrix} \end{aligned}$$
(50)

which reflects the constraints \(u_1=\ldots =u_S.\) We recall that \(\mu _{{\mathcal {P}}}\) is a Lagrange multiplier of the problem in (48) if

$$\begin{aligned} \min _{u \in {\mathcal {B}}^{{\mathcal {P}}}} \ \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 = \min _{u \in {\mathcal {A}}^{{\mathcal {P}}}} \ \sum _{s=1}^S \frac{1}{S} \left\| Au_s - f \right\| _2^2 + \mu _{{\mathcal {P}}}^\mathrm{T} C u. \end{aligned}$$
(51)

We note that, for quadratic problems such as (48), Lagrange multipliers always exist [7]. We have that

$$\begin{aligned} \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}^{{\mathcal {P}}}} u_{{\mathcal {P}}}^*- \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}} {\tilde{A}}^\mathrm{T} {\tilde{f}} = C^\mathrm{T} \mu _{{\mathcal {P}}} = P_{{\mathcal {I}}^{{\mathcal {P}}}} C^\mathrm{T} \mu _{{\mathcal {P}}}, \end{aligned}$$
(52)

or, in other form,

$$\begin{aligned} L(\mu _{{\mathcal {P}}}) := \left\| \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}^{{\mathcal {P}}}} u_{{\mathcal {P}}}^*- \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}} {\tilde{A}}^\mathrm{T} {\tilde{f}} - P_{{\mathcal {I}}^{{\mathcal {P}}}} C^\mathrm{T} \mu _{{\mathcal {P}}} \right\| =0, \end{aligned}$$
(53)

where \({\tilde{A}}\) is the block diagonal matrix with constant entry A on each diagonal component, \({\tilde{f}}\) is a block vector of corresponding dimensions with entry f in each component, and \(u_{{\mathcal {P}}}^*\) is a minimizer of the constraint problem in \({\mathcal {B}}^{{\mathcal {P}}}\). We note that the last equality \(C^\mathrm{T} \mu _{{\mathcal {P}}} = P_{{\mathcal {I}}^{{\mathcal {P}}}} C^\mathrm{T} \mu _{{\mathcal {P}}}\) in (52) holds since the left-hand side of (52) is contained in the image of \(P_{{\mathcal {I}}^{{\mathcal {P}}}}\).

Lemma 3

We consider a partitioning \({\mathcal {P}}\) of the discrete domain \(\Omega \) and the corresponding problem (48). There is a Lagrange multiplier \(\mu _{{\mathcal {P}}}\) for (48) with

$$\begin{aligned} \Vert \mu _{{\mathcal {P}}} \Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert . \end{aligned}$$
(54)

Here, \(\sigma _1\) is the smallest nonzero eigenvalue of \(C^\mathrm{T}C\) with C given by (49). For the particular choice of C given by the left-hand side of (50), we have

$$\begin{aligned} \Vert \mu _{{\mathcal {P}}} \Vert \le \tfrac{2}{S} \Vert A\Vert \Vert f\Vert ; \end{aligned}$$
(55)

and, for the particular choice of C given by the right-hand side of (50) we have

$$\begin{aligned} \Vert \mu _{{\mathcal {P}}} \Vert \le 2 (2-2\cos (2\pi /S))^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert , \end{aligned}$$
(56)

(e.g., for \(S=4,\) i.e., an eight-neighborhood, we have \( 2-2\cos (2\pi /S) = \sigma _1 = 2\)). In particular, the right-hand sides and the constants in all these estimates are independent of the particular partitioning \({\mathcal {P}}\).

Proof

For any minimizer \(u_{{\mathcal {P}}}^*\) of the constraint problem in \({\mathcal {B}}^{{\mathcal {P}}},\) we have that

$$\begin{aligned} \Vert \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}^{{\mathcal {P}}}} u_{{\mathcal {P}}}^*- \tfrac{2}{S} P_{{\mathcal {I}}^{{\mathcal {P}}}} {\tilde{A}}^\mathrm{T} {\tilde{f}}\Vert&\le \Vert \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{A}} u_{{\mathcal {P}}}^*- \tfrac{2}{S} {\tilde{A}}^\mathrm{T} {\tilde{f}} \Vert \le \tfrac{2}{S} \Vert A\Vert \Vert {\tilde{f}}\Vert \nonumber \\&\le \tfrac{2 \sqrt{S}}{S} \Vert A\Vert \Vert f\Vert , \end{aligned}$$
(57)

where we recall that \({\tilde{A}}\) is the block diagonal matrix with constant entry A,  and \({\tilde{f}}\) is a block vector with entry f in each component. The first inequality is a consequence of the fact that \(P_{{\mathcal {I}}^{{\mathcal {P}}}}\) is an orthogonal projection. The second inequality may be seen by evaluating the term for the constant zero function (which always belongs to \({\mathcal {B}}^{{\mathcal {P}}}\) ) as a candidate and by noting that \(\Vert A^\mathrm{T} \Vert =\Vert A\Vert .\)

Using (52), we have \(\Vert C^\mathrm{T} \mu _{{\mathcal {P}}}\Vert \le \tfrac{2}{\sqrt{S}} \Vert A\Vert \Vert f\Vert .\) Choosing \(\mu _{{\mathcal {P}}}\) in the orthogonal complement of the kernel of \(C^\mathrm{T},\) we get

$$\begin{aligned} \Vert C^\mathrm{T} \mu _{{\mathcal {P}}}\Vert \ge \inf _{x \in \left( {\text {ker}}\left( C^\mathrm{T}\right) \right) ^\perp ,\Vert x\Vert =1} \Vert C^\mathrm{T}x\Vert \ \Vert \mu _{{\mathcal {P}}}\Vert . \end{aligned}$$
(58)

We observe that finding the infimum in (58) corresponds to finding the square root of the smallest nonzero eigenvalue of \(C^\mathrm{T}C.\) This is because (i) the nonzero eigenvalues of \(C^\mathrm{T}C\) equal the nonzero eigenvalues of \(CC^\mathrm{T},\) i.e.,

$$\begin{aligned} \min \left\{ \sigma : \sigma \in \mathrm {spectrum}(CC^\mathrm{T}){\setminus } \{0\} \right\} = \min \left\{ \sigma : \sigma \in \mathrm {spectrum}(C^\mathrm{T}C){\setminus } \{0\} \right\} = \sigma _1, \end{aligned}$$
(59)

where \(\sigma _1\) is the smallest nonzero eigenvalue of \(C^\mathrm{T}C\). Further, (ii) for \(x \in \left( {\text {ker}}\left( C^\mathrm{T}\right) \right) ^\perp ,\) \(\Vert C^\mathrm{T}x\Vert ^2 = \langle x, CC^\mathrm{T}x \rangle \ge \min \left\{ \sigma : \sigma \in \mathrm {spectrum}(CC^\mathrm{T}){\setminus } \{0\} \right\} \Vert x\Vert ^2.\) Hence, using (59) in (58) we get that \(\Vert C^\mathrm{T} \mu _{{\mathcal {P}}}\Vert \ge \sqrt{\sigma _1} \Vert \mu _{{\mathcal {P}}}\Vert ,\) and together with (52) and (57), we obtain

$$\begin{aligned} \Vert \mu _{{\mathcal {P}}}\Vert \le \sigma _1^{-1/2} \Vert C^\mathrm{T} \mu _{{\mathcal {P}}}\Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert \end{aligned}$$
(60)

which shows (54).

Now we consider the particular choice of C given by the left-hand side of (50). Similar to the derivation of (43), we have that \(C^\mathrm{T}C= S \cdot I - (1,\ldots ,1)(1,\ldots ,1)^\mathrm{T}.\) Further, the constants constitute the kernel of \(C^\mathrm{T}C\) and any vector u in its orthogonal complement is mapped to Su. Hence, \(\sigma _1 = S\) which shows (55).

Finally, we consider the particular choice of C given by the right-hand side of (50). As already explained in the proof of Lemma 1, the discrete Fourier transform shows that the corresponding eigenvalues are given by \(\lambda _k = 2 - 2 \cos \left( 2\pi \frac{k}{S} \right) ,\) where \(k=0,\ldots ,S-1.\) The smallest nonzero eigenvalue is thus given by \(2-2\cos (2\pi /S).\) This shows (56) which completes the proof of the lemma. \(\square \)
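The two values of \(\sigma _1\) derived in the proof can also be verified numerically. The following sketch (for illustration only) builds the two choices of C from (50) with the identity blocks collapsed to scalars and computes the smallest nonzero eigenvalue of \(C^\mathrm{T}C\).

```python
# Illustration only: build the two choices of C in (50) with the identity blocks
# collapsed to scalars and verify the values of sigma_1 used in Lemma 3.
import numpy as np

S = 6

# Left-hand choice of C in (50): one row per pair s < s'.
rows = []
for s in range(S):
    for r in range(s + 1, S):
        row = np.zeros(S)
        row[s], row[r] = 1.0, -1.0
        rows.append(row)
C_full = np.array(rows)

# Right-hand choice of C in (50): cyclic neighboring differences.
C_cyc = np.eye(S) - np.roll(np.eye(S), -1, axis=1)

def smallest_nonzero_eigenvalue(C):
    e = np.linalg.eigvalsh(C.T @ C)
    return e[e > 1e-10].min()

print(smallest_nonzero_eigenvalue(C_full), S)                             # sigma_1 = S
print(smallest_nonzero_eigenvalue(C_cyc), 2 - 2 * np.cos(2 * np.pi / S))  # sigma_1 = 2 - 2 cos(2 pi / S)
```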

3.3 The Quadratic Penalty Relaxation of the Potts Problem and Its Relation to the Potts Problem

In this subsection, we reveal some relations between the Potts problem and its quadratic penalty relaxation; in particular, we show Theorem 2 and parts of Theorem 4. We start out by showing that the quadratic penalty relaxation of the Potts problem is NP-hard, which was formulated as Theorem 2.

Proof of Theorem 2

We consider the quadratic penalty relaxation (5) of the multivariate Potts problem in its equivalent form (11) which reads

$$\begin{aligned} P_{\gamma ,\rho }(u_1,\ldots ,u_S) = \left\| B(u_1,\ldots ,u_S)^\mathrm{T} - g \right\| _2^2 + \gamma \ \Big \Vert \ D(u_1,\ldots ,u_S) \ \Big \Vert _{0,\omega }. \end{aligned}$$

with B and g given by (8) and D given by (9). We serialize \( u=(u_1,\ldots ,u_S): \Omega \rightarrow {\mathbb {R}}^S\) into a function \({{\hat{u}}}: X \rightarrow {\mathbb {R}}\) with \(X \subset {\mathbb {Z}}\) being a discrete interval of size \(S \#\Omega \) as follows: for \(u_s,\) we consider the discrete lines in the image with direction \(a_s\) and interpret u on these lines as a vector; then we concatenate these vectors starting with the one corresponding to the leftmost upper line to obtain a vector of length \(\#\Omega ;\) for each s,  we obtain such a vector and we again concatenate these vectors starting with index \(s=1,2,\ldots \) to obtain the resulting object which we denote by \({{\hat{u}}}.\) Using this serialization we may arrange B,  g and D accordingly to obtain the univariate Potts problem

$$\begin{aligned} {{\hat{P}}}_{\gamma ,\rho }({{\hat{u}}}) = \left\| {{\hat{B}}} {{\hat{u}}} - {{\hat{g}}} \right\| _2^2 + \gamma \ \Big \Vert {{\hat{\omega }}} \nabla {{\hat{u}}} \Big \Vert _{0}, \quad \text { where } {{\hat{\omega }}}: X \rightarrow [0,\infty ) \end{aligned}$$

is a weight vector, \({{\hat{\omega }}} \nabla {{\hat{u}}}\) denotes pointwise multiplication, and \({{\hat{B}}},{{\hat{g}}}\) are the matrix and the vector corresponding to B,  g w.r.t. the serialization. The weight vector may be zero which in particular happens at the line breaks, i.e., those indices where two vectors have been concatenated in the above procedure. More precisely, the discrete lines w.r.t. the directions \(a_s\) induce a directional segmentation on \(\Omega \) and the image of this directional segmentation under the above serialization procedure induces a partitioning of the univariate domain X;  precisely between these segments, the weight vector equals zero. Now, for each segment \([d_1,\ldots ,d_r]\) in X,  we transform the basis \(\delta _{d_1},\ldots ,\delta _{d_r}\) to the basis \(\delta _{d_2}-\delta _{d_1}, \ldots , \delta _{d_r}-\delta _{d_{r-1}}, \tfrac{1}{r}\sum _{l=1}^r \delta _{d_l} \) obtained by neighboring differences and the average. As a result (which is elaborated in detail in [84]), we obtain a problem of the form

$$\begin{aligned} {{\hat{P}}}_{\gamma ,\rho }({{\hat{u}}}) = \left\| {\tilde{B}} {\tilde{u}} - {\tilde{b}} \right\| _2^2 + \gamma \ \Big \Vert {{\hat{\omega }}} {\tilde{u}} \Big \Vert _{0}, \quad \text { where } {{\hat{\omega }}}: X \rightarrow [0,\infty ) \end{aligned}$$
(61)

is the weight vector from above. This is a sparsity problem which is known to be NP-hard; see, for instance, [84]. This shows the assertion. \(\square \)
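To make the serialization step of the preceding proof more concrete, the following rough sketch is given under the assumption \(S=2\) with the horizontal and vertical directions; the handling of the jump weights is simplified and chosen only for illustration.

```python
# Rough illustration of the serialization (assumption: S = 2 with the horizontal and
# vertical directions; the handling of the jump weights is simplified): every line of
# each component becomes one chunk, and the weight is set to zero at the line breaks.
import numpy as np

def serialize(u1, u2):
    chunks, weights = [], []
    for line in u1:                        # direction a_1: the rows of u_1
        chunks.append(line)
        w = np.ones(len(line))
        w[0] = 0.0                         # no jump penalty across the line break
        weights.append(w)
    for line in u2.T:                      # direction a_2: the columns of u_2
        chunks.append(line)
        w = np.ones(len(line))
        w[0] = 0.0
        weights.append(w)
    return np.concatenate(chunks), np.concatenate(weights)

u1 = np.arange(12.0).reshape(3, 4)
u2 = np.arange(12.0).reshape(3, 4)
u_hat, w_hat = serialize(u1, u2)
print(u_hat.shape, w_hat)                  # length 2 * 12; zeros mark the line breaks
```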

We next characterize the local minimizers of the relaxed Potts problem (5) and of the Potts problem (2).

Lemma 4

A local minimizer \(u=(u_1,\ldots ,u_S)\) of the quadratic penalty relaxation (5) is characterized as follows: let \({\mathcal {I}}\) be the directional partitioning induced by the minimizer u,  and \({\mathcal {P}} = {\mathcal {P}}_{{\mathcal {I}}}\) be the induced partitioning, then u is a minimizer of the problem

$$\begin{aligned} \min _{u \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(u), \quad \text { where } \quad F_{\rho }(u) = \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Au_s - f \right\| _2^2 + \rho \Vert C u \Vert ^2. \end{aligned}$$
(62)

Conversely, if u minimizes (62) on \({\mathcal {A}}^{{\mathcal {P}}},\) then u is a local minimizer of the relaxed Potts problem (5).

Proof

Let \(u=(u_1,\ldots ,u_S)\) be a local minimizer of the quadratic penalty relaxation (5). Hence, there is a neighborhood \({\mathcal {U}}\) of u such that, for any \(v \in {\mathcal {U}},\) \(P_{\gamma , \rho }(v) \ge P_{\gamma , \rho }(u).\) Now if \(v \in {\mathcal {A}}^{{\mathcal {P}}}\) and \(\Vert v-u\Vert \) is small, then \(\sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0\) \( =\sum _{s=1}^S \omega _s \left\| \nabla _{a_s} v_s \, \right\| _0\) which implies that

$$\begin{aligned} F_{\rho }(u) = P_{\gamma , \rho }(u) - \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \,\right\| _0 \le P_{\gamma , \rho }(v) - \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} v_s \, \right\| _0 = F_{\rho }(v). \end{aligned}$$
(63)

This shows that u minimizes (62). Conversely, we assume that u minimizes (62). If the directional partitioning \({\mathcal {I}}'\) induced by u is coarser than \({\mathcal {I}},\) we consider the coarser directional partitioning \({\mathcal {I}}'\) instead of \({\mathcal {I}}.\) Let the maximum norm of \(h=(h_1,\ldots ,h_S)\) be smaller than the height of the smallest jump of u;  then, for \(u+h,\)

$$\begin{aligned} \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u_s+h_s) \, \right\| _0 \ge \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0. \end{aligned}$$
(64)

If strict inequality holds in (64), the continuity of \(F_{\rho }\) implies that \(F_{\rho }(u+h) \ge F_{\rho }(u) - \varepsilon \) for small enough h and arbitrary \(\varepsilon > 0.\) Hence,

$$\begin{aligned} P_{\gamma , \rho }(u)&= F_{\rho }(u) + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0 \le F_{\rho }(u+h) - \gamma \min _s \omega _s \nonumber \\&\quad + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u_s+h_s) \, \right\| _0 + \varepsilon \nonumber \\&\le F_{\rho }(u+h) + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u_s+h_s) \, \right\| _0 = P_{\gamma , \rho }(u+h), \end{aligned}$$
(65)

if we choose \(\varepsilon \) small enough. If equality holds in (64), we have that \(u+h \in {\mathcal {A}}^{{\mathcal {P}}}\) which implies \(F_{\rho }(u) \le F_{\rho }(u+h)\) since u is a minimizer of \(F_{\rho }\) on \({\mathcal {A}}^{{\mathcal {P}}}.\) This in turn implies \(P_{\gamma , \rho }(u) \le P_{\gamma , \rho }(u+h)\) by the assumed equality in (64). Together, in any case, \(P_{\gamma , \rho }(u) \le P_{\gamma , \rho }(u+h)\) for any small perturbation h. This shows that u is a local minimizer of \(P_{\gamma , \rho }\) which completes the proof. \(\square \)

Lemma 5

We consider a function \(u^*:\Omega \rightarrow {\mathbb {R}}\) and its induced partitioning \({\mathcal {P}}.\) Then, \(u^*\) is a local minimizer of the Potts problem (2) if and only if \((u^*,\ldots ,u^*)\) minimizes (48) w.r.t. \({\mathcal {P}}.\)

Proof

Since the proof of this statement is very similar to the proof of Lemma 4, we keep it rather short and refer to the proof of Lemma 4 if more explanation is necessary. Let \(u^*\) be a local minimizer of (2); writing \(u = u^*,\) this is equivalent to \({{\bar{u}}} = (u,\ldots ,u)\) being a local minimizer of (4). There is a neighborhood \({\mathcal {U}}\) of \({{\bar{u}}}\) such that, for any \({{\bar{v}}}=(v,\ldots ,v) \in {\mathcal {U}},\) \(P_{\gamma }(v) \ge P_{\gamma }(u).\) For \({{\bar{v}}} \in {\mathcal {B}}^{{\mathcal {P}}}\) with small \(\Vert {{\bar{v}}}- {{\bar{u}}}\Vert ,\) we have \(\sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u \, \right\| _0\) \( =\sum _{s=1}^S \omega _s \left\| \nabla _{a_s} v \, \right\| _0.\) Hence, by the definition of \(P_\gamma \) in (4),  \( \Vert Au - f \Vert _2^2 \le \Vert Av - f \Vert _2^2 \) which shows that \((u^*,\ldots ,u^*)\) minimizes (48).

Conversely, let \({{\bar{u}}} = (u,\ldots ,u)\) be a minimizer of (48) with the partitioning \({\mathcal {P}}\) induced by u. For \({{\bar{h}}}=(h,\ldots ,h)\) with absolute value smaller than the minimal height of a jump of u,  we have the estimate \( \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u+h) \, \right\| _0 \) \( \ge \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u \, \right\| _0. \) If strict inequality holds in this estimate, the continuity of the data term implies that \( \Vert A(u+h) - f \Vert _2^2 \ge \Vert Au - f \Vert _2^2 - \varepsilon \) for small enough h and arbitrary \(\varepsilon > 0.\) Hence, \( P_{\gamma }( {{\bar{u}}}) \le \Vert A(u+h) - f \Vert _2^2 - \gamma \min _s \omega _s + \gamma \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u+h) \, \right\| _0 + \varepsilon \) \( \le P_{\gamma }({{\bar{u}}}+{{\bar{h}}}) \) if \(\varepsilon \) is small. If equality holds above, i.e., \( \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} (u+h) \, \right\| _0 \) \( = \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u \, \right\| _0, \) then \({{\bar{u}}}+ {{\bar{h}}} \in {\mathcal {B}}^{{\mathcal {P}}}\) which implies that \(\Vert Au - f \Vert _2^2 \le \Vert A(u+h) - f \Vert _2^2\) since \({{\bar{u}}}\) is a minimizer of the corresponding function on \({\mathcal {B}}^{{\mathcal {P}}}.\) As a consequence,  \(P_{\gamma }({{\bar{u}}}) \le P_{\gamma }({{\bar{u}}}+ {{\bar{h}}})\) for any small perturbation h. This shows that u is a local minimizer of \(P_{\gamma }\) which completes the proof. \(\square \)

Proposition 1

Any local minimizer of the quadratic penalty relaxation (5) is an approximate local minimizer in the sense of (35) of the Potts problem (3).

Proof

By Lemma 4, a local minimizer \(u=(u_1,\ldots ,u_S)\) of the quadratic penalty relaxation (5) is a minimizer of the problem (62). Let us thus consider a local minimizer u of (5) with induced partitioning \({\mathcal {P}} = {\mathcal {P}}_{{\mathcal {I}}}.\) Since u minimizes (62), we have

$$\begin{aligned} \tfrac{1}{S} P_{{\mathcal {I}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}} u - \tfrac{1}{S} P_{{\mathcal {I}}} {\tilde{A}}^\mathrm{T} {\tilde{f}} + \rho P_{{\mathcal {I}}} C^\mathrm{T} C P_{{\mathcal {I}}} u = 0 \end{aligned}$$
(66)

since the gradient projected to \({\mathcal {A}}^{{\mathcal {P}}}\) equals zero for any local minimizer of the restricted problem on the subspace \({\mathcal {A}}^{{\mathcal {P}}}.\) (The notation is chosen as in (53) above.) We define \(\mu \) by \(\mu = -2\rho \, C P_{{\mathcal {I}}} u\) and obtain

$$\begin{aligned} L(\mu ) = \Vert \tfrac{2}{S} P_{{\mathcal {I}}}{\tilde{A}}^\mathrm{T} {\tilde{A}} P_{{\mathcal {I}}} u - \tfrac{2}{S} P_{{\mathcal {I}}} {\tilde{A}}^\mathrm{T} {\tilde{f}} - P_{{\mathcal {I}}} C^\mathrm{T} \mu \Vert = 0 \end{aligned}$$
(67)

by (66). It remains to show that \(\Vert Cu\Vert \) becomes small. To this end, we observe that, by Lemma 6 below, for arbitrary \(v=(v_1,\ldots ,v_S)\in {\mathcal {A}}^{{\mathcal {P}}}\), \( \Vert C v \Vert = \Vert C P_{{\mathcal {I}}} v \Vert \le \tfrac{1}{\rho }\Vert \mu ^*\Vert + \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }}, \) where \(\mu ^*\) is an arbitrary Lagrange multiplier of (48). Plugging in the minimizer u for v yields \(\Vert C u \Vert \le \tfrac{1}{\rho }\Vert \mu ^*\Vert .\) Thus, letting \(\delta := \tfrac{1}{\rho ^2}\Vert \mu ^*\Vert ^2,\) we have

$$\begin{aligned} \sum _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2 = \Vert Cu\Vert ^2 \le \delta , \end{aligned}$$
(68)

and \(L(\mu ) = 0\) by (67) which by (35) shows the assertion and completes the proof. \(\square \)

In the proof of Proposition 1, as well as in the following, we need the next lemma. Similar statements are [53, Proposition 13] and [60, Lemma 2.5]. However, since there are differences concerning the precise estimate in these references, and the setup here is slightly different, we provide a brief proof for the reader's convenience.

Lemma 6

Let \({\mathcal {P}}\) be a partitioning and \({\mathcal {I}} = {\mathcal {I}}_{{\mathcal {P}}}\) be the corresponding induced directional partitioning. For arbitrary \(v=(v_1,\ldots ,v_S)\in {\mathcal {A}}^{{\mathcal {P}}}\),

$$\begin{aligned} \Vert C v \Vert = \Vert C P_{{\mathcal {I}}} v \Vert \le \tfrac{1}{\rho }\Vert \mu ^*\Vert + \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }}, \end{aligned}$$
(69)

where \(\mu ^*\) is an arbitrary Lagrange multiplier of (48).

Proof

By [53, Corollary 2], we have for arbitrary \(v=(v_1,\ldots ,v_S)\in {\mathcal {A}}^{{\mathcal {P}}}\) that

$$\begin{aligned} \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Av_s - f \right\| _2^2 - \min _{(y,\ldots ,y) \in {\mathcal {B}}^{{\mathcal {P}}}} \left\| Ay - f \right\| _2^2 \ge - \Vert \mu ^*\Vert \ \Vert Cv\Vert . \end{aligned}$$
(70)

Then,

$$\begin{aligned} F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)&\ge \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Av_s - f \right\| _2^2 + \rho \Vert C v \Vert ^2 - \min _{(y,\ldots ,y) \in {\mathcal {B}}^{{\mathcal {P}}}} F_{\rho }(y,\ldots ,y) \nonumber \\&= \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Av_s - f \right\| _2^2 +\rho \Vert C v \Vert ^2 - \min _{(y,\ldots ,y) \in {\mathcal {B}}^{{\mathcal {P}}}} \left\| Ay - f \right\| _2^2 \nonumber \\&\ge \rho \Vert C v \Vert ^2 - \Vert \mu ^*\Vert \ \Vert Cv\Vert . \end{aligned}$$
(71)

For the first inequality, we wrote down the definition of \(F_{\rho }\) and restricted the set with respect to which the minimum is formed, which results in a potentially larger minimum value. For the equality, we notice that, for \((y,\ldots ,y) \in {\mathcal {B}}^{{\mathcal {P}}}\), we have \(C(y,\ldots ,y)=0\) so that \(F_{\rho }(y,\ldots ,y) = \left\| Ay - f \right\| _2^2;\) for the last inequality we employed (70). Now, writing \( z^2 - \frac{\Vert \mu ^*\Vert }{\rho } z\) \(= z^2 - \frac{\Vert \mu ^*\Vert }{\rho } z + \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2 - \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2\) \(= (z - \tfrac{\Vert \mu ^*\Vert }{2\rho } )^2 - \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2 \) and plugging this into (71) with \(z := \Vert Cv\Vert \) yields

$$\begin{aligned} \tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho } \ge \left( \Vert Cv\Vert - \tfrac{\Vert \mu ^*\Vert }{2\rho } \right) ^2 - \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2, \end{aligned}$$
(72)

and hence

$$\begin{aligned} \left| \Vert Cv\Vert - \tfrac{\Vert \mu ^*\Vert }{2\rho } \right| \le \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho } + \left( \tfrac{\Vert \mu ^*\Vert }{2\rho }\right) ^2 } \le \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }} + \tfrac{\Vert \mu ^*\Vert }{2\rho } \end{aligned}$$
(73)

where the last inequality is a consequence of the fact that the unit ball w.r.t. the \(\ell ^1\) norm is contained in the unit ball w.r.t. the \(\ell ^2\) norm. As a consequence, \(\Vert Cv\Vert \le \sqrt{\tfrac{F_{\rho }(v)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }} + \tfrac{\Vert \mu ^*\Vert }{2\rho } + \tfrac{\Vert \mu ^*\Vert }{2\rho }\) which completes the proof. \(\square \)

Next, we see that for any local minimizer of the quadratic penalty relaxation (5), we can find a nearby feasible point using the projection procedure (Procedure 1) proposed in Sect. 2.2. Further, if the imaging operator A is lower bounded, we find a nearby local minimizer of the Potts problem.

Proposition 2

Procedure 1 applied to a local minimizer \(u'=(u'_1,\ldots ,u'_S)\) of the quadratic penalty relaxation (5) produces a feasible image \({{\hat{u}}}\) (together with a valid partitioning) for the Potts problem (3) which is close to \(u'\) in the sense that

$$\begin{aligned} \Vert u_s'-{{\hat{u}}}\Vert \le C_1 \varepsilon \qquad \text {for all} \quad s \in \{1,\ldots ,S\}, \end{aligned}$$
(74)

where \(\varepsilon = \max _{s,s'} \Vert u'_s-u'_{s'}\Vert \) quantifies the deviation between the \(u_s.\) Here \(C_1 = \# \Omega /4, \) where the symbol \(\# \Omega \) denotes the number of elements in \(\Omega .\)

If the imaging operator A is lower bounded, i.e., there is a constant \(c>0\) such that \(\Vert Au\Vert \ge c \Vert u\Vert \), a local minimizer \(u^*\) of the Potts problem (3) is nearby, i.e.,

$$\begin{aligned} \Vert u^*-{{\hat{u}}}\Vert \le \frac{\sqrt{\eta }}{c} \end{aligned}$$
(75)

where

$$\begin{aligned} \eta := \left( \Vert A \Vert ^2 \varepsilon C_1^2 + 2 \Vert A \Vert C_1 \Vert f \Vert _2 \right) \varepsilon . \end{aligned}$$
(76)

Proof

We denote the directional partitioning induced by \(u'\) by \({\mathcal {I}}\) and the corresponding induced partitioning by \({\mathcal {P}} = {\mathcal {P}}_{{\mathcal {I}}}.\) We note that Procedure 1 applied to \(u'\) precisely produces

$$\begin{aligned} ({{\hat{u}}},\ldots ,{{\hat{u}}})= Q_{\mathcal {P}} u', \end{aligned}$$
(77)

with the projection \(Q_{\mathcal {P}}\) given by (47). We first note that the average \(({{\bar{u}}})_{ij} = \tfrac{1}{S} \sum _{s=1}^{S} (u_s')_{ij}\) fulfills \(|({{\bar{u}}})_{ij}- (u_s')_{ij}| \le \varepsilon .\) Further, the function value of \({{\hat{u}}},\) which is piecewise constant w.r.t. \({\mathcal {P}},\) is obtained by \({{\hat{u}}}|_{{\mathcal {P}}_i} = \sum _{x \in {\mathcal {P}}_i}{{\bar{u}}}(x)/ \# {\mathcal {P}}_i.\) Hence, we may estimate

$$\begin{aligned} \Vert u_s'-{{\hat{u}}}\Vert _2 \le \varepsilon L, \end{aligned}$$
(78)

where L is the maximal length of a path connecting any two pixels as given by Definition 1. As a worst case estimate, we get \(L\le C_1\) where we define \(C_1\) as one fourth of the number of elements in \(\Omega ,\) i.e., \(C_1 = \tfrac{\# \Omega }{4}.\) This shows (74).

For \(F_{\rho }\) given by (62), we have

$$\begin{aligned} F_{\rho }(u')&\le F_{\rho }({{\hat{u}}},\ldots ,{{\hat{u}}}) = \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| A{{\hat{u}}} - f \right\| _2^2 \nonumber \\&\le \sum \nolimits _{s=1}^S \tfrac{1}{S} \left( \left\| A{{\hat{u}}} - A u_s' \right\| _2 + \left\| Au_s' - f \right\| _2\right) ^2 \nonumber \\&\le \sum \nolimits _{s=1}^S \tfrac{1}{S} \left( \Vert A \Vert \varepsilon C_1 + \left\| Au_s' - f \right\| _2\right) ^2 \\&\le \Vert A \Vert ^2 \varepsilon ^2 C_1^2 + 2 \Vert A \Vert \varepsilon C_1 \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Au_s' - f \right\| _2 + \sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Au_s' - f \right\| _2^2 \nonumber \\&\le \eta + F_{\rho }(u'), \nonumber \end{aligned}$$
(79)

with

$$\begin{aligned} \eta = \left( \Vert A \Vert ^2 \varepsilon C_1^2 + 2 \Vert A \Vert C_1 \Vert f \Vert _2 \right) \varepsilon , \end{aligned}$$
(80)

as given in (76). The first inequality holds since, as a local minimizer of the quadratic penalty relaxation (5), \(u'\) is the global minimizer of \(F_{\rho }\) on \({\mathcal {A}}^{{\mathcal {P}}}\) by Lemma 4 and since \(({{\hat{u}}},\ldots ,{{\hat{u}}}) \in {\mathcal {A}}^{{\mathcal {P}}}\) by construction. The next inequalities apply the triangle inequality, the estimate (74) and standard bounds on operator norms. The last inequality is a consequence of the fact that \(\sum \nolimits _{s=1}^S \tfrac{1}{S} \left\| Au_s' - f \right\| _2 \le \Vert f\Vert _2:\) otherwise, i.e., if \(\Vert Au_s' - f \Vert _2 > \Vert f \Vert _2\) for some s,  choosing \(u_s'=0\) would yield a lower function value which would contradict the minimality of \(u'.\)

Now consider the partitioning \({\mathcal {P}}'\) induced by \({{\hat{u}}},\) and the corresponding minimizer \(u^*,\) i.e.,

$$\begin{aligned} (u^*,\ldots ,u^*) = \mathop {{{\text {argmin}}}}\limits _{u \in {\mathcal {B}}^{{\mathcal {P}}'}} F_{\rho }(u) \end{aligned}$$
(81)

where, for \((u,\ldots ,u) \in {\mathcal {B}}^{{\mathcal {P}}'},\) we have \(F_{\rho }(u,\ldots ,u) = \left\| Au - f \right\| _2^2.\) By Lemma 5, \(u^*\) is a local minimizer of the Potts problem (2). On the other hand, by orthogonality in an inner product space, we have

$$\begin{aligned} Au^*= P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) } f, \quad \text { and }\quad \Vert f- P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) } f\Vert ^2 = \min _{u \in {\mathcal {B}}^{{\mathcal {P}}'}} F_{\rho }(u), \end{aligned}$$
(82)

where \(P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) }\) denotes the orthogonal projection onto the image of \({\mathcal {B}}^{{\mathcal {P}}'}\) under the linear mapping A. Thus,

$$\begin{aligned} \Vert A {{\hat{u}}} - Au^*\Vert ^2&= \Vert A {{\hat{u}}} - P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) } f\Vert ^2 \nonumber \\&= \Vert A {{\hat{u}}} - f\Vert ^2 - \Vert f- P_{A \left( {\mathcal {B}}^{{\mathcal {P}}'}\right) } f\Vert ^2 = \Vert A {{\hat{u}}} - f\Vert ^2 - \Vert A u^*- f\Vert ^2. \end{aligned}$$
(83)

Inserting \(u^*\) in the estimate (79), we get

$$\begin{aligned} F_{\rho }(u') \le F_{\rho }(u^*,\ldots ,u^*) \le F_{\rho }({{\hat{u}}},\ldots ,{{\hat{u}}}) \le \eta + F_{\rho }(u') \le \eta + F_{\rho }(u^*,\ldots ,u^*). \end{aligned}$$
(84)

This allows us to further estimate

$$\begin{aligned} \Vert A {{\hat{u}}} - Au^*\Vert ^2&= \Vert A {{\hat{u}}} - f\Vert ^2 - \Vert A u^*- f\Vert ^2 \le \Vert A u^*- f\Vert ^2+ \eta - \Vert A u^*- f\Vert ^2 = \eta . \end{aligned}$$
(85)

If now the operator A is lower bounded, then

$$\begin{aligned} \Vert {{\hat{u}}} - u^*\Vert ^2 \le \frac{1}{c^2} \Vert A {{\hat{u}}} - Au^*\Vert ^2 \le \frac{\eta }{c^2} \end{aligned}$$
(86)

which completes the proof. \(\square \)

3.4 Majorization–Minimization for Multivariate Potts Problems

In this part, we lay the foundation for the convergence analysis of Algorithms 1 and 2.

We first recall some basics on surrogate functionals. We consider functionals F(u) of the form \(F(u) = \Vert Xu-z\Vert ^2+ \gamma J(u),\) where X is a given (measurement) matrix with operator norm \(\Vert X\Vert <1\) (with the operator norm formed w.r.t. the \(\ell ^2\) norm), z is a given vector (of data), J is an arbitrary (not necessarily convex) lower semicontinuous functional, and \(\gamma >0\) is a parameter. In general, the surrogate functional \(F^{\mathrm{surr}}(u,v)\) of F(u) is given by

$$\begin{aligned} F^{\mathrm{surr}}(u,v) = F(u) + \Vert u-v\Vert ^2 - \Vert Xu-Xv\Vert ^2. \end{aligned}$$
(87)

Lemma 7

Consider the functionals \(F(u) = \Vert Xu-z\Vert ^2+ \gamma J(u)\) as above with \(\Vert X\Vert <1.\) (For our purposes, J is the regularizer \(\Vert D(u)\Vert _{0,\omega }\) given by (10).) Then, we get for the associated surrogate functional \(F^{\mathrm{surr}}\) given by (87) (with J as regularizer), that

  i.

    the inequality

    $$\begin{aligned} F^{\mathrm{surr}}(u,v) \ge F(u) \end{aligned}$$

    holds for all v;  and \(F^{\mathrm{surr}}(u,v) = F(u)\) holds if and only if \(u=v;\)

  ii.

    the functional values \(F(u^k)\) of the sequence \(u^k\) given by the surrogate iteration \(u^{k+1} = {\text {argmin}}_u F^{\mathrm{surr}}(u,u^k)\) are non-increasing, i.e.,

    $$\begin{aligned} F(u^{k+1}) \le F(u^{k}); \end{aligned}$$
    (88)
  iii.

    the distance between consecutive members of the previous surrogate sequence \(u^k\) converges to 0,  i.e.,

    $$\begin{aligned} \lim _{k \rightarrow \infty } \Vert u^{k+1}-u^k\Vert = 0. \end{aligned}$$
    (89)

We note that, when minimizing F,  the condition \(\Vert X\Vert <1\) can always be achieved by rescaling, i.e., by dividing the functional F by a number which is larger than \(\Vert X\Vert ^2.\) Proofs of the general statements above on surrogate functionals (which do not rely on the specific structure of the problems considered here) may for instance be found in the above-mentioned papers [9, 28, 33].
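For illustration, the following sketch runs the surrogate iteration of Lemma 7 for the sparsity regularizer \(J(u)=\Vert u\Vert _0\) (chosen here instead of the directional Potts regularizer purely to keep the surrogate minimization explicit); the monotonicity (88) is checked in every step.

```python
# Sketch of the surrogate iteration of Lemma 7 with J(u) = ||u||_0 (a simpler regularizer
# chosen here only to make the surrogate minimizer explicit; the paper uses the
# directional Potts regularizer instead). The monotonicity (88) is checked in every step.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
X /= 1.01 * np.linalg.norm(X, 2)           # rescale so that ||X|| < 1
u_true = np.zeros(50)
u_true[:3] = [3.0, -2.0, 1.5]
z = X @ u_true
gamma = 0.05

def F(u):                                   # F(u) = ||Xu - z||^2 + gamma * ||u||_0
    return np.sum((X @ u - z) ** 2) + gamma * np.count_nonzero(u)

u = np.zeros(50)
for _ in range(200):
    h = u + X.T @ (z - X @ u)               # F^surr(., u) equals ||. - h||^2 + gamma ||.||_0 + const
    u_new = np.where(h ** 2 > gamma, h, 0.0)  # exact minimizer: componentwise hard threshold
    assert F(u_new) <= F(u) + 1e-12           # monotonicity, cf. (88)
    u = u_new
print(np.nonzero(u)[0], F(u))
```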

We now employ properties of the quadratic penalty relaxation \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) of the Potts energy given by (5). The strategy is similar to the authors’ approach for the univariate case in [90]. We first show that the minimizers of \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) (with \(B= \mathrm {id}\) in (11)), which are precisely the solutions of (16), have a minimal directional jump height which only depends on the scale parameter \(\gamma ,\) the directional weights \(\omega _s\) and the constant \(L_{\rho }\) but not on the particular input data. Here, for the multivariate discrete function \(u= (u_1,\ldots ,u_S)\) (and the directional system \(a_s,\) \(s=1,\ldots ,S\)) a directional jump is a jump in the sth component \(u_s\) in direction \(a_s\) for some s. In particular, jumps of \(u_s\) in directions \(a_{s'}\) with \(s'\ne s\) are not considered.

Lemma 8

We consider the function \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) of (11) for the choice \(B=\mathrm {id}\) and data \(h=(h_1,\ldots ,h_S)\). In other words, we consider the problem (16) for arbitrary data \(h=(h_1,\ldots ,h_S)\). Then, there is a constant \(c>0\) which is independent of the minimizer \(u^*=(u_1^*,\ldots ,u_S^*)\) of (16) and the data h such that the minimal directional jump height \(j_{\min }(u^*)\) (w.r.t. the directional system \(a_s,\) \(s=1,\ldots ,S,\)) of a minimizer \(u^{*}\) fulfills

$$\begin{aligned} j_{\min }(u^*) \ge c. \end{aligned}$$
(90)

The constant c depends on \(\gamma ,\) the directional weights \(\omega _s\) and the constant \(L_{\rho }.\)

Proof

Writing \( u = (u_1,\ldots ,u_S)\), we restate (16) as the problem of minimizing

$$\begin{aligned} P^{\mathrm {id}}_{\gamma /L_{\rho }^2}(u_1,\ldots ,u_S) = \left\| u - h \right\| _2^2 + \frac{\gamma }{L_{\rho }^2} \ \Big \Vert \ D(u_1,\ldots ,u_S) \ \Big \Vert _{0,\omega } \end{aligned}$$
(91)

where we use the notation \(\Vert D(u_1,\ldots ,u_S)\Vert _{0,\omega } = \sum _{s=1}^S \omega _s \left\| \nabla _{a_s} u_s \, \right\| _0\) introduced in (10). We let

$$\begin{aligned} c = \sqrt{\tfrac{\gamma \ \min _{s \in \{1,\ldots ,S\}} \omega _s }{L_{\rho }^2 W}}, \end{aligned}$$
(92)

where W denotes the maximal length of the signal u per dimension (e.g., if u denotes an \(l \times b\) image, then \(W=\max (l,b)\).) We now assume that \(j_{\min }(u^*) < c,\) which means that the minimizer \(u^*\) has a directional jump of height smaller than c. For such \(u^*,\) we construct an element \(u'\) with a smaller \(P^{\mathrm {id}}_{\gamma /L_{\rho }^2}\) value which yields a contradiction since \(u^*\) is a minimizer of \(P^{\mathrm {id}}_{\gamma /L_{\rho }^2}\). To this end, we let \(a_s\) be a direction such that the component \(u^*_s\) of \(u^*\) has a jump of height smaller than c. We denote the (discrete) directional intervals in direction \(a_s\) adjacent to this directional jump by \(I_1,I_2\) and the grid points on either side of the jump of \(u_s^*\) by \(x_1\) and \(x_2.\) We let \(m_1,m_2\) and m be the mean of \(h_s\) on \(I_1,I_2\) and \(I_1 \cup I_2\), respectively. We define

$$\begin{aligned} u_{s'}' = u_{s'}^*\quad \text {if} \quad s' \ne s, \qquad \text { and } \qquad u_s'(x) = {\left\{ \begin{array}{ll} m &{} \text { for } x \in I_1 \cup I_2 \\ u_s^*(x) &{} \text { elsewhere. } \end{array}\right. } \end{aligned}$$
(93)

By construction, \(\Vert \nabla _{a_s} u_s'\Vert _0 = \Vert \nabla _{a_s} u_s^*\Vert _0 -1,\) and thus

$$\begin{aligned} \Vert D(u'_1,\ldots ,u'_S)\Vert _{0,\omega } = \Vert D(u^*_1,\ldots ,u^*_S)\Vert _{0,\omega } - \omega _s \le \Vert D(u^*_1,\ldots ,u^*_S)\Vert _{0,\omega } - \min _{s \in \{1,\ldots ,S\}} \omega _s. \end{aligned}$$
(94)

Since \(u^*\) is a minimizer of \(P^{\mathrm {id}}_{\gamma /L_{\rho }^2},\) its sth component \(u^*_s\) equals \(m_1\) on \(I_1\) and \(m_2\) on \(I_2.\) Further, as \(u_{s'}' = u_{s'}^*\) if \(s' \ne s\) and \(u_s^*\) and \(u_s'\) only differ on \(I_1 \cup I_2,\) we have that

$$\begin{aligned} \Vert u'-h\Vert ^2 = \sum _{s'=1}^S \Vert u_{s'}'-h_{s'}\Vert ^2&= \sum _{s'=1, s' \ne s }^S \Vert u_{s'}^*-h_{s'}\Vert ^2 +\Vert u_s^*-h_s\Vert ^2 \nonumber \\&\quad + l_1 |m_1-m|^2 + l_2 |m_2-m|^2 \nonumber \\&< \Vert u^*-h\Vert ^2 + W c^2, \end{aligned}$$
(95)

where \(l_1,l_2\) denote the lengths of \(I_1\) and \(I_2,\) respectively. Here we used that \(l_1 |m_1-m|^2 + l_2 |m_2-m|^2 = \tfrac{l_1 l_2}{l_1+l_2} |m_1-m_2|^2 \le \min (l_1,l_2) \, |m_1-m_2|^2 < W c^2,\) since the jump height \(|m_1-m_2|\) of \(u^*_s\) is smaller than c. Employing (94) together with (95), we get

$$\begin{aligned} P^{\mathrm {id}}_{\gamma /L_{\rho }^2}(u'_1,\ldots ,u'_S)&= \left\| u' - h \right\| _2^2 + \frac{\gamma }{L_{\rho }^2} \ \Big \Vert \ D(u'_1,\ldots ,u'_S) \ \Big \Vert _{0,\omega } \\&< \Vert u^*-h\Vert ^2 + W c^2 + \frac{\gamma }{L_{\rho }^2}\Vert D(u^*_1,\ldots ,u^*_S)\Vert _{0,\omega } - \frac{\gamma }{L_{\rho }^2}\min _{s \in \{1,\ldots ,S\}} \omega _s \\&\le \Vert u^*-h\Vert ^2 + \frac{\gamma }{L_{\rho }^2}\Vert D(u^*_1,\ldots ,u^*_S)\Vert _{0,\omega } = P^{\mathrm {id}}_{\gamma /L_{\rho }^2}(u^*_1,\ldots ,u^*_S). \end{aligned}$$

The validity of the last inequality follows by (92). Together, \(u'\) has a smaller function value than \(u^*,\) contradicting the minimality of \(u^*.\) This shows the assertion. \(\square \)
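The merging argument of the proof can be checked numerically: the following small sketch (illustrative values only) compares the increase of the data term with the decrease of the jump penalty when a jump of height below c is removed.

```python
# Numerical illustration (values chosen arbitrarily) of the merging step: removing a
# directional jump of height below c increases the data term by less than the penalty saved.
import numpy as np

gamma, omega_min, L_rho_sq, W = 1.0, 1.0, 4.0, 10.0
c = np.sqrt(gamma * omega_min / (L_rho_sq * W))   # the constant of (92)

l1, l2 = 4, 6                       # lengths of the intervals I_1, I_2 (both at most W)
m1 = 0.0
m2 = m1 + 0.9 * c                   # a jump of height below c
m = (l1 * m1 + l2 * m2) / (l1 + l2)

data_increase = l1 * (m1 - m) ** 2 + l2 * (m2 - m) ** 2   # = l1*l2/(l1+l2) * (m1 - m2)^2
penalty_saved = gamma / L_rho_sq * omega_min              # one directional jump less, cf. (94)
print(data_increase < penalty_saved)                      # True: merging decreases the energy
```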

Proposition 3

The iteration (19) of Algorithm 1 converges to a local minimizer of the quadratic penalty relaxation \(P_{\gamma , \rho }\) of the Potts objective function given by (5). The convergence rate is linear.

Proof

We divide the proof into three parts. First, we show that the directional partitionings induced by the iterates \(u^{(n)}\) become fixed after sufficiently many iterations. In a second part, we derive the convergence of Algorithm 1 and, in a third part, we show that the limit point is a local minimizer of \(P_{\gamma , \rho }\).

(1) We first show that the directional partitioning \({\mathcal {I}}^n\) induced by the iterates \(u^{(n)}\) gets fixed for large n. For every \(n \in {\mathbb {N}},\) the iterate \(u^{(n)}\) of Algorithm 1 is a minimizer of the function \(P_{\gamma , \rho }\) of (11) for the choice \(B=\mathrm {id}\) as it appears in (16). Here, the data \(h=(h_1,\ldots ,h_S)\) is given by (17). By Lemma 8, there is a constant \(c>0\) which is independent of the particular \(u^{(n)} =(u_1^{(n)},\ldots ,u_S^{(n)})\) of (16) and the data h such that the minimal directional jump height \(j_{\min }(u^{(n)})\) fulfills

$$\begin{aligned} j_{\min }(u^{(n)}) \ge c \quad \text { for all }\quad n \in {\mathbb {N}}. \end{aligned}$$
(96)

We note that the parameter \(\gamma ,\) the directional weights \(\omega _s\) and the constant \(L_{\rho }\) which the constant c depends on by Lemma 8 do not change during the iteration of Algorithm 1.

If two iterates \(u^{(n)},u^{(n+1)}\) have different induced directional partitionings \({\mathcal {I}}^n, {\mathcal {I}}^{n+1},\) their \(\ell ^\infty \) distance always fulfills \(\Vert u^{(n)}-u^{(n+1)}\Vert _\infty > c/2\) since both \(u^{(n)},u^{(n+1)}\) have minimal jump height of at least c and different induced directional partitionings. This implies \(\Vert u^{(n)}-u^{(n+1)}\Vert _2> c/2\) for the \(\ell ^2\) distance as well. This may only happen for finitely many n since, by (89) of Lemma 7, we have \(\Vert u^{(n)}-u^{(n+1)}\Vert _2 \rightarrow 0\) as n increases. Hence, there is an index N such that, for all \(n \ge N,\) the directional partitionings \({\mathcal {I}}^n\) are identical.

(2) We use the previous observation to show the convergence of Algorithm 1. We consider iterates \(u^{(n)}\) with \(n \ge N;\) they have the same induced directional partitionings which we denote by \({\mathcal {I}}',\) and all jumps have minimal jump height c. Hence, for \(n \ge N,\) the iteration of (16) can be written as

$$\begin{aligned} u^{(n+1)} = P_{{\mathcal {I}}'}(h^{(n)}) \end{aligned}$$
(97)

with \(P_{{\mathcal {I}}'}\) being the orthogonal projection onto the \(\ell ^2\) space \({\mathcal {A}}^{{\mathcal {P}}}\) consisting of functions which are piecewise constant w.r.t. the directional partitioning \({\mathcal {I}}',\) and where \(h^{(n)}\) depends on \(u^{(n)}\) via

$$\begin{aligned} h^{(n)}_s = u^{(n)}_s + \tfrac{1}{SL_{\rho }^2} A^*f - \tfrac{1}{S L_{\rho }^2} A^*A u^{(n)}_s - \sum _{s':s' \ne s}\tfrac{\rho _{s,s'}}{L_{\rho }^2} (u^{(n)}_s-u^{(n)}_{s'}), \quad \text { for all } s \in \{1,\ldots ,S\}, \end{aligned}$$
(98)

as given by (17). As introduced before, we use the symbols \({\tilde{A}}\) to denote the block diagonal matrix with constant entry A on each diagonal component, and \({\tilde{f}}\) for the block vector of corresponding dimensions with entry f in each component. With this notation, we may write (97) as

$$\begin{aligned} u^{(n+1)} = P_{{\mathcal {I}}'}((I - \tfrac{1}{SL_{\rho }^2} ({\tilde{A}})^\mathrm{T} {\tilde{A}}- \tfrac{1}{SL_{\rho }^2} \rho \ C^\mathrm{T}C) u^{(n)} + \tfrac{1}{SL_{\rho }^2} {\tilde{A}}^\mathrm{T}{\tilde{f}}). \end{aligned}$$
(99)

Since \(u^{(n)}\) is piecewise constant w.r.t. the directional partitioning \({\mathcal {I}}',\) we have \(u^{(n)} = P_{{\mathcal {I}}'} u^{(n)}.\) Using this fact and the fact that \(P_{{\mathcal {I}}'}\) is an orthogonal projection we obtain

$$\begin{aligned} u^{(n+1)}= & {} \left( I - \left( \left( \tfrac{{\tilde{A}} P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) ^\mathrm{T} \left( \tfrac{{\tilde{A}} P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) + \left( \tfrac{\sqrt{\rho }C P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) ^\mathrm{T} \left( \tfrac{\sqrt{\rho }C P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) \right) \right) u^{(n)} \nonumber \\&+ \left( \tfrac{{\tilde{A}} P_{{\mathcal {I}}'}}{\sqrt{S} L_{\rho }} \right) ^\mathrm{T} \tfrac{{\tilde{f}}}{{\sqrt{S} L_{\rho }}}. \end{aligned}$$
(100)

Since \(C {\tilde{A}}^\mathrm{T}{\tilde{f}} = 0,\) the iteration (100) can be interpreted as Landweber iteration for the block matrix consisting of the upper block \(({\tilde{A}} P_{{\mathcal {I}}'})/(\sqrt{S} L_{\rho })\) and the lower block \((\sqrt{\rho }C P_{{\mathcal {I}}'})/(\sqrt{S} L_{\rho })\) and data \({\tilde{f}}/(\sqrt{S} L_{\rho })\) extended by 0. The Landweber iteration converges at a linear rate; cf., e.g., [31]. Thus, the iteration (97) converges and, in turn, we get the convergence of Algorithm 1 at a linear rate to some limit \(u^*\).

(3) We show that \(u^*\) is a local minimizer. Since \(u^*\) is the limit of the iterates \(u^{(n)},\) the jumps of \(u^*\) also have minimal height c,  the number of jumps equals that of the \(u^{(n)}\) for all \(n \ge N,\) and the induced directional partitioning \({\mathcal {I}}^*\) equals the partitioning \({\mathcal {I}}'\) of the \(u^{(n)}\) for \(n \ge N.\) Since \(u^*\) equals the limit of the above Landweber iteration, \(u^*\) minimizes \(F_\rho \) given by (62) on \({\mathcal {A}}^{{\mathcal {P}}_{{\mathcal {I}}'}}.\) Then, by Lemma 4,  \(u^*\) is a local minimizer of the relaxed Potts energy \(P_{\gamma ,\rho }\) which completes the proof. \(\square \)
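The Landweber interpretation in part (2) of the proof can be illustrated by a generic example. The following sketch (with a random operator M of norm below one standing in for the stacked block matrix of (100), an assumption made purely for illustration) shows the typical linear decay of the error.

```python
# Generic illustration of the Landweber iteration behind (100): a random operator M with
# ||M|| < 1 stands in for the stacked block matrix (an assumption for illustration only).
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((30, 10))
M /= 1.5 * np.linalg.norm(M, 2)             # ensure ||M|| < 1
g = rng.standard_normal(30)

u_star, *_ = np.linalg.lstsq(M, g, rcond=None)   # least-squares limit of the iteration
u = np.zeros(10)
errors = []
for _ in range(100):
    u = u + M.T @ (g - M @ u)               # u^(n+1) = (I - M^T M) u^(n) + M^T g
    errors.append(np.linalg.norm(u - u_star))
print(errors[10] / errors[9], errors[60] / errors[59])   # nearly constant ratio: linear rate
```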

After having shown the convergence of Algorithm 1 to a local minimizer, we have now gathered all information to show Theorem 4.

Proof of Theorem 4

Assertion i. was stated and shown as Proposition 1 in Sect. 3.3. By Proposition 3, Algorithm 1 produces a local minimizer. Then, the assertion ii. is a consequence of Proposition 2. \(\square \)

3.5 Estimating the Distance Between the Objectives

The next lemma is a preparation for the proof of item (iii) of Theorem 3.

Lemma 9

We consider Algorithm 1 for the quadratic penalty relaxation (5) of the multivariate Potts problem. For any output \(u =(u_1,\ldots ,u_S)\) of Algorithm 1 we have that

$$\begin{aligned} \left( \sum \nolimits _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2\right) ^{\tfrac{1}{2}} \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert / \rho . \end{aligned}$$
(101)

Here, \(\sigma _1\) denotes the smallest nonzero eigenvalue of \(C^\mathrm{T}C\) with C given by (49).

Proof

Since \(u =(u_1,\ldots ,u_S)\) is the output of Algorithm 1 it is a local minimizer of the relaxed Potts problem (5). In particular, there is a directional partitioning \({\mathcal {I}}\) with respect to which u is piecewise constant. We denote the induced partitioning by \({\mathcal {P}} ={\mathcal {P}}_{{\mathcal {I}}}.\) By Lemma 6, we have

$$\begin{aligned} \left( \sum \nolimits _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2\right) ^{\tfrac{1}{2}} = \Vert C u \Vert = \Vert C P_{{\mathcal {I}}} u \Vert \le \tfrac{1}{\rho }\Vert \mu ^*\Vert + \sqrt{\tfrac{F_{\rho }(u)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }},\nonumber \\ \end{aligned}$$
(102)

where \(\mu ^*\) is a Lagrange multiplier of (48). By Lemma 3, we may choose \(\mu ^*\) such that \(\Vert \mu ^*\Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert ,\) for any partitioning of the discrete domain \(\Omega ,\) and in particular for the partitioning \({\mathcal {P}} ={\mathcal {P}}_{{\mathcal {I}}}.\) This shows that

$$\begin{aligned} \Vert Cu\Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert / \rho + \sqrt{\tfrac{F_{\rho }(u)- \min _{x \in {\mathcal {A}}^{{\mathcal {P}}}} F_{\rho }(x)}{\rho }}. \end{aligned}$$

Since u is a local minimizer of the relaxed Potts problem (5), it is a minimizer of \(F_{\rho }\) on \({\mathcal {A}}^{{\mathcal {P}}}\) by Lemma 4, and the second summand on the right-hand side equals zero. This shows (101) and completes the proof. \(\square \)

We have now gathered all information necessary to show Theorem 3.

Proof of Theorem 3

Part (i) is shown by Proposition 3.

Concerning (ii), we first show that any global minimizer of the relaxed Potts energy \(P_{\gamma , \rho }\) given by (5) appears as a stationary point of Algorithm 1. To this end, we start Algorithm 1 with a global minimizer \({{\bar{u}}}^*=(u_1^*,\ldots ,u_S^*)\) as initialization. Then, we have for all \({{\bar{v}}}=(v_1,\ldots ,v_S)\) with \({{\bar{v}}} \ne {{\bar{u}}}^*,\)

$$\begin{aligned} P_{\gamma , \rho }^{\mathrm{surr}}\left( v_1,\ldots ,v_S,u_1^*,\ldots ,u_S^*\right)&= P_{\gamma , \rho }({{\bar{v}}}) - \Vert B{{\bar{v}}}-B{{\bar{u}}}^*\Vert ^2 + \Vert {{\bar{v}}}-{{\bar{u}}}^*\Vert ^2\\&> P_{\gamma , \rho }({{\bar{v}}}) \ge P_{\gamma , \rho }({{\bar{u}}}^*) = P_{\gamma , \rho }^{\mathrm{surr}} ({{\bar{u}}}^*,{{\bar{u}}}^*). \nonumber \end{aligned}$$
(103)

Here, B is given by (8). The estimate (103) means that \({{\bar{u}}}^*\) is the minimizer of the surrogate functional w.r.t. the first component, i.e., it is the minimizer of the mapping \({{\bar{v}}} \mapsto \) \(P_{\gamma , \rho }^{\mathrm{surr}}({{\bar{v}}},{{\bar{u}}}^*).\) Hence, the iterate \({{\bar{u}}}^{(1)}=(u^{(1)}_1,\ldots ,u^{(1)}_S)\) of Algorithm 1 equals \({{\bar{u}}}^*\) when the iteration is started with \({{\bar{u}}}^*.\) Thus, the global minimizer \({{\bar{u}}}^*\) is a stationary point of Algorithm 1.

It remains to show that each stationary point of Algorithm 1 is a local minimizer of the relaxed Potts energy \(P_{\gamma , \rho }\). This has essentially already been done in the proof of Proposition 3: start the iteration given by (16) with a stationary point \(u';\) its limit equals \(u'\) and is thus a local minimizer by Proposition 3.

Concerning (iii), we use Lemma 9 to estimate

$$\begin{aligned} \left( \sum \nolimits _{s,s'} c_{s,s'} \Vert u_s - u_{s'}\Vert ^2_2\right) ^{\tfrac{1}{2}} \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert / \rho < \varepsilon . \end{aligned}$$
(104)

The second inequality follows by our choice of \(\rho \) in (34) as \(\rho > 2 \varepsilon ^{-1} \ \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert .\) This shows the validity of (iii) and completes the proof. \(\square \)

3.6 Convergence Analysis of Algorithm 2

We start out by showing that Algorithm 2 is well defined in the sense that the inner iteration governed by (25) terminates. This result was formulated as Theorem 5.

Proof of Theorem 5

We have to show that, for any \(k \in {\mathbb {N}},\) there is \(n \in {\mathbb {N}}\) such that

$$\begin{aligned} \left\| u^{(k,n)}_s - u^{(k,n)}_{s'} \right\| \le \frac{t}{\rho _k \sqrt{c_{s,s'}}}, \quad \text { and } \quad \left\| u^{(k,n)}_s - u^{(k,n-1)}_s \right\| \le \frac{\delta _k}{L_{\rho }}. \end{aligned}$$
(105)

To see the right-hand inequality in (105), we notice that, by Proposition 3, the iteration (19) converges to a local minimizer of the quadratic penalty relaxation \(P_{\gamma , \rho }(u_1,\ldots ,u_S)\) of the Potts energy. The inner loop of Algorithm 2 precisely computes the iteration (19) (for the penalty parameter \(\rho _k\) which increases with k). Thus, the distance between consecutive iterates \(u^{(k,n)}_s,u^{(k,n-1)}_s\) converges to zero as n increases, which implies the validity of the right-hand inequality in (105) for sufficiently large n and all \(k \in {\mathbb {N}}.\)

To see the left-hand inequality in (105), we notice that, by the considerations above, the inner loop of Algorithm 2 would converge to a minimizer \({{\bar{u}}}^{(k),*} = (u_1^{(k),*},\ldots ,u_S^{(k),*})\) if it were not terminated by (105), for all \(k \in {\mathbb {N}}.\) Since \({{\bar{u}}}^{(k),*}\) is a local minimizer of the relaxed Potts problem (5) for the parameter \(\rho _k\), it is a minimizer of \(F_{\rho _k}\) on \({\mathcal {A}}^{{\mathcal {P}}}\) (where \({\mathcal {P}}\) denotes the partitioning induced by \({{\bar{u}}}^{(k),*}\)) by Lemma 4. Hence, for any \(k\in {\mathbb {N}}\) and any \(\xi >0\) there is \({{\bar{u}}}^{(k,n)} =(u^{(k,n)}_1,\ldots ,u^{(k,n)}_S)\) such that \(F_{\rho _k}({{\bar{u}}}^{(k,n)})- F_{\rho _k}({{\bar{u}}}^{(k),*}) < \xi .\) We let \(\tau = (t - 2\sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \ \Vert f\Vert ) /\rho _k,\) and choose \(\xi = \rho _k \tau ^2.\) Using this together with Lemma 6, we estimate

$$\begin{aligned} \sqrt{c_{s,s'}} \Vert u_s^{(k,n)} - u_{s'}^{(k,n)}\Vert _2 \le \Vert C {{\bar{u}}}^{(k,n)} \Vert&\le \tfrac{1}{\rho _k}\Vert \mu ^*\Vert + \sqrt{\tfrac{F_{\rho _k}({{\bar{u}}}^{(k,n)})- F_{\rho _k}({{\bar{u}}}^{(k),*})}{\rho _k}} \nonumber \\&\le \tfrac{1}{\rho _k}\Vert \mu ^*\Vert + \sqrt{\tfrac{\xi }{\rho _k}} \le \tfrac{1}{\rho _k}\Vert \mu ^*\Vert + \tau \le \tfrac{t}{\rho _k} \end{aligned}$$
(106)

where \(\mu ^*\) is a Lagrange multiplier of (48) which, by Lemma 3, may be chosen such that \(\Vert \mu ^*\Vert \le 2 \sigma _1^{-1/2} S^{-1/2} \Vert A\Vert \Vert f\Vert .\) Here, the last inequality is true since this bound implies that \(\tau \le (t-\Vert \mu ^*\Vert )/\rho _k.\) The estimate (106) shows the left-hand inequality in (105) and completes the proof. \(\square \)
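The interplay of the outer penalty loop and the two stopping criteria in (105) can be illustrated on a toy problem. The following sketch is a strong simplification (two components, no jump penalty, plain gradient steps as the inner iteration, ad hoc parameter choices) and only mirrors the structure of Algorithm 2, not its actual steps.

```python
# Toy illustration of the penalty-continuation structure of Algorithm 2 (assumptions:
# S = 2, no jump penalty, plain gradient steps as inner iteration, ad hoc parameters);
# it only mirrors the interplay of the outer rho-loop with the stopping criteria (105).
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((15, 8))
f = rng.standard_normal(15)
t = 1.0
u1, u2 = np.zeros(8), np.zeros(8)

for k in range(6):
    rho = 2.0 ** k                          # increasing penalty parameter rho_k
    delta_k = 1.0 / (k + 1) ** 2
    L = np.linalg.norm(A, 2) ** 2 + 2 * rho # a crude bound on the Lipschitz constant
    while True:
        g1 = A.T @ (A @ u1 - f) + rho * (u1 - u2)
        g2 = A.T @ (A @ u2 - f) + rho * (u2 - u1)
        u1_new, u2_new = u1 - g1 / L, u2 - g2 / L
        step = max(np.linalg.norm(u1_new - u1), np.linalg.norm(u2_new - u2))
        u1, u2 = u1_new, u2_new
        if np.linalg.norm(u1 - u2) <= t / rho and step <= delta_k:
            break                           # analogues of the two criteria in (105)
    print(k, rho, np.linalg.norm(u1 - u2))  # the coupling error decays as rho grows
```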

We have now gathered all information to prove Theorem 6 which deals with the convergence properties of Algorithm 2.

Proof of Theorem 6

We start out by showing that any accumulation point of the sequence \(u^{(k)}\) produced by Algorithm 2 is a local minimizer of the Potts problem (3). Let \(u^*\) be such an accumulation point and let \({\mathcal {I}}^*\) be the directional partitioning induced by \(u^*.\) We may extract a subsequence \(u^{(k_l)}\) of the sequence \(u^{(k)}\) such that \(u^{(k_l)}\) converges to \(u^*\) as \(l \rightarrow \infty ,\) and such that the directional partitionings \({\mathcal {I}}^{k_l}\) induced by the \(u^{(k_l)}\) all equal the directional partitioning \({\mathcal {I}}^*,\) i.e., \({\mathcal {I}}^{k_l}={\mathcal {I}}^*\) for all \(l \in {\mathbb {N}}.\) We let

$$\begin{aligned} \mu ^{k_l} = - 2 \rho _{k_l} \ C u^{k_l} \end{aligned}$$
(107)

with the matrix C given by (49), and estimate

$$\begin{aligned} \nonumber \Vert \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{A}} u^{k_l} - \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{f}}- C^\mathrm{T} \mu ^{k_l} \Vert&= \Vert \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{A}} u^{k_l} - \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{f}}+ 2 \rho _{k_l} C^\mathrm{T} \ C u^{k_l}\Vert \\&= \Vert \nabla F_{\rho _{k_l}}(u^{k_l}) \Vert \le \tfrac{\delta _{k_l}}{L_{\rho _{k_l}}} \le \delta _{k_l}. \end{aligned}$$
(108)

We recall that \({\tilde{A}}\) was the block diagonal matrix having the matrix A as entry in each diagonal component and that \(F_{\rho _{k_l}}\) was given by (62). We notice that the second-to-last inequality follows from the right-hand inequality in (105). We further estimate

$$\begin{aligned} \Vert \mu ^{k_l}\Vert = 2 \rho _{k_l} \Vert C u^{k_l}\Vert \le 2 \rho _{k_l} \tfrac{S t}{\rho _{k_l}} = 2 S t \end{aligned}$$

which is a consequence of the left-hand inequality in (105). Hence, the sequence \(\mu ^{k_l}\) is bounded and thus has a cluster point, say \(\mu ^*,\) by the Bolzano–Weierstraß Theorem. By passing to a further subsequence (where we suppress the new indexation for better readability and still use the symbol l for the index), we get that

$$\begin{aligned} \mu ^{k_l} \rightarrow \mu ^*\quad \text { as }\quad l \rightarrow \infty . \end{aligned}$$
(109)

Now, on this subsequence, we have that \(u^{(k_l)} \rightarrow u^{*}\) and that \(\mu ^{k_l} \rightarrow \mu ^*.\) Hence, taking limits on both sides of (108) yields

$$\begin{aligned} \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{A}} u^{*} - \tfrac{2}{S}{\tilde{A}}^\mathrm{T} {\tilde{f}} - C^\mathrm{T} \mu ^{*} = 0, \end{aligned}$$
(110)

since \(\delta _{k_l} \rightarrow 0\) as \(l \rightarrow \infty .\) Further,

$$\begin{aligned} \Vert Cu^*\Vert \le \lim _{l \rightarrow \infty }\tfrac{\Vert \mu ^{k_l}\Vert }{\rho _{k_l}} \le \Vert \mu ^*\Vert \lim _{l \rightarrow \infty }\tfrac{1}{\rho _{k_l}} =0. \end{aligned}$$
(111)

This implies that the components of \(u^*\) are equal, i.e., \(u_s^{*} = u_{s'}^{*}\) for all \(s,s'.\) In particular, \(u^*\) is a feasible point for the Potts problem (3). Put differently, letting \({\mathcal {P}}^*\) be the partitioning induced by \(u^*,\) we have \(u^*\in {\mathcal {B}}^{{\mathcal {P}}^*}.\) Then, (110) shows that \(u^*\) minimizes (48), which by Lemma 5 means that \(u^*\) is a local minimizer of (3) or, synonymously, that each component of \(u^*\) (the components are all equal) is a local minimizer of the Potts problem (2). This shows the first assertion of Theorem 6.

We continue by showing the second assertion of Theorem 6, i.e., that if A is lower bounded, then the sequence \(u^{(k)}\) produced by Algorithm 2 has a cluster point; by the above considerations, each cluster point is then a local minimizer, which shows the assertion. To this end, we show that, if A is lower bounded, the sequence \(u^{(k)}\) is bounded, which by the Heine–Borel property of finite-dimensional Euclidean space implies that it has a cluster point. So we assume that A is lower bounded and consider the sequence \(u^{(k)}=(u_1^{(k)},\ldots ,u_S^{(k)})\) produced by Algorithm 2. As in the proof of Theorem 5 we see that, for any \(k \in {\mathbb {N}}\), there is a local minimizer \( u^{(k),*} = (u_1^{(k),*},\ldots ,u_S^{(k),*})\) of (5) such that

$$\begin{aligned} \Vert u^{(k)}- u^{(k),*}\Vert \le C_2 \delta _k, \end{aligned}$$
(112)

where \(C_2\) is a constant independent of k. By Lemma 4, \( u^{(k),*}\) is a minimizer of \(F_{\rho _k}\) on \({\mathcal {A}}^{{\mathcal {P}}}\) (where \({\mathcal {P}}\) denotes the partitioning induced by \( u^{(k),*}\)). Hence,

$$\begin{aligned} \tfrac{1}{S} \sum \nolimits _{s=1}^S\Vert A u_s^{(k),*} - f \Vert ^2 \le F_{\rho _k}( u^{(k),*}) \le \Vert f \Vert ^2 \end{aligned}$$

by choosing, as a candidate, the element whose components are all the zero function. Using \(\Vert A u_s^{(k),*}\Vert ^2 \le 2\Vert A u_s^{(k),*} - f\Vert ^2 + 2\Vert f\Vert ^2,\) this implies

$$\begin{aligned} \tfrac{1}{S} \sum \nolimits _{s=1}^S\Vert A u_s^{(k),*} \Vert ^2 \le 4 \Vert f \Vert ^2. \end{aligned}$$
(113)

Then, since A is lower bounded, there is a constant \(c>0\) such that

$$\begin{aligned} \Vert u^{(k),*}\Vert ^2 = \tfrac{1}{S} \sum \nolimits _{s=1}^S\Vert u_s^{(k),*}\Vert ^2 \le \tfrac{1}{S} \sum \nolimits _{s=1}^S c^2\Vert A u_s^{(k),*}\Vert ^2 \le 4 c^2 \Vert f \Vert ^2 \end{aligned}$$
(114)

where we used (113) for the last inequality. Combining this estimate with (112) yields

$$\begin{aligned} \Vert u^{(k)}\Vert \le \Vert u^{(k)}- u^{(k),*}\Vert + \Vert u^{(k),*}\Vert \le C_2 \delta _k + 2 c \Vert f\Vert . \end{aligned}$$
(115)

Since we have chosen \(\delta _k\) as a sequence converging to zero, (115) shows that the sequence \(u^{(k)}\) is bounded which implies that it has cluster points. This completes the proof. \(\square \)

4 Numerical Results

In this section, we show the applicability of our methods to different imaging tasks. We start out by providing the necessary implementation details. Then, we compare the results of the quadratic relaxation (5) (Algorithm 1) to the ones of the Potts problem (2) (Algorithm 2). Next, we apply Algorithm 2 to blurred image data and to image reconstruction from incomplete Radon data. Finally, we consider the image partitioning problem according to the classical Potts model.

Implementation Details We implemented Algorithms 1 and 2 for the coupling schemes in (7) and the set of compass and diagonal directions \( (1,0),(0,1),(1,1),(1,-1)\) with weights \(\omega _{1,2} = \sqrt{2}-1\) and \(\omega _{3,4} = 1-\frac{\sqrt{2}}{2}\).

Concerning Algorithm 1, we observed visually and quantitatively appealing results when using the relaxed step sizes \(L_\rho ^\lambda = L_\rho [\lambda + (1-(n+1)^{-1/2})(1-\lambda ) ]\) for an empirically chosen parameter \(0<\lambda \le 1\), where \(L_\rho \) denotes the estimate in Lemma 1. The iterations were stopped when the nearness condition (18) was fulfilled and the iterates did not change anymore, i.e., when both \(\Vert u_1^{(n)} - u_1^{(n-1)}\Vert /(\Vert u_1^{(n)}\Vert + \Vert u_1^{(n-1)}\Vert )\) and \(\Vert u_2^{(n)} - u_2^{(n-1)}\Vert /(\Vert u_2^{(n)}\Vert + \Vert u_2^{(n-1)}\Vert )\) were smaller than \(10^{-6}\). The result of Algorithm 1 was transformed into a feasible solution of (3) by applying the projection procedure described in Sect. 2.2 (Procedure 1). For initialization, we applied 1000 Landweber iterations with step size \(1/\Vert A\Vert ^2\) to the least squares problem induced by the linear operator A and the data f.
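For illustration, the following Python sketch implements the relaxed step-size schedule and the Landweber initialization just described; the function names and the callable interface for A and its adjoint are hypothetical and serve only to make the above choices concrete.

```python
import numpy as np

def relaxed_step_size(L_rho, lam, n):
    # Relaxed step size L_rho^lambda = L_rho * [lambda + (1 - (n+1)^(-1/2)) * (1 - lambda)]
    # with the estimate L_rho from Lemma 1 and an empirical parameter 0 < lambda <= 1.
    return L_rho * (lam + (1.0 - (n + 1) ** (-0.5)) * (1.0 - lam))

def landweber_init(A, At, f, norm_A, iters=1000):
    # 1000 Landweber iterations with step size 1/||A||^2 applied to the least
    # squares problem induced by A and f; A and At are callables for A and A^T.
    u = np.zeros_like(At(f))
    step = 1.0 / norm_A ** 2
    for _ in range(iters):
        u = u - step * At(A(u) - f)
    return u
```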

Concerning Algorithm 2, we set \(\rho ^{(0)} = 10^{-3}\) in all experiments which we incremented by the factor \(\tau = 1.05\) in each outer iteration. The \(\delta \)-sequence was chosen as \(\delta ^{(k)} = \frac{1}{\eta \rho ^{(k)}}\) for \(\eta = 0.95\) when coupling all variables and \(\eta = 0.98\) when coupling consecutive variables. Similarly to Algorithm 1, step A of Algorithm 2 was performed using the relaxed step sizes \(L_\rho ^\lambda = L_\rho [\lambda + (1-(n+1)^{-1/2})(1-\lambda ) ]\) for an application-dependent parameter \(0<\lambda \le 1\) and for the estimate \(L_\rho \) in Lemma 1. We stopped the iterations when the relative discrepancy of the first two splitting variables \(\Vert u^{(k)}_1 - u^{(k)}_2 \Vert / (\Vert u^{(k)}_1 \Vert + \Vert u^{(k)}_2 \Vert ) \) was smaller than \(10^{-6}\). We initialized Algorithm 2 with \(A^\mathrm{T}f\).
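As a further illustration, the outer loop of Algorithm 2 with the above parameter choices can be organized as in the following sketch; `inner_solve`, which stands for step A of Algorithm 2, is a placeholder, and the interface is an assumption made for this sketch.

```python
import numpy as np

def algorithm2_outer_loop(inner_solve, u0, rho0=1e-3, tau=1.05, eta=0.95,
                          tol=1e-6, max_outer=10000):
    # Outer penalty loop: rho is increased by the factor tau in each iteration,
    # and the inner accuracy delta^(k) = 1/(eta * rho^(k)) is tightened accordingly.
    u, rho = u0, rho0
    for _ in range(max_outer):
        delta = 1.0 / (eta * rho)
        u = inner_solve(u, rho, delta)      # step A of Algorithm 2 (placeholder)
        u1, u2 = u[0], u[1]                 # first two splitting variables
        rel = np.linalg.norm(u1 - u2) / (np.linalg.norm(u1) + np.linalg.norm(u2))
        if rel < tol:                       # splitting variables have merged numerically
            break
        rho *= tau
    return u
```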

Comparison of Algorithm 1 and Algorithm 2 We compare Algorithm 1 and Algorithm 2 for blurred image data, that is, the linear operator A in (1) amounts to convolution with a kernel K. In the present experiment, we chose a Gaussian kernel with standard deviation \(\sigma =7\) and of size \(6\sigma + 1\). Here, we coupled all splitting variables and chose the step-size parameter \(\lambda =0.4\) for Algorithm 1 and \(\lambda = 0.35\) for Algorithm 2, respectively. In Fig. 1, we applied both methods to a blurred natural image. While both algorithms yield reasonable partitionings, Algorithm 2 provides smoother edges than Algorithm 1. Further, Algorithm 1 produces some smaller segments (at the treetops).
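A blur operator of this type can, for instance, be realized with SciPy as in the minimal sketch below; there, `truncate=3.0` yields a kernel support of \(6\sigma +1\) pixels, while the boundary handling is an assumption and may differ from our experimental setup.

```python
from scipy.ndimage import gaussian_filter

def gaussian_blur(u, sigma=7.0):
    # Convolution with a Gaussian kernel of standard deviation sigma;
    # truncate=3.0 corresponds to a kernel of size 6*sigma + 1.
    return gaussian_filter(u, sigma=sigma, truncate=3.0, mode='nearest')
```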

Fig. 1

Results of Algorithms 1 and 2 for partitioning an image blurred by a Gaussian kernel of standard deviation 3. Both methods provide reasonable partitionings. Algorithm 2 provides smoother edges than Algorithm 1 (e.g., the boundary between the meadow and the forest, the back of the cow). In addition, Algorithm 1 produces some smaller segments around the treetops

Application to Blurred Data For the following experiments, we focus on Algorithm 2. In case of motion blur we set the step-size parameter to \(\lambda = 0.25\), while for Gaussian blur we set \(\lambda =0.35\) as in Fig. 1. We compare our method with the Ambrosio–Tortorelli approximation [2] of the classical Mumford–Shah model (which itself tends to the piecewise constant Mumford–Shah model for increasing variation penalty) given by

$$\begin{aligned} \begin{aligned} A_\varepsilon (u,v) = \gamma \int \varepsilon \vert \nabla v \vert ^2 +\frac{(v-1)^2}{4\varepsilon } \mathrm {d}x +\alpha \int v^2 \Vert \nabla u \Vert ^2 \mathrm {d}x + \frac{1}{2} \int (K *u - f)^2 \mathrm {d}x. \end{aligned} \end{aligned}$$
(116)

The variable v serves as an edge indicator and \(\varepsilon >0\) is an edge smoothing parameter that is chosen empirically. The parameter \(\gamma > 0\) controls the weight of the edge length penalty and the parameter \(\alpha > 0\) penalizes the variation. In this respect, a higher value of \(\alpha \) promotes solutions which are closer to being piecewise constant. In the limit \(\alpha \rightarrow \infty \), minimizers of (116) are piecewise constant. Our implementation follows the scheme presented in [6]. The functional \(A_\varepsilon \) is alternately minimized w.r.t. u and v. To this end, we iteratively solve the Euler–Lagrange equations

$$\begin{aligned} \begin{aligned} 2\alpha v \Vert \nabla u \Vert _2^2 + \gamma \frac{v-1}{2\varepsilon } - 2\varepsilon \gamma \nabla ^2 v&= 0, \\ (K*u - f) *{\widetilde{K}} - 2\alpha \mathrm {div}(v^2 \nabla u)&= 0, \end{aligned} \end{aligned}$$
(117)

where \({\widetilde{K}}(x) = K(-x)\). The first equation is solved for v using a MINRES solver, and the second is solved for u using the method of conjugate gradients [6]. The iterations were stopped when the relative change of both variables was small, i.e., when both \(\Vert u^{k+1} - u^k \Vert /(\Vert u^k\Vert + 10^{-6}) <10^{-3}\) and \(\Vert v^{k+1} - v^k \Vert /(\Vert v^k\Vert + 10^{-6}) <10^{-3}\).
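To make the v-update in (117) concrete, the following sketch performs one such update on a regular grid by a matrix-free MINRES solve; the finite-difference discretization and the replicate-type boundary handling are assumptions made for this sketch and not necessarily those of [6].

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def update_edge_indicator(u, alpha, gamma, eps):
    # One v-update of the alternating scheme: solve
    # (2*alpha*|grad u|^2 + gamma/(2*eps)) * v - 2*eps*gamma * Laplace(v) = gamma/(2*eps).
    m, n = u.shape
    gx = np.diff(u, axis=0, append=u[-1:, :])   # forward differences of u
    gy = np.diff(u, axis=1, append=u[:, -1:])
    grad_sq = gx ** 2 + gy ** 2                 # squared gradient magnitude |grad u|^2

    def laplacian(v):
        p = np.pad(v, 1, mode='edge')           # replicate boundary values
        return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * v

    def matvec(x):
        v = x.reshape(m, n)
        out = (2 * alpha * grad_sq + gamma / (2 * eps)) * v - 2 * eps * gamma * laplacian(v)
        return out.ravel()

    op = LinearOperator((m * n, m * n), matvec=matvec)
    rhs = np.full(m * n, gamma / (2 * eps))
    v, _ = minres(op, rhs)
    return v.reshape(m, n)
```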

Figure 2 shows the restoration of a traffic sign from simulated horizontal motion blur. For the Ambrosio–Tortorelli approximation, we set \(\alpha = 10^5\) to promote a piecewise constant solution. We observe that both the Ambrosio–Tortorelli approximation and the proposed method restore the data to a human-readable form. However, the Ambrosio–Tortorelli result shows clutter and blur artifacts, whereas our method provides sharp edges and produces fewer artifacts.

In Fig. 3, we partition a natural image blurred by a Gaussian kernel and corrupted by Gaussian noise. We observed that the Ambrosio–Tortorelli result was heavily corrupted by artifacts for large values of the variation penalty \(\alpha \). This might be attributed to the underlying linear systems in scheme (117), which become severely ill-conditioned for large choices of \(\alpha \). Therefore, we chose the moderate variation penalty \(\alpha =10^{5}\), which only provides an approximately piecewise constant (rather, piecewise smooth) result. This result does not fully separate the background from the fish in terms of edges. The result of the proposed method, on the other hand, sharply differentiates between background and fish and highlights various segments of the fish.

Fig. 2

Restoration from simulated horizontal motion blur of length 80 pixels and Gaussian noise with \(\sigma =0.02\). The result of the Ambrosio–Tortorelli scheme exhibits noisy and blurred artifacts and bumpy edges (e.g., the boundaries of the digits). The contours of the proposed result are sharp, and considerably less clutter is present

Fig. 3

Partitioning of an image blurred by a Gaussian kernel of standard deviation 7 and corrupted by Gaussian noise with \(\sigma =0.2\). The result of the Ambrosio–Tortorelli approximation does not yield a convincing partitioning of the scene, in particular many parts of the fish are merged with the background. The proposed approach provides a partitioning which reflects many parts of the fish

Fig. 4

Reconstruction of the Shepp–Logan phantom from undersampled Radon data (25 projection angles) corrupted by Gaussian noise with \(\sigma =0.7\). The proposed method provides a genuinely piecewise constant reconstruction, and it improves the MSSIM over filtered backprojection by a factor of 11.58 and over total variation by a factor of 1.05

Reconstruction from Radon Data Here we consider reconstruction from Radon data, which appears for instance in computed tomography. We recall that the Radon transform reads

$$\begin{aligned} \begin{aligned} Ru(\theta ,s) = \int _{-\infty }^\infty u(s\theta + t\theta ^\perp ) \mathrm {d}t, \end{aligned} \end{aligned}$$
(118)

where \(s\in \mathbb {R},\) \(\theta \in S^1\), and \(\theta ^\perp \in S^1\) is (counterclockwise) perpendicular to \(\theta \); see [64]. For our experiments, we use a discretization of the Radon transform generated with the AIR Tools software package [39]. Regarding our method, we employed coupling of consecutive splitting variables, and the step-size parameter was set to \(\lambda =0.11\). To quantify the reconstruction quality, we use the mean structural similarity index (MSSIM) [89], which is bounded from above by 1; higher values indicate better results.

We compare the proposed method to filtered back projection (FBP), which is the standard method in practice [71]. The FBP is computed using the Matlab implementation with the standard Ram–Lak filter. Furthermore, we compare with total variation (TV) regularization [76] in the Lagrange form \(\Vert Ru-f\Vert _2^2 + \mu \Vert \nabla u \Vert _1\) with parameter \(\mu >0\); the implementation follows the Chambolle–Pock algorithm [21]. The parameter \(\mu \) was tuned w.r.t. the MSSIM index.
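Readers who wish to reproduce a comparable FBP baseline without the AIR Tools/Matlab toolchain may use, for instance, scikit-image as in the sketch below; the discretization, the noise scaling, and the data range passed to the SSIM routine are assumptions and do not exactly match our experimental pipeline.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale
from skimage.metrics import structural_similarity

phantom = rescale(shepp_logan_phantom(), 0.5)        # downscaled Shepp-Logan phantom
theta = np.linspace(0.0, 180.0, 25, endpoint=False)  # 25 projection angles
sinogram = radon(phantom, theta=theta)
sinogram += 0.7 * np.random.randn(*sinogram.shape)   # additive Gaussian noise (illustrative scale)

# Filtered back projection with the Ram-Lak ("ramp") filter
fbp = iradon(sinogram, theta=theta, filter_name='ramp')

mssim = structural_similarity(phantom, fbp,
                              data_range=phantom.max() - phantom.min())
print(f"MSSIM of the FBP reconstruction: {mssim:.3f}")
```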

In Fig. 4, we show the reconstruction results for the Shepp–Logan phantom from undersampled (25 angles) and noisy Radon data. Standard FBP produces strong streak artifacts which are typical for angular undersampling, and the reconstruction suffers from noise. The TV regularization and the proposed method both provide considerably improved reconstruction results. The proposed method achieves a higher MSSIM value than the TV reconstruction, and it provides a reconstruction which is less grainy than the TV result.

Fig. 5

Comparison of partitionings of a natural image corrupted by Gaussian noise with \(\sigma =0.2\). The Ambrosio–Tortorelli result is noisy and corrupted by clutter. The \(L_0\) gradient smoothing over-segments the large window on the left-hand side, while details of the cross in the bottom right are smoothed out. The proposed result is visually competitive with the state-of-the-art graph cuts result

Image Partitioning Finally, we consider the classical Potts problem, which corresponds to \(A=\mathrm {id}\) in (1). While the focus of the present paper is on general imaging operators A, we observe that the proposed method also works rather well for \(A=\mathrm {id}\). We used the full coupling scheme and set the step-size parameter to \(\lambda = 0.55\).

To put our result into context, we added the results of two other methods for \(A=\mathrm {id}\): the \(L_0\) gradient smoothing method of Xu et al. [94] and the state-of-the-art \(\alpha \)-expansion graph cut algorithm based on max-flow/min-cut from the GCOptimization 3.0 library of Veksler and Delong [12, 13, 52]. The method of [94] has a parameter \(\kappa >1\) which controls the convergence speed and a smoothing weight \(\nu \); in our experiments, we set \(\kappa = 1.01\) and \(\nu =0.1\). For the graph cuts, the same neighborhood weights and jump penalty were used as for the proposed method, and the discrete labels required by the graph cut method were computed via k-means.

In Fig. 5, we show the results for a natural image corrupted by Gaussian noise. The Ambrosio–Tortorelli result suffers from clutter and remains noisy. The result of \(L_0\) gradient smoothing over-segments the textured window area while it smooths out details of the cross. The state-of-the-art graph cut method and the proposed method both provide satisfying results which are visually comparable. Further, they yield solutions with comparable Potts energy values. For instance, on the IVC dataset [55], which consists of 10 natural color images of size \(512\times 512,\) the mean energies of the proposed approach for the model parameters \(\gamma = 0.25\) and \(\gamma =1\) are 7107.8 and 13053.2, compared with the respective mean energies 7093.2 and 13008.7 of the graph cut approach; the values differ by less than half a percent. (For the results in Fig. 5, the energy value of the proposed approach is 25067.7 compared with 25119.5 for the graph cut approach.) Here, for the graph cut approach, we took the mean value of the input image on each computed segment before evaluating the Potts objective function; a sketch of this evaluation is given below. To sum up, while the proposed method can handle general linear operators A, the quality of its results for \(A=\mathrm {id}\) is comparable with that of the state-of-the-art graph cut algorithm.
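For concreteness, the following sketch evaluates the discrete Potts objective for a given grayscale image u (e.g., the segment-wise mean image) and data f with the directions and weights stated at the beginning of this section; this array-based formulation is a hypothetical illustration rather than our actual evaluation code.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]
WEIGHTS = [SQRT2 - 1, SQRT2 - 1, 1 - SQRT2 / 2, 1 - SQRT2 / 2]

def potts_energy(u, f, gamma):
    # Discrete Potts objective gamma * ||grad u||_0 + ||u - f||_2^2 for A = id:
    # the jump term counts weighted neighbor pairs with differing values of u.
    jump = 0.0
    for (dx, dy), w in zip(DIRECTIONS, WEIGHTS):
        shifted = np.roll(u, shift=(dx, dy), axis=(0, 1))
        diff = shifted != u
        valid = np.ones(u.shape, dtype=bool)   # mask out pairs wrapped around by np.roll
        if dx > 0:
            valid[:dx, :] = False
        if dy > 0:
            valid[:, :dy] = False
        if dy < 0:
            valid[:, dy:] = False
        jump += w * np.count_nonzero(diff & valid)
    return gamma * jump + np.sum((u - f) ** 2)
```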

5 Conclusion

In this paper, we have proposed a new iterative minimization strategy for multivariate piecewise constant Mumford–Shah/Potts energies as well as their quadratic penalty relaxations. Our schemes are based on majorization–minimization or forward–backward splitting methods of Douglas–Rachford type [57]. In contrast to the approaches in [9, 33, 60, 61] for sparsity problems which lead to thresholding algorithms, our approach leads to non-separable yet computationally tractable problems in the backward step.

As a second part, we have provided a convergence analysis for the proposed algorithms. For the proposed quadratic penalty relaxation scheme, we have shown convergence toward a local minimizer. Due to the NP-hardness of the quadratic penalty relaxation, this is within the range of the best that can be expected. For the scheme addressing the non-relaxed Potts problem, we have also performed a convergence analysis; in particular, we have obtained results on the convergence toward local minimizers along subsequences. The quality of these results is comparable with that of [60, 61]; compared with these works, we additionally had to deal with the non-separability of the backward step.

Finally, we have demonstrated the applicability of our schemes in several experiments. We have applied our algorithms to deconvolution problems, including the deblurring and denoising of images corrupted by motion blur. We have further dealt with noisy and undersampled Radon data for the task of joint reconstruction, denoising and segmentation. Lastly, we have applied our approach to pure image partitioning (without blur), which is a widely considered problem in computer vision.