1 Introduction

Image segmentation is a fundamental task in image processing and computer vision. It consists of dividing an image into non-overlapping regions that share features, such as intensity, smoothness, and texture, chosen according to the final goal of the segmentation. Thus, the division into regions is not unique, and image segmentation can be regarded as a strongly ill-posed problem.

Let f be an image defined on a domain \(\Omega \subset {{\mathbb {R}}}^d\) (\(d \ge 2\)). Segmenting f consists of finding a decomposition of the domain \(\Omega\) into a set of non-empty pairwise-disjoint regions \(\Omega _i\), \(i=1,\ldots ,m\).

A segmentation of f can be expressed through a curve \(C^*\) that matches the boundaries of the decomposition of \(\Omega\), i.e. \(C^*= \bigcup \limits _{i =1 }^m\partial \Omega _i\) and/or a piecewise-constant function \(f^*\) defined on \(\Omega\) that approximates f.

The research on image segmentation has made several advances in the last decades and various approaches have been developed, including thresholding, region growing, edge detection and variational methods [1,2,3]. Variational models, based on optimizing energy functionals, have been widely investigated, proving to be very effective on different images; curve evolution [4], anisotropic diffusion [5] and the Mumford-Shah model [6] are good representatives of these methods. Other recent approaches to image segmentation include learning-based methods, which often exploit deep-learning techniques [7,8,9], watershed [10], random walk methods [11], graph cuts [12, 13], and epidemiological models on images [14]. However, learning-based approaches require a large amount of data to train the networks, which makes them impractical in some applications.

Two-region segmentation is considered here, where the domain of the given image \(\bar{f}\) is separated into two regions of interest, so \(m=2\) and \(\Omega =\Omega _{in} \cup \Omega _{out}\), where \(\Omega _{in}\) and \(\Omega _{out}\) are the foreground and the background of the image, respectively. Although the choice of \(m=2\) significantly simplifies the segmentation problem, it arises in many application fields, such as biological and medical imaging, text extraction, and compression of screen-content and mixed-content documents, and it can be used as a computational kernel for more complex segmentation tasks [15,16,17,18,19,20].

A widely-used two-region model was introduced by Chan and Vese in [21] and, together with its variants, is regarded as the state of the art in the segmentation community. These models are currently used in medical and astronomical applications and have lately been combined with machine-learning frameworks (see, e.g., [8, 22,23,24,25,26]). The Chan-Vese model is a special case of the well-known Mumford-Shah model [6] restricted to piecewise-constant functions. Its solution is the best approximation to \(\bar{f}\) among all the functions that take only two values, \(c_{in}\) and \(c_{out}\). As with many variational models for image processing, the model results in a non-convex optimization problem and may have several local minima. Chan et al. [27] propose a convex relaxation, here denoted as CEN, which considers the case of f taking values in [0, 1] and sets one of the two regions as

$$\begin{aligned} \Omega _{in} = \left\{ x : f(x) > \alpha \right\} \text{ for } \text{ a.e. } \alpha \in (0,1). \end{aligned}$$

The CEN model first computes the values \(c_{in}\) and \(c_{out}\), and then, given \(\lambda >0\), it determines f by solving the convex minimization problem

$$\begin{aligned} \begin{array}{rl} \displaystyle \min _{0 \le f \le 1}&\displaystyle \int _{\Omega } \vert \nabla f\vert \, dx + \lambda \int _{\Omega } \left( (c_{in} - \bar{f} (x))^2 f(x) + (c_{out} - \bar{f} (x))^2 (1-f(x)) \right) dx. \end{array} \end{aligned}$$
(1)
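Problem (1) can be checked numerically. The sketch below, written in Python rather than the MATLAB used later in the paper, evaluates a discrete analogue of the CEN energy (with the \(\ell_1\)-type TV also adopted in Sect. 3) and extracts \(\Omega_{in}\) by thresholding at \(\alpha = 0.5\); all function and variable names are illustrative.

```python
import numpy as np

def cen_energy(f, f_bar, c_in, c_out, lam):
    """Discrete analogue of the CEN energy (1): anisotropic (l1) TV of f
    plus the weighted data-fidelity term. Illustrative sketch."""
    # forward differences with border replication (last difference is 0)
    dx = np.diff(f, axis=1, append=f[:, -1:])
    dy = np.diff(f, axis=0, append=f[-1:, :])
    tv = np.abs(dx).sum() + np.abs(dy).sum()
    fid = ((c_in - f_bar) ** 2 * f + (c_out - f_bar) ** 2 * (1 - f)).sum()
    return tv + lam * fid

# toy image: bright 4x4 square on a dark background
f_bar = np.zeros((8, 8))
f_bar[2:6, 2:6] = 1.0
f = f_bar.copy()           # the "perfect" relaxed solution
omega_in = f > 0.5         # region selection, valid for a.e. alpha
```

On this toy image the fidelity term vanishes for \(c_{in}=1\), \(c_{out}=0\), and the energy reduces to the (anisotropic) perimeter of the square.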

We note that the aforementioned models assume that each image region can be described by a smooth or constant function. However, images may not be piecewise smooth or flat as a whole, but may contain some non-smooth regions. In practice, imposing smoothness on such images may lead to a destructive averaging of the image content [28], which can produce an inaccurate segmentation. Exploiting information on the non-smooth structure of an image can help to extend the effectiveness of the CEN model beyond the class of smooth images, as done, e.g., in [29], through the introduction of spatially-varying regularization methods.

In this paper we design a new model for two-region image segmentation that, starting from a rough cartoon-texture decomposition \(\bar{f}=\bar{u}+\bar{v}\) of the initial image, produces a cartoon-texture-driven decomposition of \(\bar{u}\) as \(\bar{u} = u + v\) and simultaneously provides a segmentation of u. In the new model a Kullback-Leibler divergence term forces v to be close to \(\bar{v}\), thus allowing the model to extract further smaller-scale oscillatory components from the starting cartoon part \(\bar{u}\). Thanks to this additional term, the segmentation process is shown to have improved robustness with respect to noise and texture in the initial image.

The rest of the paper is organized as follows: in Sect. 2 we recall the cartoon-texture decomposition of an image; in Sect. 3 we introduce the proposed model, which results in a non-smooth convex optimization problem; in Sect. 4 we introduce an ADMM scheme for the solution of the problem and analyze its convergence. Sect. 5 is devoted to numerical experiments and comparisons with the original CEN model and with state-of-the-art models suited for textural image segmentation. Finally, we draw our conclusions in Sect. 6.

2 Cartoon-texture decomposition

An image f is usually described as a superposition of two components, i.e.,

$$\begin{aligned} f=u+v, \end{aligned}$$

where u is the geometric component and v is the oscillatory one. The geometric component, commonly referred to as ‘cartoon’, consists of the piecewise-constant parts of an image, including homogeneous regions, contours, and sharp edges. In contrast, the oscillatory component includes the patterns which can be observed in the image, such as texture or noise. Both texture and noise can indeed be seen as repeated patterns of small scale details, with noise being characterized by random and uncorrelated values. The cartoon-texture decomposition of an image plays an important role in computer vision [30], with a wide range of applications to, e.g., image restoration, segmentation, image editing, and remote sensing. It is an underdetermined linear inverse problem with many solutions, usually described by variational models able to force the cartoon and the texture into different functional spaces in order to produce the required decomposition.

Following the idea of Meyer [31], the general image decomposition problem can be formulated as

$$\begin{aligned} \begin{array}{rl} \underset{(u,v)\in X \times Y}{\min } &{} g_1(u)+g_2(v)\\ s.t. &{} u+v=f,\\ \end{array} \end{aligned}$$
(2)

where X and Y are suitable function spaces and \(g_1\) and \(g_2\) are functionals that model the cartoon regions and the texture patterns, respectively. Several choices have been proposed in the literature for both X, Y and \(g_1, g_2\) [32, 33]. A widely used choice to model the cartoon is \(g_1(u)=TV(u)\), due to its ability to induce piecewise smooth u with bounded variations [34, 35]. Some alternative approaches impose a sparse representation of the cartoon under a given system, such as wavelet frames [36] or curvelet systems [37]. Modeling the texture component is a more complex task, due to the difficulty of conceptualizing mathematical properties able to encompass all the texture types. Many models use the space of oscillatory functions equipped with appropriate norms able to represent textured or oscillatory patterns [34, 35, 38]. An alternative approach assumes that, under suitable conditions, textures can be sparsified, i.e., a texture patch can be represented by few atoms in a given dictionary or by specific transforms [39].

Since the existing methods for cartoon-texture decomposition are beyond the scope of this paper, here we simply assume that we are able to obtain a decomposition of the given image:

$$\begin{aligned} \bar{f} = \bar{u} + \bar{v}, \end{aligned}$$
(3)

with the aim of using the different information on the two components to improve the effectiveness of the CEN model. In our experiments we will consider the algorithm described in [40]. Figure 1 shows the decomposition produced by one iteration of the algorithm, which results in

$$\begin{aligned} \bar{u}(x)=\omega (\rho _\sigma (x))L_\sigma * \bar{f}+(1-\omega (\rho _\sigma (x)))\bar{f}, \;\;\;\; \bar{v}(x)=\bar{f}(x)-\bar{u}(x), \end{aligned}$$
(4)

where \(L_\sigma\) is a low-pass filter, \(*\) is the convolution operator, \(\omega : [0,1] \longrightarrow [0,1]\) is an increasing function that is constant and equal to zero near zero and constant and equal to 1 near 1, and \(\rho _\sigma (x)\) is the relative reduction rate of local TV

$$\begin{aligned} \rho _\sigma (x) =\frac{LTV_\sigma (\bar{f}(x))-LTV_\sigma (L_\sigma *\bar{f}(x))}{LTV_\sigma (\bar{f}(x))} \in [0,1] \end{aligned}$$
(5)

with \(LTV_\sigma (\bar{f}(x)) = \left( L_\sigma * \vert \nabla \bar{f}\vert \right) (x)\).

Fig. 1

Cartoon-texture decomposition of airplane image after the application of (4)-(5)

We note that the cartoon-texture decomposition produced by (4) is not unique, but depends on the choice of \(\sigma\) [40]. However, we will show that a rough decomposition is enough for our model, hence there is no need for an accurate tuning of \(\sigma\).
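As an illustration of (4)-(5), the following Python sketch computes a rough cartoon-texture decomposition. All names are illustrative; a moving-average filter stands in for the low-pass filter \(L_\sigma\) (the paper uses a Gaussian), and a piecewise-linear ramp stands in for \(\omega\).

```python
import numpy as np

def blur(img, k=3):
    # stand-in low-pass filter L_sigma: k x k moving average, replicated borders
    p = k // 2
    padded = np.pad(img, p, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out / (k * k)

def grad_mag(img):
    # l1 magnitude of the forward-difference gradient
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return np.abs(gx) + np.abs(gy)

def cartoon_texture(f, k=3, l1=0.25, l2=0.5):
    ltv_f  = blur(grad_mag(f), k)           # LTV_sigma(f)
    ltv_lf = blur(grad_mag(blur(f, k)), k)  # LTV_sigma(L_sigma * f)
    rho = (ltv_f - ltv_lf) / np.maximum(ltv_f, 1e-12)  # relative TV reduction (5)
    w = np.clip((rho - l1) / (l2 - l1), 0.0, 1.0)      # ramp-shaped omega
    u = w * blur(f, k) + (1 - w) * f        # cartoon, as in (4)
    return u, f - u                         # (u_bar, v_bar)
```

On a flat image the reduction rate is zero and the image is left untouched, whereas on a highly oscillatory image the blurred version wins and the oscillations end up in the texture part.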

3 The C-TETRIS model

We here introduce the Cartoon-Texture Evolution for Two-Region Image Segmentation (C-TETRIS) model. As mentioned in the previous sections, starting from the decomposition (3), the main idea behind C-TETRIS is to simultaneously produce the segmentation of \(\bar{u}\) and its cartoon-texture decomposition. In detail, it decomposes \(\bar{u}\) as \(\bar{u}=u+v\), where v is enforced to be close to \(\bar{v}\), and computes a segmentation of u by solving the problem

$$\begin{aligned} \begin{array}{rl} \underset{u,c_{in},c_{out},v}{\min } &{} \displaystyle {\mathcal E}_{CEN}(u,c_{in},c_{out}; \bar{u}) + \mu {\mathcal D}_{KL}(v;\bar{v})\\ {{\,\mathrm{s.t.}\,}}&{} 0 \le u \le 1,\\ &{} \displaystyle u+v=\bar{u}, \end{array} \end{aligned}$$
(6)

where \({{\mathcal {E}}}_{CEN}\) represents the objective function of problem (1), \({\mathcal D}_{KL}(v;\bar{v})\) denotes the Kullback-Leibler (KL) divergence of v from \(\bar{v}\), defined as

$$\begin{aligned} {{\mathcal {D}}}_{KL}(v;\bar{v}) =\int _{\Omega } v(x) \log \left( \frac{v(x)}{\bar{v}(x)}\right) dx, \end{aligned}$$
(7)

where we set

$$\begin{aligned} v(x) \log \left( \frac{v(x)}{\bar{v}(x)} \right) = \left\{ \begin{array}{ll} 0 &{} \text{ if } v(x)=0, \\ \infty &{} \text{ if } v(x)>0 \text{ and } \bar{v}(x) = 0, \end{array} \right. \end{aligned}$$

and \(\mu >0\). The KL divergence measures the amount of information lost when \(\bar{v}\) is used to approximate v and appears in many models of imaging science, where it is usually employed as a fidelity term. Roughly speaking, the C-TETRIS model extracts the “remaining texture” from \(\bar{u}\) and produces its best approximation among all the functions that take only two values.

In the following we consider the discrete version of (6). Let

$$\begin{aligned} \Omega _{n_x,n_y}=\left\{ (i,j) : 0 \le i \le n_x-1, \, 0 \le j \le n_y-1\right\} \end{aligned}$$

be a discretization of \(\Omega\) consisting of an \(n_x \times n_y\) grid of pixels and

$$\begin{aligned} \vert \nabla _x u \vert _{i,j} = \vert \delta _x^+ u \vert _{i,j} , \quad \vert \nabla _y u \vert _{i,j} = \vert \delta _y^+ u \vert _{i,j} \end{aligned}$$

where \(\delta _x^+\) and \(\delta _y^+\) are the forward finite-difference operators in the x- and y-directions, with unit spacing, and the values \(u_{i,j}\) with indices outside \(\Omega _{n_x,n_y}\) are defined by replication. The discrete version of (6) leads to the following non-smooth constrained optimization problem:

$$\begin{aligned} \begin{array}{rl} \underset{u, c_{in}, c_{out},v}{\min } &{} \displaystyle E_{CEN}(u,c_{in},c_{out}; \bar{u}) + \mu D_{KL}(v;\bar{v}) \\ {{\,\mathrm{s.t.}\,}}&{} 0 \le u \le 1, \\ &{} u +v ={\bar{u}}, \end{array} \end{aligned}$$
(8)

where we denoted by \(E_{CEN}\) the discrete version of \({{\mathcal {E}}}_{CEN}\), defined as

$$\begin{aligned}&E_{CEN}(u,c_{in},c_{out}; \bar{u}) = \sum _{i,j} \big (\vert \nabla _x u \vert _{i,j} + \vert \nabla _y u \vert _{i,j}\big ) \\&\quad +\lambda \sum _{i,j} \left( u_{i,j} (c_{in}-\bar{u}_{i,j})^2 + (1- u_{i,j})\,( c_{out}-\bar{u}_{i,j})^2\right) , \end{aligned}$$

and we denoted by \(D_{KL}\) the discrete version of the Kullback-Leibler divergence \({{\mathcal {D}}}_{KL}\), defined as

$$\begin{aligned} D_{KL}(v;\bar{v}) = \sum _{i,j} v_{i,j} \log \left( \frac{v_{i,j}}{\bar{v}_{i,j}}\right) . \end{aligned}$$

It is worth noting that the first term in \(E_{CEN}\) corresponds to the discrete Total Variation (TV) of the image u. We here opted for a modified version of the TV functional, in which the \(\ell _2\) norm is replaced by the \(\ell _1\) norm (as proposed in [41]), since in the case of image restoration it is known to produce sharper piecewise-constant images. Nevertheless, a preliminary comparison between the models equipped with the \(\ell _1\) and the \(\ell _2\) versions, respectively, showed no difference in terms of segmentation quality.
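When computing \(D_{KL}\) in practice, the conventions stated after (7) must be honoured. A direct, unoptimized Python sketch (illustrative names):

```python
import numpy as np

def kl_div(v, v_bar):
    """Discrete KL divergence D_KL(v; v_bar) with the conventions of (7):
    0 * log(0 / .) := 0, and the value is +inf if v > 0 where v_bar = 0."""
    v = np.asarray(v, dtype=float)
    v_bar = np.asarray(v_bar, dtype=float)
    out = 0.0
    for vi, vbi in zip(v.ravel(), v_bar.ravel()):
        if vi == 0.0:
            continue                 # 0 * log(0 / .) = 0 by convention
        if vbi == 0.0:
            return float('inf')      # v positive where v_bar vanishes
        out += vi * np.log(vi / vbi)
    return out
```

Note that v here collects nonnegative texture intensities, not a probability distribution, so \(D_{KL}\) is a generalized (unnormalized) divergence.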

4 Minimizing the C-TETRIS model

We here focus on the solution of the minimization problem in (8). One can observe that, although the problem is in general nonconvex, it becomes convex when either the pair \((c_{in}, \,c_{out})\) or the pair \((u,\,v)\) is fixed. Suppose, for the moment, that the values of \(c_{in}\) and \(c_{out}\) have been determined and consider the minimization problem in u and v only, which can be written as

$$\begin{aligned} \begin{array}{rl} \underset{u,v}{\min } &{} \displaystyle \sum _{i,j} \big (\vert \nabla _x u \vert _{i,j} + \vert \nabla _y u \vert _{i,j}\big ) + \lambda \, r^\top u + \mu D_{KL}(v;\bar{v}) \\ {{\,\mathrm{s.t.}\,}}&{} 0 \le u \le 1, \\ &{} u + v ={\bar{u}}, \end{array} \end{aligned}$$
(9)

where we defined, for each (ij),

$$\begin{aligned} r_{i,j}\equiv r_{i,j}(c_{in},c_{out})= \left( c_{in}-\bar{u}_{i,j} \right) ^2 - \left( c_{out}-\bar{u}_{i,j}\right) ^2. \end{aligned}$$

Problem (9) is a non-smooth convex optimization problem subject to linear and bound constraints, which we propose to solve by the Alternating Direction Method of Multipliers (ADMM) [42]. To this aim, we reformulate problem (9) as

$$\begin{aligned} \begin{array}{rl} \underset{u,d_x,d_y,v}{\min } &{} \displaystyle \Vert d_x\Vert _1 + \Vert d_y\Vert _1 + \lambda \, r^\top u + \mu D_{KL}(v;\bar{v}) \\ {{\,\mathrm{s.t.}\,}}&{} d_x = \nabla _x u, \\ &{} d_y = \nabla _y u, \\ &{} u + v ={\bar{u}}, \\ &{} 0 \le u \le 1. \end{array} \end{aligned}$$
(10)

Starting from (10), it is straightforward to check that the objective function and the constraints of the problem can be split into two blocks. Indeed, by introducing the variable \(z = [d_x^\top ,d_y^\top ,v^\top ]^\top\), one can further reformulate (10) as

$$\begin{aligned} \begin{array}{rl} \underset{u,z}{\min } &{} \displaystyle F(u) + G(z) \\ {{\,\mathrm{s.t.}\,}}&{} H\,u - z = b, \end{array} \end{aligned}$$
(11)

where we defined

$$\begin{aligned}&F(u) = \lambda \, r^\top u + \chi _{[0,1]}(u),\qquad G(z) = \Vert d_x\Vert _1 + \Vert d_y\Vert _1 + \mu D_{KL}(v;\bar{v}),\\&H = \left[ \nabla _x^\top ,\,\nabla _y^\top ,\,-I \right] ^\top , \text{ and } \quad b = [ 0,\,0,\,-\bar{u}^\top ]^\top , \end{aligned}$$

and we used \(\chi _{[0,1]}(u)\) to indicate the characteristic function of the hypercube \([0,1]^{n_x\times n_y}\).

Consider the Lagrangian and the augmented Lagrangian functions associated with problem (11), defined respectively as

$$\begin{aligned}&\mathcal {L}(u,z,\xi ) = F(u) + G(z) + \xi ^\top \left( H\,u - z - b\right) ,\\&\mathcal {L}_A(u,z,\xi ;\rho ) = F(u) + G(z) + \xi ^\top \left( H\,u - z - b\right) + \frac{\rho }{2}\left\| H\,u - z - b\right\| _2^2, \end{aligned}$$

where \(\rho >0\), and \(\xi\) is a vector of Lagrange multipliers.

Starting from given estimates \(u^0\), \(z^0\), and \(\xi ^0\), at each iteration k ADMM updates the estimates as

$$\begin{aligned} \begin{aligned} u^{k+1}&= \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{u} \mathcal {L}_A(u,z^k,\xi ^k;\rho ),\\ z^{k+1}&= \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{z} \mathcal {L}_A(u^{k+1},z,\xi ^k;\rho ),\\ \xi ^{k+1}&= \displaystyle \xi ^k + \rho \left( H\,u^{k+1} - z^{k+1} - b\right) . \end{aligned} \end{aligned}$$
(12)

Since F(u) and G(z) in (11) are closed, proper and convex, and H has full rank, the convergence of ADMM can be proved by exploiting the classical result from [43], which we report in the following.

Theorem 1

Consider problem (11) where F(u) and G(z) are closed, proper and convex functions and H has full rank. Consider the summable sequences \(\{\varepsilon _k\}, \{\nu _k\} \subset {{\mathbb {R}}}_+\) and let

$$\begin{aligned}&\left\| u^{k+1} - \mathop {{{\,\mathrm{argmin}\,}}}\limits _{u} \mathcal {L}_A(u,z^k,\xi ^k;\rho )\right\| \le \varepsilon _k,\\&\left\| z^{k+1} - \mathop {{{\,\mathrm{argmin}\,}}}\limits _{z} \mathcal {L}_A(u^{k+1},z,\xi ^k;\rho )\right\| \le \nu _k,\\&\xi ^{k+1} = \xi ^k + \rho \left( H\,u^{k+1} - z^{k+1} - b\right) . \end{aligned}$$

If there exists a saddle point \((u^*,z^*,\xi ^*)\) of \(\mathcal {L}(u,z,\xi )\), then \(u^k\rightarrow u^*\), \(z^k\rightarrow z^*\) and \(\xi ^k\rightarrow \xi ^*\). If such a saddle point does not exist, then at least one of the sequences \(\{z^k\}\) or \(\{\xi ^k\}\) is unbounded.

Theorem 1 guarantees the convergence of the ADMM scheme even if the subproblems are solved inexactly, provided that the inexactness of the solution can be controlled.

So far we have been concerned with the solution of problem (9) when the values of \(c_{in}\) and \(c_{out}\) are known in advance which, however, is not the case in practice. By following the example of [27], we adopt a two-step scheme in which we alternate updates of u and z, determining the shape of the two regions, and updates of \(c_{in}\) and \(c_{out}\). Observe that, by fixing \(u=u^k\) and \(z=z^k\), the restriction of problem (8) to \(c_{in}\) and \(c_{out}\) can be written as the unconstrained convex quadratic optimization problem

$$\begin{aligned} \underset{c_{in}, c_{out}}{\min } \quad \displaystyle \sum _{i,j} \left( u^k_{i,j} (c_{in}-\bar{u}_{i,j})^2 + (1- u^k_{i,j})\,( c_{out}-\bar{u}_{i,j})^2\right) . \end{aligned}$$
(13)

Hence, we propose to update the values of \(c_{in}\) and \(c_{out}\) after each ADMM step by taking the exact minimizer of problem (13), i.e., by setting

$$\begin{aligned} c_{in}^k = \frac{\sum _{i,j}u^k_{i,j}\bar{u}_{i,j}}{\sum _{i,j}u^k_{i,j}},\quad \text{ and }\quad c_{out}^k = \frac{\sum _{i,j}(1-u^k_{i,j})\bar{u}_{i,j}}{\sum _{i,j}(1-u^k_{i,j})}. \end{aligned}$$
(14)

It is worth pointing out that such a modification alters the original ADMM scheme, turning it into an inexact alternating minimization scheme for the problem in u, z, \(c_{in}\), and \(c_{out}\). Nevertheless, as also observed for the original CEN model, the experiments carried out in this work show that in all the cases under analysis the values of \(c_{in}\) and \(c_{out}\) stagnate after the first few iterations, thus recovering in practice the convergence properties shown for the case of fixed \(c_{in}\) and \(c_{out}\).
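The closed-form updates (14) are plain weighted averages of \(\bar{u}\) inside and outside the current (relaxed) region. A minimal Python sketch, with illustrative names and guards for empty regions:

```python
import numpy as np

def update_constants(u, u_bar):
    """Closed-form minimizers (14) of the quadratic subproblem (13):
    u-weighted and (1-u)-weighted averages of u_bar."""
    w_in, w_out = u.sum(), (1.0 - u).sum()
    c_in = (u * u_bar).sum() / w_in if w_in > 0 else 0.0
    c_out = ((1.0 - u) * u_bar).sum() / w_out if w_out > 0 else 0.0
    return c_in, c_out
```

For a binary u these reduce to the mean intensities of \(\bar{u}\) over the foreground and the background, as in the original Chan-Vese model.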

4.1 Solving the ADMM subproblems

We will now focus on how the subproblems in (12) can be solved in practice. First, by writing the augmented Lagrangian function explicitly, we can rewrite the ADMM scheme as

$$\begin{aligned} \begin{aligned} u^{k+1}&= \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{0\le u \le 1} \lambda \, r^\top u + (\xi ^k)^\top \left( H\,u - z^k - b\right) + \frac{\rho }{2}\left\| H\,u - z^k - b\right\| _2^2,\\ z^{k+1}&= \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{z} G(z) + (\xi ^k)^\top \left( H\,u^{k+1} - z - b\right) + \frac{\rho }{2}\left\| H\,u^{k+1} - z - b\right\| _2^2,\\ \xi ^{k+1}&= \displaystyle \xi ^k + \rho \left( H\,u^{k+1} - z^{k+1} - b\right) . \end{aligned} \end{aligned}$$

It is straightforward to check that the minimization problem over z can be split into three independent minimization problems, respectively on \(d_x\), \(d_y\), and v, leading to the following scheme

$$\begin{aligned} \begin{aligned} u^{k+1}&= \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{0\le u \le 1} \lambda \, r^\top u + (\xi ^k)^\top \left( H\,u - z^k - b\right) + \frac{\rho }{2}\left\| H\,u - z^k - b\right\| _2^2,\\ d_x^{k+1}&= \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{d_x} \Vert d_x\Vert _1 + (\xi _x^k)^\top \left( \nabla _x u^{k+1} - d_x\right) + \frac{\rho }{2}\left\| \nabla _x u^{k+1} - d_x\right\| _2^2,\\ d_y^{k+1}&= \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{d_y} \Vert d_y\Vert _1 + (\xi _y^k)^\top \left( \nabla _y u^{k+1} - d_y\right) + \frac{\rho }{2}\left\| \nabla _y u^{k+1} - d_y\right\| _2^2,\\ v^{k+1}&= \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{v} \mu D_{KL}(v;\bar{v}) + (\xi _v^k)^\top \left( -u^{k+1} - v + \bar{u}\right) + \frac{\rho }{2}\left\| u^{k+1} + v - \bar{u}\right\| _2^2,\\ \xi ^{k+1}&= \displaystyle \xi ^k + \rho \left( H\,u^{k+1} - z^{k+1} -b\right) , \end{aligned} \end{aligned}$$
(15)

where we split the Lagrange multipliers vector \(\xi\) as \(\xi = [\xi _x^\top , \xi _y^\top , \xi _v^\top ]^\top\). The scheme presented in (15) can be further simplified by exploiting the linearity of the constraints \(H\,u -z = b\), as suggested in [44]. In detail, by introducing the vectors \(b_x^k = \frac{\xi _x^k}{\rho }\), \(b_y^k = \frac{\xi _y^k}{\rho }\), and \(b_v^k = -\frac{\xi _v^k}{\rho }-\bar{u}\), one can rewrite (15) equivalently as

$$\begin{aligned} u^{k+1} =&\;\; \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{0\le u \le 1} \lambda \, r^\top u + \frac{\rho }{2}\left\| \nabla _x u - d_x^k + b_x^k\right\| _2^2 \nonumber \\&+ \frac{\rho }{2}\left\| \nabla _y u - d_y^k + b_y^k\right\| _2^2 + \frac{\rho }{2}\left\| u + v^k + b_v^k\right\| _2^2, \end{aligned}$$
(16)
$$\begin{aligned} d_x^{k+1} =&\;\; \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{d_x} \Vert d_x\Vert _1 + \frac{\rho }{2}\left\| \nabla _x u^{k+1} - d_x + b_x^k\right\| _2^2, \end{aligned}$$
(17)
$$\begin{aligned} d_y^{k+1} =&\;\; \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{d_y} \Vert d_y\Vert _1 + \frac{\rho }{2}\left\| \nabla _y u^{k+1} - d_y + b_y^k\right\| _2^2, \end{aligned}$$
(18)
$$\begin{aligned} v^{k+1} =&\;\; \displaystyle \mathop {{{\,\mathrm{argmin}\,}}}\limits _{v} \mu D_{KL}(v;\bar{v}) + \frac{\rho }{2}\left\| u^{k+1} + v + b_v^k\right\| _2^2, \end{aligned}$$
(19)
$$\begin{aligned} b_x^{k+1} =&\;\;\; \displaystyle b_x^k + \nabla _x\,u^{k+1} - d_x^{k+1},\nonumber \\ b_y^{k+1} =&\;\;\; \displaystyle b_y^k + \nabla _y\,u^{k+1} - d_y^{k+1},\nonumber \\ b_v^{k+1} =&\;\;\; \displaystyle b_v^k + u^{k+1} + v^{k+1} - \bar{u}. \end{aligned}$$
(20)

Problem (16) is a strongly convex bound-constrained quadratic optimization problem. To obtain an approximate solution \(u^{k+1}\), by following [29, 45], we consider the optimality conditions of the unconstrained version of the problem, i.e., the solution to the linear system

$$\begin{aligned} (- \Delta + I) u = - \frac{\lambda \,r}{\rho } + \nabla _x^\top ( d_x^{k} - b_x^{k}) + \nabla _y^\top ( d_y^{k} - b_y^{k}) - (v^{k} + b_v^{k}), \end{aligned}$$

where \(\Delta\) represents the finite-difference discretization of the Laplacian. We first solve the system by the Gauss-Seidel method and then project the solution onto \([0,1]^{n_x \times n_y}\).
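A minimal Python sketch of this u-update, assuming a 5-point negative Laplacian with replicated (Neumann-like) borders; the pure-Python loops and all names are illustrative only:

```python
import numpy as np

def solve_u_subproblem(rhs, sweeps=200):
    """Gauss-Seidel sweeps for (-Delta + I) u = rhs, then projection onto
    [0,1]. With replicated borders, only the true neighbours enter the
    stencil, so each diagonal entry is (deg + 1)."""
    n, m = rhs.shape
    u = np.zeros_like(rhs, dtype=float)
    for _ in range(sweeps):
        for i in range(n):
            for j in range(m):
                nb, deg = 0.0, 0
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < m:
                        nb += u[ii, jj]
                        deg += 1
                # row of (-Delta + I) u = rhs solved for the diagonal unknown
                u[i, j] = (rhs[i, j] + nb) / (deg + 1.0)
    return np.clip(u, 0.0, 1.0)
```

The system matrix is strictly diagonally dominant, so Gauss-Seidel converges; a constant right-hand side c yields the constant solution c (then clipped to [0, 1]).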

As regards the updates in (17)-(19), one has to note that they are proximal operators [46, 47] of closed, proper and convex functions. In detail, the proximal operators in (17) and (18) can be computed in closed form by means of the well-known soft-thresholding operator, defined as

$$\begin{aligned} [{{\mathcal {S}}}(x,\gamma )]_{i,j}= \mathrm {sign}(x_{i,j})\cdot \max \big (\vert x_{i,j}\vert -\gamma , 0\big ). \end{aligned}$$
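In particular, (17) reads \(d_x^{k+1} = {\mathcal S}(\nabla_x u^{k+1} + b_x^k, 1/\rho)\), and analogously for (18). A one-line Python sketch of \({\mathcal S}\) (illustrative name):

```python
import numpy as np

def soft_threshold(x, gamma):
    """Entrywise soft-thresholding S(x, gamma): the proximal operator of
    gamma * ||.||_1, shrinking each entry towards zero by gamma."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
```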

Finally, the proximal operator in (19) can be computed as

$$\begin{aligned}{}[\mathrm {prox}_{\gamma D_{KL}(\cdot ;\tilde{x})} (x)]_{i,j} = \gamma W (\gamma ^{-1}\tilde{x}_{i,j}\, e^{\gamma ^{-1}x_{i,j}-1}), \end{aligned}$$

where W is the Lambert W function, satisfying \(W (y)\, e^{W(y)} = y\), which, although not available in closed form, can be approximated with high precision.
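A Python sketch of the v-update, with W approximated by Newton's method as a stand-in for a library routine (e.g. SciPy's `lambertw`); names are illustrative and positive arguments are assumed:

```python
import numpy as np

def lambert_w(y, iters=60):
    """Principal branch of the Lambert W function for y > 0, solving
    w * exp(w) = y by Newton's method."""
    w = np.log1p(y)                       # rough initial guess
    for _ in range(iters):
        ew = np.exp(w)
        w = w - (w * ew - y) / (ew * (w + 1.0))
    return w

def kl_prox(x, x_bar, gamma):
    """Entrywise prox of gamma * D_KL(. ; x_bar), assuming x_bar > 0;
    this is the v-update (19) with gamma = mu / rho."""
    return gamma * lambert_w((x_bar / gamma) * np.exp(x / gamma - 1.0))
```

The result can be checked against the scalar optimality condition \(\gamma(\log(v/\bar{v}) + 1) + v - x = 0\) of the prox subproblem.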

5 Numerical experiments

In this section, we test the effectiveness of C-TETRIS in producing two-region segmentations on various image sets. The first set contains three pairs of real-life images with corresponding ground truth coming from the database [48]: man is a smooth image, whereas flowerbed and stone show an object foreground on a textured background. The second set consists of four images available from the Berkeley database [49] which are in general considered to be smooth: the real-life images airplane and squirrel, and the medical images brain and ultrasound. The third set of images consists of noisy versions of the famous cameraman image from the MIT Image Library, which we use to test the robustness of the C-TETRIS model with respect to noise. The fourth and last set of images consists of three textural images: tiger and bear, taken from [49], and spiral, taken from [21]. We here provide some further details on the numerical experiments. The C-TETRIS algorithm was implemented in MATLAB using the Image Processing Toolbox, where the cartoon-texture decomposition was initially performed by one iteration of the algorithm described in [40], using a Gaussian filter with \(\sigma = 2\) as \(L_{\sigma }\), and the following function \(\omega\) [40]:

$$\begin{aligned} \omega (x) = \left\{ \begin{array}{ll} 0, &{} x \le l_1, \\ (x-l_1)/(l_2 -l_1), &{} l_1< x < l_2, \\ 1, &{} x \ge l_2, \end{array}\right. \end{aligned}$$
(21)

where the weights \(l_1\) and \(l_2\) have been set to 0.25 and 0.5, respectively. We remark that extensive testing showed that the accuracy of the produced segmentation is only slightly influenced by the variation of the Gaussian smoothing parameter \(\sigma\) or by the number of steps performed to obtain the cartoon-texture decomposition. Among the several available implementations of CEN, we chose the one proposed by the authors of [45]. Although the code is written in the C programming language, a MEX interface is available for testing in MATLAB. This implementation is based on split Bregman (SB) iterations with the following stopping criterion:

$$\begin{aligned} \vert \mathtt {diff}^k - \mathtt {diff}^{k-1} \vert \le \mathtt {tol} \quad \text{ or } \quad k > \mathtt {maxit} , \end{aligned}$$
(22)

where

$$\begin{aligned} \mathtt {diff}^k = \frac{\mathtt {sd}(f^k)}{\mathtt {sd}(f^k) \cdot \mathtt {sd}(f^{k-1})}, \quad \mathtt {sd}(f^k) = \sum _{i,j} (f_{i,j}^k - f_{i,j}^{(k-1)})^2, \end{aligned}$$

\(\mathtt {tol}\) is a given tolerance and \(\mathtt {maxit}\) is the maximum number of SB iterations. In order to make a fair comparison, all the algorithms presented in the next section use the stopping criterion (22), where we set \(\mathtt {maxit}=50\) and \(\mathtt {tol} =10^{-6}\) (\(\mathtt {tol} =10^{-8}\) for the noisy images). The parameter \(\lambda\) in (1) and in (9) has a scaling role and was set according to the level of detail required in the segmentation. In particular, in each test for the CEN model we used the value proposed by the authors in the available code, which we indicate as \(\lambda _{CEN}\), based on the empirical rule \(\lambda _{CEN} =10^a\) with \(a \in \{-1,0,1\}\), from larger to smaller regularization/smoothing. To balance the presence of the KL term, for C-TETRIS we perform a grid search and select a parameter \(\lambda\) with a variation of at most \(5\%\) from \(\lambda _{CEN}\). The parameter \(\mu\) was set as \(\mu =10^c\) with \(c \in \{-2,-1,0\}\). Finally, the Bregman parameter \(\rho\) was set to 1.

Before proceeding with the experiments on the four image sets described above, we show an example of how the proposed model works. We consider an image for each of the four sets and report in Fig. 2 the starting cartoon-texture decomposition and the components u and v after the first ADMM iteration, at an intermediate iteration, and at the last iteration. We note that, as the ADMM iterations proceed, the remaining texture is progressively subtracted from the cartoon, allowing a clearer distinction between background and foreground.

Fig. 2

Details of the evolution of the cartoon (u) and the texture part (v) performed by C-TETRIS on the images airplane, cameraman with 15\(\%\) of salt & pepper noise, and tiger. The segmentations produced at the last iteration for each image are shown in Figs. 3, 4, 5, and 6, respectively

Fig. 3

Segmentations of images with ground truth by CEN and C-TETRIS

Fig. 4

Segmentations of smooth images by CEN and C-TETRIS. The results of the segmentation produced by CEN on the cartoon part of the image are also shown

Fig. 5

Segmentations of cameraman with different noise sources and levels by CEN and C-TETRIS. Gaussian and Poissonian noise are applied with different SNR values, whereas salt and pepper noise is added with different percentages (see Sect. 5.3 for details)

Fig. 6

Segmentations of textural images by C-TETRIS, SpAReg, and HTB

5.1 Results on ground truth images

First of all, in order to assess the accuracy of the C-TETRIS segmentation model, a comparison with ground-truth data is presented in Fig. 3. The quality of the produced segmentations confirms the greater ability of C-TETRIS with respect to CEN in separating foreground objects from the background, especially on the flowerbed and stone images, where a textured background is present. Furthermore, a quantitative analysis measuring the similarity between the segmented images and the corresponding ground truth is given in Table 1. The segmentation errors have been evaluated using four traditional measures. The Rand Index (RI) [50] counts the fraction of pairs of pixels whose labellings are consistent between the computed segmentation and the ground truth, the Global Consistency Error (GCE) [51] measures the distance between two segmentations assuming that one segmentation must be a refinement of the other, the Variation of Information (VI) [52] computes the distance between two segmentations as the average conditional entropy of one segmentation given the other, and the Boundary Displacement Error (BDE) [53] computes the average displacement error between the boundary pixels of two segmented images. As we can see in Table 1, the segmentations produced by C-TETRIS have smaller values of GCE, VI, and BDE than the ones produced by CEN, as well as the highest values of RI, showing a greater consistency with the corresponding ground truth in the separation of foreground objects from the background.

Table 1 Measures of segmentation error produced by CEN and C-TETRIS on figures displayed in Fig. 3
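Of the four measures, RI has the most compact definition; for the binary (two-region) case it can be computed from the 2x2 contingency table of the two labellings. A Python sketch with illustrative names:

```python
import numpy as np
from math import comb

def rand_index(seg_a, seg_b):
    """Rand Index between two binary segmentations: the fraction of pixel
    pairs on which the two partitions agree (1.0 = identical partitions)."""
    a = np.asarray(seg_a).ravel().astype(bool)
    b = np.asarray(seg_b).ravel().astype(bool)
    n = a.size
    # 2x2 contingency table of the two labellings
    n11 = int(np.sum(a & b)); n10 = int(np.sum(a & ~b))
    n01 = int(np.sum(~a & b)); n00 = int(np.sum(~a & ~b))
    same_both = sum(comb(x, 2) for x in (n11, n10, n01, n00))
    same_a = comb(n11 + n10, 2) + comb(n01 + n00, 2)
    same_b = comb(n11 + n01, 2) + comb(n10 + n00, 2)
    # pairs separated in both partitions, by inclusion-exclusion
    diff_both = comb(n, 2) - same_a - same_b + same_both
    return (same_both + diff_both) / comb(n, 2)
```

Note that RI depends only on the partitions, so swapping the foreground/background labels of one segmentation leaves it unchanged.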

5.2 Results on smooth images

In Fig. 4, we show a comparison between C-TETRIS and CEN on the segmentation of the set of smooth images.

For the sake of completeness, we also report the segmentation results produced by CEN on the cartoon of the images. In general, the segmentations produced by C-TETRIS are comparable with or better than the ones produced by CEN. The segmentation of airplane shows the great effectiveness of the proposed model in accurately separating a non-uniform background from the object, due to the ability of C-TETRIS to remove the remaining texture in the cartoon, as shown in Fig. 2. We note that in general there are no significant differences in the quality of the segmentation results between CEN applied to the original image and CEN applied to the cartoon. However, in the case of ultrasound the segmentation on the cartoon produces an unreliable result, due to the loss of contrast introduced by the decomposition. In Table 2, two global metrics are listed to measure the contrast between the given image and its cartoon. In particular, we used

$$\begin{aligned} m_1 = f_{max} - f_{min} \end{aligned}$$

and the Michelson formula [54]:

$$\begin{aligned} m_2= (f_{max} - f_{mean})/(f_{max} + f_{mean}) \end{aligned}$$

where \(f_{max}\), \(f_{min}\) and \(f_{mean}\) are the maximum, minimum and mean value of the given image intensity, respectively. We can note that the cartoon part of ultrasound shows the largest reduction of both metrics with respect to the original image.
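Both contrast metrics are straightforward to evaluate; a minimal NumPy sketch (the function name is ours) is:

```python
import numpy as np

def contrast_metrics(f):
    """Compute the two global contrast metrics of the paper:
    m1 = f_max - f_min (intensity range) and the Michelson-type contrast
    m2 = (f_max - f_mean) / (f_max + f_mean), for an intensity image f."""
    f = np.asarray(f, dtype=float)
    fmax, fmin, fmean = f.max(), f.min(), f.mean()
    m1 = fmax - fmin
    m2 = (fmax - fmean) / (fmax + fmean)
    return m1, m2
```

Note that this follows the paper's variant of the Michelson formula, which uses \(f_{mean}\) in place of the \(f_{min}\) appearing in the classical definition.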

Table 2 Global metrics of the image contrast (defined in Sect. 5.2) evaluated on the set of smooth images and their cartoon part displayed in Fig. 4

5.3 Results on noisy images

In Fig. 5, a comparison between C-TETRIS and CEN on the set of noisy images is shown. The cameraman image was corrupted by different sources of noise using the MATLAB imnoise function. In detail: the option ‘gaussian’ was used with different values of the standard deviation to obtain images affected by Gaussian noise with signal-to-noise ratio (SNR) equal to 20 and 15, respectively; by rescaling the pixels of the original image and using the option ‘poisson’, we obtained images affected by Poisson noise with SNR equal to 35 and 30, respectively; finally, the option ‘salt & pepper’ was used to create images affected by impulsive noise on 5% and 15% of the pixels. We note that C-TETRIS is more accurate in separating background and foreground, especially as the noise level increases. In this case, indeed, the noise is recognised as texture and classified as foreground.
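For readers without MATLAB, the three corruption procedures can be mimicked in NumPy along the following lines (a sketch under our own parameterization; the function name and parameter names are ours, and the noise levels are not calibrated to the SNR values used in the experiments):

```python
import numpy as np

def add_noise(img, kind, rng=None, sigma=0.05, density=0.05, peak=255.0):
    """Corrupt an image with values in [0, 1], roughly mimicking the
    MATLAB imnoise options used in the experiments."""
    rng = np.random.default_rng(rng)
    img = np.asarray(img, dtype=float)
    if kind == 'gaussian':
        # additive zero-mean Gaussian noise with standard deviation sigma
        out = img + rng.normal(0.0, sigma, img.shape)
    elif kind == 'poisson':
        # rescale to photon counts, draw Poisson samples, rescale back
        out = rng.poisson(img * peak) / peak
    elif kind == 'salt & pepper':
        # set a fraction `density` of pixels to 0 (pepper) or 1 (salt)
        out = img.copy()
        mask = rng.random(img.shape) < density
        out[mask] = rng.integers(0, 2, mask.sum())
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return np.clip(out, 0.0, 1.0)
```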

5.4 Results on textural images

Here we analyze the results of the C-TETRIS model on images containing textural components which require a two-region segmentation. We compared C-TETRIS with the Spatially Adaptive Regularization (SpAReg) model [29], which modifies the CEN model as follows:

$$\begin{aligned} \begin{array}{rl} \underset{f}{\min } & \displaystyle \sum _{i,j} \left( \vert \nabla _x f \vert _{i,j} + \vert \nabla _y f \vert _{i,j} + \lambda _{i,j} \, (r^\top f)_{i,j} \right) \\ {{\,\mathrm{s.t.}\,}}& 0 \le f \le 1 \end{array} \end{aligned}$$
(23)

where each entry of the matrix \(\Lambda =(\lambda _{i,j})\) weighs the pixel \((i,j)\) according to local texture information as follows:

$$\begin{aligned} \lambda _{i,j} = \max \left\{ \frac{\lambda _{min}}{\lambda _{max}}, \, 1-(\rho _\sigma )_{i,j} \right\} \lambda _{max} \, . \end{aligned}$$
(24)
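The adaptive weighting in Eq. (24) amounts to a single vectorized operation; a minimal NumPy sketch (the function name is ours) is:

```python
import numpy as np

def adaptive_lambda(rho_sigma, lam_min, lam_max):
    """Per-pixel regularization weights of the SpAReg model, Eq. (24):
    lambda_ij = max(lam_min / lam_max, 1 - (rho_sigma)_ij) * lam_max,
    where rho_sigma in [0, 1] is the local texture indicator, so highly
    textured pixels (rho close to 1) receive weaker data weighting."""
    rho = np.asarray(rho_sigma, dtype=float)
    return np.maximum(lam_min / lam_max, 1.0 - rho) * lam_max
```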

\((\rho _\sigma )_{i,j}\) is defined by applying Eq. (5) to the given image \({\bar{f}}\), and \(0< \lambda _{min}< \lambda _{max} < \infty\) is a suitable range driving the level of regularization, depending on the image to be segmented. In all the tests, we set \(\lambda _{min} \le \lambda _{CEN} < \lambda _{max}\). We also include in the comparison a well-known segmentation model designed for textural images [55], which we denote as HTB. While C-TETRIS and SpAReg, being based on the original CEN model, classify foreground and background as regions with different intensities, the HTB model classifies them as regions with different textural components. In detail, it finds a contour that maximizes the KL distance between the probability density functions of the regions inside and outside the evolving (closed) active contour, with the aim of separating textural objects of interest from the background. The feature used to characterize the texture is based on the principal curvatures \(\chi\) of the intensity image, considered as a 2-D manifold embedded in \({{\mathbb {R}}}^3\). Specifically, the objective function of the HTB model is

$$\begin{aligned} KL(p_{in},p_{out})= \sum _{i,j} ((p_{in})_{i,j}-(p_{out})_{i,j}) \, ( \log \, (p_{in})_{i,j} - \log \, (p_{out})_{i,j}), \end{aligned}$$

where \(p_{in}\) and \(p_{out}\) are the probability distributions of the texture feature \(\chi\) in \(\Omega _{in}\) and \(\Omega _{out}\), respectively, assuming a Gaussian distribution. We consider the implementation of the HTB model provided in [45].
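The HTB objective above is the symmetric (two-sided) KL distance between the discretized densities; given the two distributions, it can be evaluated directly. A minimal NumPy sketch (the function name and the eps guard are ours):

```python
import numpy as np

def symmetric_kl(p_in, p_out, eps=1e-12):
    """Symmetric KL distance used by the HTB objective:
    sum_ij (p_in - p_out) * (log p_in - log p_out).
    A small eps guards against log(0) on empty histogram bins."""
    p_in = np.asarray(p_in, dtype=float) + eps
    p_out = np.asarray(p_out, dtype=float) + eps
    return np.sum((p_in - p_out) * (np.log(p_in) - np.log(p_out)))
```

Each summand is a product of two factors with the same sign, so the distance is nonnegative and, by construction, symmetric in its two arguments.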

Figure 6 compares the segmentations produced by C-TETRIS with the ones produced by SpAReg and HTB, respectively. Firstly, we note that C-TETRIS outperforms both SpAReg and HTB on tiger and spiral, where the textural object region is well identified and separated from the background. On the bear test image, C-TETRIS seems to identify the main object better than SpAReg; however, it mistakenly includes in the foreground region some parts of the background below the bear. Both models are outperformed by HTB, which is the only model able to include the upper part of the image in the background region. In our opinion, the inaccurate result produced by the other two models is mainly due to the inhomogeneity of the background intensity, which hinders its separation from the foreground region.

6 Conclusion

In this paper, a new model named Cartoon-Texture Evolution for Two-Region Image Segmentation (C-TETRIS) is proposed. C-TETRIS aims at improving the CEN model, which is specifically designed for smooth images, in order to produce good results on a wider set of images. Indeed, starting from a rough cartoon-texture decomposition of the image to be segmented, \(\bar{f} = \bar{u} + \bar{v}\), where \(\bar{u}\) and \(\bar{v}\) denote the cartoon and the texture components respectively, C-TETRIS simultaneously produces a decomposition \(\bar{u}=u+v\), where v is enforced to be close to \(\bar{v}\) and u to be the best approximation of \(\bar{u}\) among all the functions that take only two values. This is realized by combining the CEN model on u with a Kullback-Leibler divergence of v from \(\bar{v}\). The proposed model leads to a non-smooth constrained optimization problem, solved by means of the ADMM method, for which a convergence result is provided. Numerical experiments show that, as the ADMM iterations proceed, C-TETRIS progressively subtracts from \(\bar{u}\) the remaining texture, leading to a clearer distinction between the background and the foreground of the image. The experiments also show that the proposed model produces accurate two-region segmentations, comparable with or better than those produced by state-of-the-art segmentation models, on several images, including images corrupted by noise or containing textural components. Furthermore, C-TETRIS seems to be robust with respect to the type and level of noise. Future work will deal with the extension of the proposed combination of cartoon-texture decomposition and KL divergence term to more advanced image segmentation models.