
1 Introduction

Estimation of the foreground and background layers of an image is fundamental in image and video editing. In the computer vision literature, this problem is known as image matting or alpha matting. Mathematically, the process is modeled in [1] by considering the observed color of a pixel as a combination of a foreground color and a background color:

$$\begin{aligned} I_{z} = \alpha _{z}F_{z} + (1-\alpha _{z})B_{z} \end{aligned}$$
(1)

where \(F_{z}\) and \(B_{z}\) are the foreground and background colors of pixel z, and \(\alpha _{z}\) represents the opacity of a pixel and takes values in the range [0,1], with \(\alpha _{z} = 1\) for foreground pixels and \(\alpha _{z} = 0\) for background pixels. This is a highly ill-posed problem since we have to estimate seven unknowns from three composition equations for each pixel - one for each color channel. Typically, matting approaches rely on constraints such as assumptions on image statistics [2, 3] or user interactions such as a trimap to reduce the solution space. A trimap [4] partitions an image into three regions - known foreground, known background and unknown regions that consist of a mixture of foreground and background colors.

From the aspect of assumptions on image statistics, existing natural image matting methods fall into three categories: (1) propagation-based [2, 5–10]; (2) color sampling-based [11–18]; (3) combinations of sampling-based and propagation-based methods [19–22]. Propagation-based methods assume that neighboring pixels are correlated under some image statistics and use their affinities to propagate alpha values from known regions toward unknown ones. Sampling-based methods assume that the foreground and background colors of an unknown pixel can be explicitly estimated by examining nearby pixels. Thus, these methods collect sets of known foreground and background samples to estimate the alpha values of unknown pixels. Early parametric sampling-based methods fit parametric statistical models to known foreground and background samples and then estimate alpha values by considering the distances of unknown pixels to the known foreground and background distributions. However, they generate large fitting errors when the color distributions do not fit a statistical model well. More recent non-parametric sampling-based methods simply collect sets of known foreground and background samples and select the best (F, B) pairs via an objective function combining spatial, photometric and probabilistic characteristics of the image to estimate the alpha values of unknown pixels. Once the best (F, B) pair is selected, the alpha value is computed as

$$\begin{aligned} \alpha _{z} = \frac{{(I_{z}-B)}\cdot {(F-B)}}{\parallel {F-B}\parallel ^2} \end{aligned}$$
(2)
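Equation (2) simply projects the observed color onto the line spanned by the chosen pair. As a minimal numpy sketch (the function name, the small `eps` guard against F = B, and the clipping to [0, 1] are our additions, not part of the paper):

```python
import numpy as np

def alpha_from_pair(I, F, B, eps=1e-8):
    """Estimate alpha by projecting the observed color I onto the
    line segment between F and B, as in Eq. (2)."""
    I, F, B = (np.asarray(v, dtype=float) for v in (I, F, B))
    d = F - B
    a = np.dot(I - B, d) / (np.dot(d, d) + eps)   # eps guards against F == B
    return float(np.clip(a, 0.0, 1.0))            # opacity lies in [0, 1]
```

If the observed color lies exactly on the segment between F and B, the estimate recovers the true mixing ratio; otherwise it returns the least-squares projection.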

Combined methods [19–22] cast matting as an optimization problem and combine a color sampling component and an alpha propagation component in an energy function; solving for the alpha matte then becomes an energy minimization task. By this combination, more accurate and robust matting solutions can be expected. For a more comprehensive review of image matting methods, we refer the reader to [21, 23].

Fig. 1.

Sampling-based matting approaches. Top and middle row: (a) An original image with foreground and background boundaries marked as red and blue lines respectively, (e) the ground truth alpha matte, sampling strategies of (b) Robust [21], (c) Shared [13], (d) Global [14], (f) Comprehensive [16], (g) KL-Divergence [18] and (h) the proposed matting approach. Bottom row: Comparison of the alpha mattes estimated by the proposed approach with the Comprehensive sampling method [16] and the KL-Divergence based sparse sampling method [18] (from left to right: zoomed area and corresponding alpha mattes estimated by [16, 18] and the proposed method, respectively) (Color figure online)

The matting method proposed in this paper belongs to the group of sampling-based approaches. As we discuss in detail in the next section, these approaches suffer from the fact that the quality of the extracted matte depends highly on the selected samples, and performance degrades when the true foreground and background colors (true samples) of unknown pixels are not in the sample sets. Existing sampling-based methods sample foreground and background colors based only on their spatial closeness to the unknown pixels (sampling around the boundaries of the known regions [8, 13, 21] or expanding the sampling range for pixels farther from the foreground and background boundaries [16]), which leads to missing true samples, especially when the trimaps are coarse. To overcome this problem, we build a large set of representative samples that covers all the color clusters in the image to avoid the loss of true samples, and then select a set of candidate samples for each unknown pixel from these representative samples via an objective function that takes advantage of the spatial as well as color statistics of the samples. The samples selected by the proposed method are shown in Fig. 1(h).

The second disadvantage of current non-parametric sampling-based approaches is that they choose the best (F, B) pair from the candidate samples through optimization and use that pair to estimate \(\alpha _{z}\) via Eq. (2). This implies that \(\alpha _{z}\) is determined by a single (F, B) pair, and the goodness of that pair depends on how well the optimization is done. Thus, if the optimization process fails to find the best pair, the extracted matte will not be accurate. Inspired by sparse coding matting [17], a new objective function is proposed that estimates the alpha value of a given pixel directly from a set of candidate foreground and background samples instead of from a single best pair. This objective function contains measures of both chromatic distortion and spatial statistics of the image, which is the main difference from the original sparse coding matting [17], which uses chromatic distortion only.

This paper is organized as follows. We review sampling-based matting methods and their limitations in Sect. 2 followed by description of the proposed approach in Sect. 3. Experimental results are discussed in Sect. 4 and we conclude the paper in Sect. 5.

2 Related Work

Sampling-based image matting methods mainly differ from each other in (1) how they collect the candidate foreground and background samples for unknown pixels, and (2) how they estimate the alpha matte from the candidate samples.

Samples Collection: Early sampling-based methods simply collect foreground and background samples that are spatially close to the unknown pixel, either from a local window containing the unknown pixel [11] or along the boundaries of known regions [12, 21], based on local smoothness assumptions. This causes large fitting errors when the assumptions do not hold.

Shared sampling matting [13] shoots rays in different directions from each unknown pixel, dividing the image plane into disjoint sectors of equal planar angle, and collects samples along the rays. For each ray, it collects at most one background and at most one foreground sample - the ones closest to the unknown pixel along the ray, as shown in Fig. 1(c). Global sampling matting [14] proposes an approach that takes all available samples into consideration. Its foreground (background) sample set consists of all known foreground (background) pixels on the boundaries of unknown regions, as shown in Fig. 1(d).

The aforementioned sampling-based methods generally collect samples only around the boundaries of the known regions which may miss out true samples. Comprehensive sampling matting [16] builds a more comprehensive and representative set of known samples by expanding the sampling range farther from the foreground and background boundary and sampling from all color distributions in the sample regions as shown in Fig. 1(f). This approach gives better results than the previous sampling-based approaches. However, there is still a possibility of missing out true samples since the sampling strategy depends on spatial closeness. KL-Divergence sampling matting [18] formulates sampling as a row-sparsity regularized trace minimization problem and picks a small set of candidate samples that best explain the unknown pixels based on pairwise dissimilarities between known and unknown pixels as shown in Fig. 1(g). This method gathers a uniform sparse set of samples for all unknown pixels which might also miss out true samples. A visual comparison of the alpha mattes estimated by the proposed method with comprehensive [16] and KL-Divergence sampling methods [18] is shown in the last row of Fig. 1.

Alpha Matte Estimation: Classical parametric sampling-based image matting algorithms focus on how to model the relations between the samples and the alpha parameter. The Knockout method [12] adopts a weighted sum of candidate samples to estimate foreground and background colors of unknown pixels and uses them to estimate the alpha value in each channel. The final alpha value is estimated as a weighted sum of the values in all channels. Bayesian matting [11] models foreground and background colors as mixtures of Gaussians and the matting problem is formulated in a well-defined Bayesian framework, then the matte is solved with a maximum-likelihood criterion.

Because fitting statistical models in parametric sampling-based methods can estimate alpha values poorly, recent non-parametric sampling-based approaches focus on selecting the best foreground and background sample pair (F, B) from candidate samples and using that pair to estimate the alpha value via Eq. (2), as suggested in [13–16, 18, 21]. They use an objective function containing different image characteristics to find the best (F, B) pair; these methods differ from each other in which image characteristics they use.

In non-parametric sampling-based methods, the alpha values are determined by a single (F, B) pair; thus, when the designed objective function fails to find the best sample pair, an inaccurate alpha matte is generated. To overcome this limitation, sparse coding matting [17] casts image matting as a sparse coding problem and generates alpha values from a set of foreground and background samples instead of choosing a single best (F, B) pair. This approach gives a visually superior matte compared to previous non-parametric sampling-based approaches.

3 Proposed Method

In this section, we first describe our clustering-based sampling method, which collects a representative set of samples from all known pixels. Next, a simple objective function is proposed to select a set of candidate foreground and background samples for each unknown pixel from the previously collected representative set. Then, we elaborate an objective function, containing both chromatic distortion and spatial statistics, that estimates alpha values directly from a set of foreground and background samples. Finally, we describe the pre- and post-processing steps used to refine the matting performance.

3.1 Gathering Samples Using K-means Clustering

The goal of sampling is to gather a representative set of foreground and background samples that covers a large range of diverse color clusters in the image so as not to miss out true samples. This is accomplished by clustering the foreground and background pixels respectively via a two-level hierarchical k-means clustering framework considering the spatial statistics as well as the color statistics in the image. This is motivated by the observation that the foreground and background colors in an image could be represented by a sparse set of pixels.

For the foreground region defined by a trimap, we first cluster the pixels into K clusters. We define the feature vector q(z) at a given pixel z as a 5-D vector \([R_z\ G_z\ B_z\ x_z\ y_z]^T\) consisting of the concatenation of the RGB color and the spatial position in image coordinates. Then, we create a matrix Q such that each column corresponds to the feature vector of one known foreground pixel. Thus, we can treat Q as the data matrix in the k-means clustering algorithm [24]. After the first level of clustering, the same clustering process is applied to the pixels in each cluster, but with respect to color statistics only. The number of clusters in the second level is determined by the summed color variances of the three color channels in each first-level cluster. The mean color values of each second-level cluster constitute the representative set of foreground samples. Using exactly the same method, a representative set of background samples is obtained.
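The two-level scheme can be sketched as follows. The helper `kmeans`, the default cluster count `k1`, and the variance-driven rule for the second-level cluster count `k2` are illustrative assumptions; the paper does not specify its exact settings:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def representative_samples(rgb, xy, k1=3, var_scale=1.0):
    """Level 1: cluster 5-D [R G B x y] features; level 2: re-cluster each
    group on color only, with the sub-cluster count growing with the summed
    per-channel color variance. Returns the mean color of every sub-cluster."""
    rgb, xy = np.asarray(rgb, float), np.asarray(xy, float)
    labels, _ = kmeans(np.hstack([rgb, xy]), min(k1, len(rgb)))
    samples = []
    for j in np.unique(labels):
        colors = rgb[labels == j]
        # illustrative rule: more sub-clusters for higher color variance
        k2 = max(1, min(len(colors), 1 + int(var_scale * colors.var(axis=0).sum())))
        sub, _ = kmeans(colors, k2)
        for c in np.unique(sub):
            samples.append(colors[sub == c].mean(axis=0))
    return np.array(samples)
```

Running the same routine on the known background pixels yields the background sample set.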

3.2 Selecting Candidate Samples

In the k-means clustering sample gathering step, we collect two large sets of foreground and background samples that cover the various color clusters in the image for all the unknown pixels. To reduce the number of hypotheses to be tested when estimating the alpha matte, for each unknown pixel z we choose from the representative sample sets a set of candidate samples that better represent the true foreground and background colors of the pixel. Hence, a simple objective function \(O_{z}\) adopting previously suggested measures of chromatic distortion \(C_{z}\) and spatial statistics \(S_{z}\) [8, 15, 16, 18] is proposed:

$$\begin{aligned} O_{z}(F_{i}, B_{j}) = C_{z}(F_{i}, B_{j}) \times S_{z}(F_{i}, B_{j}) \end{aligned}$$
(3)

\(C_{z}\) quantifies how well the estimated alpha value \(\alpha _{z}\) of pixel z obtained using Eq. (2) from a sample pair \((F_{i}, B_{j})\) fits the linear model of composition Eq. (1), and is given by:

$$\begin{aligned} C_{z}(F_{i}, B_{j}) = exp(-\parallel {I_{z}-(\alpha _{z}F_{i} + (1-\alpha _{z})B_{j})}\parallel ) \end{aligned}$$
(4)

where \(I_{z}\) denotes the observed color of unknown pixel z. \(C_{z}\) has a high value for an (F, B) pair whose estimated alpha fits the linear composition equation well.

The term \(S_{z}\) quantifies the closeness between the unknown pixel z and the sample pair (F, B) in the spatial coordinate domain. It is formulated as:

$$\begin{aligned} S_{z}(F_{i}, B_{j}) = exp(-\frac{\parallel {z-F_{i}^{s}}\parallel }{Z^{F}}) \times exp(-\frac{\parallel {z-B_{j}^{s}}\parallel }{Z^{B}}) \end{aligned}$$
(5)

where \(F_{i}^{s}\) denotes the spatial coordinates of foreground sample \(F_{i}\) and \(B_{j}^{s}\) denotes the spatial coordinates of \(B_{j}\). \(Z^{F}=\frac{1}{|\mathcal {S}^{F}|}\sum _{F_{k}\in \mathcal {S}^{F}}\parallel {z-F_{k}^{s}}\parallel \) and \(Z^{B} = \frac{1}{|\mathcal {S}^{B}|}\sum _{B_{k}\in \mathcal {S}^{B}}\parallel {z-B_{k}^{s}}\parallel \) are scaling factors corresponding to the mean spatial distances from the unknown pixel z to all the foreground samples \(\mathcal {S}^{F}\) (with \(|\mathcal {S}^{F}|\) elements) and to all the background samples \(\mathcal {S}^{B}\) (with \(|\mathcal {S}^{B}|\) elements), respectively. Hence, this term tends to select samples that are spatially close to the unknown pixel.

Finally, for each pixel z, we select the N foreground and background pairs with the highest values of the objective function (3). The foreground and background samples of these N pairs constitute the foreground sample set \({\mathcal {S}_{z}^F}\) and the background sample set \({\mathcal {S}_{z}^B}\) of the unknown pixel z.
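The selection step can be sketched as a brute-force scoring of all pairs with Eqs. (3)-(5). The function names and the toy default `N=3` are our assumptions, and a real implementation would vectorize the double loop:

```python
import numpy as np

def alpha_from_pair(I, F, B, eps=1e-8):
    """Eq. (2): project I onto the F-B line segment."""
    d = F - B
    return float(np.clip(np.dot(I - B, d) / (np.dot(d, d) + eps), 0.0, 1.0))

def select_candidates(Iz, z, Fcols, Fpos, Bcols, Bpos, N=3):
    """Rank all (F, B) pairs by O_z = C_z * S_z (Eqs. 3-5), keep the top N.
    Returns (score, fg_index, bg_index) triples, best first."""
    ZF = np.mean([np.linalg.norm(z - p) for p in Fpos]) + 1e-8
    ZB = np.mean([np.linalg.norm(z - p) for p in Bpos]) + 1e-8
    scored = []
    for i, (F, fp) in enumerate(zip(Fcols, Fpos)):
        Sf = np.exp(-np.linalg.norm(z - fp) / ZF)
        for j, (B, bp) in enumerate(zip(Bcols, Bpos)):
            a = alpha_from_pair(Iz, F, B)
            C = np.exp(-np.linalg.norm(Iz - (a * F + (1 - a) * B)))  # Eq. (4)
            S = Sf * np.exp(-np.linalg.norm(z - bp) / ZB)            # Eq. (5)
            scored.append((C * S, i, j))
    scored.sort(key=lambda t: -t[0])
    return scored[:N]
```

A pair whose composite exactly reproduces the observed color and whose samples sit close to the pixel dominates the ranking, as intended by Eq. (3).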

Figure 2 shows the sampling process of the proposed method. The original image is shown in Fig. 2(a); its trimap consists of background, unknown and foreground regions labeled as black, gray and white respectively, as shown in Fig. 2(b). The foreground and background clusters obtained by the two-level k-means clustering framework are shown in Fig. 2(c), with clusters represented by different colors. Figure 2(d) shows the selected candidate samples (with red and blue points representing foreground and background samples, respectively) for pixel p (yellow point). As can be seen, the proposed sampling strategy selects foreground and background samples from the known regions for each pixel while avoiding missing true samples.

Fig. 2.

Cluster-based sampling and candidate samples choosing. (a) Original image. (b) Trimap. (c) Foreground and background clusters via two-level k-means clustering. (d) Candidate samples for pixel p. (e) The generated alpha matte (Color figure online)

3.3 Estimating \(\alpha \) via Sparse Coding

As mentioned in Sect. 2, previous non-parametric sampling-based methods generally select the best foreground and background pair (F, B) for each pixel from the candidate samples through an optimization process and use it to estimate the alpha value by Eq. (2). The main drawback of these methods is that the alpha value is determined by a single best pair, so they generate an incorrect alpha matte if the optimization fails to find the best pairs. To overcome this limitation, inspired by [17], the proposed method capitalizes on sparse coding to establish an objective function for generating alpha values directly from a set of candidate foreground and background samples.

In [17], the authors form a dictionary \(\mathcal {D}\) for each unknown pixel z using the collected foreground and background samples. The word vector used for constituting the dictionary is a 6-D vector \([R\ G\ B\ L\ a\ b]^T\) consisting of the concatenation of the RGB and Lab color spaces, normalized to unit length. \(\mathcal {D}\) is a matrix in which each column is the word vector of one candidate sample. Then, the alpha value of pixel z is determined by sparse coding as

$$\begin{aligned} \varvec{\beta } = \mathop {argmin}\limits _{\varvec{\beta }}\parallel {v_{z}-\mathcal {D}\varvec{\beta }}\parallel ^2\qquad s.t. \quad \parallel {\varvec{\beta }}\parallel _1 \le 1;\ \varvec{\beta } \ge 0 \end{aligned}$$
(6)

where \(v_{z}\) is the signal vector at pixel z composed of \((R_{z},G_{z},B_{z},L_{z},a_{z},b_{z})\). The sparse codes \(\varvec{\beta }\) corresponding to words in the dictionary that belong to the foreground sample set are added to form the alpha value for the unknown pixel.

$$\begin{aligned} \alpha _{z} = \sum _{p\in F_z} \beta _{p} \end{aligned}$$
(7)

where \(F_z\) is the set of foreground samples of pixel z. Since the non-zero values in \(\varvec{\beta }\) indicate the ratios of the corresponding sample colors in composing the color of unknown pixel, the sparse codes directly provide the alpha value.
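Equation (6) is a least-squares problem over the capped simplex \(\{\beta \ge 0, \|\beta\|_1 \le 1\}\). The paper does not specify a solver for it; the projected gradient scheme below, with a standard simplex-projection routine and a Lipschitz step size, is one common choice and is offered only as a sketch:

```python
import numpy as np

def project_capped_simplex(b):
    """Euclidean projection onto {beta >= 0, sum(beta) <= 1}."""
    y = np.maximum(b, 0.0)
    if y.sum() <= 1.0:
        return y
    # project onto the probability simplex (sum == 1)
    u = np.sort(b)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(b) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(b - theta, 0.0)

def sparse_code(v, D, iters=500):
    """Projected gradient descent for the constrained problem of Eq. (6)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-8)   # 1/L for the quadratic
    beta = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ beta - v)
        beta = project_capped_simplex(beta - step * grad)
    return beta

def alpha_from_codes(beta, fg_mask):
    """Eq. (7): alpha is the summed code mass on foreground words."""
    return float(beta[fg_mask].sum())
```

With an identity dictionary the codes recover the mixing weights exactly, so the foreground mass equals the true alpha.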

The proposed method also takes advantage of sparse coding to generate \(\alpha \) directly from a set of candidate samples. Moreover, we take extra characteristics of the samples into consideration during sparse coding: the spatial distances of the samples to the unknown pixel and the color variances of the clusters that generated the sample colors. The alpha value is determined by an objective function derived from a weighted sparse coding as

$$\begin{aligned} \begin{aligned} \varvec{\beta } =&\mathop {argmin}\limits _{\varvec{\beta }} \parallel {v_{z} - \mathcal {D} \varvec{\beta }} \parallel _{2}^{2} + \lambda \parallel {diag(\varvec{w})\varvec{\beta }}\parallel _{1} \\&s.t. \quad \parallel {\varvec{\beta }}\parallel _1 \le 1;\ \varvec{\beta } \ge 0\\ \end{aligned} \end{aligned}$$
(8)

where \(v_{z}\) and \(\mathcal {D}\) have the same meaning as in Eq. (6). \(\lambda \) is a weighting parameter balancing the chromatic distortion term against the weighted sparsity term that encodes the spatial statistics. \(diag(\varvec{w})\) is a diagonal matrix corresponding to the vector \(\varvec{w}\), which weights the words in the dictionary according to the characteristics of the corresponding samples, and is formulated as:

$$\begin{aligned} w_{p} = 1-T_{z}(Y_{p}) \times U_{z}(Y_{p}) \end{aligned}$$
(9)

where T represents the spatial statistics of the image and U indicates the color variances of the clusters.

The term \(T_{z}\) measures the spatial distance of the sample \(Y_{p}\) to the unknown pixel z and is given by:

$$\begin{aligned} T_{z}(Y_{p}) =\left\{ \begin{aligned}&exp(-\frac{\parallel {z-Y_p^s}\parallel }{Z_{z}^{F}}), \quad&Y_p\in \mathcal {S}_z^F\\&exp(-\frac{\parallel {z-Y_p^s}\parallel }{Z_{z}^{B}}),&Y_p\in \mathcal {S}_z^B\\ \end{aligned} \right. \quad p =1,2,\cdots ,P \end{aligned}$$
(10)

where P is the size of the dictionary \(\mathcal {D}\). \(Z_{z}^{F}\) and \(Z_{z}^{B}\) represent the mean spatial distances from the unknown pixel z to all the candidate foreground and background samples of that pixel, respectively. Hence, the sparse codes tend to have high values for the words in \(\mathcal {D}\) computed from samples spatially close to the unknown pixel.

The term \(U_{z}\) biases the sparse codes towards samples that come from clusters with low color variances and is formulated as:

$$\begin{aligned} U_{z}(Y_{p}) = \left\{ \begin{aligned}&exp(-(1+\frac{\log _{10}{Y_{p}^r}}{M_{F}})), \quad&Y_p\in \mathcal {S}_z^F\\&exp(- (1+\frac{\log _{10}{Y_{p}^r}}{M_{B}})),&Y_p\in \mathcal {S}_z^B\\ \end{aligned} \right. \quad p =1,2,\cdots ,P \end{aligned}$$
(11)

where \(Y_{p}^r\) is the summed variance of the three color channels of the cluster from which sample \(Y_{p}\) comes. The scalars \(M_{F} = max_{F_{k} \in \mathcal {S}_{z}^F}(\left| {\log _{{10}} F_{k}^{r} } \right| )\) and \(M_{B} = max_{B_{k} \in \mathcal {S}_{z}^B}(\left| {\log _{{10}} B_{k}^{r} }\right| )\) are scaling factors, corresponding to the maximum absolute logarithm of the summed three-channel color variance over the clusters forming the foreground sample set and the background sample set, respectively.
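Putting Eqs. (9)-(11) together, the per-word weights might be computed as in the following sketch; the argument layout and the small numerical guards are our assumptions:

```python
import numpy as np

def word_weights(z, pos, var, is_fg):
    """w_p = 1 - T_z(Y_p) * U_z(Y_p)  (Eqs. 9-11).
    pos: sample spatial coordinates; var: summed color variance of each
    sample's source cluster; is_fg: boolean mask marking foreground words."""
    z, pos = np.asarray(z, float), np.asarray(pos, float)
    var, is_fg = np.asarray(var, float), np.asarray(is_fg, bool)
    d = np.linalg.norm(pos - z, axis=1)
    T = np.empty(len(pos))
    U = np.empty(len(pos))
    # foreground and background words are scaled separately
    for mask in (is_fg, ~is_fg):
        if not mask.any():
            continue
        T[mask] = np.exp(-d[mask] / (d[mask].mean() + 1e-8))        # Eq. (10)
        M = np.abs(np.log10(var[mask])).max() + 1e-8
        U[mask] = np.exp(-(1.0 + np.log10(var[mask]) / M))          # Eq. (11)
    return 1.0 - T * U                                              # Eq. (9)
```

Since the weights multiply the L1 penalty in Eq. (8), a spatially close, low-variance sample gets a small weight and is penalized less, so its code is more likely to stay active.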

Fig. 3.

Effect of spatial statistics and color variances. (a) Original image. (b) Zoomed areas. Estimated matte with (c) \(\lambda = 0\) and (d) \(\lambda = 0.0025\). (e) Ground truth mattes (Color figure online)

The optimization of Eq. (8) can be solved as a quadratic programming problem. We use a variant of the active-set algorithm [25] that benefits from the sparsity of the solution [26]. Once the codes \(\varvec{\beta }\) are generated, the alpha value of pixel z is obtained using Eq. (7).

Figure 3 shows the effect of taking spatial statistics and color variances into consideration while estimating the alpha matte. The original image is shown in Fig. 3(a) with the corresponding foreground and background boundaries. Zoomed areas are shown in Fig. 3(b). Figure 3(c) shows the alpha mattes of the zoomed areas obtained with \(\lambda = 0\) and (d) with \(\lambda = 0.0025\) in Eq. (8). The ground truth mattes of the zoomed areas are shown in Fig. 3(e). As can be seen, combining chromatic distortion, spatial statistics and color variances in a weighted sparse coding framework provides a more accurate alpha matte than using chromatic distortion alone [17].

3.4 Pre- and Post-processing

Akin to recent sampling-based matting approaches [16–18], we adopt pre- and post-processing steps in the proposed method.

Expansion of Known Regions: To obtain a more refined trimap, the proposed method uses a pre-processing step to extrapolate the known foreground and background regions into the unknown regions based on chromatic and spatial thresholds. An unknown pixel z is considered foreground if there exists a pixel \(r\in F\) satisfying

$$\begin{aligned} (\parallel {z-r}\parallel < E_{thr})\wedge (\parallel {I_z-I_r}\parallel \le C_{thr}-\parallel {z-r}\parallel ) \end{aligned}$$
(12)

where \(E_{thr}\) and \(C_{thr}\) are thresholds in the spatial and color spaces, respectively. A similar formulation is applied to expand the background regions.
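A direct reading of Eq. (12) as code might look as follows; the threshold values below are placeholders, not the paper's settings. Note that the color tolerance shrinks as spatial distance grows, so only pixels that are both near and chromatically similar to a known foreground pixel are relabeled:

```python
import numpy as np

def expand_foreground(z, Iz, fg_pixels, E_thr=9.0, C_thr=0.25):
    """Return True if unknown pixel z can be relabeled foreground (Eq. 12).
    fg_pixels is an iterable of (position, color) for known fg pixels."""
    for r, Ir in fg_pixels:
        ds = np.linalg.norm(z - r)        # spatial distance ||z - r||
        dc = np.linalg.norm(Iz - Ir)      # color distance ||I_z - I_r||
        if ds < E_thr and dc <= C_thr - ds:
            return True
    return False
```

The symmetric routine with known background pixels expands the background region.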

Local Smoothing: As a post-processing, we perform local smoothing on the initial alpha matte estimated by weighted sparse coding to obtain a smooth matte using a modified version of the Laplacian matting model [2] adopted in [13]. Hence, the final alpha matte is optimized with a cost function consisting of the data term \(\hat{\varvec{\alpha }}\) and a confidence value f together with a local smoothness term expressed by matting Laplacian given by:

$$\begin{aligned} \varvec{\alpha }= \mathop {argmin}\limits _{\varvec{\alpha }} \varvec{\alpha }^T \varvec{L} \varvec{\alpha } + \varvec{(\alpha -\hat{\alpha })}^T (\varepsilon \varvec{\varSigma } +\gamma \varvec{\varGamma })\varvec{(\alpha -\hat{\alpha })} \end{aligned}$$
(13)

where \(\hat{\varvec{\alpha }}\) is the initial alpha matte generated using Eq. (7). \(\varvec{L}\) is the matting Laplacian defined in [2]. \(\varepsilon \) is a large weighting parameter penalizing divergence from the alpha values of the known pixels, and \(\gamma \) is a constant denoting the relative importance of the data and smoothness terms. The data term forces the final alpha matte to be close to the initial alpha matte \(\hat{\varvec{\alpha }}\), and the matting Laplacian enforces local smoothness. \(\varvec{\varSigma }\) is a diagonal matrix with values 1 for known foreground and background pixels and 0 for unknown pixels, while the diagonal matrix \(\varvec{\varGamma }\) has values 0 for known pixels and f for unknown pixels. The confidence value \(f_{z}\) at a given pixel z is computed by:

$$\begin{aligned} f_{z} = R_{z}\times {C_{z}} \end{aligned}$$
(14)
$$\begin{aligned} R_{z} = exp(-\parallel {v_{z}-\mathcal {D}\varvec{\beta }}\parallel ) \end{aligned}$$
(15)

where \(R_{z}\) measures the deviation in reconstructing the signal vector from the sparse coefficients, and \(C_{z}\) measures the distortion between the estimated and observed colors, as explained in Eq. (4).
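Setting the gradient of the quadratic cost in Eq. (13) to zero yields a linear system \((\varvec{L} + \varvec{C})\varvec{\alpha } = \varvec{C}\hat{\varvec{\alpha }}\) with \(\varvec{C} = \varepsilon \varvec{\varSigma } + \gamma \varvec{\varGamma }\). A dense sketch is below; a real implementation would use a sparse solver, the matting Laplacian is assumed to be given, and the \(\varepsilon\), \(\gamma\) defaults are illustrative:

```python
import numpy as np

def refine_alpha(L, alpha_hat, known, f, eps=100.0, gamma=0.1):
    """Closed-form minimizer of Eq. (13).
    L: matting Laplacian (n x n); alpha_hat: initial matte; known: boolean
    mask of known fg/bg pixels; f: per-pixel confidences (Eq. 14)."""
    # diagonal of C = eps*Sigma + gamma*Gamma
    c = np.where(np.asarray(known, bool), eps, gamma * np.asarray(f, float))
    A = np.asarray(L, float) + np.diag(c)
    return np.linalg.solve(A, c * np.asarray(alpha_hat, float))
```

Because \(\varepsilon\) is large, known pixels stay pinned near their trimap values, while unknown pixels trade off their initial estimate (weighted by confidence) against smoothness with their neighbors.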

4 Experimental Results

In this section, we first assess the effect of \(\lambda \) in Eq. (8). Then the performance of the proposed matting method is evaluated on a benchmark dataset [27]. It consists of 27 training images and 8 testing images. The training images have two types of trimaps: small and large, while the testing images have three types: small, large and user-defined, available at www.alphamatting.com. The ground-truth alpha mattes are publicly available for the training set but hidden for the testing images. An independent quantitative evaluation is provided in terms of the mean squared error (MSE), the sum of absolute differences (SAD), the gradient error and the connectivity error. Finally, we evaluate the effectiveness of the proposed sampling method in dealing with the problem of missing true samples.

Fig. 4.

Effect of \(\lambda \) on the performance. Plot shows average MSE values over all training images and all trimaps

4.1 Effect of Parameter \(\lambda \)

To quantitatively verify the effectiveness of our weighted sparse coding in generating alpha values, we evaluate the average MSE over all the training images and all trimaps of the benchmark dataset for different values of \(\lambda \), as shown in Fig. 4. When \(\lambda =0\), the objective function used to estimate the alpha matte becomes the same as that in [17]. As can be seen in Fig. 4, our objective function, which considers both chromatic distortion and spatial statistics, performs better than that of [17], which considers only chromatic distortion, when \(\lambda \) is set properly. In the experiments, \(\lambda \) is set to 0.0025 as it provides the minimum MSE on the training set.

Table 1. Evaluation of matting methods on the benchmark dataset [27] with three trimaps with respect to SAD, MSE and Gradient error metrics

4.2 Evaluation on Benchmark Dataset

Table 1 shows the quantitative evaluation of the proposed matting approach compared with current matting methods via the alpha matting website [27]. Only the ten best performing methods are shown in the table. We report the average rankings over the 8 testing images according to the SAD, MSE and gradient error metrics. “Average small/large/user ranks” represent the average ranks over the images for each of the three types of trimaps. The overall rank is the average over all the testing images and all types of trimaps. The proposed method ranks first with respect to SAD with an overall rank of 10.2. We achieve the best ranking among all the methods with respect to SAD and gradient error on the large trimap and rank second with respect to MSE (the first is LNSP [22]). The proposed method also ranks first on the user trimap with respect to SAD. This implies that the proposed method is more robust to the fineness of the trimap than previous sampling-based methods, since it weakens the spatial assumptions made while sampling for unknown pixels.

Fig. 5.

Visual Comparison of our approach with other matting methods. (a) Original image, (b) Zoomed areas. (c) LNSP [22], (d) Comprehensive sampling [16], (e) Sparse coding matting [17], (f) KL-Divergence based sparse sampling [18] and (g) proposed approach (Color figure online)

Figure 5 shows the visual comparison of our approach with the recent matting methods [16–18, 22] on the doll, plant and pineapple images from the benchmark dataset. Original images and zoomed areas are shown in Fig. 5(a) and (b), respectively. The mattes estimated for the zoomed areas by LNSP [22], Comprehensive sampling [16], Sparse coding matting [17], KL-Divergence based sparse sampling [18] and our method are shown in Fig. 5(c–g). The doll (first and second rows) is placed in front of a highly textured background, which makes it hard for sampling-based approaches to discriminate between foreground and background, as shown in Fig. 5(c,d,f). Sparse coding matting [17], which exploits sparse coding, produces a better matte, as shown in Fig. 5(e). The same problem occurs in the first zoomed area of plant (third row), where some characters in the background are considered foreground, as shown in Fig. 5(c,d,f). Sampling-based methods typically rely on certain spatial assumptions while collecting samples from known regions, which can lead to missing the true foreground and background colors of some unknown pixels, as in pineapple and the second zoomed area of plant (last three rows), shown in Fig. 5(d). Although the KL-Divergence sampling approach formulates sampling as a sparse subset selection problem, it collects the same set of samples for all the unknown pixels, which also leads to missing true samples, as seen in Fig. 5(f).

The proposed method builds a representative sample set for all unknown pixels to cover all true samples, and then selects a set of candidate samples for each unknown pixel via an objective function. Moreover, inspired by sparse coding matting [17], we use a weighted sparse coding to generate the alpha value directly from a set of foreground and background samples, which avoids the limitation that the quality of the alpha matte relies heavily on the goodness of a single sample pair. These two characteristics allow the proposed approach to extract a visually superior matte in these ambiguous areas, as shown in Fig. 5(g).

4.3 Missing True Samples

Previous sampling-based image matting methods typically rely on spatial closeness while collecting samples, and thus fail to generate an accurate alpha matte when the true samples are not spatially close to the unknown pixels; this problem is known as missing out true samples. Figure 6(a) shows two original images with their corresponding foreground and background boundaries from the benchmark dataset [27]. Zoomed areas and their ground truth alpha mattes are shown in Fig. 6(b) and (c), respectively.

Fig. 6.

Illustration of missing out true samples. (a) Original images. (b) Zoomed areas. (c) Ground truth mattes. Estimated mattes by (d) Proposed method, (e) KL-Divergence sampling [18] (f) Comprehensive sampling [16] and (g) Global Sampling [14] (Color figure online)

In the zoomed area of the doll girl image (first row), the true background colors of the gray pixels in the unknown region lie spatially far from those pixels, so the background sample sets collected by the comprehensive and global sampling methods do not contain the gray colors. These methods wrongly estimate the pixels as foreground, as shown in the first row of Fig. 6(f) and (g). The pumpkin image (second row) has a complex foreground, and the distribution of the true foreground samples does not satisfy the spatial closeness assumption; for some parts of the pumpkin they are therefore missed in the foreground sample sets collected by the comprehensive and global sampling methods, and those parts are mistakenly estimated as background, as shown in the second row of Fig. 6(f) and (g). KL-Divergence sampling selects a sparse set of foreground and background samples, which might also miss true samples, as shown in Fig. 6(e). The proposed method collects a relatively large and representative set of samples from all the known regions and selects a candidate set of samples for each unknown pixel based on both color and spatial statistics. The visual comparison between the ground truth mattes and those estimated by the proposed method, shown in Fig. 6(c) and (d), demonstrates that the proposed sampling strategy effectively addresses the problem of missing true samples.

5 Conclusions

A robust sampling-based image matting approach is proposed that applies a new sampling strategy to build a representative set of samples from the known regions. Rather than collecting samples according to spatial assumptions or selecting a uniform sample set for all unknown pixels, we select samples for each unknown pixel based on both color and spatial statistics to solve the problem of missing true samples. Moreover, based on weighted sparse coding, we adopt a new objective function that generates alpha values directly from a set of candidate samples, removing the restriction of a single (F, B) pair determining the alpha value. Finally, the quality of the estimated matte is refined using a local smoothness prior. Experimental results on a benchmark dataset show that the proposed method achieves more robust performance than state-of-the-art approaches.