Image smoothing based on global sparsity decomposition and a variable parameter

Smoothing images, especially those with rich texture, is an important problem in computer vision. Obtaining an ideal result is difficult due to the complexity, irregularity, and anisotropy of texture. Moreover, texture and structure share some properties, so retaining structure while simultaneously removing texture requires a difficult compromise. An ideal image smoothing algorithm must address three problems: for images with rich texture, the smoothing effect should be enhanced; inconsistency of smoothing results across different parts of the image must be overcome; and a method is needed to evaluate the smoothing effect. We apply texture pre-removal based on global sparse decomposition with a variable smoothing parameter to solve the first two problems. A parametric surface constructed by an improved Bézier method is used to determine the smoothing parameter. Three evaluation measures, edge integrity rate, texture removal rate, and gradient value distribution, are proposed to cope with the third problem. We use the alternating direction method of multipliers (ADMM) to solve the whole model and obtain the results. Experiments show that our algorithm outperforms existing algorithms both visually and quantitatively. We also demonstrate our method's ability in other applications such as clip-art compression artifact removal and content-aware image manipulation.


The problem
Natural images usually contain texture and structure. The human visual system can easily understand the structure without being affected by texture. However, for a computer, because texture can be complex, irregular, and anisotropic [1], removing texture from an image is a challenging task. The purpose of image smoothing is to remove texture while preserving structure as much as possible. Image smoothing is an important and widely used image processing technology; in tasks such as image segmentation, edge extraction, image enhancement, image decomposition, and artifact removal, it simplifies the problem immensely. Existing image smoothing algorithms can be roughly divided into three categories: filters based on local information, global optimization frameworks, and data-driven methods.

Filters based on local information
Bilateral filtering (BLF) [2] is a representative smoothing filter, which achieves smoothing by estimating the value of local patches by weighted average (Gaussian kernel estimation). After BLF was proposed, many improved versions [3,4] appeared, mostly using modified Gaussian kernels. Among them, bilateral texture filtering (BTF) [5] can ensure high sharpness of edges, but flat regions lack regularity, and the visual effect is poor. Tree filtering [6] successfully mitigates the ringing phenomenon by constructing a tree structure, but if a misclassified pixel occurs, it causes the main edge to break down, resulting in a false boundary. Filters based on local information also include anisotropic filters [7], guided filters [8], an extremum smoothing algorithm [9], etc. Most of these filters are intuitive and simple, but are too dependent on local information and cause ringing.
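To make the local-weighting idea concrete, the following is a minimal grayscale bilateral filter in NumPy. It is only an illustrative sketch of the principle behind BLF [2], not the reference implementation; the parameter names `sigma_s` and `sigma_r` are our own.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Minimal grayscale bilateral filter: each pixel becomes a Gaussian-
    weighted average of its neighbors, weighted both by spatial distance
    (sigma_s) and by intensity difference (sigma_r)."""
    h, w = img.shape
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img)
    # Precompute the spatial Gaussian kernel once.
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    spatial = np.exp(-(xx**2 + yy**2) / (2 * sigma_s**2))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range kernel: down-weight neighbors with different intensity,
            # which is what preserves edges while smoothing flat regions.
            rng = np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            weights = spatial * rng
            out[i, j] = np.sum(weights * patch) / np.sum(weights)
    return out
```

A small `sigma_r` keeps a step edge intact while leaving flat regions untouched, which is exactly the edge-preserving behavior described above.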

Global optimization frameworks
Weighted least squares (WLS) [10] is a relatively robust image smoothing algorithm based on global optimization, particularly suitable for gradual image coarsening and edge-preserving multi-scale detail extraction. Inspired by the difference-of-Gaussian (DoG) feature extraction algorithm [11,12], relativity-of-Gaussian (RoG) [13] smooths the image by describing the relationship between Gaussian filters of different sizes. However, RoG has several problems, such as difficult parameter control and a tendency to lose local information. A well-known algorithm, total variation (TV) [14], performs regularized optimization based on the L1 norm; it is often used in image denoising and restoration, but cannot effectively achieve smoothing.
By approximately highlighting the image structure to control the number of non-zero gradients, the L0 gradient minimization (L0) algorithm [15] adequately protects the main edges, but easily loses the original colors, so its results can be less visually pleasing. L0 gradient projection (L0p) [16] overcomes the difficulty of parameter control in L0, but does not suppress obvious pseudo-boundaries. To overcome the shortcomings of L0p, the algorithm in Ref. [17] restricts the smoothed image's gradient, but is suitable only for a few images. Compared with filtering methods, the optimization framework is more flexible but lacks protection of local information, and can easily lose local weak edges. The relative total variation (RTV) [18] algorithm applies a relative norm to combine local filters with global optimization. Although its effect is sound, it may cause edge expansion and fail to protect local weak edges.

Data-driven methods
With the development of deep learning, data-driven image smoothing algorithms have appeared. However, as there is no ground truth in image smoothing, conventional supervised and semi-supervised learning methods cannot be readily used. The algorithms in Refs. [19,20] attempt to use a unified CNN framework to imitate earlier smoothing methods [10,18], but do not escape the limitations of the original algorithms. Although the algorithm DVP [21] optimizes the image smoothing process by training parameters, improving the effect to a certain extent, generalization remains a barrier.

Contributions
Most filters based on local information are relatively intuitive and simple. Nonetheless, they tend to depend on the image's local information, resulting in ringing, edge expansion, and other phenomena. The optimization framework methods are flexible, but the regular terms' global selection can hardly protect local information. Generalization is still a limitation of the data-driven methods. Combining local filters and the global optimization framework can mitigate the problem to some extent, but is still not ideal, with, e.g., insufficient weak-edge protection.
Overall, we summarize the three main problems currently faced by image smoothing, and propose corresponding improvements:
1. The smoothing effect should be enhanced for richly textured images. In general, as image texture increases, current algorithms' smoothing effects become worse, as shown in Fig. 1.
2. Inconsistency of smoothing results in different parts of the image should be overcome.
3. A method is needed to evaluate the smoothing effect. The difficulty of image smoothing lies in the algorithm's ability to distinguish between texture and edges, which the human eye can do readily. Therefore, we can compare the edges extracted from the smoothing results with manually selected edges from the original image. Furthermore, image smoothing changes the gradient distribution, and the gradient is positively correlated with image smoothness, so we can also compare the gradient distributions of the results. Based on these two points, we propose three measures, edge integrity rate, texture removal rate, and gradient value distribution, to quantitatively evaluate the results using edges and gradients.
In summary, we combine local filters with the global optimization framework and propose an image smoothing algorithm based on global sparsity decomposition and a variable parameter. Firstly, global sparse decomposition is used to pre-remove part of the texture and improve the smoothing performance. The variable parameter is then obtained from a parametric surface fitted to patch-shift selections using the improved Bézier method, ensuring localization and continuity. Finally, we achieve image smoothing by limiting image gradient variation through the L1 norm. A flow chart is shown in Fig. 2.
The main structure of this paper is as follows: Section 2 introduces the model framework used in our algorithm, as well as the global sparse decomposition, patch-shift parameter selection, and improved Bézier fitting. Section 3 describes in detail the solution of our algorithm based on the alternating direction method of multipliers (ADMM). Section 4 shows the effects of selecting different parameters and compares the differences in visual effects and quantitative results between other algorithms and ours. Section 5 shows applications of our algorithm to clip-art compression artifact removal and content-aware image manipulation. Finally, Section 6 summarizes the paper.

Model framework
This section considers how to solve the first two problems. Natural images can generally be described as

y = x + n

where y, x, and n represent the original image, structure, and texture respectively. The goal of image smoothing is to obtain x from y. The global optimization framework can be described as

x̂ = arg min_x ||x − y||_2^2 + λ R(x)    (1)

where x̂ is the result, the first term is a fidelity term, and λ is a smoothing parameter. R(x) is the regularization term, which encodes prior information and is non-negative. We require λ > 0, otherwise R(x) may not give the right guidance. For example, if λ = −1 and R(x) = ||∇x||_2^2, the second term in Eq. (1) is −||∇x||_2^2, which favors a larger gradient of x when finding the minimum; this runs counter to our intent of removing texture information.

Global sparse decomposition
For the first problem, we decompose the image into two parts: the low frequency part representing the structure and the high frequency part containing the texture. In image super-resolution and image reconstruction, the high frequencies are usually considered to be the missing information in the scaling process used to refine the result [22][23][24].
In contrast, high frequencies need to be removed during image smoothing. What must be made clear here is that we pre-remove the texture while ensuring, as far as possible, that edges are not damaged. Therefore, global sparse decomposition is used to ensure that the high frequencies are sparse and to reduce the loss of structural information. The decomposition can be described as

arg min_{y_L, y_H} ||y − f_L ⊗ y_L − y_H||_2^2 + κ ||y_H||_1

where y_L and y_H represent the low- and high-frequency information respectively, f_L is a low-pass filter, and ⊗ is the convolution operator. The term f_L ⊗ y_L ensures that the smooth component contains the low-frequency information, so that y_H approximately represents the texture. κ controls the smoothness: the larger κ, the more information remains in y_L. It is well known that the L_p norm promotes sparsity when p ≤ 1. Here we use ||y_H||_1 to force y_H to be the sparse component under the L1 norm (the L1 norm is used to ensure convexity), making y_H contain only texture without destroying the structure. We compare the high-frequency component y_H with the gradient in Fig. 3, labeling different colors according to pixel values; clearly, y_H is sparser than the gradient. We analyze this property further in Fig. 4, which presents the numerical distributions of the gradient and of y_H for different κ. The peak value of y_H lies near 0, and its distribution is close to a Laplace distribution, because y_H is made sparse under the L1 norm. Comparatively, the gradient's distribution fluctuates and its peak is non-zero, since the gradient contains much structural information from the image. Furthermore, κ affects the sparsity of y_H: the larger κ, the sparser y_H. After removing y_H, we use y_L for smoothing.
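The alternating structure of this decomposition can be sketched as follows. This is a simplified stand-in, not the paper's solver: the low-pass filter f_L is replaced by a box blur, and the y_L refit is a crude residual update rather than a deconvolution.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def box_blur(img, r=1):
    """A (2r+1)x(2r+1) box filter standing in for the low-pass f_L."""
    pad = np.pad(img, r, mode='edge')
    out = np.zeros_like(img)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += pad[r + dy: r + dy + img.shape[0],
                       r + dx: r + dx + img.shape[1]]
    return out / (2 * r + 1) ** 2

def sparse_decompose(y, kappa=0.1, iters=20):
    """Alternating sketch of  min ||y - f_L*y_L - y_H||^2 + kappa*||y_H||_1.
    A larger kappa yields a sparser high-frequency component y_H."""
    y_L = y.copy()
    for _ in range(iters):
        y_H = soft(y - box_blur(y_L), kappa)  # prox step on the L1 term
        y_L = y - y_H                         # crude refit of the smooth part
    return y_L, y_H
```

The soft-thresholding step is what forces y_H toward zero almost everywhere, mirroring the Laplace-like distribution observed in Fig. 4.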

Patch-shift parameter selection and surface fitting
The second problem occurs mainly because λ is a constant parameter: even if we obtain the globally optimal solution, it may not be locally optimal. Computing a separate parameter for every point would solve this problem but is extremely time-consuming, so we propose a two-step parameter calculation algorithm comprising patch-shift parameter selection and parametric surface fitting.

Patch-shift parameter selection
Patch-shift is an intuitive approach in which we set the value for each patch by comparing it to the global variation. To allow simple adjustment of the smoothness, a user-adjustable parameter λ_G replaces the original λ. We define the local parameter λ_{i,j} by Eq. (2), where Ω_{i,j} refers to the patch and (i, j) are its coordinates, σ(·) is the standard deviation operation, χ_{i,j} is the fluctuation rate, s is a simple adjustment factor, and ε is a small value to prevent the denominator from being zero. As shown in Fig. 5, the smoother Ω_{i,j}, the smaller and more rapidly decreasing χ_{i,j}. However, because patch-shift parameter selection is discontinuous, the results show obvious pseudo-boundaries at the junctions of patches, as shown in Fig. 6(d).
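A per-patch fluctuation rate of this kind can be sketched as below. Since the exact form of Eq. (2) was not reproduced above, the mapping χ = σ(patch)/(σ(image) + ε) and λ = s·χ used here is an assumption for illustration only.

```python
import numpy as np

def patch_params(img, k=3, s=1.0, eps=1e-6):
    """Assumed sketch of patch-shift parameter selection: the fluctuation
    rate chi compares each k x k patch's standard deviation to the global
    one, and the local smoothing parameter is lambda = s * chi.
    (The paper's actual Eq. (2) may differ in form.)"""
    h, w = img.shape
    g = np.std(img)
    lam = np.zeros((h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            chi = np.std(img[i:i + k, j:j + k]) / (g + eps)
            lam[i, j] = s * chi
    return lam
```

On a flat image every patch has zero standard deviation, so every local parameter collapses to the same value, which is exactly why the raw patch grid must then be smoothed by surface fitting.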

Parametric surface fitting
To solve this problem, we propose a novel algorithm. We assign each patch's parameter to its center point to get a set of sample values. This grid of values can be considered a low-resolution surface, and we fit a parametric surface to it so that each pixel gets a unique parameter. After comparing various fitting methods, the Bézier method was chosen because: (i) the sample values calculated by Eq. (2) tend to fluctuate and cannot be well tuned by s alone; a Bézier surface passes through only the starting and ending sample values and smooths the intermediate ones, which allows easy parameter adjustment; and (ii) the Bézier method guarantees the convex-hull property, so the fitted parameter correctly satisfies λ > 0 whenever the sample values are positive. Usually, the greatest similarity is found between adjacent points, so we propose a fitting method based on neighboring patches (n-patches) that allows more sample values to act on the point p to be evaluated. Balancing computational cost, the 16 sample values closest to p are chosen to construct the parametric surface. Each 3 × 3 patch of sample values is called an n-patch, and the 16 sample values constitute 4 n-patches, as shown in Fig. 7 (all red points construct one n-patch); for ease of illustration, we assume the window sliding step to be 1. The surfaces f_{i,j}(p) are built from Bézier basis functions, where t is the distance from p to the reference point of f_{i,j}(p) in the (m, n) space. Each of f_{i,j}(p), f_{i+1,j}(p), f_{i,j+1}(p), and f_{i+1,j+1}(p) can give a different parameter. However, we expect points with the same pixel value to receive the same parameter, to make the image smoother except at edges, so a pixel-sensitive Gaussian weight over pixel values is used to combine the four parameters into F_{i,j}(p). In the weights of Eq. (3), P and P_{i,j} are the pixel values at point p and at the center point of f_{i,j}(p) respectively, and s is an adjustment factor. As shown in Fig. 8, the weight ω makes the result smoother.
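The Bézier machinery behind the surface fitting can be illustrated with a standard bicubic patch over a 4 × 4 grid of sample values (the 16 nearest patch-shift parameters). This is a textbook sketch, not the paper's improved variant:

```python
import numpy as np

def bernstein3(t):
    """Cubic Bernstein (Bezier) basis. The four weights are non-negative
    and sum to 1, so the surface stays inside the convex hull of its
    control values -- hence lambda > 0 whenever all samples are positive."""
    return np.array([(1 - t)**3,
                     3 * t * (1 - t)**2,
                     3 * t**2 * (1 - t),
                     t**3])

def bezier_patch(ctrl, u, v):
    """Evaluate a bicubic Bezier surface from a 4x4 grid of sample values
    at local coordinates u, v in [0, 1]."""
    bu, bv = bernstein3(u), bernstein3(v)
    return bu @ ctrl @ bv
```

Note that the patch interpolates only its corner samples and smooths over the interior ones, which is the behavior cited above as reason (i) for choosing the Bézier method.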
After obtaining the parameters of all points, we combine them as χ_{y_L}. Under the control of λ_G, the final parameter can be expressed as λ_{y_L} = λ_G · χ_{y_L}.

Efficient ADMM method for image smoothing
In order to improve efficiency, we combine the global sparse decomposition, the parametric surface, and the L1 norm into our model [25,26]:

arg min_{x, y_L, y_H} ||x − y_L||_2^2 + ακ ||y_H||_1 + λ_{y_L} ||∇x||_1   s.t.   y = f_L ⊗ y_L + y_H    (5)

Here α and κ weigh the sparsity of y_H, and λ_{y_L} controls the sparsity of the gradient. Eq. (5) is non-differentiable and non-linear, and difficult to solve directly, so we adopt ADMM to optimize this function iteratively. Two augmented Lagrangian terms are added under this strategy:

arg min_{x, y_L, y_H, T} ||x − y_L||_2^2 + ακ ||y_H||_1 + λ_{y_L} ||T||_1 + (γ_1/2) ||y − f_L ⊗ y_L − y_H + μ_1||_2^2 + (γ_2/2) ||∇x − T + μ_2||_2^2    (6)

For ease of writing, we omit the direction index d and replace the four-direction operator with ∇. γ_1 and γ_2 are the penalty parameters of the two Lagrangian terms; in practice, they are initialized to small positive values and increased in each iteration to ensure convergence. μ_1 and μ_2 are the Lagrange multipliers, and T is the auxiliary variable. Eq. (6) is convex, so we can update each variable iteratively until convergence.
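The overall ADMM structure, quadratic solve, soft-thresholding, multiplier update, growing penalty, can be demonstrated on a runnable 1-D analogue, min_x ||x − y||² + λ||Dx||₁ with the split T = Dx. This is a simplified stand-in for the paper's 2-D solver; `lam`, `gamma`, and the subproblem forms are illustrative assumptions.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1_smooth(y, lam=0.1, gamma=1.0, iters=200):
    """ADMM sketch for  min_x ||x - y||_2^2 + lam * ||D x||_1  in 1-D,
    with split T = D x (forward differences), scaled multiplier mu, and
    the penalty gamma grown by 5% per iteration as in the paper."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)   # (n-1) x n forward-difference operator
    DtD = D.T @ D
    x, T, mu = y.astype(float).copy(), np.zeros(n - 1), np.zeros(n - 1)
    for _ in range(iters):
        # x-step: quadratic, solved via the normal equations
        # (2I + gamma * D^T D) x = 2y + gamma * D^T (T - mu)
        x = np.linalg.solve(2 * np.eye(n) + gamma * DtD,
                            2 * y + gamma * D.T @ (T - mu))
        # T-step: element-wise soft-thresholding (prox of the L1 term)
        T = soft(D @ x + mu, lam / gamma)
        # Multiplier update, then grow the penalty
        mu = mu + D @ x - T
        gamma *= 1.05
    return x
```

The same three-step pattern (quadratic solve, shrinkage, multiplier update) recurs in each subproblem of Eq. (6) below.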

Computing y L
Assuming all other variables are fixed, the subproblem for y_L is

arg min_{y_L} ||x − y_L||_2^2 + (γ_1/2) ||y − f_L ⊗ y_L − y_H + μ_1||_2^2

This quadratic problem can be solved in closed form and evaluated efficiently with the two-dimensional fast Fourier transform:

y_L = F^{-1}( (2 F(x) + γ_1 F̄(f_L) F(y − y_H + μ_1)) / (2 + γ_1 F̄(f_L) F(f_L)) )

where F(·) and F^{-1}(·) denote the fast Fourier transform (FFT) and its inverse (IFFT), and F̄(·) denotes the complex conjugate. Matrix inversion in the spatial domain is replaced by element-wise division in the frequency domain, making the operation far more efficient.
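The generic frequency-domain pattern behind this update can be sketched as follows; it solves the simpler model min_x ||f ⊗ x − b||² + c||x||² under circular convolution, and the paper's actual numerator and denominator differ because of its extra data term.

```python
import numpy as np

def fft_quadratic_solve(b, f, c=0.1):
    """Closed-form minimizer of  ||f * x - b||^2 + c * ||x||^2  under
    circular convolution: spatial-domain matrix inversion becomes
    element-wise division in the frequency domain."""
    F_f = np.fft.fft2(f, s=b.shape)     # zero-pad the kernel to image size
    F_b = np.fft.fft2(b)
    X = np.conj(F_f) * F_b / (np.abs(F_f) ** 2 + c)
    return np.real(np.fft.ifft2(X))
```

With a delta kernel the filter is the identity, so the solver reduces to the scalar minimizer b/(1 + c), a quick sanity check of the frequency-domain algebra.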

Computing y H
Consistent with the previous idea, letting Λ = y − f_L ⊗ y_L + μ_1, the subproblem for y_H is

arg min_{y_H} ακ ||y_H||_1 + (γ_1/2) ||y_H − Λ||_2^2

This problem can be solved independently for each pixel i via simple soft-thresholding:

y_{H,i} = sign(Λ_i) · max(|Λ_i| − ακ/γ_1, 0)
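The soft-thresholding operator itself is standard and worth stating explicitly (here the threshold t plays the role of ακ/γ_1):

```python
import numpy as np

def soft_threshold(v, t):
    """Pixel-wise solution of  min_u  t*|u| + 0.5*(u - v)^2 :
    shrink v toward zero by t, clamping everything in [-t, t] to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

Because it operates element-wise, this step costs only O(N) per iteration regardless of image content.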

Computing T
Similarly, fixing the variables other than T in Eq. (6), the subproblem for T is

arg min_T λ_{y_L} ||T||_1 + (γ_2/2) ||T − (∇x + μ_2)||_2^2

In the same way as for Eq. (8), soft-thresholding gives

T_i = sign(∇x_i + μ_{2,i}) · max(|∇x_i + μ_{2,i}| − λ_{y_L,i}/γ_2, 0)

Computing x
After finding T, y_L, and y_H, the optimization of x reduces to

arg min_x ||x − y_L||_2^2 + (γ_2/2) ||∇x − T + μ_2||_2^2

This quadratic problem can again be solved in the frequency domain:

x = F^{-1}( (2 F(y_L) + γ_2 F̄(∇) F(T − μ_2)) / (2 + γ_2 F̄(∇) F(∇)) )

Updating μ 1 and μ 2
At the end of each iteration, the Lagrange multipliers are updated:

μ_1 ← μ_1 + (y − f_L ⊗ y_L − y_H),   μ_2 ← μ_2 + (∇x − T)

Algorithm summary and complexity
The entire reconstruction process is outlined in Algorithm 1. The most time-consuming part is the computation of λ_{y_L}, which depends on the number of pixels N, the number of patches K, and the patch size k. In general, we do not reduce N during smoothing, as doing so loses detail. However, K and k directly affect the fitting effect, so we conducted a series of experiments to balance time and quality. We use N_∇, the number of non-zero gradients, to describe the smoothing effect:

N_∇(x) = #{ i : |∇x_i| ≠ 0 }

Assuming that images x_1 and x_2 are equally large, N_∇(x_1) = N_∇(x_2) implies that the two images have the same smoothness. Our algorithm reduces the gradient at each iteration, so the performance for different K or k can be evaluated by comparing how long it takes to smooth the same image to the same N_∇. The experimental images used for this purpose were all taken from BSD500. As can be seen from Fig. 9(a), the time consumed is almost constant as k increases when N_∇ is large enough; however, as N_∇ decreases, the time spent becomes positively correlated with k. Therefore, k can be adjusted according to specific needs; in this paper, we set k = 3. As image patches can be selected in various ways, comparing K directly is confusing, so with k fixed we replace K by the patch move step, expressed as a percentage of k. As shown in Fig. 9(b), the running time first decreases and then increases as the step increases; the optimal value is about 0.3. Moreover, during the experiments we observed that the difference between the λ_{y_L} of two adjacent iterations decreases, so we adopt a strategy to reduce the number of λ_{y_L} calculations: after the 10th iteration, we recompute λ_{y_L} only every five iterations. Experiments show that this strategy not only preserves the correctness of our algorithm but also effectively reduces the calculation time. In terms of convergence, Eq. (6) is convex; when γ_1 and γ_2 are large enough, ADMM ensures that the variables converge [27-30].
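The exact definition of N_∇ is not reproduced above; a natural reading, consistent with "reduce the gradient at each iteration", is the count of pixels with a non-zero forward-difference gradient, which can be sketched as:

```python
import numpy as np

def n_grad(img, tol=1e-6):
    """Assumed form of the smoothness measure N_grad: the number of pixels
    whose forward-difference gradient magnitude exceeds tol.
    Fewer non-zero gradients means a smoother image."""
    gx = np.diff(img, axis=1, prepend=img[:, :1])
    gy = np.diff(img, axis=0, prepend=img[:1, :])
    return int(np.count_nonzero(np.hypot(gx, gy) > tol))
```

Under this reading, a constant image scores 0 and a single vertical step edge scores one count per row, so equal N_∇ on equally sized images indicates equal smoothness.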

Experiments and discussion
In this section, the values of parameters in our algorithm are firstly discussed. Then we compare other algorithms with ours in terms of visual effects, and we create some images to evaluate the results quantitatively. Finally, the solution to the third problem is given.

Analysis of parameters
The size of f_L directly affects the time spent on decomposition, so we conducted statistical experiments on BSD500 to select the most efficient value. As shown in Fig. 10, the time consumed is relatively stable and has the lowest average when f_L is 6 × 6. α affects the smoothness of y_L and the information contained in y_H. As can be seen from Fig. 11, image decomposition effectively separates the high frequencies, and y_L becomes smoother as α increases; here we set α = 5. Figure 12 presents a set of smoothed results for different λ_G: the larger λ_G, the smoother the result. We set γ_1 = γ_2 and increase both by 5% in each iteration [31].

Comparison of visual effects
To demonstrate the effectiveness of our algorithm, we compare it to WLS [10], TV [14], tree filtering [6], RoG [13], L0 [15], RTV [18], and DSHFG [32]. All algorithms were run using the code provided by their authors, with manual adjustment of parameters. As shown in Fig. 13, WLS does not distinguish textures and edges well; nor does TV, whose results are very blurry. Tree filtering averages bilateral weights and tree weights, but does not protect all edges well. RoG uses several sets of Gaussian kernels with different weights to remove texture, which smooths the image fully at the global level, but some edges are not well protected. L0 sharpens and protects strong edges well, but performs poorly on high-contrast texture images, because it is difficult to distinguish texture based solely on gradients. RTV's local regularity helps it remove texture, but it cannot protect local weak edges well. DSHFG is an L0-norm minimization smoothing algorithm based on image decomposition, which removes texture well but also loses local weak edges. In contrast, our algorithm not only distinguishes texture from structure and removes the texture, but also effectively protects weak edges.

Quantitative comparison based on created images
To quantitatively evaluate the results of different algorithms using PSNR, we manually constructed several texture images, as shown in Fig. 14. To demonstrate the poor generalization of data-driven methods, VDCNN [33] and ResNet [33] were added to the control group. Smoothing results are shown in Fig. 15 and Table 1. It can be seen that most algorithms except TV remove the texture well, although some artificial textures remain in Figs. 15(d) and 15(e). The PSNR values show that our algorithm performs best.

Quantitative comparison with the proposed evaluation method
We propose three evaluation metrics in terms of edge and gradient distribution to overcome the dependence on ground truth.

Edge integrity rate and texture removal rate
Conventional edge extraction algorithms cannot reasonably distinguish texture from edges, as shown in Fig. 16(b). We therefore manually indicate the real edges, and propose the edge integrity rate and the texture removal rate to evaluate the smoothing effect. The edge integrity rate evaluates the degree to which edges are protected; the texture removal rate evaluates the level of texture removal. In their definitions, EI and TR represent the edge integrity rate and texture removal rate, EE(x) and GT(y) are the edges extracted from the smoothing result and the hand-drawn ground truth, and the operators ⊙ and ⊕ denote XNOR and XOR respectively. The texture images we experimented with are provided by the authors of RTV, but their manually indicated edges are too coarse, so we redrew them. Let us first observe the visual differences between the several algorithms: see Fig. 17. Smoothing can simplify edges. WLS and RTV do a good job of removing textures, but also cause some missing edges. DSHFG preserves relatively intact edges, but loses some weak ones, such as the flower-like edge at the bottom left of the image. Tables 2 and 3 show the edge integrity rate and texture removal rate for all algorithms and demonstrate that our algorithm outperforms the others. The average edge integrity rate of all algorithms is lower than 50%. This is because, although the human eye can distinguish texture from edges, it is not easy to determine the exact location of pixel-level edges: the boundaries obtained by smoothing algorithms are typically 1-3 pixels wide, while the labeled data are generally wider than 3 pixels, a problem we will address in our next study. As shown in Table 3, the texture removal rates of RTV, DSHFG, and our method are relatively good, some even exceeding 99%. The effects evaluated by the two indices are consistent with our visual conclusions as a whole, indicating that these two indices provide a good quantitative comparison of image smoothing.
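One plausible formalization of the two rates, given binary edge masks, can be sketched as follows. The paper's exact formulas were not reproduced above, so the mask-overlap form used here (and the separate ground-truth texture mask) is an assumption for illustration.

```python
import numpy as np

def edge_rates(edges, gt_edges, gt_texture):
    """Sketch of the two rates on boolean masks.
    EI: fraction of ground-truth edge pixels that survive smoothing.
    TR: fraction of ground-truth texture pixels whose edge response
    was removed by smoothing."""
    edges = edges.astype(bool)
    gt_edges = gt_edges.astype(bool)
    gt_texture = gt_texture.astype(bool)
    ei = np.count_nonzero(edges & gt_edges) / max(np.count_nonzero(gt_edges), 1)
    tr = np.count_nonzero(~edges & gt_texture) / max(np.count_nonzero(gt_texture), 1)
    return ei, tr
```

A perfect smoother keeps every ground-truth edge pixel (EI = 1) while suppressing every texture response (TR = 1).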

Gradient value distribution
Image smoothing aims to eliminate as much redundant texture as possible, which drives the gradients toward sparsity. Thus, the gradient value distribution can also be used to describe smoothness: provided the structure is not destroyed, the more the distribution concentrates near 0, the sparser the gradient and the better the smoothing effect. As shown in Fig. 18, the peak gradient values for all algorithms lie around 0, indicating that the gradients of smoothed images tend to be sparse. Other than TV, our algorithm achieves the highest sparsity; visual inspection shows that TV destroys structural information, which leads to its extreme sparsity. In summary, our algorithm outperforms the others in visual quality, as supported by the three proposed metrics: edge integrity rate, texture removal rate, and gradient value distribution.
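This comparison can be computed as a normalized histogram of gradient magnitudes; the particular bin count and range below are illustrative choices, not the paper's settings.

```python
import numpy as np

def gradient_distribution(img, bins=10, vmax=1.0):
    """Normalized histogram of forward-difference gradient magnitudes.
    The taller the first bin (values near 0), the sparser the gradients,
    i.e. the smoother the image (structure permitting)."""
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    mags = np.concatenate([np.abs(gx).ravel(), np.abs(gy).ravel()])
    hist, _ = np.histogram(mags, bins=bins, range=(0.0, vmax))
    return hist / max(mags.size, 1)
```

Comparing two smoothing results then reduces to comparing the mass in the first bin of their distributions.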

Clip-art compression artifact removal
Image processing operations such as compression or super-resolution can distort images such as cartoons and clip-art, generating pseudo-boundaries that traditional denoising algorithms cannot remove. As can be seen in Fig. 19, image smoothing can effectively solve this problem, and our method provides better results than the others.

Content-aware image manipulation
Our proposed method can be combined with image saliency detection [34] to realize content-aware image manipulation: the image is divided into foreground and background, which are processed separately to achieve foreground enhancement or background blurring.

Conclusions and limitations
In summary, we have made targeted improvements addressing three current problems in image smoothing. We enhance smoothing performance on richly textured images by pre-removing texture based on global sparse decomposition. Through parameter adaptation based on patch-shift selection and parametric surface fitting with an improved Bézier method, we cope with the inconsistency between different parts of the image. Three evaluation metrics are proposed to quantitatively evaluate smoothing performance without depending on ground truth. Comparisons with existing algorithms demonstrate that our algorithm works better. However, our algorithm also has limitations: because the problem of generating training pairs remains unsolved, it cannot be used directly to train a convolutional network. Solving this problem could further improve smoothing quality, which is what we hope to do next.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.