Fast and faithful scale-aware image filters

This paper proposes a fast and accurate computational framework for scale-aware image filters. Our framework is based on accurately approximating L¹ Gaussian convolution with respect to a transformed pixel domain representing geodesic distance on a guidance image manifold, in order to recover salient edges in a manner faithful to scale-space theory while removing small image structures. Our framework possesses linear computational complexity with high approximation precision. We examined it numerically in terms of speed, accuracy, and quality compared with conventional methods.


Introduction
Shape and texture structures in images occur at various scales. A multi-scale mathematical framework called scale-space theory [18,33], which represents an image at different scales, has therefore been studied for many years within the pattern recognition and image processing communities. Traditionally, linear convolution of an image with the Gaussian function G_σ(·) has been employed to obtain a scale-space representation that smooths out image structures smaller than a given scale parameter σ ∈ R_{>0}, the standard deviation of G_σ(·). (Strictly speaking, Gaussian convolution is a high-frequency suppression filter in the Fourier domain, and there is no closed-form relationship between σ and the curvature of a geometric structure in the spatial domain. In contrast, because the fundamental solution of the linear diffusion equation leads to a Gaussian scale-space for some boundary conditions [20], a time variable of the (mean) curvature flow and σ are directly related in continuous cases [23].) See the seminal book [21] for the methodology and applications of scale-space theory. Gaussian convolution also rapidly flattens salient image edges larger than σ, so representing edges over the scale space in this way is highly redundant. Combining smooth image regions with salient edges is therefore more compact and desirable [10]. It is often difficult to control the parameters of popular edge-preserving image filters, such as bilateral filters [24] and nonlinear diffusions (especially their number of iterations), to achieve a filtering result with a target scale σ. More recently, scale-aware filters [16,19,29,36], which remove image structures smaller than a specific scale while preserving salient edges, have attracted considerable attention because of their stability and simplicity.
These scale-aware filters can directly and intuitively specify the target scale σ, and the convergence of their iterative filtering process is very stable compared with conventional edge-aware filters. They have therefore been widely used in applications such as edge detection, detail enhancement, and image abstraction. The scale-aware concept has also been extended to 3D meshes [32,35]. Since scale-aware filters usually require time-consuming computations, box-based averaging methods (e.g., [26]) have been employed for their fast approximation [36]. Unfortunately, a box-like averaging kernel, including naive truncation of a Gaussian function, often produces undesired artifacts because it leads to a sinc-like kernel shape in the Fourier domain. Extensions of these methods [4,9,15] may therefore also cause artifacts. See Fig. 1 for an example of artifacts caused by a box filter and its relationship to a scale-aware filter (RG: Rolling Guidance [36]). Although a popular recursive Gaussian filter [6] has been extended to non-uniform pixels [12] and could perhaps be employed for a scale-aware filter, such recursive filters [1,6,31] have non-trivial numerical issues (e.g., problems for some σ and at boundaries [5,30]) because they optimize their coefficients for fixed parameters. Developing not only a fast but also an accurate approximation of scale-aware filters is thus important because of recent advances in the use of image data for science and engineering, as well as images with a high dynamic range (HDR) and high resolution.
In this paper, we propose a novel computational framework for fast and accurate approximation of scale-aware image filters. Our framework is based on adapting domain splitting [3,34] and transformation [11,12] techniques to construct an average-based joint filter that produces a nonlinear color averaging of a given image according to another guidance image. Our scale-aware filters recursively apply the constructed joint filter to recover edge features while removing small image structures. The joint filter consists of a separable implementation of L¹ Gaussian convolutions on a guidance-transformed domain that is equipped with a metric of geodesic distance on an image manifold [26] composed of the pixel coordinates and their corresponding guidance image colors. The L¹ Gaussian convolution is approximated very accurately with linear computational complexity by splitting the transformed domain into representative regions where discrete convolutions can be efficiently performed. Figure 2 gives an overview of the framework.
Our framework is faithful to scale-space theory because the L¹ Gaussian convolution never increases the number of extrema in the one-dimensional continuous case (mentioned as truncated exponential functions in [21, §6.2.3]). In other words, it does not produce any phantom edges that do not exist in the original image in cases of linear filtering. This property is also associated with the absence of ripples in the Fourier domain of the L¹ Gaussian function, as shown in the bottom-left image of Fig. 1. In this paper we introduce implementations of three conventional scale-aware filters [19,29,36] via our framework and assess their performance versus the conventional box-based averaging method.
The rest of the paper is organized as follows. We present our scale-aware filters based on joint filters in Sect. 2. Sections 3 and 4 describe the guidance domain transformation and the domain-splitting Gaussian convolution for the joint filter, respectively. Our numerical experiments are explained in Sect. 5. We conclude the paper in Sect. 6.

Joint-averaging scale-aware filters
For a given image pixel x = {x_p} on R^d, let I = I(x) and J_s = J_s(x) on R^c be the input and the s-th filtered image color of our framework, respectively, where d, c ∈ N and s ∈ N ∪ {0}. Our scale-aware filter is then recursively defined by J_{s+1} = F(J_s, I), where σ ∈ R_{>0} is a user-specified scale parameter, G_σ(t) = exp(−|t|/σ) is the L¹ Gaussian function (also known as the Laplace distribution in statistics and probability theory), and F is a functional of an average-based joint filter f defined in Eqs. (2)–(5). Here, J_0 is an initial smoothed image that removes small structures from the input I by the normalized Gaussian convolution in Eq. (1). The joint filter f recovers salient image edges during this recursive filtering process.
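As a concrete sketch of the initial smoothing J_0 in Eq. (1), the following Python snippet applies a normalized L¹ Gaussian (Laplace) kernel separably along each axis. This is only an illustration of the kernel shape and the normalization, not the paper's implementation: the function names and the truncation radius are our own assumptions, and the paper's framework deliberately avoids kernel truncation via its exact domain-splitting convolution.

```python
import numpy as np

def l1_gaussian_kernel(radius, sigma):
    """Discrete L^1 Gaussian (Laplace) kernel G_sigma(t) = exp(-|t|/sigma),
    truncated at +-radius and normalized to sum to 1."""
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-np.abs(t) / sigma)
    return k / k.sum()

def initial_smooth(image, sigma, radius=None):
    """Normalized L^1 Gaussian convolution for J_0, applied separably per axis."""
    if radius is None:
        radius = int(np.ceil(6 * sigma))  # truncation radius: our own choice
    k = l1_gaussian_kernel(radius, sigma)
    out = np.asarray(image, dtype=float)
    for axis in range(out.ndim):
        out = np.apply_along_axis(
            lambda row: np.convolve(np.pad(row, radius, mode="edge"), k, mode="valid"),
            axis, out)
    return out
```

Because the kernel is normalized, a constant image is left unchanged, which is the basic sanity check for any averaging filter.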
Let f = f(x, g, h) : R^{d+2c} → R^c be a joint filter of integrand h = h(x) and guidance g = g(x) on R^c, respectively. The filter f is then given by a normalized convolution (Eqs. (2) and (3)), where y = {y_p} ∈ R^d, φ ∈ R_{>0} is a user-specified edge-awareness parameter, σ_s is the standard deviation of the integrand h, and T_{p,λ}(·) : R^{d+c+1} → R_{>0} denotes a domain transformation with respect to the p-th coordinate basis (described in Eq. (7) of Sect. 3). The transformation T_{p,λ}(·) measures a geodesic distance on the image manifold (x_p, λg) ∈ R^{c+1} in order to obtain an edge-recovering effect according to the guidance g. Note that the Gaussian convolution G_σ(·) in Eq. (3) uses only the scale parameter σ for its variance, but both φ and σ characterize the transformation T_{p,λ}(·) through λ.

Filter Models: We have implemented the following three conventional filters in our framework: RG [36], SiR (Smooth and iteratively Restore [19]), and AG (Alternating Guided [29]). The RG literature [36] first proposed a scale-aware filtering framework based on a recursive process involving average-based joint filters such as joint/cross bilateral [8,25], guided [17], and domain transformation [11] filters. The RG filter smooths the curvature of large-scale edges, whereas the SiR filter preserves curvature by exchanging the roles of the integrand and guidance images in the joint filter. The AG filter further addresses two SiR drawbacks: reduced intensity and restored small structures around large-scale edges. Figures 3 and 4 demonstrate the different effects of these filters with the same parameters via our framework, and their stable convergence rates are shown in Fig. 5. Four iterations (s = 4) have been recommended for fast results [36], and 20 iterations (s = 20) are enough for high-quality results. The RG, SiR, and AG filters in our framework are given by

RG [36]: F(h, I) = f(x, h, I),    (4)
SiR [19]: F(h, I) = M_x(f(x, I, h)),    (5)

where M_x = M_x(·) ∈ R^c is a vector median filter [2] (only one-link neighbor pixels, i.e., a 3 × 3 pixel window for a 2D image). Note that the integrand and guidance (h and I) are exchanged in F between the RG and SiR filters on the right-hand side of Eq. (5) and that the AG filter applies the RG and SiR filters alternately. In contrast to the conventional filters [19,29,36], our framework is restricted to using the domain transformation for the joint filter f, as in Eq. (3), in order to apply the fast and accurate Gaussian convolutions described in Sect. 4. Algorithm 1 shows the pseudocode of our scale-aware filters.
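The recursive structure of F can be sketched as follows. This is a hypothetical skeleton (function names are our own): `joint_filter(guidance, integrand)` stands for any average-based joint filter f, `smooth` for the initial smoothing producing J_0, and the SiR vector-median step M_x is omitted for brevity.

```python
def scale_aware_filter(I, joint_filter, smooth, n_iters=4, model="RG"):
    """Skeleton of the recursive scale-aware filtering process (a sketch).

    joint_filter(guidance, integrand): any average-based joint filter f.
    smooth(I): initial smoothing J_0 that removes structures smaller than sigma.
    The SiR vector-median step M_x is omitted here for brevity."""
    J = smooth(I)                  # J_0 (Eq. (1))
    for _ in range(n_iters):
        if model == "RG":          # guidance: previous result, integrand: input
            J = joint_filter(J, I)
        elif model == "SiR":       # guidance and integrand exchanged
            J = joint_filter(I, J)
        else:                      # AG: alternate the RG and SiR updates
            J = joint_filter(J, I)
            J = joint_filter(I, J)
    return J
```

The skeleton makes the role exchange between RG and SiR explicit: only the argument order of the joint filter changes between the two models.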

Guidance domain transformation
The domain transform technique [11,12] provides edge-aware image smoothing efficiently, and it can be utilized for scale-aware image filters, as demonstrated by the RG filter [36]. The basic idea of this technique is to apply fast linear convolutions on a transformed domain whose metric (length) represents the magnitude of the image edges, measured by geodesic distance on an image manifold, instead of performing expensive nonlinear convolutions, as shown in Fig. 6. We describe here the guidance domain transformation T_{p,λ}(·) in Eq. (3) adapted to our framework in order to efficiently perform the Gaussian convolutions of f in Eqs. (2) and (3). For a given pixel x = {x_p} on R^d, the same pixel as in Sect. 2, consider its one-dimensional local coordinate system S parallel to the p-th coordinate basis. The straight line u_p(t, x) = x + t e_p (Eq. (6)), where e_p is the p-th coordinate basis vector, parameterizes S by t ∈ R, with the origin of S located at u_p(0, x). For a given image color h(x) = {h_q(x)} on R^c at x, the (λ-scaled) locus of the image color h(u_p) on the joint space of the p-th pixel coordinate and its color forms a hyper curve C_p, i.e., a one-dimensional image manifold, where λ, defined in Eq. (4), controls the ratio of the metrics between the pixel and color spaces. Note that {x_l} : l ≠ p of C_p characterizes a (d − 1)-parameter family of hyper curves corresponding to each l-th pixel coordinate basis. The convolutions in Eq. (2) are thus performed separately in terms of varying the coordinate element p (see Eq. (3)).
The geodesic distance between two points on C_p indicates the amount and magnitude of the image edges within the corresponding range of t, because the arc length of C_p increases when its color h(u_p) changes rapidly with respect to t. The arc length of C_p within the interval t ∈ [0, a] is given by integrating the (L¹) magnitude of its tangent vector [28]:

∫_0^a ‖ d/dt ( t, λ h(u_p(t, x)) ) ‖_1 dt = ∫_0^a ( 1 + λ Σ_{q=1}^c | ∂h_q(u_p(t, x)) / ∂t | ) dt.

Simple computations then yield our guidance domain transformation T_{p,λ}(·) (employed in Eq. (3)):

T_{p,λ}(a, x, g) = ∫_0^a ( 1 + λ Σ_{q=1}^c | ∂g_q(u_p(t, x)) / ∂t | ) dt.    (7)

The transformation (7) is performed on the image manifold (x, λg) as T_{p,λ}(x_p, x, g) in order to incorporate the edge information provided by the guidance g into the joint filter f in Eqs. (2) and (3).

Discrete Implementation: As recommended in [11], the first-order partial derivative of h_q(u_p) with respect to t in Eq. (7) is approximated by the forward difference scheme: ∂h_q(u_p(t, x))/∂t ≈ h_q(u_p(t + 1, x)) − h_q(u_p(t, x)).
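In discrete form, the transformation (7) with the forward-difference scheme reduces to a cumulative sum along each scanline. The following is a minimal Python sketch under our own naming (`lam` stands for λ; its exact definition via φ and σ follows Eq. (4) in the paper):

```python
import numpy as np

def domain_transform_1d(guidance_row, lam):
    """Transformed (non-uniform) coordinates t_i along one scanline (discrete Eq. (7)).

    guidance_row: (n,) or (n, c) array of guidance colors on the scanline.
    lam: metric ratio lambda between the color and pixel axes (Eq. (4))."""
    g = np.atleast_2d(np.asarray(guidance_row, dtype=float).T).T  # -> (n, c)
    dg = np.abs(np.diff(g, axis=0)).sum(axis=1)  # forward-difference color change
    steps = 1.0 + lam * dg                       # L^1 arc length of one pixel step
    return np.concatenate(([0.0], np.cumsum(steps)))
```

Flat guidance yields the uniform coordinates 0, 1, 2, …, while a guidance edge stretches the corresponding pixel step, so a subsequent linear convolution on t automatically averages less across edges.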
[Algorithm 2 (separable implementation of the joint filter f) pseudocode omitted; as noted in its comments, transformation (7) can be computed either inside or outside the scanline loop as a trade-off between speed and memory usage.]

The joint filter f is implemented iteratively with a scale variable σ_i updated by Eq. (8), where N_ite is the number of iterations (N_ite = 3 in our numerical experiments). The pseudocode in Algorithm 2 illustrates a separable implementation of our joint filter f based on the above domain transformations, where n_p ∈ N is the number of pixels along the p-th coordinate basis (i.e., the image size of the p-th dimension). Its one-dimensional L¹ Gaussian convolution on the non-uniform pixel coordinates (transformed domain) is performed accurately by the domain-splitting technique described in Sect. 4.
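To make the separable pipeline concrete, the sketch below combines the scanline domain transform with a normalized L¹ Gaussian convolution on the transformed coordinates. All names are our own, and the inner convolution is deliberately naive O(n²) for clarity; the paper's Algorithm 2 instead uses the O(n) domain-splitting convolution of Sect. 4.

```python
import numpy as np

def joint_filter_1d(h_row, g_row, sigma, lam):
    """One scanline of the joint filter f (single-channel h and g)."""
    # Transformed (non-uniform) coordinates via Eq. (7) with forward differences.
    dg = np.abs(np.diff(np.atleast_2d(np.asarray(g_row, float).T).T, axis=0)).sum(axis=1)
    t = np.concatenate(([0.0], np.cumsum(1.0 + lam * dg)))
    # Normalized L^1 Gaussian convolution on the transformed domain
    # (naive O(n^2) here purely for clarity).
    W = np.exp(-np.abs(t[:, None] - t[None, :]) / sigma)
    return (W @ np.asarray(h_row, float)) / W.sum(axis=1)

def joint_filter_2d(h, g, sigma, lam):
    """Separable application: filter every row, then every column."""
    out = np.stack([joint_filter_1d(h[i], g[i], sigma, lam)
                    for i in range(h.shape[0])])
    return np.stack([joint_filter_1d(out[:, j], g[:, j], sigma, lam)
                     for j in range(h.shape[1])], axis=1)
```

With a strong guidance edge and a large λ, the weights across the edge become negligible, which is exactly the edge-recovering effect described above.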
Domain-splitting Gaussian convolution

Recursive and box-based filters are popular fast Gaussian convolution methods. In contrast, a domain-splitting technique [3,34] approximates a discrete analogue of Gaussian convolution (known as a Gauss transform [14]) very accurately (without ringing artifacts) and quickly, even for a non-uniformly sampled variable of the Gaussian function. The key feature of this technique is decomposition of the integral domain of the Gaussian convolution into sub-domains centered at representative points on the domain, using the L¹ norm as a metric instead of the popular L² norm. We explain here the domain-splitting technique adapted to our framework in order to accurately compute the Gaussian convolutions of the initial smoothed image J_0 in Eq. (1) and of the joint filter (defined in Eqs. (2) and (3)) on the transformed domain T_{p,λ}(x_p, x, g).

For a given pixel x = {x_p} on R^d, consider a set of n points P : {t_i} on R representing the transformed pixels of x with respect to its p-th coordinate basis and their corresponding colors {h(t_i)}, where h(t_i) ∈ R is the q-th channel color at t_i, n ∈ N is the number of pixels, and the transformation T_{p,λ}(·) with its straight line u_p are defined in Eqs. (7) and (6), respectively. The L¹ Gauss transform f(·) ∈ R of h(·) at t_j ∈ P is given by

f(t_j) = Σ_{i=1}^n G_σ(t_i − t_j) h(t_i),    (9)

where naively computing f(t_j) for all j ∈ {1, 2, ..., n} requires quadratic computational complexity O(n²).

Basic Decomposition Concept: For a fixed point t_1 (with t_1 ≤ t_i for all i), splitting the domain of the L¹ norm distance between t_i and t_j with respect to their order yields

|t_i − t_j| = (t_j − t_1) − (t_i − t_1) for t_i ≤ t_j,
|t_i − t_j| = (t_i − t_1) − (t_j − t_1) for t_i > t_j.

This process is illustrated conceptually in Fig. 7. Substituting the above equation into Eq. (9) resolves the dependency between the two indices i and j within the Gaussian, so that a two-variable function can be replaced by a combination of two one-variable functions.
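The single-anchor decomposition can be checked numerically: with sorted coordinates, the Gauss transform reduces to one prefix sum and one suffix sum, giving O(n) complexity. The sketch below (our own illustration in Python) reproduces the naive O(n²) evaluation exactly on small data, but it is numerically safe only while |t_n − t_1|/σ stays moderate, which is precisely the limitation the paper's multi-pole splitting removes.

```python
import numpy as np

def l1_gauss_transform_naive(t, h, sigma):
    """O(n^2) reference: f(t_j) = sum_i G_sigma(t_i - t_j) h(t_i) (Eq. (9))."""
    W = np.exp(-np.abs(t[:, None] - t[None, :]) / sigma)
    return W @ h

def l1_gauss_transform_anchor(t, h, sigma):
    """O(n) evaluation via the single-anchor splitting (t sorted ascending):
    |t_i - t_j| = (t_j - t_1) - (t_i - t_1) for t_i <= t_j."""
    u = (t - t[0]) / sigma                  # nonnegative, increasing
    left = np.cumsum(np.exp(u) * h)         # prefix sums of h_i / G(t_i - t_1)
    r = np.exp(-u) * h                      # terms G(t_i - t_1) h_i
    right = np.cumsum(r[::-1])[::-1] - r    # suffix sums excluding i = j
    return np.exp(-u) * left + np.exp(u) * right
```

The factor exp(u) in the prefix sum is 1/G_σ(t_i − t_1), which is the term that overflows for large |t_i − t_1| and motivates the multi-pole construction.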
The L¹ Gauss transform f(·) consequently becomes

f(t_j) = G_σ(t_j − t_1) Σ_{t_i ≤ t_j} h(t_i)/G_σ(t_i − t_1) + (1/G_σ(t_j − t_1)) Σ_{t_i > t_j} G_σ(t_i − t_1) h(t_i),    (10)

where 1/G_σ(t_i − t_1) may cause an overflow for very large values of |t_i − t_1|.

Fig. 7: A basic domain decomposition concept that enables measurement of the distance between t_i and t_j by subtracting two distances via an anchor coordinate t_1.

Accurate Approximation: To avoid the above-mentioned numerical problem, consider a set of m representative poles {α_k} on R instead of the single fixed point t_1. Assuming that α_1 < α_2 < · · · < α_m, the domain splitting of |t_i − t_j| around the pole α_k (with t_j ∈ [α_k, α_{k+1})) is given by

|t_i − t_j| = (α_k − t_i) + (t_j − α_k) for t_i ∈ Ω¹_{i,k},
|t_i − t_j| = (t_j − α_k) − (t_i − α_k) for t_i ∈ Ω²_{i,k},
|t_i − t_j| = (t_i − α_k) − (t_j − α_k) for t_i ∈ Ω³_{i,k},

where the domains Ω¹_{i,k}, Ω²_{i,k}, and Ω³_{i,k} correspond to t_i < α_k, α_k ≤ t_i ≤ t_j, and t_i > t_j, respectively, as shown in Fig. 8. This domain splitting with the poles {α_k} turns the Gauss transform f(·) in Eq. (9) into Eq. (11), where G_{(σ,l,k)} ≡ G_σ(t_l − α_k), γ(j) = k, and γ_2(k) = min(j) such that α_k ≤ t_j < α_{k+1}, as illustrated in Fig. 9. For a given upper bound of computational precision δ, the inequality exp(|α_{k+1} − α_k|/σ) < δ should hold to avoid the above-mentioned numerical instability. Satisfying this condition leads to a stable choice of distances between poles via the relationship |α_{k+1} − α_k| = ϕσ log(δ), where ϕ ∈ (0, 1) is a parameter. In our framework, we use ϕ = 0.5 and the double floating-point precision format for δ (i.e., the 64-bit DBL_MAX of the C++ programming language). Since the range of P is defined by w = t_n − t_1 > 0, the number of poles and their coordinates are automatically determined by

m = ⌈ w / (ϕσ log(δ)) ⌉,    (14)

where ⌈·⌉ is the ceiling function and m ≪ n in general. Although t_1 is always equal to zero in our framework, shifting the first pole (α_1 ≠ t_1), as in Eq. (14), guarantees that Eq. (11) is numerically stable because of how the poles {α_k} are chosen (see Fig. 10).
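The pole-placement rule can be sketched directly. In the snippet below (our own illustration), ϕ = 0.5 and δ = DBL_MAX follow the paper, while placing the first pole half an interval away from t_1 is our own assumption about Eq. (14).

```python
import numpy as np

def choose_poles(t1, tn, sigma, delta=np.finfo(np.float64).max, phi=0.5):
    """Pole count and coordinates from the stability rule
    |a_{k+1} - a_k| = phi * sigma * log(delta),
    which keeps exp(|a_{k+1} - a_k| / sigma) < delta.

    phi = 0.5 and delta = DBL_MAX follow the paper; the half-interval
    shift of the first pole off t_1 is our own assumption."""
    spacing = phi * sigma * np.log(delta)
    m = int(np.ceil((tn - t1) / spacing))   # number of poles (m << n in practice)
    return t1 + spacing * (np.arange(m) + 0.5)
```

Because log(DBL_MAX) ≈ 709.8, the pole spacing is hundreds of σ, so very few poles are needed even for wide transformed domains.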
Since Coef depends only on the (transformed) pixel coordinates P : {t_j} and the scale σ, it can be reused for different color channels, as in Algorithm 5 (GaussTransform in the joint filter: Algorithm 2), as well as for the uniform pixel coordinates, as in Algorithm 6 (GaussianFilter in the scale-aware filter: Algorithm 1, i.e., J_0 in Eq. (1)). Note that the terms Σ_i h(t_i)/G_σ(t_i − α_·) and Σ_i G_σ(t_i − α_·) h(t_i) appear in the equations for A_k, B_k, and also C_j. The results of computing A_k and B_k are therefore reused to obtain C_j in DSSum(·), which integrates D_j, E_j, and C_j in Eq. (11) (see Algorithm 4). We also employed a fast library [22] for the exponential function exp(·) in DSInit(·). Ignoring substitutions and indexing, the complexities and costs of DSInit(·) and DSSum(·) are listed in Table 1.

Numerical experiments
All numerical experiments in this paper were performed on a PC with an i9-10980XE CPU (3.0 GHz, 36 cores; no parallelization was used), 128 GB RAM, and a 64-bit OS with the GNU C++ 9.3 compiler.
We first examined the accuracy and computational speed of our domain-splitting approximation (Our: uniform and Our NU: non-uniform pixels) described in Sect. 4 (Algorithms 6 and 5) by numerically comparing them with popular box and recursive filters: the Box (moving average) [26], EBox (extended box) [15], SII (stacked image integral) [4,9], Deriche [6], VYV [31], and AM [1] filters. Their implementations and boundary conditions were based on the open library of [13] with 10^{−15} tolerance (4th-order recursion). Table 2 shows the peak signal-to-noise ratio (PSNR) [24,34] and the maximum error E_max [34] of one-dimensional Gaussian filtering (d = 1 for J_0 in Eq. (1) and the convolutions in the joint filter) for relatively small σ (0.05% to 1% of the image size w = n), where the exact Gauss transform (Eq. (9)) and an FIR (Finite Impulse Response) filter [13] were employed as references for the approximations. Our approximations were slightly slower than the conventional methods but significantly more accurate (about 10^{10} times). Figure 11 shows a typical error profile for the parameter settings of Table 2, where ripple-shaped errors, which cause undesired artifacts and phantom edges, are observed for all conventional methods. At first glance, the VYV and Deriche filters look preferable for computer graphics applications because the magnitudes of their errors are small, as shown in Table 2 and Fig. 11. However, applying these methods stably and accurately for variable parameters is not trivial. During our experiments, we easily encountered cases with artifacts that could not be ignored or that resulted in complete failure. Figure 11 (top-right) shows a boundary artifact caused by the Deriche filter for large σ (10% of the image size, which is not unreasonably large). This artifact may result from one of the causes discussed in [5,13,30].
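For reference, the two error metrics reported in Table 2 can be computed as follows. This is a standard formulation, not the paper's exact code; in particular, using the peak of the exact signal as the PSNR reference value is our assumption.

```python
import numpy as np

def psnr(exact, approx, peak=None):
    """Peak signal-to-noise ratio (dB) between the exact filter output and
    its approximation. peak defaults to the exact signal's peak (assumption)."""
    exact = np.asarray(exact, dtype=float)
    approx = np.asarray(approx, dtype=float)
    if peak is None:
        peak = np.abs(exact).max()
    mse = np.mean((exact - approx) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def max_error(exact, approx):
    """Maximum absolute error E_max."""
    return np.max(np.abs(np.asarray(exact, float) - np.asarray(approx, float)))
```

Higher PSNR and lower E_max both indicate a more faithful approximation of the exact Gauss transform.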
Moreover, the errors of these methods, including the AM and EBox filters, behave nonlinearly with respect to not only σ but also the image size w (and hence the number of pixels n) because their formulations (e.g., coefficient optimization) rely on these parameters. Even though both σ and n were linearly scaled from the case in Fig. 11, the VYV and Deriche filters generated results with large errors, as shown in Fig. 12. In contrast to these conventional methods, our theoretical formulations guarantee the high stability and accuracy of our approximations, which were numerically confirmed by our experiments. Since the quality of both J_0 and the convolutions used in the joint filter is important, the approximation described in Sect. 4 is well suited to scale-aware filters.

We next evaluated the quality and computational speed of our scale-aware filters. The input images are listed in Fig. 13 (the image sizes and figure numbers are given there, with φ and σ described in the captions), and the timings are shown in Table 3. Figures 3, 4, and 18 demonstrate our scale-aware filtering results with varying scales (σ). The scale-space and salient edge features are clearly apparent. Our filters did not produce the undesired artifacts generated by box-based convolutions (moving average [26], where a √3σ radius was employed to obtain visually similar results), as shown in Figs. 1, 15, and 16. Figure 14 shows timing comparisons for various numbers of iterations s and image sizes, summarized in Table 4. Our filters achieved linear computational speeds (slightly slower than the box kernel convolutions), and their convergence properties were similar to those of the RG, SiR, and AG filters, as shown in Fig. 5. See Fig. 17 for the filtering effects of iterations s ∈ {4, 20}, as recommended by [36].

Conclusion
We have proposed a fast and accurate computational framework for scale-aware image filters. Our new framework is based on accurately approximating L¹ Gaussian convolution with respect to a transformed pixel domain representing geodesic distance on a guidance image manifold, in order to recover salient edges in a manner faithful to scale-space theory while removing small image structures. Our framework achieves linear computational complexity with high-quality filtering results. We compared our framework numerically in terms of speed, precision, and visual quality with popular conventional methods.


Fig. 15: Quality comparison on the face image for the RG, SiR, and AG filters with the box kernel (√3σ radius) and our framework (L¹ Gaussian), where s = 20, φ = 1, σ = 6 (RG and AG) and φ = 0.5, σ = 3 (SiR). The bottom images show the square root of the gradient magnitude (‖∇J_20‖^{1/2}) of the eye and mouth regions of the upper images, where some artifacts appear in the box-based results.

Since our framework is robustly applicable to HDR images with a wide range of scales σ, applications to computational photography, engineering, and science are promising future work. The limitations of our framework relate to the domain transformations and scale-aware filters themselves, such as the slow convergence of elongated image regions (Fig. 17) and reduced intensity (SiR, Fig. 4). Combining our framework with a guided filter [17] may reduce such artifacts. Future work will also include numerical comparisons with the non-uniform recursive method [12] (an extension of Deriche [6]).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.