1 Introduction

Shape and texture structures in images occur at various scales. A multi-scale mathematical framework called scale-space theory [18, 33], which represents an image at different scales, has therefore been studied for many years within the pattern recognition and image processing communities. Traditionally, linear convolution of an image with the Gaussian function \(G_{\sigma }(\cdot )\) has been employed to obtain a scale-space representation that smooths out image structures smaller than a given scale parameter \(\sigma \in {\mathbb R}_{>0}\), the standard deviation of \(G_{\sigma }(\cdot )\). See the seminal book [21] for the methodology and applications of scale-space theory. However, Gaussian convolution also rapidly flattens salient image edges larger than \(\sigma \), which makes the representation of edges over the scale space highly redundant. Combining smooth image regions with salient edges is therefore more compact and desirable [10].

It is often difficult to control the parameters of popular edge-preserving image filters, such as bilateral filters [24] and nonlinear diffusions (especially their number of iterations), so as to achieve a filtering result with a target scale \(\sigma \). More recently, scale-aware filters [16, 19, 29, 36], which remove image structures smaller than a specified scale while preserving salient edges, have attracted considerable attention because of their stability and simplicity. These scale-aware filters can directly and intuitively specify the target scale \(\sigma \), and the convergence of their iterative filtering process is very stable compared with that of conventional edge-aware filters. They have therefore been widely used in applications such as edge detection, detail enhancement, and image abstraction, and the scale-aware concept has also been extended to 3D meshes [32, 35]. Since scale-aware filters usually require time-consuming computations, box-based averaging methods (e.g., [26]) have been employed for their fast approximation [36]. Unfortunately, a box-like averaging kernel (including a naively truncated Gaussian function) often produces undesired artifacts because its Fourier transform has a sinc-like shape. Extensions of these methods [4, 9, 15] may therefore also cause artifacts. See Fig. 1 for an example of artifacts caused by a box filter and its relationship to a scale-aware filter (RG: Rolling Guidance [36]). Although a popular recursive Gaussian filter [6] has been extended to non-uniform pixels [12] and could perhaps be employed for a scale-aware filter, such recursive filters [1, 6, 31] have non-trivial numerical issues (e.g., failures for some values of \(\sigma \) and boundary problems [5, 30]) because they optimize their coefficients for fixed parameters. Developing a fast and also accurate approximation of scale-aware filters is thus important, given the growing use of image data in science and engineering as well as images with a high dynamic range (HDR) and high resolution.

Fig. 1

Top: linear smoothing (center) and scale-aware filtering (right, RG [36]) results with a box and our \(L^{1}\) Gaussian kernels (top-left, where the bottom-left graphs show their Fourier signals). Bottom images correspond to the magnitudes (square roots) of their gradients

In this paper, we propose a novel computational framework for fast and accurate approximation of scale-aware image filters. Our framework is based on adapting domain splitting [3, 34] and transformation [11, 12] techniques to construct an average-based joint filter that produces a nonlinear color averaging of a given image according to another guidance image. Our scale-aware filters recursively apply the constructed joint filter to recover edge features while removing small image structures. The joint filter consists of a separable implementation of \(L^{1}\) Gaussian convolutions on a guidance-transformed domain that is equipped with a metric of geodesic distance on an image manifold [26] composed of the pixel coordinates and their corresponding guidance image colors. The \(L^{1}\) Gaussian convolution is approximated very accurately with linear computational complexity by splitting the transformed domain into representative regions where discrete convolutions can be efficiently performed. Figure 2 gives an overview of the framework.

Fig. 2

An overview of our framework. Algorithms are described in the following sections

Our framework is faithful to scale-space theory in the sense that the \(L^{1}\) Gaussian convolution never increases the number of extrema in the one-dimensional continuous case (discussed as truncated exponential functions in [21, §6.2.3]). In other words, linear filtering with it does not produce any phantom edges that do not exist in the original image. This property is also associated with the absence of ripples in the Fourier transform of the \(L^{1}\) Gaussian function, as shown in the bottom-left graphs of Fig. 1. In this paper, we describe the implementation of three conventional scale-aware filters [19, 29, 36] via our framework and assess their performance against the conventional box-based averaging method.

The rest of the paper is organized as follows. We present our scale-aware filters based on joint filters in Sect. 2. Sections 3 and 4 describe the guidance domain transformation and the domain-splitting Gaussian convolution for the joint filter, respectively. Our numerical experiments are explained in Sect. 5. We conclude the paper in Sect. 6.

2 Joint-averaging scale-aware filters

For a given image pixel \({\mathbf{x}}=\{x_{p}\}\) on \({\mathbb R}^{d}\), let \({\mathbf{I}}={\mathbf{I}}({\mathbf{x}})\) and \({\mathbf{J}}^{s}={\mathbf{J}}^{s}({\mathbf{x}})\) on \({\mathbb R}^{c}\) be the input and the s-th filtered image color of our framework, respectively, where \(d, c\in {\mathbb N}\) and \(s\in {\mathbb N}\,\cup \, \{0\}\). Our scale-aware filter is then recursively defined by

$$\begin{aligned} {\mathbf{J}}^{s+1} = {\mathbf{F}}({\mathbf{x}},{\mathbf{f}},{\mathbf{J}}^{s}),\quad {\mathbf{J}}^{0} \equiv \frac{\int G_{\sigma }({\mathbf{x}}-{\mathbf{y}}){\mathbf{I}}({\mathbf{y}})d{\mathbf{y}}}{\int G_{\sigma }({\mathbf{x}}-{\mathbf{y}})d{\mathbf{y}}} \end{aligned}$$
(1)

where \(\sigma \in {\mathbb R}_{>0}\) is a user-specified scale parameter, \(G_{\sigma }(\cdot ) = \exp (-\frac{|\cdot |}{\sigma })\) is an \(L^{1}\) Gaussian function (also known as a Laplace distribution in statistics and probability theory), and \({\mathbf{F}}\) is a functional of an average-based joint filter \({\mathbf{f}}\) defined in Eqs. (2)-(5). Here, \({\mathbf{J}}^{0}\) is an initial smoothed image that removes small structures from the input \({\mathbf{I}}\) by means of the normalized Gaussian convolution in Eq. (1). The joint filter \({\mathbf{f}}\) recovers salient image edges during the above recursive filtering process.
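
For concreteness, the following C++ fragment evaluates \({\mathbf{J}}^{0}\) of Eq. (1) for a single one-dimensional channel by direct \(O(n^{2})\) summation. It is a minimal reference sketch (all identifiers are illustrative, not taken from our implementation); the fast linear-time evaluation is described in Sect. 4.

```cpp
#include <cmath>
#include <vector>

// Reference O(n^2) sketch of the normalized L^1 Gaussian smoothing J^0 in
// Eq. (1) for one 1D channel; Sect. 4 replaces this with an O(n) algorithm.
std::vector<double> initialSmooth(const std::vector<double>& I, double sigma) {
    const int n = static_cast<int>(I.size());
    std::vector<double> J0(n);
    for (int x = 0; x < n; ++x) {
        double num = 0.0, den = 0.0;
        for (int y = 0; y < n; ++y) {
            const double w = std::exp(-std::fabs(double(x - y)) / sigma); // G_sigma
            num += w * I[y];
            den += w;
        }
        J0[x] = num / den; // normalized convolution
    }
    return J0;
}
```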

Let \({\mathbf{f}}={\mathbf{f}}({\mathbf{x}},{\mathbf{g}},{\mathbf{h}}): {\mathbb R}^{d+2c}\rightarrow {\mathbb R}^{c}\) be a joint filter with integrand \({\mathbf{h}}={\mathbf{h}}({\mathbf{x}})\) and guidance \({\mathbf{g}}={\mathbf{g}}({\mathbf{x}})\), both on \({\mathbb R}^{c}\). The filter \({\mathbf{f}}\) is then given by a normalized convolution:

$$\begin{aligned}&{\mathbf{f}}({\mathbf{x}},{\mathbf{g}},{\mathbf{h}}) =\frac{\int W(\sigma ,\phi ,{\mathbf{x}},{\mathbf{y}},{\mathbf{g}}){\mathbf{h}}({\mathbf{y}}) d{\mathbf{y}}}{\int W(\sigma ,\phi ,{\mathbf{x}},{\mathbf{y}},{\mathbf{g}}) d{\mathbf{y}}}, \end{aligned}$$
(2)
$$\begin{aligned}&W(\cdot ) = \prod _{p=1}^{d} G_{\sigma }(T_{p,\lambda }(x_{p},{\mathbf{x}},{\mathbf{g}})-T_{p,\lambda }(y_{p},{\mathbf{y}},{\mathbf{g}})), \end{aligned}$$
(3)
$$\begin{aligned}&\lambda = \sqrt{\sigma /(\sigma _{s}\phi )} \end{aligned}$$
(4)

where \({\mathbf{y}}=\{y_{p}\}\in {\mathbb R}^{d}\), \(\phi \in {\mathbb R}_{>0}\) is a user-specified edge-awareness parameter, \(\sigma _{s}\) is the standard deviation of the integrand \({\mathbf{h}}\), and \(T_{p,\lambda }(\cdot ): {\mathbb R}^{d+c+1}\rightarrow {\mathbb R}_{>0}\) denotes a domain transformation with respect to the p-th coordinate basis (described in Eq. (7) of Sect. 3). The transformation \(T_{p,\lambda }(\cdot )\) measures geodesic distance on the image manifold \((x_{p},\lambda \,{\mathbf{g}})\in {\mathbb R}^{c+1}\) in order to obtain an edge-recovering effect according to the guidance \({\mathbf{g}}\). Note that the Gaussian \(G_{\sigma }(\cdot )\) in Eq. (3) uses only the scale parameter \(\sigma \), whereas both \(\phi \) and \(\sigma \) characterize the transformation \(T_{p,\lambda }(\cdot )\) through \(\lambda \).

Filter Models: We have implemented the following three conventional filters in our framework: RG [36], SiR (Smooth and iteratively Restore [19]), and AG (Alternating Guided [29]). The RG paper [36] first proposed a scale-aware filtering framework based on a recursive process involving average-based joint filters such as the joint/cross bilateral [8, 25], guided [17], and domain transformation [11] filters. The RG filter smooths the curvature of large-scale edges, whereas the SiR filter preserves curvature by exchanging the roles of the integrand and guidance images in the joint filter. The AG filter in turn remedies two drawbacks of SiR: intensity reduction and the restoration of small structures around large-scale edges. Figures 3 and 4 demonstrate the different effects of these filters with the same parameters via our framework, and their stable convergence rates are shown in Fig. 5. Four iterations (\(s=4\)) have been recommended for fast results [36], and 20 iterations (\(s=20\)) are enough for high-quality results.

RG, SiR, and AG filters in our framework are given by \({\mathbf{F}}= {\mathbf{F}}({\mathbf{x}},{\mathbf{f}},{\mathbf{h}})\in {\mathbb R}^{c}\) in Eq. (1) such that

(5)

where \({\mathbf{M}}_{{\mathbf{x}}} = {\mathbf{M}}_{{\mathbf{x}}}(\cdot ) \in {\mathbb R}^{c}\) is a vector median filter [2] (over one-link neighbor pixels only, i.e., a \(3\times 3\) pixel window for a 2D image). Note that the integrand and guidance (\({\mathbf{h}}\) and \({\mathbf{I}}\)) are exchanged between the RG and SiR versions of \({\mathbf{F}}\) on the right-hand side of Eq. (5) and that the AG filter applies the RG and SiR filters alternately. In contrast to the conventional filters [19, 29, 36], our framework is restricted to the domain transformation for the joint filter \({\mathbf{f}}\), as in Eq. (3), in order to apply the fast and accurate Gaussian convolutions described in Sect. 4. Algorithm 1 shows the pseudocode of our scale-aware filters, and a sketch of the recursion is given below.
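
A minimal C++ sketch of this recursion follows. The Image type and the joint-filter and median callables are placeholders, and the placement of \({\mathbf{M}}_{{\mathbf{x}}}\) in the AG branch is an illustrative assumption; Eq. (5) and [29, 36] define the exact forms.

```cpp
#include <functional>
#include <vector>

// Minimal sketch of the recursion J^{s+1} = F(x, f, J^s) in Eq. (1).
// Image, the callables, and the placement of the vector median M_x in the
// AG branch are illustrative assumptions (see Eq. (5) and [29, 36]).
using Image = std::vector<double>;                    // one flattened color image
using JointFilter =
    std::function<Image(const Image&, const Image&)>; // f(guidance g, integrand h)
using Median = std::function<Image(const Image&)>;    // M_x over 3x3 windows

enum class Mode { RG, SiR, AG };

Image scaleAwareFilter(const Image& I, Image J /* J^0 of Eq. (1) */,
                       const JointFilter& f, const Median& Mx,
                       int iterations, Mode mode) {
    for (int s = 0; s < iterations; ++s) {
        switch (mode) {
        case Mode::RG:  J = f(J, I); break; // filter I, guided by J^s
        case Mode::SiR: J = f(I, J); break; // roles of h and I exchanged
        case Mode::AG:  // alternate RG- and SiR-type updates, stabilized by M_x
            J = (s % 2 == 0) ? f(J, I) : f(I, Mx(J));
            break;
        }
    }
    return J;
}
```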

Fig. 3

RG (top), SiR (middle), and AG (bottom) results of filtering the Balloon image [27] via our framework, where \(\phi =1.5\) and \(\sigma \in \{16,32,64\}\) (left to right)

Fig. 4

Zoomed images of the rightmost panels of Fig. 3: \(\sigma = 64\)

Fig. 5

Convergence rates of our scale-aware filters demonstrated in Fig. 3. Horizontal and vertical axes are the number of iterations s and the normalized MAE (Mean Absolute Error divided by the product of c and \(\max ({\mathbf{I}})\)) between \({\mathbf{J}}^{s+1}\) and \({\mathbf{J}}^{s}\)

[Algorithm 1: pseudocode of our scale-aware filters]

3 Guidance domain transformation

The domain transform technique [11, 12] provides efficient edge-aware image smoothing and can be utilized for scale-aware image filters, as demonstrated by the RG filter [36]. The basic idea of this technique is to perform fast linear convolutions on a transformed domain whose metric (length) represents the magnitude of image edges, measured as geodesic distance on an image manifold, instead of performing expensive nonlinear convolutions, as shown in Fig. 6. We describe here the guidance domain transformation \(T_{p,\lambda }(\cdot )\) in Eq. (3) adapted to our framework in order to efficiently perform the Gaussian convolutions of \({\mathbf{f}}\) in Eqs. (2) and (3).

Fig. 6

Uniform pixels are transformed with respect to the geodesic distance of the image manifold to perform convolutions efficiently

For a given pixel \({\mathbf{x}}= \{x_{p}\}\) on \({\mathbb R}^{d}\), the same pixel as in Sect. 2, consider its one-dimensional local coordinate system \({\mathcal S}\) parallel to the p-th coordinate basis. The following straight line \({\mathbf{u}}_{p} = {\mathbf{u}}_{p}(t,{\mathbf{x}})\in {\mathbb R}^{d}\) passing through \({\mathbf{x}}\) then reparameterizes \({\mathcal S}\) by \(t\in {\mathbb R}\):

$$\begin{aligned} {\mathbf{u}}_{p}={\mathbf{u}}_{p}(t,{\mathbf{x}})\equiv \left\{ \begin{aligned} x_{l}\quad&\text {if}\,\,l\ne p,\\ t\quad&\text {otherwise} \end{aligned} \right. \end{aligned}$$
(6)

where the origin of \({\mathcal S}\) is located on \({\mathbf{u}}_{p}(0,{\mathbf{x}})\).

For a given image color \({\mathbf{h}}({\mathbf{x}})=\{h_{q}({\mathbf{x}})\}\) on \({\mathbb R}^{c}\) at \({\mathbf{x}}\), the (\(\lambda \)-scaled) locus of the image color \({\mathbf{h}}({\mathbf{u}}_{p})\) in the joint space of the p-th pixel coordinate and its color forms a hyper curve \({\mathcal C}_{p}\), i.e., a one-dimensional image manifold,

$$\begin{aligned} {\mathcal C}_{p}: {\mathbf{r}}_{p}(t)=(t,\lambda \,{\mathbf{h}}({\mathbf{u}}_{p}))\in {\mathbb R}^{c+1} \end{aligned}$$

where \(\lambda \), defined in Eq. (4), controls the ratio of the metrics between the pixel and color spaces. Note that \(\{x_{l}\}: l\ne p\) of \({\mathcal C}_{p}\) characterizes a \((d-1)\)-parameter family of hyper curves corresponding to each l-th pixel coordinate basis. The convolutions in Eq. (2) are thus performed separably by varying the coordinate element p (see Eq. (3)).

The geodesic distance between two points on \({\mathcal C}_{p}\) indicates the amount and magnitude of the image edges within its corresponding range of t, because the arc length of \({\mathcal C}_{p}\) increases when its corresponding color \({\mathbf{h}}({\mathbf{u}}_{p})\) changes rapidly with respect to t. The arc length of \({\mathcal C}_{p}\) within the interval \(t\in [0,a]\) is given by integrating the magnitude of its tangent vector [28]: \(\int ^{a}_{0} ||\frac{\partial {\mathbf{r}}_{p}(t)}{\partial t}||\,dt = \int ^{a}_{0} ||(1,\lambda \frac{\partial {\mathbf{h}}({\mathbf{u}}_{p})}{\partial t})||\,dt\). Simple computations then yield our guidance domain transformation \(T_{p,\lambda }(\cdot )\) (employed in Eq. (3)), as follows:

$$\begin{aligned} \begin{aligned} T_{p,\lambda }(a,{\mathbf{x}},{\mathbf{h}}) = \int \limits ^{a}_{0} \sqrt{1+\lambda ^{2}\sum _{q=1}^{c} |\frac{\partial h_{q}({\mathbf{u}}_{p})}{\partial t}|^{2}}\,dt. \end{aligned} \end{aligned}$$
(7)

The above transformation (7) is performed on the image manifold \(({\mathbf{x}},\lambda \,{\mathbf{g}})\) as \(T_{p,\lambda }(x_{p},{\mathbf{x}},{\mathbf{g}})\) in order to incorporate edge information provided by the guidance \({\mathbf{g}}\) into the joint filter \({\mathbf{f}}\) in Eqs. (2) and (3).

[Algorithm 2: separable implementation of the joint filter \({\mathbf{f}}\) via domain transformations]

Discrete Implementation: As recommended in [11], the first-order partial derivative of \(h_{q}({\mathbf{u}}_{p})\) with respect to t in Eq. (7) is approximated by using the forward difference scheme:

$$\begin{aligned} \frac{\partial h_{q}({\mathbf{u}}_{p})}{\partial t} \approx h_{q}({\mathbf{u}}_{p}(t+1,{\mathbf{x}}))-h_{q}({\mathbf{u}}_{p}(t,{\mathbf{x}})). \end{aligned}$$

The joint filter \({\mathbf{f}}\) is also implemented iteratively with the scale variable

$$\begin{aligned} \sigma _{i} = \sigma \sqrt{3}(2^{N_{ite}-i})/\sqrt{4^{N_{ite}}-1} \end{aligned}$$
(8)

where \(N_{ite}\) is the number of iterations (\(N_{ite}=3\) in our numerical experiments). The pseudocode in Algorithm 2 illustrates a separable implementation of our joint filter \({\mathbf{f}}\) based on the above domain transformations, where \(n_{p}\in {\mathbb N}\) is the number of pixels in the p-th coordinate basis (i.e., the image size of the p-th dimension). Its one-dimensional \(L^{1}\) Gaussian convolution on the non-uniform pixel coordinates (transformed domain) is also accurately performed by using our domain-splitting technique described in Sect. 4.
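
The following C++ sketch (with illustrative names) accumulates the transformed coordinates of Eq. (7) along one scan line using the forward differences above and evaluates the per-iteration scale \(\sigma _{i}\) of Eq. (8).

```cpp
#include <cmath>
#include <vector>

// Sketch of the discretized transformation of Eq. (7) along one scan line:
// cumulative arc length on the manifold (t, lambda*g), with the forward
// differences recommended in [11]. g[q][i] is channel q of the guidance at
// pixel i; identifiers are illustrative.
std::vector<double> transformRow(const std::vector<std::vector<double>>& g,
                                 double lambda) {
    const int n = static_cast<int>(g.front().size());
    std::vector<double> t(n);
    t[0] = 0.0; // t_1 = 0 (see Sect. 4)
    for (int i = 0; i + 1 < n; ++i) {
        double sum = 0.0;
        for (const auto& ch : g) {              // sum over color channels q
            const double d = ch[i + 1] - ch[i]; // forward difference
            sum += d * d;
        }
        t[i + 1] = t[i] + std::sqrt(1.0 + lambda * lambda * sum);
    }
    return t;
}

// Per-iteration scale sigma_i of Eq. (8) for i = 1..N_ite.
double sigmaIter(double sigma, int i, int N_ite) {
    return sigma * std::sqrt(3.0) * std::pow(2.0, N_ite - i)
         / std::sqrt(std::pow(4.0, N_ite) - 1.0);
}
```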

4 Domain-splitting Gaussian convolution

Using conventional fast methods such as the well-studied FFTs [7] and recursive filters [1, 6, 31] to perform Gaussian convolutions on a transformed domain is not trivial, because these methods have usually been designed only for uniformly spaced image pixels. See [13] for an excellent review of fast Gaussian convolution methods. In contrast, a domain-splitting technique [3, 34] approximates a discrete analogue of Gaussian convolution (known as a Gauss transform [14]) very accurately (without ringing artifacts) and quickly, even for a non-uniformly sampled variable of the Gaussian function. The key feature of this technique is decomposition of the integral domain of the Gaussian convolution into sub-domains centered at representative points on the domain, using the \(L^{1}\) norm as a metric instead of the popular \(L^{2}\) norm. We explain here the domain-splitting technique adapted to our framework in order to accurately compute the Gaussian convolutions of the initial smoothed image \({\mathbf{J}}^{0}\) in Eq. (1) and of the joint filter (defined in Eqs. (2) and (3)) on the transformed domain \(T_{p,\lambda }(x_{p},{\mathbf{x}},{\mathbf{g}})\).

For a given pixel \({\mathbf{x}}=\{x_{p}\}\) on \({\mathbb R}^{d}\), consider a set of n points \({\mathcal P}: \{t_{i}\}\) on \({\mathbb R}\) representing the transformed pixels of \({\mathbf{x}}\) with respect to its p-th coordinate basis and corresponding color \({\mathbf{h}}=\{h_{q}({\mathbf{x}})\}\in {\mathbb R}^{c}\) such that

$$\begin{aligned} t_{i}&\equiv T_{p,\lambda }(i,{\mathbf{x}},{\mathbf{h}}),\quad t_{1}\le t_{2} \le \cdots \le t_{n},\\ h(t_{i})&\equiv h_{q}({\mathbf{u}}_{p}(i,{\mathbf{x}})),\quad i = \{1,2,\ldots ,n\} \end{aligned}$$

where \(h(t_{i})\in {\mathbb R}\) is the q-th channel color at \(t_{i}\), \(n\in {\mathbb N}\) is the number of pixels, and the transformation \(T_{p,\lambda }(\cdot )\) and its straight line \({\mathbf{u}}_{p}\) are defined in Eqs. (7) and (6), respectively. The \(L^{1}\) Gauss transform \(f(\cdot )\in {\mathbb R}\) of \(h(\cdot )\) at \(t_{j}\in {\mathcal P}\) is given by

$$\begin{aligned} f(t_{j}) = \sum _{i=1}^{n} G_{\sigma }(t_{j}-t_{i})h(t_{i}) \end{aligned}$$
(9)

where naively computing \(f(t_{j})\) for all \(j=\{1,2,\ldots ,n\}\) requires quadratic computational complexity \(O(n^{2})\).

Basic Decomposition Concept: For a fixed point \(t_{1}\), splitting the domain of the \(L^{1}\) norm distance between \(t_{i}\) and \(t_{j}\) with respect to their order yields

$$\begin{aligned} |t_{i}-t_{j}| = \left\{ \begin{aligned} |t_{i}-t_{1}|-|t_{j}-t_{1}|&\quad \text {if}\,\,t_{1}\le t_{j}\le t_{i},\\ -|t_{i}-t_{1}|+|t_{j}-t_{1}|&\quad \text {if}\,\,t_{1}\le t_{i} < t_{j}. \end{aligned} \right. \end{aligned}$$

This process is illustrated conceptually in Fig. 7. Substituting the above equation into an \(L^{1}\) Gaussian \(G_{\sigma }(t_{i}-t_{j})\) leads to

$$\begin{aligned} G_{\sigma }(t_{i}-t_{j}) = \left\{ \begin{aligned} \frac{G_{\sigma }(t_{i}-t_{1})}{G_{\sigma }(t_{j}-t_{1})}&\quad \text {if}\,\,t_{1}\le t_{j}\le t_{i},\\ \frac{G_{\sigma }(t_{j}-t_{1})}{G_{\sigma }(t_{i}-t_{1})}&\quad \text {if}\,\,t_{1}\le t_{i} < t_{j} \end{aligned} \right. \end{aligned}$$

where the dependency between the two indices i and j inside the Gaussian is decoupled so that a function of two variables can be replaced by a combination of two one-variable functions. The \(L^{1}\) Gauss transform \(f(\cdot )\) consequently becomes

$$\begin{aligned} f(t_{j}) = h(t_{j}) + G_{\sigma }(t_{j}-t_{1})\,\xi (j-1) + \frac{\eta (j+1)}{G_{\sigma }(t_{j}-t_{1})}, \end{aligned}$$
(10)
$$\begin{aligned} \xi (j) = \sum _{i=1}^{j}\frac{h(t_{i})}{G_{\sigma }(t_{i}-t_{1})},\quad \eta (j) = \sum _{i=j}^{n}G_{\sigma }(t_{i}-t_{1})\,h(t_{i}) \end{aligned}$$

where \(\xi (0) \equiv 0 \equiv \eta (n+1)\). Note that Eq. (10) depends only on the index j, and \(\xi (\cdot )\) and \(\eta (\cdot )\) can be pre-computed (in linear time) for all \(j=\{1,2,\ldots ,n\}\) at once before calculating \(f(t_{j})\). Linear computational complexity O(3n) is hence sufficient to calculate \(f(t_{j})\) for all indices j via the decomposition (10). However, numerical instability makes it difficult to compute Eq. (10) accurately: for instance, \(\frac{1}{G_{\sigma }(t_{i}-t_{1})}\) may cause an overflow for very large values of \(|t_{i}-t_{1}|\).
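
For illustration, the following C++ sketch implements the decomposition (10) directly via the prefix sums \(\xi \) and \(\eta \) (identifiers are illustrative). It runs in linear time but inherits the overflow problem just mentioned, which the poles introduced below resolve.

```cpp
#include <cmath>
#include <vector>

// Single-anchor decomposition of Eq. (10): the L^1 Gauss transform via two
// prefix sums xi and eta. Linear time, but 1/G may overflow for large
// |t_i - t_1|; the poles of Eq. (14) fix exactly this.
std::vector<double> gaussTransformSingleAnchor(const std::vector<double>& t,
                                               const std::vector<double>& h,
                                               double sigma) {
    const int n = static_cast<int>(t.size());
    auto G = [&](double d) { return std::exp(-std::fabs(d) / sigma); };
    std::vector<double> xi(n + 1, 0.0), eta(n + 2, 0.0), f(n);
    for (int j = 1; j <= n; ++j)      // xi(j) = sum_{i<=j} h(t_i)/G(t_i - t_1)
        xi[j] = xi[j - 1] + h[j - 1] / G(t[j - 1] - t[0]);
    for (int j = n; j >= 1; --j)      // eta(j) = sum_{i>=j} G(t_i - t_1) h(t_i)
        eta[j] = eta[j + 1] + G(t[j - 1] - t[0]) * h[j - 1];
    for (int j = 1; j <= n; ++j)      // Eq. (10)
        f[j - 1] = h[j - 1] + G(t[j - 1] - t[0]) * xi[j - 1]
                 + eta[j + 1] / G(t[j - 1] - t[0]);
    return f;
}
```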

Fig. 7

A basic domain decomposition concept that enables measurement of the distance between \(t_{i}\) and \(t_{j}\) by subtraction of two distances via an anchor coordinate \(t_{1}\)

Accurate Approximation: To avoid the above-mentioned numerical problem, consider a set of m representative poles \(\{\alpha _{k}\}\) on \({\mathbb R}\) instead of the single fixed point \(t_{1}\). Assuming that \(\alpha _{1}<\alpha _{2}<\cdots <\alpha _{m}\), the domain splitting of \(|t_{i}-t_{j}|\) around the pole \(\alpha _{k}\) is given by

$$\begin{aligned} |t_{i}-t_{j}| = \left\{ \begin{aligned} |t_{i}-\alpha _{k}|-|t_{j}-\alpha _{k}|&\quad \text {if}\,\,t_{j}\in \Omega _{i,k}^{1},\\ -|t_{i}-\alpha _{k}|+|t_{j}-\alpha _{k}|&\quad \text {if}\,\,t_{j}\in \Omega _{i,k}^{2},\\ |t_{i}-\alpha _{k}|+|t_{j}-\alpha _{k}|&\quad \text {if}\,\,t_{j} \in \Omega _{i,k}^{3} \end{aligned} \right. \end{aligned}$$

where the domains \(\Omega _{i,k}^{1}\), \(\Omega _{i,k}^{2}\), and \(\Omega _{i,k}^{3}\) are defined by

$$\begin{aligned} \Omega _{i,k}^{1}&= \{z\in {\mathcal P}: \alpha _{k}\le z\le t_{i}\,\,\text {or}\,\,t_{i}\le z\le \alpha _{k}\},\\ \Omega _{i,k}^{2}&= \{z\in {\mathcal P}: \alpha _{k}\le t_{i}< z\,\,\text {or}\,\,z< t_{i}\le \alpha _{k}\},\\ \Omega _{i,k}^{3}&= \{z\in {\mathcal P}: z< \alpha _{k}\le t_{i}\,\,\text {or}\,\,t_{i}\le \alpha _{k} < z\} \end{aligned}$$

as shown in Fig. 8. This domain splitting with the poles \(\{\alpha _{k}\}\) thus makes the Gauss transform \(f(\cdot )\) in Eq. (9) become

$$\begin{aligned} f(t_{j}) = h(t_{j})+C_{j}+D_{j}+E_{j}, \end{aligned}$$
(11)
$$\begin{aligned} C_{j} = G_{(\sigma ,j,\gamma (j))}\sum _{i=\gamma _{2}(\gamma (j))}^{j-1}\frac{h(t_{i})}{G_{(\sigma ,i,\gamma (j))}} +\frac{1}{G_{(\sigma ,j,\gamma (j))}}\sum _{i=j+1}^{\gamma _{2}(\gamma (j)+1)-1}G_{(\sigma ,i,\gamma (j))}h(t_{i}), \end{aligned}$$
(12)
$$\begin{aligned} D_{j} = \sum _{k=1}^{\gamma (j)-1}G_{(\sigma ,j,k)}A_{k},\quad E_{j} = \sum _{k=\gamma (j)+1}^{m}G_{(\sigma ,j,k)}B_{k}, \end{aligned}$$
$$\begin{aligned} A_{k} = \sum _{i=\gamma _{2}(k)}^{\gamma _{2}(k+1)-1}\frac{h(t_{i})}{G_{(\sigma ,i,k)}},\quad B_{k} = \sum _{i=\gamma _{2}(k)}^{\gamma _{2}(k+1)-1}G_{(\sigma ,i,k)}h(t_{i}) \end{aligned}$$
(13)

where \(G_{(\sigma ,l,k)} \equiv G_{\sigma }(t_{l}-\alpha _{k})\), \(\gamma (j)\) is the pole index k satisfying \(\alpha _{k}\le t_{j} <\alpha _{k+1}\), and \(\gamma _{2}(k)\) is the smallest index j satisfying \(\alpha _{k}\le t_{j}\), as illustrated in Fig. 9.

Fig. 8

Division of the domains \( \Omega _{i,k}^{1}\), \( \Omega _{i,k}^{2}\), and \( \Omega _{i,k}^{3}\) via the poles \(\{\alpha _{k}\}\)

Fig. 9

Indexing functions \(\gamma (\cdot )\) and \(\gamma _{2}(\cdot )\)

For a given upper bound \(\delta \) on the computational precision, the inequality \(\exp (\frac{|\alpha _{k+1}-\alpha _{k}|}{\sigma }) < \delta \) must hold to avoid the above-mentioned numerical instability. Satisfying this condition leads to a stable choice of distances between poles via the relationship \(|\alpha _{k+1}-\alpha _{k}| = \varphi \sigma \log (\delta )\), where \(\varphi \in (0,1)\) is a parameter. In our framework, we use \(\varphi = 0.5\) and double floating-point precision for \(\delta \) (i.e., the 64-bit DBL_MAX of the C++ programming language). Since the range of \({\mathcal P}\) is defined by \(w = t_{n}-t_{1}>0\), the number of poles and their coordinates are automatically determined by

$$\begin{aligned} \{\alpha _{k}\} = t_{1}+\{0,1,\ldots ,m-1\}\frac{w}{m},\quad m = [\frac{w}{\varphi \sigma \log (\delta )}] \end{aligned}$$
(14)

where \([\cdot ]\) is the ceiling function and \(m \ll n\) in general cases. Although \(t_{1}\) is always equal to zero in our framework, shifting the first pole (\(\alpha _{1} = t_{1}\)), as in Eq. (14), guarantees that Eq. (11) is numerically stable because of how the poles \(\{\alpha _{k}\}\) are chosen (see Fig. 10).
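
To see why this choice is safe, note that

$$\begin{aligned} \exp \left( \frac{|\alpha _{k+1}-\alpha _{k}|}{\sigma }\right) \le \exp \left( \varphi \log (\delta )\right) = \delta ^{\varphi } < \delta \quad \text {for}\,\,\varphi \in (0,1), \end{aligned}$$

so every factor \(1/G_{(\sigma ,j,\gamma (j))}\) arising in Eqs. (12) and (13) stays below \(\delta ^{\varphi } = \sqrt{\delta }\) when \(\varphi = 0.5\), safely within the representable range.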

Fig. 10

Stable domain splitting by \(\{\alpha _{k}\}\), where the indices j and k are chosen such that \(|\alpha _{k+1}-\alpha _{k}| \ge |t_{j}-\alpha _{\gamma (j)}| \ge |t_{\gamma _{2}(\cdot )}-\alpha _{\gamma (\cdot )}|\) in Eqs. (12) to (13). It implies \(\delta > \exp (\frac{|t_{j}-\alpha _{k}|}{\sigma })\) in this figure

If \(|\alpha _{k}-t_{j}| > \sigma \log (\delta )\), then \(G_{\sigma }(\alpha _{k}-t_{j})\) becomes numerically zero; that is, the contribution of poles \(\alpha _{k}\) located far from \(t_{j}\) vanishes. Therefore, \(D_{j}\) and \(E_{j}\) are approximated by

$$\begin{aligned} D_{j} \approx G_{(\sigma ,j,\gamma (j)-1)}A_{\gamma (j)-1},\quad E_{j} \approx G_{(\sigma ,j,\gamma (j)+1)}B_{\gamma (j)+1} \end{aligned}$$
(15)

where \(D_{j} \approx 0\) if \(\gamma (j) < 2\), and \(E_{j} \approx 0\) if \(\gamma (j) > m-1\). The approximation error at \(t_{j}\) is given by

$$\begin{aligned} \sum _{k=1}^{\gamma (j)-2}G_{\sigma }(t_{j}-\alpha _{k})A_{k}+\sum _{k=\gamma (j)+2}^{m}G_{\sigma }(t_{j}-\alpha _{k})B_{k}. \end{aligned}$$

The above approximation of \(f(t_{j})\) thus possesses accuracy equivalent to truncating the sum outside a \(3\varphi \sigma \log (\delta )\) region. Such a truncation is very accurate and equivalent to naive truncation with a radius of about \(525\sigma \) for \(\varphi =0.5\) and \(\log ({\mathrm{{DBL}}\_\mathrm{{MAX}}})\approx 700\). Moreover, this approximation algorithm requires only \(O(4n+2m+\max (n,m))\) operations (\(O(1/[\sigma ])\) with respect to \(\sigma \)).
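
Putting Eqs. (11)-(15) together, the following self-contained C++ sketch computes the approximate \(L^{1}\) Gauss transform in linear time. Identifiers are illustrative, and bucket-local prefix and suffix sums stand in for the reuse of \(A_{k}\) and \(B_{k}\) described below.

```cpp
#include <algorithm>
#include <cfloat>
#include <cmath>
#include <vector>

// Sketch of the stable domain-splitting L^1 Gauss transform of Eqs. (11)-(15).
// Assumes t is sorted with w = t.back() - t.front() > 0, as in the text.
std::vector<double> gaussTransformSplit(const std::vector<double>& t,
                                        const std::vector<double>& h,
                                        double sigma, double varphi = 0.5) {
    const int n = static_cast<int>(t.size());
    const double w = t[n - 1] - t[0];
    const int m = std::max(
        1, (int)std::ceil(w / (varphi * sigma * std::log(DBL_MAX)))); // Eq. (14)
    const double step = w / m;
    auto alpha = [&](int k) { return t[0] + (k - 1) * step; }; // poles, k = 1..m
    auto G = [&](double d) { return std::exp(-std::fabs(d) / sigma); };

    std::vector<int> gamma(n);                        // gamma(j): bucket of t_j
    std::vector<double> A(m + 2, 0.0), B(m + 2, 0.0); // Eq. (13)
    for (int j = 0; j < n; ++j) {
        gamma[j] = std::min(m, 1 + (int)((t[j] - t[0]) / step));
        A[gamma[j]] += h[j] / G(t[j] - alpha(gamma[j]));
        B[gamma[j]] += G(t[j] - alpha(gamma[j])) * h[j];
    }

    // Bucket-local prefix sums P and suffix sums S give C_j of Eq. (12) in
    // O(1) per point (the same partial sums that build A_k and B_k).
    std::vector<double> P(n), S(n), f(n);
    for (int j = 0; j < n; ++j)
        P[j] = h[j] / G(t[j] - alpha(gamma[j]))
             + ((j > 0 && gamma[j - 1] == gamma[j]) ? P[j - 1] : 0.0);
    for (int j = n - 1; j >= 0; --j)
        S[j] = G(t[j] - alpha(gamma[j])) * h[j]
             + ((j + 1 < n && gamma[j + 1] == gamma[j]) ? S[j + 1] : 0.0);

    for (int j = 0; j < n; ++j) {
        const int k = gamma[j];
        const double g = G(t[j] - alpha(k));
        double fj = h[j];                                        // Eq. (11)
        if (j > 0 && gamma[j - 1] == k)     fj += g * P[j - 1];  // C_j, left
        if (j + 1 < n && gamma[j + 1] == k) fj += S[j + 1] / g;  // C_j, right
        if (k >= 2)     fj += G(t[j] - alpha(k - 1)) * A[k - 1]; // D_j, Eq. (15)
        if (k <= m - 1) fj += G(t[j] - alpha(k + 1)) * B[k + 1]; // E_j, Eq. (15)
        f[j] = fj;
    }
    return f;
}
```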

[Algorithm 3 (DSInit): initialization of the poles and coefficients]

Efficient Implementation: The above domain-splitting technique is efficiently implemented in our framework by the initialization (DSInit) and integration (DSSum) procedures described in the pseudocodes of Algorithms 3 and 4, respectively. DSInit\((\cdot )\) computes the poles \(\{\alpha _{k}\}\) in Eq. (14) and some of the coefficients \(C_{j}\), \(D_{j}\), and \(E_{j}\) in Eqs. (12) and (15). DSSum\((\cdot )\) then calculates the \(L^{1}\) Gauss transform \(\{f(t_{j})\}\) in Eq. (11) for the given colors \(\{h(t_{j})\}\) by using the coefficients, Coef, determined by DSInit\((\cdot )\). Since Coef depends only on the (transformed) pixel coordinates \({\mathcal P}: \{t_{j}\}\) and the scale \(\sigma \), it can be reused for different color channels, as in Algorithm 5 (GaussTransform in the joint filter: Algorithm 2), as well as for uniform pixel coordinates, as in Algorithm 6 (GaussianFilter in the scale-aware filter: Algorithm 1, i.e., \({\mathbf{J}}^{0}\) in Eq. (1)). Note that the terms \(\sum _{i} h(t_{i})/G_{\sigma }(t_{i}-\alpha _{\cdot })\) and \(\sum _{i} G_{\sigma }(t_{i}-\alpha _{\cdot })h(t_{i})\) appear in the equations for \(A_{k}\), \(B_{k}\), and also \(C_{j}\). The results of computing \(A_{k}\) and \(B_{k}\) are therefore reused to obtain \(C_{j}\) in DSSum\((\cdot )\) (see Algorithm 4). We also employed a fast library [22] for the exponential function \(\exp (\cdot )\) in DSInit\((\cdot )\). Ignoring substitutions and indexing, the complexities and costs of DSInit\((\cdot )\) and DSSum\((\cdot )\) are listed in Table 1.
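
The reuse pattern can be summarized by the following interface sketch, where all types and signatures are illustrative rather than those of Algorithms 3-6:

```cpp
#include <vector>

// Illustrative interface for the DSInit/DSSum split: Coef depends only on
// the coordinates {t_j} and sigma, so one Coef serves every color channel.
struct Coef {
    std::vector<double> alpha; // poles of Eq. (14)
    std::vector<int> gamma;    // bucket index gamma(j) per point
    std::vector<double> gj;    // G(sigma, j, gamma(j)) per point
};
Coef DSInit(const std::vector<double>& t, double sigma);   // cf. Algorithm 3
std::vector<double> DSSum(const Coef& coef,
                          const std::vector<double>& h);   // cf. Algorithm 4

void gaussTransformAllChannels(const std::vector<double>& t, double sigma,
                               std::vector<std::vector<double>>& channels) {
    const Coef coef = DSInit(t, sigma); // once per scan line and scale
    for (auto& h : channels)
        h = DSSum(coef, h);             // reused for each channel q
}
```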

[Algorithm 4 (DSSum): \(L^{1}\) Gauss transform using Coef]
[Algorithm 5 (GaussTransform): convolution on the transformed domain]
[Algorithm 6 (GaussianFilter): convolution on uniform pixel coordinates]
Table 1 Complexities and costs: numbers of multiplications (Mult.), additions (Add.), and exponential functions (\(\exp (\cdot )\)) per pixel

5 Numerical experiments

All numerical experiments in this paper were performed on a PC with an i9-10980XE CPU (3.0 GHz, 36 cores; no parallelization was used), 128 GB RAM, and a 64-bit OS with the GNU C++ 9.3 compiler.

We first examined the accuracy and computational speed of our domain-splitting approximation (Our: uniform and Our NU: non-uniform pixels) described in Sect. 4 (Algorithms 6 and 5), comparing them numerically with those of popular box and recursive filters: the Box (moving average) [26], EBox (extended box) [15], SII (stacked integral image) [4, 9], Deriche [6], VYV [31], and AM [1] filters. Their implementations and boundary conditions were based on the open library of [13] with \(10^{-15}\) tolerance (4th-order recursion). Table 2 shows the peak signal-to-noise ratio (PSNR) [24, 34] and the maximum error \(E_{\max }\) [34] of one-dimensional Gaussian filtering (\(d=1\) for \({\mathbf{J}}^{0}\) in Eq. (1) and for the convolutions in the joint filter) for relatively small \(\sigma \) (0.05% to 1% of the image size \(w=n\)), where the exact Gauss transform (Eq. (9)) and an FIR (Finite Impulse Response) filter [13] were employed as references for the approximations. Our approximations were slightly slower than the conventional methods but significantly more accurate (by a factor of about \(10^{10}\)).
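
For reference, the following C++ helpers sketch the two error metrics in their standard forms (assuming signals in \([0,1]\), so \(\mathrm{MAX}=1\) in the PSNR); the exact conventions of [24, 34] may differ slightly.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Standard-form sketches of the metrics reported in Table 2:
// PSNR = 10 log10(MAX^2 / MSE) with MAX = 1, and maximum error E_max.
double psnr(const std::vector<double>& a, const std::vector<double>& b) {
    double mse = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        mse += d * d;
    }
    mse /= static_cast<double>(a.size());
    return 10.0 * std::log10(1.0 / mse);
}

double maxError(const std::vector<double>& a, const std::vector<double>& b) {
    double e = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        e = std::max(e, std::fabs(a[i] - b[i]));
    return e;
}
```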

Table 2 Timing and accuracy comparisons of \({\mathbf{J}}^{0}\) where \(d=1\), \(n=10^{4}\), \(h(\cdot )\in [0,1]\), averaged PSNR of 20 \(\sigma \in \{5,10,\ldots ,100\}\), and 100 datasets randomly generated for each \(\sigma \) were employed. Time shows averaged computational time (milliseconds) for \(n\in 10^{\{4,5,6\}}\)
Fig. 11

1D Gaussian filtering \({\mathbf{J}}^{0}\) comparisons for \(w=n=10^{4}\) pixels with \(\sigma \in \{10^{2},10^{3}\}\), where differences between the approximations and their correct results are shown. Top-left: input and the correct results with \(\sigma =10^{2}\) based on FIR and exact \(L^{1}\) Gauss transform. Top-right: A boundary artifact of the Deriche [6] with \(\sigma =10^{3}\)

Figure 11 demonstrates typical error profiles under the parameter settings of Table 2, where ripple-shaped errors, which cause undesired artifacts and phantom edges, are observed for all conventional methods. At first glance, the VYV and Deriche filters look preferable for computer graphics applications because the magnitudes of their errors are small, as shown in Table 2 and Fig. 11. However, applying these methods stably and accurately over variable parameters is not trivial. During our experiments, we easily encountered artifacts that could not be ignored or that amounted to complete failure. Figure 11 (top-right) shows a boundary artifact caused by the Deriche filter for large \(\sigma \) (10% of the image size, which is not unrealistically large). This artifact may result from one of the causes discussed in [5, 13, 30]. Moreover, the errors of these methods, including the AM and EBox filters, behave nonlinearly with respect to not only \(\sigma \) but also the image size w (and hence the number of pixels n), because their formulations (e.g., coefficient optimization) rely on these parameters. Despite the fact that both \(\sigma \) and n were linearly scaled from the case in Fig. 11, the VYV and Deriche filters generated results with large errors, as shown in Fig. 12. In contrast to these conventional methods, our theoretical formulations guarantee the high stability and accuracy of our approximations, as confirmed numerically by our experiments. Since the quality of both \({\mathbf{J}}^{0}\) and the convolutions used in the joint filter is important, the approximation described in Sect. 4 is well suited to scale-aware filters.

Fig. 12

1D Gaussian filtering \({\mathbf{J}}^{0}\) comparisons with artifacts where \(n=10^{5}\) and \(\sigma \): \(10^{4}\) (left) and \(2\times 10^{4}\) (right). The PSNRs of the Deriche, VYV, Our, and Our NU in the left (right) image are equal to 24.7 (6.6), 9.2 (10.3), 280 (280), and 278 (280), respectively

Fig. 13

Input images with their numbers of pixels (w: width and h: height) and numbers of figures employed

Table 3 Timings (seconds) by box and our framework, where s represents the iteration number in Eq. (1). The image sizes and figure numbers (\(\phi \) and \(\sigma \) are described in the captions) are listed in Fig. 13
Fig. 14

Timings versus iterations (left: three different sizes) and versus numbers of pixels (right), where scaled Lena images were employed and each point represents an average of 10 iterations for each of 10 scales (\(\sigma \in \{5,10,15,\ldots ,50\}\)). Note that the SiR graphs are omitted because their profiles are visually similar to the RG graphs

Fig. 15

Quality comparison of image of face for the RG, SiR, and AG filters with the box kernel (\(\sqrt{3}\sigma \) radius was used) and our framework (\(L^{1}\) Gaussian), where \(s=20\), \(\phi =1,\,\sigma =6\) (RG and AG) and \(\phi =0.5,\,\sigma =3\) (SiR). Bottom images correspond to the square root of the magnitude of the gradient (\(||\nabla {\mathbf{J}}^{20}||^{1/2}\)) of the eye and mouth parts of the upper images, where some artifacts appeared in the box-based results

We next evaluated the quality and computational speed of our scale-aware filters. The input images are listed in Fig. 13, and the timings are shown in Table 3. Figures 3, 4, and 18 demonstrate our scale-aware filtering results with varying scales (\(\sigma \)); the scale-space behavior and salient edge features are clearly apparent. Our filters did not produce the undesired artifacts generated by box-based convolutions (moving average [26], where a \(\sqrt{3}\sigma \) radius was employed to obtain visually similar results), as shown in Figs. 1, 15, and 16. Figure 14 shows timing comparisons for various numbers of iterations s and image sizes, which are summarized in Table 4.

Fig. 16

Quality comparison of image of glass for the RG, SiR, and AG filters with the box kernel (\(\sqrt{3}\sigma \) radius was used) and our framework (\(L^{1}\) Gaussian), where \(\phi =1.5\), \(\sigma =8\), and \(s=20\). Bottom images correspond to the same square root of the magnitude of the gradient (\(||\nabla {\mathbf{J}}^{20}||^{1/2}\)) as the bottom images in Fig. 15

Table 4 Speed comparison of box [26] and our scale-aware filters (megapixels per second), where s is the iteration number in Eq. (1)
Fig. 17

Iteration effects of image of snack via our framework with \(\phi =1.5\), \(\sigma =16\), and \(s\in \{4,20\}\)

Fig. 18

Various scales of image of ham via our framework with the AG filter where \(\phi =0.5\), \(s=20\), and \(\sigma \in \{4,8,16,32\}\)

Our filters achieved linear computational speeds (slightly slower than box kernel convolutions), and their convergence properties were similar for the RG, SiR, and AG filters, as shown in Fig. 5. See Fig. 17 for the filtering effects of the iteration counts \(s\in \{4,20\}\) recommended by [36].

6 Conclusion

We have proposed a fast and accurate computational framework for scale-aware image filters. Our new framework is based on accurately approximating \(L^{1}\) Gaussian convolution with respect to a transformed pixel domain representing geodesic distance on a guidance image manifold in order to recover salient edges in a manner faithful to scale-space theory while removing small image structures. Our framework achieved linear computational complexity with high-quality filtering results. We compared our framework numerically in terms of speed, precision, and visual quality with popular conventional methods.

Since our framework is robustly applicable to HDR images over a wide range of scales \(\sigma \), applications to computational photography, engineering, and science are promising future work. The limitations of our framework relate to its domain transformations and scale-aware filters, such as slow convergence in elongated image regions (Fig. 17) and intensity reduction (SiR, Fig. 4). Combining our framework with a guided filter [17] may reduce such artifacts. Future work will also include numerical comparisons with the non-uniform recursive method [12] (an extension of Deriche [6]).