1 Introduction

Shape and texture structures in images occur at various scales. A multi-scale mathematical framework called scale-space theory [18, 33], which represents an image at different scales, has therefore been studied for many years within the pattern recognition and image processing communities. Traditionally, linear convolution of an image with the Gaussian function \(G_{\sigma }(\cdot )\) has been employed to obtain a scale-space representation that smooths out image structures smaller than a given scale parameter \(\sigma \in {\mathbb R}_{>0}\), the standard deviation of \(G_{\sigma }(\cdot )\). See the seminal book [21] for the methodology and applications of scale-space theory. However, Gaussian convolution also rapidly flattens salient image edges larger than \(\sigma \), which makes the representation of edges over the scale space highly redundant. Combining smooth image regions with salient edges is therefore more compact and desirable [10].

It is often difficult to control the parameters of popular edge-preserving image filters, such as bilateral filters [24] and nonlinear diffusions (especially their number of iterations), so as to achieve a filtering result with a target scale \(\sigma \). More recently, scale-aware filters [16, 19, 29, 36], which remove image structures smaller than a specified scale while preserving salient edges, have attracted considerable attention because of their stability and simplicity. These scale-aware filters can directly and intuitively specify the target scale \(\sigma \), and the convergence of their iterative filtering process is very stable compared with that of conventional edge-aware filters. They have therefore been widely used in applications such as edge detection, detail enhancement, and image abstraction, and the scale-aware concept has also been extended to 3D meshes [32, 35]. Since scale-aware filters usually require time-consuming computations, box-based averaging methods (e.g., [26]) have been employed for their fast approximation [36]. Unfortunately, a box-like averaging kernel (including a naively truncated Gaussian function) often produces undesired artifacts because its Fourier transform has a sinc-like shape. Extensions of these methods [4, 9, 15] may therefore also cause artifacts. See Fig. 1 for an example of artifacts caused by a box filter and its relationship to a scale-aware filter (RG: Rolling Guidance [36]). Although a popular recursive Gaussian filter [6] has been extended to non-uniform pixels [12] and could perhaps be employed for a scale-aware filter, such recursive filters [1, 6, 31] have non-trivial numerical issues (e.g., failures for some values of \(\sigma \) and boundary problems [5, 30]) because they optimize their coefficients for fixed parameters. Developing a fast and also accurate approximation of scale-aware filters is thus important, given the growing use of image data in science and engineering as well as images with a high dynamic range (HDR) and high resolution.

Fig. 1

Top: linear smoothing (center) and scale-aware filtering (right, RG [36]) results with a box and our \(L^{1}\) Gaussian kernels (top-left, where the bottom-left graphs show their Fourier signals). Bottom images correspond to the magnitudes (square roots) of their gradients

In this paper, we propose a novel computational framework for fast and accurate approximation of scale-aware image filters. Our framework is based on adapting domain splitting [3, 34] and transformation [11, 12] techniques to construct an average-based joint filter that produces a nonlinear color averaging of a given image according to another guidance image. Our scale-aware filters recursively apply the constructed joint filter to recover edge features while removing small image structures. The joint filter consists of a separable implementation of \(L^{1}\) Gaussian convolutions on a guidance-transformed domain that is equipped with a metric of geodesic distance on an image manifold [26] composed of the pixel coordinates and their corresponding guidance image colors. The \(L^{1}\) Gaussian convolution is approximated very accurately with linear computational complexity by splitting the transformed domain into representative regions where discrete convolutions can be efficiently performed. Figure 2 gives an overview of the framework.

Fig. 2

An overview of our framework. Algorithms are described in the following sections

Our framework is faithful to scale-space theory in the sense that the \(L^{1}\) Gaussian convolution never increases the number of extrema in the one-dimensional continuous case (discussed as truncated exponential functions in [21, §6.2.3]). In other words, linear filtering with it does not produce any phantom edges that do not exist in the original image. This property is also associated with the absence of ripples in the Fourier transform of the \(L^{1}\) Gaussian function, as shown in the bottom-left graphs of Fig. 1. In this paper, we describe the implementation of three conventional scale-aware filters [19, 29, 36] via our framework and assess their performance against the conventional box-based averaging method.

The rest of the paper is organized as follows. We present our scale-aware filters based on joint filters in Sect. 2. Sections 3 and 4 describe the guidance domain transformation and the domain-splitting Gaussian convolution for the joint filter, respectively. Our numerical experiments are explained in Sect. 5. We conclude the paper in Sect. 6.

2 Joint-averaging scale-aware filters

For a given image pixel \({\mathbf{x}}=\{x_{p}\}\) on \({\mathbb R}^{d}\), let \({\mathbf{I}}={\mathbf{I}}({\mathbf{x}})\) and \({\mathbf{J}}^{s}={\mathbf{J}}^{s}({\mathbf{x}})\) on \({\mathbb R}^{c}\) be the input and the s-th filtered image color of our framework, respectively, where \(d, c\in {\mathbb N}\) and \(s\in {\mathbb N}\,\cup \, \{0\}\). Our scale-aware filter is then recursively defined by

$$\begin{aligned} {\mathbf{J}}^{s+1} = {\mathbf{F}}({\mathbf{x}},{\mathbf{f}},{\mathbf{J}}^{s}),\quad {\mathbf{J}}^{0} \equiv \frac{\int G_{\sigma }({\mathbf{x}}-{\mathbf{y}}){\mathbf{I}}({\mathbf{y}})d{\mathbf{y}}}{\int G_{\sigma }({\mathbf{x}}-{\mathbf{y}})d{\mathbf{y}}} \end{aligned}$$
(1)

where \(\sigma \in {\mathbb R}_{>0}\) is a user-specified scale parameter, \(G_{\sigma }(\cdot ) = \exp (-\frac{|\cdot |}{\sigma })\) is an \(L^{1}\) Gaussian function (also known as a Laplace distribution in statistics and probability theory), and \({\mathbf{F}}\) is a functional of an average-based joint filter \({\mathbf{f}}\) defined in Eqs. (2)-(5). Here, \({\mathbf{J}}^{0}\) is an initial smoothed image that removes small structures from the input \({\mathbf{I}}\) by means of the normalized Gaussian convolution in Eq. (1). The joint filter \({\mathbf{f}}\) recovers salient image edges during the above recursive filtering process.
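
For concreteness, the following C++ fragment evaluates \({\mathbf{J}}^{0}\) of Eq. (1) for a single one-dimensional channel by direct \(O(n^{2})\) summation. It is a minimal reference sketch (all identifiers are illustrative, not taken from our implementation); the fast linear-time evaluation is described in Sect. 4.

```cpp
#include <cmath>
#include <vector>

// Reference O(n^2) sketch of the normalized L^1 Gaussian smoothing J^0 in
// Eq. (1) for one 1D channel; Sect. 4 replaces this with an O(n) algorithm.
std::vector<double> initialSmooth(const std::vector<double>& I, double sigma) {
    const int n = static_cast<int>(I.size());
    std::vector<double> J0(n);
    for (int x = 0; x < n; ++x) {
        double num = 0.0, den = 0.0;
        for (int y = 0; y < n; ++y) {
            const double w = std::exp(-std::fabs(double(x - y)) / sigma); // G_sigma
            num += w * I[y];
            den += w;
        }
        J0[x] = num / den; // normalized convolution
    }
    return J0;
}
```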

Let \({\mathbf{f}}={\mathbf{f}}({\mathbf{x}},{\mathbf{g}},{\mathbf{h}}): {\mathbb R}^{d+2c}\rightarrow {\mathbb R}^{c}\) be a joint filter with integrand \({\mathbf{h}}={\mathbf{h}}({\mathbf{x}})\) and guidance \({\mathbf{g}}={\mathbf{g}}({\mathbf{x}})\), both on \({\mathbb R}^{c}\). The filter \({\mathbf{f}}\) is then given by a normalized convolution:

$$\begin{aligned}&{\mathbf{f}}({\mathbf{x}},{\mathbf{g}},{\mathbf{h}}) =\frac{\int W(\sigma ,\phi ,{\mathbf{x}},{\mathbf{y}},{\mathbf{g}}){\mathbf{h}}({\mathbf{y}}) d{\mathbf{y}}}{\int W(\sigma ,\phi ,{\mathbf{x}},{\mathbf{y}},{\mathbf{g}}) d{\mathbf{y}}}, \end{aligned}$$
(2)
$$\begin{aligned}&W(\cdot ) = \prod _{p=1}^{d} G_{\sigma }(T_{p,\lambda }(x_{p},{\mathbf{x}},{\mathbf{g}})-T_{p,\lambda }(y_{p},{\mathbf{y}},{\mathbf{g}})), \end{aligned}$$
(3)
$$\begin{aligned}&\lambda = \sqrt{\sigma /(\sigma _{s}\phi )} \end{aligned}$$
(4)

where \({\mathbf{y}}=\{y_{p}\}\in {\mathbb R}^{d}\), \(\phi \in {\mathbb R}_{>0}\) is a user-specified edge-awareness parameter, \(\sigma _{s}\) is the standard deviation of the integrand \({\mathbf{h}}\), and \(T_{p,\lambda }(\cdot ): {\mathbb R}^{d+c+1}\rightarrow {\mathbb R}_{>0}\) denotes a domain transformation with respect to the p-th coordinate basis (described in Eq. (7) of Sect. 3). The transformation \(T_{p,\lambda }(\cdot )\) measures geodesic distance on the image manifold \((x_{p},\lambda \,{\mathbf{g}})\in {\mathbb R}^{c+1}\) in order to obtain an edge-recovering effect according to the guidance \({\mathbf{g}}\). Note that the Gaussian \(G_{\sigma }(\cdot )\) in Eq. (3) uses only the scale parameter \(\sigma \), whereas both \(\phi \) and \(\sigma \) characterize the transformation \(T_{p,\lambda }(\cdot )\) through \(\lambda \).

Filter Models: We have implemented the following three conventional filters in our framework: RG [36], SiR (Smooth and iteratively Restore [19]), and AG (Alternating Guided [29]). The RG paper [36] first proposed a scale-aware filtering framework based on a recursive process involving average-based joint filters such as the joint/cross bilateral [8, 25], guided [17], and domain transformation [11] filters. The RG filter smooths the curvature of large-scale edges, whereas the SiR filter preserves curvature by exchanging the roles of the integrand and guidance images in the joint filter. The AG filter in turn remedies two drawbacks of SiR: intensity reduction and the restoration of small structures around large-scale edges. Figures 3 and 4 demonstrate the different effects of these filters with the same parameters via our framework, and their stable convergence rates are shown in Fig. 5. Four iterations (\(s=4\)) have been recommended for fast results [36], and 20 iterations (\(s=20\)) are enough for high-quality results.

RG, SiR, and AG filters in our framework are given by \({\mathbf{F}}= {\mathbf{F}}({\mathbf{x}},{\mathbf{f}},{\mathbf{h}})\in {\mathbb R}^{c}\) in Eq. (1) such that

(5)

where \({\mathbf{M}}_{{\mathbf{x}}} = {\mathbf{M}}_{{\mathbf{x}}}(\cdot ) \in {\mathbb R}^{c}\) is a vector median filter [2] (over one-link neighbor pixels only, i.e., a \(3\times 3\) pixel window for a 2D image). Note that the integrand and guidance (\({\mathbf{h}}\) and \({\mathbf{I}}\)) are exchanged between the RG and SiR versions of \({\mathbf{F}}\) on the right-hand side of Eq. (5) and that the AG filter applies the RG and SiR filters alternately. In contrast to the conventional filters [19, 29, 36], our framework is restricted to the domain transformation for the joint filter \({\mathbf{f}}\), as in Eq. (3), in order to apply the fast and accurate Gaussian convolutions described in Sect. 4. Algorithm 1 shows the pseudocode of our scale-aware filters, and a sketch of the recursion is given below.
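
A minimal C++ sketch of this recursion follows. The Image type and the joint-filter and median callables are placeholders, and the placement of \({\mathbf{M}}_{{\mathbf{x}}}\) in the AG branch is an illustrative assumption; Eq. (5) and [29, 36] define the exact forms.

```cpp
#include <functional>
#include <vector>

// Minimal sketch of the recursion J^{s+1} = F(x, f, J^s) in Eq. (1).
// Image, the callables, and the placement of the vector median M_x in the
// AG branch are illustrative assumptions (see Eq. (5) and [29, 36]).
using Image = std::vector<double>;                    // one flattened color image
using JointFilter =
    std::function<Image(const Image&, const Image&)>; // f(guidance g, integrand h)
using Median = std::function<Image(const Image&)>;    // M_x over 3x3 windows

enum class Mode { RG, SiR, AG };

Image scaleAwareFilter(const Image& I, Image J /* J^0 of Eq. (1) */,
                       const JointFilter& f, const Median& Mx,
                       int iterations, Mode mode) {
    for (int s = 0; s < iterations; ++s) {
        switch (mode) {
        case Mode::RG:  J = f(J, I); break; // filter I, guided by J^s
        case Mode::SiR: J = f(I, J); break; // roles of h and I exchanged
        case Mode::AG:  // alternate RG- and SiR-type updates, stabilized by M_x
            J = (s % 2 == 0) ? f(J, I) : f(I, Mx(J));
            break;
        }
    }
    return J;
}
```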

Fig. 3

RG (top), SiR (middle), and AG (bottom) results of filtering the Balloon image [27] via our framework, where \(\phi =1.5\) and \(\sigma \in \{16,32,64\}\) (left to right)

Fig. 4

Zoomed images of the rightmost panels of Fig. 3: \(\sigma = 64\)

Fig. 5

Convergence rates of our scale-aware filters demonstrated in Fig. 3. Horizontal and vertical axes are the number of iterations s and the normalized MAE (Mean Absolute Error divided by the product of c and \(\max ({\mathbf{I}})\)) between \({\mathbf{J}}^{s+1}\) and \({\mathbf{J}}^{s}\)

[Algorithm 1: pseudocode of our scale-aware filters]

3 Guidance domain transformation

The domain transform technique [11, 12] provides efficient edge-aware image smoothing and can be utilized for scale-aware image filters, as demonstrated by the RG filter [36]. The basic idea of this technique is to perform fast linear convolutions on a transformed domain whose metric (length) represents the magnitude of image edges, measured as geodesic distance on an image manifold, instead of performing expensive nonlinear convolutions, as shown in Fig. 6. We describe here the guidance domain transformation \(T_{p,\lambda }(\cdot )\) in Eq. (3) adapted to our framework in order to efficiently perform the Gaussian convolutions of \({\mathbf{f}}\) in Eqs. (2) and (3).

Fig. 6

Uniform pixels are transformed with respect to the geodesic distance of the image manifold to perform convolutions efficiently

For a given pixel \({\mathbf{x}}= \{x_{p}\}\) on \({\mathbb R}^{d}\), the same pixel as in Sect. 2, consider its one-dimensional local coordinate system \({\mathcal S}\) parallel to the p-th coordinate basis. The following straight line \({\mathbf{u}}_{p} = {\mathbf{u}}_{p}(t,{\mathbf{x}})\in {\mathbb R}^{d}\) passing through \({\mathbf{x}}\) then reparameterizes \({\mathcal S}\) by \(t\in {\mathbb R}\):

$$\begin{aligned} {\mathbf{u}}_{p}={\mathbf{u}}_{p}(t,{\mathbf{x}})\equiv \left\{ \begin{aligned} x_{l}\quad&\text {if}\,\,l\ne p,\\ t\quad&\text {otherwise} \end{aligned} \right. \end{aligned}$$
(6)

where the origin of \({\mathcal S}\) is located on \({\mathbf{u}}_{p}(0,{\mathbf{x}})\).

For a given image color \({\mathbf{h}}({\mathbf{x}})=\{h_{q}({\mathbf{x}})\}\) on \({\mathbb R}^{c}\) at \({\mathbf{x}}\), the (\(\lambda \)-scaled) locus of the image color \({\mathbf{h}}({\mathbf{u}}_{p})\) in the joint space of the p-th pixel coordinate and its color forms a hyper curve \({\mathcal C}_{p}\), i.e., a one-dimensional image manifold,

$$\begin{aligned} {\mathcal C}_{p}: {\mathbf{r}}_{p}(t)=(t,\lambda \,{\mathbf{h}}({\mathbf{u}}_{p}))\in {\mathbb R}^{c+1} \end{aligned}$$

where \(\lambda \), defined in Eq. (4), controls the ratio of the metrics between the pixel and color spaces. Note that \(\{x_{l}\}: l\ne p\) of \({\mathcal C}_{p}\) characterizes a \((d-1)\)-parameter family of hyper curves corresponding to each l-th pixel coordinate basis. The convolutions in Eq. (2) are thus performed separably by varying the coordinate element p (see Eq. (3)).

The geodesic distance between two points on \({\mathcal C}_{p}\) indicates the amount and magnitude of the image edges within its corresponding range of t, because the arc length of \({\mathcal C}_{p}\) increases when its corresponding color \({\mathbf{h}}({\mathbf{u}}_{p})\) changes rapidly with respect to t. The arc length of \({\mathcal C}_{p}\) within the interval \(t\in [0,a]\) is given by integrating the magnitude of its tangent vector [28]: \(\int ^{a}_{0} ||\frac{\partial {\mathbf{r}}_{p}(t)}{\partial t}||\,dt = \int ^{a}_{0} ||(1,\lambda \frac{\partial {\mathbf{h}}({\mathbf{u}}_{p})}{\partial t})||\,dt\). Simple computations then yield our guidance domain transformation \(T_{p,\lambda }(\cdot )\) (employed in Eq. (3)), as follows:

$$\begin{aligned} \begin{aligned} T_{p,\lambda }(a,{\mathbf{x}},{\mathbf{h}}) = \int \limits ^{a}_{0} \sqrt{1+\lambda ^{2}\sum _{q=1}^{c} |\frac{\partial h_{q}({\mathbf{u}}_{p})}{\partial t}|^{2}}\,dt. \end{aligned} \end{aligned}$$
(7)

The above transformation (7) is performed on the image manifold \(({\mathbf{x}},\lambda \,{\mathbf{g}})\) as \(T_{p,\lambda }(x_{p},{\mathbf{x}},{\mathbf{g}})\) in order to incorporate edge information provided by the guidance \({\mathbf{g}}\) into the joint filter \({\mathbf{f}}\) in Eqs. (2) and (3).

[Algorithm 2: separable implementation of the joint filter \({\mathbf{f}}\) via domain transformations]

Discrete Implementation: As recommended in [11], the first-order partial derivative of \(h_{q}({\mathbf{u}}_{p})\) with respect to t in Eq. (7) is approximated by using the forward difference scheme:

$$\begin{aligned} \frac{\partial h_{q}({\mathbf{u}}_{p})}{\partial t} \approx h_{q}({\mathbf{u}}_{p}(t+1,{\mathbf{x}}))-h_{q}({\mathbf{u}}_{p}(t,{\mathbf{x}})). \end{aligned}$$

The joint filter \({\mathbf{f}}\) is also implemented iteratively with the scale variable

$$\begin{aligned} \sigma _{i} = \sigma \sqrt{3}(2^{N_{ite}-i})/\sqrt{4^{N_{ite}}-1} \end{aligned}$$
(8)

where \(N_{ite}\) is the number of iterations (\(N_{ite}=3\) in our numerical experiments). The pseudocode in Algorithm 2 illustrates a separable implementation of our joint filter \({\mathbf{f}}\) based on the above domain transformations, where \(n_{p}\in {\mathbb N}\) is the number of pixels in the p-th coordinate basis (i.e., the image size of the p-th dimension). Its one-dimensional \(L^{1}\) Gaussian convolution on the non-uniform pixel coordinates (transformed domain) is also accurately performed by using our domain-splitting technique described in Sect. 4.
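
The following C++ sketch (with illustrative names) accumulates the transformed coordinates of Eq. (7) along one scan line using the forward differences above and evaluates the per-iteration scale \(\sigma _{i}\) of Eq. (8).

```cpp
#include <cmath>
#include <vector>

// Sketch of the discretized transformation of Eq. (7) along one scan line:
// cumulative arc length on the manifold (t, lambda*g), with the forward
// differences recommended in [11]. g[q][i] is channel q of the guidance at
// pixel i; identifiers are illustrative.
std::vector<double> transformRow(const std::vector<std::vector<double>>& g,
                                 double lambda) {
    const int n = static_cast<int>(g.front().size());
    std::vector<double> t(n);
    t[0] = 0.0; // t_1 = 0 (see Sect. 4)
    for (int i = 0; i + 1 < n; ++i) {
        double sum = 0.0;
        for (const auto& ch : g) {              // sum over color channels q
            const double d = ch[i + 1] - ch[i]; // forward difference
            sum += d * d;
        }
        t[i + 1] = t[i] + std::sqrt(1.0 + lambda * lambda * sum);
    }
    return t;
}

// Per-iteration scale sigma_i of Eq. (8) for i = 1..N_ite.
double sigmaIter(double sigma, int i, int N_ite) {
    return sigma * std::sqrt(3.0) * std::pow(2.0, N_ite - i)
         / std::sqrt(std::pow(4.0, N_ite) - 1.0);
}
```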

4 Domain-splitting Gaussian convolution

Using conventional fast methods such as the well-studied FFTs [7] and recursive filters [1, 6, 31] to perform Gaussian convolutions on a transformed domain is not trivial, because these methods have usually been designed only for uniformly spaced image pixels. See [13] for an excellent review of fast Gaussian convolution methods. In contrast, a domain-splitting technique [3, 34] approximates a discrete analogue of Gaussian convolution (known as a Gauss transform [14]) very accurately (without ringing artifacts) and quickly, even for a non-uniformly sampled variable of the Gaussian function. The key feature of this technique is decomposition of the integral domain of the Gaussian convolution into sub-domains centered at representative points on the domain, using the \(L^{1}\) norm as a metric instead of the popular \(L^{2}\) norm. We explain here the domain-splitting technique adapted to our framework in order to accurately compute the Gaussian convolutions of the initial smoothed image \({\mathbf{J}}^{0}\) in Eq. (1) and of the joint filter (defined in Eqs. (2) and (3)) on the transformed domain \(T_{p,\lambda }(x_{p},{\mathbf{x}},{\mathbf{g}})\).

For a given pixel \({\mathbf{x}}=\{x_{p}\}\) on \({\mathbb R}^{d}\), consider a set of n points \({\mathcal P}: \{t_{i}\}\) on \({\mathbb R}\) representing the transformed pixels of \({\mathbf{x}}\) with respect to its p-th coordinate basis and corresponding color \({\mathbf{h}}=\{h_{q}({\mathbf{x}})\}\in {\mathbb R}^{c}\) such that

$$\begin{aligned} t_{i}&\equiv T_{p,\lambda }(i,{\mathbf{x}},{\mathbf{h}}),\quad t_{1}\le t_{2} \le \cdots \le t_{n},\\ h(t_{i})&\equiv h_{q}({\mathbf{u}}_{p}(i,{\mathbf{x}})),\quad i = \{1,2,\ldots ,n\} \end{aligned}$$

where \(h(t_{i})\in {\mathbb R}\) is the q-th channel color at \(t_{i}\), \(n\in {\mathbb N}\) is the number of pixels, and the transformation \(T_{p,\lambda }(\cdot )\) and its straight line \({\mathbf{u}}_{p}\) are defined in Eqs. (7) and (6), respectively. The \(L^{1}\) Gauss transform \(f(\cdot )\in {\mathbb R}\) of \(h(\cdot )\) at \(t_{j}\in {\mathcal P}\) is given by

$$\begin{aligned} f(t_{j}) = \sum _{i=1}^{n} G_{\sigma }(t_{j}-t_{i})h(t_{i}) \end{aligned}$$
(9)

where naively computing \(f(t_{j})\) for all \(j=\{1,2,\ldots ,n\}\) requires quadratic computational complexity \(O(n^{2})\).

Basic Decomposition Concept: For a fixed point \(t_{1}\), splitting the domain of the \(L^{1}\) norm distance between \(t_{i}\) and \(t_{j}\) with respect to their order yields

$$\begin{aligned} |t_{i}-t_{j}| = \left\{ \begin{aligned} |t_{i}-t_{1}|-|t_{j}-t_{1}|&\quad \text {if}\,\,t_{1}\le t_{j}\le t_{i},\\ -|t_{i}-t_{1}|+|t_{j}-t_{1}|&\quad \text {if}\,\,t_{1}\le t_{i} < t_{j}. \end{aligned} \right. \end{aligned}$$

This process is illustrated conceptually in Fig. 7. Substituting the above equation into an \(L^{1}\) Gaussian \(G_{\sigma }(t_{i}-t_{j})\) leads to

$$\begin{aligned} G_{\sigma }(t_{i}-t_{j}) = \left\{ \begin{aligned} \frac{G_{\sigma }(t_{i}-t_{1})}{G_{\sigma }(t_{j}-t_{1})}&\quad \text {if}\,\,t_{1}\le t_{j}\le t_{i},\\ \frac{G_{\sigma }(t_{j}-t_{1})}{G_{\sigma }(t_{i}-t_{1})}&\quad \text {if}\,\,t_{1}\le t_{i} < t_{j} \end{aligned} \right. \end{aligned}$$

where the dependency between the two indices i and j inside the Gaussian is decoupled so that a function of two variables can be replaced by a combination of two one-variable functions. The \(L^{1}\) Gauss transform \(f(\cdot )\) consequently becomes

$$\begin{aligned} f(t_{j}) = h(t_{j}) + G_{\sigma }(t_{j}-t_{1})\,\xi (j-1) + \frac{\eta (j+1)}{G_{\sigma }(t_{j}-t_{1})}, \end{aligned}$$
(10)
$$\begin{aligned} \xi (j) = \sum _{i=1}^{j}\frac{h(t_{i})}{G_{\sigma }(t_{i}-t_{1})},\quad \eta (j) = \sum _{i=j}^{n}G_{\sigma }(t_{i}-t_{1})\,h(t_{i}) \end{aligned}$$

where \(\xi (0) \equiv 0 \equiv \eta (n+1)\). Note that Eq. (10) depends only on the index j, and \(\xi (\cdot )\) and \(\eta (\cdot )\) can be pre-computed (in linear time) for all \(j=\{1,2,\ldots ,n\}\) at once before calculating \(f(t_{j})\). Linear computational complexity O(3n) is hence sufficient to calculate \(f(t_{j})\) for all indices j via the decomposition (10). However, numerical instability makes it difficult to compute Eq. (10) accurately: for instance, \(\frac{1}{G_{\sigma }(t_{i}-t_{1})}\) may cause an overflow for very large values of \(|t_{i}-t_{1}|\).
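
For illustration, the following C++ sketch implements the decomposition (10) directly via the prefix sums \(\xi \) and \(\eta \) (identifiers are illustrative). It runs in linear time but inherits the overflow problem just mentioned, which the poles introduced below resolve.

```cpp
#include <cmath>
#include <vector>

// Single-anchor decomposition of Eq. (10): the L^1 Gauss transform via two
// prefix sums xi and eta. Linear time, but 1/G may overflow for large
// |t_i - t_1|; the poles of Eq. (14) fix exactly this.
std::vector<double> gaussTransformSingleAnchor(const std::vector<double>& t,
                                               const std::vector<double>& h,
                                               double sigma) {
    const int n = static_cast<int>(t.size());
    auto G = [&](double d) { return std::exp(-std::fabs(d) / sigma); };
    std::vector<double> xi(n + 1, 0.0), eta(n + 2, 0.0), f(n);
    for (int j = 1; j <= n; ++j)      // xi(j) = sum_{i<=j} h(t_i)/G(t_i - t_1)
        xi[j] = xi[j - 1] + h[j - 1] / G(t[j - 1] - t[0]);
    for (int j = n; j >= 1; --j)      // eta(j) = sum_{i>=j} G(t_i - t_1) h(t_i)
        eta[j] = eta[j + 1] + G(t[j - 1] - t[0]) * h[j - 1];
    for (int j = 1; j <= n; ++j)      // Eq. (10)
        f[j - 1] = h[j - 1] + G(t[j - 1] - t[0]) * xi[j - 1]
                 + eta[j + 1] / G(t[j - 1] - t[0]);
    return f;
}
```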

Fig. 7

A basic domain decomposition concept that enables measurement of the distance between \(t_{i}\) and \(t_{j}\) by subtraction of two distances via an anchor coordinate \(t_{1}\)

Accurate Approximation: To avoid the above-mentioned numerical problem, consider a set of m representative poles \(\{\alpha _{k}\}\) on \({\mathbb R}\) instead of the single fixed point \(t_{1}\). Assuming that \(\alpha _{1}<\alpha _{2}<\cdots <\alpha _{m}\), the domain splitting of \(|t_{i}-t_{j}|\) around the pole \(\alpha _{k}\) is given by

$$\begin{aligned} |t_{i}-t_{j}| = \left\{ \begin{aligned} |t_{i}-\alpha _{k}|-|t_{j}-\alpha _{k}|&\quad \text {if}\,\,t_{j}\in \Omega _{i,k}^{1},\\ -|t_{i}-\alpha _{k}|+|t_{j}-\alpha _{k}|&\quad \text {if}\,\,t_{j}\in \Omega _{i,k}^{2},\\ |t_{i}-\alpha _{k}|+|t_{j}-\alpha _{k}|&\quad \text {if}\,\,t_{j} \in \Omega _{i,k}^{3} \end{aligned} \right. \end{aligned}$$

where the domains \(\Omega _{i,k}^{1}\), \(\Omega _{i,k}^{2}\), and \(\Omega _{i,k}^{3}\) are defined by

$$\begin{aligned} \Omega _{i,k}^{1}&= \{z\in {\mathcal P}: \alpha _{k}\le z\le t_{i}\,\,\text {or}\,\,t_{i}\le z\le \alpha _{k}\},\\ \Omega _{i,k}^{2}&= \{z\in {\mathcal P}: \alpha _{k}\le t_{i}< z\,\,\text {or}\,\,z< t_{i}\le \alpha _{k}\},\\ \Omega _{i,k}^{3}&= \{z\in {\mathcal P}: z< \alpha _{k}\le t_{i}\,\,\text {or}\,\,t_{i}\le \alpha _{k} < z\} \end{aligned}$$

as shown in Fig. 8. This domain splitting with the poles \(\{\alpha _{k}\}\) thus makes the Gauss transform \(f(\cdot )\) in Eq. (9) become

$$\begin{aligned} f(t_{j}) = h(t_{j})+C_{j}+D_{j}+E_{j}, \end{aligned}$$
(11)
$$\begin{aligned} C_{j} = G_{(\sigma ,j,\gamma (j))}\sum _{i=\gamma _{2}(\gamma (j))}^{j-1}\frac{h(t_{i})}{G_{(\sigma ,i,\gamma (j))}} +\frac{1}{G_{(\sigma ,j,\gamma (j))}}\sum _{i=j+1}^{\gamma _{2}(\gamma (j)+1)-1}G_{(\sigma ,i,\gamma (j))}h(t_{i}), \end{aligned}$$
(12)
$$\begin{aligned} D_{j} = \sum _{k=1}^{\gamma (j)-1}G_{(\sigma ,j,k)}A_{k},\quad E_{j} = \sum _{k=\gamma (j)+1}^{m}G_{(\sigma ,j,k)}B_{k}, \end{aligned}$$
$$\begin{aligned} A_{k} = \sum _{i=\gamma _{2}(k)}^{\gamma _{2}(k+1)-1}\frac{h(t_{i})}{G_{(\sigma ,i,k)}},\quad B_{k} = \sum _{i=\gamma _{2}(k)}^{\gamma _{2}(k+1)-1}G_{(\sigma ,i,k)}h(t_{i}) \end{aligned}$$
(13)

where \(G_{(\sigma ,l,k)} \equiv G_{\sigma }(t_{l}-\alpha _{k})\), \(\gamma (j)\) is the pole index k satisfying \(\alpha _{k}\le t_{j} <\alpha _{k+1}\), and \(\gamma _{2}(k)\) is the smallest index j satisfying \(\alpha _{k}\le t_{j}\), as illustrated in Fig. 9.

Fig. 8

Division of the domains \( \Omega _{i,k}^{1}\), \( \Omega _{i,k}^{2}\), and \( \Omega _{i,k}^{3}\) via the poles \(\{\alpha _{k}\}\)

Fig. 9

Indexing functions \(\gamma (\cdot )\) and \(\gamma _{2}(\cdot )\)

For a given upper bound \(\delta \) on the computational precision, the inequality \(\exp (\frac{|\alpha _{k+1}-\alpha _{k}|}{\sigma }) < \delta \) must hold to avoid the above-mentioned numerical instability. Satisfying this condition leads to a stable choice of distances between poles via the relationship \(|\alpha _{k+1}-\alpha _{k}| = \varphi \sigma \log (\delta )\), where \(\varphi \in (0,1)\) is a parameter. In our framework, we use \(\varphi = 0.5\) and double floating-point precision for \(\delta \) (i.e., the 64-bit DBL_MAX of the C++ programming language). Since the range of \({\mathcal P}\) is defined by \(w = t_{n}-t_{1}>0\), the number of poles and their coordinates are automatically determined by

$$\begin{aligned} \{\alpha _{k}\} = t_{1}+\{0,1,\ldots ,m-1\}\frac{w}{m},\quad m = [\frac{w}{\varphi \sigma \log (\delta )}] \end{aligned}$$
(14)

where \([\cdot ]\) is the ceiling function and \(m \ll n\) in general cases. Although \(t_{1}\) is always equal to zero in our framework, shifting the first pole (\(\alpha _{1} = t_{1}\)), as in Eq. (14), guarantees that Eq. (11) is numerically stable because of how the poles \(\{\alpha _{k}\}\) are chosen (see Fig. 10).
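
To see why this choice is safe, note that

$$\begin{aligned} \exp \left( \frac{|\alpha _{k+1}-\alpha _{k}|}{\sigma }\right) \le \exp \left( \varphi \log (\delta )\right) = \delta ^{\varphi } < \delta \quad \text {for}\,\,\varphi \in (0,1), \end{aligned}$$

so every factor \(1/G_{(\sigma ,j,\gamma (j))}\) arising in Eqs. (12) and (13) stays below \(\delta ^{\varphi } = \sqrt{\delta }\) when \(\varphi = 0.5\), safely within the representable range.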

Fig. 10

Stable domain splitting by \(\{\alpha _{k}\}\), where the indices j and k are chosen such that \(|\alpha _{k+1}-\alpha _{k}| \ge |t_{j}-\alpha _{\gamma (j)}| \ge |t_{\gamma _{2}(\cdot )}-\alpha _{\gamma (\cdot )}|\) in Eqs. (12) to (13). It implies \(\delta > \exp (\frac{|t_{j}-\alpha _{k}|}{\sigma })\) in this figure

If \(|\alpha _{k}-t_{j}| > \sigma \log (\delta )\), then \(G_{\sigma }(\alpha _{k}-t_{j})\) becomes numerically zero; that is, the contribution of poles \(\alpha _{k}\) located far from \(t_{j}\) vanishes. Therefore, \(D_{j}\) and \(E_{j}\) are approximated by

$$\begin{aligned} D_{j} \approx G_{(\sigma ,j,\gamma (j)-1)}A_{\gamma (j)-1},\quad E_{j} \approx G_{(\sigma ,j,\gamma (j)+1)}B_{\gamma (j)+1} \end{aligned}$$
(15)

where \(D_{j} \approx 0\) if \(\gamma (j) < 2\), and \(E_{j} \approx 0\) if \(\gamma (j) > m-1\). The approximation error at \(t_{j}\) is given by

$$\begin{aligned} \sum _{k=1}^{\gamma (j)-2}G_{\sigma }(t_{j}-\alpha _{k})A_{k}+\sum _{k=\gamma (j)+2}^{m}G_{\sigma }(t_{j}-\alpha _{k})B_{k}. \end{aligned}$$

The above approximation of \(f(t_{j})\) thus possesses accuracy equivalent to truncating the sum outside a \(3\varphi \sigma \log (\delta )\) region. Such a truncation is very accurate and equivalent to naive truncation with a radius of about \(525\sigma \) for \(\varphi =0.5\) and \(\log ({\mathrm{{DBL}}\_\mathrm{{MAX}}})\approx 700\). Moreover, this approximation algorithm requires only \(O(4n+2m+\max (n,m))\) operations (\(O(1/[\sigma ])\) with respect to \(\sigma \)).
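
Putting Eqs. (11)-(15) together, the following self-contained C++ sketch computes the approximate \(L^{1}\) Gauss transform in linear time. Identifiers are illustrative, and bucket-local prefix and suffix sums stand in for the reuse of \(A_{k}\) and \(B_{k}\) described below.

```cpp
#include <algorithm>
#include <cfloat>
#include <cmath>
#include <vector>

// Sketch of the stable domain-splitting L^1 Gauss transform of Eqs. (11)-(15).
// Assumes t is sorted with w = t.back() - t.front() > 0, as in the text.
std::vector<double> gaussTransformSplit(const std::vector<double>& t,
                                        const std::vector<double>& h,
                                        double sigma, double varphi = 0.5) {
    const int n = static_cast<int>(t.size());
    const double w = t[n - 1] - t[0];
    const int m = std::max(
        1, (int)std::ceil(w / (varphi * sigma * std::log(DBL_MAX)))); // Eq. (14)
    const double step = w / m;
    auto alpha = [&](int k) { return t[0] + (k - 1) * step; }; // poles, k = 1..m
    auto G = [&](double d) { return std::exp(-std::fabs(d) / sigma); };

    std::vector<int> gamma(n);                        // gamma(j): bucket of t_j
    std::vector<double> A(m + 2, 0.0), B(m + 2, 0.0); // Eq. (13)
    for (int j = 0; j < n; ++j) {
        gamma[j] = std::min(m, 1 + (int)((t[j] - t[0]) / step));
        A[gamma[j]] += h[j] / G(t[j] - alpha(gamma[j]));
        B[gamma[j]] += G(t[j] - alpha(gamma[j])) * h[j];
    }

    // Bucket-local prefix sums P and suffix sums S give C_j of Eq. (12) in
    // O(1) per point (the same partial sums that build A_k and B_k).
    std::vector<double> P(n), S(n), f(n);
    for (int j = 0; j < n; ++j)
        P[j] = h[j] / G(t[j] - alpha(gamma[j]))
             + ((j > 0 && gamma[j - 1] == gamma[j]) ? P[j - 1] : 0.0);
    for (int j = n - 1; j >= 0; --j)
        S[j] = G(t[j] - alpha(gamma[j])) * h[j]
             + ((j + 1 < n && gamma[j + 1] == gamma[j]) ? S[j + 1] : 0.0);

    for (int j = 0; j < n; ++j) {
        const int k = gamma[j];
        const double g = G(t[j] - alpha(k));
        double fj = h[j];                                        // Eq. (11)
        if (j > 0 && gamma[j - 1] == k)     fj += g * P[j - 1];  // C_j, left
        if (j + 1 < n && gamma[j + 1] == k) fj += S[j + 1] / g;  // C_j, right
        if (k >= 2)     fj += G(t[j] - alpha(k - 1)) * A[k - 1]; // D_j, Eq. (15)
        if (k <= m - 1) fj += G(t[j] - alpha(k + 1)) * B[k + 1]; // E_j, Eq. (15)
        f[j] = fj;
    }
    return f;
}
```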

[Algorithm 3 (DSInit): initialization of the poles and coefficients]

Efficient Implementation: The above domain-splitting technique is efficiently implemented in our framework by the initialization (DSInit) and integration (DSSum) procedures described in the pseudocodes of Algorithms 3 and 4, respectively. DSInit\((\cdot )\) computes the poles \(\{\alpha _{k}\}\) in Eq. (14) and some of the coefficients \(C_{j}\), \(D_{j}\), and \(E_{j}\) in Eqs. (12) and (15). DSSum\((\cdot )\) then calculates the \(L^{1}\) Gauss transform \(\{f(t_{j})\}\) in Eq. (11) for the given colors \(\{h(t_{j})\}\) by using the coefficients, Coef, determined by DSInit\((\cdot )\). Since Coef depends only on the (transformed) pixel coordinates \({\mathcal P}: \{t_{j}\}\) and the scale \(\sigma \), it can be reused for different color channels, as in Algorithm 5 (GaussTransform in the joint filter: Algorithm 2), as well as for uniform pixel coordinates, as in Algorithm 6 (GaussianFilter in the scale-aware filter: Algorithm 1, i.e., \({\mathbf{J}}^{0}\) in Eq. (1)). Note that the terms \(\sum _{i} h(t_{i})/G_{\sigma }(t_{i}-\alpha _{\cdot })\) and \(\sum _{i} G_{\sigma }(t_{i}-\alpha _{\cdot })h(t_{i})\) appear in the equations for \(A_{k}\), \(B_{k}\), and also \(C_{j}\). The results of computing \(A_{k}\) and \(B_{k}\) are therefore reused to obtain \(C_{j}\) in DSSum\((\cdot )\) (see Algorithm 4). We also employed a fast library [22] for the exponential function \(\exp (\cdot )\) in DSInit\((\cdot )\). Ignoring substitutions and indexing, the complexities and costs of DSInit\((\cdot )\) and DSSum\((\cdot )\) are listed in Table 1.
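
The reuse pattern can be summarized by the following interface sketch, where all types and signatures are illustrative rather than those of Algorithms 3-6:

```cpp
#include <vector>

// Illustrative interface for the DSInit/DSSum split: Coef depends only on
// the coordinates {t_j} and sigma, so one Coef serves every color channel.
struct Coef {
    std::vector<double> alpha; // poles of Eq. (14)
    std::vector<int> gamma;    // bucket index gamma(j) per point
    std::vector<double> gj;    // G(sigma, j, gamma(j)) per point
};
Coef DSInit(const std::vector<double>& t, double sigma);   // cf. Algorithm 3
std::vector<double> DSSum(const Coef& coef,
                          const std::vector<double>& h);   // cf. Algorithm 4

void gaussTransformAllChannels(const std::vector<double>& t, double sigma,
                               std::vector<std::vector<double>>& channels) {
    const Coef coef = DSInit(t, sigma); // once per scan line and scale
    for (auto& h : channels)
        h = DSSum(coef, h);             // reused for each channel q
}
```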

[Algorithm 4 (DSSum): \(L^{1}\) Gauss transform using Coef]
[Algorithm 5 (GaussTransform): convolution on the transformed domain]
[Algorithm 6 (GaussianFilter): convolution on uniform pixel coordinates]
Table 1 Complexities and costs: numbers of multiplications (Mult.), additions (Add.), and exponential functions (\(\exp (\cdot )\)) per pixel

5 Numerical experiments

All numerical experiments in this paper were performed on a PC with an i9-10980XE CPU (3.0 GHz, 36 cores; no parallelization was used), 128 GB RAM, and a 64-bit OS with the GNU C++ 9.3 compiler.

We first examined the accuracy and computational speed of our domain-splitting approximation (Our: uniform and Our NU: non-uniform pixels) described in Sect. 4 (Algorithms 6 and 5), comparing them numerically with those of popular box and recursive filters: the Box (moving average) [26], EBox (extended box) [15], SII (stacked integral image) [4, 9], Deriche [6], VYV [31], and AM [1] filters. Their implementations and boundary conditions were based on the open library of [13] with \(10^{-15}\) tolerance (4th-order recursion). Table 2 shows the peak signal-to-noise ratio (PSNR) [24, 34] and the maximum error \(E_{\max }\) [34] of one-dimensional Gaussian filtering (\(d=1\) for \({\mathbf{J}}^{0}\) in Eq. (1) and for the convolutions in the joint filter) for relatively small \(\sigma \) (0.05% to 1% of the image size \(w=n\)), where the exact Gauss transform (Eq. (9)) and an FIR (Finite Impulse Response) filter [13] were employed as references for the approximations. Our approximations were slightly slower than the conventional methods but significantly more accurate (by a factor of about \(10^{10}\)).
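
For reference, the following C++ helpers sketch the two error metrics in their standard forms (assuming signals in \([0,1]\), so \(\mathrm{MAX}=1\) in the PSNR); the exact conventions of [24, 34] may differ slightly.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Standard-form sketches of the metrics reported in Table 2:
// PSNR = 10 log10(MAX^2 / MSE) with MAX = 1, and maximum error E_max.
double psnr(const std::vector<double>& a, const std::vector<double>& b) {
    double mse = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        mse += d * d;
    }
    mse /= static_cast<double>(a.size());
    return 10.0 * std::log10(1.0 / mse);
}

double maxError(const std::vector<double>& a, const std::vector<double>& b) {
    double e = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        e = std::max(e, std::fabs(a[i] - b[i]));
    return e;
}
```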

Table 2 Timing and accuracy comparisons of \({\mathbf{J}}^{0}\) where \(d=1\), \(n=10^{4}\), \(h(\cdot )\in [0,1]\), averaged PSNR of 20 \(\sigma \in \{5,10,\ldots ,100\}\), and 100 datasets randomly generated for each \(\sigma \) were employed. Time shows averaged computational time (milliseconds) for \(n\in 10^{\{4,5,6\}}\)
Fig. 11

1D Gaussian filtering \({\mathbf{J}}^{0}\) comparisons for \(w=n=10^{4}\) pixels with \(\sigma \in \{10^{2},10^{3}\}\), where differences between the approximations and their correct results are shown. Top-left: input and the correct results with \(\sigma =10^{2}\) based on FIR and exact \(L^{1}\) Gauss transform. Top-right: A boundary artifact of the Deriche [6] with \(\sigma =10^{3}\)

Figure 11 demonstrates typical error profiles under the parameter settings of Table 2, where ripple-shaped errors, which cause undesired artifacts and phantom edges, are observed for all conventional methods. At first glance, the VYV and Deriche filters look preferable for computer graphics applications because the magnitudes of their errors are small, as shown in Table 2 and Fig. 11. However, applying these methods stably and accurately over variable parameters is not trivial. During our experiments, we easily encountered artifacts that could not be ignored or that amounted to complete failure. Figure 11 (top-right) shows a boundary artifact caused by the Deriche filter for large \(\sigma \) (10% of the image size, which is not unrealistically large). This artifact may result from one of the causes discussed in [5, 13, 30]. Moreover, the errors of these methods, including the AM and EBox filters, behave nonlinearly with respect to not only \(\sigma \) but also the image size w (and hence the number of pixels n), because their formulations (e.g., coefficient optimization) rely on these parameters. Despite the fact that both \(\sigma \) and n were linearly scaled from the case in Fig. 11, the VYV and Deriche filters generated results with large errors, as shown in Fig. 12. In contrast to these conventional methods, our theoretical formulations guarantee the high stability and accuracy of our approximations, as confirmed numerically by our experiments. Since the quality of both \({\mathbf{J}}^{0}\) and the convolutions used in the joint filter is important, the approximation described in Sect. 4 is well suited to scale-aware filters.

Fig. 12

1D Gaussian filtering \({\mathbf{J}}^{0}\) comparisons with artifacts where \(n=10^{5}\) and \(\sigma \): \(10^{4}\) (left) and \(2\times 10^{4}\) (right). The PSNRs of the Deriche, VYV, Our, and Our NU in the left (right) image are equal to 24.7 (6.6), 9.2 (10.3), 280 (280), and 278 (280), respectively

Fig. 13

Input images with their numbers of pixels (w: width and h: height) and numbers of figures employed

Table 3 Timings (seconds) by box and our framework, where s represents the iteration number in Eq. (1). The image sizes and figure numbers (\(\phi \) and \(\sigma \) are described in the captions) are listed in Fig. 13
Fig. 14

Timings versus iterations (left: three different sizes) and versus numbers of pixels (right), where scaled Lena images were employed and each point represents an average of 10 iterations for each of 10 scales (\(\sigma \in \{5,10,15,\ldots ,50\}\)). Note that the SiR graphs are omitted because their profiles are visually similar to the RG graphs

Fig. 15

Quality comparison of image of face for the RG, SiR, and AG filters with the box kernel (\(\sqrt{3}\sigma \) radius was used) and our framework (\(L^{1}\) Gaussian), where \(s=20\), \(\phi =1,\,\sigma =6\) (RG and AG) and \(\phi =0.5,\,\sigma =3\) (SiR). Bottom images correspond to the square root of the magnitude of the gradient (\(||\nabla {\mathbf{J}}^{20}||^{1/2}\)) of the eye and mouth parts of the upper images, where some artifacts appeared in the box-based results

We next evaluated the quality and computational speed of our scale-aware filters. The input images are listed in Fig. 13, and the timings are shown in Table 3. Figures 3, 4, and 18 demonstrate our scale-aware filtering results with varying scales (\(\sigma \)); the scale-space behavior and salient edge features are clearly apparent. Our filters did not produce the undesired artifacts generated by box-based convolutions (moving average [26], where a \(\sqrt{3}\sigma \) radius was employed to obtain visually similar results), as shown in Figs. 1, 15, and 16. Figure 14 shows timing comparisons for various numbers of iterations s and image sizes, which are summarized in Table 4.

Fig. 16

Quality comparison of image of glass for the RG, SiR, and AG filters with the box kernel (\(\sqrt{3}\sigma \) radius was used) and our framework (\(L^{1}\) Gaussian), where \(\phi =1.5\), \(\sigma =8\), and \(s=20\). Bottom images correspond to the same square root of the magnitude of the gradient (\(||\nabla {\mathbf{J}}^{20}||^{1/2}\)) as the bottom images in Fig. 15

Table 4 Speed comparison of box [26] and our scale-aware filters (megapixels per second), where s is the iteration number in Eq. (1)
Fig. 17

Iteration effects of image of snack via our framework with \(\phi =1.5\), \(\sigma =16\), and \(s\in \{4,20\}\)

Fig. 18

Various scales of image of ham via our framework with the AG filter where \(\phi =0.5\), \(s=20\), and \(\sigma \in \{4,8,16,32\}\)

Our filters achieved linear computational speeds (slightly slower than box kernel convolutions), and their convergence properties were similar for the RG, SiR, and AG filters, as shown in Fig. 5. See Fig. 17 for the filtering effects of the iteration counts \(s\in \{4,20\}\) recommended by [36].

6 Conclusion

We have proposed a fast and accurate computational framework for scale-aware image filters. Our new framework is based on accurately approximating \(L^{1}\) Gaussian convolution with respect to a transformed pixel domain representing geodesic distance on a guidance image manifold in order to recover salient edges in a manner faithful to scale-space theory while removing small image structures. Our framework achieved linear computational complexity with high-quality filtering results. We compared our framework numerically in terms of speed, precision, and visual quality with popular conventional methods.

Since our framework is robustly applicable to HDR images over a wide range of scales \(\sigma \), applications to computational photography, engineering, and science are promising future work. The limitations of our framework relate to its domain transformations and scale-aware filters, such as slow convergence in elongated image regions (Fig. 17) and intensity reduction (SiR, Fig. 4). Combining our framework with a guided filter [17] may reduce such artifacts. Future work will also include numerical comparisons with the non-uniform recursive method [12] (an extension of Deriche [6]).