High-performance fractional anisotropic diffusion filter for portable applications

Anisotropic diffusion is one of the most effective methods used in image processing. It can be used to eliminate the small textures of an image while preserving its significant edges. In this paper, a new anisotropic diffusion filter is proposed based on a fractional calculus kernel rather than integer kernel to improve the overall performance of the filter. Integer and fractional anisotropic filters are implemented using the Genesys-2 FPGA kit to utilize the efficiency of parallelism in FPGAs. Integer and fractional anisotropic filters are tested against the achievable PSNR value vs the number of iterations. The proposed fractional anisotropic filter has a better PSNR value using a smaller number of iterations, reducing the power and area compared to integer anisotropic filter. The proposed filter can be used in image smoothing, edge detection, image segmentation, image denoising, and cartooning. In addition, the proposed filter reduces the power consumption by 58.2% compared to integer-order filters, which makes the proposed filter suitable for battery-based applications.


Introduction
Anisotropic diffusion filters are an essential technique exploited in many image processing applications.The main feature of an anisotropic filter (AF) is that it can remove less important textures in the image while preserving its structural edges.The anisotropic filter smoothing is adaptive according to the nature of the image features.The diffusion operator decreases the smoothing intensity if it detects an edge, while adaptively increasing the intensity of the smoothing otherwise.The anisotropic filter iterates over the image multiple times, removing more texture per iteration.Anisotropic diffusion filters are utilized in many different image processing applications such as image denoising, image smoothing, image cartooning, and image segmentation [1].
As for image denoising, a combination of convolutional neural network (CNN) and anisotropic diffusion filter is proposed for a multilevel denoising system in [2].Authors in [3] proposed a new total variation (TV) anisotropic diffusion system by coupling the locally weighted matrix with the gradient operator to denoise images.By exploiting iterative nonconservative edge-based force fields, authors in [4] used a nonconservative shock filter (NCSF) to produce images with strong discontinuities that can be coupled with anisotropic filters to denoise images.A PDE adaptive exponent-driven regularization is used to denoising images while using anisotropic diffusion and a convolutional neural network (CNN) [5].A new class of adaptive Perona-Malik (PM) diffusion, which combines the PM equation with the heat equation, is proposed in [6].It uses the edge indicator 98 Page 2 of 12 as the variable exponent to control the diffusion model.In contrast to the PM model, the new model does not create new image features.A fourth-order anisotropic diffusion filter is proposed in [7] to denoise images based on the Hessian matrix.This method overcomes the staircase effect caused by second-order diffusion filters while also solving the problem of over-smoothing caused by other fourth-order diffusion filters.A novel weighted anisotropic total variational (WATV) model was proposed in [8] that can be used for image denoising and deblurring.Authors in [9] proposed a novel anisotropic diffusion speckle reduction filter.They used a probability density function of edges pixels along with other information to control the diffusion process.Anisotropic diffusion also has been used in image restoration.For example, a fourth-order anisotropic diffusion PDE model with a multi-well potential function was proposed for grayscale image inpainting in [10].In image abstraction, anisotropic diffusion with adaptive Kuwahara filtering (ADKF) presented in [11] can be used to abstract and segment images as it does not suffer from over-smoothing.Author in [12] proposed a scene simplification technique for optogenetic retinal prosthesis based on anisotropic filter.An eigenvalue anisotropic diffusion-based operator for image segmentation is proposed in [13].Authors in [2] proposed a hybrid digital image denoising approach using anisotropic diffusion and a convolutional neural network (CNN).The noisy images are fed to (CNN) for noise reduction, followed by the application of an anisotropic filter to preserve their edges.
Fractional calculus has been widely exploited in many systems due to its dynamicity.In fractional calculus, an infinite number of fractional orders can describe a system instead of limited integer orders, making it better to describe most of systems.Fractional calculus integration and derivation operators can be implemented by multiple methods, including direct implementation based on classic and derived definitions [14,15].Another set of implementation methods can be used to implement fractional operators based on approximations.For example, the frequency response of FIR and IIR filters can be approximated to the frequency response of the fractional operator.These methods can be used to implement the fractional-order operator needed for different systems, such as image processing [16][17][18][19][20].
Fractional anisotropic filter studies aimed to exploit fractional-order operators to replace the integer-order operators in the non-linear partial differential equations (PDEs).For example, authors in [21] proposed a new class of fractional-order anisotropic equations used for image denoising.The new class equations are based on Euler-Lagrange equations of a cost function.These equations can be considered as a natural interpolation between anisotropic diffusion equations of fourth-order [22], and Perona-Malik equations [23].
Authors in [24] proposed a PDE scheme for contour detection in medical images based on a fractional-order regularization term to make it more robust against noise.Authors in [25] proposed an ultrasound speckle filtering method for preserving edges and removing complex speckle noises in ultrasound images.In [26], a generalized fractional derivative (G-derivative)-based anisotropic diffusion equations are proposed for image denoising.A fractional generalization of diffusion model is proposed in [27].The proposed generalization is based on a conformable-based fractional derivative operator.In [28,29], a GL-based fractional anisotropic filter is proposed to smooth image texture while preserving its edges.Authors in [25] proposed a fractional anisotropic speckle filtering method of tissue-dependent speckle noises in ultrasound images.
This paper proposes a fractional calculus-based anisotropic diffusion filter and compares it with the traditional Sobelbased anisotropic filter.Caputo fractional definition is used to configure a gradient operator kernel instead of the Sobel kernel.An optimization algorithm is used to optimize the anisotropic filter for its optimum.Fractional and integer anisotropic filters are designed and implemented on the Genesys-2 FGPA kit to demonstrate the performance, power consumption, and area consumed using the two filters.This paper is organized as follows, Sect. 2 discusses the anisotropic diffusion filters.Section 3 discusses the fractional kernel used as a gradient operator for the anisotropic filter.Section 4 discusses the performance comparison between Sobel and Caputo-based gradient operators.Section 5 presents the optimization of the anisotropic filter.Section 6 presents the hardware implementation of the fractional anisotropic filter.Section 7 discusses the hardware results.Finally, Sect.8 concludes the work.

Anisotropic diffusion filters
Anisotropic diffusion filters are non-linear adaptive smoothing filters based on the properties of the image.The anisotropic filters iterate over the image to remove small unwanted textures, while preserving the main details and edges of the image.Anisotropic diffusion can also remove noises based on the same approach.Perona and Malik designed a non-linear anisotropic diffusion filter called PM in [23].In this filter, each smoothing filter is adaptive based on the detected local properties of the image.The filter decreases the smoothing level in case of an edge is detected.Otherwise, it increases the smoothing level.The model proposed by [23] is defined by: where I t is the image at a t time stamp, div is the divergence operator and ∇I is the gradient of the image.This model (1) is based on the heat diffusion model.If g is a constant, the equation is reduced to the isotropic heat diffusion equation I t , = g∇I .In case the diffusion operator g depends on ∇I(x, y, t) at different iterations, the diffusion filter is said to be non-linear as it changes in every iteration depending on the edges of the new iteration input.On the other hand, if the diffusion operator g does not change over different iterations, then the process is linear.The diffusion operator used in this paper is a non-linear operator that is defined by [23]: where ∇I H , ∇I V are the gradient of the image in the vertical and horizontal directions.The equivalent equation of (1) in discrete time is defined as: A block diagram describing the anisotropic filtering process based on (3) is shown in Fig. 1.Any gradient operators can be used to differentiate the image.This paper uses the Sobel operator as a study case for integer gradient operators described in Fig. 2.

Fractional gradient operator
Different fractional gradient operators were proposed based on different definitions [18,30].This paper uses a fractional gradient operator based on the Caputo fractional definition proposed in [31].The Caputo gradient operator is defined by: (2)

Anisotropic filter design
Compared to the Sobel kernel, the Caputo kernel has smaller coefficients which requires more iterations to compromise small coefficients.Thus, the Caputo filter is scaled with a scaling factor to solve this issue.The effect of scaling the kernels by a factor is studied using different scaling factors and measuring their performance in denoising images using the PSNR metric.An analysis of relationship between PSNR, scale, and iterations is performed using a gray 8-bit pixel Lena image 256 × 256 is shown in Fig. 3.The frac- tional order used for Caputo is 0.75.The original image is added to a Gaussian noise with a variance of 20.
Figure 3 shows that increasing the scaling factor increases the PSNR, similar to increasing the iteration number.For example, scaling fractional anisotropic by two and iterating it five times has the same PSNR value as the case of scaling it with one and iterating it nine times.On the other hand, increasing the scaling factor while increasing the number of iterations decreases the PSNR value.That is due to oversmoothing the image with an excessive number of iterations.Figure 3 also shows the maximum PSNR achieved by Sobel and Caputo anisotropic filters.Figure 3a shows that using a Sobel-based anisotropic filter, the maximum PSNR that can be achieved is 24.53 using 11 iterations.On the other hand, Fig. 3b shows that the Caputo anisotropic filter can achieve a PSNR of 24.99 dB using only three iterations.The maximum PSNR achieved by Caputo is 25.44 dB using 10 iterations.
To design a fractional-order anisotropic filter, an appropriate fractional order must be chosen based on certain criteria.Figure 4 and Table 1 show the maximum output PSNR of a denoised image using Caputo fractional anisotropic filter with different fractional orders.Scaling factors ranging from [1 : 200] are used to scale the fractional kernels then the maximum PSNR achieved by this kernel is plotted.The original image is added to Gaussian noise with a variance of 400.Based on Table 1, a fractional order of 0.88 has the highest PSNR of 27 dB using eight iterations compared to other fractional orders.To furtherly improve the output results of denoising, an optimization is done over different images to optimize and generalize the anisotropic filtering factors to its best.

Optimizing fractional anisotropic filter
According to (3), multiple factors can affect the performance of the fractional anisotropic filter.These factors are number of iterations N, fractional order , step size Δt an scaling factor.To optimize these factors, a metaheuristic optimization algorithm is used to find the best of these factors that achieve the best PSNR value.The fitness function of this optimization is fitness = PSNR.
Flower Pollination Algorithm (FPA) [32] is the optimization used for this problem due to its high accuracy.Tables 2, 3, 4, 5, 6, and 7 show the optimization results  of Caputo anisotropic filter using different iteration numbers and different 512 × 512 images.The tables show that increasing the iteration number increases the optimum achieved PSNR and SSIM to their maximum.After five iterations, the increase in the PSNR and SSIM dramatically decreases.Table 8 shows the best chosen parameters to achieve an optimum PSNR and SSIM values for any

Hardware implementation
FPGA platform is considered one of the most effective prototyping platforms for its parallelism capabilities, making it a better image processing platform than serial processors.For example, the author in [33] segmented the input image and processed all the segments simultaneously, which achieved better throughput that CPUs can not achieve.Taking advantage of the FPGA parallelism, the proposed architecture is pipelined to have better throughput.The system iterations are evaluated at the same time, which will lead to better throughput compared to evaluating each iteration successively.

FPGA implementation of fractional kernel
The implementation of the anisotropic filter requires the implementation of the diffusion operator first.Sobel and Caputo kernels are implemented separately to study the required resources to implement them.Table 9 shows both kernels' resources usage, power consumption, and maximum operating frequency.The table shows that Caputo kernel has higher power consumption and more delay than the Sobel kernel.In addition, Caputo kernel requires more DSP units to implement.This higher amount of resources is due to the complexity of the Caputo kernel, as it requires a fixed-point representation and fixed-point mathematical operations due to the fractional coefficients.

Anisotropic filters FPGA implementation
This section inspects the hardware implementation of anisotropic filters and their results.Figure 5 shows the hardware structure of the proposed anisotropic filter with different iterations.The hardware implementation uses the pixel buffer method proposed in [34].

Pixel buffer
The pixel buffer method proposes a delay line shift register with a size of W * (K s − 1) + K s .The delay register receives the pixel stream and stores it to limit the filter kernel from reading the same pixel value multiple times.Figure 6a shows the buffered image in the linear buffer.The green squares represent the pixels that are multiplied by the kernel.The gray squares represent the pixels that will be used for the kernel multiplication as the pixel buffer shifts.Figure 6b shows the block diagram of the pixel buffer.The pixel buffer is implemented as a shift register with multiple outputs at distinct indexes colored in green.
Figure 7 shows the different states of pixel buffer.Figure 7a shows the pixel buffer at its first stage at the top of the image.The pixels are shifted in the pixel buffer until register at index W × ⌊K s ∕2⌋ + ⌈K s ∕2⌉ is loaded with the first pixel of the image.When this condition is satisfied, the second stage in Fig. 7b begins.In this stage, the pixel buffer passes its outputs to be multiplied by the kernel.The side neighbor pixels are wrapped around, while the top pixel neighbors are assigned to zero (marked in gray).Figure 7c is the stage where all the shift register is loaded with the image pixels.There are no neighbor pixels that are needed to be assigned to zero.The left and right neighbor pixels are only wrapped in case the kernel is at the image edges.Figure 7d shows the final stage of the pixel buffer, where the buffer finishes shifting the image pixels and starts to assign empty registers to zero to finish the convolution process of the last pixels in the image.This stage starts when the last image pixel is loaded to the pixel buffer and continues till the last pixel of the image is at the center of the kernel.

Kernel convolution
Figure 8 shows the convolution of the pixel buffer output pixels.First, each pixel is multiplied by the corresponding kernel coefficient.Then, multiplication outputs are summed together to construct the output pixel in its corresponding index.

G function
The implementation of the anisotropic filter diffusion operator is shown in Fig. 9.The main struggle in the diffusion operator realization is the implementation of a hardware square root.To implement fixed-point square root, an approach similar to the long division approach is used.The used approach is summarized as: 1. Set n as the number of radicand x bits divided by 2. 2. Select the most significant two bits of x and shift it to the left by 2 to a.
(a) (b) 3. Subtract the current result q appended with (2'b01) form a and assign the result to t: t = a − [q, 01]. 4. Check the subtraction result (t).If t ≥ 0 , then a = t and q is appended with 1, else append q with 0. 5. Subtract n by 1. 6.If n = 0 exit, else return to step 2.
The remaining blocks are implemented directly, considering the datapath width in different iterations.In the first iteration, the first buffer has an 8-bit width datapath, representing each pixel by eight bits.All the remaining blocks have a 32-datapath width with a different representation of the fixed-point q-notation.

Results
According to Sect. 5, Fig. 10 shows the denoising results of different images using the optimum factors of Caputo and Sobel anisotropic filters.Caputo uses 1 iteration while Sobel uses nine iterations.The figures show that Caputo anisotropic filter has a better preserving of edges while removing most of the noise texture compared to the Sobel filter.
Table 10 shows the resource usage of Sobel and fractional anisotropic filter using different iterations.As listed in Table 10, both methods have similar maximum operating frequency and power consumption considering the same number of iterations.However, the fractional anisotropic filter requires more DSPs than Sobel as it has more complex multiplication in the convolution stage.As detailed in Table 10, the resources consumed using the Caputo kernel are very close to those consumed using the Sobel kernel in case of using the same number of iterations.In contrast, the PSNR value of the fractional kernel is higher using the same amount of iterations compared to the Sobel kernel, which means that a fractional anisotropic filter will require fewer iterations than a Sobel anisotropic filter.Based on this example, the Caputo anisotropic filter will require one iteration to achieve an average PSNR of 26.156 dB for the five images listed in Table 8, while a Sobel kernel will only achieve an average PSNR of 25.18 dB using nine iterations. Figure 11 shows the resource consumption of Caputo using one iteration and Sobel using nine iterations.A fractional kernel using one iteration will require 93.51%, 95.76%, 89.3%, 58.2% less LUTs, registers, DSPs, and power, respectively.Table 11 shows the PSNR of employing nine iterations of the Sobel anisotropic filter, compared to using only one iteration of Caputo fractional anisotropic filter.Based on Fig. 11 and Table 11, Caputo fractional anisotropic filter outperforms the Sobel anisotropic filter in both performance and hardware resources usage.The hardware complexity in this algorithm increases linearly with the image width and the kernel width.This increase is in the size of the buffer register and the multipliers required for the convolution stage.The required buffer size is determined by: ×W × (Ks − 1) + Ks , where the W is the input image width and Ks is the kernel width.The required amount of multipliers (DSP unites) for convolution stage is the size of the kernel used for the processing.The latency of the system also increases when increasing the image width.The output comes late than the input by 2 × W × (Ks − 1) + Ks .For It number of iterations, the latency becomes 2 × It × W × (Ks − 1) + Ks.

Conclusion
This work presented a different implementation of the anisotropic diffusion filter using fractional gradient kernel instead of integer gradient kernel such as Sobel kernel.The utilization of the Caputo fractional kernel enhanced the PSNR ratio of output images related to the input images while requiring less number of iterations compared to the Sobel-based anisotropic filter.This reduced the total area required by the system and its power consumption.As per the results, one iteration using the Caputo-Anisotropic filter is equivalent to nine iterations, which leads to a reduction by 58.2% in power consumption, 93.51% reduction in LUTs, 95.76% reduction in Registers, and 89.3% reduction in DSP units.

Fig. 4
Fig.4Fractional-order vs maximum PSNR vs the number of iterations

Fig. 6
Fig. 6 Pixel buffer block diagram.a Pixel buffering an image.b Pixel buffer shift register

Fig. 7 Fig. 8 Fig. 9
Fig. 7 Image buffering in the line buffer.a First stage, b second stage, c third stage, and d fourth stage

Fig. 10
Fig.10 Image denoising using AF a, e, i, m, q, u original images, b, f, j, n, r, v noisy images, c, g, k, o, s, w denosied images using Sobel AF, and d, h, l, p, t, and x denoised images using Caputo AF

Table 1
Comparison between iterations required to achieve different PSNR values using Caputo and Sobel anisotropic filtering

Table 10
Comparison of resources used by Sobel and fractional anisotropic filters with different iterations designed for 256 × 256 images 98 Page 10 of 12