Explaining Image Enhancement Black-Box Methods through a Path Planning Based Algorithm

Nowadays, image-to-image translation methods are the state of the art for the enhancement of natural images. Even if they usually show high performance in terms of accuracy, they often suffer from several limitations, such as the generation of artifacts and poor scalability to high resolutions. Moreover, their main drawback is their completely black-box nature, which does not allow any insight about the applied enhancement process to be provided to the final user. In this paper we present a path planning algorithm that provides a step-by-step explanation of the output produced by state-of-the-art enhancement methods, overcoming their black-box limitation. This algorithm, called eXIE, uses a variant of the A* algorithm to emulate the enhancement process of another method through the application of an equivalent sequence of enhancing operators. We applied eXIE to explain the output of several state-of-the-art models trained on the Five-K dataset, obtaining sequences of enhancing operators able to produce very similar results in terms of performance, while overcoming the poor interpretability of the best-performing algorithms.

Figure 1: Schematic view of eXIE. Given as input a low-quality image x, a pretrained model for image enhancement is used to obtain a high-quality version ŷ. Both the low-quality and the high-quality versions of the image are used to execute a modified version of the A* algorithm, in order to find the shortest sequence of enhancing operators [a_0, a_1, ..., a_{n−1}] that emulates the enhancement process. Once this sequence is obtained, each operator is applied to the low-quality version to enhance it. eXIE can be used in many application scenarios. For instance, it can provide a baseline to professional photographers, who can revise and adjust the proposed processing pipeline instead of starting from scratch. It can also be used as an educational tool to support beginners in understanding when to apply image processing operators.
To verify the quality of the sequences found by the proposed method, we performed a thorough experimentation on the Five-K dataset, comparing the results of several state-of-the-art methods with their corresponding versions modified by eXIE. The loss in accuracy caused by eXIE was minimal, and in some cases the method was even able to improve upon the underlying enhancement method. Moreover, eXIE showed impressive results when applied to high-resolution images. This paper presents the following main contributions: • The eXIE algorithm for the explanation of the output of existing image enhancement methods. As far as we know, this is one of the first works combining path finding and image enhancement.
• eXIE is one of the first explainable algorithms specifically developed to obtain a step-by-step interpretation of the output predictions of image enhancement methods.
• A novel heuristic function used by the modified A* algorithm.
• An in-depth experimentation, in which we observed how eXIE was able to explain the outputs of several state-of-the-art methods with minimal loss in performance, as well as to reproduce human-retouched images with high fidelity.
The paper is organized as follows: Section 1 presents the most relevant works on image enhancement and explainable AI published in the literature. Then, Section 2 describes the proposed method and how it works. Section 3 presents the results obtained by applying the method to the images produced by several state-of-the-art methods for image enhancement and by expert photographers. Finally, Section 4 concludes the paper with a discussion of possible directions for future investigation on this topic.

Related Works
Image enhancement is a classic problem in image processing in which a low-quality image is transformed into its high-quality version while preserving its visual content. Providing an explanation of the behavior of these methods is very important to allow beginner photo editors to learn how to visually enhance low-quality images, using the output of these methods as a reference. In this section we analyze two different families of works related to our method: image enhancement methods and explainable AI (XAI) algorithms.
Another family of methods includes parametric approaches. In this case, a neural network learns the coefficients of a parametric color transformation to be applied to the low-quality input image to obtain the enhanced version.

Explainable AI
In recent years, artificial intelligence (AI) and deep-learning-based methods have shown impressive performance in several fields and applications, such as computer vision, natural language processing, and time series analysis. One major drawback of AI methods is their intrinsic lack of interpretability. Several works have been presented to provide explanations for neural networks' predictions, making them interpretable for the final user.
Grad-CAM is one of the most famous XAI algorithms. Following the gradient flow in a convolutional neural network, this method provides as output a localization map highlighting the most relevant parts of the image on which the network focused its attention to produce the final prediction. Moreover, the method was also tested on different tasks, such as visual question answering (VQA) and captioning, proving its ability to analyze and explain neural networks' reasoning [15].
Similarly, Zintgraf et al. proposed a prediction difference analysis method to visualize neural networks' predictions in the task of image classification. This method analyzes the prediction provided by the neural network and assigns a score to each input feature with respect to a chosen class. Unlike Grad-CAM, this method highlights parts of the input feature map using conditional and multivariate sampling. This general approach was tested on two different domains: natural images from the ImageNet dataset and medical magnetic resonance imaging scans [16].
Shrikumar et al. developed a method called Deep Learning Important FeaTures (DeepLIFT). This method decomposes the neural network's prediction on a given input by backpropagating the contribution of each neuron of the network to the input features. A contribution score is assigned to each neuron efficiently, with only a single backpropagation step. This score is based on the difference between the activation of each neuron and a reference activation [17].

Lundberg et al. presented an XAI feature importance method based on SHapley Additive exPlanations (SHAP) values. Given a prediction, a value is assigned to each feature according to its importance in the classification process [18].
Ribeiro et al. proposed a decision-rule-based method for explaining the behavior of complex models.
One of the most powerful architectures for image generation is the generative adversarial network (GAN). GANs show high performance in several domains, producing images with high-quality details. One of their main drawbacks is that they are complex black-box models. Nguyen et al. developed a GAN based on activation maximization to highlight the features learned by each neuron of the generator, in order to explain and interpret the generation process [20].
Li et al. proposed an unrolling-based method called DUBLID (Deep Unrolling for Blind Deblurring). The unrolling procedure decomposes an iterative algorithm, able to emulate a total-variation regularization method in the gradient domain, into a fully interpretable neural network for image deblurring [21].
Wang et al. developed an interpretable sparse coding network for image super-resolution. The proposed architecture, composed of a cascade of sparse-coding networks with different scale factors, was able to obtain very good performance without introducing artifacts in the final images [22].

Method
The aim of the proposed method is to find a sequence of enhancing operators able to emulate the enhancement process of another state-of-the-art method. The search for the sequences is performed by a modified version of the A* algorithm [23]. A* is an efficient path finding algorithm, able to find the shortest path from an initial node to a final node in a graph, if one exists.
To find the shortest path between two nodes in a graph, the algorithm uses two functions to evaluate the nodes: • the backtrack function g(x), which computes the length of the path from the starting node to the current node x;
• the heuristic function h(x), which estimates the length of the optimal path connecting the current node x to the target node. In order to ensure the optimality of the solution, the heuristic must be optimistic, i.e. its estimates of the length of the path to the target node must not exceed the actual distance.
These two functions are added together and the resulting value f(x) = g(x) + h(x) is assigned to each node. The algorithm iteratively visits one node at a time following ascending f values. The search terminates when the final node is visited.
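The node evaluation described above can be sketched as a minimal, generic A* in Python (an illustrative sketch, not the authors' implementation): nodes are expanded in ascending order of f(x) = g(x) + h(x) using a priority queue.

```python
import heapq

def a_star(start, is_goal, neighbors, h):
    """Generic A*: `neighbors(n)` yields (next_node, step_cost) pairs;
    `h(n)` is an optimistic estimate of the remaining distance."""
    # Priority queue ordered by f(n) = g(n) + h(n).
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if is_goal(node):
            return path
        for nxt, cost in neighbors(node):
            g2 = g + cost
            # Only keep a node if we reached it more cheaply than before.
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None  # no path exists
```

For example, on the integer line with unit steps and h(n) = |target − n|, the search returns the direct path from 0 to 5.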

eXIE
In this work, the nodes of the graph are images and the edges connecting two nodes are image processing operators. A connection between two nodes corresponds to a transition from an image (node) to the modified version obtained by applying an editing operator to it. Figure 2 shows an example of the graph traversed by the search algorithm.
The initial node is identified by the low-quality image x to be enhanced. The goal is to find the shortest sequence of enhancing operators [a_0, a_1, ..., a_{n−1}] that improves the quality of the image x (formally, y* = a_{n−1}(a_{n−2}(...a_0(x)))) so that it is as close as possible to the image ŷ produced by the considered image-to-image model.
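The composition y* = a_{n−1}(a_{n−2}(...a_0(x))) is simply a left-to-right fold of the operator sequence over the image; a minimal sketch:

```python
from functools import reduce

def apply_sequence(x, ops):
    """Compute y* = a_{n-1}(a_{n-2}(...a_0(x))): apply the operators
    in order, starting from the low-quality image x."""
    return reduce(lambda img, op: op(img), ops, x)
```

For instance, with a brightness step followed by a gamma correction on a single pixel value, `apply_sequence(0.2, [lambda v: v + 0.05, lambda v: v ** 0.6])` equals `0.25 ** 0.6`.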
As editing operators we considered a small set of general filtering functions widely used in image processing. In the following, x_ijc represents channel c ∈ {R, G, B} of the pixel with coordinates (i, j):
• Brightness adjustment: x'_ijc = x_ijc + δ (1). We considered the parameters δ ∈ ∆ = {−0.05, +0.05, −0.005, +0.005}, applied to all the color channels or to a single color channel.
• Contrast adjustment: x'_ijc = µ_c + σ(x_ijc − µ_c) (2), where µ_c is the average value of channel c. We considered the variants with σ ∈ Σ = {0.9, 1.4} and applied the operator channel-wise, considering all channels or just one.
• Gamma correction: x'_ijc = x_ijc^γ (3). We considered the two values γ ∈ Γ = {0.6, 1.05}, and applied the transformation channel-wise, considering all channels or just one.
The values of the operator parameters in the three sets ∆, Σ, Γ have been selected so that the input image can be enhanced with a reasonable number of applications of the operators. Smaller values would lead to the same results but would require a longer search. The value of the input pixels is always assumed to be in the [0, 1] range, and all output values are clipped to stay in that range. In total, 32 image processing operators were considered.
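The full operator set can be enumerated directly from the three parameter sets. The sketch below is our reconstruction (for NumPy images in [0, 1]) of the three operator families and their channel variants; it yields exactly 32 operators: (4 + 2 + 2) parameter values × 4 channel selections (all channels, or R, G, B individually).

```python
import numpy as np

DELTAS = (-0.05, +0.05, -0.005, +0.005)  # brightness steps (Delta)
SIGMAS = (0.9, 1.4)                       # contrast factors (Sigma)
GAMMAS = (0.6, 1.05)                      # gamma exponents (Gamma)
CHANNELS = (None, 0, 1, 2)                # None = all channels, else R/G/B

def _select(img, c):
    """View on the channel(s) the operator acts on."""
    return img if c is None else img[..., c]

def brightness(img, delta, c=None):
    out = img.copy()
    sel = _select(out, c)
    np.clip(sel + delta, 0.0, 1.0, out=sel)      # x' = x + delta
    return out

def contrast(img, sigma, c=None):
    out = img.copy()
    sel = _select(out, c)
    mu = sel.mean(axis=(0, 1))                    # per-channel mean mu_c
    np.clip(mu + sigma * (sel - mu), 0.0, 1.0, out=sel)
    return out

def gamma(img, g, c=None):
    out = img.copy()
    sel = _select(out, c)
    np.clip(sel ** g, 0.0, 1.0, out=sel)          # x' = x ** gamma
    return out

OPERATORS = (
    [lambda x, d=d, c=c: brightness(x, d, c) for d in DELTAS for c in CHANNELS]
    + [lambda x, s=s, c=c: contrast(x, s, c) for s in SIGMAS for c in CHANNELS]
    + [lambda x, g=g, c=c: gamma(x, g, c) for g in GAMMAS for c in CHANNELS]
)
assert len(OPERATORS) == 32
```

Each operator returns a new clipped image, so the search can expand a node without mutating its parent.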

Heuristic Function
The heuristic function is a key element of the search algorithm. It estimates the length of the path from a given node to the target node. In order to guarantee the optimality of the solution it needs to be optimistic, that is, it needs to underestimate the actual distance between the nodes. On the other hand, a heuristic that is too optimistic slows down the algorithm, which would not be able to quickly progress towards the target.
We defined the heuristic function by considering how many times each operator needs to be applied to transform a single pixel value into the target value. More in detail, for each pixel value x_ijc we compute three counters: the Brightness Counter, the Contrast Counter, and the Gamma Correction Counter. The Brightness Counter is the number of times the brightness operator needs to be applied to x_ijc to make it reach the target ŷ_ijc.
We defined the Contrast Counter in a similar way. This requires identifying special cases, since sometimes it is not possible to transform the pixel value into the desired target using this operator alone.
Finally, the Gamma Correction Counter is defined analogously for the gamma correction operator. The heuristic function h is then defined for the whole image x as an upper bound on the number of applications of any given operator needed to match each pixel with the corresponding target value. Concerning the backtrack function g(x), we count the number of times an operator has been applied to the initial image to obtain the image x. In order to limit the searching time of the algorithm, we introduced two further modifications.
The search terminates when the difference between the current and target nodes is below a set threshold, ‖x − ŷ‖ < τ. Moreover, it can also terminate when the number of explored nodes exceeds a set limit L. When this happens, the visited node that is closest to the target is selected, and the path from the root to this node (i.e. the sequence of enhancing operators) is returned as output along with the enhanced image. We observed that, when the number of visited nodes exceeds L = 7000, the accuracy of the output images tends to be very stable. In accordance with the maximum number of explored nodes L, we set the threshold τ = 2. These choices were made to deal with the trade-off between output image quality and the time required to explore the graph. The pseudo-code for the whole procedure is reported in Algorithm 1.
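As a hedged illustration of these per-pixel counters, the sketch below uses only the brightness operator with its largest step (δ = 0.05); the actual heuristic also combines the contrast and gamma correction counters, and the exact closed forms here are our assumption rather than the paper's definitions.

```python
import math

DELTA_MAX = 0.05  # largest brightness step in the Delta set

def brightness_counter(x, y):
    """Smallest k such that k applications of x -> x +/- DELTA_MAX reach y.
    A small epsilon guards against floating-point round-up in ceil."""
    return math.ceil(abs(y - x) / DELTA_MAX - 1e-9)

def heuristic(x_pixels, y_pixels):
    """Image-level bound: the worst per-pixel counter over the image."""
    return max(brightness_counter(x, y) for x, y in zip(x_pixels, y_pixels))
```

Because each counter never overestimates the number of operator applications for its own operator, the resulting h stays optimistic with respect to unit-cost edges.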
Algorithm 1 eXIE. Input: image x, target ŷ, set of image operators A, target threshold τ, maximum number of visited nodes L.
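A hedged Python sketch of the search loop of Algorithm 1, reconstructed from the description above (unit edge costs, the τ threshold, and the L-node budget with fallback to the closest visited node); the names and structure are our assumptions, not the authors' code.

```python
import heapq
import itertools

def exie_search(x, y_hat, operators, h, dist, tau=2.0, max_nodes=7000):
    """Best-first search over edited images. Stops when the current image
    is within tau of the target y_hat, or after max_nodes expansions, in
    which case the path to the closest visited node is returned."""
    tie = itertools.count()  # tie-breaker: images are not orderable
    frontier = [(h(x, y_hat), next(tie), x, [])]
    best, best_path = x, []
    visited = 0
    while frontier and visited < max_nodes:
        _, _, img, path = heapq.heappop(frontier)
        visited += 1
        if dist(img, y_hat) < dist(best, y_hat):
            best, best_path = img, path       # track the closest node
        if dist(img, y_hat) < tau:
            return path, img                  # target reached within tau
        g = len(path) + 1                     # each operator has unit cost
        for op in operators:
            nxt = op(img)
            heapq.heappush(frontier, (g + h(nxt, y_hat), next(tie), nxt, path + [op]))
    return best_path, best                    # budget exhausted
```

On a toy one-dimensional "image" with ±0.1 operators, the search recovers the five-step sequence from 0.0 to a 0.5 target.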

Results
In the first part of this section we present the dataset used in our experiments. Then we discuss the results obtained by applying eXIE to several state-of-the-art image enhancement methods. Finally, we assess the performance of eXIE with a new baseline neural network especially designed for this purpose.

Dataset
The dataset used for our experiments is the Adobe-MIT Five-K dataset [24]. This dataset is composed of 5000 high-resolution images in RAW format. For each of these images, five enhanced versions are included, each one produced by an expert photographer identified by a letter from A to E. In our experiments we used the RAW images and the images retouched by Expert C, who is considered the most consistent of the five. We split the images following the procedure presented by Hu et al. [14]: 4000 training images and 1000 test images.

Enhancement methods
We selected seven popular state-of-the-art methods for image enhancement. They belong to different families of approaches, such as image-to-image translation, parametric, reinforcement learning, and transformer-based methods: 1. Exposure: a deep reinforcement learning based method able to enhance low-quality images [14].
2. CycleGAN: a GAN-based method that uses a cycle loss to learn the correct function mapping an input image from a source domain to a target domain. Among its possible applications, it can also be applied to enhancement tasks. It can suffer from artifact generation in the output images [4].
3. DaR: called Distort and Recover, this is a double Q-learning based algorithm for image enhancement [25].
4. Pix2Pix: proposed by Isola et al. in 2016, this conditional adversarial network for image-to-image translation showed great performance in several application domains [1].
5. HDRNet: this architecture, based on bilateral grid processing and local affine color transformations, learns the correct transformations to be applied to the low-quality, high-resolution input image by observing a resized version of it [2].
6. Star-DCE: a transformer-based method for image enhancement. This method splits the input images into patches and embeds them into tokens. These tokens are passed to a long-short range Transformer module composed of two branches: one for long-range context (a cascade of transformer blocks) and the other for short-range context (a cascade of convolutions and batch normalizations) [3].
7. Parametric: proposed by Bianco et al., this pipeline learns, in a paired training scenario, the parameters of a color transformation using a downsampled version of the high-resolution, low-quality input image. Once the color transformation is obtained, it is applied to the original input image to enhance its content [8].
All these methods have been trained on the 4000 training images from the Five-K dataset, using the configuration described in their original papers. The trained models were then evaluated on the 1000 test images. Finally, all the output images were analyzed by eXIE to produce the corresponding sequences of operators.

Experimental results
We used four different metrics: the Peak Signal-to-Noise Ratio (PSNR) [26], the Learned Perceptual Image Patch Similarity (LPIPS) [27], Delta E (∆E) [28], and the Structural Similarity Index (SSIM) [29]. Table 1 compares the metrics computed on the output images of the considered methods and those computed on the images produced by eXIE.
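As a reference for one of these metrics, PSNR for images normalized in [0, 1] can be computed with NumPy as follows (a standard-definition sketch, not the paper's evaluation code):

```python
import numpy as np

def psnr(img, ref, peak=1.0):
    """Peak Signal-to-Noise Ratio: 10 * log10(peak^2 / MSE)."""
    a = np.asarray(img, dtype=np.float64)
    b = np.asarray(ref, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error of 0.1 over the whole image gives an MSE of 0.01 and therefore a PSNR of 20 dB.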
In all cases eXIE was able to imitate the underlying enhancement method with a good level of approximation. The loss in terms of the four metrics considered is never very high, and in some cases the images found by eXIE are even better than the original ones. This happens because, by design, eXIE prevents the introduction of artifacts; in practice, artifacts are harder to reproduce than correct enhancements. Only for the most accurate methods, and only for the PSNR and ∆E metrics, is the difference noticeable. This drop in accuracy is acceptable in many applications if the explainability of the model can be ensured. Differences in terms of LPIPS and SSIM are always negligible.
From the qualitative analysis of the images produced by eXIE, shown in Figure 3, it is possible to notice how eXIE is able to emulate the enhancement process of the original methods very accurately. For instance, by looking at the images produced by eXIE on CycleGAN (column 6), it is possible to see that the artifacts have been removed and that the color balance of the images obtained by eXIE is better than in the version produced by the image-to-image translation method. Moreover, by observing the results on DaR (column 3) and Exposure (column 2), we can see that eXIE is able to obtain better saturation and contrast in the final image with respect to the results of the original models. By observing the results on the best models, namely Star-DCE (column 1), Pix2Pix (column 4), HDRNet (column 5), and Parametric (column 7), it is clear how eXIE is able to mimic them with high fidelity.

Sequence inspection
As a further analysis, we studied the sequences produced by eXIE (Figure 4). The sequences obtained for the considered state-of-the-art methods are quite heterogeneous, and they follow different orders in the application of the operators.
The first operators applied are typically those that increase the global brightness of the image. These operators are applied to all the pixels of the three channels of the image. Then, when the image reaches a good overall balance, more fine-grained operators are applied. For instance, the HDRNet column of the figure shows how the algorithm finds that, after the application of the brightness operator over the whole image, the best next action is to decrease the value of the red channel, obtaining a refinement of the color distribution.

Enhancement of high resolution images via UNET
Many image-to-image translation methods, like Pix2Pix or CycleGAN, work with images of fixed dimensions. For these methods, the enhancement of very high-resolution images without producing artifacts requires keeping the spatial dimension of the features constant across all the layers of the model. We addressed this scenario by running our method on very low-resolution versions of the input images and by applying the sequences found to the original high-resolution images. This approach requires very few computational resources in terms of memory and time and, as we will see, it does not decrease the output quality.
More in detail, given a high-resolution image from the Five-K dataset, we resized it to a very low resolution (32 × 32) and provided it as input to a specially designed convolutional neural network based on the UNet [6] architecture. eXIE is then applied to the resulting low-resolution images, and the sequence of operators found is applied to the high-resolution input image. Figure 5 summarizes this approach.
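The low-resolution pipeline can be sketched as below; `model`, `search`, and `downsample` are placeholders for the trained UNet, the eXIE search, and any resizing routine (all assumptions for illustration). The transfer works because the operators are pixel-wise transforms that are largely resolution-independent.

```python
import numpy as np

def enhance_high_res(x_hr, model, search, downsample):
    """Find the operator sequence on a 32x32 proxy of the input,
    then replay the same sequence on the full-resolution image."""
    x_lr = downsample(x_hr, (32, 32))   # low-resolution proxy
    y_lr = model(x_lr)                   # enhanced low-resolution target
    ops = search(x_lr, y_lr)             # sequence found by eXIE
    out = x_hr
    for op in ops:                       # pixel-wise operators transfer
        out = op(out)                    # directly to high resolution
    return out
```

Only the 32 × 32 proxy ever goes through the network and the search, so memory and time costs are independent of the original resolution.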
The neural network used in this experiment is composed of two parts: an encoder and a decoder. During training, the high-resolution input images X from the Five-K training set are resized to obtain a low-resolution version x. These images are given as input to the network, obtaining the enhanced version ŷ = f(x; θ). The images ŷ are compared with the target images y using the binary cross entropy (BCE) as loss function, computed pixel-wise: L(ŷ, y) = −Σ [y log ŷ + (1 − y) log(1 − ŷ)]. The model has been trained with standard data augmentation techniques (cropping, resizing, random flips and rotations) on the image pairs for 600 epochs with batch size 32. The UNet parameters are updated using mini-batch gradient descent with the AdamW optimizer [30] and a learning rate of 5e-3. The learning rate was decayed by a factor of 0.1 every 100 epochs, starting from the 200th.
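Under one plausible reading of the schedule above (first decay at epoch 200, then every 100 epochs), the learning rate can be written as a small closed-form helper, independent of any specific framework:

```python
def learning_rate(epoch, base_lr=5e-3, start=200, every=100, factor=0.1):
    """Step schedule: no decay before `start`, then one extra factor of
    `factor` at `start` and every `every` epochs thereafter."""
    if epoch < start:
        return base_lr
    decays = 1 + (epoch - start) // every
    return base_lr * factor ** decays
```

So epochs 0-199 train at 5e-3, epochs 200-299 at 5e-4, epochs 300-399 at 5e-5, and so on.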
Once trained, the network has been used to enhance the low-resolution versions of the images from the Five-K test set. These enhanced images have then been processed by eXIE along with the original low-resolution inputs.
Table 2 summarizes the results of the application of eXIE to the low-resolution images. The first column shows the performance of the neural model on the low-resolution images. The metrics reported in the second column are computed on the high-resolution images. eXIE shows good performance even when applied to very high-resolution images, such as the original ones provided in the Five-K dataset.
The values of the metrics computed on the images enhanced using eXIE are good, as can be seen by comparing these results with those shown in Table 1. Analyzing the images shown in Figure 6, it is possible to notice the absence of artifacts and the correctness of the color balancing applied by eXIE. Moreover, from the analysis of the details, it is possible to see that the visual content of the image is preserved correctly (this is also confirmed by the high SSIM values in the previous table).

Case Study: Human Target
As a last experiment, we used eXIE to reverse engineer the work of an expert photo editor. The goal here is to replicate the image enhanced by a human expert as a sequence of elementary editing operations. The application scenario is educational: a beginner photo editor could use the system to learn how to achieve a given editing effect by looking at how eXIE reduced it to a sequence of operations.
To do so, we applied eXIE to the images in the Five-K dataset with the goal of reproducing the versions enhanced by the experts. Table 3 summarizes the results obtained.
The results in Table 3 show that eXIE is able to produce very high-quality images, reproducing the enhancement process applied by the experts of the Five-K dataset. The values of the considered metrics are very high, confirming the ability of our algorithm to emulate not only the enhancement process of image-to-image translation models, but also the sequence of operations chosen by human experts. From the analysis of the distributions of the operations forming the sequences selected by eXIE (Figure 7), it is possible to notice that the most frequently selected operations are those that modify all the color channels. The distributions are quite similar for all the experts, and also for the UNet, with just some differences in the frequency of single-channel operations (for brightness, in particular). Observing the action probability distribution of each expert, it is possible to notice general trends or particular preferences. For example, in the action distribution of Expert A, brightness and gamma correction over all three channels have almost the same probability, indicating that these two actions are interchangeable for Expert A most of the time.

Conclusions
In this paper we proposed eXIE, a novel method for explaining the enhancement process of state-of-the-art image enhancement methods. This XAI method provides an equivalent sequence of enhancement operators that emulates the behavior of image enhancement methods with only a small loss in performance.
eXIE was able to produce good-looking images, in some cases with a better color distribution than the methods to which it was applied. Moreover, its output images do not present artifacts and show high-quality details.
The generalization ability of the method has been tested by applying it to high-resolution images while executing the heuristic search algorithm on their low-resolution versions.
In the future we plan to explore the application of eXIE to different processing tasks, such as retargeting and restoration. We will also consider more specific application domains, such as medical imaging, where the explainability of the enhancement process is of vital importance.

Figure 2: Example of the graph traversed by the eXIE algorithm. For space reasons, the considered graph is generated using only three editing operators over all the channels of the images, and it is truncated after two levels.

Figure 3: Comparison of the images obtained by eXIE and by the original methods.

Figure 4: Example of the sequences obtained with the application of eXIE on Star-DCE, Pix2Pix and HDRNet.
Figure 5: Schematic view of the enhancement of high-resolution images: eXIE is run on a low-resolution version of the input, and the resulting operator sequence is applied to the high-resolution image.

Figure 6: Example of the application of eXIE on two high-resolution images from the Five-K test set, after being processed by the UNet.

Figure 7: Action distributions of the sequences obtained by applying eXIE to replicate the enhancement process of the Five-K experts and the UNet.
Modern approaches to image enhancement are typically based on deep convolutional neural networks. In particular, in image-to-image translation methods a neural network learns how to transform a low-quality image into its enhanced version. Isola et al. developed a GAN-based conditional architecture able to translate the input image from a source domain to a different target domain [1]. Using a similar generative adversarial architecture, Zhu et al. developed a cycle loss for translating images from an unpaired training set [4]. Liu et al. presented an unsupervised version of generative adversarial networks for image-to-image translation [5]. Ronneberger et al. developed an encoder-decoder architecture for image segmentation [6]. This architecture was adapted over the years for several tasks, including image enhancement. Cai et al. developed a modified version of the UNet architecture to enhance low-light images. Gharbi et al. developed an architecture able to learn the coefficients of a transformation working on a low-resolution version of the original input image; once the transformation is obtained, it is applied to the high-resolution image to enhance it [2]. Bianco et al. proposed an architecture able to learn the parameters of a color transformation; these parameters are combined with a basis function and the resulting transformation is applied to the input image [8]. The same research group also proposed another approach where a neural network estimates the coefficients of spline color curves [9].
Another work presented an enhancement architecture based on a color curve encoder: this encoder computes the color curve parameters on a low-resolution version of the input image, and the learned transformation is then applied to the high-resolution image [10]. Recently, Zhang et al. presented a transformer-based architecture for real-time image enhancement [3]; in this work, a reduced version of the original structure presented by Vaswani et al. [11] is used to enhance low-light images. Kim et al. [12] proposed a method based on representative color transforms for image enhancement.

Table 1: Comparison of state-of-the-art enhancement methods on the Adobe Five-K dataset with and without eXIE.
From Table 1, we can see that eXIE is able to enhance high-resolution images better than methods such as Exposure, CycleGAN, and DaR; the values of the performance metrics are very similar to those obtained by HDRNet.

Table 2: Results of the experiment with the neural network for low-resolution enhancement (on low-resolution images) and the application of eXIE (on high-resolution images).

Table 3: Results of the application of eXIE on the test images of the Five-K dataset, using the images of the 5 available experts as targets.