Regarding the results provided by the proposed operator, Figure 6 shows the output of the different processing stages. Figure 6a shows a dark image in which the main features of the scene cannot be appreciated; only the window can be distinguished. Figure 6b shows the output of the retina-like filtering: since the whole image is too dark, the retina output mainly enhances the edges of the main features. Figure 6c shows the output of the brightness equalization without taking the retina output into account. The final output image is depicted in Figure 6d. In this image, the main features of the scene can be clearly distinguished, whereas the glare and the overly bright areas that appear in Figure 6c have been mitigated. Moreover, the colors of the picture do not appear distorted.
Figures 7 and 8 compare the output of our operator, without the retina-like processing, with the output of well-known TMOs and contrast enhancement algorithms, namely the Drago et al., Mantiuk et al., and Reinhard and Devlin operators and Multiscale Retinex. As explained previously, our operator has two main stages: the first performs dynamic range adaptation and contrast enhancement, and the second performs glare mitigation and edge enhancement. The latter is specifically designed for people affected by low vision; therefore, the output obtained from the combination of both stages is not directly comparable with the output of other TMOs. For this reason, we have left the retina-like processing out of this comparison.
Figure 7a shows the same underexposed image as Figure 6. Figures 7b-g show the original image enhanced with different TMO algorithms. In Figure 7b, all the elements of the image can be clearly observed: the proposed operator not only brightens dark regions but also keeps the details of the landscape outside the window, whereas the outputs of the other TMOs show the window overexposed (see Figures 7d-g) and, in some cases, the whole image appears too bright (see Figures 7e and 7g). Figure 8 shows another comparative example. In this case, according to Table 1, the original image has a higher signal-to-noise ratio (SNR) than in the previous example. As we can observe, the proposed algorithm enhances the whole image without saturating the bright regions and keeps the overall illumination at medium levels, so that all the details can be appreciated without disturbing glare.
Table 1 summarizes the SNR of each of the images presented in Figures 7 and 8.
The value of the SNR has been calculated according to Equation (19):

$$\mathrm{SNR} = \frac{\mu}{\sigma} \qquad (19)$$

where μ is the mean value of the image and σ is its standard deviation. Since we are working with color images, Table 1 reports the average of the SNR values calculated for each channel.
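The per-channel computation described above can be sketched as follows. This minimal sketch takes the SNR as the plain ratio μ/σ (no decibel scaling) and uses the population standard deviation; both are assumptions, since the estimator is not stated, and the function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)


def channel_snr(values: Sequence[float]) -> float:
    """SNR of one channel as mean / standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std (assumption)
    if sigma == 0:
        return float("inf")  # flat channel: no variation at all
    return mu / sigma


def color_snr(pixels: Sequence[Pixel]) -> float:
    """Average of the R, G and B per-channel SNR values."""
    channels = list(zip(*pixels))  # regroup pixels into [(R...), (G...), (B...)]
    return statistics.fmean(channel_snr(c) for c in channels)
```

For two pixels (10, 20, 30) and (30, 40, 50), the channel SNRs are 2, 3 and 4, so the reported value would be their mean, 3.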
From these comparisons with other methods, we observe that the proposed algorithm effectively enhances dark and high-contrast images without altering color information, preserves details, and does not excessively brighten the image. Moreover, it adapts the enhancement automatically to the lighting conditions, without requiring the user to set complicated parameters.
According to the SNR measurements presented in Table 1, our operator increases the SNR with respect to the original image. Moreover, the SNR values provided by our algorithm are very similar to those provided by the other TMOs, especially the Drago operator. According to a study performed by Yoshida et al., this operator provides the most natural-looking images and the best detail reproduction in dark regions. In this study, the authors conducted a psychophysical experiment based on a direct comparison between the appearance of real-world scenes and HDR images of these scenes displayed on a low dynamic range monitor after tone mapping with seven well-known TMOs. The human subjects were asked to rate image naturalness, overall contrast, overall brightness, and detail reproduction in dark and bright image regions with respect to the corresponding real-world scene.
Moreover, as can be observed in Figures 7 and 8, the proposed operator provides better detail reproduction in bright image regions (compare the window in Figures 7a and 7b).
We now discuss the results of the tests on the performance of the system on the GPU- and FPGA-based platforms, the resource usage of the FPGA implementation, and the speedup obtained with respect to a non-parallel CPU implementation.
As mentioned before, the complete system has been implemented on an NVIDIA ION2 GPU and on a Xilinx Spartan 3 FPGA. The area occupation and clock frequency results of the FPGA implementation are summarized in Table 2.
Table 3 summarizes the performance in frames per second (fps) of the CPU (Matlab code running on a single core), GPU, and FPGA implementations of the new operator, and the speedup obtained with respect to the CPU when processing RGB images at VGA resolution (640 × 480). The CPU used in these tests is an Intel Core i7 920 at 2.67 GHz. Both the GPU and FPGA implementations achieve real-time performance (over 25 fps), with a speedup of at least 7.5 with respect to the CPU, even though a high-end CPU is used.
The FPGA achieves a higher frame rate than the GPU. This is mainly due to the long time required to transfer each frame from CPU memory to GPU global memory and back (10 ms). Nevertheless, further improvement could be achieved by performing the transfers between the CPU and the GPU asynchronously, overlapped with computation.
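The benefit of overlapping transfers with computation can be estimated with a simple pipeline model. The sketch below is an idealized model, not our GPU implementation: it assumes that the host-to-device copy, the computation, and the device-to-host copy each use an independent resource and handle one frame at a time, so that in steady state the frame period is set by the slowest stage rather than by the sum of all stages. Whether the ION2 can actually overlap copies and kernels depends on its copy-engine support; if both copy directions share a single engine, they should be modeled as one stage. The function names are hypothetical.

```python
from typing import List, Sequence


def pipelined_makespan(stage_ms: Sequence[float], n_frames: int) -> float:
    """Total time to push n_frames through a linear pipeline in which each
    stage owns its own resource and processes one frame at a time
    (idealized: all frames are ready at t = 0, buffering is unbounded)."""
    if not stage_ms:
        return 0.0
    done: List[float] = [0.0] * len(stage_ms)  # last completion time per stage
    for _ in range(n_frames):
        prev = 0.0  # time at which this frame leaves the previous stage
        for j, t in enumerate(stage_ms):
            # A stage starts once the frame has arrived AND the stage is free.
            start = max(prev, done[j])
            done[j] = start + t
            prev = done[j]
    return done[-1]


def sequential_makespan(stage_ms: Sequence[float], n_frames: int) -> float:
    """Total time when copies and computation never overlap."""
    return n_frames * sum(stage_ms)
```

With illustrative stage times of 5, 20 and 5 ms (not measurements from this work), three frames take 90 ms sequentially but 70 ms when pipelined; for long sequences the per-frame cost approaches 20 ms, the duration of the slowest stage.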
However, the GPU works in floating-point precision, whereas the FPGA uses fixed-point arithmetic, since it has no native support for floating point. In addition, the GPU computes the histogram with 256 intensity levels, whereas the FPGA is limited to 64 levels by routing constraints.
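The difference between the two histogram resolutions can be illustrated as follows. This sketch assumes 8-bit intensities and that the 64-level histogram is obtained by discarding the two least-significant bits of each value; the binning rule actually used on the FPGA is not detailed here, and the function name is hypothetical. Under this assumption, each of the 64 coarse bins merges four adjacent bins of the 256-level histogram.

```python
from typing import List, Sequence


def histogram(values: Sequence[int], levels: int) -> List[int]:
    """Histogram of 8-bit intensities with `levels` bins, where `levels`
    is a power of two in [1, 256]; bins are formed by dropping LSBs."""
    if levels <= 0 or levels > 256 or levels & (levels - 1):
        raise ValueError("levels must be a power of two in [1, 256]")
    shift = (256 // levels).bit_length() - 1  # 256 levels -> 0, 64 -> 2
    hist = [0] * levels
    for v in values:
        hist[v >> shift] += 1  # keep only the most-significant bits
    return hist
```

With 64 levels, intensities 0 to 3 fall into bin 0 and 252 to 255 into bin 63, so fine gradations within each group of four are lost.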
To measure the accuracy of both approaches, we calculated the peak signal-to-noise ratio (PSNR) of the output images obtained with the GPU and FPGA systems with respect to the output obtained with the CPU, according to Equation (20). The image computed with the FPGA obtains a PSNR of 30 dB, whereas the PSNR of the GPU output is infinite, since it is identical to the CPU output. The FPGA obtains a lower PSNR as a result of the algorithmic simplifications that had to be adopted and the use of fixed-point arithmetic.
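$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^2, \qquad \mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right) \qquad (20)$$

where I is the output image obtained with the CPU, K is the output image obtained with the GPU or the FPGA, m and n are the dimensions of the image, and $\mathrm{MAX}_I$ is the maximum value that a pixel can reach (255 in our case).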
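The PSNR computation can be written directly from its definition. The following minimal sketch handles a single-channel m × n image given as nested lists, with a maximum pixel value of 255; how the color channels were combined in our measurements is not stated here, so averaging the per-channel MSE would be one possible assumption. The function name is hypothetical.

```python
import math
from typing import Sequence


def psnr(ref: Sequence[Sequence[float]], test: Sequence[Sequence[float]],
         max_value: float = 255.0) -> float:
    """PSNR in dB of `test` with respect to `ref`; inf if identical."""
    m, n = len(ref), len(ref[0])
    if len(test) != m or any(len(row) != n for row in test):
        raise ValueError("images must have the same dimensions")
    # Mean squared error over all m x n pixels.
    mse = sum((a - b) ** 2
              for row_i, row_k in zip(ref, test)
              for a, b in zip(row_i, row_k)) / (m * n)
    if mse == 0:
        return math.inf  # identical outputs, as with the GPU
    return 10.0 * math.log10(max_value ** 2 / mse)
```

As a worked figure, a PSNR of 30 dB corresponds to an MSE of 255²/10³ ≈ 65, that is, a root-mean-square difference of about 8 gray levels out of 255 between the FPGA and CPU outputs.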
Table 4 details the percentage of the total processing time spent on each task by the GPU and by the FPGA. For the FPGA, the percentage is measured for each module separately, since when the whole system is running, all the tasks are executed in a pipeline. The GPU, in turn, spends only 6% of the processing time on the histogram adjustment, because the histogram is calculated in parallel with the RGB to HSV conversion. Moreover, more than 30% of the GPU time is spent transferring images from CPU memory to GPU memory, so further improvement could be achieved by overlapping these transfers with computation. Table 5 summarizes the power consumption, clock frequency, and weight of both systems.
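Computing the histogram during the RGB to HSV conversion, rather than in a separate pass over the image, can be sketched sequentially as follows. We assume here that the histogram is taken over the V (value) channel; the function name is hypothetical. On a GPU, the loop body would instead run once per pixel thread, and concurrent histogram updates typically require atomic operations or per-block partial histograms; those details are not given here.

```python
import colorsys
from typing import List, Sequence, Tuple


def hsv_with_value_histogram(
    pixels: Sequence[Tuple[int, int, int]], levels: int = 256
) -> Tuple[List[Tuple[float, float, float]], List[int]]:
    """Convert 8-bit RGB pixels to HSV (components in [0, 1]) and, in the
    same pass, accumulate a histogram of the V channel."""
    hsv: List[Tuple[float, float, float]] = []
    hist = [0] * levels
    for r, g, b in pixels:
        hsv.append(colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0))
        # In HSV, V = max(R, G, B), so the bin index is taken from the
        # integer maximum directly, avoiding floating-point rounding.
        hist[max(r, g, b) * levels // 256] += 1
    return hsv, hist
```

Because each pixel is read only once, the histogram costs one extra increment per pixel instead of a second traversal of the frame.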
As the tables show, both embedded solutions achieve real-time performance (over 25 fps). However, the FPGA implementation is an order of magnitude more power efficient than the GPU, although its computations are less accurate, and its output images therefore have a lower PSNR.
On the other hand, the FPGA solution is lighter, whereas the GPU solution is more affordable, since GPUs are widely available. In the case of the GPU, the architecture is fixed and the goal is to extract its maximum performance, whereas an FPGA design leaves more choices to the engineer. This flexibility comes at the cost of a much longer design time and makes tuning the system more difficult than with the GPU.