Abstract
In this paper, we propose a novel method for creating HDR images from only two images—flash and non-flash images. Our method consists of two main steps, namely brightness gamma correction and bi-local chromatic adaptation transform (CAT). The brightness gamma correction performs a series of increases and decreases of the non-flash brightness and yields multiple images with various exposure values. The bi-local CAT enhances the quality of each computed image by recovering missing details, using information from the flash image. The final multi-exposure images are then merged together to compute an HDR image. An evaluation shows that our HDR images, obtained by using only two LDR images, are close to HDR images obtained by combining five manually taken multi-exposure images. Our method does not require the usage of a tripod and it is suitable for images of non-still objects, such as people and candle flames.
1 Introduction
High-dynamic range (HDR) images capture the luminance of real-world scenes, which ranges from extreme dark to direct sunlight. Details of both shadow and highlight areas, present in a high-dynamic scene, can be recovered in a single HDR image. In contrast, standard digital imaging produces low-dynamic-range (LDR) images, in which the luminance dynamics is replaced by the discrete luma range. The latter limits the capture of details in scene shadow and highlight regions, resulting in under- or over-exposure.
The camera response function (CRF) gives the relationship between the luminance and the luma up to a scale factor. Consequently, to recover an HDR image using standard digital imaging, an estimation of the CRF and a detail recovery for black/white image pixels are required. The most common way of creating an HDR image by respecting these two requirements is to merge multiple LDR images, taken at various exposure times and referred to, in this paper, as multi-exposure images.
Despite the efficiency of multi-exposure methods, the process of creating an HDR image is usually time-consuming. First, users are required to use a tripod and to adjust the camera exposure time each time they take an image (if the camera does not offer an exposure bracketing function). Moreover, during the shooting process, misalignment could become an issue, especially when there are moving objects in the scene. To this end, more time is likely to be spent aligning the images in the hope of correcting ghosting artifacts.
To represent the atmosphere and the details of a real-world environment, users often need to take more than three exposure images [25]. However, this may increase the risk of misalignment and noise. In the particular case of dark environments with a high luminance range, decreasing the exposure time makes it possible to capture fine details in the highlights, but may significantly increase the levels of noise in the computed HDR image.
Instead of taking several multi-exposure images of the same scene, we propose a method, in which we use only two images to recover an HDR image—a non-flash image, taken at a certain exposure value, and its corresponding flash image. Our method can also be used for low-dynamic scenes to enhance the quality of a non-flash image with the help of a flash image. Non-flash images represent the genuine atmosphere of the original scene lighting. However, especially for images shot in dark environments, the non-flash images are often noisy and lack important details (in under-/over-exposed pixels). In contrast, flash images contain more details, but they do not preserve the original scene lighting.
The first key idea behind our HDR image creation lies in mimicking the CRF by a brightness function. The brightness function, used in our method, aims to represent the human perception of a scene at various brightness levels. This corresponds to the main purpose of digital cameras. Therefore, we strongly believe that the CRF can be well-approximated by our brightness function. We alter the brightness of a non-flash image by a one-parameter-dependent gamma correction and we yield a sequence of brightness exposure images. To create an HDR image, we then need to recover the missing details of the multiple brightness images (for which we recover no information by the brightness correction).
The second key idea of our method consists in recovering these missing details by using reliable information from the flash image. To retain the original ambience of the scene while preserving the details of the flash image, we propose a novel bi-local chromatic adaptation transform (CAT). The bi-local CAT is directly applied to the flash image in order to adapt its brightness to that of the non-flash image. As the non-flash brightness is lower than the flash brightness, the bi-local CAT remaps the flash pixel values into values, corresponding to lower brightness. Therefore, to allow for an increase of the brightness dynamics and the contrast of the flash image, we carry out the bi-local CAT on each of the multiple brightness images. That way, we obtain final multi-exposure images, which we merge into an HDR image. We apply our method to dark environment scenes with high-dynamic range, for which the reach of the flash is significant.
The main contributions of this paper are fivefold:
-
Automatic non-flash image brightness correction;
-
Bi-local CAT for automatic creation of multi-exposure images from only two images—flash and non-flash;
-
Automatic recovery of HDR images from the computed multi-exposure images;
-
Enhancement of a non-flash image using a flash image;
-
Automatic removal of the soft shadows of the flash image as well as diminution of flash reflections.
The advantages of our method over the classical multi-exposure methods are the following: (1) the number of images for obtaining an HDR image is reduced to two; (2) the usage of a tripod is not required in the case when the flash and non-flash images can be taken one after the other in a short period of time (therefore, our method is suitable for handheld device applications); (3) ghosting artifacts and misalignment are brought to a minimum. If a small misalignment between the two images occurs, our method is able to overcome it.
2 Related works
The entire dynamic range of a real-world scene cannot be captured by today’s camera sensors. That is why digital images of scenes with high-dynamic luminance range are either under- or over-exposed. The classical technique for obtaining a high-dynamic range image without under-/over-exposed regions uses a set of images, taken at various exposure settings [3, 17]. Debevec et al. [3] first exploit the reciprocity property of imaging systems to construct the response curve of multi-exposure images and to recover their HDR radiance map. Reversely, Mann et al. [17] compute a floating-point image as a representation of an “undigital” image with an extended dynamic range, without any prior knowledge about the response curve of the imaging device. The floating-point HDR image is yet again computed from a set of multi-exposure images. The general concept of using multi-exposure images for creating HDR images is highly exploited in today’s photography. However, this approach has several main drawbacks, including possible image misalignment and ghosting for scenes with moving objects. To this end, there exist a number of techniques, designed to handle misalignment and ghosting artifacts [9, 10, 26, 28].
To overcome the main limitations of multi-exposure methods, Tocci et al. [29] propose an optical architecture which automatically captures three optically aligned images at different exposures by splitting the light from a single lens and focusing it onto high-, medium- and low-exposure imaging sensors. However, the proposed optical advancement is not available for widespread use and its construction is costly. In contrast, other methods use a single-coded image to recover per-pixel exposures [22, 23]. They rely on a spatially varying optical mask on the sensor, giving different exposures to adjacent pixels. The coded exposures are mapped to an HDR image using reconstruction techniques, such as interpolation [23], piece-wise linear estimators, based on Gaussian mixture models [1], and the recently proposed sparse reconstruction, based on convolution sparse coding [27]. However, such reconstructions are computationally costly, require hardware modification and can introduce artifacts if the mask is regular and a simple interpolation is used.
Furthermore, a new method for image brightening from a single image, using standard digital cameras, has recently been introduced [16]. Li et al. create three virtually exposed images from a single image by increasing the brightness of the under-exposed regions. The brightness increase is carried out by a non-decreasing function in a newly designed “simplified” CIE Lab color space. Unlike our method, Li et al.’s method does not explicitly compute and modify the brightness of the original image. It is used to brighten dark objects in outdoor scenes as well as to create a tone-mapped version of an HDR image by fusing the three virtual exposures. A brightening approach from a single image would not give plausible results if the input image contains a significant number of under-/over-exposed pixels, for which no information can be recovered from a single image.
Furthermore, Mertens et al. [21] propose an exposure fusion, which merges a sequence of multi-exposure images into an image with extended luminance range, which can be directly displayed on an LDR screen (a tone-mapped image). The fusion is guided by a series of metrics which ensure that only the well-exposed values of each exposure image are kept in the result. Unlike the exposure fusion, which combines several multi-exposure images into one enhanced image, two other methods introduce image enhancement techniques for flash photography, relying on only two images. The methods in [5, 24] exploit the properties of flash and non-flash image pairs for dark environments. These methods combine the non-flash ambient light with the details from the flash image using a bilateral-filter-based image decomposition. That way, they enhance the quality of the non-flash image. Unlike HDR imagery, which provides a number of LDR outcomes (tone-mapped versions), the methods in [5, 24] generate a single LDR image, which cannot be extended to an HDR image. Other methods also take two differently exposed images as an input. The method in [31] is applied between blurred and noisy image pairs for the purpose of image deblurring, whereas the methods in [11, 14] take differently exposed subsequent frames from a video sequence to reconstruct an HDR video.
Matsuoka et al. [20] also exploit the properties of the flash image, this time in the context of HDR imagery. To construct an HDR image, the authors integrate a sequence of multi-exposure images in the wavelet domain. Before merging the multi-exposure images, two steps are performed. First, the flash image is used to find an alpha mask of shadow regions of the long exposure image. Second, a noise removal technique, guided by the flash image, is applied to denoise these shadow regions. Unlike our method, Matsuoka et al.’s method does not explicitly involve the flash image into the creation of the HDR image (no flash image information is transferred into the final HDR image). Furthermore, similarly to multi-exposure methods, Matsuoka et al.’s method requires a tripod to shoot the multi-exposure images and it is suitable only for static scenes.
3 Our method
In the present section, we introduce our method for computing an HDR image from two images—a flash image F and a non-flash image \(E_{0}\). Figure 1 illustrates the main flowchart of our method. The proposed method starts with a noise removal step, yielding noise-free flash and non-flash images. The brightness of the noise-free non-flash image is modified during a brightness correction step, at the end of which we obtain a sequence of multiple brightness images. The images in this sequence contain black and/or white pixels. In the next step, an iterative bi-local CAT, the missing details are recovered using information from the noise-free flash image. That way, we generate a final sequence of multi-exposure images, which we then merge together into an HDR image.
3.1 Noise removal
Our method starts by denoising the flash and non-flash images. Even though the flash image is considered a reliable image, containing no/very little noise, the flash may introduce grainy noise. Therefore, we apply a bilateral filter with a small kernel size on the flash image to handle any possible noise. The bilateral filter performs well on well-lit images, such as flash images. In contrast, for images shot in dark environments without a flash, experiments show that the guided filter [12] performs better than both bilateral and cross-bilateral filters [4, 5]. Therefore, we apply the guided filter to denoise the non-flash image.
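As an illustration, a minimal single-channel NumPy sketch of the guided filter [12] is given below (self-guided here, with radius and regularization values chosen only for illustration; this is a simplified stand-in, not the exact filter configuration used in our pipeline):

```python
import numpy as np

def box_filter(img, r):
    """Mean over a (2r+1)x(2r+1) window via an integral image (edge-padded)."""
    k = 2 * r + 1
    p = np.pad(img, r, mode="edge")
    c = np.cumsum(np.cumsum(p, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # prepend a zero row and column
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def guided_filter(guide, src, r=4, eps=1e-2):
    """He et al.'s guided filter, single channel: src is smoothed while
    following the edges of guide (guide == src gives edge-preserving denoising)."""
    mean_I, mean_p = box_filter(guide, r), box_filter(src, r)
    var_I = box_filter(guide * guide, r) - mean_I ** 2
    cov_Ip = box_filter(guide * src, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)   # local linear coefficients of the model q = a*I + b
    b = mean_p - a * mean_I
    return box_filter(a, r) * guide + box_filter(b, r)
```

In this sketch, self-guidance (guide = src) denoises the non-flash image; in practice, any guided-filter implementation (e.g., OpenCV's ximgproc module) can be substituted.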
3.2 Brightness gamma correction
Creating an HDR image from multi-exposure images requires a knowledge of the CRF. Several methods for recovering the CRF exist [3, 17]. They recover the CRF up to a scale factor from at least two multi-exposure images. Once recovered, the CRF could be used to compute multi-exposure images from a single image.
To simplify the process of creating multi-exposure images, we no longer use prior knowledge about the CRF. Instead, we mimic the CRF by a brightness function. The brightness, which is one of the absolute color appearance attributes, is fundamental for our approach. It describes the intensity of the light source, and its sensation depends on the adaptation to the scene light source. Furthermore, it varies with the environment (dark, dim, bright, etc.). A key advantage of the brightness over other color appearance attributes, such as lightness, is its unbounded range.
We compute the brightness \(Q_0\) of the non-flash image \(E_0\) using CIECAM02 [6]. The brightness \(Q_0\) is modified using a gamma correction function, where gamma is derived from a brightness-dependent parameter p. By varying this parameter, we obtain multiple brightness images \(E_p\). Their brightness \(Q_p\) is computed using the gamma correction, proposed by Bist et al. [2]:
The gamma value \(\gamma (p)\) is obtained as a function of the correction parameter p and the maximum brightness \(Q_{\max }\) of the non-flash image \(E_0\). The parameter p is expressed in terms of \(Q_{\max }\). Therefore, we either increase (for \(p \le Q_{\max }\)) or decrease (for \(p > Q_{\max }\)) the brightness of the non-flash image \(E_0\) to obtain each brightness exposure image \(E_p\). The optimal choice of parameter p is discussed in Sect. 3.5.
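To make the brightness correction concrete, the sketch below applies a gamma correction to a brightness channel. Since Eq. (1) of Bist et al. [2] is not reproduced here, the simple form \(\gamma(p) = p / Q_{\max}\) is used as an illustrative stand-in, consistent with the behavior described above (brightening for \(p \le Q_{\max}\), darkening for \(p > Q_{\max}\)):

```python
import numpy as np

def brightness_gamma(Q0, p):
    """Gamma-correct a brightness channel Q0 (non-negative array).

    gamma(p) = p / Q_max is an illustrative assumption, not Bist et al.'s
    exact formula: p <= Q_max yields gamma <= 1 (brightening), p > Q_max
    yields gamma > 1 (darkening), matching the behavior described in the text.
    """
    Q_max = Q0.max()
    return Q_max * (Q0 / Q_max) ** (p / Q_max)
```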
3.3 Iterative bi-local CAT
The brightness gamma correction, presented in the previous subsection, does not introduce new information, and therefore, details in the under-/over-exposed areas of the non-flash image \(E_0\) cannot be recovered. To tackle the limitations of using a single image for the recovery of an HDR image, we consider an extra image—the flash image F. This image can easily be taken alongside the non-flash image in less than one second and contains reliable information about the shadows of the non-flash image as well as more scene details.
We propose a novel CAT, which carries out a transformation of the flash image F with respect to each image \(E_p\). This transformation, that we call bi-local CAT, aims to adapt the colors of the image F to those of the image \(E_p\) as well as to remove the impact of the flash on the original scene lighting, while preserving the details of the image F (except for the flash shadows and reflections). Compared to previous works [5, 24], our method allows for an advanced combination of flash/non-flash light, color and detail, and at the same time is robust to small misalignment, flash shadows and reflections.
The bi-local CAT extends the local CAT, presented in the iCAM [7, 15]. The local iCAM CAT would compute a global illuminant for the image \(E_p\) and would locally adapt the colors of the flash image F to this illuminant. However, the wide luminance range of the image \(E_p\), varying from pure black to pure white, cannot be correctly described by a single illuminant. To transfer the high contrast areas of the image \(E_p\) onto the flash image F, the bi-local CAT computes a local representation of the illuminant of the image \(E_p\), instead of a global one, as well as a local representation of the illuminant of the flash image F. Like standard CATs [6, 15], the bi-local CAT starts by converting the RGB stimuli of both images F and \(E_p\) into spectrally sharpened RGB signals [6]. Then, we apply the von Kries normalization pixel-wise to convert the spectrally sharpened RGB stimuli (\(R^{F}\), \(G^F\), \(B^F\)) of the flash image into the adapted tristimulus responses (\(R_c\), \(B_c\), \(G_c\)) as follows:
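Written out for the R channel (the G and B channels in Eqs. (3) and (4) are treated analogously), this normalization takes the standard incomplete-adaptation von Kries form [6, 15]:

\[
R_c = \left[ D\,\frac{R^{E_p}_{w}}{R^{F}_{w}} + (1 - D) \right] R^{F},
\]

so that a pixel is left unchanged wherever the ratio of the two white images equals 1.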
where the triples (\(R^{E_p}_{w}\), \(G^{E_p}_{w}\), \(B^{E_p}_{w}\)) and (\(R^{F}_{w}\), \(G^{F}_{w}\), \(B^{F}_{w}\)) are pixels from low-pass versions of the images \(E_p\) and F, respectively (more details in the following paragraph). The adaptation factor D is given as follows [6, 15]:
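In the CIECAM02-style formulation [6], with the surround factor S and the coefficient K discussed hereafter, a plausible form consistent with the cited models is

\[
D = K\,S \left[ 1 - \frac{1}{3.6}\, \exp\!\left( \frac{-L_{A} - 42}{92} \right) \right],
\]

which tends to 1 (full adaptation) for large adapting luminances when \(K = S = 1\).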
where the scalar \(L_A\) is the adapting luminance, taken as 20\(\%\) of the luminance of a white object in the scene. The surround factor, denoted by S, equals 1 for average surround, 0.9 for dim surround and 0.8 for dark environments. In our method, we carry out an adaptation of the colors of the flash image, and therefore, the surround is considered average (\(S = 1\)). A coefficient \(K = 0.3\) is used by Kuang et al. [15] to avoid full adaptation and de-saturation of the colors. In contrast, we use a coefficient \(K = 1\) to perform a full adaptation. The adaptation factor D ranges from 0 (no adaptation) to 1 (full adaptation).
The von Kries normalization in Eqs. (2), (3) and (4) computes the per-pixel ratio of two low-pass images (\(R^{F}_{w}\), \(G^{F}_{w}\), \(B^{F}_{w}\)) and (\(R^{E_p}_{w}\), \(G^{E_p}_{w}\), \(B^{E_p}_{w}\)), called white images (following the notation in [6]). So far, the von Kries normalization has been carried out either globally [8], in which case the white images boil down to white points, or locally between a single-point illuminant and a white image [13, 15]. To the best of our knowledge, a CAT has never been applied in a bi-local context. Figure 2 shows the advantage of the bi-local CAT over two local CATs for the purposes of this paper.
The white images are computed directly from the flash image F and the image \(E_p\) as follows.
-
The flash white image is computed by applying the guided filter. We observed that in our context the guided filter outperforms Gaussian and bilateral filters. Experiments show that the Gaussian filter fails to properly transfer the shadows of the image \(E_p\), introducing spurious shadow regions. Moreover, the bilateral filter introduces a lot of visible halo artifacts around the edges. In contrast, the guided filter suppresses such halo artifacts, preserves the shadow boundaries of the image \(E_p\) and robustly sharpens the details of the flash image.
-
The white image of the image \(E_p\) is the image \(E_p\) itself. The image \(E_p\) is obtained from the image \(E_0\), to which we have applied the guided filter.
When applied iteratively, the bi-local CAT robustly adapts the colors of the image F to the colors of the image \(E_p\) and progressively removes flash shadows and reflections. The iterations are performed as follows:
During the first iteration (\(t = 1\)), we carry out the bi-local CAT between the images F and \(E_{p}\). For the following iterations, we perform the bi-local CAT between the result from the previous iteration \(F^{p}_{t-1}\) and the image \(E_{p}\). After N iterations, we obtain the final exposure image \(F^{p}_{N}\). During each iteration t, the flash white image is recomputed from the result \(F_{t-1}^p\), whereas the white image of the image \(E_p\) remains unchanged. The two main properties of the iterative bi-local CAT are discussed hereafter.
Property 1: Darkening When the ratio of the white images is less than 1, i.e., \(I^{E_p}_{w} / I^{F_t^p}_w < 1\), where I stands for the R, G and B channels, the bi-local CAT darkens the flash image F (left-hand plot in Fig. 3). As the white image of the image F is recomputed iteratively, the pixels of the flash image will keep decreasing until reaching an iteration k, for which the white image ratio becomes close to 1 (because the values \((R_c, G_c, B_c)\) remain unchanged after the iteration k, see Eqs. (2), (3) and (4)). To recover information in the under-exposed regions of the brightness multi-exposures, the maximum number of iterations need not exceed k. However, it still needs to be large enough for the bi-local CAT to transfer the scene ambience and remove flash shadows and reflections. More information on the optimal number of iterations is presented in Sect. 3.5.
Property 2: Brightening When the ratio of white images is greater than 1, i.e., \(I^{E_p}_{w} / I^{F_t^p}_w > 1\), the bi-local CAT brightens the flash image (left-hand plot in Fig. 3). The pixel values of the flash image will keep increasing until reaching an iteration l, for which the white image ratio becomes close to 1. After the iteration l, the values of the flash image remain unchanged.
These two properties reveal the ability of the bi-local CAT to increase the dynamic range of the flash image F (by both darkening and brightening). They also reveal the importance of the brightness correction step in our algorithm. If we applied the iterative bi-local CAT only between the flash and the non-flash images (without computing multiple brightness images), we would progressively darken the values of the result \(F_t^p\) by shifting its histogram to the left (right-hand plot in Fig. 3). In this case, the final result would represent the brightness of the non-flash image \(E_0\) rather than the brightness of the scene. In contrast, once we obtain the sequence of multiple brightness images and perform the bi-local CAT, the histogram of the result \(F_t^p\) is shifted both to the left and to the right (we darken the pixels in the shadows and brighten the ones in the highlights).
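Both properties can be observed in a minimal single-channel NumPy sketch of the iteration, in which a plain 3x3 mean stands in for the guided-filter white images and \(D = 1\) (full adaptation) is assumed:

```python
import numpy as np

def mean3(img):
    """3x3 mean with edge padding; a crude stand-in for the guided filter."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def bilocal_cat(F, Ep, N=8, D=1.0, eps=1e-6):
    """Iterative bi-local CAT, single channel (the method applies it to
    spectrally sharpened R, G, B signals). The white image of Ep is fixed,
    while the flash white image is recomputed from the running result."""
    Ep_w = mean3(Ep)
    Ft = F.astype(float).copy()
    for _ in range(N):
        Ft_w = mean3(Ft)
        Ft = (D * Ep_w / (Ft_w + eps) + (1.0 - D)) * Ft  # von Kries update
    return Ft
```

On flat inputs the iteration behaves exactly as Properties 1 and 2 describe: a darker \(E_p\) pulls the flash values down until the white-image ratio reaches 1, and a brighter \(E_p\) pushes them up.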
3.4 Image fusion
The bi-local CAT yields a sequence of multi-exposure images, which are then merged together to recover an HDR image. We use Debevec et al. fusion method [3], which relies on a CRF estimation. In our method, we estimate the CRF from the final multi-exposure images.
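Assuming the response has already been linearised via the estimated CRF, the merge itself reduces to a weighted average of log radiances; the sketch below uses the classic hat weighting of Debevec et al. [3] (an illustrative simplification, not the exact pipeline of our method):

```python
import numpy as np

def merge_hdr(images, exposure_times, eps=1e-6):
    """Merge linearised exposures into a radiance map (Debevec-style).

    images: list of float arrays in [0, 1]; exposure_times: matching times.
    Mid-range pixels get the highest weight via the hat function w(z)."""
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for z, t in zip(images, exposure_times):
        w = 1.0 - np.abs(2.0 * z - 1.0)            # hat: 1 at z = 0.5, 0 at ends
        num += w * (np.log(z + eps) - np.log(t))   # per-pixel log radiance
        den += w
    return np.exp(num / np.maximum(den, eps))
```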
Additionally, we compute the real CRF from a sequence of real multi-exposure images to verify whether or not the CRF, used in our method, is similar to the real one. Figure 4 presents plots of the CRF, computed from final multi-exposure images, and the real CRF. We observe that the CRF, used in our method, approximates the real CRF well. This conclusion is based on several experiments, involving various real image sets. Figure 4 also shows the CRF, computed from the multiple brightness images. The CRF, estimated after the iterative bi-local CAT, is more accurate than the CRF, estimated after the brightness correction. This reveals a key advantage of our method over methods, based only on a brightness correction.
3.5 Choice of optimal values of p and N
The efficiency of our method greatly depends on the parameter p, used in the brightness correction step. We analyze which values of p allow us to compute a plausible approximation of the real CRF.
First, for every iteration \(t \in [1, 10]\) of the bi-local CAT, we compute multi-exposure images by using each value of p from the set \(\{(0.6 + 0.1i)Q_{\max }\}_{i = 0}^{39}\) (for a total of 400 final multi-exposure images). Experiments showed that values of p lower than \(0.6\cdot Q_{\max }\) result in over-exposure of the majority of pixels in the result, and therefore, we exclude them. Second, we compute the structural similarity metric (SSIM) [30] between each of the 400 multi-exposure images and each of several real multi-exposure images of the same scene (taken manually by a professional photographer). We observe a clearly defined peak, maximizing the SSIM value for each iteration t (Fig. 5). The peaks for all iterations t (per real multi-exposure image) correspond to the same p, which remains unchanged for all the different sets of real multi-exposure images, for which we performed this analysis. These sets of images were taken with two different types of cameras. Therefore, the value of p is also independent of the choice of camera. The value of p depends only on the exposure of the real multi-exposure image, but at the same time, it is insensitive to the choice of an exposure for the non-flash image.
We have experimentally derived the value \(p_i\) of the \(i\mathrm{th}\) final multi-exposure image as a function of \(Q_{\max }\) and the image index i, \(i \in \{1, \dots , M\}\):
where C is a constant, which has experimentally been set to 0.7. The sign S is either equal to 1 for an increase of the non-flash brightness \(Q_0\), or equal to \(-1\) for a brightness decrease. We have experimentally found that the use of \(M = 6\) final multi-exposure images (out of which one is the non-flash image) helps generate HDR images close to the ground truth. Our experiments have indicated that the exposure value \(X_i\) of the \(i\mathrm{th}\) final multi-exposure image can be expressed as \(X_i = X_0 + S\cdot i\), where \(X_0\) is the exposure of the image \(E_0\). The final multi-exposure images together with the computed exposure values allow us to recover plausible HDR images.
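The exposure values themselves are straightforward to generate. The helper below assumes, as one possible split not fixed by the text, that the \(M = 6\) images are divided between brightened (\(S = 1\)) and darkened (\(S = -1\)) exposures around \(X_0\):

```python
def exposure_values(X0, M=6):
    """Exposure values X_i = X_0 + S*i for the final multi-exposure stack.

    X0 is the exposure value of the non-flash image E_0 (itself included).
    The brightened/darkened split below is an assumption for illustration."""
    half = (M - 1) // 2
    evs = [X0]
    evs += [X0 + i for i in range(1, M - half)]  # brightened, S = +1
    evs += [X0 - i for i in range(1, half + 1)]  # darkened,  S = -1
    return sorted(evs)
```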
In our experiments, the maximum SSIM score was reached during the eighth iteration. We therefore chose to perform \(N=8\) iterations of the bi-local CAT.
4 Results and evaluation
In this section, we present our HDR results and evaluate their similarity to the ground truth.
4.1 Experimental setup
We have built a data set of images of real-world scenes, consisting of flash and non-flash images and real multi-exposure images. The flash and non-flash images were taken in a short period of time (less than one second) and were used to compute the results, shown in this paper. Additionally, we took real multi-exposure images to recover a real HDR image per scene. A professional photographer has chosen the best exposure values in order to capture the finest details in the shadows and the highlights. To make the real HDR images representative of the ground truth, we used a tripod to avoid misalignment during the shooting process. We compare our results to the ground truth in the evaluation part of this section.
4.2 Recommendations for the choice of non-flash images
In our method, we choose the non-flash image \(E_{0}\) to be the lowest exposed image with less than 5% black pixels. Despite the fact that the iterative bi-local CAT is able to recover the missing details in black pixel regions, in the case when the percentage of black pixels exceeds 5%, the non-flash image becomes too under-exposed and noisy. This results in a trade-off between the fidelity of the result and the successful noise removal when applying our method. The noisier the image, the bigger the kernel size of the guided filter and the greater the loss of details. The non-flash image \(E_0\) may not be the lowest exposed image for a given scene; however, its exposure time is still sufficiently short to allow for the flash and non-flash images to be taken subsequently without the use of a tripod.
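This rule of thumb is easy to check programmatically. The sketch below counts black pixels in a normalised RGB image, where the cutoff for "black" (one 8-bit level here) is an assumption:

```python
import numpy as np

def acceptable_non_flash(img, black_cutoff=1.0 / 255.0, max_black_frac=0.05):
    """Return True if fewer than 5% of the pixels of img (HxWx3, values in
    [0, 1]) are black, i.e., if img is usable as the non-flash image E_0."""
    black = np.all(img <= black_cutoff, axis=-1)  # per-pixel black test
    return float(black.mean()) < max_black_frac
```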
4.3 Evaluation
Figure 6 presents an HDR result, obtained with our method, as well as a real HDR image of the scene. To evaluate the similarity between our HDR result and the real HDR image, we compute their luminance histograms (Fig. 6). Our method recovers the dynamic range of the real HDR images in our data set (resulting in the same number of f-stops as the real HDR images). The luminance distribution of our results is strongly correlated with the ground-truth luminance. Moreover, we adopt the perceptual metric HDR-VDP-2 [18] to visualize the perceptual difference between our HDR results and the ground truth. Red regions in the HDR-VDP-2 color-coded map indicate deviations from the ground-truth luminance. The color-coded maps in Fig. 6 reveal an overall high perceptual similarity between our result and the ground truth.
The real HDR images aim to represent the ground truth by merging a number of multi-exposure images. The more multi-exposure images we merge, the closer the HDR image is to the real-world scene. To show how close our results are to the ground truth, in Fig. 7 we compare them to several HDR images, obtained by combining two, three and five real multi-exposure images. The HDR-VDP-2 metric indicates that our HDR result is visually similar to the HDR image, computed from five real multi-exposure images. Moreover, the log2 luminance distribution of our HDR image is highly correlated with that of the real HDR image, obtained from five real multi-exposure images. In this sense, our result is closer to the ground truth than the HDR images, recovered by merging two and three real multi-exposure images.
Despite the similarity with HDR images, computed from five real multi-exposure images, our results may differ from real HDR images at shadow areas. While adapting to the colors of the image \(E_p\), our bi-local CAT preserves the details of the flash image in the shadows of the result. Conversely, taking low-exposure images in dark environments may cause noise in the shadows and compromise the integrity of the real HDR image. Figure 8 illustrates a key property of our method, i.e., the detail recovery. Our HDR image preserves the DVD labels in the shadows of the scene, unlike the HDR image, obtained from five real multi-exposures.
The main advantage of our method over the multi-exposure approach is illustrated in Fig. 9. The flash and non-flash images, shown in the figure, were taken with a handheld camera, imitating a typical user case. Our method successfully recovers HDR images of non-still (slow moving) objects (such as people, posing for portraits) and avoids ghosting artifacts.
Finally, another advantage of our method consists in the automatic removal of soft shadows from the flash image, carried out by the bi-local CAT. If the flash image contains shadows, created by the flash, there is a risk that they will appear in the final result \(F^{p}_{N}\) (and if they do, the result would look unnatural). It turns out, though, that eight iterations of our bi-local CAT are enough to completely remove soft shadows from our HDR result, as illustrated in Fig. 10. Our method also reduces reflections, caused by the flash.
4.4 Non-flash image enhancement
Our method can be used in the context of non-flash image enhancement. We increase the quality of a non-flash image in terms of detail recovery and scene illumination enhancement, as shown in Fig. 11. Given flash and non-flash images, we automatically recover an HDR image and then we use various tone-mapping operators to visualize it on an LDR screen. Figure 12 shows a comparison between our method and two state-of-the-art methods, all used in the context of non-flash image enhancement. Mertens et al. [21] fail to properly combine the flash and non-flash images, because the flash image is already well-exposed. Eisemann et al. [5] produce a single image as an outcome of their method. In contrast, our method provides a number of enhanced images, each resulting from a different tone-mapping operator.
5 Conclusion
In this paper, we proposed a novel method for creating HDR images, relying on only two images as an input—flash and non-flash images. Our method automatically creates multiple exposure images by brightening the non-flash image and bi-locally adapting the colors of the flash image to the brightened image. Our method is used to compute HDR images, which do not significantly differ from HDR images, obtained by merging five manually taken multi-exposure images. We proposed a method for handling challenging dark environment scenes, in which the non-flash image is often unreliable as it contains noise and lacks information. Moreover, our method can be used in the context of non-flash image enhancement and in comparison with existing methods, it provides various enhancement options. Due to the limited reach of the flash, our approach is not well suited to outdoor scenes, which we leave for future work.
References
Aguerrebere, C., Almansa, A., Gousseau, Y., Delon, J., Muse, P.: Single shot high dynamic range imaging using piecewise linear estimators. In: 2014 IEEE International Conference on Computational Photography (ICCP), pp. 1–10. IEEE (2014)
Bist, C., Cozot, R., Madec, G., Ducloux, X.: Style aware tone expansion for HDR displays. In: Proceedings of Graphics Interface, pp. 57–63 (2016)
Debevec, P.E., Malik, J.: Recovering high dynamic range radiance maps from photographs. In: ACM SIGGRAPH 2008 Classes, p. 31. ACM (2008)
Durand, F., Dorsey, J.: Fast bilateral filtering for the display of high-dynamic-range images. ACM Trans. Graph. (TOG) 21(3), 257–266 (2002)
Eisemann, E., Durand, F.: Flash photography enhancement via intrinsic relighting. In: ACM Transactions on Graphics (Proceedings of Siggraph Conference), vol. 23. ACM Press (2004). http://maverick.inria.fr/Publications/2004/ED04
Fairchild, M.D.: Color Appearance Models. Wiley, Chichester, UK (2013)
Fairchild, M.D., Johnson, G.M.: iCAM framework for image appearance, differences, and quality. J. Electron. Imaging 13(1), 126–138 (2004)
Frigo, O., Sabater, N., Demoulin, V., Pierre, H.: Optimal transportation for example-guided color transfer. In: 12th Asian Conference on Computer Vision (ACCV) (2014)
Gallo, O., Gelfand, N., Chen, W.C., Tico, M., Pulli, K.: Artifact-free high dynamic range imaging. In: 2009 IEEE International Conference on Computational Photography (ICCP), pp. 1–7. IEEE (2009)
Granados, M., Kim, K.I., Tompkin, J., Theobalt, C.: Automatic noise modeling for ghost-free HDR reconstruction. ACM Trans. Graph. (TOG) 32(6), 201 (2013)
Gryaditskaya, Y., Pouli, T., Reinhard, E., Myszkowski, K., Seidel, H.P.: Motion aware exposure bracketing for HDR video. In: Computer Graphics Forum, vol. 34, pp. 119–130. Wiley Online Library (2015)
He, K., Sun, J., Tang, X.: Guided image filtering. In: European Conference on Computer Vision, pp. 1–14. Springer (2010)
Hristova, H., Le Meur, O., Cozot, R., Bouatouch, K.: Style-aware robust color transfer. In: EXPRESSIVE International Symposium on Computational Aesthetics in Graphics, Visualization, and Imaging (2015)
Kalantari, N.K., Shechtman, E., Barnes, C., Darabi, S., Goldman, D.B., Sen, P.: Patch-based high dynamic range video. ACM Trans. Graph. 32(6), 202–211 (2013)
Kuang, J., Johnson, G.M., Fairchild, M.D.: iCAM06: a refined image appearance model for HDR image rendering. J. Vis. Commun. Image Represent. 18(5), 406–414 (2007)
Li, Z., Zheng, J.: Single image brightening via exposure fusion. In: ICASSP (2016)
Mann, S., Picard, R.: On being "undigital" with digital cameras: extending dynamic range by combining differently exposed pictures. In: Proceedings of Society for Imaging Science and Technology's 48th Annual Conference (1995)
Mantiuk, R., Kim, K.J., Rempel, A.G., Heidrich, W.: HDR-VDP-2: a calibrated visual metric for visibility and quality predictions in all luminance conditions. In: ACM Transactions on Graphics (TOG), vol. 30, p. 40. ACM (2011)
Mantiuk, R., Myszkowski, K., Seidel, H.P.: A perceptual framework for contrast processing of high dynamic range images. ACM Trans. Appl. Percept. (TAP) 3(3), 286–308 (2006)
Matsuoka, R., Baba, T., Okuda, M., Shirai, K.: High dynamic range image acquisition using flash image. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1612–1616. IEEE (2013)
Mertens, T., Kautz, J., Van Reeth, F.: Exposure fusion: a simple and practical alternative to high dynamic range photography. In: Computer Graphics Forum, vol. 28, pp. 161–171. Wiley Online Library (2009)
Nayar, S.K., Mitsunaga, T.: High dynamic range imaging: spatially varying pixel exposures. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000, vol. 1, pp. 472–479. IEEE (2000)
Nayar, S.K., Narasimhan, S.G.: Assorted pixels: multi-sampled imaging with structural models. In: Computer Vision ECCV 2002, pp. 636–652. Springer (2002)
Petschnigg, G., Szeliski, R., Agrawala, M., Cohen, M., Hoppe, H., Toyama, K.: Digital photography with flash and no-flash image pairs. In: ACM Transactions on Graphics (TOG), vol. 23, pp. 664–672. ACM (2004)
Ramirez Orozco, R., Loscos, C., Martin, I., Artusi, A.: Multiview HDR video sequence generation. In: Dufaux, F., Le Callet, P., Mantiuk, R., Mrak, M. (eds.) High Dynamic Range Video: From Acquisition to Display and Applications, pp. 121–138. Academic Press, Elsevier (2016). ISBN 978-0-08-100412-8
Sen, P., Kalantari, N.K., Yaesoubi, M., Darabi, S., Goldman, D.B., Shechtman, E.: Robust patch-based HDR reconstruction of dynamic scenes. ACM Trans. Graph. 31(6), 203 (2012)
Serrano, A., Heide, F., Gutierrez, D., Wetzstein, G., Masia, B.: Convolutional sparse coding for high dynamic range imaging. In: Computer Graphics Forum, vol. 35. Wiley-Blackwell (2016)
Sidibé, D., Puech, W., Strauss, O.: Ghost detection and removal in high dynamic range images. In: Signal Processing Conference, 2009 17th European, pp. 2240–2244. IEEE (2009)
Tocci, M.D., Kiser, C., Tocci, N., Sen, P.: A versatile HDR video production system. ACM Trans. Graph. (TOG) 30(4), 41 (2011)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Yuan, L., Sun, J., Quan, L., Shum, H.Y.: Image deblurring with blurred/noisy image pairs. In: ACM Transactions on Graphics (TOG), vol. 26, p. 1. ACM (2007)
Hristova, H., Le Meur, O., Cozot, R. et al. High-dynamic-range image recovery from flash and non-flash image pairs. Vis Comput 33, 725–735 (2017). https://doi.org/10.1007/s00371-017-1399-0