1 Introduction

Photometric calibration of a camera is necessary, particularly when applying physics-based computer vision methods, such as photometric stereo (Woodham 1980; Ikeuchi 1981), shape from shading (Ikeuchi and Horn 1981; Zhang et al. 1999), color constancy (Hordley 2006; Weijer et al. 2007; Tan et al. 2004; Kawakami and Ikeuchi 2009), illumination estimation (Sato et al. 2003; Li et al. 2003; Lalonde et al. 2009), and surface reflectance estimation (Debevec et al. 2000; Hara et al. 2005; Haber et al. 2009). There have been a number of studies on the automatic calibration of camera response functions (or gamma functions) and on vignetting correction (Lin et al. 2004; Takamatsu et al. 2008; Kuthirummal et al. 2008). These methods produce images whose intensity values are strictly proportional to the scene radiance.

In the computer vision literature, less attention has been paid to estimating camera spectral sensitivities and white balance settings (Footnote 1), despite the fact that both are crucial for color calibration between different types of cameras. This lack of attention arises because physics-based methods usually assume that all images are captured by the same camera; thus, the color space used in the whole process is the same. However, when different types of cameras are used, color calibration becomes vital, since without it, identical scene radiance will result in different color values.

To highlight the effects of camera spectral sensitivities and white balance settings, Fig. 1 shows the same scene captured by three different consumer cameras. As shown in the figure, the colors of the scene vary because of the different camera spectral sensitivities and white balance settings. To further emphasize the differences, the caption of the figure also compares the chromaticity values.

Fig. 1

The color differences of images taken by different cameras. The left, middle and right images are taken by Canon IXY 900 IS, Casio EX-Z 1050 and Panasonic DMC-FX 100, respectively. The images were adjusted to have the same scale of intensity values, to emphasize the color differences. The averaged chromaticity values \((r,\,g,\,b)\) of the red squares are \((0.45,\,0.32,\,0.23)\) for Canon, \((0.39,\,0.32,\,0.29)\) for Casio, and \((0.41,\,0.32,\,0.27)\) for Panasonic. The color differences are caused by different camera spectral sensitivities and white balance settings

The intensity formation of colored images can be modeled as:

$$\begin{aligned} I_c = \int \limits _{\varOmega } L(\lambda ) q_c(\lambda ) d \lambda , \end{aligned}$$

where \(I_c\) is the intensity at channel \(c,\) with \(c \in \{r,\,g,\,b\},\,\varOmega \) is the range of the visible wavelengths, and \(L\) is the incoming spectral radiance (Footnote 2). Following the von Kries model in computational color constancy, we assume that for different white balance settings, cameras automatically multiply the intensity of each color channel by a different scaling factor (\(k_c\)), namely, \(q_c = k_c q_c^\prime ,\) where \(q_c^\prime \) and \(k_c\) are the spectral sensitivity and the white balance factor for color channel \(c,\) respectively.

Based on the last equation, our goal is to estimate \(q_c\) from given \(I_c.\) This means that given image intensity values, we intend to estimate the camera spectral sensitivities and white balance setting together, without any intention to separate them (\(q_c = k_c q^\prime _c\)). Note that, \(k_c\) is estimated up to a scale, and thus its relative value can be obtained from \(q_c.\)
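As an illustration of Eq. (1) and the von Kries assumption, the following minimal sketch discretizes the integral on a uniform wavelength grid. The Gaussian sensitivities, the flat spectrum, and the names `render_rgb`, `q_prime` and `k` are hypothetical choices made only for this example; they are not measured data.

```python
import numpy as np

def render_rgb(radiance, q_prime, k, dlam):
    """Riemann-sum version of Eq. (1) with q_c = k_c * q'_c.

    radiance: (w,) spectral radiance L(lambda) on a uniform wavelength grid
    q_prime:  (3, w) sensitivities q'_c(lambda) for c in (r, g, b)
    k:        (3,) white balance gains k_c
    dlam:     wavelength step of the grid
    """
    q = k[:, None] * q_prime                      # combined term q_c(lambda)
    return (q * radiance[None, :]).sum(axis=1) * dlam

# Hypothetical inputs: Gaussian sensitivities and a flat spectrum.
wavelengths = np.arange(400.0, 701.0, 10.0)
centers = np.array([600.0, 540.0, 460.0])
q_prime = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / 30.0) ** 2)
radiance = np.ones_like(wavelengths)

I_unit = render_rgb(radiance, q_prime, np.ones(3), 10.0)
k = np.array([2.0, 1.0, 0.5])
I_wb = render_rgb(radiance, q_prime, k, 10.0)
```

In this sketch, changing the white balance gains only rescales each channel, which is why \(k_c\) and \(q_c^\prime \) cannot be separated from \(I_c\) alone and are estimated together as \(q_c.\)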

In the literature, one of the basic techniques to achieve the goal is to use a monochromator (Vora et al. 1997), a special device that can transmit a selected narrow band of wavelengths of light. The method provides accurate estimation, and hence is commonly used. Other methods that do not use a monochromator require both input images and the corresponding scene spectral radiances (Hubel et al. 1994; Sharma and Trussell 1993; Finlayson et al. 1998; Barnard and Funt 2002; Ebner 2008; Thomson and Westland 2001).

Unlike the existing methods, in this paper, we introduce a novel method that uses only images without requiring additional devices. The basic idea of our method is, first, to estimate the sky spectral radiance \(L(\lambda )\) through a sky image \(I_c,\) and then to obtain the mixture of the spectral sensitivities and white balance setting, \(q_c(\lambda ),\) by solving Eq. (1). To our knowledge, this approach is novel, particularly the use of images with sky regions.

To estimate the sky spectral radiance, we calculate the turbidity of the sky from image intensities, assuming the sun direction with respect to the camera viewing direction can be extracted. The calculated turbidity provides the CIE chromaticities that can then be converted to the spectral radiance using the formulas of the Judd daylight phases (Judd et al. 1964).

Having the input sky image and its corresponding spectra, we estimate the camera spectral sensitivities by solving the linear system derived from Eq. (1). However, this solution can be unstable if the variances of the input colors are small, which is the case for sky images. To overcome the problem, we utilize precomputed basis functions.

The main contribution of this paper is a novel method of spectral sensitivity and white balance estimation from images. Other contributions are as follows. First, we improve the sky turbidity estimation of Lalonde et al. (2010), so that a wide variety of cameras with different spectral sensitivities and white balance settings can be handled without calibration. Second, we provide a publicly available camera spectral sensitivity database that consists of twelve different cameras (Zhao 2013). Third, we apply the estimated camera spectral sensitivities to physics-based color correction in outdoor scenes, which, according to our experiment, produces better results than a color transfer method (Reinhard et al. 2001).

Our method relies on a few assumptions. First, it assumes the presence of the sky in the input images. Ideally, the sky is clear; however, the method performs quite robustly even when the sky is hazy or partially cloudy. Second, it assumes that the sun direction with respect to the camera viewing direction can be extracted. If we have the camera at hand, we can arrange it in such a way that this information can be extracted from the image. If we do not have the camera (e.g., we utilize prestored collections of images, such as those available on the Internet or in old albums), the EXIF tag (the time when the image was taken), the site geolocation and the pose of a reference object in the site can be used to determine the camera viewing direction and the sun direction. While the requirement of a known geolocation and a reference object sounds restrictive, if we apply the method to landmark objects (such as the Statue of Liberty, the Eiffel Tower, etc.), such information can normally be obtained. Moreover, online services like Google Earth or Google Maps can also be used to determine the geolocation of the site. Third, we share the assumption that is used in the sky model (Preetham et al. 1999): the atmosphere can be modeled using sky turbidity, which is the ratio of the optical thickness of haze versus molecules (Footnote 3).

The rest of the paper is organized as follows: Sect. 2 briefly reviews the related work. Section 3 describes the sky turbidity estimation and the calculation of spectral radiance from turbidity. Section 4 explains the estimation of camera spectral sensitivity using basis functions. Section 5 provides the detailed implementation. Section 6 shows the experimental results, followed by Sect. 7, which introduces an application that corrects colors between different cameras based on the estimated camera spectral sensitivities. Section 8 discusses the limitations and the accuracy of the method. Finally, Sect. 9 concludes our paper.

2 Related Work

Most of the existing methods of camera spectral sensitivity estimation (Barnard and Funt 2002; Thomson and Westland 2001) solve the linear system derived from Eq. (1), given a number of spectra and their corresponding \(RGB\) values. However, such estimation is often unstable, since spectral representations of materials and illumination live in a low-dimensional space (Slater and Healey 1998; Parkkinen et al. 1989), which implies that the dimension of spectra is insufficient to recover high-dimensional camera spectral sensitivity information. To make the estimation stable, further constraints are required in the optimization process, and the existing methods mostly differ in the constraints they use.

Pratt and Mancill (1976) impose a smoothing matrix on pseudo-matrix inversion, compare it with the Wiener estimation, and claim that the Wiener estimation produces better results. Hubel et al. (1994) later confirm that Wiener estimation does provide smoother results than those of the pseudo-matrix inversion. Sharma and Trussell (1993) use a formulation based on set theory and introduce a few constraints on camera spectral sensitivities, such as non-negativity, smoothness, and error variance. Finlayson et al. (1998) represent camera spectral sensitivities by a linear combination of the first 9 or 15 Fourier basis functions, and use a constraint that a camera spectral sensitivity must be uni- or bimodal. Barnard and Funt (2002) use all the constraints, replace the absolute intensity error with the relative intensity error, and estimate the camera spectral sensitivities and response function at once. Ebner (2008) uses an evolution strategy along with the positivity and the smoothness constraints. Thomson and Westland (2001) use the Gram–Charlier expansion (Frieden 1983) for basis functions to reduce the dimensions of camera spectral sensitivities. Nonlinear fitting is performed in the method.

The main limitation of the methods mentioned above is the requirement of the spectral radiance, which is problematic if the camera is not at hand, or if no additional devices (such as a monochromator or spectrometer) are available. In contrast, our method does not require known scene spectral radiance. Moreover, in computing the camera spectral sensitivities we do not use an iterative technique that is required in the existing methods, although in the sub-process of estimating turbidity, we do use an optimization process.

It should be noted that several computer vision methods have utilized the radiometric sky model. Yu and Malik (1998) use the sky model of Perez et al. (1993) to calculate the sky radiance from photographs, in the context of recovering the photometric properties of architectural scenes. Lalonde et al. (2010) exploit the visible portion of the sky and estimate turbidity to localize clouds in sky images, which is similar to the technique we use. However, their method cannot be used directly for our purpose, since its optimization is based on the \(xyY\) color space. To convert image \(RGB\) into \(xyY,\) a linear matrix must be estimated from known camera sensitivities and white balance settings, which are unknown in our case. Thus, instead of using \(xyY,\) we assume that the relative intensity (i.e., the ratio of a sample point’s intensity over a reference point’s intensity) is independent of the camera up to a global scale factor. By fitting the relative intensity between pixels to that of the sky model, our method can estimate the turbidity, which we consider to be an improvement over the method of Lalonde et al. (2010).

3 Estimating Sky Spectra

Camera spectral sensitivities can be estimated if both image \(RGB\) values and the corresponding spectra, respectively \(I_c\) and \(L(\lambda )\) in Eq. (1), are known. However, in our problem setting, only pixel values \(I_c\) are known. To overcome this, our idea is to infer spectra \(L(\lambda )\) from pixel values using sky images, since turbidity can be estimated from sky images, and sky spectra can be obtained from turbidity. This section focuses on this process. Later, having obtained the sky spectra and the corresponding \(RGB\) values, the camera spectral sensitivities can be calculated from Eq. (1).

3.1 Sky Turbidity Estimation

The appearance of the sky, e.g. the color and the clearness, is determined by the scattering and absorption of the solar irradiance caused by air molecules, aerosols, ozone, water vapor and mixed gases, some of which change according to the climate conditions (Chaiwiwatworakul and Chirarattananon 2004). Aerosols are attributed to many factors, such as volcanic eruptions, forest fires, etc., and are difficult to characterize precisely. However, a single heuristic parameter, namely turbidity, has been studied and used in the atmospheric sciences (Preetham et al. 1999). Higher turbidity implies more scattering and thus a whiter sky.

To estimate turbidity, our basic idea is to match the brightness distribution between an actual image and the sky model proposed by Preetham et al. (1999). The model describes the correlation between the brightness distribution and the sky turbidity based on the simulations of various sun positions and turbidity values. According to it, the luminance \(Y\) of the sky in any viewing direction with respect to the luminance at the zenith \({Y_z}\) is given by:

$$\begin{aligned} Y =\frac{\mathcal{{F}}(\theta ,\,\gamma )}{\mathcal{{F}}(0,\,\theta _s)}\,Y_z, \end{aligned}$$

where \(\mathcal{F}(., .)\) is the sky brightness distribution function of turbidity developed by Perez et al. (1993), \(\theta _s\) is the zenith angle of the sun, \({\theta }\) is the zenith angle of the viewing direction, and \({\gamma }\) is the angle of the sun direction with respect to the camera viewing direction, as shown in Fig. 2. More details of calculating the sky luminance are provided in Appendix A.

Fig. 2

The coordinates for specifying the sun position and the viewing direction in the sky hemisphere

Hence, to estimate turbidity (\(T\)), our method minimizes the following error function:

$$\begin{aligned} Err=\sum _{i=1}^{n}{\left| \frac{Y_i(T)}{Y_{ref}(T)}-\frac{J_i}{J_{ref}} \right| }, \end{aligned}$$

where \(n\) represents the number of sample points and \({Y}/Y_{ref}\) is the luminance ratio of the sky, which can be calculated from \(\mathcal{{F}}(\theta ,\,\gamma )/ \mathcal{{F}}(\theta _{ref},\,\gamma _{ref}),\) given the sun direction and the turbidity. \({Y_{ref}}\) is the luminance of a reference point, and we found that it can be the zenith as in Eq. (2), or any other point in the visible sky portion. \(J\) is the total intensity of a pixel:

$$\begin{aligned} J=I_r+I_g+I_b, \end{aligned}$$

where \(I_c\) is the image intensity defined in Eq. (1). \(J_{ref}\) is the total intensity of a reference pixel. Since we assume the camera gamma function is linear, the image intensity ratio (\(J_i/J_{ref}\)) is proportional to the luminance ratio of the sky (\(Y_i/Y_{ref}\)), regardless of the camera sensitivities and white balance setting. The error function is minimized by Particle Swarm Optimization (Kennedy and Eberhart 1995), which is generally more robust than the Levenberg–Marquardt algorithm when there are several local minima.
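The error function of Eq. (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the coefficients in `perez_coeffs` are the luminance distribution coefficients as they are commonly quoted from Preetham et al. (1999) and should be checked against Appendix A; a plain grid search over \(T\) replaces Particle Swarm Optimization to keep the example short; and the sample directions and the synthetic intensities \(J\) are hypothetical.

```python
import numpy as np

def perez_coeffs(T):
    # Assumed luminance coefficients (linear in turbidity T), as commonly
    # quoted from Preetham et al. (1999); verify against Appendix A.
    return (0.1787 * T - 1.4630,
            -0.3554 * T + 0.4275,
            -0.0227 * T + 5.3251,
            0.1206 * T - 2.5771,
            -0.0670 * T + 0.3703)

def perez_F(theta, gamma, T):
    """Sky brightness distribution F(theta, gamma) for turbidity T (radians)."""
    A, B, C, D, E = perez_coeffs(T)
    return ((1.0 + A * np.exp(B / np.cos(theta)))
            * (1.0 + C * np.exp(D * gamma) + E * np.cos(gamma) ** 2))

def sun_angle(theta, phi, theta_s, phi_s):
    """Angle gamma between a viewing direction and the sun direction."""
    cos_g = (np.sin(theta) * np.sin(theta_s) * np.cos(phi - phi_s)
             + np.cos(theta) * np.cos(theta_s))
    return np.arccos(np.clip(cos_g, -1.0, 1.0))

def turbidity_error(T, theta, gamma, J, ref=0):
    """Eq. (3): sum of absolute differences between model and image ratios."""
    F = perez_F(theta, gamma, T)
    return np.abs(F / F[ref] - J / J[ref]).sum()

def estimate_turbidity(theta, gamma, J, grid=np.linspace(2.0, 10.0, 161)):
    # Grid search stands in for Particle Swarm Optimization in this sketch.
    errs = [turbidity_error(T, theta, gamma, J) for T in grid]
    return float(grid[int(np.argmin(errs))])

# Synthetic check: intensities generated from the model with T = 2.5 and an
# arbitrary camera-dependent global scale (3.7), which cancels in the ratios.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.radians(80.0), 100)
phi = rng.uniform(0.0, 2.0 * np.pi, 100)
gamma = sun_angle(theta, phi, np.radians(40.0), 0.0)
J = 3.7 * perez_F(theta, gamma, 2.5)
T_est = estimate_turbidity(theta, gamma, J)
```

Because only the ratios \(J_i/J_{ref}\) enter the error, the global scale applied to the synthetic intensities has no effect on the estimate, which mirrors the argument that the ratio does not depend on the camera sensitivities or white balance setting.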

To minimize the error function, the sun direction and the camera viewing direction are required. In this respect, we consider two cases:

  1. The easier case is when a single image is taken using a fish-eye lens or an omnidirectional camera, since, assuming the optical axis of the camera is perpendicular to the ground, we can fit an ellipse to saturated pixels, and find its center as the sun position in the sky hemisphere shown in Fig. 2.

  2. The harder case is when the sky is captured by a normal lens, and the sun is not visible in the input image. In this circumstance, we search images that include a reference object with a known pose and geolocation. The pose and geolocation of a reference object (particularly a landmark object) are in many cases searchable on the Internet. The camera viewing direction can, then, be recovered using a few images that include the reference object by using SfM (structure from motion). The sun position is estimated from the time stamp in the EXIF tag and the geolocation of the object. The details of calculating the sun direction when the sun is not visible in the image are given in Appendix B.

Aside from the two cases above, when clouds are present in the input image, the turbidity estimation tends to be erroneous. To tackle this, our method employs a RANSAC-type approach: it estimates the turbidity from a sample of sky pixels, repeats this procedure, and selects the turbidity that has the largest number of inliers with the smallest error.
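The RANSAC-type selection can be sketched as below. This is a generic illustration under our own assumptions: `ratio_model` is a hypothetical callable that returns the predicted intensity ratios of all sample points for a given turbidity, the inlier tolerance and the number of iterations are arbitrary, and the toy linear model in the example merely stands in for the sky model.

```python
import numpy as np

def ransac_turbidity(ratio_model, observed, grid,
                     n_sample=5, n_iter=200, tol=0.05, seed=0):
    """Return (turbidity, inlier count) with most inliers, then least error."""
    rng = np.random.default_rng(seed)
    preds = np.stack([ratio_model(T) for T in grid])     # (len(grid), n)
    best_T, best_count, best_err = None, -1, np.inf
    for _ in range(n_iter):
        # Fit T to a random subset of the sample points (L1 ratio error).
        idx = rng.choice(observed.size, size=n_sample, replace=False)
        j = int(np.argmin(np.abs(preds[:, idx] - observed[idx]).sum(axis=1)))
        # Score the candidate on all points.
        resid = np.abs(preds[j] - observed)
        inliers = resid < tol
        count = int(inliers.sum())
        err = float(resid[inliers].mean()) if count else np.inf
        if count > best_count or (count == best_count and err < best_err):
            best_T, best_count, best_err = float(grid[j]), count, err
    return best_T, best_count

# Toy stand-in for the sky model: ratios grow linearly with turbidity.
rng = np.random.default_rng(1)
g = rng.uniform(0.1, 1.0, 100)
toy_model = lambda T: 1.0 + g * T
observed = toy_model(3.0)
observed[:20] += rng.uniform(0.5, 1.5, 20)    # 20 "cloud" pixels, too bright
grid = np.linspace(2.0, 10.0, 161)
T_est, n_inliers = ransac_turbidity(toy_model, observed, grid)
```

In this toy setting the brightened points are rejected as outliers, and the turbidity is recovered from the remaining 80 points.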

3.2 Sky Spectra from Turbidity

Preetham et al. (1999) also introduce the correlation of turbidity and the CIE chromaticity (\({x}\) and \({y}\)). The CIE chromaticity can be calculated as follows:

$$\begin{aligned} {x} = {x_z}\frac{\mathcal{{F}}(\theta ,\,\gamma )}{\mathcal{{F}}(0,\, \theta _s)},\ \quad \text{ and } \quad \ {y}= {y_z}\frac{\mathcal{{F}}(\theta ,\,\gamma )}{\mathcal{{F}}(0,\,\theta _s)}, \end{aligned}$$

where \({x_z}\) and \({y_z}\) represent the zenith chromaticities, and are functions of turbidity. For computing \({x}\) and \({y}\) in detail, see Appendix C.

Having obtained \(x\) and \(y\) in Eq. (5), the sky spectra can be calculated using the known basis functions of daylights (Judd et al. 1964; Wyszecki and Stiles 1982). The sky spectrum \({S_D(\lambda )}\) is given by a linear combination of the mean spectrum and the first two eigenvector functions:

$$\begin{aligned} {S_D(\lambda )} = {S_0(\lambda )}+{M_1S_1(\lambda )}+{M_2S_2(\lambda )}, \end{aligned}$$

where scalar coefficients \({M_1}\) and \({M_2}\) are determined by chromaticity values \({x}\) and \({y}.\) Computing \({M_1}\) and \({M_2}\) from \({x}\) and \({y}\) is also given in Appendix C. Three basis functions \(S_0(\lambda ),\,S_1(\lambda )\) and \(S_2(\lambda )\) can be found in Judd et al. (1964), Wyszecki and Stiles (1982).
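For reference, the mapping from chromaticity to the coefficients of Eq. (6) can be sketched as follows. The numerical constants are the standard CIE daylight relations as we recall them from Wyszecki and Stiles (1982); they are an assumption of this sketch and should be checked against Appendix C. The tabulated basis functions \(S_0,\,S_1,\,S_2\) are not reproduced here, so the arrays in the example are placeholders only.

```python
import numpy as np

def daylight_coefficients(x, y):
    """M1 and M2 of Eq. (6) from CIE chromaticity (x, y).

    Constants assumed from the standard CIE daylight formulas.
    """
    denom = 0.0241 + 0.2562 * x - 0.7341 * y
    m1 = (-1.3515 - 1.7703 * x + 5.9114 * y) / denom
    m2 = (0.0300 - 31.4424 * x + 30.0717 * y) / denom
    return m1, m2

def daylight_spectrum(x, y, S0, S1, S2):
    """Eq. (6): S_D = S0 + M1 * S1 + M2 * S2, all sampled on one grid."""
    m1, m2 = daylight_coefficients(x, y)
    return S0 + m1 * S1 + m2 * S2

# Coefficients for a chromaticity close to average daylight (D65).
m1_d65, m2_d65 = daylight_coefficients(0.3127, 0.3290)

# Placeholder basis arrays (NOT the tabulated CIE functions).
w = 31
S0, S1, S2 = np.ones(w), np.linspace(-1.0, 1.0, w), np.linspace(1.0, -1.0, w)
spectrum = daylight_spectrum(0.3127, 0.3290, S0, S1, S2)
```

To obtain actual sky spectra, the placeholder arrays would be replaced by the tabulated values of \(S_0(\lambda ),\,S_1(\lambda )\) and \(S_2(\lambda )\) from Judd et al. (1964).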

4 Estimating Camera Spectral Sensitivity

Given a number of input image \(RGB\) values and the corresponding spectra, the camera spectral sensitivities can be computed using Eq. (1), of which the matrix notation is expressed as

$$\begin{aligned} \mathbf I = \mathbf q ^{t}\mathbf L , \end{aligned}$$

where \(\mathbf I \) is a \(3\times n\) pixel matrix, \(\mathbf L \) is a \(w \times n\) spectral matrix, and \(\mathbf q \) is a \(w\times 3\) camera-sensitivity matrix. \(n\) represents the number of pixels, and \(w\) represents the number of wavelengths. Provided sufficient data for \(\mathbf I \) and \(\mathbf L ,\) we can estimate \(\mathbf q ^{t}\) by computing \(\mathbf I \mathbf L ^{+},\) where \(\mathbf L ^+\) is the pseudo-inverse of \(\mathbf L .\)

Unfortunately, the rank of the matrix \(\mathbf L \) has to be at least \(w,\) to calculate the pseudo-inverse \(\mathbf L ^+\) stably. In our case, the representation of the sky spectral radiance is three dimensional (3D) since we calculate the spectral radiance using the basis functions in Eq. (6). This means that the direct matrix inversion method would produce erroneous results.
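The rank problem can be illustrated numerically. In the sketch below, random arrays stand in for the three daylight basis functions and for the camera sensitivities (both hypothetical); the point is only that spectra generated from three basis functions yield a matrix \(\mathbf L \) of rank three, so \(\mathbf I \mathbf L ^{+}\) reproduces the pixel values without recovering the true sensitivities.

```python
import numpy as np

rng = np.random.default_rng(0)
w, n = 31, 200                                   # wavelengths, pixels

# Spectra built from three basis functions, as in Eq. (6) (random stand-ins).
S = rng.random((w, 3))                           # columns play S0, S1, S2
coeffs = np.vstack([np.ones(n), rng.normal(size=(2, n))])   # rows: 1, M1, M2
L = S @ coeffs                                   # w x n spectral matrix

rank_L = int(np.linalg.matrix_rank(L))           # far below w

# Direct inversion of I = q^t L with the pseudo-inverse.
q_true = rng.random((w, 3))
I = q_true.T @ L                                 # 3 x n pixel matrix
q_est = (I @ np.linalg.pinv(L, rcond=1e-10)).T   # w x 3

reproduces_pixels = bool(np.allclose(q_est.T @ L, I))
recovers_q = bool(np.allclose(q_est, q_true))
```

Here `q_est` is the projection of `q_true` onto the three-dimensional column space of \(\mathbf L ,\) which explains why the pixel values are matched while the sensitivities themselves are not.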

To solve the problem, we propose to use a set of basis functions computed from known camera spectral sensitivities (Zhao et al. 2009). In many cases, camera spectral sensitivities have different distribution functions but the variances will not be extremely large, meaning that their representation may lie in a low-dimensional space, similar to the illumination basis functions (Slater and Healey 1998). Since basis functions can reduce dimensionality and thus the number of unknowns, this method generally provides robust and more accurate results than the direct matrix inversion method.

4.1 Estimation Using Basis Functions

Representing the camera spectral sensitivities using a linear combination of basis functions is expressed as:

$$\begin{aligned} {q_c(\lambda )} = {\sum _{i=1}^d b_i^c B_{i}^c(\lambda )}, \end{aligned}$$

where \({d}\) is the number of the basis functions, \({b_i^c}\) is the coefficient and \({B_i^c(\lambda )}\) is the basis function with \(c \in \{r,\,g,\,b\}.\) By substituting this equation into Eq. (1), we can have:

$$\begin{aligned} I_c = {\sum _{i=1}^d b_i^c \int \limits _{\varOmega } L(\lambda ) B_i^c(\lambda )} d\lambda . \end{aligned}$$

By using \({E_i^c}\) to describe the multiplication of the spectral radiance and the basis function of a camera spectral sensitivity: \({E_i^c} = {\int _{\varOmega } L(\lambda ) B_i^c(\lambda ) d\lambda },\) we can obtain

$$\begin{aligned} {I_c} = {\sum _{i=1}^d b_i^c E_i^c}. \end{aligned}$$

Now, let us suppose that we have \({n}\) sets of data (\(n\) image pixels and the corresponding spectral radiance); then, we can describe the last equation as \(\mathbf I = \mathbf b \mathbf E ,\) where \(\mathbf I \) is a \(1\) by \({n}\) matrix, \(\mathbf b \) is a 1 by \({d}\) coefficient matrix, and \(\mathbf E \) is a \({d}\) by \({n}\) matrix. Consequently, this coefficient matrix \(\mathbf b \) can be expressed as follows: \(\mathbf b =\mathbf I \mathbf E ^{+},\) where \(\mathbf E ^{+}\) is the pseudo-inverse of \(\mathbf E .\)
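A minimal sketch of this estimation for a single color channel is given below. The function name `estimate_sensitivity`, the random basis matrix and the random rank-three spectra are hypothetical and serve only to check the algebra \(\mathbf b =\mathbf I \mathbf E ^{+}\) on synthetic data.

```python
import numpy as np

def estimate_sensitivity(I, L, B, dlam):
    """Estimate q_c = B b from pixel values of one channel.

    I: (n,) intensities, L: (w, n) spectra, B: (w, d) basis functions,
    dlam: wavelength step used to discretize the integral E_i.
    """
    E = (B.T @ L) * dlam                       # d x n matrix of E_i values
    b = I @ np.linalg.pinv(E, rcond=1e-10)     # coefficients, b = I E^+
    return B @ b, b

# Synthetic check with rank-three spectra and d = 3 basis functions.
rng = np.random.default_rng(0)
w, n, d, dlam = 31, 200, 3, 10.0
L = rng.random((w, 3)) @ np.vstack([np.ones(n), rng.normal(size=(2, n))])
B = rng.random((w, d))
b_true = np.array([1.0, -0.4, 0.2])
q_true = B @ b_true
I = (q_true @ L) * dlam                        # Eq. (1), discretized

q_est, b_est = estimate_sensitivity(I, L, B, dlam)
```

With only \(d = 3\) unknowns per channel, the rank-three spectra are sufficient to determine the coefficients in this synthetic setting, in contrast to the direct inversion with \(w\) unknowns.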

4.2 Basis Functions from a Database

The rank of the multiplication matrix (\(\mathbf E \)) has to be at least the number of the basis functions (\({d}\)) to make the estimation robust. Since the estimated spectral radiance is at most rank three, we use 3D basis functions for the camera spectral sensitivity estimation.

To extract the basis functions, we collected several digital cameras to make a database and measured their spectral sensitivities, including a few spectral sensitivities taken from the literature (Vora et al. 1997; Buil 2005). Cameras included in the database are Sony DXC 930, Kodak DCS 420, Sony DXC 9000, Canon 10D, Nikon D70, and Kodak DCS 460. Those used for testing are not included. This camera spectral sensitivity database is publicly available at our website (Zhao 2013). We obtain the basis functions from the database by using the principal component analysis.

The percentages of eigenvalues for each color channel are shown in Table 1. The sum of the first three eigenvalues is about \({93\,\%}\) for all three channels; thus, the first three eigenvectors cover about \({93\,\%}\) of the information in the database. Based on this, the first three eigenvectors are used as basis functions, which are shown in Fig. 3.
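The extraction of basis functions can be sketched with a singular value decomposition. Two assumptions are made here that the text does not specify: the curves are not mean-centered (consistent with Eq. (8), which has no mean term), and the eigenvalue percentages are taken as squared singular values normalized to sum to one. The Gaussian curves are hypothetical stand-ins for the database entries of one color channel.

```python
import numpy as np

def pca_basis(curves, d=3):
    """Basis functions from a sensitivity database for one color channel.

    curves: (m, w) array, one measured sensitivity curve per camera.
    Returns a (w, d) basis and the eigenvalue shares (descending order).
    """
    _, s, Vt = np.linalg.svd(curves, full_matrices=False)
    eigvals = s ** 2                    # eigenvalues of curves^T curves
    share = eigvals / eigvals.sum()
    return Vt[:d].T, share

# Hypothetical database: six cameras with slightly different Gaussian curves.
wavelengths = np.arange(400.0, 701.0, 10.0)
centers = np.array([590.0, 595.0, 600.0, 605.0, 610.0, 620.0])
widths = np.array([25.0, 30.0, 28.0, 35.0, 32.0, 27.0])
curves = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None])
                        / widths[:, None]) ** 2)

basis, share = pca_basis(curves, d=3)
coverage = float(share[:3].sum())       # fraction covered by three vectors
```

The value of `coverage` plays the role of the percentages reported in Table 1, although the numbers obtained from these synthetic curves are not comparable to those of the actual database.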

Fig. 3

The extracted basis functions of red, green and blue channels from our sensitivity database

Table 1 The values of the first four eigenvalues of the basis functions extracted from our camera spectral sensitivity database, and the percentage of those eigenvalues in capturing the distributions of all the cameras

5 Implementation

The flowchart of the algorithm is shown in Fig. 4. If the input image is captured by an omnidirectional camera, where the optical axis is perpendicular to the ground, and the sun appears in the image, then an ellipse is fitted to the saturated pixels, and the sun position is considered to be the center of that ellipse. Subsequently, the angle between the sun and the camera is computed. However, if the image is taken using an ordinary camera, and we cannot directly know the sun position in the image, then the sun position and the camera viewing direction need to be estimated.

Fig. 4

The flowchart of the implementation. If the input image is from an omnidirectional camera, the algorithm directly estimates the sun position in the image. Otherwise, the algorithm calculates the sun direction from the EXIF tag, and the camera parameters from a structure-from-motion method (using images of the same scene as the input image). Then, it samples the sky pixels (i.e., view directions in a geodesic dome) and estimates the turbidity. Having the turbidity, it calculates the sky spectral radiance, which can be used to calculate the camera spectral sensitivity and the white balance setting. The details of the flowchart can be found in Sect. 5

The sun position is estimated from a known geolocation (e.g., using Google Earth) and the EXIF tag (the time stamp). The camera viewing direction is estimated using Bundler (Snavely et al. 2006) and the pose of a reference object. The pose of the reference object can be estimated using Google Earth, where the orientation angle is calculated by drawing a line between two specified points, as shown in Fig. 5. However, this estimation is less accurate than an actual on-site measurement (which, for some landmark objects, is available on the Internet). The inaccuracy in the camera viewing angle is 6\(^{\circ }\) in this case, which, in turn, decreases the accuracy of estimating the camera spectral sensitivities by approximately 3 %.

Fig. 5

Estimating the orientation angle by Google Earth

To estimate the sun position in an image, one might consider the method of Lalonde et al. (2009). However, for a typical image shown in Fig. 11g, the method produced angular errors of 10\(^{\circ }\) for \(\theta \) and 16\(^{\circ }\) for \(\phi ,\) which are considerably large. Therefore, instead of using that method, we used the geolocation and time stamp to estimate the sun position. Note that, according to the psychophysical test by Lopez-Moreno et al. (2010), the human visual system cannot spot an anomalously lit object with respect to the rest of the scene when the divergence between the coherent and the anomalous light is up to 35\(^{\circ }.\) Thus, the error of Lalonde et al. (2009) may be tolerable in some applications.

We also evaluated how many images are needed for a consistent estimation of the viewing angles of a specific camera by Bundler (Snavely et al. 2006). The result is shown in Fig. 6, where we tried as many as 300 images; the SfM algorithm produces consistent results once approximately 50 images are used.

Fig. 6

The relation between the number of images used for SfM and the estimated camera angles. The top and bottom figures show the estimated elevation (\(\theta \)) and azimuth (\(\phi \)) angles with respect to the number of input images. The labels around the plotted data are the number of images used

Having determined the sun position and camera viewing direction, a few pixels in the sky, which correspond to points in the sky hemisphere, are sampled uniformly. To ensure the uniformity, a geodesic dome is used to partition the sky hemisphere equally, and some points are sampled in each partition. For omnidirectional images, the corresponding sky pixels can be obtained directly from the sample points generated using the geodesic dome. For perspective images, we first calculate the camera’s field of view from the image dimensions and focal length. Then, the sample geodesic points lying within the camera’s field of view are used to calculate the coordinates of the corresponding sky pixels.
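For perspective images, the step from sample directions to sky pixels can be sketched as below. The sketch assumes an ideal pinhole camera with the principal point at the image center, a focal length expressed in pixels, and sample directions already rotated into the camera frame (x to the right, y down, z along the optical axis); these conventions and the function names are our assumptions, and the generation of the geodesic dome itself is not shown.

```python
import numpy as np

def field_of_view(size_px, focal_px):
    """Full field of view (radians) along one image dimension."""
    return 2.0 * np.arctan(size_px / (2.0 * focal_px))

def project_directions(dirs_cam, width, height, focal_px):
    """Project direction vectors (n, 3) to pixel coordinates.

    Returns (n, 2) pixel coordinates and a boolean mask of the directions
    that fall inside the image, i.e. inside the camera's field of view.
    """
    z = dirs_cam[:, 2]
    front = z > 1e-9                           # in front of the camera
    safe_z = np.where(front, z, 1.0)           # avoid division by zero
    u = focal_px * dirs_cam[:, 0] / safe_z + width / 2.0
    v = focal_px * dirs_cam[:, 1] / safe_z + height / 2.0
    visible = front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return np.stack([u, v], axis=1), visible

# Hypothetical camera and three sample directions.
width, height, focal = 1600, 1200, 1000.0
dirs = np.array([[0.0, 0.0, 1.0],      # along the optical axis
                 [0.0, 0.0, -1.0],     # behind the camera
                 [1.0, 0.0, 0.1]])     # far outside the field of view
pixels, visible = project_directions(dirs, width, height, focal)
fov_h = float(np.degrees(field_of_view(width, focal)))
```

Only the directions flagged in `visible` would contribute sky pixels to the turbidity estimation.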

Turbidity is estimated from the intensity ratios of these sampled sky pixels using Particle Swarm Optimization. RANSAC is used to remove the outliers. The spectral radiance is converted from chromaticity values (\(x\) and \(y\)), which are calculated from the turbidity. Finally, the camera spectral sensitivities together with the white balance setting are estimated using the precomputed basis functions, the calculated sky spectra, and their corresponding \(RGB\) values.

6 Experimental Results

In our experiments, we used both raw images, which are affected by minimal built-in color processing, and images downloaded from the Internet. We assume those images were taken with the gamma function off or have been radiometrically calibrated.

Before evaluating our method, we verified the assumption used in the sky turbidity estimation, namely, that image intensities are proportional to the sky luminance. We used two cameras, a Nikon D1x and a Canon 5D, each fitted with a fish-eye lens. The images are shown in Fig. 8d, g. We sampled about 120 points uniformly distributed on the sky hemisphere. Figure 7 shows the results, where the image intensities of both cameras are linear with respect to the sky luminance values of the sky model.

Fig. 7

The verification of correlation between the sky luminance and image intensity for two cameras

Fig. 8

Various sky conditions captured by three different omnidirectional cameras: the top row shows the images of Ladybug2, the second row shows the images of Canon 5D. The first two images of the bottom row are captured by Nikon D1x and the third one is rectified from image (a)

6.1 Raw Images

6.1.1 Omnidirectional Images

A number of clear sky images were taken at almost the same time using three different cameras: Ladybug2, Canon 5D, and Nikon D1x, where the latter two cameras were fitted with a fish-eye lens, as shown in Fig. 8a, d, g. To show the effectiveness of the proposed method, we compared it with the method of Barnard and Funt (2002). In the comparison, we used the same inputs, i.e., the estimated sky spectra and the corresponding sky pixels. Figure 9a, d, g show the estimated results. The ground-truth of these cameras was measured by using a monochromator. The proposed method estimated nearly the same sky turbidity, around \(2.2 \pm 0.02,\) from the different cameras despite their different \(RGB\) values.

Fig. 9

The sensitivity estimation results using the input images shown in Fig. 8. Ground-truth (GT), estimated sensitivities of our method (Estimated), and the method of Barnard and Funt (2002) (Barnard) are shown for three different cameras: Ladybug2, Canon 5D and Nikon D1x

The mean error and RMSE of both the proposed method and Barnard and Funt’s method are shown in Table 2. Here, the maximum values of the estimated camera spectral sensitivities were normalized to \(1.0.\) The largest mean error of the proposed method was less than \(3.5\) %, while that of Barnard and Funt’s method was \(7\) %. The proposed method also had a smaller standard deviation.
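A possible form of these error measures is sketched below. The exact definitions are not spelled out in the text, so this sketch assumes that both curves are normalized to a maximum of 1.0 and that the mean error is the mean absolute difference over all wavelengths; the curves in the example are hypothetical.

```python
import numpy as np

def sensitivity_errors(q_est, q_gt):
    """Mean absolute error and RMSE after peak normalization to 1.0."""
    diff = q_est / q_est.max() - q_gt / q_gt.max()
    mean_err = float(np.abs(diff).mean())
    rmse = float(np.sqrt((diff ** 2).mean()))
    return mean_err, rmse

# Hypothetical ground truth and two estimates.
wavelengths = np.arange(400.0, 701.0, 10.0)
q_gt = np.exp(-0.5 * ((wavelengths - 540.0) / 30.0) ** 2)
scaled_only = 2.5 * q_gt                                        # same shape
shifted = np.exp(-0.5 * ((wavelengths - 550.0) / 30.0) ** 2)    # shifted peak

err_scaled = sensitivity_errors(scaled_only, q_gt)
err_shifted = sensitivity_errors(shifted, q_gt)
```

Under this normalization, an estimate that differs from the ground truth only by a global scale has zero error, so the measures compare the shapes of the curves rather than their magnitudes.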

Table 2 The evaluation of estimated camera spectral sensitivity from omnidirectional images: mean error and RMSE

The method was also evaluated under different sky conditions as shown in Fig. 8: (b) partially cloudy sky, (c) thin cloudy sky, (e) hazy sky, and (h) significantly cloudy sky. For Fig. 8b, c, RANSAC was used to exclude the outliers (cloud pixels). For the other images, we estimated the sky turbidity from the sampled sky pixels using Particle Swarm Optimization. The estimated turbidity values for those weather conditions were about 2, 3, 4 and 12, respectively. The recovered camera spectral sensitivities are shown in Fig. 9b, c, e, h. A large error occurs in (h), because the whole sky was covered by thick clouds that did not fit Preetham et al.’s model.

In the experiment, we also verified whether the proposed method is effective in estimating the white balance settings, by using two images taken with the same camera (thus the same camera spectral sensitivities) but with different white balance settings. Figure 8e, f show such images. The estimated camera spectral sensitivities are shown in Fig. 9e, f. As expected, the shapes of the camera spectral sensitivities were the same, and they differed only in magnitude.

6.1.2 Perspective Images

We tested our method for perspective images (images rectified from omnidirectional images) and images taken from ordinary cameras. First, to show that the narrower field of view also works with the method, we used the rectified spherical image shown in Fig. 8i. This image is part of Fig. 8a. The recovered sensitivity is shown in Fig. 9i. The performance did not change significantly compared to Fig. 8a, although only the partial sky was visible. We tested three different directions in Fig. 8a, and had similar results. The estimated sun position in Fig. 8a was used here.

Second, to show that the method can handle images where the sun is not visible and the camera poses are unknown, we captured images with a reference object whose pose and geolocation are known, shown in Fig. 10a. We captured 16 images in total, and recovered each camera pose with respect to the reference object. The sun position was estimated from the time stamp in the EXIF tag. The estimated camera spectral sensitivities are shown in Fig. 10b.

Fig. 10
figure 10

The rectilinear images with a reference object from multiple views of Nikon D1x and the estimated camera spectral sensitivity

6.2 In-camera Processed Images

General images such as those available on the Internet are much more problematic compared with the images we tested in Sect. 6.1, since the gamma function has to be estimated and the images were usually taken by cameras that have built-in color processing (Ramanath et al. 2005).

Nevertheless, we evaluated our method with those images, which were captured by three different cameras: Canon EOS Rebel XTi, Canon 5D, and Canon 5D Mark II. Figure 11 shows the images of the Statue of Liberty downloaded from a photosharing site. These images were JPEG compressed and taken with internal camera processing. Chakrabarti et al. (2009) introduce an empirical camera model, which converts a JPEG image back to a raw image. We implemented the method to photometrically calibrate the camera (to estimate the response function and internal color processing). The camera pose and the sun direction were estimated in the same manner as in the previous experiment (Fig. 10a). As many as 187 images were used. The method was also evaluated under different sky conditions: clear sky (Fig. 11a, g, i), cloudy sky (Fig. 11c, e), and hazy sky (Fig. 11k).

Fig. 11
figure 11

The sensitivity estimation results for images downloaded from the Internet. Three different cameras were tested: the top row shows the images of Canon EOS Rebel XTi, the second row shows those of Canon 5D, and the bottom row shows those of Canon 5D Mark II. (a, c, e, g, i, k) are the input images, and (b, d, f, h, j, l) are the corresponding results. In the graphs, “GT” refers to the ground truth, and “Estimated” refers to the estimated sensitivities

The estimated camera spectral sensitivities are also shown in Fig. 11. The error evaluation is summarized in Table 3. The mean error for the \(RGB\) channels is larger than that obtained from omnidirectional images because of the residual errors of the internal color processing, the estimation of the response function, and the data compression.

Table 3 The evaluation of the estimated camera spectral sensitivity from Internet images: mean error and RMSE

We used the Macbeth color chart to evaluate the accuracy of the estimated camera spectral sensitivities. Specifically, we captured the spectral radiance of the first 18 color patches and used the estimated camera spectral sensitivities to predict the image intensities. The predicted and captured image intensities are plotted onto a 2D space. We found that if the error of the estimated camera spectral sensitivities is less than 5 %, then the plotted data forms an almost perfect straight line.
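The check described above can be sketched as follows, assuming a discretized camera model in which the intensity of a patch is the sum over sampled wavelengths of radiance times sensitivity. The function names and the toy spectra are hypothetical and serve only as an illustration; they are not the measured data, and the correlation coefficient is used here merely as one convenient measure of how close the plotted points are to a straight line.

```python
import math

def predict_intensities(radiances, sensitivity):
    """Discretized camera model: sum over wavelengths of radiance * sensitivity."""
    return [sum(l * q for l, q in zip(spectrum, sensitivity))
            for spectrum in radiances]

def pearson_r(xs, ys):
    """Correlation coefficient; values near 1 mean the points lie close to a line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: four patches, five sampled wavelengths each.
radiances = [
    [1.0, 0.8, 0.6, 0.4, 0.2],
    [0.2, 0.4, 0.6, 0.8, 1.0],
    [0.5, 0.5, 0.5, 0.5, 0.5],
    [0.1, 0.3, 0.9, 0.3, 0.1],
]
true_sensitivity = [0.10, 0.40, 0.90, 0.50, 0.10]       # stands in for the real camera
estimated_sensitivity = [0.11, 0.39, 0.92, 0.49, 0.10]  # slightly perturbed estimate

captured = predict_intensities(radiances, true_sensitivity)
predicted = predict_intensities(radiances, estimated_sensitivity)
linearity = pearson_r(predicted, captured)
```

In this toy setting a small perturbation of the sensitivity keeps `linearity` very close to 1, which mirrors the straight-line behavior reported for small estimation errors.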

7 Application: Color Correction

One of the applications of estimating camera spectral sensitivities and white balance setting is to correct the colors between different cameras. The purpose of this color correction is similar to the color transfer (Reinhard et al. 2001). Hence, we compared the results of color correction using our estimated camera spectral sensitivities and white balance with those of the color transfer.

Before showing the comparisons, here we briefly discuss our color correction technique. By discretizing Eq. (1) and using matrix notation, we can rewrite it as follows:

$$\begin{aligned} \mathbf I _{n\times 3}= \mathbf L _{n\times w} \mathbf Q _{w\times 3} \mathbf B _{3\times 3}= \mathbf E _{n\times 3} \mathbf B _{3\times 3}, \end{aligned}$$

where \(\mathbf I \) is the intensity matrix, \(\mathbf L \) is the matrix of the spectral radiance, \(\mathbf Q \) is the matrix of the basis functions for the camera spectral sensitivities, \(\mathbf B \) is the coefficient matrix, and \(\mathbf E \) is the product of \(\mathbf L \) and \(\mathbf Q .\) Note that the basis functions used here differ from those extracted in Sect. 4.2: here we use the same basis for all three color channels. \(n\) is the number of surfaces, and \(w\) is the number of sampled wavelengths.

Suppose we have an image captured by one camera, denoted as \(\mathbf I _1= \mathbf E \mathbf B _1;\) then the same scene captured by another camera is expressed as

$$\begin{aligned} \mathbf I _2= \mathbf E \mathbf B _2 = \mathbf I _1 \mathbf B _1^{-1} \mathbf B _2. \end{aligned}$$

Since \(\mathbf B _1\) and \(\mathbf B _2\) are computable if both camera spectral sensitivities are known, the color conversion from one image to another is possible via the last equation. Figure 12 shows the extracted basis functions that are common for the three channels.
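As a minimal sketch of this conversion, the code below applies \(\mathbf I _2 = \mathbf I _1 \mathbf B _1^{-1} \mathbf B _2\) to a list of pixels. It assumes that \(\mathbf B _1\) is invertible, which the equation requires implicitly. The helper names and the numerical values are hypothetical, chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def inv3(m):
    """Invert a 3x3 matrix via the adjugate; raises if it is (near) singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("coefficient matrix is not invertible")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

def convert_colors(i1, b1, b2):
    """Map n x 3 intensities of camera 1 to camera 2: I2 = I1 * B1^-1 * B2."""
    return matmul(matmul(i1, inv3(b1)), b2)

# Hypothetical coefficient matrices and scene terms (E = L Q), for illustration only.
b1 = [[0.9, 0.1, 0.0], [0.2, 1.0, 0.1], [0.0, 0.2, 1.1]]
b2 = [[1.1, 0.0, 0.1], [0.1, 0.9, 0.2], [0.1, 0.1, 0.8]]
e = [[0.5, 0.4, 0.3], [0.2, 0.6, 0.7]]

i1 = matmul(e, b1)                     # image as seen by camera 1
i2_expected = matmul(e, b2)            # same scene as seen by camera 2
i2_converted = convert_colors(i1, b1, b2)
```

Because the 3 × 3 product \(\mathbf B _1^{-1} \mathbf B _2\) does not depend on the scene, it can be computed once per camera pair and reused for every pixel.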

Fig. 12
figure 12

The extracted basis functions common for all three channels from our sensitivity database

The color correction result of the Statue of Liberty is shown in Fig. 13. In the figure, (a) and (b) show the source and target images, and (d) is the result of the proposed method. We also implemented Reinhard et al.’s color transfer algorithm (2001) to see how a color transfer method performs for color correction. Reinhard et al. (2001) transform one color cluster in \(RGB\) space into the other by a combination of translation and scaling, assuming that the two clusters follow Gaussian distributions. The result of Reinhard et al.’s method is shown in Fig. 13c.
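The translation-and-scaling idea can be sketched per channel as matching the mean and standard deviation of the source cluster to those of the target cluster. The following is a simplified illustration under that assumption, applied directly to whatever three channels are passed in; the function names are hypothetical, and it is not a reproduction of the implementation used in the experiments.

```python
import statistics

def transfer_channel(source, target):
    """Shift and scale one channel so its mean and std match the target's."""
    mean_s, mean_t = statistics.fmean(source), statistics.fmean(target)
    std_s, std_t = statistics.pstdev(source), statistics.pstdev(target)
    scale = std_t / std_s if std_s > 0 else 1.0  # avoid dividing by zero on flat channels
    return [(v - mean_s) * scale + mean_t for v in source]

def color_transfer(source_pixels, target_pixels):
    """Apply the per-channel transfer to lists of (c1, c2, c3) pixel tuples."""
    source_channels = list(zip(*source_pixels))
    target_channels = list(zip(*target_pixels))
    out_channels = [transfer_channel(s, t)
                    for s, t in zip(source_channels, target_channels)]
    return [tuple(p) for p in zip(*out_channels)]

# Hypothetical pixel samples, for illustration only.
source_pixels = [(0.2, 0.3, 0.5), (0.4, 0.5, 0.6), (0.6, 0.4, 0.9)]
target_pixels = [(0.3, 0.2, 0.4), (0.5, 0.6, 0.5), (0.9, 0.7, 0.6), (0.7, 0.5, 0.3)]
transferred = color_transfer(source_pixels, target_pixels)
```

Each channel is treated independently, so this transform has only a per-channel scale and offset, whereas a full 3 × 3 matrix can also mix channels.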

Fig. 13
figure 13

Color correction between different cameras is shown in (a–d). (a) The source image captured by Canon 5D. (b) The target image captured by Canon EOS Rebel XTi. (c) The result of color transfer (Reinhard et al. 2001). (d) The result of our color correction method. Chromaticity evaluation between images is shown in (e–g), where “target image,” “color transfer,” and “our method” represent chromaticities of (b–d). The result of our method is close to the target image except for point 4, because it lies in the shadow region of the target image (b)

Since the proposed method is based on the physical characteristics of the cameras, and it uses the affine transformation shown in Eq. (12), which is more general than the combination of translation and scaling, it produces visually better results, e.g., in the chest area or on the platform of the statue, as shown in Fig. 13b–d. The proposed method can determine the transformation once the camera sensitivities are obtained, which is beneficial for color correction applications.

The quantitative evaluation is shown in Fig. 13e–g. We sampled six pixels as shown in Fig. 13a, and compared the chromaticity of those pixels in the three images (b)–(d). In those figures, “target image,” “color transfer,” and “our method” represent the chromaticities of the target image, the color transfer result, and the result of the proposed color correction, respectively. The chromaticity values of the proposed method are close to those of the target image, except for point 4, which lies in the shadow region of the target image.
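A chromaticity comparison of this kind can be sketched as below, assuming the usual normalized-\(RGB\) definition in which each channel is divided by the sum of the three channels (consistent with the \((r,\,g,\,b)\) values quoted for Fig. 1, which sum to one). The function names and the sample pixel values are hypothetical.

```python
def chromaticity(rgb):
    """Normalized chromaticity (r, g, b) = (R, G, B) / (R + G + B)."""
    total = sum(rgb)
    if total == 0:
        raise ValueError("chromaticity is undefined for a black pixel")
    return tuple(c / total for c in rgb)

def chromaticity_difference(rgb_a, rgb_b):
    """Largest per-channel absolute difference between two chromaticities."""
    return max(abs(a - b)
               for a, b in zip(chromaticity(rgb_a), chromaticity(rgb_b)))

# Hypothetical pixel values, for illustration only.
target_pixel = (90, 64, 46)
corrected_pixel = (88, 65, 47)
difference = chromaticity_difference(target_pixel, corrected_pixel)
```

Chromaticity discards overall brightness, so two pixels that differ only by an intensity scale compare as equal; a point that is shadowed in only one of the images can still differ in chromaticity, since the light reaching a shadowed surface generally has a different color.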

Note that, while Fig. 13b was captured only 1 h later than Fig. 13a, their colors appear significantly different. Assuming that the illumination did not change significantly, the difference should be caused by camera properties, such as spectral sensitivities and white balance settings. Thus, the proposed method would be useful for applications where color calibration between cameras is necessary.

Two other examples of color correction are shown in Figs. 14 and 15. In both figures, (a) and (b) are the gamma-corrected images of Fig. 1, and (c) shows the result of the proposed color correction for two different cameras. The quantitative evaluations are also shown in the figures. “Casio,” “Pana,” “Pana2Casio,” “Canon” and “Canon2Casio” represent the chromaticity values of Casio, Panasonic, color corrected from Panasonic to Casio, Canon, and color corrected from Canon to Casio, respectively. The performance was evaluated on the four sampled pixels as shown in (d)–(f) in each figure.

Fig. 14
figure 14

Color correction between different cameras is shown in (a–c). (a) The target image captured by Casio, (b) the source image captured by Panasonic, (c) the color correction result from Panasonic (b) to Casio (a). The chromaticity evaluation between images is shown in (d–f). “Casio,” “Pana,” and “Pana2Casio” represent chromaticity values of (a–c). The performance is evaluated on four points

Fig. 15
figure 15

Color correction between different cameras is shown in (a–c). (a) The target image captured by Casio, (b) the source image captured by Canon, (c) the color correction result from Canon (b) to Casio (a). Chromaticity evaluation between images is shown in (d–f). “Casio,” “Canon,” and “Canon2Casio” represent chromaticities of (a–c). The performance is evaluated on four points

8 Discussion

8.1 Accuracy of the Sky Model

The sky model (Preetham et al. 1999) might pose an accuracy issue in estimating camera spectral sensitivities. We therefore evaluated it by comparing the intensity produced by the model with the actual sky intensity.

The result is shown in Fig. 16, where (a) shows the actual sky image captured by the Canon 5D camera, and (b) is the simulated sky image. The image intensity in (b) was adjusted so that its average became equal to that in (a); the part marked by the red ellipse was excluded from the averaging, since we considered it to be affected by scattering at the aperture. We took six sample pixels and compared the chromaticity values, which are summarized in Fig. 16c–e.
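The intensity adjustment can be sketched as a single global scale factor computed from the pixels outside the excluded region. This is a minimal illustration with images flattened to lists of intensities; the function name, the index-based mask, and the values are all hypothetical.

```python
def match_mean_intensity(simulated, captured, excluded_indices):
    """Scale the simulated image so its mean over the kept pixels equals the captured mean."""
    excluded = set(excluded_indices)
    kept = [i for i in range(len(captured)) if i not in excluded]
    if not kept:
        raise ValueError("all pixels were excluded")
    mean_simulated = sum(simulated[i] for i in kept) / len(kept)
    mean_captured = sum(captured[i] for i in kept) / len(kept)
    scale = mean_captured / mean_simulated
    return [v * scale for v in simulated]

# Hypothetical flattened images; index 2 stands in for the excluded (ellipse) region.
captured = [0.40, 0.50, 0.95, 0.60]
simulated = [0.20, 0.26, 0.30, 0.29]
adjusted = match_mean_intensity(simulated, captured, excluded_indices=[2])
```

A single scale factor changes only the overall intensity, so the chromaticity of each simulated pixel is unaffected by this adjustment.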

Fig. 16
figure 16

The top row shows the comparison of captured and simulated images of the sky, where the camera used was Canon 5D. The chromaticities of the six pixels are shown in (c–e)

8.2 Robustness to the Sun Direction Estimation

We tested the robustness of the camera spectral sensitivity estimation by adding noise to the sun direction. With 5\(^{\circ },\) 10\(^{\circ },\) and 15\(^{\circ }\) errors, the mean errors over the three channels for Nikon D1x and Canon 5D were about 3, 7, and 11 %, respectively. This implies that the error increases approximately linearly with the angular error of the sun direction.
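As a quick check of the linearity claim, a least-squares line can be fitted to the three reported points. The sketch below takes the reported figures at face value, although they are stated only approximately; the function and variable names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

angle_errors_deg = [5.0, 10.0, 15.0]  # sun direction errors reported in the text
mean_errors_pct = [3.0, 7.0, 11.0]    # corresponding mean sensitivity errors

slope, intercept = fit_line(angle_errors_deg, mean_errors_pct)
residuals = [y - (slope * x + intercept)
             for x, y in zip(angle_errors_deg, mean_errors_pct)]
```

With these three values the fit gives a slope of 0.8 percentage points per degree and an intercept of -1 with zero residuals, so the relation is linear over the tested range but not strictly proportional to the angular error.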

8.3 Comparison of Two Sky Turbidity Estimation Methods

We compared the proposed sky turbidity estimation method with that of Lalonde et al. (2010). Our method is based on brightness, while Lalonde et al.’s is based on the \(xyY\) color space. If we capture the same scene with two different cameras or with two different white balance settings, the calculated \(xyY\) values differ because the \(RGB\) values differ. Therefore, with the method of Lalonde et al. (2010), the estimated sky turbidity values differ, which cannot be correct since the scene is exactly the same. The proposed method handles this problem by assuming that the image brightness or intensity stays proportional. We conducted an experiment to verify this. Using the two methods, we fitted the sky model to images and estimated the sky turbidity values. The result is shown in Fig. 17, where (a) and (b) are the input images simulated from the sky model with the sky turbidity manually set to \(2.0\) and the white balance settings set to “Daylight” and “Fluorescent,” respectively. The sky turbidity values estimated by the proposed method are \(2.03\) for both input images, while those estimated by Lalonde et al.’s method are \(2.32\) and \(1.41.\) The sky appearance simulated from the estimated sky turbidity values is shown in Fig. 17c–f. The proposed method can thus estimate turbidity independently of the white balance settings.
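The underlying argument can be illustrated with a small sketch: per-channel white balance gains change the \(xy\) chromaticity of a pixel, but they cancel in the ratio of a pixel to a reference pixel within each channel. The code assumes, purely for illustration, that white balance acts as a per-channel gain and that camera \(RGB\) can be treated as linear sRGB when converting to \(XYZ\); all names and values are hypothetical, and this is not the fitting procedure of either method.

```python
# Linear sRGB (D65) to CIE XYZ; treating camera RGB as linear sRGB is an assumption.
RGB_TO_XYZ = [
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
]

def xy_chromaticity(rgb):
    """CIE xy chromaticity of a linear RGB triple."""
    x_, y_, z_ = (sum(m * c for m, c in zip(row, rgb)) for row in RGB_TO_XYZ)
    total = x_ + y_ + z_
    return (x_ / total, y_ / total)

def apply_gains(rgb, gains):
    """Model a white balance setting as a per-channel gain."""
    return tuple(c * g for c, g in zip(rgb, gains))

def channel_ratios(pixel, reference):
    """Per-channel intensity of a pixel relative to a reference pixel."""
    return tuple(p / r for p, r in zip(pixel, reference))

# Hypothetical sky pixels and two hypothetical white balance settings.
sky_pixel = (0.30, 0.50, 0.90)
reference_pixel = (0.20, 0.40, 0.80)
gains_a = (1.0, 1.0, 1.0)
gains_b = (1.5, 1.0, 0.6)

xy_a = xy_chromaticity(apply_gains(sky_pixel, gains_a))
xy_b = xy_chromaticity(apply_gains(sky_pixel, gains_b))
ratios_a = channel_ratios(apply_gains(sky_pixel, gains_a),
                          apply_gains(reference_pixel, gains_a))
ratios_b = channel_ratios(apply_gains(sky_pixel, gains_b),
                          apply_gains(reference_pixel, gains_b))
```

Here `xy_a` and `xy_b` differ while `ratios_a` and `ratios_b` agree, which is the sense in which a brightness-based fit is unaffected by the white balance setting.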

Fig. 17
figure 17

The simulated sky appearance from the estimated sky turbidity by the proposed and Lalonde et al.’s method (2010)

8.4 Limitations of the Proposed Method

Many images, particularly those available on the Internet, have been processed further by image processing software, such as Adobe Photoshop. To examine the limitations of the method, we created such modified images. We changed the color balance for the first image (by multiplying each color channel by a constant), adjusted the hue manually for the second image (by increasing the pixel values of the green channel to make it greenish), and then estimated the camera spectral sensitivities from them. The result is shown in Fig. 18, where (a) shows the original image, (b) shows the manually color balanced image, (c) shows the manually hue-adjusted image, and (d) and (e) show the estimated results. The camera spectral sensitivities estimated from Fig. 18b were close to the ground truth. However, those estimated from Fig. 18c had large errors compared to the ground truth, since the hue modification caused the estimated sky turbidity to deviate. Some operations performed on images by Photoshop conflict with the camera spectral sensitivity estimation, and in future work we will consider how to automatically filter out such contaminated images.

Fig. 18
figure 18

Manually processed image and estimated camera spectral sensitivities for Canon EOS Rebel XTi. (a) The original image. (b) Manually processed image by changing color balance. (c) Manually processed image by increasing the pixel value of green channel. (d) The estimated camera spectral sensitivities from (b). (e) The estimated camera spectral sensitivities from (c)

9 Conclusion

In this paper, we have proposed a novel method to estimate camera spectral sensitivities and white balance setting from images with sky regions. The proposed method could significantly benefit physics-based computer vision or computer vision in general, particularly for future research where the images on the Internet become valuable. To conclude, our contributions in this paper are (1) the novel method that uses images for camera spectral sensitivity and white balance estimation, (2) the database of camera spectral sensitivities that is publicly available, (3) the improved sky turbidity estimation that handles a wide variety of cameras, and (4) the camera spectral sensitivity-based color correction between different cameras.