
1 Introduction

When unpolarised light is reflected by a surface it becomes partially polarised [1]. The degree to which the reflected light is polarised conveys information about the surface orientation and, therefore, provides a cue for shape recovery. There are a number of attractive properties to this ‘shape-from-polarisation’ (SfP) cue. It requires only a single viewpoint and illumination environment, it is invariant to illumination and surface albedo and it provides information about both the zenith and azimuth angle of the surface normal. Like photometric stereo, shape estimates are dense (the surface normal is estimated at every pixel so resolution is limited only by the sensor) and, since it does not rely on detecting or matching features, it is applicable to smooth, featureless surfaces.

However, there are a number of drawbacks to using SfP in a practical setting. The polarisation cue alone provides only ambiguous estimates of surface orientation. Hence, previous work focuses on developing heuristics to locally disambiguate the surface normals. Even having done so, surface orientation is only a 2.5D shape cue and so the estimated normal field must be integrated in order to recover surface depth [2] or used to refine a depth map captured using other cues [3]. This two step approach of disambiguation followed by integration means that the surface integrability constraint is not enforced during disambiguation and also that errors accumulate over the two steps. In this paper, we propose a SfP method (see Fig. 1 for an overview) with the following novel ingredients:

  1. In contrast to prior work, we compute SfP in the depth, as opposed to the surface normal, domain. Instead of disambiguating the polarisation normals, we defer resolution of the ambiguity until surface height is computed. To do so, we express the azimuthal ambiguity as a collinearity condition that is satisfied by either interpretation of the polarisation measurements.

  2. We express polarisation and shading constraints as linear equations in the unknown depth, enabling efficient and globally optimal depth estimation.

  3. We use a novel hybrid diffuse/specular polarisation and shading model, allowing us to handle glossy surfaces.

  4. We show that illumination can be determined from the ambiguous normals and unpolarised intensity up to a binary ambiguity (a particular generalised Bas-relief [4] transformation: the convex/concave ambiguity). This means that our method can be applied in an uncalibrated scenario and we consider both point source and 1st/2nd order spherical harmonic (SH) illumination.

Fig. 1.

Overview of method: from a single polarisation image of a homogeneous, glossy object in uncontrolled (possibly outdoor) illumination, we estimate lighting and compute depth directly.

1.1 Related Work

Previous SfP methods can be categorised into two groups: those that 1. use only a single polarisation image, and 2. combine a polarisation image with additional cues. The former group (of which our method is a member) can be considered ‘single shot’ methods (single shot capture devices exist, using polarising beamsplitters [25] or CMOS sensors with micropolarising filters [26]). More commonly, a polarisation image is obtained by capturing a sequence of images in which a linear polarising filter is rotated in front of the camera (possibly with unknown rotation angles [5]). SfP methods can also be classified according to the polarisation model used (dielectric versus metal; diffuse, specular or hybrid models) and whether they compute shape in the surface normal or surface height domain.

Single polarisation image. The earliest work focussed on the capture, decomposition and visualisation of polarisation images [6]. Both Miyazaki et al. [2] and Atkinson and Hancock [7] used a diffuse polarisation model and, under an assumption of object convexity, propagated disambiguation of the surface normals inwards from the boundary. This greedy approach does not produce globally optimal results, limits application to objects with a visible occluding boundary and does not consider integrability. Morel et al. [8] took a similar approach but used a specular polarisation model suitable for metallic surfaces. Huynh et al. [9] also assumed convexity to disambiguate the polarisation normals; however, their approach additionally estimates the refractive index. As in our method, Mahmoud et al. [10] exploited the unpolarised intensity via a shading cue: assuming Lambertian reflectance with known lighting direction and albedo, the surface normal ambiguity can be resolved. We avoid all of these assumptions and, by strictly enforcing integrability, improve robustness to noise.

Polarisation with additional cues. Rahmann and Canterakis [11] combined a specular polarisation model with stereo cues. Similarly, Atkinson and Hancock [12] used polarisation normals to segment an object into patches, simplifying stereo matching. Huynh et al. [13] extended their earlier work to use multispectral measurements to estimate both shape and refractive index. There have been a number of attempts to augment polarisation cues with calibrated, Lambertian photometric stereo, e.g. [14]. Drbohlav and Sara [15] showed how the Bas-relief ambiguity [4] in uncalibrated photometric stereo could be resolved using polarisation. However, this approach requires a polarised light source. Recently, Ngo et al. [16] derived constraints that allowed surface normals, light directions and refractive index to be estimated from polarisation images under varying lighting. However, this approach requires at least 4 light directions in contrast to the single direction required by our method. Very recently, Kadambi et al. [3] proposed an interesting approach in which a single polarisation image is combined with a depth map obtained by an RGBD camera. The depth map is used to disambiguate the normals and provide a base surface for integration.

2 Problem Formulation and Polarisation Theory

We make the following assumptions (more general than much previous work in the area): 1. Dielectric (i.e. non-metallic) material with uniform (but unknown) albedo. 2. Orthographic projection. 3. The refractive index of the surface is known, though dependency on this quantity is weak and we fix it to a constant value for all of our experiments. 4. Pixels can be classified as either diffuse dominant or specular dominant. 5. The object surface is smooth (i.e. \(C^2\) continuous).

We parameterise surface height by the function \(z(\mathbf{u})\), where \(\mathbf{u}=(x,y)\) is an image point. Foreground pixels belonging to the surface are represented by the set \(\mathcal {F}\), \(|{\mathcal{F}}|=K\). The unit surface normal can be expressed in spherical world coordinates as:

$$\begin{aligned} \mathbf{n}(\mathbf{u})=[n_x(\mathbf{u})\ n_y(\mathbf{u})\ n_z(\mathbf{u})]^T=[\sin \alpha (\mathbf{u})\sin \theta (\mathbf{u})\ \cos \alpha (\mathbf{u})\sin \theta (\mathbf{u})\ \cos \theta (\mathbf{u})]^T, \end{aligned}$$
(1)

and can be expressed in terms of the surface gradient as

$$\begin{aligned} \mathbf{n}(\mathbf{u}) =\frac{ \left[ -p(\mathbf{u})\ -q(\mathbf{u})\ 1 \right] ^T }{ \sqrt{p(\mathbf{u})^2+q(\mathbf{u})^2+1} }, \end{aligned}$$
(2)

where \(p(\mathbf{u})=\partial _x z(\mathbf{u})\) and \(q(\mathbf{u})=\partial _y z(\mathbf{u})\), so that \(\nabla z(\mathbf{u}) = [p(\mathbf{u})\ q(\mathbf{u})]^T\).

Fig. 2.

Polarimetric capture (a) and decomposition to polarisation image (b–d).

2.1 Polarisation Image

When unpolarised light is reflected from a surface, it becomes partially polarised. There are a number of mechanisms by which this process occurs. The two models that we use are described in Sect. 2.2 and are suitable for dielectric materials. A polarisation image (Fig. 2b–d) can be estimated by capturing a sequence of images (Fig. 2a) in which a linear polarising filter in front of the camera is rotated through a sequence of \(P\ge 3\) different angles \(\vartheta _j\), \(j\in \left\{ 1, \dots ,\ P\right\} \). The intensity at a pixel varies sinusoidally between \(I_{\text {min}}\) and \(I_{\text {max}}\) with the polariser angle:

$$\begin{aligned} i_{\vartheta _j}(\mathbf{u})=\frac{I_{\text {max}}(\mathbf{u})+I_{\text {min}}(\mathbf{u})}{2}+\frac{I_{\text {max}}(\mathbf{u})-I_{\text {min}}(\mathbf{u})}{2}\cos [2\vartheta _j-2\phi (\mathbf{u})]. \end{aligned}$$
(3)

The polarisation image is obtained by decomposing the sinusoid at every pixel into three quantities [6]. These are the phase angle, \(\phi (\mathbf{u})\), the degree of polarisation, \(\rho (\mathbf{u})\), and the unpolarised intensity, \(i_{\text {un}}(\mathbf{u})\), where:

$$\begin{aligned} \rho (\mathbf{u}) = \frac{I_{\text {max}}(\mathbf{u})-I_{\text {min}}(\mathbf{u})}{I_{\text {max}}(\mathbf{u})+I_{\text {min}}(\mathbf{u})}\; \text {and} \; i_{\text {un}}(\mathbf{u})=\frac{I_{\text {max}}(\mathbf{u})+I_{\text {min}}(\mathbf{u})}{2}. \end{aligned}$$
(4)

The parameters of the sinusoid can be estimated from the captured image sequence using nonlinear least squares [7], linear methods [9] or via a closed form solution [6] for the specific case of \(P=3\), \(\vartheta \in \{ 0^{\circ }, 45^{\circ }, 90^{\circ }\}\). See supplementary material for details of our sinusoid fitting scheme.
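
As a concrete illustration of the decomposition in (3)–(4), the sinusoid coefficients can be recovered by ordinary linear least squares for any \(P\ge 3\) polariser angles. The sketch below is our own pure-Python illustration, not the paper's fitting scheme (which is in their supplementary material); the function names are ours.

```python
import math

def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(M, y):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(M)
    x = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for r in range(3):
            Mk[r][k] = y[r]
        x.append(det3(Mk) / d)
    return x

def decompose(intensities, angles):
    """Fit i(theta_j) = a + b cos(2 theta_j) + c sin(2 theta_j) by linear
    least squares (normal equations), then recover the polarisation image:
    i_un = a, rho = sqrt(b^2 + c^2)/a, phi = atan2(c, b)/2."""
    rows = [[1.0, math.cos(2 * t), math.sin(2 * t)] for t in angles]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * y for r, y in zip(rows, intensities)) for i in range(3)]
    a, b, c = solve3(AtA, Atb)
    return a, math.hypot(b, c) / a, 0.5 * math.atan2(c, b)
```

Expanding the cosine in (3) gives \(a=i_{\text{un}}\), \(b=i_{\text{un}}\rho \cos 2\phi \) and \(c=i_{\text{un}}\rho \sin 2\phi \), which is why the three quantities fall out of the fitted coefficients directly.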

2.2 Polarisation Models

A polarisation image provides a constraint on the surface normal direction at each pixel. The exact nature of the constraint depends on the polarisation model used. We assume that the object under study is composed of a dielectric material exhibiting both diffuse reflection (due to subsurface scattering) and specular reflection (due to direct reflection at the air/surface interface). We make use of both types of reflection. This model is particularly suitable for smooth, glossy materials such as porcelain, skin, plastic and surfaces finished with gloss paint. We follow recent works [3, 17] and assume that reflection from a point can be classified as diffuse dominant or specular dominant (see supplementary material for our classification scheme). Hence, a pixel \(\mathbf{u}\) belongs either to the set of diffuse pixels, \(\mathcal {D}\), or the set of specular pixels, \(\mathcal {S}\), with \({\mathcal{F}}={\mathcal{D}}\cup \mathcal{S}\).

Fig. 3.

(a) Relationship between degree of polarisation and zenith angle, for specular and diffuse dielectric reflectance with \(\eta =1.5\). (b) Estimated zenith angle from degree of polarisation in Fig. 2b. (c) Visualisation of estimated zenith angle. (Color figure online)

Diffuse polarisation model. For diffuse reflection, the degree of polarisation is related (Fig. 3a, red curve) to the zenith angle \(\theta (\mathbf{u})\in [0,\frac{\pi }{2}]\) of the normal in viewer-centred coordinates (i.e. the angle between the normal and viewer):

$$\begin{aligned} \rho (\mathbf{u}) = \frac{{\sin \left( \theta (\mathbf{u})\right) }^2\, {\left( \eta - \frac{1}{\eta }\right) }^2}{4\, \cos \left( \theta (\mathbf{u})\right) \, \sqrt{\eta ^2 - {\sin \left( \theta (\mathbf{u})\right) }^2} - {\sin \left( \theta (\mathbf{u})\right) }^2\, {\left( \eta + \frac{1}{\eta }\right) }^2 + 2\, \eta ^2 + 2}, \end{aligned}$$
(5)

where \(\eta \) is the refractive index. The dependency on \(\eta \) is weak and typical values for dielectrics range between 1.4 and 1.6. We assume \(\eta =1.5\) for the rest of this paper. This expression can be rearranged to give a closed form solution for the zenith angle in terms of a function, \(f(\rho (\mathbf{u}),\eta )\), that depends on the measured degree of polarisation and the refractive index:

$$\begin{aligned}&\cos (\theta (\mathbf{u})) = \mathbf {n}(\mathbf{u})\cdot \mathbf {v} = f(\rho (\mathbf{u}),\eta ) \nonumber \\&= \, \sqrt{\frac{2\, \rho + 2\, \eta ^2\, \rho - 2\, \eta ^2 + \eta ^4 + {\rho }^2 + 4\, \eta ^2\, {\rho }^2 - \eta ^4\, {\rho }^2 - 4\, \eta ^3\, \rho \, \sqrt{- \left( \rho - 1\right) \, \left( \rho + 1\right) } + 1}{\eta ^4\, {\rho }^2 + 2\, \eta ^4\, \rho + \eta ^4 + 6\, \eta ^2\, {\rho }^2 + 4\, \eta ^2\, \rho - 2\, \eta ^2 + {\rho }^2 + 2\, \rho + 1}} \end{aligned}$$
(6)

where we drop the dependency of \(\rho \) on \(\mathbf{u}\) for brevity. Since we work in a viewer-centred coordinate system, the viewing direction is \(\mathbf {v}=[0\ 0\ 1]^T\) and we have simply: \( n_z(\mathbf{u}) = f(\rho (\mathbf{u}),\eta ),\) or, in terms of the surface gradient,

$$\begin{aligned} \frac{1}{\sqrt{p(\mathbf{u})^2+q(\mathbf{u})^2+1}}=f(\rho (\mathbf{u}),\eta ). \end{aligned}$$
(7)
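
Since (5) is monotonic in the zenith angle for diffuse reflection, it can also be inverted numerically rather than via the closed form (6). The sketch below is our own illustration; the hypothetical `zenith_from_rho` plays the role of \(\arccos f(\rho ,\eta )\) and inverts (5) by bisection.

```python
import math

def rho_diffuse(theta, eta=1.5):
    """Degree of diffuse polarisation as a function of zenith angle, Eq. (5)."""
    s2 = math.sin(theta) ** 2
    num = s2 * (eta - 1 / eta) ** 2
    den = (4 * math.cos(theta) * math.sqrt(eta ** 2 - s2)
           - s2 * (eta + 1 / eta) ** 2 + 2 * eta ** 2 + 2)
    return num / den

def zenith_from_rho(rho, eta=1.5, tol=1e-10):
    """Invert Eq. (5) by bisection: rho_diffuse is monotonically
    increasing on [0, pi/2), so the zenith angle is unique."""
    lo, hi = 0.0, math.pi / 2 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho_diffuse(mid, eta) < rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```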

The phase angle determines the azimuth angle of the surface normal \(\alpha (\mathbf{u})\in [0,2\pi ]\) up to a \(180^{\circ }\) ambiguity: \( \mathbf{u}\in {\mathcal{D}}\Rightarrow \alpha (\mathbf{u}) = \phi (\mathbf{u})\ \text {or}\ (\phi (\mathbf{u}) + \pi ) \). Hence, for a diffuse pixel \(\mathbf{u}\in \mathcal {D}\), this means that the surface normal is given (up to an ambiguity) by either \(\mathbf{n}(\mathbf{u})=\bar{\mathbf{n}}(\mathbf{u})\) or \(\mathbf{n}(\mathbf{u})=\mathbf{T}\bar{\mathbf{n}}(\mathbf{u})\) where

$$\begin{aligned} \bar{\mathbf{n}}(\mathbf{u}) = \begin{bmatrix} \sin \phi (\mathbf{u})\sin \theta (\mathbf{u}) \\ \cos \phi (\mathbf{u})\sin \theta (\mathbf{u}) \\ \cos \theta (\mathbf{u}) \\ \end{bmatrix} \;\text {and}\; \mathbf {T} = \mathbf {R}_z(180^{\circ }) = \begin{bmatrix} -1&0&0 \\ 0&-1&0 \\ 0&0&1 \\ \end{bmatrix}. \end{aligned}$$
(8)

Specular polarisation model. For specular reflection, the degree of polarisation is again related to the zenith angle (Fig. 3a, blue curve):

$$\begin{aligned} \rho _s(\mathbf{u}) = \frac{2\sin (\theta (\mathbf{u}))^2\cos (\theta (\mathbf{u}))\sqrt{\eta ^2-\sin (\theta (\mathbf{u}))^2}}{\eta ^2-\sin (\theta (\mathbf{u}))^2-\eta ^2\sin (\theta (\mathbf{u}))^2+2\sin (\theta (\mathbf{u}))^4}. \end{aligned}$$
(9)

This expression is problematic for two reasons: 1. it cannot be analytically inverted to solve for zenith angle, 2. there are two solutions. The first problem is overcome simply by using a lookup table and interpolation. The second problem is not an issue in practice. Specular reflections occur when the surface normal is approximately halfway between the viewer and light source directions. We assume that the light source \(\mathbf{s}\) is positioned in the same hemisphere as the viewer, i.e. \(\mathbf{v}\cdot \mathbf{s}>0\). In this configuration, specular pixels will never have a zenith angle \(>{\sim }45^{\circ }\). Hence, we can restrict (9) to this range and, therefore, a single solution.
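
A minimal sketch of this lookup-table inversion follows (our own illustration, with hypothetical function names); the table is restricted to zenith angles in \([0^{\circ },45^{\circ }]\), where (9) is single-valued.

```python
import math

def rho_spec(theta, eta=1.5):
    """Degree of specular polarisation as a function of zenith angle, Eq. (9)."""
    s2 = math.sin(theta) ** 2
    num = 2 * s2 * math.cos(theta) * math.sqrt(eta ** 2 - s2)
    den = eta ** 2 - s2 - eta ** 2 * s2 + 2 * s2 ** 2
    return num / den

def zenith_from_rho_spec(rho, eta=1.5, n=4096):
    """Invert Eq. (9) by table lookup with linear interpolation,
    restricted to [0, 45 deg] where the mapping is monotonic."""
    thetas = [k * (math.pi / 4) / (n - 1) for k in range(n)]
    rhos = [rho_spec(t, eta) for t in thetas]
    for k in range(1, n):
        if rhos[k] >= rho:
            w = (rho - rhos[k - 1]) / (rhos[k] - rhos[k - 1])
            return thetas[k - 1] + w * (thetas[k] - thetas[k - 1])
    return thetas[-1]
```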

In contrast to diffuse reflection, the azimuth angle of the surface normal is perpendicular to the phase of the specular polarisation [18] leading to a \(\frac{\pi }{2}\) shift: \(\mathbf{u}\in \mathcal{S}\Rightarrow \alpha (\mathbf{u}) = \left( \phi (\mathbf{u})-\pi /2\right) \; \text {or}\;\left( \phi (\mathbf{u}) + \pi /2\right) \).

Figure 3b shows zenith angle estimates using the diffuse/specular model on \({\mathcal{D}}\)/\(\mathcal{S}\) respectively. In Fig. 3c we show the cosine of the estimated zenith angle, a visualisation corresponding to a Lambertian rendering with frontal lighting.

2.3 Shading Constraint

The unpolarised intensity provides an additional constraint on the surface normal direction via an appropriate reflectance model. We assume that, for diffuse-labelled pixels, light is reflected according to the Lambertian model. We also assume that albedo is uniform and factor it into the light source vector \(\mathbf{s}\). Hence, unpolarised intensity is related to the surface normal by:

$$\begin{aligned} \mathbf{u}\in {\mathcal{D}}\Rightarrow i_{\text {un}}(\mathbf{u}) = \cos (\theta _i(\mathbf{u}))=\mathbf {n}(\mathbf{u})\cdot \mathbf {s}, \end{aligned}$$
(10)

where \(\theta _i(\mathbf{u})\) is the angle of incidence (angle between light source and surface normal). In terms of the surface gradient, this becomes:

$$\begin{aligned} i_{\text {un}}(\mathbf{u}) = \frac{-p(\mathbf{u})s_x-q(\mathbf{u})s_y+s_z}{\sqrt{p(\mathbf{u})^2+q(\mathbf{u})^2+1}}. \end{aligned}$$
(11)

Note that if the light source and viewer direction coincide (a configuration that is physically impossible to achieve precisely) then this equation provides no more information than the degree of polarisation. Hence, we assume that the light source direction is different from the viewing direction, i.e. \(\mathbf{s}\ne \mathbf{v}\).

For specular pixels, we do not use the unpolarised intensity directly (though it is used in the labelling of specular pixels - see supplementary material). Instead, we assume simply that the normal is approximately equal to the halfway vector:

$$\begin{aligned} \mathbf{u}\in \mathcal{S}\Rightarrow \mathbf {n}(\mathbf{u}) \approx \mathbf {h} = (\mathbf {s}+\mathbf {v})/\Vert \mathbf {s}+\mathbf {v}\Vert . \end{aligned}$$
(12)

3 Linear Depth Estimation with Known Illumination

We now show that the polarisation shape cues can be expressed as per pixel equations that are linear in terms of the surface gradient. By using finite difference approximations to the surface gradient, this allows us to write the problem of depth estimation in terms of a large system of linear equations. This means that depth estimation is both efficient and certain to obtain the global optimum. In this section we assume that the lighting and albedo are known. However, in the following section we describe how they can be estimated from the polarisation image, allowing depth recovery with uncalibrated illumination.

3.1 Polarisation Constraints as Linear Equations

First, we note that the phase angle constraint can be written as a collinearity condition. This condition is satisfied by either of the two possible azimuth angles implied by the phase angle measurement. Writing it in this way is advantageous because it means we do not have to disambiguate the surface normals explicitly. Instead, when we solve the linear system for depth, the azimuthal ambiguities are resolved in a globally optimal way. Specifically, for diffuse pixels we require the projection of the surface normal into the x-y plane, \([n_x\ n_y]\), and a vector in the image plane pointing in the phase angle direction, \([\sin (\phi ) \; \cos (\phi )]\), to be collinear. These two vectors are collinear when the following condition is satisfied:

$$\begin{aligned} \mathbf {n}(\mathbf{u})\cdot [\cos (\phi (\mathbf{u}))\ -\sin (\phi (\mathbf{u}))\ 0]^T = 0. \end{aligned}$$
(13)

Substituting (2) into (13) and noting that the nonlinear term in (2) is always \(\ne 0\) we obtain the first linear equation in the surface gradient:

$$\begin{aligned} {-}p(\mathbf{u})\cos (\phi (\mathbf{u})) + q(\mathbf{u})\sin (\phi (\mathbf{u})) = 0. \end{aligned}$$
(14)

A similar expression can be obtained for specular pixels by substituting the \(\frac{\pi }{2}\)-shifted phase angles. This condition exhibits a natural weighting that is useful in practice. The phase angle estimates are more reliable when the zenith angle is large (i.e. when the degree of polarisation, and hence the signal-to-noise ratio, is high). When the zenith angle is large, the magnitude of the surface gradient is also large, meaning that disagreement with the estimated phase angle is penalised more heavily than at a small zenith angle, where the gradient magnitude is small.
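
The invariance of (13) to the azimuthal ambiguity is easy to verify numerically: substituting (1) into (13) gives \(\sin \theta \sin (\alpha -\phi )\), which vanishes for both \(\alpha =\phi \) and \(\alpha =\phi +\pi \). The snippet below (our own illustration) checks this.

```python
import math

def normal(alpha, theta):
    """Unit normal in spherical coordinates, Eq. (1)."""
    return (math.sin(alpha) * math.sin(theta),
            math.cos(alpha) * math.sin(theta),
            math.cos(theta))

def phase_residual(n, phi):
    """Collinearity condition of Eq. (13): n . [cos(phi), -sin(phi), 0]."""
    return n[0] * math.cos(phi) - n[1] * math.sin(phi)

phi, theta = 1.1, 0.6
r1 = phase_residual(normal(phi, theta), phi)            # alpha = phi
r2 = phase_residual(normal(phi + math.pi, theta), phi)  # alpha = phi + pi
```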

The second linear constraint has two different forms for diffuse and specular pixels. The diffuse constraint is obtained by combining the expressions for the unpolarised intensity and the degree of polarisation. To do so, we take a ratio between (11) and (7) which cancels the nonlinear normalisation factor:

$$\begin{aligned} \frac{i_{\text {un}}(\mathbf{u})}{f(\rho (\mathbf{u}),\eta )}=-p(\mathbf{u})s_x-q(\mathbf{u})s_y+s_z, \end{aligned}$$
(15)

yielding our second linear equation in the surface gradient. For specular pixels, we express (12) in terms of the surface gradient as:

$$\begin{aligned} p(\mathbf{u}) = -h_x/h_z \; \text {and} \; q(\mathbf{u}) = -h_y/h_z. \end{aligned}$$
(16)

3.2 Linear Height Recovery

The surface gradient in (2) can be approximated numerically from the discretised surface height function using finite differences. To reduce sensitivity to noise and improve robustness, where possible we use a smoothed central difference approximation. Such an approximation is obtained by convolving the surface height function with Sobel operators \(\mathbf{G}_x, \mathbf{G}_y\in \mathbb {R}^{3\times 3}\): \(\partial _x z \approx z * \mathbf{G}_x\) and \(\partial _y z \approx z * \mathbf{G}_y\). At the boundary of the image or the foreground mask, not all neighbours may be available for a given pixel. In this case, we use unsmoothed central differences (where both horizontal or both vertical neighbours are available) or, where only a single neighbour is available, single forward/backward differences.

Substituting these finite differences into (14), (15) and (16) therefore leads to linear equations with between 3 and 8 unknown values of z (depending on which combination of numerical gradient approximations are used). Of course, the surface height function is unknown. So, we seek the surface height function whose finite difference gradients solve the system of linear equations over all pixels. Due to noise, we do not expect an exact solution. Hence, for an image with K foreground pixels, we can solve in a least squares sense the system of 2K linear equations in the K unknown height values. In order to resolve the unknown constant of integration (i.e. applying an arbitrary offset to z does not affect its orthographic images), we add an additional linear equation to set the height of one pixel to zero. We end up with the linear least squares problem \( \min _\mathbf{z} \Vert \mathbf{Az} - \mathbf{b} \Vert ^2 \), where \(\mathbf{A}\) has \(2K+1\) rows, K columns and is sparse (each row has at most 8 non-zero values). This can be solved efficiently. Note: this is a system of linear equations in depth. It is not a partial differential equation. Hence, we do not require boundary conditions to be specified.
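
The construction can be illustrated on a one-dimensional analogue: stack one finite difference equation per pixel plus one anchor row fixing the constant of integration, then solve the resulting least squares problem. This is our own toy sketch; it uses unsmoothed forward differences and dense normal equations, whereas the actual system uses Sobel-smoothed differences and a sparse solver.

```python
def solve_lsq(A, b):
    """Dense least squares via normal equations and Gaussian elimination
    with partial pivoting. Fine at toy sizes; the real system is sparse."""
    n = len(A[0])
    M = [[sum(A[r][i] * A[r][j] for r in range(len(A))) for j in range(n)]
         for i in range(n)]
    y = [sum(A[r][i] * b[r] for r in range(len(A))) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        y[col], y[piv] = y[piv], y[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            y[r] -= f * y[col]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (y[r] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

# 1-D analogue of Sect. 3.2: gradient equations z[k+1] - z[k] = p_k
# by forward differences, plus one row fixing z[0] = 0.
z_true = [0.0, 0.1, 0.3, 0.35, 0.2]
K = len(z_true)
A, b = [], []
for k in range(K - 1):
    row = [0.0] * K
    row[k], row[k + 1] = -1.0, 1.0
    A.append(row)
    b.append(z_true[k + 1] - z_true[k])
anchor = [0.0] * K
anchor[0] = 1.0
A.append(anchor)
b.append(0.0)
z = solve_lsq(A, b)
```

Without the anchor row the normal matrix is singular, mirroring the unknown constant of integration in the 2D problem.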

We also find it advantageous (though not essential) to include two priors on the surface height: 1. Laplacian smoothness, 2. convexity. Both are expressed as linear equations in the surface height. See supplementary material for details.

4 Illumination Estimation from a Polarisation Image

The method described above enables linear depth recovery from a single polarisation image when the illumination direction is known. In this section, we describe how to use the polarisation image to estimate illumination, prior to depth estimation, so that the method above can be applied in an uncalibrated scenario. First, we show that the problem of light source estimation is ambiguous. Second, we derive a method to compute the light source direction (up to the binary ambiguity) from ambiguous normals using the minimum possible number of observations. Third, we extend this to an efficient least squares optimisation that uses the whole image and is applicable to noisy data. Finally, we relax the lighting assumptions to allow more flexible 1st and 2nd order SH illumination.

We consider only diffuse pixels for illumination estimation, since specular pixels are sparse and we wish to avoid estimating the parameters of a particular assumed specular reflectance model. Hence, the unpolarised intensity is assumed to follow a Lambertian model with uniform albedo, as in (11). For the true \(\mathbf{s}\), \( i_{\text {un}}(\mathbf{u}) = \bar{\mathbf {n}}(\mathbf{u})^T\mathbf {s} \ \vee \ i_{\text {un}}(\mathbf{u}) = (\mathbf {T}\bar{\mathbf{n}}(\mathbf{u}))^T\mathbf {s}\). Hence, a single pixel restricts the light source to two planes.

4.1 Relationship to the Bas-Relief Ambiguity

For an image with K diffuse pixels, there are \(2^K\) possible disambiguations of the polarisation normals. Suppose that we know the correct disambiguation of the normals and that we stack them to form the matrix \(\mathbf {N}_{\text {true}}\in {\mathbb {R}}^{K\times 3}\) and stack the unpolarised intensities in the vector \(\mathbf{i}=[i_{\text {un}}(\mathbf{u}_1)\ \dots \ i_{\text {un}}(\mathbf{u}_K)]^T\). In this case, the light source \(\mathbf{s}\) that satisfies \( \mathbf {N}_{\text {true}}\mathbf {s} = \mathbf {i} \) is given by the pseudo-inverse:

$$\begin{aligned} \mathbf{s}=\mathbf {N}_{\text {true}}^+\mathbf{i}. \end{aligned}$$
(17)

However, for any invertible \(3\times 3\) linear transform \(\mathbf{A}\in GL(3)\), it is also true that \( \mathbf {N}_{\text {true}}\mathbf {A}^{-1}\mathbf {A}\mathbf {s} = \mathbf {i} \), and so \(\mathbf{As}\) is also a solution using the transformed normals \(\mathbf {N}_{\text {true}}\mathbf {A}^{-1}\). However, the only such \(\mathbf{A}\) where \(\mathbf{N}_{\text {true}}\mathbf{A}^{-1}\) is consistent with the polarisation image is \(\mathbf{A}=\mathbf{T}\), i.e. where the azimuth angle of each normal is shifted by \(\pi \). Hence, if \(\mathbf{s}\) is a solution with normals \(\mathbf{N}_{\text {true}}\) then \(\mathbf{T}{} \mathbf{s}\) is also a solution with normals \(\mathbf{N}_{\text {true}}{} \mathbf{T}\). Note that \(\mathbf{T}\) is a generalised Bas-relief (GBR) transformation [4] with parameters \(\mu = 0\), \(\nu =0\) and \(\lambda =\pm 1\), i.e. the binary convex/concave ambiguity. Hence, from a polarisation image with unknown lighting, we will be unable to distinguish the true normals and lighting from those transformed by \(\mathbf{T}\). Since \(\mathbf{T}\) is a GBR transformation, the transformed normals remain integrable and correspond to the true surface negated in depth (see Fig. 4).

Fig. 4.

A polarisation image of a diffuse object enables uncalibrated surface reconstruction up to a convex/concave ambiguity. Both interpretations are consistent with the polarisation image.

4.2 Minimal Solutions

Suppose that \(\mathbf{N}\in {\mathbb {R}}^{K\times 3}\) contains one of the \(2^K\) possible disambiguations of the K surface normals, i.e. \(\mathbf{N}_j=\bar{\mathbf{n}}(\mathbf{u}_j)\) or \(\mathbf{N}_j=\mathbf{T}\bar{\mathbf{n}}(\mathbf{u}_j)\). If \(\mathbf{N}\) is a valid disambiguation, then (with no noise) we expect: \(\mathbf{N}{} \mathbf{N}^+\mathbf{i}=\mathbf{i}\). We can see in a straightforward way that three pixels will be insufficient to distinguish a valid from an invalid disambiguation. When \(K=3\), \(\mathbf{N}^+=\mathbf{N}^{-1}\) and so \(\mathbf{N}{} \mathbf{N}^+=\mathbf{I}\) and hence the condition is satisfied by any combination of disambiguations. The reason for this is that, apart from degenerate cases, any three planes will intersect at a point so any combination of transformed or untransformed normals will allow an \(\mathbf{s}\) to be found that satisfies all three equations.

However, the problem becomes well-posed for \(K>3\). The system of linear equations must be consistent and have a unique solution. If some, but not all, of the normals are transformed from their true direction then the system of equations will be inconsistent. By the Rouché–Capelli theorem [19], consistency and uniqueness require \(\text {rank}(\mathbf{N})=\text {rank}\left( \left[ \mathbf{N}\ \mathbf{i}\right] \right) =3\). So, we could try each possible combination of disambiguated normals and check whether the rank condition is satisfied. Note that we only need consider half of the possible disambiguations. We can divide the \(2^K\) disambiguations into \(2^{K-1}\) pairs differing by a global transformation and only need consider one of each of the pairs. So, for the minimal case of \(K=4\), we construct the 8 possible normal matrices \(\mathbf{N}\), with the first row fixed to \(\mathbf{N}_1=\bar{\mathbf{n}}(\mathbf{u}_1)\), and find the one satisfying the rank condition. For this \(\mathbf{N}\) we find \(\mathbf{s}\) by (17) and the solution is either \((\mathbf{N},\mathbf{s})\) or \((\mathbf{NT},\mathbf{Ts})\).
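
A sketch of this minimal \(K=4\) solver (our own illustration, with hypothetical names): fix the first normal, enumerate the remaining \(2^3\) sign patterns, solve for \(\mathbf{s}\) from the first three rows and keep those consistent with the fourth. Checking consistency of the fourth shading equation is equivalent to the rank condition for exact data.

```python
import itertools

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(M, y):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(M)
    x = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for r in range(3):
            Mk[r][k] = y[r]
        x.append(det3(Mk) / d)
    return x

def minimal_light_solutions(bar_n, i_un, tol=1e-9):
    """Enumerate the 8 disambiguations with the first normal fixed:
    solve for s from the first three pixels, keep s if the fourth
    pixel's shading equation is also satisfied."""
    sols = []
    for signs in itertools.product([1.0, -1.0], repeat=3):
        # T flips the x and y components of a normal (azimuth + pi).
        N = [bar_n[0]] + [(sg * n[0], sg * n[1], n[2])
                          for sg, n in zip(signs, bar_n[1:])]
        M = [list(N[r]) for r in range(3)]
        if abs(det3(M)) < tol:
            continue  # degenerate: the three planes do not meet at a point
        s = solve3(M, i_un[:3])
        if abs(N[3][0] * s[0] + N[3][1] * s[1] + N[3][2] * s[2] - i_un[3]) < tol:
            sols.append(s)
    return sols
```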

4.3 Alternating Optimisation

In practice, we expect the ambiguous normals and unpolarised intensities to be noisy. Therefore, a least squares solution over all observed pixels is preferable. Since the unknown illumination is only 3D and we have a polarisation observation for every pixel, the problem is highly overconstrained. Following the combinatorial approach above, we could build all \(2^K\) possible systems of linear equations, solve them in a least squares sense and take the one with minimal residual as the solution. However, this is NP-hard and impractical for any non-trivial value of K. Instead, we can write an optimisation problem to find \(\mathbf{s}\):

$$\begin{aligned} \mathbf {s}^* = \mathop {\hbox {arg min}}\limits _{\mathbf {s}\in {\mathbb {R}}^3} \sum _{j\in {\mathcal{D}}} \min \left( \left[ \bar{\mathbf{n}}(\mathbf{u}_j)\cdot \mathbf{s}-i_{\text {un}}(\mathbf{u}_j)\right] ^2, \left[ \mathbf{T}\bar{\mathbf{n}}(\mathbf{u}_j)\cdot \mathbf{s}-i_{\text {un}}(\mathbf{u}_j)\right] ^2 \right) . \end{aligned}$$
(18)

This is non-convex since the minimum of two convex functions is not convex [20]. However, (18) can be efficiently optimised using alternating assignment and optimisation. In practice, we find that this almost always converges to the global minimum even with a random initialisation. In the assignment step, given an estimate for the light source at iteration t, \(\mathbf{s}^{(t)}\), we choose from each ambiguous pair of normals the one that yields minimal error under illumination \(\mathbf{s}^{(t)}\):

$$\begin{aligned} \mathbf{N}_j^{(t)} := {\left\{ \begin{array}{ll} \bar{\mathbf{n}}(\mathbf{u}_j) &{} \text {if } \left[ \bar{\mathbf{n}}(\mathbf{u}_j)\cdot \mathbf{s}^{(t)}-i_{\text {un}}(\mathbf{u}_j)\right] ^2 < \left[ \mathbf{T}\bar{\mathbf{n}}(\mathbf{u}_j)\cdot \mathbf{s}^{(t)}-i_{\text {un}}(\mathbf{u}_j)\right] ^2 \\ \mathbf{T}\bar{\mathbf{n}}(\mathbf{u}_j) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(19)

At the optimisation step, we use the selected normals to compute the new light source by solving the linear least squares system: \( \mathbf{s}^{(t+1)} := (\mathbf{N}^{(t)})^+\mathbf{i} \). These two steps are iterated to convergence. In all our experiments, this process converged in fewer than 10 iterations. To resolve the ambiguity in our experimental results, we always take the light source estimate that gives the maximal surface.
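
The alternating scheme of (18)–(19) is compact enough to sketch directly. This is our own illustration on exact synthetic data; for a deterministic check we initialise near the true light, whereas the paper reports convergence from random initialisations.

```python
def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(M, y):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(M)
    x = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for r in range(3):
            Mk[r][k] = y[r]
        x.append(det3(Mk) / d)
    return x

def estimate_light(bar_n, i_un, s0, iters=10):
    """Alternating assignment, Eq. (19), and least squares update
    for the objective in Eq. (18)."""
    s = list(s0)
    for _ in range(iters):
        # Assignment: pick, per pixel, the flip with the smaller residual.
        N = []
        for n, i in zip(bar_n, i_un):
            flip = (-n[0], -n[1], n[2])  # azimuth shifted by pi
            e_keep = (n[0] * s[0] + n[1] * s[1] + n[2] * s[2] - i) ** 2
            e_flip = (flip[0] * s[0] + flip[1] * s[1] + flip[2] * s[2] - i) ** 2
            N.append(n if e_keep < e_flip else flip)
        # Optimisation: s := N^+ i via the 3x3 normal equations.
        M = [[sum(n[r] * n[c] for n in N) for c in range(3)] for r in range(3)]
        y = [sum(n[r] * i for n, i in zip(N, i_un)) for r in range(3)]
        s = solve3(M, y)
    return s
```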

4.4 Extension to 1st and 2nd Order Spherical Harmonic Lighting

Using a first or second order SH diffuse lighting model [21, 22], the binary ambiguity in the surface normal leads to a binary ambiguity in the SH basis vector at each pixel. Specifically, a first order SH lighting model introduces a constant term: \( i_{\text {un}}(\mathbf{u}) = \mathbf{b}_4(\mathbf{u})^T\mathbf{s}_4 \) with basis vector \( \mathbf{b}_4(\mathbf{u}) = \begin{bmatrix} {n}_x(\mathbf{u})\&{n}_y(\mathbf{u})\&{n}_z(\mathbf{u})&1 \end{bmatrix}^T \). With ambiguous normals, the basis vector is known up to a binary ambiguity: \(\mathbf{b}_4(\mathbf{u})=\bar{\mathbf{b}}_4(\mathbf{u})\) or \(\mathbf{b}_4(\mathbf{u})=\mathbf{T}_4\bar{\mathbf{b}}_4(\mathbf{u})\) with \( \bar{\mathbf{b}}_4(\mathbf{u}) = \begin{bmatrix} \bar{n}_x(\mathbf{u})\&\bar{n}_y(\mathbf{u})\&\bar{n}_z(\mathbf{u})&1 \end{bmatrix}^T \) and the transformation given by: \( \mathbf{T}_4 = \text {diag}(-1, -1, 1, 1) \). Solving for \(\mathbf{s}_4\) is the same problem as solving for a point source, leading to the same ambiguity. If \(\mathbf{s}_4\) is a solution with minimal residual then \(\mathbf{T}_4\mathbf{s}_4\) is also an optimal solution and the transformation of the normals corresponds to a GBR convex/concave transformation. Similarly, a second order SH lighting model: \( i_{\text {un}}(\mathbf{u}) = \mathbf{b}_9(\mathbf{u})^T\mathbf{s}_9 \) with basis vector \( \mathbf{b}_9 = \begin{bmatrix} 1\&n_x\&n_y\&n_z\&3n_z^2\!-\!1\&n_xn_y\&n_xn_z\&n_yn_z\&n_x^2{-}n_y^2 \end{bmatrix}^T, \) can be handled in exactly the same way with the appropriate transformation matrix given by: \( \mathbf{T}_9 = \text {diag}(1, -1, -1, 1, 1, 1, -1, -1, 1) \). For shape estimation, we compute the 4D or 9D lighting vector, subtract from the diffuse intensity the zeroth and second order appearance contributions and then run the same algorithm as for point source illumination using only the first order appearance.
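
The key identity here, that the azimuthal flip acts on the second order SH basis as the diagonal matrix \(\mathbf{T}_9\), can be checked numerically (our own sketch). Since \(\mathbf{T}_9\) is diagonal with entries \(\pm 1\), \((\mathbf{T}_9\mathbf{b}_9)\cdot (\mathbf{T}_9\mathbf{s}_9)=\mathbf{b}_9\cdot \mathbf{s}_9\), so the transformed lighting reproduces the same unpolarised intensities.

```python
def b9(n):
    """Second order SH basis evaluated at normal n, as in Sect. 4.4."""
    nx, ny, nz = n
    return [1.0, nx, ny, nz, 3 * nz * nz - 1.0,
            nx * ny, nx * nz, ny * nz, nx * nx - ny * ny]

# Diagonal of T_9: the action of the azimuthal flip on the 9-D basis.
T9 = [1.0, -1.0, -1.0, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0]

n = (0.3, -0.5, 0.81)
lhs = b9((-n[0], -n[1], n[2]))             # basis of the flipped normal T n
rhs = [t * v for t, v in zip(T9, b9(n))]   # T_9 applied to the basis of n
```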

5 Experimental Results

We begin with a quantitative evaluation on synthetic data. We render images of the Stanford bunny with Blinn-Phong reflectance under point source illumination (Fig. 5a). We simulate polarisation according to (3), (5) and (9) with varying polariser angle, add Gaussian noise of standard deviation \(\sigma\) and quantise to 8 bits. We vary the light source direction over \(\theta_l\in\{15^{\circ}, 30^{\circ}, 60^{\circ}\}\) and \(\alpha_l\in\{0^{\circ},90^{\circ},180^{\circ},270^{\circ}\}\). We estimate a polarisation image for each \((\sigma,\theta_l,\alpha_l)\) and use this as input. For comparison, we implemented the only previous methods applicable to a single polarisation image: 1. boundary propagation [2, 7] and 2. Lambertian shading disambiguation [10]. The second method requires known lighting and albedo. For both this method and ours, we provide results with ground truth lighting/albedo (superscript “gt”) and with lighting/albedo estimated using the method in Sect. 4.3 (superscript “est”). For the comparison methods, we compute a depth map using least squares integration, as in [23]. For our method, we compute surface normals using a bicubic fit to the estimated depth.
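As an illustration of this simulation-and-fitting pipeline, the sketch below renders noisy, quantised polariser-angle samples from the standard transmitted radiance sinusoid (used here as a stand-in for (3), (5) and (9), which are not reproduced in this section) and re-estimates the polarisation image \((i_{\text{un}}, \rho, \phi)\) by linear least squares; the function name and parameterisation are illustrative:

```python
import numpy as np

def simulate_and_fit(i_un, rho, phi, angles, sigma, rng):
    """Simulate samples i(theta_pol) = i_un * (1 + rho * cos(2*theta_pol - 2*phi)),
    corrupt them with Gaussian noise and 8-bit quantisation, then fit the
    sinusoid, which is linear in [i_un, i_un*rho*cos(2phi), i_un*rho*sin(2phi)]."""
    meas = i_un * (1 + rho * np.cos(2 * angles - 2 * phi))
    meas = meas + rng.normal(0.0, sigma, size=angles.shape)   # sensor noise
    meas = np.round(np.clip(meas, 0, 1) * 255) / 255          # 8-bit quantisation
    # linear least squares fit of the three sinusoid coefficients
    A = np.stack([np.ones_like(angles),
                  np.cos(2 * angles),
                  np.sin(2 * angles)], axis=1)
    c0, c1, c2 = np.linalg.lstsq(A, meas, rcond=None)[0]
    i_un_est = c0
    rho_est = np.hypot(c1, c2) / c0
    phi_est = 0.5 * np.arctan2(c2, c1)   # phase, modulo the 180 degree ambiguity
    return i_un_est, rho_est, phi_est
```

The recovered phase is only defined modulo \(180^{\circ}\), which is precisely the azimuthal ambiguity that the method resolves at the depth estimation stage.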

Fig. 5.

Typical surface normal estimates (c–e) from noisy synthetic data (a). The inset sphere in (b) shows how surface orientation is visualised as a colour. (Color figure online)

We show typical results in Fig. 5c-e and quantitative results in Table 1 (RMS depth error and mean angular surface normal error averaged over \(\alpha _l\) and 100 repeats for each setting; best result for each setting emboldened). The boundary propagation method [2, 7] assumes convexity, meaning that internal concavities are incorrectly recovered. The Lambertian method [10] exhibits high frequency noise since solutions are purely local. Both methods also contain errors in specular regions and propagate errors from normal estimation into the integrated surface. Our solution is smoother and more stable in specular regions yet still recovers fine surface detail. Note however that the simple constraint in (12) encourages all specular normals to point in the same direction, leading to over-flattening of specular regions. Quantitatively, our method offers the best performance across all settings. In many cases, the result with estimated lighting is better than with ground truth. We believe that this is because it enables the method to partially compensate for noise. In Table 2 we show the quantitative accuracy of our lighting estimate. We use the same point source directions as above. When the lighting is within \(15^{\circ }\) of the viewing direction, the error is less than \(1^{\circ }\). For order 1 and 2 SH lighting, we use the same order 1 components as the point source directions and randomly generate the order 0 and 2 components.

Fig. 6.

Qualitative comparison on real world data. Light source direction = \([2\ 0\ 7]\).

Table 1. Depth and surface normal estimation errors on synthetic data.
Table 2. Quantitative light source estimation results on synthetic data.

In order to evaluate our method on real world images, we capture two datasets using a Canon EOS-1D X and vary a linear polarising filter over \(180^{\circ }\) in \(10^{\circ }\) increments. The first dataset is captured in a dark room using a Lowel Prolight to approximate a point source. We experiment with both known and unknown lighting. For known lighting, the approximate position of the light source is measured and to calibrate for unknown light source intensity and surface albedo, we use the method in Sect. 4.3 to compute the length of the light source vector, fixing its direction to the measured one. The second dataset is captured outdoors on a sunny day using natural illumination. We use an order 1 SH lighting model.

Fig. 7.

Qualitative results on a variety of material types. The first three rows show results captured in dark room conditions with a point light source. The two panels in the final row show results in outdoor, uncontrolled illumination. Depth maps are encoded as brighter = closer. The first row shows a result with known lighting direction, all others are estimated.

We show a qualitative comparison between our method and the two reference methods in Fig. 6 using known lighting (see supplementary material for more comparative results). The comparison methods exhibit the same artefacts as on synthetic data. Some of the noise in the normals is removed by the smoothing effect of surface integration but concave/convex errors in [2, 7] grossly distort the overall shape, while the surface details of the wings are lost by [10]. In Fig. 7 we show qualitative results of our method on a range of material types, under a variety of known or estimated illumination conditions (both indoor point source and outdoor uncontrolled). Note that the recovered surface of the angel remains stable even with estimated illumination (compared to known illumination in Fig. 6). Note also that our method is able to recover the fine surface detail of the skin of the lemon and orange under both point source and natural illumination.

6 Conclusions

We have presented the first SfP technique in which polarisation constraints are expressed directly in terms of surface depth. Moreover, through careful construction of these equations, we ensure that they are linear and so depth estimation is simply a linear least squares problem. The SfP cue is often described as being locally ambiguous. We have shown that, in fact, even with unknown lighting the diffuse unpolarised intensity image restricts the uncertainty to a global convex/concave ambiguity. Our method is practically useful, enabling monocular, passive depth estimation even in outdoor lighting. For reproducibility, we make a full implementation of our method and the two comparison methods available (see footnote 1).

In future, we would like to relax the assumptions in Sect. 2. From a practical perspective, the most useful would be to allow spatially-varying albedo. Rather than assuming that pixels are specular or diffuse dominant, we would also like to allow mixtures of the two polarisation models and to exploit specular shading. To do so would require an assumption of a specular BRDF model. An alternative would be to fit a data-driven BRDF model [24] directly to the ambiguous polarisation normals, potentially allowing single shot BRDF and shape estimation.