1 Introduction

Computer vision has long striven to recreate human visual perception. Wheatstone’s fundamental observations (Wheatstone 1838) show that a pair of adjacent cameras suffices to imitate human binocular vision. Using these two images in conjunction with a stereo display technique, e.g. stereoscopic glasses (Huang et al. 2015), allows depth to be reproduced as perceived by human eyes. With regard to the location in object space, however, such a stereo vision system offers much more freedom than human perception, since the distance between the cameras, called the baseline, may vary. This flexibility makes it possible to adapt camera stereoscopy to particular depth scenarios. For example, triangulation is used in stellar parallax to measure the distance to stars (Hirshfeld 2001). What applies to the macroscopic universe may also be useful for a microscope.

However, miniaturising multiple stereo setups to the scale required by microscopes poses a problem for hardware fabrication, since lens diameters restrict the baseline gaps between cameras. As an alternative, a Micro Lens Array (MLA) may be placed in front of the image sensor of an otherwise conventional microscope (Levoy et al. 2006; Broxton et al. 2013), an arrangement generally known as a light field camera. An obvious attempt to regard the micro lens pitch as the baseline proves to be impractical, as optical parameters of the objective lens affect the light field’s geometry (Hahne et al. 2014a, b).

The light field camera, also known as the plenoptic camera, entered the field of computer vision when Adelson and Wang (1992) published an article that coined the term plenoptic, derived from Latin and Greek roots meaning “full view”. The authors were the first to computationally generate a depth map by solving the stereo correspondence problem on footage from a plenoptic camera and concluded that its baseline is confined to the main lens’ aperture size. Although Adelson and Wang could not provide methods to acquire quantitative baseline measures, they predicted the baseline to be relatively small. When Levoy and Hanrahan (1996) proposed a concise 4-D light field notation, each ray in the light field could be represented by merely four coordinates (u, v, s, t), obtained from the ray’s intersections with two two-dimensional (2-D) planes placed one behind the other. In a plenoptic camera, these sampling planes may be represented by the MLA and the image sensor. The maximum directional light field resolution is captured when the micro lenses are focused to infinity (Ng 2006), which is accomplished by placing the MLA stationary one focal length in front of the sensor. This plenoptic camera type has been made commercially available by Lytro Inc. (2012) and is capable of synthetically refocusing images (Ng et al. 2005; Fiss et al. 2014; Hahne et al. 2016).

By shifting the sensor away from the MLA focal plane, research has shown that spatial and directional resolution can be traded off, which involves different image synthesis approaches (Lumsdaine and Georgiev 2008; Georgiev et al. 2006). To distinguish between these optical setups, Lytro’s camera was later named the Standard Plenoptic Camera (SPC) in a publication by Perwass and Wietzke (2012), who devised a more complex MLA featuring different micro lens types. The spatio-angular trade-off in a plenoptic camera is determined by the diameter, focal length, image position and packing of the micro lenses, as well as the sensor pixel pitch, which makes it part of the optical hardware design.

Over the years, several studies have provided methods to acquire disparity maps from an SPC (Heber and Pock 2014; Bok et al. 2014; Jeon et al. 2015; Tao et al. 2017). To the best of our knowledge, however, researchers have not dealt with estimating an object’s distance by triangulation on the basis of disparity maps obtained from a light field camera. One reason might be that baselines are required, which are not obvious in the case of plenoptic cameras, as the optics involved is more complex than in conventional stereoscopy. Attempts to estimate a plenoptic camera’s baseline were initially addressed in publications by our research group (Hahne et al. 2014a, b), which provided validation through simulation only. Moreover, main lens pupil positions were ignored in that work, yielding large deviations when estimating the distance to refocused image planes obtained from an SPC (Hahne et al. 2016). It is thus expected that our previous triangulation scheme (Hahne et al. 2014a, b) entails errors in the experimentation, which is subject to investigation here. A more recent study by Jeon et al. (2015) also proposed a baseline estimation method, but without detailing the optical groundwork and without experimental validation.

In this paper, we propose a refined optics-geometrical model for light field triangulation and estimate object distances captured by an SPC. Our plenoptic model is the first to pinpoint virtual cameras along the entrance pupil of the objective lens. Verification is accomplished with real images from a custom-built SPC and a ray tracing simulator (Zemax 2011) for a quantitative deviation assessment. A top-level overview of the processing pipeline for experimental validation is given in Fig. 1. In doing so, we obtain considerably more accurate baseline and object distance results than with our previous method (Hahne et al. 2014a) and that of Jeon et al. (2015). The proposed concept will prove valuable in fields where stereo vision is traditionally used.

Fig. 1

Block diagram for experimental validation

This paper is organised as follows. Section 2 briefly reviews the geometry of binocular vision in order to recall stereo triangulation. This is followed by a step-wise development of an SPC ray model in Sect. 3, where the extraction of viewpoint images from a raw SPC capture is also demonstrated. Experimental work is presented in Sect. 4, which assesses the claims made in Sect. 3 by measuring baseline and tilt angle from a disparity map analysis and a ray tracing simulation (Zemax 2011). Results are summarised and discussed in Sect. 5.

2 Stereoscopic Triangulation

2.1 Coplanar Stereo Cameras

The SPC can be seen as a complex derivative of a stereo vision system. The stereo triangulation concept is presented hereafter to serve as a groundwork.

Fig. 2

Stereo triangulation scheme with parallel cameras where a point is projected through the optical centres \(O_L, O_R\) yielding two image points (orange) in each camera. The relative displacement of these points returns the horizontal disparity \(\Delta x = x_R - x_L\). The baseline B, object distance Z and image distance b affect the measured disparity (Color figure online)

Figure 2 illustrates a stereoscopic camera setup where sensors are coplanar. The depicted setup may be parameterised by the spacing of the cameras’ axes, denoted as B for baseline, the cameras’ image distance b and the optical centres \(O_L\), \(O_R\) for each camera, respectively. As seen in the diagram, an object point is projected onto both camera sensors indicated by orange dots. With regard to corresponding image centres, the position of the image point in the left camera clearly differs from that in the right. This phenomenon is known as parallax and results in a relative displacement of respective image points from different viewpoints. To measure this displacement, the horizontal disparity \(\Delta x\) is introduced given by \(\Delta x = x_R - x_L\), where \(x_R\) and \(x_L\) denote horizontal distances from each projected image point to the optical image centre. Nowadays, image detectors are composed of discrete photosensitive cells making it possible to locate and measure \(\Delta x\). The disparity computation is a well studied task (Marr and Poggio 1976; Yang et al. 1993; Bobick and Intille 1999) and is often referred to as solving the correspondence problem. Algorithmic solutions to this are applied to a set of points in the image rather than a single one and thus yield a map of \(\Delta x\) values, which indicate the depth of a captured scene.

An object point’s depth distance Z can be directly inferred from the parameters in Fig. 2. As highlighted in dark grey, \(\Delta x\) may represent the base of an acute scalene triangle with b as its height. Another triangle, spanned by the base B and height Z, is a scaled version of it and is shown in light grey. This relationship relies on the method of similar triangles and can be written as an equality of ratios

$$\begin{aligned} \frac{Z}{B} = \frac{b}{\Delta x} . \end{aligned}$$
(1)

To infer the depth distance Z, Eq. (1) may be rearranged to

$$\begin{aligned} Z = \frac{b \times B}{\Delta x}. \end{aligned}$$
(2)

As seen from these equations, it is feasible to retrieve the depth location Z. If \(\Delta x\) is held constant, decreasing the baseline B shrinks the object distance Z. For a depth range located at a far distance, it is thus recommended to aim for a large baseline. Note that this relationship and the corresponding mathematical statements only hold where the optical axes of \(O_L, O_R\) are aligned in parallel.
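A minimal numerical sketch of Eq. (2): the following snippet converts a measured disparity of a coplanar stereo pair into a depth estimate. All parameter values are hypothetical and serve illustration only.

```python
def depth_coplanar(b, B, delta_x):
    """Depth Z from Eq. (2) for coplanar stereo cameras.

    b       : image distance of each camera [mm]
    B       : baseline between the optical centres [mm]
    delta_x : horizontal disparity x_R - x_L [mm]
    """
    return b * B / delta_x

# Hypothetical example: b = 50 mm, B = 100 mm, disparity of 0.2 mm
print(depth_coplanar(50.0, 100.0, 0.2))  # Z = 25000 mm, i.e. 25 m
```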

2.2 Tilted Stereo Cameras

Reasonable scenarios exist in which a camera’s optical axis is tilted with respect to that of the other. In such a case, the principle of similar triangles does not apply in the same manner as in Eq. (1).

Fig. 3

Stereo triangulation scheme with non-parallel cameras where sensors are seen to be coplanar. \(\varPhi \) denotes the tilt angle of the right camera’s main lens \(O_R\) as related to that of the left camera \(O_L\)

Taking the left camera as the orientation reference, the right lens \(O_R\) is tilted as shown in Fig. 3. In this case, perspective image rectification is commonly employed to correct for non-coplanar stereo vision setups (Burger and Burge 2009). Iocchi (1998) observes that, if the rotation occurs around the y-axis, both optical axes lie in the xz plane and intersect at a point \(Z_0\), whereas the image planes of both cameras are still regarded as parallel. In traditional stereo vision this yields deviations, so that Iocchi’s (1998) method serves as a first-order approximation for small rotation angles in the absence of image processing. As demonstrated in Sect. 3.2, this approach is, however, suitable for our plenoptic triangulation model, where the imaginary sensor planes of the virtual cameras are coplanar whilst their optical axes may be non-parallel. Let \(\varPhi \) be the rotation angle, then the laws of trigonometry allow us to write

$$\begin{aligned} Z_0=\frac{B}{\tan (\varPhi )} \end{aligned}$$
(3)

and

$$\begin{aligned} Z = \frac{b \times B}{\Delta x + \frac{b \times B}{Z_0}} \end{aligned}$$
(4)

which may be shortened to

$$\begin{aligned} Z = \frac{b \times B}{\Delta x + b \times \tan (\varPhi )} \, \end{aligned}$$
(5)

after substituting for \(Z_0\). This approximation suffices to estimate the depth Z for small rotation angles \(\varPhi \) in stereoscopic systems without the need for image rectification.
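Equation (5) extends the coplanar case to a tilted right camera. The short sketch below, again with hypothetical values, also shows that setting \(\varPhi = 0\) recovers the result of Eq. (2).

```python
import math

def depth_tilted(b, B, delta_x, phi):
    """Depth Z from Eq. (5) for a small tilt angle phi [rad] of the right camera.

    Disparity delta_x and image distance b share the same metric unit as B.
    """
    return b * B / (delta_x + b * math.tan(phi))

# Hypothetical example: b = 50 mm, B = 100 mm, disparity 0.2 mm, tilt of 0.1 degrees
phi = math.radians(0.1)
print(depth_tilted(50.0, 100.0, 0.2, phi))   # tilted case, Eq. (5)
print(depth_tilted(50.0, 100.0, 0.2, 0.0))   # phi = 0 reduces to Eq. (2)
```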

3 SPC Ray Model

To conceptualise a light field ray model for an SPC, we start tracing rays from the sensor side into object space. For simplification, we consider chief rays only and follow their path from each pixel centre in the micro image domain u to the optical centre of its corresponding micro lens \(s_j\) with lens index j. In an SPC, the spacing between MLA and image sensor plane amounts to the micro lens focal length \(f_s\). Figure 4 visualises chief rays travelling through a micro lens and the objective lens, indicating Micro Image Centres (MICs). With the aid of ray geometry, an MIC is found by a chief ray connecting the optical centre of a micro lens with that of the main lens. MICs play a key role in realigning a light field from an SPC and are locally obtained by \(c={(M - 1)}/{2}\), where M denotes the one-dimensional (1-D) micro image resolution, which is assumed to be consistent across micro images. Discrete micro image points in the horizontal direction are then indexed by \({c+i}\), where \({i \in [-c,c]}\), such that 1-D micro image samples are given as \(u_{c+i,j}\).

Fig. 4

Lens components of plenoptic camera (Hahne et al. 2016) depicting a micro lens \(s_j\) with pitch size \(p_M\) in a and an objective lens with exit pupil \(A'\) in b. A chief ray \(m_{c+i,j}\) pierces through the micro lens centre and sensor sampling positions \(c+i\) which are separated by pixel width \(p_p\). Chief rays originate from the exit pupil centre \(A'\) and arrive at Micro Image Centres (MICs) where red coloured crossbars signify gaps between MICs and respective micro lens optical axes. It can be seen that red crossbars grow towards image edges (Color figure online)

In earlier publications (Hahne et al. 2014a, b), it was assumed that MICs lie on the optical axes of corresponding micro lenses. However, it has been argued that this assumption would only hold if the distance between objective lens and MLA were infinitely large (Dansereau 2014). Due to the finite separation, MICs are displaced from their micro lens optical axes. A more accurate approach to estimating MIC positions is to model chief rays such that they connect the optical centres of micro and main lenses (Dansereau et al. 2013). In Fig. 4b we further refine this hypothesis by regarding the centre of the exit pupil \(A'\) as the origin from which MIC chief rays arise. Detecting MICs correctly is essential for our geometrical light ray model, because MICs serve as reference points in the viewpoint image synthesis.

Fig. 5

Illustration of the SPC ray model (Hahne et al. 2016), where MICs can be found by connecting the optical centre of the main lens with that of each micro lens and extending these rays (highlighted in yellow) until they reach the sensor. Here, the main lens is modelled as a thin lens such that entrance and exit pupils are in line with principal planes (Color figure online)

Figure 5 depicts our more advanced model that combines statements made about light rays’ paths in an SPC. For clarity, the main lens U is depicted as a thin lens meaning that the exit pupil centre coincides with the optical centre. However, the distinction is maintained in the following.

3.1 Viewpoint Extraction

It has been shown in Adelson and Wang (1992), Ng (2006), Dansereau (2014) and Bok et al. (2014) that extracting viewpoints from an SPC can be achieved by collecting all pixels sharing the same respective micro image position. To comply with the notation provided above, a 1-D sub-aperture image \(E_{i}\left[ s_j\right] \) with viewpoint index i is computed with

$$\begin{aligned} E_{i}\left[ s_j\right] = E_{f_s}\left[ s_j, \, u_{c+i}\right] \end{aligned}$$
(6)

where u and c have been omitted in the subscript of \(E_{i}\) since i is a sufficient index for sub-aperture images in the 1-D row. Equation (6) implies that the effective viewpoint resolution equals the number of micro lenses. Figure 6 depicts the reordering process producing 2-D sub-aperture images \(E_{(i,g)}\) by means of index variables \(\left[ s_j, \, t_h\right] \) and \(\left[ u_{c+i} \, , \, v_{c+g}\right] \) for spatial and directional domains, respectively. As can be seen from colour-highlighted pixels, samples at a specific micro image position correspond to the respective viewpoint location in a camera array.

Fig. 6

Multiple sub-aperture image extraction with a calibrated raw image in a as obtained by an SPC and extracted 2-D sub-aperture images \(E_{(i,g)}\) in b where each colour represents a different perspective view. Note that the above figures consider a \(180^{\circ }\) image rotation by the sensor to compensate for main lens image rotation. Micro image samples are indexed by \({\left[ s_j, t_h\right] }\) and pixels within micro images by \({\left[ u_{c+i}, v_{c+g}\right] }\) with \(M=3\). Coordinates \({\left[ u_{c+i}, v_{c+g}\right] }\) index viewpoint images and \({\left[ s_j, t_h\right] }\) their related spatial pixels (Color figure online)

Since raw SPC captures do not naturally feature the \(E_{f_s}\left[ s_j, \, u_{c+i}\right] \) index notation, it is convenient to define an index translation formula, considering the light field photograph to be of two regular sensor dimensions \(\left[ x_k, \, y_l\right] \) as taken with a conventional sensor. In the horizontal dimension indices are converted by

$$\begin{aligned} k =j \times M+c+i, \end{aligned}$$
(7)

which means that \(\left[ x_k\right] \) is formed by

$$\begin{aligned} \left[ x_k\right] = \left[ x_{j \times M+c+i}\right] = \left[ s_j, \, u_{c+i}\right] , \end{aligned}$$
(8)

bearing in mind that M represents the 1-D micro image resolution. Similarly, the vertical index translation may be

$$\begin{aligned} l = h \times M+c+g \end{aligned}$$
(9)

and therefore

$$\begin{aligned} \left[ y_l\right] = \left[ y_{h \times M+c+g}\right] = \left[ t_h, \, v_{c+g}\right] . \end{aligned}$$
(10)

These definitions comply with Fig. 6 and enable us to apply our 4-D light field notation \(\left[ s_j, \, u_{c+i}, \, t_h, \, v_{c+g}\right] \) to conventionally 2-D sampled representations \(\left[ x_k, \, y_l\right] \), where k and l start counting from index 0. To apply the proposed ray model and image processing, the captured light field has to be calibrated and rectified such that the centroid of each micro image coincides with the centre of a central pixel. This requires image interpolation with sub-pixel precision, which was first pointed out by Cho et al. (2013) and confirmed by Dansereau et al. (2013).
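The index translation of Eqs. (7)–(10) can be expressed compactly as a strided slicing operation. The following sketch is a minimal illustration only: the array shape, the choice of NumPy and the micro image resolution M = 3 are assumptions, and the input is presumed to be a calibrated and rectified raw capture.

```python
import numpy as np

def extract_viewpoint(raw, M, i, g):
    """Collect one sub-aperture image E_(i,g) from a calibrated raw SPC capture.

    raw : 2-D array indexed as [y_l, x_k] whose micro images are M x M pixels
    M   : 1-D micro image resolution (odd)
    i,g : horizontal/vertical viewpoint indices in [-c, c] with c = (M - 1) // 2
    """
    c = (M - 1) // 2                      # micro image centre offset
    # k = j*M + c + i and l = h*M + c + g (Eqs. 7 and 9) for all j, h at once
    return raw[c + g::M, c + i::M]

# Hypothetical example: a 9 x 12 pixel raw image with M = 3
raw = np.arange(9 * 12).reshape(9, 12)
E_00 = extract_viewpoint(raw, M=3, i=0, g=0)   # central viewpoint, 3 x 4 pixels
print(E_00.shape)
```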

3.2 Virtual Camera Array

In the previous section, it was shown how to render multi-views from SPC photographs by means of the proposed ray model. Because a 4-D plenoptic camera image can be reorganised into a set of multi-view images as if taken with an array of cameras, each of these images is supposed to possess the optical centre of a so-called virtual camera at a distinct location. Localising these centres is, however, not obvious. This problem was first recognised and addressed in publications by our research group (Hahne et al. 2014a, b), but lacked experimental verification. As a starting point, we deploy ray functions that proved viable to pinpoint refocused SPC image planes (Hahne et al. 2016) and further refine the model by finding intersections along the entrance pupil. Once theoretical positions of virtual cameras are derived, we examine in which way the well-established concept of stereo triangulation (see Sect. 2) applies to the proposed SPC ray model.

In order to geometrically describe rays in the light field, we first define the height of optical centres \(s_j\) in the MLA by

$$\begin{aligned} s_j = (j-o) \times p_{M} \end{aligned}$$
(11)

with \(o = (J-1) /2\) as the index of the central micro lens where J is the overall number of micro lenses in the horizontal direction. Geometrical MIC positions are denoted as \(u_{c,j}\) and can be found by tracing main lens chief rays travelling through the optical centre of each micro lens. This is calculated by

$$\begin{aligned} u_{c,j} = \frac{s_j}{d_{A'}} \times f_s + s_j, \end{aligned}$$
(12)

where \(f_s\) is the micro lens focal length and \(d_{A'}\) is the distance from MLA to exit pupil of the main lens, which is illustrated in Fig. 4b. Micro image sampling positions that lie next to MICs can be acquired by a corresponding multiple i of the pixel pitch \(p_p\) as given by

$$\begin{aligned} u_{c+i,j} = u_{c,j} + i \times p_p. \end{aligned}$$
(13)

Chief ray slopes \(m_{c+i,j}\) that impinge at micro image positions \(u_{c+i,j}\) can be acquired by

$$\begin{aligned} m_{c+i,j} = \frac{s_j - u_{c+i,j}}{f_s}. \end{aligned}$$
(14)

Let \(b_U\) be the objective’s image distance, then a chief ray’s intersection at the refractive main lens plane \(U_{i,j}\) is given by

$$\begin{aligned} U_{i,j} = m_{c+i,j} \times b_U + s_j, \end{aligned}$$
(15)

where c has been left out in the subscript of \(U_{i,j}\) as it is a constant and will be omitted in following ray functions for simplicity. The spacing between principal planes of an objective lens will be taken into account at a later stage.

Since the main lens works as a refracting element, chief rays possess different slopes in object space, which can be calculated as follows

$$\begin{aligned} F_{i,j}= m_{c+i,j} \times f_U, \end{aligned}$$
(16)

with a chief ray passing through a point \(F_{i,j}\) along the main lens focal plane F by means of its image side slope \(m_{c+i,j}\) and the main lens focal length \(f_U\). Consequently, a chief ray slope \(q_{i,j}\) of that beam in object space is given by

$$\begin{aligned} q_{i,j} = \frac{F_{i,j} - U_{i,j}}{f_U} \end{aligned}$$
(17)

as it depends on the intersections at refractive main lens plane U, focal plane \(F_U\) and the chief ray’s travelling distance, which is \(f_U\) in this particular case. With reference to preliminary remarks, an object ray’s path may be provided as a linear function \(\widehat{f}_{i,j}\) of the depth z, which is written as

$$\begin{aligned} \widehat{f}_{i,j}(z)&= q_{i,j} \times z+U_{i,j},\quad z \in \left[ U,\infty \right) . \end{aligned}$$
(18)
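The chain of Eqs. (11)–(18) translates into a short helper that returns the object-space chief ray for a pixel index i under micro lens j. The sketch below is illustrative only; the numerical parameters are placeholders rather than the lens data of Tables 1 and 2.

```python
def object_ray(i, j, J, p_M, p_p, f_s, d_A, f_U, b_U):
    """Return slope q_{i,j} and intercept U_{i,j} of the object-space chief ray
    according to Eqs. (11)-(18). All lengths share the same unit, e.g. mm."""
    o = (J - 1) / 2                       # central micro lens index
    s_j = (j - o) * p_M                   # micro lens optical centre height, Eq. (11)
    u_c = s_j / d_A * f_s + s_j           # MIC position, Eq. (12)
    u_ci = u_c + i * p_p                  # micro image sampling position, Eq. (13)
    m = (s_j - u_ci) / f_s                # image-side chief ray slope, Eq. (14)
    U_ij = m * b_U + s_j                  # intersection at main lens plane, Eq. (15)
    F_ij = m * f_U                        # intersection at main lens focal plane, Eq. (16)
    q_ij = (F_ij - U_ij) / f_U            # object-side slope, Eq. (17)
    return q_ij, U_ij                     # ray: f(z) = q_ij * z + U_ij, Eq. (18)

# Hypothetical parameters (not the lenses of Tables 1 and 2)
q, U = object_ray(i=1, j=150, J=281, p_M=0.125, p_p=0.009,
                  f_s=2.75, d_A=110.0, f_U=193.0, b_U=193.0)
print(q, U)
```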

As the name suggests, sub-aperture images are created at the main lens’ aperture. To investigate ray positions at the aperture, it is worth introducing the aperture’s geometrical equivalents to the proposed model, which have not been considered in our previous publications (Hahne et al. 2014a). An obvious attempt would be to locate a baseline \(B_{A'}\) at the exit pupil, which is found by

$$\begin{aligned} B_{A'} = m_{c+i,j} \times d_{A'}, \end{aligned}$$
(19)

where \(m_{c+i,j}\) is obtained from Eq. (14). Practical applications of an image-side baseline \(B_{A'}\) are unclear at this stage.

Fig. 7

SPC model triangulation with \(b_U=f_U\) and principal planes \(H_{1U}\), \(H_{2U}\) just as the exit \(A'\) and entrance pupil plane \(A''\). Red circles next to \(A''_{i}\) indicate virtual camera positions. Note that virtual cameras \(A''_{-1}\) and \(A''_1\) are separated by gap \(G=2\) yielding baseline \(B_2\) (Color figure online)

Fig. 8

SPC model triangulation with \(b_U>f_U\). Red circles next to \(A''_{i}\) indicate virtual camera positions. Note that the gap \(G=1\) and therefore \(B_1\) and \(\varPhi _1\) (Color figure online)

However, the baseline at the entrance pupil \(A''\) is a much more valuable parameter when determining an object distance via triangulation in an SPC. Figure 7 offers a closer look at our light field ray model by also showing principal planes \(H_{1U}\) and \(H_{2U}\). There, it can be seen that all rays having i in common (e.g. blue rays) geometrically converge to the entrance pupil \(A''\) and diverge from the exit pupil \(A'\). Intersecting chief rays at the entrance pupil can be seen as indicating object-side-related positions of virtual cameras \(A''_{i}\).

The calculation of virtual camera positions \(A''_{i}\) is provided in the following. By taking object space ray functions \(\widehat{f}_{i,j}(z)\) from Eq. (18) for two rays with different j but same i and setting them equal as given by

$$\begin{aligned} q_{i,o}\times z + U_{i,o} = q_{i,o+1}\times z + U_{i,o+1}, \, \, z \in \left( -\infty , \infty \right) , \end{aligned}$$
(20)

we can solve the equation system, which yields the distance \(\overline{A''H_{1U}}\) from the entrance pupil \(A''\) to the object-side principal plane \(H_{1U}\) (see Fig. 7). Recall that the central micro lens \(s_j\) is indexed by \(j=o\) with \(o=(J-1)/2\) as the image centre offset.

Table 1 Micro lens specifications for \(\lambda =550\) nm
Table 2 Main lens parameters

The object-side-related position of \(A''_i\) can be acquired by

$$\begin{aligned} A''_i = q_{i,o} \times \overline{A''H_{1U}} + U_{i,o}. \end{aligned}$$
(21)

With this, a baseline \(B_{G}\) that spans from one \(A''_i\) to another by gap G can be obtained as follows

$$\begin{aligned} B_{G} = A''_{i} + A''_{i+G}. \end{aligned}$$
(22)
Fig. 9

Photographs from our custom-built camera with a camera body and collimator and b MLA fixation

For example, a baseline \(B_1\) ranging from \(A''_{0}\) to \(A''_{1}\) is identical to that from \(A''_{-1}\) to \(A''_{0}\). This relies on the principle that virtual cameras are separated by a consistent width. To apply the triangulation concept, rays are virtually extended towards the image space by

$$\begin{aligned} N_{i,j} = -q_{i,j} \times b_N + A''_{i}, \end{aligned}$$
(23)

where \(b_N\) is an arbitrary scalar which can be thought of as a virtual image distance and \(N_{i,j}\) as a spatial position at the virtual image plane of a corresponding sub-aperture. The scalable variable \(b_N\) linearly affects a virtual pixel pitch \(p_N\), which is found by

$$\begin{aligned} p_N = \big |N_{i,o} - N_{i,o+1}\big |. \end{aligned}$$
(24)

Setting \(b_U=f_U\) aligns optical axes \(z'_i\) of virtual cameras to be parallel to the main optical axis \(z_U\) (see Fig. 7). For all other cases where \(b_U\ne f_U\) (e.g. Fig. 8), the rotation angle \(\varPhi _i\) of a virtual optical axis \(z'_i\) is obtained by

$$\begin{aligned} \varPhi _i = \arctan {\left( q_{i,o}\right) }. \end{aligned}$$
(25)

The relative tilt angle \(\varPhi _G\) from one camera to another can be calculated with

$$\begin{aligned} \varPhi _{G} = \varPhi _{i} + \varPhi _{i+G}, \end{aligned}$$
(26)

which completes the characterisation of virtual cameras.
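As an illustration of Eqs. (20) and (21), the sketch below intersects two object-space rays that share the viewpoint index i but stem from neighbouring micro lenses and returns the virtual camera position \(A''_{i}\); baseline \(B_G\) and tilt angle \(\varPhi _G\) then follow from these positions via Eqs. (22), (25) and (26). All parameter values are placeholders rather than the specifications of Tables 1 and 2.

```python
def object_ray(i, j, J, p_M, p_p, f_s, d_A, f_U, b_U):
    """Object-space chief ray (slope q_ij, intercept U_ij), Eqs. (11)-(18)."""
    o = (J - 1) / 2
    s_j = (j - o) * p_M
    u = s_j / d_A * f_s + s_j + i * p_p     # sampling position u_{c+i,j}
    m = (s_j - u) / f_s                     # image-side slope
    U_ij = m * b_U + s_j
    q_ij = (m * f_U - U_ij) / f_U
    return q_ij, U_ij

def virtual_camera_position(i, J, **lens):
    """Virtual camera position A''_i along the entrance pupil, Eqs. (20)-(21)."""
    o = (J - 1) // 2
    q0, U0 = object_ray(i, o, J, **lens)        # ray from the central micro lens
    q1, U1 = object_ray(i, o + 1, J, **lens)    # ray from the neighbouring micro lens
    z = (U1 - U0) / (q0 - q1)                   # ray intersection depth, Eq. (20)
    return q0 * z + U0                          # Eq. (21)

# Placeholder optics (not the lenses of Tables 1 and 2), here with b_U = f_U
lens = dict(p_M=0.125, p_p=0.009, f_s=2.75, d_A=110.0, f_U=193.0, b_U=193.0)
for i in (-1, 0, 1):
    print(i, virtual_camera_position(i, J=281, **lens))
```

Printing the positions for consecutive i illustrates the consistent spacing of the virtual cameras discussed above.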

Figure 8 visualises chief rays’ paths in the light field when focusing the objective lens such that \(b_U>f_U\). In this case, \(z'_i\) intersects with \(z_U\) at the plane at which the objective lens is focusing. Objects placed at this plane possess a disparity \(\Delta x = 0\) and thus are expected to be located at the same relative 2-D position in each sub-aperture image. As a consequence, objects placed behind the \(\Delta x = 0\) plane expose negative disparity.

Establishing the triangulation in an SPC allows object distances to be retrieved just as in a stereoscopic camera system. On the basis of Eq. (5), a depth distance \(Z_{G,\Delta x}\) of an object with certain disparity \(\Delta x\) is obtained by

$$\begin{aligned} Z_{G,\Delta x} = \frac{b_N \times B_{G}}{\Delta x \times p_N + b_N \times \tan \left( \varPhi _G\right) } \end{aligned}$$
(27)

and can be shortened to

$$\begin{aligned} Z_{G,\Delta x} = \frac{b_N \times B_{G}}{\Delta x \times p_N}, \quad \text {if} \, \, \varPhi _G = 0 \end{aligned}$$
(28)

which is only the case where \(b_U=f_U\). One may notice that Eq. (28) is an adapted version of the well-known triangulation equation given in Eq. (2).
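A minimal sketch of the depth retrieval in Eqs. (27) and (28); all numerical values are hypothetical and merely illustrate the unit handling (disparity in pixels, whereas \(p_N\), \(b_N\) and \(B_G\) share a common metric unit).

```python
import math

def spc_depth(delta_x, p_N, b_N, B_G, Phi_G=0.0):
    """Object distance Z_{G,dx} from Eq. (27); Phi_G = 0 recovers Eq. (28).

    delta_x : disparity in pixels
    p_N     : virtual pixel pitch, Eq. (24)
    b_N     : virtual image distance (arbitrary scalar)
    B_G     : baseline between virtual cameras separated by gap G
    Phi_G   : relative tilt angle in radians
    """
    return (b_N * B_G) / (delta_x * p_N + b_N * math.tan(Phi_G))

# Hypothetical values
print(spc_depth(delta_x=3, p_N=0.01, b_N=1.0, B_G=2.0))                    # Eq. (28)
print(spc_depth(delta_x=3, p_N=0.01, b_N=1.0, B_G=2.0,
                Phi_G=math.radians(0.05)))                                 # Eq. (27)
```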

4 Validation

We deploy a custom-made plenoptic camera containing a full frame sensor with 4008 \(\times \) 2672 active image resolution and \(p_p~=~9~\mu \)m pixel pitch. Photos of our camera are depicted in Fig. 9. Details on the assembly and optical calibration of an SPC can be found in Hahne’s thesis (2016). Lens and MLA specifications are provided hereafter.

4.1 Lens Specification

Experiments are conducted with two different micro lens designs, denoted as MLA (I.) and (II.), which can be found in Table 1. Input parameters relevant to the triangulation are \(f_s\) and \(p_M\). Besides this, Table 1 provides the lens thickness \(t_s\), refractive index n, radii of curvature \(R_{s1}\), \(R_{s2}\) and principal plane distance \(\overline{H_{1s}H_{2s}}\). The number of micro lenses in our MLA amounts to 281 \(\times \) 188 in the horizontal and vertical dimensions, respectively. These values allow the micro lenses to be modelled in an optical design software.

Fig. 10

Disparity maps from sub-aperture images \(E_{(i,g)}\) with \(b_U = f_U\). a Central image \(E_{(0,0)}\) containing 281 by 188 pixels; b disp. map with \(G=4\), \(\text {max}\{\Delta x\} = 5\) and block size = 29; c disp. map with \(G=8\), \(\text {max}\{\Delta x\} = 9\) and block size = 39; d disp. map with \(G=4\), \(\text {max}\{\Delta x\} = 5\) and block size = 29. a Reference image \(E_{(0,0)}\) where \(d_f \rightarrow \infty \). b \(\Delta x\) values from \(E_{(-2,0)}\) and \(E_{(2,0)}\). c \(\Delta x\) values from \(E_{(-4,0)}\) and \(E_{(4,0)}\). d \(\Delta x\) values from \(E_{(0,0)}\) and \(E_{(4,0)}\)

Table 3 Baseline results \(B_G\) with infinity focus \((b_U=f_U)\)

It is well known that the focus ring of today’s objective lenses moves a few lens groups whilst others remain static, which, in consequence, changes the lens system’s cardinal points. To prevent this and simplify the experimental setup, we only shift the plenoptic sensor away from the main lens to vary its image distance \(b_U\), keeping the focus ring at infinity. In doing so, we ensure that the cardinal points remain at the same relative position. However, the available space in our customised camera constrains the sensor’s shift range to an overall focus distance of \(d_f \approx \) 4 m, where \(d_f\) is the distance from the MLA’s front vertex to the plane on which the main lens is focused. For this reason, we examine two focus settings \((d_f \rightarrow \infty ~\text {and}~d_f \approx ~4~\text {m})\) in the experiment. To acquire the main lens image distance \(b_U\), we employ the thin lens equation and solve for \(b_U\) as given by

$$\begin{aligned} b_U = \left( \frac{1}{f_U} - \frac{1}{a_U}\right) ^{-1}, \end{aligned}$$
(29)

with \(a_U=d_f - b_U - \overline{H_{1U}H_{2U}}\) as the object distance. After substituting for \(a_U\), however, it can be seen that \(b_U\) is an input and output parameter at the same time, which turns out to be a typical chicken-and-egg case. To treat this problem, we define the initial image distance to be the focal length (\(b_U:=f_U\)) and substitute the resulting \(b_U\) for the input variable afterwards. This procedure is iterated until input and output values coincide. Objective lenses are denoted as \(f_{193}\), \(f_{90}\) and \(f_{197}\) with index numbers representing focal lengths in millimetres. The lens designs for \(f_{193}\) and \(f_{90}\) were taken from Caldwell (2000) and Yanagisawa (1990), whilst \(f_{197}\) is obtained experimentally using the technique provided by TRIOPTICS (2015). Table 2 lists calculated image, exit pupil and principal plane distances for the main lenses. It is noteworthy that all parameters are provided with respect to a 550 nm wavelength. Precise focal lengths \(f_U\) are found in the image distance column at the infinity focus row.
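The iterative solution of Eq. (29) described above can be sketched as a simple fixed-point loop; the principal plane separation used in the example is a placeholder, not a value from Table 2.

```python
def image_distance(f_U, d_f, H1H2, tol=1e-9, max_iter=100):
    """Iteratively solve Eq. (29) for the main lens image distance b_U.

    f_U  : main lens focal length [mm]
    d_f  : focus distance from the MLA front vertex to the focused plane [mm]
    H1H2 : principal plane separation of the main lens [mm]
    """
    b_U = f_U                                   # initial guess b_U := f_U
    for _ in range(max_iter):
        a_U = d_f - b_U - H1H2                  # object distance
        b_new = 1.0 / (1.0 / f_U - 1.0 / a_U)   # thin lens equation, Eq. (29)
        if abs(b_new - b_U) < tol:              # stop once input and output agree
            break
        b_U = b_new
    return b_U

# Example: f_U = 197 mm, d_f = 4 m; the 50 mm principal plane gap is a placeholder
print(image_distance(197.0, 4000.0, 50.0))
```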

4.2 Experiments

To verify the claims made about SPC triangulation, experiments are conducted as follows. Baselines and tilt angles are estimated based on Eqs. (22) and (26) using the parameters given in Tables 1 and 2. From these, we compute object distances from Eq. (27) for each disparity and place real objects at the calculated distances. Experimental validation is achieved by comparing predicted baselines with those obtained from disparity measurements. The extraction of a disparity map from an SPC requires at least two sub-aperture images, which are obtained using Eq. (6). Disparity maps are calculated by block matching with the Sum of Absolute Differences (SAD) method using an available implementation (Abbeloos 2010, 2012). To measure baselines, Eq. (27) has to be rearranged such that

$$\begin{aligned} B_{G} = \frac{Z_{G,\Delta x} \times \left( \Delta x \times p_N + b_N \times \tan \left( \varPhi _G\right) \right) }{b_N}. \end{aligned}$$
(30)

This formula can also be written as

$$\begin{aligned} \varPhi _G = \arctan \left( \frac{\frac{B_{G} \times b_N}{Z_{G,\Delta x}} - \Delta x \times p_N}{b_N}\right) , \end{aligned}$$
(31)

which yields a relative tilt angle \(\varPhi _G\) in radians that can be converted to degrees by multiplication by \(180/\pi \).
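Equations (30) and (31) are the measurement counterparts of Eq. (27) and may be sketched as follows; the input values are again hypothetical.

```python
import math

def measured_baseline(Z, delta_x, p_N, b_N, Phi_G):
    """Baseline recovered from a known object distance Z and disparity, Eq. (30)."""
    return Z * (delta_x * p_N + b_N * math.tan(Phi_G)) / b_N

def measured_tilt(Z, delta_x, p_N, b_N, B_G):
    """Relative tilt angle recovered from Z, disparity and baseline, Eq. (31)."""
    return math.atan((B_G * b_N / Z - delta_x * p_N) / b_N)

# Hypothetical example; the angle is returned in radians and converted to degrees
phi = measured_tilt(Z=2000.0, delta_x=3, p_N=0.01, b_N=1.0, B_G=1.5)
print(math.degrees(phi))
```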

Stereo triangulation experiments are conducted such that \(B_4\) and \(B_8\), just as \(\varPhi _4\) and \(\varPhi _8\), are predicted based on main lens \(f_{197}\) and MLA (II.) with \(d_f \rightarrow \infty \) and \(d_f \approx 4\) m focus setting. Real objects were placed at selected depth distances \(Z_{G,\Delta x}\) calculated from this setup.

An exemplary sub-aperture image \(E_{(i,g)}\) with infinity focus setting and related disparity maps are shown in Fig. 10. A sub-pixel precise disparity measurement has been applied to Fig. 10b, d as the action figure lies between integer disparities. The disparities in Fig. 10b, d are nearly identical since both viewpoint pairs are separated by \(G=4\), albeit placed at different horizontal positions. This justifies the claim that the spacing between adjacent virtual cameras is consistent. Besides, it is also apparent that objects at far distances expose lower disparity values and vice versa. Comparing Fig. 10b, c shows that a successive increase in the baseline \(B_G\) implies a growth in the object’s disparity values, an observation also found in traditional computer stereo vision.

Table 3 lists baseline measurements and corresponding deviations with respect to the predicted baseline. This table is quite revealing in several ways. First, the most striking result is that there is no significant difference between baseline predictions and measurements using the model proposed in this paper. The reason for a 0% deviation is that objects are placed at the centre of predicted depth planes \(Z_{G,\Delta x}\). An experiment conducted with random object positions would yield non-zero errors that do not reflect the model’s accuracy, but rather our SPC’s capability to resolve depth, which depends on MLA and sensor specification. Hence, such an experiment is only meaningful when evaluating the camera’s depth resolution. A more revealing percentage error is obtained by a larger number of disparities, which in turn requires the baseline to be extended. These parameters have been maximised in our experimental setup making it difficult to further refine depth. To obtain quantitative error results, Sect. 4.3 aims to benchmark proposed SPC triangulation with the aid of a simulation tool (Zemax 2011).

A second observation is that our previous methods (Hahne et al. 2014a, b) yield identical baseline estimates, but fail experimental validation exhibiting significantly large errors in the triangulation. This is due to the fact that our previous model ignored pupil positions of the main lens such that virtual cameras were seen to be lined up on its front focal plane instead of its entrance pupil. Baseline estimates calculated according to a definition provided by Jeon et al. (2015) further deviate from our results with \(B_4 = 290.7293\) mm and \(B_8 = 581.4586\) mm. As the authors disregard optical centre positions of the sub-aperture images, it is impossible to obtain distances via triangulation and assess results using percentage errors.

Fig. 11

Disparity maps from sub-aperture images \(E_{(i,g)}\) with \(b_U > f_U\). a Central image \(E_{(0,0)}\) containing 281 by 187 pixels; b disp. map with \(G=4\), \(\text {max}\{\Delta x\} = 5\) and block size = 33; c disp. map with \(G=8\), \(\text {max}\{\Delta x\} = 9\) and block size = 39; d disp. map with \(G=4\), \(\text {max}\{\Delta x\} = 5\) and block size = 33. a Reference image \(E_{(0,0)}\) where \(d_f \approx 4\) m. b \(\Delta x\) values from \(E_{(-2,0)}\) and \(E_{(2,0)}\). c \(\Delta x\) values from \(E_{(-4,0)}\) and \(E_{(4,0)}\). d \(\Delta x\) values from \(E_{(0,0)}\) and \(E_{(4,0)}\).

Table 4 Tilt angle results \(\varPhi _G\) with 4 m focus (\(b_U>f_U\))

Whenever \(d_f \rightarrow \infty \), virtual camera tilt angles in our model are assumed to be \(\varPhi _G=0^{\circ }\). Accurate baseline measurements inevitably confirm predicted tilt angles as measured baselines would deviate otherwise. To ensure this is the case, a second SPC triangulation experiment is carried out with \(d_f \approx 4\) m, yielding images shown in Fig. 11.

Disparity maps in Fig. 11b, d give further indication that the spacing between adjacent virtual cameras is consistent. Results in Table 4 demonstrate that tilt angle predictions match measurements. It is further shown that virtual cameras are rotated by small angles of less than a degree. Nevertheless, these tilt angles are non-negligible as they are large enough to shift the \(\Delta x=0\) disparity plane from infinity to \(d_f \approx 4\) m, which can be seen in Fig. 11.

Generally, Tables 3 and 4 suggest that the adapted stereo triangulation concept proves to be viable in an SPC without measurable deviations if objects are placed at predicted distances. A maximum baseline is achieved with a short MLA focal length \(f_s\), large micro lens pitch \(p_M\), long main lens focal length \(f_U\) and a sufficiently large entrance pupil diameter.

A baseline approximation of the first-generation Lytro camera may be achieved with the aid of the metadata (*.json file) attached to each light field photograph as it contains information about the micro lens focal length \(f_s=0.025\) mm, pixel pitch \(p_p~\approx ~0.0014\) mm and micro lens pitch \(p_M~\approx ~0.0139\) mm, yielding \(M=9.9286\) samples per micro image. The accommodated zoom lens provides a variable focal length in the range of \(f_U = 6.45\)–51.4 mm (43–341 mm as 35 mm-equivalent) (Ellison 2014). It is unclear whether the source refers to the main lens only or to the entire optical system including the MLA. From this, hypothetical baseline estimates for the first-generation Lytro camera are calculated via Eqs. (20)–(22) and given in Table 5.

Table 5 Baseline estimates of Lytro’s 1st generation camera
Table 6 Baseline and tilt angle simulation with \(G=6\) and \(i=0\)

Disparity analysis of perspective Lytro images should lead to baseline measures \(B_G\) similar to those of the prediction. However, verification is impossible as the settings of the camera’s automatic zoom lens (current principal plane and pupil locations) are undisclosed. Reliable measurements of such require disassembly of the main lens, which is impractical in the case of present-day Lytro cameras as main lenses cannot be detached.

4.3 Simulation

To obtain quantitative measures, this section investigates the positioning of a virtual camera array by modelling a plenoptic camera in an optics simulation software (Zemax 2011). Table 6 reveals a comparison of predicted and simulated virtual camera positions just as their baseline \(B_{G}\) and relative tilt angle \(\varPhi _G\). Thereby, the distance from an objective’s front vertex \(V_{1U}\) to entrance pupil \(A''\) is given by

$$\begin{aligned} \overline{V_{1U}A''} = \overline{V_{1U}H_{1U}} + \overline{A''H_{1U}} \end{aligned}$$
(32)

bearing in mind that \(\overline{A''H_{1U}}\) is the distance from the entrance pupil \(A''\) to the object-side principal plane \(H_{1U}\), and \(\overline{V_{1U}H_{1U}}\) separates the front vertex \(V_{1U}\) from the object-side principal plane \(H_{1U}\). Simulated \(\overline{V_{1U}A''}\) values are obtained by extending rays with slopes \(q_{i,j}\) back towards the sensor, whereby these virtually elongated rays are seen to ignore the lenses, and finding the intersection of \(q_{i,j}\) and \(q_{i,j+1}\).

Observations in Table 6 indicate that the baseline grows with

  • larger main lens focal length \(f_U\)

  • shorter micro lens focal length \(f_s\)

  • decreasing focusing distance \(d_f\) \((a_U)\)

given that the entrance pupil diameter is large enough to accommodate the baseline. In addition, the simulation shows that tilt angle rotations become larger with decreasing \(d_f\). Baselines have been estimated accurately with errors below 0.1% on average, except for one example. The key problem causing the largest error is that MLA (I.) features a shorter focal length \(f_s\) than MLA (II.), which produces steeper light ray slopes \(m_{c+i,j}\) and hence more severe aberration effects. Tilt angle errors remain below 0.3%, although results deviate by only \(0.001^{\circ }\) for \(f_{90}\) and are even non-existent for \(f_{193}\). However, entrance pupil location errors of about \(\le \)1% are larger than in any other simulated validation. One reason for these inaccuracies is that the entrance pupil \(A''\) is an imaginary vertical plane, which in reality may exhibit a non-linear shape around the optical axis.

An experiment assessing the relationship between disparity \(\Delta x\) and distance \(Z_{G,\Delta x}\) using different objective lenses is presented in Table 7. From this, it can be concluded that denser depth sampling is achieved with a larger main lens focal length \(f_U\). Moreover, it is seen that a tilt in virtual cameras yields a negative disparity \(\Delta x\) for objects further away than \(d_f\), a phenomenon that also applies to tilted cameras in stereoscopy. The reason why \(d_f \approx Z_{G,\Delta x}\) when \(\Delta x=0\) is that \(Z_{G,\Delta x}\) reflects the separation between the ray intersection and the entrance pupil \(A''\), which lies near the sensor, whereas \(d_f\) is the spacing between the ray intersection and the MLA’s front vertex. Overall, it can be stated that distance estimates based on the stereo triangulation behave similarly to those in geometrical optics, with errors of up to ±0.33%.

Table 7 Disparity simulation and distance with \(G=6\) and \(i=0\)

5 Discussion and Conclusions

In essence, this paper presented the first systematic study on how to successfully apply the triangulation concept to a Standard Plenoptic Camera (SPC). It has been shown that an SPC projects an array of virtual cameras along its entrance pupil, which can be seen as an equivalent to a multi-view camera system. Thereby, the proposed geometry of the SPC’s light field suggests that the entrance pupil diameter constrains the maximum baseline. This backs up and further refines an observation made by Adelson and Wang (1992), who considered the aperture size to be the baseline limit. Our customised SPC merely offers baselines in the millimetre range, which results in relatively small stereo vision setups. Due to this, depth sampling planes move towards the camera, which will prove to be useful for close range applications such as microscopy. It is also expected that multiple viewpoints taken with small baselines evade the occlusion problem.

The presented work has provided the first experimental baseline and distance results based on disparity maps obtained by a plenoptic camera. Predictions of our geometrical model match the experimental measurements without significant deviation. An additional benchmark of the proposed model against an optical simulation software has revealed errors of up to ±0.33% for baseline and distance estimates under different lens settings, which supports the model’s accuracy. Deviations are due to the imperfections of objective lenses. More specifically, prediction inaccuracies may be caused by all sorts of aberrations that result in non-geometrical behaviour of a lens. By compensating for this through enhanced image calibration, we believe it is possible to lower the measured deviation.

The major contribution of the proposed ray model is that it allows any SPC to be used as an object distance estimator. A broad range of applications for which stereoscopy has traditionally been employed can benefit from this solution. This includes endoscopes or microscopes that require very close depth ranges, the automotive industry, where tracking objects in road traffic is a key task, and the robotics industry with robots in space or automatic vacuum cleaners at home. Besides this, plenoptic triangulation may be used for quality assurance purposes in the large field of machine vision. The model further assists in the prototyping stage of plenoptic photo and video cameras as it allows the baseline to be adjusted as desired.

Further research may investigate how triangulation applies to other types of plenoptic cameras, such as the focused plenoptic camera or coded-aperture camera. More broadly, research is also required to benchmark a typical plenoptic camera’s depth resolution against that of competitive depth sensing techniques like stereoscopy, time of flight and light sectioning.