Baseline and Triangulation Geometry in a Standard Plenoptic Camera

In this paper, we demonstrate light field triangulation to determine depth distances and baselines in a plenoptic camera. Advances in micro lenses and image sensors have enabled plenoptic cameras to capture a scene from different viewpoints with sufficient spatial resolution. While object distances can be inferred from disparities in a stereo viewpoint pair using triangulation, this concept remains ambiguous when applied to plenoptic cameras. We present a geometrical light field model that allows triangulation to be applied to a plenoptic camera in order to predict object distances or to specify baselines as desired. It is shown that distance estimates from our novel method match those of real objects placed in front of the camera. Additional benchmark tests with an optical design software further validate the model's accuracy, with deviations of less than ±0.33 % for several main lens types and focus settings. A variety of applications in the automotive and robotics fields can benefit from this estimation model.


Introduction
Computer vision has been striving to recreate human visual perception. Wheatstone's fundamental observations (Wheatstone, 1838) state that a set of only two adjacent cameras suffices to imitate a human's binocular vision. Using these two images in conjunction with a stereo display technique, e.g. stereoscopic glasses (Huang et al, 2015), allows for the reproduction of depth as perceived by humans. With regard to the location in object space, however, this stereo vision system affords much more freedom than human perception, as the distance between the cameras, called the baseline, may vary. Hence, the flexibility in camera stereoscopy makes it possible to adapt to particular depth scenarios. For example, triangulation is used in stellar parallax to measure the distance to stars (Hirshfeld, 2001). What applies to the macroscopic universe may also be useful for a microscope.
However, miniaturising multiple stereo setups to the scale required by microscopes poses a problem for hardware fabrication, since lens diameters restrict baseline gaps between cameras. As an alternative, a Micro Lens Array (MLA) may be placed in front of the image sensor of an otherwise conventional microscope (Broxton et al, 2013), which is generally known as a light field camera. An obvious attempt to regard the micro lens pitch as the baseline proves to be impractical, as optical parameters of the objective lens affect a light field's geometry (Hahne et al, 2014a,b).
The light field camera, also known as the plenoptic camera, was adopted by the field of computer vision after Adelson and Wang (1992) published an article which coined the term plenoptic, deduced from Latin and Greek meaning "full view". The authors were the first to computationally generate a depth map by solving the stereo correspondence problem based on footage from a plenoptic camera and concluded that its baseline is confined to the main lens' aperture size. Although Adelson and Wang could not provide methods to acquire quantitative baseline measures, the authors predicted the baseline to be relatively small. When Levoy and Hanrahan (1996) proposed a concise 4-D light field notation, each ray in the light field could be represented by merely four coordinates (u, v, s, t) obtained from the ray's intersections with two two-dimensional (2-D) planes placed behind one another. In respect of a plenoptic camera, these sampling planes may be represented by the MLA and the image sensor. The maximum directional light field resolution is captured when focusing the micro lenses to infinity, which is accomplished by placing the MLA stationary one focal length in front of the sensor. This plenoptic camera type has been made commercially available by Lytro Inc. (2012) and is capable of synthetically focusing images (Ng et al, 2005; Fiss et al, 2014; Hahne et al, 2016). By shifting the sensor away from the MLA focal plane, research has shown that spatial and directional resolution can be traded off, which involves different image synthesis approaches (Lumsdaine and Georgiev, 2008; Georgiev et al, 2006). To distinguish between these optical setups, Lytro's camera was later named the Standard Plenoptic Camera (SPC) in a publication by Perwass and Wietzke (2012), who devised a more complex MLA that features different micro lens types.
The spatio-angular trade-off in a plenoptic camera is determined by the diameter, focal length, image position and packing of the micro lenses, as well as by the sensor pixel pitch, which makes it part of the optical hardware design.
Over the years, several studies have provided different methods to acquire disparity maps from an SPC (Heber and Pock, 2014; Bok et al, 2014; Jeon et al, 2015; Tao et al, 2017). To the best of our knowledge, researchers have not dealt with the estimation of an object's distance using triangulation on the basis of disparity maps obtained from a light field camera. One reason might be that baselines are required, which are not obvious in the case of plenoptic cameras, as the optics involved are more complex than in conventional stereoscopy. Attempts to estimate a plenoptic camera's baseline were initially addressed in publications by our research group (Hahne et al, 2014a,b), which provided validation through simulation only. Moreover, main lens pupil positions were ignored in that work, yielding large deviations when estimating the distance to refocused image planes obtained from an SPC (Hahne et al, 2016). Our previous triangulation scheme (Hahne et al, 2014a,b) is therefore expected to entail errors, which is subject to investigation in the experimental part of this paper. A more recent study by Jeon et al (2015) has also proposed a baseline estimation method, however without detailing the optical groundwork and without validation.
In this paper, we propose a refined optics-geometrical model for light field triangulation and estimate object distances captured by an SPC. Our plenoptic model is the first to pinpoint virtual cameras along the entrance pupil of the objective lens. Verification is accomplished through real images from a custom-built SPC and a ray tracing simulator (Zemax LLC, 2011) for a quantitative deviation assessment. A top-level overview of the processing pipeline for experimental validation is given in Fig. 1. By doing so, we obtain much more accurate baseline and object distance results than with our previous method (Hahne et al, 2014a) and that of Jeon et al (2015). The proposed concept will prove valuable in fields where stereo vision is traditionally used. This paper is organised in the following way. Section 2 briefly reviews the geometry of binocular vision in order to recall stereo triangulation. This is followed by a step-wise development of an SPC ray model in Section 3, where the extraction of viewpoint images from a raw SPC capture is also demonstrated. Experimental work is presented in Section 4, which aims to assess claims made in Section 3 by measuring baseline and tilt angle from a disparity map analysis and a ray tracing simulation (Zemax LLC, 2011). Results are summarised and discussed in Section 5.

Coplanar Stereo Cameras
The SPC can be seen as a complex derivative of a stereo vision system. The stereo triangulation concept is therefore presented hereafter to serve as a groundwork. Figure 2 illustrates a stereoscopic camera setup in which the sensors are coplanar. The depicted setup may be parameterised by the spacing of the cameras' optical axes, denoted as B for baseline, the cameras' image distance b and the optical centres O_L, O_R of each camera. As seen in the diagram, an object point is projected onto both camera sensors, indicated by orange dots. With regard to the corresponding image centres, the position of the image point in the left camera clearly differs from that in the right. This phenomenon is known as parallax and results in a relative displacement of respective image points from different viewpoints. To measure this displacement, the horizontal disparity ∆x is introduced, given by ∆x = x_R − x_L, where x_R and x_L denote horizontal distances from each projected image point to the optical image centre.

Fig. 2 Stereo triangulation scheme with parallel cameras where a point is projected through the optical centres O_L, O_R yielding two image points (orange) in each camera. The relative displacement of these points returns the horizontal disparity ∆x = x_R − x_L. The baseline B, object distance Z and image distance b affect the measured disparity.

Nowadays, image detectors are composed of discrete photosensitive cells, making it possible to locate and measure ∆x. The disparity computation is a well-studied task (Marr and Poggio, 1976; Yang et al, 1993; Bobick and Intille, 1999) and is often referred to as solving the correspondence problem. Algorithmic solutions to this are applied to a set of points in the image rather than a single one and thus yield a map of ∆x values, which indicate the depth of a captured scene. An object point's depth distance Z can be directly fetched from parameters in Fig. 2. As highlighted with a dark tone of grey, ∆x may represent the base of an acute scalene triangle with b as its height. Another triangle, spanned by the base B and height Z, is a scaled version of it and shown in light grey. This relationship relies on the method of similar triangles and can be written as an equality of ratios

∆x / b = B / Z . (1)

To infer the depth distance Z, Eq. (1) may be rearranged to

Z = (b × B) / ∆x . (2)

As seen by these equations, it is feasible to retrieve information about the depth location Z. Likewise, if ∆x is constant, it is obvious from Eq. (2) that decreasing the baseline B shrinks the object distance Z. Given a case where the depth range is located at a far distance, it is thus recommended to aim for a large baseline. Note that this relationship and corresponding mathematical statements only hold for cases where the optical axes of O_L, O_R are aligned in parallel.
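Since depth retrieval via Z = B × b / ∆x amounts to a one-line computation, the relationship can be sketched as follows (the numerical values are purely illustrative):

```python
# Depth from disparity via similar triangles: Z = B * b / dx.
# B (baseline), b (image distance) and dx (metric disparity) are in mm;
# the chosen values are hypothetical.
def depth_from_disparity(B, b, dx):
    if dx == 0:
        return float('inf')  # zero disparity corresponds to an object at infinity
    return B * b / dx

# A larger baseline maps the same disparity to a farther object:
Z_narrow = depth_from_disparity(B=50.0, b=35.0, dx=0.1)   # 17.5 m
Z_wide = depth_from_disparity(B=200.0, b=35.0, dx=0.1)    # 70 m
```

Quadrupling B while keeping b and ∆x fixed quadruples the estimated distance, which mirrors the recommendation above to enlarge the baseline for far depth ranges.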

Tilted Stereo Cameras
Reasonable scenarios exist in which one camera's optical axis is tilted with respect to the other. In such a case, the principle of similar triangles does not apply in the same manner as in Eq. (1). Taking the left camera as the orientation reference, the right lens O_R is seen to be tilted as shown in Fig. 3. In this case, perspective image rectification is commonly employed to correct for non-coplanar stereo vision setups (Burger and Burge, 2009). Iocchi (1998) concludes that the optical axes intersect at a point at distance Z_0, as both axes lie on the x, z plane if the rotation occurs around the y-axis, whereas the image planes of both cameras are still seen to be parallel. In traditional stereo vision, this yields deviations such that Iocchi's (1998) method serves as a first-order approximation for small rotation angles in the absence of image processing. As demonstrated in Section 3.2, this approach is, however, suitable for our plenoptic triangulation model, where the imaginary sensor planes of virtual cameras are coplanar whilst their optical axes may be non-parallel. Let Φ be the rotation angle, then the laws of trigonometry allow us to put

Z_0 = B / tan(Φ) (3)

and

Z = (b × B) / (∆x + b × tan(Φ)) , (4)

which may be shortened to

Z = (b × B × Z_0) / (∆x × Z_0 + b × B) (5)

after substituting for Z_0. This approximation suffices to estimate the depth Z for small rotation angles Φ in stereoscopic systems without the need for image rectification.
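A minimal sketch of this first-order approximation (with hypothetical numbers) confirms that an object at the intersection point of the optical axes indeed returns ∆x = 0 and depth Z_0:

```python
import math

# First-order depth estimate for a stereo pair whose right camera is rotated
# by phi (radians): the tilt adds an offset of b*tan(phi) to the disparity.
# All lengths are in mm and purely illustrative.
def depth_tilted(B, b, dx, phi):
    return b * B / (dx + b * math.tan(phi))

B, b = 100.0, 35.0
Z0 = 4000.0                  # intended axis intersection distance
phi = math.atan(B / Z0)      # tilt that makes the axes meet at Z0
Z = depth_tilted(B, b, dx=0.0, phi=phi)   # ~4000.0: zero disparity at Z0
```

With phi = 0 the parallel-axis triangulation of Eq. (1) is recovered.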

SPC Ray Model
To conceptualise a light field ray model for an SPC, we start tracing rays from the sensor side towards object space. For simplification, we consider chief rays only and follow their path from each sensor pixel centre in the micro image domain u to the optical centre of its corresponding micro lens s_j with lens index j. Figure 4 visualises chief rays travelling through a micro lens and the objective lens, indicating Micro Image Centres (MICs). With the aid of ray geometry, an MIC is found by a chief ray connecting the optical centre of a micro lens with that of the main lens. MICs play a key role in realigning a light field from an SPC and are obtained locally for each micro lens.

Fig. 4 Lens components of a plenoptic camera (Hahne et al, 2016) depicting a micro lens s_j with pitch size p_M in a and an objective lens with exit pupil A in b. A chief ray m_{c+i,j} pierces through the micro lens centre and sensor sampling positions c + i which are separated by pixel width p_p. Chief rays originate from the exit pupil centre A and arrive at Micro Image Centres (MICs), where red coloured crossbars signify gaps between MICs and the respective micro lens optical axes. It can be seen that the red crossbars grow towards the image edges.
In earlier publications (Hahne et al, 2014a,b), it was assumed that MICs lie on the optical axes of their corresponding micro lenses. However, it has been argued that this assumption would only hold if the distance between objective lens and MLA were infinitely large (Dansereau, 2014). Due to the finite separation, MICs are displaced from their micro lens optical axes. A more accurate approach to estimating MIC positions is to model chief rays such that they connect the optical centres of micro and main lenses (Dansereau et al, 2013). In Fig. 4b we further refine this hypothesis by regarding the centre of the exit pupil A to be the origin from which MIC chief rays arise. Detecting MICs correctly is essential for our geometrical light ray model because MICs serve as reference points in the viewpoint image synthesis.

Fig. 5 Illustration of the SPC ray model (Hahne et al, 2016), where MICs can be found by connecting the optical centre of the main lens with that of each micro lens and extending these rays (highlighted in yellow) until they reach the sensor. Here, the main lens is modelled as a thin lens such that entrance and exit pupil are in line with the principal planes.

Figure 5 depicts our more advanced model that combines the statements made about light ray paths in an SPC. For clarity, the main lens U is depicted as a thin lens, meaning that the exit pupil centre coincides with the optical centre. However, the distinction is maintained in the following.

Viewpoint Extraction
It has been shown in Adelson and Wang (1992), Dansereau (2014) and Bok et al (2014) that extracting viewpoints from an SPC can be attained by collecting all pixels sharing the same respective micro image position. To comply with the provided notations, a 1-D horizontal viewpoint image E_i may be written as

E_i[s_j] = E_fs[s_j, u_{c+i}] , (6)

where u and c have been omitted in the subscript of E_i since i is a sufficient index for sub-aperture images in the 1-D row. Equation (6) implies that the effective viewpoint resolution equals the number of micro lenses. Figure 6 depicts the reordering process producing 2-D sub-aperture images E_(i,g) by means of index variables [s_j, t_h] and [u_{c+i}, v_{c+g}] for the spatial and directional domains, respectively. As can be seen from colour-highlighted pixels, samples at a specific micro image position correspond to the respective viewpoint location in a camera array. Since raw SPC captures do not naturally feature the E_fs[s_j, u_{c+i}] index notation, it is convenient to define an index translation formula considering the light field photograph to be of two regular sensor dimensions [x_k, y_l], as taken with a conventional sensor. In the horizontal dimension, indices are converted by

k = j × M + c + i , (7)

which means that [x_k] is formed by

[x_k] = [s_j, u_{c+i}] , (8)

bearing in mind that M represents the 1-D micro image resolution. Similarly, the vertical index translation may be

l = h × M + c + g , (9)

and therefore

[y_l] = [t_h, v_{c+g}] . (10)

These definitions comply with Fig. 6 and enable us to apply our 4-D light field notation [s_j, u_{c+i}, t_h, v_{c+g}] to conventionally 2-D sampled representations [x_k, y_l] with k ∈ [0, K) and l ∈ [0, L). To apply the proposed ray model and image processing, the captured light field has to be calibrated and rectified such that the centroid of each micro image coincides with the centre of a central pixel. This requires an image interpolation with sub-pixel precision, which was first pointed out by Cho et al (2013) and confirmed by Dansereau et al (2013).
Fig. 6 Multiple sub-aperture image extraction with a calibrated raw image in a as obtained by an SPC and extracted 2-D sub-aperture images E_(i,g) in b where each colour represents a different perspective view. Note that the above figures consider a 180° image rotation by the sensor to compensate for the main lens image rotation. Micro images are indexed by [s_j, t_h] and pixels within micro images by [u_{c+i}, v_{c+g}].
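Under the assumption of square micro images with M × M pixels, the index translation above reduces to strided array slicing. The following sketch (toy dimensions, hypothetical helper name) gathers one pixel per micro lens to form a sub-aperture image:

```python
import numpy as np

# Toy raw SPC capture: J x H micro lenses, each rendering an M x M micro image,
# flattened to a conventional 2-D sensor of (H*M) x (J*M) pixels.
M, J, H = 3, 4, 4
raw = np.arange((H * M) * (J * M)).reshape(H * M, J * M)

def sub_aperture(raw, M, i, g):
    """Collect pixels at intra-micro-image offset (i, g) relative to the MIC,
    i.e. columns x_k = j*M + c + i and rows y_l = h*M + c + g."""
    c = (M - 1) // 2              # central pixel offset within a micro image
    return raw[c + g::M, c + i::M]

E_central = sub_aperture(raw, M, 0, 0)   # central viewpoint, H x J pixels
```

Each choice of (i, g) yields one viewpoint; the spatial resolution equals the number of micro lenses, as stated above.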

Virtual Camera Array
In the previous section, it was shown how to render multiple views from SPC photographs by means of the proposed ray model. Because a 4-D plenoptic camera image can be reorganised into a set of multi-view images as if taken with an array of cameras, each of these images is assumed to possess the optical centre of a so-called virtual camera at a distinct location. The localisation of such a centre is, however, not obvious. This problem was first recognised and addressed in publications by our research group (Hahne et al, 2014a,b), which, however, lacked experimental verification. As a starting point, we deploy ray functions that proved viable for pinpointing refocused SPC image planes (Hahne et al, 2016) and further refine the model by finding intersections along the entrance pupil. Once the theoretical positions of the virtual cameras are derived, we examine in which way the well-established concept of stereo triangulation (see Section 2) applies to the proposed SPC ray model.
In order to geometrically describe rays in the light field, we first define the height of the optical centres s_j in the MLA by

s_j = (j − o) × p_M , (11)

with o = (J − 1)/2 as the index of the central micro lens, where J is the overall number of micro lenses in the horizontal direction. Geometrical MIC positions are denoted as u_{c,j} and can be found by tracing main lens chief rays travelling through the optical centre of each micro lens. This is calculated by

u_{c,j} = s_j × (d_A + f_s) / d_A , (12)

where f_s is the micro lens focal length and d_A is the distance from the MLA to the exit pupil of the main lens, which is illustrated in Fig. 4b. Micro image sampling positions that lie next to MICs can be acquired by a corresponding multiple i of the pixel pitch p_p as given by

u_{c+i,j} = u_{c,j} + i × p_p . (13)

Chief ray slopes m_{c+i,j} that impinge at micro image positions u_{c+i,j} can be acquired by

m_{c+i,j} = (s_j − u_{c+i,j}) / f_s . (14)

Let b_U be the objective's image distance, then a chief ray's intersection with the refractive main lens plane U_{i,j} is given by

U_{i,j} = m_{c+i,j} × b_U + s_j , (15)

where c has been left out in the subscript of U_{i,j} as it is a constant and will be omitted in the following ray functions for simplicity. The spacing between the principal planes of an objective lens will be taken into account at a later stage.
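The image-side chain above, from micro lens centres s_j through MIC positions u_{c,j} and sampling positions u_{c+i,j} to slopes m_{c+i,j} and main lens intersections U_{i,j}, can be sketched in a few lines (all parameter values below are hypothetical):

```python
# Image-side chief ray bookkeeping for an SPC (hypothetical values, mm).
f_s, p_M, p_p = 2.75, 0.125, 0.009   # micro lens focal length, micro lens pitch, pixel pitch
d_A, b_U = 110.0, 95.0               # MLA-to-exit-pupil distance, main lens image distance
J = 281
o = (J - 1) / 2                      # index of the central micro lens

def s(j):        # height of the optical centre of micro lens j
    return (j - o) * p_M

def u_mic(j):    # MIC: chief ray from the exit pupil centre through s(j)
    return s(j) * (d_A + f_s) / d_A

def u(i, j):     # sampling position i pixels away from the MIC
    return u_mic(j) + i * p_p

def m(i, j):     # image-side chief ray slope from sensor to micro lens
    return (s(j) - u(i, j)) / f_s

def U(i, j):     # chief ray height at the main lens plane
    return m(i, j) * b_U + s(j)
```

As a sanity check, extending all rays that share the same i back to the exit pupil plane gives one common intersection height, −i·p_p·d_A/f_s, independent of j, which reflects the convergence of same-viewpoint rays at the exit pupil.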
Since the main lens acts as a refracting element, chief rays possess different slopes in object space. A chief ray passes through a point

F_{i,j} = m_{c+i,j} × f_U (16)

along the main lens focal plane F, found by means of its image-side slope m_{c+i,j} and the main lens focal length f_U. Consequentially, the chief ray slope q_{i,j} of that beam in object space is given by

q_{i,j} = (F_{i,j} − U_{i,j}) / f_U , (17)

as it depends on the intersections at the refractive main lens plane U, the focal plane F_U and the chief ray's travelling distance, which is f_U in this particular case. With reference to the preliminary remarks, an object ray's path may be provided as a linear function f_{i,j} of the depth z, which is written as

f_{i,j}(z) = q_{i,j} × z + U_{i,j} . (18)

As the name suggests, sub-aperture images are created at the main lens' aperture. To investigate ray positions at the aperture, we introduce the aperture's geometrical equivalents to the proposed model, which have not been considered in (Hahne et al, 2014a). An obvious attempt would be to locate a baseline B_A at the exit pupil, which is found by

B_A = (m_{c,j} − m_{c+i,j}) × d_A , (19)

where m_{c+i,j} is obtained from Eq. (14). Practical applications of an image-side baseline B_A are unclear at this stage. However, the baseline at the entrance pupil A is a much more valuable parameter when determining an object distance via triangulation in an SPC. Figure 7 offers a closer look at our light field ray model by also showing the principal planes H_1U and H_2U. There, it can be seen that all rays having i in common (e.g. blue rays) geometrically converge at the entrance pupil and diverge from the exit pupil. Intersecting chief rays at the entrance pupil can be seen as indicating object-side-related positions of virtual cameras A_i.
The calculation of the virtual camera positions A_i is provided in the following. By taking object space ray functions f_{i,j}(z) from Eq. (18) for two rays with different j but the same i and setting them equal as given by

f_{i,j}(z) = f_{i,j+1}(z) , (20)

we can solve the equation system, which yields the distance A H_1U from the entrance pupil A to the object-side principal plane H_1U (see Fig. 7):

A H_1U = (U_{i,j} − U_{i,j+1}) / (q_{i,j} − q_{i,j+1}) . (21)

Recall that the index of the central micro lens s_j is found by j = o = (J − 1)/2, where o defines the image centre offset. The object-side-related position of a virtual camera A_i can then be acquired by evaluating the ray function at the entrance pupil,

A_i = f_{i,o}(−A H_1U) = q_{i,o} × (−A H_1U) + U_{i,o} .

With this, a baseline B_G that spans from one A_i to another by a gap G can be obtained as follows:

B_G = |A_{i+G} − A_i| . (22)

For example, a baseline B_1 ranging from A_0 to A_1 is identical to that from A_{−1} to A_0. This relies on the principle that virtual cameras are separated by a consistent width. To apply the triangulation concept, rays are virtually extended towards image space by

N_{i,j} = q_{i,j} × (−b_N) + A_i , (23)

where b_N is an arbitrary scalar which can be thought of as a virtual image distance and N_{i,j} as a spatial position at the virtual image plane of a corresponding sub-aperture. The scalable variable b_N linearly affects a virtual pixel pitch p_N, which is found by

p_N = |N_{i,j+1} − N_{i,j}| . (24)

Setting b_U = f_U aligns the optical axes z_i of the virtual cameras to be parallel to the main optical axis z_U (see Fig. 7). For all other cases where b_U ≠ f_U (e.g. Fig. 8), the rotation angle Φ_i of a virtual optical axis z_i is obtained by

Φ_i = tan⁻¹(q_{i,o}) . (25)

The relative tilt angle Φ_G from one camera to another can be calculated with

Φ_G = Φ_{i+G} − Φ_i , (26)

which completes the characterisation of the virtual cameras. Figure 8 visualises the chief rays' paths in the light field when focusing the objective lens such that b_U > f_U. In this case, z_i intersects with z_U at the plane on which the objective lens is focused. Objects placed at this plane possess a disparity ∆x = 0 and thus are expected to be located at the same relative 2-D position in each sub-aperture image.
As a consequence, objects placed behind the ∆x = 0 plane expose negative disparity.
Establishing the triangulation in an SPC allows object distances to be retrieved just as in a stereoscopic camera system. On the basis of Eq. (5), the depth distance Z_{G,∆x} of an object with a certain disparity ∆x is obtained by

Z_{G,∆x} = (b_N × B_G) / (∆x × p_N + b_N × tan(Φ_G)) , (27)

and can be shortened to

Z_{G,∆x} = (b_N × B_G) / (∆x × p_N) , (28)

which is only the case where b_U = f_U. One may notice that Eq. (28) is an adapted version of the well-known triangulation equation given in Eq. (2).
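Putting the object-side pieces together, a compact sketch of the full pipeline, i.e. refraction at the main lens, intersection at the entrance pupil, baseline, tilt and triangulated distance, might look as follows (all parameter values are hypothetical and the main lens is treated as a thin lens):

```python
import math

# Virtual camera localisation and SPC triangulation sketch (thin lens, mm).
f_s, p_M, p_p = 2.75, 0.125, 0.009   # micro lens focal length/pitch, pixel pitch
f_U, b_U = 193.0, 197.0              # main lens focal length and image distance
d_A = b_U                            # thin lens: exit pupil at the lens plane
o = (281 - 1) / 2                    # central micro lens index

def s(j):    return (j - o) * p_M
def u(i, j): return s(j) * (d_A + f_s) / d_A + i * p_p
def m(i, j): return (s(j) - u(i, j)) / f_s           # image-side slope
def U(i, j): return m(i, j) * b_U + s(j)             # height at main lens plane
def q(i, j): return (m(i, j) * f_U - U(i, j)) / f_U  # object-side slope

def A(i, j=0):
    """Virtual camera position: intersection of two object-side rays sharing i."""
    z = (U(i, j + 1) - U(i, j)) / (q(i, j) - q(i, j + 1))  # entrance pupil plane
    return q(i, j) * z + U(i, j)

def baseline(G):                     # B_G between virtual cameras A_0 and A_G
    return abs(A(G) - A(0))

def tilt(G):                         # relative tilt Phi_G (radians)
    return math.atan(q(G, int(o))) - math.atan(q(0, int(o)))

def depth(G, dx, b_N=1.0):           # triangulated distance for disparity dx (px)
    p_N = abs(q(0, 1) - q(0, 0)) * b_N               # virtual pixel pitch
    return b_N * baseline(G) / (dx * p_N + b_N * math.tan(tilt(G)))
```

With b_U = f_U the tilt vanishes and depth() reduces to the parallel-axis form; with b_U > f_U, the zero-disparity distance depth(G, 0) lands at the plane the main lens focuses on, matching the ∆x = 0 behaviour described above.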
Fig. 8 Chief ray paths in the light field for a main lens focused such that b_U > f_U. Red circles next to A_i indicate the virtual camera positions. Note that the gap G = 1 applies and therefore B_1 and Φ_1 are shown.

Validation
We deploy a custom-made plenoptic camera containing a full frame sensor with 4008 x 2672 active image resolution and p p = 9 µm pixel pitch. Photos of our camera are depicted in Fig. 9. Details on the assembly and optical calibration of an SPC can be found in Hahne's thesis (2016). Lens and MLA specifications are provided hereafter.

Lens Specifications
Experimentations are conducted with two different micro lens designs, denoted as MLA (I.) and (II.), whose specifications can be found in Table 1. The input parameters relevant to the triangulation are f_s and p_M. Besides this, Table 1 provides the lens thickness t_s, refractive index n, radii of curvature R_s1, R_s2 and principal plane distance H_1s H_2s. The number of micro lenses in our MLA amounts to 281 × 188 for the horizontal and vertical dimensions, respectively. These values allow for modelling the micro lenses in an optical design software.

Fig. 9 Photographs of our custom-built camera with camera body and collimator (left) and MLA fixation (right).

It is well known that the focus ring of today's objective lenses moves a few lens groups whilst others remain static, which, in consequence, changes the lens system's cardinal points. To prevent this and simplify the experimental setup, we only shift the plenoptic sensor away from the main lens to vary its image distance b_U while keeping the focus ring at infinity. In doing so, we ensure that the cardinal points remain at the same relative position. However, the available space in our customised camera constrains the sensor's shift range to an overall focus distance of d_f ≈ 4 m, where d_f is the distance from the MLA's front vertex to the plane on which the main lens is focused. For this reason, we examine two focus settings (d_f → ∞ and d_f ≈ 4 m) in the experiment. To acquire the main lens image distance b_U, we employ the thin lens equation and solve for b_U as given by

b_U = (1/f_U − 1/a_U)⁻¹ , (29)

with a_U = d_f − b_U − H_1U H_2U as the object distance. After substituting for a_U, however, it can be seen that b_U is an input and output parameter at the same time, which turns out to be a typical chicken-and-egg case. To treat this problem, we define the initial image distance to be the focal length (b_U := f_U) and substitute the resulting b_U for the input variable afterwards. This procedure is iterated until both values converge. Objective lenses are denoted as f_193, f_90 and f_197 with index numbers representing focal lengths in millimetres. The lens designs for f_193 and f_90 were found in (Caldwell, 2000; Yanagisawa, 1990) whilst f_197 was obtained experimentally using the technique provided by TRIOPTICS (2015). Table 2 lists the calculated image, exit pupil and principal plane distances for the main lenses.
It is noteworthy that all parameters are provided with respect to a 550 nm wavelength. Precise focal lengths f_U are found in the image distance column at the infinity focus row.
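The chicken-and-egg determination of b_U described above is a plain fixed-point iteration, which can be sketched as follows (focal length, focus distance and principal plane spacing are placeholder values):

```python
# Fixed-point iteration for the main lens image distance b_U:
# thin lens equation 1/f_U = 1/a_U + 1/b_U with a_U = d_f - b_U - H1H2,
# started from b_U := f_U and repeated until the value settles. Units: mm.
def image_distance(f_U, d_f, H1H2, tol=1e-9):
    b_U = f_U                        # initial guess: focus at infinity
    while True:
        a_U = d_f - b_U - H1H2       # object distance for the current estimate
        b_next = 1.0 / (1.0 / f_U - 1.0 / a_U)
        if abs(b_next - b_U) < tol:
            return b_next
        b_U = b_next

b_U = image_distance(f_U=193.0, d_f=4000.0, H1H2=50.0)
```

The iteration converges quickly since the object distance is much larger than the image distance; for d_f → ∞ it returns b_U = f_U immediately.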

Experiments
To verify the claims made about SPC triangulation, experiments are conducted as follows. Baselines and tilt angles are estimated based on Eqs. (22) and (26) using the parameters given in Tables 1 and 2. Thereof, we compute object distances from Eq. (27) for each disparity and place real objects at the calculated distances. Experimental validation is achieved by comparing predicted baselines with those obtained from disparity measurements. The extraction of a disparity map from an SPC requires at least two sub-aperture images, which are obtained using Eq. (6). Disparity maps are calculated by block matching with the Sum of Absolute Differences (SAD) method using an available implementation (Abbeloos, 2010, 2012). To measure baselines, Eq. (27) has to be rearranged such that

B_G = Z_{G,∆x} × (∆x × p_N + b_N × tan(Φ_G)) / b_N . (30)

This formula can also be written as

Φ_G = tan⁻¹((b_N × B_G − Z_{G,∆x} × ∆x × p_N) / (Z_{G,∆x} × b_N)) , (31)

which yields a relative tilt angle Φ_G in radians that can be converted to degrees by multiplication by 180/π. Stereo triangulation experiments are conducted such that B_4 and B_8, just as Φ_4 and Φ_8, are predicted based on main lens f_197 and MLA (II.) with d_f → ∞ and d_f ≈ 4 m focus settings. Real objects were placed at selected depth distances Z_{G,∆x} calculated from this setup. An exemplary sub-aperture image E_(i,g) with infinity focus setting and related disparity maps is shown in Fig. 10. A sub-pixel precise disparity measurement has been applied to Figs. 10b and 10d as the action figure lies between integer disparities. It may be obvious that disparities in Figs. 10b and 10d are nearly identical since both viewpoint pairs are separated by G = 4, however placed at different horizontal positions. This justifies the claim that the spacing between adjacent virtual cameras is consistent. Besides, it is also apparent that objects at far distances expose lower disparity values and vice versa. Comparing Figs.
10b and 10c shows that a successive increase in the baseline B_G implies a growth in the object's disparity values, an observation also made in traditional computer stereo vision. Table 3 lists baseline measurements and corresponding deviations with respect to the predicted baseline. This table is quite revealing in several ways. First, the most striking result is that there is no significant difference between baseline predictions and measurements using the model proposed in this paper. The reason for a 0 % deviation is that objects are placed at the centre of the predicted depth planes Z_{G,∆x}. An experiment conducted with random object positions would yield non-zero errors that do not reflect the model's accuracy, but rather our SPC's capability to resolve depth, which depends on the MLA and sensor specification. Hence, such an experiment is only meaningful when evaluating the camera's depth resolution. A more revealing percentage error is obtained from a larger number of disparities, which in turn requires the baseline to be extended. These parameters have been maximised in our experimental setup, making it difficult to further refine depth. To obtain quantitative error results, Subsection 4.3 benchmarks the proposed SPC triangulation with the aid of a simulation tool (Zemax LLC, 2011).
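The measurement side of this experiment, recovering B_G and Φ_G from an object placed at a known distance, is the rearrangement of the triangulation formula stated above. A minimal sketch with hypothetical numbers:

```python
import math

# Inverse triangulation: given an object at known distance Z with measured
# disparity dx (pixels), virtual pixel pitch p_N and virtual image distance b_N,
# recover the baseline, or - with a trusted baseline - the tilt angle in degrees.
def baseline_from_depth(Z, dx, p_N, b_N, phi_G=0.0):
    return Z * (dx * p_N + b_N * math.tan(phi_G)) / b_N

def tilt_from_depth(Z, dx, p_N, b_N, B_G):
    return math.degrees(math.atan((B_G * b_N - Z * dx * p_N) / (Z * b_N)))

# Round trip: an object at Z = 20 m with dx = 5 px (p_N = 0.01 mm, b_N = 100 mm)
# returns the 10 mm baseline that produced it, and a tilt of 0 degrees.
B = baseline_from_depth(20000.0, 5.0, 0.01, 100.0)
phi = tilt_from_depth(20000.0, 5.0, 0.01, 100.0, 10.0)
```

The two functions are consistent inverses of the same forward model, so a disparity generated by a given baseline at zero tilt reproduces that baseline and a vanishing tilt angle.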
A second observation is that our previous methods (Hahne et al, 2014a,b) yield identical baseline estimates, but fail the experimental validation, exhibiting significantly larger errors in the triangulation. This is due to the fact that our previous model ignored the pupil positions of the main lens, such that virtual cameras were seen to be lined up on its front focal plane instead of its entrance pupil. Baseline estimates calculated according to a definition provided by Jeon et al (2015) deviate further from our results with B_4 = 290.7293 mm and B_8 = 581.4586 mm. As the authors disregard the optical centre positions of the sub-aperture images, it is impossible to obtain distances via triangulation and assess their results using percentage errors.
Whenever d_f → ∞, the virtual camera tilt angles in our model are assumed to be Φ_G = 0°. Accurate baseline measurements inevitably confirm the predicted tilt angles, as measured baselines would deviate otherwise. To ensure this is the case, a second SPC triangulation experiment is carried out with d_f ≈ 4 m, yielding the images shown in Fig. 11. Disparity maps in Figs. 11b and 11d give further indication that the spacing between adjacent virtual cameras is consistent. Results in Table 4 demonstrate that tilt angle predictions match measurements. It is further shown that the virtual cameras are rotated by small angles of less than a degree. Nevertheless, these tilt angles are non-negligible as they are large enough to shift the ∆x = 0 disparity plane from infinity to d_f ≈ 4 m, which can be seen in Fig. 11. Generally, Tables 3 and 4 suggest that the adapted stereo triangulation concept proves viable in an SPC without measurable deviations if objects are placed at the predicted distances. A maximum baseline is achieved with a short MLA focal length f_s, a large micro lens pitch p_M, a long main lens focal length f_U and a sufficiently large entrance pupil diameter. A baseline approximation of the first-generation Lytro camera may be achieved with the aid of the metadata (*.json file) attached to each light field photograph, as it contains information about the micro lens focal length f_s = 0.025 mm, pixel pitch p_p ≈ 0.0014 mm and micro lens pitch p_M ≈ 0.0139 mm, yielding M = 9.9286 samples per micro image. The accommodated zoom lens provides a variable focal length f_U in the range of 6.45 mm to 51.4 mm (43 mm to 341 mm in 35 mm-equivalent terms) (Ellison, 2014). It is unclear whether this source refers to the main lens only or to the entire optical system including the MLA. From this, hypothetical baseline estimates for the first-generation Lytro camera are calculated via Eqs. (20) to (22) and given in Table 5.
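Using the metadata values quoted above and the simplifying assumptions of a thin main lens focused at infinity (b_U = f_U), a back-of-envelope estimate of the adjacent-view baseline reduces to B_1 ≈ p_p × f_U / f_s; this shorthand drops pupil positions and principal plane spacing, so it is a rough sketch rather than the full Eqs. (20) to (22) computation:

```python
# Hypothetical Lytro baseline estimate (thin lens, b_U = f_U assumed):
# adjacent virtual cameras separated by roughly p_p * f_U / f_s.
f_s, p_p = 0.025, 0.0014        # micro lens focal length and pixel pitch (mm)

def lytro_baseline(f_U, G=1):
    return G * p_p * f_U / f_s

B_wide = lytro_baseline(6.45)   # wide end of the zoom range
B_tele = lytro_baseline(51.4)   # tele end of the zoom range
```

Even at the tele end the estimate stays in the low millimetre range, consistent with Adelson and Wang's prediction of a relatively small plenoptic baseline.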
A disparity analysis of perspective Lytro images should lead to baseline measures B_G similar to those of the prediction. However, verification is impossible as the camera's automatic zoom lens settings (current principal plane and pupil locations) are undisclosed. Reliable measurements of such parameters require disassembly of the main lens, which is impractical in the case of present-day Lytro cameras as their main lenses cannot be detached.

Simulation
To obtain quantitative measures, this section investigates the positioning of the virtual camera array by modelling a plenoptic camera in an optics simulation software (Zemax LLC, 2011). Table 6 (baseline and tilt angle simulation with G = 6 and i = 0) compares predicted and simulated virtual camera positions as well as their baseline B G and relative tilt angle Φ G . Thereby, the distance from an objective's front vertex V 1U to the entrance pupil A is given by the difference V 1U H 1U − A H 1U , bearing in mind that A H 1U is the distance from entrance pupil A to the object-side principal plane H 1U and V 1U H 1U separates H 1U from the front vertex V 1U . Simulated V 1U A values are obtained by extending the ray slopes q i, j towards the sensor, letting these virtually elongated rays pass through lenses unrefracted, and finding the intersection of q i, j and q i, j+1 . Observations in Table 6 indicate that the baseline grows with a larger main lens focal length f U , a shorter micro lens focal length f s and a decreasing focusing distance d f (a U ), given that the entrance pupil diameter is large enough to accommodate the baseline. Besides, it has been shown that tilt angle rotations become larger with decreasing d f . Baselines have been estimated accurately with errors below 0.1 % on average, except for one example. The key problem causing the largest error is that MLA (I.) features a shorter focal length f s than MLA (II.), which produces steeper light ray slopes m c+i, j and hence more severe aberration effects. Tilt angle errors remain below 0.3 %, although results deviate by only 0.001° for f 90 and not at all for f 193 . However, entrance pupil location errors of about ≤ 1 % are larger than in any other simulated validation. One reason for these inaccuracies is that the entrance pupil A is modelled as a flat vertical plane, which in reality may exhibit a non-linear shape around the optical axis.
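The intersection step described above can be sketched as follows, assuming each virtually elongated ray is parameterised as a line x(z) = q·z + c with slope q and intercept c. The function and variable names are illustrative and not taken from the simulation software.

```python
def ray_intersection(q_a, c_a, q_b, c_b):
    """Intersect two rays x(z) = q*z + c; returns (z, x) of the crossing point.

    Sketch of the procedure above: extending the ray slopes q_{i,j} and
    q_{i,j+1} as straight lines (ignoring refraction at the lenses), their
    crossing marks a virtual camera position along the optical axis.
    """
    if q_a == q_b:
        raise ValueError("parallel rays do not intersect")
    z = (c_b - c_a) / (q_a - q_b)
    return z, q_a * z + c_a

# Two illustrative rays with slopes of opposite sign:
z, x = ray_intersection(0.1, 1.0, -0.1, 3.0)
print(z, x)  # -> 10.0 2.0
```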
An experiment assessing the relationship between disparity ∆ x and distance Z G,∆ x using different objective lenses is presented in Table 7. From this, it can be concluded that denser depth sampling is achieved with a larger main lens focal length f U . Moreover, it is seen that a tilt in the virtual cameras yields a negative disparity ∆ x for objects further away than d f , a phenomenon that also applies to tilted cameras in stereoscopy. The reason why d f ≈ Z G,∆ x when ∆ x = 0 is that Z G,∆ x reflects the separation between the ray intersection and the entrance pupil A , whereas d f is the spacing between the ray intersection and the MLA's front vertex, which lies close to the sensor. Overall, it can be stated that distance estimates based on stereo triangulation behave similarly to those in geometrical optics, with errors of up to ±0.33 %.
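As a point of reference for such disparity-to-distance conversions, the classic pinhole stereo relation Z = f·B/∆x can serve as a first-order sketch. Note that this simplified version omits the virtual camera tilt and the exact pupil reference plane that the paper's Z G,∆ x accounts for, and the example numbers below are illustrative rather than taken from Table 7.

```python
def depth_from_disparity(f, baseline, disparity):
    """Classic pinhole stereo triangulation: Z = f * B / dx.

    All quantities in mm. A first-order approximation that ignores the
    virtual camera tilt discussed above, so it only holds near d_f -> inf.
    """
    return f * baseline / disparity

# Illustrative numbers (not from Table 7): f = 90 mm, B = 1 mm, dx = 0.01 mm
print(depth_from_disparity(90.0, 1.0, 0.01))  # -> 9000.0 (mm)
```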

Discussion and Conclusions
In essence, this paper presented the first systematic study on how to successfully apply the triangulation concept to a Standard Plenoptic Camera (SPC). It has been shown that an SPC projects an array of virtual cameras along its entrance pupil, which can be seen as an equivalent to a multi-view camera system. The proposed geometry of the SPC's light field further suggests that the entrance pupil diameter constrains the maximum baseline. This backs up and refines an observation made by Adelson and Wang (1992), who considered the aperture size to be the baseline limit. Our customised SPC merely offers baselines in the millimetre range, which results in relatively small stereo vision setups. As a consequence, depth sampling planes move towards the camera, which proves useful for close-range applications such as microscopy. It is also expected that multiple viewpoints taken with small baselines mitigate the occlusion problem.
The presented work has provided the first experimental baseline and distance results based on disparity maps obtained by a plenoptic camera. Predictions of our geometrical model match the experimental measurements without significant deviation. An additional benchmark test of the proposed model with an optical simulation software has revealed errors of up to ±0.33 % for baseline and distance estimates under different lens settings, which supports the model's accuracy. Deviations are due to the imperfections of objective lenses. More specifically, prediction inaccuracies may be caused by aberrations of all sorts that result in a non-geometrical behaviour of a lens. By compensating for this through enhanced image calibration, we believe it is possible to lower the measured deviation further.
The major contribution of the proposed ray model is that it allows any SPC to be used as an object distance estimator. A broad range of applications in which stereoscopy has traditionally been employed can benefit from this solution. This includes endoscopes or microscopes that require very close depth ranges, the automotive industry, where tracking objects in road traffic is a key task, and the robotics industry, with robots in space or automatic vacuum cleaners at home. Besides this, plenoptic triangulation may be used for quality assurance purposes in the large field of machine vision. The model further assists in the prototyping stage of plenoptic photo and video cameras as it allows the baseline to be adjusted as desired. Further research may investigate how triangulation applies to other types of plenoptic cameras, such as the focused plenoptic camera or the coded-aperture camera. More broadly, research is also required to benchmark a typical plenoptic camera's depth resolution against that of competing depth sensing techniques like stereoscopy, time of flight and light sectioning.