Establishing the performance of low-cost Lytro cameras for 3D coordinate geometry measurements

Lytro cameras capture 3D information in a single exposure without the need for structured illumination, allowing greyscale depth maps of the captured image to be created using the Lytro desktop software. These consumer-grade light-field cameras provide a cost-effective method of measuring the depth of multiple objects, which is suitable for many applications. However, the greyscale depth maps generated using the Lytro cameras are on a relative depth scale and hence are not suitable for engineering applications where absolute depth is essential. In this research, camera control variables, environmental sensitivity, depth distortion characteristics, and the effective working range of first- and second-generation Lytro cameras were evaluated. In addition, a depth measuring technique that delivers 3D output depth maps in SI units (metres) is discussed in detail, demonstrating the suitability of consumer-grade Lytro cameras for metrological applications without significant modifications.


Introduction
Measuring the depth of a scene using digital cameras is becoming a common technique used in many engineering fields. The accuracy of depth measurement varies depending on the application and equipment, with techniques for improving the accuracy being developed extensively over the last two decades. Early techniques to generate scene depth data using 2D cameras involved capturing multiple images of the same scene by changing the camera position (z axis) [1,2], aperture diameter, and focal length [3][4][5]. These methods may be suitable for some applications that require only nonparameterised depth data, i.e. different depths represented by varying relative greyscale values. In this case, depth data generated with these techniques may not necessarily have a direct relationship with real-world measurements.
Within the field of machine vision, absolute depth information of a scene is particularly useful, and research is ongoing to reduce the time taken to provide this data while minimising the number of sensors and the amount of hardware and maximising depth and data quality. Depth sensing instruments are finding many applications in robotics and the manufacturing industries, and new technologies have proliferated in recent years. Light-field (or plenoptic) camera technology is one of the newer methods attracting interest (from a domestic consumer viewpoint and more recently from an industrial perspective), with potentially interesting features to offer the engineering community such as a greater depth of focus, the ability to change focus after capturing an image, depth information, and single camera operation. Light-field cameras can offer depth information of a scene in a single exposure, which is a significant advantage over other depth sensing instruments that require multiple images, multiple camera positions, or multiple simultaneous cameras.
Light-field cameras can produce additional Z coordinate data when compared to normal cameras, which generate only X and Y coordinate data. A microlens array (MLA) is placed between the primary lens assembly and the imaging sensor, with each MLA lens element having a shorter focal length than the primary lens. The microlenses act as a multiplicity of individual cameras capturing angular data of light rays [6], hence recording 3D light-field data on a 2D photosensor, compared to a normal camera recording 2D data with the same sensor.
The computer vision community has invested significant research in this new field with several new companies already commercialising domestic and industrial light-field cameras such as Lytro, Pelican, and Raytrix [7][8][9]. Even though there is a high cost difference in light-field cameras available in the market (consumer grade versus industrial grade), the basic working principle is the same and there are limited differences with respect to hardware and optics depending upon the application and the user.
Cameras produced by the Lytro Company are typically consumer-grade products. The depth data generated by these cameras are usually indexed as relative greyscale values corresponding to changes in depth in the scene, rather than being represented as absolute depth values. These relative greyscale depth results are used by computer vision groups to enhance depth maps and provide smooth transitions between different depth values [10][11][12][13]. These methods typically concentrate on generating better-quality greyscale depth maps of the respective scenes, rather than developing depth maps in absolute units. In addition, key optical characteristics required to calculate absolute depth data in a straightforward manner, such as the MLA to photosensor distance (d′), are not provided. Hence, a hybrid camera set-up was used to measure depth values [14].
The purpose of the research presented here is to define the potential of commercial consumer-grade light-field cameras (in this case Lytro cameras) for measuring scene depth in absolute units (in this case SI units), with a view to engineering metrology applications. This work extends previous definitions of the accuracy and repeatability of such cameras [15], defines the environmental conditions under which these values may be achieved, and provides performance benchmarks for this lower end yet cost-effective technology to allow its exploration as a potential 3D coordinate metrology solution, in preference to other multi-camera or multi-view solutions.

Theory of light-field cameras
The idea of gathering light-field data using a single 2D sensor was originally proposed by Lippmann in 1908 [16] with the help of very small lenses similar to the primary lens, although fabrication in the early twentieth century was difficult. An alternative (but similar) technique was the development of parallax stereograms [17,18]. Advancements in precision optics and manufacturing techniques led the way to the fabrication of very small lenses with high accuracy and precision, eventually leading to the development of light-field cameras.
The interaction of light rays in a traditional 2D camera and a light-field camera is similar with respect to the primary lens system. However, when an MLA is placed between the primary lens and the photosensor, the light rays from the primary lens pass through the MLA and are then recorded on the photosensor. Since each microlens unit of the MLA is very small compared to the primary lens (typically 100-150 times smaller), the light rays passing through these small elements are the original light rays carrying encoded directional information.
However, differences between capturing a scene using a light-field camera and a conventional camera are significant. The raw image of a normal camera resembles the captured scene (albeit in 2D) without any further processing or alterations of pixels by the user. With a light-field camera, the raw image does not resemble the scene because it is a collection of multiple light rays from different directions, bundled as a group classified based on their native or respective microlens. Hence, different views of a scene are recorded with a single exposure [6,15] although the spatial resolution of the final image is sacrificed.
The light-field camera works on the principle of the pinhole camera model, where each microlens in an MLA acts as an individual camera recording light rays onto the relevant pixels. Each microlens group records light rays passing into the camera from different viewpoints, and hence, the light-field image is a set of different views of the same scene. This allows post-focusing of the captured image data, permitting blurred regions in an image to be brought into focus once the scene data have been recorded.
To illustrate, three microlenses of the same focal length in an MLA are shown in Fig. 1, where the photosensor is initially placed at a distance d′ from the MLA. Light rays pass through the optical centre of each microlens and are recorded by the pixels of its microimage, where the microimage is the group of pixels belonging to an individual microlens unit. In this configuration, there is no overlapping of light rays. The light rays recorded by the photosensor pixels at the distance d′ are highlighted with a circle for M2. The light ray recorded by the pixel under consideration is denoted as P_i. If the photosensor is physically placed at a distance of d′′ instead of d′, then the light rays travel the longer distance d′′, and consequently, light rays overlap on the pixels of each microimage group. For example, the pixel recording the light ray from the optical centre of M2 at distance d′ also receives light rays from M1 and M3. A similar phenomenon happens with the rest of the pixels in the microimage. The ray from M2 under consideration is now recorded on a different pixel (P′_i) compared with the same scenario at distance d′.
The assumption made here for post-focusing is that the light rays travel in a straight line after leaving the pinhole model. If the physical photosensor distance d′ is retained and the ray recorded at pixel P_i is traced virtually onwards to distance d′′, then it reaches pixel P′_i. If the same technique is followed for all pixels, the resulting data match the data of the photosensor as if it were physically placed at d′′.
To perform such virtual movements of photosensor, specific details about the MLA are necessary, such as each microlens centre location with respect to the photosensor pixel and distance between microlens elements. This information can be calculated manually [19][20][21][22] or obtained from the MLA manufacturer.
The ray-space coordinate diagram is a geometric construction that illustrates how a ray is parameterised by the physical pixel and MLA locations to generate virtual pixel data; hence, an entire LF′ image (virtual LF data) is reparameterised by the intersection of each light ray with the MLA and photosensor planes. In Fig. 2, u is the microlens centre, x is the pixel location on the real photosensor under consideration with the parent MLA location u, so the pixel offset under consideration is (x − u), and x′ is the resulting location of the light ray on the virtual photosensor. By similar triangles, the illustrated ray that intersects the lens at u and the film plane at x also intersects the virtual plane at the location x′ given by (1).
The diagram only shows the 2D light rays involving the x and u planes, but in the 3D scenario, there will be additional y and v planes forming complete 4D LF data. A ratio of the distances d′ and d′′ is then defined, which gives the relative depth of the film plane, and the recorded LF is given as L_F(x, y, u, v). Equations (2) and (3) represent the virtual 4D LF data that result from focusing at different depths corresponding to the value of this ratio. The distance between the MLA and the photosensor (d′) plays an important role in changing the focus plane in light-field cameras.
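The similar-triangles step can be illustrated with a minimal sketch. It assumes the pinhole geometry described for Fig. 1: a ray passes through the microlens centre u on the MLA plane, reaches the real photosensor at x (distance d′) and continues in a straight line to the virtual photosensor (distance d′′). The function name `reproject_pixel` and its parameter names are hypothetical, and the expression is derived from that geometry rather than copied from (1), so it may differ in form from the published equation.

```python
def reproject_pixel(x, u, d_real, d_virtual):
    """Return the virtual sensor location x' of a ray (1D case).

    Assumption (not taken from the paper's equations): the ray passes
    through the microlens centre u and travels in a straight line, so
    (x' - u) / d_virtual == (x - u) / d_real by similar triangles.
    """
    if d_real <= 0:
        raise ValueError("d_real must be positive")
    return u + (x - u) * (d_virtual / d_real)


# A ray recorded 2 units from its microlens centre lands 4 units away
# when the virtual sensor is twice as far from the MLA.
shifted = reproject_pixel(x=12.0, u=10.0, d_real=1.0, d_virtual=2.0)
```

When `d_virtual` equals `d_real`, the function returns x unchanged, which corresponds to leaving the focus plane where the physical photosensor recorded it.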

Performance characteristics of the Lytro cameras
Lytro cameras were one of the first commercially available light-field cameras specifically aimed at the consumer market. The Lytro Desktop Software (LDS) was provided to process all images captured by the family of cameras (Lytro I first-generation and second-generation Illum). Depth maps (greyscale), all-in-focus images, perspective-shift images, a small video of the captured scene, and stereo image pairs can all be exported. Images are stored in .lfp and .lfr formats in the Lytro I generation and Illum cameras, respectively. Specifications for the cameras used in this work are defined in Table 1, whereby LC1 is a Lytro I camera (I), LC2 is an additional Lytro I camera (II), and LC3 is a second-generation Lytro Illum camera.

Sensitivity to colour and changes of contrast
It has been previously observed that Lytro cameras respond differently to colours, especially combinations of colours with high contrast, such as white and black, white and red, and other combinations [15]. This response is further demonstrated in Fig. 3, where the camera-generated depth map of a flat black and white checkerboard assigns different depths to the black and white squares (Fig. 3a). Similar results can be seen with the RGB checkerboard image (Fig. 3b), whereby depth values change as a function of colour (noting that the figure is represented here in greyscale). However, when using objects of similar or uniform colour, as shown in Fig. 4, the camera can distinguish between several similar Lego™ bricks, and each brick is represented with different and appropriate depth values (dark and white regions in the depth map represent near and far regions, respectively). It should be noted that in some cases the depth values change for the same bricks when they are very close to the camera under constant lighting conditions. This characteristic of these commercial light-field cameras illustrates the potential effects that external factors have on depth value calculation and representation. However, the depth values remain consistent when images are taken under constant lighting conditions at a fixed distance from the camera. This provides evidence of repeatability, a basic requirement of any metrology instrument in an engineering application.

Sensitivity to depth
The Lytro I and Illum cameras exhibit different depth distortion responses when the target objects are very close to the cameras (< 100 mm). To illustrate this distortion, a flat plane object was imaged at 20 mm distance from the camera and greyscale depth values were generated. Data recorded using the Lytro I camera resulted in a uniform depth map, whereas the Illum camera resulted in irregular (U-shaped) depth values, as shown in Fig. 5. This significant depth sensitivity is attributed to the Illum camera having more microlens groups, each accommodating 196 pixels compared to 100 pixels in the Lytro I camera, which makes it more sensitive to changes in distance that internally depend on changes in illumination. It was observed that the irregularity in the depth results of the Illum camera was inversely related to distance, i.e. the irregularity was greatest when the object was very close and decreased as the object distance from the camera increased, eventually resulting in flat greyscale depth maps.
An averaging technique was employed to post-process the irregularity in the depth values of the Illum camera, since the LDS was considered a black box with the internal parameters of the software pipeline unknown to users. In the averaging technique, the centre of each image (C_x, C_y) was found by calculating the centre pixels of the output image, with r being the distance of the image pixel (x, y) under consideration from (C_x, C_y), and r_c the radius under consideration, which decreases as the fixed-size object moves away from the camera, as shown in Fig. 6. Since the Illum raw image width was 1.4 times the height, there are two reasons to choose r_c along the width of the image instead of the height. Firstly, irregular depth values increase outwards from the centre of the image. Secondly, if the radius is taken along the height, then the total length of r_c will be less than the actual length of the object and hence fewer points will be available for averaging. Using (4), the average depth value for a flat surface at a given distance from the camera can be calculated, where r and r_c are given by (5) and (6), respectively.
This method of averaging is applicable for the Illum camera because the rate of distortion is independent of the object distance from the camera. In this work, n = 5 was used for all experiments and final depth values were calculated using (7), where d_i is the greyscale depth value of the i-th image (for the Illum camera d_Avg = d_i), and n is the number of images.
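The averaging steps can be sketched as follows. Equations (4) to (7) are not reproduced in this text, so the sketch assumes a plain arithmetic mean over all pixels within the radius r_c of the image centre, followed by a plain mean over the n images; the function names `radial_average` and `average_over_images` are hypothetical, and the choice of r_c is left to the caller.

```python
import math


def radial_average(depth_map, r_c):
    """Mean greyscale depth of pixels within radius r_c of the image centre.

    depth_map is a list of rows of greyscale values (0 to 255).
    Assumption: an unweighted mean; the paper's (4) may differ.
    """
    height = len(depth_map)
    width = len(depth_map[0])
    c_x = (width - 1) / 2.0   # centre pixel coordinates (C_x, C_y)
    c_y = (height - 1) / 2.0
    total = 0.0
    count = 0
    for y, row in enumerate(depth_map):
        for x, value in enumerate(row):
            r = math.hypot(x - c_x, y - c_y)  # distance from the centre
            if r <= r_c:
                total += value
                count += 1
    if count == 0:
        raise ValueError("r_c is too small to contain any pixel")
    return total / count


def average_over_images(depth_maps, r_c):
    """Mean of the per-image radial averages over n images."""
    values = [radial_average(m, r_c) for m in depth_maps]
    return sum(values) / len(values)
```

With n = 5 as used in this work, `average_over_images` would be called with the five depth maps captured at one object distance.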

Depth measurement characteristics
The depth results from the Lytro cameras have no direct relation to absolute depth values and hence a mapping technique becomes important. Here, greyscale depth values ranging from 0 to 255 were mapped to absolute distances of less than 1000 mm. The mapping set-up [15] was implemented in a stable laboratory environment under two controlled, uniform, diffuse lighting conditions (1400 cd and 1600 cd) at a controlled temperature (20 °C). The Lytro cameras were located in the same position and orientation for all changes of object distance. The distance between the camera and the object was recorded using a motorised positional encoder (linear rail) unit with an uncertainty of ± 10 μm. The image plane and the object plane were set to be parallel to each other (Fig. 7). All camera features were set to automatic mode, with shutter speed, ISO, and neutral density filter values noted as being 1/40 of a second, 400, and − 0.4, respectively, for each camera at 120 mm object distance from the camera.
Nonreflective paper was attached to a flat glass plate (200 mm in height and 150 mm in width) and used as the object for all experiments. The experiments were carried out one camera at a time, varying only the object distance from the camera. Five images were captured at every 5 mm increment in object distance, up to and including 1000 mm. All five images at each position were processed through the LDS to generate 8-bit depth maps, and MATLAB 2015b was used to process the final average depth map and the RGB image for each distance increment. This experimentation provided the framework for defining and evaluating the sensitivity or response curve of each camera, which defines the operational characteristics and measurement zone of that camera.

Sensitivity curves
The depth range defines the distance from an instrument that can be measured, and results are ideally required to be highly repeatable. Repeatability defines the ability of the instrument to produce the same results when measurements are made at different instances under constant measurement conditions. With a larger depth range and better repeatability, the confidence in using the measured data in any application will be higher. Similar to many depth sensing devices, the Lytro cameras have a defined depth range that is significantly influenced by lighting conditions and the surface nature of the object measured. In this work, this behaviour is captured by the sensitivity or response curve. The sensitivity curve expresses the relationship between the (relative) depth values measured using a Lytro light-field camera and distance in SI units (mm). It also represents the camera behaviour for a given working environment (lighting condition set-up) as defined above.
The sensitivity curve results are not linear with depth values varying with distance, as shown in Fig. 8a, b, c. The results indicate that only a section of the sensitivity curve gives viable and useful data that has an approximately linear relationship with the relative greyscale depth to distance (mm). All three response curves defined in Fig. 8 have been divided into an active zone (AZ) and inactive zones (IAZ), depending upon the relationship between axes, greyscale depth values and distance from the camera. An AZ is categorised as a region where the resulting response curve has an approximately linear variation with respect to both axes, while an IAZ is a region where there is no possible linear relationship between two axes, i.e. many distances are represented by very few greyscale values (very low sensitivity response), or, many greyscale values are represented by small distance values (very high sensitivity response).
The LC1 and LC2 cameras (Fig. 8a, b) have a very limited initial response for close-range depth detection with 0 mm represented by 0 greyscale depth values, increasing to 85 greyscale depth values for 10 mm of measurement. A similar trend is shown by the LC3 camera (Fig. 8c) where close-range measurement starts with 130 greyscale values, drops to 25 greyscale values and then increases steadily. The depth values from 0 mm to approximately 40 mm, and after 280 mm, are defined as IAZ, while 50 mm to 270 mm is defined as AZ for LC1 and LC2 cameras. For the LC3 camera, the AZ extends from 10 to 500 mm, this being a function of the different camera and MLA design. The frequent variations in the response curve of LC3 may be due to optical distortion that is influenced by the higher sensitivity to the depth, as defined in Sect. 3.
Fig. 7 Experiment set-up to map relation between greyscale and absolute values
When considering the two different lighting conditions, the total length of the AZ for LC1 and LC2 cameras remained unchanged (approximately 210 mm). However, the AZ changed position as a function of lighting condition, i.e. for LC1 and LC2 cameras at 1400 cd, there was a large initial IAZ of 30 mm to 40 mm which then led into the AZ, while at 1600 cd, the initial IAZ decreased by more than 50% to 10 mm and was followed by 210 mm of AZ. This suggests that the initial IAZ (and consequently the AZ) can be controlled by changing lighting conditions depending upon the application. However, it also shows that changing the experimental conditions or the object of analysis may change the response characteristics of the measurement process.

Pixel resolution assessment
For complete 3D analysis of an object or scene, it is important to measure the X and Y coordinates as well as the Z values. In machine vision applications, the height and width of an object are calculated by counting the number of pixels in the region of interest (ROI) and multiplying that number by the corresponding pixel resolution to give the actual measurement. For such calculations, the final image from the machine vision camera should be free from distortion and blur. Optical distortion will reduce the accuracy of any measurement application using the camera, and blur causes problems in identifying the exact number of pixels in the ROI. One of the advantages of using the Lytro camera for measuring spatial resolution is that, along with the greyscale depth map, the LDS generates an all-in-focus RGB image of the scene, which is the 2D collection (pixels) of the focused regions of the ROI.
Optical distortion poses a problem for calculating the exact number of pixels in the ROI; hence, it is important to generate a distortion-free image before calculating the pixel value in SI units (mm). Using the LDS, a central camera view was extracted, i.e. the 2D RGB all-in-focus image, and subsequently used for spatial calculations. The LDS calibrated the 2D images internally before generating the results, but noticeable distortion existed when objects were very close to the cameras, in the range of 10 mm to 300 mm. Hence, the RGB images were processed to generate a distortion-free image using MATLAB image correction code, as defined in Fig. 9. A group of 20 images of a regular pattern checkerboard (40 mm grid size) was captured using the Lytro camera. These were processed, and calibration parameters for a given camera were generated.
To calculate each pixel value in absolute units (in this case millimetres), a regular pattern checkerboard was again used as an object. Raw images of the checkerboard were taken at different distances using the set-up defined in Sect. 4, ranging from 0 to 1000 mm with an increment of 50 mm. For each increment in distance, five images were taken and processed for pixel resolution. If the height and width of each checkerboard unit are (c_h, c_w), the distance of the checkerboard from the camera is d, and the average pixel count in each unit of the checkerboard is (n_h, n_w), then (8) represents the final pixel resolution, Pr(d), in SI units.
Using pre-calculated calibration parameters on distorted Lytro images, distortion-free images were generated, and checkerboard corners were detected. The number of pixels between the checkerboard corners was calculated for each image. This was repeated for all five images at each distance setting, and the resulting pixel counts were averaged to obtain the final pixel count. This value was normalised to obtain an absolute pixel value (millimetres).
The lateral pixel resolution provides the key value for a given distance that converts a pixel count into absolute units (millimetres). Figure 10 demonstrates that, for all three cameras, pixel resolution varies linearly with object distance. As object distance increases, each pixel covers a larger lateral distance (as expected). The LC1 and LC2 cameras behave similarly to each other with respect to changing pixel resolution values, while LC3 deviates slightly, although the cause was not determined. It should also be noted that pixel resolution was calculated in 50 mm steps; hence, the lateral pixel response between consecutive 50 mm steps was assumed to be linear.
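The pixel resolution calculation can be illustrated with a short sketch. Equation (8) is not reproduced in this text, so the form used here, millimetres per pixel obtained as the checkerboard unit size divided by the averaged pixel count, is an assumption based on the definitions of (c_h, c_w) and (n_h, n_w); the function name `pixel_resolution` and the sample counts are hypothetical.

```python
def pixel_resolution(c_h, c_w, pixel_counts):
    """Return (mm per pixel along height, mm per pixel along width).

    c_h, c_w: height and width of one checkerboard unit in mm.
    pixel_counts: list of (n_h, n_w) pixel counts per unit, one entry
    per image captured at the same distance d.
    Assumed form of Pr(d): unit size divided by mean pixel count.
    """
    if not pixel_counts:
        raise ValueError("at least one image is required")
    n = len(pixel_counts)
    n_h = sum(count[0] for count in pixel_counts) / n  # averaged count
    n_w = sum(count[1] for count in pixel_counts) / n
    return c_h / n_h, c_w / n_w


# Five hypothetical images of a 40 mm grid at one distance setting.
counts = [(80, 80), (79, 81), (81, 79), (80, 80), (80, 80)]
pr_h, pr_w = pixel_resolution(40.0, 40.0, counts)
```

Repeating the call for each 50 mm distance setting would give one resolution value per distance, which is the data plotted against object distance.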

3D measurement
The development of the key metrics of distortion correction, sensitivity definition, and lateral pixel resolution allowed absolute 3D depth maps to be generated. Figure 11 represents the workflow to generate 3D data from the raw Lytro scene data. Raw images were processed to generate greyscale depth data and a 2D-RGB colour map of the scene. Each greyscale value in a depth map was matched to the pre-calculated response curve data, and the corresponding absolute value was generated. Complete absolute depth data were obtained, with each Lytro camera having its own Z axis measurement range. Using the 2D-RGB data and pixel resolution data, the width and height of the objects in the scene were calculated.
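The matching of greyscale values to the response curve can be sketched as a table lookup. The sketch below assumes the curve is stored as (distance, greyscale) pairs restricted to the active zone, where greyscale increases with distance, and it uses linear interpolation between neighbouring calibration points. The interpolation, the function name `greyscale_to_mm`, and the sample curve values are illustrative assumptions rather than the procedure or data of this work.

```python
def greyscale_to_mm(grey, curve):
    """Convert a greyscale depth value to a distance in mm.

    curve: list of (distance_mm, greyscale) pairs sorted by distance,
    covering only the active zone (greyscale non-decreasing).
    Returns None when grey lies outside the calibrated range, i.e. the
    value cannot be mapped reliably.
    """
    for (d0, g0), (d1, g1) in zip(curve, curve[1:]):
        if g0 <= grey <= g1:
            if g1 == g0:  # flat segment: no unique distance, take the nearer
                return float(d0)
            t = (grey - g0) / (g1 - g0)
            return d0 + t * (d1 - d0)
    return None


# Hypothetical response-curve samples at 5 mm steps (not measured data).
sample_curve = [(50, 90), (55, 92), (60, 95)]
depth_mm = greyscale_to_mm(93.5, sample_curve)
```

Returning `None` for values outside the table keeps greyscale values from the inactive zones from being silently converted into distances.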
Once the ROI was selected, the number of pixels and the absolute depth of ROI were calculated, and subsequently corresponding X and Y axes in absolute values were generated using pixel resolution data, leading to complete 3D data sets. To validate this 3D measurement method, reference data were used of the known dimensions of objects placed at known distances from each camera. For example, objects placed perpendicular to the camera optical axis are shown in Fig. 12 and the resulting depth map is shown in Fig. 13 with comparative 3D measurement data listed in Table 2. The results of Z axis measurement are ± 5.0 mm when compared to the reference values, this being a function of the response curve resolution generated for this experimental research in 5 mm increment steps up to 1000 mm (Sect. 4). The lateral or spatial measurements are also well defined with variation from reference values being approximately ± 1 mm.
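The conversion of an ROI pixel count into lateral dimensions can be sketched in the same way. The sketch assumes a single pixel resolution value per distance, tabulated at 50 mm steps and interpolated linearly between steps, as assumed for the lateral response in this work. The function names and the table values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def interpolate_resolution(d, table):
    """Linearly interpolate pixel resolution (mm per pixel) at distance d.

    table: list of (distance_mm, mm_per_pixel) pairs sorted by distance.
    """
    for (d0, p0), (d1, p1) in zip(table, table[1:]):
        if d0 <= d <= d1:
            t = (d - d0) / (d1 - d0)
            return p0 + t * (p1 - p0)
    raise ValueError("distance is outside the calibrated range")


def roi_size_mm(n_px_width, n_px_height, d, table):
    """Width and height of an ROI in mm from its pixel counts and depth d."""
    pr = interpolate_resolution(d, table)
    return n_px_width * pr, n_px_height * pr


# Hypothetical table entries at two consecutive 50 mm steps.
resolution_table = [(100, 0.25), (150, 0.5)]
width_mm, height_mm = roi_size_mm(200, 100, 125, resolution_table)
```

Combining the two lateral values with the absolute depth of the ROI gives one (X, Y, Z) measurement per object.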
The depth data along with the RGB values obtained from the all-in-focus image were rendered using a 3D viewer. The left-hand side figure elements represent the top view of the relative data from the LDS software with compressed object depth representation, while the right-hand side of the figure represents the Z calibrated absolute depth data with the correct depth distance relationship between each object in the scene. Figures 15 and 16 represent examples of scene data captured using the first-generation Lytro I camera and the Illum camera, respectively, along with their absolute data (mm).

Conclusions
Consumer-grade light-field cameras have become a reality in recent years and potentially present a single camera, single position solution to 3D imaging, reducing the complexity of measurement alignment and the number of transducers, which is of very significant interest to the machine vision community. This work has specifically considered the possibility of using two types of domestic consumer-grade light-field cameras for metrological measurements of objects in 3D space, and as such has defined the generation of data in absolute units. Factors that specifically affect the quality and integrity of measurement have been investigated. A novel process for determining absolute depth data from the greyscale depth maps of the Lytro cameras has been introduced, based on a curve called the response or sensitivity curve. The response curve illustrates the relationship between greyscale depth data from the cameras and absolute distances. It has been shown that the first-generation cameras in particular are significantly sensitive to changing lighting conditions, and that the depth sensitivity of all cameras is nonlinear, with limited zones of reliable data (50 mm to 270 mm for the first-generation Lytro cameras and 30 mm to 650 mm for the second-generation Lytro Illum camera) and zones of unreliable data. The shape of the response curves for different illumination conditions remains similar, thereby exhibiting the nature of the camera's response to depth as a function of illumination. However, the active (reliable) zone of measurement shifts along the Z axis, specifically for the first-generation cameras. This characteristic was not noticeable with the second-generation camera.
In addition, the pixel resolution of the Lytro family of cameras has been calculated that enables the novel measurement of any scene with 3D data in absolute units with these consumer-grade cameras. The accuracy and repeatability achieved were + 10.0 mm to − 20.0 mm, and typically 0.5 mm, respectively, in the Z coordinate (depth) because the response or sensitivity curves were generated at 5 mm intervals, and hence, accuracy is closely related to these intervals.
For the lateral X and Y coordinates measurement, the accuracy was + 1.5 mm within the active zone of cameras for the conditions of test cited in this work.
The development of the sensitivity curves and the pixel resolution calculations have allowed the development of novel absolute depth data images from the consumer-grade cameras with data calibrated in the three orthogonal axes (X, Y, Z). This has been illustrated with objects correctly measured and displayed in 3D space, albeit with accompanying statements of accuracy and repeatability.
In summary, the initial drive for this work was to assess the potential for consumer-grade light-field cameras to be used for metrological measurements of objects in 3D space. The experimental and theoretical work has clearly identified active zones of use within which the cameras respond in a near-linear fashion (greyscale values versus absolute distance), and within these zones, their performance for metrological measurements in 3D space is viable, although the second-generation camera has better potential. A limiting factor is the observed shift of the active zone as a function of lighting levels for the first-generation cameras, although this is less noticeable for the second-generation camera. Consequently, both cameras currently have capability in the near field (active zones), but limited capability in the far field (inactive zones) due to very restricted greyscale/absolute distance ratios and sensitivity.
Furthermore, at this point in time, the repeatability and accuracy statements derived from the experimentation in the near field are limiting factors that require improvement. It is believed that the development of independent volumetric calibration algorithms for the light-field data will improve these statements, along with refining the 5 mm resolution steps of the depth experiments. Future research is considering the analysis of complex shape objects along with using additional data generated by the LDS such as perspective views, and how this information can be used to underpin detailed volumetric calibration.