Decoding and calibration method on focused plenoptic camera

The light-gathering ability of the plenoptic camera opens up new opportunities for a wide range of computer vision applications. An efficient and accurate method to calibrate the plenoptic camera is crucial for its development. This paper describes a 10-intrinsic-parameter model for the focused plenoptic camera with misalignment. By exploiting the relationship between the raw image features and the depth-scale information in the scene, we propose to estimate the intrinsic parameters from raw images directly, with a parallel biplanar board which provides a depth prior. The proposed method enables accurate decoding of the light field in both its angular and positional information, and geometrically guarantees a unique solution for the 10 intrinsic parameters. Experiments on both simulated and real scene data validate the performance of the proposed calibration method.


Introduction
Light field cameras, including the plenoptic camera designed by Ng [1,2] and the focused plenoptic camera designed by Georgiev [3-5], capture both the angular and the spatial information of rays in space. With the micro-lens array between the image sensor and the main lens, the rays from the same point in the scene fall on different locations of the image sensor. With a particular camera model, the 2D raw image can be decoded into a 4D light field [6,7], which allows applications such as refocusing, multiview imaging, and depth estimation [1,8-10]. To support these applications, an accurate calibration method for the light field camera is necessary.
Prior work in this area has dealt with the calibration of the plenoptic camera and the focused plenoptic camera by projecting images into the 3D world, but the camera models used leave room for improvement. These methods assume that the geometric center of each micro-lens image lies on the optical axis of its corresponding micro-lens, and do not consider the constraints on the high-dimensional features of light fields. In this paper, we concentrate on the focused plenoptic camera and analyze the variance and invariance between the distribution of rays inside the camera and in the real world scene, namely the relationship between the raw image features and the depth-scale information. We fully take into account the misalignment of the micro-lens array, and propose a 10-intrinsic-parameter light field camera model that relates the raw image to the 4D light field by ray tracing. Furthermore, to improve calibration accuracy, instead of a single-planar board, we design a parallel biplanar board to provide depth and scale priors. The method is verified on simulated data and on a physical focused plenoptic camera, and the effects of different intrinsic parameters on the rendered images are compared.
In summary, our main contributions are listed as follows: (1) A full light field camera model taking into account the geometric relationship between the center of micro-lens image and the optical center of micro-lens, which is ignored in most literature.
(2) A loop-locked algorithm which is capable of exploiting the 3D scene prior to estimate the intrinsic parameters in one shot, with good stability and low computational complexity.
The remainder of this paper is organized as follows. Section 2 summarizes related work on light field camera models, decoding and calibration methods.
Section 3 describes the ideal model for a traditional camera or a focused plenoptic camera, and presents three theorems we utilize for intrinsic parameter estimation. In Section 4, we propose a more complete model for a focused plenoptic camera. Section 5 presents our calibration algorithm. In Section 6, we evaluate our method on both simulation and real data. Finally, Section 7 concludes with summary and future work.

Related work
A light field camera captures a light field in a single exposure. The 4D light field data is rearranged on a 2D image sensor in accordance with the optical design. Moreover, the distribution of the raw image depends on the relative position of the focused point inside the camera and the optical center of the micro-lens, as shown in Fig. 1. Figure 1(a) shows the design of Ng's plenoptic camera, where the micro-lens array is on the image plane of the main lens and the rays from the focused point fall almost entirely on the same micro-lens image. Figure 1(b) and Fig. 1(c) show the designs of Georgiev's focused plenoptic camera, with a micro-lens array focused on the image plane of the main lens. Decoding the light field is equivalent to computing multiview images in two perpendicular directions. Multiview images are reorganized by selecting a contiguous set of pixels from each micro-lens image, for example, one pixel for the plenoptic camera [2] and a patch for the focused plenoptic camera [3,10]. However, for a focused plenoptic camera, the patch size influences the focus depth of the rendered image. Such a decoding method causes discontinuity in out-of-focus areas and results in aliasing artifacts.
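The patch-based decoding described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a square micro-lens layout aligned with the pixel grid, and the names `render_from_patches`, `lens_pitch`, and `patch_size` are ours.

```python
import numpy as np

def render_from_patches(raw, lens_pitch, patch_size):
    """Tile the central patch_size x patch_size pixels of every
    micro-lens image into one rendered view (square layout assumed)."""
    rows = raw.shape[0] // lens_pitch
    cols = raw.shape[1] // lens_pitch
    off = (lens_pitch - patch_size) // 2          # start of the central patch
    out = np.empty((rows * patch_size, cols * patch_size), raw.dtype)
    for r in range(rows):
        for c in range(cols):
            patch = raw[r * lens_pitch + off : r * lens_pitch + off + patch_size,
                        c * lens_pitch + off : c * lens_pitch + off + patch_size]
            out[r * patch_size:(r + 1) * patch_size,
                c * patch_size:(c + 1) * patch_size] = patch
    return out
```

Changing `patch_size` shifts the depth that appears in focus, which is exactly why this simple scheme produces the out-of-focus discontinuities mentioned above.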
For decoding a 2D raw image into a 4D light field representation, a common assumption is that the center of each micro-lens image lies on the optical axis of its corresponding micro-lens [7,11,12] in ideal circumstances. Perwaß et al. [7] synthesized images refocused at different depths by searching pixels from multiple micro-lens images. Georgiev et al. [13] decoded the raw image into a light field using ray transfer matrix analysis. Under this assumption, the deviation in a ray's original direction has little effect on rendering a traditional image. However, the directions of the decoded rays are crucial for an accurate estimation of camera intrinsic parameters, which is particularly important for absolute depth estimation [14] or light field reparameterization for cameras in different poses [15].
The calibration of a physical light field camera aims to decode rays more accurately. Several methods have been proposed for the plenoptic camera. Dansereau et al. [6] presented a 15-parameter plenoptic camera model that relates pixels to rays in 3D space, which provides theoretical support for light field panoramas [15]. The parameters are initialized using traditional camera calibration techniques. Bok et al. [16] formulated a geometric projection model to estimate intrinsic and extrinsic parameters by utilizing raw images directly, including an analytical solution and non-linear optimization. Thomason et al. [17] concentrated on the misalignment of the micro-lens array and estimated its position and orientation; in that work, the directions of rays may still deviate due to an inaccurate solution for the installation distances among the main lens, the micro-lens array, and the image sensor. On the other hand, Johannsen et al. [12] estimated the intrinsic and extrinsic parameters of a focused plenoptic camera by reconstructing a grid pattern from the raw image directly; the depth distortion caused by the main lens was taken into account in their method. More importantly, except for Ref. [17], these methods do not consider the deviation between the image center and the optical center of each micro-lens, which tends to cause inaccuracy in the decoded light field.

The world in camera
The distribution of rays refracted by a camera lens is different from the original light field. In this section, we first discuss the correspondence between points in the scene and points inside the camera, modelled with a thin lens. Then we analyze the invariance in an ideal focused plenoptic camera, based on a thin lens model for the main lens and a pinhole model for each micro-lens. Finally we summarize the relationship between the raw image features and the depth-scale information in the scene. Our analysis is conducted in non-homogeneous coordinates.

Thin lens model
As shown in Fig. 2, the rays emitted from the scene point (x_obj, y_obj, z_obj)^T in different directions are refracted through the lens aperture and brought to a single convergence point (x_in, y_in, z_in)^T if z_obj > F, where F denotes the focal length of the thin lens. The relationship between the two points is described by the thin lens equation

1/z_in + 1/z_obj = 1/F    (1)

and the magnification relation

x_in/x_obj = y_in/y_obj = −z_in/z_obj = −F/(z_obj − F)    (2)

where both depths are measured from the lens plane. Equation (2) shows that the ratio between the coordinates of the two points changes with z_obj. Furthermore, there is a projective relationship between the coordinates inside and outside the camera. For example, as shown in Fig. 3, objects with the same size at different depths in the scene correspond to objects with different sizes inside the camera. The relationship is described by Eq. (3), where the focal length F satisfies Eq. (4).
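The thin lens mapping can be sketched as a small function. This is an illustrative sketch under the sign convention stated above (depths measured from the lens plane, positive on each side, with the lateral inversion carried by the minus sign); the function name is ours.

```python
import numpy as np

def thin_lens_image(p_obj, F):
    """Map a scene point to its convergence point inside the camera
    (thin lens model; requires z_obj > F)."""
    x, y, z = p_obj
    z_in = 1.0 / (1.0 / F - 1.0 / z)   # from 1/z_in + 1/z_obj = 1/F
    m = -z_in / z                       # lateral magnification (image inverted)
    return np.array([m * x, m * y, z_in])
```

For example, with F = 50 an object at depth 100 converges at depth 100 with magnification −1, while an object at depth 150 converges at depth 75 with magnification −0.5, showing how the coordinate ratio changes with z_obj.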

Ideal focused plenoptic camera model
As shown in Fig. 1, there are two optical designs of the focused cameras. In this paper, we only consider the design in Fig. 1(b). The case in Fig. 1(c) is similar to the former, only with the difference in the relative position of the focus point and the optical center of the micro-lens.
In this section, the main lens and the micro-lens array are described by a thin lens and a pinhole model respectively. As shown in Fig. 4, the main lens, the micro-lens array, and the image sensor are parallel to each other and all perpendicular to the optical axis. The optical center of the main lens lies on the optical axis.
Let d_img and d_lens be the distance between the geometric centers of two arbitrary adjacent micro-lens images and the diameter of the micro-lens respectively, as shown in Fig. 4(a). The ratio between them is

d_img / d_lens = (L + l) / L    (5)

where L and l are the distances from the main lens to the micro-lens array and from the micro-lens array to the image sensor respectively. We can find that the ratio L/l depends only on the raw image and the diameter of the micro-lens, which is useful for our calibration model in Section 5. Moreover, there is a deviation between the optical center of a micro-lens and the geometric center of its image, and d_img is constant within the same plenoptic camera.

Fig. 3 Two objects with the same size T in the scene at different depths focus inside a camera with focal length F.

Fig. 4 (a) Rays r1 and r2 intersect at the same point on the image sensor, bounding the maximum size of the micro-lens image; rays r3 and r4 fall on the geometric centers of two micro-lens images.

Let d_lens,scene and d_img,scene be the sizes of the micro-lens and of its image when refracted through the main lens into the scene, respectively (Fig. 4(b)). Combining Eqs. (2) and (5), the ratio between them satisfies Eq. (6). Equation (6) shows that even though the rays are refracted through the main lens, the deviation between the geometric center of the micro-lens image and the optical center of the micro-lens still cannot be ignored. The effect of these deviations on the rendered images will be demonstrated and discussed in the experiments.
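Solving Eq. (5) for the ratio L/l is a one-liner; the following sketch (function name ours) shows the inversion used later in calibration, assuming d_img has been measured from the raw image and d_lens is known by design.

```python
def lens_sensor_ratio(d_img, d_lens):
    """Solve Eq. (5), d_img/d_lens = (L + l)/L, for the ratio L/l.
    d_img: spacing of adjacent micro-lens image centers on the sensor;
    d_lens: micro-lens diameter (spacing of adjacent micro-lens centers)."""
    return d_lens / (d_img - d_lens)
```

Because d_img is only slightly larger than d_lens, the ratio L/l is large and sensitive to the measured spacing, which is why subpixel center localization (Section 5.1) matters.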
In Fig. 4(b), A' and B' are the focus points of two scene points A and B respectively. The rays emitted from each focus point fall on multiple micro-lenses and focus on the image sensor, resulting in multiple images A_i and B_i. The distance between adjacent sensor points A_i and A_{i+1} is

d_A = d_lens |L_A' + l| / |L_A'|    (7)

where L_A' is the distance between the focus point A' and the micro-lens array, and |·| denotes the absolute value. Equation (7) indicates that the distance between any two adjacent sensor points of the same focus point inside the camera depends only on intrinsic parameters. Once the raw image is shot (thus d_A is determined), L_A' depends only on l and d_lens. According to triangle similarity, we can get the coordinate of the focus point:

x_A' = x_i − (x_{A_i} − x_i) L_A' / l    (8)

Based on Eq. (7), we can simplify Eq. (8) as

x_A' = x_i − (x_{A_i} − x_i) d_lens / (d_A − d_lens)    (9)

According to Eq. (9), once a raw image is shot (thus d_A and x_{A_i} are determined) and d_lens is given, x_A' and y_A' can be calculated, and they are independent of the other intrinsic parameters. Furthermore, the length of A'B' can be calculated using only the raw image and d_lens.
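Equations (7)-(9) can be checked numerically by simulating the micro-lens pinholes and inverting. The sketch below (our names; one lateral coordinate, focus point in front of the array as in Fig. 1(b)) recovers both the depth L_A' and the lateral coordinate, and makes visible that the lateral coordinate no longer involves l.

```python
def focus_point_from_spacing(x_i, x_Ai, d_A, d_lens, l):
    """Recover the in-camera focus point from one micro-lens center x_i,
    its sensor point x_Ai, and the measured spacing d_A.
    Returns (lateral coordinate x_A', depth L_A' to the micro-lens array)."""
    L_A = l * d_lens / (d_A - d_lens)                    # invert Eq. (7)
    x_A = x_i - (x_Ai - x_i) * d_lens / (d_A - d_lens)   # Eq. (9): l cancels
    return x_A, L_A
```

Note that x_A' depends only on the raw image measurements and d_lens, while L_A' still scales with l, matching the discussion around Eq. (9).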
Imagine that there are two objects of equal size in the scene, as shown in Fig. 3. The distance between each focus point and the micro-lens array can be calculated via Eq. (7). Replacing b_1 and b_2 in Eq. (4) and simplifying via Eqs. (5) and (7), we get the relationship in Eq. (10), where S_1, S_2, L_I1, and L_I2 depend on only three factors: the raw image, d_lens, and l. Equation (10) shows that the value of F can be calculated uniquely once the other intrinsic parameters are determined.
In the same manner, Eq. (3) can be simplified into Eq. (11). From Eq. (11), the size of an object in the scene is independent of l; therefore, the size of an object reconstructed from the raw image cannot be used as a cost function to constrain l.
In summary, given the coordinates of the micro-lenses and the raw image, three theorems can be stated as follows: (1) The size of a reconstructed object inside the camera and its distance to the micro-lens array are constant (Eq. (9)).
(2) The unique F can be determined by the prior of the scene (Eq. (10)).
(3) The size of the reconstructed object in the scene is constant under changing L (Eq. (11)).

Micro-lens-based camera model
In this section we present a more complete model for a focused plenoptic camera with misalignment of the micro-lens array [17], which is capable of decoding a more accurate light field. In total, 10 intrinsic parameters are presented in this section: the distance between the main lens and the micro-lens array, L; the distance between the micro-lens array and the image sensor, l; the misalignment of the micro-lens array, x_m, y_m, (θ, β, γ); the focal length of the main lens, F; and the shift of the image coordinate, (u_0, v_0).

Distribution of micro-lens image
As shown in Fig. 5(a), every micro-lens with its unique coordinate (x_i, y_i, 0)^T is tangent to its neighbours. In addition, (x_i, y_i, 0)^T depends only on d_lens. To simplify the discussion, we assume the layout of the micro-lens array is square-like; for a hexagon-like configuration, it is easy to partition the whole array into two square-like ones. With the transformation shown in Fig. 5(b), the coordinate of the optical center of the micro-lens is represented as

(x_c, y_c, z_c)^T = R (x_i, y_i, 0)^T + t    (12)

where t = (x_m, y_m, L)^T and R is the rotation matrix with three degrees of freedom, i.e., the rotations (θ, β, γ) about the three coordinate axes, similar to the traditional camera calibration model [18]. Although the main lens and the image sensor are parallel, the micro-lens array and the image sensor are not (Fig. 5(c)). Each geometric center of a micro-lens image, obtained by projecting the optical center from the optical center of the main lens onto the sensor plane z = L + l, is represented as

(x_img, y_img)^T = (L + l)/z_c · (x_c, y_c)^T    (13)
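The misalignment transformation and the center projection described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the rotation order (z·y·x) is our assumption, and the function names are ours.

```python
import numpy as np

def rotation(theta, beta, gamma):
    """Rotation about the x, y, z axes (composed as Rz @ Ry @ Rx; the
    composition order is an assumption for illustration)."""
    cx, sx = np.cos(theta), np.sin(theta)
    cy, sy = np.cos(beta), np.sin(beta)
    cz, sz = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def optical_center(xi, yi, R, t):
    """Micro-lens optical center under misalignment (R, t = (x_m, y_m, L))."""
    return R @ np.array([xi, yi, 0.0]) + t

def image_center(c, L, l):
    """Geometric center of the micro-lens image: project the optical center
    from the main-lens center (the origin) onto the sensor plane z = L + l."""
    return c * (L + l) / c[2]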

Projections from the raw image
Once the coordinate of a micro-lens's optical center (x_c, y_c, z_c)^T and its image point (x_img, y_img, L + l)^T are calculated, we can get a unique ray r_i, the line through the two points:

r_i : p(λ) = (x_img, y_img, L + l)^T + λ [(x_c, y_c, z_c)^T − (x_img, y_img, L + l)^T]    (14)

As shown in Fig. 4(b), the multiple images {(x_{A_i}, y_{A_i})^T | i = 1, ..., n} on the image sensor from the same focus point A' can be located if a proper pattern is shot, such as a grid-array pattern [12]. Thus the multiple rays emitted from point A' through the different optical centers of the micro-lenses are collected to calculate the coordinate of point A' as the least-squares intersection

Â' = argmin_p Σ_i ||(I − d_i d_i^T)(p − o_i)||_2²    (15)

where o_i and d_i are a point on and the unit direction of ray r_i, and ||·||_2 represents the L2 norm. Till now, we have accomplished the decoding process of the light field inside the camera. To obtain the light field data in the scene, combining the depth-dependent scaling ratio described in Eq. (2), the representation of the focus point Â' can easily be transformed using the focal length F.
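The least-squares intersection of a ray bundle has a standard closed form, which the sketch below implements (function name ours): accumulate the projector (I − d_i d_i^T) for each ray and solve the resulting 3×3 linear system.

```python
import numpy as np

def intersect_rays(origins, directions):
    """Point minimizing the sum of squared distances to a bundle of rays:
    argmin_p sum_i || (I - d_i d_i^T)(p - o_i) ||^2."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = np.asarray(d, float)
        d = d / np.linalg.norm(d)          # unit direction
        M = np.eye(3) - np.outer(d, d)     # projector orthogonal to the ray
        A += M
        b += M @ np.asarray(o, float)
    return np.linalg.solve(A, b)
```

Two non-parallel rays already make A full rank; in practice many micro-lens images of the same point are available, which averages out localization noise.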

Calibration
Compared to the ideal focused plenoptic camera model, the shift caused by the rotations of the micro-lenses is far smaller than l and the difference in numerical calculation is trivial; therefore the three theorems drawn for the ideal focused plenoptic camera still hold for our proposed model with misalignment. More importantly, assuming zero machining error, the diameter of the micro-lens d_lens is fixed by design and does not need to be estimated during calibration. Consequently, a unique solution for the intrinsic parameters P = (θ, β, γ, x_m, y_m, L, l, u_0, v_0)^T and F can be estimated using the two steps described in the following.

Decoding by micro-lens optical center
To locate the centers of the micro-lens images, we shoot a white scene [19,20]. Then a template of proper size is cut out from the white image and its similarity with the original white image is calculated via normalized cross-correlation (NCC). To find the locations with subpixel accuracy, a threshold is applied to the similarity map such that all values less than 50% of the maximum intensity are set to zero. Then we take the filtered similarity map as a weight and calculate the weighted coordinate of every small region. The results are shown in Fig. 6. To estimate the parameters P, we minimize the cost function in Eq. (17), the discrepancy between the located geometric centers and the centers computed via Eq. (13), where (u_0, v_0)^T is the offset between the camera coordinate and the image coordinate. After this optimization, P is used to calculate the micro-lens optical centers and to reconstruct the calibration points. Then the rays are obtained via Eq. (14).
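The thresholded weighted-centroid step can be sketched as follows, assuming the NCC similarity map around one micro-lens image has already been cropped out (the function name and the 50% default mirror the description above but the code is illustrative, not the authors').

```python
import numpy as np

def subpixel_center(sim, thresh_ratio=0.5):
    """Subpixel center of one similarity patch: values below
    thresh_ratio * max are zeroed, the rest weight the pixel coordinates."""
    w = np.where(sim >= thresh_ratio * sim.max(), sim, 0.0)
    ys, xs = np.mgrid[0:sim.shape[0], 0:sim.shape[1]]
    s = w.sum()
    return (xs * w).sum() / s, (ys * w).sum() / s
```

The thresholding suppresses the low, noisy tails of the correlation peak so that the centroid is dominated by the peak itself.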
According to Eq. (5), the solution of Eq. (17) is not unique: it changes with the initial value of L. Moreover, the ratio L/l is almost constant under a changing initial value of P. Although there are differences between the models described in Section 3.2 and Section 4, the theorems still hold since the shift caused by the rotations can be ignored. This observation will be verified in the experiments.
In addition, the value of l influences the directions of the decoded rays. Due to the coupling of angle and depth, either of them can be used as a prior to estimate the unique P.

Reconstruction of calibration points
To reconstruct a plane in the scene, we shoot a certain pattern so that multiple images of different scene points can be recognized. A crop of the calibration board and the raw image we shoot are shown in Fig. 7. To locate the multiple images of every point on the calibration board, we preprocess the grid image by adding the inverse color of the white image to the grid image (Fig. 7). Then one of the sensor points corresponding to the focus point A' is located by the same method described in Section 5.1. Consequently, the plane we shoot in the scene, denoted by Π̂ = {Â'_i | i = 1, ..., n}, is easily reconstructed using Eqs. (2) and (14).

Fig. 7 A crop of the calibration board, its raw image, and the image preprocessed with the white image.
As shown in Fig. 8, we design a parallel biplanar board with a known distance between the two parallel planes and a known distance between adjacent grids, which provide the depth prior Pr_dp and the scale prior Pr_sc. Equivalently, we can shoot a single-plane board twice while moving the camera along a guide rail by a fixed distance.
After the sensor points (x̂_{A_i}, ŷ_{A_i})^T of arbitrary scene points are located and the intrinsic parameters P are determined, we can reconstruct the grid-array planes Π̂_1 and Π̂_2 in the scene. Then the minimum distance from an arbitrary reconstructed point to its neighbours on each of the two calibration board planes can be calculated, referred to as T̂_1 and T̂_2 respectively. Finally, we can minimize the cost function

F = argmin_F ||T̂_1(F) − T̂_2(F)||_2²    (18)

to estimate the focal length F of the main lens, where T̂_1 and T̂_2 depend only on F in this step. According to Eq. (10), there is an optimal solution for Eq. (18) if P is determined.
Note that if the value of L or l is incorrect, the distance between planes Π̂_1 and Π̂_2 is not equal to the prior distance. Therefore we take the distance between planes Π̂_1 and Π̂_2 as the last cost function:

L = argmin_L ||dis(Π̂_1, Π̂_2) − Pr_dp||_2, L > 0    (19)

where dis(·, ·) represents the distance between two parallel planes. In practice, we take the mean distance of the reconstructed points on Π̂_1 to the plane Π̂_2 as the value of dis. Moreover, T̂_1 and T̂_2 may not be equal to Pr_sc due to possible calculation error, so we must refine the value of the depth prior to ensure the correct ratio of scale and depth.

Fig. 8 The parallel biplanar board we designed to provide the depth prior Pr_dp and the scale prior Pr_sc for calibration.
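The mean point-to-plane distance used for dis(·, ·) can be sketched as below: fit a plane to the second point set and average the absolute distances of the first set to it. This is an illustrative sketch (function names ours), not the authors' implementation.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points: (unit normal, centroid)."""
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    return vt[-1], c               # smallest singular vector = plane normal

def plane_distance(pts1, pts2):
    """Mean distance of the points reconstructed on plane 1
    to the plane fitted through the points on plane 2."""
    n, c = fit_plane(pts2)
    return np.abs((pts1 - c) @ n).mean()
```

Using the mean over all reconstructed points, rather than a single point pair, averages out the localization noise of individual grid corners.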

Algorithm summary
The complete algorithm is summarized in Algorithm 1.
The search step of the loop over L should be adapted according to the value of ||dis(Π̂_1, Π̂_2) − Pr_dp||_2 in Eq. (19); the same principle applies to the search step of F. In addition, because ||dis(Π̂_1, Π̂_2) − Pr_dp||_2 is monotonic in L, and ||T̂_1(F) − T̂_2(F)||_2 is monotonic in F, we can use bisection to search for an accurate value more efficiently.
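The bisection search over a monotonic residual can be sketched generically; here `f` would wrap the full reconstruction pipeline and return, e.g., dis(Π̂_1, Π̂_2) − Pr_dp as a function of L (the wrapper and its name are our illustration).

```python
def bisect_monotonic(f, lo, hi, tol=1e-9):
    """Root of a monotonic residual f on [lo, hi] by bisection.
    Assumes f changes sign exactly once inside the bracket."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == (f_lo > 0):
            lo, f_lo = mid, f(mid)     # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    return 0.5 * (lo + hi)
```

Each step halves the bracket, so reaching tolerance tol costs only log2((hi − lo)/tol) evaluations of the reconstruction pipeline, instead of a dense sweep over L or F.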

Experimental results
In the experiments, we apply our calibration method to simulated and real world scene data. We capture three datasets of white images and grid images using a self-assembled focused plenoptic camera (Fig. 9). The camera includes a GigE camera with a CCD image sensor whose resolution is 4008×2672 with a 9 μm pixel pitch, an F-mount Nikon lens with 50 mm focal length, and a micro-lens array in a hexagonal layout whose micro-lens diameter is 300 μm with negligible error. We use the function "fminunc" in MATLAB for the non-linear optimization in Eqs. (15), (17), and (18). The initial parameters are set to the installation parameters, and θ, β, γ, x_m, y_m are set to zero.

Simulated data
First we verify the calibration method on simulated images rendered in MATLAB, as shown in Fig. 1. The ground truth and the calibrated parameters are shown in Table 1. We compare the estimated angle of the ray passing through each micro-lens optical center and the main lens optical center with the ground truth, as shown in Fig. 10; the differences are less than 1.992×10⁻³ rad. We also compare the located geometric centers of the micro-lens images with the optimized ones. The error maps of the 84×107 geometric centers optimized with different L are shown in Fig. 11(a). From Fig. 11(b), we find that 96.53% of the centers have an error of less than 0.1 pixel, which is the input for the following projection step.

Fig. 9 The focused plenoptic camera we assembled and the micro-lens array inside it.
The comparison of the locations of the micro-lens optical centers with different L is illustrated in Fig. 12. The difference in the x-coordinate and y-coordinate of the optical centers is trivial under changing L. The maximal difference is 4.2282×10⁻⁶ mm when L changes from 55 to 84 mm, which confirms the observation mentioned in Section 5.1.
The values of F, dis(Π̂_1, Π̂_2), Ŝ_1, Ŝ_2, T̂_1, and T̂_2 are shown in Fig. 13. It is obvious that Ŝ_1 and Ŝ_2 are almost constant when L changes, confirming the conclusions of Eqs. (9) and (11). In addition, the values of dis(Π̂_1, Π̂_2) correlate linearly with L, which supports the validity of the cost function described in Eq. (19). The relationship among T̂_1, T̂_2, and F is shown in Fig. 14, which confirms the analysis of Eq. (10).

Fig. 10 The histogram of the deviation between the estimated ray angles and the ground truth.

Fig. 11 The results of optimization on the geometric centers of the micro-lens images on simulated data.

Physical camera
Then we verify the calibration method on the physical focused plenoptic camera. To obtain the equivalent of the parallel biplanar board data, we shoot a single-plane board twice while moving the camera along a guide rail by an accurately fixed distance, as shown in Fig. 9. The depth prior Pr_dp is precisely controlled to be 80.80 mm and the scale prior Pr_sc is 28.57 mm. The calibration results are shown in Fig. 15.
As shown in Fig. 15(a), there is an obvious error between the computed geometric centers and the located centers at the edge of the error map, which may result from lens distortion or micro-lens machining error. Nevertheless, we find that 73.00% of the centers have an error of less than 0.6 pixel, as shown in Fig. 15(b). The mean difference of the geometric centers of the micro-lens images optimized with different L is 1.89×10⁻⁴ pixel (Fig. 15(c)). The results of F, dis(Π̂_1, Π̂_2), Ŝ_1, Ŝ_2, and T̂ (T̂_1 = T̂_2) with different L are similar to the results on simulated data.
Finally, to verify the stability of our algorithm, we calibrate intrinsic parameters with different poses of calibration board. Corresponding results are shown in Table 2.

Rendering
We render the focused image taking into account the deviations between the optical center of each micro-lens and the geometric center of its image.
We shoot a resolution test chart at a single depth for the simulated data (Fig. 16), which indicates that the deviation indeed affects the accuracy of the decoded light field. Then we shoot a chess board for the simulated data to evaluate the width of every grid in the rendered images. We resize the images by setting the mean width of the grids to 100 pixels, and then calculate the range and the standard deviation of the grid widths. The results are shown in Table 3, which indicates that the calibration contributes to a uniform scale at the same depth and reduces the distortion caused by incorrect deviations. The results on the physical camera are shown in Table 4 and Fig. 17. The light field decoded with the estimated intrinsic parameters leads to a more accurate refocus distance [14], which is equivalent to a correct ratio of scale and depth.
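The grid-width statistics described above reduce to a normalization and two summaries; the sketch below (function name ours) shows the computation under the assumption that the per-grid widths have already been measured from the rendered image.

```python
import numpy as np

def grid_width_stats(widths, target_mean=100.0):
    """Rescale measured grid widths so their mean equals target_mean,
    then report (range, standard deviation) of the rescaled widths."""
    w = np.asarray(widths, float) * (target_mean / np.mean(widths))
    return w.max() - w.min(), w.std()
```

A perfectly uniform scale at one depth gives a range and standard deviation of zero; larger values indicate residual distortion from incorrect center deviations.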

Conclusions and future work
In this paper we present a 10-intrinsic-parameter model to describe a focused plenoptic camera with misalignment. To estimate the intrinsic parameters, we propose a calibration method based on the relationship between the raw image features and the depth-scale information in the real world scene. To provide the depth and scale priors that constrain the intrinsic parameters, we design a parallel biplanar board with grids. The calibration approach is evaluated on simulated as well as real data. Experimental results show that our proposed method is capable of decoding a more accurate light field for the focused plenoptic camera. Future work includes modelling the distortion caused by the micro-lenses and the main lens, optimization of the extrinsic parameters, and the reparameterization and re-sampling of light field data from multiple cameras with different poses.
Professor Wang's research interests include computer vision and computational photography, such as 3D structure and shape reconstruction, object detection, tracking and recognition in dynamic environment, and light field imaging and processing. He has published more than 100 papers in the international journals and conferences.
Open Access The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.