Abstract
We propose a novel minimal solver for sphere fitting via its 2D central projection, i.e., a special ellipse. The input of the presented algorithm consists of contour points detected in a camera image. General ellipse fitting problems require five contour points. However, taking advantage of the isotropic spherical target, three points are enough to define the tangent cone parameters of the sphere. This yields the sought ellipse parameters. Similarly, the sphere center can be estimated from the cone if the radius is known. These proposed geometric methods are rapid, numerically stable, and easy to implement. Experimental results—on synthetic, photorealistic, and real images—showcase the superiority of the proposed solutions to the state-of-the-art methods. A real-world LiDAR-camera calibration application justifies the utility of the sphere-based approach resulting in an error below a few centimeters.
1 Introduction
Target-based sensor calibration is a fundamental topic in 3D computer vision. The efficiency of these methods depends on the accuracy and robustness of the object detection algorithms. This paper focuses on spherical targets for robust fitting and extrinsic camera calibration, as they are easy to identify in camera images owing to the rotation symmetry. In the image plane, the perspective projection of a sphere is an ellipse; thus, its contour detection becomes an essential step for sphere recognition.
Consequently, we propose sample-minimizing solutions for the special ellipse and the sphere parameter estimation. The methods take advantage of the spherical projection since the shape of the ellipse is restricted by the sphere and the intrinsic camera parameters. These approaches require only three contour points as opposed to the general five-point problem of ellipse fitting.
Spherical calibration. The industry often utilizes spherical targets for calibrating sensors with different modalities (Ruijl et al. 2001; Klüng and Meli 2005). They are also exploited in the field of optics (Xiong et al. 2021; Liu et al. 2017). Depth cameras and laser scanners, like LiDAR, generate three-dimensional pointclouds in which spherical objects are readily distinguished. As this shape is represented by the coordinates of the center in space and the radius, the search space has four variables that can be accurately estimated. If the radius of the sphere is known, the degrees of freedom reduce to three. This paper focuses only on the estimation of extrinsic parameters, although intrinsic ones can also be computed using at least three spherical objects (Lu and Payandeh 2010).
Most of the related papers do not consider the geometric advantage of the setup. Numerous related publications recommend a contrasting background and target object during real-world tests. For instance, Sun et al. (2016) use white spheres in front of a black board. Guan et al. (2015) apply the sphere as the only light source.
Robust Detectors. We strive to find a robust method without restrictions during measurements to improve our sphere-based calibration pipeline (Tóth et al. 2020). This previous work approximates the spherical projection with a circle and detects the ellipse contour with the Random Sample Consensus (RANSAC) algorithm (Fischler and Bolles 1981). Recently, RANSAC and its variants (Zuliani et al. 2005; Chum and Matas 2005; Barath and Matas 2021) have been widely used for robust model and multi-model estimation. These methods require both minimal and overdetermined solvers. The minimal method runs in each iteration on a randomly selected set of points and calculates the parameters from a minimal sample. The overdetermined solver estimates the model parameters from a larger sample set. The time complexity of RANSAC depends exponentially on the minimal sample size (see Table 1). Therefore, a reliable ellipse edge detection for both minimal and overdetermined cases is essential for rapid RANSAC-based estimation.
Ellipse Detectors. The main categories of ellipse fitting methods are parameter estimation from ellipse contour points and edge-following algorithms. The classical point-based estimators (Fitzgibbon et al. 1996, 1999; Zhang 1997; Rosin 1993; Prasad et al. 2013; Halir and Flusser 1998) apply, e.g., the least squares method and analytical or numerical approximation to find the ellipse parameters. The latter approaches (Mai et al. 2007; Liu and Qiao 2009; Lu et al. 2020; Meng et al. 2020) extract arcs from images and group those belonging to the same ellipse.
Ellipse detectors can utilize the Hough transform (HT) to estimate the general ellipse parameters by a voting scheme (Hough 1962; Nair and Saunders 1996). The time and memory demands of this kind of detector are very high. Moreover, HT is also very sensitive to the quantization of the parameters. Xie and Ji (2002) proposed a rapid HT-based method where only one parameter is considered during the voting, and the other four parameters are estimated based on the symmetry of the ellipses.
Recent methods apply deep learning for ellipse estimation (Liu et al. 2007; Dong et al. 2021). Machine learning is very efficient for, e.g., free-form object detection (Cai and Vasconcelos 2018). Nonetheless, other approaches are more accurate when the sought model can be described by a few clear geometric constraints and only a few parameters. Machine learning is less suitable for such problems, as the object translation and rotation, the so-called pose, can be poorly retrieved (Jabbar et al. 2019).
Sphere Detectors. For sphere detection in images, point-based methods are commonly used, requiring at least three contour points. Shi et al. (2019) proposed a point-based minimal ellipse detector minimizing an algebraic error function, where the center estimation is composed of a particular and a free solution in the null space. In the work of Sun et al. (2016), a flexible and global spherical calibration is introduced. The paper solves the standard problem of spherical projection fitting without outliers, as they apply a monochromatic background. We propose analytical point-based methods for both sphere and spherical ellipse estimation to reach more accurate results in our calibration pipeline with robust detectors. The methods are compared to both general five-point and minimal three-point methods.
Contribution. This paper presents how to accurately estimate the projected ellipse parameters from contour points for calibrated cameras. Furthermore, we reveal how to evaluate the sphere parameters analytically. It is proven that three contour points of an ellipse are sufficient to estimate the projective line of the sphere centers. Hence, if the radius of the sphere is known, the sphere position can be estimated. We propose a direct method to estimate the ellipse with a parametric representation. Finally, it is shown how the sphere parameters can be computed if the contour ellipse is known. We address both the minimal and overdetermined cases. The applied error value is geometric, and no numerical optimization is needed, in contrast to the algebraic error-based state-of-the-art methods (Shi et al. 2019; Sun et al. 2016).
Paper outline. Fig. 1 shows the proposed methods along with the input and output parameters. The paper is structured to present these techniques in the following way.
Section 2 overviews the mathematical background of the problem visualized in Fig. 2. Section 2.1 gives a formal description of the setup. Two constraints corresponding to the elliptical projection are derived in Sect. 2.2.
The proposed algorithms are introduced in Sect. 3: (i) 3pFit (Sect. 3.1) is for three-point ellipse detection, where the ellipse fitting is considered with a parametric representation, (ii) Ell2Sphere (Sect. 3.2) converts the ellipse parameters to sphere parameters, and (iii) Direct3pFit (Sect. 3.3) gives a direct solution to fit the sphere parameters to a point set. The complete derivations in the appendices prove the reliability of our methods.
Section 4 discusses our novel algorithms and provides pseudocodes and performance analysis. The validation of the proposed methods in Sect. 5 is also special in the sense that both semi-synthetic and real tests are used here. Images of spheres with ground truth parameters are rendered with Blender. The proposed and state-of-the-art methods are quantitatively compared. The methods are qualitatively compared on real data as well.
2 Mathematical Background
The aim of this paper is to estimate the parameters of a conic section (Glaeser et al. 2016), i.e., an ellipse (see Lemma 1), in an image which contains the projected contour of a sphere. This section describes the shape and curve representations relevant to the sphere projection problem.
2.1 Preliminaries and Notation
In this subsection, we enumerate the used formulae for the camera model and the main geometric objects: the sphere, the cone, and the ellipse.
The perspective camera. Assume we have a calibrated perspective camera, where \(\begin{bmatrix} s_{u}&s_{v} \end{bmatrix}^{\text {T}}\), \(\begin{bmatrix} u_{0}&v_{0} \end{bmatrix}^{\text {T}}\), and f denote the pixel scaling, the principal point, and the focal length, respectively. Then the intrinsic parameters are as follows:
The pixels of the image can be normalized by the camera parameters which map the points of the image plane into the world frame:
where \(\begin{bmatrix}{\hat{u}}&{\hat{v}}&1\end{bmatrix}^{\text {T}}\) denotes the normalized homogeneous coordinates of the point \(\begin{bmatrix}u&v\end{bmatrix}^{\text {T}}\) in pixels. Hence, the camera location and the principal point are \(\textbf{c} = \begin{bmatrix}0&0&0\end{bmatrix}^{\text {T}}\) and \(\textbf{p} = \begin{bmatrix}0&0&f=1\end{bmatrix}^{\text {T}}\), respectively. The unit vector to any 3D point except for the origin \({\textbf{p}}_i\in {\mathbb {R}}^3\) is \({\textbf{q}}_i = \frac{\mathbf {p_i}}{\Vert \mathbf {p_i}\Vert }\). For the principal point, \({\textbf{q}} = {\textbf{p}}\) holds.
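For illustration, a minimal Python/NumPy sketch of this normalization step and of the unit ray directions \({\textbf{q}}_i\) follows. The intrinsic matrix layout (named K below, with \(f s_u\) and \(f s_v\) on the diagonal) is an assumption following the common pinhole convention and is not quoted from (1)–(2).

```python
import numpy as np

def normalize_pixels(points_px, f, su, sv, u0, v0):
    """Map pixel coordinates to the normalized image plane (f = 1).

    Assumed pinhole convention: K = [[f*su, 0, u0], [0, f*sv, v0], [0, 0, 1]].
    """
    K = np.array([[f * su, 0.0, u0],
                  [0.0, f * sv, v0],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([points_px, np.ones(len(points_px))])  # homogeneous pixel coordinates
    return (np.linalg.inv(K) @ pts_h.T).T                          # rows: [u_hat, v_hat, 1]

def unit_rays(normalized):
    """Unit direction vectors q_i from the camera center to the normalized points."""
    return normalized / np.linalg.norm(normalized, axis=1, keepdims=True)
```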
Geometry representation. Fig. 2 illustrates the sphere projection in the camera. To find the mathematical connection between any contour points of the conic section and the sphere, we use the following notations and equations.
Let us consider the implicit representation of a sphere as
\[
\left( x - x_0 \right)^2 + \left( y - y_0 \right)^2 + \left( z - z_0 \right)^2 = r^2, \qquad (3)
\]
where \(\textbf{s} =\begin{bmatrix}x_0&y_0&z_0\end{bmatrix}^{\text {T}}\) and r are the sphere center and radius, respectively. We assume that the radius is known in our approach, while the sphere center is unknown.
The cone from the camera center, touching the sphere, determines how the sphere becomes a conic section in the image and vice versa. Throughout this paper, the cone is defined by the following two parameters: the unit vector of the cone axis \(\textbf{w}\) and the angle \(\alpha \) between the axis and the generators. The apex of the cone is at the camera position \(\textbf{c}\).
2.2 Ellipse Estimation from a Sphere
If a sphere is projected into the image plane, the equations of the ellipse in both implicit and parametric forms can be deduced from the sphere parameters. Moreover, two invariant properties can be observed.
Implicit representation. In the general case, the implicit equation of a conic section contains six unknown parameters:
which is an ellipse if \(\Delta = B^2 - 4AC <0 \), i.e., when the conic discriminant is negative. In our special case, it is indeed an ellipse (Hajder et al. 2020), as we prove in Lemma 1.
Lemma 1
If a conic section is the projection of a sphere lying in front of the image plane, the result is an ellipse.
Proof
If a conic section is an ellipse, the conic discriminant is negative:
Since the conic section is visible in the image, we introduce a restriction on \(z_0\) which expresses that the whole sphere surface is strictly in front of the camera: \(r<z_0\). Otherwise, the sphere would be either behind the image plane and invisible, or touch or contain the camera itself, which is unrealistic in a real-world scenario, as discussed in Fig. 3. The conic discriminant becomes:
After substituting (6) into the inequality in (5):
The restriction yields \(0< z_0^2 - r^2 \le x_0^2 + y_0^2 + z_0^2 - r^2\). Hence, (7) holds, that is \(\Delta < 0\), and the conic section is indeed an ellipse. \(\square \)
The back projection of an image point corresponds to a ray in space. If we use the normalized coordinates defined in (2), the coordinates of the ray can be written as \(\begin{bmatrix}x&y&z \end{bmatrix}^{\text {T}}=z\begin{bmatrix}{\hat{u}}&{\hat{v}}&1 \end{bmatrix}^{\text {T}}\). The implicit representation of the sphere (3) gives the intersections of a ray and the sphere:
\[
\left( z{\hat{u}} - x_0 \right)^2 + \left( z{\hat{v}} - y_0 \right)^2 + \left( z - z_0 \right)^2 = r^2.
\]
This leads to the following coefficients of (4) as it comes from (A.7) discussed in “Appendix A” and based on Tóth et al. (2020):
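For completeness, a hedged sketch of how these coefficients arise (the full coefficient list (9) is derived in Appendix A): the intersection condition above is a quadratic in z, and the contour consists of the tangent rays, for which the discriminant vanishes,
\[
\left( {\hat{u}}^2 + {\hat{v}}^2 + 1 \right) z^2 - 2\left( {\hat{u}} x_0 + {\hat{v}} y_0 + z_0 \right) z + \left( x_0^2 + y_0^2 + z_0^2 - r^2 \right) = 0,
\]
\[
\left( {\hat{u}} x_0 + {\hat{v}} y_0 + z_0 \right)^2 = \left( {\hat{u}}^2 + {\hat{v}}^2 + 1 \right) \left( x_0^2 + y_0^2 + z_0^2 - r^2 \right).
\]
Expanding the latter equation in \({\hat{u}}\) and \({\hat{v}}\) yields the coefficients \(A, \dots , F\) of (4).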
Three invariant properties can be defined:
The equations are linearly dependent; thus, one of them can be dropped as
Parametric representation. Another approach to define an ellipse is a 2D affine transformation describing how an origin-centered unit circle becomes an ellipse specified by a rotation angle, two scaling factors, and a translation vector. Hence, the parametric ellipse is
\[
\varvec{e}\left( \phi \right) = {\textbf{R}} \left( \theta \right) \begin{bmatrix} a \cos \phi \\ b \sin \phi \end{bmatrix} + {\textbf{e}}_0, \qquad (12)
\]
where \({\textbf{R}} \left( \theta \right) = \begin{bmatrix} \cos \theta &{} -\sin \theta \\ \sin \theta &{} \cos \theta \end{bmatrix}\) is a 2D rotation matrix calculated from the angle \(\theta \in \left[ -\frac{\pi }{2}; \frac{\pi }{2}\right) \), a and b are the major and minor semi-axes, and \({\textbf{e}}_0 = \begin{bmatrix} e_x&e_y \end{bmatrix}^{\text {T}}\) is the center of the ellipse.
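As a quick illustration of (12), the following Python/NumPy sketch samples ellipse points by applying the rotation, scaling, and translation to an origin-centered unit circle; the function name is ours and is not taken from the paper's pseudocode.

```python
import numpy as np

def ellipse_points(theta, a, b, e0, phi):
    """Sample e(phi) = R(theta) @ [a*cos(phi), b*sin(phi)]^T + e0 for an array of angles phi."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    circle = np.stack([np.cos(phi), np.sin(phi)])          # 2 x N points of the unit circle
    return (R @ (np.array([[a], [b]]) * circle)).T + e0    # N x 2 ellipse points

# e.g., the major-axis end-points used later: a1 = e(0), a2 = e(pi)
# a1, a2 = ellipse_points(theta, a, b, e0, np.array([0.0, np.pi]))
```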
The center of the ellipse \({\textbf{e}}_0\) and the central projection of the sphere center \(\mathbf {e_s}\) are not identical; see the spatial view in Fig. 2 or the planar view in Fig. 4. This disparity in the image gives the eccentricity of the projection \(\varepsilon = \Vert \textbf{e}_\textbf{s} - {{\textbf {e}}}_0 \Vert \), which has to be considered (Matsuoka and Maruyama 2016) when constructing the new method.
To find the parametric representation of an ellipse using the sphere parameters, an important property of a sphere projection is that the line of the major ellipse axis goes through the principal point of the image (Lu and Payandeh 2010). In other words, if the ellipse center \({\textbf{e}}_0\) is known, it yields the value of the rotation angle directly: \(\theta = {{\,\textrm{atan2}\,}}\left( \textbf{e}_y,\textbf{e}_x \right) \). In this paper, we call this property the axial constraint.
Let us use constraints \(\Delta = B^2 - 4AC\) and \(\Psi = AE^2 + CD^2 - BDE + F\Delta \) to formulate the equations that convert a general ellipse from its implicit form to its parametric one (Hoffmann 1989):
Substituting (9) into (13) with the axial constraint shows that the parametric ellipse parameters can be derived from the sphere parameters directly:
An important observation describes the connection between the ellipse center and the semi-axes as follows:
This means that an ellipse axis is defined if the two center coordinates and the other axis are known. In conclusion, if \({\textbf{e}}_0\) and either a or b are known, then \(\theta \) and the remaining semi-axis can be calculated as well. In this paper, this property is called the projectional constraint. The projected sphere center is given by the first two coordinates of \(\textbf{s}\) normalized by the third one, i.e., \(\mathbf {e_s} = \frac{1}{z_0}\begin{bmatrix} x_0&y_0 \end{bmatrix}^{\text {T}}\).
Observation. Both deductions contain two constraints. The implicit form has constraints \({\mathcal {C}}_1\), \({\mathcal {C}}_2\), and \({\mathcal {C}}_3\) in (10), but only two of them are independent because of (11). The parametric form has an axial and a projectional constraint.
In this paper, the formulae in (9) and (14) are referred to as Sphere2Ell. This name covers both possible conversions depending on the representation of the required output as we know the parametric-implicit conversion of an ellipse.
3 Proposed Minimal Solvers
Classical methods (Fitzgibbon et al. 1996, 1999; Halir and Flusser 1998; Zhang 1997; Rosin 1993) use five points for ellipse estimation. Based on our special scenario, we can consider the two constraints introduced in the previous section. Hence, the degrees of freedom are reduced from five to three. Therefore, three contour points, denoted by \({\textbf{p}}_i = \begin{bmatrix} {\hat{u}}_i&{\hat{v}}_i&1\end{bmatrix}^{\text {T}}\), are enough to estimate the ellipse parameters, where \({\hat{u}}_i\) and \({\hat{v}}_i\) are the normalized coordinates of the i-th sample point as in (2), \(i \in \{1,2,3\}\).
We pursue two approaches here: one is based on the constraints of the implicit representation, similar to some state-of-the-art methods (Sun et al. 2016; Shi et al. 2019), and the other is based on a novel geometric approach. The implicit solution in Appendix B has no closed-form solution and did not outperform the rival methods. The following approach shows that the estimation problem can be solved significantly more easily and faster than the previously mentioned algorithm, as the solution can be given in closed form, without any numerical approximation.
3.1 Geometric Approach
The main idea of the proposed method is that the projection can be defined by the cone directed to the sphere center whose generators touch the sphere and whose apex is at the camera focal point. The intersection of this cone and the image plane gives the ellipse contour in the image. Therefore, the cone parameters can be calculated from a few contour points; then, the ellipse parameters can be retrieved.
The cone parameters. If three points \(\textbf{p}_i, i \in \{1,2,3\}\) are given on the ellipse contour, the cone can be defined by the following process. Let the unit vectors from \(\textbf{c}\) to \(\textbf{p}_i\) be denoted by \({\textbf{q}}_i\). Then
\[
\left\langle {\textbf{w}}, {\textbf{q}}_i \right\rangle = \cos \alpha , \quad i \in \{1,2,3\}. \qquad (16)
\]
Let matrix \({\textbf{Q}} = \begin{bmatrix} {\textbf{q}}_1&{\textbf{q}}_2&{\textbf{q}}_3 \end{bmatrix}\) contain these unit vectors and \({\textbf{1}}_3\) be the all-ones 3-vector. The inner products can be written in matrix form after re-ordering as
\[
{\textbf{Q}}^{\text {T}}\, \frac{{\textbf{w}}}{\cos \alpha } = {\textbf{1}}_3. \qquad (17)
\]
Hence, the direction of the vector \({\textbf{Q}}^{-T} {\textbf{1}}_3\) gives the cone axis \({\textbf{w}}\), and the reciprocal of its magnitude gives \(\cos \alpha \).
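A minimal Python/NumPy sketch of this cone estimation step follows; it assumes the unit rays \({\textbf{q}}_i\) are already computed as above, and the same routine also covers the overdetermined case (23) via least squares.

```python
import numpy as np

def cone_from_rays(Q):
    """Cone axis w and cos(alpha) from unit contour rays (columns of Q, shape 3 x n, n >= 3).

    Solves <w, q_i> = cos(alpha) for all i: the vector v = Q^{-T} 1 (least squares
    for n > 3) points along the cone axis and has length 1 / cos(alpha).
    """
    n = Q.shape[1]
    v, *_ = np.linalg.lstsq(Q.T, np.ones(n), rcond=None)  # exact for n = 3, least squares otherwise
    norm_v = np.linalg.norm(v)
    return v / norm_v, 1.0 / norm_v                        # w, cos(alpha)
```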
The end-points of the major axis. If the cone parameters \(\cos \alpha \) and \({\textbf{w}}\) are known, the ellipse parameters can be calculated as well. For this purpose, we estimate two well-defined contour points of the ellipse: the end-points of the major axis denoted by \({\textbf{a}}_1\) and \({\textbf{a}}_2\) (see Fig. 4). The axial constraint makes these points and the semi-major axis a measurable in the image.
Figure 5 illustrates the problem, where the plane contains the points \(\textbf{c}\), \(\textbf{p}\), \({\textbf{a}}_1\), and \(\textbf{a}_2\). A unit normal \({\textbf{n}}\) of this plane can be obtained from \({\textbf{p}} \times {\textbf{w}}\) as
Thus, the directions from \(\textbf{c}\) to \({\textbf{a}}_1\) and \({\textbf{a}}_2\) can be calculated, if \( {\textbf{R}} \left( {\textbf{n}}, \alpha \right) \) denotes the axis–angle representation of a 3-dimensional rotation describing the rotation around vector \({\textbf{n}}\) with angle \(\alpha \):
\[
{\textbf{q}}_{{\textbf{a}}_1} = {\textbf{R}} \left( {\textbf{n}}, \alpha \right) {\textbf{w}}, \qquad (19)
\]
\[
{\textbf{q}}_{{\textbf{a}}_2} = {\textbf{R}} \left( {\textbf{n}}, \alpha \right)^{\text {T}} {\textbf{w}}. \qquad (20)
\]
If \({\textbf{q}}_{{\textbf{a}}_1}\) and \({\textbf{q}}_{{\textbf{a}}_2}\) are known, the axis end-points are
\[
{\textbf{a}}_1 = \Pi _{{\mathcal {I}}}\left( {\textbf{q}}_{{\textbf{a}}_1} \right), \quad {\textbf{a}}_2 = \Pi _{{\mathcal {I}}}\left( {\textbf{q}}_{{\textbf{a}}_2} \right), \qquad (21)
\]
where the projection operator \(\Pi _{{\mathcal {I}}}: {\mathbb {R}}^3 \mapsto {\mathbb {R}}^2 \) maps a vector from 3D space to a point in the normalized image plane.
The parametric ellipse. As demonstrated above, calculating the major axis end-points \(\textbf{a}_1\) and \(\textbf{a} _2\) is straightforward using the axial constraint. Moreover, the projectional constraint and formula (15) determine b based on \(\textbf{e}_0\) and a. Let us use the constraint \(\Omega = e_x^2 + e_y^2 + 1 -a^2\), then
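A hedged Python/NumPy sketch of these remaining 3pFit steps, reusing cone_from_rays above: the end-points follow (19)–(21), the ellipse center, a, and \(\theta\) follow from the axial constraint, and b is obtained from \(\Omega\). The closed form used for b below is our own reconstruction from the cone–plane geometry on the normalized (f = 1) image plane, since (15) and (22) are not restated here.

```python
import numpy as np

def rotation_about_axis(n, sin_a, cos_a):
    """Rodrigues formula for a rotation around the unit axis n (sin/cos given directly)."""
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + sin_a * K + (1.0 - cos_a) * (K @ K)

def ellipse_from_cone(w, cos_alpha):
    """Parametric contour ellipse (theta, a, b, e0) from the cone axis w and cos(alpha)."""
    sin_alpha = np.sqrt(1.0 - cos_alpha**2)
    n = np.cross([0.0, 0.0, 1.0], w)          # normal of the plane through c, p, a1, a2
    n /= np.linalg.norm(n)                    # degenerate if the sphere lies on the optical axis
    R = rotation_about_axis(n, sin_alpha, cos_alpha)
    q_a1, q_a2 = R @ w, R.T @ w               # extreme generators, cf. (19)-(20)
    a1 = q_a1[:2] / q_a1[2]                   # end-points on the normalized image plane, cf. (21)
    a2 = q_a2[:2] / q_a2[2]
    e0 = 0.5 * (a1 + a2)
    a = 0.5 * np.linalg.norm(a1 - a2)
    theta = np.arctan2(e0[1], e0[0])          # axial constraint
    omega = e0[0]**2 + e0[1]**2 + 1.0 - a**2  # constraint Omega
    b = np.sqrt(0.5 * (np.sqrt(omega**2 + 4.0 * a**2) - omega))  # our reconstruction of b
    return theta, a, b, e0
```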
3.1.1 Overdetermined Case
If we have a large set of points consisting of \(n > 3\) ellipse contour points, (17) is modified as
where \( {\textbf{Q}} = \left[ {\textbf{q}}_1 \dots {\textbf{q}}_n \right] \), \({\textbf{Q}}^+\) is the Moore-Penrose inverse of matrix \({\textbf{Q}}\), and \({\textbf{1}}_n\) is the all-ones n-vector. This technique minimizes a geometric error. The remaining steps of the algorithm are the same. We refer to both the minimal and overdetermined solutions as 3pFit.
The main advantage over the state-of-the-art algorithms (Sun et al. 2016; Shi et al. 2019) is the minimization of a geometric error, i.e., the inverse of \(\cos \alpha \) in (23) and (17), instead of an algebraic one. \(\cos ^{-1}\alpha \) equals the ratio of the focal point-to-sphere distance and the radius.
3.2 Center from Parametric Ellipse
Next, we describe a sphere center estimation algorithm called Ell2Sphere that allows us to compute the sphere parameters based on the proposed ellipse estimators. The inputs of the method are the ellipse parameters and the radius of the sphere. The main goal is to define the cone parameters and then find the sphere center.
First, we estimate the end-points of the major ellipse axis. Based on the geometric meaning of the parametric ellipse described before (12), the sought end-points correspond to the intersection points of the origin-centered unit circle and axis x, i.e., \(\phi \in \{0, \pi \}\). Hence, the sought points are \({\textbf{a}}_1 = \varvec{e}(0)\) and \({\textbf{a}}_2 = \varvec{e}(\pi )\).
The next step is to find the direction of the cone axis. We back-project the points \({\textbf{a}}_1\) and \({\textbf{a}}_2\) into the 3D space:
where the projection operator \(\Pi _{{\mathcal {S}}}: {\mathbb {R}}^2 \mapsto {\mathbb {R}}^3 \) provides the image ray and maps a point \(\textbf{x}\) from the normalized image plane to a unit vector in the 3D space as \(\Pi _{{\mathcal {S}}}(\textbf{x}) = \Pi _{{\mathcal {I}}}^{-1} (\textbf{x}) / \Vert \Pi _{{\mathcal {I}}}^{-1} (\textbf{x})\Vert \). Afterward, the angle between \({\textbf{q}}_{{\textbf{a}}_1}\) and \({\textbf{q}}_{{\textbf{a}}_2}\), which is \(2\alpha \), gives the opening angle of the cone, and the angle bisector of these vectors gives the axis direction as
Since \(\sin \alpha = r / \Vert \textbf{s} \Vert \) (see Fig. 5), variables \(\alpha \) and r give the distance between the camera focal point and the sphere center, and the center itself is obtained as follows:
\[
{\textbf{s}} = \frac{r}{\sin \alpha }\, {\textbf{w}}. \qquad (26)
\]
Based on \(\langle {\textbf{q}}_{{\textbf{a}}_1}, {\textbf{q}}_{{\textbf{a}}_2} \rangle = \cos 2 \alpha \), trigonometrical identities allow us to calculate the sphere center s without measuring angle \(\alpha \):
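A hedged Python/NumPy sketch of the Ell2Sphere steps follows; it computes the center via \(\cos 2\alpha \) as described above, and the function and variable names are ours.

```python
import numpy as np

def ell2sphere(theta, a, e0, r):
    """Sphere center from the parametric contour ellipse and the known radius r.

    Back-projects the major-axis end-points, takes their bisector as the cone axis,
    and scales by r / sin(alpha); the minor semi-axis b is not needed for the center.
    """
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    a1 = R @ np.array([a, 0.0]) + e0                     # e(0)
    a2 = R @ np.array([-a, 0.0]) + e0                    # e(pi)
    q1 = np.append(a1, 1.0); q1 /= np.linalg.norm(q1)    # back-projected unit rays
    q2 = np.append(a2, 1.0); q2 /= np.linalg.norm(q2)
    w = (q1 + q2) / np.linalg.norm(q1 + q2)              # angle bisector = cone axis
    cos_2alpha = np.dot(q1, q2)
    sin_alpha = np.sqrt(0.5 * (1.0 - cos_2alpha))        # avoids arccos, cf. (27)
    return (r / sin_alpha) * w                           # sphere center s
```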
3.3 Center from Contour Points
Ellipse contour points can define the sphere center straightforwardly, and the ellipse estimation can be omitted from the proposed method. This simplified algorithm, called Direct3pFit, is suitable for both minimal and overdetermined problems.
The method has three main steps: the estimation of \(\textbf{Q}\) containing the unit vectors, then the cone parameters, which yield the sphere center. It starts the same way as in Sect. 3.1 until (17) or (23), when the cone parameters are already estimated. Then the sphere center is expressed as in (26). The optimization step of (27) is modified, as \(\cos \alpha \) is estimated in (17) and (23) instead of \(\cos 2 \alpha \):
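A hedged Python/NumPy sketch of Direct3pFit, covering both the minimal and overdetermined (least squares) cases; the input points are assumed to be in normalized coordinates as in (2).

```python
import numpy as np

def direct3pfit(points_2d, r):
    """Sphere center directly from n >= 3 normalized contour points and the known radius r."""
    P = np.column_stack([points_2d, np.ones(len(points_2d))])   # back-project to rays
    Q = (P / np.linalg.norm(P, axis=1, keepdims=True)).T        # unit rays as columns
    v, *_ = np.linalg.lstsq(Q.T, np.ones(Q.shape[1]), rcond=None)  # cf. (17) and (23)
    w = v / np.linalg.norm(v)                                    # cone axis
    cos_alpha = 1.0 / np.linalg.norm(v)
    sin_alpha = np.sqrt(1.0 - cos_alpha**2)                      # avoids arccos, cf. (28)
    return (r / sin_alpha) * w                                   # sphere center s
```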
4 Algorithms
The actual implementations of the algorithms have a large effect on the speed. Therefore, we provide some implementation details and list the algorithmic components that facilitate fast performance. In this section, we list these optimization steps, justify the design choices and present the pseudocode of our methods: 3pFit, Ell2Sphere, and Direct3pFit are outlined in Algorithms 1, 2, and 3, respectively. The algorithms contain only high-level descriptions, and the operations on the left have to be performed.
3pFit and Ell2Sphere estimate the major axis end-points \(\textbf{a}_1\) and \(\textbf{a}_2\) to find the ellipse and cone parameters instead of measuring the minor axis end-points \(\textbf{b}_1\) and \(\textbf{b}_2\), although the problem could be formulated and solved in both ways. Figure 4 illustrates both cases. The minor axis end-points can be calculated as \({\textbf{b}}_1 = \Pi _{{\mathcal {I}}}\left( {\textbf{R}} \left( {\textbf{w}}, \pi /2 \right) \cdot {\textbf{q}}_{{\textbf{a}}_1} \right) \) and \({\textbf{b}}_2 = \Pi _{{\mathcal {I}}}\left( {\textbf{R}} \left( {\textbf{w}}, \pi /2 \right) ^{T} \cdot {\textbf{q}}_{{\textbf{a}}_1} \right) \) similar to (21). This means that finding the end-points regarding b instead of a would need extra rotation calculation and cause more numerical error.
In 3pFit and Direct3pFit, the reformulation of \({\textbf{q}}_{{\textbf{a}}_2}\) (20) comes from the properties of a Lie algebra. Accordingly, we can reuse the already calculated \({\textbf{R}} \left( {\textbf{n}}, \alpha \right) \) in (19) instead of estimating the new matrix which would be more time-consuming than the simple transposition. This simplification was executed also in the equation of \({\textbf{b}}_2\) in the last paragraph as \({\textbf{R}} \left( {\textbf{w}}, -\pi /2 \right) = {\textbf{R}} \left( {\textbf{w}}, \pi /2 \right) ^{T}\).
Finally, we advise not to measure \(\alpha \) in Ell2Sphere and Direct3pFit directly to calculate \(\textbf{s}\) in (26) because the \(\arccos \) operation inflicts rounding error in (25). The deductions with \(\cos 2\alpha \) in (27) and \(\cos \alpha \) in (28) are more accurate and rapid.
Stability analysis in Sect. 5.3 supports these steps. Furthermore, a real-world example in Table 7 shows that the accuracy holds if the sequential execution of 3pFit and Ell2Sphere is replaced by Direct3pFit, while the latter is less time-consuming.
Computational complexity. In Table 2, we compared the computational complexity of the three-point algorithms and summarized the principal differences to reveal the innovation of the proposed approach. IMM (Shi et al. 2019) computes on a double-sized coefficient matrix contrary to the proposed and LLSB (Sun et al. 2016) methods. Besides elementary computational steps, the primary matrix operations of the methods are normalization (NORM), computing the pseudoinverse matrix (PINV), the linear least squares method (LSQ), and singular value decomposition (SVD). In terms of complexity, all three methods guarantee linear runtime. Nevertheless, the expected time demand of IMM is higher due to the larger matrix size. Besides, the fourth column presents that the sphere estimation \({\textbf{s}}\) in IMM is not a direct method from point set \({\mathcal {P}}\) but contains the estimation of the ellipse as a sub-step. LLSB and the proposed Direct3pFit define the sphere position \({\textbf{s}}\) via the cone parameters more efficiently but with different approaches.
The last column shows whether the algorithm can produce the ellipse parameters of (4) or (12), which is critical in terms of application in a robust detector. The primary motivation of the paper is to find suitable methods for the calibration pipeline of Tóth et al. (2020), which estimates the ellipses using RANSAC. The distances between every point and the ellipse are evaluated within every iteration to find the largest consensus set. The distance function d of the later experiment in (29) computes the same value. This step requires the geometric parameters a, b, \({\textbf{e}}_0\), and \(\theta \), which are estimated directly only by the proposed 3pFit method. Hence, the output of IMM and LLSB would have to be converted from the \(A-F\) parameters in every iteration of RANSAC, causing extra computational time.
5 Results
To demonstrate the performance of the sphere and ellipse fitting methods, we simulated several test scenarios in MATLAB and Blender and performed real-world tests. The source code of the proposed methods is available on GitHub as part of a sensor calibration project: https://github.com/tothtekla/SphereCalib.
The next paragraphs outline the tested ellipse fitting methods.
LSQDir (Halir and Flusser 1998): This variant of the classical direct ellipse fitting algorithm of Fitzgibbon et al. (1996) provides better numerical stability. LSQDir finds the parameters of a general ellipse via eigenvector computation of a small matrix. Therefore it is very fast and non-iterative. The drawback of this method is that it is biased toward finding small ellipses.
HQ (Lu et al. 2020): This is a high-quality, four-stage ellipse detector. It detects arcs from Sobel or Canny edge detectors, and several properties of these arcs are exploited, e.g., the overall gradient distribution, arc-support directions, and polarity. This method is sensitive to the setting of its three parameters. Thus, they were tuned on each test sequence to achieve accurate results.
AAMB (Meng et al. 2020): After segmenting the edges into elliptic arcs, the so-called digraph-based arc adjacency matrix (AAM) is constructed. All arc combinations are then generated through bidirectionally searching the AAM for ellipse candidates. The next step is the estimation of the cumulative-factor-based cumulative matrices (CM). Finally, the ellipses are fitted to each candidate through the double eigen-decomposition of the CMs using the Jacobi method.
Some recent papers deal with sphere positioning and contain ellipse fitting too. The following paragraphs summarize these approaches and, complementing the complexity analysis in Sect. 4, contrast them with the innovation of our method.
LLSB (Sun et al. 2016): A linear least-squares-based sphere estimation method is presented in this paper, which requires high-quality ellipse contours. Its limitation is the white spherical target in front of a blackboard, as mentioned in the introduction. This method reconstructs sphere centers effectively using two constraints as well. However, the paper did not reveal how to interpret them independently. The approach detects the sphere center directly, while ellipse estimation is not exposed in detail, but the parameters can be expressed.
IMM (Shi et al. 2019): An implicit minimal method is proposed for sphere fitting, similar to our initial attempt described in Appendix B. The sphere center estimation is composed of a particular and a free solution in the null space, which are computed by SVD and Gauss–Jordan elimination. Both the projected ellipse and the sphere parameters can be computed directly, minimizing an algebraic error function.
CB (Hajder et al. 2020; Tóth et al. 2020): Our former approach deals with circle-based sphere estimation, which is especially suitable for narrow-angle lenses when the a/b ratio of the projected ellipse is close to one. In a RANSAC process, the minimal solver is a three-point circle estimator with a well-tuned error threshold. LSQDir is applied to find the ellipse parameters of the largest consensus set. The sphere center is obtained from the implicit ellipse parameters with Levenberg-Marquardt optimization.
In order to evaluate the performance of the algorithms, both minimal and non-minimal cases were tested with data contaminated by zero-mean Gaussian noise. We also varied the distance of the sphere centers from the focal point along both the vertical and optical axes. We modeled several situations both in MATLAB and in Blender. We assumed that the camera intrinsics were calibrated and the sphere radius was known.
Moreover, we tested real scenarios with two pre-calibrated cameras. We compared the algorithms by estimating the parameters of the observed sphere in both cameras. Since the cameras and spheres were static, the 3D translation between the sphere centers should coincide with the translation of the two cameras and, thus, can be used for measuring the error. Finally, a camera-LiDAR calibration application revealed the relevance of these algorithms.
5.1 Experimental Settings
The applied synthetic, photorealistic, and real camera intrinsics are listed in Table 3. These parameters were assumed to be known during the tests. The perspective camera model was applied in all cases. The digital camera was a Hikvision MV-CA020-20GC sensor with high-quality Fujinon SV-0614 H lenses. The radial distortion of the lenses was negligible.
The ground truth ellipse parameters were also required to compare the different estimators in the synthetic tests. They were derived from the sphere parameters, as discussed in Sect. 2.2. In the real-world tests (Sects. 5.5 and 5.6), 3pFit was executed in a RANSAC framework to find the ellipse in the image, and Direct3pFit was used to find the estimated sphere center based on the best consensus set.
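A hedged sketch of this RANSAC framework around the proposed solvers is given below; the iteration count, the inlier threshold, and the point_ellipse_distance helper are illustrative placeholders and not the paper's tuned values, while cone_from_rays, ellipse_from_cone, and direct3pfit refer to the sketches above.

```python
import numpy as np

def ransac_sphere(points_2d, r, iters=1000, inlier_thresh=0.002, seed=0):
    """Sketch: 3pFit as minimal solver in RANSAC, Direct3pFit refit on the best consensus set."""
    rng = np.random.default_rng(seed)

    def rays(pts):
        P = np.column_stack([pts, np.ones(len(pts))])
        return (P / np.linalg.norm(P, axis=1, keepdims=True)).T

    best_inliers = np.zeros(len(points_2d), dtype=bool)
    for _ in range(iters):
        sample = points_2d[rng.choice(len(points_2d), 3, replace=False)]
        theta, a, b, e0 = ellipse_from_cone(*cone_from_rays(rays(sample)))   # minimal 3pFit
        d = point_ellipse_distance(points_2d, theta, a, b, e0)               # hypothetical helper
        inliers = d < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return direct3pfit(points_2d[best_inliers], r)   # refit sphere on the largest consensus set
```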
5.2 Optimization of Direct3pFit
Levenberg-Marquardt minimization (LMM) can help in numerical optimization (Hartley and Zisserman 2004) when the data set is noisy. The LMM problem was formulated as
where \(\textbf{s}\) is initially estimated with the proposed Direct3pFit, the function d gives the Euclidean difference vector between a parametric ellipse and an inlier pointset, and \({\mathcal {P}}\) is the set of the 2D inlier points. The function d is a quadric; the sought root can be approximated numerically.
LMM found a local optimum in a few steps in many experiments. The most likely reason is that, although the noise occurs in the image plane, the cone axis is close to perpendicular to that plane even if the ellipse lies far from the principal point but is still visible, i.e., near a corner of the image (Hajder et al. 2020). Equations (17) and (23) minimize a geometric error. After that, the algorithm always takes exact steps, and no approximation is needed. As the geometric error is linear w.r.t. the inverse of the cosine of the angle, Direct3pFit obtains the (optimal) solution in one step without any iterations. The inverse of the cosine is the ratio of the sphere center-camera distance and the radius. The fitting error in the image space, i.e., the ellipse-point distance, can be minimized via numerical optimization. As the proposed method gives the initial value, which itself results from a geometric error-based minimization, the numerical optimization usually cannot improve the result. However, the optimization of (29) via LMM is preferred in degenerate cases.
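For completeness, a hedged sketch of this optional refinement step; sphere2ell (implementing the Sphere2Ell conversion (14)) and point_ellipse_distance (the geometric residual of (29)) are hypothetical helpers, and the initial value s0 is assumed to come from Direct3pFit.

```python
from scipy.optimize import least_squares

def refine_center_lmm(s0, points_2d, r):
    """Levenberg-Marquardt refinement of the sphere center, cf. (29)."""
    def residuals(s):
        theta, a, b, e0 = sphere2ell(s, r)                        # Sphere2Ell, cf. (14)
        return point_ellipse_distance(points_2d, theta, a, b, e0) # hypothetical helper

    return least_squares(residuals, s0, method='lm').x
```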
5.3 Synthesized Accuracy and Ill-Conditioning Tests
Synthetic tests implemented in MATLAB for competitive-quantitative evaluation demonstrate how the three-point algorithms behave in noisy and ill-conditioned cases. Camera parameters are listed in the first two rows of Table 3.
The effect of the inlier number. In the first experiment, presented in Fig. 6, the mean error of the ellipse and sphere center estimation was evaluated w.r.t. the number of points using 3pFit and Direct3pFit in case of no noise, 1 pixel, and 2 pixel white Gaussian noise in the image. The sphere parameters were \(\textbf{s} = \begin{bmatrix}-0.95&0.35&3.00\end{bmatrix}^{\text {T}}\) and \(r = 0.35\), and the tests were repeated 1000 times with different point sets containing 3–700 inliers. We observed that the sphere center error is about \(10^{1.5}\) times larger than the ellipse center error in this configuration. For overdetermined cases, 50 was a sufficient number of fitting points to get a millimeter-precise result with this setup during synthetic tests. Noise-free scenarios and at least 50 points gave a sub-millimeter result. For real-world scenarios, accurate inlier localization is a bottleneck of the sphere detection, as the algorithm in this setup cannot approach centimeter precision with 2 pixel noise even with about 500 points. Nonetheless, in general, more contour points gradually reduced the error rate.
Ill-conditioned tests. To support this observation and test the robustness, we generated ill-conditioned scenarios containing only adjacent inliers, i.e., an arc segment. This problem occurs when the ellipse is blurred and only a short contour segment can be detected. We compared the results to the two other sphere detectors, IMM (Shi et al. 2019) and LLSB (Sun et al. 2016), as presented in Figs. 7 and 8. The methods using 40 adjacent inliers from the total \({\sim }2000\) points gave unsatisfactory results: ellipses with large semi-axes, as the inliers nearly fit a line. 47, 48, and 51 inliers have a varying, unstable impact. From 59 to 101 inliers, the parameters converged to the optimal values and reached centimeter-precise results. Finally, the ellipses are ideal in case of 250 or more inliers. LLSB and the proposed algorithm outperform IMM in case of 70–100 inliers. The compared quantitative results and runtimes are listed in Table 4. The measurements, executed on an Intel(R) i7-8665U 1.90 GHz with 16 GB RAM, verified the accuracy of the methods, while the runtime of the proposed approach was significantly better: within a second for 1894 runs.
Stability analysis. More than 25000 noiseless test cases with 1000 inliers were generated with varying sphere parameters. The top image of Fig. 9 showcases the reliability of the proposed method and LLSB: the highest errors were around \(10^{-10}\) and \(10^{-8}\), respectively. However, the two peaks around \(10^{-7}\) and \(10^{-2}\) reveal the stability problems of IMM in the zoomed, bottom image. Additionally, we evaluated the effect of the proposed implementation improvements presented in Sect. 4. The optimization steps did not influence the results of the overdetermined cases. In contrast, the same test data set but with a minimal three-point fitting revealed that the algorithm is faster by \(3.71\%\) with the same accuracy. This property is especially beneficial when using robust detectors with a high outlier ratio.
5.4 Photorealistic Tests
Further tests based on photorealistic images were generated by Blender. The camera parameters, listed in the third row of Table 3, and the sphere radius \(r = 0.25\) m were fixed while the sphere location was changed in the scene. We described the sphere locations using the following terms.
The depth coordinates and a varying step set were defined as \(z_k = 0.75 + 0.05 \cdot k \) and \(step_k = 0.05 + 0.065 \cdot k \), where \(k \in \{0, \dots , 25 \}\). We defined a new set of x and y coordinates for every depth value \(z_k\) as \(x_i= step_k \cdot i \) and \(y_j = step_k \cdot j \), where \(i \in \{0, \dots , 9 \}\) and \(j \in \{0, \dots , 3 \}\). This process evaluated \(10 \times 4 \times 26 = 1040\) sphere positions in total. All sphere contours were continuous and fully visible in the input images. The image sequence on the left of Fig. 10 shows some generated input images indexed with [i, j, k]. As no noise was added to the images, the inlier set \({\mathcal {P}}\) was obtained directly by Canny edge detection.
The results of the test evaluation are listed in Table 5. The high-quality input images caused a sub-pixel error rate for every tested method. Our 3pFit method approximated the length of the major semi-axis the best and gave the most precise sphere localization. Although the estimates of b and \(\textbf{e}_0\) did not outperform some other methods, the difference is minimal, and the constraint-based approach generated precise sphere positions. The time demand of the proposed method is also notable compared to the others. Note that the implementations of AAMB and HQ were provided in advance; hence, their edge detection was different, which affected the runtime. The methods LSQDir, LLSB, and IMM got the same input point set after executing Canny edge detection. The Ell2Sphere algorithm helps to estimate the sphere positions for the comparison based on the general five-point methods HQ, AAMB, and LSQDir.
The middle image in Fig. 10 presents the median sphere center estimation error w.r.t. the distance from the optical axis projected into the image and measured in pixels, i.e., the distance between \(\textbf{p}\) and \(\textbf{e}_\textbf{s}\). This diagram shows that a larger distance, i.e., a higher a/b ratio, favors a more precise estimation in case of high picture quality, especially for the HQ method. One key component of the proposed method is to estimate the major ellipse axis end-points, and circle-like sphere projections do not favor this process. The optimal distance is around 300–600 pixels for LSQDir, LLSB, IMM, and the proposed method, followed by an increasing error. If the depth of the sphere is normal but the distance from the optical axis is relatively large, the ellipse approximates another conic section: the parabola (see Fig. 3). Hence, the problem is similar to the circle-like cases.
The depth test in the right image of Fig. 10 reports the same sphere center estimation error w.r.t. the z coordinate of the sphere. This setup had an optimum at \(z = 1.4\) m, but the difference from the higher error rates is only around 1 mm. On the one hand, a larger distance from the camera increases the rounding error as we convert the points between pixels and the metric system. On the other hand, the horizontal and vertical fields of view of the camera influence the visibility of the sphere in the image. A closer depth means a smaller range for sphere coordinates x and y, which causes unfavorable, circle-like ellipse parameters as in the previous test. Some examples are plotted on the left in Fig. 10, labeled as image[0, 0, 0], image[9, 0, 0], image[0, 3, 0], and image[9, 3, 0] among the input images. We think the optimum was the point where the spheres were relatively far from the camera and the optical axis, and the raster images caused minimal measurement error in the contour points. The sphere fitting was generally stable in the case of varying depth parameters. Note that the trend of AAMB is worse by around 1–2 mm in the depth and optical axis distance tests.
5.5 Real-World Stereo Camera Tests
During real-world scenarios, we used Hikvision MV-CA020-20GC cameras to measure the precision of the sphere center estimation (see the detailed camera parameters in Table 3). A sphere with 0.25 m radius was placed 68 times in the scene. Some image pairs with the fitted ellipses are pictured in Fig. 11. The sphere-camera positions were not measured directly, but a ground-truth (GT) transformation was available: the cameras were translated along axis x by 21 cm. We refined this prior knowledge of the extrinsic parameters using the checkerboard-based (Zhang 2000; Heikkila and Silven 1997) Stereo Camera Calibrator in the Matlab Computer Vision Toolbox. Therefore, the sphere center was estimated in the two camera coordinate systems based on the paired images, and the error was given as the translation error of the sphere centers.
For real images, edge detection and outlier filtering are required; accordingly, the methods were tested using RANSAC. The sub-steps are compared in Table 6. The proposed process finds the best ellipse contour point set with 3pFit, and Direct3pFit computes the sphere center in a single step. The general five-point methods HQ, AAMB, and LSQDir were supplemented with Ell2Sphere to facilitate the comparison after ellipse refitting. CB applies a circle-based minimal method and Levenberg-Marquardt optimization on the implicit ellipse parameters from LSQDir.
First, we examined the difference between our proposed 2-step sphere center estimation via ellipse fitting and the direct method that generates the sphere from the points. The results listed in Table 7 justify that the two approaches produced similar results, differing only due to rounding errors. Hence, applying Direct3pFit in a single operation during refitting is an appropriate choice.
The main results of the experiments over the concurrent methods are plotted in Fig. 12. The horizontal axis exhibits all of the test cases. In every column, the five dots are the errors measured on the same input using the different methods. The results are ordered based on the error values of the proposed method for easier visual comparison. The fewer dots below the dark blue points, the better the proposed fitting algorithm performs. The color of the lowest dots gives the ratio of the best estimations, listed in the upper left corner of the diagram. Missing dots in a column mean that the result is an outlier or the error rate is higher than 6 cm. The outlier ratios are also listed above; the most conspicuous issue was the high rate of LSQDir and the deviation of the error of CB. The other approaches had only negligible wrong results. Using the proposed algorithm, matching the sphere center estimations of the two cameras gave 1–6 cm error, while the average distance of the sphere and the cameras was around 1–2 m. In general, the error was not linear w.r.t. the distance of the sphere and the cameras, but close sphere images (as in the stereo image example in the fourth column of Fig. 16) definitely result in a smaller error. The positions of the purple points in some of the last test cases indicate that LSQDir produced the best, only 2 cm, error. In these instances, the sphere-camera distance was large (like in the fifth column of Fig. 16), meaning that LSQDir was robust despite the outlier rate. Our method outperformed the rival ones, giving the best results for \(69.70\%\) of the test cases.
Figure 13 represents the distribution of the same data series categorized into 0.5 cm long intervals in a histogram-like manner. The proposed method performed better within the 1.5–2 cm error range, while most cases were between 2 and 3 cm. It is remarkable that 9 test results of CB were below 1.5 cm error. In precision, the AAMB method lagged at the beginning. The drawback of CB is that \(41.86\%\) of the tests were above the 6 cm error and could not be listed in the histogram.
For a more profound analysis, Fig. 14 shows the sphere location error along the three main coordinate axes separately. During the estimation of the coordinates x and y, the spread was below 2 cm. The mean error was, however, not centered around zero. This may be caused by the light direction affecting the edge positions. The plot suggests that the detectable error comes from wrong estimation mainly along axis z, i.e., the methods are sensitive in terms of depth estimation. However, \(50\%\) of the estimations were still within the \(\pm 2\) cm error level. The advantage of the proposed method also appeared along axis z: the standard deviation was reduced from 6 to 4 cm compared to the other methods. It can be seen along every axis that these camera intrinsics negatively affect the circle-based results of CB, as they have the most significant spread in all three directions.
The cumulative distribution function (CDF) shown in Fig. 15 presents very similar results as a continuous function. The steepest parts of the functions are similarly around 1.5–3 cm. The mean error values of the methods are listed in the bottom right corner. In total, the proposed method precedes the others by 2–4 mm. Due to the high amount of outliers and less accurate estimations, LSQDir and CB do not reach the top value within 6 cm.
5.6 Real-World Application: Camera-LiDAR Calibration
In this part of the paper, we illustrate the utility of the proposed algorithms via a multi-camera and LiDAR calibration experiment for an autonomous driving system. In Tóth et al. (2020), we presented how spheres are suitable for LiDAR-camera calibration, which has three main steps: (i) pointcloud-based sphere center estimation, (ii) image-based sphere center estimation, and (iii) relative pose estimation by pointset registration. The theoretical contribution of this paper is applicable in the image-based sphere recognition, as 3pFit was executed in a RANSAC framework to find the ellipse in the image and Direct3pFit to find the estimated sphere center based on the best consensus set. This section aims to present the results of the improved calibration process.
A Velodyne VLP-16 LiDAR sensor and three Hikvision cameras (see the left picture in Fig. 17) were mounted on our test vehicle. Initially, the cameras under the LiDAR were oriented at about \(\pm 20^{\circ }\) to the forward direction, and the one on the top was looking ahead, but the displacement errors during the mounting process and the car movement motivated us to apply a precise calibration. The cameras and the LiDAR were hardware-triggered, so the images and the pointcloud were captured simultaneously during a single turnaround of the LiDAR. The calibration input was one continuous capture with 4 frames per second at half of the ordinary image resolution (see the Hikvision camera intrinsics in Table 3), while a person walked with a sphere back and forth in front of the sensor setup. The method calibrated every camera to the LiDAR based on 6, 9, and 11 image-pointcloud pairs. The number of test images and LiDAR scans depended on the visibility of the sphere from different points of view. Some recordings contained the sphere in two camera images; hence, only 19 LiDAR clouds were processed in total due to the overlap in the fields of view.
One image sequence of Camera\(\#2\) can be seen in Fig. 16. The mean error and standard deviation of this example, compared to the original SphereCalib (Tóth et al. 2020) method, are listed in Table 8. The novel methods improved the calibration significantly: the mean error was reduced from 5.48 to 1.25 cm with a sub-centimeter standard deviation. The 3D point registration gives an error within a few centimeters.
The calibration result is compared to the actual setup in Fig. 17. The origins and the coordinate axis directions represent a realistic sensor setup. Especially the rotation parameters are accurate, as the coloring shows in the further qualitative examples. This estimation is essential, as the displacement can be more easily guaranteed during mounting, but the rotation of the cameras is more error-prone than planned.
Figure 18 visualizes the estimated 6, 12, 10, and 19 sphere centers of Camera\(\#1\), Camera\(\#2\), Camera\(\#3\), and the LiDAR, respectively, after transforming them into the LiDAR coordinate system. This qualitative comparison confirms the results in Table 8, as the error can be observed in the point pair registration before applying 3pFit and Direct3pFit in the top image. After the improvements, the sphere centers nearly overlap in the bottom image, as the error rate is minimal.
In Fig. 19, the colorized and registered pointcloud sequence is a visual illustration of the reliability. As we had neither GPS nor IMU data, the iterative closest point (ICP, Besl and McKay (1992)) algorithm concatenated the LiDAR snapshots to map the environment and the movement of the vehicle. Several landmarks and objects are well-colored: the window sills, the basement pillars, the gutter, the road sign, and some shadows.
In conclusion, the calibration experiments revealed that spheres are practical for 2D–3D data registration. A further application of spheres is panoramic image stitching, as the projected ellipse centers can be matched between two cameras, which gives the homography between the images (Hartley and Zisserman 2004). We plan to expand the autonomous test vehicle with additional sensors and use sphere-based calibration for adequate reliability.
6 Conclusion
This paper proposed a simple analytical three-point algorithm for the parameter estimation of a sphere and its projected ellipse, which needs no numerical optimization. We demonstrated that the contour points of the sphere in the image form a special ellipse. Based on this particular setup, two geometric constraints have been applied here: the projectional constraint and the axial constraint. Experimental results show that the proposed ellipse fitting method outperforms the rival techniques in various environments. Our analysis also reveals that the introduced constraints and some simplification steps improve the precision of the proposed algorithm. Therefore, this method is even faster than the rival three-point approaches. Given these results, we believe that the proposed fitting algorithms provide accurate parameters to assist various algorithms in utilizing spheres.
References
Barath, D., & Matas, J. (2021). Graph-cut ransac: Local optimization on spatially coherent structures. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2021.3071812
Besl, P., & McKay, N. D. (1992). A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2), 239–256. https://doi.org/10.1109/34.121791
Cai, Z., & Vasconcelos, N. (2018). Cascade R-CNN: delving into high quality object detection. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 6154–6162, https://doi.org/10.1109/CVPR.2018.00644
Chum, O., & Matas, J. (2005). Matching with PROSAC—progressive sample consensus. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR 2005), 20-26 June 2005, San Diego, CA, USA, pp. 220–226, https://doi.org/10.1109/CVPR.2005.221
Dong, W., Roy, P., Peng, C., et al. (2021). Ellipse R-CNN: Learning to infer elliptical object from clustering and occlusion. IEEE Transactions on Image Processing, 30, 2193–2206. https://doi.org/10.1109/TIP.2021.3050673
Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395. https://doi.org/10.1145/358669.358692
Fitzgibbon, A., Pilu, M., & Fisher, R. (1996). Direct least squares fitting of ellipses. In: Proceedings of 13th international conference on pattern recognition (Vol. 1, pp. 253–257). https://doi.org/10.1109/ICPR.1996.546029
Fitzgibbon, A. W., Pilu, M., & Fisher, R. B. (1999). Direct least square fitting of ellipses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5), 476–480. https://doi.org/10.1109/34.765658
Glaeser, G., Stachel, H., & Odehnal, B. (2016). The Universe of Conics. Heidelberg: Springer Spektrum Berlin.
Guan, J., Deboeverie, F., Slembrouck, M., et al. (2015). Extrinsic calibration of camera networks using a sphere. Sensors, 15(8), 18985–19005. https://doi.org/10.3390/s150818985
Hajder, L., Tóth, T., & Pusztai, Z. (2020). Automatic estimation of sphere centers from images of calibrated cameras. In: Proceedings of the 15th international joint conference on computer vision, imaging and computer graphics theory and applications, VISIGRAPP 2020, Volume 4: VISAPP, Valletta, Malta, February 27-29, 2020. SCITEPRESS, pp. 490–497. https://doi.org/10.5220/0009164304900497
Halir, R., & Flusser, J. (1998). Numerically stable direct least squares fitting of ellipses. In: Proceedings 6th international conference in central Europe on computer graphics and visualization. WSCG
Hartley, R., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd edn). Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511811685
Heikkila, J., & Silven, O. (1997). A four-step camera calibration procedure with implicit image correction. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition, pp. 1106–1112. https://doi.org/10.1109/CVPR.1997.609468
Hoffmann, C. M. (1989). Geometric and Solid Modeling: An Introduction. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Hough, P. V. C. (1962). Method and means for recognizing complex patterns. US Patent 3,069,654.
Jabbar, A., Mendes, A., Chalup, S.K. (2019). Comparing ellipse detection and deep neural networks for the identification of drinking glasses in images. In: Proceedings of 12th international conference computer vision systems, ICVS 2019, Thessaloniki, Greece, September 23-25, 2019, pp. 319–329, https://doi.org/10.1007/978-3-030-34995-0_29
Klüng, A., & Meli, F. (2005). Self calibration method for 3d roundness of spheres using an ultraprecision coordinate measuring machine. In: Proceedings of the 5th EUSPEN international conference, pp 193–196.
Liu, Z., & Qiao, H. (2009). Multiple ellipses detection in noisy environments: A hierarchical approach. Pattern Recognition, 42(11), 2421–2433. https://doi.org/10.1016/j.patcog.2009.01.028
Liu, Z., Qiao, H., & Xu, L. (2007). Investigation on multisets mixture learning based object detection. International Journal of Pattern Recognition and Artificial Intelligence, 21(8), 1339–1351. https://doi.org/10.1142/S0218001407005971
Liu, Z., Wu, Q., Wu, S., et al. (2017). Flexible and accurate camera calibration using grid spherical images. Optics Express, 25(13), 15269–15285. https://doi.org/10.1364/OE.25.015269
Lu, C., Xia, S., Shao, M., et al. (2020). Arc-support line segments revisited: An efficient high-quality ellipse detection. IEEE Transactions on Image Processing, 29, 768–781. https://doi.org/10.1109/TIP.2019.2934352
Lu, Y., & Payandeh, S. (2010). On the sensitivity analysis of camera calibration from images of spheres. Computer Vision and Image Understanding, 114, 8–20. https://doi.org/10.1016/j.cviu.2009.09.001
Mai, F., Hung, Y. S., Zhong, H., et al. (2007). A hierarchical approach for fast and robust ellipse extraction. In: 2007 IEEE international conference on image processing, pp. V-345–V-348. https://doi.org/10.1109/ICIP.2007.4379836
Matsuoka, R., & Maruyama, S. (2016). Eccentricity on an image caused by projection of a circle and a sphere. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, III-5, 19–26. https://doi.org/10.5194/isprs-annals-III-5-19-2016
Meng, C., Li, Z., Bai, X., et al. (2020). Arc adjacency matrix-based fast ellipse detection. IEEE Transactions on Image Processing, 29, 4406–4420. https://doi.org/10.1109/TIP.2020.2967601
Nair, P. S., & Saunders, A. T., Jr. (1996). Hough transform based ellipse detection algorithm. Pattern Recognition Letters,17(7), 777–784. https://doi.org/10.1016/0167-8655(96)00014-1
Prasad, D. K., Leung, M. K. H., & Quek, C. (2013). Ellifit: An unconstrained, non-iterative, least squares based geometric ellipse fitting method. Pattern Recognition, 46(5), 1449–1465. https://doi.org/10.1016/j.patcog.2012.11.007
Rosin, P. L. (1993). A note on the least squares fitting of ellipses. Pattern Recognition Letters, 14(10), 799–808. https://doi.org/10.1016/0167-8655(93)90062-I
Ruijl, T., Franse, J., & van Eijk, J. (2001). Ultra precision cmm aiming for the ultimate concept. In: Proceedings of the 2nd EUSPEN international conference, pp 234–237, https://doi.org/10.1051/ijmqe/2014009
Shi, K., Li, X., Xu, H., et al. (2019). Sphere localization from a minimal number of points in a single image. In: Proceedings of the international conference on advances in computer technology, information science and communications–Volume 1: CTISC, INSTICC. SciTePress, pp. 65–70. https://doi.org/10.5220/0008096300650070
Sun, J., He, H., & Zeng, D. (2016). Global calibration of multiple cameras based on sphere targets. Sensors, 16(1), 77. https://doi.org/10.3390/s16010077
Tóth, T., Pusztai, Z., & Hajder, L. (2020). Automatic lidar-camera calibration of extrinsic parameters using a spherical target. In: 2020 IEEE international conference on robotics and automation, ICRA 2020, Paris, France, May 31–August 31, 2020. IEEE, pp 8580–8586, https://doi.org/10.1109/ICRA40945.2020.9197316
Xie, Y., & Ji, Q. (2002). A new efficient ellipse detection method. In: 16th international conference on pattern recognition (Vol. 2, pp. 957–960). https://doi.org/10.1109/ICPR.2002.1048464
Xiong, F., et al. (2021). Lens-Free optical detection of thermal motion of a submillimeter sphere diamagnetically levitated in high vacuum. Physical Review Applied. https://doi.org/10.1103/physrevapplied.16.l011003
Zhang, Z. (1997). Parameter estimation techniques: A tutorial with application to conic fitting. Image and Vision Computing, 15(1), 59–76. https://doi.org/10.1016/S0262-8856(96)01112-2
Zhang, Z. (2000). A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330–1334. https://doi.org/10.1109/34.888718
Zuliani, M., Kenney, C.S., & Manjunath, B.S. (2005). The multiRANSAC algorithm and its application to detect planar homographies. In: Proceedings of the 2005 international conference on image processing, ICIP 2005, Genoa, Italy, September 11-14, 2005, pp 153–156, https://doi.org/10.1109/ICIP.2005.1530351
Acknowledgements
EFOP-3.6.3-VEKOP-16-2017-00001: Talent Management in Autonomous Vehicle Control Technologies. The project is supported by the Hungarian Government and co-financed by the European Social Fund. Supported by the ÚNKP-21-3 New National Excellence Program of the Ministry for Innovation and Technology from the source of the National Research, Development and Innovation Fund. L. Hajder thanks the support of the “Application Domain Specific Highly Reliable IT Solutions” project that has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the Thematic Excellence Programme TKP2020-NKA-06 (National Challenges Subprogramme) funding scheme.
Funding
Open access funding provided by Eötvös Loránd University.
Additional information
Communicated by Federica Arrigoni.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Perspective sphere projection
The implicit form of a sphere can be written as
\(\left( {\textbf{x}}-{\textbf{s}}\right) ^{T}\left( {\textbf{x}}-{\textbf{s}}\right) -r^{2}=0, \qquad \text {(A.1)}\)
where \(\textbf{s}=\begin{bmatrix} x_{0}&y_{0}&z_{0}\end{bmatrix}^{T}\) and \(r\) are the sphere center and radius, respectively. A spatial point \({\textbf{x}} \in {\mathbb {R}}^3\) is on the surface of the sphere if (A.1) holds.
The sphere is a special kind of quadric. The implicit equation of a general quadric is
\(\begin{bmatrix} {\textbf{x}}^{T}&1 \end{bmatrix} {\textbf{M}} \begin{bmatrix} {\textbf{x}} \\ 1 \end{bmatrix} = 0, \qquad \text {(A.2)}\)
where \({\textbf{M}}= {\textbf{M}}^{T}\) is a \(4 \times 4 \) symmetric matrix. With the block structure \({\textbf{M}}=\begin{bmatrix} \tilde{{\textbf{M}}} & {\textbf{b}} \\ {\textbf{b}}^{T} & \nu \end{bmatrix}\), it can be rewritten as
\({\textbf{x}}^{T} \tilde{{\textbf{M}}} {\textbf{x}} + 2 {\textbf{b}}^{T} {\textbf{x}} + \nu = 0, \qquad \text {(A.3)}\)
where the \(3\times 3\) matrix \(\tilde{{\textbf{M}}}\) is also symmetric.
For a sphere, the matrix \({\textbf{M}}\), representing the quadric, can be written as
\({\textbf{M}} = \begin{bmatrix} {\textbf{I}} & -{\textbf{s}} \\ -{\textbf{s}}^{T} & {\textbf{s}}^{T}{\textbf{s}} - r^{2} \end{bmatrix}. \qquad \text {(A.4)}\)
Thus, for a sphere, \(\tilde{{\textbf{M}}}= {\textbf{I}}\), \({\textbf{b}}=- {\textbf{s}}\), and \(\nu ={\textbf{s}}^{T} {\textbf{s}} - r^{2}\).
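A minimal numerical sketch of this block structure, assuming NumPy and arbitrary example values of our own choosing, assembles \({\textbf{M}}\) for a sphere and checks that a surface point satisfies (A.2):

import numpy as np

# Example sphere: center s and radius r (illustrative values only).
s = np.array([0.5, -0.2, 3.0])
r = 0.4

# Quadric blocks of a sphere: M_tilde = I, b = -s, nu = s^T s - r^2.
M = np.block([[np.eye(3),        -s.reshape(3, 1)],
              [-s.reshape(1, 3), np.array([[s @ s - r**2]])]])

# A point on the sphere surface and its homogeneous coordinates.
x = s + r * np.array([0.0, 1.0, 0.0])
xh = np.append(x, 1.0)

print(xh @ M @ xh)   # ~0, i.e. the quadric equation (A.2) holds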
Contour of a projected quadric. For a pinhole camera, the camera intrinsic parameters are frequently represented by the so-called camera matrix \({\textbf{K}}\). The projection of a point is given as follows: \( \lambda \begin{bmatrix}u&v&1\end{bmatrix}^{\text {T}}={\textbf{K}} {\textbf{x}}\), where \(\lambda \) is the so-called projective depth and \({\textbf{u}}=\begin{bmatrix}u&v&1\end{bmatrix}^{\text {T}}\) is the homogeneous representation of the image point. Therefore, a spatial point \({\textbf{x}}\) on the reprojected ray of image point \({\textbf{u}}\) can be obtained as \({\textbf{x}}=\lambda {\textbf{K}}^{-1} {\textbf{u}}\).
If the reprojected ray is substituted into (A.3), we get the following implicit equation in \(\lambda \): \(\lambda ^{2}{\textbf{u}}^{T}{\textbf{K}}^{-T} \tilde{{\textbf{M}}} {\textbf{K}}^{-1}{\textbf{u}}+2\lambda {\textbf{b}}^{T} {\textbf{K}}^{-1} {\textbf{u}}+\nu =0\). This is a quadratic equation in \(\lambda \); therefore, a ray and a non-degenerate quadric have at most two intersection points. Contour points correspond to a single (tangential) intersection, in which case the discriminant is zero: \(\left( 2{\textbf{b}}^{T}{\textbf{K}}^{-1}\textbf{u}\right) ^{2}-4\left( {\textbf{u}}^{T}{\textbf{K}}^{-T}\tilde{{\textbf{M}}} {\textbf{K}}^{-1}{\textbf{u}}\right) \nu =0\). Therefore,
\(\left( {\textbf{b}}^{T}{\textbf{K}}^{-1}{\textbf{u}}\right) ^{2}=\nu \,{\textbf{u}}^{T}{\textbf{K}}^{-T}\tilde{{\textbf{M}}}{\textbf{K}}^{-1}{\textbf{u}}. \qquad \text {(A.5)}\)
This equation represents a conic in the image.
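One convenient rearrangement of (A.5), which is our own illustration rather than part of the original derivation, collects the condition into a single conic matrix \({\textbf{C}} = {\textbf{K}}^{-T}\left( {\textbf{b}}{\textbf{b}}^{T}-\nu \tilde{{\textbf{M}}}\right) {\textbf{K}}^{-1}\), so that contour points satisfy \({\textbf{u}}^{T}{\textbf{C}}{\textbf{u}}=0\). A short sketch, assuming NumPy, is:

import numpy as np

def projected_conic(M_tilde, b, nu, K):
    """Conic matrix C of the projected quadric contour: u^T C u = 0.

    Rearranges (A.5): (b^T K^{-1} u)^2 = nu * u^T K^{-T} M_tilde K^{-1} u.
    """
    K_inv = np.linalg.inv(K)
    return K_inv.T @ (np.outer(b, b) - nu * M_tilde) @ K_inv

# Example: sphere quadric blocks with an identity camera matrix.
s, r = np.array([0.5, -0.2, 3.0]), 0.4
C = projected_conic(np.eye(3), -s, s @ s - r**2, np.eye(3))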
Special case: projection of spheres. For spheres, \({\textbf{b}}=-{\textbf{s}}\), \(\nu ={\textbf{s}}^{T}{\textbf{s}}-r^{2}\), and \(\tilde{{\textbf{M}}}={\textbf{I}}\). Moreover, if the cameras are calibrated as written in (2), then \({\textbf{K}}= {\textbf{I}}\). Therefore, (A.5) can be modified as
\(\left( {\textbf{s}}^{T}{\textbf{u}}\right) ^{2}=\left( {\textbf{s}}^{T}{\textbf{s}}-r^{2}\right) {\textbf{u}}^{T}{\textbf{u}}. \qquad \text {(A.6)}\)
If the coordinates within the vectors are considered, the formula becomes \((ux_{0}+vy_{0}+z_{0})^{2}= (x_{0}^{2}+y_{0}^{2}+z_{0}^{2}-r^{2}) (u^{2}+v^{2}+1)\). One can rewrite it as
\(\left( r^{2}-y_{0}^{2}-z_{0}^{2}\right) u^{2}+2x_{0}y_{0}\,uv+\left( r^{2}-x_{0}^{2}-z_{0}^{2}\right) v^{2}+2x_{0}z_{0}\,u+2y_{0}z_{0}\,v+\left( r^{2}-x_{0}^{2}-y_{0}^{2}\right) =0.\)
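As a sanity check on this formula (a sketch under our own assumptions, not taken from the paper), consider a sphere centered on the optical axis at distance \(d\): the tangent rays make an angle \(\alpha \) with the axis with \(\sin \alpha = r/d\), so the contour is a circle of radius \(\tan \alpha = r/\sqrt{d^{2}-r^{2}}\) in normalized coordinates, and such points indeed satisfy the equation:

import numpy as np

# Sphere on the optical axis: center (0, 0, d), radius r; calibrated camera (K = I).
d, r = 5.0, 1.0
x0, y0, z0 = 0.0, 0.0, d

# Tangent geometry: sin(alpha) = r / d, so the contour circle has radius tan(alpha).
rho = r / np.sqrt(d**2 - r**2)
u, v = rho * np.cos(0.7), rho * np.sin(0.7)   # one point on the contour circle

lhs = (u * x0 + v * y0 + z0) ** 2
rhs = (x0**2 + y0**2 + z0**2 - r**2) * (u**2 + v**2 + 1.0)
print(abs(lhs - rhs))   # ~0 up to floating-point rounding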
Appendix B: Implicit approach with SVD
The equation system using three independent ellipse contour points \(\textbf{p}_i = \begin{bmatrix}{\hat{u}}_i&{\hat{v}}_i&1\end{bmatrix}^{\text {T}}\), \(i \in \{1,2,3\}\), can be written in matrix form, based on (4), as follows:
\(\begin{bmatrix} {\hat{u}}_{1}^{2} & {\hat{u}}_{1}{\hat{v}}_{1} & {\hat{v}}_{1}^{2} & {\hat{u}}_{1} & {\hat{v}}_{1} & 1 \\ {\hat{u}}_{2}^{2} & {\hat{u}}_{2}{\hat{v}}_{2} & {\hat{v}}_{2}^{2} & {\hat{u}}_{2} & {\hat{v}}_{2} & 1 \\ {\hat{u}}_{3}^{2} & {\hat{u}}_{3}{\hat{v}}_{3} & {\hat{v}}_{3}^{2} & {\hat{u}}_{3} & {\hat{v}}_{3} & 1 \end{bmatrix} \varvec{\Phi } = {\textbf{0}}.\)
A more compact form is \({\textbf{M}} \varvec{\Phi } = {\textbf{0}}\), where \({\textbf{M}} \in {\mathbb {R}}^{3 \times 6} \) is the matrix containing the polynomial functions of the coordinates and \(\varvec{\Phi }\) is the column vector containing the six ellipse coefficients. Applying Singular Value Decomposition (SVD), \({\textbf{M}} = {\textbf{U}}\varvec{\Sigma } {\textbf{V}}^{\text {T}}\), yields the basis vectors of the null space of \({\textbf{M}}\). The last three column vectors of \({\textbf{V}}\) give the three right null vectors \(\mathbf {v_4}\), \(\mathbf {v_5}\), and \(\mathbf {v_6}\) of \({\textbf{M}}\). Their linear combination gives the solution for the ellipse parameters:
\(\varvec{\Phi } = \mathbf {v_4} + \gamma \,\mathbf {v_5} + \delta \,\mathbf {v_6}.\)
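A minimal sketch of this null-space construction, assuming NumPy, the monomial ordering \(u^{2}, uv, v^{2}, u, v, 1\), and example coordinates of our own choosing, is:

import numpy as np

# Three ellipse contour points (u_i, v_i) in normalized image coordinates (example values).
pts = [(0.12, 0.05), (-0.07, 0.11), (0.02, -0.09)]

# 3x6 coefficient matrix M: one row [u^2, u*v, v^2, u, v, 1] per contour point.
M = np.array([[u*u, u*v, v*v, u, v, 1.0] for (u, v) in pts])

# SVD; the last three right singular vectors span the null space of M.
_, _, Vt = np.linalg.svd(M)
v4, v5, v6 = Vt[3], Vt[4], Vt[5]

# Any linear combination lies in the null space, i.e. defines a conic through the points.
Phi = v4 + 0.3 * v5 - 1.2 * v6
print(np.allclose(M @ Phi, 0.0))   # True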
The weights \(\gamma ,\delta \in {\mathbb {R}}\) of the linear combination should be computed considering the constraints discussed in (10). Substituting the expressions into \({\mathcal {C}}_1\) and \({\mathcal {C}}_2\) leads to a system of third-degree polynomial equations in these two variables.
Unfortunately, a closed-form expression for \(\gamma \) and \(\delta \) in terms of the null-space vector elements is not attainable, because there is no general solution of the following equation system:
\(g_{1}\gamma ^{3}+g_{2}\gamma ^{2}\delta +g_{3}\gamma \delta ^{2}+g_{4}\delta ^{3}+g_{5}\gamma ^{2}+g_{6}\gamma \delta +g_{7}\delta ^{2}+g_{8}\gamma +g_{9}\delta +g_{10}=0,\)
\(h_{1}\gamma ^{3}+h_{2}\gamma ^{2}\delta +h_{3}\gamma \delta ^{2}+h_{4}\delta ^{3}+h_{5}\gamma ^{2}+h_{6}\gamma \delta +h_{7}\delta ^{2}+h_{8}\gamma +h_{9}\delta +h_{10}=0,\)
where every coefficient \(g_i\) of \(\textbf{g}\) and \(h_i\) of \(\textbf{h}\) is a third-degree polynomial in some of the \(v_{jk}\) values, where \(v_{jk}\) denotes the \(k\)th element of \(\mathbf {v_j}\), \(i \in \{1, \dots , 10\}\), \(j \in \{4,5,6 \}\), \(k \in \{1, \dots , 6\}\).
To the best of our knowledge, these coefficients cannot be factorized and do not possess any special property that would yield an explicit solution for \(\gamma \) and \(\delta \). Hence, we applied the built-in symbolic solver of MATLAB to the equations containing the concrete values of the vectors \(\mathbf {v_4}\), \(\mathbf {v_5}\), and \(\mathbf {v_6}\), but it is time-consuming, which makes this approach not recommended. Shi et al. (2019) tried to solve the problem in a similar implicit way, but with different techniques, and also found only a solution with multiple roots.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tóth, T., Hajder, L. A Minimal Solution for Image-Based Sphere Estimation. Int J Comput Vis 131, 1428–1447 (2023). https://doi.org/10.1007/s11263-023-01766-1