# 2.5D Vision-Based Estimation

**DOI:** https://doi.org/10.1007/978-1-4471-5102-9_100148-1


## Abstract

2.5D vision-based techniques, also known as hybrid vision-based techniques, provide flexible ways to estimate the range or velocity of moving objects. In these techniques, information from both the image space (2D) and the Cartesian space (3D) is simultaneously utilized to construct the system state, which overcomes the disadvantages of the traditional visual servoing schemes. They have been widely adopted in motion from structure, structure from motion, and structure and motion problems.

## Keywords

2.5D vision; Hybrid visual servoing; Vision-based estimation; Motion from structure; Structure from motion; Structure and motion

## Introduction

As a class of powerful tools widely utilized in robotics, the 2.5D vision-based techniques arose from visual servoing at the end of the twentieth century (Malis and Chaumette 1999). Visual servoing aims at increasing the flexibility, accuracy, and robustness of a closed-loop robotic system using real-time visual feedback (Hutchinson et al. 1996; Chaumette and Hutchinson 2006; Janabi-Sharifi et al. 2011). Visual servo algorithms differ mainly in how the system state is constructed. There are two traditional schemes: image-based visual servoing (IBVS) and position-based visual servoing (PBVS). IBVS utilizes the image space features to construct the closed-loop error system, and PBVS employs the Cartesian space features. However, both schemes have intrinsic drawbacks: IBVS may suffer from interaction-matrix singularities and local minima and offers no direct control over the Cartesian trajectory, while PBVS requires 3D model or pose information and cannot guarantee that the object remains in the image.

Originally, the 2.5D vision-based techniques were proposed to improve the performance of the traditional IBVS and PBVS schemes. The basic characteristic of 2.5D vision-based control is that information from both the image space (2D) and the Cartesian space (3D) is simultaneously utilized to construct the states; therefore, it is also referred to as hybrid servoing. The 2.5D vision-based control can overcome many shortcomings of IBVS and PBVS because it provides flexible ways to manipulate the translational motion and the rotational motion individually, which greatly facilitates the control development. The classic regulation task of a robotic arm with six degrees of freedom (DoFs) (Malis and Chaumette 1999, 2000) provides a great tutorial for understanding 2.5D vision-based control.

Due to the effectiveness and advantages of the 2.5D vision-based control, the 2.5D vision-based techniques are also widely adopted in another essential problem: vision-based estimation, which aims at identifying unknown key information of an object using visual sensors and has found many applications. In the vision-based estimation problem, the measurable variables are utilized as feedback to close the loop, and the 2.5D vision-based techniques shape the construction of the states, just as in the control problem. According to the design goal, estimation problems can be roughly divided into motion from structure (MfS) problems, structure from motion (SfM) problems, and structure and motion (SaM) problems.

## 2.5D Vision-Based Estimation

Generally, a vision-based estimation system can be described as

$$ \dot{s}(t) = f_s\big(s(t), \mathbf{v}(t), \phi\big), \qquad y_m(t) = g\big(s(t)\big), \tag{1} $$

where *s*(*t*) is the system state, **v**(*t*) is the camera velocity, *ϕ* comprises some parameters, and *f*_{s}(⋅) is a function which describes how *s*(*t*), **v**(*t*), and *ϕ* determine the evolution of the system state. In (1), *y*_{m}(*t*) is the measurable output, and *g*(⋅) describes the relationship between *y*_{m}(*t*) and *s*(*t*). In terms of 2.5D vision-based estimation, the system state is generally constructed in the following form:

$$ s = \begin{pmatrix} s_t \\ s_r \end{pmatrix}, \qquad s_t = \begin{pmatrix} p \\ \alpha \end{pmatrix}, \tag{2} $$

where the translational part *s*_{t}(*t*) consists of \( p(t) \in \mathbb {R}^2 \) (the image coordinates of the feature point) and \( \alpha (t) \in \mathbb {R} \) (a factor related to the depth of the feature point), and *s*_{r}(*t*) encodes the rotational information (e.g., an angle-axis parameterization). It is clear that *p*(*t*) comes from the image space (2D), while *α*(*t*) and *s*_{r}(*t*) come from the Cartesian space (3D).
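As a minimal numerical sketch of this state construction (the single feature point, the depth ratio assumed to come from some homography routine, and the angle-axis choice for the rotational part are illustrative assumptions, not prescribed by the entry):

```python
import numpy as np

def hybrid_state(p, depth_ratio, u_theta):
    """Assemble a 2.5D (hybrid) state vector as in (2).

    p           : (2,) image coordinates of the feature point (2D part)
    depth_ratio : Z_d / Z, e.g., computed from a homography decomposition
    u_theta     : (3,) rotation between current and reference views,
                  here assumed to be an angle-axis vector (3D part)
    """
    alpha = np.log(depth_ratio)  # one common choice: alpha = ln(Z_d / Z)
    s_t = np.concatenate([np.asarray(p, float), [alpha]])  # translational part
    s_r = np.asarray(u_theta, float)                       # rotational part
    return np.concatenate([s_t, s_r])

s = hybrid_state(p=[0.12, -0.05], depth_ratio=1.25, u_theta=[0.0, 0.1, 0.0])
# s[:2] comes from the image space (2D); s[2:] from the Cartesian space (3D)
```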

### Motion from Structure

In the MfS problem, *s*(*t*) is measurable (*y*_{m}(*t*) = *s*(*t*)), while **v**(*t*) remains to be determined. Given that the dynamics *f*_{s}(⋅) is known, the objective can be achieved by estimating \( \dot {s}(t) \) with an observer of the form

$$ \dot{\hat{s}}(t) = h\big(\hat{s}(t), y_m(t)\big). \tag{3} $$

In (3), an appropriately designed observer *h*(⋅) can guarantee that \( \hat {s}(t) \rightarrow s(t) \) and \( \dot {\hat {s}}(t) \rightarrow \dot {s}(t) \). Then, the velocity **v**(*t*) can be calculated from \( \dot {\hat {s}}(t) \) utilizing the known *f*_{s}(⋅).

The two critical requirements mentioned above (*f*_{s}(⋅) is known and *s* is measurable) can be readily satisfied by using the 2.5D vision-based techniques with the aid of the homography. For example, Chitrakaran et al. (2005) utilize this strategy to asymptotically identify the six-DoF velocity of a moving object using a single fixed camera. A reference state *s*_{d} is introduced, and an error signal is constructed as *e*(*t*) = *s*(*t*) − *s*_{d}. Then, a computable interaction matrix is derived which relates the variation of the error \( \dot {e} \) to the velocity **v**. Next, a nonlinear continuous estimator is designed to identify \( \dot {e} \) asymptotically, which is further utilized to determine the velocity, provided that a single geometric length between two feature points and the rotation between the camera and the reference image are known.
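The MfS idea can be illustrated on a deliberately simplified toy problem (this is a sketch under assumed scalar dynamics and illustrative gains, not the nonlinear estimator of Chitrakaran et al. (2005)): with measurable state *s* evolving as \( \dot{s} = v \) for an unknown constant velocity *v*, a Luenberger-type observer estimates \( \dot{s} \), from which *v* is read off directly.

```python
# Toy MfS example: scalar dynamics s_dot = v with unknown constant velocity v.
# A Luenberger-type observer estimates both s and its derivative; since f_s is
# trivial here, the derivative estimate is itself the velocity estimate.
dt, T = 1e-3, 5.0
v_true = 0.7                # unknown object velocity to be identified
k1, k2 = 20.0, 100.0        # observer gains (error poles at -10, -10)

s = 0.0                     # true (measured) state
s_hat, v_hat = 0.0, 0.0     # observer states
for _ in range(int(T / dt)):
    e = s - s_hat           # output estimation error drives both updates
    s_hat += dt * (v_hat + k1 * e)
    v_hat += dt * (k2 * e)
    s += dt * v_true        # true state evolves

print(abs(v_hat - v_true))  # small after the observer converges
```

The observer never differentiates the measurement numerically; the derivative information is recovered through the correction terms, which is also the spirit of the continuous estimators used in the MfS literature.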

### Structure from Motion

In contrast to the MfS problem, the SfM problem utilizes the known relative motion between the camera and the object to reconstruct the object geometry. In other words, **v**(*t*) is known, while the unknown structure information may exist in *s*(*t*) (when *s*(*t*) is partially unmeasurable) or in *ϕ* (when *s*(*t*) is measurable). Let the coordinates of a feature point expressed in the camera frame be denoted as *P* = (*X*, *Y*, *Z*)^{T}, where *Z* indicates the depth.

When the unknown structure information exists in *s*(*t*), the available information comes from both **v**(*t*) and *y*_{m}(*t*). This happens when the factor *α*(*t*) takes the absolute depth (*α* = *Z* or \( \alpha = \ln (Z) \)), which cannot be measured directly. The observer is then driven by both the output and the known velocity, i.e., \( \dot{\hat{s}}(t) = h(\hat{s}(t), y_m(t), \mathbf{v}(t)) \); in some designs, an auxiliary variable *ζ*(*t*) is introduced to formulate this strategy.

When the unknown structure information exists in *ϕ*, *s*(*t*) is available (*y*_{m}(*t*) = *s*(*t*)), and the relationship between *s*(*t*) and *ϕ* should be figured out to estimate *ϕ*. This situation occurs when *α*(*t*) takes the ratio of depths (\( \alpha = \frac {Z_d}{Z} \) or \( \alpha = \ln \left ( \frac {Z_d}{Z} \right ) \)) because this ratio can be directly calculated by means of homography techniques. Under this circumstance, both *s*(*t*) and **v**(*t*) can be utilized to construct the observer for *ϕ*.
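A toy SfM sketch under assumed conditions (a static point, a camera translating along its x-axis with known constant speed, constant depth, and illustrative gains; the gradient-type adaptation law is one standard choice, not the specific designs cited in this entry): with normalized image coordinate *x* = *X*/*Z*, the image dynamics are \( \dot{x} = -v_x / Z \), and an observer can estimate the inverse depth *χ* = 1/*Z* from the known motion.

```python
# Toy SfM example: a static point viewed by a camera translating along x with
# known speed vx. With normalized coordinate x = X/Z and constant depth Z, the
# image dynamics are x_dot = -vx / Z. A gradient-type adaptive observer
# estimates chi = 1/Z (the "absolute depth" case, alpha related to Z).
dt, T = 1e-3, 10.0
Z_true, vx = 5.0, 1.0       # unknown depth; known camera velocity
k, gamma = 5.0, 10.0        # observer gain and adaptation gain

x = 0.1                     # measured normalized image coordinate
x_hat, chi_hat = 0.1, 1.0   # observer state and inverse-depth estimate
for _ in range(int(T / dt)):
    e = x - x_hat           # output error drives both updates
    phi = -vx               # known regressor multiplying the unknown chi
    x_hat += dt * (phi * chi_hat + k * e)
    chi_hat += dt * (gamma * phi * e)
    x += dt * (-vx / Z_true)  # true image dynamics

Z_hat = 1.0 / chi_hat
print(abs(Z_hat - Z_true))  # small: depth recovered from the known motion
```

Convergence of the depth estimate relies on the known velocity being persistently exciting (here, a nonzero constant *v*_{x}); with zero camera motion, the depth would be unobservable.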

### Structure and Motion

The SaM problem focuses on identifying both the Cartesian coordinates of the feature points and the relative motion information (Dani et al. 2012). In SaM problems, both *s*(*t*) and **v**(*t*) are (partially) unmeasurable and to be determined, which leads to two kinds of methods.

In one kind of method, an additional set of dynamics (e.g., from a second camera or a static object) is introduced. When the velocity **v**_{1}(*t*) and the parameters *ϕ* are unknown, all other available variables can be adopted in observers to determine **v**_{1}(*t*) and *ϕ*. Although the two dynamics are different, there is common information shared between them which facilitates the observer development (Chwa et al. 2015; Chen et al. 2018).

The other kind of method first estimates **v**_{1}(*t*) up to a scale resulting from the unknown absolute range. As the range is contained in *ϕ*, the estimate \( \hat {\phi } \) will further benefit the estimation of **v**_{1}(*t*).
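The scale ambiguity behind the "up to a scale" statement can be made concrete with a tiny hypothetical example (using the same simplified translating-camera image dynamics as above): scaling both the velocity and the range by the same factor leaves the image measurements unchanged, so they cannot be distinguished from the images alone.

```python
# Illustration of the SaM scale ambiguity: (vx, Z) and (c*vx, c*Z) produce
# identical image evidence, so velocity is recoverable only up to the
# unknown range scale until the range itself is estimated.
def image_velocity(vx, Z):
    """Normalized-image velocity of a static point for a camera
    translating along x: x_dot = -vx / Z."""
    return -vx / Z

flow_a = image_velocity(vx=0.5, Z=2.0)
flow_b = image_velocity(vx=1.5, Z=6.0)  # same motion scaled by c = 3
print(flow_a == flow_b)                 # the two cases are indistinguishable
```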

## Future Directions for Research

The existing applications of the 2.5D vision-based techniques have largely been confined to six-DoF robotic arms. Much effort has been devoted to vision-based control and estimation on wheeled mobile robots (WMRs); however, the existing works are mainly image-based (Mariottini et al. 2007) or position-based (Zhang et al. 2018). Applying the 2.5D vision-based techniques to WMRs still faces many challenges due to the nonholonomic constraints.

Many works mention that the 2.5D-based scheme increases the likelihood that the object will stay in the camera field of view (FoV) (Malis and Chaumette 1999; Chen et al. 2005), but few of them provide a rigorous theoretical analysis. Chen et al. (2007) adopt an image-space navigation function together with the 2.5D-based scheme to generate a Cartesian space trajectory which ensures that all feature points remain visible. However, tracking errors may still cause the feature points to leave the FoV. Parikh et al. (2017) provide a different perspective on this issue: they investigate state estimation without visual feedback when the feature points are out of the FoV by means of switched systems, although this strategy raises the stabilization problem of the overall switched system.

In many works, the local minima have not been rigorously analyzed, and a finite number of simulations or experiments cannot guarantee global convergence. Zhang et al. (2019) give a good demonstration of investigating the state space and analyzing the existence of multiple equilibria, which helps in determining whether global convergence holds and what influence the undesired equilibria have.


## Bibliography

- Chaumette F, Hutchinson S (2006) Visual servo control part I: basic approaches. IEEE Robot Autom Mag 13(4):82–90
- Chen J, Dawson DM, Dixon WE, Behal A (2005) Adaptive homography-based visual servo tracking for a fixed camera configuration with a camera-in-hand extension. IEEE Trans Control Syst Technol 13(5):814–825
- Chen J, Dawson DM, Dixon WE, Chitrakaran VK (2007) Navigation function-based visual servo control. Automatica 43(7):1165–1177
- Chen J, Chitrakaran VK, Dawson DM (2011) Range identification of features on an object using a single camera. Automatica 47(1):201–206
- Chen J, Zhang K, Jia B, Gao Y (2018) Identification of a moving object's velocity and range with a static-moving camera system. IEEE Trans Autom Control 63(7):2168–2175
- Chitrakaran VK, Dawson DM, Dixon WE, Chen J (2005) Identification of a moving object's velocity with a fixed camera. Automatica 41(3):553–562
- Chwa D, Dani AP, Dixon WE (2015) Range and motion estimation of a monocular camera using static and moving objects. IEEE Trans Control Syst Technol 24(4):1174–1183
- Dani AP, Kan Z, Fischer NR, Dixon WE (2011) Structure estimation of a moving object using a moving camera: an unknown input observer approach. In: 50th IEEE conference on decision and control and European control conference, Orlando, pp 5005–5010
- Dani AP, Fischer NR, Dixon WE (2012) Single camera structure and motion. IEEE Trans Autom Control 57(1):241–246
- Hutchinson S, Hager GD, Corke PI (1996) A tutorial on visual servo control. IEEE Trans Robot Autom 12(5):651–670
- Janabi-Sharifi F, Deng L, Wilson WJ (2011) Comparison of basic visual servoing methods. IEEE/ASME Trans Mechatron 16(5):967–983
- Malis E, Chaumette F (1999) 2-1/2D visual servoing. IEEE Trans Robot Autom 15(2):238–250
- Malis E, Chaumette F (2000) 2 1/2 D visual servoing with respect to unknown objects through a new estimation scheme of camera displacement. Int J Comput Vis 37(1):79–97
- Mariottini GL, Oriolo G, Prattichizzo D (2007) Image-based visual servoing for nonholonomic mobile robots using epipolar geometry. IEEE Trans Robot 23(1):87–100
- Parikh A, Cheng TH, Chen HY, Dixon WE (2017) A switched systems framework for guaranteed convergence of image-based observers with intermittent measurements. IEEE Trans Robot 33(2):266–280
- Zhang K, Chen J, Li Y, Gao Y (2018) Unified visual servoing tracking and regulation of wheeled mobile robots with an uncalibrated camera. IEEE/ASME Trans Mechatron 23(4):1728–1739
- Zhang K, Chaumette F, Chen J (2019) Trifocal tensor-based 6-DOF visual servoing. Int J Robot Res 38(10–11):1208–1228