1 Introduction

1.1 Motivation

Depth recovery is a pre-requisite for identifying human actions in surveillance, and this information can be attained from a sequence of images taken in multiple views using more than one camera. The real challenge, however, lies in dealing with conventional photography, which captures only a two-dimensional projection of a 3D scene. As conventional cameras [1] render portions of object(s) lying away from the plane of focus as out of focus, focusing is considered one of the contributing factors for retrieving coarse depth information. When an image is in focus, knowledge of the camera parameters helps us estimate the range of an object with a typical experimental set-up as shown in Fig. 4. When an object is near the camera, its image tends to grow larger, and it diminishes gradually as the object moves away from the camera. This inverse relationship is studied in detail by relating three parameters: lens aperture radius, object size in the image plane and the object distance along the camera axial line.

1.2 Research contributions

  • A relation is established between the object distance (or depth), object size and lens aperture radius, and observations are taken in auto-focusing mode.

  • A multiplying factor is introduced to compensate for the effect of object lighting on camera exposure settings while estimating the depth of an object.

  • A curve-estimation regression analysis is performed to validate the relation between camera-to-object distance (or depth) and object size.

The literature survey is presented in Section 2, and the proposed work in Section 3 with subsections on prerequisites, experimental set-up and camera calibration, and database creation for depth estimation. Results and interpretation appear in Section 4, with subsections dealing with the influence of object lighting and camera exposure settings during depth estimation, finding real height from the estimated depth, method validation, and model accuracy with reference to existing works. The paper is concluded in Section 5.

2 Literature survey

Depth estimation or extraction refers to a set of algorithms or techniques aimed at reconstructing the spatial structure of a 3D scene. Several approaches exist for obtaining the depth of an object in a 3D scene; electronically, they can all be categorized as either active or passive. Among the active methods proposed so far, light was the first form of energy used to measure distance. Incandescent light produced from a high-temperature coil was used for distance measurement, but the system is sensitive to the colour of the illuminated object and hence fails.

Time-of-Flight (ToF) depth sensing [2,3,4], using the phase delay of the reflected IR light, estimates depth directly without the help of conventional computer vision algorithms. But as the principle is based on phase-shift calculation, only distances within one unambiguous range \([0, 2\pi]\) can be retrieved, and calculations made from the phase shift \(\emptyset\) introduce possible measurement errors. The pulse-modulation approach is an alternative form of ToF in which the depth of an object is associated not only with the duration of the light pulse but also with the duration of the camera shutter. This solves the range ambiguity but at the same time suffers from calibration issues.

Nonsystematic depth errors [5], caused by light scattering and multi-path propagation effects in 3D scene reconstruction, make ToF cameras inefficient for practical use. Another critical problem seen in ToF depth images is motion blur [6], which occurs when the camera or the target is in motion. The error in phase measurements induces overshoots or undershoots in depth-transition regions within the bounded integration time, limiting the depth accuracy and frame rate.

Triangulation methods offer a better solution for estimating depth by capturing a scene from multiple views. Here we restrict our discussion to binocular vision (left and right views). As discussed in [7], the differences between the two images give depth information, and these differences are known as disparities. In other words, the shift between stereo-corresponding points is termed the disparity.

However, estimation of the disparity map is considered a fundamental problem in computer vision. As mentioned in [8], stereo-correspondence points are determined either with maximum correlation or with the minimum sum of squared differences (SSD) or sum of absolute intensity differences (SAD), and hence the disparity map is constructed. Efforts have continued to increase the accuracy of depth estimation, and in this process depth-map merging approaches [9] to multi-view stereo came into existence: a depth map is computed for each viewpoint using binocular stereo, synthesized according to a patch-based normalized cross-correlation metric, and the maps are merged to produce 3D models. However, disparity-map construction requires knowledge of the camera configuration with the epipolar geometry constraint, and occlusions in the monitored environment create mismatches in pixel-to-pixel correspondence, which in turn lead to ambiguity in the disparity-map computation.
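To make the block-matching idea above concrete, the following is a minimal sketch (not the exact method of [8]; the window size, disparity range and grayscale rectified image pair are assumptions for illustration) of SAD-based disparity computation:

```python
import numpy as np

def sad_disparity(left, right, max_disp=64, win=7):
    """Minimal SAD block matching: for each pixel of the left image, search a
    horizontal range in the right image and keep the shift with the lowest
    sum of absolute differences. Inputs are rectified grayscale float arrays."""
    h, w = left.shape
    half = win // 2
    disparity = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch_l = left[y - half:y + half + 1, x - half:x + half + 1]
            best_d, best_cost = 0, np.inf
            for d in range(min(max_disp, x - half) + 1):
                patch_r = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.abs(patch_l - patch_r).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity
```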

Power consumption, camera mounting space [10] and memory stack space for the computation process [11] are also factors that pushed researchers towards monocular vision (i.e. information from a single image) in order to obtain depth information. A single-viewpoint ToF depth sensor [12, 13] offers the significant benefit of providing accurate depth measurements without being affected by ambient lighting, shadows or occlusions: the design itself provides the illumination, and the phase measurement, rather than the intensity, is taken as the criterion for depth measurement.

Today, CW-ToF (Continuous-Wave ToF) sensors dominate the consumer electronics and low-end robotics space, with certain limitations. As stated in [14], the design mainly suffers from three fundamental issues. The first is range, which is limited by power consumption and eye-safety considerations. The second is accuracy, which is adversely affected by illumination effects. Finally, interference comes into the picture when many such sensors operate together in indoor and outdoor environments. Further, in order to improve depth-range accuracy, plenoptic camera designs [15,16,17] came into existence. Such a camera uses not only the intensity of light in the scene but also the directional information of the light distribution to retrieve the depth of a surface. But with poor reconstructed depth quality and large storage requirements, human action recognition in surveillance systems is impractical, and hence the design is not recommended.

Considering the economic and technical issues in all the above-mentioned approaches, we propose a theory relating three fundamental parameters: object size, lens aperture radius and object depth. This theory can be applied in any conventional surveillance camera without incorporating additional hardware or software, and it works in both indoor and outdoor environments.

3 Proposed work

3.1 Prerequisites

3.1.1 Focal length

Consider an object ‘O’ at infinity focus, photographed by a rectilinear convex lens with focal length ‘\(f\)’ and aperture radius ‘\(a_{R}\)’, forming an image at a distance equal to ‘\(f\)’ behind the lens, i.e. on the image plane.

As shown in Fig. 1, the lens with aperture of radius ‘\(a_{R}\)’ projects the incoming rays onto the image plane, and hence the image distance for an object at infinity defines the focal point ‘F’. In this case, an object at infinity focus appears sharp with an image distance \(I_{inf} \; = \;f\).

Fig. 1 Image is in Focus

As shown in Fig. 2, the point ‘F’ is focused at a distance ‘\({\text{I}}_{{{\text{of}}}}\)’ behind the lens but in front of the image plane. As the receiving plane is not at the image focus, blurriness appears in the image.

Fig. 2 A Fusion of In-Focus and Out-of-Focus Scenarios

'\(O_{d}\)' – Object distance from lens

'\(I_{inf}\)' – Distance between lens and image plane when image is in perfect focus

'\(I_{inf} - I_{of}\)' – Distance between the point of convergence and the image plane when the image is de-focused.

As shown in Fig. 2, light rays emanating from an object ‘O’ fall on the lens and converge at a distance '\(I_{of}\)' on the sensor side of the lens.

For a thin lens [18] with focal length ‘\(f\)’, the relation between '\(O_{d}\)' and '\(I_{of}\)'is stated as

$$\frac{1}{{O_{d} }}\; + \;\frac{1}{{I_{of} }}\; = \;\frac{1}{f}$$
(1)
$$O_{d} \; = \;\frac{{f \times I_{of} }}{{I_{of} - f}}$$
(2)
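As a quick numerical illustration of Eq. (2) (the values are chosen only for this example and are not measurements from this work): with \(f = 5\) cm and \(I_{of} = 5.5\) cm,

$$O_{d} = \frac{f \times I_{of}}{I_{of} - f} = \frac{5 \times 5.5}{5.5 - 5} = 55\;{\text{cm}}$$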

We now explore what happens if an object at a distance greater than or less than \(O_{d}\) is imaged. As shown in Fig. 2, an object at infinity focus would be focused at a distance \(I_{of}\) behind the lens but in front of the sensor, so the image recorded on the sensor is blurred.

From the Geometry of Fig. 3, we obtain

$$\frac{{a_{R} }}{{I_{of} }} = \frac{{O_{s} }}{{I_{inf} - I_{of} }}$$
(3)
$$I_{of} \; = \;a_{R} \times \frac{{I_{inf} - I_{of} }}{{O_{s} }}$$
(4)
Fig. 3 Geometry of Imaging Relating Aperture Radius \({\text{a}}_{{\text{R}}}\) and Object Size \({\text{O}}_{{\text{s}}}\)

From the Geometry of Fig. 3, we also obtain

$$\frac{{I_{inf} - I_{of} }}{{I_{of} }}\; = \;\frac{{O_{s} }}{{a_{R} }}$$
(5)
$$\frac{{I_{inf} }}{{I_{of} }}\; = \;1\; + \;\frac{{O_{s} }}{{a_{R} }}$$
(6)
$$\frac{{I_{inf} }}{{I_{of} }}\; = \;\frac{{O_{s} + a_{R} }}{{a_{R} }}$$
(7)
$$I_{of} \; = \;I_{inf} \times \frac{{a_{R} }}{{O_{s} + a_{R} }}$$
(8)

Substituting Eq. (8) in Eq. (2), we obtain

$$O_{d} \; = \;\frac{{\left( {f \times \left( {I_{inf} \times \left( {\frac{{a_{R} }}{{O_{s} + a_{R} }}} \right)} \right)} \right) }}{{\left( {I_{inf} \times \left( {\frac{{a_{R} }}{{O_{s} + a_{R} }}} \right) - f} \right)}}$$
(9)
$$O_{d} \; = \;\frac{{\left( {f \times I_{inf} \times a_{R} } \right)}}{{\left( {I_{inf} \times a_{R} } \right) - \left( {f \times \left( {O_{s} + a_{R} } \right)} \right)}}$$
(10)

where

$$a_{R} = (f/f_{stop} )/2$$
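A minimal Python transcription of Eq. (10), together with the aperture-radius relation above, is sketched below; the function names and the example values in the comment are ours and purely illustrative:

```python
def aperture_radius(f, f_stop):
    """a_R = (f / f_stop) / 2, in the same length unit as f."""
    return (f / f_stop) / 2.0


def object_depth(f, i_inf, f_stop, o_s):
    """Direct transcription of Eq. (10); all lengths share one unit (e.g. cm)."""
    a_r = aperture_radius(f, f_stop)
    return (f * i_inf * a_r) / (i_inf * a_r - f * (o_s + a_r))


# Illustrative call only (values are assumptions, not readings from the tables):
# depth = object_depth(f=5.0, i_inf=5.5, f_stop=1.8, o_s=0.1)
```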

Knowing the object distance from the camera \(O_{d}\), the object's real height can be determined from the relation

$$\frac{Real\;Height}{{Image\;height}}\; = \;\frac{{O_{d} }}{Image\;distance}$$
(11)

In many imaging applications, the object distance from the lens is considerably greater than the image distance. Hence Eq. (1) can be approximated as

$$\frac{1}{f} \cong \frac{1}{Image\,distance}$$
(12)

By substituting Eq. (12) in Eq. (11), we obtain

$$\frac{Real\;Height}{{Image\;height}}\; = \;\frac{{Object\;distance\;(O_{d}) }}{focal\;length\;(f)}$$
(13)
$$Real\;Height = \frac{Image\;height \times Object\;distance}{f}$$
(14)
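Assuming the depth \(O_{d}\) has already been estimated, Eq. (14) reduces to a one-line helper; this sketch is ours and only restates the formula:

```python
def real_height(image_height, object_distance, focal_length):
    """Eq. (14): real height = image height * object distance / focal length.
    Image height and focal length share one unit, so the result is returned
    in the unit of the object distance."""
    return image_height * object_distance / focal_length
```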

3.2 Experimental set-up and camera calibration

Our research aims at measuring the depth of a stationary object using a single RGB camera by establishing a relationship among the focal length (\(f\)), lens aperture (\(a_{R}\)), object size (\(O_{s}\)) and camera-to-object distance (or depth) (\(O_{d}\)). For this purpose, a 50 mm Nikkor prime lens (a conventional lens with a fixed angle of view) is used to focus objects at different working distances starting from 10 cm, successively altering the \(f_{stop}\) opening to collect repeated readings. The ISO and shutter-speed parameters are tuned in the required ratios for the sensor to render a sharp image. The device and its exposure settings are listed in Tables 1 and 2.

Table 1 Experimental Device Details for Photogrammetry Experiment
Table 2 Parameter Influencing the Camera Exposure

The experiment is carried out by setting the camera in auto-focus mode. When testing the auto-focus accuracy of a camera or lens, exposure consistency is essential. In our Nikon D5300 with the Nikkor 50 mm prime lens, to maintain auto-focus consistency, we ensure that the camera is set to AF-S/Single Servo mode. Internally, the camera relies on contrast detection to acquire focus: the lens is shifted back and forth, adjusting the focal point 'F', until a sharp image is obtained anywhere in the frame. This focus confirmation happens electronically through the camera sensor and hence is consistently accurate. The camera is placed on a stable tripod, and an object of interest, an aluminium levelling staff, is placed in front of the camera at a distance of 10 cm with \(f_{stop}\) = 1.8, ISO 1000 and a 1/100 s exposure setting as shown in Fig. 4.

Fig. 4 Experimental Set-Up

3.3 Creating a database for depth estimation

300 sample pictures are taken in real time using two camera models, Canon EOS 760D and Nikon D5300, under the guidance of SICA (Southern Indian Association of Cinematographers), to validate the suggested concept. These images are captured in indoor environments. For depth calculation, the images acquired from the camera are scaled and cropped. The procedure is as follows.

The photographs are taken by moving the object towards and away from the camera in 10 cm steps within a range of 50 cm to 110 cm. In this experiment the object appears to grow whenever it approaches the camera lens and to shrink while moving away from the lens, as shown in Fig. 5.

Fig. 5 Variation in Object Size with Respect to its Distance

This successive variation in object size is taken as the criterion for distance measurement. The object is cropped from the background of each photograph by specifying its size and position, and the cropped region is enclosed by a rectangle. As the rectangle width is acquired in pixels, it is multiplied by the pixel size to obtain a measurement in SI units; this value is denoted as \(O_{s}\). The same experiment is repeated with the constant exposure settings mentioned in Table 2 for different aperture settings from \(f_{stop}\) = 1.8 to 16.
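A minimal sketch of this pixel-to-SI conversion is given below; the pixel-pitch constant is an assumed illustrative value, not a calibration figure from this work:

```python
PIXEL_PITCH_M = 3.9e-6  # assumed pixel size in metres; substitute the value for the sensor in use


def object_size_from_crop(rect_width_px, pixel_pitch_m=PIXEL_PITCH_M):
    """O_s = cropped rectangle width in pixels multiplied by the pixel size."""
    return rect_width_px * pixel_pitch_m


# Example (illustrative): a 450-pixel-wide crop
# o_s_metres = object_size_from_crop(450)
```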

4 Results and discussions

For each aperture reading (i.e. \(f_{stop}\) = 1.8 to 16), the object depth (\(O_{d}\)) is estimated from the relation obtained in Eq. (10), and the results are tabulated in Tables 3, 4, 5, 6 and 7.

Table 3 Sample Photographs with f-stop = 1.8 at Focus = \(\infty\), \(f = I_{inf} = 5\) cm, Aperture Radius \((a_{R}) = (f/f_{stop})/2 = 7.81\) cm, k = 0.45
Table 4 Sample Photographs with f-stop = 2.8 at Focus = \(\infty\), \(f = I_{inf} = 5\) cm, Aperture Radius \((a_{R}) = (f/f_{stop})/2 = 8.928\) cm, k = 0.66
Table 5 Sample Photographs with f-stop = 3.2 at Focus = \(\infty\), \(f = I_{inf} = 5\) cm, Aperture Radius \((a_{R}) = (f/f_{stop})/2 = 7.81\) cm, k = 0.76
Table 6 Sample Photographs with f-stop = 3.5 at Focus = \(\infty\), \(f = I_{inf} = 5\) cm, Aperture Radius \((a_{R}) = (f/f_{stop})/2 = 7.81\) cm, k = 0.85
Table 7 Sample Photographs with f-stop = 4 at Focus = \(\infty\), \(f = I_{inf} = 5\) cm, Aperture Radius \((a_{R}) = (f/f_{stop})/2 = 6.25\) cm, k = 0.95

Working through the readings in Tables 3, 4, 5, 6 and 7, we observe that the focusing distance (i.e. the depth of an object) is naturally influenced by the change in aperture settings whenever the object is moved towards or away from the camera, and the relation follows

$$O_{d} \propto \frac{ - 1}{{a_{R} + O_{s} }}$$

[From Eq. (10)].

4.1 Influence of object lighting on camera exposure settings during depth estimation

It is evident from [19] that the value \(N_{d}\) of each pixel in the image is proportional to the scene luminance \(l_{s}\) and also depends on camera settings such as film speed (ISO), exposure time (t) and aperture number \(f_{stop}\), where

$$N_{d} \; = \;K_{c} \; \times \;\frac{{\text{Exposure time}}\; \times \;{\text{ISO}}}{f_{stop}^{2}}\; \times \;l_{s}$$
$$l_{s} \; = \;0.2126\; \times \;R\; + \;0.7152\; \times \;G\; + \;0.0722\; \times \;B$$
(15)

(\(R\), \(G\) and \(B\) are the red, green and blue values of the \(i^{th}\) pixel.)

The range of \(K_{c}\), as recommended by ANSI/ISO 2720-1974 [20], varies from 10.6 to 13.4; in practice, a value of 12.5 is recommended for Canon and Nikon cameras. It is understood from our observations that changes in aperture settings affect the \(\frac{{l_{s} }}{{f_{stop}^{2} }}\) ratio. This in turn affects the brightness and hence the estimated subject distance from the lens. In order to compensate for the subject-lighting effects induced by the exposure settings, a multiplying factor 'k' is included in Eq. (10). The modified equation accounting for object lighting under the film exposure settings is written as

$$O_{d} \; = \;{\text{k}}\; \times \;\frac{{\left( {f \times I_{inf } \times a_{R} } \right)}}{{\left( {I_{inf } \times a_{R} } \right) - \left( {f \times \left( {O_{s} + a_{R} } \right)} \right)}}$$
(16)
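The luminance relation of Eq. (15) and the k-compensated depth of Eq. (16) can be sketched as follows; \(K_{c} = 12.5\) follows the recommendation noted above, while the function names are ours:

```python
def scene_luminance(r, g, b):
    """Eq. (15): weighted combination of a pixel's R, G and B values."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def pixel_value(exposure_time, iso, f_stop, l_s, k_c=12.5):
    """N_d = K_c * (exposure_time * ISO / f_stop**2) * l_s, with K_c = 12.5 per [20]."""
    return k_c * (exposure_time * iso / f_stop ** 2) * l_s


def object_depth_compensated(k, f, i_inf, f_stop, o_s):
    """Eq. (16): the depth of Eq. (10) scaled by the lighting-compensation factor k."""
    a_r = (f / f_stop) / 2.0
    return k * (f * i_inf * a_r) / (i_inf * a_r - f * (o_s + a_r))
```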

It is observed that in most cases the value ‘k’ varies with the aperture reading, except for \(f_{stop}\) = 1.8 to 2, 2.2 to 2.5, 4 to 4.5 and 8 to 9, as summarized in Table 8.

Table 8 Variation in k Value Subject to Effect of Exposure on Subject Lighting

4.1.1 Finding optimum value of 'k'

Our objective is to find the optimum 'k' value for which the estimated camera-to-object depth approximates the actual value. It is inferred from our observations, by applying the Generalized Reduced Gradient method, that at k = 0.666 we attain a minimum MSE of 2.566 within the defined constraints on k:

\(0.45 \le {\text{k}} \le 3.2\)

It is concluded from our observations that, to achieve the closest approximation of the camera-to-object depth, \(f_{{{\text{stop}}}} = 2.8\) and k = 0.666 should be chosen in the surveillance camera design.
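The optimisation above was carried out with the Generalized Reduced Gradient method; as a stand-in, the sketch below uses a bounded scalar minimiser over the same constraint interval, with hypothetical depth arrays:

```python
import numpy as np
from scipy.optimize import minimize_scalar


def optimum_k(actual_depths, uncorrected_depths, lo=0.45, hi=3.2):
    """Find k in [lo, hi] minimising the MSE between actual depths and
    k times the uncorrected Eq. (10) estimates."""
    actual = np.asarray(actual_depths, dtype=float)
    raw = np.asarray(uncorrected_depths, dtype=float)

    def mse(k):
        return float(np.mean((actual - k * raw) ** 2))

    result = minimize_scalar(mse, bounds=(lo, hi), method="bounded")
    return result.x, result.fun


# k_opt, mse_opt = optimum_k(measured_cm, estimated_cm)  # hypothetical arrays
```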

In a similar fashion, the proposed Eq. (16) is also tested on Canon cameras for \(f_{stop}\) ranging from 5 to 32, and the camera-to-object depth is found to approximate the actual value for \(f_{{{\text{stop}}}} = 7.1\) with an optimum 'k' value of 1.44.

4.2 Finding real height of an object from depth (\({\mathbf{O}}_{{\mathbf{d}}} )\) estimated

The real height can be obtained for an object centred at distinct operating distances provided that the object distance (\(O_{d}\)) from the lens, the image height and the focal length are known. The object depth is calculated from Eq. (16), and the image height is computed by cropping the object of interest to a specified size in each image. As the height is obtained in pixels, it is multiplied by the pixel size specified for the Nikon D5300 to get a measurement in SI units.

On substituting the image height, object depth and focal length in Eq. (14), the object's real height is obtained. For all \(f_{stop}\) readings from 1.8 to 16, the object is placed at different working distances from 50 to 110 cm and photographs are taken; the real height is then extracted from each photographed object, and the results are tabulated in Tables 9 and 10.

Table 9 Sample Photographs with Object Distance and Real Height Measurements \(f = 5{\text{cm}}\)
Table 10 Sample Photographs with Object Distance and Real Height Measurements (contd) \({\text{f = 5cm}}\)

4.3 Method validation

The independent variable is the object size (\(O_{s}\)).

Table 11 provides R, \(R^{2}\), adjusted \(R^{2}\) and the standard error of the estimate, which together indicate how well the inverse model fits the data.

Table 11 Determining How Well Model Fits: Model Summary

The R column represents the multiple correlation coefficient, which is considered a measure of the quality of prediction of the dependent variable; a value of 0.990 indicates a good level of prediction. The \(R^{2}\) column represents the proportion of variance in the dependent variable explained by the model. From our \(R^{2}\) value of 0.981, we can see that 98.1% of the variation in our dependent variable is explained by the independent variable ‘object size’.

The F-ratio in Table 12 tests whether the inverse model fits the data well. Table 12 demonstrates that the independent variable predicts the dependent variable statistically significantly, F (1, 74) = 2294.695, p \(<\) 0.0005 (i.e. the inverse model is a good fit for the data).

Table 12 ANOVA Table-Statistical Significance

As shown in the ‘Sig’ column of Table 13, all independent-variable coefficients are statistically significantly different from 0 (zero) with p \(<\) 0.05. Hence the general form of the equation to predict the estimated distance from the object size is expressed as:

$${\text{Pred}}.{\text{Estdist}}\,(O_{d}) \; = \;1.221\; + \;24.910 \times 1/Object\;size$$
(17)
Table 13 Estimated Model Coefficients -Statistical Significance of Independent Variables
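For reproducibility, the inverse model of Eq. (17) can be fitted with an ordinary least-squares regression on \(1/O_{s}\); the sketch below is ours (the paper's coefficients came from the curve-estimation procedure), and the data arrays are placeholders:

```python
import numpy as np


def fit_inverse_model(object_sizes, depths):
    """Least-squares fit of depth = b0 + b1 * (1 / object size), as in Eq. (17).
    Returns the coefficients (b0, b1) and the coefficient of determination R^2."""
    x = 1.0 / np.asarray(object_sizes, dtype=float)
    y = np.asarray(depths, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    pred = design @ coeffs
    ss_res = float(np.sum((y - pred) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return tuple(coeffs), 1.0 - ss_res / ss_tot


# (b0, b1), r2 = fit_inverse_model(sizes, estimated_distances)  # placeholder arrays
```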

4.4 Comparison of our work with existing works

Table 14 illustrates the fitting process of the model in one of the validation folds. It is found from our observations that our model is competitive with Yujiao Chen's [21] depth-estimation technique, showing high consistency between the predicted (estimated) and reference (actual) depth values with a Pearson correlation coefficient of r = 0.99213. Besides, the variables estimated distance and object size are also highly correlated, with a Pearson correlation coefficient of |r1| = 0.86 (95% confidence level), as shown in Table 14; the correlation is negative, indicating that every increase in distance is accompanied by a proportional decrease in object size, and hence the absolute value is reported. It is also observed that our depth-estimation technique outperforms the Mylene C. Q. Farias [22] stereo-vision-based distance measurement in a single-camera system by giving a good quality of curve fit, with an RMSE of 0.041356 on the Nikon data set and 0.056232 on the Canon data set. The details are furnished in Table 15.

Table 14 Correlation between Actual Versus Estimated Depth, Object Size Versus Estimated Distance Values Obtained from Different Camera Models
Table 15 Model Accuracy: Evaluating Prediction Accuracy with Reference to Existing Works
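The two comparison metrics used in Tables 14 and 15 (Pearson's correlation and RMSE) can be computed as in the sketch below; the arrays are placeholders for the actual and estimated depth series:

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted depths."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# r = pearson_r(actual_depths, estimated_depths)   # placeholder arrays
# err = rmse(actual_depths, estimated_depths)
```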

5 Conclusions

This work presented a theory relating object size, lens aperture radius and object depth. Near-range photography is used to explore the impact of object size and lens aperture radius on depth (or distance) in auto-focusing mode. It is also found that a change in aperture settings induces a variation in the luminance of the object, which has to be accounted for; this variation is therefore taken into account when estimating an object's depth from the centre of the lens. For the experiments, a Nikon D5300 with a 50 mm prime lens is used. By placing the lens at infinity focus, the distance measurements are calculated for different apertures.

This research created an inverse model relating camera-to-object distance to object size, with a permissible standard error of estimate of 3.012, R = 0.990 and \(R^{2} =\) 0.981. It is noted that when the lens aperture changes, object lighting also affects the camera-to-object distance (or depth); this change in exposure settings affects the \(\frac{{l_{s} }}{{f_{stop}^{2} }}\) ratio, and the factor 'k' compensates for it. We achieve an optimum 'k' value of 0.666 for the Nikon model with \(f_{stop}\) = 2.8 and k = 1.44 for the Canon model with \(f_{stop}\) = 7.1. Compared with the Mylene C. Q. Farias [22] and Yujiao Chen [21] stereo-vision set-ups and the Said Pertuz et al. [17] and Palmieri et al. [23] light-field imaging experimental set-ups, the proposed technique uses a single-lens arrangement for finding depth estimates. The depth estimates obtained from the proposed model approximate the ground truth with an RMSE of 0.05 when compared with the recent works on stereo and light-field imaging, exhibiting around 98.1% correlation.

In the current study we are confined to finding the depth estimate only when the object is on the camera axial line. In future, the work is likely to be extended so that the depth of a person can be estimated when the person is in a slanting position. Estimating the depth of a person at long range, say up to 40 m, is also a task under execution.