A novel data-driven algorithm for object detection, tracking, distance estimation, and size measurement in stereo vision systems

Taromi, Amirhossein Dadashzadeh; Klidbary, Sajad Haghzad

doi:10.1007/s11042-024-19372-9

Published: 20 May 2024

Multimedia Tools and Applications

Amirhossein Dadashzadeh Taromi¹ &
Sajad Haghzad Klidbary ORCID: orcid.org/0000-0002-5308-7264²

Abstract

Estimating the distance and size of objects of interest is an essential task for many navigation and obstacle-avoidance algorithms, mainly those used in autonomous and robotic systems. Stereo vision systems, inspired by human visual perception, offer a cheap and accessible way to infer depth from images. However, accurate camera calibration is challenging and is the main source of error in current stereo-vision-based distance and size estimation algorithms. The main motivation behind this study was the recent progress in deep learning, together with the observation that human eyes need no calibration, yet the human brain can estimate the distance and size of objects fairly accurately. The proposed algorithm uses YOLOv8 as the object detector and a multilayer perceptron (MLP) to learn the relation between distance, size, and disparity from data collected with a stereo vision system. In experiments conducted at distances from 50 to 200 centimeters with both calibrated and uncalibrated cameras, the proposed algorithm performed accurately in both scenarios. It achieved distance accuracy of up to 99.99% in select cases and mean accuracies of 98.15% for distance, 92.87% for width, and 93.92% for height.
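
The core idea, learning the disparity-to-distance relation from data instead of deriving it from calibrated geometry, can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not taken from the paper: disparity is computed as the horizontal offset between the centers of matching detector boxes in the left and right frames (`bbox_disparity`); the focal length (`FOCAL_PX = 700`) and baseline (`BASELINE_CM = 6`) are hypothetical values used only to synthesize training pairs; the accuracy metric 100 · (1 − |error| / true value) is an assumed definition; and `TinyMLP` is a toy 1-16-1 ReLU network, not the authors' MLP architecture, inputs, or training setup. `pinhole_distance` and `pinhole_size` show the calibrated relations Z = fB/d and W = wZ/f that a learned model can stand in for.

```python
import math
import random

# HYPOTHETICAL rig parameters, used only to synthesize (disparity, distance) pairs.
# The learned path never reads them; a real rig would supply measured pairs instead.
FOCAL_PX = 700.0
BASELINE_CM = 6.0
Z_MIN, Z_MAX = 50.0, 200.0  # working range from the abstract, in cm


def bbox_disparity(left_box, right_box):
    """Horizontal offset (px) between centers of matched (x1, y1, x2, y2) boxes."""
    left_cx = (left_box[0] + left_box[2]) / 2.0
    right_cx = (right_box[0] + right_box[2]) / 2.0
    return left_cx - right_cx


def pinhole_distance(disparity_px):
    """Calibrated baseline: Z = f * B / d, needing f and B from calibration."""
    return FOCAL_PX * BASELINE_CM / disparity_px


def pinhole_size(extent_px, distance_cm):
    """Calibrated baseline: real extent W = w * Z / f."""
    return extent_px * distance_cm / FOCAL_PX


def accuracy_pct(estimate, truth):
    """Assumed metric: 100 * (1 - |estimate - truth| / truth)."""
    return 100.0 * (1.0 - abs(estimate - truth) / truth)


class TinyMLP:
    """Toy 1 -> hidden -> 1 ReLU regressor trained by full-batch gradient descent."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [max(0.0, w * x + b) for w, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(w * a for w, a in zip(self.w2, self._hidden(x))) + self.b2

    def mse(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def fit(self, xs, ys, lr=0.05, epochs=2000):
        n, k = len(xs), len(self.w1)
        for _ in range(epochs):
            g_w1, g_b1, g_w2, g_b2 = [0.0] * k, [0.0] * k, [0.0] * k, 0.0
            for x, y in zip(xs, ys):
                h = self._hidden(x)
                err = sum(w * a for w, a in zip(self.w2, h)) + self.b2 - y
                g_b2 += err
                for j in range(k):
                    g_w2[j] += err * h[j]
                    if h[j] > 0.0:  # ReLU passes gradient only where active
                        g_w1[j] += err * self.w2[j] * x
                        g_b1[j] += err * self.w2[j]
            step = 2.0 * lr / n  # gradient of the mean squared error carries 2 / n
            for j in range(k):
                self.w1[j] -= step * g_w1[j]
                self.b1[j] -= step * g_b1[j]
                self.w2[j] -= step * g_w2[j]
            self.b2 -= step * g_b2


# Synthetic training set: one object position every 5 cm from 50 to 200 cm.
DISTANCES = [Z_MIN + 5.0 * i for i in range(31)]
DISPARITIES = [FOCAL_PX * BASELINE_CM / z for z in DISTANCES]
D_MIN, D_MAX = min(DISPARITIES), max(DISPARITIES)


def to_unit(d):
    return (d - D_MIN) / (D_MAX - D_MIN)


def train_distance_model(epochs=2000):
    """Learn disparity -> distance from data only; returns (model, mse_before, mse_after)."""
    xs = [to_unit(d) for d in DISPARITIES]
    ys = [(z - Z_MIN) / (Z_MAX - Z_MIN) for z in DISTANCES]
    model = TinyMLP()
    before = model.mse(xs, ys)
    model.fit(xs, ys, epochs=epochs)
    return model, before, model.mse(xs, ys)


def estimate_distance(model, disparity_px):
    return Z_MIN + model.predict(to_unit(disparity_px)) * (Z_MAX - Z_MIN)


if __name__ == "__main__":
    model, before, after = train_distance_model()
    print(f"normalized MSE: {before:.4f} -> {after:.4f}")
```

In use, `train_distance_model()` fits the network on the synthetic pairs, and `estimate_distance(model, d)` converts a new disparity in pixels to centimeters. On a real rig, the synthetic lists would be replaced by measured disparities and ground-truth distances, so neither the focal length nor the baseline would enter the learned path; a size model could be trained the same way from (disparity, pixel extent) inputs.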

Data

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this paper.

Author information

Authors and Affiliations

Department of Engineering, University of Zanjan, Zanjan, Iran
Amirhossein Dadashzadeh Taromi
Faculty of Engineering, Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran
Sajad Haghzad Klidbary

Contributions

The authors contributed equally to this paper.

Corresponding author

Correspondence to Sajad Haghzad Klidbary.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest with respect to the research, authorship, contribution, and/or publication of this paper.

Ethical responsibilities

This manuscript has not been published and is not currently under consideration for publication elsewhere.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Taromi, A.D., Klidbary, S.H. A novel data-driven algorithm for object detection, tracking, distance estimation, and size measurement in stereo vision systems. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19372-9

Received: 27 December 2023
Revised: 03 April 2024
Accepted: 06 May 2024
Published: 20 May 2024
DOI: https://doi.org/10.1007/s11042-024-19372-9
