Abstract
Estimating the distance and size of objects of interest is an essential task for many navigation and obstacle avoidance algorithms, which are used mainly in autonomous and robotic systems. Stereo vision systems, inspired by human visual perception, can infer depth from images and offer a cheap and accessible solution. However, accurate camera calibration is challenging and is the main source of error in current stereo-vision-based distance and size estimation algorithms. This study was motivated by recent advances in deep learning and by the observation that human eyes need no calibration, yet the human brain estimates the distance and size of objects fairly accurately. The proposed algorithm uses YOLOv8 as the object detector and a multilayer perceptron (MLP) to learn the relation between distance, size, and disparity from data collected in a stereo vision system. In our experiments, conducted at distances ranging from 50 to 200 centimeters with both calibrated and uncalibrated cameras, the proposed algorithm performed accurately in both scenarios. It achieved a distance accuracy of up to 99.99% in select cases and a mean accuracy of 98.15% for distance, 92.87% for width, and 93.92% for height estimation.
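For orientation, the relation between disparity, distance, and size that a learned model has to approximate can be illustrated with the classical pinhole equations for a rectified, calibrated stereo pair. The sketch below is not the authors' method: the focal length, baseline, pixel coordinates, and all names (`StereoRig`, `distance_from_disparity`, `size_from_pixels`, `accuracy_percent`) are hypothetical, and the accuracy formula is an assumed definition, since the abstract does not state how accuracy is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StereoRig:
    """Hypothetical rectified stereo pair; the values used below are illustrative."""
    focal_px: float     # focal length in pixels (assumed)
    baseline_cm: float  # distance between the two camera centres (assumed)


def distance_from_disparity(rig: StereoRig, disparity_px: float) -> float:
    """Pinhole relation Z = f * B / d for a rectified pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return rig.focal_px * rig.baseline_cm / disparity_px


def size_from_pixels(rig: StereoRig, extent_px: float, distance_cm: float) -> float:
    """Back-project a bounding-box extent in pixels to a physical size: S = s * Z / f."""
    return extent_px * distance_cm / rig.focal_px


def accuracy_percent(estimate: float, truth: float) -> float:
    """Assumed metric: 100 * (1 - relative error); not taken from the paper."""
    return 100.0 * (1.0 - abs(estimate - truth) / abs(truth))


rig = StereoRig(focal_px=800.0, baseline_cm=10.0)

# Horizontal bounding-box centres of one object in the left and right images
# (hypothetical detector output).
x_left, x_right = 420.0, 340.0
disparity = x_left - x_right                         # 80 px

distance = distance_from_disparity(rig, disparity)   # 800 * 10 / 80 = 100 cm
width = size_from_pixels(rig, 160.0, distance)       # 160 * 100 / 800 = 20 cm
```

These closed-form relations depend on a calibrated focal length and baseline. The data-driven approach described in the abstract instead fits an MLP to measured samples, which is why it can also be applied with uncalibrated cameras.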
Data
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this paper.
Author information
Contributions
The authors contributed equally to this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest with respect to the research, authorship, contribution, and/or publication of this paper.
Ethical responsibilities
This manuscript has not been published nor is it currently under consideration for publication elsewhere.
About this article
Cite this article
Taromi, A.D., Klidbary, S.H. A novel data-driven algorithm for object detection, tracking, distance estimation, and size measurement in stereo vision systems. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19372-9
DOI: https://doi.org/10.1007/s11042-024-19372-9