A novel data-driven algorithm for object detection, tracking, distance estimation, and size measurement in stereo vision systems

Multimedia Tools and Applications

Abstract

Estimating the distance and size of objects of interest is an essential task for many navigation and obstacle avoidance algorithms, used mainly in autonomous and robotic systems. Stereo vision systems, inspired by human visual perception, can infer depth from images and offer a cheap and accessible solution. On the one hand, accurate camera calibration is challenging and is the main source of error in current stereo-vision-based distance and size estimation algorithms. On the other hand, this study was motivated by recent advances in deep learning, together with the fact that human eyes need no calibration, yet the human brain estimates the distance and size of objects fairly accurately. The proposed algorithm uses YOLOv8 as the object detector and a multilayer perceptron (MLP) to learn the relation between distance, size, and disparity from data collected in a stereo vision system. In our experiments, conducted at distances ranging from 50 to 200 centimeters with calibrated and uncalibrated cameras, the proposed algorithm performed accurately in both scenarios. It achieved distance accuracy of up to 99.99% in select cases and maintained a mean accuracy of 98.15% for distance, 92.87% for width, and 93.92% for height estimation.
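
As an illustration only, the following Python sketch shows how input features for such a regressor might be built from a matched pair of detections, and how a percentage accuracy could be computed. The abstract does not specify the feature set, the MLP architecture, or the accuracy definition, so the `Box` type, the `stereo_features` and `percent_accuracy` helpers, the centre-based disparity, and the "100 × (1 − relative error)" formula are all assumptions made for this example rather than the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned bounding box in pixels, as an object detector might return it."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center_x(self) -> float:
        return (self.x_min + self.x_max) / 2.0

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min


def stereo_features(left: Box, right: Box) -> List[float]:
    """Build a hypothetical feature vector for a regressor such as an MLP.

    Disparity is taken as the horizontal shift of the box centre between
    the two views; pixel width and height are averaged over both views.
    """
    disparity = left.center_x - right.center_x
    width_px = (left.width + right.width) / 2.0
    height_px = (left.height + right.height) / 2.0
    return [disparity, width_px, height_px]


def percent_accuracy(estimate: float, truth: float) -> float:
    """Accuracy as 100 * (1 - relative error); an assumed definition."""
    return 100.0 * (1.0 - abs(estimate - truth) / abs(truth))


# Made-up detections of the same object in the left and right images.
left = Box(320.0, 200.0, 400.0, 300.0)
right = Box(280.0, 200.0, 360.0, 300.0)

features = stereo_features(left, right)
print(features)
print(percent_accuracy(98.0, 100.0))
```

In a data-driven setup of this kind, feature vectors like the one above would be paired with measured distances and sizes and used to train the regressor, which is why no camera intrinsics or baseline appear in the sketch.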

Data

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this paper.

Author information

Contributions

The authors contributed equally to this paper.

Corresponding author

Correspondence to Sajad Haghzad Klidbary.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest with respect to the research, authorship, contribution, and/or publication of this paper.

Ethical responsibilities

This manuscript has not been published nor is it currently under consideration for publication elsewhere.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Taromi, A.D., Klidbary, S.H. A novel data-driven algorithm for object detection, tracking, distance estimation, and size measurement in stereo vision systems. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19372-9

  • DOI: https://doi.org/10.1007/s11042-024-19372-9

Keywords

Navigation