Real-time deep learning–based image processing for pose estimation and object localization in autonomous robot applications

  • ORIGINAL ARTICLE
The International Journal of Advanced Manufacturing Technology

Abstract

Artificial intelligence (AI) is making manufacturing smarter and more autonomous. Flexible robots now collaborate with humans on the shop floor to enhance productivity and efficiency. For an autonomous robotic system to grasp objects properly, it must solve two crucial problems: object classification and pose estimation. Extensive research aims at low-cost, computationally efficient, real-time solutions. However, most existing approaches are computationally expensive and depend on prior knowledge of an object’s 3D structure. This article presents an AI-based solution that generalizes the grasping of cuboid- and cylindrical-shaped objects in real time, irrespective of their dimensions. The AI algorithm achieved an average precision of 89.44% for cuboid-shaped objects and 82.43% for cylindrical-shaped objects. Objects are identified without knowledge of their 3D models, and their pose is estimated accurately in real time. The integrated solution has been implemented in a robotic system fitted with two grippers, a conveyor system, and sensors. The results of several experiments are reported in this article and validate the solution. In these experiments, the proposed methodology achieved 100% accuracy in grasping objects on the conveyor belt.
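The average precision figures quoted in the abstract summarize a detector's precision-recall curve. The abstract does not say which variant of the metric or which matching threshold the authors used, so the following is only a minimal sketch of one common definition (all-point interpolated area under the precision-recall curve). The function name `average_precision` and its input format are hypothetical, and each detection is assumed to have already been labeled as a true or false positive (for example by an overlap test against ground-truth boxes).

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]],
                      num_ground_truth: int) -> float:
    """All-point interpolated average precision for one object class.

    detections: (confidence, is_true_positive) for each predicted box.
        Matching predictions to ground truth is assumed to be done already.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_pos = 0
    false_pos = 0
    for _, is_true_positive in ranked:
        if is_true_positive:
            true_pos += 1
        else:
            false_pos += 1
        precisions.append(true_pos / (true_pos + false_pos))
        recalls.append(true_pos / num_ground_truth)

    # Replace each precision by the maximum precision at any higher recall,
    # which makes the curve monotonically non-increasing.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the area of the rectangles under the interpolated curve.
    area = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


if __name__ == "__main__":
    # Hypothetical detections: (confidence, matched a ground-truth object?)
    example = [(0.9, True), (0.8, False), (0.7, True)]
    print(f"AP = {average_precision(example, num_ground_truth=2):.4f}")
```

Under this definition, a score such as 89.44% would mean that, averaged over all recall levels, roughly nine in ten of the ranked detections are correct. Other conventions (for instance 11-point interpolation, or averaging over several overlap thresholds) give different numbers for the same detector, so values are comparable only within a single convention.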

Author information

Contributions

All the authors contributed to the study’s conception and design. The data collection and analysis were performed by Ritam Upadhyay and Abhishek Asi. The conveyor design, analyses, and fabrication were performed by Nidhi Prasad and Pravanjan Nayak. The sensor interaction and robotic implementation were performed by Ritam Upadhyay and Pravanjan Nayak. The first draft of the manuscript was written by Ritam Upadhyay and Debasish Mishra, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Surjya K. Pal.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection: New Intelligent Manufacturing Technologies through the Integration of Industry 4.0 and Advanced Manufacturing.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Upadhyay, R., Asi, A., Nayak, P. et al. Real-time deep learning–based image processing for pose estimation and object localization in autonomous robot applications. Int J Adv Manuf Technol 127, 1905–1919 (2023). https://doi.org/10.1007/s00170-022-09994-4

  • DOI: https://doi.org/10.1007/s00170-022-09994-4
