Fast, yet robust end-to-end camera pose estimation for robotic applications

Abstract

Camera pose estimation is paramount in robotic applications. Most recent algorithms based on convolutional neural networks can predict the camera pose adequately; however, they usually suffer from a computational complexity that prevents them from running in real time. They are also not robust to perturbations such as partial occlusion unless they have been trained on such cases beforehand. To address these limitations, this paper presents a fast and robust end-to-end Siamese convolutional model for robot-camera pose estimation. Two color frames are fed to the model simultaneously, and generic features are produced, mainly via transfer learning. The extracted features are then concatenated, and the relative pose is obtained directly at the output. Furthermore, a new dataset is generated, comprising several videos taken in various situations, for model evaluation. The proposed technique performs robustly even in challenging scenes that were never seen during training. In experiments conducted with an eye-in-hand KUKA robotic arm, the presented network yields fairly accurate camera pose estimates despite scene-illumination changes, and the pose estimation remains reasonably accurate under partial camera occlusion. The results are further improved by a new dynamic weighted loss function. The proposed method is also demonstrated in a visual servoing scenario.
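
To make the described pipeline concrete, the following PyTorch sketch shows one way to build a Siamese relative-pose regressor of the kind the abstract outlines. It is illustrative only and is not the authors' exact network: the VGG16 backbone, the pooled feature size, and the translation-plus-quaternion output are assumptions for the sake of a self-contained example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class SiameseRelativePose(nn.Module):
    """Two color frames in, relative pose (translation + quaternion) out."""

    def __init__(self):
        super().__init__()
        # Shared convolutional branch initialized from ImageNet weights
        # (transfer learning); both frames pass through the SAME weights.
        backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.branch = backbone.features
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        # Regression head applied to the concatenated feature pair:
        # 3 values for translation, 4 for a unit quaternion (assumed).
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * 512 * 7 * 7, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 7),
        )

    def forward(self, frame_a, frame_b):
        fa = self.pool(self.branch(frame_a))
        fb = self.pool(self.branch(frame_b))
        out = self.head(torch.cat([fa, fb], dim=1))
        t, q = out[:, :3], F.normalize(out[:, 3:], dim=1)  # unit quaternion
        return t, q
```

The abstract also mentions a new dynamic weighted loss. One common way to weight the translation and rotation terms dynamically is with learnable log-variance parameters, in the spirit of homoscedastic-uncertainty weighting; the paper's exact formulation may well differ.

```python
class DynamicWeightedLoss(nn.Module):
    """Learnable balancing of translation vs. rotation error terms."""

    def __init__(self):
        super().__init__()
        self.s_t = nn.Parameter(torch.zeros(()))  # log-variance, translation
        self.s_q = nn.Parameter(torch.zeros(()))  # log-variance, rotation

    def forward(self, t_pred, q_pred, t_true, q_true):
        loss_t = F.l1_loss(t_pred, t_true)
        loss_q = F.l1_loss(q_pred, q_true)
        # Each term is scaled by its learned weight; the additive s terms
        # regularize the weights so they cannot collapse to zero.
        return (loss_t * torch.exp(-self.s_t) + self.s_t
                + loss_q * torch.exp(-self.s_q) + self.s_q)
```

In such a setup the loss module's parameters are optimized jointly with the network (e.g., by passing both `model.parameters()` and `criterion.parameters()` to the optimizer), so the relative weighting of translation and rotation adapts as training progresses.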

Notes

  1. https://github.com/zkamranian/Pose-Estimation-Dataset

Author information

Corresponding author

Correspondence to Zahra Kamranian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kamranian, Z., Sadeghian, H., Naghsh Nilchi, A.R. et al. Fast, yet robust end-to-end camera pose estimation for robotic applications. Appl Intell 51, 3581–3599 (2021). https://doi.org/10.1007/s10489-020-01982-z
