Fast, yet robust end-to-end camera pose estimation for robotic applications

Abstract

Camera pose estimation is paramount in robotic applications. Most recent algorithms based on convolutional neural networks can predict the camera pose adequately; however, they usually suffer from a computational complexity that prevents them from running in real time. They are also not robust to perturbations such as partial occlusion unless they have been trained on such cases beforehand. To address these limitations, this paper presents a fast and robust end-to-end Siamese convolutional model for robot-camera pose estimation. Two color frames are fed to the model simultaneously, and generic features are produced, mainly via transfer learning. The extracted features are then concatenated, and the relative pose is obtained directly at the output. Furthermore, a new dataset is generated, comprising several videos taken in various situations, for model evaluation. The proposed technique performs robustly even in challenging scenes that were never seen during training. In experiments conducted with an eye-in-hand KUKA robotic arm, the presented network yields fairly accurate camera pose estimates despite scene-illumination changes, and the pose estimation remains reasonably accurate under partial camera occlusion. The results are further improved by a new dynamic weighted loss function. The proposed method is also demonstrated in a visual servoing scenario.
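
To make the described pipeline concrete, the following PyTorch sketch shows one way to build a Siamese relative-pose regressor of the kind the abstract outlines. It is illustrative only and is not the authors' exact network: the VGG16 backbone, the pooled feature size, and the translation-plus-quaternion output are assumptions for the sake of a self-contained example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class SiameseRelativePose(nn.Module):
    """Two color frames in, relative pose (translation + quaternion) out."""

    def __init__(self):
        super().__init__()
        # Shared convolutional branch initialized from ImageNet weights
        # (transfer learning); both frames pass through the SAME weights.
        backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.branch = backbone.features
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        # Regression head applied to the concatenated feature pair:
        # 3 values for translation, 4 for a unit quaternion (assumed).
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * 512 * 7 * 7, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 7),
        )

    def forward(self, frame_a, frame_b):
        fa = self.pool(self.branch(frame_a))
        fb = self.pool(self.branch(frame_b))
        out = self.head(torch.cat([fa, fb], dim=1))
        t, q = out[:, :3], F.normalize(out[:, 3:], dim=1)  # unit quaternion
        return t, q
```

The abstract also mentions a new dynamic weighted loss. One common way to weight the translation and rotation terms dynamically is with learnable log-variance parameters, in the spirit of homoscedastic-uncertainty weighting; the paper's exact formulation may well differ.

```python
class DynamicWeightedLoss(nn.Module):
    """Learnable balancing of translation vs. rotation error terms."""

    def __init__(self):
        super().__init__()
        self.s_t = nn.Parameter(torch.zeros(()))  # log-variance, translation
        self.s_q = nn.Parameter(torch.zeros(()))  # log-variance, rotation

    def forward(self, t_pred, q_pred, t_true, q_true):
        loss_t = F.l1_loss(t_pred, t_true)
        loss_q = F.l1_loss(q_pred, q_true)
        # Each term is scaled by its learned weight; the additive s terms
        # regularize the weights so they cannot collapse to zero.
        return (loss_t * torch.exp(-self.s_t) + self.s_t
                + loss_q * torch.exp(-self.s_q) + self.s_q)
```

In such a setup the loss module's parameters are optimized jointly with the network (e.g., by passing both `model.parameters()` and `criterion.parameters()` to the optimizer), so the relative weighting of translation and rotation adapts as training progresses.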

Notes

  1. https://github.com/zkamranian/Pose-Estimation-Dataset

Author information

Corresponding author

Correspondence to Zahra Kamranian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kamranian, Z., Sadeghian, H., Naghsh Nilchi, A.R. et al. Fast, yet robust end-to-end camera pose estimation for robotic applications. Appl Intell 51, 3581–3599 (2021). https://doi.org/10.1007/s10489-020-01982-z
