Relative Camera Pose Estimation Using Convolutional Neural Networks

Melekhov, Iaroslav; Ylioinas, Juha; Kannala, Juho; Rahtu, Esa

doi:10.1007/978-3-319-70353-4_57

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10617)

Included in the following conference series:

International Conference on Advanced Concepts for Intelligent Vision Systems

Abstract

This paper presents a convolutional neural network-based approach for estimating the relative pose between two cameras. The proposed network takes RGB images from both cameras as input and directly produces the relative rotation and translation as output. The system is trained in an end-to-end manner utilising transfer learning from a large-scale classification dataset. The introduced approach is compared with widely used local-feature-based methods (SURF, ORB), and the results indicate a clear improvement over these baselines. In addition, a variant of the proposed architecture containing a spatial pyramid pooling (SPP) layer is evaluated and shown to further improve the performance.
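The abstract states only that the network regresses the relative rotation and translation between two views. The NumPy sketch below shows, under stated assumptions, how such a regression target could be derived from two absolute camera poses and how a prediction might be scored. Assumptions (not taken from the chapter): world-to-camera extrinsics x_cam = R x_world + t; rotation encoded as a unit quaternion (w, x, y, z) with w ≥ 0; a PoseNet-style weighted loss in the manner of Kendall et al. (listed in the references) with a hypothetical weight `beta=10.0`. All function names are hypothetical illustrations, not the authors' code. Translation is compared by direction only, because feature-based pipelines that estimate an essential matrix recover translation only up to scale.

```python
import numpy as np


def relative_pose(R1, t1, R2, t2):
    """Relative pose (R_rel, t_rel) mapping camera-1 coordinates to camera-2.

    Assumes world-to-camera extrinsics: x_cam = R @ x_world + t.
    """
    R_rel = R2 @ R1.T
    t_rel = t2 - R_rel @ t1
    return R_rel, t_rel


def rotmat_to_quat(R):
    """Unit quaternion (w, x, y, z) from a 3x3 rotation matrix, sign fixed so w >= 0."""
    tr = np.trace(R)
    if tr > 0:
        s = 2.0 * np.sqrt(tr + 1.0)  # s = 4w
        q = [0.25 * s, (R[2, 1] - R[1, 2]) / s,
             (R[0, 2] - R[2, 0]) / s, (R[1, 0] - R[0, 1]) / s]
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2])  # s = 4x
        q = [(R[2, 1] - R[1, 2]) / s, 0.25 * s,
             (R[0, 1] + R[1, 0]) / s, (R[0, 2] + R[2, 0]) / s]
    elif R[1, 1] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2])  # s = 4y
        q = [(R[0, 2] - R[2, 0]) / s, (R[0, 1] + R[1, 0]) / s,
             0.25 * s, (R[1, 2] + R[2, 1]) / s]
    else:
        s = 2.0 * np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1])  # s = 4z
        q = [(R[1, 0] - R[0, 1]) / s, (R[0, 2] + R[2, 0]) / s,
             (R[1, 2] + R[2, 1]) / s, 0.25 * s]
    q = np.asarray(q) / np.linalg.norm(q)
    # q and -q encode the same rotation; pick the w >= 0 hemisphere.
    return q if q[0] >= 0 else -q


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two rotation matrices."""
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def translation_direction_error_deg(t_est, t_gt):
    """Angle in degrees between two translation vectors (scale is ignored)."""
    cos_angle = np.dot(t_est, t_gt) / (np.linalg.norm(t_est) * np.linalg.norm(t_gt))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def pose_loss(q_pred, t_pred, q_gt, t_gt, beta=10.0):
    """PoseNet-style loss ||t_pred - t_gt|| + beta * ||q_pred - q_gt||.

    beta=10.0 is a hypothetical weight; the predicted quaternion is normalised first.
    """
    q_pred = np.asarray(q_pred, dtype=float)
    q_pred = q_pred / np.linalg.norm(q_pred)
    t_diff = np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)
    q_diff = q_pred - np.asarray(q_gt, dtype=float)
    return float(np.linalg.norm(t_diff) + beta * np.linalg.norm(q_diff))


def rot_z(deg):
    """Rotation by `deg` degrees about the z-axis (helper for the example)."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    # Two cameras rotated 10 and 40 degrees about z; camera 2 offset along x.
    R_rel, t_rel = relative_pose(rot_z(10), np.zeros(3), rot_z(40), np.array([1.0, 0.0, 0.0]))
    q_rel = rotmat_to_quat(R_rel)
    print("rotation error vs. 30 deg about z:", rotation_error_deg(R_rel, rot_z(30)))
    print("quaternion target:", q_rel, "translation target:", t_rel)
    print("loss of a perfect prediction:", pose_loss(q_rel, t_rel, q_rel, t_rel))
```

Fixing the quaternion sign matters for a Euclidean regression loss: without it, a prediction of −q for a target q would be penalised even though both describe the same rotation.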

References

Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Tola, E., Lepetit, V., Fua, P.: DAISY: an efficient dense descriptor applied to wide baseline stereo. IEEE Trans. PAMI 32, 815–830 (2010)
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_32
Engel, J., Schöps, T., Cremers, D.: LSD-SLAM: large-scale direct monocular SLAM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 834–849. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10605-2_54
Jensen, R., Dahl, A., Vogiatzis, G., Tola, E., Aanæs, H.: Large scale multi-view stereopsis evaluation. In: CVPR, pp. 406–413 (2014)
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 346–361. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_23
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. In: ICCV (2011)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 778–792. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_56
DeTone, D., Malisiewicz, T., Rabinovich, A.: Deep image homography estimation. CoRR abs/1606.03798 (2016)
Konda, K., Memisevic, R.: Learning visual odometry with a convolutional network. In: VISIGRAPP (2015)
Mohanty, V., Agrawal, S., Datta, S., Ghosh, A., Sharma, V.D., Chakravarty, D.: DeepVO: a deep learning approach for monocular visual odometry. CoRR abs/1611.06069 (2016)
Ummenhofer, B., Zhou, H., Uhrig, J., Mayer, N., Ilg, E., Dosovitskiy, A., Brox, T.: DeMoN: depth and motion network for learning monocular stereo. CoRR abs/1612.02401 (2016)
Kendall, A., Grimes, M., Cipolla, R.: PoseNet: a convolutional network for real-time 6-DOF camera relocalization. In: ICCV (2015)
Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: CVPR (2005)
Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A.: Learning deep features for scene recognition using places database. In: NIPS (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc. (2012)
Babenko, A., Lempitsky, V.S.: Aggregating deep convolutional features for image retrieval. In: ICCV (2015)
Razavian, A.S., Sullivan, J., Maki, A., Carlsson, S.: Visual instance retrieval with deep convolutional networks. CoRR abs/1412.6574 (2014)
Azizpour, H., Razavian, A.S., Sullivan, J., Maki, A., Carlsson, S.: From generic to specific deep representations for visual recognition. In: CVPRW (2015)
Wilson, K., Snavely, N.: Robust global translations with 1DSfM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 61–75. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_5
Moulon, P., Monasse, P., Marlet, R., et al.: OpenMVG: an open multiple view geometry library (2012). https://github.com/openMVG/openMVG
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2014)
Collobert, R., Kavukcuoglu, K., Farabet, C.: Torch7: a matlab-like environment for machine learning. In: BigLearn, NIPS Workshop (2011)

Author information

Authors and Affiliations

Aalto University, Helsinki, Finland
Iaroslav Melekhov, Juha Ylioinas & Juho Kannala
Tampere University of Technology, Tampere, Finland
Esa Rahtu

Corresponding author

Correspondence to Iaroslav Melekhov.

Editor information

Editors and Affiliations

DGA, Paris, France
Jacques Blanc-Talon
University of Antwerp, Antwerp, Belgium
Rudi Penne
Ghent University - imec, Ghent, Belgium
Wilfried Philips
CSIRO Data61, Canberra, Australian Capital Territory, Australia
Dan Popescu
University of Antwerp, Wilrijk, Belgium
Paul Scheunders

About this paper

Cite this paper

Melekhov, I., Ylioinas, J., Kannala, J., Rahtu, E. (2017). Relative Camera Pose Estimation Using Convolutional Neural Networks. In: Blanc-Talon, J., Penne, R., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2017. Lecture Notes in Computer Science, vol 10617. Springer, Cham. https://doi.org/10.1007/978-3-319-70353-4_57

DOI: https://doi.org/10.1007/978-3-319-70353-4_57
Published: 23 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70352-7
Online ISBN: 978-3-319-70353-4
eBook Packages: Computer Science, Computer Science (R0)
