Robust camera pose estimation by viewpoint classification using deep learning

Nakajima, Yoshikatsu; Saito, Hideo

doi:10.1007/s41095-016-0067-z

Research Article
Open access
Published: 06 December 2016

Volume 3, pages 189–198 (2017)

Computational Visual Media

Abstract

Camera pose estimation with respect to target scenes is an important technology for superimposing virtual information in augmented reality (AR). However, it is difficult to estimate the camera pose for all possible view angles because feature descriptors such as SIFT are not completely invariant to changes in viewpoint. We propose a novel method of robust camera pose estimation using multiple feature descriptor databases, one generated for each partitioned viewpoint, within which the feature descriptor of each keypoint is almost invariant. Our method estimates the viewpoint class of each input image using deep learning, based on a set of training images prepared for each viewpoint class. We give two ways to prepare these images for deep learning and database generation. In the first, images are generated using a projection matrix to ensure robust learning in a range of environments with changing backgrounds. The second uses real images to learn a given environment around a planar pattern. Our evaluation results confirm that our approach increases the number of correct matches and the accuracy of camera pose estimation compared to the conventional method.
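
The idea in the abstract of partitioning viewpoints into classes and keeping one descriptor database per class can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the article: the equal-width azimuth/elevation binning, the bin counts, the function names `viewpoint_class` and `select_database`, and the placeholder string "descriptors". The class scores stand in for the output of the deep-learning classifier, which is not implemented here.

```python
from typing import Dict, List, Sequence


def viewpoint_class(azimuth_deg: float, elevation_deg: float,
                    n_azimuth: int = 8, n_elevation: int = 3) -> int:
    """Map a viewing direction to a viewpoint-class index.

    Assumption: azimuth in [0, 360) and elevation in [0, 90] are split
    into equal-width bins. The article's actual partition may differ.
    """
    az_bin = int((azimuth_deg % 360.0) / (360.0 / n_azimuth))
    az_bin = min(az_bin, n_azimuth - 1)  # guard against float rounding
    el = min(max(elevation_deg, 0.0), 90.0)  # clamp to the hemisphere
    el_bin = min(int(el / (90.0 / n_elevation)), n_elevation - 1)
    return el_bin * n_azimuth + az_bin


def select_database(class_scores: Sequence[float],
                    databases: Dict[int, List[str]]) -> List[str]:
    """Return the descriptor database of the highest-scoring class.

    `class_scores` is a stand-in for the classifier output, and each
    database is a list of placeholder descriptor strings.
    """
    best = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return databases[best]


if __name__ == "__main__":
    # Hypothetical databases for three viewpoint classes.
    dbs = {0: ["front descriptors"], 1: ["left descriptors"],
           2: ["right descriptors"]}
    print(viewpoint_class(45.0, 0.0))
    print(select_database([0.1, 0.7, 0.2], dbs))
```

In a full pipeline, the input image's keypoints would then be matched only against the selected database, so that the descriptors being compared were extracted under a similar viewpoint.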

Author information

Authors and Affiliations

Department of Science and Technology, Keio University, Keio, Japan
Yoshikatsu Nakajima & Hideo Saito

Corresponding author

Correspondence to Yoshikatsu Nakajima.

Additional information

This article is published with open access at Springerlink.com

Yoshikatsu Nakajima received his B.E. degree in information and computer science from Keio University, Japan, in 2016. Since 2016, he has been a master's student in the Department of Science and Technology at Keio University, Japan. His research interests include augmented reality, SLAM, object recognition, and computer vision.

Hideo Saito received his Ph.D. degree in electrical engineering from Keio University, Japan, in 1992. Since then, he has been on the Faculty of Science and Technology, Keio University. From 1997 to 1999, he was a visiting researcher with the Virtualized Reality Project in the Robotics Institute, Carnegie Mellon University. Since 2006, he has been a full professor in the Department of Information and Computer Science, Keio University. His recent activities for academic conferences include serving as a Program Chair of ACCV 2014, a General Chair of ISMAR 2015, and a Program Chair of ISMAR 2016. His research interests include computer vision and pattern recognition, and their applications to augmented reality, virtual reality, and human-robot interaction.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.

Rights and permissions

Open Access The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Nakajima, Y., Saito, H. Robust camera pose estimation by viewpoint classification using deep learning. Comp. Visual Media 3, 189–198 (2017). https://doi.org/10.1007/s41095-016-0067-z

Received: 25 July 2016
Accepted: 13 November 2016
Published: 06 December 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s41095-016-0067-z
