Learning to Detect Good 3D Keypoints

Tonioni, Alessio; Salti, Samuele; Tombari, Federico; Spezialetti, Riccardo; Stefano, Luigi Di

doi:10.1007/s11263-017-1037-3

Published: 08 August 2017

Volume 126, pages 1–20, (2018)

International Journal of Computer Vision

Alessio Tonioni¹,
Samuele Salti²,
Federico Tombari³ (ORCID: orcid.org/0000-0001-5598-5212),
Riccardo Spezialetti¹ &
Luigi Di Stefano¹

Abstract

The established approach to 3D keypoint detection consists of defining effective hand-crafted saliency functions, based on geometric cues, with the aim of maximizing keypoint repeatability. In contrast, the idea behind our work is to learn a descriptor-specific keypoint detector so as to optimize the end-to-end performance of the feature matching pipeline. Accordingly, we cast 3D keypoint detection as a binary classification problem: surface patches that a given 3D descriptor can match correctly are good with respect to that descriptor, and those it cannot match correctly are not. We propose a machine learning framework that defines examples of good surface patches from the training data and leverages Random Forest classifiers to realize both fixed-scale and adaptive-scale 3D keypoint detectors. Through extensive experiments on standard datasets, we show that feature matching performance improves significantly when 3D descriptors are deployed together with companion detectors learned by our methodology, compared with established state-of-the-art 3D detectors based on hand-crafted saliency functions.
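The classification view described in the abstract can be illustrated with a toy labeling step. The following is a minimal sketch, not the authors' procedure: it assumes that each surface patch is already summarized by a descriptor vector, that ground-truth correspondences between two views are known, and that a patch counts as good when its nearest neighbor in descriptor space is its true counterpart. The function name `label_good_patches` and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def label_good_patches(desc_a, desc_b, gt_match):
    """Label each patch of view A as good (True) or not (False).

    desc_a, desc_b: descriptor vectors of the patches in views A and B.
    gt_match[i]: index in view B of the true counterpart of patch i in A.
    Assumption: a patch is "good" when its nearest neighbour in
    descriptor space is its ground-truth counterpart.
    """
    labels = []
    for i, d in enumerate(desc_a):
        nearest = min(range(len(desc_b)), key=lambda j: dist(d, desc_b[j]))
        labels.append(nearest == gt_match[i])
    return labels


# Toy 2D "descriptors"; real 3D descriptors have many more dimensions.
desc_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (4.9, 5.1), (1.2, 0.9)]
gt_match = [0, 1, 2]

labels = label_good_patches(desc_a, desc_b, gt_match)
print(labels)
```

Labels of this kind could serve as training targets for a binary classifier such as a Random Forest, which would then score unseen patches at detection time. How the features fed to the classifier are computed, and how scale is handled, is left out of this sketch.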

Notes

www.pointclouds.org.
www.opencv.org.
http://graphics.stanford.edu/data/3Dscanrep/.
http://www.dsi.unive.it/~rodola/data.html.
The increased minimum number of samples was motivated also by limitations concerning memory management by the OpenCV “io” module which we used to save and load the forest to and from disk. Indeed, the adopted implementation cannot handle correctly forests that are too large: increasing the minimum number of samples reduced the average depth of each tree in the forest and, thereby, the final file size of the forest.
http://github.com/CVLAB-Unibo/Keypoint-Learning.

Author information

Authors and Affiliations

DISI, University of Bologna, Bologna, Italy
Alessio Tonioni, Riccardo Spezialetti & Luigi Di Stefano
Fleetmatics Research, Florence, Italy
Samuele Salti
CAMP, Technical University of Munich, Munich, Germany
Federico Tombari

Corresponding author

Correspondence to Federico Tombari.

Additional information

Communicated by Ko Nishino.

Samuele Salti and Federico Tombari: this work was done while the authors were at DISI, University of Bologna.

About this article

Cite this article

Tonioni, A., Salti, S., Tombari, F. et al. Learning to Detect Good 3D Keypoints. Int J Comput Vis 126, 1–20 (2018). https://doi.org/10.1007/s11263-017-1037-3

Received: 19 November 2016
Accepted: 20 July 2017
Published: 08 August 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s11263-017-1037-3
