Learning to Detect Good 3D Keypoints

International Journal of Computer Vision

Abstract

The established approach to 3D keypoint detection consists of defining effective handcrafted saliency functions based on geometric cues, with the aim of maximizing keypoint repeatability. In contrast, the idea behind our work is to learn a descriptor-specific keypoint detector so as to optimize the end-to-end performance of the feature matching pipeline. Accordingly, we cast 3D keypoint detection as a classification problem between surface patches that can or cannot be matched correctly by a given 3D descriptor, i.e., those that are either good or not with respect to that descriptor. We propose a machine learning framework that allows for defining examples of good surface patches from the training data and leverages Random Forest classifiers to realize both fixed-scale and adaptive-scale 3D keypoint detectors. Through extensive experiments on standard datasets, we show that feature matching performance improves significantly when 3D descriptors are deployed together with companion detectors learned by our methodology, compared with established state-of-the-art 3D detectors based on handcrafted saliency functions.
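
The classification view described in the abstract can be illustrated with a minimal sketch of how training labels might be derived. It assumes two views of the same surface, one descriptor vector per point, and known ground-truth correspondences. The function name `label_good_points`, the Euclidean nearest-neighbour matching rule, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math

def nearest_index(query, candidates):
    """Return the index of the candidate descriptor closest to `query` (Euclidean)."""
    return min(range(len(candidates)), key=lambda j: math.dist(query, candidates[j]))

def label_good_points(desc_a, desc_b, ground_truth):
    """Label each point of view A as good (1) or not (0).

    desc_a, desc_b: lists of descriptor vectors for two views (hypothetical inputs).
    ground_truth:   ground_truth[i] is the index in view B of the point that
                    truly corresponds to point i in view A.
    A point is 'good' when the nearest neighbour of its descriptor in view B is
    the true correspondence, i.e. the descriptor matches it correctly.
    """
    return [1 if nearest_index(d, desc_b) == ground_truth[i] else 0
            for i, d in enumerate(desc_a)]

# Toy example: three points with 2D "descriptors" (made-up values).
desc_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (5.1, 5.0), (0.9, 1.2)]
ground_truth = [0, 2, 2]  # point 2 of A truly corresponds to index 2 in B
labels = label_good_points(desc_a, desc_b, ground_truth)
```

Labels of this kind could then serve as targets for a classifier, such as the Random Forest mentioned in the abstract, trained on features of the surface patch around each point.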

Notes

  1. www.pointclouds.org.

  2. www.opencv.org.

  3. http://graphics.stanford.edu/data/3Dscanrep/.

  4. http://www.dsi.unive.it/~rodola/data.html.

  5. The increased minimum number of samples was also motivated by memory-management limitations of the OpenCV “io” module, which we used to save the forest to disk and load it back. Indeed, the adopted implementation cannot correctly handle forests that are too large: increasing the minimum number of samples reduced the average depth of each tree in the forest and, thereby, the final file size of the forest.

  6. http://github.com/CVLAB-Unibo/Keypoint-Learning.
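
As a rough illustration of the reasoning in Note 5, the sketch below computes the depth of an idealized tree that always splits a node into two equal halves and stops once a node holds fewer samples than the minimum. The perfectly balanced split, the sample counts, and the function name `balanced_tree_depth` are simplifying assumptions; real forests split unevenly and also stop growing for other reasons.

```python
def balanced_tree_depth(n_samples, min_samples):
    """Depth reached by an idealized tree with perfectly balanced splits.

    A node is split only while it holds at least `min_samples` samples
    (and more than one sample); each split halves the node.
    """
    depth = 0
    while n_samples >= min_samples and n_samples > 1:
        n_samples //= 2
        depth += 1
    return depth

# Hypothetical training-set size; a larger minimum gives a shallower tree.
shallow = balanced_tree_depth(1_000_000, 1000)
deep = balanced_tree_depth(1_000_000, 10)
```

Under these assumptions the tree with the larger minimum stops several levels earlier, and since the number of nodes grows roughly exponentially with depth, the serialized forest shrinks accordingly.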

References

  • Aldoma, A., Fäulhammer, T., & Vincze, M. (2014). Automation of “ground truth” annotation for multi-view RGB-D object instance recognition datasets. In Proceedings of international conference on intelligent robots and systems (IROS).

  • Aldoma, A., Marton, Z., Tombari, F., Wohlkinger, W., Potthast, C., Zeisl, B., et al. (2012a). Point cloud library: Three-dimensional object recognition and 6 dof pose estimation. IEEE Robotics and Automation Magazine (RAM), 19(3), 80–91.

  • Aldoma, A., Tombari, F., Di Stefano, L., & Vincze, M. (2012b). A global hypotheses verification method for 3d object recognition. In European conference on computer vision (ECCV), Lecture Notes in Computer Science (Vol. 7574, pp. 511–524). Berlin, Heidelberg: Springer. doi:10.1007/978-3-642-33712-3_37.

  • Alexandre, L. (2012). 3d descriptors for object and category recognition: A comparative evaluation. In IROS workshop on color-depth camera fusion in robotics.

  • Bariya, P., & Nishino, K. (2010). Scale-hierarchical 3d object recognition in cluttered scenes. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 1657–1664. doi:10.1109/CVPR.2010.5539774.

  • Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3), 346–359.

  • Behley, J., Steinhage, V., & Cremers, A. (2012). Performance of histogram descriptors for the classification of 3d laser range data in urban environments. In International conference on robotics and automation.

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. doi:10.1023/A:1010933404324.

  • Castellani, U., Cristani, M., & Fantoni, S. (2008). Sparse points matching by combining 3D mesh saliency with statistical descriptors. In Proceedings of computer graphics forum, pp. 643–652.

  • Creusot, C., Pears, N., & Austin, J. (2013). A machine-learning approach to keypoint detection and landmarking on 3d meshes. International Journal of Computer Vision, 102(1–3), 146–179. doi:10.1007/s11263-012-0605-9.

  • Criminisi, A., Shotton, J., & Konukoglu, E. (2012). Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Foundations and Trends in Computer Graphics and Vision, 7(2–3), 81–227. http://research.microsoft.com/apps/pubs/default.aspx?id=158806.

  • Dutagaci, H., Cheung, C., & Godil, A. (2012). Evaluation of 3d interest point detection techniques via human-generated ground truth. The Visual Computer, 28(9), 901–917. doi:10.1007/s00371-012-0746-4.

  • Guo, Y., Bennamoun, M., Sohel, F., Lu, M., Wan, J., & Kwok, N. M. (2016). A comprehensive performance evaluation of 3d local feature descriptors. International Journal of Computer Vision, 116(1), 66–89.

  • Guo, Y., Sohel, F., Bennamoun, M., Lu, M., & Wan, J. (2013a). Rotational projection statistics for 3d local surface description and object recognition. International Journal of Computer Vision, 105(1), 63–86.

  • Guo, Y., Sohel, F., Bennamoun, M., Lu, M., & Wan, J. (2013b). Trisi: A distinctive local surface descriptor for 3d modeling and object recognition. In 8th international conference on computer graphics theory and applications (GRAPP).

  • Hartmann, W., Havlena, M., & Schindler, K. (2014). Predicting matchability. In 2014 IEEE conference on computer vision and pattern recognition (CVPR), pp. 9–16. doi:10.1109/CVPR.2014.9.

  • Holzer, S., Shotton, J., & Kohli, P. (2012). Learning to efficiently detect repeatable interest points in depth data. In 2012 IEEE European conference on computer vision (ECCV).

  • Johnson, A., & Hebert, M. (1999). Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5), 433–449.

  • Leutenegger, S., Chli, M., & Siegwart, R. (2011). BRISK: Binary robust invariant scalable keypoints. In 2011 IEEE international conference on computer vision (ICCV), pp. 2548–2555. doi:10.1109/ICCV.2011.6126542.

  • Li, Y., Wang, S., Tian, Q., & Ding, X. (2015). A survey of recent advances in visual feature detection. Neurocomputing, 149 Part B, 736–751. http://www.sciencedirect.com/science/article/pii/S0925231214010121

  • Lin, X., Zhu, C., Zhang, Q., & Liu, Y. (2016). 3d keypoint detection based on deep neural network with sparse autoencoder. arXiv preprint arXiv:1605.00129.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

  • Mian, A. S., Bennamoun, M., & Owens, R. A. (2010). On the repeatability and quality of keypoints for local feature-based 3D object retrieval from cluttered scenes. International Journal of Computer Vision, 89(2–3), 348–361.

  • Ovsjanikov, M., Huang, Q., & Guibas, L. (2011). A condition number for non-rigid shape matching. In Eurographics symposium on geometry processing, Vol. 30.

  • Proença, P. F., Gaspar, F., & Dias, M. S. (2013). Good appearance and shape descriptors for object category recognition. In Advances in visual computing. Lecture notes in computer science (Vol. 8033, pp. 385–394). Springer: Berlin, Heidelberg.

  • Rodolà, E., Albarelli, A., Bergamasco, F., & Torsello, A. (2013). A scale independent selection process for 3D object recognition in cluttered scenes. International Journal of Computer Vision, 102(1–3), 129–145. doi:10.1007/s11263-012-0568-x.

  • Rosten, E., Porter, R., & Drummond, T. (2010). Faster and better: A machine learning approach to corner detection. The IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1), 105–119.

  • Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In IEEE international conference on computer vision, pp. 2564–2571. http://doi.ieeecomputersociety.org/10.1109/ICCV.2011.6126544.

  • Rusu, R. B., Blodow, N., & Beetz, M. (2009). Fast point feature histograms (FPFH) for 3D registration. In International conference on robotics and automation, pp. 3212–3217. doi:10.1109/ROBOT.2009.5152473.

  • Salti, S., Tombari, F., & Di Stefano, L. (2014). SHOT: Unique signatures of histograms for surface and texture description. Computer Vision and Image Understanding, 125, 251–264. doi:10.1016/j.cviu.2014.04.011, http://www.sciencedirect.com/science/article/pii/S1077314214000988.

  • Salti, S., Tombari, F., Spezialetti, R., & Di Stefano, L. (2015). Learning a descriptor-specific 3D keypoint detector. In The IEEE international conference on computer vision (ICCV), pp. 2318–2326.

  • Shi, J., & Tomasi, C. (1994). Good features to track. In 1994 IEEE conference on computer vision and pattern recognition (CVPR’94), pp. 593–600.

  • Steder, B., Rusu, R. B., Konolige, K., & Burgard, W. (2011). Point feature extraction on 3d range scans taking into account object boundaries. In 2011 IEEE international conference on robotics and automation (ICRA) (pp. 2601–2608). IEEE.

  • Strecha, C., Lindner, A., Ali, K., & Fua, P. (2009). Training for task specific keypoint detection. In J. Denzler, G. Notni, & H. Se (Eds.), Pattern recognition: Lecture notes in computer science (Vol. 5748, pp. 151–160). Berlin, Heidelberg: Springer. doi:10.1007/978-3-642-03798-6_16.

  • Sukno, F., Waddington, J., & Whelan, P. (2012). Comparing 3d descriptors for local search of craniofacial landmarks. In International symposium on visual computing (ISVC).

  • Sun, J., Ovsjanikov, M., & Guibas, L. (2009). A concise and provably informative multi-scale signature based on heat diffusion. In Eurographics symposium on geometry processing, Vol. 28.

  • Taati, B., Bondy, M., Jasiobedzki, P., & Greenspan, M. (October 2007). Variable dimensional local shape descriptors for object recognition in range data. In Proceedings of the 11th IEEE international conference on computer vision; Rio de Janeiro, Brazil, Vol. 1421, p. 18.

  • Teran, L., & Mordohai, P. (2014). 3D interest point detection via discriminative learning. In ECCV 2014. Lecture notes in computer science (Vol. 8689, pp. 159–173). Springer. doi:10.1007/978-3-319-10590-1_11.

  • Tombari, F., Salti, S., & Di Stefano, L. (2013). Performance evaluation of 3d keypoint detectors. International Journal of Computer Vision, 102(1–3), 198–220. doi:10.1007/s11263-012-0545-4.

  • Tuytelaars, T., & Mikolajczyk, K. (2008). Local invariant feature detectors: A survey. Foundations and Trends in Computer Graphics and Vision, 3(3), 177–280.

  • Verdie, Y., Yi, K. M., Fua, P., & Lepetit, V. (2015). TILDE: A temporally invariant learned DEtector. In Proceedings of the computer vision and pattern recognition.

  • Wohlkinger, W., Aldoma, A., Rusu, R., & Vincze, M. (2012). 3dnet: Large-scale object class recognition from cad models. In International conference on robotics and automation (ICRA).

  • Zaharescu, A., Boyer, E., Varanasi, K., & Horaud, R. (2009). Surface feature detection and description with applications to mesh matching. In Proceedings of international conference on computer vision and pattern recognition (CVPR), pp. 373–380.

  • Zhong, Y. (2009). Intrinsic shape signatures: A shape descriptor for 3D object recognition. In Proceedings of international conference on computer vision workshops, pp. 1–8.

Author information

Corresponding author

Correspondence to Federico Tombari.

Additional information

Communicated by Ko Nishino.

Samuele Salti and Federico Tombari: This work was done when at DISI, University of Bologna.

About this article

Cite this article

Tonioni, A., Salti, S., Tombari, F. et al. Learning to Detect Good 3D Keypoints. Int J Comput Vis 126, 1–20 (2018). https://doi.org/10.1007/s11263-017-1037-3

  • DOI: https://doi.org/10.1007/s11263-017-1037-3
