Abstract
The established approach to 3D keypoint detection consists of defining effective handcrafted saliency functions based on geometric cues, with the aim of maximizing keypoint repeatability. In contrast, the idea behind our work is to learn a descriptor-specific keypoint detector so as to optimize the end-to-end performance of the feature matching pipeline. Accordingly, we cast 3D keypoint detection as a classification problem between surface patches that can or cannot be matched correctly by a given 3D descriptor, i.e., those that are either good or not with respect to that descriptor. We propose a machine learning framework that allows for defining examples of good surface patches from the training data and leverages Random Forest classifiers to realize both fixed-scale and adaptive-scale 3D keypoint detectors. Through extensive experiments on standard datasets, we show that feature matching performance improves significantly when 3D descriptors are deployed together with companion detectors learned by our methodology, compared with established state-of-the-art 3D detectors based on handcrafted saliency functions.
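The notion of a "good" surface patch described above can be illustrated with a toy sketch. It is not the training procedure of the paper: it assumes that descriptors are plain numeric tuples, that matching is nearest-neighbour search under Euclidean distance, and that a ground-truth correspondence between two views is available. The names `nearest_index`, `label_patches`, `view_a`, `view_b` and `ground_truth` are hypothetical and chosen only for this illustration.

```python
import math


def nearest_index(query, candidates):
    """Return the index of the candidate descriptor closest to `query`."""
    return min(range(len(candidates)),
               key=lambda j: math.dist(query, candidates[j]))


def label_patches(view_a, view_b, ground_truth):
    """Label each patch of view_a as good (1) or not (0).

    A patch is labelled 1 when the nearest neighbour of its descriptor
    in view_b is the patch that the ground truth says it corresponds to.
    This is an assumed, simplified criterion for illustration only.
    """
    labels = []
    for i, descriptor in enumerate(view_a):
        matched = nearest_index(descriptor, view_b)
        labels.append(1 if matched == ground_truth[i] else 0)
    return labels


# Toy 2D "descriptors" for three patches seen in two views.
view_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
view_b = [(0.1, 0.0), (3.0, 3.0), (5.0, 5.05)]
# ground_truth[i] is the index in view_b of the patch matching view_a[i].
ground_truth = [0, 1, 2]

labels = label_patches(view_a, view_b, ground_truth)
print(labels)  # [1, 0, 1]
```

Labels of this kind could then serve as targets for a binary classifier such as a Random Forest. The features fed to the classifier and the way positives and negatives are sampled are not specified in the abstract and are left out of the sketch.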
Notes
The increased minimum number of samples was also motivated by memory-management limitations of the OpenCV “io” module, which we used to save the forest to disk and load it back. Indeed, the adopted implementation cannot correctly handle forests that are too large: increasing the minimum number of samples reduced the average depth of each tree in the forest and, thereby, the final file size of the forest.
Additional information
Communicated by Ko Nishino.
Samuele Salti and Federico Tombari: This work was done while the authors were at DISI, University of Bologna.
About this article
Cite this article
Tonioni, A., Salti, S., Tombari, F. et al. Learning to Detect Good 3D Keypoints. Int J Comput Vis 126, 1–20 (2018). https://doi.org/10.1007/s11263-017-1037-3
DOI: https://doi.org/10.1007/s11263-017-1037-3