
DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12349)

Abstract

For relocalization in large-scale point clouds, we propose the first approach that unifies global place recognition and local 6DoF pose refinement. To this end, we design a Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points. It integrates FlexConv and Squeeze-and-Excitation (SE) to ensure that the learned local descriptor captures multi-level geometric information and channel-wise relations. To detect 3D keypoints, we predict the discriminativeness of the local descriptors in an unsupervised manner. We generate the global descriptor by directly aggregating the learned local descriptors with an effective attention mechanism. In this way, local and global 3D descriptors are inferred in a single forward pass. Experiments on various benchmarks demonstrate that our method achieves competitive results for both global point cloud retrieval and local point cloud registration compared with state-of-the-art approaches. To validate the generalizability and robustness of our 3D keypoints, we show that our method also performs favorably, without fine-tuning, on the registration of point clouds generated by a visual SLAM system. Code and related materials are available at https://vision.in.tum.de/research/vslam/dh3d.
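The abstract names three operations on per-point local descriptors: SE channel recalibration, scoring points to pick keypoints, and attention-based aggregation into one global descriptor. The NumPy sketch below illustrates each under stated assumptions; it is not the DH3D implementation. FlexConv is omitted, the weights are random placeholders rather than learned parameters, the keypoint score is a random stand-in for the network's predicted discriminativeness, and, because the abstract does not specify the aggregation layer, plain attention-weighted average pooling is used as a simplified substitute. The function names `squeeze_excitation`, `attention_pool`, and `top_k_keypoints` are hypothetical.

```python
import numpy as np

def squeeze_excitation(feats, w1, w2):
    """SE-style channel recalibration of per-point features.

    feats: (N, C) features for N points; w1: (C, C // r), w2: (C // r, C)
    are the two fully connected layers (placeholders here, learned in practice).
    """
    z = feats.mean(axis=0)               # squeeze: average over all points -> (C,)
    h = np.maximum(z @ w1, 0.0)          # excitation FC 1 + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))  # excitation FC 2 + sigmoid -> (C,) in (0, 1)
    return feats * s                     # rescale every channel of every point

def attention_pool(feats, w_att):
    """Aggregate (N, C) local descriptors into one L2-normalized (C,) descriptor.

    Simplified stand-in: softmax attention over points, then a weighted sum.
    """
    logits = feats @ w_att               # one attention logit per point -> (N,)
    logits = logits - logits.max()       # shift for a numerically stable softmax
    a = np.exp(logits) / np.exp(logits).sum()
    g = (a[:, None] * feats).sum(axis=0) # attention-weighted sum -> (C,)
    return g / (np.linalg.norm(g) + 1e-12)

def top_k_keypoints(scores, k):
    """Indices of the k points with the highest score, best first."""
    return np.argsort(-scores)[:k]

# Usage with random placeholder data (hypothetical sizes).
rng = np.random.default_rng(0)
N, C, r = 1024, 32, 4
local = rng.standard_normal((N, C))
local = squeeze_excitation(local, rng.standard_normal((C, C // r)),
                           rng.standard_normal((C // r, C)))
local /= np.linalg.norm(local, axis=1, keepdims=True)  # unit-length local descriptors
global_desc = attention_pool(local, rng.standard_normal(C))
scores = rng.random(N)  # placeholder for the predicted per-point discriminativeness
kp = top_k_keypoints(scores, 16)
print(global_desc.shape, kp.shape)  # (32,) (16,)
```

Because the global descriptor is computed from the same local descriptors that the keypoint scores are attached to, both outputs come from one pass over the point features, which is the property the abstract emphasizes.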

Keywords

Point clouds · 3D deep learning · Relocalization

Supplementary material

Supplementary material 1: 504439_1_En_43_MOESM1_ESM.pdf (PDF, 30.4 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Technical University of Munich, Garching, Germany
  2. Artisense, Garching, Germany
