
Hybrid3D: learning 3D hybrid features with point clouds and multi-view images for point cloud registration

  • Research Paper
  • Published in Science China Information Sciences

Abstract

In recent years, point cloud registration has achieved great success by learning geometric features with deep learning techniques. However, existing approaches that rely on pure geometric context still suffer from sensor noise and geometric ambiguities (e.g., flat or symmetric structures), which limit their robustness in real-world scenes. When 3D point clouds are constructed from RGB-D cameras, the learned features can be enhanced with complementary texture information from the RGB images. To this end, we propose to learn a 3D hybrid feature that fully exploits the multi-view color images and point clouds of indoor RGB-D scene scans. Specifically, to address the discrepancy between 2D and 3D observations, we extract informative 2D features from the image planes and take only these features for fusion. We then employ a novel soft-fusion module that associates and fuses the hybrid features in a unified space while alleviating the ambiguities of 2D–3D feature binding. Finally, we develop a self-supervised feature scoring module customized for our multi-modal hybrid features, which significantly improves keypoint selection quality in noisy indoor scene scans. Our method shows registration performance competitive with previous methods on two real-world datasets.
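The core ideas in the abstract (associating each 3D point with 2D features from the views that observe it, then fusing them with soft weights rather than a hard one-to-one binding) can be illustrated with a minimal sketch. This is not the paper's implementation; the pinhole projection, the nearest-view sampling, and the softmax weighting below are generic assumptions introduced only for illustration.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project Nx3 world points into a camera with intrinsics K and pose (R, t).

    Returns pixel coordinates (Nx2) and a visibility mask for points that lie
    in front of the camera. A standard pinhole model, assumed for illustration.
    """
    cam = points @ R.T + t                       # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6                  # points behind the camera are invisible
    uvw = cam @ K.T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)  # perspective divide
    return uv, in_front

def soft_fuse(feat3d, feats2d, scores):
    """Fuse one per-point 3D feature with 2D features gathered from V views.

    feats2d: VxC2 array of 2D features sampled at the point's projections.
    scores:  V association logits (views where the point is invisible should
             carry a very negative score). A softmax turns the logits into
             soft association weights, so no single view is bound rigidly to
             the point -- a stand-in for the "soft-fusion" idea, not the
             paper's actual module.
    """
    w = np.exp(scores - scores.max())            # numerically stable softmax
    w = w / w.sum()
    fused2d = w @ feats2d                        # weighted average over views
    return np.concatenate([feat3d, fused2d])     # hybrid 3D + 2D descriptor
```

A point at depth 2 m on the optical axis projects to the principal point, and a view with a dominated logit contributes essentially nothing to the fused descriptor.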



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant No. 61932003) and the ZJU-SenseTime Joint Lab of 3D Vision.

Author information


Corresponding author

Correspondence to Hujun Bao.

Additional information

Supporting information

Appendixes A and B. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.



About this article


Cite this article

Yang, B., Huang, Z., Li, Y. et al. Hybrid3D: learning 3D hybrid features with point clouds and multi-view images for point cloud registration. Sci. China Inf. Sci. 66, 172101 (2023). https://doi.org/10.1007/s11432-022-3604-6

