Skip to main content

H3DNet: 3D Object Detection Using Hybrid Geometric Primitives

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12357))

Abstract

We introduce H3DNet, which takes a colorless 3D point cloud as input and outputs a collection of oriented object bounding boxes (or BB) and their semantic labels. The critical idea of H3DNet is to predict a hybrid set of geometric primitives, i.e., BB centers, BB face centers, and BB edge centers. We show how to convert the predicted geometric primitives into object proposals by defining a distance function between an object and the geometric primitives. This distance function enables continuous optimization of object proposals, and its local minimums provide high-fidelity object proposals. H3DNet then utilizes a matching and refinement module to classify object proposals into detected objects and fine-tune the geometric parameters of the detected objects. The hybrid set of geometric primitives not only provides more accurate signals for object detection than using a single type of geometric primitives, but it also provides an overcomplete set of constraints on the resulting 3D layout. Therefore, H3DNet can tolerate outliers in predicted geometric primitives. Our model achieves state-of-the-art 3D detection results on two large datasets with real 3D scans, ScanNet and SUN RGB-D. Our code is open-sourced at here.

B. Sun and H. Yang—Equal Contribution.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Akata, Z., Malinowski, M., Fritz, M., Schiele, B.: Multi-cue zero-shot learning with strong supervision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 59–68 (2016)

    Google Scholar 

  2. Atzmon, M., Maron, H., Lipman, Y.: Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091 (2018)

  3. Baxter, J.: A Bayesian/information theoretic model of learning to learn via multiple task sampling. Mach. Learn. 28(1), 7–39 (1997)

    Article  Google Scholar 

  4. Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1907–1915 (2017)

    Google Scholar 

  5. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Niessner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

    Google Scholar 

  6. Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 605–613 (2017)

    Google Scholar 

  7. Graham, B., Engelcke, M., van der Maaten, L.: 3D semantic segmentation with submanifold sparse convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9224–9232 (2018)

    Google Scholar 

  8. Guibas, L.J., Huang, Q., Liang, Z.: A condition number for joint optimization of cycle-consistent networks. In: Advances in Neural Information Processing Systems, pp. 1007–1017 (2019)

    Google Scholar 

  9. Hermosilla, P., Ritschel, T., Vázquez, P.P., Vinacua, À., Ropinski, T.: Monte Carlo convolution for learning on non-uniformly sampled point clouds. In: SIGGRAPH Asia 2018 Technical Papers, p. 235. ACM (2018)

    Google Scholar 

  10. Hou, J., Dai, A., Niessner, M.: 3D-SIS: 3D semantic instance segmentation of RGB-D scans. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

    Google Scholar 

  11. Hua, B.S., Tran, M.K., Yeung, S.K.: Pointwise convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 984–993 (2018)

    Google Scholar 

  12. Kendall, A., Cipolla, R.: Geometric loss functions for camera pose regression with deep learning. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017, pp. 6555–6564 (2017). https://doi.org/10.1109/CVPR.2017.694

  13. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7482–7491 (2018)

    Google Scholar 

  14. Kim, B., Xu, S., Savarese, S.: Accurate localization of 3D objects from RGB-D data using segmentation hypotheses. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013, pp. 3182–3189 (2013). https://doi.org/10.1109/CVPR.2013.409

  15. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, Conference Track Proceedings, ICLR 2015, San Diego, CA, USA, 7–9 May 2015 (2015). http://arxiv.org/abs/1412.6980

  16. Klokov, R., Lempitsky, V.: Escape from cells: deep Kd-networks for the recognition of 3D point cloud models. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 863–872 (2017)

    Google Scholar 

  17. Lahoud, J., Ghanem, B.: 2D-driven 3D object detection in RGB-D images. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017, pp. 4632–4640 (2017). https://doi.org/10.1109/ICCV.2017.495

  18. Lahoud, J., Ghanem, B.: 2D-driven 3D object detection in RGB-D images. In: The IEEE International Conference on Computer Vision (ICCV), October 2017

    Google Scholar 

  19. Lahoud, J., Ghanem, B., Pollefeys, M., Oswald, M.R.: 3D instance segmentation via multi-task metric learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9256–9266 (2019)

    Google Scholar 

  20. Lepetit, V., Moreno-Noguer, F., Fua, P.: Epnp: an accurateO(n) solution to the pnp problem. Int. J. Comput. Vis. 81(2), 155–166 (2009).https://doi.org/10.1007/s11263-008-0152-6

  21. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on x-transformed points. In: Advances in Neural Information Processing Systems, pp. 820–830 (2018)

    Google Scholar 

  22. Li, Y., Wang, G., Ji, X., Xiang, Y., Fox, D.: DeepIM: deep iterative matching for 6D pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision - ECCV 2018, ECCV 2018. Lecture Notes in Computer Science, vol. 11210, pp. 695–711. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_42

    Chapter  Google Scholar 

  23. Liang, M., Yang, B., Chen, Y., Hu, R., Urtasun, R.: Multi-task multi-sensor fusion for 3d object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7345–7353 (2019)

    Google Scholar 

  24. Lin, D., Fidler, S., Urtasun, R.: Holistic scene understanding for 3d object detection with RGBD cameras. In: IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, 1–8 December 2013, pp. 1417–1424 (2013). https://doi.org/10.1109/ICCV.2013.179

  25. Luvizon, D.C., Picard, D., Tabia, H.: 2D/3D pose estimation and action recognition using multitask deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5137–5146 (2018)

    Google Scholar 

  26. Pavlakos, G., Zhou, X., Chan, A., Derpanis, K.G., Daniilidis, K.: 6-DoF object pose from semantic keypoints. In: 2017 IEEE International Conference on Robotics and Automation, ICRA 2017, Singapore, Singapore, 29 May– 3 June 2017, pp. 2011–2018 (2017). https://doi.org/10.1109/ICRA.2017.7989233

  27. Peng, S., Liu, Y., Huang, Q., Zhou, X., Bao, H.: PVNet: pixel-wise voting network for 6DoF pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019, pp. 4561–4570 (2019). https://doi.org/10.1109/CVPR.2019.00469

  28. Pham, Q.H., Nguyen, T., Hua, B.S., Roig, G., Yeung, S.K.: JSIS3D: joint semantic-instance segmentation of 3D point clouds with multi-task pointwise networks and multi-value conditional random fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8827–8836 (2019)

    Google Scholar 

  29. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3D object detection in point clouds. arXiv preprint arXiv:1904.09664 (2019)

  30. Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum PointNets for 3D object detection from rgb-d data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 918–927 (2018)

    Google Scholar 

  31. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)

    Google Scholar 

  32. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointnNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)

    Google Scholar 

  33. Ren, Z., Sudderth, E.B.: Three-dimensional object detection and layout prediction using clouds of oriented gradients. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

    Google Scholar 

  34. Ruder, S.: An overview of multi-task learning in deep neural networks. CoRR abs/1706.05098 (2017). http://arxiv.org/abs/1706.05098

  35. Sener, O., Koltun, V.: Multi-task learning as multi-objective optimization. In: Advances in Neural Information Processing Systems, pp. 527–538 (2018)

    Google Scholar 

  36. Shi, S., Wang, X., Li, H.: PointRCNN: 3D object proposal generation and detection from point cloud. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–779 (2019)

    Google Scholar 

  37. Song, C., Song, J., Huang, Q.: HybridPose: 6D object pose estimation under hybrid representations. CoRR abs/2001.01869 (2020). http://arxiv.org/abs/2001.01869

  38. Song, S., Lichtenberg, S.P., Xiao, J.: Sun RGB-D: a RGB-D scene understanding benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 567–576 (2015)

    Google Scholar 

  39. Song, S., Xiao, J.: Sliding shapes for 3D object detection in depth images. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. Lecture Notes in Computer Science, vol. 8694, pp. 634–651. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_41

    Chapter  Google Scholar 

  40. Song, S., Xiao, J.: Deep sliding shapes for amodal 3D object detection in RGB-D images. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

    Google Scholar 

  41. Su, H., et al.: SplatNet: sparse lattice networks for point cloud processing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2530–2539 (2018)

    Google Scholar 

  42. Tatarchenko, M., Dosovitskiy, A., Brox, T.: Octree generating networks: efficient convolutional architectures for high-resolution 3D outputs. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2088–2096 (2017)

    Google Scholar 

  43. Tatarchenko, M., Park, J., Koltun, V., Zhou, Q.Y.: Tangent convolutions for dense prediction in 3D. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3887–3896 (2018)

    Google Scholar 

  44. Wang, N., Zhou, W., Tian, Q., Hong, R., Wang, M., Li, H.: Multi-cue correlation filters for robust visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4844–4853 (2018)

    Google Scholar 

  45. Wang, P.S., Liu, Y., Guo, Y.X., Sun, C.Y., Tong, X.: O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Trans. Graph. (TOG) 36(4), 72 (2017)

    Google Scholar 

  46. Wang, X., Liu, S., Shen, X., Shen, C., Jia, J.: Associatively segmenting instances and semantics in point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4096–4105 (2019)

    Google Scholar 

  47. Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. arXiv preprint arXiv:1801.07829 (2018)

  48. Xie, S., Liu, S., Chen, Z., Tu, Z.: Attentional AhapeContextNet for point cloud recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4606–4615 (2018)

    Google Scholar 

  49. Xu, Y., Fan, T., Xu, M., Zeng, L., Qiao, Y.: SpiderCNN: deep learning on point sets with parameterized convolutional filters. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. Lecture Notes in Computer Science, vol. 11212, pp. 90–105. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_6

    Chapter  Google Scholar 

  50. Yang, Y., Feng, C., Shen, Y., Tian, D.: FoldingNet: point cloud auto-encoder via deep grid deformation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 206–215 (2018)

    Google Scholar 

  51. Yang, Z., Yan, S., Huang, Q.: Extreme relative pose network under hybrid representations. CoRR abs/1912.11695 (2019). http://arxiv.org/abs/1912.11695

  52. Yi, L., Zhao, W., Wang, H., Sung, M., Guibas, L.J.: GSPN: generative shape proposal network for 3D instance segmentation in point cloud. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3947–3956 (2019)

    Google Scholar 

  53. Zhang, Y., Yang, Q.: A survey on multi-task learning. CoRR abs/1707.08114 (2017). http://arxiv.org/abs/1707.08114

  54. Zhang, Z., Liang, Z., Wu, L., Zhou, X., Huang, Q.: Path-invariant map networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11084–11094, June 2019

    Google Scholar 

  55. Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4490–4499 (2018)

    Google Scholar 

  56. Zou, Y., Luo, Z., Huang, J.B.: DF-Net: unsupervised joint learning of depth and flow using cross-task consistency. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision - ECCV 2018, ECCV 2018. Lecture Notes in Computer Science, vol. 11209, pp. 36–53. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01228-1_3

    Chapter  Google Scholar 

Download references

Acknowledgement

We would like to acknowledge the support from NSF DMS-1700234, a Gift from Snap Research, and a hardware donation from NVIDIA.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Zaiwei Zhang , Bo Sun , Haitao Yang or Qixing Huang .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 24920 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zhang, Z., Sun, B., Yang, H., Huang, Q. (2020). H3DNet: 3D Object Detection Using Hybrid Geometric Primitives. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12357. Springer, Cham. https://doi.org/10.1007/978-3-030-58610-2_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58610-2_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58609-6

  • Online ISBN: 978-3-030-58610-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics