Finding Your (3D) Center: 3D Object Detection Using a Learned Loss

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12363)


Massive semantically labeled datasets are readily available for 2D images but are much harder to obtain for 3D scenes. Objects in 3D repositories such as ShapeNet are labeled, but regrettably only in isolation, without context. 3D scenes can be acquired by range scanners at city scale, but far fewer of them come with semantic labels. Addressing this disparity, we introduce a new optimization procedure that allows training for 3D detection on raw 3D scans while using as little as 5% of the object labels and still achieving comparable performance. Our optimization uses two networks. A scene network maps an entire 3D scene to a set of 3D object centers. As we assume the scene is not labeled with centers, no classic loss, such as Chamfer, can be used to train it. Instead, we use another network to emulate the loss. This loss network is trained on a small labeled subset and maps a non-centered 3D object, in the presence of distractions, to its own center. This function is very similar to the gradient the supervised loss would provide and hence can be used in its place. Our evaluation documents competitive fidelity at a much lower level of supervision and, respectively, higher quality at comparable supervision. Supplementary material can be found at:
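The idea that a center-predicting function can stand in for the gradient of a supervised loss can be illustrated with a toy example. The following is a minimal sketch under strong assumptions, not the method of the paper: the learned loss network is replaced by a hand-written rule (`toy_loss_network`, a hypothetical name) that returns the offset from a proposed center to the centroid of nearby points, and the scene network is replaced by a single free center that is updated directly. The names `refine_center`, `radius`, `steps` and `lr` are likewise illustrative.

```python
import numpy as np


def toy_loss_network(points, center, radius=1.0):
    """Hypothetical stand-in for the learned loss network.

    Takes the patch of scene points within `radius` of a proposed center and
    returns the offset from that proposal to the patch centroid. In the paper
    this mapping is a trained network; here it is a fixed hand-written rule.
    """
    patch = points[np.linalg.norm(points - center, axis=1) < radius]
    if len(patch) == 0:
        return np.zeros(3)  # nothing nearby: no update signal
    return patch.mean(axis=0) - center


def refine_center(points, center, steps=50, lr=0.5):
    """Move a proposed center along the predicted offset (surrogate gradient step)."""
    center = np.asarray(center, dtype=float)
    for _ in range(steps):
        center = center + lr * toy_loss_network(points, center)
    return center


rng = np.random.default_rng(0)
true_center = np.array([2.0, -1.0, 0.5])
# A compact "object": points scattered tightly around its true center.
object_points = true_center + 0.1 * rng.standard_normal((200, 3))
# Distractor points far away from the object.
distractors = np.array([10.0, 10.0, 10.0]) + 0.1 * rng.standard_normal((200, 3))
scene = np.vstack([object_points, distractors])

# Start from a displaced proposal and refine it without using the true label.
estimate = refine_center(scene, center=true_center + np.array([0.4, -0.3, 0.2]))
print("distance to true center:", np.linalg.norm(estimate - true_center))
```

In this toy setting the proposal settles at a fixed point where the predicted offset vanishes, which is the role a zero gradient would play under a supervised loss. A trained loss network would be expected to replace the centroid rule, and the update would be back-propagated into the scene network rather than applied to a single point.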


3D learning · 3D point clouds · 3D object detection · Unsupervised

Supplementary material

Supplementary material 1 (mp4 71137 KB)

Supplementary material 2 (pdf 3735 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University College London, London, UK
