Weakly Supervised 3D Object Detection from Lidar Point Cloud

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12358)


It is laborious to manually label point cloud data for training high-quality 3D object detectors. This work proposes a weakly supervised approach to 3D object detection that requires only a small set of weakly annotated scenes together with a few precisely labeled object instances. This is achieved by a two-stage architecture. Stage-1 learns to generate cylindrical object proposals under weak supervision, i.e., only the horizontal centers of objects are click-annotated in bird's-eye-view scenes. Stage-2 learns to refine the cylindrical proposals into cuboids with confidence scores, using a few well-labeled instances. Using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 85–95% of the performance of current top-performing, fully supervised detectors (which require 3,712 exhaustively and precisely annotated scenes with 15,654 instances). Moreover, with our carefully designed network architecture, the trained model can serve as a 3D object annotator, supporting both automatic and active (human-in-the-loop) working modes. Annotations generated by our model can be used to train 3D object detectors, which then achieve over 94% of their original performance (obtained with manually labeled training data). Our experiments also show the model's potential for boosting performance when given more training data. These designs make our approach highly practical and open new opportunities for learning 3D object detection at reduced annotation cost.
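The two-stage pipeline described above can be sketched in miniature: Stage-1 maps each click-annotated bird's-eye-view center to a cylindrical proposal (a circle in the horizontal plane, unbounded in height), and Stage-2 refines each cylinder into a scored cuboid from the points it contains. All names, the fixed proposal radius, and the box-fitting heuristic below are illustrative assumptions, not the authors' actual learned networks.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Cylinder:
    x: float       # horizontal center (BEV), click-annotated
    z: float
    radius: float  # assumed fixed footprint radius

@dataclass
class Cuboid:
    x: float; y: float; z: float  # 3D center
    l: float; w: float; h: float  # dimensions
    score: float                  # confidence

def stage1_proposals(clicks: List[Tuple[float, float]],
                     radius: float = 2.0) -> List[Cylinder]:
    """Stage-1 stand-in: each BEV click yields one cylindrical proposal."""
    return [Cylinder(x, z, radius) for x, z in clicks]

def stage2_refine(cyl: Cylinder,
                  points: List[Tuple[float, float, float]]) -> Cuboid:
    """Stage-2 stand-in: fit an axis-aligned box to the points inside the
    cylinder; a crude proxy for the learned refinement network."""
    inside = [(px, py, pz) for px, py, pz in points
              if (px - cyl.x) ** 2 + (pz - cyl.z) ** 2 <= cyl.radius ** 2]
    xs, ys, zs = zip(*inside)
    return Cuboid(
        x=sum(xs) / len(xs), y=sum(ys) / len(ys), z=sum(zs) / len(zs),
        l=max(xs) - min(xs), w=max(zs) - min(zs), h=max(ys) - min(ys),
        score=min(1.0, len(inside) / 100.0),  # more support -> higher score
    )
```

In the paper, both stages are learned networks; the sketch only fixes the interfaces: weak BEV clicks in, cylinders out, then cuboids with confidences out.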


Keywords: 3D object detection · Weakly supervised learning

Supplementary material

Supplementary material 1 (PDF 5503 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computer Science, Beijing Institute of Technology, Beijing, China
  2. ETH Zurich, Zurich, Switzerland
  3. Inception Institute of Artificial Intelligence, Abu Dhabi, UAE
