
Self-Distillation for Robust LiDAR Semantic Segmentation in Autonomous Driving

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13688)

Abstract

We propose an effective self-distillation framework, built on a new Test-Time Augmentation (TTA) strategy and a Transformer-based Voxel Feature Encoder (TransVFE), for robust LiDAR semantic segmentation in autonomous driving, where robustness is mission-critical yet often neglected. The framework distills knowledge from a teacher model instance to a student model instance, where the two instances share the same network architecture and learn and evolve jointly. This requires a strong teacher model that keeps improving during training. Our TTA strategy effectively reduces the uncertainty of the teacher model at inference, so we equip the teacher with TTA to provide privileged guidance, while the student continuously updates the teacher with the better network parameters it learns. To further strengthen the teacher, we propose TransVFE, which improves point cloud encoding by modeling and preserving the local relationships among the points inside each voxel via multi-head attention. The proposed modules are designed to be instantiated with different backbones. Evaluations on the SemanticKITTI and nuScenes datasets show that our method achieves state-of-the-art performance. Our code is publicly available at https://github.com/jialeli1/lidarseg3d.
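The three components described in the abstract can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: the function names are hypothetical, a single-head attention stands in for the paper's multi-head TransVFE, and an EMA-style parameter update stands in for how the student "continuously updates the teacher" (the paper's actual rules may differ):

```python
import numpy as np

def ema_update(teacher, student, momentum=0.999):
    """Let the student update the teacher: the teacher's parameters
    track an exponential moving average of the student's (one plausible
    realization; hypothetical, not the paper's exact rule)."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

def tta_soft_labels(teacher_forward, points, augmentations):
    """Average teacher predictions over several augmented views of the
    input point cloud to reduce inference-time uncertainty."""
    preds = [teacher_forward(aug(points)) for aug in augmentations]
    return np.mean(preds, axis=0)

def voxel_self_attention(point_feats, wq, wk, wv):
    """Single-head stand-in for TransVFE: every point inside one voxel
    attends to every other point in the same voxel, preserving their
    local relationships in the refined features."""
    q, k, v = point_feats @ wq, point_feats @ wk, point_feats @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])       # (N, N) similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over points
    return attn @ v                              # (N, d) refined features
```

In this reading, the teacher is never trained by gradient descent: it receives weights from the student via `ema_update`, while its TTA-averaged predictions serve as privileged soft labels for the student's distillation loss.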

Notes

  1. The new soft label acquisition strategies proposed in this paper.

Acknowledgement

This work was supported by the National Key Research and Development Program of China (2018YFE0183900) and YUNJI Technology Co. Ltd.

Author information

Correspondence to Hang Dai or Yong Ding.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1088 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, J., Dai, H., Ding, Y. (2022). Self-Distillation for Robust LiDAR Semantic Segmentation in Autonomous Driving. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13688. Springer, Cham. https://doi.org/10.1007/978-3-031-19815-1_38

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19815-1_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19814-4

  • Online ISBN: 978-3-031-19815-1

  • eBook Packages: Computer Science (R0)
