Abstract
We propose a new and effective self-distillation framework, built on our new Test-Time Augmentation (TTA) strategy and Transformer-based Voxel Feature Encoder (TransVFE), for robust LiDAR semantic segmentation in autonomous driving, where robustness is mission-critical but usually neglected. The framework distills knowledge from a teacher model instance to a student model instance; the two instances share the same network architecture and learn and evolve jointly. This requires a strong teacher model that keeps evolving during training. Our TTA strategy effectively reduces the uncertainty of the teacher model's predictions at inference. We therefore equip the teacher model with TTA so that it provides privileged guidance, while the student continuously updates the teacher with the better network parameters it learns. To further strengthen the teacher model, we propose TransVFE, which improves point cloud encoding by modeling and preserving the local relationships among the points inside each voxel via multi-head attention. The proposed modules are designed generically so that they can be instantiated with different backbones. Evaluations on the SemanticKITTI and nuScenes datasets show that our method achieves state-of-the-art performance. Our code is publicly available at https://github.com/jialeli1/lidarseg3d.
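The teacher-student loop in the abstract (a TTA-equipped teacher producing soft labels, a student distilled from them, and the teacher updated from the student) can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation, which is in the linked repository. Several details are assumptions: the augmentation set (four rotations about the vertical axis, each with and without a y-flip), averaging softmax probabilities into soft labels, a KL-divergence distillation loss, and an exponential-moving-average (EMA) teacher update. The abstract only says that the student "continuously updates the teacher", so the EMA rule in particular is assumed. All function names and the toy linear "model" are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def rotate_z(points, angle):
    """Rotate (N, 3) xyz points about the vertical z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T


def flip_y(points):
    """Mirror points across the x-z plane (negate y)."""
    out = points.copy()
    out[:, 1] = -out[:, 1]
    return out


def tta_soft_labels(model, points, angles=(0.0, 0.5 * np.pi, np.pi, 1.5 * np.pi)):
    """Hypothetical TTA: average per-point class probabilities over rotated
    and flipped copies of the scan. These transforms only move points, so
    row i of every prediction still refers to point i (no inverse mapping)."""
    probs = []
    for angle in angles:
        rotated = rotate_z(points, angle)
        for view in (rotated, flip_y(rotated)):
            probs.append(softmax(model(view)))
    return np.mean(probs, axis=0)  # (N, num_classes); rows sum to 1


def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """Mean per-point KL(teacher || student); teacher_probs are fixed targets."""
    log_p_student = np.log(softmax(student_logits) + eps)
    log_p_teacher = np.log(teacher_probs + eps)
    kl = np.sum(teacher_probs * (log_p_teacher - log_p_student), axis=-1)
    return float(kl.mean())


def ema_update(teacher_params, student_params, momentum=0.999):
    """Assumed mean-teacher-style update: the teacher drifts toward the student."""
    return {k: momentum * teacher_params[k] + (1.0 - momentum) * student_params[k]
            for k in teacher_params}


def make_linear_model(params):
    """Toy stand-in for a segmentation network: per-point linear classifier on xyz."""
    return lambda pts: pts @ params["W"] + params["b"]


# Usage with a toy 4-class problem.
rng = np.random.default_rng(0)
num_classes = 4
teacher = {"W": rng.normal(size=(3, num_classes)), "b": np.zeros(num_classes)}
# Pretend the student has taken some optimisation steps away from the teacher.
student = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in teacher.items()}

points = rng.normal(size=(100, 3))
soft = tta_soft_labels(make_linear_model(teacher), points)
loss = distillation_loss(make_linear_model(student)(points), soft)
teacher = ema_update(teacher, student)
print(soft.shape, round(loss, 4))
```

Because rotations and flips only move points, each augmented prediction aligns with the original points row by row, which keeps the averaging step trivial. In an autodiff framework the teacher's soft labels would be detached so that only the student receives gradients.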
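The idea behind TransVFE is to let the points that fall into the same voxel attend to each other before they are pooled into a single voxel feature. The NumPy sketch below illustrates that idea under several assumptions. The voxel grouping with a fixed `max_points` cap, a single multi-head self-attention layer with a residual connection, and max pooling are illustrative choices rather than the paper's exact design. The weights are random, and the names (`group_points_by_voxel`, `multi_head_attention`, `trans_vfe`) are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def group_points_by_voxel(points, voxel_size, max_points):
    """Bucket (N, C) points into voxels; returns voxel coords, a padded
    (V, max_points, C) array, and a (V, max_points) validity mask."""
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    voxels, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # shape of `inverse` differs across NumPy versions
    grouped = np.zeros((len(voxels), max_points, points.shape[1]))
    mask = np.zeros((len(voxels), max_points), dtype=bool)
    counts = np.zeros(len(voxels), dtype=np.int64)
    for i, v in enumerate(inverse):
        if counts[v] < max_points:  # overflow points are dropped
            grouped[v, counts[v]] = points[i]
            mask[v, counts[v]] = True
            counts[v] += 1
    return voxels, grouped, mask


def multi_head_attention(x, mask, wq, wk, wv, wo, num_heads):
    """Self-attention restricted to the points of each voxel.
    x: (V, P, D); mask: (V, P), True for real points."""
    V, P, D = x.shape
    dh = D // num_heads

    def split_heads(t):  # (V, P, D) -> (V, H, P, dh)
        return t.reshape(V, P, num_heads, dh).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh)  # (V, H, P, P)
    scores = np.where(mask[:, None, None, :], scores, -1e9)  # padded keys get ~0 weight
    out = softmax(scores, axis=-1) @ v  # (V, H, P, dh)
    return out.transpose(0, 2, 1, 3).reshape(V, P, D) @ wo


def trans_vfe(grouped, mask, params, num_heads=2):
    """Hypothetical TransVFE-style encoder: embed points, let points in the
    same voxel attend to each other, then max-pool to one feature per voxel."""
    x = np.maximum(grouped @ params["w_in"], 0.0)  # point-wise linear + ReLU
    x = x + multi_head_attention(x, mask, params["wq"], params["wk"],
                                 params["wv"], params["wo"], num_heads)  # residual
    x = np.where(mask[..., None], x, -np.inf)  # ignore padding in the pooling
    return x.max(axis=1)  # (V, D); every voxel has at least one real point


# Usage: 200 random points with (x, y, z, intensity) in a 2 m cube.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 2.0, size=(200, 4))
voxels, grouped, mask = group_points_by_voxel(points, voxel_size=0.5, max_points=16)
d = 8
shapes = {"w_in": (4, d), "wq": (d, d), "wk": (d, d), "wv": (d, d), "wo": (d, d)}
params = {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}
voxel_features = trans_vfe(grouped, mask, params)
print(voxels.shape, voxel_features.shape)
```

Masking is applied in two places. Padded slots are excluded as attention keys and from the max pooling, so each voxel feature depends only on the real points inside that voxel.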
Notes
1. The new soft label acquisition strategies proposed in this paper.
Acknowledgement
This work was supported by the National Key Research and Development Program of China (2018YFE0183900) and YUNJI Technology Co. Ltd.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, J., Dai, H., Ding, Y. (2022). Self-Distillation for Robust LiDAR Semantic Segmentation in Autonomous Driving. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13688. Springer, Cham. https://doi.org/10.1007/978-3-031-19815-1_38
DOI: https://doi.org/10.1007/978-3-031-19815-1_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19814-4
Online ISBN: 978-3-031-19815-1
eBook Packages: Computer Science, Computer Science (R0)
