Abstract
Robust road segmentation is a key challenge in self-driving research. Although many image-based methods have been studied and have reported high performance on benchmark datasets, developing robust and reliable road segmentation remains a major challenge. Fusing data from different sensors is widely considered an important and irreplaceable way to improve road segmentation performance. In this paper, we propose a novel structure that fuses images and LiDAR point clouds in an end-to-end semantic segmentation network, in which fusion is performed at the decoder stage rather than, as is more common, at the encoder stage. During fusion, we introduce a pyramid projection method that increases the precision of the multi-scale LiDAR maps. Additionally, we adapt the multi-path refinement network to our fusion strategy, which improves road prediction compared with transposed convolution with skip layers. Our approach has been tested on the KITTI ROAD dataset and achieves competitive performance.
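The abstract does not spell out how the pyramid projection works. One plausible reading is that LiDAR points are projected into the image plane separately at each scale, rather than projected once and then downsampled. The sketch below illustrates that reading only. The function name `project_pyramid`, the 3x4 projection matrix, the factor-of-two scales, and the nearest-depth rule for colliding points are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]      # (x, y, z) in camera coordinates, z forward
Matrix3x4 = Sequence[Sequence[float]]   # hypothetical camera projection matrix


def project_pyramid(
    points: Sequence[Point],
    proj: Matrix3x4,
    height: int,
    width: int,
    num_scales: int = 3,
) -> List[List[List[float]]]:
    """Rasterize LiDAR points into one sparse depth map per scale.

    Scale s has size (ceil(height / 2**s), ceil(width / 2**s)). Every point
    is projected again at every scale (assumed behaviour). Empty cells hold
    0.0; when several points land in one cell, the nearest depth is kept.
    """
    maps = []
    for s in range(num_scales):
        factor = 2 ** s
        h = -(-height // factor)  # ceiling division
        w = -(-width // factor)
        depth = [[0.0] * w for _ in range(h)]
        for x, y, z in points:
            # Homogeneous projection of the 3D point.
            u_h = proj[0][0] * x + proj[0][1] * y + proj[0][2] * z + proj[0][3]
            v_h = proj[1][0] * x + proj[1][1] * y + proj[1][2] * z + proj[1][3]
            w_h = proj[2][0] * x + proj[2][1] * y + proj[2][2] * z + proj[2][3]
            if w_h <= 0 or z <= 0:
                continue  # point is behind the camera
            # Pixel coordinates at this scale.
            u = u_h / w_h / factor
            v = v_h / w_h / factor
            if u < 0 or v < 0:
                continue
            col, row = int(u), int(v)
            if row >= h or col >= w:
                continue
            if depth[row][col] == 0.0 or z < depth[row][col]:
                depth[row][col] = z
        maps.append(depth)
    return maps


# Toy example: two points and a made-up pinhole projection matrix.
toy_proj = [[100.0, 0.0, 50.0, 0.0],
            [0.0, 100.0, 40.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]
toy_points = [(0.0, 0.0, 10.0), (0.0625, 0.0, 5.0)]
toy_maps = project_pyramid(toy_points, toy_proj, height=80, width=100)
```

In the toy example the two points fall into neighbouring pixels at full resolution but share a cell at the coarser scales, where the nearer point is kept. In a decoder-stage fusion design, each of these maps would be paired with decoder features of the same resolution. How the network actually combines them is not described in the abstract.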
Cite this article
Liu, H., Yao, Y., Sun, Z. et al. Road segmentation with image-LiDAR data fusion in deep neural network. Multimed Tools Appl 79, 35503–35518 (2020). https://doi.org/10.1007/s11042-019-07870-0
DOI: https://doi.org/10.1007/s11042-019-07870-0