Abstract
Autonomous driving can reduce road accidents caused by human error, leading to safer roads. A key component of such systems is the perception unit, which provides information about the environment surrounding the vehicle. Most manufacturers currently rely not only on RGB cameras, passive sensors that capture light already present in the environment, but also on LiDAR, an active sensor that emits laser pulses towards surfaces or objects and measures the reflection and its time-of-flight. Previous work, YOLOP, proposed a model for joint object detection and semantic segmentation, but using RGB input alone. This work extends it to LiDAR and evaluates performance on KITTI, a public autonomous driving dataset. The proposed implementation shows improved precision across objects of all sizes. The implementation is made entirely available at https://github.com/filipepcampos/yolomm.
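As a rough illustration of how LiDAR returns can be combined with camera imagery, the sketch below projects KITTI Velodyne points into the camera image plane and rasterizes them into a sparse depth channel that could be stacked with the RGB input of a YOLOP-style network. This is a minimal sketch assuming KITTI's standard calibration conventions (the P2, R0_rect, and Tr_velo_to_cam matrices from the calibration files, padded to homogeneous form); it is one plausible fusion front-end, not necessarily the exact strategy used in YOLOMM. Each depth value ultimately comes from the time-of-flight principle mentioned above: range = c·Δt/2 for a round-trip pulse time Δt.

    import numpy as np

    def lidar_to_depth_channel(points, P2, R0_rect, Tr_velo_to_cam, height, width):
        """Rasterize KITTI Velodyne points into a sparse depth image.

        points: (N, 4) array of [x, y, z, reflectance] in LiDAR coordinates.
        P2: 3x4 camera projection matrix; R0_rect, Tr_velo_to_cam: 4x4
        rectification and LiDAR-to-camera matrices (padded to homogeneous
        form), all read from the KITTI calibration file.
        """
        # Homogeneous LiDAR coordinates, transformed into the rectified camera frame.
        xyz1 = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])
        cam = R0_rect @ Tr_velo_to_cam @ xyz1.T          # shape (4, N)
        cam = cam[:, cam[2] > 0]                         # keep points in front of the camera

        # Pinhole projection into pixel coordinates.
        proj = P2 @ cam                                  # shape (3, N)
        u = (proj[0] / proj[2]).astype(int)
        v = (proj[1] / proj[2]).astype(int)
        depth = cam[2]

        # Scatter depths into an image-sized channel (last write wins; a real
        # pipeline would keep the nearest return per pixel).
        valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
        channel = np.zeros((height, width), dtype=np.float32)
        channel[v[valid], u[valid]] = depth[valid]
        return channel

The resulting channel could, for example, be concatenated with the normalized RGB tensor as a fourth input channel before the detection backbone; this early-fusion choice is an assumption for illustration only.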
This work is supported by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 047264; Funding Reference: POCI-01-0247-FEDER-047264].
References
Behley, J., et al.: Towards 3D LiDAR-based semantic scene understanding of 3D point cloud sequences: the SemanticKITTI dataset. Int. J. Robot. Res. 40(8–9), 959–967 (2021). https://doi.org/10.1177/02783649211006735
Bolya, D., Zhou, C., Xiao, F., Lee, Y.J.: YOLACT: real-time instance segmentation (2019)
Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving (2020)
Chan, C.Y.: Advancements, prospects, and impacts of automated driving systems. Int. J. Transp. Sci. Technol. 6(3), 208–216 (2017). https://doi.org/10.1016/j.ijtst.2017.07.008
Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1907–1915 (2017)
Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding (2016)
Deschaud, J.E.: KITTI-CARLA: a KITTI-like dataset generated by CARLA Simulator. arXiv preprint arXiv:2109.00892 (2021)
Detlefsen, N.S., et al.: TorchMetrics - measuring reproducibility in PyTorch. J. Open Sour. Softw. 7(70), 4101 (2022). https://doi.org/10.21105/joss.04101
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015)
Heuer, F., Mantowsky, S., Bukhari, S.S., Schneider, G.: MultiTask-CenterNet (MCN): efficient and diverse multitask learning using an anchor free approach (2021)
Lee, D.G., Kim, Y.K.: Joint semantic understanding with a multilevel branch for driving perception. Appl. Sci. 12(6), 2877 (2022). https://doi.org/10.3390/app12062877
Liao, Y., Xie, J., Geiger, A.: KITTI-360: a novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 45, 3292–3310 (2022)
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017). https://doi.org/10.1109/CVPR.2017.106
Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation (2018)
Milioto, A., Vizzo, I., Behley, J., Stachniss, C.: RangeNet++: fast and accurate LiDAR semantic segmentation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2019)
Paek, D.H., Kong, S.H., Wijaya, K.T.: K-lane: lidar lane dataset and benchmark for urban roads and highways. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Autonomous Driving (WAD) (2022)
Sheeny, M., De Pellegrin, E., Mukherjee, S., Ahrabian, A., Wang, S., Wallace, A.: RADIATE: a radar dataset for automotive perception. arXiv preprint arXiv:2010.09076 (2020)
Sun, P., et al.: Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Vu, D., Ngo, B., Phan, H.: HybridNets: end-to-end perception network (2022)
Wu, D., et al.: YOLOP: you only look once for panoptic driving perception. Mach. Intell. Res. 19, 1–13 (2022)
Yu, F., et al.: BDD100K: a diverse driving dataset for heterogeneous multitask learning (2020)
Copyright information
© 2024 Springer Nature Switzerland AG
About this paper
Cite this paper
Campos, F., Cerqueira, F.G., Cruz, R.P.M., Cardoso, J.S. (2024). YOLOMM – You Only Look Once for Multi-modal Multi-tasking. In: Vasconcelos, V., Domingues, I., Paredes, S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2023. Lecture Notes in Computer Science, vol 14469. Springer, Cham. https://doi.org/10.1007/978-3-031-49018-7_40
DOI: https://doi.org/10.1007/978-3-031-49018-7_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49017-0
Online ISBN: 978-3-031-49018-7