Abstract
Visual perception of traffic scenes is essential for intelligent transportation. Although emerging panoptic segmentation is the most desirable perception technology, object detection and semantic segmentation are more mature and require less data annotation. This paper proposes a perception method that jointly uses object detection and semantic segmentation to balance practicality and accuracy. The method operates on the results of object detection and semantic segmentation. First, the basic semantic segmentation result is preprocessed according to the principle of entropy. Second, candidate bounding boxes of pedestrians and vehicles are extracted by object detection. Third, the candidate bounding boxes are optimized with a K-means-based vertex clustering algorithm. Finally, the contours of scene elements are matched with the semantic segmentation results. Experiments on the Cityscapes dataset show that the final perception quality depends more strongly on the semantic segmentation results. The theoretical upper limit of the achievable perception quality is 95.4% of the panoptic segmentation ground truth. The proposed method effectively combines object detection and semantic segmentation, and achieves perception results similar to panoptic segmentation without additional data annotation.
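The general idea of combining detection boxes with a semantic label map can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the entropy-based preprocessing and the K-means vertex clustering, and the function name `fuse_boxes_with_semantics`, the box tuple layout, and the class ids are all hypothetical choices made for illustration only. It simply gives each detected "thing" box its own instance id on the pixels that the semantic map already labels with the same class.

```python
import numpy as np

def fuse_boxes_with_semantics(semantic, boxes, thing_classes):
    """Assign instance ids to semantic pixels that fall inside detection boxes.

    semantic:      (H, W) integer array of per-pixel class ids (assumed input).
    boxes:         iterable of (class_id, score, x0, y0, x1, y1), with x1 and y1
                   exclusive (hypothetical layout, not from the paper).
    thing_classes: set of class ids treated as countable objects.
    Returns an (H, W) int32 array; 0 means "no instance" (stuff or unmatched).
    """
    instance = np.zeros(semantic.shape, dtype=np.int32)
    next_id = 1
    # Visit confident detections first so they claim contested pixels.
    for class_id, score, x0, y0, x1, y1 in sorted(boxes, key=lambda b: -b[1]):
        if class_id not in thing_classes:
            continue
        window_sem = semantic[y0:y1, x0:x1]
        window_inst = instance[y0:y1, x0:x1]  # basic slice: a view into `instance`
        # Pixels of the detected class that no earlier box has claimed.
        free = (window_sem == class_id) & (window_inst == 0)
        if free.any():
            window_inst[free] = next_id  # writes through to `instance`
            next_id += 1
    return instance

# Toy usage: two separate regions of class 1, each covered by one box.
semantic_map = np.zeros((4, 6), dtype=np.int64)
semantic_map[1:3, 0:2] = 1
semantic_map[1:3, 4:6] = 1
detections = [(1, 0.9, 0, 0, 3, 4), (1, 0.8, 3, 0, 6, 4)]
instance_map = fuse_boxes_with_semantics(semantic_map, detections, {1})
print(instance_map)
```

In this sketch the semantic map decides which pixels belong to an object and the boxes only separate instances, which is one plausible reason a fused result would depend heavily on segmentation quality.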
Acknowledgements
This work was supported by the National Key Research and Development Project of China under Grant No. 2020AAA0104001, Zhejiang Lab under Grant No. 2019KD0AD011005, and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ22F020008.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Weng, L., Wang, Y. & Gao, F. Traffic Scene Perception Based on Joint Object Detection and Semantic Segmentation. Neural Process Lett 54, 5333–5349 (2022). https://doi.org/10.1007/s11063-022-10864-z
DOI: https://doi.org/10.1007/s11063-022-10864-z