Abstract
Object detection, semantic segmentation, and instance segmentation form the basis of many computer vision tasks in autonomous driving. Task complexity increases as we move from object detection to instance segmentation. State-of-the-art models are evaluated on standard datasets such as PASCAL VOC and MS-COCO, which do not capture the dynamics of road scenes. Driving datasets such as Cityscapes and Berkeley Deep Drive (BDD) are captured in structured environments with better road markings and less variation in the appearance of objects and background. The same does not hold for Indian roads. The Indian Driving Dataset (IDD) is captured in unstructured driving scenarios and, owing to its diversity, is highly challenging for models. This work presents a comprehensive evaluation of state-of-the-art models for object detection, semantic segmentation, and instance segmentation on road-scene datasets. We present our analyses and compare the models' quantitative and qualitative performance on structured driving datasets (Cityscapes and BDD) and on an unstructured driving dataset (IDD). Understanding model behavior on these datasets helps address practical issues and supports the development of real-world applications.
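The abstract compares models quantitatively but does not name the metrics used. As an assumption, semantic segmentation on driving datasets such as Cityscapes and IDD is commonly scored with per-class intersection-over-union (IoU), averaged into mean IoU (mIoU). Below is a minimal pure-Python sketch of that computation over flattened label maps. It is not the authors' evaluation code. The function names and the ignore label 255 are illustrative assumptions.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground-truth, predicted) class pair.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground truth equals ignore_index (assumed 255 here) are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue
        cm[g][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN for classes absent from both gt and pred."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                   # class-c pixels predicted as something else
        fp = sum(row[c] for row in cm) - tp    # other pixels predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious


def mean_iou(ious: List[float]) -> float:
    """Average IoU over classes, ignoring NaN entries (NaN != NaN)."""
    valid = [x for x in ious if x == x]
    return sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 255]   # 255 marks an unlabeled pixel
    pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(gt, pred, num_classes=3)
    ious = per_class_iou(cm)
    print(cm, ious, mean_iou(ious))
```

In this toy example, class 0 has one true positive and one false negative, which gives an IoU of 0.5. Class 1 gives 2/3 and class 2 gives 1.0, so the mIoU is about 0.722. Skipping ignored pixels before accumulation means that unlabeled regions do not count as errors against any class.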
Acknowledgements
This work was partly funded by IHub-Data at IIIT-Hyderabad.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Singh, D., Rahane, A., Mondal, A., Subramanian, A., Jawahar, C.V. (2022). Evaluation of Detection and Segmentation Tasks on Driving Datasets. In: Raman, B., Murala, S., Chowdhury, A., Dhall, A., Goyal, P. (eds) Computer Vision and Image Processing. CVIP 2021. Communications in Computer and Information Science, vol 1567. Springer, Cham. https://doi.org/10.1007/978-3-031-11346-8_44
DOI: https://doi.org/10.1007/978-3-031-11346-8_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11345-1
Online ISBN: 978-3-031-11346-8
eBook Packages: Computer Science, Computer Science (R0)