Evaluation of Detection and Segmentation Tasks on Driving Datasets

  • Conference paper
Computer Vision and Image Processing (CVIP 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1567)

Abstract

Object detection, semantic segmentation, and instance segmentation form the basis of many computer vision tasks in autonomous driving. The complexity of these tasks increases as we move from object detection to instance segmentation. State-of-the-art models are evaluated on standard datasets such as PASCAL VOC and MS COCO, which do not consider the dynamics of road scenes. Driving datasets such as Cityscapes and Berkeley Deep Drive (BDD) are captured in structured environments with better road markings and fewer variations in the appearance of objects and background. However, the same does not hold for Indian roads. The Indian Driving Dataset (IDD) is captured in unstructured driving scenarios and is highly challenging for a model due to its diversity. This work presents a comprehensive evaluation of state-of-the-art models for object detection, semantic segmentation, and instance segmentation on road scene datasets. We analyze and compare their quantitative and qualitative performance on the structured driving datasets (Cityscapes and BDD) and the unstructured driving dataset (IDD). Understanding model behavior on these datasets helps in addressing various practical issues and in creating real-life applications.

Acknowledgements

This work was partly funded by IHub-Data at IIIT-Hyderabad.

Author information

Corresponding author

Correspondence to Deepak Singh.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Singh, D., Rahane, A., Mondal, A., Subramanian, A., Jawahar, C.V. (2022). Evaluation of Detection and Segmentation Tasks on Driving Datasets. In: Raman, B., Murala, S., Chowdhury, A., Dhall, A., Goyal, P. (eds) Computer Vision and Image Processing. CVIP 2021. Communications in Computer and Information Science, vol 1567. Springer, Cham. https://doi.org/10.1007/978-3-031-11346-8_44

  • DOI: https://doi.org/10.1007/978-3-031-11346-8_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11345-1

  • Online ISBN: 978-3-031-11346-8

  • eBook Packages: Computer Science, Computer Science (R0)
