Learning to Separate: Detecting Heavily-Occluded Objects in Urban Scenes

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12363)

Abstract

While visual object detection with deep learning has received much attention in the past decade, cases in which heavy intra-class occlusion occurs have not been studied thoroughly. In this work, we propose a Non-Maximum-Suppression (NMS) algorithm that dramatically improves detection recall while maintaining high precision in scenes with heavy occlusion. Our NMS algorithm is derived from a novel embedding mechanism in which the semantic and geometric features of the detected boxes are jointly exploited. The embedding makes it possible to determine whether two heavily-overlapping boxes belong to the same object in the physical world. Our approach is particularly useful for car detection and pedestrian detection in urban scenes, where occlusion happens often. We show the effectiveness of our approach by creating a model called SG-Det (short for Semantics and Geometry Detection) and testing SG-Det on two widely-adopted datasets, KITTI and CityPersons, on which it achieves state-of-the-art performance. Our code is available at https://github.com/ChenhongyiYang/SG-NMS.
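To make the idea concrete, below is a minimal NumPy sketch of embedding-aware NMS in the spirit described above: a box that heavily overlaps a higher-scoring box is suppressed only when its learned embedding also lies close to that box's embedding, i.e., when the two boxes likely cover the same physical object. The Euclidean distance metric, the thresholds, and the names (iou, sg_nms, emb_thresh) are illustrative assumptions, not the paper's exact SG-NMS formulation; see the linked repository for the authors' implementation.

import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def sg_nms(boxes, scores, embeddings, iou_thresh=0.5, emb_thresh=1.0):
    """Greedy NMS variant (illustrative sketch): a box is suppressed only
    if it both overlaps a kept box heavily AND its embedding lies close to
    the kept box's embedding, suggesting they cover the same object."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        overlaps = iou(boxes[i], boxes[rest])
        # Embedding distance to the kept box; large distance suggests a
        # different physical object, so the box survives despite overlap.
        dists = np.linalg.norm(embeddings[rest] - embeddings[i], axis=1)
        suppress = (overlaps > iou_thresh) & (dists < emb_thresh)
        order = rest[~suppress]
    return keep

Compared with standard greedy NMS, the only change is the additional embedding-distance test in the suppression condition; this is what allows true positives to survive in crowded scenes where overlapping boxes belong to different objects.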

Acknowledgements

We acknowledge partial support of this work by the MURI Program, N00014-19-1-2571 associated with AUSMURIB000001, and the National Science Foundation under Grant No. 1928477.

References

  1. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
  2. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
  3. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  4. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
  5. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
  6. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
  7. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
  8. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
  9. Bodla, N., Singh, B., Chellappa, R., Davis, L.S.: Soft-NMS: improving object detection with one line of code. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5561–5569 (2017)
  10. Newell, A., Huang, Z., Deng, J.: Associative embedding: end-to-end learning for joint detection and grouping. In: Advances in Neural Information Processing Systems, pp. 2277–2287 (2017)
  11. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361. IEEE (2012)
  12. Zhang, S., Benenson, R., Schiele, B.: CityPersons: a diverse dataset for pedestrian detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3221 (2017)
  13. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
  14. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387 (2016)
  15. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018)
  16. Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)
  17. Zhang, S., Wen, L., Bian, X., Lei, Z., Li, S.Z.: Occlusion-aware R-CNN: detecting pedestrians in a crowd. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 637–653 (2018)
  18. Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_26
  19. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
  20. Hosang, J., Benenson, R., Schiele, B.: Learning non-maximum suppression. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4507–4515 (2017)
  21. Liu, S., Huang, D., Wang, Y.: Adaptive NMS: refining pedestrian detection in a crowd. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6459–6468 (2019)
  22. Hu, H., Gu, J., Zhang, Z., Dai, J., Wei, Y.: Relation networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3588–3597 (2018)
  23. Tychsen-Smith, L., Petersson, L.: Improving object localization with fitness NMS and bounded IoU loss. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2018
  24. He, Y., Zhu, C., Wang, J., Savvides, M., Zhang, X.: Bounding box regression with uncertainty for accurate object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2888–2897 (2019)
  25. Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 784–799 (2018)
  26. Tan, Z., Nie, X., Qian, Q., Li, N., Li, H.: Learning to rank proposals for object detection. In: The IEEE International Conference on Computer Vision (ICCV), October 2019
  27. Wang, X., Xiao, T., Jiang, Y., Shao, S., Sun, J., Shen, C.: Repulsion loss: detecting pedestrians in a crowd. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7774–7783 (2018)
  28. Zhang, S., Yang, J., Schiele, B.: Occluded pedestrian detection through guided attention in CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6995–7003 (2018)
  29. Noh, J., Lee, S., Kim, B., Kim, G.: Improving occlusion and hard negative handling for single-stage pedestrian detectors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 966–974 (2018)
  30. Tian, Y., Luo, P., Wang, X., Tang, X.: Deep learning strong parts for pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1904–1912 (2015)
  31. Zhou, C., Yuan, J.: Multi-label learning of part detectors for heavily occluded pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3486–3495 (2017)
  32. Goldman, E., Herzig, R., Eisenschtat, A., Goldberger, J., Hassner, T.: Precise detection in densely packed scenes. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
  33. Zhou, C., Yuan, J.: Bi-box regression for pedestrian detection and occlusion estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 135–151 (2018)
  34. Pang, Y., Xie, J., Khan, M.H., Anwer, R.M., Khan, F.S., Shao, L.: Mask-guided attention network for occluded pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4967–4975 (2019)
  35. Zhang, K., Xiong, F., Sun, P., Hu, L., Li, B., Yu, G.: Double anchor R-CNN for human detection in a crowd. arXiv preprint arXiv:1909.09998 (2019)
  36. Law, H., Deng, J.: CornerNet: detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 734–750 (2018)
  37. Ren, J., et al.: Accurate single stage detector using recurrent rolling convolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5420–5428 (2017)
  38. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)
  39. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  40. Yang, B., Yan, J., Lei, Z., Li, S.Z.: Craft objects from images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6043–6051 (2016)
  41. Yang, F., Choi, W., Lin, Y.: Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2129–2137 (2016)
  42. Liu, W., Liao, S., Hu, W., Liang, X., Zhang, Y.: Improving tiny vehicle detection in complex scenes. In: 2018 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2018)
  43. Hu, X., et al.: SINet: a scale-insensitive convolutional neural network for fast vehicle detection. IEEE Trans. Intell. Transp. Syst. 20(3), 1010–1019 (2018)
  44. Wang, T., He, X., Cai, Y., Xiao, G.: Learning a layout transfer network for context aware object detection. IEEE Trans. Intell. Transp. Syst. (2019)
  45. Wei, J., He, J., Zhou, Y., Chen, K., Tang, Z., Xiong, Z.: Enhanced object detection with deep convolutional neural networks for advanced driving assistance. IEEE Trans. Intell. Transp. Syst. (2019)
  46. Mousavian, A., Anguelov, D., Flynn, J., Kosecka, J.: 3D bounding box estimation using deep learning and geometry. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7074–7082 (2017)
  47. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Boston University, Boston, USA
  2. University of Washington, Seattle, USA
