Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12363)

Abstract

Multispectral pedestrian detection can adapt to insufficient illumination by leveraging the color and thermal modalities. However, in-depth insights into how to fuse the two modalities effectively are still lacking. Compared with traditional pedestrian detection, we find that multispectral pedestrian detection suffers from modality imbalance problems, which hinder the optimization of the dual-modality network and degrade the performance of the detector. Inspired by this observation, we propose the Modality Balance Network (MBNet), which facilitates the optimization process in a much more flexible and balanced manner. First, we design a novel Differential Modality Aware Fusion (DMAF) module that makes the two modalities complement each other. Second, an illumination-aware feature alignment module selects complementary features according to the illumination conditions and adaptively aligns the features of the two modalities. Extensive experimental results demonstrate that MBNet outperforms state-of-the-art methods on the challenging KAIST and CVC-14 multispectral pedestrian datasets in terms of both accuracy and computational efficiency. Code is available at https://github.com/CalayZhou/MBNet.
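The abstract describes its two mechanisms only at a high level: fusing the modalities according to how they differ, and weighting them according to illumination. The pure-Python sketch below illustrates those general ideas under explicit assumptions. Feature maps are modeled as [channel][flattened position] lists; the gating rule (tanh of the pooled per-channel RGB-minus-thermal difference, with the weaker modality borrowing from the stronger one) and the logistic brightness proxy for illumination are hypothetical choices made for illustration. They are not MBNet's published DMAF or illumination-aware alignment module; the authors' implementation is in the linked repository.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a feature map as [channel][flattened spatial position].
FeatureMap = List[List[float]]


def global_avg_pool(fm: FeatureMap) -> List[float]:
    """Mean activation of each channel over its spatial positions."""
    return [sum(ch) / len(ch) for ch in fm]


def differential_fusion(rgb: FeatureMap, thermal: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    """Hypothetical differential-fusion step (not the published DMAF).

    g_c = tanh(mean(R_c - T_c)) lies in (-1, 1). For each channel, the modality
    with the lower mean activation receives the other modality scaled by |g_c|;
    the stronger modality is left unchanged.
    """
    diff = [[r - t for r, t in zip(rc, tc)] for rc, tc in zip(rgb, thermal)]
    gates = [math.tanh(m) for m in global_avg_pool(diff)]
    rgb_out: FeatureMap = []
    thermal_out: FeatureMap = []
    for rc, tc, g in zip(rgb, thermal, gates):
        to_thermal = max(g, 0.0)  # RGB stronger: thermal borrows RGB features
        to_rgb = max(-g, 0.0)     # thermal stronger: RGB borrows thermal features
        rgb_out.append([r + to_rgb * t for r, t in zip(rc, tc)])
        thermal_out.append([t + to_thermal * r for r, t in zip(rc, tc)])
    return rgb_out, thermal_out


def illumination_weight(mean_brightness: float, midpoint: float = 0.3,
                        sharpness: float = 20.0) -> float:
    """Hypothetical illumination proxy: logistic in mean brightness (0 = dark,
    1 = bright). Returns the weight given to the RGB branch; midpoint and
    sharpness are illustrative assumptions, not learned values."""
    return 1.0 / (1.0 + math.exp(-sharpness * (mean_brightness - midpoint)))


def fuse_scores(score_rgb: float, score_thermal: float, w_rgb: float) -> float:
    """Convex combination of per-modality detection scores."""
    return w_rgb * score_rgb + (1.0 - w_rgb) * score_thermal


if __name__ == "__main__":
    rgb = [[1.0, 3.0], [0.0, 0.0]]      # channel 0 brighter in RGB
    thermal = [[1.0, 1.0], [2.0, 2.0]]  # channel 1 stronger in thermal
    fused_rgb, fused_thermal = differential_fusion(rgb, thermal)
    print(fused_rgb, fused_thermal)
    for brightness in (0.05, 0.3, 0.8):  # night, dusk, day (assumed scale)
        w = illumination_weight(brightness)
        print(brightness, round(w, 3), round(fuse_scores(0.8, 0.4, w), 3))
```

In this sketch, a dark scene pushes the weight toward the thermal branch and a bright scene toward the RGB branch. In a learned detector, the illumination weight would typically come from a small network rather than a fixed brightness threshold.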

Keywords

Multispectral pedestrian detection · Modality imbalance problems · Multimodal feature fusion

Notes

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (No. 61627804).

Supplementary material

Supplementary material 1: 504473_1_En_46_MOESM1_ESM.zip (zip, 51.9 MB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Nanjing University, Nanjing, China
