Robust Multispectral Pedestrian Detection via Uncertainty-Aware Cross-Modal Learning

Park, Sungjune; Kim, Jung Uk; Kim, Yeon Gyun; Moon, Sang-Keun; Ro, Yong Man

doi:10.1007/978-3-030-67832-6_32

Sungjune Park¹⁵,
Jung Uk Kim¹⁵,
Yeon Gyun Kim¹⁶,
Sang-Keun Moon¹⁷ &
…
Yong Man Ro¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12572))

Included in the following conference series:

International Conference on Multimedia Modeling

2755 Accesses
3 Citations

Abstract

With the development of deep neural networks, multispectral pedestrian detection has been received a great attention by exploiting complementary properties of multiple modalities (e.g., color-visible and thermal modalities). Previous works usually rely on network prediction scores in combining complementary modal information. However, it is widely known that deep neural networks often show the overconfident problem which results in limited performance. In this paper, we propose a novel uncertainty-aware cross-modal learning to alleviate the aforementioned problem in multispectral pedestrian detection. First, we extract object region uncertainty which represents the reliability of object region features in multiple modalities. Then, we combine each modal object region feature considering object region uncertainty. Second, we guide the classifier of detection framework with soft target labels to be aware of the level of object region uncertainty in multiple modalities. To verify the effectiveness of the proposed methods, we conduct extensive experiments with various detection frameworks on two public datasets (i.e., KAIST Multispectral Pedestrian Dataset and CVC-14).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Cao, Y., Guan, D., Wu, Y., Yang, J., Cao, Y., Yang, M.Y.: Box-level segmentation supervised deep neural networks for accurate and real-time multispectral pedestrian detection. ISPRS J. Photogramm. Remote Sens. 150, 70–79 (2019)
Article Google Scholar
Chang, J., Lan, Z., Cheng, C., Wei, Y.: Data uncertainty learning in face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5710–5719 (2020)
Google Scholar
Choi, H., Kim, S., Park, K., Sohn, K.: Multi-spectral pedestrian detection based on accumulated object proposal with fully convolutional networks. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 621–626. IEEE (2016)
Google Scholar
Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: a benchmark. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 304–311. IEEE (2009)
Google Scholar
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016)
Google Scholar
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361. IEEE (2012)
Google Scholar
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Google Scholar
González, A., et al.: Pedestrian detection at day/night time with visible and fir cameras: a comparison. Sensors 16(6), 820 (2016)
Article Google Scholar
Guan, D., Cao, Y., Yang, J., Cao, Y., Yang, M.Y.: Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection. Inf. Fusion 50, 148–157 (2019)
Article Google Scholar
Gupta, S., Girshick, R., Arbeláez, P., Malik, J.: Learning rich features from RGB-D images for object detection and segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 345–360. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10584-0_23
Chapter Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
He, Y., Zhu, C., Wang, J., Savvides, M., Zhang, X.: Bounding box regression with uncertainty for accurate object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2888–2897 (2019)
Google Scholar
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Hwang, S., Park, J., Kim, N., Choi, Y., So Kweon, I.: Multispectral pedestrian detection: benchmark dataset and baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1037–1045 (2015)
Google Scholar
Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: Advances in Neural Information Processing Systems, pp. 5574–5584 (2017)
Google Scholar
Kim, J.U., Kwon, J., Kim, H.G., Lee, H., Ro, Y.M.: Object bounding box-critic networks for occlusion-robust object detection in road scene. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 1313–1317. IEEE (2018)
Google Scholar
Kim, J.U., Kwon, J., Kim, H.G., Ro, Y.M.: BBC net: bounding-box critic network for occlusion-robust object detection. IEEE Trans. Circuits Syst. Video Technol. 30(4), 1037–1050 (2019)
Article Google Scholar
Kim, J.U., Park, S., Ro, Y.M.: Towards human-like interpretable object detection via spatial relation encoding. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 3284–3288. IEEE (2020)
Google Scholar
Kim, J.U., Ro, Y.M.: Attentive layer separation for object classification and object localization in object detection. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 3995–3999. IEEE (2019)
Google Scholar
Konig, D., Adam, M., Jarvers, C., Layher, G., Neumann, H., Teutsch, M.: Fully convolutional region proposal networks for multispectral person detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 49–56 (2017)
Google Scholar
Le, Q.V., Smola, A.J., Canu, S.: Heteroscedastic Gaussian process regression. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 489–496 (2005)
Google Scholar
Li, C., Song, D., Tong, R., Tang, M.: Multispectral pedestrian detection via simultaneous detection and segmentation. arXiv preprint arXiv:1808.04818 (2018)
Li, C., Song, D., Tong, R., Tang, M.: Illumination-aware faster R-CNN for robust multispectral pedestrian detection. Pattern Recognit. 85, 161–171 (2019)
Article Google Scholar
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Google Scholar
Liu, J., Zhang, S., Wang, S., Metaxas, D.N.: Multispectral deep neural networks for pedestrian detection. arXiv preprint arXiv:1611.02644 (2016)
Nix, D.A., Weigend, A.S.: Estimating the mean and variance of the target probability distribution. In: Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN 1994), vol. 1, pp. 55–60. IEEE (1994)
Google Scholar
Park, K., Kim, S., Sohn, K.: Unified multi-spectral pedestrian detection based on probabilistic fusion networks. Pattern Recognit. 80, 143–155 (2018)
Article Google Scholar
Paszke, A., et al.: Automatic differentiation in pytorch (2017)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Song, S., Xiao, J.: Deep sliding shapes for amodal 3D object detection in RGB-D images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 808–816 (2016)
Google Scholar
Torabi, A., Massé, G., Bilodeau, G.A.: An iterative integrated framework for thermal-visible image registration, sensor fusion, and people tracking for video surveillance applications. Comput. Vis. Image Underst. 116(2), 210–221 (2012)
Article Google Scholar
Tung, F., Mori, G.: Similarity-preserving knowledge distillation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1365–1374 (2019)
Google Scholar
Wang, X., Wang, M., Li, W.: Scene-specific pedestrian detection for static video surveillance. IEEE Trans. Pattern Anal. Mach. Intell. 36(2), 361–374 (2013)
Article Google Scholar
Zhang, L., et al.: Cross-modality interactive attention network for multispectral pedestrian detection. Inf. Fusion 50, 20–29 (2019)
Article Google Scholar
Zhang, L., Zhu, X., Chen, X., Yang, X., Lei, Z., Liu, Z.: Weakly aligned cross-modal learning for multispectral pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5127–5137 (2019)
Google Scholar

Download references

Author information

Authors and Affiliations

Image and Video Systems Laboratory, School of Electrical Engineering, KAIST, Daejeon, Republic of Korea
Sungjune Park, Jung Uk Kim & Yong Man Ro
Agency for Defense Development, Daejeon, Republic of Korea
Yeon Gyun Kim
Korea Electric Power Corporation (KEPCO) Research Institute, Daejeon, Republic of Korea
Sang-Keun Moon

Authors

Sungjune Park
View author publications
You can also search for this author in PubMed Google Scholar
Jung Uk Kim
View author publications
You can also search for this author in PubMed Google Scholar
Yeon Gyun Kim
View author publications
You can also search for this author in PubMed Google Scholar
Sang-Keun Moon
View author publications
You can also search for this author in PubMed Google Scholar
Yong Man Ro
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yong Man Ro .

Editor information

Editors and Affiliations

Charles University, Prague, Czech Republic
Jakub Lokoč
Charles University, Prague, Czech Republic
Tomáš Skopal
Klagenfurt University, Klagenfurt, Austria
Klaus Schoeffmann
CERTH-ITI, Thessaloniki, Greece
Vasileios Mezaris
Renmin University of China, Beijing, China
Xirong Li
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Queen Mary University of London, London, UK
Ioannis Patras

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Park, S., Kim, J.U., Kim, Y.G., Moon, SK., Ro, Y.M. (2021). Robust Multispectral Pedestrian Detection via Uncertainty-Aware Cross-Modal Learning. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science(), vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_32

Download citation

DOI: https://doi.org/10.1007/978-3-030-67832-6_32
Published: 21 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics