Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification

International Journal of Computer Vision

Abstract

Pedestrian detection and re-identification have progressed significantly in the last few years. However, occluded people remain notoriously hard to detect and recognize, because their appearance varies substantially across a wide range of occlusion patterns. In this paper, we propose a simple and compact CNN-based method for occlusion handling. We start by interpreting the CNN channel features of a pedestrian detector and find that different channels respond to different body parts. Because each occlusion pattern can be formulated as a specific combination of body parts, this finding motivates us to employ an attention mechanism across channels to represent various occlusion patterns in a single model. We therefore propose an attention network with self or external guidance as an add-on to the baseline CNN method. We also propose an attention-guided self-paced learning method to balance the optimization across different occlusion levels. Our method shows significant improvements over the baseline methods for both pedestrian detection and re-identification. For pedestrian detection, we achieve an improvement of 8 pp over the baseline Faster R-CNN detector on the heavy occlusion subset of CityPersons, and on Caltech we outperform the state-of-the-art method by 5 pp. For pedestrian re-identification, our method surpasses the baseline and achieves state-of-the-art performance on multiple re-identification benchmarks.
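The idea of attention across channels can be illustrated with a minimal sketch. It is not the architecture of the paper: it assumes a squeeze-and-excitation style self-guided variant with a single linear layer, and the names `channel_attention`, `weights` and `bias`, as well as all numeric values, are hypothetical. Each channel is pooled to one scalar, a sigmoid gate turns the pooled vector into one weight per channel, and every channel is rescaled by its weight, so that channels tied to occluded body parts could in principle be suppressed.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features, weights, bias):
    """Reweight feature channels with a per-channel attention vector.

    features: list of C channels, each an H x W list of lists of floats.
    weights:  C x C matrix (hypothetical learned parameters).
    bias:     list of C floats (hypothetical learned parameters).
    Returns (reweighted_features, attention).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in features
    ]
    # Excite: a linear layer plus sigmoid gives a weight in (0, 1) per channel.
    attention = [
        sigmoid(sum(w * p for w, p in zip(w_row, pooled)) + b)
        for w_row, b in zip(weights, bias)
    ]
    # Reweight: scale every activation in channel c by attention[c].
    reweighted = [
        [[a * v for v in row] for row in ch]
        for ch, a in zip(features, attention)
    ]
    return reweighted, attention


# Toy input: two 2 x 2 channels, imagined as "upper body" and "lower body".
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
# Made-up parameters that keep channel 0 and suppress channel 1.
weights = [[4.0, 0.0], [0.0, -4.0]]
bias = [0.0, 0.0]

reweighted, attention = channel_attention(features, weights, bias)
print(attention)
```

With these made-up parameters the first channel passes almost unchanged while the second is driven close to zero, which mimics representing one occlusion pattern (lower body hidden) as a combination of channel weights.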

Acknowledgements

This work was supported by the Funds for International Cooperation and Exchange of the National Natural Science Foundation of China (Grant No. 61861136011), the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20181299), the National Science Fund of China (Grant No. 61702262), the Fundamental Research Funds for the Central Universities (Grant No. 30920032201), and the National Key Research and Development Program of China (Grant No. 2017YFC0820601).

Author information

Corresponding author

Correspondence to Jian Yang.

Additional information

Communicated by Chen Change Loy.

About this article

Cite this article

Zhang, S., Chen, D., Yang, J. et al. Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification. Int J Comput Vis 129, 1875–1892 (2021). https://doi.org/10.1007/s11263-021-01461-z

  • DOI: https://doi.org/10.1007/s11263-021-01461-z
