Abstract
Pedestrian detection and re-identification have progressed significantly in the last few years. However, occluded people remain notoriously hard to detect and recognize, because their appearance varies substantially across a wide range of occlusion patterns. In this paper, we propose a simple and compact CNN-based method for occlusion handling. We start by interpreting the CNN channel features of a pedestrian detector and find that different channels respond to different body parts. This finding motivates us to employ an attention mechanism across channels to represent various occlusion patterns in a single model, since each occlusion pattern can be formulated as a specific combination of body parts. Accordingly, we propose an attention network with self or external guidance as an add-on to the baseline CNN. We also propose an attention-guided self-paced learning method to balance the optimization across different occlusion levels. The proposed method shows significant improvements over the baselines for both pedestrian detection and re-identification. For pedestrian detection, we improve on the baseline Faster R-CNN detector by 8 pp on the heavy occlusion subset of CityPersons, and on Caltech we outperform the state-of-the-art method by 5 pp. For pedestrian re-identification, our method surpasses the baseline and achieves state-of-the-art performance on multiple re-identification benchmarks.
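To make the idea of attention across channels concrete, the following is a minimal sketch of channel-wise re-weighting, assuming a squeeze-and-excitation-style self-guided variant: each channel is pooled to a single value, a small linear layer with a sigmoid turns the pooled values into per-channel weights, and the feature maps are rescaled by those weights. It is not the network described in the paper; the function names, the toy feature maps, and the `weights` and `biases` values are all hypothetical and chosen only for illustration.

```python
import math


def global_average_pool(features):
    """Squeeze each channel (a 2D list of activations) to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features, weights, biases):
    """Compute one attention weight in (0, 1) per channel.

    weights[i][j] is a hypothetical learned contribution of channel j's
    pooled response to the attention weight of channel i.
    """
    pooled = global_average_pool(features)
    return [
        sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(weights, biases)
    ]


def apply_attention(features, attention):
    """Rescale every activation in channel c by attention[c]."""
    return [
        [[attention[c] * value for value in row] for row in channel]
        for c, channel in enumerate(features)
    ]


# Toy input (assumption): two channels of 2x2 activations. The first channel
# stands in for a visible body part, the second for an occluded one.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
weights = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical learned parameters
biases = [0.0, 0.0]

attention = channel_attention(features, weights, biases)
reweighted = apply_attention(features, attention)
```

In this toy setting the channel with strong responses receives a larger weight than the silent one, which is the sense in which a combination of channel weights can stand for a combination of visible body parts. An externally guided variant would compute `attention` from additional cues rather than from the features themselves.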
Acknowledgements
This work was supported by the Funds for International Cooperation and Exchange of the National Natural Science Foundation of China (Grant No. 61861136011), the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20181299), the National Science Fund of China (Grant No. 61702262), the Fundamental Research Funds for the Central Universities (Grant No. 30920032201), and the National Key Research and Development Program of China (Grant No. 2017YFC0820601).
Additional information
Communicated by Chen Change Loy.
Cite this article
Zhang, S., Chen, D., Yang, J. et al. Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification. Int J Comput Vis 129, 1875–1892 (2021). https://doi.org/10.1007/s11263-021-01461-z
DOI: https://doi.org/10.1007/s11263-021-01461-z