Efficient Person Search: An Anchor-Free Approach

International Journal of Computer Vision

Abstract

Person search aims to simultaneously localize and identify a query person in uncropped images. To achieve this goal, state-of-the-art models typically add a re-id branch on top of two-stage detectors such as Faster R-CNN. Owing to the ROI-Align operation, this pipeline yields promising accuracy because re-id features are explicitly aligned with the corresponding object regions; at the same time, it introduces high computational overhead due to dense object anchors. In this work, we present an anchor-free approach that tackles this challenging task efficiently through the following dedicated designs. First, we select an anchor-free detector (i.e., FCOS) as the prototype of our framework. Because it does not rely on dense object anchors, it is significantly more efficient than existing person search models. Second, directly adapting this anchor-free detector to person search gives rise to several misalignment issues at different levels (i.e., scale, region, and task). To address these issues, we propose an aligned feature aggregation module that generates more discriminative and robust feature embeddings. Accordingly, we name our framework the Feature-Aligned Person Search Network (AlignPS). Third, by investigating the advantages of both anchor-based and anchor-free models, we further augment AlignPS with an ROI-Align head, which significantly improves the robustness of re-id features while keeping our model highly efficient. Our framework not only achieves state-of-the-art or competitive performance on two challenging person search benchmarks, but can also be extended to other challenging search tasks such as animal and object search. All source code, data, and trained models are available at: https://github.com/daodaofr/alignps.
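
To make the task definition concrete, the retrieval step of person search can be sketched as follows. This is a minimal illustration only, not the AlignPS implementation: it assumes a detector has already produced one box and one re-id embedding per person in each gallery image, and it ranks those detections by cosine similarity to the query embedding. The function names, the dictionary layout, and the toy three-dimensional embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query_embedding, gallery):
    """Rank detected persons by similarity to the query.

    `gallery` is a list of dicts with the hypothetical keys "image", "box"
    and "embedding"; each entry is one detected person in an uncropped image.
    Returns (score, image, box) tuples, best match first.
    """
    scored = [
        (cosine_similarity(query_embedding, det["embedding"]), det["image"], det["box"])
        for det in gallery
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy data: three detections spread over two scene images (values are made up).
query = [1.0, 0.0, 0.0]
gallery = [
    {"image": "scene_a.jpg", "box": (10, 20, 50, 120), "embedding": [0.0, 1.0, 0.0]},
    {"image": "scene_b.jpg", "box": (30, 40, 70, 160), "embedding": [0.9, 0.1, 0.0]},
    {"image": "scene_b.jpg", "box": (200, 35, 240, 150), "embedding": [0.5, 0.5, 0.5]},
]

ranking = rank_gallery(query, gallery)
for score, image, box in ranking:
    print(f"{score:.3f}  {image}  {box}")
```

Because every detection carries both a box and an embedding, the top-ranked entry answers both questions at once: which image the query person appears in and where.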

Notes

  1. https://github.com/PICWD/PICWD.

  2. https://github.com/caposerenity/mindspore_AlignPS.

  3. We test the PyTorch implementation at https://github.com/serend1p1ty/person_search.

Acknowledgements

This paper was supported by NSFC (No. 62201342, U19B2035, 62276129), Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), the Natural Science Foundation of Jiangsu Province (No. BK20220890), and CAAI-Huawei MindSpore Open Fund.

Author information

Corresponding authors

Correspondence to Yichao Yan or Jie Qin.

Additional information

Communicated by Jingdong Wang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, Y., Li, J., Qin, J. et al. Efficient Person Search: An Anchor-Free Approach. Int J Comput Vis 131, 1642–1661 (2023). https://doi.org/10.1007/s11263-023-01772-3

  • DOI: https://doi.org/10.1007/s11263-023-01772-3
