
A multitask joint framework for real-time person search

  • Regular Paper

Abstract

Person search generally involves three parts: person detection, feature extraction, and identity comparison. A person search system that integrates detection, extraction, and comparison, however, has two drawbacks. First, detection errors propagate and limit the accuracy of identity comparison. Second, real-time performance is difficult to achieve in real-world applications. To address these problems, we propose a multitask joint framework for real-time person search (MJF) that jointly optimizes person detection, feature extraction, and identity comparison. For the person detection module, we propose the YOLOv5-GS model, trained on a person dataset; it combines the advantages of GhostNet and the squeeze-and-excitation block to increase detection speed. For the feature extraction module, we design a model adaptation architecture that selects different networks according to the number of detected people, balancing accuracy against speed. For identity comparison, we propose a 3D pooled table and a matching strategy that improve identification accuracy. On 1920 × 1080 video with a 200-ID table, our method reaches an identification rate (IR) of 82.69% at 25.14 frames per second (FPS). The MJF therefore achieves real-time person search.
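The article preview does not include code, so the following is only a minimal sketch of the kind of "GS" building block the abstract describes for YOLOv5-GS: a GhostNet-style ghost convolution followed by a squeeze-and-excitation (SE) channel-attention block. The class names, channel split, SiLU activation, and reduction ratio are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code) of a ghost convolution + SE attention block,
# the two ingredients the abstract names for YOLOv5-GS.
import torch
import torch.nn as nn


class GhostConv(nn.Module):
    """Ghost module: half of the output channels come from an ordinary 1x1
    convolution, the other half from a cheap depthwise convolution on them."""

    def __init__(self, in_ch: int, out_ch: int, dw_kernel: int = 3):
        super().__init__()
        half = out_ch // 2  # assumes out_ch is even
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, half, 1, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(half, half, dw_kernel, padding=dw_kernel // 2,
                      groups=half, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)


class SEBlock(nn.Module):
    """Squeeze-and-excitation: globally pool the feature map, learn per-channel
    weights through a small bottleneck, and rescale the channels."""

    def __init__(self, ch: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1),
            nn.SiLU(),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)


class GhostSE(nn.Module):
    """Ghost convolution followed by SE attention: cheaper feature maps
    with learned channel re-weighting."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = GhostConv(in_ch, out_ch)
        self.se = SEBlock(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.se(self.conv(x))


if __name__ == "__main__":
    block = GhostSE(64, 128)
    print(block(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 128, 80, 80])
```

The ghost convolution reduces the cost of generating feature maps (which is what makes the detector faster), while the SE block adds channel attention at negligible cost; combining the two is consistent with the speed/accuracy trade-off the abstract claims.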

Funding

This work was supported in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (no. 2022D01B05).

Author information

Corresponding authors

Correspondence to Xinzhong Wang, Guangqiang Yin or Zhiguo Wang.

Ethics declarations

Data availability

The data that support the findings of this study are available online. The datasets were derived from the following public-domain resources: COCO, CrowdHuman, Market-1501, and DukeMTMC.

Additional information

Communicated by C. Yan.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, Y., Yin, K., Liang, J. et al. A multitask joint framework for real-time person search. Multimedia Systems 29, 211–222 (2023). https://doi.org/10.1007/s00530-022-00982-y

