Person Search in Videos with One Portrait Through Visual and Temporal Links

  • Qingqiu Huang
  • Wentao Liu
  • Dahua Lin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11217)

Abstract

In real-world applications such as law enforcement and video retrieval, one often needs to search for a particular person in long videos given just one portrait. This is much more challenging than the conventional person re-identification setting, as the search may need to be carried out in environments different from the one where the portrait was taken. In this paper, we aim to tackle this challenge and propose a novel framework that takes into account identity invariance along a tracklet, thus allowing person identities to be propagated via both visual and temporal links. We also develop a novel scheme called Progressive Propagation via Competitive Consensus, which significantly improves the reliability of the propagation process. To promote the study of person search, we construct a large-scale benchmark containing 127K manually annotated tracklets from 192 movies. Experiments show that our approach remarkably outperforms mainstream person re-id methods, raising the mAP from 42.16% to 62.27% (code at https://github.com/hqqasw/person-search-PPCC).
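To make the two ideas in the abstract more concrete, here is a minimal NumPy sketch of label propagation over a graph whose edges are either visual links (similarity scores) or temporal links (instances in the same tracklet, given weight 1.0 here). It assumes a precomputed symmetric weight matrix `W` and labeled portrait nodes. The "competitive" rule (each identity receives only the strongest single neighbor's vote, not a weighted sum), the `freeze_ratio` schedule for progressively fixing the most confident nodes, and all function names are illustrative assumptions, not the authors' implementation; the repository linked above contains the actual code.

```python
import numpy as np


def competitive_consensus_step(W, P):
    """One propagation step with a 'competitive' vote (illustrative sketch).

    W: (n, n) symmetric non-negative link weights (visual or temporal links).
    P: (n, K) per-node identity distributions.
    For node j and identity k the vote is max_i W[i, j] * P[i, k]: the single
    strongest neighbor wins, rather than a weighted sum over all neighbors.
    """
    votes = np.max(W[:, :, None] * P[:, None, :], axis=0)  # shape (n, K)
    totals = votes.sum(axis=1, keepdims=True)
    # Normalize each row; nodes with no incoming mass stay all-zero.
    return np.divide(votes, totals, out=np.zeros_like(votes), where=totals > 0)


def progressive_propagation(W, labels, num_ids, freeze_ratio=0.2, max_iters=50):
    """Propagate identities from labeled (portrait) nodes, freezing the most
    confident unlabeled nodes a few at a time (hypothetical schedule).

    labels: dict {node index: identity index} for the portrait nodes.
    freeze_ratio: assumed fraction of still-free nodes frozen per iteration.
    """
    n = W.shape[0]
    P = np.zeros((n, num_ids))
    frozen = np.zeros(n, dtype=bool)
    for node, identity in labels.items():
        P[node, identity] = 1.0
        frozen[node] = True

    for _ in range(max_iters):
        if frozen.all():
            break
        Q = competitive_consensus_step(W, P)
        P[~frozen] = Q[~frozen]  # frozen nodes keep their distributions

        free = np.flatnonzero(~frozen)
        reached = free[P[free].max(axis=1) > 0]
        if reached.size == 0:
            break  # no remaining node can receive any identity mass
        k = max(1, int(np.ceil(freeze_ratio * free.size)))
        order = np.argsort(-P[reached].max(axis=1), kind="stable")
        frozen[reached[order[:k]]] = True
    return P


# Toy graph: nodes 0 and 4 are portraits of identities 0 and 1.
# Edges 0-1, 2-3, 3-4 are visual links; 1-2 is a temporal link (same tracklet).
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 0.9), (1, 2, 1.0), (2, 3, 0.3), (3, 4, 0.8)]:
    W[i, j] = W[j, i] = w
P = progressive_propagation(W, labels={0: 0, 4: 1}, num_ids=2)
print(P.argmax(axis=1))  # [0 0 0 1 1]
```

In this toy case, node 2 takes identity 0 because its strong temporal link to node 1 outweighs its weak visual link to node 3. Taking the max instead of the sum keeps a single strong link from being diluted by many weak ones, and freezing only confident nodes is one simple way to limit how far early mistakes can spread.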

Keywords

Person search · Portrait · Visual and temporal · Progressive Propagation · Competitive Consensus

Notes

Acknowledgement

This work is partially supported by the Big Data Collaboration Research grant from SenseTime Group (CUHK Agreement No. TS1610626) and the General Research Fund (GRF) of Hong Kong (No. 14236516).

Supplementary material

Supplementary material 1 (mp4 10794 KB)

Supplementary material 2 (pdf 121 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong, Shatin, Hong Kong
  2. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  3. SenseTime Research, Beijing, China