A Hybrid 2D and 3D Convolution Based Recurrent Network for Video-Based Person Re-identification

  • Li Cheng
  • Xiao-Yuan Jing
  • Xiaoke Zhu
  • Fumin Qi
  • Fei Ma
  • Xiaodong Jia
  • Liang Yang
  • Chunhe Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11301)

Abstract

Video-based person re-identification (re-id), which aims to match people across videos captured by non-overlapping camera views, has attracted considerable research interest recently. In this paper, we propose a novel hybrid 2D and 3D convolution based recurrent neural network for the video-based person re-id task, which simultaneously exploits local short-term fast-varying motion information and global long-term spatial and temporal information. Specifically, the 3D convolutional module captures the local short-term fast-varying motion information, while the recurrent layer learns the global long-term spatial and temporal information. We evaluate the proposed hybrid network on the publicly available PRID 2011, iLIDS-VID and MARS multi-shot pedestrian re-identification datasets, and the experimental results demonstrate the effectiveness of our approach for video-based person re-id.
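
The division of labour described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' network: the function names, the scalar features, the temporal-difference kernel and the fixed weights `w_in` and `w_rec` are all assumptions chosen for illustration, and the 2D convolution part of the hybrid model is left out because the abstract does not describe it. The sketch only shows a small 3D convolution responding to frame-to-frame change, followed by a recurrent pass that summarises the whole sequence.

```python
import math

def conv3d_valid(clip, kernel):
    """Valid 3D cross-correlation of a T x H x W clip with a kt x kh x kw kernel."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

def short_term_features(video, kernel):
    """One scalar per temporal position: spatial mean of the 3D-conv response."""
    feats = []
    for plane in conv3d_valid(video, kernel):
        vals = [v for row in plane for v in row]
        feats.append(sum(vals) / len(vals))
    return feats

def recurrent_summary(feats, w_in=1.0, w_rec=0.5):
    """Plain tanh RNN over the short-term features, then temporal mean pooling.
    The weights are hypothetical constants, not learned parameters."""
    h = 0.0
    states = []
    for f in feats:
        h = math.tanh(w_in * f + w_rec * h)
        states.append(h)
    return sum(states) / len(states)

# Toy 4-frame, 2x2-pixel videos (assumed data, for illustration only).
static = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]
brightening = [[[float(t)] * 2 for _ in range(2)] for t in range(4)]

# Kernel spanning 2 frames and 1x1 pixels: responds to frame-to-frame change.
diff_kernel = [[[-1.0]], [[1.0]]]

static_descriptor = recurrent_summary(short_term_features(static, diff_kernel))
moving_descriptor = recurrent_summary(short_term_features(brightening, diff_kernel))
```

With this kernel a static sequence produces zero short-term response at every step, so its sequence-level descriptor is zero, whereas a sequence whose intensity changes between frames yields a positive descriptor. A real model would replace the scalars with learned multi-channel feature maps and a trained recurrent layer.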

Keywords

3D convolution; Short-term fast-varying motion information; Spatial and temporal information

Notes

Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their constructive comments and suggestions. This work was supported by the NSFC-Key Project of General Technology Fundamental Research United Fund under Grant No. U1736211, the National Key Research and Development Program of China under Grant No. 2017YFB0202001, the National Natural Science Foundation of China under Grant Nos. 61672208, U1504611 and 41571417, the Natural Science Foundation Key Project for Innovation Group of Hubei Province under Grant No. 2018CFA024, the Science and Technique Development Program of Henan under Grant Nos. 172102210186 and 182102311066, and the Medical Education Research Project of Henan under Grant No. Wjlx2016095.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Li Cheng (1)
  • Xiao-Yuan Jing (1, 2)
  • Xiaoke Zhu (1, 3)
  • Fumin Qi (1)
  • Fei Ma (1)
  • Xiaodong Jia (1)
  • Liang Yang (1)
  • Chunhe Wang (1)

  1. School of Computer, Wuhan University, Wuhan, China
  2. College of Automation, Nanjing University of Posts and Telecommunications, Nanjing, China
  3. School of Computer and Information Engineering, Henan University, Kaifeng, China
