Video-Based Person Re-identification by 3D Convolutional Neural Networks and Improved Parameter Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10882)


In this paper, we propose a novel approach to video-based person re-identification that uses convolutional neural networks to learn the similarity of persons observed by video cameras. We use 3-dimensional convolutional neural networks (3D CNNs) to extract fine-grained spatiotemporal features from the video sequence of a person. Unlike recurrent neural networks, a 3D CNN preserves the spatial patterns of the input, which works well for the re-identification problem. The network maps each video sequence of a person to a Euclidean space in which distances between feature embeddings directly correspond to measures of person similarity. With our improved parameter learning method, called entire triplet loss, all possible triplets in the mini-batch are taken into account when updating the network parameters. This parameter updating method significantly improves training and makes the embeddings more discriminative. Experimental results show that our model achieves new state-of-the-art rank-1 identification rates of 82.0% on the iLIDS-VID dataset and 83.3% on the PRID-2011 dataset.
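The idea of using every triplet in a mini-batch can be illustrated with a short sketch. The abstract does not give the exact form of the entire triplet loss, so the following is only an assumption-laden illustration of the general technique: it averages a hinge loss over all valid (anchor, positive, negative) triplets in a batch of embeddings. The function name `all_triplets_loss`, the `margin` value, and the use of squared Euclidean distance are hypothetical choices made for this example, not details taken from the paper.

```python
import numpy as np


def all_triplets_loss(embeddings, labels, margin=0.2):
    """Mean hinge loss over every valid triplet in a mini-batch.

    Illustrative sketch only: the hinge form, the margin, and the squared
    Euclidean distance are assumptions, not the paper's formulation.
    Returns (mean_loss, number_of_valid_triplets).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances, shape (B, B).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)

    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos_mask = same & not_self   # (anchor, positive): same identity, distinct samples
    neg_mask = ~same             # (anchor, negative): different identities

    # valid[a, p, n] is True when (a, p, n) forms a usable triplet.
    valid = pos_mask[:, :, None] & neg_mask[:, None, :]

    # loss[a, p, n] = max(0, d(a, p) - d(a, n) + margin)
    loss = np.maximum(0.0, dist[:, :, None] - dist[:, None, :] + margin)

    n_valid = int(valid.sum())
    if n_valid == 0:
        return 0.0, 0
    return float(loss[valid].sum() / n_valid), n_valid


# Toy batch: two identities, two sequence embeddings each.
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
toy_labels = [0, 0, 1, 1]
toy_loss, toy_count = all_triplets_loss(toy_embeddings, toy_labels, margin=0.2)
```

In this toy batch each anchor has one positive and two negatives, so eight triplets contribute to the average. In a real training loop the embeddings would come from the 3D CNN, and the same computation would be written in an automatic differentiation framework so that gradients flow back to the network parameters.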



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Keio University, Tokyo, Japan
  2. Panasonic Corporation, Osaka, Japan
