
Robust Re-Identification by Multiple Views Knowledge Distillation

  • Conference paper
  • Computer Vision – ECCV 2020 (ECCV 2020)

Abstract

To achieve robustness in Re-Identification, standard methods leverage tracking information in a Video-To-Video fashion. However, these solutions suffer a large drop in performance for single-image queries (e.g., the Image-To-Video setting). Recent works address this severe degradation by transferring temporal information from a Video-based network to an Image-based one. In this work, we devise a training strategy that allows the transfer of superior knowledge, arising from a set of views depicting the target object. Our proposal – Views Knowledge Distillation (VKD) – pins this visual variety as a supervision signal within a teacher-student framework, where the teacher educates a student who observes fewer views. As a result, the student outperforms not only its teacher but also the current state-of-the-art in Image-To-Video by a wide margin (6.3% mAP on MARS, 8.6% on Duke and 5% on VeRi-776). A thorough analysis – on Person, Vehicle and Animal Re-ID – investigates the properties of VKD from both a qualitative and a quantitative perspective. Code is available at https://github.com/aimagelab/VKD.
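The teacher-student asymmetry described above (a teacher that embeds a set of views of the target, a student restricted to fewer) can be illustrated with a minimal sketch. Average pooling over view features, the number of views and the 128-d feature size are assumptions for illustration only; the paper's actual architecture and aggregation are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_view_set(view_features):
    # Aggregate per-view feature vectors into one set-level embedding.
    # Average pooling is an illustrative assumption, not the paper's design.
    return np.mean(view_features, axis=0)

# Hypothetical setup: the teacher observes 8 views of the target,
# while the student only observes the first 2.
all_views = rng.normal(size=(8, 128))              # 8 views, 128-d features each
teacher_embedding = embed_view_set(all_views)      # built from the full view set
student_embedding = embed_view_set(all_views[:2])  # built from fewer views

# A VKD-style objective would then pull the student's embedding (and its
# predictions) towards the teacher's, so that the student inherits the
# multi-view knowledge while needing fewer views at query time.
```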


Notes

  1. For the sake of clarity, all the loss terms refer to a single example. In the implementation, we extend the penalties to a batch by averaging.

  2. Since the teacher parameters are fixed, its entropy is constant and the objective of Eq. 3 reduces to the cross-entropy between \(\textit{\textbf{y}}_{T}\) and \(\textit{\textbf{y}}_{S}\).

  3. In the following, we refer to Duke-Video-ReID simply as Duke. Another variant of Duke named Duke-ReID exists [34], but it does not come with query tracklets.

  4. Since VeRi-776 does not include any tracklet information in the query set, following all other competitors we limit experiments to the I2V setting only.
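The reduction stated in note 2 (with a frozen teacher, minimising the KL divergence of Eq. 3 is equivalent to minimising the cross-entropy between \(\textit{\textbf{y}}_{T}\) and \(\textit{\textbf{y}}_{S}\)) can be sketched numerically. The following is a minimal NumPy illustration under assumed function names and an assumed temperature value, not the paper's implementation:

```python
import numpy as np

def softmax(logits, tau=1.0):
    # Temperature-scaled softmax; a higher tau yields softer targets.
    z = np.asarray(logits, dtype=float) / tau
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, tau=4.0):
    # Cross-entropy between the frozen teacher's soft targets y_T and the
    # student's prediction y_S. Since the teacher is fixed, the entropy of
    # y_T is a constant, so minimising KL(y_T || y_S) is equivalent to
    # minimising this cross-entropy term.
    y_t = softmax(teacher_logits, tau)
    y_s = softmax(student_logits, tau)
    return float(-np.sum(y_t * np.log(y_s + 1e-12)))
```

As note 1 remarks, a batch version would simply average this per-example penalty.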

References

  1. Alfasly, S.A.S., et al.: Variational representation learning for vehicle re-identification. In: IEEE International Conference on Image Processing (2019)

  2. Bagherinezhad, H., Horton, M., Rastegari, M., Farhadi, A.: Label refinery: improving ImageNet classification through label progression. arXiv preprint arXiv:1805.02641 (2018)

  3. Bao, L., Ma, B., Chang, H., Chen, X.: Masked graph attention network for person re-identification. In: IEEE International Conference on Computer Vision and Pattern Recognition Workshops (2019)

  4. Bergamini, L., et al.: Multi-views embedding for cattle re-identification. In: IEEE International Conference on Signal-Image Technology & Internet-Based Systems (2018)

  5. Bhardwaj, S., Srinivasan, M., Khapra, M.M.: Efficient video classification using fewer frames. In: IEEE International Conference on Computer Vision and Pattern Recognition (2019)

  6. Chen, D., Li, H., Xiao, T., Yi, S., Wang, X.: Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding. In: IEEE International Conference on Computer Vision and Pattern Recognition (2018)

  7. Chu, R., et al.: Vehicle re-identification with viewpoint-aware metric learning. In: IEEE International Conference on Computer Vision (2019)

  8. Fu, Y., Wang, X., Wei, Y., Huang, T.: STA: spatial-temporal attention for large-scale video-based person re-identification. In: AAAI Conference on Artificial Intelligence (2019)

  9. Furlanello, T., Lipton, Z.C., Tschannen, M., Itti, L., Anandkumar, A.: Born again neural networks. In: International Conference on Machine Learning (2018)

  10. Gu, X., Ma, B., Chang, H., Shan, S., Chen, X.: Temporal knowledge propagation for image-to-video person re-identification. In: IEEE International Conference on Computer Vision (2019)

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE International Conference on Computer Vision and Pattern Recognition (2016)

  12. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)

  13. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. In: NeurIPS Deep Learning and Representation Learning Workshop (2015)

  14. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: IEEE International Conference on Computer Vision and Pattern Recognition (2017)

  15. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning (2015)

  16. Khan, S.D., Ullah, H.: A survey of advances in vision-based vehicle re-identification. Comput. Vis. Image Underst. 183, 50–63 (2019)

  17. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)

  18. Li, S., Li, J., Lin, W., Tang, H.: Amur tiger re-identification in the wild. arXiv preprint arXiv:1906.05586 (2019)

  19. Li, Z., Hoiem, D.: Learning without forgetting. In: European Conference on Computer Vision (2016)

  20. Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: IEEE International Conference on Computer Vision and Pattern Recognition (2015)

  21. Liu, C., Zhang, R., Guo, L.: Part-pose guided Amur tiger re-identification. In: IEEE International Conference on Computer Vision Workshops (2019)

  22. Liu, C.T., Wu, C.W., Wang, Y.C.F., Chien, S.Y.: Spatially and temporally efficient non-local attention network for video-based person re-identification. In: British Machine Vision Conference (2019)

  23. Liu, N., Zhao, Q., Zhang, N., Cheng, X., Zhu, J.: Pose-guided complementary features learning for Amur tiger re-identification. In: IEEE International Conference on Computer Vision Workshops (2019)

  24. Liu, X., Zhang, S., Huang, Q., Gao, W.: RAM: a region-aware deep model for vehicle re-identification. In: IEEE International Conference on Multimedia and Expo (ICME) (2018)

  25. Liu, X., Liu, W., Mei, T., Ma, H.: A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In: European Conference on Computer Vision (2016)

  26. Liu, X., Liu, W., Mei, T., Ma, H.: PROVID: progressive and multimodal vehicle reidentification for large-scale urban surveillance. IEEE Trans. Multimedia 20(3), 645–658 (2017)

  27. Liu, Y., Yan, J., Ouyang, W.: Quality aware network for set to set recognition. In: IEEE International Conference on Computer Vision (2017)

  28. Luo, H., Gu, Y., Liao, X., Lai, S., Jiang, W.: Bag of tricks and a strong baseline for deep person re-identification. In: IEEE International Conference on Computer Vision and Pattern Recognition Workshops (2019)

  29. Matiyali, N., Sharma, G.: Video person re-identification using learned clip similarity aggregation. In: IEEE Winter Conference on Applications of Computer Vision (2020)

  30. Nguyen, T.B., Le, T.L., Nguyen, D.D., Pham, D.T.: A reliable image-to-video person re-identification based on feature fusion. In: Asian Conference on Intelligent Information and Database Systems (2018)

  31. Park, J., Woo, S., Lee, J., Kweon, I.S.: BAM: bottleneck attention module. In: British Machine Vision Conference (2018)

  32. Qian, J., Jiang, W., Luo, H., Yu, H.: Stripe-based and attribute-aware network: a two-branch deep model for vehicle re-identification. arXiv preprint arXiv:1910.05549 (2019)

  33. Ristani, E., Solera, F., Zou, R., Cucchiara, R., Tomasi, C.: Performance measures and a data set for multi-target, multi-camera tracking. In: European Conference on Computer Vision (2016)

  34. Ristani, E., Tomasi, C.: Features for multi-target multi-camera tracking and re-identification. In: IEEE International Conference on Computer Vision and Pattern Recognition (2018)

  35. Romero, A., et al.: FitNets: hints for thin deep nets. In: International Conference on Learning Representations (2015)

  36. Sandler, M., et al.: MobileNetV2: inverted residuals and linear bottlenecks. In: IEEE International Conference on Computer Vision and Pattern Recognition (2018)

  37. Schneider, S., Taylor, G.W., Linquist, S., Kremer, S.C.: Past, present and future approaches using computer vision for animal re-identification from camera trap data. Methods Ecol. Evol. 10(3), 461–470 (2019)

  38. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: IEEE International Conference on Computer Vision and Pattern Recognition (2015)

  39. Selvaraju, R.R., et al.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: IEEE International Conference on Computer Vision (2017)

  40. Si, J., et al.: Dual attention matching network for context-aware feature sequence based person re-identification. In: IEEE International Conference on Computer Vision and Pattern Recognition (2018)

  41. Sohn, K.: Improved deep metric learning with multi-class N-pair loss objective. In: Neural Information Processing Systems (2016)

  42. Tang, Z., et al.: PAMTRI: pose-aware multi-task learning for vehicle re-identification using highly randomized synthetic data. In: IEEE International Conference on Computer Vision (2019)

  43. Tian, M., et al.: Eliminating background-bias for robust person re-identification. In: IEEE International Conference on Computer Vision and Pattern Recognition (2018)

  44. Tung, F., Mori, G.: Similarity-preserving knowledge distillation. In: IEEE International Conference on Computer Vision (2019)

  45. Ustinova, E., Lempitsky, V.: Learning deep embeddings with histogram loss. In: Neural Information Processing Systems (2016)

  46. Wang, G., Lai, J., Xie, X.: P2SNet: can an image match a video for person re-identification in an end-to-end way? IEEE Trans. Circ. Syst. Video Technol. (2017)

  47. Wang, J., Zhou, F., Wen, S., Liu, X., Lin, Y.: Deep metric learning with angular loss. In: IEEE International Conference on Computer Vision and Pattern Recognition (2017)

  48. Wang, Z., et al.: Orientation invariant feature embedding and spatial temporal regularization for vehicle re-identification. In: IEEE International Conference on Computer Vision (2017)

  49. Wu, Y., et al.: Exploit the unknown gradually: one-shot video-based person re-identification by stepwise learning. In: IEEE International Conference on Computer Vision and Pattern Recognition (2018)

  50. Xie, Z., Li, L., Zhong, X., Zhong, L., Xiang, J.: Image-to-video person re-identification with cross-modal embeddings. Pattern Recogn. Lett. 133, 70–76 (2019)

  51. Yang, C., Xie, L., Qiao, S., Yuille, A.: Knowledge distillation in generations: more tolerant teachers educate better students. arXiv preprint arXiv:1805.05551 (2018)

  52. Yu, J., et al.: A strong baseline for tiger re-ID and its bag of tricks. In: IEEE International Conference on Computer Vision Workshops (2019)

  53. Zagoruyko, S., Komodakis, N.: Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. In: International Conference on Learning Representations (2017)

  54. Zhang, D., et al.: Image-to-video person re-identification with temporally memorized similarity learning. IEEE Trans. Circ. Syst. Video Technol. 28(10), 2622–2632 (2017)

  55. Zheng, L., et al.: MARS: a video benchmark for large-scale person re-identification. In: European Conference on Computer Vision (2016)

  56. Zheng, L., Yang, Y., Hauptmann, A.G.: Person re-identification: past, present and future. arXiv preprint arXiv:1610.02984 (2016)

  57. Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: IEEE International Conference on Computer Vision and Pattern Recognition (2017)

  58. Zhou, Y., Liu, L., Shao, L.: Vehicle re-identification by deep hidden multi-view inference. IEEE Trans. Image Process. 27(7), 3275–3287 (2018)

  59. Zhou, Y., Shao, L.: Aware attentive multi-view inference for vehicle re-identification. In: IEEE International Conference on Computer Vision and Pattern Recognition (2018)

Acknowledgement

The authors would like to acknowledge Farm4Trade for its financial and technical support.

Author information

Correspondence to Angelo Porrello.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 11516 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Porrello, A., Bergamini, L., Calderara, S. (2020). Robust Re-Identification by Multiple Views Knowledge Distillation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12355. Springer, Cham. https://doi.org/10.1007/978-3-030-58607-2_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58607-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58606-5

  • Online ISBN: 978-3-030-58607-2

  • eBook Packages: Computer Science, Computer Science (R0)
