Robust Re-Identification by Multiple Views Knowledge Distillation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12355)

Abstract

To achieve robustness in Re-Identification, standard methods leverage tracking information in a Video-To-Video fashion. However, these solutions suffer a large drop in performance for single-image queries (e.g., the Image-To-Video setting). Recent works address this severe degradation by transferring temporal information from a Video-based network to an Image-based one. In this work, we devise a training strategy that allows the transfer of superior knowledge arising from a set of views depicting the target object. Our proposal – Views Knowledge Distillation (VKD) – pins this visual variety as a supervision signal within a teacher-student framework, where the teacher educates a student who observes fewer views. As a result, the student outperforms not only its teacher but also the current state-of-the-art in Image-To-Video by a wide margin (6.3% mAP on MARS, 8.6% on Duke and 5% on VeRi-776). A thorough analysis – on Person, Vehicle and Animal Re-ID – investigates the properties of VKD from a qualitative and quantitative perspective. Code is available at https://github.com/aimagelab/VKD.
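The teacher-student transfer described above can be sketched as a distillation objective: the teacher embeds many views of a target, the student embeds fewer, and the student is penalised both for diverging from the teacher's softened class predictions and for drifting from its embedding. The sketch below is a minimal, generic NumPy illustration of such an objective, not the authors' exact formulation; the function name, temperature, and weighting are assumptions for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax, numerically stabilised."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def vkd_style_loss(teacher_logits, student_logits,
                   teacher_feats, student_feats,
                   temperature=4.0, alpha=1.0):
    """Generic distillation loss (hypothetical sketch, not the paper's exact loss):
    KL divergence between softened teacher/student class distributions,
    plus an L2 penalty pulling student embeddings toward the teacher's."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL(p_t || p_s), averaged over the batch; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                axis=-1).mean() * temperature ** 2
    feat = np.mean((teacher_feats - student_feats) ** 2)
    return kl + alpha * feat
```

If the student exactly matches the teacher's logits and embeddings, the loss is zero; any mismatch in either term makes it strictly positive, which is what drives the student to mimic the richer multi-view representation.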

Keywords

Deep learning · Re-Identification · Knowledge distillation

Notes

Acknowledgement

The authors would like to acknowledge Farm4Trade for its financial and technical support.

Supplementary material

Supplementary material 1: 504449_1_En_6_MOESM1_ESM.pdf (PDF, 11.2 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

AImageLab, University of Modena and Reggio Emilia, Modena, Italy
