Tracking Persons-of-Interest via Adaptive Discriminative Features

  • Shun Zhang
  • Yihong Gong
  • Jia-Bin Huang
  • Jongwoo Lim
  • Jinjun Wang
  • Narendra Ahuja
  • Ming-Hsuan Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9909)


Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Low-level features used in existing multi-target tracking methods are not effective for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face features using convolutional neural networks (CNNs). Unlike existing CNN-based approaches that are only trained on large-scale face image datasets offline, we further adapt the pre-trained face CNN to specific videos using automatically discovered training samples from tracklets. Our network directly optimizes the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function. With the learned discriminative features, we apply the Hungarian algorithm to link tracklets within each shot and the hierarchical clustering algorithm to link tracklets across multiple shots to form final trajectories. We extensively evaluate the proposed algorithm on a set of TV sitcoms and music videos and demonstrate significant performance improvement over existing techniques.
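The pipeline described above has three computational pieces: a triplet loss over Euclidean embedding distances, an optimal assignment step for linking tracklets within a shot, and hierarchical clustering for linking tracklets across shots. The sketch below illustrates each piece on toy data. It is not the authors' implementation: the abstract does not specify the improved triplet loss, so the standard hinge form is shown instead, and the function names, the margin, the distance thresholds, and the average-linkage choice are all assumptions made for illustration. SciPy's `linear_sum_assignment` solves the same assignment problem that the Hungarian algorithm solves.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.cluster.hierarchy import linkage, fcluster


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss on squared Euclidean distances.

    Assumption: this is the common baseline form, not the paper's improved loss.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()


def link_within_shot(feats_a, feats_b, max_dist=1.0):
    """Match two sets of tracklet features by minimum total Euclidean distance.

    max_dist is a hypothetical gate that rejects implausible matches.
    """
    cost = np.linalg.norm(feats_a[:, None, :] - feats_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]


def cluster_across_shots(tracklet_feats, dist_threshold=1.0):
    """Agglomerative clustering of tracklet features; one label per tracklet.

    Average linkage and the cut threshold are illustrative assumptions.
    """
    tree = linkage(tracklet_feats, method="average", metric="euclidean")
    return fcluster(tree, t=dist_threshold, criterion="distance")


# Toy 2-D "embeddings" for two identities seen in two tracklet sets.
a = np.array([[0.0, 0.0], [5.0, 5.0]])
b = np.array([[5.1, 5.0], [0.1, 0.0]])

pairs = link_within_shot(a, b)                     # tracklet index pairs (a_i, b_j)
labels = cluster_across_shots(np.vstack([a, b]))   # identity label per tracklet
```

In a real system the rows of `a` and `b` would be tracklet-level features from the adapted CNN rather than hand-written points, and the thresholds would need tuning per video.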


Keywords (machine-generated, not supplied by the authors): Convolutional Neural Network; Music Video; Multiple Shot; Visual Constraint; Data Association Problem



The work is partially supported by National Basic Research Program of China (973 Program, 2015CB351705), NSFC (61332018), Office of Naval Research (N0014-16-1-2314), R&D programs by NRF (2014R1A1A2058501) and MSIP/IITP (IITP-2016-H8601-16-1005) of Korea, NSF CAREER (1149783) and gifts from Adobe and NVIDIA.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Shun Zhang (1)
  • Yihong Gong (1, corresponding author)
  • Jia-Bin Huang (2)
  • Jongwoo Lim (3)
  • Jinjun Wang (1)
  • Narendra Ahuja (2)
  • Ming-Hsuan Yang (4)

  1. Xi’an Jiaotong University, Xi’an, China
  2. University of Illinois, Urbana-Champaign, Champaign, USA
  3. Hanyang University, Seoul, South Korea
  4. University of California, Merced, USA
