
Norm-Aware Embedding for Efficient Person Search and Tracking


Person detection and re-identification (re-ID) are two well-defined support tasks for practically relevant tasks such as Person Search and Multiple Person Tracking. Person Search aims to find and locate all instances with the same identity as the query person in a set of panoramic gallery images. Similarly, Multiple Person Tracking, especially when using the tracking-by-detection pipeline, requires detecting and associating all persons who appear in consecutive video frames. One major challenge shared by the two tasks comes from the contradictory goals of detection and re-identification, i.e., person detection focuses on finding what all persons have in common, while person re-ID handles the differences among multiple identities. Therefore, it is crucial to reconcile the relationship between the two support tasks in a joint model. To this end, we present a novel approach called Norm-Aware Embedding, which disentangles the person embedding into norm and angle for detection and re-ID respectively, allowing for both effective and efficient multi-task training. We further extend the proposal-level person embedding to the pixel level, where its discrimination ability is less affected by misalignment. Our Norm-Aware Embedding achieves remarkable performance on both person search and multiple person tracking benchmarks, with the merit of being easy to train and resource-friendly.
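The norm/angle split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy two-dimensional vectors, and the sigmoid with hypothetical `scale` and `shift` parameters are assumptions made for illustration only. The abstract states only that the norm serves detection and the angle serves re-ID.

```python
import math


def decompose(embedding):
    """Split a vector into its L2 norm and its unit-length direction."""
    norm = math.sqrt(sum(v * v for v in embedding))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction")
    direction = [v / norm for v in embedding]
    return norm, direction


def detection_score(norm, scale=1.0, shift=0.0):
    """Map the norm to (0, 1) with a sigmoid.

    Assumption: scale and shift are hypothetical calibration parameters;
    the abstract does not specify how the norm becomes a confidence.
    """
    return 1.0 / (1.0 + math.exp(-(scale * norm + shift)))


def reid_similarity(dir_a, dir_b):
    """Cosine similarity of two unit vectors, i.e. their dot product."""
    return sum(a * b for a, b in zip(dir_a, dir_b))


# Toy embeddings (made up): two boxes sharing a direction, plus a
# low-norm vector standing in for a background proposal.
query = [3.0, 4.0]
candidate = [6.0, 8.0]
background = [0.1, -0.05]

q_norm, q_dir = decompose(query)
c_norm, c_dir = decompose(candidate)
b_norm, b_dir = decompose(background)

# Identity matching uses only the angle, so the differing norms of
# query and candidate do not affect their similarity.
same_identity = reid_similarity(q_dir, c_dir)

# Detection confidence uses only the norm.
person_conf = detection_score(c_norm, scale=1.0, shift=-5.0)
background_conf = detection_score(b_norm, scale=1.0, shift=-5.0)
```

In this sketch the two quantities do not interfere: rescaling an embedding changes its detection score but leaves its re-ID similarity untouched, which is the kind of separation between the two objectives that the abstract describes.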




Code will be updated at this site.




This work was partially supported by the National Science Fund of China (Grant No. U1713208), Funds for International Co-operation and Exchange of the National Natural Science Foundation of China (Grant No. 61861136011), “111” Program B13022, Natural Science Foundation of Jiangsu Province, China (Grant No. BK20181299), and National Key Research and Development Program of China (Grant No. 2017YFC0820601).

Author information



Corresponding authors

Correspondence to Shanshan Zhang or Jian Yang.

Additional information


Communicated by Ivan Laptev.


About this article


Cite this article

Chen, D., Zhang, S., Yang, J. et al. Norm-Aware Embedding for Efficient Person Search and Tracking. Int J Comput Vis 129, 3154–3168 (2021).



  • Person search
  • Pedestrian detection
  • Person re-identification
  • Multiple object tracking