
FairMOT: On the Fairness of Detection and Re-identification in Multiple Object Tracking

International Journal of Computer Vision

Abstract

Multi-object tracking (MOT) is an important problem in computer vision with a wide range of applications. Formulating MOT as multi-task learning of object detection and re-identification (re-ID) in a single network is appealing since it allows joint optimization of the two tasks and enjoys high computational efficiency. However, we find that the two tasks tend to compete with each other, which needs to be carefully addressed. In particular, previous works usually treat re-ID as a secondary task whose accuracy is heavily affected by the primary detection task. As a result, the network is biased toward the primary detection task, which is not fair to the re-ID task. To solve the problem, we present a simple yet effective approach termed FairMOT, based on the anchor-free object detection architecture CenterNet. Note that it is not a naive combination of CenterNet and re-ID. Instead, we present a set of detailed designs that thorough empirical studies show to be critical for achieving good tracking results. The resulting approach achieves high accuracy for both detection and tracking, and outperforms state-of-the-art methods by a large margin on several public datasets. The source code and pre-trained models are released at https://github.com/ifzhang/FairMOT.
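
To make the idea of balancing two competing tasks concrete, the following is a minimal sketch of one common way to combine a detection loss and a re-ID loss into a single training objective: uncertainty-based weighting with one learnable log-variance per task. The abstract does not state which balancing scheme is used, so this is an illustrative assumption rather than the paper's formulation; the function name `joint_loss` and the parameters `s_det` and `s_reid` are hypothetical.

```python
import math


def joint_loss(det_loss: float, reid_loss: float,
               s_det: float = 0.0, s_reid: float = 0.0) -> float:
    """Combine two task losses with per-task log-variance weights.

    Assumption (not taken from the abstract): each task loss is scaled by
    exp(-s), and s itself is added as a regularizer so that the optimizer
    cannot shrink a task's weight to zero for free.
    """
    weighted_det = math.exp(-s_det) * det_loss
    weighted_reid = math.exp(-s_reid) * reid_loss
    return 0.5 * (weighted_det + weighted_reid + s_det + s_reid)


if __name__ == "__main__":
    # Equal weighting (s = 0 for both tasks): plain average of the two losses.
    print(joint_loss(2.0, 4.0))
    # A larger s_reid down-weights the re-ID term relative to detection.
    print(joint_loss(2.0, 4.0, s_det=0.0, s_reid=1.0))
```

In a real training loop the two `s` values would be learnable parameters updated together with the network weights; fixing them by hand, as here, only shows how the weighting shifts the balance between the two tasks.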



Acknowledgements

This work was in part supported by NSFC (Nos. 61733007 and 61876212) and MSRA Collaborative Research Fund. We thank all the anonymous reviewers for their valuable suggestions.

Author information

Corresponding author

Correspondence to Xinggang Wang.

Additional information

Communicated by Bumsub Ham.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, Y., Wang, C., Wang, X. et al. FairMOT: On the Fairness of Detection and Re-identification in Multiple Object Tracking. Int J Comput Vis 129, 3069–3087 (2021). https://doi.org/10.1007/s11263-021-01513-4

  • DOI: https://doi.org/10.1007/s11263-021-01513-4
