A robust attribute-aware and real-time multi-target multi-camera tracking system using multi-scale enriched features and hierarchical clustering


Multi-Target Multi-Camera Tracking (MTMCT) faces challenges such as viewpoint and pose variations, scale and illumination changes, and occlusion. Existing MTMCT approaches have high computational complexity and are not sufficiently robust to these challenges. In this work, an Attribute Recognition-based MTMCT (AR-MTMCT) framework is presented for real-time applications. The framework performs object detection, re-identification (Re-Id) feature extraction, and attribute recognition in an end-to-end manner. Using attributes substantially improves online MTMC tracking performance under these challenges. The AR-MTMCT pipeline consists of three modules. The first is a novel one-shot Single-Camera Tracking (SCT) architecture named Attribute Recognition-Multi Object Tracking (AR-MOT), which performs object detection, Re-Id feature extraction, and attribute recognition with a single backbone through multi-task learning. The second module applies hierarchical clustering to handle multiple detections of the same identity in the overlapping areas of cameras. The last module runs a new data association algorithm that uses spatial information to reduce the number of matching candidates. The data association algorithm also includes an efficient strategy for removing lost tracks that trades off the number of lost tracks against the maximum lost time. AR-MTMCT is trained and evaluated on the large-scale MTA dataset. Compared with the WDA method, the proposed system improves the IDF1 and IDs metrics by 20% and 11%, respectively. AR-MTMCT also outperforms state-of-the-art methods by a large margin in reducing computational complexity.
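To make the second module more concrete, the following is a minimal sketch of how hierarchical clustering could group detections of one identity seen by several overlapping cameras. It is not the authors' implementation: the abstract does not state the linkage criterion, the distance measure, or the stopping rule, so average linkage, cosine distance on Re-Id feature vectors, and the `max_distance` threshold are all assumptions, and the names `cosine_distance` and `cluster_detections` are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster_detections(features, max_distance=0.3):
    """Agglomerative (bottom-up) clustering with average linkage.

    features: list of Re-Id feature vectors, one per detection.
    max_distance: hypothetical threshold; merging stops once the closest
        pair of clusters is farther apart than this value.
    Returns a list of clusters, each a sorted list of detection indices.
    """
    # Start with every detection in its own cluster.
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(cosine_distance(features[i], features[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (a < b by construction).
        (a, b), dist = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break  # remaining clusters are treated as distinct identities
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Four detections: indices 0 and 1 look alike, as do indices 2 and 3.
detections = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(cluster_detections(detections))  # [[0, 1], [2, 3]]
```

Each resulting cluster would then be treated as a single identity before data association. A threshold-based stopping rule is used here because the number of people in the overlapping area is not known in advance.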

Data Availability

The data supporting this study's findings for multi-camera and single-camera tracking are available at and, respectively.



Author information

Authors and Affiliations



  • Conception and design of study: Mahnaz Moghaddam, Mostafa Charmi, Hossein Hassanpoor.

  • Analysis and interpretation of data: Mahnaz Moghaddam, Mostafa Charmi.

  • Drafting the manuscript: Mahnaz Moghaddam, Mostafa Charmi, Hossein Hassanpoor.

  • Revising the manuscript critically for important intellectual content: Mahnaz Moghaddam, Mostafa Charmi.

  • Revising the manuscript based on the editor's and the reviewers' comments: Mahnaz Moghaddam, Mostafa Charmi.

All authors approved the final manuscript submitted to the journal.

Corresponding author

Correspondence to Mostafa Charmi.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Moghaddam, M., Charmi, M. & Hassanpoor, H. A robust attribute-aware and real-time multi-target multi-camera tracking system using multi-scale enriched features and hierarchical clustering. J Real-Time Image Proc 20, 45 (2023).
