Multi-Target Multi-Camera Tracking (MTMCT) faces challenges such as viewpoint and pose variations, scale and illumination changes, and occlusion. Existing MTMCT approaches have high computational complexity and are not sufficiently robust to these challenges. In this work, an Attribute Recognition-based MTMCT (AR-MTMCT) framework is presented for real-time applications. The framework performs object detection, re-identification (Re-Id) feature extraction, and attribute recognition in an end-to-end manner; exploiting attributes substantially improves online MTMCT performance under the challenges above. The AR-MTMCT pipeline consists of three modules. The first is a novel one-shot Single-Camera Tracking (SCT) architecture, named Attribute Recognition-Multi Object Tracking (AR-MOT), which performs object detection, Re-Id feature extraction, and attribute recognition with a single backbone through multi-task learning. The second module applies hierarchical clustering to handle multiple detections of the same identity in the overlapping regions of cameras. In the last module, a new data association algorithm uses spatial information to reduce the number of matching candidates. We also propose an efficient strategy within the data association algorithm for removing lost tracks, trading off the number of lost tracks against the maximum lost time. AR-MTMCT is trained and evaluated on the large-scale MTA dataset, where it improves on the WDA method by 20% in IDF1 and 11% in IDs. AR-MTMCT also outperforms state-of-the-art methods by a large margin in reducing computational complexity.
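The second module's idea of merging duplicate detections of one identity seen by overlapping cameras can be sketched as agglomerative clustering over Re-Id embeddings. The sketch below is a minimal illustration, not the paper's implementation: it assumes cosine distance, single-linkage merging, and a hypothetical distance threshold; the function names and data layout are invented for this example.

```python
import math

def cosine_dist(a, b):
    """Cosine distance between two embedding vectors (0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def merge_duplicates(detections, threshold=0.3):
    """Single-linkage agglomerative clustering of detections.

    detections: list of (camera_id, embedding) tuples from overlapping cameras.
    Detections whose closest pair of embeddings is within `threshold`
    (a hypothetical value) are merged into one identity cluster.
    """
    clusters = [[d] for d in detections]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(cosine_dist(a[1], b[1])
                        for a in clusters[i] for b in clusters[j])
                if d < threshold:
                    clusters[i].extend(clusters[j])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

# Two cameras see the same person (near-identical embeddings),
# a third detection is a different identity.
dets = [("cam1", (1.0, 0.0)), ("cam2", (0.99, 0.05)), ("cam1", (0.0, 1.0))]
clusters = merge_duplicates(dets, threshold=0.1)
# → 2 identity clusters: the first two detections merge, the third stays alone
```

In practice the threshold would be tuned on validation data, and spatial overlap between camera views would further gate which detections are even considered for merging, mirroring the spatial gating the third module uses to reduce matching candidates.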
The authors declare no competing interests.
Moghaddam, M., Charmi, M. & Hassanpoor, H. A robust attribute-aware and real-time multi-target multi-camera tracking system using multi-scale enriched features and hierarchical clustering. J Real-Time Image Proc 20, 45 (2023). https://doi.org/10.1007/s11554-023-01301-y