In this paper, we focus on designing a multi-target object tracking algorithm that produces high-quality trajectories at low computational cost. Together with online association, these properties make the algorithm suitable for applications such as autonomous driving and autonomous surveillance. We propose CNN-based features, rather than hand-crafted ones, to achieve higher accuracy. We also present a novel grouping method for 2-D online environments that requires no prior knowledge of camera parameters, along with an affinity measure based on the groups maintained in previous frames. Comprehensive evaluations of our algorithm (CNNMTT) on the publicly available and widely used MOT16 dataset show that CNNMTT achieves high-quality tracking results in comparison to the state of the art while being faster and involving much lower computational cost.
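As a rough illustration of online association driven by an appearance affinity, the sketch below matches tracks from previous frames to detections in the current frame. It is not the CNNMTT method: the plain lists of numbers stand in for CNN feature vectors, and the cosine similarity, the greedy matching step, the function names, and the `min_affinity` threshold are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as having no affinity
    return dot / (norm_a * norm_b)


def associate(track_features, detection_features, min_affinity=0.5):
    """Greedily match tracks to detections for one frame.

    Hypothetical helper: returns (matches, unmatched_tracks,
    unmatched_detections), where matches holds (track, detection) index pairs.
    """
    # Score every track/detection pair and keep those above the threshold.
    candidates = []
    for t, track_vec in enumerate(track_features):
        for d, det_vec in enumerate(detection_features):
            score = cosine_similarity(track_vec, det_vec)
            if score >= min_affinity:
                candidates.append((score, t, d))

    # Accept pairs from highest to lowest affinity, one-to-one.
    candidates.sort(key=lambda c: c[0], reverse=True)
    used_tracks, used_detections, matches = set(), set(), []
    for _, t, d in candidates:
        if t in used_tracks or d in used_detections:
            continue
        matches.append((t, d))
        used_tracks.add(t)
        used_detections.add(d)

    unmatched_tracks = [t for t in range(len(track_features))
                        if t not in used_tracks]
    unmatched_detections = [d for d in range(len(detection_features))
                            if d not in used_detections]
    return matches, unmatched_tracks, unmatched_detections


# Two existing tracks, three detections in the new frame (toy vectors).
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
detections = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches, lost_tracks, new_detections = associate(tracks, detections)
```

Because only the current frame and the stored track features are used, the decision is made online, without looking at future frames. Unmatched detections would typically start new tracks, and an optimal assignment solver could replace the greedy step when one-to-one matching quality matters more than speed.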
Please note that, unlike the work of  where dynamic modeling was done in a 3-D tracking environment, our model works in 2-D environments and thus does not need prior knowledge of the camera parameters.
Stochastic Gradient Descent
An observation on the MOT16 dataset.
Cite this article
Mahmoudi, N., Ahadi, S.M. & Rahmati, M. Multi-target tracking using CNN-based features: CNNMTT. Multimed Tools Appl 78, 7077–7096 (2019). https://doi.org/10.1007/s11042-018-6467-6
- Multi-target tracking
- Machine vision
- Multi-object tracking
- Video surveillance
- Pedestrian tracking