Multi-target tracking using CNN-based features: CNNMTT

Abstract

In this paper, we focus on designing a multi-target tracking algorithm that produces high-quality trajectories while keeping computational cost low. Combined with online association, these properties make the algorithm suitable for applications such as autonomous driving and autonomous surveillance. We propose CNN-based features, rather than hand-crafted ones, to achieve higher accuracy. We also present a novel grouping method for 2-D online environments that requires no prior knowledge of camera parameters, together with an affinity measure based on the groups maintained in previous frames. Comprehensive evaluations of our algorithm (CNNMTT) on the publicly available and widely used MOT16 dataset show that it achieves high-quality tracking results in comparison to the state of the art while being faster and involving much lower computational cost.
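To make the idea of online association concrete, the following is a minimal sketch and not the CNNMTT method itself. It assumes that each existing track and each new detection is already described by a fixed-length appearance feature vector (for example, one taken from a CNN layer), scores track-detection pairs by cosine similarity, and matches them greedily. The function names, the `min_affinity` threshold, and the greedy strategy are all illustrative assumptions; an optimal assignment method such as the one in [26] could be swapped in, and the group-based affinity proposed in the paper is not modeled here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature vector as matching nothing
    return dot / (norm_a * norm_b)


def associate(track_features, detection_features, min_affinity=0.5):
    """Greedily match tracks to detections in the current frame.

    Hypothetical helper: pairs are visited in order of descending
    affinity, and each track and each detection is used at most once.
    Returns (matches, unmatched_tracks, unmatched_detections), where
    matches maps a track index to a detection index.
    """
    candidates = []
    for t, track_vec in enumerate(track_features):
        for d, det_vec in enumerate(detection_features):
            score = cosine_similarity(track_vec, det_vec)
            if score >= min_affinity:  # assumed gating threshold
                candidates.append((score, t, d))
    candidates.sort(key=lambda c: c[0], reverse=True)

    matches = {}
    used_detections = set()
    for _, t, d in candidates:
        if t in matches or d in used_detections:
            continue
        matches[t] = d
        used_detections.add(d)

    unmatched_tracks = [t for t in range(len(track_features)) if t not in matches]
    unmatched_detections = [
        d for d in range(len(detection_features)) if d not in used_detections
    ]
    return matches, unmatched_tracks, unmatched_detections


# Two tracks from earlier frames, three detections in the current frame.
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
detections = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches, lost_tracks, new_detections = associate(tracks, detections)
print(matches, lost_tracks, new_detections)
```

In an online tracker of this kind, unmatched detections would typically start new tracks and unmatched tracks would be kept for a few frames before being dropped; only the current and previous frames are needed, which is what makes the association online.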

Notes

  1. Multi-Target Tracking

  2. Bag-Of-Words

  3. Single-Object Tracking

  4. https://drive.google.com/file/d/0B5ACiy41McAHMjczS2p0dFg3emM/view?usp=drive_web

  5. Please note that, unlike the work of [28], where dynamic modeling was done in a 3D tracking environment, our model works in 2D environments and thus does not need prior knowledge about the camera.

  6. Stochastic Gradient Descent

  7. An observation on the MOT16 dataset.

  8. http://motchallenge.net

References

  1. Andriyenko A, Schindler K, Roth S (2012) Discrete-continuous optimization for multi-target tracking. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 1926–1933). IEEE

  2. Bhattacharyya A (1946) On a measure of divergence between two multinomial populations. Sankhyā: The Indian Journal of Statistics, 401–406

  3. Breitenstein MD, Reichlin F, Leibe B, Koller-Meier E, Van Gool L (2011) Online multiperson tracking-by-detection from a single, uncalibrated camera. IEEE Trans Pattern Anal Mach Intell 33(9):1820–1833

  4. Choi W (2015) Near-online multi-target tracking with aggregated local flow descriptor. In Proceedings of the IEEE International Conference on Computer Vision (pp. 3029–3037)

  5. Choi W, Savarese S (2010) Multiple target tracking in world coordinate with single, minimally calibrated camera. In European Conference on Computer Vision (pp. 553–567). Springer Berlin Heidelberg

  6. Dollár P, Appel R, Belongie S, Perona P (2014) Fast feature pyramids for object detection. IEEE Trans Pattern Anal Mach Intell 36(8):1532–1545

  7. Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T (2014) DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In ICML (pp. 647–655)

  8. Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645

  9. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580–587)

  10. Helbing D, Molnar P (1995) Social force model for pedestrian dynamics. Phys Rev E 51(5):4282

  11. Henriques JF, Caseiro R, Batista J (2011) Globally optimal solution to multi-object tracking with merged measurements. In 2011 International Conference on Computer Vision (pp. 2470–2477). IEEE

  12. Henschel R, Leal-Taixé L, Rosenhahn B, Schindler K (2016) Tracking with multi-level features. arXiv preprint arXiv:1607.07304

  13. Hu M, Ali S, Shah M (2008) Detecting global motion patterns in complex videos. In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on (pp. 1–5). IEEE

  14. Kalal Z, Mikolajczyk K, Matas J (2010) Forward-backward error: Automatic detection of tracking failures. In Pattern Recognition (ICPR), 2010 20th International Conference on (pp. 2756–2759). IEEE

  15. Kalal Z, Mikolajczyk K, Matas J (2012) Tracking-learning-detection. IEEE Trans Pattern Anal Mach Intell 34(7):1409–1422

  16. Keuper M, Tang S, Zhongjie Y, Andres B, Brox T, Schiele B (2016) A multi-cut formulation for joint segmentation and tracking of multiple objects. arXiv preprint arXiv:1607.06317

  17. Kratz L, Nishino K (2010) Tracking with local spatio-temporal motion patterns in extremely crowded scenes. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (pp. 693–700). IEEE

  18. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097–1105)

  19. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 2, pp. 2169–2178). IEEE

  20. Lee B, Erdenee E, Jin S, Rhee PK (2016) Multi-class multi-object tracking using changing point detection. arXiv preprint arXiv:1608.08434

  21. Lee B, Erdenee E, Jin S, Nam MY, Jung YG, Rhee PK (2016) Multi-class multi-object tracking using changing point detection. In European Conference on Computer Vision (pp. 68–83). Springer, Cham

  22. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110

  23. Milan A, Roth S, Schindler K (2014) Continuous energy minimization for multitarget tracking. IEEE Trans Pattern Anal Mach Intell 36(1):58–72

  24. Milan A, Leal-Taixe L, Reid I, Roth S, Schindler K (2016) MOT16: A Benchmark for Multi-Object Tracking. arXiv preprint arXiv:1603.00831

  25. Mitzel D, Leibe B (2011) Real-time multi-person tracking with detector assisted structure propagation. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on (pp. 974–981). IEEE

  26. Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32–38

  27. Pirsiavash H, Ramanan D, Fowlkes CC (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 1201–1208). IEEE

  28. Possegger H, Mauthner T, Roth PM, Bischof H (2014) Occlusion geodesics for online multi-object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1306–1313)

  29. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91–99)

  30. Sanchez-Matilla R, Poiesi F, Cavallaro A (2016) Online Multi-target Tracking with Strong and Weak Detections. In ECCV Workshops (2) (pp. 84–99)

  31. Sanchez-Matilla R, Poiesi F, Cavallaro A (2016) Multi-target tracking with strong and weak detections. In ECCV Workshops-Benchmarking Multi-Target Tracking (Vol. 5, No. 6, p. 18)

  32. Stiefelhagen R, Bernardin K, Bowers R, Garofolo J, Mostefa D, Soundararajan P (2006) The CLEAR 2006 evaluation. In International Evaluation Workshop on Classification of Events, Activities and Relationships (pp. 1–44). Springer Berlin Heidelberg

  33. Sugimura D, Kitani KM, Okabe T, Sato Y, Sugimoto A (2009) Using individuality to track individuals: clustering individual trajectories in crowds using local appearance and frequency trait. In 2009 IEEE 12th International Conference on Computer Vision (pp. 1467–1474). IEEE

  34. Tang S, Andriluka M, Andres B, Schiele B (2017) Multiple people tracking by lifted multicut and person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3539–3548)

  35. Tao D, Guo Y, Song M, Li Y, Yu Z, Tang YY (2016) Person re-identification by dual-regularized kiss metric learning. IEEE Trans Image Process 25(6):2726–2738

  36. Wang X, Yang M, Zhu S, Lin Y (2013) Regionlets for generic object detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 17–24)

  37. Wu B, Nevatia R (2006) Tracking of multiple, partially occluded humans based on static body part detection. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 1, pp. 951–958). IEEE

  38. Yang B, Nevatia R (2012) Multi-target tracking by online learning of non-linear motion patterns and robust appearance models. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 1918–1925). IEEE

  39. Yang M, Yu T, Wu Y (2007) Game-theoretic multiple target tracking. In 2007 IEEE 11th International Conference on Computer Vision (pp. 1–8). IEEE

  40. Yang H, Shao L, Zheng F, Wang L, Song Z (2011) Recent advances and trends in visual tracking: a review. Neurocomputing 74(18):3823–3831

  41. Yu F, Li W, Li Q, Liu Y, Shi X, Yan J (2016) POI: multiple object tracking with high performance detection and appearance feature. In European Conference on Computer Vision (pp. 36–42). Springer, Cham

  42. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818–833). Springer International Publishing

  43. Zhao X, Gong D, Medioni G (2012) Tracking using motion patterns for very crowded scenes. In Computer Vision–ECCV 2012 (pp. 315–328). Springer Berlin Heidelberg

Author information

Corresponding author

Correspondence to Nima Mahmoudi.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mahmoudi, N., Ahadi, S.M. & Rahmati, M. Multi-target tracking using CNN-based features: CNNMTT. Multimed Tools Appl 78, 7077–7096 (2019). https://doi.org/10.1007/s11042-018-6467-6

Keywords

  • Multi-target tracking
  • Machine vision
  • Tracking-by-detection
  • Multi-object tracking
  • Video surveillance
  • Pedestrian tracking