
Multi-Vehicle Tracking Using Heterogeneous Neural Networks for Appearance And Motion Features

Abstract

This paper presents a multi-vehicle tracking algorithm for autonomous driving applications that combines appearance features and motion history using heterogeneous deep neural networks. The proposed model follows the tracking-by-detection paradigm: to track multiple vehicles, it exploits the appearance and motion features of the target vehicles across consecutive frames. A deep convolutional neural network, trained by triplet loss minimization, extracts the appearance features. The key contribution of the proposed method is a Long Short-Term Memory (LSTM) network with a fully connected layer that predicts the probability distribution of the next appearance and motion features of each tracked object. We constructed a multi-vehicle tracking dataset from a variety of real road-traffic scenes using a vehicle-mounted camera. For evaluation, we use several multi-target tracking sequences from the public KITTI object tracking benchmark as well as our own dataset. Experimental results demonstrate that the proposed algorithm achieves a MOTA of 84.5% and a MOTP of 86.3% on the KITTI tracking dataset, and a MOTA of 81.8% and a MOTP of 84.8% on our evaluation dataset, improvements of 8.6% and 9.6%, respectively, over previous methods.
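The abstract and keywords name three quantitative ingredients: the triplet loss used to train the appearance-embedding CNN, the KL divergence used to compare feature distributions, and the MOTA score used for evaluation. The sketch below illustrates these quantities in NumPy; function names, shapes, and the margin value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors: pull the anchor
    toward the positive sample and push it away from the negative one
    until their distance gap exceeds the margin."""
    d_ap = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)   # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions,
    with a small epsilon to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy (CLEAR MOT definition):
    1 minus the total error rate over all ground-truth objects."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

For example, a perfectly matched triplet (`positive == anchor`, negative far away) yields zero loss, identical distributions yield zero KL divergence, and 20 total errors over 100 ground-truth objects yield a MOTA of 0.8.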





Author information

Corresponding author

Correspondence to Mohamed S. Abdallah.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Abdallah, M.S., Han, D.S. & Kim, H. Multi-Vehicle Tracking Using Heterogeneous Neural Networks for Appearance And Motion Features. Int. J. ITS Res. (2022). https://doi.org/10.1007/s13177-022-00320-6


Keywords

  • Convolutional Neural Network (CNN)
  • KL divergence
  • Multiple Object Tracking (MOT)
  • Object detection
  • Triplet loss
  • Vehicle tracking