Robust Person Tracking Algorithm Based on Convolutional Neural Network for Indoor Video Surveillance Systems

  • Rykhard Bohush
  • Iryna Zakharava
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1055)


In this paper, we present an algorithm for multi-person tracking in indoor surveillance systems based on the tracking-by-detection approach. Convolutional neural networks (CNNs) are used for both detection and tracking. The YOLOv3 CNN is used as the detector. Person features are extracted with a modified ResNet CNN. The proposed architecture includes 29 convolutional layers and one fully connected layer. The Hungarian algorithm is applied for object association. After that, object visibility in the frame is determined based on CNN and color features. For algorithm evaluation, videos were prepared, labeled, and tested using the MOT evaluation metric. The efficiency of the proposed algorithm is illustrated and confirmed by our experimental results.
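The association step named in the abstract can be illustrated with a minimal sketch. It assumes that each existing track and each new detection is described by a CNN feature vector, that the matching cost is the cosine distance between those vectors, and that matches above a distance threshold are rejected. The cost function, the threshold `max_dist`, and the names `cosine_distance`, `hungarian`, and `associate` are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two feature vectors (assumed cost, not from the paper)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def hungarian(cost):
    """Minimum-cost assignment (Hungarian method with row/column potentials).

    cost is a list of rows; returns a list of (row, column) index pairs.
    """
    if not cost or not cost[0]:
        return []
    n, m = len(cost), len(cost[0])
    if n > m:
        # The routine below needs rows <= columns, so solve the transpose.
        transposed = [list(col) for col in zip(*cost)]
        return [(r, c) for c, r in hungarian(transposed)]
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j, 0 if free
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path that was found.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0]


def associate(track_feats, det_feats, max_dist=0.5):
    """Match tracks to detections; max_dist is a hypothetical gating threshold."""
    if not track_feats or not det_feats:
        return []
    cost = [[cosine_distance(t, d) for d in det_feats] for t in track_feats]
    return [(r, c) for r, c in hungarian(cost) if cost[r][c] <= max_dist]
```

Calling `associate(track_feats, det_feats)` returns `(track_index, detection_index)` pairs. Under these assumptions, tracks and detections left unmatched would be handled by a later step, such as the visibility check mentioned in the abstract.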


Keywords: Person tracking, Indoor, CNN, CUDA



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Polotsk State University, Novopolotsk, Republic of Belarus
