Globally Optimal Object Tracking with Complementary Use of Single Shot Multibox Detector and Fully Convolutional Network

  • Jinho Lee
  • Brian Kenji Iwana
  • Shouta Ide
  • Hideaki Hayashi
  • Seiichi Uchida
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10749)


Object tracking remains one of the most important yet difficult tasks in computer vision and pattern recognition. Its two main difficulties are appearance variation of the target object and occlusion. To deal with them, we propose an object tracking method that combines a Single Shot Multibox Detector (SSD), a Fully Convolutional Network (FCN), and Dynamic Programming (DP). SSD and FCN provide a probability map of the target object that tolerates appearance variation within each category, while DP yields a globally optimal tracking path even under severe occlusion. Through several experiments, we confirm that this combination realizes a robust tracker. Moreover, in contrast to traditional trackers, neither an initial position nor a template of the target needs to be specified. We show that the proposed method outperforms traditional trackers in tracking various single objects through video frames.
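The DP step described above can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes only the abstract's setup: a per-frame target-probability map (such as one derived from SSD/FCN outputs) stacked over time, through which DP finds the path maximizing the summed log-probability under a bounded per-frame displacement. The function name `dp_track` and the `max_step` motion constraint are illustrative choices, not from the paper.

```python
import numpy as np

def dp_track(prob_maps, max_step=1):
    """Globally optimal single-target track through per-frame probability maps.

    prob_maps: array of shape (T, H, W), each entry the target probability
    at that pixel in that frame. Returns the (y, x) path of length T that
    maximizes the summed log-probability, with per-frame displacement
    limited to `max_step` pixels in each axis.
    """
    T, H, W = prob_maps.shape
    logp = np.log(prob_maps + 1e-12)          # avoid log(0)
    score = logp[0].copy()                    # best cumulative score per pixel
    back = np.zeros((T, H, W, 2), dtype=np.int64)  # backpointers (dy, dx)

    for t in range(1, T):
        best = np.full((H, W), -np.inf)
        # consider every allowed displacement from frame t-1 to frame t
        for dy in range(-max_step, max_step + 1):
            for dx in range(-max_step, max_step + 1):
                shifted = np.full((H, W), -np.inf)
                dst_y = slice(max(dy, 0), H + min(dy, 0))
                dst_x = slice(max(dx, 0), W + min(dx, 0))
                src_y = slice(max(-dy, 0), H + min(-dy, 0))
                src_x = slice(max(-dx, 0), W + min(-dx, 0))
                shifted[dst_y, dst_x] = score[src_y, src_x]
                better = shifted > best
                best[better] = shifted[better]
                back[t][better] = (dy, dx)
        score = best + logp[t]

    # backtrace from the best final position
    y, x = np.unravel_index(np.argmax(score), score.shape)
    path = [(int(y), int(x))]
    for t in range(T - 1, 0, -1):
        dy, dx = back[t, y, x]
        y, x = int(y - dy), int(x - dx)
        path.append((y, x))
    return path[::-1]
```

Because the path is chosen globally over all frames, a few frames of low target probability (e.g. during occlusion) do not break the track: DP bridges them if the surrounding frames support a consistent trajectory.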


Keywords: Object tracking · Single Shot Multibox Detector · Fully Convolutional Network · Dynamic Programming



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jinho Lee, Brian Kenji Iwana, Shouta Ide, Hideaki Hayashi, and Seiichi Uchida: Kyushu University, Fukuoka, Japan
