Generic Pixel Level Object Tracker Using Bi-Channel Fully Convolutional Network

  • Zijing Chen
  • Jun Li
  • Zhe Chen
  • Xinge You
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10634)

Abstract

While most object tracking algorithms predict a bounding box to cover the target, pixel-level tracking methods provide a finer description of the target. However, it remains challenging for a tracker to precisely identify the detailed foreground area of the target. In this work, we propose a novel bi-channel fully convolutional neural network to tackle the generic pixel-level object tracking problem. By capturing and fusing both low-level and high-level temporal information, our network is able to produce an accurate pixel-level foreground mask of the target. In particular, our model neither updates its parameters to fit the tracked target nor requires prior knowledge about the category of the target. Experimental results show that the proposed network achieves compelling performance on challenging videos in comparison with competitive tracking algorithms.
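
The abstract does not specify the network's exact layer configuration. Purely as an illustration of the bi-channel idea, the minimal PyTorch sketch below feeds low-level temporal cues through a shallow branch and the current RGB frame through a deeper branch, then fuses the two streams and decodes a per-pixel foreground probability. All module names, input choices (here, hypothetically, a frame-difference map paired with the previous mask), and layer sizes are assumptions for illustration, not the authors' implementation.

    # Minimal sketch of a bi-channel fully convolutional mask predictor.
    # Hypothetical design: one channel ingests low-level temporal cues,
    # the other high-level appearance features; not the paper's exact net.
    import torch
    import torch.nn as nn

    class BiChannelFCN(nn.Module):
        def __init__(self):
            super().__init__()
            # Low-level channel: shallow convs over a 2-channel temporal
            # input (e.g., frame difference + previous-frame mask).
            self.low = nn.Sequential(
                nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            )
            # High-level channel: deeper convs over the RGB frame.
            self.high = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            # Fusion + decoder: concatenate both streams, upsample back
            # to input resolution, emit per-pixel foreground probability.
            self.fuse = nn.Sequential(
                nn.Conv2d(32 + 64, 64, 3, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 1),
            )

        def forward(self, temporal, frame):
            f = torch.cat([self.low(temporal), self.high(frame)], dim=1)
            return torch.sigmoid(self.fuse(f))  # (B, 1, H, W) mask

    # Usage: one 256x256 frame with its temporal companion input.
    net = BiChannelFCN()
    mask = net(torch.randn(1, 2, 256, 256), torch.randn(1, 3, 256, 256))
    print(mask.shape)  # torch.Size([1, 1, 256, 256])

Fusing by channel concatenation before a learned decoder is one common way to combine complementary streams; the actual network may well fuse its channels differently.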

Keywords

Visual tracking · Segmentation · Convolutional neural network

Acknowledgments

This work is supported by the project "Big Massive Open Online Course (MOOC) Data Retrieval and Classification Based on Cognitive Style".


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Ultimo, Australia
  2. School of Information Technology, UBTECH Sydney Artificial Intelligence Centre, The University of Sydney, Darlington, Australia
  3. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China