Multi-domain collaborative feature representation for robust visual object tracking

Abstract

Jointly exploiting multiple different yet complementary sources of domain information has proven to be an effective way to perform robust object tracking. This paper focuses on effectively representing and utilizing complementary features from the frame domain and the event domain to boost object tracking performance in challenging scenarios. Specifically, we propose a common feature extractor that learns potential shared representations across the RGB domain and the event domain. To learn the features unique to each domain, we use a unique extractor for events, based on spiking neural networks (SNNs), to capture edge cues in the event domain that may be missed in RGB under some challenging conditions, and a unique extractor for RGB, based on deep convolutional neural networks, to capture texture and semantic information in the RGB domain. Extensive experiments on a standard RGB benchmark and a real event-based tracking dataset demonstrate the effectiveness of the proposed approach. We show that our approach outperforms all compared state-of-the-art tracking algorithms and verify that event-based data provides a powerful cue for tracking in challenging scenes.
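This page does not reproduce the paper's architecture, so the following is only a minimal sketch of the idea the abstract describes: a shared (common) extractor applied to both domains, a convolutional branch unique to RGB frames, and a spiking branch unique to the event stream, with all cues fused into a single representation. All module names, channel sizes, the event accumulation, and the simple leaky integrate-and-fire (LIF) dynamics below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LIFLayer(nn.Module):
    """Toy leaky integrate-and-fire layer over a sequence of event slices.

    Illustrative stand-in for an SNN-based unique event extractor; real SNN
    training needs surrogate gradients for the hard threshold, omitted here.
    """
    def __init__(self, in_ch, out_ch, decay=0.8, threshold=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.decay, self.threshold = decay, threshold

    def forward(self, x_seq):                    # x_seq: (T, B, C, H, W)
        mem, spikes = 0.0, []
        for x in x_seq:                          # integrate inputs over time
            mem = self.decay * mem + self.conv(x)
            s = (mem >= self.threshold).float()  # emit spikes
            mem = mem * (1.0 - s)                # hard reset where spiked
            spikes.append(s)
        return torch.stack(spikes).mean(0)       # rate-coded feature map

class MultiDomainFeatures(nn.Module):
    """Common extractor shared by both domains + one unique extractor each."""
    def __init__(self):
        super().__init__()
        self.common = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.unique_rgb = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.unique_event = LIFLayer(3, 32)
        self.fuse = nn.Conv2d(32 * 4, 64, 1)     # 1x1 conv merges all cues

    def forward(self, rgb, event_seq):           # rgb: (B,3,H,W); events: (T,B,3,H,W)
        feats = [
            self.common(rgb),                    # common cues, frame domain
            self.common(event_seq.mean(0)),      # common cues, accumulated events
            self.unique_rgb(rgb),                # texture/semantic cues (RGB only)
            self.unique_event(event_seq),        # edge cues from the event stream
        ]
        return self.fuse(torch.cat(feats, dim=1))

rgb = torch.rand(1, 3, 128, 128)                 # one RGB frame
events = torch.rand(8, 1, 3, 128, 128)           # 8 event slices for that frame
feat = MultiDomainFeatures()(rgb, events)        # (1, 64, 128, 128) fused features
```

In the full tracker, such a fused representation would feed a tracking head (e.g., a classification or correlation module); here the fusion output merely stands in for the collaborative feature representation.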

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 91748104 and 61972067, and by the Innovation Technology Funding of Dalian (Project Nos. 2018J11CY010 and 2020JJ26GX036).

Author information

Corresponding author

Correspondence to Xin Yang.

Ethics declarations

Conflict of interest

Jiqing Zhang, Kai Zhao, Bo Dong, Yingkai Fu, Yuxin Wang, Xin Yang and Baocai Yin declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, J., Zhao, K., Dong, B. et al. Multi-domain collaborative feature representation for robust visual object tracking. Vis Comput 37, 2671–2683 (2021). https://doi.org/10.1007/s00371-021-02237-9

Keywords

  • Visual object tracking
  • Event-based camera
  • Multi-domain
  • Challenging conditions