Kernel based local matching network for video object segmentation

Wang, Guoqiang; Li, Lan; Zhu, Min; Zhao, Rui; Zhang, Xiang

doi:10.1007/s00138-024-01524-4

Research
Published: 25 March 2024

Volume 35, article number 42, (2024)

Machine Vision and Applications

Guoqiang Wang¹, Lan Li², Min Zhu¹, Rui Zhao³ & Xiang Zhang²

Abstract

Methods based on space-time memory networks have recently achieved strong performance in semi-supervised video object segmentation and have attracted wide attention. However, these methods share a serious limitation: because they rely on non-local matching, they are prone to interference from similar objects, which seriously limits segmentation performance. To solve this problem, we propose a Kernel-guided Attention Matching Network (KAMNet) that uses local matching instead of non-local matching. First, KAMNet uses a spatio-temporal attention mechanism to enhance the model's discrimination between foreground objects and background areas. Then, KAMNet uses a Gaussian kernel to guide the matching between the current frame and the reference set. Because the Gaussian kernel decays away from its center, it restricts the matching to the central region, thus achieving local matching. KAMNet strikes a balance between speed and accuracy on the benchmark datasets DAVIS 2016 (\( \mathcal{J}\&\mathcal{F} \) of 87.6%) and DAVIS 2017 (\( \mathcal{J}\&\mathcal{F} \) of 76.0%) at 0.12 seconds per frame.
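
The idea of kernel-guided local matching can be illustrated with a minimal NumPy sketch. It is not the KAMNet implementation: the abstract does not say how the kernel center is chosen, so this sketch assumes each query pixel's kernel is centered on the same spatial position in the reference frame. The function name `kernel_guided_matching`, the parameter `sigma`, and the single-frame reference are likewise hypothetical choices made for illustration.

```python
import numpy as np


def kernel_guided_matching(query, memory, sigma=2.0):
    """Attention weights from query pixels to memory pixels, localized by a Gaussian.

    query, memory: feature maps of shape (H, W, C).
    Returns an (H*W, H*W) matrix whose rows sum to 1.
    Assumption (not from the paper): the kernel for each query pixel is
    centered on the same (row, col) position in the memory frame.
    """
    h, w, c = query.shape
    q = query.reshape(h * w, c)
    m = memory.reshape(h * w, c)

    # Non-local step: scaled dot-product similarity between every pixel pair.
    sim = q @ m.T / np.sqrt(c)
    sim = sim - sim.max(axis=1, keepdims=True)  # for numerical stability
    weights = np.exp(sim)

    # Squared spatial distance between every pair of grid positions.
    coords = np.indices((h, w)).reshape(2, -1).T  # (H*W, 2), row-major order
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)

    # Gaussian kernel: decays with distance from the center, which suppresses
    # matches to similar-looking pixels that are far away.
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))

    weights = weights * kernel
    return weights / weights.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    current = rng.normal(size=(8, 8, 16))
    reference = rng.normal(size=(8, 8, 16))
    attn = kernel_guided_matching(current, reference, sigma=2.0)
    print(attn.shape)
```

A smaller `sigma` confines matching to a tighter neighborhood, while a very large `sigma` makes the kernel nearly flat and recovers ordinary non-local matching.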

Author information

Authors and Affiliations

Sichuan University, Chengdu, China
Guoqiang Wang & Min Zhu
University of Electronic Science and Technology of China, Chengdu, China
Lan Li & Xiang Zhang
Shenzhen Polytechnic University, Shenzhen, China
Rui Zhao

Corresponding author

Correspondence to Xiang Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, G., Li, L., Zhu, M. et al. Kernel based local matching network for video object segmentation. Machine Vision and Applications 35, 42 (2024). https://doi.org/10.1007/s00138-024-01524-4

Received: 04 September 2023
Revised: 09 February 2024
Accepted: 22 February 2024
Published: 25 March 2024
DOI: https://doi.org/10.1007/s00138-024-01524-4
