Evaluating quality of motion for unsupervised video object segmentation

Abstract

Current mainstream unsupervised video object segmentation (UVOS) approaches typically incorporate optical flow as motion information to locate the primary objects in coherent video frames. However, they fuse appearance and motion information without evaluating the quality of the optical flow. When poor-quality optical flow interacts with the appearance information, it introduces significant noise and degrades overall performance. To alleviate this issue, we first employ a quality evaluation module (QEM) to score the optical flow. We then select only high-quality optical flow as motion cues to fuse with the appearance information, which prevents poor-quality optical flow from diverting the network's attention. Moreover, we design an appearance-guided fusion module (AGFM) to better integrate appearance and motion information. Extensive experiments on several widely used datasets, including DAVIS-16, FBMS-59, and YouTube-Objects, demonstrate that the proposed method outperforms existing methods.
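
The abstract gives no implementation details, but the idea of quality-gated motion fusion can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration, not the authors' implementation: a hypothetical QualityEvaluationModule scores the optical-flow features, and a hypothetical AppearanceGuidedFusion block lets appearance features re-weight the quality-gated motion features before merging. All module names, layer sizes, and the threshold tau are assumptions made for illustration only.

```python
# Minimal sketch (assumed design, not the paper's code): a QEM scores flow
# features, and only sufficiently high-quality flow is fused with appearance
# features under appearance guidance.
import torch
import torch.nn as nn


class QualityEvaluationModule(nn.Module):
    """Predicts a scalar quality score in [0, 1] for a flow feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # global context of the flow features
            nn.Flatten(),
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 1),
            nn.Sigmoid(),                      # quality score in [0, 1]
        )

    def forward(self, flow_feat: torch.Tensor) -> torch.Tensor:
        return self.score(flow_feat)           # shape (B, 1)


class AppearanceGuidedFusion(nn.Module):
    """Fuses appearance features with quality-gated motion features.

    The appearance stream produces a spatial attention map that re-weights
    the motion stream before the two are merged.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, app_feat, flow_feat, quality, tau: float = 0.5):
        # If the predicted quality is below tau, suppress the flow branch so
        # that poor optical flow cannot dominate the fused representation.
        gate = (quality > tau).float().view(-1, 1, 1, 1) * quality.view(-1, 1, 1, 1)
        guided_flow = self.attn(app_feat) * flow_feat * gate
        return self.merge(torch.cat([app_feat, guided_flow], dim=1))


if __name__ == "__main__":
    B, C, H, W = 2, 64, 32, 32
    app, flow = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
    qem, agfm = QualityEvaluationModule(C), AppearanceGuidedFusion(C)
    fused = agfm(app, flow, qem(flow))
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

In this reading, a low quality score suppresses the motion branch so the segmentation head falls back on appearance cues, matching the abstract's stated goal of keeping poor-quality optical flow from diverting the network's attention.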



Author information


Corresponding author

Correspondence to Huihui Song.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

This work was supported by the National Natural Science Foundation of China (No. 61872189).


About this article


Cite this article

Cheng, G., Song, H. Evaluating quality of motion for unsupervised video object segmentation. Optoelectron. Lett. 20, 379–384 (2024). https://doi.org/10.1007/s11801-024-3207-1
