Abstract
The cost volume is widely used to establish correspondences in optical flow estimation, yet it is difficult to estimate correctly in low-texture and occluded areas. We therefore propose a replacement, the feature correlation transformer (FCTR): a transformer that alternates self- and cross-attention layers to obtain global receptive fields and applies positional embeddings to establish correspondences. With global context and positional information, FCTR produces more accurate correspondences in ambiguous areas. Using position-embedded features allows the context network to be removed: positional information can be aggregated within ambiguous motion boundaries, and the number of model parameters is reduced. To speed up network convergence and strengthen robustness, we introduce a smooth L1 loss with exponential weights during pre-training. At the time of submission, our method achieves performance competitive with all published optical flow methods on both the KITTI-2015 and MPI-Sintel benchmarks, and it outperforms all optical flow and scene flow methods in KITTI-2015 foreground-region prediction.
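For intuition, the two mechanisms named above can be sketched in a few lines of PyTorch. The sketch below is an illustration under stated assumptions, not the paper's implementation: the feature width, head count, layer count, and the use of nn.MultiheadAttention with residual connections are placeholders for whatever FCTR actually uses.

```python
import torch.nn as nn

class AlternatingAttention(nn.Module):
    """Self-/cross-attention alternation over a pair of position-embedded
    feature maps, flattened to token sequences of shape (B, N, dim)."""

    def __init__(self, dim=256, heads=8, num_layers=4):
        super().__init__()
        self.self_attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(num_layers))
        self.cross_attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(num_layers))

    def forward(self, feat1, feat2):
        for sa, ca in zip(self.self_attn, self.cross_attn):
            # Self-attention: global context within each frame (shared weights).
            feat1 = feat1 + sa(feat1, feat1, feat1)[0]
            feat2 = feat2 + sa(feat2, feat2, feat2)[0]
            # Cross-attention: each frame queries the other, so matches can
            # draw on context and position rather than local texture alone.
            f1 = feat1 + ca(feat1, feat2, feat2)[0]
            f2 = feat2 + ca(feat2, feat1, feat1)[0]
            feat1, feat2 = f1, f2
        return feat1, feat2
```

Likewise, the pre-training loss can be read as a smooth L1 counterpart of RAFT's exponentially weighted sequence loss over iterative refinements. The decay factor gamma = 0.8 below follows the RAFT convention and is an assumption, since the abstract does not state the schedule.

```python
import torch.nn.functional as F

def sequence_smooth_l1_loss(flow_preds, flow_gt, gamma=0.8):
    """Smooth L1 loss over a sequence of refined flow predictions,
    with exponentially larger weight on later iterations."""
    n = len(flow_preds)
    loss = 0.0
    for i, pred in enumerate(flow_preds):
        loss = loss + (gamma ** (n - i - 1)) * F.smooth_l1_loss(pred, flow_gt)
    return loss
```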
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 91748107, No. 62076073, No. 61902077), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Special Funds for the Cultivation of Guangdong College Students’ Scientific and Technological Innovation (No. pdjh2020a0173), the Key-Area Research and Development Program of Guangdong Province (No. 2019B010136001), and the Science and Technology Planning Project of Guangdong Province (No. LZC0023). Chen Junhong was sponsored by the China Scholarship Council (No. 202208440309).
About this article
Cite this article
Fang, G., Chen, J., Liang, D. et al. Feature Correlation Transformer for Estimating Ambiguous Optical Flow. Neural Process Lett 55, 7543–7559 (2023). https://doi.org/10.1007/s11063-023-11273-6