
Feature Correlation Transformer for Estimating Ambiguous Optical Flow


Abstract

Cost volume is widely used to establish correspondences in optical flow estimation. However, in low-texture and occluded areas it is difficult to estimate the cost volume correctly. We therefore propose a replacement, the feature correlation transformer (FCTR): a transformer that alternates self- and cross-attention to obtain a global receptive field and uses positional embedding to establish correspondences. With global context and positional information, FCTR produces more accurate correspondences in ambiguous areas. Using position-embedded features allows the context network to be removed: positional information is aggregated within ambiguous motion boundaries, and the number of model parameters is reduced. To speed up network convergence and strengthen robustness, we introduce a smooth L1 loss with exponential weights in the pre-training step. At the time of submission, our method achieves competitive performance with all published optical flow methods on both the KITTI-2015 and MPI-Sintel benchmarks, and it outperforms all optical flow and scene flow methods in KITTI-2015 foreground-region prediction.
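The pre-training objective can be read as a RAFT-style sequence loss in which every intermediate flow prediction is penalised with a smooth L1 term and the terms are weighted exponentially so that later refinements count more. The snippet below is a minimal NumPy sketch under that assumption; the function names, the decay factor gamma and the threshold beta are illustrative choices, not the exact formulation used in the paper.

import numpy as np

def smooth_l1(x, beta=1.0):
    # Element-wise smooth L1 penalty: quadratic near zero, linear for large residuals.
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

def exp_weighted_sequence_loss(flow_preds, flow_gt, gamma=0.8):
    # flow_preds: list of predicted flow fields of shape (2, H, W), in refinement order.
    # flow_gt:    ground-truth flow field of shape (2, H, W).
    # The i-th prediction is weighted by gamma**(N-1-i), so the final one gets weight 1.
    n = len(flow_preds)
    total = 0.0
    for i, pred in enumerate(flow_preds):
        weight = gamma ** (n - 1 - i)
        total += weight * smooth_l1(pred - flow_gt).mean()
    return total

For example, with three refinement steps and gamma = 0.8, the weights are 0.64, 0.8 and 1.0, so early, noisier predictions still contribute gradient signal without dominating the objective.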



Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 91748107, No. 62076073, No. 61902077), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Special Funds for the Cultivation of Guangdong College Students’ Scientific and Technological Innovation (pdjh2020a0173), the Key-Area Research and Development Program of Guangdong Province (2019B010136001), and the Science and Technology Planning Project of Guangdong Province (LZC0023). Junhong Chen was sponsored by the China Scholarship Council (No. 202208440309).

Author information


Corresponding author

Correspondence to Junhong Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fang, G., Chen, J., Liang, D. et al. Feature Correlation Transformer for Estimating Ambiguous Optical Flow. Neural Process Lett 55, 7543–7559 (2023). https://doi.org/10.1007/s11063-023-11273-6
