Abstract
The cost volume is widely used to establish correspondences in optical flow estimation, yet it is difficult to estimate correctly in low-texture and occluded areas. We therefore propose a replacement, the feature correlation transformer (FCTR): a transformer that alternates self- and cross-attention layers to obtain global receptive fields and applies positional embeddings to establish correspondences. With global context and positional information, FCTR produces more accurate correspondences in ambiguous areas. Using position-embedded features allows the context network to be removed: positional information can be aggregated within ambiguous motion boundaries, and the number of model parameters is reduced. To speed up network convergence and strengthen robustness, we introduce a smooth L1 loss with exponential weights during pre-training. At the time of submission, our method achieves performance competitive with all published optical flow methods on both the KITTI-2015 and MPI-Sintel benchmarks, and it outperforms all optical flow and scene flow methods in KITTI-2015 foreground-region prediction.
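For intuition, the two mechanisms named above can be sketched in a few lines of PyTorch. The sketch below is an illustration under stated assumptions, not the paper's implementation: the feature width, head count, layer count, and the use of nn.MultiheadAttention with residual connections are placeholders for whatever FCTR actually uses.

```python
import torch.nn as nn

class AlternatingAttention(nn.Module):
    """Self-/cross-attention alternation over a pair of position-embedded
    feature maps, flattened to token sequences of shape (B, N, dim)."""

    def __init__(self, dim=256, heads=8, num_layers=4):
        super().__init__()
        self.self_attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(num_layers))
        self.cross_attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(num_layers))

    def forward(self, feat1, feat2):
        for sa, ca in zip(self.self_attn, self.cross_attn):
            # Self-attention: global context within each frame (shared weights).
            feat1 = feat1 + sa(feat1, feat1, feat1)[0]
            feat2 = feat2 + sa(feat2, feat2, feat2)[0]
            # Cross-attention: each frame queries the other, so matches can
            # draw on context and position rather than local texture alone.
            f1 = feat1 + ca(feat1, feat2, feat2)[0]
            f2 = feat2 + ca(feat2, feat1, feat1)[0]
            feat1, feat2 = f1, f2
        return feat1, feat2
```

Likewise, the pre-training loss can be read as a smooth L1 counterpart of RAFT's exponentially weighted sequence loss over iterative refinements. The decay factor gamma = 0.8 below follows the RAFT convention and is an assumption, since the abstract does not state the schedule.

```python
import torch.nn.functional as F

def sequence_smooth_l1_loss(flow_preds, flow_gt, gamma=0.8):
    """Smooth L1 loss over a sequence of refined flow predictions,
    with exponentially larger weight on later iterations."""
    n = len(flow_preds)
    loss = 0.0
    for i, pred in enumerate(flow_preds):
        loss = loss + (gamma ** (n - i - 1)) * F.smooth_l1_loss(pred, flow_gt)
    return loss
```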
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 91748107, No. 62076073, No. 61902077), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Special Funds for the Cultivation of Guangdong College Students’ Scientific and Technological Innovation (No. pdjh2020a0173), the Key-Area Research and Development Program of Guangdong Province (No. 2019B010136001), and the Science and Technology Planning Project of Guangdong Province (No. LZC0023). Chen Junhong was sponsored by the China Scholarship Council (No. 202208440309).
About this article
Cite this article
Fang, G., Chen, J., Liang, D. et al. Feature Correlation Transformer for Estimating Ambiguous Optical Flow. Neural Process Lett 55, 7543–7559 (2023). https://doi.org/10.1007/s11063-023-11273-6