Disentangling Architecture and Training for Optical Flow

  • Conference paper
  • In: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

How important are training details and datasets to recent optical flow architectures like RAFT? And do they generalize? To explore these questions, rather than develop a new architecture, we revisit three prominent architectures, PWC-Net, IRR-PWC and RAFT, with a common set of modern training techniques and datasets, and observe significant performance gains, demonstrating the importance and generality of these training details. Our newly trained PWC-Net and IRR-PWC show surprisingly large improvements, up to 30% versus original published results on Sintel and KITTI 2015 benchmarks. Our newly trained RAFT obtains an Fl-all score of 4.31% on KITTI 2015 and an avg. rank of 1.7 for end-point error on Middlebury. Our results demonstrate the benefits of separating the contributions of architectures, training techniques and datasets when analyzing performance gains of optical flow methods. Our source code is available at https://autoflow-google.github.io.
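To make the abstract's "common set of modern training techniques" concrete, the sketch below shows the kind of recipe popularized by RAFT and applied across architectures in this line of work: AdamW, a one-cycle learning-rate schedule, and gradient clipping. This is a minimal illustration under stated assumptions, not the authors' exact configuration; the model, batch, and hyperparameter values are placeholders.

```python
import torch
from torch import nn

# Illustrative stand-in for an optical flow network (PWC-Net, IRR-PWC, RAFT, ...).
model = nn.Conv2d(6, 2, kernel_size=3, padding=1)

num_steps = 1000  # placeholder; real schedules run for hundreds of thousands of steps

optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=4e-4, total_steps=num_steps, pct_start=0.05)

for step in range(num_steps):
    # Dummy batch: two RGB frames stacked along channels, plus ground-truth flow.
    frames = torch.randn(4, 6, 64, 64)
    gt_flow = torch.randn(4, 2, 64, 64)

    pred_flow = model(frames)
    loss = (pred_flow - gt_flow).abs().mean()  # simple L1 loss on the flow field

    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping, a key stabilizer in RAFT-style training recipes.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```

The benchmark figures quoted above use standard metrics: end-point error (EPE) is the per-pixel Euclidean distance between predicted and ground-truth flow vectors, and KITTI 2015's Fl-all counts a pixel as an outlier when its EPE exceeds both 3 px and 5% of the ground-truth flow magnitude. A short sketch of both, under the same illustrative assumptions:

```python
import torch

def epe(pred, gt):
    """End-point error per pixel; pred and gt have shape (2, H, W)."""
    return torch.linalg.norm(pred - gt, dim=0)

def fl_all(pred, gt):
    """KITTI Fl-all (%): pixels whose EPE exceeds 3 px and 5% of GT magnitude."""
    err = epe(pred, gt)
    mag = torch.linalg.norm(gt, dim=0)
    outliers = (err > 3.0) & (err > 0.05 * mag)
    return 100.0 * outliers.float().mean().item()
```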

D. Sun and C. Herrmann—Equal technical contribution.

D. Sun—Project lead.


References

  1. Baker, S., Scharstein, D., Lewis, J.P., Roth, S., Black, M.J., Szeliski, R.: A database and evaluation methodology for optical flow. IJCV 92, 1–31 (2011)

  2. Bao, W., Lai, W.S., Ma, C., Zhang, X., Gao, Z., Yang, M.H.: Depth-aware video frame interpolation. In: CVPR, pp. 3703–3712 (2019)

  3. Barron, J., Fleet, D., Beauchemin, S.: Performance of optical flow techniques. IJCV 12, 43–77 (1994)

  4. Bello, I., et al.: Revisiting ResNets: improved training and scaling strategies. In: NeurIPS, vol. 34, pp. 1–14 (2021)

  5. Butler, D.J., Wulff, J., Stanley, G.B., Black, M.J.: A naturalistic open source movie for optical flow evaluation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 611–625. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33783-3_44

  6. Chen, Z., Jin, H., Lin, Z., Cohen, S., Wu, Y.: Large displacement optical flow from nearest neighbor fields. In: CVPR, pp. 2443–2450 (2013)

  7. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255. IEEE (2009)

  8. Djelouah, A., Campos, J., Schaub-Meyer, S., Schroers, C.: Neural inter-frame compression for video coding. In: CVPR, pp. 6421–6429 (2019)

  9. Dosovitskiy, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  10. Dosovitskiy, A., et al.: FlowNet: learning optical flow with convolutional networks. In: ICCV (2015)

  11. Fan, L., Huang, W., Gan, C., Ermon, S., Gong, B., Huang, J.: End-to-end learning of motion representation for video understanding. In: CVPR (2018)

  12. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR, pp. 3354–3361. IEEE (2012)

  13. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)

  15. He, T., Zhang, Z., Zhang, H., Zhang, Z., Xie, J., Li, M.: Bag of tricks for image classification with convolutional neural networks. In: CVPR, pp. 558–567 (2019)

  16. Hui, T.W., Tang, X., Change Loy, C.: LiteFlowNet: a lightweight convolutional neural network for optical flow estimation. In: CVPR (2018)

  17. Hur, J., Roth, S.: Iterative residual refinement for joint optical flow and occlusion estimation. In: CVPR, pp. 5754–5763 (2019). https://github.com/visinf/irr/blob/master/models/pwcnet/_irr.py

  18. Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: CVPR (2017)

  19. Jiang, H., Sun, D., Jampani, V., Yang, M.H., Learned-Miller, E., Kautz, J.: Super SloMo: high quality estimation of multiple intermediate frames for video interpolation. In: CVPR (2018)

  20. Jiang, S., Campbell, D., Lu, Y., Li, H., Hartley, R.: Learning to estimate hidden motions with global motion aggregation. In: ICCV, pp. 9772–9781 (2021)

  21. Jonschkowski, R., Stone, A., Barron, J.T., Gordon, A., Konolige, K., Angelova, A.: What matters in unsupervised optical flow. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 557–572. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_33

  22. Kim, D., Woo, S., Lee, J.Y., Kweon, I.S.: Deep video inpainting. In: CVPR, pp. 5792–5801 (2019)

  23. Kondermann, D., et al.: The HCI benchmark suite: stereo and flow ground truth with uncertainties for urban autonomous driving. In: CVPR Workshops, pp. 19–28 (2016)

  24. Lipson, L., Teed, Z., Deng, J.: RAFT-Stereo: multilevel recurrent field transforms for stereo matching. In: 3DV, pp. 218–227. IEEE (2021)

  25. Liu, L., et al.: Learning by analogy: reliable supervision from transformations for unsupervised optical flow estimation. In: CVPR, pp. 6489–6498 (2020)

  26. Liu, P., Lyu, M., King, I., Xu, J.: SelFlow: self-supervised learning of optical flow. In: CVPR (2019)

  27. Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545 (2022)

  28. Luo, A., Yang, F., Luo, K., Li, X., Fan, H., Liu, S.: Learning optical flow with adaptive graph reasoning. arXiv preprint arXiv:2202.03857 (2022)

  29. Lv, Z., Kim, K., Troccoli, A., Sun, D., Rehg, J.M., Kautz, J.: Learning rigidity in dynamic scenes with a moving camera for 3D motion field estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11209, pp. 484–501. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01228-1_29

  30. Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: CVPR (2016)

  31. Mehl, L., Beschle, C., Barth, A., Bruhn, A.: An anisotropic selection scheme for variational optical flow methods with order-adaptive regularisation. In: Elmoataz, A., Fadili, J., Quéau, Y., Rabin, J., Simon, L. (eds.) SSVM 2021. LNCS, vol. 12679, pp. 140–152. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-75549-2_12

  32. Meister, S., Hur, J., Roth, S.: UnFlow: unsupervised learning of optical flow with a bidirectional census loss. In: AAAI (2018)

  33. Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: CVPR (2016)

  34. Ranjan, A., Black, M.J.: Optical flow estimation using a spatial pyramid network. In: CVPR (2017)

  35. Ranjan, A., et al.: Competitive collaboration: joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. In: CVPR, pp. 12240–12249 (2019)

  36. Richter, S.R., Hayder, Z., Koltun, V.: Playing for benchmarks. In: ICCV, pp. 2213–2222 (2017)

  37. Richter, S.R., Hayder, Z., Koltun, V.: Playing for benchmarks. In: ICCV, Venice, Italy, 22–29 October 2017, pp. 2232–2241 (2017). https://doi.org/10.1109/ICCV.2017.243

  38. Shi, H., Zhou, Y., Yang, K., Yin, X., Wang, K.: CSFlow: learning optical flow via cross strip correlation for autonomous driving. arXiv preprint arXiv:2202.00909 (2022)

  39. Steiner, A., Kolesnikov, A., Zhai, X., Wightman, R., Uszkoreit, J., Beyer, L.: How to train your ViT? Data, augmentation, and regularization in vision transformers. arXiv preprint arXiv:2106.10270 (2021)

  40. Stone, A., Maurer, D., Ayvaci, A., Angelova, A., Jonschkowski, R.: SMURF: self-teaching multi-frame unsupervised RAFT with full-image warping. In: CVPR, pp. 3887–3896 (2021)

  41. Stroud, J., Ross, D., Sun, C., Deng, J., Sukthankar, R.: D3D: distilled 3D networks for video action recognition. In: CVPR, pp. 625–634 (2020)

  42. Sun, D., et al.: TF-RAFT: a TensorFlow implementation of RAFT. In: ECCV Robust Vision Challenge Workshop (2020)

  43. Sun, D., Roth, S., Black, M.J.: Secrets of optical flow estimation and their principles. In: CVPR, pp. 2432–2439. IEEE (2010)

  44. Sun, D., et al.: AutoFlow: learning a better training set for optical flow. In: CVPR (2021)

  45. Sun, D., Yang, X., Liu, M.Y., Kautz, J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: CVPR (2018)

  46. Sun, D., Yang, X., Liu, M.Y., Kautz, J.: Models matter, so does training: an empirical study of CNNs for optical flow estimation. IEEE TPAMI 42, 1408–1423 (2019)

  47. Szeliski, R.: Computer Vision: Algorithms and Applications. Springer, Heidelberg (2010). https://doi.org/10.1007/978-1-84882-935-0

  48. Teed, Z., Deng, J.: RAFT: recurrent all-pairs field transforms for optical flow. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 402–419. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_24

  49. Teed, Z., Deng, J.: RAFT-3D: scene flow using rigid-motion embeddings. In: CVPR, pp. 8375–8384 (2021)

  50. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: ICML, pp. 10347–10357. PMLR (2021)

  51. Wan, Z., Mao, Y., Dai, Y.: PRAFlow_RVC: pyramid recurrent all-pairs field transforms for optical flow estimation in Robust Vision Challenge 2020. arXiv preprint arXiv:2009.06360 (2020)

  52. Wang, J., Zhong, Y., Dai, Y., Zhang, K., Ji, P., Li, H.: Displacement-invariant matching cost learning for accurate optical flow estimation. In: NeurIPS, vol. 33, pp. 15220–15231 (2020)

  53. Wightman, R., Touvron, H., Jégou, H.: ResNet strikes back: an improved training procedure in timm. arXiv preprint arXiv:2110.00476 (2021)

  54. Xiao, T., et al.: Learnable cost volume using the Cayley representation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 483–499. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_28

  55. Xu, H., Yang, J., Cai, J., Zhang, J., Tong, X.: High-resolution optical flow from 1D attention and correlation. In: ICCV (2021)

  56. Yang, G., Ramanan, D.: Volumetric correspondence networks for optical flow. In: NeurIPS, vol. 32, pp. 794–805 (2019)

  57. Yang, G., Zhao, H., Shi, J., Deng, Z., Jia, J.: SegStereo: exploiting semantic information for disparity estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 660–676. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_39

  58. Yin, Z., Darrell, T., Yu, F.: Hierarchical discrete distribution decomposition for match density estimation. In: CVPR (2019)

  59. Yu, H., et al.: FOAL: fast online adaptive learning for cardiac motion estimation. In: CVPR, pp. 4313–4323 (2020)

  60. Yu, J.J., Harley, A.W., Derpanis, K.G.: Back to basics: unsupervised learning of optical flow via brightness constancy and motion smoothness. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 3–10. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_1

  61. Zach, C., Pock, T., Bischof, H.: A duality based approach for realtime TV-L1 optical flow. In: DAGM (2007)

  62. Zhang, F., Woodford, O.J., Prisacariu, V.A., Torr, P.H.: Separable flow: learning motion cost volumes for optical flow estimation. In: ICCV, pp. 10807–10817 (2021)

  63. Zhao, H., Gan, C., Ma, W.C., Torralba, A.: The sound of motions. In: CVPR, pp. 1735–1744 (2019)

  64. Zhao, S., Sheng, Y., Dong, Y., Chang, E.I.C., Xu, Y.: MaskFlowNet: asymmetric feature matching with learnable occlusion mask. In: CVPR (2020)

  65. Zhao, X., Pang, Y., Zhang, L., Lu, H., Zhang, L.: Suppress and balance: a simple gated network for salient object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 35–51. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_3

Author information

Corresponding author

Correspondence to Charles Herrmann.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 18957 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, D., Herrmann, C., Reda, F., Rubinstein, M., Fleet, D.J., Freeman, W.T. (2022). Disentangling Architecture and Training for Optical Flow. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_10

  • DOI: https://doi.org/10.1007/978-3-031-20047-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20046-5

  • Online ISBN: 978-3-031-20047-2

  • eBook Packages: Computer Science, Computer Science (R0)
