
Local and nonlocal flow-guided video inpainting

Abstract

The purpose of video inpainting is to generate plausible content to fill in the missing regions of a video. A video is a four-dimensional sequence that is continuous in the temporal dimension, so it is difficult to ensure temporal continuity by inpainting each frame independently along the time dimension. Video inpainting has progressed from traditional algorithms to advanced learning-based methods and can now handle a variety of scenes. However, open problems remain, and video inpainting is still a challenging task. Existing works focus on object removal from videos and neglect the importance of inpainting occlusions in the middle region of the frame. For this middle-region occlusion problem, we propose a local and nonlocal flow-guided video inpainting framework. First, according to the forward and backward directions of the reference frame and the sampling window, we divide the video into local and nonlocal frames, extract the local and nonlocal optical flow, and feed them to a residual network for coarse completion. Next, our approach extracts and completes the edges of the predicted flow. Finally, the completed optical flow field guides the propagation of pixels to inpaint the video content. Experimental results on the DAVIS and YouTube-VOS datasets show that our method significantly improves image quality and optical flow quality compared with the state of the art. Code is available at https://github.com/lengfengio/LNFVI.git.
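As a rough illustration of the local/nonlocal sampling step described in the abstract, the minimal Python sketch below splits a clip's frame indices into a dense local window around a reference frame and sparsely sampled nonlocal frames taken forward and backward in time. The function name sample_frames and the parameters local_radius and nonlocal_stride are illustrative assumptions, not the authors' actual implementation; see the linked repository for that.

```python
# Minimal sketch of local/nonlocal frame selection (illustrative only; the
# names sample_frames, local_radius, and nonlocal_stride are assumptions,
# not the authors' API).

def sample_frames(num_frames, ref_idx, local_radius=2, nonlocal_stride=10):
    """Split frame indices into local neighbours of the reference frame and
    sparsely sampled nonlocal frames from the rest of the clip."""
    # Local frames: the reference frame plus its immediate forward and
    # backward neighbours inside the sampling window.
    local = [t for t in range(ref_idx - local_radius, ref_idx + local_radius + 1)
             if 0 <= t < num_frames]
    # Nonlocal frames: frames sampled at a coarse temporal stride over the
    # whole clip, excluding anything already in the local window.
    nonlocal_frames = [t for t in range(0, num_frames, nonlocal_stride)
                       if t not in local]
    return local, nonlocal_frames


if __name__ == "__main__":
    local_idx, nonlocal_idx = sample_frames(num_frames=80, ref_idx=40)
    print("local frames:", local_idx)        # [38, 39, 40, 41, 42]
    print("nonlocal frames:", nonlocal_idx)  # [0, 10, 20, 30, 50, 60, 70]
```

In the framework described above, optical flow would then be estimated between the reference frame and each of these two sets, completed by the residual network and the edge-completion step, and finally used to propagate known pixels into the missing region.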

Availability of data and materials

We tested our method on publicly available datasets. The DAVIS dataset can be found at https://davischallenge.org/davis2017/code.html and the YouTube-VOS dataset at https://youtube-vos.org/dataset

Code Availability

We will upload the project to GitHub in the next few months.

Funding

This work was supported by the Fundamental Research Funds for the Universities of Henan Province (NSFRF220414) and the Excellent Young Teachers Program of Henan Polytechnic University (No. 2019XQG-02).

Author information

Contributions

All authors took part in the discussion of the work described in this paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhanqiang Huo.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for Publication

Not applicable

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Yang, Z., Huo, Z. et al. Local and nonlocal flow-guided video inpainting. Multimed Tools Appl 83, 10321–10340 (2024). https://doi.org/10.1007/s11042-023-15457-z
