Abstract
Deep video inpainting can automatically fill in missing content in both the spatial and temporal domains. Unfortunately, malicious video inpainting operations can distort media content, and their realistic visual quality makes it difficult for viewers to spot inpainting traces. As a result, the detection of video inpainting has become a crucial research area in video forensics. Existing detection models are trained and tested on datasets produced by three kinds of inpainting models, but they have not been evaluated against the latest, stronger deep inpainting models. To address this, we introduce a novel end-to-end video inpainting detection network comprising a feature extraction module and a feature learning module: the feature extraction module is a Bayar (constrained convolutional) layer, and the feature learning module is an encoder-decoder network. The proposed approach is evaluated on inpainted videos created by several state-of-the-art deep video inpainting networks. Extensive experiments show that our approach achieves better inpainting localization performance than other methods.
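The Bayar layer used for feature extraction is a constrained convolution: its center weight is fixed to -1 while the remaining weights are rescaled to sum to 1, so the filter learns prediction-error (high-pass) residuals rather than image content. Below is a minimal pure-Python sketch of that constraint projection, not the authors' implementation; `bayar_constrain` is a hypothetical helper name.

```python
import random

def bayar_constrain(kernel):
    """Project a square, odd-sized 2D kernel onto the Bayar constraint:
    the center tap is fixed to -1 and the off-center taps are rescaled
    to sum to 1, so the filter computes a local prediction error."""
    n = len(kernel)
    c = n // 2
    k = [row[:] for row in kernel]            # copy, don't mutate the input
    k[c][c] = 0.0
    s = sum(sum(row) for row in k)            # sum of off-center weights
    k = [[v / s for v in row] for row in k]   # rescale so they sum to 1
    k[c][c] = -1.0                            # fix the center tap
    return k

random.seed(0)
kernel = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(5)]
constrained = bayar_constrain(kernel)
```

In training, this projection would typically be applied to the first convolutional layer's weights after every gradient update, keeping the layer a learnable residual (high-pass) filter throughout optimization.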
Acknowledgements
This work was supported by the National Key Technology Research and Development Program under Grant 2020AAA0140000.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, J., Zhao, X., Cao, Y. (2024). Generalizable Deep Video Inpainting Detection Based on Constrained Convolutional Neural Networks. In: Ma, B., Li, J., Li, Q. (eds) Digital Forensics and Watermarking. IWDW 2023. Lecture Notes in Computer Science, vol 14511. Springer, Singapore. https://doi.org/10.1007/978-981-97-2585-4_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2584-7
Online ISBN: 978-981-97-2585-4