Abstract
Deep video inpainting can automatically fill in missing content in both the spatial and temporal domains. Unfortunately, malicious video inpainting operations can distort media content, and their realistic visual quality makes it difficult for viewers to spot inpainting traces. As a result, the detection of video inpainting has become a crucial research area in video forensics. Existing detection models are trained and tested on datasets produced by three kinds of inpainting models, but they have not been evaluated against the latest, stronger deep inpainting models. To address this, we introduce a novel end-to-end video inpainting detection network comprising a feature extraction module and a feature learning module: the feature extraction module is a Bayar (constrained convolutional) layer, and the feature learning module is an encoder-decoder network. The proposed approach is evaluated on inpainted videos created by several state-of-the-art deep video inpainting networks. Extensive experiments show that our approach achieves better inpainting localization performance than other methods.
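The Bayar layer used for feature extraction is a constrained convolution: its center weight is fixed to -1 while the remaining weights are rescaled to sum to 1, so the filter learns prediction-error (high-pass) residuals rather than image content. Below is a minimal pure-Python sketch of that constraint projection, not the authors' implementation; `bayar_constrain` is a hypothetical helper name.

```python
import random

def bayar_constrain(kernel):
    """Project a square, odd-sized 2D kernel onto the Bayar constraint:
    the center tap is fixed to -1 and the off-center taps are rescaled
    to sum to 1, so the filter computes a local prediction error."""
    n = len(kernel)
    c = n // 2
    k = [row[:] for row in kernel]            # copy, don't mutate the input
    k[c][c] = 0.0
    s = sum(sum(row) for row in k)            # sum of off-center weights
    k = [[v / s for v in row] for row in k]   # rescale so they sum to 1
    k[c][c] = -1.0                            # fix the center tap
    return k

random.seed(0)
kernel = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(5)]
constrained = bayar_constrain(kernel)
```

In training, this projection would typically be applied to the first convolutional layer's weights after every gradient update, keeping the layer a learnable residual (high-pass) filter throughout optimization.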
Acknowledgements
This work was supported by the National Key Technology Research and Development Program under Grant 2020AAA0140000.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, J., Zhao, X., Cao, Y. (2024). Generalizable Deep Video Inpainting Detection Based on Constrained Convolutional Neural Networks. In: Ma, B., Li, J., Li, Q. (eds) Digital Forensics and Watermarking. IWDW 2023. Lecture Notes in Computer Science, vol 14511. Springer, Singapore. https://doi.org/10.1007/978-981-97-2585-4_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2584-7
Online ISBN: 978-981-97-2585-4