Multimedia Systems, Volume 24, Issue 5, pp 597–609

Semantic object removal with convolutional neural network feature-based inpainting approach

  • Xiuxia Cai
  • Bin Song
Regular Paper


Object removal is a popular image manipulation technique that mainly involves two technical problems: object segmentation and image inpainting. In the conventional object removal framework, the segmentation step requires a mask or manual pre-processing, and the quality of the inpainting step still needs improvement. In this paper, we propose a new object removal framework based on deep learning. Conditional random fields as recurrent neural networks (CRF-RNN) are used to segment the target semantically, which avoids the need for a mask or manual pre-processing in object segmentation. For the inpainting part, we propose a new method for filling the missing region. Representation features are computed from the convolutional neural network (CNN) feature maps of the regions neighboring the missing region. Large-scale bound-constrained optimization (L-BFGS) is then used to synthesize the missing region from the CNN representation features of similar neighboring regions. We evaluate the proposed method by applying it to different kinds of images and textures for object removal and inpainting. Experimental results show that our method outperforms the conventional method in both inpainting and object removal.
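As a rough illustration of the inpainting step summarized above, the following Python sketch fills a region by bound-constrained optimization. It is not the authors' implementation. The hand-made filters in `feature_maps` stand in for CNN feature maps, the use of a Gram matrix as the representation feature is an assumption borrowed from the texture-synthesis literature, and all function names are hypothetical. SciPy's `minimize` with `method="L-BFGS-B"` serves as the optimizer.

```python
import numpy as np
from scipy.optimize import minimize


def feature_maps(patch):
    # Stand-in for CNN feature maps (an assumption): raw intensity plus
    # horizontal and vertical finite differences, flattened per channel.
    dx = np.diff(patch, axis=1, append=patch[:, -1:])
    dy = np.diff(patch, axis=0, append=patch[-1:, :])
    return np.stack([patch, dx, dy]).reshape(3, -1)


def gram(patch):
    # Channel-by-channel correlations, averaged over pixel positions.
    f = feature_maps(patch)
    return f @ f.T / f.shape[1]


def gram_loss(flat, shape, target):
    # Squared distance between the candidate's Gram matrix and the target.
    return float(np.sum((gram(flat.reshape(shape)) - target) ** 2))


def synthesize(neighbor, seed=0):
    # Start from noise and let L-BFGS-B pull the patch statistics toward
    # those of the neighbor, keeping every pixel inside [0, 1].
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(0.0, 1.0, neighbor.size)
    target = gram(neighbor)
    result = minimize(
        gram_loss, x0, args=(neighbor.shape, target),
        method="L-BFGS-B", bounds=[(0.0, 1.0)] * x0.size,
        options={"maxiter": 200},
    )  # no jac supplied, so the gradient is estimated by finite differences
    start = gram_loss(x0, neighbor.shape, target)
    return result.x.reshape(neighbor.shape), start, float(result.fun)


def remove_region(image, rows, cols, neighbor):
    # Replace image[rows, cols] with a patch synthesized from `neighbor`.
    out = image.copy()
    patch, start, final = synthesize(neighbor)
    out[rows, cols] = patch
    return out, start, final


image = np.tile([0.2, 0.8], (16, 12))      # 16 x 24 striped texture
rows, cols = slice(4, 12), slice(8, 16)    # region to be replaced
neighbor = image[4:12, 0:8]                # same-size patch beside it
filled, start_loss, final_loss = remove_region(image, rows, cols, neighbor)
print(f"Gram loss: {start_loss:.4f} -> {final_loss:.6f}")
```

In this toy setting the region to replace is given by hand. In the framework described above it would come from the semantic segmentation step, and the features from a trained CNN rather than fixed filters.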


Keywords: Deep learning, Semantic segmentation, Inpainting, CNN



We thank the anonymous reviewers and the editor for their valuable comments. This work was supported by the National Natural Science Foundation of China (Nos. 61772387 and 61372068), the Research Fund for the Doctoral Program of Higher Education of China (No. 20130203110005), the Fundamental Research Funds for the Central Universities (No. K5051301033), the 111 Project (No. B08038), and the ISN State Key Laboratory.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China
