1 Introduction

Image inpainting derives from the restoration of damaged artworks [5]. Its basic idea is to use the undamaged, effective information to restore the damaged regions according to certain rules [11, 30]. Its main purpose is to make the restored image satisfy the requirements of human vision, so that people unfamiliar with the original image cannot notice the restoration traces [17, 33]. With the rapid development of computer and multimedia technology, inpainting has been widely used in many fields [20], such as the restoration of scratches in old photos and precious documents, the protection of cultural relics [8, 18], robot vision, film and television special effects, and so on [3, 37].

At present, existing inpainting approaches can be classified into three categories. The first is based on the Partial Differential Equation (PDE). Its basic idea is to fill the missing region smoothly by diffusing effective information from the undamaged region into the damaged region at the pixel level [35]. Representative approaches include the BSCB model [5], the TV model [7], and the CDD model [6]. For small, non-textured damaged regions, these approaches achieve convincing results, but for large, textured missing regions they tend to induce over-smoothing or staircase effects.

The second is based on exemplars, and is also the most commonly used category at present. Its basic idea is to restore the missing region in a visually plausible manner using similar patches, at the patch level. Representative approaches include the non-parametric sampling approach [12], the exemplar-based approach [9], the patch-sparsity-based approach [39], and the nonlocal-means approach [38]. The advantage of these approaches is that they can obtain satisfactory results when restoring large damaged regions. However, they also have some disadvantages, such as unreasonable restoration order, mismatch error and error accumulation, and low efficiency due to greedy search.

The third is based on sparse representation. Its basic idea is to calculate the sparse representation coefficients of the damaged patch on an over-complete dictionary, and then restore the damaged patch from those coefficients and the dictionary. Representative approaches include the K-SVD-based approach [1] and the MCA-based approach [13]. These approaches achieve convincing results in filling missing pixels and restoring scratches. However, because sparse-representation-based methods are still at an early stage, and the over-complete dictionary strongly influences the restoration results, the adaptability of over-complete dictionaries needs to be further improved.

From the perspective of image inpainting, removing an object from an image means restoring a large-scale damaged region [14, 46], and the most commonly used method for this is exemplar-based [4, 10]. For each damaged patch, it searches the undamaged region for a similar exemplar patch according to a matching rule, and then uses the exemplar patch to replace the damaged patch. A series of studies have shown that this method can achieve a satisfactory restoration effect. However, it also suffers from some shortcomings. For example, it uses the SSD (Sum of Squared Differences) to measure the similarity between the target patch and an exemplar patch. Although this matching rule is simple, it may lead to a mismatch error, i.e., the damaged patch is replaced by an unsuitable patch. Even worse, the error accumulates as the process progresses. Finally, undesired objects are introduced into the restored region, and the restoration results cannot meet the requirements of human vision.

In view of these problems, we propose an image inpainting method for object removal based on a difference degree constraint. Compared with previous methods, our main contributions are as follows:

  1. We define the MSD (Mean of Squared Differences) between the target patch and the exemplar patch, and use it to measure the degree of difference between corresponding pixels at known positions (i.e., the pixels that already exist) in the target patch and the exemplar patch.

  2. We define the SMD (Square of Mean Difference) between the target patch and the exemplar patch, and use it to measure the degree of difference between the pixels at known positions (i.e., the pixels that already exist) in the target patch and the pixels at unknown positions (i.e., the pixels that will be used to fill) in the exemplar patch.

  3. Based on MSD and SMD, we define a new matching rule: the exemplar patch with the smallest sum of MSD and SMD is selected as the most similar patch. In this way, we can effectively prevent the occurrence and accumulation of mismatch error, and improve the restoration effect.

2 Related works

The PDE-based methods solve partial differential equations so that effective information propagates smoothly into the damaged region along the isophote direction. Rathish et al. [31] used the square of the L2 norm of the Hessian of the image as the regularization term, and used convexity splitting to solve the resulting semi-discrete scheme in the Fourier domain. Yang et al. [40] utilized a newly defined fractional-order structure tensor to control the regularization process; the new model inherits the genuine anisotropy of tensor regularization and better handles subtle details and complex structures. Theljani et al. [36], based on a fourth-order variational model, used an adaptive selection of the diffusion parameters to optimize the regularization effects in the neighborhoods of small features. Mousavi et al. [28] considered the effect of the spectrum and phase angle of the Fourier transform, generating two regularization parameters and thus two degrees of freedom for restoring an image. These methods can be used to restore small-scale damaged regions, e.g., removing scratches, removing text coverage, and filling holes.

The exemplar-based methods mainly search for the most similar exemplar patch in the undamaged source region, and then use it to restore the corresponding damaged target patch. These methods are commonly used to remove objects from images. Liu et al. [23] used the structural similarity index measure (SSIM) and obtained the best candidate patch from four cases of rotation and inversion, so as to find the most similar exemplar patch. Isogawa et al. [21] noted the important effect of the mask on the restoration results, and proposed a mask optimization method to obtain good results automatically. Wong et al. [38] introduced the non-local-means idea from image denoising into image inpainting, and used the mean of a number of exemplar patches to replace the damaged patch; however, this may lose texture details and induce over-smoothing in the restored regions. Shen et al. [34] directly selected patches from the original image to form an over-complete dictionary, and restored damaged images based on sparse representation. For smooth images, the method obtains a satisfactory restoration effect, but for texture images it may lose some texture details. Liu et al. [24] modified the confidence term into an exponential form and computed the sum of the confidence term and data term to make the filling order more reasonable. Zhang et al. [47] replaced the data term with curvature and gradient information to improve the filling order. However, neither improved the matching rule, so mismatch errors between patches may still occur during inpainting. Nan et al. [29] set different weights for the data term and confidence term according to the golden section, making the restoration order more reasonable, but this cannot effectively prevent mismatch errors, and the restoration effect needs to be improved.
Yao [41] introduced the correlation between the target patch and its neighborhood patches into the priority calculation, and changed the multiplication to an addition. Besides, she defined a new similarity function to improve the restoration effect. Ghorai et al. [15] proposed a Markov Random Field (MRF)-based image inpainting method. They used a novel group formation strategy based on subspace clustering to search for candidate patches in the relevant source region only, and adopted a patch refinement scheme using higher-order singular value decomposition to capture the underlying pattern among the candidate patches. Zhang et al. [43] used surface fitting as prior knowledge, and used the Jaccard similarity coefficient to improve the matching precision between patches. These exemplar-based methods have attracted the attention of many researchers, and various improved methods continue to be proposed.

The sparse-representation-based methods mainly use an over-complete dictionary and sparse coding to reconstruct damaged pixels. Zhang et al. [44] classified patches according to local features: smooth patches were restored by the over-complete dictionary and sparse coding, while texture patches were restored by exemplar patches. Mo et al. [27] used a self-adaptive group structure and sparse representation to address the discontinuous object structures and poor texture details that occur in image inpainting, and achieved better restoration results. Hu et al. [19] combined the Criminisi method with sparse representation, using sparse representation instead of searching for the most similar exemplar patch in the Criminisi algorithm. Zhang et al. [45] monitored the restoration process: when there was a mismatch between the exemplar patch and the target patch, they calculated the sparse coding of the target patch on a discrete cosine dictionary, then reconstructed the target patch using the over-complete dictionary and the sparse coding, obtaining a better restoration effect. However, the application of sparse representation to image inpainting is still in its infancy, and its representation model and the adaptability of the over-complete dictionary need to be further improved.

It should be noted that in recent years, with the rapid development of deep learning, researchers have applied it to various fields of computer vision [25], such as object segmentation [26], object detection, and saliency detection. In terms of image inpainting, Goodfellow et al. [16] proposed Generative Adversarial Networks (GANs). They used a large number of real images to train a generative model and a discriminative model, so that the deep network learns the feature distribution of real images; finally, the generator can automatically generate images that are very similar to real ones. Sagong et al. [32] proposed a fast PEPSI (Parallel Extended-decoder Path for Semantic Inpainting) model. It reduces the number of convolution operations by adopting a structure consisting of a single shared encoding network and a parallel decoding network with coarse and inpainting paths. Zeng et al. [42] proposed a PEN-Net (Pyramid-context ENcoder Network). It uses a U-Net structure to restore an image by encoding contextual semantics from the full-resolution input and decoding the learned semantic features back into an image. Besides, it uses a pyramid-context encoder to progressively learn region affinity by attention from a high-level semantic feature map, and transfers the learned attention to the previous low-level feature map. Jiang et al. [22] used a generator, a global discriminator, and a local discriminator to design the network model, and generated more realistic restoration results. In addition, there were seven papers on deep-learning-based image inpainting at CVPR 2020.

3 Definitions of MSD and SMD

3.1 Notations

For easy understanding, we adopt the same notations used in [9]. As shown in Fig. 1, Ω is the target region (i.e., the damaged region) which will be removed and filled, and Φ is the source region (i.e., the undamaged region); it may be defined as the entire image I minus the target region (i.e., Φ = I − Ω). δΩ denotes the boundary of the target region Ω. Suppose that the patch Ψp centered at the point p (p ∈ δΩ) is to be filled. Given the patch Ψp, np is the unit vector orthogonal to the boundary δΩ and \(\nabla ^{\bot }_{p}\) is the isophote at the point p.

Fig. 1
figure 1

Notation diagram

3.2 Mismatch error

In the traditional exemplar-based method, when the target patch Ψp is determined, it searches for the exemplar patch which is most similar to Ψp according to the following matching rule [9]:

$$ {\varPsi}_{\hat{q}}=\underset{{\varPsi}_{q}\in{\varPhi}}{\arg\min} ssd({\varPsi}_{p},{\varPsi}_{q}) $$
(1)

where ssd(Ψp,Ψq) is defined as the SSD (Sum of Squared Differences) of the already existing pixels between the two patches.
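As a concrete illustration, the SSD of Eq. (1) over only the already-existing pixels can be sketched as follows (a minimal sketch assuming grayscale float patches; the function and variable names are ours):

```python
import numpy as np

def ssd(target, exemplar, known):
    """SSD of Eq. (1), computed only over the pixels of the
    target patch that already exist (known == True)."""
    diff = target[known] - exemplar[known]
    return float(np.sum(diff ** 2))
```

Note that this score ignores the pixels of the exemplar patch that will actually fill the hole, regardless of how they relate to the known part of the target patch.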

Although the matching rule is simple, there is a risk that the target patch is restored by an unsuitable exemplar patch. Even worse, the error accumulates as the process progresses, which may introduce undesired and unexpected objects into the target regions. To illustrate this, Fig. 2 shows the restoration process when removing an object from an image according to this matching rule, where the first image (row 1, column 1) is the original image, the second is the object to be removed, marked in green, and the third is the inpainting mask. From the fourth to the last one (row 4, column 5) is the restoration process of object removal. It should be noted that we saved a total of 371 images in the simulation experiment; to simplify the display, we show only 17 of them here, but these images still clearly show the restoration process of the target region.

Fig. 2
figure 2

Restoration process when removing a target from an image according to the original matching rule

From Fig. 2 we can see that, at the beginning, the target region where the object is located gradually shrinks as the process progresses. However, in the image in row 2 and column 3, a small part of the arm of another contestant is copied into the target region, which means that the target patch has been replaced by an unsuitable exemplar patch, i.e., a mismatch error has occurred between the two patches. After that, the arm part in the target region is gradually enlarged, which means that the mismatch error accumulates as the process progresses. Finally, an unexpected object is introduced into the restored image, making the result unable to meet the requirements of human vision, as shown in the last image.

Through the above analysis, we conclude that the situations in which mismatch error is likely to occur fall into two categories. In the first, the differences between the pixels that already exist in the target patch and the corresponding pixels in the exemplar patch are relatively large. In the second, the differences between the pixels that already exist in the target patch and the pixels that will be used for filling are relatively large. In either situation, if the target patch is replaced by the exemplar patch, a mismatch error is likely to occur.

3.3 Definition of MSD

On the one hand, if the differences between the already existing pixels in the target patch and the corresponding pixels in the exemplar patch are relatively large, a mismatch error is likely to occur. For this situation, we define the MSD and use it to measure the degree of difference between the two patches. It is defined as:

$$ MSD({\varPsi}_{p},{\varPsi}_{q})=\frac{\sum|\bar{M}{\varPsi}_{p}-\bar{M}{\varPsi}_{q}|^{2}}{\sum\bar{M}} $$
(2)

where Ψp is the target patch and Ψq is the exemplar patch. M is the binary mask: it uses 1 to indicate the pixels that need to be filled, and 0 to indicate the pixels that already exist. \(\bar {M}{\varPsi }_{p}\) extracts the pixels that already exist in the target patch Ψp, and \(\bar {M}{\varPsi }_{q}\) extracts the corresponding pixels in the exemplar patch Ψq. In short, MSD calculates the average of the squared differences between corresponding pixels at known positions in the two patches, which measures the degree of difference between them.

According to (2), if the value of MSD is small, the already existing pixels in the two patches are very similar. On the contrary, if the value of MSD is large, the already existing pixels in the two patches are very different; in this situation, using the exemplar patch to restore the target patch is likely to cause a mismatch error.
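Under the mask convention above (M = 1 for pixels to fill, so M̄ = 1 − M selects the known pixels), Eq. (2) can be sketched as follows (the function and variable names are ours):

```python
import numpy as np

def msd(target, exemplar, M):
    """Mean of Squared Differences, Eq. (2): the average squared
    difference over the known positions (M == 0) of the two patches."""
    M_bar = 1 - M
    diff2 = (M_bar * (target - exemplar)) ** 2
    return float(diff2.sum() / M_bar.sum())
```

Because M̄ is binary, multiplying the difference by M̄ before squaring is equivalent to the |M̄Ψp − M̄Ψq|² of Eq. (2).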

3.4 Definition of SMD

On the other hand, if the differences between the pixels that already exist in the target patch and the pixels that will be used for filling are relatively large, a mismatch error is still likely to occur. For this situation, we define the SMD and use it to measure the degree of difference between the two patches. It is defined as:

$$ SMD({\varPsi}_{p},{\varPsi}_{q})=|\frac{\sum\bar{M}{\varPsi}_{p}}{\sum\bar{M}}-\frac{\sum M{\varPsi}_{q}}{\sum M}|^{2} $$
(3)

where Ψp is the target patch, Ψq is the exemplar patch, and M is the binary mask. \({\sum \bar {M}{\varPsi }_{p}}/{\sum \bar {M}}\) calculates the average of the pixels that already exist in the target patch, and \({\sum M{\varPsi }_{q}}/{\sum M}\) calculates the average of the pixels that will be used for filling; these pixels are located at unknown positions in the exemplar patch. In short, SMD measures the degree of difference between the pixels that already exist and the pixels used to fill.

According to (3), if the value of SMD is small, the already existing pixels in the target patch and the pixels that will be used for filling are very similar. On the contrary, if the value of SMD is large, these two groups of pixels are very different; in this situation, if the exemplar patch is used to restore the target patch, the two parts of the restored patch will differ greatly, and a mismatch error is still likely to occur.
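With the same mask convention, Eq. (3) compares the mean of the known target pixels with the mean of the exemplar pixels that will do the filling (a sketch; names are ours):

```python
import numpy as np

def smd(target, exemplar, M):
    """Square of Mean Difference, Eq. (3): mean of the known pixels of
    the target patch vs. mean of the filling pixels of the exemplar."""
    M_bar = 1 - M
    mean_known = (M_bar * target).sum() / M_bar.sum()
    mean_fill = (M * exemplar).sum() / M.sum()
    return float((mean_known - mean_fill) ** 2)
```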

3.5 Definition of matching rule

Based on the above definitions of MSD and SMD, we define a new matching rule in the proposed method, and use it to measure the degree of similarity between the exemplar patch and the target patch. It is defined as:

$$ {\varPsi}_{\hat{q}}=\underset{{\varPsi}_{q}\in{\varPhi}}{\arg\min} MSD({\varPsi}_{p},{\varPsi}_{q})+SMD({\varPsi}_{p},{\varPsi}_{q}) $$
(4)

To illustrate the effectiveness of the matching rule defined in (4), we show a simple example in Fig. 3. We synthesize a binary test image consisting of 9 small squares, each 9 × 9 in size, as shown in (a). In (b), we specify a target patch located in the center of the image. In addition, we specify the damaged region in the target patch, which is marked in green and is 3 × 9 in size.

Fig. 3
figure 3

Comparison of matching rules defined in (1) and (4)

According to the matching rule defined in (1), the similar exemplar patch found is the one marked by the red dotted line in (c), because the SSD of that patch is 0. Using this exemplar patch to restore the target patch, we obtain the restored image shown in (e). It can be seen that significant visual inconsistencies appear in the result.

According to the matching rule defined in (4), the MSD of the exemplar patch marked by the red dotted line in (c) is 0, but its SMD is 65025 (i.e., 255²), so the sum of its MSD and SMD is 65025. In comparison, both the MSD and the SMD of the exemplar patch marked by the blue dotted line in (d) are 0, so the sum of its MSD and SMD is 0. Since 0 is less than 65025, the exemplar patch in (d) is determined to be the best similar exemplar patch. Using it to restore the target patch, we obtain the restored image shown in (f). It can be seen that the restoration result meets the requirements of human visual consistency.
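The selection in Eq. (4) then reduces to an argmin over candidate exemplar patches. The sketch below reproduces the logic of the Fig. 3 example with two synthetic 0/255 candidates (all names are ours):

```python
import numpy as np

def score(target, exemplar, M):
    """Sum of MSD (Eq. 2) and SMD (Eq. 3) for one candidate patch."""
    M_bar = 1 - M
    msd = ((M_bar * (target - exemplar)) ** 2).sum() / M_bar.sum()
    smd = ((M_bar * target).sum() / M_bar.sum()
           - (M * exemplar).sum() / M.sum()) ** 2
    return float(msd + smd)

def select_exemplar(target, candidates, M):
    """Eq. (4): index of the candidate with the smallest MSD + SMD."""
    return int(np.argmin([score(target, c, M) for c in candidates]))
```

For a target whose known rows are all 255, a candidate that matches the known rows but would fill with 0 scores MSD = 0 and SMD = 255² = 65025, while an all-255 candidate scores 0 and is preferred, mirroring patches (c) and (d) in Fig. 3.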

4 Proposed method

4.1 Priority computation

In image inpainting, the filling order is determined by the priority values. In the proposed method, we use the method in [9] to compute the patch priority, because it gives higher priority to patches on the continuation of strong edges, and thus maintains the continuity of structures.

Each patch Ψp centered at a point p has a priority, defined as:

$$ P(p)=C(p)\times D(p) $$
(5)

where C(p) is the confidence term and D(p) is the data term. The confidence term C(p) indicates how many pixels in the target patch already exist. It is defined as:

$$ C(p)=\frac{{\sum}_{q\in{\varPsi}_{p}\bigcap{\varPhi}}C(q)}{|{\varPsi}_{p}|} $$
(6)

where |Ψp| is the area of the patch, that is, the number of pixels contained in the patch. Therefore, the more pixels already exist in a patch, the greater its confidence term. During the initialization, C(p) is set as:

$$ C(p)= \left\{ \begin{array}{lll} 0 & & {\forall p \in {\varOmega}}\\ 1 & & {\forall p \in {\varPhi}} \end{array} \right. $$
(7)

Data term D(p) indicates how strong the isophote hitting the boundary is. It is defined as:

$$ D(p)=\frac{|{\nabla^{\bot}_{p}}\cdot n_{p}|}{\alpha} $$
(8)

where α is a normalization factor. Therefore, the smaller the angle between the vector \(\nabla ^{\bot }_{p}\) and the vector np of a patch, the larger its data term.
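The priority of Eqs. (5)–(8) can be sketched for a single fill-front point as follows (a sketch with names of ours; the image gradient and front normal are assumed precomputed, and α = 255 for 8-bit images). Summing the confidence over the whole patch in Eq. (6) is valid because C is 0 at still-damaged pixels:

```python
import numpy as np

def priority(C_patch, grad, normal, alpha=255.0):
    """P(p) = C(p) * D(p), Eqs. (5)-(8).

    C_patch : confidence values of the patch around p (Eq. 6 averages
              them; damaged pixels contribute 0)
    grad    : image gradient (gx, gy) at p; the isophote direction is
              its perpendicular (-gy, gx)
    normal  : unit normal n_p to the fill front at p
    """
    C = C_patch.sum() / C_patch.size                         # Eq. (6)
    gx, gy = grad
    isophote = np.array([-gy, gx])
    D = abs(float(isophote @ np.asarray(normal, dtype=float))) / alpha  # Eq. (8)
    return C * D
```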

4.2 Patch matching

Once all priorities have been calculated, we find the target patch Ψp with the highest priority, and search for the exemplar patch which is most similar to it in the source region according to the matching rule.

In the proposed method, we use the matching rule in (4) instead of (1) to search for the exemplar patch \({\varPsi }_{\hat {q}}\).

4.3 Patch restoration

When the exemplar patch \({\varPsi }_{\hat {q}}\) is found, we use it to restore the current target patch Ψp as follows:

$$ M{\varPsi}_{p}=M{\varPsi}_{\hat{q}} $$
(9)
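Equation (9) is simply a masked copy: only the damaged positions (M = 1) of the target patch are overwritten (a sketch; names are ours):

```python
import numpy as np

def restore(target, exemplar, M):
    """Eq. (9): copy the exemplar pixels into the damaged positions only."""
    out = target.copy()
    out[M == 1] = exemplar[M == 1]
    return out
```

The known pixels of the target patch are kept, so the restored patch combines the original known part with the exemplar's filling part.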

4.4 Algorithm description

In order to describe the proposed method more clearly, we show its flow chart in Fig. 4.

Fig. 4
figure 4

Flow chart of the proposed method

Here, the proposed algorithm steps are described in detail as follows:

  • Identify the target regions. The target regions are indicated and extracted according to the inpainting mask;

  • Calculate the patch priorities according to (5), and select the target patch Ψp with the highest priority;

  • For each exemplar patch, calculate the MSD and the SMD according to (2) and (3), and find the most similar patch \({\varPsi }_{\hat {q}}\) according to (4);

  • Use exemplar patch \({\varPsi }_{\hat {q}}\) to restore the target patch Ψp according to (9);

  • Update the value of confidence term according to the following equation:

$$ C(\hat{p})=C(p)\quad \forall \hat{p}\in {\varPsi}_{p}\bigcap {\varOmega} $$
(10)

The algorithm iterates the above steps until all pixels in the target region have been filled.
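Putting the steps together, a minimal end-to-end sketch of the loop might look as follows (assumptions of ours: grayscale image, interior hole, fixed 5 × 5 patches, and the priority reduced to the confidence term for brevity; the full method also multiplies in the data term D(p) of Eq. (8)):

```python
import numpy as np

H = 2  # patch half-width: patches are 5x5

def match_cost(tp, ex, m):
    """MSD (Eq. 2) + SMD (Eq. 3); m is the patch mask (1 = damaged)."""
    mb = 1 - m
    msd = ((mb * (tp - ex)) ** 2).sum() / mb.sum()
    smd = ((mb * tp).sum() / mb.sum() - (m * ex).sum() / m.sum()) ** 2
    return msd + smd

def inpaint(image, mask):
    """Greedy exemplar-based fill following the algorithm steps above.

    image : float grayscale image; mask : 1 marks the target region.
    Assumes the hole lies far enough from the border for full patches."""
    img = image.astype(float).copy()
    mask = mask.astype(int).copy()
    C = 1.0 - mask                                       # Eq. (7)
    rows, cols = img.shape
    while mask.any():
        # fill front: damaged pixels that touch a known pixel
        front = [(r, c) for r, c in np.argwhere(mask == 1)
                 if (mask[r-1:r+2, c-1:c+2] == 0).any()]
        # target patch with the highest confidence (Eqs. 5-6, D(p) omitted)
        r, c = max(front, key=lambda p:
                   C[p[0]-H:p[0]+H+1, p[1]-H:p[1]+H+1].sum())
        conf = C[r-H:r+H+1, c-H:c+H+1].sum() / (2*H + 1) ** 2
        tp = img[r-H:r+H+1, c-H:c+H+1]
        m = mask[r-H:r+H+1, c-H:c+H+1]
        # scan every fully-known source patch for the best match (Eq. 4)
        best, best_cost = None, np.inf
        for i in range(H, rows - H):
            for j in range(H, cols - H):
                if mask[i-H:i+H+1, j-H:j+H+1].any():
                    continue
                cost = match_cost(tp, img[i-H:i+H+1, j-H:j+H+1], m)
                if cost < best_cost:
                    best, best_cost = img[i-H:i+H+1, j-H:j+H+1], cost
        filled = (m == 1)
        tp[filled] = best[filled]                        # Eq. (9)
        C[r-H:r+H+1, c-H:c+H+1][filled] = conf           # Eq. (10)
        mask[r-H:r+H+1, c-H:c+H+1][filled] = 0
    return img
```

The exhaustive source scan is the simplest (and slowest) search strategy; any accelerated nearest-neighbor search could replace it without changing the matching rule.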

5 Experimental results

To verify the effectiveness and feasibility of the proposed method, we select a number of natural images from the BSDS dataset [2]. All experiments are run on a computer with a 2.7 GHz processor and 2 GB RAM.

For each image, we specify a target object and then restore the region where it is located, so as to remove the object. For comparison and analysis, we use the methods in [9], [34], [24], [47], [29], and [41], as well as the proposed method, to restore each image. Finally, we compare and analyze the restoration results qualitatively and quantitatively.

In the following figures, we show the restoration results of the different methods in two groups: smooth images and texture images. In each figure, (a) is the original image, (b) is the object to be removed, marked in green, (c) is the result of the method in [9], (d) of the method in [34], (e) of the method in [24], (f) of the method in [47], (g) of the method in [29], (h) of the method in [41], and (i) of the proposed method. For easier comparison, in the restoration results of each method we mark the target region with a white rectangle.

5.1 Qualitative analysis

In Figs. 5, 6, and 7, we show the restoration results of three smooth images. To distinguish them, we name each image. In Fig. 5, we remove an eagle from the image, so we name it "eagle". In Fig. 6, we remove the polar bear, and name the image "polar bear". In Fig. 7, we remove a goose, and name the image "goose".

Fig. 5
figure 5

Restoration results of “eagle”

Fig. 6
figure 6

Restoration results of “polar bear”

Fig. 7
figure 7

Restoration results of “goose”

In Figs. 8, 9, and 10, we show the restoration results of three texture images, which contain bright colors and rich textures. In Fig. 8, we remove the rider on the left side of the image, and name the image "rider". In Fig. 9, we remove the stone column, and name the image "stone column". In Fig. 10, we remove the tree from the hillside, and name the image "hillside".

Fig. 8
figure 8

Restoration results of “rider”

Fig. 9
figure 9

Restoration results of “stone column”

Fig. 10
figure 10

Restoration results of “hillside”

As can be seen from the above figures, in the results of the method in [9], there are often undesired and unexpected objects in the target regions. For example, a small portion of another eagle was copied to the target region in Fig. 5 (c); an unexpected black object appeared in the target region in Fig. 6 (c); the head of another goose was copied to the target region in Fig. 7 (c); part of the red coat was copied to the target region in Fig. 8 (c); a small part of the stone bench was copied to the target region in Fig. 9 (c); and part of the blue lake appeared in the target region in Fig. 10 (c). The reason is that this method uses the SSD to measure the similarity between two patches, which may result in the target patch being restored by an unsuitable exemplar patch; the mismatch error may then accumulate as the process progresses, eventually leading to the situations above.

The method in [34] uses sparse representation to restore the damaged patches. For smooth images, it obtains a satisfactory restoration effect, as shown in Figs. 5 (d), 6 (d), and 7 (d). However, for texture images, texture details may be lost, resulting in over-smoothing in the target region. For example, many grass details are lost in the target region in Fig. 8 (d), there is severe over-smoothing in the target region in Fig. 9 (d), and the details of the grass on the hillside are also lost in the target region in Fig. 10 (d). The reason is that the over-complete dictionary lacks adaptability to different textures, and this method only approximately reconstructs image patches, so the restoration effect on texture images needs to be further improved.

The method in [24] modified the confidence term into an exponential form and computed the sum of the confidence term and data term to make the filling order more reasonable, which improves the restoration effect to some extent: as shown in Figs. 6 (e), 7 (e), and 8 (e), there are no isolated, trivial objects in the target regions. However, a small portion of another eagle was introduced into the target region in Fig. 5 (e), parts of the stone bench were copied to the target region in Fig. 9 (e), and part of the blue lake appeared in the target region in Fig. 10 (e). The reason is that this method only improves the filling order and does not improve the matching rule, so mismatch errors sometimes occur during the restoration process.

The method in [47] can obtain better restoration results. For example, no superfluous objects are introduced into the target region in Figs. 5 (f), 6 (f), 8 (f), and 10 (f). The reason is that it uses curvature and gradient information to replace the data term, which makes the filling order more reasonable to a certain extent. However, it still has some problems. For example, part of another goose's wings was copied into the target region in Fig. 7 (f), and part of the stone bench was copied into the target region in Fig. 9 (f). The reason is that although this method makes the filling order more reasonable, it cannot effectively prevent mismatch errors.

The method in [29] changed the weights of the data term and confidence term according to the golden section, which makes the restoration order more reasonable and obtains better results, as shown in Figs. 6 (g) and 8 (g). However, a small portion of another eagle was copied to the target region in Fig. 5 (g), a small part of the head of another goose was copied to the target region in Fig. 7 (g), a small part of the stone bench was copied to the target region in Fig. 9 (g), and part of the blue lake appeared in the target region in Fig. 10 (g). The reason is that this method only improves the filling order, but does not further improve the matching rule.

The method in [41] modified the priority calculation and defined a new matching rule, so it achieves a better restoration effect, and no undesired objects were introduced into the target region in any of these images. However, it adds a distance-factor constraint to the matching rule, which makes it tend to select exemplar patches closer to the target patch. Therefore, for images with rich texture, texture details close to the target region may be duplicated many times, resulting in excessive texture repetition in the target region. For example, it achieved satisfactory results in Figs. 5 (h), 6 (h), and 7 (h), but in Fig. 8 (h) there are continuous repetitions of grass in the target region, in Fig. 9 (h) there is excessive repetition of some branches, and in Fig. 10 (h) a part of the lawn is repeated multiple times.

Compared with the other methods, the proposed method achieves satisfactory results, with no undesired or unexpected objects in the target regions. The reason is that it not only measures the difference between corresponding pixels at known positions in the target patch and the exemplar patch, but also measures the difference between the pixels at known positions in the target patch and the pixels at unknown positions in the exemplar patch, which effectively prevents mismatch errors and makes the results meet the requirements of human vision.

5.2 Quantitative analysis

From the analysis in Section 3.2, we know that if the MSD or the SMD of an exemplar patch is relatively large, a mismatch error is likely to occur. Therefore, in the proposed method, we define the new matching rule of Eq. (4), in which the exemplar patch with the smallest sum of MSD and SMD is selected as the most similar exemplar patch. Accordingly, to quantitatively analyze and compare the various methods, we separately calculate the sum of MSD and SMD of each matched patch of each method during the restoration process, and use it to verify whether the sums of MSD and SMD in our method are relatively small compared with the other methods.

However, for each image, each method produces many matched patches during restoration, and in each method the sums of MSD and SMD of the vast majority of matched patches are very small. Therefore, if the values of all matched patches of all methods were placed in the same chart, the data distribution would be too concentrated to distinguish clearly. For example, in the restoration process of "rider", the sums of MSD and SMD of each matched patch of each method are shown in Fig. 11. As can be seen, the vast majority of values in each method are small and their distribution is concentrated, so the values of the methods cannot be distinguished well and quantitative comparison is not effective.

Fig. 11 Sum of MSD and SMD of each matched patch of each method in the restoration process of "rider"

Based on the above analysis, in order to better distinguish the data distribution of each method and compare their performance more clearly, we extracted the largest 10 values of each method for quantitative analysis, as shown in Fig. 12, where sub-figures (a)-(f) show the largest 10 values of each method in the restoration processes of "eagle", "polar bear", "goose", "rider", "stone column", and "hillside", respectively.
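The extraction used to build Fig. 12 amounts to a simple top-k selection over the per-patch sums recorded for each method; a minimal sketch (the paper does not specify the extraction code):

```python
def largest_k(values, k=10):
    """Return the k largest values in descending order.

    values: the list of (MSD + SMD) sums recorded for one method's
    matched patches during the restoration of one image.
    """
    return sorted(values, reverse=True)[:k]
```

Plotting only these top-10 values per method spreads the data apart, since the near-zero sums that dominate each method's distribution are discarded.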

Fig. 12 Quantitative comparison of each method on each image, where each sub-figure shows the largest 10 values (sum of MSD and SMD) of each method for one image

In Fig. 12 (a), the values of the methods in [9, 24], and [29] are relatively large, while those of the methods in [41, 47] and the proposed method are relatively small. This distribution coincides with the restoration results shown in Fig. 5: in Fig. 5 (c), (e), and (g), unexpected objects are introduced into the target region, while in Fig. 5 (f), (h), and (i), no undesired objects are introduced.

In Fig. 12 (b), only the values of the method in [9] are relatively large, and the values of the other five methods are relatively small. This corresponds exactly to the fact that in Fig. 6 (c), an unexpected black object appears in the target region, whereas the restoration results of the other five methods meet the requirements of human visual consistency.

In Fig. 12 (c), the values of the methods in [9, 47], and [29] are relatively large, while those of the methods in [24, 41] and the proposed method are relatively small. This distribution is exactly consistent with the restoration results in Fig. 7: parts of the body of another goose were copied to the target region in Fig. 7 (c), (f), and (g), while no undesired objects were introduced into the target region in Fig. 7 (e), (h), and (i).

In Fig. 12 (d), only the values of the method in [9] are relatively large, while the values of the other five methods are relatively small. This corresponds exactly to the fact that part of the red coat was copied to the target region in Fig. 8 (c), while the other methods obtain better restoration results.

In Fig. 12 (e), the values of the method in [47] are the largest, followed by those of the methods in [9, 24, 29, 41], and the values of the proposed method are the smallest. Correspondingly, the unexpected object in Fig. 9 (f) is the largest, followed by those in Fig. 9 (c), (e), (g), and (h), while no unexpected object appears in the target region in Fig. 9 (i).

In Fig. 12 (f), the values of the method in [9] are the largest, followed by those of the methods in [24, 29], and [41], while the values of the method in [47] and the proposed method are the smallest. Correspondingly, the unexpected object in Fig. 10 (c) is the largest, followed by those in Fig. 10 (e), (g), and (h), while no unexpected objects appear in the target region in Fig. 10 (f) and (i).

The above analysis shows that the quantitative results are consistent with the qualitative visual results. When the sum of MSD and SMD is relatively large, undesired and unexpected objects are introduced into the target region; when it is relatively small, no extra objects are introduced and the restored images satisfy the requirements of subjective visual consistency. These experimental results effectively illustrate the effectiveness of the new matching rule defined in our method.

5.3 Discussion

It should be noted that we did not compare the proposed method with methods based on deep learning, because the basic principles of the two types of methods are completely different. The exemplar-based method searches the undamaged region of the image for similar exemplar patches and uses them to restore the damaged pixels, whereas a deep learning method uses a large number of real images to train a deep network so that it learns the feature distribution of real images, and then generates content conforming to that distribution to restore the damaged region. Compared with deep learning methods, the exemplar-based method does not need a large number of samples or the considerable time required to train a deep network. Nevertheless, we recognize the powerful capability of deep learning in image inpainting: papers presented at the CVPR conferences in 2019 and 2020 have shown that deep learning methods achieve impressive and satisfactory results. We have therefore begun in-depth research on inpainting methods based on deep learning, hoping to further improve the effect of image restoration.

In addition, it should be noted that although the proposed method achieves good restoration results, its restoration process takes more time than the other methods. We see two main reasons: first, for each target patch, the whole source region must be traversed to search for the most similar exemplar patch; second, for each exemplar patch, the MSD and SMD must be calculated separately and then compared according to their sum. Both steps take a certain amount of time.
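The cost of the exhaustive traversal can be made concrete. Assuming a full scan of an H x W image with p x p patches (an illustrative estimate, ignoring masked positions; the actual count depends on the damaged region):

```python
def num_candidates(h, w, p):
    """Upper bound on candidate exemplar patches scanned per target patch
    when the whole source region is traversed with a p x p window.
    Illustrative only: positions overlapping the damaged region would be
    excluded in practice.
    """
    return (h - p + 1) * (w - p + 1)

# e.g. a 256 x 256 image with 9 x 9 patches yields 248 * 248 = 61504
# candidate positions per target patch, each requiring one MSD and one
# SMD evaluation -- which is why the per-candidate cost dominates runtime
```

This also suggests where future speedups would come from: reducing either the number of candidates scanned or the per-candidate cost of the MSD/SMD computation.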

6 Conclusions

In this paper we propose an image inpainting method for object removal based on a difference-degree constraint. To address the problems of mismatch error and error accumulation during the restoration process, we define the MSD and the SMD between the target patch and the exemplar patch, and use them to measure the degree of difference between the two patches. Based on the MSD and the SMD, we define a new matching rule that prevents mismatch error and error accumulation in time. Experimental results demonstrate the effectiveness of the proposed method. In future research, we will study inpainting methods based on Generative Adversarial Networks, hoping to further improve the restoration effect of object removal.