Image editing by object-aware optimal boundary searching and mixed-domain composition

When combining very different images which often contain complex objects and backgrounds, producing consistent compositions is a challenging problem requiring seamless image editing. In this paper, we propose a general approach, called object-aware image editing, to obtain consistency in structure, color, and texture in a unified way. Our approach improves upon previous gradient-domain composition in three ways. Firstly, we introduce an iterative optimization algorithm to minimize mismatches on the boundaries when the target region contains multiple objects of interest. Secondly, we propose a mixed-domain consistency metric for measuring gradients and colors, and formulate composition as a unified minimization problem that can be solved with a sparse linear system. In particular, we encode texture consistency using a patch-based approach without searching and matching. Thirdly, we adopt an object-aware approach to separately manipulate the guidance gradient fields for objects of interest and backgrounds of interest, which facilitates a variety of seamless image editing applications. Our unified method outperforms previous state-of-the-art methods in preserving global texture consistency in addition to local structure continuity.


Introduction
Seamless image editing has been an active research field in recent years. It is widely applied in panorama mosaicing [1], photo composition [2,3], manipulating large collections of photos [4], and so on. Seamless image editing involves combining source regions with target images in a visually natural way. Since input images often contain multiple objects with differently textured backgrounds, a natural-looking editing result should meet the expectations of human visual perception [5] and preserve not only local structure continuity within boundaries but also consistent color and texture transitions between source and target images.
In general, when combining images with very different structures and textures, a successful image editing algorithm should preserve the following properties, to produce results in agreement with our visual perception: 1) Local structure continuity. When the input images include multiple objects or different structures, the composition should not break the local salient structures in the overlap region, to avoid structure collision or discontinuity. 2) Smooth color transitions. The composition should blend the colors between input images to avoid blurring due to differences along the boundary. 3) Global texture consistency. When the input images have very different textures, the process should take into account not only gradients but also colors to preserve texture consistency.
Much research has been conducted to solve these issues.
In this paper, we propose a general approach which we call object-aware image editing (OAIE), based on optimal editing region selection and mixed-domain composition. Our main contributions include: 1) an iterative optimization algorithm to compute optimal boundaries for regions containing multiple objects, which can be used to preserve local structure continuity; 2) a mixed-domain measure to address smooth color transitions and global texture consistency, allowing us to formulate editing as a unified minimization problem; 3) a patch-based approach to encoding texture consistency which requires neither search nor voting.
Based on these contributions, OAIE can simultaneously eliminate local structure artifacts, provide smooth color transitions, and preserve global texture consistency.

Related work
In the image editing literature, many methods have been proposed to address inconsistency in structures, colors, and textures.

Optimal seam methods
Optimal seam methods are often applied for seamlessly compositing images and textures; they achieve good local structure continuity. They seek a partition curve (an optimal seam) in the overlap region to minimize the difference between the two input images, in order to make the seam as invisible as possible. Generally, objects are visually salient in images [6], so an optimal seam should pass around objects to avoid structure collisions. Dynamic programming [7] or graph cut [8] methods are usually applied to find the optimal seam. Optimal seam methods can handle image composition well when there are only small structure and color differences between input images in the overlap region. However, if the differences are large, it is hard to find a pleasing optimal seam. To combine inconsistent images, Darabi et al. [9] proposed a novel method, image melding (IM), which uses a Poisson equation solver to find suitable colors by minimizing an energy function based on mixed $L_2$/$L_0$ norms for colors and gradients. This method can produce a gradual transition between source images without sacrificing texture sharpness. However, when there are too many edges or complex textures around the boundary, the editing result can look unnatural. Tao et al. [10] proposed error-tolerant image composition (ETIC), which minimizes the curl of the target gradients on the foreground-background boundary. However, the editing result may have a very sharp boundary and color leakage may occur.
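The dynamic programming search for an optimal seam mentioned above can be sketched as follows. This is a minimal illustration of the textbook seam recurrence on a per-pixel difference map, not the exact formulation of Ref. [7]; the function name `min_cost_seam` and the toy cost map are hypothetical.

```python
from typing import List

def min_cost_seam(cost: List[List[float]]) -> List[int]:
    """Return one column index per row for the cheapest top-to-bottom seam."""
    rows, cols = len(cost), len(cost[0])
    # acc[r][c] = cheapest total cost of any seam ending at (r, c)
    acc = [row[:] for row in cost]
    back = [[0] * cols for _ in range(rows)]
    for r in range(1, rows):
        for c in range(cols):
            # the seam may shift by at most one column between rows
            candidates = range(max(0, c - 1), min(cols, c + 2))
            best = min(candidates, key=lambda j: acc[r - 1][j])
            back[r][c] = best
            acc[r][c] = cost[r][c] + acc[r - 1][best]
    # backtrack from the cheapest cell in the last row
    c = min(range(cols), key=lambda j: acc[rows - 1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        c = back[r][c]
        seam.append(c)
    seam.reverse()
    return seam

# toy difference map between two images in the overlap region;
# large values mark pixels where the images disagree (e.g., on an object)
diff = [
    [9, 1, 9, 9],
    [9, 9, 1, 9],
    [9, 9, 1, 9],
    [9, 1, 9, 9],
]
seam = min_cost_seam(diff)
```

In this toy map the seam follows the low-difference pixels, which is the behavior that breaks down when the two images differ strongly everywhere in the overlap.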

Gradient-domain methods
Gradient-domain methods are usually adopted to obtain smooth color transitions. They make use of known gradient information to produce the final composition by interpolation. The basic idea is to reconstruct the image from gradient fields with specified boundary conditions. Pérez et al. [11] proposed an effective image blending approach, Poisson image editing (PIE). By solving a Poisson equation with user-specified Dirichlet boundary conditions, this approach can blend colors seamlessly between input images. However, its effectiveness heavily depends on careful alignment of the structures of the input images along the user-drawn boundary. Agarwala et al. [2] combined graph cut optimization and gradient-domain composition to create photomontages. Ref. [12] proposed an $L_1$-based gradient-domain stitching method to eliminate visible seams in the overlap region by use of gradient fields. Jia et al. [13] proposed an easy blending method, drag-and-drop pasting (DDP), by optimizing boundary conditions for gradient-domain composition. Farbman et al. [14] introduced an alternative, mean-value coordinate based approach, to carry out seamless cloning via a weighted combination of values along the boundary. Bhat et al. [15] proposed a unified variational model, GradientShop, to perform a number of image and video editing tasks. It contains many filters and uniformly uses quadratic optimization, lowering the computation time. Li et al. [16] performed multiscale editing by applying a nonlinear filter bank to adjacent pixels at each level of a Gaussian pyramid, to eliminate visual artifacts. However, visual artifacts still occur when the trade-off parameter is too large or too small. In Ref. [17], the authors presented a gradient-based variational model for video editing, which addresses the problem of propagating gradient-domain information along the optical flow of the video. With this method, a user can edit a frame by modifying the texture of an object's surface and then propagate this edit throughout the video. Bie et al. [18] incorporated the users' intent in outlining the source patch, to tackle structure conflicts between the source image patch and the target image. Hua et al. [19] added an extra edge-aware constraint term in a general gradient-domain optimization framework, enforcing similar image filtering effects while preserving edges. In Ref. [20], Zhang et al. performed image copy-and-paste with optimized gradients, where they created a gradient transition map in the cloning area and then used an interpolation-based method to calculate the composition results from the reconstructed gradient map.
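To make the basic idea of reconstructing from gradients with Dirichlet boundary conditions concrete, the following sketch solves a one-dimensional analogue. It is only an illustration under simplifying assumptions: the signal is 1D, the guidance field is the source's own finite differences, and the tridiagonal system is solved with the Thomas algorithm. The function name `poisson_blend_1d` is hypothetical.

```python
from typing import List

def poisson_blend_1d(source: List[float], left: float, right: float) -> List[float]:
    """Reconstruct a 1D signal whose second differences match `source`,
    with both end values fixed (Dirichlet boundary conditions)."""
    n = len(source)
    m = n - 2  # number of unknown interior samples (requires n >= 3)
    # discrete Poisson equation: f[i-1] - 2 f[i] + f[i+1] = Laplacian of source
    rhs = [source[i - 1] - 2 * source[i] + source[i + 1] for i in range(1, n - 1)]
    rhs[0] -= left    # move the known boundary values to the right-hand side
    rhs[-1] -= right
    # Thomas algorithm for the tridiagonal system (sub = 1, diag = -2, super = 1)
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = 1.0 / -2.0
    d_prime[0] = rhs[0] / -2.0
    for i in range(1, m):
        denom = -2.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (rhs[i] - d_prime[i - 1]) / denom
    interior = [0.0] * m
    interior[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        interior[i] = d_prime[i] - c_prime[i] * interior[i + 1]
    return [left] + interior + [right]

# a source "bump" pasted between target boundary values 10 and 12
blended = poisson_blend_1d([0.0, 1.0, 4.0, 1.0, 0.0], 10.0, 12.0)
```

The result keeps the shape of the source bump while the boundary mismatch (10 on the left, 12 on the right) is spread smoothly across the interior. When the mismatch is the same constant at both ends, the output is simply the source shifted by that constant.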

Patch-based synthesis
Patch-based texture synthesis is usually used to address global texture consistency, and has been successfully applied to various editing tasks on still images, video, and stereo pairs. Darabi et al. [9] proposed a novel method, image melding (IM), which adopts patch-based synthesis to find suitable colors, and minimizes an energy function based on mixed $L_2$/$L_0$ norms for colors and gradients. This method uses an iterative search-and-voting blending scheme, and can produce a gradual transition between inconsistent source images without sacrificing texture sharpness. However, when there are too many edges or complex textures around the boundary, or limited texture sources for synthesis, the editing result can be unnatural. Ma and Xu [21] proposed an efficient manifold preserving edit propagation method which searches feature space using an adaptive neighborhood size, which reduces time and memory costs without reducing visual fidelity. Barnes et al. [4] proposed a fast patch-based optimization method, PatchTable, for efficient computational photography. In Ref. [22], Luo et al. extended the patch-based synthesis framework from 2D to 3D for stereoscopic image editing. They introduced a depth-dependent patch-pair similarity measure and a joint patch-pair search. Chen et al. [23] presented an interactive system called sketch2photo to compose a realistic picture from a simple freehand sketch annotated with text labels. Their system found several photographs in agreement with the sketch and text labels by searching the Internet and automatically selected suitable photographs to generate a high quality composition. In Ref. [24], Zhang et al. extended patch-based synthesis to plenoptic images captured by consumer-level lenslet-based devices for interactive light field editing. They represented the light field as a set of images captured from different viewpoints and performed patch-based image synthesis on all affected layers of the central view, and then propagated the edits to all other views. To address the heavy computational burden of gradient-domain operators, Ref. [25] proposed a patch-based synthesis method using a Laplacian pyramid to improve searching for correspondences with enhanced awareness of edge structures.

Definition
As illustrated in Fig. 1, seamless image editing requires combining a region of interest $\Omega_r$ (with external region boundary $\partial\Omega_r$) from a source image $f_s$ into the target image $f_t$. Generally, $\Omega_r$ may include one or more objects of interest $O_i$, $i = 1, \ldots, n$ (e.g., $n = 3$). We define a target of interest $\Omega_i$ (with external target boundary $\partial\Omega_i$) as a subregion surrounding the corresponding $O_i$ (with external object boundary $\partial O_i$), and note that $O_i \subseteq \Omega_i \subseteq \Omega_r$. Then the optimal editing region selection problem requires finding the optimal targets of interest $\{\Omega_i\}$, $i = 1, \ldots, n$, which minimize local structure mismatches. The task can be turned into one of optimizing each target boundary $\partial\Omega_i$ between $\partial O_i$ and $\partial\Omega_r$ for each object of interest $O_i$. Then the optimal editing region $\Omega$ is separated into two parts: objects of interest $O$ and backgrounds of interest $B$. The extra region $\Omega_r \setminus \Omega$ is neglected in image editing. The cut $C_i$ connects the boundaries between object of interest $O_i$ and $\Omega_r$, and the cut $C_{ij}$ connects the boundaries between $O_i$ and $O_j$.
Fig. 1 Optimal editing region selection. A region of interest from the source image may include one or more objects of interest. Optimal editing region selection requires finding optimal targets of interest which contain objects of interest, to minimize local structure mismatches. As optimal target boundaries should not intersect each other, we break the band connectivity with several shortest boundary cuts or connections, giving an effective algorithm for finding the shortest closed path.

Optimal boundary for single object
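Consider the minimization problem for gradient-domain composition as in Ref. [11], which seeks an image $f$ to approximate the target gradient field $v = \nabla f_s$ (from the source image) in a least-squares sense with the given user-specified Dirichlet boundary condition:
$$\min_f \iint_{\Omega} \|\nabla f - v\|^2 \, dx \, dy, \quad \text{with } f|_{\partial\Omega} = f_t|_{\partial\Omega} \tag{1}$$
Letting $f' = f - f_s$, Eq. (1) can be written as
$$\min_{f'} \iint_{\Omega} \|\nabla f'\|^2 \, dx \, dy, \quad \text{with } f'|_{\partial\Omega} = (f_t - f_s)|_{\partial\Omega} \tag{2}$$
Note that the variational energy $\iint_{\Omega} \|\nabla f'\|^2 \, dx \, dy$ in Eq. (2) will approach zero if and only if the boundary condition satisfies $f'|_{\partial\Omega} = k$ (where $k$ is a constant). This observation has been utilized in DDP [13] to define the optimal boundary condition when including a single object of interest.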

Optimal boundaries for multiple objects
In this paper, we extend the case to placing multiple objects within $\Omega_r$. Then, the resulting optimal boundaries can be computed by minimizing the following boundary energy function:
$$E(\{\partial\Omega_i\}, \{k_i\}) = \sum_{i=1}^{n} \sum_{p \in \partial\Omega_i} \|(f_t(p) - f_s(p)) - k_i\|^2, \quad \text{with } O_i \subseteq \Omega_i \subseteq \Omega_r \tag{3}$$
where the constant $k_i$ is a vector for a color image or a scalar for a grayscale image. Since the target boundaries do not depend on one another, each $\partial\Omega_i$ can be considered independently. Since $\partial\Omega_i$ may pass through any pixel in the region $\Omega_r \setminus O$, it is intractable to simultaneously estimate $\{\partial\Omega_i\}$ and $\{k_i\}$. To solve this problem, an iterative optimization algorithm is proposed, which alternates between estimating $\{k_i\}$ and $\{\partial\Omega_i\}$.
In Step 3, computing the optimal boundary for $O_i$ is equivalent to finding a shortest closed path in a graph $G_i$. The nodes in $G_i$ are pixels within the band $\Omega_r \setminus O$ while the edges represent 4-connectivity relationships between neighboring pixels. The cost of each node is defined as the color difference with respect to $k_i$. Following Ref. [13], we break the band connectivity with the shortest boundary cut $C_i$ connecting $\partial O_i$ and $\partial\Omega_r$, as shown by black solid lines in Fig. 1, and remove all edges crossing the cut from the corresponding graph $G_i$. In addition, to ensure each optimal boundary encloses only one object, we construct the boundary connection set $\{C_{ij}\}$ between objects of interest $\{O_i\}$, shown as black dotted lines in Fig. 1, and then apply 2D dynamic programming [26] to find the shortest closed path which connects the two sides of $C_i$ as well as passing through the associated boundary connections.
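The alternation between Step 2 (updating $k_i$) and Step 3 (updating the boundary) can be sketched for a single object as below. This is a simplified stand-in, not the method described above: the band is resampled on a hypothetical polar grid (`diff[a][r]` is the scalar difference $f_t - f_s$ at angle index `a` and radius index `r`), a boundary picks one radius per angle with steps of at most one, and the wrap-around between the last and first angle plays the role of the cut. The initialization with the outermost ring and stopping at the first non-decrease of the energy are also assumptions, as are all function names.

```python
from typing import List, Tuple

INF = float("inf")

def best_closed_path(cost: List[List[float]]) -> Tuple[float, List[int]]:
    """Cheapest closed path with one radius per angle and radius steps <= 1."""
    n_ang, n_rad = len(cost), len(cost[0])
    best_total, best_path = INF, []
    for start in range(n_rad):  # try every node on the cut as the start
        acc = [INF] * n_rad
        acc[start] = cost[0][start]
        back: List[List[int]] = []
        for a in range(1, n_ang):
            new, prev = [INF] * n_rad, [0] * n_rad
            for r in range(n_rad):
                j = min(range(max(0, r - 1), min(n_rad, r + 2)), key=lambda t: acc[t])
                new[r], prev[r] = acc[j] + cost[a][r], j
            acc = new
            back.append(prev)
        # closure: the last radius must be adjacent to the start radius
        end = min(range(max(0, start - 1), min(n_rad, start + 2)), key=lambda t: acc[t])
        if acc[end] < best_total:
            path = [end]
            for prev in reversed(back):
                path.append(prev[path[-1]])
            best_total, best_path = acc[end], path[::-1]
    return best_total, best_path

def optimize_boundary(diff: List[List[float]], max_iter: int = 20) -> Tuple[List[int], float, float]:
    n_ang, n_rad = len(diff), len(diff[0])
    path = [n_rad - 1] * n_ang  # assumed initialization: outermost ring
    k, energy = 0.0, INF
    for _ in range(max_iter):
        # Step 2: k is the mean difference along the current boundary
        k = sum(diff[a][path[a]] for a in range(n_ang)) / n_ang
        # Step 3: re-optimize the boundary for this k
        cost = [[(d - k) ** 2 for d in row] for row in diff]
        new_energy, path = best_closed_path(cost)
        # Step 4 (simplified): stop once the energy no longer decreases
        converged = new_energy >= energy
        energy = new_energy
        if converged:
            break
    return path, k, energy

# toy band: 4 angles x 3 radii; the middle ring has a constant difference of 3
band = [[7, 3, 5], [9, 3, 1], [7, 3, 5], [9, 3, 1]]
path, k, energy = optimize_boundary(band)
```

In the toy band the boundary moves from the outer ring, where the difference oscillates, to the middle ring, where it is constant, and the energy drops to zero.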

Obtaining objects of interest
To compute the optimal boundaries, we first need to obtain the objects of interest $\{O_i\}$. Level sets [27] can be used to perform automatic object segmentation. However, as the editing of objects of interest usually needs user interaction (e.g., to only select certain objects), we rely on obtaining objects of interest by interactive segmentation techniques. To do so, a user simply draws a box surrounding the object of interest, and then GrabCut [28] is applied to extract the object of interest. Note that even if the objects of interest are not obtained very accurately, our computation of optimal boundaries can still avoid structure inconsistency since the optimal boundaries usually pass through smooth regions rather than object edges.
Iterative optimization algorithm for the boundary energy in Eq. (3):
Step 2: Given each current $\partial\Omega_i$, compute the optimal $k_i$ by taking the derivative of Eq. (3) and setting it to zero:
$$k_i = \frac{1}{|\partial\Omega_i|} \sum_{p \in \partial\Omega_i} (f_t(p) - f_s(p)) \tag{4}$$
where $|\partial\Omega_i|$ is the length of the boundary $\partial\Omega_i$. So $k_i$ is the average color difference across the boundary $\partial\Omega_i$.
Step 3: Using the current $k_i$, the boundary $\partial\Omega_i$ is optimized using the shortest path algorithm.
Step 4: Repeat Steps 2 and 3 until the energy in Eq. (3) does not decrease in two successive iterations.

Mixed-domain composition
Optimal editing region selection can preserve local structural continuity well. However, if the backgrounds of the input images differ greatly in color and texture (e.g., pasting a source of interest from a smooth region into a target image with coarse textures), color bleeding or texture artifacts still may occur when reconstructing from a gradient field, even with well-optimized boundary conditions. To address this issue, we must enhance color and texture consistency between images. Therefore, we perform mixed-domain composition by solving the following minimization problem:
$$\min_f E_C(f) = \sum_{p \in \Omega} D_g(p) + \lambda \sum_{p \in B} D_c(p) \tag{5}$$
where the consistency energy $E_C$ provides a unified consistency metric for measuring gradients and colors. The gradient-domain energy is defined on all targets of interest to obtain smooth color transitions, using $D_g(p) = \|\nabla f(p) - v(p)\|^2$, while the color-domain energy is defined on backgrounds of interest to preserve global texture consistency. We adopt a patch-based metric for encoding texture consistency and define
$$D_c(p) = \|\mathbf{f}(p) - \mathbf{f}_t(p)\|^2$$
where $\mathbf{f}(p)$ and $\mathbf{f}_t(p)$ are the background pixel sets in $w \times w$ image patches $N_w(p)$ centered at $p$, i.e., the pixels in $N_w^B(p) = N_w(p) \cap B$. Unlike the IM method [9], this patch-based approach needs neither search nor voting. The trade-off parameter $\lambda$ is used to balance the influence of the two terms.
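As an illustration of the patch-based color term, the sketch below evaluates a per-pixel distance over the background pixels of a $w \times w$ patch. It assumes the distance is a plain sum of squared differences between $f$ and $f_t$ over $N_w(p) \cap B$, with no normalization; that choice, the grayscale images, and the function name `patch_color_distance` are assumptions made for the example.

```python
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]

def patch_color_distance(f: Grid, f_t: Grid, in_B: Mask, p: Tuple[int, int], w: int) -> float:
    """Sum of squared differences between f and f_t over the background
    pixels of the w x w patch centred at p (patch clipped at the image border)."""
    rows, cols = len(f), len(f[0])
    half = w // 2
    y0, x0 = p
    total = 0.0
    for y in range(max(0, y0 - half), min(rows, y0 + half + 1)):
        for x in range(max(0, x0 - half), min(cols, x0 + half + 1)):
            if in_B[y][x]:  # keep only pixels in N_w(p) intersected with B
                total += (f[y][x] - f_t[y][x]) ** 2
    return total

# 3x3 toy images: the centre pixel is an object pixel, the rest is background
f = [[2.0] * 3 for _ in range(3)]
f_t = [[1.0] * 3 for _ in range(3)]
in_B = [[(y, x) != (1, 1) for x in range(3)] for y in range(3)]
d_patch = patch_color_distance(f, f_t, in_B, (1, 1), 3)
d_pixel = patch_color_distance(f, f_t, in_B, (1, 1), 1)
```

With `w = 3` the eight background neighbors each contribute a squared difference of 1, while with `w = 1` the patch contains only the object pixel and contributes nothing. No patch search or voting is involved, since each patch is compared with the co-located patch in the target.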
Defining a binary mask $M$, the minimization problem may be written as
$$\min_f \sum_{p \in \Omega} \left( D_g(p) + \lambda M(p) D_c(p) \right)$$
where $M(p)$ is 1 if $p \in B$, and 0 otherwise. Based on the Euler-Lagrange equation, we obtain a condition, Eq. (7), which holds for all $p \in \Omega$. We make use of backward differences to discretely approximate Eq. (7), leading to a large sparse linear system, Eq. (8), in which $|A(p)|$ represents the number of available pixels in a 4-neighborhood $A(p)$ centered at $p$; it satisfies $|A(p)| \le 4$. $v_{pq}$ is the projection of $v((p + q)/2)$ on the edge $[p, q]$ in the direction of $\overrightarrow{pq}$, given by $v_{pq} = v((p + q)/2) \cdot \overrightarrow{pq}$. The computational complexity for constructing the sparse linear system is $O(|\Omega|)$; the space complexity for storing the coefficient matrix is also $O(|\Omega|)$. Many efficient solvers, such as that of Ref. [29], can be adopted for solving the sparse linear system. We set $w = 9$ and $\lambda = 1$ in this paper.
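A small numerical sketch may help to show how a color term changes the linear system. It assumes a simplified discrete equation, $(|A(p)| + \lambda M(p)) f(p) - \sum_{q \in A(p)} f(q) = \sum_{q \in A(p)} v_{pq} + \lambda M(p) f_t(p)$, in which the patch-based term is reduced to a per-pixel penalty and $v_{pq} = f_s(p) - f_s(q)$ follows the sign convention of Ref. [11] for $v = \nabla f_s$. The system is solved with plain Gauss-Seidel sweeps instead of an efficient sparse solver. The function name `mixed_domain_solve` and the toy images are hypothetical.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]

def mixed_domain_solve(f_s: Grid, f_t: Grid, in_omega: Mask, in_B: Mask,
                       lam: float, sweeps: int = 500) -> Grid:
    """Gauss-Seidel solution of the assumed mixed-domain system on a grayscale grid."""
    rows, cols = len(f_t), len(f_t[0])
    f = [row[:] for row in f_t]  # pixels outside Omega keep the target colours
    for _ in range(sweeps):
        for y in range(rows):
            for x in range(cols):
                if not in_omega[y][x]:
                    continue
                diag, rhs = 0.0, 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    qy, qx = y + dy, x + dx
                    if 0 <= qy < rows and 0 <= qx < cols:
                        diag += 1.0                     # contributes to |A(p)|
                        rhs += f[qy][qx]                # neighbour: unknown or boundary value
                        rhs += f_s[y][x] - f_s[qy][qx]  # v_pq for v = grad f_s
                if in_B[y][x]:                          # M(p) = 1: add the colour term
                    diag += lam
                    rhs += lam * f_t[y][x]
                f[y][x] = rhs / diag
    return f

# toy 5x5 example: Omega is the inner 3x3 block, the object O is its centre pixel
f_t = [[10.0] * 5 for _ in range(5)]
f_s = [[0.0] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        f_s[y][x] = 2.0
f_s[2][2] = 6.0
in_omega = [[1 <= y <= 3 and 1 <= x <= 3 for x in range(5)] for y in range(5)]
in_B = [[in_omega[y][x] and (y, x) != (2, 2) for x in range(5)] for y in range(5)]
pure_gradient = mixed_domain_solve(f_s, f_t, in_omega, in_B, lam=0.0)
mixed = mixed_domain_solve(f_s, f_t, in_omega, in_B, lam=1.0)
```

With `lam = 0.0` the background pixels inside the region settle at 12, i.e., the source background carried over with a constant offset. With `lam = 1.0` they are pulled toward the target value 10 (to about 11.25 at the corners), while the object pixel still stands out from its surroundings by the source contrast of 4.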

Object-aware gradient manipulation
Unlike traditional composition methods [11,15] which manipulate the guidance gradient field for the target region using a unified operator, OAIE manipulates the guidance gradient fields of $O$ and $B$ separately with independent operators, in an object-aware way.

Transformation operators
For objects of interest $O$, the gradient field can be manipulated by modifying $\nabla f_s$ using linear or nonlinear transformations:
$$v(p) = \tau(\nabla f_s(p)), \quad \forall p \in O \tag{9}$$
where the transformation operator $\tau$ is used to change the appearance of targets of interest (e.g., their texture, illumination, contrast, color, etc.) and can be chosen from the transformation set $\Lambda = \{\text{clone}, \text{illuminate}, \text{smooth}, \text{recolor}, \ldots\}$. Here, clone provides seamless cloning, illuminate provides local illumination change, smooth provides texture smoothing (e.g., cartoonization), recolor provides object recoloring, and so on. Note that different operators can be used for each object.

Mixture operators
For backgrounds of interest $B$, the gradient field is manipulated by modifying or mixing source and target gradient fields:
$$v(p) = \phi(\nabla f_s(p), \nabla f_t(p)), \quad \forall p \in B \tag{10}$$
where the mixture operator $\phi$ is chosen from the set $\Theta = \{\text{src}, \text{max}, \text{min}, \text{avg}, \ldots\}$, in which src stands for using the source gradient, and max/min/avg stand for picking the larger / smaller / average gradient from $\nabla f_s$ and $\nabla f_t$, etc.

Combination
Finally, the gradient fields from Eqs. (9) and (10) are combined to form the new guidance gradient field in editing region Ω. This new guidance gradient field is substituted into Eqs. (5)-(8).
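The object-aware construction of the guidance field can be sketched as follows: a transformation operator is applied on object pixels and a mixture operator on background pixels, and the two results are combined by a mask. The sketch assumes that "larger" and "smaller" gradients are compared by magnitude and that clone is the identity; the names `MIXTURES`, `clone`, and `guidance_field` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Vec = Tuple[float, float]
Field = List[List[Vec]]

def norm(g: Vec) -> float:
    return (g[0] ** 2 + g[1] ** 2) ** 0.5

# a few mixture operators acting on the source and target gradient of one pixel
MIXTURES: Dict[str, Callable[[Vec, Vec], Vec]] = {
    "src": lambda gs, gt: gs,
    "max": lambda gs, gt: gs if norm(gs) >= norm(gt) else gt,
    "min": lambda gs, gt: gs if norm(gs) <= norm(gt) else gt,
    "avg": lambda gs, gt: ((gs[0] + gt[0]) / 2, (gs[1] + gt[1]) / 2),
}

def clone(g: Vec) -> Vec:
    """Transformation operator for seamless cloning: keep the source gradient."""
    return g

def guidance_field(grad_s: Field, grad_t: Field, in_O: List[List[bool]],
                   tau: Callable[[Vec], Vec], phi_name: str) -> Field:
    """Apply tau on object pixels and the named mixture operator elsewhere."""
    phi = MIXTURES[phi_name]
    field: Field = []
    for y, row in enumerate(grad_s):
        out_row = []
        for x, gs in enumerate(row):
            if in_O[y][x]:
                out_row.append(tau(gs))                 # object pixel
            else:
                out_row.append(phi(gs, grad_t[y][x]))   # background pixel
        field.append(out_row)
    return field

# one object pixel followed by one background pixel
grad_s = [[(1.0, 0.0), (0.0, 1.0)]]
grad_t = [[(5.0, 0.0), (3.0, 4.0)]]
in_O = [[True, False]]
v_max = guidance_field(grad_s, grad_t, in_O, clone, "max")
v_avg = guidance_field(grad_s, grad_t, in_O, clone, "avg")
```

The object pixel keeps its source gradient regardless of the target, whereas the background pixel takes the stronger target gradient under `max` and the mean of the two under `avg`.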

Algorithm comparison
OAIE performs composition with gradient fields in an optimized target region Ω with boundary conditions on ∂Ω, as well as colors in B. By contrast, PIE reconstructs images from gradient fields in Ω r with non-optimized boundary conditions on ∂Ω r while DDP performs this process with gradient fields in Ω and optimized boundary conditions on {∂Ω i }. ETIC optimizes the boundary conditions by minimizing the curl of the gradients on ∂Ω r . IM carries out image composition with patch-based synthesis using the textures in Ω r \ Ω.

Seamless cloning
Seamless cloning aims to copy a region of interest from a source image and seamlessly insert it into a target image. In this process, the guidance gradient fields of the editing region Ω and backgrounds of interest B are manipulated with Eqs. (9) and (10) respectively. Figure 2 shows a seamless cloning example which combines a region of interest containing multiple objects with a target image. The result of direct cloning is shown in Fig. 2(c), with object boundaries and optimal boundaries marked in blue and red respectively. Results generated by the four methods PIE, DDP, ETIC, and IM all have artifacts in the background region, e.g., around the object boundaries. However, OAIE obtains a more natural result since it considers a unified constraint on both object appearance and background texture simultaneously. Figure 3 provides a further seamless cloning example which inserts a source region with a smooth background into a target image with rich textures. None of the other four methods preserve texture consistency well (leading to unnatural colors or textures), while OAIE obtains consistency of both local structures and global textures.

Selective modification
Selective modification aims to adjust local or global appearance (e.g., texture, color, illumination, etc.), so image cloning methods such as DDP, ETIC, and IM are not applicable. We compared our results from OAIE with those from the PIE method. During selective modification, optimal editing region selection is neglected, so $f_s = f_t$, while the gradient fields of $O$ and $B$ can be obtained using the same or different operators. Figure 4 demonstrates examples of global and local selective modification. Figure 4(a) demonstrates global recoloring of Fig. 3(h) with the recolor operator. This operator multiplies the RGB channels of the original image by different values respectively to form the source image, and then performs seamless composition. Figure 4(b) shows the result of using operator illuminate to reduce local specular reflections (shown in blue in Fig. 4(a)). The operator illuminate modifies the gradient field using $v = \alpha^{\beta} |\nabla f|^{-\beta} \nabla f$ where $\alpha = \beta = 0.2$, following Ref. [11]. We see that when either global or local operators are applied to the cloned composition, the resulting textures still remain natural, verifying the robustness of OAIE. Figure 5 presents another example of selective modification which changes the local color of a target of interest. In Fig. 5(a), the object and target boundaries are marked in blue and red respectively. The PIE result exhibits color bleeding artifacts around the object, while OAIE avoids this problem effectively. To explain why, let us consider the difference between the minimization problems solved by PIE and OAIE. Compared to PIE, the minimization problem for OAIE includes an extra texture consistency term defined using a patch-based approach. It has been demonstrated in Ref. [9] that patch-based synthesis can produce a gradual transition between images without sacrificing texture sharpness. This allows OAIE to produce a smooth transition across image edges without color bleeding and texture blurring.
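The illuminate operator used above can be sketched for a single gradient vector as follows. The sketch applies the stated formula with $\alpha = \beta = 0.2$ directly; the function name `illuminate` and the small threshold `eps`, which guards against division by zero for vanishing gradients, are assumptions.

```python
from typing import Tuple

Vec = Tuple[float, float]

def illuminate(grad: Vec, alpha: float = 0.2, beta: float = 0.2, eps: float = 1e-8) -> Vec:
    """Rescale a gradient by alpha**beta * |grad|**(-beta)."""
    gx, gy = grad
    mag = (gx * gx + gy * gy) ** 0.5
    if mag < eps:  # leave (near-)zero gradients at zero
        return (0.0, 0.0)
    scale = (alpha ** beta) * (mag ** -beta)
    return (gx * scale, gy * scale)

strong = illuminate((2.0, 0.0))    # a large gradient, e.g., at a specular highlight
weak = illuminate((0.02, 0.0))     # a small gradient in a flat region
neutral = illuminate((0.2, 0.0))   # magnitude equal to alpha
```

Gradients larger than `alpha` are compressed and smaller ones are amplified, while a gradient of magnitude `alpha` is left essentially unchanged, which is why the operator attenuates strong local illumination changes such as specular reflections.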

Seamless stitching
Seamless stitching aims to combine multiple source images into a panorama mosaic. Figure 6 illustrates an example of stitching two remote sensing images with a very large tone difference. Optimal seam methods will lead to obvious artifacts due to the color difference between the source images in the overlap region (shown in blue in Fig. 6(b)). Since consistency constraints are imposed for both color and texture, OAIE achieves a more globally natural result than PIE, as shown in Figs. 6(c) and 6(d).
In addition, OAIE can stitch multiple target parts, as shown in Fig. 7, where several face parts from different face images are stitched into an integrated face portrait. The upper row shows the target image and source parts marked in red in the source images, while the bottom row gives the results generated by cloning, PIE, DDP, and OAIE respectively. Various structure or color artifacts can be seen in the results generated from PIE and DDP. In contrast, the result from OAIE is more natural. A potential application of this editing tool is portrait synthesis for police work.

Limitations
Like other methods, our method has some limitations when the background structures between source and target images in the editing region collide. As shown in Fig. 8, the result generated by our method suffers from color bleeding around the background structures, while the results from PIE and DDP have serious structure artifacts.

Computational efficiency
Our method is simple and efficient. Firstly, the user selects and drags the region of interest from the source image to the target image. Then, simple interactive segmentation allows the user to indicate the objects of interest. This usually takes about 1-2 seconds of computation depending on the sizes of the objects. After specifying the type of editing required, the composition is performed automatically by using dynamic programming to find optimal boundaries and solving a large sparse linear system for mixed-domain composition. The computational complexity of dynamic programming is $O(n)$, while solving a large sparse linear system takes time $O(n)$ [29] where $n$ is the number of variables.

Patch-level similarity
In our mixed-domain composition, we adopt patch-based synthesis to preserve texture consistency, so texture similarity is measured at patch level rather than at pixel level. Generally, patch-based synthesis can preserve more texture cues, giving less smooth results than pixel-based synthesis. Figure 9 shows an example, where the results provide a close-up of part of Fig. 2, using different synthesis methods. The results demonstrate that patch-based synthesis provides richer texture information.

Editing quality estimation
Generally, subjective measures are used to evaluate editing quality. This is usually performed by visually inspecting the naturalness of the editing results in terms of structure, color, and texture. To verify the advantages of our method, we used blind image quality evaluation as proposed in Ref. [30], considering the image naturalness in the local editing regions. The experimental results showed that our method can give better quality.

Conclusions
In this paper, we have proposed a general approach called OAIE for seamless image editing. It jointly performs optimal editing region selection and mixed-domain composition, allowing it to cope with visual inconsistency in local structures and global textures simultaneously.
In particular, OAIE provides a unified mixed-domain consistency measure for gradients and colors, in which texture consistency is encoded with a patch-based approach which does not require search. Compared to four state-of-the-art methods, our unified approach is more powerful in preventing global color and texture inconsistencies, while preserving local structure continuity.

Fig. 7 Seamless stitching multiple target parts. Various structure or color artifacts (e.g., around the glasses and mouth) can be found in the results generated by the PIE and DDP methods, while the OAIE result is more natural.
Fig. 9 Composition using patch-based synthesis (above) and pixel-based synthesis (below). Patch-based synthesis provides richer texture cues.
He received his Ph.D. degree from the University of Science and Technology of China. His research mainly focuses on the Internet of things and intelligent information processing.
Open Access The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.