Polygonal finite element-based content-aware image warping

Mesh-based image warping techniques typically represent image deformation using linear functions on triangular meshes or bilinear functions on rectangular meshes. This enables simple and efficient implementation, but in turn, restricts the representation capability of the deformation, often leading to unsatisfactory warping results. We present a novel, flexible polygonal finite element (poly-FEM) method for content-aware image warping. Image deformation is represented by high-order poly-FEMs on a content-aware polygonal mesh with a cell distribution adapted to saliency information in the source image. This allows highly adaptive meshes and smoother warping with fewer degrees of freedom, thus significantly extending the flexibility and capability of the warping representation. Benefiting from the continuous formulation of image deformation, our poly-FEM warping method is able to compute the optimal image deformation by minimizing existing or even newly designed warping energies consisting of penalty terms for specific transformations. We demonstrate the versatility of the proposed poly-FEM warping method in representing different deformations and its superiority by comparing it to other existing state-of-the-art methods.


Introduction
Due to advances in imaging technology, the acquisition and display of digital images are almost universal. Various display devices are used to view images, such as phones, tablets, monitors, and televisions. Because screens vary in size, images must frequently be resized so that they fill the whole screen for optimal display. Images also need to be resized in other applications. For example, document display and printing require resizing embedded images to comply with a specified layout. Research into image resizing, also known as image retargeting, has drawn much attention in recent years, and several techniques have been proposed.
Image scaling is the most straightforward method to achieve the image resizing goal. However, scaling often does not produce satisfactory results, as it is oblivious to image content. Another simple method for image retargeting is cropping. Cropping inevitably causes information loss and leads to unpleasant results. To preserve relevant information, especially visually important structures and objects, a more sophisticated class of techniques attempts to resize images in a content-aware fashion. Existing content-aware image retargeting methods can be classified into two general categories: cropping methods and warping methods. In content-aware cropping methods, pixels or regions in an image are removed according to pre-specified criteria. They achieve results with better visual quality than naive cropping. However, important objects may be broken, and artifacts may be introduced as the pixel removal operation highly depends on object detection results, which are often inaccurate.
Warping methods, also referred to as continuous methods, are another popular type of image retargeting technique. Unlike cropping methods, which may discard important contents, warping methods retain important and unimportant contents. To obtain a resized image with important objects preserved, warping methods perform a nonlinear deformation that minimizes the distortion of important regions while allowing large distortions in unimportant regions. Mesh-based warping methods construct a mesh on the image domain and obtain the resized image by deforming the mesh. In most previous methods, the warping meshes used for driving the deformation are strictly triangular or quadrilateral. Non-uniform deformation of the image, which is supposed to be a continuous function, is typically represented by piecewise linear functions on warping meshes. However, only allowing a single cell shape in the warping mesh with a linear approximation of the associated functions is too restrictive. This limits the non-uniform image deformation to a relatively small function space, leading to unsatisfactory results. This paper introduces a novel continuous warping representation and proposes a fully automatic algorithm for content-aware image retargeting. The warp mapping is represented as a smooth function by high-order generalized barycentric coordinates defined over polygonal meshes. Our representation possesses superior properties, such as supporting highly adaptive meshes, high-order basis functions, and achieving continuity without enforcing additional constraints.
Image warping is driven by the deformation of these polygonal meshes determined by a specific energy function. Experimental results show that our algorithm for content-aware image warping achieves a better trade-off between warping quality and mesh size, and greater robustness, than other existing state-of-the-art methods. In summary, our main contributions are:
(1) A novel poly-FEM-based warping representation for content-aware image retargeting. Image warping is represented by continuously stitched functions with higher-order approximation defined over polygonal meshes on the image domain. This representation includes the piecewise linear representation as a special case, and so can achieve more satisfactory results.
(2) An efficient and fully automatic framework to warp images. Polygonal meshes for driving the deformation are generated with local density and shape adaptive to feature information. Different warping energy functions can be incorporated and tested easily and consistently to achieve various deformation results.
The remainder of this paper is organized as follows: Section 2 reviews related work. We propose a poly-FEM-based warping representation in Section 3. Section 4 presents the algorithm and implementation for our poly-FEM-based image warping method. Results and comparisons are presented in Section 5. Conclusions, limitations, and suggestions for future work are given in Section 6.

Related work
Image retargeting has been extensively studied in computer graphics. In this paper, we focus on content-aware retargeting techniques. From the vast body of literature in the field, we only review references closely related to this paper, and refer interested readers to Refs. [1,2] for more comprehensive surveys.

Content-aware cropping
Cropping-based methods discard pixels in unimportant regions and scale or shift the remaining pixels to resize the image. One class of content-aware cropping techniques searches for a cropping rectangle inside which the aggregated importance is maximized [3,4]. They perform well if the input image contains only one central important object. To deal with images with two or more scattered regions of interest, Setlur et al. [5] proposed a method that first removes the regions of interest and inpaints the holes to generate a background, and then places the cropped objects back. It heavily relies on accurate segmentation of the source image.
Seam carving-based methods form another class of cropping methods, which decrease the image width or height one pixel at a time by removing the seam with least importance and shifting the remaining pixels to compensate for the removed seam [6]. The original seam carving method introduces visible artifacts if the input image contains straight lines or geometric structures. This method is thus enhanced by including line detection to better preserve straight lines [7]. The multi-operator (Multiop) method uses several operators for resizing media, including cropping, seam carving, scaling, and warping [8]. Patch-based methods achieve retargeting by manipulating patches [9,10]. As they remove regions or strips, patch-based methods can be considered to generalize cropping-based methods. Generally speaking, cropping-based methods remove pixels from source images, which causes loss of information; hence artifacts can sometimes be observed in the results.

Content-aware warping
Warping methods scale the source image nonuniformly to preserve important regions. In general, regions with high importance are constrained to distort as little as possible, while unimportant regions are allowed to have relatively large deformation. The image is subdivided into a mesh, whose deformation drives the deformation of the source image. Typically, a triangular mesh [11,12] or a quadrilateral mesh [13][14][15][16][17][18] is used, and image deformation is naturally represented as piecewise linear or bilinear functions on mesh faces, respectively. For example, a piecewise bilinear warping may be computed by iteratively computing optimal local scaling factors for each cell of a quadrilateral mesh according to a significance map [14]. A piecewise linear warping on a triangular mesh can be constructed from an approximation to a prescribed Beltrami representation (BR) [12]. Instead of limiting ourselves to piecewise linear functions on triangular or quadrilateral meshes, we propose a more general representation of the continuous warping that supports high-order continuity and adaptive meshes.

Deep learning-based methods
Recently, attempts have been made to solve the image retargeting problem using deep learning techniques [19][20][21][22], which are extensions of the methods mentioned above. For example, a weakly- and self-supervised deep convolutional neural network (WSSDCNN) has been proposed for predicting attentive shift maps in Ref. [19]. Scaling on grid cells is used to represent image distortion in the deep cyclic image retargeting approach (Cycle-IR) [20]. Multi-operator retargeting is formulated as a Markov decision-making process and optimized by reinforcement learning in the semantics and aesthetics aware multi-operator image retargeting (SAMIR) framework [21]. The deep network resizing (DNR) method applies resizing operators, including seam carving and grid-warping, in feature space instead of pixel space [22].

FEM-based warping
In the computer graphics community, FEM has been applied to applications such as 2D/3D morphing [23,24] and geometric modeling [25][26][27]. Traditional FEM has also been applied to image warping, relying on strictly triangular or rectangular meshes. For example, Gee et al. [28] used simple linear elements in medical image warping for registration. Later, a discontinuous Galerkin FEM (DG FEM) with triangular or rectangular elements using power polynomials was applied to the content-aware image warping task [29]. Requirements on element types, to simplify or accelerate the involved computation, restrict the approximation capability of FEM.

High-order poly-FEM
Poly-FEMs offer several advantages over traditional finite elements in practical applications. Generalized barycentric coordinates (GBCs), such as Wachspress coordinates [30,31] and mean value coordinates (MVCs) [32], provide suitable bases for linear finite elements on general polygons as generalizations of linear barycentric FEM shape functions. Recently, extensions to higher-order approximations on polygonal elements have also been studied [33][34][35]. Higher-order poly-FEMs share attractive properties with piecewise linear poly-FEMs, such as partition of unity, nodal data interpolation, and smoothness. In addition, they provide higher-order reproduction properties.
High-order poly-FEM has been successfully applied to solving partial differential equations [33][34][35] and function approximation [36]. We are motivated by these successes to use the high-order poly-FEM method in the image warping problem. Our high-order polygonal element-based method has important advantages over prior mesh-based methods, such as allowing for highly adaptive meshes, smoother and more flexible representations of image deformation using many fewer degrees of freedom (DOFs), and the ability to incorporate general deformation energy functions into the framework.

Poly-FEM warping representation
To apply high-order poly-FEM to image warping, we first discretize the domain of the continuous warping map into smaller and simpler polygons. Then, poly-FEMs are used to approximate the unknown function over the discretized domain. In this section, we briefly discuss the poly-FEMs, which we propose as a basis for the discretized image warping representation. We defer discussing the polygonal mesh generation until Section 4.3.2. Note that the discretization operation using high-order elements is independent of the choice of shape functions for the aforementioned high-order GBCs. For simplicity, we describe the image warping representation using the quadratic serendipity elements (QSEs) proposed in Ref. [33] as an example.

Quadratic poly-FEMs
The QSEs in Ref. [33] are developed using GBCs, for instance, MVCs. We first introduce some notation; our notation differs slightly from that of Ref. [33], in an inessential way. Let $\Omega$ be a convex polygon in the plane with $n$ vertices ordered counter-clockwise $(v_1, \dots, v_n)$, with no more than three consecutive vertices collinear. Each vertex $v_i$ is associated with a GBC, denoted $\lambda_i(u, v)$. For simplicity, hereinafter we omit the variables in each function, e.g., $\lambda_i = \lambda_i(u, v)$; the indices $i, j$ range over $1, \dots, n$, and subscripts are to be taken modulo $n$.
The quadratic serendipity element basis functions $\psi_k$ associated with the vertices or edge midpoints $v_i$, $i = 1, \dots, 2n$, are defined as linear combinations of the pairwise products $\mu_{ij} = \lambda_i \lambda_j$. We refer readers to Ref. [33] for further details of the computation of the coefficients.
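As a rough illustration of the building blocks named above, the following sketch evaluates mean value coordinates $\lambda_i$ at a point strictly inside a convex polygon and forms the pairwise products $\mu_{ij} = \lambda_i \lambda_j$. The function names are hypothetical, and the sketch stops short of the QSE coefficients themselves, which are given in Ref. [33].

```python
import math

def mean_value_coordinates(poly, p):
    """Mean value coordinates of a point p strictly inside a convex
    polygon `poly` (list of (x, y) vertices in counter-clockwise order)."""
    n = len(poly)
    d = [(x - p[0], y - p[1]) for x, y in poly]      # vectors p -> v_i
    r = [math.hypot(dx, dy) for dx, dy in d]         # distances |v_i - p|
    # t[i] = tan(alpha_i / 2), alpha_i = angle at p between v_i and v_{i+1}
    t = []
    for i in range(n):
        j = (i + 1) % n
        cross = d[i][0] * d[j][1] - d[i][1] * d[j][0]
        dot = d[i][0] * d[j][0] + d[i][1] * d[j][1]
        t.append(math.tan(math.atan2(cross, dot) / 2.0))
    # t[i - 1] wraps around to the last angle when i == 0
    w = [(t[i - 1] + t[i]) / r[i] for i in range(n)]
    total = sum(w)
    return [wi / total for wi in w]

def pairwise_products(lam):
    """mu[i][j] = lam[i] * lam[j], the quadratic building blocks."""
    return [[a * b for b in lam] for a in lam]

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
lam = mean_value_coordinates(square, (0.3, 0.6))
mu = pairwise_products(lam)
```

Because the $\lambda_i$ sum to one and reproduce linear functions, the products $\mu_{ij}$ also sum to one, which is what makes them a convenient starting point for a quadratic basis.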
In particular, the QSE basis functions possess all the properties needed for admissible quadratic FEM basis functions:
• Partition of unity: $\sum_{k=1}^{2n} \psi_k = 1$.
• Smoothness: $\psi_k$ is smooth within the domain $\Omega$ and is discontinuous across the element boundary.
• Quadratic precision: $\psi_k$ for $k = 1, \dots, 2n$ can reproduce polynomials of up to degree two.
• Nodal interpolation: $\psi_k(v_j) = \delta_{kj}$ for $j, k = 1, \dots, 2n$.
The $n$-sided polygonal domain with the associated basis function set $\{\psi_k\}_{k=1}^{2n}$ forms a construction of QSE on $\Omega$ using GBCs. Figure 2 shows examples of MVC-based QSE basis functions from Ref. [33].

Consider the rectangular domain $I$ of the input image, on which the warp is a continuous map $f(u, v) = (f_1(u, v), f_2(u, v))$. Now we discretize this continuous model using QSEs in preparation for the optimal warp computation. To the best of our knowledge, this is the first time QSEs have been applied to an image warping representation.
Assume that the domain $I$ has been appropriately discretized into a polygonal mesh $\mathcal{M}$ with cells $\Omega_k$ for $k = 1, \dots, N$. We construct the QSE basis functions on each cell. Consider two adjacent cells $\Omega_{k_1}$ and $\Omega_{k_2}$ sharing a common edge $v_1 v_2$. The two basis functions $\psi_1^{k_1}$ and $\psi_1^{k_2}$ associated with the same vertex $v_1$ (or the same midpoint $v_{n+1}$) on $\Omega_{k_1}$ and $\Omega_{k_2}$, respectively, are discontinuous across the edge $v_1 v_2$. However, the two functions coincide on $v_1 v_2$, due to their nodal interpolation property and quadratic precision. To ensure automatic continuity across element boundaries, we collect the basis functions associated with a vertex (or an edge midpoint) and use their sum as a single basis function in the final warping representation. Specifically, for two basis functions $\psi_1^{k_1}$ and $\psi_1^{k_2}$ associated with a midpoint $v_i$ of the polygonal mesh, we replace $\psi_1^{k_1}$ and $\psi_1^{k_2}$ with a new basis function $B_i$, which is the sum of these two discontinuous basis functions:

$$B_i = \psi_1^{k_1} + \psi_1^{k_2}.$$

Basis functions associated with mesh vertices are treated in the same fashion. Figure 3 shows examples of the merged bases associated with a vertex and an edge midpoint. Consider a polygonal mesh with $M$ vertices and edge midpoints in total. We denote the merged basis function associated with a vertex or an edge midpoint by $B_i(u, v)$, $i = 1, \dots, M$. Then the warping map is represented as a linear combination of these basis functions, giving

$$f(u, v) = \sum_{i=1}^{M} c_i B_i(u, v), \tag{3}$$

where $c_i = (c_{i1}, c_{i2})$ are the position vectors of the vertices or edge midpoints $v_i$ after deformation. Section 4 describes an image warping framework tailored to compute the coefficients based on our new warping representation. Note that the warping map defined in Eq. (3) is naturally continuous along cell edges, benefiting from the basis consolidation mentioned above, whereas for DG FEM bases, additional constraints are needed to enforce continuity of the individually defined quadratic power polynomial functions along the cell edges, to achieve a continuous map [29].
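The basis consolidation described above amounts to giving every mesh vertex and every edge midpoint exactly one global index, so that cells sharing an edge also share the corresponding coefficients. The sketch below shows one way this bookkeeping could look; the function names and the data layout (cells as lists of vertex indices) are assumptions made for illustration only.

```python
def number_nodes(cells):
    """Assign one global index to each vertex and each edge midpoint.
    cells: list of cells, each a list of vertex indices in CCW order.
    Returns (node_of_vertex, node_of_edge, M)."""
    vertices = sorted({v for cell in cells for v in cell})
    node_of_vertex = {v: k for k, v in enumerate(vertices)}
    node_of_edge = {}
    for cell in cells:
        for a, b in zip(cell, cell[1:] + cell[:1]):
            key = (min(a, b), max(a, b))          # undirected edge
            if key not in node_of_edge:           # shared edges are numbered once
                node_of_edge[key] = len(vertices) + len(node_of_edge)
    return node_of_vertex, node_of_edge, len(vertices) + len(node_of_edge)

def cell_nodes(cell, node_of_vertex, node_of_edge):
    """Global indices of the 2n nodes of one cell: vertices, then midpoints."""
    verts = [node_of_vertex[v] for v in cell]
    mids = [node_of_edge[(min(a, b), max(a, b))]
            for a, b in zip(cell, cell[1:] + cell[:1])]
    return verts + mids

def evaluate_warp(coeffs, basis_values):
    """f = sum_i c_i * B_i at one point, given the values B_i there."""
    x = sum(c[0] * b for c, b in zip(coeffs, basis_values))
    y = sum(c[1] * b for c, b in zip(coeffs, basis_values))
    return (x, y)

# Two quadrilateral cells sharing the edge (1, 4).
cells = [[0, 1, 4, 3], [1, 2, 5, 4]]
node_of_vertex, node_of_edge, M = number_nodes(cells)
```

In this small example the two cells share three nodes (two vertices and one edge midpoint), so the mesh has 13 nodes rather than the 16 that independent per-cell bases would require.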

Algorithm
Content-aware image warping aims to preserve visually important image regions as much as possible while allowing the visually unimportant image regions to have relatively large distortions. In this section, we present our algorithm for content-aware image warping based on our warping representation.

Overview
Given an input image I, we aim to find a warping map as represented in Eq. (3) that preserves regions with high importance. Most mesh-based image warping techniques boil down to an optimization framework to obtain the warping mesh, differing by particular objective functions. They consist of two steps in general: importance map generation and image retargeting. Here we follow the same workflow. First, we generate a saliency map according to object-level semantic information. Next, a non-uniform polygonal mesh is constructed on the input image based on the importance map. Finally, we compute a warping map defined on the polygonal mesh by optimizing a distortion energy function to achieve image retargeting. The pipeline of our algorithm is shown in Fig. 4. Details of each step are given below.

Fig. 4 (caption, partially recovered): A density function is computed (c), and used to generate a content-aware polygonal mesh (d); cells incident to an edge shorter than 5% of the mean edge length are shown in red. This is optimized to remove short edges (e) and finally the warped mesh is computed by optimizing a distortion energy to give the warped image (f).

Saliency map generation
The estimation of visual attention (saliency) has been a fundamental problem in neurosciences, psychology, and computer vision for a long time. Various applications for saliency estimation include object detection and recognition, photo collage, and image compression. Saliency estimation is also the first step of content-aware image retargeting.
In the context of image retargeting, visual attention is the main driving factor determining what information in the image is perceived to be the most important to preserve. Visual attention representations include pixel-level features, such as contours, textural contrast or color features, and higher-level features, such as faces, people, and objects. The saliency map is estimated based on the identification and analysis of several different visual attention factors. Engelke et al. [37] investigated the impact of different visual attention representations on content-aware image retargeting. They suggested using object-level regions of interest (ROI) for image retargeting, which gave superior performance in their practical experiments. Accordingly, our saliency map generation is based on object-level semantic information. Here, we adopt the discriminative regional feature integration method [38], which provides a basic fit for our application. This method introduces a regional object-sensitive descriptor and an image-specific backgroundness descriptor to estimate saliency. The final saliency map is obtained by fusing saliency maps computed from the multi-level image segmentation to remedy possible inaccuracies due to unreliable segmentation: see Fig. 4(b) for an example saliency map.
It should be pointed out that all content-aware warping methods rely on saliency estimation results. More advanced saliency detection methods could be adopted to achieve better warping results without affecting our overall image warping pipeline; we leave this as a topic for future investigation.

Polygonal mesh generation
To define the warping map, we need to construct an appropriate polygonal mesh on the input image. Most existing methods for polygonal mesh generation either directly rely on Voronoi diagrams or indirectly exploit the duality of Voronoi diagrams and Delaunay tessellations and their properties. In this section, we adopt a direct approach to generating a suitable polygonal mesh. We first study the conditions needed to prevent fold-overs in the warped image. Then we apply the centroidal Voronoi tessellation (CVT) method according to image saliency. The mesh is further optimized to better suit our image warping purpose.

Conditions for foldover-free warping
In image warping, fold-overs should be avoided in the output images as they usually introduce undesirable artifacts. The problem of fold-overs is an essential manifestation of the lack of bijectivity of the warping map defined by both the input and warped polygonal meshes. In this section, we first analyze the conditions to guarantee bijectivity of warping in a single cell, following Ref. [39]. Then, we propose criteria for input polygonal mesh generation based on these conditions. We defer the computation of the warping map to Section 4.4.
For simplicity, we now consider the mapping $f = (f_1, f_2)$ in a single cell $\Omega_{i^*}$. Let

$$J(f) = \begin{pmatrix} \partial f_1 / \partial u & \partial f_1 / \partial v \\ \partial f_2 / \partial u & \partial f_2 / \partial v \end{pmatrix} \tag{4}$$

be the Jacobian of the mapping $f$. Then, $f$ is injective if its Jacobian determinant $\det(J(f))$ is strictly positive in the domain $\Omega_{i^*}$. The linear precision property of the quadratic basis functions implies that the image warping map restricted to each cell $\Omega_{i^*}$ can be represented as

$$f(u, v) = I(u, v) + \sum_{i} d_i \psi_i(u, v),$$

where $I(u, v)$ is the identity map, the sum runs over the vertices and edge midpoints of the cell, and $d_i = c_i - v_i$ is the displacement of the vertices or edge midpoints upon warping. Simple algebra reduces the Jacobian of $f$ to

$$J(f) = I + \sum_{i} d_i (\nabla \psi_i)^T, \tag{6}$$

where $I$ here denotes the $2 \times 2$ identity matrix and $d_i$ is taken as a column vector.

To simplify the notation for later discussion, we rewrite the basis functions, using a little algebra, in terms of the products $\mu_{i,j}$ with coefficients of the form $A_{i,j} = r_{i,j}\, c^{a,b}_{i,j}$, where $r_{i,j}$ are constants independent of the geometry of the polygons. Note that as the angles $\angle v_{i-1} v_i v_{i+1}$ approach $\pi$, the factor $s$ in the coefficients $c^{a,b}_{i,j}$ approaches $\infty$, and so do the coefficients $A_{i,j}$. Short edges of polygonal cells may also cause a blowup in the coefficients $A_{i,j}$ used to construct $\psi_i$ and in the gradient $\nabla \mu_{i,j}$ over the short edges. The problem of extremely large gradients over edges is independent of the generalized barycentric coordinates employed, since all quadratic bases are identical on the edges of polygonal meshes. Hence, the mapping on cells with large angles or short edges could fail to meet the sufficient conditions for bijectivity even if the displacements of vertices are very small. In other words, polygonal meshes free of short edges and large angles may allow larger displacements of vertices in defining a bijective warping map. Therefore, large angles and short edges should be avoided in the polygonal meshes. We here use the CVT method to generate large-angle-free polygonal meshes, and then use the mesh improvement scheme of Ref. [40] to further remove the short edges.
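The positivity condition on the Jacobian determinant can be probed numerically at sample points. The following is a minimal sketch using central finite differences; it is only a sampled check, not the sufficient condition analyzed in Ref. [39], and the function names and step size are assumptions.

```python
def jacobian_det(f, u, v, h=1e-5):
    """Approximate det(J(f)) at (u, v) by central differences.
    f maps (u, v) to a pair (f1, f2)."""
    f1_up, f2_up = f(u + h, v)
    f1_um, f2_um = f(u - h, v)
    f1_vp, f2_vp = f(u, v + h)
    f1_vm, f2_vm = f(u, v - h)
    a = (f1_up - f1_um) / (2 * h)   # d f1 / d u
    b = (f1_vp - f1_vm) / (2 * h)   # d f1 / d v
    c = (f2_up - f2_um) / (2 * h)   # d f2 / d u
    d = (f2_vp - f2_vm) / (2 * h)   # d f2 / d v
    return a * d - b * c

def all_dets_positive(f, samples, h=1e-5):
    """True if the approximate Jacobian determinant is positive at every sample."""
    return all(jacobian_det(f, u, v, h) > 0.0 for u, v in samples)

# Horizontal squeeze by one half: no fold-over.
squeeze = lambda u, v: (0.5 * u, v)
# A map that folds over along u = 0.
fold = lambda u, v: (u * u, v)

samples = [(-0.5, 0.0), (0.25, 0.5), (0.5, 1.0)]
ok_squeeze = all_dets_positive(squeeze, samples)
ok_fold = all_dets_positive(fold, samples)
```

A sampled check of this kind can only detect fold-overs at the chosen points, which is one reason the text prefers conditions on the mesh geometry itself.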

Adaptive mesh generation
The CVT is a special Voronoi tessellation whose seed points coincide with the centroids of the corresponding Voronoi cells. In particular, assume we have a Voronoi diagram with $n$ seed points $x_i$, for $i = 1, \dots, n$, in the image domain. Then the CVT satisfies

$$x_i = \frac{\int_{\Omega_i} x\, \rho(x)\, dx}{\int_{\Omega_i} \rho(x)\, dx}, \quad i = 1, \dots, n,$$

where $\Omega_i$ is the Voronoi cell of $x_i$ and $\rho(x)$ is a specified density function. A CVT can be computed by Lloyd's relaxation method, which iteratively moves each seed point $x_i$ to the centroid of the corresponding cell $\Omega_i$ [41]. When $\rho(x) = 1$, we obtain a polygonal mesh with uniformly distributed polygonal cells.

In the following, we obtain content-aware polygonal meshes by carefully choosing the density function. Note that regions with high saliency in an image undergo small deformation, while regions with low saliency undergo large deformation. We need more polygonal elements in areas with high saliency variance to produce smooth transitions between different deformations. Each vertex or edge midpoint of the warped mesh corresponds to a DOF in the warping representation. We can locally increase the resolution of meshes in the corresponding area to introduce more DOFs into the warping representation. We design the density function for the mesh cell distribution as follows:
(1) We first detect sharp edges in the saliency map using the Sobel edge detector [42]. The density at points on detected edges is set to 1.
(2) For any other point a distance $d$ away from the detected edges, the density is set to $1/d^3$.
Figure 4(c) visualizes this density function, which leads to the content-aware polygonal mesh in Fig. 4(d). Intuitively, more cells are placed in the transitions between the visually important and unimportant areas.
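The density rule and Lloyd's relaxation described above can be sketched on a pixel grid as follows. This is a simplified discrete version under stated assumptions: the edge mask is taken as given (the Sobel step is omitted), distances are measured in pixels by brute force, Voronoi cells are approximated by nearest-seed pixel assignment, and all function names are hypothetical.

```python
import math

def density_from_edges(edge_mask):
    """rho = 1 on detected edge pixels, 1 / d^3 at distance d from the
    nearest edge pixel. edge_mask: 2D list of 0/1 values."""
    h, w = len(edge_mask), len(edge_mask[0])
    edges = [(r, c) for r in range(h) for c in range(w) if edge_mask[r][c]]
    rho = [[1.0] * w for _ in range(h)]
    if not edges:                       # assumption: fall back to uniform density
        return rho
    for r in range(h):
        for c in range(w):
            if not edge_mask[r][c]:
                d = min(math.hypot(r - er, c - ec) for er, ec in edges)
                rho[r][c] = 1.0 / d ** 3
    return rho

def lloyd_step(seeds, rho):
    """One Lloyd iteration: assign each pixel to its nearest seed, then move
    each seed to the density-weighted centroid of its pixels."""
    h, w = len(rho), len(rho[0])
    acc = [[0.0, 0.0, 0.0] for _ in seeds]   # sum(m*r), sum(m*c), sum(m)
    for r in range(h):
        for c in range(w):
            k = min(range(len(seeds)),
                    key=lambda i: (seeds[i][0] - r) ** 2 + (seeds[i][1] - c) ** 2)
            m = rho[r][c]
            acc[k][0] += m * r
            acc[k][1] += m * c
            acc[k][2] += m
    # a seed that owns no pixels keeps its position
    return [(a[0] / a[2], a[1] / a[2]) if a[2] > 0 else s
            for a, s in zip(acc, seeds)]

mask = [[0, 0, 1, 0, 0] for _ in range(3)]   # one vertical edge in column 2
rho = density_from_edges(mask)
seeds = lloyd_step([(1.0, 0.0), (1.0, 4.0)], rho)
```

Repeating `lloyd_step` until the seeds stop moving pulls them toward the detected edges, which is the intended effect of the rapidly decaying $1/d^3$ density.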

Mesh optimization
Note that a Voronoi vertex is the circumcenter of a Delaunay triangle. To prevent circumcenters of Delaunay triangles from coming close to each other, we relocate the vertices of Delaunay triangles such that circumcenters of triangles are as interior as possible. In particular, we minimize the squared distances $d_t$ from the circumcenters to the incenters of the triangles $t$ of the Delaunay triangulation $T$ as Eq. (10) [40]:

$$E = \sum_{t \in T} d_t^2 = \sum_{t \in T} R_t (R_t - 2 r_t), \tag{10}$$

where $R_t$ and $r_t$ are the circumradius and inradius of triangle $t$, respectively. For a triangle $t$ with edge lengths $a$, $b$, $c$ and area $A$, $R_t$ and $r_t$ are given by $R_t = abc/(4A)$ and $r_t = 2A/(a + b + c)$, respectively.
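The per-triangle quantity in this energy is easy to evaluate from the radius formulas above together with Euler's triangle relation $d_t^2 = R_t (R_t - 2 r_t)$ between the circumcenter and incenter. The sketch below only evaluates the energy; the gradient descent step is not shown, and the function names are hypothetical.

```python
import math

def circum_in_radii(p, q, r):
    """Circumradius R = abc / (4A) and inradius r = 2A / (a + b + c)."""
    a = math.dist(q, r)
    b = math.dist(p, r)
    c = math.dist(p, q)
    area = abs((q[0] - p[0]) * (r[1] - p[1])
               - (q[1] - p[1]) * (r[0] - p[0])) / 2.0
    return a * b * c / (4.0 * area), 2.0 * area / (a + b + c)

def squared_center_distance(p, q, r):
    """Squared circumcenter-incenter distance via Euler's relation."""
    big_r, small_r = circum_in_radii(p, q, r)
    return big_r * (big_r - 2.0 * small_r)

def mesh_energy(points, triangles):
    """Sum of squared circumcenter-incenter distances over all triangles.
    triangles: list of index triples into points."""
    return sum(squared_center_distance(*(points[i] for i in t))
               for t in triangles)

# 3-4-5 right triangle: circumcenter (1.5, 2), incenter (1, 1).
d2 = squared_center_distance((0.0, 0.0), (3.0, 0.0), (0.0, 4.0))
```

The energy vanishes for equilateral triangles, where circumcenter and incenter coincide, so minimizing it pushes the Delaunay triangles toward well-shaped configurations.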
Here we use the gradient descent method to minimize the energy in Eq. (10). A polygonal mesh obtained by the CVT method may contain arbitrarily short edges: see Fig. 5(a).
Our experiments indicate that such short edges may lead to a non-bijective warping map: the resulting deformed pixel grid may locally fold over in regions around short edges: see Figs. 5(d) and 5(e) for the deformed mesh and a close up view of the warping map, where only the resulting deformed pixel grids are shown to better visualize the map deformation. Ill-shaped polygonal meshes also lead to undesired warping results: see Fig. 5(d) for an example. Our experiments indicate that warping maps based on optimized meshes, where all edges shorter than 5% of the mean edge length are removed, achieve bijectivity up to pixel accuracy. Figure 5(g) shows the optimized mesh, corresponding to a bijective warping map, with the deformed pixel grids in Figs. 5(j) and 5(k) and the superior warping result in Fig. 5(l).
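The 5% criterion used above can be checked directly on a polygonal mesh. The sketch below flags edges shorter than a given fraction of the mean edge length and the cells incident to them; the data layout and function names are assumptions, and the sketch only detects short edges rather than removing them.

```python
import math

def edge_lengths(vertices, cells):
    """Length of every undirected edge of the mesh, keyed by (min, max) index."""
    lengths = {}
    for cell in cells:
        for a, b in zip(cell, cell[1:] + cell[:1]):
            lengths[(min(a, b), max(a, b))] = math.dist(vertices[a], vertices[b])
    return lengths

def short_edges(vertices, cells, ratio=0.05):
    """Edges shorter than `ratio` times the mean edge length."""
    lengths = edge_lengths(vertices, cells)
    mean = sum(lengths.values()) / len(lengths)
    return sorted(e for e, length in lengths.items() if length < ratio * mean)

def cells_with_short_edges(vertices, cells, ratio=0.05):
    """Indices of cells incident to at least one short edge."""
    bad = set(short_edges(vertices, cells, ratio))
    return [k for k, cell in enumerate(cells)
            if any((min(a, b), max(a, b)) in bad
                   for a, b in zip(cell, cell[1:] + cell[:1]))]

# A pentagon that is almost a square: the edge (3, 4) has length 0.01.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.01, 1.0), (0.0, 1.0)]
cells = [[0, 1, 2, 3, 4]]
flagged = cells_with_short_edges(vertices, cells)
```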
It should be pointed out that other methods, such as edge collapse, can also remove short edges. However, this might no longer maintain the Voronoi properties and may introduce non-convex elements. The advantage of using a polygonal mesh is that it allows the mesh resolution to be highly adaptive to the content of the source image. We will show later that we can get similar results to previous FEM-based methods using many fewer DOFs, owing to the flexibility of polygonal meshes.

Deformation energy optimization
With the polygonal mesh in hand, we can now construct the polygonal elements on each cell. Once the basis functions on each cell are defined, the warping representation in Eq. (3) is completely determined by the coefficients $c_i$. We propose a deformation energy function $E(f)$ to quantify the performance of a specific warping map $f(u, v)$. The optimal warping map is then determined by minimizing the energy function while considering boundary constraints and other additional constraints, e.g., to preserve lines. For simplicity, we define the deformation energy of a warping map $f(u, v)$ as

$$E(f) = \iint s(u, v)\, \| J(f) - I \|_F^2 \, du\, dv, \tag{11}$$

where the integral is taken over the image domain, $s(u, v)$ is the saliency value at point $(u, v)$ given in Section 4.2, $J(f)$ is the Jacobian of the warping $f$ defined in Eq. (4), $I$ is the $2 \times 2$ identity matrix, and $\| \cdot \|_F$ denotes the Frobenius matrix norm. Note that $J(f)$ locally equals $I$ if $f(u, v)$ does not distort the image at all. Intuitively, the energy function in Eq. (11) allows translations and penalizes all other transformations.

When computing the optimal warping, we enforce boundary conditions to ensure that each boundary vertex of the polygonal mesh remains on the boundary after warping. In particular, assume that we resize the image from $m \times n$ to $m' \times n'$; the boundary conditions are

$$f_1(0, v) = 0, \quad f_1(m, v) = m', \quad f_2(u, 0) = 0, \quad f_2(u, n) = n'. \tag{12}$$

The optimal warp $f^*(u, v)$ is the minimizer of the energy function in Eq. (11),

$$f^* = \arg\min_f E(f),$$

under the conditions (12).

Due to our continuous representation of the warping map, we can integrate other particular terms into our objective function to adjust the penalty for different transformations, such as rotation, uniform scaling, and inversion. For example, the deformation energy defined in Ref. [43] (Eq. (14)) penalizes all transformations other than translation and uniform scaling. In Refs. [43] and [44], a rotation-invariant energy (Eq. (15)) and a distortion energy permitting translation, similarity transform, and rotation (Eq. (16)) are defined in terms of $C = J^T J$. An improvement to the energy in Eq. (16), denoted $\tilde{E}$, is also designed in Ref. [29]; it increases the penalty to infinity as the horizontal scaling factor goes towards zero.
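To make the saliency-weighted energy concrete, the sketch below evaluates a discretized version at a set of sample points. It assumes the integrand is the saliency times the squared Frobenius norm of $J(f) - I$ (the square is an assumption consistent with the description above), and that each sample carries an area weight; the function names and the sample layout are hypothetical.

```python
def frobenius_sq_to_identity(jac):
    """Squared Frobenius norm of (J - I) for a 2x2 Jacobian ((a, b), (c, d))."""
    (a, b), (c, d) = jac
    return (a - 1.0) ** 2 + b ** 2 + c ** 2 + (d - 1.0) ** 2

def discrete_energy(samples):
    """Approximate the energy by a weighted sum over samples.
    samples: iterable of (saliency, jacobian, area_weight) triples."""
    return sum(s * w * frobenius_sq_to_identity(jac) for s, jac, w in samples)

identity = ((1.0, 0.0), (0.0, 1.0))        # pure translation: J = I
squeeze = ((0.5, 0.0), (0.0, 1.0))         # horizontal scaling by one half

# A salient region left undistorted and a non-salient region squeezed.
cheap = discrete_energy([(1.0, identity, 1.0), (0.0, squeeze, 1.0)])
# The opposite assignment distorts the salient region and is penalized.
costly = discrete_energy([(1.0, squeeze, 1.0), (0.0, identity, 1.0)])
```

With these inputs the first configuration has zero energy while the second does not, which mirrors the intended behavior: distortion is pushed into regions of low saliency.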

Constrained energy optimization
The energy functions in Eqs. (11), (15), and (16) allow for efficient minimization in a single Newton step, while others result in more complex optimization problems. On the other hand, the minimizers of the energies mentioned above may also introduce inversions, creating visible artifacts in the warping results. To optimize all the energies consistently, we adopt the iterative L-BFGS method, taking into account inversion prevention at each iteration. Imposition of the sufficient condition (8) on the deformed mesh during energy optimization ensures a bijective map result. However, it may also reduce the flexibility for optimization and lead to unsatisfactory warping results: see Figs. 5(b), 5(c), 5(h), and 5(i). Thus, we relax the condition imposed in the optimization. Note also that if the warped polygonal mesh has fold-overs, then the mapping is not bijective; an overlap-free warped polygonal mesh is therefore a necessary condition for a bijective mapping. Our algorithm accordingly attempts to satisfy this necessary condition while maintaining bijectivity. In particular, we check whether the polygonal mesh overlaps after each L-BFGS iteration. If a vertex or midpoint locally introduces an overlap, its position is rolled back to the previous iteration. In detail, we maintain a list of overlapped cells after each L-BFGS iteration. Then we repeatedly remove a cell from the list, and roll back the positions of all the vertices and midpoints associated with the cell to the previous iteration. This rolling back may cause adjacent cells to overlap, and if so they are added to the list. When the list is empty, we stop.
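The roll-back procedure described above is essentially a worklist algorithm. The following sketch illustrates it under simplifying assumptions: a cell is treated as overlapped when its signed area is non-positive (a crude stand-in for a real overlap test), only vertices are tracked (midpoints would be handled identically), and the previous iterate is assumed to be overlap-free. All names are hypothetical.

```python
def signed_area(cell, pos):
    """Signed area of a polygon (positive for counter-clockwise order)."""
    total = 0.0
    for a, b in zip(cell, cell[1:] + cell[:1]):
        total += pos[a][0] * pos[b][1] - pos[b][0] * pos[a][1]
    return total / 2.0

def roll_back_overlaps(cells, prev, cur, is_bad=None):
    """Roll nodes of overlapped cells back to their previous positions.
    cells: list of node index lists; prev, cur: lists of (x, y) positions."""
    if is_bad is None:
        # assumption: a flipped or degenerate cell counts as overlapped
        is_bad = lambda cell, pos: signed_area(cell, pos) <= 0.0
    pos = list(cur)
    cells_of_node = {}
    for k, cell in enumerate(cells):
        for v in cell:
            cells_of_node.setdefault(v, set()).add(k)
    worklist = [k for k, cell in enumerate(cells) if is_bad(cell, pos)]
    while worklist:
        k = worklist.pop()
        for v in cells[k]:                       # roll back every node of the cell
            pos[v] = prev[v]
        neighbours = {j for v in cells[k] for j in cells_of_node[v]} - {k}
        for j in neighbours:                     # rolling back may break neighbours
            if j not in worklist and is_bad(cells[j], pos):
                worklist.append(j)
    return pos

cells = [[0, 1, 2], [1, 3, 2]]
prev = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Node 0 moves slightly (harmless); node 3 jumps across the shared edge.
cur = [(0.1, 0.1), (1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
fixed = roll_back_overlaps(cells, prev, cur)
```

In this example only the flipped cell is reverted, so the harmless update of node 0 survives. The loop terminates because a cell whose nodes are all back at their previous positions can no longer be flagged.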
The L-BFGS method involves integrating poly-FEM basis function derivatives. We evaluate these integrals by triangulating each polygonal cell and applying a quadrature rule to the resulting triangles. Based on short-edge-free polygonal meshes, this iterative updating and rolling back optimization strategy for energy minimization did not introduce inversions in any of the examples considered in this paper. Figures 4(e) and 4(f) show an example of the deformed polygonal mesh and the final warped image. Figure 6 also shows deformations resulting from optimizing different energy functions using the proposed warping representation.
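The integration strategy mentioned above (triangulate each cell, then apply a quadrature rule per triangle) can be sketched as follows. The choice of a fan triangulation and of the three-point edge-midpoint rule is an assumption made for brevity; that rule is exact only for polynomials up to degree two, so integrands built from GBC-based basis functions would likely call for a higher-order rule in practice.

```python
def integrate_over_polygon(g, poly):
    """Integrate g(x, y) over a convex polygon (list of (x, y) vertices)
    using a fan triangulation from the first vertex and the three-point
    edge-midpoint rule on each triangle."""
    total = 0.0
    p0 = poly[0]
    for p1, p2 in zip(poly[1:-1], poly[2:]):
        area = abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                   - (p1[1] - p0[1]) * (p2[0] - p0[0])) / 2.0
        mids = [((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
                for a, b in ((p0, p1), (p1, p2), (p2, p0))]
        # equal weights of 1/3 at the three edge midpoints
        total += area * sum(g(x, y) for x, y in mids) / 3.0
    return total

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
area = integrate_over_polygon(lambda x, y: 1.0, square)
second_moment = integrate_over_polygon(lambda x, y: x * x, square)
```

On the unit square this reproduces the area and the second moment of $x$ up to rounding, which is a quick sanity check that the triangulation and weights are consistent.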

Comparison to the FEM-based method
High-order polygonal elements allow smoother warping while using fewer DOFs. Compared with the linear poly-FEM result in Fig. 7(c), the quadratic poly-FEM achieves a much smoother deformation. Unnatural distortion in the result of the linear poly-FEM is highlighted in a red rectangle in Fig. 7(c). Figures 7(g)-7(i) show that we can get similar results with even fewer DOFs or cells.
Amongst other warping methods, the most closely related one to ours is the work of Ref. [29], where the warping map is defined using a DG FEM using power polynomials on a triangular or quadrilateral mesh. As shown in Fig. 8(b), traditional FEM-based warping methods using regular quadrilateral meshes require more DOFs, and a very dense mesh, to obtain satisfactory warping results. On the other hand, warping based on triangular or polygonal meshes achieves a similar result to the structured mesh-based method with far fewer degrees of freedom, as unstructured meshes can provide local refinement and improve the utilization of degrees of freedom: see Figs. 8(c) and 8(d).
The warping map of the DG FEM-based method, with individually defined power polynomials on each cell, is discontinuous between elements. Additional DOFs are necessary in the warping function optimization to restore the coupling between elements. If the discontinuities between elements are too large, function values on edges cannot be safely averaged, leading to artifacts. To achieve a visually pleasing result, one must solve the warping optimization problem on denser meshes. We can observe that the DG FEM-based method has to use a large number of cells to introduce enough DOFs to achieve visually satisfying results: see Fig. 8(c). By contrast, our method generates warps that are naturally continuous across elements without specifying any additional constraints. Hence, we can achieve smooth warps with any number of elements: see Fig. 8(d).

Comparison to interactive methods
Automated saliency map generation techniques cannot yet reach the quality of manual methods. To achieve more satisfactory results or avoid failures of automatic algorithms, many works allow user intervention to specify saliency maps. For example, the BR method requires manual specification of important regions [12]. We compare our proposed poly-FEM warping method to the BR method in Fig. 9. Our automatic method generates results comparable to those of the BR method. The DNR method [22] requires manual choice of an appropriate threshold to switch from seam-carving to warping in its multi-operator scheme to achieve good results. Seam removal may introduce artifacts in the final output, especially in less important regions: see Fig. 9(c). By contrast, our method achieves visually more satisfactory results.

Comparison to other methods
We first compare the results of our poly-FEM method to classic and state-of-the-art retargeting methods on all examples from the most popular benchmark, RetargetMe (with 80 images in total). Due to lack of space, we only show one set of results from all the classic methods in Fig. 10 for subjective judgement. We can observe that our approach better avoids unexpected information loss and preserves salient content in this example. To avoid time-consuming and laborious manual quality assessment of the retargeting results for the remaining test images in RetargetMe, we evaluated retargeting quality using the General Regression Neural Network-based Objective Quality Assessment (GRNN-OQA) in Ref. [47].
Values from GRNN-OQA are in [0, 1], where a higher value indicates a better quality of the retargeted image. The GRNN-OQA method computes scores which aim to preserve the ranking of subjective scores for the same-source results and to provide a reference to compare different-source results. Since each existing retargeting method exhibits its own advantages and limitations, no single method works better than the others for all the test images. For each retargeting method, we compute the average GRNN-OQA score over RetargetMe. Note that the GRNN-OQA method provides a trained model for ranking resized images with scaling factors of 0.75 and 0.5. Hence, we only report the average scores when resizing images in this way. The number of times each method ranked highest or lowest among the retargeting results of the same source image is also reported: see Table 1. Overall, our method is more robust and better than other retargeting methods, in the sense that it obtains the highest average scores, the most highest-scoring results, and the fewest lowest-scoring results.
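The three statistics described above (average score, number of highest-ranked results, and number of lowest-ranked results per method) can be computed as in the following sketch. The function name `summarize_scores`, the method labels, and the score values are hypothetical placeholders chosen purely for illustration; they are not the scores reported in Table 1. Ties are counted for every tied method, which is an assumption.

```python
def summarize_scores(scores):
    """Aggregate per-image quality scores for several methods.

    scores: dict mapping method name -> list of scores, one per source image,
            with all lists in the same image order.
    Returns a dict: method -> {"mean": float, "best": int, "worst": int}.
    """
    methods = list(scores)
    n_images = len(scores[methods[0]])
    summary = {
        m: {"mean": sum(scores[m]) / n_images, "best": 0, "worst": 0}
        for m in methods
    }
    for i in range(n_images):
        # Scores of all methods for the same source image.
        column = {m: scores[m][i] for m in methods}
        highest = max(column.values())
        lowest = min(column.values())
        for m in methods:
            if column[m] == highest:
                summary[m]["best"] += 1
            if column[m] == lowest:
                summary[m]["worst"] += 1
    return summary


# Placeholder scores for three hypothetical methods on three images.
toy_scores = {
    "method_A": [0.9, 0.5, 0.7],
    "method_B": [0.6, 0.8, 0.4],
    "method_C": [0.3, 0.6, 0.5],
}
toy_summary = summarize_scores(toy_scores)
```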
In Ref. [45], a retargetability score in the range [0, 1] is computed to measure the retargeting difficulty of an input image, where a low value indicates a high level of difficulty. Images with retargetability scores in (0, 0.75] are suggested for retargeting method assessment. To further evaluate the effectiveness of our proposed method, we conducted experiments on four images with low or moderate retargetability scores: see Fig. 11(a). We illustrate results for a scaling factor of 0.5 using two selected classic methods and the three deep learning-based methods, for subjective judgement. We observe that, due to rich content or geometric structures, the retargeting results present severe artifacts for the CR, SM, WSSCDNN, and Cycle-IR methods: see Figs. 11(b)-11(e). Note that the SAMIR method generates an optimal multi-operator sequence, which needs to scale the image to half the original size as preprocessing. This simple re-scaling of the image leads to blurred results: see Fig. 11(f). Compared to these methods, our method better preserves salient content and introduces fewer unnatural artifacts in images with rich content: see Fig. 11(g).

Speed
All our results in this paper used a small number of elements (50-200) and DOFs, and the whole computation can be done within 0.5 s. For example, the time to warp an image of size 640 × 480 to 50% of its original width using 200 elements is 0.417 s; the most time-consuming step is saliency map generation, taking more than 50% of the total time. Our method is slightly more time-consuming than Cycle-IR (0.203 s) and faster than WSSCDNN (0.982 s), SAMIR (25.23 s CPU time), and DNR (60-100 s [22]) when resizing an image of this size.

Discussions and conclusions
We have introduced a poly-FEM-based image warping representation and provided a framework for content-aware image warping. The proposed poly-FEM warping method considerably improves the representation power of the warping map, and thus achieves more satisfactory results than other existing methods while using a small number of DOFs.
Despite the generally promising results shown in the paper, the proposed poly-FEM warping method suffers from two major limitations. First, it relies on saliency detection results, like other content-aware image retargeting techniques. Hence, one direction for future work is to investigate more sophisticated and efficient saliency detection methods suitable for content-aware image warping. Second, the proposed method may fail to preserve geometric structures, such as lines, especially when line structures occur in low-saliency regions and the distortion distribution is strongly uneven: see the pen in Fig. 12(c). Note, however, that line structures in low-saliency regions can still be well preserved in some cases, e.g., the road in Figs. 7(f) and 7(i) and Fig. 9(e), perhaps because the optimal quadratic poly-FEM-based deformation functions are close to linear in the corresponding regions. One possible solution to preserve such line structures is to mark them as salient, resulting in reduced shape deformation in the warping. As shown at the bottom of Fig. 12(b), we manually assign higher saliency values to line structures by painting the lines in brighter colors. These line structures are well preserved in the final result: see Fig. 12(d). We would like to extend the proposed poly-FEM warping framework to include automatic line detection and line structure preservation.
In this paper, optimal warping is computed on a polygonal mesh with a pre-specified number of cells. We may decrease the number of cells without obviously sacrificing deformation quality: see Figs. 4(e), 8(d), 7(d), and 7(g) for examples. Hence, we would like to design an automatic method for determining an appropriate number of cells to better balance the mesh size (and so running time) and the warp quality. Benefiting from the Lagrange interpolation property and linear precision property of the high-order basis functions used, the warping map is C^∞ smooth in the interiors of cells and is C^0 stitched along cell boundaries. It would also be possible to include barycentric coordinates with the Hermite interpolation property in our framework, e.g., cubic MVC [48], to get a warping map with higher-order continuity along cell boundaries. In addition, our current framework focuses on image warping. We plan to extend the proposed method to video retargeting by including motion features in the importance map and ensuring consistency of the warping grid.