A survey of the state-of-the-art in patch-based synthesis

This paper surveys the state-of-the-art of research in patch-based synthesis. Patch-based methods synthesize output images by copying small regions from exemplar imagery. This line of research originated from an area called “texture synthesis”, which focused on creating regular or semi-regular textures from small exemplars. However, more recently, much research has focused on synthesis of larger and more diverse imagery, such as photos, photo collections, videos, and light fields. Additionally, recent research has focused on customizing the synthesis process for particular problem domains, such as synthesizing artistic or decorative brushes, synthesis of rich materials, and synthesis for 3D fabrication. This report investigates recent papers that follow these themes, with a particular emphasis on papers published since 2009, when the last survey in this area was published. This survey can serve as a tutorial for readers who are not yet familiar with these topics, as well as provide comparisons between these papers, and highlight some open problems in this area.


Introduction
Due to the widespread adoption of digital photography and social media, digital images have enormous richness and variety. Photographers frequently have personal photo collections of thousands of images, and cameras can be used to easily capture high-definition video, stereo images, range images, and high-resolution material samples. This deluge of image data has spurred research into algorithms that automatically remix or modify existing imagery based on high-level user goals.
Fig. 1 An illustration of texture synthesis. A texture synthesis algorithm is given as input a small exemplar consisting of a regular, semi-regular, or stochastic "texture" image. The algorithm then synthesizes a large, seamless output texture based on the input exemplar. Reproduced with permission from Wei et al. [104], © The Eurographics Association 2009.
One successful research thread for manipulating imagery based on user goals is patch-based synthesis. In patch-based synthesis, a user provides one or more exemplar images to an algorithm, which then automatically synthesizes new output images by mixing and matching small, compact regions, called patches or neighborhoods, from the exemplar images. Patches are frequently of fixed size, e.g., 8×8 squares.
The area of patch-based synthesis traces its intellectual origins to an area called "texture synthesis," which focused on creating regular or semi-regular textures from small examples. See Figure 1 for an example of texture synthesis. A comprehensive survey of texture synthesis methods up to the year 2009 is available [104]. Since then, research has focused increasingly on synthesis of larger and more diverse imagery, such as photos, photo collections, videos, and light fields. Additionally, recent research has focused on customizing the synthesis process for particular problem domains, such as synthesizing artistic or decorative brushes, synthesis of rich materials, and synthesis for 3D fabrication.
Fig. 2 The image quilting method [36]. At left is an input exemplar. At right, in (a), (b), and (c), output imagery is synthesized. In (a) and (b) are shown what we call the matching stage, where patches are selected according to different criteria: either random sampling (a) or based on a patch similarity term with previously selected patches (b). In (c) is shown what we call the blending stage, which composites and blends together overlapping patches. In this method, the "blending" is actually done based on a minimal error boundary cut. Reproduced with permission from Efros and Freeman [36], © Association for Computing Machinery 2001.
In this survey, we cover recent papers that follow these themes, with a particular emphasis on papers published since 2009. This survey also provides a gentle introduction to the state-of-the-art in this area, so that readers unfamiliar with this area can learn about these topics. We additionally provide comparisons between state-of-the-art papers and highlight open problems.

Overview
This survey paper is structured as follows. In Section 3, we provide a gentle introduction to patch-based synthesis by reviewing how patch-based synthesis methods work. This introduction discusses the two main stages of most patch-based synthesis methods: matching, which finds suitable patches to copy from exemplars, and blending, which composites and blends patches together on the output image. Because the matching stage tends to be inefficient, Section 4 goes into greater depth on accelerations of the patch matching stage. The remainder of the paper investigates different applications of patch-based synthesis. In Sections 5-7, we discuss applications to image inpainting, synthesis of whole images, image collections, and video. In Section 8, we investigate synthesis algorithms that are tailored towards specialized problem domains, including mimicking artistic style, synthesis of artistic brushes and decorative patterns, synthesis for 3D and 3D fabrication, fluid synthesis, and synthesis of rich materials. Finally, we wrap up with a discussion and a possible area for future work in Section 9.

Introduction to patch-based synthesis
There are two main approaches for example-based synthesis: pixel-based methods, which synthesize by copying one pixel at a time from an exemplar to an output image, and patch-based methods, which synthesize by copying entire patches from the exemplar. Pixel-based methods are discussed in detail by Wei et al. [104]. Patch-based methods have been in widespread use recently, so we focus on them.
We now review how patch-based synthesis methods work. Figure 2 illustrates the patch-based method of image quilting [36]. Most patch-based synthesis methods have two main stages: matching and blending.
The matching stage locates suitable patches to copy from exemplars by establishing a correspondence between locations in the output image being synthesized and the input exemplar image.
In image quilting [36], this is done by laying down patches in raster scan order, and then selecting, out of a number of candidate patches, the one that has the best agreement with already placed patches. This matching process is straightforward for small textures but becomes more challenging for large photographs or photo collections, so we discuss different matching algorithms in more depth in Section 4.
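To make the greedy matching step concrete, here is a minimal Python sketch of raster-order quilting, assuming grayscale images stored as nested lists of floats. The names `quilt` and `overlap_error` and the parameters `block`, `overlap`, and `n_candidates` are hypothetical. The sketch scores a random sample of candidate blocks by the sum of squared differences in the overlap with already placed pixels and keeps the best one. It simplifies [36]: the original chooses randomly among candidates within a tolerance of the best match and cuts a minimal error boundary through the overlap, whereas this sketch simply overwrites the overlap.

```python
import random

def overlap_error(out, tile, y, x, block, overlap):
    """SSD between a candidate tile and the already-placed output pixels it overlaps."""
    err = 0.0
    for dy in range(block):
        for dx in range(block):
            in_left = x > 0 and dx < overlap   # overlap with the block to the left
            in_top = y > 0 and dy < overlap    # overlap with the block above
            if in_left or in_top:
                d = out[y + dy][x + dx] - tile[dy][dx]
                err += d * d
    return err

def quilt(exemplar, out_h, out_w, block=4, overlap=1, n_candidates=20, seed=0):
    """Greedy raster-order quilting of a grayscale exemplar (list of rows)."""
    rng = random.Random(seed)
    eh, ew = len(exemplar), len(exemplar[0])
    step = block - overlap
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(0, out_h - block + 1, step):
        for x in range(0, out_w - block + 1, step):
            best_tile, best_err = None, float("inf")
            # Matching: score randomly sampled exemplar blocks against the overlap.
            for _ in range(n_candidates):
                ey = rng.randrange(eh - block + 1)
                ex = rng.randrange(ew - block + 1)
                tile = [row[ex:ex + block] for row in exemplar[ey:ey + block]]
                err = overlap_error(out, tile, y, x, block, overlap)
                if err < best_err:
                    best_tile, best_err = tile, err
            # "Blending": paste the winner, overwriting the overlap (no seam cut).
            for dy in range(block):
                out[y + dy][x:x + block] = best_tile[dy]
    return out
```

The output size should be `block + k * (block - overlap)` for some integer k so that the blocks tile it exactly. Increasing `n_candidates` trades speed for better agreement between neighboring blocks.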
Subsequently, the blending stage composites and blends patches together on the output image. See Figure 3 for an illustration of the different patch blending methods discussed here. For sparse patches that have a relatively small overlap region, blending can be done by simply compositing irregularly-shaped patches [90], using a blending operation in the overlap region [70], or using dynamic programming or graph cuts to find optimal seams [36,65] (see Figure 2(c) and Figure 3(a-c)). In other papers, dense patches are defined such that there is one patch centered at every pixel. In this case, many patches simultaneously overlap, and thus the blending operation is typically done as a weighted average of many different candidate "votes" for colors in the overlap region [7,96,106] (see Figure 3(d)).
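The dynamic-programming seam mentioned above can be sketched as follows for the vertical overlap between two horizontally adjacent blocks, in the spirit of the minimal error boundary cut of [36]. This is an illustrative Python sketch, assuming grayscale overlap strips given as equal-sized nested lists; the function names are hypothetical. Graph-cut seams [65] generalize the same idea to arbitrary overlap shapes.

```python
def min_error_boundary_cut(left, right):
    """Vertical seam through the overlap of two blocks, found by dynamic programming.

    `left` and `right` are the overlapping strips (same H x W) from the already
    placed block and the new block. Returns one column index per row; columns
    before that index are taken from `left`, the rest from `right`.
    """
    h, w = len(left), len(left[0])
    err = [[(left[i][j] - right[i][j]) ** 2 for j in range(w)] for i in range(h)]
    # cost[i][j]: minimal cumulative error of a seam from the top row ending at (i, j).
    cost = [row[:] for row in err]
    for i in range(1, h):
        for j in range(w):
            cost[i][j] += min(cost[i - 1][max(j - 1, 0):min(j + 2, w)])
    # Backtrack from the cheapest bottom cell, moving at most one column per row.
    seam = [0] * h
    seam[-1] = min(range(w), key=lambda j: cost[-1][j])
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        seam[i] = min(range(max(j - 1, 0), min(j + 2, w)), key=lambda c: cost[i][c])
    return seam

def stitch(left, right, seam):
    """Composite the overlap along the seam."""
    return [l[:s] + r[s:] for l, r, s in zip(left, right, seam)]
```

For example, if the two strips agree only in their middle column, the seam runs straight down that column and the composite shows no visible discontinuity there.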
Fig. 3 Blending methods for patch-based synthesis. On the left are shown three methods for blending sparse patches, which are defined as patches having a relatively small overlap region: (a) irregularly-shaped patches can be composited [90]; (b) overlapping patches can be blended in the overlap region [70]; or (c) optimal seams can be found to make a hard "cut" between patches. On the right (d) is shown a method for blending densely overlapping patches, where a patch is defined around every pixel: an average of all overlapping colors is computed by a voting process [96,106].
Synthesis can further be divided into greedy algorithms that synthesize pixels or patches only once, and iterative optimization algorithms that use multiple passes to repeatedly improve the texture in the output image. The image quilting [36] method shown in Figure 2 is a greedy algorithm because it simply places patches in raster order in a single pass. A limitation of greedy algorithms is that if a mistake is made in synthesis, the algorithm cannot later recover from the mistake. In contrast, dense patch synthesis algorithms [7,59,96,106] typically repeat the matching and blending stages as an optimization.
In the matching stage, patches within the current estimate for the output image are matched against the exemplar to establish potential improvements to the texture. Next, in the blending stage, these texture patches are copied and blended together by the "voting" process to give an improved estimate for the output image at the next iteration. Iterative optimization methods also typically work in a coarse-to-fine manner, by running the optimization on an image pyramid [18]: the optimization is repeated until convergence at a coarse resolution, and then this is repeated at successively finer resolutions until the target image resolution is reached.
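The voting step of the blending stage can be sketched as below. It assumes grayscale images as nested lists, an NNF that stores for each output patch the top-left corner of its matched exemplar patch, and uniform vote weights (some methods weight the votes, e.g., by match quality). The function name `vote` is hypothetical. An iterative method would alternate this step with re-matching, at each level of a coarse-to-fine pyramid.

```python
def vote(nnf, exemplar, out_h, out_w, p):
    """Blend dense p x p patches by averaging every exemplar color that covers a pixel.

    nnf[y][x] = (ey, ex) is the exemplar patch (top-left corner) matched to the
    output patch whose top-left corner is (y, x).
    """
    acc = [[0.0] * out_w for _ in range(out_h)]
    cnt = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h - p + 1):
        for x in range(out_w - p + 1):
            ey, ex = nnf[y][x]
            # Each output patch casts one vote for every pixel it covers.
            for dy in range(p):
                for dx in range(p):
                    acc[y + dy][x + dx] += exemplar[ey + dy][ex + dx]
                    cnt[y + dy][x + dx] += 1
    return [[acc[i][j] / cnt[i][j] for j in range(out_w)] for i in range(out_h)]
```

If the NNF is the identity on a copy of the exemplar, all votes agree and the exemplar is reproduced exactly; where matches disagree, the average blurs them, which is why the subsequent matching pass is needed.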

Matching algorithms
In this section, we discuss algorithms for finding good matches between the output image being synthesized and the input exemplar image.
As explained in the previous section, patch-based synthesis algorithms have two main stages: matching and blending. The matching stage locates the best patches to copy from the exemplar image to the output image that is being synthesized.
Matching is generally done by minimizing a distance term between a partially or completely synthesized patch in the output image and a same-shaped region in the exemplar image: a search is used to find the exemplar patch that minimizes this distance. We call the distance a patch distance or neighborhood distance. For example, in the image quilting [36] example of Figure 2, this distance is defined by measuring the L2 norm between corresponding pixel colors in the overlap region between blocks B1 and B2.
For iterative optimization methods [96,106], the patch distance is frequently defined as an L2 norm between corresponding pixel colors of a p×p square patch in the output image and the same-sized region in the exemplar image. More generally, the patch distance could potentially operate on any image features (e.g., SIFT features computed densely at each pixel [73]), use any function to measure the error of the match (including functions not satisfying the triangle inequality), and search over two or more degrees of freedom for the correspondence between the output patch and the exemplar patch (for example, in addition to the (x, y) translational coordinates of an exemplar patch, a rotation angle θ and a scale s could also be searched over).
Typically, matching then proceeds by finding nearest neighbors from the synthesized output S to the exemplar E according to the patch distance. In the terminology of Barnes et al. [8], for the case of two (XY) translational degrees of freedom, we can define a Nearest Neighbor Field (NNF) as a function f : S → ℝ² of offsets, defined over all possible patch coordinates (locations of patch centers) in image S, mapping to the center coordinate of the corresponding most similar patch in the exemplar.
Although search in the matching stage can be done in a brute-force manner by exhaustively sampling the parameter space, this tends to be so inefficient as to be impractical for all but the smallest exemplar images. Thus, much research has been devoted to more efficient matching algorithms. Generally, research has focused on approximation algorithms for the matching, because exact algorithms remain slower [107], and the human visual system is not sensitive to small errors in color.
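As a reference point for the approximate methods below, an exact brute-force NNF can be sketched in a few lines of Python. It assumes grayscale images as nested lists, the L2 (sum of squared differences) patch distance, and patches indexed by their top-left corner rather than their center, which only shifts all coordinates by a constant. It stores absolute exemplar coordinates rather than offsets; subtracting (y, x) from each entry gives the offset form. The function names are hypothetical.

```python
def patch_distance(a, ay, ax, b, by, bx, p):
    """Sum of squared differences between p x p patches given by top-left corners."""
    d = 0.0
    for dy in range(p):
        ra, rb = a[ay + dy], b[by + dy]
        for dx in range(p):
            diff = ra[ax + dx] - rb[bx + dx]
            d += diff * diff
    return d

def brute_force_nnf(out, exemplar, p):
    """Exact NNF by exhaustive search over every exemplar patch position."""
    eh, ew = len(exemplar) - p + 1, len(exemplar[0]) - p + 1
    nnf = []
    for y in range(len(out) - p + 1):
        row = []
        for x in range(len(out[0]) - p + 1):
            best = min(((ey, ex) for ey in range(eh) for ex in range(ew)),
                       key=lambda c: patch_distance(out, y, x, exemplar, c[0], c[1], p))
            row.append(best)
        nnf.append(row)
    return nnf
```

The cost is proportional to (number of output patches) × (number of exemplar patches) × p², which is why exhaustive search is impractical beyond tiny images.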
We first discuss different approximate matching algorithms for patches, and then discuss how these can be generalized to correspondence finding algorithms in computer vision. Figure 4 illustrates the key components of a number of approximate matching algorithms. We now discuss five matching algorithms.
Fig. 4 The key components of different approximate matching algorithms. The goal of each matching algorithm is to establish correspondences between patches in the output image and patches in the exemplar. Each algorithm proposes one or more candidate matches, out of which is selected the match with minimal patch distance. See the body of Section 4 for a discussion of each algorithm.
Matching using coherence. One simple technique to find good correspondences is to take advantage of coherence [3]. Typically, the nearest neighbor field (NNF) used for matching is initialized in some manner, such as by random sampling. However, random correspondences are quite poor, that is, they have high approximation error. An illustration of coherence is shown in Figure 4(a). Suppose that during synthesis, we are examining a current patch in the output image (shown as a solid red square) and it has a poor correspondence in the exemplar (shown as a dotted red square, with the red arrow indicating the correspondence), that is, the correspondence has high patch distance. In this case, a better correspondence might be obtained from an adjacent patch such as the blue one to the left. However, the correspondence from the adjacent left patch (in blue) must be shifted to the right to obtain an appropriate correspondence for the current patch. The new (shifted) candidate correspondence is shown in green. The patch distance is evaluated for this new candidate correspondence, compared with the existing patch distance for the red correspondence, and whichever correspondence has lower patch distance is written back to the NNF. We refer to this process of choosing the best correspondence as improving a correspondence. Matching using coherence is also known as propagation, because it has the effect of propagating good correspondences across the image. Typically, one might propagate from a larger set of adjacent patches than just the left one: for example, if one is synthesizing in raster order, one might propagate from the left patch, the above patch, and the above-left patch.
Matching using k-coherence. An effective technique for finding correspondences in small, repetitive textures is k-coherence [101]. This combines the previous coherence technique with precomputed k-nearest neighbors within the exemplar.
The k-coherence method is illustrated in Figure 4(b). A precomputation is run on the exemplar, which determines for each patch in the exemplar (illustrated by a green square) the k most similar patches located elsewhere in the exemplar (illustrated in green, for k = 2). Now suppose that during synthesis, similarly as before, we are examining a current patch, which has a poor correspondence, shown in red. We first apply the coherence rule: we look up the adjacent left patch's correspondence within the exemplar (shown in blue), and shift the exemplar patch to the right by one pixel to obtain the coherence candidate (shown in green). The coherence candidate as well as its k nearest neighbor candidates (all shown as green squares) are all considered as candidates to improve the current patch's correspondence. This process can be repeated for other coherence candidates, such as for the above patch.
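The following Python sketch shows the two parts of k-coherence: a brute-force precomputation of each exemplar patch's k most similar other patches, and candidate generation from the shifted left and above matches plus their precomputed neighbors. All names are hypothetical; images are assumed to be grayscale nested lists with top-left patch coordinates and the SSD distance. The brute-force precomputation is quadratic in the number of exemplar patches, which is tolerable here only because k-coherence targets small exemplars.

```python
def ssd(a, ay, ax, b, by, bx, p):
    """Sum of squared differences between two p x p patches (top-left corners)."""
    return sum((a[ay + i][ax + j] - b[by + i][bx + j]) ** 2
               for i in range(p) for j in range(p))

def precompute_knn(exemplar, p, k):
    """For every exemplar patch, its k most similar *other* exemplar patches."""
    h, w = len(exemplar) - p + 1, len(exemplar[0]) - p + 1
    coords = [(y, x) for y in range(h) for x in range(w)]
    knn = {}
    for c in coords:
        others = [o for o in coords if o != c]
        others.sort(key=lambda o: ssd(exemplar, c[0], c[1], exemplar, o[0], o[1], p))
        knn[c] = others[:k]
    return knn

def k_coherence_candidates(nnf, knn, y, x, exemplar_shape, p):
    """Coherence candidates from the left/above neighbors plus their k neighbors."""
    max_y, max_x = exemplar_shape[0] - p, exemplar_shape[1] - p
    cands = []
    for ny, nx, sy, sx in ((y, x - 1, 0, 1), (y - 1, x, 1, 0)):
        if ny < 0 or nx < 0:
            continue
        coh = (nnf[ny][nx][0] + sy, nnf[ny][nx][1] + sx)
        if coh[0] > max_y or coh[1] > max_x:
            continue
        cands.append(coh)        # the shifted coherence candidate itself
        cands.extend(knn[coh])   # plus its precomputed nearest neighbors
    return cands

def improve(out, exemplar, nnf, knn, y, x, p):
    """Keep whichever of the current match and the candidates has lowest distance."""
    shape = (len(exemplar), len(exemplar[0]))
    cands = [nnf[y][x]] + k_coherence_candidates(nnf, knn, y, x, shape, p)
    nnf[y][x] = min(cands, key=lambda c: ssd(out, y, x, exemplar, c[0], c[1], p))
    return nnf[y][x]
```

The precomputed neighbor lists let a single coherence lookup jump to other places in the exemplar that look alike, which is what makes the method effective on repetitive textures.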
Matching using PatchMatch. When real-world photographs are used, k-coherence alone is insufficient to find good patch correspondences, because it assumes that the image is a relatively small and repetitive texture. PatchMatch [6,7,8] allows for better global correspondences within real-world photographs. It augments the previous coherence (or propagation) stage with a random search process, which can search for good correspondences across the entire exemplar image, but places most samples in local neighborhoods surrounding the current correspondence. Specifically, the candidate correspondences are sampled uniformly within sampling windows that are centered on the best current correspondence. As each patch is visited, the sampling window initially has the same size as the exemplar image, but then contracts in width and height by powers of two until it reaches 1 pixel in size. This random search process is shown in Figure 4(c). Unlike methods such as locality-sensitive hashing (LSH) or kd-trees, PatchMatch takes less memory, and it is more flexible: it can use an arbitrary patch distance function. The Generalized PatchMatch algorithm [8] can utilize arbitrary degrees of freedom, such as matching over rotations and scales, and can find approximate k-nearest neighbors instead of only a single nearest neighbor. For the specific case of matching patches using 2D translational degrees of freedom only under the L2 norm, PatchMatch is more efficient than kd-trees when they are used naively [7]; however, it is less efficient than state-of-the-art techniques that combine LSH or kd-trees with coherence [46,63,87]. Recently, graph-based matching algorithms that operate across image collections have also been developed based on PatchMatch: we discuss these in Section 6.
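PatchMatch's random search step can be sketched as follows for 2D translations under the L2 distance. This is a hedged Python sketch, not the reference implementation of [7,8]: images are assumed to be grayscale nested lists, patches are indexed by top-left corner, and the window radius starts at the exemplar size and is halved (the hypothetical parameter `alpha = 0.5`) until it drops below one pixel. In full PatchMatch this step is interleaved with the propagation step, typically alternating the scan direction between iterations.

```python
import random

def ssd(a, ay, ax, b, by, bx, p):
    """Sum of squared differences between two p x p patches (top-left corners)."""
    return sum((a[ay + i][ax + j] - b[by + i][bx + j]) ** 2
               for i in range(p) for j in range(p))

def random_search(out, exemplar, y, x, best, p, rng, alpha=0.5):
    """Try to improve the match `best` of output patch (y, x) by random search."""
    max_y, max_x = len(exemplar) - p, len(exemplar[0]) - p
    best_d = ssd(out, y, x, exemplar, best[0], best[1], p)
    # The first window covers the whole exemplar; later ones shrink geometrically.
    radius = max(max_y, max_x) + 1
    while radius >= 1:
        # Sample uniformly in the window centered on the current best match,
        # clipped to valid exemplar patch positions.
        cy = rng.randint(max(best[0] - radius, 0), min(best[0] + radius, max_y))
        cx = rng.randint(max(best[1] - radius, 0), min(best[1] + radius, max_x))
        d = ssd(out, y, x, exemplar, cy, cx, p)
        if d < best_d:
            best, best_d = (cy, cx), d
        radius = int(radius * alpha)
    return best
```

Only O(log n) samples are drawn per patch and visit, but most land near the current match, so good matches are refined locally while the first, large windows still allow occasional long jumps.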
Matching using locality-sensitive hashing. Locality-sensitive hashing [29] (LSH) is a dimension reduction and quantization method that maps from a high-dimensional feature space down to a lower-dimensional quantized space that is suitable to use as a "bucket" in a multidimensional hash table. For example, in the case of patch synthesis, the feature space for a p×p square patch in RGB color space could be ℝ^(3p²), because we could stack the RGB colors from each of the p² pixels in the patch into a large vector. The hash bucket space could be some lower-dimensional space such as ℕ⁶. The "locality" property of LSH is that similar features map to the same hash table bucket with high probability. This allows one to store and retrieve similar patches by a simple hash-function lookup. One example of a locality-sensitive hashing function is a projection onto a random hyperplane followed by quantization [29]. Two recent works on matching patches using LSH are coherency-sensitive hashing (CSH) [63] and PatchTable [9]. These two works have a similar hashing process and both use coherence to accelerate the search. Here we discuss the hashing process of PatchTable, because it computes matches only from the output image to the exemplar, whereas CSH computes matches from one image to the other and vice versa.
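A minimal Python sketch of the random-projection hashing described above follows: each of `n_proj` hash functions projects the stacked patch vector onto a random Gaussian direction, adds a random offset, and quantizes by a bucket `width`; the tuple of quantized values is the bucket key. This is a generic LSH scheme in the spirit of [29], not the specific hash functions of CSH [63] or PatchTable [9]. The function names and parameters are hypothetical, images are assumed grayscale, and the table stores every exemplar location per bucket rather than a single one.

```python
import math
import random

def make_lsh(dim, n_proj, width, seed=0):
    """Random-projection LSH: h(v) = floor((a . v + b) / width) for n_proj random a."""
    rng = random.Random(seed)
    projs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(n_proj)]
    def h(v):
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / width)
                     for a, b in projs)
    return h

def patch_vector(img, y, x, p):
    """Stack the p x p patch at top-left (y, x) into one flat feature vector."""
    return [img[y + i][x + j] for i in range(p) for j in range(p)]

def build_table(exemplar, p, h):
    """Insert every exemplar patch location into a hash table keyed by its bucket."""
    table = {}
    for y in range(len(exemplar) - p + 1):
        for x in range(len(exemplar[0]) - p + 1):
            table.setdefault(h(patch_vector(exemplar, y, x, p)), []).append((y, x))
    return table

def query(table, img, y, x, p, h):
    """Exemplar locations that share the query patch's bucket (empty if none)."""
    return table.get(h(patch_vector(img, y, x, p)), [])
```

Identical patches always land in the same bucket, and similar patches usually do; a larger `width` or fewer projections make collisions more likely at the cost of larger buckets to search.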
The patch search process for PatchTable [9] is illustrated in Figure 4(d). First, in a precomputation stage, a multidimensional hash table is created, which maps a hash bucket to an exemplar patch location. The exemplar patches are then inserted into the hash table. Second, as shown in Figure 4(d), during patch synthesis, for each output patch, we use LSH to map the patch to a hash table cell, which stores the location of an exemplar patch. Thus, for each output patch, we can look up a similar exemplar patch. In practice, this hash lookup operation is done only on a sparse grid for efficiency, and coherence is used to fill in the gaps.
Fig. 5 Results from PatchMatch filter [77]. The algorithm accepts a stereo image pair as input, and estimates stereo disparity maps. The resulting images show a rendering from a new 3D viewpoint. Reproduced with permission from [77], © 2013.
Matching using tree-based techniques. Tree search techniques have long been used in patch synthesis [46,70,87,96,105]. The basic idea is to first reduce the dimensionality of patches using a technique such as principal components analysis (PCA) [35], and then insert the reduced dimensionality feature vectors into a tree data structure that adaptively divides up the patch appearance space. The matching process is illustrated in Figure 4(e). Here a kd-tree is shown as a data structure that indexes the patches. A kd-tree is an adaptive space-partitioning data structure that divides up space by axis-aligned hyperplanes to conform to the density of the points that were inserted into the tree [85]. Such kd-tree methods are the state-of-the-art for efficient patch searches for 2D translational matching with an L2 norm patch distance function [46,87].
We now review the state-of-the-art technique of TreeCANN [87]. TreeCANN works by first inserting all exemplar patches that lie along a sparse grid into a kd-tree. The sparse grid is used to improve the efficiency of the algorithm, because kd-tree operations are not highly efficient, and are memory intensive. Next, during synthesis, output patches that lie along a sparse grid are searched against the kd-tree. Locations between sparse grid pixels are filled using coherence.
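The tree-based lookup can be illustrated with a small kd-tree in Python that splits on coordinate axes in turn at the median and answers exact nearest-neighbor queries with branch pruning. It is a sketch under simplifying assumptions: patches are flattened grayscale vectors without the PCA reduction, every exemplar patch is inserted rather than a sparse grid, no coherence is used, and the search is exact rather than approximate, so it does not reproduce TreeCANN [87]. All names are hypothetical.

```python
def patch_points(img, p):
    """(flattened p x p patch, top-left corner) pairs for every patch in img."""
    return [([img[y + i][x + j] for i in range(p) for j in range(p)], (y, x))
            for y in range(len(img) - p + 1) for x in range(len(img[0]) - p + 1)]

def build_kdtree(points, depth=0):
    """Build a kd-tree over (vector, label) pairs, splitting at the median."""
    if not points:
        return None
    axis = depth % len(points[0][0])   # cycle through the coordinate axes
    points = sorted(points, key=lambda pt: pt[0][axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, q, best=None):
    """Exact nearest-neighbor search; returns (squared distance, label)."""
    if node is None:
        return best
    vec, label = node["point"]
    d2 = sum((a - b) ** 2 for a, b in zip(vec, q))
    if best is None or d2 < best[0]:
        best = (d2, label)
    diff = q[node["axis"]] - vec[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, q, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, q, best)
    return best
```

The pruning test is what makes kd-trees fast in low dimensions; in high dimensions few branches can be pruned, which is one reason practical systems first reduce patch dimensionality with PCA.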
Correspondences in computer vision. The PatchMatch [8] algorithm permits arbitrary patch distances with arbitrary degrees of freedom. Many papers in computer vision have therefore adapted PatchMatch to handle challenging correspondence problems such as stereo matching and optical flow. We review a few representative papers here. Bleyer et al. [14] showed that better correspondences can be found between stereo image pairs by adding additional degrees of freedom to patches so they can tilt in 3D out of the camera plane. The PatchMatch filter work [77] showed that edge-aware filters on cost volumes can be used in combination with PatchMatch to solve labeling problems such as optical flow and stereo matching. Results for stereo matching are shown in Figure 5. Similarly, optical flow [22] with large displacements has been addressed by computing a NNF from PatchMatch, which provides approximate correspondences, and then using robust model fitting and motion segmentation to eliminate outliers. Belief propagation techniques have also been integrated with PatchMatch [12] to regularize the correspondence fields it produces.

Images
The patch-based matching algorithms of Section 4 facilitate many applications in image and video manipulation. In this section, we discuss some of these applications. Patch-based methods can be used to inpaint targeted regions or "reshuffle" content in images. They can also be used to edit repeated elements in images. Researchers have also proposed editing and enhancement methods for image collections and videos. These applications incorporate the efficient patch query techniques from Section 4 to keep running times practical.
One compelling application of patch-based querying and synthesis methods is image inpainting [43]. Image inpainting removes a foreground region from an image by replacing it with background material found elsewhere in the image, or in other images.
In [7], PatchMatch was shown to be useful for interactive, high-quality image inpainting, using an iterative optimization method that works in a coarse-to-fine manner. Specifically, this process works by repeatedly matching partially completed features inside the hole region to better matches in the background region. Various subsequent methods took a similar approach and focused on improving the completion quality under more complex conditions, such as maintaining geometric information that preserves continuous structures between the hole to inpaint and the existing content [19,53]. To combine such different strategies into a uniform framework, Bugeau et al. [17] and Arias et al. [2] separately proposed variational systems that can choose different metrics when matching patches, to adapt to different kinds of inpainting inputs. Kopf et al. [62] also proposed a method to predict the inpainting quality, which can help to choose the most suitable image inpainting method for different situations. Most recently, deep learning has been introduced in [88] to estimate the missing patches and produce a feature descriptor. The patch-based approach has also been used for Internet photo applications such as image inpainting for street view generation [115].
Image "reshuffling" is the problem where a user roughly grabs a region of an image and moves it to a new position. The goal is for the computer to synthesize a new image consistent with the user constraints. Reshuffling can be treated as an extension of image inpainting techniques, by initializing the regions to be synthesized with user-specified contents [26]. In Barnes et al. [7], the reshuffling results are made more controllable by adding user constraints to the PatchMatch algorithm.
Editing repeated elements in images. To perform object-level image editing, it is necessary to deform objects, and to inpaint occluded parts of the objects and the background. Patch-based methods can be used to address these challenges. One scenario for object-level image manipulation involves repeated elements in textures and natural images. The idea of editing repeated elements was first proposed in RepFinder [23]. In their interactive system, the repeated objects are first detected and extracted. Next, the edited objects are composited on the completed background using a PatchMatch-based inpainting method. One result is shown in Figure 6. Huang et al. [52] improve the selection part of RepFinder and also demonstrate similar editing applications. In addition to exploring traditional image operations like moving, deleting, and deforming the repeated elements, Huang et al. also proposed novel editing tools. In the work of ImageAdmixture [110], mixtures between groups of similar elements were created, as shown in Fig. 6. To create a natural appearance in the boundary regions of the mixed elements, patch-based synthesis is used to generate the appearance using pixels from the boundaries of other elements within the same group.
Fig. 6 An illustration of editing repeated elements in an image. Upper row: RepFinder [23] uses patch-based methods to complete the background of the editing results before compositing different layers of foreground objects. Lower row: ImageAdmixture [110] uses patch-based synthesis to determine boundary appearance when generating mixtures of objects. Reproduced with permission from [23], © Association for Computing Machinery 2010, and [110], © IEEE 2012.
Denoising. The Generalized PatchMatch work [8] showed that non-local patch searches can be integrated into the non-local means method [16] for image denoising to improve noise reduction. Specifically, this process works by finding, for each image patch, the k most similar matches both globally across the image and locally within a search region. A weighted average of these patches can be taken to remove noise from the input image patches. Liu and Freeman [71] showed that a similar non-local patch search can be used for video, where optical flow guides the patch search process. The method of Deledalle et al. [30] showed that principal components analysis (PCA) can be combined with patch-based denoising to produce high-quality image denoising results. Finally, Chatterjee and Milanfar [20] showed that a patch-based Wiener filter can be used to reduce noise by exploiting redundancies at the patch level. In the closely related task of image smoothing, researchers have also investigated patch-based methods that use second-order feature statistics [58,72].
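For reference, the classic windowed form of non-local means [16] can be sketched in Python as below, assuming a grayscale image as nested lists, an odd patch size `p`, a square `search` window, a filtering parameter `h`, and edge replication at the borders; all names and defaults are hypothetical. The Generalized PatchMatch variant described above would instead average over the k most similar patches found by approximate search, both globally and locally, rather than over every pixel in a fixed window.

```python
import math

def nl_means(img, p=3, search=5, h=10.0):
    """Windowed non-local means: each pixel becomes a patch-similarity-weighted average."""
    H, W = len(img), len(img[0])
    r = p // 2

    def px(y, x):
        # Replicate-pad coordinates that fall outside the image.
        return img[min(max(y, 0), H - 1)][min(max(x, 0), W - 1)]

    def ssd(y0, x0, y1, x1):
        # Squared difference between the p x p patches centered at the two pixels.
        return sum((px(y0 + i, x0 + j) - px(y1 + i, x1 + j)) ** 2
                   for i in range(-r, r + 1) for j in range(-r, r + 1))

    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            num = den = 0.0
            for sy in range(max(y - search, 0), min(y + search + 1, H)):
                for sx in range(max(x - search, 0), min(x + search + 1, W)):
                    # Similar neighborhoods get weights near 1, dissimilar near 0.
                    w = math.exp(-ssd(y, x, sy, sx) / (h * h))
                    num += w * img[sy][sx]
                    den += w
            out[y][x] = num / den
    return out
```

An isolated noisy spike is pulled towards the surrounding values because many nearby patches resemble the spike's patch in every pixel except the center, while genuine structure that repeats across the image is preserved.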

Image collections
In the scenario where a photographer has multiple images, existing works that use patches as the basic operating units fall into two categories. In the first category, researchers extended patch-based techniques that had previously been applied to single images to multi-view images and image collections, as Tong et al. did in [100]. In the second category, patches are treated as a bridge to build connections between different images. We now discuss the first category of editing tools. For stereo images, patch-based inpainting methods have been used to complete regions that have missing pixel colors caused by dis-occlusions when synthesizing novel perspective viewpoints [28,60,103]. In the work of Wang et al. [103], depth information is also utilized to aid the patch-based hole filling process for object removal. Morse et al. [84] proposed an extension of PatchMatch to obtain better stereo image inpainting results. Morse et al. complete the depth information first and then add this depth information to PatchMatch's propagation step when finding matches for inpainting both stereo views. Inspired by the commercial development of light field capture devices like PiCam and Lytro, researchers have also developed editing tools for light fields similar to existing 2D image editing tools [112]. One way to look at light fields is as an image array with many different camera viewpoints. Zhang et al. [111] demonstrated a layered patch-based synthesis system which is designed to manipulate light fields as an image array. This enables users to perform inpainting, and to re-arrange and re-layer the content in the light field, as shown in Fig. 7. In these methods, patch querying speed is a performance bottleneck. Thus, Barnes et al. [9] proposed a fast query method to accelerate the matching process across image collections. The patch-based applications they demonstrated, such as image stitching using a small album and light-field super-resolution, were reported to be significantly faster.
Fig. 7 Results of reshuffling and re-layering the content in a light field by PlenoPatch [111]. Reproduced with permission from [111], © IEEE 2016.
Fig. 8 Matching regions (right) by NRDC [44] of the two input images (left and middle) with different color themes. Reproduced with permission from [44], © Association for Computing Machinery 2011.
In the second category, patches are treated as a bridge to build connections between contents from different images. Unlike previous region matching methods which find similar shapes or global features, these methods focus on matching contiguous regions with similar appearance. This allows a dense, nonrigid correspondence to be estimated between related regions. Non-rigid dense correspondence (NRDC) [44] is a representative work.
Based on Generalized PatchMatch [8], the NRDC paper proposed a method to find contiguous matching regions between two related images, by checking and merging the good matches between neighbors. NRDC demonstrated good matching results even when there is a large change in colors or lighting between input images. An example is shown in Fig. 8. The approach of NRDC was further improved and applied to matching contents across an image collection in the work of HaCohen et al. [45]. For large image datasets, Gould et al. [41] proposed a method to build a matching graph using PatchMatch, and optimize a conditional Markov random field to propagate pixel labels to all images from just a small subset of annotated images. Patches are also used as a representative feature for matching local contiguous regions in PatchNet [51]. In this work, an image region with coherent appearance is summarized by a graph node, associated with a single representative patch, while geometric relationships between different regions are encoded by labelled graph edges giving contextual information. As shown in Fig. 9, the representative patches and the contextual information are combined to find reasonable local regions and objects for the purpose of library-driven editing.
Fig. 9 PatchNet [51] can be used to find a graph that connects local regions, where each local region has an internal repeating texture. PatchNet can be used for library-driven editing applications such as image composition. Reproduced with permission from Hu et al. [51], © Association for Computing Machinery 2013.

Video
Here we discuss how patch-based methods can be extended for applications on videos, by incorporating temporal information into patch-based optimizations. We start by discussing a relatively easy application, high-dynamic range video [57], and then briefly discuss two works on video inpainting [42,86], followed by video summarization by means of "video tapestries" [5]. The method of Kalantari et al. [57] reconstructs high-dynamic range (HDR) video. Briefly, HDR imaging extends conventional photography by using computation to achieve greater dynamic range in luminance, typically by using several photographs of the same scene with varying exposure.
In Kalantari et al. [57], HDR video is reconstructed from a special video camera that alternates the exposure at each frame. For example, the exposure of frame 1 could be low, frame 2 medium, and frame 3 high, after which the pattern repeats. The goal of HDR reconstruction is then to recover the missing exposure information: for example, on frame 1 we need to reconstruct the missing medium and high exposure information. This yields a video with high dynamic range at every frame, making it possible to capture high-quality videos that simultaneously include very dark and very bright regions. One solution is to use optical flow to simply transfer the missing information from past and future frames. However, optical flow is not always accurate, so Kalantari et al. [57] obtain better results by formulating the problem as a patch-based optimization that fills in missing pixels by minimizing both optical-flow-like terms and patch similarity terms. This problem is fairly well constrained, because there is one constraint image at every frame. Results are shown in Figure 10. Note that Sen et al. [94] also presented a similar patch-based method for HDR reconstruction where the goal is to produce a single output image rather than a video.
Fig. 10 Results for high-dynamic range (HDR) video reconstruction from Kalantari et al. [57]. On the top row are shown frames recorded by a special video camera that alternates exposure at each frame (between high, low, and middle exposure). At bottom is shown the HDR reconstruction, which has both light and dark areas well-exposed. See discussion in Section 7. Reproduced with permission from [57], © Association for Computing Machinery 2013.
The problem of video inpainting, in contrast, is highly unconstrained. In video inpainting, the user selects a spacetime region of the video to remove, and the computer must then synthesize an entire volume of pixels in that region that plausibly removes the target objects. The problem is further complicated because both the camera and foreground objects may move, introducing parallax, occlusions, disocclusions, and shadows. Granados et al. [42] developed a video inpainting method that aligns other candidate frames to the frame containing the region to be removed, selects among candidate pixels for the inpainting using a color-consistency term, and then removes intensity differences using gradient-domain fusion. Granados et al. assume a piecewise planar background region. Newson et al. [86] inpaint video using a global, patch-based optimization. Specifically, Newson et al. introduce a spatio-temporal extension of PatchMatch to accelerate the search problem, use a multi-resolution texture feature pyramid to better reconstruct textures, and estimate background motion using an affine model.
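Newson et al. [86] compare space-time patches rather than 2D patches. The sketch below shows, under simplifying assumptions, what such a patch and its distance look like for a grayscale video stored as a T x H x W array: an axis-aligned box of (2*rt+1) frames by (2*r+1) x (2*r+1) pixels, compared by SSD with an optional validity mask so that hole pixels are ignored. The helper names and patch sizes are hypothetical, and the texture features and affine background model of [86] are omitted.

```python
import numpy as np


def st_patch(video, t, y, x, r=2, rt=1):
    """Space-time patch of (2*rt+1) frames x (2*r+1) x (2*r+1) pixels centered at (t, y, x).

    Centers are assumed to lie at least r pixels and rt frames from the volume's borders.
    """
    return video[t - rt:t + rt + 1, y - r:y + r + 1, x - r:x + r + 1]


def st_patch_ssd(video, p, q, r=2, rt=1, valid=None):
    """SSD between the space-time patches centered at p and q, each given as (t, y, x).

    If `valid` (same shape as `video`) is given, pixels of p's patch that are
    marked invalid, such as pixels inside the hole, are ignored.
    """
    a = st_patch(video, *p, r, rt)
    b = st_patch(video, *q, r, rt)
    d = (a - b) ** 2
    if valid is not None:
        d = d * st_patch(valid, *p, r, rt)
    return float(d.sum())
```

For a camera panning at one pixel per frame, the patch centered at (t, y, x) matches the patch at (t+1, y, x+1) exactly, which is the kind of temporally consistent match a spatio-temporal PatchMatch search is meant to find.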
Finally, the video tapestries [5] work shows that patch-based methods can also be applied to producing pleasing summaries of video. Video tapestries are produced by selecting a hierarchical set of keyframes, with one keyframe level per zoom level, so that a user can interactively zoom in to the tapestries. An appealing summary is then produced by compacting or summarizing the resulting layout image so that it is slightly smaller along each dimension: this facilitates the joining of similar regions, and the removal of repetitive features.
Deblurring. Here, we discuss the application of patch-based methods to one hard inverse problem in computer vision: deblurring. Recently, patch-based techniques have been used to deblur images that are blurred due to, for example, camera shake. Cho et al. [24] showed that video can be deblurred by observing that some frames are sharper than others. The sharp regions can be detected and used to restore blurry regions in nearby frames, through a patch-based synthesis process that ensures spatial and temporal coherence. Sun et al. [98] showed that blur kernels can be estimated by using a patch prior that is customized towards modeling corner and edge regions. The blur kernel and the deblurred image can both be estimated by an iterative process that imposes this patch prior. Sun et al. [99] later investigated whether deblurring results could be improved by training on similar images, and showed that results improve when using patch priors that adapt locally based on region correspondences, or multi-scale patch-pyramid priors.
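Cho et al. [24] rely on detecting which frames are sharp. As a rough illustration only, the sketch below blurs an image by convolving it with a box kernel (a stand-in for a camera-shake kernel) and scores sharpness by mean squared gradient magnitude, a common heuristic assumed here. The exact sharpness measure used in [24] may differ, and the function names are hypothetical.

```python
import numpy as np


def box_blur(img, k=5):
    """Blur by convolving with a k x k box kernel (edge-padded borders)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def sharpness(img):
    """Mean squared gradient magnitude; blur suppresses high frequencies and lowers it."""
    gy, gx = np.gradient(img.astype(float))
    return float(np.mean(gx ** 2 + gy ** 2))


def sharpest_frame(frames):
    """Index of the frame with the highest sharpness score."""
    return int(np.argmax([sharpness(f) for f in frames]))
```

In a video, such a score would be computed per region rather than per frame, so that locally sharp content can be copied into blurry regions of neighboring frames.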

Synthesis for specialized domains
In this section, we investigate synthesis algorithms that are tailored towards specialized problem domains, including mimicking artistic style, synthesis of artistic brushes, decorative patterns, synthesis for 3D and 3D fabrication, fluid synthesis, and synthesis of rich materials.
Fig. 12 Results for painting by feature [79] (panels alternate between source and result). See Section 8 for discussion. Reproduced with permission from [79], © Association for Computing Machinery 2013.
Fig. 13 Results from RealBrush [74]. At left: physically painted strokes are captured with a camera to create a digital library of natural media. At right: an artist has created a digital painting. RealBrush synthesizes realistic brush texture in the digital painting by sampling from the oil paint and plasticine exemplars in the library. Reproduced with permission from [74], © Association for Computing Machinery 2013.
Mimicking artistic style. Patch-based synthesis can mimic the style of artists, such as oil paint style, watercolor style, or even abstract styles. We discuss an early work in this area, image analogies [49], followed by two recent works, Bénard et al. [11] and StyLit [39], which perform example-based stylization using patches. The image analogies framework [49] gave early results showing the transfer of oil paint and watercolor styles. This style transfer works by providing an exemplar photograph A, a stylized variant A′ of the exemplar, and an input photograph B. The image analogies framework then predicts a stylized version B′ of the input photograph B. Image parsing was shown in [109] to improve results for such exemplar-based stylization. The method of Bénard et al. [11] allows artists to paint over certain keyframes in a 3D rendered animation. Other frames are automatically stylized in the target style by using a temporally coherent extension of image analogies to the video cube. The StyLit method [39] uses the "lit sphere" paradigm [97] for transfer of artistic style. In this approach, a well-lit photorealistic sphere is presented to the artist, so that the artist can produce a non-photorealistic exemplar of the same sphere. In StyLit [39], the sphere is rendered using global light transport, and the rendered image is decomposed into lighting channels such as direct diffuse illumination, direct specular illumination, the first two bounces of diffuse illumination, and so forth. This is augmented by an iterative patch assignment process, which avoids producing bad matches while also avoiding excessive regularity in the produced texture. See Figure 11 for example stylized results.
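The core of the image analogies idea can be sketched in a few lines: for each pixel of B, find the most similar patch in A and copy the value at the corresponding location of A′. This is a deliberately stripped-down, single-scale, brute-force version for grayscale images, assuming edge-padded square patches and hypothetical function names. The actual framework [49] additionally uses multi-scale pyramids, neighborhoods from A′ and B′, and a coherence search, none of which appear here.

```python
import numpy as np


def _patches(img, r):
    """All (2r+1) x (2r+1) patches of an edge-padded image, flattened, one row per pixel."""
    p = np.pad(img, r, mode="edge")
    h, w = img.shape
    k = 2 * r + 1
    return np.stack([p[y:y + k, x:x + k].ravel() for y in range(h) for x in range(w)])


def image_analogy(A, A_prime, B, r=1):
    """Predict B' so that A : A' :: B : B' (brute-force, single-scale sketch)."""
    PA = _patches(A, r)
    PB = _patches(B, r)
    flat_Ap = A_prime.ravel()
    out = np.empty(B.size, dtype=float)
    for i, pb in enumerate(PB):
        # Best-matching source patch in A for this patch of B ...
        j = int(np.argmin(np.sum((PA - pb) ** 2, axis=1)))
        # ... then copy the stylized value at the same location in A'.
        out[i] = flat_Ap[j]
    return out.reshape(B.shape)
```

Because each pixel of B is matched independently, this version tends to produce noisier output than the full method; the coherence term in [49] exists to favor copying contiguous regions of A′.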
Synthesis of artistic brushes and decorative patterns. Patch-based synthesis methods can easily generate complex textures and structures. For this reason, they have been adopted for artistic design. We first discuss digital brush tools, and then discuss tools for designing decorative patterns.
Research on digital brushes includes methods for painting directly from casually captured texture exemplars [78,79,92], RealBrush [74], which synthesizes from carefully captured natural media exemplars, and "autocomplete" of repetitive strokes in paintings [108]. Ritter et al. [92] initially developed a framework that allows artists to paint with textures on a target image by sampling textures directly from an exemplar image. This approach works by adapting typical texture energy functions to specially handle boundaries and layering effects. Painting by feature [79] further advanced texture-based painting: it improved the sampling of patches along boundary curves, where humans are particularly sensitive to misalignments or repetitions of texture, and then filled other areas using patch-based inpainting. Results are shown in Figure 12. Brushables [78] also improved upon the example-based painting approach by allowing the artist to specify a direction field for the brush, and by simultaneously synthesizing both the edge and interior regions. RealBrush [74] allows rich natural media to be captured with a camera, processed with fairly minimal user input, and then used in subsequent digital paintings. Results from RealBrush are shown in Figure 13. The autocompletion of repetitive painting strokes [108] works by detecting repetitive painting operations and suggesting an autocompletion to the user when repetition is detected. Strokes are represented as curves that are sampled, so that neighborhood matching can be performed between collections of samples.
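The curve-sampling step can be illustrated as follows: each stroke polyline is resampled at a fixed number of points equally spaced in arc length, and two strokes are compared by the squared distance between corresponding samples after translating both to a common origin. This is a hypothetical simplification, not the representation of [108], which may attach further attributes to each sample; it assumes consecutive points are distinct so that arc length is strictly increasing.

```python
import numpy as np


def resample_stroke(points, n=16):
    """Resample a polyline at n points equally spaced in arc length."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative arc length at each vertex
    targets = np.linspace(0.0, s[-1], n)
    # Interpolate each coordinate as a function of arc length.
    return np.stack([np.interp(targets, s, pts[:, d]) for d in range(pts.shape[1])], axis=1)


def stroke_distance(a, b, n=16):
    """Translation-invariant distance between two strokes after resampling."""
    sa, sb = resample_stroke(a, n), resample_stroke(b, n)
    sa = sa - sa[0]  # express each stroke relative to its first sample
    sb = sb - sb[0]
    return float(np.sum((sa - sb) ** 2))
```

A low distance between a new stroke and an earlier one is the kind of signal that could trigger an autocompletion suggestion.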
Decorative patterns are frequently used in illustrated manuscripts, formal invitations, web pages, and interior design. We discuss several recent works on synthesis of decorative patterns.
DecoBrush [75] allows a designer to synthesize decorative patterns by giving the algorithm examples of the patterns, and specifying a path that the pattern should follow. See Figure 14 for DecoBrush results, including the pattern exemplars, a path, and a resulting decorative pattern. In cases where the pattern is fairly regular and self-similar, Zhou et al. [114] showed that a simpler dynamic programming technique can be used to synthesize patterns along curves.
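One way to picture a dynamic programming formulation for patterns along a curve is as a chain: the curve is divided into n slots, each slot receives one of K exemplar elements, a unary cost measures how well element k fits slot i, and a pairwise cost measures how well consecutive elements join. The sketch below finds the cheapest sequence with a Viterbi-style recursion in O(nK²) time. This chain model and its cost arrays are assumptions made for illustration, not details taken from [114].

```python
import numpy as np


def best_element_sequence(unary, pairwise):
    """Minimize sum_i unary[i, k_i] + sum_i pairwise[k_i, k_{i+1}] by dynamic programming.

    unary: n x K costs of placing element k in slot i.
    pairwise: K x K costs of element k followed by element l.
    """
    n, K = unary.shape
    cost = unary[0].copy()              # best cost of any prefix ending in element k
    back = np.zeros((n, K), dtype=int)  # best predecessor, for backtracking
    for i in range(1, n):
        total = cost[:, None] + pairwise  # total[k, l]: previous element k, next element l
        back[i] = np.argmin(total, axis=0)
        cost = total[back[i], np.arange(K)] + unary[i]
    # Backtrack from the cheapest final element.
    seq = [int(np.argmin(cost))]
    for i in range(n - 1, 0, -1):
        seq.append(int(back[i, seq[-1]]))
    return seq[::-1], float(np.min(cost))
```

The appeal of such a formulation for regular, self-similar patterns is that it finds the global optimum exactly, whereas general patch-based optimization only finds local optima.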
Fig. 14 Results from DecoBrush [75]. See discussion in Section 8. Reproduced with permission from [75], © Association for Computing Machinery 2014.
Later, Zhou et al. [113] showed that decorative patterns can be synthesized with better control over topology (e.g. number of holes and connected components), by first synthesizing the topology using a topological descriptor and then synthesizing the pattern itself. This latter work also demonstrated design of 3D patterns, which we discuss next.
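The two topological quantities mentioned above can be counted directly on a binary pattern. This sketch counts 4-connected foreground components and counts holes as background regions that do not touch the image border. It only illustrates the quantities being controlled; the topological descriptor of [113] is richer, and using 4-connectivity for both foreground and background is a simplification. The function names are hypothetical.

```python
from collections import deque

import numpy as np


def _count_components(mask, skip_border=False):
    """Count 4-connected True regions; optionally skip regions touching the border."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill of one region.
            touches_border = False
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if not (skip_border and touches_border):
                count += 1
    return count


def topology(pattern):
    """(connected components, holes) of a binary pattern; holes are enclosed background regions."""
    pattern = np.asarray(pattern, dtype=bool)
    return _count_components(pattern), _count_components(~pattern, skip_border=True)
```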
We first discuss synthesis of 3D structures that do not need to be physically fabricated. Early texture synthesis works synthesized voxels, either to create geometry on surfaces [13] or to create 3D volumetric textures [33,61]. For volumetric 3D texture synthesis, Kopf et al. [61] showed that 3D texture can be effectively synthesized from a 2D exemplar by matching 2D neighborhoods aligned with the three principal axes in 3D space. Kopf et al. [61] additionally showed that histogram matching makes the synthesized statistics more similar to those of the exemplar. Dong et al. [33] showed that such 3D textures can be synthesized lazily: if a surface is to be textured, only voxels near the surface need to be evaluated. A key idea in their paper is to synthesize the 3D volume from precomputed sets of 3D candidates, each of which is a triple of interleaved 2D neighborhoods; this reduces the search space during synthesis. See Figure 15 for some results. In a similar manner, Lee et al. [66] showed that arbitrary 3D geometry can be synthesized. In their case, to reduce the computational burden, rather than synthesizing voxels directly, they synthesized an adaptive signed distance field using an octree representation.
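Histogram matching, which Kopf et al. [61] use to bring synthesized statistics closer to the exemplar, can be illustrated in its simplest standalone form: remap each value so that its rank in the synthesized data lands on the value at the same quantile of the exemplar. How [61] integrates histogram matching into their optimization differs; this rank-based version and its function name are illustrative assumptions.

```python
import numpy as np


def match_histogram(values, reference):
    """Remap `values` so their empirical distribution matches `reference` (quantile mapping)."""
    values = np.asarray(values, dtype=float)
    ref_sorted = np.sort(np.asarray(reference, dtype=float).ravel())
    ranks = np.argsort(np.argsort(values.ravel()))  # rank of each value, 0..n-1
    # Map each rank to the reference value at the same quantile.
    quantiles = ranks / max(values.size - 1, 1)
    idx = np.round(quantiles * (ref_sorted.size - 1)).astype(int)
    return ref_sorted[idx].reshape(values.shape)
```

The mapping preserves the ordering of the synthesized values, so spatial structure is kept while the overall color or intensity distribution is pulled toward the exemplar.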
Fig. 15 Lazy solid texture synthesis [33] can be used to synthesize 3D texture from a 2D exemplar. This is done in a lazy manner, so that only voxels near the surface of a 3D model need be synthesized. At left is the exemplar. (a) a result for a 3D mesh, (b) a consistent texture can be generated even if the mesh is fractured. Reproduced with permission from [33], © The Author(s) Journal compilation 2008, © The Eurographics Association and Blackwell Publishing 2008.
A separate line of research focused on the synthesis of discrete vector elements in 3D [81,82]. In Ma et al. 2011 [82], collections of discrete 3D elements are synthesized by matching neighborhoods to a reference exemplar of 3D elements. Multiple samples can be placed along each 3D element, which allows for synthesis of oblong objects such as a bowl of spaghetti or bean vegetables. Constraints such as physics, boundary constraints, or orientation fields can be incorporated into the synthesis process. Later, Ma et al. 2013 [81] extended this synthesis method to handle dynamically animating 3D objects, such as groups of fish or animating noodles.
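Neighborhood matching for discrete elements compares sets of relative positions rather than pixel grids. The sketch below gathers each element's neighbor offsets within a radius and scores two neighborhoods by pairing every output neighbor with its closest exemplar neighbor. It assumes one sample per element (its center) and uses greedy nearest pairing; [82] places multiple samples per element and uses its own matching and constraints, so the names and details here are hypothetical.

```python
import numpy as np


def neighborhood(positions, center, radius):
    """Offsets (N x 3) of all elements other than `center` within `radius` of it."""
    pos = np.asarray(positions, dtype=float)
    offs = pos - pos[center]
    dist = np.linalg.norm(offs, axis=1)
    keep = (np.arange(len(pos)) != center) & (dist <= radius)
    return offs[keep]


def neighborhood_distance(out_offsets, ex_offsets):
    """Sum over output neighbors of the squared distance to the closest exemplar neighbor."""
    out_offsets = np.asarray(out_offsets, dtype=float)
    ex_offsets = np.asarray(ex_offsets, dtype=float)
    # Pairwise squared distances between output and exemplar neighbor offsets.
    d2 = np.sum((out_offsets[:, None, :] - ex_offsets[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).sum())
```

Because neighborhoods are compared through relative offsets, two elements in identical local arrangements score zero regardless of where they sit in space.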
Fig. 16 Synthesis for 3D fabrication. At left are results from Martínez et al. [83]. At right are shown filigrees from Chen et al. [21]. Subtle patterns of material and empty space can be created along the surface of the object, such that they resemble 2D exemplars. See the paper body for more discussion. Reproduced with permission from [21,83], © Association for Computing Machinery, 2015 and 2016, respectively.
Synthesis for 3D fabrication. Physical artifacts can be fabricated computationally using subtractive manufacturing, such as computer numerical control (CNC) milling and laser cutting, or using additive manufacturing, such as fused deposition modeling (FDM) printing and photopolymerization. When fabricating an object, it may be desirable to add fine-scale texture detail, such as ornamentation by flowers or swirls on a lampshade. Recently, researchers have explored patch-based and example-based methods in this direction. Dumas et al. [34] showed that a mechanical optimization technique called topology optimization can be combined with an appearance optimization that controls patterns and textures. Topology optimization designs 2D or 3D shapes so that they have structural properties such as supporting force loads. Subsequently, Martínez et al. [83] showed that geometric patterns including empty regions can be synthesized along the surfaces of 3D shapes such that the printed shape is structurally sound. The method works by a joint optimization of a patch-based appearance term and a structural soundness term. The synthesized patterns follow the input exemplar, as shown in Figure 16 (left). Recently, Chen et al. [21] demonstrated that fine filigrees can be fabricated in an example-driven manner. Their method works by reducing the filigree to a skeleton that references the base elements, and they also relax the problem by permitting partial overlap of elements. Example filigrees are shown in Figure 16 (right).
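Such joint objectives can be written in a generic form. The expression below is a schematic under stated assumptions, not the exact formulation of [34] or [83]: x is the material layout (for example, a density per surface point or voxel), x_i is the i-th patch of the output, z_j ranges over exemplar patches, and λ balances appearance against structure. For the structural term, this sketch assumes compliance, a standard objective in topology optimization, where u(x) solves the elastic equilibrium K(x)u = f for applied loads f.

```latex
\min_{\mathbf{x}} \;
  \underbrace{\sum_{i} \min_{j} \bigl\lVert \mathbf{x}_i - \mathbf{z}_j \bigr\rVert^2}_{E_{\mathrm{app}}(\mathbf{x})}
  \;+\; \lambda \,
  \underbrace{\mathbf{f}^{\top} \mathbf{u}(\mathbf{x})}_{E_{\mathrm{struct}}(\mathbf{x})},
\qquad \text{where } K(\mathbf{x})\,\mathbf{u}(\mathbf{x}) = \mathbf{f}.
```

Minimizing compliance makes the shape stiffer under the given loads, while the appearance term keeps every output patch close to some exemplar patch; the weight λ trades one against the other.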
Fluids. Texture synthesis has been used to texture fluid animations in 2D and 3D. The basic idea is to texture one frame of an animation, advect the texture forward in time by following the fluid motion, and then re-texture the next frame if needed, starting from this advected initial guess. Some early works explored these ideas for synthesizing 2D textures directly on 3D fluids [4,64]. More recently, patch-based fluid synthesis research has focused on stylizing and synthesizing fluid flows that match a target exemplar [15,55,80]. Ma et al. [80] focused on synthesizing high-resolution motion fields that match an exemplar, while the low-resolution flow matches a simulation or other guidance field. Browning et al. [15] synthesized stylized 2D fluid animations that match an artist's stylized exemplar. Jamriška et al. [55] implement a more sophisticated system of transfer from exemplars, including support for video exemplars, encouraging uniform usage of exemplar patches, and improving the texture when it has been advected for a long time. Figure 17 shows a result for fluid texturing with the method of Jamriška et al. [55].
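The advection step in the basic idea above can be sketched with semi-Lagrangian advection: each output pixel traces backward along the velocity field and bilinearly samples the previous frame's texture. This sketch assumes a 2D grayscale texture, a velocity field in pixels per time step, and periodic boundaries; it covers only advection, not the re-synthesis step or any of the specific methods [15,55,80], and the function name is hypothetical.

```python
import numpy as np


def advect(tex, vx, vy, dt=1.0):
    """Semi-Lagrangian advection of a 2D texture by velocity (vx, vy), periodic boundaries.

    vx and vy may be scalars or H x W arrays, in pixels per unit time.
    """
    h, w = tex.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Backtrace: find where the material now at (y, x) was one time step ago.
    sx = (xs - dt * vx) % w
    sy = (ys - dt * vy) % h
    fx0, fy0 = np.floor(sx), np.floor(sy)
    fx, fy = sx - fx0, sy - fy0  # fractional offsets used as bilinear weights
    x0, y0 = fx0.astype(int) % w, fy0.astype(int) % h
    x1, y1 = (x0 + 1) % w, (y0 + 1) % h
    top = (1 - fx) * tex[y0, x0] + fx * tex[y0, x1]
    bottom = (1 - fx) * tex[y1, x0] + fx * tex[y1, x1]
    return (1 - fy) * top + fy * bottom
```

Repeated bilinear resampling tends to blur the texture, and long-term advection distorts it, which is one reason to re-synthesize from the exemplar rather than advect indefinitely.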
Rich materials.
Real-world materials tend to have complex appearance, including varying amounts of weathering, gradual or abrupt transitions between different materials, varying normals, lighting, orientation, and scale. Texture approaches can be used to factor out or manipulate these rich material properties.
We discuss four recent works: the first two focus on controlling material properties [27,32], the third focuses on controlling weathering [10], and the last focuses on capturing spatially varying BRDFs by taking two photographs of a textured material [1]. Image melding [27] permits an artist to create smooth transitions between two source images that gradually reconcile inconsistent colors, textures, and structural properties. This enables applications in object cloning, stitching of challenging panoramas, and hole filling using multiple photographs.
Fig. 18 Results for complex material synthesis from Diamanti et al. [32]. The target image (a) is used as a backdrop and the source (b) is used as an exemplar for the patch-based synthesis process. The artist creates annotations for different properties of the material, such as large vs small bricks, the normal, and lit vs shadow. A result image (d) is synthesized. Here about 50% of desired patch annotations were not seen in the exemplar, and were instead interpolated between the known materials. See discussion in Section 8. Reproduced with permission from [32], © Association for Computing Machinery 2015.
Diamanti et al. [32] focus on a similar problem of synthesizing images or materials that follow user annotations, such as the desired scale, material, orientation, lighting, and so forth. In cases where the desired annotation is not present in the exemplar, Diamanti et al. [32] perform texture interpolation to synthesize a plausible guess for the unobserved material properties. A result from Diamanti et al. [32] is shown in Figure 18. The recent work by Bellini et al. [10] demonstrates that automatic control over the degree of weathering can be achieved for repetitive, textured images. This method assigns each patch of the input image a measure of how weathered it is, based on its dissimilarity to other similar patches. Subsequently, weathering can be reduced or increased in a smooth, time-varying manner by replacing low-weathered patches with high-weathered ones, or vice versa. The method of Aittala et al. [1] uses a flash/no-flash pair of photographs of a textured material to recover a spatially varying BRDF representation for the material. This is done by leveraging a self-similarity observation: although a texture's appearance may vary spatially, for any point on the texture there exist many other points with similar reflectance properties. Aittala et al. fit a spatially varying BRDF that models diffuse and specular, anisotropic reflectance over a detailed normal map.
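The weathering measure of Bellini et al. [10] described above can be approximated, for illustration, as follows: cut the image into non-overlapping patches and score each by the mean distance to its k most similar other patches, so that patches unlike the rest of a repetitive texture score high. The patch layout, the value of k, the Euclidean distance, and the function name are assumptions of this sketch rather than details of [10].

```python
import numpy as np


def weathering_scores(img, p=4, k=3):
    """Score each non-overlapping p x p patch by its mean distance to its k most similar patches.

    Patches are ordered row by row. High scores mark patches that deviate
    from the repetitive texture, a proxy for how weathered they are.
    """
    h, w = img.shape
    patches = np.array([img[y:y + p, x:x + p].ravel()
                        for y in range(0, h - p + 1, p)
                        for x in range(0, w - p + 1, p)], dtype=float)
    d2 = np.sum((patches[:, None, :] - patches[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)          # ignore each patch's match to itself
    nearest = np.sort(d2, axis=1)[:, :k]  # squared distances to the k most similar patches
    return np.sqrt(nearest).mean(axis=1)
```

Sorting patches by such a score gives an ordering from least to most weathered, which is what makes it possible to swap patches along that ordering to reduce or increase weathering.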

Discussion
There have been great advances in patch-based synthesis in the last decade. Techniques that originally synthesized larger textures from small ones have been adapted to many domains, including editing of images and image collections, video, denoising and deblurring, synthesis of artistic brushes, decorative patterns, synthesis for 3D fabrication, and fluid stylization. We now discuss one alternative approach that could potentially inspire future work: deep learning for synthesis of texture.
Deep neural networks have recently been used for synthesis of texture [31,40,102]. Gatys et al. [40] showed that artistic style transfer can be performed from an exemplar to an arbitrary photograph by means of deep neural networks that were trained on recognition tasks over millions of images. Specifically, this approach optimizes the output image so that feature maps at several levels of the network have pairwise correlations similar to those of the exemplar, while the image's higher-level feature maps remain close to those of the input photograph. This optimization can be carried out by gradient descent, with gradients computed by back-propagation. Subsequently, Ulyanov et al. [102] showed that such a texture generation process can be accelerated by training a feed-forward convolutional network in advance to mimic a given exemplar texture. Denton et al. [31] showed that convolutional neural networks can be used to generate novel images by training a different convolutional network at each level of a Laplacian pyramid, using a generative adversarial network loss.
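The pairwise feature correlations used by Gatys et al. [40] are Gram matrices. The sketch below computes them for a single C x H x W feature array and compares two of them with a squared Frobenius distance; in the actual method the features come from several layers of a pretrained recognition network, whereas here they are just an array, and the function names are hypothetical.

```python
import numpy as np


def gram(features):
    """Gram matrix of a C x H x W feature map: channel-by-channel correlations averaged over positions."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)


def style_loss(feat_out, feat_exemplar):
    """Squared Frobenius distance between the Gram matrices of two feature maps (one layer)."""
    return float(np.sum((gram(feat_out) - gram(feat_exemplar)) ** 2))
```

Because the Gram matrix sums over spatial positions, shuffling the positions leaves it unchanged: it records which features co-occur, not where they occur, which is why it behaves like a texture statistic.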
Although neural networks are parametric models that are quite different from the non-parametric approach commonly used in patch-based synthesis, it may be interesting in future research to combine the benefits of both approaches. For example, unlike neural networks, non-parametric patch methods tend to use training-free k-nearest neighbor methods, and so they do not need to go through a training process to "learn" a given exemplar. However, neural networks have recently shown state-of-the-art performance on many hard inverse problems in computer vision such as semantic segmentation. Thus, hybrid approaches might be developed that take advantage of the benefits of both techniques.