Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks

It is a challenging task to teach machines to paint like human artists in a stroke-by-stroke fashion. Despite advances in stroke-based and deep learning-based image rendering, existing painting methods have limitations: they (i) lack the flexibility to choose strokes of different artistic styles, (ii) lose content details of images, and (iii) generate few artistic styles of painting. In this paper, we propose a stroke-style generative adversarial network, called Stroke-GAN, to address the first two limitations. Stroke-GAN learns stroke styles from different stroke-style datasets and so can produce diverse stroke styles. We design three players in Stroke-GAN to generate pure-color strokes close to human artists' strokes, thereby improving the quality of painted details. To overcome the third limitation, we have devised a neural network named Stroke-GAN Painter, based on Stroke-GAN, which can generate paintings in different artistic styles. Experiments demonstrate that our artful painter can generate various styles of paintings while preserving content details well (such as details of human faces and building textures) and retaining high fidelity to the input images.


Introduction
Painting, as an important visual art, symbolizes human imagination and ingenuity. Human artists have used a variety of painting tools to create artworks with characteristic styles. However, it is time-consuming for people to master painting skills, requiring much learning, imitating, and practising. Recent computer-aided painting methods generate non-photorealistic images similar to paintings, thereby offering effective painting assistants for human painting learners, but it is still a challenging task to teach machines to paint artworks based on given images like human artists. Unlike directly generating a style-transfer image or photographic image [1][2][3][4], machine painting is carried out by a machine or computer in a stroke-by-stroke manner. The key to teaching a machine to mimic human artists lies in addressing the following three challenges: (i) painting artistic strokes on the canvas in a human-painting order, starting from a given input image, (ii) generating artistic strokes with textures like human artists' strokes, and (iii) preserving detailed contents of a given image while creating a painting instead of reconstructing a photorealistic image.
Conventional methods include stroke-based rendering (SBR) [5][6][7], which has contributed to stroke modeling. The quality of such stroke textures is good and mimics human strokes. However, these methods achieve a semi-automatic painting process that needs substantial user intervention; such intervention is time-consuming and requires considerable painting skill from the user, and these SBR models support only a limited number of painting styles. Unlike conventional SBR models, learning-based methods have flexible frameworks which can adapt to diverse artistic styles, and can create paintings without user intervention. Recently, researchers have typically used recurrent neural networks (RNNs) [8,9] and reinforcement learning (RL) models [10][11][12] to generate stroke-by-stroke artworks. However, such unified frameworks lack the flexibility to choose different styles of strokes, and some paintings generated in particular artistic styles (e.g., pastel-like paintings) lack fine details.
To address the limitations of existing painting methods, we have built a new model leveraging the advantages of both conventional SBR methods and learning-based methods, as an extension of Ref. [13]. We first describe a novel stroke generative adversarial network (Stroke-GAN) which learns different stroke styles from stroke-style datasets and generates diverse stroke styles with adjustable parameters (stroke path, stroke size and shape, stroke color, and transparency). Based on Stroke-GAN, we then describe a neural-network painter which learns to create different styles of paintings in a stroke-by-stroke paradigm. We call the entire framework Stroke-GAN Painter. Stroke-GAN Painter learns to generate a painting in a coarse-to-fine manner, as shown in Fig. 1. In particular, the painting quality improves by repeating multiple learning-to-paint processes, during which the painter proceeds from being a novice to being a veteran. In contrast to existing methods, such as sketching [8,14], doodling [10], Neural Painter (NP) [15], and MDRL Painter (MDRLP) [12], our painter can generate diverse artistic styles of painting with different types of strokes. Moreover, the images generated by our painter also preserve key content details (such as face details in portraits) well, as shown in Fig. 1. In summary, our work makes the following main contributions.
• We propose a three-player-game model, Stroke-GAN, to generate strokes in an artistic style which are fully adjustable in terms of stroke path, stroke size and shape, stroke color, and transparency, thereby greatly improving the stylization of generated paintings. Stroke-GAN has two generators and one discriminator; the second generator learns to purify the stained-color strokes generated by the first generator. Consequently, the generated strokes have pure colors and textures closer to human artists' strokes.
• We design a painter based on Stroke-GAN; it does not need to be trained on any image datasets. It learns to create different artistic styles of paintings based on stylized strokes, e.g., oil-painting-like, watercolor-like, and pastel-like artworks, in a unified framework. Our generated paintings preserve the content details of reference images well while offering stylistic diversity.

Related work
We briefly survey closely related studies on machine painting, classified into conventional stroke-based rendering (SBR), learning-based rendering, and image-style transfer (IST).

Conventional stroke-based rendering
SBR methods essentially reconstruct images as non-photorealistic imagery using stroke-based models.
In particular, Ref. [5] introduces a semi-automatic painting method based on a greedy algorithm; it needs substantial human intervention to control stroke shapes and select stroke locations. The authors of Ref. [6] propose a style design for their painting method by using spline-brush strokes to render the image, but this method requires a high degree of painting skill of its users. The work in Ref. [7] proposes a method to segment an image into areas with similar levels of salience to control the strokes.
However, most of these methods require substantial human intervention to choose key parameters, and are thus inconvenient for ordinary users. Moreover, SBR methods only generate a limited number of artistic styles [20], consequently leading to inflexibility.

Learning-based rendering
Recently, researchers have adopted learning-based methods to improve painting results compared to traditional SBR methods. SPIRAL [10,21] develops an adversarially-trained deep reinforcement learning (DRL) agent which learns the structures of images; however, it does not reconstruct the details of human portraits well. Sketch-RNN [8] constructs stroke-based drawings after training on human-drawn image datasets; it achieves excellent results for common objects, although it only makes simple-line paintings. StrokeNet [9] trains an agent to learn to paint based on a differentiable renderer and a recurrent neural network (RNN); it generalizes poorly on color images. Computers are now able to generate quite realistic oil paintings [12,22,23] and pastel-like paintings [15]. MDRLP [12] paints oil-painting-like pictures with a small number of strokes, although it only mimics one style in a unified framework and loses brush-stroke textures. Methods such as those in Refs. [22,23] improve stroke textures by redesigning their stroke rendering. NP [15] has a design similar to ours, since both generate strokes using a GAN-based module. However, our approach differs from theirs in several ways. Firstly, we use a three-player GAN model to generate adjustable strokes, while NP only uses a normal GAN to generate fixed strokes. Secondly, our method can learn from any stroke dataset, while NP must use strokes produced by the MyPaint program as its stroke dataset. Thirdly, NP requires a massive manually-labelled stroke dataset to generate one-by-one action strokes: its model requires the same strokes as the stroke dataset provided by MyPaint to ensure that the optimized strokes are those needed by the painting model. Our model needs no data labelling, because the three-player Stroke-GAN can paint well by generating strokes similar to those in the stroke dataset.

Image style transfer
Image style transfer (IST) methods are popular in both research projects and industrial applications [24][25][26][27][28], although few have been applied to stroke-based image rendering. PaintBot [29], based on a DRL network, recreates the target image in a stroke-by-stroke manner, but the painting style is restricted to the style of the reference image. Neural Painter [15] uses a method to generate style strokes to recreate the target image without style reference images. However, it requires a large manually-labelled stroke dataset and also lacks the flexibility to choose different styles of strokes. Moreover, the generated paintings lack fine details, e.g., in human faces and the textures of buildings.
In this paper, we propose a new learning-based method, Stroke-GAN Painter. It integrates the advantages of SBR methods with those of learning-based methods to generate diverse styles of paintings with high quality. Specifically, Stroke-GAN can generate different styles of strokes, and the stroke designer module can generate stroke datasets in different styles for Stroke-GAN. Meanwhile, we use a rendering module to optimize stroke selection, ensuring that Stroke-GAN generates well-behaved strokes, which are rendered onto the canvas to create high-quality paintings.

Overview
We propose a new painting model, Stroke-GAN Painter, to achieve stroke-by-stroke painting for machines or computers. The goal of Stroke-GAN Painter is to produce paintings in diverse artistic styles within a unified framework. We mainly consider oil paintings, watercolor paintings, and pastel paintings here, although further painting styles could easily be added to our framework. Given an image, our model can continually render style strokes onto the canvas to create paintings in different artistic styles by choosing the style of the strokes. Figure 2 depicts our proposed Stroke-GAN Painter, which consists of a stroke designer module, Stroke-GAN, the rendering module, and the canvas. The stroke designer module provides different styles of stroke datasets for training Stroke-GAN, which generates style strokes (see Fig. 2(a)). The rendering module feeds in both the reference image U_0 and the canvas C_n to optimize stroke selection, which controls Stroke-GAN to generate well-behaved strokes. The states of the canvas during the learning-to-paint process are shown in Fig. 2(b). Stroke-GAN Painter learns to create paintings from novice to veteran ability, and its painting quality improves with more painting time. Section 3.2 presents the stroke designer module, which generates different styles of stroke datasets for training Stroke-GAN. Section 3.3 presents Stroke-GAN, which takes values obtained from stroke selection to generate style strokes, which are then painted on the canvas C_n. Section 3.4 describes the painting process and stroke selection optimization.

Stroke modeling
The artistic style of a painting is affected by the stroke style. The stroke designer module creates different styles of strokes to provide training datasets for Stroke-GAN. Inspired by previous studies [12,15], our stroke designer module considers the following variables: P denotes a set of control points of a Bézier curve, S denotes a set controlling the size of a geometric shape, T denotes a set controlling stroke transparency, and V denotes a set controlling stroke color.

Stroke datasets
The stroke designer module provides different stroke-style datasets for training Stroke-GAN. Each stroke dataset includes 200,000 stroke images, each with 64 × 64 resolution. We currently provide stroke datasets in three diverse styles, although others could be added: a watercolor-stroke dataset, an oil-painting-stroke dataset, and a pastel-stroke dataset. The pastel-stroke dataset is from Ref. [15], while both the oil-painting-stroke dataset and the watercolor-stroke dataset are provided by our stroke designer module. We model a stroke by its path and tip, using a Bézier curve (BC) to construct the stroke path. Circles with variable radii are used to simulate the stroke tip: each stroke is made from 100 variable circles moving along the BC, which is given by Eq. (1):

B(u) = (1 − u)² P_0 + 2u(1 − u) P_1 + u² P_2,  u ∈ [0, 1]   (1)

where P_i denotes the control point with coordinates (x_i, y_i) ∈ P.
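As an illustrative sketch of this stroke construction, the following numpy code rasterizes a stroke as 100 circles of varying radius moving along a quadratic Bézier curve. The quadratic form of the curve, the linear radius interpolation, and all parameter values here are assumptions for illustration, not the authors' exact design.

```python
import numpy as np

def bezier_stroke(p0, p1, p2, r0, r1, size=64, steps=100):
    """Rasterize a stroke: `steps` filled circles of linearly interpolated
    radius moving along a quadratic Bezier curve with control points p0-p2."""
    canvas = np.zeros((size, size), dtype=np.float32)
    ys, xs = np.mgrid[0:size, 0:size]
    for t in np.linspace(0.0, 1.0, steps):
        # Quadratic Bezier point: B(t) = (1-t)^2 P0 + 2t(1-t) P1 + t^2 P2
        x = (1 - t) ** 2 * p0[0] + 2 * t * (1 - t) * p1[0] + t ** 2 * p2[0]
        y = (1 - t) ** 2 * p0[1] + 2 * t * (1 - t) * p1[1] + t ** 2 * p2[1]
        r = (1 - t) * r0 + t * r1  # stroke-tip radius varies along the path
        canvas[(xs - x) ** 2 + (ys - y) ** 2 <= r ** 2] = 1.0
    return canvas

# Hypothetical control points and radii on a 64 x 64 stroke image.
stroke = bezier_stroke((5, 5), (32, 60), (58, 10), r0=2.0, r1=5.0)
```

In the real stroke designer, the control points, radii, transparency, and color would be drawn from the sets P, S, T, and V described above.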

Basics
In order to improve the painting quality and the fidelity of the strokes, we have designed a three-player GAN model, Stroke-GAN, to generate stylized strokes for the painting process. Stroke-GAN is the core component which allows our model to use a unified framework to produce diverse artistic styles of paintings from a given image. Its third player, the coloring module, allows the preservation of high-fidelity details. Paintings contain several elements: lines, textures, colors, and so on [30]. Our Stroke-GAN is designed to generate strokes containing these elements; the stroke style has a major influence on the painting style. As Stroke-GAN has an end-to-end training model, our painter framework has the flexibility to change the artistic style by choosing differently-trained Stroke-GAN models. We do not need to train the whole painter on any image datasets, since the model is designed to learn to paint from novice to veteran ability.

Motivation for Stroke-GAN
Our design for Stroke-GAN follows the Deep Convolutional Generative Adversarial Network (DCGAN) [31], which offers more stable training than conventional GANs. However, the strokes generated by a normal DCGAN (i.e., a two-player game) have stained colors. It is unreasonable to try to reproduce the painting process of human artists by rendering strokes in stained colors. To address this problem, we design a second generator (the third player) to recolor strokes with three color parameters to generate pure-colored strokes, which are better for painting (closer to the strokes painted by human artists).

Three-player-game of Stroke-GAN
In order to obtain pure-colored strokes suitable for painting, we provide a coloring module as a second generator G′ immediately following the normal generator G (essentially a convolutional neural network), as shown in Fig. 3. Stroke-GAN thus consists of a normal generator G, a coloring module G′, and a discriminator D. Let h_t denote the one-dimensional (1-D) vector used as the random noise to generate strokes, and let h_t^s and h_t^c denote the 1-D vectors fed into G and G′, respectively, so that h_t is the concatenation of h_t^s and h_t^c. After being fed with h_t^s, G learns to generate stroke images, although stained-colored ones; after being fed with h_t^c, G′ learns to generate pure-color strokes. The discriminator first determines whether a stroke generated by G is valid or not, and then does the same for the stroke image generated by G′. Both G and G′ are updated adversarially against D. Since the randomness of DCGAN leads to unexpected stroke colors, we build the second generator G′ (the coloring module) to control the stroke color. In particular, the vector h_t consists of 50 elements: the normal generator G takes the 47-element vector h_t^s and outputs the stroke image G(h_t^s), while the remaining 3 elements h_t^c control the stroke color {r, g, b} fed into G′. Stroke-GAN thus improves on the original DCGAN by purifying the colors of strokes (see Fig. 3): the coloring module takes h_t^c together with the stroke G(h_t^s) and learns to purify the stroke.
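The vector split described above can be sketched as follows. That the 3 color elements occupy the tail of h_t is an assumption for illustration; the text only fixes the sizes (47 + 3 = 50).

```python
import numpy as np

def split_latent(h_t):
    """Split the 50-element latent vector: 47 elements drive the stroke
    generator G, and 3 elements are the {r, g, b} color fed to the
    coloring module G' (tail position assumed for illustration)."""
    assert h_t.shape == (50,)
    h_s = h_t[:47]  # stroke path/shape/size/transparency code for G
    h_c = h_t[47:]  # color triple {r, g, b} for G'
    return h_s, h_c

h_t = np.random.rand(50)
h_s, h_c = split_latent(h_t)
```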
Since the stroke image contains just the background and the stroke, we can use threshold segmentation [32] to separate the stroke region from the background. Let P = [p_1, . . . , p_c] denote the matrix of pixels in the stroke image, where c is the number of channels of the image, and let P̄ denote the average of P. Since the pixel values of the background in the stroke image generated by G are unknown, we train G to find the threshold of the background pixels, denoted by γ. We use threshold segmentation to eliminate the background region from the stroke image; the result, denoted by P′, is obtained as Eq. (2):

P′ = |P − γ|   (2)

The values of the elements in P′ are 0 or close to 0 in the background region. Using min(·) and max(·) to denote the minimum and maximum values in P′, we determine the stroke region, denoted by S_P, as Eq. (3):

S_P = (P′ − min(P′)) / (max(P′) − min(P′))   (3)

In particular, elements of S_P close to 1 are stroke pixels, and values close to 0 represent background pixels. We then use h_t^c to recolor the stroke and obtain the pure stroke image P_S as Eq. (4):

P_S = S_P ⊗ h_t^c   (4)

where ⊗ scales each color channel of the stroke mask by the corresponding component of h_t^c = {r, g, b}. The coloring module endows our painter with more creativity in painting, e.g., outputting paintings with different colors even from the same input image, because the coloring module in Stroke-GAN takes in h_t^c directly to recolor the stroke image by learning the color information of the input. This is also why Stroke-GAN can be trained without the data-labelling restriction of generating an image identical to a labeled image: even when Stroke-GAN generates strokes different from those in the dataset, the coloring module learns color information by analyzing the input image directly, ensuring that the rendered canvas stays close to the input image. The design of the coloring module thus plays an important role in ensuring that the GAN generates realistic human strokes.
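A minimal numpy sketch of this segment-and-recolor step, interpreting Eqs. (2)–(4) as background suppression by the threshold γ, min–max normalization to a soft stroke mask, and channel-wise tinting. The channel-averaging step and the fixed γ value are illustrative assumptions; in Stroke-GAN, γ is learned and the step is realized by the coloring module G′.

```python
import numpy as np

def recolor_stroke(stroke, color, gamma):
    """Recolor a stained stroke image (H, W, C in [0, 1]) to the pure
    color `color` = (r, g, b): suppress the background near the
    threshold gamma, normalize to a stroke mask S_P, then tint it."""
    p = np.abs(stroke - gamma)            # background pixels -> ~0
    p = p.mean(axis=2)                    # collapse channels to one mask
    s_p = (p - p.min()) / (p.max() - p.min() + 1e-8)  # stroke ~1, bg ~0
    return s_p[..., None] * np.asarray(color)         # pure-color stroke

# Illustrative stained stroke and target color; gamma is assumed known here.
stroke = np.random.rand(64, 64, 3)
pure = recolor_stroke(stroke, (0.9, 0.2, 0.1), gamma=0.05)
```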
Moreover, this design also allows Stroke-GAN to easily learn different styles of strokes. Therefore, while Stroke-GAN utilizes a unified framework, it can generate different styles of paintings.

Training of Stroke-GAN
We train Stroke-GAN to acquire different stroke models, endowing our painter with the ability to paint with different stroke styles. Figure 3 depicts the structure of Stroke-GAN. Stroke-GAN is trained with mini-batches of 128 images for each stroke dataset of 200,000 images. We use the whole dataset as the training set, since the DCGAN model has no need for validation. When training Stroke-GAN, we directly feed the generator with a 1-D vector (h_t) to generate a stroke image. The initial values of the elements in h_t are random; during training, the generator learns to produce images close to the dataset by optimizing these values. We use the Adam optimizer to train the Stroke-GAN model, with a learning rate of 0.0002 and beta values of 0.5 and 0.999. Each pair comprising a generated stroke image and a real stroke image is then fed into the discriminator, which determines whether the pair is valid or not: if the generated stroke image is similar to the real stroke image, the pair is valid, and invalid otherwise.
The training procedure for Stroke-GAN is given in Algorithm 1. We essentially train the normal generator G, the coloring module G′, and the discriminator D by back-propagating the loss, to update the parameters θ_g, θ_c, and θ_d, respectively. In particular, the discriminator D has the loss ℓ_d, the generator G has the loss ℓ_g, and the generator G′ has the loss ℓ_c. We use binary cross entropy (BCE), denoted by ℓ(z, y), to measure the loss of the input sample z against the label y over T samples, as Eq. (5):

ℓ(z, y) = −(1/T) Σ_{t=1}^{T} [y log z_t + (1 − y) log(1 − z_t)]   (5)

Algorithm 1 Training procedure for Stroke-GAN
1: for each training epoch do
2:   for each mini-batch of size k do
3:     Sample real data {x_1, . . . , x_k} from the stroke dataset;
4:     Sample noise vectors h_t and split each into h_t^s and h_t^c;
5:     Generate strokes G(h_t^s) and pure-color strokes G′(G(h_t^s), h_t^c);
6:     Train D with the real loss added to the fake-loss functions;
7:     Set label = 1;
8:     Train G with the loss function BCELoss(D(G(h_t^s)), label);
9:     Input h_t^c;
10:    Train G′ with the loss function BCELoss(D(G′(G(h_t^s), h_t^c)), label);
11:  end for
12: end for

When training the discriminator on real stroke images (denoted by x), y = 1, so using Eq. (5), the loss ℓ_dr for real strokes is

ℓ_dr = ℓ(D(x), 1)

When training the discriminator on fake stroke images generated by G, we have y = 0, so the loss for fake strokes, ℓ_da, is

ℓ_da = ℓ(D(G(h_t^s)), 0)

When training the discriminator on fake stroke images generated by G′, we again have y = 0, so the loss for fake strokes, ℓ_db, is

ℓ_db = ℓ(D(G′(G(h_t^s), h_t^c)), 0)

The entire loss of the discriminator is

ℓ_d = ℓ_dr + ℓ_da + ℓ_db

Similarly, the normal generator G has the loss

ℓ_g = ℓ(D(G(h_t^s)), 1)

and the loss function of the coloring module is

ℓ_c = ℓ(D(G′(G(h_t^s), h_t^c)), 1)

where the strokes in G(h_t^s) are visually stained. The coloring module learns to compute the content of the stroke region in G(h_t^s) and to recolor the stroke with h_t^c.

Figure 4(a) compares sample strokes generated by Stroke-GAN with and without the coloring module (CM) for various stroke datasets, and Fig. 4(b) shows the convergence of Stroke-GAN on different stroke datasets. The stained-color strokes generated by Stroke-GAN W/O CM cannot mimic human artists' strokes, while the pure-color strokes generated by Stroke-GAN W CM are close to human artists' strokes, benefiting the quality of the content. Although both variants can generate strokes similar to the given strokes, the former generates parti-colored strokes while the latter generates pure-colored strokes, which are better as they are closer to those of human artists. Stroke-GAN can learn any style of strokes given an appropriate training dataset. In this paper, we use three stroke datasets: watercolor, oil-painting, and pastel stroke datasets, and save the trained models as Style 1, Style 2, and Style 3, respectively. We then choose the corresponding model to give a certain artistic style. In summary, Stroke-GAN has the following merits. Firstly, it can recolor the stroke thanks to the design of the coloring module, thereby improving the artistic creativity of the painter; for example, the colors of the painting can be recreated close to, but not the same as, those of the input reference image. Secondly, it can flexibly learn any style of strokes as long as a stroke dataset is available, owing to its completeness and independence.
Thirdly, it enables end-to-end training, so it can easily be applied in painting models for various artistic styles by choosing different Stroke-GAN models.
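To make the training losses above concrete, here is a plain-numpy sketch of the BCE objectives. The discriminator outputs are made-up sample values, and summing the real and two fake terms into the discriminator loss is an assumption consistent with the text.

```python
import numpy as np

def bce(z, y):
    """Binary cross entropy, Eq. (5): z holds discriminator outputs in
    (0, 1); y is the target label (1 = real, 0 = fake)."""
    z = np.clip(z, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(z) + (1 - y) * np.log(1 - z)))

# Illustrative discriminator outputs for a mini-batch of three strokes.
d_real = np.array([0.9, 0.8, 0.95])    # D(x) on real strokes
d_fake_g = np.array([0.1, 0.2, 0.15])  # D(G(h_t^s)) on stained strokes
d_fake_c = np.array([0.2, 0.1, 0.25])  # D(G'(G(h_t^s), h_t^c)) on recolored strokes

loss_d = bce(d_real, 1) + bce(d_fake_g, 0) + bce(d_fake_c, 0)  # discriminator
loss_g = bce(d_fake_g, 1)  # generator G tries to fool D
loss_c = bce(d_fake_c, 1)  # coloring module G' tries to fool D
```

In the actual model these losses would be back-propagated to update θ_d, θ_g, and θ_c with Adam, as described in the training section.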

Basics
We endow our painter with the capability to paint in diverse artistic styles. Besides using the diverse stroke styles generated by Stroke-GAN, we also use a feature-extraction network (FEN) to extract the contents of reference images. After processing by the FEN, the original reference image may lose some content but retains the core information of the image. We design the rendering module with an FEN and an optimization algorithm: the FEN processes the features of the reference image and the canvas, while the optimization algorithm "picks" well-behaved strokes for rendering the canvas.

Painting process
We design our painter to mimic the painting process used by human artists painting in a given style. Painting is conducted in a coarse-to-fine manner, in which our painter learns to progress from scratch to a finely-detailed painting over multiple painting processes. Stroke-GAN generates a sequence of strokes at one time and the rendering module optimizes stroke selection (see Section 3.4.3) to "render" these strokes onto the canvas. The painting process is shown in Fig. 5. The painting model consists of the rendering module, stroke selection, the canvas, and Stroke-GAN. The rendering module optimizes the stroke selection, and Stroke-GAN continually generates the strokes used to paint. Stroke-GAN Painter learns to paint from novice to veteran ability: we observe that the painting quality improves by repeating multiple learning-to-paint processes, becoming better with increased n. The reference image is denoted by U_0, and C_n denotes the state of the canvas after the n-th painting process (see Section 3.3). One h_t in the stroke selection is used to pick one stroke (generated by Stroke-GAN). For each painting iteration, the stroke selection outputs a set H_n of size T to let Stroke-GAN generate T strokes. One painting process is finished when all T strokes have been rendered onto the canvas, i.e., C_n is completed for some n.
Each stroke generated by Stroke-GAN is essentially an image of 64 × 64 pixels. The canvas is therefore divided into grid cells for rendering convenience, with each cell also 64 × 64 pixels. We render T strokes onto the canvas cell-by-cell to finish one painting process. In each cell, the content at the stroke position in the new stroke image replaces that at the corresponding position. Stroke-GAN produces T strokes at a time, where T equals the number of strokes in each cell multiplied by the h × w cells, and the strokes are rendered sequentially in grid order. Stroke-GAN runs once in one painting process. The transition function C_{n+1} = f(C_n, H_{n+1}) maps the current canvas state and stroke set to the next canvas state. Stroke selection first outputs an initial set H_1; each element in H_1 determines the effect of rendering a stroke at a certain position on the canvas. The rendering module then optimizes stroke selection via the stroke-selection-optimization algorithm and generates a new set of elements H_n, which are used to render strokes onto the canvas to obtain C_n. We continue this coarse-to-fine process from C_n to C_{n+1}, where C_{n+1} denotes a finer-grained painting (with optimized strokes). Continuing the above process, we finally obtain the best painting C_N.
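The cell-by-cell rendering described above can be sketched as follows. The one-stroke-per-cell setting and the rule that non-zero stroke pixels overwrite the canvas are simplifying assumptions for illustration.

```python
import numpy as np

def render_grid(canvas, strokes, cell=64):
    """Render one painting pass: paste T stroke images cell-by-cell onto
    the canvas in grid order. Pixels in the stroke region of each new
    stroke image replace the canvas content at the same position."""
    h_cells = canvas.shape[0] // cell
    w_cells = canvas.shape[1] // cell
    assert len(strokes) == h_cells * w_cells  # one stroke per cell (assumed)
    out = canvas.copy()
    for idx, s in enumerate(strokes):
        i, j = divmod(idx, w_cells)           # grid order: row by row
        patch = out[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
        mask = s > 0                          # stroke region of the stroke image
        patch[mask] = s[mask]                 # replace content at stroke positions
    return out

# Illustrative 2 x 2 grid of cells, so T = 4 strokes for one painting pass.
canvas = np.zeros((128, 128), dtype=np.float32)
strokes = [np.random.rand(64, 64) for _ in range(4)]
new_canvas = render_grid(canvas, strokes)
```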

Stroke selection optimization
A rendering module is used to optimize stroke selection; it consists of an FEN and the optimization algorithm. During the painting process, the rendering module first feeds in both the reference image and the painted canvas to compute the distance between them, and then optimizes the stroke selection. Stroke selection picks well-behaved strokes generated by Stroke-GAN, ensuring that the state of the canvas C_{n+1} is better than C_n. This procedure continues until one painting process is complete, and the painting C_N is generated after N painting processes. The learning-to-paint process works in a coarse-to-fine manner.
It is a key task to optimize the strokes generated by Stroke-GAN in the stroke selection step. The rendering module first extracts the feature maps of U_0 and C_n, and then processes the difference between the input and the canvas by computing their ℓ1-distance. We denote the extracted feature maps of the input reference image U_0 and those of the painted canvas C_n by F(U_0) = {I_j | j = 1, . . . , M} and F(C_n) = {c_j | j = 1, . . . , M}, respectively, where M is the number of features extracted by the neural rendering module. In particular, I_j and c_j denote the features of U_0 and those of a certain state of the canvas C_n, respectively. We calculate the ℓ1-distance loss function L(U_0, C_n) as

L(U_0, C_n) = Σ_{j=1}^{M} ||I_j − c_j||_1

This essentially computes the distance between the features of the reference image U_0 and those of the canvas C_n. The stroke selection algorithm optimizes the generated strokes by resetting the values of the elements in h_t based on this loss. The function f(C_n, H_{n+1}) is computed by the back-propagation algorithm for L(U_0, C_n). Each C_n (a state of the canvas) is rendered by T strokes, and each stroke is produced according to the values in h_t; therefore, the values of the elements in each h_t can be updated as needed by back-propagation of L(U_0, C_n).
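A minimal sketch of this ℓ1 feature loss, with random arrays standing in for the FEN feature maps; the unnormalized sum over feature maps is an assumption.

```python
import numpy as np

def l1_feature_loss(feats_ref, feats_canvas):
    """L(U_0, C_n) = sum_j || I_j - c_j ||_1 over the M feature maps
    extracted from the reference image and the painted canvas."""
    return float(sum(np.abs(i - c).sum()
                     for i, c in zip(feats_ref, feats_canvas)))

# Illustrative feature maps (M = 3, each 8 x 8) standing in for FEN outputs.
feats_ref = [np.random.rand(8, 8) for _ in range(3)]
feats_canvas = [np.random.rand(8, 8) for _ in range(3)]
loss = l1_feature_loss(feats_ref, feats_canvas)
```

In the actual painter this loss would be back-propagated to update the elements of each h_t, i.e., the stroke selection.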

Basics
As explained in Section 3.3, Stroke-GAN is the core component for generating the style, but the rendering module also contributes to the stylization of the whole painting. In particular, we use the FEN in the rendering module to extract features of the reference image as well as of the painted canvas. This process loses some content of the original image but yields paintings close to the original image: the generated painting differs from, yet remains similar to, the reference image, providing artistic mimesis or "realism". We reconstruct the painting style by using Stroke-GAN and the FEN in the rendering module. In particular, Stroke-GAN endows the painter with diverse styles of strokes, and the FEN in the rendering module creates the artistic style.

Artistic style
Since Ref. [15] indicates that the content objective preserves only high-level features while the parameterization can fill in details, we also take only high-level features as inputs. We adopt two of the most representative deep neural networks, GoogleNet [33] and residual nets (ResNets) [34], for the rendering module. Using the FEN to process the original reference image, rather than using the original image directly, endows our painter with artistic creativity while retaining a high similarity to the reference image. We focus on realism in oil paintings, so we use ResNet to build the FEN in the rendering module: ResNets have high accuracy when extracting information [34], and so preserve high fidelity in the extracted features. On the other hand, features extracted by GoogleNet are relatively sparse and so may offer more space for painterly creativity; we therefore use GoogleNet for watercolor and pastel paintings.

Stroke style
Different strokes can produce different styles of artwork even when used by the same human artist. We provide three kinds of strokes to endow our painter with more creativity. Figure 6 shows different stroke styles for watercolors, oil-painting, and pastels. The watercolor strokes have smooth, soft contours and the brush paths are simple and pure. In contrast, oil-painting strokes have sharp contours and volatile paths. When stacking multiple strokes on the canvas, the oil-painting texture can be recognised easily due to these characteristics. The pastel strokes seem to be accumulated from many uneven points (mimicking the granular textures of pastel paintings). These different styles of strokes cause the canvas to show different styles of painting. After utilizing different FENs to process the reference image in conjunction with different types of strokes, we obtain different styles of paintings.

Experimental results
We have evaluated our approach with several experiments. We first describe the implementation. Then, we evaluate three styles of paintings generated by our painter and compare the output of the proposed Stroke-GAN Painter to state-of-the-art methods. Finally, we study alternatives to understand how our Stroke-GAN Painter generates different styles of paintings by fine-tuning the design.

Implementation
Our experiments were conducted on a workstation with an i7-7700k CPU and an NVIDIA Titan RTX GPU. We evaluated our painter on three image datasets: CelebA [35], ImageNet [36], and real-world photos. These images cover various types of content including portraits, landscapes, animals, plants, and buildings. All images used in experiments are labelled by "Img No.". Style 1 (Stroke-GAN model 1) and the rendering module using GoogleNet were used to generate watercolor paintings, Style 2 and ResNet were used to generate oil paintings, and Style 3 and GoogleNet were used to generate pastel paintings.

Comparison of stroke styles
We compare paintings generated by different styles of strokes on CelebA, ImageNet, and real-world photos. In Figs. 7 and 8, the top row shows the inputs, and the images in successive rows show oil painting, watercolor painting, and pastel painting results. Img 39 and Img 12 were randomly selected from CelebA [35], Img 32 and Img 35 were randomly selected from ImageNet [36], and the others are real-world photos. Figures 7 and 8 show that all generated paintings exhibit different styles in contrast to the reference images. In particular, the oil-painting-stroke paintings in the second row well preserve textures, lines, and color features, consequently capturing fine details of the reference images; we also observe stroke textures typical of oil paintings, demonstrating oil-painting stylization. The watercolor-stroke paintings in the third row exhibit a style between pastel paintings and oil paintings; this style is good at expressing details of scenery and building images (see Imgs 26, 15, 11, 21, and 23). The pastel-stroke paintings in the last row preserve some textures and lines while losing some color features. Figure 9 plots ℓ1-distances between the generated images and the reference images. Specifically, Fig. 9(a) plots the ℓ1-distance versus painting times for the three stroke styles. We observe that all three styles nearly converge after 300 painting iterations, although the oil-painting stroke style converges faster than the other two; the pastel-stroke style converges the slowest since it loses more content detail. Figure 9(b) compares convergence for three types of image datasets using the same watercolor-stroke style. It takes 200 painting iterations to recreate the images in CelebA, 300 iterations for images from ImageNet, and 400 for real-world photos. The portrait images of CelebA are relatively easier to learn than those of ImageNet and real-world photos, as they have fewer features.

Methods
We further evaluate our painter by comparing it to various state-of-the-art learning-based methods, including Neural Painter (NP) [15], MDRL Painter (MDRLP) [12], SNP [22], and PaintTF [23]; these outperform other learning methods and traditional SBR methods. We do so using two representative artistic styles for our model: pastel-stroke painting and oil-stroke painting. We use NP for comparative pastel-stroke paintings, and MDRLP, SNP, and PaintTF for comparative oil paintings. To obtain the best paintings from the compared methods, we use the authors' pre-trained models and default parameter values.
Fig. 7 Three artistic styles of paintings generated by Stroke-GAN Painter using images from CelebA [35], ImageNet [36], and a real-world photo (Img 15).

Qualitative comparison
Figures 10 and 11 compare paintings generated by our painter and the compared models. Figure 10 compares pastel paintings generated by NP and our painter. Our painter generates images with more details and textures than NP. For example, we cannot see facial texture or teeth in the woman's portrait generated by NP, while the image generated by our painter preserves those details well, thereby looking more vivid. NP [15] only generates fixed strokes, while our Stroke-GAN generates variable strokes thanks to the coloring module; this allows strokes to be tuned to the input image, thus retaining more details.
Fig. 10 Comparison to the prior method: pastel-stroke paintings generated by our painter and NP [15].
Figure 11 compares oil paintings generated by MDRLP, SNP, PaintTF, and our painter. Images generated by our painter suffer less content loss than those generated by MDRLP, SNP, and PaintTF. This is quite obvious when comparing inset close-ups: e.g., our painter preserves details of the man's eyes and mouth, and the textures of the motorcycle and the cloud. Our model uses the independent Stroke-GAN to generate strokes with diverse shapes and variable sizes, so it can depict detailed content. In contrast, the brushstrokes used in SNP and PaintTF have few shape variants, as they are generated directly by the entire models; in particular, those models use only two stroke shapes, despite varying stroke sizes and angles. The stroke-texture representations also differ among the compared methods. MDRLP presents stroke textures but loses some content, since it has no dedicated process during stroking to mimic stroke textures; adding more strokes and painting steps makes the result more similar to the input photo rather than to a painting. SNP and PaintTF present stroke textures by adding a stroke-texture mask after stroke generation; in other words, a stroke contains no texture when generated, and the textures have no effect on stroke optimization. Our Stroke-GAN Painter renders stroke texture by generating a sharp contour that mimics the thick edge of oil paint in a stroke; the painting results therefore present irregular-line textures rather than brush textures.

Quantitative comparison
To further compare the quality of paintings generated by Stroke-GAN Painter and the other methods, we conducted a two-step user study inspired by Refs. [37,38]. For fairness, the experiments were blind trials, in which users did not know which paintings were generated by which methods. User Study I investigated relative preferences for paintings generated by these methods. User Study II investigated preservation of detailed content and stroke textures in paintings generated by these methods.
User Study I. User Study I used two questionnaires, the first to compare pastel-style artworks and the second to compare oil-painting-style artworks. Since User Study I was designed to evaluate people's preferences for artworks generated by different methods, we did not distinguish participants by background, although they came from both artistic and non-artistic backgrounds.
In the first questionnaire, we arbitrarily chose 20 images from CelebA [35] (3 images), ImageNet [36] (6 images), and real-world photos (11 photos); the images covered various types of content including landscapes, buildings, animals, and portraits. The first group of participants had various backgrounds (10% with artistic training), ages (17-50), and genders (44 females, 43 males). We evaluated pastel-stroke paintings generated by our painter and NP [15]. For each reference image (images numbered 1-20 in Fig. 12), we obtained a pair of images painted by our painter and NP. To evaluate user preference and stylization, we asked participants to choose which image in each pair better represented a pastel-stroke painting and which they preferred. Figure 12(a) depicts the results. Most users picked the results created by Stroke-GAN Painter as their preference for all pairs of paintings; our paintings gained 77% of all votes on average. These high vote counts imply that paintings from Stroke-GAN Painter present pastel-painting style better than the compared ones.
Similarly, the second questionnaire evaluated the oil-painting effect, comparing our painter, MDRLP [12], SNP [22], and PaintTF [23]. The second group of participants was also chosen to have diverse backgrounds, ages, and genders (40 females, 32 males). We again selected 20 images from CelebA, ImageNet, and real-world photos to cover different content types (images numbered 21-40 in Fig. 12). We asked participants to choose which image in each set was closest to an oil painting and which they preferred. Again, our method received the largest share of votes (31% among the four methods) for presenting oil-painting style better than the other methods.
User Study II. We used the second user study to compare paintings generated by our Stroke-GAN Painter and the other methods in terms of content detail and stroke textures. We again used two questionnaires (on a Likert scale [39]), for pastel paintings and oil paintings separately. The participants were divided into two groups: users with and without an artistic background. Participants were chosen from various age groups and genders (20 females, 5 males) for each questionnaire. We compared the average score (µ), variance (σ), and 95% confidence interval for paintings generated by each method. Tables 1 and 2 give results for the two user groups, respectively.
Fig. 12 (a) Pastel-stroke painting results created by Stroke-GAN Painter and NP [15]. (b) Oil-painting results created by Stroke-GAN Painter, MDRLP [12], SNP [22], and PaintTF [23]. Vertical axis: percentage of users' preferences for an image. Horizontal axis: numbered image pairs.
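The statistics reported in Tables 1 and 2 (mean µ, variance, and 95% confidence interval) can be computed as sketched below; the function name and ratings are hypothetical, and the interval uses a normal approximation:

```python
import math

def likert_summary(scores, z=1.96):
    """Mean, population variance, and normal-approximation 95% CI."""
    n = len(scores)
    mu = sum(scores) / n
    var = sum((s - mu) ** 2 for s in scores) / n
    half = z * math.sqrt(var / n)  # half-width of the confidence interval
    return mu, var, (mu - half, mu + half)

# Hypothetical ratings on a 1-5 Likert scale.
ratings = [4, 5, 3, 4, 4, 5, 3, 4, 4, 5]
mu, var, (lo, hi) = likert_summary(ratings)
print(f"mu={mu:.3f}, var={var:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

With more participants the interval half-width shrinks as 1/√n, which is why the bounds reported in the tables are fairly tight.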
In Table 1 (users without an artistic background), the content details of paintings generated by MDRLP [12], PaintTF [23], and SNP [22] gained low scores (below 3). One reason is that their paintings lose too many details; the stroke textures generated by MDRLP [12] are also difficult for most users to recognize. In contrast, the paintings generated by Stroke-GAN Painter have better scores for both content details and stroke textures. Similarly, in Table 2 (users with an artistic background), our method again gained higher evaluation scores than the other methods. Comparing Table 1 to Table 2, users without an artistic background gave higher scores than users with one in most cases. Nevertheless, both groups rated our paintings higher than those of the other methods. In particular, the average score (µ) for content details reaches 3.940 with an upper bound of 4.060 (95% confidence interval) in Table 1, and in Table 2 the average score for stroke texture reaches 3.956 with an upper bound of 4.141. Interestingly, for stroke textures, users without an artistic background gave a higher score (2.816) to MDRLP [12] than users with an artistic background did (2.583), and the highest score given by users without an artistic background was 3.468 (our method). However, the opposite holds for users with an artistic background: their scores for SNP [22] and PaintTF [23] are higher than for our method, although our score of 3.604 is close to both.
In summary, both the pastel and oil paintings from our method contain more detailed content than those of the compared methods. For non-artistic users, stroke textures are not well presented by most methods, while artistic users consider that PaintTF [23], SNP [22], and our method (both pastel and oil painting) present stroke textures well.
User Study III. To evaluate the aesthetics of the results, we further conducted User Study III on the color tone and aesthetic beauty of the output paintings. We again used two questionnaires (on a Likert scale [39]), for pastel-stroke paintings and oil-painting-stroke paintings separately. The input images were those used in User Study II. The participants were divided into two groups: users with an artistic background (15) and users without one (19). The participants came from various age groups (21-40), with 18 females and 16 males, for each questionnaire. We compared the average score (µ), variance (σ), and 95% confidence interval for paintings generated by each method. Tables 3 and 4 give results for the two user groups.
In Table 3, the scores for color tone and aesthetic beauty given by users without an artistic background to the pastel paintings (NP [15] and our method) are higher than 3, whereas the oil paintings obtained lower scores. In particular, paintings generated by PaintTF [23] and SNP [22] gained much lower scores than our method and MDRLP [12] for aesthetic beauty. As the content scores in Table 1 (also given by users without an artistic background) are likewise below 3, we see that the paintings generated by PaintTF [23] and SNP [22] lose too much content detail, so these users give them low scores for beauty. Although users without an artistic background did not give high scores, their rankings of the compared methods for aesthetic beauty and color tone remain the same. On the other hand, users with an artistic background gave higher scores than users without one. In Table 4, paintings generated by all compared methods obtain scores higher than 3; in particular, our paintings score 4.217 for color tone and 3.937 for aesthetic beauty. Comparing Tables 3 and 4, our method, MDRLP [12], PaintTF [23], and SNP [22] achieve much higher scores than NP [15]. Evaluations of the pastel-stroke paintings generated by NP differ little between users with and without artistic backgrounds, but the difference between the two groups is obvious for the oil-painting-style paintings. For example, for aesthetic beauty, users without an artistic background rate paintings by PaintTF higher than those by SNP, while users with an artistic background rate them lower. Overall, however, the evaluations of the two groups are broadly consistent.
In particular, for both color tone and aesthetic beauty, our method is better than the others, while PaintTF [23] and SNP [22] rank lowest. Considering Tables 1-4 overall, our method scores the highest among the state-of-the-art methods. NP performs well for both content and color tone. MDRLP performs well for color tone. PaintTF and SNP perform well for stroke texture.

Alternatives
In this section, we investigate how our painter generates different styles of paintings when using alternative FENs and strokes.

Feature-extraction network
Recall that we chose GoogleNet as the FEN for watercolor and pastel-stroke images, and ResNet as the FEN for oil-painting images. We now consider a new FEN combining GoogleNet and ResNet, denoted G+R, to generate paintings. We denote the ℓ1-distance of features extracted by GoogleNet and the ℓ1-distance of features extracted by ResNet by L_G(U_0, C_n) and L_R(U_0, C_n), respectively: see Eq. (13). The loss function of the G+R network can be written as

L(U_0, C_n) = 0.5 L_G(U_0, C_n) + 0.5 L_R(U_0, C_n)    (14)

We only perform backpropagation on the final L(U_0, C_n). Figure 13 compares results generated by the three different FENs and four stroke styles, Styles 1-4. Style 4 is a new stroke designed with a hollow circle and a cubic Bézier curve; its stroke dataset is generated using a similar method to that for Style 1. We see from Fig. 13 that the model using GoogleNet+ResNet as the FEN can generate a style of artwork combining the characteristics of pastel and oil-painting styles. Interestingly, the FEN essentially determines the artistic style of a painting, while the stroke style affects its painting style. We therefore infer that various combinations of FENs and stroke styles can create a diversity of artistic painting styles.
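Eq. (14) simply averages the two feature-space ℓ1-distances before a single backward pass. The sketch below illustrates the combined loss with fixed random linear projections standing in for the GoogleNet and ResNet feature extractors; the projection shapes and image size are illustrative assumptions, not our actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def l1_feature_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """l1-distance between two feature tensors (in the spirit of Eq. (13))."""
    return float(np.mean(np.abs(feats_a - feats_b)))

def combined_loss(canvas, reference, extract_g, extract_r):
    """Eq. (14): L(U0, Cn) = 0.5 * L_G(U0, Cn) + 0.5 * L_R(U0, Cn).

    `extract_g` / `extract_r` stand in for the GoogleNet and ResNet
    feature extractors; only this final combined value is optimized.
    """
    loss_g = l1_feature_distance(extract_g(canvas), extract_g(reference))
    loss_r = l1_feature_distance(extract_r(canvas), extract_r(reference))
    return 0.5 * loss_g + 0.5 * loss_r

# Toy stand-in extractors: fixed random linear projections of the image.
w_g = rng.standard_normal((12, 12))
w_r = rng.standard_normal((12, 12))
extract_g = lambda img: img.reshape(-1, 12) @ w_g
extract_r = lambda img: img.reshape(-1, 12) @ w_r

canvas = rng.random((4, 4, 3))      # 4x4 RGB canvas, rows of 12 values
reference = rng.random((4, 4, 3))
loss = combined_loss(canvas, reference, extract_g, extract_r)
```

The equal 0.5/0.5 weighting balances the two backbones; other weightings would bias the painting toward one backbone's notion of similarity.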

Number of strokes
We next investigate the impact of the number of strokes. In particular, the canvas is divided into h×w cells, and we generate images with various numbers of strokes per cell. Figure 14 depicts paintings generated using different numbers of strokes with the same FEN. The image generated using 2 strokes per cell looks colorful and artistically creative, although it loses much content detail. Larger numbers of strokes (8 or 10) lead to an exquisite image, much closer to the reference image than images generated with fewer strokes (2 or 5). An adjustable number of strokes thus provides users with artistic choices.
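The cell grid makes the total stroke budget easy to reason about: with a fixed number of strokes per cell, the total grows linearly in that number. A minimal sketch, where the 512×512 canvas size and 64-pixel cells are illustrative assumptions:

```python
def stroke_budget(height: int, width: int, cell: int,
                  strokes_per_cell: int) -> int:
    """Total strokes when the canvas is split into square cells of
    `cell` pixels; ceiling division keeps partial border cells."""
    h_cells = -(-height // cell)   # ceil(height / cell)
    w_cells = -(-width // cell)
    return h_cells * w_cells * strokes_per_cell

# A 512x512 canvas on an 8x8 grid of 64-pixel cells:
budgets = {n: stroke_budget(512, 512, 64, n) for n in (2, 5, 8, 10)}
```

For the 2, 5, 8, and 10 strokes per cell used in Fig. 14, this hypothetical grid yields 128, 320, 512, and 640 strokes in total, which is why the richer settings recover far more detail.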

Conclusions and future work
In this paper, we have presented a stroke-based image rendering approach to mimic the human painting process and generate different styles of paintings.
In particular, we designed Stroke-GAN to generate various styles of strokes. We model the painting process as a stroke-state-optimization process, which can be optimized by a deep convolutional neural network. Our artistic painter can generate different styles of paintings in a coarse-to-fine fashion, like a human painter. User studies comparing paintings generated by Stroke-GAN Painter and competing methods demonstrate that our painter gained the most votes for closeness to pastel paintings and oil paintings. Moreover, images generated by our painter preserve more content detail than those of existing methods. A deep learning algorithm and Stroke-GAN are used to decompose the reference image into a grid of cells, allowing a sequence of strokes to be rendered for a stroke-by-stroke effect. We designed Stroke-GAN to generate style strokes by learning from stroke datasets; it can learn any stroke style, providing the painting agent with creativity and flexibility. Although our generated paintings do not yet rival masters' artworks, we have made an important step towards learning-based AI painting with creative and flexible artistic styles. Much can still be done to improve output quality, and other artistic styles not covered in this paper could also be emulated. In future work, we plan to combine the advantages of conventional stroke-based methods with learning-based methods to improve painting quality, and to develop new style-transfer methods that work stroke by stroke to enrich the artistic styles of AI painting.

Availability of data and materials
The data and materials generated during the study are available from the corresponding authors on reasonable request.