Estimating reflectance and shape of objects from a single cartoon-shaded image

Although many photorealistic relighting methods provide a way to change the illumination of objects in a digital photograph, it is currently difficult to relight digital illustrations having a cartoon shading style. The main difference between photorealistic and cartoon shading styles is that cartoon shading is characterized by soft color quantization and nonlinear color variations that cause noticeable reconstruction errors under a physical reflectance assumption, such as Lambertian reflection. To handle this non-photorealistic shading property, we focus on shading analysis of the most fundamental cartoon shading technique. Based on the color map shading representation, we propose a simple method to determine the input shading as that of a smooth shape with a nonlinear reflectance property. We have conducted simple ground-truth evaluations to compare our results to those obtained by other approaches.


Introduction
Despite recent progress in 3D computer graphics techniques, traditional cartoon shading styles remain popular for 2D digital art. Artists can use a variety of commercial software (e.g., Adobe® Photoshop, Corel® Painter) to design their own expressive shading styles. Although the design principle used roughly follows a physical illumination model, editing is restricted to 2D drawing operations. We are interested in exploring new interactions which allow relighting of a painted shading style given a single input image.
Reconstructing surface shape and reflectance from a single image is known as the shape-from-shading problem [1]. Based on the fundamental problem setting, most relighting approaches assume shading follows a Lambertian model [2][3][4]. Although these approaches work well for photorealistic images, they often fail to interpret cartoon shading styles in digital illustrations.
The main difference between photorealistic and cartoon shading styles is that cartoon shading is characterized by nonlinear color variation with soft quantization. The designed shading is typically more quantized than the inherent surface shape and its illumination. This assumption is common in many 3D stylized rendering techniques which use a color map representation [5][6][7] that simply converts smooth 3D illumination to an artistic shading style. As shown in Fig. 1, this simple mechanism can produce a variety of shading styles with different quantization effects. However, such stylization makes it more difficult for shading analysis to reconstruct a surface shape and reflectance from the resulting shading.
In this paper, we propose a simple shading analysis method to recover a reasonable shading representation from the input quantized shading. As a first step, we focus on the most fundamental cartoon shading [6]. Our primary assumption is that the main nonlinear factor in the final shading can be encoded by a color map function. With this in mind, we aim to reconstruct a smooth surface field and a nonlinear reflectance property from the input shading. Using these estimated data, our method provides a way to change the illumination of the input image with its quantized shading style.
To evaluate our approach, we conducted a simple pilot study using a prepared set of 3D models and color maps with a variety of stylization inputs. The proposed method was quantitatively compared to related approaches, which provided several key insights regarding relighting stylized shading.

Related work
Color mapping is a common approach used to generate stylized appearances in comics or illustrations. In stylized rendering of a 3D scene, the color map representation is used to convert smooth 3D illumination into quantized nonlinear shading effects [5][6][7]. Similar conversion techniques are used in 2D image abstraction methods for photorealistic images or videos [8][9][10][11]. As a starting point, our work follows the basic assumption that stylized shading appearance is based on a smooth surface shape. Previous shape reconstruction methods for painted illustrations also attempt to recover a smooth surface shape from the limited information provided by feature lines. Lumo [12] generates an approximate normal field by interpolating normals on region boundaries and interior contours. Sýkora et al. [13] extended this approach with a simple set of user annotations to recover full 3D shape for global illumination rendering. CrossShade [14] enables the user to design cross-section curves for better control of the constructed normal field.
The CrossShade technique was extended by Iarussi et al. [15] to construct generalized bend fields from rough sketches in a bitmap form. However, these approaches only focus on shape modeling from the boundary constraints. The recently proposed inverse toon shading [16] modeling framework also follows the strategy of modeling normal fields by designing isophote curves. In this work, the interpolation scheme requires manual editing to design two sets of isophotes with different illumination conditions for robust interpolation. In addition, reliable isophote values are also assumed. In contrast, our objective is to use a single cartoon-shaded image to provide a shading representation that contains both a shape and a nonlinear color map reflectance.
An entire illumination constraint is considered in the well-known shape-from-shading (SFS) problem [1] for photorealistic images. Since the problem is severely ill-posed, accurate surface reconstruction requires skilled user interaction [3,4,17]. The user must specify shape constraints to reduce the solution space of the SFS problem. To reduce user burden, another class of approach suggests rough approximation from luminance gradients [2,18] that can be tolerated by human perception. However, such approaches assume a photorealistic reflectance model, which often results in large reconstruction errors for the nonlinear shading in digital illustrations.
Motivated by these considerations, we attempt to leverage limited cartoon shading information to model a smooth surface shape and nonlinear reflectance to reproduce the original shading appearance.

Shading model assumptions
As proposed in the technique of cartoon shading [6], we assume a color map representation is used to reproduce the artist's nonlinear shading effects. Figure 2 illustrates the basic cartoon shading process. In this model, shading color $c \in \mathbb{R}^3$ is computed as follows:
$$c = M(I) \quad (1)$$
where $I \in \mathbb{R}$ is the luminance value of the illumination, and $M : \mathbb{R} \to \mathbb{R}^3$ is a 1D color map function which converts the luminance value to the final shading color. For a diffuse shading material, we set $I = L \cdot N$, where $L$ is a light vector and $N$ is the surface normal vector. We are interested in manipulating $L$ to $L'$ to produce a new lighting result, i.e., $c' = M(L' \cdot N)$. However, the inverse problem is ill-posed if only shading color $c$ is available. The primary consideration of this paper is that we limit the solution space for other factors while preserving the final shading appearance. Some basic assumptions considered in this paper are as follows.
• Smooth shape and illumination. We assume that the surface shape $N$ and the illumination $I$ are smooth and follow a linear relationship. The only nonlinear factor is the color map function $M$, which is used to produce the stylized shading appearance.
• Monotonic function for color map. For the color map function $M$, we assume a monotonic relation between image luminance $I_c$ (obtained from $c$) and surface illumination $I$. This assumption is important to simplify our problem definition as a variation of a photorealistic relighting problem.
• Diffuse lighting for illumination. We analyze all shading effects as due to diffuse lighting. We do not explicitly model specular reflections and shadows in our shading analysis experiments.

Methods
Figure 3 illustrates the main process of the proposed shading analysis and relighting approach. Here we provide the primary objective and summarize each step.
• Initial normal estimation. First, an initial normal field $N_0$ is required as input for the reflectance estimation and normal refinement steps. Since the reflectance property is not available, we simply approximate a smooth rounded normal field from the silhouette.
• Reflectance estimation. Given the initial normal field $N_0$, we estimate a key light direction $L$ and a color map function $M$ which best fit $c = M(L \cdot N_0)$. This decomposition result roughly matches the original shading $c$ for the given $N_0$.
• Normal refinement. Since the estimated decomposition does not exactly satisfy $c = M(L \cdot N_0)$, we refine the surface normal $N_0$ to $N$ to reproduce the original shading $c$.
• Relighting. Based on the above analysis results, the proposed method can relight the given input illustration. We change the light vector $L$ to $L'$ to obtain the final shading color $c' = M(L' \cdot N)$.
In the following sections, each step of the proposed shading analysis and relighting approaches is described in detail.

Initial normal estimation
For the target region $\Omega$, we can obtain a rounded normal field $N_0$ from the silhouette inflation constraints [12,13]:
$$N_0(p) = N_{\partial\Omega}(p), \quad p \in \partial\Omega \quad (2)$$
where $N_{\partial\Omega} = (N_{\partial\Omega x}, N_{\partial\Omega y}, 0)$ is the normal constraint from the silhouette $\partial\Omega$. These normals are propagated to the interior of $\Omega$ using a diffusion method [19]. As shown in Fig. 4, we can obtain a smooth initial normal field $N_0$ as a rounded shape.
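The silhouette inflation step can be illustrated with a short Python sketch. It is only a minimal sketch under assumptions that are not part of the text: the function name `initial_normals` is hypothetical, silhouette normals are taken from the gradient of a binary region mask, and plain Jacobi neighbor averaging stands in for the diffusion method of [19]. The mask is assumed to have a margin of empty pixels around the region.

```python
import numpy as np

def initial_normals(mask, iterations=500):
    """Rounded normal field from a binary region mask (H, W).

    Assumption: silhouette normals come from the mask gradient and are
    propagated inward by Jacobi averaging (a stand-in for diffusion).
    The y component follows image row order (positive = downward).
    """
    m = mask.astype(float)
    gy, gx = np.gradient(m)            # derivatives along rows, columns
    nx, ny = -gx, -gy                  # outward direction on the silhouette
    norm = np.hypot(nx, ny)
    boundary = mask & (norm > 0)
    nx = np.where(boundary, nx / np.maximum(norm, 1e-8), 0.0)
    ny = np.where(boundary, ny / np.maximum(norm, 1e-8), 0.0)
    interior = mask & ~boundary

    # Propagate the fixed silhouette normals into the interior.
    for _ in range(iterations):
        for f in (nx, ny):
            avg = 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0)
                          + np.roll(f, 1, 1) + np.roll(f, -1, 1))
            f[interior] = avg[interior]

    # Recover the z component so that every normal has unit length.
    nz = np.sqrt(np.maximum(1.0 - nx**2 - ny**2, 0.0))
    N = np.stack([nx, ny, nz], axis=-1)
    N[~mask] = 0.0
    return N
```

For a disk-shaped mask this produces normals that lie in the image plane on the silhouette and face the viewer at the center, which corresponds to the rounded shape described above.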

Reflectance estimation
Once the initial normal field $N_0$ has been obtained, our system estimates reflectance factors based on the cartoon shading representation $c = M(L \cdot N)$. The reflectance estimation process takes the original color $c$ and the initial normal $N_0$ as inputs to estimate the light direction $L$ and the color map function $M$. We assume that the scene is illuminated by a single key light direction (i.e., $L$ is the same for the entire image). The color map function $M$ is estimated for each target object.
In the early stage of our experiments, we observed that the key light estimation step was significantly affected by the input material style and shape. Our simple experiment is summarized in the Appendix. Since $L$ is a key factor in the following estimation steps, we assume that a reliable light direction is provided by the user. In our evaluation, we used a predefined ground-truth light direction $L_t$ to observe errors caused by the other estimation steps.
Color map estimation. Given the smooth illumination result $I_0 = L \cdot N_0$, we estimate a color map function $M$ to fit $c = M(I_0)$. As shown in Fig. 5, isophote pixels of $I_0$ do not provide the same color as $c$. Therefore, a straightforward minimization of $\int_\Omega \|c - M(I_0)\|^2$ produces a blurred color map $M$.
To avoid this invalid correspondence between $I_0$ and $c$, we force monotonicity by sorting the target pixels in dark-to-bright order as shown in Fig. 6. From the sorted pixels, we can obtain a valid correspondence between each luminance range $[I_i, I_{i+1}]$ and the shading color $c_i$ in the same luminance order. As a result, a color map function $M$ is recovered as a lookup table for obtaining $c_i$ from $[I_i, I_{i+1}]$. We also construct the corresponding inverse map $M^{-1}$, which is an additional lookup table to retrieve the luminance range $[I_i, I_{i+1}]$ from a shading color $c_i$.
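The sorting-based correspondence can be sketched in Python as follows. This is an illustrative sketch rather than the exact procedure: the function names, the Rec. 601 luma weights used to order colors, and the storage of $M$ as a step lookup table (`bounds`, `palette`) are all assumptions introduced here.

```python
import numpy as np

def luminance(colors):
    # Assumption: Rec. 601 luma weights are used to order the colors.
    return colors @ np.array([0.299, 0.587, 0.114])

def estimate_color_map(I0, colors):
    """Pair illumination values and colors by rank (dark to bright).

    I0: illumination L . N0 per pixel; colors: matching RGB values.
    Returns (bounds, palette): color palette[i] covers the luminance
    range [bounds[i], bounds[i + 1]].
    """
    sorted_I = np.sort(I0.ravel())
    flat = colors.reshape(-1, 3)
    order = np.argsort(luminance(flat), kind="stable")
    sorted_c = flat[order]
    # Start index of each run of identical colors in the sorted sequence.
    change = np.any(sorted_c[1:] != sorted_c[:-1], axis=1)
    starts = np.concatenate(([0], np.nonzero(change)[0] + 1))
    palette = sorted_c[starts]
    bounds = np.concatenate((sorted_I[starts], [sorted_I[-1]]))
    return bounds, palette

def apply_color_map(bounds, palette, I):
    """Evaluate the step lookup table M at illumination values I."""
    idx = np.searchsorted(bounds, I, side="right") - 1
    return palette[np.clip(idx, 0, len(palette) - 1)]
```

In this representation the inverse map needs no extra storage: the luminance range of the color `palette[i]` is simply `(bounds[i], bounds[i + 1])`.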

Normal refinement
As shown in the right image of Fig. 6, the shading result of $M(L \cdot N_0)$ does not match $c$ perfectly. Here we consider refining the normal $N_0$ to reproduce the original color $c$ by minimizing the following objective function:
$$E(N) = \int_\Omega \|c - M(L \cdot N)\|^2 + \lambda \int_\Omega \|\Delta N\|^2 \quad (3)$$
where $\int_\Omega \|c - M(L \cdot N)\|^2$ forces the shading function to match the input shading, $\int_\Omega \|\Delta N\|^2$ is a smoothness constraint, and $\lambda$ is a regularization factor for the smoothness constraint. Estimating $N$ from Eq. (3) is not straightforward due to the nonlinear function $M$.
To address this issue, we provide the following complementary objective function to Eq. (3):
$$E(N) = \int_\Omega \|M^{-1}(c) - L \cdot N\|^2 + \lambda \int_\Omega \|\Delta N\|^2 \quad (4)$$
where $M^{-1} : \mathbb{R}^3 \to \mathbb{R}$ is the inverse function of $M$, which changes the appearance constraint into an illumination constraint. Since the constraint becomes a simple quadratic function, it can be minimized using the Gauss-Seidel method with successive over-relaxation until convergence to a local minimum. When the inverse function $M^{-1}$ is simply defined from the image luminance $I_c$, Eq. (4) is the same as the photorealistic formulation suggested in a previous study [4]:
$$E(N) = \int_\Omega \|k'_d I_c - L \cdot N\|^2 + \lambda \int_\Omega \|\Delta N\|^2 \quad (5)$$
where $k'_d = 1/k_d$ is the reciprocal of the diffuse reflectance constant. In our case, we can define the inverse function $M^{-1}$ from the estimated color map function $M$. Figure 7 illustrates the illumination constraints for the normal refinement process. From the color map estimation process described in Section 4.2, the luminance range $[I_i, I_{i+1}]$ is known for each shading color $c_i$. Therefore, the illumination is restricted by the following conditions:
$$I_i \le L \cdot N(p) \le I_{i+1}, \quad p \in C_i \quad (6)$$
where $C_i := \{p \in \Omega \mid c(p) = c_i\}$ is the quantized color area and the illumination $L \cdot N(p)$ is constrained to $[I_i, I_{i+1}]$. We solve the problem by minimizing the following energy:
$$E(N) = E_I(N) + \lambda \int_\Omega \|\Delta N\|^2 \quad (7)$$
where $E_I(N) = \sum_i \int_{C_i} P_i(L \cdot N)$ is the luminance range constraint with penalty functions $P_i$. We define a penalty function $P_i$ for each $C_i$ that penalizes illumination values $L \cdot N$ outside the range $[I_i, I_{i+1}]$. The normal $N$ is updated iteratively from the estimated initial normal $N_0$ in Gauss-Seidel iterations. Here we chose $\lambda = 1.5$ to obtain the refinement result. Compared to the initial normal $N_0$, the refined normal $N$ better fits the original color $c$.
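The luminance range constraint can be made concrete with a small Python sketch. Everything specific in it is an assumption for illustration: the quadratic form of the penalty, the function names, and the helper `project_to_range`, which satisfies the range constraint alone by rotating each normal in the plane it spans with $L$. It does not include the smoothness term or the Gauss-Seidel solver described above, and it assumes unit-length $L$ and normals that are not parallel to $L$.

```python
import numpy as np

def range_penalty(I, lo, hi):
    """Assumed quadratic penalty: zero inside [lo, hi], squared distance outside."""
    below = np.maximum(lo - I, 0.0)
    above = np.maximum(I - hi, 0.0)
    return below**2 + above**2

def range_energy(N, L, lo, hi):
    """Discrete luminance range energy: sum of penalties of L . N over pixels."""
    return float(np.sum(range_penalty(N @ L, lo, hi)))

def project_to_range(N, L, lo, hi, eps=1e-8):
    """Rotate unit normals so that L . N falls inside the per-pixel range.

    Hypothetical helper: handles the data term only (no smoothness).
    """
    I = N @ L                                   # current illumination (H, W)
    target = np.clip(I, lo, hi)                 # nearest admissible value
    T = N - I[..., None] * L                    # part of N perpendicular to L
    T = T / np.maximum(np.linalg.norm(T, axis=-1, keepdims=True), eps)
    b = np.sqrt(np.maximum(1.0 - target**2, 0.0))
    return target[..., None] * L + b[..., None] * T
```

Here `lo` and `hi` hold, for every pixel, the bounds $I_i$ and $I_{i+1}$ of the color area the pixel belongs to. Pixels whose illumination already lies inside the range are left unchanged by the projection.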

Relighting
Based on the cartoon shading representation $c = M(L \cdot N)$, our system enables lighting interactions for the input illustration. We can obtain a relighting result $c'$ by changing the light vector $L$ to $L'$ as follows:
$$c' = M(L' \cdot N) \quad (9)$$
where the estimated factors $M$ and $N$ are preserved in the relighting process.
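A minimal Python sketch of this relighting step is given below. It assumes, for illustration only, that the color map $M$ is stored as a step lookup table with luminance thresholds `bounds` and colors `palette`; the function name `shade` is likewise hypothetical.

```python
import numpy as np

def shade(N, L, bounds, palette):
    """Evaluate c = M(L . N) with M stored as a step lookup table.

    N: (H, W, 3) unit normals; L: light vector (normalized here);
    palette[i] is used for illumination in [bounds[i], bounds[i + 1]].
    """
    L = np.asarray(L, dtype=float)
    L = L / np.linalg.norm(L)
    I = N @ L                                        # diffuse illumination
    idx = np.searchsorted(bounds, I, side="right") - 1
    return palette[np.clip(idx, 0, len(palette) - 1)]

# Relighting keeps N and M fixed and only swaps the light vector.
N = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]])
bounds = np.array([0.0, 0.5, 1.0])
palette = np.array([[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]])
c_original = shade(N, [0.0, 0.0, 1.0], bounds, palette)
c_relit = shade(N, [1.0, 0.0, 0.0], bounds, palette)
```

In this toy example the bright and dark pixels swap when the light moves from the view direction to the side, while the two-tone quantization of the color map is preserved.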

Evaluation of shading analysis
To evaluate our shading analysis approach, we conducted a simple pilot study via a ground-truth comparison. We compare our estimated results with several existing approaches and ground-truth inputs.

Experimental design
To generate a variety of stylized appearances, we first prepared shape and color map datasets (see Fig. 8).
Shape dataset. We prepared 20 ground-truth 3D models having varying shape complexity and recognizability. This dataset includes 7 simple primitive shapes and 13 other shapes from 3D shape repositories. Each ground-truth model is rendered from a specific viewpoint to generate a 512 × 512 normal field.
Color map dataset. To better understand real situations, we extracted color maps from existing digital illustrations. We selected a small portion of a material area with a stroke. Then the selected pixels were simply sorted in luminance order to obtain a color map. We extracted more than 100 material areas from different digital illustration sources. From the extracted color maps, we selected 24 distinctive color maps with different quantization effects.
Given the ground-truth normal field $N_t$ and color map $M_t$, a final input image was obtained by $c_t = M_t(L_t \cdot N_t)$. Note that we also provide a ground-truth light direction $L_t$ in our evaluation process.

Comparison of reflectance models
We first compared the visual difference between our target cartoon shading model and a common photorealistic Lambertian model as shown in Fig. 9.
To obtain an ambient color $k_a$ and a diffuse reflectance color $k_d$ for the Lambertian shading representation $c = k_a + k_d I$, we minimized $\|M(I) - (k_a + k_d I)\|^2$ with the input color map function $M$. The color difference suggests that cartoon shading includes some nonlinear parts, which cannot be described by a simple Lambertian model. We will discuss how this nonlinear reflectance property affects the estimation results.
Figure 10 summarizes a comparison of our estimation results with those from Lumo [12] and the Lambertian assumption [4]. To simulate Lumo, we used the silhouette inflation constraints of the initial normal estimation in Eq. (2). For the Lambertian assumption, we used the illumination constraint in Eq. (5) with a small value $\lambda = 1.0$ to fit the input image luminance $I_c$. In all examples, we used our color map estimation method (Section 4.2) to reproduce the original shading appearance.
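The Lambertian fit $c = k_a + k_d I$ used in the reflectance model comparison is an ordinary linear least-squares problem, which the following Python sketch illustrates. The function name `fit_lambertian` and the choice of sampling the color map at discrete luminance values are assumptions made for this example.

```python
import numpy as np

def fit_lambertian(I, M_samples):
    """Least-squares fit of c = k_a + k_d * I to color map samples.

    I: (n,) luminance samples; M_samples: (n, 3) colors M(I).
    Returns per-channel ambient color k_a and diffuse color k_d.
    """
    A = np.stack([np.ones_like(I), I], axis=1)       # columns: 1, I
    coef, *_ = np.linalg.lstsq(A, M_samples, rcond=None)
    k_a, k_d = coef
    return k_a, k_d

# A two-tone (quantized) color map cannot be matched by a linear model.
I = np.linspace(0.0, 1.0, 5)
M_step = np.where(I[:, None] < 0.5, 0.2, 0.8) * np.ones(3)
k_a, k_d = fit_lambertian(I, M_step)
fit_error = np.sum((M_step - (k_a + np.outer(I, k_d))) ** 2)
```

A color map that is already linear in $I$ is reproduced exactly by this fit, whereas the residual `fit_error` for the two-tone map stays clearly above zero, which is the kind of nonlinear difference discussed above.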

Shading analysis
As shown in Fig. 10, Lumo cannot produce the details of illumination due to the lack of inner shading constraints. The Lambertian assumption recovers the original shading appearance well; however, the estimated normal field is overfitted to the quantized illumination. Although our method distributes certain shading errors near the boundaries of the color areas, it produces a relatively smooth normal field and illumination that are both similar to the ground-truth. Figure 11 summarizes the shading analysis results for different material settings. Although our method cannot recover the same shape from different quantization styles, the estimated normal field is smoother than the input shading.
We also compute the mean squared error (MSE) to compare estimated results quantitatively (see . In each comparison, we used the same shape and changed materials for computing the shape estimation errors.
Note that our method tends to produce smaller errors for simple rounded shapes, but the errors become larger than those of the Lambertian assumption for more complex shapes. For a complex shape like the Pulley shown in Fig. 15, even the Lambertian assumption results in large errors. Since initial normal estimation errors become large in such cases, our method fails to recover a valid shape when only minimizing the appearance error. We provide further discussion of initial normal estimation errors in Section 7.
Fig. 10 Comparison of shading analysis results with Lumo [12] and Lambertian assumption [4]. The proposed method reproduces the original shading appearance similar to the Lambertian assumption with a smooth normal field as in Lumo.
Though the estimated shape may not be accurate, our method successfully reduces the influence of the material difference in all comparisons. Thanks to the proposed shading analysis based on the cartoon shading model assumption, our method regulates estimated reflectance properties for various quantization settings.

Relighting
Figure 16 and the supplemental videos in the Electronic Supplementary Material (ESM) summarize a comparison of our relighting results with those from Lumo [12] and using the Lambertian assumption in Ref. [4]. In all examples, we first estimate the shading representations in the shading analysis step. Then we use the analysis data to produce relighting results.
Fig. 16 Comparison of our relighting results with those from Lumo [12] and using the Lambertian assumption in Ref. [4]. The shading analysis shows the estimated shading results from the input ground-truth light direction and shading. The analysis data are used to produce the following relighting results. Our method can produce dynamic illumination changes from the input light directions as in Lumo, which are less noticeable in the Lambertian assumption. The details of the shapes are also preserved in our method.
As in the discussion in the previous evaluation of the shading analysis, the proposed method and the Lambertian assumption can preserve the original shading appearance in the shading analysis step. However, the Lambertian assumption tends to be strongly affected by the initial input illumination, so that dynamic illumination changes from the input light directions are less noticeable in the relighting results. On the other hand, the proposed method and Lumo can produce dynamic illumination changes that are similar to the ground-truth relighting results. The proposed method cannot fully recover the details of the ground-truth shape; however, our shading decomposition result can provide both dynamic illumination changes and details of the target shape.

Real illustration examples
We have tested our shading analysis approach on different shading styles using three real illustrations. Figure 17 shows relighting results for one of them; the others are included in the supplemental videos in the ESM. The material regions are relatively simple, but each material region is painted with different quantization effects.
To apply our shading analysis and relighting methods, we first manually segmented material regions for the target illustration. We also provide a key light direction $L$ for the target illustration, which is needed for our reflectance estimation step.
Fig. 17 Relighting sequence using the proposed method. Non-diffuse parts are limited to static transitions with simple residual representation.
Fig. 18 Reflectance and shape estimation results for a real illustration. Non-diffuse parts are encoded as residual shading.
Figure 18 illustrates the elements of reflectance and shape estimation results for the illustration. Compared to the ideal cartoon shading in our evaluations, a material region in the real examples may include non-diffuse parts. As suggested by a photorealistic illumination estimation method [20], we encode such specular and shadow effects as residual differences $\Delta c = c - M(L \cdot N)$ from our assumed shading representation $c = M(L \cdot N)$. Finally, we obtain relighting results as $c' = M(L' \cdot N) + \Delta c$ by changing the light direction $L'$.
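The residual representation can be sketched in a few lines of Python. The function name `relight_with_residual` is hypothetical, and clamping the result to the range [0, 1] is an added assumption that the text does not specify. The inputs `shaded` and `shaded_new` stand for $M(L \cdot N)$ and $M(L' \cdot N)$, computed by whatever color map evaluation is in use.

```python
import numpy as np

def relight_with_residual(c, shaded, shaded_new):
    """Compute c' = M(L' . N) + (c - M(L . N)).

    c: original image colors; shaded: M(L . N) under the original light;
    shaded_new: M(L' . N) under the new light. All arrays share a shape.
    """
    residual = c - shaded                    # non-diffuse part, kept static
    # Assumption: clamp to the displayable range [0, 1].
    return np.clip(shaded_new + residual, 0.0, 1.0)
```

Because the residual does not depend on $L'$, specular and shadow details stay fixed while the diffuse term changes, which matches the static behavior of non-diffuse parts noted in the captions of Figs. 17 and 18.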
As shown in Fig. 17 and the supplemental videos in the ESM, the residual representation can recover the appearance of the original shading. We also note that our initial experiment produced possible shading transitions for diffuse lighting, while specular and shadow effects are relatively static.

Discussion and future work
In this paper, we have demonstrated a new shading analysis framework for cartoon-shaded objects. The visual appearance of the relighting results is improved by the proposed shading analysis. We incorporate color map shading representation in our shading analysis approach, which enables shading decomposition into a smooth normal field and a nonlinear color map reflectance. We have introduced a new way to provide lighting interaction with digital illustrations; however, there are several things left to accomplish.
Firstly, our method requires a reliable light direction which is provided by the user. Since the light estimation method in the Appendix is significantly affected by the input shading, more friendly and robust cartoon shading estimation approaches are needed. We consider that a perceptually motivated approach [21] might be suitable.
Secondly, the method minimizes the appearance error, because a shading image is the only input. This results in an under-constrained problem to estimate both shape and reflectance. Actually, our method achieves almost the same appearance as the input. As shown in Fig. 19, the proposed method cannot recover the input shape even if the material has Lambertian reflectance with full illumination constraints. Although the recovered shape satisfies appearance similarity with the color map that is estimated in advance, we need a better solution space to obtain a plausible shape. Since a desirable shape is typically different for different users, we plan to integrate user constraints [3,4,14] for normal refinement. More robust iterated refinement cycles of shape and reflectance estimations are to be desired.
Another limitation is that our initial normal field approximation assumes the shape to be convex. This causes errors noticeable in complex shapes such as the Pulley, as shown in Fig. 19. Currently, we also plan to incorporate interior contours for concave constraints as suggested by Lumo [12]. Even though we require a robust edge detection process to define suitable normal constraints for various illustration styles, this is a promising direction for future work that may yield a more pleasing initial normal field.
Although large collections of 2D digital illustrations are available online, we cannot directly apply our method since we require manual segmentation. A crucial area of future research is to automate albedo estimation, as suggested by intrinsic images [22,23]. While our initial experiments with manual segmentation produced possible shading transitions via the diffuse shading assumption, our method cannot fully encode additional specular and shadow effects. Therefore, incorporating such specular and shadow models is important future work for more practical situations. Such shading effects are often designed using non-photorealistic principles; however, we hope that our approach will provide a promising direction for new 2.5D image representations of digital illustrations.

Appendix Light estimation
In the early stage of our experiments, we tried to estimate the key light direction L from the input shading c and the estimated initial normal N 0 .
As suggested by Ref. [4], we approximate the problem using Lambertian reflectance $I_c = k_d \, L \cdot N_0$, where the diffuse term $L \cdot N_0$ is simply scaled by the diffuse constant $k_d$. For the input illumination $I_c$, we compute the luminance value from the original color $c$ as the L component in Lab color space. We estimate the light vector $L$ by minimizing the following energy:
$$E(\tilde{L}) = \int_\Omega \|I_c - \tilde{L} \cdot N_0\|^2 \quad (10)$$
where $\tilde{L}$ is given by $\tilde{L} = k_d L$. We finally obtain the unit light vector $L = \tilde{L} / \|\tilde{L}\|$ by normalizing $\tilde{L}$. The diffuse reflectance constant $k_d$ is optionally computed from $k_d = \|\tilde{L}\|$.
Figure 20 summarizes our experiment for light estimation. In this experiment, we give a single ground-truth light direction $L_t$ (top left) to generate the input cartoon-shaded image $c_t$ and then estimate a key light direction $L$ by solving Eq. (10).
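Since the energy in Eq. (10) is quadratic in the scaled light vector, it can be minimized by linear least squares. The Python sketch below illustrates this under the assumption that the inputs are already restricted to pixels inside the target region; the function name `estimate_light` is hypothetical.

```python
import numpy as np

def estimate_light(I_c, N0):
    """Least-squares light estimation under a Lambertian assumption.

    I_c: image luminance per pixel; N0: matching (..., 3) initial normals.
    Returns the unit light vector L and the diffuse constant k_d.
    """
    A = N0.reshape(-1, 3)                    # one normal per row
    b = np.asarray(I_c, dtype=float).ravel()
    L_scaled, *_ = np.linalg.lstsq(A, b, rcond=None)   # equals k_d * L
    k_d = np.linalg.norm(L_scaled)
    return L_scaled / k_d, k_d
```

For an input that really is Lambertian, the fit returns the true light direction. For quantized cartoon shading, $I_c$ is no longer linear in $L \cdot N_0$, so the same fit can be biased.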
It can be observed that the estimated results look consistent with near-Lambertian materials (the left 3 maps) but inconsistent with more stylized materials (the right 3 maps). Another important factor is the shape complexity. The estimated light direction is relatively consistent with rounded smooth shapes. However, the light estimation error becomes quite large when the input model contains many crease edges, especially around the silhouette.
The result suggests that we require additional constraints to improve light estimation. In this paper, we simply provide a ground-truth light direction for evaluation, or a user-given reliable light direction for relighting real illustration examples.