Sketching human character animations by composing sequences from large motion database
Cite this article as: Yoo, I., Vanek, J., Nizovtseva, M. et al. Vis Comput (2014) 30: 213. doi:10.1007/s00371-013-0797-1
Quick creation of 3D character animations is an important task in game design, simulations, forensic animation, education, training, and more. We present a framework for creating 3D animated characters using a simple sketching interface coupled with a large, unannotated motion database that is used to find the appropriate motion sequences corresponding to the input sketches. Contrary to previous work, which deals with static sketches, our input sketches can be enhanced by motion and rotation curves that improve matching in the context of the existing animation sequences. Our framework uses animated sequences as the basic building blocks of the final animated scenes and allows for various operations on them, such as trimming, resampling, or connecting by means of blending and interpolation. A database of significant and unique poses, together with a two-pass search running on the GPU, allows for interactive matching even for large numbers of poses in a template database. The system provides intuitive interfaces and immediate feedback, and it places very small demands on the user. A user study showed that the system can be used by novice users with no animation experience or artistic talent, as well as by users with an animation background. Both groups were able to create animated scenes consisting of complex and varied actions in less than 20 minutes.
Keywords: Sketching · Character animation · Interactive systems
Animating a 3D character is a challenging task that has been approached from three main directions. In artistic 3D animation, the animator uses a variety of techniques, such as keyframe animation, parameter curve editing, inverse and forward kinematics (IK/FK), and multiple targets morphing to craft the character poses and motions. In data-driven animation (e.g., motion capture), live motion is recorded directly from an actor, digitized, and then mapped onto a 3D character. In procedural animation, a computational model is used to create and control the motion, e.g., the animator sets conditions for some type of physical or behavioral simulation.
Although existing 3D animation systems provide powerful tools that are appropriate for precise character posing and realistic motion, they are expensive, and have a steep learning curve. The goal of rapidly posing 3D articulated figures, although addressed by previous work, is not fully solved. A good inspiration is traditional 2D animation where the experienced animator sketches the main character poses (keyframes) and the rough character movements using pencil and paper and the assistant animator draws the missing in-betweens.
Using 2D sketching to create 3D animation allows the user to take advantage of the benefits of 2D modeling: intuitiveness, fast pose definition, and quick production of simple, first-pass animated scenes. This paper is not the first to recognize the potential of sketches for 3D animation. The problem has already been addressed in computer graphics by various researchers [5, 19, 21, 22, 34]. However, most of the previous work uses 2D annotated sketches and matches each sketch with a single pose, either by extracting the 3D pose directly from the sketch or by matching and obtaining a single pose from an existing database.
Extracting motion sequences, rather than individual poses, from hand-drawn sketches could be an effective method for creating simple, first-pass 3D character animations because it provides frame-to-frame coherence and motion continuity. The key observation of our approach is that an animated scene can be effectively built by combining sequences from the database that contain poses corresponding to the input sketches. In other words, the user's input 2D pose is used within the context of an existing animation. For instance, when an animator sketches the walk "contact" pose, it is very likely that the intention is to animate a walking motion. With our approach, the contact pose sketched by the animator is considered to be part of a walking sequence. Therefore, the animator can select the identified walking sequence, or part of it, and is relieved of the task of drawing additional walking positions. However, in order to contextualize the pose within the motion, additional information must be provided. We use simple motion and rotation curve sketches to define the motion. In this way, our approach is an extension of the previous work, because it also allows for animation using single poses.
The main contributions of this work are:
- using sequences from the motion database as basic animation building blocks, with the detected pose considered in the context of the animation;
- a novel intuitive sketching interface that allows for a quick definition of the character and its movements through strokes defining joint motion and rotation; and
- efficient matching of character poses with animation curves from a large, unannotated database of motion sequences, running in parallel on the GPU.
2 Previous work
Using sketches as 2D input to generate full 3D models has been successfully applied in various areas of computer graphics. Among the first attempts were the sketch-based modeling systems SKETCH and Teddy, which used input strokes for mesh creation. These works have been followed by many others, such as Gingold et al., who created 3D freeform surfaces from 2D sketches, or Rivers et al., who presented an approach for the creation of 3D models from silhouettes. Recently, Lee et al. presented a system where the user draws a single sketch, and an improved shadow image is derived automatically. Their system generates objects with correct proportions and spacing and could be applied to 2D character drawing. Another area where 2D sketching has been successfully used is facial expressions. Ulicny et al. used sketching for crowds. Thorne et al. used a predefined vocabulary of 18 motions to create high-level control for sketching a 3D animation.
Sketching as an input for articulated characters has also been suggested. An approach closely related to ours is the work of Davis et al., who created a sketch-based system where the user draws a 2D stick model with annotated joints and bones, and the system reconstructs possible 3D poses to create the final animation. This system, however, uses stick figures as the input, whereas ours uses simple freehand strokes that are more intuitive to many users. Mao et al. expanded this approach by online joint and bone recognition during the sketching procedure. They applied a projection method to the numerical solution by considering the joint range of motion and joint grouping. Annotated poses have also been used to create 3D animated character meshes, and physics has recently been used to enhance the control of animated characters. Hecker et al. presented a keyframing and posing system for animating characters with unknown morphologies that uses an inverse kinematics solver to animate the character. Recently, Lo and Zwicker presented a system that allows animation of characters by sketching a motion trajectory combined with a search in a motion database. However, their approach does not allow creating the pose itself; the pose must already exist.
The aforementioned methods provided easy ways for 3D stick-figure reconstruction and creation of animation. The problem with these forward approaches is that they are prone to generating false positives; thus, the user is typically asked to correct the output. One preferred solution is using databases of motion-capture data. Kovar et al. introduced motion graphs that create motions following desired paths from motion-capture data. Arikan et al. used cut-and-paste of motion-capture data to generate natural-looking animations. Another application, MotionMaster, requires two user inputs: a labeled sketch with joint locations and a trajectory (motion curve). This system can find and refine a 3D Kung-Fu motion-capture sequence. Since MotionMaster required detailed initial sketches, Jianyuan et al. introduced an easier approach that quickly creates natural motion from gestures. However, their method also required a predefined skeleton. Lin et al. used a similar approach where the input is an annotated stick figure drawn in a predefined camera position. The extracted pose is then used for matching against a database of motion-captured animations. However, their approach requires user interaction during the semiautomated matching. Recently, Wei and Chai used an approach similar to ours that exploits a motion-capture database with predefined poses. This method has unique features such as natural-pose creation using probabilities, inverse-kinematics constraints for overcoming the unnatural-posing problem, and a probabilistic learning system trained to generate millions of poses. However, this approach still requires the user to identify certain key features in the input sketch, rendering the rapid creation of animation problematic. Chao et al. proposed a motion retrieval and search method that analyzes joint trajectories using spherical harmonic functions. Jain et al. used a 2D-to-3D sketching system with a motion-capture database. Similar to our approach, their 3D poses are projected and compared to 2D poses. Contrary to their method, where the user is required to create virtual markers that aid the matching, ours is fully automated. An important difference is that our framework matches the pose within the context of predefined motion sequences.
Several approaches related to our work allow for efficient searching in motion-capture data. Krueger et al. used multidimensional kd-trees to improve searching, and Forbes and Fiume introduced data search using weighted principal component analysis. Sakamoto et al. presented a method for mapping motions into an image plane. The method improved the search for motion differences in motion-captured data, but did not address 3D motion retrieval. Real-time motion-capture data has also been mapped to an animation. Pullen and Bregler introduced a support tool for incorporating motion-capture data into animation by matching 2D motion to motion-capture data. Instead of detecting a static pose, our approach finds a pose within a motion context. Choi et al. suggested motion retrieval and visualization using stick figures, including motion curves. They compared sketches with projected stick figures using a predefined camera location per character. Although the proposed method succeeded in finding suitable motions, reconstructing 3D poses from sketches and rotational curves was not considered.
A substantial body of previous work is related to mapping transformations between the 2D and the 3D pose. One class of previous work attempts to determine the 3D position by estimating the position of the 3D camera. For example, Parameswaran et al. calculate the 3D position of important points in a body-centric system, assuming that the hip and shoulder joints are coplanar. Chaudhuri et al. presented an animation system that contains a camera reconstruction algorithm and a mesh retargeting method to find correspondence between a drawn 2D sketch and a given 3D model. The model is deformed using view-dependent IK to obtain the best match with the given sketch. However, the user still needs to manually specify correspondences between the sketched character and the reference pose joints. Our approach uses automatic matching and camera position estimation by adaptively projecting the 3D character.
Our solution is complementary to the previous work and extends it in various directions. It is not intended for creating exact and precise animation; its main use is quick animation creation. It does not require extra information while drawing the input model; only strokes defining the main body parts are needed. Additionally, a user can specify joint motion and rotation curves, using strokes to better express the desired movement of the pose. Because it is a database-driven approach, it always provides a valid pose, as long as all poses in the database are valid. Moreover, our method also provides an estimation of the camera position from the drawn sketch. Instead of selecting and creating animation from single poses used as keyframes, the matched pose is considered to be part of a motion sequence from the database, and these sequence blocks are used to build the resulting animation.
After this review of previous and related work, a brief overview of the entire method is provided. The individual parts of the system's pipeline are discussed afterward: sketching the 2D pose and its reconstruction in Sect. 4, the 3D motion-sequence database and 2D pose matching in Sect. 5, and the final assembly of the animation in Sect. 6. After that, we present implementation and results in Sect. 7, and the paper concludes with a discussion of limitations and possible avenues for future work in Sect. 8.
3 Method overview
The 2D pose itself can be a part of various 3D sequences in widely different contexts. To supply additional information that would help to better select the motion of the character, the user can optionally add motion or rotation paths near joints (see Fig. 1b). For example, drawing a motion curve near a hand joint indicates moving the hand, and an arc or an ellipse around a joint represents its rotation. These motion and rotation curves provide additional clues that are used to further improve the pose-matching process.
In the next step, the 2D pose is compared to a large database of 3D motion sequences. Because it would be difficult to directly compare 2D and 3D poses, projected 3D poses from the database are compared with the input reference 2D pose; i.e., each 3D pose in the database is projected from the estimated camera viewpoint. A similarity measure is evaluated for each projection, the sequences are sorted, the matched 3D pose is highlighted, and the sequences are shown to the user.
The selected sequences are composed in the sequence editor into the final animation. An individual sequence can be adjusted in time, trimmed, and two sequences can be blended or interpolated. The process of sketching and selecting a sequence is repeated until the desired animation is created. The animation can be either viewed directly or exported into the common file formats used in professional animation software. In the following sections, details of each part of our framework are described and discussed.
4 2D pose sketching and reconstruction
Humans have the ability to perceive the 3D structure of the character pose from a 2D drawing, and they can even guess the implied motion from a static image or sketch. However, this task is very complicated in automated recognition, and there is no robust way of recognizing 2D sketches that could be applied to direct 3D pose reconstruction. In our approach, the 3D motion is recreated from 2D sketches by imposing minimal requirements on the sketcher, who needs to draw simple strokes identifying important parts of the character’s body, such as spine, head, arms, and legs. Moreover, the user can add strokes that identify the motion and rotation paths of certain parts of the body. This information is later used to match the pose to an animated sequence from the database more efficiently.
4.1 Sketch-based 2D pose definition
4.2 2D pose reconstruction
Because the semantic information of each stroke is explicitly known, the important parts, such as the collar, pelvis, head, hands, and feet, are identified and connected. We first reconstruct the joints belonging to the spine. From the strokes of the arms and spine, the joint points defining the positions of the head, neck, pelvis, and collar are extracted. Because the lower and upper spinal joints do not significantly affect the 2D-to-3D matching, they are distributed evenly from the pelvis to the collar.
5 3D pose matching
Once the 2D pose is reconstructed from the sketch, its occurrences in the database of prerecorded motion samples are searched. A shared advantage of all methods that use the database is that the matching always finds a valid and well-defined pose (assuming that the database includes only correct poses). A disadvantage is that new poses cannot be created because the content of the database is limited. Moreover, the time required to search in the database can become a bottleneck.
5.1 Motion snapshot database
The motion sequence database from the CMU Graphics Lab Motion Capture Database includes 4 million unannotated poses in nearly 2,400 different animation sequences, occupying 3.1 GB of space and totaling 6.5 hours. The database includes sequences in more than 40 different skeleton formats, so they were converted into a single hierarchical skeleton structure. Our framework is intended for quick animation creation, so the poses were simplified by removing toes, fingers, and wrists. However, the pose structure contains the essential data for character animation, such as joint hierarchy, bone direction vectors, bone lengths, and rotation angles. The default Euler rotations were converted into a quaternion representation to prevent the gimbal-lock problem.
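The Euler-to-quaternion conversion mentioned above can be sketched as follows. This is an illustrative Python sketch (the system itself is implemented in C++/CUDA), and the intrinsic Z-Y-X angle convention chosen here is an assumption, since the paper does not state which convention the source data uses:

```python
import math

def euler_to_quaternion(roll, pitch, yaw):
    """Convert Z-Y-X Euler angles (radians) to a unit quaternion (w, x, y, z).

    Storing joint rotations as quaternions avoids the gimbal-lock
    singularity of the Euler representation."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)
```

For example, a zero rotation maps to the identity quaternion (1, 0, 0, 0), and interpolation between any two such quaternions stays free of gimbal lock.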
Most sequential poses are similar to each other, as can be seen in Fig. 5. Thus, we reduce the number of snapshots in the database by saving only significantly different poses. Two frames are considered significantly different if the average angular difference over all joints is higher than a predefined threshold (15° in our application), or if a single joint moves more than a predefined threshold (45° in our application) between the frames. The motion snapshot database stores only significantly different frames and contains about 200,000 different poses out of the 4 million from the input. This significantly reduces the application's memory requirements in the matching phase.
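The snapshot-filtering rule above can be sketched as follows. This is a simplified illustration in Python: each pose is reduced to a flat list of per-joint angles in degrees, whereas the real system compares full 3D joint rotations:

```python
def is_significantly_different(pose_a, pose_b,
                               avg_threshold=15.0, single_threshold=45.0):
    """Decide whether two poses differ enough to keep both as snapshots.

    pose_a, pose_b: lists of per-joint rotation angles in degrees.
    Keep the frame if the average per-joint difference exceeds
    avg_threshold, or if any single joint moves more than single_threshold."""
    diffs = [abs(a - b) for a, b in zip(pose_a, pose_b)]
    return (sum(diffs) / len(diffs) > avg_threshold
            or max(diffs) > single_threshold)

def build_snapshot_database(frames):
    """Keep only frames significantly different from the last kept one."""
    snapshots = [frames[0]]
    for frame in frames[1:]:
        if is_significantly_different(snapshots[-1], frame):
            snapshots.append(frame)
    return snapshots
```

Filtering against the last kept frame (rather than the immediately preceding frame) prevents slow drift from being discarded entirely.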
5.2 2D to 3D pose matching
A correct projection of the 3D pose to match the 2D pose requires estimating the camera position in which the sketch has been drawn, and it is done by an iterative approach. First, each 3D pose from the snapshot database is projected from six basic views: front, back, left, right, top, and bottom (see Fig. 8). Top and bottom views have low priority because animators rarely draw sketches from these views, and they can be difficult to detect because of the overlapping curves in the pose. The projected 3D poses are then compared with the reference 2D pose, and their angular similarity is evaluated.
To evaluate the robustness of our algorithm, we loaded 600 random poses from the database and placed the camera at random locations obtained from regularly sampled points on a sphere. The 3D pose was projected from that direction, the joint positions were jittered, and we then attempted to find the original pose in the database. The results show that even with high jitter values above 5%, which changed the pose significantly, the success rate was above 75%. This was later verified by the user study, where the intended poses were found for the vast majority of the sketches.
Bone matching priority
We have observed that the angles between different parts of the body express the overall pose better than the actual joint positions, and they make pose normalization unnecessary. This is why angles, rather than joint positions, are used for the comparison. When comparing the 2D pose with a projected 3D pose, some bone relationships are more relevant to the overall pose configuration: the spine line (pelvis to collar joints), the hand lines (shoulder to hand joints), and the leg lines (hip to foot joints). Limb joints, such as the elbow and knee, are also important, since they provide the configuration of the arms and legs.
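A weighted angular comparison of this kind can be sketched as follows. This is an illustrative Python sketch; the exact line set, weights, and score formula are assumptions, since the paper only names which lines matter most:

```python
import math

def line_angle(p_from, p_to):
    """Absolute 2D orientation of a bone line, in radians."""
    return math.atan2(p_to[1] - p_from[1], p_to[0] - p_from[0])

def angular_difference(a, b):
    """Smallest difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def pose_similarity(sketch_joints, projected_joints, lines, weights):
    """Weighted angular dissimilarity between a sketched 2D pose and a
    projected 3D pose. `lines` is a list of (joint_from, joint_to) pairs,
    e.g. the spine, hand, and leg lines; higher weights mark lines the
    matching considers more important. Lower score = better match."""
    score = 0.0
    for (j0, j1), w in zip(lines, weights):
        a = line_angle(sketch_joints[j0], sketch_joints[j1])
        b = line_angle(projected_joints[j0], projected_joints[j1])
        score += w * angular_difference(a, b)
    return score
```

Because only angles are compared, the sketch and the projection need not be normalized for scale or translation, which matches the observation above.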
Although the matching process converges rapidly, it becomes computationally intensive with a high number of poses. However, the sequences in the database are independent, and the matching process can be easily parallelized. We have implemented this algorithm on the GPU using CUDA; the matching process takes less than one second for direct pose comparison and less than five seconds for comparisons with curves. This performance enables us to run multiple matching steps for all possible configurations of the pose with swapped left/right hands and legs, in order to explore more configurations and resolve ambiguities. These ambiguities result from the fact that the order of the left/right hands and legs is not fixed, and comparing the hands and legs of a hand-drawn 2D character with predefined 3D characters yields multiple possible configurations.
5.3 Motion and rotation curves matching
The node affected by the user-sketched motion curve is detected by finding the closest joint to the center of the bounding box of the curve. If this is not the intended node, the user can override the automatic selection. The sketched curve is first resampled and smoothed into approximately 100 points to simplify the further steps. The two principal perpendicular directions of the sketched motion curve are detected using principal component analysis (PCA). The curve is then transformed so that the first sample point is at the origin and the main axis is aligned with the x-axis of the coordinate system. The curve is then divided into monotonic blocks, and each block's direction is coded as one of four: up-left, up-right, down-left, and down-right. Each block is then sampled at a predefined number of points, for which the first derivative c′ and the curvature κ are calculated.
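The block-splitting step can be sketched as follows. This illustrative Python sketch assumes the curve has already been resampled and PCA-aligned, and it omits the per-block derivative and curvature sampling:

```python
def direction_code(dx, dy):
    """Code a step direction as one of the four quadrant labels."""
    return ('up' if dy >= 0 else 'down') + '-' + ('right' if dx >= 0 else 'left')

def monotonic_blocks(points):
    """Split an (already resampled and axis-aligned) 2D curve into blocks
    whose steps share the same quadrant direction, returning the per-block
    direction codes used as a coarse shape descriptor."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        code = direction_code(x1 - x0, y1 - y0)
        if not codes or codes[-1][0] != code:
            codes.append([code, 1])   # start a new monotonic block
        else:
            codes[-1][1] += 1         # extend the current block
    return [c for c, _ in codes]
```

For example, an arch-shaped stroke decomposes into an "up-right" block followed by a "down-right" block, giving a compact signature that is cheap to compare on the GPU.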
We can assume that the rotation curve (Fig. 11) is essentially a circular arc that can be unprojected by using the determined camera position (Sect. 5.2), and the angle of the rotation can be measured in 3D. To apply the rotation, we need to find the rotation axis and the rotation angle.
The node affected by a rotation curve is detected by finding the closest joint to the center of the curve (similarly to the motion curve). The curve is then resampled, and PCA finds its principal directions. The rotation curve is an arc distorted by projection, so the two radii of the resulting ellipse can be detected. By comparing the radii, the ellipse is unprojected, the rotation axis is tilted accordingly, and the best candidate bone parallel to the 3D axis is found. The angle of rotation αr is then calculated from the two end points of the joint rotation curve.
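The unprojection step can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions: the rotation curve is modeled as a circle seen as an ellipse, and the end points are given in the ellipse-aligned frame (centered at the joint, x along the major axis); the exact geometric procedure in the system may differ:

```python
import math

def plane_tilt(r_major, r_minor):
    """Tilt of the rotation plane relative to the view plane, recovered
    from the ratio of the ellipse radii (projective foreshortening)."""
    return math.acos(min(1.0, r_minor / r_major))

def rotation_angle(p_start, p_end, r_major, r_minor):
    """Rotation angle swept between the two end points of a projected
    rotation curve. Scaling the minor axis back up undoes the
    foreshortening, recovering the circle on which the angle is measured."""
    s = r_major / r_minor
    a0 = math.atan2(p_start[1] * s, p_start[0])
    a1 = math.atan2(p_end[1] * s, p_end[0])
    return (a1 - a0) % (2 * math.pi)
```

For instance, a circle viewed at a 60° tilt projects to an ellipse with half the minor radius, and a quarter arc on that ellipse unprojects back to a 90° rotation.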
5.4 Sequence selection
Depending on the user-provided input, there are various options for pose matching. The final matching score of the 3D pose p and the 2D pose r is denoted by S(p,r) and it is used to sort the matched sequences that are offered to the user for selection.
Pose matching only
If the user provides a single 2D pose with no additional information, the pose matching score from Eq. (2) becomes the absolute score so that S(p,r)=Sp(p,r).
Pose matching with rotation or motion curve
Multiple pose matching
When the scores of all poses are evaluated, the sequences are sorted according to their score, and the user selects the best-fitting one.
6 Sequence editor
The sequence editor allows the user to align the sequences’ view directions by rotating the selected sequence and also to efficiently combine the selected sequences into the final animation. The sequences can be trimmed, resampled, blended, and interpolated. The operations that use multiple sequences also use the motion and rotation curves, if supported, to assist the transition.
An existing sequence can be trimmed from the beginning or from the end. In the extreme, the entire sequence can be trimmed into an individual pose. However, if the motion curves were used for certain joints, the pose will still contain the information about their motion that will be used in the interpolation and blending. If the timing of a sequence needs to be modified, it is resampled by changing its length in the sequence editor.
Two sequences can be blended into one by connecting the trajectories of motions and rotations from the different sequence blocks. This operation is applied when the user places the sequences in the sequence editor in such a way that they (partially) overlap. Quaternion interpolation is used to avoid the gimbal-lock problem. Similarly, if two sequences are separated in time, quaternion interpolation is used to connect them.
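The quaternion interpolation used for blending and connecting sequences can be sketched as standard spherical linear interpolation (slerp). This is an illustrative Python sketch; the paper does not state which quaternion interpolation scheme it uses, so slerp is an assumption:

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z).

    Used here to blend or connect per-joint rotations of two motion
    sequences without the gimbal-lock artifacts of Euler interpolation."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 0.9995:                   # nearly parallel: lerp + renormalize
        q = [a + t * (b - a) for a, b in zip(q0, q1)]
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
```

Halfway between the identity and a 90° rotation about z, slerp yields exactly the 45° rotation, i.e. a constant angular velocity along the arc.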
The final animation can be previewed directly in our framework, or it can be exported and further processed in an external application. Although blending and interpolating the sequences provide an easy and quick way of prototyping, the sequence editor has some limitations: e.g., it cannot change the path of the animation, and natural motion transitions are not guaranteed.
7 Implementation and results
Our framework is implemented in C++; it uses OpenGL and GLSL for rendering the character poses and CUDA for fast, parallel sequence matching. All tests were done on a desktop with an Intel Core i7 CPU, 16 GB of memory, and an NVIDIA GTX 480 graphics accelerator.
The user interface was designed to be as simple as possible. It has five panels to control the animation process: sketching, transition, matching, animation, and pose selection, as shown in the accompanying video.
The sketching panel contains tools to sketch the pose using strokes. The user can load a background image as a reference for drawing strokes. The strokes are indicated by different colors for better control, and each stroke can be redrawn if necessary. Motion and rotation curves are also defined by strokes.
When the sketch is finished, it is matched against the database, and the matching panel shows the most similar sequences containing the drawn pose in descending order of similarity. The feedback is immediate, and the sequences are animated. They can be stopped, slowed down, or zoomed, and the user can select any of them by scrolling through the results. The GPU-based implementation provides immediate results and ranks the entire 6.5-hour database on an off-the-shelf computer. In addition, our approach does not require additional database preprocessing, so any motion-capture data can be used immediately.
Table: Performance comparison between the CPU (Core i7 920, 2.66 GHz) and GPU (NVIDIA GeForce GTX 480) implementations.
The selected sequence is brought to the transition panel, where it can be further edited by trimming, blending, and resampling. All changes can be seen immediately in the animation panel, where they are composed into the final animation, which can be exported into common animation formats and further edited in professional animation programs.
A user study was performed to evaluate our framework. Four professional animators with experience in using animation packages and five novice users were asked to create three sequences demonstrating animations of varying complexity. The novice users needed additional time to get familiar with the interface (this time is included in the measurements). There was no limit on the number of sketches or on the maximum and minimum durations of the sequences, and the participants were free to use any sequence blocks, with the only requirement being that the resulting animation follow the provided scenario.
The first animation is shown in Fig. 1 and consists of a sitting character that stands up, walks, starts running, flips over an obstacle, and falls on the ground. It is a fairly complex animation requiring a variety of sequences that demonstrate all aspects of our system. On average, this animation took about 20 minutes to complete. At least 6 sketches (10 on average) and 5 sequences were used to complete the task. The length of the resulting animation varied from 8 to 19 seconds, with an average of 14.6 seconds.
Table: Results of the user study, per task (e.g., Pick a coin): creation time (min), number of sketches, number of blocks, and animation length (s).
The SSRFF animation was the most complicated requirement; it took the most time to create, about 20 minutes, and used 10 sketches on average. In traditional software packages such as Maya, even a highly experienced user would take several hours to perform a similar task. In all animations, the subjects spent most of the time selecting the right sequences in the list and editing them. Also, the number of sequence blocks used was tied to the number of sketches drawn (some sub-sequences required more than one sketch with motion curves). We have observed that users tend to draw the poses first and, if a good match is not found, extend them with motion or rotation curves to refine the search. An example is common motions, such as walking and running, which are very general and present in different variations in many motion sequences. These sequences were refined and matched much more easily with the help of the motion and rotation curves. In general, sequences from the large motion database significantly speed up the animation process, and it is possible to create a rough first-pass animation within minutes.
Participants were also asked to comment on the intuitiveness of the interface. They found drawing the simple skeleton and using motion curves to refine the motion intuitive and easy to use. Also, the fixed order of drawing the figure was not perceived as a limiting factor, as most animators tend to draw in the same order.
The time required to create the animations was approximately the same for novice and professional users. This seems to indicate that our system can be used without any prior experience with animation software and techniques, because the way our application is used differs from traditional animation packages. However, a more in-depth user study would be required to justify this assumption. The learning process also seems to be fast: after 15 minutes of demonstration, participants were able to create their own animations.
Additional evaluation was performed by comparing our algorithm with the Dynamic Time Warping (DTW) approach. We tested 2,000 motion curves. Each curve was randomly jittered 500 times by displacing its vertices perpendicularly by a distance given as a percentage of the curve length; for example, jittering by 0.1 displaced the vertices randomly by 10% of the length of the curve. After that, the curves were resampled to exactly 12 vertices, the maximum number for which the DTW comparison ran in reasonable time. The randomization step was increased in increments of 1%. Then the input and the randomized curve were compared using our algorithm and DTW.
The speed of the comparison depends on the number of compared vertices. DTW is a recursive algorithm and its processing time increases exponentially; comparing two curves with 15 sampling points took more than 45 seconds. However, in order to achieve good accuracy we need to match at least 50 points from the input curve, which was impossible using DTW (the comparison time was over several hours). Our method required only 5 ms to compare curves with 50 sampling points; we therefore conclude that the DTW algorithm is not suitable for real-time processing, whereas our method performs well.
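The exponential growth described above is characteristic of the plain (unmemoized) recursive DTW formulation, which re-solves overlapping subproblems; memoizing the same recursion yields the usual O(n·m) dynamic program. The sketch below illustrates both (an assumption about how the compared DTW was formulated, written in Python for illustration):

```python
import functools

def dtw_distance(a, b, memoize=True):
    """DTW distance between two 1-D sequences.

    With memoize=False, the plain recursion re-solves overlapping
    subproblems and blows up exponentially with sequence length;
    with memoize=True the same recursion becomes the standard
    O(len(a) * len(b)) dynamic program."""
    def d(i, j):
        cost = abs(a[i] - b[j])
        if i == 0 and j == 0:
            return cost
        if i == 0:
            return cost + d(0, j - 1)
        if j == 0:
            return cost + d(i - 1, 0)
        return cost + min(d(i - 1, j), d(i, j - 1), d(i - 1, j - 1))
    if memoize:
        d = functools.lru_cache(maxsize=None)(d)
    return d(len(a) - 1, len(b) - 1)
```

Identical curves, or curves differing only by repeated samples, have a DTW distance of zero, which is what makes DTW attractive for curve matching despite its cost.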
8 Conclusions and future work
We have presented a framework for the quick creation of first-pass 3D character animations from 2D sketches using a large motion database. The basic building block of our framework is a sequence from the database with one or more registered poses. The sketches can be enhanced by motion or rotation curves that help the searching process and the combination of the detected sequences into the final animation. The additional motion curves allow defining the pose in the context of the animation, which is useful for common animations such as walking. These sequences can then be refined, and the motion curves provide important detail that helps the animation process. The user study indicates that there is no significant difference between advanced animators and novice users, as all participants were able to create the animated sequences quickly and used all the functions that the framework provided.
Our system has various limitations. The first is that it creates an individual character that cannot be used in the context of a scene: it cannot interact with objects or with other characters in the scene. The second limitation stems from the fact that our system is aimed at quick creation, so it does not support the animation of hands, toes, or facial expressions. Another limitation is that our method is not suited for finding subtle motion differences; this limitation is common to database-oriented approaches. It is partially alleviated by the motion curves, which allow for better tuning of the detected results, but further editing of the selected models is usually necessary to create detailed animations. Moreover, the expressive power of the system is given by the content of the sequences stored in the motion database, and the user cannot create new sequences or sequences that go beyond simple blending or interpolation of existing ones.
There are several possible extensions of our work. One would be allowing the character to follow a defined path. This would be a simple extension of the sketching interface, but extra work would be necessary to ensure that the motion is realistic even for sharp turns or abrupt motions. Additional options to provide better navigation of the characters would be spatial keyframing or motion curves. Another avenue for future work would be using multiple characters. An interesting approach was recently presented by Ho et al., who used extended motion and retargeting for multiple animated characters by exploiting scene and character semantics. Also, in order to evaluate the results more precisely, an in-depth user study with a large sample of professional and novice users would be required.
The data used in this project was obtained from mocap.cs.cmu.edu. The database was created with funding from NSF EIA-0196217.