Hand gesture-based interactive puppetry system to assist storytelling for children

Digital techniques have been used to assist narrative and storytelling, especially in many pedagogical practices. With the rapid development of HCI techniques, young children, saturated with digital media in their daily lives, demand more interactive learning methods and meaningful immersive learning experiences. In this paper, we propose a novel hand gesture-based puppetry storytelling system which provides a more intuitive and natural human-computer interaction method for young children to develop narrative ability in a virtual story world. Depth motion sensing and hand gesture control technologies are utilized to implement user-friendly interaction. Young players can intuitively use hand gestures to manipulate a virtual puppet to perform a story and interact with different items in the virtual environment to assist narration. Based on the results of the evaluation, this novel digital storytelling system shows positive pedagogical effects on children's narrating ability as well as on their cognitive and motor coordination competencies. The usability of the system is preliminarily examined in our test, and the results show that young children can benefit from playing with Puppet Narrator.


Introduction
In recognition of their considerable positive effects on pedagogy, educational games, or serious games for training purposes, have become immensely popular [1,2]. As a modern form of traditional storytelling, digital storytelling systems have emerged over the last few years and have demonstrated powerful pedagogical functions, which enable children to express themselves and cooperate with others during narrative performance. Storytelling is essentially one of the original forms of teaching [3], which can be used as a method to teach ethics, values, and cultural norms and differences.
Digital storytelling, which follows the same well-known strategies as classical storytelling, can help children to acquire several technological skills, work in groups, and strengthen the bonds between each other. As another social benefit, digital storytelling can also help disabled children or students with learning difficulties to remove the barriers of communication with adults and peers and overcome the inability to focus on their feelings or thoughts by providing them with opportunities to play active roles [4]. At present, the major pedagogical benefit gained with digital storytelling is the ability to narrate [5,6].
However, storytelling is not only about narrative. In its basic form, storytelling is usually combined with gestures and expressions. Oral narrative can also be combined with other body movements, e.g., dancing, to enhance the storytelling through remembrance and dramatic enactment of stories [7]. From this point of view, storytelling not only benefits a child's understanding of narrative structures, but also fosters other abilities, such as cognitive competence and physical coordination during performance with the aid of different media.
In this paper, we design and develop a novel digital puppetry storytelling system, 'Puppet Narrator', for young children utilizing depth motion sensing technology, which supports hand and finger motions as input but requires no hand contact or touching. Considering the complexity of operating a puppet for young children, we use hand-tracking and gesture recognition technologies to simplify operations and provide an intuitive interface, in which children can use hand gestures to manipulate a virtual puppet to perform a story. In contrast to the research in [8], in addition to fostering narrative ability, our proposed system also aims to improve children's cognitive development and motor coordination ability during their storytelling performance with the help of a depth motion sensing device. The system usability is preliminarily examined, and the results show that young players can benefit from this novel way of narrating.
In summary, this work has two main contributions:
• Introduce a novel narrative assistance with gesture control and computer animation by combining motion-sensing technology to manipulate a 3D puppet;
• Implement a prototype of the digital storytelling system to help young children to develop their related skills.
The remainder of this paper is organized as follows. Section 2 briefly introduces related research. Section 3 presents our system design. Section 4 describes the system implementation, including the architecture, input data processing, motion control, and output. Experimental results and evaluation are discussed in Sect. 5, and Sect. 7 concludes the paper.

Related works
Over the past decade, there has been considerable growth in advanced virtual reality technologies, which allow students to experience virtual learning environments in highly interactive and natural ways. Virtual reality is also considered one of the most powerful tools for supporting the learning process. VR interactive technologies are widely used in digital storytelling systems, such as tangible interfaces, haptic feedback, and wireless handheld orientation sensors. Storytelling Alice [9] introduces computer programming to learners using 3D animated stories. Toontastic [10] can be considered a collaborative and constructive digital animation creator that is designed to help children capture and share their stories with other children around the world. The Wayang Authoring system [11] is a web-based visual story authoring medium for children, which enables children to use virtual puppets to create stylised digital stories. Inspired by traditional Chinese shadow puppetry, ShadowStory [12] is designed for children to use a tablet to create digital shadow puppets and perform stories cooperatively on a projection screen. Handheld wireless sensors are used to control the movements of the shadow puppets. In Mousawi's [13] research, an iPad was used as a storytelling tool to help teachers and parents evaluate and improve the communication skills of Arabic children. Bonsignore's [14] work supports collaborative mobile storytelling for young children of 8-11 years old, focusing on mobile reading and authoring to encompass both local and remote mobile authorship practices. In the educational program "From the Ancient to the Modern Tablets", target users can help a digital agent to time-travel back in a 3D immersive eLearning environment [15].
Rubart [16] proposed a multi-touch tablet system supporting face-to-face collaborative storytelling which could provide a natural interaction experience following the metaphor of a virtual meeting desk.
Compared to the traditional oral storytelling, virtual reality technologies present digital content in more compelling and engaging formats, which provide users more interactive and immersive experience to assist narration. Interaction methods primarily used in the previous research are limited to the traditional human-computer interface (HCI) technologies, such as keyboard and mouse (e.g., [9,11]), handheld remote controller (e.g., [12]), and touch screen, including mobile phone and tablets (e.g., [10][11][12][13][14][15][16]). However, at present, the young children, also known as the "new media generation" or Digital Natives [17], were born in a richer media environment and they start to interact with new technologies from an early age. Saturated with digital media in their daily lives, they require more interactive learning environments, multimodal feedback, and meaningful learning experiences, which bring new challenges to the current digital storytelling. The new generation of the digital storytelling system is expected to offer a novel and immersive way to captivate learners' interests in a new horizon and improve the quality of teaching and learning in the virtual story world.
One of our main concerns is how to provide this "new media generation" with an engaging and immersive interaction that accelerates their learning progress by involving novel virtual reality technologies. The development of novel HCI methods (e.g., depth motion-sensing technology) provides us with new possibilities for deriving educational benefits from storytelling by creating new ways of enabling interaction through hand gestures or other modalities. Not only oral narration but also children's other capacities, such as cognitive competence and physical coordination, are expected to improve during their storytelling performance. Such improvement can be significant, especially when a digital storytelling system provides players with more immersive interactions, such as using hand gestures to manipulate avatars' movements in virtual environments.

General design
In this section, we discuss the system design which involves the novel manifestation pattern as well as the pedagogical considerations we mentioned previously.

Target
Recently, a substantial amount of research has been undertaken on digital storytelling, mainly investigating narrative ability training, such as Toontastic, Kodu, Storytelling Alice, and Wayang Authoring, among others [8]. However, as we mentioned before, storytelling is not just narrating. Besides oral narration, the development of other abilities is also vitally important for young children, e.g., space and object cognitive abilities. If digital storytelling systems provide powerful, user-friendly interaction methods that are more intuitive and natural, then these additional abilities (cognition and motor coordination) can also be addressed.
There is a plethora of research in the field of psychology on the development of important skills in children [18][19][20][21]. Research suggests that spatial-temporal reasoning and spatial visualization ability are important indicators of achievement in science, technology, engineering, and mathematics [22], and that a pre-school child's visual spatial attention ability predicts his or her future reading skills [23]. Researchers have also postulated a set of so-called "core domains" in cognitive development and suggested that children have an innate sensitivity to specific kinds of patterns of information. The commonly speculated core skills of cognition include: number [24], space [25], visual perception [26], essentialism [27], and language acquisition [28]. As an important aspect of children's psychosocial development, the significance of motor coordination competence has also long been recognized in pedagogy. Children with poor motor coordination have been found to underachieve educationally and to experience difficulties with peer relationships [29].
Puppet Narrator is developed to target children between 5 and 8 years old, which covers the age group of Key Stage 1 (5-7 years old). In the UK, the national curriculum is organised into blocks of years called "Key Stages". In addition, in Arora's study, the initial prototype of a narrative learning device, "FunPi", was tested with four children in the 6-8 years age group [30]. Our aim is to endow digital storytelling with a novel interaction method, which is more flexible and immersive. At the same time, our system is designed to be educational not only in terms of narrative competence but also in terms of cognitive ability and motor coordination. In our system, besides narrative ability training, we also pay attention to the development of the core skills of numerical cognition, spatial awareness, and visual perception with the assistance of motion-sensing technology in a virtual environment. To enhance children's motor coordination competence, our system uses a depth motion sensor to track and recognize players' hand movements and gestures, which enables children to use their hand motions to manipulate a virtual puppet for interaction during narrating.
Our preliminary conception is as follows: following the provided story plot, children will finish the whole story narration, and at the same time, children can simply use hand motion to control the movement of virtual puppet and interact with playthings in virtual scenario to assist narrating. Through this procedure, their narrative ability will be nourished. Using hand gestures for controlling the avatar, their motor coordination ability will be trained. In interaction with virtual items having different properties and roles in the story, their space and object recognition capability will also be developed.
Based on the above considerations, there are several aspects that we should consider in our system design, as summarized below:
1. For the purpose of narrative ability training:
The structure of our training aims is illustrated in Fig. 1.

Story topic
"The Crow and the Pitcher" is one of the most famous of Aesop's Fables. A thirsty crow finds a pitcher with some water at the bottom, out of the reach of its beak. The crow picks up pebbles and drops them into the pitcher to raise the water level until it can drink the water. The fable was written by the ancient Greek poet Bianor [31] and later collected by Avianus. The fable emphasizes the virtue of thoughtfulness over brute strength and the value of the crow's persistence. Considering its popularity among young children as well as its positive pedagogical meaning, we choose "The Crow and the Pitcher" as the story topic. During narration within our virtual environment, young children use their hands and a set of hand gestures to manipulate the puppet crow to pick up pebbles and drop them into the pitcher. The crow's actions, such as flying, grasping, and drinking, are presented through pre-recorded animations, controlled by the hand gestures.

Fig. 1 Training aims to cover three fundamental abilities: narrative ability, cognitive skills, and motor coordination ability, each of which is realised through different related training activities in the digital storytelling practices

Pipeline
The component-level interaction within the system is shown in Fig. 2. First, a story plot is provided as storytelling hints to young players. Second, players use hand motion to manipulate the avatar through depth motion sensor device, which can automatically track hand motion and recognize hand gestures. Depth image data from the motion sensor are obtained and interpreted into motion control commands by the host computer. Finally, as visual feedback, the avatar's responding animation is provided to players, and then, players adjust their hand gestures/movement to push the plot forward and narrate the story. Under this novel manifestation pattern, not only players' oral narrative ability but also their cognitive and motor coordination competence is expected to be developed.

Implementation
Puppet Narrator is mainly composed of three parts: input, motion control, and output, as illustrated in Fig. 3. The input part processes the sensor data captured from the motion sensor device through HCI and passes it to the next part. Motion control subsequently interprets the data and determines the avatar's location and posture. The output module updates the avatar's performance in the virtual environment as feedback.
We utilized a Leap Motion controller [32] in our system as the HCI sensor device to track hand gestures, which can provide a high fidelity finger tracking through an infrared depth sensor. We utilized the Leap Motion SDK provided by the Leap Motion Co. as the API to access the motion data of hands and fingers from the device. All the 3D models and animations were created in Maya 2014. We integrated and developed the entire system in Unity3D Pro V4.2.

Digital puppet crow design
To make our system more appealing for young children, and considering puppetry's positive benefits in education, we use a digital puppet as an avatar to assist children's storytelling through animation technology. The puppet design involves three aspects:
1. Puppet geometry construction, which describes the puppet shape, the rigging system, and its shading materials. The shape of the digital crow is modelled in Maya 2014, simulating the picture of the crow in the plot. Rigging the puppet defines its behaviour with bones connecting joints.
2. User interaction, which defines the way players interact with the puppet, describing the performer expressions and interaction interfaces.
3. Animation. The crow puppet uses pre-recorded animations to produce actions, which are triggered by the players' hand gestures. Figure 4 shows the sketched crow, the digital 3D crow model, and animation screenshots.

Hand model
The hand model is shown on the right of Fig. 5a, where f_i (i ∈ [1,5]) represents the position of the fingertips (i indexes the recognized fingers), and c represents the centre of the palm. The plane of the hand is defined by its normal vector n and its directional vector d. The player's hand and its mapped virtual skeleton are shown in Fig. 5b. For ease of control by young children, we define four intuitive motion controls: move right, left, downward, and upward, which are mapped to different hand gestures, as illustrated in the upper four rows in Table 1.

Input data processing
The sensor data provided by the Leap Motion controller contain a variety of information about the hands and pointables (such as fingers or finger-like tools, as defined by Leap Motion Co. [31]) in the virtual scene. The data are updated every frame, and each frame is represented by formula (1) in terms of four components: FR, the frame rate; H, the set of hands detected; P, the set of pointables; and T, the timestamp. Hand data H, given in formula (2), mainly contain the hand identifier, direction, and different values about the palm position and status. In addition, the pointables data P, given in formula (3), include the id, direction, and position information relative to the hand.
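The per-frame data described above can be pictured as a small set of record types. The following Python sketch is illustrative only: the class and field names (`Frame`, `Hand`, `Pointable`, `palm_position`, and so on) are assumptions made for this example and are not the Leap Motion SDK's actual types.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Pointable:
    """One finger or finger-like tool (an element of P). Field names are hypothetical."""
    id: int
    direction: Vec3
    position: Vec3  # tip position relative to the hand


@dataclass
class Hand:
    """One detected hand (an element of H). Field names are hypothetical."""
    id: int
    direction: Vec3
    palm_position: Vec3


@dataclass
class Frame:
    """Sensor data for one frame: frame rate FR, hands H, pointables P, timestamp T."""
    frame_rate: float
    hands: List[Hand]
    pointables: List[Pointable]
    timestamp: int  # unit is an assumption of this sketch


def has_hand(frame: Frame) -> bool:
    """True if at least one hand was detected in this frame."""
    return len(frame.hands) > 0
```

A check such as `has_hand` is the kind of test the motion control stage needs before it tries to read a palm position from a frame.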

Motion control
Recent neuroscience research has found that, in human brain development, there is a strong connection between perception, imagination, and movement. According to previous studies [33][34][35], humans can recognize and coordinate their movements better through feedback on their own movement, visual observation, or feedback from an avatar's (3D virtual character's) motion [36]. The motion control module is the core of the system and has two main functions: movement control and recognition. The movement control function is responsible for mapping the player's relative hand position to the movement of the puppet crow. The recognition function is in charge of identifying the patterns of hand movement and gestures that indicate the player's intention. Once a recognizable gesture is detected, the recognition function triggers a pre-recorded animation or event (e.g., gripping a pebble or flapping wings).

Movement control mechanism
Since children in a storytelling system mainly focus on narrating, complex puppet manipulation, such as stringing a puppet in a real show, would be distracting or could even hamper the narrative [37]. If young children pay too much attention to puppet manipulation, they might forget the story line. With this in mind, we design a user-friendly puppet prototype, used as a storytelling avatar, with a simpler interaction manner and an easier motion control mechanism.
For young children to control the movement of the puppet/avatar, the most direct way is to convert the translation of their hand position in the Leap Motion coordinate system into the position of the puppet in the virtual environment. Considering the different scales of these two workspaces, a proper transformation matrix, including a scaling, should be applied to the coordinate translation. When the player's hand is not recognized by the Leap Motion device, the puppet keeps its position from the last frame and resumes movement immediately once the hand is detected again in the following frames. Within each frame, the position of the puppet is determined by two coordinate vectors: the puppet's coordinate in the previous frame and the hand's relative translation generated by the player's hand movement in the current frame. Taking the scaling factor into account as well, the puppet crow's coordinate in the virtual workspace is computed by formula (4), where P_crow is the puppet crow's position in the current frame; R_hand is the player's relative hand movement in the real world in the current frame compared with the previous frame, which can be obtained from formula (2); M is the transformation matrix between the player's workspace and the puppet crow's coordinate system; S is the scaling matrix; and P'_crow is the puppet crow's position in the previous frame.
The algorithm of the movement control mechanism takes Leap Motion sensor data as the input and calculates the difference between the data of the current frame and the previous frame to generate the puppet's new position in the current frame. If motion data are not accessible from the Leap Motion controller in the current frame (which may be caused by an unrecognized object/hand gesture or by the hand moving out of the detection range), the puppet stays unmoved where it was in the previous frame (Line 1-Line 3 of the algorithm). Otherwise, the puppet position is updated according to formula (4) (Line 5-Line 9).
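As a rough illustration of this update rule, the sketch below assumes the additive form P_crow = P'_crow + S · M · R_hand, which is one reading consistent with the symbol definitions above; the order in which S and M are applied, the function names, and the use of 3x3 matrices are assumptions of this example rather than details confirmed by the text.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def update_crow_position(
    prev_crow: Vec3,
    prev_hand: Optional[Vec3],
    curr_hand: Optional[Vec3],
    M: Mat3,
    S: Mat3,
) -> Vec3:
    """Return the crow position for the current frame (hypothetical helper).

    If the hand is missing in either frame, no relative translation can be
    formed, so the crow keeps its previous position.
    """
    if curr_hand is None or prev_hand is None:
        return prev_crow
    # Relative hand translation between the previous and the current frame.
    r_hand = tuple(c - p for c, p in zip(curr_hand, prev_hand))
    # Assumed order: transform into the crow's coordinate system, then scale.
    step = mat_vec(S, mat_vec(M, r_hand))
    return tuple(p + d for p, d in zip(prev_crow, step))
```

Because the update uses only the relative translation, the crow continues from where it stopped when the hand reappears, rather than jumping to a new absolute position.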

Types of hand gestures
In a virtual interactive environment, using hand gestures as input to control an avatar's performance is more natural and intuitive than other HCI methods, such as keyboard or touch screen input. Hand gesture interaction is clearly visible to others and constitutes an expressive action in itself. When designing the system, the most important consideration is choosing the most natural and intuitive gestural interaction manner for playing with the avatar. Considering their simplicity and demonstrated effectiveness, we adopt detectable pointing gestures [23] for HCI.
In our development, we have designed different types of hand gestures targeted to different levels of interactive applications.

Type-I
Type-I, the basic single-hand gesture set, is used to control the avatar's movement and trigger the avatar's simple animation actions with a single hand. This set of hand gestures is mainly designed for young children, providing a natural and intuitive way to interact with the playthings at a basic interaction level. Table 1 shows the mapping between the basic single-hand gesture set performed by young children and the puppet crow's actions in the virtual environment. The set includes two different kinds of gestures: navigation gestures (items 1-5) and action gestures (items 6-7).

Type-II
Type-II, the advanced single-hand gesture set, is used to generate more complex avatar actions on the basis of Type-I. This type of gesture can be used for producing more complex interaction. This set of hand gestures complements and improves the basic single-hand gesture set. We defined complex hand gesture interactions to provide more functionalities, as shown in Table 2. The target users are primarily juniors, who have better motor control and cognitive ability, rather than young children of 5-8 years. An intuitive example is to use a single finger tracing a circle in space to make the crow turn around. Table 2 also shows the mapping between the advanced single-hand gesture set performed by players and the avatar's actions in the virtual environment.

Fig. 6 Two hands' positions in relation to Leap Motion device when using Type-III hand gesture set

Type-III
Furthermore, both the player's left and right hands are involved in our Type-III hand gesture interaction for more experienced users, called the two-handed gesture set. In Type-III, two subsets of hand gestures (Hand-A navigation gestures and Hand-B action gestures) are designed separately and each subset is assigned to one hand: navigation gestures are assigned to the left hand for steering tasks, and action gestures are allotted to the right hand to trigger the avatar's action performance. As illustrated in Fig. 6, the Leap Motion controller is placed in front of the player, whose hands are held out straight on each side of the sensor device.
Type-III: Hand-A, left-hand navigation gestures allow players to continuously control the avatar's movement in virtual scenes by moving the left hand in different directions or inclining the hand upward/downward, as illustrated in Table 3.
Type-III: Hand-B, right-hand action gestures represent the kinds of action the player wants the avatar to perform. Different gestures of the right hand trigger pre-recorded animation clips to perform different actions. Action gestures include the action gestures defined in the Type-I basic single-hand gesture set (items 6-7) and the Type-II advanced single-hand gesture set.
Generally speaking, handedness differs between people: some are right-handed and others are left-handed. People may also have their own understanding of the meaning of hand gestures, e.g., using a fist to finish the game instead of the Okay sign. The type and appearance of hand gestures may vary a lot. Different kinds of hand gesture sets could of course be defined for more personal and more complex interaction with the depth motion sensor device. No matter how powerful the interaction is, there is one principle we should follow: intuitive and natural input is the most desired feature.
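The Type-III role assignment, together with the handedness remark, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `assign_hand_roles`, the `left_handed` switch, and the idea of telling the hands apart by the x coordinate of the palm (taken here to increase toward the player's right) are all hypothetical and not taken from the system itself.

```python
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def assign_hand_roles(
    palm_positions: List[Vec3], left_handed: bool = False
) -> Optional[Dict[str, Vec3]]:
    """Assign navigation and action roles to two detected hands.

    By default the left hand navigates and the right hand triggers actions.
    Setting left_handed=True swaps the two roles (a hypothetical option).
    Returns None when fewer than two hands are visible.
    """
    if len(palm_positions) < 2:
        return None
    # Assumption: x grows toward the player's right, so the smallest x is the left hand.
    ordered = sorted(palm_positions, key=lambda p: p[0])
    left, right = ordered[0], ordered[-1]
    if left_handed:
        left, right = right, left
    return {"navigation": left, "action": right}
```

Keeping the role assignment in one place means that an alternative gesture set for a left-handed player only needs to change this mapping, not the navigation or action logic.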
The Leap Motion controller can recognize and track hands and fingers with high precision. Furthermore, the movement patterns of each finger can be observed individually, and certain movement patterns can be recognized by Leap Motion as gestures which indicate the user's intent or command. Hand gestures are represented by the "Gesture" class and its subclasses, such as CircleGesture, KeyTapGesture, ScreenTapGesture, and SwipeGesture [31]. For example, moving a hand from side to side indicates a swipe gesture. There is also third-party software devoted to extending the gesture recognition functionality of the Leap Motion controller, which enables users to assign actions to different gestures according to their preference [38]. Using the Leap Motion API and this third-party software, we can track and recognize the different types of hand gestures mentioned before. To reduce the difficulty of operation for young children, our system recognizes simple gestures, such as grip and stretch, as shown in the two bottom rows of Table 1.
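One simple way to separate a grip from a stretch is to count how many fingers are currently recognized as extended. The sketch below assumes this finger-count heuristic and hypothetical thresholds; it is not the recognition method of the system or of the Leap Motion SDK, only an illustration of how two coarse hand poses could be told apart.

```python
def classify_hand_pose(extended_fingers: int) -> str:
    """Classify a hand pose from the number of extended fingers (0 to 5).

    The thresholds are assumptions of this sketch: a closed fist exposes
    almost no fingers, while an open hand exposes nearly all of them.
    """
    if extended_fingers < 0 or extended_fingers > 5:
        raise ValueError("extended_fingers must be between 0 and 5")
    if extended_fingers <= 1:
        return "grip"
    if extended_fingers >= 4:
        return "stretch"
    return "unknown"  # ambiguous pose, ignored by the caller
```

Leaving a band of ambiguous counts in the middle is a deliberate choice here: it avoids flickering between the two poses while a child is still opening or closing the hand.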

Output
The output module updates the puppet crow's position according to the calculation of the motion control module. It also plays pre-recorded animations which have been linked with the recognisable gestures in motion control. Once a recognisable gesture is detected, the pre-recorded animation of the puppet crow linked with this gesture is triggered. The puppet crow can then act as a real puppet and perform a pre-set action in response to the player's hand gestures in the virtual environment. A scenario of a player controlling the avatar by finger gestures is shown in Fig. 9. This module provides the corresponding feedback to players for the adjustment of hand movements, which is vital for our cognitive development purposes as well as for motor coordination ability training.
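The linkage between recognisable gestures and pre-recorded animations can be sketched as a lookup table. In the example below, the class name `OutputModule`, the clip names, and the choice to trigger a clip only when the gesture changes are assumptions made for illustration; the actual system is built in Unity3D and its code is not shown in the text.

```python
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


class OutputModule:
    """Holds the crow position and picks an animation clip for a gesture."""

    def __init__(self, clips: Dict[str, str]) -> None:
        self.clips = dict(clips)          # gesture name -> clip name (hypothetical)
        self.position: Vec3 = (0.0, 0.0, 0.0)
        self.last_gesture: Optional[str] = None

    def update(self, position: Vec3, gesture: Optional[str]) -> Optional[str]:
        """Store the new position and return the clip to play, if any.

        A clip is returned only when the gesture differs from the one seen in
        the previous frame, so a held gesture does not restart its animation.
        """
        self.position = position
        clip = None
        if gesture != self.last_gesture and gesture in self.clips:
            clip = self.clips[gesture]
        self.last_gesture = gesture
        return clip
```

For example, `OutputModule({"grip": "grasp", "stretch": "fly"})` would return `"grasp"` on the first frame in which a grip is seen and nothing on the following frames while the grip is held.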

Prototype
The plot of "The Crow and the Pitcher" is presented first as the hint of the story, as illustrated in Fig. 7. A scenario of Puppet Narrator during playing is shown in Fig. 8.
In the scenario, there are five kinds of virtual items, each of which has different functions, as shown in Table 4.
In Puppet Narrator, young children can use their fingers to control the animation of the puppet. In Fig. 9a, the player stretches his hand and moves it to control the movement of the puppet crow by mapping the palm position to the crow's position. In Fig. 9b, the crow has grasped a pebble successfully and is preparing to drop it into the pitcher.

Progressive storytelling creation
There is a dilemma: on the one hand, using hand gestures to manipulate an avatar to assist storytelling can greatly increase young children's enthusiasm for narrating, but on the other hand, it also demands higher motor and speech coordination. This poses a challenge to young players. To reduce the difficulty of adapting, we have designed a progressive storytelling training model, which includes two practice phases. During the first phase, players only need to focus on the hand gesture interaction and do not need to worry about narrating the story plot. The story plot is narrated automatically by the system following the players' operation, which gives young children the opportunity to master how to use hand gestures to operate the avatars in the virtual scene through the motion sensor device. Once the young players get used to the hand gesture-based interaction, self-narration is added as a requirement in the second practice phase, so that narrative ability is fostered together with motor coordination training. The concept model of the progressive storytelling training process is illustrated in Fig. 10.

Supervised narrative
During the training process, the necessary supervision and guidance of an adult tutor are provided to assist young children's narration. An important aspect is plot control, which is crucial for the further development of interactive storytelling. To give young players the opportunity to narrate while interacting with the virtual puppet, the tutor interrupts and asks the player to stop once a certain task or movement is accomplished, to allow time for narration. In our interactive storytelling system, it is not compulsory for young children to follow the direct causality pattern strictly. The entire narration can be considered as a graph structure which consists of multiple key plot points. The connections between these key points push the plot forward. The key points of the story "The Crow and the Pitcher" are listed below:
• Describe a thirsty crow searching for water.
• Show how the crow tries to reach the water in the pitcher but fails.
• Describe what the crow thinks when seeing the stones.
• Describe how the crow distinguishes between the different uses of stones and sticks.
• Help the crow approach and pick up the stones.
• Help the crow fly to the pitcher and drop a stone into the pitcher to raise the water level.
• Present how happy the crow is when it can drink the water.
The concept model of the supervised storytelling procedure is illustrated in Fig. 11. Key plot points accompanied by players' action and interaction compose the main storyline. At the key points, a tutor would gently interrupt the young players' hand gesture interaction with the avatar and remind them to stop at the "gap" to take their time to narrate. Referencing the draft of the plot, the players can narrate from their own understanding of the story.
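The idea of key plot points with narration "gaps" can be sketched as a small bookkeeping structure. The class name `NarrationSession`, its methods, and the shortened labels for the key points are hypothetical; the sketch only illustrates that key points may be reached in a flexible order and that each one opens a pause for narration.

```python
from typing import List

# Shortened labels for the seven key plot points listed in the text.
KEY_POINTS: List[str] = [
    "crow searches for water",
    "crow fails to reach the water",
    "crow thinks about the stones",
    "crow distinguishes stones from sticks",
    "crow picks up the stones",
    "crow drops a stone into the pitcher",
    "crow drinks the water",
]


class NarrationSession:
    """Tracks which key plot points have been reached and narrated."""

    def __init__(self, key_points: List[str]) -> None:
        self.pending = list(key_points)
        self.narrated: List[str] = []

    def reach(self, point: str) -> bool:
        """Record that a key point was reached.

        Returns True when this opens a gap in which the tutor would pause
        the interaction for narration; repeated or unknown points return False.
        """
        if point in self.pending:
            self.pending.remove(point)
            self.narrated.append(point)
            return True
        return False

    @property
    def finished(self) -> bool:
        """True once every key point has been narrated."""
        return not self.pending
```

Because `reach` accepts the points in any order, the sketch reflects the statement that children need not follow a strict causal sequence.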
The screenshots of the story narrated by a 7-year-old boy are shown in Fig. 12.

Evaluation
(1) Participants A pilot study with four young children aged 5-8 years (mean 6 years and 4 months) was conducted for pedagogical evaluation, with permission from their parents, who were informed about the nature of the study and its purpose. One adult volunteer (a postgraduate student) took part in the experiment as the observer to assess the young players' performance. This observer has a research background in HCI and is experienced in VR development. He provided only observations rather than any professional opinion or analysis.
(2) Method All participants were trained to familiarise themselves with the gesture controller after a detailed explanation of the experiment.
In the experiment, each child carried out five trials of "The Crow and the Pitcher" story. Following the provided key story points, all children engaged in the linguistic game with great interest and built their own stories by interacting with the digital puppets.
During playing, some minor frustrations (or difficulties) were observed, such as picking up the stick by mistake, failing to move the avatar toward the pitcher, or using a second hand to support the main hand in operation, etc. Getting around these difficulties also made the children feel rewarded and find the game interesting.
An overview of the observations is as follows:
• All of the children could finish the game with few faults after a couple of rounds of repetition.
• All of the children could understand the plot and narrate it with more complex words of their own in the last three rounds.
For pedagogical evaluation, two different types of evaluations are conducted: metric-based objective evaluation and observer's subjective evaluation.
(3) Objective evaluation Research on assessing the quality of story narration by elementary school students has a long history [39][40][41]. Various variables have been examined, including story length and syntactic complexity [42][43][44], the use of specific story grammar components, and the presence of episodes [45]. Of these, story length and syntactic complexity are the metrics most commonly examined in studies of school-age language development [42][43][44]. In this paper, story length and syntactic complexity were studied for narrative ability evaluation in two ways: the metric of narrative complexity (M2), which counts the number of words produced by the children, in the objective evaluation phase, and the criterion of Vivid Narration in the subjective evaluation. The evaluation of other narrative abilities, including Story Grammar Components, Episodes, and Story Comprehension, is relatively complex, because it requires analysis and agreement among independent judges as well as children's answers to several questions [45]. Since the main concern of this paper is to provide a novel interactive method to assist narration, rather than a case study on pedagogical evaluation using existing educational tools/systems, only the most commonly used variables were considered.
Event-related potentials (ERPs) are currently used to study cognitive processes and motor control in VR. An ERP is the measured brain response to a sensory, cognitive, or motor event [46]. For example, the ERP method has been employed to study cognitive processes in a virtual traffic environment through the presentation of traffic signs with different background colours [47]. In virtual cognitive training [48], subjects' motion and cognitive processes were studied by asking participants to think about and then grasp virtual objects (i.e., glasses, cup, scissors, mouse, pen, and fork) with a gesture controller. Based on this previous ERP research, three events were investigated in our experiment in accordance with our system features and target users: locating the hand to grasp a pebble, picking up the stick by mistake, and performing wrong hand gestures. They were measured with three metrics, respectively: number of tries to locate a pebble (M3), number of tries to pick up the stick (M4), and number of wrong hand gestures (M5). For more details of the ERP method, refer to the previous research [46][47][48]. Because the smoothness of hand movement and the accuracy of hand/avatar location are difficult to measure objectively, these two abilities were instead assessed by the observer's subjective judgement. The objective metrics were recorded in each round of trials. The trend of the average metric values, which shows improvement, is plotted in Fig. 13, and the details are given in Table 5.
Both Fig. 13 and Table 5 show the improvement in the young participants' performance over several rounds of trials. The ability to locate the pebble successfully (M3) improves with practice in picking up and dropping pebbles, as indicated by the decline in the number of locating tries (green line in Fig. 13). The average number of times the stick is picked up by mistake (M4) drops from 1 to 0, which means that the children can distinguish the stick from the pebbles after two rounds of training. The number of wrong hand gestures (M5) also improves in the later stages of the experiment compared with the earlier attempts.
We noticed in the experiment that, with some encouragement, the young children were able to tell the story confidently and use more words and longer sentences in the last two rounds of trials, which is reflected by metric M2 (red line in Fig. 13). One interesting finding is the change in the duration of narrating the story (M1). Initially, participants took 3 min on average to complete the whole narration; they could then finish it within 1.5 min, thanks to growing familiarity with the system. Finally, the duration of narration tended to become a little longer again. The reason is that once young children get used to the control mechanism and the interaction interface, they tend to pay more attention to the story line and to improvisation, which is vital for storytelling and more suitable for pedagogical purposes as a learning tool for children.
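The per-round averaging behind a trend plot such as Fig. 13 can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the `TrialRecord` structure, the field names, and all numeric values are hypothetical placeholders chosen only to make the example run, and they are not data from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TrialRecord:
    """One child's objective metrics for one round (hypothetical layout)."""
    child: str
    round_no: int
    m1_duration_min: float  # M1: duration of narration, in minutes
    m2_word_count: int      # M2: number of words produced
    m3_locate_tries: int    # M3: tries needed to locate a pebble
    m4_stick_pickups: int   # M4: times the stick was picked up by mistake
    m5_wrong_gestures: int  # M5: wrong hand gestures


METRICS = (
    "m1_duration_min",
    "m2_word_count",
    "m3_locate_tries",
    "m4_stick_pickups",
    "m5_wrong_gestures",
)


def round_averages(records):
    """Return {round_no: {metric: mean over all children in that round}}."""
    by_round = {}
    for rec in records:
        by_round.setdefault(rec.round_no, []).append(rec)
    return {
        rnd: {m: mean(getattr(rec, m) for rec in recs) for m in METRICS}
        for rnd, recs in sorted(by_round.items())
    }


# Hypothetical placeholder values, NOT measurements from the experiment.
records = [
    TrialRecord("A", 1, 3.0, 40, 5, 1, 3),
    TrialRecord("B", 1, 3.0, 44, 7, 1, 5),
    TrialRecord("A", 2, 2.0, 50, 4, 0, 2),
    TrialRecord("B", 2, 1.0, 54, 2, 0, 2),
]
averages = round_averages(records)
```

Plotting each metric in `averages` against the round number would give one trend line per metric.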
(4) Subjective evaluation Three criteria are included in the subjective evaluation of the young players' performance: smoothness of hand movement, accuracy of hand/avatar location, and vivid narration.
• Smoothness is the characteristic of the player's coordinated hand gesture movement.
• Location accuracy describes how accurately the player's hand manipulates the avatar to a certain location in the virtual scene.
• Vivid narration requires young children to use rich expressions to create a clear picture and use vocal tones to suit the story. It enables the audience to be fully engaged and emotionally involved in the story.
The evaluator rated the players on each subjective criterion using a scale that ranges from "poor" to "excellent", as opposed to objective evaluations, which more often attach a numerical score to the criteria. The evaluator's records showed an improvement in the players' performance over the course of the subjective evaluation. First, the young players' hand movements became smoother with practice. The decrease in jerky hand movement suggested an improvement in motion control and hand-eye coordination.
Second, the ability to locate the hand position accurately was enhanced. From the tutor's observation, the number of relocation attempts decreased in the last trials. After several rounds of practice, the children tended to control the avatar's movement more easily by gradually adapting to the mapping between their hand positions and the gesture controller's coordinate system.
Finally, and most importantly, the training led to the ability to present a vivid narration. According to the records, the children tended to use more descriptive words for narration across the five rounds. One typical example is the wording used by a 7-year-old boy at the start of the story. In the first trial, the boy used the sentence "A crow is looking for some water". In the third trial, the sentence was rephrased to "A thirsty crow is flying around looking for water". After more decorative words were added, the sentence finally became "A thirsty crow is flying around on a hot summer day looking for water". This example shows that the narrative ability was enhanced during storytelling. As smoothness of hand movement and location accuracy improved after several rounds of trials, the children were able to pay more attention to narration and rhetoric, which is vital for storytelling and even more suitable for pedagogical purposes as a learning tool for children.
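The growth in the boy's opening sentence can also be expressed as a simple word count, the quantity that metric M2 tracks. The snippet below is only an illustrative sketch: the `word_count` helper and its tokenisation rule (runs of letters and apostrophes) are assumptions, not the counting procedure used in the study. The three sentences are the ones quoted above.

```python
import re


def word_count(sentence):
    """Count words as runs of letters/apostrophes (an assumed, simple rule)."""
    return len(re.findall(r"[A-Za-z']+", sentence))


# The three versions of the opening sentence quoted in the text.
narrations = {
    "first trial": "A crow is looking for some water",
    "third trial": "A thirsty crow is flying around looking for water",
    "final version": (
        "A thirsty crow is flying around on a hot summer day "
        "looking for water"
    ),
}

counts = {label: word_count(text) for label, text in narrations.items()}
print(counts)
# {'first trial': 7, 'third trial': 9, 'final version': 14}
```

Under this counting rule the sentence doubles in length between the first trial and the final version.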

Discussion
During the test, the following aspects of the children's cognitive competence and motor coordination ability were improved.
Numerical cognition: all three pebbles need to be dropped into the pitcher before the crow can drink the water, so finishing the game requires basic numerical cognition.
Spatial cognition: pebbles can only be picked up within a predefined area and dropped into the pitcher, so locating a pebble requires spatial cognition.
Visual perception: there are several kinds of virtual items in the scenario, each with different properties and usages, so the players need to distinguish them from one another and make correct choices. For example, only pebbles can make the water level rise; the stick will not work.
Motor coordination: only predefined hand gestures can be recognized by our system, so players need to perform hand gestures correctly. To control the puppet crow, players use hand motions as the input to manipulate the crow in the virtual environment and adjust their hand gestures according to the visual feedback received from the crow's responding movement. This requires players to move their hands smoothly and steadily. During the performance, the players' narration is often combined with hand movements and changing gestures, which also builds motor coordination.
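The game rules referred to above (three pebbles are required, pebbles can only be picked up inside a predefined area, and the stick does not raise the water) can be summarised in a small state model. This is a hedged sketch and not the system's implementation: the class and method names, the 2D coordinates, and the bounds in `PICKUP_AREA` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PEBBLES_NEEDED = 3                   # stated in the text
PICKUP_AREA = (0.0, 0.0, 1.0, 1.0)   # hypothetical x_min, y_min, x_max, y_max


def in_pickup_area(pos):
    """True if a 2D position lies inside the assumed pebble pickup area."""
    x, y = pos
    x_min, y_min, x_max, y_max = PICKUP_AREA
    return x_min <= x <= x_max and y_min <= y <= y_max


@dataclass
class PitcherGame:
    pebbles_in_pitcher: int = 0
    held_item: Optional[str] = None

    def pick_up(self, item, pos):
        """Try to pick up an item; pebbles must be inside the pickup area."""
        if self.held_item is not None:
            return False
        if item == "pebble" and not in_pickup_area(pos):
            return False
        self.held_item = item
        return True

    def drop_into_pitcher(self):
        """Drop the held item; only a pebble raises the water level."""
        item, self.held_item = self.held_item, None
        if item == "pebble":
            self.pebbles_in_pitcher += 1
            return True
        return False

    @property
    def crow_can_drink(self):
        return self.pebbles_in_pitcher >= PEBBLES_NEEDED


game = PitcherGame()
game.pick_up("stick", (0.5, 0.5))
game.drop_into_pitcher()             # the stick has no effect on the water
for _ in range(PEBBLES_NEEDED):
    game.pick_up("pebble", (0.5, 0.5))
    game.drop_into_pitcher()
print(game.crow_can_drink)           # True
```

Each rule in the model corresponds to one of the competencies listed above: counting to three (numerical), staying inside the pickup area (spatial), and choosing the pebble over the stick (visual perception).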
In our experiment, the basic single-hand gesture set (Type-I) was used as the HCI instead of more complex hand gestures, although we have designed two other types of hand gesture sets to provide different levels of interaction. Because of the young participants' limited motor control and cognitive capability, it was difficult to train them to use the complex hand gestures. Simple hand gestures are better suited to young children, providing a natural and intuitive way to interact at a basic level. A complex gesture set could be introduced for an older age group in a new experiment or in other pedagogical practices.
One limitation of the user interface comes from the hardware. The Leap Motion device can only detect hands, fingers, and tools within the tracking area. Once the player's hand moves out of range, the system can easily lose track, and the puppet will stay still where it is until the hand moves back within the detection area and is recognised again by the controller. While providing a more immersive experience to the players, negative effects, e.g., cybersickness, should also be taken into account in system design. To alleviate such effects, for example, the entire virtual game/story is designed to consist of several key plot points, as discussed in the system design. In addition, at some points (e.g., the key plot points), players were reminded to stop at the "gap" and take their time to narrate. This means that there should not be too many time restrictions, especially when the target users are young children.
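The tracking-loss behaviour described above, in which the puppet holds its last position until the hand is detected again, can be sketched as follows. The snippet deliberately avoids any real Leap Motion API: `PuppetTracker` and its `update` method are hypothetical names, and a lost hand is modelled simply as `None`.

```python
from typing import Optional, Tuple

Position = Tuple[float, float, float]


class PuppetTracker:
    """Holds the puppet's position and freezes it while the hand is lost."""

    def __init__(self, start: Position = (0.0, 0.0, 0.0)):
        self.position = start
        self.tracking = False

    def update(self, hand: Optional[Position]) -> Position:
        """Feed one sensor frame; `None` means no hand in the tracking area."""
        if hand is None:
            # Hand out of range: keep the puppet where it was.
            self.tracking = False
            return self.position
        # Hand detected (again): the puppet follows it.
        self.tracking = True
        self.position = hand
        return self.position


tracker = PuppetTracker()
frames = [(0.1, 0.2, 0.3), None, None, (0.4, 0.5, 0.6)]
path = [tracker.update(frame) for frame in frames]
# path holds (0.1, 0.2, 0.3) for the two lost frames, then jumps to the
# re-detected hand position.
```

A real system might interpolate toward the re-detected position instead of jumping to it, which is a design choice outside what the text specifies.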
Our evaluation is limited by the small data pool. According to our observation, there were obvious differences in storytelling performance between children at the ages of 5 and 8. In addition, language skills also varied between genders. We expect to observe and discuss these differences more thoroughly in our next study. We plan to use our system to conduct indicative experiments with more subjects to identify the pros and cons with quantitative data and analysis. Both independent and dependent variables will be measured or observed to provide a clearer understanding. In addition, a control group will be included in future work for comparative evaluation. The number of subjects is limited by the difficulty of working with young children, for whom the close supervision and support of parents are necessary. As a result, the present experiment has no control group, which may cause ambiguity in the results.

Conclusion
We have presented a novel digital storytelling system assisted with virtual puppetry, 'Puppet Narrator', which provides young children with natural interaction/control and an immersive experience when narrating a story. The system is designed to support the training of different cognitive skills and motor coordination through storytelling. It is a novel attempt to include advanced motion-sensing technology and computer animation as a medium for this development.
The usability of the system was preliminarily examined in our test, and the results of the analysis, which showed that young children can benefit from playing with Puppet Narrator, are promising. However, as only a limited number of subjects were tested, we will need to examine more cases and design a psychological experiment to confirm the conclusion. Further validation and analysis of the effectiveness of this approach are needed, but at the moment, our observation and analysis can be supported by the success of other parallel developments in digital storytelling [8].
The storytelling in our test is a supervised learning process guided by an adult, which is important for the young children to accomplish their narrative task and receive proper training.
Without supervision, it is possible that the virtual puppet may distract from the storytelling, so that the child focuses on playing with and controlling the puppet without practicing narration. Future development of such a system will require a rewarding strategy that automatically encourages and rewards the players when they accomplish the narrative task properly.