Introduction

Embodied cognition posits that cognition is grounded in our bodily interactions with social, cultural, and physical environments (Barsalou, 2010; Núñez, 2005; Wilson, 2002). It proposes a viable alternative to some core assumptions of early cognitive science, such as the idea that knowledge is encoded in amodal, symbolic cognitive structures. In the embodied cognition perspective, cognition is deeply rooted in our bodies and the possible interactions between the body and the physical environment. Embodied cognition theory also emphasizes that bodily and emotional engagement is an essential part of learning, which conventional curriculum design often fails to incorporate (Glenberg, 2010).

In recent years, advances in digital technology have spurred designs for embodied learning experiences, applying embodied cognition theory to the learning of academic topics for various age groups of learners. Virtual and augmented realities, for example, enable designing learning environments within which bodily movements and gestures are used to perform a learning task (e.g., Lindgren & Johnson-Glenberg, 2013).

This paper introduces humanoid social robots, also called sociable robots (Breazeal, 2002), as a new tool to facilitate embodied learning experiences. Over the last decade, social robotics, a rapidly growing field in engineering and computer science, has examined the interaction and relationship building between robots and human users. In education especially, researchers often frame the robot through the metaphor of a co-learner or a playmate. These robots have been widely developed to support young children’s learning and development (Kennedy et al., 2015). Much research in this arena shows the great potential of the robots for children’s coordinated intellectual and social development (Belpaeme et al., 2018), which is in line with core claims of embodied cognition. Nevertheless, social robots have rarely been examined from the perspective of embodied cognition and learning. We (the authors) hypothesize that the robots can facilitate the embodied interactions of young children while they learn foundational academic skills and concepts. In this paper, we sought to understand the affordances of the robot for enabling young children’s embodied learning experiences. We examined children’s interaction behavior using datasets that we had collected, as part of multiyear design research projects, from two cohorts of children who participated in two robotic interaction activities. Given the lack of prior research, we adopted a grounded theory approach (Charmaz, 2006) to explore emerging patterns in embodied actions while children engaged in robotic interactions.

Theoretical background

Embodied cognition

In the last two decades, the idea that the body is closely tied to cognition has attracted many researchers in psychology, cognitive science and learning sciences (Alibali & Nathan, 2012; Glenberg, 2010; Wilson, 2002). Scholars of embodied cognition generally agree that sensory motor systems are involved in cognition and that cognition is mediated by body-based systems including body shape, bodily movements, and experiences of bodily processes (Lakoff & Johnson, 1987).

According to this perspective, cognition is “an extended system assembled from a broad array of resources” (Wilson & Golonka, 2013, p. 1), rather than confined to the brain and operating on amodal symbolic representations. The perspective includes the assumption that sensory systems not only deliver information to the brain but also process it in a way that enables actions. Further, perceptual processes alone may be able to direct behavior towards a goal without the need for abstract representations of the goal in the brain (McBeath et al., 1995). Over the last few decades, considerable evidence has accumulated on the integral role of the body in knowledge development and learning (Bieda & Nathan, 2009; Richland et al., 2007), highlighting the benefits of involving the body during cognitive activities (Alibali & Kita, 2010; Goldin-Meadow et al., 2008).

The theory of embodied cognition also emphasizes the importance of environment-body links and body states (including emotional states) in explaining cognition (Danziger et al., 2011; Gallagher, 2005; Shapiro, 2011). The body is directly in touch with the environment; the senses and visceral feelings guide or influence action directly without central processing by the brain. In performing cognitive tasks, therefore, the position of the body is consequential in what resources are appropriated, which in turn influences how problems are solved. For example, Nemirovsky et al. (2012) showed that when the whole body was involved in solving algebra problems, the way in which learners developed solutions differed from when they used paper and pencil. In this study, learners solved algebra problems on a classroom floor on which mathematical symbols and structures were drawn, carrying out operations and anticipating results using their bodies. The study highlighted that in the paper-and-pencil setting, learners developed a more discrete style of action based on a sequence of individual operations. Embodied solutions, in contrast, were grounded in “the continuity of geometric space” (p. 320).

The role of the social environment in which an embodiment occurs is another focal point in the embodied cognition perspective. Its relevance is emphasized in Goodwin’s (2003) account of the ecology of multimodal systems (e.g., gestures, gaze, language). These systems function together to build relevant action within an interaction. For example, Goodwin (2003) suggests that gestures are intended to be publicly visible. Their production is grounded in environmental structures and behaviors that are deployed jointly to make gestures interpretable by the participants. Clearly, in this perspective, embodiments are also produced to make other behaviors interpretable. This view contrasts with that of others (e.g., McNeill, 2005), who view gestures as embodied manifestations of the same psychological processes that underlie the production of speech. Goodwin’s (2003) account is closer to Gibson’s (1986) ecological psychology perspective, which describes human organisms as operating within an environmental niche co-created by the capabilities of the organism (including, clearly, bodily capabilities) and properties of the physical and social environments that establish possibilities for action.

Recent work has integrated these views into analyses of children’s bodily movements during learning activities. Flood and Abrahamson (2015), for example, documented how students’ and teachers’ gestures are produced so as to invite elaboration of each other’s thinking. In a session on learning the normative definition of speed, a child used a gestural representation of distance (two differently-sized pinches using thumb and index finger), and the teacher built on it to teach the child that definition: greater distance per unit time. The gesture is interpretable only by taking into account environmental structures, other co-occurring behaviors (talk) and the goal to make it visible to the teacher. It was based on a graphical representation of distance visible on a screen and produced within the line of sight of the teacher so as to invite elaboration.

In this sense, bodily expressions are social actions. Bodily action is a form of communication (Goodwin, 2000) that can be interpreted effectively in terms of its position in an interactional sequence. Figuring out how forms of embodiment may evolve and build on each other during social interaction will help advance both embodied cognition theory and the design of embodied learning experiences (Roth & Thom, 2009; Wittmann et al., 2013).

Further, there is growing evidence that children’s reasoning is embodied, and that embodying reasoning is beneficial to their learning. Williams (2012), for example, showed that children, aged six to eight, relied on image schemas grounded in bodily structures and processes to reason about time when reading an analogue clock. Williams (2012) contends that during clock reading the image schema SOURCE-PATH-GOAL “structures the conceptualization of a full path of motion” (p. 223). This image schema is grounded in bodily experiences of moving towards a goal (Lakoff & Núñez, 2000). The clock’s two hands, which are linked conceptually and mechanically, follow a path of motion starting and ending at the 12. This point determines whether the time is called using “past” or “till”: for reading “past” time, the reference hour is the source of the short hand motion (e.g., “it’s 20 past 6”), while for “till” readings it is the goal of the motion (“it’s 10 till 7”). In a study by Boncoddo et al. (2010), children aged four to six were instructed to use their hands to accompany their thinking about a mechanical process. These children developed more correct explanations than children who only observed the process. Importantly, the embodied reasoning of children is well-aligned with child development theory that acknowledges the integral nature of intellectual, social, emotional, and sensory-motor development.

To conclude, the various ways in which the body is implicated in cognition and learning have not yet been sufficiently explored. To advance the theory of embodied cognition, it is necessary to design environments where the complex interrelations between cognition, the body, and the environment can be observed in situ (Malinverni & Pares, 2014). Also, in order to design programs in ways that support the cognitive and emotional development of children, we need to know how embodiment evolves in physical and social environments. Documenting children’s embodiments will enable the development of coherent models and effective embodied learning designs.

Digital technology and embodied learning design

Advanced digital technology has been increasingly used to develop environments that support embodied learning. Such environments are designed to develop perceptual and cognitive structures and processes by prompting learners to engage in physical actions. Different approaches to designing technology have been tested, but the efficacy of one approach over another is still to be determined (Johnson-Glenberg & Megowan-Romanowicz, 2017).

In general, there is a consensus that effective designs must allow learners to enact their full behavioral repertoire rather than limiting them to a certain range of actions and gestures through direct instruction (Antle, 2009). Hall et al. (2014), for example, designed an environment where learners were able to freely choose the way that they perceived and appropriated resources to develop their understanding. Likewise, Martin (2009) showed that children (1st graders) achieved more conceptual gains from embodying their mathematical thinking when they freely explored solutions involving physical actions than when their actions were constrained. This kind of design seems to closely mirror how we learn in the real world and is also in line with the view that learning is better supported when learners are able to direct their own activities and physical manipulations (McNeil & Uttal, 2009).

Social robots

Social robots (or sociable robots) are autonomous humanoid robots that interact, communicate, and do things collaboratively with humans and other robots (Kose-Bagci et al., 2009; Taipale et al., 2015). As a subset of service robots, they are popular in education, health, and other retail service sectors and are distinguished from industrial automated systems that perform menial and hazardous tasks on behalf of humans (International Federation of Robotics, https://ifr.org/service-robots). Modelled on human social and cultural protocols, social robots are designed to follow the rules and manners commonly expected in human social relations for their respective roles, and to demonstrate social behaviors (such as greeting and being polite and friendly).

The use of social robots is in line with classical theories in developmental psychology, which emphasize the importance of social interaction in child development. Bandura’s concept of peer modeling (Bandura, 2001) highlights the importance of peer interaction in a child’s learning and play; in one study, companion robots matched to children’s ability levels collaborated with children as they learned target skills (Westlund & Breazeal, 2015). The Vygotskian concept of the zone of proximal development (Vygotsky et al., 1978) supports the presence of an advanced other to stimulate intellectual development; likewise, tutor robots can help improve children’s learning through social and instructional dialog (e.g., Saerbeck et al., 2010). To date, the findings from child-robot interaction research are consistent in that children are highly engaged in the task when assisted by a robot and develop social and affective relationships with the robot.

Physical presence with embodiment seems to be a defining feature of social robots. An example of a social robot is presented in Fig. 1a; such robots typically stand on two legs and move. This distinguishes the robots from other digital tools such as virtual agents (i.e., animated on-screen characters) and mobile devices. A number of studies over the last decade have shown the benefits of the physical presence of an embodied robot for learners’ motivation, engagement, and task performance. In a pioneering work, Kose-Bagci et al. (2009) compared three types of robot presence in child-robot collaboration, where children (aged nine to ten) practiced drumming while taking turns with the robot KASPAR: physically embodied vs. digitally embodied (i.e., an on-screen robot) vs. disembodied (no visual presence of a robot). Children who worked with the physically embodied robot understood the game better and improved their drumming and turn-taking performances compared to children in the other two conditions. This impact was stronger when the physically embodied robot used hand gestures; also, in this condition, children perceived the robot as significantly more intelligent. The study did not clarify how this stronger attribution of intelligence to a gesture-using robot might relate to children’s performance. We speculate that a child’s theory of mind (attributing mental states to others and objects) might trigger a sense of social presence in child-robot interaction, which then facilitates their developing social relations with the robot. In another study, students demonstrated more collaborative behaviors in book-moving tasks with the robot physically present than with on-screen robots, responding to the robot’s request in a shorter time and performing unusual tasks more frequently (Bainbridge et al., 2011). Also, Li (2015) surveyed thirty-three empirical studies that have examined the effectiveness of the physical presence of embodied robots. The author concluded that participants, regardless of their age, were better persuaded by physically embodied robots, perceived the robots more positively, and often performed better with the robot than with on-screen avatars.

Fig. 1 Sample snapshots of child-robot interaction

As such, physically embodied robots seem to afford a stronger sense of social presence for children (Kennedy et al., 2015), providing social, interactive, and even affective contexts. Such contexts seem essential for the holistic development of children. Indeed, complex pro-social behavior in social robots, such as attention-guiding and showing empathy, resulted in increased learning in children (Leite et al., 2013; Saerbeck et al., 2010). Narrative gestures used by story-telling robots increased children’s story recall (Huang & Mutlu, 2013).

Our studies on embodiment in child–robot interaction

Purpose of study

The purpose of the current studies was to understand whether social robots could be effectively used as a tool to facilitate children’s embodied interactions. With the datasets collected from two cohorts of kindergarten children from two multiyear design research projects, the authors qualitatively observed children’s interactions in robot-mediated activities. In the first dataset (Study One), children learned and played with a robot individually; in the second dataset (Study Two), children worked collaboratively with a peer in robot-mediated activities.

Methodological framework and process

This line of inquiry was unprecedented, so we took a grounded-theory approach (Glaser & Strauss, 1967), seeking emerging patterns in children’s interaction behaviors that recurred across the robotic activity sessions. We ethnographically observed the activity in situ (at home and/or in school), and two senior researchers and graduate assistants took notes on children’s activities, actions, conversations, interpersonal interactions, and other observable behaviors. In taking notes, and later when viewing the videos, we were broadly guided by the notion of exchange structure, as this had guided the design of robot actions. Attuned to this notion, we identified episodes as bounded by an initiation (e.g., the robot moving; the robot asking a question) and a response (e.g., a child carrying out a physical action; a child’s answer). Re-initiations and follow-up questions were viewed as separate episodes. The 20-min sessions were segmented into about 30 episodes of varying length, which could range from 10 s to a few minutes. When a behavior of interest to our research questions occurred in a segment, it was interpreted in the subsequent steps as an instance of the broader categories that emerged from the analysis.
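The segmentation rule described above can be made concrete with a small sketch. The Python below is illustrative only: the Event and Episode types, their field names, and the sample events are hypothetical and were not part of the study’s tooling, which relied on field notes and video review. The sketch assumes that every initiation (including re-initiations and follow-up questions) opens a new episode and that each response is attached to the most recently opened episode.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """One coded observation (hypothetical structure, for illustration)."""
    time_s: float   # seconds from session start
    actor: str      # "robot" or "child"
    kind: str       # "initiation" or "response"
    note: str = ""


@dataclass
class Episode:
    """An exchange bounded by an initiation and its response(s)."""
    initiation: Event
    responses: List[Event] = field(default_factory=list)


def segment(events: List[Event]) -> List[Episode]:
    """Split a session into episodes.

    Every initiation starts a new episode, so re-initiations and
    follow-up questions become separate episodes. Responses that
    occur before any initiation are ignored.
    """
    episodes: List[Episode] = []
    for ev in sorted(events, key=lambda e: e.time_s):
        if ev.kind == "initiation":
            episodes.append(Episode(initiation=ev))
        elif ev.kind == "response" and episodes:
            episodes[-1].responses.append(ev)
    return episodes


# Invented sample data, not drawn from the study's recordings.
events = [
    Event(0.0, "robot", "initiation", "asks a question"),
    Event(4.0, "child", "response", "answers with a gesture"),
    Event(10.0, "robot", "initiation", "follow-up question"),
    Event(13.0, "child", "response", "elaborates verbally"),
    Event(40.0, "robot", "initiation", "moves toward child"),
]
episodes = segment(events)
```

In this sketch the follow-up question at 10 s opens its own episode, mirroring the rule that re-initiations are treated separately; an initiation with no recorded response still yields an episode with an empty response list.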

Then, we integrated everyone’s field notes into one master researcher journal for analysis. We also recorded the sessions by digital media (audio/video recordings). Afterwards, one researcher, with the help of the graduate assistants, conducted a systematic and thorough analysis. We first reviewed the episodes in the researcher journal in detail to check whether the behaviors described therein were sufficiently salient and interesting to warrant their interpretation. Following this, we further reviewed the video clips specific to the scene of interest, which provided richer and more accurate multimodal information. We then summarized recurring behavioral patterns, categories, and instances; lastly, both researchers discussed this output to confirm the patterns.

As we observed the children’s interactions on site, it was evident that children’s interactions were richly embodied, and that this embodiment was inherently present in almost every action they took. Through the lens of embodied cognition theory, we analyzed the embodied actions of every child in the cohort both as an individual and as an interactant in the triad. We looked for notable patterns in the ways in which children involved their body and social emotions and used the physical environment during problem-solving and communications with the peer and the robot, referring to Wilson and Golonka’s (2013) questions on the nature of tasks and the ways individuals access, assemble, and use bodily, social, and physical resources to accomplish the tasks. The overall analytic techniques were similar to classical qualitative data analyses (Charmaz, 2006; Corbin & Strauss, 2008), starting with open and thematic coding and moving small bits of information into larger categories. In this process, we constantly compared patterns that emerged in most of the children’s behavior across the interaction sessions and refined the patterns to ensure that they were unique. Prior accounts of embodied learning (discussed in the literature review above) provided some guidance in developing these categories. For example, as we viewed and discussed children’s embodied behaviors, it became clear that in almost all responses to questions relating to mathematical measures, children accompanied their verbal responses with bodily movements. Thus, we decided to develop the category Embodied doing of early mathematics and science. Table 1 provides a summary account of the process and resulting qualitative categories. We paid particular attention to those aspects of children’s engagement that were particularly afforded by a physically embodied robot and unlikely to be demonstrated in interactions with other popular electronic devices (e.g., computers, tablets, and phones). This enabled us to reveal that embodied actions resided in the physical space afforded by the robot.

Table 1 Summary data coding

The authors have complied with American Psychological Association ethical standards for conducting research with young children, including institutional and parental approval and protecting the rights and welfare of the children. We acknowledge that child-robot interaction in this study is in line with the position of the National Association for the Education of Young Children (https://www.naeyc.org/), in that our designs are developmentally appropriate and equitable in order to optimize opportunities for all children’s cognitive, social, emotional, physical, and linguistic development.

Study one: a robot for active engagement

Material and context

In this study, we developed one-on-one child-robot interaction activities, where young children learned basic English language and literacy skills (i.e., identifying the names of geometric shapes, the names of colors, and letter-sound correspondences). The activities were implemented at home and in preschool with eleven children (aged three to six) two times within a one-week span, each time for about an hour.

According to child development theory (e.g., McDevitt & Ormrod, 2015), children aged three to ten are typically at the developmental stage where they develop fine and gross motor skills. Also, pediatric research reports that children’s visual-motor skills are closely connected to later success in academic skills development (Radesky et al., 2015). Our main design goal was to create learning activities that were developmentally appropriate and also engaged children holistically (i.e., bodily, emotionally, and cognitively). Given that the robot was new to many children and caregivers, we designed the activities to balance familiar materials (rhymes, flashcards, and books) and the new technology (Kim & Smith, 2017). The various sensors and wheels on an embodied robot enabled the designers to implement multiple channels for haptic and kinetic interaction between child and robot (see Fig. 1a). The sensors also interfaced with physical flashcards and books (see Fig. 1b, c). We produced three main activities (songs, card games, and a book), each of which played a specific role in incremental mastery of the learning objectives. The song activity was designed to expose children to the topic as the robot sang about it. The card game activity allowed children to practice, as reinforcement, what had been introduced. The book activity extended what children had learned by applying it to new contexts, in which children used the topical words and sounds to help the robot find a passcode to launch its spaceship.

Emerging themes

We identified two distinct phenomena from the observations: (i) rich multimodal interaction and (ii) fluid learning space.

Rich multimodal interaction

Particularly when compared to interactions with other digital media such as on-screen characters and tablets, one clear pattern was observed in children’s interactions with the robot. These transactional interactions were not restricted to a stationary screen. The children did not have to hold the device with their hands or use other tools (e.g., a mouse or keyboard) to respond to the robot. Instead, children used a touch screen to select an item according to the robot’s cues while the robot nodded its head down to present the visuals on the screen from an appropriate angle for the child. As shown in Fig. 1d, while playing games, children touched the robot’s forehead, which was equipped with touch sensors, or the proximity sensor on the robot’s eyes (see Fig. 1a). In Fig. 1e, children showed a physical card to the robot, whose mouth was equipped with an optical sensor that read the card. In the activity with a physical book (Fig. 1f), children used a wand to cue the robot’s book reading and play an I-SPY game on the book’s pages. All of these actions by the children demonstrated that the robot’s embodiment created possibilities for diverse interaction modalities between the robot and the child.

Fluid learning space

The robot’s bodily movement appeared to trigger children to move, adjusting their postures and locations in response to those of the robot. Very often children made sure that the robot could see what they were doing by voluntarily re-positioning themselves within its visual field. They directed their embodiments to its front, e.g., making a gesture close to and in front of the robot’s face.

Figure 2 presents snapshots of this transitional interface. In this activity, the robot used its wheels to move around on the floor while it sang songs about the names of colors and shapes. In the color songs, it moved as if it were dancing to the rhythm. In the shape songs, it moved to draw the shape (for example, a triangle) on the floor. At the end of the song, the robot repeated the word “triangle” a few times in a different pitch and tone each time, nodding its head. It asked children “can you say triangle?” Children repeated the word while mimicking the robot’s behaviors. In summary, in the learning activities afforded by the robot, the children’s learning space was fluid, and their engagement was mobile.

Fig. 2 Examples of fluid learning space

In Study One, it was evident that the physically embodied robot elicited children’s active bodily engagement. Additionally, two unexpected, noteworthy patterns emerged from our observation: (i) extended attention and engagement and (ii) voluntary collaboration with a peer. The diverse and free interactions with the robot mentioned above might be conducive to children’s extended attention and engagement in this playful learning activity. Each interaction session was designed to last fifteen to twenty minutes; the design team scheduled trial sessions for one child for about thirty minutes, including some preparation time. However, each session ran more than one hour because the team found it difficult to interrupt the children’s flow in their engagement with the robot (Csikszentmihalyi, 1990). When the team finally intervened to stop the session, the children always wanted to play more and only left reluctantly, asking if the robot would come back. This observation made the team rethink young children’s attention span, conventionally viewed as quite limited. When a learning activity infused a sense of play, allowing children to move around and interact naturally (i.e., diverse multimodal interactions using the hands and body freely), their attention lasted much longer. Also noticeable was that the children voluntarily invited nearby friends to join their play with the robot, although the play was designed for one-on-one interaction (between the robot and one child). The two children voluntarily discussed the problem together, took turns, and helped each other. This motivated the design team to leverage the robot’s affordances for developing collaboration skills among young children.

Study two: a robot for mediating children’s collaboration

Material and context

In Study Two, we instantiated a triad of a robot and two children where the robot mediated the children’s collaboration. During this triadic interaction, the children were expected to develop collaborative skills while helping the robot as they discussed early academic concepts and solved the relevant problems. We implemented this activity with ten pairs of children in a media lab of a public elementary school over two weeks (six sessions, each session taking twenty to twenty-five minutes). The observation team consisted of two senior researchers and two graduate assistants, who focused on understanding how children spontaneously use their bodily movement, gestures, and physical environment in order to communicate their understanding of early academic concepts.

At the start of the activity, the robot introduced itself as a newbie who had just arrived from another planet and asked the children to help it learn about things on Earth. The topics included animals, birthdays, family, and school. Out of six interaction sessions, three were conversational and three involved solving problems using one shared tablet. During the conversational sessions, the robot asked open-ended questions like “What are animals?”, “What do you do on your birthday?”, and “Why do you come to school?” Children jointly answered the questions. When they gave conflicting answers, the robot said “I’m confused. Can you two talk and give me one answer?” This way the robot prompted the children to bring in their personal experiences and discuss together the best answer for the robot. During the problem-solving sessions, the robot introduced its problems related to the topic and asked the children to help solve them. In this process, children co-created imaginary artefacts using a shared tablet. For example, after learning about animals from the conversation session, the robot asked the children to help build an imaginary zoo on its planet. From the built-in item libraries, children had to agree to select one item at a time in order to populate the zoo with animals, feed the animals, and provide the appropriate habitat. Other examples included planning the robot’s birthday party and overcoming obstacles that the robot’s parents faced on their way to visit the robot on Earth. Solving these problems involved children’s practice of early literacy (e.g., consonant/vowel strings and word formation) and arithmetic skills (e.g., number symbols, addition, and subtraction).

Natural language processing technology is not yet mature enough to manage natural conversations between the robot and children. For robot utterances, therefore, we adopted a Wizard of Oz method (Riek, 2012), where a researcher hidden behind the scenes controlled the robot’s utterances and movements. During our observations, we also looked for emerging patterns in children’s embodiment. Spontaneous bodily actions during talking and thinking are increasingly seen as manifestations of the activation of a system that integrates bodily multimodal information. The actions are thus viewed as evidence that cognition is embodied (Hostetter & Alibali, 2019).

Emerging themes

Consistent with Study One, multimodal and multisensory embodiment was an essential part of children’s interactions across the sessions even when the robot’s movements and sensors were not activated. All pairs of children used their body extensively for a broad range of cognitive and communicative processes. We categorized children’s embodiments into three main themes: (i) embodiment of early mathematics and science knowledge and reasoning, (ii) appropriation of physical space, and (iii) embodied collaboration.

Embodied doing of early mathematics and science

This theme encompasses instances when children engaged their body simultaneously with thinking and talking about physical properties. Prompted by the robot’s questions about a topic (e.g., animals and birthdays), children willingly brought in their personal interests and experiences on the topic. The robot’s questions were open-ended, such as “What are animals?” and “What is birthday?” Also, the robot showed an image on its screen and asked, “Is this an animal?” followed by probing questions such as “How do you know that?”, “How big is it?”, and “Where does it sleep?” The children’s responses were rich, integrally using speech, gestures, and movement.

First, children embodied their explanations of physical and mathematical properties, such as the height, size, or speed of an animal. When the robot asked about properties of an animal (e.g., “How big is a cat?”), children accompanied their verbal answers with gestures, most commonly using their hands and arms (Fig. 3). The child in Fig. 3a, for example, moved her hands in quick rotation (indicated by the blue arrows) when answering the question “How fast is a rabbit?” Figure 3b exemplifies children’s use of gestures to represent an animal’s height, and Fig. 3c their use of gestures to represent the size of an animal.

Fig. 3 Bodily representation of mathematical properties

Second, children frequently used their bodies when they explained or represented processes. The child in Fig. 4, for example, appeared to understand biological processes as involving objects and their transformations occurring in distinct stages. He had just heard the robot saying, “On my planets there are no animals. What are animals?” After answering “Animals eat,” he spontaneously extended his response using hand forms and movements. Figure 4a shows the embodied representation of the first step of the digestive process, with the right hand representing solid food and the left hand the stomach. Figure 4b shows how food and stomach come to interact: the child drew his hands together to represent food moving towards the stomach. Figure 4c shows the representation of digestion itself: the child opened his left hand to envelop the right hand, representing the stomach grasping solid food. These embodied representations were carefully carried out and clearly demonstrated how the objects interacted in the distinct stages.

Fig. 4 Embodied representation of the digestive process

Third, embodying knowledge involved not only the representation of single ideas but also reasoning through several ideas in sequence. When responding to the robot’s question, children often developed their answers in an elaborate manner. This elaboration often involved more than one idea and was typically accompanied by visible physical actions. Figure 5, for example, shows a sequence of actions that accompanied such elaboration. Here, the robot showed an image of a humanoid robot standing upright on two legs and asked, “Is this an animal?” The child answered, “No, this is not an animal!” The robot then asked, “How do you know that?” She answered, “Because real animals don’t stand up.” While talking, she stood up and said, “they don’t stand on their feet like humans. They stand on four feet.” She then crouched down on all four limbs and started to crawl, mimicking the typical body position and locomotion of animals (Fig. 5a). Realizing that her general statement would not apply to all animals, she stood up again (Fig. 5b) and assumed the walking position of a penguin (arms close to the torso and hands flat against it). This change in position was accompanied by her saying, “and some, and some, like penguins, they stand on two feet.” She started walking awkwardly without bending her legs, just like a penguin (Fig. 5c). We interpret this sequence of embodiments as displaying a reasoning sequence involving the detection of and recovery from an overgeneralization.

Fig. 5 Embodied reasoning

Appropriation of physical space for categorization

Under this theme, we group recurrent instances of talk and thinking implicating a child’s peri-personal space, which is defined by the body (e.g., how far the arms can reach). Specifically, children used bodily actions and positions when they defined categories and the boundaries between the categories. Such talk was often accompanied by physically drawing spaces in the air or on the floor. The drawn spaces and boundaries were typically larger than the children, so they used hands, arms, and the whole body at the same time. Figure 6 exemplifies the response of a child to the robot’s question “What are animals?” The child stood up and said, “There are animals that are herbivores eating plants and animals that are carnivores eating meat.” While saying this, she took a step with her left arm pointed to the left for herbivores (Fig. 6a) and then to the right of her torso for carnivores (Fig. 6b), indicating that herbivores are a different category from carnivores. Then, while saying “and omnivores eat both” (Fig. 6c), she moved both arms in front of her torso with both hands open to create a space that encompassed both categories.

Fig. 6 Appropriation of physical space to represent categories

Figure 7 presents another example of embodying categories. The figure shows a girl’s embodiment in response to the robot’s question, “Is this an animal?” She answered, “No! This is a robot!” and then “there are animals and things.” While talking, she placed her hand on the floor to mark a boundary between two categories (Fig. 7a). The boundary she created pointed towards the robot, placing the robot in one category. She then opened her left arm, saying “Animals are there” (Fig. 7b), and pointed with her finger to the robot’s head, saying “and robots are here” (Fig. 7c). In this example, she used the position of the robot to draw the boundary between robots and animals.

Fig. 7 Appropriation of physical space to represent categories

Embodied collaboration

This theme covers instances where children implicated their bodies in support of communicative and interactional goals. There were many instances where the embodiment of one child evolved along with the embodiment of his or her partner. Embodiment was thus not only a form of knowing and reasoning by an individual but also a social act. While doing tasks together, children saw their partner’s embodiment and built on and extended that embodiment to continue the tasks. Such connected embodiments occurred most frequently when a child represented mathematical properties or the appearance of animals or objects. Figure 8 presents one such connected embodiment. In response to the robot’s question, “How big is a monkey?” the girl used her hands to represent the height of a monkey and said, “This big!” (Fig. 8a). After seeing this gesture, the boy said, “No, it’s not! It’s only that big!” and used his hands to represent a height that was shorter than the girl’s (Fig. 8b). Observing the boy’s gesture, the girl said “Aah!” and corrected her gesture to match the boy’s.

Fig. 8 Connected embodiment for a mathematical property

In Fig. 9, the robot showed the children a picture of a lion and asked, “Is this an animal?” The child in a blue shirt first responded, “Yep, and it’s definitely a mane ‘cause that’s what a lion looks like.” While talking, she drew a half-circle in front of her head representing a lion’s mane (Fig. 9a). The child in a green shirt observed this gesture and started to move her hands towards her neck (Fig. 9b) and said, “Yes. Yeah, the lion has a mane.” She then placed both hands close to her neck to represent more clearly that the mane of a lion is located around its head (Fig. 9c).

Fig. 9 Connected embodiment for appearance

Another instance of embodied collaboration occurred when children took turns during tasks. In a few activities, two children used a shared tablet to solve problems. To prompt collaboration between the children, we used a completion paradigm, where each child had to choose one half of an item to complete the item. During this task, we observed several instances of embodied turn-taking: children used gazes, elbow-nudging, withdrawal postures, and other hand and bodily gestures to signal to their partner an intent to yield the turn. Figure 10 shows two instances where turn-taking is embodied. In Fig. 10a–c, the child on the left selected the right part of the animal using the index finger (Fig. 10a), then withdrew and curled it (Fig. 10b) to signal to the other child that it was her turn. Upon the complete withdrawal, the other child made a selection (Fig. 10c). In Fig. 10d–f, the boy selected the left half of an animal (Fig. 10d) and then withdrew the index finger from the tablet (Fig. 10e). To signal to the girl that his turn was completed, he formed the hand into a fist and drew it away from the tablet (to his chin), which triggered the girl’s immediate action to make her selection (Fig. 10f).

Fig. 10 Embodied turn-taking

In addition, one distinctive phenomenon recurred consistently among the children. Many children habitually used hand gestures and head movements that were not connected directly to the specific concepts or processes they were engaged in. The gestures and movements did not seem to carry any particular meaning but were notable because they recurred every time children engaged in similar tasks. This pattern of embodiment seemed especially prevalent when the task was cognitively demanding, such as counting larger numbers, elaborating, or trying to prevail in an argument. In Fig. 11a, the girl bobbed her index finger up and down while she counted a larger number of fish (e.g., any number larger than 5 or 6) displayed on the tablet, whereas she only gazed over the screen to count a small number of fish (e.g., 3–4). Figure 11b shows the gazes and head movements of a girl and a boy while elaborating. While talking to each other, they often needed to think in order to elaborate their ideas. When this occurred, their gazes moved away from the peer and the robot towards open space, as if they needed time alone to concentrate on their thoughts, shortly after which they returned to the conversation.

Fig. 11 Other embodied actions

Embodied emotional interaction

Another noteworthy pattern of embodiment was that children frequently used their bodies to express emotions towards the robot while playing and learning. Children in both studies commonly used celebratory gestures and hugging to express their excitement and companionship. In Fig. 12a, a child expressed excitement by giving a thumbs-up. In Fig. 12b, the boy expressed a sense of camaraderie with the robot by extending his arms to high-five it. Figure 12c shows children expressing their fondness for the robot by hugging it.

Fig. 12 Embodied expression of emotions

Discussion

From the perspective of embodied cognition, we examined the interaction behavior of children in robot-mediated activities to understand the affordances of the robot for enabling embodied learning experiences. In Study One, where children’s bodily movement and haptic interactions were elicited intentionally by a robot, we observed richly multimodal, fluid embodiments over extended periods of time and markedly high engagement. In Study Two, although their bodily engagement was not intentionally prompted, the children spontaneously used their bodies not only to convey scientific and mathematical knowledge but also to deliver social messages.

The phenomena observed in both studies were consistent with each other and with literature in this field. To reiterate, children’s thinking was integrally coordinated with their richly multimodal and multisensory actions. Their thinking and action appropriated available resources and the physical environment to achieve the problem-solving goal. Importantly, our work added new insights to the study of embodiment. First, this age group of children has not been studied actively in the embodied cognition community. However, this is the age when children start schooling and developing academically and socially outside the home and family. Our documentation will provide insights for designing curricula for this age group in a developmentally appropriate way. Second, our study has revealed that, along with embodying knowledge and reasoning, children’s embodiments are influenced critically by the social context as well as interpersonal emotional dynamics.

The role of social robots

Consistent with the social robotics literature, children in our studies engaged in physical interaction around the robot to a much larger degree and in a broader variety of ways than they would with other digital tools. We conjecture that young children’s habit of personifying their toys (e.g., dolls or idols) and building companionships with them carried over to child-robot interaction, which in turn stimulated interaction dynamics that were natural and social. The effect of the robot indeed seemed to go beyond mere physical presence. Children treated the robot similarly to a friend (Kim et al., 2018). As the sessions progressed, some children who were shy at first gradually displayed familiarity and intimacy with the robot, much as friendships with other children might develop. They wanted to know more about it and asked personal questions, such as “where is your family?” “who are your friends?” or “what’s your friend’s name?” In sum, the embodiment of a robot afforded children an opportunity to express their knowledge, affect, and social relations in ways that were more natural and spontaneous than in conventional learning contexts. The benefit of a robot can therefore be twofold: a robot can be used as a viable research tool to study children’s embodied interaction and also as a learning tool to enhance engagement in academic tasks.

Embodiment as a social act

Notably, our observations highlighted the influence of the social context on children’s embodied interaction. The embodiments we documented did not occur in a social vacuum but were closely related to the physical movements of the robot and the prior embodiments of the children’s peers. In Study One, prompted by the robot’s movement, children physically moved around in the learning space. In Study Two, they built on and elaborated their partner’s embodiment of an idea. These connected embodiments are consistent with the idea from the literature that participants in an interaction use others’ embodiments as a resource to make progress on a task (Roth & Thom, 2009; Wittmann et al., 2013). Overall, the forms of embodiment produced are influenced not only by how knowledge is represented but also by to whom that knowledge is being conveyed (Flood & Abrahamson, 2015; Goodwin, 2000).

The embodiment of affective relationships has drawn little attention in the literature on embodied cognition and embodied learning design. Compared to work on action-cognition links (e.g., Cook et al., 2007), very few studies have examined emotions as a component of cognition beyond mere discussion (Danziger et al., 2011; Gallagher, 2015). It was clear in our studies that embodiments were directed at the peer and/or the robot, and that emotions integrally modulated the embodiments towards them. The analysis of embodiment should therefore take into account its social and emotional dimensions as well as the cognitive dimension.

Design implication: minimalistic design for voluntary interest

To design effectively for young children, we propose a form of minimalistic design that taps into a child’s natural way of interacting. The importance of aligning design with a child’s world and habitual way of embodied interaction has been recognized by prior work (e.g., Martin, 2009; Rosen et al., 2018). The physical, emotional, and cognitive engagement among children, particularly in Study Two, emerged spontaneously as children interacted freely over time. The affordance of a social robot was unique in that its embodied social behavior triggered task-related behavior and relevant feedback. For example, in our design, the robot being new to Earth signaled to the children that they had to help it. The robot’s simple remarks, such as “I like that” and “That is cool,” seemed to function as powerful feedback. It also said, “I’m confused” when the children disagreed with each other, prompting them to redo the activity. Our design demonstrated that, even without direct instructions or task assignment, intense embodied engagement was elicited from children while they playfully helped the robot.

Issues and recommendations

We observed that children made progress in tasks using complex integrations of linguistic (speech), visual (observation), and spatio-dynamic (action) modes. This confirmed that the analysis of body-based behaviors should take into account how one behavior is situated and relates to other behaviors (Ferrara, 2014). Physical actions, for example, can be examined as evidence for expressions of affect, and expressions of affect can further be identified by analyzing children’s facial expressions. Posture can indicate degree of affinity and social relationships. In this way, we can capture the manifold ways in which children use their bodies to communicate.

Although the impact of the position of the body on learning has drawn some attention in the literature (e.g., Hall et al., 2002), there is little research examining this impact empirically. In our studies, the robot’s front served as the center stage where on-task activity took place. The robot’s back produced a peripheral space that the children used for momentary off-task behavior. This differentiation of the learning space seemed to allow children to modulate their engagement and regulate their own learning. Researchers increasingly use advanced technology (e.g., virtual and augmented realities) for embodied learning designs that enable learners to move more freely (Johnson-Glenberg & Megowan-Romanowicz, 2017). The design and study of such environments should take into account the impact of varying physical positions on learning.

Lastly, current research on social robotics has typically studied only one or two aspects of robotic features and their immediate impact on narrowly defined discrete skill acquisition (e.g. Kose-Bagci et al., 2009). It was clear from our study that interaction for learning involved children’s bodily engagement, embodied social relations, and embodied emotions. Detailed documentation of embodied behavior of children can inform the relevant research community not only about the multiple ways in which the body is implicated in thought, emotion, and social behavior but also the conditions through which embodiments are produced.

Conclusion

Popular mobile devices such as phones and tablets tether children to a screen. Our observations have shown, in contrast, that humanoid robots elicit a broad range of embodied behaviors. The embodied cognition perspective has further proven useful for identifying affordances provided by a robot and describing how embodiments come about in physical and social contexts. For children, an embodied robot can serve as a catalyst for the use of their bodies for thinking and communication. Its physical presence creates a space for interaction, influencing children’s embodiments and allowing them to regulate their own learning and engagement. To conclude, our study suggests that the embodied cognition perspective could be a beneficial framework for designing learning technology for children. More generally, learning environments designed within the embodied cognition perspective are likely to promote the balanced development of children intellectually, socially, emotionally, and physically.