Since the late 1990s, the demands of a globalized and technological world have led international educational forums to rethink educational practices around the world. Teacher-centered learning approaches were supplanted by student-centered ones, strengthening the idea of learning in groups. In the following decades, and particularly in the last 10 years, collaborative learning (CL) came to be recognized as a twenty-first-century educational trend (OECD, 2010, 2019); it has gained momentum in the educational sciences both as a teaching method and as a transversal competence (i.e., a set of general and relevant skills that students should develop across the successive stages of education, thus crossing the curriculum longitudinally) (Binkley et al., 2012). Accordingly, recent research in this field has focused heavily on revealing the potential of CL to support high-level learning from early childhood to higher education, correlating learning outcomes with regulatory systems involving cognitive, metacognitive, socio-emotional, and motivational aspects of collaboration. But what is collaborative learning? Why are these different aspects of collaboration important, and how should we look at the body to learn more about collaboration?

To elaborate on these questions, the present paper first presents the current definition and main epistemological roots of the field of CL, traces the history of the different theoretical frameworks that have concentrated on understanding the regulatory systems in learning processes (individual and collective), and addresses some of the research gaps in CL studies. Furthermore, the review opens a dialogue with current research findings in the cognitive sciences, discussing how looking at collaboration through the perspective of embodied cognition (EC) theories can open up the field. It is argued that investigating CL by analyzing bodily engagements and intersubjectivity offers a way to explore aspects of the learning situation that can reveal how the skills for collaboration emerge and are developed.

Collaborative Learning: Definitions and Epistemological Roots

Collaborative learning comprises the idea of a “coordinated, synchronous activity” (Roschelle & Teasley, 1995, p. 70), which results directly from a group’s attempt to sustain the same goals and conception of a problem. CL is thus recognized as a complex process based on the perspective that learning is an active and situated process of knowledge construction: although learning happens individually, it depends on the actions of others.

Collaborative learning has been investigated through different perspectives but mainly under a cognitivist or social cognitive approach (Volet et al., 2009), and methodologically addressed through the analysis of the discursive dynamics created among members of a group during a group task (Hadwin et al., 2011), or by experimental studies measuring the correlations between different aspects of collaboration and learning outcomes (e.g., Gully et al., 2012; Lam & Muldner, 2017; Zambrano et al., 2019). The philosophical accounts supporting these studies are based on a mentalistic explanation of human cognition (Bratman, 1992), which depicts social encounters as encounters of minds. Interactions thus depend on individuals’ capabilities to “infer each other’s beliefs and desires to understand and predict the other’s intentions and moves” (Fantasia et al., 2014, p. 1). It is simultaneously assumed that cognition is located in, and dependent on, the brain and its ability to process information collected from the environment. Under this perspective, understanding others’ intentions is achieved using a system of representation of mental states (mind-reading), and the body works merely as a receptor of perceptual stimuli; the brain is then responsible for operating cognitive processes. The construct of content processing is grounded in information processing approaches (i.e., approaches in which cognition is understood as comparable to how computers work) and represents the mental activities used by students to process content knowledge (Volet et al., 2009, p. 130). Therefore, according to this account, collaboration depends exclusively on sophisticated and high-level cognitive and metacognitive functions and the verbal expression of these processes.

To understand how students express their ability to learn how to learn in groups, research in this field has focused on two things. The first is the study of the dynamics of regulation of learning (i.e., self-regulation, coregulation, and socially shared regulation), which are specific actions that enable students to reflect on their own and others’ thinking and acting processes during task execution and learning; in particular, this work explores the cognitive, socio-emotional, and motivational aspects of social interactions that are manifested within collaborative situations. The second is the analysis of verbal dialogues, which reveals such regulations of learning.

Collaborative Learning and the Analysis of Individual and Group-Level Dynamics: from Self-Regulated Learning to Socially Shared Regulations of Learning during Group Tasks

The regulatory processes of learning were first systematically investigated in the late 1980s (not necessarily within CL situations), and provided extensive knowledge on students’ agency and proactivity for learning. Since then, different theoretical models to explain such processes have been elaborated, addressing the regulations of learning from an individual to a group level.

The Sociocognitive Model and the Self-Regulated Learning

Zimmerman’s sociocognitive model emphasized the self-referential system as a core element of learning (Zimmerman, 1986, 2000). Zimmerman proposed that students must be able to reflect upon their own learning process while engaging in learning tasks, and to act in order to regulate their behaviors and thinking during learning. Zimmerman showed how positive learning outcomes are related to a series of actions (so-called essential elements), such as time control, setting goals, self-reflection, and measuring self-efficacy. This self-referential system was conceptualized as self-regulated learning (SRL), and it redefined pedagogical practices by emphasizing the importance of reliable measurement of its essential elements instead of simply measuring final academic results (Ramdass & Zimmerman, 2011; Zimmerman, 2010).

The concept of SRL was further explored and expanded to address the aspects of motivation and metacognition. Pintrich (2000) investigated the correlations between SRL and motivation by comparing groups of high- and low-achieving students, and expanded the concept of SRL to include motivation as a sine qua non condition for learning. In a slightly different direction, focusing on exploring the benefits of goal-setting, tactics, and feedback to learning outcomes, Winne and a number of his colleagues (Winne, 1996; Winne & Hadwin, 1998; Winne & Perry, 2000) produced a set of studies demonstrating how the identification of task demands, establishment of goals, recognition of tactics, and evaluation of the results create an individual’s metacognitive capacity. Metacognition is created through four sequenced stages during task execution (e.g., studying): defining a task, planning, enacting tactics, and preparing future actions (Winne & Hadwin, 1998). At each stage, students generate internal feedback or can be provided with external feedback, which builds models that guide their thinking process and upgrade their understanding of the task’s goals and of what is necessary to achieve them. When such models are used as the basis for examining information, students engage in what is defined as metacognitive monitoring.

Research on the diverse aspects of SRL was crucial for establishing a new empirically based methodology for learning that decentralizes the role of the teacher and emphasizes student action in the learning process. Nevertheless, the focus on individuals’ metacognitive abilities did not cover the gaps regarding the influence of the context (e.g., the effects of group engagement and peer learning). In the mid-2000s there was a shift from individual-level analysis (self-regulated learning) toward a group-level analysis of regulation of learning (coregulation of learning) and then to socially shared regulation of learning (SSRL) (Barron, 2003; Greeno, 2006). Research on natural educational environments has accumulated evidence on the influence of contextual settings on learning and led to the consolidation of two distinct models: the sociocultural and the constructionist models (McCaslin & Hickey, 2001). Both of these perspectives understand metacognitive abilities not as intrinsic to the individual but as arising from interactions between the learner and other partners.

The Sociocultural Model and the Coregulations of Learning

The sociocultural model focuses on the coregulations governing interactions between a student and a teacher or a more capable partner (Hadwin et al., 2005). Coregulation of learning is explained as a transitional process: an ability that starts with the regulations imposed by the teacher and then evolves into SRL. In other words, the coregulation established within the relation with the teacher or a more experienced peer prompts and/or supports the development of the student’s SRL. The sociocultural model opened the discussion of the dialectic, social, and culturally situated nature of learning, and emphasized the importance of mediation (referred to in many studies as scaffolding) and of task instruction and design (Azevedo, 2005; Turner & Patrick, 2008). Under this perspective, studies have discussed the positive association between socioemotional interactions and the control of self-regulated learning during learning in groups, elucidating how social–emotional and motivational elements influence learning in groups (Järvenoja et al., 2010). Studies have demonstrated that self-regulated learning increases depending on the relationship between the focus of the activity and the type of social regulation within the group (Grau & Whitebread, 2012), and that students differ in how they adapt to instructional opportunities, which depends on multiple simultaneous personal and sociocultural factors within their relationships with teachers and peers (McCaslin & Burross, 2011). This body of research has impacted the development of instructional practices (e.g., clear and specific guidelines for task design) (Barkley et al., 2005; Torres & Marriot, 2010) and revealed the complexity of collaboration.

The Constructionist Model and the Socially Shared Regulations of Learning

The constructionist model focuses on the regulations that happen simultaneously among different members of a group. Studies recognizing how students participate in the construction of joint knowledge by sharing their understanding of theoretical concepts, ideas for solving problems, and judgments relating to evaluation of the task (Hurme & Järvelä, 2005), together with studies focusing on the emergence of productive forms of collective thinking, have consolidated the conceptualization of SSRL. SSRL focuses on the metacognitive properties of dialogs and the particular interpersonal component that allows joint regulations, which in turn build shared representations of, goals for, and strategies for solving collective tasks (Hadwin et al., 2011). These elements enable students to enlarge their individual skills by bringing together divergent perspectives that affect their cognitive learning paths (Chi & Wylie, 2014).

The conceptualization and empirical investigation of SSRL significantly advanced the understanding of group dynamics and learning. It revealed specific patterns in group dialog led by processes of joint metacognition at different moments of task execution (e.g., planning, monitoring, and evaluating), which evidenced CL (Järvelä & Hadwin, 2013). Furthermore, research on SSRL lent support to the idea that argumentative skills are a prerequisite for CL (Isohätälä et al., 2017; Marttunen et al., 2007). To reconcile divergent ideas and advance beyond individual capabilities, the members of a group should have and use a set of existing verbal abilities to achieve a specific type of exploratory talk that allows metacognition to emerge (Mercer, 2013). Grau et al. (2018) found that metacognitive regulations were positively and significantly correlated with exploratory talk, were not significantly correlated with cumulative talk, and were negatively correlated with disputational and limited talk. These findings corroborate Volet et al.’s (2009) claim that the key element for socially shared regulations is the ability to engage in elaborations, inferences, and interpretations. Volet et al. (2009) also stressed the significance of clear and well-elaborated communication. These results have had practical implications for educational contexts, particularly in the development of practices that can assist students in this type of dialog and the creation of activities that prompt discussions in group tasks.

The Situative Model and the Expansion of SSRL

Another theoretical framework analyzing SSRL is the situative and contextual framework (Järvenoja et al., 2015; Volet et al., 2009). Järvenoja et al.’s (2015) situative and contextual model, derived from previous studies (Greeno, 2006; Järvelä et al., 2008; Nolan & Ward, 2008), emphasizes the complementarity between the individual and social dimensions of learning and proposes that learning regulations should be investigated within particular situations—specifically, the “where,” “when,” and “with whom” learning occurs. The diverse elements that comprise context are understood to constitute the process of learning rather than just influencing it. In terms of methodology, the situative and contextual model differs from previous ones in three regards: (1) it demands the integration of different levels of analysis of regulations of learning (individual and group); (2) it considers cognitive as well as socioemotional regulations to understand the interactive process; and (3) most importantly, it calls attention to the nonverbal behaviors that circumscribe the learning process. However, although the situative approach sheds light on the importance of analyzing what learners do when interacting, nonverbal behaviors are still analyzed as a separate aspect of the storyline of events, supporting the overall interpretation of intentions in an individual’s discourse.

The different models presented above should not be understood as a gradual evolution in the conceptualization of regulatory systems of learning. Rather, they are distinct ways of theorizing and investigating learning processes, which in their particularities reveal the complexity of the phenomenon (Järvenoja et al., 2015). The reflection made here is that the gradual incorporation of different analytical units into the analysis of regulations of learning stems from the recognition of the limitations of each theoretical model.

The Research Gaps Yet to Be Bridged

Methodologically, within the field of CL, there is a common understanding that multi-model research designs should be developed in order to address multiple layers of analysis (individual and group) (Isohätälä et al., 2017; Pandero et al., 2015) and the interplay of the different types of processes (e.g., cognitive, social, affective, and motivational) that occur simultaneously during interactions (Azevedo, 2005; Järvelä et al., 2016; Järvelä et al., 2019; Volet et al., 2013). Nevertheless, a clear dichotomy remains between body and brain and between cognition and emotion. The body is seen as only trivially supporting meaning-making during dialog, and its actions are incorporated in the construction of cognitive processes. Therefore, even though traditional approaches recognize that meaning-making permeates a complex assemblage of gestures, postures, eye contact, facial expressions, and other environmental cues that build the mechanisms that enable humans to, for example, understand others’ actions, the analysis of behaviors and their effect on regulation processes has not yet been deeply incorporated in the analysis of SSRL.

One could question the relevance of emphasizing the role of the body during collaborative tasks, or more precisely during CL situations, as cognitivist approaches to cognition provide theoretical support for answering the majority of the research questions regarding knowledge construction and social interactions raised so far in this field. However, doing so would limit the analysis of collaboration to situations in which individuals verbally express their thoughts using sophisticated cognitive mechanisms; this would mean that argumentation and reasoning skills are considered prerequisites, and collaboration would only be identified by the analysis of utterances during group tasks (Isohätälä et al., 2017, 2020). Theoretically, this approach implies either that no CL can take place without verbal utterances, or that the methods constructed to date are insufficient to fully elucidate all aspects of collaboration. Such limitations open gaps with regard to collaboration within groups that have asymmetrical interactions, such as groups of small children, who sometimes cannot sufficiently articulate their thinking as arguments but are nevertheless capable of collaboration (Fantasia et al., 2014; Ferreira, submitted for publication).

In addition, developmental, intersubjective, and dynamic aspects of collaboration are far from being fully understood. For example, it is unknown how students learn to identify which emotional states or cognitive processes they need to regulate during interactions, or how interaction skills are constructed throughout the activities students engage in during schooling. Malmberg et al. (2013) previously pointed out the importance of exploring the role of nonverbal aspects of communication due to their potential influence on socioemotional aspects of learning, but we have not yet asked whether (or how) our emotional states define the openness for engagement in knowledge construction, or what emotional states are necessary for engagement and how they are manifested and perceived by members of a group.

It is also important to understand how the body is involved in the construction of meaning during dialogs and why collaboration enables learning for some students but not for others. Whether the interaction itself is what allows collaboration to emerge and interpersonal skills to develop, and whether the ability to collaborate and learn in groups depends exclusively on an individual’s cognitive capabilities, remain open questions. Finally, it is uncertain if and how culture affects the construction of collaboration, or if the ways we act and enact in the world change or define different ways to collaborate. To answer such questions, it is necessary to deepen the qualitative analysis of collaborative interactions, looking at the phenomenon longitudinally and beyond the different types of regulation processes immediately involved in the interaction and the individual participation and outcomes. This demands a change of epistemological perspective on social cognition and an enlargement of our view of social interactions and embodiment, not only as contextual elements but as constitutive of human cognition. Therefore, to make sense of cognition we must study the brain, the body, their relationship with the world, and their relationship(s) to the brain–body systems of others. The following section sheds light on possible further developments in the field of CL by approaching the phenomenon through an EC perspective, examining the body’s role during learning in groups in face-to-face collaborative situations and how intersubjectivity is constructed within the interaction.

The Embodied Cognition Theory: Different Approaches and New Research Questions for Collaborative Learning

The understanding of humans’ bodily experiences and the implications of the functioning of our brain–body mechanism have been a primary focus in the field of cognitive neuroscience for the past two decades (Goldman & De Vignemont, 2009; Kiverstein & Clark, 2009). Studies investigating the role of bodily states and the importance of action and sensorimotor systems for cognition have produced a number of theories, all of which are generally referred to as embodied cognition (Barsalou, 2008; Gallese, 2008). Broadly, this theoretical framework adopts the notion that to comprehensively understand how cognitive processes operate, it is necessary to acknowledge the brain as embodied. In other words, how humans collect information and assemble the world depends non-trivially on the body, its experiences, and its movements; the only way that the brain talks to the environment is through the body, sensory tissues, and organs.

The theories explaining the embodiment of cognition vary significantly. Less radical models of EC recognize sensorimotor systems as being part of multiple ways to understand cognition and its processes (for a detailed overview, see Kiverstein & Clark, 2009; Borghi & Cimatti, 2010; Borghi & Caruana, 2015). The proposal in this moderate approach is, in a majority of cases, compatible with many versions of traditional cognitive science, basing its rationale on the idea of “bodily formats” (i.e., representational codes used to form interoceptive or directive representations of one’s bodily states and activities) and appealing to the evidence that the brain reuses cognitive processes originally created for different purposes (Goldman, 2012). In this perspective, important work has been done around the theory of embodied simulation, which explains the human ability of mind-reading through the idea that other people’s mental states are represented by adopting their perspective: by identifying or matching their states with resonant states of one’s own (Gallese, 2003; Gallese & Goldman, 1998). Embodied simulation proposes a more general functional mechanism for perception and imagination than that formulated by traditional accounts of cognition, and it gained momentum with the discovery of mirror neurons and mirror mechanisms in the brain (see Gallese, 2018; Gallese et al., 2004). Mirror neurons are a specific type of visuomotor neuron that is activated both when the individual performs an action and when the individual watches another’s actions; together they form a cortical system that matches observation and execution of motor actions. Such mechanisms “allow us to directly understand the meaning of the actions and emotions of others by internally replicating (‘simulating’) them without any explicit reflective mediation” (Gallese et al., 2004, p. 396).

At the radical end of EC, philosophical developments on the nature of social cognition offer a different explanation, in which cognition not only involves but is dependent on the body; cognition is constrained by an individual’s body and the specificities of the actions afforded by the environment it engages in (Gallagher, 2005). At this end of the spectrum, the understanding of others’ intentions and actions as dependent on the mirror mechanisms in the brain (see Gallese et al., 2004; Glenberg & Gallese, 2012) has been heavily criticized and highlighted as a way to disembody embodiment (Gallagher, 2015). The more radical approaches suggest that social cognition is not necessarily an act of thinking but sometimes simply an act of perceiving (e.g., recognizing another person’s intentions or emotions in their facial expressions and embodied actions); perception directly picks up another’s mental state without the need for theorizing or simulating (Gallagher & Zahavi, 2012). Therefore, cognition is dependent on what our bodies are capable of perceiving and doing when coupling with the world, and is shaped by the possibilities (affordances) of action. Human cognition is understood as sense-making: “a cognizer’s adaptive regulation of its states and interactions with the world, with respect to implications for the continuation of its own autonomous identity” (Fantasia et al., 2014, p. 4).

Furthermore, within the scope of embodied cognition theories, it is also possible to talk about the notion of extended (or situated) cognition (Clark & Chalmers, 1998), which understands cognition as a process that is not limited to one’s body and bodily actions but overflows to objects and the environment. This understanding of cognition resonates with the idea of cognitive offloading (Risko & Gilbert, 2016), that is, the use of the body or of physical action to outsource or alter internal functions (e.g., memory, perception) or information processing (e.g., mathematical reasoning), which is a sophisticated mechanism of using the body and the external world to support one’s thinking. However, proponents of extended cognition (Clark, 2008; Rowlands, 2003) claim that the “epistemic actions” of a given activity (e.g., flipping cards in a memory game) or using the environment to trigger delayed information (e.g., finding a phone number in your phone) do not merely serve to outsource internal functions, but are an integral part of cognition. “The cognitive loop encompasses states, processes, and mechanisms both of the brain and the body as well as of the environment” (Lyre, 2018, p. 1); thus, some cognitive processes are partially driven by environmental scaffolding (Krueger, 2011).

This means that the notion of embodiment can be based on different and distinct philosophical pillars, and although this explanation will not be addressed in depth in this paper, it is essential that it be acknowledged, as it guides distinct directions to formulate research questions. In all cases, however, the significant contribution of the embodied approach is the recognition that the experiences we entertain in the world (including with other people) are the result of our bodily nature shaping our perceptions and actions; directing sensorimotor interactions is therefore crucial for gaining knowledge and developing cognitive capabilities (Engel et al., 2013).

This class of theories essentially opposes classical cognitive approaches, which claim that cognition is strictly the processing of abstract and amodal symbols (i.e., arbitrary transduction from the perceptual state), and that sensorimotor experiences are not involved in cognitive operations (Meteyard et al., 2012). Thus, under an embodied approach, sensorimotor experiences become a fundamental unit of analysis in the investigation of cognitive processes (including knowledge construction). Particularly for investigating learning processes and for further learning in groups, this approach could redefine how we look at, for example, the bodily engagements and nonverbal behaviors that comprise communication during collaborative interactions, how individuals use their sensorimotor systems and the implications of this for decision-making during the interactions, and even the information on physiological changes that can reveal how collaboration is bodily experienced (e.g., identifying synchronicity and physiological mutual regulations). This knowledge can potentially change the way peer interactions are organized, redirect the development of interpersonal skills, and, most importantly, support the understanding of collaboration as a form of interacting and understanding each other, which then can emerge and be learned even in early interactions.

What Do We Already Know about the Role of the Body in the Cognitive Processes Used in Social Interactions?

A substantial body of extant interdisciplinary research encompassing the fields of psychology, neurosciences, and philosophy has systematically demonstrated that our bodily nature creates the experiences we entertain when relating to others, shaping our perceptions and affording or constraining meaning-making (Uithol & Gallese, 2015). It is known, for example, how proprioceptive feedback (e.g., sense of limb position and movement) impacts perception, such as how posture can influence one’s mood and interfere in interactions with others (Osypiuk et al., 2018), or how feedback from one’s facial muscles affects social judgments by creating a predisposition toward engagement or disengagement in joint actions (Niedenthal et al., 2010). Furthermore, the ability to perform joint actions also can shape perception. The belief of engagement with others (e.g., the act of two or more people lifting a box together) and the expectation created in the social realm impact both object perception (e.g., judgment of the weight of the box) and task execution (Doerrfeld et al., 2012). In this case, co-actors perceive the environment in terms of what they can do together, or simulate the actions they intend to perform based on their own motor systems. Thus, the anticipated effort modulates the perception of an object’s properties in the context of joint action, which implicates action prediction and action simulation processes in social interaction.

These studies show that during interpersonal encounters, the different elements of what people are able to perceive from the environment (including the other person) are noticed and can change the dynamic of the interaction, even if they are not directly connected to the purpose of the encounter. These findings support the argument that the “perceptual system is optimized for guiding peoples’ interactions with the environment rather than deriving action-independent object properties” (Doerrfeld et al., 2012, p. 474). In the context of collaborative interactions where engaging in joint actions is a demand, it may be particularly useful to be able to anticipate common effects in order to decide when, how, and with whom to collaborate. Moreover, it is worth asking whether the effects of anticipated joint action on perception (e.g., perception of objects or of other persons’ skills) could in fact act as a driving force for collaboration.

From the perspective of communication during interactions, it is also relevant that studies have shown that adults coordinate turn-taking via nonverbal communication, evidencing the need for verbal synchronicity during collaborative dialog (Brennan & Hanna, 2009); that participants in dyads sway their bodies in synchronized ways, especially when conversing with each other (Shockley et al., 2003); and that they visually coordinate their attention through synchronized eye movements to understand each other’s actions when completing collaborative tasks in pairs (Schneider & Pea, 2014). These studies showed how the body is also used to establish the synchrony necessary for collaborative interactions. They also indicate that it is not yet clear where the boundaries between speech and movement lie, or in which direction the relation runs (i.e., whether speech influences movement entrainment or vice versa). Furthermore, synchronicity is also found in less visible bodily reactions, such as physiological states (e.g., skin arousal measured by electrodermal activity from an individual’s wrist). Regarding collaborative situations in small groups specifically, studies have suggested that students tend to synchronize their physiological states during interactions, particularly when monitoring one another’s behaviors and cognition (Haataja et al., 2018; Malmberg et al., 2019), and have been able to show initial positive correlations between physiological synchrony and collaboration quality, task performance, and learning gains, by computing the number of cycles between low and high synchronization (Schneider et al., 2020). The aim of these studies was to explore how students monitor behavioral, cognitive, and affective processes during collaboration. Although the results are still limited, they support the view that bodily synchrony is relevant for joint action understanding in collaborative learning interactions (Järvelä et al., 2014).
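To make the idea of counting cycles between low and high synchronization more concrete, the following is a minimal illustrative sketch and not the procedure used in the studies cited above. It assumes that synchrony is estimated as the Pearson correlation between two participants’ signals in consecutive non-overlapping windows, that a fixed cutoff separates low from high synchrony, and that each switch between the two states is counted. The function names, the window length, the 0.5 cutoff, and the toy signals are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def windowed_synchrony(sig_a, sig_b, window):
    """Correlation between the two signals in consecutive non-overlapping windows."""
    n = min(len(sig_a), len(sig_b))
    return [
        pearson(sig_a[i:i + window], sig_b[i:i + window])
        for i in range(0, n - window + 1, window)
    ]


def count_switches(sync, threshold=0.5):
    """Count switches between low- and high-synchrony windows.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    states = [value >= threshold for value in sync]
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


# Toy signals (invented values, not real electrodermal data).
learner_a = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
learner_b = [2, 4, 6, 8, 4, 3, 2, 1, 1, 3, 5, 7]

sync = windowed_synchrony(learner_a, learner_b, window=4)
switches = count_switches(sync)
print(sync, switches)
```

In this toy case the two signals move together in the first and third windows and in opposite directions in the second, so the dyad switches from high to low synchrony and back again. Real analyses would involve preprocessing of the physiological signal and a principled choice of window and cutoff, which this sketch leaves out.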

Communication has also been analyzed from the perspective of the use of gestures, touch, and facial and whole-body expressions. Here, studies argue that such bodily actions have a property of material scaffolding, as they afford a distribution of cognition across different modalities (for a review see Roth, 2003), lightening cognitive load (Goldin-Meadow et al., 2001) or conserving cognitive resources on the explanation task to improve performance (Goldin-Meadow & Wagner, 2005). Beyond the idea that gestures support the development of verbal competencies (Roth, 2003), the use of bodily actions allows social-cognitive extensions by providing both the listener and the speaker with a systematic space where they can share intentionality (Krueger, 2011; Lyre, 2018). This means that gestures are particularly important for social interactions, as they allow the participating individuals to alter the structure of their individual thoughts; the interactions generate new feedback cycles and processes of shared feeling and sympathetic understanding specific to that exchange (Krueger, 2011). Therefore, gestures can be used collaboratively. For example, exploring patterns in how collaborative gestures arise while proving geometric conjectures, Walkington et al. (2019) found that learners use collaborative gestures to extend mathematical ideas over multiple bodies (members of the group) as they explore, refine, and extend each other’s mathematical reasoning.

These and many other related findings present data that can sustain arguments that the role of our body during interactions, especially collaborative ones, is not secondary or contextual but defining and constitutive. Cognition is thoroughly dependent on the body in different cognitive processes used in dealing with others, such as emotion recognition and empathy, action understanding, joint attention, joint action, and interactions in general (Reddy & Uithol, 2015).

How Can We Further Investigate the Body during Collaborative Interactions, and Why Should We Do So under the Theoretical Framework of Embodiment?

In light of the phenomenon of CL, the EC theoretical approach calls for an understanding first of the embodied learning involved in the process of knowledge construction in group work, and second of the embodied experience of intersubjectivity during such interactions. These two perspectives are intertwined, but they can be addressed through distinct research questions and methodological designs that consider the different approaches in embodied cognitive theories. An embodied approach opens space for breaking with traditional frameworks of cognitivism, which acknowledge, for example, the interconnections between cognition, emotion, and motivation while understanding cognition as information processing conducted by the brain, and which thus investigate these elements as separate units correlated with different impacts on the resolution of tasks.

Embodied Learning during Collaborative Interactions

This initial perspective remains focused on how the body is involved in cognitive operations related to learning (e.g., language processing, concept understanding, and action understanding). The studies in this body of research were influenced by the discovery of mirror neurons and the subsequent simulation hypothesis (Gallese & Sinigaglia, 2011), which support the investigation of, for example, sensorimotor effects in lexical processing (Inkster et al., 2016), multisensory cognitive processing (Koning & Tabbers, 2011; Zwaan, 2014), bodily engagement and task integration (Skulmowski & Rey, 2018), and strategies to study physical and virtual embodied learning (Johnson-Glenberg et al., 2014; Fiorella & Mayer, 2016; Lindgren et al., 2016; Pouw et al., 2016; Johnson-Glenberg & Megowan-Romanowicz, 2017). Such studies were designed either around task-related embodied manipulations and incidental forms of embodied learning, or around integrated forms of embodied learning. This means that the experiments either tested the effects of incidental cues (e.g., weight, movement, or sensorimotor stimuli) on an individual’s perception of, or ability to perform, specific cognitive (or metacognitive) processes (e.g., the recall of judgments or performance; see, for example, Skulmowski & Rey, 2017), or compared learning settings involving bodily activities with those that enable learning without requiring motor activity (e.g., Song et al., 2014). Although these studies approach different aspects of learning, the substantial empirical evidence produced over the last decade indicates that the vast majority of cognitive processes are embodied, thus showing the potential of investigating learning processes from an embodied perspective.

Reflecting on learning processes in collaborative interactions raises the questions “how is our body constituting this learning situation?” and “how can we learn to collaborate?” These questions can be followed by different takes on the role of the body and, further, the role of the environment at the situated moment of the interaction. For instance, looking at the visual sensory system people rely on in real-life interactive tasks, natural eye movements show a strong link between actual behavioral goals and overt visual attention. All fixations fall on task-relevant objects, and the range of fixation durations is wider, reflecting the acquisition of the information required for the task at hand (Pelz et al., 2001; Zhang et al., 2018). This process extracts information and coordinates motor actions, revealing the relationship between what is seen and the actions expressed in sequence (including utterances). The visual system can also capitalize on information regarding stimulus frequency, conditional probability, and temporal autocorrelation in visual signals to build expectations about forthcoming sensory information (Summerfield & De Lange, 2014). Beyond the distribution of visual attention, pupillometry is a suitable measure for understanding the intensity of a stimulus, indicating cognitive and emotional processing (positive or negative valence). Understanding how the visual sensory system operates in collaborative decision-making and acting provides more insight into how and why a participant engages in specific actions or regulates specific contents (e.g., motivational or socio-emotional). It is also possible to raise and answer questions about the effects of the absence of this visual resource on the interaction (e.g., in situations where collaboration takes place in virtual rather than face-to-face environments).

Further, the relationship between students’ physiological states and their engagement in different aspects of group tasks (e.g., monitoring group progress) remains to be revealed. Evidently, further studies investigating complex relations between students’ regulations and physiological states are still needed in order to develop a comprehensive understanding of the role of the body during collaborative interactions in a way that can be applied to support learning in real-life situations. Developing methods that incorporate the embodied features of interactions, such as the physiological indices of social signal processing (Knight et al., 2016), requires interdisciplinarity and the implementation of different methods of data integration. Nevertheless, the EC approach is relevant, as it allows a completely different view of collaboration and offers novel ways to investigate collaboration at the individual and group levels.

Another interesting way to look at embodied learning during collaborative interactions is through the perspective of extended cognition, which acknowledges the material environment circumscribing the moment of interaction. If we consider that “intelligent behavior arises from the dynamic coupling between intelligent subject and its environment rather than only from the agent’s mind (brain, control system) itself” (Roth & Jornet, 2013, p. 143), the content of what happens between the members of the group (e.g., a problem-solving activity) is not the only important factor; it also matters how the content is manifested (actions), for what purposes (context), and through which means (afforded by the environment). For example, learning environments that offer different artifacts to help students store, process, and recall information, together with specific protocols prompting coupling with the external world during the learning process, can address complexity and non-linearity in content thinking and problem-solving, which increases not just the engagement but also the learning outcomes of students (Haupt, 2015). In collaborative interactions, the members of the group are part of the environment and can therefore also take part in one’s thinking process, being an extension of one’s cognition, which will define a specific dynamic among the members of the group. Analyses of this type of social interaction in collaborative learning are scarce (Lyre, 2018) and could add a different dimension to how we understand collaboration. “The efficacy of this approach lies in its capacity to account for both cognitive flexibility and the learners’ remarkable abilities to distribute cognitive demands between their brains, bodies, artifacts, and other people in their environment” (Bailey, 2020, p. 289).

The Embodied Experience of Intersubjectivity

EC theory also opens up space for investigating intersubjectivity (social cognition) in collaborative peer interactions and analyzing how the experience of interacting with others in meaning-making processes supports learning. Traditional approaches analyzing collaboration treat emotion and cognition, as well as internal (individual) and external (social) action and thought, as interconnected but separate units; any focus on what is produced within (or because of) the interaction is treated as a secondary interest. Nevertheless, collaboration as a form of “acknowledging other’s interests and objectives in some relation to the extrapersonal context, and acting to complement the other’s response” (Hubley & Trevarthen, 1979, p. 58) is an essential feature of human social interaction (i.e., the ability to engage in meaningful interactions), and appears immediately in the first contact with others in different moments of daily routines (Tomasello et al., 2005). In the development of sociability, children gradually build an extensive cultural repertoire and interactive skills as they interact with their peers, and it is possible to identify how bodily experiences in coordinative, collaborative, and communicative acts shape cognition (Ferreira et al., 2020). The process of meaning-making thus takes place at a level of interpersonal perception, in line with children’s own embodied experiences in each environment in which they take part. Therefore, when addressing collaboration in learning situations through an embodied perspective, it is also possible to do so using an enactive approach:

The idea of enactivism is that online bodily processes, not only sensory-motor processes but also affective processes shape the way the perceiver thinker experiences and considers the world and interacts with others. […] The path (or our understanding) is not pre-established; we construct it as we go and specifically through bodily processes, such as walking, moving, gesturing, reaching, grasping, and interacting with others. (Gallagher & Lindgren, 2015, p. 393)

The view of embodiment proposed here positions collaboration as a non-representational relational process that goes beyond verbal communication. It emphasizes participatory sense-making, which describes the relation between the individual and the world of significance that it enacts, and concerns acting and interacting, bridging gaps between cognition and affect (De Jaegher & Di Paolo, 2007). Thus, the issue is no longer whether one can or cannot regulate others’ behaviors, but rather whether one can understand what it takes to participate in joint action. By changing the philosophical stance behind the phenomenon, collaboration is seen like any other social interaction, sustained by processes of embodied coordination and including its breakdowns and repairs (Di Paolo & De Jaegher, 2012). Thus, the individual is not only responding to the external regulations that produce an appropriate action for a given situation but is also actively and systematically constructing the conditions of such exchanges, and by doing so enacting a world or cognitive domain (Di Paolo & De Jaegher, 2012, p. 38). Different tools such as conversation analysis (Hoey & Kendrick, 2017), dynamical systems theory (Vallacher et al., 2010), and dual scanning neuroscience (Špiláková et al., 2019) have been used to investigate and understand the patterns of coordination and breakdowns in social interactions, but there have been remarkably few attempts to apply this framework to understand collaborative dynamics in learning situations.

Beyond the understanding that elements of nonverbal communication (e.g., gestures, eye contact, posture) can add a spatial and imaginative component to verbal communication, and potentially express and regulate meanings that may be difficult to communicate in words (Novack & Goldin-Meadow, 2015), addressing collaborative interactions using an enactive approach amplifies the notion of how bodily action in general affords or constrains an interaction. More importantly, it raises questions regarding how the body is used in learning situations within group tasks. The patterns of coordination and breakdown are useful for explaining the mechanisms that can supplement or even replace individual cognitive function; they can modulate, enable, and constrain individual sense-making processes (De Jaegher et al., 2010). Therefore, this approach can provide an entirely new set of research questions for the field of CL, enlarging our understanding of the processual aspects of collaboration in learning situations with different age groups and across disciplines. From this perspective, it is possible to ask how we learn to collaborate at all, if not through representational models or simulation of others’ actions, and, if this is not the case, what it is that we are learning.

Collaboration does not emerge spontaneously just by placing students together (Koivuniemi et al., 2018), and students use different repertoires for interaction depending on with whom they work, so there is a component of intersubjectivity that emerges because of (or within) the interaction itself. In contrast to classical cognitive science, with its focus on internal mechanisms, the enactivist approach emphasizes the extended, intersubjective, and socially situated nature of cognitive systems (Gallagher & Lindgren, 2015). Understanding collaboration as non-representational and not dependent on sophisticated cognitive processes makes it possible to investigate this phenomenon in groups that are otherwise left out, namely small children and people with intellectual disabilities or developmental disorders such as autism (Fantasia et al., 2014). This approach also raises questions regarding how behavioral patterns of social interactions in different cultures (e.g., use of gestures or other nonverbal communication) can shape and define different parameters for collaboration.

Final Considerations

This paper is primarily a review of CL and thus emphasizes the properties of the different regulations involved in this particular learning situation (e.g., from self to socially shared regulations). It also assesses the pillars of EC theory as a possible means of enlarging the understanding of these processes. Research findings that provided grounds for expanding the understanding of the role of the body (not as a mere instrument for capturing stimuli) during the construction of shared meanings in learning processes were highlighted, as was the idea that looking at intersubjectivity in collaborative interactions can provide new insights into the phenomenon. As the current debate stands, EC changes the philosophical perspective on the mind–body relationship, which has practical and pragmatic implications. It gives a whole new meaning to the way we understand perceptual categorization in the brain in relation to our intentions and to the affordance that specific stimuli give to us.

In a broad sense, CL is an educational approach to teaching and learning that emphasizes the relevance of group work as a method of problem-solving, completing tasks, and constructing knowledge or creating products (Torres & Irala, 2014). The core element of CL dynamics assumes that peers are able to present and defend ideas, exchange diverse beliefs, question other conceptual frameworks, and actively engage in the construction of knowledge together (Torres, 2004), which has been shown to be a valuable learning resource in different age groups and for diverse disciplinary contents (Hmelo-Silver et al., 2013). Nevertheless, CL is neither confined to students’ heads nor dependent only on verbal dialogues; beyond being deeply situated in a time and a space and constrained by a specific social structure that defines how learning will be developed, it is also dependent on the bodies involved (the body–brain systems of all participants in the group), a dependence that is as yet unexplored.

The essence of embodied theories of cognition is that the body, particularly the bodily systems that have evolved for perception, action, and emotion, embeds higher cognitive processes. Do EC theories refute the validity of the traditional cognitivist approach? No, they do not, but they nevertheless provide an alternative explanation for how human knowledge is constructed, recognizing that cognition (in whatever situation) is embodied: it depends on the body and the experiences and actions it affords in the world. This poses more than a theoretical problem, because what researchers believe cognition to be very practically affects the way they conduct research (from the choice of methodology to the interpretation of results). Thus, in this paper, it is claimed that applying EC theory to the analysis of collaboration in learning situations will also reveal important aspects of this phenomenon. EC theory changes the paradigm and adds new dimensions to the prism through which CL can be investigated, particularly with respect to the gap that regulatory systems are not able to reach.

One consideration regarding developmental aspects of collaboration is how enlarging the views of its process can support educational practitioners in rethinking pedagogical approaches to teaching this skill. The embodied approach also impacts how we understand the development of this basic feature of human sociability throughout life; with regard to embodiment, joint actions play a role in explaining how humans develop the ability to think about their minds and work together instead of waiting for their ability to perform sophisticated cognitive processes to emerge and then collaborating. To understand collaboration as it develops, we need to investigate it at a more basic level than has been done so far. Lastly, as shown in this review, there are methodological challenges ahead. Isolated verbal dialogues do not portray the complexity of the process, nor do they give us access to bodily performance in the process. At the same time, bodily engagement or the expression of bodily actions as a singular element of analysis during peer interaction does not guarantee the transparency of the meaning being elaborated. It is therefore necessary to develop methodological approaches that integrate different types of data to provide information for individual- and group-level analysis. Combining analyses of physiological data relevant to understanding the features of interactions with group dialogues, video observations, and guided self-reports may be a promising path.