Introduction

Problem-based learning (PBL) is an educational approach that utilizes problems to trigger learning by engaging students in solving an ill-structured scenario related to their learning objectives (Etherington, 2011). PBL has been shown to enhance interpersonal communication, critical thinking skills, and knowledge (Azizah & Aloysius, 2023; Yennita & Zukmadini, 2021). Nevertheless, PBL has known challenges such as limited participation, dominant personalities, and diversion from the core of the problem (Järvelä et al., 2013; Näykki et al., 2014; Schmidt et al., 2006). Inadequate resolution of these challenges may lead to dysfunctional group dynamics and the potential withdrawal of one or more group members from the learning process (Näykki et al., 2014).

The widespread adoption of internet-connected technologies has enabled the integration of computer-supported collaborative learning (CSCL) across different educational settings, facilitating teamwork and knowledge sharing (Suthers, 2012). Including CSCL in PBL enhances problem-solving abilities and promotes student collaboration by offering an effective medium of communication, a platform for sharing ideas, and self-paced interactions (Lu et al., 2010; Saleh et al., 2019). Effective teamwork in CSCL can be enhanced by defining clear roles and responsibilities, combined with the teacher’s scaffolding and guidance. Therefore, roles have emerged as a method for regulating group activities and promoting individual accountability, positive interactions, and group unity, all of which are essential for successful group interaction (De Wever et al., 2008; De Wever & Strijbos, 2021; Järvelä & Hadwin, 2013).

The emergence of learning analytics (LA) has made it possible to analyze data from different educational fields, including PBL. Teachers can use data to optimize the instructional design, individualize the learning experience, and facilitate intervention when needed (Kilińska & Ryberg, 2019). The field of LA focuses on the collection, analysis, and interpretation of learners’ data to enhance learning outcomes and optimize the educational environment (Siemens, 2013). The LA process involves interconnected, multiple stages that start with data collection to produce metrics, analytics, or visualizations. By utilizing data from online PBL, we can analyze role effectiveness and group collaboration dynamics to customize student support systems and enhance collaborative learning experiences (Saqr & López-Pernas, 2023).

Over the past four decades, CSCL research has made significant advances. Initial CSCL analyses were mostly code and count, which allowed a deeply nuanced understanding of the fine-grained process of collaboration. Lately, more emphasis has been placed on understanding CSCL using a process-oriented approach that unfolds in phases, transitions, and levels (Reimann, 2009). The emergence of LA has further expanded the repertoire of methods and techniques, making it possible to analyze large amounts of data generated within online learning environments and establish patterns that inform pedagogical practices and help improve learning outcomes (Ludvigsen, 2016). However, longitudinal analysis has not received enough attention in the literature (Du et al., 2021; Martin et al., 2020). A likely reason is that collecting longitudinal data is far from trivial: it requires effort and time, and it is subject to student attrition and drop-out from the study.

Most LA studies have relied on quantitative measures (e.g., centrality measures) and have not taken advantage of the text (the content) generated through peer interactions (Nkhoma et al., 2020; Wang et al., 2023). This is all the more true of longitudinal analysis: few LA studies have traced the progression of students’ collaborative learning across time. A notable exception is the work of Saqr & López-Pernas (2022), which explored students' roles over an extended period using centrality measures. As such, tracing and understanding which roles emerge is still an open area of inquiry.

To that end, this study aims to bridge these gaps using the community of inquiry (CoI) framework. CoI is a collaborative–constructivist model that allows the analysis of online discussion (Garrison, 2016) from multiple perspectives: cognitive, social, and teaching dimensions. CoI enables researchers to identify successful elements of online learning that are based on the three presences and, therefore, intervene accordingly (Garrison et al., 1999). In doing so, CoI offers a holistic framework for capturing the different dimensions of knowledge construction, social interactions, and teaching instructions, as well as supporting their development.

Our objective is to investigate in depth the longitudinal dynamics of students’ roles and role transitions using a multi-layered analysis over a full year. The following research questions guided our investigation:

RQ1: What are the emergent roles the students assume in online PBL discussions within the elements of the community of inquiry model? How do they evolve?

RQ2: What are students' role trajectories in online PBL? How do these trajectories correlate with their academic performance?

Background

Problem-Based Learning

Problem-based learning (PBL) is a student-centered educational method commonly implemented in higher education. In PBL, students are required to work in small groups to discuss and solve an ill-defined scenario (known as the “problem”) (Permatasari et al., 2019; Saqr & Alamro, 2019). Students conduct these discussions without preparation or self-study in order to activate their existing knowledge. Since this existing knowledge is limited, they formulate questions (i.e., learning objectives) within the group to guide their further individual self-study. Following the individual self-study period, the students meet again to discuss and arrive at answers to the previously formulated learning objectives (Dolmans et al., 2016). This group discussion is tutor-guided and aimed at acquiring knowledge, enhancing problem comprehension, and developing skills in problem-solving (Barrows, 1996).

Effective PBL requires students to be engaged in self-directed and self-regulated learning strategies that involve the regulation of their own cognitive, motivational, and emotional processes and those of their peers to facilitate optimal learning outcomes (Zimmerman, 2011). Moreover, students can modify their learning strategies through interactive discussions (Saqr et al., 2020). This collaborative approach encourages interaction among peers, engagement in active communication, knowledge sharing, decision-making, and responsibility negotiation (Marra et al., 2014).

Traditional PBL has some limitations: first, students are restricted to the text and have no flexibility to seek clarification beyond it; second, students may find it challenging to connect scenarios to real-life contexts; and third, PBL focuses on cognitive skills for problem-solving rather than affective skills like interpersonal interactions (Hoffmann & Rittche, 1997; Lu et al., 2010).

Collaborative learning (CL) and PBL exhibit some similarities and differences. First, both strategies have a similar goal and learning activity, but CL does not necessarily involve a problem. Second, both PBL and CL depend on student collaboration to complete the given assignment in small group settings. Third, in PBL and CL, students are responsible for their own learning objectives. Finally, interdependence is a core characteristic of both PBL and CL, although their levels of dependency vary greatly because, unlike in CL, teachers may assign roles in PBL (Davidson & Major, 2014). Additionally, one of the main features of PBL is that students are not necessarily focused on resolving the problem at hand but rather engage in debate and argumentation, which can lead to deeper and more advanced discussions (Saqr & López-Pernas, 2021a).

Technology-supported PBL generates a large amount of data as the students utilize a variety of resources to solve problems or complete tasks (Ünal, 2019). These data have significant value for in-depth investigation. Nevertheless, the data will not offer meaningful insights without an effective analysis. To tackle this challenge, LA has emerged to analyze and provide results to stakeholders (Pan et al., 2020). There is a wide range of LA approaches to monitor students’ progress and behavior; as outlined by Siemens (2013), LA methods focus primarily on digital data sources and the quantitative methodology that is commonly used in online education research. Yet LA is criticized for using quantitative metrics with little attention to theory or the content of students’ collaboration. To capture online collaborative learning, there is a need for a powerful framework to operationalize the data through interconnected social, cognitive, and teaching elements (Garrison et al., 1999). To fill this gap, we use CoI.

Community of inquiry framework

The community of inquiry (CoI) framework (Fig. 1) was initially developed for asynchronous text-based online discussions with three main components, “social presence, cognitive presence, and teaching presence,” that are necessary for a successful educational experience (Garrison et al., 1999). Social presence is related to learners’ ability to express themselves socially and emotionally through interpersonal interaction. Social presence includes group cohesiveness, interactivity with others’ threads, and affective emotional expression. A high level of social presence helps to create the discourse that is necessary for establishing cognitive presence, which is the ability of learners to build knowledge, solve a problem, and evaluate their solution (Garrison & Arbaugh, 2007). The cognitive presence occurs in four successive phases: triggering event, exploration, integration, and resolution (Garrison & Arbaugh, 2007). The third component is teaching presence, which organizes, facilitates, and directs cognitive and social presence to achieve valuable learning outcomes (Anderson et al., 2001).

Fig. 1 Community of inquiry framework including the elements for social, cognitive and teaching presences

In the PBL context, Kamin et al. (2001) used a critical thinking framework developed by Garrison (1991) for coding the discourse of medical students. Kamin et al. (2003) also used this framework to distinguish between face-to-face and virtual PBL. Subsequently, Kamin et al. (2006) applied the CoI model to assess the teaching presence in online PBL sessions. Similarly, Lajoie et al. (2014) applied CoI coding as a deductive method to investigate PBL strategies in various cultural settings.

The CoI is not a static framework, as it attempts to explore the educational process from a dynamic perspective, which is one of the main CoI concerns (Garrison & Arbaugh, 2007). The temporal dimension of collaborative learning monitors the students' reactions to situational challenges throughout their collaborative process. These challenges stimulate the need to regulate both the cognitive and socioemotional aspects of collaboration (Hadwin et al., 2017). In a recent study spanning three consecutive semesters, Junus et al. (2022) argued that online discussion guided by the CoI framework produced positive role-playing experiences that facilitated effective interaction and knowledge building in collaborative learning environments.

Roles in CSCL

The involvement of students in online discussion does not necessarily lead to improvement in knowledge building, cognitive development, or social interaction (Fischer et al., 2013). According to the “script theory of guidance,” students’ interactions in CSCL can be more productive by giving them specific scripts for discussions and assigning different roles to learners to build knowledge (Fischer et al., 2013). A role is a set of behaviors and responsibilities that a student takes for smooth functioning and effective interaction within a group (Hare, 1994). There are two role perspectives that are increasingly gaining popularity in higher education: scripted and emerging roles (Jermann et al., 2004). Scripted roles are intentionally designed or “scripted” to enhance the collaboration experience and improve results in the group (Strijbos & Weinberger, 2010). Assigning specific roles in CSCL helps improve students’ learning by encouraging them to construct knowledge, develop critical thinking, and participate in cognitive discussions (Rolim et al., 2019; Schellens et al., 2007). In the CoI context, role assignments and providing scripts are used as strategies to attain high levels of cognitive and social presence (Gašević et al., 2015). Although scripted roles can be beneficial for learning as they guide students, they may promote compliance with less creativity (Wise & Schwarz, 2017) and cause demotivation (Radkowitsch et al., 2020).

On the other hand, emerging roles are not predefined or preassigned. They emerge spontaneously through interpersonal interactions and conversations among group members. These roles involve aspects that were not initially planned or specified at the start of the group work (Strijbos & Weinberger, 2010). Emerging roles lead to greater learner agency, as they give students more control over their roles in the group, which can enhance motivation and the learning experience (Strijbos & Weinberger, 2010). The advancements in LA allow researchers to detect emerging roles in online discussion forums using different methods such as social network analysis, natural language processing, epistemic network analysis, and clustering algorithms (Dowell & Poquet, 2021; Gašević et al., 2019; Rolim et al., 2019).

The conceptualization of roles by Strijbos and De Laat (2010) in CSCL has three role levels. First, the micro-level “roles as tasks,” which specifies collaborative learning activities (either process or product tasks) as roles. Second is the meso level “roles as patterns,” where students participate in multiple activities, either processes, products, or a combination of them, to create a pattern forming a role play in the CSCL. Third is the macro-level “roles as stances,” where students’ participative patterns reflect an attitude toward their tasks. This stance is affected by the perceived value of the tasks for the students and their engagement level (Strijbos & De Laat, 2010). A more recent study conducted by Saqr and López-Pernas (2022) introduced a fourth longitudinal level, which considers “roles as disposition” (Fig. 2). The authors hypothesized that students, regardless of the course or the task, consistently reemerge in their roles in a predictable pattern chronologically, even if they initially held a transient attitude towards their tasks. Thus, disposition may be considered the primary factor for explaining most of the roles taken, along with other secondary factors like teacher, group, and tasks (Saqr & López-Pernas, 2022).

Fig. 2 Adapted from the roles framework by Strijbos & De Laat (2010) and its longitudinal extension by Saqr and López-Pernas (2022)

Temporal perspectives in CSCL

Learning is a dynamic and progressive process that follows a temporal and sequential framework (Saqr et al., 2019; Saqr & Peeters, 2022). The importance of temporality for a deeper understanding of the collaborative learning process has been established in the existing literature (Mercer, 2008; Slattery, 1995). Nonetheless, there is inadequate utilization of learning data in temporal analysis and investigation (Knight et al., 2017; Martin et al., 2020). To understand the full breadth of CSCL, several methods can be used to capture its dimensions and perspectives, including descriptive statistics and visualization, social network analysis, process mining, sequential analysis, discourse analysis and epistemic network analysis (Lämsä et al., 2021). In this study, we follow the latest advances in LA to take advantage of the multitude of methods to provide a comprehensive analysis of CSCL roles and their dynamics. We discuss these methods in the following section.

Process mining (PM) is a data analysis technique based on building a process model to extract useful information from event log data sorted in chronological order (Romero & Ventura, 2020). In the educational context, PM is mainly used “for model and theory development rather than statistical testing” (Bannert et al., 2014). PM demonstrates the sequence and co-occurrence of events using its varying methods of data analysis (Reimann, 2009). The process model is utilized to study educational processes such as assessment (Pechenizkiy et al., 2009), recommendation of learning resources (Hachicha et al., 2021), and analysis of learning strategies (Ahmad Uzir et al., 2020). Malmberg et al. (2015) used PM to identify learning strategies and their relationship to types of personalities. Similarly, Peeters et al. (2020) used PM to study self-regulation tactics. PM can be used to compare the process models of different study groups, such as the patterns of successful and less successful students in self-regulated learning (Bannert et al., 2014) and the strategies of high- and low-performing students (Channa et al., 2023).

Sequence mining (SM) is another data analysis technique used to unveil the relative order of events, actions, or behaviors (Zhang & Paquette, 2023). The patterns of SM can be used to reveal learning approaches, map students’ behavior over time, and evaluate scaffolding (Zhang & Paquette, 2023). These sequential patterns can be summarized, grouped, or clustered into a set of groups (Gabadinho et al., 2011). A study by López-Pernas et al. (2021) examined the sequential patterns of student activities to identify clusters of various learning techniques and tactics. Other research applied SM to identify and analyze subgroups of learners and reveal behavior differences (Kinnebrew et al., 2013; Kinnebrew & Biswas, 2012). Boroujeni et al. (2017) analyzed the temporal pattern, discussion contents, and social interaction of students using SM to monitor the role transition from an active to a passive state in discussion forums.

Many researchers have gone beyond SM in their analyses and combined several methods to reach a broader view of the educational process. For instance, Matcha et al. (2019) used sequence mining, clustering, and process mining to detect and interpret the tactics students use to perform a particular task and their learning strategies, identifying common student behaviors in three successive course editions. Similarly, Saqr and López-Pernas (2023) and Elmoazen et al. (2022b) applied sequence and process mining for temporal analysis of learners’ interactions in online PBL discussions.

Epistemic network analysis (ENA) is a recent data analysis technique that reveals the co-occurrence of concepts in educational contexts (Shaffer et al., 2009). ENA analyzes the coded discourse on a segment-by-segment basis to recognize the weight of connections between codes (Csanadi et al., 2018). The illustrated ENA graphs display the interaction between codes within a unit of analysis to understand the structure and dynamics of the discourse (Shaffer & Ruis, 2017). Within ENA graphs, edges indicate the relationship between codes in each stanza. A stanza is a researcher-defined segment of the discourse, which can be a sequential or time-based segment or be defined according to the nature of the dataset (Shaffer et al., 2016). Additionally, ENA can create trajectory graphs to show how networks evolve and how the connections between codes change (Shaffer et al., 2016).

ENA has been applied to evaluate students’ interactions, compare students' performance, and assess educational interventions (Elmoazen et al., 2022a). For instance, Andrist et al. (2015) used ENA to investigate the long-term evolution of learners’ networks. Swiecki and Shaffer (2020) applied ENA to analyze the cognitive and social aspects of teamwork in problem-solving training exercises. ENA was also used to compare the connections of high and low performers (Sullivan et al., 2018), while Ferreira et al. (2022) compared emerging and scripted roles as different analysis units. Combining ENA with other data analytics approaches, such as social network analysis (Gašević et al., 2019) and process and sequence mining (Saint et al., 2020), can unfold sequential and temporal patterns of the learning process and lead to a deeper understanding of collaborative learning (Gašević et al., 2019; Saint et al., 2020).

Identification and longitudinal perspective of roles in CSCL

Several works have been devoted to the identification of roles using qualitative methods (Strijbos & De Laat, 2010). Recently, others have used computational methods utilizing algorithms and data-driven techniques such as centrality measures (Xie et al., 2018, p. 201), diffusion centrality (Saqr & Viberg, 2020), clustering (Marcos-García et al., 2015), and algorithmic identification (Dowell et al., 2019). Machine learning algorithms were also used to analyze interactions and uncover patterns in data to devise roles (Rosenberg & Krist, 2021). However, given the reliance of these methods on quantitative measures (counts of posts), they fall short when it comes to the content and the depth of the interaction, and in particular, the higher-order constructs (De Laat & Lally, 2004; Strijbos & De Laat, 2010). Given this gap, the CoI framework can provide a strong socio-cognitive foundation for analyzing emerging roles in online discussions (Ferreira et al., 2022).

Whereas research on role identification is abundant, research on role dynamics has been limited. Existing research has mostly investigated interaction dynamics, not roles, at a limited number of time points: two courses (Aviv et al., 2003) or three courses (Kim & Ketenci, 2019). For instance, Laat et al. (2007) studied the dynamics of three phases of collaborative work through a multimodal approach using social network analysis (SNA), content analysis, and critical event recall. Similarly, Skrypnyk et al. (2015) studied week-by-week interactions using SNA in massive open online courses (MOOCs). The social dynamics associated with leadership behavior were studied by Xie et al. (2018) to monitor leadership clusters. SNA was used by Saqr et al. (2018) to monitor CSCL in three courses over a full-term period and to identify role changes in response to the teacher’s intervention. Possibly the closest example is the work of Saqr and López-Pernas (2022), who used centrality measures to define the roles of the students in online PBL according to their activity in CSCL and clustered students according to their trajectories. Most recently, Wu and Ouyang (2024) coded online collaborative discussions, used a clustering approach to define the roles of the students, and applied ENA to detect role transitions. Nevertheless, all of this work studied either the progression of interactions or the evolution of message counts or centralities. Therefore, the evolution of CSCL roles over time is so far understudied and remains a gap that research needs to address. A possible reason behind this shortcoming is the need for longitudinal data as well as several methods and steps to trace an extended duration of time.

Drawing on Reimann et al.’s (2014) concern that relying on a single sequential approach can hide details due to the ontological flatness of educational data, this study used a multilevel analytical approach to enable the identification of complete role dynamics in online PBL. Sequence mining identifies sequences of roles and gives a coherent picture of how students' roles evolve over time. Visualization approaches allow exploration of the content, duration, order, and co-occurrences of events. Transition analysis was used to analyze how students moved from one role to the next. Such a combination yields a much better understanding of the temporal aspects of students' roles, thus addressing the concerns expressed by Reimann et al. (2014) and giving space for a more fine-grained interpretation of data.
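As a minimal illustration of the transition analysis described above, the sketch below estimates first-order transition probabilities between consecutive roles. The role labels, the sequences, and the function name are hypothetical and are not taken from the study data or its software; the sketch only shows the general idea of counting how often one role is followed by another.

```python
def transition_probabilities(sequences):
    """Estimate P(next role | current role) from ordered role sequences."""
    counts = {}
    for seq in sequences:
        # Pair every role with the role that follows it in the same sequence.
        for current, nxt in zip(seq, seq[1:]):
            row = counts.setdefault(current, {})
            row[nxt] = row.get(nxt, 0) + 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: c / total for nxt, c in row.items()}
    return probs


# Hypothetical role labels: one sequence per student, one entry per time point.
sequences = [
    ["leader", "leader", "contributor", "contributor"],
    ["contributor", "isolate", "isolate", "contributor"],
    ["leader", "contributor", "contributor", "isolate"],
]
probs = transition_probabilities(sequences)
for role, row in probs.items():
    print(role, row)
```

Each row of the resulting table sums to one, so it can be read as the estimated chance of moving from a given role to every other role at the next time point.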

Methodology

Context

The current study takes place in an educational setting with 135 first-year Saudi dental students, 47 females and 88 males. It includes four main courses: histology, anatomy, genetics, and pathology, all delivered through a blended learning approach within a PBL-based curriculum. All first-year students who enrolled in the courses participated in the study recruitment process. Each course typically has a duration of 5–6 weeks. Students participate in lectures covering specific topics, complemented by weekly PBL sessions focusing on related topics. These PBL sessions facilitate problem-solving and discussion. In PBL, the students begin with a face-to-face session. Simultaneously, they engage in online discussions via the LMS forum, sharing their learning objectives and insights, which are subsequently presented in a second face-to-face session at the end of each week.

This study investigated online PBL interactions within four consecutive courses across the academic year. Each course had three to four PBL sessions, with a total of 13 sessions contributing 15% to the final grade. This assessment is part of a comprehensive approach to assess students’ performance. The final examination accounts for 60% of the overall grade and the mid-term examination for 25%. The grading structure shows the importance of critical thinking and collaborative problem-solving in shaping the students’ overall performance.

In the PBL sessions, students are organized into small groups. Students start the week with a face-to-face session to discuss specific curriculum issues. Within this session, students identify their learning objectives and divide responsibilities, then move to online collaboration and share their work by posting it on the discussion boards for the entire week. The online PBL discussion helps to share information, build knowledge, and improve critical thinking, as students will have the opportunity to search, understand, argue, and actively engage in reviewing and commenting on peers’ posts under the supervision of their tutor. By the end of the week, another face-to-face session would be held for the group and their tutor to conclude and present their outcomes. In our study, we focused on the students’ interaction in the online component of the blended PBL process.

Data collection and coding

The data for this research were collected from the Moodle LMS. Ethical approval was granted for the study proposal by the ethics committee of the institution. The study focused on dental students' interactions within discussion boards related to PBL sessions. These interactions were observed over four consecutive courses covering the entire academic year. The dataset included user IDs, interaction IDs, timestamps, groups, authors, discussion topics, and posts. The collected dataset was organized using Microsoft Excel. Initial data cleaning was conducted to remove irrelevant information and prepare the data for coding and subsequent analysis. All students’ IDs were anonymized by the eLearning Unit using the “digest” R-package (Eddelbuettel et al., 2021).

The dataset was coded using the ten elements of CoI’s social, cognitive, and teaching presences (Fig. 1). Coders assigned the values “zero” for the absence of a code and “one” for the presence of a code. The first two authors coded students’ discourse. Initially, 200 posts were independently coded with moderate intercoder agreement using the kappa test (κ = 0.76) (McHugh, 2012). Thereafter, they discussed any conflict before starting a second round of independent coding for another 100 posts that showed a higher intercoder agreement (κ = 0.88). The first author then proceeded with the coding process and discussed any uncertainties with the second author. This resulted in a total of 24632 annotations, rather than the original 9301 posts, as each post may have multiple codes attached to it.
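For illustration, an agreement statistic of this kind can be computed as sketched below. The sketch assumes Cohen's kappa for two coders and a single binary code (presence = 1, absence = 0); the source does not state whether kappa was computed per code or pooled, and the ratings here are invented for the example rather than drawn from the study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of posts")
    n = len(coder_a)
    # Observed agreement: share of posts on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical presence (1) / absence (0) decisions for one code on ten posts.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

In this toy example the coders disagree on one of ten posts, so observed agreement is 0.9 against a chance agreement of 0.5.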

Social presence refers to students’ interactions and building social and emotional relationships while learning online. Table 1 presents the codes of social presence outlined by Rourke et al. (1999).

Table 1 Codes for social presence categories and their indicators

Cognitive presence is related to students’ interaction, problem solving, critical thinking, and knowledge construction. It involves four codes listed in Table 2, following the modified version of Chen et al.’s (2019) codes.

Table 2 Codes for Cognitive presence and their indicators

Teaching presence refers to the role of the instructor or leader in designing, guiding, and supporting the online discussion experience (Table 3). We used codes by Weerasinghe et al. (2012) for the teaching presence.

Table 3 Codes for teaching presence and their indicators

Both teachers and students can show teaching presence, and the teaching presence of students was found to be important for learning and metacognition development (Garrison & Akyol, 2015). Few studies have examined the teaching presence of students (Jansson et al., 2021). Most studies focused only on the teaching presence of the instructor (Kilis & Yıldırım, 2019; Zhao & Sullivan, 2017), whereas only some have examined teaching presence among students (e.g., Chen et al., 2017; Jansson et al., 2021; Weerasinghe et al., 2012). We included teaching presence as a dimension in role identification because it can aid in understanding students’ collaborative patterns and facilitate better role identification, despite the limited research on this topic.

Given the longitudinal design of our study, investigating the evolution of roles required several steps. First, we need to identify the roles using a computational method similar to previous research (Dowell et al., 2019; Marcos-García et al., 2015). Second, we need to understand the order and temporal evolution of these roles. Third, we need to understand how and how often roles change from one to another (e.g., from an active to an inactive role). Therefore, this study uses several techniques and methods. First, we used mixture models to identify roles, thereby combining unsupervised machine learning methods with manual coding (Greene et al., 2019). Human coding is better aligned with theoretical interpretation and offers an in-depth understanding of the content (Rosenberg & Krist, 2021). To understand the roles further and reveal the patterns of association between the coded content, we employed ENA (Shaffer et al., 2016).

  • RQ1: What are the emergent roles the students assume in online PBL discussions within the elements of the community of inquiry model? How do they evolve?

Identifying the roles based on CoI indicators (from Variables to States)

To identify the roles of students, we used Latent Class Analysis (LCA). LCA is a statistical method to discover heterogeneity and find latent groups (Hickendorff et al., 2018). LCA was utilized to cluster students into distinct subgroups using the glca R-package (Rosenberg et al., 2019). LCA enables the grouping of students into homogeneous subgroups based on CoI indicators of social, cognitive, and teaching presence (Fig. 3A).

Fig. 3 Overview of the data analysis methodology. (A) Application of latent class analysis (LCA) for student role classification. (B) Utilization of hierarchical clustering to detect different students’ trajectories

LCA estimates, for each individual, the probability of belonging to each subgroup and assigns the individual to the subgroup with the highest probability (Spurk et al., 2020). LCA offers some advantages over other clustering methods: it makes no assumptions about the homogeneity or normality of the data, the group assignment is directly estimated from the model, and it provides multiple parameters to facilitate model evaluation (Weller et al., 2020). Thus, it is applicable across various data types (Hickendorff et al., 2018; Scotto Rosato & Baer, 2012) and is especially suited to identifying students’ roles in education (Scrucca et al., 2024).

To conduct LCA, the number of classes underlying the data must be estimated so that the statistical model represents these groups and their characteristics. We estimated models with up to ten classes based on Weller et al.’s (2020) recommendations. The final model and number of classes were then determined based on the Bayesian Information Criterion (BIC) and entropy. We chose three classes, where BIC showed the largest decrease (elbow method) (Nylund-Gibson & Choi, 2018), with an entropy of 0.9 (well above 0.8). The average posterior probabilities were 0.98 for cluster 1, 0.95 for cluster 2, and 0.92 for cluster 3, which indicates a very good classification of students (Clark & Muthén, 2009).
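The two selection criteria and the average posterior probability can be illustrated with a minimal sketch. It assumes the standard definitions of BIC and relative entropy; the function names and the toy posterior matrix are hypothetical and are not taken from the study or from the glca package.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better model."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 mean crisp class assignment.

    posteriors: one row per observation, each row holding the class
    membership probabilities for that observation.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        return 1.0
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))

def average_posterior_by_class(posteriors):
    """Mean of the highest posterior among observations assigned to each class."""
    sums, counts = {}, {}
    for row in posteriors:
        assigned = max(range(len(row)), key=row.__getitem__)  # modal class
        sums[assigned] = sums.get(assigned, 0.0) + row[assigned]
        counts[assigned] = counts.get(assigned, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

# Hypothetical posteriors for four student-weeks and three classes (not study data).
toy = [
    [0.98, 0.01, 0.01],
    [0.02, 0.95, 0.03],
    [0.05, 0.03, 0.92],
    [0.90, 0.08, 0.02],
]
print(relative_entropy(toy))
print(average_posterior_by_class(toy))
```

In this sketch, a candidate model would be preferred when its BIC is low while its relative entropy and per-class average posterior probabilities remain high.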

To evaluate cluster separation, a comparison of means was performed. First, the data were checked for normality using the Kolmogorov–Smirnov test, which showed a non-normal distribution. The Kruskal–Wallis test was therefore used to compare the three classes, followed by post hoc pairwise comparisons using Dunn’s test with Holm’s correction (Holm, 1979). Epsilon squared (ε2) was used to measure the effect size (King et al., 2010; Tomczak & Tomczak, 2014), where ε2 = 0.01 to < 0.04 indicates a weak effect; 0.04 to < 0.16 a medium effect; 0.16 to < 0.36 a relatively strong effect; 0.36 to < 0.64 a strong effect; and > 0.64 a very strong effect (Cohen, 1988; Rea, 2014).
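As an illustration of this step, the sketch below computes the Kruskal–Wallis H statistic with the usual tie correction, derives epsilon squared as H / (n − 1), and maps the result to the verbal labels listed above. It is a simplified stand-in for the statistical software used in the study. The function names are hypothetical, and the "negligible" label for values below 0.01 is an assumption added for completeness.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction

def epsilon_squared(h, n):
    """Effect size for Kruskal-Wallis: H / ((n^2 - 1) / (n + 1)) = H / (n - 1)."""
    return h / (n - 1)

def interpret_epsilon_squared(eps):
    """Verbal label using the thresholds listed in the text."""
    if eps < 0.01:
        return "negligible"  # assumption: the text gives no label below 0.01
    if eps < 0.04:
        return "weak"
    if eps < 0.16:
        return "medium"
    if eps < 0.36:
        return "relatively strong"
    if eps < 0.64:
        return "strong"
    return "very strong"

# Hypothetical indicator scores for three groups (not study data).
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
eps = epsilon_squared(h, sum(len(g) for g in groups))
print(h, eps, interpret_epsilon_squared(eps))
```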

Epistemic network analysis

To map the connection patterns between CoI indicators in different roles and visualize the network structure, we used ENA. We used the ENA Webkit (epistemicnetwork.org) for network visualization and analysis of code co-occurrences within each student role. ENA rests on the premise that the relationships between ideas matter more than their counts, and it allows researchers to understand the cognitive structure of discourse elements. ENA offered a better understanding of how students connect different discourse elements and allowed us to compare different roles. Each week was considered an independent time frame for analysis (stanza). Nodes represent the codes of CoI indicators within the ENA network, while edges indicate the strength of connections between these codes. These network visualizations helped us understand the relative importance of each CoI indicator within each role. To compare different student roles, we used ENA difference graphs. These graphs were calculated by subtracting the weight of each connection in one network from the weight of the corresponding connection in another. The most strongly connected codes likely represented the main characteristics of each role, which influenced its ultimate name.
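The idea behind co-occurrence weights and difference graphs can be sketched as follows. This is a deliberately simplified illustration, not the algorithm implemented in the ENA Webkit, which additionally projects the normalized networks into a low-dimensional space. The function names and the coded stanzas are hypothetical.

```python
from itertools import combinations
import math

def cooccurrence(stanzas):
    """Count how often each unordered pair of codes appears in the same stanza."""
    weights = {}
    for codes in stanzas:
        for a, b in combinations(sorted(set(codes)), 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights

def normalize(weights):
    """Scale edge weights to unit Euclidean length so networks are comparable."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {edge: w / norm for edge, w in weights.items()}

def difference(net_a, net_b):
    """Edge-wise subtraction: positive values are stronger in net_a."""
    edges = set(net_a) | set(net_b)
    return {e: net_a.get(e, 0.0) - net_b.get(e, 0.0) for e in edges}

# Hypothetical weekly stanzas of CoI codes for two roles (not study data).
leaders = [
    ["interactive", "exploration", "instruction"],
    ["interactive", "exploration"],
]
peripheral = [
    ["exploration", "integration"],
    ["interactive", "exploration"],
]

diff = difference(normalize(cooccurrence(leaders)),
                  normalize(cooccurrence(peripheral)))
for edge, value in sorted(diff.items()):
    print(edge, round(value, 3))
```

Edges with positive values in `diff` would be drawn in the first role's color in a difference graph, and edges with negative values in the second role's color.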

From states to sequences

To understand the longitudinal unfolding of roles, we used sequence analysis to model the succession of roles across time. To do so, we built a time-ordered sequence for the 13 weeks in the four courses included in the study. Each student would have a sequence of 13 roles, one for every week. The TraMineR R package (Gabadinho et al., 2011) was used to construct an index plot to visualize the sequence of roles. TraMineR is used in educational contexts to analyze and visualize sequence data and longitudinal trajectories (López-Pernas et al., 2021; Saqr & López-Pernas, 2021b, 2022). In the index plot, each horizontal bar represents a single student, and the 13 colored blocks along the bar represent the temporal succession of that student’s roles across the four courses (Gabadinho et al., 2011).

An example for a hypothetical student who was a leader for the first 2 weeks, a social mediator at week 3, a peripheral explorer at week 4, a social mediator at week 5, and so on would look as follows:

$$\text{Leader}\to \text{Leader}\to \text{Social Mediator}\to \text{Peripheral Explorer}\to \text{Social Mediator}\to \text{Leader}$$
  • RQ2: What are students' role trajectories in online PBL? How do these trajectories correlate with their academic performance?

Trajectories of Students’ Roles

To identify clusters of similar trajectories, we used hierarchical clustering as a temporal clustering method. We utilized the longest common subsequence (LCS) method from the TraMineR package (Gabadinho et al., 2011) to calculate the distance between sequences and organize students based on similar trajectories (Fig. 3B). This method finds sequences that show similar temporal changes to identify different trajectories and compare them based on sequence properties. Each trajectory was presented and labeled in an index plot. These trajectories were subjected to comprehensive analysis based on the frequency and transitions associated with students’ roles over 13 weeks.
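To make the distance measure concrete, the sketch below computes an LCS-based dissimilarity between two role sequences. It assumes one common definition, d(x, y) = |x| + |y| − 2·LCS(x, y); the exact variant and any normalization applied in the study are not stated here, and the function names and example sequences are hypothetical.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_distance(x, y):
    """Dissimilarity len(x) + len(y) - 2 * LCS(x, y); 0 for identical sequences."""
    return len(x) + len(y) - 2 * lcs_length(x, y)

# Hypothetical six-week role sequences (L = leader, S = social mediator,
# P = peripheral explorer); not study data.
student_a = ["L", "L", "S", "P", "S", "L"]
student_b = ["S", "S", "S", "P", "S", "S"]

print(lcs_length(student_a, student_b))    # 3 (the subsequence S, P, S)
print(lcs_distance(student_a, student_b))  # 6
```

A matrix of such pairwise distances is the kind of input a hierarchical clustering procedure would take to group students into trajectories.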

Process mining of trajectories

To visualize and estimate the probabilities of transitions between roles within each trajectory, we used stochastic process mining based on the Markov model (MM) with the seqHMM R package (Helske & Helske, 2023). Despite the availability of various process mining algorithms, we used the Markov model to assess temporal processes because it models transitions between roles, with each role depending on the previous one (Molenaar & Chiu, 2014), and does not require trimming the data, as other process mining methods may, which can affect the output. For better interpretation, we applied a threshold of 0.05, so that role transitions with a probability below 5% are drawn thinner and lighter in color, using the qgraph package (Epskamp et al., 2012). The resulting model was graphically represented, with nodes representing roles, edges showing transitions with their direction and probabilities, and the border of each node showing its initial probability (Helske et al., 2024).
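For fully observed sequences, the maximum likelihood estimates of a first-order Markov model reduce to row-normalized transition counts and the share of sequences starting in each state. The sketch below illustrates that calculation together with the 0.05 display threshold. It is not the seqHMM implementation, and the function names and toy sequences are hypothetical.

```python
def transition_probabilities(sequences, states):
    """Row-normalized counts of transitions between consecutive weeks."""
    counts = {s: {t: 0 for t in states} for s in states}
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    probs = {}
    for s in states:
        total = sum(counts[s].values())
        probs[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return probs

def initial_probabilities(sequences, states):
    """Share of sequences that start in each state."""
    n = len(sequences)
    return {s: sum(1 for seq in sequences if seq[0] == s) / n for s in states}

def is_faint(probability, threshold=0.05):
    """True for non-zero transitions below the display threshold."""
    return 0 < probability < threshold

# Hypothetical three-week sequences (L = leader, S = social mediator,
# P = peripheral explorer); not study data.
states = ["L", "S", "P"]
sequences = [
    ["L", "L", "S"],
    ["S", "S", "P"],
    ["L", "S", "S"],
]

transitions = transition_probabilities(sequences, states)
print(transitions["L"])
print(initial_probabilities(sequences, states))
```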

Learning trajectories and academic performance

To analyze differences in the grade point average (GPA) of final exam grades among the three learning trajectories, we used Kruskal–Wallis analysis of variance (Ostertagová et al., 2014). To further understand the practical significance of the results, epsilon-squared was calculated to estimate the effect size (Tomczak & Tomczak, 2014). Post hoc pairwise comparisons were employed using Dunn’s test with Holm’s correction (Holm, 1979).
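Holm’s correction, applied to the pairwise p-values, can be sketched as a step-down adjustment. The example assumes the standard procedure described by Holm (1979); the function name and the p-values are hypothetical and do not come from the study.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values for three pairwise comparisons (not study data).
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

With three trajectories there are three pairwise comparisons, so the smallest raw p-value is multiplied by 3, the next by 2, and the largest by 1.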

Results

The full dataset included 9301 posts (8458 for students and 843 for teachers). The total student number was 135. The data included students’ online interactions on the discussion board for four consecutive courses throughout the academic year. These interactions were coded using nine indicators of CoI dimensions to get 24632 annotations, as some posts received multiple codes during the coding process. Table 4 displays the mean indicator observed for each dimension of the CoI framework. The mean values of the social presence affective score showed a low level of emotional expression (0.689 ± 1.25), the interactive score demonstrated a high level of interaction (3.93 ± 3.34), and the cohesion score showed a moderate level of group cohesion (2.02 ± 2.8). The analysis excluded the code “resolution” for cognitive presence due to its disproportionately small number of posts (23 posts). Other cognitive presence codes revealed varying levels; the triggering score showed a low level of initiating cognitive discussion (0.241 ± 0.618), the exploration score showed a high level of knowledge exploration (3.84 ± 2.61), and integration showed a moderate level of argumentation in knowledge construction (0.798 ± 0.972). For the teaching presence, the students showed a relatively low score in their teaching dimension in the online discussion. The organization score was the lowest (0.143 ± 0.414), instructions showed a higher score (0.685 ± 1.16), and facilitation showed a moderate value between the other two scores (0.343 ± 0.662).

Table 4 Frequency of various CoI indicators per week
  • RQ1: What are the roles the students assume in online PBL discussions within the elements of the community of inquiry model? How do they evolve?

Roles identification

We used latent class analysis (LCA) to cluster students according to the nine indicators of the CoI framework. The best LCA model suggested three roles, which were labeled leader, social mediator, and peripheral explorer according to the CoI indicators, as explained below (Fig. 4).

Fig. 4 Centroids for codes in each of the identified roles. The centroid shows the mean value of each code per role

Leader role: (n = 405, 23.7%)

The “leader” role is characterized by high levels of social presence per week, especially interactive (mean of 8.43) and group cohesion (mean of 4.56). This role has high values of cognitive presence, a notable increase in exploration (mean of 7.3), and, to a lesser extent, integration (mean of 1.36). The teaching presence indicators have low values for organization (mean of 0.16), facilitation (mean of 0.55), and higher instructional support (mean of 1.59) (Table 5).

Table 5 Kruskal–Wallis comparison of the CoI indicators across identified roles

Social mediator role: (n = 857, 50.1%)

Students with “social mediator” roles exhibit a moderate level of involvement across social dimensions, especially in the interactive indicator (mean of 3.51) and group cohesion (mean of 1.74). Within cognitive presence, their activity is predominantly centered around exploration (mean of 3.58), with less involvement in triggering (mean of 0.25) and integration (mean of 0.73) phases. Regarding teaching presence, their participation was lower than both social and cognitive dimensions, with instructions having the highest teaching presence actions (mean of 0.6) (Table 5).

Peripheral explorer role: (n = 450, 26.3%)

The “peripheral explorer” role is characterized by limited participation across various CoI dimensions. This role exhibited lower levels of affective involvement (mean of 0.07), interaction (mean of 0.68), and group cohesion (mean of 0.27) within the social presence. The cognitive presence indicators also showed lower values, except for the exploration, which showed moderate activity (mean of 1.22). The teaching presence followed the trend of the other two presences, with the lowest value among all codes (Table 5).

Comparison between identified roles and examining the separation of roles

The Kruskal–Wallis comparison was performed to compare the identified roles (leader, social mediator, and peripheral explorer) and assess whether the roles were well separated along the dimensions of CoI: social, cognitive, and teaching (Table 5).

Social presence: all indicators showed highly significant differences among the roles (p < 0.001), with high effect sizes on the observed variance for the affective (16.5%) and cohesion (38.5%) indicators. A very strong effect size was identified for the interactive indicator (79.6%).

Cognitive presence: the indicators showed highly significant differences among the roles (p < 0.001), with different levels of effect sizes on the observed variations. The triggering and integration phases showed a moderate effect size (5.3% and 10.1%, respectively), while exploration showed a very strong effect size (74.7%).

Teaching presence: instructions and facilitation showed highly significant differences between the roles, with effect sizes on the observed variance that were strong for instructions (23.1%) and medium for facilitation (4.6%). Organization was the only non-significant indicator, with a weak effect size (0.2%). This could be explained by organization being a duty for each student, who had a weekly objective as an assigned task.

The pairwise comparisons showed statistically significant differences between all pairs except for the organization indicator. These comparisons revealed well-separated roles, with significant differences observed among most indicators.

Mapping CoI code relationships through ENA

Leader role: the ENA network analysis revealed the connection patterns within the leading students and their role in promoting strong connections across all dimensions of the CoI framework (Fig. 5A). Notably, the network exhibited larger node sizes, reflecting active social interaction, cognitive exploration, and teaching instruction. Moreover, it revealed a strong linkage between cognitive components (specifically exploration and integration) and social interaction. The leaders showed a strong connection between the instructional aspect of teaching presence and other indicators. When comparing the leaders’ ENA network to other roles (Fig. 6), it showed a stronger linkage between instructions and other codes. Leaders showed weaker connections between social presence and cognitive codes than social mediators. On the other hand, their social presence codes had a stronger connection than the peripheral explorers.

Fig. 5 Epistemic network analysis for the three roles

Fig. 6 Subtraction comparison between the epistemic networks of the three roles (green lines represent stronger “leaders” connections, yellow lines represent stronger “social mediators” connections, and red lines represent stronger “peripheral explorers” connections)

Social mediator role: the ENA showed the interconnections of the CoI framework among social mediator roles, as shown in Fig. 5B. The network has enlarged node sizes for interactivity, group cohesion within social presence, and exploration within cognitive presence. The edge analysis showed strong connections between two aspects of social presence (interactivity and group cohesion) and two indicators of cognitive presence (exploration and integration). When comparing social mediator network with the leaders, social mediators showed strong linkages between interactivity and group cohesion codes of social presence and exploration of cognitive presence. However, in comparison with peripheral explorers, there is a notably stronger connection between social, cognitive, and teaching codes, except for the exploration–integration–interactivity triangle.

Peripheral explorer role: the analysis of the ENA network revealed the connection patterns within the peripheral explorer students (Fig. 5C). The network displayed large node sizes in cognitive exploration, cognitive integration, and social interaction. The connection appeared to be strong between the triads of exploration, integration, and interaction. Despite the isolated nature of this role, exploration acts as their primary activity, followed by cognitive integration. The comparison between peripheral explorers and the other two roles showed a strong cognitive exploration connection with cognitive integration and social interactivity codes (Fig. 6).

Mapping the temporal unfolding of roles across time

To map the role succession over time, the roles were assembled across the 13 weeks, and the resulting sequences were sorted using the longest common subsequence (LCS) and plotted in an index plot (Fig. 7). In the index plot, every student’s sequence appears horizontally as a sequence of 13 colored blocks, with each block representing the student’s role in the specified week. The green color reflects leaders, the yellow color represents social mediators, and the red color represents peripheral explorers.

Fig. 7 Index plot representing the sequence of the identified roles

For easier graph interpretation, the roles are arranged in an order that reflects their similarity. The upper part of the plot has a predominant green color, which represents the leaders that may sustain their leadership or oscillate between leader and social mediator roles throughout the 13 weeks. The middle is primarily yellow, representing the social mediators that are mostly stable in their roles with minor changes to either the leader or peripheral explorer roles. Lastly, the lower part is predominantly red, representing the peripheral explorer students who may in some weeks assume the social mediator role and are less likely to take a leadership role.

  • RQ2: What are students' role trajectories in online PBL? How do these trajectories correlate with their academic performance?

Role trajectories

Similar trajectories were identified using the LCS method, which calculates the distance between sequences from their longest common subsequences, and students were organized based on similar trajectories.

The identified trajectories were named the “active constructive trajectory”, the “social interactive trajectory”, and the “free riders trajectory”.

Active constructive trajectory: in the active constructive trajectory (n = 40), students predominantly have leadership roles (78.5%), maintaining this position with a high degree of consistency (75%). In 24% of cases, they transitioned from leaders to social mediators. From the social mediator role, the majority of students (58%) continued in the same role or returned to their previous leadership role (39%). A few students showed periods of decreased activity, adopting the peripheral explorer role for up to two weeks; the majority (75%) then transitioned to a social mediator role, and a smaller percentage (12%) reassumed leadership roles (Fig. 8).

Fig. 8 Index plot of the sequence (left) and network (right) of the active constructive trajectory. The pie ring around each node represents the initial probability

The social interactive trajectory: in the social interactive trajectory (n = 48), students predominantly initiated their activity in the discussion as social mediators (81.3%). A high proportion of social mediators (85%) showed stability in their roles, with 9% transitioning to leaders and, less frequently, to peripheral explorers (5%). Students in this trajectory also showed transitions to social mediators from leadership (83%) and from peripheral explorers (77%) (Fig. 9).

Fig. 9 Index plot of the sequence (left) and network (right) of the social interactive trajectory. The pie ring around each node represents the initial probability

The free riders trajectory: the free riders trajectory (n = 47) displayed a diverse initiation among students, either as peripheral explorers (51.1%), social mediators (34%), or even leaders (14.9%). Throughout this trajectory, 81% of students persist in their peripheral explorer role. It is evident that a high percentage of students migrate from social mediators (76%) and from leaders (40%) to peripheral explorers. Students rarely transitioned directly from peripheral explorer to leader. Instead, such transitions typically involve an intermediary social mediator role, lasting about a week or two (Fig. 10).

Fig. 10 Index plot of the sequence (left) and network (right) of the free riders trajectory. The pie ring around each node represents the initial probability

Learning trajectories and academic performance

Academic performance was compared among students exhibiting distinct learning trajectories: active constructive, social interactive, or free riders (Fig. 11). The Kruskal–Wallis test revealed a statistically significant difference among the three learning trajectory groups [H = 40.59, p < 0.0001, confidence interval (CI) (0.22–1)]. Dunn’s pairwise comparisons showed that free riders had significantly lower mean grades (M = 62.9 ± 11.9) in comparison to the active constructive (M = 77.4 ± 8.64) and social interactive (M = 78.2 ± 10.5) trajectories. On the other hand, there was no statistically significant difference between students’ grades in the active constructive and social interactive trajectories. The effect size (ε2 = 0.30) indicates a large effect of the students’ trajectory on their grades.
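The reported effect size can be checked against the test statistic. The calculation below assumes the epsilon-squared formula given by Tomczak and Tomczak (2014) and that all 135 students (40 + 48 + 47) entered the comparison; the sample size used in the test itself is not stated in this section, so this is an assumption.

$$\varepsilon^{2}=\frac{H}{(n^{2}-1)/(n+1)}=\frac{H}{n-1}=\frac{40.59}{135-1}\approx 0.30$$

Under these assumptions, the result agrees with the reported value of 0.30.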

Fig. 11 Violin plot comparing the course grade among the three trajectories using the Kruskal–Wallis test and Dunn’s pairwise test

Discussion

Research on roles, and on collaborative learning in general, tends to be cross-sectional, and temporally oriented research is lacking. In particular, longitudinal research using qualitatively coded data based on a robust theoretical framework and modern LA methods is rather limited. Therefore, our study aimed to contribute to existing knowledge by applying the CoI framework on a longitudinal scale to analyze the dynamics of students’ interactions in online PBL. We relied on CoI as a mature theoretical framework that can help unpack and understand students' interactions within the collaborative context to capture students’ learning processes (e.g., cognitive and social) and their collaborative learning environments (e.g., teaching and social).

  • RQ1: What are the emergent roles the students assume in online PBL discussions within the elements of the community of inquiry model? How do they evolve?

The first research question aimed to discover the different roles of students in online PBL. We discovered three distinct roles using the CoI framework: leaders, social mediators, and peripheral explorers. Each role showed unique characteristics across social, cognitive, and teaching presences. The leader role is characterized by high social, cognitive, and teaching presence, in particular on the social interactivity, exploration, and instruction indicators. Unlike previous research that relied solely on SNA metrics and described students based on their “activity” (Medina et al., 2016; Saqr & López-Pernas, 2022; Xie et al., 2018) or influence (Saqr & López-Pernas, 2021a), these findings offer an in-depth characterization of the leader role by highlighting their several interaction types and prominence in teaching presence. We argue that the leaders identified in this study are more accurately and precisely described, because they were identified based on a holistic mapping of their contribution, in terms of both the frequency of the interactions and the relationships between them. These leaders, alongside their social activity, were keen explorers of cognitive presence and provided instructional support for their colleagues.

Students with the least effort and involvement in the CSCL were labeled as peripheral explorers. The peripheral label was used before to refer to students who are largely inactive, e.g., (Kim & Ketenci, 2019; Ouyang & Chang, 2019). Other labels include lurkers (Dowell et al., 2019; Wu & Ouyang, 2024) or isolates (Saqr & López-Pernas, 2022). The peripheral explorer role, which included over a quarter of the students, is characterized by few and limited interactions. Their social and cognitive values were low, except for exploration, which was their most prominent activity. The ENA network for peripheral explorers confirmed their peripheral nature, with exploration having the strongest connections. The teaching presence of peripheral explorer students was somewhat muted in comparison to other roles. Contrary to SNA studies that only offer a description of the “inactive” or “peripheral” role in terms of activity only, our study shows their occasional exploration, which could be exploited to drive more engagement.

The intermediate role was labeled as a social mediator. The mediator students, acting as a bridge between students, have a mediation role with a moderate level of knowledge construction (Ouyang & Chang, 2019). The social mediator role included approximately half of the students, whose activity centered on social interaction and cognitive exploration. However, their teaching presence was lower than that of leaders. The ENA network for social mediators is a network of tightly connected social and cognitive threads, confirming the synergy between these aspects of their learning experience. The label mediator was used in previous research (Ouyang & Chang, 2019), as were other labels such as coordinators (Marcos-García et al., 2015) and connectors (Jimoyiannis et al., 2013). Yet, such research has only described this mediation based on quantitative measures.

As such, while previous research has often focused on the frequency of student interactions in online PBL or centrality measures [e.g., (Saqr & López-Pernas, 2022)], in our study, we used the actual content of online discussions to identify different roles based on the quality and nature of students' contributions rather than simply the quantity of their posts. For example, Leaders were not only active contributors but also demonstrated higher levels of cognitive interaction and instructional support in their posts. We argue that this method is more precise and offers clues for the missing elements that can be supported.

  • RQ2: What are students’ role trajectories in online PBL? How do these trajectories correlate with their academic performance?

Due to the possible heterogeneity of how roles evolve and the fact that roles can have different temporal patterns and transitions throughout the 13 weeks, we used a clustering algorithm to capture the different trajectories. The identified trajectories were labeled according to their characteristics: active constructive, social interactive, and free riders, with various patterns of transitions and role changes.

In our study, the active constructive trajectory showed active involvement in collaboration and maintained a consistently active pattern. Students within this trajectory strived to maintain leadership roles and occasionally assumed mediator positions. In the social interactive trajectory, students focused on social interaction and exploration. The social activity observed may reflect the group’s high level of interaction. Lastly, the free rider trajectory consisted mainly of less active (peripheral explorer) students compared to their counterparts, with predominant focus on exploration. This reduced student activity is reflected in the lower overall group activity compared to the active constructive trajectory. Assigning a leadership role to inactive students may help them build more social and cognitive connections with their peers (Ouyang & Chang, 2019).

While investigating how roles evolve in different trajectories is necessary to understand the temporal succession of roles (and it may be of interest to researchers in its own right), it also has several implications. By anticipating the potential roles students might assume during future collaborative work, educators can design activities and group formations to maximize productive collaboration. This involves structuring activities with these roles in mind, matching students’ strengths and interests to specific tasks, ensuring a more equitable distribution, and equipping groups with complementary skill sets. Finally, understanding potential roles enables teachers to provide targeted support, such as guiding leaders in facilitating discussions effectively.

Research assessing the temporal unfolding of roles across different collaborative contexts in CSCL is rare (Saqr & López-Pernas, 2021b, 2022). Yet, some examples exist examining the longitudinal trajectories of other constructs, e.g., students’ self-regulation. In particular, the work of (anonymized) described three trajectories: “intense” (intensely engaged most of the time, similar to the active constructive trajectory in our study where leaders were dominating); a “fluctuating trajectory” (similar to the social interactive trajectory where social mediators dominated); and “wallowing-in-the-mire” (similar to our free riders trajectory where isolation dominated across time).

The overarching pattern in this study was that students in all trajectories consistently exhibited behaviors aligned with their initially identified role rather than changing it. This suggests that students’ initial patterns, or “role as stance,” may reflect a consistent strategy for collaborative learning. For instance, we found that students in the social interactive trajectory remained social mediators most of the time and only rarely changed into leaders or peripheral explorers. Students in the free riders trajectory followed the same path: they continued to be peripheral explorers most of the time, and it was clear that isolated students dominated this trajectory.

The limited transitions of roles in the three trajectories confirm the hypothesis of Saqr and López-Pernas (2022) who postulated that roles may be considered dispositions (Fig. 2). This suggests that students who begin with rich connections with peers are more likely to remain active, with a higher capability to spread and receive ideas, whereas inactive students who start with poorer connections will have difficulty building up connections and contributing to their peers cognitively (Ouyang & Chang, 2019). Students who are less organized often depend on assistance from others and need support in the form of instruction on more effective learning practices or “how to learn” (Hattie & Donoghue, 2016; López-Pernas & Saqr, 2021).

This stability of roles has advantages and disadvantages. On the one hand, leaders are more likely to reinforce their leadership skills and perform the leadership role as expected in future iterations. On the other hand, it shows that students are less likely to learn, expand their social skills, or experiment with other roles without being assigned to them by the teachers. In other words, the lack of structure, either in total or in part, is less likely to result in the spontaneous emergence of different roles or the acquisition of social skills beyond students’ “comfort zone.” This observation aligns with the “stratified learning zone” described by Abrahamson and Wilensky (2021) in demographically diverse classrooms, where students primarily interact with peers with similar experiences and perspectives. This can limit their exposure to diverse viewpoints and hinder broader social skills development (Abrahamson & Wilensky, 2021).

Interestingly, contrary to previous research based on quantitative analysis that consistently described the leaders as having higher grades (Saqr & López-Pernas, 2021a), we found that both active trajectories (active constructive and social interactive) had almost equal grades. This may be explained by the fact that we identified roles based on qualitative coding, which revealed differences in how students function in the collaboration that were not necessarily better, yet qualitatively distinct. In other words, the active roles, while different, are both conducive to learning. The significantly lower grades observed in the free riders trajectory, compared to the active constructive and social interactive trajectories, highlight the impact of a student's role on their academic achievement in online PBL. These grade discrepancies between trajectories call for targeted interventions, such as assigning roles or individual scaffolding, to encourage interactions between students and improve learning outcomes.

Implications

Our study reveals an important finding: the relative stability of emergent roles within collaborative learning contexts. This stability, while advantageous, also presents trade-offs. On one hand, stable roles create a comfortable and predictable environment for students. Leaders, for instance, can hone and reinforce their leadership skills as they repeatedly assume leadership positions. This contributes to the development of leadership competencies and to productive collaborative groups. However, role stability may hinder students’ exploration of alternative roles. Consequently, their opportunities for skill diversification and the acquisition of social skills remain limited. Furthermore, the absence of structuring reduces the likelihood of spontaneous role changes.

Understanding role variation allows educators to offer targeted support. By anticipating the roles students might assume during collaborative work, instructors can tailor tasks to align with individual strengths and interests. Striking a balance between structured roles and room for exploration becomes essential. Encouraging students to step beyond their comfort zones requires thoughtful guidance. Given the relationship between student roles and academic performance in our study, teachers should encourage active participation and knowledge construction for all students. This could involve intentionally designing activities that allow students to take on different responsibilities or even customized support for students who struggle in online discussions.

Generally, this study underscores the importance of longitudinal research designs that capture the dynamics of student roles and interactions over time. Future research should employ multi-method approaches, combining both quantitative and qualitative analysis, to gain a better understanding of how technology influences student collaboration and learning outcomes in PBL.

Limitations

While this research investigates in depth the longitudinal dynamics of students’ roles in online PBL, some limitations should be considered when interpreting our findings. The study's focus on dental students may restrict the generalizability of the results to other educational contexts due to variations in curricula and teaching methods. Additionally, the study relied exclusively on student contributions to the discussion board, which might affect the validity of the study, as there might be other course-relevant communications between students, either face-to-face in the classroom or on other platforms (email, messaging apps, etc.). These interactions may affect the roles that students play within their groups. However, given the nature and design of the task, this was an unlikely scenario. In addition, students may not express themselves on the discussion board in the same manner as they would in face-to-face conversations. For example, students may be more likely to present themselves favorably, showing behaviors typical of the ideal student role. Future research could incorporate data from other communication channels, such as instant messaging platforms, or replace the LMS discussion board with a platform like Discord for course-related discussions. Moreover, students who are absent during certain weeks may affect the captured data. Finally, although we conducted several reliability checks and provided coder training, manual coding remains subjective and depends on human judgment, and differences in coder interpretation may lead to inconsistencies in the coding process.

This study used LCA to categorize students into different roles based on their CoI dimensions within each week. Relying on unsupervised clustering methods may cause misclassification, yet the high average posterior probabilities, the high entropy, and the clear separation of clusters indicate that this risk is minor. Additionally, while the analysis revealed transitions in student roles across different weeks, the assumption that a single role best characterizes a student within a given week may be an oversimplification. CSCL is a dynamic process where student behavior can change according to certain tasks or group needs, and different granularities can manifest different insights (Winne & Hadwin, 1998).
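
To make the classification-quality indicators mentioned above concrete, the following is a minimal sketch of how relative entropy and class-wise average posterior probabilities are commonly computed from a matrix of posterior class-membership probabilities. It is not the code used in this study: the function names and the example probabilities are hypothetical, and the formulas are the standard ones, assumed here to match what the LCA software reported.

```python
import math


def modal_class(row):
    """Return the index of the class with the highest posterior probability."""
    return max(range(len(row)), key=lambda k: row[k])


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 suggest well-separated classes."""
    n = len(posteriors)
    k = len(posteriors[0])
    # Sum of -p * ln(p) over all rows and classes; terms with p == 0 contribute 0.
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))


def average_posterior_by_class(posteriors):
    """Mean posterior probability of each class among rows assigned to that class."""
    k = len(posteriors[0])
    sums = [0.0] * k
    counts = [0] * k
    for row in posteriors:
        c = modal_class(row)
        sums[c] += row[c]
        counts[c] += 1
    # Classes with no assigned rows are reported as NaN.
    return [sums[c] / counts[c] if counts[c] else float("nan") for c in range(k)]


# Hypothetical posteriors for four student-weeks and three roles (not study data).
example = [
    [0.96, 0.03, 0.01],
    [0.02, 0.95, 0.03],
    [0.01, 0.04, 0.95],
    [0.90, 0.08, 0.02],
]

print(round(relative_entropy(example), 3))
print([round(v, 3) for v in average_posterior_by_class(example)])
```

Under this sketch, a relative entropy close to 1 and average posterior probabilities close to 1 for every class would both point to a low risk of misclassification, which is the reasoning applied above.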

Our analysis was informed by our learning design, which has a weekly periodicity. Yet within weeks, changes can happen. For example, a student might initially act as an 'explorer' while searching for information before transitioning to a 'leader' role to guide group discussion later in the week. Similarly, a student classified as a 'leader' might have an 'explorer' tendency to ask questions. To address this limitation, future research could capture the within-week variability of student roles by analyzing student engagement on a session-by-session basis rather than aggregating data for the whole week. Such an approach may provide a wider understanding of the change in students’ roles in online learning.

While the free riders trajectory was associated with lower grades in our study, it is important to investigate the underlying causes of this disparity. Future research could examine whether lack of engagement is the sole contributing factor or whether other variables, such as self-regulation skills or content knowledge, also play a role.

Conclusion

This study investigated the longitudinal evolution of students' roles across contexts and different collaborative tasks using a multi-layered temporal analysis that combined machine learning with human coding, with the Community of Inquiry (CoI) framework as a theoretical foundation. We identified three distinct roles: leaders, social mediators, and peripheral explorers. Our temporal analysis of role evolution trajectories revealed that students tend to maintain consistent roles across contexts and courses. We observed that students assume similar roles repeatedly, and when they change, they usually return to their usual role. These findings support the notion that roles may be more of a stable disposition that hardly changes without intervention. These results also cast doubt on the idea that students spontaneously learn to collaborate or improve their collaborative skills over time.

Our findings have implications for group composition and scaffolding, as well as teacher roles. Groups need to be formed in ways that encourage students to practice different roles, improve their collaborative skills, and go beyond their "comfort zone". Teachers may play a more important role than commonly thought in offering scaffolding for socially isolated students and stimulating participatory discussions. In other words, teachers may, and possibly should, encourage the active engagement of all students by deliberately designing activities and groups, and by offering support to students who encounter difficulties in online discussions.