Modelling diffusion in computer-supported collaborative learning: a large scale learning analytics study

This study empirically investigates diffusion-based centralities as depictions of student role-based behavior in information exchange, uptake and argumentation, and as consistent indicators of student success in computer-supported collaborative learning. The analysis is based on a large dataset of 69 courses (n = 3,277 students) with 97,173 total interactions (of which 8,818 were manually coded). We examined the relationship between students’ diffusion-based centralities and a coded representation of their interactions in order to investigate the extent to which diffusion-based centralities are able to adequately capture information exchange and uptake processes. We performed a meta-analysis to pool the correlation coefficients between centralities and measures of academic achievement across all courses while considering the sample size of each course. Lastly, from a cluster analysis using students’ diffusion-based centralities aimed at discovering student role-taking within interactions, we investigated the validity of the discovered roles using the coded data. There was a statistically significant positive correlation that ranged from moderate to strong between diffusion-based centralities and the frequency of information sharing and argumentation utterances, confirming that diffusion-based centralities capture important aspects of information exchange and uptake. The results of the meta-analysis showed that diffusion-based centralities had the highest and most consistent combined correlation coefficients with academic achievement as well as the highest predictive intervals, thus demonstrating their advantage over traditional centrality measures. Characterizations of student roles based on diffusion centralities were validated using qualitative methods and were found to meaningfully relate to academic performance. 
Diffusion-based centralities are feasible to calculate, implement and interpret, while offering a viable solution that can be deployed at any scale to monitor students’ productive discussions and academic success.


Introduction
Knowledge and behavior flow within social interactions, which results in the adoption of innovations, the endorsement of opinions, and the spread of ideas, to mention a few examples (Anderson et al., 2001; Fields & Kafai, 2009; Guilbeault et al., 2018). These phenomena spread within the fabric of social networks through the process of diffusion (Anderson et al., 2001; Singh, 2018). Relationships between individuals are the joints in pathways through which information flows. Some have even compared the resulting pathways to 'pipes' that facilitate the flow. As Siemens (2004) emphasized, "the pipe is more important than the content within the pipe." Consequently, the network structure, the roles played by collaborators within this structure, and the interaction dynamics at play in this structure all play significant roles in shaping knowledge diffusion.
Graph-based (network) representation of discourse has been established as a credible method and a foundation for visualizing, supporting, and enhancing argumentative knowledge construction in computer-supported collaborative learning (CSCL) (Lund et al., 2007; Muller Mirza et al., 2007; Pinkwart et al., 2006; Reed & Rowe, 2004). Therefore, researchers have harnessed the power of graph-based methods such as Social Network Analysis (SNA) to study the diffusion process, using network measures to capture the overall network structure, the connectedness of users, and the relationship between collaborators. These methods are then used to translate these properties into indicators, often referred to as centrality measures (Fields & Kafai, 2009; Suthers, 2015). Centrality measures (computed over graph representations) have been shown to map conceptually to the diffusion processes associated with many phenomena, such as knowledge transfer, the spread of viral memes, and the diffusion of behavior (Mochalova & Nanopoulos, 2013; Sumith et al., 2018). Such successful applications and the practical utility of diffusion-based centrality measures noted in several fields have motivated this work, which investigates the possible value of these measures in capturing the flow, uptake, and diffusion of knowledge in CSCL settings.
We argue that an approach to characterizing knowledge diffusion that takes advantage of well-established graph-based centrality measures is able to capture knowledge co-construction, uptake, and argumentation in networked CSCL settings. Using network methods to understand the ways knowledge is exchanged, discussed or negotiated in collaborative learning (CL) scenarios aids in understanding students' roles in collaboration, developing better monitoring mechanisms, and creating more consistent indicators of success (Cakir et al., 2005;Fields & Kafai, 2009;Lee & Tan, 2017a;Suthers & Desiato, 2012). A large volume of research has already sought to capture the essence of student interactions and further translate the representation of these interactions into indicators that can be used for modeling success (see Sect. 2.4). Yet, further research is needed on both fronts. Our work specifically builds on previous insights related to graph-based representation of argumentation and discourse elements in CSCL (Muller Mirza et al., 2007;Scheuer et al., 2010;Schwarz & Glassner, 2007), to the notion of uptake, the idea of worthiness of contributions and knowledge building in CSCL (Chen & Zhang, 2016;Lee & Tan, 2017a;Suthers, 2015;Suthers & Desiato, 2012), and finally to the work on diffusion of knowledge (Banerjee et al., 2013;Fields & Kafai, 2009). We bring these research strands together to provide practical and validated methods for quantifying and tracing interactions, uptake, and argumentation processes, identify relevant roles, and ultimately capture indicators of student success in CSCL.

Background
The optimal support of knowledge co-construction requires mechanisms that facilitate the mapping and tracing of student contributions. A large body of research in CSCL has demonstrated the worth of several tools and methods for visualizing argumentation processes and providing valuable insights both to students and teachers (Noroozi et al., 2012;Scheuer et al., 2010). This study builds on this immense body of research and offers a method based on graph-based diffusion centralities to help identify productive discussions as well as student roles in the discourse. Throughout this section, we cover graph-based representations of knowledge construction in CSCL. In particular, we build on the concepts of uptake, the spread of knowledge, and argumentation. Then, we provide a review of the literature explaining the concepts of diffusion and how diffusion is modeled. Later, we discuss diffusion metrics and how they are relevant to the dialogical interactions that occur in CSCL. We compare such operationalizations to other existing measures and highlight their advantages. We then review the previous literature on traditional SNA metrics, their use in CSCL, and how this study is meant to fill some of the existing gaps.

Graph-based representation and visualization of discourse in CSCL
The process of knowledge co-construction requires learners to engage in two dialogical spaces: cognitive and relational, both of which are essential elements for the success thereof (Janssen & Bodemer, 2013;Soller et al., 2005). The cognitive space encompasses both cognitive and metacognitive learning activities, such as discussing targeted concepts or learning strategies. The relational space is concerned with negotiations, communications and expressions of ideas (Janssen & Bodemer, 2013;Soller et al., 2005). Interaction in the relational space is aimed at reaching shared understanding about concepts under discussion in the cognitive space (Janssen & Bodemer, 2013). Both dimensions are intertwined since communication and speech acts-the building blocks thereof-incorporate both the cognitive and the relational dimensions. Such dialogical interactions "involve all dimensions of an individual -cognitive, communication, affective," which make them valuable in educational settings (Mirza & Perret-Clermont, 2009). Therefore, capturing the process of dialogical interactions, uptake and diffusion can help us understand how knowledge is created, exchanged, debated, discussed, endorsed or refuted (Suthers et al., 2010;Wise et al., 2021).
The important elements of dialogical interactions in argumentative knowledge co-construction are participatory contributions related to the administration of participation; practices reflecting responsiveness and reciprocity (i.e., how a student responds, connects, and engages with others' contributions); and representations of reasoning and argumentation strategies (Noroozi et al., 2012; Scheuer et al., 2010; Wise et al., 2021). According to Scheuer et al. (2010), argumentation may be visually represented in any of these forms: threaded, graph-based, matrix, linear, or hierarchically nested frames (containers). All such formats are types of graphs (or networks) in which the "elements of knowledge or discussions" are the nodes and the relations among such nodes are the edges (Bell, 2013; Pinkwart et al., 2006; Reed & Rowe, 2004). Representation in the form of graphs offers an intuitive method for visualizing the different types of knowledge production. Therefore, graph-based methods have been used extensively for analyzing and visualizing the threaded structure of argumentation transcripts (Lund et al., 2007; Muller Mirza et al., 2007; Pinkwart et al., 2006; Reed & Rowe, 2004). Examples of application include supporting decision-making about the current state of debate (van Gelder, 2003), encouraging and helping scaffold discussions (Suthers & Hundhausen, 2003), and offering a medium for debate (Schwarz & Glassner, 2007). Several systems and tools exist to support argumentation and interaction visualization using different types of graphs (Scheuer et al., 2010). For instance, Digalo allows students to represent their argumentation as graphs within the tool (Muller Mirza et al., 2007; Schwarz & Glassner, 2007); Araucaria allows users to analyze the graphical representation of their arguments; DebateGraph allows the visualization of debates (Miller & Volz, 2013); and Metafora allows creating and visualizing plans as graphs (Schwarz et al., 2015).
Graph-based quantitative measures can also be used to capture the process of argumentation, debate, or knowledge construction. In CSCL, adding a post (i.e., a node) represents turn-taking. The advancement of argumentation may be visualized as adding another post (node) to a post (node) in an existing thread (Suthers, 2015; Wise et al., 2021). Aspects like depth, branching, and structure of interactions can then be captured and quantified in the form of graph-analytic measures. For instance, Clark & Sampson (2017) analyzed the quality of argumentation in threaded discussions using coded interactions. The authors measured the length and depth of threaded discussions and found that discussions with oppositional episodes are significantly longer, structurally more sophisticated, and tend to be grounded with reasons. The depth and branching of argumentation have been found to be malleable through intervention with collaborative scripts, as reflected in longer and deeper (more branched) forum threads (Weinberger et al., 2017). Similarly, the quality of argumentation has also been shown to improve in response to providing appropriate visualization of the argumentation graphs (Muller Mirza et al., 2007; Nussbaum et al., 2007; Tsai et al., 2012).
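As an illustration of how depth and branching can be quantified, the following minimal Python sketch computes both for a single threaded discussion. The post identifiers and the `thread_metrics` helper are hypothetical and are not taken from any of the cited tools; depth is assumed to mean the number of posts on the longest reply chain, and branching the largest number of direct replies to any one post.

```python
from collections import defaultdict


def thread_metrics(replies):
    """Summarize a threaded discussion.

    replies maps each post id to the id of the post it answers
    (None for a post that starts a thread).
    """
    children = defaultdict(list)
    roots = []
    for post, parent in replies.items():
        if parent is None:
            roots.append(post)
        else:
            children[parent].append(post)

    def depth(post):
        # Number of posts on the longest reply chain starting at `post`.
        return 1 + max((depth(c) for c in children.get(post, [])), default=0)

    max_depth = max((depth(r) for r in roots), default=0)
    # Largest number of direct replies received by any single post.
    max_branching = max((len(c) for c in children.values()), default=0)
    return {"posts": len(replies), "max_depth": max_depth, "max_branching": max_branching}


# Hypothetical thread: p2 and p3 reply to p1, p4 replies to p3, p5 replies to p4.
replies = {"p1": None, "p2": "p1", "p3": "p1", "p4": "p3", "p5": "p4"}
metrics = thread_metrics(replies)
```

Other operationalizations (for example, mean rather than maximum depth) would be equally defensible; the choice here is only for brevity.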

Uptake and reply-worthiness
Further work on capturing the argumentation and knowledge construction process comes from research in learning analytics and SNA. Suthers et al. (2010) offered compelling evidence of how analytics can capture the process of knowledge construction or argumentation in what they referred to as the "uptake" process. Uptake, according to the authors, is a process that occurs when a learner "takes" a peer's contribution and builds on it, advances the argument contained within it, or responds (Suthers et al., 2010). Such uptake is contingent on the relevance and the worthiness of the peer's interaction. Their approach operationalizes the uptake concept in threaded discussions using graphs of relationships. Subsequently, Suthers (2015) suggested the Traces framework, which "identifies observable contingencies between events and uses these to build more abstract models of interaction and ties represented as graphs". The Traces framework was used to identify central participants, significant interactions, and roles (Suthers, 2015). In the same vein, Lee & Tan (2017a) argued that learners respond to ideas or interactions in a knowledge-building setting based on their promisingness, i.e., relevance to the learning objectives or goals (Chen et al., 2015). Thus, the mining of uptake offers an indication of the promisingness and, more importantly, of the learner's contribution to the discourse (Lee et al., 2016; Lee & Tan, 2017a). The authors used a combination of SNA centrality measures (e.g., betweenness and degree centrality) to discover promising ideas and how the promising ideas influence the subsequent discourse (Lee et al., 2016). Further support for the concept of promising ideas was demonstrated using a combination of SNA metrics, which was able to discover more promising ideas. The tracing of uptake of promising ideas was also demonstrated using several methods, e.g., temporal analytics to demonstrate the evolution of uptake over time (Lee & Tan, 2017a, b) and the "Idea Pipeline framework", which combines SNA and quality metrics (Lee et al., 2016).
Tracking the flow or diffusion of the argumentation process has also been demonstrated by several researchers using traditional methods. Anderson et al. (2001) studied the spread of arguments in children. Their main finding was that arguments "snowball". For example, when a useful argument is used by a child, the tendency to spread and the probability of uptake or diffusion of this argument remain high (Anderson et al., 2001). Fields & Kafai (2009) used ethnographic methods to track the diffusion of knowledge-sharing among school children. The authors were able to map the diversity of knowledge-sharing situations and spaces that children use (Fields & Kafai, 2009). Similar work has been performed by Jeong et al. (2011), who used sequential analysis of coded discourse to study the transitions within and between argumentation episodes among children. The authors found that opposing arguments are more likely to elicit replies (be taken up) and trigger longer chains of discourse moves with rich argumentation.
This literature review shows that research in the learning sciences and in CSCL has investigated the potential of graph and related computational methods through use of different tools and taxonomies (e.g., flow, diffusion, uptake, spread, or evolution) for the purpose of studying the way students interact as well as the structure, quality, and content of such interactions. Our review of the literature offers credible empirical evidence of the value of tracing knowledge sharing, argumentation, and diffusion of knowledge using graph-based methods (or networks).
In this article, we build on previous research and use SNA to capture the process of argumentation, knowledge building, and uptake in threaded discussions represented as graphs. We do so by using diffusion centrality measures that were established in other fields to capture the diffusion of knowledge, behavior, and information related to the concept of uptake. The diffusion-based metrics (based on graph representations of threaded interactions) have been demonstrated to capture uptake process both in real-life settings and in simulation settings at least as well as more sophisticated diffusion modeling approaches and therefore provide a more practical alternative (Banerjee et al., 2013;Kitsak et al., 2010;Liu et al., 2016;Sumith et al., 2018).

Diffusion
Diffusion in graphs has been extensively investigated and modeled for a wide variety of phenomena across several domains, including the adoption of new fashion, the uptake of information, the ripple effect of stock market disturbances, the growth of political movements, the viral spread of Internet memes and, recently, educational contexts (see, e.g., Genlott et al., 2019; Guilbeault et al., 2018). According to Barabási (2013), a "key discovery of network science is that the architecture of networks emerging in various domains of science, nature, and technology are similar to each other." Thus, models developed, e.g., for the examination of diffusion in social networks, were successfully employed to study the dynamics of, e.g., the uptake of knowledge and the diffusion of behavior (Cowan & Jonard, 2004; Kitsak et al., 2010; Singh, 2018). In fact, most attempts to study the uptake, diffusion or co-evolution of knowledge have harnessed the power of graph-based methods (e.g., networks) (Jalili & Perc, 2017; Scheuer et al., 2010; Suthers et al., 2010; Suthers & Desiato, 2012; Zhang et al., 2016). Yet, compared to other fields, diffusion-related research in the area of education has hitherto been scarce (Burgess et al., 2018). In this study, we build on the previous work and extend such insights to the process of knowledge sharing, argumentation, and uptake in CSCL environments.

Diffusion modeling
Several models have been suggested to address the diffusion problem. These models include cascade-based models, such as the popular SIR (susceptible-infected-recovered) model, which proposes that an individual can spread an idea or opinion to one of his/her contacts with a certain probability (Singh, 2018;Zhang et al., 2016). The transmission rate (or uptake) depends mainly on the persuasiveness (related to promisingness discussed in Sect. 2.2) of the idea and the readiness of the contact to endorse it. Another type of such model is a homophily-based model, which proposes that similarities among individuals affect their adoption of knowledge or endorsement of behavior (Shakarian et al., 2015;Zhang et al., 2016). Tipping models attempt to identify when an individual may adopt an idea, behavior, or technology. The models propose that there is a tipping point when a significant number of neighbors adopt such an idea. The diffusion in such models depends on the mass of adopters and their commitment and enthusiasm for the idea (Singh, 2018;Zhang et al., 2016).
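To make the cascade idea concrete, the following minimal Python sketch simulates one run of a discrete-time, SIR-style cascade. It is an illustration under simplifying assumptions rather than the model used in any of the cited studies: each node that holds the idea tries once to pass it to each susceptible contact with a fixed probability `p` and then "recovers" (stops spreading). The network, the function name `simulate_sir`, and the parameter values are hypothetical.

```python
import random


def simulate_sir(neighbors, seeds, p, rng):
    """Run one SIR-style cascade and return the set of nodes ever reached.

    neighbors: dict mapping each node to an iterable of its contacts
    seeds:     nodes that hold the idea at the start
    p:         probability that a spreader passes the idea to a susceptible contact
    rng:       a random.Random instance, so that runs are reproducible
    """
    infected = set(seeds)
    recovered = set()
    while infected:
        newly = set()
        for node in sorted(infected):
            for contact in neighbors.get(node, ()):
                if contact in infected or contact in recovered or contact in newly:
                    continue  # only susceptible contacts can take up the idea
                if rng.random() < p:
                    newly.add(contact)
        recovered |= infected  # spreaders stop spreading after one step
        infected = newly
    return recovered


# Hypothetical network; node E has no contacts and can never be reached.
network = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"], "E": []}
rng = random.Random(0)
reach_certain = simulate_sir(network, {"A"}, 1.0, rng)  # every contact takes up the idea
reach_never = simulate_sir(network, {"A"}, 0.0, rng)    # nobody takes up the idea
```

Averaging the size of the returned set over many runs, for each possible seed, gives a simulation-based estimate of a node's spreading capacity, which is the quantity that the diffusion metrics discussed next try to approximate more cheaply.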
The diffusion of knowledge, behavior or the uptake of information (e.g., in a CSCL environment), otherwise known as complex contagion, requires trust, multiple interactions and perhaps even reinforcement from multiple sources to change an individual's opinion (Centola, 2010; Guilbeault et al., 2018; Jalili & Perc, 2017; Lehmann & Ahn, 2018). As a possible solution to the complexity of modeling knowledge diffusion, researchers have capitalized on the wealth of information encoded in an individual's position in a network to derive measures that can help identify the spreaders using simple metrics (Anderson et al., 2001; Banerjee et al., 2013; Kitsak et al., 2010; Liu et al., 2016; Sumith et al., 2018). Diffusion metrics use computational methods to identify the structural properties that facilitate the diffusion and uptake process, i.e., the process of uptake of information, or those who influence the way information is endorsed (Banerjee et al., 2013; Kitsak et al., 2010). These metrics aim to accurately reflect the uptake, the reply-worthy (promising) information to be endorsed and the actors' roles (Banerjee et al., 2013; Kitsak et al., 2010; Liu et al., 2016; Sumith et al., 2018). A prime advantage of diffusion metrics is that they are easier to compute and require less technical knowledge than the more sophisticated diffusion models (Kitsak et al., 2010; Mochalova & Nanopoulos, 2013). Diffusion metrics encompass the traditional centrality measures and diffusion-specific measures, which have proven computationally efficient and perform well compared to more established, complicated metrics (Guilbeault et al., 2018; Pei et al., 2018; Zhang et al., 2016).

Diffusion metrics
Traditional centrality measures. In CSCL, the degree centralities, out-degree (number of contributed posts) and in-degree (received replies), quantify the participatory contribution and externalization of a student as well as the reply-worthiness of the contributions he or she makes (Fig. 1). Higher degree centralities have been linked to the ability to spread information: a student with more contacts can literally spread information to a greater extent than one with limited contacts. Nonetheless, the degree measures are local measures, i.e., they capture only the direct replies. As such, degree centralities offer no information about the uptake beyond the students' immediate replies, as they do not capture the connectedness of the contacts or how far information will be negotiated or endorsed (Sumith et al., 2018). Figure 1 shows four networks to illustrate the basic concepts of centrality measures. In all four networks, node A has a higher value of the corresponding centrality than node B. In other words, Fig. 1(I) shows that node A receives more contributions than B (in-degree); Fig. 1(II) shows node A is closer to most other nodes than B (closeness); Fig. 1(III) shows node A lies in-between two groups of nodes (betweenness), and Fig. 1(IV) shows node A has the most outgoing connections (out-degree).
In addition to degree centralities, there are several other centrality measures that are often operationalized in CSCL settings. For instance, closeness centrality (inverse distance to others) reflects the reachability of a student by quantifying the closeness to all others in the network. Thus, a student with higher closeness centrality has a shorter distance to others and can possibly spread more information (Jalili & Perc, 2017;Mochalova & Nanopoulos, 2013). Betweenness centrality quantifies the extent to which a student is positioned in paths between two nodes within a network; that is, a student connecting two isolated groups may have access to resources from both, and therefore, has an advantage over the members of either group. Empirical research on networks has shown a limited correlation between betweenness centrality and the range of the spread of influence or diffusion (Kitsak et al., 2010;Mochalova & Nanopoulos, 2013).
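The degree and closeness measures can be sketched in a few lines of Python. The reply network below is hypothetical (it is not the network of Fig. 1), an edge `(replier, recipient)` is assumed to mean that the first student replied to the second, and closeness is computed on the undirected version of the network as (n - 1) divided by the sum of shortest-path distances, which is one common normalization among several.

```python
from collections import deque


def degrees(edges):
    """Return (in_degree, out_degree) dicts for a list of (replier, recipient) pairs."""
    in_deg, out_deg = {}, {}
    for src, dst in edges:
        for node in (src, dst):
            in_deg.setdefault(node, 0)
            out_deg.setdefault(node, 0)
        out_deg[src] += 1  # the replier contributed a post
        in_deg[dst] += 1   # the recipient received a reply
    return in_deg, out_deg


def closeness(edges, node):
    """Closeness of `node`, ignoring edge direction; assumes a connected network."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search yields shortest-path distances
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical reply network: B, C and D reply to A; E replies to D.
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("E", "D")]
in_deg, out_deg = degrees(edges)
closeness_a = closeness(edges, "A")
closeness_e = closeness(edges, "E")
```

In this toy network A has the highest in-degree and the highest closeness, while E, which only reaches the others through D, has the lowest closeness. Betweenness is omitted here because a correct implementation is considerably longer.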
Diffusion-based centralities. An important finding in diffusion research is that the uptake probability-and consequently diffusion resulting from an interaction-is dependent on the position of the node that created it as well as the connectedness of the node and its embeddedness in the network (Anderson et al., 2001;Lee & Tan, 2017b;Suthers, 2015;Suthers & Desiato, 2012). Research has confirmed a positive correlation between such characteristics and diffusion capability. This work has resulted in a collection of diffusion-based centrality measures that have been proposed as a way to bring these characteristics together (position, connectedness and embeddedness) (Guilbeault et al., 2018;Singh, 2018;Zhang et al., 2016). In this study, based on the literature, we have selected the measures that have shown the potential for capturing the uptake process in online interactions. The selected measures represent the diffusion degree considering the cardinality of contacts, the connectedness of contacts, and the embeddedness of the node.
Nobel Laureate Abhijit Banerjee proposed the diffusion centrality in a seminal article in Science (Banerjee et al., 2013). Simply put, diffusion centrality measures how far a piece of information given to a user will be endorsed. Banerjee et al. (2013) empirically investigated whether diffusion centrality captures the process of uptake of information about microfinance solutions among villagers in India. Villagers who had high diffusion centrality were more likely to spread the word and to encourage people to endorse (i.e., take up) their opinion. What is more, villagers were adept at identifying the key influencers and, consequently, were strategically selective about whom they got their information from. The authors concluded that diffusion centrality performed better than traditional centrality measures in identifying spreaders of information (Banerjee et al., 2013). Subsequently, several researchers have applied diffusion centrality in several domains, e.g., the spread of rumors, politics and crime networks (Banerjee et al., 2019). This measure takes a practical approach by mapping the possibility of an interaction to generate replies (e.g., a debate or an argument), and how these replies generate further replies. In doing so, diffusion centrality captures the uptake or the promisingness of an interaction and, consequently, how other replies are contingent on such contributions.
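As a sketch of how this measure can be computed, diffusion centrality is commonly written as the vector DC(q, T) = Σ_{t=1..T} (qA)^t · 1, where A is the adjacency matrix, q is a per-step passing probability and T is the number of steps. The Python code below evaluates that sum by repeated multiplication; the function name, the example network and the values of q and T are illustrative assumptions and are not the settings used in this study.

```python
def diffusion_centrality(adj, q, T):
    """Diffusion centrality: sum over t = 1..T of (qA)^t applied to the all-ones vector.

    adj: dict mapping every node to a list of its neighbors
    q:   probability that information is passed along one edge in one step
    T:   number of steps over which spreading is followed
    """
    nodes = sorted(adj)
    walk = {n: 1.0 for n in nodes}   # (qA)^0 applied to the all-ones vector
    score = {n: 0.0 for n in nodes}
    for _ in range(T):
        # One more multiplication by qA: expected arrivals one step further out.
        walk = {n: q * sum(walk[m] for m in adj[n]) for n in nodes}
        for n in nodes:
            score[n] += walk[n]
    return score


# Hypothetical three-node chain a - b - c.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
one_step = diffusion_centrality(chain, q=0.5, T=1)
two_steps = diffusion_centrality(chain, q=0.5, T=2)
```

With T = 1 the score reduces to q times the degree, which illustrates why diffusion centrality can be read as a generalization of degree centrality that also counts replies to replies as T grows.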
Other important measures have similarly reflected the uptake and the diffusion probability of an individual to be related to the size and connectedness measured based on the nodes' connections. In fact, research has consistently shown that there is a positive correlation between the extent of the connectedness of contacts and the uptake or diffusion potential of the node (Bae & Kim, 2014;Centola, 2010;Liu et al., 2016;Wang et al., 2017). A group of measures was proposed to compute the connectedness of the node.
In our study, the following measures were selected:
• Maximum Neighborhood Component (MNC) is the cardinality of connected neighboring nodes excluding the target node. In a CSCL context, MNC reflects the breadth and length of the threads that the interaction is part of. A promising or reply-worthy interaction is expected to engage diverse participants and generate long threads of discussions, resulting in sizable subnetworks and consequently higher values of MNC. The higher the MNC values, the greater the expected range of uptake (Wang et al., 2017).
• Coreness (k-core property or linkage) is similar to the H-index. The k-core of a network is defined as the maximal subnetwork in which each node has degree at least k, and the coreness of a vertex is the largest k for which the vertex belongs to the k-core (Kitsak et al., 2010). A node that generates significant promising contributions attracts other participants with significant contributions and strong connections. Representing these relations as a graph places these nodes strategically towards the core of the network. Such a strategic location enables the spread of information to a larger fraction of the network. Coreness has recently attracted attention following the key findings of the seminal work of Kitsak et al. (2010), who found that a user positioned in the core of a network influences the diffusion of information to a larger sector of the community than would be expected from degree centrality alone. Several other examples confirmed such findings, with empirical evidence on the spread of disease, rumors, economic crisis information and online information (Bae & Kim, 2014; Liu et al., 2016; Wang et al., 2017). In CSCL settings, users who are involved with highly interactive posts that attract active users (with high coreness values) are more likely to have their ideas spread and discussed, and to generate more interactions. Higher coreness values also reflect engagement in dense threads with highly interactive students.
• Cross Clique Connectivity measures the number of cliques or triangles a node belongs to. A higher number of cliques reflects higher degrees of embeddedness, connectivity of the contacts, as well as strong ties. Cliques and their variants have been studied extensively in different fields, including education. In CSCL, when an interaction generates replies by two different participants, this creates a triangle with the initial post. Such triangles/cliques are a sign of the promisingness and reply-worthiness of the post. Research has shown that nodes embedded in more cliques are more likely to be spreaders. Cross-clique connectivity has been investigated and proven useful in several studies for the spread of online knowledge (Faghani & Nguyen, 2013).
To demonstrate the diffusion-based centrality measures (how they capture the process of uptake, and how they differ from traditional degree measures), a fictional CSCL scenario is shown in Fig. 2. In the sample network, node A started the conversation and received replies from B, C, D, E and F, but these replies did not generate further discussions, debates or arguments. Conversely, the reply from G sparked a meaningful (promising) conversation that stimulated further productive discussion (uptake), which received other meaningful replies from H, I and J (Table 1). Those nodes had the highest diffusion values, as they were the nodes that stimulated the argumentation process, debates, and co-constructed knowledge. Node G has the highest value of diffusion centrality (followed by J and A), and it is embedded in more cliques than node A as well. Therefore, it is expected that node G, and also J, has a higher probability of uptake compared to node A (Table 1). Nodes G, H, I, and J have the highest coreness values (k = 3): they lie within the core of the network, and each has at least three connections to nodes that themselves have at least three connections. Such strategic core positions resulted from their promising ideas that generated such intense engagement. While node A had six connections, these connections were mostly isolated (with just one connection each); so, although A had the highest degree centrality, it had only the third-highest diffusion centrality. This example demonstrates how diffusion-based centrality measures capture the uptake and argumentation process more accurately than the traditional centrality measures.
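The three connectedness measures can be sketched in plain Python as follows. The network used here is a hypothetical one that is only loosely modeled on the scenario described above (A receives isolated replies from B to F, while G, H, I and J reply to one another); it is not a reproduction of Fig. 2, and the exact edges are an assumption. Cross-clique connectivity is approximated by the number of triangles a node belongs to, which is one of the two readings given in the definition above.

```python
def build_adj(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def mnc(adj, node):
    """Size of the largest connected component among the neighbors of `node`."""
    nbrs = adj[node]
    seen, best = set(), 0
    for start in nbrs:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            cur = stack.pop()
            size += 1
            for nxt in adj[cur] & nbrs:  # stay inside the neighborhood
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


def coreness(adj):
    """Core number of every node, by repeatedly peeling the lowest-degree node."""
    deg = {n: len(adj[n]) for n in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=lambda n: (deg[n], n))
        k = max(k, deg[node])
        core[node] = k
        remaining.remove(node)
        for nxt in adj[node]:
            if nxt in remaining:
                deg[nxt] -= 1
    return core


def triangles(adj, node):
    """Number of triangles that contain `node`."""
    nbrs = sorted(adj[node])
    return sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:] if b in adj[a])


# Hypothetical network: isolated replies around A, a dense exchange among G, H, I, J.
edges = [("A", x) for x in "BCDEFG"] + [
    ("G", "H"), ("G", "I"), ("G", "J"), ("H", "I"), ("H", "J"), ("I", "J"),
]
adj = build_adj(edges)
core = coreness(adj)
summary = {n: (len(adj[n]), mnc(adj, n), core[n], triangles(adj, n)) for n in "AG"}
```

In this toy network A has the higher degree (6 versus 4), yet G scores higher on all three connectedness measures, which mirrors the contrast between degree and diffusion-based measures drawn in the example.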

SNA and academic achievement
CSCL provides students with an optimal medium for collaboration, productive interactions, and knowledge construction (Jeong & Hmelo-Silver, 2016). The relationship between productive student interactions in CSCL settings and improved academic achievement is widely accepted. In fact, several reviews and meta-analyses that synthesize a large number of other studies have provided firm empirical evidence of the positive association between the value of students' interactions in collaborative learning settings and cognitive gain, as well as skill and knowledge acquisition (Borokhovski et al., 2016; Chen et al., 2018). Therefore, researchers have resorted to SNA to capture students' interactions in CSCL settings and translate them into indicators of success, with inconclusive results. We believe that, since diffusion-based centralities are more concerned with knowledge construction and uptake by collaborators, they may offer better indicators of students' success as well as indicators of productive discussions and reply-worthy arguments. Below is a review of previous research regarding centrality measures as indicators of success.
Most of the existing research investigating centrality measures as indicators of performance in CSCL settings has relied on individual courses with a limited number of students. The findings reported in such studies (Fig. 3) suggest that a positive relationship exists between centrality measures and academic achievement, albeit with marked variability among indicators. For example, out-degree centrality was shown to correlate with student performance by some researchers (e.g., Hernández-García et al., 2015; Saqr & Alamro, 2019); yet, others did not identify any significant correlations (e.g., Liu et al., 2018a, b; Reychav et al., 2018). Similarly, in-degree centrality was found to be positively correlated with students' performance by some scholars (e.g., Liu, Kang, Domanska, et al., 2018; Wise & Cui, 2018), whereas others found no significant correlations (e.g., Reychav et al., 2018; Saqr & Alamro, 2019). Such variable results and limited datasets have offered low potential for generalization. More recently, some studies have examined several courses, testing the value of centrality measures as indicators of student performance. For instance, four studies have used two courses (Cadima et al., 2012; Cho et al., 2007; Jiang et al., 2014; Joksimović et al., 2016), a study used four courses (Saqr et al., 2018), and another study used twelve courses. All these studies have shown mixed results, where centrality measures were positively and significantly correlated with performance in some courses, not statistically significant in others, and sometimes even significantly negatively correlated. Such inconsistency was most noticeable in betweenness and closeness centralities.
After a thorough review of the literature (summarized in Fig. 3), we can conclude that there are several small studies with variable results and multiple-course studies with mixed results. The most investigated centrality measures (betweenness and closeness centralities) have shown the most inconsistent results. These conclusions call for large-scale studies with sufficient power to test the value of traditional centrality measures as indicators of student achievement, as well as for a search for new centrality measures that provide more consistent results.
We argue that diffusion-based centrality measures have the advantage of being more aligned with the collaborative process, as they capture the uptake of the argumentation process in CSCL, the extent of engagement of collaborators (i.e., the diffusion of argumentation), and the breadth of interactions among collaborators. In doing so, they capture considerable aspects of the process of knowledge construction. Diffusion-based centralities may therefore fill the aforementioned gap, i.e., they may show more consistency than the traditional centrality measures and thus provide a valuable tool for educators who wish to monitor, optimize, or support students in CSCL.
Studies summarized in Fig. 3: Cadima et al., 2012; Cho et al., 2007; de-Marcos et al., 2016; Hernández-García et al., 2015; Jiang et al., 2014; Joksimović et al., 2016; Liu et al., 2018a, b, 2019; Putnik et al., 2016; Reychav et al., 2018; Saqr, Fors, & Nouri, 2018; Saqr, Fors, & Tedre, 2018; Saqr & Alamro, 2019; Wise & Cui, 2018.

SNA and students' roles in CSCL
In CSCL settings, interactions among collaborators are facilitated or constrained by the dynamics of the collaborative learning process and the roles played by each actor. Roles can be defined as distinct patterns of behavioral engagement, contribution characteristics, and social orientation (Dowell et al., 2019; Strijbos & Weinberger, 2010). For instance, in online learning contexts, leaders show different social interaction patterns that distinguish them from non-leaders (Kim et al., 2020). There are two main types of roles: scripted and emergent. Scripted roles are defined by educators to facilitate the collaborative process or trigger a certain interaction pattern; they can specify the duties, activities, or responsibilities of the learner and how the roles rotate or change (Strijbos & Weinberger, 2010). Emergent roles are spontaneously and dynamically assumed by learners as they structure the learning process. The emergent role perspective emphasizes learners' autonomy and self-regulation. This study focuses on the emergent roles that students assume during the collaborative process.
Manually detecting the emergent roles can be an arduous process due to the large amount of online interaction data, the abundance of collaborative tasks, and overtasked teachers (Dowell et al., 2019). Therefore, automatic approaches for detecting emergent roles are necessary. One empirical approach that has proved appropriate for the study of roles in CSCL settings is SNA (Haythornthwaite, 1996; Wasserman & Faust, 1994). Researchers have harnessed the power of SNA in ranking actors through quantification of their position, connectedness, and interactions to identify students' roles in the collaborative process (Table 2). For instance, Temdee et al. (2006) suggested an approach for identifying the emergent leadership roles in a collaborative learning team based on the calculation of a leadership index, which was a combination of the degree, closeness, and betweenness centralities. Their results indicated that the leadership index can be efficiently used to distinguish the leader role, as verified by the students' vote. Similarly, Stuetzer et al. (2013) used a combination of degree, betweenness, weighted degree, and eigenvector centralities to identify brokering roles in a distance learning setting. They discovered that brokers have a central position in the network and lie on the shortest path between all other pairs of participants [...] Ouyang & Chang, 2019; Saqr, Fors, Tedre, et al., 2018). The identification of roles in the previous studies has relied on profiling groups of learners with preset criteria for centrality measures, e.g., high values of degree centrality for the leader roles (active roles), and low centrality values for the isolated (or peripheral) roles.
In addition to profiling roles according to preset criteria, Kim & Ketenci (2019) used k-Means clustering to unveil groups of learners according to different centrality measures (in-degree, out-degree and betweenness), resulting in three different roles: full participants, inbound participants and peripheral participants. Their results showed that the classifications explained learner engagement level in three dimensions: cognition, behavior, and emotion. However, they also found that the profiled roles were more distinguishable than those formed by clustering. Table 2 lists the roles identified in the reviewed articles, the centralities used to characterize such roles, and the sample size of the studies.
Our review has highlighted some gaps in the literature, namely the reliance on traditional centrality measures that are hard to interpret in collaborative settings and that do not capture the uptake process (see Sect. 2.1). Most of the reviewed articles based role identification on preset centrality values, an approach that lacks the rigor that robust statistical methods and modern machine learning offer for identifying roles and verifying their validity. Lastly, most studies used single courses with a limited number of participants, which offers low potential for generalization. To fill the identified gaps, this study investigates diffusion-based centralities, which we argue can accurately capture the argumentation process in the context of CSCL, as a basis for the identification of roles. We use modern machine learning methods for the discovery and verification of roles. We further use a large sample with several courses to overcome the limitations of convenience sampling.

Context
The present study is based on a complete dataset consisting of all the courses with a collaborative online module offered at the college of Qassim University between 2013 and 2019. To avoid selection or publication bias, all courses with at least 30 students were included. Courses with fewer than 30 students were excluded since they were considered unsuitable for the correlation meta-analysis: samples of that size may be too small for the central limit theorem to apply (Kwak & Kim, 2017). Overall, the study included 69 course offerings (15 different courses). Although each course covered different healthcare-related topics, they all had a common pedagogical underpinning based on Problem-Based Learning (PBL). The program is fairly homogeneous: all courses have been designed using similar principles to facilitate online PBL, have similar assessment strategies, and cover subject matter that integrates basic and clinical sciences. Admission to the program is competitive and, therefore, students' admission grades are usually higher than those of other colleges. The courses integrate basic and clinical subjects in theme courses. For example, the course Head and Neck: Structure and Function covers the anatomy, physiology, and pathology of the head and neck, as well as its common diseases. The problems addressed in the course cover issues around the head and neck. Other courses are structurally similar, e.g., Cell Structure and Function, Body Systems in Health and Diseases, Principles of Dental Sciences, Neuroscience, etc.
In each PBL course, students were divided into small groups with a tutor. The tutor's role was largely limited to face-to-face interactions. Students were assigned an open-ended problem on a weekly basis. They used an online forum to exchange ideas and perspectives and to argue about possible ways to understand, solve, or study the problem. Students' performance was measured by the course grade, consisting of separate grades for: (1) the final exam, (2) the level of engagement in the online forums, and (3) students' continuous assessment with the learning tasks and interaction during class. The final exam accounted for 80% of the overall grade, and the remaining 20% was distributed as follows: 10% for continuous assessment (e.g., practical assignments, seminar preparations, engagement in lectures and course duties), 5% for online forums, and 5% for face-to-face PBL group sessions.

Data collection
The data from the 69 course offerings were retrieved from the forum module of the Moodle learning management system and included, for each post: the post author (the source), the replied-to author (the target), the post ID, the thread ID, the group ID, and the timestamp.

Network Analysis
To capture the interaction processes, a post-reply directed network was constructed in which the author of each post was considered the source of the interaction, and the replied-to recipient was considered the target of the interaction. An aggregate network was constructed for each PBL group (5-10 students) and prepared for analysis through the igraph R package (Csardi & Nepusz, 2006). For each student, we report the calculated centrality measures according to Poquet et al. (2021):
Degree centralities (Borgatti, 2005; Borgatti & Brass, 2019; Opsahl et al., 2010):
• Weighted out-degree centrality: Total number of posts contributed by a student. It is operationalized as a student's contribution to the discourse, participation, and social positioning.
• Weighted in-degree centrality: Total number of replies received by a student. It is operationalized as a student's contribution to the discourse, number of reply-worthy contributions, and stimulation of others' participation.
• Weighted degree centrality: Sum of weighted in-degree and weighted out-degree.
• Out-degree centrality: Total number of students a student has interacted with. A high number indicates wide social capital, access to diverse opinions, and a larger network of connections.
• In-degree centrality: Total number of students that a student has received replies from. A higher number indicates wide social capital, worthiness of contributions, as well as prestige.
• Degree centrality: Sum of in-degree and out-degree.
• Closeness centrality: Inverse distance between the student and all others in the network. It is operationalized as the reachability of or access to other students (Bae & Kim, 2014; Borgatti & Brass, 2019).
• Betweenness centrality: Total times a student node is found on the shortest path between two others. It is operationalized as bridging social capital, mediating between collaborators, and access to diverse groups (Bae & Kim, 2014; Borgatti & Brass, 2019).
• Eigenvector centrality: In contrast to degree centrality, which counts all connections with equal weights, eigenvector centrality considers the connectedness of a student's collaborators, i.e., connections to highly connected students are expected to be more important than connections to disconnected students (Bae & Kim, 2014; Borgatti & Brass, 2019).
• Diffusion degree: Sum of collaborators' diffusion probabilities, i.e., the probability that a student may spread an idea across the network. It is operationalized as the capability of the student to contribute posts that are reply-worthy and would stimulate uptake by others, i.e., the downstream uptake that a post has generated (Banerjee et al., 2013).
• Coreness: Largest connected group of students in which all have degrees of at least k. It is operationalized as the strength of connectedness of the students' collaborators. A student with a high value has contributed reply-worthy ideas that engaged others. The result is the student being in a strategic position in the 'core' of the network, i.e., embedded in discussions that have attracted highly interactive students (who have high coreness values too) (Kitsak et al., 2010).
• Cross clique connectivity: The number of cliques that a student is a part of. It is operationalized as the ability of students' contributions to engage connected students who are also interacting with each other (forming triangles or cliques). Cliques are also a sign of strong connections (Faghani & Nguyen, 2013).
• MNC: The cardinality of a student's connections excluding the student, which reflects how connected student collaborators are and, thus, their ability to spread information or be engaged in discussions. A student who is engaged in longer threads with diverse participation from well-connected students has a high MNC (Wang et al., 2017).
For diffusion degree and coreness, we calculated the directed versions (the in and out): the in version considers only the incoming connections, and for the out version, we consider only the outgoing connections.
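As an illustration of how such measures can be derived from post-reply data, the following minimal sketch builds a weighted directed network from (author, replied-to) pairs and computes the degree centralities and an undirected coreness value. It is not the igraph-based pipeline used in the study: the function names, the input format, and the choice to ignore self-replies are assumptions made for this example, and thread-starting posts without a reply target are not represented.

```python
from collections import defaultdict


def build_reply_network(posts):
    """Aggregate (author, replied_to) pairs into weighted directed edges."""
    weights = defaultdict(int)
    for author, replied_to in posts:
        if author != replied_to:  # assumption: self-replies are ignored
            weights[(author, replied_to)] += 1
    return dict(weights)


def degree_centralities(edges):
    """Weighted and unweighted in/out degree for every student."""
    w_out, w_in = defaultdict(int), defaultdict(int)
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for (source, target), weight in edges.items():
        w_out[source] += weight      # replies written by the source
        w_in[target] += weight       # replies received by the target
        out_nb[source].add(target)   # distinct students replied to
        in_nb[target].add(source)    # distinct students replying
    students = set(out_nb) | set(in_nb)
    return {
        s: {
            "weighted_out": w_out[s],
            "weighted_in": w_in[s],
            "out": len(out_nb[s]),
            "in": len(in_nb[s]),
        }
        for s in students
    }


def coreness(edges):
    """Undirected k-core number per student, by repeatedly peeling
    the student with the fewest remaining connections."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    degree = {s: len(nb) for s, nb in adjacency.items()}
    remaining = set(adjacency)
    core, k = {}, 0
    while remaining:
        student = min(remaining, key=degree.get)
        k = max(k, degree[student])
        core[student] = k
        remaining.remove(student)
        for neighbor in adjacency[student]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core
```

In this sketch, a student whose neighbors all have a single connection ends up with a low coreness regardless of how many such neighbors there are, which mirrors the contrast between degree and diffusion-based measures described for node A.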

Discourse analysis
To investigate whether diffusion-based centrality measures are capable of capturing information exchange, uptake, and argumentation (RQ1), we (1) examined the relationship between these centralities and the frequency of coded interactions in a dataset corresponding to five course offerings from the examined data. We did so by calculating Pearson's correlation coefficient between each of the examined centralities and the frequency of the occurrence of relevant codes. We used Holm's correction to account for multiple comparisons (Holm, 1979). We also (2) performed an analysis of variance comparing the frequency of coded interactions among the identified emergent student roles (see Sect. 3.6 for further details).
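For readers unfamiliar with Holm's step-down procedure, the sketch below shows how adjusted p-values can be obtained from a list of raw p-values. The function name is hypothetical and the code is only an illustration of the general procedure, not the software used in the study.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity so adjusted values never decrease.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted
```

A comparison is then declared significant when its adjusted p-value falls below the chosen alpha level.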
The coding scheme was based on the approach of Visschers-Pleijers et al. (2006) to verbal interactions in tutorial groups, which is a widely used coding scheme for PBL group interactions. Students' interactions were classified into five categories: three of them learning-related (Questioning, Reasoning, Conflicts), as well as Procedural interactions and Off-task/Irrelevant interactions. We further took inspiration from Aarnio et al. (2013) and Yew & Schmidt (2009) to expand the Reasoning category to include different types of information sharing (renaming the category Information discussion), and the Conflicts category to include the argumentation types (renamed Argument and debate). Two researchers coded 5% of the data separately. The initial inter-rater reliability (Cohen's Kappa) was 0.6 to 0.8 for the different categories. The two researchers then met, resolved their disagreements, and recoded the data separately, resulting in a final Cohen's Kappa for all data of 0.91 (McHugh, 2012). The final coding scheme is shown in Table S1.
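Cohen's Kappa corrects the observed agreement between two coders for the agreement expected by chance. The following minimal sketch illustrates the calculation for two lists of category labels; the function name and the example labels in the usage note are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)
```

For example, if one coder labels four posts as Questioning, Questioning, Reasoning, Reasoning and the other as Questioning, Reasoning, Reasoning, Reasoning, the observed agreement is 0.75, the chance agreement is 0.5, and the kappa is 0.5.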

Frequency analysis and meta-analysis
To investigate whether diffusion-based centralities serve as consistent indicators of study success in CSCL settings (RQ2), we first calculated Pearson's correlation coefficient between each centrality measure and students' final grades. To prepare the variables for Pearson's correlation, we applied the Box-Cox transformation to all variables so that they were closer to the normal distribution (Havlicek & Peterson, 1976; Nefzger & Drasgow, 1957; Peterson, 1977). In order to rigorously pool the correlation coefficients across all courses, while taking into account the sample size of each course, a correlation meta-analysis was performed (Hedges & Olkin, 1985). By doing so, we were able to calculate the heterogeneity between studies (course offerings in our case) (Gurevitch et al., 2018; Higgins & Thompson, 2002; Schwarzer et al., 2015). In total, we performed 17 meta-analyses (one for each centrality measure) to pool the correlations of all 69 course offerings. The combined correlation coefficient of the meta-analysis is a weighted average of the correlation coefficients of all courses. An inverse-variance pooling of Fisher's z-transformed correlations was performed to obtain an accurate weight for the sample size of each course offering. A random-effects model was selected for the reporting of the combined correlation coefficient since we expected the course offerings to be heterogeneous (which was confirmed by the moderate levels of heterogeneity indicators) (Schwarzer et al., 2015).
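The pooling step can be sketched as follows. The example transforms each course-level correlation with Fisher's z, weights it by the inverse of its variance, 1/(n - 3), and combines the values under a random-effects model. Note that, for brevity, the between-course variance is estimated here with the DerSimonian-Laird formula, which is an assumption of this sketch and not the Sidik-Jonkman estimator reported in the study; the function name and the normal-approximation confidence interval are likewise illustrative.

```python
import math


def pool_correlations(correlations, sample_sizes):
    """Random-effects pooling of correlations via Fisher's z.

    Assumes at least two studies, each with more than three students,
    and uses the DerSimonian-Laird estimate of tau^2 (a simplification).
    """
    z = [math.atanh(r) for r in correlations]       # Fisher's z transform
    var = [1.0 / (n - 3) for n in sample_sizes]     # sampling variance of z
    w = [1.0 / v for v in var]                      # inverse-variance weights

    # Fixed-effect estimate and Cochran's Q.
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    df = len(z) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each sampling variance.
    w_re = [1.0 / (v + tau2) for v in var]
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # Back-transform the pooled z and its 95% interval to correlations.
    return {
        "r": math.tanh(z_re),
        "ci": (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se)),
        "q": q,
        "tau2": tau2,
    }
```

Because the weights depend on n - 3, larger course offerings contribute more to the combined coefficient, and a larger between-course variance makes the weights more equal across offerings.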
To measure the consistency of the results, we estimated the heterogeneity (the between-study, i.e., between-course-offering, variance). Heterogeneity measures the extent to which effect sizes vary within a meta-analysis (Higgins & Thompson, 2002; Schwarzer et al., 2015). Heterogeneity was estimated using the Sidik-Jonkman method. Higher levels of heterogeneity indicate less consistent coefficients. In turn, low levels of heterogeneity are a sign of consistent findings and increase the certainty that future applications of the centrality measure will render comparable results. I², a measure of heterogeneity, was selected because it is insensitive to changes in the number of studies and is easy to interpret: I² of 25% or lower indicates very low heterogeneity; I² from 25 to 50% indicates low heterogeneity; I² from 50 to 75% indicates moderate heterogeneity; and I² greater than 75% indicates substantial heterogeneity (Higgins & Thompson, 2002; Schwarzer et al., 2015). We also report the prediction interval, which has become the recommended measure of heterogeneity and establishes the future predictability of the indicator considering the heterogeneity, i.e., the expected range of values in which a future correlation would probably lie. The prediction interval can be interpreted in a similar way to confidence intervals. That is, if the lower and upper bounds are both on the positive side or both on the negative side, we expect that future applications within similar contexts would have comparable results within the bounds of the prediction interval (IntHout et al., 2016).
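For reference, the two quantities can be written in their commonly used forms. The notation is introduced here for illustration and is not taken from the study: k is the number of course offerings, Q is Cochran's heterogeneity statistic, \hat{\mu}_z is the pooled Fisher's z value with standard error SE(\hat{\mu}_z), \hat{\tau}^2 is the estimated between-course variance, and t_{k-2,\,0.975} is the 97.5th percentile of the t distribution with k - 2 degrees of freedom.

```latex
% I^2: share of total variability attributed to between-course heterogeneity
I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%

% 95% prediction interval on the Fisher's z scale
\hat{\mu}_z \;\pm\; t_{k-2,\,0.975}\,
  \sqrt{\hat{\tau}^2 + SE(\hat{\mu}_z)^2}

% Back-transformation of each bound to the correlation scale
r = \tanh(z)
```

Because the interval adds the between-course variance to the uncertainty of the pooled estimate, it is wider than the confidence interval whenever the estimated heterogeneity is greater than zero.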
Lastly, to compare with previously reported results in the literature, we computed the number of course offerings (frequency) in which each centrality measure had a positive and statistically significant correlation with grades (PSSG), a positive and statistically insignificant correlation with grades (PSIG), a negative and statistically significant correlation with grades (NSSG), as well as a negative and statistically insignificant correlation with grades (NSIG).

Clustering
To investigate the possibility of using diffusion-based centralities to reveal students' emergent roles (RQ3), we applied clustering following the methods of López-Pernas et al. (2021). In our case, clustering of students could help discover students with similar emergent roles based on their interaction patterns, operationalized as diffusion and traditional centrality measures. To limit the influence of extreme values, the data were winsorized, i.e., values in the top and bottom 5% were replaced by the closest remaining values. Since the value of distance measures could be heavily affected by the scale of each centrality measure, we standardized all the data (subtracted the mean and divided by the standard deviation) before measuring the inter-observation dissimilarities (Likas et al., 2003; Steinley, 2006).
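The two preprocessing steps can be sketched as follows for a single centrality measure. The function names are hypothetical, and the exact cut-off rule (how many observations count as the top and bottom 5%) is an assumption of this example; statistical packages differ slightly in how they define it.

```python
import statistics


def winsorize(values, proportion=0.05):
    """Replace the lowest and highest `proportion` of values
    with the nearest value that is kept."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))      # observations trimmed per tail
    low = ordered[k]                        # smallest value that is kept
    high = ordered[len(ordered) - k - 1]    # largest value that is kept
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation.
    Assumes the values are not all identical."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]
```

Winsorizing before standardizing keeps a few extremely active students from inflating the standard deviation and compressing the differences among everyone else.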
To estimate the optimum number of clusters, we used the NbClust R package, which implements 30 indices for identifying the recommended number of clusters (Charrad et al., 2014). The majority vote method (12 of the 30 indices) suggested three clusters as the optimum number, which was selected for our study. The k-means algorithm was used to perform the clustering of students based on Euclidean distance. The clustering was evaluated using the silhouette coefficient, a common internal validity measure that estimates how well an observation fits its assigned cluster by comparing its average distance to the members of its own cluster with its average distance to the members of the nearest other cluster. A silhouette value of 1 means the observation is perfectly placed in the right cluster, while a value of -1 indicates wrong assignment (Steinley, 2006). To estimate the separation among the emergent roles identified, we performed a Kruskal-Wallis non-parametric one-way analysis of variance (ANOVA), comparing the values of the centrality measures among the three clusters (Ostertagova et al., 2014). To rigorously evaluate the magnitude of the obtained results, we calculated the epsilon-squared effect size (Tomczak & Tomczak, 2014). Post-hoc pairwise comparisons were performed through Dunn's test to verify the magnitude and significance of the separation of clusters (emergent roles), using Holm's correction for multiple testing (Holm, 1979).
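To make the silhouette coefficient concrete, the sketch below computes it for every observation given a set of points and their cluster labels. The function name is hypothetical, the code assumes at least two clusters and Euclidean distance, and it is meant as an illustration of the definition rather than the implementation used in the study.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value of each point, given its cluster label.

    Assumes at least two clusters; a point that is alone in its
    cluster receives 0 by convention.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores
```

Averaging the returned values over all students gives the overall silhouette coefficient, and averaging within a cluster gives the per-cluster values.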
To examine how the identified emergent roles relate to academic achievement, we performed a Kruskal-Wallis ANOVA test to compare the final grades among the three clusters (Ostertagova et al., 2014). We computed the epsilon-squared effect size to evaluate the magnitude of the obtained results. We further used Dunn's test to perform post-hoc pairwise comparisons using Holm's correction. Lastly, to verify our hypothesis that diffusion centralities can capture the roles in information diffusion and argumentation (RQ1), we performed a Kruskal-Wallis ANOVA test (using the same methods) to compare the frequencies of coded interactions among the identified roles (Ostertagova et al., 2014).
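The Kruskal-Wallis statistic and the epsilon-squared effect size can be sketched as follows. The function names are hypothetical; the H statistic uses average ranks with the usual tie correction, and the effect size follows the formula given by Tomczak & Tomczak (2014), which simplifies to H / (n - 1).

```python
def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups of numbers.
    Assumes the pooled values are not all identical."""
    pooled = sorted(
        (value, g) for g, values in enumerate(groups) for value in values
    )
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(values) for rs, values in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    return h / (1.0 - tie_term / (n ** 3 - n))


def epsilon_squared(h, n):
    """Epsilon-squared effect size: H * (n + 1) / (n^2 - 1)."""
    return h * (n + 1) / (n ** 2 - 1)
```

As a check on the formula, a statistic of 650 with N = 3277 gives an epsilon-squared of about 0.198.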

Results
The study included 69 courses, 3,277 students, and 97,173 interactions (of which 8,818 were manually coded). Table 3 shows the descriptive statistics of all the courses analyzed. The median number of students that completed each course was 48. The median frequency of interactions in a course was 1,065. Table 3 also reports the summary statistics of the centrality measures of each student (detailed per-course statistics are available in Table S2). As such, the dataset had medium-sized courses with relatively interactive students: each student had a median weighted degree of 37 and interacted with a median of 10 others. Table S3 shows the frequency of coded interactions.
There was a statistically significant positive correlation that ranged from moderate to strong magnitude between diffusion centralities and "Sharing info. and materials" (Fig. 4), which was highest in weighted degree centrality (r = 0.701, p < 0.01), coreness (r = 0.639, p < 0.01), diffusion degree (r = 0.578, p < 0.01), as well as in closeness (r = 0.473, p < 0.01); less so in betweenness centrality (r = 0.204, p = 0.004). A similar pattern was also noticed in "Sharing facts", "Continuing argument", "Argument" and, less so, in "Counter argument". In general, the correlation coefficients were higher between diffusion-based centralities and the frequency of the codes that represent information sharing than with those related to task organizing, off-task activities and questions. In the "Questioning" category, the correlation was weak in the types of questions that stimulated more interactions ("Question open" and "Question critical"). Lastly, in the categories of "Off-tasks" and "Procedural Interactions", the correlation was non-existent or very weak.
In summary, the reported results point to high selectivity of diffusion centralities for capturing information exchange, uptake, and argumentation, as well as other types of interactions that are reply-worthy ("Question open" and "Question critical"). The detailed correlation coefficients between each of the examined centralities and the frequency of the occurrence of relevant codes are shown in Fig. 4.
We calculated Pearson's correlation coefficient between students' grades and centrality measures (traditional and diffusion ones) in each course offering. Compared to traditional centrality measures, diffusion-based centralities had a higher number of PSSG, and only one course offering with NSIG (coreness-in). The number of PSSG was generally higher in 'out' measures (e.g., coreness-out, diffusion degree out) and consequently in the total measures. Coreness out and coreness total had the highest frequency of PSSG (62 [89.9%]), followed by the diffusion degree out and cross clique centrality (61 [88.4%]). In summary, diffusion centralities were more likely to correlate positively and significantly with grades and, therefore, are expected to be better indicators of students' success in CSCL settings.
Among the traditional centrality measures, the results (Fig. 5) show that the weighted degree centralities were the most consistent indicators for a student. In most course offerings (62 [89.9%]), weighted degree and weighted out-degree were PSSG, closely followed by weighted in-degree (60 [87%]). The non-weighted degree centralities were slightly lower in terms of frequency of PSSG compared to the weighted variant (61 [88.4%] degree, 57 [82.6%] in-degree, 60 [87%] out-degree). However, there was a single course in which the correlation coefficient for non-weighted degree centrality was NSIG. The number of PSSG was the lowest in closeness centrality (54 of the course offerings [78.3%]), followed by betweenness

Fig. 5 Summary of the frequency of correlations between grades and centrality measures
Since the simple frequency of correlations is far from optimal for estimating our confidence and certainty in centrality measures, we proceed to report the results of the meta-analysis for a more detailed and in-depth analysis (Table 4). The random-effects model of the combined correlation coefficients of the centrality measures of the 69 course offerings was highest for the coreness out centrality (r = 0.57 [CI 0.53:0.60]), with the narrowest range of CI (the difference between the low and high limits of CI), and the highest lower bound of the predictive interval ([0.26;0.77]). The weighted degree showed a close but slightly lower combined correlation coefficient (r = 0.56 [CI 0.52:0.60]) as well as predictive interval ([0.25;0.77]). Total coreness followed with a combined correlation coefficient of r = 0.55 [CI 0.51:0.59] and a predictive interval of [0.20;0.78]. In general, weighted degree and diffusion centralities had high combined correlation coefficients (ranging from 0.51 to 0.57) as well as statistically significant predictive intervals (with lower bounds between 0.15 and 0.26 and upper bounds between 0.75 and 0.78). These results are an indication of the robustness and remarkable value of diffusion and weighted degree centralities.
However, closeness centralities showed a statistically insignificant predictive interval ([-0.06;0.81]). Betweenness centralities exhibited the lowest combined correlation coefficient of all centralities (r = 0.38 [CI 0.33:0.42]). Eigenvector centrality also showed a comparatively low combined correlation coefficient (r = 0.41 [CI 0.36:0.45]). The lower limit of the predictive interval of both betweenness centrality and eigenvector centrality was low (0.07). Figure 6 shows the forest plot of the combined correlation coefficients (of the 69 course offerings) of all centralities and their corresponding confidence intervals.
In summary, the results of frequency of correlation tests and the meta-analysis show that diffusion-based centralities, together with weighted degree centralities, have higher and consistent combined correlation coefficients with academic achievement (measured by grades).
*The vertical line in the center of the forest plot represents a correlation value of 0, whereas the horizontal lines represent the 95% confidence interval of the correlations for the corresponding course. The box in the middle represents the weight of each study (course offering in our case). The point inside the box represents the effect size. Course offerings with confidence intervals crossing the 0 line on either side are considered statistically insignificant. Course offerings with both confidence interval bounds on the right side of the 0 line are considered in favor of a statistically positive and significant correlation.

RQ3: Can diffusion-based centralities reveal students' emergent roles and their possible correlation with learner performance?
Clustering of students' centralities using k-means was performed to group students with similar interaction profiles. The examination of cluster centroids can give an idea about the 'average' or 'typical' profile of students that belong to each of the three identified groups (Fig. 7). The three identified emergent roles can be described as follows:
• Influencers (n=675): Students in this cluster have higher diffusion centralities (coreness, diffusion degree and cross-clique centrality), indicating that their contributions were more likely to be influential, spread or attract other contributions (uptake). Influencers were close to more students (higher closeness centrality values) and posted information more frequently (higher weighted degree centrality). However, they had below average betweenness centralities, indicating that they were more likely to be starters of a discussion than followers who mediate or discuss others' posts, or maybe they were more often engaged in interactions with the isolated students.
• Mediators (n=1296): The mediators showed diffusion centralities that were slightly above average, and in-between the isolates and the influencers. Interestingly, mediators had the highest values of betweenness centrality, indicating that they acted as bridges between the influencers and the isolates. Mediators had eigenvector centrality close to the influencers, demonstrating that they had similar social capital, i.e., since the mediators follow the influencers, they are expected to show similar connectivity levels.
• Isolates (n=1306): Isolates, as the name implies, have low centrality measures, lower diffusion centralities, smaller neighborhood sizes, as well as low eigenvector centralities. Yet, the isolates have values of betweenness centrality that are slightly below the influencers, as both groups have little role in bridging interactions.
The Kruskal-Wallis test showed that the roles identified were well separated (Table 5). All centrality measures differed significantly (p < 0.001) between clusters with an effect size that ranged from relatively strong (ε² = 0.34) to strong (ε² = 0.75). The post-hoc pairwise test showed significant differences between all pairs of clusters except for eigenvector centrality between influencers and mediators. The complete pairwise comparisons with their corresponding statistics are detailed in Fig. S1. The average silhouette coefficient for the clusters was 0.33 (0.3 for the influencers, 0.47 for the isolates, and 0.2 for the mediators). Students who were identified as influencers had an average grade of 74.1 out of 100, compared to 63.52 for mediators and 44.87 for isolates (Fig. S2). The Kruskal-Wallis test showed that the difference in grade was statistically significant with a relatively strong effect size (χ²(2) = 650, N = 3277, p < 0.001, ε² = 0.2). The post-hoc pairwise test showed that grades were significantly different between clusters. Figure 7 shows the comparison between students' centrality measures in each cluster.
Fig. 7 Mean centrality values for each clustered profile
Lastly, a comparison of the mean frequency of coded interactions among the three roles identified confirmed that each of the identified roles differed significantly on the parameters of information exchange, argumentation, and uptake (Table 6). The effect size was strong for "Sharing info and materials" (ε² = 0.462), relatively strong for "Sharing facts" (ε² = 0.183), "Argument" (ε² = 0.168), and "Continuing argument" (ε² = 0.203), and less so for "Evaluation" (ε² = 0.132). Figure 8 shows the comparison between students' coded interactions in each cluster. Questioning, Procedural and Off-task interactions showed either non-significant differences or very low effect sizes, indicating the selectivity of the identified roles to the interactions related to information exchange, uptake, and argumentation.
Note. * Post-hoc pairwise comparisons were significant between all possible groups.

Discussion
Several constraints of time and effort hinder the manual analysis of learners' data. Therefore, computational methods are increasingly needed to help teachers in their efforts to support learners (Wise & Schwarz, 2017). Our study was motivated by the earlier encouraging research on uptake in CSCL (Chen & Zhang, 2016; Lee & Tan, 2017a; Suthers, 2015; Suthers & Desiato, 2012), emerging research on diffusion in other fields (Banerjee et al., 2013; Cowan & Jonard, 2004), and the need for better monitoring mechanisms for student collaboration. We aimed to examine the role that diffusion-based centrality measures, as a computational method, could play in capturing the uptake process and diffusion of argumentation. This may be particularly revealing of the ways students facilitate knowledge uptake or information flow, encourage others to participate, help endorse an opinion, embrace a plan in a project, or follow a collaboration script. While previous work has demonstrated the immense value of representing and visualizing interaction graphs (Muller Mirza et al., 2007; Reed & Rowe, 2004), the suitability of visualizations is limited by the number of interactions. That is, in large courses with thousands of messages, it becomes visually impossible to render usable insights (Scheuer et al., 2010). Therefore, computational methods that can make sense of big data are needed. Such computational methods should ideally build on the graph representations of threaded interactions and be able to capture metrics that are interpretable and understandable by teachers and students.
Modelling of diffusion or uptake of knowledge is subject to a number of conditions and difficulties, e.g., previous knowledge of the recipient, evaluation by the recipient (to accept or not) and the need for reinforcement. These difficulties apply to CSCL as well, and have long been recognized (Suthers, 2015; Suthers & Desiato, 2012). The concepts of diffusion and uptake may vary by context: for instance, in PBL diffusion may mean the advancement of knowledge, while in project-based learning it may mean the co-construction of new ideas (Suthers, 2015). However, computational methods can help infer the probability and the magnitude of uptake or diffusion based on the relational structure of the threaded interactions (Lee & Tan, 2017a; Suthers et al., 2010; Wise et al., 2021). With the availability of a significant corpus of data for each student, such inferences become more accurate, and the signal surpasses the noise. Suthers and Desiato (2012) emphasized the importance of the practical applications of computational methods: "we should be more concerned with whether an automated method of analysis produces useful results in its applications rather than whether the specific intermediate representations are correlated with human analysis". Our results have shown that diffusion-based centralities offer a practical solution for inferring uptake and argumentation within students' interactions: they were moderately to strongly correlated with a human-coded dataset and showed strong to relatively strong effect sizes in differentiating the roles of students who share information or use argumentation (RQ1). Such high effect sizes point to the significance and importance that such methods bring to the inference of uptake and argumentation (Tomczak & Tomczak, 2014).
Regarding RQ2, we were especially interested in developing a better understanding of indicators of student success in CSCL settings. Our results have shown that, compared to traditional centrality measures, diffusion-based centralities were more likely to be positively and significantly correlated with grades in almost all courses in all measures (except for one course for coreness-in). The results of the meta-analysis further confirmed these findings and showed that diffusion centralities had higher combined correlation coefficients, narrower confidence intervals, and higher predictive intervals. This offers solid evidence of the robustness and consistency of diffusion centrality measures as indicators of success, and reflects the relevance of diffusion-based centrality measures to the collaborative context. This can be explained by the fact that diffusion centralities capture the role of students in generating reply-worthy, promising ideas that are more likely to be taken up, to generate longer threads, and to engage others (Chen & Zhang, 2016; Lee & Tan, 2017a; Poquet et al., 2020; Saqr & Montero, 2020; Suthers, 2015; Suthers & Desiato, 2012). Students who can generate such contributions are more likely to be more engaged, more participatory or high achievers (Anderson et al., 2001; de-Marcos et al., 2016; Reychav et al., 2018; Saqr & Montero, 2020). Diffusion-based centralities capture extra information (length of threads, the range of involvement of others and possibility of uptake); as such, they are more likely to accurately reflect a realistic view of students' efforts in the discourse. Diffusion-based centrality reflects how learners influence others' contributions (e.g., encourage others to reply and participate), adding an aspect that traditional centrality measures do not reflect.
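As an illustration of how one of the diffusion-related measures can be computed from a reply network, the sketch below derives coreness (the k-core index) by repeatedly peeling off the least-connected student. It assumes an undirected, unweighted network built from hypothetical reply pairs; the directed variants (such as coreness-in), diffusion degree and cross-clique centrality used in the study are not reproduced here.

```python
from collections import defaultdict


def coreness(edges):
    """Return the k-core index of every node in an undirected simple graph.

    `edges` is an iterable of (student_a, student_b) reply pairs; direction
    and repeated replies are ignored in this simplified sketch.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore replies to oneself
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(neigh) for node, neigh in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # peel the node with the smallest remaining degree
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])  # the core index never decreases while peeling
        core[node] = k
        remaining.remove(node)
        for neigh in adj[node]:
            if neigh in remaining:
                degree[neigh] -= 1
    return core


# Hypothetical reply pairs: a tightly knit trio plus one peripheral student.
replies = [("ana", "ben"), ("ben", "cho"), ("ana", "cho"), ("ana", "dev")]
scores = coreness(replies)
```

In this toy network the three mutually connected students end up in the 2-core, while the peripheral student, who is linked to only one peer, has a coreness of 1.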
Another finding of our study was that degree centralities, which reflect the number of contributions and number of replies, showed good consistency and correlation coefficients. This is not surprising, since they reflect students' efforts and participation in the discourse as well as the number of received replies (Joksimović et al., 2016; Romero & Ventura, 2020). However, as mentioned before, they do not reflect the breadth and range of uptake or the engagement of collaborators. The correlation coefficients with grades of the most commonly used traditional measures (closeness and betweenness) were the least consistent, which casts doubt on their value as indicators of student success. We believe that the time has come for researchers to consider more relevant centrality measures such as diffusion centralities; as highlighted in the review of the literature (Table 2), these findings help fill such gaps and offer a credible alternative. A noteworthy finding is that the correlations of closeness centrality with argumentation and information sharing were moderate in most categories, pointing to its value as a metric for information uptake. In other words, while caution should be exercised when using closeness centrality as an indicator of success, it can be a reasonable indicator of participation in the discourse.
Diffusion-based centralities should be interpreted as proxy indicators for productive, stimulating discussions, longer threads, and contributions that are reply-worthy or taken up by students. This "proxy" indication offers much-needed value, given its ease of calculation and strong association with coded data. However, it is limited, as are all proxy indicators, in terms of accuracy and specificity. In fact, this limitation applies to every tool and method that has been used to capture human discourse, ranging from natural language processing to even the more exhaustive methods of code and count. Natural language processing, although promising, has its challenges, e.g., the need for large labeled datasets, which are still largely unavailable in education, and the difficulty of model interpretability (Li, 2018).
A remarkable challenge in collaborative learning networks relates to the accurate identification of students' roles in a network. In this study, diffusion-based methods were used to identify emergent roles in the collaborative learning process. We relied on the aggregate values of diffusion and traditional centrality measures as well as machine learning to identify meaningful roles based on students' interactions (RQ3). Compared to previous research, which has mainly used manual profiling based on preset roles (C.-M. Chen & Chang, 2014; Marcos-García et al., 2015; Ouyang & Chang, 2019; Saqr, Fors, Tedre, et al., 2018; Stuetzer et al., 2013; Temdee et al., 2006), our approach relies on students' contributions of ideas that are reply-worthy, more likely to be taken up, and help engage others; in other words, ideas that help advance the argumentation process and contribute meaningfully to the discourse. Our method exhibited strong to relatively strong effect sizes when validated through comparison with coded interactions. We believe that this method offers easier interpretability of the roles and more relevance to collaborative knowledge construction. Our work builds on the work by Kim & Ketenci (2019), who used clustering for identifying roles, by adding the uptake aspect that is more relevant to discourse and argumentation. We have identified three roles: influencers, who are able to generate more engaging and reply-worthy interactions; mediators, who are more likely to engage and contribute to the argumentation rather than generate novel ideas; and an isolate type. The identified roles were significantly different regarding their diffusion centrality profiles with strong effect sizes, confirming the robustness of the technique. Similarly, the identified roles differed remarkably in performance, with relatively strong effect sizes that point to the significance of the findings.
Another noteworthy finding is the high number of emergent isolate students. While this is a frequent finding in the literature (Kim & Ketenci, 2019), diffusion centralities have shown that a large number of the seemingly active students are not contributing valuable ideas to the discourse. That is, they focus on sharing facts and information rather than engaging in argumentation, debate, or contributing ideas that are worthy of discussion. A possible remedy could be to structure the collaborative process with scripted roles, rotate the duties, and specify how students approach the task. To identify such isolated roles, computational methods that take into account the content of the message and the depth of the thread, like diffusion centrality measures or specialized tools that support argumentation, are thus needed (Miller & Volz, 2013; Schwarz et al., 2015; Schwarz & Glassner, 2007).
Moodle, the learning platform used in this study, may not have been very effective in helping students engage in deeper discussion, debate, or argumentation. In fact, compared to the specialized tools, Moodle lacks many of the functionalities that support argumentation, e.g., visualization of arguments, visual editors or participation tracking (Anderson et al., 2001; Lund et al., 2007; Muller Mirza et al., 2007; Nussbaum et al., 2007; Tsai et al., 2012). The findings of this study could add an interesting dimension to these tools by offering quantitative indicators that help teachers get an idea about students' interactions, especially in large courses where visualizations of a large number of messages may become difficult to understand.
PBL shares several similarities with CL but also differences.
(1) Both PBL and CL have a common task or a learning activity, which is typically a real-life problem in PBL but not necessarily a problem in CL.
(2) Both PBL and CL rely on small group interactions, in which students have to mutually collaborate to accomplish the task.
(3) In both PBL and CL, students are responsible and accountable for their learning.
(4) Both PBL and CL have interdependence as a fundamental feature; however, the two approaches differ significantly in how interdependence is fostered (Davidson & Major, 2014).
In CL, teachers commonly use social and academic goals, as well as structured tasks or assignments, as a means to foster positive interdependence, seldom using roles, rewards or points. In PBL, teachers may use goals, structured tasks, roles and sometimes rewards (Davidson & Major, 2014). As such, information and problem-solving interdependence are key features of PBL, as is a motivating real-life problem with a structured interaction process where roles are commonly assigned (especially in face-to-face settings). These differences and peculiarities of PBL would translate to structural graph differences, where groups are expected to have intense interactions and dense information exchange, leading to connected and dense networks. Another essential feature of PBL is that students do not necessarily seek to solve the problem, but rather debate, discuss, and use argumentation, which may result in more branching and in deeper and longer threads.
The fact that PBL discussions are dense, with richer information exchange and debate, makes them more likely to correlate with performance than other CL settings, where discussions may be sparse and far less correlated with performance. In some higher education settings, participation in CSCL may be below what educators aspire to (Jeong & Hmelo-Silver, 2016). Similarly, MOOC forum discussions tend to be very sparse, with infrequent participation and diverse students. Therefore, the scope of generalization for this study lies in similar PBL contexts, structured CL settings, and CL settings that involve rich information exchange. Generalizability to other contexts should be empirically confirmed before taking the results of this study at face value.
Our study has several strengths. We have demonstrated the results in 69 courses and thousands of students, which supports the feasibility and scalability of the approach. We have also used a robust method, meta-analysis, for the calculation of the combined correlation coefficients as well as the predictive intervals, which offers a more realistic expectation of the performance of the predictor in the future and takes into account the heterogeneity and sample sizes. The discovered roles in our study have been verified with a coded dataset and showed high effect sizes regarding the separation of roles and grades. We believe that using the data of all the courses in the institution presents solid evidence and reduces possible selection bias. Our approach offers a verified computational method that is feasible to implement, given the availability of different software applications that can compute such diffusion-based centrality measures. Educators can use such metrics as robust indicators of success for students in collaborative settings, or for monitoring students' productive interactions and how they contribute as well as encourage others to contribute. We hope that researchers will build on our approach, investigate other methods of analysis and other metrics, and compare their results to ours. Although we have used a large sample, the generalizability of the results is subject to investigation and further confirmation.
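The pooling of per-course correlations mentioned above can be sketched as a random-effects meta-analysis on Fisher z-transformed coefficients. The sketch makes several assumptions that are not taken from the study: the DerSimonian-Laird estimator for the between-course variance, a normal quantile in place of the t quantile usually used for the predictive interval, and the hypothetical function name `pool_correlations`.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pool_correlations(rs, ns, level=0.95):
    """Pool correlations from several courses, weighting by sample size.

    rs: per-course correlation coefficients; ns: per-course sample sizes (> 3).
    Returns the pooled r, its confidence interval, an approximate predictive
    interval, and the between-course variance tau2 (on the Fisher z scale).
    """
    zs = [atanh(r) for r in rs]          # Fisher z transform
    vs = [1.0 / (n - 3) for n in ns]     # sampling variance of each z
    w = [1.0 / v for v in vs]            # fixed-effect (inverse-variance) weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))  # heterogeneity Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    k = len(rs)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # DerSimonian-Laird
    w_re = [1.0 / (v + tau2) for v in vs]                 # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    half_pi = crit * sqrt(tau2 + se ** 2)         # wider when courses disagree
    return {
        "r": tanh(z_re),
        "ci": (tanh(z_re - crit * se), tanh(z_re + crit * se)),
        "pi": (tanh(z_re - half_pi), tanh(z_re + half_pi)),
        "tau2": tau2,
    }


# Hypothetical correlations from three courses of equal size.
summary = pool_correlations([0.1, 0.5, 0.8], [50, 50, 50])
```

The predictive interval is never narrower than the confidence interval because it adds the between-course variance to the uncertainty of the pooled estimate, which is why it gives a more realistic expectation for a new course.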

Limitations
Our study is not without limitations. The inclusion of online PBL grades within the final grades may have added some noise or confounding to the correlation analysis. Since these grades were only 5% of the total grade students earned, we believe this effect is minimal and did not affect the conclusions of the article. Since most previous studies have performed correlational analyses, we opted for correlation analysis as well, to be able to compare our results to those of previous works and to establish the value and worth of the indicators. Accounting for and investigating the different factors that influence the obtained results warrants further investigation, which we aim to pursue in our future studies. We studied the indicators separately to facilitate easy interpretability and estimation of the magnitude of the effect size, so that educators can have an idea about what they can expect from an indicator. Several other analytical techniques could help understand the complexities of the interaction process, e.g., Hierarchical Linear Modeling (HLM). HLM could account for the role of tutors, the composition of the group and other factors such as nestedness. Lastly, we have used a large number of students to derive our conclusions. Such inference using a large dataset may result in statistically significant results even with small or negligible effect sizes. Therefore, we have relied on the effect size, as well as predictive intervals, as a more accurate measure of the magnitude of results. We encourage readers to rely on the effect sizes as a more reliable measure of the magnitude and expected impact of the findings (Suthers & Hundhausen, 2003).

Sharing info and materials
Sharing information or learning resources related to the discussion.

Sharing facts
Providing objective information (no reasoning), which does not represent the student's words (e.g., definitions or facts).

Conclusion
Closing a discussion or stating the outcome of the conversation.

Comparison
Contrasting different issues and ideas or concepts by comparing them without reasoning.

Argument and debate

Evaluation
Assessing other students' work or ideas and giving opinions or judgments for their or others' work with reasoning.

Argument
Reasoning, justifying, or rationally trying to resolve differences of perspectives and issues in critical discussion.

Continuing argument
Continuing an argument given by other students.

Counter argument
Providing a contradicting view for a given argument with reasoning and rationally disproving.

Disagreement
Disagreeing with other students' opinions or statements.

Questioning

Question open
Raising a question that demands new information and elaborated explanations of a specific contribution.

Question critical
Raising a question to challenge another person's contribution.

Question verification
Asking for clarification of a proposed idea or shared knowledge.

Question alternative
Making a logical extension of a previous question that has already been answered, proposing an alternative explanation.

Team building
Talking about the group and related issues.

Problem handling
Addressing a specific team issue and ways to solve that issue.

Task management
Discussing how to divide the work and the progress of the assigned tasks.

Sharing feelings
Sharing their feelings about others' work or comments.

Sharing experience
Sharing experiences related to the discussion.