Introduction

Social network analysis (SNA) is a quantitative methodology frequently used in educational research to examine relationships, most commonly the relationships between students or teachers (Saqr et al., 2022). Over the last two decades, SNA has expanded from examining human self-reported relationships to analyzing learner interactions captured by educational technologies (Haythornthwaite & De Laat, 2012). Since then, SNA has been applied to trace data in various ways, including social media (Kitto et al., 2015), computer-supported collaborative learning (Dado & Bodemer, 2017), online course discussions (Chen et al., 2018), MOOCs and professional communities (Suthers, 2015; Wise & Cui, 2018), as well as social annotations of course readings and videos (Chen et al., 2022; Hecking et al., 2017). Despite the wide range of applications, the methodological reconsiderations required when SNA is applied to digital traces (Howison et al., 2011) are often overlooked (Poquet & Joksimovic, 2022).

The application of SNA in digital learning environments has been criticized (Chen & Poquet, 2022; Poquet & Chen, 2023; Poquet & Joksimovic, 2022; Poquet et al., 2020, 2021). The critiques call for systematicity in how networks are defined (Poquet et al., 2021) and modelled statistically across existing studies (Poquet & Joksimovic, 2022), in how digital learning theories are used to statistically model trace data (Poquet & Chen, 2023), and in how stakeholder values are considered in SNA of learner digital traces (Chen & Poquet, 2022). These critiques emphasize that a reliable cross-course analysis of digital learning interactions requires systematic, conceptually consistent, and scalable methods for analyzing networks that form in diverse contexts, for instance, in different courses via various technologies and pedagogies. Much of the critique remains prescriptive. To respond to the criticisms, empirical studies utilizing SNA need to surmount conceptual, methodological, and practical challenges.

This paper outlines some of these challenges and presents a framework to address them. The paper is organized as follows. Sect. “Challenges Facing SNA of Digital Interactions” outlines conceptual, methodological, and practical challenges facing SNA of digital learning traces. Sect. “Multi-level View of Online Discussions” proposes a conceptual framework and a methodology for overcoming these challenges. Sects. “Methods” and “Results” present methods and results obtained when the framework was applied to analyze student online interactions in discussion forums of twenty university courses. The framework analyzes multiple levels of learner interactions: (1) individual and dyadic posting behavior; (2) patterns of forum communication projected from the networks of forum posts; and (3) structures of relationships that emerge through these communication patterns. The individual, network, and group levels within the framework are conceptually interdependent: statistical modeling of communication networks from the lower level of individual participation affords further inference of the relational structures underpinning them. At each level, a separate set of hypotheses can be made to shape the statistical modeling, and our analysis presents just one example of what that could look like. The results show that the analyzed courses exhibit regularities in the patterns of posting behavior at the individual level and in the patterns of relational processes at the group level, whereas their communication patterns—currently the most studied level of analysis—appear to be more dissimilar. Having demonstrated the methodology, in Sect. “Relevance of Multi-level Framework” we discuss how the proposed framework overcomes current challenges and where further work is needed.

Challenges facing SNA of digital interactions

Interpersonal interactions and relationships are integral to student learning and success (Astin, 1993). Learning theories emphasize the importance of direct interaction or observing others in constructing knowledge or socializing into existing practices (Bransford et al., 2000; Bruner, 1996; Dillenbourg, 1999; Mayes, 2015; Teasley, 1997). In distance education, interactions are particularly important for reducing the psychological distance between learners distributed in time and space (Abrami et al., 2011). With the wide adoption of technology for learning, data collected in digital environments have been viewed as an opportunity to inform the facilitation of learner interactions. Analysis of what learners talk about has become an important part of studying learning from peer interactions (Henri, 1992). Analyzing the structure of learner exchanges online, such as who they are talking to, offered a further way to quantify learner relations (De Laat et al., 2007). Against this backdrop, SNA has gained traction as a technique for analyzing the structure of online interactions.

Recently, analyses of the structures that represent online interactions in learner discussions have come under criticism (Lund & Suthers, 2016; Poquet et al., 2021; Wise et al., 2017). Wise et al. (2017), for instance, showed how network definitions shape findings: arbitrary network definitions result in the arbitrary inclusion of data points as network nodes or ties and influence the results. Poquet and Joksimovic (2022) described the diversity of network analyses of learner traces as a cacophony of network approaches. To improve the quality of empirical work, Poquet et al. (2021) developed recommendations for the generalizability and reproducibility of network studies in learning analytics. However, many of these calls remain prescriptive.

To respond to these criticisms of SNA of trace data in digital learning, empirical studies need to surmount several challenges. To demonstrate these challenges, consider a hypothetical scenario. An instructor implemented a new intervention to facilitate interpersonal activity between the students in a course. An analyst would like to compare the effects of this intervention across several courses and create a dashboard visualizing interaction patterns for the instructors. However, several challenges arise.

First, the analyst faces a practical challenge: which learning outcomes should be evaluated. SNA in learning analytics often targets learning outcomes measured at the level of a network, such as learner degree centrality, often operationalized as the number of peers a student interacted with. Although this metric has been associated with grades (Saqr & López-Pernas, 2022), creativity (Dawson et al., 2011), and quality of discourse (Dowell et al., 2015), instructors may also be interested in other outcomes, such as the individual level of participation or group-level indicators of community formation. The choice of these outcomes depends on the interests of individual instructors: some may value the development of a learning community as the goal of their intervention, whereas others may prioritize higher levels of individual participation in group activities. Individual participation, positioning in a communication network, and community development are outcomes that need to be measured at different levels. Further, scant evidence exists on how these outcomes are interdependent, i.e., whether individual participation also advances community development. As a consequence, the final decision as to which learning outcome to focus on remains with the stakeholder.

Second, the analyst faces methodological challenges. The problem of defining a network, which we noted earlier, is just one of them. A direct visualization of a chosen network, a common way to represent patterns of group activity, is not a reliable source of insight, since graph visualization can easily be manipulated to impose a particular interpretation. Instead, quantitative SNA metrics are needed, and descriptive SNA metrics remain commonly used in dashboards and in studies (Dado & Bodemer, 2017; Poquet et al., 2021). In many instances, however, these descriptive indicators from different courses cannot be compared directly. To enable cross-course comparison, once networks are constructed, the analyst needs to conduct statistical analysis of the networks.

To address practical and methodological challenges, the analyst needs to attend to conceptual decisions. As we explained earlier, statistical analysis of networks is often needed for cross-course comparison. Conducting statistical network analysis requires choosing a theory that explains how the network forms and evolves towards the desired outcome. Statistical modelling of networks involves comparing an observed network to a distribution of randomly generated networks simulated via conceptual hypotheses about why its ties form. The generative model specifying the rules for the simulated network is called a null model. Very few studies attempt to choose a null model that explains the formation of learner networks in digital settings (Poquet & Chen, 2023). Instead, when researchers statistically analyze networks of digital interactions, they often use social science theories of how relationships form in everyday life.

Using social science theories to explain how relationships form seems a natural choice: SNA already embeds a theoretical view because many network measures and their interpretations are deeply rooted in structural sociology (Wellman, 1997). However, learner ties in networks in digital settings are not equivalent to the social relationships studied by sociologists. In social networks, a tie between two people represents a social relationship, elicited by asking people to report whom they trust or seek advice from. Measures of social relationships are well theorized (Haythornthwaite, 1996; Rivera et al., 2010), yet if and when their interpretations are transferable to digital learning is unclear. Similarly, statistical modelling of social networks relies on these theoretical tenets of how networks of social relationships form (Lusher et al., 2013). In this process, guided by assumptions about how social networks form, random networks are generated from the processes known for their role in the formation of social networks, such as reciprocity (the ‘I scratch your back, you scratch mine’ mechanism), transitivity (the ‘a friend of a friend is my friend’ mechanism), and generalized exchange (the ‘pay it forward’ mechanism).

In contrast, interaction networks from digital data do not represent social relationships, although some of the ties may potentially correspond to relationship ties. Chen and Poquet (2020) argued that ties in social networks are constructed from perceptions of a ‘state’, that is, whether a relationship between two people is perceived by either of them as real. Such ties are conceptually different from the ties in learner networks inferred from digital trace data, which are constructed from learner activity and represent ‘events’ of what has happened online. Learner networks inferred from digital traces may therefore not be theorized through the same mechanisms that describe social network formation, such as preferential attachment (Toivonen et al., 2006) or structural balance theory (Cartwright & Harary, 1956). Digital learning theories, such as knowledge building (Scardamalia & Bereiter, 1996), connectivism (Siemens, 2005), and networked learning (Jones, 2015), describe learning as a process situated within the socio-technical systems that form as learners participate in learning activities that are also mediated by learning artefacts. The actors involved in digitally mediated learning environments can be learners, technology-mediated artefacts, or units obtained from the language used for communication (Jones, 2015; Nardi, 1996; Scardamalia & Bereiter, 1996; Siemens, 2005). The ties underpinning the relationships can represent different actions, relations, and qualities. Translating digital learning theories into methodologies for analyzing and modelling these digital learner networks is nascent work, and empirical studies are yet to rise to the challenge of implementing these conceptualizations (Poquet & Chen, 2023).

Multi-level view of online discussions

To address existing practical, methodological, and conceptual challenges, we propose that analysis of digital learner interactions should differentiate between three different processes that overlap within the digital learning space: (1) individual participation in a discussion forum through the activity of posting; (2) communication that unfolds as learners contribute to the discussion forums; and (3) emergent relationship formation among some learners as they post and communicate with others. These three processes can be understood as occurring at the individual, network, and group levels. We operationalize them as interdependent, where each level continuously gives rise to the next (Fig. 1). In the remainder of the section, we draw on existing literature to explain the relevance of posting activity, communication, and relationship formation for digital learning and discuss their possible operationalizations.

Fig. 1

Separating processes of participation, communication, and relating in discussion forums in digital settings

Posting activity

As per the literature on digital learning, posting activity serves as an indicator of online participation (Hrastinski, 2008). The act of posting on discussion forums indicates that learners are engaging with the activities designed by instructors. This engagement is prompted by the combination of designed pedagogical activities and learner motivation. While some learners may value interaction with their peers, others may prioritize their specific learning needs (Eynon, 2014). Nevertheless, they all respond to the pedagogical activity that requires them to post. Counting the number of posts on forums is a standard method used to evaluate forum participation. Counts of posts at the learner level are associated with the learner's final grade (Macfadyen & Dawson, 2010) and can be used to predict learner engagement and completion (Joksimovic et al., 2018).

Posting activity can be captured through tree-like network structures of post-to-post relationships (Aragón et al., 2017), with an example of a post-to-post network presented in Fig. 1a. Posts that respond to an earlier post in the discussion are linked with an explicit ‘reply-to’ relationship. The depth of such a tree-like network reflects the number of levels, or turns, within the discussion thread. Thinking of networks of posts as the basic representation of forum activity is useful because it represents networks of distinct text-based units, preserving the content and linguistic style of each contribution in a discussion, even when several contributions were authored by the same person. Networks of posts are not yet aggregated to the learner level. As a result, the individual characteristics of each post, such as its function in a conversation, its relationship to the main ideas generated, or its linguistic properties, can be examined. Networks of posts therefore have greater potential than learner-to-learner networks to capture the mechanisms of how networks form in text-based settings.
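As an illustration, such a post-to-post tree can be assembled from reply records. The sketch below uses the networkx library on an invented minimal post structure (the `id`, `author`, and `reply_to` fields are hypothetical, not the actual Moodle export format):

```python
import networkx as nx

# Hypothetical forum export: each post has an id, an author, and the id of
# the post it replies to (None for a thread-starting post).
posts = [
    {"id": 1, "author": "A", "reply_to": None},
    {"id": 2, "author": "B", "reply_to": 1},
    {"id": 3, "author": "C", "reply_to": 1},
    {"id": 4, "author": "A", "reply_to": 2},
]

# Directed post-to-post network: an edge points from a reply to its parent.
G = nx.DiGraph()
for p in posts:
    G.add_node(p["id"], author=p["author"])
    if p["reply_to"] is not None:
        G.add_edge(p["id"], p["reply_to"])

# Thread depth: the longest reply chain in the tree.
depth = nx.dag_longest_path_length(G)
print(depth)  # 2 (post 4 -> post 2 -> post 1)
```

Because nodes carry the post author as an attribute rather than collapsing posts into learners, per-post properties (content, function, linguistic style) remain available for analysis at this level.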

Communication structures

Communication structures, or learner-to-learner projections of posting activity (Fig. 1b), are commonly used to analyze learner networks inferred from digital trace data (Poquet et al., 2020). Communication between individuals is a central component of networked learning and connectivism theories. Connectivist settings require curating, amplifying, filtering, and guiding attention to different signals that travel through the network (Siemens, 2005). A more amplified and interconnected communication network is deemed effective for learning. Learners’ roles and their positioning within the communication structure may be critical to how the discussion unfolds across the group, and the roles played by individuals in digital settings have been studied through log data and text (Arvaja et al., 2008; Dowell & Poquet, 2021; Martínez et al., 2003).

The structure of communication networks can be captured through metrics such as graph density, reciprocity, and transitivity (Dado & Bodemer, 2017). Descriptive measures of communication structure are commonly used to describe learner interactions in online learning scenarios, but they are not always meaningful without a statistical evaluation of whether they would have been expected by chance. Most often, null models for communication networks in digital settings are implemented to generate learner-to-learner activity directly. Instead of simulating these learner-to-learner networks, Poquet et al. (2020) proposed using null models at the level of posting activity: they used learner-to-learner networks projected from random networks of posting activity to explain learner-to-learner communication patterns in digital learning settings. That is, simulated post-to-post networks can be transformed into simulated learner-to-learner network projections, which can then be evaluated against observed learner-to-learner projections to test hypotheses about network formation.
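A minimal sketch of such a projection, using invented post authors and reply edges (not data from the study), converts each post-to-post reply into a weighted learner-to-learner tie:

```python
from collections import defaultdict

# Illustrative data: which learner authored each post, and which post
# replied to which (reply id, parent id).
post_author = {1: "A", 2: "B", 3: "C", 4: "A"}
reply_edges = [(2, 1), (3, 1), (4, 2)]

# Project 'post X by learner A replies to post Y by learner B'
# into a weighted 'learner A -> learner B' tie.
ties = defaultdict(int)
for reply, parent in reply_edges:
    src, dst = post_author[reply], post_author[parent]
    if src != dst:  # ignore replies to one's own post
        ties[(src, dst)] += 1

print(dict(ties))  # {('B', 'A'): 1, ('C', 'A'): 1, ('A', 'B'): 1}
```

The same projection applies unchanged to simulated post-to-post networks, which is what makes the post-level null model comparable to the observed learner-to-learner network.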

Relational structures

Social relations between learners are also important in digital spaces. Relationships between learners underpin community formation, and theories such as the community of inquiry presuppose that affective, relational community ties underpin knowledge construction in learning cohorts (Akyol & Garrison, 2008). The biggest challenge is to establish whether any of the interpersonal text- and technology-mediated learner activity in fact constitutes a social relationship. Methodologically, social networks from digital data have been represented through learner-to-learner relations where the ties are stringently defined and elicited from the communication network, such as through direct name tagging (Gruzd, 2009), filtering based on belonging to a particular category (Poquet et al., 2017), or statistical filtering of the frequency of learner-to-learner interactions (Mukerjee et al., 2022). We follow this approach, suggesting that social structures and communication structures overlap but may not be identical. To operationalize what we here call ‘relational’ structures that evolve through online communication, we take the view that a sustained quality of interactions constitutes a relation: an evolved state between two people that is more than just information exchange. In a similar vein, the networked learning paradigm differentiates between strong and weak interpersonal ties in a learning network (Jones et al., 2008), acknowledging the importance of including both intermittent and sustained relationships in the analysis of learner communication. Figure 1c captures this view of a relational structure as a backbone of a communication network.

Bringing together posting, communication, and social network levels

We have argued that participation, communication, and formation of relational structures in discussion forums online represent three different processes that have been discussed in digital learning literature. We suggest that posting activity gives rise to communication patterns and emergent relational structures. Table 1 summarizes the levels and units of analysis for each of these processes, as well as offers examples of research questions and indicators that can be used by the analyst. To demonstrate how this framework can be applied, we use it to analyze discussion forums in twenty online and blended university courses.

Table 1 Cross-level comparison in the suggested framework

Methods

Study context

Data were collected from the Moodle platform in twenty courses offered by the same Social Sciences and Humanities department at an Australian university between August and December 2016. The courses were selected due to the high number of forum posts as per the administrative records. Courses were taught by different instructors and teaching assistants and included both blended and fully online offerings. The courses enrolled 84 students on average (SD = 111) and lasted one semester, typically 13 weeks. The average estimated weekly workload in all courses was 10 h (SD = 2). Two large assessments were typically present in most courses. In many courses, in-person attendance was not mandatory, which meant that although a course may have been listed as blended, some students experienced it as fully online whereas others may have chosen to attend occasionally. For this reason, we do not differentiate between blended and online courses, and this presents an important limitation of the insights we will present.

As part of the study, we took qualitative notes on how the instructors designed the tasks for discussion forums. In one course (D), learners were placed in fixed small groups throughout the course, with weekly assignments to be completed in these groups. In other courses (F, G, E, R, J, K, Q, S, A, T), students were requested to reflect on a reading and post their individual answers. How the instructors set this up differed: in most courses, the whole cohort was enabled to post weekly in one thematic forum (e.g., G); in other instances, the instructors set up weekly thematic forums as well as smaller group spaces where specific students were assigned to post within the thematic forums (e.g., T). In courses L and M, students were requested to respond specifically to two other students who had not yet received a response that week. In some courses (P, O, N, I, C), there were no special tasks for discussion forum activity. We used this information about the generic nature of the discussion forum tasks to contextualize the results presented in Sect. “Relevance of Multi-level Framework”.

Overall analytical approach

Our methodology examines (1) posting indicators, (2) the structure of communication patterns, and (3) emergent relational structures in twenty courses. Data were collected at the level of a course forum—a collection of discussion threads in a Moodle system. The following analytical steps were taken:

  1. Posting activity networks (post-to-post networks, Fig. 1a) were constructed. From each post-to-post network, we derived indicators of posting activity at the individual level (e.g., number of posts) and at the dyadic level (e.g., number of replies). These indicators were used to group the twenty online courses into five groups according to student activity, using principal component analysis (PCA).

  2. Posting activity networks were then used for simulating distributions from the null models of posting activity described and evaluated in Poquet, Tupikina, and Santolini (2020).

  3. Posting activity networks (both simulated and observed) were transformed into learner-to-learner networks. The observed learner-to-learner projection from each course was compared to the distribution of random learner-to-learner projections simulated for that course.

  4. Edge weights in learner-to-learner projections were filtered, retaining only edges whose weights were more than two standard deviations above the average edge weight in the simulated learner-to-learner projections.

  5. Exponential random graph modelling (ERGM) was used to analyze patterns in these newly formed, filtered networks. Hypotheses for network formation in the ERGM analyses were drawn from our previous work on social processes in MOOCs (Poquet et al., 2017; Poquet & Dawson, 2018).

We further detail each of these steps in this section.

Grouping courses around post-level activity

Networks of forum posts were constructed for each course (Fig. 2). A node in such a network is a post; if another post was added using the ‘reply’ button in Moodle, the two posts were linked. As exemplified in Fig. 2, each small network represents a discussion thread.

Fig. 2

Posting network structures in courses K, D, P, and M. Nodes are posts (not students). Connected posts took place in the same discussion thread, with the use of the reply button constituting a tie between two posts

These networks of posts, further referred to as post-networks, were used to derive indicators of posting activity. These indicators describe individual activity, such as the number of posts, as well as dyadic activity, such as the post-to-reply ratio in a course. The specific indicators computed for each course were:

  • total number of forum posts in a course;

  • percent of posts made by the instructor;

  • size of the largest discussion thread (i.e., largest component in the network of posts);

  • mean number of posts per person relative to the person’s activity in other courses that semester (mean Z-score of the number of posts a person contributed on average in other courses that semester);

  • number of discussion threads in a course (i.e., number of trees in a post-network);

  • median number of posts per person in a course;

  • mean number of posts per person in a course;

  • mean number of posts across discussion threads of a course;

  • number of replies within a course;

  • post-to-reply ratio;

  • percent of posts without a reply (i.e., isolates in a post-network);

  • average number of replies per post;

  • mean depth of a discussion thread (i.e., number of levels in the thread).
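Several of these indicators fall out directly from the post-network representation. The sketch below computes a few of them on a toy post-network with networkx; the graph and its values are illustrative, not data from the study:

```python
import networkx as nx

# Toy post-network: nodes are posts, edges point from a reply to its parent.
# Posts 5 and 6 are isolates: thread starters that received no reply.
G = nx.DiGraph()
G.add_nodes_from(range(1, 7))
G.add_edges_from([(2, 1), (3, 1), (4, 2)])

n_posts = G.number_of_nodes()
n_replies = G.number_of_edges()
n_threads = nx.number_weakly_connected_components(G)   # trees + isolated posts
largest_thread = max(len(c) for c in nx.weakly_connected_components(G))
# Isolates in the post-network, per the definition used in the text.
pct_no_reply = sum(1 for v in G
                   if G.in_degree(v) == 0 and G.out_degree(v) == 0) / n_posts

print(n_posts, n_replies, n_threads, largest_thread, round(pct_no_reply, 2))
# 6 3 3 4 0.33
```

Indicators tied to person-level aggregation (median posts per person, cross-course Z-scores) would additionally need the post-author mapping and data from the other courses.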

We then conducted principal component analysis (PCA) of these indicators of posting activity across the twenty courses (Fig. 3). The indicators of posting activity, specified above, correlated at least 0.3 with at least one other item, but not higher than 0.9, suggesting reasonable factorability. The Kaiser–Meyer–Olkin measure of sampling adequacy was 0.53, which is adequate (Field et al., 2012). Bartlett’s test of sphericity was significant (χ2 = 327.93, p < 0.0001). Initial eigenvalues suggested that three factors explained 41%, 23%, and 14% of the variance respectively, together describing 78% of the variance.
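A PCA of a course-by-indicator matrix can be sketched as follows, here with scikit-learn on synthetic data standing in for the actual 20-course indicator table (the matrix values are random placeholders, so the loadings are meaningless; only the mechanics are shown):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the 20 x 13 matrix of posting indicators
# (rows: courses, columns: indicators such as total posts, thread depth, ...).
X = rng.normal(size=(20, 13))

# Indicators are on very different scales (counts, ratios, depths),
# so standardize before PCA.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=3)
scores = pca.fit_transform(X_scaled)   # course coordinates on the components

print(scores.shape)                        # (20, 3)
print(pca.explained_variance_ratio_.shape) # (3,)
```

With the real indicator matrix, `pca.components_` would give the loadings used to interpret the components, and plotting the first two columns of `scores` would reproduce the quadrant layout of Fig. 3.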

Fig. 3

PCA results. Letters represent courses, arrows—the relationship between two principal components and posting behavior indicators

Given the relationship between the features and the factor loadings, we interpreted the two principal components as follows. The first component was interpreted as ‘individual participation’, given its correlation with indicators such as the number of posts. The second component was interpreted as ‘interactivity’ in the course, given its correlation with indicators of dyadic-level activity, such as the length of discussions or the proportion of replies to posts on the forum. Here, we use participation and interactivity in their direct meaning: these two dimensions capture the extent to which learner behaviour in the discussion forum was directed towards individual participation, towards responding to and engaging with others, or both.

Comparing communication structures

Ties in the networks of posts were permuted to create a distribution of random networks of posts for comparison with each observed network of posts. The methodology for simulation followed the approach detailed in Poquet et al. (2020). During the simulation, the number of posts made by each person was kept constant, and the out-degree in the post-network was fixed at one (each post can only have one outgoing tie, as it can only be a reply to one other post). This simple model controlled for learner activity but allowed randomness in how the posts were connected in the simulated networks. In an earlier study (Poquet et al., 2020), we showed that this null model reproduced the degree in the learner network well, i.e., the activity of a learner determined how many people they spoke to, but was not sufficient to explain clustering, i.e., exchanges between three people or in smaller groups. The R package ergm (Handcock et al., 2015) was used. Once networks were simulated, they were transformed into learner-to-learner network projections, where ‘post X by learner A to post Y by learner B’ relations were converted into ‘learner A to learner B’ ties, with the tie weight equivalent to the frequency of exchanges between each pair. For each set of networks (random and observed), network-level metrics were computed, such as the number of communities (Zeng & Yu, 2018), density, the Gini coefficient of the network degree, assortativity, transitivity, and edge weight. A Z-score for each of these network statistics was derived for the observed network relative to the values in the distribution of corresponding random networks.
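The simulation logic can be sketched in a simplified form. The toy model below keeps each reply's out-degree at one while randomizing which earlier post it attaches to, projects each simulated post-network to a learner-to-learner statistic, and derives a Z-score for the observed value. The data, the attachment rule, and the chosen statistic are illustrative simplifications, not the published null model:

```python
import random
import statistics

# Toy observed data: the author of each post and each post's parent
# (reply id -> parent post id); values are illustrative.
post_author = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B"}
observed_parents = {2: 1, 3: 1, 4: 2, 5: 3}

def project_ties(parents):
    """Number of distinct learner-to-learner ties in the projection."""
    return len({(post_author[c], post_author[p]) for c, p in parents.items()
                if post_author[c] != post_author[p]})

def simulate(parents, rng):
    """Rewire replies: each reply keeps out-degree one but attaches to a
    random earlier post (posts are assumed ordered by id)."""
    return {child: rng.choice(range(1, child)) for child in parents}

rng = random.Random(42)
sims = [project_ties(simulate(observed_parents, rng)) for _ in range(500)]

observed = project_ties(observed_parents)
z = (observed - statistics.mean(sims)) / statistics.stdev(sims)
```

In the study itself this comparison was run per course and per network statistic (density, transitivity, assortativity, and so on), always contrasting the observed projection with the distribution of projections from the simulated post-networks.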

ERGM of relational structures

To derive relational structures of learner exchanges, we filtered communication networks to obtain a less noisy underlying structure. Edges in the observed network whose weights exceeded the average edge weight in the distribution of random networks by more than two standard deviations were retained. These represented relational structures where pairs of learners were much more likely to exchange many times throughout the course. It is plausible that these pairs formed impressions of each other and experienced social presence, though the data reported here include no self-reports to verify this assumption. We then analyzed these filtered networks using ERGMs. ERGMs generate a distribution of random graphs and estimate whether the features hypothesized as critical for network formation occur beyond chance, testing multiple hypotheses as to why the observed structure has formed (Lusher et al., 2013).
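The edge-filtering step can be sketched as follows; the observed weights and the pooled simulated weights are invented numbers used only to show the thresholding rule:

```python
import numpy as np

# Observed learner-to-learner edge weights, and edge weights pooled from
# the simulated projections (all values illustrative).
observed = {("A", "B"): 9, ("B", "C"): 2, ("A", "C"): 7}
simulated_weights = np.array([1, 2, 2, 3, 1, 2, 3, 2, 1, 2])

# Retain only edges more than two standard deviations above the mean
# simulated edge weight.
threshold = simulated_weights.mean() + 2 * simulated_weights.std()
backbone = {edge: w for edge, w in observed.items() if w > threshold}

print(backbone)  # {('A', 'B'): 9, ('A', 'C'): 7}
```

The surviving edges form the filtered 'relational' network that is then passed to the ERGM stage.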

We used three hypotheses to test the formation of the relational network structures analyzed with ERGMs: the patterns of reciprocity (A to B, B to A), cyclical ties (A to B, B to C, C to A), and Simmelian ties (A to B, B to A, B to C, C to B, A to C, C to A), as shown in Fig. 4. These were selected because they were previously found in interaction networks (Joksimovic et al., 2016; Kellogg et al., 2014; Zhang et al., 2016). Our prior work showed that these features can differentiate between relational structures in MOOCs with different facilitation strategies (Poquet, 2017; Poquet et al., 2017; Poquet & Dawson, 2018).

Fig. 4

Suggested indicators of interpersonal relationships in digital learning

We briefly explain why we would expect to observe these three patterns in an online discussion forum. Initially, learners mainly respond to each other (the reciprocity pattern). Over time, some learners are more likely to co-occur in the same threads due to shared interests, activity levels, and ability, forming two-path structures (such as A to B, B to C) that may eventually close into triads whose ties represent repeated interaction online. Thus, the presence of the cyclical ties pattern was interpreted as indicating higher levels of information exchange, while the presence of the Simmelian ties pattern was interpreted as the onset of group formation processes. It is important to note that we do not interpret these patterns as equivalent to their counterparts in networks of social relationships, where Simmelian ties represent the strongest cliques. In the context of sustained online conversations, we view Simmelian ties as building blocks of equivalent importance but likely of a different socio-emotional quality. The R package ergm was used for the analysis (Handcock et al., 2015).
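While the study estimates these configurations with ERGMs in R, the corresponding motifs can also be counted descriptively. The sketch below uses networkx's triadic census on an invented filtered network; this is a descriptive count of configurations, not an ERGM estimate of their significance:

```python
import networkx as nx

# Illustrative filtered relational network: a reciprocated dyad (A, B)
# plus a fully mutual, Simmelian-like triad among D, E, F.
G = nx.DiGraph([("A", "B"), ("B", "A"),
                ("D", "E"), ("E", "D"),
                ("E", "F"), ("F", "E"),
                ("D", "F"), ("F", "D")])

census = nx.triadic_census(G)
print(census["030C"])  # cyclic triads (A->B, B->C, C->A)
print(census["300"])   # fully mutual triads (Simmelian configuration)
```

An ERGM goes further than such raw counts: it asks whether these configurations occur more often than expected given the rest of the network structure, which is why the study relies on the ergm package rather than motif counting alone.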

Results

Indicators of posting activity

We conducted a principal component analysis of indicators describing posting activity, such as the total number of posts in a course and the average number of exchanges in discussions within a course. We interpreted the two principal components describing learner posting activity as ‘individual participation’ and ‘interpersonal orientation’. The courses varied in how much individual participation and interpersonal orientation they elicited in the students, but overall they could be positioned within four quadrants (Fig. 3):

  • High Participation and High Interactivity (courses S, Q, A, F, D),

  • High Participation and Low Interactivity (courses K, E, G, R, J),

  • Low Participation and High Interactivity (courses L, T, M), and

  • Low Participation and Low Interactivity (N, C, I, B, O, P, H).
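Given scores on the two interpreted components, the quadrant assignment above amounts to binning courses by the sign of each score. A minimal Python sketch; the component scores used in the example are invented for illustration, not taken from the study:

```python
from collections import defaultdict

def group_courses(scores):
    """Bin courses into four quadrants by the sign of their scores on the
    two components: (individual participation, interpersonal orientation)."""
    groups = defaultdict(list)
    for course, (participation, interactivity) in sorted(scores.items()):
        label = (("High" if participation > 0 else "Low") + " Participation / "
                 + ("High" if interactivity > 0 else "Low") + " Interactivity")
        groups[label].append(course)
    return dict(groups)

# Hypothetical component scores for four courses:
example = {"D": (2.5, 1.8), "K": (1.1, -0.7), "M": (-0.9, 1.2), "P": (-1.3, -0.8)}
print(group_courses(example))
```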

Figure 5 presents the means of the scaled indicators used for PCA, for visualization purposes. The high participation and high interactivity group included one outlier, course D, whose indicators were much higher than those of all the other courses in this quadrant. We interpreted course D as having high participation, while the remaining courses in this quadrant (S, Q, A, F) were described as having moderate participation. Indicators used for PCA can be directly compared across the courses, as many of them represent counts of activity. For example, Fig. 5 shows that course D had a much higher number of posts and replies than any other course, whereas its percentage of posts without a reply was much lower than in many other courses.

Fig. 5

Means of the posting behavior indicators per course group. Indicators across the courses were normalized for comparability: the y-axis is scaled, with 0 representing the average across all courses; values above 0 indicate higher-than-average indicators and values below 0 indicate lower-than-average indicators

Using the information about the various choices instructors made for forum task instructions, we can contextualize these indicators of in-course participation. Course D, for example, assigned learners to fixed small groups and required them to complete a weekly task as a group. This course had the highest activity, both in individual participation and in interacting with others. Courses S, Q, A, and F placed larger groups of five to seven learners into a discussion thread weekly to discuss an open-ended question related to a reading. Courses L, T, and M invited learners to respond to two other learners on a regular basis; their forums show high interpersonal orientation but lower posting at the individual level. Courses K, E, G, R, and J had frequent graded tasks in which learners were asked to post their personal view on a weekly topic or reading. This group of courses had forums with high posting activity that was less oriented toward responding to others. In contrast, courses N, C, I, B, O, P, and H had no clear tasks for learner participation in the course forums; in some of them, instructors set up discussion spaces but provided no accompanying tasks. These courses without pre-defined pedagogical tasks or assessment-driven activity had forums with low posting activity.

Differences in communication structures

Figure 6 summarizes the results of examining communication structures in the forums of the twenty courses. Heatmap cells show descriptive statistics of the SNA measures (number of communities, density, Gini coefficient of the degree distribution, assortativity, transitivity, and edge weight) for each learner-to-learner network. Each cell reports the metric from the observed learner-to-learner projection, and the cell colour captures whether the raw metric was higher or lower than in a distribution of 20 random networks, based on the Z-score for the metric. Values more than two standard deviations below the random mean are colored blue; values more than two standard deviations above the random mean are colored red.
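The coloring rule, comparing each observed metric to its distribution over random networks, can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the Gini coefficient is computed with the standard sorted-rank formula, and all example values are invented.

```python
from statistics import mean, stdev

def gini(degrees):
    """Gini coefficient of a degree distribution (0 = perfectly equal)."""
    d = sorted(degrees)
    n, total = len(d), sum(d)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(d))
    return (2 * weighted) / (n * total) - (n + 1) / n

def flag(observed, random_values):
    """Colour rule from Fig. 6: 'red' if the observed metric is more than
    two SDs above the random mean, 'blue' if more than two SDs below,
    otherwise 'ns' (not significant)."""
    z = (observed - mean(random_values)) / stdev(random_values)
    return "red" if z > 2 else "blue" if z < -2 else "ns"

print(gini([1, 1, 1, 1]))   # equal degrees -> 0.0
print(gini([0, 0, 0, 4]))   # one node holds all ties -> 0.75
print(flag(10, [4, 5, 6]))  # -> 'red'
```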

Fig. 6

Cross-course comparisons of network metrics for learner-to-learner communication structures (Color figure online)

When examining communication structures, we found it more challenging to differentiate between the courses using communication-level indicators. All courses had a higher-than-expected frequency of interaction between pairs of learners. Some other patterns were observed. For instance, courses with higher participation activity had communication structures where subgroups of learners emerged beyond what would be expected by chance (i.e., a higher number of communities). These courses also had a higher-than-expected Gini coefficient of the degree distribution, indicating unequal communication across the network: learners did not respond to all peers equally. A higher Gini coefficient signals uneven connectivity within the communication network (as a reminder, the clustering coefficient reflects that triads are starting to form in some parts of the network). This can be interpreted as follows: when learners post individual responses without an instructional incentive to engage with others, they may respond only to those posts that spark their interest or seem relevant. In contrast, courses with high interactivity had more equal participation across learners, with lower-than-expected Gini coefficients. This can be interpreted as follows: when learners are required to respond to posts, these conversations may be less natural, so there is less discussion turn-taking between learners who self-select to engage. Hence, network clustering that captures discussion turn-taking beyond chance is less likely. If our interpretation of these results holds, they suggest a dilemma between communication structures that foster sub-group development, which could result in homophily, and discussions that amplify information flow across the entire learner cohort but may not generate genuine interest or the potential for learners to relate to one another.

Patterns around the transitivity indicators that reflect network clustering, i.e., the formation of triads, were unclear in courses with low participation and high interactivity. As learners were asked to respond to two others in such courses, triadic indicators here are likely to reflect the task and need to be interpreted with caution. Courses with low participation and low interactivity had many metrics that did not differ from what would have been expected by chance for the networks of their size.

Relational structures underpinning communication structures

We examined relational structures emergent within the communication networks. These relational structures were derived from the communication networks themselves: communication ties between students with frequencies more than two standard deviations above the mean of a distribution of random networks were retained, while all other ties were disregarded. Using ERGMs, we examined these relational structures representing frequent interactions, testing hypotheses about social network formation, such as reciprocity and closure (the formation of triads).
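The thresholding step described above can be sketched in a few lines. This is an illustrative Python sketch with invented tie weights, not the study's code:

```python
from statistics import mean, stdev

def retain_frequent_ties(tie_weights, random_tie_weights):
    """Keep only ties whose observed frequency exceeds the mean plus two
    standard deviations of tie weights drawn from random networks."""
    cutoff = mean(random_tie_weights) + 2 * stdev(random_tie_weights)
    return {tie: w for tie, w in tie_weights.items() if w > cutoff}

# Hypothetical observed tie weights and random-network weights:
observed = {("a", "b"): 5, ("b", "c"): 3, ("c", "a"): 4}
kept = retain_frequent_ties(observed, [1, 2, 3, 2, 2])  # cutoff ~ 3.41
print(kept)  # only ('a','b') and ('c','a') survive
```

The retained edge set forms the relational structure that is then passed to the ERGM.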

Three types of patterns were examined: reciprocal dyads, cyclical ties, and Simmelian ties, i.e., a pattern in which three learners all have reciprocal ties. The results (Table 2) show that all three patterns were significant and positive only in course D, the course with the highest participation and interactivity, suggesting that only in this course did learners progress to activity at the level of a triad, indicative of social processes. The finding makes sense, given that this was the only course in which small groups were asked to interact weekly. Courses with high levels of interactivity showed positive and significant patterns of cyclical ties, suggesting that communication processes in them were starting to lead to the development of exchange in triads. We interpret this pattern as indicative of a lower presence of social processes than in course D, yet higher than in the remaining courses, where cyclical ties could not be fitted or were not significant. Overall, these patterns suggest that courses with high participation and high interactivity were more likely to create conditions for relationship formation among learners. We are also cautious about interpreting courses with low participation and high interactivity as garnering more frequent learner interactions in triads, since transitivity in their communication networks could be present by design, and the log-odds for cyclical ties are mixed and low.

Table 2 Summary of ERGM outputs for relational structures in online forums

Synthesis of empirical results

Our suggested methodology demonstrates how one can examine indicators of learner activity, communication patterns, and relational structures (Table 1). Posting activity, approached as a fundamental process underlying discussion, offers useful and comparable insights across the courses. Our findings demonstrate the heterogeneity of patterns examined at all three levels. First, when it comes to posting activity, we highlight that discussion forum activities prompt learners to engage in ways that reflect both their individual participation and the extent of their engagement with others. Compared to indicators of communication networks and relational structures, posting activity indicators allow for an easier distinction and comparison between the courses. They are also the most straightforward and therefore the easiest for instructors to implement. Our analysis shows that these characteristics of learner activity in discussion forums, namely how much individual participation and interactivity they elicit from learners, further help interpret communication and relational structures.

When it comes to indicators at the level of communication networks, we show that these are more difficult to interpret and generalize. Patterns in communication structures vary, and raw metrics from communication networks are not insightful. Visualizations of communication structures are also less clear to interpret than those representing relational structures (Fig. 7). For instance, communication structures with low density describe communication in small groups or sub-groups, whereas higher density suggests more amplified information flow without network sub-groups forming. Either structure can be viewed as effective, depending on the instructor's objective. Interesting patterns describe courses with high participation and low interactivity. Their communication patterns to some extent resemble those in course D, which was designed for small-group work. However, the ‘high participation, low interactivity’ courses do not presuppose any group work, as learners are simply asked to post a reflection. One possible interpretation of why clustering structures emerge in these courses is that individually oriented activities enable interest-based discussions that move towards group formation, though further work needs to examine whether such a claim is warranted.

Fig. 7

Posting activity (left), communication structures (middle) and relational structures (right) for selected courses P, M, K, S and D as representative of different participation and interactivity levels

Relational structures show more regularity in the patterns describing their formation. Course D, the only course designed for collaborative work rather than social learning activity, shows the presence of group formation within its social structures. Results also suggest that courses that focus only on interactivity and do not promote higher levels of individual participation are not as successful at developing relational structures as courses that focus on both individual participation and interactivity between the learners.

Relevance of multi-level framework

Despite the widespread use of learner networks inferred from digital trace data, researchers face conceptual, methodological, and practical challenges in analyzing them. To address these challenges, we have suggested a framework that differentiates between three processes in online discussion threads: posting activity, communication, and the formation of relationships between learners. We have demonstrated the application of the framework to a set of courses. In this section, we explain how the framework addresses existing challenges faced by researchers who apply SNA to networks inferred from digital trace data and highlight where gaps remain. For a summary of this section, please refer to Table 3.

Table 3 Synthesis of how proposed methodology addresses existing analytical challenges

First, in terms of practical challenges, current SNA tends to focus on network-level outcomes, while analytics that capture interpersonal activities in the class should embrace multiple levels of social learning activity. To address this issue, future research needs to explore the relationship between the processes and outcomes of interpersonal activity at the individual, group, and network levels, and how these relate to learning gains. Our results suggest that there may be tensions between outcomes at the various levels of social activity. An instructor may value amplified information flow among all learners; if so, our analysis shows that encouraging individual posting activity promotes more unequal communication patterns across learners, as we see in the courses with high individual participation. It should be noted that although our proposed multi-level framework is designed to enable the analysis of outcomes at the level of participation, communication, and the formation of relational structures, our example does not include self-report or other data to triangulate our inferences about the nature and quality of these outcomes. Future work needs to integrate student perceptions of the interactions as well as the quality of exchanges and learning gains.

Second, numerous methodological challenges exist. One of them is the need to classify courses in ways that are reproducible across various situations. The multi-level framework that we propose addresses this issue by using posting activity indicators to characterize the courses. Although this is helpful in our example, the challenge of generalizing pedagogical decisions for forum activity remains. Another challenge is the need for explicit, well-argued definitions of network ties, nodes, and boundaries. Our methodology does not address this problem directly, but it can easily integrate more rigorous justifications for the definitions of posting activity, communication networks, and relational structures within the multi-level framework. Finally, current work over-relies on descriptive SNA metrics for cross-course comparisons. Our framework directly addresses this issue because both communication metrics and relational structures are analyzed statistically.

When it comes to conceptual challenges, our proposed methodology takes a step towards strengthening conceptualizations in SNA studies of digital trace data by explicitly acknowledging three distinct processes from the digital learning literature: participation, communication, and relationship formation. The multi-level framework embeds the theoretical inter-dependency between the levels. However, our application does not invoke specific theories to explain the mechanisms acting at each level, beyond student activity. This is intentional: we used a simple null model for illustrative purposes. To further overcome this challenge, future null models need to better integrate the timing of activity, its frequency, and discourse, as these mediate interpersonal interactions. Importantly, the multi-level framework is designed in such a way that the analyst must specify a generative (conceptual) model of the fundamental principles for how participation and communication emerge. The presented framework can further integrate conceptual elements related to specific theories of digital learning that are particularly relevant at the level of posting activity. Elements related to communication theories or social network theories can be integrated at their corresponding levels of analysis. In this way, the framework allows for conceptual alignment with various theoretical views.

As noted, a few important weaknesses of the framework remain. First, it does not focus on a specific learning theory, and indicators of posting activity can be expanded to better align with relevant concepts in digital learning. For instance, forum-post networks can reflect listening activity (Wise et al., 2013), or post networks can be defined through categorical definitions of ‘builds on’ or ‘overlaps with’ using semantic approaches to tie construction (Dascalu et al., 2018). Alternatively, bipartite networks that include mediating artefacts can be included at the level of posting behavior (Hoppe, 2017). Second, our use of a null model to simulate random graphs provides merely a starting point. Overall, research operationalizing digital learning theories for statistical network modelling is scarce. In Poquet and Chen (2023), we discuss several possibilities for how these models can be operationalized. In terms of empirical work, we are aware of only one study besides our own (Chen et al., 2022) that has used null models in the analysis of digital learning. Third, including content labels and discourse characteristics is necessary to understand communication patterns in digital learning. For instance, constructs or themes around social norms may be more relevant for modelling social networks, whereas linguistic indicators (Dowell et al., 2015) or text transactivity (Howley et al., 2012) could help model properties of posting network formation. Fourth, learning outcomes, such as learner perceptions and community development, can and should complement the framework to better understand and further establish the validity of the indicators. Without such triangulation, the framework we provide offers only weak evidence that it can be interpreted in the ways we have done in this paper. Finally, our methodology presents the three distinct processes as static, but they interact dynamically as a course progresses; future work needs to find ways to address these temporal aspects and shed light on the dynamics of digital interpersonal interaction.

Despite these weaknesses, the proposed methodology extends the status quo in two ways. It offers a systematic view of how to connect the various levels of interpersonal learning in ways that are coherent both conceptually and methodologically. Further, although the example we present is limited, the framework itself allows for the integration of the generative principles needed for statistical modeling and is, therefore, open to adjustment and further development for different digital learning theories.