
1 Introduction

This chapter provides a tutorial on conducting epistemic network analysis (ENA) and ordered network analysis (ONA) using R. We introduce these two techniques together because they share similar theoretical foundations, but each addresses a different challenge for analyzing large-scale qualitative data on learning processes.

ENA and ONA are methods for quantifying, visualizing, and interpreting network data. Taking coded data as input, ENA and ONA represent associations between codes in undirected or directed weighted network models, respectively. Both techniques measure the strength of association among codes and illustrate the structure of connections in network graphs, and they quantify changes in the composition and strength of those connections over time. Importantly, ENA and ONA enable comparison of networks both visually and via summary statistics, so they can be used to explore a wide range of research questions in contexts where patterns of association in coded data are hypothesized to be meaningful and where comparing those patterns across individuals or groups is important.

In the following sections, we will (1) briefly review literature relevant to the application of ENA and ONA, (2) provide a step-by-step guide to implementing ENA and ONA in R, and (3) suggest additional resources and examples for further exploration. By the end of this chapter, readers will be able to apply these techniques in their own research.

2 Literature Review

2.1 Epistemic Network Analysis (ENA)

ENA is a method for identifying and quantifying connections in coded data and representing them in undirected weighted network models [1]. Two key features differentiate ENA from other network analysis tools and multivariate analyses: (1) ENA produces summary statistics that can be used to compare the content of networks rather than just their structure; and (2) ENA network visualizations provide information that is mathematically consistent with those summary statistics, which facilitates meaningful interpretation of statistical differences [2]. These features enable researchers to analyze a wide range of phenomena in learning analytics, including complex thinking and knowledge construction [3, 4], collaborative problem solving [5, 6], socio-emotional aspects of learning [7], mentoring [8], and teacher professional development [9, 10, 11].

ENA is effective for modeling collaborative interaction because it can model individuals’ unique contributions to collaborative discourse while accounting for group context, so both individuals and groups can be analyzed in the same model. This feature is particularly valuable in collaborative learning environments, where the interactions and contributions of each individual are related and should not be treated as a series of isolated events. For example, Swiecki et al. [6] analyzed the communications of air defense warfare teams in training exercises and found that ENA was able not only to reveal differences in individual performance identified in a qualitative analysis of the collaborative discourse, but also to test those differences statistically.

2.2 Ordered Network Analysis (ONA)

Ordered Network Analysis (ONA) extends the theoretical and analytical advantages of ENA to account for the order of events by producing directed weighted networks rather than undirected models [12]. Like ENA, ONA takes coded data as input, identifies and measures connections among coded items, and visualizes the structure of connections in a metric space that enables both statistical and visual comparison of networks. However, ONA models the order in which codes appear in the data, enabling analysis of phenomena in which the order of events is hypothesized to be important.

For example, Tan et al. [12] used ONA to model the performance of military teams learning to identify, assess, and respond to potential threats detected by radar. ONA detected qualitative differences between teams in different training conditions that unordered models did not detect, and showed that those differences were statistically significant. Tan et al. [12] also argued that ONA has an advantage over methods such as Sequential Pattern Mining (SPM), which is widely used to identify frequent sequential patterns. Whereas SPM prioritizes the specific micro-sequential order of events, ONA models processes by accounting for the co-temporal order of interactions, that is, the relationship between a unit's response and what that response is responding to. Consequently, ONA is a more appropriate methodological choice for modeling processes in ill-formed problem-solving scenarios, where collaborative interactions do not follow a prescribed sequence of steps but the order of activities is still important.

ONA has also been used to analyze log data from online courses. For example, Fan et al. [13] analyzed self-regulated learning tactics employed by learners in Massive Open Online Courses (MOOCs) using ONA and process mining. The authors found that ONA provided more nuanced interpretations of learning tactics than process mining because ONA models learning tactics across four dimensions: frequency, continuity, order, and the role of specific learning actions within broader tactics.

Like ENA, ONA produces summary statistics for network comparison and mathematically consistent network visualizations that enable interpretation of statistical measures. Unlike ENA, ONA models the order in which codes appear in data, enabling researchers to investigate whether and to what extent the order of events is meaningful in a given context.

In the following sections, we provide a step-by-step guide to conducting ENA and ONA analyses in R.

3 Epistemic Network Analysis in R

In this section, we demonstrate how to conduct an ENA analysis using the rENA package. If you are not familiar with ENA as an analytic technique, we recommend that you first read Shaffer [1], Shaffer and Ruis [14], and Bowman et al. [2] to familiarize yourself with the theoretical and methodological foundations of ENA.

3.1 Install the rENA Package and Load the Library

Before installing the rENA package, be sure that you are using R version 4.1 or newer. To check your R version, type R.version in your console. To update your R version (if needed), download and install R from the official R website: https://cran.r-project.org/
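As a quick check of the version requirement above, the following minimal sketch uses only base R. The variable name r_ok is our own and is not part of the rENA workflow.

```r
# Print the full version string of the running R installation.
R.version.string

# getRversion() returns a comparable version object; TRUE means R >= 4.1.0.
r_ok <- getRversion() >= "4.1.0"
r_ok
```

If r_ok is FALSE, update R before continuing, since the native pipe |> used later in this chapter requires R 4.1 or newer.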

First, install the rENA package and then load the rENA library after installation is complete.

[Code listing: install.packages() installs the rENA package from the specified repositories, and library(rENA) then loads it into the R session.]

We also install the tma package, which is required for accessing the view() function in rENA (see Sect. 3.7.3).

[Code listing: install.packages() installs the tma package from the specified repositories, and library(tma) then loads it into the R session.]

3.2 Dataset

The dataset we will use as an example, RS.data, is included in the rENA package. Note that the RS.data file in the package is only a subset of the full dataset, and is thus intended for demonstration purposes only.

To start, assign RS.data from the rENA package to a data frame named data.

    data = rENA::RS.data

Use the head() function to preview the first three rows of the data frame and familiarize yourself with the data structure.

[Output of head(): the first three rows of data, with columns including UserName, Condition, CONFIDENCE.Pre, CONFIDENCE.Post, ActivityNumber, and others. The rows contain user names, the condition (e.g., FirstGame), timestamps, and other metadata.]

RS.data consists of discourse from RescuShell, an online learning simulation where students work as interns at a fictitious company to solve a realistic engineering design problem in a simulated work environment. Throughout the internship, students communicate with their project teams and mentors via online chat, and these chats are recorded in the “text” column. A set of qualitative codes was applied to the data in the “text” column, where a value of 0 indicates the absence of the code and a value of 1 indicates the presence of the code in a given line.

Further details about the RS.data dataset can be found in Shaffer and Arastoopour [15]. Analyses of data from RescuShell and other engineering virtual internships can be found in Arastoopour et al. [16] and Chesler et al. [17].

3.3 Construct an ENA Model

To construct an ENA model, use the ena function, which lets researchers set the parameters for their model. This function wraps two other functions (ena.accumulate.data and ena.make.set), which can be used together to achieve the same result.

In the following sections, we will demonstrate how to set each parameter and explain how different choices affect the resulting ENA model.

3.3.1 Specify Units

In ENA, units can be individuals, ideas, organizations, or any other entity whose structure of connections you want to model. To set the units parameter, specify which column(s) in the data contain the variables that identify unique units.

For this example, choose the “Condition” column and the “UserName” column to define the units. The “Condition” column has two unique values, FirstGame and SecondGame, representing novice users and relative expert users, respectively, as some students participated in RescuShell after having already completed a different engineering virtual internship. The “UserName” column includes unique user names for all students (n = 48). This way of defining the units means that ENA will construct a network for each student in each condition.

    unitCols = c("Condition", "UserName")

To verify that the units are correctly specified, subset and preview the unique values in the units columns. There are 48 units from two conditions, which means that the ENA model will produce 48 individual-level networks, one for each unit, and each unit is uniquely associated with either the novice group (FirstGame) or the relative expert group (SecondGame).

[Output of head() on the unique unit values: the first three rows all have Condition equal to FirstGame, each with a different UserName.]

3.3.2 Specify Codes

Next, specify the columns that contain the codes. Codes are concepts whose pattern of association you want to model for each unit. ENA represents codes as nodes in the networks and co-occurrences of codes as edges. Most researchers use binary coding in ENA analyses, where the values in the code columns are either 0 (indicating that the code is not present in that line) or 1 (indicating that the code is present in that line). RS.data contains six code columns, all of which will be used here.

To specify the code columns, enter the code column names in a vector.

    codeCols = c("Data", "Technical.Constraints", "Performance.Parameters", "Client.and.Consultant.Requests", "Design.Reasoning", "Collaboration")

To verify that the codes are correctly specified, preview the code columns selected.

[Output of head() on the code columns: Data, Technical.Constraints, Performance.Parameters, Client.and.Consultant.Requests, Design.Reasoning, and Collaboration.]

3.3.3 Specify Conversations

The conversation parameter determines which lines in the data can be connected. Codes in lines that are not in the same conversation cannot be connected. For example, you may want to model connections within different time segments, such as days, or different steps in a process, such as activities.

In our example, choose the “Condition”, “GroupName”, and “ActivityNumber” columns to define the conversations. These choices indicate that connections can only happen between students who were in the same condition (FirstGame or SecondGame) and on the same project team (group), and within the same activity. This definition of conversation reflects what actually happened in the simulation: in a given condition, students only interacted with those who were in the same group, and each activity occurred on a different day.

To specify the conversation parameter, enter the column names in a vector.

    conversationCols = c("Condition", "GroupName", "ActivityNumber")

To verify that the conversations are correctly specified, subset and preview the unique values in the conversation columns.

[Output of head() on the unique conversation values: the columns Condition, GroupName, and ActivityNumber, with the first three rows showing the FirstGame condition, the Electric group, and activity numbers 1, 3, and 4.]
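To make the role of the conversation columns concrete, the following minimal sketch partitions a small, hypothetical data frame (toy, with invented values) into conversations using base R only. It illustrates the idea that lines can be connected only within the same combination of conversation column values; it is not how rENA implements conversations internally.

```r
# Hypothetical toy data: four lines of chat with their conversation metadata.
toy <- data.frame(
  Condition      = c("FirstGame", "FirstGame", "FirstGame", "SecondGame"),
  GroupName      = c("1", "1", "2", "1"),
  ActivityNumber = c(1, 1, 1, 1)
)
conversationCols <- c("Condition", "GroupName", "ActivityNumber")

# Each unique combination of the conversation columns is one conversation.
conv_id <- interaction(toy[, conversationCols], drop = TRUE)

# Number of lines in each conversation; only lines 1 and 2 share a conversation.
table(conv_id)
n_conversations <- nlevels(conv_id)
n_conversations
```

Here the first two lines belong to the same conversation and could be connected, whereas the third and fourth lines each sit in their own conversation.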

3.3.4 Specify the Window

Once the conversation parameter is specified, a window method needs to be specified. Whereas the conversation parameter specifies which lines can be related, the window parameter determines which lines within the same conversation are related. The most common window method used in ENA is called a moving stanza window, which is what will be used here.

Briefly, a moving stanza window is a sliding window of fixed length that moves through a conversation to detect and accumulate code co-occurrences in recent temporal context. The lines within a designated stanza window are considered related to each other. For instance, if the moving stanza window is 7, then each line in the conversation is linked to the six preceding lines. See Siebert-Evenstone et al. [18] and Ruis et al. [19] for more detailed explanations of windows in ENA models.

Here, set the window.size.back parameter (see Footnote 1) equal to 7. Users can specify a different moving stanza window size by passing a different numerical value to the `window.size.back` parameter.

    window.size.back = 7
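The moving stanza window described above can be illustrated with a small base R sketch. This is a simplified illustration under our own assumptions, not rENA's implementation: the toy matrix codes, the code names A, B, and C, and the window size of 3 (chosen for brevity) are all hypothetical. For each line, a pair of codes is counted as connected if one code appears in the current line and the other appears anywhere in the window (the current line plus the preceding lines).

```r
# Hypothetical binary-coded conversation: 5 lines, 3 codes.
codes <- matrix(c(1, 0, 0,
                  0, 1, 0,
                  0, 0, 0,
                  0, 0, 1,
                  1, 1, 0),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))
window_size <- 3  # current line plus the two preceding lines

# One counter per unordered pair of codes.
pairs <- combn(colnames(codes), 2)
counts <- setNames(integer(ncol(pairs)),
                   paste(pairs[1, ], pairs[2, ], sep = " & "))

for (i in seq_len(nrow(codes))) {
  start <- max(1, i - window_size + 1)
  in_window <- colSums(codes[start:i, , drop = FALSE]) > 0  # codes in window
  in_line <- codes[i, ] > 0                                 # codes in line i
  for (p in seq_len(ncol(pairs))) {
    a <- pairs[1, p]
    b <- pairs[2, p]
    # Connected if one code is in the current line and the other in the window.
    connected <- (in_line[a] && in_window[b]) || (in_line[b] && in_window[a])
    counts[p] <- counts[p] + as.integer(connected)
  }
}
counts
```

With these toy values, the pair A & B is counted twice, A & C once, and B & C twice. Increasing window_size lets each line connect to more of its preceding context.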

3.3.5 Specify Groups and Rotation Method

When specifying the units, we chose a column that indicates two conditions: FirstGame (novice group) and SecondGame (relative expert group). To enable comparison of students in these two conditions, three additional parameters need to be specified: groupVar, groups, and mean.

    # the column used as the grouping variable
    groupVar = "Condition"
    # the two unique values in the Condition column
    groups = c("FirstGame", "SecondGame")
    # perform a means rotation between the two groups
    mean = TRUE

These three parameters indicate that, when building the ENA model, the first dimension will maximize the difference between the two conditions, FirstGame and SecondGame. This is achieved through mean = TRUE, which specifies that a means rotation will be performed at the dimensional reduction stage. If mean is set to FALSE, or if mean is set to TRUE but the data do not contain two distinct groups, ENA will by default use singular value decomposition (SVD) to perform the dimensional reduction. Bowman et al. [2] provide a mathematical explanation of the methods used in ENA to perform dimensional reductions.
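The intuition behind a means rotation can be sketched in a few lines of base R. This is only a geometric illustration with hypothetical two-dimensional points (pts) and group labels (grp); the actual procedure in rENA operates on the normalized adjacency vectors and is described by Bowman et al. [2]. The sketch takes the line through the two group means as the first dimension and projects every point onto it.

```r
# Hypothetical points for two groups of three units each.
pts <- rbind(c(-2,  1), c(-1, -1), c(-3, 0),   # group A
             c( 2,  1), c( 3, -1), c( 1, 0))   # group B
grp <- rep(c("A", "B"), each = 3)

# Center the points, then find the unit vector from the mean of A to the mean of B.
centered <- scale(pts, center = TRUE, scale = FALSE)
mean_a <- colMeans(centered[grp == "A", , drop = FALSE])
mean_b <- colMeans(centered[grp == "B", , drop = FALSE])
direction <- (mean_b - mean_a) / sqrt(sum((mean_b - mean_a)^2))

# Project each point onto that direction: this plays the role of the first dimension.
dim1 <- as.vector(centered %*% direction)
group_means <- tapply(dim1, grp, mean)
group_means
```

In this toy case the group means fall at -2 and 2 on the first dimension, so the difference between the groups lies entirely along that dimension.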

3.3.6 Specify Metadata

The last parameter to be specified is metadata. Metadata columns are not required to construct an ENA model, but they provide information that can be used to subset units in the resulting model.

Specify the metadata columns shown below to include data on student outcomes related to reported self-confidence before and after participating in engineering virtual internships. We will use this data to demonstrate a simple linear regression analysis that can be done using ENA outputs as predictors.

    metaCols = c("CONFIDENCE.Change", "CONFIDENCE.Pre", "CONFIDENCE.Post", "C.Change")

3.3.7 Construct a Model

Now that all the essential parameters have been specified, the ENA model can be constructed.

The ena function constructs the ENA model, and we recommend that you store the output in an object (in this case, set.ena).

[Code listing: the model is stored in set.ena by calling ena() with the arguments data, units, codes, conversation, window.size.back, metadata, groupVar, groups, and mean.]
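Putting the parameters from Sects. 3.3.1 to 3.3.6 together, the model construction might look like the following sketch. It assumes that the rENA package is installed and that ena() accepts the argument names listed above; the parameter values are the ones chosen in the preceding sections.

```r
library(rENA)

data <- rENA::RS.data

# Parameters chosen in Sects. 3.3.1 to 3.3.6.
unitCols <- c("Condition", "UserName")
codeCols <- c("Data", "Technical.Constraints", "Performance.Parameters",
              "Client.and.Consultant.Requests", "Design.Reasoning",
              "Collaboration")
conversationCols <- c("Condition", "GroupName", "ActivityNumber")
metaCols <- c("CONFIDENCE.Change", "CONFIDENCE.Pre", "CONFIDENCE.Post",
              "C.Change")

# Build the model with a moving stanza window of 7 and a means rotation
# between the two conditions.
set.ena <- ena(
  data = data,
  units = unitCols,
  codes = codeCols,
  conversation = conversationCols,
  window.size.back = 7,
  metadata = metaCols,
  groupVar = "Condition",
  groups = c("FirstGame", "SecondGame"),
  mean = TRUE
)

# One ENA point per unit is expected.
nrow(set.ena$points)
```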

As noted above, the ena helper function combines the functions ena.accumulate.data and ena.make.set. The following code will construct the same ENA model specified above using these two functions.

[Code listing: ena.accumulate.data() is called with the text data, units, conversation, metadata (optional), and codes.]
[Code listing: set.ena is then created with ena.make.set(), where the rotation and rotation.params arguments (the FirstGame and SecondGame conditions) are the equivalents of mean and groups in the ena function.]
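A sketch of this two-step construction is shown below. It assumes the rENA package is installed and uses ena.rotate.by.mean as the rotation function; the object name accum.ena is our own choice, and the optional text data argument is omitted here for brevity.

```r
library(rENA)

data <- rENA::RS.data
unitCols <- c("Condition", "UserName")
codeCols <- c("Data", "Technical.Constraints", "Performance.Parameters",
              "Client.and.Consultant.Requests", "Design.Reasoning",
              "Collaboration")
conversationCols <- c("Condition", "GroupName", "ActivityNumber")
metaCols <- c("CONFIDENCE.Change", "CONFIDENCE.Pre", "CONFIDENCE.Post",
              "C.Change")

# Step 1: accumulate code co-occurrences for each unit.
accum.ena <- ena.accumulate.data(
  units = data[, unitCols],
  conversation = data[, conversationCols],
  codes = data[, codeCols],
  metadata = data[, metaCols],  # optional
  window.size.back = 7
)

# Step 2: normalize, rotate, and project the accumulated data.
set.ena <- ena.make.set(
  enadata = accum.ena,
  rotation.by = ena.rotate.by.mean,  # counterpart of mean = TRUE in ena()
  rotation.params = list(            # counterpart of groups in ena()
    accum.ena$meta.data$Condition == "FirstGame",
    accum.ena$meta.data$Condition == "SecondGame"
  )
)
```

Splitting the construction into two steps is mainly useful when you want to inspect or reuse the accumulated data before the dimensional reduction.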

3.4 Summary of Key Model Outputs

Users can explore what is stored in the object set.ena by typing set.ena$ and selecting items from the drop-down list. Here, we briefly describe the top-level items in set.ena that are often of interest.

3.4.1 Connection Counts

Connection counts are the frequencies of the unique connections a unit made. For each unit, ENA creates a cumulative adjacency vector that contains the sums of all unique code co-occurrences for that unit across all stanza windows. Here, there are 48 units in the ENA model, so there are 48 adjacency vectors. Each term in an ENA adjacency vector represents a unique co-occurrence of codes. Because ENA models are undirected and do not model co-occurrences of a code with itself, six codes yield 15 terms per vector (6 choose 2 = 15).
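The count of 15 terms can be checked with base R. The sketch below enumerates the unordered pairs of the six codes; the label format with " & " is only our choice for display.

```r
codeCols <- c("Data", "Technical.Constraints", "Performance.Parameters",
              "Client.and.Consultant.Requests", "Design.Reasoning",
              "Collaboration")

# Number of unordered pairs of distinct codes.
n_pairs <- choose(length(codeCols), 2)
n_pairs

# Enumerate the pairs themselves, one label per pair.
pair_labels <- combn(codeCols, 2, FUN = paste, collapse = " & ")
head(pair_labels, 3)
```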

To access ENA adjacency vectors, use set.ena$connection.counts.

[Output of head() on set.ena$connection.counts: metadata columns (ENA_UNIT, Condition, UserName, CONFIDENCE.Change, CONFIDENCE.Pre, and others) followed by one column of counts for each pair of codes, such as Data and Client.and.Consultant.Requests, or Technical.Constraints and Client.and.Consultant.Requests.]

3.4.2 Line Weights

To compare networks in terms of their relative patterns of association, researchers can spherically normalize the cumulative adjacency vectors by dividing each one by its length. The resulting normalized vectors represent each unit’s relative frequencies of code co-occurrence. In other words, the sphere normalization controls for the fact that some units might have more interaction or more activities than others.
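As a minimal illustration of sphere normalization, the base R sketch below divides two hypothetical adjacency vectors (quiet_unit and talkative_unit, with invented counts) by their Euclidean lengths. The helper sphere_normalize is our own and is not an rENA function.

```r
# Divide a vector of connection counts by its Euclidean length.
sphere_normalize <- function(counts) {
  counts / sqrt(sum(counts^2))
}

# Two hypothetical units with the same pattern but different amounts of talk.
quiet_unit     <- c(3, 0, 4)
talkative_unit <- c(30, 0, 40)

quiet_weights     <- sphere_normalize(quiet_unit)
talkative_weights <- sphere_normalize(talkative_unit)

quiet_weights      # 0.6 0.0 0.8
talkative_weights  # 0.6 0.0 0.8
```

Both units end up with the same normalized vector of length 1, which is why normalized line weights reflect the relative pattern of connections rather than the volume of activity.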

Notice that in set.ena$connection.counts, the value for each unique code co-occurrence is an integer equal to or greater than 0, because these values represent the raw connection counts between each unique pair of codes. In set.ena$line.weights, those raw counts are normalized, and therefore the values fall between 0 and 1.

To access the normalized adjacency vectors, use set.ena$line.weights.

[Output of head() on set.ena$line.weights: metadata columns (ENA_UNIT, Condition, UserName, CONFIDENCE.Change, CONFIDENCE.Pre, and others) followed by one column of normalized weights for each pair of codes, such as Data and Design.Reasoning, Technical.Constraints and Design.Reasoning, or Performance.Parameters and Design.Reasoning.]

3.4.3 ENA Points

As the product of a dimensional reduction, for each unit, ENA produces an ENA point in a two-dimensional space. Since there are 48 units, ENA produces 48 ENA points.

By default, rENA visualizes ENA points on an x-y coordinate plane defined by the first two dimensions of the dimensional reduction: for a means rotation, MR1 and SVD2, and for an SVD, SVD1 and SVD2.

To access these points, use set.ena$points.

[Output of head(set.ena$points, 3): metadata columns (ENA_UNIT, Condition, UserName, CONFIDENCE.Change, CONFIDENCE.Pre, and others) followed by numeric columns giving each unit's coordinates on the dimensions of the reduced space.]

ENA points are thus summary statistics that researchers can use to conduct statistical tests, and they can also be used in subsequent analyses. For example, statistical differences between groups in the data can be tested using ENA dimension scores, and those scores can also be used in regression analyses to predict outcome variables, which we will demonstrate later.

3.4.4 Rotation Matrix

The rotation matrix used during the dimensional reduction can be accessed through set.ena$rotation. This is mostly useful when you want to construct an ENA metric space using one dataset and then project ENA points from different data into that space, as in Sect. 5.1.

[Output of head() on the rotation matrix: a codes column followed by numeric columns of loadings on the dimensions of the reduced space, beginning with MR1.]

3.4.5 Metadata

set.ena$meta.data returns a data frame that includes all the columns of the ENA set except for the columns representing code co-occurrences.

[Output of head(set.ena$meta.data, 3): the unit identifiers (condition and user name) together with the metadata columns on confidence and change scores.]

3.5 ENA Visualization

Once an ENA set is constructed, it can be visualized, which facilitates interpretation of the model. Here, we will look at the two conditions, “FirstGame” (novices) and “SecondGame” (relative experts), by plotting their mean networks.

3.5.1 Plot a Mean Network

To plot a network, use the ena.plot.network function. This function requires the network parameter (a numeric vector of line weights); here, the line weights come from set.ena$line.weights.

First, subset line weights for each of the two groups.

[Code listing: two lines of R code that subset the line weights for the FirstGame and SecondGame conditions.]

Next, calculate the mean networks for the two groups, and store the line weights as vectors.

[Code listing: two lines of R code that compute the column means of the FirstGame and SecondGame line weights.]

During plotting, use the pipe operator |> to send the output of one function into the first parameter of the next function. To distinguish the two mean networks, set the color of the FirstGame mean network to red (Fig. 1).

[Code listing: ena.plot() creates a plot titled "FirstGame mean plot", and ena.plot.network() draws the FirstGame mean network on it.]

Fig. 1 ENA mean network for FirstGame group

Then set the color of the SecondGame mean network to blue (Fig. 2).

[Code listing: ena.plot() creates a plot titled "SecondGame mean plot", and ena.plot.network() draws the SecondGame mean network on it.]

Fig. 2 ENA mean network for SecondGame group

As you can see from the two network visualizations above, their node positions are exactly the same. All ENA networks from the same model have the same node positions, which are determined by an optimization routine that attempts to place the nodes such that the centroid of each unit’s network and the location of the ENA point in the reduced space are co-located.

Because of the fixed node positions, ENA can construct a subtracted network, which enables the identification of the most salient differences between two networks. To do this, ENA subtracts the weight of each connection in one network from the corresponding weighted connection in another network, then visualizes the differences in connection strengths. Each edge is color-coded to indicate which of the two networks contains the stronger connection, and the thickness and saturation of the edges correspond to the magnitude of the difference.
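The logic of a subtracted network can be illustrated with a toy example in base R. The edge names and weights below are hypothetical, and the sketch only mimics the arithmetic described above (sign for color, absolute value for thickness); it does not use rENA.

```r
# Hypothetical mean line weights for three edges in two networks.
edges          <- c("A & B", "A & C", "B & C")
first_weights  <- c(0.50, 0.20, 0.30)
second_weights <- c(0.20, 0.20, 0.60)

difference <- first_weights - second_weights

subtracted <- data.frame(
  edge = edges,
  magnitude = abs(difference),  # drives edge thickness and saturation
  stronger_in = ifelse(difference > 0, "first",
                       ifelse(difference < 0, "second", "neither"))
)
subtracted
```

The edge A & C has the same weight in both networks, so its difference is zero and it would not appear in the subtracted plot even though both networks contain that connection.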

To plot a subtracted network, first calculate the subtracted network line weights by subtracting one group’s line weights from the other. (Because ENA computes the absolute values of the differences in edge weights, the order of the two networks in the subtraction doesn’t matter.)

    subtracted.mean = first.game.mean - second.game.mean

Then, use the ena.plot function to plot the subtracted network. If the differences are relatively small, a multiplier can be applied to rescale the line weights, improving legibility (Fig. 3).

[Code listing: a call to ena.plot() with a title indicating the subtracted network of FirstGame (red) and SecondGame (blue).]

Fig. 3 ENA subtracted mean network for FirstGame (red) and SecondGame (blue)

Here, the subtracted network shows that on average, students in the FirstGame condition (red) made more connections with Technical.Constraints and Collaboration than students in the SecondGame condition (blue), while students in the SecondGame condition made more connections with Design.Reasoning and Performance.Parameters than students in the FirstGame condition. This is because students with more experience of engineering design practices did not need to spend as much time and effort managing the collaborative process and learning about the basic technical elements of the problem space, and instead spent relatively more time focusing on more complex analysis and design reasoning tasks.

Note that this subtracted network shows no connection between Technical.Constraints and Design.Reasoning, simply because the strength of this connection was similar in both conditions. Thus, subtraction networks should always be visualized along with the two networks being subtracted.
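The plotting steps in this subsection can be sketched end to end as follows. The sketch assumes the rENA package is installed and rebuilds the model so that it runs on its own. The object names plot.first, plot.second, and plot.subtracted and the multiplier of 5 are our own illustrative choices, not values prescribed by the chapter.

```r
library(rENA)

# Rebuild the model from Sect. 3.3 (metadata omitted for brevity).
set.ena <- ena(
  data = rENA::RS.data,
  units = c("Condition", "UserName"),
  codes = c("Data", "Technical.Constraints", "Performance.Parameters",
            "Client.and.Consultant.Requests", "Design.Reasoning",
            "Collaboration"),
  conversation = c("Condition", "GroupName", "ActivityNumber"),
  window.size.back = 7,
  groupVar = "Condition",
  groups = c("FirstGame", "SecondGame"),
  mean = TRUE
)

# Subset line weights by condition and average them into mean networks.
first.game.lineweights <- as.matrix(set.ena$line.weights$Condition$FirstGame)
second.game.lineweights <- as.matrix(set.ena$line.weights$Condition$SecondGame)
first.game.mean <- as.vector(colMeans(first.game.lineweights))
second.game.mean <- as.vector(colMeans(second.game.lineweights))
subtracted.mean <- first.game.mean - second.game.mean

# Mean network for each condition.
plot.first <- ena.plot(set.ena, title = "FirstGame mean plot") |>
  ena.plot.network(network = first.game.mean, colors = c("red"))
plot.second <- ena.plot(set.ena, title = "SecondGame mean plot") |>
  ena.plot.network(network = second.game.mean, colors = c("blue"))

# Subtracted network; the multiplier (assumed here) rescales small differences.
plot.subtracted <- ena.plot(set.ena,
                            title = "Subtracted: FirstGame (red) - SecondGame (blue)") |>
  ena.plot.network(network = subtracted.mean * 5, colors = c("red", "blue"))

# In an interactive session, plot.subtracted$plot displays the figure.
```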

3.5.2 Plot a Mean Network and its Points

The ENA point or points associated with a network or mean network can also be visualized.

To visualize the points associated with each of the mean networks plotted above, use set.ena$points to subset the rows that are in each condition and plot each condition as a different color.

[Code listing: two lines of R code that subset the rotated points for the FirstGame and SecondGame conditions.]

Then, plot the FirstGame mean network as above using ena.plot.network, use |> to pipe in the FirstGame points that we want to include, and plot them using ena.plot.points.

Each point in the space is the ENA point for a given unit. The red and blue squares on the x-axis are the means of the ENA points for each condition, along with the 95% confidence interval on each dimension. (You may need to zoom in to see them clearly.)

Since we used a means rotation to construct the ENA model, the resulting space highlights the differences between FirstGame and SecondGame by constructing a rotation that places the means of each condition as close as possible to the x-axis of the space and maximizes the distance between them (Figs. 4, 5, 6, 7, and 8).

[Code listing: a plot of the FirstGame points, their mean point, and a confidence interval box, all in red.]
[Code listing: the same plot with the FirstGame mean network added.]

Fig. 4 FirstGame ENA points (dots), mean point (square), and confidence interval (box)

Fig. 5 FirstGame ENA mean network and points

Fig. 6 SecondGame ENA points (dots), mean point (square), and confidence interval (box)

Fig. 7 SecondGame ENA mean network and points

Fig. 8 ENA subtracted mean network for FirstGame (red) and SecondGame (blue)

Then, do the same for the SecondGame condition.

[Code listing: a plot of the SecondGame points, their mean point, and a confidence interval box, all in blue.]
[Code listing: the same plot with the SecondGame mean network added.]

Lastly, do the same for subtraction as well.

[Code listing: a plot of the subtracted mean network together with the points of both conditions, color-coded for FirstGame and SecondGame, with confidence interval boxes for both sets of points.]

Note that the majority of the red points (FirstGame) are located on the left side of the space, and the blue points (SecondGame) are mostly located on the right side of the space. This is consistent with the line weights distribution in the mean network: the FirstGame units make relatively more connections with nodes on the left side of the space, while the SecondGame units make relatively more connections with nodes on the right side of the space. The positions of the nodes enable interpretation of the dimensions, and thus interpretation of the locations of the ENA points.
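A sketch of the point plots for both conditions is given below. It assumes the rENA package is installed, that ena.plot() accepts scale.to = "points", and that ena.plot.group() accepts confidence.interval = "box"; the object name plot.points and the plot title are our own.

```r
library(rENA)

# Rebuild the model from Sect. 3.3 (metadata omitted for brevity).
set.ena <- ena(
  data = rENA::RS.data,
  units = c("Condition", "UserName"),
  codes = c("Data", "Technical.Constraints", "Performance.Parameters",
            "Client.and.Consultant.Requests", "Design.Reasoning",
            "Collaboration"),
  conversation = c("Condition", "GroupName", "ActivityNumber"),
  window.size.back = 7,
  groupVar = "Condition",
  groups = c("FirstGame", "SecondGame"),
  mean = TRUE
)

# Subset the ENA points for each condition.
first.game.points <- as.matrix(set.ena$points$Condition$FirstGame)
second.game.points <- as.matrix(set.ena$points$Condition$SecondGame)

# Plot both sets of points with their means and confidence interval boxes.
plot.points <- ena.plot(set.ena, scale.to = "points",
                        title = "FirstGame (red) and SecondGame (blue) points") |>
  ena.plot.points(points = first.game.points, colors = c("red")) |>
  ena.plot.group(points = first.game.points, colors = c("red"),
                 confidence.interval = "box") |>
  ena.plot.points(points = second.game.points, colors = c("blue")) |>
  ena.plot.group(points = second.game.points, colors = c("blue"),
                 confidence.interval = "box")

# In an interactive session, plot.points$plot displays the figure.
```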

3.5.3 Plot an Individual Unit Network and its Point

Plotting the network and ENA point for a single unit uses the same approach. First, subset the line weights and point for a given unit.

A set of commands extracts specific line weights and ENA points from the ENA set object and stores them in matrix format for further analysis or processing.

Then, plot the network and point for that unit (Fig. 9).

A code snippet generates a plot focusing on the individual network analysis for the First Game Steven Z unit. It includes visualizations of line weights and ENA points.
Fig. 9
A dot plot with a nodal map titled Individual Network plots the dot on the negative x-axis near the origin. Technical constraints from the second quadrant connect with design reasoning in the first, performance parameters in the fourth, and data, collaboration, and client and consultant requests in the third quadrant.

ENA network for a student from FirstGame and its corresponding ENA point

Following the exact same procedure, we can, for example, choose a unit from the other condition to plot and also construct a subtracted plot for those two units (Fig. 10).

A code snippet prepares and visualizes individual network analysis for the Second Game Samuel O unit. It generates plots of line weights and ENA points, allowing for the examination of the structure and connections within the individual network of this specific unit.
Fig. 10
A dot plot with a nodal map titled Individual Network plots the dot on the positive x-axis near the origin. Technical constraints from the second quadrant connect with design reasoning in the first, performance parameters in the fourth, and data, collaboration, and client and consultant requests in the third quadrant.

ENA network for a student from SecondGame and its corresponding ENA point

To visually analyze the differences between the two individual networks, plot their subtracted network (Fig. 11).

Fig. 11
A dot plot with a nodal map titled Subtracted Network. The plots for First Game Steven and Second Game Samuel are on the negative and positive x-axes, respectively. The second game connects design reasoning with performance parameters and client and consultant requests. The first game connects the rest.

ENA subtracted network showing the differences between one student from FirstGame (red) and another student from SecondGame (blue)

A code snippet generates a plot that compares the subtracted network between the First Game Steven Z and the Second Game Samuel O units. It includes visualizations of subtracted networks and ENA points for both units.

In this unit-level subtracted network, Unit A (red) made relatively more connections with codes such as Technical.Constraints, Data, and Collaboration, while Unit B (blue) made relatively more connections with Design.Reasoning and Performance.Parameters.
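The logic of a subtracted network can be sketched in a few lines of base R: subtract one unit's line weights from the other's, and let the sign of each difference decide which unit's color the edge takes. The weights below are invented for illustration, and the names unit_a, unit_b, and diff_w are hypothetical.

```r
# Three codes and their pairwise connections (labels only)
codes <- c("Technical.Constraints", "Data", "Design.Reasoning")
pairs <- combn(codes, 2, paste, collapse = " & ")

# Made-up line weights for two units, one value per code pair
unit_a <- c(0.60, 0.20, 0.10)
unit_b <- c(0.30, 0.25, 0.45)
names(unit_a) <- names(unit_b) <- pairs

# Subtract the weights; the sign shows which unit has the stronger connection
diff_w   <- unit_a - unit_b
stronger <- ifelse(diff_w > 0, "Unit A (red)",
                   ifelse(diff_w < 0, "Unit B (blue)", "tie"))

data.frame(difference = diff_w, stronger = stronger)
```

The absolute value of each difference corresponds to the thickness of the edge in the subtracted plot.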

3.5.4 Plot Everything, Everywhere, All at Once

The helper function ena.plotter enables users to plot points, means, and networks for each condition at the same time. This gives the same results as above more parsimoniously. However, this approach does not enable customization of edge and point colors.

A code snippet uses a helper function to generate several types of plots from the epistemic network analysis data, with options to include points, means, and networks in the plots. It specifies grouping variables and groups for subsetting the data and performs subtraction operations on the network data. It then prints the generated plots.

3.6 Compare Groups Statistically

In addition to visual comparison of networks, ENA points can be analyzed statistically. For example, here we might test whether the patterns of association in one condition are significantly different from those in the other condition.

To demonstrate both parametric and non-parametric approaches to this question, the examples below use a two-sample t test (Welch's version, which does not assume equal variances) and a Mann-Whitney U test to test for differences between the FirstGame and SecondGame conditions. For more on differences between parametric and non-parametric tests, see Kaur and Kumar [20].

First, install the lsr package to enable calculation of effect size (Cohen’s d) for the t test.

install.packages('lsr')
library(lsr)

Then, subset the points to test for differences between the points of the two conditions.

A code snippet extracts specific data points from the ENA dataset for different experimental conditions and dimensions for the First and Second Games. The extracted data points are stored in separate variables for further analysis or visualization.

Conduct the t test on the first and second dimensions.

A code snippet performs two Welch's two-sample t-tests, one for each dimension of the data points, to compare the means between the first and second game conditions. The results include the t-statistic, degrees of freedom, p-value, confidence interval, and sample means for each dimension.
A partial code snippet presents the 95% confidence interval for the difference in means between two groups or conditions, along with the sample estimates of the mean for each group.

Compute any other statistics that may be of interest. A few examples are given below.

A code snippet calculates the mean, standard deviation, and length of the data points for each dimension and condition. It provides insight into the central tendency, variability, and sample size of the data.
A code provides insight into the effect size of the differences between the means of data points for each dimension and condition. A larger Cohen's d value indicates a larger effect size, while a value close to zero suggests a smaller effect size or no meaningful difference between the means.

Here, along the x axis (MR1), a two-sample t test assuming unequal variance shows that the FirstGame condition (mean = −0.09, SD = 0.11, N = 26) is statistically significantly different at alpha = 0.05 from the SecondGame condition (mean = 0.11, SD = 0.10, N = 22; t(45.31) = −6.52, p < 0.001, Cohen’s d = 1.88). Along the y axis (SVD2), a two-sample t test assuming unequal variance shows that the FirstGame condition (mean = 0.11, SD = 0.13, N = 26) is not statistically significantly different at alpha = 0.05 from the SecondGame condition (mean = 0.00, SD = 1.3, N = 22; t(43.17) = 0, p = 1.00).
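For readers who want to see how these statistics fit together, here is a minimal base R sketch that runs Welch's t test and computes Cohen's d from a pooled standard deviation. The points are simulated (not the RS.data values), the variable names are hypothetical, and the pooled-SD formula is a common definition of d that we assume matches the default used by lsr.

```r
# Simulated first-dimension points for two conditions (illustrative values only)
set.seed(42)
first  <- rnorm(26, mean = -0.09, sd = 0.11)
second <- rnorm(22, mean =  0.11, sd = 0.10)

# t.test() uses var.equal = FALSE by default, i.e., Welch's t test
welch <- t.test(first, second)

# Cohen's d based on the pooled standard deviation
n1 <- length(first)
n2 <- length(second)
pooled_sd <- sqrt(((n1 - 1) * var(first) + (n2 - 1) * var(second)) / (n1 + n2 - 2))
cohens_d  <- abs(mean(first) - mean(second)) / pooled_sd

welch
cohens_d
```

Because Welch's test adjusts the degrees of freedom for unequal variances, the reported degrees of freedom are usually not a whole number.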

The Mann-Whitney U test is a non-parametric alternative to the independent two-sample t test.

First, install the rcompanion package to calculate the effect size (r) for a Mann-Whitney U test.

install.packages('rcompanion')
library(rcompanion)

Then, conduct a Mann-Whitney U test on the first and second dimensions.

A code performs Wilcoxon rank sum tests to assess whether there is a statistically significant difference between the distributions of data points from different conditions for each dimension. The p-values provide evidence for or against the null hypothesis of no difference in distributions.

Compute any other statistics that may be of interest. A few examples are given below.

A set of codes provides information about the median, length, and Wilcoxon R statistic for the differences between the data points of two conditions for each dimension.

Here, along the x axis (MR1), a Mann-Whitney U test shows that the FirstGame condition (Mdn = −0.08, N = 26) is statistically significantly different at alpha = 0.05 from the SecondGame condition (Mdn = −0.007, N = 22; U = 50, p < 0.001, r = 0.86). Along the y axis (SVD2), a Mann-Whitney U test shows that the FirstGame condition (Mdn = 0.13, N = 26) is not statistically significantly different at alpha = 0.05 from the SecondGame condition (Mdn = 0.00, N = 22; U = 287, p = 0.99). The absolute value of r in a Mann-Whitney U test ranges from 0 to close to 1. Thresholds commonly used in the published literature to interpret r are 0.10 to < 0.30 (small effect), 0.30 to < 0.50 (moderate effect), and ≥ 0.50 (large effect).
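The relationship between U and the effect size r can be sketched in base R. The example below uses simulated points and the normal approximation r = |Z| / sqrt(N) without a tie correction; this is an assumption about how r is defined, and the value reported by rcompanion may differ slightly. All variable names are hypothetical.

```r
# Simulated first-dimension points for two conditions (illustrative values only)
set.seed(7)
first  <- rnorm(26, mean = -0.09, sd = 0.11)
second <- rnorm(22, mean =  0.11, sd = 0.10)
n1 <- length(first)
n2 <- length(second)

# Mann-Whitney U test (called the Wilcoxon rank sum test in R)
mw <- wilcox.test(first, second)
U  <- unname(mw$statistic)

# Normal approximation to U (continuous data, so no tie correction)
mu_U    <- n1 * n2 / 2
sigma_U <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z       <- (U - mu_U) / sigma_U

# Effect size r and a label based on the thresholds given in the text
r <- abs(z) / sqrt(n1 + n2)
effect_label <- cut(r, breaks = c(0, 0.1, 0.3, 0.5, Inf),
                    labels = c("negligible", "small", "moderate", "large"),
                    right = FALSE)
r
effect_label
```

Under this definition, r cannot reach 1 exactly, because |Z| is bounded for fixed group sizes. This is why the text describes r as ranging from 0 to close to 1.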

3.7 Model Evaluation

In this section, we introduce three ways users can evaluate the quality of their ENA models.

3.7.1 Variance Explained

Briefly, variance explained (also called explained variation) refers to the proportion of the total variance in a dataset that is accounted for by a statistical model or set of predictors.

To represent high-dimensional vectors in a two-dimensional space, ENA uses either singular value decomposition (SVD) alone or a means rotation combined with SVD. For each of the reduced dimensions, the proportion of variance in patterns of association among units that the dimension explains can be computed.

A code snippet displays the variance estimates for the first two model components, which provide insights into the amount of variance explained by each component in the epistemic network analysis model.

Here, the first dimension is MR1 and the second dimension is SVD2. The MR1 dimension has the highest variance explained at 32%.

As with any statistical model, greater explained variance does not necessarily indicate a better model, as it may be due to overfitting, but it provides one indicator of model quality.
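As a rough illustration of where such percentages come from, the base R sketch below computes the variance explained by each SVD dimension of a centered matrix. The matrix is random (48 hypothetical units by 15 hypothetical code pairs), so the resulting proportions have no substantive meaning.

```r
# Random stand-in for normalized adjacency vectors: 48 units x 15 code pairs
set.seed(3)
X <- matrix(rnorm(48 * 15), nrow = 48)

# Center each column, then take the singular values
Xc <- scale(X, center = TRUE, scale = FALSE)
d  <- svd(Xc)$d

# Proportion of total variance explained by each dimension
var_explained <- d^2 / sum(d^2)
round(var_explained[1:2], 3)
```

The proportions sum to 1 across all dimensions, and SVD orders them from largest to smallest.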

3.7.2 Goodness of Fit

Briefly, a model’s goodness of fit refers to how well a model fits or represents the data. A model with a high goodness of fit indicates that it accurately represents the data and can make reliable predictions.

In ENA, a good fit means that the positions of the nodes in the space—and thus the network visualizations—are consistent with the mathematical properties of the model. In other words, we can confidently rely on the network visualizations to interpret the ENA model. The process that ENA uses to achieve high goodness of fit is called co-registration. The mathematical details of co-registration are beyond the scope of this chapter and can be found in Bowman et al. [2].

To test a model’s goodness of fit, use ena.correlations. The closer the value is to 1, the higher the model’s goodness of fit is. Most ENA models have a goodness of fit that is well above 0.90.

A code snippet computes and presents correlations between different model components using the Pearson and Spearman correlation coefficients, providing insights into the relationships between these components within the epistemic network analysis.
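To give an intuition for this statistic, the sketch below mimics the idea in base R under a stated assumption: that goodness of fit compares, on a given dimension, the pairwise differences between ENA points with the pairwise differences between the corresponding network centroids. The coordinates are simulated and the variable names are hypothetical; this is not the rENA implementation.

```r
set.seed(11)
n <- 20

# Simulated coordinates on one dimension (illustrative values only)
points_x    <- rnorm(n, sd = 0.2)               # ENA points
centroids_x <- points_x + rnorm(n, sd = 0.02)   # centroids, close to the points

# All pairs of units
idx <- combn(n, 2)
point_diff    <- points_x[idx[1, ]]    - points_x[idx[2, ]]
centroid_diff <- centroids_x[idx[1, ]] - centroids_x[idx[2, ]]

# Correlate the two sets of pairwise differences
pearson  <- cor(point_diff, centroid_diff, method = "pearson")
spearman <- cor(point_diff, centroid_diff, method = "spearman")
c(pearson = pearson, spearman = spearman)
```

When the centroids track the points closely, as they do here by construction, both correlations are near 1.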

3.7.3 Close the Interpretative Loop

Another approach to evaluating an ENA model is to confirm the alignment between the quantitative model (in our case, the ENA model) and the original qualitative data. In other words, we can return to the original data to confirm that the quantitative findings give a fair representation of the data. In the field of Quantitative Ethnography, this approach is called closing the interpretative loop [1].

For example, based on our visual analysis of the network of “SecondGame.samuel o” in the previous section, we want to know which lines in the original data contributed to the connection between Design.Reasoning and Performance.Parameters.

Let’s first review what the ENA network for “SecondGame.samuel o” looks like (Fig. 12).

A code snippet generates plots for the individual network and points associated with the Second Game Samuel O unit in ENA, with the plotted network.
Fig. 12
A dot plot with a nodal map titled individual network with the dot plotted below the positive x-axis. Technical constraints from the second quadrant connect with design reasoning in the first, performance parameters in the fourth, and data, collaboration, and client and consultant requests in the third quadrant.

ENA network for Samuel O from SecondGame

To do so, use the view() function and specify the required parameters.

This opens a window in your Viewer panel. If it is too small to read, you can click on the “Show in new window” button to view it in your browser for better readability.

In the Viewer panel, hover your cursor over any of the lines in bold. A rectangle spanning 7 lines appears, showing that in a moving stanza window of size 7, the window contains the referent line (the line in bold) and its preceding 6 lines. The 1s and 0s in the Technical.Constraints and Design.Reasoning columns show where the connections happened (Fig. 13).

Fig. 13
A table lists columns for Q E I D, condition, user name, group name, activity number, and text.

A screenshot of the view() function result. The highlighted lines represent lines within the same stanza window
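The highlighted rectangle follows a simple rule that can be written as a small base R helper. The function stanza_window and its first_line argument are hypothetical names introduced here for illustration; first_line stands for the first line of the conversation, since a window cannot reach back past it.

```r
# Line numbers covered by a moving stanza window that ends at the referent line
stanza_window <- function(referent, window_size = 7, first_line = 1) {
  seq(max(first_line, referent - window_size + 1), referent)
}

stanza_window(2477)   # the referent line plus its 6 preceding lines: 2471 to 2477
stanza_window(3)      # near the start of a conversation the window is shorter: 1 to 3
```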

For example, in line 2477, Samuel shared his [Design.Reasoning] about being “mindful of (the) how one device scores relative to other ones”, referring back to what Casey said in line 2476 about [Performance.Parameters] (“not one source/censor can be the best in every area so we had to sacrifice certain attributes”), as well as what Jackson said in line 2475 about safety as one of the [Performance.Parameters] (“when it came to the different attributes, i think that all were important in their own way but i think safety is one of the most important”).

This is a qualitative example of a connection made between Performance.Parameters and Design.Reasoning.

3.8 Using ENA Model Outputs in Other Analyses

It is often useful to use the outputs of ENA models in subsequent analyses. The most commonly used outputs are the ENA points, i.e., set$points. For example, we can use a linear regression analysis to test whether ENA points on the first two dimensions are predictive of an outcome variable, in this case, change in confidence in engineering skills.

A code snippet performs linear regression analysis to examine the relationship between the CONFIDENCE change variable and predictor variables using the ENA data. The summary output provides insights into the model fit and significance of the predictor variables.
A summary output provides information about the coefficients, significance levels, goodness-of-fit measures, and overall significance of a linear regression model.

The results of this regression analysis show that ENA points are not a significant predictor of the students’ pre-post change in confidence (MR1: t = −0.53, p = 0.60; SVD2: t = 0.46, p = 0.65; Condition: t = −1.36, p = 0.18). The overall model was also not significant (F(3, 43) = 2.01, p = 0.13) with an adjusted r-squared value of 0.06.

Recall that the dataset we are using is a small subset of the full RS.data, and thus results that are significant for the whole dataset may not be for this sample.
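The general shape of such a regression can be sketched in base R as follows. The data frame is simulated and the column name confidence_change is a hypothetical stand-in for the outcome variable, so the coefficients carry no substantive meaning.

```r
# Simulated ENA points, condition labels, and an outcome (illustrative only)
set.seed(5)
n <- 48
d <- data.frame(
  MR1               = rnorm(n, sd = 0.15),
  SVD2              = rnorm(n, sd = 0.13),
  Condition         = factor(rep(c("FirstGame", "SecondGame"), times = c(26, 22))),
  confidence_change = rnorm(n)
)

# Regress the outcome on the two ENA dimensions and the condition
fit <- lm(confidence_change ~ MR1 + SVD2 + Condition, data = d)
summary(fit)$coefficients
```

The coefficient table has one row for the intercept and one for each predictor. The t and p values for MR1 and SVD2 are the quantities reported in the text above.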

4 Ordered Network Analysis with R

This section demonstrates how to conduct an ONA analysis using the ona R package. If you are new to ONA as an analytic technique, Tan et al. [12] provide a more detailed explication of its theoretical and methodological foundations.

Because ONA shares some conceptual and procedural similarities with ENA, you may also want to read the recommended papers from the ENA section [1, 2, 14].

4.1 Install the ONA Package and Load the Library

Install the ona package and load the ona library after installing.

A snippet of an R code installs the ona package from a specified repository and loads the library.

Then, install the other package that is required for ONA analysis.

A snippet of R code installs the tma package from a specified repository and loads the library.

4.2 Dataset

(Refer to Sect. 3.2 for a detailed description of the dataset used here.)

Load the RS.data dataset.

data = ona::RS.data

4.3 Construct an ONA Model

To construct an ONA model, identify which columns in the data to use for the parameters required by the ONA modeling function. The parameters are defined identically in both ENA and ONA; see Sect. 3.3 for detailed explanations.

4.3.1 Specify Units

Select the units as in Sect. 3.3.1.

my_units <- c("Condition", "UserName")

4.3.2 Specify Codes

Select the codes as in Sect. 3.3.2.

A code snippet assigns to the variable my_codes a vector of six strings: Data, Technical.Constraints, Performance.Parameters, Client.and.Consultant.Requests, Design.Reasoning, and Collaboration.

4.3.3 Specify Conversations

The parameter to specify conversations in rENA is called “conversation”; in ONA, the equivalent is called “my_hoo_rules”, where “hoo” is an abbreviation of “horizon of observation.”

Choose the combination of “Condition” column, “GroupName” column, and “ActivityNumber” column to define the conversation parameter.

The syntax to specify conversations using my_hoo_rules in ONA is slightly different from the syntax to specify conversation in ENA, but the conceptual definition is the same.

A code snippet assigns to the variable my_hoo_rules a value from conversation_rules, based on conditions involving UNIT$Condition, UNIT$GroupName, and UNIT$ActivityNumber.

4.3.4 Specify the Window

Specify a moving stanza window size by passing a numerical value to the window_size parameter.

window_size = 7

4.3.5 Specify Metadata

As in ENA, metadata columns can be included if desired. Metadata columns are not required to construct an ONA model, but they provide information that can be used to subset units in the resulting model.

A code snippet reads as follows. Meta cols equals c left parenthesis double quotes confidence dot change double quotes comma double quotes confidence dot pre double quotes comma double quotes confidence dot post comma double quotes dot change double quotes right parenthesis.

4.3.6 Accumulate Connections

Now that all the parameters are specified, connections can be accumulated. For each unit, the ONA algorithm uses a moving stanza window to identify connections formed from a current line of data (e.g., a turn of talk), or response, to the preceding lines within the window (the ground).

Unlike in ENA, where connections among codes are recorded in a symmetric adjacency matrix, ONA accounts for the order in which the connections occur by constructing an asymmetric adjacency matrix for each unit; that is, the number of connections from code A to code B may be different than the number of connections from B to A.
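The following base R sketch illustrates how directed counting produces an asymmetric matrix. It is a deliberately simplified stand-in for the ONA accumulation, not the ona implementation: it counts at most one connection per code pair per response line and ignores details such as conversation boundaries. The codes A, B, and C and the four coded lines are invented.

```r
# Four coded lines of talk; each row marks which codes occur in that line
codes <- c("A", "B", "C")
lines <- matrix(c(1, 0, 0,
                  0, 1, 0,
                  0, 1, 1,
                  1, 0, 0),
                ncol = 3, byrow = TRUE, dimnames = list(NULL, codes))
window_size <- 3

# Rows are the ground (what was responded to); columns are the response
adj <- matrix(0, 3, 3, dimnames = list(from = codes, to = codes))

for (i in seq_len(nrow(lines))[-1]) {
  ground_rows <- seq(max(1, i - window_size + 1), i - 1)
  ground   <- colSums(lines[ground_rows, , drop = FALSE]) > 0  # codes in preceding lines
  response <- lines[i, ] > 0                                   # codes in the current line
  adj <- adj + outer(ground, response, "&")                    # add ground -> response links
}

adj
```

In this toy example, the count from A to B is 2 while the count from B to A is 1, and B also has a self-connection. A symmetric ENA-style count would not preserve either piece of information.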

To accumulate connections, pass the parameters specified to the contexts and accumulate_contexts functions, and store the output in an object (in this case, accum.ona).

A snippet of R code accumulates connections for an ONA model. It includes parameters and functions such as contexts, units_by, hoo_rules, accumulate_contexts, decay.function, and meta.data.

4.3.7 Construct an ONA Model

After accumulation, call the model function to construct an ONA model. ONA currently implements singular value decomposition (SVD) and means rotation (MR) to perform dimensional reduction.

To create an ONA model using SVD, pass the accum.ona object to the model function.

set.ona <- model(accum.ona)

When there are two discrete groups to compare, a means rotation can be used, as described in Sect. 3.3.5.

A means rotation is specified using rotate.using = "mean" in the model function. Additionally, the model function expects rotation.params to be a list with two named elements, each containing a logical vector representing the rows of units to be included in each group.

Here, construct the ONA model as shown below.

A code snippet constructs the ONA model with a means rotation, setting conditions for the two games. The parameters used include rotate.using and rotation.params.
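The two logical vectors that rotation.params expects can be built with ordinary comparisons. The sketch below shows the idea on a small made-up metadata table; units_meta and my_rotation_params are hypothetical names, and in a real analysis the comparisons would be made on the metadata of the accumulated object.

```r
# Made-up unit metadata: one row per unit
units_meta <- data.frame(
  Condition = c("FirstGame", "SecondGame", "FirstGame", "SecondGame", "FirstGame"),
  UserName  = c("u1", "u2", "u3", "u4", "u5")
)

# One named logical vector per group; TRUE marks the rows that belong to the group
my_rotation_params <- list(
  FirstGame  = units_meta$Condition == "FirstGame",
  SecondGame = units_meta$Condition == "SecondGame"
)

str(my_rotation_params)
```

Each vector has one element per unit, and no unit is TRUE in both groups.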

4.4 Summary of Key Model Outputs

Information about an ONA model is now stored in the R object set.ona.

As in rENA, users can explore the data stored in the object by typing set.ona$ and selecting items from the drop-down list. Here, we briefly explain the top-level items in set.ona$.

4.4.1 Connection Counts

Because ONA accounts for the order in which the connections occur by constructing an asymmetric adjacency matrix for each unit, connection counts from code A to code B and from B to A, as well as self-connections for each code (from A to A) are recorded. Thus, because six codes were included in the model, the cumulative adjacency vector for each unit contains 36 terms (n^2).

A code snippet and its output. The code accesses set.ona$connection.counts, and the output is a table with columns for condition, username, ENA_UNIT, and data to data, with numerical values.
A code snippet with comments. The comments detail various collaborations, followed by three distinct numbers, providing insights into data behavior.

4.4.2 Line Weights

To compare networks in terms of their relative patterns of association, researchers can spherically normalize the cumulative adjacency vectors by dividing each one by its length. The resulting normalized vectors represent each unit’s relative frequencies of code co-occurrence. In other words, the sphere normalization controls for the fact that some units might have more interaction or more activities than others.

In set.ona$connection.counts, the value for each unique co-occurrence of codes is an integer greater than or equal to 0, because the values represent the directional connection counts between each pair of codes. In set.ona$line.weights, the connection counts are sphere normalized, and so the values are between 0 and 1.
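Sphere normalization itself is a one-line computation, shown here in base R on two made-up count vectors. The helper name sphere_normalize is hypothetical.

```r
# Divide a vector of connection counts by its Euclidean length
sphere_normalize <- function(v) v / sqrt(sum(v^2))

# Two hypothetical units with the same pattern but different amounts of talk
quiet_unit   <- c(3, 0, 4)
talkative_unit <- c(30, 0, 40)

sphere_normalize(quiet_unit)       # 0.6 0.0 0.8
sphere_normalize(talkative_unit)   # 0.6 0.0 0.8
```

Both units end up with the same line weights, which is the sense in which normalization controls for differences in the amount of interaction.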

A code snippet and its output. The code accesses set.ona$line.weights, and the output is a table with columns for condition, username, ENA_UNIT, ENA_DIRECTION, and data to data, with numerical values. The output also lists collaboration details with distinct numbers.
A code snippet with comments. The comments detail various collaborations followed by three distinct numbers, providing insights into data behavior.

4.4.3 ONA Points

For each unit, ONA produces an ONA point in a two-dimensional space formed by the first two dimensions of the dimensional reduction.

Here, the MR1 column represents the x-axis coordinate for each unit, and the SVD2 column represents the y-axis coordinate for each unit.

A code snippet and its output. The code accesses set.ona$points, and the output is a table with columns for condition, username, ENA_UNIT, ENA_DIRECTION, and MR1.
A code output lists sets of values for SVD8 to SVD36.

4.4.4 Rotation Matrix

The rotation matrix used during the dimensional reduction can be accessed through set.ona$rotation. This is mostly useful when you want to construct an ONA metric space using one dataset and then project ONA points from different data into that space, as in Sect. 5.2.

A code output lists sets of values for MR1 and SVD2 to SVD36.

4.4.5 Metadata

set.ona$meta.data gives a data frame that includes all the columns except for the code connection columns.

A code snippet and its output. The code accesses set.ona$meta.data, and the output is a table with columns for condition, username, and ENA_UNIT.

4.5 ONA Visualization

Once an ONA model is constructed, ONA networks can be visualized. The plotting function in ONA is called plot, and it works similarly to the same function in ENA.

Before plotting, you can set up several global parameters to ensure consistency across plots. These parameters will be clearer in subsequent sections.

A code snippet scales up or down node sizes, zooms in or out node positions, zooms in or out the point positions, adjusts the chevron color lighter or darker, and scales up or down edge sizes.

4.5.1 Plot a Mean Network

Mean ONA networks can be plotted for each of the conditions along with their subtracted network.

First, plot the mean network for the FirstGame condition. Use a pipe |> to connect the edges function and the nodes function. Users are only required to specify the weights parameter, as the remaining parameters have default values unless specified otherwise (Fig. 14).

A code snippet for setting up a network plot. It includes parameters for nodes and edges, such as size and position multipliers. The code has functions to customize the network visualization.
Fig. 14
A mean network graph for the First Game group has a coordinate plane with interconnectedness between design reasoning, technical constraints, client and consultant requests, collaboration, and performance parameters in the first to fourth quadrants, respectively.

ONA mean network for FirstGame group

Since this is the first ONA network visualization in this chapter, we briefly explain how to read an ONA network.

Node size: In ONA, the node size is proportional to the number of occurrences of that code as a response to other codes in the data, with larger nodes indicating more responses. For example, in this plot, students in the FirstGame condition responded most frequently with discourse about Technical.Constraints.

Self-connections: The color and saturation of the circle within each node is proportional to the number of self-connections for that code: that is, when a code is both what students responded to and what they responded with. Colored circles that are larger and more saturated reflect codes with more frequent self-connections.

Edges: Note that unlike most directed network visualizations, which use arrows or spearheads to indicate direction, ONA uses a “broadcast” model, where the source of a connection (what students responded to) is placed at the apex side of the triangle and the destination of a connection (what students responded with) is placed at its base.

Chevrons on edges: The chevrons point in the direction of the connection. Between any pair of nodes, if there is a bidirectional connection, the chevron only appears on the side with the stronger connection. This helps viewers differentiate heavier edges in cases such as between Technical.Constraints and Data, where the connection strengths from both directions are similar. When the connection strengths are identical between two codes, the chevron will appear on both edges.

Now, plot the mean network for SecondGame (Fig. 15).

A code snippet for setting up a network plot titled Second Game Mean Network. It includes parameters for nodes and edges, such as size and position multipliers. The code has functions to customize the network visualization.
Fig. 15
A mean network graph for the Second Game group has a coordinate plane with interconnectedness between design reasoning, technical constraints, client and consultant requests, collaboration, and performance parameters in the first to fourth quadrants, respectively.

ONA mean network for SecondGame group

Then, plot the subtracted network to show the differences between the mean networks of the FirstGame and SecondGame conditions (Fig. 16).

A code snippet for generating a network graph titled subtracted mean network, first game versus second game. It includes functions and parameters to customize various visual elements, such as edges and nodes.
Fig. 16
A mean network graph titled subtracted mean network, first game versus second game has a coordinate plane with interconnectedness between design reasoning, technical constraints, client and consultant requests, collaboration, and performance parameters in the first to fourth quadrants, respectively.

ONA subtracted network showing the differences between FirstGame (red) and SecondGame (blue)

4.5.2 Plot a Mean Network and its Points

Besides plotting the mean network for each condition and the subtracted network, we can also plot the individual units within each condition.

Use set.ona$points to subset the rows that are in each condition and plot the units in each condition as a different color.

The points are specified in the units function. The edges and nodes functions remain the same as above (Figs. 17, 18, 19, and 20).

A snippet of R code generates a plot with the title points, mean point, and confidence interval. The functions and parameters used are set.ona, points, points_color, and show_mean.
A snippet of R code generates a network plot with specific parameters. It includes settings for points, edges, and nodes. Some functions and parameters used are points, points_color, show_mean, weights, edge_color, and self_connection_color.
Fig. 17
A scatterplot titled points, mean point, and confidence interval has a coordinate plane with scattered values mostly in the first to third quadrants. The mean point, represented as a square, is along the horizontal axis.

ONA points (dots) and their mean point (square) for FirstGame group

Fig. 18
A mean network graph titled first game mean network has a coordinate plane with interconnectedness between design reasoning, technical constraints and client and consultant requests, collaboration and data, and performance parameters in the first to fourth quadrants, respectively.

ONA mean network for FirstGame group

A snippet of R code generates a plot with the title points, mean point, and confidence interval. The functions and parameters used are set.ona, points, points_color, and show_mean.
Fig. 19
A scatterplot titled points, mean point, and confidence interval has a coordinate plane with scattered values mostly in the first, second, and fourth quadrants. The mean point represented as a square is along the horizontal axis.

ONA points (dots) and their mean point (square) for SecondGame group

Fig. 20
An epistemic network analysis coordinate plot titled second game mean network has interconnected vertices labeled design reasoning, technical constraints, collaboration, data, performance parameters, and client and consultant parameters. The edges representing the games are indicated in blue.

ONA mean network and points for SecondGame

A code snippet sets the title of the plot to second game mean network. It specifies the units for plotting, with blue points. The edges section customizes edge-related parameters such as size, arrow saturation, position, and color. The nodes section configures node-related parameters like size and position.

Plot the subtracted network as follows (Fig. 21).

A code snippet sets the title of the plot to first game and second game subtracted plot. The code defines point color, shows mean, and defines edge weights.
A code snippet defines the parameters such as edge size multiplier, edge arrow saturation multiplier, node position multiplier, edge color, node size multiplier, node position multiplier, and self-connection color.
Fig. 21
An epistemic network analysis coordinate plot titled first game versus second game has interconnected vertices labeled design reasoning, technical constraints, collaboration, data, performance parameters, and client and consultant parameters. The edges representing the games are indicated by different colors.

ONA subtracted network showing the differences between FirstGame (red) and SecondGame (blue)

4.5.3 Plot an Individual Network and its Points

To plot an individual student’s network and ONA point, use set.ona$points.

Here, we choose the same two units we compared in the ENA analysis (Sect. 3.5.3) (Figs. 22 and 23).

An R code snippet sets the title of the plot to FirstGame::steven z. It specifies the units for plotting, with red points. The edges and nodes sections are placeholders for additional parameters related to edges and nodes in the plot.
An R code snippet sets the title of the plot to SecondGame::samuel o. It specifies the units for plotting, with blue points. The edges and nodes sections are placeholders for additional parameters related to edges and nodes in the plot.
A code snippet sets the parameters for network visualization, such as weights, edge size multiplier, edge arrow saturation multiplier, node position multiplier, edge color, node position multiplier, and self-connection color.
Fig. 22
An epistemic network analysis coordinate plot titled first game double colon steven z has interconnected vertices labeled design reasoning, technical constraints, collaboration, data, performance parameters, and client and consultant parameters.

ONA network for a student from FirstGame

Fig. 23
An epistemic network analysis coordinate plot titled second game double colon Samuel o has interconnected vertices labeled design reasoning, technical constraints, collaboration, data, performance parameters, and client and consultant parameters.

ONA network for a student from SecondGame

In this case, both units make relatively strong connections between Design.Reasoning and Data. However, for Unit A (red), the connection runs relatively more from Design.Reasoning to Data than the other way around. This indicates that this unit more often responded to Design.Reasoning with Data. In contrast, Unit B (blue) more frequently responded to Data with Design.Reasoning.

A subtracted network can make such differences more salient (Fig. 24).

An R code snippet calculates the mean difference between Steven and Luke’s networks and uses this information to adjust the visual properties of the nodes and edges in the plot. The resulting plot represents the subtracted network.
Fig. 24
An epistemic network analysis coordinate plot titled subtracted network of Steven Z and Samuel indicated by different colored edges. The interconnected vertices are labeled design reasoning, technical constraints, collaboration, data, performance parameters, and client and consultant requests.

ONA subtracted network showing the differences between one student from FirstGame (red) and another student from SecondGame (blue)

The connection between Design.Reasoning and Data consists of two triangles, one in blue pointing from Data to Design.Reasoning, the other in red pointing from Design.Reasoning to Data. This indicates that although both units made strong connections between these two codes, the relative directed frequencies are different. Recall that in the ENA subtracted network for the same two units, the connections between Data and Design.Reasoning were basically the same. ONA, by accounting for the order of events, shows that while the undirected relative frequencies were similar, there was a difference in the order in which the two students made the connection.
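The contrast between directed and undirected differences can be illustrated with a minimal base R sketch. The two weight matrices below are hypothetical values chosen for illustration, not output from the ONA model, and the convention that rows hold the ground code and columns the response code is an assumption of this sketch.

```r
codes <- c("Data", "Design.Reasoning")

# Hypothetical directed weights (rows = ground code, columns = response code).
# Unit A mostly connects from Design.Reasoning to Data.
unit_a <- matrix(c(0.0, 0.2,
                   0.6, 0.0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(ground = codes, response = codes))

# Unit B mostly connects from Data to Design.Reasoning.
unit_b <- matrix(c(0.0, 0.6,
                   0.2, 0.0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(ground = codes, response = codes))

# Directed subtraction: positive cells are stronger for Unit A,
# negative cells are stronger for Unit B.
subtracted <- unit_a - unit_b

# Undirected view: add both directions together before subtracting.
undirected_diff <- (unit_a + t(unit_a)) - (unit_b + t(unit_b))

print(subtracted)
print(undirected_diff)
```

In this toy case the undirected difference is zero in every cell, while the directed subtraction has opposite signs in the two off-diagonal cells, which is the pattern the two triangles in the plot convey.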

4.6 Compare Groups Statistically

In addition to visual comparison of networks, ONA points can be analyzed statistically. For example, here we might test whether the patterns of association in one condition are significantly different from those in the other condition.

To demonstrate both parametric and non-parametric approaches to this question, the examples below use a Student’s t test and a Mann-Whitney U test to test for differences between the FirstGame and SecondGame conditions.

First, install the lsr package to enable calculation of effect size (Cohen’s d) for the t test.

A code snippet reads as follows, install dot packages left parenthesis l s r right parenthesis. Library left parenthesis l s r right parenthesis.

Then, subset the points to test for differences between the points of the two conditions.

A code snippet defines the ona underscore first underscore points underscore d 1, ona underscore second underscore points underscore d 1, ona underscore first underscore points underscore d 2, and ona underscore second underscore points underscore d 2.

Conduct the t test on the first and second dimensions.

A code snippet with 2 sections. The first section performs parametric tests on 2 datasets that are ona underscore first points underscore d 1 and ona underscore second underscore points underscore d 1, performs Welch 2 sample t-test. The second section uses 2 different data sets.
A code snippet reads as follows. Double hash negative 0.1008208 0.1008208 double hash sample estimates colon double hash mean of x mean of y double hash negative 1.727628 e minus 17 1.742362 e minus 17.
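For readers who want to try the test without the ONA model at hand, the following is a minimal sketch using base R only. The vectors first_d1 and second_d1 are simulated stand-ins (hypothetical names and values) for the first-dimension points of the two conditions; they are not the chapter's data.

```r
set.seed(42)

# Simulated stand-ins for the first-dimension ONA points of each condition.
first_d1  <- rnorm(26, mean = -0.05, sd = 0.09)
second_d1 <- rnorm(22, mean =  0.06, sd = 0.12)

# t.test() does not assume equal variances by default (var.equal = FALSE),
# so this call runs a Welch two-sample t test.
welch <- t.test(first_d1, second_d1)

# Pieces typically reported: t statistic, degrees of freedom, p value.
welch$statistic
welch$parameter
welch$p.value
```

Because the Welch correction adjusts the degrees of freedom for unequal variances, the reported df is generally not an integer, as in t(41.001).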

Compute any other statistics that may be of interest. A few examples are given below.

An R code snippet calculates the mean, standard deviation, and length metrics for 2 datasets, that are ona underscore first underscore points and ona underscore second underscore points with two different variables each d 1 and d 2. It also computes Cohen’s D statistic.
A code snippet reads, double hash left square bracket 1 right bracket 1.109985. Cohens D left parenthesis ona underscore first underscore points underscore d 2, ona underscore second underscore points underscore d 2 right parenthesis double hash left square bracket 1 right square bracket 1.997173 e minus 16.
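As a cross-check on the effect size, Cohen's d can also be computed by hand. The sketch below uses the common pooled-standard-deviation definition and reports the absolute value. The helper name cohens_d_pooled is hypothetical, and whether it matches the lsr defaults in every detail is an assumption.

```r
# Cohen's d with a pooled standard deviation (absolute value).
cohens_d_pooled <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  pooled_sd <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  abs(mean(x) - mean(y)) / pooled_sd
}

# Toy example: means differ by 1 and both groups have SD 1, so d is 1.
d_toy <- cohens_d_pooled(c(1, 2, 3), c(2, 3, 4))
print(d_toy)
```

A value such as 1.997173e-16 is zero up to floating-point error, which is why it is reported as d = 0.00.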

Here, along the x axis (MR1), a two-sample t test assuming unequal variance shows that the FirstGame condition (mean = −0.05, SD = 0.09, N = 26) is statistically significantly different at alpha = 0.05 from the SecondGame condition (mean = 0.06, SD = 0.12, N = 22; t(41.001) = −3.77, p < 0.001, Cohen’s d = 1.1). Along the y axis (SVD2), a two-sample t test assuming unequal variance shows that the FirstGame condition (mean = 0.00, SD = 0.17, N = 26) is not statistically significantly different at alpha = 0.05 from the SecondGame condition (mean = 0.00, SD = 0.17, N = 22; t(45.45) = 0, p = 1.00, Cohen’s d = 0.00). The y-axis means printed by R (−1.73 × 10^−17 and 1.74 × 10^−17) are zero up to floating-point error.

The Mann-Whitney U test is a non-parametric alternative to the independent two-sample t test.

First, install the rcompanion package to calculate the effect size (r) for a Mann-Whitney U test.

A code snippet reads as follows. Hashtag install dot packages left parenthesis single inverted comma r companion single inverted comma right parenthesis. Library left parenthesis r companion right parenthesis.

Then, conduct a Mann-Whitney U test on the first and second dimensions.

An R code snippet performs non-parametric tests using the Wilcoxon rank sum exact test on two sets of data. It calculates and displays the W statistic and p-value to evaluate if there is a significant difference between the two groups.

Compute any other statistics that may be of interest. A few examples are given below.

An R code snippet calculates the median and length of two datasets, ona underscore first underscore points underscore d 1, and ona underscore second underscore points underscore d 1. The code computes the median values, dataset lengths, and performs Wilcoxon tests for the given datasets.

Here, along the x axis (MR1), a Mann-Whitney U test shows that the FirstGame condition (Mdn = −0.04, N = 26) is statistically significantly different at alpha = 0.05 from the SecondGame condition (Mdn = 0.10, N = 22; U = 130, p = 0.001, r = 0.00). Along the y axis (SVD2), a Mann-Whitney U test shows that the FirstGame condition (Mdn = 0.001, N = 26) is not statistically significantly different at alpha = 0.05 from the SecondGame condition (Mdn = 0.00, N = 22; U = 264, p = 0.66, r = 0.71). The absolute value of r in a Mann-Whitney U test ranges from 0 to close to 1. Thresholds commonly used in the published literature to interpret r are: 0.10 to < 0.30 (small effect), 0.30 to < 0.50 (moderate effect), and ≥ 0.50 (large effect).
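To make the effect size r more concrete, the sketch below computes it in base R as |Z| / sqrt(N), where Z comes from the normal approximation to the U statistic without a tie correction. This is one common definition and is offered as an assumption-laden illustration: the helper names are hypothetical, the "negligible" label for values below 0.10 is added here, and the rcompanion package may compute r differently.

```r
# Effect size r = |Z| / sqrt(N), with Z from the normal approximation
# to the Mann-Whitney U statistic (no tie or continuity correction).
mann_whitney_r <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  u <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
  z <- (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
  abs(z) / sqrt(n1 + n2)
}

# Map |r| onto the thresholds quoted in the text.
# "negligible" (below 0.10) is a label added for this sketch.
label_effect <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.1, 0.3, 0.5, Inf),
      labels = c("negligible", "small", "moderate", "large"),
      right = FALSE)
}

# Toy example with completely separated groups.
r_toy <- mann_whitney_r(1:5, 6:10)
print(r_toy)
print(label_effect(r_toy))
```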

4.7 Model Evaluation

4.7.1 Variance Explained

Briefly, variance explained (also called explained variation) refers to the proportion of the total variance in a dataset that is accounted for by a statistical model or set of predictors.

To represent high-dimensional vectors in a two-dimensional space, ONA uses either singular value decomposition (SVD) alone or means rotation combined with SVD. For each of the reduced dimensions, the variance in patterns of association among units explained by that dimension can be computed.

A code snippet reads as follows. Head left parenthesis set dot ona dollar sign model dollar sign variance comma 2 right parenthesis. Double hash M R 1 S V D 2, double hash 0.1367940 0.2736079.

In our example above, since we used the means rotation method, the first dimension is labeled MR1 and the second dimension is labeled SVD2. The two dimensions in combination explained more than 40% of the variance.

As with any statistical model, greater explained variance does not necessarily indicate a better model, as it may be due to overfitting, but it provides one indicator of model quality.
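The idea of variance explained per dimension can be sketched with base R alone. The matrix below is simulated (hypothetical data, one row per unit), and the calculation shows plain SVD on centered data rather than the ONA package's internal procedure.

```r
set.seed(1)

# Simulated stand-in: 48 units described by 6 connection weights.
X <- matrix(rnorm(48 * 6), nrow = 48)

# Center each column, then take the singular values.
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)$d

# Share of total variance captured by each SVD dimension.
var_explained <- sv^2 / sum(sv^2)

head(var_explained, 2)        # first two dimensions
sum(head(var_explained, 2))   # their combined share
```

Under pure SVD the shares are in decreasing order. A means-rotated first dimension is chosen to separate group means rather than to maximize variance, which is why SVD2 can explain more variance than MR1, as in the output above.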

4.7.2 Goodness of Fit

Briefly, a model’s goodness of fit refers to how well a model fits or represents the data. A model with a high goodness of fit indicates that it accurately represents the data and can make reliable predictions.

In ONA, a good fit means that the positions of the nodes in the space—and thus the network visualizations—are consistent with the mathematical properties of the model. In other words, we can confidently rely on the network visualizations to interpret the ONA model. The process that ONA uses to achieve high goodness of fit is called co-registration, the same as the one used in ENA. The mathematical details of co-registration are beyond the scope of this chapter and can be found in Bowman et al. [2].

To test a model’s goodness of fit, use ona::correlations. The closer the value is to 1, the higher the model’s goodness of fit is. Most ENA models have a goodness of fit that is well above 0.90.

A code snippet reads as follows. Ona double colon correlations left parenthesis set dot ona right parenthesis, double hash pearson spearman, double hash 1 0.9801173 0.9801799, double hash 2 0.9801431 0.9759160.
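One way to think about these correlations is as agreement between two sets of pairwise distances: those between the plotted points and those between the corresponding network centroids, taken one dimension at a time. The base R sketch below illustrates that idea on simulated data. The function name fit_correlations and both input matrices are hypothetical, and this is not presented as the package's implementation.

```r
# For each dimension, correlate pairwise differences between points with
# pairwise differences between centroids (Pearson and Spearman).
fit_correlations <- function(points, centroids) {
  pairs <- combn(nrow(points), 2)
  t(sapply(seq_len(ncol(points)), function(d) {
    diff_points    <- points[pairs[1, ], d]    - points[pairs[2, ], d]
    diff_centroids <- centroids[pairs[1, ], d] - centroids[pairs[2, ], d]
    c(pearson  = cor(diff_points, diff_centroids),
      spearman = cor(diff_points, diff_centroids, method = "spearman"))
  }))
}

set.seed(3)
pts <- matrix(rnorm(20), ncol = 2)                    # 10 simulated points
cen <- pts + matrix(rnorm(20, sd = 0.01), ncol = 2)   # centroids close to points

fit <- fit_correlations(pts, cen)
print(fit)
```

When centroids sit almost exactly where the points are, both correlations approach 1 on each dimension, which is the situation a well co-registered model aims for.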

4.7.3 Close the Interpretative Loop

Another approach to evaluating an ONA model is to confirm the alignment between the quantitative model (in our case, the ONA model) and the original qualitative data. In other words, we can return to the original data to confirm that the quantitative findings give a fair representation of the data. This approach is an example of what is called closing the interpretative loop in the field of Quantitative Ethnography [1].

For example, based on our visual analysis of the network of “SecondGame::samuel o” in the previous section, we want to know which lines in the original data contributed to the connection from Performance.Parameters to Design.Reasoning.

Let’s first review what the “SecondGame::samuel o” ONA network looks like. Based on the direction and strength of the connection from Performance.Parameters to Design.Reasoning, we would expect to see more examples of Samuel responding to “Performance.Parameters” with “Design.Reasoning” than the other way around (Fig. 25).

A code snippet begins by subsetting data, and defining units, edges, and nodes. A window size of 7 is used for context accumulation.
Fig. 25
An epistemic network coordinate plot titled second game Samuel o has vertices labeled design reasoning, technical constraints, client and consultant requests, collaboration, data, and performance parameters.

ONA network for a student from SecondGame

To do so, we use the view() function and specify the required parameters as below.

This opens a window in your Viewer panel. If it is too small to read, you can click the “Show in new window” button to view it in your browser.

In the Viewer panel, hover your cursor over any of the lines in bold. A rectangle spanning 7 lines appears, showing the moving stanza window of size 7: the referent line (the line in bold) and its 6 preceding lines. The 1s and 0s in the Performance.Parameters and Design.Reasoning columns show where the connections occurred (Fig. 26).

Fig. 26
A screenshot of the tabulated data obtained as a result of the view function. The columns are titled Q E I D, condition, user name, group name, activity number, text, performance parameters, and design reasoning.

A screenshot of the view() function result. The highlighted lines represent lines within the same stanza window
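The moving stanza window described above can be sketched in base R. The function below is a simplified illustration, not the ONA package's accumulation routine: for each referent line it treats any code present in the preceding lines of the window as the ground and any code on the referent line as the response, and it ignores conversation boundaries. The function name and the toy data are hypothetical.

```r
# Count directed connections (ground -> response) within a moving window.
# `coded` has one row per line and one 0/1 column per code.
directed_window_counts <- function(coded, window = 7) {
  stopifnot(window >= 2)
  coded <- as.matrix(coded)
  codes <- colnames(coded)
  counts <- matrix(0, nrow = length(codes), ncol = length(codes),
                   dimnames = list(ground = codes, response = codes))
  for (i in seq_len(nrow(coded))[-1]) {
    # Lines before the referent line that still fall inside the window.
    prior <- seq(max(1, i - window + 1), i - 1)
    ground   <- colSums(coded[prior, , drop = FALSE]) > 0
    response <- coded[i, ] > 0
    counts <- counts + outer(ground, response, "&")
  }
  counts
}

# Toy data: two lines about performance parameters, then a design-reasoning reply.
coded_lines <- matrix(c(1, 0,
                        1, 0,
                        0, 1),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(NULL, c("Performance.Parameters",
                                              "Design.Reasoning")))

counts <- directed_window_counts(coded_lines, window = 7)
print(counts)
```

In the toy data, the third line responds to Performance.Parameters with Design.Reasoning, so the only cross-code count falls in the cell whose ground is Performance.Parameters and whose response is Design.Reasoning. The second line also produces a self-connection for Performance.Parameters.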

Notice that here we are viewing the same qualitative example as in Sect. 3.7.3 in ENA. In line 2477 Samuel shared his [Design.Reasoning] about “mindful of (the) how one device scores relative to other ones”, as a response to what Casey said in line 2476 about [Performance.Parameters] “not one source/censor can be the best in every area so we had to sacrifice certain attributes”, as well as what Jackson said in line 2475 about safety as one of the [Performance.Parameters] “when it came to the different attributes, i think that all were important in their own way but i think safety is one of the most important”.

Here, ONA not only captures the co-occurrence of Design.Reasoning and Performance.Parameters, as ENA did, but also represents the direction of the connection, from Performance.Parameters to Design.Reasoning.

4.8 Using ONA Model Outputs in Other Analyses

As with ENA, the outputs of ONA models can be used as inputs in other statistical models. See Sect. 3.8 for an example using ENA points.

5 Additional Features

In the sections above, we demonstrated how to do an ENA analysis and an ONA analysis. In this section, we show how to project new data into a space constructed with different data. This can be done as long as the same codes are used in both sets.

5.1 Projections in ENA

To project the ENA points from one model into a space constructed with different data, replace the rotation.set parameter of ena.make.set. In the example below, an “expert” model is developed using the SecondGame units and the FirstGame (novice) units are projected into that space. By projecting the novice model’s units into the expert model’s space, users can interpret the projected novice units’ networks along the two dimensions defined by the expert model’s node positions. In other words, the novices’ networks are interpreted in the context of the experts’ space (Figs. 27 and 28).

An R code snippet begins with subsetting the data based on the condition second game, followed by processing the data by extracting the relevant columns, accumulating expert contexts, and creating an E N A model, and similar steps are followed for novice data.
A continued code snippet defines the expert dataset and asks to plot the test model epistemic plot and print the plot. The code finally compares the plots.
Fig. 27
An epistemic network plot titled all units has vertices representing the mean and edges representing conf intervals. The vertices are titled data, technical constraints, performance parameters, design reasoning, client and consultant requests, and collaboration.

Mean network for all units in the expert model

A code reads plot underscore nov dollar sign plot.
Fig. 28
An epistemic network analysis plot titled all units has interconnected vertices labeled data, technical constraints, design reasoning, performance parameters, client and consultant requests, and collaboration.

Mean network for novice students projected into expert space
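The general idea behind projection, reusing a rotation estimated on one dataset to place another dataset in the same space, can be sketched in base R. The example below uses simulated matrices and plain SVD. Centering the novice data on the expert means is an assumption of this sketch, and none of it reproduces the internals of ena.make.set.

```r
set.seed(7)

# Simulated stand-ins: rows are units, columns are connection weights.
expert <- matrix(rnorm(22 * 6), nrow = 22)
novice <- matrix(rnorm(26 * 6), nrow = 26)

# Estimate the space from the expert data only.
expert_center <- colMeans(expert)
rotation <- svd(sweep(expert, 2, expert_center))$v[, 1:2]

# Apply the same centering and rotation to any dataset with the same columns.
project <- function(x) sweep(x, 2, expert_center) %*% rotation

expert_points <- project(expert)
novice_points <- project(novice)

dim(novice_points)
```

Because both sets of points share one rotation, their coordinates are directly comparable, which is why the same codes must be used in both datasets.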

5.2 Projections in ONA

Projection works similarly in ONA (Fig. 29).

A code snippet assumes an R dataset named o n a double colon R S dot data, creates 2 subsets of data titled exp dot data and nov dot data, and defines shared unit and shared code columns related to data, technical constraints, performance parameters, client and consultant requests, and so on.
An R code snippet begins with defining the conversation rules function, followed by a condition to check if the condition is in units condition and the group name is a unit group name. The next section accumulates context for expert data, creates an E N A model using a model function, and finally plots the E N A ordered set.
Fig. 29
An epistemic network coordinate plot titled novice data into expert space has interconnected nodes labeled performance parameters and collaboration, design reasoning, technical constraints and client and consultant requests, and data in the first to fourth quadrants. The data is scattered around the origin.

Projecting novice data into expert space

6 Discussion

In this chapter, we introduced two techniques, ENA and ONA, for quantifying, visualizing, and interpreting networks using coded data. Through the use of a demonstration dataset that documents collaborative discourse among students collaborating to solve an engineering design problem, we provided step-by-step instructions on how to model complex, collaborative thinking using ENA and ONA in R. The chapter combines theoretical explanations with tutorials, intended to be of aid to researchers with varying degrees of familiarity with network analysis techniques and R. This chapter mainly showcased the standard and most common use of these two tools. The ENA and ONA R packages, akin to other R packages, offer flexibility to researchers to tailor their analyses to their specific needs. For example, users with advanced R knowledge can supply their own adjacency matrices and use ENA or ONA solely as a visualization tool rather than an integrated modeling and visualization tool.

Due to the technical and practical focus of this chapter, we omitted detailed explanations of the theoretical, methodological, and mathematical foundations of ENA and ONA that are crucial for informed, theory-based learning analytics research using these techniques. Consult the Further Reading section for papers that explain these aspects of ENA and ONA in greater detail.