Abstract
Researchers tasked with understanding the effects of educational technology innovations face the challenge of providing evidence of causality. Given the complexities of studying learning in authentic contexts interwoven with technological affordances, conducting tightly controlled randomized experiments is not always feasible or desirable. Today, a set of tools is available that can help researchers reason about cause and effect, irrespective of the particular research design or approach. This theoretical paper introduces such a tool, a simple graphical formalism that can be used to reason about potential sources of bias. We further explain how causal graphs differ from structural equation models and highlight the value of explicit causal inference. The final section shows how causal graphs can be used in several stages of the research process, whether researchers plan to conduct observational or experimental research.
Introduction
Educational Technology and Instructional Systems research, like most other empirical sciences, is faced with the challenge of determining cause and effect to establish the effectiveness of interventions, determine ways to develop them further, and construct theories that capture the underlying principles of using technology for educational purposes. Determining cause and effect means attributing changes in a variable of interest, e.g., learning outcomes, unequivocally to a preceding variable, e.g., an instructional innovation.
As the experimental method is considered by many to be the gold standard for identifying cause and effect, Educational Technology and Instructional Systems researchers rely heavily on experimentation to investigate their research questions (Bulfin et al., 2014; Honebein & Reigeluth, 2021; Thomas & Lin Lin, 2020). However, not all research contexts are amenable to rigorously controlled experimentation. This is often the case when an educational technology cannot be implemented without coming into conflict with features of the system in which learning takes place (Lewis, 2015). Thus, while causal inferences may be desired, clean experimental comparisons are not always feasible, for example, when an instructional innovation entails substantial curricular modification or when a technological innovation cannot be meaningfully implemented without also changing the instructional method. This is the tension within which much of educational technology research sits (Salomon, 1991).
This theoretical paper will introduce a framework for causal reasoning based on causal graphs, so-called Directed Acyclic Graphs (DAGs). We will outline the basic formalism underlying such graphs, distinguishing them from a form of graphical model well known to educational technology researchers: structural equation models. Reasoning with causal graphs requires researchers to make their causal assumptions explicit, which is, as we will show, a value in and of itself. Finally, we will present examples of the utility of DAG-based causal reasoning in different stages of the educational technology research process.
Causal inference within varied research approaches
Today, comparative research or “research to prove” is (again) under scrutiny, as presented by Honebein and Reigeluth (2021). In their classification, they distinguish five different types of research, some of which they find less equipped than others to deliver meaningful knowledge to the research field. Research-to-prove entails a confirmatory approach to testing hypotheses that are either confirmed or rejected in controlled settings, usually associated with the experimental method, the vehicle of choice for delivering causal inferences. Although this makes up a majority of educational technology research, as Honebein and Reigeluth (2021) posit, other approaches may be better equipped to deliver results in line with the goals of the field. Research-to-prove emphasizes rigor over relevance, with the result that the knowledge gained from it provides little guidance for improving student learning in practical contexts. They also point to the well-known problem of confounding, where the experimental comparison varies more than only the variable of interest. The result is that changes in the dependent variable cannot be unequivocally attributed to the treatment. Instead, they advocate for an increased focus on research-to-improve. This notion stems from improvement science, a more practical, cyclical approach to solving real-world problems (LeMahieu et al., 2015). Research methods in this paradigm include action research, design-based research, evaluation research, and others, which are seen as more in line with the cyclical nature of designing educational technology (Phillips et al., 2012).
However, while one may be tempted to believe that improvement science does not rely on causal inference (LeMahieu et al., 2015; Lewis, 2015), it should be highlighted that the fundamental role of causality, for understanding both the most basic processes and the most complex systemic dependencies, means that our understanding of the world rests on causal assumptions. For example, Buchner and Kerres (2022) systematically review the literature on augmented reality in education and find a preponderance of media comparison studies. As more productive alternatives, they propose Learner-Treatment Interaction designs and value-added designs, which focus on the question of when and how technology works well. This ties in with the idea of improvement science because questions about when and how a technology works well are the foundation for improvement, and, crucially, improvement is an inherently causal concept. This means that these research goals can only be achieved by finding ways to causally attribute learning or learning-relevant processes to some feature of the educational technology, which in turn interacts with features of the educational context and the learner’s individual differences. Choosing and testing a candidate feature for improvement is based on a (frequently implicit) causal model that describes how variables influence each other. As LeMahieu et al. (2015) state: “We need to understand how some system produces current results to intervene in the right places and with the right changes. We also need this understanding if we are to implement complex practices across contexts” (p. 447).
In other words, freeing ourselves from the shackles of confounded comparative research does not free us from the causal foundation on which all empirical questions rest: “What works and why?” followed by “what can be changed and how?” In fact, within the research classification of Honebein and Reigeluth (2021), only research-to-describe would be satisfied with remaining on an associational level, where no attempts at causal reasoning are made, and no practical implications are intended (see Table 1). For all other research approaches, reasoning about cause and effect is imperative.
Causal inference and directed acyclic graphs
Given the developments in causal inference in the past decades, there are now tools available that enable researchers to improve their causal inferences, notwithstanding the specific research approach at hand. One central approach is a graphical method using DAGs, which allows researchers to reason about causal assumptions and to make research design decisions as well as statistical analysis decisions. This approach, spearheaded by Judea Pearl since the 1990s, has gained momentum over the years such that it is now a comprehensive framework for causal inference (Hernán & Robins, 2020; Morgan & Winship, 2015; Pearl, 1995, 2009). Most prominently employed in epidemiology (Munafò et al., 2018), there are numerous recent efforts to introduce and apply these principles to psychology (Lee, 2012; Rohrer, 2018), sociology (Elwert & Winship, 2014), as well as many other fields (Griffith et al., 2020). The power of this approach is that, with only a few key principles, researchers can begin to improve their causal inferences in varied research contexts and at every stage of the research process (see Section “Using DAGs in different phases of educational technology research”). To our knowledge, no such introduction has been explicitly presented to the field of Educational Technology.
DAGs are visual representations of causal assumptions. In essence, the goal is to build a causal model of the relevant variables of interest using variable names, boxes, and arrows. As such, they may look like structural equation models (see Section “The relation between structural equation models and DAGs”). However, in contrast to structural equation models, DAGs are non-parametric. While an arrow pointing from one variable to another indicates a causal effect, it is agnostic about whether the functional form of that effect is linear, quadratic, or exponential. In other words, when working with DAGs, researchers are not bound by statistical or methodological constraints, nor by data availability. This is because DAGs are conceptual tools. In the first instance, it is only imperative to consult expert knowledge (i.e., theory) to construct a DAG that is substantively plausible and as comprehensive as possible, given the current state of knowledge.
Using the DAG, researchers can then assess whether the causal effect of interest is identified. This means inspecting the graph carefully, paying attention to the directionality of arrows, and tracing possible paths of the causal effect. In essence, through this careful inspection researchers ask: “Is it possible to estimate the causal effect of interest (given that my causal assumptions are correct)?” If the answer is yes, the causal effect can be validly estimated. Crucially, this is irrespective of the research design or approach, meaning that a valid causal estimate can also be retrieved from non-experimental research. On the other hand, if the answer is no, the causal effect is not identified; a source of bias prevents valid estimation of the causal effect. In this case, several steps may be taken to improve the situation, such as introducing statistical controls, ceasing to stratify on a variable, or changing the research design (see Section “Using DAGs in different phases of educational technology research”). This distinction between identification and estimation highlights the function of DAGs as conceptual tools, which are ideally used in the planning phase of empirical studies. However, we will also show how DAGs can be used in other phases of research, even if they were absent in the planning phase.
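To make this path-tracing step concrete, it can be sketched in a few lines of code. The snippet below is a minimal illustration, not part of the formalism itself: it encodes the simple confounding structure discussed in the next section in Python using the networkx library, with placeholder variable names; real applications would encode the researcher's own DAG.

```python
import networkx as nx

# Fig. 1a-style structure: Z is a common cause of X and Y.
dag = nx.DiGraph([("X", "Y"), ("Z", "X"), ("Z", "Y")])
assert nx.is_directed_acyclic_graph(dag)  # "acyclic" is a defining property of a DAG

# Paths must be traced irrespective of arrow direction, so we walk the
# undirected skeleton and then look up the original edge orientations.
skeleton = dag.to_undirected()
for path in nx.all_simple_paths(skeleton, source="X", target="Y"):
    # A back-door path is one that starts with an arrow pointing INTO X.
    is_backdoor = dag.has_edge(path[1], path[0])
    label = "back-door path (potential bias)" if is_backdoor else "causal path"
    print(path, "|", label)
```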
Using DAGs to identify confounders
Consider the following example: If a researcher wants to learn about the effectiveness of an intervention, this can be depicted in the simplest of graphical forms, an arrow pointing from the hypothesized independent variable (X) to a dependent variable (Y). This indicates the hypothesized causal effect. However, if the investigator cannot randomize the intervention, extending the causal graph is necessary. The most basic and common extension is a third variable (Z) that influences both the independent and dependent variable (Fig. 1a).
This kind of variable is frequently referred to as a confounding variable. In this example, student engagement could play this role because more engaged students (a) may make more or better use of the intervention and (b) are more likely to learn successfully, independent of the intervention (Fig. 1b). Of course, the plausibility of confounders depends on the variables that are at the heart of the investigation. Expert knowledge and theory may guide investigators in reasoning about potential confounders in the causal model. Given this confounding situation (also called a “fork” in the DAG literature), the researcher will conclude that the causal effect is not identified. This means additional steps must be taken because, aside from the causal path of interest (Intervention → Student Learning), there is an additional, non-causal path (Intervention ← Student Engagement → Student Learning). This path is non-causal because, tracing it from X to Y, we encounter an arrow pointing in the ‘wrong’ direction. This indicates a source of bias, meaning that an estimate of the causal effect would be an uninterpretable mix of the true causal effect and additional non-causal association due to confounding.

To arrive at this conclusion, all paths connecting X and Y must be considered, irrespective of the direction of arrows. In this case, the inspection yields one open back-door path (Pearl et al., 2016), which, in the language of DAGs, should be blocked. A standard way of blocking paths is by measuring the confounder and controlling for it statistically, often also referred to as conditioning, adjusting, or stratifying. Graphically, this is depicted by a box around the variable controlled for (Fig. 1b). The most straightforward way of doing this is through group-specific analyses, that is, looking at different levels of the confounder individually. This would mean separating highly engaged from less engaged students instead of considering them simultaneously. If the confounder is adequately controlled, X and Y are d-separated (Pearl, 1995), meaning that no open back-door paths remain. The causal effect is then identified under the assumption that the DAG is complete, that is, that no other confounding is present. The statistical estimate of X on Y would then yield the true causal effect.

Aside from group-specific analyses, there are many other approaches, of which we will mention only a few. Researchers may instead opt for the covariate-in-regression approach, including one or more third variables (i.e., confounders) in a regression model. Conceptually, this is similar to group-specific analyses; the variables included in the regression model can be considered controlled because the coefficients are estimated while holding the covariates constant. Matching is a third popular approach to statistical control: it works by constructing groups post hoc that are similar with respect to the hypothesized confounding variables. A widely used matching method is propensity score matching. All these approaches aim to approximate the leveling effect of randomization by creating groups that do not differ on the bias-introducing variable.
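The logic of the fork and its remedy can also be illustrated with simulated data. The following sketch uses made-up coefficients and variable names chosen to mirror the engagement example; it shows how an unadjusted estimate overstates the intervention effect and how including the confounder as a covariate recovers the value built into the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
engagement = rng.normal(size=n)                                        # Z: confounder
intervention = 0.8 * engagement + rng.normal(size=n)                   # X, influenced by Z
learning = 0.5 * intervention + 0.7 * engagement + rng.normal(size=n)  # Y; true effect of X is 0.5

def slope_of_first_predictor(outcome, *predictors):
    """Ordinary least squares; returns the coefficient of the first predictor."""
    design = np.column_stack([np.ones_like(outcome), *predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs[1]

print("true effect:              0.50")
print(f"unadjusted estimate:      {slope_of_first_predictor(learning, intervention):.2f}")              # inflated
print(f"adjusted for engagement:  {slope_of_first_predictor(learning, intervention, engagement):.2f}")  # ~0.50
```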
Using DAGs to distinguish confounders from mediators
A mediating effect, or chain in the language of DAGs, consists of three variables in succession: independent variable → mediator → dependent variable. For example, suppose our intervention is a tool (X) aimed at supporting self-reflection and self-explanation. In that case, these meta-cognitive processes (Z) may be mediators connecting the intervention to enhanced student learning (Y, see Fig. 2a). In DAG language, the mediator transmits the causal effect. It is not a non-causal path because all arrows along this path point in the direction of the causal effect; that is, no arrows point in the opposite direction of the causal effect of interest. Whereas in the case of confounding it is imperative to control for the third variable, in the case of a mediating effect, controlling for Z introduces bias. This is because controlling for metacognitive processes blocks this path (intervention → [metacognitive processes] → student learning), even though changes in metacognitive processes may be a crucial reason for changes in student learning. By comparing only those students with identical values of the mediator, we exclude the very mechanism responsible for the effect. In practice, this overcontrol bias (Elwert & Winship, 2014) frequently attenuates the true causal effect. The latter is represented in Fig. 2b, where controlling for the mediator blocks the mediating path, such that only a direct effect intervention → student learning remains if the mediation is partial. In the case of full mediation, a direct effect would be absent, yielding a false negative. The remedy is simply not to control for mediating variables. To distinguish between confounders and mediators, researchers need to draw causal graphs based on substantive reasoning.
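A small simulation, again with purely illustrative coefficients, makes the consequence of overcontrol tangible: under full mediation, adjusting for the mediator drives the estimated intervention effect toward zero even though the tool genuinely improves learning through metacognition.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
tool_use = rng.normal(size=n)                          # X: intervention
metacognition = 0.6 * tool_use + rng.normal(size=n)    # Z: mediator
learning = 0.9 * metacognition + rng.normal(size=n)    # Y: full mediation, no direct X -> Y arrow

def slope_of_first_predictor(outcome, *predictors):
    design = np.column_stack([np.ones_like(outcome), *predictors])
    return np.linalg.lstsq(design, outcome, rcond=None)[0][1]

print(f"total effect of the tool:        {slope_of_first_predictor(learning, tool_use):.2f}")                # ~0.6 * 0.9
print(f"after controlling the mediator:  {slope_of_first_predictor(learning, tool_use, metacognition):.2f}") # ~0, a false negative
```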
Using DAGs to identify colliders
Finally, we have inverted forks, a somewhat underappreciated configuration with counterintuitive implications: independent variable → collider ← dependent variable. If our causal effect of interest is still unknown, but both the intervention and student learning exert an effect on student retention, we can depict this as an inverted fork (Fig. 3a). In inverted forks, the Z variable is called a collider because it has more than one incoming arrow, i.e., arrows collide in this variable.
Colliders, or common effects, have the unique property of naturally blocking spurious associations. In the hypothetical example where we are trying to identify the effect of our intervention on learning (Fig. 3a), inspecting our DAG to identify all possible paths, we do not encounter a back-door path. This is because, unlike confounders, colliders have natural blocking properties. This means that an uncontrolled collider does not introduce bias. However, if we are unaware that Z is a collider, we may control for Z, either statistically or by study design, consciously or unwittingly. This introduces bias because any control on a collider opens back-door paths going through this variable, thereby opening a non-causal path. In Fig. 3b, we now have two open paths, intervention → student learning and intervention → [student retention] ← student learning, the latter of which introduces a spurious association and, thus, biases the causal effect of interest. Conditioning on a collider, also known as endogenous selection bias, is an increasingly recognized pitfall that can have devastating effects on the validity of causal claims (Elwert & Winship, 2014; Munafò et al., 2018; Richardson et al., 2019). Notably, the issue is not (only) about the representativeness of the sample, that is, whether findings can be generalized to other populations. Endogenous selection bias is more pervasive in that it also biases the estimate itself, thus limiting internal validity. As with confounding, a frequent result of conditioning on a collider is a false positive, finding an effect or association where there is none. Illustrative examples of exactly how this bias operates can be found in Rohrer (2018) and Griffith et al. (2020). Collider bias completes the three fundamental causal configurations found in a DAG. A summary of these configurations and their immediate implications can be found in Table 2.
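The counterintuitive nature of collider bias is easiest to see in simulation. In the sketch below (an illustrative setup, not an analysis of real data), the intervention and learning are generated independently, yet restricting the sample to retained students, i.e., conditioning on the collider, produces a clearly non-zero association.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
intervention = rng.normal(size=n)                              # X
learning = rng.normal(size=n)                                  # Y: independent of X by construction
retained = intervention + learning + rng.normal(size=n) > 0    # Z: collider (common effect)

r_full = np.corrcoef(intervention, learning)[0, 1]
r_retained = np.corrcoef(intervention[retained], learning[retained])[0, 1]
print(f"correlation in the full sample:       {r_full:+.2f}")      # ~0
print(f"correlation among retained students:  {r_retained:+.2f}")  # clearly negative: spurious
```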
The relation between structural equation models and DAGs
As structural equation modeling (SEM) is a popular tool for educational technology research that is also frequently used for non-experimental research designs (e.g., Lai et al., 2016; Lee et al., 2010; Makransky & Lilleholt, 2018; Rahman et al., 2021), researchers may wonder about the differences and similarities between DAGs and SEM (or path) models, in particular concerning causal inference. Because many quantitatively-oriented researchers are well acquainted with SEM, we will start with some similarities.
First, DAGs and SEMs look very similar. This is because, on a visual level, both are based on the use of variables (variable names and sometimes circles or boxes) and arrows that connect the variables (or are deliberately absent). This similarity is not due to chance but because DAGs are generalizations of simple path models (Pearl, 2009). Like SEMs, DAGs can become rather complex, with many depicted variables and interconnecting arrows. In fact, for most applied cases (as opposed to simplified examples), particularly in the social sciences, complex configurations are expected, which should also be reflected in the models. Usually, there is a tension between the desire for a parsimonious model and the complexity necessary to capture the nomological net of variables.
On the one hand, parsimonious models are easier to analyze and provide a more pleasing portrayal of the research object. In addition, overly complex models are rarely sufficiently theoretically justified and can bring problems in the estimation stage (Goodboy & Kline, 2017). On the other hand, the absence of arrows also carries strong assumptions and needs to be defended on substantive grounds just as much (VanderWeele, 2012). It is a result of the complexity of the social sciences that, by adding arrows to our model, we are being, perhaps paradoxically, conservative. Also, both approaches may yield competing models with different causal assumptions, which may be subjected to testing to arrive at the model that best approximates the data. Another similarity is that neither DAGs nor SEMs distinguish between types of causes. Whereas there have been voices insisting on the manipulability of a variable as a criterion for causality (Holland, 1986), attributes, causes, events, conditions, or any other type of cause for that matter (Freese & Kevern, 2013) are represented identically, as a variable with incoming and/or outgoing arrows.
This brings us to the critical differences between SEMs and DAGs. If SEMs are heavyweights in terms of necessary assumptions, DAGs are comparative featherweights, allowing for a streamlined approach to thinking causally. Where the function of SEM is to specify an assumption-laden statistical model that is then estimated with respect to its approximation of the data, DAGs are conceptual tools used to apply principled reasoning about causality and bias reduction. While the primary goal of SEM is estimation, DAGs are used for identification purposes only; estimation comes later and is methodologically decoupled. This brings the benefit that DAGs can be used across several stages of the empirical research process (see Section “Using DAGs in different phases of educational technology research”) and are, thus, not limited to the stages concerned with statistical estimation. As they are non-parametric, many considerations associated with SEM, e.g., linearity, measurement level, multivariate normality, measurement invariance, etc., do not apply to DAGs. This places the emphasis on conceptual reasoning about the causal structure among the variables of interest, irrespective of practical constraints. Further, although SEM can incorporate latent variables, their underlying indicators must be measurable, as declared in the measurement model (Teo et al., 2013). As conceptual tools, DAGs do not have this limitation because they can include hypothesized unmeasured or generally unavailable variables and incorporate them into causal reasoning. This ability to reason about hypothesized yet unobserved variables and their relationships is a central hallmark of the DAG approach.
For example, suppose a researcher knows their sample cannot be randomly selected from the population. They might want to represent this graphically by adding an unmeasured variable U (time zone differences) that points to X (intervention, see Fig. 4). In an online education context, the researcher may have a theory about the nature of U, e.g., that (a) time zone differences affect selection into the sample but also that (b) time zone differences do not affect the outcome Y directly because, for example, the learning design emphasized asynchronous communication. In an SEM, these hypotheses could not be represented, whereas they can easily be represented in a DAG even though U remains unmeasured. Inspecting the graph, our graphical causal reasoning suggests that U does not lead to confounding because there is no back-door path (Fig. 4a). Thus, U need not be controlled. If researchers are nonetheless able to measure U and decide to control for it, this would not systematically introduce bias, being an example of neutral control (Cinelli et al., 2021). However, if the researcher encounters a situation where the learning design necessitates synchronous peer interaction (Z), unmeasured time zone differences U will not only affect selection into the sample but also how peer interaction unfolds, as time zone differences may hamper learners’ ability to connect in a timely and productive manner. In this case, U is a confounder, opening up the back-door path intervention ← time zone differences → peer interaction → student learning (Fig. 4b). Because time zone differences may be unavailable for control due to being unmeasured, researchers would need to find a way to control for peer interaction, as this would also block this back-door path from transmitting non-causal association: intervention ← time zone differences → [peer interaction] → student learning. In this case, a causal effect of the intervention on student learning could be obtained by finding a valid measure of peer interaction and statistically controlling for it.
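The practical upshot of this scenario can again be sketched with simulated data. In the toy example below (made-up coefficients approximating the Fig. 4b structure), U remains unmeasured throughout, yet adjusting for the measured variable peer interaction blocks the back-door path and recovers the effect built into the simulation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
time_zone_gap = rng.normal(size=n)                            # U: never used in the analysis below
intervention = -0.6 * time_zone_gap + rng.normal(size=n)      # U affects selection into the intervention
peer_interaction = -0.7 * time_zone_gap + rng.normal(size=n)  # U hampers synchronous interaction
learning = 0.5 * intervention + 0.8 * peer_interaction + rng.normal(size=n)

def slope_of_first_predictor(outcome, *predictors):
    design = np.column_stack([np.ones_like(outcome), *predictors])
    return np.linalg.lstsq(design, outcome, rcond=None)[0][1]

print("true effect:                    0.50")
print(f"unadjusted (U unmeasured):      {slope_of_first_predictor(learning, intervention):.2f}")                    # biased
print(f"adjusted for peer interaction:  {slope_of_first_predictor(learning, intervention, peer_interaction):.2f}")  # ~0.50
```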
The most significant difference between DAGs and SEMs, however, is the premium placed on explicit causal inference. Yet, as we will show, this difference dissolves somewhat upon closer inspection. Because many SEM studies rely on observational data, researchers are hesitant to claim causality from their estimates, despite the array of directional arrows populating the model and the causal foundations of SEM (Bollen & Pearl, 2013; Pearl, 2012). This speaks to the broader hesitancy of quantitative researchers to make causal claims from non-experimental data (Hernán, 2018).
For example, many technology acceptance studies working with SEM use terms like “predict” to interpret estimates between what are clearly seen as independent variables (i.e., expectancies and conditions) and the intention to use a technology (e.g., Islamoglu et al., 2021; Leow et al., 2021; Rahman et al., 2021). Arguably, these become valuable insights if, and only if, some causal implications can be derived from these predictions, in these cases, e.g., by changing conditions to increase usage intention. In other words, in most cases, interpreting associations will be uninteresting unless the estimates also hold under a causal interpretation, that is, unless practical implications can be generated from the findings. Of course, researchers are correct in being hesitant, cognizant of the truism that correlation does not mean causation, lest their causal claims become obstacles in peer review. This tension between clearly directional arrows in SEMs and researchers’ hesitancy to claim causality has a long history in the SEM literature and may stem from the misguided claim that SEM aims to establish causality from associations alone (Bollen & Pearl, 2013). However, the same holds for SEM as for DAGs: there are no causal conclusions without causal assumptions. In both cases, these are encoded in the presence or absence of variables and the directional arrows connecting them.
For SEM, there are two types of input: causal assumptions arising from domain knowledge and empirical data that may substantiate or disconfirm these assumptions. The second part usually receives much attention in the SEM literature, to the detriment of careful explication of the causal assumptions and their implications (Pearl, 2012). DAGs, on the other hand, deal only with the first part, the causal assumptions based on theory or domain knowledge, which makes their causal content crystal clear and much harder to ignore.
Using DAGs in different phases of educational technology research
Finally, we will provide an overview of how DAGs, as a tool for causal reasoning, can be used in different phases of educational technology research. Fundamentally, principled causal reasoning with DAGs is helpful in every stage of the empirical research process. Here, we will highlight the information DAGs can provide and the decisions they can facilitate in study planning, data analysis, and appraisal of the literature (see Fig. 5), illustrating this as much as possible with examples from the literature. We depict these research phases as circular because a principled appraisal of the literature with respect to causal inference will likely point, in turn, to the need for further research.
DAGs in study planning
As a first step, we suggest that researchers formulate a clear causal research question, irrespective of the feasibility of different research designs. In other words, if what researchers are interested in is causal (almost always the case), this should be communicated clearly, even if an experiment may not be possible. Then, the researchers may use their expert knowledge (based on their training, the available literature, or with the help of additional content experts) to come up with a causal graph containing the causal effect of interest and additional variables hypothesized to causally affect these main variables. Again, the inclusion of variables should not depend on whether they can be measured or controlled. Using the construction rules laid out in Section “Causal inference and directed acyclic graphs” (e.g., no cycles, no arrow implies no hypothesized relationship, etc.), a DAG is constructed. As stated before, parametric considerations do not apply at this point, as identification and estimation are two different processes. In many educational technology research contexts, this will result in DAGs that are complex.
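At this stage, it can help to encode the draft DAG in software and let it check the construction rules mechanically. The sketch below (hypothetical variable names of our own choosing) uses networkx to verify acyclicity; a feedback loop drawn by mistake is flagged so that it can be resolved, for example by indexing the variables over time.

```python
import networkx as nx

# Hypothetical draft for a planned study; the edge list is entirely illustrative.
draft = nx.DiGraph([
    ("prior_knowledge", "tool_use"),
    ("tool_use", "achievement"),
    ("achievement", "motivation"),
    ("motivation", "tool_use"),      # feedback loop drawn by mistake
])

if nx.is_directed_acyclic_graph(draft):
    print("draft satisfies the 'no cycles' rule")
else:
    # find_cycle reports the offending edges; such loops must be resolved,
    # e.g., by splitting variables over time (motivation_t1 -> tool_use_t2).
    print("not acyclic:", nx.find_cycle(draft, orientation="original"))
```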
As a substantive example of using a DAG for study planning, we turn to the student retention scenario described in Hicks et al. (2022). Researchers interested in an outreach intervention’s efficacy in supporting at-risk students may construct a DAG to plan their research study (see Fig. 6).
Although the researchers are only interested in the effect of an outreach intervention on student retention, in consulting their domain knowledge, they quickly realize that their study would be biased if it simply measured whether students were more likely to remain in the study program if they came into contact with an outreach intervention. At the very least, they reason, there will be a mediating variable through which the intervention may exert its hypothetical effect. They suggest this may be learning regulation, a variable they anticipate to be unobservable in the context of their study. Further, outreach interventions do not happen in a vacuum. Instead, students will be contacted if, and only if, they are deemed at risk, as indicated by a lack of study progress. Their at-risk status itself depends on student characteristics like academic capital as well as on some aspect of teaching efficacy. The researchers expect teacher engagement to be the primary driver of teaching efficacy but concede that this, too, will likely be unmeasurable. In consulting with colleagues, they agree that this DAG captures the main variables at play. The central question now is: Is the causal effect of interest identified? If not, what identification strategy could make it so?
The researchers in this example conclude that the effect is not identified. In devising an identification strategy, they note that at-risk status is an important variable, as it is both a mediator and a potential collider. As a mediator (academic capital → at-risk status → outreach intervention), it lies on the path that carries the confounding effect of academic capital (outreach intervention ← at-risk status ← academic capital → learning regulation → retention). As part of an identification strategy, this path could be blocked by adjusting for learning regulation, but because the researchers expect this variable to be unobservable, another strategy is needed. If the researchers decide to adjust for at-risk status, for example, by only looking at students who fall into the at-risk category and leaving out well-performing students, they have adjusted for a collider, opening another non-causal path: outreach intervention ← teacher engagement → [at-risk status] ← academic capital → learning regulation → retention. For this reason, the researchers would be advised to sample the whole student population or a randomized subsample instead of only at-risk students. The confounding due to academic capital can then be addressed by controlling for academic capital itself.
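This reasoning can also be checked mechanically. The sketch below encodes a plausible reconstruction of the retention DAG from the prose above (the exact edge set is given in Fig. 6 and in Hicks et al. (2022); ours is an approximation with node names of our own choosing) and applies the back-door criterion: arrows out of the treatment are removed and d-separation is tested for different adjustment sets. Consistent with the verbal analysis, adjusting for at-risk status fails, while adjusting for academic capital identifies the effect.

```python
import networkx as nx

# Plausible reconstruction of the retention scenario described above.
dag = nx.DiGraph([
    ("teacher_engagement", "outreach"),  ("teacher_engagement", "at_risk"),
    ("academic_capital", "at_risk"),     ("academic_capital", "learning_regulation"),
    ("at_risk", "outreach"),             ("outreach", "learning_regulation"),
    ("learning_regulation", "retention"),
])

# Back-door criterion: delete arrows OUT of the treatment, then ask whether the
# adjustment set d-separates treatment and outcome in the remaining graph.
backdoor_view = dag.copy()
backdoor_view.remove_edges_from(list(dag.out_edges("outreach")))

def backdoor_blocked(adjustment):
    try:                                   # newer networkx releases
        return nx.is_d_separator(backdoor_view, {"outreach"}, {"retention"}, set(adjustment))
    except AttributeError:                 # older releases expose d_separated instead
        return nx.d_separated(backdoor_view, {"outreach"}, {"retention"}, set(adjustment))

print("no adjustment:               ", backdoor_blocked(set()))                 # False: confounded
print("adjust for at_risk:          ", backdoor_blocked({"at_risk"}))           # False: collider opened
print("adjust for academic_capital: ", backdoor_blocked({"academic_capital"}))  # True: effect identified
```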
As practical implications for researchers, this means planning data collection to ensure a sample that includes the whole student population or a representative subset. In addition, researchers would need to search for well-validated measures of academic capital and include such a measure in the research design. The extent to which the instrument is valid and reliable for this student population will determine the reduction of confounding bias in estimating the effect of interest. Crucially, under the assumption that the constructed DAG captures the actual constellation of variables to a high degree and that their identification strategy can be practically implemented, the researchers can move forward with a non-experimental research design while still being relatively confident in drawing causal inferences from their data. While the absence of bias cannot be proven conclusively, the goal is a principled approach to reducing bias. To the extent that the identification strategy convincingly eliminates sources of bias, researchers’ claims of causality, albeit preliminary and subject to further probing, are warranted.
DAGs for analysis
Whether or not DAGs were used in the planning stage, when confronted with the data gleaned from a given quantitative study, DAGs can be used to make principled analytical decisions that avoid bias. Of course, the space of potential decisions is limited by the decisions made in the earlier study planning phase. For example, if a variable was not measured, it cannot be adjusted for in the analysis stage. Even if study planning was done without DAGs, however, using them to reason about bias in the analysis stage can still be valuable for improving causal inferences and reducing bias.
When deciding which variables to include in a statistical model (that is, which variables to control for), there is a frequent appeal to include more rather than fewer third variables (Spector & Brannick, 2010). This applies in particular to observational research and is due to the (occasionally mistaken) assumption that with each covariate included, any bias potentially stemming from this variable is, in a way, cleaned out of the resulting estimate. Implicitly, this further assumes that controlling cannot do harm but only good. This unprincipled, overeager approach, sometimes called garbage-can regression (Achen, 2005), was already criticized by Meehl (1970), and its fallibility is apparent given the graphical causal inference framework presented here (Lee, 2012; Pearl, 2009; Rohrer, 2018). As shown, covariates can indeed eliminate confounding but can also bring trouble if they lead to overcontrol (blocking a mediator) or endogenous selection bias (controlling for a collider).
Put succinctly, there are good controls and bad controls, as well as some more ambiguous ones (Cinelli et al., 2021), and all can be derived directly from the graphical formalism of DAGs. As research has shown, the choice of covariates affects the results of a study more than the choice of analysis procedure (e.g., ANCOVA vs. propensity score matching), and the right set of covariates can yield estimates that closely approximate experimental benchmarks through near-total bias reduction (Steiner et al., 2011).
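A compact simulation (with illustrative coefficients only) contrasts a principled adjustment set with a garbage-can regression that additionally includes a common effect of treatment and outcome; the extra covariate does not merely add noise, it distorts the estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
prior_ability = rng.normal(size=n)                                     # confounder: a good control
treatment = 0.8 * prior_ability + rng.normal(size=n)
outcome = 0.5 * treatment + 0.7 * prior_ability + rng.normal(size=n)   # true effect: 0.5
completion = treatment + outcome + rng.normal(size=n)                  # common effect: a bad control

def slope_of_first_predictor(y, *predictors):
    design = np.column_stack([np.ones_like(y), *predictors])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

print(f"principled (adjust prior ability):     {slope_of_first_predictor(outcome, treatment, prior_ability):.2f}")
print(f"garbage can (also adjust completion):  {slope_of_first_predictor(outcome, treatment, prior_ability, completion):.2f}")
# With these coefficients, the garbage-can estimate is not just noisier; it even flips sign.
```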
As an example of using DAGs in the analysis phase, consider Boerebach et al. (2013), who looked at faculty teaching performance in medical education with a focus on resultant role model types. Outcomes were medical students’ perceptions of faculty as teacher-supervisor, physician, or person role models. Teaching performance incorporated elements of feedback, learning climate, professional attitude, etc. (see also Boerebach et al., 2012). Instead of positing one definitive DAG, they explore the impact of different causal assumptions on estimates of the relationship between teaching performance and role model perceptions. Figure 7 shows one of these DAGs (albeit visually slightly adapted to fit this paper’s style).
Boerebach et al. (2013) use eight DAGs of varying complexity to explore the potential causal relationships between teacher performance and different role model perceptions. To reduce visual clutter, they subsume an array of confounders under variable Z, a practice that should be avoided in actual research contexts but does not pose a problem for our purposes. Crucially, Boerebach et al. (2013) know from previous research that all variables subsumed under Z are, in fact, confounders in that they affect teaching performance as well as role model types. For this reason, statistically adjusting for these variables is necessary. Controlling for all known confounders can lead researchers to interpret their estimate causally if the further assumption of relative completeness and correctness of the DAG is supported. Given the scarcity of empirically or theoretically supported causal assumptions in this line of research, Boerebach et al. (2013) opt to explore these relationships statistically instead. Depending on the underlying DAG and its analytical implications, they find a large variance in the estimated effects. From this, they argue for further research to limit the number of plausible models to better estimate causal effects from non-experimental research.
The issue of underdeveloped theoretical models and insufficient empirical support for definitive DAGs also applies to Educational Technology. Ironically, defending the causal assumptions of a DAG in turn calls for a robust body of causal knowledge. As of now, educational technology research is not yet at the point where it can provide strong defenses along these lines. For any plausible DAG, realistically, there will be competing, similarly plausible DAGs. One intermediate solution that can be used in the analysis phase is sensitivity analysis. To this end, VanderWeele and Ding (2017) proposed the E-value: the minimum strength of association an unmeasured confounder would need to have with both treatment and outcome to fully explain away the effect of interest. For educational technology research, the E-value could be used to assess the plausibility of competing DAGs with regard to being free from confounding. For example, a high E-value for a DAG implies that the associations of the confounders with X and Y would need to be very high to reduce the causal effect to zero. Given typical effect sizes in the educational technology literature, an E-value corresponding to d > 2 makes it implausible that the effect of interest is fully explained by confounding bias. Here, too, substantive knowledge of theory and evidence regarding the specific variables of interest guides the evaluation of plausibility. Researchers may want to assess the E-value for competing DAGs to arrive at an intuition as to the likely degree of unmeasured confounding.
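For readers who want to compute this quantity: for an observed risk ratio RR greater than one, the E-value is RR + sqrt(RR × (RR − 1)) (VanderWeele & Ding, 2017), and that literature reports an approximate conversion RR ≈ exp(0.91 × d) for standardized mean differences. The helper below implements these formulas; the example effect sizes are our own illustrative choices.

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (VanderWeele & Ding, 2017)."""
    rr = max(rr, 1 / rr)          # the direction of the effect does not matter
    return rr + math.sqrt(rr * (rr - 1))

def e_value_from_d(d: float) -> float:
    """Approximate E-value for a standardized mean difference via RR ~= exp(0.91 * d)."""
    return e_value(math.exp(0.91 * abs(d)))

print(round(e_value_from_d(0.4), 2))   # a modest, typical effect size
print(round(e_value_from_d(2.0), 2))   # a very large effect: far more robust to unmeasured confounding
```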
DAGs for literature appraisal
Causal graphs can also be used to critically appraise the available literature and assess the evidence for a particular research question. Researchers can use their expert knowledge and the information from published research studies to construct DAGs post hoc. For example, Weidlich et al. (2022) analyzed published studies from the Learning Analytics literature. By creating plausible DAGs derived from the information presented in the papers and substantive reasoning, they found likely instances of all three causal inference pitfalls, i.e., confounding bias, overcontrol bias, and collider bias. Reasoning about these sources of bias allowed for alternative interpretations of puzzling findings and, in some cases, led to simple proposals to decrease bias in future studies. For example, in line with other commentators, they identified Arnold and Pistilli (2012) as an instance of confounding bias. An analytics-based early warning system, Course Signals, appeared to produce striking effects on retention rates. Missing, however, was the complication that students further along their educational trajectory were more likely to encounter Course Signals, and the number of classes taken directly affects student retention (Fig. 8a). In this case, error-free measurement and, thus, complete adjustment for the confounding variable would have been possible.
In other instances, easy fixes are not available. This is the case when, for example, studies control for a collider by design, e.g., by conducting their analyses only on students who completed a MOOC. The identification of such biases can lead to alternative interpretations of the results. For example, Zhu et al. (2016) looked longitudinally at how students’ social connectedness in a MOOC forum was associated with learning engagement. Due to the high drop-out rate characteristic of MOOCs, they conducted their analyses on an increasingly truncated sample of students. Plausibly, both variables of interest, social connectedness and learning engagement, affect MOOC attrition, making attrition a collider (Fig. 8b). The severity of collider bias would then increase with each passing week, as students successively drop out and the remaining sample becomes increasingly non-representative with respect to the variables of interest.
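This week-by-week logic can be mimicked in a short simulation (entirely synthetic, not the Zhu et al. data): connectedness and engagement are generated independently, dropout depends on both, and the induced correlation in the surviving sample grows as the retention threshold tightens over the weeks.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
connectedness = rng.normal(size=n)
engagement = rng.normal(size=n)                                  # independent of connectedness by construction
persistence = connectedness + engagement + rng.normal(size=n)    # dropout depends on both variables

for week, cutoff in enumerate([-2.0, -1.0, 0.0, 1.0], start=1):
    still_enrolled = persistence > cutoff                        # stricter cutoffs mimic later weeks
    r = np.corrcoef(connectedness[still_enrolled], engagement[still_enrolled])[0, 1]
    print(f"week {week}: n = {still_enrolled.sum():6d}, r = {r:+.2f}")   # r grows more negative over time
```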
Importantly, post-hoc reasoning about bias in the available literature can contribute to the identification of further research gaps. It is possible that an extensive literature has been produced on the effects of, for example, an innovative educational technology on learning outcomes, and yet its causal foundations remain shaky. Critically, this is more than just a question of whether experimental data are available. By reasoning about causality with the help of DAGs, it is possible to conclude that an experiment may have yielded biased causal inferences due to, for example, blocking a mediator, while observational research may have provided stronger evidence for causal claims thanks to an appropriate choice of statistical controls.
When it comes to approaching real-world problems and the big questions of educational technology research (Reeves & Lin, 2020), it is imperative to appraise the available literature from a causal inference perspective to see the progress made specifically with respect to causal knowledge. Systematic efforts to this effect can be found in Haber et al. (2018), where the causal language of academic and media articles is contrasted with the actual strength of inference. Similarly, systematic reviews that incorporate the strength and validity of causal inference using DAGs may also be necessary for emerging and established literatures on the effects of educational-technological innovations, such as the Flipped Classroom (Cheng et al., 2019), Learning Analytics systems (Bodily & Verbert, 2017), classroom technologies (Eutsler et al., 2020), and VR/AR (Di & Zheng, 2022; Buchner & Kerres, 2022).
Discussion and conclusion
Aside from the specific use cases outlined above, the most fundamental asset DAGs bring to the research process is the value of explicit causal thinking. Researchers in the field of Educational Technology will inevitably be faced with the tension between the inferences they would like to make and the reality of their data. The taboo against explicit causal inference (Grosz et al., 2020), which refers to the hesitancy of researchers to think causally and use causal language when dealing with observational data, can also be found in Educational Technology. In this case, researchers may resort to ‘scientific euphemisms’ (Hernán, 2018), using associational language instead of causal language, despite being required to formulate practical implications from their findings. And because there are no implications without causality, papers then display a problematic disconnect between what the researchers wanted to know (causal knowledge), what they ended up doing (knowledge about associations), and the implications they draw from their findings (again causal, but reported via euphemistic language). Using DAGs in different phases of the research process, as outlined in Section “Using DAGs in different phases of educational technology research”, helps resolve this tension by necessitating explicit causal claims and by making transparent the ways in which bias may be present and the remedies to prevent it.
Educational technology research has been criticized for its limited methodological ambition and capacity (Bulfin et al., 2014). At the same time, widespread (quasi-) experimental research-to-prove is under scrutiny (Honebein & Reigeluth, 2021). This may lead educational technology researchers to believe in a binary view in which they may only approach causal questions if their research context allows for randomized experimentation. On the contrary, the past two decades have seen tremendous progress in developing quantitative methodological approaches to draw causal inferences from observational research. Where randomized experiments remove confounders from the picture entirely, research with observational data must account for potential biases by deliberately using state-of-the-art methods and employing substantive knowledge to reason causally. With an explicit goal of studying causal mechanisms (with or without experimental data), researchers may be encouraged to use, for example, propensity score matching, instrumental variable estimation, regression discontinuity, and much more (Antonakis et al., 2010). The conceptual basis for making decisions in light of this methodological pluralism can be found in causal reasoning with DAGs.
Spector (2020) lamented the lack of progress in educational technology, a phenomenon, we argue, that can be traced back to ineffective knowledge cumulation. Following West et al. (2020), we agree that undergirding robust educational technology in research and practice is good theory. We further posit that undergirding good theory is a robust body of causal knowledge, cumulated over time. Because no single study can test all hypotheses and rule out all alternative explanations, it is imperative to build on the work of others. Without clarifying causal claims and the specific assumptions under which they may hold, it is challenging to appraise and evaluate the literature in terms of what is known, what is unknown, and what lies in between. In other words, cumulation and theory-building do not work without explicit causal inference. This may be one of the reasons why authors have claimed the field to be under-theorized and may explain why theories are mainly imported from other disciplines (McDonald & Yanchar, 2020). DAGs require theory and domain knowledge as a basis for their construction. At the same time, the causal assumptions in DAGs increase transparency and clarity while allowing for testing and falsifying prospective theories. Along with rapid developments of causal inference in other fields of research, we suggest that educational technology, too, should put explicit causal questions at the forefront—no matter the research paradigm at hand—in order to provide practical implications and address real-world problems (Reeves & Lin, 2020).
This theoretical paper has outlined a methodology for causal reasoning in educational technology research. Central to it is the construction and interpretation of causal graphs, or DAGs. Causal reasoning with DAGs provides fertile ground for causal inferences that do not rely solely on experimental evidence, a feature crucial to educational technology research, where rigorously controlled experiments may prove difficult. Given the recent advocacy for research-to-improve approaches, this contribution has illustrated the benefits of DAG-based causal reasoning, how it relates to other common research methods, and how it can be used in all stages of research.
References
Achen, C. H. (2005). Let’s put garbage-can regressions and garbage-can probits where they belong. Conflict Management and Peace Science, 22(4), 327–339.
Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making causal claims: A review and recommendations. The Leadership Quarterly, 21(6), 1086–1120.
Arnold, K. E., & Pistilli, M. D. (2012). Course signals at Purdue: Using learning analytics to increase student success. In Proceedings of the 2nd international conference on learning analytics and knowledge (pp. 267–270).
Bodily, R., & Verbert, K. (2017). Review of research on student-facing learning analytics dashboards and educational recommender systems. IEEE Transactions on Learning Technologies, 10(4), 405–418.
Boerebach, B. C., Lombarts, K. M., Keijzer, C., Heineman, M. J., & Arah, O. A. (2012). The teacher, the physician and the person: How faculty’s teaching performance influences their role modelling. PLoS One, 7(3), e32089.
Boerebach, B. C., Lombarts, K. M., Scherpbier, A. J., & Arah, O. A. (2013). The teacher, the physician and the person: Exploring causal connections between teaching performance and role model types using directed acyclic graphs. PLoS One, 8(7), e69449.
Bollen, K. A., & Pearl, J. (2013). Eight myths about causality and structural equation models. In Handbook of causal analysis for social research (pp. 301–328). Springer.
Buchner, J., & Kerres, M. (2022). Media comparison studies dominate comparative research on augmented reality in education. Computers & Education, 195, 104711.
Bulfin, S., Henderson, M., Johnson, N. F., & Selwyn, N. (2014). Methodological capacity within the field of “educational technology” research: An initial investigation. British Journal of Educational Technology, 45(3), 403–414.
Cheng, L., Ritzhaupt, A. D., & Antonenko, P. (2019). Effects of the flipped classroom instructional strategy on students’ learning outcomes: A meta-analysis. Educational Technology Research and Development, 67(4), 793–824.
Cinelli, C., Forney, A., & Pearl, J. (2021). A crash course in good and bad controls. Sociological Methods & Research. https://doi.org/10.1177/00491241221099552
Di, X., & Zheng, X. (2022). A meta-analysis of the impact of virtual technologies on students’ spatial ability. Educational Technology Research and Development, 70(1), 73–98.
Elwert, F., & Winship, C. (2014). Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology, 40, 31.
Eutsler, L., Mitchell, C., Stamm, B., & Kogut, A. (2020). The influence of mobile technologies on preschool and elementary children’s literacy achievement: A systematic review spanning 2007–2019. Educational Technology Research and Development, 68(4), 1739–1768.
Freese, J., & Kevern, J. A. (2013). Types of causes. In Handbook of causal analysis for social research (pp. 27–41). Springer.
Goodboy, A. K., & Kline, R. B. (2017). Statistical and practical concerns with published communication research featuring structural equation modeling. Communication Research Reports, 34(1), 68–77.
Griffith, G. J., Morris, T. T., Tudball, M. J., Herbert, A., Mancano, G., Pike, L., & Hemani, G. (2020). Collider bias undermines our understanding of COVID-19 disease risk and severity. Nature Communications, 11(1), 1–12.
Grosz, M. P., Rohrer, J. M., & Thoemmes, F. (2020). The taboo against explicit causal inference in nonexperimental psychology. Perspectives on Psychological Science, 15(5), 1243–1255.
Haber, N., Smith, E. R., Moscoe, E., Andrews, K., Audy, R., Bell, W., Brennan, A. T., Breskin, A., Kane, J. C., Karra, M., McClure, E. S., Suarez, E. A., & CLAIMS Research Team. (2018). Causal language and strength of inference in academic and media articles shared in social media (CLAIMS): A systematic review. PloS one, 13(5), e0196346.
Hernán, M. A. (2018). The C-word: Scientific euphemisms do not improve causal inference from observational data. American Journal of Public Health, 108(5), 616–619.
Hernán, M. A., & Robins, J. M. (2020). Causal inference: What if. Chapman & Hall/CRC.
Hicks, B., Kitto, K., Payne, L., & Buckingham Shum, S. (2022). Thinking with causal models: A visual formalism for collaboratively crafting assumptions. In LAK22: 12th International Learning Analytics and Knowledge Conference (pp. 250–259).
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960.
Honebein, P. C., & Reigeluth, C. M. (2021). To prove or improve, that is the question: The resurgence of comparative, confounded research between 2010 and 2019. Educational Technology Research and Development, 69(2), 465–496.
Islamoglu, H., Yurdakul, I. K., & Ursavas, O. F. (2021). Pre-service teachers’ acceptance of mobile-technology-supported learning activities. Educational Technology Research and Development, 69(2), 1025–1054.
Lai, C. L., Hwang, G. J., Liang, J. C., & Tsai, C. C. (2016). Differences between mobile learning environmental preferences of high school teachers and students in Taiwan: A structural equation model analysis. Educational Technology Research and Development, 64(3), 533–554.
Lee, E. A. L., Wong, K. W., & Fung, C. C. (2010). How does desktop virtual reality enhance learning outcomes? A structural equation modeling approach. Computers & Education, 55(4), 1424–1442.
Lee, J. J. (2012). Correlation and causation in the study of personality. European Journal of Personality, 26(4), 372–390.
LeMahieu, P. G., Edwards, A. R., & Gomez, L. M. (2015). At the nexus of improvement science and teaching: Introduction to a special section of the Journal of Teacher Education. Journal of Teacher Education, 66(5), 446–449.
Leow, L. P., Phua, L. K., & Teh, S. Y. (2021). Extending the social influence factor: Behavioural intention to increase the usage of information and communication technology-enhanced student-centered teaching methods. Educational Technology Research and Development, 69(3), 1853–1879.
Lewis, C. (2015). What is improvement science? Do we need it in education? Educational Researcher, 44(1), 54–61.
Makransky, G., & Lilleholt, L. (2018). A structural equation modeling investigation of the emotional value of immersive virtual reality in education. Educational Technology Research and Development, 66(5), 1141–1164.
McDonald, J. K., & Yanchar, S. C. (2020). Towards a view of originary theory in instructional design. Educational Technology Research and Development, 68(2), 633–651.
Meehl, P. E. (1970). Nuisance variables and the ex post facto design. In M. Radner & S. Winokur (Eds.), Minnesota studies in the philosophy of science (Vol. 4) analyses of theories and methods of physics and psychology. University of Minnesota Press.
Morgan, S. L., & Winship, C. (2015). Counterfactuals and causal inference. Cambridge University Press.
Munafò, M. R., Tilling, K., Taylor, A. E., Evans, D. M., & Smith, D. (2018). Collider scope: When selection bias can substantially influence observed associations. International Journal of Epidemiology, 47(1), 226–235.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–688.
Pearl, J. (2009). Causality. Cambridge University Press.
Pearl, J. (2012). The causal foundations of structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 1–37). Guilford.
Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal inference in statistics: A primer. Wiley.
Phillips, R., Kennedy, G., & McNaught, C. (2012). The role of theory in learning technology evaluation research. Australasian Journal of Educational Technology. https://doi.org/10.14742/ajet.791
Rahman, T., Kim, Y. S., Noh, M., & Lee, C. K. (2021). A study on the determinants of social media based learning in higher education. Educational Technology Research and Development, 69(2), 1325–1351.
Reeves, T. C., & Lin, L. (2020). The research we have is not the research we need. Educational Technology Research and Development, 68(4), 1991–2001.
Richardson, T. G., Smith, D. G., & Munafò, M. R. (2019). Conditioning on a collider may induce spurious associations: Do the results of Gale et al. (2017) support a health-protective effect of neuroticism in population subgroups? Psychological Science, 30(4), 629–632.
Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42.
Salomon, G. (1991). Transcending the qualitative-quantitative debate: The analytic and systemic approaches to educational research. Educational Researcher, 20(6), 10–18.
Spector, P. E., & Brannick, M. T. (2010). Common method issues: An introduction to the feature topic in organizational research methods. Organizational Research Methods, 13(3), 403–406.
Spector, J. M. (2020). Remarks on progress in educational technology. Educational Technology Research and Development, 68, 833–836.
Steiner, P. M., Cook, T. D., & Shadish, W. R. (2011). On the importance of reliable covariate measurement in selection bias adjustments using propensity scores. Journal of Educational and Behavioral Statistics, 36(2), 213–236.
Teo, T., Tsai, L. T., & Yang, C. C. (2013). Applying structural equation modeling (SEM) in educational research: An introduction. In Application of structural equation modeling in educational research and practice (pp. 1–21). Brill.
VanderWeele, T. J. (2012). Invited commentary: Structural equation models and epidemiologic analysis. American Journal of Epidemiology, 176(7), 608–612.
VanderWeele, T. J., & Ding, P. (2017). Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4), 268–274.
Weidlich, J., Gašević, D., & Drachsler, H. (2022). Causal inference and bias in learning analytics: A primer on pitfalls using directed acyclic graphs. Journal of Learning Analytics, 9(3), 183–199.
West, R. E., Ertmer, P., & McKenney, S. (2020). The crucial role of theoretical scholarship for learning design and technology. Educational Technology Research and Development, 68(2), 593–600.
Zhu, M., Bergner, Y., Zhang, Y., Baker, R., Wang, Y., & Paquette, L. (2016). Longitudinal engagement, performance, and social connectivity: A MOOC case study using exponential random graph models. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge (pp. 223–230).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Conflict of interest
The authors report no conflicts of interest for this manuscript.
Ethics approval and consent to participate
Research involving human participants and/or animals: The authors report that neither humans nor animals were involved in this research, as this is a theoretical paper without empirical data.
Informed consent
The authors report that informed consent does not apply here as this is a theoretical research paper without empirical data.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.