FormalPara Learning Objectives

After reading this chapter, you should be able to:

  • Explain the epistemological basis of PT and its focus on theories about causal mechanisms.

  • Carry out PT on a case study and use evidence from that case study to update your initial estimates of the likelihoods that alternative explanations of the outcomes of the case are true.

  • Follow best practices of PT.

  • Identify the kinds of research questions and contexts in which PT is most useful.

  • Understand the Bayesian logic that underlies PT inferences.

  • Understand the strengths and limits of PT as a method of causal inference.

8.1 Introduction

Policymakers often need to assess the likely outcomes of alternative policies. To do so, they frequently need to develop causal understandings of past outcomes in situations where few cases exist and experiments are not possible for ethical or financial reasons. Process tracing (PT), a technique of within-case analysis analogous to detective work or medical diagnosis, is a key method of causal inference in individual cases. The goal is to explain the outcome of a single case, and as in detective work, the researcher works with both “suspects” (theories that provide potential alternative explanations for the outcome of a case) and “clues” (evidence or diagnostic tests).

Case studies have a long history—implicitly, they have been the primary method for historians and political observers since the Greek historian Thucydides wrote his chronicles in the fifth century BC. Many case studies have been done without much methodological rigor, however, which has given case study methods a bad reputation in some fields of research. In the past two decades, methodologists in political science and sociology have greatly improved and systematized case study methods, particularly the method of PT. This includes efforts to both refine case study methods and disseminate them to researchers through organizations, such as the American Political Science Association’s section on Qualitative and Multi-Method Research, and training programs, including those sponsored by the Institute for Qualitative and Multimethod Research (IQMR) at Syracuse University, the European Consortium for Political Research (ECPR), the Global School on Empirical Research Methods (GSERM) at the University of St. Gallen, summer schools at the University of Oslo and the University of Essex, and MethodsNet.

The present chapter gives an overview of PT and recent innovations in this method. It begins with a discussion of the epistemic assumptions of PT, building on Daniel Little’s Chap. 2 in this volume. It then defines PT and outlines best practices on how to do it, illustrating these with examples of case study research on the COVID pandemic. Next, the chapter assesses the comparative advantages of PT vis-à-vis other methods, including some of those addressed in the other chapters in this volume. This section also identifies the kinds of research questions and research contexts for which PT is most useful. The chapter then outlines two new developments in PT methods: formal Bayesian PT, and the use of causal models in the form of Directed Acyclic Graphs to assist in PT and to integrate qualitative and quantitative evidence. The chapter concludes with the strengths and limits of the method.

8.2 The Epistemic Foundations of Process Tracing

For policy purposes as well as academic theoretical progress, we need causal knowledge: what will be the outcome if we try policy X or if X happens in the world? Yet all research methods confront what has been called the “fundamental problem of causal inference”: we cannot rerun history after trying policy X, or after X happens in the world, and observe the outcome in the absence of X, while holding all other variables and historical developments constant.

Although no method can fully surmount this problem, scholars have outlined four general approaches to causation and associated methodological approaches to causal inference: regularity, counterfactual analysis, manipulation/experiments, and the causal mechanism account (Brady, 2008; see Chap. 2). The regularity approach, which Henry Brady calls “neo-Humean” after the philosopher David Hume, focuses on what Hume called “constant conjunction,” or what we now call correlation as the key to scientific explanation (Brady, 2008). The well-known limitation of this approach is that correlation does not equal causation. Even when observational data is plentiful, and robust correlations convince us that some causal relationship probably exists, the nature of the process that generates the correlations may be unknown, and the direction of causation—does A cause B, or does B or the expectation of B cause A—is not always certain. Statistical analyses also face the “ecological inference problem”: even if a correlation is causal, it does not necessarily explain any individual case in the population under study. A medicine could be helpful on average, for example, and at the same time be lethal to those who have an allergy to that medicine.

The counterfactual approach, and associated “potential outcomes” methods, posit that something is a cause if it satisfies the following: “if A then B, if not A then not B” (or, if not A then B does not happen in the same way, at the same time, or with the same magnitude). This definition of causation is intuitively appealing as a kind of common-sense understanding of causation, but it is more a thought experiment than a method of inference because we cannot observe counterfactual outcomes. In addition, while counterfactuals offer an intuitively appealing account of causation, they are also intuitively unsatisfying, and a weaker guide to policy choices in other cases, if they lack some account of the process through which the observed outcome arose (and that through which the unobserved counterfactual could have arisen).

The manipulation or experimental approach works to get as close as possible to observing the counterfactual outcome. It does so by selecting a “control” case or unit (or many randomly selected control cases or units) on which no manipulation is performed, and comparing its outcome to that of a case or unit that is as similar as possible to the control unit except that it has been subject to some treatment (or, if there are many randomly selected cases, a comparison is made to a randomly selected treatment group).

This gets around some of the limitations of observational statistical analyses, but experiments have many demanding requirements or assumptions that must be met to be internally and externally valid. By one account, 26 requirements must be met for an experiment to allow a valid causal inference, including that random assignment has been properly done, that the proper statistical test is applied, that the sample size is sufficiently large, that there is no “compensatory rivalry” (which can happen if experimental subjects find out which group they have been assigned to and try harder to achieve a favorable outcome), and that there are no treatments that occur apart from the specific one under study (Cook, 2018). Even when these assumptions are met, an experiment may or may not get us much closer to understanding the processes that generate the observed outcome(s), which limits our ability to anticipate the scope conditions under which the causal relationship holds. In addition, for many important policy challenges, experiments are impractical, a point elaborated below. Even when field experiments are possible or historical processes provide “natural” experiments with nearly random assignment of individuals to some “treatment,” experiments outside of a controlled laboratory setting introduce many potential confounding variables that make it difficult to satisfy the assumptions necessary for causal inference.

The fourth approach, focusing on causal mechanisms and their capacities, provides the epistemological basis for PT (see Chap. 2, herein). In one much-cited definition, causal mechanisms can be thought of as “entities and activities, organized such that they are productive of regular changes” (Machamer et al., 2000). Causal mechanisms are the ontological entities in the world that generate the outcomes we observe, and we attempt to model these mechanisms with theories. This approach is consistent with and, in some sense, more fundamental than the others outlined above, as it includes a focus on the activities or processes that create correlations, that make experiments work, and that explain both actual and, if we could observe them, counterfactual outcomes. It is the regularity of causal mechanisms, or what some have called “invariance,” that gives them explanatory power.Footnote 1 Put another way, causal mechanisms cannot be “turned off” when the conditions that enable their operation exist.

Unlike some approaches to explanation, the causal mechanisms view rejects “as if” theoretical assumptions, or assertions that theories need not be consistent with more micro-level processes as long as these theories are predictively accurate “as if” their stated or implicit micro-mechanisms were true. In a causal mechanisms approach to explanation, theories must be consistent with the evidence at lower levels of analysis or smaller slices of space and time. We may, for pragmatic reasons, consider a simplified theory adequate for some policy purposes even if it does not give details on micro-level processes, but we do so knowing that a theory that is more consistent with the details at the next level down has greater accuracy and might lead to more nuanced policy prescriptions. The 1960s theory that “smoking can cause cancer,” for example, was sufficient for the public health policy advice “don’t smoke,” even though the detailed processes relating smoking to cancer were unknown at the time. We now have a more detailed theory about smoking and cancer that allows more precise policy prescriptions, such as “people with a mutation at a specific region on chromosome 15 are at a particularly high risk of cancer if they smoke.” Theories on macro-level social processes and outcomes can be useful, and for some purposes, it may be more efficient to do PT at the macro level, but if macro-level theories work through lower levels of analysis like individuals’ choices, they must still be consistent with the processes through which those choices are made to be considered as accurate as possible.

PT exploits this aspect of mechanistic explanations by generating and assessing evidence, sometimes in detailed slices of space and time, on the explicit or implicit processes hypothesized by alternative explanations for the outcomes of individual cases. It thus takes advantage of two sources of evidence and inference that Hume did not include as core features of his constant conjunction account: contiguity and sequencing. Contiguity concerns entities in spatial proximity, bumping into each other or exchanging information—in social phenomena, who said or did what to whom. Sequencing uses the order in which things happened to help make inferences to the best explanation of the outcomes of cases—although it can be empirically hard to tell which of two parties escalated a confrontation, for example, the order in which it happened matters in explaining the outcome.

The focus on evidence on hypothesized processes raises three challenges for PT: how far down must we go into the details of processes? when should we stop gathering evidence? and how far back in time should we go to provide adequate explanations? Unfortunately, while Bayesian logic, outlined below, provides answers to these questions, they are rather general: we stop pushing into more detailed observations, gathering additional evidence, or probing earlier points in time when we think it is unlikely that doing so will change our confidence in the likelihood of alternative explanations sufficiently to be worth the effort it would entail. Put another way, process tracers balance two risks:

  1. Of stopping the collection and analysis of evidence too soon, when just a little more effort would have provided evidence that would convince us of a different explanation, and

  2. Of stopping too late, expending effort that does not change our confidence in alternative explanations of the outcome.

On a more pragmatic level, at some point, social scientists leave the study of more detailed social and psychological processes to other fields of study that have the skills and equipment to gather and assess evidence on these processes: cognitive psychology, neuroscience, microbiology, and so on. But we should—and do—pay at least some attention to the research at these lower levels of analysis because findings inconsistent with our theories indicate that we need to modify those theories. In the fields of economics and political science, for example, numerous theories build on researchFootnote 2 that demonstrates how human decision-making often involves cognitive biases that depart from the assumptions of earlier rational choice models.

8.3 Process Tracing Best Practices and Examples from COVID Research

8.3.1 Definition of Process Tracing

PT is the gathering and “analysis of evidence on processes, sequences, and conjunctures of events within a case for the purposes of either developing or testing hypotheses about causal mechanisms that might causally explain the case” (Bennett & Checkel, 2015:7).

Bayesian logic is the underlying foundation of PT. Bayesianism in PT treats probabilities as degrees of belief in alternative explanations.Footnote 3 In this approach, we use our existing background knowledge to form initial degrees of belief in alternative explanations of the outcome of a case (called the “priors”), and then analyze evidence to form updated degrees of belief, now conditioned on the evidence (called the “posteriors”). The relative probability of evidence under the explanations is called the “likelihood” (or, when comparing two explanations, the “likelihood ratio”). Bayesianism uses the laws of probability to convert the likelihood of the evidence conditioned on the explanations to the posteriors, or the likelihood of the explanations conditioned on the evidence.

In mathematical symbols, Bayes Theorem outlining this process of updating can be expressed as in Eq. (8.1):

$$ \Pr\left(P \mid k\right)=\frac{\Pr(P)\,\Pr\left(k \mid P\right)}{\Pr(P)\,\Pr\left(k \mid P\right)+\Pr\left(\sim P\right)\,\Pr\left(k \mid \sim P\right)} $$
(8.1)

where

  • Pr(P | k) is the posterior or updated probability of proposition P given (or conditional on) evidence k.

  • Pr(P) is the prior probability that proposition P is true.

  • Pr(k | P) is the likelihood of evidence k if P is true (or conditional on P).

  • Pr(~P) is the prior probability that proposition P is false.

  • Pr(k | ~P) is the likelihood of evidence k if proposition P is false (or conditional on ~P).

A mathematically equivalent equation, known as the “odds” form of Bayes Theorem, which in some ways is easier to work with, is as follows:

$$ Posterior\ Odds\ Ratio = Likelihood\ Ratio\ \cdot\ Prior\ Odds\ Ratio $$

where the Likelihood Ratio is the probability of finding evidence k conditional on P being true divided by the likelihood of k conditional on P being false. In the notation of probability, the equivalent equation reads as in (8.2):

$$ \frac{\Pr\left(P \mid k\right)}{\Pr\left(\sim P \mid k\right)}=\frac{\Pr\left(k \mid P\right)}{\Pr\left(k \mid \sim P\right)}\cdot \frac{\Pr(P)}{\Pr\left(\sim P\right)} $$
(8.2)
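To illustrate with purely hypothetical numbers (they are not drawn from any study discussed in this chapter): suppose two rival explanations are initially judged equally plausible, so the prior odds are 1:1, and a piece of evidence k is judged nine times more likely if P is true than if P is false, for example Pr(k | P) = 0.9 and Pr(k | ~P) = 0.1. Applying Eq. (8.2),

$$ \frac{\Pr\left(P \mid k\right)}{\Pr\left(\sim P \mid k\right)}=\frac{0.9}{0.1}\cdot \frac{0.5}{0.5}=9, $$

so the posterior odds are 9:1, corresponding to a posterior probability of Pr(P | k) = 9/(9 + 1) = 0.9.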

An intuitive way to understand Bayesian logic is to think of the strength of evidence, or the relative likelihood of finding a particular piece of evidence under alternative explanations. Evidence that is much more likely under one explanation than under another has high probative value. We already have a colloquial language for the strength of evidence (Van Evera, 1997: 31–32): evidence can constitute “smoking gun” tests, “hoop” tests, “doubly decisive” tests, or “straw in the wind” tests.

  • A smoking gun piece of evidence is information that strongly affirms an explanation if the evidence proves to exist, but only weakly undermines that explanation if the evidence is not found. The metaphor here is that if a smoking gun is found in the hand of a murder suspect immediately after a shot is heard and the victim’s body falls, then that suspect is very likely to be the murderer. The failure to find a smoking gun in the hand of a suspect, however, does not exonerate that suspect.

  • Hoop tests involve strong evidence that is asymmetric in the other direction. Passing a hoop test means an explanation is still a viable candidate, but it only slightly increases the probability that the explanation is true. Failing a hoop test, on the other hand, greatly undermines our confidence in an explanation. If a murder suspect was in a different city from the victim at the time of the murder, for example, the suspect is exonerated, as the “guilty” hypothesis has failed a hoop test. But finding that the suspect was in the same city as the victim does not greatly incriminate the suspect, as many people were in the city at the time.

  • Doubly decisive tests are symmetrical: they are strong at both affirming one explanation and casting doubt on others. An example here is a bank video camera that catches the face of a robber, incriminating them and exonerating others at the same time.

  • Straw in the wind tests are symmetrical but weak—in court cases, we refer to them as “circumstantial evidence.” The labels and descriptions of these four kinds of evidence are useful for teaching and understanding Bayesian logic, but it is also important to note that they are points on a continuum: the relative probability of evidence under alternative explanations can range from zero to one, and evidence can have different degrees of (a)symmetry.
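To make the asymmetry of these test types concrete, the following is a minimal sketch in Python. The likelihood values are illustrative assumptions rather than estimates from any real case: a smoking-gun test is modeled as evidence that is rarely found when the explanation is false, and a hoop test as evidence that is almost always found when the explanation is true.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h, found=True):
    """Update a single hypothesis H against its negation ~H via Bayes Theorem.

    prior           -- prior probability that H is true
    p_e_given_h     -- probability of observing the evidence if H is true
    p_e_given_not_h -- probability of observing the evidence if H is false
    found           -- whether the evidence was actually observed
    """
    if not found:  # if the evidence is absent, use the complementary likelihoods
        p_e_given_h, p_e_given_not_h = 1 - p_e_given_h, 1 - p_e_given_not_h
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)

prior = 0.5  # illustrative starting point: indifferent between H and ~H

# Smoking-gun test (hypothetical likelihoods): evidence rarely found unless H is true.
print(posterior(prior, 0.40, 0.02, found=True))   # ~0.95: strong confirmation if found
print(posterior(prior, 0.40, 0.02, found=False))  # ~0.38: only mild disconfirmation if absent

# Hoop test (hypothetical likelihoods): evidence almost always found if H is true.
print(posterior(prior, 0.98, 0.60, found=True))   # ~0.62: mild confirmation if passed
print(posterior(prior, 0.98, 0.60, found=False))  # ~0.05: strong disconfirmation if failed
```

Making the two likelihoods extreme in both directions would model a doubly decisive test, while likelihoods close to one another would model a straw in the wind.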

8.3.2 How to Do Process Tracing

A brief outline of how to do PT is as follows:

  • First, identify the dependent variable or outcome to be explained and develop some candidate theories that might explain the outcome of interest, together with their associated independent variables.

  • Second, after gaining at least preliminary knowledge of the values of the independent and dependent variables of cases in the population of interest, select the case or cases on which to do PT.Footnote 4 There are many rationales for different types of case selection in small-n research, depending on the research objective. While a full discussion of case selection is beyond the scope of the present chapter (see Gerring & Seawright, 2008), as one example, if the goal is to try to identify processes or variables omitted from extant theories or models, it can be useful to study an outlier or deviant case that does not fit existing theories or statistical models.

  • Third, after selecting the case or cases for PT, revisit the initial candidate theories and develop a more precise set of mutually exclusive and exhaustive potential explanations of the outcomes of the particular cases to be studied. This might include some potentially causal features of the individual cases that were not initially considered among the general candidate theories.

  • Fourth, make a preliminary estimate of the likelihood that each explanation is true (the “prior” in the Bayesian logic that underlies PT).

  • Fifth, derive the observable implications of each alternative theory for each case, asking: “what specific and concrete processes must have operated, in what sequence, if this theory explains the case, and what kind of potentially accessible evidence would those processes leave behind? What evidence would be true if each theory is not a valid explanation of the outcome of the case?”

  • Sixth, gather the evidence and weigh its likelihood under the alternative explanations. When evidence is more likely to be true under one explanation than under the others, it increases our confidence that the first explanation is true. The most powerful kind of evidence is that which is far more likely under one theory or explanation than the others. Such evidence allows the researcher to strongly update their degrees of confidence in alternative possible explanations for the outcome.

  • Finally, weigh the totality of the evidence, including both strong and weak evidence, and update the prior estimate of each explanation’s likelihood of being true to produce a new posterior estimate (a minimal computational sketch of these last steps follows this list).
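As a minimal sketch of steps four through seven, assuming for simplicity that the pieces of evidence are conditionally independent given each explanation (a judgment the researcher must make case by case), and using hypothetical explanation labels, priors, and likelihoods:

```python
# Step 4: priors over a hypothetical, mutually exclusive and exhaustive set of explanations.
priors = {"H1": 0.4, "H2": 0.4, "H3": 0.2}

# Steps 5-6: the judged likelihood of each piece of evidence under each explanation.
# These numbers are illustrative judgments, not measurements.
likelihoods = {
    "e1": {"H1": 0.8, "H2": 0.3, "H3": 0.3},
    "e2": {"H1": 0.7, "H2": 0.7, "H3": 0.1},
}

# Step 7: weigh the totality of the evidence and normalize over the exhaustive
# set of explanations to obtain posterior probabilities.
unnormalized = {}
for h, prior in priors.items():
    weight = prior
    for evidence in likelihoods.values():
        weight *= evidence[h]  # multiplication assumes conditional independence
    unnormalized[h] = weight

total = sum(unnormalized.values())
posteriors = {h: w / total for h, w in unnormalized.items()}
print(posteriors)  # roughly {'H1': 0.71, 'H2': 0.27, 'H3': 0.02}
```

If a new explanation is added inductively, as described below, the same calculation is simply rerun with revised priors and with the likelihood of every piece of evidence reassessed under the expanded set of explanations.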

Thus far, this account outlines the deductive side of PT. In addition, PT has an inductive side. Any unanticipated evidence that appears to perhaps play a causal role but does not fit any of the candidate explanations might provide the basis for a new explanation of the case. When a researcher adds a new alternative explanation, it is necessary to re-estimate the priors of the revised set of explanations, re-estimate the likelihood of evidence under each explanation relative to the others, and re-weigh the totality of the evidence to update the likelihood that each of the alternative explanations is true.

Bayesian logic in PT helps dispel a common misconception about the validity of different kinds of iterations between theories and evidence. Methodologists often argue that a researcher cannot develop a theory from a case and then test it against that same case. There is a good rationale for this injunction in frequentist statistical methods, as a theory derived from correlations found in a population sample cannot legitimately be tested against that same population sample, as the probability of disproving the new theory is zero. Using Bayesian logic in PT, however, makes it possible to derive a theory from a piece of evidence and then test that theory in the same case (Fairfield & Charman, 2018). There are two reasons for this, one incontrovertible and one more contestable. The incontrovertible reason is that it is often possible to develop a theory from a case and then to test it against different, independent, and heretofore unexamined evidence from the case that could still prove the new theory to be wrong. Detectives and doctors do this all the time—a doctor might find one piece of diagnostic evidence that suggests a patient might be afflicted by a disease the doctor had not previously considered, and this insight can lead to additional diagnostic tests on the same patient. If the new tests are based on biological relationships that are independent of the first test, they can either affirm or disconfirm the new candidate diagnosis. It would be nonsensical to argue that the new diagnosis should be tested on a different patient to find out why the first patient is ill.

The second rationale for developing and testing a theory in the same case is more ambitious and contestable—it argues that it is legitimate to derive a theory from a piece of evidence in a case and to claim that this same evidence can be a severe test of the theory. In Bayesianism, it does not matter whether one first identifies an explanation and then assesses the likelihood of evidence under that explanation relative to rival explanations, or first derives a theory from evidence and then assesses the relative likelihood of that evidence vis-à-vis the new explanation and its rivals. Evidence that is consistent with one explanation and inconsistent with its rivals is strong evidence in favor of the explanation, no matter when or how the explanation was derived (Fairfield & Charman, 2022). To use an analogy, if a detective thought an aggrieved business associate was the most likely suspect in a robbery, but then found a video recording of the crime scene showing a neighbor whom she had not previously suspected carrying out the crime, the very evidence that turned attention to the new suspect would also be powerful evidence for a conviction. The counterpoint to the unqualified application of this view is that humans are subject to potential confirmation bias, and it may be harder to objectively assess the likelihood of less definitive evidence under alternative explanations once the evidence is known to be true. Either way, Bayesian logic dictates that when we develop a new explanation or theory, we have to go back and re-evaluate all the evidence we gathered earlier, assessing its likelihood under the new theory in comparison to its likelihood under the theoretical explanations we had already considered.

8.3.3 Best Practices in Process Tracing

This chapter outlines more recent and formal Bayesian ways of carrying out PT in the section below on new developments. Here, it turns to pragmatic advice about best practices in both informal and formal Bayesian PT. These practices are summarized in Table 8.1 (from Bennett & Checkel, 2015:21), and briefly elaborated below.

Table 8.1 Best practices in PT

8.3.3.1 Cast the Net Widely for Alternative Explanations

It is important to consider a wide range of alternative explanations. Considering a few additional explanations that quickly prove to be weak and deserve only a footnote costs some extra time and effort, but leaving out a viable explanation skews the analysis of the likelihood of the evidence and jeopardizes inferences from a case study. How do we know whether we have considered a sufficiently wide range of alternative explanations? I present here several “checklists” of common sources of potential social explanations as a pragmatic guide.

First, we can look to “off-the-shelf” theories academics have applied to similar questions, participants’ and stakeholders’ explanations for events and outcomes, historians’ and area and functional experts’ explanations, and the implicit or explicit explanations offered by news reporters (Bennett & Checkel, 2015: 23).

Second, the literature on quasi-experiments and program evaluation identifies many general explanations to consider. These include the followingFootnote 5:

  • Theory of change: the implicit or explicit theory that is the basis for a policy that seeks a change in outcomes.

  • History: exogenous events (events outside of the scope of the theories or explanations that a researcher is applying to a case) during the period under study that can affect outcomes (such as economic cycles, elections, natural disasters, wars, etc.).

  • Maturation: individuals might go through aging processes that improve or degrade outcomes or policy effects over time.

  • Instrumentation: changes in measurement instruments or technologies can affect the assessment of outcomes.

  • Testing: exposure to testing or assessment can change the way stakeholders respond to events or policies.

  • Mortality: there may be selection bias regarding which stakeholders or recipients drop out of a population being studied.

  • Sequencing: the order in which events happen or program treatments are implemented may affect outcomes.

  • Selection: if acceptance into a program or population is not random—for example, if the program chooses to address the easiest cases first (low-hanging fruit) or the hardest cases first (triage), there can be selection bias.

  • Diffusion: if stakeholders interact with each other, this can affect results of a policy or program.

  • Design contamination: competition among stakeholders can affect outcomes; those not selected as beneficiaries of a policy might try harder to improve their own outcomes, or they might become demoralized and not try as hard to succeed.

  • Multiple treatments: if governments or other organizations are administering other programs at the same time, or if the program being evaluated includes multiple treatments, this can affect outcomes.

A third checklist of explanations to consider includes four kinds of agent–structure relations: (1) agents affecting structures; (2) structures enabling or constraining agents; (3) agent to agent interactions; and (4) structure to structure relationships (like demographic change). These four kinds of agent–structure relations intersect with three broad families of social and political theories focused on (1) ideas/identities/social relations; (2) material resources and incentives; and (3) institutional transactions costs/functional efficiency. The resulting matrix encompasses 12 common kinds of theories. For example, the functional efficiency family of theories includes agents emulating other agents whom they view as successful, structures selecting out efficient agents as in evolutionary selection, functional competition among agents creating market or balance of power structures, and structure to structure processes like adverse selection (see Bennett, 2013; Bennett & Mishkin, 2023, for elaboration).

It is important to note that the requirement for mutual exclusivity among candidate explanations is often misunderstood (Bennett et al., 2021, cfr. Zaks, 2020). Mutual exclusivity can always be set up by explanations that point to different independent variables as the primary or most important variable in determining the outcome—only one variable can be the main one. It can also take the form of explanations that draw on different variables, but this does not have to be the case. Mutual exclusivity does not require that explanations be monocausal, and it does not prohibit explanations that draw on some or even all of the same variables. Explanations can involve as many variables as a researcher wants, in any functional forms or relationships the researcher wants to specify. They can also use exactly the same variables but just pose different possible functional relations among them. For example, an internal combustion engine needs four things to function: fuel, oxygen, a spark, and compression. These same four things could produce failure to function in different combinations or functional relationships. It may be that an engine does not turn over because the spark plug and piston rings are both a bit worn, the fuel is low octane or has some contaminants, and the air intake is a bit clogged, in such a way that improving any one of these would be enough to get the engine to turn over. Or maybe, two of these components are fine and two are just faulty enough that together they prevent the engine from turning over.

In addition, the aspiration or claim to have achieved an exhaustive set of alternative explanations is always provisional. We can never be sure that the candidate explanations are exhaustive because it is always possible that the true explanation is one we have not considered or discovered. We cannot include an explanation we have not conceived. This is one reason that Bayesians are never 100% confident that they have identified the correct explanation for an outcome.

8.3.3.2 Be Equally Tough on the Alternative Explanations

It is tempting to pick a “favorite” explanation early in a research project, but it is important to resist this temptation, as it can lead to confirmation bias. The alternative explanations should be plausible—if they are not plausible, they need to be reformulated or other explanations need to be considered. One of the ways that rigorous methods work is that they help us, or even force us, to guard against our own confirmation biases.

In PT, this takes the form of thinking through the observable implications for all of the hypotheses. This includes asking for each explanation “what would be the observable implications about the process and sequence in the case if this explanation is true”—a question that comes naturally due to the way our brains work. It also includes asking “what would be true if this explanation is false”—a question we might overlook if PT methods did not require us to address it.

It is also important to do PT in relatively equal depth on each of the alternative hypotheses. Otherwise, there is an inclination to favor one hypothesis or another, to keep looking for confirming evidence for that explanation until it is found, and to stop looking for PT evidence on the alternative explanations after finding one or a few pieces of evidence that make them less likely.

8.3.3.3 Consider the Potential Biases of Evidentiary Sources

Documentary records can be biased by the preferences or instrumental goals of the people who made them regarding what they want to record, keep, and make available. Interviewees can have instrumental goals or motivated biases as well. They can also have unmotivated biases—recalled memories can be inaccurate, and the interviewee may have had access to some information streams and not others at the time of the events being studied. One way to take such potential biases into account is to discount the weight of evidence that could be subject to these biases.

8.3.3.4 Consider Whether the Case Is Most or Least Likely for Alternative Explanations

This recommended practice relates to the estimation of the case-specific priors on the alternative explanations.

When an explanation has a high prior (a most-likely case), but there is strong evidence in the case that the explanation is not correct, this might not only affect our explanation of the case at hand—it might lead us to narrow the scope conditions of the failed explanation and lower its prior for similar cases. Conversely, if the evidence from a case strongly supports an explanation that had a low prior, this might lead us to widen the scope conditions of this explanation and increase its prior for similar cases.

It is also useful at times to pick cases in which some of the explanations usually offered for the kind of case being studied simply cannot apply because their key variables or enabling scope conditions were not present. This can simplify the PT on such cases as it reduces the number of explanations on which PT is necessary.

8.3.3.5 Make a Justifiable Decision on When to Start

As discussed above in the section on epistemology, there is no general rule for selecting the temporal starting point for a case study. Often, it is useful to start at a critical juncture at which a key choice was made among alternative policies or at which a strong exogenous shock occurred. But the choice of a temporal starting point also depends on whether we want to study deep, structural, and often, slow-moving causes or shorter-term, proximate causes that often relate more to agency than to structures.

Either way, the researcher must balance the costs and risks of going too far back in time, which increases the time and effort required for the PT, versus those of not going sufficiently far into the past, which risks overlooking important earlier causes that set in motion later mediating causes that explain less of the variation in outcomes across cases.

8.3.3.6 Be Relentless in Getting Diverse Evidence, but Make a Justifiable Decision on When to Stop

Here again there is no precise general rule: the researcher must balance the costs and risks of stopping the collection of evidence too soon, when a little more evidence could have greatly changed our confidence in the explanations, versus those of stopping too late, which leads to wasted time and effort and little additional updating on the alternative explanations.

Bayesian logic adds a little more specificity to this broad advice, as it indicates that after you have examined a lot of the same kind of evidence, each additional piece of that kind of evidence has a low probability of surprising you or pushing you to update your beliefs on the likelihoods that alternative explanations are true. This is because similar evidence has already been taken into account or used for updating. However, different kinds of evidence that have not been so exhaustively examined are more likely to lead to significant updating on the alternative explanations.
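One way to express this point in the notation of Eq. (8.2) is that the weight of each new piece of evidence should be judged conditional on the evidence already analyzed: once many observations of the same kind have been incorporated, the next one is roughly as probable under one explanation as under another, so its likelihood ratio is close to one and the posterior odds barely move:

$$ \frac{\Pr\left(k_n \mid P,\ k_1,\dots,k_{n-1}\right)}{\Pr\left(k_n \mid \sim P,\ k_1,\dots,k_{n-1}\right)}\approx 1 $$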

8.3.3.7 Combine PT with Case Comparisons if Relevant

While PT is a within-case method, it can be fruitfully combined with comparative case studies to strengthen causal inferences and clarify the scope conditions of explanations. A particularly powerful combination is the use of PT on “most-similar” and “most-different” cases.

Most-similar cases are the same (or at least roughly the same)Footnote 6 in the values of all but one of the independent variables and they have different values on the dependent variable. This provides some evidence that the difference on the one independent variable may cause the difference on the dependent variable, but this inference is provisional, since there may be other potentially causal factors that differ between the two cases and that are not included among the independent variables. It is thus useful to apply PT both to assess whether there is a pathway through which the value on the independent variable that differs leads to the outcomes of the two cases and to assess whether the other potentially causal factors that differ do not lead to or cause the outcomes.

Conversely, a most-different (least-similar) case comparison involves two cases with the same value on the dependent variable and only one independent variable that has the same value. Here, PT can assess whether the common independent variable leads to the outcomes and whether other shared potentially causal factors do not.

8.3.3.8 Be Open to Inductive Insights

PT is most efficient when the researcher first develops a set of candidate explanations as described in (1) above and identifies their observable implications and the associated evidence to gather. The deductive effort this requires is quick and inexpensive compared to the field, interview, or archival work of actually gathering the evidence. At the same time, it is important to remain alert for evidence that suggests possible causal processes not included in the initial set of explanations.

The feeling of puzzlement or surprise at an unexpected or unanticipated piece of evidence can lead to the development of a new explanation of a case for which the researcher can identify new observable implications on which to seek evidence. For this reason, it is often useful to do some initial open-ended research on a case—a process that some have called “soaking and poking”—as researchers immerse themselves in a case.

This is not the same as trying to approach a case without preconceptions, as some suggest in the grounded-theory or other traditionsFootnote 7: soaking and poking is still preceded by developing a set of theories, and unexpected evidence emerges against the background of those theories. In other words, we recognize evidence as puzzling because it does not fit any of our candidate explanations well. In practice, there can be many iterations between the explanations and the evidence (Fairfield & Charman, 2018).

8.3.3.9 Use Deduction to Infer What Must Be True if a Hypothesis Is True

While deductively deriving the observable implications of a theory is fast and easy compared to gathering evidence, it is still challenging and contestable. Theories are usually not sufficiently detailed to immediately identify their observable implications in a particular case. This means that researchers and their readers or critics will not always agree on what the observable implications are for an explanation.

The best that a researcher can do here is to be clear and explicit about the implications they derived from a theoretical explanation and the logic through which they derived them. It is also possible to entertain alternative readings of the implications of a theory, and to factor into the conclusions whether some or all of these proved true. If the evidence was consistent with both of two possible interpretations of a theory, for example, then the theory is likely to be true regardless of which interpretation one uses.

To identify observable implications, it is necessary to mentally inhabit the hypothetical world in which the explanation is true and imagine very concretely the specific steps, sequences, and processes through which the explanation’s independent variable(s) could have generated the outcome.Footnote 8 Often, researchers are not sufficiently concrete and specific in thinking about who should have said or done what to whom when if an explanation were true. There can also be functionally equivalent substitutable steps at different points in the hypothesized process. If possession of a gun was necessary for a suspect to have committed a crime, for example, evidence that the suspect had purchased a gun is equally informative no matter whether the gun was paid for by check or credit card.

8.3.3.10 Remember Not All PT Is Conclusive

A final injunction is to remember that not all PT is conclusive. Whether it is highly conclusive depends on whether the evidence is much more likely under one explanation than under the others, and this cannot be known beforehand. In addition, even when the evidence does greatly raise the likelihood that one explanation is true, there is always some possibility that an even more accurate explanation never occurred to the researcher.

For these reasons, process tracers can never be 100% certain, and it is important to be clear about any uncertainty that remains after analyzing the evidence. In the formal Bayesian PT approach described below, this takes the form of specifying the posterior on each hypothesis in terms of an explicit probability or range of probabilities.

8.3.4 Examples from COVID Case Studies

While laboratory studies on the COVID-19 coronavirus have led to a rapid accumulation of knowledge about its biochemistry, case studies using a PT logic have been vitally important in learning about its transmission in real-world settings, where experiments are not possible. When COVID-19 first emerged as a public health concern, doctors, scientists, and government officials had limited knowledge of how the disease spread. It is easy in this instance to construct a mutually exclusive and exhaustive set of hypotheses about the means of transmission: (1) airborne inside only; (2) airborne inside and outside; (3) airborne inside plus transmission via common contact surfaces; or (4) airborne inside and outside plus infection through contact surfaces.Footnote 9 Epidemiologists had a range of views on what prior likelihood they should assign to each hypothesis, but in the end the priors did not matter much, because powerful evidence emerged that was much more likely under explanation 1 than under explanations 2–4, identifying airborne transmission indoors as by far the most common means of transmission.
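The claim that the priors did not matter much can be illustrated with hypothetical numbers. If the accumulated evidence is judged, say, one hundred times more likely under explanation 1 than under its rivals, then applying the odds form of Bayes Theorem, an epidemiologist who started at a prior of 0.2 and one who started at 0.8 both end up highly confident in explanation 1:

$$ \frac{0.2}{0.8}\cdot 100=25\ \ \left(\Pr \approx 0.96\right),\qquad \frac{0.8}{0.2}\cdot 100=400\ \ \left(\Pr \approx 0.998\right) $$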

A key early case study came from a restaurant in Guangzhou, China, where one patron who had COVID dined on January 24, 2020, with three family members. Two other families dined at adjacent tables. Within 5 days, nine members of the three families developed COVID, with no other known exposures apart from the restaurant and subsequent within-family transmission. Close study of the restaurant seating revealed that, outside of the index patient’s family, only those in the airflow path of the air conditioner that blew air across the table of the index patient developed COVID, while none of the other 83 restaurant patrons or eight staff developed COVID. The authors of a study on this case concluded that droplet transmission in the air-conditioner airflow was likely the key transmission mechanism, and recommended improved ventilation and greater table distancing in restaurants. The absence of any cases among the restaurant staff who handled the index patient’s dirty dishes can be considered a failed smoking-gun test: it slightly reduces the likelihood of transmission of coronavirus through contact with surfaces of objects (Lu et al., 2020).

A later case study of a superspreader event at a choir practice in March 2020 underscored the danger of airborne transmission indoors. Among the 61 people who attended the 2.5-hour practice, including one symptomatic index patient, there were 32 confirmed and 20 probable secondary COVID-19 cases. The study concluded that close proximity and the act of singing led to high rates of transmission (Hamner et al., 2020).

The most definitive case study of COVID transmission, however, came from an event that provided a strong natural experiment (Shen et al., 2020). In January 2020, 128 people took two separate buses with recirculating cooling units (60 people in the first bus and 68 in the second, including a symptomatic index patient in the second bus) on a 100-minute round trip ride to a 150-minute event. Another 172 individuals attended the event but did not travel on either bus. None of the attendees wore masks. At the event, participants attended a morning service outdoors, followed by a brief lunch inside. They then returned to the same bus that had brought them, and took the same seats. Within days, 23 people on the second bus developed COVID, none of the passengers of the first bus developed COVID, and another seven individuals who were in close contact with the index patient at the ceremony or lunch but who had not ridden either bus developed COVID. Passengers seven rows behind the index patient on the bus developed COVID, while passengers next to windows that could be opened had lower rates of infection. This case provided further smoking gun evidence of airborne transmission during long indoor exposure, including transmission by small and relatively far-traveling aerosol droplets as well as heavier droplets. Later studies concluded that while transmission through surface contacts could not be ruled out, and cases of such transmission have been reported when individuals touched an object that had been sneezed or coughed upon by a COVID patient, the odds of catching COVID were approximately one case for every 10,000 surface contacts (CDC, 2021). Similarly, while the bus study did not discuss outdoor transmission, and such transmission could not be ruled out given the seven individuals who developed COVID without riding a bus, the rarity of confirmed cases of outdoor transmission has reportedly led many experts to conclude that such cases constitute only 1% of total cases and perhaps as low as 0.1% (Leonhardt, 2021).

A fourth case study indicates the high efficacy of mask-wearing to prevent COVID transmission. This study focuses on two hair stylists in Missouri who contracted COVID in 2020. While these individuals were symptomatic, they were in proximity to 139 patrons indoors. All wore masks, and none of the patrons developed COVID (Hendrix et al., 2020).

Although these four studies use the logic of PT implicitly rather than explicitly, their conclusions follow Bayesian logic. The authors intuitively used the likelihood of evidence under alternative explanations, together with the laws of probability, to update views of the likelihood of alternative COVID transmission paths in light of the evidence.

The chapter turns in the penultimate section to new methodological developments and the question of whether using the Bayesian logic of PT more formally and explicitly improves inference to the best explanation.

8.4 The “Replication Crisis” and the Comparative Advantages of Process Tracing Case Studies

8.4.1 The Replication Crisis

In the last 15 years, concerns over a “replication crisis” have swept through the social and medical sciences and the policy analysis and program evaluation communities. The crisis centers on the concern over high rates of failure in attempts to replicate peer-reviewed research findings in medicine and the social sciences, including those based on experiments as well as observational statistical studies. This does not necessarily mean that studies whose findings cannot be replicated are wrong—there are many reasons it may not be possible to replicate a study or its findings, including changes in the historical context that make it impossible to recreate the same sample as that in the original study. Yet there is also evidence that such sample differences do not account for much of the variation in results found in replication failures (Klein et al., 2018). In addition, there are well-known methodological problems that can lead to false or overly confident conclusions that could account for the high rate of replication failures of published research. These problems include publication bias (papers supporting their hypotheses are published at a higher rate than those that do not and than studies with null findings), “p-hacking” (manipulation of experimental and analysis methods, possibly unwitting, that artificially produces statistically significant results [see Chap. 4 herein, especially Sect. 4.2.3, on the model dependence of statistical analyses]),Footnote 10 “p-fishing” (seeking statistically significant results beyond the original hypothesis), and “HARKing” (Hypothesizing After the Results are Known, or post-hoc reframing of experimental intentions to fit known data).

One result of the replication crisis has been renewed emphasis on lab experiments, field experiments, natural experiments, regression discontinuity designs, and other research designs that attempt to allow causal identification. Even though experiments are among the methods that have experienced replication problems, and even though they have very demanding requirements and assumptions (especially field experiments: Cook, 2018), properly done experiments are less subject to some of the methodological limits of observational statistical studies. “Natural experiments,” or real world situations in which samples of a population are assigned to or end up in two different contexts or “treatment” conditions in a way that is random or close to random, can also be powerful. Another approach that has generated increased attention is regression discontinuity designs, in which the investigator compares samples of a population just above and just below a threshold that is a cutoff at which a treatment, such as class size in public schools, is assigned (see Chap. 3 herein).

These experimental and quasi-experimental methods all have important roles to play in policy-relevant causal inferences. Researchers and journal editors have also taken steps to address the problems associated with the replication crisis. Pre-registration of research designs, for example, limits the risk that researchers might unintentionally make so many modifications to their models that one model will produce a high degree of fit just by chance. Public repositories for data and replication materials are making research more transparent. Researchers have become more transparent about the assumptions behind instrumental variable and regression discontinuity designs and the conditions under which these achieve internal, statistical, and external validity (see Chap. 3 herein, especially Sect. 3.3). Some journals are carrying out replications before publication. Matching techniques (see Chap. 4 herein) and out-of-sample testing have become more common, and some journals have de-emphasized p-values in favor of a broader range of measures of the robustness of quantitative results, or moved to p-values of 1% rather than 5% as the standard for publication.

Still, even with improved practices, experimental and quasi-experimental methods have limits that are different from those of PT. For many problems of interest to both scholars and policymakers—wars, epidemics, economic crashes, etc.—these methods can be subject to practical and ethical constraints and problems of internal or external validity. Lab experiments are quite different from real world conditions. Field experiments on large-scale phenomena that involve potential harm are unethical, and other kinds of field experiments may be prohibitively costly or operationally impossible. Natural experiments require a level of “as-if random” assignment to “treatment” and “control” groups that is rarely fully met except in studies of lottery winnings (Dunning, 2015). Regression discontinuity designs, as well as field and natural experiments, have the challenge of assessing potential confounding variables. In addition, all population-level analyses face the ecological inference problem.

Because case studies using PT have a different set of comparative advantages from those of experimental and quasi-experimental research designs, they are useful as both a standalone method and as a complement to these other methods in multimethod designs. Most obviously, PT is useful when policymakers are interested in understanding causation in individual cases. PT can be especially useful in studying deviant cases, or cases that do not fit existing theories, and inductively deriving and then assessing new potential explanations. But PT case studies are not just for situations in which we want to explain outcomes in one or a few cases, or when only a small number of cases exist. Even when there is a large and relatively homogenous population available for statistical or experimental study, case studies can help get closer to causal mechanisms, examining how they work down to small slices of space and time.

8.4.2 Process Tracing on Complex Phenomena

In addition, PT is useful for assessing various kinds of complexity. These include the following:

  • Endogeneity. Endogeneity arises when there are feedback loops between the dependent and independent variables and when the direction of causation (X → Y versus Y → X) is unclear. In this regard, PT helps untangle the direction of causation by focusing on the sequence of events. This helps with the assessment of which events or pieces of information came first, and what events actors may have anticipated when they took action.

  • Multiple treatments. PT can assess multiple treatments or explanations by considering the likelihood of evidence under each of them.

  • Path dependence. PT can untangle path dependence by examining the sequencing of events and the observable implications of theories about path-dependent mechanisms like positive returns to scale, learning by doing, first mover advantages, complementary institutions, and so on (Bennett & Elman, 2006). Most, if not all, of the research on path-dependency uses PT case studies rather than quantitative analysis.

  • Equifinality. Equifinality is the existence of alternative paths to the same outcome. These paths may have many or no independent variables in common. Case studies using PT can chart out different paths to the outcome one case at a time.

  • Non-independence of cases. PT can assess the evidence on mechanisms that create dependences among cases, such as learning or emulation from one case to another.

  • Potential confounders. PT can assess whether any potential confounders identified in the course of research have a causal path to the outcome.

8.4.3 Process Tracing in Multimethod Research

PT can also be combined with other methods. One useful approach is to carry out a statistical analysis on observational data and then process trace one or a few cases to see if the hypothesized mechanisms that might explain population level correlations are evident in individual cases (Lieberman, 2005; Small, 2011). Statistical analysis can help identify outlier or deviant cases, and PT on these cases may help identify omitted variables (Bennett & Braumoeller, 2022). In natural experiments, PT on the ways in which different individuals or groups are “assigned” to or end up in the “treatment” and “control” groups can help assess the validity of the assumptions of “as-if random assignment,” unbiased dropout rates, and no unmeasured confounders (Dunning, 2015). PT can be combined with Qualitative Comparative Analysis as well, helping to identify the potentially causal processes that generate the outcomes of individual cases (Schneider & Rohlfing, 2013).

8.4.4 Process Tracing and Generalizing from Case Studies

One alleged limitation of PT case studies is their supposed inability to generalize from their results, or to achieve external validity. This issue has often been misunderstood, however (George & Bennett, 2005; Bennett, 2022). “Average treatment effects” are not the only way to conceptualize generalization, nor always the most useful one. The “average treatment effect” of being born, for example, is having 1.5 X chromosomes and 0.5 Y chromosomes, an outcome that does not exist for any single person. Sometimes it is useful instead to have narrow but strong “contingent generalizations,” or generalizations that apply to only a few cases or to a specified subset of a population, such as cases that share similar values on the independent variables and the dependent variable.

Single and comparative case studies using PT may or may not allow contingent generalizations. It is impossible for a researcher to know whether and to what population or scope conditions the findings of a case study will generalize before they have developed, perhaps partly inductively, a satisfactory explanation of the case. The understanding of the causal process that emerges from PT in a case study, together with theoretical intuitions on the scope conditions in which it operates and background knowledge on the frequency with which those conditions arise, is what determines whether, where, and how a case study’s findings might generalize. Charles Darwin, for example, studied several bird species on remote islands and came away with the theory of evolution, whose scope conditions include all living things. Conversely, imagine discovering that a voter favored a candidate not because of party affiliation, ideology, or any of the usual reasons, but because the candidate was the voter’s sister-in-law. This would only generalize to the relatives of candidates, or perhaps more loosely to social relations not ordinarily considered to be important to voting decisions (and some voters might vote against their in-laws despite sharing their party affiliations and policy views!).

In addition, the understanding of causal mechanisms that emerges from PT on a case, to the extent that this understanding is accurate, may generalize not only to similar cases or populations but to populations and contexts different from those of the case study at hand. As noted above, Darwin’s theory of evolution applied not only to birds but to all living creatures. This is different from testing or applying a theory to an out-of-sample subset of a population, as is sometimes done in statistical analyses; it is applying a theory to an out-of-population case or sample.

8.4.5 Limitations of Process Tracing

The limitations of PT correspond to the strengths of experimental and quasi-experimental methods and of studies using statistical analyses of observational data. PT does not produce estimates of average effects or correlation coefficients for independent variables. PT can shed light on how or through what mechanisms independent variables generated outcomes, but its inferences are more provisional and do not necessarily provide as confident an answer as randomized controlled experiments to the question of whether a variable has an effect on the outcome.

8.5 New Developments in Process Tracing

Two new methodological developments are pushing the frontiers of process tracing. Both developments are outlined in forthcoming books, and both are rather technical and complex, so this chapter provides only a brief overview of each.

8.5.1 Formal Bayesian Process Tracing

Tasha Fairfield and Andrew Charman have worked through several methodological challenges to develop procedures for formal Bayesian PT (Fairfield & Charman, 2017; Fairfield & Charman, 2022). In formal Bayesian PT, researchers develop explicit numerical priors, between 0 and 1 or 0% and 100%, on the likelihood that alternative explanations are true (these could be ranges between high and low bounds, rather than point estimates). They also identify explicit numerical likelihood ratios for the evidence conditioned on the alternative theories (which, again, need not be point estimates), and use these, together with Bayesian analysis of the collected evidence, to arrive at numerically explicit posterior estimates of the likelihood that alternative theories are true. Estimates of priors can be based on background information, on crowd-sourcing, or on a principle of indifference that assigns equal prior probability to all explanations. Estimates of likelihood ratios of evidence come from the theoretical logic of the alternative explanations. Researchers can check the robustness of the posterior estimates by trying different distributions or ranges of priors and likelihood ratios.
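As a concrete illustration of this updating logic, the short Python sketch below computes posteriors from explicit priors and likelihoods for three hypothetical explanations. The hypotheses, priors, and likelihood values are invented for illustration and are not drawn from Fairfield and Charman's work.

```python
# Minimal sketch of Bayesian updating over mutually exclusive, exhaustive
# explanations. All numbers are hypothetical.

# Explicit priors (here a principle of indifference: equal priors).
priors = {"H1": 1 / 3, "H2": 1 / 3, "H3": 1 / 3}

# Likelihood of observing a given piece of evidence E under each hypothesis.
likelihoods = {"H1": 0.80, "H2": 0.20, "H3": 0.05}

# Bayes' rule: posterior(H) is proportional to prior(H) * P(E | H),
# normalized across the hypotheses.
unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: v / total for h, v in unnormalized.items()}

print(posteriors)  # roughly H1 = 0.76, H2 = 0.19, H3 = 0.05
```

Running the same calculation with different priors or likelihood ranges is one simple way to carry out the robustness checks described above.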

One useful innovation that Fairfield and Charman introduce is the use of a logarithmic scale for the likelihood ratios of evidence. This simplifies the math, as logarithms allow the weights of different pieces of evidence to be added rather than multiplied. In addition, logarithmic scales, such as the decibel (dB) scale, reflect the ways in which humans experience stimuli such as light or sound. It is intuitively easy to ask whether a piece of evidence is “whispering” (30 dB), “talking” (60 dB), “shouting” (70–80 dB), or “screaming” or above (90+ dB) in favor of one explanation or another. After assigning logarithmic weights to how much each piece of evidence argues in favor of one explanation vis-à-vis another, the researcher can simply add up all of the weights to arrive at posterior estimates, just as if adding weights on a scale.
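The following sketch illustrates the additive logic with made-up likelihood ratios, using the common convention that the weight of a piece of evidence in decibels is 10·log10 of its likelihood ratio; the weights of independent pieces of evidence are then summed.

```python
# Sketch of logarithmic (decibel) weighting of evidence. The likelihood
# values below are illustrative only.
import math

def weight_db(p_e_given_h1: float, p_e_given_h2: float) -> float:
    """Weight of evidence (in dB) favoring H1 over H2."""
    return 10 * math.log10(p_e_given_h1 / p_e_given_h2)

# Three hypothetical pieces of evidence, each with its likelihood under
# H1 and under H2. Summing their weights assumes they are independent
# conditional on the hypotheses.
evidence = [
    (0.9, 0.3),   # argues clearly for H1
    (0.6, 0.5),   # argues weakly for H1
    (0.2, 0.4),   # argues modestly against H1
]

total_db = sum(weight_db(p1, p2) for p1, p2 in evidence)
print(f"Total weight of evidence for H1 over H2: {total_db:.1f} dB")
```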

A common misunderstanding here is that the number of necessary comparisons of theories vis-à-vis the evidence becomes combinatorially large as the number of explanations grows (Bennett et al., 2021; cf. Zaks, 2021). This assumes that the likelihood of each piece of evidence under every hypothesis must be compared directly to that under every other hypothesis. In fact, it is necessary only to compare the likelihood of each piece of evidence under one explanation to that under each of the other explanations, and this implicitly compares the likelihoods of the evidence under all the explanations to one another. By way of analogy, one could weigh a watermelon in terms of strawberries, and then weigh all the other fruits in a store in terms of strawberries, and this would provide the relative weight of every fruit in terms of either watermelons or strawberries.
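The sketch below makes the point with hypothetical numbers: expressing each explanation's likelihood for the evidence relative to a single reference hypothesis is enough to recover any pairwise comparison, and (with equal priors) the posteriors as well.

```python
# Sketch of why comparisons against one reference hypothesis suffice
# (the "strawberry" in the fruit analogy). All values are illustrative.

# Likelihood of the evidence under each hypothesis, expressed as a ratio
# relative to the reference hypothesis H_ref.
ratios_vs_ref = {"H_ref": 1.0, "H2": 0.25, "H3": 0.05}

# Any pairwise ratio is recoverable from the ratios against the reference:
# P(E|H2)/P(E|H3) = (P(E|H2)/P(E|H_ref)) / (P(E|H3)/P(E|H_ref)).
h2_vs_h3 = ratios_vs_ref["H2"] / ratios_vs_ref["H3"]
print(f"H2 vs H3 likelihood ratio: {h2_vs_h3:.1f}")  # 5.0

# With equal priors, posteriors follow directly by normalizing the ratios.
total = sum(ratios_vs_ref.values())
posteriors = {h: r / total for h, r in ratios_vs_ref.items()}
print(posteriors)
```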

Formal Bayesian PT has the advantage of making explicit all the judgements that are made implicitly in informal PT. This clarifies where and why an author and their readers or critics might disagree: they could disagree on the priors, on the likelihood of the evidence, or on the reading of the evidence itself (one reader may think an interviewee is untruthful, for example, and another may not). Despite the advantages of formal Bayesian PT, however, its advocates do not recommend applying it fully to every piece of evidence for every hypothesis. Doing so would require an unrealistically long and tedious write-up of research results. Researchers may find it useful, however, to carry out full formal Bayesian analysis on a small number of pieces of evidence that they consider to be the most powerful in discriminating among the hypotheses. In addition, even though it is inadvisable to carry out and write up formal Bayesian PT in full, the demonstration that it is in principle possible to do so, and the explication of the logic of doing so, help guide the reasoning of informal or partially formal Bayesian PT.

8.5.2 New Modes of Multimethod Research

A second innovation, in an article and a forthcoming book by Macartan Humphreys and Alan Jacobs, also builds on Bayesian logic and moves in a compatible but different direction. Humphreys and Jacobs use formal causal models, in the form of Directed Acyclic Graphs (DAGs), to help identify the hypothesized probabilistic dependencies among variables that enter into PT (Humphreys & Jacobs, 2015; Humphreys & Jacobs, 2023; on DAGs, see also Chap. 6 herein). These authors argue, as the present chapter has, that design-based inferential approaches like experimental and quasi-experimental methods cannot be carried out on many questions that interest both policymakers and scholars, and that these methods can sometimes provide information on effect sizes without clarifying the underlying models or mechanisms. Consequently, Humphreys and Jacobs focus on model-based inference rather than design-based inference.

DAGs are models that formally represent theories in ways that make these theories’ assumptions about mediating, moderating, and potential confounding variables clear and precise. Put another way, DAGs are graphical representations of Bayesian networks. Mediators are variables along the hypothesized causal path between an independent and dependent variable, so they help explain how the independent variable affects the dependent variable. Moderators are variables that affect the relationship between an independent variable and the dependent variable—they can strengthen, weaken, or negate that relationship. Confounders are variables that affect both the value of an independent variable and that of the dependent variable in a causal model, making it hard to estimate the true effect of the independent variable.
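The following minimal sketch encodes such a DAG as a simple Python data structure, with hypothetical variables X (explanatory variable), Y (outcome), M (mediator), W (moderator), and Z (confounder) playing the roles just described. It is an illustration of the idea only, not Humphreys and Jacobs' software or notation.

```python
# A hypothetical DAG with a mediator, a moderator, and a confounder.
# Each node is mapped to the list of its parents (direct causes).
dag = {
    "Z": [],               # confounder: no parents
    "W": [],               # moderator: no parents
    "X": ["Z"],            # Z -> X
    "M": ["X"],            # X -> M (M mediates X's effect on Y)
    "Y": ["M", "W", "Z"],  # M -> Y, W -> Y, Z -> Y
    # Note: W's moderating (interaction) effect lives in Y's response
    # function, not in the graph structure itself; the graph only records
    # that W is a direct cause of Y.
}

def topological_order(parents: dict) -> list:
    """Return an order in which every parent precedes its children;
    raises ValueError if the graph contains a cycle (i.e., is not a DAG)."""
    order, visited, visiting = [], set(), set()

    def visit(node):
        if node in visited:
            return
        if node in visiting:
            raise ValueError("graph is not acyclic")
        visiting.add(node)
        for parent in parents[node]:
            visit(parent)
        visiting.discard(node)
        visited.add(node)
        order.append(node)

    for node in parents:
        visit(node)
    return order

print(topological_order(dag))  # e.g. ['Z', 'W', 'X', 'M', 'Y']
```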

Humphreys and Jacobs argue that the core logic of their approach is most closely connected to PT and Bayesian inference, and they maintain that formally representing theories as DAGs helps guide methodological choices in both PT and quantitative analysis in ways that modify some traditional advice about how to carry out PT. Contrary to some earlier advice on case selection, for example, they argue that model-based inference demonstrates that for many inferential purposes “on the regression line” cases, or cases in which the outcome of interest occurred, are not necessarily the most informative. Optimal case selection, in their view, depends on the population distribution of different kinds of cases and the probative value of the available evidence. They also argue that the focus on intervening causal chains (mediators) in PT can sometimes be less productive than examining moderating conditions (moderators). Finally, DAGs can inform choices in multimethod work between breadth (how many cases to study) and depth (how intensively to study individual cases).

More generally, Humphreys and Jacobs argue that their approach dissolves the usual distinctions between qualitative and quantitative research, and that it can address and integrate case level and population level queries.

8.6 Conclusions

PT methods have many uses and comparative advantages. Unlike experimental, quasi-experimental, and statistical methods, they can develop inferences on alternative explanations of individual cases. As PT always relies on observational evidence in single cases, its scope is not as limited by cost, ethical concerns, or availability as that of experiments or quasi-experiments (although, to the extent that PT involves human subjects research such as interviews, it can raise ethical issues that require approval from an institutional review board). PT brings causal inference close to the operation of causal mechanisms, sometimes in relatively small slices of space and time. While it is the only method (other than ethnographic methods) that is possible when only one or a few cases exist, it is still useful for illuminating the operation of causal mechanisms and assessing the assumptions behind other methods even when large or randomly assigned populations are available for study. It can therefore contribute to multimethod projects involving statistical, experimental, and quasi-experimental methods.

At the same time, PT has several limitations and poses a number of research challenges. Collecting the necessary evidence can be laborious and time-consuming, and the conclusions can only be as strong as the evidence allows. Identifying the observable implications of alternative explanations requires careful thought, and scholars might not agree on what rather general theories imply in particular cases. PT case studies may allow strong contingent generalizations, or they may not. More broadly, just as the strengths of PT arise in areas where quantitative methods are weak, PT is weak where these other methods are strong. PT does not produce estimates of average effects or correlation coefficients for independent variables. It can shed light on how or through what mechanisms independent variables generated outcomes, but its inferences do not necessarily provide as confident an answer as randomized controlled experiments on whether a variable actually had any effect on the outcome. Yet precisely because the strengths and weaknesses of PT and quantitative methods offset each other, there is great value in combining these approaches in multimethod research.

Recent innovations by Fairfield, Charman, Humphreys, and Jacobs hold great promise for continuing the rapid improvement of PT methods and practices.

These authors’ ambitious innovations are at the cutting edge of PT techniques. As such, they have thus far been of interest mostly to methodologists and have not yet had a chance to be taken up by the much larger community of case study researchers. In short, although PT methods and practices are in some senses thousands of years old, they will continue to develop.

Review Questions

  1. What are the differences among the neo-Humean regularity, counterfactual, manipulation/experiments, and the causal mechanism accounts of causation and causal inference?

  2. What is a ‘prior’ in Bayesian terms? What is a ‘posterior?’ What is the ‘likelihood of evidence’ and how does it help us ‘update’ our prior to form our posterior? What kind of evidence allows the most updating?

  3. Why is it important to ‘cast the net widely’ when formulating potential alternative explanations for the outcome of a case? How can you combine process tracing with case comparisons? Why are Bayesians never 100% sure they have the true explanation for an outcome?

  4. What does it mean for alternative explanations to be ‘mutually exclusive and exhaustive?’ Does mutual exclusivity require that the explanations use completely different independent variables?

  5. What does each of the following terms mean in the context of process tracing: Theory of change, History, Maturation, Instrumentation, Testing, Mortality, Sequencing, Selection, Diffusion, Design contamination, Multiple treatments?

  6. Why is it important to pay attention to surprising or unexpected evidence from a case?

  7. How can process tracing be combined with comparisons between cases?

  8. What kind of conclusions can be drawn from the following case studies, and how does process tracing logic lead to these conclusions: (1) the transmission of COVID in an air-conditioned restaurant; (2) the spread of COVID at a choir practice; (3) the spread of COVID in one bus attending a ceremony and lunch but not the other bus; (4) the lack of transmission of COVID at a hair-dressing shop where two hairdressers had symptomatic COVID?

  9. What are the meanings of the following terms for kinds of complexity: endogeneity, path dependence, equifinality, multiple treatments, non-independence of cases, potential confounders? How can process tracing help untangle each of these kinds of complexity?

  10. How can process tracing be combined with statistical analysis of observational data? With quasi-experiments?

  11. Under what conditions is it possible to generalize the results of a case study, and under what conditions is it not possible to do so?

  12. How does formal Bayesian process tracing differ from less formal methods of process tracing? Is it advisable to do and write up formal Bayesian process tracing on every piece of evidence in a case study? Why or why not?

  13. What is a Directed Acyclic Graph and how can it assist in process tracing and the integration of qualitative and quantitative evidence?

  14. What are the limits and costs of process tracing?