Causal Inference Challenges and New Directions for Epidemiologic Research on the Health Effects of Social Policies

Epidemiologic research on the health effects of social policies is growing rapidly because of the potentially large impact of these policies on population health and health equity. We describe key methodological challenges faced in this nascent field and promising tools to enhance the validity of future studies. In epidemiologic studies of social policies, causal identification is most commonly pursued through confounder-control but use of instrument-based approaches is increasing. Researchers face challenges measuring relevant policy exposures; addressing confounding and positivity violations arising from co-occurring policies and time-varying confounders; deriving precise effect estimates; and quantifying and accounting for interference. Promising tools to address these challenges can enhance both internal validity (randomization, front door criterion for causal identification, new estimators that address interference and practical positivity violations) and external validity (data-driven methods for evaluating heterogeneous treatment effects; methods for transporting and generalizing effect estimates to new populations). Common threats to validity in epidemiologic research play out in distinctive ways in research on the health effects of social policies. This is an active area of methodologic development, with ongoing advances to support causal inferences and produce policy-relevant findings. Researchers must navigate the tension between research questions of greatest interest and research questions that can be answered most accurately and precisely with the data at hand. Additional work is needed to facilitate integration of modern epidemiologic methods with econometric tools for policy evaluation and to increase the size and measurement quality of datasets.


Introduction
Policies influencing social determinants of health are promising tools to improve population health and reduce health inequities. Legal rules established at any government jurisdiction (e.g., federal, state, city) can influence the distribution of social or behavioral determinants of health. Social policies include laws affecting education, income, immigration, labor, human rights, and employment, as well as those that regulate products such as alcohol, tobacco, and firearms [1•, 2•]. Such social policies may have broader impacts than policies regulating medical care or health care financing [3][4][5]. Although social policies have the potential to enhance health equity, they can also sustain or propagate oppression and inequity. Rigorous evaluation of these policies is thus increasingly recognized as an important domain for epidemiologic research.
Research on the health effects of social policies draws on both epidemiology and social science. Substantive knowledge about social policies and methodologies for policy evaluation are often better represented in economics and sociology than epidemiology. Research on drivers of population health is the essence of epidemiology. This convergence brings new methodological challenges, and growing interdisciplinary methodological work is helping to bridge the language, methods, and evidence across these disciplines [6-8, 9•, 10-12].
Like all research aiming to draw causal inferences, studies on the health effects of social policies require strong assumptions and must address potential violations of conditional exchangeability, positivity, and consistency, among others (Box 1) [9•, 13]. However, policy studies face unique challenges to these assumptions. Randomization is frequently not feasible or ethical for policy evaluations, and researchers often seek to evaluate policies that have already been implemented in a non-randomized fashion. The small number of jurisdictions included in most policy studies (e.g., there are only 50 US states) often causes positivity violations. Interference is expected because people are influenced by the policies in jurisdictions neighboring their own, and because policies change social norms. Therefore, policy research routinely requires innovating on traditional epidemiologic tools to accommodate violations of the conventional assumptions, for example by modifying the target parameter and corresponding research question [14]. Yet not all studies on the health effects of social policies are designed and executed with careful attention to such methodological issues [1•]. Without appropriate consideration of the study design, statistical analysis, and interpretation, policy evaluations can be useless, or even harmful.
We begin this review by summarizing common approaches to studying the health effects of social policies. The subsequent sections elaborate how the assumptions for causal inference play out and are challenged for epidemiologic research on the health effects of social policies. Interwoven are promising methodologic frontiers in social policy research and tools for enhancing study validity.
Box 1. Definitions of key causal concepts
1. Potential outcome: The outcome that an individual (or other unit of analysis, such as family or neighborhood) would experience if his/her treatment (or exposure) takes any particular value. Each individual is conceptualized as having a potential outcome for each possible treatment value. Potential outcomes are sometimes referred to as counterfactual outcomes.
2. Exchangeability: The assumption of no confounding, i.e., the assumption that which treatment an individual receives is unrelated to her potential outcomes if given any particular treatment. This assumption is violated for example if people who are likely to have good outcomes regardless of treatment are more likely to actually be treated. In the context of instrumental variables analysis, exchangeability is the assumption that the instrument does not have shared causes with the outcome.
3. Conditional exchangeability: The assumption that exchangeability is fulfilled after controlling for a set of measured covariates. When this assumption is met, we say that the set of covariates, known as a sufficient set, fulfills the backdoor criterion with respect to the treatment and outcome.
4. Positivity: All subgroups of individuals defined by covariate stratum (e.g., every combination of possible covariate values) must have a nonzero chance of experiencing every possible exposure level. Put another way, within every covariate subgroup, all exposure values of interest must be possible.
5. Consistency: The assumption that an individual's potential outcome setting treatment to a particular value is that person's actual outcome if s/he actually has that particular value of treatment. This could be violated if the outcome might depend on how treatment was delivered or some other variation in the meaning or content of the treatment. Some researchers consider consistency a truism rather than an assumption.
The informal definitions presented in this box are quoted from Matthay et al. 2019 [9•].
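The informal definitions in Box 1 can also be written compactly in potential-outcomes notation. The symbols below are introduced here for illustration and are not taken from Box 1: A denotes the treatment or policy exposure, Y the observed outcome, Y^a the potential outcome under exposure level a, and L a set of measured covariates. This is one common formalization among several.

```latex
\begin{align*}
\text{Exchangeability:} \quad & Y^{a} \perp\!\!\!\perp A \quad \text{for all } a \\
\text{Conditional exchangeability:} \quad & Y^{a} \perp\!\!\!\perp A \mid L \quad \text{for all } a \\
\text{Positivity:} \quad & \Pr(A = a \mid L = l) > 0 \quad \text{for all } a \text{ and all } l \text{ with } \Pr(L = l) > 0 \\
\text{Consistency:} \quad & A = a \implies Y = Y^{a}
\end{align*}
```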

Common Approaches
The analytic approach is driven by both substantive interests and methodologic constraints. We distinguish between research evaluating the health effects of social policies by controlling confounders (Fig. 1a) and research that uses policies as instruments to quantify the effect of a social resource regulated by the policy (Fig. 1b) [9•]. For example, we might evaluate the effect of the earned income tax credit (EITC) policy on smoking prevalence [15], or we might instead use state-to-state variation in EITC benefit generosity to estimate how the extra income delivered by EITC affects health [16]. Both types of study are relevant to understanding policy effects. Evaluating the effects of a policy is useful, for example, to anticipate the effect of adopting a similar policy in the future. Using the policy as an instrument to evaluate the effects of the social resource regulated by the policy (i.e., defining that resource as the endogenous variable in an instrumental variable analysis) can strengthen causal evidence on social determinants of health and generate evidence to predict the impact of alternative interventions on those social resources. For example, EITC is one of the many possible policies to increase income for low-income families; one could also increase income by altering the minimum wage or enhancing Temporary Assistance to Needy Families benefits. Establishing the health benefits of extra income will help anticipate the consequences of these other income-support policies. Figure 1 summarizes common confounder-control and instrument-based methods. All confounder-control methods are premised on the assumption that the factors that determine policy exposure and also influence health (or proxies for these factors) are fully measured and accounted for in the statistical analysis. For example, average education, income, or attitudes of community residents all affect health and may influence the likelihood a policy is adopted.
Researchers often attempt to measure and control for each of these variables. In contrast, instrument-based methods require the assumption that the instrument (e.g., the policy adoption indicator) is unrelated to the health outcome that people in the sample would have experienced under alternative values of the policy exposure. Although instrument-based methods in health research more commonly evaluate the effect of a resource delivered by the policy, they may also evaluate a policy as the exposure.
Instrument-based methods are commonly called quasiexperiments because the variation in the exposure that is induced by the instrument is thought to be like-random or arbitrary. Instrument-based methods are commonly perceived as having greater internal validity, because they circumvent the need to correctly identify, measure, and control for a set of covariates sufficient to control all confounding [9 •]. However, data from quasi-experimental settings can be statistically analyzed using either confounder-control or instrument-based methods, depending on the substantive question. The internal validity of the design derives from the quasi-experimental context, not from the analytic choice.
Since the publication of foundational work on the causal interpretation of instrument-based methods [19], instrumental variables (IV) analysis has emerged as a hallmark of applied econometrics. In epidemiologic research involving social policies, confounder-control methods are more common, but uptake of instrument-based methods is increasing (Fig. 2) [20]. For both instrument-based and confounder-control methods, meeting the required assumptions for conditional exchangeability is challenging. Not every policy change is a valid instrument for a measured exposure variable. Multiple policies may change simultaneously. Places that adopt a policy may differ in important and unmeasured ways from places that do not adopt the policy. Thus, policy studies are most compelling when the source of quasi-experimental variation is plausibly like-random or arbitrary (Table 1).
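To make the contrast between the two approaches concrete, the following minimal sketch simulates a setting in which a policy indicator serves as an instrument for a resource such as income. Everything in it is hypothetical and chosen for illustration: the data-generating model, the coefficient values, and the function names (simulate, wald_iv, naive_slope) are assumptions, not quantities from any cited study. The sketch uses the simple ratio (Wald) form of the IV estimator, which applies to a single instrument without covariates.

```python
import random


def covariance(u, v):
    """Sample covariance of two equal-length sequences."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Simulate hypothetical data with an unmeasured confounder.

    z: policy indicator (instrument), assigned independently of the confounder
    x: resource delivered by the policy (e.g., income)
    y: health outcome
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                    # unmeasured confounder
        zi = 1 if rng.random() < 0.5 else 0    # as-if random policy exposure
        xi = 1.0 * zi + 1.0 * u + rng.gauss(0, 1)
        yi = true_effect * xi + 1.5 * u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def wald_iv(z, x, y):
    """Ratio (Wald) IV estimate: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


def naive_slope(x, y):
    """Unadjusted regression slope of y on x, confounded by u in this simulation."""
    return covariance(x, y) / covariance(x, x)


z, x, y = simulate()
print("naive slope:", round(naive_slope(x, y), 2))
print("IV estimate:", round(wald_iv(z, x, y), 2))
```

In this simulation the instrument is valid by construction, so the IV estimate should fall near the simulated effect while the unadjusted slope is biased by the unmeasured confounder. In real policy data, instrument validity cannot be verified this way and must be argued from the quasi-experimental context.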

What effects are measured and among whom? Measurement, consistency, and spillover
Fig. 1 (panel a) Exposure: Social policy (e.g., EITC). Required assumptions for using the backdoor criterion to evaluate causality: Conditional on measured covariates, the distribution of the health outcome in the group exposed to the social policy would have been the same as the distribution of the health outcome in the group unexposed to the social policy, had the group exposed to the social policy instead been unexposed, and vice versa. Common methods: Multiple regression models; propensity score matching or weighting; differences-in-differences; variants of differences-in-differences including two-way fixed effects and synthetic controls.
Fig. 1 (caption) [...] (Fig. 1a). Alternatively, policies may be used as instruments to estimate the effect of the resource regulated by the policy, such as income, on health (Fig. 1b). Either approach requires strong assumptions to support causal inferences. For instrumental variable approaches, the three assumptions listed are required for evaluating causality, but causal identification also usually relies on a fourth assumption of monotonicity, that is, that the instrument does not affect the likelihood of exposure in opposite directions for different people in the sample. This figure provides intuitive informal definitions of the required assumptions for evaluating causality. There are many different ways to state the formal required assumptions, and we refer the reader to Hernán and Robins [13], Pearl [17], and Glymour and Swanson [18] for different variations. In particular, one alternative way to state the assumptions is using potential outcomes language. For the backdoor criterion, the required assumption to evaluate causality can be phrased as: Conditional on measured covariates, the potential health outcome is unrelated to the actual level of the social policy exposure. Likewise, the required assumption for using an instrumental variable to evaluate causality is: Conditional on measured covariates, the instrument is unrelated to the potential health outcomes that people in the sample would experience under alternative values of the exposure.
In research on the health effects of social policies, identifying and measuring relevant policy exposures is a major [...] [63,64]. Legal Epidemiology resources are available to guide this process and maximize rigor and replicability.
Efforts are growing to support theory-based policy taxonomies [65•, 66], legal databases that are systematically coded and regularly updated (e.g., Law Atlas Policy Surveillance Program, Alcohol Policy Information System), and quantification of policy enforcement [63,64]. Qualitative work is essential to fully understand how policies operate and affect individuals.
Most policy evaluations use binary indicators of policy enactment, but more nuanced and multi-valued measures may better address threats to positivity and conditional exchangeability [67•]. Alternatives to binary measures include the proportion of the population eligible for or receiving a policy benefit [68]; the generosity of benefits [33]; the size of a tax, subsidy, or penalty [33,69,70]; the magnitude of uptake of a newly permitted activity (e.g., the number of new cannabis dispensaries) [71,72]; or the degree of enforcement [73]. If it is not possible to identify the effect of a single policy, the joint effects of policies adopted as a bundle [74,75] or the effect of the policy environment as measured by overall policy stringency or comprehensiveness [34,35,76] may be identifiable.
Analyses involving distinct exposure measures estimate different parameters and therefore answer different research questions. Researchers should consider what information will be most useful for developing new policies. Policymaking is notoriously messy and unpredictable, so the precise policy implemented in the past may not be politically achievable in the future. Research to understand the effects of specific policy features, and evaluations that can offer broader theoretical insights, are often more informative than research to understand a single, narrowly defined policy instantiation. Conflicting evidence on the effects of a policy may result from differing exposure measures. Such conflicts can be conceptualized as arising from violations of the consistency assumption [77]. Varying approaches to implementing similar policies in different places can result in different policy impacts. For example, studies of cannabis legalization often use a binary indicator of legalization, but health impact depends on whether the legalization policy enables retail sales [78][79][80]. For IV studies of the effects of a resource delivered by a policy, it is important to consider whether the consistency assumption holds with respect to the resource variable. For example, would the health benefit of extra income from EITC be similar to the health benefit from extra income due to minimum wage increases?
To avoid consistency violations, all of the relevant provisions and components of implementation must be correctly identified and measured, a particularly challenging task when it is not yet known what matters. Consistency is also problematic for policy measures that are composites, sums, or scores. Policy analyses often treat a one-unit increase in the score as having the same effect regardless of which policy was added or the baseline level of the score [34,81], yet not all policies are equally effective for all outcomes. When consistency violations occur, the evidence delivered by a policy study will not clearly indicate what needs to be done (or not done) to replicate the study's findings, and policy decisions motivated by this evidence may not yield the desired results.
Social policies often have different effects for different population subgroups-i.e., heterogeneous treatment effects (HTEs) [82•]. For example, tobacco clean air policies especially benefit people with low levels of education [83] while paid family leave policies may improve breastfeeding outcomes primarily for high-income mothers [31]. HTEs are important to quantify for several reasons. Policies that disproportionately benefit the vulnerable can reduce health inequities, whereas the reverse can exacerbate inequities. Quantifying HTEs also supports research to anticipate whether the effect of an intervention will differ if implemented in a new population with a different composition. For example, recent extensions of methods for generalizing or transporting effect estimates can account for differences in the socioeconomic status, demographics, clinical profiles, or other characteristics in new settings [84•, 85-88]. On its own, understanding policy effects for specific subgroups (e.g., those receiving the intervention) may be a goal of the research [89,90]. Alternatively, HTE research can indicate who is most likely to have the largest benefit from a given resource, a priority for decision-makers with limited resources.  
Table 1
[...] [21], housing vouchers [22], school vouchers [23], randomized refugee dispersal [24]
Discontinuities based on dates of policy adoption or implementation: Minimum unit pricing on alcohol purchases [25], lowering the blood alcohol content limit for drivers [26], sugar-sweetened beverage taxes [27]
Region and time variation in policy adoption or implementation: Electoral cycles [28], immigration laws [29], motor vehicle safety laws [30], paid family leave laws [31]
Timing of delivery of benefits of a policy: Tax credits and short-term health outcomes in months on tax disbursements (February, March, April) [15], monthly Supplemental Nutrition Assistance Program disbursements and health outcomes in first half versus second half of the month [32]
Policy intensity, restrictiveness, or generosity: Earned income tax credit benefit generosity [16], unemployment benefit generosity [33], firearm law restrictiveness [34], alcohol law restrictiveness [35], minimum wage level [36], beer excise tax [37]
Physical proximity to a jurisdiction with a policy or a resource delivered by a policy: Proximity to educational institutions [38], proximity to borders of states with cannabis legalization [39,40]
[...] [57,58], judges who have different propensities for leniency [59,60]
We distinguish methods for evaluating HTEs based on a priori specified characteristics from data-driven methods which more agnostically search for groups with heterogeneous responses to the policy [82•]. Heterogeneity across pre-specified characteristics (e.g., race/ethnicity, age) is usually quantified via stratification or interaction terms in statistical models, and the chosen dimensions can be guided by theory or evidence [82•].
Data-driven methods can be used to evaluate whether there is any heterogeneity across any characteristics (e.g., across all possible combinations of covariates), to partition the participants into subgroups that have different policy responses, to identify the most-affected subgroup, or to identify optimal policy combinations [67•, 91-95]. Data-driven HTE methods are one of the many emerging applications of machine learning in policy research, but they remain rare in applied health-related policy studies [67•, 82•].
Existing guidance on how HTEs should be evaluated, reported, and interpreted has been limited to randomized trials and clinical applications [96][97][98][99][100]. Policy studies involving HTEs require additional guidelines. Testing for policy effects in multiple population subgroups increases the risk of spurious findings, especially when considering subgroups defined by multiple covariates simultaneously. How this risk should be weighed against the knowledge gained and which methods are most appropriate for limiting spurious findings in policy contexts are not established. Precision concerns are exacerbated when examining HTEs. Given the limited resources, HTE evaluation should be prioritized for policies, settings, and population subgroups for which HTEs are likely to be substantial enough to alter recommendations for policy or practice. Yet few detailed empirical studies or theoretical frameworks are available to guide prioritization. Tools to robustly evaluate HTEs in small sample sizes are also needed.
In policy studies, unexpected, inconsistent, and null results are common. The Moving to Opportunity housing experiment benefitted girls and harmed boys, at least for some outcomes [22]. "Ban the box" policies, designed to reduce racial disparities in employment by preventing employers from conducting criminal background checks at certain stages of hiring, paradoxically exacerbated inequalities for low-skilled workers [101]. Identifying the causes of discrepant results can be challenging because the data sources, measures, statistical methods, and settings of policy studies are so diverse [1 •]. Implementation science and qualitative research can enhance social policy research by unveiling what went wrong or what needs to be done differently (e.g., solutions to differential uptake). Policy research is often disconnected from community engagement efforts, but such research will be stronger and more pertinent if researchers build in strategies to involve and solicit feedback from communities affected by the policies and health outcomes under study.
Spillover or interference violates the independence assumption of standard statistical approaches in social policy research and can lead to anticonservative standard errors and spurious associations. Policies may change health not only for the people to whom the policies directly apply but also for family members, neighbors, or other social contacts, for example, because the health outcome is contagious, responsive to social norms (e.g., smoking), or social in nature (e.g., violent injury). Neighboring jurisdictions with differing laws may motivate individuals to cross borders to avoid or pursue specific policies. For example, cross-border influences have been investigated for policies regulating firearms [102,103], tobacco [104], and sugar-sweetened beverages [105]. Policies operating at lower levels of aggregation (e.g., cities versus countries) may be more susceptible to spillover, because individuals with differing policy exposures are more likely to interact or cross borders. Spillover and interference can magnify the effects of social policies (e.g., via contagion) or attenuate effects (e.g., crossing borders to avoid the policy). Either phenomenon threatens accurate estimation of causal effects.
Potential solutions include spatiotemporal modeling, complex systems modeling, social network analyses, and estimating novel parameters. Researchers have used Bayesian spatiotemporal analyses with conditional autoregressive random effects to account for spatial autocorrelation in studies of local alcohol and cannabis policies [106,107], and agent-based models have been used to evaluate the impact of past and potential alcohol and opioid policies (see Appendix for more on complex systems modeling) [108,109]. Estimating parameters that explicitly incorporate dependencies between jurisdictions or groups in their definition [110] or that allow for policy exposures to be assigned stochastically rather than deterministically [111] can also help address the challenges posed by spillover and interference. Additionally, novel econometric work is advancing methods for assessing causal effects in social networks (e.g., coworkers, classrooms, or neighbors) [112,113].
Generalizing study results to new populations is almost always a goal of epidemiologic research. Advances in theory and statistics increasingly support methods to generalize or transport effect estimates [84•, 85-88]. Given that social policy effects likely differ across population subgroups, a given policy may have very different impacts in one population versus another. Methods for transport and generalization allow researchers to predict the impact of a policy in a population that is different in composition from the one initially studied. Despite the relevance of these methods to policy studies, transportability estimators have almost exclusively been applied to research based in randomized trials. All studies involve tradeoffs between internal and external validity, and no single study can achieve the optimal degree of internal and external validity [9•, 114•]. Even with imperfect internal validity, observational and quasi-experimental studies aiming to generalize estimates of policy effects to new populations can be valuable.
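The core idea behind transporting an effect estimate across populations of differing composition can be sketched with simple standardization: stratum-specific effects from the study population are reweighted by the stratum shares of the target population. The sketch below is a deliberately simplified illustration, not one of the cited estimators. The function name, stratum labels, and all numbers are hypothetical, and the approach assumes that the stratum-specific effects are the same in both populations and that the strata capture all effect modifiers whose distribution differs between them.

```python
def transported_effect(stratum_effects, target_shares):
    """Average stratum-specific effects using the target population's composition.

    stratum_effects: dict mapping stratum label -> effect estimated in the study
    target_shares:   dict mapping stratum label -> share of the target population
    """
    if set(stratum_effects) != set(target_shares):
        raise ValueError("strata must match between effects and shares")
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    return sum(stratum_effects[s] * target_shares[s] for s in stratum_effects)


# Hypothetical effects of a policy on smoking prevalence (percentage points)
effects = {"low_education": -4.0, "high_education": -1.0}

# Hypothetical composition of the study population and of a new target population
study_shares = {"low_education": 0.25, "high_education": 0.75}
target_shares = {"low_education": 0.50, "high_education": 0.50}

print("study population average effect:", transported_effect(effects, study_shares))
print("target population average effect:", transported_effect(effects, target_shares))
```

With these made-up inputs, the larger share of the more responsive stratum in the target population yields a larger predicted average effect there than in the study population.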

Statistical considerations and data needs
Policies often have small effects on individuals [115•]. Although small effects can matter immensely when applied to an entire population [116], many social policy studies lack sufficient sample sizes to precisely estimate these effects [117•]. Underpowered studies risk concluding that a health-promoting or harmful policy has no effect. For this reason, large-scale administrative health data such as vital statistics or Medicare billing records are commonly used for policy studies. Electronic health records (EHR) and other clinical information systems are increasingly viable possibilities for policy analysis. The Affordable Care Act accelerated uptake of EHRs and incentivized adoption of a common data framework, although the lack of population representativeness in such datasets has not been fully addressed.
Administrative data tend to have less detailed, lower quality measurements, and rarely contain information on both social policy exposures and health outcomes. Therefore, health-related social policy evaluations often require linkages across multiple data sources or sacrificing measurement quality. Residential address can be used to link individual-level health outcomes to relevant policy jurisdictions. Residential history information even permits linkage across life course periods. Few datasets offer comprehensive residential histories, but incorporating or linking residential histories into existing datasets would enhance policy research.
Given the limitations of administrative data, potential strategies to improve precision and measurement quality include taking detailed measurements on random subsamples of large datasets [118]; incorporating measures of social program participation into existing large-scale health data collection [119]; incorporating improved health measures into large-scale generalized datasets (e.g., American Community Survey); and supporting big data initiatives that harmonize multiple individual-level, geographically detailed administrative datasets [120]. Additionally, Bayesian statistical methods are rare in policy studies [1 •], but these methods can enhance precision by drawing on prior knowledge about policy effects and bounding the range of plausible effect estimates.
If the sample size is not fixed, power calculations can help ensure that a study is designed to achieve sufficient precision. In practice, power calculations are uncommon for policy studies that use existing data, and such studies may, in retrospect, have very low power [117•, 121, 122]. Over-reliance on null-hypothesis significance testing is a pernicious problem in policy analyses. Interpreting the confidence interval for the estimated effect is essential to avoid concluding that an underpowered study demonstrates that the policy has no important effects. If the confidence interval for the policy effect crosses the null but includes values that would be of substantial benefit or harm when applied at a population level, the study is underpowered.
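The two ideas in this paragraph, prospective power and the reading of a confidence interval that crosses the null, can be sketched as follows. This is a simplified illustration under stated assumptions: a two-sided z-test comparing the means of two equally sized independent groups with a common, known standard deviation, which ignores the clustering and serial correlation typical of policy data. The function names and the threshold parameter smallest_important_effect are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_two_group(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in two group means."""
    se = sd * sqrt(2.0 / n_per_group)           # standard error of the difference
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) / se                    # true effect in standard-error units
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def interval_is_inconclusive(lower, upper, smallest_important_effect):
    """True if the interval includes the null (0) and an effect of important size.

    smallest_important_effect is a positive threshold, on the same scale as the
    estimate, above which a benefit or harm would matter at the population level.
    """
    includes_null = lower <= 0.0 <= upper
    includes_important = (upper >= smallest_important_effect
                          or lower <= -smallest_important_effect)
    return includes_null and includes_important


# Hypothetical small effect of 0.1 standard deviations
for n in (200, 1570):
    print(n, "per group -> power", round(power_two_group(0.1, 1.0, n), 2))
```

Under these assumptions, a small effect requires well over a thousand observations per group for conventional power, which is one reason large administrative datasets are attractive for policy studies.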

Internal validity, conditional exchangeability, and positivity
Confounding bias arises from systematic differences between jurisdictions with different levels of policy exposure (e.g., states that did and did not adopt a given policy). The social, political, and economic forces that shape policies also affect many health outcomes. Confounding can be severe and intractable, for example, when the confounders and the policy are too closely aligned to be disentangled. Strong confounders of policies include other policies and jurisdiction-level political orientations, especially for polarizing issues such as firearms, abortion, or immigration [2•]. For example, the restrictiveness of state firearm policies is strongly determined by state political orientations, and states with stricter policies tend to have many policy restrictions, making it difficult or impossible to disentangle the effect of any one policy from the others [2•, 123]. Quantitative bias analysis, negative control exposures and outcomes, and other robustness checks are valuable yet underutilized tools for assessing the likely direction and magnitude of confounding bias in social policy studies (see Appendix for detail). Standardizing the inclusion of these tools in policy studies is a high priority. Potential checks should be articulated even when they cannot be fielded in a particular dataset.
Most policy studies involve longitudinal data structures and the possibility of time-varying confounding. Time-varying confounding can occur when prior levels of a policy exposure affect downstream confounders which in turn affect subsequent policies and health outcomes. For example, US state prescription drug monitoring programs (PDMP) adopted in response to the overdose crisis may have affected illicit opioid market dynamics by decreasing access to prescription opioids and increasing demand for illicit opioids [124,125]. This pattern may prompt further changes to state opioid policies to reduce use of illicit opioids in an attempt to reduce overdose deaths. For studies evaluating the effects of opioid policies over time, illicit opioid market dynamics partially mediate the PDMP-overdose relationship, but confound the relationship between late-stage opioid policies and overdose. Bias therefore results from either typical regression adjustment for illicit opioid market dynamics or failure to adjust for them. Other examples of potential time-varying confounders include participation in social programs (e.g., SNAP, WIC, EITC), receipt of physical or mental health services, smoking status, alcohol use, exposure to air pollution, access to green space, residence in public housing, and diet quality.
Multiple methods have been developed to address time-varying confounding including inverse probability weighted estimation of marginal structural models, g-estimation of structural nested models, the longitudinal g-formula, and longitudinal targeted minimum loss-based estimation [126][127][128]. However, these methods are rarely used in applied studies, including to evaluate social policies [129]. Barriers to uptake of these methods include uncertainty about how to integrate them with IV or differences-in-differences (DID) analyses of quasi-experiments and data requirements (large sample sizes, repeated observations on units over time).
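As one concrete instance of these methods, the stabilized weights commonly used for inverse probability weighted estimation of a marginal structural model can be written as below. The notation is introduced here for illustration and is an assumption, not notation from the cited works as quoted in this review: for unit i, A_{it} is the policy exposure at time t, \bar{A}_{i,t-1} the exposure history before t, \bar{L}_{it} the history of time-varying confounders through t (e.g., illicit opioid market dynamics in the PDMP example), and T the number of time points.

```latex
\[
SW_i \;=\; \prod_{t=1}^{T}
\frac{f\!\left(A_{it} \mid \bar{A}_{i,t-1}\right)}
     {f\!\left(A_{it} \mid \bar{A}_{i,t-1}, \bar{L}_{it}\right)}
\]
```

Weighting by SW_i aims to remove the association between the time-varying confounders and subsequent exposure without conditioning on those confounders in the outcome model. The approach relies on correctly specified exposure models and on positivity at every time point, conditions that can be hard to meet with a small number of jurisdictions.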
Multiple related policies are often adopted or implemented in the same jurisdiction simultaneously or in quick succession, a problem known as co-occurring policies [1•, 2•]. Many study designs exploit variation in the timing and locations of policy changes across jurisdictions to isolate the causal effects of a policy. Co-occurring policies that all affect the outcome of interest pose a conundrum for such designs: left uncontrolled, co-occurring policies confound one another, but controlling for co-occurring policies can reduce effective sample size and lead to positivity violations. Positivity violations can lead to bias, imprecision, and undefined estimates [1 •, 2•]. This challenge is pervasive across numerous social policy domains and often results in very imprecise effect estimates [2•]. Potential solutions to co-occurring policy problems include explicitly assessing threats to positivity, restricting to population subgroups among whom the policies can be disentangled, using more nuanced measures of policy exposure, defining clusters of policies as the exposure(s) of interest, or using stringency or generosity scores to characterize the overall policy environment [1•, 14].
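Explicitly assessing threats to positivity can start with a simple tabulation of how often each combination of co-occurring policies is observed. The sketch below counts jurisdiction-period records in every cell defined by a set of binary policy indicators and flags empty or sparse cells. The record format, policy names, and the min_count threshold are hypothetical, and the check is unconditional: it does not examine positivity within covariate strata.

```python
from collections import Counter
from itertools import product


def policy_cell_counts(records, policies):
    """Count records in every combination of the named binary policy indicators.

    records:  iterable of dicts, one per jurisdiction-period (e.g., state-year)
    policies: list of keys whose values are 0 (absent) or 1 (in effect)
    """
    observed = Counter(tuple(r[p] for p in policies) for r in records)
    all_cells = product((0, 1), repeat=len(policies))
    return {cell: observed.get(cell, 0) for cell in all_cells}


def sparse_cells(cell_counts, min_count=1):
    """Return policy combinations observed fewer than min_count times."""
    return [cell for cell, n in cell_counts.items() if n < min_count]


# Hypothetical state-years in which two policies are almost always adopted together
records = (
    [{"policy_a": 1, "policy_b": 1}] * 3
    + [{"policy_a": 0, "policy_b": 0}] * 4
    + [{"policy_a": 1, "policy_b": 0}]
)

counts = policy_cell_counts(records, ["policy_a", "policy_b"])
print(counts)
print("never observed:", sparse_cells(counts, min_count=1))
```

Cells that are empty or nearly empty indicate policy combinations for which the data carry little or no information, which is where effect estimates for individual policies rely on model extrapolation.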
Randomization of policies is often considered unethical or impractical due to limited resources or political urgency, but experiments of public social interventions are often feasible. A recent systematic review identified 38 US social policies that were evaluated with randomized designs and examined health outcomes [117•]. Public and political support for social experiments may have waned in recent decades [130].
Epidemiologists have a role to play in advocating for the practicality, ethics, and benefits of randomization for science and public health. While legitimate ethical concerns exist, adoption and maintenance of potentially harmful policies and failure to randomize given the opportunity are also unethical.
New social policies can rarely be implemented instantaneously among all potential beneficiaries, and the need for gradual scale-up creates ethical opportunities to randomize at either the place-level (e.g., randomized stepped wedge designs [131]) or the individual-level (e.g., waitlist controls [132], lotteries [22]). The evaluation of California's Armed and Prohibited Persons System constitutes one of the largest-known cluster-randomized trials of a public policy [133•]. This initiative aimed to recover firearms from people who purchased them legally but later became prohibited from owning them. Collaboration between academic investigators and the California Department of Justice facilitated a randomized rollout of the program across 1000+ communities, which received the intervention either earlier or later.
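The logic of a randomized stepped wedge rollout can be sketched in a few lines. This is an illustrative sketch under simple assumptions (equal-sized waves, no stratification), not the procedure used in any trial cited above. The function names and cluster labels are hypothetical.

```python
import random


def stepped_wedge_schedule(clusters, n_waves, seed=0):
    """Randomly assign each cluster to a rollout wave (1..n_waves) of near-equal size."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    # Deal shuffled clusters round-robin so wave sizes differ by at most one.
    return {cluster: i % n_waves + 1 for i, cluster in enumerate(shuffled)}


def is_exposed(schedule, cluster, period):
    """A cluster is exposed from its assigned wave onward; period 0 is pre-rollout."""
    return period >= schedule[cluster]


clusters = [f"community_{i}" for i in range(10)]
schedule = stepped_wedge_schedule(clusters, n_waves=3)
```

Because every cluster eventually receives the policy, the randomization affects only timing, which is what makes the design compatible with a gradual scale-up.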
Virtually all current efforts at inferring the causal effects of social policies on health can be characterized as either confounder-control or instrument-based methods [9•]. However, Pearl's transdisciplinary causal inference framework recognizes a third approach to causal identification: the front door criterion. This rarely used approach is premised on enumerating and measuring all the causal pathways by which an exposure affects an outcome (Fig. 3). If there are no unmeasured confounders of the exposure-mediator or mediator-outcome relationships, then an unbiased estimate of the causal effect of the exposure on the outcome can be quantified. Importantly, applications of the front-door criterion do not require the investigator to identify, measure, and appropriately adjust for confounders of the exposure-outcome relationship (i.e., backdoor adjustment). Thus, in social policy contexts where accurate and comprehensive confounder-control is challenging, and valid instruments are unavailable, but the mechanisms by which a policy affects health can reasonably be hypothesized and measured, the front door criterion shows great promise.
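The front door logic can be checked numerically in a toy setting. The sketch below assumes a hypothetical structural model with binary variables in which an unmeasured factor U affects both the exposure X and the outcome Y, X affects Y only through the mediator M, and U does not affect M. All probabilities are made-up illustrative values. Working with exact probabilities rather than simulated data, it compares the front door formula, computed only from the observed distribution of (X, M, Y), with the true interventional risk and with the naive conditional risk.

```python
from itertools import product

# Hypothetical structural model; every number is an illustrative assumption.
P_U1 = 0.4                                # P(U=1); U is unmeasured
P_X1_GIVEN_U = {0: 0.2, 1: 0.7}           # P(X=1 | U=u)
P_M1_GIVEN_X = {0: 0.1, 1: 0.8}           # P(M=1 | X=x)
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.4,
                 (1, 0): 0.3, (1, 1): 0.6}  # P(Y=1 | M=m, U=u)


def bern(p1, value):
    """Probability that a Bernoulli(p1) variable equals `value`."""
    return p1 if value == 1 else 1.0 - p1


# Observed joint distribution P(x, m, y), with U marginalized out.
obs = {}
for u, x, m, y in product((0, 1), repeat=4):
    p = (bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
         * bern(P_M1_GIVEN_X[x], m) * bern(P_Y1_GIVEN_MU[(m, u)], y))
    obs[(x, m, y)] = obs.get((x, m, y), 0.0) + p


def p_x(x):
    return sum(p for (xx, _, _), p in obs.items() if xx == x)


def p_m_given_x(m, x):
    return (obs[(x, m, 0)] + obs[(x, m, 1)]) / p_x(x)


def p_y1_given_mx(m, x):
    return obs[(x, m, 1)] / (obs[(x, m, 0)] + obs[(x, m, 1)])


def front_door(x):
    """P(Y=1 | do(X=x)) via the front door formula, using observed quantities only."""
    return sum(
        p_m_given_x(m, x)
        * sum(p_y1_given_mx(m, xp) * p_x(xp) for xp in (0, 1))
        for m in (0, 1)
    )


def truth(x):
    """P(Y=1 | do(X=x)) computed from the structural model, using U."""
    return sum(
        bern(P_U1, u) * bern(P_M1_GIVEN_X[x], m) * P_Y1_GIVEN_MU[(m, u)]
        for u in (0, 1) for m in (0, 1)
    )


def naive(x):
    """Unadjusted P(Y=1 | X=x), which is confounded by U."""
    return (obs[(x, 0, 1)] + obs[(x, 1, 1)]) / p_x(x)
```

In this toy model the front door risk difference matches the true risk difference, whereas the naive contrast overstates it. The agreement depends entirely on the assumed absence of unmeasured exposure-mediator and mediator-outcome confounding and on X having no direct effect on Y.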
Glynn and Kashin demonstrated a compelling application of the front door criterion to estimate the effect of the often-studied Job Training Partnership Act (JTPA) program [134,135] on subsequent earnings by leveraging program adherence as the mediator [136•]. Using data on participants and observational population-based controls, the authors evaluated varying specifications of front door and backdoor adjustment in comparison with results from the original JTPA randomized trial. While estimates from backdoor adjustment were sensitive to the choice of adjustment variables and often failed to replicate the trial results, the front door estimates consistently replicated trial results irrespective of the choice of adjustment variables. This study adds to a small but growing literature suggesting that front door approaches may be more robust to violations of the conditional exchangeability assumption than backdoor approaches [136•, 137, 138]. Similar applications can be conceptualized for social policies when the exposure is eligibility for resources and the mediator is program uptake or adherence.
Estimation methods that move beyond traditional regression coefficients also show promise in addressing threats to validity. For policy domains with political polarization, in which certain jurisdictions are unlikely to ever adopt a policy, defining the causal effect of interest not as an all-or-nothing contrast (e.g., all states adopt the policy versus no states adopt the policy) but rather as a temporal shift (e.g., what if all states adopting the policy delayed adoption by 2 years) can result in a definition of positivity that is more achievable [139, 140•]. Recent advances in econometrics showed that the standard two-way fixed effects design, which involves panel data on multiple jurisdictions over time and indicator variables for each jurisdiction, each time step, and exposure to the policy, can be substantially biased when: (1) the timing of policy implementation is staggered across jurisdictions; and (2) the effect of the policy within a jurisdiction changes over time [141•]. Both conditions are common in studies of social policies and health. Multiple new estimators were developed to address this bias, but they have not yet been widely adopted in applied research [141•].
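For reference, the standard two-way fixed effects specification described above can be written as follows. The symbols are assumptions introduced here for illustration: $Y_{jt}$ is the outcome in jurisdiction $j$ at time $t$, $\alpha_j$ and $\gamma_t$ are jurisdiction and time fixed effects, and $D_{jt}$ indicates exposure to the policy.

```latex
% Two-way fixed effects model: beta is conventionally read as "the" policy effect.
\begin{equation}
Y_{jt} = \alpha_j + \gamma_t + \beta D_{jt} + \varepsilon_{jt}
\end{equation}
```

With staggered adoption, the single coefficient $\beta$ pools many comparisons, including some in which already-exposed jurisdictions serve as the comparison group for later adopters. If the effect of the policy grows or fades over time within a jurisdiction, those comparisons are contaminated by the changing effect, which is one commonly cited account of why $\beta$ can be biased under conditions (1) and (2).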

Conclusions
Rigorous research on the health effects of social policies can be highly relevant to public decision-making. Thus, epidemiologic research in this field is likely to grow. In the existing literature, study designs, data sources, policy domains, health outcomes, and populations of interest vary widely. Study quality also varies, and the strongest studies are those involving careful selection of the research question and precise attention to which populations represent the counterfactual outcomes, i.e., which populations are being used to approximate the outcomes the population exposed to the policy would have experienced if the policy had not been implemented. The estimand most plausibly identifiable with the data at hand is often not the estimand of greatest substantive interest. Navigating the tension between the question of greatest interest and the question that can actually be answered is central to designing rigorous policy studies.
Numerous methodological tools are underutilized and are poised to enhance the rigor of research on the health effects of social policies as these methods become more accessible. Methodological development to integrate epidemiologic and econometric methods, and to increase the size and measurement quality of datasets will also improve study quality. Growing focus on evaluating heterogeneity in policy effects and anticipating the health effects of policies in new populations will expand the relevance of policy research and help to meet the growing demand for evidence to guide social policy decisions.

Appendix
Common confounder-control and instrument-based methods: Common confounder-control methods for policy research include multiple regression models [46,76,142,143], matching or propensity score weighting [144,145], many implementations of differences-in-differences (DID), and DID variants including two-way fixed effects and synthetic controls [26, 27, 29, 36, 146•, 147]. These methods are all premised on the assumption that the factors that determine policy exposure and also influence health (or proxies for these factors) are fully measured and accounted for in the statistical analysis. For example, average education, income, or attitudes of community residents all affect health and may influence the likelihood a policy is adopted. Researchers often attempt to measure and control for each of these variables.
Instrument-based methods for policy research use instrumental variables (IV) analysis [45,148,149]. IV can also be implemented as fuzzy regression discontinuity [150][151][152], or DID, the latter if the instrument is defined as an indicator for the jurisdictions and times with the policy [153]. Instrument-based methods require the assumption that the instrument (e.g., the discontinuity indicator or the policy adoption indicator) is unrelated to the health outcomes that people in the sample would have experienced under alternative values of the policy exposure. Although instrument-based methods in health research are commonly used to evaluate the effect of a social resource delivered by the policy, they may also be used to evaluate a policy as the exposure.

Fig. 3 Directed acyclic graph and required assumptions for evaluating causality via the front door criterion. Legend: In the absence of a randomized trial, causality is typically evaluated via one of two approaches: fulfilling the backdoor criterion (i.e., controlling for confounders) or using an instrumental variable. A rarely used alternative is to evaluate causality by fulfilling the front door criterion. Using the front door criterion is appealing because it can be fulfilled even when there are unmeasured confounders of the exposure-outcome relationship. This figure provides intuitive informal definitions of the required assumptions for evaluating causality via the front door criterion. There are many different ways to state the formal required assumptions, and we refer the reader to Hernán and Robins [13] and Pearl [17] for different variations.
Power calculations: If the sample size is not already fixed, power calculations can help ensure a study is designed to achieve sufficient precision. However, as inputs to such calculations, evidence to guide the selection of anticipated effect sizes for social policies and health is sparse. Several considerations suggest large effect sizes are unlikely [115•]. First, most social policies act on health through intermediate levers such as housing or economic security. Second, a large fraction of the population may be ineligible or otherwise unaffected by the social policy; population-level impacts will be muted in proportion to the fraction of the population for whom the policy is irrelevant. Some research suggests that power calculations are rarely done for policy studies with existing data and fixed sample sizes, and that in retrospect such studies may have had very low power [117•, 121, 122].
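The consequence of effect dilution for sample size can be illustrated with a textbook normal-approximation formula for comparing two means. This is a simplified sketch, not a recommended design procedure: it ignores clustering, serial correlation, and covariate adjustment, all of which matter in policy studies. The function names and input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided z-test of a difference in means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile corresponding to the target power
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


def diluted_effect(effect_among_affected, fraction_affected):
    """Population-average effect when only a fraction of the population is affected."""
    return effect_among_affected * fraction_affected


# Hypothetical inputs: a 0.5 SD effect among those affected, 20% of people affected.
n_if_all_affected = n_per_group(effect=0.5, sd=1.0)
n_if_diluted = n_per_group(effect=diluted_effect(0.5, 0.2), sd=1.0)
```

Because the required sample size scales with the inverse square of the effect, a policy that reaches one fifth of the population requires roughly 25 times as many observations to detect its population-level effect with the same power.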
Quantitative bias analysis, negative control exposures and outcomes, and other robustness checks are valuable yet underutilized tools for assessing the likely direction and magnitude of biases in social policy studies. Quantitative bias analyses (QBA) are a class of methods used to estimate the direction, magnitude, and uncertainty of systematic biases in estimated measures of association. For example, researchers have quantified the E-value, which indicates the minimum strength of association that an unmeasured confounder would need to have with both the policy and the outcome to fully explain away the estimated policy-outcome association [103,154]. In policy studies, QBA may be useful, for example, in estimating the potential magnitude of bias arising from a particular strong confounder or mis-measurement of relevant policy features. Guidance for conducting QBA is growing [155, 156, 157•], but debate remains over which approaches are most informative [158,159].
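As a brief illustration, the E-value for a point estimate on the risk ratio scale can be computed with the closed-form expression usually attributed to VanderWeele and Ding, E = RR + sqrt(RR × (RR − 1)), applied after inverting protective estimates. The function name below is hypothetical, and the sketch covers only the point estimate, not the confidence limit.

```python
from math import sqrt


def e_value(rr):
    """E-value for a risk ratio point estimate.

    Protective estimates (rr < 1) are inverted first so the formula applies
    on the same scale in both directions.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + sqrt(rr * (rr - 1))


# Hypothetical estimated policy-outcome risk ratio of 2.0.
example = e_value(2.0)
```

For an estimated risk ratio of 2.0 the E-value is 2 + sqrt(2), so an unmeasured confounder would need risk ratio associations of about 3.4 with both the policy and the outcome to fully explain away the estimate.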
Negative controls are useful for detecting biases including unmeasured confounding, recall bias, and analytic flaws [160]. For example, suicides completed with means other than firearms have been used as a negative control when evaluating the impacts of firearm policies on firearm suicides [161]. If firearm and nonfirearm suicides have similar determinants, but nonfirearm suicides are unaffected by firearm policies, then testing the association of firearm policies with nonfirearm suicides can be used to detect residual confounding due to these factors. Negative control variables can take many forms, and their utility depends on identifying a compelling negative control exposure or outcome. Other forms of robustness checks common in the econometric literature have been reviewed elsewhere [67•]. For research on the health effects of social policies, a high priority is to standardize the inclusion of QBA, negative controls, and other robustness checks. Potential checks should be articulated even when they cannot be fielded in a particular dataset.
Complex systems modeling, a suite of mathematical tools including compartmental, system dynamics, and agent-based modeling, can help address the multi-level, dynamic, interrelated nature of real-world phenomena that make studies of social policies challenging [109, 162•]. These approaches can explicitly model feedback loops and reverse causation in which a policy is adopted in response to a problem, which can otherwise give the impression that a policy magnified the problem it was designed to address [124]. Systems modeling approaches can be used to answer causal questions about the impacts of social policies, but a main contribution of systems modeling for policy studies is predicting the potential impacts of hypothetical future policies in places where the policies have not yet been implemented [162•]. Lack of widespread standardized training in complex systems modeling is a primary barrier to greater uptake of these methods [163].
Another recent and exciting methodological advance for policy research is the development of automated search algorithms to support causal identification. For example, given a dataset, machine learning can be used to discover instrumental variables or natural experiments that can be leveraged for causal identification [164]. Causal search algorithms can also be used to determine which directed acyclic graphs are consistent with an observed dataset and thus which causal mechanisms are most plausible [165]. These emerging algorithms are promising both because of their potential to strengthen causal inferences about the health effects of social policies and to better understand the complex mechanisms via which social policies act to affect health. Pairing these algorithms with deep substantive knowledge is critical to ensure their appropriate use.

Funding
This work was supported by the Evidence for Action program of the Robert Wood Johnson Foundation and grant number K99-AA028256 from the National Institute on Alcohol Abuse and Alcoholism.

Conflict of Interest
The authors declare no competing interests.

Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.