How to evaluate the effects of IMF conditionality

Following calls for a more disaggregated approach to studying the consequences of IMF programs, scholars have developed new datasets of IMF-mandated policy reforms, or ‘conditionality.’ Initial studies have explored how conditions have, inter alia, affected tax revenues, public sector wages, and health systems. Notwithstanding the important contributions of these studies, a methodological quandary arises as to how to quantitatively examine the effects of conditionality, as distinct from other aspects of IMF operations (e.g., credit, technical support, or aid and investment catalysis). In this article, we review and advance these methodological debates by developing an identification strategy for addressing the multiple endogenous components of IMF programs. We begin by surveying the main strategies for studying the effects of IMF programs: matching methods, instrumental variable approaches, system GMM estimation, and variants of Heckman estimators. We then adapt these methods for studying the effects of conditionality per se. Specifically, we utilize a compound instrumental variable design over a system of three equations to address sources of endogeneity related to, first, the IMF participation decision and, second, the conditions included within the program. In Monte Carlo simulations, we demonstrate that our approach is unbiased and performs better than alternatives on standard diagnostics across a range of scenarios. Finally, we apply these methods to investigate how IMF programs impact government education spending as a share of GDP on a sample of 132 developing countries for the period 1990 to 2014, finding exposure to an additional condition results in a 0.05 percentage point decline.


Introduction
Established in 1944, the International Monetary Fund (IMF or Fund) is a cornerstone institution of global economic governance. Not only is it central to the functioning of the world economy (Kentikelenis and Seabrooke 2017; Stone 2011; Woods 2006), but it has also played a decisive role in the long-run developmental trajectory of middle- and low-income countries (Babb and Kentikelenis 2018; Dreher 2006; Dreher and Lang 2019; Kentikelenis et al. 2016; Vreeland 2003), affecting the lives of billions in the process (Babb 2005; Kentikelenis 2017). Unsurprisingly, the institution has invited controversies, with a track record rarely praised among scholars. 1 Among its various activities, the most contentious has been its practice of conditional lending. According to its founding charter, the Fund can provide temporary financing under 'adequate safeguards' to countries experiencing balance of payments problems. In exchange for this support, countries must agree to implement IMF-designed policy reform packages-or 'conditionality'-administered through a lending program. These programs typically last from six months to three years, and loan disbursements are phased over the duration in tranches, contingent upon the implementation of policy reforms.
The conditionality apparatus of IMF lending programs has two forms: quantitative and structural conditions (Bird 2009; IMF 2015). The former take the form of quantifiable macroeconomic targets that countries must meet and maintain throughout the program, such as credit aggregates, international reserves, fiscal balances, and external borrowing, and make up the majority of conditionality up to the present. Although quantitative conditions may overly restrict governments' fiscal policy space, policymakers can pursue a range of alternative policies to meet them; for example, several types of measures can yield budget deficit reductions. In contrast, structural conditions clearly specify the means that contribute to meeting the macroeconomic targets and other objectives. They concern a wider range of microeconomic reforms and afford governments less flexibility. Such reforms have commonly aimed at altering the underlying structure of an economy; for instance, by privatizing state-owned enterprises, legislating central bank independence, deregulating labor markets, or restructuring tax systems.
A large body of quantitative research is devoted to understanding the consequences of IMF programs (Abouharb and Cingranelli 2009;Bas and Stone 2014;Dreher 2006;Nelson and Wallace 2017;Nooruddin and Simmons 2006;Oberdabernig 2013;Stubbs et al. 2016). Most of this work has relied on a broad-brush binary indicator for whether or not a country is under a Fund program in a given year as a measure of the organization's engagement-plugging it into regression models and using it to differentiate control and treatment groups. Notwithstanding the important contributions of these studies, this methodological approach suffers from two major shortcomings. First, it assumes all IMF programs are identical, while-in practice-they entail heterogeneous policy content: some have a wide array of conditions attached, spanning multiple policy areas (e.g., 126 conditions for Romania in 2004); while others require a very limited number of measures (e.g. four conditions for Morocco in 2013). Second, the technique is unable to differentiate the effects of IMF conditionality from other pathways of program influence; for instance, through credit injections (Dreher 2006), scaled-up technical assistance and policy advice (Broome and Seabrooke 2015;IMF 2016a), aid and investment catalysis (IMF 2004;Stubbs et al. 2016), and moral hazard (Dreher and Walter 2010).
While scholars have developed appropriate methodological solutions to analyze the impact of IMF programs, further advancement of this research agenda has-until recently-been hamstrung by the lack of disaggregated data on conditionality. Prescient of the fact that these methods might soon reach their frontier, Vreeland (2006) called for the adoption of such a disaggregated approach to IMF conditionality over a decade ago. Scholars have responded to this call, assisted by the release of the IMF's Monitoring of Fund Arrangements (MONA) database of conditions and new panel datasets developed by independent scholars (Beazer and Woo 2016;Caraway et al. 2012;Copelovitch 2010a;Kentikelenis et al. 2016;Rickard and Caraway 2018). Given this marked increase in the range and detail of data available on conditionality, along with growing interest in the topic since the controversies surrounding the IMF's handling of the global financial crisis (Grabel 2011), the stage is set to undertake fine-grained quantitative analyses of the impact of conditionality per se. These methodological advances necessitate revisiting established methodologies to examine the impact of IMF programs.
This article elaborates on how to use panel data on conditionality in IMF programs to provide quantitative analyses of their effects, focusing on the impact stemming from the degree of conditionality-that is, the number of conditions applicable either in total, across condition types (i.e., quantitative or structural), or in a given policy area. We consider two endogenous components of treatment: (1) countries may select into IMF programs reflecting, for example, the severity of crises requiring IMF assistance; and (2) once participating in IMF programs, countries may select (or be selected) into greater or lesser degrees of conditionality.
Resolving these methodological quandaries allows scholars to test hypotheses and enrich understandings on the consequences of IMF programs. Does the IMF assist developing countries to improve their economic or social condition? Did changes in IMF policies introduced to address criticisms have the intended effects? Debates about IMF conditionality are, in essence, debates about development and globalization. For borrowing countries, their policy mix of conditionality determines their mode of integration into the world economic system and their ability to provide basic services to their population. Yet, despite such far-ranging implications, much debate has taken place on the basis of haphazard or inadequate empirical data. Accounting for IMF program heterogeneity, as endeavored here, is a step to rectifying this gap in our knowledge.
Our exposition of methodological issues, combined with Monte Carlo simulations, establishes the efficacy of maximum likelihood estimation (MLE) over a system of three simultaneous equations, in effect combining: (a) a compound instrumental variable approach to account for endogeneity of IMF program participation, using as an excludable instrument the interaction term of plausibly exogenous time variation in the budget constraint of the IMF and cross-sectional variation in the average of IMF program participation across the period of interest; and (b) a compound instrumental variable approach to account for endogeneity of IMF conditionality, using the interaction term of the budget constraint of the IMF and cross-sectional variation in the average number of conditions a country receives. By including both IMF participation dummy and conditionality variables, we are able to isolate effects of conditions from other aspects of IMF operations. We demonstrate the utility of our approach by applying these methods to examine how IMF programs impact government education expenditure in a sample of 132 developing countries for the period 1990 to 2014. We find that exposure to an additional IMF condition results in a 0.08 percentage point decline in government spending on education as a share of GDP.
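The construction of the compound instruments described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `compound_instrument`, the toy `budget` series, and the two-country panel are hypothetical and do not come from the article's data. The sketch multiplies a year-specific measure of the IMF's budget constraint by each country's average exposure over the period.

```python
from statistics import mean


def compound_instrument(budget_by_year, exposure):
    """Interact a time-varying budget measure with country-average exposure.

    budget_by_year: dict mapping year -> IMF budget-constraint measure
                    (hypothetical values in this sketch).
    exposure:       dict mapping country -> {year: exposure}, where exposure is
                    a 0/1 participation dummy or a count of conditions.
    Returns a dict mapping (country, year) -> instrument value.
    """
    instrument = {}
    for country, by_year in exposure.items():
        # Cross-sectional component: the country's average over the period.
        country_average = mean(by_year.values())
        for year in by_year:
            # Time component: the budget measure for that year.
            instrument[(country, year)] = budget_by_year[year] * country_average
    return instrument


# Toy inputs, invented for illustration only.
budget = {1990: 1.0, 1991: 2.0}
participation = {
    "A": {1990: 1, 1991: 1},  # always under a program
    "B": {1990: 0, 1991: 1},  # under a program half of the time
}
z_participation = compound_instrument(budget, participation)
```

Passing yearly counts of conditions instead of participation dummies to the same function would give the analogous instrument for conditionality.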
The article is structured as follows. The next section reflects on methodological challenges scholars face when studying the effects of IMF programs and provides an overview of four main approaches: matching methods, instrumental variable approaches, system GMM estimation, and variants of Heckman estimators. Section 3 describes our strategy for investigating the effects of IMF conditionality per se. Section 4 illustrates these methods by examining the effects of IMF conditionality on government education expenditure. Section 5 reflects on the limitations and broader relevance of our methodological advances.

A review of established methods for studying the impact of IMF programs
Scholars interested in estimating the effects of IMF programs typically assemble a time-series cross-section dataset with a large number of countries and years (e.g., all developing countries for a period spanning two decades), where the unit of analysis is the country-year. Studies collapse aspects of the IMF's operations into a binary indicator coded '1' if a country had an active program in a given year, and '0' otherwise (Bas and Stone 2014; Dreher 2006; Vreeland 2002). Existing strategies for estimating the average treatment effect 2 of this IMF program participation variable on some outcome variable-such as economic growth, foreign direct investment, or social expenditures-all confront the issue of selection bias (Steinwand and Stone 2008). This form of endogeneity is introduced because the circumstances of countries participating in IMF programs are systematically different from those not participating, which may in turn affect the outcome of interest. While some of these forces are observable and can thus be included as control variables (e.g., government fiscal balance or international reserves), other factors-such as political willingness to implement reforms-are not directly observable (Przeworski and Vreeland 2000; Stone 2008; Vreeland 2003). Failure to account for factors that are correlated with both IMF participation and the outcome would thus erroneously attribute their effects to IMF participation. Scholars have employed four strategies to overcome this limitation: matching methods, instrumental variables approaches, system GMM estimation, and variants of Heckman estimators. 3 We discuss each in turn.

Matching methods
Matching seeks to address the issue of selection bias arising from observables by pairing observations with similar context but different IMF participation status (Atoyan and Conway 2006). However, it does not offer a solution to selection bias arising from unobservables since it can only be used when variation between participating and nonparticipating countries can be captured by observed covariates (Hardoy 2003). The advantage of using matching methods is that they do not, in principle, require identification of a valid instrument (discussed later), and reduce dependence on modelling and distributional assumptions that accompany parametric approaches (Copelovitch 2010b).
Matching approaches focus on the impact of IMF participation for countries paired with other countries at a similar likelihood of participation to identify an average treatment effect on the treated (ATET) (Wooldridge 2010). The ATET can be distinguished from an average treatment effect (ATE) insofar as the former identifies the mean effect of those countries that actually participated in an IMF program, whereas the latter refers to the impact of a randomly selected country against a counterfactual non-participation state without considering whether or not the selected country would ever actually qualify for or be interested in participating in an IMF program in the first place (Hardoy 2003).
An initial step in the matching procedure is to calculate the probability of participating in IMF-supported programs conditional on observable economic and political conditions, estimated via a probit model. 4 The next step entails generating matches of similar probabilities, or propensity scores, between pools of participating and nonparticipating observations, or country-years, to construct a control group (Atoyan and Conway 2006; Bal Gündüz 2016). For instance, if we assume that country selection is driven only by levels of foreign reserves, then we could match Uganda in 1981 with Tanzania in 1983. Both cases had low reserves-0.1 times monthly imports-but only the former country participated in an IMF program. In our hypothetical example, Uganda in 1981 thus enters the treatment group, while Tanzania in 1983 enters the control group, in effect acting as a counterfactual Uganda. This process repeats until all treatment and control country-years are paired. In practice, these matches can be constructed using various tolerance levels and matching techniques, such as nearest-neighbor matching, interval matching, or kernel matching (see Morgan and Winship 2007). 5 The final step involves calculating the ATET as the difference in means between treatment and control groups for matched data.
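The last two steps of this procedure can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated, and it uses a simple greedy nearest-neighbor rule with replacement rather than the distance-minimizing assignment used in the literature. The function name `nearest_neighbor_atet`, the `caliper` parameter (the tolerance level), and all numbers are hypothetical.

```python
def nearest_neighbor_atet(treated, controls, caliper):
    """Greedy nearest-neighbor matching on the propensity score.

    treated, controls: lists of (propensity_score, outcome) tuples for
                       participating and nonparticipating country-years.
    caliper:           largest score distance at which a match is accepted.
    Returns (atet, number_of_matched_treated_observations).
    Controls may be reused (matching with replacement).
    """
    differences = []
    for score_t, outcome_t in treated:
        # Closest nonparticipant in terms of absolute score distance.
        score_c, outcome_c = min(controls, key=lambda c: abs(c[0] - score_t))
        if abs(score_c - score_t) <= caliper:
            differences.append(outcome_t - outcome_c)
        # Treated observations without an acceptable match are dropped.
    if not differences:
        raise ValueError("no treated observation matched within the caliper")
    return sum(differences) / len(differences), len(differences)


# Invented (propensity score, outcome) pairs for illustration only.
treated_obs = [(0.80, 2.0), (0.60, 1.0), (0.95, 3.0)]
control_obs = [(0.78, 2.5), (0.58, 1.5), (0.30, 4.0)]
atet, n_matched = nearest_neighbor_atet(treated_obs, control_obs, caliper=0.05)
```

In this toy data the third treated observation has no nonparticipant within the caliper and is excluded, which mirrors the trade-off between match quality and sample coverage.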
Several studies deploy matching methods to explore the impact of participation in IMF programs (Atoyan and Conway 2006; Bal Gündüz 2016; Garuda 2000; Hardoy 2003; Nelson and Wallace 2017). For instance, Hardoy (2003) uses nearest-neighbor matching to examine the relationship between IMF participation and economic growth, observing no statistically significant difference across treatment and control groups. Atoyan and Conway (2006) also employ nearest-neighbor matching in their study of economic growth, and report no contemporaneous effects of IMF participation. The authors note that their approach excludes 105 of the 181 IMF participation country-years due to a lack of matchable nonparticipants, in effect constraining the analysis to a subsample of countries drawn from the middle of the distribution of propensity scores. More recently, Bal Gündüz (2016) showed IMF participation is positively associated with short-term economic growth for low-income countries in a nearest-neighbor design; and Nelson and Wallace (2017) extend matching procedures in an iterative algorithm-so-called genetic matching (Diamond and Sekhon 2013)-revealing a modest but positive effect of IMF participation on the level of democracy.
4 Matching by propensity score is the traditional and most popular approach, but alternative matching criteria include index scores or Mahalanobis metrics (Augurzky and Kluve 2007; King and Nielsen 2018).
5 Nearest-neighbor matching, the most commonly deployed matching technique in studies on the effects of the IMF, attempts matches in terms of the absolute distance between their propensity scores, subject to the goal of minimizing the sum of all distances over all possible sets of matches (Atoyan and Conway 2006). The choice of tolerance level determines the maximum absolute distance between propensity scores at which two observations can still be matched; unmatched observations are excluded from the subsequent ATET calculation.
Despite the merits of this method, important limitations remain. As mentioned, a key assumption is that all meaningful variation between participating and non-participating country-years can be captured by observed-or pre-treatment-covariates (Hardoy 2003). Since matching relies only on observable determinants of IMF participation to generate propensity scores, it can actually accentuate selection bias (Dreher 2006; Przeworski and Vreeland 2000; Vreeland 2003). Vreeland (2003) explains that matching methods systematically confuse the effect of participation with unobserved factors (e.g., political will), which can affect both selection into IMF programs and the outcome of interest. This selection on unobservables thus contradicts the necessary assumption for matching to provide consistent estimates when comparing means across participating and non-participating countries (Bas and Stone 2014). Another limitation is the inherent trade-off scholars face between minimizing the differences between matches, which may exclude treatment cases due to incomplete matching, or maximizing the number of matches, which may result in poor matches for treatment cases (Atoyan and Conway 2006). With no set benchmark for a suitable level of tolerance, the decision is at the researcher's discretion. Further, matching techniques do not appropriately account for the time-series cross-sectional structure of datasets (Nielsen and Sheffield 2009), which are often used in analyses of the effects of IMF programs. In particular, many matching methods match country-year observations rather than clusters of country-year observations nested within panels, or countries. More generally, King and Nielsen (2018) caution against using propensity score matching as a method of causal inference entirely, citing inadequacies in the theoretical justification of its mathematical proof that introduce statistical biases to its results.

Instrumental variable approaches
A second solution to the problem of endogenous explanatory variables is two- or three-stage least squares (2SLS or 3SLS) estimation using one or a series of instrumental variables. To serve as an instrument, a variable must fulfil two criteria: first, the 'exclusion criterion' is that it must not affect the outcome except via IMF participation; second, the 'relevance criterion' is that it must be partially correlated with IMF participation once other exogenous variables have been netted out (Wooldridge 2010). In 2SLS estimation, predicted values are obtained for IMF participation by regressing it on exogenous variables from the outcome equation and the excluded instrumental variables. The outcome variable is then regressed on predicted values of IMF participation and observed values of exogenous variables. Extending the 2SLS procedure, 3SLS estimation incorporates information from cross-correlations of error terms in a system of simultaneous equations for multiple endogenous variables to produce more efficient parameter estimates (Barro and Lee 2005). 6
Past research has relied on a range of political economy variables as instruments for IMF participation, which vary depending on the outcome of interest (Barro and Lee 2005; Butkiewicz and Yanikkaya 2005; Dreher 2006; Dreher and Gassebner 2012; Easterly 2005; Moser and Sturm 2011; Oberdabernig 2013; Steinwand and Stone 2008). Most studies rely on United Nations General Assembly (UNGA) voting similarity with the US (Dreher and Gassebner 2012; Steinwand and Stone 2008; Woo 2013); that is, all else equal, countries that vote similarly to the US are more likely to participate in IMF programs. To act as a valid instrument, voting patterns must influence IMF participation (Thacker 1999), but not affect the outcome variable except via IMF participation. Using this instrument, Dreher (2006) shows that IMF participation reduces growth rates even accounting for endogeneity, and Barro and Lee (2005) find that greater participation rates in Fund programs reduce economic growth, democracy, and rule of law.
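The two-stage logic described above can be made concrete with a small simulation. The sketch below is not drawn from any of the cited studies: it assumes a single instrument, no control variables, a binary participation dummy, and an invented data-generating process in which an unobserved factor (think of political will) drives both participation and the outcome. All names and parameter values are hypothetical.

```python
import random


def ols_slope_intercept(x, y):
    """Bivariate OLS of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one excluded instrument z."""
    # Stage 1: regress participation on the instrument, keep fitted values.
    b1, a1 = ols_slope_intercept(z, x)
    x_hat = [a1 + b1 * z_i for z_i in z]
    # Stage 2: regress the outcome on the fitted participation values.
    b2, _ = ols_slope_intercept(x_hat, y)
    return b2


# Simulated data (assumption): true participation effect of -1.0.
TRUE_EFFECT = -1.0
rng = random.Random(0)
y, x, z = [], [], []
for _ in range(20000):
    z_i = rng.gauss(0, 1)          # instrument, unrelated to the confounder
    u_i = rng.gauss(0, 1)          # unobserved confounder
    x_i = 1.0 if z_i + u_i > 0 else 0.0
    y_i = TRUE_EFFECT * x_i + 2.0 * u_i + rng.gauss(0, 1)
    z.append(z_i)
    x.append(x_i)
    y.append(y_i)

naive_estimate = ols_slope_intercept(x, y)[0]   # ignores selection
iv_estimate = two_stage_least_squares(y, x, z)  # instruments participation
```

Because the confounder raises both participation and the outcome, the naive regression attributes its effect to participation, whereas the instrumented estimate should lie close to the simulated effect. Standard errors from a manual second stage are not valid and are omitted here.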
Nonetheless, identifying valid instruments for all possible outcomes of interest remains a key problem associated with this method. Studies using instruments that proxy the geopolitical importance of a recipient country assume that the Local Average Treatment Effect (LATE) is representative of all IMF programs, not just the politically motivated ones (Dreher et al. 2018). In practice, the LATE might not be generalizable, as politically motivated programs could be less effective. Further, some studies adopting instruments may breach the exclusion criterion. For example, if the outcome is democracy then the UNGA instrument is not excludable (Nelson and Wallace 2017), since democratic states exhibit similar voting patterns to those cast by the US (Carter and Stone 2015). We also observe a clear breach in a study examining the influence of IMF participation on social expenditures by deploying international reserves, bilateral exchange rate, and an exchange rate classification index as instruments (Clements et al. 2013), all of which can affect social spending outside the IMF channel. 7 The challenge of identifying valid instruments is compounded when faced with the additional concern about the endogeneity of IMF conditionality (discussed later).

System GMM estimation
System generalized method-of-moments (GMM) estimators for dynamic panels (Arellano and Bond 1991; Arellano and Bover 1995; Blundell and Bond 1998) have recently been utilized to allay concerns of endogeneity in IMF participation. Unlike standard instrumental variable approaches, this method does not assume that valid instruments are available outside the immediate dataset, instead employing internally derived instruments based on lagged values of levels and differences of IMF participation. System GMM proceeds by estimating a system of two simultaneous equations: a 'differences' equation, where explanatory variables are first-differences, uses lagged levels of IMF participation from two or more previous time periods to instrument the contemporaneous change in IMF participation; and a 'levels' equation, where explanatory variables are levels, uses lagged first-differences to instrument contemporaneous levels of IMF participation (Roodman 2009a, b).
6 3SLS uses the 2SLS estimates for each equation in a system of simultaneous equations to obtain an estimate of the contemporaneous variance-covariance matrix of the errors across the system. A transformed single-equation representation of the system then yields 3SLS estimates, which are consistent and asymptotically more efficient than 2SLS estimates (Nsouli et al. 2006).
7 Exchange rates are not excludable because currency depreciation raises the costs of imported drugs and hospital equipment, which can increase government social spending (Kentikelenis et al. 2015). Likewise, governments with greater accumulations of international reserves can draw down on them to safeguard social expenditures during economic downturns (Thomson 2015).
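To show where the internal instruments come from, the following sketch builds them for a single country's participation series. It covers instrument construction only, not the GMM estimation itself, and the function name `gmm_style_instruments` and the toy series are hypothetical.

```python
def gmm_style_instruments(series):
    """Build internal instruments from one country's participation series.

    series: list of participation values ordered by period (index 0 first).
    Returns (difference_rows, level_rows) where
      difference_rows holds (t, change in x at t, [x_0, ..., x_{t-2}]),
        i.e. levels lagged two or more periods instrument the change;
      level_rows holds (t, x_t, change in x at t-1),
        i.e. the lagged first difference instruments the level.
    """
    difference_rows, level_rows = [], []
    for t in range(2, len(series)):
        change_t = series[t] - series[t - 1]
        lagged_levels = series[: t - 1]          # x_0 ... x_{t-2}
        difference_rows.append((t, change_t, lagged_levels))

        lagged_change = series[t - 1] - series[t - 2]
        level_rows.append((t, series[t], lagged_change))
    return difference_rows, level_rows


# Toy series: no program, two program years, then exit.
diff_rows, lvl_rows = gmm_style_instruments([0, 1, 1, 0])
```

The number of available lagged levels grows with each period, which is one reason the instrument count can become large relative to the sample.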
Studies of the consequences of IMF participation have only infrequently relied on system GMM estimators (Clements et al. 2013;Dreher and Gassebner 2012;Dreher and Walter 2010;Mukherjee and Singer 2010). Dreher and Walter (2010) found a negative association between IMF participation and currency crisis, and a positive association between IMF participation and exchange rate devaluation in response to a crisis, treating IMF participation and a lagged dependent variable as endogenous. IMF staff also adopted a system GMM setup to examine the effects of IMF participation on health and education expenditures, finding a positive effect on spending in low-income countries (Clements et al. 2013). In their approach, GDP per capita and government balance were internally instrumented, whereas IMF participation was externally instrumented. Additional studies have deployed system GMM only in robustness checks (Dreher and Gassebner 2012;Mukherjee and Singer 2010).
Despite its advertised flexibility, system GMM estimation makes strong assumptions about the data generating process. It assumes that the correct model for the outcome is dynamic (i.e., present changes are a function of past trends), that lagged differences can predict contemporaneous levels, and that first differences of instruments are uncorrelated with country fixed effects (Roodman 2009a, b; Stuckler et al. 2012). However, for the latter assumption to hold, country fixed effects and first differences of IMF participation must offset each other across the entire panel. It requires that "throughout the study period, [countries] sampled are not too far from steady states, in the sense that deviations from long-run means are not systematically related to fixed effects" (Roodman 2009b, p. 128). Whether this criterion is fulfilled depends on the sample of countries and time periods included, but is unlikely to be met in the context of IMF interventions. An additional limitation is that system GMM estimation is sensitive to numerous minutiae-for example, the number of instrument lags, whether they are collapsed, and whether estimation is one- or two-step-none of which has a clear theoretical basis when studying IMF participation (Stuckler et al. 2012). As Roodman (2010) explains, these choices matter: they can make estimates more or less valid, and they can make certain tests of that validity stronger or weaker. There is also a risk of over-fitting endogenous variables by introducing too many instruments, thereby failing to expunge their endogenous components (Roodman 2009a). Roodman (2009a, p. 156) concludes that "the estimators carry a great and under-appreciated risk: the capacity by default to generate results that are invalid and appear valid."

Variants of Heckman estimators
Heckman variants correct for selection bias by treating non-random assignment of countries into IMF participating and non-participating groups as an omitted variable problem (Heckman 1979). In effect, the omitted variable is a catch-all term that captures the qualities that make the entity prone to selection. Like instrumental variable approaches, the appeal of Heckman variants is that they can control for selection on unobservables, such as political will (Vreeland 2003); yet, they are more efficient than instrumental variable approaches when the selection variable is dichotomous-such as IMF participation-rather than continuous (Wooldridge 2015).
Two main variants of Heckman estimators have been deployed in IMF literature: a standard Heckman model; and the control function approach. Both approaches initially employ a probit model to predict a country's IMF participation, thereby generating the 'inverse-Mills ratio.' 8 The participation equation typically requires an 'exclusion restriction'-an excludable instrument that influences selection into IMF programs but not the subsequent outcome of interest (Lang 2016). 9 The inverse-Mills ratio is subsequently added to the vector of controls in an outcome equation estimated with Ordinary Least Squares (OLS) regression. For Heckman models, the outcome equation is limited to observations only where the country has selected into the treatment. Although this approach cannot directly estimate the effect of participating in an IMF program, it can do so indirectly by estimating another model for observations without IMF participation, and then calculating the weighted difference for the entire sample of selection-corrected parameters for countries participating in IMF programs with selection-corrected parameters for those not participating (Vreeland 2003). Conversely, a control function approach includes all observations in the outcome equation (i.e., regardless of whether or not the country selected into treatment) (Wooldridge 2015), and can thereby directly estimate the effect of participating in an IMF program. The approach is frequently mislabeled as a Heckman model in the IMF literature because it draws on insights from Heckman's work vis-à-vis the source of omitted variable bias.
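The selection-correction term at the heart of both variants can be computed directly once the probit participation equation has produced a linear index for each country-year. The sketch below assumes that index is already available and uses the standard textbook form of the correction for a control function setup, in which nonparticipants receive the mirrored term; the source does not spell out this formula, so treat it as an assumption. Function names are hypothetical.

```python
import math


def normal_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def normal_cdf(v):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def inverse_mills(index, participated):
    """Selection-correction term from a probit participation equation.

    index:        fitted linear index from the probit (assumed given here).
    participated: True if the country-year is under an IMF program.
    Participants receive pdf/cdf; nonparticipants receive -pdf/(1 - cdf),
    so the term can enter an outcome equation that uses all observations.
    """
    if participated:
        return normal_pdf(index) / normal_cdf(index)
    return -normal_pdf(index) / (1.0 - normal_cdf(index))


# A participant that was very unlikely to participate gets a large correction;
# one that was very likely to participate gets a small correction.
unlikely_participant = inverse_mills(-2.0, True)
likely_participant = inverse_mills(2.0, True)
```

In a control function design this term is added as a regressor alongside the participation dummy; in a standard Heckman model only the participant branch is used because the outcome equation is restricted to participating observations.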
Several studies on the effects of IMF programs use variants of Heckman estimators (Bas and Stone 2014; IEO 2003; Kentikelenis et al. 2015; Mukherjee and Singer 2010; Nooruddin and Simmons 2006; Oberdabernig 2013; Przeworski and Vreeland 2000; Vreeland 2003). For instance, the IMF's Independent Evaluation Office (2003) used a control function approach to test for the effects of IMF participation on social expenditure, identifying a positive association. Employing a similar design, Kentikelenis and colleagues (2015) found IMF participation is associated with higher health expenditures in sub-Saharan African low-income countries, and with lower health expenditures in low-income countries elsewhere. Investigating the effects of IMF participation on economic growth, Przeworski and Vreeland (2000) and Vreeland (2003) utilized a Heckman model that corrects both for a country's decision to request an IMF program and for the IMF's decision to approve or reject the request, finding IMF participation lowers growth rates; however, a more recent study reanalyzed their data and found beneficial effects on growth when modelling a country's decision on the expectation of the IMF's decision (Bas and Stone 2014). Finally, Oberdabernig (2013) deployed a control function approach and combined it with Bayesian Model Averaging to demonstrate adverse short-term effects of IMF participation on poverty and inequality.
8 Przeworski and Vreeland (2000) and Vreeland (2003) elaborate on this method by deploying a bivariate probit design, requiring two participation equations to model the process of IMF participation. This approach corrects both for country selection into IMF participation and IMF selection of countries to lend to, thereby adding two separate inverse-Mills ratios to the vector of controls in the outcome equation.
9 What the IMF literature calls an exclusion restriction is typically referred to as an excludable instrument in conventional econometric terminology. We favor the term excludable instrument henceforth. Although Heckman variants that satisfy auxiliary assumptions on the joint distribution of error terms do not strictly require an excludable instrument, we encourage researchers to follow a conservative approach by deploying one.
Despite widespread use, Heckman variants are not without limitations. The precision of their estimates depends on the variance of the inverse-Mills ratio, which is determined by the predictive capacity of the first-stage probit model (Winship and Mare 1992). That is to say, it depends on having correctly specified the participation equation. Problems with collinearity of the inverse-Mills ratio may also arise when there is major overlap in the explanatory variables used in the participation equation and the outcome equation (Wooldridge 2012). While these concerns are allayed by introducing an excludable instrument, such a variable may not be readily available (Sartori 2003). 10 A final drawback is that country fixed effects cannot be introduced to the first-stage probit model, due to the well-known incidental parameter problem (Greene 2004). 11
On balance, while none of the methods is without limitations, we maintain that some clearly perform better than others in dealing with the problem of selection bias. Matching methods are the least palatable option due to their inability to address selection on unobservables. We also discount system GMM estimation because it carries stringent assumptions that are untenable in all but the most exceptional of circumstances; besides, the estimates are too sensitive to arbitrary changes in the model to inspire confidence. We are thus left with instrumental variable approaches and Heckman variants. Both approaches entail the pursuit of a variable that fulfils exclusion and relevance criteria (i.e., an excludable instrument). Here, we prioritize the minimization of potential bias that could be introduced by excluding country fixed effects in the first-stage equation over the gains in efficiency that Heckman variants achieve for dichotomous variables like IMF participation. We thus opt for instrumental variable approaches as our favored strategy. Nonetheless, because researchers may view this potential bias as negligible in their context, we also consider the more efficient control function approach below.
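To make the point about the inverse-Mills ratio concrete, the following is a minimal sketch, assuming hypothetical fitted probit indices (the two lists below are invented for illustration and are not taken from any dataset). It computes the ratio for selected observations and compares its variance when the first stage discriminates weakly versus strongly between observations.

```python
from statistics import NormalDist, variance

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def inverse_mills(index: float) -> float:
    """Inverse-Mills ratio phi(index) / Phi(index) for a selected observation."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


# Hypothetical fitted probit indices (illustrative values only).
# A weakly predictive first stage yields indices bunched together;
# a strongly predictive one spreads them out.
weak_first_stage = [-0.10, -0.05, 0.00, 0.05, 0.10]
strong_first_stage = [-2.0, -1.0, 0.0, 1.0, 2.0]

weak_imr = [inverse_mills(v) for v in weak_first_stage]
strong_imr = [inverse_mills(v) for v in strong_first_stage]

print("variance of IMR, weak first stage:  ", variance(weak_imr))
print("variance of IMR, strong first stage:", variance(strong_imr))
```

With bunched indices the ratio barely varies, which is the situation in which the correction term is estimated imprecisely.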

Adapting methods for studying the effects of conditionality
Notwithstanding the voluminous literature on the IMF and its conditional lending, until recently scholars lacked systematic, transparent, and replicable data on the actual policy content of its programs. Taking the country-year as the unit of analysis, quantitative studies relied on dummy variables that measure the presence of an IMF program in a given year. At a conceptual level, there are two main concerns with this approach. First, such work obscures countries' diverging experiences with IMF programs, which are designed ad hoc and thereby entail heterogeneous policy content not accurately captured by a binary variable. The empirical approach therefore implicitly sides with critics accusing the IMF of 'one size fits all' policies (Stiglitz 2002), despite the strong rejection of this claim by the organization (Dawson 2002). Second, these studies cannot isolate the effects of conditionality from alternative channels of program influence. These include, inter alia, scaled-up technical assistance and policy advice (Broome and Seabrooke 2015; IMF 2016a), aid catalysis (IMF 2004; Stubbs et al. 2016), and moral hazard (Dreher and Walter 2010). It is possible, for instance, that the impact of conditionality diverges from the impact of other aspects of IMF operations.
10 Sartori (2003) develops an estimator for binary-outcome selection models without excludable instruments, but notes that the rationale for needing an excludable instrument in Heckman approaches applies to both binary and continuous outcomes.
11 Conditional logit estimation would allow for the inclusion of fixed effects in a binary response model while avoiding incidental parameter bias. However, it cannot be used in a multi-equation framework because its errors have an extreme-value distribution, whereas the multi-equation estimator we introduce in this article assumes a multivariate normal distribution. Conditional logit thus cannot be used unless the researcher wants to extract the endogenous component of either IMF participation or conditionality, as in previous work.
Scholars face additional endogeneity concerns when the variable of interest is IMF conditionality and not IMF program participation per se. While existing research has already established that countries select into IMF participation (a form of bias that the methods above attempt to parse out), what is less apparent is whether countries also select into conditions (Vreeland 2006). No academic consensus exists concerning whether conditions are requested by countries (Caraway et al. 2012; Rickard and Caraway 2014; Vreeland 2006), or imposed by IMF staff on unwilling borrowers (Chang 2007; Grabel 2011; Simmons et al. 2008; Stiglitz 2002). For proponents of the former argument, certain conditions may be sought by governments to gain leverage over domestic opposition to policy change (Vreeland 2006). The latter line of argument perceives conditionality as a coercive instrument at the disposal of the IMF, used to compel countries into implementing reforms they may not otherwise wish to undertake (Simmons et al. 2008). Where there is agreement is that the circumstances of countries receiving more IMF conditions are systematically different from those receiving fewer conditions. Indeed, several studies find that both domestic political conditions in the borrowing country and international strategic factors influence conditionality (Caraway et al. 2012; Dreher and Jensen 2007; Dreher et al. 2009, 2015; Gould 2003; Stone 2008). Chwieroth (2015) and Nelson (2014) also show that conditionality varies as a function of the professional ties and shared beliefs between IMF staff and borrowing-country officials. Yet, ambiguity remains as to whether these underlying differences would subsequently affect the outcome of interest.
We know of only 11 quantitative studies that examine the effects of IMF conditionality as distinct from IMF program participation, summarized in Table 1. These studies are yet to converge around a single method for addressing possible endogeneity biases. 12 Indeed, three of the 11 studies do not provide any treatment for endogeneity of conditionality (Rickard and Caraway 2018; Stubbs et al. 2017b; Woo 2013). It is also worth noting that these studies vary in the effect they wish to capture: five include an IMF participation variable in addition to the IMF conditionality variable, thereby capturing the total effect (or ATE) of IMF intervention (Bulír and Moon 2004; Chapman et al. 2017; Crivelli and Gupta 2016; Stubbs et al. 2017b; Wei and Zhang 2010); whereas six restrict analyses of conditionality to observations with IMF participation only, thereby capturing the conditioned effect (or ATET) of IMF intervention (Beazer and Woo 2016; Casper 2017; Dreher and Vaubel 2004; Ivanova et al. 2006; Rickard and Caraway 2018; Woo 2013). The latter option is intuitive, and yields findings that are easier to interpret, since one need only consider the effect of IMF conditionality variables. However, it also means that results can only be interpreted within the context of country-years with an IMF program, in turn offering a more limited set of policy implications surrounding IMF program design more generally. Furthermore, using a restricted sample of the treated neither removes the need to address selection bias into a program, for instance by including the inverse-Mills ratio (Heckman 1979), nor accounts for endogeneity of conditionality.
If one wishes to distinguish effects of conditionality from other aspects of IMF programs, but is also interested in how this compares to cases without an IMF program, then both a measure of conditionality and a binary indicator for IMF participation should be included in the model. IMF conditionality has been measured on three dimensions: degree, or the number of conditions applicable either in total or within a given policy area; scope, or the total number of policy areas subject to conditionality; and depth, or the relative stringency of each of the conditions (IEO 2007; Kentikelenis et al. 2016;Stone 2008). We limit our analysis to the degree of conditionality, which scholars use as a proxy for the overall burden of conditionality (Copelovitch 2010a; Dreher and Jensen 2007;Dreher and Vaubel 2004;Gould 2003). This measure is admittedly imperfect because it does not capture the difficulty in implementing any individual condition (Dreher et al. 2015). In any case, it may be impossible to measure the difficulty of individual conditions, especially given the vastly different characteristics of IMF borrowers: the same condition in one domestic institutional environment could be easier to implement compared to another environment (Copelovitch 2010a). Coding condition depth entails a substantial-and, in our view, unacceptable-level of subjectivity. 13

Accounting for multiple endogenous IMF variables
We proceed under the assumption that countries select into both IMF participation and conditionality. Our proposed solution is to utilize maximum likelihood estimation (MLE) over a system of three simultaneous equations, in effect combining an instrumental variable approach to address endogeneity of participation with an instrumental variable approach to address endogeneity of conditionality. If all equations are linear then in theory 3SLS estimation could be used, but since we require greater flexibility due to non-linearity in the IMF program equation, we opt for MLE instead. We subsequently deliberate on plausibly excludable instruments for both IMF variables, before assessing the performance of our strategy in Monte Carlo simulations. Our unit of analysis throughout is the country-year, using time-series cross-sectional data. The methods we propose are adaptable to the inclusion of either a count of all conditions or multiple policy area counts; though, for the sake of parsimony, we focus on the total effect of all conditions.
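A sketch of the three-equation system, reconstructed from the variable definitions given in the text rather than taken from the authors' exact notation. The coefficient symbols ($\alpha$, $\gamma$, $\beta$) are assumptions introduced here for illustration, lag structures are omitted, and the shared symbols $\mu_i$ and $\delta_t$ stand for equation-specific sets of country and year dummies.

```latex
\begin{align}
\widehat{\mathit{IMFPROG}}_{it} &= \alpha_1 Z_{it} + X_{it}'\alpha_2 + \mu_i + \delta_t \tag{1} \\
\widehat{\mathit{IMFCOND}}_{it} &= \gamma_1 Y_{it} + X_{it}'\gamma_2 + \mu_i + \delta_t \tag{2} \\
W_{it} &= \beta_1 \widehat{\mathit{IMFPROG}}_{it} + \beta_2 \widehat{\mathit{IMFCOND}}_{it}
          + X_{it}'\beta_3 + \mu_i + \delta_t + \varepsilon_{it} \tag{3}
\end{align}
```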
Here, i is country and t is year. Equation (3) is the outcome equation, where W is the outcome of interest; $\widehat{IMFPROG}$ is the fitted value for IMF participation derived from Equation (1); $\widehat{IMFCOND}$ is the fitted value for the number of conditions derived from Equation (2); X denotes a vector of controls; μ is a set of country dummies; δ is a set of year dummies; and ε is the error term. 14 Equation (1) is a linear model to obtain predicted values of IMF participation, $\widehat{IMFPROG}$. It is assumed to be a function of X, a list of covariates from the outcome equation; Z, an excludable instrument; μ, a set of country dummies; and δ, a set of year dummies. Equation (2) obtains the predicted values for IMF conditionality, $\widehat{IMFCOND}$. It is also a function of X, μ, and δ, as well as Y, another excludable instrument.
As discussed earlier, both instrumental variable approaches and variants of Heckman estimators can control for selection on unobservables into IMF program participation. A variation on Equation (1) is thus to use a control function approach instead, which is more efficient but will not allow for the inclusion of country fixed effects, μ_i. 15 In this setting, the predicted probabilities of IMF participation can be obtained from a probit model.
To implement these analyses, we need a flexible estimator for multi-equation econometric models that can accommodate non-linearity if need be (i.e., if researchers choose to use a control function approach). MLE is suitable for this purpose and can be implemented in the cmp module for Stata, which allows us to jointly estimate the covariates of W, $\widehat{IMFPROG}$, and $\widehat{IMFCOND}$. The procedure produces consistent estimates under the assumption that the system is recursive and errors follow a multivariate normal distribution. It allows for arbitrary cross-equation correlation of errors, and clustered standard errors using the bootstrap. Details on how the model is jointly estimated, including the theoretical properties of the estimator and its distributional assumptions, are available in Roodman (2011) and further detailed in our supplementary online appendices. 16

Identifying excludable instruments
The perennial challenge of instrumental variable and control function approaches is finding observable variables that affect the endogenous variables the scholar wishes to instrument (here, the number of conditions applicable and the decision to participate in a program in the first place) but not the outcome variable, except via the impact on conditionality and participation respectively. To overcome this issue, we develop a new instrument for IMF conditionality, before assessing potential instruments for IMF participation used in previous studies.
14 This model can accommodate varying lag structures for right-hand variables, the appropriateness of which will depend on theoretical expectations and the outcome of interest. For instance, some studies enter IMF variables lagged one year to correspond with the budget cycle (Crivelli and Gupta 2016); while others suggest lags of either zero, one, or two years may be appropriate depending on the effect pathways one purports to measure (Oberdabernig 2013). There may be instances where researchers wish to test on even deeper lags where effects are expected to unfold only after a substantive period of time has elapsed.
15 Conditional logit estimation would allow for the inclusion of fixed effects in a binary response model while avoiding incidental parameter bias. However, its error distribution is incompatible with the assumptions required for multi-equation MLE. Unlike conditional logit, the latter model affords the flexibility to extract the endogenous components of both IMF programs and IMF conditionality.
16 The online appendices are available on The Review of International Organizations' website.
For conditionality, we repurpose a recently popularized compound instrumentation approach from the aid effectiveness literature (Dreher and Langlotz 2017; Nunn and Qian 2014). Specifically, our instrument is the interaction of the within-country average of the number of conditions across the period of interest with the IMF's year-on-year budget constraint. Formally, the predicted values for IMF conditionality in Equation (2) are derived from Equation (4), where i is country and t is year; $\widehat{IMFCOND}$ is the fitted number of IMF conditions; $\overline{IMFCOND}$ is the country-specific average of conditions; IMFBUDG is the budget constraint of the IMF in year t; X is a list of covariates from the outcome equation; μ is a set of country dummies; and δ is a set of year dummies. The identifying assumption of Equation (4) is that the outcome of interest in countries with different exposure to conditionality will not be affected differently by changes in the IMF's budget constraint other than through the impact of IMF conditions. This econometric strategy is supported in analytical proofs by Bun and Harrison (2018) and Nizalova and Murtazashvili (2016) that show the interaction of an endogenous variable with an exogenous one can be interpreted as being exogenous under mild assumptions. The approach is akin to a (continuous) difference-in-difference design: the effect of conditionality on the outcome of interest is compared across a group of countries with high exposure to IMF conditions and a group with low exposure as the IMF's budget constraint changes. As in any difference-in-difference design, identification rests on an exogenous treatment and the absence of different pre-trends across groups. Below we discuss these assumptions, along with the usual assumptions on instrumental variables.
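Based on the definitions above, Equation (4) can be sketched as follows. This is a reconstruction under stated assumptions: the coefficient symbols $\gamma_1$ and $\gamma_2$ are introduced here for illustration, and any lag structure is omitted.

```latex
\begin{equation}
\widehat{\mathit{IMFCOND}}_{it}
  = \gamma_1 \left( \overline{\mathit{IMFCOND}}_{i} \times \mathit{IMFBUDG}_{t} \right)
  + X_{it}'\gamma_2 + \mu_i + \delta_t
\tag{4}
\end{equation}
```

The interaction term plays the role of the excludable instrument in the conditionality equation: its country-specific component is absorbed by $\mu_i$ and its year-specific component by $\delta_t$, so identification comes only from their product.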
The instrument fulfils the relevance criterion because the cross-sectional average of conditionality approximates the general propensity of a country to obtain a specific number of conditions in any given year, after accounting for observable factors that usually explain such variation. Furthermore, as previous research shows, the number of conditions increases when country demand for IMF loans grows, and decreases or stagnates when country demand for IMF loans is weak (Chapman et al. 2017; Dreher and Vaubel 2004). A plausible rationale for the observed relationship between the number of conditions and country demand is that as the IMF assists more countries, resource scarcity prompts the organization to assign a greater number of conditions to any given country as a safeguard measure for loan repayments (Dreher and Vaubel 2004; Vreeland 2003). The inverse also holds: Lang (2016) shows that the IMF is more generous with its loans when it has high liquidity, implying fewer conditions in times of resource abundance as the Fund becomes more eager to recruit borrowers, anticipating interest revenues on loans (Babb and Buira 2005). 17 This line of argument is underpinned by the idea that the IMF is, at least to some degree, a self-serving international bureaucracy that seeks to maximize revenues, protect future budgets, and maintain a position of global power (Dreher and Vaubel 2004; Vaubel 1996).
In our analysis, the budget constraint is measured via proxy using the natural log of the IMF's liquidity ratio-calculated as liquid resources divided by liquid liabilities (Lang 2016;Nelson and Wallace 2017). Here, liquid resources is the sum of usable currencies plus Special Drawing Rights contributed; and liquid liabilities is the sum of members' reserve tranche positions plus outstanding IMF borrowing from members (Lang 2016). Figure 1 plots the natural log of IMF liquidity ratio and the mean number of conditions per participating country in a year for 1990 to 2014, showing a significant correlation between the two variables (r = −.30). In regressions further below, the instrument consistently satisfies benchmarks for identifying strong instruments, with a Kleibergen-Paap F-statistic above ten (Staiger and Stock 1997).
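As a rough illustration of how the compound instrument described above could be assembled, the sketch below computes the natural log of a liquidity ratio and interacts it with each country's average number of conditions. All figures and the function names `log_liquidity_ratio` and `compound_instrument` are hypothetical; averaging over every country-year supplied (rather than, say, program years only) is also an assumption of this sketch.

```python
import math
from collections import defaultdict


def log_liquidity_ratio(liquid_resources: float, liquid_liabilities: float) -> float:
    """Natural log of liquid resources divided by liquid liabilities."""
    return math.log(liquid_resources / liquid_liabilities)


def compound_instrument(conditions, log_liquidity):
    """Interact each country's mean number of conditions with the yearly budget proxy.

    conditions:    {(country, year): number of conditions}
    log_liquidity: {year: ln(liquidity ratio)}
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for (country, _year), n_conditions in conditions.items():
        totals[country] += n_conditions
        counts[country] += 1
    country_mean = {c: totals[c] / counts[c] for c in totals}
    return {
        (country, year): country_mean[country] * log_liquidity[year]
        for (country, year) in conditions
    }


# Hypothetical toy data: two countries, two years.
conditions = {("A", 2000): 10, ("A", 2001): 30, ("B", 2000): 0, ("B", 2001): 4}
log_liquidity = {
    2000: log_liquidity_ratio(200.0, 100.0),  # ratio of 2
    2001: log_liquidity_ratio(100.0, 100.0),  # ratio of 1, so the log is 0
}

instrument = compound_instrument(conditions, log_liquidity)
for key in sorted(instrument):
    print(key, round(instrument[key], 3))
```

Because the country mean is fixed over time and the liquidity ratio is common to all countries, only their product varies at the country-year level once fixed effects are included.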
The instrument fulfils the exclusion criterion because country-specific changes in conditionality that deviate from its long-run average are brought about only by decisions of the IMF that do not pertain to any given country, such as the introduction of social spending floors in the late-1990s or the streamlining initiative of the early-2000s (IMF 2001a; Kentikelenis et al. 2016). While one might be concerned about potential direct effects of the general propensity of a country to obtain a specific number of conditions in any given year on the outcome variable, we control for this effect through the inclusion of country fixed effects in both conditionality and outcome equations (Dreher and Langlotz 2017). Moreover, there is no apparent pathway from the IMF's own budget constraint to an outcome variable of a given country other than through conditionality, since it is driven by organizational factors that have nothing to do with the characteristics of borrowing countries.
There could be a question on the exogeneity of the budget constraint insofar as wealthy member countries can replenish IMF resources in response to a greater number of countries participating in programs, which would diminish the Fund's risk aversion such that the organization is willing to agree to fewer conditions when bargaining a new program with a recipient country (Dreher and Vaubel 2004). This logic is flawed, however, because the amount of financial resources that members commit to the Fund is determined by exogenous institutional processes: set by the IMF's Board of Governors following quota reviews conducted every five years (IMF 2017a). It is therefore unlikely that IMF resource increases would then be linked to the outcome through unobserved channels (Lang 2016).
A related concern over instrument excludability is that donors may be less willing to spend in times of global financial crisis, resulting in both a reduction of the IMF's concessionary lending budget (which, unlike its non-concessionary account, is replenished through voluntary contributions from member countries rather than quota subscriptions; IMF 2016b) and deteriorating socio-economic outcomes in aid-dependent countries. The inclusion of year dummies in both conditionality and outcome equations can capture common external shocks across all countries, and will ensure that the instrument is not correlated with the error term in the outcome equation (Nunn and Qian 2014). In essence, our identification strategy relies on "the interaction term being exogenous conditional on the baseline controls" (Nunn and Qian 2014, p. 1632, emphasis added).
Another potential concern over instrument excludability derives from the underlying bargaining process that determines conditionality (Dreher and Vaubel 2004). It is possible that the within-country average of the number of conditions reflects levels of country 'priority' within the IMF insofar as it relates to geopolitics or risk aversion. For instance, geopolitically important countries have greater bargaining power and may thus receive fewer conditions on average than less geopolitically important countries (Stone 2002, 2011). If the elasticity of conditionality with regard to the IMF budget constraint also depends on political priority, such that low-priority countries see their conditionality increase faster than high-priority countries when IMF resources become scarce, then the exclusion criterion would be violated through the correlation between the instrument and the unobserved variable 'priority'. To test if this is the case, in Fig. 2 we divide countries into low, medium, and high conditionality groups and graph the IMF liquidity ratio against the number of conditions assigned to a specific country for each year of IMF participation. Countries at different burdens of conditionality have, on average, a similar propensity of receiving more conditions when the number of countries in IMF programs increases, as indicated by equivalent gradients of the lines of best fit across conditionality groups (all between r = −.15 and r = −.29). We thus show that the elasticity of conditionality with respect to geopolitical importance is approximately constant; that is to say, the instrument satisfies the homogeneity of treatment assumption necessary for the exclusion criterion to be valid (Hainmueller et al. 2016).
Finally, since the above approach is akin to a difference-in-difference design, nonparallel trends across groups with different exposure would undermine identification (Christian and Barrett 2017). In our case, trends over time in the number of IMF conditions and the outcome variable should be similar across above-mean conditionality exposure and below-mean conditionality exposure groups of countries. In addition, inference would be threatened if there was a non-linear trend in the time-varying part of the compound instrument that is similar to the respective trends in the potentially endogenous variable and the outcome variable in the high-exposure group of countries. This is because such non-linear parallel trends would create spurious correlation not controlled for by year fixed effects unless the same non-linear trends are also present among the low-exposure group of countries (Christian and Barrett 2017). In our case, we would be concerned about similarly shaped non-linear trends in the IMF liquidity ratio, the number of IMF conditions, and education expenditure if these trends only occur among high-exposure countries but not the low-exposure countries. We graphically assess these two conditions below. Figure 3 allows us to compare trending behavior across exposure groups for the mean number of conditions (left panel) and our outcome variable of government education spending as a share of GDP (right panel). The trend pattern for the number of conditions is qualitatively similar across groups, although the high-exposure countries face a higher number of conditions by definitional fiat in any given year. We also find similar trends across exposure groups in the outcome variable. In Fig. 4, we present the temporal evolution of the IMF liquidity ratio. Here we observe no similar (non-linear) trends between the IMF liquidity ratio and the mean number of conditions, or between the IMF liquidity ratio and education expenditures.
Consequently, there is no apparent violation of the design assumptions of the difference-in-difference approach. Overall, we believe that the argument for using the interaction of the within-country average of the number of conditions and the year-on-year IMF budget constraint as an instrument is well grounded. To violate the exogeneity assumption, one would have to find an unobserved variable driving the relationship between the time-variant IMF budget constraint and the outcome of interest; and such an unobserved variable would also need to be correlated with the country-specific average of conditionality after controlling for country and year fixed effects and a vector of controls. Although it is unlikely that such a variable exists, we consider this possibility by incorporating plausible candidate variables for our particular outcome of interest as additional controls in robustness checks, namely education commitments of overseas development assistance and the number of role-equivalent countries participating in IMF programs in the past three years. Furthermore, to violate the parallel and non-overlapping trends assumptions, the contemporaneous trends in the outcome of interest would have to follow different patterns across groups of countries with high compared to low exposure to IMF conditionality, and the trends in the IMF's budget constraint, IMF conditionality, and the outcome of interest would need to overlap (Christian and Barrett 2017).
Excludable instruments are also needed to account for endogeneity of IMF participation. As described in Section 2.2, past studies usually rely on UNGA voting similarity with the US (Dreher and Gassebner 2012;Steinwand and Stone 2008;Woo 2013), but the LATE of the instrument might not be generalizable since politically motivated programs could be less (or more) effective (Dreher et al. 2018). Our preferred approach is thus to use another compound excludable instrument initially proposed by Lang (2016) and adopted more recently by Nelson and Wallace (2017): the interaction of the within-country average of IMF program participation across the period of interest with the year-on-year IMF's budget constraint, again using the natural log of the IMF liquidity ratio as a proxy.
Formally, the predicted values for IMF participation in Equation (1) are derived analogously to Equation (4). Here, i is country and t is year; $\widehat{IMFPROG}_{it}$ is the predicted probability of IMF participation; $\overline{IMFPROG}$ is the country-specific average of participation; IMFBUDG is the budget constraint of the IMF in year t; X is a list of covariates from the outcome equation; μ is a set of country dummies; and δ is a set of year dummies. Lang (2016) provides a robust defense of the instrument's excludability, which follows the same logic as our IMF conditionality instrument vis-à-vis the exogenous variation of the budget constraint. Again, since such an approach is akin to a (continuous) difference-in-difference design, we check below whether its underlying assumptions hold.
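Using the definitions above, the first-stage equation for participation can be sketched as follows. As with the conditionality equation, this is a reconstruction: the coefficient symbols $\alpha_1$ and $\alpha_2$ are assumptions introduced for illustration, the equation number is not asserted, and lags are omitted.

```latex
\begin{equation}
\widehat{\mathit{IMFPROG}}_{it}
  = \alpha_1 \left( \overline{\mathit{IMFPROG}}_{i} \times \mathit{IMFBUDG}_{t} \right)
  + X_{it}'\alpha_2 + \mu_i + \delta_t
\end{equation}
```

Under the control function variant, the linear form would be replaced by a probit and the country dummies $\mu_i$ dropped.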
In Fig. 5, we find both exposure groups to be similar in terms of their trending patterns with respect to the outcome variable and the mean probability of IMF program participation. Furthermore, there is no apparent trend similarity between the IMF's budget constraint (see Fig. 4) and the mean probability of IMF program participation and education expenditure, respectively, among above-mean participation exposure countries.

Performance assessment of estimators using Monte Carlo simulation
While the proposed multi-equation approach with correlated errors is preferable on a theoretical basis, its performance relative to alternative approaches needs to be tested. We conduct Monte Carlo simulations, which enable us to control the data generating process and thereby assess how well different estimation methods approximate the true parameter values.
Building on our theoretical discussion, we compare the performance of six estimators: our favored double-instrumental variable maximum-likelihood estimator that essentially linearizes the IMF participation equation to include fixed effects, akin to a 3SLS estimator, which accounts for endogeneity of IMF programs and conditionality (IV/IV/MLE); a conditional mixed process maximum-likelihood estimator that accounts for endogeneity of IMF programs and conditionality but does not linearize the IMF participation equation (CFA/IV/MLE); a two-step variant of Heckman estimator using a control function approach without correcting for endogeneity of conditionality (CFA/−/OLS); a 2SLS estimator correcting for endogeneity of IMF programs through a linearized IMF selection equation but with no correction for conditionality (IV/−/OLS); a 2SLS estimator without endogeneity correction for programs but with correction for endogeneity of conditionality through an instrumental variable approach (−/IV/OLS); and a simple OLS estimator without any endogeneity correction (−/−/OLS). We scrutinize these estimators across several scenarios. In the first five scenarios, we match the proposed data-generation process but vary cross-equation correlations and the strength of the instruments. In the last two scenarios, we probe robustness of the estimators to omitted-variable bias that jointly affects IMF treatments and the outcome of interest. For each simulation scenario, the data are generated and models estimated 500 times, and three quantities are calculated: bias, root mean square error, and optimism (Bell and Jones 2015). 18 We find that our proposed solution-instrumenting for both IMF treatments in a multi-equation framework with joint error structure-is the preferred one unless instruments are weak. Whether CFA/IV/MLE is preferable to IV/IV/MLE, however, depends on the specific parameterization. 
For example, CFA/IV/MLE performs best when a strong instrument for IMF conditions is available, when there is high residual correlation across equations, and when the model is misspecified by ignoring a confounder that affects both IMF programs and the outcome of interest. Conversely, IV/IV/MLE performs well (and slightly better than CFA/IV/MLE) when both instruments are strong and cross-equation correlation is mild, and in the misspecified case where a third variable causes both IMF conditions and the outcome. Our simulations also suggest conditions under which simpler alternatives underperform. Plain OLS is particularly poor when the model is misspecified. In this case, the bias concentrates in the coefficient of the IMF treatment with unobserved correlation with the outcome. A similar result holds when conditionality is instrumented but the researcher ignores selection into IMF programs. Likewise, when only IMF programs are instrumented but not IMF conditions, results are severely biased under a misspecified model. In this case, a control function approach performs relatively better than a linearized IV model for program selection, but both have unacceptable biases. Only when both instruments are weak do such estimators yield less biased results than the more complex double-instrumentation estimators that our approach uses. In sum, we find that CFA/IV/MLE and IV/IV/MLE are virtually unbiased across a wide range of scenarios and perform best in most cases on root mean squared error and optimism diagnostics. We take this as evidence of the superiority and robustness of our proposed estimator where problems of program selection and endogeneity of conditionality are intertwined. However, a caveat when applying these approaches is that moderately strong instruments are necessary. Failing that, simpler alternatives such as OLS are more robust, as confirmed by recent methodological literature (Young 2018).
18 We implement the simulation as follows. First, in each replication loop, we create a rectangular dataset of 1000 observations. Second, we generate the normally-distributed predictors, instruments, and errors, and calculate the response variables. The instrument is defined such that it correlates with the main predictor but also includes white noise, which simulates instrument weakness. Third, we run the analysis using six different estimators in each of the seven scenarios, calculating for each coefficient estimate its bias, mean squared error, and optimism with respect to the true parameter. For detailed descriptions and results tables please refer to the online appendices.

Fig. 5 Parallel trends in IMF participation compound instrument
Overall, our recommended approach has the advantage of enabling scholars to account for program heterogeneity and to separate the effects of conditionality from other aspects of IMF operations, while addressing potential endogeneity concerns surrounding the IMF variables. Our approach is also flexible enough to accommodate disaggregated counts of conditions (see below).
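The logic of the simulation exercise can be conveyed with a deliberately simplified sketch. It is not the six-estimator, multi-equation design reported in this section: it assumes a single endogenous regressor, a single instrument, no fixed effects, and compares only plain OLS against a simple IV (Wald-type) estimator on bias and root mean squared error. The function name `simulate`, the parameter values, and the smaller default sample and replication counts (chosen for speed) are all assumptions of this illustration.

```python
import math
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(replications=200, n_obs=500, true_beta=1.0,
             instrument_strength=1.0, error_correlation=0.5, seed=42):
    """Compare OLS and IV estimates of true_beta under an endogenous regressor."""
    rng = random.Random(seed)
    estimates = {"ols": [], "iv": []}
    scale = math.sqrt(1 - error_correlation ** 2)
    for _ in range(replications):
        z = [rng.gauss(0, 1) for _ in range(n_obs)]  # instrument
        e = [rng.gauss(0, 1) for _ in range(n_obs)]  # outcome error
        # First-stage error correlated with the outcome error (endogeneity).
        u = [error_correlation * ei + scale * rng.gauss(0, 1) for ei in e]
        x = [instrument_strength * zi + ui for zi, ui in zip(z, u)]
        y = [true_beta * xi + ei for xi, ei in zip(x, e)]
        estimates["ols"].append(covariance(x, y) / covariance(x, x))
        estimates["iv"].append(covariance(z, y) / covariance(z, x))

    def summarize(values):
        bias = sum(values) / len(values) - true_beta
        rmse = math.sqrt(sum((v - true_beta) ** 2 for v in values) / len(values))
        return {"bias": bias, "rmse": rmse}

    return {name: summarize(values) for name, values in estimates.items()}


results = simulate()
for name, stats in results.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

Lowering `instrument_strength` toward zero in this sketch is one way to explore the weak-instrument caveat noted above, since the IV estimate then becomes highly variable.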

An empirical application to government education spending
To illustrate the utility of our proposed method, we present an empirical application where government education expenditure is the outcome variable. This issue has been the subject of sustained controversy, as critics argue that IMF programs result in cuts to funding earmarked for education (Nooruddin and Simmons 2006), while the IMF retorts that its programs safeguard such expenditure (Clements et al. 2013;IEO 2003). Previous studies utilized a dummy variable to measure the presence of a Fund-supported program, thereby masking diverse country experiences (Clements et al. 2013;Huber et al. 2008;IEO 2003;Nooruddin and Simmons 2006). We overcome this limitation by examining whether the number and type of conditions affect spending.

Pathways
We posit three pathways linking IMF conditionality to government education expenditure. These invoke both types of conditions: the quantifiable macroeconomic targets that make up the majority of conditionality ('quantitative conditions'); and the microeconomic reforms aimed at altering the underlying structure of an economy ('structural conditions'). A further three pathways pertain to IMF operations outside of the conditionality channel.
First, structural conditions that explicitly call for restructuring of the education sector may either increase or decrease education expenditure. For instance, the Nicaraguan program in 1999 required "management by local boards of 95 percent of secondary schools and 65 percent of primary schools" (IMF 2000b); Bolivian officials agreed in 2000 to "develop a reform proposal for higher education in order to reduce the share of public resources for higher education" (IMF 1999); and the Azerbaijani program called for authorities to "establish special government commissions to develop reform plans for the health and education sectors" in 1997 (IMF 1997).
Second, structural conditions that directly stipulate education sector hiring decisions can reduce government education spending. Examples abound: Bulgaria's program in 2006 required "an employment cut of at least 5,500 positions in the education sector" (IMF 2006); Sierra Leone had to "implement concrete measures to control the teachers' payroll" in 2002 (IMF 2002); and in 2004 Tajikistan was required to "reduce the number of employees in the education sector by 5 percent" (IMF 2003). By the same token, quantitative conditions stipulating general (rather than education-specific) wage bill limits on the public sector can indirectly impede a borrowing country's ability to hire and remunerate teachers (Marphatia 2010). In West Africa alone, between 1995 and 2014 a combined 95 of the 211 years with IMF conditions included limits to the wage bill (Stubbs et al. 2017b).
Third, quantitative conditions set by the IMF in the majority of its programs on budget deficit reduction, international reserve holdings, and net domestic borrowing can shrink fiscal space, indirectly forcing a reduction in spending in the education sector (Stubbs and Kentikelenis 2018). While IMF staff claim that quantitative conditions stipulating floors on social expenditures reverse this drag on education spending (Clements et al. 2013; IMF 2017b), recent research shows that these floors are observed infrequently, whereas fiscal deficit targets are almost always met.
These initial three pathways are represented empirically with a series of IMF conditionality variables (see below). Outside of conditionality channels, the IMF may also bolster government education expenditure through a fourth pathway: the provision of low-interest credit under its programs, although this additional resource is sometimes used to repay external debt instead (Gould 2003). Fifth, the presence of IMF programs may give countries a 'stamp of approval' that could catalyze additional bilateral aid from donor governments, thus boosting education expenditures through donor financing (Clements et al. 2013). Even so, evidence elsewhere shows that this aid substitutes for, rather than complements, government spending on the social sector, and that aid flows increase for general budget support and debt relief but not for education (Stuckler et al. 2011). Sixth, scaled-up technical assistance can improve budget monitoring and execution, thus increasing the proportion of the education budget commitment that is actually spent on the education sector, rather than being spent elsewhere or going unspent (Stubbs et al. 2017b). For instance, the IMF prioritized assistance to improve the utilization of social sector appropriations in Benin in the late 1990s (IMF 1998), ultimately contributing to higher social spending (IMF 2000a). These subsequent three pathways are captured in our analysis by an IMF participation variable.

Variables
We investigate the effects of IMF conditionality on government education spending as a share of GDP for 132 developing countries over the period 1990 to 2014, as reported by the World Bank (2016). Our IMF conditionality variables are based on the coding of agreements between the Fund and its borrowers. The online appendices provide further details on how the dataset was created. We only count binding conditions (known as 'prior actions' or 'performance criteria'), following established procedures in this field of study (Copelovitch 2010a; Rickard and Caraway 2014; Stubbs et al. 2017b; Woo 2013). 19 Binding conditions directly determine scheduled disbursements of loans and must be implemented for the program to continue, whereas non-binding conditions serve as markers for broader progress assessment and their non-implementation does not automatically suspend the loan (IMF 2001b); including them may thus introduce noise to the analysis (Stubbs et al. 2017b). Our chosen measure also allows us to empirically isolate a conditionality effect from an effect (or lack thereof) due to country non-compliance, since the binding character of these conditions precludes the possibility of the latter (Dreher 2006; Vreeland 2003, 2006). Since hypothesized pathways of impact entail both structural and quantitative conditions, our initial measure of conditionality aggregates them. In further analyses, we then test the disaggregated impact of conditions separated into quantitative and structural types, and also explore the effect of conditions in given policy areas. Our IMF participation variable is measured with a binary indicator for whether a country was under an IMF program in a given calendar year for at least five months of the year (Dreher 2006).
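As a rough illustration of the two IMF measures just described, the following Python sketch counts only binding conditions and codes participation as 1 when a country spent at least five months of the calendar year under a program. The record layout, field names, and function names are hypothetical and are not taken from the dataset's actual coding scheme.

```python
# Hypothetical labels for the two binding condition types named in the text.
BINDING_TYPES = {"prior action", "performance criterion"}


def count_binding(conditions):
    """Count only binding conditions in one country-year.

    `conditions` is a list of dicts with a hypothetical "type" field.
    """
    return sum(1 for c in conditions if c["type"] in BINDING_TYPES)


def participation(months_under_program, threshold=5):
    """Return 1 if the country was under a program for at least
    `threshold` months of the calendar year, else 0."""
    return 1 if months_under_program >= threshold else 0


# Illustrative (made-up) country-year with three conditions.
example = [
    {"type": "prior action"},
    {"type": "performance criterion"},
    {"type": "structural benchmark"},  # non-binding, so not counted
]
binding_count = count_binding(example)
```

In this made-up example, only the first two conditions enter the count; the third is treated as non-binding and is excluded.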
Following previous research (Clements et al. 2013;Huber et al. 2008;Stubbs and Kentikelenis 2018), our outcome equation includes control variables for economic and demographic factors that affect both the need for education expenditure and governmental capacity to meet those needs. Entering the model contemporaneously are the natural logarithm of GDP per capita, urban population as a share of total population, population aged under 15 as a share of working-age population, and level of democracy; while government balance as a share of GDP and trade as a share of GDP enter the model lagged one year to correspond with the budget cycle. We expect a positive effect of GDP per capita because, according to 'Wagner's Law', state activities expand to cover new administrative and social functions as economic development takes place, thereby increasing state spending (Brady and Lee 2014;Nooruddin and Simmons 2006;Wagner 1994). Urbanization should have a positive effect since urban dwellers can more easily mobilize to request additional services from governments as well as offering economies of scale (Baqir 2002). The population under 15 accounts for the demographic need for a government to spend on education (Huber et al. 2008). Democracy controls for the well-established finding that democratic governments increase public spending on social programs (Stasavage 2005). The government balance in the previous year controls for a country's fiscal space to increase education spending (Clements et al. 2013). Last, we expect a positive effect of trade openness in the previous year because it can lead to increases in social expenditures as governments seek to compensate those adversely affected by it (Rodrik 1998;Rudra 2008). We also introduce country fixed effects to account for time-invariant country-level characteristics, and year fixed effects to control for common external shocks across all countries. Data sources and summary statistics are provided in the online appendices.
As described earlier, our identification strategy entails instrumenting the number of conditions using the interaction of the within-country average of the number of conditions with the natural log of the IMF liquidity ratio; and instrumenting program participation with the interaction of the within-country average of program participation with the natural log of the IMF liquidity ratio. We compare how results differ for instrumental variable and control function approaches to endogeneity of IMF participation. The system of three simultaneous equations is estimated through MLE. Standard errors are calculated using the clustered Sandwich estimator, which adjusts for heteroscedasticity and serial correlation. Analyses are performed using Stata version 15.
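The construction of a compound instrument can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code used for the analyses (which were run in Stata): the function name, the row layout, and the liquidity values are hypothetical, and the within-country average is taken over whatever rows are supplied.

```python
import math
from collections import defaultdict


def compound_instrument(rows, treatment_key, liquidity_by_year):
    """Interact each country's average of `treatment_key` with the
    natural log of the IMF liquidity ratio in the row's year.

    `rows` is a list of dicts with "country", "year", and `treatment_key`.
    `liquidity_by_year` maps year -> liquidity ratio (hypothetical values).
    Returns one instrument value per row, in the same order as `rows`.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r["country"]] += r[treatment_key]
        counts[r["country"]] += 1

    instrument = []
    for r in rows:
        country_mean = totals[r["country"]] / counts[r["country"]]
        instrument.append(country_mean * math.log(liquidity_by_year[r["year"]]))
    return instrument


# Made-up panel: the same helper serves both treatments by changing the key
# (e.g. "conditions" for the count, or a 0/1 participation field).
rows = [
    {"country": "A", "year": 2000, "conditions": 10},
    {"country": "A", "year": 2001, "conditions": 20},
    {"country": "B", "year": 2000, "conditions": 4},
]
liquidity = {2000: 1.0, 2001: math.e}  # illustrative values only
z_conditions = compound_instrument(rows, "conditions", liquidity)
```

The instrument varies across countries through the country average and over time through the liquidity ratio, which is common to all countries in a given year.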

Findings
In Table 2, we present the results of our quantitative analyses on government education expenditure as a share of GDP on six variants of our model. The online appendices report results from the first-stage models for IMF conditionality and participation.
To ensure our model specification is appropriate, in Model 1 we initially include only the control variables and conduct simple OLS estimation. Results on controls follow the expected effect direction established by previous studies on government education spending (Clements et al. 2013;Huber et al. 2008;Stubbs and Kentikelenis 2018): positive for urbanization, dependency ratio, democracy, and trade (although only democracy is statistically significant); and negative for GDP per capita and government balance (both statistically significant). 20 For Model 2, we add the IMF condition and participation variables, but again run simple OLS without any endogeneity corrections. Here, none of our IMF variables reach standard thresholds of statistical significance, and results on controls remain stable.
Next, we employ our preferred identification strategy in Models 3 and 4. As described earlier, we correct for endogeneity of the number of conditions and program participation in an instrumental variable approach using compound instruments: the interaction of the within-country average of the number of conditions with the natural log of the IMF liquidity ratio; and the interaction of the within-country average of IMF program participation with the natural log of the IMF liquidity ratio. In Model 3, we exclude potentially endogenous controls that could be affected by IMF intervention, namely democracy, government balance, and trade. In this setting, exposure to an additional IMF condition is associated with a statistically significant decrease of 0.05 percentage points in government education spending as a share of GDP. When we add potentially endogenous controls in Model 4, the headline result is substantively unchanged. The divergent findings from Model 2, which underestimates the effect size and statistical significance of conditionality, indicate the presence of endogeneity. In particular, the findings suggest that when government education spending is high, either the Fund imposes or a country actively seeks more conditions. Our finding on the statistically significant negative effect of IMF conditions is also substantively significant. The mean and standard deviation of government education spending as a share of GDP across our sample are 4.41 and 2.15 percent respectively; thus, setting the number of conditions at its mean of 10.61 corresponds to an average decrease of about a quarter of a standard deviation in education spending, or 0.53 percentage points, all other factors held constant. Based on findings in Model 4, such an effect is comparable to a four-point decline in democracy, using an ordinal scale from zero (full autocracy) to ten (full democracy) (Teorell et al. 2016).
20 Previous studies did not include both country and year fixed effects (Clements et al. 2013; Huber et al. 2008; Stubbs and Kentikelenis 2018). Our specification is thus a more stringent test, which may explain why some of our controls do not reach standard thresholds of significance.
Table notes: F-tests are Kleibergen-Paap statistics. Cluster-robust standard errors in brackets. * p < 0.10, ** p < 0.05, *** p < 0.01.
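The substantive-effect calculation reported for Models 3 and 4 can be reproduced with simple arithmetic. The short Python check below uses only the figures quoted in the text (the per-condition coefficient, the mean number of conditions, and the standard deviation of the outcome); the variable names are ours.

```python
# Figures quoted in the text.
coef_per_condition = -0.05     # percentage points of GDP per additional condition
mean_conditions = 10.61        # sample mean of the number of conditions
sd_education_spending = 2.15   # standard deviation of education spending (% of GDP)

# Implied change when the number of conditions is set at its mean.
implied_change = coef_per_condition * mean_conditions

# The same change expressed in standard deviations of the outcome.
share_of_sd = implied_change / sd_education_spending

print(f"{implied_change:.2f} percentage points")   # -0.53 percentage points
print(f"{share_of_sd:.2f} standard deviations")    # -0.25 standard deviations
```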
Outside of the conditionality channel, the IMF effect direction is negative but does not reach standard thresholds of significance. Results on control variables maintain their direction of effect, but government balance is no longer statistically significant in Model 4. Diagnostic statistics indicate that our instruments are strong across Models 3 and 4: Kleibergen-Paap F-statistics of 33.63 and 35.93 respectively for the conditionality compound instrument; and 15.03 and 16.61 for the participation compound instrument. 21 The instruments are also jointly relevant in both models, with F-statistics of 33.64 and 35.96 respectively.
For Models 5 and 6, we maintain an instrumental variable approach to correct for endogeneity of conditionality, but employ a control function approach to correct for the endogeneity of participation. Recall that this approach is more efficient and can still use our compound instrument for IMF participation, but will not allow for the inclusion of country fixed effects in the first-stage equation due to the incidental parameter problem (Greene 2004). Model 5 again excludes potentially endogenous controls, while Model 6 includes the full list of controls. The main results are equivalent across both models: exposure to an additional IMF condition is associated with a statistically significant decrease of 0.07 percentage points in government education spending as a share of GDP. Compared to Models 3 and 4, the effect of IMF conditions is accentuated and reaches the higher significance threshold of p < 0.01. The IMF participation variable, while now positive in direction, remains statistically non-significant. Control variables are also qualitatively equivalent with the exception of a (non-significant) reversal of effect on the government balance. The conditionality compound instrument remains strong; and the first-stage probit shows the participation compound instrument is a statistically significant predictor of IMF participation.
To recap our results thus far, we find a consistent negative effect of the total number of IMF conditions on government education spending but no effect for other aspects of IMF operations across multiple model specifications and identification strategies. To demonstrate a potential avenue for future research, we investigate the impact of disaggregated sets of conditions in Table 3. When disaggregating conditionality, care must be taken to ensure all conditions are jointly included in a single model, so that the effect of the residual conditions is not attributed to the condition type or policy area of interest. Thus, we include two IMF conditionality variables in each model. Compound instruments can be constructed for each condition count within a single model based on the interaction of the within-country average of that condition type with the year-on-year IMF budget constraint. The online appendices report first-stage models for IMF participation and both conditionality variables. In Models 7 and 8, we test the impact of the number of conditions disaggregated into structural and quantitative types using our two alternative identification strategies. Results reveal that quantitative conditions yield a statistically significant negative effect in line with our theoretical expectations, but only when we use a control function approach to correct for endogeneity of IMF participation; structural conditions, by contrast, have a non-significant negative association in both models. The IMF participation variable also never reaches standard thresholds of significance and fluctuates in effect direction across the two models. Findings on controls remain consistent with previous models; and diagnostic statistics indicate our compound instruments are strong throughout.
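The requirement that all conditions enter each disaggregated model can be made concrete with a small Python sketch. It splits a country-year's conditions into the type of interest and a residual count, so that the two counts always sum to the total. The field name "kind" and the function name are hypothetical, and the example data are made up.

```python
def split_counts(conditions, kind):
    """Return (count of conditions of `kind`, count of all remaining conditions).

    Including both counts in the same model keeps the residual conditions
    from being attributed to the type of interest.
    """
    focus = sum(1 for c in conditions if c["kind"] == kind)
    residual = len(conditions) - focus
    return focus, residual


# Made-up country-year with five binding conditions.
country_year = [
    {"kind": "quantitative"},
    {"kind": "quantitative"},
    {"kind": "quantitative"},
    {"kind": "structural"},
    {"kind": "structural"},
]
quantitative, structural = split_counts(country_year, "quantitative")
```

Each of the two counts would then receive its own compound instrument, built from the within-country average of that count.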
For the next set of models, we investigate the impact of specific policy-area conditions. 22 We examine, in Models 9 and 10, expenditure conditions using the same identification strategies as the previous two models. Predicated on our earlier discussion of pathways linking IMF conditionality to government education expenditure, we expect these will constrain education spending. As a falsification test, in Models 11 and 12 we examine revenue conditions, which we expect will have no effect. Based on our dataset coding, expenditure conditions include all those related to expenditure administration, fiscal transparency, audits, budget preparation, domestic arrears, and fiscal balance; revenue conditions include those related to customs administration, tax policy, tax administration, and audits of private enterprises. The online appendices provide additional details on how conditions are classified into policy areas.
The results confirm our theoretical expectations. Expenditure conditions are negatively associated with government spending on education, and residual policy areas exhibit no effect; whereas revenue conditions yield no association, and residual policy areas are negatively associated with government spending on education. In particular, each additional expenditure condition corresponds to at least a 0.47 percentage point decrease in government education spending as a share of GDP. Setting the number of expenditure conditions at the mean of 1.77 would thus result in a 0.83 percentage point decrease in education spending, all other factors held constant. Outside of the conditionality channel, the IMF effect direction again fluctuates but never reaches standard thresholds of significance; and findings on control variables are also substantively unchanged from previous models. Diagnostic statistics indicate that our compound instruments are strong on all disaggregated condition counts except for those pertaining to IMF revenue conditions in Models 11 and 12, with Kleibergen-Paap F-statistics of 9.52 and 8.90 respectively. It thus cannot be discounted that coefficients in these two models may be biased. Our participation instrument is also strong throughout; and where a control function approach is used, first-stage probit results show the participation compound instrument is a statistically significant predictor of IMF participation.
22 The differential impact of conditionality can be captured across a series of specific policy areas by including a separate variable for the number of conditions requiring adoption of a specific policy area (such as anticorruption, stabilization, liberalization, deregulation, or privatization) along with a variable for the number of all remaining conditions. We recommend the inclusion of one specific policy area per model to avoid issues of multicollinearity between condition counts of different policy areas. For instance, liberalization conditions tend to be accompanied by privatization conditions, so are highly correlated in analyses.

Robustness checks
We perform a series of robustness checks, presented in Table 4. Results from the first-stage models for IMF participation and conditionality are available in the online appendices. One concern is that countries fail to carry out these conditions, which would mean we erroneously attribute an effect to conditions that were never implemented. While using a binding condition count largely circumvents this issue (as binding conditions must be met in order for the program to continue), in some circumstances the IMF's Executive Board can grant waivers that allow countries to miss certain conditions without having their program suspended (Pop-Eleches 2009; Stone 2004). We thus deploy an implementation-corrected measure of conditionality, which deducts the number of waivers from the total number of binding conditions, in order to account for country compliance with conditions. Since the measure is not available beyond 2009, our sample period is reduced. As shown in Model 13, and using our preferred identification strategy, exposure to additional IMF conditions is still associated with decreases in government education spending at standard thresholds of significance, and results on control variables remain consistent. Compound instruments also remain strong. As an alternative, we instead adopt an implementation-discounted measure of conditionality in Model 14. IMF staff conduct reviews of programs regularly, and in case of non-implementation, the conclusion of these reviews is delayed. Thus, we discount conditions during the interruption period to account for non-compliance, as detailed in the online appendices; again, our sample does not extend beyond 2009. We find that accounting for program delays does not substantively alter results. Second, in the analyses thus far we focus on binding conditions, but it is plausible that non-binding conditions may also have an effect on government education spending. In Model 15, we incorporate the number of binding and non-binding conditions into our measure of conditionality. Results on control variables remain consistent, but exposure to additional IMF conditions has an attenuated effect and no longer holds a statistically significant relationship with decreases in government education spending, consistent with the view that the inclusion of non-binding conditions introduces noise to the analysis (Stubbs et al. 2017b).
Third, to alleviate further concerns over the excludability of our conditionality instrument, we control for additional variables that may be driving the relationship between the IMF budget constraint and education expenditure that are also correlated with the country-specific average of conditionality. One possibility is donors' changing commitment to education via the development of new global initiatives, such as the Global Partnership on Education or Goal 2 of the Millennium Development Goals (i.e., to achieve universal primary education). These kinds of initiatives could prompt member countries to increase voluntary contributions to the IMF's concessionary account, thereby decreasing the IMF budget constraint; and the IMF might also assign more conditions to countries with poor education outcomes. At the same time, donors may also increase funding to countries with poor education performance, which could result in more government education spending. To address this issue, in Model 16 we control for the natural log of total education commitments of overseas development assistance to the recipient country. Our sample is reduced by 20% due to missing data on education aid. In this context, the effect size of IMF conditions is consistent with previous models, but it does not reach standard thresholds of statistical significance since the loss of observations reduces the precision of our estimates; our compound instrument for IMF participation is also slightly weak. We therefore re-run the analysis in Model 17 using instead the more efficient control function approach to account for endogeneity of IMF participation, finding IMF conditions have a statistically significant negative association with government education spending but that our compound instrument for IMF conditions is slightly weak.
Another possibility is that countries make policy decisions based on a process of social comparison (or 'competitive mimicry') stemming from the pressure to remain effective and efficient relative to relevant (or 'role-equivalent') others in the global system of states (Henisz et al. 2005). A country's selection decision into an IMF program could be prompted by the number of role-equivalent others currently participating, which will impact the IMF's budget constraint and, in turn, the country-specific average of conditionality. At the same time, a country may expect competition in the global system of states to intensify and expand when a greater number of role-equivalent others are on IMF programs, and will therefore be more compelled to match education spending levels with role-equivalent others. To account for this scenario, we create 18 groups of role-equivalent countries by sub-dividing the six World Bank regional groups (East Asia and Pacific, Europe and Central Asia, Latin America and the Caribbean, Middle East and North Africa, South Asia, and Sub-Saharan Africa) into three World Bank income groups (low income, lower-middle income, and upper-middle income). For each country, we then calculate the number of country-years that role-equivalent countries spent under IMF participation in the past three years. When we include this variable in Model 18, our findings are unchanged.
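One way to compute the role-equivalence control is sketched below in Python. The sketch rests on assumptions that the text does not spell out: "the past three years" is read as years t-1 to t-3, the country's own program years are excluded from the count, and each country's region and income group are treated as fixed over time. Function and field names are hypothetical.

```python
from collections import defaultdict


def peer_program_years(rows, window=3):
    """Count country-years that role-equivalent peers spent under an IMF
    program during the previous `window` years.

    `rows` is a list of dicts with "country", "year", "region",
    "income_group", and "imf_program" (0 or 1). A peer group is a
    (region, income_group) pair, giving 6 x 3 = 18 possible groups.
    Returns a dict keyed by (country, year).
    """
    by_group_year = defaultdict(int)
    by_country_year = defaultdict(int)
    for r in rows:
        group = (r["region"], r["income_group"])
        by_group_year[(group, r["year"])] += r["imf_program"]
        by_country_year[(r["country"], r["year"])] += r["imf_program"]

    result = {}
    for r in rows:
        group = (r["region"], r["income_group"])
        total = 0
        for lag in range(1, window + 1):
            year = r["year"] - lag
            # Group total minus the country's own program status (assumption).
            total += by_group_year.get((group, year), 0)
            total -= by_country_year.get((r["country"], year), 0)
        result[(r["country"], r["year"])] = total
    return result


# Made-up panel: A, B, and C share a peer group; D is alone in another.
programs = {
    "A": {2000: 1, 2001: 1, 2002: 1, 2003: 1},
    "B": {2000: 1, 2001: 1, 2002: 0, 2003: 0},
    "C": {2000: 0, 2001: 1, 2002: 1, 2003: 0},
    "D": {2000: 1, 2001: 1, 2002: 1, 2003: 1},
}
regions = {"A": "Sub-Saharan Africa", "B": "Sub-Saharan Africa",
           "C": "Sub-Saharan Africa", "D": "South Asia"}
rows = [
    {"country": c, "year": y, "region": regions[c],
     "income_group": "low income", "imf_program": p}
    for c, years in programs.items() for y, p in years.items()
]
peers = peer_program_years(rows)
```

Years before the start of the panel contribute nothing to the count, so early observations are based on a shorter window in this sketch.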
Fourth, we compare our main results against alternative instruments for IMF participation. Past studies have shown that variables approximating geopolitical importance impact upon the decision to participate in IMF programs without necessarily affecting most domestic economic or social outcomes of interest, though the LATE might not be generalizable (Dreher et al. 2018). For instance, we know that allies of big powers receive favorable treatment by international financial institutions (Barro and Lee 2005; Dreher and Gassebner 2012; Steinwand and Stone 2008; Thacker 1999; Woo 2013). In Model 19, we thus deploy UNGA voting similarities with the US as an instrument for IMF participation instead of our compound instrument. Results remain consistent but the UNGA instrument is relatively weak. An alternative candidate variable for geopolitical importance is temporary membership in the United Nations Security Council (Moser and Sturm 2011; Woo 2013), since major shareholders of the IMF may care about how countries vote and some countries are willing to trade their votes for IMF loans. Using this variable as an instrument for IMF participation in Model 20 does not substantively alter results, and the instrument does not appear to be valid.
Fifth, we adopt alternative instruments to account for the endogeneity of conditionality in Model 21. As well as being determinants of IMF participation, UNGA voting similarities with the US and UNSC temporary membership are identified as potential determinants of conditionality that are plausibly exogenous with regard to government education spending (Caraway et al. 2012;Chwieroth 2015;Dreher and Jensen 2007;Dreher et al. 2015;Nelson 2014). Countries aligned with the US tend to receive more favorable treatment from the IMF and may thus receive fewer conditions; and major shareholders of the IMF may advocate softer conditionality in return for political influence over the UNSC. Using this strategy, we find a highly negative effect on government education spending of IMF intervention outside of the conditionality channel, but no statistically significant effect of IMF conditions. Little can be read into this result, however, because our conditionality instruments are weak, with a combined Kleibergen-Paap F-statistic of 2.76. This reinforces the value of our preferred compound instrumentation approach to examining the effects of IMF conditionality.

Conclusions
This article offered a new strategy for estimating the effects of IMF programs by incorporating fine-grained data on IMF-mandated policy reforms, or conditionality. The release of such data has enabled scholars to overcome two frequently evoked criticisms of earlier studies: that they treat the policy content of programs as homogenous; and that they do not delineate the effects of conditionality from other pathways of program influence. In so doing, the new data introduces methodological challenges linked to the inclusion of multiple endogenous IMF variables. After a review of existing quantitative approaches, we advocated that future studies utilize MLE over a system of three simultaneous equations, in effect combining the following: (a) an instrumental variable approach to account for endogeneity of IMF program participation, using the interaction of the within-country average of IMF program participation with the natural log of the IMF liquidity ratio as an excludable instrument; and (b) an instrumental variable approach to account for endogeneity of IMF conditionality, using the interaction of the within-country average of the number of conditions with the natural log of the IMF liquidity ratio as an excludable instrument. Monte Carlo simulations confirmed that our approach to addressing endogeneity is unbiased and performs better than the alternatives, provided that instruments are not weak. Applying our approach, we found that over the last two decades IMF conditions decreased government education spending in developing countries, consistent with expectations from past studies.
We note three main shortcomings of our approach. First, using the number of conditions as a measure of conditionality may not accurately reflect the true burden of conditionality, since it does not capture the relative difficulty of implementing any individual condition. By way of compromise, subsequent studies could consider how to incorporate the scope of conditionality (that is, the number of policy areas subject to conditionality) in addition to the number of conditions. Second, while our method enriches our understanding of the consequences of conditionality, it tells us less about the effect of IMF technical assistance. Our study utilized a dummy variable for the presence of a Fund-supported program to capture the effect of technical assistance, thereby masking diverse country experiences that may, in turn, affect the outcome of interest; this echoes criticisms made about the inability of earlier studies to encapsulate heterogeneous policy content. This markedly under-researched phenomenon represents the most promising avenue for future research to pursue. Third, compound instrumentation is no panacea. The approach is useful for identification when no obvious (non-compound) instrument is available, but researchers should verify to the extent possible that underlying assumptions hold. The core assumptions include (1) monotonicity of treatment (i.e., changes in the outcome in countries with different probability of IMF programs and mean number of IMF conditions, in short the IMF treatment variables, are not affected differently by changes in the IMF's budget constraint, other than via IMF programs and IMF conditionality, respectively); (2) parallel trends (i.e., the trends in IMF treatment variables and the outcome are similar across countries with above-mean exposure and below-mean exposure to IMF treatments); and (3) non-overlapping trends (i.e., non-linear trends in these variables for above-mean exposure countries do not overlap with the trend in the IMF's budget constraint).
Why should scholars prefer our approach over others when evaluating IMF programs? Our strategy has the advantage of offering an excludable instrument for conditionality which can-and should-be varied as needed for specific applications to conditionality policy areas, as we demonstrated with IMF expenditure and revenue conditions. Our approach also allows us to isolate where an effect is derived from among the conditions, promising greater nuance on policy advice. For instance, the results of our empirical application suggest that IMF programs should specifically reduce expenditure conditions if government spending on education is to be increased. Taken together, our approach enables scholars to draw more complete interpretations on the consequences of IMF programs. Even so, it is worth bearing in mind that any approach to endogeneity does not absolve scholars from thinking carefully about the underlying data-generating processes and making decisions based on strong theoretical foundations (Chaudoin et al. 2016).
Our methods are also transferable to studies on the effects of other organizations that engage in conditional program lending, such as the World Bank, regional development banks, the European Union, or large bilateral donors. Indeed, studying the effects of these programs is essential if we wish to better understand processes of change. As interest in the IMF-and other international financial institutions-continues to grow, so too will the available data sources and the attendant body of research around it. If we are to make the most of these developments, it is imperative that we revisit the methodologies used to estimate their consequences, paying attention to basic dictums of econometric analysis and the challenges of endogeneity. Until and unless we do, our knowledge of how these international organizations impact social, economic, or political outcomes will suffer.