How does government expenditure impact sustainable development? Studying the multidimensional link between budgets and development gaps

We develop a bottom-up causal framework to study the impact of public spending on high-dimensional and interdependent policy spaces in the context of socioeconomic and environmental development. Using data across 140 countries, we estimate the indicator-country-specific development gaps that will remain open in 2030. We find large heterogeneity in development gaps, and non-linear responses to changes in the total amount of government expenditure. Importantly, our method identifies bounds to how much a gap can be reduced by 2030 through sheer increments in public spending. We show that these structural bottlenecks cannot be addressed through expenditure on the existing government programs, but require novel micro-policies intended to affect behaviors, technologies, and organizational practices. One particular set of bottlenecks that stands out relates to the environmental issues contained in the sustainable development goals 14 and 15.


Introduction
In recent years, a vast literature on the sustainable development goals (SDGs) and the possibility of reaching them by 2030 has emerged. Some of these studies analyze specific SDGs and explore projections of indicators for different micro-policy interventions (e.g., González-Pier et al. 2016;Porciello et al. 2020;Boeren 2019;Sobczak et al. 2021;Mensi and Udenigwe 2021), while others focus on identifying synergies and trade-offs between different SDGs (indicators or targets) (e.g., Fuso Nerini et al. 2019;Lusseau and Mancini 2019;McGowan et al. 2019;Pedercini et al. 2019;Asadikia et al. 2021). This latter approach provides a more holistic evaluation of policy measures attempting to improve the performance of specific SDGs. A third variant of studies explores how the nature of the relationships between SDGs has changed over time and how likely it is that trade-offs can successfully transform into synergies in the coming years (e.g., Machingura and Lally 2017;Fader et al. 2018;Kroll et al. 2019;Amos and Lydgate 2020;Philippidis et al. 2020). Finally, a fourth set of studies makes use of expert advice or indicator trends to decipher the extent to which the SDGs might achieve the goals set for 2030 (e.g., Luken et al. 2020;Moyer and Hedden 2020;Pradhan et al. 2021;Benedek et al. 2021;Ionescu et al. 2020).
Two major points stand out from this succinct overview: (1) that a systemic perspective-emphasizing interactions among SDGs-is critical for policy evaluation; and (2) that a comprehensive understanding (quantitative) of how budgetary allocations impact SDG performance is almost entirely absent. This paper focuses on the latter point and tries to fill this knowledge gap by developing a modeling framework to study policy prioritization in the context of the SDGs. Akenroye et al. (2018) mention the importance of addressing the problem of policy prioritization and of leveraging existing budget resources for meeting these goals. Such funding frameworks are necessary to analyze pressing questions related to the effectiveness of public funding on existing government programs, for example: Do changes in the size and distribution of the budget (on existing programs) help, effectively, to close development gaps? 1 What are the most and least sensitive SDGs to such budgetary rearrangements? Can the commitments to the 2030 Agenda be met when there is enough government spending to guarantee successful policies? To what extent do structural factors hinder the effectiveness of existing programs? From the perspective of governments, understanding how their expenditure actions translate, at a systemic level, into effective policies is critical to guarantee the success of any international development agenda.
In this paper, we develop a bottom-up computational model in which public expenditure generates development advancement (with various degrees of effectiveness). A bottom-up approach to budgetary prioritization is necessary to properly account for political-economy factors that are present in a multidimensional and interdependent policy space (Guerrero and Castañeda 2020a;. One of the analytic benefits of this agent-computing model is the ability of calibration using coarse-grained data 2 of individual countries without needing to pool cross-national data. 3 We exploit this feature to study the sensitivity of country-specific indicators to changes in public expenditure. We study the feasibility of the SDGs across 140 countries using data from the 2020 edition of the Sustainable Development Report (SDR) (Sachs et al. 2020). 4 Our three main results are the following. First, we provide estimates of the SDG gaps that might remain by 2030 if government programs were to be kept unaltered. 5 Second, we demonstrate that the sensitivity of these gaps vary-in diverse and non-linear ways across countries and indicators-according to the amount of per capita government expenditure. Third, we identify the maximum reduction that can be achieved for the SDG gaps by 2030 through sheer expenditure increments. That is to say, there are stringent 'budgetary frontiers' that cannot be overcome without addressing long-term structural factors (redesigning the government programs). Altogether, our results provide quantitative and theoretically sound insights into what makes the SDGs unfeasible from the perspective of government expenditure and existing development strategies. 6 We structure the remainder of the paper in five more sections and an appendix. In Sect. 2, we present the methods employed: model description, network estimation, calibration procedure, goodness of fit, and definition of SDG gaps. In Sect. 3, we describe the sources of our database, which includes time series for development indicators and government expenditure, and explain how we geographically cluster the information for producing visualizations. Then, in Sect. 4, we show different figures describing the main results from our simulations. Section 5 compares alternative methodologies (data-fitting and aggregated models) with our bottom-up computational approach in the context of systemic policy analysis and budgetary allocations. Finally, in Sect. 6, we finalize the paper with a brief summary of the model's purpose and assumptions, and with a suggestion on how to use the simulation results for country-specific policy guidelines.

Methods
Essentially, the proposed model is designed to study how different budgetary allocations affect the-simultaneous and interdependent-evolution of a large set of development indicators. The model takes as inputs a vector with initial conditions for the indicators, a network with their interdependencies, a budget size, the fraction of positive changes in the indicators (as a measure of variation), and the final values they achieved in the last period of the sample. With this information, the parameters are calibrated to (1) match the simulated and empirical indicators in their final observations, and (2) match the fraction of positive changes. Due to the interdependent dynamics produced by the model, calibrating its parameters is not trivial. Nonetheless, we devise an efficient method that yields a goodness of fit above 90% for most countries. Our model is a variant of  and Guerrero and Castañeda 1 Informally, a development gap is the distance between the level of development of a policy issue and the goal to be achieved by a government or society. Section 4.1 provides a refinement of this concept through the idea of SDG gaps, which are the expected distance that an indicator will have in 2030 with respect to its goal. 2 In the agent-computing literature, this type of indicator data are considered coarse grained, since they do not provide disaggregated information about the individual behaviors of the agents: something typically needed to calibrate these models (e.g., microdata or administrative records). 3 While some studies use non-pooled country-level data to describe the structure of trade-offs and synergies (see Pradhan et al. 2017 and references), their capacity to produce quantitative prospective analysis is limited because proper statistical power can only be achievedunder traditional statistical tools-with a large number of observations (which can only be obtained by pooling cross-country data). 4 The SDR is produced by the Sustainable Development Solutions Network and the Bertelsmann Stiftung. 5 A government program is the set of policies that a government has in place to affect a specific development issue. Funding or defunding these programs is a short-/medium-term decision, while redesigning them is a long-term one (which needs to address structural factors). Our model focuses on the former-short/medium-term decisions-so it is assumed that the specific policies in place remain unchanged. 6 Although the third type of literature mentioned above argues for the need for structural changes to achieve SDGs and to break away from trade-offs, the meaning of structural bottleneck remains broad and often ambiguous in terms of policy instruments. Our paper sheds new light by introducing a more nuanced concept of bottlenecks, one with a direct link to government programs that can be directly affected through budgetary readjustments.
(2020a), with the improvement of accounting for the size of a government's budget. Similar models have been successfully applied to study ex ante policy evaluation (Castañeda and Guerrero 2019a), policy resilience , policy coherence (Guerrero and Castañeda 2020b), public governance , and sub-national development . Some of them have also been used in the provision of policy advice (Castañeda and Guerrero 2019b, d, c;Sulmont et al. 2021;Gobierno del Estado de México 2020). While the full details of the model are provided in Appendix A, here we explain the mechanisms that are most salient for this study and elaborate on the new calibration procedure.

Model description
The model consists of an agent representing the government or central authority in charge of deciding how to spend a budget of size B . There are N policy issues, each one with a level of development measured by an indicator. From these policy issues, n ≤ N can be directly impacted through existing government programs, and we assume that there is one program for each one of them. We call these types of policy issues instrumental, while the remaining N − n are considered collateral. An issue may be collateral because there does not exist a policy instrument to directly intervene it, and this may occur because the issue is too aggregate (e.g., GDP growth). 7 In addition to the government agent, there are n policymaking agents (functionaries), one in charge of each instrumental indicator 8 . Thus, the problem of the central authority is to allocate B resources across n policymakers to improve the N indicators. Policymakers, however, may have different goals from those of the central authority or may just be inefficient. Therefore, some of the allocated resources might end up diverted or wasted. Let us denote the allocation to instrumental policy issue i as P i , and the amount of resources that the policymaker uses effectively as C i ; we say that the latter is the contribution of the policymaker.
In an iterative process, the government agent reallocates its resources, prioritizing the most laggard 9 and the most efficient policy issues. In parallel, the policymakers try to maximize their benefit by determining a level of C i that shows proficiency to the central authority (for political reputation), but that also benefits them through the wasted resources P i − C i . The determination of C i happens through a behavioral model of reinforcement learning (which has extensive empirical validation), subjected to the monitoring of the government and to the corresponding penalties in case it spots inefficiencies.
The quality of the procurement mechanisms aimed at minimizing inefficiencies vary across countries according to empirical data on public governance, which we use as an input. With each step, the contributions and the total incoming spillovers S i (which could be positive or negative) determine the success probability i of the policy aimed at issue i. If the policy succeeds, the amount of improvement reflected in the indicator is proportional to the existing long-term structural factors, which we capture explicitly in a parameter i . Altogether, the model runs for T periods that can be mapped into calendar time. Parameter B corresponds to the empirical budget that a given country spent during the sampling period. Thus, in the calibration, the budget runs out after T simulation periods, reflecting different spending capabilities across countries, and enabling the test of potential effects from budgetary increments and reductions. We perform Monte Carlo simulations to generate stable measures of the indicators and other variables of interest. The reader should be aware that the model is calibrated and implemented for each country independently, so this approach overcomes concerns about biases from grouping countries or indicators.
In the interest of clarity and space, we summarize the model in Algorithm 1 and Figure 1. In this section, we focus on the two equations that drive the dynamics of the indicators and provide the details of the remaining equations in Appendix A. These equations connect the outcomes of the behavioral components with the spillover effects shaped by the network of interdependencies, and establish a clear differentiation between short-/mid-term and long-term dynamics. 7 Other reasons include lack of capacity (e.g., cybersecurity), advanced level of development (e.g., extreme poverty in some advanced economies), or lack of awareness (e.g., pollution and overexploitation of natural resources in several poor countries). 8 The model is flexible to accommodate multiple agents per indicator or multiple indicators per agent. This, however, requires detailed contextual information that we leave for country-specific studies. 9 Prioritizing laggard issues has been a promoted practice since the Millennium Development Project under the assumption that laggard indicators reveal potential bottlenecks. The panel at the center shows that the model establishes linkages between the micro and the macro levels. At the micro-level, the central authority allocates budgetary resources, while policymakers implement the government programs. At the macro-level, the network of interdependencies produces spillover effects that condition the evolution of the development indicators. In the upward causation component (right-vertical arrow), functionaries make an effective use of some of the resources that they receive from the central government. In the downward causation (left-vertical arrow), the overall dynamic produces reductions in the development gaps of the 2030 Agenda. This channel also transmits signals reflecting certain misuse of resources, which causes the government to penalize inefficient functionaries and reallocate resources. Moreover, the three circling arrows in the middle of the bottom panel describe a horizontal causation mechanism responsible for the social norms of inefficiency guiding functionaries' behavior. Finally, the left panel presents some of the outcomes that can be obtained from the model: the evolution of the indicators, policy priorities (allocation profiles), and sectoral inefficiencies. Sources: Guerrero and Castañeda (2020a) where is a normalizing constant 11 and S i,t is the total amount of spillovers received by indicator i in period t (this could be positive or negative). 12 The spillovers are computed every simulation period according to S i,t = ∑ j j,t j,i , where is an indicator function that returns 1 if indicator j grew in the previous period and 0 otherwise. The adjacency matrix corresponds to the empirical network of interlinkages, with each entry representing a conditional dependence from indicator j to i. Importantly, these conditional dependencies do not represent causal links, but rather an empirical regularity that the model takes into account (see Ospina-Forero et al. 2020 for a detailed discussion on estimating SDG networks and the impossibility of interpreting them as causal networks). While the structure of the network represented by is considered a long-term feature, the actual realization of the spillovers is a short-/mid-term phenomenon because it is the result of the dynamics of the other indicators in the previous period. 12 The term C i,t accounts for the expenditure contribution to an instrumental policy issue. For a collateral issue, C i,t equals zero, so its success depends on the overall 'financial health' of the government 1 n ∑ j C j,t , and on the spillovers S i,t . Therefore, we assume that public funding is a necessary but not sufficient condition for development. Now, let us define the evolution of indicator i as where parameter i > 0 captures long-term structural factors. 10 Parameter i imposes a limit to the growth that could be achieved in the short term through sheer spending. For instance, let ( i,t ) in equation 1 denote the outcome of a Bernoulli trial that can take values 1 (successful) or 0 (unsuccessful). This means that if a positive event materializes, the indicator grows according to i . As previously mentioned, the probability of a successful trial is i,t . Note that i,t is an endogenous variable of the model, so we proceed to explain how it is formed.

INTERVENTIONS
Recall that the total budget size across periods is B . This stock can be turned into flows by defining a disbursement schedule B 1 , … , B T , such that ∑ T t B t = B . For simplicity, let us assume that the disbursement schedule is homogeneous, so B t = B ∀ t . Next, consider the allocation profile P 1,t , … , P n,t that the central authority defines in period t. Under the homogeneous disbursement schedule assumption, ∑ n i P i,t = B holds, so the contributions of the policymakers are in the same units as the budget. To map C i,t into the success probability i,t , we define (1) Equation 2 represents the short-/mid-term component of the model, while parameter i from equation 1 captures the longterm factors limiting the impact of public expenditure on the indicators. For example, a government may increase the funds allocated to train quantum-computing engineers with the aim of strengthening this strategic area. While the number of engineers in this field may indeed increase due to the availability of scholarships, they may leave for another country or end up in unrelated jobs due to a lack of employment opportunities in the domestic labor market. A labor market-related structural factor, the demand for quantum-computing engineers, limits the speed with which this sector can develop; such speed will be reflected in modest improvements of the relevant indicators. Naturally, a structural reform could be seen as a change in i , but its interpretation proves difficult due to the multiple variables that are absorbed in this parameter; this is a challenge that we leave for future work. Nevertheless, i is informative about the limits of sheer spending at the level of each indicator, something lacking in all other approaches. For this reason, the model is consistent with the idea of analyzing budgetary changes over existing government programs.

Networks
As we have previously explained, the structure of the interdependencies between indicators is assumed to be a long-term feature, so the networks are exogenous inputs. As such, adjacency matrices can be built for each country by following any preferred criteria. A popular approach among development scholars is the qualitative approach of eliciting expert opinions. Unfortunately, this strategy is not scalable for a large set of countries and indicators (and is difficult to use in the case governments have severe time constraints). Ospina-Forero et al.
(2020) provide a comprehensive review of quantitative methods that may be suitable for estimating SDG networks. 13 With this information in hand, our method of choice is the Bayesian approach of sparse Gaussian Bayesian networks developed by Aragam et al. (2019) (and known as sparsebn). This procedure has the distinctive advantages of working well with high-dimensional datasets, even if they have short series, and producing adjacency matrices that try to minimize the number of links that may be false positives (hence the "sparse" term in the name). 14 Recall that the resulting networks should not be interpreted as causal relations, but as conditional probabilities, which means that a link A → B does not imply that ΔA guarantees ΔB . This is the reason why spillovers affect the probability of success i , and not the magnitude of the outcome. Of course, like with any statistical method, sparsebn makes certain assumptions such as a linear Gaussian structural equation model and no temporal dependence between observations. The former is a standard assumption in causal Bayesian models. Temporal dependencies can be partially removed by computing the first differences of the series. Overall, we consider that these assumptions are more reasonable than those made by alternative network estimation methods, and further arguments are provided by Ospina-Forero et al. 2020. Finally, the networks are estimated for each country individually, an important improvement over the existing literature on SDG synergies and trade-offs which tends to use pooled data. 15

Calibration
The aim of the calibration method is twofold, to assure (1) that the simulated dynamics of the indicators start and end at the empirical levels, and (2) that the model's average success probability corresponds to the empirical fraction of positive first differences of the indicators. 16 To achieve this, we need to find the parameter vector 1 , … , N , that minimizes an error measure.
There are two features that characterize this calibration problem. First, the dynamics of the indicators are interdependent. This means that if i changes, the 'speed' of another indicator j may be altered as well. Furthermore, these interdependencies are not obvious enough so that the model could be written as a system of equations to be simultaneously solved (as one may think by looking at equation 1). For instance, the fact that i is endogenous renders homogeneous Markov chains ineffective. The second feature is the computational cost of each evaluation. Since each simulation may yield a different trajectory for the same indicator, stable metrics have to be obtained from Monte Carlo simulations. This means that evaluating a given set of parameters involves several independent runs. 17 We develop a multi-objective gradient descent method that exploits the fact that each parameter can be associated to a specific error. Let us define an indicator-specific error as To minimize the error, first, we start with a proposed vector 1 , … , N , . Next, we perform a set of M Monte Carlo simulations and compute the error vector e 1 , … , e N , e . For each indicator i, if e i < 0 (meaning that the indicator grew too fast), then we multiply i by a factor 1 − i . If e i > 0 (the indicator was too slow), then we multiply i by 1 + i . The same logic applies to , which has a corresponding factor . Ideally, we want that the mean error converges to zero as we search the parameter space. We can generate such behavior by setting factors 1 , … , N , that change in proportion to the errors. As it turns out, a factor that achieves this for indicator i is , where I i,0 and I i,−1 are the empirical initial and final values of the indicator, while = |e | for . Our simulations suggest zero-error convergence for a large enough M. 18 Thus, it can be run for several iterations until a certain threshold for the average error is achieved. The calibration procedure for the model parameters is described in Algorithm 2. As the reader will notice, we bound the step factor (1 ± ) by 1/2 or 3/2 as we have found that this accelerates the convergence rate significantly.

Goodness of fit
For a single indicator i, the goodness of fit of its corresponding parameter i is which takes values in the interval (− , 1] , where is the lower bound induced by the theoretical maximum of the indicator. If no theoretical maximum exists, then the lower bound is − inf. The basic idea behind GoF i is that, in a good fit, the error e i should represent a small fraction of the historical gap that needs to be closed in a simulation ( I i,−1 − I i,0 ). Errors where the simulated average indicator ends below the empirical value are bound by I i,0 because the model only allows nonnegative growth. However, an error where the simulated average indicator ends above the empirical value may represent multiple times the size of the historical gap. Therefore, this metric not only takes into account accuracy with respect to the final value, but it also penalizes extreme errors with negative contributions when computing the mean goodness of fit across all indicators. Importantly, when testing alternative calibration methods, several indicators display a negative GoF i . This is not the case for our algorithm. The metric for the goodness of fit of parameter follows the same logic, but the target feature is the rate of positive first differences. Formally, the goodness of fit of is where Γ is the number of positive first differences in the empirical data as a fraction of all first differences.
The overall goodness of fit for a country is the average Figure 2 shows the distribution of the GoF after calibrating the model. 19 More detailed results on the goodness of fit are provided in Appendix F. Notice that, when performing this calibration procedure, we obtain a remarkable goodness of fit at the country level. Furthermore, the large majority of the parameters i exhibit a fitting above 0.9, while this is always the case for . (3)

Definition of SDG gaps
The main estimates of the paper are the gaps or the distances between development goals and the levels predicted for the indicators in 2030. If a prediction surpasses its goal, then we say that the gap has been closed. Formally, an SDG gap is where G 1 , … , G N are the development goals obtained from the SDR (the values that each indicator is supposed to achive in 2030), and Ī i,T is the expected value of indicator i-across M independent Monte Carlo simulations-after T simulation periods that are equivalent to 10 years. The underlying yearly budget for the 2021-2030 period is assumed to be identical, in per capita terms, to the (annual) average expenditure observed in the 21 years of data. We express the gaps as a proportion of their goals and in percentage terms. Thus, we can read an SDG gap as: "by 2030, indicator i will still need to close x% of its goal". 20

Data
There exist different databases from which one could obtain indicators classified into the SDGs, for example, the SDG indicators from the United Nations Statistics Division (United Nations 2020), the World Bank Atlas of Sustainable Development Goals (World Bank 2020), the OECD SDG distance indicators (OECD 2020), and the indicators compiled by the Bertelsmann Stiftung and Sustainable Development Solutions Network to produce the Sustainable Development Report (SDR) (Sachs et al. 2020). In this study, we use the SDR database for three main reasons. 21 First, the SDR is the only dataset that provides quantitative values for the goals to be achieved by each indicator. Furthermore, these goals are consistent across all the countries in the sample because the chosen indicators are applicable to each nation. Since the aim of this paper is to assess the , 20 Appendix C reports confidence intervals and provides a method to incorporate uncertainty about the quality of the data into the intervals when information on the indicators' errors is available. 21 A caveat of the chosen data is that they do not contain time series for SDG 12 (Responsible consumption and production). The indicators in SDG 12 relate to issues such as waste management, which have just recently been quantified in a handful of countries. However, this is also an issue in all alternative datasets.
feasibility of reaching the SDGs, having quantitative goals is necessary. Second, the SDR data have consistently longer time series than alternative databases. This is helpful for the calibration of the model, because the estimation of the structural factors 1 , … , N assumes that they capture long-term features of the data. For a sub-sample of 140 countries, the SDR provides time series with a length of almost 21 years (from 2000 to 2020) in numerous indicators. Alternative datasets, while they contain more indicators, fail to provide consistently long time series. Third, the majority of the data sources for the SDR indicators are recognized international (and intergovernmental) organizations, while the rest are scientifically sound products such as surveys from statistics bureaus, NGOs, and academic institutions. While the SDR team makes a substantial effort in gathering as much data as possible for each country, there are countries that lack some of the indicators, or that have too few observations. For this reason, different countries in our sample may have more or fewer indicators than others. This is problematic for all studies that pool cross-national data, since decisions have to be made regarding the imputation of missing observations, or the complete elimination of certain indicators. Our approach overcomes this problem because we do not need to produce estimates on pooled data. Thus, we allow each country to have its potentially unique set of indicators and perform the estimations independently of other nations. 22 This allows capturing as many policy dimensions as possible for each country, which is consistent with the philosophy behind multidimensional development.
While having unbalanced panels is still not the ideal setup for ex post cross-country comparisons, we believe that this framework is still able to overcome some of the main hurdles of data-fitting approaches. Appendix B provides detailed information on the 77 indicators of our sample, and their distribution across countries.
For the purpose of visualizing some of our results, we may color or aggregate them into country clusters. We should emphasize that this is only for visualization purposes. For these country clusters, we use the following grouping scheme: sub-Saharan Africa (Africa), Eastern Europe and Central Asia (E. For the national budgets, we use data on total government expenditure in current USD (which can be accessed through the link: data. world bank. org/ indic ator/ NE. CON. GOVT. KD). This information is obtained from the dataset on General Government Final Consumption Expenditure which, in turn, sources the information from the World Bank National Accounts Data and the OECD National Accounts data files. We compute the total expenditure exercised by each country in the 2000-2020 period. Missing values are imputed with the average yearly expenditure, and the final amount is transformed into per capita expenditure to remove population-size effects (we use the population size reported by the SDR).

Results
Because we only have aggregate yearly government expenditure for making worldwide comparisons, we limit our analysis to three types of simulation exercises. Firstly, we study whether SDG gaps of the 2030 United Nations Agenda can be closed assuming a benchmark scenario in which we project the historical yearly average of public expenditure for the following 10 years. Secondly, we analyze the sensitivity of these gaps to different increments/decrements of the budget size. We also visualize the response function of budgetary changes in terms of delays (or savings) in the number of years to reach the 2030 levels obtained in the benchmark scenario. Thirdly, we study structural bottlenecks that hamper the possibility of improving the indicators' performances by increasing the allocated funds. These bottlenecks are made evident when, by construction, inefficiencies and budgetary constraints are ruled out in a counter-factual simulation. Although these exercises are produced at the country level using all available indicators separately, for exposition reasons, we present several visualizations at the SDG or geographical cluster level in the main body of the paper. The reader should be aware that our methodology can deal with country-specific features such as the following: the network of interdependencies between indicators; the historical context reflected in the database and considered for calibration purposes; the indicators' initial conditions for prospective analyses; and the establishment of the 2030 goals attending to the countries' idiosyncrasies and political systems. Country-specific estimations are key when using the model for providing policy guidelines; however, technically this is not always possible with other methodologies. For example, in regression analyses using aggregate data, information from different countries has to be pooled to obtain enough degrees of freedom. The latter approach precludes the possibility of making inferences for particular countries and, thus, the estimates have limitations in terms of policy advice.

SDG gaps
We present our estimates of SDG gaps for 2030 at the level of each country in Figure 4. The bars indicate average levels across indicators, and the colored dots correspond to the ten indicators with the largest estimated gaps. The latter exemplify the gap disparities that exist within each country. As expected, the advanced market economies of the West exhibit gaps that are substantially lower than those estimated for the least developed countries (like those in Africa). However, there are also relatively successful countries in other regions of the world, such as Cyprus (CYP) and Croatia (HRV) in E. Europe & C. Asia; Japan (JPN), South Korea (KOR), and Singapore (SGP) in East and South Asia; and the United Arab Emirates (ARE) in MENA. In contrast, the least successful countries are Haiti (HTI) in LAC; and the Central African Republic (CAF), Eritrea (ERI), and Chad (TCD) in Africa.
At a more aggregate level, we can observe gap disparities across clusters and across SDGs. For example, while no country in Africa has an average gap below 18%, all Brown: West. Countries in gray were excluded from the sample due to lack of data the countries in the West have an average gap below 12%. The systematic persistence of certain dot colors (such as orange, corresponding to SDG 9 [Industry, innovation and infrastructure]) suggests that, in some SDGs, it is more difficult to close the gaps. Some of the most persistent SDGs across the dot markers are SDG 9-'Industry, innovation and infrastructure'-and SDG 7-'Affordable and clean energy'. Such a pattern is especially visible in Africa. Figure 5 provides a complementary visualization of the SDG gaps. Here, the gaps of the indicators have been averaged across countries in the same cluster. These plots reveal that only one indicator of SDG 7 is persistently close to the 100% gap in Africa, and that several indicators of SDG 9 exhibit high gaps. Another feature revealed by this visualization is that most of the environmental indicators in SDGs 14 (Life below water) and 15 (Life on land) present gaps above the cluster average (identified with the solid black ring). The reader should be aware of the risks of aggregation, which are evident when comparing the gaps estimated for the indicators in SDG 2 (Zero hunger) in Africa and West. Here, Africa is expected to perform better than West in obesity, nitrogen emission, and human trophic levels (which relates to dietary diversity). These problems are endemic to advanced market economies, so our results are intuitive. However, if we were to aggregate these gaps for the whole SDG 2, the result would suggest a similar performance between both clusters  Table B.1 in Appendix B. For precise estimates and confidence intervals of each individual indicator gap, see Appendix C. Sources: Authors' own calculations since the indicators related to hunger and malnutrition show the opposite performance (so their gaps would cancel out each other, at least approximately). Clearly, even with a multidimensional view of development, there exist specific policy issues that perform in substantially different ways across countries, even if they belong to the same dimension. This is one of the reasons why it is so important to move beyond the common practice of pooling cross-national and SDG-level data, and to produce more granular estimates that reflect the context and the spending capabilities of each country in each indicator.

Sensitivity to changes in the budget size
A key issue to be addressed when studying the feasibility of SDGs is the impact that budgetary changes have on the evolution of social, economic, and environmental indicators. In the context of this paper, we are interested in understanding how sensitive are the different SDG gaps to changes in public expenditure. Thus, we estimate the country-specific sensitivity of each indicator to changes in the overall size of the budget during the 2020-30 period. Our dataset suggests substantial variation in the growth of public spending between the 2000-2010 and the 2010-2020 decades (an average of 47%). Thus, our estimates consider prospective simulations with positive and negative changes of up to 50% with respect to the historical expenditure levels reflected in the data. 23 We measure sensitivity by calculating the difference between gaps from a benchmark scenario that maintains the historical expenditure levels (the average yearly expenditure from the data, projected over 10 years) and a scenario that considers changes in the size of the budget. Figure 6 presents a highly disaggregated picture of the different sensitivities when the budget is increased by 50%. Larger markers denote more sensitivity, while the gray lines indicate the absence of an indicator in a particular country. As a reference point, the largest marker corresponds to a reduction of 13% in an SDG gap. From this visualization we can highlight several important results. First, there is substantial heterogeneity across countries-positioned in the vertical axis-and indicators-positioned in the horizontal axis-with respect to the magnitude of gap reductions. Second, the most notorious impacts are not randomly scattered, but rather concentrated in specific SDGs (compare columns of different colors) and indicators. For instance, two gaps in SDG 9 ('Logistics performance index' and 'Mobile broadband subscriptions') have notable reductions in most of the countries where data are available. Third, the gaps of economic indicators in SDG 8 (Decent work and economic growth) are not particularly sensitive to a 50% increase in the budget size, especially when compared with those of SDG 9 (see the size of brown markers versus that of orange markers). Fourth, with the exception of some African cases, the gaps in SDGs 13 (Climate action), 14, and 15 (the environmental ones) rarely exhibit substantial improvements. Fifth, excluding a few country-indicator cases, the SDG 16 (Peace, justice and strong institutions) gaps do not seem responsive to a 50% increase in the budget. In section 4.3, we show that these diverse sensitivities are the result of long-term structural factors that impose a constraint to the effectiveness of public expenditure in government programs. Before elaborating on these structural factors, we provide further sensitivity results related to reductions to the budget size, and to an alternative sensitivity metric. Figure 7 presents sensitivity results for a 50% reduction in budget sizes. In this case, the sensitivity outcomes mean that the SDG gaps widen. As a reference point, the largest marker corresponds to a gap augmentation of nearly 20% with respect to the benchmark case. This implies that, in general, SDG gaps are more sensitive to a 50% reduction than to an increment of the same proportion in the budget size. This sensitivity asymmetry becomes evident when contrasting the outcomes of SDG 8-in 'Adults with an account at a bank or other financial institution'-presented in Figures 7 and 6. A similar asymmetric pattern can be found in environmental indicators from SDGs 14 and 15.
To have a better understanding of the asymmetric sensitivity between a 50% increment and reduction in the budget, we would like to revisit three assumptions about our modeling approach. First, we aim to model short-term dynamics and, hence, long-term structural factors are given through the exogenous parameters i that are specific to each indicator and country. Second, the impact of the public funds devoted to the different government programs is viewed in the context of short-/mid-term effects. This is so because we model a probability i,t representing the chance of indicator i to improve in the subsequent period t + 1 . These two aspects are combined into the evolution equation 1. While more public spending increases i,t , the long-term structural factors i limit the growth speed. Therefore, government expenditure only affects i,t , not i . Third, public spending is a necessary condition for development. Thus, from looking at the evolution equation we can tell that, if less expenditure brings t,i close to zero, then the growth trials are almost always unsuccessful, so the indicator dynamics stagnate.
From the side of budgetary increments, there seems to be a limit to how much some gaps can be reduced in a given period while, on the side of reductions, no improvement can be expected in the absence of public funds. Furthermore, given the law of motion of the indicators, and other microfoundations of the model, the response to expenditure changes may vary in non-linear ways. Thus, to provide a full picture of these non-linear response functions, we measure sensitivity in terms of the number of years that it would take to achieve a certain level (for each indicator and country), and compute them for 1% variations (positive and negative) in the budget size. We present the results aggregated into SDGs and clusters. In Figure 8, we present the aggregate response functions in the range of budgetary changes between -50% to +50% (with marginal changes of 1%). 24 We calculate the response functions using the difference in the number of years it takes for an adjusted budget to reach the levels of the indicators obtained in 2030 with the benchmark scenario. In the latter calculation, the historical annual expenditure average is projected forward throughout the following decades. If the budgetary changes produce additional years, there is a delay, while if there are some saved years, the delay displays in a negative scale.
Due to the aforementioned problems of aggregating indicators, the results presented in 8 should be considered as qualitative evidence of the non-linear responses to changes in public expenditure. 25 First, note that every SDG shows certain level of sensitivity to both positive and negative budgetary changes. Second, confirming our previous findings, the sensitivity to positive and negative budgetary changes are systematically asymmetrical in terms of the response magnitude. Third, the sensitivity rankings across SDGs vary between clusters and depends on the magnitude and direction of the budgetary change. For example, for countries in West, SDG 13 is the most sensitive to budgetary reductions, but the same ranking position is not observed in other clusters. Nevertheless, it is important to emphasize that SDG 13 systematically exhibits important delays in all clusters.

Budgetary frontiers and structural bottlenecks
Now that we have established the existence of non-linear responses of the SDG gaps to public spending, we elaborate on their structural origins. Let us open our argument by stating the obvious: that every government is constrained by time and resources. Thus, in a short-/mid-term scenario, time is critical in order to achieve a set of goals. While a particular policy may succeed in improving an issue, the amount of improvement is constrained by factors such as infrastructure, organizational practices, individuals' incentives, and technology that can only be modified through changes to the existing government programs; changes that take place in a longer time span. Thus, in the scope of existing government programs, these factors are considered exogenous, and we capture them through parameter i . It follows that the success in reaching development goals is partly determined by how much i allows an indicator to improve during a set amount of time. Not knowing these limits to success could lead to ineffective policy priorities and bad planning in terms of long versus short-/mid-term policies.
To unearth the limits imposed by structural factors, it is useful to think about the following hypothetical question: How much, in the years left to reach the SDGs, can the SDG gaps be closed if public funding was unlimited and fully efficient?. This theoretical scenario removes the resource constraints from the equation, and leaves us with the interaction between structural factors and time. Therefore, by estimating the SDG gaps under this hypothetical setting, it is possible to establish bounds to how small an SDG gap can become by 2030. To achieve this, we only need to assume ( i ) = 1 in equation 1. When ( i,t ) = 1 for every indicator, we say that the country operates at the 'budgetary frontier'. Thus, the SDG gaps that remain open at this frontier describe the limitations of increasing expenditure in the current government programs. In other words, if an SDG gap remains open at the budgetary frontier, it means that-regardless of how much public expenditure increases-the strategy will be unsuccessful if the long-term structural factors (i.e., their bottlenecks) in that policy issue are not addressed. Figure 9 presents the average SDG gaps at the budgetary frontiers of the different countries in the sample. Panel (a) aggregates the gaps across indicators with each country. Note that none of the average gaps closes entirely, even in the most advanced nations. As expected, these gaps are wider in Africa, reaching 39% in the Central African Republic (CAF). This diagram illustrates the relevance that local features have in the wide disparities observed across countries' performances. The estimated SDG gaps at the budgetary frontiers show countries exhibiting structural long-term hindrances of different magnitudes, even if they belong to the same cluster. Although the model cannot distinguish the specific reasons behind these discrepancies, it is reasonable to argue that their causes lie in bottlenecks of a local nature.
The right panel in Figure 9 shows the average gaps at the budgetary frontier, aggregated into SDGs within each cluster. The fact that SDG 13 presents a near-null gap in countries from Africa and LAC indicates that environmental issues related to climate action could be improved, on average, by properly funding existing government programs in those regions. However, this is not the case for other environmental SDGs. For instance, in SDGs 14 and 15, the frontier gaps vary between 27 and 42% 26 .
Finally, a cautious reader may consider that public spending should have structural consequences, so the exogenous factors i could also be affected in the short term. While this reasoning is, in principle, correct, the empirical evidence suggests that this process is rather weak. For instance, if the structural factors contained in i were to change substantially in the short term, then the SDG gaps estimated from simulations using more recent data samples should   -year (2016-2020) sample. 27 Figure 10 shows that our original estimates are robust to these alternative samples, as the six clusters show relatively small differences in their average gaps (see Appendix I for more disaggregated yet robust results). 28 For calculating these differences, we compare the average SDG gap for each country produced in the benchmark simulations-using 21 years of data-and the gap estimated with smaller time series of historical data (either 10 or 5 years). Notice also that the closer the size of these time series is to the whole historical sample, the smaller the difference in the average gaps is. That is to say, the dark bars are smaller than the light ones. From this, we conclude that the SDG network and the structural factors exhibit slow dynamics, validating our 1 3 conceptualization of long-versus short-/mid-term effects. Accordingly, the budgetary frontiers involve long-term considerations and demand the implementation of innovative micro-policies.

A discussion on the model's strengths and limitations
Models of multidimensional development typically use composite indices such as the Human Development Index and the SDG Index. However, if analysts wish to provide more nuanced advice with respect to specific SDGs in terms of policy prioritization and budgetary allocations, it is necessary to model the evolution of each separate dimension without aggregating them and losing valuable information. This task is problematic for statistical/econometric and machine learning approaches since they cannot deal easily with a high-dimensional policy space characterized by few observations (short time series). For example, multi-output models (such as regressions of equation systems or neural networks) demand unrealistically large amounts of observations for each dimension/indicator. To overcome this limitation, analysts pool cross-national data to produce their estimates. This, however, has the costly implication of removing country-specificity because any interpretation from the estimated parameters is limited to a hypothetical country with the average characteristics of the sample. In addition, data-pooling strategies only work with a limited number of indicators, since there exist only so many countries.
In data-fitting approaches, the problem of few observations aggravates when considering interdependencies between indicators because the number of potential interactions (parameters to be estimated) grows exponentially 1. Much of the empirical quantitative literature-which policymakers often use to guide their decisions-focus on the impact of one (or multiple) indicator(s) on another (or others). However, indicators are not instruments that governments can directly manipulate, but rather endogenous variables resulting from spending decisions. Hence, governments often motivate their expenditure choices using studies that do not offer evidence on how effective or viable it would be to fund a particular SDG given the existing government programs. There are two alternatives to remedy this analytical hurdle: the use of granular expenditure data or the implementation of a generative model of public spending. 2. Highly granular data of public expenditure-properly linked to specific development indicators-are practically non-existent. Under these circumstances, analysts have to rely on data aggregated into a few broad sectors (e.g., education, health, poverty alleviation) and select a 'representative' indicator for each one-or an average index. 3. When constructing a dataset to use these methods, one must assemble a large cross-country panel to obtain the necessary degrees of freedom for making estimations possible. 4. Most data-fitting approaches can only consider one dependent variable, which is inconsistent with the systemic view of the SDGs. 5. Establishing a causal link between an expenditure variable and an aggregate indicator is problematic because of confounding factors and reverse causation (because a government can adjust its budget according to the observed performance of the indicators). This problem also applies to studies using more sophisticated machine learning methods. 6. Due to inefficiencies during the policymaking process and spillover effects, the level of expenditure in a policy issue does not reflect the actual amount of resources effectively used.
In general, data-fitting and aggregate models are ill-suited due to their lack of explicit causal mechanisms. To overcome this problem, computational approaches such as agentcomputing models can be useful. Nonetheless, these models may also demand large amounts of data, so they are typically employed in micro-level studies relevant to a specific SDG. However, this does not mean that agent computing cannot be used for comprehensive analyses of SDGs but rather that substantial efforts need to be made in this direction. For instance, Allen et al. (2016) provide an extensive review of model types used for the assessment of SDGs. They find that agent-computing models account for only 1% of the studies in their literature survey.
In this paper, we contribute to this effort by developing an agent-computing model that is explicit about a critical causal channel: public expenditure. In contrast with recent studies on SDGs, like those mentioned in the paper's introduction, our model can be calibrated for individual countries on a large policy space, helping researchers and practitioners to get the most out of the available data. Furthermore, it does not impose aggregate relationships between the different indicators. Rather, it is very flexible since it allows the user to introduce any network of interdependencies that are relevant to the context under study (Ospina-Forero et al. 2020). While our model is not explicit about the full complexity of the system (and no other model is), it provides a rich enough yet parsimonious specification, which facilitates counterfactual experiments ('what if' scenarios) and allows estimating the impact of budgetary changes.
The proposed computational method also has some limitations that the reader should be aware of. Our model cannot produce ex ante evaluations of new government programs, nor can it yield policy prescriptions if, in the out-of-thesample analysis, there is a drastic transformation in the technological, political, and organizational underpinnings of a country. In this sense, the model assumes a 'business as usual' setting in which the system keeps working with similar government programs and structural features as those prevalent during the sampling period. This assumption is realistic in short-term analyses (less than 6 years) and admissible for evaluating policy design in a medium-term setting (5-15 years). Thus, our approach focuses on short-/ mid-term effects, and it explicitly separates structural factors that shape long-term dynamics. Ironically, while much of the existing methods suffer from the same limitations, their applications often tend to emphasize long-term scenarios.
The robustness of the model can be enhanced when disaggregated expenditure data are available at the SDG or government program levels. However, these databases only exist for a very few countries. Therefore, in this paper, we offer a worldwide application of the model and show that it can provide insightful policy guidelines even if a country only has aggregate expenditure data. As in any quantitative approach, when more detailed empirical information is available, our model can generate more specific policy prescriptions. For instance, with expenditure data disaggregated at the SDG level, it is possible to establish whether different budgetary allocations, to those observed historically, could exert an impact on the closing of development gaps.

Conclusion
We propose a bottom-up computational framework to analyze the short-and mid-term impact of budgetary allocations in a large set of SDG indicators. Our simulations use data from individual countries. Hence, it allows specifying context-dependent settings: initial conditions, calibrated parameters, and spillover effects among indicators. The underlying theory assumes fixed structural factors in the indicators' evolution equations and exogenous network topologies (which could be constructed in tandem with other qualitative and quantitative approaches). Our approach is useful to understand how to allocate resources across the existing government programs, so it facilitates identifying key priority areas. Moreover, through counter-factual simulations, the model can discover bottlenecks associated with the inefficacy of the public expenditure, which is key to achieve any development agenda. However, the model is not designed to identify the causes behind the structural constraints that prevent a country from closing its SDG gaps. Hence, the outputs are not informative about how to reformulate the existing micro-policies or how to generate new ones.
Our main results provide novel and nuanced estimates of the development gaps that will remain open in 2030, at the level of each country and indicator. We also find that more government spending is not enough to close the SDGs gaps, even if countries were operating at a budgetary frontier that entails enough resources for the existing government programs. Hence, complementary micro-policies are ultimately needed to overcome structural-long-term-bottlenecks and to improve the relevant indicators. When looking at the model's estimates, we can offer detailed interpretations of the simulation results. For instance, some environmental concerns such as clean air can be substantially ameliorated with a larger budget, while others (e.g., SDGs 14 and 15) require undertaking well-designed government programs to shift the historical course of ineffective policies.
Despite the simplicity of the model, it is possible to use it to infer two crucial country-specific features: (1) the possibility of closing the SDG gaps, and (2) the existence of longterm bottlenecks. Therefore, when analysts can identify one or several government programs with a particular indicator, it is possible to establish some policy guidelines with the model's estimates. Depending on the values of these features and the level of the indicator's historical performance, it is possible to define different routes of policy action. That is to say, whether the program should be reviewed, in terms of incentives and organizational practices before spending more public funds, or whether the SDG gaps can be closed by just channeling more funds into the existing programs.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.