Cross-national analysis of food security drivers: comparing results based on the Food Insecurity Experience Scale and Global Food Security Index

The second UN Sustainable Development Goal establishes food security as a priority for governments, multilateral organizations, and NGOs. These institutions track national-level food security performance with an array of metrics and weigh intervention options considering the leverage of many possible drivers. We studied the relationships between several candidate drivers and two response variables based on prominent measures of national food security: the 2019 Global Food Security Index (GFSI) and the Food Insecurity Experience Scale’s (FIES) estimate of the percentage of a nation’s population experiencing food security or mild food insecurity (FI<mod). We compared the contributions of explanatory variables in regressions predicting both response variables, and we further tested the stability of our results to changes in explanatory variable selection and in the countries included in regression model training and testing. At the cross-national level, the quantity and quality of a nation’s agricultural land were not predictive of either food security metric. We found mixed evidence that per-capita cereal production, per-hectare cereal yield, an aggregate governance metric, logistics performance, and the extent of paid employment were predictive of national food security. Household spending as measured by per-capita household final consumption expenditure (HFCE) was consistently the strongest driver among those studied, alone explaining a median of 92% and 70% of variation (based on out-of-sample R²) in GFSI and FI<mod, respectively. The relative strength of HFCE as a predictor was observed for both response variables and was independent of the countries used for model training, the transformations applied to the explanatory variables prior to model training, and the variable selection technique used to specify multivariate regressions.
The results of this cross-national analysis reinforce previous research supportive of a causal mechanism where, in the absence of exceptional local factors, an increase in income drives increase in food security. However, the strength of this effect varies depending on the countries included in regression model fitting. We demonstrate that using multiple response metrics, repeated random sampling of input data, and iterative variable selection facilitates a convergence of evidence approach to analyzing food security drivers.


Introduction
Recent data indicate that more than two billion people lack regular access to safe, nutritious, and sufficient food (FAO 2019), and an estimated 821 million people are not able to acquire enough food to meet minimum dietary energy requirements (FAO 2018a). The second United Nations Sustainable Development Goal (SDG 2) aims to eradicate hunger and all forms of malnutrition by 2030, yet hunger is slowly rising after decades of decline (UN 2019).
Food insecurity is a complex problem, manifesting as obesity and malnutrition in addition to extreme hunger and starvation (Candel 2014). A widely used definition from the FAO states that "food security exists when all people, at all times, have physical and economic access to sufficient, safe and nutritious food to meet their dietary needs and food preferences for an active and healthy life" (FAO 1996). This definition has been critiqued and refined (Barrett 2010; Coates 2013; Dilley and Boudreau 2001; Pinstrup-Andersen 2009; Tendall et al. 2015), and many food security measurement methodologies have been developed (Cafiero 2016; Carletto et al. 2013; EIU 2019; IPC Global Partners 2019; Jones et al. 2013; Leroy et al. 2015; Russell et al. 2018).
The Food Insecurity Experience Scale (FIES) measures food insecurity through the lens of a survey respondent's "lived experience" of food access (Cafiero 2016). The scale builds on experience-based assessment tools, which track the managed process by which a person typically confronts food insecurity (Ballard et al. 2013; Radimer et al. 1990). The FIES is globally calibrated to ensure cross-country comparability and has emerged as a leading indicator of food insecurity (Saint Ville et al. 2019). The official SDG indicator framework designates the FIES-based estimate of the prevalence of moderate or severe food insecurity in a nation's population as SDG Indicator 2.1.2 (UN General Assembly 2017).
The Global Food Security Index (GFSI) is a composite indicator that monitors national-level food security and has been tabulated since 2012 (EIU 2019). The GFSI is built upon 34 unique indicators spanning three conceptual pillars of food security: (1) affordability, (2) availability, and (3) quality and safety (Izraelov and Silber 2019). National GFSI scores are calculated by weighting these indicators according to an expert panel weighting matrix. Unlike the FIES, which directly measures individuals' experiences, the GFSI is country-centered and considers food security according to the national capacity to promote food affordability, availability and quality/safety (Thomas et al. 2017). The GFSI is a blend of indicators that may themselves be considered determinants of food security (e.g., gross domestic product per capita, funding of food safety net programs) or metrics of food security (e.g., dietary energy adequacy, micronutrient availability). The GFSI uses a variety of national-level data to address the question: how food secure is a given country relative to others?
Robust definitions and measures also enable study of the drivers of food security. The FIES scores of individual survey respondents have served as the response variable for several analyses. Smith et al. (2017b) used multilevel linear probability models across 134 countries to find that FIES assessments of household food insecurity were most strongly related to low education levels, weak social networks, low social capital, low household income, and unemployment. In a separate paper, Smith et al. (2017a) used similar models across Latin American and Caribbean countries to find that low levels of education, limited social capital, and living in a country with low gross domestic product per capita were associated with the most severe food insecurity per FIES scores. Park et al. (2019) used the Gallup World Poll data to predict FIES scores for elderly populations using explanatory variables naturally available in the survey responses, including economic and demographic factors in addition to several composite indices (e.g., Community Basics Index). Omidvar et al. (2019) used household-level FIES data to analyze socio-demographic correlates of food insecurity among Middle Eastern and North African countries.
In addition to assessing GFSI's composition and validity (Chen et al. 2019; Izraelov and Silber 2019; Maricic et al. 2016; Thomas et al. 2017), researchers have employed the index for national and cross-country assessments (Cai et al. 2020; Chaudhary et al. 2018; EIU 2016; Molotoks et al. 2017). Yunusa et al. (2018) used the GFSI as a response variable in a cross-country analysis that found that population and water resource availability were poor predictors of national food security. Richterman et al. (2019) used the GFSI to identify an inverse relationship between the cholera incidence rate and national food security among 30 countries.
Other cross-national analyses have used the Global Hunger Index, child stunting rates, and the prevalence of undernourishment as response variables. Laborde et al. (2016) examined trends between the Global Hunger Index and a set of long-term food security drivers by describing the food system as a system of equations. Their study concludes that income is a very strong driver, but also that the effect of a policy targeting a given driver can vary greatly depending on the context of the households, regions, or nations involved. The 2018 State of Food Security and Nutrition in the World report examined the influence of climate variability and extremes on the national prevalence of undernourishment using change point analysis, finding that climate shocks drove food crises, especially in countries where a high proportion of the population depends on agricultural livelihoods (FAO 2018a). The 2019 State of Food Security and Nutrition in the World report studied the cross-country effects of economic slowdowns, finding that an economic downturn was associated with a 5% increase in the national prevalence of undernourishment among 130 countries between 2011 and 2017 (FAO 2019). Other cross-country studies examined child stunting rates, which, though related to national food security, are specifically the result of poor nutrition and health early in life (Milman et al. 2005). Headey (2013) analyzed the effect of within-country changes in general developmental factors on child stunting rates, finding evidence that economic growth typically leads to reduction in stunting, but weaker evidence that agricultural growth plays a special role.
Smith and Haddad (2015) studied determinants of cross-country reductions in stunting from 1970 to 2010, finding income growth and strong governance to be key basic determinants of improvements in child undernutrition, while safe water access, sanitation, women's education, gender equality, and the quantity and quality of food available were underlying determinants.
The above studies of food security drivers typically use only one measure of food security or nutrition as the response variable. However, it is well-recognized that no single metric can capture all dimensions of food security, and thus complete assessments of food security use a "convergence of evidence approach" across several metrics (Ballard et al. 2013; Coates 2013; Jones et al. 2013; Pérez-Escamilla et al. 2017).
Following this logic, it is useful to examine the drivers of food security using more than one metric in order to make more robust conclusions about the relative contributions of different explanatory variables. How do the results of cross-country food security models vary when the variable used to define national food security is changed? Further, how do these models respond to changes in data availability (i.e., the countries included in input data) and model formulation (i.e., the explanatory variables selected)? Here, we analyze the importance of several explanatory variables in regressions predicting food security at the national level based on both GFSI and FIES metrics. By conducting the analysis in parallel for each response variable, we compare results from two fundamentally different approaches to assessing food security. We further test the stability of our results to changes in explanatory variable selection and the countries included in regression model training and testing using stepwise forward variable selection and bootstrap sampling, respectively.

Data
This section provides additional background on the data used in this study and our rationale for the selection of explanatory variables. The full dataset is available for download in Online Resource 1, which also includes metadata on the definitions, sources, data years, and units of all variables.

Food Insecurity Experience Scale
The Food Insecurity Experience Scale (FIES) measures the access dimension of food insecurity through the lens of a person's lived experience (Cafiero 2016). Food insecurity is commonly experienced as a continuum, where mild food insecurity is first felt as a worry about how to procure food because of a lack of resources, progressing to compromise on the quality and variety of food, then reduction in the quantity of food, before skipping meals and experiencing hunger associated with severe food insecurity (Coates et al. 2006). The FIES Survey Module uses eight yes/no questions to assess the respondents' place on this continuum in the past 12 months (Ballard et al. 2013). Table A2.1 presents the questions in the survey module (Online Resource 2). The questions are ordered such that answering "yes" corresponds to increasing levels of food insecurity as the module progresses. From these ordered responses, the Rasch model is used to estimate the level of food insecurity experienced by the respondent (Nord 2014). The FIES Survey Module is administered to nationally representative samples of the adult population, and national-level results are calibrated to a global reference scale to ensure cross-country comparability (Cafiero et al. 2018).
FIES respondents can be classified as experiencing a) food security or mild food insecurity, b) moderate or severe food insecurity, or c) severe food insecurity (UNSD 2020). The moderate food insecurity threshold is set by the 5th FIES Survey Module item, which asks if the respondent has eaten less than he/she thought he/she should because of a lack of money or other resources. The severe food insecurity threshold is set by the 8th item, which asks if the respondent has gone an entire day without eating for lack of money or other resources. Once national FIES measures have been calibrated to the global scale, the prevalence of these levels of food insecurity in the national population is estimated by probabilistically assigning respondents to each class as described in the official SDG Indicator 2.1.2 metadata (UNSD 2020).
Response variable: FI<mod
SDG Indicator 2.1.2, denoted by FImod+sev, is defined as the percentage of people who live in households classified by the FIES as moderately or severely food insecure (FAO 2018b; UN General Assembly 2017). It follows that the percentage of the population who experience either food security or mild food insecurity, FI<mod, can be defined as FI<mod = 1 − FImod+sev. We used the percentage of the national population in the FI<mod class as a response variable in our analysis to facilitate comparison with the Global Food Security Index, which increases with increasing food security performance.

Global Food Security Index
The Economist Intelligence Unit's Global Food Security Index (GFSI) is a composite index that provides a common, cross-national basis for assessing food security (EIU 2016, 2019). The 2019 GFSI uses 34 unique indicators to cover broad aspects of food security, from average food supply, to diet diversification, to presence of a formal grocery sector, et cetera. The indicators are organized into three categories (Affordability, Availability, and Quality/Safety). Table A2.2 presents the GFSI components and their weights (Online Resource 2). To calculate the index, all GFSI input data are scaled to a value between zero and 100. After scaling, the three category scores are calculated as the weighted means of the indicators, and the overall GFSI score is calculated as the weighted mean of the category scores. We utilize the default indicator weighting matrix recommended by a peer panel of experts on food and agricultural policy. We do not adjust these default results with the optional Natural Resources and Resilience risk adjustment factor offered by the 2019 GFSI model. Although the expert indicator weights are subjective by nature, three independent recent studies have largely concluded that this index formulation is reasonable for use in assessing cross-national differences in food security (Chen et al. 2019; Izraelov and Silber 2019; Thomas et al. 2017).
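The two-level aggregation described above (scale indicators to 0-100, take weighted means within categories, then a weighted mean across categories) can be sketched in a few lines. This is a minimal Python illustration rather than the EIU's implementation: min-max scaling is assumed as the scaling rule, and every indicator value, range, and weight below is hypothetical.

```python
def min_max_scale(value, lowest, highest):
    """Map a raw indicator value onto 0-100 (min-max scaling is an assumption)."""
    return 100.0 * (value - lowest) / (highest - lowest)


def weighted_mean(scores, weights):
    """Weighted mean; weights are normalized by their sum."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def composite_index(categories):
    """Return (category scores, overall score) for one country."""
    category_scores = {
        name: weighted_mean(
            [score for score, _ in cat["indicators"]],
            [weight for _, weight in cat["indicators"]],
        )
        for name, cat in categories.items()
    }
    overall = weighted_mean(
        [category_scores[name] for name in categories],
        [cat["weight"] for cat in categories.values()],
    )
    return category_scores, overall


# Hypothetical country: every number below is invented for illustration.
example = {
    "affordability": {
        "weight": 0.4,
        "indicators": [(min_max_scale(30.0, 0.0, 60.0), 0.5), (80.0, 0.5)],
    },
    "availability": {"weight": 0.4, "indicators": [(60.0, 0.25), (40.0, 0.75)]},
    "quality_and_safety": {"weight": 0.2, "indicators": [(90.0, 1.0)]},
}
category_scores, overall_score = composite_index(example)
```

In this toy case the category scores are 65, 45, and 90, and the overall score is their weighted mean, 62. Because every level is a weighted mean of values on the 0-100 scale, the overall score stays on that scale as well.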

Explanatory variables: country characteristics
A complex causal chain determines each person's food security, which may be defined according to the 1996 World Food Summit definition: physical and economic access to sufficient, safe, and nutritious food to meet dietary needs and food preferences (FAO 1996). The classic UNICEF framework for child undernutrition classified causes as "basic", "underlying", or "immediate" by their order in the causal chain (UNICEF 1990). For example, disease or inadequate dietary intake may be the immediate cause of undernutrition, but these may be the result of underlying household food insecurity, which is ultimately caused by broader inadequacies in resources (e.g., employment, technology) and other stressors (e.g., political unrest).
These basic determinants in the causal chain are also components of a multipart food system, which is described by the conceptual framework posed by the Global Panel on Agriculture and Food Systems for Nutrition's (GPAFSN) 2016 report (GPAFSN 2016, p. 27). In the GPAFSN framework, dietary quality is most proximally dependent on consumer purchasing power, but the way that income is spent depends on the broader "food environments" that determine which foods are physically accessible, as well as the price and nutritiousness of those foods. Food environments are also dependent on the food supply system, which includes an agricultural production subsystem, as well as subsystems that transform, store, transport and sell food products.
For this cross-country analysis, we select explanatory variables at the most "basic" level of the UNICEF causative framework, and which map to components of the GPAFSN's food systems framework. Conceptually, our explanatory variables describe key aspects of the food system, starting with basic agricultural resources (quality and quantity of agricultural land) utilized by the agricultural production subsystem to produce food, then including the governance and logistics performance which may affect the distribution of domestic and imported food within the food environment, and finally considering the income allowing the purchase of available food by consumers. Table 1 lists each of the selected explanatory and response variables with their units and provides some summary statistics. Figure A2.1 presents scatterplots between each response variable and each explanatory variable (Online Resource 2).
We chose the mean Crop Suitability Index (CSI) and hectares of arable land per capita as measures of agricultural land quality and quantity, respectively. We use the version of the CSI that assesses the suitability of a nation's land area for cultivating rain-fed cereals using low levels of agricultural inputs (van Velthuizen 2007). Arable land includes area classified by the FAO as under temporary crops, temporary meadows for mowing or for pasture, land under market or kitchen gardens, and land temporarily fallow (FAO 2020a).
We chose the per-capita cereal production and per-hectare cereal yield as indicators of in-country agricultural production. Cereal crops include wheat, rice, maize, barley, oats, rye, millet, sorghum, buckwheat, and mixed grains. Cereal production is measured as metric tons of cereal crops harvested for dry grain per capita per year (FAO 2020b). Cereal yield is measured as kilograms of cereals harvested for dry grain per hectare of harvested land (FAO 2020c).
We chose the Worldwide Governance Indicators (WGI) and Logistics Performance Index as measures of governance and logistics performance, respectively. The WGI include composite indicators that measure perceptions of governance quality in six dimensions: Voice and Accountability, Political Stability and Absence of Violence/Terrorism, Government Effectiveness, Regulatory Quality, Rule of Law, and Control of Corruption (Kaufmann et al. 2010). The six WGI indicators are reported in units of a standard normal distribution (i.e., ranging from approximately −2.5 to 2.5), and we use the mean of these indicators for each country as the explanatory variable in our study. The Logistics Performance Index (LPI) evaluates trade and transport-related infrastructure based on survey responses by on-the-ground freight and trade operators (Arvis et al. 2014).
Per-capita household final consumption expenditure (HFCE) reflects the real market value of goods and services purchased by households or by nonprofit institutions serving households. To enable cross-national comparability, we use a measure of HFCE that has been adjusted for purchasing power parity and converted to constant 2017 international dollars (World Bank 2019). HFCE estimates the annual consumption of an average individual, and it relates to consumer income in our conceptual framework. HFCE values are based on household consumption surveys which include imputed expenditures for own-consumption and owner-occupier rents (Lequiller and Blades 2014). These "own-consumption" expenditures include the products of subsistence agriculture, which are assigned a market value based on the farm gate prices that smallholders would have received if they had sold their produce (McCarthy 2013). Valuating the outputs of informal economies in a cross-country-comparable manner remains challenging for national accountants (Charmes 2012). Despite these uncertainties, we consider HFCE an estimate of the total consumption of goods and services of an average consumer, including the procurement of food by buying or growing.
The prevalence of paid employment indicates the percent of total employment made up of wage and salaried workers who hold "paid employment jobs" (ILO 2020). Workers with paid employment jobs are generally considered less vulnerable than own-account and contributing family workers (Gammarano 2018).
We do not claim that these explanatory variables include all characteristics relevant to national food security. Nor does securing access to food guarantee a high-quality diet for all people: utilizing food for healthy diets also relies on consumer behavior and education, among other factors (HLPE 2017). We do claim that the selected explanatory variables include several basic drivers that help determine the extent of food access within a national food system. We analyze how these characteristics can explain cross-national differences in food security.
Correlation between explanatory variables
Multicollinearity between explanatory variables is common and can cause regression models' coefficients and predictive capability to be highly sensitive to changes in model specification and input data sample (Farrar and Glauber 1967). Figure A2.2 presents the correlation matrix for all explanatory variables. Many are correlated with one another. We use bootstrapping and variable subset selection techniques to present our results as distributions of model performance and coefficients across many regression model fits. The sensitivity of the results is thus presented directly in the data for the reader's own interpretation.

Methods
This section describes our approach to multivariable regression using the variables described in Section 2, including data preprocessing, bootstrap sampling, and stepwise forward variable selection.

Data preprocessing
To ensure comparability between regression results on both GFSI and FI<mod, we limited our analysis to the 65 countries for which data on both response variables are available. While utilizing all available countries for each metric would increase sample size, it would also allow differences in the underlying samples to bias results. GFSI's data coverage prioritizes large countries to capture the largest possible percentage of global population, while the FIES results can be reported by any country that undertakes the survey module.
Prior to regression, we applied a Box-Cox transformation to rescale non-normal explanatory variables to make them more similar to a normal distribution (Box and Cox 1964). Supplementary Note 1 in Online Resource 2 provides further explanation of the Box-Cox transformation applied to the input data. Finally, to promote comparability of regression coefficients between the explanatory variables in each model, we transformed the explanatory variables so that they were centered and scaled to a standard deviation of one and a mean of zero. The response variables were not transformed in any way.
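The two preprocessing steps can be illustrated with a short sketch. It is written in Python for illustration (the study itself used R) and makes two assumptions that the text does not specify: the Box-Cox parameter `lam` is supplied by the caller rather than estimated (it is commonly chosen by maximum likelihood), and the sample standard deviation is used for scaling. The input values are hypothetical.

```python
import math
import statistics


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda -> 0 limit of the transform is the natural logarithm.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def standardize(values):
    """Center to mean zero and scale to a sample standard deviation of one."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical right-skewed explanatory variable (values invented).
raw = [500.0, 1200.0, 3000.0, 9000.0, 40000.0]
transformed = standardize(box_cox(raw, lam=0.0))
```

Standardizing after the transformation puts all explanatory variables on a common scale, which is what makes the fitted coefficients comparable across variables within a model.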
While preparing the analysis, we also tested the effect of changing these approaches to dataset selection (i.e., all available data versus only countries with both response variables available) and explanatory variable transformations (i.e., Box-Cox versus targeted logarithmic transforms of a few variables). Figure 5 shows that the performance of all univariate models was consistent for all four combinations of these modelling decisions.

Linear regression on bootstrap samples
We used ordinary least squares linear regression to quantitatively evaluate the relationships between combinations of explanatory variables and the response variables. As described above, data were available for 65 countries. Because of the small number of observations, it is useful to determine how sensitive our results are to the inclusion/exclusion of nations in the dataset used for model fitting (i.e., to test the generalizability of the models). Rather than performing just one regression for each combination of response and explanatory variables, we use bootstrapping (i.e., sampling with replacement) to train and test regression models on multiple subsets of the input dataset. Figure 1 illustrates the bootstrap sampling process used to fit and test each regression model. Starting with the original input dataset, we created 100 bootstrap samples by performing random sampling with replacement (Hastie et al. 2009). Each sample consisted of a training set, which was used for model fitting, and a test set, which was used to evaluate model performance (out-of-sample R²). The training set was created by drawing random samples with replacement until the training set was the same size as the original input dataset (65 countries). Because sampling was conducted with replacement, the resulting training set contained some replicates of the original countries. The countries that were left out of the training set served as the test set for that bootstrap sample. This procedure ensures that training and test sets are disjoint. The mean test set size was 23.7 ± 2.4.
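The sampling scheme described above can be sketched as follows. This is a minimal Python illustration (the study used R); the function name `bootstrap_split` and the random seed are hypothetical, and countries are represented only by their row indices.

```python
import random


def bootstrap_split(n_items, rng):
    """Draw n_items indices with replacement; never-drawn indices form the test set."""
    train = [rng.randrange(n_items) for _ in range(n_items)]
    drawn = set(train)
    test = [i for i in range(n_items) if i not in drawn]
    return train, test


rng = random.Random(42)  # arbitrary seed, for reproducibility only
n_countries = 65
splits = [bootstrap_split(n_countries, rng) for _ in range(100)]

# Expected number of countries never drawn in one sample: n * (1 - 1/n)^n
expected_test_size = n_countries * (1 - 1 / n_countries) ** n_countries
mean_test_size = sum(len(test) for _, test in splits) / len(splits)
```

For 65 countries the expected test set size works out to about 23.7, which agrees with the mean test set size reported above.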
As visualized in Fig. 1, the out-of-sample R² is calculated using only the actual and predicted response variable values in the test set. Thus, the out-of-sample R² can be considered the proportion of variance in the response variable that is explained by the regression model for countries the model did not "see" during fitting. This approach applies equally to models using any number of explanatory variables (i.e., including the univariate models in Fig. 4 and the multivariate models in Fig. 6). The bootstrap sampling process is repeated 100 times, generating a set of 100 R² values across all iterations of sampling and training-testing.
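A small sketch may help make the out-of-sample calculation concrete. It is written in Python for illustration and assumes one common definition of the coefficient of determination, 1 − SS_res/SS_tot, with the total sum of squares taken about the test-set mean; the text does not state which definition was used. The data are invented.

```python
def fit_univariate_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def out_of_sample_r2(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot about the test-set mean (an assumption)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Invented training and test data; the model never sees the test rows.
x_train, y_train = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
x_test, y_test = [4.0, 5.0, 6.0], [9.0, 11.0, 14.0]

intercept, slope = fit_univariate_ols(x_train, y_train)
predictions = [intercept + slope * xi for xi in x_test]
r2 = out_of_sample_r2(y_test, predictions)
```

Under this definition the out-of-sample value can be negative when a model predicts held-out countries worse than their own mean would.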

Model formulation and stepwise forward variable selection
The model formulations for the linear regressions that were fit to these bootstrap samples can be generically written for the i-th response variable and j-th set of explanatory variables as

$$Y_i = X_j \beta_{ij} + \varepsilon_{ij}$$

where $Y_i$ is the vector of food security scores for response variable i, $X_j$ is the matrix of input data for explanatory variable subset j, $\beta_{ij}$ is a vector of coefficient estimates, and $\varepsilon_{ij}$ is a vector of error terms corresponding to response variable i and explanatory variable set j. The model is fit by minimizing the sum of squared residuals, as per the ordinary least squares regression approach.
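For reference, the least-squares criterion mentioned above has a standard closed-form solution. Writing $Y_i$ for the response vector, $X_j$ for the explanatory data matrix, and $\beta_{ij}$ for the coefficient vector, and assuming (the text does not say so explicitly) that $X_j$ includes a column of ones for the intercept and has full column rank, the estimate is:

```latex
\hat{\beta}_{ij}
  = \arg\min_{\beta} \, \lVert Y_i - X_j \beta \rVert_2^2
  = \left( X_j^{\top} X_j \right)^{-1} X_j^{\top} Y_i
```

Strong correlation among the columns of $X_j$ makes $X_j^{\top} X_j$ close to singular, which is one way to see why multicollinearity can make coefficient estimates sensitive to the input sample.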
Our use of bootstrap sampling shows how results change with variation in the countries used to train regression models. It is similarly useful to analyze how regression models perform when using different subsets of explanatory variables. Explanatory variable subset selection techniques can be used to identify the model formulations that achieve best out-of-sample model performance (for example, by avoiding overfitting to training data). Comparing results across the many regression models generated during subset selection also enhances interpretability. We tested two approaches to explanatory variable selection: exhaustive best subset selection, and stepwise forward selection (Hastie et al. 2009). The exhaustive best subset selection approach tested all possible combinations of explanatory variables, fitting 510 different model formulations between the two response and eight explanatory variables. The stepwise forward approach used a "greedy" algorithm that started with the best univariate model and iteratively added the explanatory variable that most improved the out-of-sample model performance at each step. Figure A2.3 shows that both approaches to variable selection produced nearly identical model performance for each number of explanatory variables, for each of the two response variables (Online Resource 2). We chose to present only the stepwise forward variable selection results here because the incremental nature of the algorithm highlights the value of adding each new explanatory variable to the model.
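The two selection strategies can be sketched as follows. This is a minimal Python illustration, not the study's R code: `score` is a hypothetical stand-in for whatever performance measure is used (in this setting, an out-of-sample R² summarized across bootstrap samples), the variable names `x1` to `x4` and their contributions are invented, and the sketch assumes selection continues until every variable has been added.

```python
def count_exhaustive_models(n_variables, n_responses):
    """Number of non-empty explanatory variable subsets, over all responses."""
    return n_responses * (2 ** n_variables - 1)


def forward_selection(candidates, score):
    """Greedy forward selection.

    `score` maps a tuple of variable names to a performance value (higher is
    better). Returns a list of (selected subset, score), one entry per step.
    """
    selected, remaining, path = [], list(candidates), []
    while remaining:
        # Add the single variable that gives the best score at this step.
        best = max(remaining, key=lambda v: score(tuple(selected + [v])))
        selected.append(best)
        remaining.remove(best)
        path.append((tuple(selected), score(tuple(selected))))
    return path


# Toy score with diminishing returns; contributions are invented.
contribution = {"x1": 0.7, "x2": 0.4, "x3": 0.3, "x4": 0.0}


def toy_score(subset):
    unexplained = 1.0
    for name in subset:
        unexplained *= 1.0 - contribution[name]
    return 1.0 - unexplained


path = forward_selection(["x3", "x4", "x1", "x2"], toy_score)
selection_order = list(path[-1][0])
n_models = count_exhaustive_models(n_variables=8, n_responses=2)
```

With eight explanatory variables there are 2^8 − 1 = 255 non-empty subsets, so two response variables give the 510 formulations mentioned above. Forward selection with this stopping rule instead evaluates at most 8 + 7 + ... + 1 = 36 candidate models per response variable.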

Statistical tools
We used the R language in the RStudio environment (R Core Team 2020). Data visualization and manipulation were conducted with the tidyverse ecosystem of packages (Wickham et al. 2019). Regressions were performed with the tidymodels ecosystem of packages (Kuhn and Wickham 2020).

Results
This study analyzed the relationships between two measures of national food security and a dataset of explanatory variables that characterizes 65 nations in terms of agricultural land quality and quantity, agricultural production, governance and infrastructure, and household income. We used linear regression models to quantify the contribution of each explanatory variable to the variation in both metrics. We further examined the stability of our results by repeating the regressions on varying input data sets to create distributions of model fits and performance. This section compares the two response variables and presents the regression results. India and Brazil are among the largest countries excluded from this analysis because they lack publicly available data on FImod+sev (FAO 2018b). The Middle East and North Africa region is exceptionally sparsely covered. Despite the gaps, these data span many regional contexts.

Comparing GFSI and FI<mod
For both GFSI and FI<mod, North America, Europe and Central Asia, and East Asia and Pacific regions lead in food security performance. A second tier comprises Latin American and Caribbean countries, along with a Northern African nation (Egypt) and two South Asian countries (Nepal and Bangladesh). Sub-Saharan African countries show the worst regional performance on both response metrics. Figure 3 shows a strong positive correlation between countries' GFSI and FI<mod scores. For both metrics, the spread in national food security is tight for a cluster of high-performing nations, and much wider in the middle and lower parts of the scale. Except for Nigeria and Burkina Faso, Sub-Saharan African nations lie on or below the trendline, indicating FI<mod performance that is lower than what the GFSI scores alone might indicate. That is, for most Sub-Saharan African countries in this study, the prevalence of people reporting an experience of moderate or severe food insecurity in the past 12 months is higher than the rate that a model using GFSI's macro-level indicators would suggest. This same deviation from the trendline is observed to a milder extent for most Latin American and Caribbean nations.

Regression modelling
We quantitatively evaluated the relationship between the response variables and the explanatory variables using multivariable linear regression as described in Section 3.1. The underlying methodological differences in the response metrics inform the interpretation of regression results. Regressions on FI<mod show how explanatory variables predict the prevalence of food security (or mild food insecurity) in a population. Regressions on GFSI show how explanatory variables relate to the Index's framework for assessing national food security.
Because we used bootstrap sampling to run each model on 100 random training and testing datasets, all model performance and coefficient results are presented as distributions of outcomes across the 100 model fits. Figure 4 presents the out-of-sample R² results for univariate regression models. Each boxplot summarizes model performance across 100 iterations in which a one-predictor model is fit on a training set of 65 countries (including replicates) and then tested on a testing set consisting of all countries not used in training. For instance, when a model with only HFCE as an explanatory variable was used to predict FI<mod for 100 different sets of out-of-sample countries, the coefficient of determination ranged from 0.42 to 0.89, with a median of 0.70. Models predicting FI<mod are generally less accurate than models predicting GFSI.
The univariate model results align roughly with the thematic categories we used to select explanatory variables. Considering these categories one at a time, variables related to the quality and quantity of agricultural land were not predictive of either food security metric. Variables related to agricultural production (per-capita cereal production and cereal yield) were the second-lowest performing category. However, cereal yield was a significantly better predictor than gross production for both response variables, attributing more importance to land use efficiency than tonnage grown per capita. WGI and LPI, which comprise the governance and logistics category, show mixed results. When predicting FI<mod, the median R² for the WGI-only model is 0.46, which is below that of cereal yield (0.58). For GFSI, however, the WGI-only model performs about as well as LPI and the percentage of workers in paid employment jobs (median R² ≈ 0.8).
Predictions by a univariate model using household final consumption expenditure per capita (HFCE) outperformed all other univariate models for both response variables, capturing a median of 92% of the variation in 2019 GFSI scores and 70% of the variation in FI<mod. We note that some of HFCE's predictive power for GFSI comes from the inclusion of GDP per capita as one of the indicators in the index (about 9% of the total score, per […]). HFCE measures the market value of all goods and services purchased by households, which corresponds to a portion of gross domestic product. Rather than attempting to disentangle the effect of HFCE on both sides of the equation, we acknowledge this complication here and avoid relying solely on GFSI when drawing conclusions. We do note that R² results were negligibly changed even when we tested eliminating GDP per capita from the GFSI formulation. FI<mod is independent and is not based on macro-level indicators, making it an important complement to GFSI in this study.
Fig. 4 […] boxplots presented in this study, the middle line, box hinges, and whiskers of the boxplot show the median, interquartile range (IQR), and the range of values up to 1.5*IQR more extreme than the box hinges, respectively. Here, each boxplot is overlaid by the data points it summarizes
Figure 5 shows that these univariate model results are resilient to changes in 1) the countries that are included in the dataset, and 2) the transformation used to scale explanatory variables prior to model fitting. Across all explanatory and response variable combinations, the median out-of-sample R² varies by less than 0.09 as these two modelling assumptions are changed.
Figure 6 summarizes the model performance of each stage of stepwise forward explanatory variable selection. For each response variable, the leftmost boxplot shows the R² performance for the best one-variable model. From left to right, the succeeding boxplots show how performance changes when a new variable is added to the model.
Both stepwise forward variable selections begin with HFCE as the first variable. HFCE captures nearly all the information required to predict GFSI: a model with only HFCE (median R² = 0.92) is negligibly improved by adding variables. HFCE also predicts FI<mod well (median R² = 0.70 for the leftmost model), but the out-of-sample performance of the regression model improves with inclusion of cereal yield, cereal production, and the quantity of arable land per capita (median R² = 0.77). Addition of further variables, including the logistics index, mean CSI score, mean WGI score, and percentage of workers in paid employment jobs, decreased the ability of the model to predict FI<mod for countries outside of the training sample. This is evidence of overfitting, as increasing model complexity worsened predictions on test set countries. Figure A2.3 shows that these results hold if exhaustive best subsets variable selection is used to specify the multivariate regressions instead of the stepwise forward variable selection approach presented in the main text.
Fig. 5 Median out-of-sample R² for univariate regression models using four sets of modelling assumptions. The first letter of the method tag corresponds to the input data utilized for regression: "O" indicates that all available data were used when fitting each regression model (i.e., 91 countries for FI<mod and 112 countries for GFSI), and "I" indicates that only countries with both response variables available were used (i.e., the same 65 countries for both response variables). The second letter corresponds to the transform applied to non-linear explanatory variables: "B" indicates that a Box-Cox transformation was applied as described in Supplementary Note 1, and "L" indicates a targeted approach where a natural logarithm was applied to HFCE and cereal production per capita to linearize these variables with respect to the response variables
Fig. 6 Out-of-sample R² results at each stage of stepwise forward explanatory variable selection. The x-axis denotes the explanatory variables included in the regression, where A is arable land per capita, C is the mean Crop Suitability Index, P is cereals production per capita, Y is cereal yield, S is HFCE per capita, E is the percentage of workers in paid employment jobs, L is the Logistics Performance Index, and W is the mean Worldwide Governance Indicator score
Figure 7 shows the distribution of regression coefficient estimates generated over 100 bootstrap samples. For each response variable, we include the coefficient estimate results for models using all explanatory variables, and for the 4-variable model created by stepwise forward variable selection. Each ridge shows how regression coefficients vary as the countries used for model training change across bootstrap samples. In the full model including all explanatory variables, coefficient estimates range widely, sometimes changing in sign from fit to fit. Many of the widest-ranging coefficients have a median p value above 0.2, indicating very low statistical significance of the estimate.

Across both response metrics and all regression formulae, HFCE is the only explanatory variable with a consistently positive and statistically significant coefficient. However, the expected boost in national food security from an increase in HFCE varies across bootstrap samples. Some model fits suggest that a one standard deviation increase in Box-Cox transformed HFCE translates to a 30% increase in FI<mod. For others, the same increase in HFCE is estimated to have a much smaller effect. This suggests that the magnitude of influence of per-capita consumer spending on national food security depends in part on the countries being considered, and the causal models of food insecurity at work within them.
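To illustrate how a coefficient on a standardized predictor can keep its sign while varying in magnitude across bootstrap samples, the sketch below refits a one-predictor model on resampled data. It is an assumption-laden illustration rather than the authors' procedure: the data are synthetic, the predictor is log-transformed (the natural-logarithm variant mentioned in this study) instead of Box-Cox transformed, and the helper names are hypothetical.

```python
import math
import random
from statistics import mean, median, pstdev


def standardize(values):
    """Rescale values to zero mean and unit standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def slope(x, y):
    """OLS slope of y on x (the intercept is not needed here)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def bootstrap_slopes(x, y, n_iter=100, seed=0):
    """Refit the slope on n_iter samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(slope([x[i] for i in idx], [y[i] for i in idx]))
    return slopes


# Synthetic stand-in data (NOT the paper's dataset).
data_rng = random.Random(1)
hfce = [math.exp(data_rng.uniform(6.0, 10.5)) for _ in range(65)]
# Standardized once on the full sample, so each slope reads as the change
# in the response per one standard deviation of log-HFCE.
z = standardize([math.log(h) for h in hfce])
fi = [60.0 + 20.0 * zi + data_rng.gauss(0.0, 8.0) for zi in z]

slopes = bootstrap_slopes(z, fi)
share_positive = sum(s > 0 for s in slopes) / len(slopes)
print(f"slope range: {min(slopes):.1f} to {max(slopes):.1f}, "
      f"median {median(slopes):.1f}, share positive {share_positive:.2f}")
```

The spread between the smallest and largest slope is the quantity of interest: a stable sign with a wide range corresponds to the pattern described for HFCE.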
For both response metrics, the 4-variable model using forward variable selection tended to conserve at least one explanatory variable related to cereal yield or production. However, the four models in Fig. 7 do not agree on which of these agricultural production characteristics is more useful to the model, nor on the sign, magnitude, or significance of their coefficient estimates.
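A stepwise forward selection of the kind referred to here can be sketched as below. The selection criterion is an assumption: this sketch adds, at each stage, the variable that yields the highest median out-of-sample R² over a fixed set of bootstrap splits, which may differ from the criterion in the authors' R code. The data are synthetic and all function names are hypothetical.

```python
import random
from statistics import mean, median


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + row for row in X]
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)


def make_splits(n, n_iter=50, seed=0):
    """Bootstrap training indices paired with the never-drawn test indices."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_iter):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        splits.append((train, [i for i in range(n) if i not in drawn]))
    return splits


def oos_r2(X, y, cols, splits):
    """Median out-of-sample R^2 of the model that uses columns `cols`."""
    scores = []
    for train, test in splits:
        beta = ols([[X[i][c] for c in cols] for i in train], [y[i] for i in train])
        pred = [beta[0] + sum(b * X[i][c] for b, c in zip(beta[1:], cols)) for i in test]
        y_test = [y[i] for i in test]
        my = mean(y_test)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_test, pred))
        ss_tot = sum((t - my) ** 2 for t in y_test)
        scores.append(1.0 - ss_res / ss_tot)
    return median(scores)


def forward_selection(X, y, splits):
    """Greedily add the variable that maximizes median out-of-sample R^2."""
    remaining = list(range(len(X[0])))
    chosen, path = [], []
    while remaining:
        scored = {c: oos_r2(X, y, chosen + [c], splits) for c in remaining}
        best = max(scored, key=scored.get)
        chosen.append(best)
        remaining.remove(best)
        path.append((tuple(chosen), scored[best]))
    return path


# Synthetic stand-in data (NOT the paper's dataset): column 0 is a strong
# driver, column 1 a weaker one, and column 2 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(60)]
y = [5.0 * row[0] + 2.0 * row[1] + data_rng.gauss(0.0, 1.0) for row in X]

path = forward_selection(X, y, make_splits(len(X)))
for cols, score in path:
    print(cols, round(score, 3))
```

Scoring every candidate on the same splits keeps the comparison between variables from being confounded by which observations happened to be held out.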
Both 4-variable models retained one measure of agricultural land quantity or quality, but with consistently negative coefficients. This indicates that, when comparing two countries, if all else is held equal, the country with more or better-quality agricultural land is on average the less food secure of the pair. The four models in Fig. 7 disagree on the sign, magnitude, and significance of coefficients for LPI, WGI and the percent of workers in paid employment jobs.
Fig. 7 […] Thus, the y-axes of each distribution are in units of probability density. The distributions are colored based on the median p value observed over the bootstrap runs. The annotations in the bottom-right of the plots state the explanatory variables used in the regression, and the median out-of-sample R² for the model runs. The explanatory variables are coded such that A is arable land per capita, C is the mean Crop Suitability Index, P is cereals production per capita, Y is cereal yield, S is HFCE per capita, E is the percentage of workers in paid employment jobs, L is the Logistics Performance Index, and W is the mean Worldwide Governance Indicator score
As a supplementary analysis, we also calculated SHapley Additive exPlanation (SHAP) values, which indicate the additive contribution of each explanatory variable to each model prediction. Table A2.3 summarizes the absolute SHAP values for all explanatory variables, showing that HFCE has the highest average absolute contribution to each model prediction for both response variables. Figure A2.4 visualizes the raw SHAP values for every model prediction, showing regional patterns: the low HFCE values of Sub-Saharan African countries make a strongly negative contribution to the expected value of both GFSI and FI<mod.
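For a linear regression, SHAP values have a simple closed form if the explanatory variables are treated as independent: the contribution of variable j to a prediction is its coefficient times the variable's deviation from its mean. The sketch below uses that closed form to show the additive property and the mean absolute contribution summarized in a table such as Table A2.3. The text does not state which SHAP estimator was used, so this is an assumption, and the coefficients and inputs are hypothetical.

```python
from statistics import mean


def predict(intercept, coefs, row):
    """Linear model prediction for one observation."""
    return intercept + sum(b * v for b, v in zip(coefs, row))


def linear_shap(coefs, X):
    """SHAP values for a linear model under feature independence:
    phi[i][j] = coefs[j] * (X[i][j] - mean of column j)."""
    col_means = [mean(col) for col in zip(*X)]
    return [[b * (v - m) for b, v, m in zip(coefs, row, col_means)] for row in X]


# Hypothetical fitted model and standardized inputs (NOT the paper's values).
intercept = 50.0
coefs = [12.0, 3.0, -1.5]
X = [
    [-1.2, 0.3, 0.8],
    [0.1, -0.5, 1.1],
    [1.4, 0.9, -0.2],
    [-0.3, -0.7, -1.7],
]

phi = linear_shap(coefs, X)

# Additivity: the expected prediction plus an observation's SHAP values
# reproduces that observation's prediction.
base = mean(predict(intercept, coefs, row) for row in X)
max_error = max(
    abs(base + sum(contrib) - predict(intercept, coefs, row))
    for row, contrib in zip(X, phi)
)

# Mean absolute SHAP value per variable, a common importance summary.
mean_abs = [mean(abs(contrib[j]) for contrib in phi) for j in range(len(coefs))]
print("max additivity error:", max_error)
print("mean |SHAP| per variable:", [round(v, 3) for v in mean_abs])
```

A strongly negative SHAP value for one variable means that the observation sits well below that variable's mean while the coefficient is positive, which is the pattern described for HFCE in Sub-Saharan African countries.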

Discussion
This study analyzes cross-national food security performance using two prominent metrics: the Food Insecurity Experience Scale (FIES), and the Global Food Security Index (GFSI). The FIES is a "bottom-up" survey-based method that measures respondents' lived experience of food insecurity, and it is used to produce globally calibrated estimates of the prevalence of moderate and severe food insecurity (Cafiero 2016; Cafiero et al. 2018). Our FIES-based response variable, FI<mod, estimates the percentage of a nation's population who have not eaten less than they should for lack of money or other resources in the past 12 months. The GFSI is a "top-down" index that relies on a suite of macro-level indicators combined with an expert-suggested weighting matrix to score countries based on the affordability, availability, and quality and safety of their food systems (EIU 2019). We caution that favorable national FIES and GFSI scores do not guarantee that the average citizen has a holistically healthy diet (GPAFSN 2016; Pérez-Escamilla et al. 2017). The FIES primarily assesses the "access" dimension of food security (Cafiero 2016), and the GFSI is not sensitive to all aspects of a healthy individual diet (e.g., specific nutrient deficiencies) (Izraelov and Silber 2019; Thomas et al. 2017).
Precisely because of their methodological differences, the GFSI and FIES can serve as complementary metrics in a two-pronged approach to considering national-level food security. On its own, the GFSI is a subjective measure of national capacity for food security. However, its correlation with the FIES-based measure (Fig. 3) gives some assurance that these macro-level indicators are not completely out of touch with the lived experiences of citizens. Likewise, the FIES is well-equipped to measure food insecurity, but ill-equipped to explain it. Countries with identical prevalence of mild, moderate, or severe food insecurity may face very different challenges.
The GFSI offers 34 indicators that can be used in parallel to diagnose and alleviate barriers to food affordability, availability, and quality/safety. For example, FIES results indicate that roughly 50% of people living in both Honduras and Ghana are experiencing moderate or severe food insecurity. GFSI results for these countries reveal that many Ghanaians lack food safety for want of potable water and electricity. Honduras, meanwhile, performs better on food safety but significantly worse on measures of governance affecting national food availability.
We also used multivariable linear regression to assess the response variables' relationships to explanatory variables characterizing nations' agricultural land, agricultural production, governance and infrastructure, and household incomes. Model performance and coefficient estimates varied over a range of training datasets produced by bootstrap sampling, showing the sensitivity of these cross-country regressions to the subset of nations used for model fitting. For example, the FI<mod model with the highest median out-of-sample R² (0.77) explained a minimum of 55% of variance in the out-of-sample predictions for one set of countries, but a maximum of 92% for another. Compared to FI<mod, GFSI was much easier to predict with the small set of macro-level explanatory variables used in this study, showing higher and more consistent out-of-sample R² performance across bootstrap samples. In one sense, this is unsurprising: our explanatory variables are similar in scope to macro-level GFSI indicators. However, this behavior is not necessarily obvious given the strong correlation between the two response metrics themselves. Despite this correlation, the prediction error is much higher and more variable for the FIES-based metric.
Across all model runs and bootstrap samples, HFCE was a strong predictor of both national food security metrics. HFCE estimates average consumer spending on durable goods (e.g., vehicles), non-durable goods (e.g., food), housing, and services. Importantly, HFCE estimates account for the value of farmers' consumption of their own produce based on farm gate prices (McCarthy 2013). Among univariate models, HFCE was the best single predictor of GFSI and FI<mod (Fig. 4). This result proved resilient to changes in the countries included in the input dataset and in the transformations applied to explanatory variables before model fitting (Fig. 5). Adding more explanatory variables to the model in iterations of stepwise forward selection only modestly improved the model's out-of-sample R² (Fig. 6). HFCE's regression coefficient was the only one with consistent positive sign and statistical significance across many combinations of response metrics, model specifications, and bootstrap samples (Fig. 7).
The quantity and quality of a nation's agricultural land were not alone predictive of either food security metric. The best-performing models during stepwise forward variable selection did retain either arable land per capita or mean CSI as a predictor. However, the coefficient estimates for these variables were consistently negative across model runs, indicating that when all else is held equal, countries with more or better agricultural land resources tended to also have lower national food security.
We find mixed evidence that per-capita cereal production and per-hectare cereal yield were predictive of national food security. In stepwise forward variable selection, these were the first two variables added to HFCE to improve FI<mod predictions (Fig. 6), though coefficient estimates were smaller in magnitude and less consistently significant than for HFCE (Fig. 7).
The results of this cross-national analysis reinforce previous research supportive of a causal mechanism where an increase in income drives an increase in food security. At the household level, lower incomes are consistently related to worse FIES food insecurity scores (Park et al. 2019; M. D. Smith et al. 2017a; M. D. Smith et al. 2017b). At the national level, economic growth has been identified as a key driver of reductions in child stunting (Headey 2013; Ruel and Alderman 2013; L. C. Smith and Haddad 2015). However, as the variation in our results across bootstraps shows, the strength of the relationship between HFCE and the two national food security metrics varies based on the countries included in the training data. Aggregate economic growth does not always lead to a reduction in poverty, nor do increased incomes eliminate all malnutrition (FAO 2019). Persistent income inequalities also cut off segments of the population from the benefits of aggregate economic growth and the infrastructure and services that come with it (e.g., quality health care, sanitation, and reliable power).
The strong relationship between national food security and HFCE shown here, together with the universally low HFCE of subsistence farmers, underscores the vulnerability of subsistence farmers to food insecurity. Subsistence farming is intrinsically variable both seasonally and interannually, and excess production from years with high yields is often unable to compensate for lean years because of storage losses, market failures, or lack of access to banking (Chambers et al. 1981; Thurow and Kilman 2010). Environmental variability, exacerbated by climate change, poses a heightened risk for these farmers, whose food consumption and local agricultural production are tightly coupled (Davis et al. 2020). Further, many smallholders simply do not own enough land to meet their food availability needs (Frelat et al. 2016) or to raise their consumption above the HFCE threshold for achieving higher food security performance. For instance, in our dataset, no country with HFCE below $5000 per capita per year had a prevalence of food security or mild food insecurity above 80% per FIES surveys.
The literature has identified on- and off-farm options for improving earnings, raising HFCE, and boosting food security. Agricultural development can increase on-farm income and food security when productivity increases are paired with functional crop markets and storage options (Abdoulaye et al. 2018; Burney et al. 2010; McArthur and McCord 2017; Webb and Block 2012). However, without these supporting factors, research finds that marginal increases in smallholder agricultural production or subsidies on agricultural inputs do not always improve food security or income (Harris and Orr 2014; Schreinemachers 2006; Walls et al. 2018). Beyond the farm gate, one study of over 13,000 sub-Saharan African farm households finds that off-farm income is an important income "stabilizer" that improves food availability (Frelat et al. 2016). Bezu et al. (2012) specifically find that Ethiopian households' consumption expenditures grow alongside off-farm income, and a wide body of literature has shown that off-farm jobs are key enablers of poverty reduction in rural areas (Djurfeldt and Djurfeldt 2013; Haggblade et al. 2010; Otsuka and Yamano 2006).
These observations from the literature, combined with our finding that HFCE is a primary driver of cross-national food security, support the proposition that the most effective strategies to improve food security will include measures to increase citizens' capacity for consumption, whether via agricultural earnings or off-farm income.
We examined the patterns in our small dataset using simple linear regression and data science techniques. Our cross-sectional data can only be used to indicate "long run" differences in food security, which are the result of complex relationships between social, economic, and agricultural factors, among others (Headey 2013). Future studies may leverage larger datasets including more countries and explanatory variables, along with econometric techniques (e.g., regressing on panel data, employing instrumental variables, or controlling for country fixed effects) that may allow the analysis to make stronger causal claims (e.g., Headey 2013; L. C. Smith and Haddad 2015; Smith and Haddad 2000). Finally, we show that model performance was significantly affected by aspects of the study that are typically left to the modeler's judgement: the choice of the response variable, the input dataset, and the model formulation (i.e., the explanatory variables selected). Rather than making just one justifiable selection of these parameters, we explicitly showed the sensitivity of our results to different combinations of decisions. Future studies may also consider employing our techniques to show this variation, including the use of multiple response variables, and the use of random sampling to portray regression results as distributions rather than single numbers, which may in reality be subject to wide fluctuation with changes to the input data.

Conclusion
Despite substantial differences in methodologies and theoretical bases, the Global Food Security Index and the Food Insecurity Experience Scale metric (FI<mod) were strongly correlated in our 65-country dataset. In regression models using explanatory variables to predict nations' food security scores, per-capita household final consumption expenditure consistently explained more variance in food security scores than other drivers. The quantity and quality of a nation's agricultural land were not predictive of either food security metric. These findings were independent of modelling assumptions regarding the countries included in the input dataset, the subset of countries used for model training, the transformations applied to the explanatory variables prior to model training, and the variable selection technique used to specify multivariate regressions. We found mixed evidence that per-capita cereal production, per-hectare cereal yield, an aggregate governance metric, logistics performance, and the prevalence of paid employment work were predictive of national food security. The results of this cross-national analysis reinforce previous research supportive of a causal mechanism where, in the absence of exceptional local factors, an increase in income drives an increase in food security. Initiatives that seek to improve national food security by focusing on other drivers without a clear path to improving incomes are less likely to achieve the desired effect. We conclude that the GFSI and FIES are complementary metrics, best used in tandem to monitor and explain national food security performance. Future studies may expand on these findings and techniques using more countries and a wider array of explanatory variables.
Code availability The R code used for this analysis is available upon request from the authors.
Authors' contributions Andrew Allee and Lee R. Lynd developed the study concept. Andrew Allee conducted the analysis and wrote the manuscript. Vikrant Vaze guided and reviewed the statistical analysis. All authors edited the manuscript and approved it for publication.
Data availability The dataset used in this paper is available in Online Resource 1.

Declarations
Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Andrew Allee is a PhD Candidate at the Thayer School of Engineering at Dartmouth College and a Research Fellow with Rocky Mountain Institute. His work aims to accelerate an inclusive clean energy transition in developing economies through data and engineering science. To that end, he has contributed to studies on wide-ranging topics, including food security, land use, cellulosic ethanol production, and rural electrification via minigrids. As a member of RMI's Africa Program, he leads efforts to accelerate Sub-Saharan Africa's growing minigrid sector through data analysis and modeling. Andrew is the recipient of a National Science Foundation Graduate Research Fellowship and holds a B.S. in Biochemistry from the University of Missouri.
[…] A fellow of the National Academy of Sciences, he is the recipient of the Lemelson-MIT Sustainability Prize for inventions and innovations that enhance economic opportunity and community well-being while protecting and restoring the natural environment, the Charles D. Scott award for distinguished contributions to the field of biotechnology for fuels and chemicals, and two-time recipient of a Charles A. Lindbergh grant in recognition of efforts to promote a balance between environmental preservation and technological advancement.
Vikrant Vaze is the Stata Family Career Development Associate Professor of Engineering at Dartmouth College and a Research Affiliate at MIT. He received an MS degree in Transportation, another MS degree in Operations Research, and a PhD degree in Systems, all from MIT. He has a BS degree in Civil Engineering from IIT Mumbai. His research focuses on improving the efficiency of large-scale complex systems by using mathematical and computational modeling techniques including optimization, game theory, machine learning, data mining, and statistical inferencing. He teaches courses on Statistics and Operations Research. His research has been funded by multiple government agencies, including the NSF, DOD, FAA, NIH, and DOT, as well as by several industrial sponsors. He is the recipient of numerous awards and honors from industry, academia and various government agencies.