Resource frontiers and agglomeration economies: The varied logics of transnational land-based investing in Southern and Eastern Africa

Actor-level data on large-scale commercial agriculture in Sub-Saharan Africa are scarce. The location choices of transnational investors in African land have, therefore, been subject to conjecture. Addressing this gap, we reconstructed the underlying logics of investment location choices in a Bayesian network, using firm- and actor-level interview and spatial data from 37 transnational agriculture and forestry investments across 121 sites in Mozambique, Zambia, Tanzania, and Ethiopia. We distinguish four types of investment location along gradients of resource frontiers and agglomeration economies to derive the preferred locations of investors with varied skillsets and market reach (i.e., track record). In contrast to newcomers, investors with extensive track records are more likely to expand the land use frontier, and they are also likely to survive the high transaction costs of the pre-commercial frontier. We highlight key comparative advantages of Southern and Eastern African frontiers and map the most probable categories of investment locations. Supplementary information: The online version contains supplementary material available at 10.1007/s13280-021-01682-z.

The key outcome variables, i.e., the resource frontier and agglomeration economies indices, were derived from four spatial variables. For each investment site (n = 121), we derived a resource frontier index from spatial data on population density and the area of unconverted land, and an agglomeration economies index from spatial data on market activity and field size (Table S3 lists data sources). We selected datasets that characterized the investment conditions as closely as possible to the period when the investment decisions were made. Because Bayesian networks (BNs) generally require continuous data to be discretized, we used a mix of descriptive statistics and published standards to decide on the number of bins and the bin widths for the spatial variables (Table S2 lists the bins and bin widths for the spatial data). To derive the indices, we elicited a prior conditional probability for each combination of parent states of the two indices, based on the definitions of these indices (see Model under Methods in the main text) and our expertise in the domain (Cain, 2001). As an example from the agglomeration economies index, consider an investment location with a large field size and low market activity (i.e., the parent states). For this parent state combination, the low, medium, and high child states of the agglomeration economies index were assigned probabilities of 0.6, 0.4, and 0, respectively. These elicited probabilities indicate that, at a location with a large field size and low market activity, agglomeration economies are most likely to be low (60% chance), less likely to be medium (40% chance), and not expected to be high. Table S4 presents the elicited conditional probabilities of the resource frontier index, and Table S5 those of the agglomeration economies index.
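The elicitation step can be illustrated with a minimal sketch of one conditional probability table (CPT) and how it turns discretized spatial evidence into a distribution over the index states. Only the ("large", "low") row is taken from the text; the second CPT row, the bin edges, the state labels "small"/"medium" and "medium"/"high" for the parents, and the helper names `discretize` and `index_distribution` are hypothetical placeholders (the actual bins are in Table S2 and the elicited values in Table S5).

```python
import numpy as np

# Child states of the agglomeration economies index.
CHILD_STATES = ("low", "medium", "high")

# P(agglomeration index | field size, market activity).
# The ("large", "low") row is the elicited example from the text;
# the ("large", "medium") row is a HYPOTHETICAL placeholder.
CPT = {
    ("large", "low"): np.array([0.6, 0.4, 0.0]),
    ("large", "medium"): np.array([0.3, 0.5, 0.2]),  # hypothetical
}


def discretize(value, edges, labels):
    """Map a continuous value to a bin label (edges are hypothetical, cf. Table S2)."""
    # np.digitize returns i such that edges[i-1] <= value < edges[i].
    return labels[int(np.digitize(value, edges))]


def index_distribution(parent_probs):
    """Marginalize the CPT over a distribution of parent-state combinations.

    parent_probs maps (field_size, market_activity) -> probability (sums to 1).
    Hard evidence is a single combination with probability 1.
    """
    total = np.zeros(len(CHILD_STATES))
    for parents, p in parent_probs.items():
        total += p * CPT[parents]
    return total


# Usage: a site with large fields and low market activity (hypothetical values).
fs = discretize(80.0, [10.0, 50.0], ("small", "medium", "large"))
ma = discretize(0.1, [0.3, 0.7], ("low", "medium", "high"))
hard = index_distribution({(fs, ma): 1.0})
# Soft evidence, e.g. when measurement uncertainty splits market activity across bins.
soft = index_distribution({("large", "low"): 0.5, ("large", "medium"): 0.5})
```

With hard evidence the result is simply the elicited row; with soft evidence it is the probability-weighted mixture of rows, which is how uncertainty in the spatial inputs propagates to the index.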
The conditional probabilities were revised and deliberated until all the authors agreed on them. The outcome variable indices were then parameterized using the spatial data. The probability distributions of the indices also account for uncertainty due to measurement error. For parameter estimation, we used the expectation-maximization (EM) algorithm built into Netica™ (Lauritzen, 1995).
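Netica's internal EM implementation is not described here, so the following is only a generic textbook sketch of EM for a discrete BN, reduced to a hypothetical two-node network A → B in which some values of A are missing. Pseudo-counts stand in for elicited prior probabilities; the function name `em_two_node` and all inputs are illustrative assumptions.

```python
import numpy as np


def em_two_node(records, prior_a, prior_b_given_a, n_iter=50):
    """EM for a two-node discrete network A -> B with some A values missing.

    records: list of (a, b) integer state pairs; a may be None (unobserved).
    prior_a: positive pseudo-counts for A, shape (nA,).
    prior_b_given_a: positive pseudo-counts for B given A, shape (nA, nB).
    Returns (p_a, p_b_given_a).
    """
    prior_a = np.asarray(prior_a, dtype=float)
    prior_ba = np.asarray(prior_b_given_a, dtype=float)
    # Initialize parameters from the prior counts (e.g. elicited probabilities).
    p_a = prior_a / prior_a.sum()
    p_ba = prior_ba / prior_ba.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        counts_a = prior_a.copy()
        counts_ba = prior_ba.copy()
        for a, b in records:
            if a is None:
                # E-step: posterior over the missing parent given the observed child.
                post = p_a * p_ba[:, b]
                post = post / post.sum()
            else:
                post = np.zeros_like(p_a)
                post[a] = 1.0
            # Accumulate (fractional) expected counts.
            counts_a += post
            counts_ba[:, b] += post
        # M-step: normalize pseudo-counts plus expected counts.
        p_a = counts_a / counts_a.sum()
        p_ba = counts_ba / counts_ba.sum(axis=1, keepdims=True)
    return p_a, p_ba


# Usage with fully observed and partially observed (hypothetical) records.
full_a, full_ba = em_two_node([(0, 0), (0, 0), (0, 1), (1, 1)], [1, 1], [[1, 1], [1, 1]])
miss_a, miss_ba = em_two_node(
    [(0, 0), (0, 0), (1, 1), (None, 0), (None, 1)], [1, 1], [[1, 1], [1, 1]]
)
```

When every record is complete, EM reduces to counting plus pseudo-counts; missing values are filled in with fractional counts from the current posterior and the two steps are repeated.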
To facilitate interpretation, we derived an output typology by cross-tabulating the resource frontier and agglomeration economies indices in a matrix. The typology was calibrated using area-weighted probability scores along the low, medium, and high gradients of the resource frontier and agglomeration economies indices (Table S6 and Fig. 1c in the main text).
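The text does not spell out exactly how the area-weighted scores populate the matrix, so the following is one plausible reading, not the authors' procedure. It assumes that, at each site, the two indices are conditionally independent given the site's spatial evidence (they have disjoint spatial parents), so the per-site joint is an outer product; site posteriors are then averaged with site areas as weights. Function names and inputs are hypothetical.

```python
import numpy as np

STATES = ("low", "medium", "high")


def area_weighted_matrix(frontier_probs, agglom_probs, areas):
    """Area-weighted 3x3 matrix of joint index-state probabilities across sites.

    frontier_probs, agglom_probs: shape (n_sites, 3), per-site probabilities
    over (low, medium, high). areas: site areas, shape (n_sites,).
    ASSUMPTION: within a site, the two indices are conditionally independent
    given the spatial evidence, so the per-site joint is an outer product.
    """
    f = np.asarray(frontier_probs, dtype=float)
    g = np.asarray(agglom_probs, dtype=float)
    w = np.asarray(areas, dtype=float)
    w = w / w.sum()
    # Per-site joint: rows = frontier state, columns = agglomeration state.
    joint = f[:, :, None] * g[:, None, :]
    # Weighted average of the per-site joints.
    return np.einsum("s,sij->ij", w, joint)


def most_probable_cell(matrix):
    """Return the (frontier, agglomeration) state pair with the highest probability."""
    i, j = np.unravel_index(np.argmax(matrix), matrix.shape)
    return STATES[i], STATES[j]


# Usage with two hypothetical sites of 3 and 1 area units.
m = area_weighted_matrix([[1, 0, 0], [0, 0, 1]], [[0, 0, 1], [1, 0, 0]], [3, 1])
cell = most_probable_cell(m)
```

Weighting by area lets large sites count proportionally more, which matters when a few concessions cover most of the land.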

Model validation:
We arrived at the final BN through an iterative and interactive process of revision, verification, and validation (Kjaerulff & Madsen, 2013). At each revision, we verified the directions of influence and tested the plausibility of the results. A series of final validation exercises was carried out with three investors selected from the sample and a group of researchers working in the region. In these exercises, we asked the participants to explore the directed acyclic graph (DAG) visually and to assess the conceptual framework and the plausibility of the directional influences, using different investment conditions as examples. We also assessed the plausibility of the overall results against the participants' expertise in the sector or the region.
To assess the influence of each selected variable on the other variables quantitatively, we conducted a series of sensitivity tests. A sensitivity test estimates the mutual dependence between two variables by quantifying the amount of information that observing one variable yields about the other. It is an entropy reduction measure: it quantifies how much knowing one variable reduces the uncertainty about the other (Pearl, 1988).

Table S1. Economic, agriculture, and social development context in the sample countries between 2008 and 2011 (Source: FAOSTAT, 2019; Hansen et al., 2013; Jayne et al., 2010; World Bank, 2014, 2020)

Aggregate effect of product market reach, skillset, and regional experience
    States: none: all the child/observable variables in the none state; limited: one child variable is in the none state or both child variables are in the limited state; extensive: at least one child variable is in the extensive state and the others are in the limited or extensive state
a. Product market reach (L): Degree of local and export market reach
    States: none: no market reach; limited: only local market reach; extensive: at least some export market reach
    i. Export market reach (L): Export market for the invested crop is already established
        States: no: no established export product market; yes: export product market already established
    ii. Local market reach (L): Local market for the invested crop is already established
        States: no: no established local product market; yes: local product market already established
b. Skillset (L): Aggregate effect of farming or forestry experience in the same or similar types of production and other farming or forestry experience
    States: none: all the child variables in the no state; limited: all child variable combinations except those defining the none and extensive states; extensive: the child variables farming or forestry experience in the same or similar types of production AND regional experience are in the yes state
    i. Farming or forestry experience in similar types of production (L): Previous farming or forestry experience in the same or similar types of production as the target production of the investment under study
        States: no: no farming or forestry experience in the same or similar types of production; yes: has farming or forestry experience in the same or similar types of production
    ii. Farming or forestry experience in other types of production (L): Previous farming or forestry experience in types of production other than the target production of the investment under study
        States: no: no farming or forestry experience; yes: has farming or forestry experience in other types of production
    iii. Regional experience (L): Previous experience in farming, forestry, trading, or any other commercial activity in Southern and Eastern Africa
        States: no: no commercial experience in the region; yes: previous commercial experience in the region in farming, forestry, trading, or any other commercial activity
2. Types of production: Type of production
    States: high-value food crops: high-value deciduous fruits and nuts; forestry: logging and plantation forestry; other: other agriculture excluding high-value food crops
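The entropy-reduction measure used in the sensitivity tests described above can be sketched for two discrete variables from their joint distribution. This is the standard mutual information, I(Q; F) = H(Q) − H(Q | F), in bits; Netica's own implementation is not reproduced, and the function name and example joint tables are hypothetical.

```python
import numpy as np


def entropy_reduction(joint):
    """Entropy reduction (mutual information, in bits) between two discrete variables.

    joint: 2-D array of joint probabilities P(Q=q, F=f) summing to 1, where Q is
    the query variable (e.g. an index) and F the finding (e.g. an investor trait).
    Returns H(Q) - H(Q | F), which equals I(Q; F).
    """
    p = np.asarray(joint, dtype=float)
    pq = p.sum(axis=1, keepdims=True)  # marginal of Q, shape (nQ, 1)
    pf = p.sum(axis=0, keepdims=True)  # marginal of F, shape (1, nF)
    indep = pq @ pf                    # product of marginals, shape (nQ, nF)
    mask = p > 0                       # 0 * log 0 is taken as 0
    return float(np.sum(p[mask] * np.log2(p[mask] / indep[mask])))


# Usage: independent variables carry no information; identical binary ones carry 1 bit.
independent = entropy_reduction([[0.25, 0.25], [0.25, 0.25]])
identical = entropy_reduction([[0.5, 0.0], [0.0, 0.5]])
```

Ranking candidate findings by this value shows which observed variables most reduce uncertainty about a target node.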