Integrated Spatial Simulation of Population and Urban Land Use: a Pan-European Model Validation

Spatial models jointly simulating population and land-use change provide support for policy-making, by allowing to explore territorial developments under alternative scenarios and resulting impacts in the environment, economy and society. However, their ability to reproduce observed spatial patterns is rarely evaluated through model validation. This lack of insight prevents researchers and policy-makers of fully grasping the ability of existing models to provide sensible projections of future land use and population density. In this article, we address this gap by performing a model validation of the LUISA Territorial Modelling Platform, a spatial model jointly simulating population and land use at a fine resolution (100 m) in the European Union and United Kingdom. In particular, we compare observed and simulated patterns of population and urban residential land-use change for the period of 1990–2015, and evaluate the model performance according to different degrees of urbanisation. The results show that model performance can vary depending on the context, even when the same data and methods are uniformly applied. The model performed consistently well in urban areas characterized by compact urban growth, but poorly where residential development occurred predominantly in scattered patterns across rural areas. Overall, the model tends to favour the formation of densely populated, highly accessible urban conglomerations, which often do not entirely correspond to the observed patterns. Based on the validation results, we propose directions for further model improvement and development. Model validation should be regarded as a critical step, and an integral part, in the process of developing models for policy support. Supplementary Information The online version contains supplementary material available at 10.1007/s12061-023-09518-x.


Appendix B. Allocation module B.1. Local suitability
The estimation of local population pressure and land-use suitability is based on an iterative procedure that starts with an empirical function explaining the attractiveness of a location for allocating residential population in a given time-step, as follows: , =  0 +  ,−1 +  ,−1 +  ,−1 +  1  1, + ⋯ +    , Eq.B1 where:  , is an empirical function representing the local attractiveness for residential functions, i.e. expected number of people in gridcell i in time-step t, as a function of a set of spatially-explicit explanatory variables;  0 is a constant;  ,−1 is the local population density (i.e.number of people) in gridcell i in the previous time-step t-1  ,−1 is the prior average population density in the immediate neighbourhood of gridcell i, defined through a spatial weight matrix W (Queen's case), in the previous time-step t-1;  ,−1 is the prior relative potential accessibility level in gridcell i in time-step t-1, calculated by dividing potential accessibility levels in gridcell i by the average values in the country of scope (see Jacobs-Crisioni et al., 2016); , ,  are estimators of the effect of prior population, neighbouring population density and accessibility, respectively, on observed local population densities;  1 to   is a set of s parameters estimating the effect of the spatially-explicit explanatory variables  1, to  , on observed local population densities;  1, to  , is a set of s local exogenous drivers assumed to be independent to prior local population densities, including slope, elevation, proximity to towns, proximity to leisure/touristic areas, proximity to roads, proximity to water, potential crop yields and zoning regulations (e.g.Natura 2000, nationally designated protected areas).
For a detailed overview of the considered variables and data sources, see Appendix C.These variables do not fully cover, however, the whole range of factors that presumably govern the local residential attractiveness.A previous study has found substantial temporally-constant heterogeneity in municipal population density changes that appeared to be unaccounted for, when controlling for existing population densities, potential accessibility and explanatory spatial drivers (Jacobs-Crisioni and Koomen, 2017).We attempted to capture this heterogeneity through the estimation of fixed effects of population density change at municipal level for the relatively long period of 1961-2011, based on a separate panel data regression of municipal EU census data (Gløersen and Lüer, 2013).These fixed effects were then downscaled to the 100m resolution using inverse distance weighting interpolation.They were then included as an explanatory variable, in order to control for unobserved heterogeneity in population density among municipalities and capture established historical trends (implicitly assumed to be constant over time).Based on municipal population trends, this variable captures a different scale of change compared with variables that vary at a coarser grain (such as potential accessibility) or at a much finer grain (such as slope).
Eq. B1 was empirically fitted for all modelled countries separately, using Global Human Settlement Layer (GHSL) population data series (Florczyk et al., 2019) as a reference for observed population patterns.Concurrently, the total population to be (re-)allocated within a region was determined for every simulated time-step, as a function of the net population change in the region (positive or negative) and a fixed amount of people that is assumed to move residential location within the region in between time-steps, as follows: where: Kj,t is the total number of people to be (re-)allocated within a region in time-step t ∆ , is the net population change in region j between time-step t and t-1  is a parameter indicating the amount of people changing residential location within region j each year (assumed to be 3.8%, i.e. the average intra-regional mobility according to the 2011 EU census).T is the number of years between time-steps t and t-1  ,−1 is the prior number of people living in region j in time-step t-1 Finally, the total population to be (re-)allocated within a region and local attractiveness for residential functions are combined into a function of local population pressure, representing both the relative attractiveness of a location for allocating the expected changes in regional population and the potential local demand for residential functions: Eq. B3 where:  ,  is the local population pressure in gridcell i located in region j, in time-step t; Kj,t is the total population to be (re-)allocated within region j (see Eq. B2);  , is the local attractiveness for residential functions of gridcell i located in region j, in time-step t (see Eq. B1).
Hence, local population pressure translates broader demographic changes at the regional level into relative changes in the local demand for providing residential functions (positive or negative), either by (de-)densifying existing urban areas (or even abandoning them), or by converting non-urban areas.This variable is then subsequently included as an endogenous driver in the estimation of urban land-use suitability.Country-specific land-use suitability functions were estimated separately for all simulated land-use types, through statistical analysis of observed land-use patterns.Since land-use change is conceptualised as a discrete phenomenon, we used the binomial logistic regression (BLR) model, a method commonly applied in land-change analysis to explain the occurrence of individual land-use types (Verburg et al., 2004).The BLR model transforms the dependent variable into a logit variable by estimating the odds of observing that land-use in a certain gridcell according to a set of spatially-explicit explanatory factors: ,, =  (+ , + , ) /(1 +  (+ , + , ) ) Eq. B4 where:  ,, is the local land-use suitability, expressed as the probability of land-use type k occurring in gridcell i in year t, given a set of spatially-explicit explanatory factors assumed to determine the suitability of a location to support that land-use;  is a constant;  is a vector of parameters estimating the effect of explanatory variables (i.e.spatially-explicit drivers) X on the local land-use suitability;  , is the values of a set of spatially-explicit drivers in gridcell i, including the same set of exogenous drivers used in Eq.B1 and dynamic endogenous factors such as population pressure (Eq.B3) and accessibility (see Appendix C),;  , is the values of a set of local drivers X that are spatially-lagged in the immediate neighbourhood of gridcell i in year t, according to spatial weight matrix W (Queen's case);  is a vector of parameters estimating the effects of spatially-lagged local drivers WX on the local land-use suitability.
We used the same set of explanatory factors in the estimation of land-use suitability for urban residential land use and the other simulated land-use types.Coefficients were estimated through maximum likelihood estimation, indicating the direction and intensity of each spatially-explicit driver in explaining land-use suitability.
Endogeneity issues invariably arise when explaining existing land-use patterns by variables that, themselves, depend on prior or current land use, likely causing biases in the estimated coefficients.For urban residential land use, this problem can be avoided by utilizing time series of built-up densities between 1990 and 2015 registered in the GHSL data series (Florczyk et al., 2019), as they entail fine resolution datasets that are consistently produced with identical remote sensing technology and interpretation models.The suitability functions for urban residential land use were thus fitted using land cover changes between 1990 and 2015 as observed in a subset of GHSL data series.Urban residential gridcells were assigned the value 1 in case they were not urban in 1990 and became urban by 2015, and 0 otherwise.For the other simulated land-use types, consistent time series are unavailable, or offering insufficient observations, for reliably fitting land-use suitability functions based on land-use changes (Lavalle et al., 2016).Hence, the suitability functions for land-use types other than urban residential (i.e. industry and commerce, agriculture, forestry) were fitted using CLC 2012 (Büttner et al., 2014) to represent the presence of land cover.
Local land-use utility is hereby computed as the expected net present value (NPV) of a given land use at a particular location, while taking into account its specific land-use suitability, expected revenues and costs.NPV is a standard method used in capital budgeting to appraise long-term investments, by measuring discounted time series of expected cash inflows and outflows, while taking into account the time value of money.To be regarded as economically attractive, a land-use investment should have a strictly non-negative NPV, after accounting for the expected annual costs and gross revenues associated with that particular land use within a given time-horizon, and their respective initial conversion costs.Land-use utility is thus computed as follows: , =  , = − , + ∑  ,, − , (1+  )

𝑇 𝑘 𝑡=1
Eq. B5 where:  , is the land-use utility of a land-use type k in gridcell i  , is the net present value of converting/maintaining the land in gridcell i to land-use type k  , are the initial conversion costs in gridcell i that are incurred through the conversion of land to that land-use type k (in €/ha).Initial costs include the required investments on, for example, new buildings, facilities and machinery, and eventual land clearing costs, as the existing physical makeup of a location may call for specific demolition operations or vegetation clearing.In addition, empirical studies demonstrated that land systems appear to show inertia to land-use change partly as a result of the magnitude of previous investments on land and aversion to change activities under uncertainty (e.g.Regan et al., 2015;Schatzki, 2003).Hence, in the quantification of the initial conversion costs, we also account for eventual sunk costs (i.e. the investments on land and capital assets that may have already been made in the past for converting land into the existing land use) and opportunity costs (i.e. the current net benefits that would be foregone in case land is converted).Initial costs thus depend to a large extent on the prior existing land use in a particular location.They are considered to be 0 if the gridcell is already under the land use for which NPV is being estimated.Land-use types requiring large investments or yielding high net revenues are typically given larger sunk costs and opportunity costs, respectively.This implies that existing land use may not be converted easily, even when an alternative land use would potentially yield higher net revenues. ,, is the annual gross revenues that can be obtained in gridcell i in year t from land-use type k (in €/ha/year), including revenues obtained from e.g.rental income, selling crops, subsidies, monetary valuation of ecosystem services, etc.;  , is the annual costs in year t associated with that land-use type k (in €/ha/year), e.g.maintenance costs, agricultural field operations and inputs, taxes, depreciation, etc.;   is the investment time-horizon of land-use type k (in years);   is the discount rate associated with land-use type k.
The annual gross revenues are measured through a proxy calculated by rescaling assumed maximum annual revenues of a given land-use type by the respective local land-use suitability function, as follows: ,, =  ,, * max  , Eq. B6 where:  ,, is the annual gross revenues that can be obtained from land-use type k in gridcell i in year t (in €/ha/year);  ,, is the local land-use suitability of land-use type k in gridcell i in year t, as estimated in Eq.B4 above; max  , are the assumed maximum annual revenues (in €/ha), i.e. the gross revenues obtained from land-use type k when the local land-use suitability is optimal (i.e. ,, = 100%).
Eq. B6 thus aims to capture that land-use revenues are highly location-dependent, for example, due to differences in local land-use suitability (e.g.house prices are often determined by the existence of local amenities, crop yields depend on a combination of local biophysical characteristics) and location-specific costs not captured by the base annual costs  , (e.g. transportation costs of commodities).The calculated NPVs are not presumed to be exact estimations of land-use utility, but rather to provide a local measure of the relative competitiveness among different land use alternatives in each location.Initial costs, maximum revenues, annual costs, discount rates and timehorizon were specified for the different simulated land-uses through a combination of literature review, available databases, official statistics and modelling results from specialised models (see Appendix D for a specification of these variables for the Netherlands, as an illustrative example).

B.3. Simulation of land-use and population patterns
The simulated land-use types are dynamically allocated so that the overall utility derived from land use within a region is maximised in every time-step, taking into account previous land-use and population patterns, local utility of alternative land-use activities and their expected land demand.For that purpose, we implement the discrete version of the spatial allocation algorithm proposed in Hilferink and Rietveld (1999;see Koomen et al., 2011 for a detailed description of the continuous and discrete models), which mimics iterative bidding processes in the land market.The algorithm is doubly constrained by, on the one hand, the regional land demand and, on the other hand, the regional land supply (i.e. the amount of land available to accommodate these activities), as follows: , =   .  . . , , for all simulated land-use types  and gridcells  Eq.B7 with   and   being specified so that the constraints below (i.e.Eqs.B8, B9 and B10) are satisfied: where:  , is the area of land-use type  allocated in gridcell , which according to the constraint specified in Eq.B8 can only be 0 (i.e.no land is allocated to land-use type k) or   (i.e. the whole gridcell size area, in this case 1 hectare, is allocated to land-use type k);  , is the land-use utility of land-use type k in gridcell i (i.e. the NPV of land-use type k in that location, as specified in Eq.B5);  is a parameter to adjust the model sensitivity (typically, 1 as default value);   is the demand balancing factor, which ensures that the total amount of allocated land for the simulated landuse type k equals its respective regional land demand  , ;   is the supply balancing factor, which ensures that the total amount of allocated land in gridcell i does not exceed its size   ;   is the area of gridcell  (i.e. 1 hectare);  is the number of simulated land-use types;  , is the total aggregated land demand for the simulated land-use type  in region j, as specified by the regional demand module (in hectares).
Solving the model entails finding the sets of values for all   and   that are equal to: Eq. B12 The appropriate values for   and   are found through an iterative approach mimicking a bidding process between competing land uses, in which each land-use attempts to get its total demand satisfied, but may be outbidded in a given location by another land use that derives higher utility from land.For a detailed description of the iterative bidding process, we refer to Hilferink and Rietveld (1999).
After the spatial allocation of land use (including urban residential land use), population is then redistributed over the region.The method to do so is similar to the way population pressure is computed (Eq.B3).However, a few additional elements are introduced to ensure that the observed inertia of the population patterns is maintained.Most importantly, the empirical function explaining local attractiveness for allocating population (i.e.Eq.B1) is refitted on specific subsets of the calibration data that, through their selection mechanism, include the effects of allocated urban land use, as follows: ′ , =  ,  +  ,   +  ,   Eq.B13 ′ , is an empirical function representing the local attractiveness for allocating population in gridcell i in timestep t, given a set of spatially-explicit explanatory factors, and according to subsets of the calibration data;  ,  is an empirical function similar to Eq. B1, refitted on a set of the calibration data where land use is classified as urban residential and travel time to the closest major urban cluster is up to 45 minutes.Major urban clusters are defined consistently across Europe as patches of contiguous urban residential land use that together have at least 30,000 inhabitants, and a minimum population of 5,000;  ,   is an empirical function similar to Eq. B1, refitted on a set of the calibration data where land use is classified as urban residential and travel time to the closest major urban cluster is longer than 45 minutes;  ,   is an empirical function similar to Eq. B1, refitted on a set of the calibration data where land use is not predominantly urban residential, but people were living there in the previous time-step t-1.
The three subsets of  , are mutually exclusive per gridcell, so that the result is always zero for two of the subset functions.Population changes are thus only possible if a grid cell is urban residential, has just become urban residential, or is not urban residential but already had population at the reference starting point.The inclusion of different subset functions is meant to account for distinct location preferences in different segments of the real estate market.Population distributions are then iteratively allocated, as a function of previous local population density and the relative attractiveness of a location for allocating the expected changes in regional population (for a detailed description of the allocation algorithm, see Jacobs-Crisioni et al., 2017): 1 yield maps from the BIOMA project (Donatelli et al., 2015); 2 accessibility and municipality fixed effect estimates, which combine CORINE derivates with TRANS-TOOLS road networks using methods from Jacobs-Crisioni and Koomen (2017); 3 based on tourism-specific data obtained from the online TripAdvisor database; 4 based on roads of at least class 3 according to the TomTom roads network database. 5 2020); annual revenues of (semi-)natural vegetation were assumed to be the average of the monetary valuation of the ecosystem services provided by (semi-)natural grassland, heathland and inland dunes { , |0,   }

Table C . 1 :
, is the number of people allocated in gridcell i in time-step t  ,−1 is the prior number of people in gridcell i in time-step t-1 Kj,t is the total population to be (re-)allocated within a region (see Eq. B2)  is the parameter indicating the amount of people changing residential location within region j each year T is the number of years between time-steps t and t-1 ′ , is the local attractiveness to allocate population in gridcell i located in region j, as explained by an empirical function given a set of spatially-explicit explanatory factors, and according to subsets of calibration data (see Eq. B13) Variables used as explanatory factors in Eqs.B1, B4 and B13 COPERNICUS EU digital elevation model. 6inventories of Natura 2000 and nationally designated areas, as provided by EEA.