A sequential multi-level framework to improve habitat suitability modelling

Habitat suitability models (HSM) can improve our understanding of a species’ ecology and are valuable tools for informing landscape-scale decisions. We can increase HSM predictive accuracy and derive more realistic conclusions by taking a multi-scale approach. However, this process is often statistically complex and computationally intensive. We provide an easily implemented, flexible framework for sequential multi-level, multi-scale HSM and compare it to two other commonly-applied approaches: single-level, multi-scale HSM and their post-hoc combinations. Our framework implements scale optimisation and model tuning at each level in turn, from the highest (population range) to the lowest (e.g. foraging habitat) level, whilst incorporating output habitat suitability indices from a higher level as a predictor. We used MaxEnt and a species of conservation concern in Britain, the lesser horseshoe bat (Rhinolophus hipposideros), to demonstrate and compare multi-scale approaches. Integrating models across levels, either by applying our framework, or by multiplying single-level model predictions, improved predictive performance over single-level models. Moreover, differences in the importance and direction of the species-environment associations highlight the potential for false inferences from single-level models or their post-hoc combinations. The single-level summer range model incorrectly identified a positive influence of heathland cover, whereas sequential multi-level models made biological sense and underlined this species’ requirement for extensive broadleaf woodland cover, hedgerows and access to buildings for roosting in rural areas. We conclude that multi-level HSM appear superior to single-level, multi-scale approaches; models should be sequentially integrated across levels if information on species-environment relationships is of importance.


Introduction
The ongoing rapid decline in biodiversity has been called the sixth mass extinction (Barnosky et al. 2011) and the need to reverse this trend has never been greater. To be effective, efforts to protect and enhance key habitats whilst mitigating against the effects of environmental change must be informed by the best available information on species ecology and distributions (Loiselle et al. 2003). Habitat Suitability Models (HSM) can inform environmental policy, spatial planning and conservation practice by filling in gaps around patchy, biased occurrence data with landscape-scale information on predicted habitat suitability and underlying environmental correlates (Guisan et al. 2013). However, poor models lead to unreliable inferences; if low model performance or high uncertainty go undetected, derived decisions can be costly, ineffective or even damaging to conservation efforts (Loiselle et al. 2003). More transparent, flexible frameworks that facilitate the implementation of best practice HSM techniques are therefore required, particularly for taxonomic groups with limited species records.
A major parameter that can influence the predictive accuracy and usefulness of an HSM is the spatial scale at which predictors are considered (Guisan and Thuiller 2005;De Knegt et al. 2010). Species respond to their environment at different scales according to their ecology and the spatial arrangement of the resources and/or conditions they require (Wiens 1989;Gehring and Swihart 2003;Mayor et al. 2009;Austin and Van Niel 2011). Single-scale modelling using a universal predictor grain and extent is widely adopted (Vicente et al. 2014), despite the fact that multi-scale HSMs that integrate predictors measured at their 'scale(s) of effect' (Holland et al. 2004) provide more accurate predictions, give deeper ecological inference and avoid spurious conclusions caused by scale mismatches (Poizat and Pont 1996;De Knegt et al. 2010;Vicente et al. 2011;Bellamy et al. 2013;Timm et al. 2016). However, multi-scale modelling is a complex and computationally intensive process (Scholes et al. 2013). Combining predictors measured over a range of scales increases the likelihood of multicollinearity (Lipsey et al. 2017), which breaks statistical assumptions and confounds model inference (Dormann et al. 2013;Bradter et al. 2013;Lipsey et al. 2017). The need for accessible technical solutions to overcome these statistical challenges has recently been called for (McGarigal et al. , p. 1171. To aid multi-scale HSM development and interpretation, geographic or behavioural levels (hereafter referred to as 'levels') can be ascribed to conceptualise the hierarchical structuring of the processes driving habitat selection (Johnson 1980;Wiens 1989;Lindenmayer 2000;Mayor et al. 2009;McGarigal et al. 2016). For example, the areas where an organism forages are nested within its home range, which also encompasses areas providing shelter, mating opportunities and other resources. This home range is nested within the population's geographic range. Distributions at higher levels are shaped by factors that vary slowly across space, such as the influence of climate on population ranges; more local scale, patchy predictors influencing a species' mobility, resource distribution or biotic interactions are important at lower levels (Pearson and Dawson 2003;Pearson et al. 2004;Vicente et al. 2014;Razgour et al. 2014). HSMs that integrate drivers at their scale of effect across multiple levels, and that encompass the full range of conditions experienced across the population range, provide a more complete characterisation of a species' niche and prevent truncation of modelled speciesenvironment relationships (Barbet-Massin et al. 2010;DeCesare et al. 2012;Fournier et al. 2017;Heisler et al. 2017;Bauder et al. 2018;Mateo et al. 2019b).
Our sequential multi-level framework begins with the user prescribing a series of nested geographic or behavioural levels based on the ecology of the focal taxonomic group; multi-scale models are built at each of these levels in hierarchical (top-down) order and the predictive suitability indices generated at each level are fed into the subsequent level's model(s) as a predictor. This approach helps the user to understand the influence of each candidate predictor and to select their best performing scale, whilst simultaneously enabling context dependency by providing regional, higher level habitat suitability information and minimising multicollinearity between the multi-scale predictors by splitting them across levels. Using a test species and study area, we quantify the accuracy of these sequential multi-level models in comparison to two other commonly-applied multi-scale approaches: (i) Single-level, multi-scale-no integration of models across levels; an approach taken by 18% of multi-scale habitat selection papers reviewed by McGarigal et al. (2016). (ii) Post-hoc multi-level-combinations of singlelevel, multi-scale model outputs (e.g. Johnson et al. 2004;Hattab et al. 2014;Fournier et al. 2017;Zeller et al. 2017;Bauder et al. 2018).
We hypothesised that, by integrating information on drivers operating across all levels, the multi-level approaches would provide more accurate predictions compared to single-level models (Pearson et al. 2004;Mateo et al. 2019b). Moreover, by allowing a species' response to features of the lower level, local environment to vary according to wider, regional conditions set at higher levels, we expected that our sequential multi-level framework would provide more realistic and nuanced information on species-environment relationships and higher predictive accuracy compared to the post-hoc combinations of single-level models. Enabling these 'cross-scale' interactions should help to account for local adaptation and improve our ability to provide effective recommendations Whittingham et al. 2007;Lindenmayer et al. 2008;Scholes et al. 2013;Oliver and Morecroft 2014;Spake et al. 2019). Our methods build on the sequential multi-level models trialled by Pearson et al. (2004) and Mateo et al. (2019b) by incorporating multiple scales at each level; additionally we outline steps for predictor and scale selection at each level, as recommended by McGarigal et al. (2016).
The framework we set out is flexible and can be applied to a range of ecological responses using various types of statistical models. Our example uses presence-only species records collated from multiple sources via local environmental record centres and online databases, since such data are becoming increasingly accessible and used for HSM (Graham et al. 2004). Major challenges with these data include sampling bias, poor metadata and low locational precision (Guisan et al. 2006). We apply our framework to a UK bat species that is of conservation concern across Europe, Rhinolophus hipposideros (lesser horseshoe bat). This woodland-adapted species underwent a rapid decline across its range from the 1950s, becoming locally extinct in many areas (Bontadina et al. 2002(Bontadina et al. , 2008. Information on the finegrained habitat requirements of all woodland-specialist bats is scarce, largely because of difficulties in detecting and surveying for these typically rare animals in structurally complex woodland environments, impeding effective management.

Materials and methods
The model framework Our sequential multi-level HSM framework is straightforward and transferable. In short, following Johnson's hierarchy (1980), multiple geographically nested model levels are defined for the focal species based on the species' ecology, modelling objectives and species data availability. The top level should comprise the entire population range, wherever possible. The response metrics (e.g. species presence or abundance) are then sequentially modelled at each prescribed level in hierarchical (top-down) order, whilst incorporating predictions from a preceding level as a predictor.
Our framework has five steps ( Fig. 1); details of its application to modelling R. hipposideros habitat suitability in Britain are provided below. We used MaxEnt (Phillips et al. 2006), a popular presence-only statistical HSM algorithm that tends to perform well compared to alternatives (e.g. Guisan et al. 2007). Gridded environmental predictors were created in ArcGIS (v. 10.2,www.esri.com). All other analyses were carried out using R (v. 3.4.2; R Core Team 2017) in R Studio (v.1.1.383; RStudio Team 2016). Data output and R code for key steps are provided in an online repository: https://bitbucket.org/chloebellamy FR/sequential-multi-level-hsm-framework.
Step 1: selecting levels Three geographically nested levels were defined for R. hipposideros (Supplementary Information (SI), Table SI1.1): British population range (level 1), summer range (level 2) and local habitats (level 3, at which separate models were produced for roosting and foraging behaviours). Population and summer ranges were modelled at the national extent in line with species data availability. A smaller model extent was defined for the local habitat level, based on the current R. hipposideros population range in Britain and wider research objectives (Fig. 2). Roosting and foraging habitats were modelled separately because these are behavioural and spatial subsets of the summer range for which we could assign many high resolution (B 100 m) species records to using the metadata provided (SI 1.1); bats require different habitat features and environmental conditions for foraging and roosting and so differentiating drivers and scales of effects is critical for targeting management to meet legal obligations for protecting bat roosts and foraging habitats (Mitchell-Jones 2004;Bellamy and Altringham 2015).
Step 2: selecting predictors and scales Candidate predictors were chosen to reflect environmental features and conditions likely to impact temperate woodland bat species distributions at each level (Table 1; Rebelo and Jones 2010;Boughey et al. 2011;Bellamy et al. 2013;Fuller et al. 2018). There is a focus on climate at the population range level; at lower levels, predictors provide mapped data on landcover, landscape structure and land management.  The complexity of a patch shape is calculated using: 4pA p 2 where A is area and p is perimeter (Osserman 1978). Values closer to one indicate a less complex, more compact patch shape. Perfectly circle shapes will have a value of 1 b Distance variables were always measured at the resolution of the gridded predictors, with no maximum search distance prescribed. Distance to woodland edge was inverted within a woodland polygon to distinguish between the woodland interior and exterior At each level, two candidate spatial scales were selected to represent the distances or areas typically covered by R. hipposideros according to published radiotracking data (Table SI1.1), as species' mobility has been found to be a good indicator of likely scales of effect (Jackson and Fahrig 2012). In the summer, this species typically focusses feeding behaviour 500-600 m around the roost (often comprising of building roof spaces; local habitat level), whilst generally remaining a mean distance of 1-3 km, and a maximum distance of around 5 km, from the roost (summer range level; Table SI1.1). In the autumn, R. hipposideros disperses to caves for hibernation. Although there is a paucity of published data on this dispersal behaviour, it is thought that these bats typically travel around 5 km, but longer distances have been recorded (H. Schofield, personal communication).
The pixel size of the input data represented the smaller scales of movement relevant to the population range (5 km), summer range (1 km) and local (roost or foraging) habitat (100 m) levels. To measure each predictor at the level's larger scale, a moving window analysis measured focal statistics around each raster pixel, using a circular window with a diameter set by the larger scale (10 km, 3 km and 500 m, respectively; Bellamy and Altringham 2015; Table 1). For example, at the population level the percentage cover of urban land use was measured within each 5 km pixel (smaller scale) and mean urban cover was also measured within a 10 km radius of each 5 km pixel (larger scale). This resulted in a wide scale range (100-10 km), that encompasses 30-50% of the maximum distances recorded at each level, whilst going well above average dispersal distances, following recommendations by Fahrig (2012, 2015).
Of the two scales, the optimal scale was then identified for each predictor at each level by creating univariate models (single predictors measured at a single scale) using default settings with threshold features disabled ('dismo' package; Hijmans et al. 2017). The scale which achieved the highest measure of training gain, which can be interpreted as the likelihood of the presence points (Merow et al. 2013), was selected.
Step 3: processing species data and pseudoabsences Rhinolophus hipposideros presence records were collated from 2005 to 2016 from individual recorders, the Bat Conservation Trust (BCT) database and the National Biodiversity Network Gateway (NBN Gateway; https://www.nbn.org.uk/) (SI3). Datasets were cross-checked for duplication. Records were categorised by date, locational precision and bat activity (foraging or commuting; roosting; mating; hibernating or unknown) using metadata (e.g. survey type) and were filtered according to their relevance and reliability at each level (SI 1.1). Records were only retained at each level if their locational precision was equal to or finer than the level's pixel size. The population range level model incorporated records of bats throughout their annual cycle, whereas the summer range and local level model only included records made during May to August inclusively. A single record per pixel was retained for modelling.
Many of the collated records were collected on an ad hoc basis or originated from field surveys designed using a variety of regional-scale sampling strategies. The models were therefore calibrated to account for the likely sampling bias of recorders towards accessible and heavily sampled areas (Graham et al. 2004). A 'target group' approach was used whereby 10,000 pseudoabsence data were selected according to estimated survey effort for all bat species in Britain over the same time period, as these records are presumed to have been collected using similarly biased strategies (Phillips et al. 2009) (see SI 1.1 for details; Fig. SI1.1).
Step 4: model nesting and tuning Once the environmental and species data were processed, the habitat suitability models were run for each level in hierarchical order, starting with the population range level. The gridded layer of logistic habitat suitability indices (HSI) generated by each model was disaggregated to the pixel size assigned to a subsequent level using bilinear interpolation, and these output data were included in the lower level model's set of candidate predictors. In this way, information on the predicted population range was used to help predict summer range suitability, and the resulting summer range predictions were subsequently fed into the local foraging and roosting models. The following processes were repeated for each model at each level: a. Multicollinearity: candidate predictors were compiled at their best performing scale alongside the disaggregated HSI from any preceding higherlevel model. The variance inflation factor (VIF) was calculated on this set of predictors. Highly collinear predictors were removed using the 'vifstep' stepwise function, 'usdm' package (Naimi et al. 2014) whereby predictors with a VIF higher than the conservative threshold of three were removed (Zuur et al. 2010). b. Creating data folds: presence data were split into three geographically-partitioned folds using k means clustering ('kmeans' function, 'stats' package; R Core Team 2017) the minimum distance of each pseudoabsence to the centroids of each species data fold was then used to partition pseudoabsences into corresponding spatial folds. c. Model tuning: optimal MaxEnt model settings were identified using the 'ENMeval' package (Muscarella et al. 2014), which uses raw output values at presence locations to provide samplesize-adjusted Akaike information criterion (AICc; Burnham andAnderson 2002, Warren andSeifert 2011). Combinations of feature types (predictor transformations) were tested (linear (L), quadratic (Q), product (P, enables interactions), and hinge (H)): L, H, LQ, LQH, LQP and LQHP. Threshold features were disabled to limit model complexity, thereby reducing the potential for overfitting (Shcheglovitova and Anderson 2013). The regularisation multiplier was varied in steps of 0.5, from 0.5-4.
Step 5: model validation and interpretation The geographically-partitioned data folds were used for three-fold cross validation. Outputs were first validated using the mean Area under Receiver Operating Characteristic (ROC) curve (AUC) statistic. All data were then entered into a final MaxEnt model using the 'dismo' package and optimal settings to produce model predictions. The Maximum Training Sensitivity and Specificity (MTSS) occupancy rule was used to partition HSI values into 'suitable' and 'unsuitable' categories (Liu et al. 2013). The proportions of pseudoabsences falling inside pixels predicted to be unsuitable ('true (pseudo)absence rate' or 'pseudo-specificity') were measured and a binomial test calculated the statistical significance of the proportion of presence records falling within the suitable area ('true presence rate' or 'sensitivity').
Marginal response curves showing the predicted probability of a species presence over a predictor's range were used to visualise species-environment relationships. MaxEnt provides percentage contribution values as an indication of the relative influence of each predictor, and a 'lambdas file' provides coefficients for the transformed features fitted (L, Q, H or P type; Phillips et al. 2006). To facilitate understanding of how much influence each environmental predictor has in the presence of information on regional suitability (sequential multi-level models) and without this information (single-level models) at the summer range and local levels, the sequential multi-level model percentage contribution values were also rescaled. This was done by summing the contribution values for all environmental predictors within the model, apart from the contribution value achieved by the higher-level HSI predictor, and dividing each environmental predictor's contribution value by this summed figure. Comparisons with other approaches Two alternative multi-scale HSM methods were applied to explore differences in model performance, output and interpretation at the summer range and local habitat levels. Comparisons are only valid between models of the same prevalence and extent (i.e. intra-level, not inter-level).
(a) Single-level models models were run at both lower levels (summer range and local) without incorporating the disaggregated predictions from the preceding level. Model fit and accuracy were compared using three statistics: AICc, AUC (resulting from the three-fold cross validation test) and the threshold-dependent measure of true presence. Response curves, predictor contribution metrics and model coefficients were used to investigate the influence of the method on the species-environment relationships fitted. (b) Post-hoc multi-level models HSI values from each single-level model were overlaid, harmonised to the finest pixel size without interpolation and multiplied on a pixel-by-pixel basis (e.g. Johnson et al. 2004;Fournier et al. 2017;Zeller et al. 2017;Bauder et al. 2018). As this approach involves using the product of the outputs from two or more models using different input data, measures of AICc and three-fold AUC could not be calculated without relying on the mean of these validation metrics across the input models, which we believe to be largely uninformative. A 'non-independent AUC' was computed instead, using all the presence and pseudoabsence data and the 'evaluate' function, 'dismo' package. This value is likely to be falsely inflated compared to the 'independent' AUC statistics resulting from three-fold, geographically-partitioned cross validation (Veloz 2009;Bellamy et al. 2013; Radosavljevic and Anderson 2014), but provides an alternative metric for comparing performance between all three approaches, alongside true presence and (pseudo)absence statistics. The MTSS thresholds were calculated for these outputs using the 'SDMTools' package (VanDerWal et al. 2012).

Model performance
Sample sizes, optimal settings and validation results for all models are presented in Table 2. For all models at all levels, the optimal settings included a regularisation multiplier of 2.5 or 3, and LQH or LQHP feature types. All models achieved statistically significant true presence rates (P \ 0.001). The population range model (which, as the highest model level, is inherently a single-level model and can't be used to compare the three modelling approaches) performed well with an AUC of 0.79 ± 0.02 and 91% of presence points falling within the 29% of 5 km resolution pixels predicted to be suitable across Britain (and a true (pseudo)absence rate of 61%). At the lower levels, comparisons are possible between model types within a level: Sequential multi-level versus single-level models In accordance with our hypothesis, the models created using our sequential framework better fit the input data (C 354 lower AICc) and provided more accurate predictions compared to single-level models. At the summer range level, although the true (pseudo)absence rates were similar across model types (circa 71%), the sequential multi-level model achieved 0.18 higher AUC values and over 20% more true presences, despite a slightly smaller area (1.3%) classified as 'suitable'. There were smaller gains for the sequential multi-level models at the local level: the test statistics were similar (AUC values 0.04 and 0.07 higher for roosting and foraging models respectively) and, although a smaller suitable area (9.0% and 7.1% respectively) resulted in circa 5% fewer pseudoabsences falling into unsuitable pixels, true presence rates were similar compared to single-level models.

Sequential versus post hoc multi-level models
Multiplying the single-level HSI values resulted in similarly high levels of accuracy at the summer range level. At the lower level, the post hoc model nonindependent AUC values were slightly lower (0.05) and the areas predicted to be suitable were circa 10% larger, resulting in higher true presence rates (almost 3% and 10% higher for the roosting and foraging models respectively), but much lower (circa 20%) true (pseudo)absence rates, indicating an overestimation of suitable area.

Predicted R. hipposideros distributions and species-environment associations
The estimated population range was characterised by non-urban areas with relatively high precipitation and maximum temperatures; all predictors were selected at the 10 km scale (Table 3; Fig. SI2.1.1). These conditions are concentrated within the current known range of lowland Wales and South-West England, although suitable habitat was forecasted to extend beyond this, into lowland western England and Scotland and some patches of eastern England (Figs. 2, SI2.3).
At the summer range level, according to the sequential multi-level model, areas within the predicted population range with around 40% broadleaf Optimal model settings (feature types: linear (L), quadratic (Q), product (P) and hinge (H)) and species presence sample sizes (n) are provided alongside measures of model fit (AICc) and accuracy (AUC and true presence). The 'non-independent AUC' was computed using all presence and pseudoabsence data, whereas the mean 'independent' AUC was calculated using threefold cross validation. 'Suitable area' describes the percentage of raster pixels predicted to be suitable following application of the MTSS occupancy threshold rule. True presence describes the proportion of presence records falling within this suitable area. A binomial test measured whether this proportion is greater than expected by chance according to the percentage suitable area. Model performance and fit can only be fairly compared between model types within a level *** P B 0.001 woodland cover at the 1 km scale provided the most suitable habitat (although care should be taken when interpreting this model's response curves due to the interactions fitted). High cover of ancient and conservation woodland within 3 km and low arable cover within 1 km also appear to be associated with R. hipposideros presence. The single-level model also identified broadleaf cover as an important, positive predictor but, in conflict with the sequential multilevel model, this was closely followed by heathland cover within 3 km as a positive predictor (Table 3; Fig. SI2.1.2). Foraging habitats were best characterised by areas close to buildings and hedges, with high broadleaf cover within 100 m, small woodland patches and low cover of heathland and arable within 500 m, according to both single-and multi-level approaches ( Fig. SI2.1.3). Roost habitats showed similar strong associations to high broadleaf cover (500 m scale) and areas close to buildings (typical R. hipposideros roost structures) and hedges, although areas with woodlands of medium shape compactness (analogous to shape complexity) were also a strong predictor of roost habitat suitability (Fig. SI2.1.3). Disaggregated HSI values from preceding levels were strong predictors at lower levels in the sequential multi-level model, achieving percentage contributions of 32-78% (Table 3). The nested nature of these models is clear from examining the habitat suitability surfaces; the imprint of the predicted population range is maintained across the levels, but suitable areas are further differentiated at lower levels according to fine resolution species-environment associations (Figs. 2,  3). More areas were predicted to be of intermediate suitability for foraging bats compared to roosting bats (Figs. 2, SI2.1.3), which may be explained by the higher contribution of summer range suitability on the foraging model (Table 3). Similar predictive HSI maps were provided by the post-hoc multi-level approach, although a higher proportion of pixels were predicted to be highly suitable (Table 2; Figs. 3, SI2.2.1-9). In contrast, the single-level models forecasted a geographically wider distribution of suitable summer range and foraging habitats and the differentiation in suitability between the west and east of Britain was not apparent (Figs. 3, SI2.2.1-9).
At the local level, environmental relationships were very similar between multi-and single-level models; distance to buildings and broadleaf cover were more important in the multi-level foraging model than the single-level model, but these differences were not pronounced (Table 3). At the summer range level, there were large disparities between the speciesenvironment relationships fitted between multi-and single-level models (Table 3; Fig. SI2.1.2). This may be because the optimum model settings incorporated 'product' feature settings at the summer range level, enabling interactions between predictors (Table 3). Predictors displaying large differences in their response curves between single-and multi-level approaches tended to be fitted as an interaction with population range HSI (Table SI2.1). Broadleaf cover was a stronger predictor of summer range suitability according to the multi-level model, and heathland cover had a much higher influence on the single-level model (Table 3, rescaled percentage contribution values). The direction of the relationships also sometimes switched; high building density and ancient woodland cover had a negative influence on summer A predictor's optimal scale was selected according to univariate measures of training gain. To facilitate comparison of predictor influence between modelling approaches, sequential multi-level model predictor contribution values were rescaled by summing the contribution values for all environmental predictors within the model, apart from that of the higher-level HSI predictor, and dividing each environmental predictor's contribution value by this summed figure. Single-level predictor contribution values were then subtracted from these rescaled values, so that predictors with negative 'difference' values (c) can be interpreted as having relatively less of an impact in the sequential, multi-level model range suitability in the absence of information on population range suitability but a positive influence in the multi-level model (Fig. SI2.1.2).

Discussion
The benefits of a sequential, multi-scale approach to HSM We provide an easily implemented, flexible framework for multi-level, multi-scale HSM that improves on single-level and post-hoc multi-level HSM approaches by nesting models across levels. Our framework has three principle advantages. Firstly, by integrating the HSI values obtained from models run at higher levels as a predictor in models run at lower levels, species-environment relationships are fitted whilst accounting for predicted regional suitability, echoing the hierarchical habitat selection process and accounting for context dependency. Secondly, partitioning predictors into separate model levels rather than grouping them into a single analysis limits multicollinearity, an issue prevalent in multi-scale modelling (Lipsey et al. 2017). Thirdly, the approach makes good use of species records commonly available at 'coarse' resolutions ([ 100 m), which are usually rejected or downscaled prior to HSM (Graham et al. 2015). The framework can be easily adapted according to aims and data availability; different statistical models and geographic extents (providing these are nested) can be fitted at each level. As well as varying regionally, species' relationships with local or landscape factors can also vary according to the response metric (e.g. species abundance or fitness), between individuals according to characteristics such as sex, life cycle stage or genotype, and across time according to seasonal factors (Martin 2018). Separate models can therefore be used to investigate differences in habitat use that manifest at lower levels between these subcategories, whilst integrating universal higher level model output (e.g. Rettie and Messier 2000;Gabor et al. 2001;Bauder et al. 2018). By allowing drivers operating at different scales to be disentangled, regionally suitable areas without the local-or landscape-scale features required at lower levels can be identified (Pearson et al. 2004;Vicente et al. 2011), improving the prescription and spatial targeting of management solutions (Lindenmayer 2000;Whittingham et al. 2007;Zeller et al. 2017). This type of crossscale approach can be extended to other types of ecological questions, given availability of spatially explicit response data, such as where to focus resources in support of ecosystem services (e.g. Spake et al. 2019).
We have demonstrated the superior predictive performance and fit of models developed with our framework compared to single-level methods using a single species and region. Although multiplying single-level outputs using the post-hoc multi-level approach provided similar predictive habitat suitability surfaces and levels of accuracy (apart from appearing to overestimate the area of suitable habitat), the fitted relationships differed at the summer range level. This has important consequences for inference. For example, the high contribution of heathland cover within 3 km in the single-level model (and thus posthoc combinations of single level models) indicates a strong, positive association with heathland areas, which is contrary to expectations; R. hipposideros wing shape and echolocation call structure mean it is well-adapted to cluttered environments and avoids large areas of open habitats (Bontadina et al. 2002;Zahn et al. 2008;Dool et al. 2016). Heathlands are mainly found in wetter regions in Britain and annual rainfall had a strong influence on R. hipposideros population range habitat suitability. This disparity is in accordance with our hypothesis and demonstrates that by focusing on a single level and disregarding drivers operating at other levels, spurious relationships may be identified when predictors are correlated. HSMs should therefore always be 'sense-checked' for biological realism alongside statistical validation (Fourcade et al. 2017).
It is also important to interpret results from the sequential, multi-level approach in the context of the relationships fitted at higher levels and any potential interactions with the disaggregated HSI. For example, building density was found to have an important, positive association with R. hipposideros habitat suitability at the summer range level using our sequential framework, contrary to the single-level model. However, this predictor was included as an interaction with the population range HSI, which itself was driven by a strong negative association with high urban landcover within 10 km. This indicates that R. hipposideros selects for rural areas with some built-infrastructure, which can be explained by its dependence on buildings for roosting (Dietz et al. 2009). However, although the MaxEnt output indicates which interactions were fitted, there is currently no function available to visualise these, limiting interpretation. To gain a better understanding of cross-scale interactions, our framework could be followed by a more deliberative approach, such as hierarchical General Linear Modelling (GLM) (Spake et al. 2019).

Future framework refinements
Our use of pseudo-optimised scale selection (sensu McGarigal et al. 2016), whereby univariate models are used to identify the scale at which a predictor best fits the response data at each level, improves upon previous multi-level studies that focus on a single scale at each level (e.g. Pearson et al. 2004;Mateo et al. 2019b). However, the number of scales tested at each level was limited to two and testing for scale effects in isolation (disregarding covariates and potential interactions) can inflate residual variance and alter scale selection (Bradter et al. 2013;Spake et al. 2019). Univariate methods are commonly adopted in multi-scale studies due to the complexity and computational memory requirements involved in testing many permutations of predictors and scales . Future research should focus on integrating true scale optimisation processes that identify a predictor's scale of effect across a wider breadth of scales (Jackson and Fahrig 2015;Miguet et al. 2016) in the presence of other predictors, such using the information-theoretic 'dredge' function provided by the R package 'Mumin', (which does not currently support MaxEnt models; Barton 2018) to test between a hypothesis-driven selection of predictor sets and interactions (e.g. Spake et al. 2019), or recently developed Bayesian approaches (e.g. Stuber et al. 2017, particularly when sample sizes are low (Mateo et al. 2019a)). Other areas for potential framework validation and refinement include the application of a simulation experiment with wellknown correlation structures to better test the framework's robustness (Meynard et al. 2019) and exploring the application of statistical approaches to classify predictors into levels (e.g. Vicente et al. 2014). Because lower level, fine-scale species-environment relationships are usually constrained by factors operating at higher levels (Allen and Starr 1982) we used a 'top-down' approach to order the models; however, comparisons with 'bottom-up' level sequences could also be carried out.

Model inference: implications for woodland bat conservation
This study demonstrates how the implementation of our HSM approach can inform targeted site-to landscape-scale decision making. A combination of climate, land use and land management factors, operating over a range of spatial scales, were found to be associated with distribution of R. hipposideros in Britain. Landscape-scale strategies should focus on protecting and re-connecting important R. hipposideros hotspots in Wales and South-West England and facilitating (re)colonisation of regions north and east of the current known range that are predicted to be suitable at the population range level. The models suggest that actions in these areas should focus on protecting and increasing surrounding broadleaf woodland cover, providing accessible roost spaces within appropriate buildings, and increasing hedgerow provision for foraging and commuting. Furthermore, ancient woodland and 'conservation woodlands' [those with a protected status or that are managed by organisations with a large focus on species or habitat conservation (Table 1)] appear to have a positive impact on summer range suitability; this could be further investigated if spatial data on the prescription of woodland management actions in private woodlands were available. Many of these findings are consistent with what we already know about the species (e.g. Boughey et al. 2011;Tournant et al. 2013;Froidevaux et al. 2017;Le Roux et al. 2017;Fuller et al. 2018), demonstrating the reliability of the approach and highlighting its potential to provide useful information on other, less well-understood woodland bat species for which reliable records are b Fig. 3 Predicted R. hipposideros logistic habitat suitability indices (HSI) at each level below the population range (a-c) according to each model type (1-3). HSI values range from 0 (unsuitable; pale yellow) to a maximum of 1.0 (highly suitable; dark blue), apart from the post-hoc multi-level results, which are multiplied across the levels. A rectangular inset map is provided (at different extents for the summer range and local levels) to enable comparisons of the finescale predictions. See Fig. SI2.2 for full-size images of individual model output particularly limited (e.g. the cryptic small Myotis species) (Pearson et al. 2006;Rebelo and Jones 2010).

Opportunities for further model validation and improvement
To prevent truncation of the R. hipposideros climatic niche, the population range level model should have encompassed its global range (Pearson et al. 2004;Barbet-Massin et al. 2010;Heisler et al. 2017;El-Gabbas and Dormann 2018), which extends south and east from Ireland and Britain into North Africa and Asia. However, bat records within this range were patchy or unavailable via data portals such as the Global Biodiversity Information Facility (https://data. gbif.org). Furthermore, the R. hipposideros global range has shrunk since the mid-twentieth century; in Britain range contraction to the south-west has tracked recent, rapid changes in land-use and agricultural intensification (Robinson and Sutherland 2002;Bontadina et al. 2008). This disequilibrium with the environment may also skew models, a problem that is relevant to many species (Araújo and Pearson 2005;Guisan and Thuiller 2005), but for which we lack good historical data to address. Independent data are needed to better assess model performance and reliability (Fielding and Bell 1997). Alongside the HSM work we are trailing methods and technologies for surveying woodlands for bats to inform a new citizen science monitoring scheme, the British Bat Survey, which is under development by the Bat Conservation Trust. Rolling out a standardised survey protocol to collect more precise, reliable field data nationwide will enable independent model validation and will eventually provide a more accurate and complete picture of bat species distributions. Used in conjunction with emerging spatial datasets derived from sources such as LIDAR, the influence of fine scale woodland composition and structure on woodland specialist bats species activity could be analysed (e.g. Carr et al. 2018). Our ability to model R. hipposideros habitat suitability would probably improve with the creation or wider accessibility of relevant UK-wide spatial data on features such as street lighting, agricultural intensity, trees outside woodland and underground sites suitable for hibernation (Fuller et al. 2018).

Conclusions
This case study suggests that sequential multi-level HSM frameworks incorporating within-level scale optimisation procedures should be adopted in favour of single-level models. This approach appears to provide more accurate predictions and a more complete understanding of the associations underlying a species' distribution with which to inform policy and practice, although validation using simulation experiments and other real word examples are required. Our results indicate that, when predictive accuracy is the sole priority of an HSM exercise, it may be justifiable to use a post-hoc multi-level approach to overlay and combine HSI outputs from multiple levels. Indeed, other studies have demonstrated the utility of this method for decision making (e.g. Johnson et al. 2004;Hattab et al. 2014;Fournier et al. 2017;Zeller et al. 2017;Bauder et al. 2018). However, we suggest that multi-level models should be sequentially integrated to enable a species' response to its environment to vary according to regional suitability when information on the underlying species-environment relationships is key.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Author contributions CB conceived and designed the framework and carried out the analysis, using feedback from all authors to fine-tune methodological design. KB, SR and CB prepared the input data. CB led the manuscript writing with considerable input from all co-authors. CB, SR, CH, KB and CW devised this work as part of the 'Putting Woodland Bats on the Map' project (https://www.forestresearch.gov.uk/research/ putting-uk-woodland-bats-on-themap/).
Data availability Some raw input species (via the NBN Gateway (https://www.nbn.org.uk/)) and environmental data (Table 1) are publicly available; access to other input data requires a data licence-please contact the data source. The output logistic predictive habitat suitability maps produced using the sequential, multi-level models are available here: https://bitbucket.org/chloebellamyFR/sequential-multi-levelhsm-framework.