Ecological Research, Volume 25, Issue 5, pp 947–957

Marshalling existing biodiversity data to evaluate biodiversity status and trends in planning exercises

Authors

  • A. Jiménez-Valverde
    • Biodiversity Institute, The University of Kansas
  • Andrés Lira-Noriega
    • Biodiversity Institute, The University of Kansas
  • A. Townsend Peterson
    • Biodiversity Institute, The University of Kansas
  • Jorge Soberón
    • Biodiversity Institute, The University of Kansas

Special Feature: From SATOYAMA to managing global biodiversity

DOI: 10.1007/s11284-010-0753-8

Cite this article as:
Jiménez-Valverde, A., Lira-Noriega, A., Peterson, A.T. et al. Ecol Res (2010) 25: 947. doi:10.1007/s11284-010-0753-8

Abstract

A thorough understanding of biodiversity status and trends through time is necessary for decision-making at regional, national, and subnational levels. Information readily available in databases allows for development of scenarios of species distribution in relation to habitat changes. Existing species occurrence data are biased towards some taxonomic groups (especially vertebrates), and are more complete for Europe and North America than for the rest of the world. We outline a procedure for development of such biodiversity scenarios using available data on species distribution derived from primary biodiversity data and habitat conditions, and analytical software, which allows estimation of species’ distributions, and forecasting of likely effects of various agents of change on the distribution and status of the same species. Such approaches can translate into improved knowledge for countries regarding the 2010 Biodiversity Target of reducing significantly the rate of biodiversity loss—indeed, using methodologies such as those illustrated herein, many countries should be capable of analyzing trends of change for at least part of their biodiversity. Sources of errors that are present in primary biodiversity data and that can affect projections are discussed.

Keywords

Geographic range, Niche modelling, Habitat suitability, Climate change, Conservation

Introduction

Many important components of biological diversity are disappearing at unprecedented rates, with grave consequences for ecosystem services that are crucial for human societies (Millennium Assessment 2005). This biodiversity crisis has led many countries to decide, at the sixth meeting of the Conference of the Parties (COP) to the Convention on Biological Diversity (CBD) in April 2002, to “achieve by 2010 a significant reduction of the current rate of biodiversity loss at the global, regional and national level” (Decision VI/26, http://www.biodiv.org). This agreement is known as the “2010 Biodiversity Target” (hereafter “2010 target”) which, although having attracted a significant amount of attention and activity (Millennium Assessment 2005), is now regarded as unattainable (Mooney and Mace 2009; Secretariat of the Convention on Biological Diversity 2010).

A curious feature of the 2010 target and most biodiversity indicator targets that the CBD COP has adopted is that data required to calculate the indicators are largely unavailable or patchy at best, at national levels (Balmford et al. 2005; Walpole et al. 2009). For instance, the CBD COP (the framework mentioned in annex II of decision VII/30 of the CBD on future evaluation and UNEP/CBD/COP/7/INF/33) demands, among 18 other indicators, information on “trends in extent of selected biomes, ecosystems and habitats” and “trends in abundance and distribution of selected species.” In most developing countries, monitoring such elements is simply not done. Without relevant data, the proposed indicators remain academic exercises not directly useful for policy development and decision-making (Soberón and Peterson 2009).

For the developing world, it has been argued that no substitute exists for developing national institutions capable of monitoring biodiversity components at fine scales (Soberón and Sarukhán 2010). The emerging field of biodiversity informatics is creating opportunities for making much better use of already available data, providing capacity for assessing some of these biodiversity indicators.

This contribution outlines a procedure for assessing the distribution, abundance, and aspects of conservation status of selected species through time. Major steps forward in enabling access to and analysis of biodiversity information have been taken (Edwards 2005; Soberón and Peterson 2004; Peterson et al. 2010), but this improved information access has not been translated generally into improved knowledge for countries regarding the 2010 target or other aspects of biodiversity. Rather, the great bulk of 2010 target evaluations published to date have been global trends assessments published by researchers based in Europe or North America (Butchart et al. 2005; Collen et al. 2008), with little regional or national content. We provide a framework for one dimension in which existing information can be marshalled to this task, and illustrate this potential by means of analyses across Europe based on existing and easily accessible information. The conclusion is that considerable knowledge can be harvested from existing and available information.

Biodiversity informatics: current status

Considerable progress has been made in biodiversity informatics over the past couple of decades. To begin with, since about 1980, biodiversity data have been captured in digital formats with increasing precision and ease, to the point that now most vertebrate data (and increasing proportions of invertebrate and plant data) exist in databases. Once such digital capture steps are taken—and indeed the digital capture step represents a serious bottleneck in the entire process of enabling biodiversity data for analysis—many additional advances become possible. We focus our attention on these ‘primary biodiversity data’ (i.e., data records that locate a particular species in a place at a particular point in time). We do not consider secondary biodiversity data sources (e.g., polygon-based range maps, grid-based range summaries) further, because, although they are derived from primary sources, they have limited utility owing to problems of uncertainty derivation, spatial resolution, and update frequency.

Beginning in the 1990s, access to these data began to be facilitated and made much more practical. The early FishGopher (Wiley and Peterson 2004) and then the much-improved Species Analyst (Kaiser 1999) provided proofs of concept. The development of the Darwin Core (http://rs.tdwg.org/dwc/) and the Distributed Generic Information Retrieval protocol (DiGIR, http://www.digir.net/) then opened the floodgates, and subsequent developments (e.g., Access to Biological Collections Data Schema, ABCD; Taxonomic Database Working Group Access Protocol for Information Retrieval, TAPIR) have improved the situation still more. Now, just 10 years later, >200 million biodiversity records are online and available to researchers worldwide, particularly via large-scale biodiversity information networks providing direct access to primary, research-grade data, such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org), VertNet (http://vertnet.org/index.php), SpeciesLink (http://splink.cria.org.br), Red Mundial para Información de la Biodiversidad (REMIB; http://www.conabio.gob.mx), and others.

Although large quantities of primary species occurrence data are now online, much work remains (Yesson et al. 2007). In particular, because the data are very Europe- and North America-centered, large spatial gaps remain to be filled. The data are also very vertebrate-focused, so many taxa remain underrepresented. In addition, the data frequently lack “value-added” features, such as georeferences—major efforts are underway to automate and make the georeferencing process faster and more efficient, interpreting textual locality descriptors intelligently and converting them into usable latitude-longitude coordinates with known degrees of error or uncertainty. Finally, a critical step is that of quality control, in which records likely holding errors are flagged, and errors potentially cleaned or corrected.

In a parallel fashion, analytical tools for biodiversity information have matured considerably in recent years. Approaches for characterizing and interpreting available biodiversity inventory data are now much improved—beginning with early steps (e.g., Soberón and Llorente 1993), these tools have now blossomed into robust software platforms (e.g., EstimateS, available at http://viceroy.eeb.uconn.edu/estimates). Similarly, tools for estimating ecological niches and potential geographic distributions of species have seen considerable exploration and advance, now including a broad methodology for data preparation, niche estimation, model validation, and exploration of the implications of results (Araújo and Guisan 2006; Guisan and Thuiller 2005; Peterson 2006; Phillips et al. 2006). Finally, tools for spatial interpretation of information regarding species’ distributions into explicit and objective strategies for management and conservation have improved considerably, and now are able to optimize multiple priorities and constraints simultaneously (Sarkar et al. 2006).

In sum, a broad infrastructure of primary biodiversity data and analysis tools is now at hand and very much available to the biodiversity community, albeit only recently. Although researchers have explored these opportunities eagerly, the policy community has been relatively slow in assimilating them. Despite well-founded concerns about gaps and errors in the data (Yesson et al. 2007), many of these complications can be attenuated via more detailed procedures for data analysis. All biodiversity data sets hold errors; the question rather is whether the existence of those errors can be incorporated into analyses and their effects taken into account. Also evident is the need to work hand-in-hand with people of different training and expertise, so that the resulting analyses do not lack the appropriate interpretation (e.g. that they are biologically and geographically meaningful). The remainder of this contribution is, in effect, an exploration of these considerations for biodiversity data across Europe, based on a diverse sample of 20 species from across the biota.

Methods and approaches in ecological niche modeling

Occurrence data acquisition and quality control

Species occurrence data are available in freely accessible biodiversity databases, such as GBIF, VertNet, SpeciesLink, and REMIB. However, data quality control is an essential first step when using such information (Chapman 2005), because errors such as mis-georeferencing and sampling bias can be more the rule than the exception. Figure 1 in Hortal et al. (2007) presents an example of sampling bias. Records falling outside of the region of analysis or in incorrect environments (e.g., terrestrial species in marine environments) can be flagged as problematic. In addition, when georeferencing precision estimates are available, records can be filtered to retain only those that are sufficiently precise for the analysis at hand. Records for which any taxonomic doubts exist should also be evaluated with some measure of caution.
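The checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Occurrence` record layout, the `quality_filter` function, the bounding box standing in for the region of analysis, and the uncertainty cutoff are all hypothetical and are not taken from any data portal or from the analyses reported here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Occurrence:
    """Hypothetical minimal occurrence record (not a portal schema)."""
    species: str
    lon: float
    lat: float
    uncertainty_m: Optional[float] = None  # georeferencing uncertainty, if reported
    taxon_doubt: bool = False              # True if the identification is questioned


def quality_filter(
    records: List[Occurrence],
    bbox: Tuple[float, float, float, float],
    max_uncertainty_m: float,
) -> Tuple[List[Occurrence], List[Tuple[Occurrence, List[str]]]]:
    """Split records into (kept, flagged); flagged records carry their reasons.

    bbox is (min_lon, min_lat, max_lon, max_lat), a crude stand-in for the
    region of analysis. Records without an uncertainty estimate are not
    rejected on precision grounds (an assumption of this sketch).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    kept, flagged = [], []
    for rec in records:
        reasons = []
        if not (min_lon <= rec.lon <= max_lon and min_lat <= rec.lat <= max_lat):
            reasons.append("outside study region")
        if rec.uncertainty_m is not None and rec.uncertainty_m > max_uncertainty_m:
            reasons.append("georeference too imprecise")
        if rec.taxon_doubt:
            reasons.append("taxonomic doubt")
        if reasons:
            flagged.append((rec, reasons))
        else:
            kept.append(rec)
    return kept, flagged
```

Flagged records are returned with their reasons rather than silently dropped, so that they can be reviewed and, where possible, corrected.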

Sampling bias is well known for its potential to hamper correct parameterization of ecological niche models (ENM; Vaughan and Ormerod 2003). As a consequence, it is important to assess a priori the distribution of sampling events (i.e., sites at which the species could have been recorded, had it been present) with respect to the approximate known distribution of the species; otherwise, the concentrations of occurrence data that drive the ecological niche modeling process may reflect concentrations in the distribution of sampling effort, rather than niche dimensions. Data density in any areas that are overrepresented owing to sampling bias needs to be reduced to match the density across the broader distributional area. Finally, after data cleaning and quality control, modeling can proceed only if sample sizes are sufficient to permit effective training of models (Jiménez-Valverde et al. 2009a, b; Wisz et al. 2008). These initial steps of data compilation and quality control are without doubt the most time-consuming, but also the most important; otherwise, the well-known “garbage in, garbage out” rule will dominate.
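One simple way to reduce data density in oversampled areas is to cap the number of records retained per grid cell. The sketch below is a simplification rather than the procedure used in this study: the function name `thin_to_grid`, the cell size in degrees, and the cap of one record per cell are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


def thin_to_grid(points: List[Point], cell_deg: float, max_per_cell: int = 1) -> List[Point]:
    """Keep at most max_per_cell points in each cell of a lon/lat grid.

    Points are kept in input order, so the result is deterministic; shuffle
    the input beforehand if a random subsample per cell is preferred.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    counts: Dict[Tuple[int, int], int] = {}
    kept: List[Point] = []
    for lon, lat in points:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        seen = counts.get(cell, 0)
        if seen < max_per_cell:
            kept.append((lon, lat))
            counts[cell] = seen + 1
    return kept
```

A cap per cell equalizes density only roughly; matching the density of a reference region, as described above, would require choosing the cell size and cap from the density observed in well-sampled but unbiased parts of the range.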

Assembling environmental data

Ideally, ecological niche models are based on relationships between occurrence information and geospatial datasets summarizing environmental variables with recognized influence on the population biology and distribution of the species under consideration (Austin 2002). However, for most species, such knowledge does not exist; rather, distributions of most species are vaguely characterized, and effects of abiotic, biotic, and geographic factors are only poorly understood (Soberón and Peterson 2005). This information gap at broad extents is more or less to be expected, given the difficulties of experimentation at geographic scales. Frequently, geospatial summaries of key variables are not available, so modelers usually must rely on general environmental variables that are available, in the hopes that they will be related at least indirectly to the species’ distribution in ecological and geographic dimensions. The WorldClim database (http://www.worldclim.org) sees extensive use at present thanks to its worldwide coverage, standard and easy-to-handle data format, variety of spatial resolutions, and availability of future climate scenarios (Hijmans et al. 2005), although other sources may be available on regional scales (e.g., the European Prudence project, http://prudence.dmi.dk/).

Niche modelers should also note that the spatial and temporal range and resolution of environmental variables must match that of the occurrence data. That is, the temporal range of the environmental data should coincide with that of the biological data, or occurrence records may fall at sites with environmental conditions not representative of those under which the species actually occurred. Final considerations are dimensionality and model complexity—to avoid producing models that are overfit, and particularly given high degrees of intercorrelation of environmental variables, the number of environmental variables considered should be kept relatively small. For example, just because 19 “bioclimatic” variables are available in WorldClim does not mean that all 19 should be included in development of models (Peterson and Nakazawa 2008). Ideally this question should be discussed in terms of the relevance of the environmental information for the focus species (e.g., Rödder et al. 2009).
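The idea of keeping the variable set small and weakly intercorrelated can be sketched as a greedy selection. The following is a minimal illustration, assuming that variables are ranked beforehand by their presumed biological relevance for the focal species; the function names, the ranking, and the cutoff of |r| = 0.8 are hypothetical choices, not values reported in this study.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(
    variables: Dict[str, Sequence[float]],
    priority: List[str],
    max_abs_r: float = 0.8,
) -> List[str]:
    """Walk variables in priority order, keeping each one only if its absolute
    correlation with every variable already kept does not exceed max_abs_r."""
    selected: List[str] = []
    for name in priority:
        if all(abs(pearson(variables[name], variables[kept])) <= max_abs_r
               for kept in selected):
            selected.append(name)
    return selected
```

Because the outcome depends on the order in `priority`, the ranking is where knowledge of the species enters; a purely statistical ordering would give a different, and not necessarily more meaningful, subset.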

Finally, depending on the characteristics of the occurrence data available, information regarding land use is an essential element in describing overall present distributions. Global-extent land use data sets have been developed (e.g., Global Land Cover Facility, http://glcf.umiacs.umd.edu/index.shtml), but more detailed information may be available, and indeed necessary, to permit assessment, for example, of changes through time. For example, the European CORINE land cover project (http://www.eea.europa.eu/) provides detailed geospatial summaries of land cover change from 1990 on, opening opportunities to assess impacts of aspects of global change on species’ distributions. In a parallel fashion, but on longer time scales, impacts of climate change (global warming) can be evaluated thanks to availability of model scenarios characterizing future climate conditions (e.g., Hadley Climate Centre’s HadCM3, for various emissions scenarios for 2020 and 2050) and of models of marine intrusion owing to increasing sea level with warming climates (Li et al. 2009; Menon et al. 2010).

Niche model development

Ecological niche modeling (ENM) is an approach that attempts to estimate environmental requirements of species by means of associations between observed patterns of occurrence of species and environmental variation across broad landscapes; once the ecological niche is estimated, it can be projected onto real landscapes to identify areas that are environmentally suitable for the species (Peterson 2001). Numerous methods have been developed for estimation of these environmental spaces or associated geographic distributions (Elith et al. 2006); however, most appropriate for ENM are methods that require only input of data documenting the presence of species, owing to uncertain meanings of data purporting to document absences (Jiménez-Valverde et al. 2008a, b). Probably the two most widely used approaches are the Genetic Algorithm for Rule-set Prediction (GARP; Stockwell and Peters 1999; http://www.nhm.ku.edu/desktopgarp/) and Maxent (Phillips et al. 2006; http://www.cs.princeton.edu/~schapire/maxent/). Both are evolutionary–computing approaches to estimating ecological niches in complex and highly dimensional environmental spaces: GARP is a genetic algorithm, which uses the biological analogy of chromosomal evolution to “evolve” solutions; while Maxent fits the probability distribution of maximum entropy (i.e., the most spread-out distribution), constrained by the values of the pixels where the species has been found. Both methods have been used extensively for niche modeling. When used and evaluated properly, they provide results with similar predictive power (Peterson et al. 2008). Because their specific configurations and applications have been described in detail elsewhere (Phillips et al. 2006; Phillips and Dudík 2008; Stockwell and Peters 1999), and because numerous examples are available in a burgeoning literature, we have not included detailed descriptions of our specific implementations herein. Suffice it to say that we emphasized independent testing data for model refinement; full details are available upon request from the authors.
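For readers unfamiliar with the maximum entropy principle, the optimization that Maxent addresses can be written compactly. The notation below is ours and follows the general formulation of Phillips et al. (2006): X is the set of pixels in the study region, f_1, ..., f_n are environmental features, and x_1, ..., x_m are the pixels with presence records. It is a sketch of the basic, unregularized problem, not a description of the software's full implementation.

```latex
\begin{align}
\hat{\pi} &= \arg\max_{\pi}\; H(\pi),
  \qquad H(\pi) = -\sum_{x \in X} \pi(x)\,\ln \pi(x), \\
\text{subject to}\quad
  &\sum_{x \in X} \pi(x)\, f_j(x) = \frac{1}{m}\sum_{i=1}^{m} f_j(x_i),
  \qquad j = 1,\dots,n, \\
  &\sum_{x \in X} \pi(x) = 1, \qquad \pi(x) \ge 0 .
\end{align}
% The solution has the Gibbs (exponential) form
\begin{equation}
q_{\lambda}(x) = \frac{\exp\!\left(\sum_{j=1}^{n} \lambda_j f_j(x)\right)}{Z_{\lambda}},
\qquad
Z_{\lambda} = \sum_{x' \in X} \exp\!\left(\sum_{j=1}^{n} \lambda_j f_j(x')\right).
\end{equation}
```

In practice the equality constraints are relaxed so that feature means need only fall close to their empirical values, which limits overfitting when presence records are few.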

Processing raw model outputs into realized distributions

When ENM results are projected into geographic dimensions, a surface of continuous values ranking areas by their suitability (or, more formally, by their similarity in some general sense to the places where the species is known to occur) is produced. However, species rarely or never occupy all suitable sites owing to effects of biotic interactions and historical factors such as dispersal barriers (Pulliam 2000; Soberón 2007; Soberón and Peterson 2005). As a consequence, an important next step is to convert the map of overall potential into one representing a best hypothesis of the actual distribution of the species.

First, the continuous maps should be thresholded to focus on areas of presence versus areas of likely absence (Peterson et al. 2007). The selection of the threshold is not straightforward, and depends on the relative weights accorded to omission (i.e., predicting a known site of presence as absent) and commission (i.e., predicting an absence site as present) errors—that is, the threshold that is ideal for a situation will depend on the intended use of the map (Jiménez-Valverde and Lobo 2007b). In an ENM context, when no information about genuine absence of the species (or more formally the absence of suitable conditions for the species) is available, omission errors are of much more concern than commission errors (Lobo et al. 2008; Peterson et al. 2008), so thresholds that maximize the sensitivity (i.e., well-predicted presences) seem most appropriate.
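A threshold that favors sensitivity can be derived directly from the suitability values at known presence sites. The sketch below assumes that a pixel is predicted present when its suitability is greater than or equal to the threshold; the function name and the default tolerated omission of 5% are illustrative assumptions.

```python
from typing import Sequence


def sensitivity_threshold(presence_scores: Sequence[float], max_omission: float = 0.05) -> float:
    """Return the largest threshold that omits at most max_omission of the
    presence records, where a record is omitted if its score < threshold.

    With max_omission = 0 this is the minimum score among presence records.
    """
    if not presence_scores:
        raise ValueError("at least one presence record is required")
    if not 0.0 <= max_omission < 1.0:
        raise ValueError("max_omission must be in [0, 1)")
    ranked = sorted(presence_scores)
    # Number of lowest-scoring records we are allowed to lose; the small
    # epsilon guards against floating-point products such as 0.9999999.
    n_omit = int(max_omission * len(ranked) + 1e-9)
    return ranked[n_omit]
```

Allowing a small omission rate trades a little sensitivity for robustness to records that are erroneous despite data cleaning; how much omission to tolerate remains a judgment that depends on the intended use of the map.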

After thresholding, with a map of likely presences and absences of suitable conditions in hand, an additional set of considerations becomes necessary. That is, because of the effects of dispersal barriers and other factors that prevent species from occupying the entire spatial footprint of the environmental conditions suitable for their populations, it is necessary to state explicit assumptions regarding those barriers (i.e., we must make assumptions about what areas have been available to the species for potential colonization, equivalent to “M” or the mobility component of species’ distributions in the BAM diagram of Soberón and Peterson 2005). Once these assumptions are established, the thresholded suitability maps can be trimmed to eliminate suitable areas that lie outside of the areas that the species has been able to colonize (Soberón 2010).

Finally, because ENM is often carried out based on climatic variables, land use and land cover should also be considered in developing and refining distributional estimates for purposes such as natural resources management. Here, again, explicit assumptions are necessary regarding which land cover types will be suitable for the species; because such information is not reflected in climate data, these assumptions cannot generally be recovered from the occurrence data. As a result, information from the scientific literature must be reviewed and synthesized to identify appropriate land cover types. Based on these assumptions, land cover types not considered appropriate for the species in question can be removed from the distributional estimate. Because these trimming and filtering processes are the most subjective steps in the protocol we present here, expert knowledge is quite valuable in developing genuinely robust estimates.

Evaluation of model predictions

The predictive power of any models such as those developed above must be assessed before the model predictions can be considered useful (Peterson 2005; Vaughan and Ormerod 2005). As a consequence, we incorporated two distinct phases of model evaluation prior to any use and interpretation of model predictions. These two phases are (1) testing the accuracy of the raw ENM outputs (i.e., the continuous predictions before the thresholding, trimming, and filtering processes) using the receiver operating characteristic (ROC) curve, and (2) evaluating the match between final distributional predictions (i.e., after thresholding, trimming, and filtering) and independent distributional maps using a hierarchical fuzzy pattern-matching approach.

The area under the ROC curve (AUC) is a discrimination measure that, despite recent important criticisms (Lobo et al. 2008; Peterson et al. 2008), is used widely in evaluating ENM predictions. The ROC curve is a plot of sensitivity (i.e., 1 − omission error rate when tested with independent testing data) against 1 − specificity (i.e., the commission error rate) across all possible threshold values (Fielding and Bell 1997). When no information about absences is available, the proportion of area predicted present is substituted for 1 − specificity (Phillips et al. 2006). Because the predictions are continuous, the ROC plots are developed over all possible thresholds of predictions, producing a curve; the AUC is then compared with the area under a curve of null expectations to evaluate whether the prediction is better than random or not (Swets 1988). This first set of tests will indicate whether model predictions are better than chance expectations; however, it should be emphasized that validation of such potential distributional hypotheses is complex (Jiménez-Valverde et al. 2008a, b) and modifications to the “typical” ROC approach become necessary (Peterson et al. 2008).
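The presence-only variant of the ROC analysis can be sketched as follows. This is a minimal illustration, not the evaluation code used here: it assumes that suitability scores are available for the independent test presences and for every pixel of the study area, and the function name is hypothetical. It implements the plain AUC, not the modified ROC approaches mentioned above.

```python
from typing import List, Sequence


def presence_only_auc(test_presence_scores: Sequence[float],
                      all_pixel_scores: Sequence[float]) -> float:
    """Area under the curve of sensitivity (y) against the proportion of the
    study area predicted present (x), swept over all observed thresholds."""
    if not test_presence_scores or not all_pixel_scores:
        raise ValueError("both score sets must be non-empty")
    thresholds = sorted(set(test_presence_scores) | set(all_pixel_scores), reverse=True)
    n_pres, n_pix = len(test_presence_scores), len(all_pixel_scores)
    xs: List[float] = [0.0]
    ys: List[float] = [0.0]
    for t in thresholds:
        ys.append(sum(s >= t for s in test_presence_scores) / n_pres)  # sensitivity
        xs.append(sum(s >= t for s in all_pixel_scores) / n_pix)       # area predicted present
    # Trapezoidal integration; the curve ends at (1, 1) at the lowest threshold.
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0 for i in range(1, len(xs)))
```

A value near 0.5 corresponds to the null expectation. Because presence pixels are themselves part of the study area, this statistic cannot reach 1 (Phillips et al. 2006), which is one reason AUC values from presence-only evaluations should be interpreted with care.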

Assessing the accuracy of the final predictions (i.e., the hypotheses of actual distributions) is not easy, mainly because no information on true absences is available (Jiménez-Valverde et al. 2008a, b; Ward et al. 2009). As a consequence, we use a hierarchical fuzzy pattern-matching approach (Power et al. 2001). If distributional maps from some independent source are available (e.g., in the worked example developed below, from expert opinion), then we can compare the predicted distributions with them via these approaches. However, several sources of uncertainty make “true” accuracy statistics difficult to obtain; we mention three. (1) Available range maps do not (or rarely can) present the truth at fine resolutions, or the modeling exercise would be entirely unnecessary. (2) The filtering process regarding land use is imprecise, in the sense that this kind of information usually includes significant classification errors (Jiménez-Valverde et al. 2008a, b) and resolution effects. (3) Despite efforts to reduce potential distributions via assumptions regarding M, predictions will likely include some uninhabited areas. As a consequence, instead of pixel-by-pixel comparisons, we assessed similarity of patterns using a fuzzy pattern-matching global similarity measure (Power et al. 2001). This methodology is designed to mimic the human visual assessment process when comparing maps, assessing similarity at both local and global resolutions, i.e., assessing global similarity in the pattern, but recognizing local discrepancies (see Power et al. 2001 for a detailed explanation of the method). A fuzzy global matching value of ≥0.70 indicates close similarity in patterns recovered (Power et al. 2001).

A worked example

Here, we present the results of a series of explorations of the geographic distributions and conservation status of various species of the European biota. All of the species considered are accorded some conservation threat status by the European Environment Agency Article 17 status assessment (http://biodiversity.eionet.europa.eu/article17). As a consequence, these results are directly relevant to biodiversity policy management in the region. More specifically, we evaluate whether the approaches explored, all based on biodiversity information that is directly and easily available online, provide novel information useful to status assessment and conservation prioritization activities under the Article 17 effort of the European Environment Agency (EEA).

Species were selected for inclusion in this example based on four criteria: (1) occurrence in European Union countries that vary in their technical capacities for biodiversity analysis, (2) degree of biogeographic restriction and habitat specialization, (3) taxonomic coverage, and (4) data availability. We chose 20 species at random from among the species fitting these criteria (Table 1). Most occurrence records of species were obtained from the Global Biodiversity Information Facility (http://www.gbif.org/), supplemented with data from the Mammal Networked Information System (http://www.manisnet.org/) and HerpNet (http://www.herpnet.org/) information portals. In one case (the spider, Macrothele calpeiana), more precise occurrence data were available thanks to a prior study (Jiménez-Valverde and Lobo 2007a). As expected, the occurrence data showed numerous problems (mainly georeferencing errors and spatial biases; see extreme example in Fig. 1). For most species, after data cleaning and filtering, sample sizes of occurrence records were reduced somewhat (Fig. 2). Some species, previously selected, were substituted with others for which data were more abundant.
Table 1

Summary of species included in analyses

Mammals                 Reptiles and amphibians   Plants                  Invertebrates
Alopex lagopus          Bombina bombina           Primula scandinavica    Euphydryas aurinia
Spermophilus citellus   Vipera seoanei            Apium repens            Macrothele calpeiana
Lutra lutra             Bufo calamita             Narcissus nevadensis    Helicopsis striata
Nyctalus lasiopterus    Emys orbicularis          Cypripedium calceolus   Maculinea arion
Galemys pyrenaicus      Triturus cristatus        Eryngium viviparum      Lycaena dispar

Fig. 1

Map of known occurrences of Lycaena dispar, based on data from the Global Biodiversity Information Facility (GBIF; http://www.gbif.org). The high density of points in Poland, compared with lower (or null) densities in surrounding countries, illustrates the sampling bias. The grid-like arrangement of most points outside Poland reflects a different precision of georeferencing, which may compromise the results of any modeling using such data

Fig. 2

Numbers of records available from GBIF (http://www.gbif.org), supplemented with records from the MaNIS (http://manisnet.org/) and HerpNet (http://www.herpnet.org/) portals. Black bars: numbers of records obtained from the data portals for each species; white bars: actual numbers of records used in the ecological niche modeling (ENM) procedure after data cleaning

We selected a set of environmental variables from the WorldClim climatic data archive (http://www.worldclim.org/) that we deemed most appropriate for the analysis of the different sets of selected species, based on relative independence of variables in detailed correlation analyses (Jiménez-Valverde et al. 2009a, b; Peterson and Nakazawa 2008). Specifically, we used annual mean temperature, mean diurnal temperature range, maximum temperature of warmest month, minimum temperature of coldest month, annual precipitation, and precipitation of wettest and driest months. To provide a view of likely future potential distributions of each species, we projected niche models developed for the present onto modeled climatic conditions for 2020 and 2050, from the Canadian Center for Climate Modeling and Analysis (CCCMA) A2 projection (drawn from the WorldClim download site, http://www.worldclim.org). To summarize likely effects of sea level rise and consequent marine intrusion into areas that are presently terrestrial, we used recent global projections at 1 km resolution (Li et al. 2009). Finally, to incorporate dimensions of land cover and its change through time, we used data sets from 1990 and 2000 from the CORINE land cover project (CLC1990 and CLC2000; http://dataservice.eea.europa.eu/dataservice).

GARP (with best subsets option; Anderson et al. 2003) and Maxent (default options) were used to generate initial distributional predictions for each species based on ecological niche estimates. Models were calibrated based on a random 50% of available occurrence data, and tested using the other 50% using the AUC (Fig. 3). All AUC values were higher than 0.7, the great majority being above 0.8, in theory indicating that the initial potential distributional hypotheses were better as predictors than random models. Finally, these niche models were projected onto the future climate scenarios, and areas projected as falling in zones of likely marine intrusion (6 m sea level rise scenario) were identified and excluded.
Fig. 3

Results of the validation exercise presented as a frequency histogram of different values of receiver operating characteristic areas under the curve (AUC), indicating excellent predictive ability (i.e., AUC > 0.8) for 19 of the 20 species analyzed. We used random samples of 50% of available occurrence data to perform quantitative validations of model performance, based on relatively independent occurrence data. Models were calibrated based on half of the occurrence data, and tested using the other half

To estimate actual distributions, raw niche models were converted into presence–absence maps by means of thresholds that included 95% of the presence records on which the model was based (to accommodate 5% error rates that may have escaped the data cleaning process), and we took the intersection of the thresholded Maxent and GARP models for each species to identify a consensus predicted area. The resultant binary maps were trimmed according to the present biogeographic knowledge of the species, based on likely dispersal barriers that limit distributional possibilities of each species. For example, the spider Macrothele calpeiana is endemic to the Iberian Peninsula; however, areas exist across Europe that are suitable for the species from a climatic point of view. We eliminated all suitable areas falling outside of the accessible region for the species based on assumptions regarding dispersal (i.e., that species can disperse through continuous suitable areas, but not jump over areas of unsuitable conditions). Finally, using known habitat requirements of each species, all locations holding unsuitable land cover types were also eliminated; we filtered distributional areas using CLC1990, and the resulting map was then filtered again using CLC2000 data.
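The three operations described above (consensus, trimming by accessibility, and land cover filtering) can be illustrated on small boolean grids. The sketch below is not the GIS workflow actually used: the function names are hypothetical, the accessibility rule is approximated as four-neighbour connectivity to cells holding known occurrences, and the land cover class labels are placeholders rather than CORINE codes.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Grid = List[List[bool]]


def consensus(model_a: Grid, model_b: Grid) -> Grid:
    """Cell-wise intersection of two thresholded (presence/absence) maps."""
    return [[bool(a) and bool(b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(model_a, model_b)]


def trim_to_reachable(suitable: Grid, occurrences: Iterable[Tuple[int, int]]) -> Grid:
    """Keep only suitable cells connected, through suitable cells, to a cell
    with a known occurrence (breadth-first search, four neighbours)."""
    n_rows, n_cols = len(suitable), len(suitable[0])
    reached = [[False] * n_cols for _ in range(n_rows)]
    queue = deque()
    for r, c in occurrences:
        if suitable[r][c] and not reached[r][c]:
            reached[r][c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < n_rows and 0 <= cc < n_cols
                    and suitable[rr][cc] and not reached[rr][cc]):
                reached[rr][cc] = True
                queue.append((rr, cc))
    return reached


def filter_land_cover(distribution: Grid, land_cover: List[List[str]],
                      suitable_classes: Set[str]) -> Grid:
    """Remove cells whose land cover class is not considered suitable."""
    return [[d and (lc in suitable_classes) for d, lc in zip(row_d, row_lc)]
            for row_d, row_lc in zip(distribution, land_cover)]
```

Applying `filter_land_cover` once with a 1990 layer and again with a 2000 layer mirrors the sequential filtering described above; the difference between the two results gives the area lost to land cover change.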

We compared the initially trimmed (i.e., to eliminate areas outside the dispersal “reach” of the species) maps and the filtered ones (i.e., also excluding land cover categories not suitable for the species) with the distributional estimates developed via expert opinion under Article 17 of the Habitats Directive, provided by the EEA. Here, we used the fuzzy global pattern matching approach. The filtered maps showed close matching of patterns to the Article 17 maps of each species’ distribution, whereas the unfiltered maps, while still better than random (fuzzy global matching >0.0), did not match closely (Fig. 4). This reduced matching clearly resulted from inclusion of many areas not holding appropriate land cover types in the latter datasets. Overall, though, results indicate that our ENM approach and subsequent filtering and trimming produced useful estimates of species’ distributional potential, corresponding closely to expert-opinion-based estimates that took considerably greater effort and resources.
Fig. 4

Comparison of trimmed niche model-based distributional estimates and trimmed and filtered estimates with the expert-derived maps from Article 17, across the 20 species analyzed, presented as box plots (horizontal line median; box 25th–75th percentiles; whiskers 1.5 times the interquartile range; dots outliers) of fuzzy matching statistics.
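To convey the general idea behind fuzzy map comparison, the sketch below computes a simpler, neighbourhood-tolerant agreement score between two binary maps. It is not the fuzzy global matching statistic used in the comparison above: it is a stand-in in which a cell earns partial credit when the other map holds the same category nearby, with credit halving over an assumed distance. The function names, the radius, and the halving distance are all hypothetical, and the score has no chance correction, so random maps do not score 0.0 as they would under the statistic reported in Fig. 4.

```python
import math

def one_way(map_a, map_b, radius, halving):
    """For each cell of map_a, the best distance-decayed match to a cell of
    the same category in map_b within the given radius (in cells)."""
    rows, cols = len(map_a), len(map_a[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            best = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and map_b[nr][nc] == map_a[r][c]):
                        dist = math.hypot(dr, dc)
                        best = max(best, 0.5 ** (dist / halving))
            row.append(best)
        out.append(row)
    return out

def fuzzy_agreement(map_a, map_b, radius=2, halving=1.0):
    """Mean over cells of the two-way similarity (minimum of both directions)."""
    ab = one_way(map_a, map_b, radius, halving)
    ba = one_way(map_b, map_a, radius, halving)
    cells = [min(x, y) for ra, rb in zip(ab, ba) for x, y in zip(ra, rb)]
    return sum(cells) / len(cells)

# Two hypothetical maps in which the occupied cell is shifted by one cell.
model_map = [[1, 0, 0]]
expert_map = [[0, 1, 0]]
print(fuzzy_agreement(model_map, expert_map))            # tolerant of the shift
print(fuzzy_agreement(model_map, expert_map, radius=0))  # strict cell-by-cell
```

The contrast between the two calls shows why a fuzzy comparison suits this setting: expert-drawn and model-derived range maps rarely align cell for cell, yet may describe the same geographic pattern.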

Finally, we concatenated all of the above steps into a single, combined, raster data set to summarize the initial (trimmed) distributional estimate, projections onto future climates for 2020 and 2050, filtering to reflect suitable land use categories in 1990 and 2000, and marine intrusion effects. In this way, queries could be performed easily to assess impacts of different processes on the species’ distribution. These summaries also offered an easy view of the relative contributions of different factors and their combinations to loss of distributional area for the species included in this study (see example in Fig. 5).
Fig. 5

Combined summary of effects of land use, climate change, and marine intrusion on the geographic distribution of the butterfly Lycaena dispar. Top: map of distribution, showing the area lost by the different factors (black marine intrusion, dark gray land use changes, light gray climate change) and the remaining distribution (striped); bottom: histogram showing area lost as a function of each individual factor and of combinations of factors.
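One way to build such a combined, queryable data set is to store, for each cell of the initial distribution, a small code recording which factors remove it. The sketch below is an assumption about how this could be done, not a description of the raster actually assembled: the bit-flag encoding, the function names, the equal-area cells, and the toy grids are all hypothetical.

```python
from collections import Counter

# Hypothetical bit flags, one per factor that can remove a cell.
CLIMATE, LAND_USE, MARINE = 1, 2, 4
NAMES = {CLIMATE: "climate change", LAND_USE: "land use", MARINE: "marine intrusion"}

def combine(initial, climate_ok, land_ok, not_flooded):
    """Per-cell code: None outside the initial range, otherwise the sum of
    flags for every factor that removes the cell (0 means retained)."""
    codes = []
    for r, row in enumerate(initial):
        out = []
        for c, present in enumerate(row):
            if not present:
                out.append(None)
                continue
            code = 0
            if not climate_ok[r][c]:
                code |= CLIMATE
            if not land_ok[r][c]:
                code |= LAND_USE
            if not not_flooded[r][c]:
                code |= MARINE
            out.append(code)
        codes.append(out)
    return codes

def label(code):
    """Readable name for a code, e.g. 'climate change + land use'."""
    if code == 0:
        return "retained"
    return " + ".join(NAMES[flag] for flag in sorted(NAMES) if code & flag)

def summarize(codes, cell_area=1.0):
    """Area per factor combination, assuming equal-area cells."""
    counts = Counter(code for row in codes for code in row if code is not None)
    return {label(code): n * cell_area for code, n in counts.items()}

# Hypothetical 2 x 3 layers: initial range and three suitability masks.
initial = [[True, True, True], [True, True, False]]
climate_ok = [[True, False, False], [True, True, True]]
land_ok = [[True, True, False], [False, True, True]]
not_flooded = [[True, True, True], [True, False, True]]

codes = combine(initial, climate_ok, land_ok, not_flooded)
print(summarize(codes))
```

Because each combination of factors has a distinct code, the same raster supports both the map (color by code) and the histogram (area by code) shown in Fig. 5.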

Discussion and conclusions

In this study, we outline and provide a worked example of practical protocols for obtaining distributional estimates for a diverse set of species that can be incorporated into quantitative evaluations of regional status assessment, natural resource planning, and conservation prioritization exercises. We have used several approaches, but all of the steps are based on primary biodiversity data, geospatial environmental data, and software tools that are broadly and freely available. The procedures described herein can be used by investigators of diverse backgrounds with a modicum of basic training (e.g., GIS skills), and with varying levels of resources of time, data, and funding (evident from our ability to assemble this exercise with little in the way of existing data in hand).

That our analyses are feasible for implementation by many or all researchers is clear from the relative ease with which the products we have described were assembled. In fact, all of the data we used were assembled in a few hours of work, given some basic expertise with GIS tools, and similar data resources exist now for many terrestrial components of biodiversity. The distributional hypotheses we developed were all validated by means of data-splitting approaches, and 95% (i.e., 19 out of 20) showed what is generally considered ‘excellent’ predictivity, although considerable doubt exists as to the full utility of the AUC approach (Lobo et al. 2008; Peterson et al. 2008). What is more, our final distributional summaries showed close correspondence to the Article 17 distributional information, so the ENM approach and the Article 17 information converge closely on the same geographic patterns. The difference, however, is in the time and expense involved in assembling the information: the ENM work is quite efficient, and these estimates can be developed at least initially without involvement of taxonomic experts, so the involvement of experts can then be focused on refinement of initial estimates. In addition, the ENM estimates offer considerably improved quantitative detail over what is generally produced by expert opinion efforts (Seoane et al. 2005). The Article 17 summaries, in contrast, required extensive (and expensive) input from experts.

These methodologies also permit incorporation of likely effects of future phenomena affecting species’ distributions, such as climate change, marine intrusion, and land use change. Despite some well-founded criticisms (Dormann 2007), these approaches presently are the only option available for such future-scenario explorations, and have considerable potential to provide proactive perspective on the species’ distributional potential (Pearson and Dawson 2003). Because validation of such predictions is not easy, given that the phenomenon being forecast takes place in the future, investigators have had to depend on special opportunities for model validation: e.g., retropredictions to the Last Glacial Maximum for species for which distributional information exists for the Pleistocene (Martínez-Meyer et al. 2004). A few opportunities have permitted validation of these projections over shorter time periods (Araújo et al. 2005; Foden et al. 2007). Hence, although our preference would be for more validation, it is clear that useful information results from these future projections of ENM hypotheses, information that is not available with other approaches.

A central challenge in biodiversity and natural resources planning is that of assessing where progress is being made, or where negatives outweigh positives. In particular, the 2010 target prioritized reducing rates of biodiversity loss, but assessing progress toward this goal has not been easy. Although several indicators have been proposed (Butchart et al. 2004, 2005; Loh et al. 2005), they have been highly dependent on global status lists or complex indices; as such, they have not been broadly applicable, scalable, or accessible to countries outside of Western Europe and North America. Our approach offers a more flexible and broadly applicable alternative (Soberón and Peterson 2009): ENM approaches are integrated with multitemporal land cover and climate estimates, and range loss or gain is tracked via integration through time. The result is a simple and highly accessible approach that can track single species or customized sets of species, within particular regions or globally, and thus is much more adaptable to goals such as the species-based portions of the 2010 target.

Our positive result in this exploratory suite of analyses is qualified only in that the knowledge that our approaches yield is dependent on the quality and quantity of information that is available. That is, information products will give better results as the input information gets denser, richer, and more detailed (Wisz et al. 2008). In this sense, the success of our efforts should be a call for even broader participation in efforts aimed at sharing primary, research-grade biodiversity data, as has been the focus of the Global Biodiversity Information Facility and other distributed biodiversity information networks. More generally, however, the approaches explored and illustrated in this paper can provide detailed information on the distributional potential of large suites of species in the face of changing landscapes, at diverse scales of time and space. This information will be quite useful in measuring success of the 2010 target and other such agreements aimed at reducing biodiversity loss.

Acknowledgments

This work was stimulated by discussions with Dr. Rania Spyropoulou of the European Environment Agency, and was supported by a contract from that organization. A.J.-V. was further supported by a postdoctoral fellowship from the Ministerio de Educación y Ciencia, Spain (Ref.: EX-2007-0381). A.L.-N. received graduate student support from the Consejo Nacional de Ciencia y Tecnología, México (189216). A.T.P. and J.S. were supported by a grant from Microsoft Research.

Copyright information

© The Ecological Society of Japan 2010