Using crisp and fuzzy modelling to identify favourability hotspots useful to perform gap analysis
- Cite this article as:
- Estrada, A., Real, R. & Vargas, J.M. Biodivers Conserv (2008) 17: 857. doi:10.1007/s10531-008-9328-1
In this study, we propose the use of a favourability function to perform Gap Analysis. To exemplify this, we modelled the distribution of terrestrial mammal species in Andalusia (southern Spain) on the basis of their presence/absence on a grid of 10 km × 10 km UTM cells (n = 961). Using logistic regression and 30 variables related to the environment, space and human influence, we obtained probabilities of occurrence for each species in each cell. We computed a crisp favourability index by considering an area favourable or unfavourable for a species if the probability of occurrence was higher or lower than the species prevalence, respectively. We also used a favourability function and fuzzy logic to level all species to the same threshold of favourability, which allowed us to compare and combine species distributions. Adding up the fuzzy favourability values for each species in each cell, we obtained a fuzzy favourability index that we compared with species richness (sum of species in each cell) and with the crisp favourability index. We performed Gap Analysis by overlaying these results on the current reserve network of Andalusia. Gaps were grouped in fewer and bigger zones after applying the favourability indices. Considerations and recommendations for the use of the favourability function to select areas of conservation interest are discussed.
Keywords: Andalusia; Conservation planning; Fuzzy logic; Mammals; Natural reserve network; Spain; Spatial modelling
RENPA: Red de Espacios Naturales Protegidos de Andalucía (Natural Reserve Network of Andalusia)
UTM: Universal Transverse Mercator
TIN: Triangulated Irregular Network
TINS: Triangulated Irregular Network Surface
GIS: Geographic Information System
FDR: False Discovery Rate
AUC: Area Under the Curve
FEM: Fuzzy Envelope Model
ENFA: Ecological Niche Factor Analysis
The establishment of protected areas began in the 19th century, with the declaration of Yosemite as a state-protected natural reserve in 1864. Protected areas became increasingly relevant in nature conservation with the declaration of Yellowstone in 1872 as the first National Park in the world; Yosemite was upgraded to this category in 1890 (Greene 1987). In Spain, the first National Park was declared in 1918, and the first natural reserve in Andalusia was declared in 1929 (Consejería de Medio Ambiente 2004). The number of protected areas increased sharply during the second half of the 20th century. This created the need to evaluate the conservation value of different areas in order to prioritize their preservation in the context of the whole network of reserves.
The assessment and design of a protected area network are considered as important applications of Conservation Biogeography, which was defined by Whittaker et al. (2005) as: “the application of biogeographical principles, theories, and analyses, being those concerned with the distributional dynamics of taxa individually and collectively, to problems concerning the conservation of biodiversity”.
Gap Analysis is a methodology designed to compare the distribution of biodiversity with the network of protected areas in a territory. Scott et al. (1989, 1993) began to use this term in the early 1990s and applied Gap Analysis to identify which areas were most important for conservation and to prioritize which should be protected next. To assess the conservation value of an area it is necessary to have surrogates of biodiversity such as species richness (see, for instance, Araújo 1999; Yip 2004), rarity (Rey Benayas and de la Montaña 2003; Real et al. 2006b), vulnerability (de la Montaña and Rey Benayas 2002), maintenance of patterns and processes (Rouget et al. 2003), ecosystem representativeness (Sierra et al. 2002), or minimum area to support viable populations (Allen et al. 2001). Species richness is the most frequent measure of biodiversity (Brose et al. 2003; Grenyer et al. 2006) and one of the most important biological properties of a territory when evaluating its conservation status (Real 1992). However, species richness does not guarantee the correct protection of all species in a region, and the mere aggregation of species in an area does not imply that this area is important for these species.
Normally, the databases used in these studies are atlases of species distribution, which can be biased by uneven survey effort. As natural reserves are sometimes the most surveyed areas, they often appear in the atlases as supporting more species than the surrounding zones. Spatial modelling can attenuate this problem by providing the potential distribution of each species, which depends less on the survey effort (Real et al. 2006b). However, different species tend to have different prevalences in a territory, which biases their spatial models differently and precludes the joint use of the output models.
The favourability function proposed by Real et al. (2006a) levels all species at the same threshold of favourability, independently of their proportion of presences, and this allows direct comparison of the distribution of different species. In this approach, an area is not absolutely favourable or absolutely unfavourable for a species, but it has a degree of favourability. Fuzzy logic is applicable in this situation, as the process of environmental modelling can be understood as the identification of the grade of membership of each area to the fuzzy set of favourable areas for each species.
The term favourability, however, differs from habitat suitability or ecological niche. The favourability for the occurrence of a species may not represent the fundamental or the realized niche, and is not always related to the suitability of the habitat. According to the source-sink model (Pulliam 1988), for example, the geographical distribution of a species includes sink areas where the habitat is unsuitable but populations are maintained by dispersal from source areas. Hence distribution models may be describing an amalgam of realized niche (or suitable habitats) and sink areas close to sources (Austin 2002). Conversely, an area with high habitat suitability may be unfavourable for the occurrence of a species due to historic causes (past events that prevented the species from inhabiting the area).
The aim of this paper is to show how the favourability function can be used to perform fuzzy Gap Analysis. We detected important areas for the conservation of biodiversity in Andalusia, using mammal favourability hotspots as important conservation areas, and modelling the distribution of mammals with logistic regression and with the favourability function to obtain crisp and fuzzy favourability hotspots, respectively. We compared these results with those obtained using observed mammal richness.
Material and methods
The region supports 56 indigenous terrestrial mammal species, according to the Atlas of terrestrial mammals of Spain (Palomo and Gisbert 2002). Twenty-six of these species are threatened (Consejería de Medio Ambiente 2001a). The Cabrera vole (Microtus cabrerae) and the wolf (Canis lupus) are critically endangered in Andalusia (Consejería de Medio Ambiente 2001a) and the Iberian lynx (Lynx pardinus) is critically endangered in Spain and in the world (Palomo and Gisbert 2002; IUCN 2004).
The 10 km × 10 km UTM cell map (n = 961) was obtained by overlaying the digital outline of Andalusia on the digital UTM cell map of the Iberian Peninsula (which resulted from merging the maps used by Barbosa et al. (2003) for Spain and Portugal), using Cartalinx software. Presence/absence data for the 56 indigenous terrestrial mammals inhabiting Andalusia were taken from the Atlas of terrestrial mammals of Spain (Palomo and Gisbert 2002), except for those of the Iberian lynx (Lynx pardinus), which were taken from Guzmán et al. (2004) because these more recent data report a sharp reduction in the distribution of the species. Appendix 1 shows the list of mammal species inhabiting Andalusia.
Variables used to model mammal distribution in Andalusia
Cell area (m²)
Mean altitude (m) [a]
Slope (°) (calculated from Alti)
Mean relative air humidity in January at 07:00 (%) [b]
Mean relative air humidity in July at 07:00 (%) [b]
Annual relative air humidity range (%) (= |HuJan − HuJul|)
Mean annual potential evapotranspiration (mm) [b]
Mean annual actual evapotranspiration (mm) (= min [Prec, PET])
Mean annual insolation (h/year) [b]
Mean annual solar radiation (kW h/m²/day) [b]
Mean temperature in January (°C) [b]
Mean temperature in July (°C) [b]
Mean annual temperature (°C) [b]
Annual temperature range (°C) (= TJul − TJan)
Mean annual number of frost days (minimum temperature ≤ 0 °C) [b]
Mean annual number of days with precipitation ≥ 0.1 mm [b]
Mean annual precipitation (mm) [b]
Maximum precipitation in 24 h (mm) [b]
Relative maximum precipitation (= MP24/Prec)
Mean annual number of snow days [c]
Mean annual runoff (mm) [e]
Distance to the nearest highway (km) [f]
Distance to the nearest urban centre with more than 100,000 inhabitants (km) [f]
Distance to the nearest urban centre with more than 500,000 inhabitants (km) [f]
By performing logistic regression of each species presence/absence on each variable separately we selected a subset of variables significantly related to each species distribution. To control the increase in type I error due to multiple tests (Benjamini and Hochberg 1995; García 2003), we only accepted the variables that were significant under a False Discovery Rate (FDR) of q < 0.05, using the procedure for all forms of dependency among statistics proposed by Benjamini and Yekutieli (2001). We then performed forward–backward stepwise logistic regression of each species on the subset of significant predictor variables to obtain a multivariate logistic model. In forward steps the variable most significantly related to the residuals not explained by the previous variables is selected, but the significance of all the variables in the model is tested after each step of forward inclusion, and non-significant variables are excluded before the next forward selection step (Legendre and Legendre 1998).
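The variable-screening step can be illustrated with a minimal sketch of the Benjamini and Yekutieli (2001) step-up procedure. The function name and the example p-values are hypothetical and are not taken from the study; the threshold q = 0.05 follows the text.

```python
def benjamini_yekutieli(p_values, q=0.05):
    """Flag tests that remain significant under an FDR of q.

    Uses the Benjamini-Yekutieli correction, valid under any form of
    dependency among the test statistics. Returns one boolean per p-value.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic sum c(m) = 1 + 1/2 + ... + 1/m penalises arbitrary dependency.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p(k) <= k * q / (m * c(m)).
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            largest_k = rank
    significant = [False] * m
    for idx in order[:largest_k]:
        significant[idx] = True
    return significant


# Hypothetical p-values from four univariate logistic regressions.
example_p = [0.001, 0.2, 0.004, 0.5]
selected = benjamini_yekutieli(example_p)
```

Only the variables flagged as significant would then enter the stepwise multivariate regression.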
We built a crisp logic model by considering the logistic probability (P) of each species as favourable when P was higher than the prevalence (i.e. proportion of presences) and unfavourable when P was lower than the prevalence. Since probabilities depend on the prevalence of each species, a probability value of 0.5 can indicate a favourable zone if the species has a restricted range or an unfavourable zone if the species is common. The output of this crisp favourability model has two possible values: 1 for favourable areas and 0 for unfavourable areas.
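A minimal sketch of this crisp rule follows. The function names and the example prevalences are hypothetical; the text does not say how a cell with P exactly equal to the prevalence is treated, so assigning it 0 here is an assumption.

```python
def prevalence(presences):
    """Proportion of cells where the species is present (0/1 values)."""
    return sum(presences) / len(presences)


def crisp_favourability(probabilities, prev):
    """Return 1 where P exceeds the prevalence and 0 otherwise.

    Assumption: a cell with P exactly equal to the prevalence gets 0.
    """
    return [1 if p > prev else 0 for p in probabilities]


# The same probability, P = 0.5, read against two hypothetical prevalences.
rare_species = crisp_favourability([0.5], prev=0.1)    # restricted range
common_species = crisp_favourability([0.5], prev=0.8)  # widespread species
```

The example reproduces the point made above: P = 0.5 is favourable for the restricted-range species and unfavourable for the common one.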
We transformed all this procedure into a fuzzy model by applying the favourability function proposed by Real et al. (2006a) on the logistic regression output, so converting logistic probabilities (P) into favourability (F) values. F values are independent of prevalence, since a value of F = 0.5 is assigned to the predictor conditions for which P = prevalence (Real et al. 2006a). Favourability values higher than 0.5 correspond to areas where the probability of presence is higher than that expected according to prevalence, and the opposite occurs in areas with favourability below 0.5. Whereas P-values for different species are not comparable because of the prevalence bias, F-values are directly equivalent, and the models for all mammals are then levelled to the same threshold of favourability and can be compared and combined directly. So, a favourability value of 0.5 is a neutral value for all species. The favourability value for a species in a cell can be interpreted as the grade of membership of the cell to the fuzzy set of cells that are favourable for the species, and then the favourability function is the membership function, so allowing the use of concepts and applications of fuzzy logic to the resulting spatial analysis of the species. This approach has the advantage over other fuzzy modelling approaches (Robertson et al. 2004; Levinsky et al. 2007) that it is related to logistic regression, which most modellers are familiar with, and provides an explicit equation based on multivariate statistics, and thus allows for selecting among the predictor variables and assigns different importance to them.
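The conversion from P to F can be sketched as follows, assuming the form of the favourability function given by Real et al. (2006a): F = [P/(1 − P)] / [n1/n0 + P/(1 − P)], where n1 and n0 are the numbers of presences and absences. The function name and the example counts are hypothetical; only the total of 961 cells comes from the study.

```python
def favourability(p, n1, n0):
    """Convert a logistic probability P into a favourability value F.

    n1 and n0 are the numbers of presences and absences of the species.
    F = 0.5 exactly when P equals the prevalence n1 / (n1 + n0).
    """
    if p >= 1.0:
        return 1.0  # limit of the function as P approaches 1
    odds = p / (1.0 - p)
    return odds / (n1 / n0 + odds)


# Hypothetical rare species: 96 presences among 961 cells.
f_neutral = favourability(96 / 961, n1=96, n0=865)  # P equals the prevalence
f_rare = favourability(0.5, n1=96, n0=865)          # about 0.90
# Hypothetical common species: 769 presences among 961 cells.
f_common = favourability(0.5, n1=769, n0=192)       # about 0.20
```

With these counts, the same probability P = 0.5 maps to a high favourability for the rare species and a low one for the common species, which is what makes F values comparable across species.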
The discrimination power of the models was assessed by calculating their correct classification rate (CCR), sensitivity, and specificity, using the favourability value of F = 0.5 as classification threshold, and the Area Under the Curve (AUC) of the Receiver Operating Characteristic, which is independent of any favourability threshold (Hosmer and Lemeshow 2000).
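These four measures can be sketched as below. The function names and the example data are hypothetical. Treating F > 0.5 (rather than F ≥ 0.5) as a predicted presence is an assumption, and the AUC is computed here through its rank interpretation (the proportion of presence/absence pairs in which the presence has the higher score), which is one of several equivalent ways to obtain it.

```python
def classification_metrics(observed, favourability, threshold=0.5):
    """Return (CCR, sensitivity, specificity) at the given threshold.

    Assumes both presences (1) and absences (0) occur in `observed`.
    """
    tp = fn = tn = fp = 0
    for obs, f in zip(observed, favourability):
        predicted = 1 if f > threshold else 0  # assumption: strict inequality
        if obs == 1 and predicted == 1:
            tp += 1
        elif obs == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    ccr = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # presences correctly classified
    specificity = tn / (tn + fp)   # absences correctly classified
    return ccr, sensitivity, specificity


def auc(observed, scores):
    """Area under the ROC curve; ties between scores count as half."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical observations and favourability values for six cells.
obs = [1, 1, 1, 0, 0, 0]
fav = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
ccr, sens, spec = classification_metrics(obs, fav)
area = auc(obs, fav)
```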
High values of index 1 represent species richness hotspots.
When applying index 3 we summed the favourability values directly, without transforming them into crisp values. High values of this index represent fuzzy favourability hotspots.
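The three indices can be illustrated with a short sketch. It assumes that index 1 sums the observed presences per cell, that index 2 sums the crisp favourability values (0 or 1) per cell, and that index 3 sums the F values per cell; the reading of index 2 is inferred from the description of the crisp model, and the species names and values are hypothetical.

```python
def sum_over_species(values_by_species):
    """Add up, cell by cell, the values of all species.

    `values_by_species` maps a species name to a list with one value
    per cell; all lists must follow the same cell order.
    """
    per_species = list(values_by_species.values())
    return [sum(cell_values) for cell_values in zip(*per_species)]


# Hypothetical data for two species in three cells.
observed = {"sp_a": [1, 0, 1], "sp_b": [1, 0, 0]}            # atlas records
crisp = {"sp_a": [1, 1, 1], "sp_b": [1, 0, 0]}               # 1 if P > prevalence
fuzzy = {"sp_a": [0.8, 0.6, 0.7], "sp_b": [0.9, 0.2, 0.4]}   # F values

index_1 = sum_over_species(observed)  # species richness
index_2 = sum_over_species(crisp)     # crisp favourability (assumed definition)
index_3 = sum_over_species(fuzzy)     # fuzzy favourability
```

In this toy example the second cell has no recorded species but still receives a non-zero value under both favourability indices.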
Only cells with at least 25% of their area covered by the RENPA were considered as protected (Fig. 2b). We performed Gap Analysis by overlaying those cells on the 20% of cells with the highest values after applying the indices. We set the threshold at 20% of the territory because the RENPA covers approximately this proportion of Andalusia. Cells above this threshold that were not protected were considered gaps in the protection of mammals.
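The overlay can be sketched as follows. The function name, parameter names, and example values are hypothetical. The text does not say how ties at the 20% cut-off are handled; here they are broken by cell order, which is an assumption.

```python
def find_gaps(index_values, protected_fraction, top_share=0.20, min_protected=0.25):
    """Return the positions of hotspot cells that are not protected.

    index_values:       one index value per cell (richness or favourability)
    protected_fraction: share of each cell's area covered by the reserve network
    top_share:          proportion of cells regarded as hotspots
    min_protected:      minimum covered share for a cell to count as protected
    """
    n_cells = len(index_values)
    n_top = round(n_cells * top_share)
    # Rank cells from highest to lowest index value (stable for ties).
    ranked = sorted(range(n_cells), key=lambda i: index_values[i], reverse=True)
    hotspots = ranked[:n_top]
    return sorted(i for i in hotspots if protected_fraction[i] < min_protected)


# Hypothetical index values and protected shares for ten cells.
values = [5, 9, 1, 8, 2, 7, 3, 4, 6, 0]
covered = [0.0, 0.30, 0.0, 0.10, 0.0, 0.50, 0.0, 0.0, 0.0, 0.0]
gaps = find_gaps(values, covered)
```

In this example the two highest-ranking cells are the second and the fourth; only the fourth falls below the 25% coverage criterion, so it is the single gap.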
We did not obtain significant models for three species, which are marked with an asterisk in Appendix 1. The mean values of the discrimination power of the 53 remaining significant models were: 74.48% for the CCR, 74.36% for the sensitivity, 75.13% for the specificity and 0.81 for the AUC. According to the general rule proposed by Hosmer and Lemeshow (2000), this mean AUC value indicates excellent mean discrimination. All the modelled species were represented in the RENPA. At least 22% of the overall favourability in Andalusia for every species corresponded to areas covered by the RENPA.
Spatial models in Gap Analysis
Our results do not reflect the habitat suitability of the species. Habitat suitability models must be applied at a scale that is normally smaller than that used in our favourability models. A 100 km2 cell is a good resolution scale when the variation scale of the study is large, but it has a considerable variety of habitats inside, which advises against its use in habitat suitability models. However, the term habitat suitability has been applied elsewhere without taking into account these considerations (see, for instance, Thuiller et al. 2005) and using variables that are not characteristic of the habitat, such as, for example, the geographical location (Brotons et al. 2004).
Some species have scattered distributions in the Atlas of terrestrial mammals of Spain, which probably do not represent the status of the species but result from an insufficient survey effort. Spatial models mitigate this problem because areas with high scores tend to be together and predicted values decrease gradually to other areas with lower values (Barbosa et al. 2005). This is why different authors have applied spatial modelling to perform Gap Analysis (Allen et al. 2001; Pearlstine et al. 2002; Maiorano et al. 2006) or to select areas for species conservation (Araújo and Williams 2000; Araújo et al. 2002; van Teeffelen et al. 2006).
However, most spatial modelling techniques yield scores that are dependent on the prevalence of the species, which advises against their use in procedures such as Gap Analysis where every species should be levelled to the same threshold (Jiménez-Valverde and Lobo 2006). Species that have a restricted distribution due to ecological, geographical, or historical factors have a low probability value even in the favourable areas where they are present. Conversely, common species have a high probability of presence even in unfavourable areas. The favourability function overcomes this drawback, because F values are high where the conditions are favourable for the species independently of the prevalence.
Surrogates of biodiversity
Normally, the conservation value of a territory is established using different criteria, such as richness, rarity or vulnerability of the species inhabiting the area (Rey Benayas and de la Montaña 2003; Real et al. 2006b; Grenyer et al. 2006; Estrada et al. in press). In this study we compared richness hotspots with favourability hotspots, the latter meaning zones favourable for a high number of mammal species.
Although species richness has been considered a good surrogate of biodiversity in many studies (Real et al. 1993; Araújo 1999; Maes et al. 2005), some authors have shown that species richness is more closely related to the distribution of generalist species than to that of rare ones (Vázquez and Gaston 2004). However, the majority of the fuzzy favourability hotspots for mammals obtained in this study coincide with those of fuzzy mammal rarity in Andalusia (Real et al. 2006b).
Reserve-selection techniques based on complementarity seek to maximize representation of biodiversity within the limitations of cost. Hotspots of complementarity have been shown to tend to be located in areas with high local species richness and in areas concentrating species of restricted range size (Araújo and Williams 2001). Complementarity is a logical first procedure when a country or region lacks a natural reserve network, because with a limited budget the new network would represent all species in few areas. This is not the case in Andalusia, where more than 17,500 km² are protected, the reserve network is composed of 150 natural areas, and all modelled mammal species are represented in the RENPA.
In fact, there will never be a perfect surrogate or suite of surrogates of biodiversity (Groom et al. 2006). Results of our study should be analyzed together with those of fuzzy rarity (Real et al. 2006b), vulnerability, endemicity or representation for mammals and other groups of animals and plants in the area, to identify important zones for biodiversity conservation in Andalusia. In this context, fuzzy logic could be applied using the intersection and union operations of the fuzzy indices, and we would obtain important areas for all groups and indices simultaneously, and important areas for at least one of them, respectively.
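The combination of indices by fuzzy intersection and union can be sketched as below. The sketch assumes the standard fuzzy-set operators (minimum for intersection, maximum for union) and assumes that each index is first rescaled to the range 0 to 1 by dividing by its maximum so that it can be read as a membership degree; neither choice is specified in the text, and the index names and values are hypothetical.

```python
def rescale(values):
    """Rescale an index to [0, 1] by dividing by its maximum (assumption)."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]


def fuzzy_intersection(*memberships):
    """Cells important for all indices at once: minimum membership per cell."""
    return [min(cell) for cell in zip(*memberships)]


def fuzzy_union(*memberships):
    """Cells important for at least one index: maximum membership per cell."""
    return [max(cell) for cell in zip(*memberships)]


# Hypothetical membership degrees of three cells for two indices.
favourability_index = [0.9, 0.2, 0.6]
rarity_index = [0.4, 0.8, 0.6]
important_for_all = fuzzy_intersection(favourability_index, rarity_index)
important_for_any = fuzzy_union(favourability_index, rarity_index)
```

Here the first cell scores high in the union because of its favourability alone, whereas the third cell is the only one with a moderately high value in the intersection.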
Fuzzy favourability hotspots
The aggregation of the favourability for the whole set of species identifies the areas favourable for many species, whose conservation value is higher than that of areas supporting many species. It is important to protect zones with high favourability value because: (a) they can act as refuges for those mammal species that are rare in the whole territory, i.e., for species that generally face adverse environmental conditions; (b) they may represent source areas where birth rate is higher than mortality and which export individuals to sink areas (Pulliam 1988); and (c) even unoccupied favourable areas are important for conservation when metapopulation dynamics are involved (Levins 1969; Hanski and Simberloff 1997; Muñoz et al. 2005). These arguments are relevant for individual species, but we argue that areas that fulfil these conditions for a high number of species merit attention for conservation purposes.
Species richness should not always be identified with favourability hotspots. An area supporting relatively few species may be highly favourable for them, so obtaining a high favourability score, and if an area favourable for a species is unoccupied, then it counts towards a favourability hotspot but not towards species richness. On the other hand, at least three processes may lead to high species richness in generally unfavourable areas: (a) generalist species tend to be present even in areas with low favourability values; (b) sink areas for many species may overlap; and (c) some areas not particularly favourable for any species may be more thoroughly surveyed than more favourable areas.
The Atlas of terrestrial mammals of Spain is probably the best source available to know the distribution of mammals in Andalusia, but 36 cells have no species reported in the atlas, whereas numerous scattered individual cells stand out for having many species reported. This reflects the general fact that our knowledge of the distribution of species or other individual biodiversity entities is far from complete (Ferrier 2002). However, in our analysis, the number of important cells covered by the reserve network was higher after applying the favourability function, although this function was applied partly to correct a possible sampling bias in favour of protected areas. This seems to indicate that the survey effort was biased not towards protected areas but towards areas outside them. Although the criteria used to establish protected areas in Andalusia were other than the richness or the favourability for the species, this analysis shows that the natural reserve network was generally well distributed in relation to the areas favourable for many species or, at least, better located than the mere visualization of rich areas in the atlas would suggest.
Robertson et al. (2004) applied a fuzzy classification technique (fuzzy envelope model, FEM) for predicting species’ distributions by using presence-only locality records. Areas with higher values after applying the FEM technique were interpreted as holding more favourable conditions for the organism. The favourability function proposed by Real et al. (2006a) also obtains favourable areas for the species but using presence–absence data, in a similar way as logistic regressions do. Consequently, the favourability function is to logistic regression approximately what FEM is to ENFA: the favourability function and FEM belong to the scope of fuzzy logic, whereas logistic regression and ENFA belong to the scope of crisp logic. Robertson et al. (2004), as well as other authors (Brotons et al. 2004), pointed out that, in cases where absence records are available, models built using presence–absence data may perform better than presence-only data models. We used the favourability function because absence data may provide useful information about the conditions unfavourable for the species.
To protect the 20% of the territory with the highest richness value considering index 1, it would be necessary to establish many and dispersed protected areas in Andalusia, while fewer, bigger, and more concentrated reserves would be needed to cover the gaps detected with the crisp and fuzzy favourability modelling. Similar results were obtained after applying Gap Analysis to the rarity and fuzzy rarity of mammals (Real et al. 2006b), and to the richness and fuzzy favourability hotspots of amphibians (Estrada et al. in press). Fuzzy modelling is then more appropriate to implement the idea of Ferrier (2002), who affirmed that conservation areas not only should be selected to represent as many elements of biodiversity as possible, but also should be sufficiently large and well connected to promote long-term persistence of this diversity.
The establishment of important areas for mammal richness in Andalusia should not be based on a rigid methodology, but on a flexible procedure, which should be able to change in space and time. Maes et al. (2005) recommended that policy makers should make more use of modelling techniques as a proactive conservation tool, because they allow better targeting of sites predisposed to be included in a natural reserve network. In the same way, Ferrier (2002) proposed integrating biological and environmental data through predictive modelling as a strategy that may help alleviate some of the problems associated with using remotely mapped surrogates in conservation planning. As the favourability function produces an output value for each species in each cell in relation to environmental conditions, it is sensitive to changes of these conditions in space and even in time according to climate or land use change scenarios (Audsley et al. 2006; Thuiller et al. 2006). Consequently, this modelling method may be an important tool to assess both the present and future conservation value of a territory and to foresee the future design of reserve networks.
This work was conducted as part of the project CGL2006-09567/BOS (I+D project) funded by the Ministerio de Educación y Ciencia (Spain), financed jointly by the FEDER; and the project P05-RNM-00935, funded by the Consejería de Innovación, Ciencia y Empresa (Junta de Andalucía, Spain). A. Estrada is a PhD student with a grant from this administration.