Reassessing Neotropical angiosperm distribution patterns based on monographic data: a geometric interpolation approach

Raedig, Claudia; Dormann, Carsten F.; Hildebrandt, Anke; Lautenbach, Sven

doi:10.1007/s10531-010-9785-1

Original Paper
Open access
Published: 28 January 2010

Volume 19, pages 1523–1546, (2010)

Biodiversity and Conservation

Claudia Raedig¹, Carsten F. Dormann²,³, Anke Hildebrandt⁴ & Sven Lautenbach²


Abstract

Monographic data rely on specimens deposited in herbaria and museums, which have been thoroughly revised by experts. However, monographic data have rarely been used to map species richness at large scales, mainly because of the difficulties caused by spatially heterogeneous sampling effort. In this paper we estimate patterns of species richness and narrow endemism, based on monographic data of 4,055 Neotropical angiosperm species. We propose a geometric interpolation method to derive species ranges at a 1° grid resolution. To these ranges we apply an inverse distance-weighted summation scheme to derive maps of species richness and endemism. In the latter we also adjust for heterogeneous sampling effort. Finally, we test the robustness of the interpolated species ranges and derived species richness by applying the same method with a leave-one-out cross-validation (LOOCV). The derived map shows four distinct regions of elevated species richness: (1) Central America, (2) the Northern Andes, (3) Amazonia and (4) the Brazilian Atlantic coast (‘Mata Atlântica’). The region with the highest estimated species richness is Amazonia, with Central America following closely behind. Centers of narrow endemism are located over the entire Neotropics, several of them coinciding with regions of elevated species richness. Sampling effort has a minor influence on the interpolation of overall species richness, but it substantially influences the estimation of regions of narrow endemism. Thus, in order to improve maps of narrow endemism and resulting conservation efforts, more collection and identification activity is required.

Introduction

Species distribution patterns enable scientists and conservation planners to estimate centers of biodiversity (e.g. Williams et al. 1996; Kress et al. 1998; Barthlott et al. 2005) and to identify priority areas for conservation actions (e.g. Davis et al. 1997; de Oliveira and Daly 1999; Schatz 2002; Tobler et al. 2007). Species confined to very small distribution areas, so-called narrow endemic species (Williams et al. 1996; Andersen et al. 1997), pose important conservation issues due to their vulnerability to extinction (Gentry 1986; Knapp 2002). Due to insufficient data collection and heterogeneous sampling effort, distribution patterns in the Neotropics are still poorly described (Kress et al. 1998; Bates and Demos 2001; Hopkins 2007; Morawetz and Raedig 2007). Moreover, the number of Neotropical angiosperm species is exceptionally large, estimated at up to 90,000 species (Raven 1988; Thomas 1999; Smith et al. 2004), making compilation of all species distributions a daunting task. Amazonia, the largest and least accessible part of the Neotropics, still harbors many regions where no plants have been collected at all; Schulman et al. (2007) reported 43% of Amazonia as devoid of botanical collections and an additional 28% as poorly collected. Species with limited or low occurrence are more likely to remain undiscovered, thus impeding the assessment of the distribution of narrow endemic species.

Because large areas are generally under-sampled, different techniques have been applied to map distribution patterns at large scales. The first essential steps toward estimating plant biodiversity at the global scale were made by Davis et al. (1997) and Barthlott et al. (1999, 2005) using inventory-based data. These inventories are summary data for geographic units of varying size, mainly based on floras, regional species accounts, local checklists and plot-based data. Whereas Davis et al. (1997) collected information on all of their 234 priority sites and created sub-maps centered on these sites, Barthlott et al. (1999, 2005) estimated plant species richness for standardized units of area (10,000 km²) to derive global maps of plant species richness. In both studies, the Neotropics were indicated to be species-rich, but it was also noted that underlying collection data are lacking for vast parts of Amazonia (Kier et al. 2005; Kreft and Jetz 2007).

As an alternative to inventory-based analyses of species richness, distribution patterns can also be obtained by overlaying maps of geographic ranges of individual species, henceforth referred to as species ranges. Basically, species ranges correspond to regions where occurrences of individuals of the species have been recorded (Gaston 1991), but various more sophisticated concepts of deriving species ranges from occurrence data exist (Lomolino et al. 2006). For the Neotropics, two approaches to estimate angiosperm species ranges and species richness patterns have been applied. These are exclusively based on species occurrence records and do not rely on a summary of different data sources. Hopkins (2007) studied ranges of 1,584 Amazonian species at 1° grid resolution. Here, species ranges were generated by extrapolating from point occurrence data sets, if neighbor occurrences were positioned within the maximum distance of roughly 500 km. The superposition of the thus derived species ranges yielded a species richness map of known species that recognized large parts of the Amazon basin as species-rich. At the same time it displayed a bias for better collected areas. In addition to this approach based on species ranges, Hopkins (2007) modeled species richness based on species numbers, using the same maximum distance of roughly 500 km. In both approaches, this predefined limit can lead to overestimation of species ranges and of species numbers.

For the entire Neotropics, Morawetz and Raedig (2007) analyzed data of 3,715 angiosperm species to identify centers of diversity and narrow endemism. Species occurrences were overlaid onto a 1° grid and merged into the respective grid cells (quadrats). This point-to-grid conversion yielded species ranges with a high degree of range porosity. In contrast to the method applied by Hopkins (2007), this approach is prone to an underestimation of species ranges.

Point data, such as museum and herbarium specimen data, have proven useful for the generation of species ranges (Williams et al. 1996; Kress et al. 1998; Schatz 2002; Willis et al. 2003; Graham et al. 2004). However, they also have some inherent drawbacks, such as heterogeneous sampling of space and taxa because of varying accessibility of areas and attractiveness of taxa to collectors (Nelson et al. 1990; Graham et al. 2004; Schulman et al. 2007; Sheth et al. 2008) and systematic inaccuracy (Meier and Dikow 2004; Hopkins 2007; Tobler et al. 2007). These problems can in part be avoided by using revised specimen data, which were reviewed by expert taxonomists and published in the form of monographs, so-called monographic data (Thomas 1999; Knapp 2002; Hopkins 2007). After reviewing the available data, we found that monographic distribution data are the most promising, because of their taxonomic correctness and reference to large areas. Since survey data on angiosperm species do not cover such a large area, monographic data represent an alternative. However, these data are difficult to analyze, since standard methods used for abundance data cannot be applied.

Species ranges derived from point data are subject to uncertainty that originates not only from the underlying data but also from the construction method. Examples of techniques for the estimation of species ranges are the convex hull (Willis et al. 2003; Sheth et al. 2008), the minimum spanning tree (Hernández and Navarro 2007) or the minimum bounding box (Graham and Hijmans 2006). Generating species ranges by means of a convex hull often results in overestimation of species ranges (Burgman and Fox 2003) and ignores disjunct distribution patterns, particularly for widespread species. A refined method is the use of the alpha hull (Edelsbrunner et al. 1983; Burgman and Fox 2003), which is based on a triangulation approach. When applying the alpha hull, first, the average distance between the occurrence points is calculated. For the resulting alpha hull, only those occurrences are considered which are connected by a line no longer than a multiple (termed α) of this average line length. Depending on the choice of α, constructed ranges resemble either coarser (larger α, maximum size: convex hull) or finer (smaller α, minimum size: point) alpha hulls. Another widely used method for the estimation of species ranges is the ecological niche modeling approach. This approach relates species occurrences to site conditions such as climate variables (the predictor set) and predicts species ranges based on the pattern of these auxiliary variables.

So far, detailed species richness maps based on species ranges of large numbers of species cover only parts of the Neotropics or lack quantification of uncertainty due to heterogeneous sampling effort over area (Kress et al. 1998; Hopkins 2007; Morawetz and Raedig 2007; Schulman et al. 2007). Here we introduce an interpolation approach which can be applied to scant data and which requires nothing more than the available species occurrence data. Our goal is to make the application of this approach independent of detailed knowledge of the ecological demands of the species. The resulting patterns are only an approximation of ‘real’ distribution patterns, but they are produced in a standardized, reproducible way.

The aim of this study is (i) to present a method tailored to map distribution patterns of Neotropical angiosperm species based on scarce, yet taxonomically reliable monographic occurrence data, (ii) to estimate the distribution patterns of Neotropical angiosperm species and (iii) to explore whether the method presented is appropriate for the identification of centers of diversity and narrow endemism.

Methods

Our analysis is based on distribution data of angiosperm species taken from monographs or similar thoroughly revised treatments covering the Neotropical realm (see Appendix 1). The database was presented in a previous work (Morawetz and Raedig 2007) and since then has been complemented with a further 340 species. It now contains 4,055 species, in 230 genera and 66 families, with ~77% woody and 23% herbaceous species. Species occurrence data were taken from distribution maps and transferred to a grid with 1° grid resolution containing 2,519 quadrats sized ~100 km × 100 km (varying from 12,550 km² at the equator to 8,250 km² at Tierra del Fuego). The species recorded in the database represent about 5% of all Neotropical angiosperm species. It should be stressed that species richness numbers and patterns derived here are indices of species richness, not estimates of absolute numbers.

Due to the special characteristics of our database, we had to design a novel interpolation approach. Firstly, because our data set only includes presence data (not presence/absence data), the choice of suitable habitat quality models was already strongly limited (e.g. Graham et al. 2004; Phillips et al. 2006). Secondly, many species are represented in very few quadrats. Although ecological niche models have successfully been applied for species with only five records (Pearson et al. 2007), excluding species with fewer than five occurrences would remove about 50% of the species of our data set. Thirdly, the rule of thumb that each explanatory variable requires about ten data points (Harrell 2001; Reineking and Schröder 2006) would exclude 90% of the species in our database, even if we used a small predictor set of only three environmental variables. Therefore, ecological niche modeling is not suitable for our data set. Furthermore, the species richness pattern of the point-to-grid data (Fig. 3a) shows a strong bias towards easily accessible areas. A generalized additive model (GAM; Wood 2006) with species richness as the response and distance to cities, distance to rivers and distance to coasts as explanatory variables explained a significant amount of the variance (explained deviance 0.39 for the Neotropics and 0.51 for Amazonia). Thus, we opted for a geometric interpolation-based approach to deduce species richness patterns. A requirement for this approach was the possibility to correct for heterogeneous sampling effort. In the absence of an independent validation data set, a further requirement was that the resulting species richness patterns could be validated.

Interpolating species ranges

The species occurrences contained in our database were overlaid with a grid (Fig. 1a). However, this point-to-grid data set is incomplete as it only contains occurrences of species which actually have been found, in quadrats that have actually been visited. We expect the actual species ranges to be much larger. Thus, based on the centroids of these quadrats, a conditional triangulation similar to the alpha hull approach was performed: if a point was less than a given interpolation distance d away from two other points, a triangle was created and added to the triangle set (Fig. 1b). If only two points were within the given interpolation distance d, and thus no triangle could be built, a line between these two points was created (Fig. 1c). Triangle and line sets as well as points (which could not be interpolated due to missing neighbor occurrences) were combined and the set of corresponding quadrats was identified as the interpolated species range for a given distance d (Fig. 1d). As an extension to the alpha-hull approach (Edelsbrunner et al. 1983; Burgman and Fox 2003), not only the polygons of the triangulation but also the lines and points were considered. Thereby we avoided the problem of exclusion of narrow endemic species from analysis.
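The conditional triangulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, quadrats are represented by integer (column, row) indices of their centroids, distances are measured in quadrat units, a triangle is built from a point and any two neighbours closer than d (a literal reading of the text), and a quadrat is assigned to a triangle if its centroid lies inside or on it. Triangle edges are rasterised in the same way as stand-alone lines.

```python
from itertools import combinations
from math import dist, floor


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign gives the side of o->a on which b lies."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _triangle_cells(a, b, c):
    """Quadrats whose centroid lies inside or on the triangle a-b-c."""
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    cells = set()
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            s = (_cross(a, b, (x, y)), _cross(b, c, (x, y)), _cross(c, a, (x, y)))
            if all(v >= 0 for v in s) or all(v <= 0 for v in s):
                cells.add((x, y))
    return cells


def _line_cells(a, b, steps_per_cell=4):
    """Quadrats hit when sampling the segment a-b at sub-quadrat spacing."""
    n = max(1, int(dist(a, b) * steps_per_cell))
    return {
        (floor(a[0] + (b[0] - a[0]) * t / n + 0.5),
         floor(a[1] + (b[1] - a[1]) * t / n + 0.5))
        for t in range(n + 1)
    }


def interpolate_range(occupied, d):
    """Interpolated range of one species for interpolation distance d."""
    occupied = set(occupied)
    cells = set(occupied)  # points without neighbours stay in the range
    for p in occupied:
        near = [q for q in occupied if q != p and dist(p, q) < d]
        for q in near:                      # lines to every close neighbour
            cells |= _line_cells(p, q)
        for q, r in combinations(near, 2):  # triangles with two close neighbours
            cells |= _triangle_cells(p, q, r)
    return cells


occurrences = {(0, 0), (2, 0), (0, 2), (10, 10)}
print(sorted(interpolate_range(occurrences, d=3)))
```

Because distinct centroids are at least one quadrat apart and the comparison is strict, d = 1 returns the point-to-grid range unchanged in this sketch, which matches the role of the smallest distance class as the original data.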

From species ranges to weighted species richness

After processing all species for one distance class i, the interpolated species ranges were summed across all species, creating an estimate of species richness S_i. Results were calculated for the distance classes i = 1, 2, 3, …, 10. These species richness grids S_i were combined by performing an inverse distance-weighted approach according to:

$$ S_{w} = \sum\limits_{i = 2}^{10} d_{i}^{-p} \cdot \left( S_{i} - S_{i - 1} \right) + S_{1} $$

(1)

with p > 0 and d ≥ 1. S_1 is the original point-to-grid species richness grid, S_w is the grid of the resulting weighted species richness and d_i is the distance (d_2 = 2, d_3 = 3, …) used as a threshold in the conditional triangulation.

For each distance class, the increase in species richness relative to the next smaller distance class was calculated for each quadrat and multiplied by a weighting term $d_i^{-p}$. Here, p is a tuning parameter of the weighting procedure applied to the quadrats. For each p > 0 and d ≥ 1, the corresponding weighting term lies between 0 and 1. The greater p becomes, the more relative weight is put on species richness calculated for smaller distances. The closer p is to 0, the more relative weight is put on species richness interpolated for larger distances (see Appendix 2). For the present work, we selected p = 0.5, which resulted in a combination of high weights for small distances and relatively low weights for large distances. The weighted differences between the distance classes were then added to the original point-to-grid data (S_1), yielding the map of weighted species richness S_w. Species richness centers were identified as contiguous areas of quadrats with S_w > 100, i.e. more than 100 interpolated species.
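To make Eq. 1 concrete, the following sketch evaluates the weighted species richness for a single quadrat. It is illustrative only: the function name and the example values are hypothetical, and it assumes d_i = i, as suggested by d_2 = 2, d_3 = 3 in the text.

```python
def weighted_richness(s_by_distance, p=0.5):
    """Eq. 1 for one quadrat.

    s_by_distance[0] is S_1 (point-to-grid richness), s_by_distance[1] is S_2,
    and so on. Assumes d_i = i.
    """
    s_w = s_by_distance[0]
    for d in range(2, len(s_by_distance) + 1):
        gain = s_by_distance[d - 1] - s_by_distance[d - 2]  # S_i - S_(i-1)
        s_w += d ** (-p) * gain
    return s_w


# Hypothetical quadrat: 10 recorded species, 14 at d = 2, 23 at d = 3.
richness_per_class = [10, 14, 23]
for p in (0.5, 1.0, 2.0):
    print(p, round(weighted_richness(richness_per_class, p), 2))
```

Larger values of p shrink the contribution of the higher distance classes, so the result moves toward S_1; values of p near 0 move it toward the richness of the largest distance class.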

Adjusting weighted species richness for sampling effort

We addressed the impact of uneven spatial sampling effort by incorporating an additional weighting factor. This factor is based on the ratio of the number of species recorded in a quadrat to the maximum number of species recorded in any quadrat of the same center of species richness C on the original point-to-grid map [S_1/max_C(S_1)]. This ratio between the species number of a quadrat and that of the respective reference quadrat is used as a proxy for the relative sampling effort in each quadrat. The higher the relative sampling effort in a quadrat, the nearer this ratio is to 1, and hence the smaller the weight (1 − relative sampling effort) for the respective quadrat (Eq. 2). The higher the weight (relative sampling effort close to 0), the larger is the fraction of the interpolated species richness that enters the final estimate of species richness for that quadrat. The application of this correction factor to the inverse distance-weighted sum of species richness at the distances 2–10, added to the observed point-to-grid species richness S_1, is henceforth referred to as adjusted species richness S_adj.

$$ S_{\text{adj}} = \left( 1 - \frac{S_{1}}{\max_{C}(S_{1})} \right) \cdot \sum\limits_{i = 2}^{10} d_{i}^{-p} \cdot \left( S_{i} - S_{i - 1} \right) + S_{1} $$

(2)

where max_C is a function which returns for each quadrat the maximum species richness for the diversity center the quadrat belongs to.
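The effect of the sampling-effort weight in Eq. 2 can be illustrated for a single quadrat. The sketch below is not the authors' code: the function and parameter names are hypothetical, d_i = i is assumed, and the handling of quadrats that belong to no center of species richness is not covered.

```python
def adjusted_richness(s_by_distance, s1_max_in_center, p=0.5):
    """Eq. 2 for one quadrat.

    s_by_distance[0] is S_1, s_by_distance[1] is S_2, and so on (d_i = i assumed).
    s1_max_in_center is max_C(S_1), the highest point-to-grid richness in the
    center of species richness the quadrat belongs to.
    """
    s1 = s_by_distance[0]
    interpolated_gain = sum(
        d ** (-p) * (s_by_distance[d - 1] - s_by_distance[d - 2])
        for d in range(2, len(s_by_distance) + 1)
    )
    relative_effort = s1 / s1_max_in_center  # proxy for sampling effort
    return (1 - relative_effort) * interpolated_gain + s1


# Hypothetical values: a well-sampled reference quadrat keeps its observed
# richness, while an unsampled quadrat receives the full interpolated gain.
print(adjusted_richness([40, 50, 70], s1_max_in_center=40))
print(adjusted_richness([0, 4, 13], s1_max_in_center=40))
```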

Estimating the interpolation robustness by cross-validation

In the absence of a validation data set, we chose to estimate the robustness of the interpolation by performing a leave-one-out cross-validation (LOOCV). The interpolation steps were repeated on subsamples of the species points, leaving out each occurrence once, in order to cross-validate the interpolated species ranges (Efron and Gong 1983; Pearson et al. 2007). In contrast to the interpolation approach, this procedure generates floating point values that indicate how robust the estimated presence of a species in a quadrat is (Fig. 1e, f). For a detailed description of this approach, see Appendix 3. Dividing the resulting LOOCV estimate by the weighted interpolation estimate S_w yielded the mean robustness of the weighted species richness estimation per quadrat.
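The leave-one-out idea can be sketched for a single species as follows. The details of the procedure are given in Appendix 3 and are not reproduced here, so this is only one plausible reading: the function name is hypothetical, `interpolate` stands for any range interpolation that maps a list of occupied quadrats to a set of quadrats, and the robustness of a quadrat is taken to be the fraction of leave-one-out runs in which it is still part of the interpolated range.

```python
from collections import Counter


def loocv_presence(occurrences, interpolate):
    """Fraction of leave-one-out runs in which each quadrat is in the range.

    occurrences: occupied quadrats of one species.
    interpolate: callable returning the set of quadrats in the interpolated range.
    Species with a single occurrence cannot be cross-validated in this sketch.
    """
    occurrences = list(occurrences)
    n = len(occurrences)
    if n < 2:
        return {}
    counts = Counter()
    for left_out in range(n):
        subsample = occurrences[:left_out] + occurrences[left_out + 1:]
        counts.update(interpolate(subsample))
    return {cell: hits / n for cell, hits in counts.items()}


# With the identity "interpolation" (range = occupied quadrats), every quadrat
# survives all runs except the one in which it was left out.
print(loocv_presence([(0, 0), (1, 0), (2, 0)], interpolate=set))
```

Quadrats supported by several neighbouring occurrences obtain values close to 1, whereas quadrats that depend on a single record obtain lower values.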

Species ranges

So far we have focused on species richness, which originates from an overlay of species ranges. To detect the effect of interpolation on the range of each species, we calculated the weighted range size range_w by combining the interpolated species ranges for each distance (range_i) for each species (Eq. 3, derived from Eq. 1).

$$ \text{range}_{w} = \sum\limits_{i = 2}^{10} d_{i}^{-p} \cdot \left( \text{range}_{i} - \text{range}_{i - 1} \right) + \text{range}_{1} $$

(3)

Results are depicted as range size frequency distribution for the weighted range sizes (range_w) and are compared to the range size frequency distribution for individual distance classes.

Species richness of narrow endemic species

We used the same approximate definition for narrow endemic species as Gentry (1986): narrow endemics are those species for which the maximum interpolated range size was at most five quadrats (ca. 50,000 km², but the respective area varies with latitude between 41,250 and 62,750 km²). While the LOOCV was useful in validating the interpolated species ranges and derived species richness centers, it was not used for the validation of narrow endemism centers because it would exclude too many species (at least 80.5% of narrow endemic species).
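The area bounds quoted above follow directly from the quadrat sizes given in the Methods (8,250 to 12,550 km² per quadrat). A short check, with hypothetical names and with the five-quadrat limit read as an upper bound:

```python
# Quadrat areas from the Methods: smallest at Tierra del Fuego, largest at the equator.
QUADRAT_AREA_KM2 = (8_250, 12_550)
MAX_NARROW_ENDEMIC_QUADRATS = 5  # assumed to be an upper bound on range size


def is_narrow_endemic(max_interpolated_range_size):
    """True if the largest interpolated range covers at most five quadrats."""
    return max_interpolated_range_size <= MAX_NARROW_ENDEMIC_QUADRATS


def area_bounds_km2(n_quadrats):
    """Smallest and largest area covered by n_quadrats, depending on latitude."""
    smallest, largest = QUADRAT_AREA_KM2
    return n_quadrats * smallest, n_quadrats * largest


print(area_bounds_km2(MAX_NARROW_ENDEMIC_QUADRATS))  # (41250, 62750)
```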

Results

Species ranges

The range size frequency distribution of the original point-to-grid ranges (Fig. 2a) is highly right-skewed (skewness = 4.8), with a mean of 12.3 (±22.4 SD) and a maximum of 327 quadrats per species. Most species (3,995 = 99%) occur in fewer than 100 quadrats. With increasing interpolation distance d (see Eq. 1), both the mean and the maximum number of quadrats per species increase, reaching 59.6 ± 123.2 and 1,378 quadrats, respectively, for distance 10 (Fig. 2b–e). The combined inverse distance-weighted range size frequency distribution (Fig. 2f, according to Eq. 3) results in a mean of 32.6 ± 65.3, a maximum of 750.8 quadrats per species and a skewness of 4.1. While the mean value for d = 5 (Fig. 2c) is rather similar (33.3 ± 69.2), its range size frequency distribution has a higher skewness (4.5) and a higher maximum (831).

Species richness

Although our original point-to-grid species richness map (Fig. 3a) contains more species than the species richness map of a previous study (Morawetz and Raedig 2007), it identifies rather similar biodiversity centers. Point-to-grid species richness centers lie in Guatemala and adjacent regions, in Costa Rica and Panama reaching into the Chocó, in the Guyanas and at the border triangle of Venezuela, Colombia and Brazil. Moreover, they stretch along the Andes (with peaks in the Ecuadorian and Peruvian Andes), along the Amazon with peaks close to Iquitos, Manaus, Santarém and Belém, and along the Brazilian Atlantic coast (Fig. 3a). The combination of the species richness grids over all distances according to Eq. 1 yields the map of weighted species richness (Fig. 3b) and results in four prominent species richness centers: one in Central America (1), crossing into the Andean species richness center (2), one Amazonian center (3) and one center in coastal Brazil (4). The final species richness map (Fig. 3c) adjusts for sampling effort according to these centers of species richness. The reference quadrats with the maximum number of species chosen for each of the four centers are all located close to cities and rivers, i.e. they are easily accessible and therefore related to higher sampling effort: the quadrat at Iquitos (Peru) for Amazonia, the quadrat north of San José (Costa Rica) for Central America, the quadrat at Cali (Colombia, Valle del Cauca) for the Andes, and the quadrat at Rio de Janeiro (Brazil) for the Mata Atlântica.

Transferring the outlines of these centers of species richness to the maps of point-to-grid (Fig. 3a) and adjusted species richness (Fig. 3c), the Amazonian point-to-grid center of species richness has the lowest mean value (50.7 ± 49.5 species per quadrat, Table 1), whereas the mean value for the Amazonian center of adjusted species richness is highest (143.5 ± 32.9). Although the sizes of the species richness centers vary between 21 and 333 quadrats, the mean values of adjusted species richness per center are within a close range (119.2 ± 30.6–143.5 ± 32.9). The high standard deviation decreases from the point-to-grid towards the adjusted species richness map (Table 1), the standard deviation values for the Andean species richness center notably being the lowest.

Table 1 Mean and standard deviation values of angiosperm species richness in the four centers identified in Fig. 3b for original point-to-grid species richness and for interpolated species richness

Whereas the effect of interpolation on range sizes is shown in Fig. 2f, the effect on point-to-grid species richness is shown in Fig. 4. This effect varies according to the centers of species richness (Fig. 4, ①–④) and to the quadrats not assigned to any of these centers (⑤, ‘unassigned quadrats’). While it has little effect on the unassigned quadrats ⑤, the interpolation effect is highest for Amazonia ① and the Andes ②. For the smallest center of species richness, the Mata Atlântica ④, the effect is heterogeneous and also the lowest out of the four centers.

The cross-validation values are high for most quadrats, but the four species richness centers show slightly higher LOOCV values than the unassigned quadrats (Table 2). The mean robustness per quadrat ranges between 0.777 ± 0.073 and 0.832 ± 0.043, with the highest LOOCV values for the Amazonian center of species richness (Table 2).

Table 2 Ratio between the species richness estimate by leave-one-out cross-validation (2,549 species) and by weighted interpolation (4,055 species) of the species richness centers identified in Fig. 3b

According to the World Database on Protected Areas 2007 (WDPA Consortium 2008), most Neotropical quadrats are without any protection status (1,253; Fig. 5a) or with low protection status (986; Fig. 5b). The 160 quadrats with the highest protection status (Fig. 5d) show maximum levels of species richness at comparably high human population density (CIESIN and CIAT 2005). Better protected quadrats (Fig. 5c, d) show varying correlation with population density, whereas quadrats without or with low protection status (Fig. 5a, b) consistently exhibit lower levels of species richness over all population density classes.

Narrow endemic species

Of the 4,055 species present in the database, 40% (1,573 species) were considered to be narrow endemic Neotropical species. The reference quadrats with the largest numbers of narrow endemic species chosen for each of the centers of species richness to adjust for sampling effort were the quadrats north of Manaus (Amazonia), east of San José (Central America), at Rio de Janeiro (Mata Atlântica), and at Cali (Andes). The map of centers of narrow endemism adjusted for sampling effort (Fig. 6a) did not differ much from the original point-to-grid map (Kendall’s τ: 0.96). Salient centers of adjusted species richness of narrow endemic angiosperms are situated in Costa Rica and Panama, along the Andes (from western Colombia to northern Peru) and at the Brazilian Atlantic coast close to Bahia and close to Rio de Janeiro, but a mosaic of quadrats containing up to five narrow endemics extends over the whole Neotropical region. Less prominent, but equally coherent areas of narrow endemism are located in the south of Mexico, the Caribbean islands, the southern Peruvian and the Bolivian Andes, parts of the Amazon basin, the southeastern Cerrado and along the Pacific, the Atlantic and the Caribbean mainland coast. In combination, these areas exceed the areas suggested by Gentry (1992), who restricted Neotropical local endemism mainly to cloud forest ridges, inter-Andean valleys, Cuba and Hispaniola and isolated patches with specific habitat conditions, especially in Amazonia. With the exception of the Amazonian species richness center, species richness centers identified in Fig. 3c are well reflected by the centers of narrow endemism. The 276 quadrats holding narrow endemic species and without protection status according to the categories Ia–IV (WDPA Consortium 2008) are highlighted in Fig. 6b.
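The agreement between two richness maps, as summarised above by Kendall’s τ, can be computed from the per-quadrat values of both maps. The text does not state which variant of τ was used; the sketch below assumes the tie-corrected τ-b, which is a common choice when many quadrats share the same value. The function name and the example values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct pair counting (quadratic time, fine for small grids)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pairs tied in both variables drop out of tau-b
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + tied_x_only)
                 * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical narrow-endemic counts per quadrat before and after adjustment.
point_to_grid = [0, 1, 1, 3, 5, 8]
adjusted = [0.0, 1.0, 1.4, 3.2, 5.0, 8.0]
print(round(kendall_tau_b(point_to_grid, adjusted), 3))
```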

Discussion

Methods interpolating species richness: spoiled for choice?

In this study we developed a new method for generating species ranges, which we then used to derive maps of species richness and centers of narrow endemism. At first glance it seems that we could have chosen between various approaches for generating species ranges (see section “Introduction”). Why, then, add yet another one? The answer is that most methods were inappropriate for the characteristics of our data set, and thus also for many similar situations. The 1,324 species in our database with fewer than three occurrences drastically reduced the number of applicable methods. Also, we found no justification to extrapolate beyond the outermost occurrences of our species. Every range estimate is uncertain, since it integrates over areas in which the species in question has not been sampled, and this uncertainty increases with distance from known species occurrences. Extrapolating our data beyond the outer species occurrences would therefore overestimate the ranges of narrow-ranging species in particular and include peripheral areas not belonging to the species range.