Introduction

Species distribution patterns enable scientists and conservation planners to estimate centers of biodiversity (e.g. Williams et al. 1996; Kress et al. 1998; Barthlott et al. 2005) and to identify priority areas for conservation actions (e.g. Davis et al. 1997; de Oliveira and Daly 1999; Schatz 2002; Tobler et al. 2007). Species confined to very small distribution areas, so-called narrow endemic species (Williams et al. 1996; Andersen et al. 1997), pose important conservation issues due to their vulnerability to extinction (Gentry 1986; Knapp 2002). Due to insufficient data collection and heterogeneous sampling effort, distribution patterns in the Neotropics are still poorly described (Kress et al. 1998; Bates and Demos 2001; Hopkins 2007; Morawetz and Raedig 2007). Moreover, the number of Neotropical angiosperm species is exceptionally large, estimated at up to 90,000 species (Raven 1988; Thomas 1999; Smith et al. 2004), making compilation of all species distributions a daunting task. Amazonia, the largest and least accessible part of the Neotropics, still harbors many regions where no plants have been collected at all; Schulman et al. (2007) reported 43% of Amazonia as devoid of botanical collections and an additional 28% as poorly collected. Species with limited or low occurrence are more likely to remain undiscovered, thus impeding the assessment of the distribution of narrow endemic species.

Given that large areas are generally under-sampled, different techniques have been applied to map distribution patterns at large scales. The first essential steps toward estimating plant biodiversity at the global scale were made by Davis et al. (1997) and Barthlott et al. (1999, 2005) using inventory-based data. These inventories are summary data for geographic units of varying size, mainly based on floras, regional species accounts, local checklists and plot-based data. Whereas Davis et al. (1997) collected information on all of their 234 priority sites and created sub-maps centered on these sites, Barthlott et al. (1999, 2005) estimated plant species richness for standardized units of area (10,000 km²) to derive global maps of plant species richness. In both studies, the Neotropics were indicated to be species-rich, but it was also noted that underlying collection data are lacking for vast parts of Amazonia (Kier et al. 2005; Kreft and Jetz 2007).

As an alternative to inventory-based analyses of species richness, distribution patterns can also be obtained by overlaying maps of geographic ranges of individual species, henceforth referred to as species ranges. Basically, species ranges correspond to regions where occurrences of individuals of the species have been recorded (Gaston 1991), but various more sophisticated concepts of deriving species ranges from occurrence data exist (Lomolino et al. 2006). For the Neotropics, two approaches to estimate angiosperm species ranges and species richness patterns have been applied. These are exclusively based on species occurrence records and do not rely on a summary of different data sources. Hopkins (2007) studied ranges of 1,584 Amazonian species at 1° grid resolution. Here, species ranges were generated by extrapolating from point occurrence data sets, if neighbor occurrences were positioned within the maximum distance of roughly 500 km. The superposition of the thus derived species ranges yielded a species richness map of known species that recognized large parts of the Amazon basin as species-rich. At the same time it displayed a bias for better collected areas. In addition to this approach based on species ranges, Hopkins (2007) modeled species richness based on species numbers, using the same maximum distance of roughly 500 km. In both approaches, this predefined limit can lead to overestimation of species ranges and of species numbers.

For the entire Neotropics, Morawetz and Raedig (2007) analyzed data of 3,715 angiosperm species to identify centers of diversity and narrow endemism. Species occurrences were overlaid onto a 1° grid and merged into the respective grid cells (quadrats). This point-to-grid conversion yielded species ranges with a high degree of range porosity. In contrast to the method applied by Hopkins (2007), this approach is prone to an underestimation of species ranges.

Point data, such as museum and herbarium specimen data, have proven useful for the generation of species ranges (Williams et al. 1996; Kress et al. 1998; Schatz 2002; Willis et al. 2003; Graham et al. 2004). However, they also have some inherent drawbacks, such as heterogeneous sampling of space and taxa because of varying accessibility of areas and attractiveness of taxa to collectors (Nelson et al. 1990; Graham et al. 2004; Schulman et al. 2007; Sheth et al. 2008) and systematic inaccuracy (Meier and Dikow 2004; Hopkins 2007; Tobler et al. 2007). The latter problem can in part be avoided by using revised specimen data, which were reviewed by expert taxonomists and published in the form of monographs, so-called monographic data (Thomas 1999; Knapp 2002; Hopkins 2007). After reviewing the available data, we found that monographic distribution data are the most promising because of their taxonomic correctness and their reference to large areas. Since survey data on angiosperm species do not cover such a large area, monographic data represent an alternative. However, these data are difficult to analyze, since standard methods used for abundance data cannot be applied.

Species ranges derived from point data are subject to uncertainty that originates not only from the underlying data but also from the construction method. Examples of techniques for the estimation of species ranges are the convex hull (Willis et al. 2003; Sheth et al. 2008), the minimum spanning tree (Hernández and Navarro 2007) or the minimum bounding box (Graham and Hijmans 2006). Generating species ranges by means of a convex hull often results in overestimation of species ranges (Burgman and Fox 2003) and ignores disjunct distribution patterns, particularly for widespread species. A refined method is the use of the alpha hull (Edelsbrunner et al. 1983; Burgman and Fox 2003), which is based on a triangulation approach. When applying the alpha hull, first, the average distance between the occurrence points is calculated. For the resulting alpha hull, only those occurrences are considered which are connected by a line whose length does not exceed a multiple (termed α) of this average line length. Subject to the selection of α, constructed ranges either resemble coarser (α being larger, maximum size: convex hull) or finer (α being smaller, minimum size: point) alpha hulls. Another widely used method for the estimation of species ranges is the ecological niche modeling approach. This approach relates species occurrences to site conditions such as climate variables (the predictor set) and predicts species ranges based on the pattern of these auxiliary variables.

So far, detailed species richness maps based on species ranges of large numbers of species cover only parts of the Neotropics or lack quantification of uncertainty due to heterogeneous sampling effort over area (Kress et al. 1998; Hopkins 2007; Morawetz and Raedig 2007; Schulman et al. 2007). Here we introduce an interpolation approach which can be applied to scant data and which requires nothing more than the available species occurrence data. Our goal is to make the application of this approach independent of detailed knowledge of the ecological demands of the species. The resulting patterns are only an approximation of ‘real’ distribution patterns, but they are produced in a standardized, reproducible way.

The aim of this study is (i) to present a method tailored to map distribution patterns of Neotropical angiosperm species based on scarce, yet taxonomically reliable monographic occurrence data, (ii) to estimate the distribution patterns of Neotropical angiosperm species and (iii) to explore whether the method presented is appropriate for the identification of centers of diversity and narrow endemism.

Methods

Our analysis is based on distribution data of angiosperm species taken from monographs or similar thoroughly revised treatments covering the Neotropical realm (see Appendix 1). The database was presented in a previous work (Morawetz and Raedig 2007) and since then has been complemented with a further 340 species. It now contains 4,055 species, in 230 genera and 66 families, with ~77% woody and 23% herbaceous species. Species occurrence data were taken from distribution maps and transferred to a grid with 1° grid resolution containing 2,519 quadrats sized ~100 km × 100 km (varying from 12,550 km² at the equator to 8,250 km² at Tierra del Fuego). The species recorded in the database represent about 5% of all Neotropical angiosperm species. It should be stressed that species richness numbers and patterns derived here are indices of species richness, not estimates of absolute numbers.

Due to the special characteristics of our database, we had to design a novel interpolation approach. Firstly, because our data set only includes presence data (not presence/absence data), the choice of suitable habitat quality models was already strongly limited (e.g. Graham et al. 2004; Phillips et al. 2006). Secondly, many species are represented in very few quadrats. Although ecological niche models have successfully been applied for species with only five records (Pearson et al. 2007), exclusion of species having fewer than five occurrences would exclude about 50% of the species of our data set. Thirdly, the rule of thumb that each explanatory variable requires about ten data points (Harrell 2001; Reineking and Schröder 2006) would exclude 90% of the species in our database, even if we used a small predictor set of only three environmental variables. Therefore, ecological niche modeling is not suitable for our data set. Furthermore, the species richness pattern of the point-to-grid data (Fig. 3a) shows a strong bias towards easily accessible areas. Fitting a generalized additive model (GAM; Wood 2006) with species richness as the response and distance to cities, distance to rivers and distance to coasts as explanatory variables explained a significant amount of the variance (explained deviance 0.39 for the Neotropics and 0.51 for Amazonia). Thus, we opted for a geometric interpolation-based approach to deduce species richness patterns. A requirement for this approach was the possibility to correct for heterogeneous sampling effort. In the absence of an independent validation data set, a further requirement to be met was the validation of the resulting species richness patterns.

Interpolating species ranges

The species occurrences contained in our database were overlaid with a grid (Fig. 1a). However, this point-to-grid data set is incomplete as it only contains occurrences of species which actually have been found, in quadrats that have actually been visited. We expect the actual species ranges to be much larger. Thus, based on the centroids of these quadrats, a conditional triangulation similar to the alpha hull approach was performed: if a point was less than a given interpolation distance d away from two other points, a triangle was created and added to the triangle set (Fig. 1b). If only two points were within the given interpolation distance d, and thus no triangle could be built, a line between these two points was created (Fig. 1c). Triangle and line sets as well as points (which could not be interpolated due to missing neighbor occurrences) were combined and the set of corresponding quadrats was identified as the interpolated species range for a given distance d (Fig. 1d). As an extension to the alpha-hull approach (Edelsbrunner et al. 1983; Burgman and Fox 2003), not only the polygons of the triangulation but also the lines and points were considered. Thereby we avoided the problem of exclusion of narrow endemic species from analysis.
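The conditional triangulation described above can be illustrated with a short Python sketch. It is not the implementation used in this study but a minimal reading of the text under several assumptions: quadrats are addressed by integer (column, row) indices, distances are Euclidean distances between quadrat centroids in quadrat units, a pair or triple of points qualifies when all pairwise distances are at most d, and a quadrat is assigned to a line when its centroid lies within half a quadrat of that line, or to a triangle when its centroid lies inside or on it. All function names are hypothetical.

```python
from itertools import combinations
from math import hypot


def _within(a, b, d):
    """True if the centroids of quadrats a and b are at most d quadrats apart (assumed metric)."""
    return hypot(a[0] - b[0], a[1] - b[1]) <= d


def _segment_cells(a, b):
    """Quadrats whose centroid lies within half a quadrat of the line a-b (assumed rule)."""
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    cells = set()
    for x in range(min(ax, bx), max(ax, bx) + 1):
        for y in range(min(ay, by), max(ay, by) + 1):
            # Project the centroid onto the segment and clamp to its end points.
            t = ((x - ax) * dx + (y - ay) * dy) / length_sq
            t = max(0.0, min(1.0, t))
            if hypot(x - (ax + t * dx), y - (ay + t * dy)) <= 0.5:
                cells.add((x, y))
    return cells


def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _triangle_cells(a, b, c):
    """Quadrats whose centroid lies inside or on the triangle a-b-c."""
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    cells = set()
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            signs = (_cross(a, b, (x, y)), _cross(b, c, (x, y)), _cross(c, a, (x, y)))
            if not (any(s < 0 for s in signs) and any(s > 0 for s in signs)):
                cells.add((x, y))
    return cells


def interpolate_range(occurrences, d):
    """Union of points, conditional lines and conditional triangles for distance d."""
    pts = sorted(set(occurrences))
    cells = set(pts)  # isolated points are kept, so narrow endemics are not lost
    for a, b in combinations(pts, 2):
        if _within(a, b, d):
            cells |= _segment_cells(a, b)
    for a, b, c in combinations(pts, 3):
        if _within(a, b, d) and _within(b, c, d) and _within(a, c, d):
            cells |= _triangle_cells(a, b, c)
    return cells


occ = {(0, 0), (2, 0), (0, 2), (10, 10)}
for d in (1, 2, 3):
    print(d, len(interpolate_range(occ, d)))
```

In this toy example the range grows from the four recorded quadrats (d = 1) to lines between neighbours (d = 2) and finally to a filled triangle (d = 3), while the isolated occurrence at (10, 10) is retained throughout.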

Fig. 1

Distance-weighted species range interpolation and LOOCV for Parkia platycephala Benth. (Hopkins 1986). a–d Interpolation using the distance of three quadrats (distance i = 3). a The point set as reported in the monograph. b Based on this point set and the given distance i = 3, a conditional polyline generation and c a conditional triangulation are performed. d The overlay of the three sets is then used to predict the species range (\(\text{range}_i\)) for the given distance in the underlying 1° × 1° quadrats. e–f LOOCV. e For the interpolation distance of three quadrats, solo- and 2-point-occurrences are not included in the resulting species range. f The interpolation distance of five quadrats yields a species range including all species occurrences

From species ranges to weighted species richness

After processing all species for one distance class i, the interpolated species ranges were summed across all species, creating an estimate of species richness \(S_i\). Results were calculated for the distance classes i = 1, 2, 3, …, 10. These species richness grids \(S_i\) were combined by performing an inverse distance-weighted approach according to:

$$ S_{w} = \sum\limits_{i = 2}^{10} d_{i}^{-p} \left( S_{i} - S_{i-1} \right) + S_{1} $$
(1)

with \(p > 0\) and \(d_i \geq 1\). \(S_1\) is the original point-to-grid species richness grid, \(S_w\) is the grid of the resulting weighted species richness and \(d_i\) is the distance (\(d_2 = 2\), \(d_3 = 3\), …) used as a threshold in the conditional triangulation.

For each distance class, the increase in species richness relative to the next smaller distance class was calculated for each quadrat and multiplied by a weighting term \(d_i^{-p}\). Here, p is a tuning parameter of the weighting procedure applied to the quadrats. For each \(p > 0\) and \(d_i \geq 1\), the corresponding weighting term lies between 0 and 1. The greater p becomes, the more relative weight is put on species richness calculated for smaller distances. The closer p is to 0, the more relative weight is put on species richness interpolated for larger distances (see Appendix 2). For the present work, we selected p = 0.5, which resulted in a combination of high weights for small distances and relatively low weights for large distances. The weighted differences between the distance classes were then added to the original point-to-grid data (\(S_1\)), yielding the map of weighted species richness \(S_w\). Species richness centers were identified as contiguous areas of quadrats with \(S_w > 100\), i.e. more than 100 interpolated species.
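Equation 1 can be written out as a few lines of Python. The sketch below is illustrative only: the data layout (one dictionary of richness values per distance class, keyed by hypothetical quadrat labels) and the function name are assumptions, and the distance class i is used directly as \(d_i\), as stated above.

```python
def weighted_richness(s_by_distance, p=0.5):
    """Inverse distance-weighted species richness S_w following Eq. 1.

    s_by_distance maps each distance class i (1, 2, ...) to a dict
    {quadrat: S_i}; class 1 is the point-to-grid richness S_1.
    """
    distances = sorted(s_by_distance)
    s_w = dict(s_by_distance[distances[0]])  # start from S_1
    for prev, cur in zip(distances, distances[1:]):
        weight = cur ** (-p)  # weighting term d_i^(-p)
        for quadrat, s_cur in s_by_distance[cur].items():
            gain = s_cur - s_by_distance[prev].get(quadrat, 0)  # S_i - S_(i-1)
            s_w[quadrat] = s_w.get(quadrat, 0) + weight * gain
    return s_w


# Hypothetical quadrat "q": 10 observed species, 4 gained at d = 2 and 4 at d = 4.
grids = {1: {"q": 10}, 2: {"q": 14}, 3: {"q": 14}, 4: {"q": 18}}
print(weighted_richness(grids, p=0.5)["q"])
print([round(d ** -0.5, 3) for d in range(2, 11)])  # weights for p = 0.5
```

With p = 0.5 the four species gained at distance 2 count with a weight of about 0.71 and those gained at distance 4 with a weight of 0.5. For p approaching 0 the sum telescopes, so that \(S_w\) approaches the richness of the largest distance class.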

Adjusting weighted species richness for sampling effort

We addressed the impact of uneven spatial sampling effort by incorporating an additional weighting factor. This factor is based on the ratio of the number of species recorded in a quadrat to the maximum number of species reported for the center of species richness C to which the quadrat belongs in the original point-to-grid map, \(S_1/\max_C(S_1)\). This ratio of the species number in a quadrat to that in the respective reference quadrat is used as a proxy for the relative sampling effort in each quadrat. The higher the relative sampling effort in a quadrat, the nearer the ratio will be to 1, and hence the smaller the weighting (1 − relative sampling effort) for the respective quadrat will be (Eq. 2). The higher the weight (relative sampling effort close to 0), the larger is the fraction of the interpolated species richness that enters the final estimation of species richness for that specific quadrat. The result of applying this correction factor to the inverse distance-weighted sum of species richness at the distances 2–10 and adding the observed point-to-grid species richness \(S_1\) is henceforth referred to as adjusted species richness \(S_{\text{adj}}\).

$$ S_{\text{adj}} = \left( 1 - \frac{S_{1}}{\max_{C}(S_{1})} \right) \cdot \sum\limits_{i = 2}^{10} d_{i}^{-p} \left( S_{i} - S_{i-1} \right) + S_{1} $$
(2)

where \(\max_C\) is a function which returns for each quadrat the maximum species richness for the diversity center the quadrat belongs to.
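The adjustment of Eq. 2 can be sketched as follows. The sketch assumes that the inverse distance-weighted sum over the distance classes 2–10 has already been computed per quadrat and that each quadrat is assigned to exactly one center; the treatment of unassigned quadrats is not covered. The function name, argument names and quadrat labels are hypothetical.

```python
def adjusted_richness(s1, interpolated_gain, center_of):
    """Species richness adjusted for sampling effort following Eq. 2.

    s1:                {quadrat: observed point-to-grid richness S_1}
    interpolated_gain: {quadrat: inverse distance-weighted sum over i = 2..10}
    center_of:         {quadrat: identifier of its species richness center}
    """
    # max_C(S_1): highest observed richness within each center (reference quadrat)
    max_per_center = {}
    for quadrat, center in center_of.items():
        observed = s1.get(quadrat, 0)
        max_per_center[center] = max(max_per_center.get(center, 0), observed)

    s_adj = {}
    for quadrat, center in center_of.items():
        observed = s1.get(quadrat, 0)
        reference = max_per_center[center]
        effort = observed / reference if reference else 0.0  # proxy for sampling effort
        s_adj[quadrat] = (1 - effort) * interpolated_gain.get(quadrat, 0.0) + observed
    return s_adj


# Two hypothetical quadrats of one center with the same interpolated gain.
s1 = {"q_ref": 100, "q_low": 25}
gain = {"q_ref": 40.0, "q_low": 40.0}
centers = {"q_ref": "A", "q_low": "A"}
print(adjusted_richness(s1, gain, centers))
```

In this example the reference quadrat keeps its observed richness, because its weight is 0, whereas the poorly sampled quadrat receives three quarters of its interpolated gain. This behaviour is consistent with the identical maxima reported for the point-to-grid and the adjusted map (Fig. 3a, c).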

Estimating the interpolation robustness by cross-validation

In the absence of a validation data set, we chose to estimate the robustness of the interpolation by performing a leave-one-out cross-validation (LOOCV). For this purpose, the interpolation steps were repeated on subsamples of the species points, leaving out each occurrence once, in order to cross-validate the interpolated species ranges (Efron and Gong 1983; Pearson et al. 2007). In contrast to the interpolation approach, this procedure generates floating point values indicating a robustness estimate for a species presence in a quadrat (Fig. 1e, f). For a detailed description of this approach, see Appendix 3. Dividing the resulting LOOCV estimate by the weighted interpolation estimate \(S_w\) yielded the mean robustness of the weighted species richness estimation per quadrat.
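The resampling idea behind the LOOCV can be illustrated with a small Python sketch. The exact procedure is defined in Appendix 3 and may differ; the sketch merely assumes that the robustness of a quadrat is the fraction of leave-one-out runs in which the quadrat is still part of the interpolated range. The one-dimensional gap-filling interpolator is a deliberately simple stand-in for the conditional triangulation, and all names are hypothetical.

```python
def fill_gaps(cells, d=3):
    """Toy 1-D stand-in for the interpolation: fill gaps of at most d cells."""
    pts = sorted(cells)
    filled = set(pts)
    for a, b in zip(pts, pts[1:]):
        if b - a <= d:
            filled.update(range(a, b + 1))
    return filled


def loocv_robustness(occurrences, interpolate):
    """Fraction of leave-one-out runs in which each quadrat stays in the range.

    This definition is an assumption made for illustration; see Appendix 3
    for the procedure actually applied.
    """
    occ = sorted(set(occurrences))
    counts = {}
    for left_out in occ:
        kept = [o for o in occ if o != left_out]
        for cell in interpolate(kept):
            counts[cell] = counts.get(cell, 0) + 1
    return {cell: n / len(occ) for cell, n in counts.items()}


print(loocv_robustness([0, 2, 4], fill_gaps))
```

For the three occurrences in this example, the recorded cells remain in the range in two of three runs, whereas the interpolated cells between them are supported by only one run each. A species with a single occurrence yields no estimate at all under this reading.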

Species ranges

So far we have focused on species richness, which originates from an overlay of species ranges. To detect the effect of interpolation on the range of each species, we calculated the weighted range size \(\text{range}_w\) by combining the interpolated species ranges for each distance (\(\text{range}_i\)) for each species (Eq. 3, derived from Eq. 1).

$$ \text{range}_{w} = \sum\limits_{i = 2}^{10} d_{i}^{-p} \left( \text{range}_{i} - \text{range}_{i-1} \right) + \text{range}_{1} $$
(3)

Results are depicted as range size frequency distribution for the weighted range sizes (\(\text{range}_w\)) and are compared to the range size frequency distribution for individual distance classes.

Species richness of narrow endemic species

We used the same approximate definition for narrow endemic species as Gentry (1986): narrow endemics are those species whose maximum interpolated range size did not exceed five quadrats (ca. 50,000 km², but the respective area varies with latitude between 41,250 and 62,750 km²). While the LOOCV was useful in validating the interpolated species ranges and derived species richness centers, it was not used for the validation of narrow endemism centers because it would exclude too many species (at least 80.5% of narrow endemic species).
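The selection rule can be stated compactly in code. The following Python sketch assumes that the range size of each species is available per distance class and that the criterion is applied to the largest of these sizes; the data layout, function names and species labels are hypothetical. The area bounds follow from the quadrat areas given in the Methods (8,250–12,550 km²).

```python
QUADRAT_AREA_KM2 = (8250, 12550)  # smallest and largest quadrat area (see Methods)


def narrow_endemics(range_size_by_distance, max_quadrats=5):
    """Species whose largest interpolated range does not exceed max_quadrats."""
    return {
        species
        for species, sizes in range_size_by_distance.items()
        if max(sizes.values()) <= max_quadrats
    }


def area_bounds_km2(n_quadrats=5):
    """Smallest and largest area covered by n_quadrats, depending on latitude."""
    low, high = QUADRAT_AREA_KM2
    return n_quadrats * low, n_quadrats * high


# Hypothetical species: range sizes (in quadrats) for distance classes 1, 5 and 10.
sizes = {
    "species_A": {1: 2, 5: 4, 10: 5},
    "species_B": {1: 3, 5: 9, 10: 27},
}
print(narrow_endemics(sizes))
print(area_bounds_km2())
```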

Results

Species ranges

The range size frequency distribution of the original point-to-grid ranges (Fig. 2a) is highly right-skewed (skewness = 4.8), with a mean of 12.3 (±22.4 SD) and a maximum of 327 quadrats per species. Most species (3,995 = 99%) occur in fewer than 100 quadrats. With increasing interpolation distance d (see Eq. 1), both the mean and the maximum number of quadrats per species increase, reaching 59.6 ± 123.2 and 1,378 quadrats for distance 10 (Fig. 2b–e). The combined inverse distance-weighted range size frequency distribution (Fig. 2f, according to Eq. 3) results in a mean of 32.6 ± 65.3, a maximum of 750.8 quadrats per species and a skewness of 4.1. While the mean value for d = 5 (Fig. 2c) is rather similar (33.3 ± 69.2), its range size frequency distribution has a higher skewness (4.5) and a higher maximum (831).

Fig. 2

Range size frequency distributions for all species. a Range size frequency distribution of the point-to-grid data. b–e Range size frequency distributions for selected interpolation distances. f Distance-weighted range size frequency distribution. The y-axis extends to 3,800, including a gap for y-values between 320 and 3,100

Species richness

Although our original point-to-grid species richness map (Fig. 3a) contains more species than the species richness map of a previous study (Morawetz and Raedig 2007), it identifies rather similar biodiversity centers. Point-to-grid species richness centers lie in Guatemala and adjacent regions, in Costa Rica and Panama reaching into the Chocó, in the Guyanas and at the border triangle of Venezuela, Colombia and Brazil. Moreover, they stretch along the Andes (with peaks in the Ecuadorian and Peruvian Andes), along the Amazon with peaks close to Iquitos, Manaus, Santarém and Belém, and at the Brazilian Atlantic coast (Fig. 3a). The combination of the species richness grids over all distances according to Eq. 1 yields the map of weighted species richness (Fig. 3b) and results in four prominent species richness centers: one in Central America (1), crossing into the Andean species richness center (2), one Amazonian center (3) and one center in coastal Brazil (4). The final species richness map (Fig. 3c) adjusts for sampling effort according to these centers of species richness. It turned out that the reference quadrats with the maximum number of species chosen for each of the four centers are all located close to cities and rivers, i.e. they are easily accessible and therefore related to higher sampling effort: the quadrat at Iquitos (Peru) for Amazonia, the quadrat north of San José (Costa Rica) for Central America, the quadrat at Cali (Colombia, Valle del Cauca) for the Andes, and the quadrat at Rio de Janeiro (Brazil) for the Mata Atlântica.

Fig. 3

Species richness of Neotropical angiosperms per quadrat. a Point-to-grid species richness (maximum number of species per quadrat: 331). b Weighted species richness (maximum number of species per quadrat: 391). c Species richness adjusted for sampling effort (maximum number of species per quadrat: 331) with delineation of the four largest species richness centers. 1—Central American, 2—Andean, 3—Amazonian, 4—Mata Atlântica species richness center. Projection: Aitoff, Central Meridian 60°W

Transferring the outlines of these centers of species richness to the maps of point-to-grid (Fig. 3a) and adjusted species richness (Fig. 3c), the Amazonian point-to-grid center of species richness has the lowest mean value (50.7 ± 49.5 species per quadrat, Table 1), whereas the mean value for the Amazonian center of adjusted species richness is highest (143.5 ± 32.9). Although the sizes of the species richness centers vary between 21 and 333 quadrats, the mean values of adjusted species richness per center are within a close range (119.2 ± 30.6–143.5 ± 32.9). The high standard deviation decreases from the point-to-grid towards the adjusted species richness map (Table 1), the standard deviation values for the Andean species richness center notably being the lowest.

Table 1 Mean and standard deviation values of angiosperm species richness in the four centers identified in Fig. 3b for original point-to-grid species richness and for interpolated species richness

Whereas the effect of interpolation on range sizes is shown in Fig. 2f, the effect on point-to-grid species richness is shown in Fig. 4. This effect varies according to the centers of species richness (Fig. 4, ①–④) and to the quadrats not assigned to any of these centers (⑤, ‘unassigned quadrats’). While it has little effect on the unassigned quadrats ⑤, the interpolation effect is highest for Amazonia ① and the Andes ②. For the smallest center of species richness, the Mata Atlântica ④, the effect is heterogeneous and also the lowest out of the four centers.

Fig. 4

Effect of inverse distance-weighted interpolation on the distribution patterns of angiosperm species. ①–④: centers of species richness; ⑤: quadrats not assigned to a center of species richness. Symbols above the dotted equity line indicate that the interpolated species richness variable of the y-axis outnumbers the point-to-grid species richness of the x-axis. Non-linear regressions (trend lines and shaded standard error envelope) using Generalized Additive Models indicate different effects of interpolation for the different centers

The cross-validation values are high for most quadrats, but the four species richness centers show slightly higher LOOCV values than the unassigned quadrats (Table 2). The mean robustness per quadrat ranges between 0.777 ± 0.073 and 0.832 ± 0.043, with highest LOOCV values for the Amazonian center of species richness (Table 2).

Table 2 Ratio between the species richness estimate by leave-one-out cross-validation (2,549 species) and by weighted interpolation (4,055 species) of the species richness centers identified in Fig. 3b

According to the World Database on Protected Areas 2007 (WDPA Consortium 2008), most Neotropical quadrats are without any protection status (1,253; Fig. 5a) or with low protection status (986; Fig. 5b). The 160 quadrats with highest protection status (Fig. 5d) show maximum levels of species richness at comparably high human population density (Ciesin and Ciat 2005). Better protected quadrats (Fig. 5c, d) show varying correlation with population density, whereas quadrats without or with low protection status (Fig. 5a, b) consistently exhibit lower levels of species richness over all population density classes.

Fig. 5

Distribution of species on quadrats classified by protection status according to the World Database on Protected Areas 2007 (WDPA Consortium 2008) and estimated population density for 2005 (Ciesin and Ciat 2005). Species to be found in quadrats a without protection status, b with a proportion up to 25% of protected area, c with a proportion of 25–50% of protected area, and d with a proportion of more than 50% of protected area. The title of the y-axis continues above each panel of the graph

Narrow endemic species

Of the 4,055 species present in the database, 40% (1,573 species) were considered to be narrow endemic Neotropical species. The reference quadrats with the largest numbers of narrow endemic species chosen for each of the centers of species richness to adjust for sampling effort were the quadrats north of Manaus (Amazonia), east of San José (Central America), at Rio de Janeiro (Mata Atlântica), and at Cali (Andes). The map of centers of narrow endemism adjusted for sampling effort (Fig. 6a) did not differ much from the original point-to-grid map (Kendall’s τ: 0.96). Salient centers of adjusted species richness of narrow endemic angiosperms are situated in Costa Rica and Panama, along the Andes (from western Colombia to northern Peru) and at the Brazilian Atlantic coast close to Bahia and close to Rio de Janeiro, but a mosaic of quadrats containing up to five narrow endemics extends over the whole Neotropical region. Less prominent, but equally coherent areas of narrow endemism are located in the south of Mexico, the Caribbean islands, the southern Peruvian and the Bolivian Andes, parts of the Amazon basin, the southeastern Cerrado and along the Pacific, the Atlantic and the Caribbean mainland coast. In combination, these areas exceed the areas suggested by Gentry (1992), who restricted Neotropical local endemism mainly to cloud forest ridges, inter-Andean valleys, Cuba and Hispaniola and isolated patches with specific habitat conditions, especially in Amazonia. With the exception of the Amazonian species richness center, species richness centers identified in Fig. 3c are well reflected by the centers of narrow endemism. The 276 quadrats holding narrow endemic species and without protection status according to the categories Ia–IV (WDPA Consortium 2008) are highlighted in Fig. 6b.

Fig. 6

Centers of narrow endemism of Neotropical angiosperm species (species richness per quadrat). a Adjusted species richness (Maximum number of narrow endemic species is 50). b Narrow endemic species not covered by a protection status according to categories Ia–IV (WDPA Consortium 2008). Maximum number of narrow endemic species per unprotected quadrat is 23. Projection: Aitoff, Central Meridian 60°W

Discussion

Methods interpolating species richness: spoiled for choice?

In this research we developed a new method for generating species ranges, which we then used to derive maps of species richness and centers of narrow endemism. At first glance it seems that we could have chosen between various approaches for generating species ranges (see section “Introduction”). Why, then, should we add yet another one? The answer is that most methods were inappropriate for our data set, given its characteristics, and would be equally inappropriate in many similar situations. The 1,324 species in our database with fewer than three occurrences drastically reduced the number of applicable methods. Also, we found no justification to extrapolate beyond the outermost occurrences of our species. This is because every species’ range estimation is uncertain, since it integrates over areas wherein the species in question has not been sampled. Uncertainty increases with distance to known species occurrences. Extrapolating our data beyond the outer species occurrences would therefore especially overestimate the ranges of narrow-ranging species and include peripheral areas not belonging to the species range.

Interpolating species ranges

One challenge when applying our interpolation method to generate species ranges was to choose the right interpolation distance. To tackle this problem, we used the inverse-distance summation scheme described above. This approach ensures that the results of all interpolation distances are included, while the weighting favors smaller distances. This lowers the risk of overestimating species richness through the generation of large and coherent species ranges for widespread, but locally scarce species. It has been shown that widespread species in particular dominate distribution patterns (Jetz and Rahbek 2002; Kreft et al. 2006). If species with a medium or large number of occurrences are interpolated with too much weight on long distances, the resulting large ranges will further aggravate this effect on species distribution patterns. Moreover, the risk of overestimation is reduced by putting a constraint on the largest possible interpolation distance, \(d_{\max} = 10\). Avoiding even larger distances (>1,000 km) is in accordance with Hopkins (2007), who modeled ranges of Amazonian angiosperm species considering interpolation distances between one and nine quadrats (corresponding to 100 and 900 km).

Another important step for our species richness estimation was the adjustment for sampling effort. It is difficult to quantify the influence of overall sampling effort, yet we can apply some adjustment for heterogeneous spatial sampling effort. We did this by defining reference quadrats for the centers of species richness. As a result, quadrats with low species numbers are assigned higher weights than quadrats with high observed species numbers. Thereby, quadrats with high observed species richness acquire fewer additional species from interpolation while quadrats with a low number of observed species could acquire a larger fraction of additional species—if the unadjusted interpolation results predict additional species. We accepted overestimating species richness in some quadrats, knowing that vast areas of the Neotropics are under-sampled (Prance et al. 2000; Ruokolainen et al. 2002; Tobler et al. 2007). Although detailed maps of botanical sampling effort are available for some areas within the Neotropics (e.g. for Amazonia by Schulman et al. 2007), they are not available everywhere and therefore not used in the present work. Also, the procedure to adjust for sampling effort proposed here has the advantage of only requiring information inherent in the available point-to-grid data.

Species richness

Areas of elevated levels of species richness are the result of multiple overlapping species ranges. Most species occupy small ranges (Fig. 2a). Weighting of the species ranges (Eq. 3) demonstrates that the range sizes increase when applying our interpolation approach (Fig. 2f), but with a lower skewness and a lower maximum range size compared to a medium interpolation distance of five quadrats (Fig. 2c), thus avoiding overestimation of ranges of widespread species.

The ‘smoothed’ increase of the range sizes due to the interpolation approach is reflected in the species richness maps (Fig. 3b, c). Whereas the inclusion of 340 more species (Fig. 3a) showed no major differences to the point-to-grid species richness map presented in Morawetz and Raedig (2007), considerable distinctions are evident in both maps of species richness (Fig. 3b, c). For the weighted interpolation, these differences are plotted in Fig. 4. For all centers of diversity as well as for the unassigned quadrats, interpolated species richness is above the equity line. The different effect of interpolation on the species richness according to diversity center is particularly revealing for Amazonia. Even for small distances, the interpolation of species ranges here is consistently high.

Comparison of maps 3b and 3c reveals the effect of adjusting species richness for sampling effort: the range of species richness is reduced, whereas the peaks of species richness found in Fig. 3b are retained in Fig. 3c. This effect is also apparent in the lower mean and standard deviation values for the centers of adjusted species richness, and in their closer range (Table 1). The Andean species richness center (Fig. 3c, polygon 2) shows the lowest standard deviation relative to the mean values (Table 1), suggesting more equal species richness and sampling effort of these Andean quadrats. The most obvious difference is that the Amazonian species richness center is by far the largest.

Amazonia contains the largest part of today’s remaining contiguous rainforest area (e.g. Davis et al. 1997; Bates and Demos 2001). It has been suggested to be exceptionally species-rich (e.g. Kress et al. 1998; Ruokolainen et al. 2002; Schulman et al. 2007; Saatchi et al. 2008), which has been explained by habitat heterogeneity in combination with historical events (de Oliveira and Daly 1999; de Oliveira and Mori 1999) such as river dynamics and geological history.

In a global overview on species richness within ecoregions, Kier et al. (2005) suggested that the majority of ecoregions from the Andes to the Brazilian coast are very species-rich, but they placed the Chocó and parts of the northern Andes along with the entire Cerrado as the most species-rich zones. This contrasts with the patterns we detected for Amazonia, where we identified highest species richness, and for the Cerrado, where we identified high species richness only in the peripheral zones. The diversity zones of a global comparison of vascular plants (Barthlott et al. 2005) differ from ours mainly in that they are much less pronounced for southwestern Amazonia.

In comparison with a plot-based model of Amazonian tree diversity (ter Steege et al. 2003), the Amazonian diversity center we found is spatially more uniform and includes parts of lower Amazonia as well. Our species richness map (Fig. 3c) also differs from the maps of Amazonia presented by Hopkins (2007) and lies between his overall species richness map (generated by a bootstrap approach based on species occurrences) and his species richness map generated by the overlay of extrapolated species ranges. The latter method is comparable to the one applied here, but some differences exist: (1) our approach is more conservative, seeking to avoid overestimation and a disproportionate influence of widespread species on distribution patterns; (2) we applied a weighted interpolation approach (as opposed to using only one interpolation distance); and (3) we used a larger number of species and were able to consider a larger area.

The species richness estimates were validated by LOOCV to specify the robustness of the species ranges and therefore the robustness of the derived species richness map. Thus, the differences in robustness depicted in Table 2 are due to the spatial distribution of the species occurrences and indicate how heavily the prediction relies on information from single points. Observations from single points are important (1) when only few observations exist and the information from one point represents a larger area, (2) for species that are widespread and only loosely connected, and (3) for species with restricted distribution. In all these cases, leaving out single observations might lead to considerably smaller species ranges, and consequently to lower predicted species richness in the quadrats affected. The ratio between species estimates derived from LOOCV and from weighted interpolation will then be smaller, indicating lower robustness. We can thus re-interpret the higher robustness found for Amazonia: it suggests a high proportion of more uniformly distributed species with medium and larger numbers of species occurrences, and a low proportion of small-clustered species and species with few occurrences.
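The dependence on single points can be illustrated with a toy example. The sketch below is not the interpolation used in this study: it assumes a one-dimensional transect of quadrats, a hypothetical gap-filling rule (`interpolate_range`) and a hypothetical per-species robustness measure (`loo_robustness`, the mean leave-one-out range size divided by the full range size).

```python
def interpolate_range(occurrences, max_dist):
    """Toy 1-D range: occupied quadrats plus gaps of at most max_dist between neighbours."""
    occ = sorted(set(occurrences))
    filled = set(occ)
    for a, b in zip(occ, occ[1:]):
        if b - a <= max_dist:
            filled.update(range(a, b + 1))
    return filled


def loo_robustness(occurrences, max_dist):
    """Mean leave-one-out range size relative to the full range size (assumed measure)."""
    occ = sorted(set(occurrences))
    if not occ:
        raise ValueError("at least one occurrence is required")
    full_size = len(interpolate_range(occ, max_dist))
    # Drop each occurrence in turn and rebuild the range from the remaining points.
    loo_sizes = [
        len(interpolate_range(occ[:i] + occ[i + 1:], max_dist))
        for i in range(len(occ))
    ]
    return (sum(loo_sizes) / len(loo_sizes)) / full_size


# Hypothetical species: evenly spread records versus few, loosely connected records.
uniform = loo_robustness([0, 2, 4, 6, 8], max_dist=5)
sparse = loo_robustness([0, 5, 10], max_dist=5)
print(round(uniform, 2), round(sparse, 2))
```

In this toy setting the evenly spread species loses little range when a single record is removed, whereas the loosely connected species loses more than half of its range on average, which yields the smaller ratio.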

The LOOCV approach does not account for errors due to heterogeneous data quality or sampling effort. Whereas we integrated a strategy to adjust for heterogeneous spatial sampling effort at the level of species richness, we did not include an adjustment for the fact that more recent monographs will be more complete in terms of both taxa and occurrences considered. In the future, the interpolation process could be altered to include an additional weighting at species level. Furthermore, our maps will improve as more data from future monographs are included in the analysis.

The results identified here are not absolute estimates of species richness per quadrat. To obtain a rough estimate of the absolute figures, the numbers found per quadrat need to be multiplied by a factor of 20, since our data set represents approximately 5% of the angiosperm flora occurring in the Neotropics. Following this estimation, our uppermost results would lie close to the uppermost results of Barthlott et al. (2005), who suggested more than 5,000 vascular plant species in the most species-rich 10,000 km2 units, and to those of Kreft and Jetz (2007), who modeled a maximum of 6,500 species in the most species-rich 1° quadrats. Although our species richness map can only approximate ‘real patterns’, this consistency broadly supports our estimation of distribution patterns.
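The scaling step is simple arithmetic: a data set covering about 5% of the flora implies a factor of 1/0.05 = 20. The sketch below spells this out; the function name `scale_to_total_flora` and the example count of 250 species are hypothetical, and the calculation assumes that the sampled species are representative of the whole flora in every quadrat.

```python
def scale_to_total_flora(species_in_quadrat, sampled_fraction=0.05):
    """Scale a per-quadrat count by the inverse of the sampled fraction of the flora."""
    if not 0 < sampled_fraction <= 1:
        raise ValueError("sampled_fraction must lie in (0, 1]")
    return species_in_quadrat / sampled_fraction


# Hypothetical quadrat holding 250 species of the data set.
estimate = scale_to_total_flora(250)
print(round(estimate))
```

With the assumed count of 250 species, the rough absolute estimate is about 5,000 species for that quadrat.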

Narrow endemic species

Compared with previous work (Morawetz and Raedig 2007), a similar number of species is identified as narrow endemic, even though more species were considered. Previously, all species occurring in three or fewer quadrats were defined as narrow endemic species irrespective of the distance between species occurrences, while in the present work only those species that occurred in five or fewer quadrats after interpolation with the maximum distance of five quadrats qualified as narrow endemic. Although the threshold of five quadrats appears more generous, the method is more rigorous in that it considers spatial distance. The main difference between Morawetz and Raedig (2007) and the present study is the absence of some species in southeastern Amazonia and in the Cerrado and Caatinga (two Brazilian floristic provinces) whose recorded occurrences were too geographically distant to be considered narrow endemic.

The analysis of narrow endemic species revealed two shortcomings of our interpolation method. First, if quadrats hold no species after interpolation, no adjustment of sampling effort can be applied. Considering the large number of empty quadrats, the map of narrow endemism (Fig. 6a) might reflect sampling effort more than distribution patterns. Second, we are interpolating species ranges, but not species richness per quadrat. Thus, narrow endemic species that have never been collected are absent from our analysis. We can hypothesize that quadrats near well-collected quadrats with many narrow endemic species (Fig. 6a) might also hold more narrow endemic species. Considering the low levels of collecting and taxonomic activity in Amazonia in combination with the shortcomings of our method, the question of whether narrow endemic species are a common phenomenon in Amazonia remains open. This matter can only be clarified by sampling quadrats that have not been sampled appropriately (Bates and Demos 2001; Hopkins 2007), by taxonomic classification of the unidentified specimens already deposited in herbaria (Ruokolainen et al. 2002), and by publishing these results and continually complementing and updating databases with this information. Accordingly, our long-term objective is to complement and update our database and to integrate topographic, satellite-based or species-related information into the interpolation process (e.g. inclusion of detailed soil data in combination with knowledge of the edaphic demands of species).

Protection status

In the Neotropics, almost 90% of the quadrats have no or only low protection status according to the WDPA 2007 (WDPA Consortium 2008; Fig. 5a, b). This figure is worryingly high and reveals that many protected areas are rather small. Species richness in better protected quadrats (Fig. 5c, d) in populated regions is low, which hints at the conflict between species diversity and human settlement; the existence of large cities in a quadrat excludes the establishment of large protected areas.

Bearing in mind the limitations of our approach, the large number of endemic-rich quadrats lacking protection status (Fig. 6b) demonstrates the urgency of the situation. Such quadrats were found in all parts of the Neotropical region. Since our database probably excludes many as yet undescribed narrow endemic species, the picture could be substantially worse. Many quadrats, in particular in north-eastern Amazonia, are empty in our map and rather poorly provided with protected areas. In comparison to a previous analysis based on the WDPA 2005 (Morawetz and Raedig 2007), some quadrats containing many narrow endemic species but lacking protection status are now protected. However, as shown in Fig. 5, the proportion of the respective quadrats under protection is often small (Grenyer et al. 2006). Our map of protection status of narrow endemic species (Fig. 6b) could serve as a first step towards prioritizing the creation of protected sites, while better resolution of endemism data would greatly improve the results. In summary, the distribution patterns found here, although based on incomplete data and therefore preliminary, advocate the establishment of further protected areas in the Neotropics.

Conclusion

In the light of increasing data availability and ever growing distribution data sets, methods need to be tailored to their analysis. Although distribution modeling approaches are available, their applicability for monographic data and for presence-only data in general is often compromised by data scarcity, poor data quality and lack of knowledge of the environmental correlates of species. Our method is precisely targeted at such data and can also be adjusted to accommodate different taxonomical groups by changing the weighting of interpolation distances for species range generation.

Using this new method, we identified and validated centers of Neotropical angiosperm species richness and compared them to the current protection status and human population density. In addition, we identified areas where insufficient data do not allow for reliable estimates of distribution patterns. This is due to the sensitivity of the distribution patterns of the narrow endemic species towards sampling effort. In particular, our method might underestimate the numbers and the ranges of narrow endemic species in poorly collected areas. Our maps also indicate areas for further sampling activity, because the available data do not yet allow for robust estimation of species richness patterns. To permit pinpointing of species-rich areas for conservation priorities, a robust estimate of total species richness and narrow endemic species richness is necessary. Therefore, future collection activity should focus on under-sampled areas and under-sampled taxa. Further taxonomic identification of both new and already collected, unidentified specimens is necessary, which requires additional training and support of expert taxonomists. New and reliable data will enable the scientific community to further clarify Neotropical angiosperm distribution and in particular endemism patterns to improve response to conservation needs.