The use of rarefaction and extrapolation as methods of estimating the effects of river eutrophication on macrophyte diversity


A study was carried out to estimate the effect of trophic degradation on the diversity of aquatic plants in rivers, with the application of rarefaction and extrapolation. The purpose of rarefaction was to standardise the uneven number of samples, while extrapolation enabled prediction of the real diversity considering the expected number of species undetected by the sampling effort. Both methods were based on three Hill’s numbers: q = 0 (species richness), q = 1 (Shannon index) and q = 2 (inverse Simpson index). The macrophyte survey was carried out at 96 river sites of a uniform abiotic type (medium lowland rivers with sandy substrate). Macrophyte diversity was evaluated based on survey data (the reference sample) as well as using a dataset standardised by rarefaction and extrapolation. Trends in species richness along the trophic gradient were depicted differently by analysis based on the reference sample and on the extrapolated dataset. Based on the reference sample, increasing river trophy led to diminishing macrophyte richness, whereas use of the extrapolated dataset revealed that the highest plant diversity can be found in mesotrophic rivers. The extrapolated dataset showed that most oligotrophic pristine rivers were poorer in species than mesotrophic watercourses, and the most limited pool of macrophytes was found in highly eutrophic watercourses. Rarefaction/extrapolation methods enable the precise comparison of plant diversity across sites by predicting the occurrence of rare species. Moreover, the extrapolation method allows total biodiversity to be assessed on the basis of a smaller number of trials (survey samples).


Biological diversity, i.e. variety of ecosystems, species and genes, is one of the Earth’s natural resources which is currently under severe threat from human activities. Its effective conservation requires appropriate methods for the identification and monitoring of existing natural resources (Loreau et al. 2001; Cardinale et al. 2012). Ecosystem diversity is difficult to quantify, and gathering comprehensive biodiversity data requires a great deal of work. Therefore, various metrics have been designed to provide a fast and objective way to test key assumptions and to present general trends related to biological diversity status (e.g. Dudgeon et al. 2006; Magurran 2004).

Biodiversity metrics are widely used in nature conservation and are utilised as a tool of ecological policy in various countries. The EU countries have developed numerous indices for estimating the diversity of freshwater organisms, which may be used in monitoring or play a fundamental role in the environmental impact assessment (EIA) of water engineering systems (Furse et al. 2006; Freni et al. 2008). River engineering strongly affects the freshwater habitat, which is well reflected by the biological elements (Herring et al. 2006; Błachuta et al. 2014). It is therefore important to estimate biodiversity precisely, since incorrect evaluation may lead to wrong decisions in natural resource monitoring and management (Gotelli and Colwell 2001; Hurlbert 1971; Simberloff 1979; Palmer 1990; Colwell and Coddington 1994; Colwell et al. 2004; Ulrich and Ollik 2005).

Species richness is the simplest and probably most widely adopted metric for assessing species diversity, but despite its intuitive suitability, it is problematic for many reasons. First, species richness is extremely sensitive to the sample size (i.e. number of samples, scope of sampling, etc.). Sampling for comprehensive diversity requires intensive labour, but is notoriously incomplete, because most of the species within ecosystems are rare and can easily go undetected (Lawton et al. 1998). Rare species should not be ignored, as many of them are at greater risk of extinction and they may still play an important role in ecosystem functioning (MacDougall et al. 2013).

The misidentification or non-detection of rare species can also influence other standard diversity metrics such as the Shannon and inverse Simpson indices. These metrics are influenced by species relative abundance; the Shannon index strongly depends on the number of common species, and the inverse Simpson index is strongly related to the number of very common species (Hill 1973; Brewer and Williamson 1994), but still the species richness component influences their final value. This means that their comprehensive estimation requires extensive investigations.

To reduce the consequences of insufficient sampling records, the quantification of diversity can be supported by rarefaction and extrapolation procedures (Gotelli et al. 2012; Chao et al. 2014). This technique has been tested for various organism groups in numerous types of ecosystems (Colwell et al. 2012; MacArthur 1965; Hill 1973; Jost 2006, 2007; Chiu et al. 2014; Chao et al. 2014; Longino and Colwell 2011).

The Chao method and Hill’s numbers have previously been used in macrophyte studies on German rivers by Steffen et al. (2013) and in river flood plains in Brazil (Moro et al. 2014) and Argentina (Schneider et al. 2014). The rarefaction and extrapolation methods for macrophytes in a single river catchment can be found in Budka et al. (2018). The method utilises the estimators of Chao et al. (2014) to plot a unified species accumulation curve integrating rarefaction (interpolation) and prediction (extrapolation), hereafter called the rarefaction/extrapolation curve. The rarefaction part of the curve standardises the biodiversity measures to a smaller number of samples, for the purpose of comparison among datasets with uneven numbers of sample units. The extrapolation part of the curve provides estimates of biodiversity measures for an increased number of samples. In this way the rarefaction/extrapolation curve improves diversity estimates and enables the comparison of diversity between datasets with an uneven number of samples and/or insufficient sampling records.

This study focused on macrophytes, which include vascular plants, bryophytes (mosses and liverworts) and filamentous algae growing in the water. Macrophytes play an important role in aquatic environments by providing physical structure (Thomaz and Cunha 2010) and by increasing habitat complexity and heterogeneity, which strongly affects other aquatic organisms such as micro- and macro-invertebrates (Bergström et al. 2000; Lansac-Tôha et al. 2003; Takeda et al. 2003), fish (Araújo-Lima et al. 1986; Meschiatti et al. 2000; Vono and Barbosa 2001; Theel et al. 2008), waterbirds (Pott and Pott 2000; Guadagnin et al. 2009; Klaassen and Nolet 2007), zooplankton or microalgae (Kuczynska-Kippen and Joniak 2016; Celewicz-Goldyn and Kuczynska-Kippen 2017). The scientific goal of this study was to estimate trends in macrophyte diversity along a trophic gradient in rivers. Diversity estimation was performed based on a reference sample and on a dataset standardised by rarefaction and extrapolation. The aim of rarefaction was to standardise the uneven number of survey sites in the reference sample, and extrapolation enabled prediction of the real diversity considering the estimation of species undetected by the sampling effort. The practical goal was to improve the precision of diversity estimates and to increase the effectiveness of ecological inventories in fluvial ecosystems for conservation and management purposes.

We hypothesised that diversity estimates based on a limited number of samples differ from estimates based on rarefaction and extrapolation so strongly that the ecological classification of a river may differ as a result. We also hypothesised that the sample size required for evaluation of full biodiversity (sample coverage) exceeds the possibilities of standard plant monitoring in rivers, and that, to detect full biodiversity, the field survey should therefore be supported by extrapolation estimates.

Materials and methods

Survey data

The macrophyte survey was carried out at 96 river sites in the lowland area of Poland (Fig. 1). All of the sampled sites belong to a uniform abiotic type: small and medium lowland rivers with sandy substrate. This is a siliceous type of river, flowing in valleys dominated by fine-grained sand and sometimes clays and loesses. All of the sites were situated below 200 m above sea level, and the catchment areas were smaller than 1000 km². The database has already been analysed in a previous article that contains more detailed information about the character of the botanical and ecological records (Szoszkiewicz et al. 2017).

Fig. 1

Location of selected survey sites. (Color figure online)

Survey sites were chosen on the basis of an analysis of the national environmental monitoring database. The classification criterion was the annual average concentration of nutrients, based on twelve monthly water samples. The chemical quality of the water was considered, and 96 rivers were selected to represent a wide gradient of phosphorus (reactive and total phosphorus) and nitrogen (total nitrogen) concentrations. Only sites at which nitrogen concentrations were at least partly correlated with the concentration of phosphorus were selected, to focus our analysis on the degradation gradient corresponding to the typical eutrophication process taking place in the environment. The rivers chosen included some from the group of purest rivers in Poland in terms of nutrient content, as well as the most polluted and mesotrophic rivers. The chosen watercourses were divided into five chemical quality classes, with class I consisting of the highest-quality (unpolluted) rivers and class V of the most degraded. The numbers of survey sites representing the five quality classes were uneven, ranging from 17 to 20 river sites per class. The survey sites (surveyed sample) will hereafter be referred to as the reference sample. The numbers of sites in the reference sample from the particular quality classes are presented in Table 2.

The macrophyte survey was carried out between 2010 and 2013 in the summer period, from July to early September. Each river site was 100 m long, and all aquatic plants were recorded, including submerged, free-floating, amphibious and emerged species. Monocotyledonous and dicotyledonous plants were recorded, as well as pteridophytes, mosses, liverworts and filamentous algae. A glass-bottomed bucket was used to aid observations. Taxonomic identification was performed to species level, except in the case of algae, which were distinguished to genus level. The recorded incidence of each species across sampling units (the 17–20 rivers within each quality type) was used to calculate the Hill’s numbers and the rarefaction/extrapolation curves.

Hill’s numbers

Based on the survey data, Hill’s numbers, as a unified family of biodiversity indices, were estimated (Hill 1973). These are expressed in units of effective values of key diversity metrics—number of species, Shannon index and inverse Simpson index (Gotelli and Chao 2013; Chao et al. 2014). Hill’s numbers are defined in the following way (Chao et al. 2014):

$${}^{q}\Delta = \left( \sum_{i = 1}^{S} p_{i}^{q} \right)^{1/(1 - q)}, \quad q \ge 0,\ q \ne 1$$

where S denotes the number of all species, and \(p_{i}\) denotes the relative frequency of the ith species in the assemblage (\(p_{i} = \pi_{i} / \sum \pi_{i}\), i.e. the probability of occurrence of that species divided by the sum of probabilities for all species).

The parameter q defines the sensitivity of the diversity metric to the relative frequency of occurrence of species in the environment (i.e. rare versus abundant). Assuming q = 0, we get \({}^{0}\Delta = S\), which is the measure of species richness. Because the exponent 1/(1 − q) is undefined at q = 1, in that case we use the limiting value:

$${}^{1}\Delta = \lim_{q \to 1} {}^{q}\Delta = \exp \left( - \sum_{i = 1}^{S} p_{i} \log p_{i} \right)$$

thus obtaining the exponential Shannon index. When q = 2, we obtain the inverse Simpson index in the form:

$${}^{2}\Delta = \frac{1}{\sum_{i = 1}^{S} p_{i}^{2}}$$
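The three formulas above can be illustrated with a minimal Python sketch. It computes the plain (plug-in) Hill numbers from a vector of species frequencies; it is not the estimator of Chao et al. (2014), and the function name `hill_number` and the incidence counts are hypothetical, chosen only for illustration.

```python
import math

def hill_number(frequencies, q):
    """Plug-in Hill number of order q from non-negative species frequencies."""
    total = sum(frequencies)
    # Relative frequencies p_i; species with zero frequency are dropped.
    p = [f / total for f in frequencies if f > 0]
    if q == 1:
        # Limiting case q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical incidence counts: number of sites at which each species occurred.
incidence = [10, 10, 5, 5]
richness = hill_number(incidence, 0)   # q = 0, species richness
shannon = hill_number(incidence, 1)    # q = 1, exponential Shannon index
simpson = hill_number(incidence, 2)    # q = 2, inverse Simpson index
```

For these counts the richness is 4 and the inverse Simpson index is 3.6, with the exponential Shannon value lying between the two. This reflects the general property that Hill numbers do not increase with q.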

As the value of species richness varies with the sample size (the number of survey sites, from 17 to 20 for individual river classes), and consequently with the completeness of the sample, it is possible to present the expected value of the metric as a function of species coverage (in relation to the pool of species occurring in rivers of a given class). As the number of samples tends to infinity, we obtain (asymptotically) the species richness of all rivers of a given class. Analogous curves can be constructed for other biodiversity measures, and they are graphs of functions increasing with sample completeness. For a sample size m which is smaller (rarefaction) or larger (extrapolation) than the real size, estimators for the first three Hill’s numbers have been presented (Chao et al. 2014), allowing the correct curves of biodiversity metrics to be obtained. The actual sample size, i.e. that obtained in field surveys, is hereafter called the reference value.

The values of the diversity metrics (\(^{q} \Delta\)) based on the reference sample are presented in Table 2, where the species richness is an estimation of \({}^{0}\Delta\) (Hill’s number for q = 0), the Shannon index is \({}^{1}\Delta\) (Hill’s number for q = 1), and the inverse Simpson diversity index is \({}^{2}\Delta\) (Hill’s number for q = 2). The Chao2 estimator for incidence-based data was used to estimate the real species richness (Chao 1984, 1987, 2014). The approach of Colwell et al. (2012) was applied, which suggests that, for extrapolation, these estimators should be used for sample sizes no greater than 2–3 times the real sample size; within this range the estimates obtained are reliable. Curves for the appropriate Hill’s numbers comply with this principle, and the estimators used in the study have good properties for rarefaction and short-range extrapolation. The work of Chao et al. (2014) also presents a method of setting confidence intervals for the described rarefaction/extrapolation curves, which is extremely helpful when drawing conclusions. The confidence intervals for Hill’s numbers were obtained using the bootstrap method.
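As an illustration of the Chao2 estimator mentioned above, the following Python sketch uses the form in which it is commonly given: the observed richness plus a correction based on the number of species found at exactly one site (uniques, Q1) and at exactly two sites (duplicates, Q2). The function name `chao2` and the incidence counts are hypothetical; the study itself relied on an R implementation, which may differ in detail.

```python
def chao2(incidence_counts, n_sites):
    """Chao2 estimate of total richness from incidence counts over n_sites units."""
    observed = [y for y in incidence_counts if y > 0]
    s_obs = len(observed)
    q1 = sum(1 for y in observed if y == 1)  # uniques: species seen at one site
    q2 = sum(1 for y in observed if y == 2)  # duplicates: species seen at two sites
    k = (n_sites - 1) / n_sites
    if q2 > 0:
        undetected = k * q1 ** 2 / (2 * q2)
    else:
        # Form commonly used when no duplicates are present.
        undetected = k * q1 * (q1 - 1) / 2
    return s_obs + undetected

# Hypothetical example: 8 observed species recorded across 20 sites.
counts = [1, 1, 1, 1, 2, 2, 5, 10]
estimate = chao2(counts, 20)  # 8 observed + 0.95 * 16 / 4 = 11.8
```

The estimate depends only on the rarest species, which is why a reference sample with many uniques implies a large number of undetected species.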

Rarefaction and extrapolation with the use of Hill’s numbers

The diversity ordering of the river quality classes based on the reference sample can be verified by the rarefaction and extrapolation method with the use of Hill’s numbers. Trends in macrophyte diversity along the trophic gradient were depicted by three integrated rarefaction/extrapolation curves, based on the first three Hill’s numbers (Colwell et al. 2012; Chao et al. 2014): species richness \({}^{0}\Delta\), Shannon index \({}^{1}\Delta\) and inverse Simpson index \({}^{2}\Delta\).

Integrated rarefaction/extrapolation curves with varying sample size for the first three Hill’s numbers were drawn for each quality class, along with corresponding 95% confidence intervals based on the bootstrap method. Extrapolations were carried out up to a number of survey sites (sample size n) twice the size of the reference sample. It should be noted that partial overlapping of confidence intervals does not guarantee the absence of differences between biodiversity values. The bootstrap method was based on 100 repetitions, which is consistent with the approach presented in the papers of Chao et al. (2014) and Budka et al. (2018). For some cases, the results for 1000 replicates were checked and the conclusions were consistent. To compare the vegetation of rivers in each quality class, three graphs for Hill’s numbers (q = 0, q = 1, q = 2), combining the appropriate rarefaction/extrapolation curves for all compared habitats, were created. Extrapolation was carried out up to the basic sample size set by an algorithm modelled on the study of Chao et al. (2014). Based on that paper, analyses were extrapolated to the level corresponding to double the reference sample of the smallest quality class. In further stages, the results obtained were verified using the approach of Chao et al. (2014), by constructing sample completeness curves (as functions of the number of survey sites) and comparing rarefaction/extrapolation curves as a function of coverage.
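For species richness (q = 0), an integrated rarefaction/extrapolation curve of the kind described above can be sketched in Python as follows. The sketch uses the sample-based interpolation and extrapolation formulas as commonly given following Colwell et al. (2012); the function name `richness_curve` and the incidence counts are hypothetical, and the curves for q = 1 and q = 2 are not covered.

```python
from math import comb

def richness_curve(incidence_counts, n_sites, t):
    """Expected species richness for t sampling units (q = 0 only)."""
    observed = [y for y in incidence_counts if y > 0]
    s_obs = len(observed)
    if t <= n_sites:
        # Rarefaction: subtract the expected number of species missed
        # when only t of the n_sites units are drawn.
        missed = sum(comb(n_sites - y, t) for y in observed) / comb(n_sites, t)
        return s_obs - missed
    q1 = sum(1 for y in observed if y == 1)
    q2 = sum(1 for y in observed if y == 2)
    k = (n_sites - 1) / n_sites
    # Chao2-type estimate of the number of undetected species.
    q0 = k * q1 ** 2 / (2 * q2) if q2 > 0 else k * q1 * (q1 - 1) / 2
    if q0 == 0:
        return float(s_obs)
    extra = t - n_sites
    # Extrapolation: approach the asymptote s_obs + q0 as units are added.
    return s_obs + q0 * (1 - (1 - q1 / (n_sites * q0 + q1)) ** extra)

# Hypothetical reference sample of 20 sites, extrapolated to double its size.
counts = [1, 1, 1, 1, 2, 2, 5, 10]
curve = [richness_curve(counts, 20, t) for t in range(1, 41)]
```

The curve passes through the observed richness at the reference sample size (8 species at t = 20 in this example) and then rises towards the asymptotic estimate without reaching it within the doubled sample.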

Analyses were performed in accordance with the procedure implemented in the R 3.2.4 environment (R Development Core Team 2013).
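The analyses in the study were run in R. Purely to illustrate the resampling idea behind bootstrap confidence intervals, a simplified Python sketch is given below. It resamples survey sites with replacement and takes a percentile interval for the inverse Simpson index. This naive site-resampling scheme is an assumption made for illustration and is not the bootstrap procedure of Chao et al. (2014); the function names and the site records are hypothetical.

```python
import random

def inverse_simpson(counts):
    """Plug-in Hill number of order q = 2 from incidence counts."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)

def bootstrap_interval(site_records, statistic, n_boot=100, alpha=0.05, seed=0):
    """Percentile interval obtained by resampling sites with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    species = sorted(set().union(*site_records))
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(site_records) for _ in site_records]
        # Incidence count of each species in the resampled set of sites.
        counts = [sum(1 for site in resample if sp in site) for sp in species]
        estimates.append(statistic(counts))
    estimates.sort()
    lower = estimates[int(alpha / 2 * (n_boot - 1))]
    upper = estimates[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lower, upper

# Hypothetical records: the set of species found at each of six sites.
sites = [{"A", "B"}, {"A", "C"}, {"A", "B", "D"}, {"B"}, {"A", "E"}, {"A", "B", "C"}]
lower, upper = bootstrap_interval(sites, inverse_simpson)
```

The default of 100 replicates mirrors the number of repetitions reported in the text; raising `n_boot` to 1000 corresponds to the consistency check mentioned there.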


The research indicated a wide range of hydrochemical quality in the investigated rivers. Five trophic classes were identified, and a wide range of concentrations was found for the water parameters, including such key indicators of trophic status as the different forms of nitrogen and phosphorus, conductivity and BOD5 (5-day biological oxygen demand). An exception in this respect was pH, which showed no variation between the identified classes (Table 1).

Table 1 Descriptive statistics of the physicochemical variables of rivers representing five trophic classes

The highest values of diversity in the reference sample were found for the pristine rivers, where 90 macrophyte taxa were detected \(\left( {{}^{0}\Delta } \right)\), the Shannon index was 69.51 \(\left( {{}^{1}\Delta } \right)\) and the inverse Simpson diversity index was 57.85 \(\left( {{}^{2}\Delta } \right)\). All three diversity metrics decreased along the trophic gradient, and the lowest values of the three metrics were found in the last class: 71 \(\left( {{}^{0}\Delta } \right)\), 48.43 \(\left( {{}^{1}\Delta } \right)\) and 37.86 \(\left( {{}^{2}\Delta } \right)\) (Table 2). Therefore, based on the non-standardised data (i.e. the reference sample), a decrease in the diversity metrics along the trophic gradient of rivers was observed.

Table 2 Hill’s numbers for five river quality classes in a reference sample

The results of the first step of the analysis (based on the reference sample), which verify the above relationships, are presented in Fig. 2.

Fig. 2

Rarefaction/extrapolation biodiversity curves as functions of the number of survey sites (sampling units), based on Hill’s numbers \({}^{q}\Delta\) with q = 0 (species richness), q = 1 (Shannon index) and q = 2 (inverse Simpson index). The solid line is the rarefaction curve and the dotted line is the extrapolation curve, which goes up to double the size of the reference sample. Points representing biodiversity coordinates for the reference data are marked with dots, and the sample size and the observed Hill’s number in the reference sample appear in brackets. Extrapolation goes up to double the size of the reference sample, this being n = 40 for the first, second and fourth class, n = 34 for the third class and n = 38 for the fifth class. The shaded area represents 95% confidence intervals obtained using the bootstrap method on the basis of 100 repetitions. (Color figure online)

Every river quality class revealed the same pattern on the rarefaction/extrapolation curve (Fig. 2): excluding small initial samples (survey sites), confidence intervals did not overlap. Moreover, the same curve order was detected for every quality class: the highest values were obtained for species richness, then for the Shannon index (strongly representing the number of frequent species), and last for the inverse Simpson index (strongly related to the number of very frequent species). The species richness curve for each class increased continuously with increasing sample size (number of survey sites), whereas the curves representing the Shannon and inverse Simpson indices grew considerably only in the initial part; on reaching the extrapolation part (the dotted line on the graph) their growth was very limited. In other words, the species richness curve grew significantly with increasing sampling effort and still indicated a lack of completeness at the end of the extrapolation part. For the two remaining indices, every river quality class showed relatively limited growth with increasing sampling effort, and completeness of the sample was achieved already at the beginning of the extrapolation part.

Comparing quality classes, the highest sample completeness was detected for class I (Fig. 2), because the \({}^{0}\Delta\) curve, after an initial phase of increasing, becomes almost horizontal. This means that after the initial phase the number of detected species does not increase. At the other extreme, the class IV curve exhibits sustained increase after the initial phase. This means that new species are detected with additional survey sites. The extrapolation curve was modelled for 40 survey sites (double the reference sample) and its increasing character shows that with continued sampling the number of detected species is expected to grow. This was confirmed by the predictions presented in Table 3, where the complete richness (total number of plant species) in class IV rivers was estimated at 123 species. This means that in addition to the 77 species observed during the field survey (reference sample) another 46 species are expected to be detected. Moreover, the prediction showed that to record the expected 46 plants during a field survey an extremely extensive range of additional surveying would have to be completed—it was estimated that to identify the full set of species in class IV rivers another 236 sites would have to be surveyed. On the other hand, identification of the full richness of the class I rivers (101 species) requires only an additional 62 sites to be surveyed.
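The arithmetic behind these figures can be checked with a few lines of Python. The observed and estimated richness values for classes I and IV are those quoted in the text; the species-based completeness ratio (observed divided by estimated richness) is an illustrative quantity introduced here and is not the sample coverage reported elsewhere in the paper.

```python
# Observed and estimated total richness quoted in the text for classes I and IV.
observed = {"I": 90, "IV": 77}
estimated_total = {"I": 101, "IV": 123}

# Species expected to remain undetected in the reference sample.
undetected = {c: estimated_total[c] - observed[c] for c in observed}

# Illustrative ratio of observed to estimated richness (not sample coverage).
completeness = {c: observed[c] / estimated_total[c] for c in observed}
```

This gives 11 undetected species for class I and 46 for class IV, so that roughly 89% of the estimated richness was recorded in class I but only about 63% in class IV.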

Table 3 Observed and estimated number of species in five river quality classes

Comparisons of rarefaction/extrapolation curves for the five quality classes are presented in Fig. 3, separately for \({}^{0}\Delta\) (Fig. 3a), \({}^{1}\Delta\) (Fig. 3b) and \({}^{2}\Delta\) (Fig. 3c). The biodiversity comparison is based on 34 sites (basic sample size n = 34), which is double the size of the sample for the class with the smallest number of sites (class III). The \({}^{0}\Delta\) curves (species richness) intersect each other, which means that the order of rivers representing different quality in terms of their biodiversity depends on the size of the sample analysed (the number of surveyed sites). When a sample was small (below the size of the reference sample) the river quality ordering corresponded to the species richness; for the most pristine rivers (quality class I) species richness was estimated at 90 taxa, followed by class II (86), class III (85), class IV (77) and finally class V (71). When a large sample was considered (n = 34 sites) the river quality ordering did not correspond to the species richness, and the group of mesotrophic rivers (quality classes IV, III and II) appeared as the richest in species, followed by pristine rivers (quality class I). The most degraded rivers remained the poorest in species.

Fig. 3

Integrated curves of rarefaction/extrapolation, presented as a function of the sample size for: (a) q = 0 (species richness), (b) q = 1 (Shannon index), (c) q = 2 (inverse Simpson index). Roman numerals (I, II, etc.) correspond to the respective quality classes of rivers. (Color figure online)

The differences in the \({}^{0}\Delta\) estimations between the five quality classes were not large; the confidence intervals were not always separated (Fig. 3a). We can only conclude that the first quality class differs significantly from classes IV and V in terms of species richness, and classes II and III differ from class V. However, we did not see significant differences in species richness within the first three quality classes or within the two last classes. Our inferences were made at a significance level of α = 0.05.

As noted earlier, the first quality class was characterised by the highest dataset completeness, which resulted in only a small increase in species richness when the sampling effort was increased above 17 sites (compared with the species richness measured in the reference sample). As a result, a change in the order of richness within the first three quality classes takes place as the sample size increases up to the basic sample size. The highest estimated value was in quality class III (101.08), followed by class II (99.71) and class I (98.29); however, these values do not differ significantly.
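The reversal of order within the first three quality classes can be made explicit with a short Python sketch. The richness values are those quoted in the text for the reference sample and for the basic sample size (n = 34); the helper name `ranking` is hypothetical.

```python
# Species richness quoted in the text for quality classes I to III.
reference = {"I": 90, "II": 86, "III": 85}              # reference sample
extrapolated = {"I": 98.29, "II": 99.71, "III": 101.08}  # basic sample size n = 34

def ranking(values):
    """Class labels ordered from the richest to the poorest in species."""
    return sorted(values, key=values.get, reverse=True)

order_reference = ranking(reference)
order_extrapolated = ranking(extrapolated)
```

The reference sample gives the order I, II, III, whereas the extrapolated values give III, II, I. As stated in the text, the extrapolated values do not differ significantly, so the reversal should be read as a change in ordering rather than as a demonstrated difference.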

In the case of the diversity metrics related to the number of frequent species (\({}^{1}\Delta\), Shannon index) and those occurring very frequently (\({}^{2}\Delta\), inverse Simpson index), the order of the quality classes was consistent with the trophic gradient of rivers, I > II > III > IV > V. In both cases, the only significant differences observed were between the class I rivers and the other classes, except for class II, and between classes II and III and class V (Fig. 3b, c).

The results obtained were confirmed by the construction of sample completeness curves (as functions of the number of survey sites). With the standardised sample size for 20 < n ≤ 34, the highest coverage was estimated in class I (maximum for n = 34: 98.6%), then in class III (maximum 97.6%), class II (maximum 96.8%) and class V (maximum 96.3%), with the lowest coverage in class IV (maximum 95.1%). Comparison of the rarefaction/extrapolation curves as functions of coverage for all three Hill’s numbers, at high coverage values, confirmed the order of the biodiversity indices of the five quality classes found for the curves in Fig. 3. There was therefore a correspondence between the conclusions regarding the values of biodiversity estimators drawn on the basis of both curve types.
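Sample coverage of the kind reported above can be sketched in Python for the reference sample size and for extrapolated sample sizes. The sketch uses one common form of the incidence-based coverage estimator; whether it matches the exact implementation used in the study is an assumption, and the function name `sample_coverage` and the incidence counts are hypothetical. Coverage for rarefied samples (fewer units than the reference sample) is not implemented.

```python
def sample_coverage(incidence_counts, n_sites, t):
    """Estimated coverage for t >= n_sites sampling units (incidence data)."""
    if t < n_sites:
        raise ValueError("this sketch covers only t >= n_sites")
    observed = [y for y in incidence_counts if y > 0]
    u = sum(observed)                        # total number of incidences
    q1 = sum(1 for y in observed if y == 1)  # uniques
    q2 = sum(1 for y in observed if y == 2)  # duplicates
    if q1 == 0:
        return 1.0
    k = (n_sites - 1) / n_sites
    # Chao2-type estimate of the number of undetected species.
    q0 = k * q1 ** 2 / (2 * q2) if q2 > 0 else k * q1 * (q1 - 1) / 2
    a = n_sites * q0 / (n_sites * q0 + q1)
    # Coverage deficit shrinks geometrically as extra units are added.
    return 1.0 - (q1 / u) * a ** (t - n_sites + 1)

# Hypothetical reference sample of 20 sites.
counts = [1, 1, 1, 1, 2, 2, 5, 10]
coverage_reference = sample_coverage(counts, 20, 20)
coverage_doubled = sample_coverage(counts, 20, 40)
```

In this example coverage rises when the sample is doubled but remains below 100%, which mirrors the behaviour of the completeness curves described in the text.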


The study has shown that river plant diversity is difficult to quantify, and that advanced analytical tools must be applied. It was found that the ordering of rivers representing various quality classes according to diversity metrics differed depending on whether it was based on the reference sample or on the extrapolated dataset, in which the presence of rare species was predicted. The ordering based on the reference sample (17–20 river sites per quality class) showed that growth in river trophy leads to diminishing biodiversity (Table 2). The same pattern was observed in the case of all diversity metrics considered: species richness, the Shannon index and the inverse Simpson index. Ordering based on a larger number of samples (double the reference sample) showed the highest species richness (q = 0) to be that of mesotrophic rivers: the richest in species was class II, followed by class III rivers. The pristine class I rivers dropped to third place. The most polluted rivers remained the poorest in species. The change in the order of quality classes based on extrapolated samples, relative to the reference samples, confirms the hypothesis that data standardisation can modify the picture of macrophyte diversity change along the trophic gradient. We confirmed that extrapolation is a useful technique to reveal the diversity pattern of running waters, as has been found in other types of ecosystems (Colwell et al. 2012; Chao et al. 2014).

Analyses show that the estimated number of species for full diversity in each river quality class varies between 98 (class V) and 123 (class IV). This value seems to be large when compared to the number of species recorded, but it corresponds well to the overall river plant biodiversity resources of lowland Poland. The total aquatic flora of these watercourses is estimated at about 115 vascular plant species (Rutkowski 2008; Bernatowicz and Wolny 1969). As well as true macrophytes, another 63 terrestrial vascular plants may potentially develop in the river bank zone (Rutkowski 2008). This number should be increased by bryophytes, with 10 liverworts and 15 mosses (Jusik 2012). An additional 15 semi-aquatic mosses are regularly recorded in lowland rivers with sandy substrate (mainly Bryopsida and Mnium species). Moreover, nine multicellular algae may potentially be found in this type of watercourse (Cladophora, Ulva, Vaucheria, Oedogonium, Ulothrix, Spirogyra, Hildenbrandia, Rhizoclonium, Stigeoclonium). The predicted number of species corresponds well with the river flora resources estimated for the European Lowlands by Szoszkiewicz et al. (2006) and Ellenberg et al. (1992).

The study also demonstrated the application of the rarefaction approach in biodiversity assessment. In our case, the number of surveyed sites was uneven among the compared river quality classes, but these differences were small—the reference sample varied between 17 and 20 river sites per quality class. The role of rarefaction is the standardisation of an uneven number of samples, but in our case the differences were not distinct enough to detect any change in macrophyte diversity along the trophic gradient. Therefore we were unable to confirm part of the hypothesis put forward. Nevertheless, the literature shows that diversity comparisons with various organisms in different ecosystems can often lead to incorrect conclusions when the samples are uneven, and based on the rarefaction plot the risk of error can be minimised (Colwell et al. 2012; Chao et al. 2014).

The extrapolation of the diversity metrics for rivers representing various quality classes changed their order only when the number of species was considered; the orderings based on the Shannon index and the inverse Simpson index were not changed (Fig. 3). The change in ordering based on species richness was a result of the abundance of rare species which are not detected in the reference sample. However, the flattened extrapolation curve (Fig. 2) for the class I rivers showed the highest completeness of species of the reference sample. On the other hand, the sharply increasing extrapolation curve for the impacted rivers showed a high number of rare species in the reference sample, indicating expected growth in identified species in the enlarged pool of sampled sites.

The highest deficit of identified species was found for class IV rivers, where 77 species were found in the reference sample and another 46 taxa were expected to be present (Table 3). Generally, the highest species richness was estimated for the rivers representing a medium level of trophy (class IV, followed by classes II and III). The most species-poor type of river appeared to be the most polluted (class V). The identified pattern of river diversity supports the intermediate disturbance hypothesis (Connell 1978), revealing the greatest compositional variation at a moderate degree of degradation. This pattern has been identified for aquatic plants along the nutrient gradient in a previous study (Szoszkiewicz et al. 2017), which showed higher species richness at moderately eutrophic sites compared with both the most impacted watercourses and pristine brooks. This tendency was also indirectly confirmed in various studies in which a linear relationship between eutrophication and species richness across a wide trophy gradient was not found (Svitok et al. 2016; Szoszkiewicz et al. 2014; Thiébaut et al. 2002, 2006) or was very weak (Hrivnak et al. 2014). These studies confirmed a strong relationship of various macrophyte metrics with the trophy gradient, as well as distinctiveness of species composition along it, although only a very limited linear relationship was found with species richness and other diversity metrics. Our study, based on the complete richness, confirmed the conclusions of the papers mentioned, which were based on statistically incomplete field samples (reference samples).

The analyses have shown that diversity estimates based on the full richness (Sest, see Table 3) would reveal a different trend in species diversity along the trophic gradient. Nevertheless, ordering based on the basic sample size (n = 34) is regarded as giving a better species diversity estimation than full richness estimates. Colwell et al. (2012) proved that reliable diversity estimations should be based on an extrapolated sample size which is not greater than 2–3 times the size of the real sample.

The study has shown that extrapolation is an effective method of diversity assessment in rivers, able to estimate the complete richness of plant species. The applied extrapolation procedure provides a realistic estimate of the total number of plant species in small and medium lowland rivers with sandy substrate. It also predicted that detecting every single species in a field survey would require an extremely extensive survey programme. For the potentially most species-rich group of rivers (quality class IV), the number of required survey sites was estimated at 236 (Table 3). The highest completeness of species in the reference sample was found for class I rivers, but even there 62 sites would need to be surveyed to detect the missing 11 species. The predicted extent of the survey programme needed to cover the full species richness confirms the second initial hypothesis: the sample size required for evaluation of full biodiversity (sample coverage) exceeds the possibilities of standard plant monitoring in rivers, and therefore, to detect full biodiversity, the field survey should be supported by extrapolation estimates.
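How quickly the required effort grows can be sketched by inverting the sample-based extrapolation curve for species richness. This is a hedged illustration, not the procedure behind Table 3: the function name, the number of uniques and the number of surveyed sites are hypothetical, and only the values 77 and 46 come from the text. Under this model the curve approaches the estimated richness asymptotically, so the target is expressed as a fraction `g` below 1.

```python
import math


def extra_sites_needed(s_obs, q0_hat, q1, t, g):
    """Additional sites needed for the expected richness to reach g * S_est.

    s_obs:  species observed in the reference sample
    q0_hat: estimated number of undetected species
    q1:     species recorded at exactly one site (uniques)
    t:      number of sites in the reference sample
    g:      target fraction of the estimated richness, 0 < g < 1
    """
    if not 0 < g < 1:
        raise ValueError("g must lie strictly between 0 and 1")
    target = g * (s_obs + q0_hat)
    if target <= s_obs:
        return 0  # the reference sample already reaches the target
    if q1 <= 0:
        raise ValueError("no uniques: the curve cannot be extrapolated")
    p = q1 / (t * q0_hat + q1)
    remaining = 1 - (target - s_obs) / q0_hat
    return math.ceil(math.log(remaining) / math.log(1 - p))


# 77 observed and 46 undetected species as quoted for class IV;
# 30 uniques and 20 surveyed sites are hypothetical
for g in (0.80, 0.90, 0.95, 0.99):
    print(g, extra_sites_needed(77, 46, 30, 20, g))
```

The required number of sites rises steeply as `g` approaches 1, which is consistent with the very large survey programmes predicted above.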

It was found that covering the full macrophyte species richness requires a very labour-intensive survey programme; the extrapolation approach is therefore extremely attractive for diversity studies. Field surveying on such a large scale exceeds the capability of most study projects. One example is STAR (Furse et al. 2006), one of the largest international projects devoted to rivers, which included a total of 263 sites sampled in 11 countries across 22 river types. Other study projects have been based on much smaller samples (Armitage et al. 2003; Schneider et al. 2012). Since in our study, which was limited to a single river type, comprehensive evaluation would have required between 62 and 236 survey sites, it is clear that biodiversity estimates are usually based on highly insufficient data. The extrapolation approach seems to be an effective solution for various scientific and applied diversity inventories carried out for conservation purposes and for environmental impact assessment procedures in river engineering.

The study has demonstrated that surveying the biodiversity of a variety of ecosystems is a complicated issue that requires a well-designed field survey programme with considerable support from analytical tools such as rarefaction and/or extrapolation. Applying only simple statistics (average, median) to the field samples is likely to be insufficient, whereas identification of all occurring species exceeds the capabilities of survey projects. We have shown that biological evaluation based on a limited number of survey sites (the reference sample) can lead to false conclusions regarding the species richness of river ecosystems, and similar problems probably arise in other types of ecosystems.
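The rarefaction side of this toolkit can be illustrated with a minimal sketch of sample-based rarefaction for species richness, which gives the expected number of species in a random subset of the surveyed sites. The incidence frequencies below are a hypothetical toy dataset, and the function name is ours.

```python
from math import comb


def rarefied_richness(incidence_freqs, t_total, t):
    """Expected richness in t sites drawn at random from t_total sites.

    incidence_freqs: for each species, the number of sites where it occurs
    """
    if not 0 < t <= t_total:
        raise ValueError("t must satisfy 0 < t <= t_total")
    present = [y for y in incidence_freqs if y > 0]
    denom = comb(t_total, t)
    # Probability that a species with y occurrences is missed by all t sites
    missed = sum(comb(t_total - y, t) / denom for y in present)
    return len(present) - missed


# Hypothetical survey: 5 sites, 5 species
freqs = [5, 3, 1, 1, 2]
for t in range(1, 6):
    print(t, round(rarefied_richness(freqs, 5, t), 2))
```

Evaluating two groups of sites at the same `t` puts them on an equal footing, which a comparison of raw averages or medians from unequal samples does not.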


The rarefaction/extrapolation method enables the precise comparison of plant diversity in different fluvial ecosystems, with consideration of rare species.

Trends in species richness along the trophic gradient were depicted differently by analysis based on the reference sample and on an extrapolated dataset.

Full evaluation of macrophyte variety in rivers, detecting every single species, requires a very high number of sampling units, and the use of the extrapolation method allows estimation of the species diversity on the basis of a smaller number of samples. The biodiversity extrapolation approach is an effective solution to reduce the cost of labour-intensive survey programmes.

Analysis based on extrapolation curves showed that increasing trophy in rivers leads to diminishing diversity, as reflected by metrics based on the relative abundance of plants (the Shannon index and the inverse Simpson index). Ordering based on macrophyte taxa richness showed that the highest plant diversity occurs in mesotrophic conditions. The most oligotrophic pristine rivers are poorer in species than mesotrophic ones, but the rivers poorest in species are those that are highly eutrophic.

Ecological assessment of rivers based on a small number of sampling units may lead to incorrect conclusions regarding the species richness. The extrapolation approach improves the precision of diversity estimates and increases the effectiveness of ecological inventories in fluvial ecosystems for conservation and management purposes.
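For readers who wish to reproduce the three diversity orders referred to in these conclusions, the following minimal sketch computes Hill numbers for q = 0, 1 and 2 from a vector of abundances. The abundance values are hypothetical and the function name is ours; it illustrates the definitions only, not the estimation procedure used in the study.

```python
import math


def hill_number(abundances, q):
    """Hill number of order q for a vector of species abundances.

    q = 0: species richness
    q = 1: exponential of Shannon entropy
    q = 2: inverse Simpson index
    """
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # Limit of the general formula as q approaches 1
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


# Hypothetical assemblage with one dominant and two rarer species
community = [2, 1, 1]
for q in (0, 1, 2):
    print(q, round(hill_number(community, q), 3))
```

The values decrease with increasing q because higher orders give more weight to dominant species, which is why richness and abundance-based metrics can order the same rivers differently.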


  1. Araújo-Lima CARM, Forsberg BR, Victoria R, Marginelli L (1986) Energy-sources for detritivorous fishes in the Amazon. Science 234:1256–1258

  2. Armitage PD, Szoszkiewicz K, Blackburn JH, Nesbitt I (2003) Ditch communities: a major contributor to floodplain biodiversity. Aquat Conserv 13(2):165–185

  3. Bergström SE, Svensson JE, Westberg E (2000) Habitat distribution of zooplankton in relation to macrophytes in an eutrophic lake. Verhandlungen des Internationalen Verein Limnologie 27:2861–2864

  4. Bernatowicz S, Wolny P (1969) Fisherman’s botany (in Polish). Państwowe Wydawnictwo Rolnicze i Leśne, Warszawa

  5. Błachuta J, Szoszkiewicz K, Gebler D, Schneider SC (2014) How do environmental parameters relate to macroinvertebrate metrics? Prospects for river water quality assessment. Pol J Ecol 62(1):111–122

  6. Brewer A, Williamson M (1994) A new relationship for rarefaction. Biodivers Conserv 3:373–379

  7. Budka A, Łacka A, Szoszkiewicz K (2018) Estimation of river ecosystem biodiversity based on the Chao estimator. Biodivers Conserv 27:205–216

  8. Cardinale BJ, Duffy JE, Gonzalez A, Hooper DU, Perrings C, Venail P, Narwani A, Mace GM, Tilman D, Wardle DA, Kinzig AP, Daily GC, Loreau M, Grace JB, Larigauderie A, Srivastava D, Naeem S (2012) Biodiversity loss and its impact on humanity. Nature 486:59–67

  9. Celewicz-Goldyn S, Kuczynska-Kippen N (2017) Ecological value of macrophyte cover in creating habitat for microalgae (diatoms) and zooplankton (rotifers and crustaceans) in small field and forest water bodies. PLoS ONE

  10. Chao A (1984) Nonparametric estimation of the number of classes in a population. Scand J Stat 11:265–270

  11. Chao A (1987) Estimating the population size for capture recapture data with unequal catchability. Biometrics 43:783–791

  12. Chao A, Gotelli NJ, Hsieh TC, Sander EL, Ma KH, Colwell RK, Ellison AM (2014a) Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecol Monogr

  13. Chao A, Gotelli NJ, Hsieh TC, Sander EL, Ma KH, Colwell RK, Ellison AM (2014b) Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecol Monogr 84:45–67

  14. Chiu C-H, Jost L, Chao A (2014) Phylogenetic beta diversity, similarity, and differentiation measures based on Hill numbers. Ecol Monogr 84:21–44

  15. Colwell RK, Coddington JA (1994) Estimating terrestrial biodiversity through extrapolation. Philos T Roy Soc B 345:101–118

  16. Colwell RK, Mao CX, Chang J (2004) Interpolating, extrapolating, and comparing incidence-based species accumulation curves. Ecology 85:2717–2727

  17. Colwell RK, Chao A, Gotelli NJ, Lin SY, Mao CX, Chazdon RL, Longino JT (2012) Models and estimators linking individual-based and sample-based rarefaction, extrapolation, and comparison of assemblages. J Plant Ecol 5:3–21

  18. Connell JH (1978) Diversity in tropical rain forests and coral reefs. Science 199:1302–1310

  19. Dudgeon D, Arthington AH, Gessner MO, Kawabata Z-I, Knowler DJ, Naiman RJ, Prieur-Richard A-H, Soto D, Stiassny MLJ, Sullivan CA (2006) Freshwater biodiversity: importance, threats, status and conservation challenges. Biol Rev Camb Philos Soc 81:163–182

  20. Ellenberg H, Weber HE, Dull R, Wirth V, Werner W, Baulissen D (1992) Zeigerwerte von Pflanzen in Mitteleuropa. Scripta Geobotanica 18:1–257

  21. Freni G, Maglionico M, Mannina G, Viviani G (2008) Comparison between a detailed and a simplified integrated model for the assessment of urban drainage environmental impact on an Ephemeral River. Urban Water J 5(2):87–96

  22. Furse M, Hering D, Moog O, Verdonschot P, Johnson RK, Brabec K, Gritzalis K, Buffagni A, Pinto P, Friberg N, Murray-Bligh J, Kokes J, Alber R, Usseglio-Polatera P, Haase P, Sweeting R, Bis B, Szoszkiewicz K, Soszka H, Springe G, Ferdinand Sporka K, Krno I (2006) The STAR Project: context, objectives and approaches. Hydrobiologia 566(1):3–29

  23. Gotelli NJ, Chao A (2013) Measuring and estimating species richness, species diversity, and biotic similarity from sampling data. In: Levin SA (ed) Encyclopedia of Biodiversity, 2nd edn. Elsevier, Amsterdam, pp 195–211

  24. Gotelli NJ, Colwell RK (2001) Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecol Lett 4:379–391

  25. Gotelli NJ, Ellison AM, Ballif BA (2012) Environmental proteomics, biodiversity statistics and food web structure. Trends Ecol Evol 27:436–442

  26. Guadagnin DL, Maltchik L, Fonseca CR (2009) Species-area relationship of Neotropical waterbird assemblages in remnant wetlands: looking at the mechanisms. Divers Distrib 15:319–327

  27. Herring TA, King RW, McClusky SC (2006) Introduction to GAMIT/GLOBK Release 10.3, Dep. Earth Atmos. Planet. Sci., Mass. Inst. of Technol Cambridge

  28. Hill M (1973) Diversity and evenness: a unifying notation and consequences. Ecology 54:427–432

  29. Hrivnak R, Kochjarova J, Otàhelóvà H, Palóve-Balanga P, Slezàka M, Slezàka P (2014) Environmental drivers of macrophyte species richness in artificial and natural aquatic water bodies – comparative approach from two central European regions. Annales De Limnologie Int J Limnol 50:269–278

  30. Hurlbert SH (1971) The nonconcept of species diversity: a critique and alternative parameters. Ecology 52:577–586

  31. Jost L (2006) Entropy and diversity. Oikos 113:363–375

  32. Jost L (2007) Partitioning diversity into independent alpha and beta components. Ecology 88:2427–2439

  33. Jusik S (2012) Identification key to mosses and water liverworts required to the ecological status assessment of surface waters in Poland (in Polish). Biblioteka Monitoringu Środowiska, Warszawa

  34. Klaassen M, Nolet BA (2007) The role of herbivorous water birds in aquatic systems through interactions with aquatic macrophytes, with special reference to the Bewick’s Swan-Fennel Pondweed system. Hydrobiologia 584:205–213

  35. Kuczyńska-Kippen N, Joniak T (2016) Zooplankton diversity and macrophyte biometry in shallow water bodies of various trophic state. Hydrobiologia 774(1):39–51

  36. Lansac-Tôha FA, Velho LFM, Bonecker CC (2003) Influência de macrófitas aquáticas sobre a estrutura da comunidade zooplanctônica. In: Thomaz SM, Bini LM (eds) Ecologia e Manejo de Macrófitas Aquáticas. Maringá, Eduem, pp 231–243

  37. Lawton JH, Bignell DE, Bolton B, Bloemers GF, Eggleton P, Hammond PM, Hodda M, Holt RD, Larsen TB, Mawdsley NA, Stork NE, Srivastava DS, Watt AD (1998) Biodiversity inventories, indicator taxa and effects of habitat modification in tropical forest. Nature 391:72–76

  38. Longino JT, Colwell RK (2011) Density compensation, species composition, and richness of ants on a neotropical elevational gradient. Ecosphere 2:29

  39. Loreau M, Naeem S, Inchausti P, Bengtsson J, Grime JP, Hector A et al (2001) Biodiversity and ecosystem functioning: current knowledge and future challenges. Science 294:804–808

  40. MacArthur RH (1965) Patterns of species diversity. Biol Rev 40:510–533

  41. MacDougall AS, McCann KS, Gellner G, Turkington R (2013) Diversity loss with persistent human disturbance increases vulnerability to ecosystem collapse. Nature 494:86–89

  42. Magurran AE (2004) Measuring biological diversity. Blackwell Science, Malden

  43. Meschiatti AJ, Arcifa MS, Fenerich-Verani N (2000) Fish communities associated with macrophytes in Brazilian floodplain lakes. Environ Biol Fish 58:133–143

  44. Moro MF, Sousa DJL, Matias LQ (2014) Rarefaction, richness estimation and extrapolation methods in the evaluation of unseen plant diversity in aquatic ecosystems. Aquat Bot 117:48–55

  45. Palmer MW (1990) The estimation of species richness by extrapolation. Ecology 71:1195–1198

  46. Pott VJ, Pott A (2000) Plantas aquáticas do Pantanal. Embrapa, Brasília, p 404

  47. R Development Core Team (2013) R: A Language and Environment for Statistical Computing. Vienna, Austria: the R Foundation for Statistical Computing. ISBN 3-900051-07-0

  48. Rutkowski L (2008) Identification key to vascular plants of Polish Lowland (in Polish). Wydawnictwo Naukowe PWN, Warszawa

  49. Schneider SC, Ławniczak AE, Picinska-Faltynowicz J, Szoszkiewicz K (2012) Do macrophytes, diatoms and non-diatom benthic algae give redundant information? Results from a case study in Poland. Limnologica 42(3):204–211

  50. Schneider B, Cunha ER, Marchese M, Thomaz SM (2014) Explanatory variables associated with diversity and composition of aquatic macrophytes in a large subtropical river floodplain. Aquat Bot 121:67–75

  51. Simberloff D (1979) Rarefaction as a distribution-free method of expressing and estimating diversity. In: Grassle JF, Patil GP, Smith WK, Taillie C (eds) Ecological diversity in theory and practice. International Co-operative Publishing House, Fairland, pp 159–176

  52. Steffen K, Becker T, Herr W, Leuschner C (2013) Diversity loss in the macrophyte vegetation of northwest German streams and rivers between the 1950s and 2010. Hydrobiologia 713:1–17

  53. Svitok M, Hrivnák R, Kochjarová J, Oťaheľová H, Pal’ove-Balang P (2016) Environmental thresholds and predictors of macrophyte species richness in aquatic habitats in central Europe. Folia Geobot 51:227–238

  54. Szoszkiewicz K, Ferreira T, Korte T, Baattrup-Pedersen A, Davy-Bowker J, O’Hare M (2006) European river plant communities: the importance of organic pollution and the usefulness of existing macrophyte metrics. Hydrobiologia 566:211–234

  55. Szoszkiewicz K, Budka A, Pietruczuk K, Kayzer D, Gebler D (2014) Diversity of macrophyte communities and their relationship to water quality in different types of lowland rivers in Poland. Hydrobiologia 737:77–85

  56. Szoszkiewicz K, Budka A, Pietruczuk K, Kayzer D, Gebler D (2017) Is the macrophyte diversification along the trophic gradient distinct enough for river monitoring? Environ Monit Assess 189:4

  57. Takeda AM, Souza-Franco GM, Melo SM, Monkolski A (2003) Invertebrados associados às macrófitas aquáticas da planície de inundação do alto rio Paraná (Brasil). In: Thomaz SM, Bini LM (eds) Ecologia e Manejo de Macrófitas Aquáticas. Maringá, Eduem, pp 1–28

  58. Theel HJ, Dibble ED, Madsen JD (2008) Differential influence of a monotypic and diverse native aquatic plant bed on a macroinvertebrate assemblage; an experimental implication of exotic plant induced habitat. Hydrobiologia 600:77–87

  59. Thiébaut G, Guerold F, Muller S (2002) Are trophic and diversity indices based on macrophyte communities pertinent tools to monitor water quality? Water Res 36:3602–3610

  60. Thiébaut G, Tixier G, Guěrold F, Muller S (2006) Comparison of different biological indices for the assessment of river quality: application to the upper river Moselle (France). Hydrobiologia 570:159–164

  61. Thomaz SM, Cunha ER (2010) The role of macrophytes in habitat structuring in aquatic ecosystems: methods of measurement, causes and consequences on animal assemblages, composition and biodiversity. Acta Limnol Bras 22(2):218–236

  62. Ulrich W, Ollik M (2005) Limits to the estimation of species richness: the use of relative abundance distributions. Divers Distrib 11:265–273

  63. Vono V, Barbosa AR (2001) Habitats and littoral zone fish community structure of two natural lakes in southeast Brazil. Environ Biol Fishes 61:371–379

Author information



Corresponding author

Correspondence to Anna Budka.

Additional information

Communicated by Daniel Sanchez Mata.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Budka, A., Łacka, A. & Szoszkiewicz, K. The use of rarefaction and extrapolation as methods of estimating the effects of river eutrophication on macrophyte diversity. Biodivers Conserv 28, 385–400 (2019).


  • Biodiversity indices
  • Ecological status
  • Freshwater assessment
  • Hill’s numbers
  • Macrophytes