Introduction

The dynamics and functioning of ecosystems, and hence the ability of ecosystems to provide humans with essential goods and services, depend to a great extent on the diversity of life (Cardinale 2011; Hector 2011; Tscharntke et al. 2012; review by Cardinale et al. 2012). Diversity has other benefits as well, such as reduced disease prevalence in plants and animals when diversity is high (Civitello et al. 2015). But diversity at the genetic, species, and community levels is being lost at an alarming rate (Cardinale et al. 2012). Human-induced species losses are arguably leading to a sixth mass extinction: for example, the rate of vertebrate species loss over the past century is 100 times higher than the background rate (Ceballos et al. 2015).

The conservation and management of species, ecosystems, and diversity at various levels are crucially important in sustaining natural structures and functions. Conservation and management are likely to entail land-use decisions, for example, the design of a system of reserves, selection of areas for intensified agricultural production, or the choice among options for balancing human and ecological needs (Margules and Pressey 2000; Tscharntke et al. 2012). Some forms of management may involve making land-use decisions through such mechanisms as environmental markets (Pindilli and Casey 2015), which are often incorporated in regulatory frameworks that guide management of species and habitats (Salzman and Ruhl 2000).

The management of biological diversity on the landscape requires its accurate measurement, so as to compare alternatives, choose management actions, and monitor progress in achieving objectives. One approach to the measurement and tracking of biological diversity is use of a multi-metric site assessment to serve as a surrogate for surveys of species presence and abundance (Oliver et al. 2014). An example of this type of multi-metric index is the ecological integrity assessment, which combines measures of biotic and abiotic ecosystem features into a single index, as outlined by Andreasen et al. (2001). NatureServe has developed a version of ecological integrity assessment that includes geographic information system (GIS)–based landscape features, vegetation, and abiotic attributes. This framework has been used by NatureServe to build ecosystem-specific ecological integrity indices for wetlands (Faber-Langendoen et al. 2012a, p. 49) and northeastern temperate forests (Tierney et al. 2009), with the potential for application to other ecosystem types as well. The stated intention is to provide a standardized measure of outcomes of conservation programs. The proponents of ecological integrity assessment emphasize a number of desirable attributes such as convenience, cost-effectiveness, ease of use, measurability, flexibility, and sensitivity (Andreasen et al. 2001; Willamette Partnership 2011).

Given the relative scarcity of conservation funding and the increasingly urgent need for information about biological diversity, it is important to channel funds effectively for conservation value. Advocates of ecological integrity assessment explicitly claim that this approach can be used to measure and manage biological diversity efficiently (Andreasen et al. 2001; Willamette Partnership 2011; Vickerman and Kagan 2014), and hence this type of assessment is potentially very appealing to state and federal land-management agencies and other practitioners who are understandably looking for practical and inexpensive ways to generate useful information. However, ecological integrity assessment methodology so far lacks a thorough review in the refereed literature and a compilation of empirical evidence that it actually measures biological diversity. Without such a review and evidence, an ad hoc model can use up valuable time and resources but result in misleading metrics and misinformed decision making (Tulloch et al. 2013). These concerns are particularly relevant to broad-scale adoption of unvalidated methodology by government agencies, as was the case, for example, in 30 years of official commitment to ineffective tiger monitoring in India (Karanth et al. 2003).

This paper is an initial review and critique of the capacity of ecological integrity assessment to measure biological diversity. We first provide definitions and the general approach of ecological integrity assessment as described in the literature. Then we consider the ecological integrity index in relation to the rich scientific literature on biological diversity and its measurement, and examine the index’s reliability and robustness, its use for decision making, and robust alternative methods for measuring biological diversity. We conclude by discussing interpretation of ecological integrity assessment.

Ecological integrity assessment

An ecological integrity assessment is a multi-metric index in the form of an ecological scorecard (Faber-Langendoen et al. 2012a, b) that is intended to assess ecosystem structure, composition, function, species composition, diversity, and functional organization. It is held to be useful for measuring and monitoring biological diversity as well as ecosystem integrity (Table 1). For example, Parrish et al. (2003) claim that it can be used to track “important ecological characteristics of focal biodiversity” and synthesize their status into “a set of simple categorical ratings… of biodiversity status in an area” in order to “determine whether the status of biodiversity is responding to conservation investments and strategies.” Biota constitute the focus of ecosystem composition, one of the main conceptual aspects of integrity assessment (Andreasen et al. 2001). According to Unnasch et al. (2009), the concept of ecological integrity serves as a proxy for biological diversity, in that ecological integrity is said to be “the ability of an ecological system to support and maintain a community of organisms that has species composition, diversity, and functional organization comparable to those of natural habitats…” [italics added]. This definition clearly implies that an individual site with a high score for ecological integrity can be expected to host a typical array and abundance of biota for that site’s ecosystem type.

Table 1 Statements about ecological integrity assessment

General approach

In the current ecological integrity assessment methodology as described by NatureServe (Faber-Langendoen et al. 2012a, b), an ecosystem’s “condition” at a particular site is expressed in terms of ranked scores for several spatial and ecological characteristics that are intended to represent ecological structure, function, and composition, in comparison to reference (benchmark) conditions for that ecosystem type (Faber-Langendoen et al. 2012b, p. 1). The idea is that ecological integrity “can be effectively assessed using a suite of rapid assessment metrics, structured around our general ecological model” (Faber-Langendoen et al. 2012b, p. 1). Their general model incorporates three ecosystem attributes called primary attributes (size, condition, and landscape context), which are subdivided into a set of ecosystem attributes called major attributes, held to be key “components capturing the structure, composition, and processes of a system” (Faber-Langendoen et al. 2012b, p. 2). In other words, these attributes are assumed to represent reliably the biological patterns and processes of ecosystems (i.e., structure, function, and composition), including biological diversity. According to Faber-Langendoen et al. (2012b, p. 2), additional attributes of what is called biotic integrity, with birds, amphibians, and macroinvertebrates given as examples, can be included in the assessment where resources and time permit. Keystone species, rare/sensitive species, and guilds are suggested as potential indicator variables by Unnasch et al. (2009). After the set of major attributes has been chosen for a particular type of ecosystem, specific indicator or proxy variables are chosen for each of the attributes.
The raw data collected for each variable or metric are converted as necessary into ordinal categories (scored excellent/good/fair/poor on a simple ranked scale, according to criteria established by NatureServe), and then weighted and combined into a single score (Parrish et al. 2003; Tierney et al. 2009; Faber-Langendoen et al. 2012a). This process is illustrated schematically in Fig. 1, and a specific example, the wetland ecological integrity index, is presented in detail in Appendix 2.
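The rank, weight, and combine sequence described above can be sketched in a few lines of Python. This is only an illustration of the general procedure: the metric names, cut points, and weights below are hypothetical and are not NatureServe's published criteria, and the four ordinal categories are simply coded 1 (poor) to 4 (excellent).

```python
from bisect import bisect_right

# Hypothetical scheme: {attribute: (attribute_weight, {metric: (cut_points, metric_weight)})}
# Cut points are ascending thresholds separating poor/fair/good/excellent.
SCHEME = {
    "landscape_context": (1.0, {"buffer_width_m": ([10, 50, 100], 1.0)}),
    "condition": (2.0, {"native_cover_pct": ([50, 80, 95], 2.0),
                        "litter_cover_pct": ([10, 30, 60], 1.0)}),
    "size": (1.0, {"patch_area_ha": ([1, 10, 100], 1.0)}),
}

# Hypothetical raw field data for one site.
SITE = {"buffer_width_m": 60, "native_cover_pct": 85,
        "litter_cover_pct": 20, "patch_area_ha": 5}


def rank_metric(raw_value, cut_points):
    """Convert a raw measurement to an ordinal score (1 = poor ... 4 = excellent)."""
    return bisect_right(cut_points, raw_value) + 1


def weighted_mean(scores, weights):
    """Weighted average of a list of scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def integrity_index(site, scheme):
    """Score each metric, roll metrics up to attributes, then attributes to one index."""
    attr_scores, attr_weights = [], []
    for attr_weight, metrics in scheme.values():
        scores = [rank_metric(site[name], cuts) for name, (cuts, _) in metrics.items()]
        weights = [w for _, w in metrics.values()]
        attr_scores.append(weighted_mean(scores, weights))
        attr_weights.append(attr_weight)
    return weighted_mean(attr_scores, attr_weights)


ei = integrity_index(SITE, SCHEME)
```

Note that the final value depends as much on the chosen cut points and weights as on the field data, which is one reason the scoring criteria themselves need empirical justification.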

Fig. 1 Schematic for ecological integrity assessment. \(\underline{m}_{i}\): data for each of 3 levels. \(\underline{v}_{i}\): vector of indicator variables for each of k attributes, based on scaled and ranked data. Scaling and ranking factors \(\underline{c}_{i}\) are provided by NatureServe. \(a_{i}\): attribute variable i, formed by weighting and aggregating the indicator variables in \(\underline{v}_{i}\). EI: ecological integrity index, formed by aggregating the attribute variables. The ecological integrity index is held to express the condition of ecosystem structure, function, and composition, including biodiversity

The pivotal role of vegetation in ecological integrity assessment

In their conceptual paper describing how a terrestrial ecological integrity index could be developed, Andreasen et al. (2001) stated that the focus of metrics for ecosystem composition is the biota, and therefore an important step is “selecting the biological entities to use as metrics for the index.” Plants and associated vegetation measures are at the core of the ecological integrity assessment version developed by NatureServe, whose Natural Heritage programs use vegetation as the primary means of assessing ecosystem condition (Faber-Langendoen et al. 2012a, p. 18). Vegetation thus serves as the key biological entity in ecological integrity assessment. The validity of this role is addressed in later sections that consider the empirical basis for the assumption that the chosen vegetation structure and composition attributes are consistently correlated with diversity of a range of taxa, and for the assumption that vegetation richness/diversity is correlated with richness/diversity of other groups.

Most of the vegetation metrics in the ecological integrity assessment (Faber-Langendoen et al. 2012b) are based on subjective visual estimates of vegetation structure or composition that can be recorded in a rapid site visit. Some vegetation methodology, such as the so-called floristic quality assessment, is intensive in terms of time and effort, and involves sampling in measured plots. Floristic quality assessment uses data on vascular plants, and can consist of one or more metrics that include species richness, other measures based on species richness (floristic quality index, mean coefficient of conservatism), and relative percentage of native vascular plant species at a site (Taft et al. 1997). These metrics are described in more detail in Appendix 3.

Ecological integrity assessment and the diversity of biota

The developers and advocates of ecological integrity assessment methodology have stated repeatedly that biodiversity is a major component of ecological integrity assessment (Andreasen et al. 2001; Parrish et al. 2003; Willamette Partnership 2011; Unnasch et al. 2009; Faber-Langendoen et al. 2012a, b; NatureServe 2012; Vickerman and Kagan 2014). Andreasen et al. (2001) explicitly state that ecological integrity encompasses biodiversity. Parrish et al. (2003) claim that the methodology can be used to measure the “status of biodiversity,” track “important ecological characteristics of focal biodiversity,” and determine whether the “status of biodiversity” is responding to investments (see Table 1). Unnasch et al. (2009) hold that ecological integrity assessment methodology supports conservation of native biological diversity. As described in the previous section, plant species, in themselves an important component of biological diversity, are the biological entities that play the most important role in the methodology. The assumption underlying such vegetation-based metrics is that the natural composition and structure of plant communities are optimal for supporting the range of naturally occurring wildlife (Willamette Partnership 2011, p. 10). Unimpaired ecosystem functions (which are assumed to be measured by the ecological indicators that are chosen) are also interpreted as providing habitats for naturally occurring biota (Willamette Partnership 2011, p. 11).

Given the emphasis in ecological integrity assessment on measurement and management of biological diversity (see Table 1), here we examine the methodology in relation to biological diversity. Because biological diversity has been a central theme in ecology for well over a century, a large body of work in both theoretical and applied ecology has resulted in a rich scientific literature on diversity and its measurement. Numerous standard biostatistics textbooks such as Krebs (1999), as well as more specialized texts such as Magurran (1988, 2004), cover the measurement of diversity. Thus, a wealth of information is available for examining the ecological integrity assessment methodology in light of existing literature on biological diversity.

Issues in biological diversity

In approaches to biological diversity, ecologists often study species diversity, partly because it is one of the most intuitive measures (Chiarucci et al. 2011). There are a number of statistically robust techniques for investigating species diversity. Most of the classical diversity measures are based on concepts of species richness (the number of species), evenness (the proportional abundance of each species), and differentiation (differences in species composition in an assemblage). According to Krebs (1999), the simplest measure of diversity is species richness, the number of species in an area or community. Several common statistical methods for estimating species richness include the rarefaction method, bootstrap procedures, and species–area curve (Krebs 1999). A new generation of statistical advances is based on capture–recapture methods (Williams et al. 2002a; MacKenzie et al. 2006). A common problem with species richness measures is that they are sensitive to sample size and the size of the sampled area. In particular, sampling must be random in order to avoid a biased result. Any method chosen for determining species richness must therefore control for sample size effects, species–area effects, and sampling effectiveness; the best ways to avoid the common pitfalls in measuring species richness are discussed by Gotelli and Colwell (2001). A more complex measure of diversity is heterogeneity, which combines richness and evenness in the distribution of the sampled species (the relative abundance of different species). The logarithmic series, Shannon’s index, and Simpson’s index are among the most widely used approaches (Krebs 1999). Again, samples must be random, and the sampling method chosen can strongly affect the results (Magurran 2004).
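For readers less familiar with these measures, the following minimal Python sketch computes species richness, Shannon's index, one common form of Simpson's index (the Gini-Simpson form, 1 minus the sum of squared proportions), and rarefied richness using the standard hypergeometric expectation. The abundance vectors are invented for illustration, and the functions assume that individuals were sampled at random, which is exactly the condition emphasized above.

```python
import math


def richness(counts):
    """Number of species with at least one individual in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon's index H' = -sum(p_i * ln p_i) over observed species."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson form of Simpson's index: 1 - sum(p_i ** 2)."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n individuals."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must lie between 0 and the total count")
    denom = math.comb(total, n)
    return sum(1 - math.comb(total - c, n) / denom for c in counts if c > 0)


# Two hypothetical samples with the same richness but different evenness.
even = [25, 25, 25, 25]
uneven = [97, 1, 1, 1]
```

Both samples contain four species, yet the heterogeneity indices are much lower for the uneven sample, and rarefying it to a smaller number of individuals yields far fewer expected species. This illustrates why richness comparisons must control for sampling effort.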

Some important considerations in conservation management that are not captured by traditional diversity measures are phylogenetic diversity (Magurran 2004; Chiarucci et al. 2011), as well as species intactness across a landscape and community structure relative to reference conditions (Lamb et al. 2009). Consideration of the mathematical properties of measures of diversity is also important in choosing an appropriate metric (Van Strien et al. 2012; Buckland et al. 2005).

Detecting trends over time in multiple taxa is especially challenging. For statistically robust longer-term monitoring, it often is better to focus on minimizing sampling error and designing robust monitoring (number of sites, duration of monitoring, sample variability, etc.) rather than monitoring a few sites intensively and many sites rapidly (Nielsen et al. 2009). MacKenzie et al. (2006) cover recent monitoring and estimation advances in depth. Magurran’s (1988, 2004) textbooks cover both traditional and new-generation methods for measuring numerous aspects of diversity.

Representing biological diversity with ecological integrity metrics

The foregoing brief overview raises questions about whether and how accurately the ecological integrity assessment measures key aspects of biological diversity. The ecological integrity assessment includes data on some remotely sensed landscape characteristics such as patch size and surrounding land use, some abiotic factors such as hydrology, and some attributes of vegetation structure and composition (see indicator variables in Table 2). The methodology relies almost entirely on proxy variables, such as structure of vegetation or the species richness of vascular plants as a proxy for diversity of a range of taxa. In this section we examine empirical evidence for the validity of these proxy relationships in representing biological diversity.

Table 2 Example set of indicators for rapid ecological integrity assessment of wetlands (based on Faber-Langendoen et al. 2012a, p. 33)

Vegetation attributes and diversity

One important question is whether the vegetation attributes as measured by indicator variables (Table 2) are correlated with biological diversity. In an ecological integrity assessment it is assumed that selected habitat measures are reliably correlated with biological diversity, and that easily measured vegetation characteristics (such as structure, composition, relative percent cover of native plant species, and other similar indicator variables [see Table 2]) can be used as surrogates for the diversity of biota. However, Faber-Langendoen et al. (2012a, b) present no evidence for a strong correlation between diversity across multiple taxa and the chosen habitat characteristics represented by indicator variables such as those in Table 2.

Empirical studies of the relationship between habitat structure and faunal diversity have had mixed success in finding any significant relationship (Williams et al. 2002b). For example, Cushman et al. (2008) found that forest composition and structure variables and forest community type could not explain the majority of variation in the relative abundance of 53 bird species: forest vegetation measures were not reliable proxies for abundance or viability of animal populations. Psyllakis and Gillingham (2009) empirically tested how many of 55 vertebrate species could be predicted by various sets of forest structure measures (e.g., woody debris, tree and shrub species and size, stem numbers), and found that no set of structural attributes predicted multiple species. Barton et al. (2014) tested six vegetation variables (such as percent cover and species richness of different vegetation strata) and quantified their relationships to abundance, species richness, and composition of bird, mammal, and reptile assemblages. They found strong and consistent relationships between vegetation overstory richness and cover and bird assemblages, but inconsistent relationships of varying strengths for any given vegetation attribute across all three study assemblages, suggesting that vegetation attributes were not reliable proxies for diversity across faunal taxa. Gollan et al. (2009) tested environmental factors (vegetation structure, soil) and found that none were reliably correlated with species diversity across four invertebrate orders, and hence could not be adequate proxies for the diversity of multiple invertebrate taxa. In another empirical study, Axmacher et al. (2009) found in a principal components analysis that vegetation structural attributes overall (cover, tree crown diameter, leaf shape, height, epiphyte cover) accounted for less than one-quarter of the variation in diversity patterns of 279 geometrid moth species.
Because habitat-based measures tend to be specific to a given taxon, they may correlate well with occurrence of that particular species or group [such as birds and overstory vegetation richness (Barton et al. 2014); arboreal marsupials and hollow trees (Lindenmayer et al. 2014); see review by McElhinny et al. (2006) for fauna of Australian woodlands] but are unlikely to be suitable as proxies for diversity of biota across multiple taxa. Different patterns result from the fact that different species groups perceive habitat differently and rely on different resources. Even for a species with well-known habitat requirements that include a readily measured structural or floristic element that is a limiting resource (Lindenmayer et al. 2014), accurate and reliable estimates of abundance and population trends depend on well-designed species sampling (MacKenzie et al. 2006), not simply habitat metrics (Lindenmayer et al. 2014; Pierson et al. 2016).

Vegetation richness and diversity

A second important question is whether vegetation richness/diversity is correlated with richness/diversity of other groups. In other words, is vegetation diversity a good proxy for diversity of biota across taxa and trophic levels? The floristic quality index and vascular plant species richness are considered by Wilhelm and Ladd (1988) and Taft et al. (1997) to be measures of vascular plant diversity, and their methodology is described in Faber-Langendoen et al. (2012a), but no evidence is presented in any of these sources for a consistent correlation between species richness of vascular plants and that of other biota. However, a number of studies in the refereed literature have found highly inconsistent relationships between species richness values for one taxon compared to others, including vascular plants. Studies of cross-taxon richness relationships (Prendergast and Eversham 1997; Wolters et al. 2006; Heino 2010; Eglington et al. 2012; Westgate et al. 2014) have shown that no taxon is particularly good for predicting the richness of other taxa. In other words no group—including vascular plants—has been shown empirically to be a good biodiversity indicator. For example, Kirkman et al. (2012) found that species richness of vascular plants at wetland sites was not a good predictor of species richness of the other groups, probably because biotic and abiotic processes act at different scales for different taxa, and other factors such as the number of species in neighboring patches and the number of neighboring patches also need to be accounted for. In another empirical study, Axmacher et al. (2009) found that alpha diversity of 279 geometrid moth species was not significantly correlated with overall vascular plant species richness, and hence “the diversity of vascular plants cannot universally be used as a suitable biodiversity indicator for diverse insect taxa” at the community level. 
While it is certain that animal species assemblages are influenced by vegetation, among other drivers, what is not so clear is the nature of the relationship between vegetation and diversity, and the degree to which the former can serve as a surrogate for the latter. The published evidence suggests that the relationship is at best scale-specific, taxon-specific, and of limited value at the scale at which habitat management is typically practiced.

Focal biodiversity and diversity

A third question is whether focal biodiversity is representative of biological diversity in an area. The use of so-called focal biodiversity to focus conservation planning efforts is intertwined with the issue of biodiversity indicators by Parrish et al. (2003) among others (see Lindenmayer et al. 2002). Parrish et al. (2003) claim that because focal biodiversity is “chosen to represent the biodiversity” of an area, an assessment of focal biodiversity is also a “measure of the status of biodiversity overall” (Parrish et al. 2003). But is the assumption that one or a small number of focal species reliably represents much of the regional biota empirically supported? Andelman and Fagan (2000) evaluated patterns of spatial co-occurrence between different biodiversity indicator species and regional biota in three conservation databases representing different scales and regions, and found that none of the various schemes (e.g., species most threatened; riparian species) captured ecologically associated species better than randomly selected indicator species. Prendergast et al. (1993), using empirical data on British plants and animals across many sites, found that species-rich areas frequently differ for different taxa, and many rare species do not occur in species-rich areas. Cushman et al. (2010) found that abundance patterns of multiple forest bird species were not consistently correlated in any of several species grouping schemes (e.g., by migratory status). The assumption that the response of indicator species will be typical of the response of many other species is not supported by evidence (Lindenmayer et al. 2002), for example because of limited species co-occurrence or varying responses to habitat disturbance. 
In order to use an indicator species or taxon as a reliable proxy for the presence, abundance, or richness of other taxa or for particular environmental conditions, it is essential to quantify the relationship between the indicator and what it is supposed to represent (Lindenmayer and Likens 2011). No such relationship has been demonstrated for vascular plants or other taxa at sites evaluated by ecological integrity assessment methods.

Multi-metric indices and diversity

Finally, there remains the question of whether multi-metric indices of site condition are correlated with diversity of a range of taxa. Site-condition multi-metric indices standardize, weight, and combine a variety of habitat-based variables in a single score, as distinct from habitat-based biological diversity surrogates comprising individual habitat measures (Lindenmayer et al. 2000; Psyllakis and Gillingham 2009; Oliver et al. 2014; see also foregoing section on vegetation attributes and diversity). A reliable correlation between a multi-metric ecological score and biological diversity at a site is required for the score to be as useful in planning and management as claimed by proponents (see Table 1). There have been few tests of how well such indices represent diversity (Oliver et al. 2014), but empirical analyses so far have produced problematic results. Kwok et al. (2011) tested whether there was a predictable empirical relationship between either of two types of ecological scorecard for eucalypt woodlands (one based on measures of landscape function related to water run-off and soil condition; and one based on measures of vegetation structure and composition such as tree and shrub richness and cover, litter cover, log numbers) and patterns of diversity in three arthropod orders (species and family abundance, richness, and community composition). Index values from both ecological scorecards were weakly and inconsistently related to arthropod diversity in all three orders. Similarly, Oliver et al. (2014) tested the relationship between site-condition scores for three Australian multi-metric indices (based on vegetation composition and structure such as percent cover of native canopy and shrubs, litter cover, number of logs, as well as landscape-scale attributes) and species diversity data for 11 disparate taxa (1068 species of vertebrates, invertebrates, and plants).
Site condition scores were not reliably related to diversity in most cases: of the 11 taxa, richness/diversity of only 2 taxa (birds and wasps) was significantly correlated with the site condition score. McGoff et al. (2013) used multivariate analyses in empirical tests of the relationship of two scoring protocols for lake shore habitat and patterns of diversity (taxon richness, Shannon-Wiener diversity index, and several macroinvertebrate metrics) in 14 macroinvertebrate taxa in 4 European regions. Although in some cases selected macroinvertebrate metrics were significantly correlated with habitat scores, there was no relationship between overall lakeshore index scores and macroinvertebrate diversity metrics across all the regions (McGoff et al. 2013). Taken together, these findings suggest that a scorecard approach is likely to be of limited use in representing patterns of biological diversity across multiple taxonomic groups, perhaps because biological processes operate at different scales for the different species using a site, and that empirical validation is essential.

Statistical robustness of ecological integrity assessment

According to Parrish et al. (2003), the ecological integrity assessment can “track important ecological characteristics” of focal biodiversity; Faber-Langendoen et al. (2012a, p. 7) state that it can be used to monitor status and trends; and Vickerman and Kagan (2014) state that it will “reveal trends.” Above and beyond the question of whether the indicators in an ecological integrity assessment actually represent diversity, there is a methodological question about the statistical reliability and repeatability of the indicators themselves. Substantial bias or high statistical variability could limit their usefulness, irrespective of their linkage to biodiversity. In this section we examine two issues related to statistical robustness, namely systematic bias and statistical power.

Systematic bias in index

Systematic bias is a concern with multi-metric indices, such as the ecological scorecard, that convert raw measurements of diverse variables into scores and weight and combine them into a single score. The indices can be subject not only to measurement error and observer bias in collecting the raw data (Gorrod et al. 2013; Dolph et al. 2010), but also to systematic bias when raw estimates are converted into categorical scores (Gorrod et al. 2013). In a study of uncertainty associated with site-condition scores for two Australian multi-metric indices based on vegetation composition and structure and used to predict value of sites for diversity of a range of taxa, Gorrod et al. (2013) found substantial and systematic underestimation of value, generated by sensitivity of the benchmark scoring intervals to observer error. The resulting bias in site scores clearly could have significant implications for conservation outcomes, especially in a market-based context (Gorrod et al. 2013). Dolph et al. (2010) used multi-metric indices of aquatic biotic integrity to demonstrate how their sensitivity to random sampling variability could lead to bias in scores and hence affect management decisions. In addition, as pointed out for multi-metric indices in general (Suter 1993; Efroymson et al. 2008), such indices do not measure real-world properties, and the index’s variance is not clearly related to a biological response; further, combining heterogeneous measures into a single index value implies that there is only a single linear scale and only a single type of response by ecosystems to disturbance. Thus, the inherent statistical properties of multi-metric indices in general tend to confound their results.
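A small numerical sketch can show how unbiased observer error becomes biased once raw estimates are binned into categories. The example below is not the analysis of Gorrod et al. (2013); the cut points and the set of equally likely, zero-mean observer errors are assumptions chosen only to make the mechanism visible.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical cut points (percent native cover) separating scores 1 to 4.
CUT_POINTS = [50, 80, 95]
# Hypothetical observer errors in percentage points: equally likely, mean zero.
OBSERVER_ERRORS = [-10, -5, 0, 5, 10]


def categorical_score(percent_cover):
    """Ordinal score for a cover estimate, after clamping it to 0-100 percent."""
    clamped = min(100, max(0, percent_cover))
    return bisect_right(CUT_POINTS, clamped) + 1


def expected_score(true_cover, errors=OBSERVER_ERRORS):
    """Average score obtained when each observer error is equally likely."""
    return mean(categorical_score(true_cover + e) for e in errors)


def score_bias(true_cover, errors=OBSERVER_ERRORS):
    """Expected score minus the score the true value would have received."""
    return expected_score(true_cover, errors) - categorical_score(true_cover)


bias_just_above_cut = score_bias(96)   # true value just above the top cut point
bias_mid_interval = score_bias(88)     # true value well inside a scoring interval
bias_just_below_cut = score_bias(79)   # true value just below a cut point
```

Under these assumptions a site whose true value lies just above a cut point is underscored on average, a site well inside an interval is scored without bias, and a site just below a cut point is overscored. The direction and size of the bias therefore depend on where sites fall relative to the benchmark intervals, even though the raw observer error averages to zero.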

Statistical power to detect trends

A second issue concerns the statistical power needed to detect trends over time. According to its proponents, the ecological integrity assessment is useful for detecting ecological change and revealing trends when repeated measures are made (Table 1). These statements notwithstanding, distinguishing real change from natural variation (e.g., spatial or temporal variation) and from measurement or sampling error (e.g., detectability problems) requires a careful and sophisticated survey design. A critical issue is the level of survey effort required to achieve enough precision to identify and quantify trends (MacKenzie et al. 2006; Magurran et al. 2010). For a single site, the key factors are the length of the time series and the precision of measurement at each point (Magurran et al. 2010). Power calculations or determination of confidence intervals can be used to estimate the level of survey effort needed (Magurran et al. 2010). Consideration of statistical power is not mentioned as part of ecological integrity assessment methodology in Faber-Langendoen et al. (2012a, b). Without explicit design for adequate statistical power, there is no assurance that the ecological integrity index can distinguish among directional change in diversity, natural variation over time, and measurement error.
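As a rough illustration of the kind of power calculation referred to above, the following Python sketch estimates the power to detect a linear trend at a single site. It rests on strong simplifying assumptions that are ours, not those of the cited authors: one index value per survey at equally spaced times, independent normally distributed errors with a known standard deviation, a two-sided z-test on the regression slope, and an index treated as continuous. The numerical inputs are hypothetical.

```python
from statistics import NormalDist


def trend_power(slope, sigma, n_surveys, alpha=0.05):
    """Power of a two-sided z-test to detect a linear trend of size `slope`
    per survey interval, given error standard deviation `sigma`."""
    times = range(n_surveys)
    t_bar = sum(times) / n_surveys
    sxx = sum((t - t_bar) ** 2 for t in times)
    std_error = sigma / sxx ** 0.5          # standard error of the fitted slope
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    delta = slope / std_error               # standardized effect size
    return (1 - z.cdf(z_crit - delta)) + z.cdf(-z_crit - delta)


def surveys_needed(slope, sigma, target=0.8, alpha=0.05, max_surveys=200):
    """Smallest number of surveys reaching the target power, or None."""
    for n in range(3, max_surveys + 1):
        if trend_power(slope, sigma, n, alpha) >= target:
            return n
    return None


# Hypothetical case: index changes by 0.05 units per survey, error SD of 0.5 units.
n_required = surveys_needed(slope=0.05, sigma=0.5)
```

Even in this idealized setting, a trend that is small relative to the measurement error requires a long time series before it can be distinguished from noise; real designs must additionally account for temporal autocorrelation, detectability, and spatial variation.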

Reliability of the vegetation metrics in ecological integrity assessment

In sampling plants and animals alike, detectability and spatial variation are well-known issues in statistical methodology (Yoccoz et al. 2001), as is the need to account for them in sampling (Williams et al. 2002a; MacKenzie et al. 2006). Reliability of the vegetation metrics in ecological integrity assessment is important because measures of vascular plants are the main measures of biological diversity and of ecosystem composition in NatureServe’s characterization of the methodology. These metrics have been the core of the approach from its earliest development by Wilhelm (1977), Taft et al. (1997), and Wilhelm and Ladd (1988). In this section we examine concerns about various sources of error in the vegetation methodology of ecological integrity assessments.

Bias in richness estimates

One concern is bias in estimates of plant species richness. Species richness measures are sensitive to the number of individuals sampled, and to the number, size, and spatial arrangement of samples (Gotelli and Colwell 2011). The single-plot sampling of vascular plant species richness at a wetland site in Faber-Langendoen et al. (2012a) cannot provide any indication of variability across the site. Without multiple plots, it is impossible to determine the mean and variance of index values among plots (Magurran 1988) and thus to examine whether the sample data are actually representative. No single plot can accurately characterize site variation in plant communities (Bourdaghs et al. 2006). However, enlarging the area sampled has its own pitfalls, because a larger sampling area also increases species richness estimates (Krebs 1999) as well as the mean coefficient of conservatism and floristic quality index (Matthews et al. 2005; Bourdaghs et al. 2006), which are based on species richness. Species richness of an entire site, rather than one plot in the site, is often used in floristic quality assessment (Nichols 1999), following the protocols defined by Wilhelm and Ladd (1988) and cited by Faber-Langendoen et al. (2012a) as the basis for floristic quality assessment. In such a case, variation in site size can lead to variation in estimates of species richness and related measures such as mean coefficient of conservatism. These sources of bias can easily confound the estimation of a site’s conservation value.
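The dependence of richness on sampling effort can be made concrete with individual-based rarefaction, which gives the expected number of species in a random subsample of a fixed number of individuals. The sketch below uses the standard hypergeometric rarefaction formula; the community abundances are hypothetical, and treating larger sampled area as simply more individuals drawn at random is a simplifying assumption that ignores spatial clustering.

```python
from math import comb


def rarefied_richness(abundances, n):
    """Expected number of species in a random subsample of n individuals.

    `abundances` holds the number of individuals of each species in the
    full community; sampling is without replacement.
    """
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of individuals")
    denom = comb(total, n)
    # For each species: 1 minus the probability that it is missed entirely.
    return sum(1 - comb(total - a, n) / denom for a in abundances)


# Hypothetical community of 10 species and 100 individuals, a few common, most rare.
community = [50, 20, 10, 5, 5, 3, 3, 2, 1, 1]
for n in (10, 25, 50, 100):
    print(n, "individuals:", round(rarefied_richness(community, n), 2), "species expected")
```

Two sites with identical communities would therefore yield different richness values if one were sampled more intensively than the other, which is why richness comparisons need standardized effort or explicit rarefaction.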

Species detection error

A second concern is sampling error in detecting plant species presence. Vegetation measures are highly susceptible to detectability problems. Numerous studies have empirically tested detectability of plants (Chen et al. 2013; Moore et al. 2011; Per et al. 2008; Archaux et al. 2006; Ringvall et al. 2005; Klimes 2003; Milberg et al. 2008) in vegetation monitoring, including the presence/absence sampling and visual estimates of cover that are used in ecological integrity assessment. In surveys of plant species presence and distribution, detection error is the norm rather than the exception: in some surveys a high percentage of species (e.g., 20–34%) were overlooked, and detection probability varied with observer, sampling time, season, and spatial location. If survey designs do not explicitly incorporate detection probability, serious bias in estimates of species distributions and species richness can result. Accommodation for sampling error is not mentioned by Faber-Langendoen et al. (2012a), and hence the vegetation metrics are likely to be subject to bias.
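The effect of imperfect detection on a raw species count follows from simple probability. As a minimal sketch, assume each species present at a site is detected independently with its own per-visit probability, and that repeat visits are independent; both assumptions, the function name, and the example probabilities are hypothetical and chosen only for illustration.

```python
def expected_observed_richness(detection_probs, n_visits=1):
    """Expected number of species recorded at least once.

    `detection_probs` gives the per-visit detection probability of each
    species actually present; visits are assumed independent.
    """
    return sum(1 - (1 - p) ** n_visits for p in detection_probs)


# Hypothetical site with 50 species present, each detected with probability 0.7.
present = [0.7] * 50
for visits in (1, 2, 3):
    seen = expected_observed_richness(present, visits)
    print(visits, "visit(s):", round(seen, 1), "of", len(present), "species expected")
```

With a single visit the expected count is 35 of 50 species, so 30% of the species present go unrecorded. The shortfall shrinks with repeat visits, but it cannot be estimated or corrected unless the survey design provides information on detection probability.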

Bias in coefficient of conservatism

Another concern is subjectivity (observer variability) in assigning coefficients of conservatism. Coefficients of conservatism assigned to plant species are qualitative and subjectively determined on the basis of professional judgment, rather than objectively assigned. The reason given is that the taxa “span a range that is too broad…for any objective natural sorting to serve as a guide to species rankings” (Taft et al. 1997). Because of this subjectivity, the metric is highly susceptible to inter-observer variability and bias. Landi and Chiarucci (2010) specifically tested inter-observer variation in coefficients of conservatism, and found that scores given by different experts were not consistent and resulted in derived floristic quality indices that were statistically different. Inter-observer variation within NatureServe species databases that contain pre-assigned coefficients of conservatism apparently has not been examined by Faber-Langendoen et al. (2012a) and could result in bias.

There are other potential problems with the coefficient of conservatism in addition to those mentioned above. The metric is computed by averaging predetermined conservatism values for all the native species observed at a site, and then assigning the resulting average value to one of a few ordinal categories (see Appendix 3). One problem is that the coefficient of conservatism is based on species occurrence at a single site, which may or may not represent the larger area to which inference is made. The absence of any accounting of spatial variability associated with it (Magurran 1988) limits its usefulness for broader assessments. In addition, the metric is based on observed rather than actual occurrence, with no accommodation for sampling error in detecting plant species presence (see above discussion of this topic). Also, sensitivity of the metric to cut-off points used to distinguish categories is a concern that is yet to be investigated. Finally, even under ideal circumstances, the metric would be at best an ambiguous measure of native plant diversity: a large mean C at a site can be obtained when just one or a few high-value species are observed, and a much smaller mean C can be obtained from the range of conservatism values resulting when numerous species are observed. Among other things this means the coefficient of conservatism can actually be inversely related to diversity of native plants at a site.
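The ambiguity of mean C as a diversity measure can be shown with a short calculation. The two species lists below are hypothetical, and the coefficient values are invented for illustration on an assumed 0 to 10 scale; the sketch computes only the average described above and does not apply the ordinal categories of Appendix 3.

```python
def mean_c(c_values):
    """Mean coefficient of conservatism over the native species observed."""
    if not c_values:
        raise ValueError("at least one species is required")
    return sum(c_values) / len(c_values)


# Hypothetical site A: only two native species observed, both highly conservative.
site_a = [9, 10]
# Hypothetical site B: twenty native species spanning the full range of values.
site_b = list(range(1, 11)) * 2

print("site A:", len(site_a), "species, mean C =", mean_c(site_a))  # 9.5
print("site B:", len(site_b), "species, mean C =", mean_c(site_b))  # 5.5
```

Site A scores far higher on mean C despite having one tenth of the native species richness of site B, which is the inverse relationship described in the text.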

Ecological integrity assessment and decision making

The foregoing issues affect the usefulness of ecological integrity assessment in decision making. Assessments are held by some to be useful in land-use planning, in the prioritization of sites for mitigation (Faber-Langendoen et al. 2012a, p. 7), in the identification of sites for “conservation/management actions” (Faber-Langendoen et al. 2012a, p. 9), in conservation planning (Unnasch et al. 2009), in assessment of management effectiveness (Tierney et al. 2009), and in the allocation of resources and management decisions (Vickerman and Kagan 2014). One example would be a comparative assessment of sites for land-use decisions in environmental markets and trading schemes, such as those described by Salzman and Ruhl (2000). It is unclear how an ecological integrity assessment could play these important roles. Whatever else is involved, decision making is based on a comparison of alternative management choices against value-based criteria, so as to allow one to recognize value differences among alternatives. From the regulatory point of view, it is important that objectives are clear and acceptable, and measures allow discrimination among alternative outcomes on the basis of the scientific method, including hypothesis testing. NatureServe’s ecological integrity assessment in its present form may be unable to satisfy these requirements, due to the many potential sources of error and sampling bias that are unaccounted for and the ambiguity about what ecological attributes are actually represented by the index. For example, any metric used by federal agencies in a context of the U.S. Endangered Species Act (ESA) would need to be able to withstand legal scrutiny during judicial review. In the ESA context, 25 of 32 listing decisions reviewed by the court in 2003 were set aside, many because the U.S. Fish and Wildlife Service did not use “best available science” (Wilde 2014). It is unlikely that an ecological integrity assessment as currently described and justified would meet the necessary criteria to withstand such scrutiny.

Robust alternative measures of biological diversity

Biodiversity measures can indeed provide valuable empirical metrics for assessing change (Buckland et al. 2005). But one size definitely does not fit all—there is no universally appropriate generic monitoring program, and an optimal design must always be tailored to a particular situation and purpose (Ferraz et al. 2008). Different indices measure different aspects of diversity, so it is essential to define the objectives of any monitoring program clearly in order to choose an appropriate index (Yoccoz et al. 2001; Pollock et al. 2002). Recent biometric research on how to measure and monitor biological diversity has proliferated, and includes work on appropriate statistical criteria, new indices and methods, design of monitoring programs, and methodological comparisons (Yoccoz et al. 2001; Magurran 2004; Lamb et al. 2009; Buckland et al. 2005; MacKenzie et al. 2006; Magurran et al. 2010; Magurran and McGill 2011; Dornelas et al. 2013). While it is beyond the scope of this paper to review the statistical literature, here we cover some important considerations in designing statistically robust surveys of biological diversity.

The size of the area and the number of taxa of interest affect the design of a monitoring program. For monitoring change in biological diversity over time in a wide heterogeneous region, Buckland et al. (2005) set out six criteria to evaluate indices for detecting temporal change. They reviewed performance of several measures (Shannon’s index, Simpson’s index, arithmetic mean of relative abundance indices, geometric mean of relative abundance indices) against the criteria and found that Shannon’s index and the geometric mean performed best. Yoccoz et al. (2001) and Pollock et al. (2002) suggested that in certain circumstances species richness is an adequate measure, absent an accounting for abundance. Lamb et al. (2009) examined diversity indices for monitoring temporal change over large areas. They evaluated 13 diversity indices of 3 types (traditional, community [species] intactness based on occurrence, community [species] intactness based on abundance) against several criteria such as sensitivity to detection error and power to detect trends, in six ecological scenarios. They found that the intactness index based on the arithmetic mean of relative abundance indices of Buckland et al. (2005) performed best (Lamb et al. 2009). Monitoring of single sites with unique habitats or rare species could be treated by collecting data only on the specialist species of interest (Buckland et al. 2005) or by conducting single-species surveys (Magurran et al. 2010).
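The four measures named above can respond quite differently to the same data, which is why the choice of index must follow from the monitoring objective. The sketch below uses common textbook forms that may differ in detail from those evaluated in the cited studies: Shannon’s index as the negative sum of p ln p, Simpson’s index in its one-minus-sum-of-squares form, and relative abundance taken as the ratio of each species’ abundance to its baseline value. The counts and function names are hypothetical.

```python
from math import exp, log


def shannon(counts):
    """Shannon's index, -sum(p * ln p), over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson's index in the form 1 - sum(p^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def arithmetic_mean_index(baseline, current):
    """Arithmetic mean of per-species abundance ratios (current / baseline)."""
    ratios = [c / b for b, c in zip(baseline, current)]
    return sum(ratios) / len(ratios)


def geometric_mean_index(baseline, current):
    """Geometric mean of per-species abundance ratios; all counts must be positive."""
    ratios = [c / b for b, c in zip(baseline, current)]
    return exp(sum(log(r) for r in ratios) / len(ratios))


# Hypothetical counts: the commonest species doubles while two rare species halve.
baseline = [100, 50, 10, 10]
current = [200, 50, 5, 5]

print("Shannon:", round(shannon(baseline), 3), "->", round(shannon(current), 3))
print("Simpson:", round(simpson(baseline), 3), "->", round(simpson(current), 3))
print("arithmetic mean index:", arithmetic_mean_index(baseline, current))
print("geometric mean index:", round(geometric_mean_index(baseline, current), 3))
```

In this example the arithmetic mean of the ratios equals 1, suggesting no overall change, whereas the geometric mean falls below 1 because it weights proportional declines and increases symmetrically. Neither result is presented here as the correct one; the example only shows that the indices measure different things.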

Any serious attempt to monitor biological diversity should address the two main sources of error, detectability and spatial variation (or environmental heterogeneity) (Yoccoz et al. 2001; Williams et al. 2002a; Buckland et al. 2005). Detection error results when some individual organisms or species evade detection during a survey. Distance sampling and capture–recapture are two types of methodology that can be used to estimate detection probabilities associated with count statistics (Yoccoz et al. 2001; Williams et al. 2002a; MacKenzie et al. 2006). While these methods may be more appropriate for one or a few sites, Pollock et al. (2002) discussed methods to measure and incorporate detectability in large-scale studies at multiple points in space and time. Survey errors result when inferences about a larger area are not based on an appropriate spatial design for sampling smaller sites. A particular problem is the use of subjectively chosen sampling sites, which can lead to biased estimates of diversity at the larger scale (Yoccoz et al. 2001). Yoccoz et al. (2001) and Williams et al. (2002a) discussed some of the many recent statistical advances in sampling design, with survey features that can be customized for the taxa of interest.

A survey that is properly designed can quantify the uncertainty and precision of diversity measures, and thus their reliability (Buckland et al. 2005; MacKenzie et al. 2006). Determining the survey effort needed for sufficient precision to quantify change over time is key to producing unbiased results (Buckland et al. 2005; Magurran et al. 2010). The effort needed for a given level of precision can be estimated in advance by power calculations. At a single site, the important factors are the length of the time series and the precision of the measurements at each time (Magurran et al. 2010). For multi-site sampling of a larger area, precision also depends on the number of plots, plot size, and frequency of sampling (Magurran et al. 2010). In a case study, Nielsen et al. (2009) examined statistical power for detecting simulated species declines in several monitoring scenarios, given different numbers of sites coupled with different sampling intervals. Dornelas et al. (2013) reviewed other general issues involved in quantifying trends in biological diversity over time.

Broad applicability and cost-effectiveness are both important considerations in implementing large-scale biological diversity monitoring programs. Use of common monitoring designs across global regions was recommended by Buckland et al. (2005), in order to provide greater scope in measuring changes, as well as economies of scale. They suggested designing surveys such that entry at various levels is possible, thus allowing nations with fewer resources to take part, perhaps with design modifications such as a subset of species, lower sampling rates, and simpler methods (Buckland et al. 2005). If the monitoring program and sampling protocols are well-designed and well-coordinated, professionals are not necessarily needed to collect all data (Magurran et al. 2010; Tulloch et al. 2013). With adequate training and supervision, non-professional volunteers can be as proficient as professionals in many tasks, as shown by a number of quantitative evaluations of volunteer-collected data [e.g., in mammal surveys (Newman et al. 2003), amphibian call surveys (Genet and Sargent 2003), or shark counts (Ward-Paige and Lotze 2011)]. However, generating high-quality data from “citizen science” involves other costs associated with coordination, communication, and data quality control (Tulloch et al. 2013), and it is important to keep in mind that adhering to statistical and ecological principles remains essential (Buckland et al. 2005; Lamb et al. 2009; Magurran et al. 2010; Tulloch et al. 2013).

Discussion

The investigation of ecological integrity addresses a critical need for usable information to help stem the accelerating loss of biological diversity. But we believe there are serious and unaddressed concerns about the suitability of ecological integrity assessment as described by Faber-Langendoen et al. (2012a, b) for measuring diversity and detecting trends over time. For example, the vegetation sampling methods, including protocols for estimating plant species richness, are susceptible to sampling error and observer bias. Vascular plant diversity, which is used as a key proxy for biological diversity, is not a reliable indicator of diversity of other taxa and has no demonstrated relationship to measures of cross-taxon diversity at a site. The empirical studies discussed earlier point to serious difficulties in using these indicators as surrogates for biological diversity. In fact, patterns of biological diversity in landscapes, however they are described, are too complex to be captured effectively by these indicators. And there is no evidence that the ecological integrity assessment protocols as currently designed can resolve problems of detectability and environmental heterogeneity in distinguishing natural variation from ecological change over time.

A related issue that merits discussion is whether expansion of the current index to include taxa other than vascular plants would improve its suitability for measuring biological diversity. Proponents hold that the ecological integrity index can be extended as necessary to include other components of the biota such as birds or amphibians (Faber-Langendoen et al. 2012b, p. 2); keystone species, rare/sensitive species or guilds (Unnasch et al. 2009, p. 27); and unique native species or vertebrate species (Vickerman and Kagan 2014, p. 16). In our opinion, the inclusion of additional taxa would still leave ecological integrity assessment in its present form an inadequate measure of biological diversity. First, none of the current sampling protocols described in Faber-Langendoen et al. (2012a, b) addresses the issues of detectability and environmental heterogeneity in sampling for plants, and these problems are even harder to resolve when sampling animal populations (Williams et al. 2002a). Second, as discussed previously, no taxon or group has been shown empirically to be a reliable proxy for diversity of a range of taxa. Third, the process of converting raw data to numerical scores and then weighting them in an assessment can introduce bias (Gorrod et al. 2013). A noteworthy point is that in the various publications describing ecological integrity assessment, proponents of the methodology (Parrish et al. 2003; Willamette Partnership 2011; Unnasch et al. 2009; Faber-Langendoen et al. 2012a, b; NatureServe 2012; Vickerman and Kagan 2014) have yet to include a worked example for any taxon other than vascular plants, although the word biodiversity is frequently mentioned.

Measuring diversity of biota and estimating the conservation importance of any given site requires more than a few proxy variables or indicator species. There is a large body of recent literature on diversity theory (reviewed by Bestelmeyer et al. 2003), indicating that the biological processes creating biological patterns operate at different scales for different species. Many factors interact to determine animal diversity patterns, including competition, territoriality, dispersal, predation, physical environmental variation (especially landscape-scale gradients and patchiness), and historical variation in biogeography (Bestelmeyer et al. 2003). For individual species, distribution patterns across scales are determined by habitat requirements, dispersal capabilities, and the size and location of the geographic range. Thus, measuring habitat variables at a given site is not sufficient for monitoring population viability or abundance of even a single species (MacKenzie et al. 2006). By extension, scorecards at individual sites are not sufficient to explain distribution patterns across sites. For biological diversity as a whole, patterns of species diversity are strongly influenced by spatial heterogeneity in a scale-dependent way (Williams et al. 2002b), which can potentially result in a strong association of habitat features with one or a few species but a weak association with diversity of multiple taxonomic groups. However, ecological integrity assessment largely ignores these issues, and assumes instead that habitat and landscape features at individual sites fully account for diversity of biota in a predictable way. Inaccurate estimates of the conservation value of a site could easily result from this unproven assumption.

While we applaud efforts to address important environmental issues with an ecological integrity assessment, further development is needed, especially in the areas of technical refinements and validation. In its current form the methodology is of limited use in providing meaningful metrics of biological diversity, and lacks a foundation in ecological and statistical principles. At a minimum, sources of sampling error—especially organism detectability and spatial variation—should be investigated and accounted for, along with the potential for bias and loss of essential information due to condensing such a large amount of disparate data into a single index. Further empirical investigation is needed to quantify how well the indicator variables and metrics are correlated with the particular ecological processes or environmental conditions they are supposed to represent (Lindenmayer and Likens 2011; Lindenmayer et al. 2014). Relationships between indicator variables and biological diversity attributes should be quantified to determine the transferability of a given indicator to the biotic component for which it is used as a proxy (Lindenmayer and Likens 2011). More effective development of a metric will require greater collaboration of statisticians, landscape ecologists, and theoretical ecologists.

Broader acceptance will require evidence from the refereed scientific literature that ecological integrity assessment actually measures biological diversity. In particular, the methodology should be subjected to ongoing critical review in the literature to a much greater extent than it has been to date. Empirical studies such as those suggested in the previous paragraph should be undertaken, with results published in peer-reviewed journals. The relation of the assessment to biological diversity warrants a more thorough investigation than this paper permits. The linkage between ecological integrity assessment and ecological structure and function should also be investigated, for example by identifying criteria for assessing index performance with respect to the use of various proxy variables; conducting a comprehensive literature review to identify evidence for each criterion; and subjecting the evidence to formal analysis.

With improvements in methodology and thoughtful choices, measuring biological diversity can produce unbiased results that reflect real change rather than sources of error, and provide the accurate assessments necessary for effective conservation decisions.