1 Introduction

The term biodiversity is a contraction of biological diversity and can be simply defined as the sum of all biotic (animal and plant life) variation from the level of genes to ecosystems (Purvis and Hector 2000). Biodiversity loss can be driven by extrinsic processes such as climate change or tectonic movements. However, current changes in biodiversity result primarily from processes intrinsic to life on Earth, and almost exclusively from human activities. Human-induced pressures are affecting the Earth’s ecosystems, eliminating genes, species and biological traits at an alarming rate, in some cases irreversibly. The most important direct human-induced impacts on biodiversity are habitat destruction (Bawa and Dayanandan 1997; Tilman 2001), the introduction of alien species (Everett 2000; Levine 2000), over-exploitation (Pauly et al. 2002; Hutchings and Reynolds 2004), disease (Daszak et al. 2001), pollution (Baillie et al. 2004), and climate change (Parmesan et al. 1999; McLaughlin et al. 2002; Walther et al. 2002). Until the effects of critical pressures are reduced, most declines seem likely to continue at the same or increased rates. While there is evidence that, over the past few decades, biodiversity loss has slowed or even reversed in some habitats, there is substantial concern regarding the rate at which biodiversity loss will alter the functioning of ecosystems and the services they provide (Cardinale et al. 2012). In light of the above, there is an urgent need for scientific information to support policy makers and to ease the decision-making required for conserving ecosystems.

By incorporating biodiversity loss estimates into modelling tools for conservation management and environmental risk assessment, we can make better conservation and restoration decisions, with the objective of maintaining biological diversity and the ecosystem services that this diversity provides. A powerful modelling technique for determining the environmental impacts of products, processes or services is Life Cycle Assessment (LCA; e.g. Guinée 2002). This modelling framework assesses the environmental pressures and related potential environmental impacts associated with all the stages of a product’s life cycle from cradle to grave. Environmental impact categories include climate change, with global warming potential as the corresponding characterisation factor (Plevin et al. 2013; Nakano 2015), acidification (Huijbregts et al. 2000; Kim and Chae 2016), human toxicity (Hertwich et al. 2001; Juraske et al. 2009), water depletion (Pfister et al. 2009; Finkbeiner et al. 2010) and resource depletion (Klinglmair et al. 2014), among others.

Impacts on biodiversity have been one of the most challenging categories to incorporate in life cycle impact assessment (LCIA, see e.g. Curran et al. 2011; De Souza et al. 2015). The Millennium Ecosystem Assessment (MEA 2005) identified various drivers for biodiversity loss, of which the most important are (a) terrestrial and aquatic habitat change, (b) invasive species, (c) pollution, (d) climate change and (e) over-exploitation. Some LCIA methods attempt to assess impacts like global warming (e.g. De Schryver et al. 2009; Wilting et al. 2017) and freshwater use (e.g. Pfister et al. 2009; Verones et al. 2017) on biodiversity, but the vast majority of LCIA approaches look at biodiversity impacts by land use and land use change only. An important milestone was the publication of a special issue of this journal on Global land use impacts on biodiversity and ecosystem services in LCA, which included, among others, papers on related UNEP-SETAC guidelines (Koellner et al. 2013), land use impacts on biotic production (Brandão et al. 2013; compare Taelman et al. 2016), land use impacts on ecosystem services like freshwater and erosion regulation (Saad et al. 2013) and land use impacts on functional diversity (De Souza et al. 2013). Approaches that analyse the impacts of land use occupation and change on species abundance or species richness, usually by applying species-area relationships (SARs), nevertheless dominate the current state of the art in LCIA. Recent examples of studies presenting such indicators include de Baan et al. (2014), Chaudhary et al. (2015) and Wilting et al. (2017). In the case of non-land use impacts, the link between species richness and anthropogenic influences is weak at best.

In addition to traditional biodiversity measures such as species richness and phylogenetic diversity measures (e.g. Faith 1992), the notion and use of functional diversity (i.e. the diversity of plant species traits in ecosystems) has emerged as a measure of biodiversity, particularly over the last few decades (Kattge et al. 2011). Typical examples of plant traits include leaf dry mass, leaf area, rooting depth, maximum growth rate and leaf nitrogen concentration (Petchey and Gaston 2002; Mason et al. 2010), while animal traits include body size, wing size (for birds and insects) and respiration rates (Tarka et al. 2010; Laine et al. 2013). Many studies have shown that functional diversity is one of the best available predictors of ecosystem functioning, providing a strong and direct link to it (Petchey et al. 2004; Cadotte et al. 2009; Flynn et al. 2011). The reason is that it is not the number of species but the number of traits that directly relates to ecosystem functioning (Díaz and Cabido 2001; Flynn et al. 2009; Mouchet et al. 2010; Petchey and Gaston 2006).

Indeed, ecological literature (e.g. Fukami et al. 2005) has shown that functional diversity responds more consistently to environmental drivers than species richness. It is precisely these arguments that, in our view, justify the incorporation of functional diversity when assessing biodiversity in an LCA. Using functional diversity instead of species richness will make the impact assessment more certain and hence more clearly dedicated to biodiversity impacts. Unfortunately, the use of functional diversity metrics in LCA is highly limited. Only recently, a specific model was introduced by De Souza et al. (2013), based on data from North and South America. It implemented the functional diversity metric proposed by Petchey and Gaston (2002) and Mouchet et al. (2008) (see FRD metric (1.4) in Appendix Table A1, Electronic Supplementary Material). Although this model has not been made operational due to the lack of global characterisation factors, it can be applied to evaluate the impact of land occupation (at least in North and South America, with possible extension to other regions of the globe).

We argue that, through its relationship with various environmental drivers and human impacts thereon, functional diversity may provide a more generic tool for assessing environmental impacts on biodiversity in LCA. To achieve a more extensive use of functional diversity metrics in LCA, the first challenge is to select the most appropriate metrics that generically and quantitatively define the relationships between inventory flows and functional diversity. However, ever since the importance of functional diversity was realised, a wide variety of metrics has been developed (Appendix Table A1), and a general consensus regarding the most appropriate measure is still lacking (Petchey et al. 2009). This problem has been amplified by the strong increase in the number of available metrics over recent years. Hence, there is a growing need to categorise and identify appropriate metrics to guide LCA model developers in choosing meaningful metrics for the purpose of LCA. A major issue is that the novice user, whose interest lies in environmental impacts, risks choosing an inaccurate metric or a combination of metrics without sufficient justification. In summary, functional diversity has been promoted as a promising metric in LCA (De Souza et al. 2013), but information on correct metric choice and real-world applications is hitherto lacking. Therefore, in this study, we aim to examine the properties of these metrics, with additional comments on metric quality and behaviour, possible drawbacks, constraints and limitations (see Appendix Table A3, Electronic Supplementary Material). Building on the excellent framework originally formulated by Schleuter et al. (2010) and extending this review, we identified the following aims:

  • To identify metrics which are frequently applied in scientific literature and highlight those with greater explanatory power (i.e. link to ecosystem functioning)

  • To categorise functional diversity metrics according to important and desirable properties

  • To provide the reader with further informed and justified recommendations to ease the selection process of functional diversity metric(s) based on reconciliation of the above objectives

With respect to the first aim, we performed a meta-analysis of literature that either directly incorporated or mentioned functional diversity metrics in relation to ecosystem functioning and services in the context of induced pressures, and paid special attention to a quantifiable description of the strength of the link. For this analysis, our study is confined to well developed and commonly used metrics, see Appendix Table A1 (Electronic Supplementary Material). For the second aim, we reconciled the information on desirable properties, frequent use and explanatory power. In combination, we discuss and select the functional diversity metric(s) that are most suitable for incorporation into LCAs.

2 Functional diversity metrics

For functional diversity to be meaningful and worth measuring, it must be related to human-induced pressures included in LCAs, and it should provide information above and beyond what species richness can explain. Functional diversity is measured in a multitude of ways; technically, it represents the diversity of traits, but it is taken to represent the diversity of species niches in trait space (Petchey et al. 2004; McGill et al. 2006; Petchey and Gaston 2006; Villéger et al. 2008). While the use of functional diversity metrics presupposes a mechanistic link between diversity and the ecological phenomena in question, which indeed has been proven in experimental settings (e.g. Fukami et al. 2005; Heemsbergen et al. 2004), a systematic review of field studies aiming at establishing this link is so far lacking.

2.1 Description of metrics

Functional diversity metrics can be one-dimensional (1D), i.e. incorporating a single functional trait (e.g. functional logarithmic variance, see (3.1) in Appendix A1). More often, though, a multi-dimensional or multivariate (MD) metric is applied (e.g. functional volume, see (1.3) in Appendix A1). For a multi-dimensional metric, each co-ordinate corresponds to a measured trait and each point represents the position of an individual or a species in trait space. A comprehensive list can be found in Appendix Table A1 (Electronic Supplementary Material). Whether it is better to use a single trait or to combine several traits depends on the ecological context (e.g. Butterfield and Suding 2013). Schleuter et al. (2010) argue that multivariate metrics are preferable since studies are more informative when the distribution of species is represented in a multi-dimensional trait space. Petchey et al. (2004) re-analysed six biodiversity ecosystem functioning experiments and found that multivariate metrics explained variation in ecosystem function better. However, there is no argument of principle that justifies a preference for either, nor can conclusions be drawn from field tests. One may argue that the strategies of species are always composed of multiple axes and hence multiple trait combinations, so that multiple dimensions are required to express functional diversity properly. On the other hand, one may also argue that because of multiple strategy axes, and given that each axis is probably driven by a specific environmental pressure, functional diversity metrics of multiple dimensions will not be able to form a strong link to specific human impacts. For our purposes, we suggest that it would be improper to dismiss a metric based on its dimensionality; therefore, our study comprises both one-dimensional and multivariate metrics.

In principle, aggregating information into a single metric would be most desirable. However, in the case of functional diversity, a metric providing complete information does not exist. This is not unique to biodiversity; other impact categories in LCA are also characterised by multiple metrics. Functional diversity, like biodiversity, is a multi-faceted entity. In line with this, Mason et al. (2005) strongly argue that it is not possible to completely represent the diversity of a community in a single index, and that the multiple facets of functional diversity, as well as associated impacts, should instead be captured by using multiple independent metrics. Mason et al. (2005) decomposed biodiversity into three distinct components, each of which can be quantified and linked to a different, independent facet of functional diversity, namely functional richness (FR), functional evenness (FE) and functional divergence (FD). A similar view is indicated by Ludwig and Reynolds (1988) and Purvis and Hector (2000). The three components can be characterised as follows:

  • Functional richness (FR): the amount of niche space filled by all species in the community.

  • Functional evenness (FE): the evenness of abundance distribution in filled niche space.

  • Functional divergence (FD): the degree to which the distribution of species abundances in niche space maximises total community variation (see Appendix Table A1, Electronic Supplementary Material) (Fig. 1).
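To make the richness and divergence components concrete for a single trait, the following is a minimal sketch rather than the formulation used in Appendix Table A1. It assumes that richness is taken as the raw (unnormalised) trait range and that divergence follows the functional logarithmic variance of Mason et al. (2003) as we recall it, i.e. an abundance-weighted variance of log-transformed trait values rescaled with an arctan transform. The function names and the leaf-area values are hypothetical.

```python
import math

def functional_range(traits):
    """Richness sketch (assumption): width of the trait interval that is occupied."""
    return max(traits) - min(traits)

def fd_var(traits, abundances):
    """Divergence sketch (assumed form of the functional logarithmic variance):
    abundance-weighted variance of ln(trait), rescaled to [0, 1) via arctan."""
    total = sum(abundances)
    weights = [a / total for a in abundances]          # relative abundances
    logs = [math.log(x) for x in traits]               # traits must be > 0
    mean_log = sum(w * l for w, l in zip(weights, logs))
    variance = sum(w * (l - mean_log) ** 2 for w, l in zip(weights, logs))
    return (2.0 / math.pi) * math.atan(5.0 * variance)

# Hypothetical leaf-area values (cm^2) for four species
leaf_area = [2.0, 4.0, 8.0, 16.0]
even_abundance = [10, 10, 10, 10]       # abundance spread over the whole range
centred_abundance = [1, 20, 20, 1]      # abundance concentrated mid-range

richness = functional_range(leaf_area)
divergence_even = fd_var(leaf_area, even_abundance)
divergence_centred = fd_var(leaf_area, centred_abundance)
```

In this toy community the occupied range is identical in both cases, whereas the divergence value is lower when abundance is concentrated near the centre of the trait range, which illustrates why the two components carry different information.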

In this study, the dependence of metrics will be assessed through analysis of correlations (based on literature as described in “Section 3”). In addition, the independence of the metrics from species richness and evenness, essential to obtain orthogonal information (Mouillot et al. 2005), is evaluated.

A wide variety of functional diversity metrics is presented in the literature (Rao 1982; Villéger et al. 2008; Mouillot et al. 2005; Mason et al. 2003). Reviews of functional diversity metrics are presented by Schleuter et al. (2010) and Mouchet et al. (2010). These reviews did not include the metric of functional dispersion from multiple traits, as proposed by Laliberté and Legendre (2010). Functional dispersion is an extension of the original framework of Villéger et al. (2008), which has been generalised to a highly flexible distance-based framework that accommodates any distance or dissimilarity measure and multiple traits of different types, and allows for missing trait values and weighting of individual traits. Also, Blonder et al. (2014) proposed the n-dimensional hypervolume, a generalisation of the convex hull concept that allows for gaps in the convex hull. A comprehensive list with all functional diversity metrics currently available from the above compilation can be found in Appendix Table A1 (Electronic Supplementary Material). Functional dispersion and the n-dimensional hypervolume have been included in this list to provide the reader with a comprehensive view of the different types of metrics. In particular, these metrics were developed recently and have been shown to be promising in applications. However, due to a lack of information on their structural properties (see “Section 3”) and their link to ecosystem functioning (see “Section 2.3”), these metrics are omitted from those analyses. We will return to these metrics in “Section 4” when discussing limitations.
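As an illustration of the distance-based idea behind functional dispersion, the sketch below computes the abundance-weighted mean distance of species to the abundance-weighted centroid of the community in trait space. It is a simplification under stated assumptions: Euclidean distance, quantitative traits that are already standardised, and no missing values, whereas the full framework of Laliberté and Legendre (2010) accommodates other dissimilarity measures and trait types. The function name and the example data are hypothetical.

```python
import math

def functional_dispersion(traits, abundances):
    """Weighted mean distance of species to the weighted community centroid.

    traits:     list of per-species trait vectors (equal length, numeric)
    abundances: list of non-negative abundances, one per species
    Assumption: Euclidean distance on already-standardised traits.
    """
    total = sum(abundances)
    n_traits = len(traits[0])
    # Abundance-weighted centroid of the community in trait space
    centroid = [
        sum(a * species[k] for a, species in zip(abundances, traits)) / total
        for k in range(n_traits)
    ]
    # Distance of each species to that centroid
    distances = [math.dist(species, centroid) for species in traits]
    # Abundance-weighted mean of the distances
    return sum(a * d for a, d in zip(abundances, distances)) / total

# Hypothetical two-species community described by two traits
community = [[0.0, 0.0], [2.0, 0.0]]
fdis_equal = functional_dispersion(community, [1, 1])
fdis_skewed = functional_dispersion(community, [3, 1])
```

Shifting abundance towards one species moves the centroid towards it and lowers the dispersion, even though the set of species, and hence the occupied trait space, is unchanged.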

2.2 Application of metrics to human-induced pressures

Although it is generally understood that functional diversity is linked to ecosystem functioning and services, it is still unclear whether a link to human-induced pressures can be derived. In particular, the following question arises: which functional diversity metrics provide greater explanatory power for the impacts of these drivers? To answer this question, we focus on studies of the functional diversity of plant communities, although in principle functional diversity metrics are applicable to all taxa. Given that each taxon has different functional traits, however, functional diversity has to be calculated for each taxon separately (in analogy to species richness of plants and insects, which cannot be combined in one metric), and it is likely that some taxa will be more sensitive than others with respect to some impact categories. So far, most studies have focused on plants, particularly because of the large amount of data readily available (such as the TRY global database of plant traits, with millions of records on plant traits for many taxonomic groups and on a global scale, Kattge et al. 2011; see also Díaz et al. 2016).

A study conducted by the Royal Botanic Gardens (Kew, UK) states that the impact of humanity far outweighs natural threats to plant species, accounting for approximately 81.3% of threats. Typical human-induced pressures such as residential or commercial development, commercial agriculture, wood plantations etc. can be briefly categorised to land use/land use change or land occupation/transformation. Other human-induced pressures on biodiversity include eutrophication caused by N and P emissions, ecotoxicological effects due to emissions of toxic substances, climate change caused by greenhouse gas emissions and water use/abstraction, which like land use change can be easily placed in the framework of Life cycle impact assessment (compare De Schryver et al. 2009 for climate change and Pfister et al. 2009 and Verones et al. 2017, see also Koellner et al. 2013). In Table 1, we review the recent studies which utilise functional diversity metrics to evaluate the threats of various human-induced pressures that relate well to impact categories in LCA.

Table 1 Functional diversity metrics (see list Appendix Table 1) which are frequently applied in scientific literature in light of studies on plant ecosystem functioning and services affected by pressures (as those used at the impact level in LCA). This compilation applies to plant based taxonomic groups

Most studies in Table 1 refer to land use change and eutrophication. This is not surprising since land use change is the main threat amongst pressures (see De Souza et al. 2015). According to the Millennium Ecosystem Assessment (MEA 2005), land use change has had the highest impact of all pressures on biodiversity. Even so, some studies indicate that climate change may be the biggest pressure, and climate effects are currently significant and forecasted to be an emerging major threat (Scheffers et al. 2016). Some studies have suggested that over the next few decades, climate change could surpass land use change as the greatest global threat to plant life (Leadley et al. 2010; Bellard et al. 2012). However, no quantitative assessment exists to support this claim at global scales and only few studies (as shown in Table 1) refer to CO2 elevation, temperature rise or water use. No studies were found that related functional diversity metrics to ecotoxicity or acidification. The frequent use of particular functional diversity metrics does however not directly imply that they are most suitable. Most studies do not provide a convincing justification and do not quantify the strength of the relationship between functional diversity metrics and human-induced pressures.

2.3 Strength of metric link to human-induced pressures

Several studies determined the links between functional diversity metrics and ecosystem function or allowed the link to be determined. From the list of metrics in Appendix Table A1 (Electronic Supplementary Material), only the richness metrics FRV and FRD, the evenness metric FEm and the divergence metrics FDvar, FDQ and FDm were found to have a quantifiable link. Figure 2 presents a circular chart where the area of each circle depicts the strength of the link between metric and pressure, expressed as the average (squared) ordinary least squares (OLS) regression coefficient \( R_{\mathrm{i}}^2 \). Note that no study was found which quantified the link between functional diversity and water use using \( R_{\mathrm{i}}^2 \); hence, water use is missing from Fig. 2. Some studies, i.e. Mason et al. (2010), Pakeman (2011) and Dubuis et al. (2013), directly provided the effect size. Other studies presented the relationship between the chosen functional diversity metric and human impacts using descriptive statistics, which is not useful for a comparative study. In the case of quantitative measures, there is no unified approach, in the sense that the link between functional diversity metric and pressures is either described qualitatively or tested using a variety of statistical measures. Consequently, any indication of the mean strength of the relationship is uncertain. For comparison purposes, our study compiles regression coefficients \( R_{\mathrm{i}}^2 \) extracted from the literature. The average value \( \left\langle R_{\mathrm{i}}^2\right\rangle =\frac{1}{N}\sum \limits_{\mathrm{i}=1}^{N} R_{\mathrm{i}}^2 \) is then calculated for each pressure (across the N studies which report \( R_{\mathrm{i}}^2 \) values) as a representative measure of the strength of the link between functional diversity metric and pressure. The overall mean strength γ is then found by averaging \( \left\langle R_{\mathrm{i}}^2\right\rangle \) across all pressures (see Appendix Table A2, Electronic Supplementary Material).
For the mean values γ, we did not expect strong relationships for all metrics listed in Appendix Table A2 (Electronic Supplementary Material), as the mechanisms affecting functional diversity may differ between pressures. In that case, we would expect pressures associated with changes in niche space to relate to functional richness metrics, while pressures affecting competition would relate more strongly to functional evenness metrics. Also note that the absence of studies for a given metric does not necessarily imply that the metric is useless.
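The two-step averaging described above (first across studies within each pressure, then across pressures) can be sketched as follows. This is an illustrative sketch only: the function name, the data layout and the R² values are hypothetical and are not taken from Appendix Table A2. Returning γ = 0 when no values are available is an assumption that mirrors treating an unknown link as no link.

```python
from statistics import mean

def link_strength(r2_by_pressure):
    """Two-step average of reported R^2 values for one metric.

    r2_by_pressure: dict mapping a pressure name to the list of R^2 values
                    reported for that pressure (hypothetical layout).
    Returns (per-pressure means, overall mean gamma across pressures).
    """
    # Step 1: <R_i^2> per pressure, averaged over the studies reporting it
    per_pressure = {
        pressure: mean(values)
        for pressure, values in r2_by_pressure.items()
        if values  # pressures without any reported value are skipped
    }
    # Step 2: gamma is the mean of the per-pressure means
    gamma = mean(list(per_pressure.values())) if per_pressure else 0.0
    return per_pressure, gamma

# Hypothetical R^2 values for a single metric
example = {
    "land use change": [0.25, 0.75],
    "eutrophication": [0.25],
}
per_pressure, gamma = link_strength(example)
```

Because γ averages the per-pressure means rather than pooling all recordings, a pressure covered by many studies does not outweigh one covered by a single study.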

Most studies represented in Fig. 2 incorporated multi-dimensional metrics, with the exception of FDvar, whose usage is found in Mason et al. (2003) and Conti and Diaz (2013). With reference to Appendix Table A2 (Electronic Supplementary Material), the most commonly used metrics across studies are FRV, FEm, FDvar and FDQ. We can group the metric links with human-induced pressures according to strength by introducing four classes based on the mean values γ: (i) strong link (from either multiple studies or a single study) if 0.5 < γ ≤ 1, (ii) moderate link if 0.25 < γ ≤ 0.5, (iii) weak link if 0 < γ ≤ 0.25, and (iv) no or unknown link if γ = 0. Here, γ = 1 would represent a link to ecosystem functioning of maximum strength (an unrealistic case in the real world). Hypothetically, metrics classed as having an unknown link could in reality have a link, but since this is unknown, we treat these cases the same as those with no link, thus γ = 0. In the case of FDvar, a strong relationship (γ = 0.533) is found when averaging over multiple recordings (N = 13 in total) and two pressures (i.e. eutrophication and CO2 elevation), see Mason et al. (2003) and Conti and Diaz (2013). Stronger relationships are only found for combinations where a single recording is available (N = 1), i.e. for FRD, γ = 0.81 (Brown and Milner 2012), and FDm, γ = 0.6 (Conti and Diaz 2013). The strength of these links is therefore not precisely known. FRV and FDQ demonstrated a link of moderate strength (γ = 0.318 and 0.311, respectively) across studies which incorporated different pressures, except for water use. FEm is consistently amongst the lowest performers, as indicated by its mean value (γ = 0.193) in Fig. 2 and Appendix Table A2 (Electronic Supplementary Material), whilst more studies are needed to make an educated analysis of the other metrics.
Based on this preliminary review, and with reference to our first objective, we may thus identify five metrics which demonstrate moderate to strong links to ecosystem functioning

$$ F{R}_{\mathrm{V}},F{R}_{\mathrm{D}},F{D}_{\mathrm{var}},F{D}_{\mathrm{Q}}\ \mathrm{and}\ F{D}_{\mathrm{m}} $$

which are applied in current scientific literature (however, note FRD and FDm are not frequently used).
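The classification by link strength can be reproduced with a few lines of code. The sketch below applies the class boundaries stated above to the mean values γ quoted in the text; the function name and the dictionary layout are hypothetical, and the metric subscripts are flattened into plain strings.

```python
def classify_link(gamma):
    """Assign a link-strength class from the overall mean gamma."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is a mean of R^2 values and must lie in [0, 1]")
    if gamma > 0.5:
        return "strong"
    if gamma > 0.25:
        return "moderate"
    if gamma > 0.0:
        return "weak"
    return "none or unknown"

# Mean values gamma as quoted in the text
reported_gamma = {
    "FRV": 0.318,
    "FRD": 0.81,    # single recording (N = 1)
    "FEm": 0.193,
    "FDvar": 0.533,
    "FDQ": 0.311,
    "FDm": 0.6,     # single recording (N = 1)
}

classes = {metric: classify_link(g) for metric, g in reported_gamma.items()}
moderate_or_strong = sorted(
    metric for metric, label in classes.items() if label in ("moderate", "strong")
)
```

With these inputs, FEm is the only metric that falls in the weak class, which leaves the five metrics listed above. The classes for FRD and FDm rest on one recording each and should be read with that in mind.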

3 Structural properties of metrics

In addition to a link to human-induced pressures, selected functional diversity metrics should have relevant properties which are both important and desirable. There has been some discussion in the literature on the important and desirable structural properties of metrics. Several authors have identified these properties through experimental design, testing and simulations of artificial data sets (Solow and Polasky 1994; Mason et al. 2003; Ricotta 2005; Mouillot et al. 2005; Villéger et al. 2008; Schleuter et al. 2010). However, such studies are few. There is also some controversy over the statistical validity of these metrics (Petchey and Gaston 2007; Podani and Schmera 2007); an illustration of this can be found in Petchey and Gaston (2002). Some studies have incorporated theoretical tests to assess the metric quality or accuracy. For example, Schleuter et al. (2010) tested five distinct artificial scenarios of exemplary datasets for information on metric behaviour (see Appendix Table A3, Electronic Supplementary Material), with the key objective of testing whether the metrics behave according to design. Mouillot et al. (2005) provided a theoretical study for FE metrics, testing whether predefined properties are satisfied. Mason et al. (2003) developed a criterion with ten entities for ecological use (rather than a mathematical treatment) to assess functional diversity. From the literature, we have highlighted the following key properties as the most relevant: (B) set monotonicity, (C) trait scale invariance (also known as monotonicity in distance), (D) twinning criterion, (E) response to empty space and (F) symmetry. In addition, some other pertinent questions must be answered, regarding (G) correlation (i.e. independence or orthogonality), (A) dimensionality, and whether the metric conforms to design (see also Appendix A3, Electronic Supplementary Material).
A description and reasoning of these relevant properties can be found in Table 2, alongside literature references.

Table 2 Detailed description of the specific properties which are desirable: dimensionality, set monotonicity, trait scale invariance (or monotonicity in distance), twinning criterion, response to empty space, symmetry and correlation

Table 3 shows whether the metrics conform to the properties described in Table 2. Each of these structural properties is considered with equal weighting, i.e. as identical in terms of importance.

Table 3 Assessment of functional diversity metrics (obtained from literature) with respect to important properties, including: dimensionality, set monotonicity, trait scale invariance (or monotonicity in distance), twinning criterion, response to empty space, symmetry and correlation. Here, 1D/MD corresponds to whether the metric is one- or multi-dimensional, respectively. Also, ‘yes’ denotes that the metric conforms to each corresponding property described in Table 2, with ‘no’ signifying opposite meaning. For correlation, ‘yes’ denotes whether there is significant evidence that the metrics are correlated with other metrics, where ‘no’ signifies independence. Those metrics which are correlated with species richness (SR) have been highlighted. For FR metrics, this is naturally the case due to construction. Note that the assessment is not applicable for metric (3.1) Fnc. Unalikeability (used for categorical traits) shown with *. We have included this metric for completeness. The blank spaces represent that either information on whether the metric conformed to the property could not be found in the literature or is not applicable. A detailed version with comments on metric quality, behaviour, constraints and limitations can be found in Appendix Table A3

In our evaluation of Table 3, we discuss the properties associated with metrics of richness, evenness and divergence separately. These three components of functional diversity have been suggested to be (partly) independent of each other. Such independence is important if more than one metric is applied in a particular study, to ensure that orthogonal information is obtained. Hence, the actual independence from other metrics is also accounted for in our evaluation. As an example of a redundant choice, it would not make sense to select the divergence metric FDvar alongside FRR or FRV in the same metric set, as these metrics are correlated.

We find that all FR metrics satisfy set monotonicity, as normally expected of richness metrics. Of the one-dimensional FR metrics, FRIs satisfies more (known) requirements than FRR. Although both metrics satisfy set monotonicity and trait scale invariance, FRIs has the advantage that it responds well to empty space and is uncorrelated with any other metrics. Trait scale invariance is an important property, required in order to avoid transformation or standardisation of data (Schleuter et al. 2010). Amongst the FE metrics, it is unclear which is more suitable, since both FEs and FEm satisfy a mixture of properties. FEm satisfies neither set monotonicity nor the twinning criterion, and these properties are unknown for FEs. Neither metric is correlated with other metrics. Further testing is required for FEs, and simply dismissing this metric on the basis that it is one-dimensional does not suffice. Of the one-dimensional FD metrics, we find that FDvar is one of the strongest candidates because it satisfies the largest number of properties, but it is correlated with FRR and FRV, which is undesired. FDσ and FDs satisfy a mixture of properties: FDσ satisfies set monotonicity but not trait scale invariance, whereas for FDs the opposite is recorded, i.e. the metric satisfies trait scale invariance but not set monotonicity. The latter has the further disadvantage that it is correlated with FRV, and both of these metrics have one or more undesired properties.

Of the multi-dimensional FR metrics, FRV does not satisfy trait scale invariance and is also heavily correlated with other metrics; therefore, it should not be used alongside FDvar, FDs or FDQ. FRD satisfies neither the twinning criterion nor trait scale invariance and does not respond well to empty gaps in trait space, whereas FRIm does respond well. FRV and FRD have their limitations and neither satisfies all properties. Trait scale invariance eases the calculation of characterisation factors, which are needed to link pressures to the metric and are thus used to assess impacts in LCA (e.g. FRD is not trait scale invariant, and therefore De Souza et al. 2013 standardised metric values to calculate characterisation factors for land use impacts). It is unclear whether FRIm performs better; a study is required to test this metric for trait scale invariance and the twinning criterion. Each of the multivariate FR metrics thus has some undesired properties. All multi-dimensional FD metrics satisfy the twinning criterion. A drawback of FDis is that it does not satisfy trait scale invariance, whereas FDm does; for FDQ this is unclear. The n-dimensional hypervolume metric introduced by Blonder et al. (2014) resolves this issue, behaves well with respect to empty space or missing trait data, and seems very promising. However, this is a relatively new metric which has yet to undergo the assessment stipulated by Table 3, and it is still unclear whether it satisfies other properties. In addition to the n-dimensional hypervolume, other recent developments include the range box (Qiao et al. 2017), the minimum ellipse (Swanson et al. 2015), the dynamic range box (Junker et al. 2016) and the probabilistic hypervolume (Carmona et al. 2016). However, these metrics have also yet to undergo stringent tests to reveal whether they conform to important properties or have a link to ecosystem functioning. Therefore, we excluded these metrics from our assessment in Table 3.
To summarise the above for our recommendations:

  I. Richness: FRIs is an uncorrelated metric which satisfies more properties than FRR and is therefore better suited. FRV can be incorporated into the list provided it is not selected simultaneously with the divergence metrics with which it is heavily correlated, namely FDvar, FDs or FDQ. FRD and FRIm are also suitable candidates.

  II. Evenness: It is somewhat unclear which evenness metric is more suitable based on an assessment of properties; by default, we include both FEs and FEm.

  III. Divergence: Amongst the one-dimensional metrics, FDvar satisfies the most properties and is therefore preferred over FDσ and FDs (i.e. a metric is preferred only if it satisfies clearly more properties than the alternatives). The multi-dimensional metrics FDQ, FDm and FDis are less-preferred potential candidates, since they satisfy a mixture of properties.

Based on this evaluation, the one-dimensional metrics FRIs, FEs and FDvar and the multi-dimensional metrics FRV, FRD, FRIm, FEm, FDQ, FDm and FDis can be categorised as important and desirable on the basis of their structural properties.

4 Discussion

In our analysis of the most suitable metrics, we have identified those metrics which have a link to human-induced pressures (“Section 2.3”) and satisfy desirable properties (“Section 3”). While many metrics of functional diversity have been published, a general consensus is still lacking as to exactly what the metrics quantify, how redundant they are and which ones are most suitable for application (Mouchet et al. 2010). Summarising a large data set into a single diversity figure results in a loss of information; therefore, a perfect measure of functional diversity does not exist. In fact, it is neither possible nor desirable to sum up all the aspects of functional diversity in a single number (Ludwig and Reynolds 1988; Ricotta 2005). Mason et al. (2005) proposed a framework in which functional diversity is best described by a set of three independent and complementary components rather than a single metric, namely functional richness, functional evenness and functional divergence (FR, FE, FD). The motivation behind this framework is that each component describes a different aspect of functional diversity (see Fig. 1). This view is also supported by Mouillot et al. (2005), Villéger et al. (2008), Schleuter et al. (2010) and Mouchet et al. (2010), and a similar view was held by Ludwig and Reynolds (1988) and Purvis and Hector (2000) before Mason et al. (2005) formalised a definition. Pakeman (2011) also highlights that there is a theoretical basis for measuring functional diversity in this way. The use of these three components allows the differential impacts on multiple aspects of functional diversity to be estimated, and aids ecologists in examining the mechanisms behind ecosystem functioning (Mason et al. 2005). In the context of LCA, the decision maker will have a set of three metrics at their disposal, each describing one component of functional diversity for a single impact category.
Here, we followed this distinction into three categories to provide a comprehensive framework for the quantification of functional diversity in trait space and set out to choose a metric set of independent components.

Fig. 1 Illustration of the concepts of functional diversity, showing the difference between low and high (a) functional richness FR, (b) functional evenness FE and (c) functional divergence FD (reprinted from Carmona et al. 2016)

To enable this choice, Table 4 summarises the findings on the ten metrics FRIs, FEs, FDvar, FRV, FRD, FRIm, FEm, FDQ, FDm and FDis that have been related to human-induced pressures. Within Table 4, we ranked metrics according to mean strength γ and the number of studies that evaluated the strength. On analysing Table 4, we find that the three metrics that have been frequently used and that have moderate to strong links to human impacts, namely FDvar, FRV and FDQ, are heavily correlated with other metrics. However, this can be overcome provided that the metric set chosen is completely orthogonal. Following this reasoning, we propose that (FRV, FEm, FDm) is an ideal set, a choice also supported by Villéger et al. (2008) and Mouchet et al. (2010). FRD is the alternative candidate for richness (used by De Souza et al. 2013 in an LCA study); although it fails to satisfy the twinning criterion and trait scale invariance, it has an apparent link to human-induced pressures. By comparison, FRIs and FRIm have an unknown behaviour in this regard. By process of elimination, FRV and FRD appear to be the only suitable richness metrics. With respect to evenness metrics, there is an urgent need to develop these further; to date only two such metrics exist. FEm is the only evenness metric which has a known link to human-induced pressures, whereas the link for FEs is unknown. Therefore, we include FEm in our recommendations despite the link being weak and even though Mouillot et al. (2005) argued for the usage of FEs. For divergence, there are multiple candidates. FDm does have a strong link with ecosystem functioning; however, this result was obtained from a single study, so it is somewhat unclear whether the link is reliable. Mason et al. (2003) have strongly argued for the usage of FDvar, which has behaved well with respect to the important properties listed in Table 3 and has a strong link (see Fig. 2).
This metric has outperformed others and has consistently been shown to be the most desirable. However, it must be used with caution and not selected alongside FRV, so that orthogonal information is obtained across all components. Amongst the one-dimensional metrics, FDvar is the only metric we recommend for usage. FDis is dismissed on the basis that it is not trait scale invariant and, worse, has an unknown link. Therefore, FDvar, FDQ or FDm seem preferred. Taking into account all the information in this study and the relative metric inter-dependence, we arrive at the following recommendations: either

$$ \left(F{R}_{\mathrm{V}},F{E}_{\mathrm{m}},F{D}_{\mathrm{m}}\right),\left(F{R}_{\mathrm{D}},F{E}_{\mathrm{m}},F{D}_{\mathrm{var}}\right),\left(F{R}_{\mathrm{D}},F{E}_{\mathrm{m}},F{D}_{\mathrm{Q}}\right)\kern0.50em \mathrm{or}\ \left(F{R}_{\mathrm{D}},F{E}_{\mathrm{m}},F{D}_{\mathrm{m}}\right). $$
Table 4 Categorising metrics according to the three facets (FR, FE, FD) alongside the strength of the link with ecosystem functioning. Metric link with ecosystem functioning is grouped according to strength by using the mean values γ, i.e. strong link (either from multiple or single study) if 0.5 < γ ≤ 1, moderate link if 0.25 < γ ≤ 0.5, weak link if 0 < γ ≤ 0.25 and either no or unknown link γ = 0. The boxed metrics are the preliminary preferred candidates highlighted in the reduced list (see “Section 3”)
Fig. 2 Strength of the link between functional diversity metrics and pressures. The area of the circles represents the numerical values 〈R_i^2〉, which are the average squared linear regression correlation coefficients taken over N recordings. Here, N is the number of studies which report R_i^2 values. See Appendix Table A2 for a compilation of R_i^2 values with corresponding literature references. The mean strength γ is found by averaging 〈R_i^2〉 across all pressures (i.e. γ = sum of 〈R_i^2〉 values/no. of pressures for which a link was found), and is indicative of the overall strength of metric link to ecosystem functioning. The introduced pressures are in line with the DPSIR framework

These four distinct permutations are all orthogonal by selection. Notice that FRD provides multiple options since the metric is totally independent of all other metrics.
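The selection of orthogonal sets can be sketched as a simple filter over all (FR, FE, FD) combinations of the retained candidates. The candidate lists and the correlated pairs below are taken from the discussion above (FRV is heavily correlated with FDvar and FDQ); the names `CANDIDATES`, `CORRELATED` and `orthogonal_sets` are hypothetical and serve only this illustration.

```python
from itertools import product

# Candidate metrics per component, as retained in the discussion above.
CANDIDATES = {
    "FR": ["FRV", "FRD"],
    "FE": ["FEm"],
    "FD": ["FDvar", "FDQ", "FDm"],
}

# Unordered pairs of candidates reported as heavily correlated.
CORRELATED = {
    frozenset({"FRV", "FDvar"}),
    frozenset({"FRV", "FDQ"}),
}

def orthogonal_sets(candidates, correlated):
    """Return every (FR, FE, FD) triple that contains no correlated pair."""
    selected = []
    for triple in product(candidates["FR"], candidates["FE"], candidates["FD"]):
        pairs = [
            frozenset({a, b})
            for i, a in enumerate(triple)
            for b in triple[i + 1:]
        ]
        if not any(pair in correlated for pair in pairs):
            selected.append(triple)
    return selected

for metric_set in orthogonal_sets(CANDIDATES, CORRELATED):
    print(metric_set)
```

Of the six possible combinations, the two that pair FRV with FDvar or FDQ are discarded, which leaves the four sets recommended above.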

To come to a final recommendation on functional diversity metrics to be selected for LCA studies, we devised an ad hoc scoring system (i.e. we assigned weights) based on one score for each metric's link to ecosystem functioning (E.F.S.) and one for its structural properties (S.P.S.). The product of these scores, denoted λ, summarises the usefulness of the metric for LCA studies (Table 5).

Table 5 Scoring metrics according to strength of link to ecosystem functioning and number of structural properties satisfied. Weights are assigned according to (i) strong link to ecosystem functioning from multiple recordings (E.F.S = 1), (ii) strong link from only one recording (E.F.S = 0.75), (iii) moderate link (E.F.S = 0.5) and (iv) weak link (E.F.S = 0.25). The structural property score (S.P.S) is calculated as number of satisfied properties/total no. of properties (note: a total of five properties (B)–(F) is considered, as listed in Table 2). We do not distinguish between 1D and multivariate metrics when computing S.P.S scores, as dimensionality is not linked to preference
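The scoring rules in the captions of Table 4 and Table 5 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, a metric with no or unknown link is assumed to receive E.F.S = 0 (the caption does not assign it a weight), the score of a metric set is assumed to be the arithmetic mean of its λ values, and all inputs in the usage example are placeholders rather than values from Table 5.

```python
def mean_strength(avg_r2_by_pressure):
    """gamma: mean of <R_i^2> over the pressures for which a link was found."""
    if not avg_r2_by_pressure:
        return 0.0  # no or unknown link
    return sum(avg_r2_by_pressure.values()) / len(avg_r2_by_pressure)

def efs(gamma, n_recordings):
    """Ecosystem functioning score, using the classes of Table 4 and weights of Table 5."""
    if gamma > 0.5:                      # strong link
        return 1.0 if n_recordings > 1 else 0.75
    if gamma > 0.25:                     # moderate link
        return 0.5
    if gamma > 0:                        # weak link
        return 0.25
    return 0.0                           # assumption: no or unknown link scores zero

def sps(n_satisfied, n_total=5):
    """Structural property score: satisfied properties / total properties."""
    return n_satisfied / n_total

def lam(gamma, n_recordings, n_satisfied):
    """lambda: product of the two scores for one metric."""
    return efs(gamma, n_recordings) * sps(n_satisfied)

def mean_lambda(lambdas):
    """Assumed set score: arithmetic mean of lambda over the metrics in a set."""
    return sum(lambdas) / len(lambdas)

# Placeholder inputs for one hypothetical metric (not values from Table 5).
gamma = mean_strength({"pressure A": 0.4, "pressure B": 0.8})
print(round(gamma, 3), efs(gamma, n_recordings=3), lam(gamma, 3, n_satisfied=4))
```

Because λ is a product, a metric with a strong link but few satisfied properties can score lower than a metric with a moderate link and many satisfied properties.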

When combining λ for the four metric permutations in the previous compilation of recommendations, we obtain:

$$ \begin{array}{l}\left(F{R}_{\mathrm{D}},F{E}_{\mathrm{m}},F{D}_{\mathrm{var}}\right),\kern0.50em \left\langle \lambda \right\rangle =0.333,\\ {}\left(F{R}_{\mathrm{V}},F{E}_{\mathrm{m}},F{D}_{\mathrm{m}}\right),\kern0.50em \left\langle \lambda \right\rangle =0.183,\\ {}\left(F{R}_{\mathrm{D}},F{E}_{\mathrm{m}},F{D}_{\mathrm{m}}\right),\kern0.50em \left\langle \lambda \right\rangle =0.167,\\ {}\left(F{R}_{\mathrm{D}},F{E}_{\mathrm{m}},F{D}_{\mathrm{Q}}\right),\kern0.50em \left\langle \lambda \right\rangle =0.1.\end{array} $$

Hence, in conclusion, we propose that the most effective permutation is (FRD, FEm, FDvar). If a metric set of only multi-dimensional components is sought, then (FRV, FEm, FDm) is the best option (also supported by Villéger et al. 2008 and Mouchet et al. 2010). As a metric for functional evenness, FEm is included in all sets, although the relationship between functional evenness and human-induced pressures seems rather weak. Each selected set contains multiple metrics.

In the context of LCA application, the chosen set of metrics should not be aggregated because FR, FE and FD relate to different effects on ecosystem functioning. Hence, each metric indicates a different aspect of biodiversity affected and can be used to obtain an understanding of potential implications. For example, functional richness relates to the resistance of the ecosystem to new pressures, and a change therein therefore indicates that some functions may no longer be fulfilled (Mason et al. 2005). Likewise, impacts on functional evenness would suggest higher susceptibility to other competitive or invasive species (Hejda and De Bello 2013). Similarly, impacts on functional divergence would relate to community stability, e.g. communities with higher functional divergence are more prone to changes in species composition (De la Riva et al. 2017). In such context-specific scenarios, if the focus is on ecosystem resistance (relating to richness) or community stability (relating to divergence), then it may be argued that evenness is not required for LCA and can be omitted altogether. The implication is that the distribution of species abundance in occupied niche space is then not considered important. We justify inclusion of the evenness metric FEm on the basis that a link exists, albeit a weak one, and that the metric is therefore possibly meaningful. By following a consistent line of reasoning, we remove only those metrics which have no or uncertain links. Also note that, whether evenness is included or not, the order of the proposed metric sets does not change. To summarise, the LCA practitioner should understand that the metric set will provide independent and complementary information on richness, evenness and divergence that should be interpreted separately within a single impact category of LCA.

While we took care, in designing our study, to perform our analysis in a structured and feasible way, the study has the following limitations:

  I. To better evaluate how functional diversity metrics behave in practice, there needs to be an increased effort in studies pertaining to human-induced pressures which specifically quantify the strength of metric link in a coherent way (i.e. by incorporating a common statistical measure), thus allowing for comparisons across multiple studies. These links should be investigated in a general sense, and not be confined to those associated with LCAs. As an initial starting point, focus should be on those metrics which have shown a strong link from one recording or even an unknown link (see Table 4). Also, there has been little testing of functional diversity metrics against field data (Pakeman 2011; Dubuis et al. 2013), and therefore, there is a lack of quantitative assessment of the link between functional diversity metrics and human-induced pressures. More specifically, further investigation is required to check the strength of metric link for FRIm, FRIs and FDis, as well as for several other metrics such as FRD and FDm. Both of the latter metrics have been shown to form a strong link, but we found only one recording to support this claim. Provided that complete information is obtained, more accurate ecosystem functioning scores (E.F.S) can be assigned, resulting in a possible change in recommendations on metric selection.

  II. There is a need to test the link of metrics which provide information on evenness; currently there are only two evenness metrics, namely FEs and FEm, with FEs having either no or an unknown link. The lack of information poses a limitation on the use of evenness in LCA, and stronger evidence is required to reveal the importance of evenness in identifying human impacts. One may argue that evenness is primarily linked to competitive exclusion processes and is consequently less related to human-induced pressures. While its importance as a component of functional diversity is clear, it would then be less relevant in an LCA type of analysis. Nevertheless, we find that evenness does have some link to human-induced pressures, as demonstrated by, for example, Pakeman (2011), Mason et al. (2012) and Dubuis et al. (2013). Therefore, we include the multi-dimensional counterpart in our recommendation, despite a weak link being found (see Fig. 2).

  III. It is unclear whether some metrics conform to the structural properties listed in Table 3. The blank spaces in Table 3 indicate either that information on whether the metric conforms to the property could not be found in the literature or that the property is not applicable. Further studies are required to check whether the metrics satisfy the corresponding properties. Also, it would be interesting to see a study which ranks those properties in order of importance. This would allow weights to be assigned accordingly, as opposed to treating each property equally (as in this study). In terms of importance, we consider independence an essential feature. Each component should be able to provide different and orthogonal information with respect to richness, evenness and divergence. This is precisely the functional diversity framework proposed by Mason et al. (2005) and others (Mouillot et al. 2005; Villéger et al. 2008; Schleuter et al. 2010; Mouchet et al. 2010).

  IV. We attempted to relate the individual metrics to specific human-induced pressures (Table 1). However, the list is most likely not exhaustive. Also, note that infrequent use does not necessarily imply redundancy.

If research effort and attention are redirected to (I.–IV.), this would in turn help reveal the relationships and mechanisms at play between ecosystem processes and functional diversity, and thereby improve characterisation factors for incorporation in LCA.

We hope that our suggestions for improved LCA of biodiversity based on metrics for functional richness, functional evenness and functional divergence will guide the LCA model developer. The next step is to make the concept operational. In the short term, this would consist of two steps: (1) to turn the concept into operational (global) characterisation factors that translate environmental pressures (e.g. N and P emissions, water extraction, other emissions, land use occupation) into our proposed metrics for biodiversity loss (the compilation in Table 1 could serve as the basis for such an assessment), and (2) to identify the basic data and models needed to calculate the proposed metrics. Recently, global maps of vegetation traits were produced (e.g. van Bodegom et al. 2014; Butler et al. 2017), from which in principle each of the three metrics can be derived for use in background systems, making our proposed metrics operational without much investment. The long-term strategy would be to gather data and hence calculate characterisation factors and the associated (change in) metrics more precisely for use and understanding in foreground systems. Such an approach has already been applied occasionally. For example, the model proposed for LCA by De Souza et al. (2013), using the FRD metric, is based on data compiled by Flynn et al. (2009) and Gibson et al. (2011) across land use intensification.

5 Conclusions

Our analysis of functional diversity reinforces the need for using three independent and complementary components: richness, evenness and divergence. The sets of functional diversity metrics that best reconcile strength of link to human-induced pressures with desirable structural properties, including independence, are (FRV, FEm, FDm) or a combination of FRD and FEm with either FDvar, FDm or FDQ. All four permutations are potential candidates for application and can be utilised to comprehensively determine human impacts on biodiversity in an LCA model. These recommendations are not set in stone; once more information becomes available, refined performance indices can be computed for each metric to allow for better informed choices. We hope that this study will constitute a useful point of reference and a means of reasoning for metric selection of functional diversity, particularly for Life Cycle Assessment model developers in future studies.