Identifying typical and early warning species by the combination of functional-based diagnostic species and dark diversity

The European Union Habitats Directive requires the identification of typical species that reflect the structure and functions of habitat types, as well as early changes in the habitat condition, but no common methods are available for their selection. Diagnostic species with high fidelity to a specific group of plots are identified by traditional methods, but their value as typical species is still debated. We designed a protocol for the identification of typical plant species based on a recently proposed method to detect diagnostic species by combining abundances and functional traits. We tested the method on a set of alpine habitat subtypes, comparing diagnostic species based on traits or Grime’s CSR strategies (competitive, stress-tolerant, ruderal) with those based on presence/absence or abundance only, and then we calculated for each species the dark diversity probability—i.e. probability of being absent from a habitat type with suitable ecological conditions. Functional-based methods allowed to recognize larger sets of exclusive species, adding dominant species linked to the structure and functions of habitat subtypes (i.e. to the functional centroid). Dark diversity probability was equally distributed between diagnostic and non-diagnostic species identified by functional-based methods. Species with higher dark diversity probability among those associated with the functional centroid can be considered as early warning indicators of changes in habitat conditions. The protocol proposed here enables species ranking on measurable variables (functional association, dark diversity probability) and can be applied as a standardized tool for the identification of typical plant species for habitat types dominated by plants.


Introduction
In the European Union (EU), the Habitats Directive (HD, 92/43/EEC) is the main legal instrument for the conservation of natural and semi-natural habitat types of Community interest listed in Annex I. Article 17 of the HD requires member states (MS) to produce a report every 6 years that must include the main results of the surveillance concerning the conservation status of habitat types. The conservation status of a habitat type is defined as favourable if "(i) its natural range and areas it covers within that range are stable or increasing, (ii) the specific structure and functions which are necessary for its long-term maintenance exist and are likely to continue to exist for the foreseeable future, (iii) the conservation status of its typical species is favourable". The identification of typical species is thus one of the main conditions to determine the conservation status of habitat types. However, there is still a lack of consensus on the method for selecting typical species, which is currently based mostly on expert opinion (Bonari et al. 2021a;Delbosc et al. 2021;Ellwanger et al. 2018), except for a few attempts (e.g. Tsiripidis et al. 2018). An agreed protocol for their identification should thus be the basis for a more standardized and efficient operative tool for monitoring.
The HD reporting guidelines (DG Environment 2017) specify the main features of typical species: "(i) they should be species that occur regularly at a high constancy in a habitat type or at least in a major habitat subtype or variant of a habitat type; (ii) they should include species which are good indicators of favourable habitat quality, e.g. by indicating the presence of a wider group of species with specific habitat requirements; (iii) they should include species sensitive to changes in the condition of the habitat ('early warning indicator species'); (iv) species which can be easily monitored by non-destructive and/or inexpensive means should be favoured". Consequently, the list of typical species for most habitat types can potentially be very long, if the above criteria are applied individually. At the same time, given the geographical and ecological variability of habitat types, it is not realistic to have recommended lists of typical species at the EU scale (Angelini et al. 2018;Ellwanger et al. 2018). Indeed, even within one MS, different sets of typical species may be required in different biogeographical regions or parts of the habitat range or habitat subtypes (Angelini et al. 2016; DG Environment 2017).
Based on these considerations, a typical species could be identified as "(i) a species on which habitat's identification is founded, or which is (ii) inseparable from the habitat, (iii) consistently present but not restricted, (iv) characteristic of the habitat, (v) an integral part of the structure of the habitat, and (vi) a 'key stone' species which significantly influences the habitat's structure and functions" (DG Environment 2017). Despite the definition of habitat types is often based on phytosociological syntaxa (Evans 2010;Rodwell et al. 2018), the meaning of typical species is not identical to that of diagnostic species used in phytosociology, and their connection has long been controversial (Carignan and Villard 2002;Gigante et al. 2016). However, sets of diagnostic species could form the starting point for the identification of typical species, provided that all other features required by the reporting guidelines are verified (e.g. Nicod et al. 2019;Rodríguez-Rojo et al. 2020). In phytosociology, the term 'diagnostic species' defines a combination of 'character or differential species' (Chytrý and Tichý 2003) that indicate the ecology of respectively a single or multiple vegetation unit (sensu Braun-Blanquet 1932). Diagnostic species with high fidelity to a given vegetation type are commonly derived by using species occurrences or abundances in distinct groups of relevés (plots) of different vegetation units, and many approaches have been proposed for their identification (e.g. Chytrý et al. 2002;Dufrêne and Legendre 1997;De Cáceres et al. 2010Podani and Csányi 2010). However, these approaches do not consider their functional trait-based characteristics and thus do not ensure information about their role within habitats functioning.
Functional species groups with similar traits (specifically life form, morphology, or phenology) recently started to be included in expert systems for assigning vegetation-plot records to EUNIS habitat types or other vegetation types (Bonari et al. 2021b;Chytrý et al. 2020;Landucci et al. 2015Landucci et al. , 2020Marcenò et al. 2018;Peterka et al. 2017). This underlines the need for a multi-trait approach to extract groups of species suitable for the identification, monitoring, and conservation status assessment of habitat types. When considering potential typical species, it seems necessary to evaluate their functional contribution in different local environmental contexts, likely accounting for their functional traits (Maciejewski 2010; Maciejewski et al. 2016). It is now well known that functional traits influence ecosystem processes and functions (see Lavorel and Garnier 2002;Garnier et al. 2004), and that their combinations modulate plant size and economics (i.e. strategies of resource use) from local to global scale (Díaz et al. 2016). The same patterns are evident also at the community level, where two main axes of variation have been identified and broadly interpreted as structure (size) and functions (economics) (Bruelheide et al. 2018).
These functional patterns are reflected within Grime's CSR (Competitive, Stress-tolerant, Ruderal) plant adaptive strategies (Grime 2006;Pierce et al. 2017) since the CSR strategy theory is the only one that can simultaneously explain both size and economics as fundamental gradients of plant evolution (Grime and Pierce 2012). The CSR strategies represent viable trait combinations arising under conditions of respectively, competition, 1 3 abiotic limitation to growth, or high-intensity disturbance. There is also evidence that the CSR theory can be upscaled to the community level (e.g. Zanzottera et al. 2020) and it is coherently mirrored among phytosociological classes . In this context, the inclusion of the community weighted mean (CWM) of plant traits for identifying diagnostic species (Ricotta et al. 2015(Ricotta et al. , 2020 following the mass-ratio hypothesis (Grime 1998), can favor the selection of diagnostic species having strong effects on both structure and functions of plant communities, which can be an aid to the identification of typical species, as demanded by the HD (Fig. 1).
To ensure that typical species can also match early warning indicator species, i.e. species that are most sensitive to pressures and will act as early warnings for less-susceptible species (Caro 2010), the information deriving from dark diversity could be helpful. Dark diversity is defined as the set of the species pool (i.e. all the species in the region that could potentially inhabit those particular ecological conditions; Zobel 2016) that is not present in local communities (Pärtel et al. 2011). The concept of dark diversity has a wide range of applications from nature conservation to ecological restoration (Lewis et al. 2017) and helps to improve studies of biodiversity changes, identifying species that are likely to go extinct, or estimating future invasion risk (Ronk et al. 2017;Trindade et al. 2020). Dark diversity can be defined as a fuzzy set, where the degree of certainty about a species membership to a given vegetation unit is expressed as a probability based on species co-occurrence patterns (Carmona and Pärtel 2020b;De Bello et al. 2016;Lewis et al. 2016). Hence, the subset of typical species with the highest dark diversity probability for a given habitat type can take on the meaning of early warning indicator species (Fig. 1), being at the same time ecologically suitable and the most sensitive to changes in favourable conditions in that given habitat type at a regional scale. Consider for example Bistorta vivipara or Trollius europaeus that are typical species of the habitat type 6520 "Mountain hay meadows" (European Commission 2013); they don't tolerate dry or nitrophilous conditions and thus a Fig. 1 Conceptual framework of the new protocol for the identification of typical species as required by Habitats Directive reporting guidelines (DG Environment 2017), based on the simultaneous assessment of structure and functions, as estimated by means of functional-based diagnostic species, and early warning indicators, as identified by high probability of dark diversity higher dark diversity probability in the habitat type 6520 would be a warning that the habitat is getting drier or richer in nutrients (e.g. indicating excessive mowing or fertilization, respectively).
In this study, we developed a protocol for the identification of typical plant species as required by HD, consisting of the consecutive assessment of functional-based diagnostic species and dark diversity probability following the conceptual framework of Fig. 1, and we applied it to a set of alpine habitat subtypes (Zanzottera et al. 2020). Specifically, we tested whether: (1) functional-based methods (trait-or CSR-based) for identifying diagnostic species perform better than traditional methods (i.e. fundamental methods in vegetation science based on fidelity and abundance of species), especially in pointing out exclusive species (i.e. related to only one habitat subtype); (2) functional-based diagnostic species reflect patterns and ranges of the functional variation evident at the community level (Zanzottera et al. 2020); (3) the probability of dark diversity can be used to highlight a subset of early warning indicators within the set of functional-based diagnostic species.

Dataset
We used a dataset of alpine plant communities derived from Zanzottera et al. (2020), composed of 357 relevés (size range 1-100 m 2 ) located in the Southern Alps (Lombardy, Northern Italy) and representative of three alpine habitat types on silicate substrates, ranging from less structured pioneer plant communities to more structured and stable ones: 8110 'Siliceous screes of the montane to snow levels (Androsacetalia alpinae and Galeopsetalia ladani)', 6150 'Siliceous alpine and boreal grasslands', 4060 'Alpine and boreal heaths'. Relevés are available in the Lombardy regional database of plant communities referred to habitat types (http:// www. biodi versi ta. lomba rdia. it) and were assembled from formerly published studies (see Appendix 1 in Zanzottera et al. 2020).
Annex I habitat types are often heterogeneous from a phytosociological point of view (Rodwell et al. 2018) so the recognition of habitat subtypes, when they can be identified, is often necessary for the identification of typical species. We thus classified habitat types into subtypes following the regional classification reported in the Lombardy region monitoring protocol of habitat types (Brusa et al. 2017a). Specifically, we identified the following habitat subtypes: 8110-A1 (n = 89) and 8110-A2 (n = 42), respectively unstable screes with early colonization stages and stable screes with more developed stages of Androsacion alpinae; 6150-A1 (n = 83) and 6150-A2 (n = 47), siliceous alpine and boreal grasslands of Caricetalia curvulae, respectively with low (Carex curvula dominated) and intense grazing pressure (Festuca halleri dominated); 6150-B (n = 67), snow beds of Salicetalia herbaceae; 4060-A (n = 29), alpine and boreal heaths of Loiseleurio-Vaccinion. A full description of the Corine biotopes and syntaxa associated with each habitat subtype is reported in Zanzottera et al. (2020).
For each habitat subtype, we derived species' percent abundance values from the Braun-Blanquet cover-abundance scale, as the central value of each Braun-Blanquet scale interval, and then we selected only those species with a cumulative abundance (sum of abundances of each relevé) higher than 10%, which led to a dataset of 116 (out of 219) species: 29 (out of 64) for 4060-A, 56 (129) for 6150-A1, 57 (139) for 6150-A2, 31 (60) for 6150-B, 45 (116) for 8110-A1 and 32 (98) for 8110-A2. This constraint allowed to leave 1 3 out many sporadic and occasional species and above all to standardize the effect of different sampling areas on species richness across plots of the same habitat subtype (De Bello et al. 2016), which is essential to calculate the dark diversity probability based on species co-occurrences.
For each species, we collected six quantitative functional traits, which provide a good representation of the global spectrum of plant form and function both at the species-level (Díaz et al. 2016) and at the community level (Bruelheide et al. 2018): vegetative plant height (H; mm), leaf area (LA; mm 2 ) leaf dry matter content (LDMC; %), specific leaf area (SLA; mm 2 × mg −1 ), leaf nitrogen content (LNC; %), and leaf carbon to nitrogen ratio (C/N). Data are available from the Authors' datasets (FIFTH and LIFTH, see Cerabolini et al. 2010;Dalle Fratte et al. 2021) also accessible in TRY (Kattge et al. 2020, https:// www. trydb. org: see datasets n. 227, 228, 229, 371, 372 and the forthcoming 467). Furthermore, LA, LDMC, and SLA were used to calculate for each species the CSR scores of Grime's plant strategies using the "StrateFy" classification tool (Pierce et al. 2017). The nomenclature of each taxon (family, genus, and species) in the dataset was standardized according to Bartolucci et al. (2018).
Missing data of LNC and C/N (n = 19 species) were derived using hierarchical (taxonomy: family and genus) based gap-filling by the function "GapFilling" of the BHPMF R package (Fazayeli et al. 2017). We applied the gap-filling procedure using data from both FIFTH and LIFTH datasets to gain a robust prediction thanks to the larger total number of taxa (n = 1746).

Data analysis
We applied traditional methods based on presence/absence scores (P/A) or on abundance data (AB) to test whether the species were significantly associated with any of the habitat subtypes. In this case, a diagnostic species is defined as a species that is more common in each group of plots than expected by chance alone. P/A and AB were calculated using the function "multipatt" of indicspecies R package (De Cáceres and Legendre 2009), using the "IndVal.g" index, which is a composite index implemented by Dufrêne and Legendre (1997) corrected considering that some groups have more sites than others, to avoid potential bias due to unbalanced sampling (De Cáceres et al. 2010).
We then applied the functional-based method, using the R script available in Appendix 1 of Ricotta et al. (2020), and calculated both trait-based (ϕ_trait) and CSR-based (ϕ_CSR) indicators. The functional association (ϕ) of each species with habitat subtypes was estimated as the species fuzzy degree of compatibility with each group, obtained by combining species abundances and their functional similarity with the functional centroid of the habitat subtype calculated by the CWMs of plant traits (see Ricotta et al. 2020 for further details). To evaluate whether the indicator values (both traditional and functionalbased) of each species with habitat subtypes were significantly higher than expected by chance, we set a significant threshold of p < 0.05, using 999 permutations. We then calculated the standardized effect size of functional association (SES_ϕ) as (ϕ obs − ϕ pred )/σ pred , where ϕ obs and ϕ pred are respectively the observed and predicted functional associations, and σ pred is the standard deviation of the predicted functional association.
We analyzed the main variation of functional traits and CSR strategy scores of diagnostic species identified by functional-based indicators. We applied principal component analysis (PCA) followed by varimax rotation, on the subset of diagnostic species identified by ϕ_traits using the "principal" function in the R package psych (Revelle 2017), while we used the package ggtern (Hamilton and Ferry 2018) to visualize the diagnostic species identified by ϕ_CSR in the ternary CSR strategies plot. All traits were standardized before running the PCA. For each habitat subtype, we measured the similarity between the functional space of the diagnostic species identified by each traditional (P/A and AB) or functional-based method (ϕ_trait and ϕ_CSR). We plotted the kernel density distribution curves identified by the diagnostic species of each method and habitat subtype along the main axes of traits variation, i.e. the most significant components of the PCA and C-S-R-scores. We estimated the overlapping areas between pairs of kernel density distribution curves by the function "overlap" of the overlapping R package (Pastore and Calcagnì 2019). For each habitat subtype, we also tested the differences in the functional space of the diagnostic species determined by the four different methods along each axis of traits variation by means of the analysis of variance using the function "aov" of the stats R base package.
To estimate which species belong to dark diversity, for each habitat subtype we calculated the probability of dark diversity for all species occurring in each relevé, on the basis of its co-occurrence with other species, using the function "DarkDiv" of the DarkDiv R package (Carmona and Pärtel, 2020a). The probability of a species occurring in each plot was estimated by the hypergeometric distribution, which has been identified as the best available method to estimate the probabilistic dark diversity based on co-occurrences (Carmona and Pärtel 2020b). We then computed the average probability of dark diversity for each species across relevés of the same habitat subtype; a species was considered as belonging to the dark diversity when its average probability was ≥ 0.6, thus avoiding probabilities near 0.5, which are related either to a genuine lack of association or to a lack of information due to species having a low frequency (Carmona and Pärtel 2020b). Finally, to check whether dark diversity could be used after the identification of functional-based diagnostic species, we tested the assumption that the average probabilities of dark diversity were equally distributed between diagnostic and not diagnostic species. Statistical comparison of the probability of dark diversity between diagnostic and not diagnostic species, as well as between PCA axis and CSR scores were estimated by the Kruskal-Wallis test followed by Dunn multiple comparisons available in the R package dunn.test (Dinno 2017).
All the statistical analyses were run with R software (v. 4.0.2; R Core Team 2020).

Results
Starting from the 116 selected species (cumulative abundance higher than 10%), a significant association (p < 0.05) with at least one habitat subtype was found for 110 (P/A) or 112 (AB) species using traditional methods, and for 113 (ϕ_traits) or 112 species (ϕ_CSR) using functional-based methods (Online Resource 1). All the methods detected the highest number of diagnostic species for the habitat subtype 6150-A2, and the lower for 4060-A, 6150-B and 8110-A2 (Fig. 2). For each habitat subtype, traditional methods (P/A and AB) provided a larger number of diagnostic species compared to functional-based ones (ϕ_traits and ϕ_CSR). On the contrary, the proportion of exclusive species, i.e. with significant association (p < 0.05) for only one habitat subtype, was considerably lower for traditional methods (P/A = 69/110, AB = 75/112) compared to functional-based methods (ϕ_traits = 104/113, ϕ_CSR = 102/112) (Fig. 2). Exclusive species of each habitat subtype in terms of P/A were also exclusive considering AB, as well as ϕ_traits and ϕ_CSR. In each habitat subtype, functional-based methods increased the number of exclusive species by adding important species also for habitat structure (Online Resource 1). Of the 116 initial species, 7 were common only to two subtypes belonging to the same habitat (3 species for 6150-A1 and 6150-A2; 4 species for 8110-A1 and 8110-A2). Only 6 species were not exclusive, but just diagnostic, and 2 were not even diagnostic for any habitat subtype by any method (Online Resource 1). Considering all habitat subtypes, only 2 species were identified as diagnostic only by traditional (P/A and AB) or functional-based methods (ϕ_traits or ϕ_CSR), respectively Cherleria sedoides or Leucanthemopsis alpina (Online Resource 1). Apart from these two exceptions, functional-based methods (ϕ_traits or ϕ_CSR) represented in each habitat subtype a specific subset of the species identified by traditional methods (P/A and AB).
The PCA based on functional traits of diagnostic species identified by ϕ_traits explained a large amount (69%) of the total variance, respectively 45% on PC1 and 24% on PC2 (Online Resource 2). The PC1 was positively correlated with SLA and LNC and negatively with LDMC and C/N, while the PC2 was positively correlated with LA and H (Online Resource 3), mirroring the two main axes of the global spectrum of plant form and function, from conservative to acquisitive leaf economics, along PC1-economics, and from small to large plant and organ size, along PC2-size. Diagnostic species identified by CSR plant strategies (ϕ_CSR) were arranged mostly in the lower side of CSR ternary space Fig. 2 Number of all significant (p < 0.05) diagnostic and exclusive (dashed area) species for the six alpine habitat subtypes selected in this study, calculated using presence/absence (P/A), species abundances (AB) and functional association (ϕ) based on functional traits (ϕ_traits) or Grime's CSR strategies scores (ϕ_ CSR) 1 3 along the stress-tolerant (S)/ruderal (R) axis, while few species reported relatively high C-score (Online Resource 4).
Considering sets of diagnostic species identified by functional traits, habitat subtypes exhibited significant differences along PC1-economics, with 4060-A and 6150-A1 the most conservative and 6150-B the most acquisitive. The latter was also distinguished by small-size diagnostic species (PC2-size), while 6150-A2 resulted in diagnostic species with the largest size (Online Resource 5). Results broadly overlap to the global spectrum space assuming C-score as a proxy of PC2-size and S/R axis as a representation of PC1-economics in alpine communities (Online Resource 5). Habitat subtype 6150-A2 showed the highest C-scores, subtypes 4060-A and 6150-A1 a higher S-scores, subtypes 8110-A1 and 6150-B a higher R-scores (Online Resource 5).
Considering each axis of trait variation, i.e. PC1-economics and PC2-size, as well as C-S-R-scores, the functional space occupied by species sets resulting from the four different methods (P/A, AB, ϕ_traits, and ϕ_CSR) did not show significant differences, within each habitat subtype. They rather largely overlapped, indicating that the functional space of diagnostic species identified by the different methods is highly comparable (Fig. 3 and Online Resource 6).
The average probability of dark diversity showed no significant differences (p > 0.05) between diagnostic and not diagnostic species identified by functional-based methods (ϕ_ traits or ϕ_CSR) for all habitat subtypes (Fig. 4a, c). Most of the average dark diversity probabilities of diagnostic species were in a range from 0.2 to 0.6, and the dark diversity probability of few species (n = 4 for 4060-A, 3 for 6150-A1, 6 for 6150-A2, 5 for 6150-B, 8 for 8110-A1, 1 for 8110-A2) was over the threshold of 0.6 ( Fig. 4b, d, Online Resource 1). The highest value of the average dark diversity probabilities was exhibited by the habitat subtype 4060-A and the lowest by the habitat subtype 8110-A2. In each habitat subtype, the degree of functional association (represented by SES-ϕ) and the dark diversity probability of diagnostic species did not show a clear pattern, although in general high dark diversity probability could be found at low SES-ϕ, hence farther from the functional centroid of the habitat subtype (Fig. 5).

Discussion
Concerning the first aim of this study, our results highlighted the contribution that functional-based methods give to the selection of diagnostic species, compared to traditional methods based on presence/absence or abundance only. The inclusion of functional traits or CSR strategies allowed the identification of species largely coinciding with diagnostic species identified by traditional methods, but highlighting a greater number of exclusive species. In any case, we should consider that a species strongly correlated with specific ecosystem properties of a given habitat subtype will certainly be identified as a diagnostic species regardless of its trait values. However, using traditional methods the given species may be diagnostic, and thus typical, for more than one habitat type (Nicod et al. 2019;Rodríguez-Rojo et al. 2020). Using functional-based methods, the focus turns to the functional centroid, and ubiquitous species are recognized as exclusive thanks to the functional fine-tuning to the specific condition of the habitat (Ricotta et al. 2020). This is crucial for adding information about functions of habitat types and for detecting more suitable typical species.
Functional-based diagnostic species are identified considering the CWM so that species are considered not just for their traits but also for their abundance, in line with the mass-ratio hypothesis (Grime 1998), which assumes that more abundant species have a greater effect on ecosystem processes and functions. This leads to the evidence that functional-based diagnostic species are often structural indicators as well. Striking cases are Vaccinium myrtillus (4060-A), Carex curvula (6150-A), Festuca halleri (6150-A2), Salix herbacea (6150-B), Ranunculus glacialis (8110-A1) and Luzula alpinopilosa (8110-A2); remarkably all these species are identified as exclusive of their habitat subtype only by functional-based methods (Online Resource 1). This doesn't mean that the functional space determined by functional-based diagnostic species is smaller compared to the one determined by traditional ones. Our analysis clearly demonstrated that when diagnostic species of a habitat type have a wide trait spectrum, the functional centroid will be more likely located halfway between more diverse species, and all of them will be identified as typical. For example, in the habitat subtype 4060-A, Kalmia procumbens (stress-tolerant) is listed as typical along with species such as Vaccinium myrtillus or V. uliginosum (stress-tolerant/ Fig. 3 Kernel density distribution curves along the main axes of traits variation for each habitat subtype of the diagnostic species identified by the indicators based on: presence/absence (P/A), abundance (AB), functional traits (ϕ_traits) or Grime's CSR strategies scores (ϕ_CSR). All the comparisons are not significant (p > 0.05) ruderal) and even more Deschampsia flexuosa (ruderal). A full list of typical species that can be extracted by functional-based methods is reported in Online Resource 1.
Considering our second aim, for each habitat subtype we identified a set of functionalbased diagnostic species (ϕ_traits or ϕ_CSR) that mirrors the main trade-offs among form and function of plant species and communities (Bruelheide et al. 2018;Díaz et al. 2016). Fig. 4 Comparison of the dark diversity probability between diagnostic species, calculated using the functional association (ϕ) based on a functional traits (ϕ_traits) or c Grime's CSR strategies scores (ϕ_CSR), and not diagnostic species of each habitat subtype. The boxplots display the median (line in the middle of the boxes), the interquartile range (boxes), ± 1.5 times the interquartile range (whiskers) and outliers (circles); all pairwise comparison are not significant (p > 0.05). The subfigure b and d show the ranking of the dark diversity probability for functional based diagnostic species of each habitat subtype, respectively ϕ_traits and ϕ_CSR; the dashed line indicate the 0.6 probability of dark diversity Fig. 5 Relations between the standardized effect size of functional association (SES-ϕ) for diagnostic species based on functional traits (squares) or Grime's CSR strategy scores (triangles), and their average probability of dark diversity (circles) among each habitat subtype. The dark circles represent the species with a probability of dark diversity higher than 0.6 (dashed line). 8110-A1 and 8110-A2 = respectively unstable screes with early colonization stages and stable screes with more developed stages of Androsacion alpinae; 6150-A1 and 6150-A2 = siliceous alpine and boreal grasslands of Caricetalia curvulae, respectively with low (Carex curvula dominated) and intense grazing pressure (Festuca halleri dominated); 6150-B = snow beds of Salicetalia herbaceae; 4060-A = alpine and boreal heaths of Loiseleurio-Vaccinion They describe the functional range evident at the community level (Zanzottera et al. 2020) (Online Resources 2 and 4) and can thus underlie the full extent of ecosystem properties and functions (e.g. De Bello et al. 2010;Hevia et al. 2017). Diagnostic species of windexposed ridges and almost ungrazed alpine grasslands (4060-A and 6150-A1, respectively) showed more conservative characteristics, i.e. low scores of PC1-economics, conversely to snowbeds (6150-B) that were represented by more acquisitive species, i.e. higher scores of PC1-economics. Such a trade-off largely corresponds to the gradient from stress-tolerant to ruderal plant strategies (sensu Grime 1977), highlighting the variation from alpine communities constrained by climate to those dominated by stochastic disturbance events (Caccianiga et al. 2006). Only for grazed alpine grasslands (6150-A2), did we find a clear tendency toward large-sized diagnostic species with higher competitive scores (Pierce et al. 2007), possibly due to increased nutrient supply related to the presence of cattle.
The number of diagnostic species for each habitat subtype was primarily related to the successional stage: few diagnostic species for sparsely colonized screes or many for stable and more complex plant communities (wind-exposed ridges and alpine grasslands). The highest number of diagnostic species was found in grazed alpine grasslands of 6150-A2, where moderate disturbance can increase the functional diversity of plants, which are then distributed across a wide range in the ternary CSR space (Pierce et al. 2007). This agrees with the evidence that greater functional diversity can be associated with a higher number of typical species (Nicod et al. 2019). The comparison of ϕ_traits and ϕ_CSR outcomes showed very small differences, so the two approaches can be alternative, although the CSR method seems to be more efficient. For example, the CSR method excludes Luzula spicata, an S-selected species linked to more stable screes (Caccianiga et al. 2006), from the diagnostic species of unstable screes (8110-A1), and considers Helictochloa versicolor, an S-selected species of alpine pastures (Pierce et al. 2007), exclusively for 6150-A2. Whatever the functional trait-space is considered (trait-or CSR-based), it is worthy to note that the selected diagnostic species are not just one-dimensional indicators of functionality, i.e. along PC1-economics. They also have their own position along a dimensional axis, i.e. PC2-size, which relates to community structure, reiterating the evidence that diagnostic species selected with functional-based methods are good indicators of habitat structure.
Concerning the third aim of our study, we demonstrated that there was no bias in the dark diversity probability between functional-based diagnostic and non-diagnostic species since species belonging to dark diversity are equally divided between those with or without a functional association to a single vegetation unit. Only a few diagnostic species showed average dark diversity probabilities higher than 0.6, and this set of species is thus the fraction of the species pool with the highest affinity to the dark diversity (Carmona and Pärtel 2020b) within those identifiable as typical species. It is important to note that all the species above this threshold are not absent from all sites, rather they must be present in some sites so that they can have several co-occurrences and a high probability of dark diversity. These species are rarely found in local communities due to unstable ecological conditions of the habitat (Lewis et al. 2017;Pärtel et al. 2011) and could thus act as early warning indicators, as required by the reporting guidelines (DG Environment 2017).
Crucially, the analysis of dark diversity after functional-based methods allowed us to select the species most sensitive to environmental or management changes as a subset of those species associated with the structure and functions of habitat types. For example, we identified Gentiana purpurea, Pedicularis kerneri and Hieracium glanduliferum as early warning indicators in the habitat subtype 6150-A1 (Online Resource 1). Their ecology is strictly related to acidic soils with low nutrients, and their absence could indicate an increase of nutrients (e.g. due to intense cattle grazing). Moreover, 1 3 G. purpurea is also the one with highest values of PC2-size and its absence would thus change the physiognomy of the habitat subtype. Other reasons, not strictly related to any pressure, can be responsible for the absence of early warning species, but they are still essential for the long-term maintenance of habitat types. For example, species identified in unstable screes (8110-A1) are highly related to the debris movement (e.g. Geum reptans, Oxyria digyna) and their absence would indicate the evolution towards more stable plant communities.
Typical species can show large dominance or can be specialists of specific functions in plant communities of a habitat type, hence their abundance can be a sensitive indicator of changes in habitat types' conservation status (Kovač et al. 2016(Kovač et al. , 2020Rodríguez-Rojo et al. 2020). Accordingly, the decline to the point of disappearance of typical species can have drastic consequences on ecosystem functions (Nicod et al. 2019), even locally, and thus the selection of a most sensitive subset of typical species is crucial for the monitoring of habitat types. Here we provide an example of the use of dark diversity on a set of functional-based diagnostic species to identify early warning indicator species, a combination never considered before. The HD aims to assess conservation status categories based on the distance from a defined favourable reference value (DG Environment 2017) that we identified as the community functional centroid. Species we identified with the dark diversity probability are to some extent associated with the functional centroid, although they often lie in a marginal position. Therefore, any changes in the ecological condition of the habitat type leading to a shift of centroid coordinates can cause less favourable conditions for their survival in the community, up to their disappearance. Since the subset of species that we identified pointed out the first community functional shifts, they have to be considered early warning species (Caro 2010).
Despite the functional association and dark diversity of species are habitat-specific, and they are thus strongly linked to each single habitat subtype, the dark diversity depends on species regional richness (Lewis et al. 2017), which contribution to the assembly of local communities is strongly mediated by biogeographic regions (Chiarucci et al. 2019). Hence, the identification of typical species might be more efficient when performed at a national or regional scale (Gigante et al. 2018;Tsiripidis et al. 2018), depending on the size of the MS, based on a sound quantitative database (Carigan and Villard 2002). From this point of view, it becomes useful the large volumes of local vegetation data referred to habitat types that are increasingly available (e.g. Agrillo et al. 2018;Brusa et al. 2017b;Evangelista et al. 2016).
At this stage, this protocol can only be complementary to other existing approaches since it needs to be tested on longer lists of habitat types, on more structurally complex habitat types such as forests, on habitat types more prone to invasion by alien species, and on other taxa rather than plants, if comparable data are available. Other methods should be instead implemented for habitat types with reduced or particular biological components (e.g. some marine or rocky habitat types). Our results on alpine habitat types were indeed promising, encouraging its application for the selection of typical species as a routinized method. A lot of efforts have been done to develop standardized protocols to determine the local distribution of habitat types (e.g. Dalle Fratte et al. 2019) and to calculate indicators of favourable conservation status (Hernando et al. 2010;Kovač et al. 2020). In the same way, the identification of typical species should shift from expert-based to objective and replicable methods (Delbosc et al. 2021;Kovač et al. 2016;Rodríguez-Rojo et al. 2020), and standardized methodological protocols, such as the one we propose here, are thus needed. This could help the comparison of species lists from different MS, in order to have a solid baseline for a more consistent and comparable reporting across the EU.

Conclusion
Here we provide a first step towards a unified framework for identifying typical plant species using functional-based diagnostic species. These have revealed themselves to be good indicators of structure and functions of habitat subtypes, meeting one of their main requirements according to the HD. We built a combined approach that allowed to firstly identify species with higher functional association to a single habitat subtype, and secondly, among these, to select early warning indicator species based on their dark diversity probability. In this way, the identified typical species have a clear measurable association with the habitat functional centroid and a measurable probability of belonging to dark diversity. Finally, we recommend a careful choice of plant traits so that they can depict functional spaces for species and communities consistent with what has already been highlighted in scientific literature, such as the global spectrum of plant form and function. In this sense, Grime's CSR plant strategy scheme can provide a straightforward identification of functional-based diagnostic species, possibly reducing costs and efforts since only three traits have to be considered.