1 Introduction

Diagnostic species is an important concept in vegetation classification (Whittaker 1962; Westhoff and van der Maarel 1973). It refers to those species that, due to their niche preferences, concentrate their occurrence or abundance in a single vegetation unit or in a few vegetation units (Dengler et al. 2008). Under the Braun-Blanquet approach to vegetation classification, the degree of concentration is usually called fidelity. There are several statistical techniques for the determination of fidelity and diagnostic species. Among them, those that assess the strength of association between species and groups of relevés (i.e., vegetation plot records) using correlation or indicator value indices are the most widely used (Chytrý et al. 2002; De Cáceres and Legendre 2009; De Cáceres et al. 2012).

Apart from its usefulness for characterizing species niche preferences, the diagnostic species concept is used in vegetation classification (1) to assign new relevés to the vegetation units of an existing classification and (2) to refine vegetation classifications by reassigning relevés that sustain the definition of vegetation units (Tichý 2005; Dai et al. 2006). Traditionally, phytosociologists (re)assigned relevés informally using the diagnostic species concept. Nowadays, however, fidelity values can be formally incorporated into the calculation of the numerical similarity between the relevé to be assigned and each of the vegetation units (van Tongeren et al. 2008). Similarity indices are often used to assign relevés to vegetation units based on complete species composition. For example, Hill (1989) described a method based on a modified Czekanowski coefficient of similarity between observed and expected numbers of species in each constancy class. Two assumptions underpin the incorporation of fidelity values into the calculation of the similarity between relevés and vegetation units: (a) some species may be more informative than others for the assignment of relevés to vegetation units; and (b) since diagnostic species are indicators of their preferred habitats (De Cáceres et al. 2012), fidelity values are a reasonable way of defining species weights in the calculation of similarity to vegetation units (Chytrý and Tichý 2003). Following this rationale, Tichý (2005) developed four similarity indices using the phi coefficient as a fidelity measure. Among them, the frequency-positive fidelity index (FPFI) has often been used for the assignment of relevés that were misclassified or classified to more than one group in supervised classifications (Boublík et al. 2007; Douda 2008; Boublík 2010; Janišová et al. 2010; Svitková and Šibík 2013; Landucci et al. 2013; Rodríguez-Rojo et al. 2014; Chytrý and Tichý 2018; Maciejewski et al. 2020).
Another approach employing the diagnostic species concept in a similarity index was proposed by Dai et al. (2006), who suggested using the total indicator value index (TIVI) to test the validity of a TWINSPAN classification and to refine the initial classification by reassigning relevés. While Dai et al. (2006) based relevé assignments on fidelity values calculated using the indicator value (IndVal) index (Dufrêne and Legendre 1997), Esmailzadeh and Asadi (2014) recently suggested replacing IndVal with the phi coefficient, calling the resulting approach total phi fidelity index (TPFI). Gégout and Coudun (2012) also developed the fidelity index of relevés (FI) to reassign relevés according to fidelity values, but in contrast to previous indices FI is defined as the sum of phi values, regardless of species abundance or frequency.

Here, we propose that TIVI and TPFI can be unified in a single assignment framework, which we call the total fidelity value index (TFVI). For each relevé and vegetation unit, TFVI is defined as the sum, across all species occurring in the relevé, of the species fidelity values for the vegetation unit multiplied by the species abundance values in the relevé. Each relevé is then assigned to the vegetation unit for which TFVI is highest. Several fidelity measures can be used in the TFVI framework (Tichý and Chytrý 2006; De Cáceres and Legendre 2009), and it is unknown how this choice affects the results. Therefore, in this paper, we compare the performance of eight statistical fidelity measures within the TFVI framework. Because relevé assignments are commonly based on the FPFI, we also include this assignment rule in our evaluation. We use vegetation data from the Hyrcanian box tree (Buxus hyrcana Pojark.) and common yew (Taxus baccata L.) forests and take the results of three frequently used unsupervised methods as initial vegetation classifications. We evaluate the nine assignment rules in terms of (a) their predictive performance and (b) the quality of the vegetation units after reassignment, as measured by several evaluator indices. Evaluating predictive performance (a) implies the assumption that the number of groups is known and that the original classification is the “truth” to be reproduced by the assignment rule. In contrast, evaluating the quality of classification after reassignment (b) assumes that the original classification may be improved. We conduct this evaluation without fixing the number of groups, but we penalize those cases in which the classification obtained after reassignment reduces the number of relevés sustaining the definition of some vegetation units to the point of compromising their statistical validity.

2 Methods

2.1 Study area

The Hyrcanian region is a narrow green belt covering an area of 1.9 million ha in northern Iran and 20,000 ha in the Republic of Azerbaijan. In Iran, this region is located between the Caspian Sea and the northern foothills of Alborz mountains in three northern provinces: Gilan (western part), Mazandaran (middle part), and Gorgan (eastern part). Unlike most of Iran, the Hyrcanian region is relatively humid, with an average annual rainfall that ranges between 530 mm in the east and 1350 mm in the west (with an occasional record over 2000 mm in some locations). Rainfall mostly occurs during late fall, winter, and spring. The average annual temperature varies from 15 °C in the west to 17.5 °C in the east. The average temperature of the warmest month ranges from 28 to 35 °C while that of the coldest month is between 1.5 and 4 °C (Sagheb-Talebi et al. 2014). Brown soils (i.e., calcareous, forest acidic, podzolic, and non-podzolic soils) are the most important soil type, comprising approximately 90% of the region (Habibi Kasseb 1992). Hyrcanian forests are dominated by combinations of oriental beech (Fagus orientalis Lipsky), Caucasian oak (Quercus castaneifolia C.A.Mey.), hornbeam (Carpinus betulus L.), and Persian ironwood (Parrotia persica C.A.Mey.), and depending on the site Acer velutinum Boiss., Tilia rubra DC., Fraxinus excelsior L., Alnus subcordata C.A.Mey., and B. hyrcana (Marvie Mohadjer 2005).

In Hyrcanian forests, plant communities of B. hyrcana are the remnants of broad masses that formerly occupied lowlands (along with Q. castaneifolia, C. betulus, and P. persica) and steep slopes of wet valleys (along with F. orientalis) but are now restricted to limited areas. They are characterized by a low species richness and variability in species composition and are adapted to grow on sites within a range of edaphic conditions, as long as they are exposed to adequate air moisture in the southern parts of the Caspian Sea, Northern Iran (Asadi et al. 2011; Esmailzadeh et al. 2014; Soleymainpour and Esmailzadeh 2015). Owing to the broad ecological niche and high sociability of B. hyrcana, the composition of box tree stands in Hyrcanian forests is highly variable across the study area. Box trees co-occur with species ranging from Zelkova carpinifolia (Pall.) Dippel and Celtis australis L., drought-tolerant trees of the eastern lowland Hyrcanian forests, to Pterocarya fraxinifolia (Lam.) Spach and Populus caspica (Bornm.) Bornm., hygrophilous trees of the western lowland Hyrcanian forests. Along elevation gradients, box trees co-occur with Q. castaneifolia at low elevations and with F. orientalis in highland forests (up to 1700 m). Box tree stands are distributed across a wide range of slopes: on steep, north-facing slopes, they are accompanied by T. baccata, Prunus laurocerasus L., and Danae racemosa (L.) Moench, hygrophilous indicator species of Hyrcanian highland forests. However, on slopes with slightly lower air humidity, they are accompanied by T. rubra, Acer cappadocicum Gled., and Sorbus torminalis (L.) Crantz.

A dataset of Hyrcanian T. baccata communities was included as a second dataset to reinforce the significance of the results. T. baccata is the only coniferous species that is distributed across the main Hyrcanian forest types. It is often scattered individually or forms small populations in humid sites (i.e., with high air humidity, but not humid soils) on steep northern slopes as well as on hillslopes of northern valleys throughout the Hyrcanian mountain forests (Sagheb-Talebi et al. 2014). In much more humid sites of the Hyrcanian forests, it forms large, dense, pure and mixed populations (Sagheb-Talebi et al. 2014).

2.2 Sampling

Habitats containing B. hyrcana forests were searched from the Cheshmeh-Bolbol protected area (west of Golestan Province) to Lire-sar in the west of Mazandaran province (Fig. 1). These habitats range from 50 m a.s.l. in the Sisangan protected area to 1750 m a.s.l. in Farim, the highest elevation reached by B. hyrcana in Hyrcanian forests. After this initial survey, vegetation was sampled using the Braun-Blanquet relevé method in the summers of 2010 to 2014. Vegetation plots (400 m² in area) were established in stands considered representative of the variability of B. hyrcana forests (Mueller-Dombois and Ellenberg 1974). To capture any change in vegetation indicating variation in habitat conditions, while respecting the principle of a representative stand, we defined transects as 400-m stretches set systematically along the altitudinal gradient, and relevés were recorded wherever a floristic or environmental (especially topographical) change was perceived. In each vegetation plot, all vascular plant species were recorded, and their percentage cover was visually assessed using a modification of the ordinal van der Maarel (1979) cover-abundance scale (0, absent; 1, 0–1%; 2, 1–2.5%; 3, 2.5–5%; 4, 5–12.5%; 5, 12.5–25%; 6, 25–50%; 7, 50–75%; 8, 75–100%). Cover-abundance class values were replaced by the mean cover of each cover class. The resulting dataset (referred to here as the B. hyrcana dataset) included 484 relevés and 157 species.
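The replacement of cover-abundance classes by mean cover can be illustrated with a short Python sketch. It assumes that the "mean cover" of a class is the arithmetic midpoint of the class limits quoted above; the function name and the example relevé are hypothetical and are not taken from the study data.

```python
# Class limits (% cover) of the modified van der Maarel scale quoted in the text.
CLASS_LIMITS = {
    1: (0.0, 1.0),
    2: (1.0, 2.5),
    3: (2.5, 5.0),
    4: (5.0, 12.5),
    5: (12.5, 25.0),
    6: (25.0, 50.0),
    7: (50.0, 75.0),
    8: (75.0, 100.0),
}


def class_to_mean_cover(cover_class: int) -> float:
    """Return the midpoint (% cover) of an ordinal class; class 0 means absent.

    Assumption: 'mean cover' is taken as the midpoint of the class limits.
    """
    if cover_class == 0:
        return 0.0
    lower, upper = CLASS_LIMITS[cover_class]
    return (lower + upper) / 2.0


# Hypothetical relevé recorded as ordinal classes.
releve_classes = {"Buxus hyrcana": 7, "Carpinus betulus": 4, "Danae racemosa": 1}
releve_cover = {sp: class_to_mean_cover(c) for sp, c in releve_classes.items()}
```

Under this assumption, class 7 (50–75%) becomes 62.5% and class 4 (5–12.5%) becomes 8.75%.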

Fig. 1 Map of the two Iranian provinces where Hyrcanian forests were sampled. Dots indicate sampling locations corresponding to the B. hyrcana (circles) and T. baccata (triangles) datasets

In addition, 408 relevés were sampled in habitats containing T. baccata in the central and eastern Hyrcanian forests in the summers of 2015, 2017, and 2018. These forests were distributed from the Siah-roudbar valleys in the east of Golestan province to Mazga in the west of Mazandaran province (Fig. 1), at elevations from 1000 m a.s.l. in Gazoo to 2000 m a.s.l. in Afrathakhteh. This dataset is referred to here as the T. baccata dataset.

2.3 Initial classifications

We used three unsupervised classification methods to classify the compositional structure of both the B. hyrcana and T. baccata datasets into vegetation units: (1) modified TWINSPAN (Roleček et al. 2009), (2) partitioning around medoids (PAM; Kaufman and Rousseeuw 1990), and (3) k-means (MacQueen 1967). Modified TWINSPAN is a hierarchical divisive method that combines the classical TWINSPAN algorithm (two-way indicator species analysis; Hill 1979) with an analysis of the heterogeneity of the clusters prior to each division. Unlike the original version, modified TWINSPAN does not enforce a dichotomy of classification but instead, at each step, divides only the most heterogeneous cluster of the previous hierarchical level. Thus, the application of modified TWINSPAN results in vegetation units of similar internal heterogeneity (Luther-Mosebach et al. 2012). We applied total inertia (i.e., the sum of all eigenvalues in correspondence analysis) as the measure of cluster heterogeneity (Roleček et al. 2009), and pseudo-species cut levels were set to 0%, 1%, 2.5%, 5%, 12.5%, 25%, 50%, 75%, and 100%. K-means and PAM are commonly used non-hierarchical clustering methods (Legendre and Legendre 2012; Tichý et al. 2014). Both of them require the number of clusters and the initial group members to be specified by the user. The main difference between k-means and PAM is that in the former each cluster is represented by its centroid, a multivariate mean, whereas in the latter each cluster is represented by its medoid, the cluster member that has the minimum sum of distances to all the other members of the cluster. Both methods have an objective error function that is progressively minimized by iteratively reassigning objects (i.e., relevés) to their nearest cluster center (centroid or medoid). Iterations terminate when no further reassignments are possible. We ran k-means and PAM starting from 100 random initial configurations to avoid local minima of the error function.
The floristic resemblance between pairs of relevés was assessed using the Hellinger distance, which can be emulated by transforming relevé data prior to calculation of Euclidean distances (Legendre and Gallagher 2001; De Cáceres et al. 2008). All classification methods were executed using the JUICE software (Tichy 2002) based on both vegetation datasets.
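As an illustration of how the Hellinger distance can be emulated, the following Python sketch applies the Hellinger transformation (square root of relative abundance within each relevé) and then computes the Euclidean distance between the transformed relevés. It is a minimal sketch using only the standard library, not the JUICE implementation; the function names are hypothetical.

```python
import math


def hellinger_transform(releve):
    """Square root of the relative abundances within one relevé."""
    total = sum(releve)
    if total == 0:
        raise ValueError("a relevé without any species cannot be transformed")
    return [math.sqrt(x / total) for x in releve]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hellinger_distance(releve_1, releve_2):
    """Euclidean distance between Hellinger-transformed relevés."""
    return euclidean(hellinger_transform(releve_1), hellinger_transform(releve_2))
```

Two relevés with identical relative abundances have a distance of zero, whereas two relevés sharing no species reach the maximum distance of √2, whatever their total cover.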

We used the OptimClass procedure (Tichy et al. 2010) to determine the optimum number of clusters (Appendix 1) in both the B. hyrcana and T. baccata datasets. Specifically, we searched for the partition with the largest number of faithful species across all clusters. Faithful species were determined based on the p-value of Fisher’s exact test as a measure of fidelity (Tichy et al. 2010). OptimClass was run on 40 classification algorithms, combining five distance measures (i.e., Euclidean, relative Euclidean, correlation, chi-square, and relative Sorensen) with eight methods of group linkage (i.e., Flexible Beta, McQuitty’s, Ward’s, centroid, group average, median, nearest and farthest neighbor methods), based on square-root transformed cover percentages. Across all classification methods, the relationship between the number of faithful species and the number of clusters (groups) revealed that the partitions with 18 groups in the B. hyrcana dataset and 17 groups in the T. baccata dataset yielded the largest number of faithful species; these were thereafter considered the optimal numbers of groups. For the evaluation of predictive performance, we took the partitions obtained by TWINSPAN, PAM, and k-means with 18 and 17 groups as the initial classifications to be recovered in the B. hyrcana and T. baccata datasets, respectively. For the evaluation of the quality of classification after reassignment, however, we kept the partitions generated by TWINSPAN, PAM, and k-means for 2, 3, …, 18 groups in the B. hyrcana dataset and 2, 3, …, 17 groups in the T. baccata dataset as initial classifications to be refined.

2.4 Assignment rules

The TFVIig value for a given relevé i and vegetation group g is defined as:

$$TFVI_{ig} = \sum\limits_{j = 1}^{{s_{t} }} {C_{ij} } \times FV_{gj}$$
(1)

where FVgj is the fidelity (indicator) value of species j in group g and Cij is the cover value of species j in relevé i. Once the TFVIig value is calculated for all vegetation units, the assignment rule consists of assigning relevé i to the group corresponding to the highest TFVIig value.
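The assignment rule of Eq. 1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study: the function names, the dictionary layout, and the toy cover and fidelity values are hypothetical. The optional `positive_only` switch drops negative fidelity values, as is done in this study for correlation indices; ties are resolved here in favor of the first group, which is an assumption.

```python
def tfvi_scores(cover, fidelity, positive_only=True):
    """TFVI of one relevé for every group (Eq. 1).

    cover:    dict species -> cover value C_ij in the relevé
    fidelity: dict group -> dict species -> fidelity value FV_gj
    """
    scores = {}
    for group, fv in fidelity.items():
        total = 0.0
        for species, c in cover.items():
            value = fv.get(species, 0.0)  # species without a value contribute 0
            if positive_only and value < 0:
                continue  # ignore negative fidelity values
            total += c * value
        scores[group] = total
    return scores


def assign_releve(cover, fidelity, positive_only=True):
    """Return the group with the highest TFVI (first group wins ties)."""
    scores = tfvi_scores(cover, fidelity, positive_only)
    return max(scores, key=scores.get)


# Hypothetical relevé with two species and two candidate groups.
cover = {"A": 62.5, "B": 8.75}
fidelity = {"g1": {"A": 0.6, "B": -0.2}, "g2": {"A": 0.1, "B": 0.7}}
best_group = assign_releve(cover, fidelity)
```

In this toy example the relevé is assigned to g1, because the dominant species A has a much higher fidelity to g1 than to g2.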

There are many alternative indices to define the fidelity value of species (Tichý and Chytrý 2006; De Cáceres and Legendre 2009). To determine how important the choice of fidelity index is for assignments in the TFVI framework, we compared eight alternatives, differing in the general approach (i.e., correlation indices vs. indicator value, or IndVal, indices), in the way differences in group size are dealt with (i.e., non-equalized vs. group-equalized), and in whether species abundance values are taken into account (Table 1, taken from De Cáceres and Legendre 2009). For correlation indices, we only used species with positive fidelity values in the calculation of TFVI. Calculations were performed using the R statistical language and the “indicspecies” package, version 1.7.5 (De Cáceres and Legendre 2009). Because it is frequently used for relevé assignments, we also considered FPFI (Tichý 2005) as an additional assignment rule to be evaluated. FPFI is a combination of the frequency index (FQI) and the positive fidelity index (PFDI). The FQIig (Eq. 2), PFDIig (Eq. 3), and FPFIig (Eq. 4) values for a given relevé i and vegetation group g are defined as:

$$FQI_{ig} = 100 \times \frac{\sum\limits_{j \in R} FQ_{gj}}{\sum\limits_{j \in C} FQ_{gj}}$$
(2)
$$PFDI_{ig} = 100 \times \frac{\sum\limits_{j \in R} FD_{gj}}{\sum\limits_{j \in C} FD_{gj}}$$
(3)
$$FPFI_{ig} = 100 \times (FQI_{ig} + PFDI_{ig}) / 2$$
(4)

where FQgj is the frequency (constancy) value of species j in group g and FDgj is the positive fidelity value (phi coefficient) of species j for vegetation unit g. Species present in the relevé are indicated as \(j \in R\) and species present in the constancy column as \(j \in C\). All assignment rules were applied to the results of each of the three initial classification methods, separately for the B. hyrcana and T. baccata datasets.

Table 1 Non-equalized and group-equalized versions of the correlation indices (r) and the indicator value indices (IndVal)
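A minimal Python sketch of Eqs. 2 to 4 for a single relevé and a single group is given below. It follows the equations as printed, including the factor of 100 in Eq. 4, which rescales all groups equally and therefore does not change which group obtains the highest value. The function names and toy values are hypothetical, and the set C is assumed to be the set of species listed in the group's constancy column.

```python
def ratio_index(releve_species, group_values):
    """100 * (sum over species in the relevé) / (sum over the group's column)."""
    denominator = sum(group_values.values())
    if denominator == 0:
        return 0.0
    numerator = sum(v for sp, v in group_values.items() if sp in releve_species)
    return 100.0 * numerator / denominator


def fpfi(releve_species, frequency, phi):
    """FPFI of one relevé for one group, following Eqs. 2-4 as printed."""
    positive_phi = {sp: v for sp, v in phi.items() if v > 0}  # positive fidelity only
    fqi = ratio_index(releve_species, frequency)      # Eq. 2
    pfdi = ratio_index(releve_species, positive_phi)  # Eq. 3
    return 100.0 * (fqi + pfdi) / 2.0                 # Eq. 4


# Hypothetical group column with three species; the relevé contains A and B.
releve_species = {"A", "B"}
frequency = {"A": 80, "B": 40, "C": 80}
phi = {"A": 0.5, "B": -0.1, "C": 0.5}
fqi_example = ratio_index(releve_species, frequency)
fpfi_example = fpfi(releve_species, frequency, phi)
```

Repeating the calculation for every group and selecting the group with the largest FPFI gives the assignment rule evaluated alongside TFVI.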

2.5 Evaluation of predictive performance

Before an assignment rule is used for the assignment of new relevés, it is important to evaluate its predictive performance (i.e., to evaluate to which degree a “known” classification can be reproduced). We used each of the three initial classifications with the optimal number of groups (produced by modified TWINSPAN, k-means, and PAM; 18 groups for the 484 B. hyrcana relevés and 17 groups for the 408 T. baccata relevés) for the reassignment of the relevés according to TFVI and FPFI. Reassignment of each target relevé was done after excluding it from the calculation of fidelity values. Note that this does not make the assignment rule completely independent of the relevé to be assigned, since the relevé still influences how the original classification was obtained. Nevertheless, this effect is very difficult to remove and we think it may be small in most cases. In total, 27 new classifications were obtained for each dataset, resulting from reassigning the relevés of each of the three initial classifications using each of the nine assignment rules (i.e., TFVI with eight fidelity measures plus FPFI).

Assuming that the initial classification was the “known” classification solution, we calculated the adjusted Rand index (ARI) (Hubert and Arabie 1985) to assess the degree of agreement between the initial classification and the classification obtained after reassignment. ARI has a maximum value of 1, which indicates perfect agreement between the initial classification and the classification obtained after reassignment; a value of 0 indicates agreement no better than expected by chance, and negative values indicate less agreement than expected by chance. ARI was calculated using the R statistical language and the “mclust” package, version 5.1 (Fraley et al. 2012). Ranks of ARI values were used to compare the performance of the different assignment rules across the three classifications in each dataset.
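For readers who wish to check ARI values by hand, the index can be computed from pair counts as in the following Python sketch. It is a standard-library illustration of the Hubert and Arabie (1985) formula, not the “mclust” code used in the study; the function name is hypothetical, and returning 1.0 when both partitions are trivially identical is a convention adopted here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same relevés."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same relevés")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two relevés are required")
    # Pairs of relevés placed together in both partitions, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # convention: both partitions are trivial and identical
    return (together_both - expected) / (maximum - expected)
```

The index depends only on which relevés are grouped together, not on the group labels, so `adjusted_rand_index([1, 1, 2, 2], [5, 5, 9, 9])` equals 1, whereas the crossed partitions `[1, 1, 2, 2]` and `[1, 2, 1, 2]` give a negative value.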

2.6 Evaluation of the quality of vegetation classifications

For the evaluation of the quality of vegetation units, we tested the nine assignment rules on the partitions generated by the three classification methods for each number of groups between 2 and 18 in the B. hyrcana dataset and between 2 and 17 in the T. baccata dataset. This resulted in 459 new classifications (3 classification methods × 9 assignment rules × 17 clustering levels) for the B. hyrcana dataset and 432 new classifications (3 classification methods × 9 assignment rules × 16 clustering levels) for the T. baccata dataset. In some cases, reassignments led to a reduction in the number of members of some vegetation units. The minimum group size for accepting a group as ‘statistically valid’ was conventionally set to three relevés, and assignment rules leading to a reduction in the number of valid vegetation units were penalized (see below). The 459 classifications for B. hyrcana and the 432 classifications for T. baccata were evaluated with eight internal classification evaluators, most of which have been tested and reviewed in the literature (Aho et al. 2008; Roberts 2015). The eight evaluators comprise five geometric and three non-geometric measures and are summarized in Table 2 (see the discussion of circularity in the choice of evaluators in Sect. 4). Among the non-geometric evaluators, we applied (1) Morisita’s index of niche overlap (Horn 1966), which evaluates classification effectiveness with respect to species distributions; (2) ISAMIC (indicator species analysis to minimize intermediate constancy; Roberts 2010), which measures the constancy (either high or low) of species within groups irrespective of how many groups each species occurs in (Roberts 2015); and (3) ISA (indicator species analysis; Aho et al. 2008), which is derived from the indicator value (IndVal) of species (Dufrêne and Legendre 1997).
IndVal has long been the most popular measure to assess species importance in community classifications (Podani and Csanyi 2010) and is one of the most widely used goodness-of-clustering indices (Roberts 2015). High ISA values indicate high fidelity and abundance of species within groups (Aho et al. 2008). ISA can be reported in two ways, as the average p value or as the number of significant indicators (α = 0.05); we used the latter. p values for ISA were calculated with Monte Carlo procedures.

Table 2 Summary of classification solution evaluators in this paper

For the geometric evaluators, we considered five indices: (4) C-index (Hubert and Levin 1976), (5) PBC-point biserial correlation (Brogden 1949), (6) PARTANA ratio (Roberts 2005), (7) ASW-average silhouette width (Rousseeuw 1987), and (8) ANOSIM-analysis of similarity (Clarke 1993). Geometric indices evaluate classification effectiveness based on the relationship between within-group and between-group pairwise dissimilarities. For the calculation of geometric evaluators, Euclidean distance was used to create the required distance matrices from two types of species data: presence-absence data and abundance data (after Hellinger transformation). We used both presence-absence and abundance data to avoid biasing our evaluation towards reassignments made with either incidence-based or abundance-based fidelity measures. Because each of the five geometric evaluators was calculated for both data types, we used thirteen classification evaluators in total (five geometric evaluators × two data types, plus three non-geometric evaluators). All evaluators were computed in R with the “plant.ecol” package, version 0.4-1 (Aho, K. 2017. https://sites.google.com/a/isu.edu/aho) for Morisita, C-index, and PBC; “labdsv” (Robert and Robert 2016. https://cran.r-project.org/web/packages/optpart/index.html) for ISAMIC; “optpart” (Robert 2010. https://cran.r-project.org/web/packages/optpart/index.html) for PARTANA; “cluster” (Maechler et al. 2013. https://cran.r-project.org/web/packages/cluster/index.html) for Silhouette; “indicspecies” (De Cáceres et al. 2016. https://cran.r-project.org/web/packages/indicspecies/index.html) for ISA; and “vegan” (Oksanen et al. 2013. https://cran.r-project.org/web/packages/vegan/index.html) for ANOSIM.

A numerical value was obtained for each evaluator and each classification solution. A reduction in the number of groups can inflate the evaluator statistic (because the vegetation units that lose relevé members during reassignment have lower quality). Therefore, we penalized assignment rules that left some vegetation units with fewer than three relevés. In these cases, we decreased the value of the evaluator by multiplying it by the ratio between the number of “statistically valid” groups and the number of initial groups (hereafter called the “correction ratio”). After applying this correction where needed, we averaged the values of each evaluator across the partitions of 2 to 18 groups in the B. hyrcana dataset and 2 to 17 groups in the T. baccata dataset. For each classification method, the ten evaluator values, corresponding to the initial classification and the nine assignment rules, were ranked from best (1) to worst (10). To combine the results of the different evaluators, we calculated median ranks (across the thirteen evaluator statistics) for each classification. Ranks of these medians were used to compare the different assignment rules globally. The whole procedure is summarized in a flowchart (Fig. 2). All analyses were carried out separately for the two datasets.
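The correction and ranking steps can be sketched in Python as follows. This is a hedged illustration, not the code used in the study: the function names, the toy evaluator values, and the rule names are hypothetical; evaluators are assumed to be oriented so that higher values are better (as with 1-C.index and 1-Morisita); and giving tied values their average rank is an assumption, since the treatment of ties is not specified in the text.

```python
from statistics import median


def corrected_value(value, group_sizes, n_initial_groups, min_size=3):
    """Multiply an evaluator value by the correction ratio (valid / initial groups)."""
    n_valid = sum(1 for size in group_sizes if size >= min_size)
    return value * n_valid / n_initial_groups


def rank_best_first(values):
    """Rank values so that the highest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


# Hypothetical averaged evaluator values for the initial classification and two rules.
rules = ["initial", "rule_A", "rule_B"]
evaluator_values = {
    "ASW": [0.30, 0.45, 0.40],
    "PBC": [0.50, 0.65, 0.70],
    "ANOSIM": [0.60, 0.80, 0.75],
}
ranks = {name: rank_best_first(vals) for name, vals in evaluator_values.items()}
median_ranks = {
    rule: median(ranks[name][k] for name in ranks) for k, rule in enumerate(rules)
}
```

For example, a reassignment that leaves only two of four initial groups with at least three relevés halves the evaluator value: `corrected_value(0.8, [10, 5, 2, 1], 4)` gives 0.4.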

Fig. 2 Flowchart of the whole synthesizing process

3 Results

3.1 Evaluation of predictive performance

All assignment rules produced some distortion of the three initial classifications. The predictive performance of the TFVI framework varied strongly depending on the fidelity measure, whereas that of the FPFI assignment rule was rather low, especially in the T. baccata dataset. Indval obtained the best rank in reproducing all three initial classifications in both the B. hyrcana and T. baccata datasets and, hence, the best median rank. In the B. hyrcana dataset, the remaining assignment rules ranked, in order of decreasing performance, as IndVal, Indval, r, r, r, r, FPFI, and IndVal (Table 3, section A). In the T. baccata dataset, the order of the remaining assignment rules differed slightly: r, r, IndVal, IndVal, Indval, r, r, and FPFI (Table 3, section B).

Table 3 Adjusted Rand index (ARI) values for all cluster solutions in B. hyrcana (section A) and T. baccata (section B) datasets. Ranks (1 to 9) are indicated in parentheses

3.2 Quality of the classification after reassignment

The average values of the quality evaluators (across 2 to 18 groups in the B. hyrcana dataset and 2 to 17 groups in the T. baccata dataset) are shown in Table 4 (averages after correcting for the decrease in the proportion of valid groups are underlined). They are presented in two sections, A and B, corresponding to the B. hyrcana and T. baccata datasets, respectively. Among the eight fidelity measures, only the reassignment with Indval did not change the number of ‘statistically valid’ groups for partitions generated by any of the three classification methods in the B. hyrcana dataset. Using other indices in the TFVI framework, or using the FPFI assignment rule, led to a decrease in the number of valid groups for at least one initial classification in this dataset. In the T. baccata dataset, the changes in the number of statistically valid groups produced by the nine assignment rules were relatively small, and reassignment with Indval as well as r did not change the number of groups for any of the three initial classification methods.

Table 4 Average values of each evaluator for the initial classification and for the classification obtained with each assignment rule. Averages were calculated across the different numbers of groups, 2 to 18 in the B. hyrcana (section A) and 2 to 17 in the T. baccata (section B) dataset, before (not underlined) and after (underlined) correcting for the ratio of “statistically valid” groups (Appendix 3). Ranks (1 to 10) are calculated from the corrected averages (i.e., from italicized values). Incidence-based evaluators are specified in the table with braces

Boxplots of the ranks of the thirteen evaluators across the ten clustering solutions for the B. hyrcana dataset indicate that, for every evaluator except ISA, at least one of the nine reassignment methods improved the value of the evaluator compared to the initial classification (Fig. 3). In this dataset, TFVI based on Indval was the best solution for the abundance-based evaluators, except for 1-Morisita and ISAMIC. The TFVI assignment rules based on r and r obtained the first rank for the Morisita and ISAMIC indices, respectively. However, in terms of incidence-based evaluators, TFVI based on r was ranked first (i.e., for 1-C.index, PARTANA, and ANOSIM), followed by FPFI (i.e., for Sil and PBC). The initial solution was ranked first for the ISA evaluator only. Similarly, boxplots of the ranks of all evaluators across the ten clustering solutions for the T. baccata dataset indicate that at least one of the nine reassignment methods improved the value of each evaluator compared to the initial classification (Fig. 4). In the T. baccata dataset, TFVI based on Indval was the best solution for the abundance-based evaluators, except for Morisita, PBC, and ISAMIC. The TFVI assignment rules based on r, IndVal, and r obtained the first rank for the Morisita, PBC, and ISAMIC indices, respectively. However, in terms of incidence-based evaluators, TFVI based on r was ranked first (i.e., for PARTANA and ANOSIM), followed by IndVal (i.e., for 1-C.index and PBC), whereas TFVI based on Indval obtained the first rank for the incidence-based Sil evaluator. In the T. baccata dataset, the initial solution was not ranked first for any evaluator.

Fig. 3 Boxplot of the evaluator ranks across ten cluster solutions in the B. hyrcana dataset. Boxplots were drawn based on 51 values per box (3 initial classifications × 17 clustering levels). Solutions: 1 initial algorithm; 2 FPFI; 3 IndVal, 4 Indval, 5 IndVal, 6 Indval, 7 r, 8 r, 9 r, and 10 r based TFVI models. These are partitioned into three parts: (a) initial and FPFI solutions, (b) IndVal-based solutions, and (c) phi-based solutions. The box extends from the first quartile to the third quartile. The crossed line in the center of the box is the median. Each boxplot is based on three ranks, corresponding to the three initial classifications. The best solution (lowest average rank) is indicated using a checkmark symbol

Fig. 4 Boxplot of the evaluator ranks across ten cluster solutions in the T. baccata dataset. Solutions: 1 initial algorithm; 2 FPFI; 3 IndVal, 4 Indval, 5 IndVal, 6 Indval, 7 r, 8 r, 9 r, and 10 r based TFVI models. These are partitioned into three parts: (a) initial and FPFI solutions, (b) IndVal-based solutions, and (c) phi-based solutions. The box extends from the first quartile to the third quartile. The crossed line in the center of the box is the median. Each boxplot is based on three ranks, corresponding to the three initial classifications. The best solution (lowest average rank) is indicated using a checkmark symbol

Calculating median ranks revealed that assignment rules sometimes led to classifications with better overall quality than the initial classification in both datasets (Table 5). In the B. hyrcana dataset, in terms of incidence-based evaluators, an improvement in the quality of classification was obtained by TFVI in combination with r and by FPFI, whereas in terms of abundance-based evaluators, only reassignments using TFVI based on Indval led to classifications of better quality than the initial classification. In the T. baccata dataset, the TFVI algorithm based on r likewise led to the best refinement of the initial classification, whereas in terms of abundance-based evaluators, Indval, IndVal, and FPFI were ranked first and r was ranked second.

Table 5 Median ranks for incidence-based evaluators, abundance-based evaluators, and all evaluators in the B. hyrcana (section A) and T. baccata (section B) datasets. Ranks of median ranks are shown in parentheses

In the overall comparison of assignment rules (including both incidence-based and abundance-based evaluators) in both vegetation datasets, we found that TFVI based on Indval and on r, together with FPFI, performed best in terms of evaluation statistics, followed by the initial classifications and finally the other TFVI algorithms (Table 5). From the point of view of incidence-based evaluators, TFVI based on r had the highest quality, whereas Indval, an abundance-based indicator value, had the best rank under abundance-based evaluators. Results also indicated that FPFI was a reliable assignment index, achieving the second rank after TFVI based on r when considering incidence-based evaluators and a rank similar to that of TFVI based on r when considering all evaluators in the B. hyrcana dataset. FPFI also obtained the third rank, after Indval and r, when considering all evaluators in the T. baccata dataset.

4 Discussion

Finding efficient, simple and precise rules for the assignment of new or misclassified relevés to existing vegetation units is an important topic in vegetation science (Bruelheide 1997; Černá and Chytrý 2005; Tichý 2005; Dai et al. 2006; van Tongeren et al. 2008). In parallel, vegetation scientists often recommend refining the results produced using unsupervised classification methods before accepting vegetation units (Wiser and De Cáceres 2013; Tichý et al. 2014). These two tasks can be conducted employing species fidelity data. Tichý (2005) compared several similarity indices for the assignment of relevés to vegetation units using simulated data. Among them, Tichý (2005) recommended FPFI (which combines frequency information with fidelity values) for the assignment of relevés to preexisting vegetation units. Following a similar approach, Dai et al. (2006) and Esmailzadeh and Asadi (2014) developed TFVI and TPFI, respectively, based on fidelity and the cover percentage of each species. Esmailzadeh and Asadi (2014) concluded that TPFI using a group-equalized phi fidelity index could be used as an approach to improve TWINSPAN results. In this paper we generalized TFVI and TPFI into a single framework, which we called TFVI, for the assignment of relevés to existing vegetation units based on fidelity values and the cover percentage of each species. We sought to determine the most suitable fidelity measure to be used in the TFVI framework. For this purpose, we took the vegetation units derived from the B. hyrcana and T. baccata datasets and tested the performance of the FPFI assignment rule and the TFVI framework using eight different fidelity measures. Despite testing assignment rules on only two (but real) datasets, we obtained some interesting findings, which we describe in the following paragraphs.

We found that assignments using the TFVI framework often had higher predictive performance than assignments using FPFI. The reason for this result may be related to the use of cover percentage (i.e., species abundance data) as a weighting criterion instead of species frequency (i.e., presence/absence data). Assuming that a high percentage cover of a species indicates favorable environmental conditions more reliably than its frequency does, weighting fidelity values by cover percentage makes the influence of each species depend on the availability of favorable conditions. In the TFVI framework, species with both high cover percentage and high fidelity for the target unit will be more influential in assignments.
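The effect of cover weighting described above can be illustrated with a minimal sketch. It assumes that the score of a relevé for a vegetation unit is simply the sum, over the species present, of cover multiplied by fidelity; the exact TFVI formula is the one given in the Methods and may differ. The function names, unit names, species names and all numbers are hypothetical, and the presence-only variant is included only for contrast (it is not FPFI).

```python
def weighted_scores(cover, fidelity):
    """Score one relevé against every vegetation unit.

    cover:    {species: percentage cover in the relevé}
    fidelity: {unit: {species: fidelity of the species to that unit}}
    Assumption: score = sum(cover * fidelity) over species in the relevé;
    this is an illustrative stand-in, not the paper's exact TFVI formula.
    """
    return {
        unit: sum(c * fid.get(sp, 0.0) for sp, c in cover.items())
        for unit, fid in fidelity.items()
    }


def assign(cover, fidelity):
    """Return the unit with the highest score."""
    scores = weighted_scores(cover, fidelity)
    return max(scores, key=scores.get)


# Hypothetical fidelity values of three species to two units.
fidelity = {
    "unit_1": {"species_a": 0.7, "species_b": 0.4, "species_c": 0.1},
    "unit_2": {"species_a": 0.0, "species_b": 0.1, "species_c": 0.8},
}

# Hypothetical relevé dominated by species_c.
releve = {"species_a": 2.0, "species_b": 5.0, "species_c": 70.0}

# Presence-only version of the same relevé (every cover set to 1).
releve_presence = {sp: 1.0 for sp in releve}

by_cover = assign(releve, fidelity)
by_presence = assign(releve_presence, fidelity)
print(by_cover, by_presence)
```

In this toy case the dominant species, which is faithful to the second unit, decides the cover-weighted assignment, whereas the presence-only sum is driven by the two sparse species and selects the first unit.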

We also found that using group-equalized fidelity indices led to better results in the TFVI framework, both in terms of predictive performance and quality of the resulting classification, compared to the use of non-equalized fidelity indices. Diagnostic values analysis using non-equalized indices is biased towards common species (e.g., Chytrý et al. 2002; Tichý and Chytrý 2006). Group equalization allows assessing diagnostic value independently of the size of the data set and of the size of the target site group, resulting in a better treatment of species rarity in fidelity calculations (Tichý and Chytrý 2006). In the case of the phi coefficient, another advantage of group equalization is that for each species, the order of its relative frequencies within different vegetation units is the same as the order of its fidelities to those vegetation units (Tichý and Chytrý 2006). Our results emphasize the importance of group size equalization not only for diagnostic value calculations but also for assignments of relevés based on fidelity values.
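The ordering property of the group-equalized phi coefficient can be checked numerically. The sketch below uses the standard 2 × 2 formulation of phi and equalizes groups by rescaling every group to the same size while keeping relative frequencies fixed; the group sizes and occurrence counts are hypothetical, and the function names are ours.

```python
from math import sqrt


def phi(n_p, N_p, n, N):
    """Phi coefficient between a species and a target group.

    N: total relevés, N_p: relevés in the target group,
    n: occurrences of the species, n_p: occurrences in the target group.
    """
    return (N * n_p - n * N_p) / sqrt(n * N_p * (N - n) * (N - N_p))


def phi_by_group(occ, size, equalize=False):
    """Phi of one species to each group, optionally group-equalized."""
    N = sum(size.values())
    if equalize:
        s = N / len(size)  # common group size after equalization
        # Rescale occurrences so relative frequencies stay unchanged.
        occ = {g: occ[g] / size[g] * s for g in size}
        size = {g: s for g in size}
    n = sum(occ.values())
    return {g: phi(occ[g], size[g], n, N) for g in size}


# Hypothetical groups of unequal size and occurrences of one species.
size = {"A": 10, "B": 100, "C": 10}
occ = {"A": 6, "B": 50, "C": 0}  # relative frequencies 0.6, 0.5, 0.0

plain = phi_by_group(occ, size)
equalized = phi_by_group(occ, size, equalize=True)
for g in size:
    print(g, occ[g] / size[g], round(plain[g], 3), round(equalized[g], 3))
```

With these numbers the non-equalized phi ranks the large group B above the small group A although the species is relatively more frequent in A; after equalization the fidelities follow the order of the relative frequencies.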

TFVI assignments based on Indval preserved the initial number of groups regardless of the method used to produce the initial classification (i.e., modified TWINSPAN, k-means or PAM). Hence, TFVI based on Indval can be considered superior to other assignment rules in the sense that it does not produce strong alterations of the vegetation concepts in the original (unsupervised) classification. In addition, classifications obtained using assignments based on Indval resulted in the highest predictive performance. If assigning new relevés to existing vegetation units is the only usage of the assignment rule, Indval should be recommended because of its higher predictive power. However, if the assignment rule is applied for the refinement of a classification, other fidelity measures may also be suitable, because in this case having a good predictive performance may not be as important as improving the quality of the classification.

The choice of an evaluator index often implies a bias in the evaluation towards classification procedures that better match the concepts considered important in the conception of the evaluator index. Our quality evaluation results also showed that the ranking of evaluators is influenced by the type of vegetation data (i.e., incidence- or abundance-based). In our analysis, non-geometric evaluators (i.e., ISAMIC and Morisita) as well as incidence-based evaluators indicated that classifications obtained using correlation measures, followed by FPFI, were better than IndVal-based reassignments. Since the calculation of non-geometric and incidence-based evaluators is also based on frequency values, it can be said that the results of these evaluators were biased towards classification solutions obtained using phi coefficients such as TFVI based on r and FPFI. A similar bias could be argued towards abundance-based site-group association measures when using abundance-based evaluators. We found Indval to perform very well with abundance-based evaluators, but this was not the case for other abundance-based measures such as IndVal.

In terms of the overall quality of the resulting classification, our results indicate that the TFVI framework works better when the chosen fidelity measure is either Indval or r. Indeed, comparisons based on all evaluators indicated that TFVI based on \({Indval}_{ind}^{g}\) was the first option, followed by r and FPFI. There are two differences between Indval and r. One is that Indval uses abundance data for the calculation of fidelity values, whereas r does not (note, however, that the TFVI framework uses species abundance values for assignments regardless of the fidelity measure). The second difference is that Indval does not take into account species absences outside the target site group. The fact that absences outside the target site group contribute to the strength of association in \(r_{\varphi }^{g}\) suggests a potential overestimation of the fidelity value (De Cáceres et al. 2008). In this sense, De Cáceres and Legendre (2009) mentioned that indicator value indices have the advantage, compared to correlation indices, of being less dependent on the context of fidelity determination.
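The second difference can be made concrete with a small numerical sketch. For simplicity it uses the presence-absence indicator value in its square-root form and the non-equalized phi coefficient rather than the \({Indval}_{ind}^{g}\) and \(r_{\varphi }^{g}\) variants used in our analysis; the counts are hypothetical. Only the number of relevés outside the target group in which the species is absent is changed.

```python
from math import sqrt


def indval_pa(n_p, N_p, n):
    """Presence-absence indicator value, sqrt(specificity * fidelity).

    Specificity n_p / n and fidelity n_p / N_p do not involve the number
    of relevés outside the target group where the species is absent.
    """
    return sqrt((n_p / n) * (n_p / N_p))


def phi(n_p, N_p, n, N):
    """Non-equalized phi coefficient; N is the total number of relevés."""
    return (N * n_p - n * N_p) / sqrt(n * N_p * (N - n) * (N - N_p))


# Hypothetical species: 10 occurrences, 8 of them in a target group of 10.
n_p, N_p, n = 8, 10, 10

# Same occurrences, but the data set grows from 30 to 300 relevés by
# adding relevés outside the target group that lack the species.
small = (indval_pa(n_p, N_p, n), phi(n_p, N_p, n, 30))
large = (indval_pa(n_p, N_p, n), phi(n_p, N_p, n, 300))
print(small, large)
```

The indicator value is identical in both settings because it has no term for outside absences, whereas phi increases as such absences are added, which is the context dependence referred to above.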

5 Conclusion

While the results of our analysis based on two vegetation datasets from the Hyrcanian forests point towards a slight preference for Indval over r for relevé assignments, we acknowledge that additional studies are necessary, using both simulated and real datasets, before more conclusive recommendations can be made in favor of one or the other. Given its good performance in the context of the TFVI framework, one could ask whether Indval, being based on species abundances, should also be preferred over phi fidelity indices in the FPFI framework. Additional work is needed to test this hypothesis. It may well be the case that the preference for one site-group association measure or another depends on both the intended usage of the assignment rule (i.e., for assigning new relevés to an existing classification vs. refining an initial classification) and on the kind of vegetation considered (e.g., forests vs. grasslands, or species-poor vs. species-rich vegetation).