Meta-criteria have been widely used in phylogenetic analysis to define optimal parameter values, e.g. the incongruence length difference test to choose transition/transversion/gap costs (Wheeler 1995), or jack-knife frequencies to evaluate whether concave-weighting parsimony outperforms linear parsimony (Goloboff et al. 2008).
In conservation biology, the results need a measure of confidence and robustness. A sensitivity analysis that randomly deletes part of the information helps to assess how well the data support a result, such as the persistence of a given area in the ranking. Therefore, jack-knifing is the appropriate tool to explore how the results behave under perturbations of the data set (Holmes 2003).
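A minimal Python sketch of this persistence idea follows. It assumes a hypothetical additive index in which each species contributes a fixed weight to every area where it occurs; the species, weights, areas, and the helper `jackknife_persistence` are invented for illustration and are not taken from any real analysis or package. Each replicate deletes species-area records at random and checks whether each area keeps its original rank position.

```python
import random
from collections import Counter

# Hypothetical toy data (assumption): species -> (weight, areas where it occurs).
# The weight stands in for a species' contribution to a phylogenetic index.
SPECIES = {
    "sp1": (3.0, {"A", "B"}),
    "sp2": (2.0, {"A"}),
    "sp3": (1.0, {"B", "C"}),
    "sp4": (1.5, {"C"}),
}
AREAS = ["A", "B", "C"]


def area_scores(records):
    """Placeholder additive index: sum the weights of the species recorded in each area."""
    scores = {area: 0.0 for area in AREAS}
    for species, area in records:
        scores[area] += SPECIES[species][0]
    return scores


def ranking(scores):
    """Areas from highest to lowest score; ties are broken alphabetically."""
    return sorted(scores, key=lambda area: (-scores[area], area))


def jackknife_persistence(drop, replicates=1000, seed=1):
    """Share of replicates in which each area keeps its original rank position
    when every species-area record is deleted independently with probability `drop`."""
    rng = random.Random(seed)
    records = [(s, a) for s, (_, areas) in SPECIES.items() for a in sorted(areas)]
    reference = ranking(area_scores(records))
    kept = Counter()
    for _ in range(replicates):
        sample = [record for record in records if rng.random() >= drop]
        current = ranking(area_scores(sample))
        for position, area in enumerate(reference):
            if current[position] == area:
                kept[area] += 1
    return {area: kept[area] / replicates for area in reference}
```

Calling `jackknife_persistence(0.2)` returns, for each area, the share of replicates in which it stayed at its original position; values close to 1 indicate a ranking that is robust to missing records.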
In a phylogenetically based conservation analysis there are three different items to evaluate, matching the three inputs: the topology, the species in a given topology, and the distribution of each species.
The first question concerns the distributional pattern of the species: what if a locality (and therefore all or some of the species in that area) is not included in the analysis? A species may be absent from a given locality for three reasons: (1) it was never present there; (2) it is locally extinct; or (3) it was not sampled, although it is present in the area. To evaluate this situation, the species can be deleted from a number of areas to quantify the effect of missing information.
The second question arises when a species included in the phylogenetic analysis is not considered in the conservation analysis: what if a species is not included? Omitting a species affects the index value, because the index depends on the species included in the calculation. In this case, the species is deleted from all the areas it inhabits.
The third question arises when a given phylogeny is left out: what if a phylogeny is not included? The whole topology might not be available for the conservation analysis, and the ranking of a specific area could depend on a limited subset of phylogenies. Here, the topology, and therefore its species and their distributions, are deleted.
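The three kinds of missing information can be sketched as three deletion operations on a hypothetical data layout in which each topology maps its terminal species to the set of areas where they occur. The layout and function names are assumptions made for illustration. For the second question, the sketch keeps the tip in the topology with an empty distribution, which is one possible reading of "deleted from all the areas it inhabits".

```python
from copy import deepcopy

# Hypothetical input (assumption): topology -> {species: set of areas where it occurs}.
DATA = {
    "T1": {"sp1": {"A", "B"}, "sp2": {"A"}},
    "T2": {"sp3": {"B", "C"}, "sp4": {"C"}},
}


def drop_areas_from_species(data, topology, species, areas):
    """Question 1: the species is missing from some of the areas it inhabits."""
    new = deepcopy(data)
    new[topology][species] -= set(areas)
    return new


def drop_species(data, topology, species):
    """Question 2: the species is missing from every area it inhabits.
    The tip stays in the topology with an empty distribution (an assumption)."""
    new = deepcopy(data)
    new[topology][species] = set()
    return new


def drop_topology(data, topology):
    """Question 3: the phylogeny, its species and their distributions are all missing."""
    new = deepcopy(data)
    del new[topology]
    return new
```

Each function returns a modified copy, so the original data set can be reused across replicates.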
Given these three questions, we can decide whether a phylogeny, a taxon, or an area is deleted, each with its own probability:
j.topol is the probability to choose a topology (= p)
j.tip is the probability to choose a species (= q)
is the probability to choose an area (= r)
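Because the choices are nested (an area is chosen only within a chosen species, and a species only within a chosen topology), the probability that a particular element is deleted is the product of the probabilities along that path. The three scenarios below can be summarized in this notation:

```latex
\begin{aligned}
\Pr(\text{a given area is deleted from a species' distribution}) &= p\,q\,r, && 0 < p, q, r < 1,\\
\Pr(\text{a given species is deleted from its topology}) &= p\,q, && 0 < p, q < 1,\ r = 1,\\
\Pr(\text{a given topology is deleted}) &= p, && 0 < p < 1,\ q = r = 1.
\end{aligned}
```

For example, with p = q = r = 0.5 each species-area record is deleted with probability 0.5 × 0.5 × 0.5 = 0.125, so in a hypothetical data set of 40 records about 5 are expected to be deleted per replicate.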
In the first scenario, an area is deleted from the distribution of a species with probability p × q × r (0 < p, q, r < 1), that is, the probability of selecting the topology, then the species, and then the area. Alternatively, an area can be removed from the whole analysis; this needs to be run only as many times as there are areas, eliminating a single area each time. This shows the position of the area in the ranking of areas, and is equivalent to deleting the area from the final results.
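The leave-one-area-out alternative can be sketched in Python under one explicit assumption: the index is additive and each species' weight is fixed by the topology, so an area's score does not depend on which other areas are analysed. The weights, areas, and helper names (`rank_areas`, `leave_one_area_out`) are hypothetical. Under that assumption, the loop checks that dropping an area gives the same ranking as deleting it from the full results; this need not hold for indices in which an area's value depends on the other areas.

```python
# Hypothetical species weights, assumed fixed by the topology and independent of the areas.
WEIGHTS = {"sp1": 3.0, "sp2": 2.0, "sp3": 1.0, "sp4": 1.5}
# Hypothetical distributions: area -> species recorded there.
AREAS = {
    "A": {"sp1", "sp2"},
    "B": {"sp1", "sp3"},
    "C": {"sp3", "sp4"},
    "D": {"sp4"},
}


def rank_areas(areas):
    """Rank areas by a placeholder additive index (sum of species weights)."""
    scores = {area: sum(WEIGHTS[s] for s in species) for area, species in areas.items()}
    return sorted(scores, key=lambda area: (-scores[area], area))


def leave_one_area_out(areas):
    """Run the analysis once per area, each time without that area."""
    return {
        removed: rank_areas({a: spp for a, spp in areas.items() if a != removed})
        for removed in areas
    }


full_ranking = rank_areas(AREAS)  # ['A', 'B', 'C', 'D']
for removed, reduced in leave_one_area_out(AREAS).items():
    # Under the additive assumption, removing an area from the analysis
    # equals deleting it from the full ranking.
    assert reduced == [area for area in full_ranking if area != removed]
```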
In the second scenario, a species is deleted from a single topology with probability p × q (0 < p, q < 1, r = 1.0); therefore, the species is removed from all the areas it inhabits.
In the third scenario, the whole topology is excluded from the analysis with probability p (0 < p < 1, q = r = 1.0); all the species belonging to that topology, and their distributions, are excluded as well.
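The three scenarios can be expressed as a single nested sampling procedure. The following is a hypothetical Python sketch, not the implementation of any package: the data layout and the helper names `jackknife_replicate` and `deletion_rate` are invented here. The empirical deletion rate of one species-area record should approach p × q × r, p × q, and p in the three scenarios.

```python
import random

# Hypothetical input (assumption): topology -> {species: set of areas}.
DATA = {
    "T1": {"sp1": {"A", "B"}, "sp2": {"A"}},
    "T2": {"sp3": {"B", "C"}, "sp4": {"C"}},
}


def jackknife_replicate(data, p, q, r, rng):
    """One replicate of nested deletion: choose each topology with probability p,
    each species of a chosen topology with probability q, and each area of a
    chosen species with probability r; chosen elements are deleted."""
    replicate = {}
    for topology, species_map in data.items():
        if rng.random() >= p:                 # topology not chosen: keep it intact
            replicate[topology] = {s: set(a) for s, a in species_map.items()}
            continue
        if q == 1.0 and r == 1.0:             # scenario 3: drop the whole topology
            continue
        kept = {}
        for species, areas in species_map.items():
            if rng.random() >= q:             # species not chosen: keep its areas
                kept[species] = set(areas)
            else:                             # scenarios 1 and 2 (r = 1 drops every area)
                kept[species] = {a for a in sorted(areas) if rng.random() >= r}
        replicate[topology] = kept
    return replicate


def deletion_rate(p, q, r, replicates=20000, seed=0):
    """Empirical frequency with which species sp1 (in T1) loses area B."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(replicates):
        rep = jackknife_replicate(DATA, p, q, r, rng)
        if "B" not in rep.get("T1", {}).get("sp1", set()):
            lost += 1
    return lost / replicates
```

For example, `deletion_rate(0.5, 0.5, 0.5)` should be close to 0.125, `deletion_rate(0.5, 0.5, 1.0)` close to 0.25, and `deletion_rate(0.5, 1.0, 1.0)` close to 0.5.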
In all three scenarios, the first decision is made on the topology. Because the number of topologies NOT included increases with p, the absolute index values become smaller as p increases.
Areas prioritized because of their position in a single topology or in just a few topologies would be affected: their index values would be lower, and their position in the ranking might change. If an area is supported by all or most of the topologies, its position in the ranking should be stable, even though its index value would be small in all the replicates. Therefore, the index values per se are meaningless, but the ranking is informative.
There is a fourth question, not considered here, related to branch lengths. This question is relevant for Phylogenetic Diversity [PD] (Faith 1992), Genetic Diversity [GD] (Crozier 1992), or total lineage divergence (Scheiner 2012) [a metric similar to PD]. These methods require precise estimates of branch lengths, so the accuracy of the index value depends heavily on how well the lengths are estimated.
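To illustrate why a length-based index is sensitive to branch-length estimates, here is a minimal Python sketch of PD in its rooted form (the root is included, which is one common convention). The tree, its branch lengths, and the two areas are hypothetical: changing a single internal branch from 2.0 to 0.5 is enough to reverse the order of the two areas.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "sp1": ("n1", 1.0),
    "sp2": ("n1", 1.0),
    "n1": ("root", 2.0),
    "sp3": ("root", 3.0),
}
# Same topology with a different (hypothetical) estimate for the internal branch.
TREE_ALT = dict(TREE, n1=("root", 0.5))


def phylogenetic_diversity(tree, species):
    """Rooted PD: total length of the branches linking the species to the root,
    counting each shared branch once."""
    branches = set()
    for node in species:
        while node in tree:          # the root has no entry, so the climb stops there
            branches.add(node)       # a branch is identified by the node below it
            node = tree[node][0]
    return sum(tree[node][1] for node in branches)


AREA_X, AREA_Y = {"sp1", "sp2"}, {"sp3"}
for tree in (TREE, TREE_ALT):
    print(phylogenetic_diversity(tree, AREA_X), phylogenetic_diversity(tree, AREA_Y))
# 4.0 3.0
# 2.5 3.0
```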
Although Krajewski (1994) considers the use and calculation of divergence in systematics and in conservation to be two separate issues, I consider that the same criticisms of the accuracy of branch-length estimation in systematics will have a profound impact on the decisions made when the topology and its branch lengths are used in conservation. As Brown et al. (2010) state, “in any phylogenetic analysis, the biological plausibility of branch-length output must be carefully considered”. Therefore, we must be well aware of the methodological approach used to construct the phylogeny (Rannala et al. 2012).
Additionally, in some cases we must consider the sensitivity of PD values to intraspecific variation (Albert et al. 2012). Therefore, we must take into account the source of the tree (species trees vs. gene trees) [see, for example, Spinks and Shaffer (2009)].