Measuring temporal trends in biodiversity

In 2002, nearly 200 nations signed up to the 2010 target of the Convention on Biological Diversity, ‘to significantly reduce the rate of biodiversity loss by 2010’. To assess whether the target was met, it became necessary to quantify temporal trends in measures of diversity. This resulted in a marked shift in focus for biodiversity measurement. We explore the developments in measuring biodiversity that were prompted by the 2010 target. We consider measures based on species proportions, and also explain why a geometric mean of relative abundance estimates was preferred to such measures for assessing progress towards the target. We look at the use of diversity profiles, and consider how species similarity can be incorporated into diversity measures. We also discuss measures of turnover that can be used to quantify shifts in community composition arising, for example, from climate change.

Denote by $N_s$ the number of individuals of species $s$ in a community of $S$ species, and by $p_s = N_s / \sum_{t=1}^{S} N_t$ the corresponding species proportion, so that $\sum_{s=1}^{S} p_s = 1$. The two classical measures most favoured are the Shannon index (also called Shannon entropy, Shannon 1948), $H = -\sum_{s=1}^{S} p_s \log p_s$, and Simpson's index (Simpson 1949), $D = \sum_{s=1}^{S} p_s^2$. Rényi (1961) entropy generalizes Shannon entropy. Patil and Taillie (1982) formulated diversity measures in terms of average rarity of the species in a community. They showed that, with different measures of rarity, the Shannon and Simpson's indices are both special cases, as is species richness (i.e., the number of species in the community). Rényi (1961) and Patil and Taillie (1982) also developed axioms that a diversity measure should meet.
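As a minimal sketch of the classical measures defined above, the following computes species richness, the Shannon index, and Simpson's index from a vector of abundances. The function names and the example abundances are illustrative assumptions, not taken from any cited work.

```python
import math

def species_proportions(abundances):
    """Convert abundances N_s into proportions p_s that sum to one."""
    total = sum(abundances)
    return [n / total for n in abundances]

def shannon_index(p):
    """H = -sum_s p_s log p_s, with absent species (p_s = 0) contributing zero."""
    return -sum(ps * math.log(ps) for ps in p if ps > 0)

def simpson_index(p):
    """D = sum_s p_s^2."""
    return sum(ps ** 2 for ps in p)

def species_richness(p):
    """Number of species with non-zero proportion."""
    return sum(1 for ps in p if ps > 0)

abundances = [50, 30, 15, 5]  # hypothetical abundances for illustration only
p = species_proportions(abundances)
print(species_richness(p), shannon_index(p), simpson_index(p))
```

For a perfectly even community of four species, the Shannon index equals log 4 and Simpson's index equals 1/4; any unevenness lowers the former and raises the latter.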
The focus for classical measures, such as the Shannon and Simpson's indices, is typically on comparing two communities, and measuring which is the more diverse. This largely academic exercise was found wanting when, in 2002, the Convention on Biological Diversity (CBD) set its 2010 target, 'to significantly reduce the rate of biodiversity loss by 2010'. Because nations signed up to this commitment, methods were needed to quantify temporal trends in biodiversity of regions (nations). This raised several issues.
An obvious issue is that, if all species in a community are declining at the same rate, the species proportions remain the same (i.e., $p_{sj} = p_s$ independent of year $j$), so that classical measures based on species proportions show no change. That is, they do not measure changes in abundance.
When interest is in regional diversity, the $N_s$ (which now represent the number of individuals of species $s$ in the region) are unknown. At best, they can be estimated from survey data from a relatively small sample of sites. Often, this is not possible, but time series of data exist that are believed to reflect temporal trends in abundance. In this case, we can estimate the abundance of species $s$ in year $j$ relative to year 1: $R_{sj} = N_{sj}/N_{s1}$ for $s = 1, \ldots, S$, $j = 1, \ldots, J$, where $N_{sj}$ is the abundance of species $s$ in year $j$. Assuming that we have a time series of counts $n_{sj}$ with expectation $E(n_{sj}) = \alpha_s N_{sj}$, where $\alpha_s$ is an unknown constant for species $s$, we can estimate $R_{sj}$ by $\hat{R}_{sj} = n_{sj}/n_{s1}$. We cannot then estimate the species proportions (unless $\alpha_s = \alpha$ for all $s$), and hence cannot evaluate the classical biodiversity measures, but we can use, as an index of biodiversity in year $j$, the geometric mean of the $\hat{R}_{sj}$ for $s = 1, \ldots, S$: $G_j = \exp\left(\frac{1}{S}\sum_{s=1}^{S} \log \hat{R}_{sj}\right)$. The Living Planet Index (Loh et al. 2005) is a high-profile example of an index that takes advantage of this approach. The properties of $G_j$ as a measure of biodiversity are explored by Buckland et al. (2005, 2011) and Gregory and Strien (2010).
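The geometric mean index can be sketched in a few lines. The example below assumes a complete table of strictly positive counts, with one row per species and one column per year; the function name and the counts are hypothetical.

```python
import math

def geometric_mean_index(counts):
    """Geometric mean of relative abundances, one value per year.

    counts[s][j] is the count for species s in year j, with index 0 the
    baseline year. All counts are assumed to be strictly positive.
    """
    n_species = len(counts)
    n_years = len(counts[0])
    index = []
    for j in range(n_years):
        # log of the estimated relative abundance n_sj / n_s1 for each species
        log_ratios = [math.log(series[j] / series[0]) for series in counts]
        index.append(math.exp(sum(log_ratios) / n_species))
    return index

# Hypothetical counts: two species halve each year, one is stable.
counts = [
    [100, 50, 25],
    [20, 20, 20],
    [40, 20, 10],
]
G = geometric_mean_index(counts)
print(G)
```

By construction the index equals one in the baseline year, and each species contributes equally regardless of how abundant it is.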
When a biodiversity survey of a region is conducted according to a randomized design (Buckland et al. 2012), and data are collected to allow estimation of detectability, then much more can be done than merely plotting the index $G_j$ against year $j$:
• We can explore how temporal trends vary spatially (Harrison et al. 2014).
• We can use generalized measures based on species proportions to explore how temporal trends change, as we weight the measure towards the more common species or the more scarce species in the community.
• We can explore changes in the relative dominance of species by adopting suitable turnover measures, which, for example, allow assessment of the effects of climate change on biodiversity, and how these effects vary spatially.
• We can also incorporate changes in absolute abundance into a turnover measure (Shimadzu et al. 2015).
• Provided we can measure how similar two species are, we can ensure that those species that contribute most to diversity in terms of their genetic distinctness from other species also contribute most to our measure (Leinster and Cobbold 2012).
We present here a review of metrics and indices that allow us to exploit biodiversity data from regional surveys. We start with the geometric mean of relative abundances, which quantifies the average trend in relative abundance across species. Diversity profiles allow trends to be explored when rare species are given greater or lesser weight. Turnover measures allow changes in species composition of a community to be quantified, as distinct from changes in diversity of the community. We also argue in favour of measures that quantify both temporal and spatial variability in diversity. Finally, we introduce measures considering the similarity between species.

The geometric mean
The geometric mean index $G_j$ is widely used for assessing temporal trends in biodiversity. Buckland et al. (2011) showed that, if the total number of species $S$ is constant, $G_j < G_1 = 1$ if and only if the mean of the log abundances in year $j$ is less than the mean of the log abundances in year 1. Furthermore, if overall abundance $N_j = \sum_{s=1}^{S} N_{sj}$ is constant, then $G_j < G_1$ if and only if the mean of the log species proportions in year $j$ is less than the mean of the log species proportions in year 1.
The mean of the log species proportions has the key property of an evenness measure, in that it attains its maximum value when all the species proportions are equal: $p_{sj} = 1/S$ for all $s$ (Smith and Wilson 1996). Thus, when overall abundance and number of species are constant, $G_j$ may be regarded as a measure of the change in evenness from year 1 to year $j$. When overall abundance is changing, changes in $G_j$ reflect changes in both abundance and evenness. If all species are declining at the same rate (so that there is no trend in evenness), then $G_j$ will decline at this rate. By contrast, measures that are functions of species proportions alone will show no trend.
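The contrast between the two kinds of measure can be checked numerically. In this sketch, every species in a hypothetical community declines by 30 %; the geometric mean of relative abundances falls to 0.7, while the Shannon index, which depends on proportions alone, is unchanged. The abundances and function names are illustrative assumptions.

```python
import math

def shannon_index(abundances):
    """Shannon index computed from abundances via species proportions."""
    total = sum(abundances)
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)

def geometric_mean_ratio(baseline, current):
    """Geometric mean across species of current / baseline abundance."""
    logs = [math.log(c / b) for b, c in zip(baseline, current)]
    return math.exp(sum(logs) / len(logs))

year_1 = [200.0, 100.0, 50.0]          # hypothetical abundances in year 1
year_j = [0.7 * n for n in year_1]     # every species declines by 30 %

g_j = geometric_mean_ratio(year_1, year_j)
print(g_j, shannon_index(year_1), shannon_index(year_j))
```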
The index $G_j$ is unaffected if detectability varies by species, as it is based on within-species trends; if detectability of individuals of a given species does not change over time, we do not need to estimate detectability to avoid bias, regardless of whether detectability varies among species. By contrast, measures based on species proportions are biased when detectability varies by species, unless counts are corrected using species-specific estimates of detectability. However, if there is a temporal trend in detectability, it becomes important to correct for detectability if we use $G_j$.
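A small numerical sketch of this point follows, under the assumption that expected counts equal true abundance multiplied by a species-specific detectability that is constant over time. The abundances and detectabilities are invented for illustration.

```python
import math

def geometric_mean_index(series_by_species):
    """Geometric mean of within-species ratios to the first year, per year."""
    n_species = len(series_by_species)
    n_years = len(series_by_species[0])
    return [
        math.exp(sum(math.log(s[j] / s[0]) for s in series_by_species) / n_species)
        for j in range(n_years)
    ]

def proportions(values):
    total = sum(values)
    return [v / total for v in values]

true_abundance = [[1000.0, 800.0], [500.0, 600.0]]  # hypothetical N_sj
detectability = [0.9, 0.2]                          # hypothetical alpha_s, constant in time
expected_counts = [
    [alpha * n for n in series]
    for alpha, series in zip(detectability, true_abundance)
]

g_true = geometric_mean_index(true_abundance)
g_counts = geometric_mean_index(expected_counts)

# Year-1 species proportions from true abundances and from uncorrected counts.
p_true = proportions([series[0] for series in true_abundance])
p_counts = proportions([series[0] for series in expected_counts])
print(g_true, g_counts, p_true, p_counts)
```

The detectability constants cancel in each within-species ratio, so the two index series agree, whereas the proportions computed from uncorrected counts overstate the share of the more detectable species.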
Because $G_j$ is based on within-species trends, standardized to a baseline year, it makes no difference whether we use counts of individuals or biomass to quantify abundance, provided there is no trend over time within species in mean weight of individuals. Further, we can readily combine trends obtained from different surveys: the geometric mean is a natural method to adopt when we wish to construct composite indices across surveys, regions, or communities.
A consequence of the above two advantages is that we can combine relative abundance trends from surveys that use different units of measurement. For example, trends in a plant species might be quantified using percent cover, those for a bird species using counts, and those for a fish species using biomass. We can legitimately combine these different trends into a composite index. The different time series also do not need to span the same time period; if a given time series starts in year k > 1, we simply rescale the relative abundance estimates from that series, so that the value in year k matches the value of the composite index in year k.
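The rescaling step for a series that starts late can be sketched as follows. This is one possible reading of the procedure, assuming that the composite index in year k is computed from the series already present and that the new series is then anchored to that value; the data and function names are hypothetical.

```python
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def composite_index(series, n_years):
    """Composite geometric mean index over series with different start years.

    series is a list of (start, values) pairs, where values[0] is the
    observation in year index `start`. Units may differ between series.
    """
    rescaled = []  # one {year: rescaled value} dict per series already in the index
    index = []
    for j in range(n_years):
        active = [r[j] for r in rescaled if j in r]
        # Before any series has entered (j = 0), the index takes its baseline value 1.
        g_j = geometric_mean(active) if active else 1.0
        index.append(g_j)
        for start, values in series:
            if start == j:
                # Anchor the new series so that its first value equals the index in year j.
                rescaled.append(
                    {start + t: g_j * v / values[0] for t, v in enumerate(values)}
                )
    return index

series = [
    (0, [40.0, 36.0, 30.0]),     # plant species: percent cover
    (0, [200.0, 180.0, 150.0]),  # bird species: counts
    (1, [5.0, 5.0]),             # fish species: biomass, starting one year later
]
index = composite_index(series, n_years=3)
print(index)
```

Because a newly added series enters at the current index value, including it leaves the index in its first year unchanged; it influences the index only through its subsequent trend.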
A limitation of $G_j$ is that it cannot be calculated if any of the relative abundance estimates are zero. Thus, if a species is not recorded in a given year, the index cannot be evaluated. We could add a small quantity to zeros (O'Brien et al. 2010), but the index is sensitive to the quantity chosen, and has poor precision if rarely recorded species are included. Hence, typically, species with small sample sizes in some or all years are excluded from analysis. Thus, $G_j$ is not a useful measure if primary interest is in rare, or rarely recorded, species, or in species that are not consistently present in the community. However, most biodiversity measures perform poorly in such cases. The index $G_j$ gives equal weight to all species, and so is sensitive to changes in rarely recorded species, but is also adversely affected if lack of data on rarely recorded species results in imprecise and possibly biased estimates of trend; by contrast, indices based on species proportions are typically insensitive to changes in rarely recorded species, and largely unaffected by unreliable estimation of trends for such species. If we include rarely recorded species when calculating $G_j$, problems are reduced but not eliminated by developing a model for counts, and replacing the observed counts by the corresponding predicted counts before evaluating the index. A species that becomes too rare to monitor reliably can be dropped from the index, while a species that becomes sufficiently common to monitor can be added, so that the set of species monitored changes over time.

Diversity profiles
A single biodiversity measure cannot encapsulate the multivariate information in the data. For this reason, diversity profiles can be more useful (Tóthmérész 1995). If a measure is defined with a free parameter, then it can be plotted against that parameter, to show how the measure changes. For example, Hill's diversity family links species richness, the Shannon and Simpson's indices, and the Berger-Parker dominance measure via a free parameter λ, and allows investigators to construct a diversity profile of a community (Hill 1973). The Hill numbers refer to the value of the profile at specific values of the free parameter: λ = −∞ gives the reciprocal of the proportional abundance of the rarest species; λ = 0 gives the number of species (species richness); λ = 1 gives the exponential of the Shannon index; λ = 2 gives the reciprocal of Simpson's index; and λ = ∞ gives the reciprocal of the proportional abundance of the most common species. Jost (2006) showed the generality of the Hill numbers for quantifying diversity, where diversity is the effective number of species in a community (the number of equally abundant species that would be needed to give the same value of the diversity measure). Leinster and Cobbold (2012) introduced a general measure which incorporates species similarity; when similarities among species are all set to zero, their measure is equivalent to the Hill numbers. They further note that many existing measures are special cases of the Hill numbers or closely related to them. Chao et al. (2014a) also demonstrated how the Hill numbers can be used to unify measures of species diversity, phylogenetic diversity, and distance-based functional diversity. Chao et al. (2014b) unified two fundamental frameworks for the measurement and estimation of species diversity: Hill numbers and rarefaction/extrapolation. 
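A diversity profile based on the Hill numbers can be sketched as below, using the special cases listed above for orders 1 and ±∞. The function name and the example proportions are illustrative assumptions.

```python
import math

def hill_number(p, order):
    """Hill number (effective number of species) of the given order."""
    p = [ps for ps in p if ps > 0]
    if order == 1:
        # Limit as order -> 1: exponential of the Shannon index.
        return math.exp(-sum(ps * math.log(ps) for ps in p))
    if order == math.inf:
        # Reciprocal of the proportion of the most common species.
        return 1.0 / max(p)
    if order == -math.inf:
        # Reciprocal of the proportion of the rarest species.
        return 1.0 / min(p)
    return sum(ps ** order for ps in p) ** (1.0 / (1.0 - order))

p = [0.5, 0.3, 0.2]  # hypothetical species proportions
orders = [-math.inf, 0, 0.5, 1, 2, math.inf]
profile = [hill_number(p, lam) for lam in orders]
print(list(zip(orders, profile)))
```

The profile is non-increasing in the order, and for a perfectly even community every order returns the number of species, which is why such a profile is flat.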
Diversity profiles typically represent a trend from species richness towards the dominance of one species, thus combining different aspects of the species abundance distribution. For a perfectly even community with no dominant species, such a profile is flat. The family of divergence measures of Studeny et al. (2011) quantifies the relative contributions of dominant and rare species to unevenness. Thus, it can distinguish communities with a few highly dominant species from those where common species are fairly balanced in abundance, with the main contribution to unevenness coming from the rare species. We can then explore time trends for different choices of the free parameter, representing separate trends in dominant and rare species. We can also generate a three-dimensional plot, with time on the x-axis, the free parameter on the y-axis, and the measure on the z-axis, showing how the profile changes over time. Studeny et al. (2011) adapted the power divergence statistics of Cressie and Read (1984) for their divergence measure. Cowell (1980) used the same measure in econometrics. The Hill numbers have advantages: they can be interpreted as the effective number of species; they are members of a more general family of diversities (Leinster and Cobbold 2012; Chao et al. 2014a); and reduced-bias estimators are available (Chao and Jost 2015). Marcon et al. (2014a) showed that the Hill numbers are the deformed exponential of Tsallis entropy (Tsallis 1988), whereas they are the exponential of Rényi's entropy (Patil and Taillie 1982). The relation between Tsallis entropy and the Hill numbers is continuous and strictly increasing (Jost 2006), so a bijective relation exists between entropy and diversity. It can be shown algebraically that the measure of Studeny et al. is also closely related to Tsallis entropy.
Theil's (1967) entropy, used by economists, is the difference between the Shannon index (entropy) and its maximum value ($\log_e S$, which occurs for a perfectly even community). Up to a constant factor equal to $\frac{1}{\lambda} S^{\lambda-1}$, the measure of Studeny et al. of order $\lambda - 1$ is the difference between the Tsallis entropy of order $\lambda$ and its maximum value (the deformed logarithm of order $\lambda$ of the number of species). Thus, it generalizes the Theil entropy.

... time points, but abundances of individual species have changed, such measures show no change. If climate change causes some species to become more common while others become rarer, we require different measures to quantify change.
Additive partitioning of regional (gamma) diversity into local (alpha) and spatially varying (beta) diversity was proposed by Lande (1996). Jost (2007) proposed a multiplicative partitioning, such that beta diversity is the effective number of communities in the region. Chalmandrier et al. (2015) further explored the multiplicative approach, developing methodology for decomposing phylogenetic and functional diversity over space and time, and obtaining measurements of beta diversity which are independent of gamma diversity and alpha diversity. Tuomisto (2010a) noted that beta diversity has been used to refer to 'a wide variety of phenomena' and, to define the term, proposed a framework for the umbrella concept of beta diversity. A second paper addressed issues related to quantifying beta diversity from data (Tuomisto 2010b). Jost et al. (2010) reviewed the issue of compositional similarity, as it relates to beta diversity.
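The multiplicative partitioning can be illustrated for diversity of order 1 (the exponential of the Shannon index). The sketch below assumes equally weighted communities, which is a simplifying assumption made here for illustration; the function names and example communities are hypothetical.

```python
import math

def shannon_entropy(p):
    return -sum(ps * math.log(ps) for ps in p if ps > 0)

def multiplicative_partition(communities):
    """Return (alpha, beta, gamma) of order 1 for equally weighted communities.

    communities is a list of species-proportion vectors over a common species list.
    """
    k = len(communities)
    n_species = len(communities[0])
    # Regional (pooled) proportions under equal community weights.
    pooled = [sum(c[s] for c in communities) / k for s in range(n_species)]
    alpha = math.exp(sum(shannon_entropy(c) for c in communities) / k)
    gamma = math.exp(shannon_entropy(pooled))
    return alpha, gamma / alpha, gamma

disjoint = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
identical = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
print(multiplicative_partition(disjoint))
print(multiplicative_partition(identical))
```

Two communities with no species in common give a beta diversity of two effective communities, while two identical communities give a beta diversity of one.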
Beta diversity is concerned with spatial turnover. In the context of CBD targets, we are interested in temporal turnover, and how that varies spatially. Anderson et al. (2011) showed how beta diversity measures can be used to quantify turnover in community structure through time. However, unlike time, space has no natural order (Dornelas et al. 2013); temporal measures that change if time is reversed are not appropriate measures of beta diversity.
An example of a spatial turnover measure is Bray-Curtis dissimilarity (Bray and Curtis 1957; Gower and Legendre 1986), which is a simple index for quantifying the dissimilarity between two communities A and B. When applied to species abundances $N_{sA}$ and $N_{sB}$, it is given by $1 - \frac{2\sum_{s=1}^{S} \min(N_{sA}, N_{sB})}{\sum_{s=1}^{S} (N_{sA} + N_{sB})}$; when applied to species proportions, this becomes $1 - \sum_{s=1}^{S} \min(p_{sA}, p_{sB})$. It thus takes the value zero when two communities have the same composition, and the value one when no species are in common to the two communities. A measure of difference between two communities can also be applied to quantify change between two time points of a single community. Carranza et al. (2007) considered temporal turnover, quantifying temporal change in land cover using Rényi's generalized entropy function (1961).
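Bray-Curtis dissimilarity in its standard form can be sketched as follows, for abundances or for proportions. The two example communities are invented for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance (or proportion) vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1.0 - 2.0 * shared / (sum(a) + sum(b))

def proportions(abundances):
    total = sum(abundances)
    return [n / total for n in abundances]

site_a = [10, 20, 0, 5]   # hypothetical abundances in community A
site_b = [5, 20, 10, 0]   # hypothetical abundances in community B

p_a, p_b = proportions(site_a), proportions(site_b)
print(bray_curtis(site_a, site_b), bray_curtis(p_a, p_b))
```

When the inputs are proportions, both vectors sum to one, so the expression reduces to one minus the sum of the species-wise minima.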
Many turnover measures are based on recorded range changes, so that the measures reflect changes in the species present in a community. This is unsatisfactory when temporal biodiversity changes within regions are of interest. First, at the regional level, extinctions and colonizations tend to be relatively rare events, and second, regional surveys tend to sample a very small proportion of the region, so that there is considerable uncertainty over when extinction and colonization events occur. Measures based on the changes that each species shows in its species proportion between two time points are more useful in this context.
Denote the species proportion vector in year $j$ by $p_j = (p_{1j}, \ldots, p_{Sj})$ and the turnover measure between year $j_1$ and year $j_2$ by $d(p_{j_1}, p_{j_2})$. Yuan et al. (2016) argued that we would like the measure to be a metric, with the following properties:
1. Positive definiteness: $d(p_{j_1}, p_{j_2}) > 0$ for any $p_{j_1} \neq p_{j_2}$, and $d(p_{j_1}, p_{j_2}) = 0$ if and only if $p_{j_1} = p_{j_2}$.
2. Symmetry: $d(p_{j_1}, p_{j_2}) = d(p_{j_2}, p_{j_1})$ for any $p_{j_1} \neq p_{j_2}$.
3. Triangle inequality: $d(p_{j_1}, p_{j_3}) \le d(p_{j_1}, p_{j_2}) + d(p_{j_2}, p_{j_3})$ for any $p_{j_1}$, $p_{j_2}$, and $p_{j_3}$.
Further properties proposed by Yuan et al. are: scale invariance, which ensures, for example, that $d(p_{j_1}, p_{j_2}) = d(N_{j_1}, N_{j_2})$, where the abundance vector $N_j = (N_{1j}, \ldots, N_{Sj})$ and $p_j = N_j / \sum_{s=1}^{S} N_{sj}$; invariance to coordinate scaling, so that, for example, $d(p_{j_1}, p_{j_2}) = d(n_{j_1}, n_{j_2})$, where $n_j$ is the vector of species counts in year $j$, with equality holding even if individuals of one species are more detectable than individuals of another; and permutation invariance, so that turnover measures are unchanged if the species ordering is changed. Yuan et al. (2016) considered the following families of measures, and listed which of the above properties each measure satisfies.

The $L_q$-distance measure
The $L_q$-distance measure is $d_q(p_{j_1}, p_{j_2}) = \left(\sum_{s=1}^{S} |p_{sj_1} - p_{sj_2}|^q\right)^{1/q} \Big/ \left(\sum_{s=1}^{S} p_{sj_1}^q + \sum_{s=1}^{S} p_{sj_2}^q\right)^{1/q}$. Its range spans [0,1], where 0 corresponds to no turnover, while 1 corresponds to 100 % turnover. It is a metric only if $q = 1$: $d_1(p_{j_1}, p_{j_2}) = \frac{1}{2}\sum_{s=1}^{S} |p_{sj_1} - p_{sj_2}| = 1 - \sum_{s=1}^{S} \min(p_{sj_1}, p_{sj_2})$, which is the Bray-Curtis dissimilarity (Bray and Curtis 1957) between the two time points, when applied to species proportions. Another special case of interest is $q = 2$, when the numerator is the Euclidean distance between the two species proportion vectors.
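A sketch of an $L_q$-type turnover measure follows. The normalised form used here, $\left(\sum_s |p_{sj_1} - p_{sj_2}|^q\right)^{1/q} / \left(\sum_s p_{sj_1}^q + \sum_s p_{sj_2}^q\right)^{1/q}$, is an assumption chosen to agree with the stated properties (range [0,1], Bray-Curtis dissimilarity at q = 1, Euclidean numerator at q = 2); it should be checked against Yuan et al. (2016) before use. The example proportions are hypothetical.

```python
def lq_turnover(p1, p2, q):
    """Normalised L_q distance between two species-proportion vectors (assumed form)."""
    numerator = sum(abs(a - b) ** q for a, b in zip(p1, p2)) ** (1.0 / q)
    denominator = (sum(a ** q for a in p1) + sum(b ** q for b in p2)) ** (1.0 / q)
    return numerator / denominator

year_1 = [0.5, 0.3, 0.2, 0.0]  # hypothetical proportions at the first time point
year_2 = [0.2, 0.3, 0.1, 0.4]  # hypothetical proportions at the second time point

d1 = lq_turnover(year_1, year_2, 1)
d2 = lq_turnover(year_1, year_2, 2)
print(d1, d2)
```

With q = 1 the denominator equals two, so the measure reduces to half the sum of absolute differences in proportions; with no species in common, the measure equals one for any q.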

Pairwise angular measure
This measure is undefined if a species is absent at both time points, or if more than one species is absent at one of the time points. A value of zero corresponds to no turnover, and one is an upper bound for the measure, but if there are more than two species, it cannot attain this upper bound.

Scaled centred-logratio measure
This is the only measure considered by Yuan et al. (2016) that satisfies all six properties listed above. However, it is undefined if any species proportion is zero, and it can take any non-negative real value (with zero corresponding to no turnover), while most measures span the interval from zero (no turnover) to one (100 % turnover).

Kullback-Leibler divergence measure
The Kullback-Leibler divergence measure may be expressed as $\sum_{s=1}^{S} p_{sj_1} \log\left(p_{sj_1}/p_{sj_2}\right)$. This is asymmetric: if time is reversed, the value changes. We obtain a symmetric measure by taking the sum of the contribution to the measure by each species, averaged over the measure and its time-reversed equivalent: $\frac{1}{2}\sum_{s=1}^{S} (p_{sj_1} - p_{sj_2}) \log\left(p_{sj_1}/p_{sj_2}\right)$, which may be expressed as $-\frac{1}{2}(H_{j_1} + H_{j_2}) - \frac{1}{2}\sum_{s=1}^{S} \left(p_{sj_1} \log p_{sj_2} + p_{sj_2} \log p_{sj_1}\right)$, where $H_j$ is the Shannon index in year $j$. As the Shannon index does not incorporate species identity, the information on turnover is entirely within the final sum. This measure is also undefined if any species proportions are zero, and it can take any non-negative real value, with zero corresponding to no turnover.
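The asymmetry of the Kullback-Leibler divergence, and the decomposition of its symmetrised version into Shannon indices plus a cross term, can be checked numerically. The sketch takes the symmetric measure to be the average of the divergence and its time-reversed equivalent, and uses hypothetical, strictly positive proportions.

```python
import math

def kullback_leibler(p, q):
    """sum_s p_s log(p_s / q_s); all proportions must be strictly positive."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def symmetric_kl(p, q):
    """Average of the divergence and its time-reversed equivalent."""
    return 0.5 * (kullback_leibler(p, q) + kullback_leibler(q, p))

def shannon_index(p):
    return -sum(a * math.log(a) for a in p)

year_1 = [0.5, 0.3, 0.2]  # hypothetical proportions
year_2 = [0.6, 0.3, 0.1]

forward = kullback_leibler(year_1, year_2)
backward = kullback_leibler(year_2, year_1)

# Decomposition: -(H_1 + H_2)/2 minus half the sum of cross terms.
cross_term = -0.5 * sum(
    a * math.log(b) + b * math.log(a) for a, b in zip(year_1, year_2)
)
decomposed = -0.5 * (shannon_index(year_1) + shannon_index(year_2)) + cross_term
print(forward, backward, symmetric_kl(year_1, year_2), decomposed)
```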

The asymmetric measure of Shimadzu et al.
Shimadzu et al. (2015) also proposed a measure that incorporates species proportions. They argued that, unlike for spatial turnover measures, temporal measures need not be symmetric. That is, if time is reversed, there is no imperative for the measure to be unaltered. Their measure can be expressed as the sum of two components. One component is a function of species proportions: $D_1(p_{j_1}, p_{j_2}) = -\sum_{s=1}^{S} p_{sj_1} \log\left(p_{sj_1}/p_{sj_2}\right)$. Thus, this component is (apart from sign) the Kullback-Leibler divergence measure. It is never positive, and takes the value zero only when there is no turnover. The second component, $D_2(\lambda_{j_1}, \lambda_{j_2})$, is a function of expected overall abundance of the $S$ species at two time points $j_1$ and $j_2$, where $\lambda_j = \sum_{s=1}^{S} \lambda_{sj}$ and $\lambda_{sj} = E(N_{sj})$. This component can take any real value. Thus, the measure $D_1(p_{j_1}, p_{j_2}) + D_2(\lambda_{j_1}, \lambda_{j_2})$ can also take any real value. In particular, it is not constrained to be positive. The measure is zero when both components are zero, which occurs when both $p_{j_1} = p_{j_2}$ and $\lambda_{j_1} = \lambda_{j_2}$ (i.e., no change). However, it is also zero if $D_1(p_{j_1}, p_{j_2}) < 0$ and $D_2(\lambda_{j_1}, \lambda_{j_2}) = -D_1(p_{j_1}, p_{j_2}) > 0$. Note that if species proportions stay the same, but abundance changes (i.e., all species have the same trend in abundance), then the measure gives negative turnover for decreasing abundance, and positive turnover for increasing abundance.

Which measure?
Given the large number of potential measures for quantifying temporal turnover, some guidance is, perhaps, needed. We do not favour using the asymmetric measure of Shimadzu et al. (2015). While we accept their argument that it is not essential for a measure of temporal turnover (as distinct from spatial turnover) to satisfy the symmetry property, nevertheless, we believe that interpretation of changes is helped using a symmetric measure. Furthermore, a measure that can give a zero estimate of turnover when increases in abundance offset changes in species proportions seems unsatisfactory.
If we wish our turnover measure to be sensitive to changes among rare species, then a pairwise measure should be the preferred choice. The pairwise angular measure satisfies properties 1-4, and is restricted to the range [0,1], but, in general, cannot attain a value of one, and cannot be interpreted as an absolute measure of turnover (for which zero should correspond to no turnover, and one to 100 % turnover, with no species in common between the two time points). The centred-log-ratio measure satisfies all six properties, but it has no upper limit, so is also best considered a relative measure of turnover.
If precision is more important than sensitivity to rare species, then the angular measure or the $L_q$-distance measure should be used. Both are absolute measures of turnover. For the latter measure, the choice $q = 2$ has the added advantage that it can be extended to incorporate species similarities (below). A choice of $q = 1$ gives a measure that is less sensitive to those species that show very big changes. This sensitivity is reduced further for choices of $q$ in the range (0,1) (Royden 1968).

Spatial variation in temporal trends
Another consequence of considering biodiversity trends of regions as distinct from sites is that there is interest in how the temporal trends vary across the region. To assess this, we need spatial models, allowing the density of each species to be estimated at any location in the region. This allows any of the above diversity or turnover measures to be evaluated at any location. This approach was used in conjunction with generalized additive models to assess how diversity trends (Harrison et al. 2014) and turnover of breeding birds varied by 100 km square throughout Great Britain. Yuan et al. (unpublished) used R-INLA (Rue et al. 2009) to fit spatio-temporal models to quantify how temporal changes in commercial fish diversity varied through the North Sea. Chalmandrier et al. (2015) proposed a multiplicative decomposition over space and time of the exponential of Shannon entropy. Their methods allow regions with relatively rapid temporal change, or with high beta diversity, to be identified.
Note the distinction between spatial variation in temporal trends of biodiversity and spatial trends. Leitão et al. (2015) develop Sparse Generalised Dissimilarity Modelling for quantifying spatial trends.

Incorporating species similarity
Leinster and Cobbold (2012) define a similarity matrix $Z$, where the element $z_{st}$ is the similarity between species $s$ and $t$, with $z_{ss} = 1$ and $0 \le z_{st} \le 1$. They do not assume that this matrix is symmetric. They then define diversity of order $q$ to be ${}^{q}D^{Z}(p) = \left(\sum_{s=1}^{S} p_s \, (Zp)_s^{\,q-1}\right)^{1/(1-q)}$, where $(Zp)_s = \sum_{t=1}^{S} z_{st} p_t$ is the expected similarity between an individual of species $s$ and an individual chosen at random. By taking limits, they also define the index for $q = 1$ and $q = \infty$.
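The similarity-sensitive diversity of Leinster and Cobbold (2012) can be sketched as below, using the standard form $\left(\sum_s p_s (Zp)_s^{q-1}\right)^{1/(1-q)}$ with its limits at q = 1 and q = ∞. The function name, example proportions, and similarity matrices are illustrative assumptions.

```python
import math

def similarity_diversity(p, Z, q):
    """Similarity-sensitive diversity of order q for proportions p and similarity matrix Z."""
    n = len(p)
    # (Zp)_s: expected similarity between species s and a randomly chosen individual.
    zp = [sum(Z[s][t] * p[t] for t in range(n)) for s in range(n)]
    present = [(ps, zs) for ps, zs in zip(p, zp) if ps > 0]
    if q == 1:
        return math.exp(-sum(ps * math.log(zs) for ps, zs in present))
    if q == math.inf:
        return 1.0 / max(zs for _, zs in present)
    return sum(ps * zs ** (q - 1) for ps, zs in present) ** (1.0 / (1.0 - q))

p = [0.5, 0.3, 0.2]  # hypothetical species proportions
identity = [[1.0 if s == t else 0.0 for t in range(3)] for s in range(3)]
all_similar = [[1.0] * 3 for _ in range(3)]
partial = [[1.0 if s == t else 0.5 for t in range(3)] for s in range(3)]

for name, Z in [("identity", identity), ("partial", partial), ("all similar", all_similar)]:
    print(name, similarity_diversity(p, Z, 2))
```

With the identity matrix (no similarity between distinct species) the measure reduces to the Hill number of the same order; as species become more similar, the effective number of species falls towards one.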
Pavoine and Ricotta (2014) introduced a family of similarity measures, by extending the dissimilarity coefficient of Gower and Legendre (1986). This family provides a framework for comparing the traditional compositional turnover with functional or phylogenetic similarities among communities. Yuan et al. (2016) propose a modification of the $L_2$-distance measure $d_2(p_{j_1}, p_{j_2})$ to take account of species similarity when quantifying turnover. By noting that the expressions in $d_2(p_{j_1}, p_{j_2})$ may be written as quadratic forms, we can write $d_2^{Z}(p_{j_1}, p_{j_2}) = \sqrt{\frac{(p_{j_1} - p_{j_2})^{T} Z \,(p_{j_1} - p_{j_2})}{p_{j_1}^{T} Z \, p_{j_1} + p_{j_2}^{T} Z \, p_{j_2}}}$. If $Z$ is the identity matrix, then the measure is unaltered. However, we can define the element $z_{st}$ of $Z$ to be the similarity between species $s$ and $t$, where $z_{ss} = 1$, $z_{st} = z_{ts}$ and $0 \le z_{st} < 1$ for $s \neq t$. The matrix $Z$ should be chosen so that it is positive definite, which ensures that the quadratic forms are positive. This is closely related to the methods of Pavoine and Ricotta (2014). For the above methods, we need to quantify the similarity of species. As discussed by Leinster and Cobbold (2012), methods for quantifying similarity might be genetic, functional, taxonomic, morphological, or phylogenetic. Similarity matrices are often derived from distance matrices whose elements are $d_{st}$, as discussed in depth by Leinster and Cobbold (2012). The most common transformation is $z_{st} = 1 - d_{st}$.
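The quadratic-form idea can be sketched as follows. The exact form used here, the square root of $(p_{j_1} - p_{j_2})^T Z (p_{j_1} - p_{j_2})$ divided by $p_{j_1}^T Z p_{j_1} + p_{j_2}^T Z p_{j_2}$, is an assumption made for illustration and should be checked against Yuan et al. (2016); the distance matrix is hypothetical and assumed to be scaled to [0, 1].

```python
import math

def quadratic_form(x, Z, y):
    """x^T Z y for vectors x, y and a square matrix Z."""
    return sum(x[s] * Z[s][t] * y[t] for s in range(len(x)) for t in range(len(y)))

def similarity_l2_turnover(p1, p2, Z):
    """L_2-type turnover with species similarity matrix Z (assumed form)."""
    diff = [a - b for a, b in zip(p1, p2)]
    numerator = quadratic_form(diff, Z, diff)
    denominator = quadratic_form(p1, Z, p1) + quadratic_form(p2, Z, p2)
    return math.sqrt(numerator / denominator)

def similarity_from_distance(D):
    """z_st = 1 - d_st, assuming distances are scaled to lie in [0, 1]."""
    return [[1.0 - d for d in row] for row in D]

distances = [[0.0, 0.2], [0.2, 0.0]]  # hypothetical: two closely related species
Z = similarity_from_distance(distances)
identity = [[1.0, 0.0], [0.0, 1.0]]

year_1 = [1.0, 0.0]  # community consists entirely of species 1
year_2 = [0.0, 1.0]  # replaced entirely by the similar species 2

unweighted = similarity_l2_turnover(year_1, year_2, identity)
weighted = similarity_l2_turnover(year_1, year_2, Z)
print(unweighted, weighted)
```

In this example a complete replacement of one species by a closely related one scores as 100 % turnover when similarity is ignored, but as a much smaller change once similarity is taken into account.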
Building the distance matrix by representing the species in the multi-dimensional space of traits is a classical approach (Gower 1971) that ensures that the distance matrix is Euclidean and the similarity matrix is positive definite. Ultrametric distances obtained from a phylogeny can also be transformed into a similarity matrix (Leinster and Cobbold 2012), although it is not symmetric. Addressing non-ultrametric phylogenies is controversial (Leinster and Cobbold 2012; Marcon and Hérault 2015), since they result in matrices $Z$ that are not similarities (some $z_{st}$ are greater than 1) and depend on the frequencies of species. Marcon and Hérault (2015) considered the partitioning of phylogenetic diversity, in which species-relatedness is considered, and Marcon et al. (2014b) addressed the partitioning and estimation of similarity-based diversity as defined by Leinster and Cobbold (2012).

Discussion
The classical work on measuring diversity tends to be focussed on comparing species assemblages at selected sites. For assessing progress towards CBD targets, we are interested in temporal trends in biodiversity of regions (nations). Thus, we need surveys designed to allow the diversity of regions to be quantified. The UK Breeding Bird Survey is an excellent example of what can be achieved: around 3000 1 km squares are surveyed each year, according to a stratified random sampling scheme. In each sampled square, a volunteer carries out a line transect survey along two lines, each of length 1 km. It illustrates that 'citizen science' surveys can generate good-quality data to allow assessment of progress towards targets. We expect to see a growth in ambitious surveys of this type, to span other regions and taxa. This will lead to a greater demand for statistical methodologies for measuring temporal trends in the diversity of regions.
Species richness has played a prominent role in biodiversity measurement. It has the advantage that no abundance estimates are required; we simply need a list of species present. When small sites are being monitored, it may be practical to list all species of a community, or to adopt methods for estimating the number of species present. When quantifying the biodiversity of regions, with a focus on temporal changes, it is more difficult to achieve reliable inference based on species richness. Furthermore, at the national level, the number of species in a community typically varies little over short timescales, whereas species abundances (and hence proportions) tend to vary more rapidly, so that surveys that enable abundance estimation are able to identify change more quickly.
Biodiversity may be quantified in many ways. Its multi-dimensional nature means that no single measure can meet all needs for assessing change. It is important that measures are selected, taking account of what types of changes are of interest. Thus, consideration should be given to the following issues:
• Are changes in common species of greater or lesser importance than changes in scarce species?
• Is species identity important, or is the priority simply to maintain diversity levels, regardless of species composition?
• Is it important to identify shifts in community structure, for example, away from specialist and towards generalist species?
• Do you wish to assign greater weight to certain species? If so, is there an objective way to set weights?
• Do you wish to identify the effects of environmental change such as climate change on communities?
• Do you wish to quantify changes arising from changes in land-use management?
Classical measures based on species proportions have good precision for quantifying changes in evenness, and are robust to uncertain estimation for rare species. However, they are also insensitive to change among rare species, and the measures do not incorporate species identity, and so are ineffective at identifying changes resulting from environmental or land-use change, unless such changes affect evenness or species richness. Turnover measures are more appropriate for quantifying change arising from environmental or land-use change. Pairwise turnover measures are sensitive to changes amongst rarer species in a community, but they also typically have lower precision than other turnover measures. Turnover measures also offer options for quantifying spatial trends, temporal trends, and/or spatio-temporal trends in biodiversity. For example, we may wish to explore how temporal trends vary spatially, or how spatial trends vary in time. The geometric mean is appropriate if an index that is sensitive to changes in overall abundance is required; it gives equal weight to all species, and as a consequence, it also reflects changes in evenness. Its precision is reduced if rarer species are included in the analysis, and it cannot accommodate abundance estimates of zero. The geometric mean, as with some other measures, can be weighted to allow different species to have different weights (Buckland et al. 2012).
We have presented major aspects of biodiversity that should be considered to assess temporal trends in the context of Convention on Biological Diversity targets. These are the ability to include similarity between species; to reflect adequately changes in the relative abundance of rare species; to account for varying abundances; and to measure turnover. The literature on diversity measurement is extensive, and yet we do not have a unified approach that incorporates these aspects in a single coherent framework.