Only Simpson Diversity can be Estimated Accurately from Microbial Community Fingerprints


Lalande et al. (Microb. Ecol. 66(3):647–658, 2013) introduced a promising approach to quantify microbial diversity from fingerprinting profiles. Their analysis is based on extrapolating the abundance of the phylotypes detectable in a fingerprint towards the rare phylotypes of the community. By considering a set of reconstructed communities, Lalande et al. obtained a range of estimates for phylotype richness, Shannon diversity and Simpson diversity. They reported narrow ranges indicating accurate estimation, especially for Shannon and Simpson diversities. Here, we show that a much larger set of reconstructed communities than the one considered by Lalande et al. is consistent with the fingerprint. We find that the estimates for phylotype richness and Shannon diversity vary over orders of magnitude, but that the estimates for Simpson diversity are restricted to a narrow range (around 10 %). We conclude that only Simpson diversity can be estimated accurately from fingerprints.

Fig. 1
Fig. 2


  1. Lalande J, Villemur R, Deschênes L (2013) A new framework to accurately quantify soil bacterial community diversity from DGGE. Microb Ecol 66(3):647–658

  2. Loisel P, Harmand J, Zemb O, Latrille E, Lobry C, Delgenès JP, Godon JJ (2006) Denaturing gradient electrophoresis (DGE) and single-strand conformation polymorphism (SSCP) molecular fingerprintings revisited by simulation and used as a tool to measure microbial diversity. Environ Microbiol 8(4):720–731

  3. Blackwood CB, Hudleston D, Zak DR, Buyer JS (2007) Interpreting ecological diversity indices applied to terminal restriction fragment length polymorphism data: insights from simulated microbial communities. Appl Environ Microbiol 73(16):5276–5283

  4. Haegeman B, Hamelin J, Moriarty J, Neal P, Dushoff J, Weitz JS (2013) Robust estimation of microbial diversity in theory and in practice. ISME J 7(6):1092–1101



This work was supported by the SYSCOMM project DISCO (ANR-09-SYSC-003) and by the TULIP Laboratory of Excellence (ANR-10-LABX-41).


Corresponding author

Correspondence to Bart Haegeman.



Here, we describe the reconstructed communities of Fig. 1 and the diversity estimates shown in Fig. 2.

First, we extracted the fingerprint peak areas from Fig. 1 of Ref. [1]. The total area of the 34 extracted peaks equals 20 % of the total area under the fingerprinting profile (hence, the peak-to-signal ratio PSR = 0.20 in the terminology of Ref. [1]). The remaining 80 % of the area under the profile corresponds to the background (that is, the subpeak background percentage SBP = 0.80 in the terminology of Ref. [2]).

Second, we constructed four communities consistent with the fingerprint data. The 34 most abundant phylotypes correspond to the fingerprint peaks. The relative abundance of each of these phylotypes is equal to its peak area divided by the total area under the profile. Hence, the total relative abundance of the most abundant phylotypes is equal to 0.20. We chose the abundance distribution of the rare phylotypes such that the following conditions are satisfied: (1) the total relative abundance of the rare phylotypes is equal to 0.80 and (2) the abundance of a rare phylotype is smaller than the abundance of each of the most abundant phylotypes.

We report the abundance distribution of the rare phylotypes as rank-abundance curves, that is, we give the relationship between relative abundance p_i and rank i for the rare phylotypes (with rank i > 34):

  • The red community has 10^3 phylotypes. Its rank-abundance curve is quadratic on a log-log plot, ln p_i = −3.391 − 0.8554 ln i + 0.03750 (ln i)^2 for 34 < i ≤ 10^3.

  • The yellow community has 10^4 phylotypes. Its rank-abundance curve is linear on a log-log plot, ln p_i = −2.924 − 0.8535 ln i for 34 < i ≤ 10^4.

  • The green community has 10^5 phylotypes. Its rank-abundance curve is linear on a log-log plot, ln p_i = −2.492 − 0.9750 ln i for 34 < i ≤ 10^5.

  • The blue community has 10^6 phylotypes. Its rank-abundance curve is linear on a log-log plot, ln p_i = −2.294 − 1.0306 ln i for 34 < i ≤ 10^6.

For the yellow, green and blue communities, the abundance distribution of the rare phylotypes is a power law. For the red community this distribution is approximately a power law (the rank-abundance curve is slightly convex, see Fig. 1, right-hand panel). For a community with 10^3 phylotypes, a power law distribution for the rare phylotypes does not connect smoothly to the abundances of the dominant phylotypes.
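As a quick consistency check on the four rank-abundance curves, the following minimal Python sketch sums the rare-phylotype abundances for each community. It assumes that the community sizes are 10^3, 10^4, 10^5 and 10^6 phylotypes and that the rare phylotypes occupy ranks 35 upwards; the names `CURVES` and `rare_abundances` are illustrative only. Under condition (1), each total should come out close to 0.80.

```python
import math

# Coefficients (a, b, c) of ln p_i = a + b*ln(i) + c*(ln i)^2 and the total
# number of phylotypes, copied from the four rank-abundance curves.
CURVES = {
    "red":    (-3.391, -0.8554, 0.03750, 10**3),
    "yellow": (-2.924, -0.8535, 0.0,     10**4),
    "green":  (-2.492, -0.9750, 0.0,     10**5),
    "blue":   (-2.294, -1.0306, 0.0,     10**6),
}
N_PEAKS = 34  # ranks 1..34 are the fingerprint peaks


def rare_abundances(a, b, c, n_total):
    """Relative abundances p_i of the rare phylotypes, ranks 35..n_total."""
    out = []
    for i in range(N_PEAKS + 1, n_total + 1):
        x = math.log(i)
        out.append(math.exp(a + b * x + c * x * x))
    return out


for name, (a, b, c, n_total) in CURVES.items():
    p = rare_abundances(a, b, c, n_total)
    # p[0] is the most abundant rare phylotype (rank 35); condition (2)
    # requires it to stay below the smallest fingerprint peak.
    print(f"{name:6s} total rare abundance = {math.fsum(p):.3f}, "
          f"largest rare p_i = {p[0]:.5f}")
```

The largest rare abundance is printed because it is the quantity that condition (2) compares against the smallest peak; the peak areas themselves are not reproduced here, so that comparison is left to the reader.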

Third, we computed three diversity metrics for the four reconstructed communities: phylotype richness D_0, Shannon diversity D_1,

$$ D_{1} = \mathrm{e}^{H} \qquad \text{with} \quad H = - \sum\limits_{i} p_{i}\ln p_{i}, $$

and Simpson diversity D_2,

$$ D_{2} = \frac{1}{C} \qquad \text{with} \quad C = \sum\limits_{i} p_{i}^{2}. $$

The notation D_0, D_1 and D_2 refers to Hill diversities of order 0, 1 and 2 (see Ref. [4] for details). Because Hill diversities can be interpreted as effective numbers of phylotypes, they are intercomparable. Therefore, we prefer to use the transformed diversity metrics D_1 and D_2 rather than the Shannon diversity index H and the Simpson concentration index C. We find:

  • For the red community: D_0 = 10^3, D_1 = 7.4 × 10^2 and D_2 = 4.1 × 10^2.

  • For the yellow community: D_0 = 10^4, D_1 = 2.8 × 10^3 and D_2 = 5.0 × 10^2.

  • For the green community: D_0 = 10^5, D_1 = 7.7 × 10^3 and D_2 = 5.2 × 10^2.

  • For the blue community: D_0 = 10^6, D_1 = 1.7 × 10^4 and D_2 = 5.3 × 10^2.
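The three metrics can be computed from a vector of relative abundances as in the following minimal Python sketch. The function `hill_diversity` is an illustrative helper, not code from the original analysis. Because the measured peak areas are not listed here, the example replaces them by 34 equal peaks of abundance 0.20/34; this is an assumption made only for illustration, so the printed values will not reproduce the reported ones exactly.

```python
import math


def hill_diversity(p, q):
    """Hill diversity of order q for relative abundances p (assumed to sum to 1)."""
    p = [x for x in p if x > 0.0]
    if q == 0:
        return float(len(p))                      # D_0: phylotype richness
    if q == 1:
        shannon_h = -math.fsum(x * math.log(x) for x in p)
        return math.exp(shannon_h)                # D_1 = e^H
    return math.fsum(x ** q for x in p) ** (1.0 / (1.0 - q))  # q = 2 gives 1/C


# ASSUMPTION: equal peak abundances stand in for the measured peak areas.
peaks = [0.20 / 34] * 34
# Rare phylotypes of the yellow community:
# ln p_i = -2.924 - 0.8535 ln i for 34 < i <= 10^4.
rare = [math.exp(-2.924 - 0.8535 * math.log(i)) for i in range(35, 10**4 + 1)]
total = math.fsum(peaks) + math.fsum(rare)
# Renormalise, since the rounded fit coefficients do not sum exactly to 1.
community = [x / total for x in peaks + rare]

for q in (0, 1, 2):
    print(f"D_{q} = {hill_diversity(community, q):.3g}")
```

For a community of N equally abundant phylotypes all three orders return N, which is the sense in which Hill diversities are effective numbers of phylotypes.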

Finally, we generalized the analysis to a much larger set of reconstructed communities. More precisely, we considered all reconstructed communities satisfying conditions (1) and (2) above. This set, although it contains unrealistic communities (for example, communities with an abrupt transition from dominant to rare phylotypes), is useful to obtain lower and upper bounds for the estimation range of the diversity metrics. Indeed, it is possible to determine the communities in this set yielding the lowest and highest diversity estimates. The lowest diversity estimate is obtained for a community in which all the rare phylotypes have the same abundance as the smallest abundance of the most abundant phylotypes. The highest diversity estimate is obtained for a community in which there are a large number R of rare phylotypes which all have the same relative abundance 0.80/R.
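The two bounding communities can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the peak abundances are replaced by 34 equal values of 0.20/34 because the measured peak areas are not listed here, the function names are hypothetical, and the handling of a leftover fraction in the lowest-diversity case (one extra, smaller phylotype) is a choice made for this sketch.

```python
import math


def lowest_diversity_community(peaks):
    """Rare phylotypes all as abundant as the smallest peak (as few as possible)."""
    peaks = list(peaks)
    rare_total = 1.0 - math.fsum(peaks)
    p_min = min(peaks)
    # Small tolerance guards against floating-point round-off in the division.
    n_full = math.floor(rare_total / p_min + 1e-9)
    rare = [p_min] * n_full
    remainder = rare_total - n_full * p_min
    if remainder > 1e-12:
        rare.append(remainder)  # ASSUMPTION: leftover goes to one smaller phylotype
    return peaks + rare


def highest_diversity_community(peaks, n_rare):
    """n_rare equally abundant rare phylotypes sharing the remaining abundance."""
    peaks = list(peaks)
    rare_total = 1.0 - math.fsum(peaks)
    return peaks + [rare_total / n_rare] * n_rare


def simpson_diversity(p):
    """Simpson diversity D_2 = 1 / sum(p_i^2)."""
    return 1.0 / math.fsum(x * x for x in p)


# ASSUMPTION: equal peak abundances stand in for the measured peak areas.
peaks = [0.20 / 34] * 34

low = lowest_diversity_community(peaks)
print(f"lowest:  {len(low)} phylotypes, D_2 = {simpson_diversity(low):.4g}")

for n_rare in (10**4, 10**5, 10**6):
    high = highest_diversity_community(peaks, n_rare)
    print(f"highest: {len(high)} phylotypes, D_2 = {simpson_diversity(high):.4g}")
```

Running the loop over increasing numbers of rare phylotypes shows richness growing without bound while the Simpson diversity levels off.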

The results of this further analysis are shown as the grey-shaded regions in Fig. 2. The lower ends of these regions are equal to the lowest diversity estimates. At the upper end, the shade of grey becomes gradually lighter, corresponding to the highest diversity estimate with R ranging from 10^4 to 10^7. It is interesting to note how the highest diversity estimate depends on the number of rare phylotypes R for the three diversity metrics: when R is large, the estimate for phylotype richness increases proportional to R, the estimate for Shannon diversity increases with ln R (the Shannon index H grows proportional to ln R) and the estimate for Simpson diversity tends to a fixed value. This provides another argument for why Simpson diversity can be estimated more accurately than Shannon diversity and phylotype richness.
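The dependence on R can be written out explicitly. As a sketch, write a for the total relative abundance of the rare phylotypes (a = 0.80 by condition (1)) and let the sums below run over the 34 fingerprint peaks; the symbols a and H_peaks are notation introduced here for illustration. In the highest-diversity community each of the R rare phylotypes has abundance a/R, so

$$ D_{0} = 34 + R, $$

$$ H = H_{\text{peaks}} - a \ln\frac{a}{R} = H_{\text{peaks}} - a \ln a + a \ln R \qquad \text{with} \quad H_{\text{peaks}} = - \sum\limits_{i \le 34} p_{i}\ln p_{i}, $$

$$ C = \sum\limits_{i \le 34} p_{i}^{2} + \frac{a^{2}}{R} \;\longrightarrow\; \sum\limits_{i \le 34} p_{i}^{2} \qquad \text{as} \quad R \to \infty. $$

Hence D_0 grows linearly in R, H grows linearly in ln R (so that D_1 = e^H grows as R^a), and D_2 = 1/C approaches a finite limit that is set by the fingerprint peaks alone.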


Cite this article

Haegeman, B., Sen, B., Godon, JJ. et al. Only Simpson Diversity can be Estimated Accurately from Microbial Community Fingerprints. Microb Ecol 68, 169–172 (2014).



  • Terminal Restriction Fragment Length Polymorphism
  • Estimation Range
  • Abundance Distribution
  • Diversity Metrics
  • Simpson Diversity