Mathematical Geosciences, Volume 47, Issue 6, pp 647–661

Mineral Species Frequency Distribution Conforms to a Large Number of Rare Events Model: Prediction of Earth’s Missing Minerals

  • Grethe Hystad
  • Robert T. Downs
  • Robert M. Hazen


A population model is introduced to describe the mineral species frequency distribution. The distribution of mineral species over their known localities conforms to a large number of rare events (LNRE) distribution: 100 common mineral species occur at more than 1,000 localities, whereas 34 % of the 4,831 approved mineral species are found at only one or two localities. LNRE models formulated in terms of a structural type distribution allow the estimation of Earth's undiscovered mineralogical diversity and the prediction of the percentage of observed mineral species that would differ if Earth's history were replayed.
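The abstract gives no formulas, so the following is only a minimal sketch of the kind of input such models start from. It assumes that species-to-locality counts are held in a plain dictionary, which is a hypothetical format and not the authors' database. From those counts it builds the frequency spectrum V_m, the number of species observed at exactly m localities, and it computes the share of species seen at one or two localities, which is the statistic quoted above. For comparison it also computes Chao's (1984) nonparametric lower bound on total richness. That estimator is a swapped-in and much simpler technique, not the LNRE model fitted in the paper.

```python
from collections import Counter


def frequency_spectrum(locality_counts):
    """Map m -> V_m, the number of species observed at exactly m localities."""
    return dict(sorted(Counter(locality_counts.values()).items()))


def rare_fraction(spectrum, max_m=2):
    """Fraction of observed species seen at no more than max_m localities."""
    total = sum(spectrum.values())
    rare = sum(v for m, v in spectrum.items() if m <= max_m)
    return rare / total


def chao1(spectrum):
    """Chao (1984) lower-bound estimate of total species richness.

    Uses S_obs + V_1^2 / (2 V_2); when no species occurs at exactly two
    localities it falls back to the bias-corrected S_obs + V_1 (V_1 - 1) / 2.
    """
    s_obs = sum(spectrum.values())
    f1 = spectrum.get(1, 0)  # species seen at exactly one locality
    f2 = spectrum.get(2, 0)  # species seen at exactly two localities
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical toy data (NOT the mineral database): species -> number of localities
toy = {"A": 1200, "B": 1, "C": 1, "D": 2, "E": 5, "F": 1, "G": 2, "H": 40}
spec = frequency_spectrum(toy)
print(spec)                 # {1: 3, 2: 2, 5: 1, 40: 1, 1200: 1}
print(rare_fraction(spec))  # 0.625
print(chao1(spec))          # 10.25
```

Chao's estimator uses only V_1 and V_2 and yields a lower bound. An LNRE model instead fits a parametric structural type distribution to the whole spectrum, and this is what allows extrapolation to larger samples and to the unobserved species.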


Keywords: Statistical mineralogy; Mineral ecology; Mineral frequency distribution



Joshua Golden, Edward Grew, and Dimitri Sverjensky provided valuable advice and discussions. We thank the Deep Carbon Observatory, the Keck Foundation, and a private foundation for support.



Copyright information

© International Association for Mathematical Geosciences 2015

Authors and Affiliations

  • Grethe Hystad (1, corresponding author)
  • Robert T. Downs (2)
  • Robert M. Hazen (3)
  1. Department of Mathematics, University of Arizona, Tucson, USA
  2. Department of Geosciences, University of Arizona, Tucson, USA
  3. Geophysical Laboratory, Carnegie Institution, Washington, USA
