Mathematical Geosciences, Volume 51, Issue 4, pp 401–417

Bayesian Estimation of Earth’s Undiscovered Mineralogical Diversity Using Noninformative Priors

  • Grethe Hystad
  • Ahmed Eleish
  • Robert M. Hazen
  • Shaunna M. Morrison
  • Robert T. Downs


Recently, statistical distributions have been explored to provide estimates of the mineralogical diversity of Earth and Earth-like planets. In this paper, a Bayesian approach is introduced to estimate Earth’s undiscovered mineralogical diversity. Samples are generated from a posterior distribution of the model parameters using Markov chain Monte Carlo simulations, so that estimates and inferences are obtained directly. It was previously shown that the mineral species frequency distribution conforms to a generalized inverse Gauss–Poisson (GIGP) large number of rare events model. Although the model fit was good, mineralogists considered the population size estimate obtained from this model to be unreasonably low. In this paper, several zero-truncated, mixed Poisson distributions are fitted and compared, and the Poisson-lognormal distribution is found to provide the best fit. Subsequently, the population size estimates obtained by Bayesian methods are compared to the empirical Bayes estimates. Species accumulation curves are constructed and employed to estimate the population size as a function of sample size. Finally, the relative abundances, and hence the occurrence probabilities of species in a random sample, are calculated numerically for all mineral species in Earth’s crust using the Poisson-lognormal distribution. These calculations are connected and compared to those obtained in a previous paper using the GIGP model, for which mineralogical criteria of an Earth-like planet were given.
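The role of the zero-truncated Poisson-lognormal distribution in a population size estimate can be illustrated with a minimal sketch. It is not the Bayesian MCMC procedure of the paper: it assumes the model parameters are already known and applies the standard plug-in relation N = n / (1 - p0), where n is the number of observed species and p0 is the model probability that a species is never sampled. The function names, the parameter values (mu, sigma), the observed count, and the simple trapezoidal integration are all illustrative assumptions.

```python
import math


def poisson_lognormal_pmf(k, mu, sigma, n_grid=2001, z_max=8.0):
    """P(K = k) when K is Poisson with a rate whose log is Normal(mu, sigma^2).

    The mixing integral is approximated with the trapezoidal rule over
    standardized values z in [-z_max, z_max] (an illustrative choice).
    """
    h = 2.0 * z_max / (n_grid - 1)
    log_k_factorial = math.lgamma(k + 1)
    total = 0.0
    for i in range(n_grid):
        z = -z_max + i * h
        x = mu + sigma * z  # log of the Poisson rate
        # log of [Poisson(k | exp(x)) * exp(-z^2 / 2)]
        log_term = k * x - math.exp(x) - log_k_factorial - 0.5 * z * z
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.exp(log_term)
    return total * h / math.sqrt(2.0 * math.pi)


def zero_truncated_pmf(k, mu, sigma):
    """P(K = k | K >= 1): the distribution of counts for observed species."""
    p0 = poisson_lognormal_pmf(0, mu, sigma)
    return poisson_lognormal_pmf(k, mu, sigma) / (1.0 - p0)


def estimate_population_size(n_observed, mu, sigma):
    """Plug-in estimate N = n / (1 - p0) for fixed, assumed parameters."""
    p0 = poisson_lognormal_pmf(0, mu, sigma)
    return n_observed / (1.0 - p0)


# Hypothetical inputs, not values from the paper.
mu, sigma = 0.0, 1.0
n_observed = 1000

n_total = estimate_population_size(n_observed, mu, sigma)
print("estimated total number of species:", round(n_total))
print("estimated number still unobserved:", round(n_total - n_observed))
```

In a fully Bayesian treatment, the parameters and the population size would each have a posterior distribution rather than the single plug-in values used here.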


Keywords: Bayesian statistics; Mixed Poisson distribution; Species estimation; Mineral frequency distribution; Mineral ecology



We would like to thank the reviewers for their valuable comments to improve the paper. We gratefully acknowledge support from NASA Mars Science Laboratory Mission NNX11AP82A, as well as support from the Alfred P. Sloan Foundation (Grant No. 2013-10-01), the W.M. Keck Foundation (Grant No. 140002372), the John Templeton Foundation (Grant No. 60645), the Deep Carbon Observatory, the Carnegie Institution for Science, and an anonymous private foundation.

Supplementary material

11004_2019_9795_MOESM1_ESM.pdf (70 kb)
Supplementary material 1 (pdf 69 KB)
11004_2019_9795_MOESM2_ESM.pdf (69 kb)
Supplementary material 2 (pdf 68 KB)
11004_2019_9795_MOESM3_ESM.pdf (70 kb)
Supplementary material 3 (pdf 69 KB)



Copyright information

© International Association for Mathematical Geosciences 2019

Authors and Affiliations

  1. Mathematics, Statistics, and Computer Science, Purdue University Northwest, Hammond, USA
  2. Tetherless World Constellation, Department of Earth and Environmental Sciences, Rensselaer Polytechnic Institute, Troy, USA
  3. Geophysical Laboratory, Carnegie Institution for Science, Washington, USA
  4. Department of Geosciences, University of Arizona, Tucson, USA
