Statistical learning for species distribution models in ecological studies

  • Original Paper
  • Modern Methods and Applications for Biodiversity Studies

Japanese Journal of Statistics and Data Science

Abstract

We discuss species distribution models (SDMs) for biodiversity studies in ecology. SDMs play an important role in estimating the abundance of a species from environmental variables that are closely related to its habitat. The resulting habitat map indicates areas where the species is likely to live, and is therefore essential for conservation planning and reserve selection. We focus in particular on the Poisson point process and clarify its relations to other statistical methods. We then discuss the Poisson point process from the viewpoint of information divergence, showing that the Kullback–Leibler divergence between density functions reduces to the extended Kullback–Leibler divergence between intensity functions. This property enables us to extend the Poisson point process to models derived from other divergences, such as the \(\beta \) and \(\gamma \) divergences. Finally, we discuss integrated SDMs and evaluate their estimation performance based on the Fisher information matrices.
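
The reduction from density functions to intensity functions mentioned in the abstract can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: \(A\) is the study region, \(\lambda\) and \(\mu\) are two intensity functions on \(A\), \(p_\lambda\) and \(p_\mu\) are the corresponding Poisson point process densities of an observed configuration \(\{s_1,\ldots,s_n\}\), and \(\Lambda_\lambda=\int_A \lambda(s)\,ds\).

```latex
% Sketch under assumed notation (A, \lambda, \mu, \Lambda are illustrative symbols).
\begin{align*}
% Density of a configuration under a Poisson point process with intensity \lambda
p_\lambda(s_1,\ldots,s_n) &= \exp(-\Lambda_\lambda)\prod_{i=1}^{n}\lambda(s_i),
  \qquad \Lambda_\lambda=\int_A \lambda(s)\,ds,\\
% Log density ratio of the two processes
\log\frac{p_\lambda(s_1,\ldots,s_n)}{p_\mu(s_1,\ldots,s_n)}
  &= \sum_{i=1}^{n}\log\frac{\lambda(s_i)}{\mu(s_i)}-\Lambda_\lambda+\Lambda_\mu,\\
% Taking the expectation under \lambda, using
% E_\lambda[\sum_i f(s_i)] = \int_A f(s)\lambda(s)\,ds (Campbell's formula)
D_{\mathrm{KL}}(p_\lambda\,\|\,p_\mu)
  &= \int_A\Bigl\{\lambda(s)\log\frac{\lambda(s)}{\mu(s)}-\lambda(s)+\mu(s)\Bigr\}\,ds .
\end{align*}
```

The last line has the form of the extended Kullback–Leibler divergence between the intensity functions \(\lambda\) and \(\mu\), which need not integrate to one. The extra terms \(-\lambda+\mu\) account for the difference in total mass.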

Acknowledgements

We would like to thank the two referees for their careful reading and useful suggestions, which greatly improved the quality of our manuscript. Part of this work was supported by JSPS KAKENHI No. JP18H03211 and No. JP22K11938.

Author information

Corresponding author

Correspondence to Osamu Komori.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Komori, O., Saigusa, Y. & Eguchi, S. Statistical learning for species distribution models in ecological studies. Jpn J Stat Data Sci 6, 803–826 (2023). https://doi.org/10.1007/s42081-023-00206-1

  • DOI: https://doi.org/10.1007/s42081-023-00206-1
