Abstract
We discuss species distribution models (SDMs) for biodiversity studies in ecology. SDMs play an important role in estimating the abundance of a species from environmental variables that are closely related to its habitat. The resulting habitat map indicates areas where the species is likely to live and is therefore essential for conservation planning and reserve selection. We focus in particular on the Poisson point process and clarify its relations with other statistical methods. We then discuss the Poisson point process from the viewpoint of information divergence, showing that the Kullback–Leibler divergence between density functions reduces to the extended Kullback–Leibler divergence between intensity functions. This property enables us to extend the Poisson point process to models derived from other divergences, such as the \(\beta \) and \(\gamma \) divergences. Finally, we discuss integrated SDMs and evaluate estimation performance based on the Fisher information matrices.
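The reduction from densities to intensities mentioned above can be sketched in standard notation. The symbols \(A\), \(\lambda \), \(\mu \), and \(s_i\) are assumptions introduced here for illustration and are not necessarily those used in the article. For a Poisson point process on a region \(A\) with intensity function \(\lambda \), the likelihood of an observed configuration \(\{s_1,\ldots ,s_m\}\) is

\[ p_\lambda (s_1,\ldots ,s_m) = \frac{1}{m!}\exp \Big \{-\int _A \lambda (s)\,ds\Big \}\prod _{i=1}^m \lambda (s_i). \]

Taking the expectation of \(\log (p_\lambda /p_\mu )\) under \(p_\lambda \) and applying Campbell's theorem to the sum over points gives

\[ D_{\mathrm {KL}}(p_\lambda ,p_\mu ) = \int _A \Big \{\lambda (s)\log \frac{\lambda (s)}{\mu (s)} - \lambda (s) + \mu (s)\Big \}\,ds, \]

which is the extended Kullback–Leibler divergence between the two intensity functions. It is defined for nonnegative functions that need not integrate to one, which is why it suits intensities rather than probability densities.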
Acknowledgements
We would like to thank two referees for their careful reading and useful suggestions, which greatly improved the quality of our manuscript. Part of this work is supported by JSPS KAKENHI No. JP18H03211 and No. JP22K11938.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Komori, O., Saigusa, Y. & Eguchi, S. Statistical learning for species distribution models in ecological studies. Jpn J Stat Data Sci 6, 803–826 (2023). https://doi.org/10.1007/s42081-023-00206-1
DOI: https://doi.org/10.1007/s42081-023-00206-1