Machine learning is a popular tool in ecology, but many scientific applications suffer from data leakage, which leads to misleading results. We highlight common pitfalls in ecological machine-learning methods and argue that discipline-specific model info sheets must be developed to aid model evaluation.
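To make one such pitfall concrete, here is a minimal, hypothetical sketch (not the authors' analysis) of leakage caused by spatial autocorrelation. It assumes a synthetic transect of 100 sites whose response varies smoothly with position, and a 1-nearest-neighbour model that uses only location. Under random k-fold cross-validation, held-out sites usually sit next to training sites, so the model looks accurate simply by copying its neighbours. Under blocked cross-validation, each fold is a contiguous stretch of the transect and the error rises. The data, the `sin(x / 8)` response, the seed and all function names are illustrative assumptions.

```python
import math
import random

N_SITES = 100
K = 5

# Hypothetical transect: site i sits at position i, and its response is a smooth
# function of position, so nearby sites have similar values (spatial autocorrelation).
positions = list(range(N_SITES))
response = [math.sin(x / 8) for x in positions]


def predict_nearest(train, x):
    """1-nearest-neighbour on location: copy the response of the closest training site."""
    nearest = min(train, key=lambda i: (abs(positions[i] - x), i))
    return response[nearest]


def cv_mae(folds):
    """Mean absolute error across folds; each fold is a list of held-out site indices."""
    errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(N_SITES) if i not in held_out]
        for i in test:
            errors.append(abs(predict_nearest(train, positions[i]) - response[i]))
    return sum(errors) / len(errors)


# Random k-fold split: held-out sites are scattered along the transect, so each
# usually has a training neighbour one step away (information leaks across the split).
shuffled = list(range(N_SITES))
random.Random(42).shuffle(shuffled)
random_folds = [shuffled[k::K] for k in range(K)]

# Blocked split: each fold is a contiguous stretch of the transect, so held-out
# sites are farther from any training site.
block = N_SITES // K
block_folds = [list(range(k * block, (k + 1) * block)) for k in range(K)]

random_mae = cv_mae(random_folds)
blocked_mae = cv_mae(block_folds)
print(f"random k-fold MAE:  {random_mae:.3f}")
print(f"blocked k-fold MAE: {blocked_mae:.3f}")
```

Running it should report a clearly lower error for the random split than for the blocked split; that gap is the optimism introduced when test sites are not independent of training sites. Blocking is only one possible remedy, and with real data the block size would need to reflect the range of spatial autocorrelation.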
Acknowledgements
We were supported by a Liber Ero Postdoctoral Fellowship (A.S.) and NSERC Discovery Grant RGPIN-2020-05032 (K.M.A.C.).
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Ecology & Evolution thanks the anonymous reviewers for their contribution to the peer review of this work.
Supplementary information
Supplementary Figure 1.
About this article
Cite this article
Stock, A., Gregr, E.J. & Chan, K.M.A. Data leakage jeopardizes ecological applications of machine learning. Nat Ecol Evol 7, 1743–1745 (2023). https://doi.org/10.1038/s41559-023-02162-1
© Springer Nature Limited