Data leakage jeopardizes ecological applications of machine learning

  • Comment
From Nature Ecology & Evolution

Machine learning is a popular tool in ecology but many scientific applications suffer from data leakage, causing misleading results. We highlight common pitfalls in ecological machine-learning methods and argue that discipline-specific model info sheets must be developed to aid in model evaluations.
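One common route to leakage is a random train/test split of samples that are not independent, such as repeated observations from the same site. The sketch below is a hypothetical illustration and is not taken from the article: the simulated data, the one-nearest-neighbour classifier, and all function names (`make_data`, `random_split`, `site_split`, `one_nn_accuracy`) are assumptions. Each simulated site has its own background signal, and labels are assigned at random per site, so nothing learned at one site should transfer to another.

```python
import random

def make_data(n_sites=60, per_site=10, seed=0):
    """Simulate samples as (site, feature, label) tuples.

    Assumption for illustration: the feature is a site-specific background
    signal, and the label is drawn at random per site, so the feature has no
    predictive value at sites the model has never seen.
    """
    rng = random.Random(seed)
    rows = []
    for site in range(n_sites):
        center = site * 10.0          # site-specific background signal
        label = rng.randint(0, 1)     # label is unrelated to the signal across sites
        for _ in range(per_site):
            rows.append((site, center + rng.gauss(0.0, 0.1), label))
    return rows

def one_nn_accuracy(train, test):
    """Accuracy of a one-nearest-neighbour classifier on the single feature."""
    correct = 0
    for _, x, y in test:
        nearest = min(train, key=lambda r: abs(r[1] - x))
        correct += nearest[2] == y
    return correct / len(test)

def random_split(rows, test_frac=0.3, seed=1):
    """Split individual samples at random; sites end up on both sides."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    k = int(len(shuffled) * test_frac)
    return shuffled[k:], shuffled[:k]

def site_split(rows, test_frac=0.3, seed=1):
    """Hold out whole sites, so no site contributes to both train and test."""
    rng = random.Random(seed)
    sites = sorted({r[0] for r in rows})
    rng.shuffle(sites)
    held_out = set(sites[: int(len(sites) * test_frac)])
    train = [r for r in rows if r[0] not in held_out]
    test = [r for r in rows if r[0] in held_out]
    return train, test

rows = make_data()
acc_random = one_nn_accuracy(*random_split(rows))
acc_blocked = one_nn_accuracy(*site_split(rows))
print(f"random split accuracy:   {acc_random:.2f}")
print(f"site-blocked accuracy:   {acc_blocked:.2f}")
```

In this toy setting the random split lets the classifier recognise each test sample's site from the training data, so its score reflects memorised sites rather than any transferable relationship. Holding out whole sites removes that shortcut, and the score should fall towards chance.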

Fig. 1: How data leakage might occur in ecological applications, explained through the lens of shortcut learning.

References

  1. Tuia, D. et al. Nat. Commun. 13, 792 (2022).

  2. Valletta, J. J. et al. Anim. Behav. 124, 203–220 (2017).

  3. Kapoor, S. & Narayanan, A. Preprint at arXiv, http://arxiv.org/abs/2207.07048 (2022).

  4. Kaufman, S. et al. ACM Trans. Knowl. Discov. Data 6, 15 (2012).

  5. Stock, A., Haupt, A. J., Mach, M. E. & Micheli, F. Ecol. Inform. 48, 37–47 (2018).

  6. Geirhos, R. et al. Nat. Mach. Intell. 2, 665–673 (2020).

  7. Shane, J. Do neural nets dream of electric sheep? AI Weirdness, https://www.aiweirdness.com/do-neural-nets-dream-of-electric-18-03-02/ (2 March 2018).

  8. Beery, S., Van Horn, G. & Perona, P. Recognition in terra incognita. In Computer Vision – ECCV 2018 (eds Ferrari, V., Hebert, M., Sminchisescu, C. & Weiss, Y.) 472–489 (2018).

  9. Gregr, E. J. et al. Ecography 42, 428–443 (2019).

  10. Stock, A. ISPRS J. Photogramm. Remote Sens. 187, 46–60 (2022).

  11. Roberts, D. R. et al. Ecography 40, 913–929 (2017).

  12. Wiles, O. et al. Preprint at arXiv, https://doi.org/10.48550/arXiv.2110.11328 (2021).

  13. Yates, K. L. et al. Trends Ecol. Evol. 33, 790–802 (2018).

  14. Chan, K. M. A. & Gregr, E. J. Hindsight: tackling pattern, scale, and independence to ensure ecosystem models are predictive. functionalecologists.com, https://functionalecologists.com/2018/10/19/hindsight-tackling-pattern-scale-and-independence-to-ensure-ecosystem-models-are-predictive/ (2018).

  15. Valavi, R. et al. Methods Ecol. Evol. 10, 225–232 (2019).

  16. Feng, X. et al. Nat. Ecol. Evol. 3, 1382–1395 (2019).

  17. Serra-Garcia, M. & Gneezy, U. Sci. Adv. 7, eabd1705 (2021).

  18. Grill, G. Preprint at OSF Preprints, https://doi.org/10.31219/osf.io/zekqv (2022).

  19. Lürig, M. D. et al. 9, 642774 (2021).

Acknowledgements

We were supported by a Liber Ero Postdoctoral Fellowship (A.S.) and NSERC Discovery Grant RGPIN-2020-05032 (K.M.A.C.).

Author information

Corresponding author

Correspondence to Andy Stock.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks the anonymous reviewers for their contribution to the peer review of this work.

Supplementary information

Supplementary Figure 1.

About this article

Cite this article

Stock, A., Gregr, E.J. & Chan, K.M.A. Data leakage jeopardizes ecological applications of machine learning. Nat Ecol Evol 7, 1743–1745 (2023). https://doi.org/10.1038/s41559-023-02162-1

  • DOI: https://doi.org/10.1038/s41559-023-02162-1

  • Springer Nature Limited
