Evaluation of machine learning methods for predicting eradication of aquatic invasive species

A Correction to this article was published on 28 April 2018

This article has been updated


In this work, we evaluate the performance of machine learning approaches for predicting the successful eradication of aquatic invasive species (AIS), and assess the extent to which eradication depends on specified ecological features of the target ecosystem and/or features that characterize the planned intervention. We studied the outcomes of 143 planned attempts to eradicate AIS, where each attempt was described by ecological features of the target ecosystem and features of the eradication strategy. We considered several machine learning approaches to determine whether one could produce a classifier that accurately predicts whether an invasive species will be eradicated. To assess each learner’s performance, we examined its tenfold cross-validated prediction accuracy as well as the false positive rate, the F-measure, and the Area Under the ROC Curve. We also used Kaplan–Meier survival analysis to determine which features are relevant to predicting the time required for each eradication program. Across the five machine learning approaches considered, our analysis suggests that decision tree learners performed best. In particular, by examining the trained decision tree model, we found that if the occupied area was not large and/or containment of AIS dispersal was employed, eradication was likely to be successful. We also trained decision tree models over only the ecological features and found that their performance was comparable with that of models trained using all features. As our trained decision tree models are accurate, decision makers can use them to estimate the outcome of proposed actions before committing to a specific strategy.



Change history

  • 28 April 2018

    Figure 6 was published with an incorrect axis in the original publication. The correct version of Fig. 6 is provided in this correction.


  1. Here, n = 6 as we are considering 6 features.

  2. Some successful eradication attempts had records of several annual follow-up surveys after the attempts ended (Rowe and Champion 1994; Akers 2009). (Confirming that a species has been eradicated may require several years of continuous observation of the target ecosystem, as well as assessment of the trade-offs arising in any decision.) For these attempts, we defined the recorded duration as the time of the final follow-up survey, i.e., the time required to confirm eradication of the AIS. For the other successful trials, which had no records of follow-up surveys, we took the recorded time as the eradication time.


  1. Akers P (2009) Hydrilla eradication program progress report 2009. Technical report, California Department of Food and Agriculture

  2. Barahona-Segovia R, Grez A, Bozinovic F (2015) Testing the hypothesis of greater eurythermality in invasive than in native ladybird species: from physiological performance to life-history strategies. Ecol Entomol 41(2):182–191

  3. Boets P, Landuyt D, Everaert G, Broekx S, Goethals P (2015) Evaluation and comparison of data-driven and knowledge-supported Bayesian belief networks to assess the habitat suitability for alien macroinvertebrates. Environ Model Softw 74:92–103

  4. Breiman L, Friedman J, Stone C, Olshen R (1984) Classification and regression trees. Taylor & Francis, London

  5. Cambray J (2003) Impact on indigenous species biodiversity caused by the globalisation of alien recreational freshwater fisheries. In: Martens K (ed) Aquatic biodiversity: a celebratory volume in honour of Henri J. Dumont. Springer, Dordrecht, pp 217–230

  6. Caudron A, Champigneulle A (2011) Multiple electrofishing as a mitigate tool for removing nonnative Atlantic brown trout (Salmo trutta L.) threatening a native Mediterranean brown trout population. Eur J Wildl Res 5(3):575–583

  7. Cooling M, Hartley S, Sim D, Lester P (2011) The widespread collapse of an invasive species: Argentine ants (Linepithema humile) in New Zealand. Biol Lett 8:430–433

  8. Cox D, Oakes D (1984) Analysis of survival data. Chapman & Hall/CRC, London

  9. Drake D, Mercader R, Dobson T, Mandrak E (2015) Can we predict risky human behaviour involving invasive species? A case study of the release of fishes to the wild. Biol Invasions 17:309–326

  10. Drolet D, Locke A, Lewis MA, Davidson J (2014) User-friendly and evidence-based tool to evaluate probability of eradication of aquatic non-indigenous species. J Appl Ecol 51(4):1050–1056

  11. Drolet D, Locke A, Lewis MA, Davidson J (2015) Evidence-based tool surpasses expert opinion in predicting probability of eradication of aquatic nonindigenous species. Ecol Appl 25(2):441–450

  12. Eilers J, Truemper H, Jackson L, Eilers B, Loomis D (2011) Eradication of an invasive cyprinid (Gila bicolor) to achieve water quality goals in Diamond Lake, Oregon (USA). Ecol Appl 27:194–204

  13. Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874

  14. Ferri C, Flach P, Hernandez-Orallo J (2002) Learning decision trees using the area under the ROC curve. In: Proceedings of the nineteenth international conference on machine learning (ICML ’02), pp 139–146

  15. Fielding A (1999) Machine learning methods for ecological applications. Springer, New York

  16. Gurevitch J, Padilla DK (2004) Are invasive species a major cause of extinctions? Trends Ecol Evol 19(9):470–474

  17. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1):10–18

  18. Houlahan J, Findlay CS (2004) Effect of invasive plant species on temperate wetland plant diversity. Conserv Biol 18(4):1132–1138

  19. Kaukeinen D (1983) Vertebrate pest control and management materials: fourth symposium. ASTM International, Philadelphia

  20. Keller R, Kocev D, Dzeroski S (2011) Trait-based risk assessment for invasive species: high performance across diverse taxonomic groups, geographic ranges and machine learning/statistical tools. Divers Distrib 17(3):451–461

  21. Klein J, Moeschberger M (1997) Survival analysis—techniques for censored and truncated data—statistics for biology and health. Springer, New York

  22. Kleinbaum D, Klein M (2005) Survival analysis: statistics for biology and health, 2nd edn. Springer, New York

  23. Kolar C, Lodge D (2001) Progress in invasion biology: predicting invaders. Trends Ecol Evol 16:199–204

  24. Kolar C, Lodge DM (2002) Ecological predictions and risk assessment for alien fishes in North America. Science 298(5596):1233–1236

  25. Kulp M, Moore S (2000) Multiple electrofishing removals for eliminating rainbow trout in a small southern Appalachian stream. N Am J Fish Manag 20(1):259–266

  26. Lawless J (2002) Statistical models and methods for lifetime data. Wiley-Interscience, Hoboken

  27. Lawrence J (2005) Introduction to neural networks, 2nd edn. California Scientific Software Press, California

  28. Lek S, Guégan J (1999) Artificial neural networks as a tool in ecological modelling, an introduction. Ecol Model 120:65–73

  29. Lockwood J, Cassey P, Blackburn T (2005) The role of propagule pressure in explaining species invasions. Trends Ecol Evol 20:223–228

  30. Mantel N (1966) Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep 50(3):163–170

  31. Mantel N, Haenszel W (1959) Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22(4):719–748

  32. Massey F (1951) The Kolmogorov–Smirnov test for goodness of fit. J Am Stat Assoc 46(253):68–78

  33. McDonald J (2014) Handbook of biological statistics, 3rd edn. Sparky House Publishing, Baltimore

  34. Miller L (1956) Table of percentage points of Kolmogorov statistics. J Am Stat Assoc 51(273):111–121

  35. Mitchell T (1997) Machine learning. McGraw-Hill Companies Inc, New York

  36. Nagar L, Shenkar N (2016) Temperature and salinity sensitivity of the invasive ascidian Microcosmus exasperatus Heller, 1878. Aquat Invasions 11(1):33–43

  37. Olden J, Jackson D (2002) Illuminating the black box; understanding variable contributions in artificial neural networks. Ecol Model 154:135–150

  38. Olden J, Lawler J, Poff N (2008) Machine learning methods without tears: a primer for ecologists. Q Rev Biol 83(2):171–193

  39. Peto R, Peto J (1972) Asymptotically efficient rank invariant test procedures. J R Stat Soc Ser A 135(2):185–207

  40. Powers D (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J Mach Learn Technol 2(1):37–63

  41. Pu R, Gong P, Tian Y, Miao X, Carruthers R, Anderson G (2008) Invasive species change detection using artificial neural networks and CASI hyperspectral imagery. Environ Monit Assess 140(1–3):15–32

  42. Pullin A, Knight T, Stone D, Charman K (2004) Do conservation managers use scientific evidence to support their decision-making? Biol Conserv 119:245–252

  43. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc, San Francisco

  44. Raymond B, McInnes J, Dambacher J, Way S, Bergstrom D (2011) Qualitative modelling of invasive species eradication on subantarctic Macquarie Island. J Appl Ecol 48(1):181–191

  45. Reichard S, Hamilton C (1997) Predicting invasions of woody plants introduced into North America. Conserv Biol 11(1):193–203

  46. Ricciardi A, Neves RJ, Richard J, Rasmussen J (1998) Impending extinctions of North American freshwater mussels (Unionoida) following the zebra mussel (Dreissena polymorpha) invasion. J Anim Ecol 67(4):613–619

  47. Rowe D, Champion P (1994) Biomanipulation of plants and fish to restore Lake Parkinson: a case study and its implications. In: Collier K (ed) Restoration of aquatic habitat. Selected papers from the second day of the New Zealand Limnological Society 1993 annual conference, pp 53–65

  48. Van-Dyke J, Leslie JA, Nall L (1984) The effects of the grass carp on the aquatic macrophytes of four Florida lakes. J Aquat Plant Manag 22:87–95



MAL acknowledges support from a Canada Research Chair, an NSERC Discovery Grant and a Killam Research Fellowship. RG acknowledges support from NSERC and AMII. YX acknowledges support from the Simons Foundation. We thank Boris Beric, David Drolet and Hugh MacIsaac for their contributions to data collection and their useful comments. This work was partially supported by the Alberta Innovates Centre for Machine Learning, the Canadian Aquatic Invasive Species Network, and the Natural Sciences and Engineering Research Council of Canada.

Author information



Corresponding author

Correspondence to Yanyu Xiao.


Appendix 1: Definitions

In the main text, we used accuracy, AUC, F-measure, precision and recall to compare the performance of different machine learning algorithms (Powers 2011). Here, we give a precise description and formula for each measure, based on the following confusion matrix:

| | Predicted: Success | Predicted: Failure |
|---|---|---|
| Truth: Success | TP | FN |
| Truth: Failure | FP | TN |

  1. Accuracy:

    The ratio of the number of correctly predicted trials to the total number of trials, \(\frac{TP+TN}{TP+FP+FN+TN}\).

  2. Precision:

    The fraction of trials predicted as ‘Success’ that were truly successful, \(\frac{TP}{TP+FP}\).

  3. Recall:

    The fraction of successful trials that are correctly classified, \(\frac{TP}{TP+FN}\).

  4. F-measure:

    Harmonic mean of Precision and Recall: \(\frac{2\times {\text {Precision}}\times {\text {Recall}}}{{\text {Precision}}+{\text {Recall}}}\).

  5. AUC:

    The area under the receiver operating characteristic (ROC) curve for a model; here, we followed the method presented in Ferri et al. (2002) for our decision tree model and the methods in Fawcett (2006) for the other models.
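
As a minimal sketch of the first four definitions above, the following Python snippet computes accuracy, precision, recall and F-measure directly from the four confusion-matrix counts. The class name `ConfusionMatrix`, the function names, and the counts used are illustrative assumptions and are not taken from the paper's data or software; AUC is omitted because it needs ranked scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a two-class problem with 'Success' as the positive class."""
    tp: int  # truly successful, predicted 'Success'
    fn: int  # truly successful, predicted 'Failure'
    fp: int  # truly failed, predicted 'Success'
    tn: int  # truly failed, predicted 'Failure'


def accuracy(cm: ConfusionMatrix) -> float:
    # (TP + TN) / (TP + FP + FN + TN)
    return (cm.tp + cm.tn) / (cm.tp + cm.fp + cm.fn + cm.tn)


def precision(cm: ConfusionMatrix) -> float:
    # TP / (TP + FP)
    return cm.tp / (cm.tp + cm.fp)


def recall(cm: ConfusionMatrix) -> float:
    # TP / (TP + FN)
    return cm.tp / (cm.tp + cm.fn)


def f_measure(cm: ConfusionMatrix) -> float:
    # Harmonic mean of precision and recall
    p, r = precision(cm), recall(cm)
    return 2 * p * r / (p + r)


# Hypothetical counts, chosen only for illustration (not results from the paper).
cm = ConfusionMatrix(tp=60, fn=10, fp=15, tn=58)

for name, fn_ in [("accuracy", accuracy), ("precision", precision),
                  ("recall", recall), ("F-measure", f_measure)]:
    print(f"{name}: {fn_(cm):.3f}")
```

The sketch does not guard against empty denominators (for example, a learner that never predicts ‘Success’ has TP + FP = 0); a real evaluation would need to decide how to report such cases.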

Appendix 2: Kaplan–Meier analysis

We viewed (various subsets of) our database as ‘survival data’, where we set ‘eradication time’ to be the duration of an eradication attempt and the ‘censor’ bit to uncensored if the eradication succeeded, and to censored if it failed. We then used this encoding to compute a Kaplan–Meier survival curve, which gives \(P({\hbox {Time to eradication}} \ge T)\) as a function of time T (Cox and Oakes 1984; Lawless 2002; Kleinbaum and Klein 2005).

To explain this process, consider the subset of 61 instances with ‘containment = yes’. We first sorted the durations of these instances from shortest to longest (25 distinct durations in total); call these times \([t_1, t_2, \ldots , t_{25}]\). At each time \(t_i\), we defined the ‘eradicated trials’ as the instances whose duration was \(t_i\) and whose outcome was ‘Success’, and the ‘censored trials’ as the attempts with the same duration but whose outcome was ‘Failure’. We also defined the number of trials at risk at time \(t_i\) to be the number of trials whose durations were no less than \(t_i\). We used these quantities to compute the survival probability \(P_i\) at each \(t_i\); the curve then consists of these 25 \([t_i, P_i]\) pairs; see Fig. 7a. The probability can be calculated by the following formula

$$P_i=\prod _{j:\,t_j\le t_i} \left( 1-\frac{d_j}{n_j}\right) ,$$

where \(d_j\) is the number of events (eradicated trials) and \(n_j\) is the number of trials at risk at time \(t_j\). The survival probabilities at each time point are listed in the following table.

| Time (year) | Number of eradicated trials | Number of censored (failed) trials | Number of trials at risk | Survival probability |
|---|---|---|---|---|
| \(t_0 =0.00\) | | | | \(P_0 = 1\) |
| \(t_1=0.08\) | 1 | 0 | 61 | \(P_1=1-\frac{1}{61}\) |
| \(t_2=0.17\) | 1 | 0 | 60 | \(P_2=P_1 \cdot (1-\frac{1}{60})\) |
| \(t_3=0.25\) | 3 | 0 | 59 | \(P_3=P_2 \cdot (1-\frac{3}{59})\) |
| \(t_4=0.33\) | 0 | 1 | 56 | \(P_4=P_3 \cdot (1-\frac{0}{56})\) |
| \(t_5=0.83\) | 1 | 0 | 55 | \(P_5=P_4 \cdot (1-\frac{1}{55})\) |
| \(t_6=1.00\) | 2 | 3 | 54 | \(P_6=P_5 \cdot (1-\frac{2}{54})\) |
| \(t_7=1.33\) | 1 | 2 | 49 | \(P_7=P_6 \cdot (1-\frac{1}{49})\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(t_{25}=18.00\) | 0 | 1 | 1 | \(P_{25}=P_{24} \cdot (1-\frac{0}{1})\) |
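
The product-limit computation behind the table can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `kaplan_meier` and the tuple layout are assumptions, and the input covers just the first seven time points listed in the table (the elided rows are not reconstructed). Starting from the 61 trials at risk, it derives the at-risk count and survival probability at each time from the numbers of eradicated and censored trials.

```python
def kaplan_meier(n_start, rows):
    """Product-limit estimate.

    n_start: number of trials at risk at the first time point.
    rows:    (time, eradicated, censored) tuples sorted by time.
    Returns a list of (time, at_risk, survival_probability) tuples.
    """
    at_risk = n_start
    survival = 1.0  # P_0 = 1
    curve = []
    for time, eradicated, censored in rows:
        # P_i = P_{i-1} * (1 - d_i / n_i); censored trials do not change P_i
        survival *= 1.0 - eradicated / at_risk
        curve.append((time, at_risk, survival))
        # Both eradicated and censored trials leave the risk set afterwards
        at_risk -= eradicated + censored
    return curve


# First seven rows of the table: (time in years, eradicated, censored)
rows = [
    (0.08, 1, 0),
    (0.17, 1, 0),
    (0.25, 3, 0),
    (0.33, 0, 1),
    (0.83, 1, 0),
    (1.00, 2, 3),
    (1.33, 1, 2),
]

curve = kaplan_meier(61, rows)
for time, at_risk, survival in curve:
    print(f"t = {time:5.2f}  at risk = {at_risk:2d}  P = {survival:.4f}")
```

The at-risk counts produced this way (61, 60, 59, 56, 55, 54, 49) agree with the fourth column of the table, and, for example, \(P_3 = \frac{60}{61}\cdot \frac{59}{60}\cdot \frac{56}{59} = \frac{56}{61} \approx 0.918\).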


Cite this article

Xiao, Y., Greiner, R. & Lewis, M.A. Evaluation of machine learning methods for predicting eradication of aquatic invasive species. Biol Invasions 20, 2485–2503 (2018). https://doi.org/10.1007/s10530-018-1715-2



  • Aquatic species
  • Machine learning
  • Survival analysis
  • Ecological features
  • Planned intervention