Abstract
In this work, we evaluate the performance of machine learning approaches for predicting the successful eradication of aquatic invasive species (AIS), and assess the extent to which eradication of an invasive species depends on specified ecological features of the target ecosystem and/or on features that characterize the planned intervention. We studied the outcomes of 143 planned attempts to eradicate AIS, where each attempt was described by ecological and eradication-strategy-related features of the target ecosystem. We considered several machine learning approaches to determine whether one could produce a classifier that accurately predicts whether an invasive species will be eradicated. To assess each learner’s performance, we examined its tenfold cross-validated prediction accuracy, as well as its false positive rate, F-measure, and Area Under the ROC Curve (AUC). We also used Kaplan–Meier survival analysis to determine which features are relevant to predicting the time required for each eradication program. Among the five standard machine learning approaches we compared, our analysis suggests that decision tree learners work well and have the best performance. In particular, by examining the trained decision tree model, we found that if the occupied area was not large and/or containment of AIS dispersal was employed, the eradication of AIS was likely to be successful. We also trained decision tree models on only the ecological features and found that their performance was comparable to that of models trained on all features. As our trained decision tree models are accurate, decision makers can use them to estimate the outcome of proposed actions before committing to a specific strategy.
Change history
28 April 2018
In the original publication, Fig. 6 was published with an incorrect axis. The corrected version of Fig. 6 is provided in this correction.
Notes
 1.
Here, n = 6, as we are considering six features.
 2.
Some successful eradication attempts had records of several annual follow-up surveys at the end of the attempts (Rowe and Champion 1994; Akers 2009). (This is because confirming that a species has been eradicated may require several years of continuous observation of the target ecosystem and assessment of the trade-offs arising from any decision.) For these attempts, we defined the recorded duration as the time of the final follow-up survey, i.e., as the time required to confirm the eradication of the AIS. For the other successful trials, which had no records of follow-up surveys, we set the recorded time to the eradication time.
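A minimal Python sketch of the duration rule in this note, assuming follow-up survey times are recorded in years from the start of the attempt. The function name `recorded_duration` is hypothetical, and the handling of failed attempts (which simply keep their eradication time) is an assumption, since the note only covers successful trials.

```python
from typing import Optional, Sequence


def recorded_duration(succeeded: bool,
                      eradication_time: float,
                      followup_times: Optional[Sequence[float]] = None) -> float:
    """Return the duration (in years) recorded for one eradication attempt.

    Assumption: follow-up survey times are measured, like eradication_time,
    in years from the start of the attempt.
    """
    if succeeded and followup_times:
        # Successful attempt with follow-up surveys: use the final survey,
        # i.e., the time required to confirm eradication.
        return max(followup_times)
    # Successful attempt without follow-up records (and, by assumption,
    # any failed attempt): keep the eradication time itself.
    return eradication_time


print(recorded_duration(True, 2.0, [3.0, 4.0, 5.0]))  # 5.0
print(recorded_duration(True, 2.0))                    # 2.0
```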
References
Akers P (2009) Hydrilla eradication program progress report 2009. Technical report, California Department of Food and Agriculture
Barahona-Segovia R, Grez A, Bozinovic F (2015) Testing the hypothesis of greater eurythermality in invasive than in native ladybird species: from physiological performance to life-history strategies. Ecol Entomol 41(2):182–191
Boets P, Landuyt D, Everaert G, Broekx S, Goethals P (2015) Evaluation and comparison of data-driven and knowledge-supported Bayesian belief networks to assess the habitat suitability for alien macroinvertebrates. Environ Model Softw 74:92–103
Breiman L, Friedman J, Stone C, Olshen R (1984) Classification and regression trees. Taylor & Francis, London
Cambray J (2003) Impact on indigenous species biodiversity caused by the globalisation of alien recreational freshwater fisheries. In: Martens K (ed) Aquatic biodiversity: a celebratory volume in honour of Henri J. Dumont. Springer, Dordrecht, pp 217–230
Caudron A, Champigneulle A (2011) Multiple electrofishing as a mitigate tool for removing nonnative Atlantic brown trout (Salmo trutta L.) threatening a native Mediterranean brown trout population. Eur J Wildl Res 5(3):575–583
Cooling M, Hartley S, Sim D, Lester P (2011) The widespread collapse of an invasive species: Argentine ants (Linepithema humile) in New Zealand. Biol Lett 8:430–433
Cox D, Oakes D (1984) Analysis of survival data. Chapman & Hall/CRC, London
Drake D, Mercader R, Dobson T, Mandrak E (2015) Can we predict risky human behaviour involving invasive species? A case study of the release of fishes to the wild. Biol Invasions 17:309–326
Drolet D, Locke A, Lewis MA, Davidson J (2014) User-friendly and evidence-based tool to evaluate probability of eradication of aquatic non-indigenous species. J Appl Ecol 51(4):1050–1056
Drolet D, Locke A, Lewis MA, Davidson J (2015) Evidence-based tool surpasses expert opinion in predicting probability of eradication of aquatic non-indigenous species. Ecol Appl 25(2):441–450
Eilers J, Truemper H, Jackson L, Eilers B, Loomis D (2011) Eradication of an invasive cyprinid (Gila bicolor) to achieve water quality goals in Diamond Lake, Oregon (USA). Ecol Appl 27:194–204
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874
Ferri C, Flach P, Hernández-Orallo J (2002) Learning decision trees using the area under the ROC curve. In: Proceedings of the nineteenth international conference on machine learning (ICML ’02), pp 139–146
Fielding A (1999) Machine learning methods for ecological applications. Springer, New York
Gurevitch J, Padilla DK (2004) Are invasive species a major cause of extinctions? Trends Ecol Evol 19(9):470–474
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1):10–18
Houlahan J, Findlay CS (2004) Effect of invasive plant species on temperate wetland plant diversity. Conserv Biol 18(4):1132–1138
Kaukeinen D (1983) Vertebrate pest control and management materials: fourth symposium. ASTM International, Philadelphia
Keller R, Kocev D, Dzeroski S (2011) Trait-based risk assessment for invasive species: high performance across diverse taxonomic groups, geographic ranges and machine learning/statistical tools. Divers Distrib 17(3):451–461
Klein J, Moeschberger M (1997) Survival analysis: techniques for censored and truncated data. Statistics for biology and health. Springer, New York
Kleinbaum D, Klein M (2005) Survival analysis: statistics for biology and health, 2nd edn. Springer, New York
Kolar C, Lodge D (2001) Progress in invasion biology: predicting invaders. Trends Ecol Evol 16:199–204
Kolar C, Lodge DM (2002) Ecological predictions and risk assessment for alien fishes in North America. Science 298(5596):1233–1236
Kulp M, Moore S (2000) Multiple electrofishing removals for eliminating rainbow trout in a small southern Appalachian stream. N Am J Fish Manag 20(1):259–266
Lawless J (2002) Statistical models and methods for lifetime data. Wiley-Interscience, Hoboken
Lawrence J (2005) Introduction to neural networks, 2nd edn. California Scientific Software Press, California
Lek S, Guégan J (1999) Artificial neural networks as a tool in ecological modelling, an introduction. Ecol Model 120:65–73
Lockwood J, Cassey P, Blackburn T (2005) The role of propagule pressure in explaining species invasions. Trends Ecol Evol 20:223–228
Mantel N (1966) Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep 50(3):163–170
Mantel N, Haenszel W (1959) Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22(4):719–748
Massey F (1951) The Kolmogorov–Smirnov test for goodness of fit. J Am Stat Assoc 46(253):68–78
McDonald J (2014) Handbook of biological statistics, 3rd edn. Sparky House Publishing, Baltimore
Miller L (1956) Table of percentage points of Kolmogorov statistics. J Am Stat Assoc 51(273):111–121
Mitchell T (1997) Machine learning. McGraw-Hill Companies Inc, New York
Nagar L, Shenkar N (2016) Temperature and salinity sensitivity of the invasive ascidian Microcosmus exasperatus Heller, 1878. Aquat Invasions 11(1):33–43
Olden J, Jackson D (2002) Illuminating the black box; understanding variable contributions in artificial neural networks. Ecol Model 154:135–150
Olden J, Lawler J, Poff N (2008) Machine learning methods without tears: a primer for ecologists. Q Rev Biol 83(2):171–193
Peto R, Peto J (1972) Asymptotically efficient rank invariant test procedures. J R Stat Soc Ser A 135(2):185–207
Powers D (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J Mach Learn Technol 2(1):37–63
Pu R, Gong P, Tian Y, Miao X, Carruthers R, Anderson G (2008) Invasive species change detection using artificial neural networks and CASI hyperspectral imagery. Environ Monit Assess 140(1–3):15–32
Pullin A, Knight T, Stone D, Charman K (2004) Do conservation managers use scientific evidence to support their decision-making? Biol Conserv 119:245–252
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc, San Francisco
Raymond B, McInnes J, Dambacher J, Way S, Bergstrom D (2011) Qualitative modelling of invasive species eradication on subantarctic Macquarie Island. J Appl Ecol 48(1):181–191
Reichard S, Hamilton C (1997) Predicting invasions of woody plants introduced into North America. Conserv Biol 11(1):193–203
Ricciardi A, Neves RJ, Richard J, Rasmussen J (1998) Impending extinctions of North American freshwater mussels (Unionoida) following the zebra mussel (Dreissena polymorpha) invasion. J Anim Ecol 67(4):613–619
Rowe D, Champion P (1994) Biomanipulation of plants and fish to restore Lake Parkinson: a case study and its implications. In: Collier K (ed) Restoration of aquatic habitat. Selected papers from the second day of the New Zealand Limnological Society 1993 annual conference, pp 53–65
Van Dyke J, Leslie JA, Nall L (1984) The effects of the grass carp on the aquatic macrophytes of four Florida lakes. J Aquat Plant Manag 22:87–95
Acknowledgements
MAL acknowledges support from a Canada Research Chair, an NSERC Discovery Grant and a Killam Research Fellowship. RG acknowledges support from NSERC and AMII. YX acknowledges support from the Simons Foundation. We thank Boris Beric, David Drolet and Hugh MacIsaac for their contributions to data collection and for useful comments. This work was partially supported by the Alberta Innovates Centre for Machine Learning, the Canadian Aquatic Invasive Species Network, and the Natural Sciences and Engineering Research Council of Canada.
Appendices
Appendix 1: Definitions
In the main text, we used accuracy, AUC, F-measure, precision and recall to compare the performance of the different machine learning algorithms (Powers 2011). Here, we give precise definitions and formulas, computed from the following confusion matrix:
Truth (rows) \ Prediction (columns) | Success | Failure
Success | TP | FN
Failure | FP | TN
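To make the layout of the confusion matrix concrete, the following Python sketch tallies the four cells from paired true and predicted outcomes. The function `confusion_counts` and the toy labels are made up for illustration; they are not the authors' data or code.

```python
from typing import Sequence, Tuple


def confusion_counts(truth: Sequence[str], predicted: Sequence[str],
                     positive: str = "Success") -> Tuple[int, int, int, int]:
    """Tally (TP, FN, FP, TN) with `positive` as the positive class.

    Rows of the matrix are the true outcomes, columns the predictions.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fn = fp = tn = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1   # truly successful, predicted 'Success'
            else:
                fn += 1   # truly successful, predicted 'Failure'
        elif p == positive:
            fp += 1       # truly failed, predicted 'Success'
        else:
            tn += 1       # truly failed, predicted 'Failure'
    return tp, fn, fp, tn


truth = ["Success", "Success", "Failure", "Failure", "Success"]
predicted = ["Success", "Failure", "Success", "Failure", "Success"]
print(confusion_counts(truth, predicted))  # (2, 1, 1, 1)
```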

Accuracy:
The ratio of the number of correctly predicted trials to the total number of trials, \(\frac{TP+TN}{TP+FP+FN+TN}\).

Precision:
The fraction of trials predicted as ‘Success’ that actually succeeded, \(\frac{TP}{TP+FP}\).

Recall:
The fraction of truly successful trials that are correctly predicted as ‘Success’, \(\frac{TP}{TP+FN}\).

F-measure:
Harmonic mean of Precision and Recall: \(\frac{2\times {\text {Precision}}\times {\text {Recall}}}{{\text {Precision}}+{\text {Recall}}}\).

AUC:
The area under the receiver operating characteristic (ROC) curve of a model; here, we followed the method presented in Ferri et al. (2002) for our decision tree model and the methods in Fawcett (2006) for the other models.
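The definitions above translate directly into code. The Python sketch below is illustrative only: the names `metrics` and `auc_rank` are hypothetical, returning 0.0 for a zero denominator is an assumed convention, and the false positive rate, FP/(FP+TN), is included because the abstract reports it. For AUC it uses the rank (Mann–Whitney) formulation discussed by Fawcett (2006), i.e., the probability that a randomly chosen successful trial receives a higher score than a randomly chosen failed one, with ties counted as one half; it does not reproduce the Ferri et al. (2002) procedure used for the decision tree.

```python
from typing import Dict, Sequence


def metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Confusion-matrix metrics; a zero denominator yields 0.0 (assumed convention)."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        # Harmonic mean of precision and recall.
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }


def auc_rank(scores: Sequence[float], is_success: Sequence[bool]) -> float:
    """AUC as P(score of a random success > score of a random failure), ties = 1/2."""
    pos = [s for s, y in zip(scores, is_success) if y]
    neg = [s for s, y in zip(scores, is_success) if not y]
    if not pos or not neg:
        raise ValueError("need at least one successful and one failed trial")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(metrics(2, 1, 1, 1))
print(auc_rank([0.9, 0.8, 0.4, 0.3], [True, False, True, False]))  # 0.75
```

The pairwise loop is quadratic in the number of trials, which is harmless for a data set of 143 attempts; a sort-based rank computation would scale better.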
Appendix 2: Kaplan–Meier analysis
We viewed (various subsets of) our database as ‘survival data’, where we set the ‘eradication time’ to be the duration of each eradication attempt, and set the ‘censor’ bit to uncensored if the eradication succeeded and to censored if it failed. We then used these data to compute a Kaplan–Meier survival curve, which estimates \(P({\hbox {Time to eradication}} \ge T)\) as a function of time T (Cox and Oakes 1984; Lawless 2002; Kleinbaum and Klein 2005).
To explain this process, consider the subset of 61 instances with ‘containment = yes’. We first sorted the durations of these instances from shortest to longest (25 distinct durations in total); call these times \([t_1, t_2, \ldots , t_{25}]\). At each time \(t_i\), we defined the ‘eradicated trials’ as the instances whose duration was \(t_i\) and whose outcome was ‘Success’, and the ‘censored trials’ as the attempts with the same duration whose outcome was ‘Failure’. We also defined the number of trials at risk at time \(t_i\) to be the number of trials whose durations were no less than \(t_i\). We used these quantities to compute the survival probabilities \(P_1, \ldots , P_{25}\) corresponding to these \(t_i\)’s; the curve then consists of these 25 \([t_i, P_i]\) pairs; see Fig. 7a. The probability can be calculated by the following formula:
\(P_i = P_{i-1} \cdot \left(1 - \frac{d_i}{n_i}\right) = \prod_{j=1}^{i} \left(1 - \frac{d_j}{n_j}\right), \quad P_0 = 1,\)
with \(d_i\) the number of events (eradicated trials) and \(n_i\) the number of trials at risk at time \(t_i\). The survival probability at each time point is listed in the following table.
Time (years) | Eradicated trials (\(d_i\)) | Censored (failed) trials | Trials at risk (\(n_i\)) | Survival probability
\(t_0 = 0.00\) | | | | \(P_0 = 1\)
\(t_1 = 0.08\) | 1 | 0 | 61 | \(P_1 = 1 - \frac{1}{61}\)
\(t_2 = 0.17\) | 1 | 0 | 60 | \(P_2 = P_1 \cdot (1 - \frac{1}{60})\)
\(t_3 = 0.25\) | 3 | 0 | 59 | \(P_3 = P_2 \cdot (1 - \frac{3}{59})\)
\(t_4 = 0.33\) | 0 | 1 | 56 | \(P_4 = P_3 \cdot (1 - \frac{0}{56})\)
\(t_5 = 0.83\) | 1 | 0 | 55 | \(P_5 = P_4 \cdot (1 - \frac{1}{55})\)
\(t_6 = 1.00\) | 2 | 3 | 54 | \(P_6 = P_5 \cdot (1 - \frac{2}{54})\)
\(t_7 = 1.33\) | 1 | 2 | 49 | \(P_7 = P_6 \cdot (1 - \frac{1}{49})\)
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\)
\(t_{25} = 18.00\) | 0 | 1 | 1 | \(P_{25} = P_{24} \cdot (1 - \frac{0}{1})\)
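The product-limit calculation described above can be sketched in plain Python. The helpers `tabulate` and `kaplan_meier` are hypothetical names, not the authors' implementation: `tabulate` builds the \((t_i, d_i, \text{censored}, n_i)\) rows from raw durations and outcomes, and `kaplan_meier` applies \(P_i = P_{i-1}(1 - d_i/n_i)\). The usage lines recompute the survival probabilities for the first seven rows of the table from their tabulated counts.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, int, int, int]  # (t_i, d_i eradicated, c_i censored, n_i at risk)


def tabulate(durations: Sequence[float], succeeded: Sequence[bool]) -> List[Row]:
    """One row per distinct duration, sorted from shortest to longest."""
    rows: List[Row] = []
    for t in sorted(set(durations)):
        d = sum(1 for x, s in zip(durations, succeeded) if x == t and s)
        c = sum(1 for x, s in zip(durations, succeeded) if x == t and not s)
        n = sum(1 for x in durations if x >= t)  # durations no less than t
        rows.append((t, d, c, n))
    return rows


def kaplan_meier(rows: Sequence[Row]) -> List[Tuple[float, float]]:
    """(t_i, P_i) pairs with P_0 = 1 and P_i = P_{i-1} * (1 - d_i / n_i)."""
    p = 1.0
    curve = [(0.0, p)]
    for t, d, _c, n in rows:
        p *= 1.0 - d / n  # censored trials only lower the later n_i values
        curve.append((t, p))
    return curve


# First seven rows of the table above: (t_i, d_i, c_i, n_i).
table_rows: List[Row] = [
    (0.08, 1, 0, 61), (0.17, 1, 0, 60), (0.25, 3, 0, 59), (0.33, 0, 1, 56),
    (0.83, 1, 0, 55), (1.00, 2, 3, 54), (1.33, 1, 2, 49),
]
for t, p in kaplan_meier(table_rows):
    print(f"t = {t:5.2f}  P = {p:.4f}")
```

For example, \(P_3\) and \(P_4\) both equal 56/61 ≈ 0.9180, because the only event at \(t_4 = 0.33\) is a censored (failed) trial, which reduces the number at risk at later times but leaves the survival probability unchanged.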
About this article
Cite this article
Xiao, Y., Greiner, R. & Lewis, M.A. Evaluation of machine learning methods for predicting eradication of aquatic invasive species. Biol Invasions 20, 2485–2503 (2018). https://doi.org/10.1007/s10530-018-1715-2
Keywords
 Aquatic species
 Machine learning
 Survival analysis
 Ecological features
 Planned intervention