Introduction

A major challenge for invasion biologists has been to determine which species can complete the invasion process (arrive, survive, establish, spread), and eventually have negative impacts in an invaded area. To this end, research has focused on three main invasion aspects: propagule pressure (the number of individuals arriving within a given time interval; Lockwood et al. 2005; Simberloff 2009), species traits influencing invasiveness (Rejmanek and Richardson 1996; Kolar and Lodge 2001, 2002), and characteristics of the receiving environment influencing invasibility (e.g., Lonsdale 1999; Levine et al. 2004). Many tools have been developed to evaluate the risk a species poses to a particular area using various combinations of factors thought to influence the success of a species outside of its native range (reviewed in Kumschick and Richardson 2013). Such tools, if accurate, have clear benefits from a policy or management perspective, such as rapidly screening species, denying risky proposed intentional introductions, and focusing limited resources on species posing the greatest risk.

Screening-level tools for non-indigenous species are generally based on the answers to a series of questions to determine if a species is a threat (high risk) or not. Common screening-level tool types include decision trees (Reichard and Hamilton 1997; Kolar and Lodge 2002; Caley and Kuhnert 2006) and scoring systems (Pheloung et al. 1999; Daehler et al. 2004; Copp et al. 2009). Most scoring systems that have been calibrated and tested to date are derivatives of the Australia Weed Risk Assessment model (AuWRA; Pheloung 1995), a tool designed to evaluate proposed intentional plant introductions. The AuWRA has been adapted successfully for terrestrial plants internationally (Daehler et al. 2004; Kato et al. 2006; Gordon et al. 2008a, b; Gordon and Gantz 2008), aquatic plants (Gordon and Gantz 2011), and some animal taxa (Copp et al. 2009; Tricarico et al. 2010).

In the marine realm, risk assessment for non-indigenous species has mostly focused on vectors and pathways of introduction (Floerl et al. 2005; Weigle et al. 2005; Barry et al. 2008; Acosta et al. 2010; Clarke Murray et al. 2013); much less has been done on developing approaches to rapidly screen species based on the risk they pose (but see Hayes and Sliwa 2003; Nyberg and Wallentinus 2005; Miller et al. 2007; Locke 2009). To our knowledge, only one screening-level risk assessment tool specific to marine invertebrates is available, the Marine Invertebrate Invasiveness Scoring Kit (MI-ISK). This tool is an adaptation of the AuWRA and has yet to be calibrated and tested (GH Copp, pers. comm.). Given the sheer number and impact of recent marine invertebrate introductions (Ruiz et al. 2000; Grosholz 2002), managers would benefit from tools that allow a rapid evaluation of the risk posed by a species in a particular area.

The testing and calibration of risk assessment tools typically involve scoring species known to have been introduced to an area and relating those scores to the actual outcome of the introductions. However, quantification of impacts of non-indigenous species can be problematic (e.g., Nentwig et al. 2010; Blackburn et al. 2014). Thus, most risk assessment studies identify categorical outcomes (e.g., non-pest/minor pest/major pest, or invasive/not invasive; often referred to as a priori categories) that are based on the opinion of experts (Pheloung et al. 1999; Daehler et al. 2004; Gordon et al. 2012), on databases (Daehler and Carino 2000; Gordon et al. 2008a; Copp et al. 2009; Gordon and Gantz 2011), or on literature accounts of whether species became established following documented introductions (Bomford and Glover 2004; Bomford et al. 2005). Threshold values can then be determined as the assessment scores that best assign species to the correct outcome category (e.g., Copp et al. 2009). This allows categorizing a species in relation to the risk posed and associated management decisions (e.g., in the case of proposed introductions: ‘accept’ if low risk, ‘evaluate further’ if ambiguous, or ‘reject’ if high risk). However, the realized impact of a non-indigenous species in an area is not categorical; species categorized as invaders will invariably result in a gradient of impacts. A continuous scale for the expert rankings may thus be more appropriate as it allows: (1) ranking of relative impact within categories (i.e., some invaders have greater impacts or pose higher risk than others), (2) an evaluation of strength of association between assessment score and realized impact (e.g., using correlation or regression analysis), and (3) a quantification of uncertainty. However, few studies (Pheloung et al. 1999; Daehler et al. 2004; Crosti et al. 2010; McClay et al. 2010) have transformed categorical scores to a continuous scale by using the average of semi-quantitative expert classifications.

Answers to questions included in screening-level tools and the information used to test these tools (i.e., the measure of realized impacts/risk) both contain uncertainties. Uncertainty may arise from the quality of information used or its interpretation (judgement subjectivity, sensu Regan et al. 2002) or the interpretation of the language used in assessment tool questions or expert surveys (linguistic uncertainty, sensu Regan et al. 2002), resulting in both intra- and inter-assessor/expert uncertainty. Few studies have addressed these issues directly (Kumschick and Richardson 2013). Blackburn et al. (2014) proposed a qualitative way to describe uncertainty surrounding realized impacts. Holt et al. (2012, 2014) devised a way to visualize uncertainty surrounding risk components and developed a scheme to combine two risk components (e.g., risk of entry and risk of establishment) under different levels of uncertainty. Copp et al. (2009) and Tricarico et al. (2010) added an uncertainty score associated with the answer to each question; these scores are then averaged to provide a relative measure of intra-assessor uncertainty for the score assigned to each species. Copp et al. (2009) reported the variability in scores assigned by different assessors for the same species, thus quantifying inter-assessor uncertainty. To date, none of the risk assessment tools developed has included a way to quantify uncertainty to compute confidence limits surrounding either risk or impact scores (Koop et al. 2012).

Sanctioned introductions of novel species in the marine environment have been greatly reduced in many parts of the world (ICES 2005), and most recent introductions are the result of either illegal or accidental releases. Thus, a screening-level risk assessment tool adapted to these means of introduction is needed to inform legislation (i.e., populate lists of species to be regulated) and to prioritize intervention (i.e., focus resources on the riskiest species when a choice needs to be made). Here we present a new tool (Canadian Marine Invasive Screening Tool; CMIST) that follows the sequence of events in the invasion process of marine invertebrates (arrival, survival, establishment, spread, and impact), but that is general enough to be adapted to any taxon. Further, we provide the first evaluation of screening-level risk assessment tools for the marine environment by comparing predictions made by CMIST and MI-ISK against expert evaluations of risk posed by species known to have been introduced to three Canadian marine ecoregions. Since these tools are ultimately designed to evaluate risk posed by species not already present in an area, we also evaluated 45 potential invasive species/ecoregion combinations and compared their risk scores with those of species already present. Lastly, we present and employ a simple way to quantify uncertainty in both the expert evaluations and assessment scores.

Methods

Tools evaluated

The Canadian Marine Invasive Screening Tool (CMIST) is a modification of the Alberta Risk Assessment Tool (version 3; IASWG 2009; a general risk assessment tool to evaluate the risk associated with terrestrial and aquatic organisms, developed for use by the province of AB, Canada). It focuses on the different steps of the invasion process and explicitly distinguishes the two risk components: ‘Likelihood of invasion’ and ‘Impact of invasion’ (Kumschick and Richardson 2013). To this end, CMIST asks 17 questions pertaining to a species’ present status, rate of introduction, probability of survival, establishment, and spread in the assessment area, and ecological impacts in the assessment area and elsewhere (Table 1; guidance for each question is available as supplemental material). The answer to each question (‘Low’, ‘Moderate’, or ‘High’) is converted into a numerical score of 1, 2, or 3. A mean score is calculated for likelihood of invasion (i.e., questions 1–8) and potential impacts (i.e., questions 9–17). These two mean scores are then multiplied to obtain a final risk score ranging from 1 to 9 (a spreadsheet to calculate species assessment scores is available as supplemental material); this results in an equal contribution of each question to the final score, but we recognize accuracy or precision could be increased with a weighting scheme (Drolet et al. in prep). Assessors also assign a qualitative certainty score to the answer provided for each question. Certainty assigned as ‘Low’, ‘Moderate’, or ‘High’ is hereafter referred to as uncertainty being ‘High’, ‘Moderate’, or ‘Low’, respectively, to standardize with published literature. The questions are phrased to be possible to answer even in the absence of information, and all questions need to be answered to calculate a final score.
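The score calculation described above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only, not the supplemental spreadsheet; the function name and the example answers are hypothetical.

```python
from statistics import mean

# Numerical values for the three possible answers, as described in the text.
ANSWER_SCORES = {"Low": 1, "Moderate": 2, "High": 3}


def cmist_score(answers):
    """Return the CMIST risk score for a list of 17 answers.

    Questions 1-8 address likelihood of invasion and questions 9-17 address
    impact of invasion; the final score is the product of the two means.
    """
    if len(answers) != 17:
        raise ValueError("CMIST requires an answer to all 17 questions")
    scores = [ANSWER_SCORES[a] for a in answers]
    likelihood = mean(scores[:8])   # questions 1-8
    impact = mean(scores[8:])       # questions 9-17
    return likelihood * impact


# Hypothetical species: illustrative answers only, not taken from the study.
example = ["High"] * 4 + ["Moderate"] * 4 + ["Low"] * 9
print(cmist_score(example))
```

Because each component mean lies between 1 and 3, the product is bounded by 1 (all answers ‘Low’) and 9 (all answers ‘High’).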

Table 1 Questions of CMIST and description of potential scores

The second tool evaluated, MI-ISK, is an adaptation of the AuWRA specific to marine invertebrates and is available online (www.cefas.defra.gov.uk/media/621525/decisiontools__background&guidance_v4_oct13.pdf). It asks 49 questions (most responses require a choice among yes/no/don’t know and some are qualitative-ordinal) pertaining to species domestication, climate requirements, distribution, history of invasion, biological traits, feeding biology, reproduction biology, dispersal, and persistence attributes. A qualitative level of confidence (0 = very uncertain, 1 = mostly uncertain, 2 = mostly certain, 3 = very certain) is recorded for each question. Using the answers provided by the assessor, this system returns scores ranging from −12 to 57. A minimum of 10 questions must be answered for a score to be calculated. Some questions are given greater weight based on their perceived importance, and answers to some questions influence the weighting of subsequent questions.

Species assessments

Risk scores for non-indigenous invertebrate species known to have been introduced in Canadian marine ecosystems were calculated using both CMIST and MI-ISK. Specifically, we focused on species introduced to three marine ecoregions [DFO 2009: Strait of Georgia on Canada’s west coast (30 species), and Gulf of St. Lawrence (15 species) and Scotian Shelf (15 species) on Canada’s east coast; Table 2]. Some species were introduced to more than one ecoregion (Table 2). Two biologists with good knowledge of non-indigenous and indigenous marine fauna independently scored each species-ecoregion combination using both tools. Scoring was done through searching for information needed to answer each question from various sources (e.g., primary publications, reports, databases) available via the internet, the idea being that any new species could be evaluated in a day or two using currently available tools. The justification and sources of information used to answer each question were noted to leave a record justifying potential decisions based on assessment (accountability). The same procedure was used to assess 15 species not already present in each of the three ecoregions, but that have the potential to arrive and establish in the future (i.e., species with a history of establishment outside their native range in broadly similar environmental conditions).

Table 2 List of non-indigenous marine invertebrate species evaluated in three Canadian marine ecoregions

Expert opinion survey

Ideally, tool performance would be evaluated against actual outcomes, here impacts in the new ecosystem. Other similar studies have been able to test performance using species for which the realized impact has been documented, i.e., species that have been present for a long time and for which the invasion outcome (impacts) is evident (e.g., McClay et al. 2010). Unfortunately, for marine invertebrates in Canadian waters, too few species meet these criteria to generate a sufficient dataset for evaluation; most introductions are recent, the species are still spreading, and impacts have not been realized or fully documented. Thus, we conducted an expert opinion survey to obtain a baseline against which CMIST and MI-ISK tool performance could be evaluated. Experts have knowledge of the species (vectors, abundance, spread, impacts, etc.) and environmental/habitat characteristics in areas where they have been introduced, and their opinion thereby provides a reasonable proxy for risk to an ecoregion. This survey was conducted using the web-based platform SurveyMonkey®. Seventy potential respondents, all biologists with extensive experience and knowledge of non-indigenous marine species in Canadian and bordering American waters, were contacted by e-mail and invited to complete the survey. Details about the objectives of the survey were explained and potential respondents were asked to provide answers only for species/ecoregions for which they felt they had sufficient expertise. Experts were asked to qualify the level of risk a species poses to an ecoregion, and their level of certainty, as ‘Low’, ‘Moderate’, or ‘High’.

Testing of tools

The precision and accuracy of the assessment scores returned by CMIST and MI-ISK were evaluated for each ecoregion. Precision was evaluated as the between-assessor variability using correlation analyses of the overall risk scores assigned by each assessor; relative precision of the two tools was evaluated by statistically comparing the correlation coefficients (Zar 2010). To determine if scores returned by the two tools were in agreement, average scores (of the two assessments) were compared using correlation analyses. These two analyses were conducted separately for species known to have been introduced to an ecoregion and species not already present. Tool accuracy was evaluated using linear regression analyses with the expert opinion scores as the dependent variable and the average assessment scores as the independent variable (only species already introduced). We used the average of the expert risk ratings (‘Low’ = 1, ‘Moderate’ = 2, and ‘High’ = 3) to produce a continuous expert opinion score to use in the linear regression. Finally, corrected Akaike Information Criterion (AICc) values (a measure of model fit) were used to compare the accuracy of both tools (a lower relative AICc value represents better model accuracy) and evaluate the likelihood that each tool provides the best fit to the expert opinion scores.
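As an illustration of how two correlation coefficients can be compared statistically, the sketch below applies the Fisher z-transformation test for two correlations. The text cites Zar (2010) without naming the exact procedure, so the choice of this test, its assumption of independent samples, and the function name are assumptions made here for illustration.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z test for the difference between two correlations.

    r1, r2 are correlation coefficients; n1, n2 are the sample sizes
    (each must exceed 3). Returns the z statistic and the p-value.
    """
    z1 = math.atanh(r1)  # Fisher z-transformation of r1
    z2 = math.atanh(r2)  # Fisher z-transformation of r2
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical between-assessor correlations for two tools, 15 species each.
z_stat, p_value = compare_correlations(0.85, 15, 0.70, 15)
print(round(z_stat, 3), round(p_value, 3))
```

A large p-value indicates no detectable difference in between-assessor agreement between the two tools at the given sample sizes.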

We conducted a second independent evaluation of the accuracy of CMIST using the results of detailed risk assessments. So far, in Canada, such information is available for five tunicate taxa (Styela clava, Ciona intestinalis, Botrylloides violaceus, Botryllus schlosseri, and Didemnum vexillum; Therriault and Herborg 2007), the European green crab (Carcinus maenas; Therriault et al. 2008a), and the Chinese mitten crab (Eriocheir sinensis; Therriault et al. 2008b). These assessments, which often take over a year to produce, classify species as posing ‘Low’, ‘Moderate’, or ‘High’ risk for both the Atlantic and Pacific Coasts. To visualize how the results of these detailed risk analyses compare to the CMIST scores adjusted for uncertainty (see below), species-ecoregion plots were produced. CMIST scores for each species were sorted in ascending order (but no threshold between risk categories was determined); note that the first question (about current status in the ecoregion) was ignored in score calculation to allow a comparison between species present and not present. We then added the results of the available detailed risk assessments to evaluate where species classified by the detailed assessments as moderate and high risk (no species were evaluated as low risk) fall along the spectrum of CMIST scores.

Quantification of uncertainty

A simple way to quantify uncertainty around the expert opinion scores and the CMIST species assessment scores was developed. The idea is similar in concept to fuzzy logic (previously used in risk assessment for non-indigenous species when dealing with subjective data; Acosta et al. 2010) and captures the probabilities that an expert/assessor would have provided a different answer if they had to answer a question several times. For example, given a question for which the answer is obvious (clearly falls within a category based on the guidance) and the evidence is strong (several peer-reviewed sources with similar conclusions), an assessor would probably always answer the question the same way. In contrast, if the sources of information are weak (e.g., anecdotal evidence and/or similar studies reaching different conclusions) assessors might answer a question differently if they had to repeat the process because the available evidence makes several answers equally possible. Thus, we developed probability distributions of answers under different levels of uncertainty (Low, Moderate, or High) and used them to compute the range of possible outcomes. Four authors (CDB, AL, CWM, and TWT) independently drew these distributions to reflect how, on average, it was felt the scores would be distributed for the nine possible combinations of score and uncertainty levels. The probabilities returned by this group were then averaged to produce the final distributions in Fig. 1. Confidence limits for the expert opinion scores were calculated using Monte Carlo procedures. Specifically, we used the distributions in Fig. 1 and the answers returned by experts, i.e., combination of risk and uncertainty levels for a species, to produce a range of possible scores. As a simplified example, assume two experts returned risk scores for a species: one scored the species as being high risk with low uncertainty and the other, high risk with high uncertainty. 
For each Monte Carlo simulation, the species risk score would be calculated by drawing numbers from the bottom right distribution in Fig. 1 for the first expert (score would always be a 3) and from the bottom left distribution for the second expert (score would be 3 for ~60 %, 2 for ~35 %, and 1 for ~5 % of the simulations). The drawn scores would then be averaged among experts and the process would be repeated 1000 times, with the 2.5th and 97.5th percentiles used as the 95 % confidence limits. A similar approach was used for the CMIST assessment scores: values for each question were drawn from the probability distributions in Fig. 1 based on the assessor’s answer and level of uncertainty. The adjusted CMIST scores and expert opinion scores were plotted with their associated confidence limits, and linear regression analyses were used to determine if the inclusion of uncertainty changed the fit between the two variables (when compared to the raw scores). A vector field showing how the approach changed the results for each species was produced.
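The simplified two-expert example above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the study: only the two distributions whose probabilities are stated in the text are included (the remaining seven would have to be read from Fig. 1), and the function name, the seed argument, and the simple index-based percentile rule are assumptions.

```python
import random
from statistics import mean

SCORES = [1, 2, 3]

# Probability of drawing a score of 1, 2, or 3 for a given
# (answer, uncertainty) combination. Only the two combinations quoted in
# the text are filled in; the others must be taken from Fig. 1.
DISTRIBUTIONS = {
    (3, "Low"): [0.00, 0.00, 1.00],   # high risk, low uncertainty: always 3
    (3, "High"): [0.05, 0.35, 0.60],  # high risk, high uncertainty (approx.)
}


def simulate_expert_score(responses, n_sim=1000, seed=None):
    """Return (mean, lower, upper) for the averaged expert score.

    responses is a list of (answer, uncertainty) tuples, one per expert.
    lower and upper are the 2.5th and 97.5th percentiles of the simulations.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        draws = [rng.choices(SCORES, weights=DISTRIBUTIONS[r])[0]
                 for r in responses]
        sims.append(mean(draws))  # average among experts for this simulation
    sims.sort()
    lower = sims[int(0.025 * n_sim)]
    upper = sims[min(n_sim - 1, int(0.975 * n_sim))]
    return mean(sims), lower, upper


# Two experts: high risk/low uncertainty and high risk/high uncertainty.
estimate, lower, upper = simulate_expert_score([(3, "Low"), (3, "High")], seed=1)
print(estimate, lower, upper)
```

The same routine could be applied to CMIST assessments by drawing one value per question and then computing the score from the drawn values in each simulation.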

Fig. 1

Probability distribution of scores, at different levels of uncertainty, used to compute confidence limits around expert opinion and CMIST assessment scores. The distributions were independently adjusted by four of the authors, and the average used to represent the probabilities that an answer may have been changed for a particular level of uncertainty

Results

Overall, the two assessors returned similar scores. Correlation coefficients for species already introduced ranged from 0.51 to 0.85 among ecoregions and tools, and from 0.72 to 0.83 for species not present (Fig. 2). The linear relationships generally had similar slopes and intercepts when comparing species already present and not present. However, the intercept for CMIST scores was slightly smaller for species not present in the Gulf of St. Lawrence and Scotian Shelf (Fig. 2). The precision of CMIST and MI-ISK was similar for the Gulf of St. Lawrence (comparison of correlation coefficients: p = 0.88 and 0.82 for species already present and not present, respectively), the Scotian Shelf (p = 0.61 and 0.78), and the Strait of Georgia (p = 0.35 and 0.50). The scores returned by the two tools for species already present were highly correlated for the Gulf of St. Lawrence and the Scotian Shelf but less so for the Strait of Georgia, and scores were moderately correlated for all ecoregions when considering species not present (Fig. 3).

Fig. 2

Between-assessor variability in risk scores assigned by CMIST and MI-ISK to non-indigenous marine invertebrate species in three Canadian marine ecoregions. Each point represents the evaluation of one species by two assessors (solid circles show species already present and open circles species not present in an area), and r values represent the correlation coefficients for species already present (solid lines) and not present (dotted lines), respectively

Fig. 3

Between-tool (CMIST and MI-ISK) variability in scores of non-indigenous marine invertebrate species in three Canadian marine ecoregions. Scores are the average of two independent assessments, solid circles show species already present and open circles species not present in an area, and r values represent the correlation coefficients for species already present (solid lines) and not present (dotted lines), respectively

A total of 43 experts returned the survey and results are presented in Table 2. In general, there was a positive relationship between expert opinion scores and the risk assessment scores (Fig. 4). For CMIST, the slope of the best-fit regression line was significant for all ecoregions (p = 0.02 for the Gulf of St. Lawrence, p < 0.001 for the Scotian Shelf, and p = 0.01 for the Strait of Georgia). For MI-ISK, the relationship was significant for the Scotian Shelf (p = 0.005) but not for the other ecoregions (p = 0.11 for the Gulf of St. Lawrence and p = 0.15 for the Strait of Georgia). For all ecoregions, the AICc value for CMIST was smaller than that for MI-ISK, translating into likelihoods that CMIST provides a better fit than MI-ISK ranging from 0.82 to 0.95 (Table 3). Incorporating uncertainty in the calculations of expert opinion and CMIST scores improved the fit for the Gulf of St. Lawrence and the Scotian Shelf, but not the Strait of Georgia (Figs. 4, 5). At the species level, incorporation of uncertainty generally brought scores closer to the middle of the plot; species whose scores changed the most were generally lower risk (Fig. 5).
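For readers unfamiliar with how AICc values translate into the likelihood that one model fits better than another, the following sketch computes AICc for two least-squares regressions and converts them to Akaike weights. The formulas are the standard least-squares forms; the assumption that the reported likelihoods are Akaike weights, the parameter count of three (intercept, slope, and error variance), and all data values are illustrative and not taken from the study.

```python
import math


def linear_fit_rss(x, y):
    """Residual sum of squares of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def aicc(rss, n, k=3):
    """Corrected AIC for a least-squares fit with k estimated parameters."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(values):
    """Relative likelihood of each model, normalized to sum to 1."""
    best = min(values)
    rel = [math.exp(-0.5 * (v - best)) for v in values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical data: expert scores and scores from two tools for 6 species.
expert = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
tool_a = [1, 2, 3, 4, 5, 6]   # tracks the expert scores closely
tool_b = [3, 1, 4, 2, 6, 3]   # tracks them loosely
n = len(expert)
values = [aicc(linear_fit_rss(t, expert), n) for t in (tool_a, tool_b)]
weights = akaike_weights(values)
print(values, weights)
```

With two candidate models, the weight of the better-fitting one can be read directly as the likelihood that it provides the better fit.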

Fig. 4

Relationship between raw CMIST and MI-ISK assessment scores and averaged expert opinion scores for non-indigenous marine invertebrate species in three Canadian marine ecoregions. Lines show best-fit linear regressions

Table 3 Comparison of fit of CMIST and MI-ISK scores with expert opinion scores for non-indigenous invertebrate species introduced to three Canadian marine ecoregions

Fig. 5

Relationship between CMIST assessment scores adjusted for uncertainty, and adjusted expert opinion scores for non-indigenous marine invertebrate species in three Canadian marine ecoregions (left panels). Solid lines show best-fit linear regressions and grey error bars show 95 % confidence limits. The right panels present vector fields showing how each species was affected by incorporating the uncertainty adjustments; the base of each arrow is the position of a species when raw scores are used and the tip is the scores adjusted for uncertainty

The range of CMIST scores for species not present in the ecoregions was comparable to the range of scores for species already introduced (Fig. 6). Among species evaluated, the Pacific oyster (Crassostrea gigas) was identified as the highest-risk species on the east coast, followed by the veined rapa whelk (Rapana venosa). In the Strait of Georgia, the European green crab (C. maenas) was the highest-risk species. The CMIST scores were in agreement with the results of previously conducted detailed-level risk assessments. For tunicates, risk for all species was considered ‘high’ in east coast ecoregions (Therriault and Herborg 2007) and CMIST ranked these species among the riskiest. On the west coast (of which the Strait of Georgia is a small portion), all tunicate species were considered ‘high’ risk, except C. intestinalis that was considered ‘moderate’ risk (Therriault and Herborg 2007). CMIST ranked all tunicate species among the riskiest, with the exception of C. intestinalis which received a moderate score. For C. maenas, risk was considered ‘very high’ for both coasts (Therriault et al. 2008a), which is well reflected by the high scores returned by CMIST for all ecoregions. Finally, E. sinensis was considered to pose ‘moderate’ risk to the marine environment on both coasts (Therriault et al. 2008b) and CMIST returned moderate scores for this species. For all ecoregions, CMIST perfectly discriminated among moderate and high risk species classified by detailed risk assessment. Admittedly, the number of species for which detailed risk assessments are available is very low.

Fig. 6

Ranked CMIST scores for non-indigenous marine invertebrate species known (solid circles) and not known (open circles, bolded on axis labels) to have been introduced to three Canadian marine ecoregions. Error bars show 95 % confidence intervals and letters show results from available detailed risk assessments (M moderate risk and H high risk)

Discussion

Screening-level risk assessment tools are imperfect, but of great utility to quantify risk and inform management of non-indigenous species. Scoring schemes provide a relatively quick and accurate way to screen and rank species without conducting time- and data-intensive formal quantitative risk analyses (e.g., Leung et al. 2012; Therriault et al. 2008a, b; Therriault and Herborg 2007). Most screening tools currently used to evaluate risks from non-indigenous species are derived from the AuWRA (Pheloung et al. 1999) which was designed to evaluate proposed intentional plant introductions (i.e., to recommend acceptance or rejection). Thus, by definition the introduction step is almost certain and the tool was designed to assess risk in the context of probability of persistence and spread outside cultivation. As such, this tool does not include questions about probability of introduction (arrival) and includes few about probable ecological impacts. It is thus difficult to decompose risk in terms of likelihood and impact of invasion as recommended in Kumschick and Richardson (2013; but see Daehler and Virtue 2010). CMIST was designed to follow the sequence of events in the invasion process (including potential to be introduced to a new area), and thus asks questions directly related to probability of arrival, survival, establishment, spread, and impacts. Thus, it is better suited for assessing risk of unintentional (accidental) introductions (in addition to intentional ones); the most prevalent type of invasions in marine coastal waters and elsewhere.

CMIST uses generalized questions, which could be considered more difficult to answer or subject to greater interpretation than questions about specific life-history traits, such that greater inter-assessor variability in scores might be expected. However, no notable differences in precision were observed between CMIST and MI-ISK. In addition, CMIST inter-assessor variability was smaller than that of the other tool for which similar information was available [mean absolute difference in scores assigned by two assessors divided by mean score for all species assessed; freshwater Fish Invasiveness Scoring Kit (FISK): 0.51, derived from Fig. 1 in Copp et al. (2009), CMIST: 0.16]. Finally, an analysis of individual questions (data not presented), showed the two assessors answered 63 % of the 1785 questions the same way and when answers differed, it was by a score of just 1 in 97 % of these cases. Therefore it appears CMIST is not more prone to high inter-assessor variability in scores when compared to other tools, but admittedly very few tools have actually been evaluated.
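The relative inter-assessor variability index used in this comparison (mean absolute difference in scores between two assessors, divided by the mean score for all species assessed) can be computed as in the sketch below. The function name and the example scores are hypothetical, and taking the mean score over both assessors is an assumption about how the denominator is defined.

```python
from statistics import mean


def relative_assessor_difference(scores_a, scores_b):
    """Mean absolute between-assessor difference divided by the mean score.

    scores_a and scores_b hold the scores given by two assessors to the
    same species, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both assessors must score the same species")
    abs_diff = [abs(a - b) for a, b in zip(scores_a, scores_b)]
    overall_mean = mean(list(scores_a) + list(scores_b))
    return mean(abs_diff) / overall_mean


# Hypothetical CMIST scores from two assessors for four species.
assessor_1 = [2.5, 4.0, 6.0, 3.0]
assessor_2 = [3.0, 4.0, 5.0, 3.5]
print(relative_assessor_difference(assessor_1, assessor_2))
```

Dividing by the mean score makes the index comparable across tools whose scores are on different scales, such as CMIST (1 to 9) and FISK.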

The assessment scores returned by the tools were well correlated and provided a good approximation of the expert opinion scores, with the notable exception of the Strait of Georgia (see discussion below). Accuracy of CMIST was slightly lower (Gulf of St. Lawrence; R² = 0.33) or comparable (Scotian Shelf; R² = 0.66) to that reported for the WRA in the other studies for which a linear relationship was reported [R² = 0.47 (Pheloung et al. 1999), 0.52 (Daehler et al. 2004), 0.67 (Crosti et al. 2010), and 0.52 (McClay et al. 2010)]. Comparison of CMIST results to those for which time-intensive detailed-level risk assessments have been conducted for the same geographic areas (five tunicate and two crab species) further supports the premise that CMIST generally returns reliable risk scores. Other studies generally use Receiver Operating Characteristics (ROC) curves (Hughes and Madden 2003) and report correct classification rates to evaluate accuracy. While this technique is appropriate for evaluation of intentional introductions for which the management consequence of classification is obvious (accept non-pests and reject pests), we felt that such an approach was not appropriate for unintentional introductions. To appropriately evaluate unintentional introductions, it is essential to retain the continuous nature of the severity of realized impacts. This allows: (1) a direct comparison of risk posed by several species in a situation where management resources need to be prioritized, (2) an assessment of the expected risk posed by a potential novel non-indigenous species introduction, in relation to known past experiences (e.g., if species A arrives in an area, we may expect an impact similar to already-established species B), and (3) a better representation of reality because species classified as invaders will invariably differ in their magnitude of impacts within and between area(s).

Kumschick and Richardson (2013) identified the lack of techniques to quantify uncertainty as one of the main weaknesses of screening-level risk assessment tools. The technique we developed to quantify uncertainty is very similar to the one independently developed by the United States Department of Agriculture (USDA 2015; Anthony Koop, Pers. Comm.). They also use a Monte Carlo procedure to generate potential scores based on the level of uncertainty associated with each question; the main difference lies in the probability distributions used (an area of future research). These simple techniques are a significant advancement that could be applied to other risk assessment tools (although challenges may exist for tools with unequal questions/scoring or with feedback among questions). While these systems do not take into account natural variation (peculiarities and chance events influencing each individual invasion), we believe they encompass the uncertainty related to the quality of information available and/or used and language interpretation. In our system, the calculated confidence limits incorporate intra-individual uncertainty, inter-individual disagreements, and sample size (number of individuals that participate in the evaluations). The technique adjusts the influence of an individual response based on the level of certainty; uncertain answers are given less weight than more certain ones. This usually resulted in scores moving closer to the center of the plot and in a notable improvement in fit between expert and assessment scores for two of three assessed ecoregions. Species predicted to have the lowest risk seem to be the most affected (larger change in scores) by this procedure. This is logical as species with greater impacts are often better studied and thus, assessors and experts are more certain of potential effects from these species than those from less well-studied species with potentially fewer impacts.

Studies testing and calibrating risk assessment tools typically use categorical outcomes for realized impacts. Once a species is categorized, this category is considered to be the ‘true value’ of impact. However, despite recent progress (Nentwig et al. 2010; Kumschick et al. 2012; Blackburn et al. 2014), it is still difficult to quantify impacts of non-native species, especially for the less studied species. Therefore, uncertainty exists in the data being used to test these tools. To our knowledge, our study is the first to quantify this uncertainty. It revealed that sometimes, discrepancies between risk assessment scores and an indicator of realized impact (expert opinion) may be the result of high uncertainty on behalf of the expert. Future studies should consider this source of uncertainty and evaluate the potential consequences of misclassification in impact categories.

Both CMIST and MI-ISK scores were more weakly related to expert opinion for species in the Strait of Georgia than in the other ecoregions considered. There may be several reasons for this. First, the impact of non-indigenous marine invertebrates is often considered less significant on the Canadian west coast compared to the east coast. In fact, the expert scores were significantly lower in the Strait of Georgia than the other two ecoregions (Tukey post hoc tests following significant one-way ANOVA; results not presented). The absence (or lack of realization of potential impacts) of highly problematic species might prevent detection of a statistical relationship (this is particularly likely for MI-ISK, for which high risk scores in the Strait of Georgia were rare). Second, the lack of relationship might be related to higher uncertainty for this region. There are few species with a high negative economic impact in the Strait of Georgia, and these are species for which ecological studies are usually urged. Thus, the quality of information for this ecoregion might be lower than for the east coast ecoregions where several high profile invasive species were included in our analyses (e.g., several tunicate species, green crab). Also, fewer expert evaluations were completed for the Strait of Georgia, and the scores were more uncertain. These factors resulted in significantly larger confidence limits in the Strait of Georgia than in the other two ecoregions (Tukey post hoc tests following significant one-way ANOVA; results not presented).

In conclusion, we recommend the use of CMIST as a screening-level risk assessment tool for non-indigenous marine invertebrate species. This tool reflects the invasion cycle, the scores relate well with expert opinion scores, and uncertainty can be quantified. The technique developed to quantify uncertainty should be incorporated in existing tools designed to evaluate intentional introductions. Since the CMIST questions are generalized to the invasion process and resulting impacts, CMIST could easily be adapted to other taxa simply by modifying the guidance for each question.